2026-02-21T08:04:43.3024241Z Current runner version: '2.331.0' 2026-02-21T08:04:43.3028103Z Runner name: 'dgxb200-05-1002' 2026-02-21T08:04:43.3028634Z Runner group name: 'default' 2026-02-21T08:04:43.3029180Z Machine name: '524c2002bf7a' 2026-02-21T08:04:43.3030873Z ##[group]GITHUB_TOKEN Permissions 2026-02-21T08:04:43.3032377Z Contents: read 2026-02-21T08:04:43.3032757Z Metadata: read 2026-02-21T08:04:43.3033146Z ##[endgroup] 2026-02-21T08:04:43.3034621Z Secret source: Actions 2026-02-21T08:04:43.3035204Z Prepare workflow directory 2026-02-21T08:04:43.3400801Z Prepare all required actions 2026-02-21T08:04:43.3429681Z Getting action download info 2026-02-21T08:04:43.8424952Z Download action repository 'actions/checkout@v6' (SHA:de0fac2e4500dabe0009e67214ff5f5447ce83dd) 2026-02-21T08:04:44.1955279Z Download action repository 'actions/setup-python@v6' (SHA:a309ff8b426b58ec0e2a45f0f869d46889d02405) 2026-02-21T08:04:44.5920215Z Download action repository 'astral-sh/setup-uv@v7' (SHA:eac588ad8def6316056a12d4907a9d4d84ff7a3b) 2026-02-21T08:04:44.9547929Z Download action repository 'pytorch/test-infra@main' (SHA:bb8f04ff3961233c844fde6533c7c6c5f0857909) 2026-02-21T08:04:45.6581460Z Download action repository 'actions/upload-artifact@v6' (SHA:b7c566a772e6b6bfb58ed0dc250532a479d7789f) 2026-02-21T08:04:46.1153459Z Getting action download info 2026-02-21T08:04:46.3078330Z Uses: pytorch/helion/.github/workflows/benchmark.yml@refs/heads/main (874a7d0cadab18218a84ad3579d329dc95c51820) 2026-02-21T08:04:46.3081516Z ##[group] Inputs 2026-02-21T08:04:46.3081798Z runner: linux.dgx.b200 2026-02-21T08:04:46.3082138Z python-version: 3.12 2026-02-21T08:04:46.3082392Z image: nvidia/cuda:13.0.1-devel-ubuntu24.04 2026-02-21T08:04:46.3082695Z runtime-version: cu130 2026-02-21T08:04:46.3083000Z container-options: --gpus all 2026-02-21T08:04:46.3083273Z alias: b200 2026-02-21T08:04:46.3083472Z kernels: gemm 2026-02-21T08:04:46.3083741Z env-vars: 2026-02-21T08:04:46.3083946Z custom-args: 
2026-02-21T08:04:46.3084428Z run_h100: true 2026-02-21T08:04:46.3084747Z run_b200: true 2026-02-21T08:04:46.3084951Z run_mi325x: true 2026-02-21T08:04:46.3085201Z ##[endgroup] 2026-02-21T08:04:46.3085497Z Complete job name: run-b200 (gemm) / benchmark-cu130-gemm-py3.12-b200 2026-02-21T08:04:46.3332332Z ##[group]Checking docker version 2026-02-21T08:04:46.3342260Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}' 2026-02-21T08:04:46.3531704Z '1.53' 2026-02-21T08:04:46.3545044Z Docker daemon API version: '1.53' 2026-02-21T08:04:46.3545529Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}' 2026-02-21T08:04:46.3706816Z '1.52' 2026-02-21T08:04:46.3720933Z Docker client API version: '1.52' 2026-02-21T08:04:46.3725518Z ##[endgroup] 2026-02-21T08:04:46.3727337Z ##[group]Clean up resources from previous jobs 2026-02-21T08:04:46.3731759Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=fd04e0" 2026-02-21T08:04:46.3862122Z ##[command]/usr/bin/docker network prune --force --filter "label=fd04e0" 2026-02-21T08:04:46.3982911Z ##[endgroup] 2026-02-21T08:04:46.3983349Z ##[group]Create local container network 2026-02-21T08:04:46.3991098Z ##[command]/usr/bin/docker network create --label fd04e0 github_network_b2b3600e3d064225aed0e5f8bd238faa 2026-02-21T08:04:46.4336210Z b8599a6fbf9f6d4896771f4f1bb0f8ff5896fd23a51cbdf3c4374f61604d33ab 2026-02-21T08:04:46.4367675Z ##[endgroup] 2026-02-21T08:04:46.4388722Z ##[group]Starting job container 2026-02-21T08:04:46.4405162Z ##[command]/usr/bin/docker pull nvidia/cuda:13.0.1-devel-ubuntu24.04 2026-02-21T08:04:47.5269772Z 13.0.1-devel-ubuntu24.04: Pulling from nvidia/cuda 2026-02-21T08:04:47.8001841Z 1cd98a0b9132: Pulling fs layer 2026-02-21T08:04:47.8005565Z 76249c7cd503: Pulling fs layer 2026-02-21T08:04:47.8006214Z c20926c42231: Pulling fs layer 2026-02-21T08:04:47.8006633Z eea924c2c8fb: Pulling fs layer 2026-02-21T08:04:47.8007600Z afcf80b42416: Pulling fs layer 
2026-02-21T08:04:47.8008149Z 8fb7ecb711ef: Pulling fs layer 2026-02-21T08:04:47.8008435Z 401d11fb2a09: Pulling fs layer 2026-02-21T08:04:47.8009045Z d7913b78456a: Pulling fs layer 2026-02-21T08:04:47.8009369Z c03b8ec8dd33: Pulling fs layer 2026-02-21T08:04:47.8009588Z e93dd1223ff5: Pulling fs layer 2026-02-21T08:04:47.8009863Z ab7341a40ee7: Pulling fs layer 2026-02-21T08:04:48.1524478Z 1cd98a0b9132: Download complete 2026-02-21T08:04:48.1528674Z afcf80b42416: Download complete 2026-02-21T08:04:48.1535111Z 8fb7ecb711ef: Download complete 2026-02-21T08:04:48.1544661Z d7913b78456a: Download complete 2026-02-21T08:04:48.1546862Z c03b8ec8dd33: Download complete 2026-02-21T08:04:48.2528285Z c20926c42231: Download complete 2026-02-21T08:04:49.0530900Z 76249c7cd503: Download complete 2026-02-21T08:04:49.0536264Z 401d11fb2a09: Download complete 2026-02-21T08:04:49.8568453Z 76249c7cd503: Pull complete 2026-02-21T08:04:53.9531234Z ab7341a40ee7: Download complete 2026-02-21T08:05:00.2562602Z 401d11fb2a09: Pull complete 2026-02-21T08:05:07.5578297Z ab7341a40ee7: Pull complete 2026-02-21T08:05:07.8544613Z d7913b78456a: Pull complete 2026-02-21T08:05:08.2563888Z c03b8ec8dd33: Pull complete 2026-02-21T08:05:26.2525049Z eea924c2c8fb: Download complete 2026-02-21T08:05:34.0536213Z e93dd1223ff5: Download complete 2026-02-21T08:05:59.1539966Z afcf80b42416: Pull complete 2026-02-21T08:05:59.1541205Z 8fb7ecb711ef: Pull complete 2026-02-21T08:05:59.1549074Z c20926c42231: Pull complete 2026-02-21T08:05:59.1554631Z eea924c2c8fb: Pull complete 2026-02-21T08:07:00.4319645Z 1cd98a0b9132: Pull complete 2026-02-21T08:07:00.4333192Z e93dd1223ff5: Pull complete 2026-02-21T08:07:00.4335734Z Digest: sha256:7d2f6a8c2071d911524f95061a0db363e24d27aa51ec831fcccf9e76eb72bc92 2026-02-21T08:07:00.4336379Z Status: Downloaded newer image for nvidia/cuda:13.0.1-devel-ubuntu24.04 2026-02-21T08:07:00.4339566Z docker.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 2026-02-21T08:07:00.4414632Z ##[command]/usr/bin/docker 
create --name 3d779417fd284758a9cf9374529f9c31_nvidiacuda1301develubuntu2404_79158c --label fd04e0 --workdir /__w/helion/helion --network github_network_b2b3600e3d064225aed0e5f8bd238faa --gpus all -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/alice/_work":"/__w" -v "/home/alice/externals":"/__e":ro -v "/home/alice/_work/_temp":"/__w/_temp" -v "/home/alice/_work/_actions":"/__w/_actions" -v "/home/alice/_work/_tool":"/__w/_tool" -v "/home/alice/_work/_temp/_github_home":"/github/home" -v "/home/alice/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" nvidia/cuda:13.0.1-devel-ubuntu24.04 "-f" "/dev/null" 2026-02-21T08:07:00.4789808Z b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T08:07:00.4812223Z ##[command]/usr/bin/docker start b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T08:07:00.7494117Z b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T08:07:00.7520022Z ##[command]/usr/bin/docker ps --all --filter id=b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}" 2026-02-21T08:07:00.7708353Z b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 Up Less than a second 2026-02-21T08:07:00.7727556Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T08:07:00.7812380Z GITHUB_ACTIONS=true 2026-02-21T08:07:00.7812728Z CI=true 2026-02-21T08:07:00.7812960Z HOME=/github/home 2026-02-21T08:07:00.7813355Z PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:07:00.7813747Z NVARCH=x86_64 2026-02-21T08:07:00.7818548Z NVIDIA_REQUIRE_CUDA=cuda>=13.0 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 
brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,driver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=cloudgaming,driver>=570,driver<571 brand=unknown,driver>=575,driver<576 brand=grid,driver>=575,driver<576 brand=tesla,driver>=575,driver<576 brand=nvidia,driver>=575,driver<576 brand=quadro,driver>=575,driver<576 brand=quadrortx,driver>=575,driver<576 brand=nvidiartx,driver>=575,driver<576 brand=vapps,driver>=575,driver<576 brand=vpc,driver>=575,driver<576 brand=vcs,driver>=575,driver<576 
brand=vws,driver>=575,driver<576 brand=cloudgaming,driver>=575,driver<576 2026-02-21T08:07:00.7823710Z NV_CUDA_CUDART_VERSION=13.0.88-1 2026-02-21T08:07:00.7823913Z CUDA_VERSION=13.0.1 2026-02-21T08:07:00.7824272Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:07:00.7824627Z NVIDIA_VISIBLE_DEVICES=all 2026-02-21T08:07:00.7824971Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2026-02-21T08:07:00.7825253Z NV_CUDA_LIB_VERSION=13.0.1-1 2026-02-21T08:07:00.7825617Z NV_NVTX_VERSION=13.0.85-1 2026-02-21T08:07:00.7825872Z NV_LIBNPP_VERSION=13.0.1.2-1 2026-02-21T08:07:00.7826122Z NV_LIBNPP_PACKAGE=libnpp-13-0=13.0.1.2-1 2026-02-21T08:07:00.7826418Z NV_LIBCUSPARSE_VERSION=12.6.3.3-1 2026-02-21T08:07:00.7826636Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-13-0 2026-02-21T08:07:00.7826927Z NV_LIBCUBLAS_VERSION=13.0.2.14-1 2026-02-21T08:07:00.7827156Z NV_LIBCUBLAS_PACKAGE=libcublas-13-0=13.0.2.14-1 2026-02-21T08:07:00.7827415Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2026-02-21T08:07:00.7827663Z NV_LIBNCCL_PACKAGE_VERSION=2.28.3-1 2026-02-21T08:07:00.7827924Z NCCL_VERSION=2.28.3-1 2026-02-21T08:07:00.7828167Z NV_LIBNCCL_PACKAGE=libnccl2=2.28.3-1+cuda13.0 2026-02-21T08:07:00.7828455Z NVIDIA_PRODUCT_NAME=CUDA 2026-02-21T08:07:00.7828690Z NV_CUDA_CUDART_DEV_VERSION=13.0.88-1 2026-02-21T08:07:00.7828912Z NV_NVML_DEV_VERSION=13.0.87-1 2026-02-21T08:07:00.7829187Z NV_LIBCUSPARSE_DEV_VERSION=12.6.3.3-1 2026-02-21T08:07:00.7829411Z NV_LIBNPP_DEV_VERSION=13.0.1.2-1 2026-02-21T08:07:00.7829689Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-13-0=13.0.1.2-1 2026-02-21T08:07:00.7830064Z NV_LIBCUBLAS_DEV_VERSION=13.0.2.14-1 2026-02-21T08:07:00.7830323Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-13-0 2026-02-21T08:07:00.7830636Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-13-0=13.0.2.14-1 2026-02-21T08:07:00.7830907Z NV_CUDA_NSIGHT_COMPUTE_VERSION=13.0.1-1 2026-02-21T08:07:00.7831232Z NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-13-0=13.0.1-1 
2026-02-21T08:07:00.7831563Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
2026-02-21T08:07:00.7831803Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.28.3-1
2026-02-21T08:07:00.7832409Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.28.3-1+cuda13.0
2026-02-21T08:07:00.7832681Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs
2026-02-21T08:07:00.7837928Z ##[endgroup]
2026-02-21T08:07:00.7844401Z ##[group]Waiting for all services to be ready
2026-02-21T08:07:00.7846043Z ##[endgroup]
2026-02-21T08:07:00.7974944Z ##[group]Run echo "Detected NVIDIA image"
2026-02-21T08:07:00.7975305Z echo "Detected NVIDIA image"
2026-02-21T08:07:00.7976249Z nvidia-smi || echo "nvidia-smi not found"
2026-02-21T08:07:00.7978514Z shell: bash -l {0}
2026-02-21T08:07:00.7978886Z env:
2026-02-21T08:07:00.7979091Z HELION_AUTOTUNE_LOG_LEVEL: INFO
2026-02-21T08:07:00.7979417Z ##[endgroup]
2026-02-21T08:07:00.8617569Z Detected NVIDIA image
2026-02-21T08:07:00.8830710Z Sat Feb 21 08:07:00 2026
2026-02-21T08:07:00.8835103Z +-----------------------------------------------------------------------------------------+
2026-02-21T08:07:00.8835990Z | NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
2026-02-21T08:07:00.8836518Z +-----------------------------------------+------------------------+----------------------+
2026-02-21T08:07:00.8837033Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2026-02-21T08:07:00.8839544Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2026-02-21T08:07:00.8839967Z |                                         |                        |               MIG M. |
2026-02-21T08:07:00.8840290Z |=========================================+========================+======================|
2026-02-21T08:07:00.8957692Z |   0  NVIDIA B200                    Off |   00000000:1B:00.0 Off |                    0 |
2026-02-21T08:07:00.8958148Z | N/A   30C    P0            138W /  750W |       0MiB / 183359MiB |      0%      Default |
2026-02-21T08:07:00.8958547Z |                                         |                        |             Disabled |
2026-02-21T08:07:00.8958945Z +-----------------------------------------+------------------------+----------------------+
2026-02-21T08:07:00.8964748Z
2026-02-21T08:07:00.8965244Z +-----------------------------------------------------------------------------------------+
2026-02-21T08:07:00.8965811Z | Processes:                                                                              |
2026-02-21T08:07:00.8966411Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2026-02-21T08:07:00.8966808Z |        ID   ID                                                               Usage      |
2026-02-21T08:07:00.8967236Z |=========================================================================================|
2026-02-21T08:07:00.8967663Z |  No running processes found                                                             |
2026-02-21T08:07:00.8968193Z +-----------------------------------------------------------------------------------------+
2026-02-21T08:07:00.9365333Z ##[group]Run set -x
2026-02-21T08:07:00.9365582Z set -x
2026-02-21T08:07:00.9365765Z apt-get update
2026-02-21T08:07:00.9365983Z apt-get install -y git
2026-02-21T08:07:00.9366277Z shell: bash -l {0}
2026-02-21T08:07:00.9366446Z env:
2026-02-21T08:07:00.9366658Z HELION_AUTOTUNE_LOG_LEVEL: INFO
2026-02-21T08:07:00.9366927Z ##[endgroup]
2026-02-21T08:07:00.9983097Z + apt-get update
2026-02-21T08:07:01.0536771Z Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 InRelease [1581 B]
2026-02-21T08:07:01.1625132Z Get:2 http://archive.ubuntu.com/ubuntu noble InRelease [256 kB]
2026-02-21T08:07:01.1719962Z Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 Packages [1218 kB]
2026-02-21T08:07:01.2992502Z Get:4 http://security.ubuntu.com/ubuntu noble-security InRelease [126 kB]
2026-02-21T08:07:01.5408061Z Get:5 
http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB] 2026-02-21T08:07:01.6350712Z Get:6 http://archive.ubuntu.com/ubuntu noble-backports InRelease [126 kB] 2026-02-21T08:07:01.7778793Z Get:7 http://archive.ubuntu.com/ubuntu noble/restricted amd64 Packages [117 kB] 2026-02-21T08:07:01.7785429Z Get:8 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages [1808 kB] 2026-02-21T08:07:01.9483340Z Get:9 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages [1857 kB] 2026-02-21T08:07:02.0764299Z Get:10 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages [19.3 MB] 2026-02-21T08:07:02.4500863Z Get:11 http://archive.ubuntu.com/ubuntu noble/multiverse amd64 Packages [331 kB] 2026-02-21T08:07:02.4555505Z Get:12 http://archive.ubuntu.com/ubuntu noble-updates/restricted amd64 Packages [3381 kB] 2026-02-21T08:07:02.5112157Z Get:13 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 Packages [2016 kB] 2026-02-21T08:07:02.5516880Z Get:14 http://archive.ubuntu.com/ubuntu noble-updates/multiverse amd64 Packages [38.1 kB] 2026-02-21T08:07:02.6408114Z Get:15 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages [2240 kB] 2026-02-21T08:07:02.6478268Z Get:16 http://archive.ubuntu.com/ubuntu noble-backports/universe amd64 Packages [34.6 kB] 2026-02-21T08:07:02.6478850Z Get:17 http://archive.ubuntu.com/ubuntu noble-backports/main amd64 Packages [49.5 kB] 2026-02-21T08:07:02.6753533Z Get:18 http://security.ubuntu.com/ubuntu noble-security/restricted amd64 Packages [3196 kB] 2026-02-21T08:07:02.8741165Z Get:19 http://security.ubuntu.com/ubuntu noble-security/universe amd64 Packages [1207 kB] 2026-02-21T08:07:02.9286637Z Get:20 http://security.ubuntu.com/ubuntu noble-security/multiverse amd64 Packages [34.8 kB] 2026-02-21T08:07:03.3828258Z Fetched 37.5 MB in 2s (15.9 MB/s) 2026-02-21T08:07:03.9954222Z Reading package lists... 
2026-02-21T08:07:04.0061861Z W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details. 2026-02-21T08:07:04.0071576Z + apt-get install -y git 2026-02-21T08:07:04.6302173Z Reading package lists... 2026-02-21T08:07:04.7473359Z Building dependency tree... 2026-02-21T08:07:04.7475279Z Reading state information... 2026-02-21T08:07:04.8821093Z The following additional packages will be installed: 2026-02-21T08:07:04.8823695Z git-man krb5-locales less libbrotli1 libbsd0 libcbor0.10 libcurl3t64-gnutls 2026-02-21T08:07:04.8824372Z libedit2 liberror-perl libexpat1 libfido2-1 libgssapi-krb5-2 libk5crypto3 2026-02-21T08:07:04.8824941Z libkeyutils1 libkrb5-3 libkrb5support0 libnghttp2-14 libpsl5t64 librtmp1 2026-02-21T08:07:04.8825389Z libssh-4 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxext6 libxmuu1 2026-02-21T08:07:04.8827235Z openssh-client publicsuffix xauth 2026-02-21T08:07:04.8833460Z Suggested packages: 2026-02-21T08:07:04.8833877Z gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui 2026-02-21T08:07:04.8838510Z gitk gitweb git-cvs git-mediawiki git-svn krb5-doc krb5-user keychain 2026-02-21T08:07:04.8838911Z libpam-ssh monkeysphere ssh-askpass 2026-02-21T08:07:04.9217460Z The following NEW packages will be installed: 2026-02-21T08:07:04.9222097Z git git-man krb5-locales less libbrotli1 libbsd0 libcbor0.10 2026-02-21T08:07:04.9223212Z libcurl3t64-gnutls libedit2 liberror-perl libexpat1 libfido2-1 2026-02-21T08:07:04.9223592Z libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 2026-02-21T08:07:04.9224027Z libnghttp2-14 libpsl5t64 librtmp1 libssh-4 libx11-6 libx11-data libxau6 2026-02-21T08:07:04.9228527Z libxcb1 libxdmcp6 libxext6 libxmuu1 openssh-client publicsuffix xauth 2026-02-21T08:07:05.0784013Z 0 upgraded, 31 newly installed, 0 to remove and 86 not upgraded. 
2026-02-21T08:07:05.0788384Z Need to get 8886 kB of archives. 2026-02-21T08:07:05.0790211Z After this operation, 38.0 MB of additional disk space will be used. 2026-02-21T08:07:05.0790744Z Get:1 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 krb5-locales all 1.20.1-6ubuntu2.6 [14.8 kB] 2026-02-21T08:07:05.2207652Z Get:2 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 less amd64 590-2ubuntu2.1 [142 kB] 2026-02-21T08:07:05.4109853Z Get:3 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libbsd0 amd64 0.12.1-1build1.1 [41.2 kB] 2026-02-21T08:07:05.4395218Z Get:4 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libexpat1 amd64 2.6.1-2ubuntu0.4 [88.2 kB] 2026-02-21T08:07:05.4759200Z Get:5 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libkrb5support0 amd64 1.20.1-6ubuntu2.6 [34.4 kB] 2026-02-21T08:07:05.4870525Z Get:6 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libk5crypto3 amd64 1.20.1-6ubuntu2.6 [82.0 kB] 2026-02-21T08:07:05.5107573Z Get:7 http://archive.ubuntu.com/ubuntu noble/main amd64 libkeyutils1 amd64 1.6.3-3build1 [9490 B] 2026-02-21T08:07:05.5135035Z Get:8 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libkrb5-3 amd64 1.20.1-6ubuntu2.6 [348 kB] 2026-02-21T08:07:05.5750011Z Get:9 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libgssapi-krb5-2 amd64 1.20.1-6ubuntu2.6 [143 kB] 2026-02-21T08:07:05.6237345Z Get:10 http://archive.ubuntu.com/ubuntu noble/main amd64 libcbor0.10 amd64 0.10.2-1.2ubuntu2 [25.8 kB] 2026-02-21T08:07:05.6271237Z Get:11 http://archive.ubuntu.com/ubuntu noble/main amd64 libedit2 amd64 3.1-20230828-1build1 [97.6 kB] 2026-02-21T08:07:05.6519375Z Get:12 http://archive.ubuntu.com/ubuntu noble/main amd64 libfido2-1 amd64 1.14.0-1build3 [83.5 kB] 2026-02-21T08:07:05.6633975Z Get:13 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libnghttp2-14 amd64 1.59.0-1ubuntu0.2 [74.3 kB] 2026-02-21T08:07:05.6735657Z Get:14 http://archive.ubuntu.com/ubuntu noble/main 
amd64 libpsl5t64 amd64 0.21.2-1.1build1 [57.1 kB] 2026-02-21T08:07:05.6819590Z Get:15 http://archive.ubuntu.com/ubuntu noble/main amd64 libxau6 amd64 1:1.0.9-1build6 [7160 B] 2026-02-21T08:07:05.6830047Z Get:16 http://archive.ubuntu.com/ubuntu noble/main amd64 libxdmcp6 amd64 1:1.1.3-0ubuntu6 [10.3 kB] 2026-02-21T08:07:05.6860563Z Get:17 http://archive.ubuntu.com/ubuntu noble/main amd64 libxcb1 amd64 1.15-1ubuntu2 [47.7 kB] 2026-02-21T08:07:05.6947952Z Get:18 http://archive.ubuntu.com/ubuntu noble/main amd64 libx11-data all 2:1.8.7-1build1 [115 kB] 2026-02-21T08:07:05.7531841Z Get:19 http://archive.ubuntu.com/ubuntu noble/main amd64 libx11-6 amd64 2:1.8.7-1build1 [650 kB] 2026-02-21T08:07:05.7930618Z Get:20 http://archive.ubuntu.com/ubuntu noble/main amd64 libxext6 amd64 2:1.3.4-1build2 [30.4 kB] 2026-02-21T08:07:05.8023106Z Get:21 http://archive.ubuntu.com/ubuntu noble/main amd64 libxmuu1 amd64 2:1.1.3-3build2 [8958 B] 2026-02-21T08:07:05.8032437Z Get:22 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 openssh-client amd64 1:9.6p1-3ubuntu13.14 [906 kB] 2026-02-21T08:07:06.0771373Z Get:23 http://archive.ubuntu.com/ubuntu noble/main amd64 publicsuffix all 20231001.0357-0.1 [129 kB] 2026-02-21T08:07:06.0940267Z Get:24 http://archive.ubuntu.com/ubuntu noble/main amd64 xauth amd64 1:1.1.2-1build1 [25.6 kB] 2026-02-21T08:07:06.0979086Z Get:25 http://archive.ubuntu.com/ubuntu noble/main amd64 libbrotli1 amd64 1.1.0-2build2 [331 kB] 2026-02-21T08:07:06.1526830Z Get:26 http://archive.ubuntu.com/ubuntu noble/main amd64 librtmp1 amd64 2.4+20151223.gitfa8646d.1-2build7 [56.3 kB] 2026-02-21T08:07:06.1725726Z Get:27 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libssh-4 amd64 0.10.6-2ubuntu0.3 [190 kB] 2026-02-21T08:07:06.1862026Z Get:28 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libcurl3t64-gnutls amd64 8.5.0-2ubuntu10.6 [333 kB] 2026-02-21T08:07:06.2067960Z Get:29 http://archive.ubuntu.com/ubuntu noble/main amd64 liberror-perl all 0.17029-2 
[25.6 kB] 2026-02-21T08:07:06.2081529Z Get:30 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 git-man all 1:2.43.0-1ubuntu7.3 [1100 kB] 2026-02-21T08:07:06.2744369Z Get:31 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 git amd64 1:2.43.0-1ubuntu7.3 [3680 kB] 2026-02-21T08:07:06.4749155Z debconf: delaying package configuration, since apt-utils is not installed 2026-02-21T08:07:06.4955171Z Fetched 8886 kB in 1s (6199 kB/s) 2026-02-21T08:07:06.5137702Z Selecting previously unselected package krb5-locales. 2026-02-21T08:07:06.5158027Z (Reading database ... 2026-02-21T08:07:06.5162128Z (Reading database ... 5% 2026-02-21T08:07:06.5163158Z (Reading database ... 10% 2026-02-21T08:07:06.5163497Z (Reading database ... 15% 2026-02-21T08:07:06.5163736Z (Reading database ... 20% 2026-02-21T08:07:06.5163983Z (Reading database ... 25% 2026-02-21T08:07:06.5164242Z (Reading database ... 30% 2026-02-21T08:07:06.5164446Z (Reading database ... 35% 2026-02-21T08:07:06.5164849Z (Reading database ... 40% 2026-02-21T08:07:06.5165063Z (Reading database ... 45% 2026-02-21T08:07:06.5165313Z (Reading database ... 50% 2026-02-21T08:07:06.5165513Z (Reading database ... 55% 2026-02-21T08:07:06.5165740Z (Reading database ... 60% 2026-02-21T08:07:06.5165952Z (Reading database ... 65% 2026-02-21T08:07:06.5169422Z (Reading database ... 70% 2026-02-21T08:07:06.5169712Z (Reading database ... 75% 2026-02-21T08:07:06.5178374Z (Reading database ... 80% 2026-02-21T08:07:06.5185671Z (Reading database ... 85% 2026-02-21T08:07:06.5196750Z (Reading database ... 90% 2026-02-21T08:07:06.5198630Z (Reading database ... 95% 2026-02-21T08:07:06.5198884Z (Reading database ... 100% 2026-02-21T08:07:06.5199210Z (Reading database ... 15507 files and directories currently installed.) 2026-02-21T08:07:06.5202957Z Preparing to unpack .../00-krb5-locales_1.20.1-6ubuntu2.6_all.deb ... 2026-02-21T08:07:06.5228131Z Unpacking krb5-locales (1.20.1-6ubuntu2.6) ... 
2026-02-21T08:07:06.5453487Z Selecting previously unselected package less. 2026-02-21T08:07:06.5462640Z Preparing to unpack .../01-less_590-2ubuntu2.1_amd64.deb ... 2026-02-21T08:07:06.5490907Z Unpacking less (590-2ubuntu2.1) ... 2026-02-21T08:07:06.5728782Z Selecting previously unselected package libbsd0:amd64. 2026-02-21T08:07:06.5739910Z Preparing to unpack .../02-libbsd0_0.12.1-1build1.1_amd64.deb ... 2026-02-21T08:07:06.5777855Z Unpacking libbsd0:amd64 (0.12.1-1build1.1) ... 2026-02-21T08:07:06.5987650Z Selecting previously unselected package libexpat1:amd64. 2026-02-21T08:07:06.5999308Z Preparing to unpack .../03-libexpat1_2.6.1-2ubuntu0.4_amd64.deb ... 2026-02-21T08:07:06.6060187Z Unpacking libexpat1:amd64 (2.6.1-2ubuntu0.4) ... 2026-02-21T08:07:06.6268565Z Selecting previously unselected package libkrb5support0:amd64. 2026-02-21T08:07:06.6275187Z Preparing to unpack .../04-libkrb5support0_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:07:06.6293807Z Unpacking libkrb5support0:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:06.6495611Z Selecting previously unselected package libk5crypto3:amd64. 2026-02-21T08:07:06.6503638Z Preparing to unpack .../05-libk5crypto3_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:07:06.6519392Z Unpacking libk5crypto3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:06.6722489Z Selecting previously unselected package libkeyutils1:amd64. 2026-02-21T08:07:06.6729440Z Preparing to unpack .../06-libkeyutils1_1.6.3-3build1_amd64.deb ... 2026-02-21T08:07:06.6753276Z Unpacking libkeyutils1:amd64 (1.6.3-3build1) ... 2026-02-21T08:07:06.7202012Z Selecting previously unselected package libkrb5-3:amd64. 2026-02-21T08:07:06.7209061Z Preparing to unpack .../07-libkrb5-3_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:07:06.7241351Z Unpacking libkrb5-3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:06.7487046Z Selecting previously unselected package libgssapi-krb5-2:amd64. 
2026-02-21T08:07:06.7493318Z Preparing to unpack .../08-libgssapi-krb5-2_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:07:06.7512171Z Unpacking libgssapi-krb5-2:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:06.7706708Z Selecting previously unselected package libcbor0.10:amd64. 2026-02-21T08:07:06.7714652Z Preparing to unpack .../09-libcbor0.10_0.10.2-1.2ubuntu2_amd64.deb ... 2026-02-21T08:07:06.7721810Z Unpacking libcbor0.10:amd64 (0.10.2-1.2ubuntu2) ... 2026-02-21T08:07:06.7923777Z Selecting previously unselected package libedit2:amd64. 2026-02-21T08:07:06.7932517Z Preparing to unpack .../10-libedit2_3.1-20230828-1build1_amd64.deb ... 2026-02-21T08:07:06.7951558Z Unpacking libedit2:amd64 (3.1-20230828-1build1) ... 2026-02-21T08:07:06.8182930Z Selecting previously unselected package libfido2-1:amd64. 2026-02-21T08:07:06.8190134Z Preparing to unpack .../11-libfido2-1_1.14.0-1build3_amd64.deb ... 2026-02-21T08:07:06.8207614Z Unpacking libfido2-1:amd64 (1.14.0-1build3) ... 2026-02-21T08:07:06.8420062Z Selecting previously unselected package libnghttp2-14:amd64. 2026-02-21T08:07:06.8427036Z Preparing to unpack .../12-libnghttp2-14_1.59.0-1ubuntu0.2_amd64.deb ... 2026-02-21T08:07:06.8444748Z Unpacking libnghttp2-14:amd64 (1.59.0-1ubuntu0.2) ... 2026-02-21T08:07:06.8645825Z Selecting previously unselected package libpsl5t64:amd64. 2026-02-21T08:07:06.8654612Z Preparing to unpack .../13-libpsl5t64_0.21.2-1.1build1_amd64.deb ... 2026-02-21T08:07:06.8683576Z Unpacking libpsl5t64:amd64 (0.21.2-1.1build1) ... 2026-02-21T08:07:06.8866430Z Selecting previously unselected package libxau6:amd64. 2026-02-21T08:07:06.8870604Z Preparing to unpack .../14-libxau6_1%3a1.0.9-1build6_amd64.deb ... 2026-02-21T08:07:06.8891747Z Unpacking libxau6:amd64 (1:1.0.9-1build6) ... 2026-02-21T08:07:06.9092511Z Selecting previously unselected package libxdmcp6:amd64. 2026-02-21T08:07:06.9100221Z Preparing to unpack .../15-libxdmcp6_1%3a1.1.3-0ubuntu6_amd64.deb ... 
2026-02-21T08:07:06.9126153Z Unpacking libxdmcp6:amd64 (1:1.1.3-0ubuntu6) ... 2026-02-21T08:07:06.9323416Z Selecting previously unselected package libxcb1:amd64. 2026-02-21T08:07:06.9329547Z Preparing to unpack .../16-libxcb1_1.15-1ubuntu2_amd64.deb ... 2026-02-21T08:07:06.9349847Z Unpacking libxcb1:amd64 (1.15-1ubuntu2) ... 2026-02-21T08:07:06.9539365Z Selecting previously unselected package libx11-data. 2026-02-21T08:07:06.9547539Z Preparing to unpack .../17-libx11-data_2%3a1.8.7-1build1_all.deb ... 2026-02-21T08:07:06.9583472Z Unpacking libx11-data (2:1.8.7-1build1) ... 2026-02-21T08:07:06.9941073Z Selecting previously unselected package libx11-6:amd64. 2026-02-21T08:07:06.9948703Z Preparing to unpack .../18-libx11-6_2%3a1.8.7-1build1_amd64.deb ... 2026-02-21T08:07:06.9968164Z Unpacking libx11-6:amd64 (2:1.8.7-1build1) ... 2026-02-21T08:07:07.0204098Z Selecting previously unselected package libxext6:amd64. 2026-02-21T08:07:07.0212067Z Preparing to unpack .../19-libxext6_2%3a1.3.4-1build2_amd64.deb ... 2026-02-21T08:07:07.0229232Z Unpacking libxext6:amd64 (2:1.3.4-1build2) ... 2026-02-21T08:07:07.0491672Z Selecting previously unselected package libxmuu1:amd64. 2026-02-21T08:07:07.0498950Z Preparing to unpack .../20-libxmuu1_2%3a1.1.3-3build2_amd64.deb ... 2026-02-21T08:07:07.0514887Z Unpacking libxmuu1:amd64 (2:1.1.3-3build2) ... 2026-02-21T08:07:07.0725139Z Selecting previously unselected package openssh-client. 2026-02-21T08:07:07.0733917Z Preparing to unpack .../21-openssh-client_1%3a9.6p1-3ubuntu13.14_amd64.deb ... 2026-02-21T08:07:07.0797698Z Unpacking openssh-client (1:9.6p1-3ubuntu13.14) ... 2026-02-21T08:07:07.1151142Z Selecting previously unselected package publicsuffix. 2026-02-21T08:07:07.1158272Z Preparing to unpack .../22-publicsuffix_20231001.0357-0.1_all.deb ... 2026-02-21T08:07:07.1176587Z Unpacking publicsuffix (20231001.0357-0.1) ... 2026-02-21T08:07:07.1349921Z Selecting previously unselected package xauth. 
2026-02-21T08:07:07.1359338Z Preparing to unpack .../23-xauth_1%3a1.1.2-1build1_amd64.deb ... 2026-02-21T08:07:07.1391854Z Unpacking xauth (1:1.1.2-1build1) ... 2026-02-21T08:07:07.1592060Z Selecting previously unselected package libbrotli1:amd64. 2026-02-21T08:07:07.1600216Z Preparing to unpack .../24-libbrotli1_1.1.0-2build2_amd64.deb ... 2026-02-21T08:07:07.1620242Z Unpacking libbrotli1:amd64 (1.1.0-2build2) ... 2026-02-21T08:07:07.1825684Z Selecting previously unselected package librtmp1:amd64. 2026-02-21T08:07:07.1834014Z Preparing to unpack .../25-librtmp1_2.4+20151223.gitfa8646d.1-2build7_amd64.deb ... 2026-02-21T08:07:07.1850058Z Unpacking librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build7) ... 2026-02-21T08:07:07.2306738Z Selecting previously unselected package libssh-4:amd64. 2026-02-21T08:07:07.2314052Z Preparing to unpack .../26-libssh-4_0.10.6-2ubuntu0.3_amd64.deb ... 2026-02-21T08:07:07.2334221Z Unpacking libssh-4:amd64 (0.10.6-2ubuntu0.3) ... 2026-02-21T08:07:07.2522795Z Selecting previously unselected package libcurl3t64-gnutls:amd64. 2026-02-21T08:07:07.2530540Z Preparing to unpack .../27-libcurl3t64-gnutls_8.5.0-2ubuntu10.6_amd64.deb ... 2026-02-21T08:07:07.2549718Z Unpacking libcurl3t64-gnutls:amd64 (8.5.0-2ubuntu10.6) ... 2026-02-21T08:07:07.2754404Z Selecting previously unselected package liberror-perl. 2026-02-21T08:07:07.2759169Z Preparing to unpack .../28-liberror-perl_0.17029-2_all.deb ... 2026-02-21T08:07:07.2778792Z Unpacking liberror-perl (0.17029-2) ... 2026-02-21T08:07:07.3021595Z Selecting previously unselected package git-man. 2026-02-21T08:07:07.3028088Z Preparing to unpack .../29-git-man_1%3a2.43.0-1ubuntu7.3_all.deb ... 2026-02-21T08:07:07.3079505Z Unpacking git-man (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:07:07.3362017Z Selecting previously unselected package git. 2026-02-21T08:07:07.3369384Z Preparing to unpack .../30-git_1%3a2.43.0-1ubuntu7.3_amd64.deb ... 2026-02-21T08:07:07.3438192Z Unpacking git (1:2.43.0-1ubuntu7.3) ... 
2026-02-21T08:07:07.4718248Z Setting up libexpat1:amd64 (2.6.1-2ubuntu0.4) ... 2026-02-21T08:07:07.4775096Z Setting up libxau6:amd64 (1:1.0.9-1build6) ... 2026-02-21T08:07:07.4831162Z Setting up libkeyutils1:amd64 (1.6.3-3build1) ... 2026-02-21T08:07:07.4919050Z Setting up libcbor0.10:amd64 (0.10.2-1.2ubuntu2) ... 2026-02-21T08:07:07.4960124Z Setting up libbrotli1:amd64 (1.1.0-2build2) ... 2026-02-21T08:07:07.5035242Z Setting up libpsl5t64:amd64 (0.21.2-1.1build1) ... 2026-02-21T08:07:07.5083885Z Setting up libnghttp2-14:amd64 (1.59.0-1ubuntu0.2) ... 2026-02-21T08:07:07.5133391Z Setting up less (590-2ubuntu2.1) ... 2026-02-21T08:07:07.5287426Z Setting up krb5-locales (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:07.5336692Z Setting up libkrb5support0:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:07.5402987Z Setting up liberror-perl (0.17029-2) ... 2026-02-21T08:07:07.5468844Z Setting up libx11-data (2:1.8.7-1build1) ... 2026-02-21T08:07:07.5528845Z Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build7) ... 2026-02-21T08:07:07.5585531Z Setting up libk5crypto3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:07.5643665Z Setting up git-man (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:07:07.5718678Z Setting up libkrb5-3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:07.5779959Z Setting up libfido2-1:amd64 (1.14.0-1build3) ... 2026-02-21T08:07:07.5837371Z Setting up libbsd0:amd64 (0.12.1-1build1.1) ... 2026-02-21T08:07:07.5869408Z Setting up publicsuffix (20231001.0357-0.1) ... 2026-02-21T08:07:07.5932095Z Setting up libxdmcp6:amd64 (1:1.1.3-0ubuntu6) ... 2026-02-21T08:07:07.6001963Z Setting up libxcb1:amd64 (1.15-1ubuntu2) ... 2026-02-21T08:07:07.6066427Z Setting up libedit2:amd64 (3.1-20230828-1build1) ... 2026-02-21T08:07:07.6128602Z Setting up libgssapi-krb5-2:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:07:07.6188093Z Setting up libssh-4:amd64 (0.10.6-2ubuntu0.3) ... 2026-02-21T08:07:07.6240150Z Setting up libx11-6:amd64 (2:1.8.7-1build1) ... 
2026-02-21T08:07:07.6252291Z Setting up libxmuu1:amd64 (2:1.1.3-3build2) ... 2026-02-21T08:07:07.6304217Z Setting up openssh-client (1:9.6p1-3ubuntu13.14) ... 2026-02-21T08:07:07.6843040Z Setting up libcurl3t64-gnutls:amd64 (8.5.0-2ubuntu10.6) ... 2026-02-21T08:07:07.6908090Z Setting up libxext6:amd64 (2:1.3.4-1build2) ... 2026-02-21T08:07:07.6962280Z Setting up git (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:07:07.7051803Z Setting up xauth (1:1.1.2-1build1) ... 2026-02-21T08:07:07.7111393Z Processing triggers for libc-bin (2.39-0ubuntu8.5) ... 2026-02-21T08:07:07.7497851Z ##[group]Run actions/checkout@v6 2026-02-21T08:07:07.7498160Z with: 2026-02-21T08:07:07.7498420Z repository: pytorch/helion 2026-02-21T08:07:07.7498796Z token: *** 2026-02-21T08:07:07.7499034Z ssh-strict: true 2026-02-21T08:07:07.7499268Z ssh-user: git 2026-02-21T08:07:07.7499467Z persist-credentials: true 2026-02-21T08:07:07.7499725Z clean: true 2026-02-21T08:07:07.7499937Z sparse-checkout-cone-mode: true 2026-02-21T08:07:07.7500194Z fetch-depth: 1 2026-02-21T08:07:07.7500379Z fetch-tags: false 2026-02-21T08:07:07.7500618Z show-progress: true 2026-02-21T08:07:07.7500962Z lfs: false 2026-02-21T08:07:07.7501171Z submodules: false 2026-02-21T08:07:07.7501419Z set-safe-directory: true 2026-02-21T08:07:07.7501620Z env: 2026-02-21T08:07:07.7501838Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:07.7502079Z ##[endgroup] 2026-02-21T08:07:07.7535253Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:07:07.9262301Z Syncing repository: pytorch/helion 2026-02-21T08:07:07.9263326Z ##[group]Getting Git version info 2026-02-21T08:07:07.9263635Z Working directory is '/__w/helion/helion' 2026-02-21T08:07:07.9264040Z [command]/usr/bin/git version 2026-02-21T08:07:07.9264322Z git version 2.43.0 2026-02-21T08:07:07.9308354Z ##[endgroup] 2026-02-21T08:07:07.9323715Z Temporarily overriding 
HOME='/__w/_temp/078b1558-4330-43ca-bd68-49e015bad291' before making global git config changes 2026-02-21T08:07:07.9324269Z Adding repository directory to the temporary git global config as a safe directory 2026-02-21T08:07:07.9324820Z [command]/usr/bin/git config --global --add safe.directory /__w/helion/helion 2026-02-21T08:07:07.9351854Z Deleting the contents of '/__w/helion/helion' 2026-02-21T08:07:07.9352523Z ##[group]Initializing the repository 2026-02-21T08:07:07.9360786Z [command]/usr/bin/git init /__w/helion/helion 2026-02-21T08:07:07.9389901Z hint: Using 'master' as the name for the initial branch. This default branch name 2026-02-21T08:07:07.9391725Z hint: is subject to change. To configure the initial branch name to use in all 2026-02-21T08:07:07.9392229Z hint: of your new repositories, which will suppress this warning, call: 2026-02-21T08:07:07.9392586Z hint: 2026-02-21T08:07:07.9392838Z hint: git config --global init.defaultBranch 2026-02-21T08:07:07.9393147Z hint: 2026-02-21T08:07:07.9393440Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2026-02-21T08:07:07.9393826Z hint: 'development'. 
The just-created branch can be renamed via this command: 2026-02-21T08:07:07.9394209Z hint: 2026-02-21T08:07:07.9394406Z hint: git branch -m 2026-02-21T08:07:07.9396330Z Initialized empty Git repository in /__w/helion/helion/.git/ 2026-02-21T08:07:07.9400863Z [command]/usr/bin/git remote add origin https://github.com/pytorch/helion 2026-02-21T08:07:07.9424639Z ##[endgroup] 2026-02-21T08:07:07.9426292Z ##[group]Disabling automatic garbage collection 2026-02-21T08:07:07.9426675Z [command]/usr/bin/git config --local gc.auto 0 2026-02-21T08:07:07.9442689Z ##[endgroup] 2026-02-21T08:07:07.9443311Z ##[group]Setting up auth 2026-02-21T08:07:07.9443593Z Removing SSH command configuration 2026-02-21T08:07:07.9449664Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2026-02-21T08:07:07.9474259Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2026-02-21T08:07:07.9710820Z Removing HTTP extra header 2026-02-21T08:07:07.9711577Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2026-02-21T08:07:07.9739643Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2026-02-21T08:07:07.9947937Z Removing includeIf entries pointing to credentials config files 2026-02-21T08:07:07.9950424Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2026-02-21T08:07:07.9977410Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2026-02-21T08:07:08.0206154Z [command]/usr/bin/git config --file /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config http.https://github.com/.extraheader 
AUTHORIZATION: basic *** 2026-02-21T08:07:08.0245309Z [command]/usr/bin/git config --local includeIf.gitdir:/__w/helion/helion/.git.path /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T08:07:08.0270475Z [command]/usr/bin/git config --local includeIf.gitdir:/__w/helion/helion/.git/worktrees/*.path /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T08:07:08.0295231Z [command]/usr/bin/git config --local includeIf.gitdir:/github/workspace/.git.path /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T08:07:08.0319554Z [command]/usr/bin/git config --local includeIf.gitdir:/github/workspace/.git/worktrees/*.path /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T08:07:08.0340932Z ##[endgroup] 2026-02-21T08:07:08.0341366Z ##[group]Fetching the repository 2026-02-21T08:07:08.0348012Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +874a7d0cadab18218a84ad3579d329dc95c51820:refs/remotes/origin/main 2026-02-21T08:07:08.5514116Z From https://github.com/pytorch/helion 2026-02-21T08:07:08.5516406Z * [new ref] 874a7d0cadab18218a84ad3579d329dc95c51820 -> origin/main 2026-02-21T08:07:08.5535710Z [command]/usr/bin/git branch --list --remote origin/main 2026-02-21T08:07:08.5558283Z origin/main 2026-02-21T08:07:08.5562400Z [command]/usr/bin/git rev-parse refs/remotes/origin/main 2026-02-21T08:07:08.5589152Z 874a7d0cadab18218a84ad3579d329dc95c51820 2026-02-21T08:07:08.5592560Z ##[endgroup] 2026-02-21T08:07:08.5592947Z ##[group]Determining the checkout info 2026-02-21T08:07:08.5593476Z ##[endgroup] 2026-02-21T08:07:08.5594902Z [command]/usr/bin/git sparse-checkout disable 2026-02-21T08:07:08.5627558Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2026-02-21T08:07:08.5647283Z ##[group]Checking out the ref 2026-02-21T08:07:08.5653795Z [command]/usr/bin/git 
checkout --progress --force -B main refs/remotes/origin/main 2026-02-21T08:07:08.5855209Z Switched to a new branch 'main' 2026-02-21T08:07:08.5859222Z branch 'main' set up to track 'origin/main'. 2026-02-21T08:07:08.5860891Z ##[endgroup] 2026-02-21T08:07:08.5893264Z [command]/usr/bin/git log -1 --format=%H 2026-02-21T08:07:08.5908514Z 874a7d0cadab18218a84ad3579d329dc95c51820 2026-02-21T08:07:08.6061441Z ##[group]Run actions/setup-python@v6 2026-02-21T08:07:08.6061681Z with: 2026-02-21T08:07:08.6061919Z python-version: 3.12 2026-02-21T08:07:08.6062129Z check-latest: false 2026-02-21T08:07:08.6062501Z token: *** 2026-02-21T08:07:08.6062738Z update-environment: true 2026-02-21T08:07:08.6063146Z allow-prereleases: false 2026-02-21T08:07:08.6063400Z freethreaded: false 2026-02-21T08:07:08.6063597Z env: 2026-02-21T08:07:08.6063831Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:08.6064058Z ##[endgroup] 2026-02-21T08:07:08.6068000Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:07:08.8013729Z ##[group]Installed versions 2026-02-21T08:07:08.8015679Z Version 3.12 was not found in the local cache 2026-02-21T08:07:09.5267137Z Version 3.12 is available for downloading 2026-02-21T08:07:09.5267759Z Download from "https://github.com/actions/python-versions/releases/download/3.12.12-18393146713/python-3.12.12-linux-24.04-x64.tar.gz" 2026-02-21T08:07:11.9169488Z Extract downloaded archive 2026-02-21T08:07:11.9268563Z [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /__w/_temp/f1b7e0ba-3992-45db-be33-7049ba9556d8 -f /__w/_temp/c8d578fe-3cd5-4b2e-9075-47db00cb642c 2026-02-21T08:07:13.7147689Z Execute installation script 2026-02-21T08:07:13.7260503Z Check if Python hostedtoolcache folder exist... 2026-02-21T08:07:13.7264655Z Creating Python hostedtoolcache folder... 
2026-02-21T08:07:13.7268817Z Create Python 3.12.12 folder 2026-02-21T08:07:13.7278191Z Copy Python binaries to hostedtoolcache folder 2026-02-21T08:07:13.9903274Z Create additional symlinks (Required for the UsePythonVersion Azure Pipelines task and the setup-python GitHub Action) 2026-02-21T08:07:13.9940127Z Upgrading pip... 2026-02-21T08:07:15.3303444Z Looking in links: /tmp/tmpbzmkd_lz 2026-02-21T08:07:15.3308512Z Requirement already satisfied: pip in /__w/_tool/Python/3.12.12/x64/lib/python3.12/site-packages (25.0.1) 2026-02-21T08:07:15.3343945Z ##[error]WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2026-02-21T08:07:16.2209368Z ##[error]WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag. 
2026-02-21T08:07:16.3737978Z Collecting pip 2026-02-21T08:07:16.4120091Z Downloading pip-26.0.1-py3-none-any.whl.metadata (4.7 kB) 2026-02-21T08:07:16.4197194Z Downloading pip-26.0.1-py3-none-any.whl (1.8 MB) 2026-02-21T08:07:16.4482785Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 135.4 MB/s eta 0:00:00 2026-02-21T08:07:16.4570582Z Installing collected packages: pip 2026-02-21T08:07:16.4575224Z Attempting uninstall: pip 2026-02-21T08:07:16.4585236Z Found existing installation: pip 25.0.1 2026-02-21T08:07:16.4757981Z Uninstalling pip-25.0.1: 2026-02-21T08:07:16.4792017Z Successfully uninstalled pip-25.0.1 2026-02-21T08:07:17.0717664Z Successfully installed pip-26.0.1 2026-02-21T08:07:17.1206893Z Create complete file 2026-02-21T08:07:17.1238395Z Successfully set up CPython (3.12.12) 2026-02-21T08:07:17.1239126Z ##[endgroup] 2026-02-21T08:07:17.1435317Z ##[group]Run astral-sh/setup-uv@v7 2026-02-21T08:07:17.1435609Z with: 2026-02-21T08:07:17.1435809Z activate-environment: false 2026-02-21T08:07:17.1436111Z working-directory: /home/alice/_work/helion/helion 2026-02-21T08:07:17.1436542Z github-token: *** 2026-02-21T08:07:17.1436747Z enable-cache: auto 2026-02-21T08:07:17.1437230Z cache-dependency-glob: **/*requirements*.txt **/*requirements*.in **/*constraints*.txt **/*constraints*.in **/pyproject.toml **/uv.lock **/*.py.lock 2026-02-21T08:07:17.1437751Z restore-cache: true 2026-02-21T08:07:17.1437958Z save-cache: true 2026-02-21T08:07:17.1438185Z prune-cache: true 2026-02-21T08:07:17.1438378Z cache-python: false 2026-02-21T08:07:17.1440036Z ignore-nothing-to-cache: false 2026-02-21T08:07:17.1440273Z ignore-empty-workdir: false 2026-02-21T08:07:17.1440531Z add-problem-matchers: true 2026-02-21T08:07:17.1440778Z resolution-strategy: highest 2026-02-21T08:07:17.1441025Z env: 2026-02-21T08:07:17.1441322Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:17.1441570Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:17.1441898Z PKG_CONFIG_PATH: 
/__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:07:17.1442235Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:17.1442531Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:17.1442800Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:17.1443257Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:07:17.1443665Z ##[endgroup] 2026-02-21T08:07:17.1449850Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:07:17.3610192Z (node:800) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2026-02-21T08:07:17.3611166Z (Use `node --trace-deprecation ...` to show where the warning was created) 2026-02-21T08:07:17.3682055Z Trying to find version for uv in: /__w/helion/helion/uv.toml 2026-02-21T08:07:17.3682536Z Could not find file: /__w/helion/helion/uv.toml 2026-02-21T08:07:17.3684738Z Trying to find version for uv in: /__w/helion/helion/pyproject.toml 2026-02-21T08:07:17.3689646Z Could not determine uv version from uv.toml or pyproject.toml. Falling back to latest. 2026-02-21T08:07:17.3690234Z Getting latest version from GitHub API... 2026-02-21T08:07:17.6053237Z manifest-file not provided, reading from local file. 2026-02-21T08:07:17.6089633Z manifest-file does not contain version 0.10.4, arch x86_64, platform unknown-linux-gnu. Falling back to GitHub releases. 2026-02-21T08:07:17.6094127Z Downloading uv from "https://github.com/astral-sh/uv/releases/download/0.10.4/uv-x86_64-unknown-linux-gnu.tar.gz" ... 
2026-02-21T08:07:17.9554145Z [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /__w/_temp/e3527631-6675-4069-8fbd-021468df7034 -f /__w/_temp/6ff5fc90-29ae-4e86-ada8-16f00ae4dc6e 2026-02-21T08:07:18.3359085Z Added /github/home/.local/bin to the path 2026-02-21T08:07:18.3360131Z Added /__w/_tool/uv/0.10.4/x86_64 to the path 2026-02-21T08:07:18.3360517Z Set UV_PYTHON_INSTALL_DIR to /github/home/.local/share/uv/python 2026-02-21T08:07:18.3366331Z Added /github/home/.local/share/uv/python to the path 2026-02-21T08:07:18.3372604Z Successfully installed uv version 0.10.4 2026-02-21T08:07:18.4760778Z ##[group]Run uv venv --python 3.12 2026-02-21T08:07:18.4761147Z uv venv --python 3.12 2026-02-21T08:07:18.4761582Z shell: bash -l {0} 2026-02-21T08:07:18.4761852Z env: 2026-02-21T08:07:18.4762041Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:18.4762314Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.4762689Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:07:18.4762981Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.4763260Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.4763559Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.4763995Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:07:18.4764451Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:07:18.4764844Z ##[endgroup] 2026-02-21T08:07:18.7110494Z Using CPython 3.12.12 interpreter at: /__w/_tool/Python/3.12.12/x64/bin/python3.12 2026-02-21T08:07:18.7111206Z Creating virtual environment at: .venv 2026-02-21T08:07:18.7111602Z Activate with: source .venv/bin/activate 2026-02-21T08:07:18.7183012Z ##[group]Run source .venv/bin/activate 2026-02-21T08:07:18.7183349Z source .venv/bin/activate 2026-02-21T08:07:18.7183804Z uv pip install -U "torch==2.9.*" --index-url 
https://download.pytorch.org/whl/cu130 2026-02-21T08:07:18.7184302Z shell: bash -l {0} 2026-02-21T08:07:18.7184490Z env: 2026-02-21T08:07:18.7184770Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:18.7185068Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.7185380Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:07:18.7185726Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.7186017Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.7186344Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:18.7186800Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:07:18.7187346Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:07:18.7187606Z ##[endgroup] 2026-02-21T08:07:19.4319141Z Resolved 26 packages in 617ms 2026-02-21T08:07:19.4421349Z Downloading nvidia-cuda-cupti (10.2MiB) 2026-02-21T08:07:19.4459669Z Downloading nvidia-cufft (204.2MiB) 2026-02-21T08:07:19.4490360Z Downloading sympy (6.0MiB) 2026-02-21T08:07:19.4492435Z Downloading networkx (2.0MiB) 2026-02-21T08:07:19.4588062Z Downloading nvidia-cuda-runtime (2.1MiB) 2026-02-21T08:07:19.4589161Z Downloading nvidia-cusolver (184.5MiB) 2026-02-21T08:07:19.4589478Z Downloading nvidia-curand (56.8MiB) 2026-02-21T08:07:19.4589742Z Downloading nvidia-nvjitlink (38.8MiB) 2026-02-21T08:07:19.4609447Z Downloading nvidia-cudnn-cu13 (332.4MiB) 2026-02-21T08:07:19.4659149Z Downloading triton (162.6MiB) 2026-02-21T08:07:19.4659453Z Downloading torch (584.2MiB) 2026-02-21T08:07:19.4697612Z Downloading nvidia-cufile (1.2MiB) 2026-02-21T08:07:19.4804419Z Downloading nvidia-cusparse (133.8MiB) 2026-02-21T08:07:19.4866598Z Downloading nvidia-nvshmem-cu13 (57.6MiB) 2026-02-21T08:07:19.4997432Z Downloading nvidia-cusparselt-cu13 (162.0MiB) 2026-02-21T08:07:19.5153675Z Downloading nvidia-cuda-nvrtc (86.0MiB) 2026-02-21T08:07:19.5200337Z 
Downloading nvidia-nccl-cu13 (184.9MiB) 2026-02-21T08:07:19.5264581Z Downloading nvidia-cublas (400.0MiB) 2026-02-21T08:07:19.8273248Z Downloaded nvidia-cufile 2026-02-21T08:07:20.0438231Z Downloaded nvidia-cuda-runtime 2026-02-21T08:07:20.5356373Z Downloaded networkx 2026-02-21T08:07:20.5404821Z Downloaded nvidia-cuda-cupti 2026-02-21T08:07:22.2442307Z Downloaded sympy 2026-02-21T08:07:22.5388830Z Downloaded triton 2026-02-21T08:07:23.8889914Z Downloaded nvidia-nvjitlink 2026-02-21T08:07:24.5299050Z Downloaded nvidia-curand 2026-02-21T08:07:24.8169601Z Downloaded nvidia-nvshmem-cu13 2026-02-21T08:07:26.0180358Z Downloaded nvidia-cuda-nvrtc 2026-02-21T08:07:26.4964013Z Downloaded nvidia-cufft 2026-02-21T08:07:27.4437867Z Downloaded nvidia-cusparse 2026-02-21T08:07:27.7902465Z Downloaded nvidia-cusolver 2026-02-21T08:07:28.3348881Z Downloaded nvidia-cusparselt-cu13 2026-02-21T08:07:28.4460952Z Downloaded nvidia-nccl-cu13 2026-02-21T08:07:29.0231841Z Downloaded nvidia-cudnn-cu13 2026-02-21T08:07:29.8558006Z Downloaded nvidia-cublas 2026-02-21T08:07:34.4589910Z Downloaded torch 2026-02-21T08:07:34.4590701Z Prepared 26 packages in 15.02s 2026-02-21T08:07:34.4639926Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:07:34.4640491Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:07:34.4641184Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:07:35.4466647Z Installed 26 packages in 986ms 2026-02-21T08:07:35.4467599Z + filelock==3.20.0 2026-02-21T08:07:35.4467897Z + fsspec==2025.12.0 2026-02-21T08:07:35.4468137Z + jinja2==3.1.6 2026-02-21T08:07:35.4468430Z + markupsafe==3.0.2 2026-02-21T08:07:35.4469042Z + mpmath==1.3.0 2026-02-21T08:07:35.4469330Z + networkx==3.6.1 2026-02-21T08:07:35.4469574Z + nvidia-cublas==13.0.0.19 2026-02-21T08:07:35.4469895Z + nvidia-cuda-cupti==13.0.48 2026-02-21T08:07:35.4470162Z + nvidia-cuda-nvrtc==13.0.48 2026-02-21T08:07:35.4470491Z + nvidia-cuda-runtime==13.0.48 2026-02-21T08:07:35.4470817Z + nvidia-cudnn-cu13==9.13.0.50 2026-02-21T08:07:35.4471086Z + nvidia-cufft==12.0.0.15 2026-02-21T08:07:35.4471382Z + nvidia-cufile==1.15.0.42 2026-02-21T08:07:35.4471653Z + nvidia-curand==10.4.0.35 2026-02-21T08:07:35.4471948Z + nvidia-cusolver==12.0.3.29 2026-02-21T08:07:35.4472182Z + nvidia-cusparse==12.6.2.49 2026-02-21T08:07:35.4472513Z + nvidia-cusparselt-cu13==0.8.0 2026-02-21T08:07:35.4472788Z + nvidia-nccl-cu13==2.27.7 2026-02-21T08:07:35.4472982Z + nvidia-nvjitlink==13.0.39 2026-02-21T08:07:35.4473283Z + nvidia-nvshmem-cu13==3.3.24 2026-02-21T08:07:35.4473496Z + nvidia-nvtx==13.0.39 2026-02-21T08:07:35.4473778Z + setuptools==70.2.0 2026-02-21T08:07:35.4474019Z + sympy==1.14.0 2026-02-21T08:07:35.4474242Z + torch==2.9.1+cu130 2026-02-21T08:07:35.4474420Z + triton==3.5.1 2026-02-21T08:07:35.4474976Z + typing-extensions==4.15.0 2026-02-21T08:07:35.4593673Z ##[group]Run source .venv/bin/activate 2026-02-21T08:07:35.4593973Z source .venv/bin/activate 2026-02-21T08:07:35.4594343Z SETUPTOOLS_SCM_PRETEND_VERSION="0.0.0" uv pip install -e .'[dev]' 2026-02-21T08:07:35.4594832Z python -c "import helion; print(helion.__name__)" 2026-02-21T08:07:35.4595323Z shell: bash -l {0} 2026-02-21T08:07:35.4595607Z env: 2026-02-21T08:07:35.4595830Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:07:35.4596263Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:35.4596608Z 
PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:07:35.4596916Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:35.4597176Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:35.4597476Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:07:35.4597971Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:07:35.4598426Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:07:35.4598721Z ##[endgroup] 2026-02-21T08:07:39.2978591Z Resolved 30 packages in 3.62s 2026-02-21T08:07:39.2990113Z Building helion @ file:///__w/helion/helion 2026-02-21T08:07:39.3076400Z Downloading virtualenv (5.6MiB) 2026-02-21T08:07:39.3090479Z Downloading numpy (15.8MiB) 2026-02-21T08:07:39.3095349Z Downloading scipy (33.4MiB) 2026-02-21T08:07:39.3100284Z Downloading pygments (1.2MiB) 2026-02-21T08:07:39.3127903Z Downloading scikit-learn (8.5MiB) 2026-02-21T08:07:39.4412824Z Built helion @ file:///__w/helion/helion 2026-02-21T08:07:39.5341660Z Downloaded virtualenv 2026-02-21T08:07:39.5466063Z Downloaded pygments 2026-02-21T08:07:40.0806107Z Downloaded scikit-learn 2026-02-21T08:07:40.0996074Z Downloaded numpy 2026-02-21T08:07:40.4951466Z Downloaded scipy 2026-02-21T08:07:40.8065786Z Prepared 27 packages in 1.50s 2026-02-21T08:07:40.8071622Z Uninstalled 1 package in 0.53ms 2026-02-21T08:07:40.8077137Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:07:40.8079342Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:07:40.8080119Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:07:44.6571563Z Installed 29 packages in 3.84s 2026-02-21T08:07:44.6572372Z + cfgv==3.5.0 2026-02-21T08:07:44.6572680Z + distlib==0.4.0 2026-02-21T08:07:44.6572926Z + expecttest==0.3.0 2026-02-21T08:07:44.6573178Z + filecheck==1.0.3 2026-02-21T08:07:44.6573425Z - filelock==3.20.0 2026-02-21T08:07:44.6573615Z + filelock==3.24.3 2026-02-21T08:07:44.6574325Z + helion==0.0.0 (from file:///__w/helion/helion) 2026-02-21T08:07:44.6574581Z + hypothesis==6.151.9 2026-02-21T08:07:44.6575015Z + identify==2.6.16 2026-02-21T08:07:44.6575212Z + iniconfig==2.3.0 2026-02-21T08:07:44.6575460Z + joblib==1.5.3 2026-02-21T08:07:44.6575707Z + markdown-it-py==4.0.0 2026-02-21T08:07:44.6575916Z + mdurl==0.1.2 2026-02-21T08:07:44.6576146Z + nodeenv==1.10.0 2026-02-21T08:07:44.6576340Z + numpy==2.4.2 2026-02-21T08:07:44.6576554Z + packaging==26.0 2026-02-21T08:07:44.6576758Z + platformdirs==4.9.2 2026-02-21T08:07:44.6576990Z + pluggy==1.6.0 2026-02-21T08:07:44.6577162Z + pre-commit==4.5.1 2026-02-21T08:07:44.6577413Z + psutil==7.2.2 2026-02-21T08:07:44.6577599Z + pygments==2.19.2 2026-02-21T08:07:44.6577808Z + pytest==9.0.2 2026-02-21T08:07:44.6578055Z + pytest-timeout==2.4.0 2026-02-21T08:07:44.6578269Z + pyyaml==6.0.3 2026-02-21T08:07:44.6578468Z + rich==14.3.3 2026-02-21T08:07:44.6578679Z + scikit-learn==1.8.0 2026-02-21T08:07:44.6578913Z + scipy==1.17.0 2026-02-21T08:07:44.6579105Z + sortedcontainers==2.4.0 2026-02-21T08:07:44.6579360Z + threadpoolctl==3.6.0 2026-02-21T08:07:44.6579564Z + virtualenv==20.38.0 2026-02-21T08:08:03.6707801Z helion 2026-02-21T08:08:04.3339017Z ##[group]Run set -x 2026-02-21T08:08:04.3339306Z set -x 2026-02-21T08:08:04.3339554Z source .venv/bin/activate 2026-02-21T08:08:04.3339785Z uv pip install pip 2026-02-21T08:08:04.3340066Z uv pip install quack-kernels --no-deps 2026-02-21T08:08:04.3340379Z mkdir -p benchmarks/ && pushd benchmarks/ 2026-02-21T08:08:04.3340846Z git clone https://github.com/pytorch-labs/tritonbench/ 
2026-02-21T08:08:04.3341180Z pushd tritonbench/ 2026-02-21T08:08:04.3341599Z git submodule update --init --recursive 2026-02-21T08:08:04.3341919Z uv pip install -r requirements.txt 2026-02-21T08:08:04.3342183Z python install.py --liger 2026-02-21T08:08:04.3342480Z uv pip install -e . --no-deps 2026-02-21T08:08:04.3342723Z popd 2026-02-21T08:08:04.3342949Z popd 2026-02-21T08:08:04.3343296Z shell: bash -l {0} 2026-02-21T08:08:04.3343491Z env: 2026-02-21T08:08:04.3343809Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:08:04.3344087Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:08:04.3344424Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:08:04.3344749Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:08:04.3345083Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:08:04.3345350Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:08:04.3345791Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:08:04.3346327Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:08:04.3346605Z ##[endgroup] 2026-02-21T08:08:19.2337657Z + source .venv/bin/activate 2026-02-21T08:08:19.2339852Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2340138Z ++ '[' -n x ']' 2026-02-21T08:08:19.2340442Z ++ SCRIPT_PATH=.venv/bin/activate 2026-02-21T08:08:19.2340798Z ++ '[' .venv/bin/activate = /__w/_temp/02936ecd-f481-449a-8118-fc2da21cdbbe.sh ']' 2026-02-21T08:08:19.2341174Z ++ deactivate nondestructive 2026-02-21T08:08:19.2341414Z ++ unset -f pydoc 2026-02-21T08:08:19.2341649Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2341843Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2342065Z ++ hash -r 2026-02-21T08:08:19.2342290Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2343767Z ++ unset VIRTUAL_ENV 2026-02-21T08:08:19.2344234Z ++ unset VIRTUAL_ENV_PROMPT 2026-02-21T08:08:19.2344553Z ++ '[' '!' 
nondestructive = nondestructive ']' 2026-02-21T08:08:19.2345039Z ++ VIRTUAL_ENV=/__w/helion/helion/.venv 2026-02-21T08:08:19.2345371Z ++ '[' linux-gnu = cygwin ']' 2026-02-21T08:08:19.2345722Z ++ '[' linux-gnu = msys ']' 2026-02-21T08:08:19.2345972Z ++ export VIRTUAL_ENV 2026-02-21T08:08:19.2346236Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2346541Z ++ unset SCRIPT_PATH 2026-02-21T08:08:19.2347355Z ++ _OLD_VIRTUAL_PATH=/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:08:19.2348664Z ++ PATH=/__w/helion/helion/.venv/bin:/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:08:19.2349469Z ++ export PATH 2026-02-21T08:08:19.2349662Z ++ '[' xhelion '!=' x ']' 2026-02-21T08:08:19.2350047Z ++ VIRTUAL_ENV_PROMPT=helion 2026-02-21T08:08:19.2350290Z ++ export VIRTUAL_ENV_PROMPT 2026-02-21T08:08:19.2350528Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2350693Z ++ '[' -z '' ']' 2026-02-21T08:08:19.2350956Z ++ _OLD_VIRTUAL_PS1= 2026-02-21T08:08:19.2351155Z ++ PS1='(helion) ' 2026-02-21T08:08:19.2351357Z ++ export PS1 2026-02-21T08:08:19.2351599Z ++ alias pydoc 2026-02-21T08:08:19.2351777Z ++ true 2026-02-21T08:08:19.2351978Z ++ hash -r 2026-02-21T08:08:19.2352604Z + uv pip install pip 2026-02-21T08:08:21.0903733Z Resolved 1 package in 1.84s 2026-02-21T08:08:21.0977125Z Downloading pip (1.7MiB) 2026-02-21T08:08:21.2266220Z Downloaded pip 2026-02-21T08:08:21.2271711Z Prepared 1 package in 136ms 2026-02-21T08:08:21.2304644Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 
2026-02-21T08:08:21.2305441Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:08:21.2306053Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:08:21.4319668Z Installed 1 package in 205ms 2026-02-21T08:08:21.4323674Z + pip==26.0.1 2026-02-21T08:08:21.4344199Z + uv pip install quack-kernels --no-deps 2026-02-21T08:08:21.9550183Z Resolved 1 package in 512ms 2026-02-21T08:08:21.9939113Z Prepared 1 package in 39ms 2026-02-21T08:08:21.9967782Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:08:21.9968455Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:08:21.9969065Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:08:21.9978943Z Installed 1 package in 4ms 2026-02-21T08:08:21.9979241Z + quack-kernels==0.2.10 2026-02-21T08:08:22.0008030Z + mkdir -p benchmarks/ 2026-02-21T08:08:22.0019589Z + pushd benchmarks/ 2026-02-21T08:08:22.0020446Z + git clone https://github.com/pytorch-labs/tritonbench/ 2026-02-21T08:08:22.0020944Z /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:08:22.0031744Z Cloning into 'tritonbench'... 
2026-02-21T08:08:27.4577806Z + pushd tritonbench/ 2026-02-21T08:08:27.4578367Z /__w/helion/helion/benchmarks/tritonbench /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:08:27.4579145Z + git submodule update --init --recursive 2026-02-21T08:08:29.0863520Z Submodule 'submodules/ThunderKittens' (https://github.com/HazyResearch/ThunderKittens.git) registered for path 'submodules/ThunderKittens' 2026-02-21T08:08:30.1480680Z Submodule 'submodules/aiter' (https://github.com/ROCm/aiter.git) registered for path 'submodules/aiter' 2026-02-21T08:08:30.2757605Z Submodule 'submodules/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/cutlass' 2026-02-21T08:08:30.8725691Z Submodule 'submodules/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'submodules/flash-attention' 2026-02-21T08:08:31.6794276Z Submodule 'submodules/generative-recommenders' (https://github.com/facebookresearch/generative-recommenders.git) registered for path 'submodules/generative-recommenders' 2026-02-21T08:08:32.2941952Z Submodule 'submodules/xformers' (https://github.com/facebookresearch/xformers.git) registered for path 'submodules/xformers' 2026-02-21T08:08:32.2972658Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/ThunderKittens'... 2026-02-21T08:08:49.0536629Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/aiter'... 2026-02-21T08:09:05.3639678Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/cutlass'... 2026-02-21T08:09:09.0200418Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention'... 2026-02-21T08:09:10.4960908Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/generative-recommenders'... 2026-02-21T08:09:10.9284179Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers'... 
2026-02-21T08:09:12.3185837Z Submodule path 'submodules/ThunderKittens': checked out '25f7568450b412a1984a4f619fb28373df06fa1b' 2026-02-21T08:09:12.6159775Z Submodule path 'submodules/aiter': checked out '1f5b378dcc9d9b0bcd9456c8c767b7424a5e8190' 2026-02-21T08:09:12.6177929Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/aiter/3rdparty/composable_kernel' 2026-02-21T08:09:12.6202108Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/aiter/3rdparty/composable_kernel'... 2026-02-21T08:09:16.6901361Z Submodule path 'submodules/aiter/3rdparty/composable_kernel': checked out 'e31a7a4f29b371c32ea9daf9211b6ae1fed2fa40' 2026-02-21T08:09:17.1366389Z Submodule path 'submodules/cutlass': checked out 'ad7b2f5e84fcfa124cb02b91d5bd26d238c0459e' 2026-02-21T08:09:17.1997322Z Submodule path 'submodules/flash-attention': checked out '43375aab2893018dfb7950db1cfa623c14946ad6' 2026-02-21T08:09:17.2012817Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/flash-attention/csrc/composable_kernel' 2026-02-21T08:09:17.2015007Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/flash-attention/csrc/cutlass' 2026-02-21T08:09:17.2039423Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention/csrc/composable_kernel'... 2026-02-21T08:09:21.7532290Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention/csrc/cutlass'... 
2026-02-21T08:09:25.7325302Z Submodule path 'submodules/flash-attention/csrc/composable_kernel': checked out 'e8709c24f403173ad21a2da907d1347957e324fb' 2026-02-21T08:09:26.2107135Z Submodule path 'submodules/flash-attention/csrc/cutlass': checked out 'b1d6e2c9b334dfa811e4183dfbd02419249e4b52' 2026-02-21T08:09:26.2360538Z Submodule path 'submodules/generative-recommenders': checked out '88512dbd71b053226bc4ef8ec1630e3db53e55e5' 2026-02-21T08:09:26.2376605Z Submodule 'generative_recommenders/ops/cpp/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass' 2026-02-21T08:09:26.2399205Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass'... 2026-02-21T08:09:30.3221915Z Submodule path 'submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass': checked out 'dc4817921edda44a549197ff3a9dcf5df0636e7b' 2026-02-21T08:09:30.3809805Z Submodule path 'submodules/xformers': checked out '8fc8ec5a4d6498ff81c0c418b89bbaf133ae3a44' 2026-02-21T08:09:30.3826774Z Submodule 'third_party/composable_kernel_tiled' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/xformers/third_party/composable_kernel_tiled' 2026-02-21T08:09:30.3830523Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/xformers/third_party/cutlass' 2026-02-21T08:09:30.3831348Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'submodules/xformers/third_party/flash-attention' 2026-02-21T08:09:30.3857436Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/composable_kernel_tiled'... 2026-02-21T08:09:34.5062941Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/cutlass'... 
2026-02-21T08:09:39.8759960Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention'... 2026-02-21T08:09:43.1995581Z Submodule path 'submodules/xformers/third_party/composable_kernel_tiled': checked out '4f54fa30583704f34da2ac50372d524cae6bad7d' 2026-02-21T08:09:43.8344396Z Submodule path 'submodules/xformers/third_party/cutlass': checked out 'e9627ce55b42fd2599f58cd4396da9380954def0' 2026-02-21T08:09:43.8890441Z Submodule path 'submodules/xformers/third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2026-02-21T08:09:43.9434576Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/xformers/third_party/flash-attention/csrc/composable_kernel' 2026-02-21T08:09:44.0332969Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/xformers/third_party/flash-attention/csrc/cutlass' 2026-02-21T08:09:44.0356879Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention/csrc/composable_kernel'... 2026-02-21T08:09:48.5851615Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention/csrc/cutlass'... 
2026-02-21T08:09:52.9709263Z Submodule path 'submodules/xformers/third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2026-02-21T08:09:53.3999017Z Submodule path 'submodules/xformers/third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2026-02-21T08:09:53.4036059Z + uv pip install -r requirements.txt 2026-02-21T08:09:53.4103036Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:09:53.5737944Z Resolved 30 packages in 162ms 2026-02-21T08:09:53.5834567Z Downloading fonttools (4.7MiB) 2026-02-21T08:09:53.5881820Z Downloading pillow (6.7MiB) 2026-02-21T08:09:53.5882050Z Downloading matplotlib (8.3MiB) 2026-02-21T08:09:53.5882254Z Downloading tokenizers (3.0MiB) 2026-02-21T08:09:53.5882449Z Downloading transformers (10.3MiB) 2026-02-21T08:09:53.5882630Z Downloading kiwisolver (1.4MiB) 2026-02-21T08:09:53.5889791Z Downloading hf-xet (3.2MiB) 2026-02-21T08:09:53.7306745Z Downloaded kiwisolver 2026-02-21T08:09:53.7947454Z Downloaded tokenizers 2026-02-21T08:09:53.8000554Z Downloaded hf-xet 2026-02-21T08:09:53.9287249Z Downloaded pillow 2026-02-21T08:09:53.9546344Z Downloaded fonttools 2026-02-21T08:09:54.0630342Z Downloaded matplotlib 2026-02-21T08:09:54.9793271Z Downloaded transformers 2026-02-21T08:09:54.9800698Z Prepared 23 packages in 1.40s 2026-02-21T08:09:54.9832629Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:09:54.9833134Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:09:54.9833658Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:09:55.0442789Z Installed 23 packages in 64ms 2026-02-21T08:09:55.0446676Z + certifi==2026.1.4 2026-02-21T08:09:55.0451340Z + charset-normalizer==3.4.4 2026-02-21T08:09:55.0456696Z + contourpy==1.3.3 2026-02-21T08:09:55.0456935Z + cycler==0.12.1 2026-02-21T08:09:55.0457127Z + fonttools==4.61.1 2026-02-21T08:09:55.0457306Z + hf-xet==1.2.0 2026-02-21T08:09:55.0457502Z + huggingface-hub==0.36.2 2026-02-21T08:09:55.0457679Z + idna==3.11 2026-02-21T08:09:55.0457827Z + kiwisolver==1.4.9 2026-02-21T08:09:55.0458000Z + matplotlib==3.10.8 2026-02-21T08:09:55.0458177Z + nvidia-ml-py==13.590.48 2026-02-21T08:09:55.0458348Z + pillow==12.1.1 2026-02-21T08:09:55.0458486Z + pyparsing==3.3.2 2026-02-21T08:09:55.0458647Z + python-dateutil==2.9.0.post0 2026-02-21T08:09:55.0458807Z + regex==2026.2.19 2026-02-21T08:09:55.0458948Z + requests==2.32.5 2026-02-21T08:09:55.0459088Z + safetensors==0.7.0 2026-02-21T08:09:55.0459239Z + six==1.17.0 2026-02-21T08:09:55.0459370Z + tabulate==0.9.0 2026-02-21T08:09:55.0459521Z + tokenizers==0.21.4 2026-02-21T08:09:55.0459671Z + tqdm==4.67.3 2026-02-21T08:09:55.0459806Z + transformers==4.53.0 2026-02-21T08:09:55.0459961Z + urllib3==2.6.3 2026-02-21T08:09:55.0539782Z + python install.py --liger 2026-02-21T08:09:59.7664256Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:09:59.7693343Z Audited 6 packages in 3ms 2026-02-21T08:09:59.8266876Z INFO:__main__:[tritonbench] installing liger-kernels... 2026-02-21T08:09:59.8323245Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:09:59.8763885Z Resolved 1 package in 42ms 2026-02-21T08:09:59.9950414Z Prepared 1 package in 118ms 2026-02-21T08:09:59.9984315Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:09:59.9984874Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 
2026-02-21T08:09:59.9985725Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:10:00.0010305Z Installed 1 package in 6ms 2026-02-21T08:10:00.0010524Z + liger-kernel-nightly==0.7.0.dev20260219183429 2026-02-21T08:10:00.0034253Z INFO:__main__:[tritonbench] installation complete! 2026-02-21T08:10:00.3943255Z + uv pip install -e . --no-deps 2026-02-21T08:10:00.4343942Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:10:00.4381852Z Resolved 1 package in 2ms 2026-02-21T08:10:00.4393544Z Building tritonbench @ file:///__w/helion/helion/benchmarks/tritonbench 2026-02-21T08:10:01.1982856Z Built tritonbench @ file:///__w/helion/helion/benchmarks/tritonbench 2026-02-21T08:10:01.2001743Z Prepared 1 package in 761ms 2026-02-21T08:10:01.2006725Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:10:01.2007417Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:10:01.2007976Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:10:01.2008358Z Installed 1 package in 0.50ms 2026-02-21T08:10:01.2008639Z + tritonbench==0.0.1 (from file:///__w/helion/helion/benchmarks/tritonbench) 2026-02-21T08:10:01.2093501Z + popd 2026-02-21T08:10:01.2093744Z + popd 2026-02-21T08:10:01.2095798Z /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:10:01.2096056Z /__w/helion/helion 2026-02-21T08:10:01.2143650Z ##[group]Run rm -rf /tmp/torchinductor_*/ || true 2026-02-21T08:10:01.2143992Z rm -rf /tmp/torchinductor_*/ || true 2026-02-21T08:10:01.2144187Z  2026-02-21T08:10:01.2144337Z source .venv/bin/activate 2026-02-21T08:10:01.2144509Z  2026-02-21T08:10:01.2144747Z TEST_REPORTS_DIR=$(pwd)/test/test-reports 2026-02-21T08:10:01.2144987Z mkdir -p "$TEST_REPORTS_DIR" 2026-02-21T08:10:01.2145186Z echo "$TEST_REPORTS_DIR" 2026-02-21T08:10:01.2145355Z  2026-02-21T08:10:01.2145502Z KERNEL_LIST="gemm" 2026-02-21T08:10:01.2145690Z for kernel in ${KERNEL_LIST//,/ }; do 2026-02-21T08:10:01.2145930Z  echo "==========================================" 2026-02-21T08:10:01.2146202Z  echo "Running benchmark for kernel: $kernel" 2026-02-21T08:10:01.2146430Z  echo "==========================================" 2026-02-21T08:10:01.2146636Z  2026-02-21T08:10:01.2146871Z  # Get available implementations and baseline for this kernel 2026-02-21T08:10:01.2147272Z  KERNEL_INFO=$(python benchmarks/run.py --list-impls-for-benchmark-ci --op $kernel | grep "^$kernel:") 2026-02-21T08:10:01.2147675Z  IMPLS=$(echo "$KERNEL_INFO" | sed -n 's/.*impls=\([^ ]*\).*/\1/p') 2026-02-21T08:10:01.2147988Z  BASELINE=$(echo "$KERNEL_INFO" | sed -n 's/.*baseline=\([^ ]*\).*/\1/p') 2026-02-21T08:10:01.2148224Z  2026-02-21T08:10:01.2148362Z  if [[ -z "$IMPLS" ]]; then 2026-02-21T08:10:01.2148620Z  echo "Warning: No implementations found for kernel $kernel, skipping..." 
2026-02-21T08:10:01.2148891Z  continue 2026-02-21T08:10:01.2149033Z  fi 2026-02-21T08:10:01.2149183Z  if [[ -z "$BASELINE" ]]; then 2026-02-21T08:10:01.2149435Z  echo "Warning: No baseline found for kernel $kernel, skipping..." 2026-02-21T08:10:01.2149691Z  continue 2026-02-21T08:10:01.2149846Z  fi 2026-02-21T08:10:01.2149992Z  echo "Using baseline: $BASELINE" 2026-02-21T08:10:01.2150225Z  echo "Available implementations for $kernel: $IMPLS" 2026-02-21T08:10:01.2150436Z  2026-02-21T08:10:01.2150598Z  # Do autotuning but do not record the results 2026-02-21T08:10:01.2150804Z  python benchmarks/run.py \ 2026-02-21T08:10:01.2150988Z  --op $kernel \ 2026-02-21T08:10:01.2151168Z  --metrics speedup,accuracy \ 2026-02-21T08:10:01.2151383Z  --latency-measure-mode triton_do_bench \ 2026-02-21T08:10:01.2151597Z  --cudagraph \ 2026-02-21T08:10:01.2151755Z  --only $IMPLS \ 2026-02-21T08:10:01.2151959Z  --only-match-mode prefix-with-baseline \ 2026-02-21T08:10:01.2152165Z  --baseline $BASELINE \ 2026-02-21T08:10:01.2152344Z  --atol 1e-2 \ 2026-02-21T08:10:01.2152505Z  --rtol 1e-2 \ 2026-02-21T08:10:01.2152690Z  --input-sample-mode equally-spaced-k \ 2026-02-21T08:10:01.2153033Z  --keep-going \ 2026-02-21T08:10:01.2153187Z   2026-02-21T08:10:01.2153325Z  2026-02-21T08:10:01.2153452Z  # Relax the GPU 2026-02-21T08:10:01.2153611Z  sleep 2m 2026-02-21T08:10:01.2153743Z  2026-02-21T08:10:01.2153898Z  # Run again with cache and record results 2026-02-21T08:10:01.2154196Z  HELION_PRINT_OUTPUT_CODE=1 HELION_ASSERT_CACHE_HIT=1 python benchmarks/run.py \ 2026-02-21T08:10:01.2154475Z  --op $kernel \ 2026-02-21T08:10:01.2154652Z  --metrics speedup,accuracy \ 2026-02-21T08:10:01.2154912Z  --latency-measure-mode triton_do_bench \ 2026-02-21T08:10:01.2155131Z  --cudagraph \ 2026-02-21T08:10:01.2155296Z  --only $IMPLS \ 2026-02-21T08:10:01.2155637Z  --only-match-mode prefix-with-baseline \ 2026-02-21T08:10:01.2155840Z  --baseline $BASELINE \ 2026-02-21T08:10:01.2156017Z  --atol 1e-2 \ 
2026-02-21T08:10:01.2156175Z  --rtol 1e-2 \ 2026-02-21T08:10:01.2156355Z  --input-sample-mode equally-spaced-k \ 2026-02-21T08:10:01.2156595Z  --output "$TEST_REPORTS_DIR/helionbench.json" \ 2026-02-21T08:10:01.2156810Z  --append-to-output \ 2026-02-21T08:10:01.2156990Z  --keep-going \ 2026-02-21T08:10:01.2157160Z   2026-02-21T08:10:01.2157297Z  2026-02-21T08:10:01.2157474Z  echo "✅ Completed benchmark for kernel: $kernel" 2026-02-21T08:10:01.2157675Z done 2026-02-21T08:10:01.2157807Z  2026-02-21T08:10:01.2157977Z if [[ ! -s "$TEST_REPORTS_DIR/helionbench.json" ]]; then 2026-02-21T08:10:01.2158234Z  echo "❌ helionbench.json is missing or empty" 2026-02-21T08:10:01.2158429Z  exit 1 2026-02-21T08:10:01.2158572Z fi 2026-02-21T08:10:01.2158730Z cat "$TEST_REPORTS_DIR/helionbench.json" 2026-02-21T08:10:01.2159063Z shell: bash -l {0} 2026-02-21T08:10:01.2159211Z env: 2026-02-21T08:10:01.2159355Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:10:01.2159562Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:10:01.2159803Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:10:01.2160052Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:10:01.2160266Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:10:01.2160496Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:10:01.2160878Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T08:10:01.2161265Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:10:01.2161482Z ##[endgroup] 2026-02-21T08:10:01.2736718Z /__w/helion/helion/test/test-reports 2026-02-21T08:10:01.2741330Z ========================================== 2026-02-21T08:10:01.2742646Z Running benchmark for kernel: gemm 2026-02-21T08:10:01.2742863Z ========================================== 2026-02-21T08:10:05.6131023Z Using baseline: aten_matmul 2026-02-21T08:10:05.6131438Z Available implementations for 
gemm: helion_matmul_tritonbench,pt2_triton_matmul,triton_tutorial_matmul 2026-02-21T08:10:10.4816348Z Applying custom args for gemm: {'num_inputs': 8, 'non_square': '', 'rep': '3000'} 2026-02-21T08:10:10.8734645Z Running gemm benchmark with Helion implementation... 2026-02-21T08:10:10.8739095Z 2026-02-21T08:10:10.9349322Z Equally-spaced-k mode: Selected 8 equally spaced inputs (total available: 12) 2026-02-21T08:10:10.9351437Z WARNING:tritonbench.utils.triton_op:Input IDs to run: [0, 2, 3, 5, 6, 8, 9, 11] 2026-02-21T08:10:10.9354977Z 2026-02-21T08:10:10.9363947Z 0%| | 0/8 [00:00; 2026-02-21T08:12:18.8850207Z .reg .b16 %rs<3>; 2026-02-21T08:12:18.8850364Z .reg .b32 %r<404>; 2026-02-21T08:12:18.8850529Z .reg .b64 %rd<141>; 2026-02-21T08:12:18.8850827Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19:0 2026-02-21T08:12:18.8851160Z $L__func_begin0: 2026-02-21T08:12:18.8851439Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19:0 2026-02-21T08:12:18.8851722Z 2026-02-21T08:12:18.8851784Z // %bb.0: 2026-02-21T08:12:18.8851960Z ld.param.b64 %rd8, [_helion_matmul_param_3]; 2026-02-21T08:12:18.8852179Z $L__tmp0: 2026-02-21T08:12:18.8852447Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19 2026-02-21T08:12:18.8852767Z mov.u32 %r1, %tid.x; 2026-02-21T08:12:18.8852938Z shr.u32 %r2, %r1, 5; 2026-02-21T08:12:18.8853116Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:12:18.8853336Z setp.lt.u32 %p1, %r3, 8; 2026-02-21T08:12:18.8853508Z @%p1 bra $L__BB0_12; 2026-02-21T08:12:18.8853674Z bra.uni $L__BB0_1; 2026-02-21T08:12:18.8853825Z $L__BB0_12: 2026-02-21T08:12:18.8854097Z .loc 1 0 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:0:0 2026-02-21T08:12:18.8854459Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:12:18.8854844Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19 2026-02-21T08:12:18.8855209Z setmaxnreg.inc.sync.aligned.u32 32; 
2026-02-21T08:12:18.8855429Z setp.lt.u32 %p27, %r1, 32; 2026-02-21T08:12:18.8855677Z mov.b32 %r148, global_smem; 2026-02-21T08:12:18.8855856Z // begin inline asm 2026-02-21T08:12:18.8856150Z @%p27 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r148], 64; 2026-02-21T08:12:18.8856440Z // end inline asm 2026-02-21T08:12:18.8856593Z bar.sync 0, 256; 2026-02-21T08:12:18.8856769Z ld.shared.b32 %r395, [global_smem]; 2026-02-21T08:12:18.8856962Z bar.sync 0, 256; 2026-02-21T08:12:18.8857123Z // begin inline asm 2026-02-21T08:12:18.8857358Z @%p27 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:12:18.8857624Z // end inline asm 2026-02-21T08:12:18.8857914Z .loc 1 21 67 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:21:67 2026-02-21T08:12:18.8858266Z mov.u32 %r157, %ctaid.x; 2026-02-21T08:12:18.8858444Z mov.u32 %r158, %ctaid.y; 2026-02-21T08:12:18.8858612Z mov.u32 %r159, %ctaid.z; 2026-02-21T08:12:18.8858794Z mov.u32 %r160, %nctaid.x; 2026-02-21T08:12:18.8859060Z mov.u32 %r161, %nctaid.y; 2026-02-21T08:12:18.8859256Z mad.lo.s32 %r162, %r159, %r161, %r158; 2026-02-21T08:12:18.8859466Z mad.lo.s32 %r163, %r162, %r160, %r157; 2026-02-21T08:12:18.8859671Z shl.b32 %r164, %r163, 7; 2026-02-21T08:12:18.8859844Z cvt.s64.s32 %rd71, %r164; 2026-02-21T08:12:18.8860036Z add.s64 %rd68, %rd8, %rd71; 2026-02-21T08:12:18.8860221Z shl.b32 %r165, %r1, 2; 2026-02-21T08:12:18.8860413Z add.s32 %r149, %r148, %r165; 2026-02-21T08:12:18.8860608Z mov.b32 %r150, 0; 2026-02-21T08:12:18.8860774Z // begin inline asm 2026-02-21T08:12:18.8860972Z @%p27 st.shared.b32 [ %r149 + 0 ], %r150; 2026-02-21T08:12:18.8861183Z // end inline asm 2026-02-21T08:12:18.8861354Z bar.warp.sync -1; 2026-02-21T08:12:18.8861522Z setp.eq.b32 %p30, %r1, 0; 2026-02-21T08:12:18.8861710Z cvt.u64.u32 %rd53, %r148; 2026-02-21T08:12:18.8861886Z // begin inline asm 2026-02-21T08:12:18.8862253Z @%p30 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd53 + 0 ], %rd5; 
2026-02-21T08:12:18.8862588Z // end inline asm 2026-02-21T08:12:18.8862738Z // begin inline asm 2026-02-21T08:12:18.8863008Z @%p30 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x1; 2026-02-21T08:12:18.8863293Z // end inline asm 2026-02-21T08:12:18.8863451Z mov.b32 %r151, 16; 2026-02-21T08:12:18.8863604Z // begin inline asm 2026-02-21T08:12:18.8863883Z @%p30 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x0, %r151; 2026-02-21T08:12:18.8864188Z // end inline asm 2026-02-21T08:12:18.8864350Z mov.b32 %r152, 256; 2026-02-21T08:12:18.8864517Z // begin inline asm 2026-02-21T08:12:18.8864820Z @%p30 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x1, %r152; 2026-02-21T08:12:18.8865133Z // end inline asm 2026-02-21T08:12:18.8865282Z mov.b32 %r153, 1024; 2026-02-21T08:12:18.8865445Z // begin inline asm 2026-02-21T08:12:18.8865728Z @%p30 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x0, %r153; 2026-02-21T08:12:18.8866053Z // end inline asm 2026-02-21T08:12:18.8866200Z mov.b32 %r154, 4096; 2026-02-21T08:12:18.8866362Z // begin inline asm 2026-02-21T08:12:18.8866638Z @%p30 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x1, %r154; 2026-02-21T08:12:18.8866968Z // end inline asm 2026-02-21T08:12:18.8867124Z mov.b64 %rd61, 2048; 2026-02-21T08:12:18.8867281Z // begin inline asm 2026-02-21T08:12:18.8867570Z @%p30 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd53 + 0 ], 0x0, %rd61; 2026-02-21T08:12:18.8867884Z // end inline asm 2026-02-21T08:12:18.8868038Z mov.b32 %r155, 1; 2026-02-21T08:12:18.8868195Z // begin inline asm 2026-02-21T08:12:18.8868483Z @%p30 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x0, %r155; 2026-02-21T08:12:18.8868815Z // end inline asm 2026-02-21T08:12:18.8868965Z // begin inline asm 2026-02-21T08:12:18.8869259Z @%p30 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x1, %r155; 
2026-02-21T08:12:18.8869580Z // end inline asm 2026-02-21T08:12:18.8869740Z // begin inline asm 2026-02-21T08:12:18.8869999Z @%p30 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x6; 2026-02-21T08:12:18.8870305Z // end inline asm 2026-02-21T08:12:18.8870462Z // begin inline asm 2026-02-21T08:12:18.8870740Z @%p30 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x0; 2026-02-21T08:12:18.8871063Z // end inline asm 2026-02-21T08:12:18.8871212Z // begin inline asm 2026-02-21T08:12:18.8871483Z @%p30 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x1; 2026-02-21T08:12:18.8871796Z // end inline asm 2026-02-21T08:12:18.8871958Z // begin inline asm 2026-02-21T08:12:18.8872245Z @%p30 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd53 + 0 ], 0x0; 2026-02-21T08:12:18.8872543Z // end inline asm 2026-02-21T08:12:18.8872691Z // begin inline asm 2026-02-21T08:12:18.8873098Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd68 + 0 ], [ %rd53 + 0 ], 0x80; 2026-02-21T08:12:18.8873607Z // end inline asm 2026-02-21T08:12:18.8873793Z // begin inline asm 2026-02-21T08:12:18.8874038Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd68 + 0 ], 0x80; 2026-02-21T08:12:18.8874333Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:12:18.8874562Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:12:18.8874806Z // end inline asm 2026-02-21T08:12:18.8874963Z bar.sync 0, 256; 2026-02-21T08:12:18.8875249Z .loc 1 28 35 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:28:35 2026-02-21T08:12:18.8875599Z shl.b32 %r403, %r157, 2; 2026-02-21T08:12:18.8875905Z .loc 1 29 37 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:29:37 2026-02-21T08:12:18.8876250Z add.s32 %r166, %r403, 4; 2026-02-21T08:12:18.8876636Z .loc 1 29 49 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:29:49 2026-02-21T08:12:18.8876978Z min.s32 %r25, %r166, 512; 
2026-02-21T08:12:18.8877287Z .loc 1 30 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:30:52 2026-02-21T08:12:18.8877622Z setp.ge.s32 %p47, %r403, %r25; 2026-02-21T08:12:18.8877819Z @%p47 bra $L__BB0_15; 2026-02-21T08:12:18.8878005Z // %bb.13: // %.lr.ph 2026-02-21T08:12:18.8878354Z .loc 1 0 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:0:52 2026-02-21T08:12:18.8878718Z ld.param.b64 %rd7, [_helion_matmul_param_2]; 2026-02-21T08:12:18.8878934Z shl.b32 %r167, %r1, 3; 2026-02-21T08:12:18.8879112Z and.b32 %r26, %r167, 24; 2026-02-21T08:12:18.8879279Z shr.u32 %r168, %r1, 2; 2026-02-21T08:12:18.8879453Z bfe.u32 %r27, %r1, 2, 6; 2026-02-21T08:12:18.8879623Z or.b32 %r28, %r27, 64; 2026-02-21T08:12:18.8879796Z or.b32 %r29, %r27, 128; 2026-02-21T08:12:18.8879962Z or.b32 %r30, %r168, 192; 2026-02-21T08:12:18.8880133Z shl.b32 %r169, %r1, 4; 2026-02-21T08:12:18.8880303Z and.b32 %r170, %r169, 4080; 2026-02-21T08:12:18.8880489Z add.s32 %r31, %r148, %r170; 2026-02-21T08:12:18.8880667Z shl.b32 %r172, %r1, 7; 2026-02-21T08:12:18.8880830Z and.b32 %r173, %r172, 3072; 2026-02-21T08:12:18.8881012Z and.b32 %r174, %r169, 112; 2026-02-21T08:12:18.8881185Z and.b32 %r176, %r165, 896; 2026-02-21T08:12:18.8881366Z add.s32 %r177, %r148, %r173; 2026-02-21T08:12:18.8881546Z add.s32 %r178, %r177, %r174; 2026-02-21T08:12:18.8881728Z add.s32 %r280, %r178, %r176; 2026-02-21T08:12:18.8881957Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T08:12:18.8882337Z .loc 1 36 35 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:36:35 2026-02-21T08:12:18.8882668Z shr.s32 %r312, %r403, 31; 2026-02-21T08:12:18.8882841Z shr.u32 %r313, %r312, 22; 2026-02-21T08:12:18.8883021Z add.s32 %r314, %r403, %r313; 2026-02-21T08:12:18.8883193Z shr.s32 %r315, %r314, 10; 2026-02-21T08:12:18.8883496Z .loc 1 37 33 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:37:33 2026-02-21T08:12:18.8883824Z shl.b32 %r316, %r315, 6; 2026-02-21T08:12:18.8884128Z .loc 1 38 39 // 
cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:38:39 2026-02-21T08:12:18.8884471Z sub.s32 %r317, 32, %r316; 2026-02-21T08:12:18.8884802Z .loc 1 38 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:38:52 2026-02-21T08:12:18.8885144Z min.s32 %r318, %r317, 64; 2026-02-21T08:12:18.8885435Z .loc 1 40 51 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:40:51 2026-02-21T08:12:18.8885774Z and.b32 %r319, %r314, -1024; 2026-02-21T08:12:18.8885949Z sub.s32 %r320, %r403, %r319; 2026-02-21T08:12:18.8886128Z div.s32 %r321, %r320, %r318; 2026-02-21T08:12:18.8886439Z .loc 1 41 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:41:27 2026-02-21T08:12:18.8886775Z mul.lo.s32 %r322, %r321, %r318; 2026-02-21T08:12:18.8887084Z mad.lo.s32 %r323, %r315, 960, %r322; 2026-02-21T08:12:18.8887276Z sub.s32 %r324, %r403, %r323; 2026-02-21T08:12:18.8887457Z shl.b32 %r325, %r324, 5; 2026-02-21T08:12:18.8887754Z .loc 1 42 32 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:42:32 2026-02-21T08:12:18.8888097Z or.b32 %r326, %r325, %r26; 2026-02-21T08:12:18.8888395Z .loc 1 43 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:43:27 2026-02-21T08:12:18.8888728Z shl.b32 %r327, %r321, 8; 2026-02-21T08:12:18.8889030Z .loc 1 44 32 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:44:32 2026-02-21T08:12:18.8889355Z or.b32 %r328, %r327, %r27; 2026-02-21T08:12:18.8889541Z or.b32 %r329, %r327, %r28; 2026-02-21T08:12:18.8889713Z or.b32 %r330, %r327, %r29; 2026-02-21T08:12:18.8889892Z or.b32 %r331, %r327, %r30; 2026-02-21T08:12:18.8890248Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8890601Z shfl.sync.idx.b32 %r332, %r2, 0, 31, -1; 2026-02-21T08:12:18.8890816Z shl.b32 %r333, %r332, 21; 2026-02-21T08:12:18.8890988Z and.b32 %r334, %r333, 6291456; 2026-02-21T08:12:18.8891176Z add.s32 %r335, %r334, %r395; 2026-02-21T08:12:18.8891347Z shl.b32 %r336, %r332, 3; 2026-02-21T08:12:18.8891523Z 
and.b32 %r337, %r336, 32; 2026-02-21T08:12:18.8891688Z add.s32 %r179, %r335, %r337; 2026-02-21T08:12:18.8891867Z mov.pred %p48, -1; 2026-02-21T08:12:18.8892026Z mov.b32 %r180, 0; 2026-02-21T08:12:18.8892187Z // begin inline asm 2026-02-21T08:12:18.8892613Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r179 + 0], {%r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180}; 2026-02-21T08:12:18.8893074Z // end inline asm 2026-02-21T08:12:18.8893236Z // begin inline asm 2026-02-21T08:12:18.8893661Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r179 + 16], {%r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180}; 2026-02-21T08:12:18.8894111Z // end inline asm 2026-02-21T08:12:18.8894261Z // begin inline asm 2026-02-21T08:12:18.8894440Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:12:18.8894625Z // end inline asm 2026-02-21T08:12:18.8894817Z bar.sync 0, 256; 2026-02-21T08:12:18.8895116Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.8895459Z add.s32 %r213, %r148, 45056; 2026-02-21T08:12:18.8895640Z // begin inline asm 2026-02-21T08:12:18.8895830Z @%p30 mbarrier.init.shared::cta.b64 [%r213], 1; 2026-02-21T08:12:18.8896050Z // end inline asm 2026-02-21T08:12:18.8896199Z bar.sync 0, 256; 2026-02-21T08:12:18.8896360Z add.s32 %r214, %r148, 45064; 2026-02-21T08:12:18.8896535Z // begin inline asm 2026-02-21T08:12:18.8896732Z @%p30 mbarrier.init.shared::cta.b64 [%r214], 1; 2026-02-21T08:12:18.8896968Z // end inline asm 2026-02-21T08:12:18.8897122Z bar.sync 0, 256; 2026-02-21T08:12:18.8897285Z add.s32 %r215, %r148, 45072; 2026-02-21T08:12:18.8897457Z // begin inline asm 2026-02-21T08:12:18.8897645Z @%p30 mbarrier.init.shared::cta.b64 [%r215], 1; 2026-02-21T08:12:18.8897855Z // end inline asm 2026-02-21T08:12:18.8898012Z bar.sync 0, 256; 2026-02-21T08:12:18.8898166Z add.s32 %r216, %r148, 45080; 
2026-02-21T08:12:18.8898380Z // begin inline asm 2026-02-21T08:12:18.8898601Z @%p30 mbarrier.init.shared::cta.b64 [%r216], 1; 2026-02-21T08:12:18.8898847Z // end inline asm 2026-02-21T08:12:18.8899068Z bar.sync 0, 256; 2026-02-21T08:12:18.8899248Z add.s32 %r217, %r148, 45088; 2026-02-21T08:12:18.8899462Z // begin inline asm 2026-02-21T08:12:18.8899678Z @%p30 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T08:12:18.8899926Z // end inline asm 2026-02-21T08:12:18.8900076Z add.s32 %r218, %r148, 45104; 2026-02-21T08:12:18.8900255Z // begin inline asm 2026-02-21T08:12:18.8900439Z @%p30 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T08:12:18.8900738Z // end inline asm 2026-02-21T08:12:18.8900894Z bar.sync 0, 256; 2026-02-21T08:12:18.8901046Z add.s32 %r219, %r148, 45112; 2026-02-21T08:12:18.8901224Z // begin inline asm 2026-02-21T08:12:18.8901434Z @%p30 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T08:12:18.8901648Z // end inline asm 2026-02-21T08:12:18.8901796Z bar.sync 0, 256; 2026-02-21T08:12:18.8901955Z add.s32 %r220, %r148, 45120; 2026-02-21T08:12:18.8902123Z // begin inline asm 2026-02-21T08:12:18.8902310Z @%p30 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T08:12:18.8902516Z // end inline asm 2026-02-21T08:12:18.8902671Z bar.sync 0, 256; 2026-02-21T08:12:18.8902830Z add.s32 %r221, %r148, 45128; 2026-02-21T08:12:18.8903001Z // begin inline asm 2026-02-21T08:12:18.8903193Z @%p30 mbarrier.init.shared::cta.b64 [%r221], 1; 2026-02-21T08:12:18.8903400Z // end inline asm 2026-02-21T08:12:18.8903556Z bar.sync 0, 256; 2026-02-21T08:12:18.8903821Z add.s32 %r222, %r148, 45136; 2026-02-21T08:12:18.8904004Z // begin inline asm 2026-02-21T08:12:18.8904189Z @%p30 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T08:12:18.8904404Z // end inline asm 2026-02-21T08:12:18.8904727Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.8905061Z bar.sync 0, 256; 2026-02-21T08:12:18.8905220Z // begin inline asm 
2026-02-21T08:12:18.8905487Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r213]; 2026-02-21T08:12:18.8905716Z // end inline asm 2026-02-21T08:12:18.8905865Z bar.sync 0, 256; 2026-02-21T08:12:18.8906024Z // begin inline asm 2026-02-21T08:12:18.8906215Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r214]; 2026-02-21T08:12:18.8906445Z // end inline asm 2026-02-21T08:12:18.8906611Z bar.sync 0, 256; 2026-02-21T08:12:18.8906770Z // begin inline asm 2026-02-21T08:12:18.8906972Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r215]; 2026-02-21T08:12:18.8907193Z // end inline asm 2026-02-21T08:12:18.8907363Z bar.sync 0, 256; 2026-02-21T08:12:18.8907528Z // begin inline asm 2026-02-21T08:12:18.8907730Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r216]; 2026-02-21T08:12:18.8907940Z // end inline asm 2026-02-21T08:12:18.8908095Z bar.sync 0, 256; 2026-02-21T08:12:18.8908245Z // begin inline asm 2026-02-21T08:12:18.8908438Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r217]; 2026-02-21T08:12:18.8908656Z // end inline asm 2026-02-21T08:12:18.8908956Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.8909306Z bar.sync 0, 256; 2026-02-21T08:12:18.8909460Z add.s32 %r228, %r148, 45152; 2026-02-21T08:12:18.8909643Z // begin inline asm 2026-02-21T08:12:18.8909827Z @%p30 mbarrier.init.shared::cta.b64 [%r228], 1; 2026-02-21T08:12:18.8910044Z // end inline asm 2026-02-21T08:12:18.8910218Z st.shared.b32 [global_smem+45160], 33554689; 2026-02-21T08:12:18.8910457Z st.shared.b32 [global_smem+40960], %r395; 2026-02-21T08:12:18.8910719Z st.shared.v2.b32 [global_smem+40968], {%r327, %r325}; 2026-02-21T08:12:18.8910950Z barrier.sync 1; 2026-02-21T08:12:18.8911114Z barrier.sync 1; 2026-02-21T08:12:18.8911266Z barrier.sync 1; 2026-02-21T08:12:18.8911565Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8911889Z bar.sync 0, 256; 2026-02-21T08:12:18.8912046Z // begin inline asm 2026-02-21T08:12:18.8912195Z 
2026-02-21T08:12:18.8912329Z { 2026-02-21T08:12:18.8912477Z .reg .pred complete; 2026-02-21T08:12:18.8912643Z waitLoop: 2026-02-21T08:12:18.8912864Z mbarrier.try_wait.parity.shared.b64 complete, [%r228], %r180; 2026-02-21T08:12:18.8913132Z @!complete bra.uni waitLoop; 2026-02-21T08:12:18.8913314Z } 2026-02-21T08:12:18.8913386Z 2026-02-21T08:12:18.8913446Z // end inline asm 2026-02-21T08:12:18.8913750Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.8914089Z bar.sync 0, 256; 2026-02-21T08:12:18.8914250Z // begin inline asm 2026-02-21T08:12:18.8914528Z @%p30 mbarrier.inval.shared::cta.b64 [%r228]; 2026-02-21T08:12:18.8914784Z // end inline asm 2026-02-21T08:12:18.8914943Z // begin inline asm 2026-02-21T08:12:18.8915123Z @%p30 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T08:12:18.8915340Z // end inline asm 2026-02-21T08:12:18.8915490Z bar.sync 0, 256; 2026-02-21T08:12:18.8915649Z // begin inline asm 2026-02-21T08:12:18.8915828Z @%p30 mbarrier.inval.shared::cta.b64 [%r219]; 2026-02-21T08:12:18.8916044Z // end inline asm 2026-02-21T08:12:18.8916190Z bar.sync 0, 256; 2026-02-21T08:12:18.8916347Z // begin inline asm 2026-02-21T08:12:18.8916548Z @%p30 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T08:12:18.8916754Z // end inline asm 2026-02-21T08:12:18.8916920Z bar.sync 0, 256; 2026-02-21T08:12:18.8917073Z // begin inline asm 2026-02-21T08:12:18.8917265Z @%p30 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T08:12:18.8917467Z // end inline asm 2026-02-21T08:12:18.8917687Z bar.sync 0, 256; 2026-02-21T08:12:18.8917842Z // begin inline asm 2026-02-21T08:12:18.8918025Z @%p30 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T08:12:18.8918234Z // end inline asm 2026-02-21T08:12:18.8918382Z // begin inline asm 2026-02-21T08:12:18.8918566Z @%p30 mbarrier.inval.shared::cta.b64 [%r213]; 2026-02-21T08:12:18.8918770Z // end inline asm 2026-02-21T08:12:18.8918925Z bar.sync 0, 256; 2026-02-21T08:12:18.8919071Z // begin 
inline asm 2026-02-21T08:12:18.8919255Z @%p30 mbarrier.inval.shared::cta.b64 [%r214]; 2026-02-21T08:12:18.8919455Z // end inline asm 2026-02-21T08:12:18.8919613Z bar.sync 0, 256; 2026-02-21T08:12:18.8919760Z // begin inline asm 2026-02-21T08:12:18.8919948Z @%p30 mbarrier.inval.shared::cta.b64 [%r215]; 2026-02-21T08:12:18.8920157Z // end inline asm 2026-02-21T08:12:18.8920302Z bar.sync 0, 256; 2026-02-21T08:12:18.8920456Z // begin inline asm 2026-02-21T08:12:18.8920633Z @%p30 mbarrier.inval.shared::cta.b64 [%r216]; 2026-02-21T08:12:18.8920845Z // end inline asm 2026-02-21T08:12:18.8920992Z bar.sync 0, 256; 2026-02-21T08:12:18.8921146Z // begin inline asm 2026-02-21T08:12:18.8921321Z @%p30 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T08:12:18.8921529Z // end inline asm 2026-02-21T08:12:18.8921810Z .loc 1 59 45 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:59:45 2026-02-21T08:12:18.8922142Z shl.b32 %r339, %r328, 10; 2026-02-21T08:12:18.8922322Z shl.b32 %r340, %r329, 10; 2026-02-21T08:12:18.8922491Z shl.b32 %r341, %r330, 10; 2026-02-21T08:12:18.8922666Z shl.b32 %r342, %r331, 10; 2026-02-21T08:12:18.8922954Z .loc 1 59 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:59:52 2026-02-21T08:12:18.8923285Z add.s32 %r343, %r339, %r326; 2026-02-21T08:12:18.8923458Z add.s32 %r344, %r340, %r326; 2026-02-21T08:12:18.8923636Z add.s32 %r345, %r341, %r326; 2026-02-21T08:12:18.8923813Z add.s32 %r346, %r342, %r326; 2026-02-21T08:12:18.8924109Z .loc 1 59 24 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:59:24 2026-02-21T08:12:18.8924460Z mad.wide.s32 %rd72, %r343, 2, %rd7; 2026-02-21T08:12:18.8924663Z mad.wide.s32 %rd73, %r344, 2, %rd7; 2026-02-21T08:12:18.8924896Z mad.wide.s32 %rd74, %r345, 2, %rd7; 2026-02-21T08:12:18.8925086Z mad.wide.s32 %rd75, %r346, 2, %rd7; 2026-02-21T08:12:18.8925411Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8925737Z // begin inline asm 
2026-02-21T08:12:18.8926146Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254, %r255, %r256, %r257}, [%r179 + 0]; 2026-02-21T08:12:18.8926600Z // end inline asm 2026-02-21T08:12:18.8926752Z // begin inline asm 2026-02-21T08:12:18.8927152Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274}, [%r179 + 16]; 2026-02-21T08:12:18.8927587Z // end inline asm 2026-02-21T08:12:18.8927822Z // begin inline asm 2026-02-21T08:12:18.8928003Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:12:18.8928182Z // end inline asm 2026-02-21T08:12:18.8928342Z cvt.u64.u32 %rd76, %r242; 2026-02-21T08:12:18.8928514Z cvt.u64.u32 %rd77, %r243; 2026-02-21T08:12:18.8928694Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:12:18.8928863Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:12:18.8929175Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8929524Z mov.b64 {%r347, %r348}, %rd79; 2026-02-21T08:12:18.8929726Z cvt.rn.f16x2.f32 %r349, %r348, %r347; 2026-02-21T08:12:18.8930061Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8930385Z cvt.u64.u32 %rd80, %r244; 2026-02-21T08:12:18.8930563Z cvt.u64.u32 %rd81, %r245; 2026-02-21T08:12:18.8930733Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:12:18.8930998Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:12:18.8931303Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8931635Z mov.b64 {%r350, %r351}, %rd83; 2026-02-21T08:12:18.8931824Z cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T08:12:18.8932149Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8932478Z cvt.u64.u32 %rd84, %r246; 2026-02-21T08:12:18.8932648Z cvt.u64.u32 %rd85, %r247; 2026-02-21T08:12:18.8932823Z shl.b64 %rd86, %rd85, 32; 
2026-02-21T08:12:18.8932992Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:12:18.8933297Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8933619Z mov.b64 {%r353, %r354}, %rd87; 2026-02-21T08:12:18.8933815Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T08:12:18.8934141Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8934470Z cvt.u64.u32 %rd88, %r248; 2026-02-21T08:12:18.8934644Z cvt.u64.u32 %rd89, %r249; 2026-02-21T08:12:18.8934880Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:12:18.8935054Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:12:18.8935358Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8935688Z mov.b64 {%r356, %r357}, %rd91; 2026-02-21T08:12:18.8935871Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T08:12:18.8936202Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8936529Z cvt.u64.u32 %rd92, %r250; 2026-02-21T08:12:18.8936694Z cvt.u64.u32 %rd93, %r251; 2026-02-21T08:12:18.8936866Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:12:18.8937032Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:12:18.8937337Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8937665Z mov.b64 {%r359, %r360}, %rd95; 2026-02-21T08:12:18.8937858Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T08:12:18.8938178Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8938499Z cvt.u64.u32 %rd96, %r252; 2026-02-21T08:12:18.8938678Z cvt.u64.u32 %rd97, %r253; 2026-02-21T08:12:18.8938844Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:12:18.8939016Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:12:18.8939312Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8939654Z mov.b64 {%r362, %r363}, %rd99; 2026-02-21T08:12:18.8939841Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 
2026-02-21T08:12:18.8940183Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8940518Z cvt.u64.u32 %rd100, %r254; 2026-02-21T08:12:18.8940692Z cvt.u64.u32 %rd101, %r255; 2026-02-21T08:12:18.8940875Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:12:18.8941120Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:12:18.8941433Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8941764Z mov.b64 {%r365, %r366}, %rd103; 2026-02-21T08:12:18.8941960Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T08:12:18.8942280Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8942610Z cvt.u64.u32 %rd104, %r256; 2026-02-21T08:12:18.8942794Z cvt.u64.u32 %rd105, %r257; 2026-02-21T08:12:18.8942966Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:12:18.8943152Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:12:18.8943460Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8943802Z mov.b64 {%r368, %r369}, %rd107; 2026-02-21T08:12:18.8943995Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T08:12:18.8944374Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8944758Z cvt.u64.u32 %rd108, %r259; 2026-02-21T08:12:18.8944932Z cvt.u64.u32 %rd109, %r260; 2026-02-21T08:12:18.8945110Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:12:18.8945282Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:12:18.8945589Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8945913Z mov.b64 {%r371, %r372}, %rd111; 2026-02-21T08:12:18.8946107Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T08:12:18.8946435Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8946755Z cvt.u64.u32 %rd112, %r261; 2026-02-21T08:12:18.8946932Z cvt.u64.u32 %rd113, %r262; 2026-02-21T08:12:18.8947102Z shl.b64 %rd114, %rd113, 32; 
2026-02-21T08:12:18.8947283Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:12:18.8947586Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8947926Z mov.b64 {%r374, %r375}, %rd115; 2026-02-21T08:12:18.8948112Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T08:12:18.8948434Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8948769Z cvt.u64.u32 %rd116, %r263; 2026-02-21T08:12:18.8948940Z cvt.u64.u32 %rd117, %r264; 2026-02-21T08:12:18.8949115Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:12:18.8949289Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:12:18.8949597Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8949920Z mov.b64 {%r377, %r378}, %rd119; 2026-02-21T08:12:18.8950116Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T08:12:18.8950437Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8950762Z cvt.u64.u32 %rd120, %r265; 2026-02-21T08:12:18.8950951Z cvt.u64.u32 %rd121, %r266; 2026-02-21T08:12:18.8951120Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:12:18.8951304Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:12:18.8951605Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8951944Z mov.b64 {%r380, %r381}, %rd123; 2026-02-21T08:12:18.8952129Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T08:12:18.8952460Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8952808Z cvt.u64.u32 %rd124, %r267; 2026-02-21T08:12:18.8952983Z cvt.u64.u32 %rd125, %r268; 2026-02-21T08:12:18.8953163Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:12:18.8953338Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:12:18.8953647Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8953969Z mov.b64 {%r383, %r384}, %rd127; 2026-02-21T08:12:18.8954164Z cvt.rn.f16x2.f32 %r385, 
%r384, %r383; 2026-02-21T08:12:18.8954558Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8954938Z cvt.u64.u32 %rd128, %r269; 2026-02-21T08:12:18.8955119Z cvt.u64.u32 %rd129, %r270; 2026-02-21T08:12:18.8955290Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:12:18.8955473Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:12:18.8955780Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8956119Z mov.b64 {%r386, %r387}, %rd131; 2026-02-21T08:12:18.8956317Z cvt.rn.f16x2.f32 %r388, %r387, %r386; 2026-02-21T08:12:18.8956654Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8957000Z cvt.u64.u32 %rd132, %r271; 2026-02-21T08:12:18.8957173Z cvt.u64.u32 %rd133, %r272; 2026-02-21T08:12:18.8957354Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:12:18.8957588Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:12:18.8957904Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8958225Z mov.b64 {%r389, %r390}, %rd135; 2026-02-21T08:12:18.8958419Z cvt.rn.f16x2.f32 %r391, %r390, %r389; 2026-02-21T08:12:18.8958747Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.8959078Z cvt.u64.u32 %rd136, %r273; 2026-02-21T08:12:18.8959261Z cvt.u64.u32 %rd137, %r274; 2026-02-21T08:12:18.8959432Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:12:18.8959612Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T08:12:18.8959911Z .loc 1 58 27 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:58:27 2026-02-21T08:12:18.8960250Z mov.b64 {%r392, %r393}, %rd139; 2026-02-21T08:12:18.8960435Z cvt.rn.f16x2.f32 %r394, %r393, %r392; 2026-02-21T08:12:18.8960763Z .loc 1 59 82 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:59:82 2026-02-21T08:12:18.8961146Z st.shared.v4.b32 [%r31], {%r349, %r361, %r373, %r385}; 2026-02-21T08:12:18.8961373Z bar.sync 0, 256; 2026-02-21T08:12:18.8961534Z // 
begin inline asm 2026-02-21T08:12:18.8961805Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r276, %r277, %r278, %r279}, [%r280]; 2026-02-21T08:12:18.8962114Z // end inline asm 2026-02-21T08:12:18.8962264Z bar.sync 0, 256; 2026-02-21T08:12:18.8962463Z st.shared.v4.b32 [%r31], {%r352, %r364, %r376, %r388}; 2026-02-21T08:12:18.8962690Z bar.sync 0, 256; 2026-02-21T08:12:18.8962839Z // begin inline asm 2026-02-21T08:12:18.8963103Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r281, %r282, %r283, %r284}, [%r280]; 2026-02-21T08:12:18.8963395Z // end inline asm 2026-02-21T08:12:18.8963550Z bar.sync 0, 256; 2026-02-21T08:12:18.8963735Z st.shared.v4.b32 [%r31], {%r355, %r367, %r379, %r391}; 2026-02-21T08:12:18.8963960Z bar.sync 0, 256; 2026-02-21T08:12:18.8964108Z // begin inline asm 2026-02-21T08:12:18.8964373Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r286, %r287, %r288, %r289}, [%r280]; 2026-02-21T08:12:18.8964711Z // end inline asm 2026-02-21T08:12:18.8964874Z bar.sync 0, 256; 2026-02-21T08:12:18.8965066Z st.shared.v4.b32 [%r31], {%r358, %r370, %r382, %r394}; 2026-02-21T08:12:18.8965286Z bar.sync 0, 256; 2026-02-21T08:12:18.8965446Z // begin inline asm 2026-02-21T08:12:18.8965695Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r291, %r292, %r293, %r294}, [%r280]; 2026-02-21T08:12:18.8965991Z // end inline asm 2026-02-21T08:12:18.8966139Z // begin inline asm 2026-02-21T08:12:18.8966351Z st.global.v4.b32 [ %rd72 + 0 ], { %r276, %r281, %r286, %r291 }; 2026-02-21T08:12:18.8966590Z // end inline asm 2026-02-21T08:12:18.8966749Z // begin inline asm 2026-02-21T08:12:18.8966957Z st.global.v4.b32 [ %rd73 + 0 ], { %r277, %r282, %r287, %r292 }; 2026-02-21T08:12:18.8967193Z // end inline asm 2026-02-21T08:12:18.8967346Z // begin inline asm 2026-02-21T08:12:18.8967545Z st.global.v4.b32 [ %rd74 + 0 ], { %r278, %r283, %r288, %r293 }; 2026-02-21T08:12:18.8967854Z // end inline asm 2026-02-21T08:12:18.8968009Z // begin inline asm 2026-02-21T08:12:18.8968218Z st.global.v4.b32 [ %rd75 + 0 ], { 
%r279, %r284, %r289, %r294 }; 2026-02-21T08:12:18.8968452Z // end inline asm 2026-02-21T08:12:18.8968750Z .loc 1 30 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:30:52 2026-02-21T08:12:18.8969095Z add.s32 %r403, %r403, 1; 2026-02-21T08:12:18.8969286Z setp.ne.b32 %p77, %r25, %r403; 2026-02-21T08:12:18.8969483Z @%p77 bra $L__BB0_14; 2026-02-21T08:12:18.8969684Z $L__BB0_15: // %._crit_edge 2026-02-21T08:12:18.8970036Z .loc 1 30 4 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:30:4 2026-02-21T08:12:18.8970367Z bar.sync 0, 256; 2026-02-21T08:12:18.8970534Z // begin inline asm 2026-02-21T08:12:18.8970770Z @%p27 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r395, 64; 2026-02-21T08:12:18.8971098Z // end inline asm 2026-02-21T08:12:18.8971282Z st.shared.b32 [global_smem+45160], 50529027; 2026-02-21T08:12:18.8971489Z barrier.sync 1; 2026-02-21T08:12:18.8971679Z $L__BB0_16: // %common.ret 2026-02-21T08:12:18.8972017Z .loc 1 0 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:0 2026-02-21T08:12:18.8972339Z ret; 2026-02-21T08:12:18.8972524Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:12:18.8972789Z ld.param.b64 %rd6, [_helion_matmul_param_1]; 2026-02-21T08:12:18.8973131Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19 2026-02-21T08:12:18.8973455Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:12:18.8973635Z and.b16 %rs2, %rs1, 1; 2026-02-21T08:12:18.8973805Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:12:18.8973994Z mov.b32 %r36, global_smem; 2026-02-21T08:12:18.8974169Z add.s32 %r37, %r36, %r3; 2026-02-21T08:12:18.8974346Z add.s32 %r91, %r36, 40960; 2026-02-21T08:12:18.8974517Z bra.uni $L__BB0_2; 2026-02-21T08:12:18.8974767Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.8975159Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.8975490Z barrier.sync 1; 2026-02-21T08:12:18.8975654Z barrier.sync 1; 2026-02-21T08:12:18.8975831Z $L__BB0_2: // %.preheader 
2026-02-21T08:12:18.8976087Z // =>This Loop Header: Depth=1 2026-02-21T08:12:18.8976352Z // Child Loop BB0_9 Depth 2 2026-02-21T08:12:18.8976614Z // Child Loop BB0_6 Depth 2 2026-02-21T08:12:18.8976978Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19 2026-02-21T08:12:18.8977327Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:12:18.8977542Z barrier.sync 1; 2026-02-21T08:12:18.8977710Z ld.shared.b8 %r35, [%r37+45152]; 2026-02-21T08:12:18.8977928Z setp.gt.u32 %p2, %r35, 3; 2026-02-21T08:12:18.8978111Z @%p2 bra $L__BB0_4; 2026-02-21T08:12:18.8978315Z // %bb.3: // %.preheader 2026-02-21T08:12:18.8978563Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.8978807Z $L_brx_0: .branchtargets 2026-02-21T08:12:18.8978982Z $L__BB0_5, 2026-02-21T08:12:18.8979122Z $L__BB0_8, 2026-02-21T08:12:18.8979269Z $L__BB0_11, 2026-02-21T08:12:18.8979410Z $L__BB0_16; 2026-02-21T08:12:18.8979567Z brx.idx %r35, $L_brx_0; 2026-02-21T08:12:18.8979761Z $L__BB0_5: // %.peel.next 2026-02-21T08:12:18.8980012Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.8980378Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.8980749Z ld.shared.b32 %r80, [global_smem+40960]; 2026-02-21T08:12:18.8980978Z ld.shared.b32 %r92, [global_smem+40972]; 2026-02-21T08:12:18.8981257Z barrier.sync 1; 2026-02-21T08:12:18.8981543Z .loc 1 42 45 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:42:45 2026-02-21T08:12:18.8981879Z bfe.u32 %r93, %r1, 1, 4; 2026-02-21T08:12:18.8982181Z .loc 1 50 48 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:50:48 2026-02-21T08:12:18.8982509Z shl.b32 %r94, %r1, 3; 2026-02-21T08:12:18.8982683Z and.b32 %r95, %r94, 8; 2026-02-21T08:12:18.8982970Z .loc 1 42 32 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:42:32 2026-02-21T08:12:18.8983299Z add.s32 %r96, %r92, %r93; 2026-02-21T08:12:18.8983478Z shl.b32 %r97, %r96, 10; 2026-02-21T08:12:18.8983768Z .loc 1 55 80 // 
cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:80 2026-02-21T08:12:18.8984099Z add.s32 %r98, %r97, 16384; 2026-02-21T08:12:18.8984450Z .loc 1 55 59 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:59 2026-02-21T08:12:18.8984826Z or.b32 %r99, %r97, %r95; 2026-02-21T08:12:18.8985000Z or.b32 %r100, %r98, %r95; 2026-02-21T08:12:18.8985314Z .loc 1 55 34 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.8985668Z mad.wide.s32 %rd12, %r99, 2, %rd6; 2026-02-21T08:12:18.8985874Z mad.wide.s32 %rd13, %r100, 2, %rd6; 2026-02-21T08:12:18.8986206Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.8986538Z shl.b32 %r101, %r1, 4; 2026-02-21T08:12:18.8986718Z and.b32 %r102, %r101, 368; 2026-02-21T08:12:18.8986894Z bfe.s32 %r103, %r1, 3, 1; 2026-02-21T08:12:18.8987074Z and.b32 %r104, %r103, 144; 2026-02-21T08:12:18.8987246Z xor.b32 %r6, %r104, %r102; 2026-02-21T08:12:18.8987431Z add.s32 %r62, %r91, %r6; 2026-02-21T08:12:18.8987609Z mov.b32 %r63, 16; 2026-02-21T08:12:18.8987771Z // begin inline asm 2026-02-21T08:12:18.8988013Z cp.async.cg.shared.global [ %r62 + 0 ], [ %rd12 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8988274Z // end inline asm 2026-02-21T08:12:18.8988439Z add.s32 %r64, %r62, 512; 2026-02-21T08:12:18.8988607Z // begin inline asm 2026-02-21T08:12:18.8988840Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd13 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8989100Z // end inline asm 2026-02-21T08:12:18.8989279Z cp.async.commit_group; 2026-02-21T08:12:18.8989594Z .loc 1 55 34 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.8989938Z cvt.s64.s32 %rd29, %r97; 2026-02-21T08:12:18.8990116Z cvt.u64.u32 %rd30, %r95; 2026-02-21T08:12:18.8990288Z or.b64 %rd31, %rd29, %rd30; 2026-02-21T08:12:18.8990473Z shl.b64 %rd32, %rd31, 1; 2026-02-21T08:12:18.8990643Z add.s64 %rd33, %rd6, %rd32; 2026-02-21T08:12:18.8990830Z add.s64 %rd14, %rd33, 32; 
2026-02-21T08:12:18.8990999Z cvt.s64.s32 %rd34, %r98; 2026-02-21T08:12:18.8991178Z or.b64 %rd35, %rd34, %rd30; 2026-02-21T08:12:18.8991362Z shl.b64 %rd36, %rd35, 1; 2026-02-21T08:12:18.8991529Z add.s64 %rd37, %rd6, %rd36; 2026-02-21T08:12:18.8991712Z add.s64 %rd15, %rd37, 32; 2026-02-21T08:12:18.8992015Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.8992377Z bar.warp.sync -1; 2026-02-21T08:12:18.8992541Z add.s32 %r105, %r36, %r6; 2026-02-21T08:12:18.8992719Z add.s32 %r66, %r105, 41984; 2026-02-21T08:12:18.8992888Z // begin inline asm 2026-02-21T08:12:18.8993121Z cp.async.cg.shared.global [ %r66 + 0 ], [ %rd14 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8993375Z // end inline asm 2026-02-21T08:12:18.8993538Z add.s32 %r68, %r105, 42496; 2026-02-21T08:12:18.8993718Z // begin inline asm 2026-02-21T08:12:18.8993938Z cp.async.cg.shared.global [ %r68 + 0 ], [ %rd15 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8994196Z // end inline asm 2026-02-21T08:12:18.8994352Z cp.async.commit_group; 2026-02-21T08:12:18.8994667Z .loc 1 55 34 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.8995104Z add.s64 %rd16, %rd33, 64; 2026-02-21T08:12:18.8995282Z add.s64 %rd17, %rd37, 64; 2026-02-21T08:12:18.8995578Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.8995910Z bar.warp.sync -1; 2026-02-21T08:12:18.8996076Z add.s32 %r70, %r105, 43008; 2026-02-21T08:12:18.8996254Z // begin inline asm 2026-02-21T08:12:18.8996478Z cp.async.cg.shared.global [ %r70 + 0 ], [ %rd16 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8996721Z // end inline asm 2026-02-21T08:12:18.8996880Z add.s32 %r72, %r105, 43520; 2026-02-21T08:12:18.8997049Z // begin inline asm 2026-02-21T08:12:18.8997273Z cp.async.cg.shared.global [ %r72 + 0 ], [ %rd17 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.8997522Z // end inline asm 2026-02-21T08:12:18.8997685Z cp.async.commit_group; 2026-02-21T08:12:18.8998037Z .loc 1 55 34 
// cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.8998375Z add.s64 %rd18, %rd33, 96; 2026-02-21T08:12:18.8998553Z add.s64 %rd19, %rd37, 96; 2026-02-21T08:12:18.8998853Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.8999187Z bar.warp.sync -1; 2026-02-21T08:12:18.8999347Z add.s32 %r74, %r105, 44032; 2026-02-21T08:12:18.8999525Z // begin inline asm 2026-02-21T08:12:18.8999752Z cp.async.cg.shared.global [ %r74 + 0 ], [ %rd18 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.9000004Z // end inline asm 2026-02-21T08:12:18.9000172Z add.s32 %r76, %r105, 44544; 2026-02-21T08:12:18.9000350Z // begin inline asm 2026-02-21T08:12:18.9000574Z cp.async.cg.shared.global [ %r76 + 0 ], [ %rd19 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.9000820Z // end inline asm 2026-02-21T08:12:18.9000982Z cp.async.commit_group; 2026-02-21T08:12:18.9001155Z cp.async.wait_group 3; 2026-02-21T08:12:18.9001342Z bar.warp.sync -1; 2026-02-21T08:12:18.9001629Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9001969Z add.s32 %r78, %r36, 45104; 2026-02-21T08:12:18.9002144Z mov.b32 %r396, 0; 2026-02-21T08:12:18.9002295Z // begin inline asm 2026-02-21T08:12:18.9002454Z 2026-02-21T08:12:18.9002579Z { 2026-02-21T08:12:18.9002722Z .reg .pred complete; 2026-02-21T08:12:18.9002883Z waitLoop: 2026-02-21T08:12:18.9003098Z mbarrier.try_wait.parity.shared.b64 complete, [%r78], %r396; 2026-02-21T08:12:18.9003367Z @!complete bra.uni waitLoop; 2026-02-21T08:12:18.9003546Z } 2026-02-21T08:12:18.9003619Z 2026-02-21T08:12:18.9003688Z // end inline asm 2026-02-21T08:12:18.9003970Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.9004321Z elect.sync %r106|%p10, -1; 2026-02-21T08:12:18.9004505Z bfe.u32 %r107, %r36, 4, 14; 2026-02-21T08:12:18.9004751Z cvt.u64.u32 %rd38, %r107; 2026-02-21T08:12:18.9004942Z or.b64 %rd20, %rd38, -4611685949674356736; 
2026-02-21T08:12:18.9005161Z bfe.u32 %r108, %r91, 4, 14; 2026-02-21T08:12:18.9005333Z cvt.u64.u32 %rd39, %r108; 2026-02-21T08:12:18.9005527Z or.b64 %rd21, %rd39, -4611685949703716864; 2026-02-21T08:12:18.9005726Z mov.b32 %r81, 134742032; 2026-02-21T08:12:18.9005901Z mov.pred %p9, 0; 2026-02-21T08:12:18.9006064Z // begin inline asm 2026-02-21T08:12:18.9006314Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r80 + 0 ], %rd20, %rd21, %r81, %p9; 2026-02-21T08:12:18.9006611Z // end inline asm 2026-02-21T08:12:18.9006764Z add.s32 %r109, %r36, 4096; 2026-02-21T08:12:18.9006948Z bfe.u32 %r110, %r109, 4, 14; 2026-02-21T08:12:18.9007124Z cvt.u64.u32 %rd40, %r110; 2026-02-21T08:12:18.9007312Z or.b64 %rd22, %rd40, -4611685949674356736; 2026-02-21T08:12:18.9007509Z // begin inline asm 2026-02-21T08:12:18.9007762Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r80 + 32 ], %rd22, %rd21, %r81, %p9; 2026-02-21T08:12:18.9008047Z // end inline asm 2026-02-21T08:12:18.9008202Z add.s32 %r111, %r36, 45056; 2026-02-21T08:12:18.9008449Z cvt.u64.u32 %rd24, %r111; 2026-02-21T08:12:18.9008616Z // begin inline asm 2026-02-21T08:12:18.9008856Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd24]; 2026-02-21T08:12:18.9009119Z // end inline asm 2026-02-21T08:12:18.9009280Z add.s32 %r112, %r36, 45152; 2026-02-21T08:12:18.9009454Z cvt.u64.u32 %rd25, %r112; 2026-02-21T08:12:18.9009629Z // begin inline asm 2026-02-21T08:12:18.9009866Z @%p9 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd25]; 2026-02-21T08:12:18.9010122Z // end inline asm 2026-02-21T08:12:18.9010418Z .loc 1 55 34 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.9010755Z add.s64 %rd26, %rd33, 128; 2026-02-21T08:12:18.9010942Z add.s64 %rd27, %rd37, 128; 2026-02-21T08:12:18.9011252Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.9011648Z bar.warp.sync -1; 2026-02-21T08:12:18.9011821Z // begin inline asm 2026-02-21T08:12:18.9012042Z 
cp.async.cg.shared.global [ %r62 + 0 ], [ %rd26 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.9012299Z // end inline asm 2026-02-21T08:12:18.9012451Z // begin inline asm 2026-02-21T08:12:18.9012675Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd27 + 0 ], 0x10, %r63; 2026-02-21T08:12:18.9012918Z // end inline asm 2026-02-21T08:12:18.9013080Z cp.async.commit_group; 2026-02-21T08:12:18.9013387Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9013735Z add.s32 %r113, %r4, %r97; 2026-02-21T08:12:18.9013916Z cvt.u64.u32 %rd1, %r113; 2026-02-21T08:12:18.9014086Z mov.b32 %r399, 1; 2026-02-21T08:12:18.9014247Z mov.b64 %rd140, 0; 2026-02-21T08:12:18.9014401Z mov.b32 %r397, %r396; 2026-02-21T08:12:18.9014578Z mov.b32 %r398, %r396; 2026-02-21T08:12:18.9014822Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:18.9015115Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:18.9015358Z add.s64 %rd3, %rd140, 16; 2026-02-21T08:12:18.9015547Z setp.lt.u64 %p21, %rd3, 960; 2026-02-21T08:12:18.9015738Z add.s32 %r124, %r396, 1; 2026-02-21T08:12:18.9015914Z setp.gt.s32 %p22, %r124, 3; 2026-02-21T08:12:18.9016107Z selp.b32 %r396, 0, %r124, %p22; 2026-02-21T08:12:18.9016298Z shl.b32 %r125, %r399, 3; 2026-02-21T08:12:18.9016475Z add.s32 %r127, %r36, %r125; 2026-02-21T08:12:18.9016650Z add.s32 %r128, %r127, 45056; 2026-02-21T08:12:18.9016830Z add.s32 %r114, %r127, 45104; 2026-02-21T08:12:18.9017137Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9017476Z shl.b32 %r129, %r399, 13; 2026-02-21T08:12:18.9017655Z add.s32 %r130, %r36, %r129; 2026-02-21T08:12:18.9017961Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.9018306Z cp.async.wait_group 3; 2026-02-21T08:12:18.9018485Z bar.warp.sync -1; 2026-02-21T08:12:18.9018650Z shl.b32 %r131, %r396, 10; 2026-02-21T08:12:18.9018821Z add.s32 %r133, %r91, %r131; 2026-02-21T08:12:18.9019131Z .loc 1 54 31 
// cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9019457Z // begin inline asm 2026-02-21T08:12:18.9019614Z 2026-02-21T08:12:18.9019746Z { 2026-02-21T08:12:18.9019882Z .reg .pred complete; 2026-02-21T08:12:18.9020056Z waitLoop: 2026-02-21T08:12:18.9020271Z mbarrier.try_wait.parity.shared.b64 complete, [%r114], %r398; 2026-02-21T08:12:18.9020546Z @!complete bra.uni waitLoop; 2026-02-21T08:12:18.9020721Z } 2026-02-21T08:12:18.9020800Z 2026-02-21T08:12:18.9020862Z // end inline asm 2026-02-21T08:12:18.9021150Z .loc 1 56 52 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:56:52 2026-02-21T08:12:18.9021492Z setp.eq.b64 %p23, %rd140, 992; 2026-02-21T08:12:18.9021691Z elect.sync %r134|%p16, -1; 2026-02-21T08:12:18.9021951Z bfe.u32 %r135, %r130, 4, 14; 2026-02-21T08:12:18.9022138Z cvt.u64.u32 %rd49, %r135; 2026-02-21T08:12:18.9022326Z or.b64 %rd41, %rd49, -4611685949674356736; 2026-02-21T08:12:18.9022538Z bfe.u32 %r136, %r133, 4, 14; 2026-02-21T08:12:18.9022713Z cvt.u64.u32 %rd50, %r136; 2026-02-21T08:12:18.9022901Z or.b64 %rd42, %rd50, -4611685949703716864; 2026-02-21T08:12:18.9023101Z mov.pred %p15, -1; 2026-02-21T08:12:18.9023272Z // begin inline asm 2026-02-21T08:12:18.9023522Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r80 + 0 ], %rd41, %rd42, %r81, %p15; 2026-02-21T08:12:18.9023818Z // end inline asm 2026-02-21T08:12:18.9023980Z add.s32 %r137, %r130, 4096; 2026-02-21T08:12:18.9024155Z bfe.u32 %r138, %r137, 4, 14; 2026-02-21T08:12:18.9024368Z cvt.u64.u32 %rd51, %r138; 2026-02-21T08:12:18.9024553Z or.b64 %rd43, %rd51, -4611685949674356736; 2026-02-21T08:12:18.9024791Z // begin inline asm 2026-02-21T08:12:18.9025126Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r80 + 32 ], %rd43, %rd42, %r81, %p15; 2026-02-21T08:12:18.9025429Z // end inline asm 2026-02-21T08:12:18.9025586Z cvt.u64.u32 %rd45, %r128; 2026-02-21T08:12:18.9025763Z // begin inline asm 2026-02-21T08:12:18.9026005Z @%p16 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd45]; 2026-02-21T08:12:18.9026264Z // end inline asm 2026-02-21T08:12:18.9026434Z and.pred %p20, %p23, %p16; 2026-02-21T08:12:18.9026608Z // begin inline asm 2026-02-21T08:12:18.9026843Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd25]; 2026-02-21T08:12:18.9027098Z // end inline asm 2026-02-21T08:12:18.9027391Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9027729Z add.s32 %r140, %r399, 1; 2026-02-21T08:12:18.9027905Z setp.eq.b32 %p24, %r140, 5; 2026-02-21T08:12:18.9028098Z selp.b32 %r399, 0, %r140, %p24; 2026-02-21T08:12:18.9028287Z selp.b32 %r141, 1, 0, %p24; 2026-02-21T08:12:18.9028472Z xor.b32 %r398, %r398, %r141; 2026-02-21T08:12:18.9028790Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9029140Z add.s32 %r142, %r397, 1; 2026-02-21T08:12:18.9029312Z setp.gt.s32 %p25, %r142, 3; 2026-02-21T08:12:18.9029498Z selp.b32 %r397, 0, %r142, %p25; 2026-02-21T08:12:18.9029821Z .loc 1 55 59 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:59 2026-02-21T08:12:18.9030153Z add.s64 %rd52, %rd1, %rd140; 2026-02-21T08:12:18.9030338Z cvt.u32.u64 %r143, %rd52; 2026-02-21T08:12:18.9030508Z add.s32 %r144, %r143, 80; 2026-02-21T08:12:18.9030809Z .loc 1 55 34 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:34 2026-02-21T08:12:18.9031151Z mad.wide.s32 %rd47, %r144, 2, %rd6; 2026-02-21T08:12:18.9031355Z add.s32 %r145, %r143, 16464; 2026-02-21T08:12:18.9031535Z mad.wide.s32 %rd48, %r145, 2, %rd6; 2026-02-21T08:12:18.9031870Z .loc 1 55 87 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:55:87 2026-02-21T08:12:18.9032208Z shl.b32 %r146, %r397, 10; 2026-02-21T08:12:18.9032381Z add.s32 %r147, %r91, %r146; 2026-02-21T08:12:18.9032562Z bar.warp.sync -1; 2026-02-21T08:12:18.9032725Z add.s32 %r120, %r147, %r6; 2026-02-21T08:12:18.9032909Z selp.b32 %r121, 16, 0, %p21; 
2026-02-21T08:12:18.9033086Z // begin inline asm 2026-02-21T08:12:18.9033323Z cp.async.cg.shared.global [ %r120 + 0 ], [ %rd47 + 0 ], 0x10, %r121; 2026-02-21T08:12:18.9033590Z // end inline asm 2026-02-21T08:12:18.9033754Z add.s32 %r122, %r120, 512; 2026-02-21T08:12:18.9033932Z // begin inline asm 2026-02-21T08:12:18.9034150Z cp.async.cg.shared.global [ %r122 + 0 ], [ %rd48 + 0 ], 0x10, %r121; 2026-02-21T08:12:18.9034410Z // end inline asm 2026-02-21T08:12:18.9034567Z cp.async.commit_group; 2026-02-21T08:12:18.9034931Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9035281Z setp.lt.u64 %p26, %rd3, 1008; 2026-02-21T08:12:18.9035528Z mov.b64 %rd140, %rd3; 2026-02-21T08:12:18.9035694Z @%p26 bra $L__BB0_6; 2026-02-21T08:12:18.9035889Z // %bb.7: // %.loopexit 2026-02-21T08:12:18.9036149Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.9036384Z cp.async.wait_group 0; 2026-02-21T08:12:18.9036566Z bar.warp.sync -1; 2026-02-21T08:12:18.9036724Z barrier.sync 1; 2026-02-21T08:12:18.9036887Z bra.uni $L__BB0_2; 2026-02-21T08:12:18.9037091Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.9037473Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9037832Z ld.shared.b32 %r53, [global_smem+40968]; 2026-02-21T08:12:18.9038042Z barrier.sync 1; 2026-02-21T08:12:18.9038330Z .loc 1 21 67 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:21:67 2026-02-21T08:12:18.9038778Z mov.u32 %r40, %ctaid.x; 2026-02-21T08:12:18.9038968Z mov.u32 %r41, %ctaid.y; 2026-02-21T08:12:18.9039136Z mov.u32 %r42, %ctaid.z; 2026-02-21T08:12:18.9039314Z mov.u32 %r43, %nctaid.x; 2026-02-21T08:12:18.9039482Z mov.u32 %r44, %nctaid.y; 2026-02-21T08:12:18.9039671Z mad.lo.s32 %r45, %r42, %r44, %r41; 2026-02-21T08:12:18.9039871Z mad.lo.s32 %r46, %r45, %r43, %r40; 2026-02-21T08:12:18.9040069Z shl.b32 %r47, %r46, 7; 2026-02-21T08:12:18.9040244Z cvt.s64.s32 %rd9, %r47; 
2026-02-21T08:12:18.9040415Z add.s64 %rd10, %rd8, %rd9; 2026-02-21T08:12:18.9040604Z cvta.global.u64 %rd11, %rd10; 2026-02-21T08:12:18.9040785Z add.s32 %r16, %r1, -256; 2026-02-21T08:12:18.9040955Z mov.b32 %r401, 0; 2026-02-21T08:12:18.9041108Z mov.b32 %r400, -16; 2026-02-21T08:12:18.9041275Z mov.b32 %r402, %r401; 2026-02-21T08:12:18.9041482Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:18.9041766Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:18.9042131Z .loc 1 0 67 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:0:67 2026-02-21T08:12:18.9042459Z setp.lt.u32 %p5, %r16, 32; 2026-02-21T08:12:18.9042652Z setp.eq.b32 %p3, %r16, 0; 2026-02-21T08:12:18.9042954Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9043293Z add.s32 %r400, %r400, 16; 2026-02-21T08:12:18.9043466Z shl.b32 %r55, %r402, 3; 2026-02-21T08:12:18.9043641Z add.s32 %r57, %r36, %r55; 2026-02-21T08:12:18.9043808Z add.s32 %r48, %r57, 45056; 2026-02-21T08:12:18.9044110Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9044451Z // begin inline asm 2026-02-21T08:12:18.9044604Z 2026-02-21T08:12:18.9044785Z { 2026-02-21T08:12:18.9044925Z .reg .pred complete; 2026-02-21T08:12:18.9045094Z waitLoop: 2026-02-21T08:12:18.9045308Z mbarrier.try_wait.parity.shared.b64 complete, [%r48], %r401; 2026-02-21T08:12:18.9045584Z @!complete bra.uni waitLoop; 2026-02-21T08:12:18.9045756Z } 2026-02-21T08:12:18.9045838Z 2026-02-21T08:12:18.9045900Z // end inline asm 2026-02-21T08:12:18.9046198Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9046532Z add.s32 %r54, %r57, 45104; 2026-02-21T08:12:18.9046868Z .loc 1 54 31 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:54:31 2026-02-21T08:12:18.9047191Z bar.sync 3, 64; 2026-02-21T08:12:18.9047351Z // begin inline asm 2026-02-21T08:12:18.9047564Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, 
[%r54], 8192; 2026-02-21T08:12:18.9047816Z // end inline asm 2026-02-21T08:12:18.9047971Z shl.b32 %r58, %r402, 13; 2026-02-21T08:12:18.9048168Z add.s32 %r51, %r36, %r58; 2026-02-21T08:12:18.9048343Z bar.sync 3, 64; 2026-02-21T08:12:18.9048504Z elect.sync %r59|%p6, -1; 2026-02-21T08:12:18.9048695Z and.pred %p4, %p5, %p6; 2026-02-21T08:12:18.9048874Z // begin inline asm 2026-02-21T08:12:18.9049337Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r51], [%rd11, {%r400, %r53}], [%r54]; 2026-02-21T08:12:18.9049752Z // end inline asm 2026-02-21T08:12:18.9049944Z add.s32 %r60, %r402, 1; 2026-02-21T08:12:18.9050121Z setp.eq.b32 %p7, %r60, 5; 2026-02-21T08:12:18.9050300Z selp.b32 %r402, 0, %r60, %p7; 2026-02-21T08:12:18.9050492Z selp.b32 %r61, 1, 0, %p7; 2026-02-21T08:12:18.9050665Z xor.b32 %r401, %r401, %r61; 2026-02-21T08:12:18.9050980Z .loc 1 49 112 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:49:112 2026-02-21T08:12:18.9051324Z setp.lt.u32 %p8, %r400, 1008; 2026-02-21T08:12:18.9051509Z @%p8 bra $L__BB0_9; 2026-02-21T08:12:18.9051715Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.9051966Z barrier.sync 1; 2026-02-21T08:12:18.9052128Z bra.uni $L__BB0_2; 2026-02-21T08:12:18.9052392Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:18.9052765Z .loc 1 19 0 // cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py:19 2026-02-21T08:12:18.9053093Z barrier.sync 1; 2026-02-21T08:12:18.9053261Z barrier.sync 1; 2026-02-21T08:12:18.9053417Z bra.uni $L__BB0_2; 2026-02-21T08:12:18.9053584Z $L__tmp1: 2026-02-21T08:12:18.9053729Z $L__func_end0: 2026-02-21T08:12:18.9053918Z // -- End function 2026-02-21T08:12:18.9054130Z } 2026-02-21T08:12:18.9054449Z .file 1 "/tmp/torchinductor_root/ot/cotm2snfkdvx7y5bv4fpiezhq4rykyohhpllelimdzsp6mmezcqg.py" 2026-02-21T08:12:18.9054890Z .section .debug_abbrev 2026-02-21T08:12:18.9055054Z { 2026-02-21T08:12:18.9055230Z .b8 1 // Abbreviation Code 2026-02-21T08:12:18.9055482Z .b8 
17 // DW_TAG_compile_unit 2026-02-21T08:12:18.9055736Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:12:18.9055981Z .b8 37 // DW_AT_producer 2026-02-21T08:12:18.9056235Z .b8 8 // DW_FORM_string 2026-02-21T08:12:18.9056483Z .b8 19 // DW_AT_language 2026-02-21T08:12:18.9056721Z .b8 5 // DW_FORM_data2 2026-02-21T08:12:18.9056965Z .b8 3 // DW_AT_name 2026-02-21T08:12:18.9057198Z .b8 8 // DW_FORM_string 2026-02-21T08:12:18.9057444Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:12:18.9057680Z .b8 6 // DW_FORM_data4 2026-02-21T08:12:18.9057914Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:12:18.9058146Z .b8 8 // DW_FORM_string 2026-02-21T08:12:18.9058363Z .b8 0 // EOM(1) 2026-02-21T08:12:18.9058584Z .b8 0 // EOM(2) 2026-02-21T08:12:18.9058794Z .b8 0 // EOM(3) 2026-02-21T08:12:18.9058995Z } 2026-02-21T08:12:18.9059131Z .section .debug_info 2026-02-21T08:12:18.9059292Z { 2026-02-21T08:12:18.9059453Z .b32 104 // Length of Unit 2026-02-21T08:12:18.9059704Z .b8 2 // DWARF version number 2026-02-21T08:12:18.9059924Z .b8 0 2026-02-21T08:12:18.9060128Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:12:18.9060423Z .b8 8 // Address Size (in bytes) 2026-02-21T08:12:18.9060689Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:12:18.9060963Z .b8 116 // DW_AT_producer 2026-02-21T08:12:18.9061170Z .b8 114 2026-02-21T08:12:18.9061316Z .b8 105 2026-02-21T08:12:18.9061447Z .b8 116 2026-02-21T08:12:18.9061582Z .b8 111 2026-02-21T08:12:18.9061706Z .b8 110 2026-02-21T08:12:18.9061841Z .b8 0 2026-02-21T08:12:18.9062003Z .b8 2 // DW_AT_language 2026-02-21T08:12:18.9062272Z .b8 0 2026-02-21T08:12:18.9062437Z .b8 99 // DW_AT_name 2026-02-21T08:12:18.9062637Z .b8 111 2026-02-21T08:12:18.9062772Z .b8 116 2026-02-21T08:12:18.9062895Z .b8 109 2026-02-21T08:12:18.9063027Z .b8 50 2026-02-21T08:12:18.9063157Z .b8 115 2026-02-21T08:12:18.9063288Z .b8 110 2026-02-21T08:12:18.9063414Z .b8 102 2026-02-21T08:12:18.9063547Z .b8 107 2026-02-21T08:12:18.9063672Z .b8 100 2026-02-21T08:12:18.9063806Z .b8 118 2026-02-21T08:12:18.9063942Z .b8 120 2026-02-21T08:12:18.9064068Z .b8 55 2026-02-21T08:12:18.9064205Z .b8 121 2026-02-21T08:12:18.9064330Z .b8 53 2026-02-21T08:12:18.9064464Z .b8 98 2026-02-21T08:12:18.9064590Z .b8 118 2026-02-21T08:12:18.9064761Z .b8 52 2026-02-21T08:12:18.9064890Z .b8 102 2026-02-21T08:12:18.9065028Z .b8 112 2026-02-21T08:12:18.9065155Z .b8 105 2026-02-21T08:12:18.9065295Z .b8 101 2026-02-21T08:12:18.9065424Z .b8 122 2026-02-21T08:12:18.9065618Z .b8 104 2026-02-21T08:12:18.9065745Z .b8 113 2026-02-21T08:12:18.9065882Z .b8 52 2026-02-21T08:12:18.9066007Z .b8 114 2026-02-21T08:12:18.9066140Z .b8 121 2026-02-21T08:12:18.9066272Z .b8 107 2026-02-21T08:12:18.9066397Z .b8 121 2026-02-21T08:12:18.9066531Z .b8 111 2026-02-21T08:12:18.9066656Z .b8 104 2026-02-21T08:12:18.9066790Z .b8 104 2026-02-21T08:12:18.9066914Z .b8 112 2026-02-21T08:12:18.9067045Z .b8 108 2026-02-21T08:12:18.9067169Z .b8 108 2026-02-21T08:12:18.9067301Z .b8 101 2026-02-21T08:12:18.9067425Z .b8 108 2026-02-21T08:12:18.9067561Z .b8 105 2026-02-21T08:12:18.9067686Z .b8 109 
2026-02-21T08:12:18.9067819Z .b8 100 2026-02-21T08:12:18.9067944Z .b8 122 2026-02-21T08:12:18.9068076Z .b8 115 2026-02-21T08:12:18.9068207Z .b8 112 2026-02-21T08:12:18.9068332Z .b8 54 2026-02-21T08:12:18.9068463Z .b8 109 2026-02-21T08:12:18.9068588Z .b8 109 2026-02-21T08:12:18.9068719Z .b8 101 2026-02-21T08:12:18.9068842Z .b8 122 2026-02-21T08:12:18.9068973Z .b8 99 2026-02-21T08:12:18.9069097Z .b8 113 2026-02-21T08:12:18.9069230Z .b8 103 2026-02-21T08:12:18.9069357Z .b8 46 2026-02-21T08:12:18.9069487Z .b8 112 2026-02-21T08:12:18.9069611Z .b8 121 2026-02-21T08:12:18.9069742Z .b8 0 2026-02-21T08:12:18.9069915Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:12:18.9070165Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:12:18.9070372Z .b8 116 2026-02-21T08:12:18.9070499Z .b8 109 2026-02-21T08:12:18.9070632Z .b8 112 2026-02-21T08:12:18.9070757Z .b8 47 2026-02-21T08:12:18.9070890Z .b8 116 2026-02-21T08:12:18.9071015Z .b8 111 2026-02-21T08:12:18.9071147Z .b8 114 2026-02-21T08:12:18.9071274Z .b8 99 2026-02-21T08:12:18.9071415Z .b8 104 2026-02-21T08:12:18.9071540Z .b8 105 2026-02-21T08:12:18.9071674Z .b8 110 2026-02-21T08:12:18.9071798Z .b8 100 2026-02-21T08:12:18.9071939Z .b8 117 2026-02-21T08:12:18.9071997Z .b8 99 2026-02-21T08:12:18.9072075Z .b8 116 2026-02-21T08:12:18.9072136Z .b8 111 2026-02-21T08:12:18.9072193Z .b8 114 2026-02-21T08:12:18.9072250Z .b8 95 2026-02-21T08:12:18.9072319Z .b8 114 2026-02-21T08:12:18.9072382Z .b8 111 2026-02-21T08:12:18.9072438Z .b8 111 2026-02-21T08:12:18.9072493Z .b8 116 2026-02-21T08:12:18.9072559Z .b8 47 2026-02-21T08:12:18.9072616Z .b8 111 2026-02-21T08:12:18.9072671Z .b8 116 2026-02-21T08:12:18.9072734Z .b8 0 2026-02-21T08:12:18.9072789Z } 2026-02-21T08:12:18.9072864Z .section .debug_macinfo { } 2026-02-21T08:12:18.9072869Z 2026-02-21T08:12:18.9072957Z ================================================================ 2026-02-21T08:12:18.9073079Z please share the reproducer above with Triton project. 
2026-02-21T08:12:19.5543337Z
2026-02-21T08:12:19.5545893Z [18s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T08:12:19.5547658Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 32, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=5, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True)
2026-02-21T08:12:19.5549697Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T08:12:19.5550017Z `ptxas` stderr:
2026-02-21T08:12:19.5550557Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 196 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T08:12:19.5551149Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:12:19.5551327Z
2026-02-21T08:12:19.5551818Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpt65uoyh2.ptx -o /tmp/tmpt65uoyh2.ptx.o
2026-02-21T08:12:19.5552411Z
2026-02-21T08:12:19.5552580Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T08:12:19.5552839Z
2026-02-21T08:12:19.5553001Z
2026-02-21T08:12:19.5553107Z ================================================================
2026-02-21T08:12:19.5553347Z Internal Triton PTX codegen error
2026-02-21T08:12:19.5553551Z `ptxas` stderr:
2026-02-21T08:12:19.5554035Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 196 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T08:12:19.5554598Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:12:19.5554854Z 2026-02-21T08:12:19.5555304Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpt65uoyh2.ptx -o /tmp/tmpt65uoyh2.ptx.o 2026-02-21T08:12:19.5555792Z 2026-02-21T08:12:19.5555796Z 2026-02-21T08:12:19.5555867Z // 2026-02-21T08:12:19.5556030Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:12:19.5556237Z // 2026-02-21T08:12:19.5556315Z 2026-02-21T08:12:19.5556381Z .version 8.7 2026-02-21T08:12:19.5556557Z .target sm_100a 2026-02-21T08:12:19.5556719Z .address_size 64 2026-02-21T08:12:19.5556828Z 2026-02-21T08:12:19.5556970Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:12:19.5557271Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:12:19.5557732Z // @_helion_matmul 2026-02-21T08:12:19.5557965Z .visible .entry _helion_matmul( 2026-02-21T08:12:19.5558205Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:12:19.5558502Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:12:19.5558792Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:12:19.5559083Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:12:19.5559377Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:12:19.5559614Z ) 2026-02-21T08:12:19.5559755Z .reqntid 256 2026-02-21T08:12:19.5559897Z .maxnreg 32 2026-02-21T08:12:19.5560041Z { 2026-02-21T08:12:19.5560183Z .reg .pred %p<81>; 2026-02-21T08:12:19.5560367Z .reg .b16 %rs<3>; 2026-02-21T08:12:19.5560522Z .reg .b32 %r<484>; 2026-02-21T08:12:19.5560686Z .reg .b64 %rd<193>; 2026-02-21T08:12:19.5560988Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19:0 2026-02-21T08:12:19.5561323Z $L__func_begin0: 2026-02-21T08:12:19.5561604Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19:0 
2026-02-21T08:12:19.5561884Z 2026-02-21T08:12:19.5561943Z // %bb.0: 2026-02-21T08:12:19.5562118Z ld.param.b64 %rd8, [_helion_matmul_param_3]; 2026-02-21T08:12:19.5562325Z $L__tmp0: 2026-02-21T08:12:19.5562594Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19 2026-02-21T08:12:19.5562911Z mov.u32 %r1, %tid.x; 2026-02-21T08:12:19.5563083Z shr.u32 %r2, %r1, 5; 2026-02-21T08:12:19.5563258Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:12:19.5563479Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:12:19.5563655Z @%p2 bra $L__BB0_12; 2026-02-21T08:12:19.5563935Z bra.uni $L__BB0_1; 2026-02-21T08:12:19.5564098Z $L__BB0_12: 2026-02-21T08:12:19.5564374Z .loc 1 0 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:0:0 2026-02-21T08:12:19.5564777Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:12:19.5565122Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19 2026-02-21T08:12:19.5565487Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:12:19.5565708Z setp.lt.u32 %p28, %r1, 32; 2026-02-21T08:12:19.5565902Z mov.b32 %r232, global_smem; 2026-02-21T08:12:19.5566087Z // begin inline asm 2026-02-21T08:12:19.5566375Z @%p28 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r232], 32; 2026-02-21T08:12:19.5566665Z // end inline asm 2026-02-21T08:12:19.5566820Z bar.sync 0, 128; 2026-02-21T08:12:19.5566995Z ld.shared.b32 %r475, [global_smem]; 2026-02-21T08:12:19.5567269Z bar.sync 0, 128; 2026-02-21T08:12:19.5567439Z // begin inline asm 2026-02-21T08:12:19.5567676Z @%p28 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:12:19.5567947Z // end inline asm 2026-02-21T08:12:19.5568239Z .loc 1 21 67 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:21:67 2026-02-21T08:12:19.5568576Z mov.u32 %r241, %ctaid.x; 2026-02-21T08:12:19.5568766Z mov.u32 %r242, %ctaid.y; 2026-02-21T08:12:19.5568939Z mov.u32 %r243, %ctaid.z; 2026-02-21T08:12:19.5569122Z mov.u32 %r244, %nctaid.x; 
2026-02-21T08:12:19.5569305Z mov.u32 %r245, %nctaid.y; 2026-02-21T08:12:19.5569500Z mad.lo.s32 %r246, %r243, %r245, %r242; 2026-02-21T08:12:19.5569705Z mad.lo.s32 %r247, %r246, %r244, %r241; 2026-02-21T08:12:19.5569903Z shl.b32 %r248, %r247, 7; 2026-02-21T08:12:19.5570084Z cvt.s64.s32 %rd123, %r248; 2026-02-21T08:12:19.5570264Z add.s64 %rd120, %rd8, %rd123; 2026-02-21T08:12:19.5570452Z shl.b32 %r249, %r1, 2; 2026-02-21T08:12:19.5570624Z add.s32 %r233, %r232, %r249; 2026-02-21T08:12:19.5570806Z mov.b32 %r234, 0; 2026-02-21T08:12:19.5570962Z // begin inline asm 2026-02-21T08:12:19.5571143Z @%p28 st.shared.b32 [ %r233 + 0 ], %r234; 2026-02-21T08:12:19.5571343Z // end inline asm 2026-02-21T08:12:19.5571506Z bar.warp.sync -1; 2026-02-21T08:12:19.5571668Z setp.eq.b32 %p31, %r1, 0; 2026-02-21T08:12:19.5571853Z cvt.u64.u32 %rd105, %r232; 2026-02-21T08:12:19.5572034Z // begin inline asm 2026-02-21T08:12:19.5572319Z @%p31 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd105 + 0 ], %rd5; 2026-02-21T08:12:19.5572654Z // end inline asm 2026-02-21T08:12:19.5572804Z // begin inline asm 2026-02-21T08:12:19.5573064Z @%p31 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x1; 2026-02-21T08:12:19.5573349Z // end inline asm 2026-02-21T08:12:19.5573505Z mov.b32 %r235, 64; 2026-02-21T08:12:19.5573657Z // begin inline asm 2026-02-21T08:12:19.5573935Z @%p31 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x0, %r235; 2026-02-21T08:12:19.5574253Z // end inline asm 2026-02-21T08:12:19.5574402Z mov.b32 %r236, 128; 2026-02-21T08:12:19.5574568Z // begin inline asm 2026-02-21T08:12:19.5574902Z @%p31 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x1, %r236; 2026-02-21T08:12:19.5575225Z // end inline asm 2026-02-21T08:12:19.5575375Z mov.b32 %r237, 1024; 2026-02-21T08:12:19.5575541Z // begin inline asm 2026-02-21T08:12:19.5575822Z @%p31 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x0, %r237; 
2026-02-21T08:12:19.5576155Z // end inline asm 2026-02-21T08:12:19.5576309Z mov.b32 %r238, 4096; 2026-02-21T08:12:19.5576467Z // begin inline asm 2026-02-21T08:12:19.5576812Z @%p31 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x1, %r238; 2026-02-21T08:12:19.5577133Z // end inline asm 2026-02-21T08:12:19.5577292Z mov.b64 %rd113, 2048; 2026-02-21T08:12:19.5577457Z // begin inline asm 2026-02-21T08:12:19.5577763Z @%p31 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd105 + 0 ], 0x0, %rd113; 2026-02-21T08:12:19.5578193Z // end inline asm 2026-02-21T08:12:19.5578346Z mov.b32 %r239, 1; 2026-02-21T08:12:19.5578513Z // begin inline asm 2026-02-21T08:12:19.5578815Z @%p31 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x0, %r239; 2026-02-21T08:12:19.5579152Z // end inline asm 2026-02-21T08:12:19.5579299Z // begin inline asm 2026-02-21T08:12:19.5579594Z @%p31 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x1, %r239; 2026-02-21T08:12:19.5579916Z // end inline asm 2026-02-21T08:12:19.5580088Z // begin inline asm 2026-02-21T08:12:19.5580368Z @%p31 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x6; 2026-02-21T08:12:19.5580680Z // end inline asm 2026-02-21T08:12:19.5580847Z // begin inline asm 2026-02-21T08:12:19.5581132Z @%p31 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x0; 2026-02-21T08:12:19.5581523Z // end inline asm 2026-02-21T08:12:19.5581677Z // begin inline asm 2026-02-21T08:12:19.5581955Z @%p31 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x3; 2026-02-21T08:12:19.5582272Z // end inline asm 2026-02-21T08:12:19.5582425Z // begin inline asm 2026-02-21T08:12:19.5582702Z @%p31 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd105 + 0 ], 0x0; 2026-02-21T08:12:19.5583003Z // end inline asm 2026-02-21T08:12:19.5583163Z // begin inline asm 2026-02-21T08:12:19.5583580Z @%p28 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd120 + 0 ], [ %rd105 + 0 ], 0x80; 2026-02-21T08:12:19.5584058Z // end inline asm 2026-02-21T08:12:19.5584217Z // begin inline asm 2026-02-21T08:12:19.5584462Z @%p28 fence.proxy.tensormap::generic.acquire.gpu [ %rd120 + 0 ], 0x80; 2026-02-21T08:12:19.5584814Z @%p28 cp.async.bulk.commit_group ; 2026-02-21T08:12:19.5585038Z @%p28 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:12:19.5585253Z // end inline asm 2026-02-21T08:12:19.5585407Z bar.sync 0, 128; 2026-02-21T08:12:19.5585702Z .loc 1 28 35 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:28:35 2026-02-21T08:12:19.5586059Z mul.lo.s32 %r483, %r241, 7; 2026-02-21T08:12:19.5586379Z .loc 1 29 37 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:29:37 2026-02-21T08:12:19.5586730Z add.s32 %r250, %r483, 7; 2026-02-21T08:12:19.5587026Z .loc 1 29 49 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:29:49 2026-02-21T08:12:19.5587373Z min.s32 %r26, %r250, 1024; 2026-02-21T08:12:19.5587677Z .loc 1 30 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:30:52 2026-02-21T08:12:19.5588065Z setp.ge.s32 %p48, %r483, %r26; 2026-02-21T08:12:19.5588258Z @%p48 bra $L__BB0_15; 2026-02-21T08:12:19.5588456Z // %bb.13: // %.lr.ph 2026-02-21T08:12:19.5588806Z .loc 1 0 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:0:52 2026-02-21T08:12:19.5589161Z ld.param.b64 %rd7, [_helion_matmul_param_2]; 2026-02-21T08:12:19.5589406Z shl.b32 %r251, %r1, 3; 2026-02-21T08:12:19.5589581Z and.b32 %r27, %r251, 24; 2026-02-21T08:12:19.5589769Z bfe.u32 %r28, %r1, 2, 5; 2026-02-21T08:12:19.5589944Z or.b32 %r29, %r28, 32; 2026-02-21T08:12:19.5590124Z or.b32 %r30, %r28, 64; 2026-02-21T08:12:19.5590290Z or.b32 %r31, %r28, 96; 2026-02-21T08:12:19.5590462Z shl.b32 %r252, %r1, 4; 2026-02-21T08:12:19.5590643Z and.b32 %r253, %r252, 2032; 2026-02-21T08:12:19.5590828Z add.s32 %r32, %r232, %r253; 2026-02-21T08:12:19.5591013Z 
shl.b32 %r255, %r1, 6; 2026-02-21T08:12:19.5591180Z and.b32 %r256, %r255, 1536; 2026-02-21T08:12:19.5591374Z and.b32 %r257, %r252, 112; 2026-02-21T08:12:19.5591555Z and.b32 %r259, %r249, 384; 2026-02-21T08:12:19.5591745Z add.s32 %r260, %r232, %r256; 2026-02-21T08:12:19.5591968Z add.s32 %r261, %r260, %r257; 2026-02-21T08:12:19.5592151Z add.s32 %r363, %r261, %r259; 2026-02-21T08:12:19.5592467Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T08:12:19.5592862Z .loc 1 36 35 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:36:35 2026-02-21T08:12:19.5593194Z shr.s32 %r395, %r483, 31; 2026-02-21T08:12:19.5593379Z shr.u32 %r396, %r395, 21; 2026-02-21T08:12:19.5593553Z add.s32 %r397, %r483, %r396; 2026-02-21T08:12:19.5593735Z shr.s32 %r398, %r397, 11; 2026-02-21T08:12:19.5594034Z .loc 1 37 33 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:37:33 2026-02-21T08:12:19.5594376Z shl.b32 %r399, %r398, 6; 2026-02-21T08:12:19.5594735Z .loc 1 38 39 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:38:39 2026-02-21T08:12:19.5595055Z sub.s32 %r400, 32, %r399; 2026-02-21T08:12:19.5595352Z .loc 1 38 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:38:52 2026-02-21T08:12:19.5595728Z min.s32 %r401, %r400, 64; 2026-02-21T08:12:19.5596029Z .loc 1 40 51 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:40:51 2026-02-21T08:12:19.5596359Z and.b32 %r402, %r397, -2048; 2026-02-21T08:12:19.5596543Z sub.s32 %r403, %r483, %r402; 2026-02-21T08:12:19.5596726Z div.s32 %r404, %r403, %r401; 2026-02-21T08:12:19.5597019Z .loc 1 41 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:41:27 2026-02-21T08:12:19.5597365Z mul.lo.s32 %r405, %r404, %r401; 2026-02-21T08:12:19.5597560Z mad.lo.s32 %r406, %r398, 1984, %r405; 2026-02-21T08:12:19.5597761Z sub.s32 %r407, %r483, %r406; 2026-02-21T08:12:19.5597932Z shl.b32 %r408, %r407, 5; 2026-02-21T08:12:19.5598235Z .loc 1 42 32 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:42:32 
2026-02-21T08:12:19.5598559Z or.b32 %r409, %r408, %r27; 2026-02-21T08:12:19.5598862Z .loc 1 43 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:43:27 2026-02-21T08:12:19.5599193Z shl.b32 %r410, %r404, 7; 2026-02-21T08:12:19.5599491Z .loc 1 44 32 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:44:32 2026-02-21T08:12:19.5599827Z or.b32 %r411, %r410, %r28; 2026-02-21T08:12:19.5599998Z or.b32 %r412, %r410, %r29; 2026-02-21T08:12:19.5600173Z or.b32 %r413, %r410, %r30; 2026-02-21T08:12:19.5600338Z or.b32 %r414, %r410, %r31; 2026-02-21T08:12:19.5600641Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5600994Z shfl.sync.idx.b32 %r415, %r2, 0, 31, -1; 2026-02-21T08:12:19.5601195Z shl.b32 %r416, %r415, 21; 2026-02-21T08:12:19.5601374Z and.b32 %r417, %r416, 6291456; 2026-02-21T08:12:19.5601555Z add.s32 %r262, %r417, %r475; 2026-02-21T08:12:19.5601738Z mov.pred %p49, -1; 2026-02-21T08:12:19.5601899Z mov.b32 %r263, 0; 2026-02-21T08:12:19.5602060Z // begin inline asm 2026-02-21T08:12:19.5602499Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r262 + 0], {%r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263}; 2026-02-21T08:12:19.5602982Z // end inline asm 2026-02-21T08:12:19.5603144Z // begin inline asm 2026-02-21T08:12:19.5603575Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r262 + 16], {%r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263, %r263}; 2026-02-21T08:12:19.5604049Z // end inline asm 2026-02-21T08:12:19.5604201Z // begin inline asm 2026-02-21T08:12:19.5604438Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:12:19.5604626Z // end inline asm 2026-02-21T08:12:19.5604831Z bar.sync 0, 128; 2026-02-21T08:12:19.5605150Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5605518Z add.s32 %r296, %r232, 98304; 2026-02-21T08:12:19.5605710Z // begin 
inline asm 2026-02-21T08:12:19.5605909Z @%p31 mbarrier.init.shared::cta.b64 [%r296], 1; 2026-02-21T08:12:19.5606216Z // end inline asm 2026-02-21T08:12:19.5606371Z bar.sync 0, 128; 2026-02-21T08:12:19.5606540Z add.s32 %r297, %r232, 98312; 2026-02-21T08:12:19.5606711Z // begin inline asm 2026-02-21T08:12:19.5606903Z @%p31 mbarrier.init.shared::cta.b64 [%r297], 1; 2026-02-21T08:12:19.5607115Z // end inline asm 2026-02-21T08:12:19.5607270Z bar.sync 0, 128; 2026-02-21T08:12:19.5607427Z add.s32 %r298, %r232, 98320; 2026-02-21T08:12:19.5607597Z // begin inline asm 2026-02-21T08:12:19.5607828Z @%p31 mbarrier.init.shared::cta.b64 [%r298], 1; 2026-02-21T08:12:19.5608032Z // end inline asm 2026-02-21T08:12:19.5608185Z bar.sync 0, 128; 2026-02-21T08:12:19.5608383Z add.s32 %r299, %r232, 98328; 2026-02-21T08:12:19.5608565Z // begin inline asm 2026-02-21T08:12:19.5608745Z @%p31 mbarrier.init.shared::cta.b64 [%r299], 1; 2026-02-21T08:12:19.5608958Z // end inline asm 2026-02-21T08:12:19.5609112Z bar.sync 0, 128; 2026-02-21T08:12:19.5609264Z add.s32 %r300, %r232, 98336; 2026-02-21T08:12:19.5609507Z // begin inline asm 2026-02-21T08:12:19.5609689Z @%p31 mbarrier.init.shared::cta.b64 [%r300], 1; 2026-02-21T08:12:19.5609901Z // end inline asm 2026-02-21T08:12:19.5610049Z add.s32 %r301, %r232, 98352; 2026-02-21T08:12:19.5610223Z // begin inline asm 2026-02-21T08:12:19.5610399Z @%p31 mbarrier.init.shared::cta.b64 [%r301], 1; 2026-02-21T08:12:19.5610608Z // end inline asm 2026-02-21T08:12:19.5610754Z bar.sync 0, 128; 2026-02-21T08:12:19.5610911Z add.s32 %r302, %r232, 98360; 2026-02-21T08:12:19.5611084Z // begin inline asm 2026-02-21T08:12:19.5611261Z @%p31 mbarrier.init.shared::cta.b64 [%r302], 1; 2026-02-21T08:12:19.5611470Z // end inline asm 2026-02-21T08:12:19.5611615Z bar.sync 0, 128; 2026-02-21T08:12:19.5611771Z add.s32 %r303, %r232, 98368; 2026-02-21T08:12:19.5611938Z // begin inline asm 2026-02-21T08:12:19.5612122Z @%p31 mbarrier.init.shared::cta.b64 [%r303], 1; 
2026-02-21T08:12:19.5612322Z // end inline asm 2026-02-21T08:12:19.5612476Z bar.sync 0, 128; 2026-02-21T08:12:19.5612637Z add.s32 %r304, %r232, 98376; 2026-02-21T08:12:19.5612808Z // begin inline asm 2026-02-21T08:12:19.5612995Z @%p31 mbarrier.init.shared::cta.b64 [%r304], 1; 2026-02-21T08:12:19.5613198Z // end inline asm 2026-02-21T08:12:19.5613353Z bar.sync 0, 128; 2026-02-21T08:12:19.5613501Z add.s32 %r305, %r232, 98384; 2026-02-21T08:12:19.5613678Z // begin inline asm 2026-02-21T08:12:19.5613857Z @%p31 mbarrier.init.shared::cta.b64 [%r305], 1; 2026-02-21T08:12:19.5614072Z // end inline asm 2026-02-21T08:12:19.5614355Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5614723Z bar.sync 0, 128; 2026-02-21T08:12:19.5614884Z // begin inline asm 2026-02-21T08:12:19.5615073Z @%p31 mbarrier.arrive.shared::cta.b64 _, [%r296]; 2026-02-21T08:12:19.5615301Z // end inline asm 2026-02-21T08:12:19.5615445Z bar.sync 0, 128; 2026-02-21T08:12:19.5615601Z // begin inline asm 2026-02-21T08:12:19.5615787Z @%p31 mbarrier.arrive.shared::cta.b64 _, [%r297]; 2026-02-21T08:12:19.5616016Z // end inline asm 2026-02-21T08:12:19.5616162Z bar.sync 0, 128; 2026-02-21T08:12:19.5616319Z // begin inline asm 2026-02-21T08:12:19.5616507Z @%p31 mbarrier.arrive.shared::cta.b64 _, [%r298]; 2026-02-21T08:12:19.5616722Z // end inline asm 2026-02-21T08:12:19.5616876Z bar.sync 0, 128; 2026-02-21T08:12:19.5617025Z // begin inline asm 2026-02-21T08:12:19.5617217Z @%p31 mbarrier.arrive.shared::cta.b64 _, [%r299]; 2026-02-21T08:12:19.5617429Z // end inline asm 2026-02-21T08:12:19.5617580Z bar.sync 0, 128; 2026-02-21T08:12:19.5617727Z // begin inline asm 2026-02-21T08:12:19.5617918Z @%p31 mbarrier.arrive.shared::cta.b64 _, [%r300]; 2026-02-21T08:12:19.5618136Z // end inline asm 2026-02-21T08:12:19.5618420Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5618758Z bar.sync 0, 128; 2026-02-21T08:12:19.5618909Z 
add.s32 %r311, %r232, 98400; 2026-02-21T08:12:19.5619089Z // begin inline asm 2026-02-21T08:12:19.5619376Z @%p31 mbarrier.init.shared::cta.b64 [%r311], 1; 2026-02-21T08:12:19.5619593Z // end inline asm 2026-02-21T08:12:19.5619767Z st.shared.b32 [global_smem+98408], 33554689; 2026-02-21T08:12:19.5620003Z st.shared.b32 [global_smem+81920], %r475; 2026-02-21T08:12:19.5620255Z st.shared.v2.b32 [global_smem+81928], {%r410, %r408}; 2026-02-21T08:12:19.5620481Z barrier.sync 1; 2026-02-21T08:12:19.5620645Z barrier.sync 1; 2026-02-21T08:12:19.5620796Z barrier.sync 1; 2026-02-21T08:12:19.5621091Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5621426Z bar.sync 0, 128; 2026-02-21T08:12:19.5621583Z // begin inline asm 2026-02-21T08:12:19.5621733Z 2026-02-21T08:12:19.5621866Z { 2026-02-21T08:12:19.5622001Z .reg .pred complete; 2026-02-21T08:12:19.5622174Z waitLoop: 2026-02-21T08:12:19.5622389Z mbarrier.try_wait.parity.shared.b64 complete, [%r311], %r263; 2026-02-21T08:12:19.5622778Z @!complete bra.uni waitLoop; 2026-02-21T08:12:19.5623008Z } 2026-02-21T08:12:19.5623083Z 2026-02-21T08:12:19.5623143Z // end inline asm 2026-02-21T08:12:19.5623439Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5623779Z bar.sync 0, 128; 2026-02-21T08:12:19.5623938Z // begin inline asm 2026-02-21T08:12:19.5624127Z @%p31 mbarrier.inval.shared::cta.b64 [%r311]; 2026-02-21T08:12:19.5624355Z // end inline asm 2026-02-21T08:12:19.5624525Z // begin inline asm 2026-02-21T08:12:19.5624773Z @%p31 mbarrier.inval.shared::cta.b64 [%r301]; 2026-02-21T08:12:19.5625003Z // end inline asm 2026-02-21T08:12:19.5625155Z bar.sync 0, 128; 2026-02-21T08:12:19.5625317Z // begin inline asm 2026-02-21T08:12:19.5625500Z @%p31 mbarrier.inval.shared::cta.b64 [%r302]; 2026-02-21T08:12:19.5625720Z // end inline asm 2026-02-21T08:12:19.5625870Z bar.sync 0, 128; 2026-02-21T08:12:19.5626029Z // begin inline asm 
2026-02-21T08:12:19.5626221Z @%p31 mbarrier.inval.shared::cta.b64 [%r303]; 2026-02-21T08:12:19.5626433Z // end inline asm 2026-02-21T08:12:19.5626592Z bar.sync 0, 128; 2026-02-21T08:12:19.5626748Z // begin inline asm 2026-02-21T08:12:19.5626939Z @%p31 mbarrier.inval.shared::cta.b64 [%r304]; 2026-02-21T08:12:19.5627147Z // end inline asm 2026-02-21T08:12:19.5627306Z bar.sync 0, 128; 2026-02-21T08:12:19.5627457Z // begin inline asm 2026-02-21T08:12:19.5627648Z @%p31 mbarrier.inval.shared::cta.b64 [%r305]; 2026-02-21T08:12:19.5627856Z // end inline asm 2026-02-21T08:12:19.5628015Z // begin inline asm 2026-02-21T08:12:19.5628202Z @%p31 mbarrier.inval.shared::cta.b64 [%r296]; 2026-02-21T08:12:19.5628410Z // end inline asm 2026-02-21T08:12:19.5628570Z bar.sync 0, 128; 2026-02-21T08:12:19.5628722Z // begin inline asm 2026-02-21T08:12:19.5628909Z @%p31 mbarrier.inval.shared::cta.b64 [%r297]; 2026-02-21T08:12:19.5629116Z // end inline asm 2026-02-21T08:12:19.5629273Z bar.sync 0, 128; 2026-02-21T08:12:19.5629428Z // begin inline asm 2026-02-21T08:12:19.5629618Z @%p31 mbarrier.inval.shared::cta.b64 [%r298]; 2026-02-21T08:12:19.5629837Z // end inline asm 2026-02-21T08:12:19.5629985Z bar.sync 0, 128; 2026-02-21T08:12:19.5630145Z // begin inline asm 2026-02-21T08:12:19.5630326Z @%p31 mbarrier.inval.shared::cta.b64 [%r299]; 2026-02-21T08:12:19.5630543Z // end inline asm 2026-02-21T08:12:19.5630694Z bar.sync 0, 128; 2026-02-21T08:12:19.5630853Z // begin inline asm 2026-02-21T08:12:19.5631034Z @%p31 mbarrier.inval.shared::cta.b64 [%r300]; 2026-02-21T08:12:19.5631253Z // end inline asm 2026-02-21T08:12:19.5631539Z .loc 1 59 45 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:59:45 2026-02-21T08:12:19.5631891Z shl.b32 %r419, %r411, 10; 2026-02-21T08:12:19.5632074Z shl.b32 %r420, %r412, 10; 2026-02-21T08:12:19.5632246Z shl.b32 %r421, %r413, 10; 2026-02-21T08:12:19.5632425Z shl.b32 %r422, %r414, 10; 2026-02-21T08:12:19.5632739Z .loc 1 59 52 // 
cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:59:52 2026-02-21T08:12:19.5633160Z add.s32 %r423, %r419, %r409; 2026-02-21T08:12:19.5633341Z add.s32 %r424, %r420, %r409; 2026-02-21T08:12:19.5633526Z add.s32 %r425, %r421, %r409; 2026-02-21T08:12:19.5633702Z add.s32 %r426, %r422, %r409; 2026-02-21T08:12:19.5634022Z .loc 1 59 24 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:59:24 2026-02-21T08:12:19.5634383Z mad.wide.s32 %rd124, %r423, 2, %rd7; 2026-02-21T08:12:19.5634597Z mad.wide.s32 %rd125, %r424, 2, %rd7; 2026-02-21T08:12:19.5634864Z mad.wide.s32 %rd126, %r425, 2, %rd7; 2026-02-21T08:12:19.5635094Z mad.wide.s32 %rd127, %r426, 2, %rd7; 2026-02-21T08:12:19.5635436Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5635773Z // begin inline asm 2026-02-21T08:12:19.5636299Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340}, [%r262 + 0]; 2026-02-21T08:12:19.5636783Z // end inline asm 2026-02-21T08:12:19.5636941Z // begin inline asm 2026-02-21T08:12:19.5637393Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357}, [%r262 + 16]; 2026-02-21T08:12:19.5637850Z // end inline asm 2026-02-21T08:12:19.5638012Z // begin inline asm 2026-02-21T08:12:19.5638180Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:12:19.5638371Z // end inline asm 2026-02-21T08:12:19.5638535Z cvt.u64.u32 %rd128, %r325; 2026-02-21T08:12:19.5638717Z cvt.u64.u32 %rd129, %r326; 2026-02-21T08:12:19.5638904Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:12:19.5639088Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:12:19.5639410Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5639750Z mov.b64 {%r427, %r428}, %rd131; 2026-02-21T08:12:19.5639956Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 
2026-02-21T08:12:19.5640279Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5640616Z cvt.u64.u32 %rd132, %r327; 2026-02-21T08:12:19.5640799Z cvt.u64.u32 %rd133, %r328; 2026-02-21T08:12:19.5640973Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:12:19.5641160Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:12:19.5641460Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5641790Z mov.b64 {%r430, %r431}, %rd135; 2026-02-21T08:12:19.5641980Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T08:12:19.5642302Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5642636Z cvt.u64.u32 %rd136, %r329; 2026-02-21T08:12:19.5642809Z cvt.u64.u32 %rd137, %r330; 2026-02-21T08:12:19.5642992Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:12:19.5643171Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T08:12:19.5643479Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5643798Z mov.b64 {%r433, %r434}, %rd139; 2026-02-21T08:12:19.5643994Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T08:12:19.5644305Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5644648Z cvt.u64.u32 %rd140, %r331; 2026-02-21T08:12:19.5644870Z cvt.u64.u32 %rd141, %r332; 2026-02-21T08:12:19.5645045Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:12:19.5645243Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:12:19.5645581Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5645956Z mov.b64 {%r436, %r437}, %rd143; 2026-02-21T08:12:19.5646145Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T08:12:19.5646471Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5646800Z cvt.u64.u32 %rd144, %r333; 2026-02-21T08:12:19.5647079Z cvt.u64.u32 %rd145, %r334; 2026-02-21T08:12:19.5647261Z shl.b64 %rd146, %rd145, 32; 
2026-02-21T08:12:19.5647438Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:12:19.5647757Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5648088Z mov.b64 {%r439, %r440}, %rd147; 2026-02-21T08:12:19.5648287Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T08:12:19.5648598Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5648940Z cvt.u64.u32 %rd148, %r335; 2026-02-21T08:12:19.5649144Z cvt.u64.u32 %rd149, %r336; 2026-02-21T08:12:19.5649356Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:12:19.5649584Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:12:19.5649952Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5650460Z mov.b64 {%r442, %r443}, %rd151; 2026-02-21T08:12:19.5650692Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T08:12:19.5651082Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5651415Z cvt.u64.u32 %rd152, %r337; 2026-02-21T08:12:19.5651586Z cvt.u64.u32 %rd153, %r338; 2026-02-21T08:12:19.5651764Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:12:19.5651936Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:12:19.5652238Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5652556Z mov.b64 {%r445, %r446}, %rd155; 2026-02-21T08:12:19.5652754Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T08:12:19.5653066Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5653395Z cvt.u64.u32 %rd156, %r339; 2026-02-21T08:12:19.5653573Z cvt.u64.u32 %rd157, %r340; 2026-02-21T08:12:19.5653747Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:12:19.5653934Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:12:19.5654235Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5654588Z mov.b64 {%r448, %r449}, %rd159; 2026-02-21T08:12:19.5654836Z cvt.rn.f16x2.f32 %r450, 
%r449, %r448; 2026-02-21T08:12:19.5655197Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5655564Z cvt.u64.u32 %rd160, %r342; 2026-02-21T08:12:19.5655753Z cvt.u64.u32 %rd161, %r343; 2026-02-21T08:12:19.5655951Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:12:19.5656147Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:12:19.5656462Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5656787Z mov.b64 {%r451, %r452}, %rd163; 2026-02-21T08:12:19.5657002Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T08:12:19.5657356Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5657734Z cvt.u64.u32 %rd164, %r344; 2026-02-21T08:12:19.5657938Z cvt.u64.u32 %rd165, %r345; 2026-02-21T08:12:19.5658133Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:12:19.5658339Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:12:19.5658696Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5659069Z mov.b64 {%r454, %r455}, %rd167; 2026-02-21T08:12:19.5659274Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T08:12:19.5659645Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5660018Z cvt.u64.u32 %rd168, %r346; 2026-02-21T08:12:19.5660209Z cvt.u64.u32 %rd169, %r347; 2026-02-21T08:12:19.5660408Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:12:19.5660603Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:12:19.5660957Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5661415Z mov.b64 {%r457, %r458}, %rd171; 2026-02-21T08:12:19.5661632Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T08:12:19.5661990Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5662381Z cvt.u64.u32 %rd172, %r348; 2026-02-21T08:12:19.5662585Z cvt.u64.u32 %rd173, %r349; 2026-02-21T08:12:19.5662787Z shl.b64 %rd174, %rd173, 
32; 2026-02-21T08:12:19.5662968Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:12:19.5663272Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5663607Z mov.b64 {%r460, %r461}, %rd175; 2026-02-21T08:12:19.5663791Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T08:12:19.5664108Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5664509Z cvt.u64.u32 %rd176, %r350; 2026-02-21T08:12:19.5664722Z cvt.u64.u32 %rd177, %r351; 2026-02-21T08:12:19.5664905Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:12:19.5665083Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:12:19.5665388Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5665715Z mov.b64 {%r463, %r464}, %rd179; 2026-02-21T08:12:19.5665909Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T08:12:19.5666223Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5666566Z cvt.u64.u32 %rd180, %r352; 2026-02-21T08:12:19.5666746Z cvt.u64.u32 %rd181, %r353; 2026-02-21T08:12:19.5666917Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:12:19.5667099Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:12:19.5667396Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5667728Z mov.b64 {%r466, %r467}, %rd183; 2026-02-21T08:12:19.5667919Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T08:12:19.5668241Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5668583Z cvt.u64.u32 %rd184, %r354; 2026-02-21T08:12:19.5668757Z cvt.u64.u32 %rd185, %r355; 2026-02-21T08:12:19.5668934Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:12:19.5669109Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:12:19.5669415Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5669754Z mov.b64 {%r469, %r470}, %rd187; 2026-02-21T08:12:19.5669944Z cvt.rn.f16x2.f32 %r471, 
%r470, %r469; 2026-02-21T08:12:19.5670253Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5670582Z cvt.u64.u32 %rd188, %r356; 2026-02-21T08:12:19.5670758Z cvt.u64.u32 %rd189, %r357; 2026-02-21T08:12:19.5670928Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:12:19.5671110Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:12:19.5671412Z .loc 1 58 27 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:58:27 2026-02-21T08:12:19.5671769Z mov.b64 {%r472, %r473}, %rd191; 2026-02-21T08:12:19.5671840Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T08:12:19.5672024Z .loc 1 59 82 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:59:82 2026-02-21T08:12:19.5672139Z st.shared.v4.b32 [%r32], {%r429, %r441, %r453, %r465}; 2026-02-21T08:12:19.5672202Z bar.sync 0, 128; 2026-02-21T08:12:19.5672292Z // begin inline asm 2026-02-21T08:12:19.5672522Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r359, %r360, %r361, %r362}, [%r363]; 2026-02-21T08:12:19.5672586Z // end inline asm 2026-02-21T08:12:19.5672650Z bar.sync 0, 128; 2026-02-21T08:12:19.5672764Z st.shared.v4.b32 [%r32], {%r432, %r444, %r456, %r468}; 2026-02-21T08:12:19.5672829Z bar.sync 0, 128; 2026-02-21T08:12:19.5672895Z // begin inline asm 2026-02-21T08:12:19.5673070Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r364, %r365, %r366, %r367}, [%r363]; 2026-02-21T08:12:19.5673211Z // end inline asm 2026-02-21T08:12:19.5673273Z bar.sync 0, 128; 2026-02-21T08:12:19.5673376Z st.shared.v4.b32 [%r32], {%r435, %r447, %r459, %r471}; 2026-02-21T08:12:19.5673446Z bar.sync 0, 128; 2026-02-21T08:12:19.5673511Z // begin inline asm 2026-02-21T08:12:19.5673676Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r369, %r370, %r371, %r372}, [%r363]; 2026-02-21T08:12:19.5673738Z // end inline asm 2026-02-21T08:12:19.5673809Z bar.sync 0, 128; 2026-02-21T08:12:19.5673909Z st.shared.v4.b32 [%r32], {%r438, %r450, %r462, %r474}; 2026-02-21T08:12:19.5673971Z bar.sync 0, 128; 2026-02-21T08:12:19.5674046Z // 
begin inline asm 2026-02-21T08:12:19.5674207Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r374, %r375, %r376, %r377}, [%r363]; 2026-02-21T08:12:19.5674269Z // end inline asm 2026-02-21T08:12:19.5674344Z // begin inline asm 2026-02-21T08:12:19.5674538Z st.global.v4.b32 [ %rd124 + 0 ], { %r359, %r364, %r369, %r374 }; 2026-02-21T08:12:19.5674603Z // end inline asm 2026-02-21T08:12:19.5674665Z // begin inline asm 2026-02-21T08:12:19.5674827Z st.global.v4.b32 [ %rd125 + 0 ], { %r360, %r365, %r370, %r375 }; 2026-02-21T08:12:19.5674891Z // end inline asm 2026-02-21T08:12:19.5674952Z // begin inline asm 2026-02-21T08:12:19.5675069Z st.global.v4.b32 [ %rd126 + 0 ], { %r361, %r366, %r371, %r376 }; 2026-02-21T08:12:19.5675130Z // end inline asm 2026-02-21T08:12:19.5675191Z // begin inline asm 2026-02-21T08:12:19.5675299Z st.global.v4.b32 [ %rd127 + 0 ], { %r362, %r367, %r372, %r377 }; 2026-02-21T08:12:19.5675368Z // end inline asm 2026-02-21T08:12:19.5675569Z .loc 1 30 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:30:52 2026-02-21T08:12:19.5675636Z add.s32 %r483, %r483, 1; 2026-02-21T08:12:19.5675720Z setp.ne.b32 %p78, %r26, %r483; 2026-02-21T08:12:19.5675784Z @%p78 bra $L__BB0_14; 2026-02-21T08:12:19.5675878Z $L__BB0_15: // %._crit_edge 2026-02-21T08:12:19.5676081Z .loc 1 30 4 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:30:4 2026-02-21T08:12:19.5676142Z bar.sync 0, 128; 2026-02-21T08:12:19.5676203Z // begin inline asm 2026-02-21T08:12:19.5676336Z @%p28 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r475, 32; 2026-02-21T08:12:19.5676405Z // end inline asm 2026-02-21T08:12:19.5676493Z st.shared.b32 [global_smem+98408], 50529027; 2026-02-21T08:12:19.5676556Z barrier.sync 1; 2026-02-21T08:12:19.5676655Z $L__BB0_16: // %common.ret 2026-02-21T08:12:19.5676853Z .loc 1 0 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:0 2026-02-21T08:12:19.5676913Z ret; 2026-02-21T08:12:19.5677027Z $L__BB0_1: // %.preheader.preheader 
2026-02-21T08:12:19.5677119Z ld.param.b64 %rd6, [_helion_matmul_param_1]; 2026-02-21T08:12:19.5677311Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19 2026-02-21T08:12:19.5677382Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:12:19.5677455Z and.b16 %rs2, %rs1, 7; 2026-02-21T08:12:19.5677526Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:12:19.5677592Z mov.b32 %r37, global_smem; 2026-02-21T08:12:19.5677667Z add.s32 %r38, %r37, %r3; 2026-02-21T08:12:19.5677733Z add.s32 %r131, %r37, 81920; 2026-02-21T08:12:19.5677797Z bra.uni $L__BB0_2; 2026-02-21T08:12:19.5677912Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5678123Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5678186Z barrier.sync 1; 2026-02-21T08:12:19.5678250Z barrier.sync 1; 2026-02-21T08:12:19.5678346Z $L__BB0_2: // %.preheader 2026-02-21T08:12:19.5678449Z // =>This Loop Header: Depth=1 2026-02-21T08:12:19.5678549Z // Child Loop BB0_9 Depth 2 2026-02-21T08:12:19.5678709Z // Child Loop BB0_6 Depth 2 2026-02-21T08:12:19.5678897Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19 2026-02-21T08:12:19.5678991Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:12:19.5679060Z barrier.sync 1; 2026-02-21T08:12:19.5679136Z ld.shared.b8 %r36, [%r38+98404]; 2026-02-21T08:12:19.5679208Z setp.gt.u32 %p3, %r36, 3; 2026-02-21T08:12:19.5679273Z @%p3 bra $L__BB0_4; 2026-02-21T08:12:19.5679369Z // %bb.3: // %.preheader 2026-02-21T08:12:19.5679470Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5679542Z $L_brx_0: .branchtargets 2026-02-21T08:12:19.5679609Z $L__BB0_5, 2026-02-21T08:12:19.5679668Z $L__BB0_8, 2026-02-21T08:12:19.5679726Z $L__BB0_11, 2026-02-21T08:12:19.5679783Z $L__BB0_16; 2026-02-21T08:12:19.5679859Z brx.idx %r36, $L_brx_0; 2026-02-21T08:12:19.5680519Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5680733Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 
2026-02-21T08:12:19.5680828Z ld.shared.b32 %r167, [global_smem+81920]; 2026-02-21T08:12:19.5680911Z ld.shared.b32 %r132, [global_smem+81932]; 2026-02-21T08:12:19.5680976Z barrier.sync 1; 2026-02-21T08:12:19.5681178Z .loc 1 42 45 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:42:45 2026-02-21T08:12:19.5681244Z and.b32 %r133, %r1, 24; 2026-02-21T08:12:19.5681311Z bfe.u32 %r134, %r1, 3, 2; 2026-02-21T08:12:19.5681506Z .loc 1 50 48 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:50:48 2026-02-21T08:12:19.5681579Z shl.b32 %r135, %r1, 3; 2026-02-21T08:12:19.5681645Z and.b32 %r136, %r135, 56; 2026-02-21T08:12:19.5681838Z .loc 1 42 32 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:42:32 2026-02-21T08:12:19.5681916Z add.s32 %r137, %r132, %r134; 2026-02-21T08:12:19.5681982Z shl.b32 %r138, %r137, 10; 2026-02-21T08:12:19.5682184Z .loc 1 55 80 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:80 2026-02-21T08:12:19.5682262Z add.s32 %r139, %r138, 4096; 2026-02-21T08:12:19.5682327Z add.s32 %r140, %r138, 8192; 2026-02-21T08:12:19.5682395Z add.s32 %r141, %r138, 12288; 2026-02-21T08:12:19.5682462Z add.s32 %r142, %r138, 16384; 2026-02-21T08:12:19.5682534Z add.s32 %r143, %r138, 20480; 2026-02-21T08:12:19.5682598Z add.s32 %r144, %r138, 24576; 2026-02-21T08:12:19.5682660Z add.s32 %r145, %r138, 28672; 2026-02-21T08:12:19.5682864Z .loc 1 55 59 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:59 2026-02-21T08:12:19.5682929Z or.b32 %r146, %r138, %r136; 2026-02-21T08:12:19.5682992Z or.b32 %r147, %r139, %r136; 2026-02-21T08:12:19.5683054Z or.b32 %r148, %r140, %r136; 2026-02-21T08:12:19.5683128Z or.b32 %r149, %r141, %r136; 2026-02-21T08:12:19.5683194Z or.b32 %r150, %r142, %r136; 2026-02-21T08:12:19.5683255Z or.b32 %r151, %r143, %r136; 2026-02-21T08:12:19.5683325Z or.b32 %r152, %r144, %r136; 2026-02-21T08:12:19.5683387Z or.b32 %r153, %r145, %r136; 2026-02-21T08:12:19.5683576Z .loc 1 55 34 // 
cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:34 2026-02-21T08:12:19.5683656Z mad.wide.s32 %rd12, %r146, 2, %rd6; 2026-02-21T08:12:19.5683730Z mad.wide.s32 %rd16, %r147, 2, %rd6; 2026-02-21T08:12:19.5683800Z mad.wide.s32 %rd13, %r148, 2, %rd6; 2026-02-21T08:12:19.5683871Z mad.wide.s32 %rd17, %r149, 2, %rd6; 2026-02-21T08:12:19.5683950Z mad.wide.s32 %rd14, %r150, 2, %rd6; 2026-02-21T08:12:19.5684019Z mad.wide.s32 %rd18, %r151, 2, %rd6; 2026-02-21T08:12:19.5684087Z mad.wide.s32 %rd15, %r152, 2, %rd6; 2026-02-21T08:12:19.5684179Z mad.wide.s32 %rd19, %r153, 2, %rd6; 2026-02-21T08:12:19.5684383Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5684503Z shl.b32 %r154, %r1, 4; 2026-02-21T08:12:19.5684580Z and.b32 %r155, %r154, 496; 2026-02-21T08:12:19.5684648Z shl.b32 %r156, %r133, 1; 2026-02-21T08:12:19.5684771Z xor.b32 %r6, %r155, %r156; 2026-02-21T08:12:19.5684837Z add.s32 %r63, %r131, %r6; 2026-02-21T08:12:19.5684907Z mov.b32 %r64, 16; 2026-02-21T08:12:19.5684971Z // begin inline asm 2026-02-21T08:12:19.5685108Z cp.async.cg.shared.global [ %r63 + 0 ], [ %rd12 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5685178Z // end inline asm 2026-02-21T08:12:19.5685244Z add.s32 %r65, %r63, 1024; 2026-02-21T08:12:19.5685306Z // begin inline asm 2026-02-21T08:12:19.5685433Z cp.async.cg.shared.global [ %r65 + 0 ], [ %rd13 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5685502Z // end inline asm 2026-02-21T08:12:19.5685566Z add.s32 %r67, %r63, 2048; 2026-02-21T08:12:19.5685630Z // begin inline asm 2026-02-21T08:12:19.5685762Z cp.async.cg.shared.global [ %r67 + 0 ], [ %rd14 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5685886Z // end inline asm 2026-02-21T08:12:19.5685955Z add.s32 %r69, %r63, 3072; 2026-02-21T08:12:19.5686018Z // begin inline asm 2026-02-21T08:12:19.5686146Z cp.async.cg.shared.global [ %r69 + 0 ], [ %rd15 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5686206Z // end inline asm 2026-02-21T08:12:19.5686270Z xor.b32 %r7, %r6, 64; 
2026-02-21T08:12:19.5686344Z add.s32 %r157, %r131, %r7; 2026-02-21T08:12:19.5686409Z add.s32 %r71, %r157, 512; 2026-02-21T08:12:19.5686474Z // begin inline asm 2026-02-21T08:12:19.5686604Z cp.async.cg.shared.global [ %r71 + 0 ], [ %rd16 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5686667Z // end inline asm 2026-02-21T08:12:19.5686736Z add.s32 %r73, %r157, 1536; 2026-02-21T08:12:19.5686799Z // begin inline asm 2026-02-21T08:12:19.5686925Z cp.async.cg.shared.global [ %r73 + 0 ], [ %rd17 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5686986Z // end inline asm 2026-02-21T08:12:19.5687050Z add.s32 %r75, %r157, 2560; 2026-02-21T08:12:19.5687120Z // begin inline asm 2026-02-21T08:12:19.5687241Z cp.async.cg.shared.global [ %r75 + 0 ], [ %rd18 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5687304Z // end inline asm 2026-02-21T08:12:19.5687368Z add.s32 %r77, %r157, 3584; 2026-02-21T08:12:19.5687437Z // begin inline asm 2026-02-21T08:12:19.5687554Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd19 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5687614Z // end inline asm 2026-02-21T08:12:19.5687691Z cp.async.commit_group; 2026-02-21T08:12:19.5687887Z .loc 1 55 34 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:34 2026-02-21T08:12:19.5687953Z cvt.s64.s32 %rd45, %r138; 2026-02-21T08:12:19.5688023Z cvt.u64.u32 %rd46, %r136; 2026-02-21T08:12:19.5688088Z or.b64 %rd47, %rd45, %rd46; 2026-02-21T08:12:19.5688154Z shl.b64 %rd48, %rd47, 1; 2026-02-21T08:12:19.5688219Z add.s64 %rd49, %rd6, %rd48; 2026-02-21T08:12:19.5688291Z add.s64 %rd20, %rd49, 128; 2026-02-21T08:12:19.5688354Z cvt.s64.s32 %rd50, %r139; 2026-02-21T08:12:19.5688420Z or.b64 %rd51, %rd50, %rd46; 2026-02-21T08:12:19.5688492Z shl.b64 %rd52, %rd51, 1; 2026-02-21T08:12:19.5688556Z add.s64 %rd53, %rd6, %rd52; 2026-02-21T08:12:19.5688619Z add.s64 %rd24, %rd53, 128; 2026-02-21T08:12:19.5688684Z cvt.s64.s32 %rd54, %r140; 2026-02-21T08:12:19.5688755Z or.b64 %rd55, %rd54, %rd46; 2026-02-21T08:12:19.5688818Z shl.b64 %rd56, %rd55, 1; 
2026-02-21T08:12:19.5688883Z add.s64 %rd57, %rd6, %rd56; 2026-02-21T08:12:19.5688954Z add.s64 %rd21, %rd57, 128; 2026-02-21T08:12:19.5689018Z cvt.s64.s32 %rd58, %r141; 2026-02-21T08:12:19.5689082Z or.b64 %rd59, %rd58, %rd46; 2026-02-21T08:12:19.5689145Z shl.b64 %rd60, %rd59, 1; 2026-02-21T08:12:19.5689217Z add.s64 %rd61, %rd6, %rd60; 2026-02-21T08:12:19.5689281Z add.s64 %rd25, %rd61, 128; 2026-02-21T08:12:19.5689346Z cvt.s64.s32 %rd62, %r142; 2026-02-21T08:12:19.5689415Z or.b64 %rd63, %rd62, %rd46; 2026-02-21T08:12:19.5689478Z shl.b64 %rd64, %rd63, 1; 2026-02-21T08:12:19.5689542Z add.s64 %rd65, %rd6, %rd64; 2026-02-21T08:12:19.5689608Z add.s64 %rd22, %rd65, 128; 2026-02-21T08:12:19.5689740Z cvt.s64.s32 %rd66, %r143; 2026-02-21T08:12:19.5689806Z or.b64 %rd67, %rd66, %rd46; 2026-02-21T08:12:19.5689870Z shl.b64 %rd68, %rd67, 1; 2026-02-21T08:12:19.5689942Z add.s64 %rd69, %rd6, %rd68; 2026-02-21T08:12:19.5690007Z add.s64 %rd26, %rd69, 128; 2026-02-21T08:12:19.5690071Z cvt.s64.s32 %rd70, %r144; 2026-02-21T08:12:19.5690134Z or.b64 %rd71, %rd70, %rd46; 2026-02-21T08:12:19.5690206Z shl.b64 %rd72, %rd71, 1; 2026-02-21T08:12:19.5690271Z add.s64 %rd73, %rd6, %rd72; 2026-02-21T08:12:19.5690335Z add.s64 %rd23, %rd73, 128; 2026-02-21T08:12:19.5690408Z cvt.s64.s32 %rd74, %r145; 2026-02-21T08:12:19.5690473Z or.b64 %rd75, %rd74, %rd46; 2026-02-21T08:12:19.5690538Z shl.b64 %rd76, %rd75, 1; 2026-02-21T08:12:19.5690610Z add.s64 %rd77, %rd6, %rd76; 2026-02-21T08:12:19.5690676Z add.s64 %rd27, %rd77, 128; 2026-02-21T08:12:19.5690917Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5690992Z bar.warp.sync -1; 2026-02-21T08:12:19.5691068Z add.s32 %r158, %r37, 86016; 2026-02-21T08:12:19.5691132Z add.s32 %r79, %r158, %r6; 2026-02-21T08:12:19.5691194Z // begin inline asm 2026-02-21T08:12:19.5691322Z cp.async.cg.shared.global [ %r79 + 0 ], [ %rd20 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5691385Z // end inline asm 2026-02-21T08:12:19.5691447Z 
add.s32 %r81, %r79, 1024; 2026-02-21T08:12:19.5691509Z // begin inline asm 2026-02-21T08:12:19.5691640Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd21 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5691702Z // end inline asm 2026-02-21T08:12:19.5691769Z add.s32 %r83, %r79, 2048; 2026-02-21T08:12:19.5691843Z // begin inline asm 2026-02-21T08:12:19.5691973Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd22 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5692037Z // end inline asm 2026-02-21T08:12:19.5692104Z add.s32 %r85, %r79, 3072; 2026-02-21T08:12:19.5692178Z // begin inline asm 2026-02-21T08:12:19.5692309Z cp.async.cg.shared.global [ %r85 + 0 ], [ %rd23 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5692374Z // end inline asm 2026-02-21T08:12:19.5692446Z add.s32 %r159, %r158, %r7; 2026-02-21T08:12:19.5692509Z add.s32 %r87, %r159, 512; 2026-02-21T08:12:19.5692571Z // begin inline asm 2026-02-21T08:12:19.5692703Z cp.async.cg.shared.global [ %r87 + 0 ], [ %rd24 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5692764Z // end inline asm 2026-02-21T08:12:19.5692827Z add.s32 %r89, %r159, 1536; 2026-02-21T08:12:19.5692889Z // begin inline asm 2026-02-21T08:12:19.5693013Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd25 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5693074Z // end inline asm 2026-02-21T08:12:19.5693138Z add.s32 %r91, %r159, 2560; 2026-02-21T08:12:19.5693208Z // begin inline asm 2026-02-21T08:12:19.5693337Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd26 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5693398Z // end inline asm 2026-02-21T08:12:19.5693461Z add.s32 %r93, %r159, 3584; 2026-02-21T08:12:19.5693531Z // begin inline asm 2026-02-21T08:12:19.5693650Z cp.async.cg.shared.global [ %r93 + 0 ], [ %rd27 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5693715Z // end inline asm 2026-02-21T08:12:19.5693794Z cp.async.commit_group; 2026-02-21T08:12:19.5693986Z .loc 1 55 34 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:34 2026-02-21T08:12:19.5694052Z add.s64 %rd28, %rd49, 256; 
2026-02-21T08:12:19.5694124Z add.s64 %rd32, %rd53, 256; 2026-02-21T08:12:19.5694189Z add.s64 %rd29, %rd57, 256; 2026-02-21T08:12:19.5694253Z add.s64 %rd33, %rd61, 256; 2026-02-21T08:12:19.5694315Z add.s64 %rd30, %rd65, 256; 2026-02-21T08:12:19.5694386Z add.s64 %rd34, %rd69, 256; 2026-02-21T08:12:19.5694449Z add.s64 %rd31, %rd73, 256; 2026-02-21T08:12:19.5694513Z add.s64 %rd35, %rd77, 256; 2026-02-21T08:12:19.5694758Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5694826Z bar.warp.sync -1; 2026-02-21T08:12:19.5694893Z add.s32 %r160, %r37, 90112; 2026-02-21T08:12:19.5694959Z add.s32 %r95, %r160, %r6; 2026-02-21T08:12:19.5695107Z // begin inline asm 2026-02-21T08:12:19.5695234Z cp.async.cg.shared.global [ %r95 + 0 ], [ %rd28 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5695295Z // end inline asm 2026-02-21T08:12:19.5695366Z add.s32 %r97, %r95, 1024; 2026-02-21T08:12:19.5695429Z // begin inline asm 2026-02-21T08:12:19.5695548Z cp.async.cg.shared.global [ %r97 + 0 ], [ %rd29 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5695610Z // end inline asm 2026-02-21T08:12:19.5695681Z add.s32 %r99, %r95, 2048; 2026-02-21T08:12:19.5695746Z // begin inline asm 2026-02-21T08:12:19.5695868Z cp.async.cg.shared.global [ %r99 + 0 ], [ %rd30 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5695938Z // end inline asm 2026-02-21T08:12:19.5696002Z add.s32 %r101, %r95, 3072; 2026-02-21T08:12:19.5696065Z // begin inline asm 2026-02-21T08:12:19.5696200Z cp.async.cg.shared.global [ %r101 + 0 ], [ %rd31 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5696261Z // end inline asm 2026-02-21T08:12:19.5696380Z add.s32 %r161, %r160, %r7; 2026-02-21T08:12:19.5696449Z add.s32 %r103, %r161, 512; 2026-02-21T08:12:19.5696521Z // begin inline asm 2026-02-21T08:12:19.5696656Z cp.async.cg.shared.global [ %r103 + 0 ], [ %rd32 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5696716Z // end inline asm 2026-02-21T08:12:19.5696786Z add.s32 %r105, %r161, 1536; 2026-02-21T08:12:19.5696848Z // begin inline 
asm 2026-02-21T08:12:19.5696980Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd33 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5697041Z // end inline asm 2026-02-21T08:12:19.5697111Z add.s32 %r107, %r161, 2560; 2026-02-21T08:12:19.5697174Z // begin inline asm 2026-02-21T08:12:19.5697302Z cp.async.cg.shared.global [ %r107 + 0 ], [ %rd34 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5697370Z // end inline asm 2026-02-21T08:12:19.5697433Z add.s32 %r109, %r161, 3584; 2026-02-21T08:12:19.5697494Z // begin inline asm 2026-02-21T08:12:19.5697622Z cp.async.cg.shared.global [ %r109 + 0 ], [ %rd35 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5697692Z // end inline asm 2026-02-21T08:12:19.5697763Z cp.async.commit_group; 2026-02-21T08:12:19.5697966Z .loc 1 55 34 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:34 2026-02-21T08:12:19.5698039Z add.s64 %rd36, %rd49, 384; 2026-02-21T08:12:19.5698103Z add.s64 %rd40, %rd53, 384; 2026-02-21T08:12:19.5698166Z add.s64 %rd37, %rd57, 384; 2026-02-21T08:12:19.5698237Z add.s64 %rd41, %rd61, 384; 2026-02-21T08:12:19.5698301Z add.s64 %rd38, %rd65, 384; 2026-02-21T08:12:19.5698363Z add.s64 %rd42, %rd69, 384; 2026-02-21T08:12:19.5698426Z add.s64 %rd39, %rd73, 384; 2026-02-21T08:12:19.5698515Z add.s64 %rd43, %rd77, 384; 2026-02-21T08:12:19.5698717Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5698785Z bar.warp.sync -1; 2026-02-21T08:12:19.5698857Z add.s32 %r162, %r37, 94208; 2026-02-21T08:12:19.5698921Z add.s32 %r111, %r162, %r6; 2026-02-21T08:12:19.5698986Z // begin inline asm 2026-02-21T08:12:19.5699112Z cp.async.cg.shared.global [ %r111 + 0 ], [ %rd36 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5699182Z // end inline asm 2026-02-21T08:12:19.5699245Z add.s32 %r113, %r111, 1024; 2026-02-21T08:12:19.5699306Z // begin inline asm 2026-02-21T08:12:19.5699442Z cp.async.cg.shared.global [ %r113 + 0 ], [ %rd37 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5699503Z // end inline asm 2026-02-21T08:12:19.5699566Z 
add.s32 %r115, %r111, 2048; 2026-02-21T08:12:19.5699641Z // begin inline asm 2026-02-21T08:12:19.5699771Z cp.async.cg.shared.global [ %r115 + 0 ], [ %rd38 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5699833Z // end inline asm 2026-02-21T08:12:19.5699898Z add.s32 %r117, %r111, 3072; 2026-02-21T08:12:19.5699969Z // begin inline asm 2026-02-21T08:12:19.5700097Z cp.async.cg.shared.global [ %r117 + 0 ], [ %rd39 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5700161Z // end inline asm 2026-02-21T08:12:19.5700235Z add.s32 %r163, %r162, %r7; 2026-02-21T08:12:19.5700301Z add.s32 %r119, %r163, 512; 2026-02-21T08:12:19.5700416Z // begin inline asm 2026-02-21T08:12:19.5700548Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd40 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5700618Z // end inline asm 2026-02-21T08:12:19.5700682Z add.s32 %r121, %r163, 1536; 2026-02-21T08:12:19.5700744Z // begin inline asm 2026-02-21T08:12:19.5700879Z cp.async.cg.shared.global [ %r121 + 0 ], [ %rd41 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5700939Z // end inline asm 2026-02-21T08:12:19.5701003Z add.s32 %r123, %r163, 2560; 2026-02-21T08:12:19.5701065Z // begin inline asm 2026-02-21T08:12:19.5701202Z cp.async.cg.shared.global [ %r123 + 0 ], [ %rd42 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5701263Z // end inline asm 2026-02-21T08:12:19.5701326Z add.s32 %r125, %r163, 3584; 2026-02-21T08:12:19.5701397Z // begin inline asm 2026-02-21T08:12:19.5701521Z cp.async.cg.shared.global [ %r125 + 0 ], [ %rd43 + 0 ], 0x10, %r64; 2026-02-21T08:12:19.5701583Z // end inline asm 2026-02-21T08:12:19.5701703Z cp.async.commit_group; 2026-02-21T08:12:19.5701907Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5701973Z add.s32 %r164, %r4, %r138; 2026-02-21T08:12:19.5702039Z cvt.u64.u32 %rd1, %r164; 2026-02-21T08:12:19.5702110Z mov.pred %p80, 0; 2026-02-21T08:12:19.5702171Z mov.b32 %r478, 0; 2026-02-21T08:12:19.5702229Z mov.b32 %r477, 3; 2026-02-21T08:12:19.5702300Z mov.b32 %r476, -1; 
2026-02-21T08:12:19.5702362Z mov.b64 %rd192, 0; 2026-02-21T08:12:19.5702423Z mov.b32 %r479, %r478; 2026-02-21T08:12:19.5702533Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:19.5702646Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:19.5702720Z setp.lt.u64 %p22, %rd192, 768; 2026-02-21T08:12:19.5702785Z add.s32 %r191, %r476, 1; 2026-02-21T08:12:19.5702860Z setp.gt.s32 %p23, %r191, 3; 2026-02-21T08:12:19.5702933Z selp.b32 %r476, 0, %r191, %p23; 2026-02-21T08:12:19.5703000Z shl.b32 %r192, %r479, 3; 2026-02-21T08:12:19.5703075Z add.s32 %r194, %r37, %r192; 2026-02-21T08:12:19.5703139Z add.s32 %r195, %r194, 98304; 2026-02-21T08:12:19.5703205Z add.s32 %r165, %r194, 98352; 2026-02-21T08:12:19.5703404Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5703475Z shl.b32 %r196, %r479, 14; 2026-02-21T08:12:19.5703539Z add.s32 %r197, %r37, %r196; 2026-02-21T08:12:19.5703739Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5703818Z cp.async.wait_group 3; 2026-02-21T08:12:19.5703885Z bar.warp.sync -1; 2026-02-21T08:12:19.5703949Z shl.b32 %r198, %r476, 12; 2026-02-21T08:12:19.5704016Z add.s32 %r200, %r131, %r198; 2026-02-21T08:12:19.5704220Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5704283Z // begin inline asm 2026-02-21T08:12:19.5704343Z 2026-02-21T08:12:19.5704406Z { 2026-02-21T08:12:19.5704479Z .reg .pred complete; 2026-02-21T08:12:19.5704541Z waitLoop: 2026-02-21T08:12:19.5704705Z mbarrier.try_wait.parity.shared.b64 complete, [%r165], %r478; 2026-02-21T08:12:19.5704788Z @!complete bra.uni waitLoop; 2026-02-21T08:12:19.5704844Z } 2026-02-21T08:12:19.5704848Z 2026-02-21T08:12:19.5704908Z // end inline asm 2026-02-21T08:12:19.5705109Z .loc 1 56 52 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:56:52 2026-02-21T08:12:19.5705182Z setp.eq.b64 %p24, %rd192, 960; 2026-02-21T08:12:19.5705252Z elect.sync 
%r201|%p12, -1; 2026-02-21T08:12:19.5705325Z bfe.u32 %r202, %r197, 4, 14; 2026-02-21T08:12:19.5705391Z cvt.u64.u32 %rd96, %r202; 2026-02-21T08:12:19.5705469Z or.b64 %rd78, %rd96, 4611686293372403712; 2026-02-21T08:12:19.5705534Z bfe.u32 %r203, %r200, 4, 14; 2026-02-21T08:12:19.5705607Z cvt.u64.u32 %rd97, %r203; 2026-02-21T08:12:19.5705681Z or.b64 %rd79, %rd97, 4611686293322072064; 2026-02-21T08:12:19.5705747Z mov.b32 %r168, 134742032; 2026-02-21T08:12:19.5705881Z // begin inline asm 2026-02-21T08:12:19.5706050Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r167 + 0 ], %rd78, %rd79, %r168, %p80; 2026-02-21T08:12:19.5706111Z // end inline asm 2026-02-21T08:12:19.5706183Z add.s32 %r204, %r197, 32; 2026-02-21T08:12:19.5706254Z bfe.u32 %r205, %r204, 4, 14; 2026-02-21T08:12:19.5706319Z cvt.u64.u32 %rd98, %r205; 2026-02-21T08:12:19.5706393Z or.b64 %rd80, %rd98, 4611686293372403712; 2026-02-21T08:12:19.5706464Z add.s32 %r206, %r200, 32; 2026-02-21T08:12:19.5706527Z bfe.u32 %r207, %r206, 4, 14; 2026-02-21T08:12:19.5706592Z cvt.u64.u32 %rd99, %r207; 2026-02-21T08:12:19.5706671Z or.b64 %rd81, %rd99, 4611686293322072064; 2026-02-21T08:12:19.5706739Z mov.pred %p80, -1; 2026-02-21T08:12:19.5706802Z // begin inline asm 2026-02-21T08:12:19.5706959Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r167 + 0 ], %rd80, %rd81, %r168, %p80; 2026-02-21T08:12:19.5707027Z // end inline asm 2026-02-21T08:12:19.5707145Z add.s32 %r208, %r197, 64; 2026-02-21T08:12:19.5707214Z bfe.u32 %r209, %r208, 4, 14; 2026-02-21T08:12:19.5707288Z cvt.u64.u32 %rd100, %r209; 2026-02-21T08:12:19.5707369Z or.b64 %rd82, %rd100, 4611686293372403712; 2026-02-21T08:12:19.5707433Z add.s32 %r210, %r200, 64; 2026-02-21T08:12:19.5707503Z bfe.u32 %r211, %r210, 4, 14; 2026-02-21T08:12:19.5707568Z cvt.u64.u32 %rd101, %r211; 2026-02-21T08:12:19.5707645Z or.b64 %rd83, %rd101, 4611686293322072064; 2026-02-21T08:12:19.5707709Z // begin inline asm 2026-02-21T08:12:19.5707869Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r167 + 0 ], 
%rd82, %rd83, %r168, %p80; 2026-02-21T08:12:19.5707932Z // end inline asm 2026-02-21T08:12:19.5707993Z add.s32 %r212, %r197, 96; 2026-02-21T08:12:19.5708065Z bfe.u32 %r213, %r212, 4, 14; 2026-02-21T08:12:19.5708130Z cvt.u64.u32 %rd102, %r213; 2026-02-21T08:12:19.5708204Z or.b64 %rd84, %rd102, 4611686293372403712; 2026-02-21T08:12:19.5708268Z add.s32 %r214, %r200, 96; 2026-02-21T08:12:19.5708343Z bfe.u32 %r215, %r214, 4, 14; 2026-02-21T08:12:19.5708413Z cvt.u64.u32 %rd103, %r215; 2026-02-21T08:12:19.5708489Z or.b64 %rd85, %rd103, 4611686293322072064; 2026-02-21T08:12:19.5708561Z // begin inline asm 2026-02-21T08:12:19.5708712Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r167 + 0 ], %rd84, %rd85, %r168, %p80; 2026-02-21T08:12:19.5708777Z // end inline asm 2026-02-21T08:12:19.5708853Z cvt.u64.u32 %rd86, %r195; 2026-02-21T08:12:19.5708915Z // begin inline asm 2026-02-21T08:12:19.5709059Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T08:12:19.5709120Z // end inline asm 2026-02-21T08:12:19.5709198Z and.pred %p20, %p24, %p12; 2026-02-21T08:12:19.5709264Z add.s32 %r216, %r37, 98400; 2026-02-21T08:12:19.5709327Z cvt.u64.u32 %rd87, %r216; 2026-02-21T08:12:19.5709399Z // begin inline asm 2026-02-21T08:12:19.5709536Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd87]; 2026-02-21T08:12:19.5709597Z // end inline asm 2026-02-21T08:12:19.5709808Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5709877Z add.s32 %r217, %r479, 1; 2026-02-21T08:12:19.5709945Z setp.eq.b32 %p25, %r217, 5; 2026-02-21T08:12:19.5710016Z selp.b32 %r479, 0, %r217, %p25; 2026-02-21T08:12:19.5710091Z selp.b32 %r218, 1, 0, %p25; 2026-02-21T08:12:19.5710156Z xor.b32 %r478, %r478, %r218; 2026-02-21T08:12:19.5710354Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5710427Z add.s32 %r219, %r477, 1; 2026-02-21T08:12:19.5710494Z setp.gt.s32 %p26, %r219, 3; 
2026-02-21T08:12:19.5710562Z selp.b32 %r477, 0, %r219, %p26; 2026-02-21T08:12:19.5710764Z .loc 1 55 59 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:59 2026-02-21T08:12:19.5710836Z add.s64 %rd104, %rd1, %rd192; 2026-02-21T08:12:19.5710900Z cvt.u32.u64 %r220, %rd104; 2026-02-21T08:12:19.5710966Z add.s32 %r221, %r220, 256; 2026-02-21T08:12:19.5711214Z .loc 1 55 34 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:34 2026-02-21T08:12:19.5711290Z mad.wide.s32 %rd88, %r221, 2, %rd6; 2026-02-21T08:12:19.5711354Z add.s32 %r222, %r220, 4352; 2026-02-21T08:12:19.5711435Z mad.wide.s32 %rd92, %r222, 2, %rd6; 2026-02-21T08:12:19.5711498Z add.s32 %r223, %r220, 8448; 2026-02-21T08:12:19.5711568Z mad.wide.s32 %rd89, %r223, 2, %rd6; 2026-02-21T08:12:19.5711632Z add.s32 %r224, %r220, 12544; 2026-02-21T08:12:19.5711708Z mad.wide.s32 %rd93, %r224, 2, %rd6; 2026-02-21T08:12:19.5711773Z add.s32 %r225, %r220, 16640; 2026-02-21T08:12:19.5711841Z mad.wide.s32 %rd90, %r225, 2, %rd6; 2026-02-21T08:12:19.5711914Z add.s32 %r226, %r220, 20736; 2026-02-21T08:12:19.5711983Z mad.wide.s32 %rd94, %r226, 2, %rd6; 2026-02-21T08:12:19.5712047Z add.s32 %r227, %r220, 24832; 2026-02-21T08:12:19.5712124Z mad.wide.s32 %rd91, %r227, 2, %rd6; 2026-02-21T08:12:19.5712234Z add.s32 %r228, %r220, 28928; 2026-02-21T08:12:19.5712308Z mad.wide.s32 %rd95, %r228, 2, %rd6; 2026-02-21T08:12:19.5712499Z .loc 1 55 87 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:55:87 2026-02-21T08:12:19.5712572Z shl.b32 %r229, %r477, 12; 2026-02-21T08:12:19.5712636Z add.s32 %r230, %r131, %r229; 2026-02-21T08:12:19.5712702Z bar.warp.sync -1; 2026-02-21T08:12:19.5712774Z add.s32 %r175, %r230, %r6; 2026-02-21T08:12:19.5712840Z selp.b32 %r176, 16, 0, %p22; 2026-02-21T08:12:19.5712901Z // begin inline asm 2026-02-21T08:12:19.5713035Z cp.async.cg.shared.global [ %r175 + 0 ], [ %rd88 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5713100Z // end inline asm 2026-02-21T08:12:19.5713164Z add.s32 %r177, %r175, 
1024; 2026-02-21T08:12:19.5713225Z // begin inline asm 2026-02-21T08:12:19.5713359Z cp.async.cg.shared.global [ %r177 + 0 ], [ %rd89 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5713420Z // end inline asm 2026-02-21T08:12:19.5713483Z add.s32 %r179, %r175, 2048; 2026-02-21T08:12:19.5713553Z // begin inline asm 2026-02-21T08:12:19.5713680Z cp.async.cg.shared.global [ %r179 + 0 ], [ %rd90 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5713740Z // end inline asm 2026-02-21T08:12:19.5713803Z add.s32 %r181, %r175, 3072; 2026-02-21T08:12:19.5713873Z // begin inline asm 2026-02-21T08:12:19.5713996Z cp.async.cg.shared.global [ %r181 + 0 ], [ %rd91 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5714057Z // end inline asm 2026-02-21T08:12:19.5714127Z add.s32 %r231, %r230, %r7; 2026-02-21T08:12:19.5714190Z add.s32 %r183, %r231, 512; 2026-02-21T08:12:19.5714251Z // begin inline asm 2026-02-21T08:12:19.5714372Z cp.async.cg.shared.global [ %r183 + 0 ], [ %rd92 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5714441Z // end inline asm 2026-02-21T08:12:19.5714504Z add.s32 %r185, %r231, 1536; 2026-02-21T08:12:19.5714566Z // begin inline asm 2026-02-21T08:12:19.5714732Z cp.async.cg.shared.global [ %r185 + 0 ], [ %rd93 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5714792Z // end inline asm 2026-02-21T08:12:19.5714858Z add.s32 %r187, %r231, 2560; 2026-02-21T08:12:19.5714922Z // begin inline asm 2026-02-21T08:12:19.5715051Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd94 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5715112Z // end inline asm 2026-02-21T08:12:19.5715175Z add.s32 %r189, %r231, 3584; 2026-02-21T08:12:19.5715245Z // begin inline asm 2026-02-21T08:12:19.5715367Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd95 + 0 ], 0x10, %r176; 2026-02-21T08:12:19.5715427Z // end inline asm 2026-02-21T08:12:19.5715505Z cp.async.commit_group; 2026-02-21T08:12:19.5715709Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5715776Z add.s64 %rd3, %rd192, 64; 
2026-02-21T08:12:19.5715848Z setp.lt.u64 %p27, %rd192, 960; 2026-02-21T08:12:19.5715920Z mov.b64 %rd192, %rd3; 2026-02-21T08:12:19.5715985Z @%p27 bra $L__BB0_6; 2026-02-21T08:12:19.5716096Z // %bb.7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5716177Z cp.async.wait_group 0; 2026-02-21T08:12:19.5716329Z bar.warp.sync -1; 2026-02-21T08:12:19.5716393Z barrier.sync 1; 2026-02-21T08:12:19.5716454Z bra.uni $L__BB0_2; 2026-02-21T08:12:19.5716574Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5716773Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5716860Z ld.shared.b32 %r54, [global_smem+81928]; 2026-02-21T08:12:19.5716933Z barrier.sync 1; 2026-02-21T08:12:19.5717125Z .loc 1 21 67 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:21:67 2026-02-21T08:12:19.5717194Z mov.u32 %r41, %ctaid.x; 2026-02-21T08:12:19.5717271Z mov.u32 %r42, %ctaid.y; 2026-02-21T08:12:19.5717335Z mov.u32 %r43, %ctaid.z; 2026-02-21T08:12:19.5717401Z mov.u32 %r44, %nctaid.x; 2026-02-21T08:12:19.5717482Z mov.u32 %r45, %nctaid.y; 2026-02-21T08:12:19.5717566Z mad.lo.s32 %r46, %r43, %r45, %r42; 2026-02-21T08:12:19.5717696Z mad.lo.s32 %r47, %r46, %r44, %r41; 2026-02-21T08:12:19.5717769Z shl.b32 %r48, %r47, 7; 2026-02-21T08:12:19.5717844Z cvt.s64.s32 %rd9, %r48; 2026-02-21T08:12:19.5717912Z add.s64 %rd10, %rd8, %rd9; 2026-02-21T08:12:19.5717984Z cvta.global.u64 %rd11, %rd10; 2026-02-21T08:12:19.5718050Z add.s32 %r17, %r1, -128; 2026-02-21T08:12:19.5718119Z mov.b32 %r481, 0; 2026-02-21T08:12:19.5718188Z mov.b32 %r480, -64; 2026-02-21T08:12:19.5718250Z mov.b32 %r482, %r481; 2026-02-21T08:12:19.5718371Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:19.5718477Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:19.5718670Z .loc 1 0 67 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:0:67 2026-02-21T08:12:19.5718749Z setp.lt.u32 %p6, %r17, 32; 2026-02-21T08:12:19.5718815Z setp.eq.b32 %p4, %r17, 0; 
2026-02-21T08:12:19.5719016Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5719084Z add.s32 %r480, %r480, 64; 2026-02-21T08:12:19.5719155Z shl.b32 %r56, %r482, 3; 2026-02-21T08:12:19.5719220Z add.s32 %r58, %r37, %r56; 2026-02-21T08:12:19.5719282Z add.s32 %r49, %r58, 98304; 2026-02-21T08:12:19.5719480Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5719543Z // begin inline asm 2026-02-21T08:12:19.5719599Z 2026-02-21T08:12:19.5719665Z { 2026-02-21T08:12:19.5719732Z .reg .pred complete; 2026-02-21T08:12:19.5719793Z waitLoop: 2026-02-21T08:12:19.5719926Z mbarrier.try_wait.parity.shared.b64 complete, [%r49], %r481; 2026-02-21T08:12:19.5720006Z @!complete bra.uni waitLoop; 2026-02-21T08:12:19.5720061Z } 2026-02-21T08:12:19.5720065Z 2026-02-21T08:12:19.5720128Z // end inline asm 2026-02-21T08:12:19.5720330Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5720396Z add.s32 %r55, %r58, 98352; 2026-02-21T08:12:19.5720588Z .loc 1 54 31 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:54:31 2026-02-21T08:12:19.5720658Z bar.sync 3, 64; 2026-02-21T08:12:19.5720721Z // begin inline asm 2026-02-21T08:12:19.5720844Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r55], 16384; 2026-02-21T08:12:19.5720905Z // end inline asm 2026-02-21T08:12:19.5720979Z shl.b32 %r59, %r482, 14; 2026-02-21T08:12:19.5721042Z add.s32 %r52, %r37, %r59; 2026-02-21T08:12:19.5721103Z bar.sync 3, 64; 2026-02-21T08:12:19.5721182Z elect.sync %r60|%p7, -1; 2026-02-21T08:12:19.5721253Z and.pred %p5, %p6, %p7; 2026-02-21T08:12:19.5721316Z // begin inline asm 2026-02-21T08:12:19.5721601Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r52], [%rd11, {%r480, %r54}], [%r55]; 2026-02-21T08:12:19.5721669Z // end inline asm 2026-02-21T08:12:19.5721731Z add.s32 %r61, %r482, 1; 2026-02-21T08:12:19.5721801Z setp.eq.b32 %p8, 
%r61, 5; 2026-02-21T08:12:19.5721929Z selp.b32 %r482, 0, %r61, %p8; 2026-02-21T08:12:19.5721995Z selp.b32 %r62, 1, 0, %p8; 2026-02-21T08:12:19.5722059Z xor.b32 %r481, %r481, %r62; 2026-02-21T08:12:19.5722266Z .loc 1 49 112 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:49:112 2026-02-21T08:12:19.5722337Z setp.lt.u32 %p9, %r480, 960; 2026-02-21T08:12:19.5722400Z @%p9 bra $L__BB0_9; 2026-02-21T08:12:19.5722511Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5722583Z barrier.sync 1; 2026-02-21T08:12:19.5722644Z bra.uni $L__BB0_2; 2026-02-21T08:12:19.5722753Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:19.5722952Z .loc 1 19 0 // cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py:19 2026-02-21T08:12:19.5723014Z barrier.sync 1; 2026-02-21T08:12:19.5723075Z barrier.sync 1; 2026-02-21T08:12:19.5723136Z bra.uni $L__BB0_2; 2026-02-21T08:12:19.5723248Z $L__tmp1: 2026-02-21T08:12:19.5723315Z $L__func_end0: 2026-02-21T08:12:19.5723405Z // -- End function 2026-02-21T08:12:19.5723469Z } 2026-02-21T08:12:19.5723704Z .file 1 "/tmp/torchinductor_root/vb/cvbmath37ddchklaflor5qqqhl6e6okhztmmxarnjy42h3kia4an.py" 2026-02-21T08:12:19.5723774Z .section .debug_abbrev 2026-02-21T08:12:19.5723838Z { 2026-02-21T08:12:19.5723938Z .b8 1 // Abbreviation Code 2026-02-21T08:12:19.5724037Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:12:19.5724127Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:12:19.5724224Z .b8 37 // DW_AT_producer 2026-02-21T08:12:19.5724307Z .b8 8 // DW_FORM_string 2026-02-21T08:12:19.5724388Z .b8 19 // DW_AT_language 2026-02-21T08:12:19.5724481Z .b8 5 // DW_FORM_data2 2026-02-21T08:12:19.5724567Z .b8 3 // DW_AT_name 2026-02-21T08:12:19.5724651Z .b8 8 // DW_FORM_string 2026-02-21T08:12:19.5724782Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:12:19.5724869Z .b8 6 // DW_FORM_data4 2026-02-21T08:12:19.5724954Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:12:19.5725096Z .b8 8 // DW_FORM_string 2026-02-21T08:12:19.5725177Z .b8 0 // EOM(1) 
2026-02-21T08:12:19.5725254Z .b8 0 // EOM(2) 2026-02-21T08:12:19.5725336Z .b8 0 // EOM(3) 2026-02-21T08:12:19.5725394Z } 2026-02-21T08:12:19.5725463Z .section .debug_info 2026-02-21T08:12:19.5725527Z { 2026-02-21T08:12:19.5725619Z .b32 104 // Length of Unit 2026-02-21T08:12:19.5725719Z .b8 2 // DWARF version number 2026-02-21T08:12:19.5725779Z .b8 0 2026-02-21T08:12:19.5725917Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:12:19.5726021Z .b8 8 // Address Size (in bytes) 2026-02-21T08:12:19.5726136Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:12:19.5726238Z .b8 116 // DW_AT_producer 2026-02-21T08:12:19.5726300Z .b8 114 2026-02-21T08:12:19.5726361Z .b8 105 2026-02-21T08:12:19.5726419Z .b8 116 2026-02-21T08:12:19.5726491Z .b8 111 2026-02-21T08:12:19.5726551Z .b8 110 2026-02-21T08:12:19.5726611Z .b8 0 2026-02-21T08:12:19.5726706Z .b8 2 // DW_AT_language 2026-02-21T08:12:19.5726768Z .b8 0 2026-02-21T08:12:19.5726857Z .b8 99 // DW_AT_name 2026-02-21T08:12:19.5726915Z .b8 118 2026-02-21T08:12:19.5726982Z .b8 98 2026-02-21T08:12:19.5727038Z .b8 109 2026-02-21T08:12:19.5727095Z .b8 97 2026-02-21T08:12:19.5727160Z .b8 116 2026-02-21T08:12:19.5727279Z .b8 104 2026-02-21T08:12:19.5727336Z .b8 51 2026-02-21T08:12:19.5727393Z .b8 55 2026-02-21T08:12:19.5727457Z .b8 100 2026-02-21T08:12:19.5727513Z .b8 100 2026-02-21T08:12:19.5727569Z .b8 99 2026-02-21T08:12:19.5727631Z .b8 104 2026-02-21T08:12:19.5727688Z .b8 107 2026-02-21T08:12:19.5727745Z .b8 108 2026-02-21T08:12:19.5727800Z .b8 97 2026-02-21T08:12:19.5727864Z .b8 102 2026-02-21T08:12:19.5727920Z .b8 108 2026-02-21T08:12:19.5727977Z .b8 111 2026-02-21T08:12:19.5728039Z .b8 114 2026-02-21T08:12:19.5728095Z .b8 53 2026-02-21T08:12:19.5728151Z .b8 113 2026-02-21T08:12:19.5728206Z .b8 113 2026-02-21T08:12:19.5728270Z .b8 113 2026-02-21T08:12:19.5728326Z .b8 104 2026-02-21T08:12:19.5728380Z .b8 108 2026-02-21T08:12:19.5728435Z .b8 54 2026-02-21T08:12:19.5728498Z .b8 101 
2026-02-21T08:12:19.5728554Z .b8 54 2026-02-21T08:12:19.5728610Z .b8 111 2026-02-21T08:12:19.5728670Z .b8 107 2026-02-21T08:12:19.5728726Z .b8 104 2026-02-21T08:12:19.5728832Z .b8 122 2026-02-21T08:12:19.5728890Z .b8 116 2026-02-21T08:12:19.5728957Z .b8 109 2026-02-21T08:12:19.5729014Z .b8 109 2026-02-21T08:12:19.5729070Z .b8 120 2026-02-21T08:12:19.5729134Z .b8 97 2026-02-21T08:12:19.5729190Z .b8 114 2026-02-21T08:12:19.5729245Z .b8 110 2026-02-21T08:12:19.5729301Z .b8 106 2026-02-21T08:12:19.5729363Z .b8 121 2026-02-21T08:12:19.5729418Z .b8 52 2026-02-21T08:12:19.5729472Z .b8 50 2026-02-21T08:12:19.5729526Z .b8 104 2026-02-21T08:12:19.5729588Z .b8 51 2026-02-21T08:12:19.5729643Z .b8 107 2026-02-21T08:12:19.5729697Z .b8 105 2026-02-21T08:12:19.5729758Z .b8 97 2026-02-21T08:12:19.5729813Z .b8 52 2026-02-21T08:12:19.5729869Z .b8 97 2026-02-21T08:12:19.5729923Z .b8 110 2026-02-21T08:12:19.5729984Z .b8 46 2026-02-21T08:12:19.5730041Z .b8 112 2026-02-21T08:12:19.5730096Z .b8 121 2026-02-21T08:12:19.5730157Z .b8 0 2026-02-21T08:12:19.5730261Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:12:19.5730348Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:12:19.5730405Z .b8 116 2026-02-21T08:12:19.5730469Z .b8 109 2026-02-21T08:12:19.5730524Z .b8 112 2026-02-21T08:12:19.5730578Z .b8 47 2026-02-21T08:12:19.5730640Z .b8 116 2026-02-21T08:12:19.5730696Z .b8 111 2026-02-21T08:12:19.5730751Z .b8 114 2026-02-21T08:12:19.5730805Z .b8 99 2026-02-21T08:12:19.5730867Z .b8 104 2026-02-21T08:12:19.5730922Z .b8 105 2026-02-21T08:12:19.5730978Z .b8 110 2026-02-21T08:12:19.5731039Z .b8 100 2026-02-21T08:12:19.5731093Z .b8 117 2026-02-21T08:12:19.5731148Z .b8 99 2026-02-21T08:12:19.5731204Z .b8 116 2026-02-21T08:12:19.5731266Z .b8 111 2026-02-21T08:12:19.5731321Z .b8 114 2026-02-21T08:12:19.5731376Z .b8 95 2026-02-21T08:12:19.5731431Z .b8 114 2026-02-21T08:12:19.5731493Z .b8 111 2026-02-21T08:12:19.5731549Z .b8 111 2026-02-21T08:12:19.5731603Z .b8 116 2026-02-21T08:12:19.5731667Z .b8 47 
2026-02-21T08:12:19.5731722Z .b8 118 2026-02-21T08:12:19.5731777Z .b8 98 2026-02-21T08:12:19.5731832Z .b8 0 2026-02-21T08:12:19.5731895Z } 2026-02-21T08:12:19.5731973Z .section .debug_macinfo { } 2026-02-21T08:12:19.5731980Z 2026-02-21T08:12:19.5732069Z ================================================================ 2026-02-21T08:12:19.5732193Z please share the reproducer above with Triton project. 2026-02-21T08:12:21.4175143Z 2026-02-21T08:12:21.4177413Z [20s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:12:21.4178955Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=6, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T08:12:21.4180443Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:12:21.4180740Z `ptxas` stderr: 2026-02-21T08:12:21.4181260Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 122 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:12:21.4182170Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:12:21.4182342Z 2026-02-21T08:12:21.4182818Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjxk8bnvv.ptx -o /tmp/tmpjxk8bnvv.ptx.o 2026-02-21T08:12:21.4183351Z 2026-02-21T08:12:21.4183499Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:12:21.4183733Z 2026-02-21T08:12:21.4183736Z 2026-02-21T08:12:21.4183834Z ================================================================ 2026-02-21T08:12:21.4184080Z Internal Triton PTX codegen error 2026-02-21T08:12:21.4184287Z `ptxas` stderr: 2026-02-21T08:12:21.4186354Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 122 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:12:21.4186918Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:12:21.4187102Z 2026-02-21T08:12:21.4187540Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjxk8bnvv.ptx -o /tmp/tmpjxk8bnvv.ptx.o 2026-02-21T08:12:21.4188057Z 2026-02-21T08:12:21.4188061Z 2026-02-21T08:12:21.4188136Z // 2026-02-21T08:12:21.4188302Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:12:21.4188511Z // 2026-02-21T08:12:21.4188591Z 2026-02-21T08:12:21.4188659Z .version 8.7 2026-02-21T08:12:21.4188824Z .target sm_100a 2026-02-21T08:12:21.4188983Z .address_size 64 2026-02-21T08:12:21.4189088Z 2026-02-21T08:12:21.4189227Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:12:21.4189515Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:12:21.4189768Z // @_helion_matmul 2026-02-21T08:12:21.4190009Z .visible .entry _helion_matmul( 2026-02-21T08:12:21.4190432Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:12:21.4190730Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:12:21.4191015Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:12:21.4191301Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:12:21.4191587Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:12:21.4191831Z ) 2026-02-21T08:12:21.4191975Z .reqntid 256 2026-02-21T08:12:21.4192127Z .maxnreg 32 2026-02-21T08:12:21.4192282Z 
{ 2026-02-21T08:12:21.4192428Z .reg .pred %p<28>; 2026-02-21T08:12:21.4192612Z .reg .b16 %rs<12>; 2026-02-21T08:12:21.4192771Z .reg .b32 %r<359>; 2026-02-21T08:12:21.4192935Z .reg .b64 %rd<164>; 2026-02-21T08:12:21.4193099Z $L__func_begin0: 2026-02-21T08:12:21.4193206Z 2026-02-21T08:12:21.4193270Z // %bb.0: 2026-02-21T08:12:21.4193554Z .loc 1 14 0 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:14 2026-02-21T08:12:21.4193889Z mov.u32 %r1, %tid.x; 2026-02-21T08:12:21.4194066Z shr.u32 %r2, %r1, 5; 2026-02-21T08:12:21.4194251Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:12:21.4194478Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:12:21.4194659Z @%p1 bra $L__BB0_14; 2026-02-21T08:12:21.4194895Z bra.uni $L__BB0_1; 2026-02-21T08:12:21.4195047Z $L__BB0_14: 2026-02-21T08:12:21.4195221Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:12:21.4195437Z setp.lt.u32 %p20, %r1, 32; 2026-02-21T08:12:21.4195632Z mov.b32 %r248, global_smem; 2026-02-21T08:12:21.4195815Z // begin inline asm 2026-02-21T08:12:21.4196089Z @%p20 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r248], 32; 2026-02-21T08:12:21.4196381Z // end inline asm 2026-02-21T08:12:21.4196536Z bar.sync 0, 128; 2026-02-21T08:12:21.4196718Z ld.shared.b32 %r355, [global_smem]; 2026-02-21T08:12:21.4196919Z bar.sync 0, 128; 2026-02-21T08:12:21.4197091Z // begin inline asm 2026-02-21T08:12:21.4197421Z @%p20 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:12:21.4197696Z // end inline asm 2026-02-21T08:12:21.4197988Z .loc 1 21 30 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:21:30 2026-02-21T08:12:21.4198319Z mov.u32 %r249, %ctaid.x; 2026-02-21T08:12:21.4198625Z .loc 1 21 35 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:21:35 2026-02-21T08:12:21.4198948Z shl.b32 %r358, %r249, 1; 2026-02-21T08:12:21.4199246Z .loc 1 22 37 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:22:37 2026-02-21T08:12:21.4199568Z add.s32 %r250, %r358, 2; 
2026-02-21T08:12:21.4199858Z .loc 1 22 49 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:22:49 2026-02-21T08:12:21.4200194Z min.s32 %r26, %r250, 2048; 2026-02-21T08:12:21.4200560Z .loc 1 23 43 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:23:43 2026-02-21T08:12:21.4200892Z setp.ge.s32 %p22, %r358, %r26; 2026-02-21T08:12:21.4201077Z @%p22 bra $L__BB0_17; 2026-02-21T08:12:21.4201304Z // %bb.15: // %.lr.ph 2026-02-21T08:12:21.4201642Z .loc 1 0 43 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:0:43 2026-02-21T08:12:21.4201987Z ld.param.b64 %rd16, [_helion_matmul_param_2]; 2026-02-21T08:12:21.4202226Z bfe.u32 %r27, %r1, 1, 6; 2026-02-21T08:12:21.4202402Z or.b32 %r28, %r27, 64; 2026-02-21T08:12:21.4202587Z and.b32 %r29, %r1, 1; 2026-02-21T08:12:21.4202756Z shl.b32 %r30, %r29, 3; 2026-02-21T08:12:21.4202934Z shl.b32 %r251, %r1, 4; 2026-02-21T08:12:21.4203101Z and.b32 %r252, %r251, 1968; 2026-02-21T08:12:21.4203285Z bfe.s32 %r253, %r1, 2, 1; 2026-02-21T08:12:21.4203455Z and.b32 %r254, %r253, 2112; 2026-02-21T08:12:21.4203635Z or.b32 %r255, %r254, %r252; 2026-02-21T08:12:21.4203815Z add.s32 %r31, %r248, %r255; 2026-02-21T08:12:21.4203992Z xor.b32 %r257, %r255, 64; 2026-02-21T08:12:21.4204175Z add.s32 %r32, %r248, %r257; 2026-02-21T08:12:21.4204344Z shl.b32 %r258, %r1, 3; 2026-02-21T08:12:21.4204520Z and.b32 %r259, %r258, 944; 2026-02-21T08:12:21.4204806Z shl.b32 %r260, %r29, 6; 2026-02-21T08:12:21.4204988Z bfe.s32 %r261, %r1, 3, 1; 2026-02-21T08:12:21.4205157Z and.b32 %r262, %r261, 2112; 2026-02-21T08:12:21.4205336Z or.b32 %r263, %r259, %r260; 2026-02-21T08:12:21.4205512Z xor.b32 %r264, %r263, %r262; 2026-02-21T08:12:21.4205697Z add.s32 %r33, %r248, %r264; 2026-02-21T08:12:21.4205880Z setp.eq.b32 %p24, %r1, 0; 2026-02-21T08:12:21.4206111Z $L__BB0_16: // =>This Inner Loop Header: Depth=1 2026-02-21T08:12:21.4206490Z .loc 1 29 35 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:29:35 2026-02-21T08:12:21.4206806Z shr.s32 %r311, 
%r358, 31; 2026-02-21T08:12:21.4206983Z shr.u32 %r312, %r311, 25; 2026-02-21T08:12:21.4207153Z add.s32 %r313, %r358, %r312; 2026-02-21T08:12:21.4207466Z .loc 1 32 64 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:32:64 2026-02-21T08:12:21.4207795Z and.b32 %r314, %r313, 65408; 2026-02-21T08:12:21.4207967Z sub.s32 %r315, %r358, %r314; 2026-02-21T08:12:21.4208151Z cvt.u16.u32 %rs3, %r315; 2026-02-21T08:12:21.4208323Z and.b16 %rs4, %rs3, 128; 2026-02-21T08:12:21.4208614Z .loc 1 33 51 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:33:51 2026-02-21T08:12:21.4208930Z shr.u16 %rs5, %rs4, 7; 2026-02-21T08:12:21.4209104Z add.s16 %rs6, %rs3, %rs5; 2026-02-21T08:12:21.4209276Z cvt.s16.s8 %rs7, %rs6; 2026-02-21T08:12:21.4209450Z shr.s16 %rs8, %rs7, 1; 2026-02-21T08:12:21.4209745Z .loc 1 32 64 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:32:64 2026-02-21T08:12:21.4210062Z and.b16 %rs9, %rs6, 254; 2026-02-21T08:12:21.4210243Z sub.s16 %rs10, %rs3, %rs9; 2026-02-21T08:12:21.4210537Z .loc 1 34 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:34:27 2026-02-21T08:12:21.4210935Z shl.b32 %r316, %r313, 1; 2026-02-21T08:12:21.4211108Z and.b32 %r317, %r316, -256; 2026-02-21T08:12:21.4211290Z cvt.s16.s8 %rs11, %rs10; 2026-02-21T08:12:21.4211473Z mad.wide.s16 %r318, %rs11, 128, %r317; 2026-02-21T08:12:21.4211806Z .loc 1 35 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:35:32 2026-02-21T08:12:21.4212137Z or.b32 %r319, %r318, %r27; 2026-02-21T08:12:21.4212316Z or.b32 %r320, %r318, %r28; 2026-02-21T08:12:21.4212612Z .loc 1 36 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:36:27 2026-02-21T08:12:21.4212931Z mul.wide.s16 %r321, %rs8, 16; 2026-02-21T08:12:21.4213243Z .loc 1 37 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:37:32 2026-02-21T08:12:21.4213570Z or.b32 %r322, %r321, %r30; 2026-02-21T08:12:21.4213967Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 
2026-02-21T08:12:21.4214316Z shfl.sync.idx.b32 %r323, %r2, 0, 31, -1; 2026-02-21T08:12:21.4214524Z shl.b32 %r324, %r323, 21; 2026-02-21T08:12:21.4214742Z and.b32 %r325, %r324, 6291456; 2026-02-21T08:12:21.4214929Z add.s32 %r265, %r325, %r355; 2026-02-21T08:12:21.4215114Z mov.pred %p23, -1; 2026-02-21T08:12:21.4215278Z mov.b32 %r266, 0; 2026-02-21T08:12:21.4215444Z // begin inline asm 2026-02-21T08:12:21.4215873Z @%p23 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r265 + 0], {%r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266, %r266}; 2026-02-21T08:12:21.4216328Z // end inline asm 2026-02-21T08:12:21.4216490Z // begin inline asm 2026-02-21T08:12:21.4216665Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:12:21.4216858Z // end inline asm 2026-02-21T08:12:21.4217007Z bar.sync 0, 128; 2026-02-21T08:12:21.4217300Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4217632Z add.s32 %r282, %r248, 46080; 2026-02-21T08:12:21.4217820Z // begin inline asm 2026-02-21T08:12:21.4218021Z @%p24 mbarrier.init.shared::cta.b64 [%r282], 1; 2026-02-21T08:12:21.4218236Z // end inline asm 2026-02-21T08:12:21.4218420Z st.shared.b32 [global_smem+46088], 33619968; 2026-02-21T08:12:21.4218644Z st.shared.b32 [global_smem], %r355; 2026-02-21T08:12:21.4218881Z st.shared.v2.b32 [global_smem+8], {%r318, %r321}; 2026-02-21T08:12:21.4219103Z barrier.sync 1; 2026-02-21T08:12:21.4219264Z barrier.sync 1; 2026-02-21T08:12:21.4219414Z barrier.sync 1; 2026-02-21T08:12:21.4219691Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4220006Z bar.sync 0, 128; 2026-02-21T08:12:21.4220158Z // begin inline asm 2026-02-21T08:12:21.4220313Z 2026-02-21T08:12:21.4220440Z { 2026-02-21T08:12:21.4220615Z .reg .pred complete; 2026-02-21T08:12:21.4220777Z waitLoop: 2026-02-21T08:12:21.4221002Z mbarrier.try_wait.parity.shared.b64 complete, [%r282], %r266; 
2026-02-21T08:12:21.4221273Z @!complete bra.uni waitLoop; 2026-02-21T08:12:21.4221452Z } 2026-02-21T08:12:21.4221524Z 2026-02-21T08:12:21.4221587Z // end inline asm 2026-02-21T08:12:21.4221884Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4222226Z bar.sync 0, 128; 2026-02-21T08:12:21.4222379Z // begin inline asm 2026-02-21T08:12:21.4222579Z @%p24 mbarrier.inval.shared::cta.b64 [%r282]; 2026-02-21T08:12:21.4222792Z // end inline asm 2026-02-21T08:12:21.4223073Z .loc 1 52 45 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:52:45 2026-02-21T08:12:21.4223403Z shl.b32 %r327, %r319, 10; 2026-02-21T08:12:21.4223592Z shl.b32 %r328, %r320, 10; 2026-02-21T08:12:21.4223882Z .loc 1 52 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:52:52 2026-02-21T08:12:21.4224216Z add.s32 %r329, %r327, %r322; 2026-02-21T08:12:21.4224407Z add.s32 %r330, %r328, %r322; 2026-02-21T08:12:21.4224801Z .loc 1 52 24 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:52:24 2026-02-21T08:12:21.4225135Z mad.wide.s32 %rd129, %r329, 2, %rd16; 2026-02-21T08:12:21.4225359Z mad.wide.s32 %rd130, %r330, 2, %rd16; 2026-02-21T08:12:21.4225742Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4226186Z // begin inline asm 2026-02-21T08:12:21.4226690Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301}, [%r265 + 0]; 2026-02-21T08:12:21.4227238Z // end inline asm 2026-02-21T08:12:21.4227402Z // begin inline asm 2026-02-21T08:12:21.4227581Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:12:21.4227761Z // end inline asm 2026-02-21T08:12:21.4227925Z cvt.u64.u32 %rd131, %r286; 2026-02-21T08:12:21.4228102Z cvt.u64.u32 %rd132, %r287; 2026-02-21T08:12:21.4228345Z shl.b64 %rd133, %rd132, 32; 2026-02-21T08:12:21.4228538Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T08:12:21.4228835Z .loc 1 51 27 // 
c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4229164Z mov.b64 {%r331, %r332}, %rd134; 2026-02-21T08:12:21.4229358Z cvt.rn.f16x2.f32 %r333, %r332, %r331; 2026-02-21T08:12:21.4229671Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4229981Z cvt.u64.u32 %rd135, %r288; 2026-02-21T08:12:21.4230162Z cvt.u64.u32 %rd136, %r289; 2026-02-21T08:12:21.4230332Z shl.b64 %rd137, %rd136, 32; 2026-02-21T08:12:21.4230521Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T08:12:21.4230824Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4231146Z mov.b64 {%r334, %r335}, %rd138; 2026-02-21T08:12:21.4231342Z cvt.rn.f16x2.f32 %r336, %r335, %r334; 2026-02-21T08:12:21.4231647Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4231962Z cvt.u64.u32 %rd139, %r290; 2026-02-21T08:12:21.4232131Z cvt.u64.u32 %rd140, %r291; 2026-02-21T08:12:21.4232307Z shl.b64 %rd141, %rd140, 32; 2026-02-21T08:12:21.4232489Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T08:12:21.4232779Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4233097Z mov.b64 {%r337, %r338}, %rd142; 2026-02-21T08:12:21.4233283Z cvt.rn.f16x2.f32 %r339, %r338, %r337; 2026-02-21T08:12:21.4233589Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4233898Z cvt.u64.u32 %rd143, %r292; 2026-02-21T08:12:21.4234076Z cvt.u64.u32 %rd144, %r293; 2026-02-21T08:12:21.4234242Z shl.b64 %rd145, %rd144, 32; 2026-02-21T08:12:21.4234423Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T08:12:21.4234745Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4235057Z mov.b64 {%r340, %r341}, %rd146; 2026-02-21T08:12:21.4235252Z cvt.rn.f16x2.f32 %r342, %r341, %r340; 2026-02-21T08:12:21.4235554Z .loc 1 49 52 // 
c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4235871Z cvt.u64.u32 %rd147, %r294; 2026-02-21T08:12:21.4236042Z cvt.u64.u32 %rd148, %r295; 2026-02-21T08:12:21.4236220Z shl.b64 %rd149, %rd148, 32; 2026-02-21T08:12:21.4236400Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T08:12:21.4236696Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4237028Z mov.b64 {%r343, %r344}, %rd150; 2026-02-21T08:12:21.4237218Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T08:12:21.4237528Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4237849Z cvt.u64.u32 %rd151, %r296; 2026-02-21T08:12:21.4238093Z cvt.u64.u32 %rd152, %r297; 2026-02-21T08:12:21.4238261Z shl.b64 %rd153, %rd152, 32; 2026-02-21T08:12:21.4238440Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T08:12:21.4238732Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4239043Z mov.b64 {%r346, %r347}, %rd154; 2026-02-21T08:12:21.4239237Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T08:12:21.4239538Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4239864Z cvt.u64.u32 %rd155, %r298; 2026-02-21T08:12:21.4240037Z cvt.u64.u32 %rd156, %r299; 2026-02-21T08:12:21.4240214Z shl.b64 %rd157, %rd156, 32; 2026-02-21T08:12:21.4240395Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T08:12:21.4240690Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4241081Z mov.b64 {%r349, %r350}, %rd158; 2026-02-21T08:12:21.4241273Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:12:21.4241585Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4241904Z cvt.u64.u32 %rd159, %r300; 2026-02-21T08:12:21.4242084Z cvt.u64.u32 %rd160, %r301; 2026-02-21T08:12:21.4242254Z shl.b64 %rd161, %rd160, 32; 2026-02-21T08:12:21.4242436Z or.b64 %rd162, %rd159, 
%rd161; 2026-02-21T08:12:21.4242735Z .loc 1 51 27 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:51:27 2026-02-21T08:12:21.4243046Z mov.b64 {%r352, %r353}, %rd162; 2026-02-21T08:12:21.4243241Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:12:21.4243544Z .loc 1 52 82 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:52:82 2026-02-21T08:12:21.4243914Z st.shared.v4.b32 [%r31], {%r333, %r336, %r339, %r342}; 2026-02-21T08:12:21.4244177Z st.shared.v4.b32 [%r32], {%r345, %r348, %r351, %r354}; 2026-02-21T08:12:21.4244406Z bar.sync 0, 128; 2026-02-21T08:12:21.4244615Z ld.shared.v4.b32 {%r307, %r308, %r309, %r310}, [%r33+1024]; 2026-02-21T08:12:21.4244925Z ld.shared.v4.b32 {%r303, %r304, %r305, %r306}, [%r33]; 2026-02-21T08:12:21.4245153Z // begin inline asm 2026-02-21T08:12:21.4245363Z st.global.v4.b32 [ %rd129 + 0 ], { %r303, %r304, %r305, %r306 }; 2026-02-21T08:12:21.4245607Z // end inline asm 2026-02-21T08:12:21.4245760Z // begin inline asm 2026-02-21T08:12:21.4245972Z st.global.v4.b32 [ %rd130 + 0 ], { %r307, %r308, %r309, %r310 }; 2026-02-21T08:12:21.4246202Z // end inline asm 2026-02-21T08:12:21.4246482Z .loc 1 23 43 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:23:43 2026-02-21T08:12:21.4246826Z add.s32 %r358, %r358, 1; 2026-02-21T08:12:21.4247039Z setp.ne.b32 %p26, %r26, %r358; 2026-02-21T08:12:21.4247284Z @%p26 bra $L__BB0_16; 2026-02-21T08:12:21.4247517Z $L__BB0_17: // %._crit_edge 2026-02-21T08:12:21.4247927Z .loc 1 23 4 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:23:4 2026-02-21T08:12:21.4248277Z bar.sync 0, 128; 2026-02-21T08:12:21.4248439Z // begin inline asm 2026-02-21T08:12:21.4248676Z @%p20 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r355, 32; 2026-02-21T08:12:21.4248924Z // end inline asm 2026-02-21T08:12:21.4249107Z st.shared.b32 [global_smem+46088], 50529027; 2026-02-21T08:12:21.4249317Z barrier.sync 1; 2026-02-21T08:12:21.4249512Z $L__BB0_18: // %common.ret 2026-02-21T08:12:21.4249856Z .loc 1 0 0 
// c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:0 2026-02-21T08:12:21.4250172Z ret; 2026-02-21T08:12:21.4250355Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:12:21.4250620Z ld.param.b64 %rd15, [_helion_matmul_param_1]; 2026-02-21T08:12:21.4250870Z ld.param.b64 %rd14, [_helion_matmul_param_0]; 2026-02-21T08:12:21.4251209Z .loc 1 14 0 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:14 2026-02-21T08:12:21.4251686Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:12:21.4251871Z and.b16 %rs2, %rs1, 3; 2026-02-21T08:12:21.4252057Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:12:21.4252387Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4252733Z or.b32 %r5, %r4, 192; 2026-02-21T08:12:21.4252909Z mov.b32 %r38, global_smem; 2026-02-21T08:12:21.4253103Z add.s32 %r39, %r38, %r3; 2026-02-21T08:12:21.4253286Z bra.uni $L__BB0_2; 2026-02-21T08:12:21.4253505Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4253889Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4254213Z barrier.sync 1; 2026-02-21T08:12:21.4254386Z barrier.sync 1; 2026-02-21T08:12:21.4254572Z $L__BB0_2: // %.preheader 2026-02-21T08:12:21.4254946Z // =>This Loop Header: Depth=1 2026-02-21T08:12:21.4255213Z // Child Loop BB0_8 Depth 2 2026-02-21T08:12:21.4255548Z .loc 1 14 0 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:14 2026-02-21T08:12:21.4255901Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:12:21.4256107Z barrier.sync 1; 2026-02-21T08:12:21.4256279Z ld.shared.b8 %r37, [%r39+46084]; 2026-02-21T08:12:21.4256474Z setp.gt.u32 %p2, %r37, 3; 2026-02-21T08:12:21.4256656Z @%p2 bra $L__BB0_4; 2026-02-21T08:12:21.4256842Z // %bb.3: // %.preheader 2026-02-21T08:12:21.4257099Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4257340Z $L_brx_0: .branchtargets 2026-02-21T08:12:21.4257507Z $L__BB0_5, 2026-02-21T08:12:21.4257654Z $L__BB0_12, 2026-02-21T08:12:21.4257796Z 
$L__BB0_13, 2026-02-21T08:12:21.4257944Z $L__BB0_18; 2026-02-21T08:12:21.4258090Z brx.idx %r37, $L_brx_0; 2026-02-21T08:12:21.4258295Z $L__BB0_5: // %.peel.begin 2026-02-21T08:12:21.4258540Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4258902Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4259245Z ld.shared.b32 %r160, [global_smem]; 2026-02-21T08:12:21.4259474Z ld.shared.v2.b32 {%r131, %r132}, [global_smem+8]; 2026-02-21T08:12:21.4259704Z barrier.sync 1; 2026-02-21T08:12:21.4260030Z .loc 1 35 45 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:35:45 2026-02-21T08:12:21.4260389Z add.s32 %r133, %r1, -128; 2026-02-21T08:12:21.4260564Z shr.u32 %r7, %r133, 5; 2026-02-21T08:12:21.4260744Z bfe.u32 %r134, %r1, 2, 4; 2026-02-21T08:12:21.4261037Z .loc 1 43 48 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:43:48 2026-02-21T08:12:21.4261374Z shl.b32 %r135, %r1, 3; 2026-02-21T08:12:21.4261558Z and.b32 %r136, %r135, 24; 2026-02-21T08:12:21.4261851Z .loc 1 35 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:35:32 2026-02-21T08:12:21.4262187Z add.s32 %r137, %r131, %r134; 2026-02-21T08:12:21.4262488Z .loc 1 37 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:37:32 2026-02-21T08:12:21.4262822Z add.s32 %r138, %r132, %r134; 2026-02-21T08:12:21.4263118Z .loc 1 35 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:35:32 2026-02-21T08:12:21.4263444Z shl.b32 %r8, %r137, 10; 2026-02-21T08:12:21.4263733Z .loc 1 47 53 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:53 2026-02-21T08:12:21.4264061Z add.s32 %r139, %r8, 16384; 2026-02-21T08:12:21.4264245Z add.s32 %r140, %r8, 32768; 2026-02-21T08:12:21.4264418Z add.s32 %r141, %r8, 49152; 2026-02-21T08:12:21.4264595Z add.s32 %r142, %r8, 65536; 2026-02-21T08:12:21.4264798Z add.s32 %r143, %r8, 81920; 2026-02-21T08:12:21.4265041Z add.s32 %r144, %r8, 98304; 2026-02-21T08:12:21.4265211Z add.s32 %r145, %r8, 114688; 
2026-02-21T08:12:21.4265507Z .loc 1 48 80 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:80 2026-02-21T08:12:21.4265832Z shl.b32 %r9, %r138, 10; 2026-02-21T08:12:21.4266108Z .loc 1 47 60 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:60 2026-02-21T08:12:21.4266428Z or.b32 %r146, %r8, %r136; 2026-02-21T08:12:21.4266600Z or.b32 %r147, %r139, %r136; 2026-02-21T08:12:21.4266781Z or.b32 %r148, %r140, %r136; 2026-02-21T08:12:21.4266953Z or.b32 %r149, %r141, %r136; 2026-02-21T08:12:21.4267130Z or.b32 %r150, %r142, %r136; 2026-02-21T08:12:21.4267309Z or.b32 %r151, %r143, %r136; 2026-02-21T08:12:21.4267479Z or.b32 %r152, %r144, %r136; 2026-02-21T08:12:21.4267653Z or.b32 %r153, %r145, %r136; 2026-02-21T08:12:21.4268061Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4268412Z mad.wide.s32 %rd17, %r146, 2, %rd14; 2026-02-21T08:12:21.4268616Z mad.wide.s32 %rd18, %r147, 2, %rd14; 2026-02-21T08:12:21.4268822Z mad.wide.s32 %rd19, %r148, 2, %rd14; 2026-02-21T08:12:21.4269017Z mad.wide.s32 %rd20, %r149, 2, %rd14; 2026-02-21T08:12:21.4269218Z mad.wide.s32 %rd21, %r150, 2, %rd14; 2026-02-21T08:12:21.4269416Z mad.wide.s32 %rd22, %r151, 2, %rd14; 2026-02-21T08:12:21.4269608Z mad.wide.s32 %rd23, %r152, 2, %rd14; 2026-02-21T08:12:21.4269807Z mad.wide.s32 %rd24, %r153, 2, %rd14; 2026-02-21T08:12:21.4270109Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4270452Z shl.b32 %r154, %r1, 4; 2026-02-21T08:12:21.4270619Z and.b32 %r155, %r154, 1008; 2026-02-21T08:12:21.4270800Z shl.b32 %r156, %r1, 1; 2026-02-21T08:12:21.4270966Z and.b32 %r157, %r156, 48; 2026-02-21T08:12:21.4271159Z xor.b32 %r10, %r155, %r157; 2026-02-21T08:12:21.4271340Z add.s32 %r174, %r38, %r10; 2026-02-21T08:12:21.4271512Z mov.b32 %r175, 16; 2026-02-21T08:12:21.4271677Z // begin inline asm 2026-02-21T08:12:21.4271910Z cp.async.cg.shared.global [ %r174 + 0 ], [ %rd17 + 0 ], 0x10, %r175; 
2026-02-21T08:12:21.4272181Z // end inline asm 2026-02-21T08:12:21.4272340Z add.s32 %r176, %r174, 1024; 2026-02-21T08:12:21.4272519Z // begin inline asm 2026-02-21T08:12:21.4272744Z cp.async.cg.shared.global [ %r176 + 0 ], [ %rd18 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4273008Z // end inline asm 2026-02-21T08:12:21.4273164Z add.s32 %r178, %r174, 2048; 2026-02-21T08:12:21.4273345Z // begin inline asm 2026-02-21T08:12:21.4273577Z cp.async.cg.shared.global [ %r178 + 0 ], [ %rd19 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4273862Z // end inline asm 2026-02-21T08:12:21.4274098Z add.s32 %r180, %r174, 3072; 2026-02-21T08:12:21.4274266Z // begin inline asm 2026-02-21T08:12:21.4274490Z cp.async.cg.shared.global [ %r180 + 0 ], [ %rd20 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4274774Z // end inline asm 2026-02-21T08:12:21.4274936Z add.s32 %r182, %r174, 4096; 2026-02-21T08:12:21.4275105Z // begin inline asm 2026-02-21T08:12:21.4275330Z cp.async.cg.shared.global [ %r182 + 0 ], [ %rd21 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4275585Z // end inline asm 2026-02-21T08:12:21.4275738Z add.s32 %r184, %r174, 5120; 2026-02-21T08:12:21.4275916Z // begin inline asm 2026-02-21T08:12:21.4276134Z cp.async.cg.shared.global [ %r184 + 0 ], [ %rd22 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4276387Z // end inline asm 2026-02-21T08:12:21.4276539Z add.s32 %r186, %r174, 6144; 2026-02-21T08:12:21.4276716Z // begin inline asm 2026-02-21T08:12:21.4276932Z cp.async.cg.shared.global [ %r186 + 0 ], [ %rd23 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4277186Z // end inline asm 2026-02-21T08:12:21.4277347Z add.s32 %r188, %r174, 7168; 2026-02-21T08:12:21.4277516Z // begin inline asm 2026-02-21T08:12:21.4277743Z cp.async.cg.shared.global [ %r188 + 0 ], [ %rd24 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4277991Z // end inline asm 2026-02-21T08:12:21.4278244Z cp.async.commit_group; 2026-02-21T08:12:21.4278538Z .loc 1 48 59 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:59 2026-02-21T08:12:21.4278869Z or.b32 
%r158, %r9, %r136; 2026-02-21T08:12:21.4279165Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4279503Z mad.wide.s32 %rd25, %r158, 2, %rd15; 2026-02-21T08:12:21.4279819Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4280153Z add.s32 %r190, %r174, 40960; 2026-02-21T08:12:21.4280344Z // begin inline asm 2026-02-21T08:12:21.4280608Z cp.async.cg.shared.global [ %r190 + 0 ], [ %rd25 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4280866Z // end inline asm 2026-02-21T08:12:21.4281044Z cp.async.commit_group; 2026-02-21T08:12:21.4281410Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4281740Z cvt.s64.s32 %rd62, %r8; 2026-02-21T08:12:21.4281914Z cvt.u64.u32 %rd63, %r136; 2026-02-21T08:12:21.4282098Z or.b64 %rd64, %rd62, %rd63; 2026-02-21T08:12:21.4282272Z shl.b64 %rd65, %rd64, 1; 2026-02-21T08:12:21.4282477Z add.s64 %rd1, %rd14, %rd65; 2026-02-21T08:12:21.4282662Z add.s64 %rd26, %rd1, 64; 2026-02-21T08:12:21.4282843Z cvt.s64.s32 %rd66, %r139; 2026-02-21T08:12:21.4283019Z or.b64 %rd67, %rd66, %rd63; 2026-02-21T08:12:21.4283206Z shl.b64 %rd68, %rd67, 1; 2026-02-21T08:12:21.4283380Z add.s64 %rd2, %rd14, %rd68; 2026-02-21T08:12:21.4283560Z add.s64 %rd27, %rd2, 64; 2026-02-21T08:12:21.4283738Z cvt.s64.s32 %rd69, %r140; 2026-02-21T08:12:21.4283911Z or.b64 %rd70, %rd69, %rd63; 2026-02-21T08:12:21.4284095Z shl.b64 %rd71, %rd70, 1; 2026-02-21T08:12:21.4284267Z add.s64 %rd3, %rd14, %rd71; 2026-02-21T08:12:21.4284453Z add.s64 %rd28, %rd3, 64; 2026-02-21T08:12:21.4284626Z cvt.s64.s32 %rd72, %r141; 2026-02-21T08:12:21.4284851Z or.b64 %rd73, %rd72, %rd63; 2026-02-21T08:12:21.4285024Z shl.b64 %rd74, %rd73, 1; 2026-02-21T08:12:21.4285198Z add.s64 %rd4, %rd14, %rd74; 2026-02-21T08:12:21.4285371Z add.s64 %rd29, %rd4, 64; 2026-02-21T08:12:21.4285549Z cvt.s64.s32 %rd75, %r142; 2026-02-21T08:12:21.4285725Z or.b64 %rd76, %rd75, %rd63; 
2026-02-21T08:12:21.4285899Z shl.b64 %rd77, %rd76, 1; 2026-02-21T08:12:21.4286076Z add.s64 %rd5, %rd14, %rd77; 2026-02-21T08:12:21.4286250Z add.s64 %rd30, %rd5, 64; 2026-02-21T08:12:21.4286427Z cvt.s64.s32 %rd78, %r143; 2026-02-21T08:12:21.4286594Z or.b64 %rd79, %rd78, %rd63; 2026-02-21T08:12:21.4286772Z shl.b64 %rd80, %rd79, 1; 2026-02-21T08:12:21.4286939Z add.s64 %rd6, %rd14, %rd80; 2026-02-21T08:12:21.4287121Z add.s64 %rd31, %rd6, 64; 2026-02-21T08:12:21.4287289Z cvt.s64.s32 %rd81, %r144; 2026-02-21T08:12:21.4287467Z or.b64 %rd82, %rd81, %rd63; 2026-02-21T08:12:21.4287646Z shl.b64 %rd83, %rd82, 1; 2026-02-21T08:12:21.4287813Z add.s64 %rd7, %rd14, %rd83; 2026-02-21T08:12:21.4288001Z add.s64 %rd32, %rd7, 64; 2026-02-21T08:12:21.4288169Z cvt.s64.s32 %rd84, %r145; 2026-02-21T08:12:21.4288344Z or.b64 %rd85, %rd84, %rd63; 2026-02-21T08:12:21.4288513Z shl.b64 %rd86, %rd85, 1; 2026-02-21T08:12:21.4288689Z add.s64 %rd8, %rd14, %rd86; 2026-02-21T08:12:21.4288863Z add.s64 %rd33, %rd8, 64; 2026-02-21T08:12:21.4289157Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4289481Z bar.sync 2, 64; 2026-02-21T08:12:21.4289638Z add.s32 %r58, %r174, 8192; 2026-02-21T08:12:21.4289818Z // begin inline asm 2026-02-21T08:12:21.4290044Z cp.async.cg.shared.global [ %r58 + 0 ], [ %rd26 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4290303Z // end inline asm 2026-02-21T08:12:21.4290455Z add.s32 %r60, %r174, 9216; 2026-02-21T08:12:21.4290635Z // begin inline asm 2026-02-21T08:12:21.4290858Z cp.async.cg.shared.global [ %r60 + 0 ], [ %rd27 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4291113Z // end inline asm 2026-02-21T08:12:21.4291277Z add.s32 %r62, %r174, 10240; 2026-02-21T08:12:21.4291532Z // begin inline asm 2026-02-21T08:12:21.4291761Z cp.async.cg.shared.global [ %r62 + 0 ], [ %rd28 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4292012Z // end inline asm 2026-02-21T08:12:21.4292172Z add.s32 %r64, %r174, 11264; 2026-02-21T08:12:21.4292343Z // begin inline 
asm 2026-02-21T08:12:21.4292565Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd29 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4292810Z // end inline asm 2026-02-21T08:12:21.4292969Z add.s32 %r66, %r174, 12288; 2026-02-21T08:12:21.4293139Z // begin inline asm 2026-02-21T08:12:21.4293366Z cp.async.cg.shared.global [ %r66 + 0 ], [ %rd30 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4293620Z // end inline asm 2026-02-21T08:12:21.4293775Z add.s32 %r68, %r174, 13312; 2026-02-21T08:12:21.4293958Z // begin inline asm 2026-02-21T08:12:21.4294178Z cp.async.cg.shared.global [ %r68 + 0 ], [ %rd31 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4294454Z // end inline asm 2026-02-21T08:12:21.4294718Z add.s32 %r70, %r174, 14336; 2026-02-21T08:12:21.4294903Z // begin inline asm 2026-02-21T08:12:21.4295116Z cp.async.cg.shared.global [ %r70 + 0 ], [ %rd32 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4295378Z // end inline asm 2026-02-21T08:12:21.4295545Z add.s32 %r72, %r174, 15360; 2026-02-21T08:12:21.4295719Z // begin inline asm 2026-02-21T08:12:21.4295950Z cp.async.cg.shared.global [ %r72 + 0 ], [ %rd33 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4296199Z // end inline asm 2026-02-21T08:12:21.4296365Z cp.async.commit_group; 2026-02-21T08:12:21.4296649Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4296980Z cvt.s64.s32 %rd87, %r9; 2026-02-21T08:12:21.4297154Z or.b64 %rd88, %rd87, %rd63; 2026-02-21T08:12:21.4297334Z shl.b64 %rd89, %rd88, 1; 2026-02-21T08:12:21.4297510Z add.s64 %rd9, %rd15, %rd89; 2026-02-21T08:12:21.4297680Z add.s64 %rd34, %rd9, 64; 2026-02-21T08:12:21.4297972Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4298286Z add.s32 %r74, %r174, 41984; 2026-02-21T08:12:21.4298462Z // begin inline asm 2026-02-21T08:12:21.4298679Z cp.async.cg.shared.global [ %r74 + 0 ], [ %rd34 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4298933Z // end inline asm 2026-02-21T08:12:21.4299088Z cp.async.commit_group; 
2026-02-21T08:12:21.4299375Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4299691Z add.s64 %rd35, %rd1, 128; 2026-02-21T08:12:21.4299859Z add.s64 %rd36, %rd2, 128; 2026-02-21T08:12:21.4300035Z add.s64 %rd37, %rd3, 128; 2026-02-21T08:12:21.4300200Z add.s64 %rd38, %rd4, 128; 2026-02-21T08:12:21.4300370Z add.s64 %rd39, %rd5, 128; 2026-02-21T08:12:21.4300531Z add.s64 %rd40, %rd6, 128; 2026-02-21T08:12:21.4300699Z add.s64 %rd41, %rd7, 128; 2026-02-21T08:12:21.4300860Z add.s64 %rd42, %rd8, 128; 2026-02-21T08:12:21.4301144Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4301456Z bar.sync 2, 64; 2026-02-21T08:12:21.4301609Z add.s32 %r76, %r174, 16384; 2026-02-21T08:12:21.4301783Z // begin inline asm 2026-02-21T08:12:21.4301999Z cp.async.cg.shared.global [ %r76 + 0 ], [ %rd35 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4302251Z // end inline asm 2026-02-21T08:12:21.4302404Z add.s32 %r78, %r174, 17408; 2026-02-21T08:12:21.4302578Z // begin inline asm 2026-02-21T08:12:21.4302793Z cp.async.cg.shared.global [ %r78 + 0 ], [ %rd36 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4303044Z // end inline asm 2026-02-21T08:12:21.4303200Z add.s32 %r80, %r174, 18432; 2026-02-21T08:12:21.4303369Z // begin inline asm 2026-02-21T08:12:21.4303591Z cp.async.cg.shared.global [ %r80 + 0 ], [ %rd37 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4303837Z // end inline asm 2026-02-21T08:12:21.4303996Z add.s32 %r82, %r174, 19456; 2026-02-21T08:12:21.4304165Z // begin inline asm 2026-02-21T08:12:21.4304388Z cp.async.cg.shared.global [ %r82 + 0 ], [ %rd38 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4304749Z // end inline asm 2026-02-21T08:12:21.4304917Z add.s32 %r84, %r174, 20480; 2026-02-21T08:12:21.4305090Z // begin inline asm 2026-02-21T08:12:21.4305312Z cp.async.cg.shared.global [ %r84 + 0 ], [ %rd39 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4305565Z // end inline asm 2026-02-21T08:12:21.4305715Z add.s32 %r86, %r174, 
21504; 2026-02-21T08:12:21.4305900Z // begin inline asm 2026-02-21T08:12:21.4306123Z cp.async.cg.shared.global [ %r86 + 0 ], [ %rd40 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4306388Z // end inline asm 2026-02-21T08:12:21.4306539Z add.s32 %r88, %r174, 22528; 2026-02-21T08:12:21.4306719Z // begin inline asm 2026-02-21T08:12:21.4306930Z cp.async.cg.shared.global [ %r88 + 0 ], [ %rd41 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4307184Z // end inline asm 2026-02-21T08:12:21.4307342Z add.s32 %r90, %r174, 23552; 2026-02-21T08:12:21.4307511Z // begin inline asm 2026-02-21T08:12:21.4307805Z cp.async.cg.shared.global [ %r90 + 0 ], [ %rd42 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4308057Z // end inline asm 2026-02-21T08:12:21.4308221Z cp.async.commit_group; 2026-02-21T08:12:21.4308506Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4308836Z add.s64 %rd43, %rd9, 128; 2026-02-21T08:12:21.4309121Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4309441Z add.s32 %r92, %r174, 43008; 2026-02-21T08:12:21.4309618Z // begin inline asm 2026-02-21T08:12:21.4309833Z cp.async.cg.shared.global [ %r92 + 0 ], [ %rd43 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4310085Z // end inline asm 2026-02-21T08:12:21.4310241Z cp.async.commit_group; 2026-02-21T08:12:21.4310531Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4310843Z add.s64 %rd44, %rd1, 192; 2026-02-21T08:12:21.4311023Z add.s64 %rd45, %rd2, 192; 2026-02-21T08:12:21.4311194Z add.s64 %rd46, %rd3, 192; 2026-02-21T08:12:21.4311365Z add.s64 %rd47, %rd4, 192; 2026-02-21T08:12:21.4311536Z add.s64 %rd48, %rd5, 192; 2026-02-21T08:12:21.4311699Z add.s64 %rd49, %rd6, 192; 2026-02-21T08:12:21.4311871Z add.s64 %rd50, %rd7, 192; 2026-02-21T08:12:21.4312036Z add.s64 %rd51, %rd8, 192; 2026-02-21T08:12:21.4312323Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 
2026-02-21T08:12:21.4312631Z bar.sync 2, 64; 2026-02-21T08:12:21.4312793Z add.s32 %r94, %r174, 24576; 2026-02-21T08:12:21.4312963Z // begin inline asm 2026-02-21T08:12:21.4313186Z cp.async.cg.shared.global [ %r94 + 0 ], [ %rd44 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4313438Z // end inline asm 2026-02-21T08:12:21.4313590Z add.s32 %r96, %r174, 25600; 2026-02-21T08:12:21.4313767Z // begin inline asm 2026-02-21T08:12:21.4313980Z cp.async.cg.shared.global [ %r96 + 0 ], [ %rd45 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4314237Z // end inline asm 2026-02-21T08:12:21.4314390Z add.s32 %r98, %r174, 26624; 2026-02-21T08:12:21.4314567Z // begin inline asm 2026-02-21T08:12:21.4314844Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd46 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4315101Z // end inline asm 2026-02-21T08:12:21.4315265Z add.s32 %r100, %r174, 27648; 2026-02-21T08:12:21.4315443Z // begin inline asm 2026-02-21T08:12:21.4315677Z cp.async.cg.shared.global [ %r100 + 0 ], [ %rd47 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4315939Z // end inline asm 2026-02-21T08:12:21.4316102Z add.s32 %r102, %r174, 28672; 2026-02-21T08:12:21.4316275Z // begin inline asm 2026-02-21T08:12:21.4316500Z cp.async.cg.shared.global [ %r102 + 0 ], [ %rd48 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4316752Z // end inline asm 2026-02-21T08:12:21.4316910Z add.s32 %r104, %r174, 29696; 2026-02-21T08:12:21.4317079Z // begin inline asm 2026-02-21T08:12:21.4317302Z cp.async.cg.shared.global [ %r104 + 0 ], [ %rd49 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4317556Z // end inline asm 2026-02-21T08:12:21.4317777Z add.s32 %r106, %r174, 30720; 2026-02-21T08:12:21.4317955Z // begin inline asm 2026-02-21T08:12:21.4318169Z cp.async.cg.shared.global [ %r106 + 0 ], [ %rd50 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4318424Z // end inline asm 2026-02-21T08:12:21.4318577Z add.s32 %r108, %r174, 31744; 2026-02-21T08:12:21.4318754Z // begin inline asm 2026-02-21T08:12:21.4318969Z cp.async.cg.shared.global [ %r108 + 0 ], [ %rd51 + 0 ], 
0x10, %r175; 2026-02-21T08:12:21.4319226Z // end inline asm 2026-02-21T08:12:21.4319391Z cp.async.commit_group; 2026-02-21T08:12:21.4319684Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4320009Z add.s64 %rd52, %rd9, 192; 2026-02-21T08:12:21.4320298Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4320618Z add.s32 %r110, %r174, 44032; 2026-02-21T08:12:21.4320874Z // begin inline asm 2026-02-21T08:12:21.4321102Z cp.async.cg.shared.global [ %r110 + 0 ], [ %rd52 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4321356Z // end inline asm 2026-02-21T08:12:21.4321511Z cp.async.commit_group; 2026-02-21T08:12:21.4321797Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4322118Z add.s64 %rd53, %rd1, 256; 2026-02-21T08:12:21.4322293Z add.s64 %rd54, %rd2, 256; 2026-02-21T08:12:21.4322460Z add.s64 %rd55, %rd3, 256; 2026-02-21T08:12:21.4322632Z add.s64 %rd56, %rd4, 256; 2026-02-21T08:12:21.4322798Z add.s64 %rd57, %rd5, 256; 2026-02-21T08:12:21.4322971Z add.s64 %rd58, %rd6, 256; 2026-02-21T08:12:21.4323136Z add.s64 %rd59, %rd7, 256; 2026-02-21T08:12:21.4323311Z add.s64 %rd60, %rd8, 256; 2026-02-21T08:12:21.4323600Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4323910Z bar.sync 2, 64; 2026-02-21T08:12:21.4324075Z add.s32 %r112, %r174, 32768; 2026-02-21T08:12:21.4324248Z // begin inline asm 2026-02-21T08:12:21.4324472Z cp.async.cg.shared.global [ %r112 + 0 ], [ %rd53 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4324752Z // end inline asm 2026-02-21T08:12:21.4324915Z add.s32 %r114, %r174, 33792; 2026-02-21T08:12:21.4325085Z // begin inline asm 2026-02-21T08:12:21.4325310Z cp.async.cg.shared.global [ %r114 + 0 ], [ %rd54 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4325567Z // end inline asm 2026-02-21T08:12:21.4325718Z add.s32 %r116, %r174, 34816; 2026-02-21T08:12:21.4325896Z // begin inline asm 
2026-02-21T08:12:21.4326112Z cp.async.cg.shared.global [ %r116 + 0 ], [ %rd55 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4326370Z // end inline asm 2026-02-21T08:12:21.4326524Z add.s32 %r118, %r174, 35840; 2026-02-21T08:12:21.4326706Z // begin inline asm 2026-02-21T08:12:21.4326920Z cp.async.cg.shared.global [ %r118 + 0 ], [ %rd56 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4327176Z // end inline asm 2026-02-21T08:12:21.4327336Z add.s32 %r120, %r174, 36864; 2026-02-21T08:12:21.4327509Z // begin inline asm 2026-02-21T08:12:21.4327731Z cp.async.cg.shared.global [ %r120 + 0 ], [ %rd57 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4327976Z // end inline asm 2026-02-21T08:12:21.4328135Z add.s32 %r122, %r174, 37888; 2026-02-21T08:12:21.4328301Z // begin inline asm 2026-02-21T08:12:21.4328522Z cp.async.cg.shared.global [ %r122 + 0 ], [ %rd58 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4328772Z // end inline asm 2026-02-21T08:12:21.4328933Z add.s32 %r124, %r174, 38912; 2026-02-21T08:12:21.4329102Z // begin inline asm 2026-02-21T08:12:21.4329325Z cp.async.cg.shared.global [ %r124 + 0 ], [ %rd59 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4329652Z // end inline asm 2026-02-21T08:12:21.4329843Z add.s32 %r126, %r174, 39936; 2026-02-21T08:12:21.4330019Z // begin inline asm 2026-02-21T08:12:21.4330238Z cp.async.cg.shared.global [ %r126 + 0 ], [ %rd60 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4330496Z // end inline asm 2026-02-21T08:12:21.4330656Z cp.async.commit_group; 2026-02-21T08:12:21.4331020Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4331349Z add.s64 %rd61, %rd9, 256; 2026-02-21T08:12:21.4331632Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4331955Z add.s32 %r128, %r174, 45056; 2026-02-21T08:12:21.4332130Z // begin inline asm 2026-02-21T08:12:21.4332354Z cp.async.cg.shared.global [ %r128 + 0 ], [ %rd61 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4332598Z // end inline asm 
2026-02-21T08:12:21.4332762Z cp.async.commit_group; 2026-02-21T08:12:21.4333038Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4333371Z cp.async.wait_group 8; 2026-02-21T08:12:21.4333549Z bar.sync 2, 64; 2026-02-21T08:12:21.4333819Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4334225Z shfl.sync.idx.b32 %r159, %r7, 0, 31, -1; 2026-02-21T08:12:21.4334442Z setp.ne.b32 %p3, %r159, 0; 2026-02-21T08:12:21.4334627Z @%p3 bra $L__BB0_7; 2026-02-21T08:12:21.4334880Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4335134Z elect.sync %r164|%p5, -1; 2026-02-21T08:12:21.4335312Z bfe.u32 %r166, %r38, 4, 14; 2026-02-21T08:12:21.4335501Z cvt.u64.u32 %rd95, %r166; 2026-02-21T08:12:21.4335704Z or.b64 %rd90, %rd95, -9223371899382267904; 2026-02-21T08:12:21.4335912Z add.s32 %r167, %r38, 40960; 2026-02-21T08:12:21.4336098Z bfe.u32 %r168, %r167, 4, 14; 2026-02-21T08:12:21.4336270Z cvt.u64.u32 %rd96, %r168; 2026-02-21T08:12:21.4336459Z or.b64 %rd91, %rd96, -9223371899411628032; 2026-02-21T08:12:21.4336657Z mov.b32 %r161, 134479888; 2026-02-21T08:12:21.4336834Z mov.pred %p4, 0; 2026-02-21T08:12:21.4336988Z // begin inline asm 2026-02-21T08:12:21.4337258Z @%p5 tcgen05.mma.cta_group::1.kind::f16 [ %r160 + 0 ], %rd90, %rd91, %r161, %p4; 2026-02-21T08:12:21.4337554Z // end inline asm 2026-02-21T08:12:21.4337712Z add.s32 %r169, %r38, 32; 2026-02-21T08:12:21.4337910Z bfe.u32 %r170, %r169, 4, 14; 2026-02-21T08:12:21.4338085Z cvt.u64.u32 %rd97, %r170; 2026-02-21T08:12:21.4338275Z or.b64 %rd92, %rd97, -9223371899382267904; 2026-02-21T08:12:21.4338474Z add.s32 %r171, %r38, 40992; 2026-02-21T08:12:21.4338656Z bfe.u32 %r172, %r171, 4, 14; 2026-02-21T08:12:21.4338826Z cvt.u64.u32 %rd98, %r172; 2026-02-21T08:12:21.4339015Z or.b64 %rd93, %rd98, -9223371899411628032; 2026-02-21T08:12:21.4339220Z mov.pred %p6, -1; 2026-02-21T08:12:21.4339381Z // begin inline asm 
2026-02-21T08:12:21.4339632Z @%p5 tcgen05.mma.cta_group::1.kind::f16 [ %r160 + 0 ], %rd92, %rd93, %r161, %p6; 2026-02-21T08:12:21.4339909Z // end inline asm 2026-02-21T08:12:21.4340070Z add.s32 %r173, %r38, 46080; 2026-02-21T08:12:21.4340244Z cvt.u64.u32 %rd94, %r173; 2026-02-21T08:12:21.4340421Z // begin inline asm 2026-02-21T08:12:21.4340657Z @%p4 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd94]; 2026-02-21T08:12:21.4340930Z // end inline asm 2026-02-21T08:12:21.4341121Z $L__BB0_7: // %.peel.next 2026-02-21T08:12:21.4341374Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4341732Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4342054Z add.s64 %rd99, %rd1, 320; 2026-02-21T08:12:21.4342235Z add.s64 %rd100, %rd2, 320; 2026-02-21T08:12:21.4342411Z add.s64 %rd101, %rd3, 320; 2026-02-21T08:12:21.4342589Z add.s64 %rd102, %rd4, 320; 2026-02-21T08:12:21.4342757Z add.s64 %rd103, %rd5, 320; 2026-02-21T08:12:21.4342932Z add.s64 %rd104, %rd6, 320; 2026-02-21T08:12:21.4343108Z add.s64 %rd105, %rd7, 320; 2026-02-21T08:12:21.4343274Z add.s64 %rd106, %rd8, 320; 2026-02-21T08:12:21.4343569Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4343886Z bar.sync 2, 64; 2026-02-21T08:12:21.4344124Z // begin inline asm 2026-02-21T08:12:21.4344349Z cp.async.cg.shared.global [ %r174 + 0 ], [ %rd99 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4344616Z // end inline asm 2026-02-21T08:12:21.4344794Z // begin inline asm 2026-02-21T08:12:21.4345031Z cp.async.cg.shared.global [ %r176 + 0 ], [ %rd100 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4345287Z // end inline asm 2026-02-21T08:12:21.4345435Z // begin inline asm 2026-02-21T08:12:21.4345670Z cp.async.cg.shared.global [ %r178 + 0 ], [ %rd101 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4345918Z // end inline asm 2026-02-21T08:12:21.4346077Z // begin inline asm 2026-02-21T08:12:21.4346293Z cp.async.cg.shared.global [ %r180 + 0 ], [ %rd102 + 
0 ], 0x10, %r175; 2026-02-21T08:12:21.4346552Z // end inline asm 2026-02-21T08:12:21.4346699Z // begin inline asm 2026-02-21T08:12:21.4346922Z cp.async.cg.shared.global [ %r182 + 0 ], [ %rd103 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4347242Z // end inline asm 2026-02-21T08:12:21.4347396Z // begin inline asm 2026-02-21T08:12:21.4347625Z cp.async.cg.shared.global [ %r184 + 0 ], [ %rd104 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4347884Z // end inline asm 2026-02-21T08:12:21.4348042Z // begin inline asm 2026-02-21T08:12:21.4348259Z cp.async.cg.shared.global [ %r186 + 0 ], [ %rd105 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4348525Z // end inline asm 2026-02-21T08:12:21.4348677Z // begin inline asm 2026-02-21T08:12:21.4348901Z cp.async.cg.shared.global [ %r188 + 0 ], [ %rd106 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4349159Z // end inline asm 2026-02-21T08:12:21.4349317Z cp.async.commit_group; 2026-02-21T08:12:21.4349609Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4349932Z add.s64 %rd107, %rd9, 320; 2026-02-21T08:12:21.4350233Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4350548Z // begin inline asm 2026-02-21T08:12:21.4350774Z cp.async.cg.shared.global [ %r190 + 0 ], [ %rd107 + 0 ], 0x10, %r175; 2026-02-21T08:12:21.4351024Z // end inline asm 2026-02-21T08:12:21.4351190Z cp.async.commit_group; 2026-02-21T08:12:21.4351498Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4351825Z add.s32 %r193, %r5, %r9; 2026-02-21T08:12:21.4352004Z cvt.u64.u32 %rd10, %r193; 2026-02-21T08:12:21.4352175Z add.s32 %r194, %r4, %r8; 2026-02-21T08:12:21.4352351Z cvt.u64.u32 %rd11, %r194; 2026-02-21T08:12:21.4352517Z mov.b32 %r356, 0; 2026-02-21T08:12:21.4352677Z mov.b64 %rd163, 0; 2026-02-21T08:12:21.4352830Z mov.b32 %r357, %r356; 2026-02-21T08:12:21.4353000Z bra.uni $L__BB0_8; 2026-02-21T08:12:21.4353211Z $L__BB0_10: // in Loop: 
Header=BB0_8 Depth=2 2026-02-21T08:12:21.4353576Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4353917Z add.s64 %rd13, %rd163, 32; 2026-02-21T08:12:21.4354109Z setp.lt.u64 %p17, %rd13, 864; 2026-02-21T08:12:21.4354304Z add.s32 %r233, %r357, 1; 2026-02-21T08:12:21.4354481Z setp.gt.s32 %p18, %r233, 4; 2026-02-21T08:12:21.4354716Z selp.b32 %r357, 0, %r233, %p18; 2026-02-21T08:12:21.4355077Z .loc 1 47 60 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:60 2026-02-21T08:12:21.4355503Z add.s64 %rd127, %rd11, %rd163; 2026-02-21T08:12:21.4355739Z cvt.u32.u64 %r234, %rd127; 2026-02-21T08:12:21.4355950Z add.s32 %r235, %r234, 192; 2026-02-21T08:12:21.4356324Z .loc 1 47 32 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:32 2026-02-21T08:12:21.4356804Z mad.wide.s32 %rd118, %r235, 2, %rd14; 2026-02-21T08:12:21.4357015Z add.s32 %r236, %r234, 16576; 2026-02-21T08:12:21.4357203Z mad.wide.s32 %rd119, %r236, 2, %rd14; 2026-02-21T08:12:21.4357411Z add.s32 %r237, %r234, 32960; 2026-02-21T08:12:21.4357597Z mad.wide.s32 %rd120, %r237, 2, %rd14; 2026-02-21T08:12:21.4357896Z add.s32 %r238, %r234, 49344; 2026-02-21T08:12:21.4358083Z mad.wide.s32 %rd121, %r238, 2, %rd14; 2026-02-21T08:12:21.4358277Z add.s32 %r239, %r234, 65728; 2026-02-21T08:12:21.4358469Z mad.wide.s32 %rd122, %r239, 2, %rd14; 2026-02-21T08:12:21.4358662Z add.s32 %r240, %r234, 82112; 2026-02-21T08:12:21.4358851Z mad.wide.s32 %rd123, %r240, 2, %rd14; 2026-02-21T08:12:21.4359043Z add.s32 %r241, %r234, 98496; 2026-02-21T08:12:21.4359234Z mad.wide.s32 %rd124, %r241, 2, %rd14; 2026-02-21T08:12:21.4359429Z add.s32 %r242, %r234, 114880; 2026-02-21T08:12:21.4359625Z mad.wide.s32 %rd125, %r242, 2, %rd14; 2026-02-21T08:12:21.4359950Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4360286Z shl.b32 %r243, %r357, 13; 2026-02-21T08:12:21.4360475Z add.s32 %r245, %r38, %r243; 2026-02-21T08:12:21.4360653Z 
bar.sync 2, 64; 2026-02-21T08:12:21.4360820Z add.s32 %r215, %r245, %r10; 2026-02-21T08:12:21.4361065Z selp.b32 %r216, 16, 0, %p17; 2026-02-21T08:12:21.4361260Z // begin inline asm 2026-02-21T08:12:21.4361496Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd118 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4361770Z // end inline asm 2026-02-21T08:12:21.4361934Z add.s32 %r217, %r215, 1024; 2026-02-21T08:12:21.4362109Z // begin inline asm 2026-02-21T08:12:21.4362346Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd119 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4362612Z // end inline asm 2026-02-21T08:12:21.4362776Z add.s32 %r219, %r215, 2048; 2026-02-21T08:12:21.4362948Z // begin inline asm 2026-02-21T08:12:21.4363183Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd120 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4363443Z // end inline asm 2026-02-21T08:12:21.4363609Z add.s32 %r221, %r215, 3072; 2026-02-21T08:12:21.4363791Z // begin inline asm 2026-02-21T08:12:21.4364016Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd121 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4364287Z // end inline asm 2026-02-21T08:12:21.4364444Z add.s32 %r223, %r215, 4096; 2026-02-21T08:12:21.4364632Z // begin inline asm 2026-02-21T08:12:21.4364895Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd122 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4365160Z // end inline asm 2026-02-21T08:12:21.4365315Z add.s32 %r225, %r215, 5120; 2026-02-21T08:12:21.4365495Z // begin inline asm 2026-02-21T08:12:21.4365718Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd123 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4365988Z // end inline asm 2026-02-21T08:12:21.4366152Z add.s32 %r227, %r215, 6144; 2026-02-21T08:12:21.4366324Z // begin inline asm 2026-02-21T08:12:21.4366556Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd124 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4366821Z // end inline asm 2026-02-21T08:12:21.4366980Z add.s32 %r229, %r215, 7168; 2026-02-21T08:12:21.4367151Z // begin inline asm 2026-02-21T08:12:21.4367383Z cp.async.cg.shared.global 
[ %r229 + 0 ], [ %rd125 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4367647Z // end inline asm 2026-02-21T08:12:21.4367815Z cp.async.commit_group; 2026-02-21T08:12:21.4368119Z .loc 1 48 34 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:34 2026-02-21T08:12:21.4368457Z add.s64 %rd128, %rd10, %rd163; 2026-02-21T08:12:21.4368656Z cvt.u32.u64 %r246, %rd128; 2026-02-21T08:12:21.4368842Z mad.wide.s32 %rd126, %r246, 2, %rd15; 2026-02-21T08:12:21.4369164Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4369497Z shl.b32 %r247, %r357, 10; 2026-02-21T08:12:21.4369684Z add.s32 %r231, %r190, %r247; 2026-02-21T08:12:21.4369858Z // begin inline asm 2026-02-21T08:12:21.4370095Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd126 + 0 ], 0x10, %r216; 2026-02-21T08:12:21.4370360Z // end inline asm 2026-02-21T08:12:21.4370522Z cp.async.commit_group; 2026-02-21T08:12:21.4370836Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4371178Z setp.lt.u64 %p19, %rd13, 992; 2026-02-21T08:12:21.4371461Z mov.b64 %rd163, %rd13; 2026-02-21T08:12:21.4371642Z @%p19 bra $L__BB0_8; 2026-02-21T08:12:21.4371823Z bra.uni $L__BB0_11; 2026-02-21T08:12:21.4372040Z $L__BB0_8: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:21.4372337Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:21.4372586Z add.s32 %r195, %r356, 1; 2026-02-21T08:12:21.4372768Z setp.gt.s32 %p9, %r195, 4; 2026-02-21T08:12:21.4372965Z selp.b32 %r356, 0, %r195, %p9; 2026-02-21T08:12:21.4373275Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4373611Z cp.async.wait_group 8; 2026-02-21T08:12:21.4373786Z bar.sync 2, 64; 2026-02-21T08:12:21.4374071Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4374416Z shfl.sync.idx.b32 %r196, %r7, 0, 31, -1; 2026-02-21T08:12:21.4374734Z setp.ne.b32 %p10, %r196, 0; 2026-02-21T08:12:21.4374929Z @%p10 
bra $L__BB0_10; 2026-02-21T08:12:21.4375142Z // %bb.9: // in Loop: Header=BB0_8 Depth=2 2026-02-21T08:12:21.4375404Z setp.eq.b64 %p16, %rd163, 960; 2026-02-21T08:12:21.4375706Z .loc 1 48 87 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:48:87 2026-02-21T08:12:21.4376044Z shl.b32 %r201, %r356, 10; 2026-02-21T08:12:21.4376218Z add.s32 %r203, %r38, %r201; 2026-02-21T08:12:21.4376408Z add.s32 %r204, %r203, 40960; 2026-02-21T08:12:21.4376718Z .loc 1 47 85 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:47:85 2026-02-21T08:12:21.4377036Z shl.b32 %r205, %r356, 13; 2026-02-21T08:12:21.4377218Z add.s32 %r206, %r38, %r205; 2026-02-21T08:12:21.4377510Z .loc 1 49 52 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:49:52 2026-02-21T08:12:21.4377848Z elect.sync %r207|%p12, -1; 2026-02-21T08:12:21.4378039Z bfe.u32 %r208, %r206, 4, 14; 2026-02-21T08:12:21.4378230Z cvt.u64.u32 %rd114, %r208; 2026-02-21T08:12:21.4378431Z or.b64 %rd109, %rd114, -9223371899382267904; 2026-02-21T08:12:21.4378644Z bfe.u32 %r209, %r204, 4, 14; 2026-02-21T08:12:21.4378835Z cvt.u64.u32 %rd115, %r209; 2026-02-21T08:12:21.4379024Z or.b64 %rd110, %rd115, -9223371899411628032; 2026-02-21T08:12:21.4379239Z mov.b32 %r198, 134479888; 2026-02-21T08:12:21.4379415Z mov.pred %p11, -1; 2026-02-21T08:12:21.4379588Z // begin inline asm 2026-02-21T08:12:21.4379855Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r160 + 0 ], %rd109, %rd110, %r198, %p11; 2026-02-21T08:12:21.4380167Z // end inline asm 2026-02-21T08:12:21.4380332Z add.s32 %r210, %r206, 32; 2026-02-21T08:12:21.4380510Z bfe.u32 %r211, %r210, 4, 14; 2026-02-21T08:12:21.4380699Z cvt.u64.u32 %rd116, %r211; 2026-02-21T08:12:21.4380892Z or.b64 %rd111, %rd116, -9223371899382267904; 2026-02-21T08:12:21.4381107Z add.s32 %r212, %r203, 40992; 2026-02-21T08:12:21.4381289Z bfe.u32 %r213, %r212, 4, 14; 2026-02-21T08:12:21.4381475Z cvt.u64.u32 %rd117, %r213; 2026-02-21T08:12:21.4381664Z or.b64 %rd112, %rd117, -9223371899411628032; 
2026-02-21T08:12:21.4381878Z // begin inline asm 2026-02-21T08:12:21.4382144Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r160 + 0 ], %rd111, %rd112, %r198, %p11; 2026-02-21T08:12:21.4382442Z // end inline asm 2026-02-21T08:12:21.4382617Z and.pred %p15, %p16, %p12; 2026-02-21T08:12:21.4382799Z add.s32 %r214, %r38, 46080; 2026-02-21T08:12:21.4382990Z cvt.u64.u32 %rd113, %r214; 2026-02-21T08:12:21.4383166Z // begin inline asm 2026-02-21T08:12:21.4383432Z @%p15 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd113]; 2026-02-21T08:12:21.4383704Z // end inline asm 2026-02-21T08:12:21.4383867Z bra.uni $L__BB0_10; 2026-02-21T08:12:21.4384090Z $L__BB0_12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4384474Z .loc 1 42 112 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:42:112 2026-02-21T08:12:21.4384946Z barrier.sync 1; 2026-02-21T08:12:21.4385105Z barrier.sync 1; 2026-02-21T08:12:21.4385270Z bra.uni $L__BB0_2; 2026-02-21T08:12:21.4385458Z $L__BB0_11: // %.loopexit 2026-02-21T08:12:21.4385721Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4385960Z cp.async.wait_group 0; 2026-02-21T08:12:21.4386145Z bar.sync 2, 64; 2026-02-21T08:12:21.4386307Z barrier.sync 1; 2026-02-21T08:12:21.4386462Z bra.uni $L__BB0_2; 2026-02-21T08:12:21.4386681Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:21.4387041Z .loc 1 14 0 // c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py:14 2026-02-21T08:12:21.4387379Z barrier.sync 1; 2026-02-21T08:12:21.4387533Z barrier.sync 1; 2026-02-21T08:12:21.4387695Z bra.uni $L__BB0_2; 2026-02-21T08:12:21.4387853Z $L__tmp0: 2026-02-21T08:12:21.4388072Z $L__func_end0: 2026-02-21T08:12:21.4388253Z // -- End function 2026-02-21T08:12:21.4388475Z } 2026-02-21T08:12:21.4388788Z .file 1 "/tmp/torchinductor_root/63/c63h4t6fy6n6m466wa3shkwsqqdcmreylhd52hybhzfkl35ww3q6.py" 2026-02-21T08:12:21.4389171Z .section .debug_abbrev 2026-02-21T08:12:21.4389345Z { 2026-02-21T08:12:21.4389518Z .b8 1 // Abbreviation Code 
2026-02-21T08:12:21.4389782Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:12:21.4390030Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:12:21.4390281Z .b8 37 // DW_AT_producer 2026-02-21T08:12:21.4390524Z .b8 8 // DW_FORM_string 2026-02-21T08:12:21.4390756Z .b8 19 // DW_AT_language 2026-02-21T08:12:21.4391007Z .b8 5 // DW_FORM_data2 2026-02-21T08:12:21.4391243Z .b8 3 // DW_AT_name 2026-02-21T08:12:21.4391482Z .b8 8 // DW_FORM_string 2026-02-21T08:12:21.4391719Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:12:21.4392003Z .b8 6 // DW_FORM_data4 2026-02-21T08:12:21.4392470Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:12:21.4392824Z .b8 8 // DW_FORM_string 2026-02-21T08:12:21.4393158Z .b8 0 // EOM(1) 2026-02-21T08:12:21.4393467Z .b8 0 // EOM(2) 2026-02-21T08:12:21.4393794Z .b8 0 // EOM(3) 2026-02-21T08:12:21.4394060Z } 2026-02-21T08:12:21.4394326Z .section .debug_info 2026-02-21T08:12:21.4394555Z { 2026-02-21T08:12:21.4394860Z .b32 104 // Length of Unit 2026-02-21T08:12:21.4395174Z .b8 2 // DWARF version number 2026-02-21T08:12:21.4395536Z .b8 0 2026-02-21T08:12:21.4395854Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:12:21.4396200Z .b8 8 // Address Size (in bytes) 2026-02-21T08:12:21.4396617Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:12:21.4396954Z .b8 116 // DW_AT_producer 2026-02-21T08:12:21.4397260Z .b8 114 2026-02-21T08:12:21.4397498Z .b8 105 2026-02-21T08:12:21.4397734Z .b8 116 2026-02-21T08:12:21.4397945Z .b8 111 2026-02-21T08:12:21.4398167Z .b8 110 2026-02-21T08:12:21.4398407Z .b8 0 2026-02-21T08:12:21.4398630Z .b8 2 // DW_AT_language 2026-02-21T08:12:21.4398956Z .b8 0 2026-02-21T08:12:21.4399188Z .b8 99 // DW_AT_name 2026-02-21T08:12:21.4399493Z .b8 54 2026-02-21T08:12:21.4399692Z .b8 51 2026-02-21T08:12:21.4399947Z .b8 104 2026-02-21T08:12:21.4400135Z .b8 52 2026-02-21T08:12:21.4400366Z .b8 116 2026-02-21T08:12:21.4400619Z .b8 54 2026-02-21T08:12:21.4400817Z .b8 102 2026-02-21T08:12:21.4401120Z .b8 121 2026-02-21T08:12:21.4401334Z .b8 54 2026-02-21T08:12:21.4401563Z .b8 110 2026-02-21T08:12:21.4401736Z .b8 54 2026-02-21T08:12:21.4402006Z .b8 109 2026-02-21T08:12:21.4402192Z .b8 52 2026-02-21T08:12:21.4402404Z .b8 54 2026-02-21T08:12:21.4402634Z .b8 54 2026-02-21T08:12:21.4402858Z .b8 119 2026-02-21T08:12:21.4403066Z .b8 97 2026-02-21T08:12:21.4403304Z .b8 51 2026-02-21T08:12:21.4403533Z .b8 115 2026-02-21T08:12:21.4403726Z .b8 104 2026-02-21T08:12:21.4403981Z .b8 107 2026-02-21T08:12:21.4404166Z .b8 119 2026-02-21T08:12:21.4404396Z .b8 115 2026-02-21T08:12:21.4404585Z .b8 113 2026-02-21T08:12:21.4404884Z .b8 113 2026-02-21T08:12:21.4405074Z .b8 100 2026-02-21T08:12:21.4405309Z .b8 99 2026-02-21T08:12:21.4405542Z .b8 109 2026-02-21T08:12:21.4405738Z .b8 114 2026-02-21T08:12:21.4405964Z .b8 101 2026-02-21T08:12:21.4406162Z .b8 121 2026-02-21T08:12:21.4406400Z .b8 108 2026-02-21T08:12:21.4406570Z .b8 104 2026-02-21T08:12:21.4406963Z .b8 100 2026-02-21T08:12:21.4407163Z .b8 53 2026-02-21T08:12:21.4407375Z .b8 50 2026-02-21T08:12:21.4407601Z .b8 104 2026-02-21T08:12:21.4407835Z .b8 121 2026-02-21T08:12:21.4408023Z .b8 98 
2026-02-21T08:12:21.4408378Z .b8 104 2026-02-21T08:12:21.4408608Z .b8 122 2026-02-21T08:12:21.4408801Z .b8 102 2026-02-21T08:12:21.4409043Z .b8 107 2026-02-21T08:12:21.4409244Z .b8 108 2026-02-21T08:12:21.4409465Z .b8 51 2026-02-21T08:12:21.4409656Z .b8 53 2026-02-21T08:12:21.4409902Z .b8 119 2026-02-21T08:12:21.4410093Z .b8 119 2026-02-21T08:12:21.4410316Z .b8 51 2026-02-21T08:12:21.4410527Z .b8 113 2026-02-21T08:12:21.4410751Z .b8 54 2026-02-21T08:12:21.4410979Z .b8 46 2026-02-21T08:12:21.4411191Z .b8 112 2026-02-21T08:12:21.4411415Z .b8 121 2026-02-21T08:12:21.4411587Z .b8 0 2026-02-21T08:12:21.4411915Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:12:21.4412241Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:12:21.4412528Z .b8 116 2026-02-21T08:12:21.4412764Z .b8 109 2026-02-21T08:12:21.4412990Z .b8 112 2026-02-21T08:12:21.4413184Z .b8 47 2026-02-21T08:12:21.4413440Z .b8 116 2026-02-21T08:12:21.4413661Z .b8 111 2026-02-21T08:12:21.4413859Z .b8 114 2026-02-21T08:12:21.4414097Z .b8 99 2026-02-21T08:12:21.4414287Z .b8 104 2026-02-21T08:12:21.4414522Z .b8 105 2026-02-21T08:12:21.4414741Z .b8 110 2026-02-21T08:12:21.4414981Z .b8 100 2026-02-21T08:12:21.4415179Z .b8 117 2026-02-21T08:12:21.4415411Z .b8 99 2026-02-21T08:12:21.4415618Z .b8 116 2026-02-21T08:12:21.4415854Z .b8 111 2026-02-21T08:12:21.4416083Z .b8 114 2026-02-21T08:12:21.4416289Z .b8 95 2026-02-21T08:12:21.4416519Z .b8 114 2026-02-21T08:12:21.4416690Z .b8 111 2026-02-21T08:12:21.4416949Z .b8 111 2026-02-21T08:12:21.4417145Z .b8 116 2026-02-21T08:12:21.4417355Z .b8 47 2026-02-21T08:12:21.4417594Z .b8 54 2026-02-21T08:12:21.4417828Z .b8 51 2026-02-21T08:12:21.4418016Z .b8 0 2026-02-21T08:12:21.4418266Z } 2026-02-21T08:12:21.4418488Z .section .debug_macinfo { } 2026-02-21T08:12:21.4418682Z 2026-02-21T08:12:21.4418808Z ================================================================ 2026-02-21T08:12:21.4419206Z please share the reproducer above with Triton project. 
2026-02-21T08:12:23.8918254Z 2026-02-21T08:12:23.8919215Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 92/92 16.7 configs/s 2026-02-21T08:12:25.4410130Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 651.7 2026-02-21T08:12:25.4415479Z configs/s 2026-02-21T08:12:25.5583177Z [24s] Generation 1 complete: 2026-02-21T08:12:25.5583626Z error=8 2026-02-21T08:12:25.5583847Z ok=86 2026-02-21T08:12:25.5584138Z min=0.0369 2026-02-21T08:12:25.5584365Z mid=0.0922 2026-02-21T08:12:25.5584617Z max=1.4878 2026-02-21T08:12:25.5585212Z best={'block_sizes': [128, 32, 32], 2026-02-21T08:12:25.5585550Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:12:25.5585916Z 'l2_groupings': [2], 2026-02-21T08:12:25.5586212Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:12:25.5586628Z 'loop_orders': [[1, 0]], 2026-02-21T08:12:25.5587221Z 'maxnreg': 128, 2026-02-21T08:12:25.5587491Z 'num_sm_multiplier': 8, 2026-02-21T08:12:25.5587792Z 'num_stages': 6, 2026-02-21T08:12:25.5588065Z 'num_warps': 1, 2026-02-21T08:12:25.5588327Z 'pid_type': 'persistent_blocked', 2026-02-21T08:12:25.5588641Z 'range_flattens': [None, False], 2026-02-21T08:12:25.5588968Z 'range_multi_buffers': [None, True], 2026-02-21T08:12:25.5589247Z 'range_num_stages': [0, 0], 2026-02-21T08:12:25.5589566Z 'range_unroll_factors': [0, 0], 2026-02-21T08:12:25.5589852Z 'range_warp_specializes': [None, True]} 2026-02-21T08:12:25.5601206Z [24s] Fitting surrogate: 194 points, 194 targets 2026-02-21T08:12:27.3545997Z [26s] Generation 2 starting: 86 neighbors, 5 active search path(s) 2026-02-21T08:12:32.7539077Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 10.6 configs/s 2026-02-21T08:12:37.2661401Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 19.5 configs/s 2026-02-21T08:12:38.1310059Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1173.9 2026-02-21T08:12:38.1311750Z configs/s 2026-02-21T08:12:38.2088224Z [36s] Generation 2 
complete: 2026-02-21T08:12:38.2090049Z error=14 2026-02-21T08:12:38.2090400Z ok=78 2026-02-21T08:12:38.2090612Z min=0.0205 2026-02-21T08:12:38.2090891Z mid=0.0450 2026-02-21T08:12:38.2091094Z max=1.7398 2026-02-21T08:12:38.2091352Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:12:38.2091670Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:12:38.2092063Z 'l2_groupings': [2], 2026-02-21T08:12:38.2092355Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:12:38.2092621Z 'loop_orders': [[1, 0]], 2026-02-21T08:12:38.2092917Z 'maxnreg': 128, 2026-02-21T08:12:38.2093157Z 'num_sm_multiplier': 4, 2026-02-21T08:12:38.2093417Z 'num_stages': 6, 2026-02-21T08:12:38.2093718Z 'num_warps': 1, 2026-02-21T08:12:38.2094000Z 'pid_type': 'persistent_blocked', 2026-02-21T08:12:38.2094297Z 'range_flattens': [None, False], 2026-02-21T08:12:38.2094582Z 'range_multi_buffers': [None, True], 2026-02-21T08:12:38.2094928Z 'range_num_stages': [0, 0], 2026-02-21T08:12:38.2095173Z 'range_unroll_factors': [0, 0], 2026-02-21T08:12:38.2095482Z 'range_warp_specializes': [None, True]} 2026-02-21T08:12:38.2124852Z [36s] Fitting surrogate: 286 points, 286 targets 2026-02-21T08:12:39.5559364Z [38s] Generation 3 starting: 82 neighbors, 5 active search path(s) 2026-02-21T08:12:44.3846537Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 25.5 configs/s 2026-02-21T08:12:45.0996252Z [43s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:12:45.0998162Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=5, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:12:45.1000001Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:12:45.1000419Z `ptxas` stderr: 2026-02-21T08:12:45.1001264Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 215 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:12:45.1001965Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:12:45.1002241Z 2026-02-21T08:12:45.1002879Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmps99ud8ke.ptx -o /tmp/tmps99ud8ke.ptx.o 2026-02-21T08:12:45.1003667Z 2026-02-21T08:12:45.1003906Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:12:45.1004239Z 2026-02-21T08:12:45.1004765Z 2026-02-21T08:12:45.1004945Z ================================================================ 2026-02-21T08:12:45.1005258Z Internal Triton PTX codegen error 2026-02-21T08:12:45.1005650Z `ptxas` stderr: 2026-02-21T08:12:45.1006198Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 215 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:12:45.1006933Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:12:45.1007183Z 2026-02-21T08:12:45.1007801Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmps99ud8ke.ptx -o /tmp/tmps99ud8ke.ptx.o 2026-02-21T08:12:45.1008427Z 2026-02-21T08:12:45.1008432Z 2026-02-21T08:12:45.1008533Z // 2026-02-21T08:12:45.1008788Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:12:45.1009099Z // 2026-02-21T08:12:45.1009245Z 2026-02-21T08:12:45.1009505Z .version 8.7 2026-02-21T08:12:45.1009767Z .target sm_100a 2026-02-21T08:12:45.1010081Z .address_size 64 2026-02-21T08:12:45.1010216Z 2026-02-21T08:12:45.1010479Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:12:45.1011010Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:12:45.1011455Z // @_helion_matmul 2026-02-21T08:12:45.1011820Z .visible .entry _helion_matmul( 2026-02-21T08:12:45.1012231Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:12:45.1012710Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:12:45.1013120Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:12:45.1013534Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:12:45.1014066Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:12:45.1014429Z ) 2026-02-21T08:12:45.1014657Z .reqntid 256 2026-02-21T08:12:45.1015022Z .maxnreg 32 2026-02-21T08:12:45.1015240Z { 2026-02-21T08:12:45.1015473Z .reg .pred %p<81>; 2026-02-21T08:12:45.1015783Z .reg .b16 %rs<3>; 2026-02-21T08:12:45.1016024Z .reg .b32 %r<672>; 2026-02-21T08:12:45.1016271Z .reg .b64 %rd<271>; 2026-02-21T08:12:45.1016708Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19:0 2026-02-21T08:12:45.1017203Z $L__func_begin0: 2026-02-21T08:12:45.1017584Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19:0 
2026-02-21T08:12:45.1017980Z 2026-02-21T08:12:45.1018087Z // %bb.0: 2026-02-21T08:12:45.1018383Z ld.param.b64 %rd8, [_helion_matmul_param_3]; 2026-02-21T08:12:45.1018689Z $L__tmp0: 2026-02-21T08:12:45.1019204Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19 2026-02-21T08:12:45.1019643Z mov.u32 %r1, %tid.x; 2026-02-21T08:12:45.1019931Z shr.u32 %r2, %r1, 5; 2026-02-21T08:12:45.1020202Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:12:45.1020605Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:12:45.1020938Z @%p1 bra $L__BB0_12; 2026-02-21T08:12:45.1021200Z bra.uni $L__BB0_1; 2026-02-21T08:12:45.1021574Z $L__BB0_12: 2026-02-21T08:12:45.1021972Z .loc 1 0 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:0:0 2026-02-21T08:12:45.1022538Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:12:45.1022990Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19 2026-02-21T08:12:45.1023547Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:12:45.1023910Z setp.lt.u32 %p27, %r1, 32; 2026-02-21T08:12:45.1024191Z mov.b32 %r249, global_smem; 2026-02-21T08:12:45.1024505Z // begin inline asm 2026-02-21T08:12:45.1024958Z @%p27 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r249], 64; 2026-02-21T08:12:45.1025397Z // end inline asm 2026-02-21T08:12:45.1025653Z bar.sync 0, 128; 2026-02-21T08:12:45.1025965Z ld.shared.b32 %r663, [global_smem]; 2026-02-21T08:12:45.1026297Z bar.sync 0, 128; 2026-02-21T08:12:45.1026643Z // begin inline asm 2026-02-21T08:12:45.1027032Z @%p27 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:12:45.1027383Z // end inline asm 2026-02-21T08:12:45.1027858Z .loc 1 21 67 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:21:67 2026-02-21T08:12:45.1028317Z mov.u32 %r258, %ctaid.x; 2026-02-21T08:12:45.1028599Z mov.u32 %r259, %ctaid.y; 2026-02-21T08:12:45.1028931Z mov.u32 %r260, %ctaid.z; 2026-02-21T08:12:45.1029193Z mov.u32 %r261, %nctaid.x; 
2026-02-21T08:12:45.1029481Z mov.u32 %r262, %nctaid.y; 2026-02-21T08:12:45.1029806Z mad.lo.s32 %r263, %r260, %r262, %r259; 2026-02-21T08:12:45.1030136Z mad.lo.s32 %r264, %r263, %r261, %r258; 2026-02-21T08:12:45.1030420Z shl.b32 %r265, %r264, 7; 2026-02-21T08:12:45.1030757Z cvt.s64.s32 %rd133, %r265; 2026-02-21T08:12:45.1031032Z add.s64 %rd130, %rd8, %rd133; 2026-02-21T08:12:45.1031440Z shl.b32 %r266, %r1, 2; 2026-02-21T08:12:45.1031778Z add.s32 %r250, %r249, %r266; 2026-02-21T08:12:45.1032053Z mov.b32 %r251, 0; 2026-02-21T08:12:45.1032342Z // begin inline asm 2026-02-21T08:12:45.1032627Z @%p27 st.shared.b32 [ %r250 + 0 ], %r251; 2026-02-21T08:12:45.1032977Z // end inline asm 2026-02-21T08:12:45.1033227Z bar.warp.sync -1; 2026-02-21T08:12:45.1033527Z setp.eq.b32 %p30, %r1, 0; 2026-02-21T08:12:45.1033809Z cvt.u64.u32 %rd115, %r249; 2026-02-21T08:12:45.1034106Z // begin inline asm 2026-02-21T08:12:45.1034551Z @%p30 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd115 + 0 ], %rd5; 2026-02-21T08:12:45.1035035Z // end inline asm 2026-02-21T08:12:45.1035329Z // begin inline asm 2026-02-21T08:12:45.1035676Z @%p30 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x1; 2026-02-21T08:12:45.1036152Z // end inline asm 2026-02-21T08:12:45.1036392Z mov.b32 %r252, 32; 2026-02-21T08:12:45.1036668Z // begin inline asm 2026-02-21T08:12:45.1037132Z @%p30 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x0, %r252; 2026-02-21T08:12:45.1037572Z // end inline asm 2026-02-21T08:12:45.1037841Z mov.b32 %r253, 128; 2026-02-21T08:12:45.1038130Z // begin inline asm 2026-02-21T08:12:45.1038561Z @%p30 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x1, %r253; 2026-02-21T08:12:45.1039006Z // end inline asm 2026-02-21T08:12:45.1039303Z mov.b32 %r254, 1024; 2026-02-21T08:12:45.1039610Z // begin inline asm 2026-02-21T08:12:45.1039992Z @%p30 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x0, %r254; 
2026-02-21T08:12:45.1040488Z // end inline asm 2026-02-21T08:12:45.1040728Z mov.b32 %r255, 4096; 2026-02-21T08:12:45.1041009Z // begin inline asm 2026-02-21T08:12:45.1041372Z @%p30 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x1, %r255; 2026-02-21T08:12:45.1041880Z // end inline asm 2026-02-21T08:12:45.1042148Z mov.b64 %rd123, 2048; 2026-02-21T08:12:45.1042392Z // begin inline asm 2026-02-21T08:12:45.1042832Z @%p30 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd115 + 0 ], 0x0, %rd123; 2026-02-21T08:12:45.1043245Z // end inline asm 2026-02-21T08:12:45.1043504Z mov.b32 %r256, 1; 2026-02-21T08:12:45.1043750Z // begin inline asm 2026-02-21T08:12:45.1044169Z @%p30 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x0, %r256; 2026-02-21T08:12:45.1044656Z // end inline asm 2026-02-21T08:12:45.1044995Z // begin inline asm 2026-02-21T08:12:45.1045439Z @%p30 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x1, %r256; 2026-02-21T08:12:45.1045874Z // end inline asm 2026-02-21T08:12:45.1046193Z // begin inline asm 2026-02-21T08:12:45.1046558Z @%p30 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x6; 2026-02-21T08:12:45.1047000Z // end inline asm 2026-02-21T08:12:45.1047303Z // begin inline asm 2026-02-21T08:12:45.1047681Z @%p30 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x0; 2026-02-21T08:12:45.1048258Z // end inline asm 2026-02-21T08:12:45.1048524Z // begin inline asm 2026-02-21T08:12:45.1048929Z @%p30 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x2; 2026-02-21T08:12:45.1049368Z // end inline asm 2026-02-21T08:12:45.1049657Z // begin inline asm 2026-02-21T08:12:45.1050053Z @%p30 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd115 + 0 ], 0x0; 2026-02-21T08:12:45.1050471Z // end inline asm 2026-02-21T08:12:45.1050755Z // begin inline asm 2026-02-21T08:12:45.1051277Z @%p27 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd130 + 0 ], [ %rd115 + 0 ], 0x80; 2026-02-21T08:12:45.1051860Z // end inline asm 2026-02-21T08:12:45.1052116Z // begin inline asm 2026-02-21T08:12:45.1052416Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd130 + 0 ], 0x80; 2026-02-21T08:12:45.1052816Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:12:45.1053184Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:12:45.1053493Z // end inline asm 2026-02-21T08:12:45.1053701Z bar.sync 0, 128; 2026-02-21T08:12:45.1054102Z .loc 1 28 35 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:28:35 2026-02-21T08:12:45.1054557Z shl.b32 %r671, %r258, 2; 2026-02-21T08:12:45.1055022Z .loc 1 29 37 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:29:37 2026-02-21T08:12:45.1055509Z add.s32 %r267, %r671, 4; 2026-02-21T08:12:45.1055951Z .loc 1 29 49 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:29:49 2026-02-21T08:12:45.1056426Z min.s32 %r25, %r267, 512; 2026-02-21T08:12:45.1056947Z .loc 1 30 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:30:52 2026-02-21T08:12:45.1057506Z setp.ge.s32 %p47, %r671, %r25; 2026-02-21T08:12:45.1057830Z @%p47 bra $L__BB0_15; 2026-02-21T08:12:45.1058097Z // %bb.13: // %.lr.ph 2026-02-21T08:12:45.1058637Z .loc 1 0 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:0:52 2026-02-21T08:12:45.1059105Z ld.param.b64 %rd7, [_helion_matmul_param_2]; 2026-02-21T08:12:45.1059429Z shl.b32 %r26, %r1, 3; 2026-02-21T08:12:45.1059716Z and.b32 %r27, %r26, 56; 2026-02-21T08:12:45.1060011Z shr.u32 %r268, %r1, 3; 2026-02-21T08:12:45.1060286Z bfe.u32 %r28, %r1, 3, 4; 2026-02-21T08:12:45.1060591Z or.b32 %r29, %r28, 16; 2026-02-21T08:12:45.1060880Z or.b32 %r30, %r28, 32; 2026-02-21T08:12:45.1061127Z or.b32 %r31, %r28, 48; 2026-02-21T08:12:45.1061437Z or.b32 %r32, %r28, 64; 2026-02-21T08:12:45.1061677Z or.b32 %r33, %r28, 80; 2026-02-21T08:12:45.1061959Z or.b32 %r34, %r28, 
96; 2026-02-21T08:12:45.1062216Z or.b32 %r35, %r268, 112; 2026-02-21T08:12:45.1062520Z shl.b32 %r269, %r1, 4; 2026-02-21T08:12:45.1062820Z and.b32 %r270, %r269, 1968; 2026-02-21T08:12:45.1063087Z bfe.s32 %r271, %r1, 2, 1; 2026-02-21T08:12:45.1063401Z and.b32 %r272, %r271, 2112; 2026-02-21T08:12:45.1063668Z or.b32 %r273, %r272, %r270; 2026-02-21T08:12:45.1063989Z add.s32 %r36, %r249, %r273; 2026-02-21T08:12:45.1064266Z xor.b32 %r275, %r273, 64; 2026-02-21T08:12:45.1064571Z add.s32 %r37, %r249, %r275; 2026-02-21T08:12:45.1064864Z shl.b32 %r276, %r1, 6; 2026-02-21T08:12:45.1065198Z and.b32 %r277, %r276, 2112; 2026-02-21T08:12:45.1065507Z shl.b32 %r278, %r1, 5; 2026-02-21T08:12:45.1065749Z and.b32 %r279, %r278, 768; 2026-02-21T08:12:45.1066103Z and.b32 %r280, %r26, 48; 2026-02-21T08:12:45.1066353Z shl.b32 %r281, %r1, 1; 2026-02-21T08:12:45.1066590Z and.b32 %r282, %r281, 192; 2026-02-21T08:12:45.1066854Z or.b32 %r283, %r277, %r279; 2026-02-21T08:12:45.1067124Z or.b32 %r284, %r280, %r282; 2026-02-21T08:12:45.1067357Z xor.b32 %r285, %r283, %r284; 2026-02-21T08:12:45.1067650Z add.s32 %r455, %r249, %r285; 2026-02-21T08:12:45.1067923Z add.s32 %r460, %r455, 1024; 2026-02-21T08:12:45.1068211Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T08:12:45.1068716Z .loc 1 36 35 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:36:35 2026-02-21T08:12:45.1069204Z shr.s32 %r523, %r671, 31; 2026-02-21T08:12:45.1069476Z shr.u32 %r524, %r523, 21; 2026-02-21T08:12:45.1069713Z add.s32 %r525, %r671, %r524; 2026-02-21T08:12:45.1070007Z shr.s32 %r526, %r525, 11; 2026-02-21T08:12:45.1070399Z .loc 1 37 33 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:37:33 2026-02-21T08:12:45.1070795Z shl.b32 %r527, %r526, 6; 2026-02-21T08:12:45.1071212Z .loc 1 38 39 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:38:39 2026-02-21T08:12:45.1071586Z sub.s32 %r528, 16, %r527; 2026-02-21T08:12:45.1071992Z .loc 1 38 52 // 
c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:38:52 2026-02-21T08:12:45.1072441Z min.s32 %r529, %r528, 64; 2026-02-21T08:12:45.1072864Z .loc 1 40 51 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:40:51 2026-02-21T08:12:45.1073313Z and.b32 %r530, %r525, -2048; 2026-02-21T08:12:45.1073566Z sub.s32 %r531, %r671, %r530; 2026-02-21T08:12:45.1073828Z div.s32 %r532, %r531, %r529; 2026-02-21T08:12:45.1074186Z .loc 1 41 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:41:27 2026-02-21T08:12:45.1074761Z mul.lo.s32 %r533, %r532, %r529; 2026-02-21T08:12:45.1075106Z mad.lo.s32 %r534, %r526, 1984, %r533; 2026-02-21T08:12:45.1075426Z sub.s32 %r535, %r671, %r534; 2026-02-21T08:12:45.1075800Z shl.b32 %r536, %r535, 6; 2026-02-21T08:12:45.1076229Z .loc 1 42 32 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:42:32 2026-02-21T08:12:45.1076652Z or.b32 %r537, %r536, %r27; 2026-02-21T08:12:45.1077108Z .loc 1 43 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:43:27 2026-02-21T08:12:45.1077613Z shl.b32 %r538, %r532, 7; 2026-02-21T08:12:45.1078073Z .loc 1 44 32 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:44:32 2026-02-21T08:12:45.1078579Z or.b32 %r539, %r538, %r28; 2026-02-21T08:12:45.1078893Z or.b32 %r540, %r538, %r29; 2026-02-21T08:12:45.1079153Z or.b32 %r541, %r538, %r30; 2026-02-21T08:12:45.1079466Z or.b32 %r542, %r538, %r31; 2026-02-21T08:12:45.1079717Z or.b32 %r543, %r538, %r32; 2026-02-21T08:12:45.1080028Z or.b32 %r544, %r538, %r33; 2026-02-21T08:12:45.1080343Z or.b32 %r545, %r538, %r34; 2026-02-21T08:12:45.1080612Z or.b32 %r546, %r538, %r35; 2026-02-21T08:12:45.1081079Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1081592Z shfl.sync.idx.b32 %r547, %r2, 0, 31, -1; 2026-02-21T08:12:45.1081909Z shl.b32 %r548, %r547, 21; 2026-02-21T08:12:45.1082146Z and.b32 %r549, %r548, 6291456; 2026-02-21T08:12:45.1082431Z add.s32 %r286, %r549, %r663; 
2026-02-21T08:12:45.1082688Z mov.pred %p48, -1; 2026-02-21T08:12:45.1082939Z mov.b32 %r287, 0; 2026-02-21T08:12:45.1083194Z // begin inline asm 2026-02-21T08:12:45.1083713Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r286 + 0], {%r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287}; 2026-02-21T08:12:45.1084272Z // end inline asm 2026-02-21T08:12:45.1084468Z // begin inline asm 2026-02-21T08:12:45.1085188Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r286 + 16], {%r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287}; 2026-02-21T08:12:45.1085801Z // end inline asm 2026-02-21T08:12:45.1086022Z // begin inline asm 2026-02-21T08:12:45.1086599Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r286 + 32], {%r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287}; 2026-02-21T08:12:45.1087102Z // end inline asm 2026-02-21T08:12:45.1087335Z // begin inline asm 2026-02-21T08:12:45.1087860Z @%p48 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r286 + 48], {%r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287, %r287}; 2026-02-21T08:12:45.1088472Z // end inline asm 2026-02-21T08:12:45.1088706Z // begin inline asm 2026-02-21T08:12:45.1088975Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:12:45.1089250Z // end inline asm 2026-02-21T08:12:45.1089467Z bar.sync 0, 128; 2026-02-21T08:12:45.1089884Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1090276Z add.s32 %r354, %r249, 57344; 2026-02-21T08:12:45.1090564Z // begin inline asm 2026-02-21T08:12:45.1090858Z @%p30 mbarrier.init.shared::cta.b64 [%r354], 1; 2026-02-21T08:12:45.1091129Z // end inline asm 2026-02-21T08:12:45.1091385Z bar.sync 0, 128; 2026-02-21T08:12:45.1091615Z add.s32 %r355, %r249, 57352; 2026-02-21T08:12:45.1091891Z // begin inline asm 
2026-02-21T08:12:45.1092144Z @%p30 mbarrier.init.shared::cta.b64 [%r355], 1; 2026-02-21T08:12:45.1092527Z // end inline asm 2026-02-21T08:12:45.1092795Z bar.sync 0, 128; 2026-02-21T08:12:45.1093012Z add.s32 %r356, %r249, 57360; 2026-02-21T08:12:45.1093282Z // begin inline asm 2026-02-21T08:12:45.1093538Z @%p30 mbarrier.init.shared::cta.b64 [%r356], 1; 2026-02-21T08:12:45.1093852Z // end inline asm 2026-02-21T08:12:45.1094041Z bar.sync 0, 128; 2026-02-21T08:12:45.1094330Z add.s32 %r357, %r249, 57368; 2026-02-21T08:12:45.1094564Z // begin inline asm 2026-02-21T08:12:45.1094852Z @%p30 mbarrier.init.shared::cta.b64 [%r357], 1; 2026-02-21T08:12:45.1095204Z // end inline asm 2026-02-21T08:12:45.1095425Z bar.sync 0, 128; 2026-02-21T08:12:45.1095674Z add.s32 %r358, %r249, 57376; 2026-02-21T08:12:45.1095969Z // begin inline asm 2026-02-21T08:12:45.1096263Z @%p30 mbarrier.init.shared::cta.b64 [%r358], 1; 2026-02-21T08:12:45.1096550Z // end inline asm 2026-02-21T08:12:45.1096819Z add.s32 %r359, %r249, 57392; 2026-02-21T08:12:45.1097049Z // begin inline asm 2026-02-21T08:12:45.1097329Z @%p30 mbarrier.init.shared::cta.b64 [%r359], 1; 2026-02-21T08:12:45.1097657Z // end inline asm 2026-02-21T08:12:45.1097857Z bar.sync 0, 128; 2026-02-21T08:12:45.1098102Z add.s32 %r360, %r249, 57400; 2026-02-21T08:12:45.1098341Z // begin inline asm 2026-02-21T08:12:45.1098622Z @%p30 mbarrier.init.shared::cta.b64 [%r360], 1; 2026-02-21T08:12:45.1098886Z // end inline asm 2026-02-21T08:12:45.1099138Z bar.sync 0, 128; 2026-02-21T08:12:45.1099361Z add.s32 %r361, %r249, 57408; 2026-02-21T08:12:45.1099623Z // begin inline asm 2026-02-21T08:12:45.1099915Z @%p30 mbarrier.init.shared::cta.b64 [%r361], 1; 2026-02-21T08:12:45.1100196Z // end inline asm 2026-02-21T08:12:45.1100438Z bar.sync 0, 128; 2026-02-21T08:12:45.1100641Z add.s32 %r362, %r249, 57416; 2026-02-21T08:12:45.1100937Z // begin inline asm 2026-02-21T08:12:45.1101176Z @%p30 mbarrier.init.shared::cta.b64 [%r362], 1; 2026-02-21T08:12:45.1101466Z 
// end inline asm 2026-02-21T08:12:45.1101737Z bar.sync 0, 128; 2026-02-21T08:12:45.1101950Z add.s32 %r363, %r249, 57424; 2026-02-21T08:12:45.1102209Z // begin inline asm 2026-02-21T08:12:45.1102485Z @%p30 mbarrier.init.shared::cta.b64 [%r363], 1; 2026-02-21T08:12:45.1102779Z // end inline asm 2026-02-21T08:12:45.1103135Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1103571Z bar.sync 0, 128; 2026-02-21T08:12:45.1103782Z // begin inline asm 2026-02-21T08:12:45.1104079Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r354]; 2026-02-21T08:12:45.1104400Z // end inline asm 2026-02-21T08:12:45.1104601Z bar.sync 0, 128; 2026-02-21T08:12:45.1104910Z // begin inline asm 2026-02-21T08:12:45.1105162Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r355]; 2026-02-21T08:12:45.1105499Z // end inline asm 2026-02-21T08:12:45.1105725Z bar.sync 0, 128; 2026-02-21T08:12:45.1105995Z // begin inline asm 2026-02-21T08:12:45.1106321Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r356]; 2026-02-21T08:12:45.1106625Z // end inline asm 2026-02-21T08:12:45.1106918Z bar.sync 0, 128; 2026-02-21T08:12:45.1107244Z // begin inline asm 2026-02-21T08:12:45.1107539Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r357]; 2026-02-21T08:12:45.1107826Z // end inline asm 2026-02-21T08:12:45.1108148Z bar.sync 0, 128; 2026-02-21T08:12:45.1108377Z // begin inline asm 2026-02-21T08:12:45.1108670Z @%p30 mbarrier.arrive.shared::cta.b64 _, [%r358]; 2026-02-21T08:12:45.1109058Z // end inline asm 2026-02-21T08:12:45.1109453Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1109921Z bar.sync 0, 128; 2026-02-21T08:12:45.1110200Z add.s32 %r369, %r249, 57440; 2026-02-21T08:12:45.1110495Z // begin inline asm 2026-02-21T08:12:45.1110767Z @%p30 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T08:12:45.1111130Z // end inline asm 2026-02-21T08:12:45.1111430Z st.shared.b32 [global_smem+57448], 33554689; 2026-02-21T08:12:45.1111753Z 
st.shared.b32 [global_smem+40960], %r663; 2026-02-21T08:12:45.1112245Z st.shared.v2.b32 [global_smem+40968], {%r538, %r536}; 2026-02-21T08:12:45.1112571Z barrier.sync 1; 2026-02-21T08:12:45.1112892Z barrier.sync 1; 2026-02-21T08:12:45.1113137Z barrier.sync 1; 2026-02-21T08:12:45.1113559Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1114000Z bar.sync 0, 128; 2026-02-21T08:12:45.1114207Z // begin inline asm 2026-02-21T08:12:45.1114466Z 2026-02-21T08:12:45.1114709Z { 2026-02-21T08:12:45.1115011Z .reg .pred complete; 2026-02-21T08:12:45.1115299Z waitLoop: 2026-02-21T08:12:45.1115669Z mbarrier.try_wait.parity.shared.b64 complete, [%r369], %r287; 2026-02-21T08:12:45.1115985Z @!complete bra.uni waitLoop; 2026-02-21T08:12:45.1116292Z } 2026-02-21T08:12:45.1116402Z 2026-02-21T08:12:45.1116529Z // end inline asm 2026-02-21T08:12:45.1116873Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1117329Z bar.sync 0, 128; 2026-02-21T08:12:45.1117558Z // begin inline asm 2026-02-21T08:12:45.1117826Z @%p30 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T08:12:45.1118129Z // end inline asm 2026-02-21T08:12:45.1118384Z // begin inline asm 2026-02-21T08:12:45.1118629Z @%p30 mbarrier.inval.shared::cta.b64 [%r359]; 2026-02-21T08:12:45.1118987Z // end inline asm 2026-02-21T08:12:45.1119228Z bar.sync 0, 128; 2026-02-21T08:12:45.1119437Z // begin inline asm 2026-02-21T08:12:45.1119739Z @%p30 mbarrier.inval.shared::cta.b64 [%r360]; 2026-02-21T08:12:45.1120000Z // end inline asm 2026-02-21T08:12:45.1120239Z bar.sync 0, 128; 2026-02-21T08:12:45.1209483Z // begin inline asm 2026-02-21T08:12:45.1210031Z @%p30 mbarrier.inval.shared::cta.b64 [%r361]; 2026-02-21T08:12:45.1210301Z // end inline asm 2026-02-21T08:12:45.1210470Z bar.sync 0, 128; 2026-02-21T08:12:45.1210644Z // begin inline asm 2026-02-21T08:12:45.1210846Z @%p30 mbarrier.inval.shared::cta.b64 [%r362]; 2026-02-21T08:12:45.1211080Z // 
end inline asm 2026-02-21T08:12:45.1211254Z bar.sync 0, 128; 2026-02-21T08:12:45.1211441Z // begin inline asm 2026-02-21T08:12:45.1211632Z @%p30 mbarrier.inval.shared::cta.b64 [%r363]; 2026-02-21T08:12:45.1211860Z // end inline asm 2026-02-21T08:12:45.1212032Z // begin inline asm 2026-02-21T08:12:45.1212219Z @%p30 mbarrier.inval.shared::cta.b64 [%r354]; 2026-02-21T08:12:45.1212442Z // end inline asm 2026-02-21T08:12:45.1212604Z bar.sync 0, 128; 2026-02-21T08:12:45.1212769Z // begin inline asm 2026-02-21T08:12:45.1212951Z @%p30 mbarrier.inval.shared::cta.b64 [%r355]; 2026-02-21T08:12:45.1213165Z // end inline asm 2026-02-21T08:12:45.1213315Z bar.sync 0, 128; 2026-02-21T08:12:45.1213478Z // begin inline asm 2026-02-21T08:12:45.1213658Z @%p30 mbarrier.inval.shared::cta.b64 [%r356]; 2026-02-21T08:12:45.1213870Z // end inline asm 2026-02-21T08:12:45.1214025Z bar.sync 0, 128; 2026-02-21T08:12:45.1214177Z // begin inline asm 2026-02-21T08:12:45.1214364Z @%p30 mbarrier.inval.shared::cta.b64 [%r357]; 2026-02-21T08:12:45.1214572Z // end inline asm 2026-02-21T08:12:45.1215021Z bar.sync 0, 128; 2026-02-21T08:12:45.1215181Z // begin inline asm 2026-02-21T08:12:45.1215378Z @%p30 mbarrier.inval.shared::cta.b64 [%r358]; 2026-02-21T08:12:45.1215594Z // end inline asm 2026-02-21T08:12:45.1215924Z .loc 1 59 45 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:59:45 2026-02-21T08:12:45.1216298Z shl.b32 %r551, %r539, 10; 2026-02-21T08:12:45.1216491Z shl.b32 %r552, %r540, 10; 2026-02-21T08:12:45.1216685Z shl.b32 %r553, %r541, 10; 2026-02-21T08:12:45.1216861Z shl.b32 %r554, %r542, 10; 2026-02-21T08:12:45.1217048Z shl.b32 %r555, %r543, 10; 2026-02-21T08:12:45.1217222Z shl.b32 %r556, %r544, 10; 2026-02-21T08:12:45.1217407Z shl.b32 %r557, %r545, 10; 2026-02-21T08:12:45.1217580Z shl.b32 %r558, %r546, 10; 2026-02-21T08:12:45.1217905Z .loc 1 59 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:59:52 2026-02-21T08:12:45.1218262Z add.s32 %r559, %r551, %r537; 
2026-02-21T08:12:45.1218588Z add.s32 %r560, %r552, %r537; 2026-02-21T08:12:45.1218796Z add.s32 %r561, %r553, %r537; 2026-02-21T08:12:45.1218978Z add.s32 %r562, %r554, %r537; 2026-02-21T08:12:45.1219170Z add.s32 %r563, %r555, %r537; 2026-02-21T08:12:45.1219353Z add.s32 %r564, %r556, %r537; 2026-02-21T08:12:45.1219540Z add.s32 %r565, %r557, %r537; 2026-02-21T08:12:45.1219717Z add.s32 %r566, %r558, %r537; 2026-02-21T08:12:45.1220039Z .loc 1 59 24 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:59:24 2026-02-21T08:12:45.1220406Z mad.wide.s32 %rd134, %r559, 2, %rd7; 2026-02-21T08:12:45.1220635Z mad.wide.s32 %rd135, %r560, 2, %rd7; 2026-02-21T08:12:45.1220849Z mad.wide.s32 %rd136, %r561, 2, %rd7; 2026-02-21T08:12:45.1221044Z mad.wide.s32 %rd137, %r562, 2, %rd7; 2026-02-21T08:12:45.1221247Z mad.wide.s32 %rd138, %r563, 2, %rd7; 2026-02-21T08:12:45.1221439Z mad.wide.s32 %rd139, %r564, 2, %rd7; 2026-02-21T08:12:45.1221638Z mad.wide.s32 %rd140, %r565, 2, %rd7; 2026-02-21T08:12:45.1221834Z mad.wide.s32 %rd141, %r566, 2, %rd7; 2026-02-21T08:12:45.1222166Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1222514Z // begin inline asm 2026-02-21T08:12:45.1222936Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398}, [%r286 + 0]; 2026-02-21T08:12:45.1223387Z // end inline asm 2026-02-21T08:12:45.1223548Z // begin inline asm 2026-02-21T08:12:45.1223982Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415}, [%r286 + 16]; 2026-02-21T08:12:45.1224437Z // end inline asm 2026-02-21T08:12:45.1224606Z // begin inline asm 2026-02-21T08:12:45.1225055Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432}, [%r286 + 32]; 
2026-02-21T08:12:45.1225483Z // end inline asm 2026-02-21T08:12:45.1225651Z // begin inline asm 2026-02-21T08:12:45.1226039Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449}, [%r286 + 48]; 2026-02-21T08:12:45.1226494Z // end inline asm 2026-02-21T08:12:45.1226645Z // begin inline asm 2026-02-21T08:12:45.1226834Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:12:45.1227029Z // end inline asm 2026-02-21T08:12:45.1227189Z cvt.u64.u32 %rd142, %r383; 2026-02-21T08:12:45.1227378Z cvt.u64.u32 %rd143, %r384; 2026-02-21T08:12:45.1227559Z shl.b64 %rd144, %rd143, 32; 2026-02-21T08:12:45.1227757Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T08:12:45.1228078Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1228422Z mov.b64 {%r567, %r568}, %rd145; 2026-02-21T08:12:45.1228624Z cvt.rn.f16x2.f32 %r569, %r568, %r567; 2026-02-21T08:12:45.1228969Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1229393Z cvt.u64.u32 %rd146, %r385; 2026-02-21T08:12:45.1229573Z cvt.u64.u32 %rd147, %r386; 2026-02-21T08:12:45.1229762Z shl.b64 %rd148, %rd147, 32; 2026-02-21T08:12:45.1229943Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T08:12:45.1230269Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1230603Z mov.b64 {%r570, %r571}, %rd149; 2026-02-21T08:12:45.1230808Z cvt.rn.f16x2.f32 %r572, %r571, %r570; 2026-02-21T08:12:45.1231144Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1231472Z cvt.u64.u32 %rd150, %r387; 2026-02-21T08:12:45.1231662Z cvt.u64.u32 %rd151, %r388; 2026-02-21T08:12:45.1231844Z shl.b64 %rd152, %rd151, 32; 2026-02-21T08:12:45.1232039Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T08:12:45.1233037Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 
2026-02-21T08:12:45.1233395Z mov.b64 {%r573, %r574}, %rd153; 2026-02-21T08:12:45.1233596Z cvt.rn.f16x2.f32 %r575, %r574, %r573; 2026-02-21T08:12:45.1233934Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1234282Z cvt.u64.u32 %rd154, %r389; 2026-02-21T08:12:45.1234460Z cvt.u64.u32 %rd155, %r390; 2026-02-21T08:12:45.1234652Z shl.b64 %rd156, %rd155, 32; 2026-02-21T08:12:45.1234938Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T08:12:45.1235265Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1235594Z mov.b64 {%r576, %r577}, %rd157; 2026-02-21T08:12:45.1235796Z cvt.rn.f16x2.f32 %r578, %r577, %r576; 2026-02-21T08:12:45.1236137Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1236474Z cvt.u64.u32 %rd158, %r391; 2026-02-21T08:12:45.1236668Z cvt.u64.u32 %rd159, %r392; 2026-02-21T08:12:45.1236845Z shl.b64 %rd160, %rd159, 32; 2026-02-21T08:12:45.1237039Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T08:12:45.1237355Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1237710Z mov.b64 {%r579, %r580}, %rd161; 2026-02-21T08:12:45.1238096Z cvt.rn.f16x2.f32 %r581, %r580, %r579; 2026-02-21T08:12:45.1238426Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1238775Z cvt.u64.u32 %rd162, %r393; 2026-02-21T08:12:45.1238950Z cvt.u64.u32 %rd163, %r394; 2026-02-21T08:12:45.1239133Z shl.b64 %rd164, %rd163, 32; 2026-02-21T08:12:45.1239310Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T08:12:45.1239634Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1239981Z mov.b64 {%r582, %r583}, %rd165; 2026-02-21T08:12:45.1240184Z cvt.rn.f16x2.f32 %r584, %r583, %r582; 2026-02-21T08:12:45.1240517Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1240848Z cvt.u64.u32 %rd166, 
%r395; 2026-02-21T08:12:45.1241032Z cvt.u64.u32 %rd167, %r396; 2026-02-21T08:12:45.1241208Z shl.b64 %rd168, %rd167, 32; 2026-02-21T08:12:45.1241392Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T08:12:45.1241706Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1242043Z mov.b64 {%r585, %r586}, %rd169; 2026-02-21T08:12:45.1242242Z cvt.rn.f16x2.f32 %r587, %r586, %r585; 2026-02-21T08:12:45.1242566Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1242909Z cvt.u64.u32 %rd170, %r397; 2026-02-21T08:12:45.1243083Z cvt.u64.u32 %rd171, %r398; 2026-02-21T08:12:45.1243265Z shl.b64 %rd172, %rd171, 32; 2026-02-21T08:12:45.1243444Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T08:12:45.1243854Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1244178Z mov.b64 {%r588, %r589}, %rd173; 2026-02-21T08:12:45.1244376Z cvt.rn.f16x2.f32 %r590, %r589, %r588; 2026-02-21T08:12:45.1245210Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1245539Z cvt.u64.u32 %rd174, %r400; 2026-02-21T08:12:45.1245727Z cvt.u64.u32 %rd175, %r401; 2026-02-21T08:12:45.1245906Z shl.b64 %rd176, %rd175, 32; 2026-02-21T08:12:45.1246101Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T08:12:45.1246406Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1246753Z mov.b64 {%r591, %r592}, %rd177; 2026-02-21T08:12:45.1246960Z cvt.rn.f16x2.f32 %r593, %r592, %r591; 2026-02-21T08:12:45.1247346Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1247691Z cvt.u64.u32 %rd178, %r402; 2026-02-21T08:12:45.1247868Z cvt.u64.u32 %rd179, %r403; 2026-02-21T08:12:45.1248054Z shl.b64 %rd180, %rd179, 32; 2026-02-21T08:12:45.1248234Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T08:12:45.1248542Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 
2026-02-21T08:12:45.1248865Z mov.b64 {%r594, %r595}, %rd181; 2026-02-21T08:12:45.1249062Z cvt.rn.f16x2.f32 %r596, %r595, %r594; 2026-02-21T08:12:45.1249387Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1249712Z cvt.u64.u32 %rd182, %r404; 2026-02-21T08:12:45.1249897Z cvt.u64.u32 %rd183, %r405; 2026-02-21T08:12:45.1250071Z shl.b64 %rd184, %rd183, 32; 2026-02-21T08:12:45.1250256Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T08:12:45.1250562Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1250900Z mov.b64 {%r597, %r598}, %rd185; 2026-02-21T08:12:45.1251095Z cvt.rn.f16x2.f32 %r599, %r598, %r597; 2026-02-21T08:12:45.1251413Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1251751Z cvt.u64.u32 %rd186, %r406; 2026-02-21T08:12:45.1251927Z cvt.u64.u32 %rd187, %r407; 2026-02-21T08:12:45.1252107Z shl.b64 %rd188, %rd187, 32; 2026-02-21T08:12:45.1252290Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T08:12:45.1252602Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1252936Z mov.b64 {%r600, %r601}, %rd189; 2026-02-21T08:12:45.1253124Z cvt.rn.f16x2.f32 %r602, %r601, %r600; 2026-02-21T08:12:45.1253451Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1253777Z cvt.u64.u32 %rd190, %r408; 2026-02-21T08:12:45.1253958Z cvt.u64.u32 %rd191, %r409; 2026-02-21T08:12:45.1254139Z shl.b64 %rd192, %rd191, 32; 2026-02-21T08:12:45.1254311Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T08:12:45.1254622Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1255032Z mov.b64 {%r603, %r604}, %rd193; 2026-02-21T08:12:45.1255225Z cvt.rn.f16x2.f32 %r605, %r604, %r603; 2026-02-21T08:12:45.1255537Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1255872Z cvt.u64.u32 %rd194, 
%r410; 2026-02-21T08:12:45.1256050Z cvt.u64.u32 %rd195, %r411; 2026-02-21T08:12:45.1256220Z shl.b64 %rd196, %rd195, 32; 2026-02-21T08:12:45.1256401Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T08:12:45.1256696Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1257027Z mov.b64 {%r606, %r607}, %rd197; 2026-02-21T08:12:45.1257215Z cvt.rn.f16x2.f32 %r608, %r607, %r606; 2026-02-21T08:12:45.1257608Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1257930Z cvt.u64.u32 %rd198, %r412; 2026-02-21T08:12:45.1258112Z cvt.u64.u32 %rd199, %r413; 2026-02-21T08:12:45.1258291Z shl.b64 %rd200, %rd199, 32; 2026-02-21T08:12:45.1258465Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T08:12:45.1258773Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1259093Z mov.b64 {%r609, %r610}, %rd201; 2026-02-21T08:12:45.1259286Z cvt.rn.f16x2.f32 %r611, %r610, %r609; 2026-02-21T08:12:45.1259596Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1259938Z cvt.u64.u32 %rd202, %r414; 2026-02-21T08:12:45.1260120Z cvt.u64.u32 %rd203, %r415; 2026-02-21T08:12:45.1260290Z shl.b64 %rd204, %rd203, 32; 2026-02-21T08:12:45.1260589Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T08:12:45.1260891Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1261250Z mov.b64 {%r612, %r613}, %rd205; 2026-02-21T08:12:45.1261438Z cvt.rn.f16x2.f32 %r614, %r613, %r612; 2026-02-21T08:12:45.1261760Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1262089Z cvt.u64.u32 %rd206, %r417; 2026-02-21T08:12:45.1262275Z cvt.u64.u32 %rd207, %r418; 2026-02-21T08:12:45.1262451Z shl.b64 %rd208, %rd207, 32; 2026-02-21T08:12:45.1262639Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T08:12:45.1262939Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 
2026-02-21T08:12:45.1263279Z mov.b64 {%r615, %r616}, %rd209; 2026-02-21T08:12:45.1263477Z cvt.rn.f16x2.f32 %r617, %r616, %r615; 2026-02-21T08:12:45.1263791Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1264119Z cvt.u64.u32 %rd210, %r419; 2026-02-21T08:12:45.1264295Z cvt.u64.u32 %rd211, %r420; 2026-02-21T08:12:45.1264473Z shl.b64 %rd212, %rd211, 32; 2026-02-21T08:12:45.1264649Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T08:12:45.1265005Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1265338Z mov.b64 {%r618, %r619}, %rd213; 2026-02-21T08:12:45.1265526Z cvt.rn.f16x2.f32 %r620, %r619, %r618; 2026-02-21T08:12:45.1265848Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1266184Z cvt.u64.u32 %rd214, %r421; 2026-02-21T08:12:45.1266363Z cvt.u64.u32 %rd215, %r422; 2026-02-21T08:12:45.1266534Z shl.b64 %rd216, %rd215, 32; 2026-02-21T08:12:45.1266717Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T08:12:45.1267015Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1267346Z mov.b64 {%r621, %r622}, %rd217; 2026-02-21T08:12:45.1267542Z cvt.rn.f16x2.f32 %r623, %r622, %r621; 2026-02-21T08:12:45.1267859Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1268203Z cvt.u64.u32 %rd218, %r423; 2026-02-21T08:12:45.1268375Z cvt.u64.u32 %rd219, %r424; 2026-02-21T08:12:45.1268552Z shl.b64 %rd220, %rd219, 32; 2026-02-21T08:12:45.1268724Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T08:12:45.1269027Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1269353Z mov.b64 {%r624, %r625}, %rd221; 2026-02-21T08:12:45.1269536Z cvt.rn.f16x2.f32 %r626, %r625, %r624; 2026-02-21T08:12:45.1269857Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1270195Z cvt.u64.u32 %rd222, 
%r425; 2026-02-21T08:12:45.1270373Z cvt.u64.u32 %rd223, %r426; 2026-02-21T08:12:45.1270547Z shl.b64 %rd224, %rd223, 32; 2026-02-21T08:12:45.1270820Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T08:12:45.1271118Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1271454Z mov.b64 {%r627, %r628}, %rd225; 2026-02-21T08:12:45.1271646Z cvt.rn.f16x2.f32 %r629, %r628, %r627; 2026-02-21T08:12:45.1271958Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1272291Z cvt.u64.u32 %rd226, %r427; 2026-02-21T08:12:45.1272464Z cvt.u64.u32 %rd227, %r428; 2026-02-21T08:12:45.1272644Z shl.b64 %rd228, %rd227, 32; 2026-02-21T08:12:45.1272822Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T08:12:45.1273141Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1273488Z mov.b64 {%r630, %r631}, %rd229; 2026-02-21T08:12:45.1273676Z cvt.rn.f16x2.f32 %r632, %r631, %r630; 2026-02-21T08:12:45.1274078Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1274407Z cvt.u64.u32 %rd230, %r429; 2026-02-21T08:12:45.1274590Z cvt.u64.u32 %rd231, %r430; 2026-02-21T08:12:45.1274799Z shl.b64 %rd232, %rd231, 32; 2026-02-21T08:12:45.1274984Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T08:12:45.1275291Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1275619Z mov.b64 {%r633, %r634}, %rd233; 2026-02-21T08:12:45.1275811Z cvt.rn.f16x2.f32 %r635, %r634, %r633; 2026-02-21T08:12:45.1276134Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1276477Z cvt.u64.u32 %rd234, %r431; 2026-02-21T08:12:45.1276650Z cvt.u64.u32 %rd235, %r432; 2026-02-21T08:12:45.1276827Z shl.b64 %rd236, %rd235, 32; 2026-02-21T08:12:45.1277001Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T08:12:45.1277320Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 
2026-02-21T08:12:45.1277661Z mov.b64 {%r636, %r637}, %rd237; 2026-02-21T08:12:45.1277847Z cvt.rn.f16x2.f32 %r638, %r637, %r636; 2026-02-21T08:12:45.1278172Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1278508Z cvt.u64.u32 %rd238, %r434; 2026-02-21T08:12:45.1278686Z cvt.u64.u32 %rd239, %r435; 2026-02-21T08:12:45.1278856Z shl.b64 %rd240, %rd239, 32; 2026-02-21T08:12:45.1279039Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T08:12:45.1279354Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1279681Z mov.b64 {%r639, %r640}, %rd241; 2026-02-21T08:12:45.1279873Z cvt.rn.f16x2.f32 %r641, %r640, %r639; 2026-02-21T08:12:45.1280195Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1280527Z cvt.u64.u32 %rd242, %r436; 2026-02-21T08:12:45.1280697Z cvt.u64.u32 %rd243, %r437; 2026-02-21T08:12:45.1280876Z shl.b64 %rd244, %rd243, 32; 2026-02-21T08:12:45.1281051Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T08:12:45.1281369Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1281700Z mov.b64 {%r642, %r643}, %rd245; 2026-02-21T08:12:45.1281884Z cvt.rn.f16x2.f32 %r644, %r643, %r642; 2026-02-21T08:12:45.1282209Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1282535Z cvt.u64.u32 %rd246, %r438; 2026-02-21T08:12:45.1282710Z cvt.u64.u32 %rd247, %r439; 2026-02-21T08:12:45.1282881Z shl.b64 %rd248, %rd247, 32; 2026-02-21T08:12:45.1283062Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T08:12:45.1283381Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1283710Z mov.b64 {%r645, %r646}, %rd249; 2026-02-21T08:12:45.1283902Z cvt.rn.f16x2.f32 %r647, %r646, %r645; 2026-02-21T08:12:45.1284281Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1284607Z cvt.u64.u32 %rd250, 
%r440; 2026-02-21T08:12:45.1284814Z cvt.u64.u32 %rd251, %r441; 2026-02-21T08:12:45.1284994Z shl.b64 %rd252, %rd251, 32; 2026-02-21T08:12:45.1285170Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T08:12:45.1285481Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1285808Z mov.b64 {%r648, %r649}, %rd253; 2026-02-21T08:12:45.1285996Z cvt.rn.f16x2.f32 %r650, %r649, %r648; 2026-02-21T08:12:45.1286324Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1286657Z cvt.u64.u32 %rd254, %r442; 2026-02-21T08:12:45.1286842Z cvt.u64.u32 %rd255, %r443; 2026-02-21T08:12:45.1287013Z shl.b64 %rd256, %rd255, 32; 2026-02-21T08:12:45.1287258Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T08:12:45.1287576Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1287901Z mov.b64 {%r651, %r652}, %rd257; 2026-02-21T08:12:45.1288091Z cvt.rn.f16x2.f32 %r653, %r652, %r651; 2026-02-21T08:12:45.1288408Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1288739Z cvt.u64.u32 %rd258, %r444; 2026-02-21T08:12:45.1288908Z cvt.u64.u32 %rd259, %r445; 2026-02-21T08:12:45.1289087Z shl.b64 %rd260, %rd259, 32; 2026-02-21T08:12:45.1289260Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T08:12:45.1289570Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1289902Z mov.b64 {%r654, %r655}, %rd261; 2026-02-21T08:12:45.1290086Z cvt.rn.f16x2.f32 %r656, %r655, %r654; 2026-02-21T08:12:45.1290411Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1290745Z cvt.u64.u32 %rd262, %r446; 2026-02-21T08:12:45.1290922Z cvt.u64.u32 %rd263, %r447; 2026-02-21T08:12:45.1291091Z shl.b64 %rd264, %rd263, 32; 2026-02-21T08:12:45.1291273Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T08:12:45.1291587Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 
2026-02-21T08:12:45.1291918Z mov.b64 {%r657, %r658}, %rd265; 2026-02-21T08:12:45.1292111Z cvt.rn.f16x2.f32 %r659, %r658, %r657; 2026-02-21T08:12:45.1292423Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1292757Z cvt.u64.u32 %rd266, %r448; 2026-02-21T08:12:45.1292927Z cvt.u64.u32 %rd267, %r449; 2026-02-21T08:12:45.1293105Z shl.b64 %rd268, %rd267, 32; 2026-02-21T08:12:45.1293278Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T08:12:45.1293592Z .loc 1 58 27 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:58:27 2026-02-21T08:12:45.1293925Z mov.b64 {%r660, %r661}, %rd269; 2026-02-21T08:12:45.1294111Z cvt.rn.f16x2.f32 %r662, %r661, %r660; 2026-02-21T08:12:45.1294430Z .loc 1 59 82 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:59:82 2026-02-21T08:12:45.1294812Z st.shared.v4.b32 [%r36], {%r569, %r581, %r593, %r605}; 2026-02-21T08:12:45.1295082Z st.shared.v4.b32 [%r37], {%r617, %r629, %r641, %r653}; 2026-02-21T08:12:45.1295302Z bar.sync 0, 128; 2026-02-21T08:12:45.1295466Z // begin inline asm 2026-02-21T08:12:45.1295744Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r451, %r452, %r453, %r454}, [%r455]; 2026-02-21T08:12:45.1296046Z // end inline asm 2026-02-21T08:12:45.1296208Z // begin inline asm 2026-02-21T08:12:45.1296469Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r456, %r457, %r458, %r459}, [%r460]; 2026-02-21T08:12:45.1296769Z // end inline asm 2026-02-21T08:12:45.1296918Z bar.sync 0, 128; 2026-02-21T08:12:45.1297117Z st.shared.v4.b32 [%r36], {%r572, %r584, %r596, %r608}; 2026-02-21T08:12:45.1297441Z st.shared.v4.b32 [%r37], {%r620, %r632, %r644, %r656}; 2026-02-21T08:12:45.1297667Z bar.sync 0, 128; 2026-02-21T08:12:45.1297824Z // begin inline asm 2026-02-21T08:12:45.1298075Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r461, %r462, %r463, %r464}, [%r455]; 2026-02-21T08:12:45.1298368Z // end inline asm 2026-02-21T08:12:45.1298518Z // begin inline asm 2026-02-21T08:12:45.1298780Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r466, %r467, %r468, %r469}, [%r460]; 2026-02-21T08:12:45.1299072Z // end inline asm 2026-02-21T08:12:45.1299232Z bar.sync 0, 128; 2026-02-21T08:12:45.1299421Z st.shared.v4.b32 [%r36], {%r575, %r587, %r599, %r611}; 2026-02-21T08:12:45.1299691Z st.shared.v4.b32 [%r37], {%r623, %r635, %r647, %r659}; 2026-02-21T08:12:45.1299918Z bar.sync 0, 128; 2026-02-21T08:12:45.1300066Z // begin inline asm 2026-02-21T08:12:45.1300324Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r471, %r472, %r473, %r474}, [%r455]; 2026-02-21T08:12:45.1300676Z // end inline asm 2026-02-21T08:12:45.1300838Z // begin inline asm 2026-02-21T08:12:45.1301085Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r476, %r477, %r478, %r479}, [%r460]; 2026-02-21T08:12:45.1301377Z // end inline asm 2026-02-21T08:12:45.1301522Z bar.sync 0, 128; 2026-02-21T08:12:45.1301712Z st.shared.v4.b32 [%r36], {%r578, %r590, %r602, %r614}; 2026-02-21T08:12:45.1301967Z st.shared.v4.b32 [%r37], {%r626, %r638, %r650, %r662}; 2026-02-21T08:12:45.1302181Z bar.sync 0, 128; 2026-02-21T08:12:45.1302338Z // begin inline asm 2026-02-21T08:12:45.1302588Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r481, %r482, %r483, %r484}, [%r455]; 2026-02-21T08:12:45.1302882Z // end inline asm 2026-02-21T08:12:45.1303029Z // begin inline asm 2026-02-21T08:12:45.1303284Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r486, %r487, %r488, %r489}, [%r460]; 2026-02-21T08:12:45.1303568Z // end inline asm 2026-02-21T08:12:45.1303724Z // begin inline asm 2026-02-21T08:12:45.1303934Z st.global.v4.b32 [ %rd134 + 0 ], { %r451, %r461, %r471, %r481 }; 2026-02-21T08:12:45.1304184Z // end inline asm 2026-02-21T08:12:45.1304337Z // begin inline asm 2026-02-21T08:12:45.1304540Z st.global.v4.b32 [ %rd135 + 0 ], { %r452, %r462, %r472, %r482 }; 2026-02-21T08:12:45.1304813Z // end inline asm 2026-02-21T08:12:45.1304960Z // begin inline asm 2026-02-21T08:12:45.1305167Z st.global.v4.b32 [ %rd136 + 0 ], { %r453, %r463, %r473, %r483 }; 
2026-02-21T08:12:45.1305405Z // end inline asm 2026-02-21T08:12:45.1305563Z // begin inline asm 2026-02-21T08:12:45.1305763Z st.global.v4.b32 [ %rd137 + 0 ], { %r454, %r464, %r474, %r484 }; 2026-02-21T08:12:45.1306004Z // end inline asm 2026-02-21T08:12:45.1306159Z // begin inline asm 2026-02-21T08:12:45.1306357Z st.global.v4.b32 [ %rd138 + 0 ], { %r456, %r466, %r476, %r486 }; 2026-02-21T08:12:45.1306597Z // end inline asm 2026-02-21T08:12:45.1306745Z // begin inline asm 2026-02-21T08:12:45.1306950Z st.global.v4.b32 [ %rd139 + 0 ], { %r457, %r467, %r477, %r487 }; 2026-02-21T08:12:45.1307186Z // end inline asm 2026-02-21T08:12:45.1307345Z // begin inline asm 2026-02-21T08:12:45.1307543Z st.global.v4.b32 [ %rd140 + 0 ], { %r458, %r468, %r478, %r488 }; 2026-02-21T08:12:45.1307784Z // end inline asm 2026-02-21T08:12:45.1307940Z // begin inline asm 2026-02-21T08:12:45.1308140Z st.global.v4.b32 [ %rd141 + 0 ], { %r459, %r469, %r479, %r489 }; 2026-02-21T08:12:45.1308382Z // end inline asm 2026-02-21T08:12:45.1308670Z .loc 1 30 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:30:52 2026-02-21T08:12:45.1309030Z add.s32 %r671, %r671, 1; 2026-02-21T08:12:45.1309213Z setp.ne.b32 %p79, %r25, %r671; 2026-02-21T08:12:45.1309411Z @%p79 bra $L__BB0_14; 2026-02-21T08:12:45.1309611Z $L__BB0_15: // %._crit_edge 2026-02-21T08:12:45.1309986Z .loc 1 30 4 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:30:4 2026-02-21T08:12:45.1310324Z bar.sync 0, 128; 2026-02-21T08:12:45.1310480Z // begin inline asm 2026-02-21T08:12:45.1310790Z @%p27 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r663, 64; 2026-02-21T08:12:45.1311045Z // end inline asm 2026-02-21T08:12:45.1311227Z st.shared.b32 [global_smem+57448], 50529027; 2026-02-21T08:12:45.1311437Z barrier.sync 1; 2026-02-21T08:12:45.1311632Z $L__BB0_16: // %common.ret 2026-02-21T08:12:45.1311975Z .loc 1 0 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:0 2026-02-21T08:12:45.1312297Z ret; 
2026-02-21T08:12:45.1312487Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:12:45.1312748Z ld.param.b64 %rd6, [_helion_matmul_param_1]; 2026-02-21T08:12:45.1313094Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19 2026-02-21T08:12:45.1313421Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:12:45.1313601Z and.b16 %rs2, %rs1, 3; 2026-02-21T08:12:45.1313770Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:12:45.1314022Z mov.b32 %r43, global_smem; 2026-02-21T08:12:45.1314213Z add.s32 %r44, %r43, %r3; 2026-02-21T08:12:45.1314389Z add.s32 %r158, %r43, 40960; 2026-02-21T08:12:45.1314568Z bra.uni $L__BB0_2; 2026-02-21T08:12:45.1315141Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1315541Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1315879Z barrier.sync 1; 2026-02-21T08:12:45.1316044Z barrier.sync 1; 2026-02-21T08:12:45.1316225Z $L__BB0_2: // %.preheader 2026-02-21T08:12:45.1316488Z // =>This Loop Header: Depth=1 2026-02-21T08:12:45.1316756Z // Child Loop BB0_9 Depth 2 2026-02-21T08:12:45.1317007Z // Child Loop BB0_6 Depth 2 2026-02-21T08:12:45.1317368Z .loc 1 19 0 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19 2026-02-21T08:12:45.1317729Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:12:45.1317942Z barrier.sync 1; 2026-02-21T08:12:45.1318108Z ld.shared.b8 %r42, [%r44+57444]; 2026-02-21T08:12:45.1318314Z setp.gt.u32 %p2, %r42, 3; 2026-02-21T08:12:45.1318497Z @%p2 bra $L__BB0_4; 2026-02-21T08:12:45.1318681Z // %bb.3: // %.preheader 2026-02-21T08:12:45.1318937Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1319173Z $L_brx_0: .branchtargets 2026-02-21T08:12:45.1319352Z $L__BB0_5, 2026-02-21T08:12:45.1319494Z $L__BB0_8, 2026-02-21T08:12:45.1319643Z $L__BB0_11, 2026-02-21T08:12:45.1319776Z $L__BB0_16; 2026-02-21T08:12:45.1319925Z brx.idx %r42, $L_brx_0; 2026-02-21T08:12:45.1320120Z $L__BB0_5: // %.peel.next 2026-02-21T08:12:45.1320363Z // in Loop: Header=BB0_2 Depth=1 
2026-02-21T08:12:45.1320749Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1321108Z ld.shared.b32 %r135, [global_smem+40960]; 2026-02-21T08:12:45.1321341Z ld.shared.b32 %r159, [global_smem+40972]; 2026-02-21T08:12:45.1321544Z barrier.sync 1; 2026-02-21T08:12:45.1321843Z .loc 1 42 45 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:42:45 2026-02-21T08:12:45.1322189Z bfe.u32 %r160, %r1, 2, 3; 2026-02-21T08:12:45.1322508Z .loc 1 50 48 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:50:48 2026-02-21T08:12:45.1322844Z shl.b32 %r161, %r1, 3; 2026-02-21T08:12:45.1323015Z and.b32 %r162, %r161, 24; 2026-02-21T08:12:45.1323325Z .loc 1 42 32 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:42:32 2026-02-21T08:12:45.1323652Z add.s32 %r163, %r159, %r160; 2026-02-21T08:12:45.1323836Z shl.b32 %r164, %r163, 10; 2026-02-21T08:12:45.1324139Z .loc 1 55 80 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:80 2026-02-21T08:12:45.1324558Z add.s32 %r165, %r164, 8192; 2026-02-21T08:12:45.1325093Z add.s32 %r166, %r164, 16384; 2026-02-21T08:12:45.1325273Z add.s32 %r167, %r164, 24576; 2026-02-21T08:12:45.1325453Z add.s32 %r168, %r164, 32768; 2026-02-21T08:12:45.1325628Z add.s32 %r169, %r164, 40960; 2026-02-21T08:12:45.1325806Z add.s32 %r170, %r164, 49152; 2026-02-21T08:12:45.1325976Z add.s32 %r171, %r164, 57344; 2026-02-21T08:12:45.1326284Z .loc 1 55 59 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:59 2026-02-21T08:12:45.1326613Z or.b32 %r172, %r164, %r162; 2026-02-21T08:12:45.1326794Z or.b32 %r173, %r165, %r162; 2026-02-21T08:12:45.1326972Z or.b32 %r174, %r166, %r162; 2026-02-21T08:12:45.1327141Z or.b32 %r175, %r167, %r162; 2026-02-21T08:12:45.1327315Z or.b32 %r176, %r168, %r162; 2026-02-21T08:12:45.1327481Z or.b32 %r177, %r169, %r162; 2026-02-21T08:12:45.1327653Z or.b32 %r178, %r170, %r162; 2026-02-21T08:12:45.1327875Z or.b32 %r179, %r171, %r162; 2026-02-21T08:12:45.1328186Z 
.loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1328517Z mad.wide.s32 %rd12, %r172, 2, %rd6; 2026-02-21T08:12:45.1328727Z mad.wide.s32 %rd13, %r173, 2, %rd6; 2026-02-21T08:12:45.1328925Z mad.wide.s32 %rd14, %r174, 2, %rd6; 2026-02-21T08:12:45.1329115Z mad.wide.s32 %rd15, %r175, 2, %rd6; 2026-02-21T08:12:45.1329312Z mad.wide.s32 %rd16, %r176, 2, %rd6; 2026-02-21T08:12:45.1329500Z mad.wide.s32 %rd17, %r177, 2, %rd6; 2026-02-21T08:12:45.1329694Z mad.wide.s32 %rd18, %r178, 2, %rd6; 2026-02-21T08:12:45.1329881Z mad.wide.s32 %rd19, %r179, 2, %rd6; 2026-02-21T08:12:45.1330201Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1330529Z shl.b32 %r180, %r1, 4; 2026-02-21T08:12:45.1330708Z and.b32 %r181, %r180, 496; 2026-02-21T08:12:45.1330889Z shl.b32 %r182, %r1, 1; 2026-02-21T08:12:45.1331057Z and.b32 %r183, %r182, 48; 2026-02-21T08:12:45.1331237Z xor.b32 %r6, %r181, %r183; 2026-02-21T08:12:45.1331410Z add.s32 %r69, %r158, %r6; 2026-02-21T08:12:45.1331585Z mov.b32 %r70, 16; 2026-02-21T08:12:45.1331740Z // begin inline asm 2026-02-21T08:12:45.1331976Z cp.async.cg.shared.global [ %r69 + 0 ], [ %rd12 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1332238Z // end inline asm 2026-02-21T08:12:45.1332400Z add.s32 %r71, %r69, 512; 2026-02-21T08:12:45.1332569Z // begin inline asm 2026-02-21T08:12:45.1332793Z cp.async.cg.shared.global [ %r71 + 0 ], [ %rd13 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1333046Z // end inline asm 2026-02-21T08:12:45.1333198Z add.s32 %r73, %r69, 1024; 2026-02-21T08:12:45.1333373Z // begin inline asm 2026-02-21T08:12:45.1333588Z cp.async.cg.shared.global [ %r73 + 0 ], [ %rd14 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1333841Z // end inline asm 2026-02-21T08:12:45.1333990Z add.s32 %r75, %r69, 1536; 2026-02-21T08:12:45.1334159Z // begin inline asm 2026-02-21T08:12:45.1334371Z cp.async.cg.shared.global [ %r75 + 0 ], [ %rd15 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1334624Z // end 
inline asm 2026-02-21T08:12:45.1334946Z add.s32 %r77, %r69, 2048; 2026-02-21T08:12:45.1335116Z // begin inline asm 2026-02-21T08:12:45.1335340Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd16 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1335583Z // end inline asm 2026-02-21T08:12:45.1335741Z add.s32 %r79, %r69, 2560; 2026-02-21T08:12:45.1335906Z // begin inline asm 2026-02-21T08:12:45.1336121Z cp.async.cg.shared.global [ %r79 + 0 ], [ %rd17 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1336365Z // end inline asm 2026-02-21T08:12:45.1336526Z add.s32 %r81, %r69, 3072; 2026-02-21T08:12:45.1336690Z // begin inline asm 2026-02-21T08:12:45.1336910Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd18 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1337159Z // end inline asm 2026-02-21T08:12:45.1337311Z add.s32 %r83, %r69, 3584; 2026-02-21T08:12:45.1337486Z // begin inline asm 2026-02-21T08:12:45.1337697Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd19 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1338013Z // end inline asm 2026-02-21T08:12:45.1338169Z cp.async.commit_group; 2026-02-21T08:12:45.1338479Z .loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1338819Z cvt.s64.s32 %rd59, %r164; 2026-02-21T08:12:45.1338999Z cvt.u64.u32 %rd60, %r162; 2026-02-21T08:12:45.1339177Z or.b64 %rd61, %rd59, %rd60; 2026-02-21T08:12:45.1339355Z shl.b64 %rd62, %rd61, 1; 2026-02-21T08:12:45.1339533Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T08:12:45.1339707Z add.s64 %rd20, %rd63, 64; 2026-02-21T08:12:45.1339882Z cvt.s64.s32 %rd64, %r165; 2026-02-21T08:12:45.1340049Z or.b64 %rd65, %rd64, %rd60; 2026-02-21T08:12:45.1340231Z shl.b64 %rd66, %rd65, 1; 2026-02-21T08:12:45.1340402Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T08:12:45.1340580Z add.s64 %rd21, %rd67, 64; 2026-02-21T08:12:45.1340748Z cvt.s64.s32 %rd68, %r166; 2026-02-21T08:12:45.1340983Z or.b64 %rd69, %rd68, %rd60; 2026-02-21T08:12:45.1341169Z shl.b64 %rd70, %rd69, 1; 2026-02-21T08:12:45.1341334Z add.s64 %rd71, %rd6, %rd70; 
2026-02-21T08:12:45.1341513Z add.s64 %rd22, %rd71, 64; 2026-02-21T08:12:45.1341675Z cvt.s64.s32 %rd72, %r167; 2026-02-21T08:12:45.1341845Z or.b64 %rd73, %rd72, %rd60; 2026-02-21T08:12:45.1342017Z shl.b64 %rd74, %rd73, 1; 2026-02-21T08:12:45.1342190Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T08:12:45.1342360Z add.s64 %rd23, %rd75, 64; 2026-02-21T08:12:45.1342533Z cvt.s64.s32 %rd76, %r168; 2026-02-21T08:12:45.1342705Z or.b64 %rd77, %rd76, %rd60; 2026-02-21T08:12:45.1342876Z shl.b64 %rd78, %rd77, 1; 2026-02-21T08:12:45.1343052Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T08:12:45.1343227Z add.s64 %rd24, %rd79, 64; 2026-02-21T08:12:45.1343404Z cvt.s64.s32 %rd80, %r169; 2026-02-21T08:12:45.1343576Z or.b64 %rd81, %rd80, %rd60; 2026-02-21T08:12:45.1343758Z shl.b64 %rd82, %rd81, 1; 2026-02-21T08:12:45.1343923Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T08:12:45.1344103Z add.s64 %rd25, %rd83, 64; 2026-02-21T08:12:45.1344272Z cvt.s64.s32 %rd84, %r170; 2026-02-21T08:12:45.1344444Z or.b64 %rd85, %rd84, %rd60; 2026-02-21T08:12:45.1344620Z shl.b64 %rd86, %rd85, 1; 2026-02-21T08:12:45.1344883Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T08:12:45.1345069Z add.s64 %rd26, %rd87, 64; 2026-02-21T08:12:45.1345237Z cvt.s64.s32 %rd88, %r171; 2026-02-21T08:12:45.1345412Z or.b64 %rd89, %rd88, %rd60; 2026-02-21T08:12:45.1345587Z shl.b64 %rd90, %rd89, 1; 2026-02-21T08:12:45.1345763Z add.s64 %rd91, %rd6, %rd90; 2026-02-21T08:12:45.1345937Z add.s64 %rd27, %rd91, 64; 2026-02-21T08:12:45.1346248Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1346608Z bar.warp.sync -1; 2026-02-21T08:12:45.1346776Z add.s32 %r184, %r43, %r6; 2026-02-21T08:12:45.1346953Z add.s32 %r85, %r184, 45056; 2026-02-21T08:12:45.1347126Z // begin inline asm 2026-02-21T08:12:45.1347358Z cp.async.cg.shared.global [ %r85 + 0 ], [ %rd20 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1347612Z // end inline asm 2026-02-21T08:12:45.1347771Z add.s32 %r87, %r184, 45568; 2026-02-21T08:12:45.1347942Z // begin 
inline asm 2026-02-21T08:12:45.1348166Z cp.async.cg.shared.global [ %r87 + 0 ], [ %rd21 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1348415Z // end inline asm 2026-02-21T08:12:45.1348576Z add.s32 %r89, %r184, 46080; 2026-02-21T08:12:45.1348751Z // begin inline asm 2026-02-21T08:12:45.1348963Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd22 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1349218Z // end inline asm 2026-02-21T08:12:45.1349368Z add.s32 %r91, %r184, 46592; 2026-02-21T08:12:45.1349546Z // begin inline asm 2026-02-21T08:12:45.1349759Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd23 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1350012Z // end inline asm 2026-02-21T08:12:45.1350163Z add.s32 %r93, %r184, 47104; 2026-02-21T08:12:45.1350339Z // begin inline asm 2026-02-21T08:12:45.1350564Z cp.async.cg.shared.global [ %r93 + 0 ], [ %rd24 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1350876Z // end inline asm 2026-02-21T08:12:45.1351034Z add.s32 %r95, %r184, 47616; 2026-02-21T08:12:45.1351203Z // begin inline asm 2026-02-21T08:12:45.1351423Z cp.async.cg.shared.global [ %r95 + 0 ], [ %rd25 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1351672Z // end inline asm 2026-02-21T08:12:45.1351830Z add.s32 %r97, %r184, 48128; 2026-02-21T08:12:45.1351999Z // begin inline asm 2026-02-21T08:12:45.1352217Z cp.async.cg.shared.global [ %r97 + 0 ], [ %rd26 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1352459Z // end inline asm 2026-02-21T08:12:45.1352623Z add.s32 %r99, %r184, 48640; 2026-02-21T08:12:45.1352804Z // begin inline asm 2026-02-21T08:12:45.1353019Z cp.async.cg.shared.global [ %r99 + 0 ], [ %rd27 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1353281Z // end inline asm 2026-02-21T08:12:45.1353441Z cp.async.commit_group; 2026-02-21T08:12:45.1353756Z .loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1354152Z add.s64 %rd28, %rd63, 128; 2026-02-21T08:12:45.1354345Z add.s64 %rd29, %rd67, 128; 2026-02-21T08:12:45.1354519Z add.s64 %rd30, %rd71, 128; 
2026-02-21T08:12:45.1355220Z add.s64 %rd31, %rd75, 128; 2026-02-21T08:12:45.1355410Z add.s64 %rd32, %rd79, 128; 2026-02-21T08:12:45.1355584Z add.s64 %rd33, %rd83, 128; 2026-02-21T08:12:45.1355770Z add.s64 %rd34, %rd87, 128; 2026-02-21T08:12:45.1355944Z add.s64 %rd35, %rd91, 128; 2026-02-21T08:12:45.1356248Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1356573Z bar.warp.sync -1; 2026-02-21T08:12:45.1356747Z add.s32 %r101, %r184, 49152; 2026-02-21T08:12:45.1356923Z // begin inline asm 2026-02-21T08:12:45.1357160Z cp.async.cg.shared.global [ %r101 + 0 ], [ %rd28 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1357419Z // end inline asm 2026-02-21T08:12:45.1357576Z add.s32 %r103, %r184, 49664; 2026-02-21T08:12:45.1357758Z // begin inline asm 2026-02-21T08:12:45.1357982Z cp.async.cg.shared.global [ %r103 + 0 ], [ %rd29 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1358241Z // end inline asm 2026-02-21T08:12:45.1358396Z add.s32 %r105, %r184, 50176; 2026-02-21T08:12:45.1358575Z // begin inline asm 2026-02-21T08:12:45.1358792Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd30 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1359046Z // end inline asm 2026-02-21T08:12:45.1359208Z add.s32 %r107, %r184, 50688; 2026-02-21T08:12:45.1359377Z // begin inline asm 2026-02-21T08:12:45.1359600Z cp.async.cg.shared.global [ %r107 + 0 ], [ %rd31 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1359868Z // end inline asm 2026-02-21T08:12:45.1360026Z add.s32 %r109, %r184, 51200; 2026-02-21T08:12:45.1360198Z // begin inline asm 2026-02-21T08:12:45.1360418Z cp.async.cg.shared.global [ %r109 + 0 ], [ %rd32 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1360662Z // end inline asm 2026-02-21T08:12:45.1360820Z add.s32 %r111, %r184, 51712; 2026-02-21T08:12:45.1360988Z // begin inline asm 2026-02-21T08:12:45.1361212Z cp.async.cg.shared.global [ %r111 + 0 ], [ %rd33 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1361467Z // end inline asm 2026-02-21T08:12:45.1361617Z add.s32 %r113, %r184, 52224; 
2026-02-21T08:12:45.1361792Z // begin inline asm 2026-02-21T08:12:45.1362005Z cp.async.cg.shared.global [ %r113 + 0 ], [ %rd34 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1362258Z // end inline asm 2026-02-21T08:12:45.1362410Z add.s32 %r115, %r184, 52736; 2026-02-21T08:12:45.1362588Z // begin inline asm 2026-02-21T08:12:45.1362800Z cp.async.cg.shared.global [ %r115 + 0 ], [ %rd35 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1363053Z // end inline asm 2026-02-21T08:12:45.1363217Z cp.async.commit_group; 2026-02-21T08:12:45.1363511Z .loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1363844Z add.s64 %rd36, %rd63, 192; 2026-02-21T08:12:45.1364020Z add.s64 %rd37, %rd67, 192; 2026-02-21T08:12:45.1364203Z add.s64 %rd38, %rd71, 192; 2026-02-21T08:12:45.1364376Z add.s64 %rd39, %rd75, 192; 2026-02-21T08:12:45.1364623Z add.s64 %rd40, %rd79, 192; 2026-02-21T08:12:45.1364817Z add.s64 %rd41, %rd83, 192; 2026-02-21T08:12:45.1364997Z add.s64 %rd42, %rd87, 192; 2026-02-21T08:12:45.1365172Z add.s64 %rd43, %rd91, 192; 2026-02-21T08:12:45.1365474Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1365812Z bar.warp.sync -1; 2026-02-21T08:12:45.1365975Z add.s32 %r117, %r184, 53248; 2026-02-21T08:12:45.1366162Z // begin inline asm 2026-02-21T08:12:45.1366383Z cp.async.cg.shared.global [ %r117 + 0 ], [ %rd36 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1366639Z // end inline asm 2026-02-21T08:12:45.1366792Z add.s32 %r119, %r184, 53760; 2026-02-21T08:12:45.1366968Z // begin inline asm 2026-02-21T08:12:45.1367192Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd37 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1367438Z // end inline asm 2026-02-21T08:12:45.1367603Z add.s32 %r121, %r184, 54272; 2026-02-21T08:12:45.1367849Z // begin inline asm 2026-02-21T08:12:45.1368078Z cp.async.cg.shared.global [ %r121 + 0 ], [ %rd38 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1368328Z // end inline asm 2026-02-21T08:12:45.1368487Z add.s32 
%r123, %r184, 54784; 2026-02-21T08:12:45.1368655Z // begin inline asm 2026-02-21T08:12:45.1368880Z cp.async.cg.shared.global [ %r123 + 0 ], [ %rd39 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1369121Z // end inline asm 2026-02-21T08:12:45.1369276Z add.s32 %r125, %r184, 55296; 2026-02-21T08:12:45.1369451Z // begin inline asm 2026-02-21T08:12:45.1369663Z cp.async.cg.shared.global [ %r125 + 0 ], [ %rd40 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1369914Z // end inline asm 2026-02-21T08:12:45.1370066Z add.s32 %r127, %r184, 55808; 2026-02-21T08:12:45.1370246Z // begin inline asm 2026-02-21T08:12:45.1370461Z cp.async.cg.shared.global [ %r127 + 0 ], [ %rd41 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1370720Z // end inline asm 2026-02-21T08:12:45.1370869Z add.s32 %r129, %r184, 56320; 2026-02-21T08:12:45.1371047Z // begin inline asm 2026-02-21T08:12:45.1371267Z cp.async.cg.shared.global [ %r129 + 0 ], [ %rd42 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1371328Z // end inline asm 2026-02-21T08:12:45.1371392Z add.s32 %r131, %r184, 56832; 2026-02-21T08:12:45.1371454Z // begin inline asm 2026-02-21T08:12:45.1371582Z cp.async.cg.shared.global [ %r131 + 0 ], [ %rd43 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1371642Z // end inline asm 2026-02-21T08:12:45.1371710Z cp.async.commit_group; 2026-02-21T08:12:45.1371786Z cp.async.wait_group 3; 2026-02-21T08:12:45.1371852Z bar.warp.sync -1; 2026-02-21T08:12:45.1372043Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1372110Z add.s32 %r133, %r43, 57392; 2026-02-21T08:12:45.1372177Z mov.b32 %r664, 0; 2026-02-21T08:12:45.1372238Z // begin inline asm 2026-02-21T08:12:45.1372300Z 2026-02-21T08:12:45.1372364Z { 2026-02-21T08:12:45.1372433Z .reg .pred complete; 2026-02-21T08:12:45.1372500Z waitLoop: 2026-02-21T08:12:45.1372634Z mbarrier.try_wait.parity.shared.b64 complete, [%r133], %r664; 2026-02-21T08:12:45.1372715Z @!complete bra.uni waitLoop; 2026-02-21T08:12:45.1372771Z } 2026-02-21T08:12:45.1372778Z 
2026-02-21T08:12:45.1372841Z // end inline asm 2026-02-21T08:12:45.1373035Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1373108Z elect.sync %r185|%p10, -1; 2026-02-21T08:12:45.1373178Z bfe.u32 %r186, %r43, 4, 14; 2026-02-21T08:12:45.1373250Z cvt.u64.u32 %rd92, %r186; 2026-02-21T08:12:45.1373331Z or.b64 %rd44, %rd92, -9223371899382267904; 2026-02-21T08:12:45.1373398Z bfe.u32 %r187, %r158, 4, 14; 2026-02-21T08:12:45.1373464Z cvt.u64.u32 %rd93, %r187; 2026-02-21T08:12:45.1373552Z or.b64 %rd45, %rd93, -9223371899399045120; 2026-02-21T08:12:45.1373617Z mov.b32 %r136, 135266320; 2026-02-21T08:12:45.1373681Z mov.pred %p9, 0; 2026-02-21T08:12:45.1373752Z // begin inline asm 2026-02-21T08:12:45.1373927Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r135 + 0 ], %rd44, %rd45, %r136, %p9; 2026-02-21T08:12:45.1374051Z // end inline asm 2026-02-21T08:12:45.1374129Z add.s32 %r188, %r43, 32; 2026-02-21T08:12:45.1374200Z bfe.u32 %r189, %r188, 4, 14; 2026-02-21T08:12:45.1374266Z cvt.u64.u32 %rd94, %r189; 2026-02-21T08:12:45.1374347Z or.b64 %rd46, %rd94, -9223371899382267904; 2026-02-21T08:12:45.1374421Z add.s32 %r190, %r43, 40992; 2026-02-21T08:12:45.1374485Z bfe.u32 %r191, %r190, 4, 14; 2026-02-21T08:12:45.1374550Z cvt.u64.u32 %rd95, %r191; 2026-02-21T08:12:45.1374630Z or.b64 %rd47, %rd95, -9223371899399045120; 2026-02-21T08:12:45.1374743Z mov.pred %p15, -1; 2026-02-21T08:12:45.1374806Z // begin inline asm 2026-02-21T08:12:45.1374967Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r135 + 0 ], %rd46, %rd47, %r136, %p15; 2026-02-21T08:12:45.1375038Z // end inline asm 2026-02-21T08:12:45.1375101Z add.s32 %r192, %r43, 57344; 2026-02-21T08:12:45.1375164Z cvt.u64.u32 %rd48, %r192; 2026-02-21T08:12:45.1375289Z // begin inline asm 2026-02-21T08:12:45.1375435Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd48]; 2026-02-21T08:12:45.1375500Z // end inline asm 2026-02-21T08:12:45.1375571Z add.s32 %r193, %r43, 57440; 
2026-02-21T08:12:45.1375637Z cvt.u64.u32 %rd49, %r193; 2026-02-21T08:12:45.1375700Z // begin inline asm 2026-02-21T08:12:45.1375835Z @%p9 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd49]; 2026-02-21T08:12:45.1375904Z // end inline asm 2026-02-21T08:12:45.1376104Z .loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1376172Z add.s64 %rd50, %rd63, 256; 2026-02-21T08:12:45.1376246Z add.s64 %rd51, %rd67, 256; 2026-02-21T08:12:45.1376312Z add.s64 %rd52, %rd71, 256; 2026-02-21T08:12:45.1376377Z add.s64 %rd53, %rd75, 256; 2026-02-21T08:12:45.1376441Z add.s64 %rd54, %rd79, 256; 2026-02-21T08:12:45.1376516Z add.s64 %rd55, %rd83, 256; 2026-02-21T08:12:45.1376583Z add.s64 %rd56, %rd87, 256; 2026-02-21T08:12:45.1376648Z add.s64 %rd57, %rd91, 256; 2026-02-21T08:12:45.1376853Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1376921Z bar.warp.sync -1; 2026-02-21T08:12:45.1376983Z // begin inline asm 2026-02-21T08:12:45.1377118Z cp.async.cg.shared.global [ %r69 + 0 ], [ %rd50 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1377181Z // end inline asm 2026-02-21T08:12:45.1377242Z // begin inline asm 2026-02-21T08:12:45.1377367Z cp.async.cg.shared.global [ %r71 + 0 ], [ %rd51 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1377436Z // end inline asm 2026-02-21T08:12:45.1377497Z // begin inline asm 2026-02-21T08:12:45.1377620Z cp.async.cg.shared.global [ %r73 + 0 ], [ %rd52 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1377686Z // end inline asm 2026-02-21T08:12:45.1377748Z // begin inline asm 2026-02-21T08:12:45.1377869Z cp.async.cg.shared.global [ %r75 + 0 ], [ %rd53 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1377931Z // end inline asm 2026-02-21T08:12:45.1378004Z // begin inline asm 2026-02-21T08:12:45.1378125Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd54 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1378185Z // end inline asm 2026-02-21T08:12:45.1378253Z // begin inline asm 2026-02-21T08:12:45.1378372Z 
cp.async.cg.shared.global [ %r79 + 0 ], [ %rd55 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1378432Z // end inline asm 2026-02-21T08:12:45.1378498Z // begin inline asm 2026-02-21T08:12:45.1378618Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd56 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1378677Z // end inline asm 2026-02-21T08:12:45.1378737Z // begin inline asm 2026-02-21T08:12:45.1378866Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd57 + 0 ], 0x10, %r70; 2026-02-21T08:12:45.1378925Z // end inline asm 2026-02-21T08:12:45.1378994Z cp.async.commit_group; 2026-02-21T08:12:45.1379205Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1379272Z add.s32 %r194, %r4, %r164; 2026-02-21T08:12:45.1379399Z cvt.u64.u32 %rd1, %r194; 2026-02-21T08:12:45.1379460Z mov.b32 %r667, 1; 2026-02-21T08:12:45.1379531Z mov.b64 %rd270, 0; 2026-02-21T08:12:45.1379593Z mov.b32 %r665, %r664; 2026-02-21T08:12:45.1379656Z mov.b32 %r666, %r664; 2026-02-21T08:12:45.1379775Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:45.1379881Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:45.1379948Z add.s64 %rd3, %rd270, 32; 2026-02-21T08:12:45.1380026Z setp.lt.u64 %p21, %rd3, 896; 2026-02-21T08:12:45.1380091Z add.s32 %r217, %r664, 1; 2026-02-21T08:12:45.1380160Z setp.gt.s32 %p22, %r217, 3; 2026-02-21T08:12:45.1380233Z selp.b32 %r664, 0, %r217, %p22; 2026-02-21T08:12:45.1380305Z shl.b32 %r218, %r667, 3; 2026-02-21T08:12:45.1380368Z add.s32 %r220, %r43, %r218; 2026-02-21T08:12:45.1380432Z add.s32 %r221, %r220, 57344; 2026-02-21T08:12:45.1380503Z add.s32 %r195, %r220, 57392; 2026-02-21T08:12:45.1380799Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1380869Z shl.b32 %r222, %r667, 13; 2026-02-21T08:12:45.1380933Z add.s32 %r223, %r43, %r222; 2026-02-21T08:12:45.1381135Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1381207Z cp.async.wait_group 3; 
2026-02-21T08:12:45.1381272Z bar.warp.sync -1; 2026-02-21T08:12:45.1381343Z shl.b32 %r224, %r664, 12; 2026-02-21T08:12:45.1381408Z add.s32 %r226, %r158, %r224; 2026-02-21T08:12:45.1381600Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1381670Z // begin inline asm 2026-02-21T08:12:45.1381727Z 2026-02-21T08:12:45.1381783Z { 2026-02-21T08:12:45.1381851Z .reg .pred complete; 2026-02-21T08:12:45.1381922Z waitLoop: 2026-02-21T08:12:45.1382058Z mbarrier.try_wait.parity.shared.b64 complete, [%r195], %r666; 2026-02-21T08:12:45.1382131Z @!complete bra.uni waitLoop; 2026-02-21T08:12:45.1382197Z } 2026-02-21T08:12:45.1382202Z 2026-02-21T08:12:45.1382264Z // end inline asm 2026-02-21T08:12:45.1382458Z .loc 1 56 52 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:56:52 2026-02-21T08:12:45.1382533Z setp.eq.b64 %p23, %rd270, 960; 2026-02-21T08:12:45.1382616Z elect.sync %r227|%p16, -1; 2026-02-21T08:12:45.1382681Z bfe.u32 %r228, %r223, 4, 14; 2026-02-21T08:12:45.1382750Z cvt.u64.u32 %rd110, %r228; 2026-02-21T08:12:45.1382844Z or.b64 %rd96, %rd110, -9223371899382267904; 2026-02-21T08:12:45.1382912Z bfe.u32 %r229, %r226, 4, 14; 2026-02-21T08:12:45.1382979Z cvt.u64.u32 %rd111, %r229; 2026-02-21T08:12:45.1383069Z or.b64 %rd97, %rd111, -9223371899399045120; 2026-02-21T08:12:45.1383133Z // begin inline asm 2026-02-21T08:12:45.1383292Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r135 + 0 ], %rd96, %rd97, %r136, %p15; 2026-02-21T08:12:45.1383352Z // end inline asm 2026-02-21T08:12:45.1383430Z add.s32 %r230, %r223, 32; 2026-02-21T08:12:45.1383495Z bfe.u32 %r231, %r230, 4, 14; 2026-02-21T08:12:45.1383561Z cvt.u64.u32 %rd112, %r231; 2026-02-21T08:12:45.1383644Z or.b64 %rd98, %rd112, -9223371899382267904; 2026-02-21T08:12:45.1383706Z add.s32 %r232, %r226, 32; 2026-02-21T08:12:45.1383768Z bfe.u32 %r233, %r232, 4, 14; 2026-02-21T08:12:45.1383832Z cvt.u64.u32 %rd113, %r233; 2026-02-21T08:12:45.1383913Z or.b64 %rd99, %rd113, 
-9223371899399045120; 2026-02-21T08:12:45.1383975Z // begin inline asm 2026-02-21T08:12:45.1384126Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r135 + 0 ], %rd98, %rd99, %r136, %p15; 2026-02-21T08:12:45.1384196Z // end inline asm 2026-02-21T08:12:45.1384259Z cvt.u64.u32 %rd100, %r221; 2026-02-21T08:12:45.1384320Z // begin inline asm 2026-02-21T08:12:45.1384465Z @%p16 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd100]; 2026-02-21T08:12:45.1384525Z // end inline asm 2026-02-21T08:12:45.1384594Z and.pred %p20, %p23, %p16; 2026-02-21T08:12:45.1384658Z // begin inline asm 2026-02-21T08:12:45.1384920Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd49]; 2026-02-21T08:12:45.1384983Z // end inline asm 2026-02-21T08:12:45.1385176Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1385251Z add.s32 %r235, %r667, 1; 2026-02-21T08:12:45.1385321Z setp.eq.b32 %p24, %r235, 5; 2026-02-21T08:12:45.1385394Z selp.b32 %r667, 0, %r235, %p24; 2026-02-21T08:12:45.1385471Z selp.b32 %r236, 1, 0, %p24; 2026-02-21T08:12:45.1385536Z xor.b32 %r666, %r666, %r236; 2026-02-21T08:12:45.1385737Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1385802Z add.s32 %r237, %r665, 1; 2026-02-21T08:12:45.1385877Z setp.gt.s32 %p25, %r237, 3; 2026-02-21T08:12:45.1385948Z selp.b32 %r665, 0, %r237, %p25; 2026-02-21T08:12:45.1386199Z .loc 1 55 59 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:59 2026-02-21T08:12:45.1386283Z add.s64 %rd114, %rd1, %rd270; 2026-02-21T08:12:45.1386350Z cvt.u32.u64 %r238, %rd114; 2026-02-21T08:12:45.1386414Z add.s32 %r239, %r238, 160; 2026-02-21T08:12:45.1386618Z .loc 1 55 34 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:34 2026-02-21T08:12:45.1386696Z mad.wide.s32 %rd102, %r239, 2, %rd6; 2026-02-21T08:12:45.1386761Z add.s32 %r240, %r238, 8352; 2026-02-21T08:12:45.1386835Z mad.wide.s32 %rd103, %r240, 2, %rd6; 
2026-02-21T08:12:45.1386908Z add.s32 %r241, %r238, 16544; 2026-02-21T08:12:45.1386979Z mad.wide.s32 %rd104, %r241, 2, %rd6; 2026-02-21T08:12:45.1387042Z add.s32 %r242, %r238, 24736; 2026-02-21T08:12:45.1387119Z mad.wide.s32 %rd105, %r242, 2, %rd6; 2026-02-21T08:12:45.1387183Z add.s32 %r243, %r238, 32928; 2026-02-21T08:12:45.1387250Z mad.wide.s32 %rd106, %r243, 2, %rd6; 2026-02-21T08:12:45.1387314Z add.s32 %r244, %r238, 41120; 2026-02-21T08:12:45.1387395Z mad.wide.s32 %rd107, %r244, 2, %rd6; 2026-02-21T08:12:45.1387461Z add.s32 %r245, %r238, 49312; 2026-02-21T08:12:45.1387529Z mad.wide.s32 %rd108, %r245, 2, %rd6; 2026-02-21T08:12:45.1387600Z add.s32 %r246, %r238, 57504; 2026-02-21T08:12:45.1387667Z mad.wide.s32 %rd109, %r246, 2, %rd6; 2026-02-21T08:12:45.1387859Z .loc 1 55 87 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:55:87 2026-02-21T08:12:45.1387932Z shl.b32 %r247, %r665, 12; 2026-02-21T08:12:45.1387995Z add.s32 %r248, %r158, %r247; 2026-02-21T08:12:45.1388059Z bar.warp.sync -1; 2026-02-21T08:12:45.1388124Z add.s32 %r201, %r248, %r6; 2026-02-21T08:12:45.1388196Z selp.b32 %r202, 16, 0, %p21; 2026-02-21T08:12:45.1388259Z // begin inline asm 2026-02-21T08:12:45.1388393Z cp.async.cg.shared.global [ %r201 + 0 ], [ %rd102 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1388461Z // end inline asm 2026-02-21T08:12:45.1388524Z add.s32 %r203, %r201, 512; 2026-02-21T08:12:45.1388586Z // begin inline asm 2026-02-21T08:12:45.1388719Z cp.async.cg.shared.global [ %r203 + 0 ], [ %rd103 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1388790Z // end inline asm 2026-02-21T08:12:45.1388854Z add.s32 %r205, %r201, 1024; 2026-02-21T08:12:45.1388917Z // begin inline asm 2026-02-21T08:12:45.1389052Z cp.async.cg.shared.global [ %r205 + 0 ], [ %rd104 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1389113Z // end inline asm 2026-02-21T08:12:45.1389177Z add.s32 %r207, %r201, 1536; 2026-02-21T08:12:45.1389247Z // begin inline asm 2026-02-21T08:12:45.1389371Z cp.async.cg.shared.global [ %r207 + 0 ], 
[ %rd105 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1389432Z // end inline asm 2026-02-21T08:12:45.1389495Z add.s32 %r209, %r201, 2048; 2026-02-21T08:12:45.1389564Z // begin inline asm 2026-02-21T08:12:45.1389688Z cp.async.cg.shared.global [ %r209 + 0 ], [ %rd106 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1389749Z // end inline asm 2026-02-21T08:12:45.1389820Z add.s32 %r211, %r201, 2560; 2026-02-21T08:12:45.1389881Z // begin inline asm 2026-02-21T08:12:45.1390008Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd107 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1390127Z // end inline asm 2026-02-21T08:12:45.1390199Z add.s32 %r213, %r201, 3072; 2026-02-21T08:12:45.1390262Z // begin inline asm 2026-02-21T08:12:45.1390386Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd108 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1390456Z // end inline asm 2026-02-21T08:12:45.1390519Z add.s32 %r215, %r201, 3584; 2026-02-21T08:12:45.1390580Z // begin inline asm 2026-02-21T08:12:45.1390703Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd109 + 0 ], 0x10, %r202; 2026-02-21T08:12:45.1390773Z // end inline asm 2026-02-21T08:12:45.1390841Z cp.async.commit_group; 2026-02-21T08:12:45.1391039Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1391120Z setp.lt.u64 %p26, %rd3, 992; 2026-02-21T08:12:45.1391184Z mov.b64 %rd270, %rd3; 2026-02-21T08:12:45.1391255Z @%p26 bra $L__BB0_6; 2026-02-21T08:12:45.1391417Z // %bb.7: // %.loopexit 2026-02-21T08:12:45.1391523Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1391596Z cp.async.wait_group 0; 2026-02-21T08:12:45.1391669Z bar.warp.sync -1; 2026-02-21T08:12:45.1391740Z barrier.sync 1; 2026-02-21T08:12:45.1391805Z bra.uni $L__BB0_2; 2026-02-21T08:12:45.1391915Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1392119Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1392203Z ld.shared.b32 %r60, [global_smem+40968]; 2026-02-21T08:12:45.1392266Z 
barrier.sync 1; 2026-02-21T08:12:45.1392465Z .loc 1 21 67 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:21:67 2026-02-21T08:12:45.1392532Z mov.u32 %r47, %ctaid.x; 2026-02-21T08:12:45.1392597Z mov.u32 %r48, %ctaid.y; 2026-02-21T08:12:45.1392660Z mov.u32 %r49, %ctaid.z; 2026-02-21T08:12:45.1392736Z mov.u32 %r50, %nctaid.x; 2026-02-21T08:12:45.1392804Z mov.u32 %r51, %nctaid.y; 2026-02-21T08:12:45.1392877Z mad.lo.s32 %r52, %r49, %r51, %r48; 2026-02-21T08:12:45.1392953Z mad.lo.s32 %r53, %r52, %r50, %r47; 2026-02-21T08:12:45.1393017Z shl.b32 %r54, %r53, 7; 2026-02-21T08:12:45.1393082Z cvt.s64.s32 %rd9, %r54; 2026-02-21T08:12:45.1393147Z add.s64 %rd10, %rd8, %rd9; 2026-02-21T08:12:45.1393227Z cvta.global.u64 %rd11, %rd10; 2026-02-21T08:12:45.1393294Z add.s32 %r16, %r1, -128; 2026-02-21T08:12:45.1393353Z mov.b32 %r669, 0; 2026-02-21T08:12:45.1393431Z mov.b32 %r668, -32; 2026-02-21T08:12:45.1393496Z mov.b32 %r670, %r669; 2026-02-21T08:12:45.1393604Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:12:45.1393716Z // => This Inner Loop Header: Depth=2 2026-02-21T08:12:45.1393913Z .loc 1 0 67 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:0:67 2026-02-21T08:12:45.1393986Z setp.lt.u32 %p5, %r16, 32; 2026-02-21T08:12:45.1394057Z setp.eq.b32 %p3, %r16, 0; 2026-02-21T08:12:45.1394261Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1394327Z add.s32 %r668, %r668, 32; 2026-02-21T08:12:45.1394390Z shl.b32 %r62, %r670, 3; 2026-02-21T08:12:45.1394462Z add.s32 %r64, %r43, %r62; 2026-02-21T08:12:45.1394526Z add.s32 %r55, %r64, 57344; 2026-02-21T08:12:45.1394756Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1394828Z // begin inline asm 2026-02-21T08:12:45.1394884Z 2026-02-21T08:12:45.1394941Z { 2026-02-21T08:12:45.1395010Z .reg .pred complete; 2026-02-21T08:12:45.1395079Z waitLoop: 2026-02-21T08:12:45.1395211Z mbarrier.try_wait.parity.shared.b64 complete, 
[%r55], %r669; 2026-02-21T08:12:45.1395284Z @!complete bra.uni waitLoop; 2026-02-21T08:12:45.1395347Z } 2026-02-21T08:12:45.1395351Z 2026-02-21T08:12:45.1395415Z // end inline asm 2026-02-21T08:12:45.1395679Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1395744Z add.s32 %r61, %r64, 57392; 2026-02-21T08:12:45.1395947Z .loc 1 54 31 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:54:31 2026-02-21T08:12:45.1396009Z bar.sync 3, 64; 2026-02-21T08:12:45.1396070Z // begin inline asm 2026-02-21T08:12:45.1396200Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r61], 8192; 2026-02-21T08:12:45.1396261Z // end inline asm 2026-02-21T08:12:45.1396327Z shl.b32 %r65, %r670, 13; 2026-02-21T08:12:45.1396400Z add.s32 %r58, %r43, %r65; 2026-02-21T08:12:45.1396462Z bar.sync 3, 64; 2026-02-21T08:12:45.1396535Z elect.sync %r66|%p6, -1; 2026-02-21T08:12:45.1396607Z and.pred %p4, %p5, %p6; 2026-02-21T08:12:45.1396679Z // begin inline asm 2026-02-21T08:12:45.1397007Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r58], [%rd11, {%r668, %r60}], [%r61]; 2026-02-21T08:12:45.1397075Z // end inline asm 2026-02-21T08:12:45.1397146Z add.s32 %r67, %r670, 1; 2026-02-21T08:12:45.1397213Z setp.eq.b32 %p7, %r67, 5; 2026-02-21T08:12:45.1397284Z selp.b32 %r670, 0, %r67, %p7; 2026-02-21T08:12:45.1397349Z selp.b32 %r68, 1, 0, %p7; 2026-02-21T08:12:45.1397422Z xor.b32 %r669, %r669, %r68; 2026-02-21T08:12:45.1397627Z .loc 1 49 112 // c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:49:112 2026-02-21T08:12:45.1397696Z setp.lt.u32 %p8, %r668, 992; 2026-02-21T08:12:45.1397766Z @%p8 bra $L__BB0_9; 2026-02-21T08:12:45.1397874Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1397936Z barrier.sync 1; 2026-02-21T08:12:45.1398006Z bra.uni $L__BB0_2; 2026-02-21T08:12:45.1398112Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:12:45.1398301Z .loc 1 19 0 // 
c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py:19 2026-02-21T08:12:45.1398367Z barrier.sync 1; 2026-02-21T08:12:45.1398435Z barrier.sync 1; 2026-02-21T08:12:45.1398497Z bra.uni $L__BB0_2; 2026-02-21T08:12:45.1398556Z $L__tmp1: 2026-02-21T08:12:45.1398625Z $L__func_end0: 2026-02-21T08:12:45.1398716Z // -- End function 2026-02-21T08:12:45.1398772Z } 2026-02-21T08:12:45.1399019Z .file 1 "/tmp/torchinductor_root/6s/c6skklndkbclzjbjsvrukmcex473k6mkhrlwzvobxpl6jtcjficw.py" 2026-02-21T08:12:45.1399088Z .section .debug_abbrev 2026-02-21T08:12:45.1399146Z { 2026-02-21T08:12:45.1399244Z .b8 1 // Abbreviation Code 2026-02-21T08:12:45.1399351Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:12:45.1399441Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:12:45.1399532Z .b8 37 // DW_AT_producer 2026-02-21T08:12:45.1399623Z .b8 8 // DW_FORM_string 2026-02-21T08:12:45.1399709Z .b8 19 // DW_AT_language 2026-02-21T08:12:45.1399799Z .b8 5 // DW_FORM_data2 2026-02-21T08:12:45.1399889Z .b8 3 // DW_AT_name 2026-02-21T08:12:45.1399971Z .b8 8 // DW_FORM_string 2026-02-21T08:12:45.1400059Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:12:45.1400144Z .b8 6 // DW_FORM_data4 2026-02-21T08:12:45.1400238Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:12:45.1400320Z .b8 8 // DW_FORM_string 2026-02-21T08:12:45.1400398Z .b8 0 // EOM(1) 2026-02-21T08:12:45.1400487Z .b8 0 // EOM(2) 2026-02-21T08:12:45.1400560Z .b8 0 // EOM(3) 2026-02-21T08:12:45.1400619Z } 2026-02-21T08:12:45.1400696Z .section .debug_info 2026-02-21T08:12:45.1400755Z { 2026-02-21T08:12:45.1400899Z .b32 104 // Length of Unit 2026-02-21T08:12:45.1400997Z .b8 2 // DWARF version number 2026-02-21T08:12:45.1401064Z .b8 0 2026-02-21T08:12:45.1401193Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:12:45.1401290Z .b8 8 // Address Size (in bytes) 2026-02-21T08:12:45.1401410Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:12:45.1401499Z .b8 116 // DW_AT_producer 2026-02-21T08:12:45.1401557Z .b8 114 2026-02-21T08:12:45.1401615Z .b8 105 2026-02-21T08:12:45.1401679Z .b8 116 2026-02-21T08:12:45.1401737Z .b8 111 2026-02-21T08:12:45.1401793Z .b8 110 2026-02-21T08:12:45.1401858Z .b8 0 2026-02-21T08:12:45.1401940Z .b8 2 // DW_AT_language 2026-02-21T08:12:45.1401996Z .b8 0 2026-02-21T08:12:45.1402123Z .b8 99 // DW_AT_name 2026-02-21T08:12:45.1402196Z .b8 54 2026-02-21T08:12:45.1402253Z .b8 115 2026-02-21T08:12:45.1402312Z .b8 107 2026-02-21T08:12:45.1402376Z .b8 107 2026-02-21T08:12:45.1402432Z .b8 108 2026-02-21T08:12:45.1402489Z .b8 110 2026-02-21T08:12:45.1402544Z .b8 100 2026-02-21T08:12:45.1402610Z .b8 107 2026-02-21T08:12:45.1402667Z .b8 98 2026-02-21T08:12:45.1402724Z .b8 99 2026-02-21T08:12:45.1402787Z .b8 108 2026-02-21T08:12:45.1402842Z .b8 122 2026-02-21T08:12:45.1402898Z .b8 106 2026-02-21T08:12:45.1402954Z .b8 98 2026-02-21T08:12:45.1403020Z .b8 106 2026-02-21T08:12:45.1403076Z .b8 115 2026-02-21T08:12:45.1403132Z .b8 118 2026-02-21T08:12:45.1403189Z .b8 114 2026-02-21T08:12:45.1403252Z .b8 117 2026-02-21T08:12:45.1403307Z .b8 107 2026-02-21T08:12:45.1403362Z .b8 109 2026-02-21T08:12:45.1403425Z .b8 99 2026-02-21T08:12:45.1403481Z .b8 101 2026-02-21T08:12:45.1403539Z .b8 120 2026-02-21T08:12:45.1403594Z .b8 52 2026-02-21T08:12:45.1403657Z .b8 55 2026-02-21T08:12:45.1403713Z .b8 51 2026-02-21T08:12:45.1403772Z .b8 107 2026-02-21T08:12:45.1403837Z .b8 54 2026-02-21T08:12:45.1403892Z .b8 109 2026-02-21T08:12:45.1403946Z .b8 107 2026-02-21T08:12:45.1404002Z .b8 104 2026-02-21T08:12:45.1404064Z .b8 114 2026-02-21T08:12:45.1404119Z .b8 108 2026-02-21T08:12:45.1404174Z .b8 119 2026-02-21T08:12:45.1404236Z .b8 122 2026-02-21T08:12:45.1404293Z .b8 118 2026-02-21T08:12:45.1404348Z .b8 111 2026-02-21T08:12:45.1404403Z .b8 98 
2026-02-21T08:12:45.1404466Z .b8 120 2026-02-21T08:12:45.1404522Z .b8 112 2026-02-21T08:12:45.1404578Z .b8 108 2026-02-21T08:12:45.1404633Z .b8 54 2026-02-21T08:12:45.1404727Z .b8 106 2026-02-21T08:12:45.1404784Z .b8 116 2026-02-21T08:12:45.1404840Z .b8 99 2026-02-21T08:12:45.1404902Z .b8 106 2026-02-21T08:12:45.1404959Z .b8 102 2026-02-21T08:12:45.1405014Z .b8 105 2026-02-21T08:12:45.1405070Z .b8 99 2026-02-21T08:12:45.1405133Z .b8 119 2026-02-21T08:12:45.1405191Z .b8 46 2026-02-21T08:12:45.1405245Z .b8 112 2026-02-21T08:12:45.1405306Z .b8 121 2026-02-21T08:12:45.1405361Z .b8 0 2026-02-21T08:12:45.1405465Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:12:45.1405553Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:12:45.1405617Z .b8 116 2026-02-21T08:12:45.1405675Z .b8 109 2026-02-21T08:12:45.1405731Z .b8 112 2026-02-21T08:12:45.1405795Z .b8 47 2026-02-21T08:12:45.1405852Z .b8 116 2026-02-21T08:12:45.1405908Z .b8 111 2026-02-21T08:12:45.1405964Z .b8 114 2026-02-21T08:12:45.1406030Z .b8 99 2026-02-21T08:12:45.1406088Z .b8 104 2026-02-21T08:12:45.1406144Z .b8 105 2026-02-21T08:12:45.1406200Z .b8 110 2026-02-21T08:12:45.1406264Z .b8 100 2026-02-21T08:12:45.1406321Z .b8 117 2026-02-21T08:12:45.1406377Z .b8 99 2026-02-21T08:12:45.1406443Z .b8 116 2026-02-21T08:12:45.1406498Z .b8 111 2026-02-21T08:12:45.1406553Z .b8 114 2026-02-21T08:12:45.1406608Z .b8 95 2026-02-21T08:12:45.1406673Z .b8 114 2026-02-21T08:12:45.1406730Z .b8 111 2026-02-21T08:12:45.1406786Z .b8 111 2026-02-21T08:12:45.1406853Z .b8 116 2026-02-21T08:12:45.1406910Z .b8 47 2026-02-21T08:12:45.1406970Z .b8 54 2026-02-21T08:12:45.1407095Z .b8 115 2026-02-21T08:12:45.1407161Z .b8 0 2026-02-21T08:12:45.1407218Z } 2026-02-21T08:12:45.1407291Z .section .debug_macinfo { } 2026-02-21T08:12:45.1407296Z 2026-02-21T08:12:45.1407392Z ================================================================ 2026-02-21T08:12:45.1407510Z please share the reproducer above with Triton project. 
2026-02-21T08:12:48.1375980Z 2026-02-21T08:12:48.1378453Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 22.7 configs/s 2026-02-21T08:12:50.3525386Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 523.9 2026-02-21T08:12:50.3530484Z configs/s 2026-02-21T08:12:50.5117910Z [49s] Generation 3 complete: 2026-02-21T08:12:50.5118153Z error=28 2026-02-21T08:12:50.5118376Z ok=59 2026-02-21T08:12:50.5118503Z min=0.0205 2026-02-21T08:12:50.5118636Z mid=0.0307 2026-02-21T08:12:50.5118755Z max=1.6681 2026-02-21T08:12:50.5124465Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:12:50.5129480Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:12:50.5131100Z 'l2_groupings': [2], 2026-02-21T08:12:50.5131367Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:12:50.5136530Z 'loop_orders': [[1, 0]], 2026-02-21T08:12:50.5137952Z 'maxnreg': 128, 2026-02-21T08:12:50.5138145Z 'num_sm_multiplier': 4, 2026-02-21T08:12:50.5138310Z 'num_stages': 6, 2026-02-21T08:12:50.5138457Z 'num_warps': 1, 2026-02-21T08:12:50.5138611Z 'pid_type': 'persistent_blocked', 2026-02-21T08:12:50.5138803Z 'range_flattens': [None, False], 2026-02-21T08:12:50.5138981Z 'range_multi_buffers': [None, True], 2026-02-21T08:12:50.5139167Z 'range_num_stages': [0, 0], 2026-02-21T08:12:50.5139328Z 'range_unroll_factors': [0, 0], 2026-02-21T08:12:50.5139513Z 'range_warp_specializes': [None, True]} 2026-02-21T08:12:50.5139794Z [49s] Fitting surrogate: 373 points, 373 targets 2026-02-21T08:12:51.8213637Z [50s] Generation 4 starting: 82 neighbors, 5 active search path(s) 2026-02-21T08:13:06.8422931Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 1.1 configs/s 2026-02-21T08:13:11.1267002Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 19.6 configs/s 2026-02-21T08:13:12.9357977Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 570.3 2026-02-21T08:13:12.9358654Z configs/s 2026-02-21T08:13:13.0944645Z [71s] Generation 4 
complete: 2026-02-21T08:13:13.0945537Z error=20 2026-02-21T08:13:13.0945730Z ok=67 2026-02-21T08:13:13.0945923Z min=0.0205 2026-02-21T08:13:13.0946121Z mid=0.0319 2026-02-21T08:13:13.0946315Z max=0.7352 2026-02-21T08:13:13.0946531Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:13:13.0946895Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:13:13.0947238Z 'l2_groupings': [2], 2026-02-21T08:13:13.0947528Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:13:13.0947896Z 'loop_orders': [[1, 0]], 2026-02-21T08:13:13.0948185Z 'maxnreg': 128, 2026-02-21T08:13:13.0948429Z 'num_sm_multiplier': 4, 2026-02-21T08:13:13.0948682Z 'num_stages': 6, 2026-02-21T08:13:13.0948911Z 'num_warps': 1, 2026-02-21T08:13:13.0949156Z 'pid_type': 'persistent_blocked', 2026-02-21T08:13:13.0949483Z 'range_flattens': [None, False], 2026-02-21T08:13:13.0949793Z 'range_multi_buffers': [None, True], 2026-02-21T08:13:13.0950123Z 'range_num_stages': [0, 0], 2026-02-21T08:13:13.0950411Z 'range_unroll_factors': [0, 0], 2026-02-21T08:13:13.0950728Z 'range_warp_specializes': [None, True]} 2026-02-21T08:13:13.0968500Z [71s] Fitting surrogate: 460 points, 460 targets 2026-02-21T08:13:13.9567525Z [72s] Generation 5 starting: 52 neighbors, 3 active search path(s) 2026-02-21T08:13:23.3098347Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53/53 2.4 configs/s 2026-02-21T08:13:25.4500206Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 53/53 25.0 configs/s 2026-02-21T08:13:26.7821767Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 759.4 2026-02-21T08:13:26.7822698Z configs/s 2026-02-21T08:13:26.9005181Z [85s] Generation 5 complete: 2026-02-21T08:13:26.9005992Z error=19 2026-02-21T08:13:26.9006200Z ok=37 2026-02-21T08:13:26.9006338Z min=0.0205 2026-02-21T08:13:26.9006481Z mid=0.0266 2026-02-21T08:13:26.9006608Z max=1.7409 2026-02-21T08:13:26.9010746Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:13:26.9011045Z 'indexing': ['pointer', 
'pointer', 'pointer'], 2026-02-21T08:13:26.9011283Z 'l2_groupings': [2], 2026-02-21T08:13:26.9011535Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:13:26.9011736Z 'loop_orders': [[1, 0]], 2026-02-21T08:13:26.9011898Z 'maxnreg': 128, 2026-02-21T08:13:26.9012041Z 'num_sm_multiplier': 4, 2026-02-21T08:13:26.9012201Z 'num_stages': 6, 2026-02-21T08:13:26.9012334Z 'num_warps': 1, 2026-02-21T08:13:26.9012888Z 'pid_type': 'persistent_blocked', 2026-02-21T08:13:26.9013084Z 'range_flattens': [None, False], 2026-02-21T08:13:26.9013279Z 'range_multi_buffers': [None, True], 2026-02-21T08:13:26.9013472Z 'range_num_stages': [0, 0], 2026-02-21T08:13:26.9013635Z 'range_unroll_factors': [0, 0], 2026-02-21T08:13:26.9013817Z 'range_warp_specializes': [None, True]} 2026-02-21T08:13:26.9022251Z [85s] Fitting surrogate: 516 points, 516 targets 2026-02-21T08:13:27.8002533Z [86s] Generation 6 starting: 50 neighbors, 3 active search path(s) 2026-02-21T08:13:30.9071370Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50/50 30.9 configs/s 2026-02-21T08:13:31.3583552Z [90s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:13:31.3585744Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:13:31.3593520Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:13:31.3593890Z `ptxas` stderr: 2026-02-21T08:13:31.3594572Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 292 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T08:13:31.3595390Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:31.3595636Z 2026-02-21T08:13:31.3596288Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbaj5ecaj.ptx -o /tmp/tmpbaj5ecaj.ptx.o 2026-02-21T08:13:31.3597019Z 2026-02-21T08:13:31.3597223Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:13:31.3597540Z 2026-02-21T08:13:31.3597545Z 2026-02-21T08:13:31.3597673Z ================================================================ 2026-02-21T08:13:31.3598007Z Internal Triton PTX codegen error 2026-02-21T08:13:31.3598276Z `ptxas` stderr: 2026-02-21T08:13:31.3598932Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 292 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:31.3599699Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:31.3599928Z 2026-02-21T08:13:31.3600542Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbaj5ecaj.ptx -o /tmp/tmpbaj5ecaj.ptx.o 2026-02-21T08:13:31.3601257Z 2026-02-21T08:13:31.3601262Z 2026-02-21T08:13:31.3601336Z // 2026-02-21T08:13:31.3601537Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:13:31.3601783Z // 2026-02-21T08:13:31.3601883Z 2026-02-21T08:13:31.3601959Z .version 8.7 2026-02-21T08:13:31.3602157Z .target sm_100a 2026-02-21T08:13:31.3602659Z .address_size 64 2026-02-21T08:13:31.3602785Z 2026-02-21T08:13:31.3602973Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:13:31.3603384Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:13:31.3604001Z // @_helion_matmul 2026-02-21T08:13:31.3604320Z .visible .entry _helion_matmul( 2026-02-21T08:13:31.3604664Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_0, 2026-02-21T08:13:31.3605138Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:13:31.3605553Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:13:31.3605957Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:13:31.3606369Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:13:31.3606699Z ) 2026-02-21T08:13:31.3606877Z .reqntid 256 2026-02-21T08:13:31.3607079Z .maxnreg 32 2026-02-21T08:13:31.3607259Z { 2026-02-21T08:13:31.3607610Z .reg .pred %p<105>; 2026-02-21T08:13:31.3607840Z .reg .b32 %r<906>; 2026-02-21T08:13:31.3608063Z .reg .b64 %rd<357>; 2026-02-21T08:13:31.3608489Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19:0 2026-02-21T08:13:31.3608974Z $L__func_begin0: 2026-02-21T08:13:31.3609384Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19:0 2026-02-21T08:13:31.3609779Z 2026-02-21T08:13:31.3609866Z // %bb.0: 2026-02-21T08:13:31.3610095Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T08:13:31.3610379Z $L__tmp0: 2026-02-21T08:13:31.3610740Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19 2026-02-21T08:13:31.3611187Z mov.u32 %r1, %tid.x; 2026-02-21T08:13:31.3611408Z shr.u32 %r2, %r1, 5; 2026-02-21T08:13:31.3611638Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:13:31.3611928Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:13:31.3612170Z @%p1 bra $L__BB0_12; 2026-02-21T08:13:31.3612385Z bra.uni $L__BB0_1; 2026-02-21T08:13:31.3612596Z $L__BB0_12: 2026-02-21T08:13:31.3612956Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0:0 2026-02-21T08:13:31.3613446Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T08:13:31.3613768Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T08:13:31.3614231Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19 2026-02-21T08:13:31.3614754Z setmaxnreg.inc.sync.aligned.u32 40; 
2026-02-21T08:13:31.3615054Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T08:13:31.3615304Z mov.b32 %r149, global_smem; 2026-02-21T08:13:31.3615541Z // begin inline asm 2026-02-21T08:13:31.3615917Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r149], 128; 2026-02-21T08:13:31.3616312Z // end inline asm 2026-02-21T08:13:31.3616520Z bar.sync 0, 128; 2026-02-21T08:13:31.3616747Z ld.shared.b32 %r898, [global_smem]; 2026-02-21T08:13:31.3617027Z bar.sync 0, 128; 2026-02-21T08:13:31.3617230Z // begin inline asm 2026-02-21T08:13:31.3617557Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:13:31.3617941Z // end inline asm 2026-02-21T08:13:31.3618362Z .loc 1 21 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:21:67 2026-02-21T08:13:31.3618871Z mov.u32 %r166, %ctaid.x; 2026-02-21T08:13:31.3619106Z mov.u32 %r167, %ctaid.y; 2026-02-21T08:13:31.3619351Z mov.u32 %r168, %ctaid.z; 2026-02-21T08:13:31.3619593Z mov.u32 %r169, %nctaid.x; 2026-02-21T08:13:31.3619834Z mov.u32 %r170, %nctaid.y; 2026-02-21T08:13:31.3620090Z mad.lo.s32 %r171, %r168, %r170, %r167; 2026-02-21T08:13:31.3620402Z mad.lo.s32 %r172, %r171, %r169, %r166; 2026-02-21T08:13:31.3620683Z shl.b32 %r173, %r172, 8; 2026-02-21T08:13:31.3620914Z cvt.s64.s32 %rd84, %r173; 2026-02-21T08:13:31.3621159Z add.s64 %rd63, %rd6, %rd84; 2026-02-21T08:13:31.3621404Z shl.b32 %r174, %r1, 2; 2026-02-21T08:13:31.3621642Z add.s32 %r150, %r149, %r174; 2026-02-21T08:13:31.3622016Z mov.b32 %r159, 0; 2026-02-21T08:13:31.3622229Z // begin inline asm 2026-02-21T08:13:31.3622459Z @%p34 st.shared.b32 [ %r150 + 0 ], %r159; 2026-02-21T08:13:31.3622755Z // end inline asm 2026-02-21T08:13:31.3622965Z bar.warp.sync -1; 2026-02-21T08:13:31.3623202Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T08:13:31.3623440Z cvt.u64.u32 %rd48, %r149; 2026-02-21T08:13:31.3623663Z // begin inline asm 2026-02-21T08:13:31.3624059Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd3; 
2026-02-21T08:13:31.3624504Z // end inline asm 2026-02-21T08:13:31.3624748Z // begin inline asm 2026-02-21T08:13:31.3625097Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T08:13:31.3625506Z // end inline asm 2026-02-21T08:13:31.3625701Z mov.b32 %r152, 64; 2026-02-21T08:13:31.3625911Z // begin inline asm 2026-02-21T08:13:31.3626389Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r152; 2026-02-21T08:13:31.3626823Z // end inline asm 2026-02-21T08:13:31.3627031Z mov.b32 %r153, 128; 2026-02-21T08:13:31.3627244Z // begin inline asm 2026-02-21T08:13:31.3627621Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r153; 2026-02-21T08:13:31.3628052Z // end inline asm 2026-02-21T08:13:31.3628261Z mov.b32 %r154, 1024; 2026-02-21T08:13:31.3628476Z // begin inline asm 2026-02-21T08:13:31.3628859Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r154; 2026-02-21T08:13:31.3629306Z // end inline asm 2026-02-21T08:13:31.3629500Z mov.b32 %r155, 4096; 2026-02-21T08:13:31.3629716Z // begin inline asm 2026-02-21T08:13:31.3630095Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r155; 2026-02-21T08:13:31.3630565Z // end inline asm 2026-02-21T08:13:31.3630773Z mov.b64 %rd56, 2048; 2026-02-21T08:13:31.3630985Z // begin inline asm 2026-02-21T08:13:31.3631389Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T08:13:31.3631848Z // end inline asm 2026-02-21T08:13:31.3632050Z mov.b32 %r156, 1; 2026-02-21T08:13:31.3632247Z // begin inline asm 2026-02-21T08:13:31.3632652Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r156; 2026-02-21T08:13:31.3633107Z // end inline asm 2026-02-21T08:13:31.3633312Z // begin inline asm 2026-02-21T08:13:31.3633714Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r156; 
2026-02-21T08:13:31.3634167Z // end inline asm 2026-02-21T08:13:31.3634372Z // begin inline asm 2026-02-21T08:13:31.3634765Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T08:13:31.3635193Z // end inline asm 2026-02-21T08:13:31.3635392Z // begin inline asm 2026-02-21T08:13:31.3635798Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T08:13:31.3636248Z // end inline asm 2026-02-21T08:13:31.3636458Z // begin inline asm 2026-02-21T08:13:31.3636831Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T08:13:31.3637258Z // end inline asm 2026-02-21T08:13:31.3637461Z // begin inline asm 2026-02-21T08:13:31.3637815Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T08:13:31.3638229Z // end inline asm 2026-02-21T08:13:31.3638422Z // begin inline asm 2026-02-21T08:13:31.3638983Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T08:13:31.3639594Z // end inline asm 2026-02-21T08:13:31.3639790Z // begin inline asm 2026-02-21T08:13:31.3640120Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T08:13:31.3640517Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T08:13:31.3640814Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:31.3641087Z // end inline asm 2026-02-21T08:13:31.3641408Z bar.sync 0, 128; 2026-02-21T08:13:31.3641804Z .loc 1 22 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:22:67 2026-02-21T08:13:31.3642279Z add.s64 %rd81, %rd63, 128; 2026-02-21T08:13:31.3642520Z bar.sync 0, 128; 2026-02-21T08:13:31.3642717Z // begin inline asm 2026-02-21T08:13:31.3642950Z @%p34 st.shared.b32 [ %r150 + 0 ], %r159; 2026-02-21T08:13:31.3643218Z // end inline asm 2026-02-21T08:13:31.3643429Z bar.warp.sync -1; 2026-02-21T08:13:31.3643635Z // begin inline asm 
2026-02-21T08:13:31.3644031Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd4; 2026-02-21T08:13:31.3644503Z // end inline asm 2026-02-21T08:13:31.3644745Z // begin inline asm 2026-02-21T08:13:31.3645103Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T08:13:31.3645508Z // end inline asm 2026-02-21T08:13:31.3645717Z // begin inline asm 2026-02-21T08:13:31.3646173Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r152; 2026-02-21T08:13:31.3646621Z // end inline asm 2026-02-21T08:13:31.3646819Z // begin inline asm 2026-02-21T08:13:31.3647191Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r153; 2026-02-21T08:13:31.3647616Z // end inline asm 2026-02-21T08:13:31.3647820Z // begin inline asm 2026-02-21T08:13:31.3648224Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r154; 2026-02-21T08:13:31.3648683Z // end inline asm 2026-02-21T08:13:31.3648894Z // begin inline asm 2026-02-21T08:13:31.3649286Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r154; 2026-02-21T08:13:31.3649750Z // end inline asm 2026-02-21T08:13:31.3649960Z // begin inline asm 2026-02-21T08:13:31.3650374Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T08:13:31.3650852Z // end inline asm 2026-02-21T08:13:31.3651059Z // begin inline asm 2026-02-21T08:13:31.3651487Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r156; 2026-02-21T08:13:31.3651968Z // end inline asm 2026-02-21T08:13:31.3652179Z // begin inline asm 2026-02-21T08:13:31.3652594Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r156; 2026-02-21T08:13:31.3653077Z // end inline asm 2026-02-21T08:13:31.3653289Z // begin inline asm 2026-02-21T08:13:31.3653674Z @%p37 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T08:13:31.3654118Z // end inline asm 2026-02-21T08:13:31.3654322Z // begin inline asm 2026-02-21T08:13:31.3654785Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T08:13:31.3655253Z // end inline asm 2026-02-21T08:13:31.3655474Z // begin inline asm 2026-02-21T08:13:31.3655854Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T08:13:31.3656280Z // end inline asm 2026-02-21T08:13:31.3656488Z // begin inline asm 2026-02-21T08:13:31.3656841Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T08:13:31.3657258Z // end inline asm 2026-02-21T08:13:31.3657453Z // begin inline asm 2026-02-21T08:13:31.3658009Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd81 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T08:13:31.3658620Z // end inline asm 2026-02-21T08:13:31.3658815Z // begin inline asm 2026-02-21T08:13:31.3659140Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd81 + 0 ], 0x80; 2026-02-21T08:13:31.3659532Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T08:13:31.3659828Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:31.3660095Z // end inline asm 2026-02-21T08:13:31.3660297Z bar.sync 0, 128; 2026-02-21T08:13:31.3660683Z .loc 1 29 35 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:29:35 2026-02-21T08:13:31.3661154Z shl.b32 %r905, %r166, 1; 2026-02-21T08:13:31.3661755Z .loc 1 30 37 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:30:37 2026-02-21T08:13:31.3662210Z add.s32 %r175, %r905, 2; 2026-02-21T08:13:31.3662628Z .loc 1 30 49 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:30:49 2026-02-21T08:13:31.3663081Z min.s32 %r22, %r175, 256; 2026-02-21T08:13:31.3663489Z .loc 1 31 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:31:52 2026-02-21T08:13:31.3663956Z 
setp.ge.s32 %p72, %r905, %r22; 2026-02-21T08:13:31.3664215Z @%p72 bra $L__BB0_15; 2026-02-21T08:13:31.3664467Z // %bb.13: // %.lr.ph 2026-02-21T08:13:31.3664978Z .loc 1 0 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0:52 2026-02-21T08:13:31.3665479Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T08:13:31.3665868Z shr.u32 %r176, %r1, 4; 2026-02-21T08:13:31.3666103Z bfe.u32 %r23, %r1, 4, 3; 2026-02-21T08:13:31.3666326Z or.b32 %r24, %r23, 8; 2026-02-21T08:13:31.3666550Z or.b32 %r25, %r23, 16; 2026-02-21T08:13:31.3666765Z or.b32 %r26, %r23, 24; 2026-02-21T08:13:31.3666988Z or.b32 %r27, %r23, 32; 2026-02-21T08:13:31.3667208Z or.b32 %r28, %r23, 40; 2026-02-21T08:13:31.3667420Z or.b32 %r29, %r23, 48; 2026-02-21T08:13:31.3667641Z or.b32 %r30, %r176, 56; 2026-02-21T08:13:31.3667860Z or.b32 %r31, %r23, 64; 2026-02-21T08:13:31.3668080Z or.b32 %r32, %r23, 72; 2026-02-21T08:13:31.3668288Z or.b32 %r33, %r23, 80; 2026-02-21T08:13:31.3668502Z or.b32 %r34, %r23, 88; 2026-02-21T08:13:31.3668711Z or.b32 %r35, %r23, 96; 2026-02-21T08:13:31.3668930Z or.b32 %r36, %r23, 104; 2026-02-21T08:13:31.3669148Z or.b32 %r37, %r23, 112; 2026-02-21T08:13:31.3669374Z or.b32 %r38, %r176, 120; 2026-02-21T08:13:31.3669598Z shl.b32 %r177, %r1, 3; 2026-02-21T08:13:31.3669812Z and.b32 %r39, %r177, 120; 2026-02-21T08:13:31.3670048Z shl.b32 %r178, %r1, 10; 2026-02-21T08:13:31.3670270Z and.b32 %r179, %r178, 6144; 2026-02-21T08:13:31.3670507Z shl.b32 %r180, %r1, 4; 2026-02-21T08:13:31.3670720Z and.b32 %r181, %r180, 2032; 2026-02-21T08:13:31.3670961Z or.b32 %r182, %r179, %r181; 2026-02-21T08:13:31.3671187Z add.s32 %r40, %r149, %r182; 2026-02-21T08:13:31.3671425Z xor.b32 %r184, %r182, 32; 2026-02-21T08:13:31.3671648Z add.s32 %r41, %r149, %r184; 2026-02-21T08:13:31.3671886Z xor.b32 %r185, %r182, 64; 2026-02-21T08:13:31.3672115Z add.s32 %r42, %r149, %r185; 2026-02-21T08:13:31.3672343Z xor.b32 %r186, %r182, 96; 2026-02-21T08:13:31.3672572Z add.s32 %r43, %r149, %r186; 
2026-02-21T08:13:31.3672798Z and.b32 %r187, %r1, 96; 2026-02-21T08:13:31.3673025Z shl.b32 %r188, %r187, 6; 2026-02-21T08:13:31.3673242Z shl.b32 %r189, %r1, 5; 2026-02-21T08:13:31.3673463Z and.b32 %r190, %r189, 96; 2026-02-21T08:13:31.3673689Z and.b32 %r191, %r180, 384; 2026-02-21T08:13:31.3673928Z and.b32 %r193, %r174, 16; 2026-02-21T08:13:31.3674154Z or.b32 %r194, %r188, %r190; 2026-02-21T08:13:31.3674395Z or.b32 %r195, %r191, %r187; 2026-02-21T08:13:31.3674646Z xor.b32 %r196, %r194, %r195; 2026-02-21T08:13:31.3674947Z add.s32 %r197, %r149, %r193; 2026-02-21T08:13:31.3675212Z add.s32 %r498, %r197, %r196; 2026-02-21T08:13:31.3675460Z add.s32 %r503, %r498, 512; 2026-02-21T08:13:31.3675718Z add.s32 %r508, %r498, 1024; 2026-02-21T08:13:31.3675980Z add.s32 %r513, %r498, 1536; 2026-02-21T08:13:31.3676301Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T08:13:31.3676842Z .loc 1 37 35 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:37:35 2026-02-21T08:13:31.3677356Z shr.s32 %r638, %r905, 31; 2026-02-21T08:13:31.3677610Z shr.u32 %r639, %r638, 21; 2026-02-21T08:13:31.3677931Z add.s32 %r640, %r905, %r639; 2026-02-21T08:13:31.3678218Z shr.s32 %r641, %r640, 11; 2026-02-21T08:13:31.3678631Z .loc 1 38 33 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:38:33 2026-02-21T08:13:31.3679261Z shl.b32 %r642, %r641, 6; 2026-02-21T08:13:31.3679650Z .loc 1 39 39 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:39:39 2026-02-21T08:13:31.3680128Z sub.s32 %r643, 8, %r642; 2026-02-21T08:13:31.3680562Z .loc 1 39 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:39:52 2026-02-21T08:13:31.3681042Z min.s32 %r644, %r643, 64; 2026-02-21T08:13:31.3681481Z .loc 1 41 51 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:41:51 2026-02-21T08:13:31.3681972Z and.b32 %r645, %r640, -2048; 2026-02-21T08:13:31.3682222Z sub.s32 %r646, %r905, %r645; 2026-02-21T08:13:31.3682462Z div.s32 %r647, %r646, %r644; 2026-02-21T08:13:31.3682911Z .loc 1 42 27 
// c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:42:27 2026-02-21T08:13:31.3683408Z mul.lo.s32 %r648, %r647, %r644; 2026-02-21T08:13:31.3683809Z mad.lo.s32 %r649, %r641, 1984, %r648; 2026-02-21T08:13:31.3684106Z sub.s32 %r650, %r905, %r649; 2026-02-21T08:13:31.3684371Z shl.b32 %r651, %r650, 7; 2026-02-21T08:13:31.3684855Z .loc 1 43 32 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:43:32 2026-02-21T08:13:31.3685309Z or.b32 %r652, %r651, %r39; 2026-02-21T08:13:31.3685726Z .loc 1 44 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:44:27 2026-02-21T08:13:31.3686172Z shl.b32 %r653, %r647, 7; 2026-02-21T08:13:31.3686582Z .loc 1 45 32 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:45:32 2026-02-21T08:13:31.3687040Z or.b32 %r654, %r653, %r23; 2026-02-21T08:13:31.3687268Z or.b32 %r655, %r653, %r24; 2026-02-21T08:13:31.3687501Z or.b32 %r656, %r653, %r25; 2026-02-21T08:13:31.3687726Z or.b32 %r657, %r653, %r26; 2026-02-21T08:13:31.3687958Z or.b32 %r658, %r653, %r27; 2026-02-21T08:13:31.3688182Z or.b32 %r659, %r653, %r28; 2026-02-21T08:13:31.3688417Z or.b32 %r660, %r653, %r29; 2026-02-21T08:13:31.3688647Z or.b32 %r661, %r653, %r30; 2026-02-21T08:13:31.3688880Z or.b32 %r662, %r653, %r31; 2026-02-21T08:13:31.3689110Z or.b32 %r663, %r653, %r32; 2026-02-21T08:13:31.3689335Z or.b32 %r664, %r653, %r33; 2026-02-21T08:13:31.3689568Z or.b32 %r665, %r653, %r34; 2026-02-21T08:13:31.3689789Z or.b32 %r666, %r653, %r35; 2026-02-21T08:13:31.3690018Z or.b32 %r667, %r653, %r36; 2026-02-21T08:13:31.3690240Z or.b32 %r668, %r653, %r37; 2026-02-21T08:13:31.3690469Z or.b32 %r669, %r653, %r38; 2026-02-21T08:13:31.3690872Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3691347Z shfl.sync.idx.b32 %r670, %r2, 0, 31, -1; 2026-02-21T08:13:31.3691625Z shl.b32 %r671, %r670, 21; 2026-02-21T08:13:31.3691851Z and.b32 %r672, %r671, 6291456; 2026-02-21T08:13:31.3692103Z add.s32 %r198, %r672, %r898; 
2026-02-21T08:13:31.3692339Z mov.pred %p73, -1; 2026-02-21T08:13:31.3692555Z mov.b32 %r199, 0; 2026-02-21T08:13:31.3692756Z // begin inline asm 2026-02-21T08:13:31.3693369Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 0], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3694013Z // end inline asm 2026-02-21T08:13:31.3694217Z // begin inline asm 2026-02-21T08:13:31.3694854Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 16], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3695513Z // end inline asm 2026-02-21T08:13:31.3695736Z // begin inline asm 2026-02-21T08:13:31.3696344Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 32], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3697033Z // end inline asm 2026-02-21T08:13:31.3697236Z // begin inline asm 2026-02-21T08:13:31.3697850Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 48], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3698644Z // end inline asm 2026-02-21T08:13:31.3698848Z // begin inline asm 2026-02-21T08:13:31.3699447Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 64], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3700096Z // end inline asm 2026-02-21T08:13:31.3700304Z // begin inline asm 2026-02-21T08:13:31.3700898Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 80], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3701529Z // end inline asm 2026-02-21T08:13:31.3701733Z // begin inline asm 2026-02-21T08:13:31.3702383Z @%p73 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 96], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3703030Z // end inline asm 2026-02-21T08:13:31.3703234Z // begin inline asm 2026-02-21T08:13:31.3703821Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r198 + 112], {%r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199, %r199}; 2026-02-21T08:13:31.3704465Z // end inline asm 2026-02-21T08:13:31.3704658Z // begin inline asm 2026-02-21T08:13:31.3704943Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:13:31.3705185Z // end inline asm 2026-02-21T08:13:31.3705390Z bar.sync 0, 128; 2026-02-21T08:13:31.3705790Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3706269Z add.s32 %r334, %r149, 131200; 2026-02-21T08:13:31.3706502Z // begin inline asm 2026-02-21T08:13:31.3706755Z @%p37 mbarrier.init.shared::cta.b64 [%r334], 1; 2026-02-21T08:13:31.3707052Z // end inline asm 2026-02-21T08:13:31.3707246Z bar.sync 0, 128; 2026-02-21T08:13:31.3707465Z add.s32 %r335, %r149, 131208; 2026-02-21T08:13:31.3707717Z // begin inline asm 2026-02-21T08:13:31.3707988Z @%p37 mbarrier.init.shared::cta.b64 [%r335], 1; 2026-02-21T08:13:31.3708299Z // end inline asm 2026-02-21T08:13:31.3708500Z bar.sync 0, 128; 2026-02-21T08:13:31.3708701Z add.s32 %r336, %r149, 131216; 2026-02-21T08:13:31.3708938Z // begin inline asm 2026-02-21T08:13:31.3709183Z @%p37 mbarrier.init.shared::cta.b64 [%r336], 1; 2026-02-21T08:13:31.3709467Z // end inline asm 2026-02-21T08:13:31.3709666Z bar.sync 0, 128; 2026-02-21T08:13:31.3709859Z add.s32 %r337, %r149, 131224; 2026-02-21T08:13:31.3710098Z // begin inline asm 2026-02-21T08:13:31.3710335Z @%p37 mbarrier.init.shared::cta.b64 [%r337], 1; 2026-02-21T08:13:31.3710648Z // end inline asm 2026-02-21T08:13:31.3710845Z add.s32 %r338, %r149, 131232; 2026-02-21T08:13:31.3711080Z // begin inline asm 
2026-02-21T08:13:31.3711322Z @%p37 mbarrier.init.shared::cta.b64 [%r338], 1; 2026-02-21T08:13:31.3711612Z // end inline asm 2026-02-21T08:13:31.3711813Z bar.sync 0, 128; 2026-02-21T08:13:31.3712014Z add.s32 %r339, %r149, 131240; 2026-02-21T08:13:31.3712246Z // begin inline asm 2026-02-21T08:13:31.3712485Z @%p37 mbarrier.init.shared::cta.b64 [%r339], 1; 2026-02-21T08:13:31.3712777Z // end inline asm 2026-02-21T08:13:31.3712966Z bar.sync 0, 128; 2026-02-21T08:13:31.3713172Z add.s32 %r340, %r149, 131248; 2026-02-21T08:13:31.3713401Z // begin inline asm 2026-02-21T08:13:31.3713648Z @%p37 mbarrier.init.shared::cta.b64 [%r340], 1; 2026-02-21T08:13:31.3713940Z // end inline asm 2026-02-21T08:13:31.3714130Z bar.sync 0, 128; 2026-02-21T08:13:31.3714336Z add.s32 %r341, %r149, 131256; 2026-02-21T08:13:31.3714565Z // begin inline asm 2026-02-21T08:13:31.3714850Z @%p37 mbarrier.init.shared::cta.b64 [%r341], 1; 2026-02-21T08:13:31.3715135Z // end inline asm 2026-02-21T08:13:31.3715525Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3715998Z bar.sync 0, 128; 2026-02-21T08:13:31.3716227Z // begin inline asm 2026-02-21T08:13:31.3716602Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r334]; 2026-02-21T08:13:31.3716901Z // end inline asm 2026-02-21T08:13:31.3717104Z bar.sync 0, 128; 2026-02-21T08:13:31.3717301Z // begin inline asm 2026-02-21T08:13:31.3717559Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r335]; 2026-02-21T08:13:31.3717855Z // end inline asm 2026-02-21T08:13:31.3718055Z bar.sync 0, 128; 2026-02-21T08:13:31.3718249Z // begin inline asm 2026-02-21T08:13:31.3718501Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r336]; 2026-02-21T08:13:31.3718795Z // end inline asm 2026-02-21T08:13:31.3718998Z bar.sync 0, 128; 2026-02-21T08:13:31.3719198Z // begin inline asm 2026-02-21T08:13:31.3719442Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r337]; 2026-02-21T08:13:31.3719741Z // end inline asm 2026-02-21T08:13:31.3720132Z .loc 1 50 112 // 
c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3720594Z bar.sync 0, 128; 2026-02-21T08:13:31.3720901Z add.s32 %r346, %r149, 131264; 2026-02-21T08:13:31.3721147Z // begin inline asm 2026-02-21T08:13:31.3721391Z @%p37 mbarrier.init.shared::cta.b64 [%r346], 1; 2026-02-21T08:13:31.3721687Z // end inline asm 2026-02-21T08:13:31.3721928Z st.shared.b32 [global_smem+131272], 33554689; 2026-02-21T08:13:31.3722249Z st.shared.b32 [global_smem+131072], %r898; 2026-02-21T08:13:31.3722597Z st.shared.v2.b32 [global_smem+131080], {%r653, %r651}; 2026-02-21T08:13:31.3722908Z barrier.sync 1; 2026-02-21T08:13:31.3723153Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:13:31.3723430Z barrier.sync 1; 2026-02-21T08:13:31.3723639Z barrier.sync 1; 2026-02-21T08:13:31.3723869Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:13:31.3724337Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3724859Z bar.sync 0, 128; 2026-02-21T08:13:31.3725076Z // begin inline asm 2026-02-21T08:13:31.3725304Z 2026-02-21T08:13:31.3725489Z { 2026-02-21T08:13:31.3725689Z .reg .pred complete; 2026-02-21T08:13:31.3725931Z waitLoop: 2026-02-21T08:13:31.3726223Z mbarrier.try_wait.parity.shared.b64 complete, [%r346], %r199; 2026-02-21T08:13:31.3726595Z @!complete bra.uni waitLoop; 2026-02-21T08:13:31.3726848Z } 2026-02-21T08:13:31.3726951Z 2026-02-21T08:13:31.3727036Z // end inline asm 2026-02-21T08:13:31.3727468Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3727970Z bar.sync 0, 128; 2026-02-21T08:13:31.3728182Z // begin inline asm 2026-02-21T08:13:31.3728458Z @%p37 mbarrier.inval.shared::cta.b64 [%r346]; 2026-02-21T08:13:31.3728766Z // end inline asm 2026-02-21T08:13:31.3728985Z // begin inline asm 2026-02-21T08:13:31.3729249Z @%p37 mbarrier.inval.shared::cta.b64 [%r338]; 2026-02-21T08:13:31.3729567Z // end inline asm 2026-02-21T08:13:31.3729775Z bar.sync 0, 128; 
2026-02-21T08:13:31.3729994Z // begin inline asm 2026-02-21T08:13:31.3730261Z @%p37 mbarrier.inval.shared::cta.b64 [%r339]; 2026-02-21T08:13:31.3730567Z // end inline asm 2026-02-21T08:13:31.3730785Z bar.sync 0, 128; 2026-02-21T08:13:31.3730993Z // begin inline asm 2026-02-21T08:13:31.3731259Z @%p37 mbarrier.inval.shared::cta.b64 [%r340]; 2026-02-21T08:13:31.3731560Z // end inline asm 2026-02-21T08:13:31.3731770Z bar.sync 0, 128; 2026-02-21T08:13:31.3731975Z // begin inline asm 2026-02-21T08:13:31.3732239Z @%p37 mbarrier.inval.shared::cta.b64 [%r341]; 2026-02-21T08:13:31.3732548Z // end inline asm 2026-02-21T08:13:31.3732757Z // begin inline asm 2026-02-21T08:13:31.3733019Z @%p37 mbarrier.inval.shared::cta.b64 [%r334]; 2026-02-21T08:13:31.3733322Z // end inline asm 2026-02-21T08:13:31.3733536Z bar.sync 0, 128; 2026-02-21T08:13:31.3733743Z // begin inline asm 2026-02-21T08:13:31.3734006Z @%p37 mbarrier.inval.shared::cta.b64 [%r335]; 2026-02-21T08:13:31.3734305Z // end inline asm 2026-02-21T08:13:31.3734529Z bar.sync 0, 128; 2026-02-21T08:13:31.3734764Z // begin inline asm 2026-02-21T08:13:31.3735013Z @%p37 mbarrier.inval.shared::cta.b64 [%r336]; 2026-02-21T08:13:31.3735419Z // end inline asm 2026-02-21T08:13:31.3735627Z bar.sync 0, 128; 2026-02-21T08:13:31.3735846Z // begin inline asm 2026-02-21T08:13:31.3736101Z @%p37 mbarrier.inval.shared::cta.b64 [%r337]; 2026-02-21T08:13:31.3736410Z // end inline asm 2026-02-21T08:13:31.3736824Z .loc 1 59 45 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:59:45 2026-02-21T08:13:31.3737289Z shl.b32 %r674, %r654, 10; 2026-02-21T08:13:31.3737517Z shl.b32 %r675, %r655, 10; 2026-02-21T08:13:31.3737754Z shl.b32 %r676, %r656, 10; 2026-02-21T08:13:31.3737984Z shl.b32 %r677, %r657, 10; 2026-02-21T08:13:31.3738203Z shl.b32 %r678, %r658, 10; 2026-02-21T08:13:31.3738431Z shl.b32 %r679, %r659, 10; 2026-02-21T08:13:31.3738649Z shl.b32 %r680, %r660, 10; 2026-02-21T08:13:31.3738874Z shl.b32 %r681, %r661, 10; 
2026-02-21T08:13:31.3739092Z shl.b32 %r682, %r662, 10; 2026-02-21T08:13:31.3739409Z shl.b32 %r683, %r663, 10; 2026-02-21T08:13:31.3739630Z shl.b32 %r684, %r664, 10; 2026-02-21T08:13:31.3739857Z shl.b32 %r685, %r665, 10; 2026-02-21T08:13:31.3740075Z shl.b32 %r686, %r666, 10; 2026-02-21T08:13:31.3740304Z shl.b32 %r687, %r667, 10; 2026-02-21T08:13:31.3740530Z shl.b32 %r688, %r668, 10; 2026-02-21T08:13:31.3740746Z shl.b32 %r689, %r669, 10; 2026-02-21T08:13:31.3741159Z .loc 1 59 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:59:52 2026-02-21T08:13:31.3741610Z add.s32 %r690, %r674, %r652; 2026-02-21T08:13:31.3741853Z add.s32 %r691, %r675, %r652; 2026-02-21T08:13:31.3742082Z add.s32 %r692, %r676, %r652; 2026-02-21T08:13:31.3742318Z add.s32 %r693, %r677, %r652; 2026-02-21T08:13:31.3742545Z add.s32 %r694, %r678, %r652; 2026-02-21T08:13:31.3742783Z add.s32 %r695, %r679, %r652; 2026-02-21T08:13:31.3743016Z add.s32 %r696, %r680, %r652; 2026-02-21T08:13:31.3743242Z add.s32 %r697, %r681, %r652; 2026-02-21T08:13:31.3743480Z add.s32 %r698, %r682, %r652; 2026-02-21T08:13:31.3743708Z add.s32 %r699, %r683, %r652; 2026-02-21T08:13:31.3743943Z add.s32 %r700, %r684, %r652; 2026-02-21T08:13:31.3744168Z add.s32 %r701, %r685, %r652; 2026-02-21T08:13:31.3744402Z add.s32 %r702, %r686, %r652; 2026-02-21T08:13:31.3744629Z add.s32 %r703, %r687, %r652; 2026-02-21T08:13:31.3744901Z add.s32 %r704, %r688, %r652; 2026-02-21T08:13:31.3745129Z add.s32 %r705, %r689, %r652; 2026-02-21T08:13:31.3745546Z .loc 1 59 24 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:59:24 2026-02-21T08:13:31.3746017Z mad.wide.s32 %rd85, %r690, 2, %rd5; 2026-02-21T08:13:31.3746290Z mad.wide.s32 %rd86, %r691, 2, %rd5; 2026-02-21T08:13:31.3746565Z mad.wide.s32 %rd87, %r692, 2, %rd5; 2026-02-21T08:13:31.3746823Z mad.wide.s32 %rd88, %r693, 2, %rd5; 2026-02-21T08:13:31.3747090Z mad.wide.s32 %rd89, %r694, 2, %rd5; 2026-02-21T08:13:31.3747345Z mad.wide.s32 %rd90, %r695, 2, %rd5; 
2026-02-21T08:13:31.3747611Z mad.wide.s32 %rd91, %r696, 2, %rd5; 2026-02-21T08:13:31.3747884Z mad.wide.s32 %rd92, %r697, 2, %rd5; 2026-02-21T08:13:31.3748140Z mad.wide.s32 %rd93, %r698, 2, %rd5; 2026-02-21T08:13:31.3748407Z mad.wide.s32 %rd94, %r699, 2, %rd5; 2026-02-21T08:13:31.3748668Z mad.wide.s32 %rd95, %r700, 2, %rd5; 2026-02-21T08:13:31.3748935Z mad.wide.s32 %rd96, %r701, 2, %rd5; 2026-02-21T08:13:31.3749193Z mad.wide.s32 %rd97, %r702, 2, %rd5; 2026-02-21T08:13:31.3749460Z mad.wide.s32 %rd98, %r703, 2, %rd5; 2026-02-21T08:13:31.3749718Z mad.wide.s32 %rd99, %r704, 2, %rd5; 2026-02-21T08:13:31.3749989Z mad.wide.s32 %rd100, %r705, 2, %rd5; 2026-02-21T08:13:31.3750442Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3750895Z // begin inline asm 2026-02-21T08:13:31.3751487Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373}, [%r198 + 0]; 2026-02-21T08:13:31.3752140Z // end inline asm 2026-02-21T08:13:31.3752634Z // begin inline asm 2026-02-21T08:13:31.3753241Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r375, %r376, %r377, %r378, %r379, %r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390}, [%r198 + 16]; 2026-02-21T08:13:31.3753911Z // end inline asm 2026-02-21T08:13:31.3754127Z // begin inline asm 2026-02-21T08:13:31.3754778Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407}, [%r198 + 32]; 2026-02-21T08:13:31.3755432Z // end inline asm 2026-02-21T08:13:31.3755641Z // begin inline asm 2026-02-21T08:13:31.3756234Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424}, [%r198 + 48]; 2026-02-21T08:13:31.3756875Z // end inline asm 2026-02-21T08:13:31.3757085Z // begin inline asm 
2026-02-21T08:13:31.3757762Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441}, [%r198 + 64]; 2026-02-21T08:13:31.3758416Z // end inline asm 2026-02-21T08:13:31.3758630Z // begin inline asm 2026-02-21T08:13:31.3759202Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458}, [%r198 + 80]; 2026-02-21T08:13:31.3759821Z // end inline asm 2026-02-21T08:13:31.3760013Z // begin inline asm 2026-02-21T08:13:31.3760575Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475}, [%r198 + 96]; 2026-02-21T08:13:31.3761198Z // end inline asm 2026-02-21T08:13:31.3761391Z // begin inline asm 2026-02-21T08:13:31.3761958Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492}, [%r198 + 112]; 2026-02-21T08:13:31.3762579Z // end inline asm 2026-02-21T08:13:31.3762783Z // begin inline asm 2026-02-21T08:13:31.3763010Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:13:31.3763251Z // end inline asm 2026-02-21T08:13:31.3763465Z cvt.u64.u32 %rd101, %r358; 2026-02-21T08:13:31.3763703Z cvt.u64.u32 %rd102, %r359; 2026-02-21T08:13:31.3763948Z shl.b64 %rd103, %rd102, 32; 2026-02-21T08:13:31.3764189Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:13:31.3764623Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3765120Z mov.b64 {%r706, %r707}, %rd104; 2026-02-21T08:13:31.3765389Z cvt.rn.f16x2.f32 %r708, %r707, %r706; 2026-02-21T08:13:31.3765841Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3766295Z cvt.u64.u32 %rd105, %r360; 2026-02-21T08:13:31.3766540Z cvt.u64.u32 %rd106, %r361; 2026-02-21T08:13:31.3766771Z shl.b64 
%rd107, %rd106, 32; 2026-02-21T08:13:31.3767022Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:13:31.3767443Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3767903Z mov.b64 {%r709, %r710}, %rd108; 2026-02-21T08:13:31.3768159Z cvt.rn.f16x2.f32 %r711, %r710, %r709; 2026-02-21T08:13:31.3768608Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3769064Z cvt.u64.u32 %rd109, %r362; 2026-02-21T08:13:31.3769295Z cvt.u64.u32 %rd110, %r363; 2026-02-21T08:13:31.3769535Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:13:31.3769771Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:13:31.3770196Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3770645Z mov.b64 {%r712, %r713}, %rd112; 2026-02-21T08:13:31.3770906Z cvt.rn.f16x2.f32 %r714, %r713, %r712; 2026-02-21T08:13:31.3771363Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3771938Z cvt.u64.u32 %rd113, %r364; 2026-02-21T08:13:31.3772193Z cvt.u64.u32 %rd114, %r365; 2026-02-21T08:13:31.3772434Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:13:31.3772690Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:13:31.3773120Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3773608Z mov.b64 {%r715, %r716}, %rd116; 2026-02-21T08:13:31.3773864Z cvt.rn.f16x2.f32 %r717, %r716, %r715; 2026-02-21T08:13:31.3774331Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3774848Z cvt.u64.u32 %rd117, %r366; 2026-02-21T08:13:31.3775097Z cvt.u64.u32 %rd118, %r367; 2026-02-21T08:13:31.3775335Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:13:31.3775568Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:13:31.3776076Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3776531Z mov.b64 {%r718, %r719}, %rd120; 2026-02-21T08:13:31.3776790Z 
cvt.rn.f16x2.f32 %r720, %r719, %r718; 2026-02-21T08:13:31.3777232Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3777677Z cvt.u64.u32 %rd121, %r368; 2026-02-21T08:13:31.3777915Z cvt.u64.u32 %rd122, %r369; 2026-02-21T08:13:31.3778144Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:13:31.3778382Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:13:31.3778794Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3779252Z mov.b64 {%r721, %r722}, %rd124; 2026-02-21T08:13:31.3779501Z cvt.rn.f16x2.f32 %r723, %r722, %r721; 2026-02-21T08:13:31.3779945Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3780395Z cvt.u64.u32 %rd125, %r370; 2026-02-21T08:13:31.3780628Z cvt.u64.u32 %rd126, %r371; 2026-02-21T08:13:31.3780866Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:13:31.3781098Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:13:31.3781521Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3781967Z mov.b64 {%r724, %r725}, %rd128; 2026-02-21T08:13:31.3782223Z cvt.rn.f16x2.f32 %r726, %r725, %r724; 2026-02-21T08:13:31.3782664Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3783116Z cvt.u64.u32 %rd129, %r372; 2026-02-21T08:13:31.3783352Z cvt.u64.u32 %rd130, %r373; 2026-02-21T08:13:31.3783579Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:13:31.3783817Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:13:31.3784232Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3784719Z mov.b64 {%r727, %r728}, %rd132; 2026-02-21T08:13:31.3784970Z cvt.rn.f16x2.f32 %r729, %r728, %r727; 2026-02-21T08:13:31.3785413Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3785874Z cvt.u64.u32 %rd133, %r375; 2026-02-21T08:13:31.3786102Z cvt.u64.u32 %rd134, %r376; 2026-02-21T08:13:31.3786336Z 
shl.b64 %rd135, %rd134, 32; 2026-02-21T08:13:31.3786567Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:13:31.3786985Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3787431Z mov.b64 {%r730, %r731}, %rd136; 2026-02-21T08:13:31.3787690Z cvt.rn.f16x2.f32 %r732, %r731, %r730; 2026-02-21T08:13:31.3788127Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3788571Z cvt.u64.u32 %rd137, %r377; 2026-02-21T08:13:31.3788812Z cvt.u64.u32 %rd138, %r378; 2026-02-21T08:13:31.3789040Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:13:31.3789277Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:13:31.3789693Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3790261Z mov.b64 {%r733, %r734}, %rd140; 2026-02-21T08:13:31.3790511Z cvt.rn.f16x2.f32 %r735, %r734, %r733; 2026-02-21T08:13:31.3790953Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3791409Z cvt.u64.u32 %rd141, %r379; 2026-02-21T08:13:31.3791638Z cvt.u64.u32 %rd142, %r380; 2026-02-21T08:13:31.3791876Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:13:31.3792111Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:13:31.3792534Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3792985Z mov.b64 {%r736, %r737}, %rd144; 2026-02-21T08:13:31.3793247Z cvt.rn.f16x2.f32 %r738, %r737, %r736; 2026-02-21T08:13:31.3793691Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3794279Z cvt.u64.u32 %rd145, %r381; 2026-02-21T08:13:31.3794586Z cvt.u64.u32 %rd146, %r382; 2026-02-21T08:13:31.3794987Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:13:31.3795296Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:13:31.3795726Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3796178Z mov.b64 {%r739, %r740}, %rd148; 2026-02-21T08:13:31.3796413Z 
cvt.rn.f16x2.f32 %r741, %r740, %r739; 2026-02-21T08:13:31.3796844Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3797336Z cvt.u64.u32 %rd149, %r383; 2026-02-21T08:13:31.3797576Z cvt.u64.u32 %rd150, %r384; 2026-02-21T08:13:31.3797822Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:13:31.3798066Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:13:31.3798513Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3799000Z mov.b64 {%r742, %r743}, %rd152; 2026-02-21T08:13:31.3799271Z cvt.rn.f16x2.f32 %r744, %r743, %r742; 2026-02-21T08:13:31.3799747Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3800233Z cvt.u64.u32 %rd153, %r385; 2026-02-21T08:13:31.3800485Z cvt.u64.u32 %rd154, %r386; 2026-02-21T08:13:31.3800726Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:13:31.3800982Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:13:31.3801426Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3801916Z mov.b64 {%r745, %r746}, %rd156; 2026-02-21T08:13:31.3802179Z cvt.rn.f16x2.f32 %r747, %r746, %r745; 2026-02-21T08:13:31.3802655Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3803161Z cvt.u64.u32 %rd157, %r387; 2026-02-21T08:13:31.3803392Z cvt.u64.u32 %rd158, %r388; 2026-02-21T08:13:31.3803636Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:13:31.3803878Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:13:31.3804309Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3804831Z mov.b64 {%r748, %r749}, %rd160; 2026-02-21T08:13:31.3805093Z cvt.rn.f16x2.f32 %r750, %r749, %r748; 2026-02-21T08:13:31.3805547Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3805999Z cvt.u64.u32 %rd161, %r389; 2026-02-21T08:13:31.3806242Z cvt.u64.u32 %rd162, %r390; 2026-02-21T08:13:31.3806472Z 
shl.b64 %rd163, %rd162, 32; 2026-02-21T08:13:31.3806718Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:13:31.3807136Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3807592Z mov.b64 {%r751, %r752}, %rd164; 2026-02-21T08:13:31.3807846Z cvt.rn.f16x2.f32 %r753, %r752, %r751; 2026-02-21T08:13:31.3808296Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3808898Z cvt.u64.u32 %rd165, %r392; 2026-02-21T08:13:31.3809128Z cvt.u64.u32 %rd166, %r393; 2026-02-21T08:13:31.3809366Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:13:31.3809599Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:13:31.3810025Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3810480Z mov.b64 {%r754, %r755}, %rd168; 2026-02-21T08:13:31.3810736Z cvt.rn.f16x2.f32 %r756, %r755, %r754; 2026-02-21T08:13:31.3811178Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3811627Z cvt.u64.u32 %rd169, %r394; 2026-02-21T08:13:31.3811862Z cvt.u64.u32 %rd170, %r395; 2026-02-21T08:13:31.3812089Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:13:31.3812325Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:13:31.3812828Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3813297Z mov.b64 {%r757, %r758}, %rd172; 2026-02-21T08:13:31.3813547Z cvt.rn.f16x2.f32 %r759, %r758, %r757; 2026-02-21T08:13:31.3813985Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3814444Z cvt.u64.u32 %rd173, %r396; 2026-02-21T08:13:31.3814707Z cvt.u64.u32 %rd174, %r397; 2026-02-21T08:13:31.3814943Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:13:31.3815174Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:13:31.3815599Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3816051Z mov.b64 {%r760, %r761}, %rd176; 2026-02-21T08:13:31.3816309Z 
cvt.rn.f16x2.f32 %r762, %r761, %r760; 2026-02-21T08:13:31.3816750Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3817199Z cvt.u64.u32 %rd177, %r398; 2026-02-21T08:13:31.3817439Z cvt.u64.u32 %rd178, %r399; 2026-02-21T08:13:31.3817669Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:13:31.3817909Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:13:31.3818328Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3818782Z mov.b64 {%r763, %r764}, %rd180; 2026-02-21T08:13:31.3819031Z cvt.rn.f16x2.f32 %r765, %r764, %r763; 2026-02-21T08:13:31.3819472Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3819929Z cvt.u64.u32 %rd181, %r400; 2026-02-21T08:13:31.3820157Z cvt.u64.u32 %rd182, %r401; 2026-02-21T08:13:31.3820396Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:13:31.3820626Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:13:31.3821052Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3821505Z mov.b64 {%r766, %r767}, %rd184; 2026-02-21T08:13:31.3821765Z cvt.rn.f16x2.f32 %r768, %r767, %r766; 2026-02-21T08:13:31.3822206Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3822653Z cvt.u64.u32 %rd185, %r402; 2026-02-21T08:13:31.3822888Z cvt.u64.u32 %rd186, %r403; 2026-02-21T08:13:31.3823115Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:13:31.3823352Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:13:31.3823765Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3824225Z mov.b64 {%r769, %r770}, %rd188; 2026-02-21T08:13:31.3824474Z cvt.rn.f16x2.f32 %r771, %r770, %r769; 2026-02-21T08:13:31.3824892Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3825355Z cvt.u64.u32 %rd189, %r404; 2026-02-21T08:13:31.3825589Z cvt.u64.u32 %rd190, %r405; 2026-02-21T08:13:31.3825831Z 
shl.b64 %rd191, %rd190, 32; 2026-02-21T08:13:31.3826071Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:13:31.3826604Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3827053Z mov.b64 {%r772, %r773}, %rd192; 2026-02-21T08:13:31.3827314Z cvt.rn.f16x2.f32 %r774, %r773, %r772; 2026-02-21T08:13:31.3827755Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3828198Z cvt.u64.u32 %rd193, %r406; 2026-02-21T08:13:31.3828435Z cvt.u64.u32 %rd194, %r407; 2026-02-21T08:13:31.3828663Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:13:31.3828906Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:13:31.3829320Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3829781Z mov.b64 {%r775, %r776}, %rd196; 2026-02-21T08:13:31.3830029Z cvt.rn.f16x2.f32 %r777, %r776, %r775; 2026-02-21T08:13:31.3830557Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3831022Z cvt.u64.u32 %rd197, %r409; 2026-02-21T08:13:31.3831250Z cvt.u64.u32 %rd198, %r410; 2026-02-21T08:13:31.3831485Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:13:31.3831717Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:13:31.3832142Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3832586Z mov.b64 {%r778, %r779}, %rd200; 2026-02-21T08:13:31.3832837Z cvt.rn.f16x2.f32 %r780, %r779, %r778; 2026-02-21T08:13:31.3833278Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3833721Z cvt.u64.u32 %rd201, %r411; 2026-02-21T08:13:31.3833956Z cvt.u64.u32 %rd202, %r412; 2026-02-21T08:13:31.3834183Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:13:31.3834424Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:13:31.3834885Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3835343Z mov.b64 {%r781, %r782}, %rd204; 2026-02-21T08:13:31.3835597Z 
cvt.rn.f16x2.f32 %r783, %r782, %r781; 2026-02-21T08:13:31.3836034Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3836490Z cvt.u64.u32 %rd205, %r413; 2026-02-21T08:13:31.3836716Z cvt.u64.u32 %rd206, %r414; 2026-02-21T08:13:31.3836953Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:13:31.3837185Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:13:31.3837607Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3838060Z mov.b64 {%r784, %r785}, %rd208; 2026-02-21T08:13:31.3838330Z cvt.rn.f16x2.f32 %r786, %r785, %r784; 2026-02-21T08:13:31.3838797Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3839267Z cvt.u64.u32 %rd209, %r415; 2026-02-21T08:13:31.3839521Z cvt.u64.u32 %rd210, %r416; 2026-02-21T08:13:31.3839764Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:13:31.3840023Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:13:31.3840455Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3840934Z mov.b64 {%r787, %r788}, %rd212; 2026-02-21T08:13:31.3841197Z cvt.rn.f16x2.f32 %r789, %r788, %r787; 2026-02-21T08:13:31.3841674Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3842155Z cvt.u64.u32 %rd213, %r417; 2026-02-21T08:13:31.3842395Z cvt.u64.u32 %rd214, %r418; 2026-02-21T08:13:31.3842646Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:13:31.3842889Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:13:31.3843335Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3843807Z mov.b64 {%r790, %r791}, %rd216; 2026-02-21T08:13:31.3844078Z cvt.rn.f16x2.f32 %r792, %r791, %r790; 2026-02-21T08:13:31.3844547Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3845227Z cvt.u64.u32 %rd217, %r419; 2026-02-21T08:13:31.3845480Z cvt.u64.u32 %rd218, %r420; 2026-02-21T08:13:31.3845732Z 
shl.b64 %rd219, %rd218, 32; 2026-02-21T08:13:31.3845973Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:13:31.3846384Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3846841Z mov.b64 {%r793, %r794}, %rd220; 2026-02-21T08:13:31.3847088Z cvt.rn.f16x2.f32 %r795, %r794, %r793; 2026-02-21T08:13:31.3847529Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3847987Z cvt.u64.u32 %rd221, %r421; 2026-02-21T08:13:31.3848217Z cvt.u64.u32 %rd222, %r422; 2026-02-21T08:13:31.3848456Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:13:31.3848691Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:13:31.3849183Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3849640Z mov.b64 {%r796, %r797}, %rd224; 2026-02-21T08:13:31.3849900Z cvt.rn.f16x2.f32 %r798, %r797, %r796; 2026-02-21T08:13:31.3850349Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3850801Z cvt.u64.u32 %rd225, %r423; 2026-02-21T08:13:31.3851037Z cvt.u64.u32 %rd226, %r424; 2026-02-21T08:13:31.3851266Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:13:31.3851504Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:13:31.3851916Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3852370Z mov.b64 {%r799, %r800}, %rd228; 2026-02-21T08:13:31.3852618Z cvt.rn.f16x2.f32 %r801, %r800, %r799; 2026-02-21T08:13:31.3853058Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3853519Z cvt.u64.u32 %rd229, %r426; 2026-02-21T08:13:31.3853749Z cvt.u64.u32 %rd230, %r427; 2026-02-21T08:13:31.3853986Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:13:31.3854218Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:13:31.3854641Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3855125Z mov.b64 {%r802, %r803}, %rd232; 2026-02-21T08:13:31.3855380Z 
cvt.rn.f16x2.f32 %r804, %r803, %r802; 2026-02-21T08:13:31.3855824Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3856272Z cvt.u64.u32 %rd233, %r428; 2026-02-21T08:13:31.3856509Z cvt.u64.u32 %rd234, %r429; 2026-02-21T08:13:31.3856737Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:13:31.3856976Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:13:31.3857390Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3857852Z mov.b64 {%r805, %r806}, %rd236; 2026-02-21T08:13:31.3858108Z cvt.rn.f16x2.f32 %r807, %r806, %r805; 2026-02-21T08:13:31.3858552Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3859012Z cvt.u64.u32 %rd237, %r430; 2026-02-21T08:13:31.3859243Z cvt.u64.u32 %rd238, %r431; 2026-02-21T08:13:31.3859480Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:13:31.3859710Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:13:31.3860132Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3860578Z mov.b64 {%r808, %r809}, %rd240; 2026-02-21T08:13:31.3860834Z cvt.rn.f16x2.f32 %r810, %r809, %r808; 2026-02-21T08:13:31.3861278Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3861726Z cvt.u64.u32 %rd241, %r432; 2026-02-21T08:13:31.3861967Z cvt.u64.u32 %rd242, %r433; 2026-02-21T08:13:31.3862195Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:13:31.3862440Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:13:31.3862981Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3863442Z mov.b64 {%r811, %r812}, %rd244; 2026-02-21T08:13:31.3863692Z cvt.rn.f16x2.f32 %r813, %r812, %r811; 2026-02-21T08:13:31.3864135Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3864588Z cvt.u64.u32 %rd245, %r434; 2026-02-21T08:13:31.3864858Z cvt.u64.u32 %rd246, %r435; 2026-02-21T08:13:31.3865098Z 
shl.b64 %rd247, %rd246, 32; 2026-02-21T08:13:31.3865335Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:13:31.3865763Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3866223Z mov.b64 {%r814, %r815}, %rd248; 2026-02-21T08:13:31.3866479Z cvt.rn.f16x2.f32 %r816, %r815, %r814; 2026-02-21T08:13:31.3866995Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3867448Z cvt.u64.u32 %rd249, %r436; 2026-02-21T08:13:31.3867688Z cvt.u64.u32 %rd250, %r437; 2026-02-21T08:13:31.3867917Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:13:31.3868155Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:13:31.3868573Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3869030Z mov.b64 {%r817, %r818}, %rd252; 2026-02-21T08:13:31.3869279Z cvt.rn.f16x2.f32 %r819, %r818, %r817; 2026-02-21T08:13:31.3869719Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3870173Z cvt.u64.u32 %rd253, %r438; 2026-02-21T08:13:31.3870403Z cvt.u64.u32 %rd254, %r439; 2026-02-21T08:13:31.3870640Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:13:31.3870872Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:13:31.3871301Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3871756Z mov.b64 {%r820, %r821}, %rd256; 2026-02-21T08:13:31.3872011Z cvt.rn.f16x2.f32 %r822, %r821, %r820; 2026-02-21T08:13:31.3872452Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3872898Z cvt.u64.u32 %rd257, %r440; 2026-02-21T08:13:31.3873138Z cvt.u64.u32 %rd258, %r441; 2026-02-21T08:13:31.3873367Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:13:31.3873608Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:13:31.3874027Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3874484Z mov.b64 {%r823, %r824}, %rd260; 2026-02-21T08:13:31.3874774Z 
cvt.rn.f16x2.f32 %r825, %r824, %r823; 2026-02-21T08:13:31.3875213Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3875674Z cvt.u64.u32 %rd261, %r443; 2026-02-21T08:13:31.3875905Z cvt.u64.u32 %rd262, %r444; 2026-02-21T08:13:31.3876156Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:13:31.3876394Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:13:31.3876819Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3877265Z mov.b64 {%r826, %r827}, %rd264; 2026-02-21T08:13:31.3877517Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T08:13:31.3877960Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3878410Z cvt.u64.u32 %rd265, %r445; 2026-02-21T08:13:31.3878646Z cvt.u64.u32 %rd266, %r446; 2026-02-21T08:13:31.3878874Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:13:31.3879115Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:13:31.3879526Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3879979Z mov.b64 {%r829, %r830}, %rd268; 2026-02-21T08:13:31.3880230Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T08:13:31.3880763Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3881239Z cvt.u64.u32 %rd269, %r447; 2026-02-21T08:13:31.3881480Z cvt.u64.u32 %rd270, %r448; 2026-02-21T08:13:31.3881729Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:13:31.3881973Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:13:31.3882418Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3882897Z mov.b64 {%r832, %r833}, %rd272; 2026-02-21T08:13:31.3883166Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T08:13:31.3883631Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3884105Z cvt.u64.u32 %rd273, %r449; 2026-02-21T08:13:31.3884354Z cvt.u64.u32 %rd274, %r450; 2026-02-21T08:13:31.3884594Z 
shl.b64 %rd275, %rd274, 32; 2026-02-21T08:13:31.3884970Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:13:31.3885412Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3885895Z mov.b64 {%r835, %r836}, %rd276; 2026-02-21T08:13:31.3886158Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T08:13:31.3886624Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3887103Z cvt.u64.u32 %rd277, %r451; 2026-02-21T08:13:31.3887343Z cvt.u64.u32 %rd278, %r452; 2026-02-21T08:13:31.3887592Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:13:31.3887836Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:13:31.3888286Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3888757Z mov.b64 {%r838, %r839}, %rd280; 2026-02-21T08:13:31.3889026Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T08:13:31.3889492Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3889950Z cvt.u64.u32 %rd281, %r453; 2026-02-21T08:13:31.3890195Z cvt.u64.u32 %rd282, %r454; 2026-02-21T08:13:31.3890424Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:13:31.3890665Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:13:31.3891081Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3891538Z mov.b64 {%r841, %r842}, %rd284; 2026-02-21T08:13:31.3891784Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T08:13:31.3892225Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3892681Z cvt.u64.u32 %rd285, %r455; 2026-02-21T08:13:31.3892910Z cvt.u64.u32 %rd286, %r456; 2026-02-21T08:13:31.3893151Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:13:31.3893382Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:13:31.3893806Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3894256Z mov.b64 {%r844, %r845}, %rd288; 2026-02-21T08:13:31.3894520Z 
cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T08:13:31.3895019Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3895464Z cvt.u64.u32 %rd289, %r457; 2026-02-21T08:13:31.3895709Z cvt.u64.u32 %rd290, %r458; 2026-02-21T08:13:31.3895941Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:13:31.3896187Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:13:31.3896600Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3897057Z mov.b64 {%r847, %r848}, %rd292; 2026-02-21T08:13:31.3897307Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T08:13:31.3897752Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3898210Z cvt.u64.u32 %rd293, %r460; 2026-02-21T08:13:31.3898440Z cvt.u64.u32 %rd294, %r461; 2026-02-21T08:13:31.3898679Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:13:31.3899007Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:13:31.3899427Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3899872Z mov.b64 {%r850, %r851}, %rd296; 2026-02-21T08:13:31.3900126Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T08:13:31.3900569Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3901012Z cvt.u64.u32 %rd297, %r462; 2026-02-21T08:13:31.3901249Z cvt.u64.u32 %rd298, %r463; 2026-02-21T08:13:31.3901475Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:13:31.3901714Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:13:31.3902125Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3902580Z mov.b64 {%r853, %r854}, %rd300; 2026-02-21T08:13:31.3902837Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T08:13:31.3903349Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3903810Z cvt.u64.u32 %rd301, %r464; 2026-02-21T08:13:31.3904040Z cvt.u64.u32 %rd302, %r465; 2026-02-21T08:13:31.3904279Z 
shl.b64 %rd303, %rd302, 32; 2026-02-21T08:13:31.3904512Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:13:31.3904965Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3905409Z mov.b64 {%r856, %r857}, %rd304; 2026-02-21T08:13:31.3905665Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T08:13:31.3906105Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3906559Z cvt.u64.u32 %rd305, %r466; 2026-02-21T08:13:31.3906795Z cvt.u64.u32 %rd306, %r467; 2026-02-21T08:13:31.3907023Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:13:31.3907263Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:13:31.3907681Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3908138Z mov.b64 {%r859, %r860}, %rd308; 2026-02-21T08:13:31.3908395Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T08:13:31.3908829Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3909283Z cvt.u64.u32 %rd309, %r468; 2026-02-21T08:13:31.3909516Z cvt.u64.u32 %rd310, %r469; 2026-02-21T08:13:31.3909752Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:13:31.3910011Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:13:31.3910523Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3910936Z mov.b64 {%r862, %r863}, %rd312; 2026-02-21T08:13:31.3911177Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T08:13:31.3911580Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3911991Z cvt.u64.u32 %rd313, %r470; 2026-02-21T08:13:31.3912222Z cvt.u64.u32 %rd314, %r471; 2026-02-21T08:13:31.3912446Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:13:31.3912682Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:13:31.3913093Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3913549Z mov.b64 {%r865, %r866}, %rd316; 2026-02-21T08:13:31.3913803Z 
cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T08:13:31.3914231Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3914667Z cvt.u64.u32 %rd317, %r472; 2026-02-21T08:13:31.3914961Z cvt.u64.u32 %rd318, %r473; 2026-02-21T08:13:31.3915199Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:13:31.3915432Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:13:31.3915858Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3916313Z mov.b64 {%r868, %r869}, %rd320; 2026-02-21T08:13:31.3916577Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T08:13:31.3917160Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3917611Z cvt.u64.u32 %rd321, %r474; 2026-02-21T08:13:31.3917850Z cvt.u64.u32 %rd322, %r475; 2026-02-21T08:13:31.3918077Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:13:31.3918318Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:13:31.3918730Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3919184Z mov.b64 {%r871, %r872}, %rd324; 2026-02-21T08:13:31.3919437Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T08:13:31.3919871Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3920322Z cvt.u64.u32 %rd325, %r477; 2026-02-21T08:13:31.3920547Z cvt.u64.u32 %rd326, %r478; 2026-02-21T08:13:31.3920781Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:13:31.3921122Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:13:31.3921553Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3921999Z mov.b64 {%r874, %r875}, %rd328; 2026-02-21T08:13:31.3922255Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T08:13:31.3922694Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3923140Z cvt.u64.u32 %rd329, %r479; 2026-02-21T08:13:31.3923374Z cvt.u64.u32 %rd330, %r480; 2026-02-21T08:13:31.3923602Z 
shl.b64 %rd331, %rd330, 32; 2026-02-21T08:13:31.3923841Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:13:31.3924254Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3924783Z mov.b64 {%r877, %r878}, %rd332; 2026-02-21T08:13:31.3925055Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T08:13:31.3925519Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3926006Z cvt.u64.u32 %rd333, %r481; 2026-02-21T08:13:31.3926245Z cvt.u64.u32 %rd334, %r482; 2026-02-21T08:13:31.3926495Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:13:31.3926738Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:13:31.3927187Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3927664Z mov.b64 {%r880, %r881}, %rd336; 2026-02-21T08:13:31.3927936Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T08:13:31.3928401Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3928878Z cvt.u64.u32 %rd337, %r483; 2026-02-21T08:13:31.3929128Z cvt.u64.u32 %rd338, %r484; 2026-02-21T08:13:31.3929370Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:13:31.3929619Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:13:31.3930058Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3930548Z mov.b64 {%r883, %r884}, %rd340; 2026-02-21T08:13:31.3930815Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T08:13:31.3931272Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3931752Z cvt.u64.u32 %rd341, %r485; 2026-02-21T08:13:31.3931991Z cvt.u64.u32 %rd342, %r486; 2026-02-21T08:13:31.3932246Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:13:31.3932480Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:13:31.3932907Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3933364Z mov.b64 {%r886, %r887}, %rd344; 2026-02-21T08:13:31.3933623Z 
cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T08:13:31.3934066Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3934512Z cvt.u64.u32 %rd345, %r487; 2026-02-21T08:13:31.3934787Z cvt.u64.u32 %rd346, %r488; 2026-02-21T08:13:31.3935133Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:13:31.3935377Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:13:31.3935798Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3936255Z mov.b64 {%r889, %r890}, %rd348; 2026-02-21T08:13:31.3936510Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T08:13:31.3936946Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3937401Z cvt.u64.u32 %rd349, %r489; 2026-02-21T08:13:31.3937632Z cvt.u64.u32 %rd350, %r490; 2026-02-21T08:13:31.3937866Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:13:31.3938095Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:13:31.3938513Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3938597Z mov.b64 {%r892, %r893}, %rd352; 2026-02-21T08:13:31.3938762Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T08:13:31.3939035Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3939121Z cvt.u64.u32 %rd353, %r491; 2026-02-21T08:13:31.3939205Z cvt.u64.u32 %rd354, %r492; 2026-02-21T08:13:31.3939296Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:13:31.3939382Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:13:31.3939643Z .loc 1 58 27 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:58:27 2026-02-21T08:13:31.3939734Z mov.b64 {%r895, %r896}, %rd356; 2026-02-21T08:13:31.3939825Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T08:13:31.3940086Z .loc 1 59 82 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:59:82 2026-02-21T08:13:31.3940235Z st.shared.v4.b32 [%r40], {%r708, %r720, %r732, %r744}; 2026-02-21T08:13:31.3940372Z st.shared.v4.b32 [%r41], 
{%r756, %r768, %r780, %r792}; 2026-02-21T08:13:31.3940505Z st.shared.v4.b32 [%r42], {%r804, %r816, %r828, %r840}; 2026-02-21T08:13:31.3940633Z st.shared.v4.b32 [%r43], {%r852, %r864, %r876, %r888}; 2026-02-21T08:13:31.3940725Z bar.sync 0, 128; 2026-02-21T08:13:31.3940807Z // begin inline asm 2026-02-21T08:13:31.3941046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r494, %r495, %r496, %r497}, [%r498]; 2026-02-21T08:13:31.3941135Z // end inline asm 2026-02-21T08:13:31.3941218Z // begin inline asm 2026-02-21T08:13:31.3941450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r499, %r500, %r501, %r502}, [%r503]; 2026-02-21T08:13:31.3941528Z // end inline asm 2026-02-21T08:13:31.3941617Z // begin inline asm 2026-02-21T08:13:31.3941846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r504, %r505, %r506, %r507}, [%r508]; 2026-02-21T08:13:31.3941925Z // end inline asm 2026-02-21T08:13:31.3942013Z // begin inline asm 2026-02-21T08:13:31.3942235Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r509, %r510, %r511, %r512}, [%r513]; 2026-02-21T08:13:31.3942313Z // end inline asm 2026-02-21T08:13:31.3942401Z bar.sync 0, 128; 2026-02-21T08:13:31.3942537Z st.shared.v4.b32 [%r40], {%r711, %r723, %r735, %r747}; 2026-02-21T08:13:31.3942670Z st.shared.v4.b32 [%r41], {%r759, %r771, %r783, %r795}; 2026-02-21T08:13:31.3942800Z st.shared.v4.b32 [%r42], {%r807, %r819, %r831, %r843}; 2026-02-21T08:13:31.3942936Z st.shared.v4.b32 [%r43], {%r855, %r867, %r879, %r891}; 2026-02-21T08:13:31.3943014Z bar.sync 0, 128; 2026-02-21T08:13:31.3943094Z // begin inline asm 2026-02-21T08:13:31.3943325Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r514, %r515, %r516, %r517}, [%r498]; 2026-02-21T08:13:31.3943403Z // end inline asm 2026-02-21T08:13:31.3943482Z // begin inline asm 2026-02-21T08:13:31.3943708Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r519, %r520, %r521, %r522}, [%r503]; 2026-02-21T08:13:31.3943794Z // end inline asm 2026-02-21T08:13:31.3943878Z // begin inline asm 2026-02-21T08:13:31.3944105Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r524, %r525, %r526, %r527}, [%r508]; 2026-02-21T08:13:31.3944189Z // end inline asm 2026-02-21T08:13:31.3944273Z // begin inline asm 2026-02-21T08:13:31.3944554Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r529, %r530, %r531, %r532}, [%r513]; 2026-02-21T08:13:31.3944639Z // end inline asm 2026-02-21T08:13:31.3944763Z bar.sync 0, 128; 2026-02-21T08:13:31.3944894Z st.shared.v4.b32 [%r40], {%r714, %r726, %r738, %r750}; 2026-02-21T08:13:31.3945023Z st.shared.v4.b32 [%r41], {%r762, %r774, %r786, %r798}; 2026-02-21T08:13:31.3945161Z st.shared.v4.b32 [%r42], {%r810, %r822, %r834, %r846}; 2026-02-21T08:13:31.3945289Z st.shared.v4.b32 [%r43], {%r858, %r870, %r882, %r894}; 2026-02-21T08:13:31.3945369Z bar.sync 0, 128; 2026-02-21T08:13:31.3945462Z // begin inline asm 2026-02-21T08:13:31.3945688Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r534, %r535, %r536, %r537}, [%r498]; 2026-02-21T08:13:31.3945768Z // end inline asm 2026-02-21T08:13:31.3945858Z // begin inline asm 2026-02-21T08:13:31.3946085Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r539, %r540, %r541, %r542}, [%r503]; 2026-02-21T08:13:31.3946231Z // end inline asm 2026-02-21T08:13:31.3946313Z // begin inline asm 2026-02-21T08:13:31.3946548Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r544, %r545, %r546, %r547}, [%r508]; 2026-02-21T08:13:31.3946626Z // end inline asm 2026-02-21T08:13:31.3946706Z // begin inline asm 2026-02-21T08:13:31.3946939Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r549, %r550, %r551, %r552}, [%r513]; 2026-02-21T08:13:31.3947016Z // end inline asm 2026-02-21T08:13:31.3947091Z bar.sync 0, 128; 2026-02-21T08:13:31.3947220Z st.shared.v4.b32 [%r40], {%r717, %r729, %r741, %r753}; 2026-02-21T08:13:31.3947355Z st.shared.v4.b32 [%r41], {%r765, %r777, %r789, %r801}; 2026-02-21T08:13:31.3947479Z st.shared.v4.b32 [%r42], {%r813, %r825, %r837, %r849}; 2026-02-21T08:13:31.3947602Z st.shared.v4.b32 [%r43], {%r861, %r873, %r885, %r897}; 2026-02-21T08:13:31.3947687Z bar.sync 
0, 128; 2026-02-21T08:13:31.3947766Z // begin inline asm 2026-02-21T08:13:31.3947994Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r554, %r555, %r556, %r557}, [%r498]; 2026-02-21T08:13:31.3948081Z // end inline asm 2026-02-21T08:13:31.3948159Z // begin inline asm 2026-02-21T08:13:31.3948384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r559, %r560, %r561, %r562}, [%r503]; 2026-02-21T08:13:31.3948459Z // end inline asm 2026-02-21T08:13:31.3948548Z // begin inline asm 2026-02-21T08:13:31.3948771Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r564, %r565, %r566, %r567}, [%r508]; 2026-02-21T08:13:31.3948846Z // end inline asm 2026-02-21T08:13:31.3948933Z // begin inline asm 2026-02-21T08:13:31.3949157Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r569, %r570, %r571, %r572}, [%r513]; 2026-02-21T08:13:31.3949234Z // end inline asm 2026-02-21T08:13:31.3949312Z // begin inline asm 2026-02-21T08:13:31.3949470Z st.global.v4.b32 [ %rd85 + 0 ], { %r494, %r514, %r534, %r554 }; 2026-02-21T08:13:31.3949546Z // end inline asm 2026-02-21T08:13:31.3949623Z // begin inline asm 2026-02-21T08:13:31.3949783Z st.global.v4.b32 [ %rd86 + 0 ], { %r495, %r515, %r535, %r555 }; 2026-02-21T08:13:31.3949859Z // end inline asm 2026-02-21T08:13:31.3949939Z // begin inline asm 2026-02-21T08:13:31.3950092Z st.global.v4.b32 [ %rd87 + 0 ], { %r496, %r516, %r536, %r556 }; 2026-02-21T08:13:31.3950168Z // end inline asm 2026-02-21T08:13:31.3950247Z // begin inline asm 2026-02-21T08:13:31.3950389Z st.global.v4.b32 [ %rd88 + 0 ], { %r497, %r517, %r537, %r557 }; 2026-02-21T08:13:31.3950473Z // end inline asm 2026-02-21T08:13:31.3950554Z // begin inline asm 2026-02-21T08:13:31.3950697Z st.global.v4.b32 [ %rd89 + 0 ], { %r499, %r519, %r539, %r559 }; 2026-02-21T08:13:31.3950782Z // end inline asm 2026-02-21T08:13:31.3950860Z // begin inline asm 2026-02-21T08:13:31.3951003Z st.global.v4.b32 [ %rd90 + 0 ], { %r500, %r520, %r540, %r560 }; 2026-02-21T08:13:31.3951080Z // end inline asm 2026-02-21T08:13:31.3951166Z // begin 
inline asm 2026-02-21T08:13:31.3951312Z st.global.v4.b32 [ %rd91 + 0 ], { %r501, %r521, %r541, %r561 }; 2026-02-21T08:13:31.3951387Z // end inline asm 2026-02-21T08:13:31.3951477Z // begin inline asm 2026-02-21T08:13:31.3951691Z st.global.v4.b32 [ %rd92 + 0 ], { %r502, %r522, %r542, %r562 }; 2026-02-21T08:13:31.3951770Z // end inline asm 2026-02-21T08:13:31.3951850Z // begin inline asm 2026-02-21T08:13:31.3952002Z st.global.v4.b32 [ %rd93 + 0 ], { %r504, %r524, %r544, %r564 }; 2026-02-21T08:13:31.3952081Z // end inline asm 2026-02-21T08:13:31.3952161Z // begin inline asm 2026-02-21T08:13:31.3952314Z st.global.v4.b32 [ %rd94 + 0 ], { %r505, %r525, %r545, %r565 }; 2026-02-21T08:13:31.3952392Z // end inline asm 2026-02-21T08:13:31.3952472Z // begin inline asm 2026-02-21T08:13:31.3952626Z st.global.v4.b32 [ %rd95 + 0 ], { %r506, %r526, %r546, %r566 }; 2026-02-21T08:13:31.3952703Z // end inline asm 2026-02-21T08:13:31.3952782Z // begin inline asm 2026-02-21T08:13:31.3952928Z st.global.v4.b32 [ %rd96 + 0 ], { %r507, %r527, %r547, %r567 }; 2026-02-21T08:13:31.3953014Z // end inline asm 2026-02-21T08:13:31.3953094Z // begin inline asm 2026-02-21T08:13:31.3953290Z st.global.v4.b32 [ %rd97 + 0 ], { %r509, %r529, %r549, %r569 }; 2026-02-21T08:13:31.3953381Z // end inline asm 2026-02-21T08:13:31.3953460Z // begin inline asm 2026-02-21T08:13:31.3953604Z st.global.v4.b32 [ %rd98 + 0 ], { %r510, %r530, %r550, %r570 }; 2026-02-21T08:13:31.3953680Z // end inline asm 2026-02-21T08:13:31.3953767Z // begin inline asm 2026-02-21T08:13:31.3953907Z st.global.v4.b32 [ %rd99 + 0 ], { %r511, %r531, %r551, %r571 }; 2026-02-21T08:13:31.3953982Z // end inline asm 2026-02-21T08:13:31.3954070Z // begin inline asm 2026-02-21T08:13:31.3954220Z st.global.v4.b32 [ %rd100 + 0 ], { %r512, %r532, %r552, %r572 }; 2026-02-21T08:13:31.3954297Z // end inline asm 2026-02-21T08:13:31.3954576Z .loc 1 31 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:31:52 2026-02-21T08:13:31.3954663Z add.s32 
%r905, %r905, 1; 2026-02-21T08:13:31.3954821Z setp.ne.b32 %p103, %r22, %r905; 2026-02-21T08:13:31.3954905Z @%p103 bra $L__BB0_14; 2026-02-21T08:13:31.3955035Z $L__BB0_15: // %._crit_edge 2026-02-21T08:13:31.3955304Z .loc 1 31 4 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:31:4 2026-02-21T08:13:31.3955384Z bar.sync 0, 128; 2026-02-21T08:13:31.3955478Z // begin inline asm 2026-02-21T08:13:31.3955662Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r898, 128; 2026-02-21T08:13:31.3955742Z // end inline asm 2026-02-21T08:13:31.3955873Z st.shared.b32 [global_smem+131272], 50529027; 2026-02-21T08:13:31.3955954Z barrier.sync 1; 2026-02-21T08:13:31.3956070Z $L__BB0_16: // %common.ret 2026-02-21T08:13:31.3956321Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3956403Z ret; 2026-02-21T08:13:31.3956538Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:13:31.3956626Z mov.b32 %r52, global_smem; 2026-02-21T08:13:31.3956717Z add.s32 %r53, %r52, %r3; 2026-02-21T08:13:31.3956802Z add.s32 %r83, %r52, 131232; 2026-02-21T08:13:31.3956889Z bfe.u32 %r97, %r52, 4, 14; 2026-02-21T08:13:31.3956973Z cvt.u64.u32 %rd22, %r97; 2026-02-21T08:13:31.3957079Z or.b64 %rd12, %rd22, 4611686293372403712; 2026-02-21T08:13:31.3957161Z add.s32 %r98, %r52, 65536; 2026-02-21T08:13:31.3957241Z bfe.u32 %r99, %r98, 4, 14; 2026-02-21T08:13:31.3957332Z cvt.u64.u32 %rd23, %r99; 2026-02-21T08:13:31.3957428Z or.b64 %rd13, %rd23, 4611686293372403712; 2026-02-21T08:13:31.3957509Z add.s32 %r100, %r52, 32; 2026-02-21T08:13:31.3957594Z bfe.u32 %r101, %r100, 4, 14; 2026-02-21T08:13:31.3957680Z bra.uni $L__BB0_2; 2026-02-21T08:13:31.3957830Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3958106Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3958229Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3958312Z barrier.sync 1; 2026-02-21T08:13:31.3958394Z barrier.sync 1; 
2026-02-21T08:13:31.3958587Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3958702Z $L__BB0_2: // %.preheader 2026-02-21T08:13:31.3958833Z // =>This Loop Header: Depth=1 2026-02-21T08:13:31.3958961Z // Child Loop BB0_9 Depth 2 2026-02-21T08:13:31.3959094Z // Child Loop BB0_6 Depth 2 2026-02-21T08:13:31.3959348Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19 2026-02-21T08:13:31.3959460Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:13:31.3959548Z barrier.sync 1; 2026-02-21T08:13:31.3959641Z ld.shared.b8 %r51, [%r53+131268]; 2026-02-21T08:13:31.3959731Z setp.gt.u32 %p2, %r51, 3; 2026-02-21T08:13:31.3959819Z @%p2 bra $L__BB0_4; 2026-02-21T08:13:31.3959931Z // %bb.3: // %.preheader 2026-02-21T08:13:31.3960145Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3960238Z $L_brx_0: .branchtargets 2026-02-21T08:13:31.3960325Z $L__BB0_5, 2026-02-21T08:13:31.3960402Z $L__BB0_8, 2026-02-21T08:13:31.3960474Z $L__BB0_11, 2026-02-21T08:13:31.3960554Z $L__BB0_16; 2026-02-21T08:13:31.3960641Z brx.idx %r51, $L_brx_0; 2026-02-21T08:13:31.3960754Z $L__BB0_5: // %.peel.next 2026-02-21T08:13:31.3960896Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3961170Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3961282Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3961391Z ld.shared.b32 %r85, [global_smem+131072]; 2026-02-21T08:13:31.3961478Z barrier.sync 1; 2026-02-21T08:13:31.3961731Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3961820Z bar.warp.sync -1; 2026-02-21T08:13:31.3961907Z mov.b32 %r899, 0; 2026-02-21T08:13:31.3961989Z // begin inline asm 2026-02-21T08:13:31.3962067Z 2026-02-21T08:13:31.3962137Z { 2026-02-21T08:13:31.3962230Z .reg .pred complete; 2026-02-21T08:13:31.3962306Z waitLoop: 2026-02-21T08:13:31.3962486Z mbarrier.try_wait.parity.shared.b64 complete, [%r83], %r899; 2026-02-21T08:13:31.3962588Z @!complete bra.uni 
waitLoop; 2026-02-21T08:13:31.3962658Z } 2026-02-21T08:13:31.3962666Z 2026-02-21T08:13:31.3962744Z // end inline asm 2026-02-21T08:13:31.3963016Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3963109Z elect.sync %r96|%p12, -1; 2026-02-21T08:13:31.3963192Z mov.b32 %r86, 136314896; 2026-02-21T08:13:31.3963274Z mov.pred %p11, 0; 2026-02-21T08:13:31.3963362Z // begin inline asm 2026-02-21T08:13:31.3963581Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd12, %rd13, %r86, %p11; 2026-02-21T08:13:31.3963658Z // end inline asm 2026-02-21T08:13:31.3963753Z cvt.u64.u32 %rd24, %r101; 2026-02-21T08:13:31.3963852Z or.b64 %rd14, %rd24, 4611686293372403712; 2026-02-21T08:13:31.3963935Z add.s32 %r102, %r52, 65568; 2026-02-21T08:13:31.3964021Z bfe.u32 %r103, %r102, 4, 14; 2026-02-21T08:13:31.3964112Z cvt.u64.u32 %rd25, %r103; 2026-02-21T08:13:31.3964207Z or.b64 %rd15, %rd25, 4611686293372403712; 2026-02-21T08:13:31.3964294Z mov.pred %p13, -1; 2026-02-21T08:13:31.3964381Z // begin inline asm 2026-02-21T08:13:31.3964590Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd14, %rd15, %r86, %p13; 2026-02-21T08:13:31.3964708Z // end inline asm 2026-02-21T08:13:31.3964800Z add.s32 %r104, %r52, 64; 2026-02-21T08:13:31.3964885Z bfe.u32 %r105, %r104, 4, 14; 2026-02-21T08:13:31.3964969Z cvt.u64.u32 %rd26, %r105; 2026-02-21T08:13:31.3965061Z or.b64 %rd16, %rd26, 4611686293372403712; 2026-02-21T08:13:31.3965153Z add.s32 %r106, %r52, 65600; 2026-02-21T08:13:31.3965239Z bfe.u32 %r107, %r106, 4, 14; 2026-02-21T08:13:31.3965326Z cvt.u64.u32 %rd27, %r107; 2026-02-21T08:13:31.3965499Z or.b64 %rd17, %rd27, 4611686293372403712; 2026-02-21T08:13:31.3965583Z // begin inline asm 2026-02-21T08:13:31.3965786Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd16, %rd17, %r86, %p13; 2026-02-21T08:13:31.3965873Z // end inline asm 2026-02-21T08:13:31.3965956Z add.s32 %r108, %r52, 96; 2026-02-21T08:13:31.3966039Z bfe.u32 %r109, %r108, 4, 
14; 2026-02-21T08:13:31.3966123Z cvt.u64.u32 %rd28, %r109; 2026-02-21T08:13:31.3966227Z or.b64 %rd18, %rd28, 4611686293372403712; 2026-02-21T08:13:31.3966310Z add.s32 %r110, %r52, 65632; 2026-02-21T08:13:31.3966396Z bfe.u32 %r111, %r110, 4, 14; 2026-02-21T08:13:31.3966486Z cvt.u64.u32 %rd29, %r111; 2026-02-21T08:13:31.3966577Z or.b64 %rd19, %rd29, 4611686293372403712; 2026-02-21T08:13:31.3966670Z // begin inline asm 2026-02-21T08:13:31.3966869Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd18, %rd19, %r86, %p13; 2026-02-21T08:13:31.3966955Z // end inline asm 2026-02-21T08:13:31.3967103Z add.s32 %r112, %r52, 131200; 2026-02-21T08:13:31.3967192Z cvt.u64.u32 %rd20, %r112; 2026-02-21T08:13:31.3967281Z // begin inline asm 2026-02-21T08:13:31.3967473Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T08:13:31.3967552Z // end inline asm 2026-02-21T08:13:31.3967642Z add.s32 %r113, %r52, 131264; 2026-02-21T08:13:31.3967735Z cvt.u64.u32 %rd21, %r113; 2026-02-21T08:13:31.3967819Z // begin inline asm 2026-02-21T08:13:31.3968013Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T08:13:31.3968106Z // end inline asm 2026-02-21T08:13:31.3968187Z mov.b32 %r901, 1; 2026-02-21T08:13:31.3968274Z mov.b32 %r900, %r899; 2026-02-21T08:13:31.3968436Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:13:31.3968582Z // => This Inner Loop Header: Depth=2 2026-02-21T08:13:31.3968876Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3968975Z shl.b32 %r124, %r901, 3; 2026-02-21T08:13:31.3969061Z add.s32 %r126, %r52, %r124; 2026-02-21T08:13:31.3969151Z add.s32 %r127, %r126, 131200; 2026-02-21T08:13:31.3969239Z add.s32 %r114, %r126, 131232; 2026-02-21T08:13:31.3969533Z .loc 1 54 31 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:54:31 2026-02-21T08:13:31.3969617Z shl.b32 %r128, %r901, 14; 2026-02-21T08:13:31.3969703Z add.s32 %r129, %r52, %r128; 
2026-02-21T08:13:31.3969992Z .loc 1 55 44 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:55:44 2026-02-21T08:13:31.3970079Z add.s32 %r130, %r129, 65536; 2026-02-21T08:13:31.3970349Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3970447Z bar.warp.sync -1; 2026-02-21T08:13:31.3970531Z // begin inline asm 2026-02-21T08:13:31.3970606Z 2026-02-21T08:13:31.3970677Z { 2026-02-21T08:13:31.3970779Z .reg .pred complete; 2026-02-21T08:13:31.3970861Z waitLoop: 2026-02-21T08:13:31.3971058Z mbarrier.try_wait.parity.shared.b64 complete, [%r114], %r900; 2026-02-21T08:13:31.3971163Z @!complete bra.uni waitLoop; 2026-02-21T08:13:31.3971235Z } 2026-02-21T08:13:31.3971242Z 2026-02-21T08:13:31.3971323Z // end inline asm 2026-02-21T08:13:31.3971602Z .loc 1 56 52 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:56:52 2026-02-21T08:13:31.3971707Z setp.eq.b32 %p31, %r899, 896; 2026-02-21T08:13:31.3971803Z elect.sync %r131|%p22, -1; 2026-02-21T08:13:31.3971891Z bfe.u32 %r132, %r129, 4, 14; 2026-02-21T08:13:31.3971986Z cvt.u64.u32 %rd40, %r132; 2026-02-21T08:13:31.3972085Z or.b64 %rd30, %rd40, 4611686293372403712; 2026-02-21T08:13:31.3972170Z bfe.u32 %r133, %r130, 4, 14; 2026-02-21T08:13:31.3972258Z cvt.u64.u32 %rd41, %r133; 2026-02-21T08:13:31.3972364Z or.b64 %rd31, %rd41, 4611686293372403712; 2026-02-21T08:13:31.3972449Z // begin inline asm 2026-02-21T08:13:31.3972669Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd30, %rd31, %r86, %p13; 2026-02-21T08:13:31.3972830Z // end inline asm 2026-02-21T08:13:31.3972915Z add.s32 %r134, %r129, 32; 2026-02-21T08:13:31.3973000Z bfe.u32 %r135, %r134, 4, 14; 2026-02-21T08:13:31.3973095Z cvt.u64.u32 %rd42, %r135; 2026-02-21T08:13:31.3973195Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T08:13:31.3973283Z add.s32 %r136, %r129, 65568; 2026-02-21T08:13:31.3973370Z bfe.u32 %r137, %r136, 4, 14; 2026-02-21T08:13:31.3973468Z cvt.u64.u32 %rd43, %r137; 
2026-02-21T08:13:31.3973566Z or.b64 %rd33, %rd43, 4611686293372403712; 2026-02-21T08:13:31.3973648Z // begin inline asm 2026-02-21T08:13:31.3973867Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd32, %rd33, %r86, %p13; 2026-02-21T08:13:31.3973948Z // end inline asm 2026-02-21T08:13:31.3974035Z add.s32 %r138, %r129, 64; 2026-02-21T08:13:31.3974127Z bfe.u32 %r139, %r138, 4, 14; 2026-02-21T08:13:31.3974214Z cvt.u64.u32 %rd44, %r139; 2026-02-21T08:13:31.3974369Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T08:13:31.3974458Z add.s32 %r140, %r129, 65600; 2026-02-21T08:13:31.3974551Z bfe.u32 %r141, %r140, 4, 14; 2026-02-21T08:13:31.3974638Z cvt.u64.u32 %rd45, %r141; 2026-02-21T08:13:31.3974778Z or.b64 %rd35, %rd45, 4611686293372403712; 2026-02-21T08:13:31.3974871Z // begin inline asm 2026-02-21T08:13:31.3975080Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd34, %rd35, %r86, %p13; 2026-02-21T08:13:31.3975162Z // end inline asm 2026-02-21T08:13:31.3975248Z add.s32 %r142, %r129, 96; 2026-02-21T08:13:31.3975343Z bfe.u32 %r143, %r142, 4, 14; 2026-02-21T08:13:31.3975430Z cvt.u64.u32 %rd46, %r143; 2026-02-21T08:13:31.3975528Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T08:13:31.3975626Z add.s32 %r144, %r129, 65632; 2026-02-21T08:13:31.3975722Z bfe.u32 %r145, %r144, 4, 14; 2026-02-21T08:13:31.3975805Z cvt.u64.u32 %rd47, %r145; 2026-02-21T08:13:31.3975905Z or.b64 %rd37, %rd47, 4611686293372403712; 2026-02-21T08:13:31.3975989Z // begin inline asm 2026-02-21T08:13:31.3976191Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd36, %rd37, %r86, %p13; 2026-02-21T08:13:31.3976268Z // end inline asm 2026-02-21T08:13:31.3976360Z cvt.u64.u32 %rd38, %r127; 2026-02-21T08:13:31.3976439Z // begin inline asm 2026-02-21T08:13:31.3976628Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd38]; 2026-02-21T08:13:31.3976713Z // end inline asm 2026-02-21T08:13:31.3976807Z and.pred %p30, %p31, %p22; 2026-02-21T08:13:31.3976884Z // begin 
inline asm 2026-02-21T08:13:31.3977069Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T08:13:31.3977155Z // end inline asm 2026-02-21T08:13:31.3977403Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3977486Z add.s32 %r147, %r901, 1; 2026-02-21T08:13:31.3977582Z setp.eq.b32 %p32, %r147, 4; 2026-02-21T08:13:31.3977675Z selp.b32 %r901, 0, %r147, %p32; 2026-02-21T08:13:31.3977762Z selp.b32 %r148, 1, 0, %p32; 2026-02-21T08:13:31.3977857Z xor.b32 %r900, %r900, %r148; 2026-02-21T08:13:31.3978133Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3978215Z add.s32 %r899, %r899, 64; 2026-02-21T08:13:31.3978305Z setp.lt.u32 %p33, %r899, 960; 2026-02-21T08:13:31.3978395Z @%p33 bra $L__BB0_6; 2026-02-21T08:13:31.3978510Z // %bb.7: // %.loopexit 2026-02-21T08:13:31.3978644Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3978733Z barrier.sync 1; 2026-02-21T08:13:31.3978845Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3978925Z bra.uni $L__BB0_2; 2026-02-21T08:13:31.3979076Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3979347Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3979462Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3979679Z ld.shared.v2.b32 {%r69, %r73}, [global_smem+131080]; 2026-02-21T08:13:31.3979769Z barrier.sync 1; 2026-02-21T08:13:31.3980032Z .loc 1 21 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:21:67 2026-02-21T08:13:31.3980119Z mov.u32 %r56, %ctaid.x; 2026-02-21T08:13:31.3980213Z mov.u32 %r57, %ctaid.y; 2026-02-21T08:13:31.3980295Z mov.u32 %r58, %ctaid.z; 2026-02-21T08:13:31.3980379Z mov.u32 %r59, %nctaid.x; 2026-02-21T08:13:31.3980471Z mov.u32 %r60, %nctaid.y; 2026-02-21T08:13:31.3980562Z mad.lo.s32 %r61, %r58, %r60, %r57; 2026-02-21T08:13:31.3980651Z mad.lo.s32 %r62, %r61, %r59, %r56; 2026-02-21T08:13:31.3980733Z 
shl.b32 %r63, %r62, 8; 2026-02-21T08:13:31.3981004Z .loc 1 22 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:22:67 2026-02-21T08:13:31.3981088Z cvt.s64.s32 %rd7, %r63; 2026-02-21T08:13:31.3981173Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T08:13:31.3981329Z add.s64 %rd9, %rd8, 128; 2026-02-21T08:13:31.3981425Z cvta.global.u64 %rd11, %rd9; 2026-02-21T08:13:31.3981683Z .loc 1 21 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:21:67 2026-02-21T08:13:31.3981782Z cvta.global.u64 %rd10, %rd8; 2026-02-21T08:13:31.3981865Z add.s32 %r13, %r1, -128; 2026-02-21T08:13:31.3981943Z mov.b32 %r903, 0; 2026-02-21T08:13:31.3982025Z mov.b32 %r902, -64; 2026-02-21T08:13:31.3982115Z mov.b32 %r904, %r903; 2026-02-21T08:13:31.3982260Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:13:31.3982401Z // => This Inner Loop Header: Depth=2 2026-02-21T08:13:31.3982670Z .loc 1 0 67 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0:67 2026-02-21T08:13:31.3982759Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T08:13:31.3982844Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T08:13:31.3983117Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3983211Z add.s32 %r902, %r902, 64; 2026-02-21T08:13:31.3983292Z shl.b32 %r75, %r904, 3; 2026-02-21T08:13:31.3983373Z add.s32 %r77, %r52, %r75; 2026-02-21T08:13:31.3983466Z add.s32 %r64, %r77, 131200; 2026-02-21T08:13:31.3983712Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3983793Z // begin inline asm 2026-02-21T08:13:31.3983872Z 2026-02-21T08:13:31.3983939Z { 2026-02-21T08:13:31.3984023Z .reg .pred complete; 2026-02-21T08:13:31.3984099Z waitLoop: 2026-02-21T08:13:31.3984283Z mbarrier.try_wait.parity.shared.b64 complete, [%r64], %r903; 2026-02-21T08:13:31.3984376Z @!complete bra.uni waitLoop; 2026-02-21T08:13:31.3984444Z } 2026-02-21T08:13:31.3984450Z 2026-02-21T08:13:31.3984535Z // end inline asm 2026-02-21T08:13:31.3984838Z .loc 1 50 112 // 
c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3984925Z add.s32 %r70, %r77, 131232; 2026-02-21T08:13:31.3985181Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3985260Z bar.sync 3, 64; 2026-02-21T08:13:31.3985342Z // begin inline asm 2026-02-21T08:13:31.3985502Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r70], 32768; 2026-02-21T08:13:31.3985588Z // end inline asm 2026-02-21T08:13:31.3985851Z .loc 1 54 31 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:54:31 2026-02-21T08:13:31.3985936Z shl.b32 %r78, %r904, 14; 2026-02-21T08:13:31.3986028Z add.s32 %r67, %r52, %r78; 2026-02-21T08:13:31.3986273Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3986351Z bar.sync 3, 64; 2026-02-21T08:13:31.3986442Z elect.sync %r79|%p7, -1; 2026-02-21T08:13:31.3986541Z and.pred %p4, %p6, %p7; 2026-02-21T08:13:31.3986621Z // begin inline asm 2026-02-21T08:13:31.3987011Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r67], [%rd10, {%r902, %r69}], [%r70]; 2026-02-21T08:13:31.3987198Z // end inline asm 2026-02-21T08:13:31.3987463Z .loc 1 55 44 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:55:44 2026-02-21T08:13:31.3987549Z add.s32 %r71, %r67, 65536; 2026-02-21T08:13:31.3987802Z .loc 1 0 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:0 2026-02-21T08:13:31.3987879Z bar.sync 3, 64; 2026-02-21T08:13:31.3987970Z elect.sync %r80|%p8, -1; 2026-02-21T08:13:31.3988068Z and.pred %p5, %p6, %p8; 2026-02-21T08:13:31.3988150Z // begin inline asm 2026-02-21T08:13:31.3988533Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r71], [%rd11, {%r902, %r73}], [%r70]; 2026-02-21T08:13:31.3988613Z // end inline asm 2026-02-21T08:13:31.3988704Z add.s32 %r81, %r904, 1; 2026-02-21T08:13:31.3988792Z setp.eq.b32 %p9, %r81, 4; 2026-02-21T08:13:31.3988963Z selp.b32 %r904, 0, %r81, %p9; 
2026-02-21T08:13:31.3989061Z selp.b32 %r82, 1, 0, %p9; 2026-02-21T08:13:31.3989146Z xor.b32 %r903, %r903, %r82; 2026-02-21T08:13:31.3989423Z .loc 1 50 112 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:50:112 2026-02-21T08:13:31.3989520Z setp.lt.u32 %p10, %r902, 960; 2026-02-21T08:13:31.3989602Z @%p10 bra $L__BB0_9; 2026-02-21T08:13:31.3989746Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3989828Z barrier.sync 1; 2026-02-21T08:13:31.3989951Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:13:31.3990032Z bra.uni $L__BB0_2; 2026-02-21T08:13:31.3990173Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:31.3990430Z .loc 1 19 0 // c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py:19 2026-02-21T08:13:31.3990510Z barrier.sync 1; 2026-02-21T08:13:31.3990589Z barrier.sync 1; 2026-02-21T08:13:31.3990671Z bra.uni $L__BB0_2; 2026-02-21T08:13:31.3990757Z $L__tmp1: 2026-02-21T08:13:31.3990835Z $L__func_end0: 2026-02-21T08:13:31.3990952Z // -- End function 2026-02-21T08:13:31.3991031Z } 2026-02-21T08:13:31.3991352Z .file 1 "/tmp/torchinductor_root/3o/c3osva7w7zfae4mqv4pyx62tigqerbf5syxycnwdc3eg74huaars.py" 2026-02-21T08:13:31.3991440Z .section .debug_abbrev 2026-02-21T08:13:31.3991520Z { 2026-02-21T08:13:31.3991643Z .b8 1 // Abbreviation Code 2026-02-21T08:13:31.3991766Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:13:31.3991881Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:13:31.3992003Z .b8 37 // DW_AT_producer 2026-02-21T08:13:31.3992112Z .b8 8 // DW_FORM_string 2026-02-21T08:13:31.3992219Z .b8 19 // DW_AT_language 2026-02-21T08:13:31.3992339Z .b8 5 // DW_FORM_data2 2026-02-21T08:13:31.3992449Z .b8 3 // DW_AT_name 2026-02-21T08:13:31.3992558Z .b8 8 // DW_FORM_string 2026-02-21T08:13:31.3992679Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:13:31.3992788Z .b8 6 // DW_FORM_data4 2026-02-21T08:13:31.3992897Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:13:31.3993004Z .b8 8 // DW_FORM_string 2026-02-21T08:13:31.3993115Z .b8 0 // EOM(1) 
2026-02-21T08:13:31.3993212Z .b8 0 // EOM(2) 2026-02-21T08:13:31.3993308Z .b8 0 // EOM(3) 2026-02-21T08:13:31.3993385Z } 2026-02-21T08:13:31.3993469Z .section .debug_info 2026-02-21T08:13:31.3993538Z { 2026-02-21T08:13:31.3993663Z .b32 104 // Length of Unit 2026-02-21T08:13:31.3993791Z .b8 2 // DWARF version number 2026-02-21T08:13:31.3993921Z .b8 0 2026-02-21T08:13:31.3994092Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:13:31.3994232Z .b8 8 // Address Size (in bytes) 2026-02-21T08:13:31.3994377Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:13:31.3994493Z .b8 116 // DW_AT_producer 2026-02-21T08:13:31.3994574Z .b8 114 2026-02-21T08:13:31.3994646Z .b8 105 2026-02-21T08:13:31.3994750Z .b8 116 2026-02-21T08:13:31.3994820Z .b8 111 2026-02-21T08:13:31.3994901Z .b8 110 2026-02-21T08:13:31.3994972Z .b8 0 2026-02-21T08:13:31.3995082Z .b8 2 // DW_AT_language 2026-02-21T08:13:31.3995159Z .b8 0 2026-02-21T08:13:31.3995267Z .b8 99 // DW_AT_name 2026-02-21T08:13:31.3995337Z .b8 51 2026-02-21T08:13:31.3995408Z .b8 111 2026-02-21T08:13:31.3995557Z .b8 115 2026-02-21T08:13:31.3995628Z .b8 118 2026-02-21T08:13:31.3995700Z .b8 97 2026-02-21T08:13:31.3995775Z .b8 55 2026-02-21T08:13:31.3995842Z .b8 119 2026-02-21T08:13:31.3995912Z .b8 55 2026-02-21T08:13:31.3995983Z .b8 122 2026-02-21T08:13:31.3996061Z .b8 102 2026-02-21T08:13:31.3996134Z .b8 97 2026-02-21T08:13:31.3996204Z .b8 101 2026-02-21T08:13:31.3996282Z .b8 52 2026-02-21T08:13:31.3996353Z .b8 109 2026-02-21T08:13:31.3996421Z .b8 113 2026-02-21T08:13:31.3996491Z .b8 118 2026-02-21T08:13:31.3996568Z .b8 52 2026-02-21T08:13:31.3996638Z .b8 112 2026-02-21T08:13:31.3996709Z .b8 121 2026-02-21T08:13:31.3996779Z .b8 120 2026-02-21T08:13:31.3996856Z .b8 54 2026-02-21T08:13:31.3996929Z .b8 50 2026-02-21T08:13:31.3996998Z .b8 116 2026-02-21T08:13:31.3997075Z .b8 105 2026-02-21T08:13:31.3997146Z .b8 103 2026-02-21T08:13:31.3997217Z .b8 113 2026-02-21T08:13:31.3997287Z .b8 101 
2026-02-21T08:13:31.3997364Z .b8 114 2026-02-21T08:13:31.3997433Z .b8 98 2026-02-21T08:13:31.3997502Z .b8 102 2026-02-21T08:13:31.3997583Z .b8 53 2026-02-21T08:13:31.3997653Z .b8 115 2026-02-21T08:13:31.3997725Z .b8 121 2026-02-21T08:13:31.3997795Z .b8 120 2026-02-21T08:13:31.3997873Z .b8 121 2026-02-21T08:13:31.3997942Z .b8 99 2026-02-21T08:13:31.3998012Z .b8 110 2026-02-21T08:13:31.3998090Z .b8 119 2026-02-21T08:13:31.3998160Z .b8 100 2026-02-21T08:13:31.3998231Z .b8 99 2026-02-21T08:13:31.3998300Z .b8 51 2026-02-21T08:13:31.3998377Z .b8 101 2026-02-21T08:13:31.3998446Z .b8 103 2026-02-21T08:13:31.3998516Z .b8 55 2026-02-21T08:13:31.3998587Z .b8 52 2026-02-21T08:13:31.3998665Z .b8 104 2026-02-21T08:13:31.3998734Z .b8 117 2026-02-21T08:13:31.3998805Z .b8 97 2026-02-21T08:13:31.3998882Z .b8 97 2026-02-21T08:13:31.3998953Z .b8 114 2026-02-21T08:13:31.3999023Z .b8 115 2026-02-21T08:13:31.3999093Z .b8 46 2026-02-21T08:13:31.3999171Z .b8 112 2026-02-21T08:13:31.3999241Z .b8 121 2026-02-21T08:13:31.3999310Z .b8 0 2026-02-21T08:13:31.3999451Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:13:31.3999563Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:13:31.3999635Z .b8 116 2026-02-21T08:13:31.3999705Z .b8 109 2026-02-21T08:13:31.3999782Z .b8 112 2026-02-21T08:13:31.3999852Z .b8 47 2026-02-21T08:13:31.3999921Z .b8 116 2026-02-21T08:13:31.3999997Z .b8 111 2026-02-21T08:13:31.4000067Z .b8 114 2026-02-21T08:13:31.4000137Z .b8 99 2026-02-21T08:13:31.4000206Z .b8 104 2026-02-21T08:13:31.4000284Z .b8 105 2026-02-21T08:13:31.4000353Z .b8 110 2026-02-21T08:13:31.4000422Z .b8 100 2026-02-21T08:13:31.4000492Z .b8 117 2026-02-21T08:13:31.4000570Z .b8 99 2026-02-21T08:13:31.4000639Z .b8 116 2026-02-21T08:13:31.4000710Z .b8 111 2026-02-21T08:13:31.4000786Z .b8 114 2026-02-21T08:13:31.4000856Z .b8 95 2026-02-21T08:13:31.4000925Z .b8 114 2026-02-21T08:13:31.4000993Z .b8 111 2026-02-21T08:13:31.4001074Z .b8 111 2026-02-21T08:13:31.4001143Z .b8 116 2026-02-21T08:13:31.4001214Z .b8 47 
2026-02-21T08:13:31.4001291Z .b8 51 2026-02-21T08:13:31.4001362Z .b8 111 2026-02-21T08:13:31.4001431Z .b8 0 2026-02-21T08:13:31.4001501Z } 2026-02-21T08:13:31.4001609Z .section .debug_macinfo { } 2026-02-21T08:13:31.4001699Z 2026-02-21T08:13:31.4001814Z ================================================================ 2026-02-21T08:13:31.4001968Z please share the reproducer above with Triton project. 2026-02-21T08:13:33.2221732Z 2026-02-21T08:13:33.2222596Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 50/50 21.7 configs/s 2026-02-21T08:13:34.6303650Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 720.1 2026-02-21T08:13:34.6304016Z configs/s 2026-02-21T08:13:34.7486766Z [93s] Generation 6 complete: 2026-02-21T08:13:34.7487111Z error=16 2026-02-21T08:13:34.7487415Z ok=38 2026-02-21T08:13:34.7487613Z min=0.0184 2026-02-21T08:13:34.7487803Z mid=0.0246 2026-02-21T08:13:34.7488000Z max=0.0676 2026-02-21T08:13:34.7488205Z best={'block_sizes': [256, 128, 64], 2026-02-21T08:13:34.7489084Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:13:34.7489540Z 'l2_groupings': [64], 2026-02-21T08:13:34.7489826Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:13:34.7490109Z 'loop_orders': [[1, 0]], 2026-02-21T08:13:34.7490364Z 'maxnreg': 256, 2026-02-21T08:13:34.7490588Z 'num_sm_multiplier': 1, 2026-02-21T08:13:34.7490838Z 'num_stages': 4, 2026-02-21T08:13:34.7491054Z 'num_warps': 1, 2026-02-21T08:13:34.7491288Z 'pid_type': 'persistent_blocked', 2026-02-21T08:13:34.7491591Z 'range_flattens': [None, False], 2026-02-21T08:13:34.7491859Z 'range_multi_buffers': [None, True], 2026-02-21T08:13:34.7492170Z 'range_num_stages': [0, 0], 2026-02-21T08:13:34.7492431Z 'range_unroll_factors': [0, 0], 2026-02-21T08:13:34.7492743Z 'range_warp_specializes': [True, None]} 2026-02-21T08:13:34.7506437Z [93s] Fitting surrogate: 570 points, 570 targets 2026-02-21T08:13:35.4914984Z [94s] Generation 7 starting: 37 neighbors, 2 active 
search path(s) 2026-02-21T08:13:37.6643057Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38/38 16.8 configs/s 2026-02-21T08:13:38.9229177Z [97s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:13:38.9229603Z 2026-02-21T08:13:38.9229714Z 2026-02-21T08:13:38.9229860Z ================================================================ 2026-02-21T08:13:38.9230177Z Internal Triton PTX codegen error 2026-02-21T08:13:38.9230407Z `ptxas` stderr: 2026-02-21T08:13:38.9231022Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 342 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:38.9231660Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:38.9231884Z 2026-02-21T08:13:38.9232477Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_wstd829.ptx -o /tmp/tmp_wstd829.ptx.o 2026-02-21T08:13:38.9233053Z 2026-02-21T08:13:38.9233058Z 2026-02-21T08:13:38.9233132Z // 2026-02-21T08:13:38.9233330Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:13:38.9233541Z // 2026-02-21T08:13:38.9233614Z 2026-02-21T08:13:38.9233676Z .version 8.7 2026-02-21T08:13:38.9233833Z .target sm_100a 2026-02-21T08:13:38.9233988Z .address_size 64 2026-02-21T08:13:38.9234097Z 2026-02-21T08:13:38.9234241Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:13:38.9234551Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:13:38.9235027Z // @_helion_matmul 2026-02-21T08:13:38.9235291Z .visible .entry _helion_matmul( 2026-02-21T08:13:38.9235566Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:13:38.9235899Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:13:38.9236225Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:13:38.9236537Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_3, 2026-02-21T08:13:38.9236841Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:13:38.9237446Z ) 2026-02-21T08:13:38.9237598Z .reqntid 128 2026-02-21T08:13:38.9237749Z .maxnreg 32 2026-02-21T08:13:38.9237904Z { 2026-02-21T08:13:38.9238049Z .reg .pred %p<148>; 2026-02-21T08:13:38.9238225Z .reg .b16 %rs<11>; 2026-02-21T08:13:38.9238384Z .reg .b32 %r<508>; 2026-02-21T08:13:38.9238548Z .reg .b64 %rd<229>; 2026-02-21T08:13:38.9238850Z .loc 1 19 0 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:19:0 2026-02-21T08:13:38.9239196Z $L__func_begin0: 2026-02-21T08:13:38.9239492Z .loc 1 19 0 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:19:0 2026-02-21T08:13:38.9239759Z 2026-02-21T08:13:38.9239818Z // %bb.0: 2026-02-21T08:13:38.9240002Z ld.param.b64 %rd9, [_helion_matmul_param_0]; 2026-02-21T08:13:38.9240220Z $L__tmp0: 2026-02-21T08:13:38.9240492Z .loc 1 19 0 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:19 2026-02-21T08:13:38.9241025Z mov.u32 %r1, %tid.x; 2026-02-21T08:13:38.9241266Z ld.param.b64 %rd27, [_helion_matmul_param_1]; 2026-02-21T08:13:38.9241545Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:13:38.9241775Z ld.param.b64 %rd45, [_helion_matmul_param_2]; 2026-02-21T08:13:38.9242043Z mov.b32 %r35, global_smem; 2026-02-21T08:13:38.9242244Z // begin inline asm 2026-02-21T08:13:38.9242556Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r35], 64; 2026-02-21T08:13:38.9242880Z // end inline asm 2026-02-21T08:13:38.9243090Z ld.param.b64 %rd62, [_helion_matmul_param_3]; 2026-02-21T08:13:38.9243343Z bar.sync 0; 2026-02-21T08:13:38.9243528Z ld.shared.b32 %r498, [global_smem]; 2026-02-21T08:13:38.9243732Z bar.sync 0; 2026-02-21T08:13:38.9243881Z // begin inline asm 2026-02-21T08:13:38.9244128Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:13:38.9244414Z // end inline asm 2026-02-21T08:13:38.9244837Z .loc 1 21 67 // 
cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:21:67 2026-02-21T08:13:38.9245205Z mov.u32 %r60, %ctaid.x; 2026-02-21T08:13:38.9245396Z mov.u32 %r61, %ctaid.y; 2026-02-21T08:13:38.9245580Z mov.u32 %r62, %ctaid.z; 2026-02-21T08:13:38.9245757Z mov.u32 %r63, %nctaid.x; 2026-02-21T08:13:38.9245945Z mov.u32 %r64, %nctaid.y; 2026-02-21T08:13:38.9246136Z mad.lo.s32 %r65, %r62, %r64, %r61; 2026-02-21T08:13:38.9246359Z mad.lo.s32 %r66, %r65, %r63, %r60; 2026-02-21T08:13:38.9246570Z mul.lo.s32 %r67, %r66, 384; 2026-02-21T08:13:38.9246781Z cvt.s64.s32 %rd63, %r67; 2026-02-21T08:13:38.9246991Z add.s64 %rd23, %rd62, %rd63; 2026-02-21T08:13:38.9247198Z shl.b32 %r68, %r1, 2; 2026-02-21T08:13:38.9247383Z add.s32 %r36, %r35, %r68; 2026-02-21T08:13:38.9247578Z mov.b32 %r45, 0; 2026-02-21T08:13:38.9247756Z // begin inline asm 2026-02-21T08:13:38.9247951Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:38.9248173Z // end inline asm 2026-02-21T08:13:38.9248348Z bar.warp.sync -1; 2026-02-21T08:13:38.9248541Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T08:13:38.9248737Z cvt.u64.u32 %rd8, %r35; 2026-02-21T08:13:38.9248927Z // begin inline asm 2026-02-21T08:13:38.9249236Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T08:13:38.9249616Z // end inline asm 2026-02-21T08:13:38.9249779Z // begin inline asm 2026-02-21T08:13:38.9250061Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:38.9250386Z // end inline asm 2026-02-21T08:13:38.9250548Z mov.b32 %r38, 32; 2026-02-21T08:13:38.9250720Z // begin inline asm 2026-02-21T08:13:38.9251008Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r38; 2026-02-21T08:13:38.9251344Z // end inline asm 2026-02-21T08:13:38.9251506Z mov.b32 %r39, 128; 2026-02-21T08:13:38.9251679Z // begin inline asm 2026-02-21T08:13:38.9251965Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r39; 2026-02-21T08:13:38.9252288Z 
// end inline asm 2026-02-21T08:13:38.9252461Z mov.b32 %r40, 1024; 2026-02-21T08:13:38.9252727Z // begin inline asm 2026-02-21T08:13:38.9253022Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:38.9253354Z // end inline asm 2026-02-21T08:13:38.9253524Z mov.b32 %r41, 4096; 2026-02-21T08:13:38.9253699Z // begin inline asm 2026-02-21T08:13:38.9253995Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r41; 2026-02-21T08:13:38.9254329Z // end inline asm 2026-02-21T08:13:38.9254497Z mov.b64 %rd16, 2048; 2026-02-21T08:13:38.9254743Z // begin inline asm 2026-02-21T08:13:38.9255050Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T08:13:38.9255398Z // end inline asm 2026-02-21T08:13:38.9255558Z mov.b32 %r42, 1; 2026-02-21T08:13:38.9255725Z // begin inline asm 2026-02-21T08:13:38.9256030Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:38.9256536Z // end inline asm 2026-02-21T08:13:38.9256712Z // begin inline asm 2026-02-21T08:13:38.9257010Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:38.9257370Z // end inline asm 2026-02-21T08:13:38.9257531Z // begin inline asm 2026-02-21T08:13:38.9257819Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:38.9258136Z // end inline asm 2026-02-21T08:13:38.9258306Z // begin inline asm 2026-02-21T08:13:38.9258616Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:38.9258958Z // end inline asm 2026-02-21T08:13:38.9259128Z // begin inline asm 2026-02-21T08:13:38.9259416Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x2; 2026-02-21T08:13:38.9259748Z // end inline asm 2026-02-21T08:13:38.9259911Z // begin inline asm 2026-02-21T08:13:38.9260199Z @%p4 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:38.9260521Z // end inline asm 2026-02-21T08:13:38.9260692Z // begin inline asm 2026-02-21T08:13:38.9261117Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:38.9261579Z // end inline asm 2026-02-21T08:13:38.9261748Z // begin inline asm 2026-02-21T08:13:38.9261998Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T08:13:38.9262315Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:38.9262548Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:38.9262769Z // end inline asm 2026-02-21T08:13:38.9262935Z bar.sync 0; 2026-02-21T08:13:38.9263102Z cvta.global.u64 %rd87, %rd23; 2026-02-21T08:13:38.9263463Z .loc 1 22 67 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:22:67 2026-02-21T08:13:38.9263817Z add.s32 %r69, %r67, 128; 2026-02-21T08:13:38.9264004Z cvt.s64.s32 %rd64, %r69; 2026-02-21T08:13:38.9264189Z add.s64 %rd41, %rd62, %rd64; 2026-02-21T08:13:38.9264386Z bar.sync 0; 2026-02-21T08:13:38.9264538Z // begin inline asm 2026-02-21T08:13:38.9264772Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:38.9264985Z // end inline asm 2026-02-21T08:13:38.9265149Z bar.warp.sync -1; 2026-02-21T08:13:38.9265322Z // begin inline asm 2026-02-21T08:13:38.9265609Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd27; 2026-02-21T08:13:38.9265953Z // end inline asm 2026-02-21T08:13:38.9266111Z // begin inline asm 2026-02-21T08:13:38.9266378Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:38.9266676Z // end inline asm 2026-02-21T08:13:38.9266844Z // begin inline asm 2026-02-21T08:13:38.9267132Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r38; 2026-02-21T08:13:38.9267447Z // end inline asm 2026-02-21T08:13:38.9267616Z mov.b32 %r47, 64; 
2026-02-21T08:13:38.9267780Z // begin inline asm 2026-02-21T08:13:38.9268064Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r47; 2026-02-21T08:13:38.9268472Z // end inline asm 2026-02-21T08:13:38.9268650Z // begin inline asm 2026-02-21T08:13:38.9268946Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:38.9269290Z // end inline asm 2026-02-21T08:13:38.9269465Z // begin inline asm 2026-02-21T08:13:38.9269757Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r40; 2026-02-21T08:13:38.9270100Z // end inline asm 2026-02-21T08:13:38.9270269Z // begin inline asm 2026-02-21T08:13:38.9270589Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T08:13:38.9270932Z // end inline asm 2026-02-21T08:13:38.9271112Z // begin inline asm 2026-02-21T08:13:38.9271431Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:38.9271779Z // end inline asm 2026-02-21T08:13:38.9272040Z // begin inline asm 2026-02-21T08:13:38.9272341Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:38.9272699Z // end inline asm 2026-02-21T08:13:38.9272861Z // begin inline asm 2026-02-21T08:13:38.9273155Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:38.9273516Z // end inline asm 2026-02-21T08:13:38.9273686Z // begin inline asm 2026-02-21T08:13:38.9274007Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:38.9274361Z // end inline asm 2026-02-21T08:13:38.9274539Z // begin inline asm 2026-02-21T08:13:38.9274879Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x2; 2026-02-21T08:13:38.9275209Z // end inline asm 2026-02-21T08:13:38.9275370Z // begin inline asm 2026-02-21T08:13:38.9275660Z @%p4 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:38.9275971Z // end inline asm 2026-02-21T08:13:38.9276131Z // begin inline asm 2026-02-21T08:13:38.9276539Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd41 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:38.9276999Z // end inline asm 2026-02-21T08:13:38.9277164Z // begin inline asm 2026-02-21T08:13:38.9277409Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd41 + 0 ], 0x80; 2026-02-21T08:13:38.9277710Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:38.9277942Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:38.9278153Z // end inline asm 2026-02-21T08:13:38.9278320Z bar.sync 0; 2026-02-21T08:13:38.9278487Z cvta.global.u64 %rd88, %rd41; 2026-02-21T08:13:38.9278830Z .loc 1 24 71 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:24:71 2026-02-21T08:13:38.9279191Z add.s32 %r70, %r67, 256; 2026-02-21T08:13:38.9279389Z cvt.s64.s32 %rd65, %r70; 2026-02-21T08:13:38.9279577Z add.s64 %rd59, %rd62, %rd65; 2026-02-21T08:13:38.9279778Z bar.sync 0; 2026-02-21T08:13:38.9279945Z // begin inline asm 2026-02-21T08:13:38.9280124Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:38.9280342Z // end inline asm 2026-02-21T08:13:38.9280507Z bar.warp.sync -1; 2026-02-21T08:13:38.9280685Z // begin inline asm 2026-02-21T08:13:38.9280977Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd45; 2026-02-21T08:13:38.9281313Z // end inline asm 2026-02-21T08:13:38.9281475Z // begin inline asm 2026-02-21T08:13:38.9281746Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:38.9282054Z // end inline asm 2026-02-21T08:13:38.9282215Z // begin inline asm 2026-02-21T08:13:38.9282508Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r47; 2026-02-21T08:13:38.9282823Z // end inline asm 2026-02-21T08:13:38.9282991Z // begin inline asm 
2026-02-21T08:13:38.9283270Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r39; 2026-02-21T08:13:38.9283719Z // end inline asm 2026-02-21T08:13:38.9283882Z // begin inline asm 2026-02-21T08:13:38.9284178Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:38.9284515Z // end inline asm 2026-02-21T08:13:38.9284730Z // begin inline asm 2026-02-21T08:13:38.9285032Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r41; 2026-02-21T08:13:38.9285365Z // end inline asm 2026-02-21T08:13:38.9285539Z // begin inline asm 2026-02-21T08:13:38.9285843Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T08:13:38.9286203Z // end inline asm 2026-02-21T08:13:38.9286376Z // begin inline asm 2026-02-21T08:13:38.9286680Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:38.9287026Z // end inline asm 2026-02-21T08:13:38.9287186Z // begin inline asm 2026-02-21T08:13:38.9287563Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:38.9287919Z // end inline asm 2026-02-21T08:13:38.9288087Z // begin inline asm 2026-02-21T08:13:38.9288377Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:38.9288700Z // end inline asm 2026-02-21T08:13:38.9288869Z // begin inline asm 2026-02-21T08:13:38.9289171Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0;
2026-02-21T08:13:38.9291039Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True)
2026-02-21T08:13:38.9292637Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T08:13:38.9292933Z `ptxas` stderr:
2026-02-21T08:13:38.9293465Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 342 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T08:13:38.9294095Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:13:38.9294280Z
2026-02-21T08:13:38.9294843Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_wstd829.ptx -o /tmp/tmp_wstd829.ptx.o
2026-02-21T08:13:38.9295431Z
2026-02-21T08:13:38.9295587Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T08:13:38.9295879Z // end inline asm 2026-02-21T08:13:38.9296051Z // begin inline asm 2026-02-21T08:13:38.9296351Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T08:13:38.9296682Z // end inline asm 2026-02-21T08:13:38.9296853Z // begin inline asm 2026-02-21T08:13:38.9297133Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:38.9297455Z // end inline asm 2026-02-21T08:13:38.9297614Z // begin inline asm 2026-02-21T08:13:38.9298042Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd59 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:38.9298515Z // end inline asm 2026-02-21T08:13:38.9298678Z // begin inline asm 2026-02-21T08:13:38.9298938Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd59 + 0 ], 0x80; 2026-02-21T08:13:38.9299246Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:38.9299485Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:38.9299696Z // end inline asm 2026-02-21T08:13:38.9299861Z bar.sync 0;
2026-02-21T08:13:38.9300032Z cvta.global.u64 %rd100, %rd59; 2026-02-21T08:13:38.9300393Z .loc 1 31 35 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:31:35 2026-02-21T08:13:38.9300846Z shl.b32 %r499, %r60, 2; 2026-02-21T08:13:38.9301177Z .loc 1 32 37 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:32:37 2026-02-21T08:13:38.9301541Z add.s32 %r71, %r499, 4; 2026-02-21T08:13:38.9301848Z .loc 1 32 49 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:32:49 2026-02-21T08:13:38.9302209Z min.s32 %r4, %r71, 512; 2026-02-21T08:13:38.9302512Z .loc 1 33 74 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:33:74 2026-02-21T08:13:38.9302875Z setp.ge.s32 %p57, %r499, %r4; 2026-02-21T08:13:38.9303085Z @%p57 bra $L__BB0_9; 2026-02-21T08:13:38.9303284Z // %bb.1: // %.lr.ph 2026-02-21T08:13:38.9303652Z .loc 1 0 74 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:0:74 2026-02-21T08:13:38.9303992Z shr.u32 %r5, %r1, 5; 2026-02-21T08:13:38.9304189Z bfe.u32 %r73, %r35, 4, 14; 2026-02-21T08:13:38.9304466Z cvt.u64.u32 %rd66, %r73; 2026-02-21T08:13:38.9304736Z or.b64 %rd82, %rd66, -9223371899382267904; 2026-02-21T08:13:38.9304957Z add.s32 %r74, %r35, 57344; 2026-02-21T08:13:38.9305150Z bfe.u32 %r75, %r74, 4, 14; 2026-02-21T08:13:38.9305338Z cvt.u64.u32 %rd67, %r75; 2026-02-21T08:13:38.9305540Z or.b64 %rd83, %rd67, -9223371899399045120; 2026-02-21T08:13:38.9305769Z add.s32 %r76, %r35, 32; 2026-02-21T08:13:38.9305946Z bfe.u32 %r77, %r76, 4, 14; 2026-02-21T08:13:38.9306135Z cvt.u64.u32 %rd68, %r77; 2026-02-21T08:13:38.9306331Z or.b64 %rd84, %rd68, -9223371899382267904; 2026-02-21T08:13:38.9306554Z add.s32 %r78, %r35, 57376; 2026-02-21T08:13:38.9306737Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T08:13:38.9306932Z cvt.u64.u32 %rd69, %r79; 2026-02-21T08:13:38.9307127Z or.b64 %rd85, %rd69, -9223371899399045120; 2026-02-21T08:13:38.9307346Z shl.b32 %r80, %r1, 7; 2026-02-21T08:13:38.9307531Z and.b32 %r81, %r80, 16256; 2026-02-21T08:13:38.9307711Z shl.b32 
%r82, %r1, 4; 2026-02-21T08:13:38.9307899Z and.b32 %r83, %r82, 112; 2026-02-21T08:13:38.9308084Z or.b32 %r84, %r81, %r83; 2026-02-21T08:13:38.9308271Z add.s32 %r85, %r35, 86016; 2026-02-21T08:13:38.9308456Z add.s32 %r6, %r85, %r84; 2026-02-21T08:13:38.9308644Z xor.b32 %r86, %r84, 16; 2026-02-21T08:13:38.9308821Z add.s32 %r7, %r85, %r86; 2026-02-21T08:13:38.9309010Z xor.b32 %r87, %r84, 32; 2026-02-21T08:13:38.9309187Z add.s32 %r8, %r85, %r87; 2026-02-21T08:13:38.9309371Z xor.b32 %r88, %r84, 48; 2026-02-21T08:13:38.9309552Z add.s32 %r9, %r85, %r88; 2026-02-21T08:13:38.9309731Z xor.b32 %r89, %r84, 64; 2026-02-21T08:13:38.9309919Z add.s32 %r10, %r85, %r89; 2026-02-21T08:13:38.9310102Z xor.b32 %r90, %r84, 80; 2026-02-21T08:13:38.9310284Z add.s32 %r11, %r85, %r90; 2026-02-21T08:13:38.9310465Z xor.b32 %r91, %r84, 96; 2026-02-21T08:13:38.9310653Z add.s32 %r12, %r85, %r91; 2026-02-21T08:13:38.9310834Z xor.b32 %r92, %r84, 112; 2026-02-21T08:13:38.9311020Z add.s32 %r13, %r85, %r92; 2026-02-21T08:13:38.9311210Z bra.uni $L__BB0_2; 2026-02-21T08:13:38.9311445Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:38.9311719Z mov.b32 %r319, 1; 2026-02-21T08:13:38.9312030Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9312390Z // begin inline asm 2026-02-21T08:13:38.9312556Z 2026-02-21T08:13:38.9312703Z { 2026-02-21T08:13:38.9312853Z .reg .pred complete; 2026-02-21T08:13:38.9313039Z waitLoop: 2026-02-21T08:13:38.9313275Z mbarrier.try_wait.parity.shared.b64 complete, [%r318], %r319; 2026-02-21T08:13:38.9313575Z @!complete bra.uni waitLoop; 2026-02-21T08:13:38.9313771Z } 2026-02-21T08:13:38.9313851Z 2026-02-21T08:13:38.9313919Z // end inline asm 2026-02-21T08:13:38.9314243Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9314593Z bar.sync 0; 2026-02-21T08:13:38.9314841Z // begin inline asm 2026-02-21T08:13:38.9315054Z @%p4 mbarrier.inval.shared::cta.b64 [%r163]; 
2026-02-21T08:13:38.9315377Z // end inline asm 2026-02-21T08:13:38.9315545Z bar.sync 0; 2026-02-21T08:13:38.9315715Z // begin inline asm 2026-02-21T08:13:38.9315928Z @%p4 mbarrier.inval.shared::cta.b64 [%r164]; 2026-02-21T08:13:38.9316150Z // end inline asm 2026-02-21T08:13:38.9316316Z bar.sync 0; 2026-02-21T08:13:38.9316481Z // begin inline asm 2026-02-21T08:13:38.9316682Z @%p4 mbarrier.inval.shared::cta.b64 [%r165]; 2026-02-21T08:13:38.9316899Z // end inline asm 2026-02-21T08:13:38.9317062Z bar.sync 0; 2026-02-21T08:13:38.9317211Z // begin inline asm 2026-02-21T08:13:38.9317407Z @%p4 mbarrier.inval.shared::cta.b64 [%r166]; 2026-02-21T08:13:38.9317629Z // end inline asm 2026-02-21T08:13:38.9317791Z bar.sync 0; 2026-02-21T08:13:38.9317940Z // begin inline asm 2026-02-21T08:13:38.9318113Z @%p4 mbarrier.inval.shared::cta.b64 [%r167]; 2026-02-21T08:13:38.9318323Z // end inline asm 2026-02-21T08:13:38.9318468Z bar.sync 0; 2026-02-21T08:13:38.9318684Z // begin inline asm 2026-02-21T08:13:38.9318864Z @%p4 mbarrier.inval.shared::cta.b64 [%r168]; 2026-02-21T08:13:38.9319075Z // end inline asm 2026-02-21T08:13:38.9319220Z bar.sync 0; 2026-02-21T08:13:38.9319368Z // begin inline asm 2026-02-21T08:13:38.9319546Z @%p4 mbarrier.inval.shared::cta.b64 [%r254]; 2026-02-21T08:13:38.9319746Z // end inline asm 2026-02-21T08:13:38.9319910Z add.s32 %r327, %r35, 102464; 2026-02-21T08:13:38.9320083Z // begin inline asm 2026-02-21T08:13:38.9320263Z @%p4 mbarrier.inval.shared::cta.b64 [%r327]; 2026-02-21T08:13:38.9320464Z // end inline asm 2026-02-21T08:13:38.9320615Z bar.sync 0; 2026-02-21T08:13:38.9320757Z // begin inline asm 2026-02-21T08:13:38.9320940Z @%p4 mbarrier.inval.shared::cta.b64 [%r162]; 2026-02-21T08:13:38.9321141Z // end inline asm 2026-02-21T08:13:38.9321435Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9321774Z // begin inline asm 2026-02-21T08:13:38.9322188Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r329, %r330, %r331, 
%r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344}, [%r396 + 0]; 2026-02-21T08:13:38.9322635Z // end inline asm 2026-02-21T08:13:38.9322784Z // begin inline asm 2026-02-21T08:13:38.9323196Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361}, [%r396 + 16]; 2026-02-21T08:13:38.9323646Z // end inline asm 2026-02-21T08:13:38.9323795Z // begin inline asm 2026-02-21T08:13:38.9324201Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378}, [%r396 + 32]; 2026-02-21T08:13:38.9324630Z // end inline asm 2026-02-21T08:13:38.9324842Z // begin inline asm 2026-02-21T08:13:38.9325249Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395}, [%r396 + 48]; 2026-02-21T08:13:38.9325690Z // end inline asm 2026-02-21T08:13:38.9325857Z // begin inline asm 2026-02-21T08:13:38.9326048Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:13:38.9326269Z // end inline asm 2026-02-21T08:13:38.9326458Z cvt.u64.u32 %rd101, %r329; 2026-02-21T08:13:38.9326669Z cvt.u64.u32 %rd102, %r330; 2026-02-21T08:13:38.9326870Z shl.b64 %rd103, %rd102, 32; 2026-02-21T08:13:38.9327089Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:13:38.9327441Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9327814Z mov.b64 {%r401, %r402}, %rd104; 2026-02-21T08:13:38.9328039Z cvt.rn.f16x2.f32 %r403, %r402, %r401; 2026-02-21T08:13:38.9328401Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9328768Z cvt.u64.u32 %rd105, %r331; 2026-02-21T08:13:38.9328959Z cvt.u64.u32 %rd106, %r332; 2026-02-21T08:13:38.9329163Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:13:38.9329439Z or.b64 %rd108, %rd105, %rd107; 
2026-02-21T08:13:38.9329776Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9330129Z mov.b64 {%r404, %r405}, %rd108; 2026-02-21T08:13:38.9330347Z cvt.rn.f16x2.f32 %r406, %r405, %r404; 2026-02-21T08:13:38.9330696Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9331057Z cvt.u64.u32 %rd109, %r333; 2026-02-21T08:13:38.9331254Z cvt.u64.u32 %rd110, %r334; 2026-02-21T08:13:38.9331443Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:13:38.9331647Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:13:38.9331972Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9332342Z mov.b64 {%r407, %r408}, %rd112; 2026-02-21T08:13:38.9332556Z cvt.rn.f16x2.f32 %r409, %r408, %r407; 2026-02-21T08:13:38.9332965Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9333346Z cvt.u64.u32 %rd113, %r335; 2026-02-21T08:13:38.9333533Z cvt.u64.u32 %rd114, %r336; 2026-02-21T08:13:38.9333723Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:13:38.9333912Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:13:38.9334243Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9334602Z mov.b64 {%r410, %r411}, %rd116; 2026-02-21T08:13:38.9334870Z cvt.rn.f16x2.f32 %r412, %r411, %r410; 2026-02-21T08:13:38.9335222Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9335585Z cvt.u64.u32 %rd117, %r337; 2026-02-21T08:13:38.9335779Z cvt.u64.u32 %rd118, %r338; 2026-02-21T08:13:38.9335964Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:13:38.9336163Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:13:38.9336496Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9336874Z mov.b64 {%r413, %r414}, %rd120; 2026-02-21T08:13:38.9337085Z cvt.rn.f16x2.f32 %r415, %r414, %r413; 2026-02-21T08:13:38.9337429Z .loc 1 56 52 // 
cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9337801Z cvt.u64.u32 %rd121, %r339; 2026-02-21T08:13:38.9337995Z cvt.u64.u32 %rd122, %r340; 2026-02-21T08:13:38.9338198Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:13:38.9338391Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:13:38.9338739Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9339104Z mov.b64 {%r416, %r417}, %rd124; 2026-02-21T08:13:38.9339316Z cvt.rn.f16x2.f32 %r418, %r417, %r416; 2026-02-21T08:13:38.9339681Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9340047Z cvt.u64.u32 %rd125, %r341; 2026-02-21T08:13:38.9340254Z cvt.u64.u32 %rd126, %r342; 2026-02-21T08:13:38.9340447Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:13:38.9340646Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:13:38.9340990Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9341375Z mov.b64 {%r419, %r420}, %rd128; 2026-02-21T08:13:38.9341584Z cvt.rn.f16x2.f32 %r421, %r420, %r419; 2026-02-21T08:13:38.9341942Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9342324Z cvt.u64.u32 %rd129, %r343; 2026-02-21T08:13:38.9342518Z cvt.u64.u32 %rd130, %r344; 2026-02-21T08:13:38.9342716Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:13:38.9342914Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:13:38.9343227Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9343553Z mov.b64 {%r422, %r423}, %rd132; 2026-02-21T08:13:38.9343750Z cvt.rn.f16x2.f32 %r424, %r423, %r422; 2026-02-21T08:13:38.9344175Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9344495Z cvt.u64.u32 %rd133, %r346; 2026-02-21T08:13:38.9344720Z cvt.u64.u32 %rd134, %r347; 2026-02-21T08:13:38.9344897Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:13:38.9345083Z or.b64 %rd136, %rd133, 
%rd135; 2026-02-21T08:13:38.9345385Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9345726Z mov.b64 {%r425, %r426}, %rd136; 2026-02-21T08:13:38.9345925Z cvt.rn.f16x2.f32 %r427, %r426, %r425; 2026-02-21T08:13:38.9346246Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9346641Z cvt.u64.u32 %rd137, %r348; 2026-02-21T08:13:38.9346827Z cvt.u64.u32 %rd138, %r349; 2026-02-21T08:13:38.9347020Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:13:38.9347294Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:13:38.9347633Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9347998Z mov.b64 {%r428, %r429}, %rd140; 2026-02-21T08:13:38.9348201Z cvt.rn.f16x2.f32 %r430, %r429, %r428; 2026-02-21T08:13:38.9348537Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9348883Z cvt.u64.u32 %rd141, %r350; 2026-02-21T08:13:38.9349067Z cvt.u64.u32 %rd142, %r351; 2026-02-21T08:13:38.9349248Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:13:38.9349439Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:13:38.9349756Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9350112Z mov.b64 {%r431, %r432}, %rd144; 2026-02-21T08:13:38.9350317Z cvt.rn.f16x2.f32 %r433, %r432, %r431; 2026-02-21T08:13:38.9350668Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9351036Z cvt.u64.u32 %rd145, %r352; 2026-02-21T08:13:38.9351217Z cvt.u64.u32 %rd146, %r353; 2026-02-21T08:13:38.9351405Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:13:38.9351586Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:13:38.9351929Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9352277Z mov.b64 {%r434, %r435}, %rd148; 2026-02-21T08:13:38.9352485Z cvt.rn.f16x2.f32 %r436, %r435, %r434; 2026-02-21T08:13:38.9352827Z .loc 1 56 52 
// cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9353180Z cvt.u64.u32 %rd149, %r354; 2026-02-21T08:13:38.9353373Z cvt.u64.u32 %rd150, %r355; 2026-02-21T08:13:38.9353556Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:13:38.9353755Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:13:38.9354086Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9354453Z mov.b64 {%r437, %r438}, %rd152; 2026-02-21T08:13:38.9354658Z cvt.rn.f16x2.f32 %r439, %r438, %r437; 2026-02-21T08:13:38.9355058Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9355431Z cvt.u64.u32 %rd153, %r356; 2026-02-21T08:13:38.9355618Z cvt.u64.u32 %rd154, %r357; 2026-02-21T08:13:38.9355814Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:13:38.9356006Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:13:38.9356320Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9356647Z mov.b64 {%r440, %r441}, %rd156; 2026-02-21T08:13:38.9356840Z cvt.rn.f16x2.f32 %r442, %r441, %r440; 2026-02-21T08:13:38.9357164Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9357493Z cvt.u64.u32 %rd157, %r358; 2026-02-21T08:13:38.9357673Z cvt.u64.u32 %rd158, %r359; 2026-02-21T08:13:38.9357846Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:13:38.9358108Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:13:38.9358402Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9358728Z mov.b64 {%r443, %r444}, %rd160; 2026-02-21T08:13:38.9358931Z cvt.rn.f16x2.f32 %r445, %r444, %r443; 2026-02-21T08:13:38.9359255Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9359614Z cvt.u64.u32 %rd161, %r360; 2026-02-21T08:13:38.9359796Z cvt.u64.u32 %rd162, %r361; 2026-02-21T08:13:38.9359983Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:13:38.9360164Z or.b64 %rd164, %rd161, 
%rd163; 2026-02-21T08:13:38.9360487Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9360845Z mov.b64 {%r446, %r447}, %rd164; 2026-02-21T08:13:38.9361057Z cvt.rn.f16x2.f32 %r448, %r447, %r446; 2026-02-21T08:13:38.9361435Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9361762Z cvt.u64.u32 %rd165, %r363; 2026-02-21T08:13:38.9361953Z cvt.u64.u32 %rd166, %r364; 2026-02-21T08:13:38.9362133Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:13:38.9362326Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:13:38.9362646Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9363015Z mov.b64 {%r449, %r450}, %rd168; 2026-02-21T08:13:38.9363219Z cvt.rn.f16x2.f32 %r451, %r450, %r449; 2026-02-21T08:13:38.9363555Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9363925Z cvt.u64.u32 %rd169, %r365; 2026-02-21T08:13:38.9364106Z cvt.u64.u32 %rd170, %r366; 2026-02-21T08:13:38.9364294Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:13:38.9364479Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:13:38.9364882Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9365244Z mov.b64 {%r452, %r453}, %rd172; 2026-02-21T08:13:38.9365458Z cvt.rn.f16x2.f32 %r454, %r453, %r452; 2026-02-21T08:13:38.9365811Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9366198Z cvt.u64.u32 %rd173, %r367; 2026-02-21T08:13:38.9366390Z cvt.u64.u32 %rd174, %r368; 2026-02-21T08:13:38.9366575Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:13:38.9366774Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:13:38.9367104Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9367459Z mov.b64 {%r455, %r456}, %rd176; 2026-02-21T08:13:38.9367663Z cvt.rn.f16x2.f32 %r457, %r456, %r455; 2026-02-21T08:13:38.9368002Z .loc 1 56 52 
// cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9368387Z cvt.u64.u32 %rd177, %r369; 2026-02-21T08:13:38.9368572Z cvt.u64.u32 %rd178, %r370; 2026-02-21T08:13:38.9368761Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:13:38.9368946Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:13:38.9369273Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9369623Z mov.b64 {%r458, %r459}, %rd180; 2026-02-21T08:13:38.9369826Z cvt.rn.f16x2.f32 %r460, %r459, %r458; 2026-02-21T08:13:38.9370175Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9370527Z cvt.u64.u32 %rd181, %r371; 2026-02-21T08:13:38.9370719Z cvt.u64.u32 %rd182, %r372; 2026-02-21T08:13:38.9370901Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:13:38.9371095Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:13:38.9371424Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9371787Z mov.b64 {%r461, %r462}, %rd184; 2026-02-21T08:13:38.9376063Z cvt.rn.f16x2.f32 %r463, %r462, %r461; 2026-02-21T08:13:38.9376464Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9376858Z cvt.u64.u32 %rd185, %r373; 2026-02-21T08:13:38.9377059Z cvt.u64.u32 %rd186, %r374; 2026-02-21T08:13:38.9377253Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:13:38.9377449Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:13:38.9377797Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9378149Z mov.b64 {%r464, %r465}, %rd188; 2026-02-21T08:13:38.9378364Z cvt.rn.f16x2.f32 %r466, %r465, %r464; 2026-02-21T08:13:38.9378716Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9379064Z cvt.u64.u32 %rd189, %r375; 2026-02-21T08:13:38.9379259Z cvt.u64.u32 %rd190, %r376; 2026-02-21T08:13:38.9379568Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:13:38.9379776Z or.b64 %rd192, %rd189, 
%rd191; 2026-02-21T08:13:38.9380100Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9380457Z mov.b64 {%r467, %r468}, %rd192; 2026-02-21T08:13:38.9380663Z cvt.rn.f16x2.f32 %r469, %r468, %r467; 2026-02-21T08:13:38.9380996Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9381338Z cvt.u64.u32 %rd193, %r377; 2026-02-21T08:13:38.9381517Z cvt.u64.u32 %rd194, %r378; 2026-02-21T08:13:38.9381705Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:13:38.9381895Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:13:38.9382225Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9382563Z mov.b64 {%r470, %r471}, %rd196; 2026-02-21T08:13:38.9382765Z cvt.rn.f16x2.f32 %r472, %r471, %r470; 2026-02-21T08:13:38.9383107Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9383455Z cvt.u64.u32 %rd197, %r380; 2026-02-21T08:13:38.9383653Z cvt.u64.u32 %rd198, %r381; 2026-02-21T08:13:38.9383837Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:13:38.9384036Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:13:38.9384365Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9384814Z mov.b64 {%r473, %r474}, %rd200; 2026-02-21T08:13:38.9385034Z cvt.rn.f16x2.f32 %r475, %r474, %r473; 2026-02-21T08:13:38.9385410Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9385783Z cvt.u64.u32 %rd201, %r382; 2026-02-21T08:13:38.9385975Z cvt.u64.u32 %rd202, %r383; 2026-02-21T08:13:38.9386179Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:13:38.9386377Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:13:38.9386729Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9387093Z mov.b64 {%r476, %r477}, %rd204; 2026-02-21T08:13:38.9387306Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T08:13:38.9387664Z .loc 1 56 52 
// cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9388017Z cvt.u64.u32 %rd205, %r384; 2026-02-21T08:13:38.9388212Z cvt.u64.u32 %rd206, %r385; 2026-02-21T08:13:38.9388398Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:13:38.9388595Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:13:38.9388929Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9389290Z mov.b64 {%r479, %r480}, %rd208; 2026-02-21T08:13:38.9389498Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T08:13:38.9389839Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9390199Z cvt.u64.u32 %rd209, %r386; 2026-02-21T08:13:38.9390388Z cvt.u64.u32 %rd210, %r387; 2026-02-21T08:13:38.9390675Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:13:38.9390868Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:13:38.9391209Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9391571Z mov.b64 {%r482, %r483}, %rd212; 2026-02-21T08:13:38.9391784Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T08:13:38.9392136Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9392502Z cvt.u64.u32 %rd213, %r388; 2026-02-21T08:13:38.9392699Z cvt.u64.u32 %rd214, %r389; 2026-02-21T08:13:38.9392885Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:13:38.9393080Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:13:38.9393407Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9393785Z mov.b64 {%r485, %r486}, %rd216; 2026-02-21T08:13:38.9394099Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T08:13:38.9394450Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9394891Z cvt.u64.u32 %rd217, %r390; 2026-02-21T08:13:38.9395081Z cvt.u64.u32 %rd218, %r391; 2026-02-21T08:13:38.9395277Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:13:38.9395471Z or.b64 %rd220, %rd217, 
%rd219; 2026-02-21T08:13:38.9395821Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9396205Z mov.b64 {%r488, %r489}, %rd220; 2026-02-21T08:13:38.9396417Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T08:13:38.9396775Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9397164Z cvt.u64.u32 %rd221, %r392; 2026-02-21T08:13:38.9397360Z cvt.u64.u32 %rd222, %r393; 2026-02-21T08:13:38.9397546Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:13:38.9397754Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:13:38.9398099Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9398472Z mov.b64 {%r491, %r492}, %rd224; 2026-02-21T08:13:38.9398679Z cvt.rn.f16x2.f32 %r493, %r492, %r491; 2026-02-21T08:13:38.9399022Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9399389Z cvt.u64.u32 %rd225, %r394; 2026-02-21T08:13:38.9399582Z cvt.u64.u32 %rd226, %r395; 2026-02-21T08:13:38.9399773Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:13:38.9399961Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:13:38.9400295Z .loc 1 58 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:58:27 2026-02-21T08:13:38.9400657Z mov.b64 {%r494, %r495}, %rd228; 2026-02-21T08:13:38.9400860Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T08:13:38.9401207Z .loc 1 59 45 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:59:45 2026-02-21T08:13:38.9401581Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:38.9401798Z bar.sync 0; 2026-02-21T08:13:38.9402005Z st.shared.v4.b32 [%r6], {%r403, %r406, %r409, %r412}; 2026-02-21T08:13:38.9402298Z st.shared.v4.b32 [%r7], {%r415, %r418, %r421, %r424}; 2026-02-21T08:13:38.9402566Z st.shared.v4.b32 [%r8], {%r427, %r430, %r433, %r436}; 2026-02-21T08:13:38.9402835Z st.shared.v4.b32 [%r9], {%r439, %r442, %r445, %r448}; 2026-02-21T08:13:38.9403114Z st.shared.v4.b32 [%r10], {%r451, %r454, 
%r457, %r460}; 2026-02-21T08:13:38.9403386Z st.shared.v4.b32 [%r11], {%r463, %r466, %r469, %r472}; 2026-02-21T08:13:38.9403659Z st.shared.v4.b32 [%r12], {%r475, %r478, %r481, %r484}; 2026-02-21T08:13:38.9403920Z st.shared.v4.b32 [%r13], {%r487, %r490, %r493, %r496}; 2026-02-21T08:13:38.9404162Z // begin inline asm 2026-02-21T08:13:38.9404379Z fence.proxy.async.shared::cta; 2026-02-21T08:13:38.9404588Z // end inline asm 2026-02-21T08:13:38.9404807Z bar.sync 0; 2026-02-21T08:13:38.9404990Z elect.sync %r497|%p145, -1; 2026-02-21T08:13:38.9405275Z and.pred %p143, %p1, %p145; 2026-02-21T08:13:38.9405465Z // begin inline asm 2026-02-21T08:13:38.9405805Z @%p143 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd100, {%r397, %r398}], [%r85]; 2026-02-21T08:13:38.9406164Z // end inline asm 2026-02-21T08:13:38.9406347Z cp.async.bulk.commit_group; 2026-02-21T08:13:38.9406672Z .loc 1 33 74 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:33:74 2026-02-21T08:13:38.9407044Z add.s32 %r499, %r499, 1; 2026-02-21T08:13:38.9407241Z setp.ne.b32 %p146, %r499, %r4; 2026-02-21T08:13:38.9407451Z @%p146 bra $L__BB0_2; 2026-02-21T08:13:38.9407637Z bra.uni $L__BB0_9; 2026-02-21T08:13:38.9407866Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:13:38.9408177Z // Child Loop BB0_5 Depth 2 2026-02-21T08:13:38.9408620Z .loc 1 39 35 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:39:35 2026-02-21T08:13:38.9408986Z shr.s32 %r226, %r499, 31; 2026-02-21T08:13:38.9409177Z shr.u32 %r227, %r226, 25; 2026-02-21T08:13:38.9409376Z add.s32 %r228, %r499, %r227; 2026-02-21T08:13:38.9409711Z .loc 1 42 45 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:42:45 2026-02-21T08:13:38.9410059Z and.b32 %r229, %r228, 65408; 2026-02-21T08:13:38.9410259Z sub.s32 %r230, %r499, %r229; 2026-02-21T08:13:38.9410580Z .loc 1 42 64 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:42:64 2026-02-21T08:13:38.9410942Z cvt.u16.u32 %rs1, %r230; 2026-02-21T08:13:38.9411136Z 
cvt.s8.s32 %rs2, %r230; 2026-02-21T08:13:38.9411460Z .loc 1 43 51 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:43:51 2026-02-21T08:13:38.9411820Z shr.u16 %rs3, %rs2, 12; 2026-02-21T08:13:38.9412008Z and.b16 %rs4, %rs3, 7; 2026-02-21T08:13:38.9412203Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T08:13:38.9412396Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T08:13:38.9412595Z shr.s16 %rs7, %rs6, 3; 2026-02-21T08:13:38.9412907Z .loc 1 42 64 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:42:64 2026-02-21T08:13:38.9413269Z and.b16 %rs8, %rs5, 248; 2026-02-21T08:13:38.9413453Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T08:13:38.9413787Z .loc 1 44 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:44:27 2026-02-21T08:13:38.9414149Z shl.b32 %r231, %r228, 3; 2026-02-21T08:13:38.9414337Z and.b32 %r232, %r231, -1024; 2026-02-21T08:13:38.9414540Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T08:13:38.9414824Z mad.wide.s16 %r398, %rs10, 128, %r232; 2026-02-21T08:13:38.9415194Z .loc 1 45 27 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:45:27 2026-02-21T08:13:38.9415560Z mul.wide.s16 %r397, %rs7, 64; 2026-02-21T08:13:38.9415910Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9416297Z shfl.sync.idx.b32 %r17, %r5, 0, 31, -1; 2026-02-21T08:13:38.9416530Z shl.b32 %r233, %r17, 21; 2026-02-21T08:13:38.9416725Z and.b32 %r234, %r233, 6291456; 2026-02-21T08:13:38.9416923Z add.s32 %r396, %r234, %r498; 2026-02-21T08:13:38.9417124Z mov.pred %p58, -1; 2026-02-21T08:13:38.9417301Z mov.b32 %r500, 0; 2026-02-21T08:13:38.9417463Z // begin inline asm 2026-02-21T08:13:38.9417888Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r396 + 0], {%r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500}; 2026-02-21T08:13:38.9418344Z // end inline asm 2026-02-21T08:13:38.9418505Z // begin inline asm 2026-02-21T08:13:38.9418917Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r396 + 16], 
{%r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500}; 2026-02-21T08:13:38.9419426Z // end inline asm 2026-02-21T08:13:38.9419589Z // begin inline asm 2026-02-21T08:13:38.9420044Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r396 + 32], {%r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500}; 2026-02-21T08:13:38.9420618Z // end inline asm 2026-02-21T08:13:38.9420781Z // begin inline asm 2026-02-21T08:13:38.9421233Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r396 + 48], {%r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500, %r500}; 2026-02-21T08:13:38.9421666Z // end inline asm 2026-02-21T08:13:38.9421828Z // begin inline asm 2026-02-21T08:13:38.9422009Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:13:38.9422214Z // end inline asm 2026-02-21T08:13:38.9422369Z bar.sync 0; 2026-02-21T08:13:38.9422685Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9423040Z add.s32 %r501, %r35, 102464; 2026-02-21T08:13:38.9423233Z // begin inline asm 2026-02-21T08:13:38.9423498Z @%p4 mbarrier.init.shared::cta.b64 [%r501], 1; 2026-02-21T08:13:38.9423734Z // end inline asm 2026-02-21T08:13:38.9423900Z bar.sync 0; 2026-02-21T08:13:38.9424055Z add.s32 %r162, %r35, 102472; 2026-02-21T08:13:38.9424244Z // begin inline asm 2026-02-21T08:13:38.9424437Z @%p4 mbarrier.init.shared::cta.b64 [%r162], 1; 2026-02-21T08:13:38.9424718Z // end inline asm 2026-02-21T08:13:38.9424889Z add.s32 %r163, %r35, 102400; 2026-02-21T08:13:38.9425084Z // begin inline asm 2026-02-21T08:13:38.9425288Z @%p4 mbarrier.init.shared::cta.b64 [%r163], 1; 2026-02-21T08:13:38.9425515Z // end inline asm 2026-02-21T08:13:38.9425688Z bar.sync 0; 2026-02-21T08:13:38.9425848Z add.s32 %r164, %r35, 102408; 2026-02-21T08:13:38.9426046Z // begin inline asm 2026-02-21T08:13:38.9426246Z @%p4 
mbarrier.init.shared::cta.b64 [%r164], 1; 2026-02-21T08:13:38.9426482Z // end inline asm 2026-02-21T08:13:38.9426638Z bar.sync 0; 2026-02-21T08:13:38.9426801Z add.s32 %r165, %r35, 102416; 2026-02-21T08:13:38.9426992Z // begin inline asm 2026-02-21T08:13:38.9427196Z @%p4 mbarrier.init.shared::cta.b64 [%r165], 1; 2026-02-21T08:13:38.9427425Z // end inline asm 2026-02-21T08:13:38.9427580Z bar.sync 0; 2026-02-21T08:13:38.9427740Z add.s32 %r166, %r35, 102424; 2026-02-21T08:13:38.9427923Z // begin inline asm 2026-02-21T08:13:38.9428118Z @%p4 mbarrier.init.shared::cta.b64 [%r166], 1; 2026-02-21T08:13:38.9428334Z // end inline asm 2026-02-21T08:13:38.9428497Z bar.sync 0; 2026-02-21T08:13:38.9428650Z add.s32 %r167, %r35, 102432; 2026-02-21T08:13:38.9428840Z // begin inline asm 2026-02-21T08:13:38.9429035Z @%p4 mbarrier.init.shared::cta.b64 [%r167], 1; 2026-02-21T08:13:38.9429252Z // end inline asm 2026-02-21T08:13:38.9429415Z bar.sync 0; 2026-02-21T08:13:38.9429568Z add.s32 %r168, %r35, 102440; 2026-02-21T08:13:38.9429756Z // begin inline asm 2026-02-21T08:13:38.9429943Z @%p4 mbarrier.init.shared::cta.b64 [%r168], 1; 2026-02-21T08:13:38.9430166Z // end inline asm 2026-02-21T08:13:38.9430317Z bar.sync 0; 2026-02-21T08:13:38.9430479Z add.s32 %r254, %r35, 102448; 2026-02-21T08:13:38.9430662Z // begin inline asm 2026-02-21T08:13:38.9430856Z @%p4 mbarrier.init.shared::cta.b64 [%r254], 1; 2026-02-21T08:13:38.9431079Z // end inline asm 2026-02-21T08:13:38.9431232Z bar.sync 0; 2026-02-21T08:13:38.9431390Z // begin inline asm 2026-02-21T08:13:38.9431616Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r163], 12288; 2026-02-21T08:13:38.9431880Z // end inline asm 2026-02-21T08:13:38.9432182Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9432535Z // begin inline asm 2026-02-21T08:13:38.9432718Z fence.proxy.async.shared::cta; 2026-02-21T08:13:38.9432921Z // end inline asm 2026-02-21T08:13:38.9433082Z bar.sync 0; 
2026-02-21T08:13:38.9433244Z elect.sync %r235|%p90, -1; 2026-02-21T08:13:38.9433446Z and.pred %p72, %p1, %p90; 2026-02-21T08:13:38.9433631Z // begin inline asm 2026-02-21T08:13:38.9434039Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r35], [%rd87, {%r500, %r398}], [%r163]; 2026-02-21T08:13:38.9434539Z // end inline asm 2026-02-21T08:13:38.9435114Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9435470Z bar.sync 0; 2026-02-21T08:13:38.9435657Z elect.sync %r236|%p91, -1; 2026-02-21T08:13:38.9435862Z and.pred %p73, %p1, %p91; 2026-02-21T08:13:38.9436046Z // begin inline asm 2026-02-21T08:13:38.9436455Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r74], [%rd88, {%r500, %r397}], [%r163]; 2026-02-21T08:13:38.9436878Z // end inline asm 2026-02-21T08:13:38.9437195Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9437565Z bar.sync 0; 2026-02-21T08:13:38.9437728Z // begin inline asm 2026-02-21T08:13:38.9437967Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r164], 12288; 2026-02-21T08:13:38.9438282Z // end inline asm 2026-02-21T08:13:38.9438569Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9438896Z bar.sync 0; 2026-02-21T08:13:38.9439053Z elect.sync %r237|%p92, -1; 2026-02-21T08:13:38.9439235Z and.pred %p75, %p1, %p92; 2026-02-21T08:13:38.9439417Z add.s32 %r180, %r35, 8192; 2026-02-21T08:13:38.9439585Z mov.b32 %r181, 32; 2026-02-21T08:13:38.9439743Z // begin inline asm 2026-02-21T08:13:38.9440115Z @%p75 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r180], [%rd87, {%r181, %r398}], [%r164]; 2026-02-21T08:13:38.9440513Z // end inline asm 2026-02-21T08:13:38.9440799Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9441124Z bar.sync 0; 2026-02-21T08:13:38.9441285Z 
elect.sync %r238|%p93, -1; 2026-02-21T08:13:38.9441467Z and.pred %p76, %p1, %p93; 2026-02-21T08:13:38.9441651Z add.s32 %r184, %r35, 61440; 2026-02-21T08:13:38.9441825Z // begin inline asm 2026-02-21T08:13:38.9442199Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r184], [%rd88, {%r181, %r397}], [%r164]; 2026-02-21T08:13:38.9442601Z // end inline asm 2026-02-21T08:13:38.9442883Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9443204Z bar.sync 0; 2026-02-21T08:13:38.9443345Z // begin inline asm 2026-02-21T08:13:38.9443562Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r165], 12288; 2026-02-21T08:13:38.9443802Z // end inline asm 2026-02-21T08:13:38.9444087Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9444404Z bar.sync 0; 2026-02-21T08:13:38.9444559Z elect.sync %r239|%p94, -1; 2026-02-21T08:13:38.9444815Z and.pred %p78, %p1, %p94; 2026-02-21T08:13:38.9445003Z add.s32 %r189, %r35, 16384; 2026-02-21T08:13:38.9445189Z mov.b32 %r190, 64; 2026-02-21T08:13:38.9445355Z // begin inline asm 2026-02-21T08:13:38.9445767Z @%p78 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r189], [%rd87, {%r190, %r398}], [%r165]; 2026-02-21T08:13:38.9446212Z // end inline asm 2026-02-21T08:13:38.9446529Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9446909Z bar.sync 0; 2026-02-21T08:13:38.9447066Z elect.sync %r240|%p95, -1; 2026-02-21T08:13:38.9447264Z and.pred %p79, %p1, %p95; 2026-02-21T08:13:38.9447454Z add.s32 %r193, %r35, 65536; 2026-02-21T08:13:38.9447642Z // begin inline asm 2026-02-21T08:13:38.9448040Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r193], [%rd88, {%r190, %r397}], [%r165]; 2026-02-21T08:13:38.9448500Z // end inline asm 2026-02-21T08:13:38.9448820Z .loc 1 50 42 // 
cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9449179Z bar.sync 0; 2026-02-21T08:13:38.9449346Z // begin inline asm 2026-02-21T08:13:38.9449674Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r166], 12288; 2026-02-21T08:13:38.9449959Z // end inline asm 2026-02-21T08:13:38.9450273Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9450638Z bar.sync 0; 2026-02-21T08:13:38.9450818Z elect.sync %r241|%p96, -1; 2026-02-21T08:13:38.9451018Z and.pred %p81, %p1, %p96; 2026-02-21T08:13:38.9451214Z add.s32 %r198, %r35, 24576; 2026-02-21T08:13:38.9451406Z mov.b32 %r199, 96; 2026-02-21T08:13:38.9451595Z // begin inline asm 2026-02-21T08:13:38.9451990Z @%p81 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r198], [%rd87, {%r199, %r398}], [%r166]; 2026-02-21T08:13:38.9452430Z // end inline asm 2026-02-21T08:13:38.9452735Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9453096Z bar.sync 0; 2026-02-21T08:13:38.9453387Z elect.sync %r242|%p97, -1; 2026-02-21T08:13:38.9453589Z and.pred %p82, %p1, %p97; 2026-02-21T08:13:38.9453786Z add.s32 %r202, %r35, 69632; 2026-02-21T08:13:38.9453974Z // begin inline asm 2026-02-21T08:13:38.9454376Z @%p82 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r202], [%rd88, {%r199, %r397}], [%r166]; 2026-02-21T08:13:38.9454860Z // end inline asm 2026-02-21T08:13:38.9455178Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9455543Z bar.sync 0; 2026-02-21T08:13:38.9455701Z // begin inline asm 2026-02-21T08:13:38.9455938Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r167], 12288; 2026-02-21T08:13:38.9456204Z // end inline asm 2026-02-21T08:13:38.9456513Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9456889Z bar.sync 0; 2026-02-21T08:13:38.9457063Z elect.sync 
%r243|%p98, -1; 2026-02-21T08:13:38.9457262Z and.pred %p84, %p1, %p98; 2026-02-21T08:13:38.9457463Z add.s32 %r207, %r35, 32768; 2026-02-21T08:13:38.9457649Z mov.b32 %r208, 128; 2026-02-21T08:13:38.9457832Z // begin inline asm 2026-02-21T08:13:38.9458234Z @%p84 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r207], [%rd87, {%r208, %r398}], [%r167]; 2026-02-21T08:13:38.9458684Z // end inline asm 2026-02-21T08:13:38.9459002Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9459353Z bar.sync 0; 2026-02-21T08:13:38.9459523Z elect.sync %r244|%p99, -1; 2026-02-21T08:13:38.9459720Z and.pred %p85, %p1, %p99; 2026-02-21T08:13:38.9459920Z add.s32 %r211, %r35, 73728; 2026-02-21T08:13:38.9460110Z // begin inline asm 2026-02-21T08:13:38.9460512Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r211], [%rd88, {%r208, %r397}], [%r167]; 2026-02-21T08:13:38.9460966Z // end inline asm 2026-02-21T08:13:38.9461278Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9461637Z bar.sync 0; 2026-02-21T08:13:38.9461792Z // begin inline asm 2026-02-21T08:13:38.9462032Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r168], 12288; 2026-02-21T08:13:38.9462308Z // end inline asm 2026-02-21T08:13:38.9462623Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9462990Z bar.sync 0; 2026-02-21T08:13:38.9463156Z elect.sync %r245|%p100, -1; 2026-02-21T08:13:38.9463364Z and.pred %p87, %p1, %p100; 2026-02-21T08:13:38.9463557Z add.s32 %r216, %r35, 40960; 2026-02-21T08:13:38.9463755Z mov.b32 %r217, 160; 2026-02-21T08:13:38.9463931Z // begin inline asm 2026-02-21T08:13:38.9464336Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r216], [%rd87, {%r217, %r398}], [%r168]; 2026-02-21T08:13:38.9464831Z // end inline asm 2026-02-21T08:13:38.9465135Z .loc 1 55 44 // 
cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9465582Z bar.sync 0; 2026-02-21T08:13:38.9465746Z elect.sync %r246|%p101, -1; 2026-02-21T08:13:38.9465952Z and.pred %p88, %p1, %p101; 2026-02-21T08:13:38.9466144Z add.s32 %r220, %r35, 77824; 2026-02-21T08:13:38.9466336Z // begin inline asm 2026-02-21T08:13:38.9466736Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r220], [%rd88, {%r217, %r397}], [%r168]; 2026-02-21T08:13:38.9467168Z // end inline asm 2026-02-21T08:13:38.9467476Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9467836Z bar.sync 0; 2026-02-21T08:13:38.9468000Z // begin inline asm 2026-02-21T08:13:38.9468168Z 2026-02-21T08:13:38.9468316Z { 2026-02-21T08:13:38.9468465Z .reg .pred complete; 2026-02-21T08:13:38.9468659Z waitLoop: 2026-02-21T08:13:38.9468971Z mbarrier.try_wait.parity.shared.b64 complete, [%r163], %r500; 2026-02-21T08:13:38.9469271Z @!complete bra.uni waitLoop; 2026-02-21T08:13:38.9469458Z } 2026-02-21T08:13:38.9469539Z 2026-02-21T08:13:38.9469607Z // end inline asm 2026-02-21T08:13:38.9469909Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9470278Z setp.ne.b32 %p102, %r17, 0; 2026-02-21T08:13:38.9470472Z @%p102 bra $L__BB0_4; 2026-02-21T08:13:38.9470702Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:38.9470975Z elect.sync %r251|%p104, -1; 2026-02-21T08:13:38.9471176Z mov.b32 %r248, 135266320; 2026-02-21T08:13:38.9471359Z mov.pred %p103, 0; 2026-02-21T08:13:38.9471538Z // begin inline asm 2026-02-21T08:13:38.9471816Z @%p104 tcgen05.mma.cta_group::1.kind::f16 [ %r498 + 0 ], %rd82, %rd83, %r248, %p103; 2026-02-21T08:13:38.9472130Z // end inline asm 2026-02-21T08:13:38.9472291Z // begin inline asm 2026-02-21T08:13:38.9472562Z @%p104 tcgen05.mma.cta_group::1.kind::f16 [ %r498 + 0 ], %rd84, %rd85, %r248, %p58; 2026-02-21T08:13:38.9472858Z // end inline asm 
2026-02-21T08:13:38.9473033Z add.s32 %r253, %r35, 102464; 2026-02-21T08:13:38.9473230Z cvt.u64.u32 %rd86, %r253; 2026-02-21T08:13:38.9473411Z // begin inline asm 2026-02-21T08:13:38.9473670Z @%p104 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T08:13:38.9473945Z // end inline asm 2026-02-21T08:13:38.9474167Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:38.9474559Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9474953Z bar.sync 0; 2026-02-21T08:13:38.9475110Z // begin inline asm 2026-02-21T08:13:38.9475349Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r254], 12288; 2026-02-21T08:13:38.9475624Z // end inline asm 2026-02-21T08:13:38.9475932Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9476298Z bar.sync 0; 2026-02-21T08:13:38.9476477Z elect.sync %r268|%p112, -1; 2026-02-21T08:13:38.9476684Z and.pred %p109, %p1, %p112; 2026-02-21T08:13:38.9476873Z add.s32 %r255, %r35, 49152; 2026-02-21T08:13:38.9477066Z mov.b32 %r256, 192; 2026-02-21T08:13:38.9477237Z // begin inline asm 2026-02-21T08:13:38.9477654Z @%p109 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r255], [%rd87, {%r256, %r398}], [%r254]; 2026-02-21T08:13:38.9478109Z // end inline asm 2026-02-21T08:13:38.9478416Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9478779Z bar.sync 0; 2026-02-21T08:13:38.9478937Z elect.sync %r269|%p113, -1; 2026-02-21T08:13:38.9479136Z and.pred %p110, %p1, %p113; 2026-02-21T08:13:38.9479323Z add.s32 %r259, %r35, 81920; 2026-02-21T08:13:38.9479533Z // begin inline asm 2026-02-21T08:13:38.9479950Z @%p110 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r259], [%rd88, {%r256, %r397}], [%r254]; 2026-02-21T08:13:38.9480448Z // end inline asm 2026-02-21T08:13:38.9480604Z mov.b32 %r505, 1; 2026-02-21T08:13:38.9480757Z mov.b32 %r504, 6; 
2026-02-21T08:13:38.9480917Z mov.b32 %r502, %r500; 2026-02-21T08:13:38.9481081Z mov.b32 %r503, %r500; 2026-02-21T08:13:38.9481252Z mov.b32 %r506, %r500; 2026-02-21T08:13:38.9481411Z mov.b32 %r507, %r500; 2026-02-21T08:13:38.9481581Z bra.uni $L__BB0_5; 2026-02-21T08:13:38.9481810Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:13:38.9482204Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9482570Z setp.lt.u32 %p124, %r507, 800; 2026-02-21T08:13:38.9482902Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9483247Z // begin inline asm 2026-02-21T08:13:38.9483449Z 2026-02-21T08:13:38.9483657Z { 2026-02-21T08:13:38.9483968Z .reg .pred complete; 2026-02-21T08:13:38.9484204Z waitLoop: 2026-02-21T08:13:38.9484422Z mbarrier.try_wait.parity.shared.b64 complete, [%r501], %r500; 2026-02-21T08:13:38.9484717Z @!complete bra.uni waitLoop; 2026-02-21T08:13:38.9484896Z } 2026-02-21T08:13:38.9484970Z 2026-02-21T08:13:38.9485033Z // end inline asm 2026-02-21T08:13:38.9485196Z add.s32 %r307, %r505, 1; 2026-02-21T08:13:38.9485373Z setp.gt.s32 %p127, %r307, 1; 2026-02-21T08:13:38.9485570Z selp.b32 %r505, 0, %r307, %p127; 2026-02-21T08:13:38.9485769Z selp.b32 %r308, 1, 0, %p127; 2026-02-21T08:13:38.9485957Z xor.b32 %r31, %r506, %r308; 2026-02-21T08:13:38.9486297Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9486664Z add.s32 %r309, %r504, 1; 2026-02-21T08:13:38.9486861Z setp.gt.s32 %p128, %r309, 6; 2026-02-21T08:13:38.9487057Z selp.b32 %r504, 0, %r309, %p128; 2026-02-21T08:13:38.9487269Z shl.b32 %r310, %r504, 3; 2026-02-21T08:13:38.9487454Z add.s32 %r312, %r35, %r310; 2026-02-21T08:13:38.9487655Z add.s32 %r302, %r312, 102400; 2026-02-21T08:13:38.9487846Z bar.sync 0; 2026-02-21T08:13:38.9488017Z and.pred %p121, %p4, %p124; 2026-02-21T08:13:38.9488204Z // begin inline asm 2026-02-21T08:13:38.9488449Z @%p121 
mbarrier.arrive.expect_tx.shared.b64 _, [%r302], 12288; 2026-02-21T08:13:38.9488727Z // end inline asm 2026-02-21T08:13:38.9489034Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9489419Z shl.b32 %r313, %r504, 13; 2026-02-21T08:13:38.9489605Z add.s32 %r299, %r35, %r313; 2026-02-21T08:13:38.9489792Z bar.sync 0; 2026-02-21T08:13:38.9489955Z elect.sync %r314|%p129, -1; 2026-02-21T08:13:38.9490162Z and.pred %p130, %p124, %p129; 2026-02-21T08:13:38.9490362Z and.pred %p122, %p1, %p130; 2026-02-21T08:13:38.9490562Z add.s32 %r300, %r507, 224; 2026-02-21T08:13:38.9490754Z // begin inline asm 2026-02-21T08:13:38.9491180Z @%p122 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r299], [%rd87, {%r300, %r398}], [%r302]; 2026-02-21T08:13:38.9491629Z // end inline asm 2026-02-21T08:13:38.9491938Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 2026-02-21T08:13:38.9492299Z shl.b32 %r315, %r504, 12; 2026-02-21T08:13:38.9492485Z add.s32 %r316, %r35, %r315; 2026-02-21T08:13:38.9492679Z add.s32 %r303, %r316, 57344; 2026-02-21T08:13:38.9492865Z bar.sync 0; 2026-02-21T08:13:38.9493026Z elect.sync %r317|%p131, -1; 2026-02-21T08:13:38.9493231Z and.pred %p132, %p124, %p131; 2026-02-21T08:13:38.9493431Z and.pred %p123, %p1, %p132; 2026-02-21T08:13:38.9493628Z // begin inline asm 2026-02-21T08:13:38.9494033Z @%p123 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r303], [%rd88, {%r300, %r397}], [%r302]; 2026-02-21T08:13:38.9494483Z // end inline asm 2026-02-21T08:13:38.9494864Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9495327Z setp.lt.u32 %p133, %r507, 960; 2026-02-21T08:13:38.9495534Z add.s32 %r507, %r507, 32; 2026-02-21T08:13:38.9495718Z mov.b32 %r500, %r506; 2026-02-21T08:13:38.9495900Z mov.b32 %r501, %r318; 2026-02-21T08:13:38.9496073Z mov.b32 %r506, %r31; 2026-02-21T08:13:38.9496256Z @%p133 bra 
$L__BB0_5; 2026-02-21T08:13:38.9496440Z bra.uni $L__BB0_8; 2026-02-21T08:13:38.9496653Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:13:38.9496932Z // => This Inner Loop Header: Depth=2 2026-02-21T08:13:38.9497309Z .loc 1 50 42 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:50:42 2026-02-21T08:13:38.9497679Z add.s32 %r272, %r503, 1; 2026-02-21T08:13:38.9497865Z setp.gt.s32 %p115, %r272, 6; 2026-02-21T08:13:38.9498074Z selp.b32 %r503, 0, %r272, %p115; 2026-02-21T08:13:38.9498278Z selp.b32 %r273, 1, 0, %p115; 2026-02-21T08:13:38.9498546Z xor.b32 %r502, %r502, %r273; 2026-02-21T08:13:38.9498737Z shl.b32 %r274, %r503, 3; 2026-02-21T08:13:38.9498929Z add.s32 %r276, %r35, %r274; 2026-02-21T08:13:38.9499115Z add.s32 %r270, %r276, 102400; 2026-02-21T08:13:38.9499310Z bar.sync 0; 2026-02-21T08:13:38.9499473Z // begin inline asm 2026-02-21T08:13:38.9499636Z 2026-02-21T08:13:38.9499777Z { 2026-02-21T08:13:38.9499922Z .reg .pred complete; 2026-02-21T08:13:38.9500106Z waitLoop: 2026-02-21T08:13:38.9500334Z mbarrier.try_wait.parity.shared.b64 complete, [%r270], %r502; 2026-02-21T08:13:38.9500623Z @!complete bra.uni waitLoop; 2026-02-21T08:13:38.9500805Z } 2026-02-21T08:13:38.9500889Z 2026-02-21T08:13:38.9500954Z // end inline asm 2026-02-21T08:13:38.9501120Z shl.b32 %r277, %r505, 3; 2026-02-21T08:13:38.9501307Z add.s32 %r278, %r35, %r277; 2026-02-21T08:13:38.9501500Z add.s32 %r318, %r278, 102464; 2026-02-21T08:13:38.9501823Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9502188Z @%p102 bra $L__BB0_7; 2026-02-21T08:13:38.9502421Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:13:38.9502815Z .loc 1 54 31 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:54:31 2026-02-21T08:13:38.9503148Z shl.b32 %r283, %r503, 13; 2026-02-21T08:13:38.9503325Z add.s32 %r285, %r35, %r283; 2026-02-21T08:13:38.9503625Z .loc 1 55 44 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:55:44 
2026-02-21T08:13:38.9503951Z shl.b32 %r286, %r503, 12; 2026-02-21T08:13:38.9504129Z add.s32 %r287, %r35, %r286; 2026-02-21T08:13:38.9504299Z add.s32 %r288, %r287, 57344; 2026-02-21T08:13:38.9504494Z .loc 1 56 52 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:56:52 2026-02-21T08:13:38.9504565Z elect.sync %r289|%p117, -1; 2026-02-21T08:13:38.9504638Z bfe.u32 %r290, %r285, 4, 14; 2026-02-21T08:13:38.9504752Z cvt.u64.u32 %rd94, %r290; 2026-02-21T08:13:38.9504848Z or.b64 %rd89, %rd94, -9223371899382267904; 2026-02-21T08:13:38.9504917Z bfe.u32 %r291, %r288, 4, 14; 2026-02-21T08:13:38.9504981Z cvt.u64.u32 %rd95, %r291; 2026-02-21T08:13:38.9505068Z or.b64 %rd90, %rd95, -9223371899399045120; 2026-02-21T08:13:38.9505135Z mov.b32 %r280, 135266320; 2026-02-21T08:13:38.9505209Z mov.pred %p116, -1; 2026-02-21T08:13:38.9505280Z // begin inline asm 2026-02-21T08:13:38.9505468Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r498 + 0 ], %rd89, %rd90, %r280, %p116; 2026-02-21T08:13:38.9505535Z // end inline asm 2026-02-21T08:13:38.9505603Z add.s32 %r292, %r285, 32; 2026-02-21T08:13:38.9505683Z bfe.u32 %r293, %r292, 4, 14; 2026-02-21T08:13:38.9505753Z cvt.u64.u32 %rd96, %r293; 2026-02-21T08:13:38.9505838Z or.b64 %rd91, %rd96, -9223371899382267904; 2026-02-21T08:13:38.9505913Z add.s32 %r294, %r287, 57376; 2026-02-21T08:13:38.9505983Z bfe.u32 %r295, %r294, 4, 14; 2026-02-21T08:13:38.9506053Z cvt.u64.u32 %rd97, %r295; 2026-02-21T08:13:38.9506136Z or.b64 %rd92, %rd97, -9223371899399045120; 2026-02-21T08:13:38.9506317Z // begin inline asm 2026-02-21T08:13:38.9506487Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r498 + 0 ], %rd91, %rd92, %r280, %p116; 2026-02-21T08:13:38.9506553Z // end inline asm 2026-02-21T08:13:38.9506643Z cvt.u64.u32 %rd93, %r318; 2026-02-21T08:13:38.9506708Z // begin inline asm 2026-02-21T08:13:38.9506861Z @%p117 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd93]; 2026-02-21T08:13:38.9506934Z // end inline asm 2026-02-21T08:13:38.9507001Z bra.uni 
$L__BB0_7; 2026-02-21T08:13:38.9507097Z $L__BB0_9: // %._crit_edge 2026-02-21T08:13:38.9507301Z .loc 1 33 74 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:33:74 2026-02-21T08:13:38.9507395Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:38.9507459Z bar.sync 0; 2026-02-21T08:13:38.9507735Z .loc 1 33 4 // cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py:33:4 2026-02-21T08:13:38.9507806Z bar.sync 0; 2026-02-21T08:13:38.9507876Z // begin inline asm 2026-02-21T08:13:38.9508010Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r498, 64; 2026-02-21T08:13:38.9508080Z // end inline asm 2026-02-21T08:13:38.9508142Z ret; 2026-02-21T08:13:38.9508206Z $L__tmp1: 2026-02-21T08:13:38.9508270Z $L__func_end0: 2026-02-21T08:13:38.9508373Z // -- End function 2026-02-21T08:13:38.9508433Z } 2026-02-21T08:13:38.9508683Z .file 1 "/tmp/torchinductor_root/ej/cejm7lnsgtbbkszuar66tfsmxjhckj7rwiaexvchjzw32xgm5vf7.py" 2026-02-21T08:13:38.9508764Z .section .debug_abbrev 2026-02-21T08:13:38.9508824Z { 2026-02-21T08:13:38.9508925Z .b8 1 // Abbreviation Code 2026-02-21T08:13:38.9509026Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:13:38.9509130Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:13:38.9509226Z .b8 37 // DW_AT_producer 2026-02-21T08:13:38.9509318Z .b8 8 // DW_FORM_string 2026-02-21T08:13:38.9509415Z .b8 19 // DW_AT_language 2026-02-21T08:13:38.9509507Z .b8 5 // DW_FORM_data2 2026-02-21T08:13:38.9509596Z .b8 3 // DW_AT_name 2026-02-21T08:13:38.9509691Z .b8 8 // DW_FORM_string 2026-02-21T08:13:38.9509782Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:13:38.9509872Z .b8 6 // DW_FORM_data4 2026-02-21T08:13:38.9509957Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:13:38.9510050Z .b8 8 // DW_FORM_string 2026-02-21T08:13:38.9510133Z .b8 0 // EOM(1) 2026-02-21T08:13:38.9510213Z .b8 0 // EOM(2) 2026-02-21T08:13:38.9510302Z .b8 0 // EOM(3) 2026-02-21T08:13:38.9510361Z } 2026-02-21T08:13:38.9510434Z .section .debug_info 2026-02-21T08:13:38.9510499Z { 2026-02-21T08:13:38.9510594Z .b32 104 // 
Length of Unit 2026-02-21T08:13:38.9510694Z .b8 2 // DWARF version number 2026-02-21T08:13:38.9510754Z .b8 0 2026-02-21T08:13:38.9510898Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:13:38.9511001Z .b8 8 // Address Size (in bytes) 2026-02-21T08:13:38.9511118Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:13:38.9511218Z .b8 116 // DW_AT_producer 2026-02-21T08:13:38.9511281Z .b8 114 2026-02-21T08:13:38.9511341Z .b8 105 2026-02-21T08:13:38.9511399Z .b8 116 2026-02-21T08:13:38.9511468Z .b8 111 2026-02-21T08:13:38.9511527Z .b8 110 2026-02-21T08:13:38.9511587Z .b8 0 2026-02-21T08:13:38.9511681Z .b8 2 // DW_AT_language 2026-02-21T08:13:38.9511789Z .b8 0 2026-02-21T08:13:38.9511878Z .b8 99 // DW_AT_name 2026-02-21T08:13:38.9511937Z .b8 101 2026-02-21T08:13:38.9512003Z .b8 106 2026-02-21T08:13:38.9512061Z .b8 109 2026-02-21T08:13:38.9512121Z .b8 55 2026-02-21T08:13:38.9512185Z .b8 108 2026-02-21T08:13:38.9512244Z .b8 110 2026-02-21T08:13:38.9512301Z .b8 115 2026-02-21T08:13:38.9512360Z .b8 103 2026-02-21T08:13:38.9512429Z .b8 116 2026-02-21T08:13:38.9512489Z .b8 98 2026-02-21T08:13:38.9512547Z .b8 98 2026-02-21T08:13:38.9512611Z .b8 107 2026-02-21T08:13:38.9512669Z .b8 115 2026-02-21T08:13:38.9512726Z .b8 122 2026-02-21T08:13:38.9512784Z .b8 117 2026-02-21T08:13:38.9512851Z .b8 97 2026-02-21T08:13:38.9512910Z .b8 114 2026-02-21T08:13:38.9512969Z .b8 54 2026-02-21T08:13:38.9513034Z .b8 54 2026-02-21T08:13:38.9513093Z .b8 116 2026-02-21T08:13:38.9513151Z .b8 102 2026-02-21T08:13:38.9513208Z .b8 115 2026-02-21T08:13:38.9513274Z .b8 109 2026-02-21T08:13:38.9513332Z .b8 120 2026-02-21T08:13:38.9513448Z .b8 106 2026-02-21T08:13:38.9513511Z .b8 104 2026-02-21T08:13:38.9513578Z .b8 99 2026-02-21T08:13:38.9513637Z .b8 107 2026-02-21T08:13:38.9513695Z .b8 106 2026-02-21T08:13:38.9513760Z .b8 55 2026-02-21T08:13:38.9513819Z .b8 114 2026-02-21T08:13:38.9513876Z .b8 119 2026-02-21T08:13:38.9513933Z .b8 105 2026-02-21T08:13:38.9513998Z .b8 97 
2026-02-21T08:13:38.9514055Z .b8 101 2026-02-21T08:13:38.9514112Z .b8 120 2026-02-21T08:13:38.9514176Z .b8 118 2026-02-21T08:13:38.9514233Z .b8 99 2026-02-21T08:13:38.9514291Z .b8 104 2026-02-21T08:13:38.9514350Z .b8 106 2026-02-21T08:13:38.9514416Z .b8 122 2026-02-21T08:13:38.9514473Z .b8 119 2026-02-21T08:13:38.9514532Z .b8 51 2026-02-21T08:13:38.9514591Z .b8 50 2026-02-21T08:13:38.9514658Z .b8 120 2026-02-21T08:13:38.9514772Z .b8 103 2026-02-21T08:13:38.9514832Z .b8 109 2026-02-21T08:13:38.9514899Z .b8 53 2026-02-21T08:13:38.9514958Z .b8 118 2026-02-21T08:13:38.9515017Z .b8 102 2026-02-21T08:13:38.9515077Z .b8 55 2026-02-21T08:13:38.9515144Z .b8 46 2026-02-21T08:13:38.9515207Z .b8 112 2026-02-21T08:13:38.9515271Z .b8 121 2026-02-21T08:13:38.9515337Z .b8 0 2026-02-21T08:13:38.9515451Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:13:38.9515542Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:13:38.9515602Z .b8 116 2026-02-21T08:13:38.9515671Z .b8 109 2026-02-21T08:13:38.9515731Z .b8 112 2026-02-21T08:13:38.9515791Z .b8 47 2026-02-21T08:13:38.9515860Z .b8 116 2026-02-21T08:13:38.9515921Z .b8 111 2026-02-21T08:13:38.9515982Z .b8 114 2026-02-21T08:13:38.9516051Z .b8 99 2026-02-21T08:13:38.9516115Z .b8 104 2026-02-21T08:13:38.9516170Z .b8 105 2026-02-21T08:13:38.9516226Z .b8 110 2026-02-21T08:13:38.9516289Z .b8 100 2026-02-21T08:13:38.9516345Z .b8 117 2026-02-21T08:13:38.9516400Z .b8 99 2026-02-21T08:13:38.9516456Z .b8 116 2026-02-21T08:13:38.9516520Z .b8 111 2026-02-21T08:13:38.9516575Z .b8 114 2026-02-21T08:13:38.9516630Z .b8 95 2026-02-21T08:13:38.9516686Z .b8 114 2026-02-21T08:13:38.9516751Z .b8 111 2026-02-21T08:13:38.9516808Z .b8 111 2026-02-21T08:13:38.9516866Z .b8 116 2026-02-21T08:13:38.9516928Z .b8 47 2026-02-21T08:13:38.9516984Z .b8 101 2026-02-21T08:13:38.9517039Z .b8 106 2026-02-21T08:13:38.9517093Z .b8 0 2026-02-21T08:13:38.9517156Z } 2026-02-21T08:13:38.9517229Z .section .debug_macinfo { } 2026-02-21T08:13:38.9517234Z 2026-02-21T08:13:38.9517321Z 
================================================================ 2026-02-21T08:13:38.9517443Z please share the reproducer above with Triton project. 2026-02-21T08:13:39.2484413Z 2026-02-21T08:13:39.2484510Z 2026-02-21T08:13:39.2484541Z 2026-02-21T08:13:39.2485113Z ================================================================ 2026-02-21T08:13:39.2485499Z Internal Triton PTX codegen error 2026-02-21T08:13:39.2485779Z `ptxas` stderr: 2026-02-21T08:13:39.2486361Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 273 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:39.2487063Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:39.2487635Z 2026-02-21T08:13:39.2488183Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp660h0m4w.ptx -o /tmp/tmp660h0m4w.ptx.o 2026-02-21T08:13:39.2488765Z 2026-02-21T08:13:39.2488770Z 2026-02-21T08:13:39.2488850Z // 2026-02-21T08:13:39.2489033Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:13:39.2489266Z // 2026-02-21T08:13:39.2489354Z 2026-02-21T08:13:39.2489426Z .version 8.7 2026-02-21T08:13:39.2489606Z .target sm_100a 2026-02-21T08:13:39.2489774Z .address_size 64 2026-02-21T08:13:39.2489887Z 2026-02-21T08:13:39.2490048Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:13:39.2490379Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:13:39.2490659Z // @_helion_matmul 2026-02-21T08:13:39.2490930Z .visible .entry _helion_matmul( 2026-02-21T08:13:39.2491358Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:13:39.2491712Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:13:39.2492038Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:13:39.2492362Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:13:39.2492683Z .param .u64 .ptr 
.global .align 1 _helion_matmul_param_4 2026-02-21T08:13:39.2492959Z ) 2026-02-21T08:13:39.2493123Z .reqntid 128 2026-02-21T08:13:39.2493288Z .maxnreg 32 2026-02-21T08:13:39.2493464Z { 2026-02-21T08:13:39.2493627Z .reg .pred %p<167>; 2026-02-21T08:13:39.2493830Z .reg .b32 %r<1212>; 2026-02-21T08:13:39.2494006Z .reg .b64 %rd<621>; 2026-02-21T08:13:39.2494359Z .loc 1 19 0 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:19:0 2026-02-21T08:13:39.2494845Z $L__func_begin0: 2026-02-21T08:13:39.2495203Z .loc 1 19 0 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:19:0 2026-02-21T08:13:39.2495526Z 2026-02-21T08:13:39.2495610Z // %bb.0: 2026-02-21T08:13:39.2495822Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:13:39.2496079Z $L__tmp0: 2026-02-21T08:13:39.2496379Z .loc 1 19 0 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:19 2026-02-21T08:13:39.2496756Z mov.u32 %r1, %tid.x; 2026-02-21T08:13:39.2496977Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:13:39.2497247Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:13:39.2497474Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T08:13:39.2497731Z mov.b32 %r31, global_smem; 2026-02-21T08:13:39.2497941Z // begin inline asm 2026-02-21T08:13:39.2498254Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r31], 256; 2026-02-21T08:13:39.2498582Z // end inline asm 2026-02-21T08:13:39.2498788Z ld.param.b64 %rd58, [_helion_matmul_param_3]; 2026-02-21T08:13:39.2499030Z bar.sync 0; 2026-02-21T08:13:39.2499215Z ld.shared.b32 %r1203, [global_smem]; 2026-02-21T08:13:39.2499452Z bar.sync 0; 2026-02-21T08:13:39.2499624Z // begin inline asm 2026-02-21T08:13:39.2499890Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:13:39.2500186Z // end inline asm 2026-02-21T08:13:39.2500505Z .loc 1 21 67 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:21:67 2026-02-21T08:13:39.2500885Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:13:39.2501076Z mov.u32 %r56, 
%ctaid.y; 2026-02-21T08:13:39.2501273Z mov.u32 %r57, %ctaid.z; 2026-02-21T08:13:39.2501460Z mov.u32 %r58, %nctaid.x; 2026-02-21T08:13:39.2501655Z mov.u32 %r59, %nctaid.y; 2026-02-21T08:13:39.2501850Z mad.lo.s32 %r60, %r57, %r59, %r56; 2026-02-21T08:13:39.2502080Z mad.lo.s32 %r61, %r60, %r58, %r3; 2026-02-21T08:13:39.2502301Z mul.lo.s32 %r62, %r61, 384; 2026-02-21T08:13:39.2502501Z cvt.s64.s32 %rd59, %r62; 2026-02-21T08:13:39.2502701Z add.s64 %rd19, %rd58, %rd59; 2026-02-21T08:13:39.2502896Z shl.b32 %r63, %r1, 2; 2026-02-21T08:13:39.2503615Z [98s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:13:39.2505486Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:13:39.2507203Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:13:39.2507508Z `ptxas` stderr: 2026-02-21T08:13:39.2508061Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 273 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:39.2508778Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:39.2508981Z 2026-02-21T08:13:39.2509501Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp660h0m4w.ptx -o /tmp/tmp660h0m4w.ptx.o 2026-02-21T08:13:39.2510063Z 2026-02-21T08:13:39.2510239Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:13:39.2510561Z add.s32 %r32, %r31, %r63; 2026-02-21T08:13:39.2510760Z mov.b32 %r41, 0; 2026-02-21T08:13:39.2510933Z // begin inline asm 2026-02-21T08:13:39.2511142Z @%p1 st.shared.b32 [ %r32 + 0 ], %r41; 2026-02-21T08:13:39.2511367Z // end inline asm 2026-02-21T08:13:39.2511567Z bar.warp.sync -1; 2026-02-21T08:13:39.2511767Z setp.eq.b32 %p154, %r1, 0; 2026-02-21T08:13:39.2511970Z cvt.u64.u32 %rd4, %r31; 2026-02-21T08:13:39.2512163Z // begin inline asm 2026-02-21T08:13:39.2512477Z @%p154 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:13:39.2512846Z // end inline asm 2026-02-21T08:13:39.2513013Z // begin inline asm 2026-02-21T08:13:39.2513300Z @%p154 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.2513613Z // end inline asm 2026-02-21T08:13:39.2513781Z mov.b32 %r34, 32; 2026-02-21T08:13:39.2513956Z // begin inline asm 2026-02-21T08:13:39.2514250Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r34; 2026-02-21T08:13:39.2514597Z // end inline asm 2026-02-21T08:13:39.2514809Z mov.b32 %r35, 256; 2026-02-21T08:13:39.2514985Z // begin inline asm 2026-02-21T08:13:39.2515272Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:13:39.2515615Z // end inline asm 2026-02-21T08:13:39.2515775Z mov.b32 %r36, 1024; 2026-02-21T08:13:39.2515953Z // begin inline asm 2026-02-21T08:13:39.2516252Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r36; 2026-02-21T08:13:39.2516585Z // end inline asm 2026-02-21T08:13:39.2516753Z mov.b32 %r37, 4096; 2026-02-21T08:13:39.2516921Z // begin inline asm 2026-02-21T08:13:39.2517214Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:13:39.2517535Z // end inline asm 2026-02-21T08:13:39.2517701Z mov.b64 %rd12, 2048; 2026-02-21T08:13:39.2517876Z // begin inline asm 
2026-02-21T08:13:39.2518184Z @%p154 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.2518532Z // end inline asm 2026-02-21T08:13:39.2518687Z mov.b32 %r38, 1; 2026-02-21T08:13:39.2518849Z // begin inline asm 2026-02-21T08:13:39.2519152Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r38; 2026-02-21T08:13:39.2519512Z // end inline asm 2026-02-21T08:13:39.2519668Z // begin inline asm 2026-02-21T08:13:39.2519986Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r38; 2026-02-21T08:13:39.2520337Z // end inline asm 2026-02-21T08:13:39.2520497Z // begin inline asm 2026-02-21T08:13:39.2520865Z @%p154 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.2521179Z // end inline asm 2026-02-21T08:13:39.2521342Z // begin inline asm 2026-02-21T08:13:39.2521651Z @%p154 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2521999Z // end inline asm 2026-02-21T08:13:39.2522164Z // begin inline asm 2026-02-21T08:13:39.2522445Z @%p154 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T08:13:39.2522790Z // end inline asm 2026-02-21T08:13:39.2522951Z // begin inline asm 2026-02-21T08:13:39.2523244Z @%p154 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2523573Z // end inline asm 2026-02-21T08:13:39.2523749Z // begin inline asm 2026-02-21T08:13:39.2524263Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.2524786Z // end inline asm 2026-02-21T08:13:39.2524957Z // begin inline asm 2026-02-21T08:13:39.2525206Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:13:39.2525518Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.2525744Z @%p1 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.2525964Z // end inline asm 2026-02-21T08:13:39.2526118Z bar.sync 0; 2026-02-21T08:13:39.2526296Z cvta.global.u64 %rd89, %rd19; 2026-02-21T08:13:39.2526654Z .loc 1 22 67 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:22:67 2026-02-21T08:13:39.2527028Z add.s32 %r64, %r62, 128; 2026-02-21T08:13:39.2527220Z cvt.s64.s32 %rd60, %r64; 2026-02-21T08:13:39.2527405Z add.s64 %rd37, %rd58, %rd60; 2026-02-21T08:13:39.2527598Z bar.sync 0; 2026-02-21T08:13:39.2527751Z // begin inline asm 2026-02-21T08:13:39.2527936Z @%p1 st.shared.b32 [ %r32 + 0 ], %r41; 2026-02-21T08:13:39.2528127Z // end inline asm 2026-02-21T08:13:39.2528292Z bar.warp.sync -1; 2026-02-21T08:13:39.2528450Z // begin inline asm 2026-02-21T08:13:39.2528735Z @%p154 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:13:39.2529057Z // end inline asm 2026-02-21T08:13:39.2529204Z // begin inline asm 2026-02-21T08:13:39.2529460Z @%p154 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.2529741Z // end inline asm 2026-02-21T08:13:39.2529895Z // begin inline asm 2026-02-21T08:13:39.2530157Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r34; 2026-02-21T08:13:39.2530471Z // end inline asm 2026-02-21T08:13:39.2530625Z mov.b32 %r385, 128; 2026-02-21T08:13:39.2530781Z // begin inline asm 2026-02-21T08:13:39.2531050Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r385; 2026-02-21T08:13:39.2531355Z // end inline asm 2026-02-21T08:13:39.2531512Z // begin inline asm 2026-02-21T08:13:39.2531792Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r36; 2026-02-21T08:13:39.2532121Z // end inline asm 2026-02-21T08:13:39.2532268Z // begin inline asm 2026-02-21T08:13:39.2532543Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r36; 2026-02-21T08:13:39.2532860Z 
// end inline asm 2026-02-21T08:13:39.2533006Z // begin inline asm 2026-02-21T08:13:39.2533293Z @%p154 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.2533612Z // end inline asm 2026-02-21T08:13:39.2533769Z // begin inline asm 2026-02-21T08:13:39.2534053Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r38; 2026-02-21T08:13:39.2534384Z // end inline asm 2026-02-21T08:13:39.2534544Z // begin inline asm 2026-02-21T08:13:39.2534907Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r38; 2026-02-21T08:13:39.2535263Z // end inline asm 2026-02-21T08:13:39.2535428Z // begin inline asm 2026-02-21T08:13:39.2535811Z @%p154 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.2536139Z // end inline asm 2026-02-21T08:13:39.2536316Z // begin inline asm 2026-02-21T08:13:39.2536626Z @%p154 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2536963Z // end inline asm 2026-02-21T08:13:39.2537125Z // begin inline asm 2026-02-21T08:13:39.2537410Z @%p154 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T08:13:39.2537742Z // end inline asm 2026-02-21T08:13:39.2537900Z // begin inline asm 2026-02-21T08:13:39.2538180Z @%p154 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2538507Z // end inline asm 2026-02-21T08:13:39.2538664Z // begin inline asm 2026-02-21T08:13:39.2539088Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.2539630Z // end inline asm 2026-02-21T08:13:39.2539801Z // begin inline asm 2026-02-21T08:13:39.2540047Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:13:39.2540352Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.2540575Z @%p1 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.2540788Z // end inline asm 2026-02-21T08:13:39.2540947Z bar.sync 0; 2026-02-21T08:13:39.2541110Z cvta.global.u64 %rd90, %rd37; 2026-02-21T08:13:39.2541452Z .loc 1 24 71 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:24:71 2026-02-21T08:13:39.2541802Z add.s32 %r65, %r62, 256; 2026-02-21T08:13:39.2541990Z cvt.s64.s32 %rd61, %r65; 2026-02-21T08:13:39.2542177Z add.s64 %rd55, %rd58, %rd61; 2026-02-21T08:13:39.2542368Z bar.sync 0; 2026-02-21T08:13:39.2542518Z // begin inline asm 2026-02-21T08:13:39.2542698Z @%p1 st.shared.b32 [ %r32 + 0 ], %r41; 2026-02-21T08:13:39.2542903Z // end inline asm 2026-02-21T08:13:39.2543065Z bar.warp.sync -1; 2026-02-21T08:13:39.2543237Z // begin inline asm 2026-02-21T08:13:39.2543526Z @%p154 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T08:13:39.2543858Z // end inline asm 2026-02-21T08:13:39.2544013Z // begin inline asm 2026-02-21T08:13:39.2544284Z @%p154 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.2544580Z // end inline asm 2026-02-21T08:13:39.2544790Z mov.b32 %r367, 64; 2026-02-21T08:13:39.2544959Z // begin inline asm 2026-02-21T08:13:39.2545240Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r367; 2026-02-21T08:13:39.2545564Z // end inline asm 2026-02-21T08:13:39.2545721Z // begin inline asm 2026-02-21T08:13:39.2546000Z @%p154 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:13:39.2546314Z // end inline asm 2026-02-21T08:13:39.2546478Z // begin inline asm 2026-02-21T08:13:39.2546763Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r36; 2026-02-21T08:13:39.2547095Z // end inline asm 2026-02-21T08:13:39.2547261Z // begin inline asm 2026-02-21T08:13:39.2547548Z @%p154 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:13:39.2547894Z // 
end inline asm 2026-02-21T08:13:39.2548049Z // begin inline asm 2026-02-21T08:13:39.2548358Z @%p154 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.2548693Z // end inline asm 2026-02-21T08:13:39.2548856Z // begin inline asm 2026-02-21T08:13:39.2549159Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r38; 2026-02-21T08:13:39.2549492Z // end inline asm 2026-02-21T08:13:39.2549653Z // begin inline asm 2026-02-21T08:13:39.2549943Z @%p154 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r38; 2026-02-21T08:13:39.2550285Z // end inline asm 2026-02-21T08:13:39.2550443Z // begin inline asm 2026-02-21T08:13:39.2550723Z @%p154 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.2551119Z // end inline asm 2026-02-21T08:13:39.2551277Z // begin inline asm 2026-02-21T08:13:39.2551583Z @%p154 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2551929Z // end inline asm 2026-02-21T08:13:39.2552091Z // begin inline asm 2026-02-21T08:13:39.2552378Z @%p154 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T08:13:39.2552712Z // end inline asm 2026-02-21T08:13:39.2552868Z // begin inline asm 2026-02-21T08:13:39.2553143Z @%p154 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.2553463Z // end inline asm 2026-02-21T08:13:39.2553617Z // begin inline asm 2026-02-21T08:13:39.2554051Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.2554588Z // end inline asm 2026-02-21T08:13:39.2554786Z // begin inline asm 2026-02-21T08:13:39.2555039Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T08:13:39.2555351Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.2555581Z @%p1 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.2555790Z // end inline asm 2026-02-21T08:13:39.2555957Z bar.sync 0; 2026-02-21T08:13:39.2556127Z cvta.global.u64 %rd108, %rd55; 2026-02-21T08:13:39.2556482Z .loc 1 33 74 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:33:74 2026-02-21T08:13:39.2556867Z setp.gt.u32 %p57, %r3, 127; 2026-02-21T08:13:39.2557072Z @%p57 bra $L__BB0_8; 2026-02-21T08:13:39.2557268Z // %bb.1: // %.lr.ph 2026-02-21T08:13:39.2557643Z .loc 1 0 74 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:0:74 2026-02-21T08:13:39.2558014Z shl.b32 %r403, %r1, 7; 2026-02-21T08:13:39.2558201Z and.b32 %r404, %r403, 16256; 2026-02-21T08:13:39.2558402Z shl.b32 %r405, %r1, 4; 2026-02-21T08:13:39.2558582Z and.b32 %r406, %r405, 112; 2026-02-21T08:13:39.2558777Z or.b32 %r407, %r404, %r406; 2026-02-21T08:13:39.2558964Z add.s32 %r408, %r31, 114688; 2026-02-21T08:13:39.2559159Z xor.b32 %r409, %r407, 16; 2026-02-21T08:13:39.2559340Z xor.b32 %r410, %r407, 32; 2026-02-21T08:13:39.2559538Z xor.b32 %r411, %r407, 48; 2026-02-21T08:13:39.2559711Z xor.b32 %r412, %r407, 64; 2026-02-21T08:13:39.2559878Z xor.b32 %r413, %r407, 80; 2026-02-21T08:13:39.2560055Z xor.b32 %r414, %r407, 96; 2026-02-21T08:13:39.2560225Z xor.b32 %r415, %r407, 112; 2026-02-21T08:13:39.2560409Z shr.u32 %r416, %r1, 5; 2026-02-21T08:13:39.2560713Z .loc 1 44 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:44:27 2026-02-21T08:13:39.2561055Z shl.b32 %r417, %r3, 7; 2026-02-21T08:13:39.2561222Z and.b32 %r462, %r417, 896; 2026-02-21T08:13:39.2561540Z .loc 1 45 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:45:27 2026-02-21T08:13:39.2561878Z shl.b32 %r418, %r3, 5; 2026-02-21T08:13:39.2562043Z and.b32 %r811, %r418, 3840; 2026-02-21T08:13:39.2562361Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2562707Z shfl.sync.idx.b32 %r14, %r416, 0, 31, -1; 2026-02-21T08:13:39.2562924Z shl.b32 %r419, %r14, 21; 
2026-02-21T08:13:39.2563099Z and.b32 %r420, %r419, 6291456; 2026-02-21T08:13:39.2563291Z add.s32 %r809, %r420, %r1203; 2026-02-21T08:13:39.2563471Z mov.pred %p117, -1; 2026-02-21T08:13:39.2563643Z // begin inline asm 2026-02-21T08:13:39.2564052Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 0], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2564465Z // end inline asm 2026-02-21T08:13:39.2564636Z // begin inline asm 2026-02-21T08:13:39.2565099Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 16], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2565629Z // end inline asm 2026-02-21T08:13:39.2565792Z // begin inline asm 2026-02-21T08:13:39.2566188Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 32], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2566644Z // end inline asm 2026-02-21T08:13:39.2566797Z // begin inline asm 2026-02-21T08:13:39.2567181Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 48], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2567597Z // end inline asm 2026-02-21T08:13:39.2567753Z // begin inline asm 2026-02-21T08:13:39.2568116Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 64], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2568522Z // end inline asm 2026-02-21T08:13:39.2569129Z // begin inline asm 2026-02-21T08:13:39.2569509Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 80], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2569928Z // end inline asm 2026-02-21T08:13:39.2570076Z // begin inline asm 2026-02-21T08:13:39.2570451Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 
96], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2570875Z // end inline asm 2026-02-21T08:13:39.2571027Z // begin inline asm 2026-02-21T08:13:39.2571409Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 112], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2571820Z // end inline asm 2026-02-21T08:13:39.2571977Z // begin inline asm 2026-02-21T08:13:39.2572349Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 128], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2572771Z // end inline asm 2026-02-21T08:13:39.2572927Z // begin inline asm 2026-02-21T08:13:39.2573309Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 144], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2573720Z // end inline asm 2026-02-21T08:13:39.2573870Z // begin inline asm 2026-02-21T08:13:39.2574245Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 160], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2574752Z // end inline asm 2026-02-21T08:13:39.2574921Z // begin inline asm 2026-02-21T08:13:39.2575339Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 176], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2575805Z // end inline asm 2026-02-21T08:13:39.2575976Z // begin inline asm 2026-02-21T08:13:39.2576388Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 192], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2576806Z // end inline asm 2026-02-21T08:13:39.2576956Z // begin inline asm 2026-02-21T08:13:39.2577330Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 208], {%r41, 
%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2577758Z // end inline asm 2026-02-21T08:13:39.2577904Z // begin inline asm 2026-02-21T08:13:39.2578281Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 224], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2578703Z // end inline asm 2026-02-21T08:13:39.2578858Z // begin inline asm 2026-02-21T08:13:39.2579225Z @%p117 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 240], {%r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41, %r41}; 2026-02-21T08:13:39.2579696Z // end inline asm 2026-02-21T08:13:39.2579851Z // begin inline asm 2026-02-21T08:13:39.2580024Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:13:39.2580217Z // end inline asm 2026-02-21T08:13:39.2580364Z bar.sync 0; 2026-02-21T08:13:39.2580656Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2580986Z add.s32 %r1205, %r31, 237632; 2026-02-21T08:13:39.2581169Z // begin inline asm 2026-02-21T08:13:39.2581356Z @%p154 mbarrier.init.shared::cta.b64 [%r1205], 1; 2026-02-21T08:13:39.2581588Z // end inline asm 2026-02-21T08:13:39.2581748Z bar.sync 0; 2026-02-21T08:13:39.2581906Z add.s32 %r339, %r31, 237640; 2026-02-21T08:13:39.2582099Z // begin inline asm 2026-02-21T08:13:39.2582298Z @%p154 mbarrier.init.shared::cta.b64 [%r339], 1; 2026-02-21T08:13:39.2582530Z // end inline asm 2026-02-21T08:13:39.2582689Z add.s32 %r340, %r31, 237568; 2026-02-21T08:13:39.2582963Z // begin inline asm 2026-02-21T08:13:39.2583158Z @%p154 mbarrier.init.shared::cta.b64 [%r340], 1; 2026-02-21T08:13:39.2583395Z // end inline asm 2026-02-21T08:13:39.2583555Z bar.sync 0; 2026-02-21T08:13:39.2583709Z add.s32 %r341, %r31, 237576; 2026-02-21T08:13:39.2583902Z // begin inline asm 2026-02-21T08:13:39.2584087Z @%p154 mbarrier.init.shared::cta.b64 [%r341], 1; 
2026-02-21T08:13:39.2584304Z // end inline asm 2026-02-21T08:13:39.2584450Z bar.sync 0; 2026-02-21T08:13:39.2584606Z add.s32 %r342, %r31, 237584; 2026-02-21T08:13:39.2584834Z // begin inline asm 2026-02-21T08:13:39.2585037Z @%p154 mbarrier.init.shared::cta.b64 [%r342], 1; 2026-02-21T08:13:39.2585261Z // end inline asm 2026-02-21T08:13:39.2585424Z bar.sync 0; 2026-02-21T08:13:39.2585585Z add.s32 %r343, %r31, 237592; 2026-02-21T08:13:39.2585766Z // begin inline asm 2026-02-21T08:13:39.2585967Z @%p154 mbarrier.init.shared::cta.b64 [%r343], 1; 2026-02-21T08:13:39.2586190Z // end inline asm 2026-02-21T08:13:39.2586354Z bar.sync 0; 2026-02-21T08:13:39.2586511Z add.s32 %r344, %r31, 237600; 2026-02-21T08:13:39.2586708Z // begin inline asm 2026-02-21T08:13:39.2586905Z @%p154 mbarrier.init.shared::cta.b64 [%r344], 1; 2026-02-21T08:13:39.2587145Z // end inline asm 2026-02-21T08:13:39.2587314Z bar.sync 0; 2026-02-21T08:13:39.2587472Z add.s32 %r345, %r31, 237608; 2026-02-21T08:13:39.2587662Z // begin inline asm 2026-02-21T08:13:39.2587859Z @%p154 mbarrier.init.shared::cta.b64 [%r345], 1; 2026-02-21T08:13:39.2588088Z // end inline asm 2026-02-21T08:13:39.2588243Z bar.sync 0; 2026-02-21T08:13:39.2588401Z add.s32 %r455, %r31, 237616; 2026-02-21T08:13:39.2588583Z // begin inline asm 2026-02-21T08:13:39.2588783Z @%p154 mbarrier.init.shared::cta.b64 [%r455], 1; 2026-02-21T08:13:39.2589003Z // end inline asm 2026-02-21T08:13:39.2589167Z bar.sync 0; 2026-02-21T08:13:39.2589325Z // begin inline asm 2026-02-21T08:13:39.2589559Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r340], 24576; 2026-02-21T08:13:39.2589835Z // end inline asm 2026-02-21T08:13:39.2590143Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2590515Z // begin inline asm 2026-02-21T08:13:39.2590699Z fence.proxy.async.shared::cta; 2026-02-21T08:13:39.2590904Z // end inline asm 2026-02-21T08:13:39.2591062Z bar.sync 0; 2026-02-21T08:13:39.2591239Z elect.sync %r421|%p102, 
-1; 2026-02-21T08:13:39.2591448Z and.pred %p84, %p1, %p102; 2026-02-21T08:13:39.2591635Z // begin inline asm 2026-02-21T08:13:39.2592057Z @%p84 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r31], [%rd89, {%r41, %r811}], [%r340]; 2026-02-21T08:13:39.2592514Z // end inline asm 2026-02-21T08:13:39.2592844Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2593221Z bar.sync 0; 2026-02-21T08:13:39.2593389Z elect.sync %r422|%p103, -1; 2026-02-21T08:13:39.2593587Z and.pred %p85, %p1, %p103; 2026-02-21T08:13:39.2593787Z add.s32 %r352, %r31, 180224; 2026-02-21T08:13:39.2594046Z // begin inline asm 2026-02-21T08:13:39.2594458Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r352], [%rd90, {%r41, %r462}], [%r340]; 2026-02-21T08:13:39.2594932Z // end inline asm 2026-02-21T08:13:39.2595260Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2595645Z bar.sync 0; 2026-02-21T08:13:39.2595795Z // begin inline asm 2026-02-21T08:13:39.2596038Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r341], 24576; 2026-02-21T08:13:39.2596313Z // end inline asm 2026-02-21T08:13:39.2596680Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2597059Z bar.sync 0; 2026-02-21T08:13:39.2597219Z elect.sync %r423|%p104, -1; 2026-02-21T08:13:39.2597423Z and.pred %p87, %p1, %p104; 2026-02-21T08:13:39.2597610Z add.s32 %r357, %r31, 16384; 2026-02-21T08:13:39.2597871Z // begin inline asm 2026-02-21T08:13:39.2598269Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r357], [%rd89, {%r34, %r811}], [%r341]; 2026-02-21T08:13:39.2598721Z // end inline asm 2026-02-21T08:13:39.2599027Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2599381Z bar.sync 0; 2026-02-21T08:13:39.2599548Z elect.sync %r424|%p105, -1; 
2026-02-21T08:13:39.2599744Z and.pred %p88, %p1, %p105; 2026-02-21T08:13:39.2599946Z add.s32 %r361, %r31, 188416; 2026-02-21T08:13:39.2600132Z // begin inline asm 2026-02-21T08:13:39.2627196Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r361], [%rd90, {%r34, %r462}], [%r341]; 2026-02-21T08:13:39.2627739Z // end inline asm 2026-02-21T08:13:39.2628091Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2628464Z bar.sync 0; 2026-02-21T08:13:39.2628659Z // begin inline asm 2026-02-21T08:13:39.2628924Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r342], 24576; 2026-02-21T08:13:39.2629208Z // end inline asm 2026-02-21T08:13:39.2629528Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2629876Z bar.sync 0; 2026-02-21T08:13:39.2630064Z elect.sync %r425|%p106, -1; 2026-02-21T08:13:39.2630279Z and.pred %p90, %p1, %p106; 2026-02-21T08:13:39.2630489Z add.s32 %r366, %r31, 32768; 2026-02-21T08:13:39.2630678Z // begin inline asm 2026-02-21T08:13:39.2631104Z @%p90 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r366], [%rd89, {%r367, %r811}], [%r342]; 2026-02-21T08:13:39.2631543Z // end inline asm 2026-02-21T08:13:39.2631851Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2632218Z bar.sync 0; 2026-02-21T08:13:39.2632385Z elect.sync %r426|%p107, -1; 2026-02-21T08:13:39.2632599Z and.pred %p91, %p1, %p107; 2026-02-21T08:13:39.2632800Z add.s32 %r370, %r31, 196608; 2026-02-21T08:13:39.2632998Z // begin inline asm 2026-02-21T08:13:39.2633387Z @%p91 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r370], [%rd90, {%r367, %r462}], [%r342]; 2026-02-21T08:13:39.2633812Z // end inline asm 2026-02-21T08:13:39.2634123Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2634478Z bar.sync 0; 
2026-02-21T08:13:39.2634644Z // begin inline asm 2026-02-21T08:13:39.2634955Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r343], 24576; 2026-02-21T08:13:39.2635233Z // end inline asm 2026-02-21T08:13:39.2635536Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2635894Z bar.sync 0; 2026-02-21T08:13:39.2636074Z elect.sync %r427|%p108, -1; 2026-02-21T08:13:39.2636266Z and.pred %p93, %p1, %p108; 2026-02-21T08:13:39.2636463Z add.s32 %r375, %r31, 49152; 2026-02-21T08:13:39.2636826Z mov.b32 %r376, 96; 2026-02-21T08:13:39.2636998Z // begin inline asm 2026-02-21T08:13:39.2637370Z @%p93 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r375], [%rd89, {%r376, %r811}], [%r343]; 2026-02-21T08:13:39.2637812Z // end inline asm 2026-02-21T08:13:39.2638119Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2638482Z bar.sync 0; 2026-02-21T08:13:39.2638659Z elect.sync %r428|%p109, -1; 2026-02-21T08:13:39.2638866Z and.pred %p94, %p1, %p109; 2026-02-21T08:13:39.2639076Z add.s32 %r379, %r31, 204800; 2026-02-21T08:13:39.2639271Z // begin inline asm 2026-02-21T08:13:39.2639678Z @%p94 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r379], [%rd90, {%r376, %r462}], [%r343]; 2026-02-21T08:13:39.2640103Z // end inline asm 2026-02-21T08:13:39.2640535Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2640899Z bar.sync 0; 2026-02-21T08:13:39.2641056Z // begin inline asm 2026-02-21T08:13:39.2641304Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r344], 24576; 2026-02-21T08:13:39.2641571Z // end inline asm 2026-02-21T08:13:39.2641882Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2642224Z bar.sync 0; 2026-02-21T08:13:39.2642397Z elect.sync %r429|%p110, -1; 2026-02-21T08:13:39.2642599Z and.pred %p96, %p1, %p110; 2026-02-21T08:13:39.2642800Z 
add.s32 %r384, %r31, 65536; 2026-02-21T08:13:39.2642993Z // begin inline asm 2026-02-21T08:13:39.2643392Z @%p96 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r384], [%rd89, {%r385, %r811}], [%r344]; 2026-02-21T08:13:39.2643838Z // end inline asm 2026-02-21T08:13:39.2644144Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2644512Z bar.sync 0; 2026-02-21T08:13:39.2644721Z elect.sync %r430|%p111, -1; 2026-02-21T08:13:39.2644935Z and.pred %p97, %p1, %p111; 2026-02-21T08:13:39.2645130Z add.s32 %r388, %r31, 212992; 2026-02-21T08:13:39.2645326Z // begin inline asm 2026-02-21T08:13:39.2645738Z @%p97 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r388], [%rd90, {%r385, %r462}], [%r344]; 2026-02-21T08:13:39.2646175Z // end inline asm 2026-02-21T08:13:39.2646483Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2646828Z bar.sync 0; 2026-02-21T08:13:39.2646996Z // begin inline asm 2026-02-21T08:13:39.2647232Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r345], 24576; 2026-02-21T08:13:39.2647505Z // end inline asm 2026-02-21T08:13:39.2647816Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2648153Z bar.sync 0; 2026-02-21T08:13:39.2648325Z elect.sync %r431|%p112, -1; 2026-02-21T08:13:39.2648526Z and.pred %p99, %p1, %p112; 2026-02-21T08:13:39.2648721Z add.s32 %r393, %r31, 81920; 2026-02-21T08:13:39.2648903Z mov.b32 %r394, 160; 2026-02-21T08:13:39.2649082Z // begin inline asm 2026-02-21T08:13:39.2649481Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r393], [%rd89, {%r394, %r811}], [%r345]; 2026-02-21T08:13:39.2649920Z // end inline asm 2026-02-21T08:13:39.2650230Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2650570Z bar.sync 0; 2026-02-21T08:13:39.2650741Z elect.sync 
%r432|%p113, -1; 2026-02-21T08:13:39.2650815Z and.pred %p100, %p1, %p113; 2026-02-21T08:13:39.2650888Z add.s32 %r397, %r31, 221184; 2026-02-21T08:13:39.2650967Z // begin inline asm 2026-02-21T08:13:39.2651278Z @%p100 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r397], [%rd90, {%r394, %r462}], [%r345]; 2026-02-21T08:13:39.2651409Z // end inline asm 2026-02-21T08:13:39.2651623Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2651689Z bar.sync 0; 2026-02-21T08:13:39.2651757Z // begin inline asm 2026-02-21T08:13:39.2651820Z 2026-02-21T08:13:39.2651892Z { 2026-02-21T08:13:39.2651969Z .reg .pred complete; 2026-02-21T08:13:39.2652036Z waitLoop: 2026-02-21T08:13:39.2652187Z mbarrier.try_wait.parity.shared.b64 complete, [%r340], %r41; 2026-02-21T08:13:39.2652266Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.2652327Z } 2026-02-21T08:13:39.2652334Z 2026-02-21T08:13:39.2652411Z // end inline asm 2026-02-21T08:13:39.2652626Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2652703Z setp.ne.b32 %p114, %r14, 0; 2026-02-21T08:13:39.2652776Z @%p114 bra $L__BB0_3; 2026-02-21T08:13:39.2652849Z // %bb.2: 2026-02-21T08:13:39.2653116Z .loc 1 0 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:0:52 2026-02-21T08:13:39.2653195Z add.s32 %r442, %r31, 8224; 2026-02-21T08:13:39.2653287Z bfe.u32 %r443, %r442, 4, 14; 2026-02-21T08:13:39.2653363Z cvt.u64.u32 %rd83, %r443; 2026-02-21T08:13:39.2653456Z or.b64 %rd80, %rd83, -9223371899348713472; 2026-02-21T08:13:39.2653529Z add.s32 %r444, %r31, 8192; 2026-02-21T08:13:39.2653612Z bfe.u32 %r445, %r444, 4, 14; 2026-02-21T08:13:39.2653685Z cvt.u64.u32 %rd84, %r445; 2026-02-21T08:13:39.2653773Z or.b64 %rd78, %rd84, -9223371899348713472; 2026-02-21T08:13:39.2653854Z add.s32 %r447, %r31, 180256; 2026-02-21T08:13:39.2653923Z bfe.u32 %r448, %r447, 4, 14; 2026-02-21T08:13:39.2653992Z cvt.u64.u32 %rd85, %r448; 
2026-02-21T08:13:39.2654085Z or.b64 %rd77, %rd85, -9223371899382267904; 2026-02-21T08:13:39.2654157Z add.s32 %r449, %r31, 32; 2026-02-21T08:13:39.2654225Z bfe.u32 %r450, %r449, 4, 14; 2026-02-21T08:13:39.2654295Z cvt.u64.u32 %rd86, %r450; 2026-02-21T08:13:39.2654388Z or.b64 %rd76, %rd86, -9223371899348713472; 2026-02-21T08:13:39.2654460Z bfe.u32 %r451, %r352, 4, 14; 2026-02-21T08:13:39.2654531Z cvt.u64.u32 %rd87, %r451; 2026-02-21T08:13:39.2654620Z or.b64 %rd75, %rd87, -9223371899382267904; 2026-02-21T08:13:39.2654779Z bfe.u32 %r452, %r31, 4, 14; 2026-02-21T08:13:39.2654854Z cvt.u64.u32 %rd88, %r452; 2026-02-21T08:13:39.2654934Z or.b64 %rd74, %rd88, -9223371899348713472; 2026-02-21T08:13:39.2655152Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2655228Z elect.sync %r453|%p116, -1; 2026-02-21T08:13:39.2655296Z mov.b32 %r434, 136314896; 2026-02-21T08:13:39.2655376Z mov.pred %p115, 0; 2026-02-21T08:13:39.2655445Z // begin inline asm 2026-02-21T08:13:39.2655629Z @%p116 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 0 ], %rd74, %rd75, %r434, %p115; 2026-02-21T08:13:39.2655707Z // end inline asm 2026-02-21T08:13:39.2655776Z // begin inline asm 2026-02-21T08:13:39.2655950Z @%p116 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 0 ], %rd76, %rd77, %r434, %p117; 2026-02-21T08:13:39.2656020Z // end inline asm 2026-02-21T08:13:39.2656100Z // begin inline asm 2026-02-21T08:13:39.2656273Z @%p116 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 128 ], %rd78, %rd75, %r434, %p115; 2026-02-21T08:13:39.2656340Z // end inline asm 2026-02-21T08:13:39.2656418Z // begin inline asm 2026-02-21T08:13:39.2656587Z @%p116 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 128 ], %rd80, %rd77, %r434, %p117; 2026-02-21T08:13:39.2656653Z // end inline asm 2026-02-21T08:13:39.2656734Z add.s32 %r454, %r31, 237632; 2026-02-21T08:13:39.2656802Z cvt.u64.u32 %rd82, %r454; 2026-02-21T08:13:39.2656869Z // begin inline asm 2026-02-21T08:13:39.2657022Z @%p116 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd82]; 2026-02-21T08:13:39.2657101Z // end inline asm 2026-02-21T08:13:39.2657169Z $L__BB0_3: 2026-02-21T08:13:39.2657384Z .loc 1 0 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:0:52 2026-02-21T08:13:39.2657525Z add.s32 %r4, %r408, %r407; 2026-02-21T08:13:39.2657596Z add.s32 %r5, %r408, %r409; 2026-02-21T08:13:39.2657664Z add.s32 %r6, %r408, %r410; 2026-02-21T08:13:39.2657740Z add.s32 %r7, %r408, %r411; 2026-02-21T08:13:39.2657808Z add.s32 %r8, %r408, %r412; 2026-02-21T08:13:39.2657874Z add.s32 %r9, %r408, %r413; 2026-02-21T08:13:39.2657944Z add.s32 %r10, %r408, %r414; 2026-02-21T08:13:39.2658024Z add.s32 %r11, %r408, %r415; 2026-02-21T08:13:39.2658241Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2658308Z bar.sync 0; 2026-02-21T08:13:39.2658384Z // begin inline asm 2026-02-21T08:13:39.2658520Z @%p154 mbarrier.arrive.expect_tx.shared.b64 _, [%r455], 24576; 2026-02-21T08:13:39.2658587Z // end inline asm 2026-02-21T08:13:39.2658793Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2658924Z bar.sync 0; 2026-02-21T08:13:39.2659005Z elect.sync %r469|%p128, -1; 2026-02-21T08:13:39.2659079Z and.pred %p125, %p1, %p128; 2026-02-21T08:13:39.2659157Z add.s32 %r456, %r31, 98304; 2026-02-21T08:13:39.2659227Z mov.b32 %r457, 192; 2026-02-21T08:13:39.2659293Z // begin inline asm 2026-02-21T08:13:39.2659607Z @%p125 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r456], [%rd89, {%r457, %r811}], [%r455]; 2026-02-21T08:13:39.2659675Z // end inline asm 2026-02-21T08:13:39.2659886Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2659951Z bar.sync 0; 2026-02-21T08:13:39.2660034Z elect.sync %r470|%p129, -1; 2026-02-21T08:13:39.2660108Z and.pred %p126, %p1, %p129; 2026-02-21T08:13:39.2660180Z add.s32 %r460, %r31, 229376; 
2026-02-21T08:13:39.2660257Z // begin inline asm 2026-02-21T08:13:39.2660557Z @%p126 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r460], [%rd90, {%r457, %r462}], [%r455]; 2026-02-21T08:13:39.2660629Z // end inline asm 2026-02-21T08:13:39.2660706Z mov.b32 %r1209, 1; 2026-02-21T08:13:39.2660773Z mov.b32 %r1208, 6; 2026-02-21T08:13:39.2660837Z mov.b32 %r1204, 0; 2026-02-21T08:13:39.2660908Z mov.b32 %r1206, %r1204; 2026-02-21T08:13:39.2660989Z mov.b32 %r1207, %r1204; 2026-02-21T08:13:39.2661057Z mov.b32 %r1210, %r1204; 2026-02-21T08:13:39.2661124Z mov.b32 %r1211, %r1204; 2026-02-21T08:13:39.2661202Z bra.uni $L__BB0_4; 2026-02-21T08:13:39.2661331Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:13:39.2661541Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2661632Z setp.lt.u32 %p144, %r1211, 800; 2026-02-21T08:13:39.2661856Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2661924Z // begin inline asm 2026-02-21T08:13:39.2661985Z 2026-02-21T08:13:39.2662059Z { 2026-02-21T08:13:39.2662137Z .reg .pred complete; 2026-02-21T08:13:39.2662214Z waitLoop: 2026-02-21T08:13:39.2662365Z mbarrier.try_wait.parity.shared.b64 complete, [%r1205], %r1204; 2026-02-21T08:13:39.2662441Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.2662500Z } 2026-02-21T08:13:39.2662505Z 2026-02-21T08:13:39.2662568Z // end inline asm 2026-02-21T08:13:39.2662648Z add.s32 %r516, %r1209, 1; 2026-02-21T08:13:39.2662718Z setp.gt.s32 %p147, %r516, 1; 2026-02-21T08:13:39.2662795Z selp.b32 %r1209, 0, %r516, %p147; 2026-02-21T08:13:39.2662873Z selp.b32 %r517, 1, 0, %p147; 2026-02-21T08:13:39.2662940Z xor.b32 %r28, %r1210, %r517; 2026-02-21T08:13:39.2663135Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2663212Z add.s32 %r518, %r1208, 1; 2026-02-21T08:13:39.2663283Z setp.gt.s32 %p148, %r518, 6; 2026-02-21T08:13:39.2663357Z 
selp.b32 %r1208, 0, %r518, %p148; 2026-02-21T08:13:39.2663428Z shl.b32 %r519, %r1208, 3; 2026-02-21T08:13:39.2663550Z add.s32 %r521, %r31, %r519; 2026-02-21T08:13:39.2663621Z add.s32 %r511, %r521, 237568; 2026-02-21T08:13:39.2663683Z bar.sync 0; 2026-02-21T08:13:39.2663763Z and.pred %p141, %p154, %p144; 2026-02-21T08:13:39.2663828Z // begin inline asm 2026-02-21T08:13:39.2663956Z @%p141 mbarrier.arrive.expect_tx.shared.b64 _, [%r511], 24576; 2026-02-21T08:13:39.2664017Z // end inline asm 2026-02-21T08:13:39.2664224Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2664293Z shl.b32 %r522, %r1208, 14; 2026-02-21T08:13:39.2664357Z add.s32 %r508, %r31, %r522; 2026-02-21T08:13:39.2664427Z bar.sync 0; 2026-02-21T08:13:39.2664497Z elect.sync %r523|%p149, -1; 2026-02-21T08:13:39.2664572Z and.pred %p150, %p144, %p149; 2026-02-21T08:13:39.2664644Z and.pred %p142, %p1, %p150; 2026-02-21T08:13:39.2664771Z add.s32 %r509, %r1211, 224; 2026-02-21T08:13:39.2664840Z // begin inline asm 2026-02-21T08:13:39.2665230Z @%p142 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r508], [%rd89, {%r509, %r811}], [%r511]; 2026-02-21T08:13:39.2665312Z // end inline asm 2026-02-21T08:13:39.2665514Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2665586Z shl.b32 %r524, %r1208, 13; 2026-02-21T08:13:39.2665667Z add.s32 %r525, %r31, %r524; 2026-02-21T08:13:39.2665737Z add.s32 %r512, %r525, 180224; 2026-02-21T08:13:39.2665802Z bar.sync 0; 2026-02-21T08:13:39.2665874Z elect.sync %r526|%p151, -1; 2026-02-21T08:13:39.2665959Z and.pred %p152, %p144, %p151; 2026-02-21T08:13:39.2666034Z and.pred %p143, %p1, %p152; 2026-02-21T08:13:39.2666102Z // begin inline asm 2026-02-21T08:13:39.2666411Z @%p143 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r512], [%rd90, {%r509, %r462}], [%r511]; 2026-02-21T08:13:39.2666492Z // end inline asm 
2026-02-21T08:13:39.2666683Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2666773Z setp.lt.u32 %p153, %r1211, 960; 2026-02-21T08:13:39.2666841Z add.s32 %r1211, %r1211, 32; 2026-02-21T08:13:39.2666908Z mov.b32 %r1204, %r1210; 2026-02-21T08:13:39.2666975Z mov.b32 %r1205, %r527; 2026-02-21T08:13:39.2667050Z mov.b32 %r1210, %r28; 2026-02-21T08:13:39.2667114Z @%p153 bra $L__BB0_4; 2026-02-21T08:13:39.2667195Z bra.uni $L__BB0_7; 2026-02-21T08:13:39.2667315Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:13:39.2667507Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2667580Z add.s32 %r473, %r1207, 1; 2026-02-21T08:13:39.2667650Z setp.gt.s32 %p131, %r473, 6; 2026-02-21T08:13:39.2667721Z selp.b32 %r1207, 0, %r473, %p131; 2026-02-21T08:13:39.2667792Z selp.b32 %r474, 1, 0, %p131; 2026-02-21T08:13:39.2667872Z xor.b32 %r1206, %r1206, %r474; 2026-02-21T08:13:39.2667939Z shl.b32 %r475, %r1207, 3; 2026-02-21T08:13:39.2668006Z add.s32 %r477, %r31, %r475; 2026-02-21T08:13:39.2668077Z add.s32 %r471, %r477, 237568; 2026-02-21T08:13:39.2668137Z bar.sync 0; 2026-02-21T08:13:39.2668199Z // begin inline asm 2026-02-21T08:13:39.2668256Z 2026-02-21T08:13:39.2668319Z { 2026-02-21T08:13:39.2668385Z .reg .pred complete; 2026-02-21T08:13:39.2668445Z waitLoop: 2026-02-21T08:13:39.2668584Z mbarrier.try_wait.parity.shared.b64 complete, [%r471], %r1206; 2026-02-21T08:13:39.2668654Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.2668710Z } 2026-02-21T08:13:39.2668715Z 2026-02-21T08:13:39.2668774Z // end inline asm 2026-02-21T08:13:39.2668845Z shl.b32 %r478, %r1209, 3; 2026-02-21T08:13:39.2668908Z add.s32 %r479, %r31, %r478; 2026-02-21T08:13:39.2668971Z add.s32 %r527, %r479, 237632; 2026-02-21T08:13:39.2669166Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2669234Z @%p114 bra $L__BB0_6; 2026-02-21T08:13:39.2669411Z // %bb.5: // in 
Loop: Header=BB0_4 Depth=1 2026-02-21T08:13:39.2669611Z .loc 1 54 31 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:54:31 2026-02-21T08:13:39.2669677Z shl.b32 %r488, %r1207, 14; 2026-02-21T08:13:39.2669740Z add.s32 %r490, %r31, %r488; 2026-02-21T08:13:39.2669932Z .loc 1 55 44 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:55:44 2026-02-21T08:13:39.2670005Z shl.b32 %r491, %r1207, 13; 2026-02-21T08:13:39.2670069Z add.s32 %r492, %r31, %r491; 2026-02-21T08:13:39.2670133Z add.s32 %r493, %r492, 180224; 2026-02-21T08:13:39.2670329Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2670398Z elect.sync %r494|%p133, -1; 2026-02-21T08:13:39.2670464Z bfe.u32 %r495, %r490, 4, 14; 2026-02-21T08:13:39.2670538Z cvt.u64.u32 %rd100, %r495; 2026-02-21T08:13:39.2670665Z or.b64 %rd91, %rd100, -9223371899348713472; 2026-02-21T08:13:39.2670735Z bfe.u32 %r496, %r493, 4, 14; 2026-02-21T08:13:39.2670801Z cvt.u64.u32 %rd101, %r496; 2026-02-21T08:13:39.2670888Z or.b64 %rd92, %rd101, -9223371899382267904; 2026-02-21T08:13:39.2670952Z mov.b32 %r481, 136314896; 2026-02-21T08:13:39.2671019Z mov.pred %p132, -1; 2026-02-21T08:13:39.2671092Z // begin inline asm 2026-02-21T08:13:39.2671261Z @%p133 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 0 ], %rd91, %rd92, %r481, %p132; 2026-02-21T08:13:39.2671323Z // end inline asm 2026-02-21T08:13:39.2671395Z add.s32 %r497, %r490, 32; 2026-02-21T08:13:39.2671460Z bfe.u32 %r498, %r497, 4, 14; 2026-02-21T08:13:39.2671525Z cvt.u64.u32 %rd102, %r498; 2026-02-21T08:13:39.2671604Z or.b64 %rd93, %rd102, -9223371899348713472; 2026-02-21T08:13:39.2671681Z add.s32 %r499, %r492, 180256; 2026-02-21T08:13:39.2671746Z bfe.u32 %r500, %r499, 4, 14; 2026-02-21T08:13:39.2671810Z cvt.u64.u32 %rd103, %r500; 2026-02-21T08:13:39.2671896Z or.b64 %rd94, %rd103, -9223371899382267904; 2026-02-21T08:13:39.2671963Z // begin inline asm 2026-02-21T08:13:39.2672124Z @%p133 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 0 
], %rd93, %rd94, %r481, %p132; 2026-02-21T08:13:39.2672184Z // end inline asm 2026-02-21T08:13:39.2672255Z add.s32 %r501, %r490, 8192; 2026-02-21T08:13:39.2672319Z bfe.u32 %r502, %r501, 4, 14; 2026-02-21T08:13:39.2672382Z cvt.u64.u32 %rd104, %r502; 2026-02-21T08:13:39.2672467Z or.b64 %rd95, %rd104, -9223371899348713472; 2026-02-21T08:13:39.2672529Z // begin inline asm 2026-02-21T08:13:39.2672692Z @%p133 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 128 ], %rd95, %rd92, %r481, %p132; 2026-02-21T08:13:39.2672759Z // end inline asm 2026-02-21T08:13:39.2672824Z add.s32 %r503, %r490, 8224; 2026-02-21T08:13:39.2672899Z bfe.u32 %r504, %r503, 4, 14; 2026-02-21T08:13:39.2672965Z cvt.u64.u32 %rd105, %r504; 2026-02-21T08:13:39.2673049Z or.b64 %rd97, %rd105, -9223371899348713472; 2026-02-21T08:13:39.2673112Z // begin inline asm 2026-02-21T08:13:39.2673273Z @%p133 tcgen05.mma.cta_group::1.kind::f16 [ %r1203 + 128 ], %rd97, %rd94, %r481, %p132; 2026-02-21T08:13:39.2673336Z // end inline asm 2026-02-21T08:13:39.2673409Z cvt.u64.u32 %rd99, %r527; 2026-02-21T08:13:39.2673472Z // begin inline asm 2026-02-21T08:13:39.2673616Z @%p133 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd99]; 2026-02-21T08:13:39.2673687Z // end inline asm 2026-02-21T08:13:39.2673749Z bra.uni $L__BB0_6; 2026-02-21T08:13:39.2673860Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:13:39.2674065Z .loc 1 0 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:0:52 2026-02-21T08:13:39.2674134Z setp.lt.u32 %p164, %r1, 64; 2026-02-21T08:13:39.2674195Z mov.b32 %r528, 1; 2026-02-21T08:13:39.2674393Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2674466Z // begin inline asm 2026-02-21T08:13:39.2674521Z 2026-02-21T08:13:39.2674579Z { 2026-02-21T08:13:39.2675074Z .reg .pred complete; 2026-02-21T08:13:39.2675140Z waitLoop: 2026-02-21T08:13:39.2675284Z mbarrier.try_wait.parity.shared.b64 complete, [%r527], %r528; 2026-02-21T08:13:39.2675362Z @!complete 
bra.uni waitLoop; 2026-02-21T08:13:39.2675428Z } 2026-02-21T08:13:39.2675433Z 2026-02-21T08:13:39.2675497Z // end inline asm 2026-02-21T08:13:39.2675705Z .loc 1 50 42 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:50:42 2026-02-21T08:13:39.2675777Z bar.sync 0; 2026-02-21T08:13:39.2675842Z // begin inline asm 2026-02-21T08:13:39.2675945Z @%p154 mbarrier.inval.shared::cta.b64 [%r340]; 2026-02-21T08:13:39.2676017Z // end inline asm 2026-02-21T08:13:39.2676082Z bar.sync 0; 2026-02-21T08:13:39.2676149Z // begin inline asm 2026-02-21T08:13:39.2676249Z @%p154 mbarrier.inval.shared::cta.b64 [%r341]; 2026-02-21T08:13:39.2676322Z // end inline asm 2026-02-21T08:13:39.2676394Z bar.sync 0; 2026-02-21T08:13:39.2676513Z // begin inline asm 2026-02-21T08:13:39.2676616Z @%p154 mbarrier.inval.shared::cta.b64 [%r342]; 2026-02-21T08:13:39.2676676Z // end inline asm 2026-02-21T08:13:39.2676735Z bar.sync 0; 2026-02-21T08:13:39.2676796Z // begin inline asm 2026-02-21T08:13:39.2676888Z @%p154 mbarrier.inval.shared::cta.b64 [%r343]; 2026-02-21T08:13:39.2676949Z // end inline asm 2026-02-21T08:13:39.2677008Z bar.sync 0; 2026-02-21T08:13:39.2677076Z // begin inline asm 2026-02-21T08:13:39.2677164Z @%p154 mbarrier.inval.shared::cta.b64 [%r344]; 2026-02-21T08:13:39.2677223Z // end inline asm 2026-02-21T08:13:39.2677292Z bar.sync 0; 2026-02-21T08:13:39.2677353Z // begin inline asm 2026-02-21T08:13:39.2677439Z @%p154 mbarrier.inval.shared::cta.b64 [%r345]; 2026-02-21T08:13:39.2677499Z // end inline asm 2026-02-21T08:13:39.2677565Z bar.sync 0; 2026-02-21T08:13:39.2677629Z // begin inline asm 2026-02-21T08:13:39.2677713Z @%p154 mbarrier.inval.shared::cta.b64 [%r455]; 2026-02-21T08:13:39.2677780Z // end inline asm 2026-02-21T08:13:39.2677848Z add.s32 %r536, %r31, 237632; 2026-02-21T08:13:39.2677912Z // begin inline asm 2026-02-21T08:13:39.2677999Z @%p154 mbarrier.inval.shared::cta.b64 [%r536]; 2026-02-21T08:13:39.2678068Z // end inline asm 2026-02-21T08:13:39.2678129Z bar.sync 0; 
2026-02-21T08:13:39.2678190Z // begin inline asm 2026-02-21T08:13:39.2678283Z @%p154 mbarrier.inval.shared::cta.b64 [%r339]; 2026-02-21T08:13:39.2678344Z // end inline asm 2026-02-21T08:13:39.2678535Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2678605Z // begin inline asm 2026-02-21T08:13:39.2678918Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553}, [%r809 + 0]; 2026-02-21T08:13:39.2678980Z // end inline asm 2026-02-21T08:13:39.2679041Z // begin inline asm 2026-02-21T08:13:39.2679358Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570}, [%r809 + 16]; 2026-02-21T08:13:39.2679423Z // end inline asm 2026-02-21T08:13:39.2679483Z // begin inline asm 2026-02-21T08:13:39.2679797Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587}, [%r809 + 32]; 2026-02-21T08:13:39.2679857Z // end inline asm 2026-02-21T08:13:39.2679917Z // begin inline asm 2026-02-21T08:13:39.2680219Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604}, [%r809 + 48]; 2026-02-21T08:13:39.2680278Z // end inline asm 2026-02-21T08:13:39.2680339Z // begin inline asm 2026-02-21T08:13:39.2680642Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621}, [%r809 + 64]; 2026-02-21T08:13:39.2680704Z // end inline asm 2026-02-21T08:13:39.2680767Z // begin inline asm 2026-02-21T08:13:39.2681148Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638}, [%r809 + 80]; 
2026-02-21T08:13:39.2681220Z // end inline asm 2026-02-21T08:13:39.2681283Z // begin inline asm 2026-02-21T08:13:39.2681601Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655}, [%r809 + 96]; 2026-02-21T08:13:39.2681676Z // end inline asm 2026-02-21T08:13:39.2681741Z // begin inline asm 2026-02-21T08:13:39.2682070Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672}, [%r809 + 112]; 2026-02-21T08:13:39.2682144Z // end inline asm 2026-02-21T08:13:39.2682209Z // begin inline asm 2026-02-21T08:13:39.2682580Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689}, [%r809 + 128]; 2026-02-21T08:13:39.2682657Z // end inline asm 2026-02-21T08:13:39.2682722Z // begin inline asm 2026-02-21T08:13:39.2683038Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706}, [%r809 + 144]; 2026-02-21T08:13:39.2683102Z // end inline asm 2026-02-21T08:13:39.2683177Z // begin inline asm 2026-02-21T08:13:39.2683498Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723}, [%r809 + 160]; 2026-02-21T08:13:39.2683561Z // end inline asm 2026-02-21T08:13:39.2683632Z // begin inline asm 2026-02-21T08:13:39.2683943Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740}, [%r809 + 176]; 2026-02-21T08:13:39.2684009Z // end inline asm 2026-02-21T08:13:39.2684084Z // begin inline asm 2026-02-21T08:13:39.2684393Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, 
%r753, %r754, %r755, %r756, %r757}, [%r809 + 192]; 2026-02-21T08:13:39.2684458Z // end inline asm 2026-02-21T08:13:39.2684529Z // begin inline asm 2026-02-21T08:13:39.2684964Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774}, [%r809 + 208]; 2026-02-21T08:13:39.2685030Z // end inline asm 2026-02-21T08:13:39.2685094Z // begin inline asm 2026-02-21T08:13:39.2685439Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791}, [%r809 + 224]; 2026-02-21T08:13:39.2685502Z // end inline asm 2026-02-21T08:13:39.2685566Z // begin inline asm 2026-02-21T08:13:39.2685891Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808}, [%r809 + 240]; 2026-02-21T08:13:39.2685960Z // end inline asm 2026-02-21T08:13:39.2686024Z // begin inline asm 2026-02-21T08:13:39.2686114Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:13:39.2686178Z // end inline asm 2026-02-21T08:13:39.2686249Z cvt.u64.u32 %rd109, %r538; 2026-02-21T08:13:39.2686325Z cvt.u64.u32 %rd110, %r539; 2026-02-21T08:13:39.2686394Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:13:39.2686469Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:13:39.2686677Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2686761Z mov.b64 {%r814, %r815}, %rd112; 2026-02-21T08:13:39.2686845Z cvt.rn.f16x2.f32 %r816, %r815, %r814; 2026-02-21T08:13:39.2687042Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2687126Z cvt.u64.u32 %rd113, %r540; 2026-02-21T08:13:39.2687195Z cvt.u64.u32 %rd114, %r541; 2026-02-21T08:13:39.2687331Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:13:39.2687403Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:13:39.2687612Z .loc 1 58 27 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2687682Z mov.b64 {%r817, %r818}, %rd116; 2026-02-21T08:13:39.2687758Z cvt.rn.f16x2.f32 %r819, %r818, %r817; 2026-02-21T08:13:39.2687969Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2688039Z cvt.u64.u32 %rd117, %r542; 2026-02-21T08:13:39.2688106Z cvt.u64.u32 %rd118, %r543; 2026-02-21T08:13:39.2688181Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:13:39.2688252Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:13:39.2688450Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2688520Z mov.b64 {%r820, %r821}, %rd120; 2026-02-21T08:13:39.2688720Z cvt.rn.f16x2.f32 %r822, %r821, %r820; 2026-02-21T08:13:39.2688927Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2688997Z cvt.u64.u32 %rd121, %r544; 2026-02-21T08:13:39.2689074Z cvt.u64.u32 %rd122, %r545; 2026-02-21T08:13:39.2689142Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:13:39.2689211Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:13:39.2689421Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2689492Z mov.b64 {%r823, %r824}, %rd124; 2026-02-21T08:13:39.2689566Z cvt.rn.f16x2.f32 %r825, %r824, %r823; 2026-02-21T08:13:39.2689777Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2689844Z cvt.u64.u32 %rd125, %r546; 2026-02-21T08:13:39.2689912Z cvt.u64.u32 %rd126, %r547; 2026-02-21T08:13:39.2689979Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:13:39.2690061Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:13:39.2690269Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2690337Z mov.b64 {%r826, %r827}, %rd128; 2026-02-21T08:13:39.2690420Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T08:13:39.2690622Z .loc 1 56 52 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2690690Z cvt.u64.u32 %rd129, %r548; 2026-02-21T08:13:39.2690757Z cvt.u64.u32 %rd130, %r549; 2026-02-21T08:13:39.2690834Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:13:39.2690904Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:13:39.2691110Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2691190Z mov.b64 {%r829, %r830}, %rd132; 2026-02-21T08:13:39.2691265Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T08:13:39.2691468Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2691546Z cvt.u64.u32 %rd133, %r550; 2026-02-21T08:13:39.2691614Z cvt.u64.u32 %rd134, %r551; 2026-02-21T08:13:39.2691683Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:13:39.2691752Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:13:39.2691965Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2692033Z mov.b64 {%r832, %r833}, %rd136; 2026-02-21T08:13:39.2692107Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T08:13:39.2692320Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2692387Z cvt.u64.u32 %rd137, %r552; 2026-02-21T08:13:39.2692454Z cvt.u64.u32 %rd138, %r553; 2026-02-21T08:13:39.2692529Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:13:39.2692598Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:13:39.2692805Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2692937Z mov.b64 {%r835, %r836}, %rd140; 2026-02-21T08:13:39.2693019Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T08:13:39.2693225Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2693293Z cvt.u64.u32 %rd141, %r555; 2026-02-21T08:13:39.2693369Z cvt.u64.u32 %rd142, %r556; 2026-02-21T08:13:39.2693438Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:13:39.2693507Z or.b64 %rd144, %rd141, 
%rd143; 2026-02-21T08:13:39.2693718Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2693786Z mov.b64 {%r838, %r839}, %rd144; 2026-02-21T08:13:39.2693859Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T08:13:39.2694059Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2694135Z cvt.u64.u32 %rd145, %r557; 2026-02-21T08:13:39.2694248Z cvt.u64.u32 %rd146, %r558; 2026-02-21T08:13:39.2694321Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:13:39.2694402Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:13:39.2694598Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2694716Z mov.b64 {%r841, %r842}, %rd148; 2026-02-21T08:13:39.2694809Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T08:13:39.2695010Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2695079Z cvt.u64.u32 %rd149, %r559; 2026-02-21T08:13:39.2695148Z cvt.u64.u32 %rd150, %r560; 2026-02-21T08:13:39.2695229Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:13:39.2695298Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:13:39.2695499Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2695578Z mov.b64 {%r844, %r845}, %rd152; 2026-02-21T08:13:39.2695652Z cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T08:13:39.2695858Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2695938Z cvt.u64.u32 %rd153, %r561; 2026-02-21T08:13:39.2696005Z cvt.u64.u32 %rd154, %r562; 2026-02-21T08:13:39.2696075Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:13:39.2696148Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:13:39.2696369Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2696441Z mov.b64 {%r847, %r848}, %rd156; 2026-02-21T08:13:39.2696518Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T08:13:39.2696740Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2696809Z cvt.u64.u32 %rd157, %r563; 2026-02-21T08:13:39.2696878Z cvt.u64.u32 %rd158, %r564; 2026-02-21T08:13:39.2696957Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:13:39.2697026Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:13:39.2697240Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2697314Z mov.b64 {%r850, %r851}, %rd160; 2026-02-21T08:13:39.2697398Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T08:13:39.2697609Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2697680Z cvt.u64.u32 %rd161, %r565; 2026-02-21T08:13:39.2697759Z cvt.u64.u32 %rd162, %r566; 2026-02-21T08:13:39.2697829Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:13:39.2697900Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:13:39.2698114Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2698186Z mov.b64 {%r853, %r854}, %rd164; 2026-02-21T08:13:39.2698262Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T08:13:39.2698477Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2698558Z cvt.u64.u32 %rd165, %r567; 2026-02-21T08:13:39.2698689Z cvt.u64.u32 %rd166, %r568; 2026-02-21T08:13:39.2698758Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:13:39.2698836Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:13:39.2699045Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2699116Z mov.b64 {%r856, %r857}, %rd168; 2026-02-21T08:13:39.2699200Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T08:13:39.2699403Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2699472Z cvt.u64.u32 %rd169, %r569; 2026-02-21T08:13:39.2699539Z cvt.u64.u32 %rd170, %r570; 2026-02-21T08:13:39.2699615Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:13:39.2699683Z or.b64 %rd172, %rd169, 
%rd171; 2026-02-21T08:13:39.2699887Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2700041Z mov.b64 {%r859, %r860}, %rd172; 2026-02-21T08:13:39.2700121Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T08:13:39.2700327Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2700404Z cvt.u64.u32 %rd173, %r572; 2026-02-21T08:13:39.2700472Z cvt.u64.u32 %rd174, %r573; 2026-02-21T08:13:39.2700540Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:13:39.2700610Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:13:39.2700827Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2700896Z mov.b64 {%r862, %r863}, %rd176; 2026-02-21T08:13:39.2700971Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T08:13:39.2701178Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2701247Z cvt.u64.u32 %rd177, %r574; 2026-02-21T08:13:39.2701314Z cvt.u64.u32 %rd178, %r575; 2026-02-21T08:13:39.2701393Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:13:39.2701463Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:13:39.2701674Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2701743Z mov.b64 {%r865, %r866}, %rd180; 2026-02-21T08:13:39.2701824Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T08:13:39.2702027Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2702095Z cvt.u64.u32 %rd181, %r576; 2026-02-21T08:13:39.2702169Z cvt.u64.u32 %rd182, %r577; 2026-02-21T08:13:39.2702237Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:13:39.2702305Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:13:39.2702522Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2702591Z mov.b64 {%r868, %r869}, %rd184; 2026-02-21T08:13:39.2702666Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T08:13:39.2702876Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2702953Z cvt.u64.u32 %rd185, %r578; 2026-02-21T08:13:39.2703021Z cvt.u64.u32 %rd186, %r579; 2026-02-21T08:13:39.2703087Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:13:39.2703164Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:13:39.2703373Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2703442Z mov.b64 {%r871, %r872}, %rd188; 2026-02-21T08:13:39.2703523Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T08:13:39.2703731Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2703799Z cvt.u64.u32 %rd189, %r580; 2026-02-21T08:13:39.2703867Z cvt.u64.u32 %rd190, %r581; 2026-02-21T08:13:39.2703948Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:13:39.2704017Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:13:39.2704226Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2704349Z mov.b64 {%r874, %r875}, %rd192; 2026-02-21T08:13:39.2704424Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T08:13:39.2704626Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2704747Z cvt.u64.u32 %rd193, %r582; 2026-02-21T08:13:39.2704817Z cvt.u64.u32 %rd194, %r583; 2026-02-21T08:13:39.2704887Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:13:39.2704956Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:13:39.2705171Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2705240Z mov.b64 {%r877, %r878}, %rd196; 2026-02-21T08:13:39.2705315Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T08:13:39.2705527Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2705598Z cvt.u64.u32 %rd197, %r584; 2026-02-21T08:13:39.2705769Z cvt.u64.u32 %rd198, %r585; 2026-02-21T08:13:39.2705854Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:13:39.2705926Z or.b64 %rd200, %rd197, 
%rd199; 2026-02-21T08:13:39.2706138Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2706214Z mov.b64 {%r880, %r881}, %rd200; 2026-02-21T08:13:39.2706305Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T08:13:39.2706513Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2706584Z cvt.u64.u32 %rd201, %r586; 2026-02-21T08:13:39.2706660Z cvt.u64.u32 %rd202, %r587; 2026-02-21T08:13:39.2706727Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:13:39.2706796Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:13:39.2707010Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2707078Z mov.b64 {%r883, %r884}, %rd204; 2026-02-21T08:13:39.2707155Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T08:13:39.2707363Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2707439Z cvt.u64.u32 %rd205, %r589; 2026-02-21T08:13:39.2707506Z cvt.u64.u32 %rd206, %r590; 2026-02-21T08:13:39.2707573Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:13:39.2707648Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:13:39.2707849Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2707919Z mov.b64 {%r886, %r887}, %rd208; 2026-02-21T08:13:39.2708006Z cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T08:13:39.2708211Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2708280Z cvt.u64.u32 %rd209, %r591; 2026-02-21T08:13:39.2708348Z cvt.u64.u32 %rd210, %r592; 2026-02-21T08:13:39.2708424Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:13:39.2708495Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:13:39.2708703Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2708781Z mov.b64 {%r889, %r890}, %rd212; 2026-02-21T08:13:39.2708854Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T08:13:39.2709056Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2709131Z cvt.u64.u32 %rd213, %r593; 2026-02-21T08:13:39.2709198Z cvt.u64.u32 %rd214, %r594; 2026-02-21T08:13:39.2709265Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:13:39.2709335Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:13:39.2709548Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2709617Z mov.b64 {%r892, %r893}, %rd216; 2026-02-21T08:13:39.2709689Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T08:13:39.2709908Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2710062Z cvt.u64.u32 %rd217, %r595; 2026-02-21T08:13:39.2710129Z cvt.u64.u32 %rd218, %r596; 2026-02-21T08:13:39.2710203Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:13:39.2710271Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:13:39.2710471Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2710539Z mov.b64 {%r895, %r896}, %rd220; 2026-02-21T08:13:39.2710620Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T08:13:39.2710822Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2710890Z cvt.u64.u32 %rd221, %r597; 2026-02-21T08:13:39.2710965Z cvt.u64.u32 %rd222, %r598; 2026-02-21T08:13:39.2711033Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:13:39.2711102Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:13:39.2711312Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2711447Z mov.b64 {%r898, %r899}, %rd224; 2026-02-21T08:13:39.2711526Z cvt.rn.f16x2.f32 %r900, %r899, %r898; 2026-02-21T08:13:39.2711726Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2711802Z cvt.u64.u32 %rd225, %r599; 2026-02-21T08:13:39.2711870Z cvt.u64.u32 %rd226, %r600; 2026-02-21T08:13:39.2711938Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:13:39.2712014Z or.b64 %rd228, %rd225, 
%rd227; 2026-02-21T08:13:39.2712210Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2712279Z mov.b64 {%r901, %r902}, %rd228; 2026-02-21T08:13:39.2712360Z cvt.rn.f16x2.f32 %r903, %r902, %r901; 2026-02-21T08:13:39.2712557Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2712623Z cvt.u64.u32 %rd229, %r601; 2026-02-21T08:13:39.2712690Z cvt.u64.u32 %rd230, %r602; 2026-02-21T08:13:39.2712769Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:13:39.2712841Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:13:39.2713046Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2713120Z mov.b64 {%r904, %r905}, %rd232; 2026-02-21T08:13:39.2713194Z cvt.rn.f16x2.f32 %r906, %r905, %r904; 2026-02-21T08:13:39.2713397Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2713472Z cvt.u64.u32 %rd233, %r603; 2026-02-21T08:13:39.2713539Z cvt.u64.u32 %rd234, %r604; 2026-02-21T08:13:39.2713605Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:13:39.2713674Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:13:39.2713883Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2713951Z mov.b64 {%r907, %r908}, %rd236; 2026-02-21T08:13:39.2714024Z cvt.rn.f16x2.f32 %r909, %r908, %r907; 2026-02-21T08:13:39.2714237Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2714307Z cvt.u64.u32 %rd237, %r606; 2026-02-21T08:13:39.2714374Z cvt.u64.u32 %rd238, %r607; 2026-02-21T08:13:39.2714451Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:13:39.2714520Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:13:39.2714784Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2714855Z mov.b64 {%r910, %r911}, %rd240; 2026-02-21T08:13:39.2714936Z cvt.rn.f16x2.f32 %r912, %r911, %r910; 2026-02-21T08:13:39.2715144Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2715212Z cvt.u64.u32 %rd241, %r608; 2026-02-21T08:13:39.2715291Z cvt.u64.u32 %rd242, %r609; 2026-02-21T08:13:39.2715360Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:13:39.2715429Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:13:39.2715642Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2715772Z mov.b64 {%r913, %r914}, %rd244; 2026-02-21T08:13:39.2715848Z cvt.rn.f16x2.f32 %r915, %r914, %r913; 2026-02-21T08:13:39.2716063Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2716140Z cvt.u64.u32 %rd245, %r610; 2026-02-21T08:13:39.2716207Z cvt.u64.u32 %rd246, %r611; 2026-02-21T08:13:39.2716275Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:13:39.2716355Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:13:39.2716570Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2716641Z mov.b64 {%r916, %r917}, %rd248; 2026-02-21T08:13:39.2716734Z cvt.rn.f16x2.f32 %r918, %r917, %r916; 2026-02-21T08:13:39.2716948Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2717080Z cvt.u64.u32 %rd249, %r612; 2026-02-21T08:13:39.2717151Z cvt.u64.u32 %rd250, %r613; 2026-02-21T08:13:39.2717229Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:13:39.2717297Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:13:39.2717509Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2717583Z mov.b64 {%r919, %r920}, %rd252; 2026-02-21T08:13:39.2717657Z cvt.rn.f16x2.f32 %r921, %r920, %r919; 2026-02-21T08:13:39.2717867Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2717942Z cvt.u64.u32 %rd253, %r614; 2026-02-21T08:13:39.2718008Z cvt.u64.u32 %rd254, %r615; 2026-02-21T08:13:39.2718076Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:13:39.2718143Z or.b64 %rd256, %rd253, 
%rd255; 2026-02-21T08:13:39.2718364Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2718434Z mov.b64 {%r922, %r923}, %rd256; 2026-02-21T08:13:39.2718508Z cvt.rn.f16x2.f32 %r924, %r923, %r922; 2026-02-21T08:13:39.2718724Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2718793Z cvt.u64.u32 %rd257, %r616; 2026-02-21T08:13:39.2718860Z cvt.u64.u32 %rd258, %r617; 2026-02-21T08:13:39.2718935Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:13:39.2719002Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:13:39.2719215Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2719284Z mov.b64 {%r925, %r926}, %rd260; 2026-02-21T08:13:39.2719366Z cvt.rn.f16x2.f32 %r927, %r926, %r925; 2026-02-21T08:13:39.2719575Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2719643Z cvt.u64.u32 %rd261, %r618; 2026-02-21T08:13:39.2719715Z cvt.u64.u32 %rd262, %r619; 2026-02-21T08:13:39.2719783Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:13:39.2719854Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:13:39.2720069Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2720137Z mov.b64 {%r928, %r929}, %rd264; 2026-02-21T08:13:39.2720210Z cvt.rn.f16x2.f32 %r930, %r929, %r928; 2026-02-21T08:13:39.2720424Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2720498Z cvt.u64.u32 %rd265, %r620; 2026-02-21T08:13:39.2720565Z cvt.u64.u32 %rd266, %r621; 2026-02-21T08:13:39.2720631Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:13:39.2720708Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:13:39.2720919Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2720986Z mov.b64 {%r931, %r932}, %rd268; 2026-02-21T08:13:39.2721066Z cvt.rn.f16x2.f32 %r933, %r932, %r931; 2026-02-21T08:13:39.2721279Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2721397Z cvt.u64.u32 %rd269, %r623; 2026-02-21T08:13:39.2721464Z cvt.u64.u32 %rd270, %r624; 2026-02-21T08:13:39.2721538Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:13:39.2721608Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:13:39.2721821Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2721896Z mov.b64 {%r934, %r935}, %rd272; 2026-02-21T08:13:39.2721970Z cvt.rn.f16x2.f32 %r936, %r935, %r934; 2026-02-21T08:13:39.2722182Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2722255Z cvt.u64.u32 %rd273, %r625; 2026-02-21T08:13:39.2722322Z cvt.u64.u32 %rd274, %r626; 2026-02-21T08:13:39.2722389Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:13:39.2722456Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:13:39.2722723Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2722799Z mov.b64 {%r937, %r938}, %rd276; 2026-02-21T08:13:39.2722871Z cvt.rn.f16x2.f32 %r939, %r938, %r937; 2026-02-21T08:13:39.2723087Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2723154Z cvt.u64.u32 %rd277, %r627; 2026-02-21T08:13:39.2723221Z cvt.u64.u32 %rd278, %r628; 2026-02-21T08:13:39.2723295Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:13:39.2723364Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:13:39.2723573Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2723641Z mov.b64 {%r940, %r941}, %rd280; 2026-02-21T08:13:39.2723724Z cvt.rn.f16x2.f32 %r942, %r941, %r940; 2026-02-21T08:13:39.2723939Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2724010Z cvt.u64.u32 %rd281, %r629; 2026-02-21T08:13:39.2724090Z cvt.u64.u32 %rd282, %r630; 2026-02-21T08:13:39.2724162Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:13:39.2724233Z or.b64 %rd284, %rd281, 
%rd283; 2026-02-21T08:13:39.2724450Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2724522Z mov.b64 {%r943, %r944}, %rd284; 2026-02-21T08:13:39.2724597Z cvt.rn.f16x2.f32 %r945, %r944, %r943; 2026-02-21T08:13:39.2724905Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2724985Z cvt.u64.u32 %rd285, %r631; 2026-02-21T08:13:39.2725055Z cvt.u64.u32 %rd286, %r632; 2026-02-21T08:13:39.2725125Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:13:39.2725203Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:13:39.2725403Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2725474Z mov.b64 {%r946, %r947}, %rd288; 2026-02-21T08:13:39.2725560Z cvt.rn.f16x2.f32 %r948, %r947, %r946; 2026-02-21T08:13:39.2725768Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2725848Z cvt.u64.u32 %rd289, %r633; 2026-02-21T08:13:39.2725915Z cvt.u64.u32 %rd290, %r634; 2026-02-21T08:13:39.2725991Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:13:39.2726058Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:13:39.2726254Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2726331Z mov.b64 {%r949, %r950}, %rd292; 2026-02-21T08:13:39.2726404Z cvt.rn.f16x2.f32 %r951, %r950, %r949; 2026-02-21T08:13:39.2726599Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2726674Z cvt.u64.u32 %rd293, %r635; 2026-02-21T08:13:39.2726741Z cvt.u64.u32 %rd294, %r636; 2026-02-21T08:13:39.2726809Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:13:39.2726878Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:13:39.2727088Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2727231Z mov.b64 {%r952, %r953}, %rd296; 2026-02-21T08:13:39.2727307Z cvt.rn.f16x2.f32 %r954, %r953, %r952; 2026-02-21T08:13:39.2727520Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2727591Z cvt.u64.u32 %rd297, %r637; 2026-02-21T08:13:39.2727660Z cvt.u64.u32 %rd298, %r638; 2026-02-21T08:13:39.2727736Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:13:39.2727804Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:13:39.2728007Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2728075Z mov.b64 {%r955, %r956}, %rd300; 2026-02-21T08:13:39.2728159Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T08:13:39.2728359Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2728486Z cvt.u64.u32 %rd301, %r640; 2026-02-21T08:13:39.2728567Z cvt.u64.u32 %rd302, %r641; 2026-02-21T08:13:39.2728636Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:13:39.2728704Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:13:39.2728917Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2728987Z mov.b64 {%r958, %r959}, %rd304; 2026-02-21T08:13:39.2729060Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T08:13:39.2729268Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2729346Z cvt.u64.u32 %rd305, %r642; 2026-02-21T08:13:39.2729414Z cvt.u64.u32 %rd306, %r643; 2026-02-21T08:13:39.2729482Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:13:39.2729560Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:13:39.2729764Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2729836Z mov.b64 {%r961, %r962}, %rd308; 2026-02-21T08:13:39.2729922Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T08:13:39.2730125Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2730192Z cvt.u64.u32 %rd309, %r644; 2026-02-21T08:13:39.2730260Z cvt.u64.u32 %rd310, %r645; 2026-02-21T08:13:39.2730337Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:13:39.2730407Z or.b64 %rd312, %rd309, 
%rd311; 2026-02-21T08:13:39.2730608Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2730686Z mov.b64 {%r964, %r965}, %rd312; 2026-02-21T08:13:39.2730761Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T08:13:39.2730967Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2731043Z cvt.u64.u32 %rd313, %r646; 2026-02-21T08:13:39.2731111Z cvt.u64.u32 %rd314, %r647; 2026-02-21T08:13:39.2731181Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:13:39.2731251Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:13:39.2731461Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2731529Z mov.b64 {%r967, %r968}, %rd316; 2026-02-21T08:13:39.2731602Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T08:13:39.2731811Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2731879Z cvt.u64.u32 %rd317, %r648; 2026-02-21T08:13:39.2731946Z cvt.u64.u32 %rd318, %r649; 2026-02-21T08:13:39.2732014Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:13:39.2732092Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:13:39.2732296Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2732365Z mov.b64 {%r970, %r971}, %rd320; 2026-02-21T08:13:39.2732448Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T08:13:39.2732653Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2732772Z cvt.u64.u32 %rd321, %r650; 2026-02-21T08:13:39.2732847Z cvt.u64.u32 %rd322, %r651; 2026-02-21T08:13:39.2732915Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:13:39.2732984Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:13:39.2733191Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2733265Z mov.b64 {%r973, %r974}, %rd324; 2026-02-21T08:13:39.2733338Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T08:13:39.2733542Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2733617Z cvt.u64.u32 %rd325, %r652; 2026-02-21T08:13:39.2733685Z cvt.u64.u32 %rd326, %r653; 2026-02-21T08:13:39.2733751Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:13:39.2733828Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:13:39.2734090Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2734164Z mov.b64 {%r976, %r977}, %rd328; 2026-02-21T08:13:39.2734235Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T08:13:39.2734443Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2734511Z cvt.u64.u32 %rd329, %r654; 2026-02-21T08:13:39.2734579Z cvt.u64.u32 %rd330, %r655; 2026-02-21T08:13:39.2734657Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:13:39.2734792Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:13:39.2734995Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2735070Z mov.b64 {%r979, %r980}, %rd332; 2026-02-21T08:13:39.2735144Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T08:13:39.2735343Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2735422Z cvt.u64.u32 %rd333, %r657; 2026-02-21T08:13:39.2735492Z cvt.u64.u32 %rd334, %r658; 2026-02-21T08:13:39.2735564Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:13:39.2735633Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:13:39.2735846Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2735915Z mov.b64 {%r982, %r983}, %rd336; 2026-02-21T08:13:39.2735989Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T08:13:39.2736197Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2736265Z cvt.u64.u32 %rd337, %r659; 2026-02-21T08:13:39.2736332Z cvt.u64.u32 %rd338, %r660; 2026-02-21T08:13:39.2736400Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:13:39.2736478Z or.b64 %rd340, %rd337, 
%rd339; 2026-02-21T08:13:39.2736692Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2736761Z mov.b64 {%r985, %r986}, %rd340; 2026-02-21T08:13:39.2736848Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T08:13:39.2737051Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2737120Z cvt.u64.u32 %rd341, %r661; 2026-02-21T08:13:39.2737198Z cvt.u64.u32 %rd342, %r662; 2026-02-21T08:13:39.2737268Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:13:39.2737338Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:13:39.2737547Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2737626Z mov.b64 {%r988, %r989}, %rd344; 2026-02-21T08:13:39.2737701Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T08:13:39.2737913Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2737993Z cvt.u64.u32 %rd345, %r663; 2026-02-21T08:13:39.2738063Z cvt.u64.u32 %rd346, %r664; 2026-02-21T08:13:39.2738134Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:13:39.2738216Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:13:39.2738483Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2738552Z mov.b64 {%r991, %r992}, %rd348; 2026-02-21T08:13:39.2738626Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T08:13:39.2738844Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2738912Z cvt.u64.u32 %rd349, %r665; 2026-02-21T08:13:39.2738978Z cvt.u64.u32 %rd350, %r666; 2026-02-21T08:13:39.2739055Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:13:39.2739122Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:13:39.2739334Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2739410Z mov.b64 {%r994, %r995}, %rd352; 2026-02-21T08:13:39.2739483Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T08:13:39.2739747Z .loc 1 56 52 
// cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2739827Z cvt.u64.u32 %rd353, %r667; 2026-02-21T08:13:39.2739894Z cvt.u64.u32 %rd354, %r668; 2026-02-21T08:13:39.2739961Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:13:39.2740031Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:13:39.2740242Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2740311Z mov.b64 {%r997, %r998}, %rd356; 2026-02-21T08:13:39.2740384Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T08:13:39.2740605Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2740674Z cvt.u64.u32 %rd357, %r669; 2026-02-21T08:13:39.2740743Z cvt.u64.u32 %rd358, %r670; 2026-02-21T08:13:39.2740811Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:13:39.2740887Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:13:39.2741102Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2741179Z mov.b64 {%r1000, %r1001}, %rd360; 2026-02-21T08:13:39.2741272Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T08:13:39.2741489Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2741558Z cvt.u64.u32 %rd361, %r671; 2026-02-21T08:13:39.2741634Z cvt.u64.u32 %rd362, %r672; 2026-02-21T08:13:39.2741701Z shl.b64 %rd363, %rd362, 32; 2026-02-21T08:13:39.2741770Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T08:13:39.2741985Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2742064Z mov.b64 {%r1003, %r1004}, %rd364; 2026-02-21T08:13:39.2742145Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T08:13:39.2742348Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2742425Z cvt.u64.u32 %rd365, %r674; 2026-02-21T08:13:39.2742495Z cvt.u64.u32 %rd366, %r675; 2026-02-21T08:13:39.2742566Z shl.b64 %rd367, %rd366, 32; 2026-02-21T08:13:39.2742643Z or.b64 
%rd368, %rd365, %rd367; 2026-02-21T08:13:39.2742845Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2742916Z mov.b64 {%r1006, %r1007}, %rd368; 2026-02-21T08:13:39.2742997Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T08:13:39.2743213Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2743281Z cvt.u64.u32 %rd369, %r676; 2026-02-21T08:13:39.2743348Z cvt.u64.u32 %rd370, %r677; 2026-02-21T08:13:39.2743424Z shl.b64 %rd371, %rd370, 32; 2026-02-21T08:13:39.2743492Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T08:13:39.2743705Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2743780Z mov.b64 {%r1009, %r1010}, %rd372; 2026-02-21T08:13:39.2743862Z cvt.rn.f16x2.f32 %r1011, %r1010, %r1009; 2026-02-21T08:13:39.2744179Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2744258Z cvt.u64.u32 %rd373, %r678; 2026-02-21T08:13:39.2744326Z cvt.u64.u32 %rd374, %r679; 2026-02-21T08:13:39.2744395Z shl.b64 %rd375, %rd374, 32; 2026-02-21T08:13:39.2744465Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T08:13:39.2744743Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2744816Z mov.b64 {%r1012, %r1013}, %rd376; 2026-02-21T08:13:39.2744896Z cvt.rn.f16x2.f32 %r1014, %r1013, %r1012; 2026-02-21T08:13:39.2745113Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2745182Z cvt.u64.u32 %rd377, %r680; 2026-02-21T08:13:39.2745250Z cvt.u64.u32 %rd378, %r681; 2026-02-21T08:13:39.2745318Z shl.b64 %rd379, %rd378, 32; 2026-02-21T08:13:39.2745453Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T08:13:39.2745670Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2745744Z mov.b64 {%r1015, %r1016}, %rd380; 2026-02-21T08:13:39.2745832Z cvt.rn.f16x2.f32 %r1017, %r1016, %r1015; 
2026-02-21T08:13:39.2746042Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2746111Z cvt.u64.u32 %rd381, %r682; 2026-02-21T08:13:39.2746185Z cvt.u64.u32 %rd382, %r683; 2026-02-21T08:13:39.2746254Z shl.b64 %rd383, %rd382, 32; 2026-02-21T08:13:39.2746323Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T08:13:39.2746524Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2746601Z mov.b64 {%r1018, %r1019}, %rd384; 2026-02-21T08:13:39.2746681Z cvt.rn.f16x2.f32 %r1020, %r1019, %r1018; 2026-02-21T08:13:39.2746888Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2746967Z cvt.u64.u32 %rd385, %r684; 2026-02-21T08:13:39.2747035Z cvt.u64.u32 %rd386, %r685; 2026-02-21T08:13:39.2747103Z shl.b64 %rd387, %rd386, 32; 2026-02-21T08:13:39.2747180Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T08:13:39.2747379Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2747451Z mov.b64 {%r1021, %r1022}, %rd388; 2026-02-21T08:13:39.2747531Z cvt.rn.f16x2.f32 %r1023, %r1022, %r1021; 2026-02-21T08:13:39.2747738Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2747806Z cvt.u64.u32 %rd389, %r686; 2026-02-21T08:13:39.2747873Z cvt.u64.u32 %rd390, %r687; 2026-02-21T08:13:39.2747949Z shl.b64 %rd391, %rd390, 32; 2026-02-21T08:13:39.2748017Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T08:13:39.2748217Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2748299Z mov.b64 {%r1024, %r1025}, %rd392; 2026-02-21T08:13:39.2748381Z cvt.rn.f16x2.f32 %r1026, %r1025, %r1024; 2026-02-21T08:13:39.2748581Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2748660Z cvt.u64.u32 %rd393, %r688; 2026-02-21T08:13:39.2748729Z cvt.u64.u32 %rd394, %r689; 2026-02-21T08:13:39.2748798Z shl.b64 %rd395, %rd394, 
32; 2026-02-21T08:13:39.2748868Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T08:13:39.2749080Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2749154Z mov.b64 {%r1027, %r1028}, %rd396; 2026-02-21T08:13:39.2749234Z cvt.rn.f16x2.f32 %r1029, %r1028, %r1027; 2026-02-21T08:13:39.2749445Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2749513Z cvt.u64.u32 %rd397, %r691; 2026-02-21T08:13:39.2749580Z cvt.u64.u32 %rd398, %r692; 2026-02-21T08:13:39.2749652Z shl.b64 %rd399, %rd398, 32; 2026-02-21T08:13:39.2749789Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T08:13:39.2749992Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2750061Z mov.b64 {%r1030, %r1031}, %rd400; 2026-02-21T08:13:39.2750147Z cvt.rn.f16x2.f32 %r1032, %r1031, %r1030; 2026-02-21T08:13:39.2750344Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2750413Z cvt.u64.u32 %rd401, %r693; 2026-02-21T08:13:39.2750486Z cvt.u64.u32 %rd402, %r694; 2026-02-21T08:13:39.2750552Z shl.b64 %rd403, %rd402, 32; 2026-02-21T08:13:39.2750621Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T08:13:39.2750819Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2750894Z mov.b64 {%r1033, %r1034}, %rd404; 2026-02-21T08:13:39.2751024Z cvt.rn.f16x2.f32 %r1035, %r1034, %r1033; 2026-02-21T08:13:39.2751233Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2751311Z cvt.u64.u32 %rd405, %r695; 2026-02-21T08:13:39.2751379Z cvt.u64.u32 %rd406, %r696; 2026-02-21T08:13:39.2751448Z shl.b64 %rd407, %rd406, 32; 2026-02-21T08:13:39.2751524Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T08:13:39.2751729Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2751798Z mov.b64 {%r1036, %r1037}, %rd408; 2026-02-21T08:13:39.2751877Z 
cvt.rn.f16x2.f32 %r1038, %r1037, %r1036; 2026-02-21T08:13:39.2752095Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2752164Z cvt.u64.u32 %rd409, %r697; 2026-02-21T08:13:39.2752230Z cvt.u64.u32 %rd410, %r698; 2026-02-21T08:13:39.2752305Z shl.b64 %rd411, %rd410, 32; 2026-02-21T08:13:39.2752375Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T08:13:39.2752585Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2752664Z mov.b64 {%r1039, %r1040}, %rd412; 2026-02-21T08:13:39.2752744Z cvt.rn.f16x2.f32 %r1041, %r1040, %r1039; 2026-02-21T08:13:39.2752950Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2753026Z cvt.u64.u32 %rd413, %r699; 2026-02-21T08:13:39.2753093Z cvt.u64.u32 %rd414, %r700; 2026-02-21T08:13:39.2753161Z shl.b64 %rd415, %rd414, 32; 2026-02-21T08:13:39.2753229Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T08:13:39.2753443Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2753514Z mov.b64 {%r1042, %r1043}, %rd416; 2026-02-21T08:13:39.2753593Z cvt.rn.f16x2.f32 %r1044, %r1043, %r1042; 2026-02-21T08:13:39.2753804Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2753875Z cvt.u64.u32 %rd417, %r701; 2026-02-21T08:13:39.2753946Z cvt.u64.u32 %rd418, %r702; 2026-02-21T08:13:39.2754014Z shl.b64 %rd419, %rd418, 32; 2026-02-21T08:13:39.2754091Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T08:13:39.2754297Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2754365Z mov.b64 {%r1045, %r1046}, %rd420; 2026-02-21T08:13:39.2754451Z cvt.rn.f16x2.f32 %r1047, %r1046, %r1045; 2026-02-21T08:13:39.2754657Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2754999Z cvt.u64.u32 %rd421, %r703; 2026-02-21T08:13:39.2755078Z cvt.u64.u32 %rd422, %r704; 
2026-02-21T08:13:39.2755148Z shl.b64 %rd423, %rd422, 32; 2026-02-21T08:13:39.2755218Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T08:13:39.2755427Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2755507Z mov.b64 {%r1048, %r1049}, %rd424; 2026-02-21T08:13:39.2755684Z cvt.rn.f16x2.f32 %r1050, %r1049, %r1048; 2026-02-21T08:13:39.2755904Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2755979Z cvt.u64.u32 %rd425, %r705; 2026-02-21T08:13:39.2756046Z cvt.u64.u32 %rd426, %r706; 2026-02-21T08:13:39.2756113Z shl.b64 %rd427, %rd426, 32; 2026-02-21T08:13:39.2756186Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T08:13:39.2756396Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2756464Z mov.b64 {%r1051, %r1052}, %rd428; 2026-02-21T08:13:39.2756541Z cvt.rn.f16x2.f32 %r1053, %r1052, %r1051; 2026-02-21T08:13:39.2756758Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2756827Z cvt.u64.u32 %rd429, %r708; 2026-02-21T08:13:39.2756893Z cvt.u64.u32 %rd430, %r709; 2026-02-21T08:13:39.2757024Z shl.b64 %rd431, %rd430, 32; 2026-02-21T08:13:39.2757101Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T08:13:39.2757319Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2757400Z mov.b64 {%r1054, %r1055}, %rd432; 2026-02-21T08:13:39.2757481Z cvt.rn.f16x2.f32 %r1056, %r1055, %r1054; 2026-02-21T08:13:39.2757697Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2757774Z cvt.u64.u32 %rd433, %r710; 2026-02-21T08:13:39.2757842Z cvt.u64.u32 %rd434, %r711; 2026-02-21T08:13:39.2757914Z shl.b64 %rd435, %rd434, 32; 2026-02-21T08:13:39.2757982Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T08:13:39.2758194Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2758263Z mov.b64 {%r1057, 
%r1058}, %rd436; 2026-02-21T08:13:39.2758342Z cvt.rn.f16x2.f32 %r1059, %r1058, %r1057; 2026-02-21T08:13:39.2758559Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2758631Z cvt.u64.u32 %rd437, %r712; 2026-02-21T08:13:39.2758697Z cvt.u64.u32 %rd438, %r713; 2026-02-21T08:13:39.2758766Z shl.b64 %rd439, %rd438, 32; 2026-02-21T08:13:39.2758843Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T08:13:39.2759047Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2759115Z mov.b64 {%r1060, %r1061}, %rd440; 2026-02-21T08:13:39.2759201Z cvt.rn.f16x2.f32 %r1062, %r1061, %r1060; 2026-02-21T08:13:39.2759406Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2759475Z cvt.u64.u32 %rd441, %r714; 2026-02-21T08:13:39.2759552Z cvt.u64.u32 %rd442, %r715; 2026-02-21T08:13:39.2759621Z shl.b64 %rd443, %rd442, 32; 2026-02-21T08:13:39.2759691Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T08:13:39.2759901Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2759983Z mov.b64 {%r1063, %r1064}, %rd444; 2026-02-21T08:13:39.2760062Z cvt.rn.f16x2.f32 %r1065, %r1064, %r1063; 2026-02-21T08:13:39.2760267Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2760342Z cvt.u64.u32 %rd445, %r716; 2026-02-21T08:13:39.2760409Z cvt.u64.u32 %rd446, %r717; 2026-02-21T08:13:39.2760478Z shl.b64 %rd447, %rd446, 32; 2026-02-21T08:13:39.2760555Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T08:13:39.2760761Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2760830Z mov.b64 {%r1066, %r1067}, %rd448; 2026-02-21T08:13:39.2760906Z cvt.rn.f16x2.f32 %r1068, %r1067, %r1066; 2026-02-21T08:13:39.2761117Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2761186Z cvt.u64.u32 %rd449, %r718; 
2026-02-21T08:13:39.2761305Z cvt.u64.u32 %rd450, %r719; 2026-02-21T08:13:39.2761382Z shl.b64 %rd451, %rd450, 32; 2026-02-21T08:13:39.2761450Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T08:13:39.2761654Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2761730Z mov.b64 {%r1069, %r1070}, %rd452; 2026-02-21T08:13:39.2761808Z cvt.rn.f16x2.f32 %r1071, %r1070, %r1069; 2026-02-21T08:13:39.2762022Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2762099Z cvt.u64.u32 %rd453, %r720; 2026-02-21T08:13:39.2762167Z cvt.u64.u32 %rd454, %r721; 2026-02-21T08:13:39.2762236Z shl.b64 %rd455, %rd454, 32; 2026-02-21T08:13:39.2762303Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T08:13:39.2762515Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2762633Z mov.b64 {%r1072, %r1073}, %rd456; 2026-02-21T08:13:39.2762717Z cvt.rn.f16x2.f32 %r1074, %r1073, %r1072; 2026-02-21T08:13:39.2762930Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2762998Z cvt.u64.u32 %rd457, %r722; 2026-02-21T08:13:39.2763066Z cvt.u64.u32 %rd458, %r723; 2026-02-21T08:13:39.2763135Z shl.b64 %rd459, %rd458, 32; 2026-02-21T08:13:39.2763211Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T08:13:39.2763421Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2763489Z mov.b64 {%r1075, %r1076}, %rd460; 2026-02-21T08:13:39.2763576Z cvt.rn.f16x2.f32 %r1077, %r1076, %r1075; 2026-02-21T08:13:39.2763776Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2763844Z cvt.u64.u32 %rd461, %r725; 2026-02-21T08:13:39.2763919Z cvt.u64.u32 %rd462, %r726; 2026-02-21T08:13:39.2763991Z shl.b64 %rd463, %rd462, 32; 2026-02-21T08:13:39.2764063Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T08:13:39.2764267Z .loc 1 58 27 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2764345Z mov.b64 {%r1078, %r1079}, %rd464; 2026-02-21T08:13:39.2764422Z cvt.rn.f16x2.f32 %r1080, %r1079, %r1078; 2026-02-21T08:13:39.2764621Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2764780Z cvt.u64.u32 %rd465, %r727; 2026-02-21T08:13:39.2764849Z cvt.u64.u32 %rd466, %r728; 2026-02-21T08:13:39.2764916Z shl.b64 %rd467, %rd466, 32; 2026-02-21T08:13:39.2764992Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T08:13:39.2765190Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2765259Z mov.b64 {%r1081, %r1082}, %rd468; 2026-02-21T08:13:39.2765337Z cvt.rn.f16x2.f32 %r1083, %r1082, %r1081; 2026-02-21T08:13:39.2765548Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2765618Z cvt.u64.u32 %rd469, %r729; 2026-02-21T08:13:39.2765684Z cvt.u64.u32 %rd470, %r730; 2026-02-21T08:13:39.2765758Z shl.b64 %rd471, %rd470, 32; 2026-02-21T08:13:39.2765826Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T08:13:39.2766027Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2766102Z mov.b64 {%r1084, %r1085}, %rd472; 2026-02-21T08:13:39.2766180Z cvt.rn.f16x2.f32 %r1086, %r1085, %r1084; 2026-02-21T08:13:39.2766381Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2766454Z cvt.u64.u32 %rd473, %r731; 2026-02-21T08:13:39.2766520Z cvt.u64.u32 %rd474, %r732; 2026-02-21T08:13:39.2766588Z shl.b64 %rd475, %rd474, 32; 2026-02-21T08:13:39.2766655Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T08:13:39.2766875Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2767006Z mov.b64 {%r1087, %r1088}, %rd476; 2026-02-21T08:13:39.2767086Z cvt.rn.f16x2.f32 %r1089, %r1088, %r1087; 2026-02-21T08:13:39.2767294Z .loc 1 56 52 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2767361Z cvt.u64.u32 %rd477, %r733; 2026-02-21T08:13:39.2767428Z cvt.u64.u32 %rd478, %r734; 2026-02-21T08:13:39.2767496Z shl.b64 %rd479, %rd478, 32; 2026-02-21T08:13:39.2767572Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T08:13:39.2767776Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2767844Z mov.b64 {%r1090, %r1091}, %rd480; 2026-02-21T08:13:39.2767929Z cvt.rn.f16x2.f32 %r1092, %r1091, %r1090; 2026-02-21T08:13:39.2768133Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2768256Z cvt.u64.u32 %rd481, %r735; 2026-02-21T08:13:39.2768334Z cvt.u64.u32 %rd482, %r736; 2026-02-21T08:13:39.2768403Z shl.b64 %rd483, %rd482, 32; 2026-02-21T08:13:39.2768472Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T08:13:39.2768677Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2768754Z mov.b64 {%r1093, %r1094}, %rd484; 2026-02-21T08:13:39.2768832Z cvt.rn.f16x2.f32 %r1095, %r1094, %r1093; 2026-02-21T08:13:39.2769034Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2769109Z cvt.u64.u32 %rd485, %r737; 2026-02-21T08:13:39.2769175Z cvt.u64.u32 %rd486, %r738; 2026-02-21T08:13:39.2769244Z shl.b64 %rd487, %rd486, 32; 2026-02-21T08:13:39.2769320Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T08:13:39.2769524Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2769596Z mov.b64 {%r1096, %r1097}, %rd488; 2026-02-21T08:13:39.2769680Z cvt.rn.f16x2.f32 %r1098, %r1097, %r1096; 2026-02-21T08:13:39.2769895Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2769964Z cvt.u64.u32 %rd489, %r739; 2026-02-21T08:13:39.2770030Z cvt.u64.u32 %rd490, %r740; 2026-02-21T08:13:39.2770107Z shl.b64 %rd491, %rd490, 32; 2026-02-21T08:13:39.2770179Z or.b64 
%rd492, %rd489, %rd491; 2026-02-21T08:13:39.2770379Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2770457Z mov.b64 {%r1099, %r1100}, %rd492; 2026-02-21T08:13:39.2770537Z cvt.rn.f16x2.f32 %r1101, %r1100, %r1099; 2026-02-21T08:13:39.2770741Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2770821Z cvt.u64.u32 %rd493, %r742; 2026-02-21T08:13:39.2770888Z cvt.u64.u32 %rd494, %r743; 2026-02-21T08:13:39.2770959Z shl.b64 %rd495, %rd494, 32; 2026-02-21T08:13:39.2771030Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T08:13:39.2771248Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2771316Z mov.b64 {%r1102, %r1103}, %rd496; 2026-02-21T08:13:39.2771395Z cvt.rn.f16x2.f32 %r1104, %r1103, %r1102; 2026-02-21T08:13:39.2771607Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2771674Z cvt.u64.u32 %rd497, %r744; 2026-02-21T08:13:39.2771741Z cvt.u64.u32 %rd498, %r745; 2026-02-21T08:13:39.2771809Z shl.b64 %rd499, %rd498, 32; 2026-02-21T08:13:39.2771883Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T08:13:39.2772088Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2772157Z mov.b64 {%r1105, %r1106}, %rd500; 2026-02-21T08:13:39.2772242Z cvt.rn.f16x2.f32 %r1107, %r1106, %r1105; 2026-02-21T08:13:39.2772448Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2772569Z cvt.u64.u32 %rd501, %r746; 2026-02-21T08:13:39.2772646Z cvt.u64.u32 %rd502, %r747; 2026-02-21T08:13:39.2772714Z shl.b64 %rd503, %rd502, 32; 2026-02-21T08:13:39.2772783Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T08:13:39.2772986Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2773062Z mov.b64 {%r1108, %r1109}, %rd504; 2026-02-21T08:13:39.2773138Z cvt.rn.f16x2.f32 %r1110, %r1109, %r1108; 
2026-02-21T08:13:39.2773336Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2773413Z cvt.u64.u32 %rd505, %r748; 2026-02-21T08:13:39.2773479Z cvt.u64.u32 %rd506, %r749; 2026-02-21T08:13:39.2773548Z shl.b64 %rd507, %rd506, 32; 2026-02-21T08:13:39.2773623Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T08:13:39.2773874Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2773948Z mov.b64 {%r1111, %r1112}, %rd508; 2026-02-21T08:13:39.2774028Z cvt.rn.f16x2.f32 %r1113, %r1112, %r1111; 2026-02-21T08:13:39.2774233Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2774300Z cvt.u64.u32 %rd509, %r750; 2026-02-21T08:13:39.2774367Z cvt.u64.u32 %rd510, %r751; 2026-02-21T08:13:39.2774443Z shl.b64 %rd511, %rd510, 32; 2026-02-21T08:13:39.2774512Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T08:13:39.2774762Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2774844Z mov.b64 {%r1114, %r1115}, %rd512; 2026-02-21T08:13:39.2774925Z cvt.rn.f16x2.f32 %r1116, %r1115, %r1114; 2026-02-21T08:13:39.2775130Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2775208Z cvt.u64.u32 %rd513, %r752; 2026-02-21T08:13:39.2775279Z cvt.u64.u32 %rd514, %r753; 2026-02-21T08:13:39.2775348Z shl.b64 %rd515, %rd514, 32; 2026-02-21T08:13:39.2775417Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T08:13:39.2775634Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2775704Z mov.b64 {%r1117, %r1118}, %rd516; 2026-02-21T08:13:39.2775782Z cvt.rn.f16x2.f32 %r1119, %r1118, %r1117; 2026-02-21T08:13:39.2775992Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2776059Z cvt.u64.u32 %rd517, %r754; 2026-02-21T08:13:39.2776125Z cvt.u64.u32 %rd518, %r755; 2026-02-21T08:13:39.2776194Z shl.b64 %rd519, %rd518, 
32; 2026-02-21T08:13:39.2776268Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T08:13:39.2776475Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2776545Z mov.b64 {%r1120, %r1121}, %rd520; 2026-02-21T08:13:39.2776634Z cvt.rn.f16x2.f32 %r1122, %r1121, %r1120; 2026-02-21T08:13:39.2776842Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2776910Z cvt.u64.u32 %rd521, %r756; 2026-02-21T08:13:39.2776983Z cvt.u64.u32 %rd522, %r757; 2026-02-21T08:13:39.2777053Z shl.b64 %rd523, %rd522, 32; 2026-02-21T08:13:39.2777120Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T08:13:39.2777331Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2777409Z mov.b64 {%r1123, %r1124}, %rd524; 2026-02-21T08:13:39.2777489Z cvt.rn.f16x2.f32 %r1125, %r1124, %r1123; 2026-02-21T08:13:39.2777704Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2777779Z cvt.u64.u32 %rd525, %r759; 2026-02-21T08:13:39.2777848Z cvt.u64.u32 %rd526, %r760; 2026-02-21T08:13:39.2777916Z shl.b64 %rd527, %rd526, 32; 2026-02-21T08:13:39.2777996Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T08:13:39.2778286Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2778357Z mov.b64 {%r1126, %r1127}, %rd528; 2026-02-21T08:13:39.2778437Z cvt.rn.f16x2.f32 %r1128, %r1127, %r1126; 2026-02-21T08:13:39.2778654Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2778723Z cvt.u64.u32 %rd529, %r761; 2026-02-21T08:13:39.2778791Z cvt.u64.u32 %rd530, %r762; 2026-02-21T08:13:39.2778868Z shl.b64 %rd531, %rd530, 32; 2026-02-21T08:13:39.2778938Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T08:13:39.2779152Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2779229Z mov.b64 {%r1129, %r1130}, %rd532; 2026-02-21T08:13:39.2779309Z 
cvt.rn.f16x2.f32 %r1131, %r1130, %r1129; 2026-02-21T08:13:39.2779577Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2779656Z cvt.u64.u32 %rd533, %r763; 2026-02-21T08:13:39.2779723Z cvt.u64.u32 %rd534, %r764; 2026-02-21T08:13:39.2779792Z shl.b64 %rd535, %rd534, 32; 2026-02-21T08:13:39.2779860Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T08:13:39.2780076Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2780147Z mov.b64 {%r1132, %r1133}, %rd536; 2026-02-21T08:13:39.2780227Z cvt.rn.f16x2.f32 %r1134, %r1133, %r1132; 2026-02-21T08:13:39.2780444Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2780513Z cvt.u64.u32 %rd537, %r765; 2026-02-21T08:13:39.2780580Z cvt.u64.u32 %rd538, %r766; 2026-02-21T08:13:39.2780649Z shl.b64 %rd539, %rd538, 32; 2026-02-21T08:13:39.2780726Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T08:13:39.2780945Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2781018Z mov.b64 {%r1135, %r1136}, %rd540; 2026-02-21T08:13:39.2781107Z cvt.rn.f16x2.f32 %r1137, %r1136, %r1135; 2026-02-21T08:13:39.2781319Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2781389Z cvt.u64.u32 %rd541, %r767; 2026-02-21T08:13:39.2781464Z cvt.u64.u32 %rd542, %r768; 2026-02-21T08:13:39.2781534Z shl.b64 %rd543, %rd542, 32; 2026-02-21T08:13:39.2781605Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T08:13:39.2781816Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2781892Z mov.b64 {%r1138, %r1139}, %rd544; 2026-02-21T08:13:39.2781970Z cvt.rn.f16x2.f32 %r1140, %r1139, %r1138; 2026-02-21T08:13:39.2782184Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2782258Z cvt.u64.u32 %rd545, %r769; 2026-02-21T08:13:39.2782327Z cvt.u64.u32 %rd546, %r770; 
2026-02-21T08:13:39.2782398Z shl.b64 %rd547, %rd546, 32; 2026-02-21T08:13:39.2782474Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T08:13:39.2782687Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2782756Z mov.b64 {%r1141, %r1142}, %rd548; 2026-02-21T08:13:39.2782833Z cvt.rn.f16x2.f32 %r1143, %r1142, %r1141; 2026-02-21T08:13:39.2783052Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2783120Z cvt.u64.u32 %rd549, %r771; 2026-02-21T08:13:39.2783189Z cvt.u64.u32 %rd550, %r772; 2026-02-21T08:13:39.2783264Z shl.b64 %rd551, %rd550, 32; 2026-02-21T08:13:39.2783332Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T08:13:39.2783542Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2783618Z mov.b64 {%r1144, %r1145}, %rd552; 2026-02-21T08:13:39.2783697Z cvt.rn.f16x2.f32 %r1146, %r1145, %r1144; 2026-02-21T08:13:39.2783957Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2784033Z cvt.u64.u32 %rd553, %r773; 2026-02-21T08:13:39.2784101Z cvt.u64.u32 %rd554, %r774; 2026-02-21T08:13:39.2784169Z shl.b64 %rd555, %rd554, 32; 2026-02-21T08:13:39.2784238Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T08:13:39.2784448Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2784518Z mov.b64 {%r1147, %r1148}, %rd556; 2026-02-21T08:13:39.2784596Z cvt.rn.f16x2.f32 %r1149, %r1148, %r1147; 2026-02-21T08:13:39.2784855Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2784924Z cvt.u64.u32 %rd557, %r776; 2026-02-21T08:13:39.2784992Z cvt.u64.u32 %rd558, %r777; 2026-02-21T08:13:39.2785061Z shl.b64 %rd559, %rd558, 32; 2026-02-21T08:13:39.2785193Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T08:13:39.2785402Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2785474Z mov.b64 {%r1150, 
%r1151}, %rd560; 2026-02-21T08:13:39.2785559Z cvt.rn.f16x2.f32 %r1152, %r1151, %r1150; 2026-02-21T08:13:39.2785768Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2785835Z cvt.u64.u32 %rd561, %r778; 2026-02-21T08:13:39.2785909Z cvt.u64.u32 %rd562, %r779; 2026-02-21T08:13:39.2785977Z shl.b64 %rd563, %rd562, 32; 2026-02-21T08:13:39.2786046Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T08:13:39.2786255Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2786331Z mov.b64 {%r1153, %r1154}, %rd564; 2026-02-21T08:13:39.2786408Z cvt.rn.f16x2.f32 %r1155, %r1154, %r1153; 2026-02-21T08:13:39.2786614Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2786693Z cvt.u64.u32 %rd565, %r780; 2026-02-21T08:13:39.2786760Z cvt.u64.u32 %rd566, %r781; 2026-02-21T08:13:39.2786828Z shl.b64 %rd567, %rd566, 32; 2026-02-21T08:13:39.2786904Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T08:13:39.2787106Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2787174Z mov.b64 {%r1156, %r1157}, %rd568; 2026-02-21T08:13:39.2787251Z cvt.rn.f16x2.f32 %r1158, %r1157, %r1156; 2026-02-21T08:13:39.2787464Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2787532Z cvt.u64.u32 %rd569, %r782; 2026-02-21T08:13:39.2787599Z cvt.u64.u32 %rd570, %r783; 2026-02-21T08:13:39.2787675Z shl.b64 %rd571, %rd570, 32; 2026-02-21T08:13:39.2787744Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T08:13:39.2787947Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2788026Z mov.b64 {%r1159, %r1160}, %rd572; 2026-02-21T08:13:39.2788103Z cvt.rn.f16x2.f32 %r1161, %r1160, %r1159; 2026-02-21T08:13:39.2788304Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2788379Z cvt.u64.u32 %rd573, %r784; 
2026-02-21T08:13:39.2788446Z cvt.u64.u32 %rd574, %r785; 2026-02-21T08:13:39.2788514Z shl.b64 %rd575, %rd574, 32; 2026-02-21T08:13:39.2788582Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T08:13:39.2788792Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2788861Z mov.b64 {%r1162, %r1163}, %rd576; 2026-02-21T08:13:39.2788938Z cvt.rn.f16x2.f32 %r1164, %r1163, %r1162; 2026-02-21T08:13:39.2789144Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2789212Z cvt.u64.u32 %rd577, %r786; 2026-02-21T08:13:39.2789281Z cvt.u64.u32 %rd578, %r787; 2026-02-21T08:13:39.2789413Z shl.b64 %rd579, %rd578, 32; 2026-02-21T08:13:39.2789490Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T08:13:39.2789694Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2789764Z mov.b64 {%r1165, %r1166}, %rd580; 2026-02-21T08:13:39.2789849Z cvt.rn.f16x2.f32 %r1167, %r1166, %r1165; 2026-02-21T08:13:39.2790053Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2790122Z cvt.u64.u32 %rd581, %r788; 2026-02-21T08:13:39.2790197Z cvt.u64.u32 %rd582, %r789; 2026-02-21T08:13:39.2790264Z shl.b64 %rd583, %rd582, 32; 2026-02-21T08:13:39.2790332Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T08:13:39.2790536Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2790614Z mov.b64 {%r1168, %r1169}, %rd584; 2026-02-21T08:13:39.2790739Z cvt.rn.f16x2.f32 %r1170, %r1169, %r1168; 2026-02-21T08:13:39.2790949Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2791025Z cvt.u64.u32 %rd585, %r790; 2026-02-21T08:13:39.2791094Z cvt.u64.u32 %rd586, %r791; 2026-02-21T08:13:39.2791164Z shl.b64 %rd587, %rd586, 32; 2026-02-21T08:13:39.2791246Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T08:13:39.2791457Z .loc 1 58 27 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2791529Z mov.b64 {%r1171, %r1172}, %rd588; 2026-02-21T08:13:39.2791611Z cvt.rn.f16x2.f32 %r1173, %r1172, %r1171; 2026-02-21T08:13:39.2791827Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2791897Z cvt.u64.u32 %rd589, %r793; 2026-02-21T08:13:39.2791968Z cvt.u64.u32 %rd590, %r794; 2026-02-21T08:13:39.2792050Z shl.b64 %rd591, %rd590, 32; 2026-02-21T08:13:39.2792126Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T08:13:39.2792336Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2792418Z mov.b64 {%r1174, %r1175}, %rd592; 2026-02-21T08:13:39.2792499Z cvt.rn.f16x2.f32 %r1176, %r1175, %r1174; 2026-02-21T08:13:39.2792708Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2792784Z cvt.u64.u32 %rd593, %r795; 2026-02-21T08:13:39.2792852Z cvt.u64.u32 %rd594, %r796; 2026-02-21T08:13:39.2792921Z shl.b64 %rd595, %rd594, 32; 2026-02-21T08:13:39.2792991Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T08:13:39.2793207Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2793276Z mov.b64 {%r1177, %r1178}, %rd596; 2026-02-21T08:13:39.2793356Z cvt.rn.f16x2.f32 %r1179, %r1178, %r1177; 2026-02-21T08:13:39.2793569Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2793642Z cvt.u64.u32 %rd597, %r797; 2026-02-21T08:13:39.2793711Z cvt.u64.u32 %rd598, %r798; 2026-02-21T08:13:39.2793779Z shl.b64 %rd599, %rd598, 32; 2026-02-21T08:13:39.2793853Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T08:13:39.2794064Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2794133Z mov.b64 {%r1180, %r1181}, %rd600; 2026-02-21T08:13:39.2794220Z cvt.rn.f16x2.f32 %r1182, %r1181, %r1180; 2026-02-21T08:13:39.2794427Z .loc 1 56 52 // 
cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2794497Z cvt.u64.u32 %rd601, %r799; 2026-02-21T08:13:39.2794572Z cvt.u64.u32 %rd602, %r800; 2026-02-21T08:13:39.2794641Z shl.b64 %rd603, %rd602, 32; 2026-02-21T08:13:39.2794743Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T08:13:39.2794949Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2795087Z mov.b64 {%r1183, %r1184}, %rd604; 2026-02-21T08:13:39.2795165Z cvt.rn.f16x2.f32 %r1185, %r1184, %r1183; 2026-02-21T08:13:39.2795363Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2795440Z cvt.u64.u32 %rd605, %r801; 2026-02-21T08:13:39.2795504Z cvt.u64.u32 %rd606, %r802; 2026-02-21T08:13:39.2795571Z shl.b64 %rd607, %rd606, 32; 2026-02-21T08:13:39.2795646Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T08:13:39.2795856Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2795925Z mov.b64 {%r1186, %r1187}, %rd608; 2026-02-21T08:13:39.2796002Z cvt.rn.f16x2.f32 %r1188, %r1187, %r1186; 2026-02-21T08:13:39.2796212Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2796280Z cvt.u64.u32 %rd609, %r803; 2026-02-21T08:13:39.2796426Z cvt.u64.u32 %rd610, %r804; 2026-02-21T08:13:39.2796507Z shl.b64 %rd611, %rd610, 32; 2026-02-21T08:13:39.2796577Z or.b64 %rd612, %rd609, %rd611; 2026-02-21T08:13:39.2796792Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2796870Z mov.b64 {%r1189, %r1190}, %rd612; 2026-02-21T08:13:39.2796949Z cvt.rn.f16x2.f32 %r1191, %r1190, %r1189; 2026-02-21T08:13:39.2797160Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2797233Z cvt.u64.u32 %rd613, %r805; 2026-02-21T08:13:39.2797300Z cvt.u64.u32 %rd614, %r806; 2026-02-21T08:13:39.2797370Z shl.b64 %rd615, %rd614, 32; 2026-02-21T08:13:39.2797438Z or.b64 
%rd616, %rd613, %rd615; 2026-02-21T08:13:39.2797646Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2797716Z mov.b64 {%r1192, %r1193}, %rd616; 2026-02-21T08:13:39.2797796Z cvt.rn.f16x2.f32 %r1194, %r1193, %r1192; 2026-02-21T08:13:39.2798007Z .loc 1 56 52 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:56:52 2026-02-21T08:13:39.2798076Z cvt.u64.u32 %rd617, %r807; 2026-02-21T08:13:39.2798142Z cvt.u64.u32 %rd618, %r808; 2026-02-21T08:13:39.2798211Z shl.b64 %rd619, %rd618, 32; 2026-02-21T08:13:39.2798288Z or.b64 %rd620, %rd617, %rd619; 2026-02-21T08:13:39.2798500Z .loc 1 58 27 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:58:27 2026-02-21T08:13:39.2798570Z mov.b64 {%r1195, %r1196}, %rd620; 2026-02-21T08:13:39.2798656Z cvt.rn.f16x2.f32 %r1197, %r1196, %r1195; 2026-02-21T08:13:39.2798869Z .loc 1 59 45 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:59:45 2026-02-21T08:13:39.2798956Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:39.2799026Z bar.sync 0; 2026-02-21T08:13:39.2799139Z st.shared.v4.b32 [%r4], {%r816, %r819, %r822, %r825}; 2026-02-21T08:13:39.2799263Z st.shared.v4.b32 [%r4+32768], {%r912, %r915, %r918, %r921}; 2026-02-21T08:13:39.2799400Z st.shared.v4.b32 [%r4+16384], {%r1008, %r1011, %r1014, %r1017}; 2026-02-21T08:13:39.2799528Z st.shared.v4.b32 [%r4+49152], {%r1104, %r1107, %r1110, %r1113}; 2026-02-21T08:13:39.2799634Z st.shared.v4.b32 [%r5], {%r828, %r831, %r834, %r837}; 2026-02-21T08:13:39.2799747Z st.shared.v4.b32 [%r5+32768], {%r924, %r927, %r930, %r933}; 2026-02-21T08:13:39.2799871Z st.shared.v4.b32 [%r5+16384], {%r1020, %r1023, %r1026, %r1029}; 2026-02-21T08:13:39.2799985Z st.shared.v4.b32 [%r5+49152], {%r1116, %r1119, %r1122, %r1125}; 2026-02-21T08:13:39.2800087Z st.shared.v4.b32 [%r6], {%r840, %r843, %r846, %r849}; 2026-02-21T08:13:39.2800207Z st.shared.v4.b32 [%r6+32768], {%r936, %r939, %r942, %r945}; 2026-02-21T08:13:39.2800320Z st.shared.v4.b32 [%r6+16384], 
{%r1032, %r1035, %r1038, %r1041}; 2026-02-21T08:13:39.2800430Z st.shared.v4.b32 [%r6+49152], {%r1128, %r1131, %r1134, %r1137}; 2026-02-21T08:13:39.2800539Z st.shared.v4.b32 [%r7], {%r852, %r855, %r858, %r861}; 2026-02-21T08:13:39.2800756Z st.shared.v4.b32 [%r7+32768], {%r948, %r951, %r954, %r957}; 2026-02-21T08:13:39.2800870Z st.shared.v4.b32 [%r7+16384], {%r1044, %r1047, %r1050, %r1053}; 2026-02-21T08:13:39.2800983Z st.shared.v4.b32 [%r7+49152], {%r1140, %r1143, %r1146, %r1149}; 2026-02-21T08:13:39.2801091Z st.shared.v4.b32 [%r8], {%r864, %r867, %r870, %r873}; 2026-02-21T08:13:39.2801203Z st.shared.v4.b32 [%r8+32768], {%r960, %r963, %r966, %r969}; 2026-02-21T08:13:39.2801315Z st.shared.v4.b32 [%r8+16384], {%r1056, %r1059, %r1062, %r1065}; 2026-02-21T08:13:39.2801438Z st.shared.v4.b32 [%r8+49152], {%r1152, %r1155, %r1158, %r1161}; 2026-02-21T08:13:39.2801538Z st.shared.v4.b32 [%r9], {%r876, %r879, %r882, %r885}; 2026-02-21T08:13:39.2801648Z st.shared.v4.b32 [%r9+32768], {%r972, %r975, %r978, %r981}; 2026-02-21T08:13:39.2801769Z st.shared.v4.b32 [%r9+16384], {%r1068, %r1071, %r1074, %r1077}; 2026-02-21T08:13:39.2801930Z st.shared.v4.b32 [%r9+49152], {%r1164, %r1167, %r1170, %r1173}; 2026-02-21T08:13:39.2802046Z st.shared.v4.b32 [%r10], {%r888, %r891, %r894, %r897}; 2026-02-21T08:13:39.2802179Z st.shared.v4.b32 [%r10+32768], {%r984, %r987, %r990, %r993}; 2026-02-21T08:13:39.2802307Z st.shared.v4.b32 [%r10+16384], {%r1080, %r1083, %r1086, %r1089}; 2026-02-21T08:13:39.2802432Z st.shared.v4.b32 [%r10+49152], {%r1176, %r1179, %r1182, %r1185}; 2026-02-21T08:13:39.2802542Z st.shared.v4.b32 [%r11], {%r900, %r903, %r906, %r909}; 2026-02-21T08:13:39.2802668Z st.shared.v4.b32 [%r11+32768], {%r996, %r999, %r1002, %r1005}; 2026-02-21T08:13:39.2802787Z st.shared.v4.b32 [%r11+16384], {%r1092, %r1095, %r1098, %r1101}; 2026-02-21T08:13:39.2802903Z st.shared.v4.b32 [%r11+49152], {%r1188, %r1191, %r1194, %r1197}; 2026-02-21T08:13:39.2802984Z // begin inline asm 
2026-02-21T08:13:39.2803076Z fence.proxy.async.shared::cta; 2026-02-21T08:13:39.2803144Z // end inline asm 2026-02-21T08:13:39.2803217Z bar.sync 0; 2026-02-21T08:13:39.2803303Z elect.sync %r1198|%p165, -1; 2026-02-21T08:13:39.2803386Z and.pred %p163, %p164, %p165; 2026-02-21T08:13:39.2803461Z and.b32 %r1199, %r14, 1; 2026-02-21T08:13:39.2803540Z shl.b32 %r1200, %r1199, 15; 2026-02-21T08:13:39.2803609Z add.s32 %r1201, %r31, %r1200; 2026-02-21T08:13:39.2803680Z add.s32 %r812, %r1201, 114688; 2026-02-21T08:13:39.2803757Z shl.b32 %r1202, %r1199, 6; 2026-02-21T08:13:39.2803827Z or.b32 %r810, %r1202, %r462; 2026-02-21T08:13:39.2803893Z // begin inline asm 2026-02-21T08:13:39.2804118Z @%p163 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd108, {%r810, %r811}], [%r812]; 2026-02-21T08:13:39.2804189Z // end inline asm 2026-02-21T08:13:39.2804266Z cp.async.bulk.commit_group; 2026-02-21T08:13:39.2804362Z $L__BB0_8: // %._crit_edge 2026-02-21T08:13:39.2804586Z .loc 1 33 74 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:33:74 2026-02-21T08:13:39.2804721Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:39.2804791Z bar.sync 0; 2026-02-21T08:13:39.2805005Z .loc 1 33 4 // cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py:33:4 2026-02-21T08:13:39.2805071Z bar.sync 0; 2026-02-21T08:13:39.2805140Z // begin inline asm 2026-02-21T08:13:39.2805278Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1203, 256; 2026-02-21T08:13:39.2805351Z // end inline asm 2026-02-21T08:13:39.2805413Z ret; 2026-02-21T08:13:39.2805478Z $L__tmp1: 2026-02-21T08:13:39.2805553Z $L__func_end0: 2026-02-21T08:13:39.2805653Z // -- End function 2026-02-21T08:13:39.2805715Z } 2026-02-21T08:13:39.2805972Z .file 1 "/tmp/torchinductor_root/yp/cyptit3mhonrzeus2iqecmn2oi4ineoilgbkzmuxsi2ugxfrxsvp.py" 2026-02-21T08:13:39.2806045Z .section .debug_abbrev 2026-02-21T08:13:39.2806104Z { 2026-02-21T08:13:39.2806209Z .b8 1 // Abbreviation Code 2026-02-21T08:13:39.2806321Z .b8 17 // 
DW_TAG_compile_unit 2026-02-21T08:13:39.2806418Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:13:39.2806584Z .b8 37 // DW_AT_producer 2026-02-21T08:13:39.2806681Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.2806769Z .b8 19 // DW_AT_language 2026-02-21T08:13:39.2806859Z .b8 5 // DW_FORM_data2 2026-02-21T08:13:39.2806956Z .b8 3 // DW_AT_name 2026-02-21T08:13:39.2807042Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.2807135Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:13:39.2807222Z .b8 6 // DW_FORM_data4 2026-02-21T08:13:39.2807317Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:13:39.2807403Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.2807485Z .b8 0 // EOM(1) 2026-02-21T08:13:39.2807636Z .b8 0 // EOM(2) 2026-02-21T08:13:39.2807719Z .b8 0 // EOM(3) 2026-02-21T08:13:39.2807779Z } 2026-02-21T08:13:39.2807856Z .section .debug_info 2026-02-21T08:13:39.2807914Z { 2026-02-21T08:13:39.2808011Z .b32 104 // Length of Unit 2026-02-21T08:13:39.2808112Z .b8 2 // DWARF version number 2026-02-21T08:13:39.2808182Z .b8 0 2026-02-21T08:13:39.2808319Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:13:39.2808422Z .b8 8 // Address Size (in bytes) 2026-02-21T08:13:39.2808552Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:13:39.2808647Z .b8 116 // DW_AT_producer 2026-02-21T08:13:39.2808709Z .b8 114 2026-02-21T08:13:39.2808769Z .b8 105 2026-02-21T08:13:39.2808835Z .b8 116 2026-02-21T08:13:39.2808894Z .b8 111 2026-02-21T08:13:39.2808954Z .b8 110 2026-02-21T08:13:39.2809024Z .b8 0 2026-02-21T08:13:39.2809110Z .b8 2 // DW_AT_language 2026-02-21T08:13:39.2809168Z .b8 0 2026-02-21T08:13:39.2809258Z .b8 99 // DW_AT_name 2026-02-21T08:13:39.2809326Z .b8 121 2026-02-21T08:13:39.2809384Z .b8 112 2026-02-21T08:13:39.2809442Z .b8 116 2026-02-21T08:13:39.2809505Z .b8 105 2026-02-21T08:13:39.2809564Z .b8 116 2026-02-21T08:13:39.2809623Z .b8 51 2026-02-21T08:13:39.2809680Z .b8 109 2026-02-21T08:13:39.2809749Z .b8 104 2026-02-21T08:13:39.2809808Z .b8 111 2026-02-21T08:13:39.2809866Z .b8 110 2026-02-21T08:13:39.2809931Z .b8 114 2026-02-21T08:13:39.2809990Z .b8 122 2026-02-21T08:13:39.2810049Z .b8 101 2026-02-21T08:13:39.2810106Z .b8 117 2026-02-21T08:13:39.2810173Z .b8 115 2026-02-21T08:13:39.2810234Z .b8 50 2026-02-21T08:13:39.2810292Z .b8 105 2026-02-21T08:13:39.2810356Z .b8 113 2026-02-21T08:13:39.2810416Z .b8 101 2026-02-21T08:13:39.2810474Z .b8 99 2026-02-21T08:13:39.2810535Z .b8 109 2026-02-21T08:13:39.2810603Z .b8 110 2026-02-21T08:13:39.2810662Z .b8 50 2026-02-21T08:13:39.2810722Z .b8 111 2026-02-21T08:13:39.2810780Z .b8 105 2026-02-21T08:13:39.2810848Z .b8 52 2026-02-21T08:13:39.2810908Z .b8 105 2026-02-21T08:13:39.2810968Z .b8 110 2026-02-21T08:13:39.2811034Z .b8 101 2026-02-21T08:13:39.2811094Z .b8 111 2026-02-21T08:13:39.2811156Z .b8 105 2026-02-21T08:13:39.2811216Z .b8 108 2026-02-21T08:13:39.2811285Z .b8 103 2026-02-21T08:13:39.2811344Z .b8 98 2026-02-21T08:13:39.2811404Z .b8 107 2026-02-21T08:13:39.2811475Z .b8 122 2026-02-21T08:13:39.2811536Z .b8 109 2026-02-21T08:13:39.2811595Z .b8 117 2026-02-21T08:13:39.2811653Z .b8 120 
2026-02-21T08:13:39.2811720Z .b8 115 2026-02-21T08:13:39.2811777Z .b8 105 2026-02-21T08:13:39.2811835Z .b8 50 2026-02-21T08:13:39.2811893Z .b8 117 2026-02-21T08:13:39.2811958Z .b8 103 2026-02-21T08:13:39.2812016Z .b8 120 2026-02-21T08:13:39.2812074Z .b8 102 2026-02-21T08:13:39.2812139Z .b8 114 2026-02-21T08:13:39.2812197Z .b8 120 2026-02-21T08:13:39.2812259Z .b8 115 2026-02-21T08:13:39.2812375Z .b8 118 2026-02-21T08:13:39.2812442Z .b8 112 2026-02-21T08:13:39.2812502Z .b8 46 2026-02-21T08:13:39.2812562Z .b8 112 2026-02-21T08:13:39.2812628Z .b8 121 2026-02-21T08:13:39.2812688Z .b8 0 2026-02-21T08:13:39.2812795Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:13:39.2812884Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:13:39.2812952Z .b8 116 2026-02-21T08:13:39.2813010Z .b8 109 2026-02-21T08:13:39.2813069Z .b8 112 2026-02-21T08:13:39.2813134Z .b8 47 2026-02-21T08:13:39.2813193Z .b8 116 2026-02-21T08:13:39.2813252Z .b8 111 2026-02-21T08:13:39.2813309Z .b8 114 2026-02-21T08:13:39.2813378Z .b8 99 2026-02-21T08:13:39.2813437Z .b8 104 2026-02-21T08:13:39.2813496Z .b8 105 2026-02-21T08:13:39.2813562Z .b8 110 2026-02-21T08:13:39.2813623Z .b8 100 2026-02-21T08:13:39.2813682Z .b8 117 2026-02-21T08:13:39.2813740Z .b8 99 2026-02-21T08:13:39.2813810Z .b8 116 2026-02-21T08:13:39.2813869Z .b8 111 2026-02-21T08:13:39.2813975Z .b8 114 2026-02-21T08:13:39.2814039Z .b8 95 2026-02-21T08:13:39.2814107Z .b8 114 2026-02-21T08:13:39.2814167Z .b8 111 2026-02-21T08:13:39.2814226Z .b8 111 2026-02-21T08:13:39.2814293Z .b8 116 2026-02-21T08:13:39.2814352Z .b8 47 2026-02-21T08:13:39.2814411Z .b8 121 2026-02-21T08:13:39.2814469Z .b8 112 2026-02-21T08:13:39.2814535Z .b8 0 2026-02-21T08:13:39.2814594Z } 2026-02-21T08:13:39.2814723Z .section .debug_macinfo { } 2026-02-21T08:13:39.2814730Z 2026-02-21T08:13:39.2814829Z ================================================================ 2026-02-21T08:13:39.2814952Z please share the reproducer above with Triton project. 
2026-02-21T08:13:39.4727202Z 2026-02-21T08:13:39.4727222Z 2026-02-21T08:13:39.4727231Z 2026-02-21T08:13:39.4727695Z ================================================================ 2026-02-21T08:13:39.4727853Z Internal Triton PTX codegen error 2026-02-21T08:13:39.4727970Z `ptxas` stderr: 2026-02-21T08:13:39.4728784Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 276 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:39.4728981Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:39.4728992Z 2026-02-21T08:13:39.4729862Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpb97u9a67.ptx -o /tmp/tmpb97u9a67.ptx.o 2026-02-21T08:13:39.4729870Z 2026-02-21T08:13:39.4729876Z 2026-02-21T08:13:39.4729973Z // 2026-02-21T08:13:39.4730105Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:13:39.4730203Z // 2026-02-21T08:13:39.4730210Z 2026-02-21T08:13:39.4730309Z .version 8.7 2026-02-21T08:13:39.4730407Z .target sm_100a 2026-02-21T08:13:39.4730514Z .address_size 64 2026-02-21T08:13:39.4730522Z 2026-02-21T08:13:39.4730762Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:13:39.4730909Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:13:39.4731077Z // @_helion_matmul 2026-02-21T08:13:39.4731217Z .visible .entry _helion_matmul( 2026-02-21T08:13:39.4731431Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:13:39.4731616Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:13:39.4731818Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:13:39.4732004Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:13:39.4732193Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:13:39.4732291Z ) 2026-02-21T08:13:39.4732391Z .reqntid 256 2026-02-21T08:13:39.4732488Z .maxnreg 32 
2026-02-21T08:13:39.4732587Z { 2026-02-21T08:13:39.4732705Z .reg .pred %p<178>; 2026-02-21T08:13:39.4732804Z .reg .b32 %r<790>; 2026-02-21T08:13:39.4732906Z .reg .b64 %rd<389>; 2026-02-21T08:13:39.4733286Z .loc 1 19 0 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:19:0 2026-02-21T08:13:39.4733383Z $L__func_begin0: 2026-02-21T08:13:39.4733739Z .loc 1 19 0 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:19:0 2026-02-21T08:13:39.4734149Z 2026-02-21T08:13:39.4734264Z // %bb.0: 2026-02-21T08:13:39.4734425Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:13:39.4734516Z $L__tmp0: 2026-02-21T08:13:39.4734943Z .loc 1 19 0 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:19 2026-02-21T08:13:39.4735063Z mov.u32 %r1, %tid.x; 2026-02-21T08:13:39.4735221Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:13:39.4735335Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:13:39.4735497Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T08:13:39.4735610Z mov.b32 %r34, global_smem; 2026-02-21T08:13:39.4735715Z // begin inline asm 2026-02-21T08:13:39.4736200Z [98s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:13:39.4738722Z Config: @helion.kernel(config=helion.Config(block_sizes=[512, 64, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:13:39.4739041Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:13:39.4739152Z `ptxas` stderr: 2026-02-21T08:13:39.4739890Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 276 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:39.4740054Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:39.4740061Z 2026-02-21T08:13:39.4740795Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpb97u9a67.ptx -o /tmp/tmpb97u9a67.ptx.o 2026-02-21T08:13:39.4740809Z 2026-02-21T08:13:39.4741038Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:13:39.4741308Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r34], 256; 2026-02-21T08:13:39.4741406Z // end inline asm 2026-02-21T08:13:39.4741541Z ld.param.b64 %rd58, [_helion_matmul_param_3]; 2026-02-21T08:13:39.4741625Z bar.sync 0; 2026-02-21T08:13:39.4741743Z ld.shared.b32 %r781, [global_smem]; 2026-02-21T08:13:39.4741830Z bar.sync 0; 2026-02-21T08:13:39.4741918Z // begin inline asm 2026-02-21T08:13:39.4742133Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:13:39.4742220Z // end inline asm 2026-02-21T08:13:39.4742524Z .loc 1 21 67 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:21:67 2026-02-21T08:13:39.4742618Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:13:39.4742716Z mov.u32 %r59, %ctaid.y; 2026-02-21T08:13:39.4742812Z mov.u32 %r60, %ctaid.z; 2026-02-21T08:13:39.4742913Z mov.u32 %r61, %nctaid.x; 2026-02-21T08:13:39.4743011Z mov.u32 %r62, %nctaid.y; 2026-02-21T08:13:39.4743113Z mad.lo.s32 %r63, %r60, %r62, %r59; 2026-02-21T08:13:39.4743211Z mad.lo.s32 %r64, %r63, %r61, %r3; 2026-02-21T08:13:39.4743304Z mul.lo.s32 %r65, %r64, 384; 2026-02-21T08:13:39.4743402Z cvt.s64.s32 %rd59, %r65; 2026-02-21T08:13:39.4743499Z add.s64 %rd19, %rd58, %rd59; 2026-02-21T08:13:39.4743586Z shl.b32 %r66, %r1, 2; 2026-02-21T08:13:39.4743683Z add.s32 %r35, %r34, %r66; 2026-02-21T08:13:39.4743765Z mov.b32 %r44, 0; 2026-02-21T08:13:39.4743852Z // begin inline asm 2026-02-21T08:13:39.4743962Z @%p1 st.shared.b32 [ %r35 + 0 ], %r44; 2026-02-21T08:13:39.4744052Z // end inline asm 2026-02-21T08:13:39.4744153Z bar.warp.sync -1; 2026-02-21T08:13:39.4744251Z setp.eq.b32 %p165, %r1, 0; 2026-02-21T08:13:39.4744353Z cvt.u64.u32 %rd4, %r34; 2026-02-21T08:13:39.4744440Z // begin inline asm 2026-02-21T08:13:39.4744854Z @%p165 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:13:39.4745147Z // end inline asm 2026-02-21T08:13:39.4745235Z // begin inline asm 2026-02-21T08:13:39.4745508Z @%p165 
tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.4745586Z // end inline asm 2026-02-21T08:13:39.4745671Z mov.b32 %r37, 32; 2026-02-21T08:13:39.4745751Z // begin inline asm 2026-02-21T08:13:39.4746004Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:13:39.4746092Z // end inline asm 2026-02-21T08:13:39.4746170Z mov.b32 %r38, 256; 2026-02-21T08:13:39.4746255Z // begin inline asm 2026-02-21T08:13:39.4746531Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r38; 2026-02-21T08:13:39.4746618Z // end inline asm 2026-02-21T08:13:39.4746704Z mov.b32 %r39, 1024; 2026-02-21T08:13:39.4746789Z // begin inline asm 2026-02-21T08:13:39.4747279Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r39; 2026-02-21T08:13:39.4747393Z // end inline asm 2026-02-21T08:13:39.4747484Z mov.b32 %r40, 4096; 2026-02-21T08:13:39.4747585Z // begin inline asm 2026-02-21T08:13:39.4747902Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r40; 2026-02-21T08:13:39.4747992Z // end inline asm 2026-02-21T08:13:39.4748097Z mov.b64 %rd12, 2048; 2026-02-21T08:13:39.4748189Z // begin inline asm 2026-02-21T08:13:39.4748523Z @%p165 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.4748614Z // end inline asm 2026-02-21T08:13:39.4748711Z mov.b32 %r41, 1; 2026-02-21T08:13:39.4748801Z // begin inline asm 2026-02-21T08:13:39.4749139Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r41; 2026-02-21T08:13:39.4749238Z // end inline asm 2026-02-21T08:13:39.4749332Z // begin inline asm 2026-02-21T08:13:39.4749690Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r41; 2026-02-21T08:13:39.4749804Z // end inline asm 2026-02-21T08:13:39.4749907Z // begin inline asm 2026-02-21T08:13:39.4750226Z @%p165 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.4750326Z // end inline asm 2026-02-21T08:13:39.4750439Z // begin inline asm 2026-02-21T08:13:39.4750806Z @%p165 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4750903Z // end inline asm 2026-02-21T08:13:39.4751015Z // begin inline asm 2026-02-21T08:13:39.4751344Z @%p165 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T08:13:39.4751447Z // end inline asm 2026-02-21T08:13:39.4751557Z // begin inline asm 2026-02-21T08:13:39.4751870Z @%p165 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4751968Z // end inline asm 2026-02-21T08:13:39.4752069Z // begin inline asm 2026-02-21T08:13:39.4752653Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.4752753Z // end inline asm 2026-02-21T08:13:39.4752846Z // begin inline asm 2026-02-21T08:13:39.4753089Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:13:39.4753217Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.4753346Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.4753443Z // end inline asm 2026-02-21T08:13:39.4753532Z bar.sync 0; 2026-02-21T08:13:39.4753644Z cvta.global.u64 %rd101, %rd19; 2026-02-21T08:13:39.4753971Z .loc 1 22 67 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:22:67 2026-02-21T08:13:39.4754084Z add.s32 %r67, %r65, 128; 2026-02-21T08:13:39.4754184Z cvt.s64.s32 %rd60, %r67; 2026-02-21T08:13:39.4754283Z add.s64 %rd37, %rd58, %rd60; 2026-02-21T08:13:39.4754385Z bar.sync 0; 2026-02-21T08:13:39.4754480Z // begin inline asm 2026-02-21T08:13:39.4754600Z @%p1 st.shared.b32 [ %r35 + 0 ], %r44; 2026-02-21T08:13:39.4754936Z // end inline asm 2026-02-21T08:13:39.4755038Z bar.warp.sync -1; 2026-02-21T08:13:39.4755132Z // begin inline 
asm 2026-02-21T08:13:39.4755461Z @%p165 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:13:39.4755568Z // end inline asm 2026-02-21T08:13:39.4755661Z // begin inline asm 2026-02-21T08:13:39.4755931Z @%p165 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.4756030Z // end inline asm 2026-02-21T08:13:39.4756121Z // begin inline asm 2026-02-21T08:13:39.4756402Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:13:39.4756502Z // end inline asm 2026-02-21T08:13:39.4756583Z mov.b32 %r46, 64; 2026-02-21T08:13:39.4756668Z // begin inline asm 2026-02-21T08:13:39.4756948Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r46; 2026-02-21T08:13:39.4757251Z // end inline asm 2026-02-21T08:13:39.4757364Z // begin inline asm 2026-02-21T08:13:39.4757685Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r39; 2026-02-21T08:13:39.4757788Z // end inline asm 2026-02-21T08:13:39.4757882Z // begin inline asm 2026-02-21T08:13:39.4758189Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r39; 2026-02-21T08:13:39.4758288Z // end inline asm 2026-02-21T08:13:39.4758380Z // begin inline asm 2026-02-21T08:13:39.4758722Z @%p165 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.4758816Z // end inline asm 2026-02-21T08:13:39.4758932Z // begin inline asm 2026-02-21T08:13:39.4759277Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r41; 2026-02-21T08:13:39.4759370Z // end inline asm 2026-02-21T08:13:39.4759485Z // begin inline asm 2026-02-21T08:13:39.4759830Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r41; 2026-02-21T08:13:39.4759932Z // end inline asm 2026-02-21T08:13:39.4760042Z // begin inline asm 2026-02-21T08:13:39.4760345Z @%p165 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.4760437Z // end inline asm 2026-02-21T08:13:39.4760540Z // begin inline asm 2026-02-21T08:13:39.4760876Z @%p165 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4760966Z // end inline asm 2026-02-21T08:13:39.4761056Z // begin inline asm 2026-02-21T08:13:39.4761361Z @%p165 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T08:13:39.4761455Z // end inline asm 2026-02-21T08:13:39.4761547Z // begin inline asm 2026-02-21T08:13:39.4761848Z @%p165 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4761939Z // end inline asm 2026-02-21T08:13:39.4762034Z // begin inline asm 2026-02-21T08:13:39.4762583Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.4762683Z // end inline asm 2026-02-21T08:13:39.4762782Z // begin inline asm 2026-02-21T08:13:39.4763011Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:13:39.4763122Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.4763241Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.4763324Z // end inline asm 2026-02-21T08:13:39.4763415Z bar.sync 0; 2026-02-21T08:13:39.4763519Z cvta.global.u64 %rd102, %rd37; 2026-02-21T08:13:39.4763831Z .loc 1 24 71 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:24:71 2026-02-21T08:13:39.4763933Z add.s32 %r68, %r65, 256; 2026-02-21T08:13:39.4764025Z cvt.s64.s32 %rd61, %r68; 2026-02-21T08:13:39.4764119Z add.s64 %rd55, %rd58, %rd61; 2026-02-21T08:13:39.4764203Z bar.sync 0; 2026-02-21T08:13:39.4764298Z // begin inline asm 2026-02-21T08:13:39.4764410Z @%p1 st.shared.b32 [ %r35 + 0 ], %r44; 2026-02-21T08:13:39.4764790Z // end inline asm 2026-02-21T08:13:39.4764897Z bar.warp.sync -1; 2026-02-21T08:13:39.4764985Z // begin inline 
asm 2026-02-21T08:13:39.4765286Z @%p165 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T08:13:39.4765381Z // end inline asm 2026-02-21T08:13:39.4765465Z // begin inline asm 2026-02-21T08:13:39.4765716Z @%p165 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:13:39.4765799Z // end inline asm 2026-02-21T08:13:39.4765893Z // begin inline asm 2026-02-21T08:13:39.4766161Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r46; 2026-02-21T08:13:39.4766248Z // end inline asm 2026-02-21T08:13:39.4766347Z // begin inline asm 2026-02-21T08:13:39.4766638Z @%p165 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r38; 2026-02-21T08:13:39.4766728Z // end inline asm 2026-02-21T08:13:39.4766990Z // begin inline asm 2026-02-21T08:13:39.4767355Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r39; 2026-02-21T08:13:39.4767448Z // end inline asm 2026-02-21T08:13:39.4767542Z // begin inline asm 2026-02-21T08:13:39.4767865Z @%p165 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r40; 2026-02-21T08:13:39.4767954Z // end inline asm 2026-02-21T08:13:39.4768045Z // begin inline asm 2026-02-21T08:13:39.4768383Z @%p165 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:13:39.4768470Z // end inline asm 2026-02-21T08:13:39.4768569Z // begin inline asm 2026-02-21T08:13:39.4768911Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r41; 2026-02-21T08:13:39.4769001Z // end inline asm 2026-02-21T08:13:39.4769093Z // begin inline asm 2026-02-21T08:13:39.4769431Z @%p165 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r41; 2026-02-21T08:13:39.4769532Z // end inline asm 2026-02-21T08:13:39.4769632Z // begin inline asm 2026-02-21T08:13:39.4769925Z @%p165 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:13:39.4770026Z // end inline asm 2026-02-21T08:13:39.4770117Z // begin inline asm 2026-02-21T08:13:39.4770445Z @%p165 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4770546Z // end inline asm 2026-02-21T08:13:39.4770637Z // begin inline asm 2026-02-21T08:13:39.4770934Z @%p165 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T08:13:39.4771036Z // end inline asm 2026-02-21T08:13:39.4771130Z // begin inline asm 2026-02-21T08:13:39.4771418Z @%p165 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:13:39.4771508Z // end inline asm 2026-02-21T08:13:39.4771608Z // begin inline asm 2026-02-21T08:13:39.4772135Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:13:39.4772236Z // end inline asm 2026-02-21T08:13:39.4772337Z // begin inline asm 2026-02-21T08:13:39.4772569Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T08:13:39.4772688Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:39.4772821Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:39.4772910Z // end inline asm 2026-02-21T08:13:39.4772998Z bar.sync 0; 2026-02-21T08:13:39.4773109Z cvta.global.u64 %rd132, %rd55; 2026-02-21T08:13:39.4773444Z .loc 1 33 74 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:33:74 2026-02-21T08:13:39.4773556Z setp.gt.u32 %p57, %r3, 127; 2026-02-21T08:13:39.4773652Z @%p57 bra $L__BB0_8; 2026-02-21T08:13:39.4773787Z // %bb.1: // %.lr.ph 2026-02-21T08:13:39.4774111Z .loc 1 0 74 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:0:74 2026-02-21T08:13:39.4774222Z setp.lt.u32 %p94, %r1, 64; 2026-02-21T08:13:39.4774505Z shl.b32 %r270, %r1, 7; 2026-02-21T08:13:39.4774614Z and.b32 %r271, %r270, 32640; 2026-02-21T08:13:39.4774770Z shl.b32 
%r272, %r1, 4; 2026-02-21T08:13:39.4774876Z and.b32 %r273, %r272, 112; 2026-02-21T08:13:39.4774991Z or.b32 %r274, %r271, %r273; 2026-02-21T08:13:39.4775098Z add.s32 %r276, %r34, 229376; 2026-02-21T08:13:39.4775203Z xor.b32 %r277, %r274, 16; 2026-02-21T08:13:39.4775317Z xor.b32 %r278, %r274, 32; 2026-02-21T08:13:39.4775409Z xor.b32 %r279, %r274, 48; 2026-02-21T08:13:39.4775500Z xor.b32 %r280, %r274, 64; 2026-02-21T08:13:39.4775591Z xor.b32 %r281, %r274, 80; 2026-02-21T08:13:39.4775691Z xor.b32 %r282, %r274, 96; 2026-02-21T08:13:39.4775787Z xor.b32 %r283, %r274, 112; 2026-02-21T08:13:39.4775880Z shr.u32 %r284, %r1, 5; 2026-02-21T08:13:39.4776208Z .loc 1 44 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:44:27 2026-02-21T08:13:39.4776443Z shl.b32 %r285, %r3, 9; 2026-02-21T08:13:39.4776543Z and.b32 %r286, %r285, 3584; 2026-02-21T08:13:39.4776863Z .loc 1 45 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:45:27 2026-02-21T08:13:39.4776949Z shl.b32 %r287, %r3, 3; 2026-02-21T08:13:39.4777036Z and.b32 %r582, %r287, 960; 2026-02-21T08:13:39.4777328Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4777457Z shfl.sync.idx.b32 %r13, %r284, 0, 31, -1; 2026-02-21T08:13:39.4777547Z shl.b32 %r288, %r13, 21; 2026-02-21T08:13:39.4777639Z and.b32 %r289, %r288, 6291456; 2026-02-21T08:13:39.4777736Z add.s32 %r290, %r289, %r781; 2026-02-21T08:13:39.4777825Z shl.b32 %r291, %r13, 4; 2026-02-21T08:13:39.4777911Z and.b32 %r292, %r291, 64; 2026-02-21T08:13:39.4777999Z add.s32 %r581, %r290, %r292; 2026-02-21T08:13:39.4778104Z mov.pred %p110, -1; 2026-02-21T08:13:39.4778190Z // begin inline asm 2026-02-21T08:13:39.4778669Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 0], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4778775Z // end inline asm 2026-02-21T08:13:39.4778862Z // begin inline asm 2026-02-21T08:13:39.4779322Z @%p110 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 16], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4779417Z // end inline asm 2026-02-21T08:13:39.4779504Z // begin inline asm 2026-02-21T08:13:39.4779960Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 32], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4780055Z // end inline asm 2026-02-21T08:13:39.4780141Z // begin inline asm 2026-02-21T08:13:39.4780593Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 48], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4780685Z // end inline asm 2026-02-21T08:13:39.4780782Z // begin inline asm 2026-02-21T08:13:39.4781250Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 128], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4781334Z // end inline asm 2026-02-21T08:13:39.4781431Z // begin inline asm 2026-02-21T08:13:39.4781890Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 144], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4781974Z // end inline asm 2026-02-21T08:13:39.4782067Z // begin inline asm 2026-02-21T08:13:39.4782522Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 160], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4782606Z // end inline asm 2026-02-21T08:13:39.4782701Z // begin inline asm 2026-02-21T08:13:39.4783170Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r581 + 176], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T08:13:39.4783409Z // end inline asm 2026-02-21T08:13:39.4783496Z // begin inline asm 2026-02-21T08:13:39.4783617Z 
tcgen05.wait::st.sync.aligned; 2026-02-21T08:13:39.4783698Z // end inline asm 2026-02-21T08:13:39.4783778Z bar.sync 0; 2026-02-21T08:13:39.4784092Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4784184Z add.s32 %r783, %r34, 323648; 2026-02-21T08:13:39.4784268Z // begin inline asm 2026-02-21T08:13:39.4784419Z @%p165 mbarrier.init.shared::cta.b64 [%r783], 1; 2026-02-21T08:13:39.4784503Z // end inline asm 2026-02-21T08:13:39.4784590Z bar.sync 0; 2026-02-21T08:13:39.4784770Z add.s32 %r206, %r34, 323656; 2026-02-21T08:13:39.4784883Z // begin inline asm 2026-02-21T08:13:39.4785039Z @%p165 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T08:13:39.4785128Z // end inline asm 2026-02-21T08:13:39.4785377Z add.s32 %r207, %r34, 323584; 2026-02-21T08:13:39.4785490Z // begin inline asm 2026-02-21T08:13:39.4785643Z @%p165 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T08:13:39.4785736Z // end inline asm 2026-02-21T08:13:39.4785841Z bar.sync 0; 2026-02-21T08:13:39.4785942Z add.s32 %r208, %r34, 323592; 2026-02-21T08:13:39.4786042Z // begin inline asm 2026-02-21T08:13:39.4786204Z @%p165 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T08:13:39.4786306Z // end inline asm 2026-02-21T08:13:39.4786385Z bar.sync 0; 2026-02-21T08:13:39.4786475Z add.s32 %r209, %r34, 323600; 2026-02-21T08:13:39.4786571Z // begin inline asm 2026-02-21T08:13:39.4786703Z @%p165 mbarrier.init.shared::cta.b64 [%r209], 1; 2026-02-21T08:13:39.4786787Z // end inline asm 2026-02-21T08:13:39.4786877Z bar.sync 0; 2026-02-21T08:13:39.4786966Z add.s32 %r210, %r34, 323608; 2026-02-21T08:13:39.4787051Z // begin inline asm 2026-02-21T08:13:39.4787192Z @%p165 mbarrier.init.shared::cta.b64 [%r210], 1; 2026-02-21T08:13:39.4787282Z // end inline asm 2026-02-21T08:13:39.4787371Z bar.sync 0; 2026-02-21T08:13:39.4787461Z add.s32 %r211, %r34, 323616; 2026-02-21T08:13:39.4787556Z // begin inline asm 2026-02-21T08:13:39.4787683Z @%p165 mbarrier.init.shared::cta.b64 
[%r211], 1; 2026-02-21T08:13:39.4787771Z // end inline asm 2026-02-21T08:13:39.4787858Z bar.sync 0; 2026-02-21T08:13:39.4787947Z add.s32 %r212, %r34, 323624; 2026-02-21T08:13:39.4788034Z // begin inline asm 2026-02-21T08:13:39.4788165Z @%p165 mbarrier.init.shared::cta.b64 [%r212], 1; 2026-02-21T08:13:39.4788260Z // end inline asm 2026-02-21T08:13:39.4788340Z bar.sync 0; 2026-02-21T08:13:39.4788431Z add.s32 %r345, %r34, 323632; 2026-02-21T08:13:39.4788525Z // begin inline asm 2026-02-21T08:13:39.4788657Z @%p165 mbarrier.init.shared::cta.b64 [%r345], 1; 2026-02-21T08:13:39.4788739Z // end inline asm 2026-02-21T08:13:39.4788817Z bar.sync 0; 2026-02-21T08:13:39.4788913Z // begin inline asm 2026-02-21T08:13:39.4789114Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r207], 36864; 2026-02-21T08:13:39.4789205Z // end inline asm 2026-02-21T08:13:39.4789760Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4789847Z // begin inline asm 2026-02-21T08:13:39.4789964Z fence.proxy.async.shared::cta; 2026-02-21T08:13:39.4790064Z // end inline asm 2026-02-21T08:13:39.4790144Z bar.sync 0; 2026-02-21T08:13:39.4790244Z elect.sync %r293|%p95, -1; 2026-02-21T08:13:39.4790342Z and.pred %p76, %p94, %p95; 2026-02-21T08:13:39.4790442Z and.b32 %r15, %r13, 1; 2026-02-21T08:13:39.4790530Z shl.b32 %r16, %r15, 13; 2026-02-21T08:13:39.4790621Z shl.b32 %r294, %r15, 14; 2026-02-21T08:13:39.4790723Z add.s32 %r215, %r34, %r294; 2026-02-21T08:13:39.4790813Z shl.b32 %r295, %r15, 8; 2026-02-21T08:13:39.4790904Z or.b32 %r583, %r295, %r286; 2026-02-21T08:13:39.4790990Z // begin inline asm 2026-02-21T08:13:39.4791457Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r215], [%rd101, {%r44, %r583}], [%r207]; 2026-02-21T08:13:39.4791728Z // end inline asm 2026-02-21T08:13:39.4792027Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4792122Z bar.sync 0; 
2026-02-21T08:13:39.4792220Z elect.sync %r296|%p96, -1; 2026-02-21T08:13:39.4792317Z and.pred %p77, %p1, %p96; 2026-02-21T08:13:39.4792416Z add.s32 %r219, %r34, 294912; 2026-02-21T08:13:39.4792502Z // begin inline asm 2026-02-21T08:13:39.4792943Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r219], [%rd102, {%r44, %r582}], [%r207]; 2026-02-21T08:13:39.4793038Z // end inline asm 2026-02-21T08:13:39.4793330Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4793413Z bar.sync 0; 2026-02-21T08:13:39.4793499Z // begin inline asm 2026-02-21T08:13:39.4793698Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r208], 36864; 2026-02-21T08:13:39.4793907Z // end inline asm 2026-02-21T08:13:39.4794231Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4794321Z bar.sync 0; 2026-02-21T08:13:39.4794420Z elect.sync %r297|%p97, -1; 2026-02-21T08:13:39.4794517Z and.pred %p79, %p94, %p97; 2026-02-21T08:13:39.4794629Z add.s32 %r224, %r215, 32768; 2026-02-21T08:13:39.4794773Z // begin inline asm 2026-02-21T08:13:39.4795256Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r224], [%rd101, {%r37, %r583}], [%r208]; 2026-02-21T08:13:39.4795351Z // end inline asm 2026-02-21T08:13:39.4795679Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4795766Z bar.sync 0; 2026-02-21T08:13:39.4795870Z elect.sync %r298|%p98, -1; 2026-02-21T08:13:39.4795987Z and.pred %p80, %p1, %p98; 2026-02-21T08:13:39.4796086Z add.s32 %r228, %r34, 299008; 2026-02-21T08:13:39.4796187Z // begin inline asm 2026-02-21T08:13:39.4796653Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r228], [%rd102, {%r37, %r582}], [%r208]; 2026-02-21T08:13:39.4796744Z // end inline asm 2026-02-21T08:13:39.4797041Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 
2026-02-21T08:13:39.4797127Z bar.sync 0; 2026-02-21T08:13:39.4797227Z // begin inline asm 2026-02-21T08:13:39.4797417Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r209], 36864; 2026-02-21T08:13:39.4797501Z // end inline asm 2026-02-21T08:13:39.4797804Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4797885Z bar.sync 0; 2026-02-21T08:13:39.4797982Z elect.sync %r299|%p99, -1; 2026-02-21T08:13:39.4798086Z and.pred %p82, %p94, %p99; 2026-02-21T08:13:39.4798177Z add.s32 %r233, %r215, 65536; 2026-02-21T08:13:39.4798263Z // begin inline asm 2026-02-21T08:13:39.4798702Z @%p82 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r233], [%rd101, {%r46, %r583}], [%r209]; 2026-02-21T08:13:39.4798810Z // end inline asm 2026-02-21T08:13:39.4799102Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4799182Z bar.sync 0; 2026-02-21T08:13:39.4799291Z elect.sync %r300|%p100, -1; 2026-02-21T08:13:39.4799393Z and.pred %p83, %p1, %p100; 2026-02-21T08:13:39.4799484Z add.s32 %r237, %r34, 303104; 2026-02-21T08:13:39.4799579Z // begin inline asm 2026-02-21T08:13:39.4800014Z @%p83 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r237], [%rd102, {%r46, %r582}], [%r209]; 2026-02-21T08:13:39.4800101Z // end inline asm 2026-02-21T08:13:39.4800396Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4800486Z bar.sync 0; 2026-02-21T08:13:39.4800573Z // begin inline asm 2026-02-21T08:13:39.4800767Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r210], 36864; 2026-02-21T08:13:39.4801017Z // end inline asm 2026-02-21T08:13:39.4801318Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4801402Z bar.sync 0; 2026-02-21T08:13:39.4801511Z elect.sync %r301|%p101, -1; 2026-02-21T08:13:39.4801607Z and.pred %p85, %p94, %p101; 2026-02-21T08:13:39.4801699Z 
add.s32 %r242, %r215, 98304; 2026-02-21T08:13:39.4801785Z mov.b32 %r243, 96; 2026-02-21T08:13:39.4801879Z // begin inline asm 2026-02-21T08:13:39.4802323Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd101, {%r243, %r583}], [%r210]; 2026-02-21T08:13:39.4802410Z // end inline asm 2026-02-21T08:13:39.4802715Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4802796Z bar.sync 0; 2026-02-21T08:13:39.4802894Z elect.sync %r302|%p102, -1; 2026-02-21T08:13:39.4803133Z and.pred %p86, %p1, %p102; 2026-02-21T08:13:39.4803244Z add.s32 %r246, %r34, 307200; 2026-02-21T08:13:39.4803331Z // begin inline asm 2026-02-21T08:13:39.4803774Z @%p86 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd102, {%r243, %r582}], [%r210]; 2026-02-21T08:13:39.4803867Z // end inline asm 2026-02-21T08:13:39.4804165Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4804246Z bar.sync 0; 2026-02-21T08:13:39.4804340Z // begin inline asm 2026-02-21T08:13:39.4804523Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r211], 36864; 2026-02-21T08:13:39.4804609Z // end inline asm 2026-02-21T08:13:39.4804992Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4805082Z bar.sync 0; 2026-02-21T08:13:39.4805189Z elect.sync %r303|%p103, -1; 2026-02-21T08:13:39.4805293Z and.pred %p88, %p94, %p103; 2026-02-21T08:13:39.4805412Z add.s32 %r251, %r215, 131072; 2026-02-21T08:13:39.4805513Z mov.b32 %r252, 128; 2026-02-21T08:13:39.4805603Z // begin inline asm 2026-02-21T08:13:39.4806095Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r251], [%rd101, {%r252, %r583}], [%r211]; 2026-02-21T08:13:39.4806185Z // end inline asm 2026-02-21T08:13:39.4806503Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4806601Z bar.sync 
0; 2026-02-21T08:13:39.4806706Z elect.sync %r304|%p104, -1; 2026-02-21T08:13:39.4806810Z and.pred %p89, %p1, %p104; 2026-02-21T08:13:39.4806907Z add.s32 %r255, %r34, 311296; 2026-02-21T08:13:39.4807010Z // begin inline asm 2026-02-21T08:13:39.4807486Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r255], [%rd102, {%r252, %r582}], [%r211]; 2026-02-21T08:13:39.4807580Z // end inline asm 2026-02-21T08:13:39.4807913Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4808011Z bar.sync 0; 2026-02-21T08:13:39.4808104Z // begin inline asm 2026-02-21T08:13:39.4808312Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r212], 36864; 2026-02-21T08:13:39.4808401Z // end inline asm 2026-02-21T08:13:39.4808723Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4808828Z bar.sync 0; 2026-02-21T08:13:39.4808935Z elect.sync %r305|%p105, -1; 2026-02-21T08:13:39.4809039Z and.pred %p91, %p94, %p105; 2026-02-21T08:13:39.4809139Z add.s32 %r260, %r215, 163840; 2026-02-21T08:13:39.4809239Z mov.b32 %r261, 160; 2026-02-21T08:13:39.4809332Z // begin inline asm 2026-02-21T08:13:39.4809815Z @%p91 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r260], [%rd101, {%r261, %r583}], [%r212]; 2026-02-21T08:13:39.4809915Z // end inline asm 2026-02-21T08:13:39.4810244Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4810499Z bar.sync 0; 2026-02-21T08:13:39.4810614Z elect.sync %r306|%p106, -1; 2026-02-21T08:13:39.4810716Z and.pred %p92, %p1, %p106; 2026-02-21T08:13:39.4810815Z add.s32 %r264, %r34, 315392; 2026-02-21T08:13:39.4810906Z // begin inline asm 2026-02-21T08:13:39.4811394Z @%p92 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd102, {%r261, %r582}], [%r212]; 2026-02-21T08:13:39.4811487Z // end inline asm 2026-02-21T08:13:39.4811807Z .loc 1 50 42 // 
cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4811904Z bar.sync 0; 2026-02-21T08:13:39.4811998Z // begin inline asm 2026-02-21T08:13:39.4812081Z 2026-02-21T08:13:39.4812162Z { 2026-02-21T08:13:39.4812270Z .reg .pred complete; 2026-02-21T08:13:39.4812360Z waitLoop: 2026-02-21T08:13:39.4812571Z mbarrier.try_wait.parity.shared.b64 complete, [%r207], %r44; 2026-02-21T08:13:39.4812828Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.4812925Z } 2026-02-21T08:13:39.4812933Z 2026-02-21T08:13:39.4813028Z // end inline asm 2026-02-21T08:13:39.4813361Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4813466Z setp.ne.b32 %p107, %r13, 0; 2026-02-21T08:13:39.4813564Z @%p107 bra $L__BB0_3; 2026-02-21T08:13:39.4813653Z // %bb.2: 2026-02-21T08:13:39.4813974Z .loc 1 0 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:0:52 2026-02-21T08:13:39.4814070Z add.s32 %r324, %r34, 24608; 2026-02-21T08:13:39.4814179Z bfe.u32 %r325, %r324, 4, 14; 2026-02-21T08:13:39.4814288Z cvt.u64.u32 %rd91, %r325; 2026-02-21T08:13:39.4814416Z or.b64 %rd88, %rd91, -9223371899281604608; 2026-02-21T08:13:39.4814512Z add.s32 %r326, %r34, 24576; 2026-02-21T08:13:39.4814621Z bfe.u32 %r327, %r326, 4, 14; 2026-02-21T08:13:39.4814801Z cvt.u64.u32 %rd92, %r327; 2026-02-21T08:13:39.4814937Z or.b64 %rd86, %rd92, -9223371899281604608; 2026-02-21T08:13:39.4815044Z add.s32 %r328, %r34, 16416; 2026-02-21T08:13:39.4815150Z bfe.u32 %r329, %r328, 4, 14; 2026-02-21T08:13:39.4815249Z cvt.u64.u32 %rd93, %r329; 2026-02-21T08:13:39.4815368Z or.b64 %rd84, %rd93, -9223371899281604608; 2026-02-21T08:13:39.4815469Z add.s32 %r330, %r34, 16384; 2026-02-21T08:13:39.4815565Z bfe.u32 %r331, %r330, 4, 14; 2026-02-21T08:13:39.4815660Z cvt.u64.u32 %rd94, %r331; 2026-02-21T08:13:39.4815777Z or.b64 %rd82, %rd94, -9223371899281604608; 2026-02-21T08:13:39.4815880Z add.s32 %r332, %r34, 8224; 2026-02-21T08:13:39.4815975Z bfe.u32 %r333, %r332, 4, 14; 
2026-02-21T08:13:39.4816070Z cvt.u64.u32 %rd95, %r333; 2026-02-21T08:13:39.4816193Z or.b64 %rd80, %rd95, -9223371899281604608; 2026-02-21T08:13:39.4816291Z add.s32 %r334, %r34, 8192; 2026-02-21T08:13:39.4816387Z bfe.u32 %r335, %r334, 4, 14; 2026-02-21T08:13:39.4816482Z cvt.u64.u32 %rd96, %r335; 2026-02-21T08:13:39.4816610Z or.b64 %rd78, %rd96, -9223371899281604608; 2026-02-21T08:13:39.4816709Z add.s32 %r337, %r34, 294944; 2026-02-21T08:13:39.4816812Z bfe.u32 %r338, %r337, 4, 14; 2026-02-21T08:13:39.4816918Z cvt.u64.u32 %rd97, %r338; 2026-02-21T08:13:39.4817031Z or.b64 %rd77, %rd97, -9223371899399045120; 2026-02-21T08:13:39.4817127Z add.s32 %r339, %r34, 32; 2026-02-21T08:13:39.4817220Z bfe.u32 %r340, %r339, 4, 14; 2026-02-21T08:13:39.4817328Z cvt.u64.u32 %rd98, %r340; 2026-02-21T08:13:39.4817441Z or.b64 %rd76, %rd98, -9223371899281604608; 2026-02-21T08:13:39.4817536Z bfe.u32 %r341, %r219, 4, 14; 2026-02-21T08:13:39.4817643Z cvt.u64.u32 %rd99, %r341; 2026-02-21T08:13:39.4817754Z or.b64 %rd75, %rd99, -9223371899399045120; 2026-02-21T08:13:39.4817851Z bfe.u32 %r342, %r34, 4, 14; 2026-02-21T08:13:39.4817957Z cvt.u64.u32 %rd100, %r342; 2026-02-21T08:13:39.4818079Z or.b64 %rd74, %rd100, -9223371899281604608; 2026-02-21T08:13:39.4818403Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4818515Z elect.sync %r343|%p109, -1; 2026-02-21T08:13:39.4818791Z mov.b32 %r308, 135266320; 2026-02-21T08:13:39.4818889Z mov.pred %p108, 0; 2026-02-21T08:13:39.4818981Z // begin inline asm 2026-02-21T08:13:39.4819268Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 0 ], %rd74, %rd75, %r308, %p108; 2026-02-21T08:13:39.4819363Z // end inline asm 2026-02-21T08:13:39.4819456Z // begin inline asm 2026-02-21T08:13:39.4819729Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 0 ], %rd76, %rd77, %r308, %p110; 2026-02-21T08:13:39.4819820Z // end inline asm 2026-02-21T08:13:39.4819923Z // begin inline asm 2026-02-21T08:13:39.4820166Z @%p109 
tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 64 ], %rd78, %rd75, %r308, %p108; 2026-02-21T08:13:39.4820261Z // end inline asm 2026-02-21T08:13:39.4820348Z // begin inline asm 2026-02-21T08:13:39.4820583Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 64 ], %rd80, %rd77, %r308, %p110; 2026-02-21T08:13:39.4820672Z // end inline asm 2026-02-21T08:13:39.4821244Z // begin inline asm 2026-02-21T08:13:39.4821529Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 128 ], %rd82, %rd75, %r308, %p108; 2026-02-21T08:13:39.4821621Z // end inline asm 2026-02-21T08:13:39.4821707Z // begin inline asm 2026-02-21T08:13:39.4821949Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 128 ], %rd84, %rd77, %r308, %p110; 2026-02-21T08:13:39.4822031Z // end inline asm 2026-02-21T08:13:39.4822129Z // begin inline asm 2026-02-21T08:13:39.4822369Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 192 ], %rd86, %rd75, %r308, %p108; 2026-02-21T08:13:39.4822450Z // end inline asm 2026-02-21T08:13:39.4822543Z // begin inline asm 2026-02-21T08:13:39.4822781Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 192 ], %rd88, %rd77, %r308, %p110; 2026-02-21T08:13:39.4822864Z // end inline asm 2026-02-21T08:13:39.4822962Z add.s32 %r344, %r34, 323648; 2026-02-21T08:13:39.4823054Z cvt.u64.u32 %rd90, %r344; 2026-02-21T08:13:39.4823139Z // begin inline asm 2026-02-21T08:13:39.4823369Z @%p109 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd90]; 2026-02-21T08:13:39.4823461Z // end inline asm 2026-02-21T08:13:39.4823544Z $L__BB0_3: 2026-02-21T08:13:39.4823838Z .loc 1 0 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:0:52 2026-02-21T08:13:39.4823939Z add.s32 %r4, %r276, %r274; 2026-02-21T08:13:39.4824029Z add.s32 %r5, %r276, %r277; 2026-02-21T08:13:39.4824119Z add.s32 %r6, %r276, %r278; 2026-02-21T08:13:39.4824213Z add.s32 %r7, %r276, %r279; 2026-02-21T08:13:39.4824298Z add.s32 %r8, %r276, %r280; 2026-02-21T08:13:39.4824383Z add.s32 %r9, %r276, %r281; 
2026-02-21T08:13:39.4824474Z add.s32 %r10, %r276, %r282; 2026-02-21T08:13:39.4824576Z add.s32 %r11, %r276, %r283; 2026-02-21T08:13:39.4824994Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4825088Z bar.sync 0; 2026-02-21T08:13:39.4825209Z // begin inline asm 2026-02-21T08:13:39.4825417Z @%p165 mbarrier.arrive.expect_tx.shared.b64 _, [%r345], 36864; 2026-02-21T08:13:39.4825514Z // end inline asm 2026-02-21T08:13:39.4825840Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4825931Z bar.sync 0; 2026-02-21T08:13:39.4826040Z elect.sync %r359|%p130, -1; 2026-02-21T08:13:39.4826147Z and.pred %p126, %p94, %p130; 2026-02-21T08:13:39.4826257Z shl.b32 %r360, %r16, 1; 2026-02-21T08:13:39.4826359Z add.s32 %r361, %r34, %r360; 2026-02-21T08:13:39.4826454Z add.s32 %r346, %r361, 196608; 2026-02-21T08:13:39.4826548Z mov.b32 %r347, 192; 2026-02-21T08:13:39.4826636Z // begin inline asm 2026-02-21T08:13:39.4827111Z @%p126 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r346], [%rd101, {%r347, %r583}], [%r345]; 2026-02-21T08:13:39.4827217Z // end inline asm 2026-02-21T08:13:39.4827521Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4827605Z bar.sync 0; 2026-02-21T08:13:39.4827912Z elect.sync %r362|%p131, -1; 2026-02-21T08:13:39.4828017Z and.pred %p127, %p1, %p131; 2026-02-21T08:13:39.4828107Z add.s32 %r350, %r34, 319488; 2026-02-21T08:13:39.4828193Z // begin inline asm 2026-02-21T08:13:39.4828655Z @%p127 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r350], [%rd102, {%r347, %r582}], [%r345]; 2026-02-21T08:13:39.4828744Z // end inline asm 2026-02-21T08:13:39.4828827Z mov.b32 %r787, 1; 2026-02-21T08:13:39.4828907Z mov.b32 %r786, 6; 2026-02-21T08:13:39.4828999Z mov.b32 %r782, 0; 2026-02-21T08:13:39.4829086Z mov.b32 %r784, %r782; 2026-02-21T08:13:39.4829172Z mov.b32 %r785, %r782; 
2026-02-21T08:13:39.4829268Z mov.b32 %r788, %r782; 2026-02-21T08:13:39.4829353Z mov.b32 %r789, %r782; 2026-02-21T08:13:39.4829438Z bra.uni $L__BB0_4; 2026-02-21T08:13:39.4829616Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:13:39.4830064Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4830186Z setp.lt.u32 %p154, %r789, 800; 2026-02-21T08:13:39.4830480Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4830577Z // begin inline asm 2026-02-21T08:13:39.4830653Z 2026-02-21T08:13:39.4830727Z { 2026-02-21T08:13:39.4830825Z .reg .pred complete; 2026-02-21T08:13:39.4830905Z waitLoop: 2026-02-21T08:13:39.4831107Z mbarrier.try_wait.parity.shared.b64 complete, [%r783], %r782; 2026-02-21T08:13:39.4831208Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.4831289Z } 2026-02-21T08:13:39.4831296Z 2026-02-21T08:13:39.4831379Z // end inline asm 2026-02-21T08:13:39.4831468Z add.s32 %r424, %r787, 1; 2026-02-21T08:13:39.4831572Z setp.gt.s32 %p158, %r424, 1; 2026-02-21T08:13:39.4831672Z selp.b32 %r787, 0, %r424, %p158; 2026-02-21T08:13:39.4831764Z selp.b32 %r425, 1, 0, %p158; 2026-02-21T08:13:39.4831865Z xor.b32 %r31, %r788, %r425; 2026-02-21T08:13:39.4832174Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4832262Z add.s32 %r426, %r786, 1; 2026-02-21T08:13:39.4832356Z setp.gt.s32 %p159, %r426, 6; 2026-02-21T08:13:39.4832460Z selp.b32 %r786, 0, %r426, %p159; 2026-02-21T08:13:39.4832548Z shl.b32 %r427, %r786, 3; 2026-02-21T08:13:39.4832645Z add.s32 %r429, %r34, %r427; 2026-02-21T08:13:39.4832744Z add.s32 %r419, %r429, 323584; 2026-02-21T08:13:39.4832824Z bar.sync 0; 2026-02-21T08:13:39.4832923Z and.pred %p151, %p165, %p154; 2026-02-21T08:13:39.4833009Z // begin inline asm 2026-02-21T08:13:39.4833206Z @%p151 mbarrier.arrive.expect_tx.shared.b64 _, [%r419], 36864; 2026-02-21T08:13:39.4833292Z // end inline asm 
2026-02-21T08:13:39.4833591Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4833693Z shl.b32 %r430, %r786, 15; 2026-02-21T08:13:39.4833781Z bar.sync 0; 2026-02-21T08:13:39.4833886Z elect.sync %r431|%p160, -1; 2026-02-21T08:13:39.4833996Z and.pred %p161, %p154, %p160; 2026-02-21T08:13:39.4834095Z and.pred %p152, %p94, %p161; 2026-02-21T08:13:39.4834184Z add.s32 %r416, %r215, %r430; 2026-02-21T08:13:39.4834270Z add.s32 %r417, %r789, 224; 2026-02-21T08:13:39.4834365Z // begin inline asm 2026-02-21T08:13:39.4834997Z @%p152 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r416], [%rd101, {%r417, %r583}], [%r419]; 2026-02-21T08:13:39.4835087Z // end inline asm 2026-02-21T08:13:39.4835421Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4835519Z shl.b32 %r432, %r786, 12; 2026-02-21T08:13:39.4835617Z add.s32 %r433, %r34, %r432; 2026-02-21T08:13:39.4835722Z add.s32 %r420, %r433, 294912; 2026-02-21T08:13:39.4835813Z bar.sync 0; 2026-02-21T08:13:39.4835918Z elect.sync %r434|%p162, -1; 2026-02-21T08:13:39.4836030Z and.pred %p163, %p154, %p162; 2026-02-21T08:13:39.4836305Z and.pred %p153, %p1, %p163; 2026-02-21T08:13:39.4836393Z // begin inline asm 2026-02-21T08:13:39.4836843Z @%p153 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r420], [%rd102, {%r417, %r582}], [%r419]; 2026-02-21T08:13:39.4836938Z // end inline asm 2026-02-21T08:13:39.4837259Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4837366Z setp.lt.u32 %p164, %r789, 960; 2026-02-21T08:13:39.4837468Z add.s32 %r789, %r789, 32; 2026-02-21T08:13:39.4837566Z mov.b32 %r782, %r788; 2026-02-21T08:13:39.4837657Z mov.b32 %r783, %r435; 2026-02-21T08:13:39.4837750Z mov.b32 %r788, %r31; 2026-02-21T08:13:39.4837853Z @%p164 bra $L__BB0_4; 2026-02-21T08:13:39.4837948Z bra.uni $L__BB0_7; 2026-02-21T08:13:39.4838134Z $L__BB0_4: 
// =>This Inner Loop Header: Depth=1 2026-02-21T08:13:39.4838591Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4838703Z add.s32 %r365, %r785, 1; 2026-02-21T08:13:39.4838808Z setp.gt.s32 %p133, %r365, 6; 2026-02-21T08:13:39.4838915Z selp.b32 %r785, 0, %r365, %p133; 2026-02-21T08:13:39.4839025Z selp.b32 %r366, 1, 0, %p133; 2026-02-21T08:13:39.4839121Z xor.b32 %r784, %r784, %r366; 2026-02-21T08:13:39.4839215Z shl.b32 %r367, %r785, 3; 2026-02-21T08:13:39.4839319Z add.s32 %r369, %r34, %r367; 2026-02-21T08:13:39.4839416Z add.s32 %r363, %r369, 323584; 2026-02-21T08:13:39.4839502Z bar.sync 0; 2026-02-21T08:13:39.4839597Z // begin inline asm 2026-02-21T08:13:39.4839688Z 2026-02-21T08:13:39.4839767Z { 2026-02-21T08:13:39.4839866Z .reg .pred complete; 2026-02-21T08:13:39.4839963Z waitLoop: 2026-02-21T08:13:39.4840195Z mbarrier.try_wait.parity.shared.b64 complete, [%r363], %r784; 2026-02-21T08:13:39.4840306Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.4840386Z } 2026-02-21T08:13:39.4840402Z 2026-02-21T08:13:39.4840499Z // end inline asm 2026-02-21T08:13:39.4840603Z shl.b32 %r370, %r787, 3; 2026-02-21T08:13:39.4840698Z add.s32 %r371, %r34, %r370; 2026-02-21T08:13:39.4840803Z add.s32 %r435, %r371, 323648; 2026-02-21T08:13:39.4841130Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4841226Z @%p107 bra $L__BB0_6; 2026-02-21T08:13:39.4841403Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:13:39.4841724Z .loc 1 54 31 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:54:31 2026-02-21T08:13:39.4841818Z shl.b32 %r388, %r785, 15; 2026-02-21T08:13:39.4841925Z add.s32 %r390, %r34, %r388; 2026-02-21T08:13:39.4842247Z .loc 1 55 44 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:55:44 2026-02-21T08:13:39.4842340Z shl.b32 %r391, %r785, 12; 2026-02-21T08:13:39.4842441Z add.s32 %r392, %r34, %r391; 2026-02-21T08:13:39.4842552Z add.s32 %r393, %r392, 294912; 
2026-02-21T08:13:39.4842884Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4842990Z elect.sync %r394|%p135, -1; 2026-02-21T08:13:39.4843100Z bfe.u32 %r395, %r390, 4, 14; 2026-02-21T08:13:39.4843201Z cvt.u64.u32 %rd120, %r395; 2026-02-21T08:13:39.4843327Z or.b64 %rd103, %rd120, -9223371899281604608; 2026-02-21T08:13:39.4843431Z bfe.u32 %r396, %r393, 4, 14; 2026-02-21T08:13:39.4843527Z cvt.u64.u32 %rd121, %r396; 2026-02-21T08:13:39.4843648Z or.b64 %rd104, %rd121, -9223371899399045120; 2026-02-21T08:13:39.4843754Z mov.b32 %r373, 135266320; 2026-02-21T08:13:39.4843858Z mov.pred %p134, -1; 2026-02-21T08:13:39.4843945Z // begin inline asm 2026-02-21T08:13:39.4844203Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 0 ], %rd103, %rd104, %r373, %p134; 2026-02-21T08:13:39.4844294Z // end inline asm 2026-02-21T08:13:39.4844381Z add.s32 %r397, %r390, 32; 2026-02-21T08:13:39.4844475Z bfe.u32 %r398, %r397, 4, 14; 2026-02-21T08:13:39.4844798Z cvt.u64.u32 %rd122, %r398; 2026-02-21T08:13:39.4844923Z or.b64 %rd105, %rd122, -9223371899281604608; 2026-02-21T08:13:39.4845020Z add.s32 %r399, %r392, 294944; 2026-02-21T08:13:39.4845117Z bfe.u32 %r400, %r399, 4, 14; 2026-02-21T08:13:39.4845222Z cvt.u64.u32 %rd123, %r400; 2026-02-21T08:13:39.4845339Z or.b64 %rd106, %rd123, -9223371899399045120; 2026-02-21T08:13:39.4845432Z // begin inline asm 2026-02-21T08:13:39.4845708Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 0 ], %rd105, %rd106, %r373, %p134; 2026-02-21T08:13:39.4845802Z // end inline asm 2026-02-21T08:13:39.4845900Z add.s32 %r401, %r390, 8192; 2026-02-21T08:13:39.4845992Z bfe.u32 %r402, %r401, 4, 14; 2026-02-21T08:13:39.4846097Z cvt.u64.u32 %rd124, %r402; 2026-02-21T08:13:39.4846215Z or.b64 %rd107, %rd124, -9223371899281604608; 2026-02-21T08:13:39.4846308Z // begin inline asm 2026-02-21T08:13:39.4846713Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 64 ], %rd107, %rd104, %r373, %p134; 2026-02-21T08:13:39.4846812Z 
// end inline asm 2026-02-21T08:13:39.4846902Z add.s32 %r403, %r390, 8224; 2026-02-21T08:13:39.4846990Z bfe.u32 %r404, %r403, 4, 14; 2026-02-21T08:13:39.4847090Z cvt.u64.u32 %rd125, %r404; 2026-02-21T08:13:39.4847199Z or.b64 %rd109, %rd125, -9223371899281604608; 2026-02-21T08:13:39.4847281Z // begin inline asm 2026-02-21T08:13:39.4847543Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 64 ], %rd109, %rd106, %r373, %p134; 2026-02-21T08:13:39.4847627Z // end inline asm 2026-02-21T08:13:39.4847714Z add.s32 %r405, %r390, 16384; 2026-02-21T08:13:39.4847814Z bfe.u32 %r406, %r405, 4, 14; 2026-02-21T08:13:39.4847904Z cvt.u64.u32 %rd126, %r406; 2026-02-21T08:13:39.4848012Z or.b64 %rd111, %rd126, -9223371899281604608; 2026-02-21T08:13:39.4848098Z // begin inline asm 2026-02-21T08:13:39.4848357Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 128 ], %rd111, %rd104, %r373, %p134; 2026-02-21T08:13:39.4848440Z // end inline asm 2026-02-21T08:13:39.4848535Z add.s32 %r407, %r390, 16416; 2026-02-21T08:13:39.4848640Z bfe.u32 %r408, %r407, 4, 14; 2026-02-21T08:13:39.4848729Z cvt.u64.u32 %rd127, %r408; 2026-02-21T08:13:39.4848837Z or.b64 %rd113, %rd127, -9223371899281604608; 2026-02-21T08:13:39.4848930Z // begin inline asm 2026-02-21T08:13:39.4849177Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 128 ], %rd113, %rd106, %r373, %p134; 2026-02-21T08:13:39.4849260Z // end inline asm 2026-02-21T08:13:39.4849349Z add.s32 %r409, %r390, 24576; 2026-02-21T08:13:39.4849442Z bfe.u32 %r410, %r409, 4, 14; 2026-02-21T08:13:39.4849531Z cvt.u64.u32 %rd128, %r410; 2026-02-21T08:13:39.4849649Z or.b64 %rd115, %rd128, -9223371899281604608; 2026-02-21T08:13:39.4849743Z // begin inline asm 2026-02-21T08:13:39.4849991Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 192 ], %rd115, %rd104, %r373, %p134; 2026-02-21T08:13:39.4850074Z // end inline asm 2026-02-21T08:13:39.4850172Z add.s32 %r411, %r390, 24608; 2026-02-21T08:13:39.4850267Z bfe.u32 %r412, %r411, 4, 14; 
2026-02-21T08:13:39.4850364Z cvt.u64.u32 %rd129, %r412; 2026-02-21T08:13:39.4850473Z or.b64 %rd117, %rd129, -9223371899281604608; 2026-02-21T08:13:39.4850566Z // begin inline asm 2026-02-21T08:13:39.4850815Z @%p135 tcgen05.mma.cta_group::1.kind::f16 [ %r781 + 192 ], %rd117, %rd106, %r373, %p134; 2026-02-21T08:13:39.4850900Z // end inline asm 2026-02-21T08:13:39.4851003Z cvt.u64.u32 %rd119, %r435; 2026-02-21T08:13:39.4851088Z // begin inline asm 2026-02-21T08:13:39.4851310Z @%p135 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd119]; 2026-02-21T08:13:39.4851399Z // end inline asm 2026-02-21T08:13:39.4851487Z bra.uni $L__BB0_6; 2026-02-21T08:13:39.4851635Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:13:39.4851931Z .loc 1 0 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:0:52 2026-02-21T08:13:39.4852028Z mov.b32 %r436, 1; 2026-02-21T08:13:39.4852328Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4852569Z // begin inline asm 2026-02-21T08:13:39.4852659Z 2026-02-21T08:13:39.4852733Z { 2026-02-21T08:13:39.4852823Z .reg .pred complete; 2026-02-21T08:13:39.4852904Z waitLoop: 2026-02-21T08:13:39.4853115Z mbarrier.try_wait.parity.shared.b64 complete, [%r435], %r436; 2026-02-21T08:13:39.4853213Z @!complete bra.uni waitLoop; 2026-02-21T08:13:39.4853286Z } 2026-02-21T08:13:39.4853295Z 2026-02-21T08:13:39.4853386Z // end inline asm 2026-02-21T08:13:39.4853681Z .loc 1 50 42 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:50:42 2026-02-21T08:13:39.4853762Z bar.sync 0; 2026-02-21T08:13:39.4853860Z // begin inline asm 2026-02-21T08:13:39.4854000Z @%p165 mbarrier.inval.shared::cta.b64 [%r207]; 2026-02-21T08:13:39.4854082Z // end inline asm 2026-02-21T08:13:39.4854164Z bar.sync 0; 2026-02-21T08:13:39.4854261Z // begin inline asm 2026-02-21T08:13:39.4854519Z @%p165 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T08:13:39.4854624Z // end inline asm 2026-02-21T08:13:39.4854780Z bar.sync 0; 
2026-02-21T08:13:39.4854879Z // begin inline asm 2026-02-21T08:13:39.4855021Z @%p165 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T08:13:39.4855111Z // end inline asm 2026-02-21T08:13:39.4855206Z bar.sync 0; 2026-02-21T08:13:39.4855297Z // begin inline asm 2026-02-21T08:13:39.4855435Z @%p165 mbarrier.inval.shared::cta.b64 [%r210]; 2026-02-21T08:13:39.4855534Z // end inline asm 2026-02-21T08:13:39.4855631Z bar.sync 0; 2026-02-21T08:13:39.4855717Z // begin inline asm 2026-02-21T08:13:39.4855852Z @%p165 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T08:13:39.4855934Z // end inline asm 2026-02-21T08:13:39.4856013Z bar.sync 0; 2026-02-21T08:13:39.4856097Z // begin inline asm 2026-02-21T08:13:39.4856232Z @%p165 mbarrier.inval.shared::cta.b64 [%r212]; 2026-02-21T08:13:39.4856311Z // end inline asm 2026-02-21T08:13:39.4856390Z bar.sync 0; 2026-02-21T08:13:39.4856488Z // begin inline asm 2026-02-21T08:13:39.4856615Z @%p165 mbarrier.inval.shared::cta.b64 [%r345]; 2026-02-21T08:13:39.4856707Z // end inline asm 2026-02-21T08:13:39.4856799Z add.s32 %r444, %r34, 323648; 2026-02-21T08:13:39.4856893Z // begin inline asm 2026-02-21T08:13:39.4857022Z @%p165 mbarrier.inval.shared::cta.b64 [%r444]; 2026-02-21T08:13:39.4857104Z // end inline asm 2026-02-21T08:13:39.4857191Z bar.sync 0; 2026-02-21T08:13:39.4857277Z // begin inline asm 2026-02-21T08:13:39.4857402Z @%p165 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T08:13:39.4857485Z // end inline asm 2026-02-21T08:13:39.4857791Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4857875Z // begin inline asm 2026-02-21T08:13:39.4858388Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461}, [%r581 + 0]; 2026-02-21T08:13:39.4858479Z // end inline asm 2026-02-21T08:13:39.4858568Z // begin inline asm 2026-02-21T08:13:39.4859079Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r463, %r464, 
%r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478}, [%r581 + 16]; 2026-02-21T08:13:39.4859174Z // end inline asm 2026-02-21T08:13:39.4859260Z // begin inline asm 2026-02-21T08:13:39.4859762Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495}, [%r581 + 32]; 2026-02-21T08:13:39.4859850Z // end inline asm 2026-02-21T08:13:39.4859935Z // begin inline asm 2026-02-21T08:13:39.4860429Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512}, [%r581 + 48]; 2026-02-21T08:13:39.4860521Z // end inline asm 2026-02-21T08:13:39.4860607Z // begin inline asm 2026-02-21T08:13:39.4861110Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529}, [%r581 + 128]; 2026-02-21T08:13:39.4861346Z // end inline asm 2026-02-21T08:13:39.4861443Z // begin inline asm 2026-02-21T08:13:39.4861942Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546}, [%r581 + 144]; 2026-02-21T08:13:39.4862026Z // end inline asm 2026-02-21T08:13:39.4862120Z // begin inline asm 2026-02-21T08:13:39.4862618Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563}, [%r581 + 160]; 2026-02-21T08:13:39.4862702Z // end inline asm 2026-02-21T08:13:39.4862794Z // begin inline asm 2026-02-21T08:13:39.4863293Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580}, [%r581 + 176]; 2026-02-21T08:13:39.4863506Z // end inline asm 2026-02-21T08:13:39.4863612Z // begin inline asm 2026-02-21T08:13:39.4863730Z 
tcgen05.wait::ld.sync.aligned; 2026-02-21T08:13:39.4863814Z // end inline asm 2026-02-21T08:13:39.4863907Z cvt.u64.u32 %rd133, %r446; 2026-02-21T08:13:39.4864010Z cvt.u64.u32 %rd134, %r447; 2026-02-21T08:13:39.4864103Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:13:39.4864199Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:13:39.4864513Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4864611Z mov.b64 {%r586, %r587}, %rd136; 2026-02-21T08:13:39.4864802Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T08:13:39.4865132Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4865245Z cvt.u64.u32 %rd137, %r448; 2026-02-21T08:13:39.4865342Z cvt.u64.u32 %rd138, %r449; 2026-02-21T08:13:39.4865441Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:13:39.4865558Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:13:39.4865871Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4865970Z mov.b64 {%r589, %r590}, %rd140; 2026-02-21T08:13:39.4866084Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T08:13:39.4866375Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4866466Z cvt.u64.u32 %rd141, %r450; 2026-02-21T08:13:39.4866557Z cvt.u64.u32 %rd142, %r451; 2026-02-21T08:13:39.4866657Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:13:39.4866750Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:13:39.4867045Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4867149Z mov.b64 {%r592, %r593}, %rd144; 2026-02-21T08:13:39.4867252Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T08:13:39.4867553Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4867663Z cvt.u64.u32 %rd145, %r452; 2026-02-21T08:13:39.4867752Z cvt.u64.u32 %rd146, %r453; 2026-02-21T08:13:39.4867842Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:13:39.4867934Z or.b64 
%rd148, %rd145, %rd147; 2026-02-21T08:13:39.4868231Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4868329Z mov.b64 {%r595, %r596}, %rd148; 2026-02-21T08:13:39.4868430Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T08:13:39.4868737Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4868827Z cvt.u64.u32 %rd149, %r454; 2026-02-21T08:13:39.4868918Z cvt.u64.u32 %rd150, %r455; 2026-02-21T08:13:39.4869018Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:13:39.4869111Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:13:39.4869406Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4869506Z mov.b64 {%r598, %r599}, %rd152; 2026-02-21T08:13:39.4869773Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T08:13:39.4870071Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4870161Z cvt.u64.u32 %rd153, %r456; 2026-02-21T08:13:39.4870258Z cvt.u64.u32 %rd154, %r457; 2026-02-21T08:13:39.4870346Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:13:39.4870443Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:13:39.4870748Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4870839Z mov.b64 {%r601, %r602}, %rd156; 2026-02-21T08:13:39.4870937Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T08:13:39.4871228Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4871327Z cvt.u64.u32 %rd157, %r458; 2026-02-21T08:13:39.4871415Z cvt.u64.u32 %rd158, %r459; 2026-02-21T08:13:39.4871649Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:13:39.4871768Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:13:39.4872065Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4872155Z mov.b64 {%r604, %r605}, %rd160; 2026-02-21T08:13:39.4872263Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 
2026-02-21T08:13:39.4872557Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4872646Z cvt.u64.u32 %rd161, %r460; 2026-02-21T08:13:39.4872740Z cvt.u64.u32 %rd162, %r461; 2026-02-21T08:13:39.4872838Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:13:39.4872930Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:13:39.4873223Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4873322Z mov.b64 {%r607, %r608}, %rd164; 2026-02-21T08:13:39.4873421Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T08:13:39.4873720Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4873825Z cvt.u64.u32 %rd165, %r463; 2026-02-21T08:13:39.4873916Z cvt.u64.u32 %rd166, %r464; 2026-02-21T08:13:39.4874005Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:13:39.4874095Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:13:39.4874399Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4874488Z mov.b64 {%r610, %r611}, %rd168; 2026-02-21T08:13:39.4874595Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T08:13:39.4874977Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4875079Z cvt.u64.u32 %rd169, %r465; 2026-02-21T08:13:39.4875176Z cvt.u64.u32 %rd170, %r466; 2026-02-21T08:13:39.4875284Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:13:39.4875379Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:13:39.4875702Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4875808Z mov.b64 {%r613, %r614}, %rd172; 2026-02-21T08:13:39.4875924Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T08:13:39.4876245Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4876352Z cvt.u64.u32 %rd173, %r467; 2026-02-21T08:13:39.4876451Z cvt.u64.u32 %rd174, %r468; 2026-02-21T08:13:39.4876542Z shl.b64 %rd175, %rd174, 32; 
2026-02-21T08:13:39.4876633Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:13:39.4876934Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4877030Z mov.b64 {%r616, %r617}, %rd176; 2026-02-21T08:13:39.4877134Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T08:13:39.4877450Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4877567Z cvt.u64.u32 %rd177, %r469; 2026-02-21T08:13:39.4877815Z cvt.u64.u32 %rd178, %r470; 2026-02-21T08:13:39.4877914Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:13:39.4878022Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:13:39.4878348Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4878447Z mov.b64 {%r619, %r620}, %rd180; 2026-02-21T08:13:39.4878563Z cvt.rn.f16x2.f32 %r621, %r620, %r619; 2026-02-21T08:13:39.4878884Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4878981Z cvt.u64.u32 %rd181, %r471; 2026-02-21T08:13:39.4879079Z cvt.u64.u32 %rd182, %r472; 2026-02-21T08:13:39.4879183Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:13:39.4879282Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:13:39.4879606Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4879846Z mov.b64 {%r622, %r623}, %rd184; 2026-02-21T08:13:39.4879965Z cvt.rn.f16x2.f32 %r624, %r623, %r622; 2026-02-21T08:13:39.4880286Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4880392Z cvt.u64.u32 %rd185, %r473; 2026-02-21T08:13:39.4880488Z cvt.u64.u32 %rd186, %r474; 2026-02-21T08:13:39.4880588Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:13:39.4880686Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:13:39.4881010Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4881107Z mov.b64 {%r625, %r626}, %rd188; 2026-02-21T08:13:39.4881213Z cvt.rn.f16x2.f32 %r627, 
%r626, %r625; 2026-02-21T08:13:39.4881536Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4881633Z cvt.u64.u32 %rd189, %r475; 2026-02-21T08:13:39.4881730Z cvt.u64.u32 %rd190, %r476; 2026-02-21T08:13:39.4881834Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:13:39.4881937Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:13:39.4882255Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4882357Z mov.b64 {%r628, %r629}, %rd192; 2026-02-21T08:13:39.4882472Z cvt.rn.f16x2.f32 %r630, %r629, %r628; 2026-02-21T08:13:39.4882787Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4882883Z cvt.u64.u32 %rd193, %r477; 2026-02-21T08:13:39.4882990Z cvt.u64.u32 %rd194, %r478; 2026-02-21T08:13:39.4883086Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:13:39.4883184Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:13:39.4883509Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4883609Z mov.b64 {%r631, %r632}, %rd196; 2026-02-21T08:13:39.4883714Z cvt.rn.f16x2.f32 %r633, %r632, %r631; 2026-02-21T08:13:39.4884033Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4884146Z cvt.u64.u32 %rd197, %r480; 2026-02-21T08:13:39.4884241Z cvt.u64.u32 %rd198, %r481; 2026-02-21T08:13:39.4884346Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:13:39.4884443Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:13:39.4884830Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4884948Z mov.b64 {%r634, %r635}, %rd200; 2026-02-21T08:13:39.4885067Z cvt.rn.f16x2.f32 %r636, %r635, %r634; 2026-02-21T08:13:39.4885385Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4885484Z cvt.u64.u32 %rd201, %r482; 2026-02-21T08:13:39.4885582Z cvt.u64.u32 %rd202, %r483; 2026-02-21T08:13:39.4885689Z shl.b64 %rd203, %rd202, 
32; 2026-02-21T08:13:39.4885784Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:13:39.4886110Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4886369Z mov.b64 {%r637, %r638}, %rd204; 2026-02-21T08:13:39.4886474Z cvt.rn.f16x2.f32 %r639, %r638, %r637; 2026-02-21T08:13:39.4886802Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4886900Z cvt.u64.u32 %rd205, %r484; 2026-02-21T08:13:39.4886988Z cvt.u64.u32 %rd206, %r485; 2026-02-21T08:13:39.4887079Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:13:39.4887170Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:13:39.4887478Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4887569Z mov.b64 {%r640, %r641}, %rd208; 2026-02-21T08:13:39.4887666Z cvt.rn.f16x2.f32 %r642, %r641, %r640; 2026-02-21T08:13:39.4887967Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4888057Z cvt.u64.u32 %rd209, %r486; 2026-02-21T08:13:39.4888298Z cvt.u64.u32 %rd210, %r487; 2026-02-21T08:13:39.4888423Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:13:39.4888518Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:13:39.4888813Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4888904Z mov.b64 {%r643, %r644}, %rd212; 2026-02-21T08:13:39.4889011Z cvt.rn.f16x2.f32 %r645, %r644, %r643; 2026-02-21T08:13:39.4889303Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4889391Z cvt.u64.u32 %rd213, %r488; 2026-02-21T08:13:39.4889495Z cvt.u64.u32 %rd214, %r489; 2026-02-21T08:13:39.4889586Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:13:39.4889678Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:13:39.4889984Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4890076Z mov.b64 {%r646, %r647}, %rd216; 2026-02-21T08:13:39.4890181Z cvt.rn.f16x2.f32 %r648, 
%r647, %r646; 2026-02-21T08:13:39.4890484Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4890585Z cvt.u64.u32 %rd217, %r490; 2026-02-21T08:13:39.4890676Z cvt.u64.u32 %rd218, %r491; 2026-02-21T08:13:39.4890769Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:13:39.4890871Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:13:39.4891164Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4891253Z mov.b64 {%r649, %r650}, %rd220; 2026-02-21T08:13:39.4891360Z cvt.rn.f16x2.f32 %r651, %r650, %r649; 2026-02-21T08:13:39.4891654Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4891750Z cvt.u64.u32 %rd221, %r492; 2026-02-21T08:13:39.4891839Z cvt.u64.u32 %rd222, %r493; 2026-02-21T08:13:39.4891939Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:13:39.4892031Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:13:39.4892327Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4892437Z mov.b64 {%r652, %r653}, %rd224; 2026-02-21T08:13:39.4892536Z cvt.rn.f16x2.f32 %r654, %r653, %r652; 2026-02-21T08:13:39.4892827Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4892926Z cvt.u64.u32 %rd225, %r494; 2026-02-21T08:13:39.4893016Z cvt.u64.u32 %rd226, %r495; 2026-02-21T08:13:39.4893106Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:13:39.4893199Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:13:39.4893500Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4893589Z mov.b64 {%r655, %r656}, %rd228; 2026-02-21T08:13:39.4893686Z cvt.rn.f16x2.f32 %r657, %r656, %r655; 2026-02-21T08:13:39.4893986Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4894085Z cvt.u64.u32 %rd229, %r497; 2026-02-21T08:13:39.4894295Z cvt.u64.u32 %rd230, %r498; 2026-02-21T08:13:39.4894393Z shl.b64 %rd231, %rd230, 
32; 2026-02-21T08:13:39.4894486Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:13:39.4895202Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4895308Z mov.b64 {%r658, %r659}, %rd232; 2026-02-21T08:13:39.4895426Z cvt.rn.f16x2.f32 %r660, %r659, %r658; 2026-02-21T08:13:39.4895748Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4895844Z cvt.u64.u32 %rd233, %r499; 2026-02-21T08:13:39.4895956Z cvt.u64.u32 %rd234, %r500; 2026-02-21T08:13:39.4896046Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:13:39.4896136Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:13:39.4896436Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4896674Z mov.b64 {%r661, %r662}, %rd236; 2026-02-21T08:13:39.4896789Z cvt.rn.f16x2.f32 %r663, %r662, %r661; 2026-02-21T08:13:39.4897084Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4897184Z cvt.u64.u32 %rd237, %r501; 2026-02-21T08:13:39.4897276Z cvt.u64.u32 %rd238, %r502; 2026-02-21T08:13:39.4897366Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:13:39.4897467Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:13:39.4897759Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4897849Z mov.b64 {%r664, %r665}, %rd240; 2026-02-21T08:13:39.4897959Z cvt.rn.f16x2.f32 %r666, %r665, %r664; 2026-02-21T08:13:39.4898253Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4898342Z cvt.u64.u32 %rd241, %r503; 2026-02-21T08:13:39.4898428Z cvt.u64.u32 %rd242, %r504; 2026-02-21T08:13:39.4898534Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:13:39.4898632Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:13:39.4898933Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4899032Z mov.b64 {%r667, %r668}, %rd244; 2026-02-21T08:13:39.4899133Z cvt.rn.f16x2.f32 %r669, 
%r668, %r667; 2026-02-21T08:13:39.4899425Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4899526Z cvt.u64.u32 %rd245, %r505; 2026-02-21T08:13:39.4899614Z cvt.u64.u32 %rd246, %r506; 2026-02-21T08:13:39.4899702Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:13:39.4899794Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:13:39.4900092Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4900181Z mov.b64 {%r670, %r671}, %rd248; 2026-02-21T08:13:39.4900278Z cvt.rn.f16x2.f32 %r672, %r671, %r670; 2026-02-21T08:13:39.4900586Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4900686Z cvt.u64.u32 %rd249, %r507; 2026-02-21T08:13:39.4900775Z cvt.u64.u32 %rd250, %r508; 2026-02-21T08:13:39.4900874Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:13:39.4900972Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:13:39.4901269Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4901358Z mov.b64 {%r673, %r674}, %rd252; 2026-02-21T08:13:39.4901465Z cvt.rn.f16x2.f32 %r675, %r674, %r673; 2026-02-21T08:13:39.4901757Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4901845Z cvt.u64.u32 %rd253, %r509; 2026-02-21T08:13:39.4901945Z cvt.u64.u32 %rd254, %r510; 2026-02-21T08:13:39.4902033Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:13:39.4902124Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:13:39.4902433Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4902714Z mov.b64 {%r676, %r677}, %rd256; 2026-02-21T08:13:39.4902811Z cvt.rn.f16x2.f32 %r678, %r677, %r676; 2026-02-21T08:13:39.4903108Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4903203Z cvt.u64.u32 %rd257, %r511; 2026-02-21T08:13:39.4903297Z cvt.u64.u32 %rd258, %r512; 2026-02-21T08:13:39.4903386Z shl.b64 %rd259, %rd258, 
32; 2026-02-21T08:13:39.4903484Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:13:39.4903774Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4903864Z mov.b64 {%r679, %r680}, %rd260; 2026-02-21T08:13:39.4903974Z cvt.rn.f16x2.f32 %r681, %r680, %r679; 2026-02-21T08:13:39.4904267Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4904460Z cvt.u64.u32 %rd261, %r514; 2026-02-21T08:13:39.4904569Z cvt.u64.u32 %rd262, %r515; 2026-02-21T08:13:39.4904772Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:13:39.4904874Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:13:39.4905199Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4905309Z mov.b64 {%r682, %r683}, %rd264; 2026-02-21T08:13:39.4905412Z cvt.rn.f16x2.f32 %r684, %r683, %r682; 2026-02-21T08:13:39.4905725Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4905832Z cvt.u64.u32 %rd265, %r516; 2026-02-21T08:13:39.4905929Z cvt.u64.u32 %rd266, %r517; 2026-02-21T08:13:39.4906027Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:13:39.4906125Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:13:39.4906445Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4906536Z mov.b64 {%r685, %r686}, %rd268; 2026-02-21T08:13:39.4906638Z cvt.rn.f16x2.f32 %r687, %r686, %r685; 2026-02-21T08:13:39.4906951Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4907048Z cvt.u64.u32 %rd269, %r518; 2026-02-21T08:13:39.4907145Z cvt.u64.u32 %rd270, %r519; 2026-02-21T08:13:39.4907243Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:13:39.4907352Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:13:39.4907665Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4907765Z mov.b64 {%r688, %r689}, %rd272; 2026-02-21T08:13:39.4907881Z cvt.rn.f16x2.f32 %r690, 
%r689, %r688; 2026-02-21T08:13:39.4908195Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4908297Z cvt.u64.u32 %rd273, %r520; 2026-02-21T08:13:39.4908401Z cvt.u64.u32 %rd274, %r521; 2026-02-21T08:13:39.4908497Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:13:39.4908601Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:13:39.4908929Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4909038Z mov.b64 {%r691, %r692}, %rd276; 2026-02-21T08:13:39.4909143Z cvt.rn.f16x2.f32 %r693, %r692, %r691; 2026-02-21T08:13:39.4909456Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4909561Z cvt.u64.u32 %rd277, %r522; 2026-02-21T08:13:39.4909658Z cvt.u64.u32 %rd278, %r523; 2026-02-21T08:13:39.4909754Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:13:39.4909857Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:13:39.4910172Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4910269Z mov.b64 {%r694, %r695}, %rd280; 2026-02-21T08:13:39.4910375Z cvt.rn.f16x2.f32 %r696, %r695, %r694; 2026-02-21T08:13:39.4910704Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4910952Z cvt.u64.u32 %rd281, %r524; 2026-02-21T08:13:39.4911049Z cvt.u64.u32 %rd282, %r525; 2026-02-21T08:13:39.4911156Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:13:39.4911255Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:13:39.4911577Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4911685Z mov.b64 {%r697, %r698}, %rd284; 2026-02-21T08:13:39.4911791Z cvt.rn.f16x2.f32 %r699, %r698, %r697; 2026-02-21T08:13:39.4912111Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4912223Z cvt.u64.u32 %rd285, %r526; 2026-02-21T08:13:39.4912321Z cvt.u64.u32 %rd286, %r527; 2026-02-21T08:13:39.4912418Z shl.b64 %rd287, %rd286, 
32; 2026-02-21T08:13:39.4912516Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:13:39.4912979Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4913091Z mov.b64 {%r700, %r701}, %rd288; 2026-02-21T08:13:39.4913188Z cvt.rn.f16x2.f32 %r702, %r701, %r700; 2026-02-21T08:13:39.4913490Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4913582Z cvt.u64.u32 %rd289, %r528; 2026-02-21T08:13:39.4913673Z cvt.u64.u32 %rd290, %r529; 2026-02-21T08:13:39.4913765Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:13:39.4913868Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:13:39.4914163Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4914256Z mov.b64 {%r703, %r704}, %rd292; 2026-02-21T08:13:39.4914364Z cvt.rn.f16x2.f32 %r705, %r704, %r703; 2026-02-21T08:13:39.4914655Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4914842Z cvt.u64.u32 %rd293, %r531; 2026-02-21T08:13:39.4914950Z cvt.u64.u32 %rd294, %r532; 2026-02-21T08:13:39.4915042Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:13:39.4915137Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:13:39.4915430Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4915535Z mov.b64 {%r706, %r707}, %rd296; 2026-02-21T08:13:39.4915633Z cvt.rn.f16x2.f32 %r708, %r707, %r706; 2026-02-21T08:13:39.4915922Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4916023Z cvt.u64.u32 %rd297, %r533; 2026-02-21T08:13:39.4916116Z cvt.u64.u32 %rd298, %r534; 2026-02-21T08:13:39.4916209Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:13:39.4916307Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:13:39.4916597Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4916690Z mov.b64 {%r709, %r710}, %rd300; 2026-02-21T08:13:39.4916790Z cvt.rn.f16x2.f32 %r711, 
%r710, %r709; 2026-02-21T08:13:39.4917095Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4917195Z cvt.u64.u32 %rd301, %r535; 2026-02-21T08:13:39.4917283Z cvt.u64.u32 %rd302, %r536; 2026-02-21T08:13:39.4917382Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:13:39.4917472Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:13:39.4917759Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4917864Z mov.b64 {%r712, %r713}, %rd304; 2026-02-21T08:13:39.4917963Z cvt.rn.f16x2.f32 %r714, %r713, %r712; 2026-02-21T08:13:39.4918248Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4918340Z cvt.u64.u32 %rd305, %r537; 2026-02-21T08:13:39.4918442Z cvt.u64.u32 %rd306, %r538; 2026-02-21T08:13:39.4918532Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:13:39.4918624Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:13:39.4918955Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4919222Z mov.b64 {%r715, %r716}, %rd308; 2026-02-21T08:13:39.4919327Z cvt.rn.f16x2.f32 %r717, %r716, %r715; 2026-02-21T08:13:39.4919649Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4919747Z cvt.u64.u32 %rd309, %r539; 2026-02-21T08:13:39.4919844Z cvt.u64.u32 %rd310, %r540; 2026-02-21T08:13:39.4919937Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:13:39.4920048Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:13:39.4920363Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4920461Z mov.b64 {%r718, %r719}, %rd312; 2026-02-21T08:13:39.4920576Z cvt.rn.f16x2.f32 %r720, %r719, %r718; 2026-02-21T08:13:39.4920893Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4921106Z cvt.u64.u32 %rd313, %r541; 2026-02-21T08:13:39.4921222Z cvt.u64.u32 %rd314, %r542; 2026-02-21T08:13:39.4921311Z shl.b64 %rd315, %rd314, 
32; 2026-02-21T08:13:39.4921398Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:13:39.4921692Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4921791Z mov.b64 {%r721, %r722}, %rd316; 2026-02-21T08:13:39.4921888Z cvt.rn.f16x2.f32 %r723, %r722, %r721; 2026-02-21T08:13:39.4922179Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4922275Z cvt.u64.u32 %rd317, %r543; 2026-02-21T08:13:39.4922368Z cvt.u64.u32 %rd318, %r544; 2026-02-21T08:13:39.4922457Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:13:39.4922552Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:13:39.4922846Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4922943Z mov.b64 {%r724, %r725}, %rd320; 2026-02-21T08:13:39.4923048Z cvt.rn.f16x2.f32 %r726, %r725, %r724; 2026-02-21T08:13:39.4923346Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4923435Z cvt.u64.u32 %rd321, %r545; 2026-02-21T08:13:39.4923524Z cvt.u64.u32 %rd322, %r546; 2026-02-21T08:13:39.4923626Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:13:39.4923716Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:13:39.4924005Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4924104Z mov.b64 {%r727, %r728}, %rd324; 2026-02-21T08:13:39.4924200Z cvt.rn.f16x2.f32 %r729, %r728, %r727; 2026-02-21T08:13:39.4924491Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4924582Z cvt.u64.u32 %rd325, %r548; 2026-02-21T08:13:39.4924761Z cvt.u64.u32 %rd326, %r549; 2026-02-21T08:13:39.4924865Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:13:39.4924957Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:13:39.4925264Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4925355Z mov.b64 {%r730, %r731}, %rd328; 2026-02-21T08:13:39.4925452Z cvt.rn.f16x2.f32 %r732, 
%r731, %r730; 2026-02-21T08:13:39.4925755Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4925845Z cvt.u64.u32 %rd329, %r550; 2026-02-21T08:13:39.4925933Z cvt.u64.u32 %rd330, %r551; 2026-02-21T08:13:39.4926025Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:13:39.4926126Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:13:39.4926421Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4926513Z mov.b64 {%r733, %r734}, %rd332; 2026-02-21T08:13:39.4926620Z cvt.rn.f16x2.f32 %r735, %r734, %r733; 2026-02-21T08:13:39.4926917Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4927156Z cvt.u64.u32 %rd333, %r552; 2026-02-21T08:13:39.4927255Z cvt.u64.u32 %rd334, %r553; 2026-02-21T08:13:39.4927345Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:13:39.4927437Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:13:39.4927729Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4927832Z mov.b64 {%r736, %r737}, %rd336; 2026-02-21T08:13:39.4927930Z cvt.rn.f16x2.f32 %r738, %r737, %r736; 2026-02-21T08:13:39.4928226Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4928330Z cvt.u64.u32 %rd337, %r554; 2026-02-21T08:13:39.4928421Z cvt.u64.u32 %rd338, %r555; 2026-02-21T08:13:39.4928512Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:13:39.4928610Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:13:39.4929045Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4929150Z mov.b64 {%r739, %r740}, %rd340; 2026-02-21T08:13:39.4929249Z cvt.rn.f16x2.f32 %r741, %r740, %r739; 2026-02-21T08:13:39.4929550Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4929642Z cvt.u64.u32 %rd341, %r556; 2026-02-21T08:13:39.4929732Z cvt.u64.u32 %rd342, %r557; 2026-02-21T08:13:39.4929832Z shl.b64 %rd343, %rd342, 
32; 2026-02-21T08:13:39.4929924Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:13:39.4930214Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4930312Z mov.b64 {%r742, %r743}, %rd344; 2026-02-21T08:13:39.4930409Z cvt.rn.f16x2.f32 %r744, %r743, %r742; 2026-02-21T08:13:39.4930701Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4930792Z cvt.u64.u32 %rd345, %r558; 2026-02-21T08:13:39.4930902Z cvt.u64.u32 %rd346, %r559; 2026-02-21T08:13:39.4931004Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:13:39.4931095Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:13:39.4931398Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4931499Z mov.b64 {%r745, %r746}, %rd348; 2026-02-21T08:13:39.4931595Z cvt.rn.f16x2.f32 %r747, %r746, %r745; 2026-02-21T08:13:39.4931896Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4931986Z cvt.u64.u32 %rd349, %r560; 2026-02-21T08:13:39.4932076Z cvt.u64.u32 %rd350, %r561; 2026-02-21T08:13:39.4932168Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:13:39.4932265Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:13:39.4932561Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4932651Z mov.b64 {%r748, %r749}, %rd352; 2026-02-21T08:13:39.4932763Z cvt.rn.f16x2.f32 %r750, %r749, %r748; 2026-02-21T08:13:39.4933063Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4933153Z cvt.u64.u32 %rd353, %r562; 2026-02-21T08:13:39.4933251Z cvt.u64.u32 %rd354, %r563; 2026-02-21T08:13:39.4933340Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:13:39.4933430Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:13:39.4933722Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4933829Z mov.b64 {%r751, %r752}, %rd356; 2026-02-21T08:13:39.4933927Z cvt.rn.f16x2.f32 %r753, 
%r752, %r751; 2026-02-21T08:13:39.4934214Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4934311Z cvt.u64.u32 %rd357, %r565; 2026-02-21T08:13:39.4934400Z cvt.u64.u32 %rd358, %r566; 2026-02-21T08:13:39.4934490Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:13:39.4934595Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:13:39.4935105Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4935197Z mov.b64 {%r754, %r755}, %rd360; 2026-02-21T08:13:39.4935297Z cvt.rn.f16x2.f32 %r756, %r755, %r754; 2026-02-21T08:13:39.4935599Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4935688Z cvt.u64.u32 %rd361, %r567; 2026-02-21T08:13:39.4935776Z cvt.u64.u32 %rd362, %r568; 2026-02-21T08:13:39.4935870Z shl.b64 %rd363, %rd362, 32; 2026-02-21T08:13:39.4935961Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T08:13:39.4936251Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4936356Z mov.b64 {%r757, %r758}, %rd364; 2026-02-21T08:13:39.4936454Z cvt.rn.f16x2.f32 %r759, %r758, %r757; 2026-02-21T08:13:39.4936882Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4936997Z cvt.u64.u32 %rd365, %r569; 2026-02-21T08:13:39.4937094Z cvt.u64.u32 %rd366, %r570; 2026-02-21T08:13:39.4937186Z shl.b64 %rd367, %rd366, 32; 2026-02-21T08:13:39.4937277Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T08:13:39.4937579Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4937668Z mov.b64 {%r760, %r761}, %rd368; 2026-02-21T08:13:39.4937766Z cvt.rn.f16x2.f32 %r762, %r761, %r760; 2026-02-21T08:13:39.4938066Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4938155Z cvt.u64.u32 %rd369, %r571; 2026-02-21T08:13:39.4938245Z cvt.u64.u32 %rd370, %r572; 2026-02-21T08:13:39.4938330Z shl.b64 %rd371, %rd370, 
32; 2026-02-21T08:13:39.4938435Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T08:13:39.4938731Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4938827Z mov.b64 {%r763, %r764}, %rd372; 2026-02-21T08:13:39.4938939Z cvt.rn.f16x2.f32 %r765, %r764, %r763; 2026-02-21T08:13:39.4939230Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4939324Z cvt.u64.u32 %rd373, %r573; 2026-02-21T08:13:39.4939430Z cvt.u64.u32 %rd374, %r574; 2026-02-21T08:13:39.4939530Z shl.b64 %rd375, %rd374, 32; 2026-02-21T08:13:39.4939628Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T08:13:39.4939947Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4940055Z mov.b64 {%r766, %r767}, %rd376; 2026-02-21T08:13:39.4940161Z cvt.rn.f16x2.f32 %r768, %r767, %r766; 2026-02-21T08:13:39.4940482Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4940589Z cvt.u64.u32 %rd377, %r575; 2026-02-21T08:13:39.4940685Z cvt.u64.u32 %rd378, %r576; 2026-02-21T08:13:39.4940795Z shl.b64 %rd379, %rd378, 32; 2026-02-21T08:13:39.4940909Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T08:13:39.4941232Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4941329Z mov.b64 {%r769, %r770}, %rd380; 2026-02-21T08:13:39.4941435Z cvt.rn.f16x2.f32 %r771, %r770, %r769; 2026-02-21T08:13:39.4941761Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4941859Z cvt.u64.u32 %rd381, %r577; 2026-02-21T08:13:39.4941956Z cvt.u64.u32 %rd382, %r578; 2026-02-21T08:13:39.4942064Z shl.b64 %rd383, %rd382, 32; 2026-02-21T08:13:39.4942161Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T08:13:39.4942483Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4942591Z mov.b64 {%r772, %r773}, %rd384; 2026-02-21T08:13:39.4942698Z cvt.rn.f16x2.f32 %r774, 
%r773, %r772; 2026-02-21T08:13:39.4943023Z .loc 1 56 52 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:56:52 2026-02-21T08:13:39.4943259Z cvt.u64.u32 %rd385, %r579; 2026-02-21T08:13:39.4943367Z cvt.u64.u32 %rd386, %r580; 2026-02-21T08:13:39.4943466Z shl.b64 %rd387, %rd386, 32; 2026-02-21T08:13:39.4943564Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T08:13:39.4943893Z .loc 1 58 27 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:58:27 2026-02-21T08:13:39.4943994Z mov.b64 {%r775, %r776}, %rd388; 2026-02-21T08:13:39.4944104Z cvt.rn.f16x2.f32 %r777, %r776, %r775; 2026-02-21T08:13:39.4944430Z .loc 1 59 45 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:59:45 2026-02-21T08:13:39.4944555Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:39.4944646Z bar.sync 0; 2026-02-21T08:13:39.4944898Z st.shared.v4.b32 [%r4], {%r588, %r591, %r594, %r597}; 2026-02-21T08:13:39.4945218Z st.shared.v4.b32 [%r4+32768], {%r684, %r687, %r690, %r693}; 2026-02-21T08:13:39.4945383Z st.shared.v4.b32 [%r5], {%r600, %r603, %r606, %r609}; 2026-02-21T08:13:39.4945561Z st.shared.v4.b32 [%r5+32768], {%r696, %r699, %r702, %r705}; 2026-02-21T08:13:39.4945715Z st.shared.v4.b32 [%r6], {%r612, %r615, %r618, %r621}; 2026-02-21T08:13:39.4945883Z st.shared.v4.b32 [%r6+32768], {%r708, %r711, %r714, %r717}; 2026-02-21T08:13:39.4946029Z st.shared.v4.b32 [%r7], {%r624, %r627, %r630, %r633}; 2026-02-21T08:13:39.4946201Z st.shared.v4.b32 [%r7+32768], {%r720, %r723, %r726, %r729}; 2026-02-21T08:13:39.4946349Z st.shared.v4.b32 [%r8], {%r636, %r639, %r642, %r645}; 2026-02-21T08:13:39.4946511Z st.shared.v4.b32 [%r8+32768], {%r732, %r735, %r738, %r741}; 2026-02-21T08:13:39.4946656Z st.shared.v4.b32 [%r9], {%r648, %r651, %r654, %r657}; 2026-02-21T08:13:39.4946846Z st.shared.v4.b32 [%r9+32768], {%r744, %r747, %r750, %r753}; 2026-02-21T08:13:39.4947008Z st.shared.v4.b32 [%r10], {%r660, %r663, %r666, %r669}; 2026-02-21T08:13:39.4947188Z st.shared.v4.b32 [%r10+32768], {%r756, %r759, %r762, 
%r765}; 2026-02-21T08:13:39.4947362Z st.shared.v4.b32 [%r11], {%r672, %r675, %r678, %r681}; 2026-02-21T08:13:39.4947533Z st.shared.v4.b32 [%r11+32768], {%r768, %r771, %r774, %r777}; 2026-02-21T08:13:39.4947630Z // begin inline asm 2026-02-21T08:13:39.4947773Z fence.proxy.async.shared::cta; 2026-02-21T08:13:39.4947863Z // end inline asm 2026-02-21T08:13:39.4947953Z bar.sync 0; 2026-02-21T08:13:39.4948065Z elect.sync %r778|%p176, -1; 2026-02-21T08:13:39.4948189Z and.pred %p174, %p94, %p176; 2026-02-21T08:13:39.4948282Z shl.b32 %r779, %r15, 15; 2026-02-21T08:13:39.4948378Z add.s32 %r780, %r34, %r779; 2026-02-21T08:13:39.4948484Z add.s32 %r584, %r780, 229376; 2026-02-21T08:13:39.4948577Z // begin inline asm 2026-02-21T08:13:39.4948939Z @%p174 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd132, {%r582, %r583}], [%r584]; 2026-02-21T08:13:39.4949039Z // end inline asm 2026-02-21T08:13:39.4949152Z cp.async.bulk.commit_group; 2026-02-21T08:13:39.4949296Z $L__BB0_8: // %._crit_edge 2026-02-21T08:13:39.4949624Z .loc 1 33 74 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:33:74 2026-02-21T08:13:39.4949752Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:39.4949840Z bar.sync 0; 2026-02-21T08:13:39.4950156Z .loc 1 33 4 // cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py:33:4 2026-02-21T08:13:39.4950253Z bar.sync 0; 2026-02-21T08:13:39.4950347Z // begin inline asm 2026-02-21T08:13:39.4950557Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r781, 256; 2026-02-21T08:13:39.4950663Z // end inline asm 2026-02-21T08:13:39.4950745Z ret; 2026-02-21T08:13:39.4950833Z $L__tmp1: 2026-02-21T08:13:39.4950922Z $L__func_end0: 2026-02-21T08:13:39.4951070Z // -- End function 2026-02-21T08:13:39.4951154Z } 2026-02-21T08:13:39.4951547Z .file 1 "/tmp/torchinductor_root/dk/cdkz2bk73vh4v7v5ylkzwtxh73liaip4uketvimajxdygwhnh4oa.py" 2026-02-21T08:13:39.4951667Z .section .debug_abbrev 2026-02-21T08:13:39.4951922Z { 2026-02-21T08:13:39.4952077Z .b8 1 // Abbreviation Code 
2026-02-21T08:13:39.4952233Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:13:39.4952377Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:13:39.4952510Z .b8 37 // DW_AT_producer 2026-02-21T08:13:39.4952639Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.4952776Z .b8 19 // DW_AT_language 2026-02-21T08:13:39.4952906Z .b8 5 // DW_FORM_data2 2026-02-21T08:13:39.4953033Z .b8 3 // DW_AT_name 2026-02-21T08:13:39.4953174Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.4953308Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:13:39.4953542Z .b8 6 // DW_FORM_data4 2026-02-21T08:13:39.4953690Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:13:39.4953824Z .b8 8 // DW_FORM_string 2026-02-21T08:13:39.4953941Z .b8 0 // EOM(1) 2026-02-21T08:13:39.4954053Z .b8 0 // EOM(2) 2026-02-21T08:13:39.4954174Z .b8 0 // EOM(3) 2026-02-21T08:13:39.4954254Z } 2026-02-21T08:13:39.4954354Z .section .debug_info 2026-02-21T08:13:39.4954443Z { 2026-02-21T08:13:39.4954583Z .b32 104 // Length of Unit 2026-02-21T08:13:39.4954795Z .b8 2 // DWARF version number 2026-02-21T08:13:39.4954881Z .b8 0 2026-02-21T08:13:39.4955100Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:13:39.4955252Z .b8 8 // Address Size (in bytes) 2026-02-21T08:13:39.4955433Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:13:39.4955584Z .b8 116 // DW_AT_producer 2026-02-21T08:13:39.4955670Z .b8 114 2026-02-21T08:13:39.4955752Z .b8 105 2026-02-21T08:13:39.4955843Z .b8 116 2026-02-21T08:13:39.4955924Z .b8 111 2026-02-21T08:13:39.4956007Z .b8 110 2026-02-21T08:13:39.4956088Z .b8 0 2026-02-21T08:13:39.4956225Z .b8 2 // DW_AT_language 2026-02-21T08:13:39.4956309Z .b8 0 2026-02-21T08:13:39.4956437Z .b8 99 // DW_AT_name 2026-02-21T08:13:39.4956529Z .b8 100 2026-02-21T08:13:39.4956611Z .b8 107 2026-02-21T08:13:39.4956690Z .b8 122 2026-02-21T08:13:39.4956773Z .b8 50 2026-02-21T08:13:39.4956866Z .b8 98 2026-02-21T08:13:39.4956950Z .b8 107 2026-02-21T08:13:39.4957031Z .b8 55 2026-02-21T08:13:39.4957113Z .b8 51 2026-02-21T08:13:39.4957204Z .b8 118 2026-02-21T08:13:39.4957285Z .b8 104 2026-02-21T08:13:39.4957366Z .b8 52 2026-02-21T08:13:39.4957455Z .b8 118 2026-02-21T08:13:39.4957547Z .b8 55 2026-02-21T08:13:39.4957630Z .b8 118 2026-02-21T08:13:39.4957718Z .b8 53 2026-02-21T08:13:39.4957815Z .b8 121 2026-02-21T08:13:39.4957890Z .b8 108 2026-02-21T08:13:39.4957964Z .b8 107 2026-02-21T08:13:39.4958053Z .b8 122 2026-02-21T08:13:39.4958127Z .b8 119 2026-02-21T08:13:39.4958201Z .b8 116 2026-02-21T08:13:39.4958275Z .b8 120 2026-02-21T08:13:39.4958359Z .b8 104 2026-02-21T08:13:39.4958433Z .b8 55 2026-02-21T08:13:39.4958507Z .b8 51 2026-02-21T08:13:39.4958581Z .b8 108 2026-02-21T08:13:39.4958665Z .b8 105 2026-02-21T08:13:39.4958739Z .b8 97 2026-02-21T08:13:39.4958815Z .b8 105 2026-02-21T08:13:39.4958901Z .b8 112 2026-02-21T08:13:39.4958973Z .b8 52 2026-02-21T08:13:39.4959046Z .b8 117 2026-02-21T08:13:39.4959119Z .b8 107 2026-02-21T08:13:39.4959205Z .b8 101 2026-02-21T08:13:39.4959280Z .b8 116 2026-02-21T08:13:39.4959353Z .b8 118 2026-02-21T08:13:39.4959436Z .b8 105 2026-02-21T08:13:39.4959509Z .b8 109 2026-02-21T08:13:39.4959584Z .b8 97 
2026-02-21T08:13:39.4959656Z .b8 106 2026-02-21T08:13:39.4959746Z .b8 120 2026-02-21T08:13:39.4960002Z .b8 100 2026-02-21T08:13:39.4960080Z .b8 121 2026-02-21T08:13:39.4960166Z .b8 103 2026-02-21T08:13:39.4960241Z .b8 119 2026-02-21T08:13:39.4960314Z .b8 104 2026-02-21T08:13:39.4960390Z .b8 110 2026-02-21T08:13:39.4960473Z .b8 104 2026-02-21T08:13:39.4960547Z .b8 52 2026-02-21T08:13:39.4960623Z .b8 111 2026-02-21T08:13:39.4960697Z .b8 97 2026-02-21T08:13:39.4960777Z .b8 46 2026-02-21T08:13:39.4960851Z .b8 112 2026-02-21T08:13:39.4960925Z .b8 121 2026-02-21T08:13:39.4961009Z .b8 0 2026-02-21T08:13:39.4961168Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:13:39.4961289Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:13:39.4961363Z .b8 116 2026-02-21T08:13:39.4961445Z .b8 109 2026-02-21T08:13:39.4961520Z .b8 112 2026-02-21T08:13:39.4961594Z .b8 47 2026-02-21T08:13:39.4961676Z .b8 116 2026-02-21T08:13:39.4961751Z .b8 111 2026-02-21T08:13:39.4961825Z .b8 114 2026-02-21T08:13:39.4961899Z .b8 99 2026-02-21T08:13:39.4962147Z .b8 104 2026-02-21T08:13:39.4962235Z .b8 105 2026-02-21T08:13:39.4962315Z .b8 110 2026-02-21T08:13:39.4962398Z .b8 100 2026-02-21T08:13:39.4962472Z .b8 117 2026-02-21T08:13:39.4962545Z .b8 99 2026-02-21T08:13:39.4962618Z .b8 116 2026-02-21T08:13:39.4962697Z .b8 111 2026-02-21T08:13:39.4962774Z .b8 114 2026-02-21T08:13:39.4962847Z .b8 95 2026-02-21T08:13:39.4962921Z .b8 114 2026-02-21T08:13:39.4963006Z .b8 111 2026-02-21T08:13:39.4963079Z .b8 111 2026-02-21T08:13:39.4963154Z .b8 116 2026-02-21T08:13:39.4963236Z .b8 47 2026-02-21T08:13:39.4963312Z .b8 100 2026-02-21T08:13:39.4963386Z .b8 107 2026-02-21T08:13:39.4963464Z .b8 0 2026-02-21T08:13:39.4963550Z } 2026-02-21T08:13:39.4963656Z .section .debug_macinfo { } 2026-02-21T08:13:39.4963665Z 2026-02-21T08:13:39.4963793Z ================================================================ 2026-02-21T08:13:39.4963970Z please share the reproducer above with Triton project. 
2026-02-21T08:13:39.6342622Z 2026-02-21T08:13:39.6343477Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 38/38 19.3 configs/s 2026-02-21T08:13:40.4470892Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1243.5 2026-02-21T08:13:40.4471786Z configs/s 2026-02-21T08:13:40.5234520Z [99s] Generation 7 complete: 2026-02-21T08:13:40.5236332Z error=15 2026-02-21T08:13:40.5236486Z ok=24 2026-02-21T08:13:40.5236616Z min=0.0184 2026-02-21T08:13:40.5236751Z mid=0.0246 2026-02-21T08:13:40.5236869Z max=0.3267 2026-02-21T08:13:40.5237009Z best={'block_sizes': [256, 128, 64], 2026-02-21T08:13:40.5237265Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:13:40.5237511Z 'l2_groupings': [64], 2026-02-21T08:13:40.5237677Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:13:40.5237869Z 'loop_orders': [[1, 0]], 2026-02-21T08:13:40.5238017Z 'maxnreg': 256, 2026-02-21T08:13:40.5238161Z 'num_sm_multiplier': 1, 2026-02-21T08:13:40.5238308Z 'num_stages': 4, 2026-02-21T08:13:40.5238484Z 'num_warps': 1, 2026-02-21T08:13:40.5239144Z 'pid_type': 'persistent_blocked', 2026-02-21T08:13:40.5239328Z 'range_flattens': [None, False], 2026-02-21T08:13:40.5239513Z 'range_multi_buffers': [None, True], 2026-02-21T08:13:40.5239691Z 'range_num_stages': [0, 0], 2026-02-21T08:13:40.5239863Z 'range_unroll_factors': [0, 0], 2026-02-21T08:13:40.5240041Z 'range_warp_specializes': [True, None]} 2026-02-21T08:13:40.5272711Z [99s] Fitting surrogate: 609 points, 609 targets 2026-02-21T08:13:41.2136546Z [99s] Generation 8 starting: 33 neighbors, 2 active search path(s) 2026-02-21T08:13:44.2663469Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/34 5.0 configs/s 2026-02-21T08:13:45.6222134Z [104s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:13:45.6223808Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:13:45.6225257Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:13:45.6225499Z `ptxas` stderr: 2026-02-21T08:13:45.6225932Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 349 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:13:45.6226408Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:45.6226556Z 2026-02-21T08:13:45.6226959Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2d7iso7q.ptx -o /tmp/tmp2d7iso7q.ptx.o 2026-02-21T08:13:45.6227415Z 2026-02-21T08:13:45.6227548Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:13:45.6227740Z 2026-02-21T08:13:45.6227934Z 2026-02-21T08:13:45.6228022Z ================================================================ 2026-02-21T08:13:45.6228228Z Internal Triton PTX codegen error 2026-02-21T08:13:45.6228400Z `ptxas` stderr: 2026-02-21T08:13:45.6228792Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 349 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:13:45.6229268Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:13:45.6229407Z 2026-02-21T08:13:45.6229788Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2d7iso7q.ptx -o /tmp/tmp2d7iso7q.ptx.o 2026-02-21T08:13:45.6230224Z 2026-02-21T08:13:45.6230227Z 2026-02-21T08:13:45.6230281Z // 2026-02-21T08:13:45.6230422Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:13:45.6230590Z // 2026-02-21T08:13:45.6230663Z 2026-02-21T08:13:45.6230717Z .version 8.7 2026-02-21T08:13:45.6230852Z .target sm_100a 2026-02-21T08:13:45.6230979Z .address_size 64 2026-02-21T08:13:45.6231058Z 2026-02-21T08:13:45.6231182Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:13:45.6231425Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:13:45.6231633Z // @_helion_matmul 2026-02-21T08:13:45.6231822Z .visible .entry _helion_matmul( 2026-02-21T08:13:45.6232035Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:13:45.6232278Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:13:45.6232521Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:13:45.6232760Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:13:45.6232995Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:13:45.6233193Z ) 2026-02-21T08:13:45.6233311Z .reqntid 128 2026-02-21T08:13:45.6233608Z .maxnreg 32 2026-02-21T08:13:45.6233728Z { 2026-02-21T08:13:45.6233856Z .reg .pred %p<148>; 2026-02-21T08:13:45.6234001Z .reg .b16 %rs<11>; 2026-02-21T08:13:45.6234150Z .reg .b32 %r<518>; 2026-02-21T08:13:45.6234287Z .reg .b64 %rd<229>; 2026-02-21T08:13:45.6234559Z .loc 1 19 0 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:19:0 2026-02-21T08:13:45.6234920Z $L__func_begin0: 2026-02-21T08:13:45.6235158Z .loc 1 19 0 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:19:0 
2026-02-21T08:13:45.6235391Z 2026-02-21T08:13:45.6235443Z // %bb.0: 2026-02-21T08:13:45.6235590Z ld.param.b64 %rd9, [_helion_matmul_param_0]; 2026-02-21T08:13:45.6235776Z $L__tmp0: 2026-02-21T08:13:45.6236000Z .loc 1 19 0 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:19 2026-02-21T08:13:45.6236297Z mov.u32 %r1, %tid.x; 2026-02-21T08:13:45.6236550Z ld.param.b64 %rd27, [_helion_matmul_param_1]; 2026-02-21T08:13:45.6236752Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:13:45.6236936Z ld.param.b64 %rd45, [_helion_matmul_param_2]; 2026-02-21T08:13:45.6237124Z mov.b32 %r35, global_smem; 2026-02-21T08:13:45.6237282Z // begin inline asm 2026-02-21T08:13:45.6237518Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r35], 128; 2026-02-21T08:13:45.6237776Z // end inline asm 2026-02-21T08:13:45.6237941Z ld.param.b64 %rd62, [_helion_matmul_param_3]; 2026-02-21T08:13:45.6238142Z bar.sync 0; 2026-02-21T08:13:45.6238293Z ld.shared.b32 %r508, [global_smem]; 2026-02-21T08:13:45.6238502Z bar.sync 0; 2026-02-21T08:13:45.6238635Z // begin inline asm 2026-02-21T08:13:45.6238849Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:13:45.6239077Z // end inline asm 2026-02-21T08:13:45.6239341Z .loc 1 21 67 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:21:67 2026-02-21T08:13:45.6239651Z mov.u32 %r60, %ctaid.x; 2026-02-21T08:13:45.6239810Z mov.u32 %r61, %ctaid.y; 2026-02-21T08:13:45.6239970Z mov.u32 %r62, %ctaid.z; 2026-02-21T08:13:45.6240123Z mov.u32 %r63, %nctaid.x; 2026-02-21T08:13:45.6240283Z mov.u32 %r64, %nctaid.y; 2026-02-21T08:13:45.6240439Z mad.lo.s32 %r65, %r62, %r64, %r61; 2026-02-21T08:13:45.6240626Z mad.lo.s32 %r66, %r65, %r63, %r60; 2026-02-21T08:13:45.6240798Z mul.lo.s32 %r67, %r66, 384; 2026-02-21T08:13:45.6240969Z cvt.s64.s32 %rd63, %r67; 2026-02-21T08:13:45.6241132Z add.s64 %rd23, %rd62, %rd63; 2026-02-21T08:13:45.6241290Z shl.b32 %r68, %r1, 2; 2026-02-21T08:13:45.6241447Z add.s32 %r36, %r35, %r68; 
2026-02-21T08:13:45.6241595Z mov.b32 %r45, 0; 2026-02-21T08:13:45.6241738Z // begin inline asm 2026-02-21T08:13:45.6241892Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:45.6242071Z // end inline asm 2026-02-21T08:13:45.6242214Z bar.warp.sync -1; 2026-02-21T08:13:45.6242370Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T08:13:45.6242526Z cvt.u64.u32 %rd8, %r35; 2026-02-21T08:13:45.6242685Z // begin inline asm 2026-02-21T08:13:45.6242945Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T08:13:45.6243236Z // end inline asm 2026-02-21T08:13:45.6243378Z // begin inline asm 2026-02-21T08:13:45.6243602Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:45.6243862Z // end inline asm 2026-02-21T08:13:45.6243996Z mov.b32 %r38, 32; 2026-02-21T08:13:45.6244141Z // begin inline asm 2026-02-21T08:13:45.6244386Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r38; 2026-02-21T08:13:45.6244652Z // end inline asm 2026-02-21T08:13:45.6244827Z mov.b32 %r39, 64; 2026-02-21T08:13:45.6244963Z // begin inline asm 2026-02-21T08:13:45.6245208Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r39; 2026-02-21T08:13:45.6245476Z // end inline asm 2026-02-21T08:13:45.6245620Z mov.b32 %r40, 1024; 2026-02-21T08:13:45.6245767Z // begin inline asm 2026-02-21T08:13:45.6246022Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:45.6246387Z // end inline asm 2026-02-21T08:13:45.6246538Z mov.b32 %r41, 4096; 2026-02-21T08:13:45.6246686Z // begin inline asm 2026-02-21T08:13:45.6246917Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r41; 2026-02-21T08:13:45.6247187Z // end inline asm 2026-02-21T08:13:45.6247322Z mov.b64 %rd16, 2048; 2026-02-21T08:13:45.6247476Z // begin inline asm 2026-02-21T08:13:45.6247730Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd8 + 0 ], 0x0, %rd16; 2026-02-21T08:13:45.6248024Z // end inline asm 2026-02-21T08:13:45.6248173Z mov.b32 %r42, 1; 2026-02-21T08:13:45.6248311Z // begin inline asm 2026-02-21T08:13:45.6248576Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:45.6248852Z // end inline asm 2026-02-21T08:13:45.6248990Z // begin inline asm 2026-02-21T08:13:45.6249301Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:45.6249582Z // end inline asm 2026-02-21T08:13:45.6249718Z // begin inline asm 2026-02-21T08:13:45.6249939Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:45.6250203Z // end inline asm 2026-02-21T08:13:45.6250329Z // begin inline asm 2026-02-21T08:13:45.6250575Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6250852Z // end inline asm 2026-02-21T08:13:45.6250989Z // begin inline asm 2026-02-21T08:13:45.6251223Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x2; 2026-02-21T08:13:45.6251484Z // end inline asm 2026-02-21T08:13:45.6251623Z // begin inline asm 2026-02-21T08:13:45.6251841Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6252099Z // end inline asm 2026-02-21T08:13:45.6252230Z // begin inline asm 2026-02-21T08:13:45.6252581Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:45.6252941Z // end inline asm 2026-02-21T08:13:45.6253078Z // begin inline asm 2026-02-21T08:13:45.6253286Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T08:13:45.6253530Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:45.6253719Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:45.6253889Z // end inline asm 2026-02-21T08:13:45.6254023Z bar.sync 0; 
2026-02-21T08:13:45.6254155Z cvta.global.u64 %rd87, %rd23; 2026-02-21T08:13:45.6254436Z .loc 1 22 67 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:22:67 2026-02-21T08:13:45.6254758Z add.s32 %r69, %r67, 128; 2026-02-21T08:13:45.6254918Z cvt.s64.s32 %rd64, %r69; 2026-02-21T08:13:45.6255078Z add.s64 %rd41, %rd62, %rd64; 2026-02-21T08:13:45.6255226Z bar.sync 0; 2026-02-21T08:13:45.6255362Z // begin inline asm 2026-02-21T08:13:45.6255509Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:45.6255681Z // end inline asm 2026-02-21T08:13:45.6255813Z bar.warp.sync -1; 2026-02-21T08:13:45.6255954Z // begin inline asm 2026-02-21T08:13:45.6256189Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd27; 2026-02-21T08:13:45.6256467Z // end inline asm 2026-02-21T08:13:45.6256604Z // begin inline asm 2026-02-21T08:13:45.6256821Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:45.6257074Z // end inline asm 2026-02-21T08:13:45.6257203Z // begin inline asm 2026-02-21T08:13:45.6257435Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r38; 2026-02-21T08:13:45.6257692Z // end inline asm 2026-02-21T08:13:45.6257829Z mov.b32 %r47, 128; 2026-02-21T08:13:45.6257962Z // begin inline asm 2026-02-21T08:13:45.6258201Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r47; 2026-02-21T08:13:45.6258528Z // end inline asm 2026-02-21T08:13:45.6258656Z // begin inline asm 2026-02-21T08:13:45.6258888Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:45.6259147Z // end inline asm 2026-02-21T08:13:45.6259279Z // begin inline asm 2026-02-21T08:13:45.6259503Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r40; 2026-02-21T08:13:45.6259770Z // end inline asm 2026-02-21T08:13:45.6259908Z // begin inline asm 2026-02-21T08:13:45.6260148Z @%p4 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T08:13:45.6260433Z // end inline asm 2026-02-21T08:13:45.6260562Z // begin inline asm 2026-02-21T08:13:45.6260810Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:45.6261079Z // end inline asm 2026-02-21T08:13:45.6261214Z // begin inline asm 2026-02-21T08:13:45.6261536Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:45.6261815Z // end inline asm 2026-02-21T08:13:45.6261952Z // begin inline asm 2026-02-21T08:13:45.6262171Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:45.6262440Z // end inline asm 2026-02-21T08:13:45.6262569Z // begin inline asm 2026-02-21T08:13:45.6262815Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6263089Z // end inline asm 2026-02-21T08:13:45.6263224Z // begin inline asm 2026-02-21T08:13:45.6263461Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x2; 2026-02-21T08:13:45.6263714Z // end inline asm 2026-02-21T08:13:45.6263850Z // begin inline asm 2026-02-21T08:13:45.6264064Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6264321Z // end inline asm 2026-02-21T08:13:45.6264448Z // begin inline asm 2026-02-21T08:13:45.6264833Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd41 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:45.6265215Z // end inline asm 2026-02-21T08:13:45.6265345Z // begin inline asm 2026-02-21T08:13:45.6265551Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd41 + 0 ], 0x80; 2026-02-21T08:13:45.6265794Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:45.6265983Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:45.6266153Z // end inline asm 2026-02-21T08:13:45.6266293Z 
bar.sync 0; 2026-02-21T08:13:45.6266425Z cvta.global.u64 %rd88, %rd41; 2026-02-21T08:13:45.6266708Z .loc 1 24 71 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:24:71 2026-02-21T08:13:45.6267001Z add.s32 %r70, %r67, 256; 2026-02-21T08:13:45.6267153Z cvt.s64.s32 %rd65, %r70; 2026-02-21T08:13:45.6267314Z add.s64 %rd59, %rd62, %rd65; 2026-02-21T08:13:45.6267463Z bar.sync 0; 2026-02-21T08:13:45.6267599Z // begin inline asm 2026-02-21T08:13:45.6267740Z @%p1 st.shared.b32 [ %r36 + 0 ], %r45; 2026-02-21T08:13:45.6267914Z // end inline asm 2026-02-21T08:13:45.6268047Z bar.warp.sync -1; 2026-02-21T08:13:45.6268192Z // begin inline asm 2026-02-21T08:13:45.6268435Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd45; 2026-02-21T08:13:45.6268707Z // end inline asm 2026-02-21T08:13:45.6268848Z // begin inline asm 2026-02-21T08:13:45.6269058Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T08:13:45.6269302Z // end inline asm 2026-02-21T08:13:45.6269430Z // begin inline asm 2026-02-21T08:13:45.6269658Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r39; 2026-02-21T08:13:45.6269914Z // end inline asm 2026-02-21T08:13:45.6270049Z // begin inline asm 2026-02-21T08:13:45.6270272Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r39; 2026-02-21T08:13:45.6270521Z // end inline asm 2026-02-21T08:13:45.6270659Z // begin inline asm 2026-02-21T08:13:45.6270950Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r40; 2026-02-21T08:13:45.6271214Z // end inline asm 2026-02-21T08:13:45.6271341Z // begin inline asm 2026-02-21T08:13:45.6271570Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r41; 2026-02-21T08:13:45.6271830Z // end inline asm 2026-02-21T08:13:45.6271959Z // begin inline asm 2026-02-21T08:13:45.6272206Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 
0x0, %rd16; 2026-02-21T08:13:45.6272479Z // end inline asm 2026-02-21T08:13:45.6272614Z // begin inline asm 2026-02-21T08:13:45.6272851Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r42; 2026-02-21T08:13:45.6273138Z // end inline asm 2026-02-21T08:13:45.6273266Z // begin inline asm 2026-02-21T08:13:45.6273514Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r42; 2026-02-21T08:13:45.6273858Z // end inline asm 2026-02-21T08:13:45.6273990Z // begin inline asm 2026-02-21T08:13:45.6274215Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T08:13:45.6274460Z // end inline asm 2026-02-21T08:13:45.6274596Z // begin inline asm 2026-02-21T08:13:45.6274861Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6275139Z // end inline asm 2026-02-21T08:13:45.6275274Z // begin inline asm 2026-02-21T08:13:45.6275498Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T08:13:45.6275759Z // end inline asm 2026-02-21T08:13:45.6275885Z // begin inline asm 2026-02-21T08:13:45.6276105Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T08:13:45.6276351Z // end inline asm 2026-02-21T08:13:45.6276487Z // begin inline asm 2026-02-21T08:13:45.6276836Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd59 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T08:13:45.6277195Z // end inline asm 2026-02-21T08:13:45.6277334Z // begin inline asm 2026-02-21T08:13:45.6277532Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd59 + 0 ], 0x80; 2026-02-21T08:13:45.6277778Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:13:45.6277956Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:13:45.6278133Z // end inline asm 2026-02-21T08:13:45.6278259Z bar.sync 0; 2026-02-21T08:13:45.6278403Z cvta.global.u64 %rd100, %rd59; 
2026-02-21T08:13:45.6278686Z .loc 1 31 35 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:31:35 2026-02-21T08:13:45.6278972Z shl.b32 %r509, %r60, 2; 2026-02-21T08:13:45.6279239Z .loc 1 32 37 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:32:37 2026-02-21T08:13:45.6279530Z add.s32 %r71, %r509, 4; 2026-02-21T08:13:45.6279794Z .loc 1 32 49 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:32:49 2026-02-21T08:13:45.6280071Z min.s32 %r4, %r71, 512; 2026-02-21T08:13:45.6280331Z .loc 1 33 74 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:33:74 2026-02-21T08:13:45.6280621Z setp.ge.s32 %p57, %r509, %r4; 2026-02-21T08:13:45.6280781Z @%p57 bra $L__BB0_9; 2026-02-21T08:13:45.6280951Z // %bb.1: // %.lr.ph 2026-02-21T08:13:45.6281252Z .loc 1 0 74 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:0:74 2026-02-21T08:13:45.6281550Z shr.u32 %r5, %r1, 5; 2026-02-21T08:13:45.6281702Z add.s32 %r73, %r35, 57344; 2026-02-21T08:13:45.6281872Z bfe.u32 %r74, %r73, 4, 14; 2026-02-21T08:13:45.6282028Z cvt.u64.u32 %rd66, %r74; 2026-02-21T08:13:45.6282206Z or.b64 %rd82, %rd66, -9223371899399045120; 2026-02-21T08:13:45.6282396Z bfe.u32 %r75, %r35, 4, 14; 2026-02-21T08:13:45.6282551Z cvt.u64.u32 %rd67, %r75; 2026-02-21T08:13:45.6282726Z or.b64 %rd83, %rd67, -9223371899382267904; 2026-02-21T08:13:45.6282905Z add.s32 %r76, %r35, 57376; 2026-02-21T08:13:45.6283130Z bfe.u32 %r77, %r76, 4, 14; 2026-02-21T08:13:45.6283283Z cvt.u64.u32 %rd68, %r77; 2026-02-21T08:13:45.6283455Z or.b64 %rd84, %rd68, -9223371899399045120; 2026-02-21T08:13:45.6283632Z add.s32 %r78, %r35, 32; 2026-02-21T08:13:45.6283790Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T08:13:45.6283948Z cvt.u64.u32 %rd69, %r79; 2026-02-21T08:13:45.6284107Z or.b64 %rd85, %rd69, -9223371899382267904; 2026-02-21T08:13:45.6284291Z shl.b32 %r80, %r1, 7; 2026-02-21T08:13:45.6284442Z and.b32 %r81, %r80, 1920; 2026-02-21T08:13:45.6284603Z shl.b32 %r82, %r1, 6; 2026-02-21T08:13:45.6284773Z and.b32 %r83, %r82, 
6144; 2026-02-21T08:13:45.6284932Z shl.b32 %r84, %r1, 4; 2026-02-21T08:13:45.6285075Z and.b32 %r85, %r84, 112; 2026-02-21T08:13:45.6285232Z shl.b32 %r86, %r1, 9; 2026-02-21T08:13:45.6285373Z and.b32 %r87, %r86, 8192; 2026-02-21T08:13:45.6285531Z or.b32 %r88, %r81, %r83; 2026-02-21T08:13:45.6285740Z or.b32 %r89, %r88, %r87; 2026-02-21T08:13:45.6285895Z or.b32 %r90, %r89, %r85; 2026-02-21T08:13:45.6286051Z add.s32 %r91, %r35, 86016; 2026-02-21T08:13:45.6286207Z add.s32 %r6, %r91, %r90; 2026-02-21T08:13:45.6286363Z xor.b32 %r92, %r90, 16; 2026-02-21T08:13:45.6286510Z add.s32 %r7, %r91, %r92; 2026-02-21T08:13:45.6286663Z xor.b32 %r93, %r90, 32; 2026-02-21T08:13:45.6286807Z add.s32 %r8, %r91, %r93; 2026-02-21T08:13:45.6286962Z xor.b32 %r94, %r90, 48; 2026-02-21T08:13:45.6287104Z add.s32 %r9, %r91, %r94; 2026-02-21T08:13:45.6287261Z xor.b32 %r95, %r90, 64; 2026-02-21T08:13:45.6287414Z add.s32 %r10, %r91, %r95; 2026-02-21T08:13:45.6287564Z xor.b32 %r96, %r90, 80; 2026-02-21T08:13:45.6287719Z add.s32 %r11, %r91, %r96; 2026-02-21T08:13:45.6287873Z xor.b32 %r97, %r90, 96; 2026-02-21T08:13:45.6288033Z add.s32 %r12, %r91, %r97; 2026-02-21T08:13:45.6288187Z xor.b32 %r98, %r90, 112; 2026-02-21T08:13:45.6288346Z add.s32 %r13, %r91, %r98; 2026-02-21T08:13:45.6288496Z bra.uni $L__BB0_2; 2026-02-21T08:13:45.6288699Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:45.6288928Z setp.lt.u32 %p144, %r1, 64; 2026-02-21T08:13:45.6289087Z mov.b32 %r325, 1; 2026-02-21T08:13:45.6289337Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6289616Z // begin inline asm 2026-02-21T08:13:45.6289757Z 2026-02-21T08:13:45.6289866Z { 2026-02-21T08:13:45.6289992Z .reg .pred complete; 2026-02-21T08:13:45.6290133Z waitLoop: 2026-02-21T08:13:45.6290327Z mbarrier.try_wait.parity.shared.b64 complete, [%r324], %r325; 2026-02-21T08:13:45.6290557Z @!complete bra.uni waitLoop; 2026-02-21T08:13:45.6290711Z } 2026-02-21T08:13:45.6290773Z 
2026-02-21T08:13:45.6290835Z // end inline asm 2026-02-21T08:13:45.6291074Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6291362Z bar.sync 0; 2026-02-21T08:13:45.6291490Z // begin inline asm 2026-02-21T08:13:45.6291658Z @%p4 mbarrier.inval.shared::cta.b64 [%r169]; 2026-02-21T08:13:45.6291840Z // end inline asm 2026-02-21T08:13:45.6291976Z bar.sync 0; 2026-02-21T08:13:45.6292098Z // begin inline asm 2026-02-21T08:13:45.6292263Z @%p4 mbarrier.inval.shared::cta.b64 [%r170]; 2026-02-21T08:13:45.6292444Z // end inline asm 2026-02-21T08:13:45.6292569Z bar.sync 0; 2026-02-21T08:13:45.6292697Z // begin inline asm 2026-02-21T08:13:45.6292851Z @%p4 mbarrier.inval.shared::cta.b64 [%r171]; 2026-02-21T08:13:45.6293029Z // end inline asm 2026-02-21T08:13:45.6293151Z bar.sync 0; 2026-02-21T08:13:45.6293278Z // begin inline asm 2026-02-21T08:13:45.6293430Z @%p4 mbarrier.inval.shared::cta.b64 [%r172]; 2026-02-21T08:13:45.6293610Z // end inline asm 2026-02-21T08:13:45.6293732Z bar.sync 0; 2026-02-21T08:13:45.6293859Z // begin inline asm 2026-02-21T08:13:45.6294012Z @%p4 mbarrier.inval.shared::cta.b64 [%r173]; 2026-02-21T08:13:45.6294184Z // end inline asm 2026-02-21T08:13:45.6294328Z bar.sync 0; 2026-02-21T08:13:45.6294483Z // begin inline asm 2026-02-21T08:13:45.6294824Z @%p4 mbarrier.inval.shared::cta.b64 [%r174]; 2026-02-21T08:13:45.6295038Z // end inline asm 2026-02-21T08:13:45.6295207Z bar.sync 0; 2026-02-21T08:13:45.6295365Z // begin inline asm 2026-02-21T08:13:45.6295548Z @%p4 mbarrier.inval.shared::cta.b64 [%r260]; 2026-02-21T08:13:45.6295773Z // end inline asm 2026-02-21T08:13:45.6295951Z add.s32 %r333, %r35, 102464; 2026-02-21T08:13:45.6296140Z // begin inline asm 2026-02-21T08:13:45.6296332Z @%p4 mbarrier.inval.shared::cta.b64 [%r333]; 2026-02-21T08:13:45.6296545Z // end inline asm 2026-02-21T08:13:45.6296712Z bar.sync 0; 2026-02-21T08:13:45.6296880Z // begin inline asm 2026-02-21T08:13:45.6297075Z @%p4 
mbarrier.inval.shared::cta.b64 [%r168]; 2026-02-21T08:13:45.6297307Z // end inline asm 2026-02-21T08:13:45.6297610Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6298028Z // begin inline asm 2026-02-21T08:13:45.6298448Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350}, [%r402 + 0], 64; 2026-02-21T08:13:45.6298899Z // end inline asm 2026-02-21T08:13:45.6299040Z // begin inline asm 2026-02-21T08:13:45.6299404Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366, %r367}, [%r402 + 16], 64; 2026-02-21T08:13:45.6299818Z // end inline asm 2026-02-21T08:13:45.6299951Z // begin inline asm 2026-02-21T08:13:45.6300339Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381, %r382, %r383, %r384}, [%r402 + 32], 64; 2026-02-21T08:13:45.6300745Z // end inline asm 2026-02-21T08:13:45.6300878Z // begin inline asm 2026-02-21T08:13:45.6301256Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401}, [%r402 + 48], 64; 2026-02-21T08:13:45.6301657Z // end inline asm 2026-02-21T08:13:45.6301797Z // begin inline asm 2026-02-21T08:13:45.6301953Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:13:45.6302113Z // end inline asm 2026-02-21T08:13:45.6302256Z cvt.u64.u32 %rd101, %r335; 2026-02-21T08:13:45.6302410Z cvt.u64.u32 %rd102, %r336; 2026-02-21T08:13:45.6302571Z shl.b64 %rd103, %rd102, 32; 2026-02-21T08:13:45.6302728Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:13:45.6303004Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6303287Z mov.b64 {%r407, %r408}, %rd104; 2026-02-21T08:13:45.6303462Z cvt.rn.f16x2.f32 %r409, %r408, 
%r407; 2026-02-21T08:13:45.6303745Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6304020Z cvt.u64.u32 %rd105, %r337; 2026-02-21T08:13:45.6304178Z cvt.u64.u32 %rd106, %r338; 2026-02-21T08:13:45.6304329Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:13:45.6304490Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:13:45.6304801Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6305094Z mov.b64 {%r410, %r411}, %rd108; 2026-02-21T08:13:45.6305260Z cvt.rn.f16x2.f32 %r412, %r411, %r410; 2026-02-21T08:13:45.6305543Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6305831Z cvt.u64.u32 %rd109, %r339; 2026-02-21T08:13:45.6305978Z cvt.u64.u32 %rd110, %r340; 2026-02-21T08:13:45.6306129Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:13:45.6306281Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:13:45.6306550Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6306827Z mov.b64 {%r413, %r414}, %rd112; 2026-02-21T08:13:45.6306998Z cvt.rn.f16x2.f32 %r415, %r414, %r413; 2026-02-21T08:13:45.6307347Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6307622Z cvt.u64.u32 %rd113, %r341; 2026-02-21T08:13:45.6307779Z cvt.u64.u32 %rd114, %r342; 2026-02-21T08:13:45.6307926Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:13:45.6308084Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:13:45.6308343Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6308631Z mov.b64 {%r416, %r417}, %rd116; 2026-02-21T08:13:45.6308795Z cvt.rn.f16x2.f32 %r418, %r417, %r416; 2026-02-21T08:13:45.6309081Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6309371Z cvt.u64.u32 %rd117, %r343; 2026-02-21T08:13:45.6309520Z cvt.u64.u32 %rd118, %r344; 2026-02-21T08:13:45.6309683Z shl.b64 %rd119, %rd118, 32; 
2026-02-21T08:13:45.6309839Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:13:45.6310177Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6310459Z mov.b64 {%r419, %r420}, %rd120; 2026-02-21T08:13:45.6310627Z cvt.rn.f16x2.f32 %r421, %r420, %r419; 2026-02-21T08:13:45.6310905Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6311185Z cvt.u64.u32 %rd121, %r345; 2026-02-21T08:13:45.6311346Z cvt.u64.u32 %rd122, %r346; 2026-02-21T08:13:45.6311499Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:13:45.6311663Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:13:45.6311925Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6312213Z mov.b64 {%r422, %r423}, %rd124; 2026-02-21T08:13:45.6312380Z cvt.rn.f16x2.f32 %r424, %r423, %r422; 2026-02-21T08:13:45.6312659Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6312950Z cvt.u64.u32 %rd125, %r347; 2026-02-21T08:13:45.6313109Z cvt.u64.u32 %rd126, %r348; 2026-02-21T08:13:45.6313270Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:13:45.6313425Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:13:45.6313694Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6313972Z mov.b64 {%r425, %r426}, %rd128; 2026-02-21T08:13:45.6314146Z cvt.rn.f16x2.f32 %r427, %r426, %r425; 2026-02-21T08:13:45.6314420Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6314753Z cvt.u64.u32 %rd129, %r349; 2026-02-21T08:13:45.6314911Z cvt.u64.u32 %rd130, %r350; 2026-02-21T08:13:45.6315059Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:13:45.6315216Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:13:45.6315474Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6315775Z mov.b64 {%r428, %r429}, %rd132; 2026-02-21T08:13:45.6315942Z cvt.rn.f16x2.f32 %r430, 
%r429, %r428; 2026-02-21T08:13:45.6316245Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6316560Z cvt.u64.u32 %rd133, %r352; 2026-02-21T08:13:45.6316713Z cvt.u64.u32 %rd134, %r353; 2026-02-21T08:13:45.6316870Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:13:45.6317024Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:13:45.6317322Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6317627Z mov.b64 {%r431, %r432}, %rd136; 2026-02-21T08:13:45.6317799Z cvt.rn.f16x2.f32 %r433, %r432, %r431; 2026-02-21T08:13:45.6318099Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6318400Z cvt.u64.u32 %rd137, %r354; 2026-02-21T08:13:45.6318561Z cvt.u64.u32 %rd138, %r355; 2026-02-21T08:13:45.6318715Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:13:45.6318934Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:13:45.6319195Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6319490Z mov.b64 {%r434, %r435}, %rd140; 2026-02-21T08:13:45.6319651Z cvt.rn.f16x2.f32 %r436, %r435, %r434; 2026-02-21T08:13:45.6319924Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6320208Z cvt.u64.u32 %rd141, %r356; 2026-02-21T08:13:45.6320370Z cvt.u64.u32 %rd142, %r357; 2026-02-21T08:13:45.6320535Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:13:45.6320689Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:13:45.6320960Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6321237Z mov.b64 {%r437, %r438}, %rd144; 2026-02-21T08:13:45.6321404Z cvt.rn.f16x2.f32 %r439, %r438, %r437; 2026-02-21T08:13:45.6321730Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6322008Z cvt.u64.u32 %rd145, %r358; 2026-02-21T08:13:45.6322166Z cvt.u64.u32 %rd146, %r359; 2026-02-21T08:13:45.6322311Z shl.b64 %rd147, %rd146, 
32; 2026-02-21T08:13:45.6322469Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:13:45.6322729Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6323018Z mov.b64 {%r440, %r441}, %rd148; 2026-02-21T08:13:45.6323178Z cvt.rn.f16x2.f32 %r442, %r441, %r440; 2026-02-21T08:13:45.6323459Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6323742Z cvt.u64.u32 %rd149, %r360; 2026-02-21T08:13:45.6323891Z cvt.u64.u32 %rd150, %r361; 2026-02-21T08:13:45.6324045Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:13:45.6324199Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:13:45.6324472Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6324784Z mov.b64 {%r443, %r444}, %rd152; 2026-02-21T08:13:45.6324949Z cvt.rn.f16x2.f32 %r445, %r444, %r443; 2026-02-21T08:13:45.6325229Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6325505Z cvt.u64.u32 %rd153, %r362; 2026-02-21T08:13:45.6325661Z cvt.u64.u32 %rd154, %r363; 2026-02-21T08:13:45.6325808Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:13:45.6325966Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:13:45.6326226Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6326508Z mov.b64 {%r446, %r447}, %rd156; 2026-02-21T08:13:45.6326667Z cvt.rn.f16x2.f32 %r448, %r447, %r446; 2026-02-21T08:13:45.6326949Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6327236Z cvt.u64.u32 %rd157, %r364; 2026-02-21T08:13:45.6327384Z cvt.u64.u32 %rd158, %r365; 2026-02-21T08:13:45.6327538Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:13:45.6327685Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:13:45.6327951Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6328230Z mov.b64 {%r449, %r450}, %rd160; 2026-02-21T08:13:45.6328397Z cvt.rn.f16x2.f32 %r451, 
%r450, %r449; 2026-02-21T08:13:45.6328676Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6328946Z cvt.u64.u32 %rd161, %r366; 2026-02-21T08:13:45.6329100Z cvt.u64.u32 %rd162, %r367; 2026-02-21T08:13:45.6329246Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:13:45.6329403Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:13:45.6329661Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6329946Z mov.b64 {%r452, %r453}, %rd164; 2026-02-21T08:13:45.6330105Z cvt.rn.f16x2.f32 %r454, %r453, %r452; 2026-02-21T08:13:45.6330438Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6330729Z cvt.u64.u32 %rd165, %r369; 2026-02-21T08:13:45.6330879Z cvt.u64.u32 %rd166, %r370; 2026-02-21T08:13:45.6331035Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:13:45.6331191Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:13:45.6331460Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6331737Z mov.b64 {%r455, %r456}, %rd168; 2026-02-21T08:13:45.6331905Z cvt.rn.f16x2.f32 %r457, %r456, %r455; 2026-02-21T08:13:45.6332174Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6332445Z cvt.u64.u32 %rd169, %r371; 2026-02-21T08:13:45.6332600Z cvt.u64.u32 %rd170, %r372; 2026-02-21T08:13:45.6332746Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:13:45.6332953Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:13:45.6333212Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6333497Z mov.b64 {%r458, %r459}, %rd172; 2026-02-21T08:13:45.6333654Z cvt.rn.f16x2.f32 %r460, %r459, %r458; 2026-02-21T08:13:45.6333932Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6334222Z cvt.u64.u32 %rd173, %r373; 2026-02-21T08:13:45.6334368Z cvt.u64.u32 %rd174, %r374; 2026-02-21T08:13:45.6334521Z shl.b64 %rd175, %rd174, 
32; 2026-02-21T08:13:45.6334699Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:13:45.6334961Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6335243Z mov.b64 {%r461, %r462}, %rd176; 2026-02-21T08:13:45.6335410Z cvt.rn.f16x2.f32 %r463, %r462, %r461; 2026-02-21T08:13:45.6335694Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6335966Z cvt.u64.u32 %rd177, %r375; 2026-02-21T08:13:45.6336120Z cvt.u64.u32 %rd178, %r376; 2026-02-21T08:13:45.6336269Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:13:45.6336425Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:13:45.6336678Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6336976Z mov.b64 {%r464, %r465}, %rd180; 2026-02-21T08:13:45.6337143Z cvt.rn.f16x2.f32 %r466, %r465, %r464; 2026-02-21T08:13:45.6337436Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6337729Z cvt.u64.u32 %rd181, %r377; 2026-02-21T08:13:45.6337880Z cvt.u64.u32 %rd182, %r378; 2026-02-21T08:13:45.6338039Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:13:45.6338192Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:13:45.6338470Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6338755Z mov.b64 {%r467, %r468}, %rd184; 2026-02-21T08:13:45.6338930Z cvt.rn.f16x2.f32 %r469, %r468, %r467; 2026-02-21T08:13:45.6339213Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6339498Z cvt.u64.u32 %rd185, %r379; 2026-02-21T08:13:45.6339659Z cvt.u64.u32 %rd186, %r380; 2026-02-21T08:13:45.6339816Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:13:45.6339979Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:13:45.6340247Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6340548Z mov.b64 {%r470, %r471}, %rd188; 2026-02-21T08:13:45.6340714Z cvt.rn.f16x2.f32 %r472, 
%r471, %r470; 2026-02-21T08:13:45.6341000Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6341297Z cvt.u64.u32 %rd189, %r381; 2026-02-21T08:13:45.6341451Z cvt.u64.u32 %rd190, %r382; 2026-02-21T08:13:45.6341618Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:13:45.6341833Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:13:45.6342114Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6342409Z mov.b64 {%r473, %r474}, %rd192; 2026-02-21T08:13:45.6342584Z cvt.rn.f16x2.f32 %r475, %r474, %r473; 2026-02-21T08:13:45.6342875Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6343162Z cvt.u64.u32 %rd193, %r383; 2026-02-21T08:13:45.6343325Z cvt.u64.u32 %rd194, %r384; 2026-02-21T08:13:45.6343478Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:13:45.6343642Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:13:45.6343911Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6344208Z mov.b64 {%r476, %r477}, %rd196; 2026-02-21T08:13:45.6344370Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T08:13:45.6344736Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6345025Z cvt.u64.u32 %rd197, %r386; 2026-02-21T08:13:45.6345173Z cvt.u64.u32 %rd198, %r387; 2026-02-21T08:13:45.6345328Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:13:45.6345478Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:13:45.6345741Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6346014Z mov.b64 {%r479, %r480}, %rd200; 2026-02-21T08:13:45.6346181Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T08:13:45.6346454Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6346730Z cvt.u64.u32 %rd201, %r388; 2026-02-21T08:13:45.6346890Z cvt.u64.u32 %rd202, %r389; 2026-02-21T08:13:45.6347039Z shl.b64 %rd203, %rd202, 
32; 2026-02-21T08:13:45.6347197Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:13:45.6347458Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6347747Z mov.b64 {%r482, %r483}, %rd204; 2026-02-21T08:13:45.6347907Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T08:13:45.6348179Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6348461Z cvt.u64.u32 %rd205, %r390; 2026-02-21T08:13:45.6348607Z cvt.u64.u32 %rd206, %r391; 2026-02-21T08:13:45.6348762Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:13:45.6348908Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:13:45.6349171Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6349447Z mov.b64 {%r485, %r486}, %rd208; 2026-02-21T08:13:45.6349613Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T08:13:45.6349883Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6350168Z cvt.u64.u32 %rd209, %r392; 2026-02-21T08:13:45.6350322Z cvt.u64.u32 %rd210, %r393; 2026-02-21T08:13:45.6350468Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:13:45.6350624Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:13:45.6350885Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6351166Z mov.b64 {%r488, %r489}, %rd212; 2026-02-21T08:13:45.6351323Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T08:13:45.6351592Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6351873Z cvt.u64.u32 %rd213, %r394; 2026-02-21T08:13:45.6352021Z cvt.u64.u32 %rd214, %r395; 2026-02-21T08:13:45.6352176Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:13:45.6352326Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:13:45.6352591Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6352876Z mov.b64 {%r491, %r492}, %rd216; 2026-02-21T08:13:45.6353125Z cvt.rn.f16x2.f32 %r493, 
%r492, %r491; 2026-02-21T08:13:45.6353402Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6353675Z cvt.u64.u32 %rd217, %r396; 2026-02-21T08:13:45.6353832Z cvt.u64.u32 %rd218, %r397; 2026-02-21T08:13:45.6353978Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:13:45.6354135Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:13:45.6354392Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6354711Z mov.b64 {%r494, %r495}, %rd220; 2026-02-21T08:13:45.6354877Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T08:13:45.6355147Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6355431Z cvt.u64.u32 %rd221, %r398; 2026-02-21T08:13:45.6355580Z cvt.u64.u32 %rd222, %r399; 2026-02-21T08:13:45.6355786Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:13:45.6355937Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:13:45.6356209Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6356499Z mov.b64 {%r497, %r498}, %rd224; 2026-02-21T08:13:45.6356662Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T08:13:45.6356941Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6357219Z cvt.u64.u32 %rd225, %r400; 2026-02-21T08:13:45.6357375Z cvt.u64.u32 %rd226, %r401; 2026-02-21T08:13:45.6357523Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:13:45.6357679Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:13:45.6357941Z .loc 1 58 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:58:27 2026-02-21T08:13:45.6358227Z mov.b64 {%r500, %r501}, %rd228; 2026-02-21T08:13:45.6358385Z cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T08:13:45.6358664Z .loc 1 59 45 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:59:45 2026-02-21T08:13:45.6358963Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:45.6359129Z bar.sync 0; 2026-02-21T08:13:45.6359296Z st.shared.v4.b32 [%r6], {%r409, 
%r412, %r415, %r418}; 2026-02-21T08:13:45.6359522Z st.shared.v4.b32 [%r7], {%r421, %r424, %r427, %r430}; 2026-02-21T08:13:45.6359745Z st.shared.v4.b32 [%r8], {%r433, %r436, %r439, %r442}; 2026-02-21T08:13:45.6359954Z st.shared.v4.b32 [%r9], {%r445, %r448, %r451, %r454}; 2026-02-21T08:13:45.6360178Z st.shared.v4.b32 [%r10], {%r457, %r460, %r463, %r466}; 2026-02-21T08:13:45.6360401Z st.shared.v4.b32 [%r11], {%r469, %r472, %r475, %r478}; 2026-02-21T08:13:45.6360613Z st.shared.v4.b32 [%r12], {%r481, %r484, %r487, %r490}; 2026-02-21T08:13:45.6360832Z st.shared.v4.b32 [%r13], {%r493, %r496, %r499, %r502}; 2026-02-21T08:13:45.6361015Z // begin inline asm 2026-02-21T08:13:45.6361180Z fence.proxy.async.shared::cta; 2026-02-21T08:13:45.6361339Z // end inline asm 2026-02-21T08:13:45.6361476Z bar.sync 0; 2026-02-21T08:13:45.6361615Z elect.sync %r503|%p145, -1; 2026-02-21T08:13:45.6361787Z and.pred %p143, %p144, %p145; 2026-02-21T08:13:45.6361949Z and.b32 %r504, %r17, 1; 2026-02-21T08:13:45.6362097Z shl.b32 %r505, %r504, 13; 2026-02-21T08:13:45.6362254Z add.s32 %r506, %r35, %r505; 2026-02-21T08:13:45.6362407Z add.s32 %r405, %r506, 86016; 2026-02-21T08:13:45.6362571Z shl.b32 %r507, %r504, 6; 2026-02-21T08:13:45.6362717Z or.b32 %r403, %r507, %r267; 2026-02-21T08:13:45.6362873Z // begin inline asm 2026-02-21T08:13:45.6363140Z @%p143 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd100, {%r403, %r404}], [%r405]; 2026-02-21T08:13:45.6363452Z // end inline asm 2026-02-21T08:13:45.6363604Z cp.async.bulk.commit_group; 2026-02-21T08:13:45.6363876Z .loc 1 33 74 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:33:74 2026-02-21T08:13:45.6364178Z add.s32 %r509, %r509, 1; 2026-02-21T08:13:45.6364331Z setp.ne.b32 %p146, %r509, %r4; 2026-02-21T08:13:45.6364555Z @%p146 bra $L__BB0_2; 2026-02-21T08:13:45.6364732Z bra.uni $L__BB0_9; 2026-02-21T08:13:45.6364922Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:13:45.6365150Z // Child Loop BB0_5 Depth 2 
2026-02-21T08:13:45.6365455Z .loc 1 39 35 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:39:35 2026-02-21T08:13:45.6365740Z shr.s32 %r232, %r509, 31; 2026-02-21T08:13:45.6365890Z shr.u32 %r233, %r232, 26; 2026-02-21T08:13:45.6366047Z add.s32 %r234, %r509, %r233; 2026-02-21T08:13:45.6366300Z .loc 1 42 45 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:42:45 2026-02-21T08:13:45.6366593Z and.b32 %r235, %r234, 65472; 2026-02-21T08:13:45.6366745Z sub.s32 %r236, %r509, %r235; 2026-02-21T08:13:45.6367006Z .loc 1 42 64 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:42:64 2026-02-21T08:13:45.6367408Z cvt.u16.u32 %rs1, %r236; 2026-02-21T08:13:45.6367565Z cvt.s8.s32 %rs2, %r236; 2026-02-21T08:13:45.6367819Z .loc 1 43 51 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:43:51 2026-02-21T08:13:45.6368093Z shr.u16 %rs3, %rs2, 12; 2026-02-21T08:13:45.6368245Z and.b16 %rs4, %rs3, 7; 2026-02-21T08:13:45.6368390Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T08:13:45.6368549Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T08:13:45.6368693Z shr.s16 %rs7, %rs6, 3; 2026-02-21T08:13:45.6368947Z .loc 1 42 64 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:42:64 2026-02-21T08:13:45.6369231Z and.b16 %rs8, %rs5, 248; 2026-02-21T08:13:45.6369379Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T08:13:45.6369633Z .loc 1 44 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:44:27 2026-02-21T08:13:45.6369905Z shl.b32 %r237, %r234, 3; 2026-02-21T08:13:45.6370061Z and.b32 %r238, %r237, -512; 2026-02-21T08:13:45.6370216Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T08:13:45.6370377Z mad.wide.s16 %r404, %rs10, 64, %r238; 2026-02-21T08:13:45.6370649Z .loc 1 45 27 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:45:27 2026-02-21T08:13:45.6370933Z mul.wide.s16 %r267, %rs7, 128; 2026-02-21T08:13:45.6371199Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6371485Z shfl.sync.idx.b32 %r17, %r5, 0, 31, -1; 
2026-02-21T08:13:45.6371667Z shl.b32 %r239, %r17, 21; 2026-02-21T08:13:45.6371813Z and.b32 %r240, %r239, 6291456; 2026-02-21T08:13:45.6371976Z add.s32 %r402, %r240, %r508; 2026-02-21T08:13:45.6372127Z mov.pred %p58, -1; 2026-02-21T08:13:45.6372275Z mov.b32 %r510, 0; 2026-02-21T08:13:45.6372407Z // begin inline asm 2026-02-21T08:13:45.6372792Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r402 + 0], 64, {%r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510}; 2026-02-21T08:13:45.6373200Z // end inline asm 2026-02-21T08:13:45.6373338Z // begin inline asm 2026-02-21T08:13:45.6373715Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r402 + 16], 64, {%r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510}; 2026-02-21T08:13:45.6374124Z // end inline asm 2026-02-21T08:13:45.6374269Z // begin inline asm 2026-02-21T08:13:45.6374633Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r402 + 32], 64, {%r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510}; 2026-02-21T08:13:45.6375083Z // end inline asm 2026-02-21T08:13:45.6375222Z // begin inline asm 2026-02-21T08:13:45.6375573Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r402 + 48], 64, {%r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510, %r510}; 2026-02-21T08:13:45.6375985Z // end inline asm 2026-02-21T08:13:45.6376114Z // begin inline asm 2026-02-21T08:13:45.6376271Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:13:45.6376496Z // end inline asm 2026-02-21T08:13:45.6376624Z bar.sync 0; 2026-02-21T08:13:45.6376871Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6377153Z add.s32 %r511, %r35, 102464; 2026-02-21T08:13:45.6377309Z // begin inline asm 2026-02-21T08:13:45.6377470Z @%p4 mbarrier.init.shared::cta.b64 [%r511], 1; 2026-02-21T08:13:45.6377660Z 
// end inline asm 2026-02-21T08:13:45.6377786Z bar.sync 0; 2026-02-21T08:13:45.6377920Z add.s32 %r168, %r35, 102472; 2026-02-21T08:13:45.6378075Z // begin inline asm 2026-02-21T08:13:45.6378234Z @%p4 mbarrier.init.shared::cta.b64 [%r168], 1; 2026-02-21T08:13:45.6378423Z // end inline asm 2026-02-21T08:13:45.6378553Z add.s32 %r169, %r35, 102400; 2026-02-21T08:13:45.6378706Z // begin inline asm 2026-02-21T08:13:45.6378860Z @%p4 mbarrier.init.shared::cta.b64 [%r169], 1; 2026-02-21T08:13:45.6379108Z // end inline asm 2026-02-21T08:13:45.6379238Z bar.sync 0; 2026-02-21T08:13:45.6379373Z add.s32 %r170, %r35, 102408; 2026-02-21T08:13:45.6379517Z // begin inline asm 2026-02-21T08:13:45.6379677Z @%p4 mbarrier.init.shared::cta.b64 [%r170], 1; 2026-02-21T08:13:45.6379876Z // end inline asm 2026-02-21T08:13:45.6380006Z bar.sync 0; 2026-02-21T08:13:45.6380142Z add.s32 %r171, %r35, 102416; 2026-02-21T08:13:45.6380295Z // begin inline asm 2026-02-21T08:13:45.6380461Z @%p4 mbarrier.init.shared::cta.b64 [%r171], 1; 2026-02-21T08:13:45.6380643Z // end inline asm 2026-02-21T08:13:45.6380783Z bar.sync 0; 2026-02-21T08:13:45.6380913Z add.s32 %r172, %r35, 102424; 2026-02-21T08:13:45.6381071Z // begin inline asm 2026-02-21T08:13:45.6381234Z @%p4 mbarrier.init.shared::cta.b64 [%r172], 1; 2026-02-21T08:13:45.6381415Z // end inline asm 2026-02-21T08:13:45.6381552Z bar.sync 0; 2026-02-21T08:13:45.6381682Z add.s32 %r173, %r35, 102432; 2026-02-21T08:13:45.6381843Z // begin inline asm 2026-02-21T08:13:45.6382003Z @%p4 mbarrier.init.shared::cta.b64 [%r173], 1; 2026-02-21T08:13:45.6382194Z // end inline asm 2026-02-21T08:13:45.6382326Z bar.sync 0; 2026-02-21T08:13:45.6382469Z add.s32 %r174, %r35, 102440; 2026-02-21T08:13:45.6382622Z // begin inline asm 2026-02-21T08:13:45.6382789Z @%p4 mbarrier.init.shared::cta.b64 [%r174], 1; 2026-02-21T08:13:45.6382981Z // end inline asm 2026-02-21T08:13:45.6383113Z bar.sync 0; 2026-02-21T08:13:45.6383255Z add.s32 %r260, %r35, 102448; 
2026-02-21T08:13:45.6383411Z // begin inline asm 2026-02-21T08:13:45.6383584Z @%p4 mbarrier.init.shared::cta.b64 [%r260], 1; 2026-02-21T08:13:45.6383766Z // end inline asm 2026-02-21T08:13:45.6383904Z bar.sync 0; 2026-02-21T08:13:45.6384029Z // begin inline asm 2026-02-21T08:13:45.6384227Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r169], 12288; 2026-02-21T08:13:45.6384443Z // end inline asm 2026-02-21T08:13:45.6384749Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6385055Z bar.sync 0; 2026-02-21T08:13:45.6385199Z elect.sync %r241|%p90, -1; 2026-02-21T08:13:45.6385377Z and.pred %p72, %p1, %p90; 2026-02-21T08:13:45.6385535Z // begin inline asm 2026-02-21T08:13:45.6385877Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r73], [%rd87, {%r510, %r404}], [%r169]; 2026-02-21T08:13:45.6386244Z // end inline asm 2026-02-21T08:13:45.6386505Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6386802Z // begin inline asm 2026-02-21T08:13:45.6386958Z fence.proxy.async.shared::cta; 2026-02-21T08:13:45.6387139Z // end inline asm 2026-02-21T08:13:45.6387272Z bar.sync 0; 2026-02-21T08:13:45.6387420Z elect.sync %r242|%p91, -1; 2026-02-21T08:13:45.6387586Z and.pred %p73, %p1, %p91; 2026-02-21T08:13:45.6387748Z // begin inline asm 2026-02-21T08:13:45.6388081Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r35], [%rd88, {%r510, %r267}], [%r169]; 2026-02-21T08:13:45.6388485Z // end inline asm 2026-02-21T08:13:45.6388733Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6389010Z bar.sync 0; 2026-02-21T08:13:45.6389140Z // begin inline asm 2026-02-21T08:13:45.6389322Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r170], 12288; 2026-02-21T08:13:45.6389543Z // end inline asm 2026-02-21T08:13:45.6389786Z .loc 1 54 31 // 
cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6390069Z bar.sync 0; 2026-02-21T08:13:45.6390208Z elect.sync %r243|%p92, -1; 2026-02-21T08:13:45.6390365Z and.pred %p75, %p1, %p92; 2026-02-21T08:13:45.6390525Z add.s32 %r186, %r35, 61440; 2026-02-21T08:13:45.6390672Z mov.b32 %r187, 32; 2026-02-21T08:13:45.6390814Z // begin inline asm 2026-02-21T08:13:45.6391188Z @%p75 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r186], [%rd87, {%r187, %r404}], [%r170]; 2026-02-21T08:13:45.6391541Z // end inline asm 2026-02-21T08:13:45.6391777Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6392058Z bar.sync 0; 2026-02-21T08:13:45.6392195Z elect.sync %r244|%p93, -1; 2026-02-21T08:13:45.6392354Z and.pred %p76, %p1, %p93; 2026-02-21T08:13:45.6392515Z add.s32 %r190, %r35, 8192; 2026-02-21T08:13:45.6392661Z // begin inline asm 2026-02-21T08:13:45.6392982Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r190], [%rd88, {%r187, %r267}], [%r170]; 2026-02-21T08:13:45.6393331Z // end inline asm 2026-02-21T08:13:45.6393583Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6393866Z bar.sync 0; 2026-02-21T08:13:45.6393993Z // begin inline asm 2026-02-21T08:13:45.6394188Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r171], 12288; 2026-02-21T08:13:45.6394404Z // end inline asm 2026-02-21T08:13:45.6394656Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6394972Z bar.sync 0; 2026-02-21T08:13:45.6395110Z elect.sync %r245|%p94, -1; 2026-02-21T08:13:45.6395268Z and.pred %p78, %p1, %p94; 2026-02-21T08:13:45.6395429Z add.s32 %r195, %r35, 65536; 2026-02-21T08:13:45.6395579Z mov.b32 %r196, 64; 2026-02-21T08:13:45.6395712Z // begin inline asm 2026-02-21T08:13:45.6396037Z @%p78 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes 
[%r195], [%rd87, {%r196, %r404}], [%r171]; 2026-02-21T08:13:45.6396375Z // end inline asm 2026-02-21T08:13:45.6396623Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6396907Z bar.sync 0; 2026-02-21T08:13:45.6397044Z elect.sync %r246|%p95, -1; 2026-02-21T08:13:45.6397201Z and.pred %p79, %p1, %p95; 2026-02-21T08:13:45.6397363Z add.s32 %r199, %r35, 16384; 2026-02-21T08:13:45.6397520Z // begin inline asm 2026-02-21T08:13:45.6397836Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r199], [%rd88, {%r196, %r267}], [%r171]; 2026-02-21T08:13:45.6398183Z // end inline asm 2026-02-21T08:13:45.6398426Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6398710Z bar.sync 0; 2026-02-21T08:13:45.6398831Z // begin inline asm 2026-02-21T08:13:45.6399020Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r172], 12288; 2026-02-21T08:13:45.6399234Z // end inline asm 2026-02-21T08:13:45.6399473Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6399761Z bar.sync 0; 2026-02-21T08:13:45.6399888Z elect.sync %r247|%p96, -1; 2026-02-21T08:13:45.6400049Z and.pred %p81, %p1, %p96; 2026-02-21T08:13:45.6400201Z add.s32 %r204, %r35, 69632; 2026-02-21T08:13:45.6400351Z mov.b32 %r205, 96; 2026-02-21T08:13:45.6400482Z // begin inline asm 2026-02-21T08:13:45.6400870Z @%p81 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r204], [%rd87, {%r205, %r404}], [%r172]; 2026-02-21T08:13:45.6401211Z // end inline asm 2026-02-21T08:13:45.6401444Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6401718Z bar.sync 0; 2026-02-21T08:13:45.6401846Z elect.sync %r248|%p97, -1; 2026-02-21T08:13:45.6402007Z and.pred %p82, %p1, %p97; 2026-02-21T08:13:45.6402159Z add.s32 %r208, %r35, 24576; 2026-02-21T08:13:45.6402311Z // begin inline asm 2026-02-21T08:13:45.6402622Z 
@%p82 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r208], [%rd88, {%r205, %r267}], [%r172]; 2026-02-21T08:13:45.6402956Z // end inline asm 2026-02-21T08:13:45.6403200Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6403467Z bar.sync 0; 2026-02-21T08:13:45.6403654Z // begin inline asm 2026-02-21T08:13:45.6403838Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r173], 12288; 2026-02-21T08:13:45.6404053Z // end inline asm 2026-02-21T08:13:45.6404287Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6404565Z bar.sync 0; 2026-02-21T08:13:45.6404735Z elect.sync %r249|%p98, -1; 2026-02-21T08:13:45.6404895Z and.pred %p84, %p1, %p98; 2026-02-21T08:13:45.6405058Z add.s32 %r213, %r35, 73728; 2026-02-21T08:13:45.6405207Z mov.b32 %r214, 128; 2026-02-21T08:13:45.6405373Z // begin inline asm 2026-02-21T08:13:45.6405681Z @%p84 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r213], [%rd87, {%r214, %r404}], [%r173]; 2026-02-21T08:13:45.6406026Z // end inline asm 2026-02-21T08:13:45.6406273Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6406544Z bar.sync 0; 2026-02-21T08:13:45.6406691Z elect.sync %r250|%p99, -1; 2026-02-21T08:13:45.6406854Z and.pred %p85, %p1, %p99; 2026-02-21T08:13:45.6407015Z add.s32 %r217, %r35, 32768; 2026-02-21T08:13:45.6407163Z // begin inline asm 2026-02-21T08:13:45.6407475Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r217], [%rd88, {%r214, %r267}], [%r173]; 2026-02-21T08:13:45.6407810Z // end inline asm 2026-02-21T08:13:45.6408058Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6408335Z bar.sync 0; 2026-02-21T08:13:45.6408460Z // begin inline asm 2026-02-21T08:13:45.6408650Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r174], 12288; 
2026-02-21T08:13:45.6408862Z // end inline asm 2026-02-21T08:13:45.6409109Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6409382Z bar.sync 0; 2026-02-21T08:13:45.6409519Z elect.sync %r251|%p100, -1; 2026-02-21T08:13:45.6409678Z and.pred %p87, %p1, %p100; 2026-02-21T08:13:45.6409839Z add.s32 %r222, %r35, 77824; 2026-02-21T08:13:45.6409991Z mov.b32 %r223, 160; 2026-02-21T08:13:45.6410129Z // begin inline asm 2026-02-21T08:13:45.6410444Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r222], [%rd87, {%r223, %r404}], [%r174]; 2026-02-21T08:13:45.6410781Z // end inline asm 2026-02-21T08:13:45.6411027Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6411301Z bar.sync 0; 2026-02-21T08:13:45.6411436Z elect.sync %r252|%p101, -1; 2026-02-21T08:13:45.6411598Z and.pred %p88, %p1, %p101; 2026-02-21T08:13:45.6411748Z add.s32 %r226, %r35, 40960; 2026-02-21T08:13:45.6411900Z // begin inline asm 2026-02-21T08:13:45.6412208Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r226], [%rd88, {%r223, %r267}], [%r174]; 2026-02-21T08:13:45.6412550Z // end inline asm 2026-02-21T08:13:45.6412791Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6413133Z bar.sync 0; 2026-02-21T08:13:45.6413262Z // begin inline asm 2026-02-21T08:13:45.6413392Z 2026-02-21T08:13:45.6413507Z { 2026-02-21T08:13:45.6413626Z .reg .pred complete; 2026-02-21T08:13:45.6413773Z waitLoop: 2026-02-21T08:13:45.6413955Z mbarrier.try_wait.parity.shared.b64 complete, [%r169], %r510; 2026-02-21T08:13:45.6414189Z @!complete bra.uni waitLoop; 2026-02-21T08:13:45.6414335Z } 2026-02-21T08:13:45.6414406Z 2026-02-21T08:13:45.6414462Z // end inline asm 2026-02-21T08:13:45.6414719Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6415006Z setp.ne.b32 %p102, %r17, 0; 
2026-02-21T08:13:45.6415165Z @%p102 bra $L__BB0_4; 2026-02-21T08:13:45.6415354Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:45.6415576Z elect.sync %r257|%p104, -1; 2026-02-21T08:13:45.6415787Z mov.b32 %r254, 69206032; 2026-02-21T08:13:45.6415950Z mov.pred %p103, 0; 2026-02-21T08:13:45.6416088Z // begin inline asm 2026-02-21T08:13:45.6416320Z @%p104 tcgen05.mma.cta_group::1.kind::f16 [ %r508 + 0 ], %rd82, %rd83, %r254, %p103; 2026-02-21T08:13:45.6416579Z // end inline asm 2026-02-21T08:13:45.6416725Z // begin inline asm 2026-02-21T08:13:45.6416950Z @%p104 tcgen05.mma.cta_group::1.kind::f16 [ %r508 + 0 ], %rd84, %rd85, %r254, %p58; 2026-02-21T08:13:45.6417191Z // end inline asm 2026-02-21T08:13:45.6417335Z add.s32 %r259, %r35, 102464; 2026-02-21T08:13:45.6417487Z cvt.u64.u32 %rd86, %r259; 2026-02-21T08:13:45.6417642Z // begin inline asm 2026-02-21T08:13:45.6417845Z @%p104 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T08:13:45.6418079Z // end inline asm 2026-02-21T08:13:45.6418260Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:13:45.6418575Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6418858Z bar.sync 0; 2026-02-21T08:13:45.6418981Z // begin inline asm 2026-02-21T08:13:45.6419168Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r260], 12288; 2026-02-21T08:13:45.6419375Z // end inline asm 2026-02-21T08:13:45.6419617Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6419887Z bar.sync 0; 2026-02-21T08:13:45.6420023Z elect.sync %r274|%p112, -1; 2026-02-21T08:13:45.6420186Z and.pred %p109, %p1, %p112; 2026-02-21T08:13:45.6420338Z add.s32 %r261, %r35, 81920; 2026-02-21T08:13:45.6420490Z mov.b32 %r262, 192; 2026-02-21T08:13:45.6420624Z // begin inline asm 2026-02-21T08:13:45.6420951Z @%p109 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r261], [%rd87, {%r262, %r404}], [%r260]; 
2026-02-21T08:13:45.6421294Z // end inline asm 2026-02-21T08:13:45.6421539Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6421818Z bar.sync 0; 2026-02-21T08:13:45.6421944Z elect.sync %r275|%p113, -1; 2026-02-21T08:13:45.6422106Z and.pred %p110, %p1, %p113; 2026-02-21T08:13:45.6422255Z add.s32 %r265, %r35, 49152; 2026-02-21T08:13:45.6422410Z // begin inline asm 2026-02-21T08:13:45.6422723Z @%p110 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r265], [%rd88, {%r262, %r267}], [%r260]; 2026-02-21T08:13:45.6423071Z // end inline asm 2026-02-21T08:13:45.6423204Z mov.b32 %r515, 1; 2026-02-21T08:13:45.6423348Z mov.b32 %r514, 6; 2026-02-21T08:13:45.6423491Z mov.b32 %r512, %r510; 2026-02-21T08:13:45.6423639Z mov.b32 %r513, %r510; 2026-02-21T08:13:45.6423793Z mov.b32 %r516, %r510; 2026-02-21T08:13:45.6423936Z mov.b32 %r517, %r510; 2026-02-21T08:13:45.6424085Z bra.uni $L__BB0_5; 2026-02-21T08:13:45.6424270Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:13:45.6424635Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6425050Z setp.lt.u32 %p124, %r517, 800; 2026-02-21T08:13:45.6425364Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6425697Z // begin inline asm 2026-02-21T08:13:45.6425842Z 2026-02-21T08:13:45.6425973Z { 2026-02-21T08:13:45.6426107Z .reg .pred complete; 2026-02-21T08:13:45.6426277Z waitLoop: 2026-02-21T08:13:45.6426483Z mbarrier.try_wait.parity.shared.b64 complete, [%r511], %r510; 2026-02-21T08:13:45.6426759Z @!complete bra.uni waitLoop; 2026-02-21T08:13:45.6426927Z } 2026-02-21T08:13:45.6427010Z 2026-02-21T08:13:45.6427071Z // end inline asm 2026-02-21T08:13:45.6427224Z add.s32 %r313, %r515, 1; 2026-02-21T08:13:45.6427406Z setp.gt.s32 %p127, %r313, 1; 2026-02-21T08:13:45.6427586Z selp.b32 %r515, 0, %r313, %p127; 2026-02-21T08:13:45.6427780Z selp.b32 %r314, 1, 0, %p127; 
2026-02-21T08:13:45.6428008Z xor.b32 %r31, %r516, %r314; 2026-02-21T08:13:45.6428277Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6428606Z add.s32 %r315, %r514, 1; 2026-02-21T08:13:45.6428762Z setp.gt.s32 %p128, %r315, 6; 2026-02-21T08:13:45.6428932Z selp.b32 %r514, 0, %r315, %p128; 2026-02-21T08:13:45.6429095Z shl.b32 %r316, %r514, 3; 2026-02-21T08:13:45.6429257Z add.s32 %r318, %r35, %r316; 2026-02-21T08:13:45.6429421Z add.s32 %r308, %r318, 102400; 2026-02-21T08:13:45.6429575Z bar.sync 0; 2026-02-21T08:13:45.6429721Z and.pred %p121, %p4, %p124; 2026-02-21T08:13:45.6429878Z // begin inline asm 2026-02-21T08:13:45.6430082Z @%p121 mbarrier.arrive.expect_tx.shared.b64 _, [%r308], 12288; 2026-02-21T08:13:45.6430329Z // end inline asm 2026-02-21T08:13:45.6430605Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6430913Z shl.b32 %r319, %r514, 12; 2026-02-21T08:13:45.6431076Z add.s32 %r320, %r35, %r319; 2026-02-21T08:13:45.6431237Z add.s32 %r305, %r320, 57344; 2026-02-21T08:13:45.6431387Z bar.sync 0; 2026-02-21T08:13:45.6431531Z elect.sync %r321|%p129, -1; 2026-02-21T08:13:45.6431692Z and.pred %p130, %p124, %p129; 2026-02-21T08:13:45.6431863Z and.pred %p122, %p1, %p130; 2026-02-21T08:13:45.6432021Z add.s32 %r306, %r517, 224; 2026-02-21T08:13:45.6432180Z // begin inline asm 2026-02-21T08:13:45.6432508Z @%p122 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r305], [%rd87, {%r306, %r404}], [%r308]; 2026-02-21T08:13:45.6432887Z // end inline asm 2026-02-21T08:13:45.6433140Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6433430Z shl.b32 %r322, %r514, 13; 2026-02-21T08:13:45.6433584Z add.s32 %r309, %r35, %r322; 2026-02-21T08:13:45.6433725Z bar.sync 0; 2026-02-21T08:13:45.6433859Z elect.sync %r323|%p131, -1; 2026-02-21T08:13:45.6434015Z and.pred %p132, %p124, %p131; 2026-02-21T08:13:45.6434180Z 
and.pred %p123, %p1, %p132; 2026-02-21T08:13:45.6434329Z // begin inline asm 2026-02-21T08:13:45.6434651Z @%p123 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r309], [%rd88, {%r306, %r267}], [%r308]; 2026-02-21T08:13:45.6435051Z // end inline asm 2026-02-21T08:13:45.6435296Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6435586Z setp.lt.u32 %p133, %r517, 960; 2026-02-21T08:13:45.6435745Z add.s32 %r517, %r517, 32; 2026-02-21T08:13:45.6435899Z mov.b32 %r510, %r516; 2026-02-21T08:13:45.6436042Z mov.b32 %r511, %r324; 2026-02-21T08:13:45.6436192Z mov.b32 %r516, %r31; 2026-02-21T08:13:45.6436340Z @%p133 bra $L__BB0_5; 2026-02-21T08:13:45.6436478Z bra.uni $L__BB0_8; 2026-02-21T08:13:45.6436662Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:13:45.6436899Z // => This Inner Loop Header: Depth=2 2026-02-21T08:13:45.6437276Z .loc 1 50 57 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:50:57 2026-02-21T08:13:45.6437560Z add.s32 %r278, %r513, 1; 2026-02-21T08:13:45.6437720Z setp.gt.s32 %p115, %r278, 6; 2026-02-21T08:13:45.6437877Z selp.b32 %r513, 0, %r278, %p115; 2026-02-21T08:13:45.6438046Z selp.b32 %r279, 1, 0, %p115; 2026-02-21T08:13:45.6438204Z xor.b32 %r512, %r512, %r279; 2026-02-21T08:13:45.6438352Z shl.b32 %r280, %r513, 3; 2026-02-21T08:13:45.6438504Z add.s32 %r282, %r35, %r280; 2026-02-21T08:13:45.6438655Z add.s32 %r276, %r282, 102400; 2026-02-21T08:13:45.6438811Z bar.sync 0; 2026-02-21T08:13:45.6438941Z // begin inline asm 2026-02-21T08:13:45.6439078Z 2026-02-21T08:13:45.6439187Z { 2026-02-21T08:13:45.6439311Z .reg .pred complete; 2026-02-21T08:13:45.6439449Z waitLoop: 2026-02-21T08:13:45.6439639Z mbarrier.try_wait.parity.shared.b64 complete, [%r276], %r512; 2026-02-21T08:13:45.6439941Z @!complete bra.uni waitLoop; 2026-02-21T08:13:45.6440088Z } 2026-02-21T08:13:45.6440152Z 2026-02-21T08:13:45.6440212Z // end inline asm 2026-02-21T08:13:45.6440268Z shl.b32 %r283, 
%r515, 3; 2026-02-21T08:13:45.6440324Z add.s32 %r284, %r35, %r283; 2026-02-21T08:13:45.6440389Z add.s32 %r324, %r284, 102464; 2026-02-21T08:13:45.6440558Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6440614Z @%p102 bra $L__BB0_7; 2026-02-21T08:13:45.6440709Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:13:45.6440881Z .loc 1 54 31 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:54:31 2026-02-21T08:13:45.6440937Z shl.b32 %r289, %r513, 12; 2026-02-21T08:13:45.6440991Z add.s32 %r291, %r35, %r289; 2026-02-21T08:13:45.6441053Z add.s32 %r292, %r291, 57344; 2026-02-21T08:13:45.6441215Z .loc 1 55 44 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:55:44 2026-02-21T08:13:45.6441272Z shl.b32 %r293, %r513, 13; 2026-02-21T08:13:45.6441335Z add.s32 %r294, %r35, %r293; 2026-02-21T08:13:45.6441497Z .loc 1 56 52 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:56:52 2026-02-21T08:13:45.6441557Z elect.sync %r295|%p117, -1; 2026-02-21T08:13:45.6441617Z bfe.u32 %r296, %r292, 4, 14; 2026-02-21T08:13:45.6441681Z cvt.u64.u32 %rd94, %r296; 2026-02-21T08:13:45.6441754Z or.b64 %rd89, %rd94, -9223371899399045120; 2026-02-21T08:13:45.6441809Z bfe.u32 %r297, %r294, 4, 14; 2026-02-21T08:13:45.6441873Z cvt.u64.u32 %rd95, %r297; 2026-02-21T08:13:45.6441944Z or.b64 %rd90, %rd95, -9223371899382267904; 2026-02-21T08:13:45.6441998Z mov.b32 %r286, 69206032; 2026-02-21T08:13:45.6442056Z mov.pred %p116, -1; 2026-02-21T08:13:45.6442116Z // begin inline asm 2026-02-21T08:13:45.6442252Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r508 + 0 ], %rd89, %rd90, %r286, %p116; 2026-02-21T08:13:45.6442306Z // end inline asm 2026-02-21T08:13:45.6442371Z add.s32 %r298, %r291, 57376; 2026-02-21T08:13:45.6442428Z bfe.u32 %r299, %r298, 4, 14; 2026-02-21T08:13:45.6442483Z cvt.u64.u32 %rd96, %r299; 2026-02-21T08:13:45.6442555Z or.b64 %rd91, %rd96, -9223371899399045120; 2026-02-21T08:13:45.6442610Z add.s32 %r300, %r294, 32; 
2026-02-21T08:13:45.6442665Z bfe.u32 %r301, %r300, 4, 14; 2026-02-21T08:13:45.6442722Z cvt.u64.u32 %rd97, %r301; 2026-02-21T08:13:45.6442795Z or.b64 %rd92, %rd97, -9223371899382267904; 2026-02-21T08:13:45.6442851Z // begin inline asm 2026-02-21T08:13:45.6442984Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r508 + 0 ], %rd91, %rd92, %r286, %p116; 2026-02-21T08:13:45.6443045Z // end inline asm 2026-02-21T08:13:45.6443100Z cvt.u64.u32 %rd93, %r324; 2026-02-21T08:13:45.6443153Z // begin inline asm 2026-02-21T08:13:45.6443284Z @%p117 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd93]; 2026-02-21T08:13:45.6443337Z // end inline asm 2026-02-21T08:13:45.6443391Z bra.uni $L__BB0_7; 2026-02-21T08:13:45.6443471Z $L__BB0_9: // %._crit_edge 2026-02-21T08:13:45.6443691Z .loc 1 33 74 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:33:74 2026-02-21T08:13:45.6443760Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:13:45.6443814Z bar.sync 0; 2026-02-21T08:13:45.6443985Z .loc 1 33 4 // cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py:33:4 2026-02-21T08:13:45.6444038Z bar.sync 0; 2026-02-21T08:13:45.6444094Z // begin inline asm 2026-02-21T08:13:45.6444215Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r508, 128; 2026-02-21T08:13:45.6444270Z // end inline asm 2026-02-21T08:13:45.6444321Z ret; 2026-02-21T08:13:45.6444387Z $L__tmp1: 2026-02-21T08:13:45.6444450Z $L__func_end0: 2026-02-21T08:13:45.6444528Z // -- End function 2026-02-21T08:13:45.6444576Z } 2026-02-21T08:13:45.6444830Z .file 1 "/tmp/torchinductor_root/qj/cqjben3473cqgnvuhmkidqnzqxt2urs4rizyfejxlqtuazd7ewny.py" 2026-02-21T08:13:45.6444940Z .section .debug_abbrev 2026-02-21T08:13:45.6444993Z { 2026-02-21T08:13:45.6445082Z .b8 1 // Abbreviation Code 2026-02-21T08:13:45.6445176Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:13:45.6445259Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:13:45.6445340Z .b8 37 // DW_AT_producer 2026-02-21T08:13:45.6445422Z .b8 8 // DW_FORM_string 2026-02-21T08:13:45.6445494Z 
.b8 19 // DW_AT_language 2026-02-21T08:13:45.6445572Z .b8 5 // DW_FORM_data2 2026-02-21T08:13:45.6445654Z .b8 3 // DW_AT_name 2026-02-21T08:13:45.6445727Z .b8 8 // DW_FORM_string 2026-02-21T08:13:45.6445804Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:13:45.6445884Z .b8 6 // DW_FORM_data4 2026-02-21T08:13:45.6445969Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:13:45.6446043Z .b8 8 // DW_FORM_string 2026-02-21T08:13:45.6446114Z .b8 0 // EOM(1) 2026-02-21T08:13:45.6446193Z .b8 0 // EOM(2) 2026-02-21T08:13:45.6446259Z .b8 0 // EOM(3) 2026-02-21T08:13:45.6446310Z } 2026-02-21T08:13:45.6446378Z .section .debug_info 2026-02-21T08:13:45.6446429Z { 2026-02-21T08:13:45.6446511Z .b32 104 // Length of Unit 2026-02-21T08:13:45.6446594Z .b8 2 // DWARF version number 2026-02-21T08:13:45.6446655Z .b8 0 2026-02-21T08:13:45.6446768Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:13:45.6446856Z .b8 8 // Address Size (in bytes) 2026-02-21T08:13:45.6446967Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:13:45.6447049Z .b8 116 // DW_AT_producer 2026-02-21T08:13:45.6447104Z .b8 114 2026-02-21T08:13:45.6447165Z .b8 105 2026-02-21T08:13:45.6447217Z .b8 116 2026-02-21T08:13:45.6447267Z .b8 111 2026-02-21T08:13:45.6447318Z .b8 110 2026-02-21T08:13:45.6447378Z .b8 0 2026-02-21T08:13:45.6447451Z .b8 2 // DW_AT_language 2026-02-21T08:13:45.6447503Z .b8 0 2026-02-21T08:13:45.6447585Z .b8 99 // DW_AT_name 2026-02-21T08:13:45.6447637Z .b8 113 2026-02-21T08:13:45.6447688Z .b8 106 2026-02-21T08:13:45.6447739Z .b8 98 2026-02-21T08:13:45.6447798Z .b8 101 2026-02-21T08:13:45.6447851Z .b8 110 2026-02-21T08:13:45.6447902Z .b8 51 2026-02-21T08:13:45.6447953Z .b8 52 2026-02-21T08:13:45.6448009Z .b8 55 2026-02-21T08:13:45.6448060Z .b8 51 2026-02-21T08:13:45.6448111Z .b8 99 2026-02-21T08:13:45.6448168Z .b8 113 2026-02-21T08:13:45.6448220Z .b8 103 2026-02-21T08:13:45.6448272Z .b8 110 2026-02-21T08:13:45.6448322Z .b8 118 2026-02-21T08:13:45.6448432Z .b8 117 2026-02-21T08:13:45.6448482Z 
.b8 104 2026-02-21T08:13:45.6448529Z .b8 109 2026-02-21T08:13:45.6448584Z .b8 107 2026-02-21T08:13:45.6448631Z .b8 105 2026-02-21T08:13:45.6448679Z .b8 100 2026-02-21T08:13:45.6448725Z .b8 113 2026-02-21T08:13:45.6448780Z .b8 110 2026-02-21T08:13:45.6448827Z .b8 122 2026-02-21T08:13:45.6448875Z .b8 113 2026-02-21T08:13:45.6448923Z .b8 120 2026-02-21T08:13:45.6448978Z .b8 116 2026-02-21T08:13:45.6449025Z .b8 50 2026-02-21T08:13:45.6449074Z .b8 117 2026-02-21T08:13:45.6449128Z .b8 114 2026-02-21T08:13:45.6449176Z .b8 115 2026-02-21T08:13:45.6449225Z .b8 52 2026-02-21T08:13:45.6449271Z .b8 114 2026-02-21T08:13:45.6449327Z .b8 105 2026-02-21T08:13:45.6449374Z .b8 122 2026-02-21T08:13:45.6449422Z .b8 121 2026-02-21T08:13:45.6449478Z .b8 102 2026-02-21T08:13:45.6449526Z .b8 101 2026-02-21T08:13:45.6449574Z .b8 106 2026-02-21T08:13:45.6449622Z .b8 120 2026-02-21T08:13:45.6449678Z .b8 108 2026-02-21T08:13:45.6449772Z .b8 113 2026-02-21T08:13:45.6449824Z .b8 116 2026-02-21T08:13:45.6449879Z .b8 117 2026-02-21T08:13:45.6449929Z .b8 97 2026-02-21T08:13:45.6449979Z .b8 122 2026-02-21T08:13:45.6450029Z .b8 100 2026-02-21T08:13:45.6450085Z .b8 55 2026-02-21T08:13:45.6450133Z .b8 101 2026-02-21T08:13:45.6450183Z .b8 119 2026-02-21T08:13:45.6450231Z .b8 110 2026-02-21T08:13:45.6450288Z .b8 121 2026-02-21T08:13:45.6450336Z .b8 46 2026-02-21T08:13:45.6450385Z .b8 112 2026-02-21T08:13:45.6450441Z .b8 121 2026-02-21T08:13:45.6450491Z .b8 0 2026-02-21T08:13:45.6450580Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:13:45.6450655Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:13:45.6450716Z .b8 116 2026-02-21T08:13:45.6450764Z .b8 109 2026-02-21T08:13:45.6450811Z .b8 112 2026-02-21T08:13:45.6450867Z .b8 47 2026-02-21T08:13:45.6450915Z .b8 116 2026-02-21T08:13:45.6450964Z .b8 111 2026-02-21T08:13:45.6451011Z .b8 114 2026-02-21T08:13:45.6451067Z .b8 99 2026-02-21T08:13:45.6451117Z .b8 104 2026-02-21T08:13:45.6451170Z .b8 105 2026-02-21T08:13:45.6451226Z .b8 110 2026-02-21T08:13:45.6451275Z 
.b8 100 2026-02-21T08:13:45.6451324Z .b8 117 2026-02-21T08:13:45.6451373Z .b8 99 2026-02-21T08:13:45.6451431Z .b8 116 2026-02-21T08:13:45.6451481Z .b8 111 2026-02-21T08:13:45.6451531Z .b8 114 2026-02-21T08:13:45.6451586Z .b8 95 2026-02-21T08:13:45.6451635Z .b8 114 2026-02-21T08:13:45.6451685Z .b8 111 2026-02-21T08:13:45.6451734Z .b8 111 2026-02-21T08:13:45.6451790Z .b8 116 2026-02-21T08:13:45.6451837Z .b8 47 2026-02-21T08:13:45.6451887Z .b8 113 2026-02-21T08:13:45.6451934Z .b8 106 2026-02-21T08:13:45.6451990Z .b8 0 2026-02-21T08:13:45.6452039Z } 2026-02-21T08:13:45.6452102Z .section .debug_macinfo { } 2026-02-21T08:13:45.6452106Z 2026-02-21T08:13:45.6452187Z ================================================================ 2026-02-21T08:13:45.6452287Z please share the reproducer above with Triton project. 2026-02-21T08:13:45.8118973Z 2026-02-21T08:13:45.8119769Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 34/34 22.9 configs/s 2026-02-21T08:13:46.4400942Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1599.8 2026-02-21T08:13:46.4401821Z configs/s 2026-02-21T08:13:46.5003209Z [105s] Generation 8 complete: 2026-02-21T08:13:46.5005224Z error=11 2026-02-21T08:13:46.5009360Z ok=24 2026-02-21T08:13:46.5013959Z min=0.0184 2026-02-21T08:13:46.5017973Z mid=0.0308 2026-02-21T08:13:46.5021941Z max=1.7275 2026-02-21T08:13:46.5025449Z best={'block_sizes': [256, 128, 64], 2026-02-21T08:13:46.5029059Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:13:46.5029359Z 'l2_groupings': [64], 2026-02-21T08:13:46.5029545Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:13:46.5029738Z 'loop_orders': [[1, 0]], 2026-02-21T08:13:46.5029902Z 'maxnreg': 256, 2026-02-21T08:13:46.5030046Z 'num_sm_multiplier': 1, 2026-02-21T08:13:46.5030206Z 'num_stages': 4, 2026-02-21T08:13:46.5030378Z 'num_warps': 1, 2026-02-21T08:13:46.5030939Z 'pid_type': 'persistent_blocked', 2026-02-21T08:13:46.5031130Z 'range_flattens': [None, False], 
2026-02-21T08:13:46.5031306Z 'range_multi_buffers': [None, True], 2026-02-21T08:13:46.5031491Z 'range_num_stages': [0, 0], 2026-02-21T08:13:46.5031653Z 'range_unroll_factors': [0, 0], 2026-02-21T08:13:46.5031839Z 'range_warp_specializes': [True, None]} 2026-02-21T08:13:46.5032051Z [105s] Fitting surrogate: 644 points, 644 targets 2026-02-21T08:13:46.7774066Z [105s] Autotuning complete in 105.6s after searching 624 configs. 2026-02-21T08:13:46.7775204Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:13:46.7778389Z @helion.kernel(config=helion.Config(block_sizes=[256, 128, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=1, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:13:46.7780843Z 2026-02-21T08:13:46.7781388Z [105s] Code of selected kernel: /tmp/torchinductor_root/mk/cmkzqd5brdu7rpwh42e5oabnk7ace6aemr46xochtg7gluda2s2h.py 2026-02-21T08:14:03.3829596Z WARNING:tritonbench.utils.triton_op:Completed input ID 0: 2026-02-21T08:14:03.3834585Z (M, N, K) 2026-02-21T08:14:03.3838765Z ------------------ 2026-02-21T08:14:03.3842364Z (4096, 1024, 1024) 2026-02-21T08:14:03.3842545Z 2026-02-21T08:14:03.3843241Z 12%|█▎ | 1/8 [03:52<27:07, 232.45s/it]WARNING:tritonbench.utils.triton_op:Running input ID 2: 2026-02-21T08:14:03.3843546Z (M, N, K) 2026-02-21T08:14:03.3843693Z ------------------ 2026-02-21T08:14:03.3843831Z (4096, 2048, 2048) 2026-02-21T08:14:03.3844085Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:14:51.6781926Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:15:31.5367332Z Autotune Choices Stats: 
2026-02-21T08:15:31.5369699Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.04076800122857094, "best_triton_pos": 0} 2026-02-21T08:15:31.5371867Z AUTOTUNE mm(4096x2048, 2048x2048) 2026-02-21T08:15:31.5372214Z strides: [2048, 1], [1, 2048] 2026-02-21T08:15:31.5372479Z dtypes: torch.float16, torch.float16 2026-02-21T08:15:31.5373073Z triton_mm_36 0.0408 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:15:31.5374005Z triton_mm_37 0.0512 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:15:31.5374952Z triton_mm_30 0.0542 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:15:31.5380298Z triton_mm_35 0.0553 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:15:31.5381134Z triton_mm_33 0.0633 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:15:31.5381977Z triton_mm_28 0.0634 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:15:31.5382838Z triton_mm_29 0.0634 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2026-02-21T08:15:31.5383960Z triton_mm_26 0.0653 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:15:31.5384868Z triton_mm_32 0.0674 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:15:31.5385668Z triton_mm_31 0.0758 ms 53.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T08:15:31.5386363Z SingleProcess AUTOTUNE benchmarking takes 0.3977 seconds and 0.5482 seconds precompiling for 19 choices 2026-02-21T08:15:31.8995939Z INFO:tritonbench.utils.triton_op:Took 1501.93ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:16:11.6977623Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:16:11.6978213Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:16:11.6978420Z 'dtype': 'torch.float16', 2026-02-21T08:16:11.6978607Z 'shape': (4096, 2048), 2026-02-21T08:16:11.6978790Z 'stride': (2048, 1)}, 2026-02-21T08:16:11.6978967Z { 'device': 'cuda:0', 2026-02-21T08:16:11.6979180Z 'dtype': 'torch.float16', 2026-02-21T08:16:11.6979362Z 'shape': (2048, 2048), 2026-02-21T08:16:11.6979530Z 'stride': (1, 2048)}, 2026-02-21T08:16:11.6979703Z None), 2026-02-21T08:16:11.6979845Z 'kwargs': {}} 2026-02-21T08:16:11.7013515Z INFO:tritonbench.utils.triton_op:Took 4.20ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:16:11.7941236Z [0s] Autotune random seed: 2134884919 2026-02-21T08:16:11.9203197Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:16:15.5874506Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 47.0 configs/s 2026-02-21T08:16:17.6476365Z 2026-02-21T08:16:17.6476380Z 
2026-02-21T08:16:17.6476833Z ================================================================ 2026-02-21T08:16:17.6477230Z Internal Triton PTX codegen error 2026-02-21T08:16:17.6477506Z `ptxas` stderr: 2026-02-21T08:16:17.6478186Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 160 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:16:17.6478984Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:17.6479230Z 2026-02-21T08:16:17.6479943Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpf3h1r6aq.ptx -o /tmp/tmpf3h1r6aq.ptx.o 2026-02-21T08:16:17.6480719Z 2026-02-21T08:16:17.6480725Z 2026-02-21T08:16:17.6480802Z // 2026-02-21T08:16:17.6481014Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:16:17.6481310Z // 2026-02-21T08:16:17.6481420Z 2026-02-21T08:16:17.6481517Z .version 8.7 2026-02-21T08:16:17.6481710Z .target sm_100a 2026-02-21T08:16:17.6481917Z .address_size 64 2026-02-21T08:16:17.6482042Z 2026-02-21T08:16:17.6482237Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:16:17.6482634Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:16:17.6482963Z // @_helion_matmul 2026-02-21T08:16:17.6483267Z .visible .entry _helion_matmul( 2026-02-21T08:16:17.6483603Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:16:17.6484008Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:16:17.6484418Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:16:17.6485202Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:16:17.6485594Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:16:17.6485913Z ) 2026-02-21T08:16:17.6486106Z .reqntid 256 2026-02-21T08:16:17.6486728Z .maxnreg 32 2026-02-21T08:16:17.6486923Z { 2026-02-21T08:16:17.6487127Z .reg .pred %p<43>; 
2026-02-21T08:16:17.6487359Z .reg .b16 %rs<3>; 2026-02-21T08:16:17.6487591Z .reg .b32 %r<342>; 2026-02-21T08:16:17.6487816Z .reg .b64 %rd<185>; 2026-02-21T08:16:17.6488057Z $L__func_begin0: 2026-02-21T08:16:17.6488197Z 2026-02-21T08:16:17.6488289Z // %bb.0: 2026-02-21T08:16:17.6488701Z .loc 1 19 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:19 2026-02-21T08:16:17.6489221Z mov.u32 %r1, %tid.x; 2026-02-21T08:16:17.6489459Z shr.u32 %r2, %r1, 5; 2026-02-21T08:16:17.6489724Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:16:17.6490031Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:16:17.6490291Z @%p1 bra $L__BB0_10; 2026-02-21T08:16:17.6490520Z bra.uni $L__BB0_1; 2026-02-21T08:16:17.6490745Z $L__BB0_10: 2026-02-21T08:16:17.6491350Z .loc 1 0 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:0:0 2026-02-21T08:16:17.6491899Z ld.param.b64 %rd9, [_helion_matmul_param_3]; 2026-02-21T08:16:17.6492285Z ld.param.b64 %rd8, [_helion_matmul_param_2]; 2026-02-21T08:16:17.6492812Z .loc 1 19 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:19 2026-02-21T08:16:17.6493725Z [5s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:16:17.6495992Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 16], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:16:17.6498021Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:16:17.6498430Z `ptxas` stderr: 2026-02-21T08:16:17.6499152Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 160 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T08:16:17.6499975Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:17.6500230Z 2026-02-21T08:16:17.6500893Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpf3h1r6aq.ptx -o /tmp/tmpf3h1r6aq.ptx.o 2026-02-21T08:16:17.6501611Z 2026-02-21T08:16:17.6501815Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:16:17.6502263Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:16:17.6502590Z setp.lt.u32 %p14, %r1, 32; 2026-02-21T08:16:17.6502861Z mov.b32 %r239, global_smem; 2026-02-21T08:16:17.6503128Z // begin inline asm 2026-02-21T08:16:17.6503530Z @%p14 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r239], 32; 2026-02-21T08:16:17.6503975Z // end inline asm 2026-02-21T08:16:17.6504180Z bar.sync 0, 128; 2026-02-21T08:16:17.6504412Z ld.shared.b32 %r336, [global_smem]; 2026-02-21T08:16:17.6504730Z bar.sync 0, 128; 2026-02-21T08:16:17.6504943Z // begin inline asm 2026-02-21T08:16:17.6505275Z @%p14 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:16:17.6505635Z // end inline asm 2026-02-21T08:16:17.6506046Z .loc 1 21 71 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:21:71 2026-02-21T08:16:17.6506545Z mov.u32 %r341, %ctaid.x; 2026-02-21T08:16:17.6506805Z mov.u32 %r248, %ctaid.y; 2026-02-21T08:16:17.6507052Z mov.u32 %r249, %ctaid.z; 2026-02-21T08:16:17.6507311Z mov.u32 %r250, %nctaid.x; 2026-02-21T08:16:17.6507562Z mov.u32 %r251, %nctaid.y; 2026-02-21T08:16:17.6507837Z mad.lo.s32 %r252, %r249, %r251, %r248; 2026-02-21T08:16:17.6508150Z mad.lo.s32 %r253, %r252, %r250, %r341; 2026-02-21T08:16:17.6508432Z shl.b32 %r254, %r253, 7; 2026-02-21T08:16:17.6508696Z cvt.s64.s32 %rd150, %r254; 2026-02-21T08:16:17.6509113Z add.s64 %rd147, %rd9, %rd150; 2026-02-21T08:16:17.6509393Z shl.b32 %r255, %r1, 2; 2026-02-21T08:16:17.6509645Z add.s32 
%r240, %r239, %r255; 2026-02-21T08:16:17.6509914Z mov.b32 %r241, 0; 2026-02-21T08:16:17.6510143Z // begin inline asm 2026-02-21T08:16:17.6510405Z @%p14 st.shared.b32 [ %r240 + 0 ], %r241; 2026-02-21T08:16:17.6510706Z // end inline asm 2026-02-21T08:16:17.6510928Z bar.warp.sync -1; 2026-02-21T08:16:17.6511169Z setp.eq.b32 %p17, %r1, 0; 2026-02-21T08:16:17.6511431Z cvt.u64.u32 %rd132, %r239; 2026-02-21T08:16:17.6511692Z // begin inline asm 2026-02-21T08:16:17.6512118Z @%p17 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd132 + 0 ], %rd8; 2026-02-21T08:16:17.6512610Z // end inline asm 2026-02-21T08:16:17.6512825Z // begin inline asm 2026-02-21T08:16:17.6513220Z @%p17 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1; 2026-02-21T08:16:17.6513668Z // end inline asm 2026-02-21T08:16:17.6514024Z mov.b32 %r242, 16; 2026-02-21T08:16:17.6514258Z // begin inline asm 2026-02-21T08:16:17.6514662Z @%p17 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r242; 2026-02-21T08:16:17.6515192Z // end inline asm 2026-02-21T08:16:17.6515406Z mov.b32 %r243, 128; 2026-02-21T08:16:17.6515641Z // begin inline asm 2026-02-21T08:16:17.6516050Z @%p17 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r243; 2026-02-21T08:16:17.6516491Z // end inline asm 2026-02-21T08:16:17.6516698Z mov.b32 %r244, 2048; 2026-02-21T08:16:17.6516911Z // begin inline asm 2026-02-21T08:16:17.6517312Z @%p17 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r244; 2026-02-21T08:16:17.6517792Z // end inline asm 2026-02-21T08:16:17.6518017Z mov.b32 %r245, 4096; 2026-02-21T08:16:17.6518239Z // begin inline asm 2026-02-21T08:16:17.6518665Z @%p17 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r245; 2026-02-21T08:16:17.6519122Z // end inline asm 2026-02-21T08:16:17.6519335Z mov.b64 %rd140, 4096; 2026-02-21T08:16:17.6519558Z // begin inline asm 2026-02-21T08:16:17.6519964Z @%p17 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd132 + 0 ], 0x0, %rd140; 2026-02-21T08:16:17.6520451Z // end inline asm 2026-02-21T08:16:17.6520655Z mov.b32 %r246, 1; 2026-02-21T08:16:17.6520872Z // begin inline asm 2026-02-21T08:16:17.6521303Z @%p17 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r246; 2026-02-21T08:16:17.6521792Z // end inline asm 2026-02-21T08:16:17.6522011Z // begin inline asm 2026-02-21T08:16:17.6522450Z @%p17 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r246; 2026-02-21T08:16:17.6522962Z // end inline asm 2026-02-21T08:16:17.6523170Z // begin inline asm 2026-02-21T08:16:17.6523565Z @%p17 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x6; 2026-02-21T08:16:17.6524024Z // end inline asm 2026-02-21T08:16:17.6524252Z // begin inline asm 2026-02-21T08:16:17.6524734Z @%p17 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0; 2026-02-21T08:16:17.6525225Z // end inline asm 2026-02-21T08:16:17.6525445Z // begin inline asm 2026-02-21T08:16:17.6525846Z @%p17 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1; 2026-02-21T08:16:17.6526302Z // end inline asm 2026-02-21T08:16:17.6526514Z // begin inline asm 2026-02-21T08:16:17.6526925Z @%p17 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0; 2026-02-21T08:16:17.6527379Z // end inline asm 2026-02-21T08:16:17.6527605Z // begin inline asm 2026-02-21T08:16:17.6528230Z @%p14 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd147 + 0 ], [ %rd132 + 0 ], 0x80; 2026-02-21T08:16:17.6528897Z // end inline asm 2026-02-21T08:16:17.6529123Z // begin inline asm 2026-02-21T08:16:17.6529487Z @%p14 fence.proxy.tensormap::generic.acquire.gpu [ %rd147 + 0 ], 0x80; 2026-02-21T08:16:17.6530052Z @%p14 cp.async.bulk.commit_group ; 2026-02-21T08:16:17.6530369Z @%p14 cp.async.bulk.wait_group.read 0 ; 
2026-02-21T08:16:17.6530671Z // end inline asm 2026-02-21T08:16:17.6530894Z bar.sync 0, 128; 2026-02-21T08:16:17.6531131Z cvta.global.u64 %rd151, %rd147; 2026-02-21T08:16:17.6531627Z .loc 1 27 94 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:27:94 2026-02-21T08:16:17.6532138Z setp.gt.u32 %p34, %r341, 4095; 2026-02-21T08:16:17.6532438Z @%p34 bra $L__BB0_13; 2026-02-21T08:16:17.6532729Z // %bb.11: // %.lr.ph 2026-02-21T08:16:17.6533265Z .loc 1 0 94 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:0:94 2026-02-21T08:16:17.6533759Z shl.b32 %r256, %r1, 5; 2026-02-21T08:16:17.6534017Z and.b32 %r257, %r256, 3936; 2026-02-21T08:16:17.6534287Z bfe.s32 %r258, %r1, 2, 1; 2026-02-21T08:16:17.6534553Z and.b32 %r259, %r258, 144; 2026-02-21T08:16:17.6535018Z or.b32 %r260, %r259, %r257; 2026-02-21T08:16:17.6535307Z add.s32 %r262, %r239, 24576; 2026-02-21T08:16:17.6535597Z add.s32 %r15, %r262, %r260; 2026-02-21T08:16:17.6535873Z xor.b32 %r263, %r260, 16; 2026-02-21T08:16:17.6536154Z add.s32 %r16, %r262, %r263; 2026-02-21T08:16:17.6536655Z .loc 1 27 94 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:27:94 2026-02-21T08:16:17.6537208Z shl.b32 %r340, %r341, 2; 2026-02-21T08:16:17.6537484Z shl.b32 %r339, %r341, 7; 2026-02-21T08:16:17.6537837Z $L__BB0_12: // =>This Inner Loop Header: Depth=1 2026-02-21T08:16:17.6538469Z .loc 1 38 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:38:27 2026-02-21T08:16:17.6539010Z and.b32 %r305, %r339, 384; 2026-02-21T08:16:17.6539298Z and.b32 %r306, %r341, 3584; 2026-02-21T08:16:17.6539568Z or.b32 %r303, %r305, %r306; 2026-02-21T08:16:17.6540075Z .loc 1 40 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:40:27 2026-02-21T08:16:17.6540619Z and.b32 %r302, %r340, 2032; 2026-02-21T08:16:17.6541111Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6541698Z shfl.sync.idx.b32 %r307, %r2, 0, 31, -1; 2026-02-21T08:16:17.6542036Z shl.b32 %r308, %r307, 21; 
2026-02-21T08:16:17.6542330Z and.b32 %r309, %r308, 6291456; 2026-02-21T08:16:17.6542620Z add.s32 %r264, %r309, %r336; 2026-02-21T08:16:17.6542919Z mov.pred %p35, -1; 2026-02-21T08:16:17.6543177Z mov.b32 %r265, 0; 2026-02-21T08:16:17.6543432Z // begin inline asm 2026-02-21T08:16:17.6544153Z @%p35 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r264 + 0], {%r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265}; 2026-02-21T08:16:17.6544984Z // end inline asm 2026-02-21T08:16:17.6545276Z // begin inline asm 2026-02-21T08:16:17.6545550Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:16:17.6545865Z // end inline asm 2026-02-21T08:16:17.6546090Z bar.sync 0, 128; 2026-02-21T08:16:17.6546529Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6547036Z add.s32 %r281, %r239, 31744; 2026-02-21T08:16:17.6547301Z // begin inline asm 2026-02-21T08:16:17.6547595Z @%p17 mbarrier.init.shared::cta.b64 [%r281], 1; 2026-02-21T08:16:17.6547923Z // end inline asm 2026-02-21T08:16:17.6548191Z st.shared.b32 [global_smem+31752], 16777730; 2026-02-21T08:16:17.6548534Z st.shared.b32 [global_smem], %r336; 2026-02-21T08:16:17.6548897Z st.shared.v2.b32 [global_smem+8], {%r303, %r302}; 2026-02-21T08:16:17.6549235Z barrier.sync 1; 2026-02-21T08:16:17.6549471Z barrier.sync 1; 2026-02-21T08:16:17.6549693Z barrier.sync 1; 2026-02-21T08:16:17.6550136Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6550644Z bar.sync 0, 128; 2026-02-21T08:16:17.6550866Z // begin inline asm 2026-02-21T08:16:17.6551083Z 2026-02-21T08:16:17.6551377Z { 2026-02-21T08:16:17.6551581Z .reg .pred complete; 2026-02-21T08:16:17.6551809Z waitLoop: 2026-02-21T08:16:17.6552122Z mbarrier.try_wait.parity.shared.b64 complete, [%r281], %r265; 2026-02-21T08:16:17.6552516Z @!complete bra.uni waitLoop; 2026-02-21T08:16:17.6552767Z } 2026-02-21T08:16:17.6552873Z 2026-02-21T08:16:17.6552958Z // end 
inline asm 2026-02-21T08:16:17.6553382Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6553872Z bar.sync 0, 128; 2026-02-21T08:16:17.6554086Z // begin inline asm 2026-02-21T08:16:17.6554367Z @%p17 mbarrier.inval.shared::cta.b64 [%r281]; 2026-02-21T08:16:17.6554725Z // end inline asm 2026-02-21T08:16:17.6555146Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6555631Z // begin inline asm 2026-02-21T08:16:17.6556392Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300}, [%r264 + 0]; 2026-02-21T08:16:17.6557080Z // end inline asm 2026-02-21T08:16:17.6557296Z // begin inline asm 2026-02-21T08:16:17.6557548Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:16:17.6557813Z // end inline asm 2026-02-21T08:16:17.6558047Z cvt.u64.u32 %rd152, %r285; 2026-02-21T08:16:17.6558309Z cvt.u64.u32 %rd153, %r286; 2026-02-21T08:16:17.6558576Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:16:17.6558839Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:16:17.6559317Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6559829Z mov.b64 {%r311, %r312}, %rd155; 2026-02-21T08:16:17.6560114Z cvt.rn.f16x2.f32 %r313, %r312, %r311; 2026-02-21T08:16:17.6560613Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6561112Z cvt.u64.u32 %rd156, %r287; 2026-02-21T08:16:17.6561382Z cvt.u64.u32 %rd157, %r288; 2026-02-21T08:16:17.6561642Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:16:17.6561910Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:16:17.6562371Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6562886Z mov.b64 {%r314, %r315}, %rd159; 2026-02-21T08:16:17.6563177Z cvt.rn.f16x2.f32 %r316, %r315, %r314; 2026-02-21T08:16:17.6563663Z .loc 1 53 52 // 
cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6564154Z cvt.u64.u32 %rd160, %r289; 2026-02-21T08:16:17.6564400Z cvt.u64.u32 %rd161, %r290; 2026-02-21T08:16:17.6564662Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:16:17.6564969Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:16:17.6565439Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6565939Z mov.b64 {%r317, %r318}, %rd163; 2026-02-21T08:16:17.6566225Z cvt.rn.f16x2.f32 %r319, %r318, %r317; 2026-02-21T08:16:17.6566734Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6567250Z cvt.u64.u32 %rd164, %r291; 2026-02-21T08:16:17.6567527Z cvt.u64.u32 %rd165, %r292; 2026-02-21T08:16:17.6567792Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:16:17.6568071Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:16:17.6568548Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6569074Z mov.b64 {%r320, %r321}, %rd167; 2026-02-21T08:16:17.6569376Z cvt.rn.f16x2.f32 %r322, %r321, %r320; 2026-02-21T08:16:17.6569885Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6570366Z cvt.u64.u32 %rd168, %r293; 2026-02-21T08:16:17.6570602Z cvt.u64.u32 %rd169, %r294; 2026-02-21T08:16:17.6570843Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:16:17.6571090Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:16:17.6571641Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6572106Z mov.b64 {%r323, %r324}, %rd171; 2026-02-21T08:16:17.6572365Z cvt.rn.f16x2.f32 %r325, %r324, %r323; 2026-02-21T08:16:17.6572819Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6573271Z cvt.u64.u32 %rd172, %r295; 2026-02-21T08:16:17.6573515Z cvt.u64.u32 %rd173, %r296; 2026-02-21T08:16:17.6573752Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:16:17.6574006Z or.b64 %rd175, %rd172, 
%rd174; 2026-02-21T08:16:17.6574453Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6575015Z mov.b64 {%r326, %r327}, %rd175; 2026-02-21T08:16:17.6575282Z cvt.rn.f16x2.f32 %r328, %r327, %r326; 2026-02-21T08:16:17.6575855Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6576363Z cvt.u64.u32 %rd176, %r297; 2026-02-21T08:16:17.6576615Z cvt.u64.u32 %rd177, %r298; 2026-02-21T08:16:17.6576879Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:16:17.6577139Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:16:17.6577610Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6578114Z mov.b64 {%r329, %r330}, %rd179; 2026-02-21T08:16:17.6578392Z cvt.rn.f16x2.f32 %r331, %r330, %r329; 2026-02-21T08:16:17.6578883Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6579375Z cvt.u64.u32 %rd180, %r299; 2026-02-21T08:16:17.6579647Z cvt.u64.u32 %rd181, %r300; 2026-02-21T08:16:17.6579902Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:16:17.6580169Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:16:17.6580636Z .loc 1 55 27 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:55:27 2026-02-21T08:16:17.6581149Z mov.b64 {%r332, %r333}, %rd183; 2026-02-21T08:16:17.6581451Z cvt.rn.f16x2.f32 %r334, %r333, %r332; 2026-02-21T08:16:17.6581938Z .loc 1 56 45 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:56:45 2026-02-21T08:16:17.6582459Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:16:17.6582742Z bar.sync 0, 128; 2026-02-21T08:16:17.6583039Z st.shared.v4.b32 [%r15], {%r313, %r316, %r319, %r322}; 2026-02-21T08:16:17.6583442Z st.shared.v4.b32 [%r16], {%r325, %r328, %r331, %r334}; 2026-02-21T08:16:17.6583784Z // begin inline asm 2026-02-21T08:16:17.6584057Z fence.proxy.async.shared::cta; 2026-02-21T08:16:17.6584328Z // end inline asm 2026-02-21T08:16:17.6584556Z bar.sync 0, 128; 2026-02-21T08:16:17.6584849Z 
elect.sync %r335|%p40, -1; 2026-02-21T08:16:17.6585132Z and.pred %p38, %p14, %p40; 2026-02-21T08:16:17.6585386Z // begin inline asm 2026-02-21T08:16:17.6585851Z @%p38 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd151, {%r302, %r303}], [%r262]; 2026-02-21T08:16:17.6586357Z // end inline asm 2026-02-21T08:16:17.6586596Z cp.async.bulk.commit_group; 2026-02-21T08:16:17.6587065Z .loc 1 27 94 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:27:94 2026-02-21T08:16:17.6587531Z add.s32 %r22, %r341, 148; 2026-02-21T08:16:17.6587795Z add.s32 %r340, %r340, 592; 2026-02-21T08:16:17.6588047Z add.s32 %r339, %r339, 18944; 2026-02-21T08:16:17.6588319Z setp.lt.u32 %p41, %r341, 3948; 2026-02-21T08:16:17.6588582Z mov.b32 %r341, %r22; 2026-02-21T08:16:17.6588823Z @%p41 bra $L__BB0_12; 2026-02-21T08:16:17.6589102Z $L__BB0_13: // %._crit_edge 2026-02-21T08:16:17.6589456Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:16:17.6589745Z bar.sync 0, 128; 2026-02-21T08:16:17.6590172Z .loc 1 27 4 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:27:4 2026-02-21T08:16:17.6590665Z bar.sync 0, 128; 2026-02-21T08:16:17.6590891Z // begin inline asm 2026-02-21T08:16:17.6591384Z @%p14 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r336, 32; 2026-02-21T08:16:17.6591764Z // end inline asm 2026-02-21T08:16:17.6592030Z st.shared.b32 [global_smem+31752], 50529027; 2026-02-21T08:16:17.6592344Z barrier.sync 1; 2026-02-21T08:16:17.6592615Z $L__BB0_14: // %common.ret 2026-02-21T08:16:17.6593134Z .loc 1 0 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:0 2026-02-21T08:16:17.6593599Z ret; 2026-02-21T08:16:17.6593877Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:16:17.6594268Z ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T08:16:17.6594638Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T08:16:17.6595187Z .loc 1 19 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:19 2026-02-21T08:16:17.6595651Z cvt.u16.u32 %rs1, %r1; 
2026-02-21T08:16:17.6595911Z and.b16 %rs2, %rs1, 1; 2026-02-21T08:16:17.6596270Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:16:17.6596749Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6597255Z or.b32 %r5, %r4, 112; 2026-02-21T08:16:17.6597508Z mov.b32 %r26, global_smem; 2026-02-21T08:16:17.6597768Z add.s32 %r27, %r26, %r3; 2026-02-21T08:16:17.6598025Z bra.uni $L__BB0_2; 2026-02-21T08:16:17.6598336Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6598915Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6599422Z barrier.sync 1; 2026-02-21T08:16:17.6599649Z barrier.sync 1; 2026-02-21T08:16:17.6599919Z $L__BB0_2: // %.preheader 2026-02-21T08:16:17.6600295Z // =>This Loop Header: Depth=1 2026-02-21T08:16:17.6600696Z // Child Loop BB0_6 Depth 2 2026-02-21T08:16:17.6601227Z .loc 1 19 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:19 2026-02-21T08:16:17.6601768Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:16:17.6602088Z barrier.sync 1; 2026-02-21T08:16:17.6602329Z ld.shared.b8 %r25, [%r27+31748]; 2026-02-21T08:16:17.6602633Z setp.gt.u32 %p2, %r25, 3; 2026-02-21T08:16:17.6602893Z @%p2 bra $L__BB0_4; 2026-02-21T08:16:17.6603177Z // %bb.3: // %.preheader 2026-02-21T08:16:17.6603560Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6603930Z $L_brx_0: .branchtargets 2026-02-21T08:16:17.6604175Z $L__BB0_5, 2026-02-21T08:16:17.6604381Z $L__BB0_8, 2026-02-21T08:16:17.6604574Z $L__BB0_9, 2026-02-21T08:16:17.6604829Z $L__BB0_14; 2026-02-21T08:16:17.6605052Z brx.idx %r25, $L_brx_0; 2026-02-21T08:16:17.6605331Z $L__BB0_5: // %.peel.next 2026-02-21T08:16:17.6605704Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6606203Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6606683Z ld.shared.b32 %r136, [global_smem]; 2026-02-21T08:16:17.6606994Z ld.shared.v2.b32 {%r158, %r159}, [global_smem+8]; 
2026-02-21T08:16:17.6607301Z barrier.sync 1; 2026-02-21T08:16:17.6607696Z .loc 1 39 45 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:39:45 2026-02-21T08:16:17.6608150Z bfe.u32 %r160, %r1, 1, 4; 2026-02-21T08:16:17.6608573Z .loc 1 41 45 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:41:45 2026-02-21T08:16:17.6609026Z shl.b32 %r161, %r1, 3; 2026-02-21T08:16:17.6609260Z and.b32 %r162, %r161, 8; 2026-02-21T08:16:17.6609669Z .loc 1 39 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:39:32 2026-02-21T08:16:17.6610132Z add.s32 %r163, %r158, %r160; 2026-02-21T08:16:17.6610561Z .loc 1 41 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:41:32 2026-02-21T08:16:17.6611119Z add.s32 %r164, %r159, %r160; 2026-02-21T08:16:17.6611547Z .loc 1 39 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:39:32 2026-02-21T08:16:17.6612004Z shl.b32 %r165, %r163, 11; 2026-02-21T08:16:17.6612421Z .loc 1 51 53 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:53 2026-02-21T08:16:17.6612917Z add.s32 %r166, %r165, 32768; 2026-02-21T08:16:17.6613173Z add.s32 %r167, %r165, 65536; 2026-02-21T08:16:17.6613419Z add.s32 %r168, %r165, 98304; 2026-02-21T08:16:17.6613674Z add.s32 %r169, %r165, 131072; 2026-02-21T08:16:17.6613934Z add.s32 %r170, %r165, 163840; 2026-02-21T08:16:17.6614182Z add.s32 %r171, %r165, 196608; 2026-02-21T08:16:17.6614437Z add.s32 %r172, %r165, 229376; 2026-02-21T08:16:17.6614937Z .loc 1 52 80 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:80 2026-02-21T08:16:17.6615521Z shl.b32 %r173, %r164, 11; 2026-02-21T08:16:17.6615982Z .loc 1 51 60 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:60 2026-02-21T08:16:17.6616467Z or.b32 %r174, %r165, %r162; 2026-02-21T08:16:17.6616729Z or.b32 %r175, %r166, %r162; 2026-02-21T08:16:17.6616975Z or.b32 %r176, %r167, %r162; 2026-02-21T08:16:17.6617255Z or.b32 %r177, %r168, %r162; 2026-02-21T08:16:17.6617497Z or.b32 %r178, %r169, %r162; 
2026-02-21T08:16:17.6617746Z or.b32 %r179, %r170, %r162; 2026-02-21T08:16:17.6617986Z or.b32 %r180, %r171, %r162; 2026-02-21T08:16:17.6618234Z or.b32 %r181, %r172, %r162; 2026-02-21T08:16:17.6618674Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6619177Z mad.wide.s32 %rd10, %r174, 2, %rd6; 2026-02-21T08:16:17.6619474Z mad.wide.s32 %rd11, %r175, 2, %rd6; 2026-02-21T08:16:17.6619752Z mad.wide.s32 %rd12, %r176, 2, %rd6; 2026-02-21T08:16:17.6620039Z mad.wide.s32 %rd13, %r177, 2, %rd6; 2026-02-21T08:16:17.6620315Z mad.wide.s32 %rd14, %r178, 2, %rd6; 2026-02-21T08:16:17.6620592Z mad.wide.s32 %rd15, %r179, 2, %rd6; 2026-02-21T08:16:17.6620863Z mad.wide.s32 %rd16, %r180, 2, %rd6; 2026-02-21T08:16:17.6621140Z mad.wide.s32 %rd17, %r181, 2, %rd6; 2026-02-21T08:16:17.6621599Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6622088Z shl.b32 %r182, %r1, 4; 2026-02-21T08:16:17.6622346Z and.b32 %r183, %r182, 368; 2026-02-21T08:16:17.6622583Z bfe.s32 %r184, %r1, 3, 1; 2026-02-21T08:16:17.6622825Z and.b32 %r185, %r184, 144; 2026-02-21T08:16:17.6623058Z xor.b32 %r7, %r185, %r183; 2026-02-21T08:16:17.6623297Z add.s32 %r28, %r26, %r7; 2026-02-21T08:16:17.6623520Z mov.b32 %r29, 16; 2026-02-21T08:16:17.6623733Z // begin inline asm 2026-02-21T08:16:17.6624047Z cp.async.cg.shared.global [ %r28 + 0 ], [ %rd10 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6624407Z // end inline asm 2026-02-21T08:16:17.6624616Z add.s32 %r30, %r28, 512; 2026-02-21T08:16:17.6624905Z // begin inline asm 2026-02-21T08:16:17.6625241Z cp.async.cg.shared.global [ %r30 + 0 ], [ %rd11 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6625614Z // end inline asm 2026-02-21T08:16:17.6625842Z add.s32 %r32, %r28, 1024; 2026-02-21T08:16:17.6626087Z // begin inline asm 2026-02-21T08:16:17.6626418Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd12 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6626789Z // end inline asm 2026-02-21T08:16:17.6627021Z 
add.s32 %r34, %r28, 1536; 2026-02-21T08:16:17.6627266Z // begin inline asm 2026-02-21T08:16:17.6627598Z cp.async.cg.shared.global [ %r34 + 0 ], [ %rd13 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6627982Z // end inline asm 2026-02-21T08:16:17.6628200Z add.s32 %r36, %r28, 2048; 2026-02-21T08:16:17.6628451Z // begin inline asm 2026-02-21T08:16:17.6628773Z cp.async.cg.shared.global [ %r36 + 0 ], [ %rd14 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6629148Z // end inline asm 2026-02-21T08:16:17.6629374Z add.s32 %r38, %r28, 2560; 2026-02-21T08:16:17.6629772Z // begin inline asm 2026-02-21T08:16:17.6630094Z cp.async.cg.shared.global [ %r38 + 0 ], [ %rd15 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6630470Z // end inline asm 2026-02-21T08:16:17.6630696Z add.s32 %r40, %r28, 3072; 2026-02-21T08:16:17.6630936Z // begin inline asm 2026-02-21T08:16:17.6631265Z cp.async.cg.shared.global [ %r40 + 0 ], [ %rd16 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6631631Z // end inline asm 2026-02-21T08:16:17.6631858Z add.s32 %r42, %r28, 3584; 2026-02-21T08:16:17.6632098Z // begin inline asm 2026-02-21T08:16:17.6632421Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd17 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6632789Z // end inline asm 2026-02-21T08:16:17.6633021Z cp.async.commit_group; 2026-02-21T08:16:17.6633475Z .loc 1 52 59 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:59 2026-02-21T08:16:17.6633985Z or.b32 %r186, %r173, %r162; 2026-02-21T08:16:17.6634724Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6635272Z mad.wide.s32 %rd18, %r186, 2, %rd7; 2026-02-21T08:16:17.6635757Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6636213Z add.s32 %r187, %r26, 28672; 2026-02-21T08:16:17.6636466Z add.s32 %r44, %r187, %r7; 2026-02-21T08:16:17.6636697Z // begin inline asm 2026-02-21T08:16:17.6637016Z cp.async.cg.shared.global [ %r44 + 0 ], [ %rd18 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6637373Z // end 
inline asm 2026-02-21T08:16:17.6637588Z cp.async.commit_group; 2026-02-21T08:16:17.6638008Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6638478Z cvt.s64.s32 %rd77, %r165; 2026-02-21T08:16:17.6638722Z cvt.u64.u32 %rd78, %r162; 2026-02-21T08:16:17.6638956Z or.b64 %rd79, %rd77, %rd78; 2026-02-21T08:16:17.6639207Z shl.b64 %rd80, %rd79, 1; 2026-02-21T08:16:17.6639448Z add.s64 %rd81, %rd6, %rd80; 2026-02-21T08:16:17.6639697Z add.s64 %rd19, %rd81, 32; 2026-02-21T08:16:17.6639936Z cvt.s64.s32 %rd82, %r166; 2026-02-21T08:16:17.6640166Z or.b64 %rd83, %rd82, %rd78; 2026-02-21T08:16:17.6640412Z shl.b64 %rd84, %rd83, 1; 2026-02-21T08:16:17.6640646Z add.s64 %rd85, %rd6, %rd84; 2026-02-21T08:16:17.6640907Z add.s64 %rd20, %rd85, 32; 2026-02-21T08:16:17.6641150Z cvt.s64.s32 %rd86, %r167; 2026-02-21T08:16:17.6641405Z or.b64 %rd87, %rd86, %rd78; 2026-02-21T08:16:17.6641651Z shl.b64 %rd88, %rd87, 1; 2026-02-21T08:16:17.6641892Z add.s64 %rd89, %rd6, %rd88; 2026-02-21T08:16:17.6642133Z add.s64 %rd21, %rd89, 32; 2026-02-21T08:16:17.6642385Z cvt.s64.s32 %rd90, %r168; 2026-02-21T08:16:17.6642640Z or.b64 %rd91, %rd90, %rd78; 2026-02-21T08:16:17.6642897Z shl.b64 %rd92, %rd91, 1; 2026-02-21T08:16:17.6643156Z add.s64 %rd93, %rd6, %rd92; 2026-02-21T08:16:17.6643418Z add.s64 %rd22, %rd93, 32; 2026-02-21T08:16:17.6643672Z cvt.s64.s32 %rd94, %r169; 2026-02-21T08:16:17.6643929Z or.b64 %rd95, %rd94, %rd78; 2026-02-21T08:16:17.6644196Z shl.b64 %rd96, %rd95, 1; 2026-02-21T08:16:17.6644443Z add.s64 %rd97, %rd6, %rd96; 2026-02-21T08:16:17.6644779Z add.s64 %rd23, %rd97, 32; 2026-02-21T08:16:17.6645024Z cvt.s64.s32 %rd98, %r170; 2026-02-21T08:16:17.6645280Z or.b64 %rd99, %rd98, %rd78; 2026-02-21T08:16:17.6645546Z shl.b64 %rd100, %rd99, 1; 2026-02-21T08:16:17.6645803Z add.s64 %rd101, %rd6, %rd100; 2026-02-21T08:16:17.6646080Z add.s64 %rd24, %rd101, 32; 2026-02-21T08:16:17.6646338Z cvt.s64.s32 %rd102, %r171; 2026-02-21T08:16:17.6646604Z or.b64 %rd103, 
%rd102, %rd78; 2026-02-21T08:16:17.6646868Z shl.b64 %rd104, %rd103, 1; 2026-02-21T08:16:17.6647138Z add.s64 %rd105, %rd6, %rd104; 2026-02-21T08:16:17.6647394Z add.s64 %rd25, %rd105, 32; 2026-02-21T08:16:17.6647653Z cvt.s64.s32 %rd106, %r172; 2026-02-21T08:16:17.6647917Z or.b64 %rd107, %rd106, %rd78; 2026-02-21T08:16:17.6648170Z shl.b64 %rd108, %rd107, 1; 2026-02-21T08:16:17.6648436Z add.s64 %rd109, %rd6, %rd108; 2026-02-21T08:16:17.6648821Z add.s64 %rd26, %rd109, 32; 2026-02-21T08:16:17.6649293Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6649796Z bar.warp.sync -1; 2026-02-21T08:16:17.6650042Z add.s32 %r46, %r28, 4096; 2026-02-21T08:16:17.6650289Z // begin inline asm 2026-02-21T08:16:17.6650643Z cp.async.cg.shared.global [ %r46 + 0 ], [ %rd19 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6651026Z // end inline asm 2026-02-21T08:16:17.6651263Z add.s32 %r48, %r28, 4608; 2026-02-21T08:16:17.6651527Z // begin inline asm 2026-02-21T08:16:17.6651837Z cp.async.cg.shared.global [ %r48 + 0 ], [ %rd20 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6652204Z // end inline asm 2026-02-21T08:16:17.6652413Z add.s32 %r50, %r28, 5120; 2026-02-21T08:16:17.6652653Z // begin inline asm 2026-02-21T08:16:17.6652957Z cp.async.cg.shared.global [ %r50 + 0 ], [ %rd21 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6653400Z // end inline asm 2026-02-21T08:16:17.6653607Z add.s32 %r52, %r28, 5632; 2026-02-21T08:16:17.6653847Z // begin inline asm 2026-02-21T08:16:17.6654158Z cp.async.cg.shared.global [ %r52 + 0 ], [ %rd22 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6654537Z // end inline asm 2026-02-21T08:16:17.6654852Z add.s32 %r54, %r28, 6144; 2026-02-21T08:16:17.6655101Z // begin inline asm 2026-02-21T08:16:17.6655441Z cp.async.cg.shared.global [ %r54 + 0 ], [ %rd23 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6655820Z // end inline asm 2026-02-21T08:16:17.6656036Z add.s32 %r56, %r28, 6656; 2026-02-21T08:16:17.6656269Z // begin inline asm 2026-02-21T08:16:17.6656586Z 
cp.async.cg.shared.global [ %r56 + 0 ], [ %rd24 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6656956Z // end inline asm 2026-02-21T08:16:17.6657163Z add.s32 %r58, %r28, 7168; 2026-02-21T08:16:17.6657404Z // begin inline asm 2026-02-21T08:16:17.6657709Z cp.async.cg.shared.global [ %r58 + 0 ], [ %rd25 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6658057Z // end inline asm 2026-02-21T08:16:17.6658262Z add.s32 %r60, %r28, 7680; 2026-02-21T08:16:17.6658500Z // begin inline asm 2026-02-21T08:16:17.6658797Z cp.async.cg.shared.global [ %r60 + 0 ], [ %rd26 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6659145Z // end inline asm 2026-02-21T08:16:17.6659354Z cp.async.commit_group; 2026-02-21T08:16:17.6659786Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6660266Z cvt.s64.s32 %rd110, %r173; 2026-02-21T08:16:17.6660510Z or.b64 %rd111, %rd110, %rd78; 2026-02-21T08:16:17.6660765Z shl.b64 %rd112, %rd111, 1; 2026-02-21T08:16:17.6661004Z add.s64 %rd113, %rd7, %rd112; 2026-02-21T08:16:17.6661253Z add.s64 %rd27, %rd113, 32; 2026-02-21T08:16:17.6661674Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6662140Z add.s32 %r62, %r28, 29184; 2026-02-21T08:16:17.6662370Z // begin inline asm 2026-02-21T08:16:17.6662685Z cp.async.cg.shared.global [ %r62 + 0 ], [ %rd27 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6663042Z // end inline asm 2026-02-21T08:16:17.6663248Z cp.async.commit_group; 2026-02-21T08:16:17.6663666Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6664121Z add.s64 %rd28, %rd81, 64; 2026-02-21T08:16:17.6664359Z add.s64 %rd29, %rd85, 64; 2026-02-21T08:16:17.6664587Z add.s64 %rd30, %rd89, 64; 2026-02-21T08:16:17.6664902Z add.s64 %rd31, %rd93, 64; 2026-02-21T08:16:17.6665136Z add.s64 %rd32, %rd97, 64; 2026-02-21T08:16:17.6665390Z add.s64 %rd33, %rd101, 64; 2026-02-21T08:16:17.6665650Z add.s64 %rd34, %rd105, 64; 2026-02-21T08:16:17.6665896Z add.s64 %rd35, 
%rd109, 64; 2026-02-21T08:16:17.6666315Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6666769Z bar.warp.sync -1; 2026-02-21T08:16:17.6666996Z add.s32 %r64, %r28, 8192; 2026-02-21T08:16:17.6667217Z // begin inline asm 2026-02-21T08:16:17.6667533Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd28 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6668003Z // end inline asm 2026-02-21T08:16:17.6668217Z add.s32 %r66, %r28, 8704; 2026-02-21T08:16:17.6668452Z // begin inline asm 2026-02-21T08:16:17.6668754Z cp.async.cg.shared.global [ %r66 + 0 ], [ %rd29 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6669109Z // end inline asm 2026-02-21T08:16:17.6669314Z add.s32 %r68, %r28, 9216; 2026-02-21T08:16:17.6669547Z // begin inline asm 2026-02-21T08:16:17.6669842Z cp.async.cg.shared.global [ %r68 + 0 ], [ %rd30 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6670190Z // end inline asm 2026-02-21T08:16:17.6670392Z add.s32 %r70, %r28, 9728; 2026-02-21T08:16:17.6670620Z // begin inline asm 2026-02-21T08:16:17.6670919Z cp.async.cg.shared.global [ %r70 + 0 ], [ %rd31 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6671255Z // end inline asm 2026-02-21T08:16:17.6671464Z add.s32 %r72, %r28, 10240; 2026-02-21T08:16:17.6671695Z // begin inline asm 2026-02-21T08:16:17.6672078Z cp.async.cg.shared.global [ %r72 + 0 ], [ %rd32 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6672421Z // end inline asm 2026-02-21T08:16:17.6672631Z add.s32 %r74, %r28, 10752; 2026-02-21T08:16:17.6672860Z // begin inline asm 2026-02-21T08:16:17.6673163Z cp.async.cg.shared.global [ %r74 + 0 ], [ %rd33 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6673506Z // end inline asm 2026-02-21T08:16:17.6673723Z add.s32 %r76, %r28, 11264; 2026-02-21T08:16:17.6673960Z // begin inline asm 2026-02-21T08:16:17.6674259Z cp.async.cg.shared.global [ %r76 + 0 ], [ %rd34 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6674611Z // end inline asm 2026-02-21T08:16:17.6674887Z add.s32 %r78, %r28, 11776; 2026-02-21T08:16:17.6675124Z // begin inline 
asm 2026-02-21T08:16:17.6675416Z cp.async.cg.shared.global [ %r78 + 0 ], [ %rd35 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6675768Z // end inline asm 2026-02-21T08:16:17.6675994Z cp.async.commit_group; 2026-02-21T08:16:17.6676462Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6676988Z add.s64 %rd36, %rd113, 64; 2026-02-21T08:16:17.6677438Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6677935Z add.s32 %r80, %r28, 29696; 2026-02-21T08:16:17.6678185Z // begin inline asm 2026-02-21T08:16:17.6678514Z cp.async.cg.shared.global [ %r80 + 0 ], [ %rd36 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6678882Z // end inline asm 2026-02-21T08:16:17.6679115Z cp.async.commit_group; 2026-02-21T08:16:17.6679556Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6680057Z add.s64 %rd37, %rd81, 96; 2026-02-21T08:16:17.6680317Z add.s64 %rd38, %rd85, 96; 2026-02-21T08:16:17.6680561Z add.s64 %rd39, %rd89, 96; 2026-02-21T08:16:17.6680812Z add.s64 %rd40, %rd93, 96; 2026-02-21T08:16:17.6681053Z add.s64 %rd41, %rd97, 96; 2026-02-21T08:16:17.6681304Z add.s64 %rd42, %rd101, 96; 2026-02-21T08:16:17.6681562Z add.s64 %rd43, %rd105, 96; 2026-02-21T08:16:17.6681821Z add.s64 %rd44, %rd109, 96; 2026-02-21T08:16:17.6682263Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6682759Z bar.warp.sync -1; 2026-02-21T08:16:17.6682995Z add.s32 %r82, %r28, 12288; 2026-02-21T08:16:17.6683235Z // begin inline asm 2026-02-21T08:16:17.6683572Z cp.async.cg.shared.global [ %r82 + 0 ], [ %rd37 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6683948Z // end inline asm 2026-02-21T08:16:17.6684174Z add.s32 %r84, %r28, 12800; 2026-02-21T08:16:17.6684422Z // begin inline asm 2026-02-21T08:16:17.6684783Z cp.async.cg.shared.global [ %r84 + 0 ], [ %rd38 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6685127Z // end inline asm 2026-02-21T08:16:17.6685357Z 
add.s32 %r86, %r28, 13312; 2026-02-21T08:16:17.6685614Z // begin inline asm 2026-02-21T08:16:17.6685939Z cp.async.cg.shared.global [ %r86 + 0 ], [ %rd39 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6686330Z // end inline asm 2026-02-21T08:16:17.6686667Z add.s32 %r88, %r28, 13824; 2026-02-21T08:16:17.6686924Z // begin inline asm 2026-02-21T08:16:17.6687243Z cp.async.cg.shared.global [ %r88 + 0 ], [ %rd40 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6687621Z // end inline asm 2026-02-21T08:16:17.6687846Z add.s32 %r90, %r28, 14336; 2026-02-21T08:16:17.6688099Z // begin inline asm 2026-02-21T08:16:17.6688420Z cp.async.cg.shared.global [ %r90 + 0 ], [ %rd41 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6688800Z // end inline asm 2026-02-21T08:16:17.6689025Z add.s32 %r92, %r28, 14848; 2026-02-21T08:16:17.6689269Z // begin inline asm 2026-02-21T08:16:17.6689596Z cp.async.cg.shared.global [ %r92 + 0 ], [ %rd42 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6689973Z // end inline asm 2026-02-21T08:16:17.6690187Z add.s32 %r94, %r28, 15360; 2026-02-21T08:16:17.6690417Z // begin inline asm 2026-02-21T08:16:17.6690720Z cp.async.cg.shared.global [ %r94 + 0 ], [ %rd43 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6691149Z // end inline asm 2026-02-21T08:16:17.6691362Z add.s32 %r96, %r28, 15872; 2026-02-21T08:16:17.6691606Z // begin inline asm 2026-02-21T08:16:17.6691900Z cp.async.cg.shared.global [ %r96 + 0 ], [ %rd44 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6692253Z // end inline asm 2026-02-21T08:16:17.6692469Z cp.async.commit_group; 2026-02-21T08:16:17.6692902Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6693371Z add.s64 %rd45, %rd113, 96; 2026-02-21T08:16:17.6693800Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6694262Z add.s32 %r98, %r28, 30208; 2026-02-21T08:16:17.6694502Z // begin inline asm 2026-02-21T08:16:17.6694890Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd45 + 0 ], 0x10, %r29; 
2026-02-21T08:16:17.6695263Z // end inline asm 2026-02-21T08:16:17.6695507Z cp.async.commit_group; 2026-02-21T08:16:17.6695962Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6696465Z add.s64 %rd46, %rd81, 128; 2026-02-21T08:16:17.6696705Z add.s64 %rd47, %rd85, 128; 2026-02-21T08:16:17.6696948Z add.s64 %rd48, %rd89, 128; 2026-02-21T08:16:17.6697183Z add.s64 %rd49, %rd93, 128; 2026-02-21T08:16:17.6697424Z add.s64 %rd50, %rd97, 128; 2026-02-21T08:16:17.6697669Z add.s64 %rd51, %rd101, 128; 2026-02-21T08:16:17.6697915Z add.s64 %rd52, %rd105, 128; 2026-02-21T08:16:17.6698166Z add.s64 %rd53, %rd109, 128; 2026-02-21T08:16:17.6698587Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6699054Z bar.warp.sync -1; 2026-02-21T08:16:17.6699272Z add.s32 %r100, %r28, 16384; 2026-02-21T08:16:17.6699518Z // begin inline asm 2026-02-21T08:16:17.6699835Z cp.async.cg.shared.global [ %r100 + 0 ], [ %rd46 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6700194Z // end inline asm 2026-02-21T08:16:17.6700404Z add.s32 %r102, %r28, 16896; 2026-02-21T08:16:17.6700639Z // begin inline asm 2026-02-21T08:16:17.6700960Z cp.async.cg.shared.global [ %r102 + 0 ], [ %rd47 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6701312Z // end inline asm 2026-02-21T08:16:17.6701525Z add.s32 %r104, %r28, 17408; 2026-02-21T08:16:17.6701759Z // begin inline asm 2026-02-21T08:16:17.6702079Z cp.async.cg.shared.global [ %r104 + 0 ], [ %rd48 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6702428Z // end inline asm 2026-02-21T08:16:17.6702642Z add.s32 %r106, %r28, 17920; 2026-02-21T08:16:17.6702887Z // begin inline asm 2026-02-21T08:16:17.6703192Z cp.async.cg.shared.global [ %r106 + 0 ], [ %rd49 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6703552Z // end inline asm 2026-02-21T08:16:17.6703756Z add.s32 %r108, %r28, 18432; 2026-02-21T08:16:17.6703998Z // begin inline asm 2026-02-21T08:16:17.6704301Z cp.async.cg.shared.global [ %r108 + 0 ], [ %rd50 + 0 
], 0x10, %r29; 2026-02-21T08:16:17.6704659Z // end inline asm 2026-02-21T08:16:17.6704905Z add.s32 %r110, %r28, 18944; 2026-02-21T08:16:17.6705152Z // begin inline asm 2026-02-21T08:16:17.6705597Z cp.async.cg.shared.global [ %r110 + 0 ], [ %rd51 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6705959Z // end inline asm 2026-02-21T08:16:17.6706179Z add.s32 %r112, %r28, 19456; 2026-02-21T08:16:17.6706419Z // begin inline asm 2026-02-21T08:16:17.6706754Z cp.async.cg.shared.global [ %r112 + 0 ], [ %rd52 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6707138Z // end inline asm 2026-02-21T08:16:17.6707372Z add.s32 %r114, %r28, 19968; 2026-02-21T08:16:17.6707631Z // begin inline asm 2026-02-21T08:16:17.6707974Z cp.async.cg.shared.global [ %r114 + 0 ], [ %rd53 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6708362Z // end inline asm 2026-02-21T08:16:17.6708607Z cp.async.commit_group; 2026-02-21T08:16:17.6709078Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6709579Z add.s64 %rd54, %rd113, 128; 2026-02-21T08:16:17.6710133Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6710635Z add.s32 %r116, %r28, 30720; 2026-02-21T08:16:17.6710898Z // begin inline asm 2026-02-21T08:16:17.6711225Z cp.async.cg.shared.global [ %r116 + 0 ], [ %rd54 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6711612Z // end inline asm 2026-02-21T08:16:17.6711848Z cp.async.commit_group; 2026-02-21T08:16:17.6712294Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6712800Z add.s64 %rd55, %rd81, 160; 2026-02-21T08:16:17.6713058Z add.s64 %rd56, %rd85, 160; 2026-02-21T08:16:17.6713317Z add.s64 %rd57, %rd89, 160; 2026-02-21T08:16:17.6713569Z add.s64 %rd58, %rd93, 160; 2026-02-21T08:16:17.6713824Z add.s64 %rd59, %rd97, 160; 2026-02-21T08:16:17.6714075Z add.s64 %rd60, %rd101, 160; 2026-02-21T08:16:17.6714337Z add.s64 %rd61, %rd105, 160; 2026-02-21T08:16:17.6714586Z add.s64 %rd62, %rd109, 160; 
2026-02-21T08:16:17.6715111Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6715627Z bar.warp.sync -1; 2026-02-21T08:16:17.6715845Z add.s32 %r118, %r28, 20480; 2026-02-21T08:16:17.6716089Z // begin inline asm 2026-02-21T08:16:17.6716414Z cp.async.cg.shared.global [ %r118 + 0 ], [ %rd55 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6716789Z // end inline asm 2026-02-21T08:16:17.6717003Z add.s32 %r120, %r28, 20992; 2026-02-21T08:16:17.6717257Z // begin inline asm 2026-02-21T08:16:17.6717597Z cp.async.cg.shared.global [ %r120 + 0 ], [ %rd56 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6717991Z // end inline asm 2026-02-21T08:16:17.6718212Z add.s32 %r122, %r28, 21504; 2026-02-21T08:16:17.6718454Z // begin inline asm 2026-02-21T08:16:17.6718781Z cp.async.cg.shared.global [ %r122 + 0 ], [ %rd57 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6719156Z // end inline asm 2026-02-21T08:16:17.6719375Z add.s32 %r124, %r28, 22016; 2026-02-21T08:16:17.6719616Z // begin inline asm 2026-02-21T08:16:17.6719943Z cp.async.cg.shared.global [ %r124 + 0 ], [ %rd58 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6720312Z // end inline asm 2026-02-21T08:16:17.6720531Z add.s32 %r126, %r28, 22528; 2026-02-21T08:16:17.6720782Z // begin inline asm 2026-02-21T08:16:17.6721094Z cp.async.cg.shared.global [ %r126 + 0 ], [ %rd59 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6721466Z // end inline asm 2026-02-21T08:16:17.6721676Z add.s32 %r128, %r28, 23040; 2026-02-21T08:16:17.6721935Z // begin inline asm 2026-02-21T08:16:17.6722230Z cp.async.cg.shared.global [ %r128 + 0 ], [ %rd60 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6722586Z // end inline asm 2026-02-21T08:16:17.6722787Z add.s32 %r130, %r28, 23552; 2026-02-21T08:16:17.6723028Z // begin inline asm 2026-02-21T08:16:17.6723326Z cp.async.cg.shared.global [ %r130 + 0 ], [ %rd61 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6723682Z // end inline asm 2026-02-21T08:16:17.6723899Z add.s32 %r132, %r28, 24064; 2026-02-21T08:16:17.6724130Z // begin 
inline asm 2026-02-21T08:16:17.6724445Z cp.async.cg.shared.global [ %r132 + 0 ], [ %rd62 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6724977Z // end inline asm 2026-02-21T08:16:17.6725224Z cp.async.commit_group; 2026-02-21T08:16:17.6725683Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6726206Z add.s64 %rd63, %rd113, 160; 2026-02-21T08:16:17.6726647Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6727139Z add.s32 %r134, %r28, 31232; 2026-02-21T08:16:17.6727400Z // begin inline asm 2026-02-21T08:16:17.6727727Z cp.async.cg.shared.global [ %r134 + 0 ], [ %rd63 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6728116Z // end inline asm 2026-02-21T08:16:17.6728343Z cp.async.commit_group; 2026-02-21T08:16:17.6728799Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6729296Z cp.async.wait_group 10; 2026-02-21T08:16:17.6729559Z bar.warp.sync -1; 2026-02-21T08:16:17.6730106Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6730608Z elect.sync %r188|%p4, -1; 2026-02-21T08:16:17.6730885Z bfe.u32 %r189, %r26, 4, 14; 2026-02-21T08:16:17.6731144Z cvt.u64.u32 %rd114, %r189; 2026-02-21T08:16:17.6731440Z or.b64 %rd64, %rd114, -4611685949691133952; 2026-02-21T08:16:17.6731751Z bfe.u32 %r190, %r187, 4, 14; 2026-02-21T08:16:17.6732021Z cvt.u64.u32 %rd115, %r190; 2026-02-21T08:16:17.6732298Z or.b64 %rd65, %rd115, -4611685949705814016; 2026-02-21T08:16:17.6732603Z mov.b32 %r137, 134479888; 2026-02-21T08:16:17.6732856Z mov.pred %p3, 0; 2026-02-21T08:16:17.6733079Z // begin inline asm 2026-02-21T08:16:17.6733466Z @%p4 tcgen05.mma.cta_group::1.kind::f16 [ %r136 + 0 ], %rd64, %rd65, %r137, %p3; 2026-02-21T08:16:17.6733857Z // end inline asm 2026-02-21T08:16:17.6734068Z add.s32 %r191, %r26, 31744; 2026-02-21T08:16:17.6734308Z cvt.u64.u32 %rd66, %r191; 2026-02-21T08:16:17.6734550Z // begin inline asm 
2026-02-21T08:16:17.6734985Z @%p3 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T08:16:17.6735379Z // end inline asm 2026-02-21T08:16:17.6735813Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6736320Z add.s64 %rd67, %rd81, 192; 2026-02-21T08:16:17.6736590Z add.s64 %rd68, %rd85, 192; 2026-02-21T08:16:17.6736842Z add.s64 %rd69, %rd89, 192; 2026-02-21T08:16:17.6737098Z add.s64 %rd70, %rd93, 192; 2026-02-21T08:16:17.6737347Z add.s64 %rd71, %rd97, 192; 2026-02-21T08:16:17.6737607Z add.s64 %rd72, %rd101, 192; 2026-02-21T08:16:17.6737865Z add.s64 %rd73, %rd105, 192; 2026-02-21T08:16:17.6738128Z add.s64 %rd74, %rd109, 192; 2026-02-21T08:16:17.6738591Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6739086Z bar.warp.sync -1; 2026-02-21T08:16:17.6739329Z // begin inline asm 2026-02-21T08:16:17.6739669Z cp.async.cg.shared.global [ %r28 + 0 ], [ %rd67 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6740066Z // end inline asm 2026-02-21T08:16:17.6740289Z // begin inline asm 2026-02-21T08:16:17.6740629Z cp.async.cg.shared.global [ %r30 + 0 ], [ %rd68 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6741017Z // end inline asm 2026-02-21T08:16:17.6741241Z // begin inline asm 2026-02-21T08:16:17.6741576Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd69 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6741946Z // end inline asm 2026-02-21T08:16:17.6742172Z // begin inline asm 2026-02-21T08:16:17.6742491Z cp.async.cg.shared.global [ %r34 + 0 ], [ %rd70 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6742872Z // end inline asm 2026-02-21T08:16:17.6743090Z // begin inline asm 2026-02-21T08:16:17.6743421Z cp.async.cg.shared.global [ %r36 + 0 ], [ %rd71 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6743797Z // end inline asm 2026-02-21T08:16:17.6744021Z // begin inline asm 2026-02-21T08:16:17.6744351Z cp.async.cg.shared.global [ %r38 + 0 ], [ %rd72 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6744881Z // end 
inline asm 2026-02-21T08:16:17.6745108Z // begin inline asm 2026-02-21T08:16:17.6745426Z cp.async.cg.shared.global [ %r40 + 0 ], [ %rd73 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6745806Z // end inline asm 2026-02-21T08:16:17.6746025Z // begin inline asm 2026-02-21T08:16:17.6746355Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd74 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6746721Z // end inline asm 2026-02-21T08:16:17.6746959Z cp.async.commit_group; 2026-02-21T08:16:17.6747415Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6747937Z add.s64 %rd75, %rd113, 192; 2026-02-21T08:16:17.6748411Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6748901Z // begin inline asm 2026-02-21T08:16:17.6749236Z cp.async.cg.shared.global [ %r44 + 0 ], [ %rd75 + 0 ], 0x10, %r29; 2026-02-21T08:16:17.6749693Z // end inline asm 2026-02-21T08:16:17.6749932Z cp.async.commit_group; 2026-02-21T08:16:17.6750387Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6750898Z add.s32 %r192, %r5, %r173; 2026-02-21T08:16:17.6751173Z cvt.u64.u32 %rd1, %r192; 2026-02-21T08:16:17.6751412Z add.s32 %r193, %r4, %r165; 2026-02-21T08:16:17.6751660Z cvt.u64.u32 %rd2, %r193; 2026-02-21T08:16:17.6751885Z mov.b32 %r337, 0; 2026-02-21T08:16:17.6752097Z mov.b64 %rd184, 0; 2026-02-21T08:16:17.6752306Z mov.b32 %r338, %r337; 2026-02-21T08:16:17.6752603Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:16:17.6752988Z // => This Inner Loop Header: Depth=2 2026-02-21T08:16:17.6753332Z add.s64 %rd4, %rd184, 16; 2026-02-21T08:16:17.6753587Z setp.lt.u64 %p9, %rd4, 1952; 2026-02-21T08:16:17.6753837Z add.s32 %r214, %r337, 1; 2026-02-21T08:16:17.6754085Z setp.gt.s32 %p10, %r214, 5; 2026-02-21T08:16:17.6754347Z selp.b32 %r337, 0, %r214, %p10; 2026-02-21T08:16:17.6754884Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6755396Z 
cp.async.wait_group 10; 2026-02-21T08:16:17.6755667Z bar.warp.sync -1; 2026-02-21T08:16:17.6755901Z shl.b32 %r215, %r337, 12; 2026-02-21T08:16:17.6756167Z add.s32 %r217, %r26, %r215; 2026-02-21T08:16:17.6756638Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6757136Z shl.b32 %r218, %r337, 9; 2026-02-21T08:16:17.6757401Z add.s32 %r219, %r26, %r218; 2026-02-21T08:16:17.6757672Z add.s32 %r220, %r219, 28672; 2026-02-21T08:16:17.6758139Z .loc 1 53 52 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:53:52 2026-02-21T08:16:17.6758663Z setp.eq.b64 %p11, %rd184, 2016; 2026-02-21T08:16:17.6758973Z elect.sync %r221|%p7, -1; 2026-02-21T08:16:17.6759244Z bfe.u32 %r222, %r217, 4, 14; 2026-02-21T08:16:17.6759528Z cvt.u64.u32 %rd128, %r222; 2026-02-21T08:16:17.6759830Z or.b64 %rd116, %rd128, -4611685949691133952; 2026-02-21T08:16:17.6760151Z bfe.u32 %r223, %r220, 4, 14; 2026-02-21T08:16:17.6760426Z cvt.u64.u32 %rd129, %r223; 2026-02-21T08:16:17.6760710Z or.b64 %rd117, %rd129, -4611685949705814016; 2026-02-21T08:16:17.6761034Z mov.pred %p6, -1; 2026-02-21T08:16:17.6761271Z // begin inline asm 2026-02-21T08:16:17.6761665Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r136 + 0 ], %rd116, %rd117, %r137, %p6; 2026-02-21T08:16:17.6762097Z // end inline asm 2026-02-21T08:16:17.6762339Z and.pred %p8, %p11, %p7; 2026-02-21T08:16:17.6762601Z // begin inline asm 2026-02-21T08:16:17.6762949Z @%p8 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T08:16:17.6763341Z // end inline asm 2026-02-21T08:16:17.6763756Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6764267Z add.s32 %r225, %r338, 1; 2026-02-21T08:16:17.6764522Z setp.gt.s32 %p12, %r225, 5; 2026-02-21T08:16:17.6765509Z selp.b32 %r338, 0, %r225, %p12; 2026-02-21T08:16:17.6765984Z .loc 1 51 60 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:60 2026-02-21T08:16:17.6766482Z add.s64 %rd130, %rd2, 
%rd184; 2026-02-21T08:16:17.6766736Z cvt.u32.u64 %r226, %rd130; 2026-02-21T08:16:17.6766972Z add.s32 %r227, %r226, 112; 2026-02-21T08:16:17.6767416Z .loc 1 51 32 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:32 2026-02-21T08:16:17.6767904Z mad.wide.s32 %rd119, %r227, 2, %rd6; 2026-02-21T08:16:17.6768203Z add.s32 %r228, %r226, 32880; 2026-02-21T08:16:17.6768467Z mad.wide.s32 %rd120, %r228, 2, %rd6; 2026-02-21T08:16:17.6768762Z add.s32 %r229, %r226, 65648; 2026-02-21T08:16:17.6769027Z mad.wide.s32 %rd121, %r229, 2, %rd6; 2026-02-21T08:16:17.6769298Z add.s32 %r230, %r226, 98416; 2026-02-21T08:16:17.6769560Z mad.wide.s32 %rd122, %r230, 2, %rd6; 2026-02-21T08:16:17.6769972Z add.s32 %r231, %r226, 131184; 2026-02-21T08:16:17.6770244Z mad.wide.s32 %rd123, %r231, 2, %rd6; 2026-02-21T08:16:17.6770531Z add.s32 %r232, %r226, 163952; 2026-02-21T08:16:17.6770798Z mad.wide.s32 %rd124, %r232, 2, %rd6; 2026-02-21T08:16:17.6771073Z add.s32 %r233, %r226, 196720; 2026-02-21T08:16:17.6771349Z mad.wide.s32 %rd125, %r233, 2, %rd6; 2026-02-21T08:16:17.6771641Z add.s32 %r234, %r226, 229488; 2026-02-21T08:16:17.6771896Z mad.wide.s32 %rd126, %r234, 2, %rd6; 2026-02-21T08:16:17.6772393Z .loc 1 51 85 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:51:85 2026-02-21T08:16:17.6772889Z shl.b32 %r235, %r338, 12; 2026-02-21T08:16:17.6773158Z add.s32 %r236, %r26, %r235; 2026-02-21T08:16:17.6773411Z bar.warp.sync -1; 2026-02-21T08:16:17.6773647Z add.s32 %r196, %r236, %r7; 2026-02-21T08:16:17.6773893Z selp.b32 %r197, 16, 0, %p9; 2026-02-21T08:16:17.6774149Z // begin inline asm 2026-02-21T08:16:17.6774489Z cp.async.cg.shared.global [ %r196 + 0 ], [ %rd119 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6774979Z // end inline asm 2026-02-21T08:16:17.6775207Z add.s32 %r198, %r196, 512; 2026-02-21T08:16:17.6775448Z // begin inline asm 2026-02-21T08:16:17.6775787Z cp.async.cg.shared.global [ %r198 + 0 ], [ %rd120 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6776171Z // end inline asm 
2026-02-21T08:16:17.6776409Z add.s32 %r200, %r196, 1024; 2026-02-21T08:16:17.6776660Z // begin inline asm 2026-02-21T08:16:17.6777000Z cp.async.cg.shared.global [ %r200 + 0 ], [ %rd121 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6777377Z // end inline asm 2026-02-21T08:16:17.6777604Z add.s32 %r202, %r196, 1536; 2026-02-21T08:16:17.6777860Z // begin inline asm 2026-02-21T08:16:17.6778183Z cp.async.cg.shared.global [ %r202 + 0 ], [ %rd122 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6778575Z // end inline asm 2026-02-21T08:16:17.6778792Z add.s32 %r204, %r196, 2048; 2026-02-21T08:16:17.6779051Z // begin inline asm 2026-02-21T08:16:17.6779386Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd123 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6779780Z // end inline asm 2026-02-21T08:16:17.6779999Z add.s32 %r206, %r196, 2560; 2026-02-21T08:16:17.6780258Z // begin inline asm 2026-02-21T08:16:17.6780597Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd124 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6780979Z // end inline asm 2026-02-21T08:16:17.6781209Z add.s32 %r208, %r196, 3072; 2026-02-21T08:16:17.6781457Z // begin inline asm 2026-02-21T08:16:17.6781794Z cp.async.cg.shared.global [ %r208 + 0 ], [ %rd125 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6782171Z // end inline asm 2026-02-21T08:16:17.6782397Z add.s32 %r210, %r196, 3584; 2026-02-21T08:16:17.6782646Z // begin inline asm 2026-02-21T08:16:17.6782975Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd126 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6783357Z // end inline asm 2026-02-21T08:16:17.6783580Z cp.async.commit_group; 2026-02-21T08:16:17.6784041Z .loc 1 52 34 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:34 2026-02-21T08:16:17.6784663Z add.s64 %rd131, %rd1, %rd184; 2026-02-21T08:16:17.6784997Z cvt.u32.u64 %r237, %rd131; 2026-02-21T08:16:17.6785269Z mad.wide.s32 %rd127, %r237, 2, %rd7; 2026-02-21T08:16:17.6785762Z .loc 1 52 87 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:52:87 2026-02-21T08:16:17.6786253Z shl.b32 
%r238, %r338, 9; 2026-02-21T08:16:17.6786511Z add.s32 %r212, %r44, %r238; 2026-02-21T08:16:17.6786769Z // begin inline asm 2026-02-21T08:16:17.6787103Z cp.async.cg.shared.global [ %r212 + 0 ], [ %rd127 + 0 ], 0x10, %r197; 2026-02-21T08:16:17.6787497Z // end inline asm 2026-02-21T08:16:17.6787722Z cp.async.commit_group; 2026-02-21T08:16:17.6788174Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6788676Z setp.lt.u64 %p13, %rd4, 2032; 2026-02-21T08:16:17.6788952Z mov.b64 %rd184, %rd4; 2026-02-21T08:16:17.6789192Z @%p13 bra $L__BB0_6; 2026-02-21T08:16:17.6789574Z // %bb.7: // %.loopexit 2026-02-21T08:16:17.6789964Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6790319Z cp.async.wait_group 0; 2026-02-21T08:16:17.6790588Z bar.warp.sync -1; 2026-02-21T08:16:17.6790821Z barrier.sync 1; 2026-02-21T08:16:17.6791062Z bra.uni $L__BB0_2; 2026-02-21T08:16:17.6791365Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6791939Z .loc 1 46 79 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:46:79 2026-02-21T08:16:17.6792432Z barrier.sync 1; 2026-02-21T08:16:17.6792660Z barrier.sync 1; 2026-02-21T08:16:17.6792888Z bra.uni $L__BB0_2; 2026-02-21T08:16:17.6793183Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:17.6793726Z .loc 1 19 0 // cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py:19 2026-02-21T08:16:17.6794203Z barrier.sync 1; 2026-02-21T08:16:17.6794439Z barrier.sync 1; 2026-02-21T08:16:17.6794639Z bra.uni $L__BB0_2; 2026-02-21T08:16:17.6794908Z $L__tmp0: 2026-02-21T08:16:17.6795092Z $L__func_end0: 2026-02-21T08:16:17.6795340Z // -- End function 2026-02-21T08:16:17.6795631Z } 2026-02-21T08:16:17.6796071Z .file 1 "/tmp/torchinductor_root/jl/cjlne57mi724ytrjawju3ux7aasrpelrqk4kzveccbz22uj4dvzb.py" 2026-02-21T08:16:17.6796639Z .section .debug_abbrev 2026-02-21T08:16:17.6796870Z { 2026-02-21T08:16:17.6797121Z .b8 1 // Abbreviation Code 2026-02-21T08:16:17.6797497Z .b8 17 // 
DW_TAG_compile_unit 2026-02-21T08:16:17.6797876Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:16:17.6798228Z .b8 37 // DW_AT_producer 2026-02-21T08:16:17.6798587Z .b8 8 // DW_FORM_string 2026-02-21T08:16:17.6798944Z .b8 19 // DW_AT_language 2026-02-21T08:16:17.6799290Z .b8 5 // DW_FORM_data2 2026-02-21T08:16:17.6799635Z .b8 3 // DW_AT_name 2026-02-21T08:16:17.6799977Z .b8 8 // DW_FORM_string 2026-02-21T08:16:17.6800332Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:16:17.6800682Z .b8 6 // DW_FORM_data4 2026-02-21T08:16:17.6801033Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:16:17.6801385Z .b8 8 // DW_FORM_string 2026-02-21T08:16:17.6801720Z .b8 0 // EOM(1) 2026-02-21T08:16:17.6802052Z .b8 0 // EOM(2) 2026-02-21T08:16:17.6802367Z .b8 0 // EOM(3) 2026-02-21T08:16:17.6802654Z } 2026-02-21T08:16:17.6802849Z .section .debug_info 2026-02-21T08:16:17.6803080Z { 2026-02-21T08:16:17.6803315Z .b32 104 // Length of Unit 2026-02-21T08:16:17.6803814Z .b8 2 // DWARF version number 2026-02-21T08:16:17.6804150Z .b8 0 2026-02-21T08:16:17.6804451Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:16:17.6804960Z .b8 8 // Address Size (in bytes) 2026-02-21T08:16:17.6805371Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:16:17.6805792Z .b8 116 // DW_AT_producer 2026-02-21T08:16:17.6806100Z .b8 114 2026-02-21T08:16:17.6806296Z .b8 105 2026-02-21T08:16:17.6806477Z .b8 116 2026-02-21T08:16:17.6806666Z .b8 111 2026-02-21T08:16:17.6806837Z .b8 110 2026-02-21T08:16:17.6806999Z .b8 0 2026-02-21T08:16:17.6820384Z .b8 2 // DW_AT_language 2026-02-21T08:16:17.6820873Z .b8 0 2026-02-21T08:16:17.6821124Z .b8 99 // DW_AT_name 2026-02-21T08:16:17.6821676Z .b8 106 2026-02-21T08:16:17.6821866Z .b8 108 2026-02-21T08:16:17.6822070Z .b8 110 2026-02-21T08:16:17.6822260Z .b8 101 2026-02-21T08:16:17.6822435Z .b8 53 2026-02-21T08:16:17.6822625Z .b8 55 2026-02-21T08:16:17.6822801Z .b8 109 2026-02-21T08:16:17.6822989Z .b8 105 2026-02-21T08:16:17.6823168Z .b8 55 2026-02-21T08:16:17.6823354Z .b8 50 2026-02-21T08:16:17.6823526Z .b8 52 2026-02-21T08:16:17.6823712Z .b8 121 2026-02-21T08:16:17.6823882Z .b8 116 2026-02-21T08:16:17.6824065Z .b8 114 2026-02-21T08:16:17.6824239Z .b8 106 2026-02-21T08:16:17.6824421Z .b8 97 2026-02-21T08:16:17.6824598Z .b8 119 2026-02-21T08:16:17.6824842Z .b8 106 2026-02-21T08:16:17.6825021Z .b8 117 2026-02-21T08:16:17.6825209Z .b8 51 2026-02-21T08:16:17.6825393Z .b8 117 2026-02-21T08:16:17.6825570Z .b8 120 2026-02-21T08:16:17.6825758Z .b8 55 2026-02-21T08:16:17.6825938Z .b8 97 2026-02-21T08:16:17.6826125Z .b8 97 2026-02-21T08:16:17.6826301Z .b8 115 2026-02-21T08:16:17.6826486Z .b8 114 2026-02-21T08:16:17.6826661Z .b8 112 2026-02-21T08:16:17.6826860Z .b8 101 2026-02-21T08:16:17.6827044Z .b8 108 2026-02-21T08:16:17.6827230Z .b8 114 2026-02-21T08:16:17.6827404Z .b8 113 2026-02-21T08:16:17.6827586Z .b8 107 2026-02-21T08:16:17.6827762Z .b8 52 2026-02-21T08:16:17.6827948Z .b8 107 2026-02-21T08:16:17.6828134Z .b8 122 2026-02-21T08:16:17.6828307Z .b8 118 2026-02-21T08:16:17.6828492Z .b8 101 2026-02-21T08:16:17.6828665Z .b8 99 
2026-02-21T08:16:17.6828853Z .b8 99 2026-02-21T08:16:17.6829027Z .b8 98 2026-02-21T08:16:17.6829211Z .b8 122 2026-02-21T08:16:17.6829385Z .b8 50 2026-02-21T08:16:17.6829570Z .b8 50 2026-02-21T08:16:17.6829745Z .b8 117 2026-02-21T08:16:17.6829931Z .b8 106 2026-02-21T08:16:17.6830106Z .b8 52 2026-02-21T08:16:17.6830292Z .b8 100 2026-02-21T08:16:17.6830467Z .b8 118 2026-02-21T08:16:17.6830653Z .b8 122 2026-02-21T08:16:17.6830827Z .b8 98 2026-02-21T08:16:17.6830911Z .b8 46 2026-02-21T08:16:17.6830989Z .b8 112 2026-02-21T08:16:17.6831065Z .b8 121 2026-02-21T08:16:17.6831151Z .b8 0 2026-02-21T08:16:17.6831314Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:16:17.6831453Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:16:17.6831530Z .b8 116 2026-02-21T08:16:17.6831615Z .b8 109 2026-02-21T08:16:17.6831689Z .b8 112 2026-02-21T08:16:17.6831766Z .b8 47 2026-02-21T08:16:17.6831852Z .b8 116 2026-02-21T08:16:17.6831931Z .b8 111 2026-02-21T08:16:17.6832006Z .b8 114 2026-02-21T08:16:17.6832082Z .b8 99 2026-02-21T08:16:17.6832168Z .b8 104 2026-02-21T08:16:17.6832243Z .b8 105 2026-02-21T08:16:17.6832318Z .b8 110 2026-02-21T08:16:17.6832405Z .b8 100 2026-02-21T08:16:17.6832478Z .b8 117 2026-02-21T08:16:17.6832556Z .b8 99 2026-02-21T08:16:17.6832632Z .b8 116 2026-02-21T08:16:17.6832716Z .b8 111 2026-02-21T08:16:17.6832792Z .b8 114 2026-02-21T08:16:17.6832867Z .b8 95 2026-02-21T08:16:17.6832952Z .b8 114 2026-02-21T08:16:17.6833028Z .b8 111 2026-02-21T08:16:17.6833102Z .b8 111 2026-02-21T08:16:17.6833177Z .b8 116 2026-02-21T08:16:17.6833260Z .b8 47 2026-02-21T08:16:17.6833334Z .b8 106 2026-02-21T08:16:17.6833414Z .b8 108 2026-02-21T08:16:17.6833488Z .b8 0 2026-02-21T08:16:17.6833736Z } 2026-02-21T08:16:17.6833847Z .section .debug_macinfo { } 2026-02-21T08:16:17.6833857Z 2026-02-21T08:16:17.6833988Z ================================================================ 2026-02-21T08:16:17.6834181Z please share the reproducer above with Triton project. 
2026-02-21T08:16:29.5762835Z 2026-02-21T08:16:29.5763677Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 7.1 configs/s 2026-02-21T08:16:29.5778650Z [17s] Adaptive compile timeout: 30s (90% percentile=2.0s, bounds=[30.0s, 30s]) 2026-02-21T08:16:29.9462024Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 2410.3 configs/s 2026-02-21T08:16:30.0062114Z [18s] Initial random population of 100, 5 starting points: 2026-02-21T08:16:30.0066640Z error=12 2026-02-21T08:16:30.0068857Z ok=88 2026-02-21T08:16:30.0069059Z min=0.1598 2026-02-21T08:16:30.0069227Z mid=2.4647 2026-02-21T08:16:30.0069398Z max=267.7432 2026-02-21T08:16:30.0070113Z best={'block_sizes': [64, 128, 16], 2026-02-21T08:16:30.0070394Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:16:30.0070614Z 'l2_groupings': [4], 2026-02-21T08:16:30.0070804Z 'load_eviction_policies': ['', ''], 2026-02-21T08:16:30.0071021Z 'loop_orders': [[1, 0]], 2026-02-21T08:16:30.0071204Z 'num_stages': 8, 2026-02-21T08:16:30.0071410Z 'num_warps': 2, 2026-02-21T08:16:30.0077608Z 'pid_type': 'flat', 2026-02-21T08:16:30.0079876Z 'range_flattens': [None, None], 2026-02-21T08:16:30.0080153Z 'range_multi_buffers': [None, None], 2026-02-21T08:16:30.0085486Z 'range_num_stages': [0, 0], 2026-02-21T08:16:30.0087943Z 'range_unroll_factors': [0, 0], 2026-02-21T08:16:30.0088211Z 'range_warp_specializes': [None, None]} 2026-02-21T08:16:30.0088436Z [18s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:16:31.1495989Z [19s] Generation 1 starting: 79 neighbors, 5 active search path(s) 2026-02-21T08:16:34.3649391Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81/81 25.9 configs/s 2026-02-21T08:16:38.7180192Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 81/81 18.8 configs/s 2026-02-21T08:16:39.7422200Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 959.5 2026-02-21T08:16:39.7422858Z configs/s 2026-02-21T08:16:39.8094845Z [27s] Generation 1 complete: 
2026-02-21T08:16:39.8098522Z error=9 2026-02-21T08:16:39.8102382Z ok=76 2026-02-21T08:16:39.8106511Z min=0.0758 2026-02-21T08:16:39.8110484Z mid=0.3153 2026-02-21T08:16:39.8114431Z max=6.1194 2026-02-21T08:16:39.8114634Z best={'block_sizes': [64, 128, 64], 2026-02-21T08:16:39.8114931Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:16:39.8115150Z 'l2_groupings': [4], 2026-02-21T08:16:39.8115319Z 'load_eviction_policies': ['', 'last'], 2026-02-21T08:16:39.8115523Z 'loop_orders': [[1, 0]], 2026-02-21T08:16:39.8115683Z 'num_stages': 7, 2026-02-21T08:16:39.8115841Z 'num_warps': 8, 2026-02-21T08:16:39.8116013Z 'pid_type': 'flat', 2026-02-21T08:16:39.8116196Z 'range_flattens': [None, None], 2026-02-21T08:16:39.8116376Z 'range_multi_buffers': [None, None], 2026-02-21T08:16:39.8116566Z 'range_num_stages': [0, 0], 2026-02-21T08:16:39.8116730Z 'range_unroll_factors': [0, 0], 2026-02-21T08:16:39.8116917Z 'range_warp_specializes': [None, None]} 2026-02-21T08:16:39.8117131Z [27s] Fitting surrogate: 185 points, 185 targets 2026-02-21T08:16:41.0047615Z [29s] Generation 2 starting: 84 neighbors, 5 active search path(s) 2026-02-21T08:16:46.6102850Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 5.6 configs/s 2026-02-21T08:16:48.9573481Z 2026-02-21T08:16:48.9573534Z 2026-02-21T08:16:48.9573966Z ================================================================ 2026-02-21T08:16:48.9574291Z Internal Triton PTX codegen error 2026-02-21T08:16:48.9574538Z `ptxas` stderr: 2026-02-21T08:16:48.9575368Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 193 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:16:48.9576336Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:48.9576503Z 2026-02-21T08:16:48.9577012Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplm9uxi26.ptx -o /tmp/tmplm9uxi26.ptx.o 2026-02-21T08:16:48.9577523Z 2026-02-21T08:16:48.9577527Z 2026-02-21T08:16:48.9577601Z // 2026-02-21T08:16:48.9577773Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:16:48.9577977Z // 2026-02-21T08:16:48.9578064Z 2026-02-21T08:16:48.9578154Z .version 8.7 2026-02-21T08:16:48.9578298Z .target sm_100a 2026-02-21T08:16:48.9578462Z .address_size 64 2026-02-21T08:16:48.9578544Z 2026-02-21T08:16:48.9578677Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:16:48.9578964Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:16:48.9579199Z // @_helion_matmul 2026-02-21T08:16:48.9579573Z .visible .entry _helion_matmul( 2026-02-21T08:16:48.9579817Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:16:48.9580065Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:16:48.9580309Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:16:48.9580545Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:16:48.9580792Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:16:48.9580982Z ) 2026-02-21T08:16:48.9581106Z .reqntid 384 2026-02-21T08:16:48.9581236Z .maxnreg 32 2026-02-21T08:16:48.9581353Z { 2026-02-21T08:16:48.9581481Z .reg .pred %p<105>; 2026-02-21T08:16:48.9581645Z .reg .b16 %rs<3>; 2026-02-21T08:16:48.9581784Z .reg .b32 %r<1612>; 2026-02-21T08:16:48.9581924Z .reg .b64 %rd<685>; 2026-02-21T08:16:48.9582180Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19:0 2026-02-21T08:16:48.9582466Z $L__func_begin0: 2026-02-21T08:16:48.9582708Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19:0 
2026-02-21T08:16:48.9582943Z 2026-02-21T08:16:48.9582995Z // %bb.0: 2026-02-21T08:16:48.9583142Z ld.param.b64 %rd9, [_helion_matmul_param_3]; 2026-02-21T08:16:48.9583328Z $L__tmp0: 2026-02-21T08:16:48.9583568Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19 2026-02-21T08:16:48.9583842Z mov.u32 %r1, %tid.x; 2026-02-21T08:16:48.9583982Z shr.u32 %r2, %r1, 5; 2026-02-21T08:16:48.9584138Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:16:48.9584320Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:16:48.9584468Z @%p2 bra $L__BB0_14; 2026-02-21T08:16:48.9584610Z bra.uni $L__BB0_1; 2026-02-21T08:16:48.9584802Z $L__BB0_14: 2026-02-21T08:16:48.9585045Z .loc 1 0 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:0:0 2026-02-21T08:16:48.9585351Z ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T08:16:48.9585643Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19 2026-02-21T08:16:48.9585941Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:16:48.9586147Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T08:16:48.9586328Z mov.b32 %r253, global_smem; 2026-02-21T08:16:48.9586492Z // begin inline asm 2026-02-21T08:16:48.9586754Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r253], 256; 2026-02-21T08:16:48.9587010Z // end inline asm 2026-02-21T08:16:48.9587155Z bar.sync 0, 128; 2026-02-21T08:16:48.9587304Z ld.shared.b32 %r1604, [global_smem]; 2026-02-21T08:16:48.9587477Z bar.sync 0, 128; 2026-02-21T08:16:48.9587604Z // begin inline asm 2026-02-21T08:16:48.9587810Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:16:48.9588033Z // end inline asm 2026-02-21T08:16:48.9588280Z .loc 1 21 67 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:21:67 2026-02-21T08:16:48.9588579Z mov.u32 %r26, %ctaid.x; 2026-02-21T08:16:48.9588817Z mov.u32 %r262, %ctaid.y; 2026-02-21T08:16:48.9588976Z mov.u32 %r263, %ctaid.z; 2026-02-21T08:16:48.9589126Z mov.u32 %r264, %nctaid.x; 
2026-02-21T08:16:48.9589289Z mov.u32 %r265, %nctaid.y; 2026-02-21T08:16:48.9589459Z mad.lo.s32 %r266, %r263, %r265, %r262; 2026-02-21T08:16:48.9589640Z mad.lo.s32 %r267, %r266, %r264, %r26; 2026-02-21T08:16:48.9589821Z shl.b32 %r268, %r267, 7; 2026-02-21T08:16:48.9589974Z cvt.s64.s32 %rd139, %r268; 2026-02-21T08:16:48.9590141Z add.s64 %rd136, %rd9, %rd139; 2026-02-21T08:16:48.9590304Z shl.b32 %r269, %r1, 2; 2026-02-21T08:16:48.9590473Z add.s32 %r254, %r253, %r269; 2026-02-21T08:16:48.9590628Z mov.b32 %r271, 0; 2026-02-21T08:16:48.9590781Z // begin inline asm 2026-02-21T08:16:48.9590935Z @%p29 st.shared.b32 [ %r254 + 0 ], %r271; 2026-02-21T08:16:48.9591122Z // end inline asm 2026-02-21T08:16:48.9591279Z bar.warp.sync -1; 2026-02-21T08:16:48.9591433Z setp.eq.b32 %p66, %r1, 0; 2026-02-21T08:16:48.9591679Z cvt.u64.u32 %rd121, %r253; 2026-02-21T08:16:48.9591839Z // begin inline asm 2026-02-21T08:16:48.9592106Z @%p66 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd121 + 0 ], %rd7; 2026-02-21T08:16:48.9592389Z // end inline asm 2026-02-21T08:16:48.9592526Z // begin inline asm 2026-02-21T08:16:48.9592744Z @%p66 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1; 2026-02-21T08:16:48.9592996Z // end inline asm 2026-02-21T08:16:48.9593135Z mov.b32 %r256, 32; 2026-02-21T08:16:48.9593268Z // begin inline asm 2026-02-21T08:16:48.9593506Z @%p66 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r256; 2026-02-21T08:16:48.9593772Z // end inline asm 2026-02-21T08:16:48.9593910Z mov.b32 %r257, 128; 2026-02-21T08:16:48.9594044Z // begin inline asm 2026-02-21T08:16:48.9594274Z @%p66 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r257; 2026-02-21T08:16:48.9594534Z // end inline asm 2026-02-21T08:16:48.9594664Z mov.b32 %r258, 2048; 2026-02-21T08:16:48.9594844Z // begin inline asm 2026-02-21T08:16:48.9595081Z @%p66 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r258; 
2026-02-21T08:16:48.9595356Z // end inline asm 2026-02-21T08:16:48.9595483Z // begin inline asm 2026-02-21T08:16:48.9595725Z @%p66 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r258; 2026-02-21T08:16:48.9595992Z // end inline asm 2026-02-21T08:16:48.9596129Z mov.b64 %rd129, 4096; 2026-02-21T08:16:48.9596275Z // begin inline asm 2026-02-21T08:16:48.9596528Z @%p66 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd121 + 0 ], 0x0, %rd129; 2026-02-21T08:16:48.9597316Z [37s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:16:48.9598497Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:16:48.9599633Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:16:48.9599871Z `ptxas` stderr: 2026-02-21T08:16:48.9600293Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 193 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:16:48.9600759Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:48.9600903Z 2026-02-21T08:16:48.9601302Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplm9uxi26.ptx -o /tmp/tmplm9uxi26.ptx.o 2026-02-21T08:16:48.9601742Z 2026-02-21T08:16:48.9601868Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:16:48.9602173Z // end inline asm 2026-02-21T08:16:48.9602309Z mov.b32 %r260, 1; 2026-02-21T08:16:48.9602441Z // begin inline asm 2026-02-21T08:16:48.9602707Z @%p66 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r260; 2026-02-21T08:16:48.9602987Z // end inline asm 2026-02-21T08:16:48.9603124Z // begin inline asm 2026-02-21T08:16:48.9603374Z @%p66 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r260; 2026-02-21T08:16:48.9603719Z // end inline asm 2026-02-21T08:16:48.9603899Z // begin inline asm 2026-02-21T08:16:48.9604131Z @%p66 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x6; 2026-02-21T08:16:48.9604392Z // end inline asm 2026-02-21T08:16:48.9604521Z // begin inline asm 2026-02-21T08:16:48.9604810Z @%p66 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0; 2026-02-21T08:16:48.9605080Z // end inline asm 2026-02-21T08:16:48.9605285Z // begin inline asm 2026-02-21T08:16:48.9605527Z @%p66 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x2; 2026-02-21T08:16:48.9605781Z // end inline asm 2026-02-21T08:16:48.9605915Z // begin inline asm 2026-02-21T08:16:48.9606130Z @%p66 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0; 2026-02-21T08:16:48.9606387Z // end inline asm 2026-02-21T08:16:48.9606513Z // begin inline asm 2026-02-21T08:16:48.9606855Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd136 + 0 ], [ %rd121 + 0 ], 0x80; 2026-02-21T08:16:48.9607238Z // end inline asm 2026-02-21T08:16:48.9607368Z // begin inline asm 2026-02-21T08:16:48.9607578Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd136 + 0 ], 0x80; 2026-02-21T08:16:48.9607829Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T08:16:48.9608019Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:16:48.9608186Z // end inline asm 2026-02-21T08:16:48.9608320Z bar.sync 0, 
128; 2026-02-21T08:16:48.9608564Z .loc 1 30 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:30:52 2026-02-21T08:16:48.9608878Z setp.gt.u32 %p49, %r26, 255; 2026-02-21T08:16:48.9609051Z @%p49 bra $L__BB0_16; 2026-02-21T08:16:48.9609214Z // %bb.15: // %.lr.ph 2026-02-21T08:16:48.9609522Z .loc 1 0 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:0:52 2026-02-21T08:16:48.9609831Z ld.param.b64 %rd8, [_helion_matmul_param_2]; 2026-02-21T08:16:48.9610136Z .loc 1 44 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:45 2026-02-21T08:16:48.9610425Z shr.u32 %r1141, %r1, 4; 2026-02-21T08:16:48.9610693Z .loc 1 43 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:43:27 2026-02-21T08:16:48.9610987Z shl.b32 %r1142, %r26, 4; 2026-02-21T08:16:48.9611248Z .loc 1 44 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:45 2026-02-21T08:16:48.9611546Z or.b32 %r1143, %r1141, %r1142; 2026-02-21T08:16:48.9611716Z bfe.u32 %r1144, %r1, 4, 3; 2026-02-21T08:16:48.9611884Z or.b32 %r1145, %r1144, %r1142; 2026-02-21T08:16:48.9612150Z .loc 1 43 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:43:27 2026-02-21T08:16:48.9612451Z and.b32 %r1146, %r1142, 3840; 2026-02-21T08:16:48.9612723Z .loc 1 44 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:45 2026-02-21T08:16:48.9613008Z or.b32 %r1147, %r1141, %r1146; 2026-02-21T08:16:48.9613282Z .loc 1 42 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:42:45 2026-02-21T08:16:48.9613569Z shl.b32 %r1148, %r1, 3; 2026-02-21T08:16:48.9613730Z and.b32 %r1149, %r1148, 120; 2026-02-21T08:16:48.9613886Z and.b32 %r1151, %r269, 16; 2026-02-21T08:16:48.9614049Z add.s32 %r1153, %r253, %r1151; 2026-02-21T08:16:48.9614209Z and.b32 %r1154, %r1, 96; 2026-02-21T08:16:48.9614368Z shl.b32 %r1155, %r1154, 6; 2026-02-21T08:16:48.9614596Z shl.b32 %r1156, %r1, 5; 2026-02-21T08:16:48.9614792Z and.b32 %r1157, %r1156, 96; 2026-02-21T08:16:48.9614961Z or.b32 %r1158, %r1155, 
%r1157; 2026-02-21T08:16:48.9615117Z shl.b32 %r1159, %r1, 4; 2026-02-21T08:16:48.9615276Z and.b32 %r1160, %r1159, 384; 2026-02-21T08:16:48.9615430Z or.b32 %r1161, %r1160, %r1154; 2026-02-21T08:16:48.9615600Z xor.b32 %r1162, %r1158, %r1161; 2026-02-21T08:16:48.9615763Z add.s32 %r857, %r1153, %r1162; 2026-02-21T08:16:48.9615930Z add.s32 %r872, %r857, 1536; 2026-02-21T08:16:48.9616084Z add.s32 %r867, %r857, 1024; 2026-02-21T08:16:48.9616246Z add.s32 %r862, %r857, 512; 2026-02-21T08:16:48.9616405Z shl.b32 %r1163, %r1, 10; 2026-02-21T08:16:48.9616557Z and.b32 %r1164, %r1163, 6144; 2026-02-21T08:16:48.9616723Z and.b32 %r1165, %r1159, 2032; 2026-02-21T08:16:48.9616878Z or.b32 %r1166, %r1164, %r1165; 2026-02-21T08:16:48.9617042Z xor.b32 %r1167, %r1166, 96; 2026-02-21T08:16:48.9617287Z add.s32 %r1168, %r253, %r1167; 2026-02-21T08:16:48.9617450Z xor.b32 %r1169, %r1166, 64; 2026-02-21T08:16:48.9617602Z add.s32 %r1170, %r253, %r1169; 2026-02-21T08:16:48.9617765Z xor.b32 %r1171, %r1166, 32; 2026-02-21T08:16:48.9617915Z add.s32 %r1172, %r253, %r1171; 2026-02-21T08:16:48.9618076Z add.s32 %r1173, %r253, %r1166; 2026-02-21T08:16:48.9618344Z .loc 1 41 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:41:27 2026-02-21T08:16:48.9618622Z shl.b32 %r1174, %r26, 7; 2026-02-21T08:16:48.9618784Z and.b32 %r1175, %r1174, 1920; 2026-02-21T08:16:48.9619041Z .loc 1 42 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:42:32 2026-02-21T08:16:48.9619327Z or.b32 %r1176, %r1175, %r1149; 2026-02-21T08:16:48.9619587Z .loc 1 44 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:32 2026-02-21T08:16:48.9619875Z or.b32 %r1177, %r1146, %r1144; 2026-02-21T08:16:48.9620039Z shl.b32 %r1178, %r1147, 11; 2026-02-21T08:16:48.9620193Z shl.b32 %r1179, %r1145, 11; 2026-02-21T08:16:48.9620350Z shl.b32 %r1180, %r1143, 11; 2026-02-21T08:16:48.9620601Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9620906Z shfl.sync.idx.b32 %r1181, 
%r2, 0, 31, -1; 2026-02-21T08:16:48.9621087Z shl.b32 %r1182, %r1181, 21; 2026-02-21T08:16:48.9621250Z and.b32 %r1183, %r1182, 6291456; 2026-02-21T08:16:48.9621418Z add.s32 %r270, %r1183, %r1604; 2026-02-21T08:16:48.9621576Z mov.pred %p50, -1; 2026-02-21T08:16:48.9621725Z // begin inline asm 2026-02-21T08:16:48.9622084Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 0], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9622496Z // end inline asm 2026-02-21T08:16:48.9622632Z // begin inline asm 2026-02-21T08:16:48.9622993Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 16], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9623378Z // end inline asm 2026-02-21T08:16:48.9623519Z // begin inline asm 2026-02-21T08:16:48.9623871Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 32], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9624261Z // end inline asm 2026-02-21T08:16:48.9624399Z // begin inline asm 2026-02-21T08:16:48.9624798Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 48], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9625185Z // end inline asm 2026-02-21T08:16:48.9625313Z // begin inline asm 2026-02-21T08:16:48.9625653Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 64], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9626032Z // end inline asm 2026-02-21T08:16:48.9626219Z // begin inline asm 2026-02-21T08:16:48.9626560Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 80], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 
2026-02-21T08:16:48.9626940Z // end inline asm 2026-02-21T08:16:48.9627076Z // begin inline asm 2026-02-21T08:16:48.9627415Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 96], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9627779Z // end inline asm 2026-02-21T08:16:48.9627913Z // begin inline asm 2026-02-21T08:16:48.9628247Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 112], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9628634Z // end inline asm 2026-02-21T08:16:48.9628760Z // begin inline asm 2026-02-21T08:16:48.9629154Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 128], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9629542Z // end inline asm 2026-02-21T08:16:48.9629668Z // begin inline asm 2026-02-21T08:16:48.9630015Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 144], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9630395Z // end inline asm 2026-02-21T08:16:48.9630529Z // begin inline asm 2026-02-21T08:16:48.9630868Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 160], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9631262Z // end inline asm 2026-02-21T08:16:48.9631398Z // begin inline asm 2026-02-21T08:16:48.9631746Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 176], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9632128Z // end inline asm 2026-02-21T08:16:48.9632256Z // begin inline asm 2026-02-21T08:16:48.9632605Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 192], {%r271, %r271, %r271, 
%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9633007Z // end inline asm 2026-02-21T08:16:48.9633134Z // begin inline asm 2026-02-21T08:16:48.9633481Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 208], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9633880Z // end inline asm 2026-02-21T08:16:48.9634016Z // begin inline asm 2026-02-21T08:16:48.9634358Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 224], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9634762Z // end inline asm 2026-02-21T08:16:48.9634905Z // begin inline asm 2026-02-21T08:16:48.9635259Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 240], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:16:48.9635651Z // end inline asm 2026-02-21T08:16:48.9635783Z // begin inline asm 2026-02-21T08:16:48.9635941Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:16:48.9636102Z // end inline asm 2026-02-21T08:16:48.9636243Z bar.sync 0, 128; 2026-02-21T08:16:48.9636498Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9636784Z add.s32 %r542, %r253, 155648; 2026-02-21T08:16:48.9636946Z // begin inline asm 2026-02-21T08:16:48.9637111Z @%p66 mbarrier.init.shared::cta.b64 [%r542], 1; 2026-02-21T08:16:48.9637307Z // end inline asm 2026-02-21T08:16:48.9637440Z bar.sync 0, 128; 2026-02-21T08:16:48.9637581Z add.s32 %r543, %r253, 155656; 2026-02-21T08:16:48.9637737Z // begin inline asm 2026-02-21T08:16:48.9637908Z @%p66 mbarrier.init.shared::cta.b64 [%r543], 1; 2026-02-21T08:16:48.9638151Z // end inline asm 2026-02-21T08:16:48.9638277Z bar.sync 0, 128; 2026-02-21T08:16:48.9638411Z add.s32 %r544, %r253, 155664; 2026-02-21T08:16:48.9638556Z // begin 
inline asm 2026-02-21T08:16:48.9638719Z @%p66 mbarrier.init.shared::cta.b64 [%r544], 1; 2026-02-21T08:16:48.9638893Z // end inline asm 2026-02-21T08:16:48.9639024Z bar.sync 0, 128; 2026-02-21T08:16:48.9639152Z add.s32 %r545, %r253, 155672; 2026-02-21T08:16:48.9639305Z // begin inline asm 2026-02-21T08:16:48.9639459Z @%p66 mbarrier.init.shared::cta.b64 [%r545], 1; 2026-02-21T08:16:48.9639641Z // end inline asm 2026-02-21T08:16:48.9639772Z bar.sync 0, 128; 2026-02-21T08:16:48.9639900Z add.s32 %r546, %r253, 155680; 2026-02-21T08:16:48.9640053Z // begin inline asm 2026-02-21T08:16:48.9640206Z @%p66 mbarrier.init.shared::cta.b64 [%r546], 1; 2026-02-21T08:16:48.9640387Z // end inline asm 2026-02-21T08:16:48.9640559Z bar.sync 0, 128; 2026-02-21T08:16:48.9640698Z add.s32 %r547, %r253, 155688; 2026-02-21T08:16:48.9640842Z // begin inline asm 2026-02-21T08:16:48.9641000Z @%p66 mbarrier.init.shared::cta.b64 [%r547], 1; 2026-02-21T08:16:48.9641173Z // end inline asm 2026-02-21T08:16:48.9641303Z bar.sync 0, 128; 2026-02-21T08:16:48.9641436Z add.s32 %r548, %r253, 155696; 2026-02-21T08:16:48.9641580Z // begin inline asm 2026-02-21T08:16:48.9641742Z @%p66 mbarrier.init.shared::cta.b64 [%r548], 1; 2026-02-21T08:16:48.9641915Z // end inline asm 2026-02-21T08:16:48.9642049Z add.s32 %r549, %r253, 155712; 2026-02-21T08:16:48.9642194Z // begin inline asm 2026-02-21T08:16:48.9642354Z @%p66 mbarrier.init.shared::cta.b64 [%r549], 1; 2026-02-21T08:16:48.9642527Z // end inline asm 2026-02-21T08:16:48.9642661Z bar.sync 0, 128; 2026-02-21T08:16:48.9642797Z add.s32 %r550, %r253, 155720; 2026-02-21T08:16:48.9642941Z // begin inline asm 2026-02-21T08:16:48.9643101Z @%p66 mbarrier.init.shared::cta.b64 [%r550], 1; 2026-02-21T08:16:48.9643278Z // end inline asm 2026-02-21T08:16:48.9643413Z bar.sync 0, 128; 2026-02-21T08:16:48.9643540Z add.s32 %r551, %r253, 155728; 2026-02-21T08:16:48.9643691Z // begin inline asm 2026-02-21T08:16:48.9643842Z @%p66 mbarrier.init.shared::cta.b64 [%r551], 1; 
2026-02-21T08:16:48.9644023Z // end inline asm 2026-02-21T08:16:48.9644149Z bar.sync 0, 128; 2026-02-21T08:16:48.9644284Z add.s32 %r552, %r253, 155736; 2026-02-21T08:16:48.9644434Z // begin inline asm 2026-02-21T08:16:48.9644585Z @%p66 mbarrier.init.shared::cta.b64 [%r552], 1; 2026-02-21T08:16:48.9644791Z // end inline asm 2026-02-21T08:16:48.9644916Z bar.sync 0, 128; 2026-02-21T08:16:48.9645053Z add.s32 %r553, %r253, 155744; 2026-02-21T08:16:48.9645199Z // begin inline asm 2026-02-21T08:16:48.9645361Z @%p66 mbarrier.init.shared::cta.b64 [%r553], 1; 2026-02-21T08:16:48.9645535Z // end inline asm 2026-02-21T08:16:48.9645668Z bar.sync 0, 128; 2026-02-21T08:16:48.9645805Z add.s32 %r554, %r253, 155752; 2026-02-21T08:16:48.9645952Z // begin inline asm 2026-02-21T08:16:48.9646113Z @%p66 mbarrier.init.shared::cta.b64 [%r554], 1; 2026-02-21T08:16:48.9646287Z // end inline asm 2026-02-21T08:16:48.9646421Z bar.sync 0, 128; 2026-02-21T08:16:48.9646547Z add.s32 %r555, %r253, 155760; 2026-02-21T08:16:48.9646697Z // begin inline asm 2026-02-21T08:16:48.9646846Z @%p66 mbarrier.init.shared::cta.b64 [%r555], 1; 2026-02-21T08:16:48.9647029Z // end inline asm 2026-02-21T08:16:48.9647272Z .loc 1 55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9647555Z bar.sync 0, 128; 2026-02-21T08:16:48.9647685Z // begin inline asm 2026-02-21T08:16:48.9647848Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r542]; 2026-02-21T08:16:48.9648037Z // end inline asm 2026-02-21T08:16:48.9648161Z bar.sync 0, 128; 2026-02-21T08:16:48.9648292Z // begin inline asm 2026-02-21T08:16:48.9648445Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r543]; 2026-02-21T08:16:48.9648633Z // end inline asm 2026-02-21T08:16:48.9648758Z bar.sync 0, 128; 2026-02-21T08:16:48.9648959Z // begin inline asm 2026-02-21T08:16:48.9649120Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r544]; 2026-02-21T08:16:48.9649300Z // end inline asm 2026-02-21T08:16:48.9649431Z bar.sync 0, 128; 
2026-02-21T08:16:48.9649556Z // begin inline asm 2026-02-21T08:16:48.9649716Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r545]; 2026-02-21T08:16:48.9649894Z // end inline asm 2026-02-21T08:16:48.9650028Z bar.sync 0, 128; 2026-02-21T08:16:48.9650154Z // begin inline asm 2026-02-21T08:16:48.9650320Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r546]; 2026-02-21T08:16:48.9650506Z // end inline asm 2026-02-21T08:16:48.9650631Z bar.sync 0, 128; 2026-02-21T08:16:48.9650763Z // begin inline asm 2026-02-21T08:16:48.9650915Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r547]; 2026-02-21T08:16:48.9651099Z // end inline asm 2026-02-21T08:16:48.9651223Z bar.sync 0, 128; 2026-02-21T08:16:48.9651358Z // begin inline asm 2026-02-21T08:16:48.9651573Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r548]; 2026-02-21T08:16:48.9651771Z // end inline asm 2026-02-21T08:16:48.9652026Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9652329Z bar.sync 0, 128; 2026-02-21T08:16:48.9652472Z add.s32 %r563, %r253, 155776; 2026-02-21T08:16:48.9652624Z // begin inline asm 2026-02-21T08:16:48.9652791Z @%p66 mbarrier.init.shared::cta.b64 [%r563], 1; 2026-02-21T08:16:48.9652973Z // end inline asm 2026-02-21T08:16:48.9653152Z st.shared.v2.b32 [global_smem+155784], {0, 33685761}; 2026-02-21T08:16:48.9653370Z st.shared.b32 [global_smem], %r1604; 2026-02-21T08:16:48.9653586Z st.shared.v2.b32 [global_smem+8], {%r1175, %r1146}; 2026-02-21T08:16:48.9653788Z barrier.sync 1; 2026-02-21T08:16:48.9653926Z barrier.sync 1; 2026-02-21T08:16:48.9654069Z barrier.sync 1; 2026-02-21T08:16:48.9654326Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9654632Z bar.sync 0, 128; 2026-02-21T08:16:48.9654802Z // begin inline asm 2026-02-21T08:16:48.9654942Z 2026-02-21T08:16:48.9655055Z { 2026-02-21T08:16:48.9655186Z .reg .pred complete; 2026-02-21T08:16:48.9655331Z waitLoop: 2026-02-21T08:16:48.9655534Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r563], %r271; 2026-02-21T08:16:48.9655782Z @!complete bra.uni waitLoop; 2026-02-21T08:16:48.9655934Z } 2026-02-21T08:16:48.9656000Z 2026-02-21T08:16:48.9656064Z // end inline asm 2026-02-21T08:16:48.9656319Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9656624Z bar.sync 0, 128; 2026-02-21T08:16:48.9656757Z // begin inline asm 2026-02-21T08:16:48.9656930Z @%p66 mbarrier.inval.shared::cta.b64 [%r563]; 2026-02-21T08:16:48.9657118Z // end inline asm 2026-02-21T08:16:48.9657256Z // begin inline asm 2026-02-21T08:16:48.9657426Z @%p66 mbarrier.inval.shared::cta.b64 [%r549]; 2026-02-21T08:16:48.9657614Z // end inline asm 2026-02-21T08:16:48.9657754Z bar.sync 0, 128; 2026-02-21T08:16:48.9657885Z // begin inline asm 2026-02-21T08:16:48.9658052Z @%p66 mbarrier.inval.shared::cta.b64 [%r550]; 2026-02-21T08:16:48.9658234Z // end inline asm 2026-02-21T08:16:48.9658373Z bar.sync 0, 128; 2026-02-21T08:16:48.9658504Z // begin inline asm 2026-02-21T08:16:48.9658673Z @%p66 mbarrier.inval.shared::cta.b64 [%r551]; 2026-02-21T08:16:48.9658855Z // end inline asm 2026-02-21T08:16:48.9658997Z bar.sync 0, 128; 2026-02-21T08:16:48.9659137Z // begin inline asm 2026-02-21T08:16:48.9659308Z @%p66 mbarrier.inval.shared::cta.b64 [%r552]; 2026-02-21T08:16:48.9659487Z // end inline asm 2026-02-21T08:16:48.9659612Z bar.sync 0, 128; 2026-02-21T08:16:48.9659745Z // begin inline asm 2026-02-21T08:16:48.9659897Z @%p66 mbarrier.inval.shared::cta.b64 [%r553]; 2026-02-21T08:16:48.9660079Z // end inline asm 2026-02-21T08:16:48.9660202Z bar.sync 0, 128; 2026-02-21T08:16:48.9660336Z // begin inline asm 2026-02-21T08:16:48.9660494Z @%p66 mbarrier.inval.shared::cta.b64 [%r554]; 2026-02-21T08:16:48.9660759Z // end inline asm 2026-02-21T08:16:48.9660898Z bar.sync 0, 128; 2026-02-21T08:16:48.9661025Z // begin inline asm 2026-02-21T08:16:48.9661189Z @%p66 mbarrier.inval.shared::cta.b64 [%r555]; 
2026-02-21T08:16:48.9661365Z // end inline asm 2026-02-21T08:16:48.9661503Z // begin inline asm 2026-02-21T08:16:48.9661656Z @%p66 mbarrier.inval.shared::cta.b64 [%r542]; 2026-02-21T08:16:48.9661841Z // end inline asm 2026-02-21T08:16:48.9661966Z bar.sync 0, 128; 2026-02-21T08:16:48.9662100Z // begin inline asm 2026-02-21T08:16:48.9662259Z @%p66 mbarrier.inval.shared::cta.b64 [%r543]; 2026-02-21T08:16:48.9662432Z // end inline asm 2026-02-21T08:16:48.9662566Z bar.sync 0, 128; 2026-02-21T08:16:48.9662689Z // begin inline asm 2026-02-21T08:16:48.9662849Z @%p66 mbarrier.inval.shared::cta.b64 [%r544]; 2026-02-21T08:16:48.9663019Z // end inline asm 2026-02-21T08:16:48.9663148Z bar.sync 0, 128; 2026-02-21T08:16:48.9663360Z // begin inline asm 2026-02-21T08:16:48.9663525Z @%p66 mbarrier.inval.shared::cta.b64 [%r545]; 2026-02-21T08:16:48.9663709Z // end inline asm 2026-02-21T08:16:48.9663834Z bar.sync 0, 128; 2026-02-21T08:16:48.9663964Z // begin inline asm 2026-02-21T08:16:48.9664112Z @%p66 mbarrier.inval.shared::cta.b64 [%r546]; 2026-02-21T08:16:48.9664292Z // end inline asm 2026-02-21T08:16:48.9664417Z bar.sync 0, 128; 2026-02-21T08:16:48.9664548Z // begin inline asm 2026-02-21T08:16:48.9664724Z @%p66 mbarrier.inval.shared::cta.b64 [%r547]; 2026-02-21T08:16:48.9664904Z // end inline asm 2026-02-21T08:16:48.9665029Z bar.sync 0, 128; 2026-02-21T08:16:48.9665161Z // begin inline asm 2026-02-21T08:16:48.9665319Z @%p66 mbarrier.inval.shared::cta.b64 [%r548]; 2026-02-21T08:16:48.9665489Z // end inline asm 2026-02-21T08:16:48.9665734Z .loc 1 44 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:32 2026-02-21T08:16:48.9666020Z shl.b32 %r1184, %r1177, 11; 2026-02-21T08:16:48.9666283Z .loc 1 59 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:59:45 2026-02-21T08:16:48.9666570Z or.b32 %r1185, %r1178, %r1176; 2026-02-21T08:16:48.9666735Z or.b32 %r1186, %r1179, %r1176; 2026-02-21T08:16:48.9666888Z or.b32 %r1187, %r1180, %r1176; 2026-02-21T08:16:48.9667045Z 
or.b32 %r1188, %r1184, %r1176; 2026-02-21T08:16:48.9667310Z .loc 1 59 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:59:52 2026-02-21T08:16:48.9667584Z or.b32 %r1189, %r1188, 16384; 2026-02-21T08:16:48.9667743Z or.b32 %r1190, %r1188, 32768; 2026-02-21T08:16:48.9667891Z or.b32 %r1191, %r1188, 49152; 2026-02-21T08:16:48.9668044Z or.b32 %r1192, %r1188, 65536; 2026-02-21T08:16:48.9668187Z or.b32 %r1193, %r1188, 81920; 2026-02-21T08:16:48.9668338Z or.b32 %r1194, %r1188, 98304; 2026-02-21T08:16:48.9668479Z or.b32 %r1195, %r1185, 114688; 2026-02-21T08:16:48.9668633Z or.b32 %r1196, %r1188, 131072; 2026-02-21T08:16:48.9668787Z or.b32 %r1197, %r1188, 147456; 2026-02-21T08:16:48.9668936Z or.b32 %r1198, %r1188, 163840; 2026-02-21T08:16:48.9669087Z or.b32 %r1199, %r1188, 180224; 2026-02-21T08:16:48.9669236Z or.b32 %r1200, %r1188, 196608; 2026-02-21T08:16:48.9669389Z or.b32 %r1201, %r1188, 212992; 2026-02-21T08:16:48.9669534Z or.b32 %r1202, %r1188, 229376; 2026-02-21T08:16:48.9669685Z or.b32 %r1203, %r1185, 245760; 2026-02-21T08:16:48.9669830Z or.b32 %r1204, %r1188, 262144; 2026-02-21T08:16:48.9669983Z or.b32 %r1205, %r1188, 278528; 2026-02-21T08:16:48.9670133Z or.b32 %r1206, %r1188, 294912; 2026-02-21T08:16:48.9670278Z or.b32 %r1207, %r1188, 311296; 2026-02-21T08:16:48.9670431Z or.b32 %r1208, %r1188, 327680; 2026-02-21T08:16:48.9670574Z or.b32 %r1209, %r1188, 344064; 2026-02-21T08:16:48.9670728Z or.b32 %r1210, %r1188, 360448; 2026-02-21T08:16:48.9670871Z or.b32 %r1211, %r1185, 376832; 2026-02-21T08:16:48.9671022Z or.b32 %r1212, %r1188, 393216; 2026-02-21T08:16:48.9671168Z or.b32 %r1213, %r1188, 409600; 2026-02-21T08:16:48.9671322Z or.b32 %r1214, %r1188, 425984; 2026-02-21T08:16:48.9671525Z or.b32 %r1215, %r1188, 442368; 2026-02-21T08:16:48.9671677Z or.b32 %r1216, %r1188, 458752; 2026-02-21T08:16:48.9671827Z or.b32 %r1217, %r1188, 475136; 2026-02-21T08:16:48.9671970Z or.b32 %r1218, %r1186, 491520; 2026-02-21T08:16:48.9672123Z or.b32 %r1219, %r1187, 507904; 
2026-02-21T08:16:48.9672378Z .loc 1 59 24 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:59:24 2026-02-21T08:16:48.9672678Z mad.wide.u32 %rd140, %r1188, 2, %rd8; 2026-02-21T08:16:48.9672852Z mad.wide.u32 %rd141, %r1189, 2, %rd8; 2026-02-21T08:16:48.9673026Z mad.wide.u32 %rd142, %r1190, 2, %rd8; 2026-02-21T08:16:48.9673191Z mad.wide.u32 %rd143, %r1191, 2, %rd8; 2026-02-21T08:16:48.9673360Z mad.wide.u32 %rd144, %r1192, 2, %rd8; 2026-02-21T08:16:48.9673531Z mad.wide.u32 %rd145, %r1193, 2, %rd8; 2026-02-21T08:16:48.9673693Z mad.wide.u32 %rd146, %r1194, 2, %rd8; 2026-02-21T08:16:48.9673919Z mad.wide.u32 %rd147, %r1195, 2, %rd8; 2026-02-21T08:16:48.9674085Z mad.wide.u32 %rd148, %r1196, 2, %rd8; 2026-02-21T08:16:48.9674253Z mad.wide.u32 %rd149, %r1197, 2, %rd8; 2026-02-21T08:16:48.9674415Z mad.wide.u32 %rd150, %r1198, 2, %rd8; 2026-02-21T08:16:48.9674583Z mad.wide.u32 %rd151, %r1199, 2, %rd8; 2026-02-21T08:16:48.9674771Z mad.wide.u32 %rd152, %r1200, 2, %rd8; 2026-02-21T08:16:48.9674939Z mad.wide.u32 %rd153, %r1201, 2, %rd8; 2026-02-21T08:16:48.9675109Z mad.wide.u32 %rd154, %r1202, 2, %rd8; 2026-02-21T08:16:48.9675272Z mad.wide.u32 %rd155, %r1203, 2, %rd8; 2026-02-21T08:16:48.9675442Z mad.wide.u32 %rd156, %r1204, 2, %rd8; 2026-02-21T08:16:48.9675605Z mad.wide.u32 %rd157, %r1205, 2, %rd8; 2026-02-21T08:16:48.9675775Z mad.wide.u32 %rd158, %r1206, 2, %rd8; 2026-02-21T08:16:48.9675940Z mad.wide.u32 %rd159, %r1207, 2, %rd8; 2026-02-21T08:16:48.9676108Z mad.wide.u32 %rd160, %r1208, 2, %rd8; 2026-02-21T08:16:48.9676270Z mad.wide.u32 %rd161, %r1209, 2, %rd8; 2026-02-21T08:16:48.9676443Z mad.wide.u32 %rd162, %r1210, 2, %rd8; 2026-02-21T08:16:48.9676616Z mad.wide.u32 %rd163, %r1211, 2, %rd8; 2026-02-21T08:16:48.9676781Z mad.wide.u32 %rd164, %r1212, 2, %rd8; 2026-02-21T08:16:48.9676954Z mad.wide.u32 %rd165, %r1213, 2, %rd8; 2026-02-21T08:16:48.9677117Z mad.wide.u32 %rd166, %r1214, 2, %rd8; 2026-02-21T08:16:48.9677286Z mad.wide.u32 %rd167, %r1215, 2, %rd8; 
2026-02-21T08:16:48.9677448Z mad.wide.u32 %rd168, %r1216, 2, %rd8; 2026-02-21T08:16:48.9677622Z mad.wide.u32 %rd169, %r1217, 2, %rd8; 2026-02-21T08:16:48.9677783Z mad.wide.u32 %rd170, %r1218, 2, %rd8; 2026-02-21T08:16:48.9677952Z mad.wide.u32 %rd171, %r1219, 2, %rd8; 2026-02-21T08:16:48.9678226Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9678513Z // begin inline asm 2026-02-21T08:16:48.9678885Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596}, [%r270 + 0]; 2026-02-21T08:16:48.9679273Z // end inline asm 2026-02-21T08:16:48.9679413Z // begin inline asm 2026-02-21T08:16:48.9679766Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613}, [%r270 + 16]; 2026-02-21T08:16:48.9680142Z // end inline asm 2026-02-21T08:16:48.9680279Z // begin inline asm 2026-02-21T08:16:48.9680626Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630}, [%r270 + 32]; 2026-02-21T08:16:48.9681003Z // end inline asm 2026-02-21T08:16:48.9681133Z // begin inline asm 2026-02-21T08:16:48.9681480Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647}, [%r270 + 48]; 2026-02-21T08:16:48.9681861Z // end inline asm 2026-02-21T08:16:48.9681989Z // begin inline asm 2026-02-21T08:16:48.9682339Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664}, [%r270 + 64]; 2026-02-21T08:16:48.9682771Z // end inline asm 2026-02-21T08:16:48.9682910Z // begin inline asm 2026-02-21T08:16:48.9683238Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r666, %r667, %r668, %r669, %r670, %r671, 
%r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681}, [%r270 + 80]; 2026-02-21T08:16:48.9683608Z // end inline asm 2026-02-21T08:16:48.9683745Z // begin inline asm 2026-02-21T08:16:48.9684073Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698}, [%r270 + 96]; 2026-02-21T08:16:48.9684445Z // end inline asm 2026-02-21T08:16:48.9684576Z // begin inline asm 2026-02-21T08:16:48.9685004Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715}, [%r270 + 112]; 2026-02-21T08:16:48.9685385Z // end inline asm 2026-02-21T08:16:48.9685518Z // begin inline asm 2026-02-21T08:16:48.9685854Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732}, [%r270 + 128]; 2026-02-21T08:16:48.9686216Z // end inline asm 2026-02-21T08:16:48.9686351Z // begin inline asm 2026-02-21T08:16:48.9686683Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749}, [%r270 + 144]; 2026-02-21T08:16:48.9687053Z // end inline asm 2026-02-21T08:16:48.9687182Z // begin inline asm 2026-02-21T08:16:48.9687524Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766}, [%r270 + 160]; 2026-02-21T08:16:48.9687898Z // end inline asm 2026-02-21T08:16:48.9688025Z // begin inline asm 2026-02-21T08:16:48.9688360Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783}, [%r270 + 176]; 2026-02-21T08:16:48.9688723Z // end inline asm 2026-02-21T08:16:48.9688858Z // begin inline asm 2026-02-21T08:16:48.9689199Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800}, [%r270 + 192]; 2026-02-21T08:16:48.9689560Z // end inline asm 2026-02-21T08:16:48.9689695Z // begin inline asm 2026-02-21T08:16:48.9690021Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817}, [%r270 + 208]; 2026-02-21T08:16:48.9690387Z // end inline asm 2026-02-21T08:16:48.9690511Z // begin inline asm 2026-02-21T08:16:48.9690851Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834}, [%r270 + 224]; 2026-02-21T08:16:48.9691222Z // end inline asm 2026-02-21T08:16:48.9691346Z // begin inline asm 2026-02-21T08:16:48.9691691Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846, %r847, %r848, %r849, %r850, %r851}, [%r270 + 240]; 2026-02-21T08:16:48.9692056Z // end inline asm 2026-02-21T08:16:48.9692189Z // begin inline asm 2026-02-21T08:16:48.9692334Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:16:48.9692492Z // end inline asm 2026-02-21T08:16:48.9692631Z cvt.u64.u32 %rd172, %r581; 2026-02-21T08:16:48.9692785Z cvt.u64.u32 %rd173, %r582; 2026-02-21T08:16:48.9692943Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:16:48.9693094Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:16:48.9693367Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9693659Z mov.b64 {%r1220, %r1221}, %rd175; 2026-02-21T08:16:48.9693842Z cvt.rn.f16x2.f32 %r1222, %r1221, %r1220; 2026-02-21T08:16:48.9694182Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9694468Z cvt.u64.u32 %rd176, %r583; 2026-02-21T08:16:48.9694623Z cvt.u64.u32 %rd177, %r584; 2026-02-21T08:16:48.9694804Z shl.b64 %rd178, %rd177, 32; 
2026-02-21T08:16:48.9694965Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:16:48.9695233Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9695530Z mov.b64 {%r1223, %r1224}, %rd179; 2026-02-21T08:16:48.9695713Z cvt.rn.f16x2.f32 %r1225, %r1224, %r1223; 2026-02-21T08:16:48.9696022Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9696326Z cvt.u64.u32 %rd180, %r585; 2026-02-21T08:16:48.9696480Z cvt.u64.u32 %rd181, %r586; 2026-02-21T08:16:48.9696641Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:16:48.9696853Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:16:48.9697132Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9697431Z mov.b64 {%r1226, %r1227}, %rd183; 2026-02-21T08:16:48.9697613Z cvt.rn.f16x2.f32 %r1228, %r1227, %r1226; 2026-02-21T08:16:48.9697901Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9698192Z cvt.u64.u32 %rd184, %r587; 2026-02-21T08:16:48.9698351Z cvt.u64.u32 %rd185, %r588; 2026-02-21T08:16:48.9698502Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:16:48.9698664Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:16:48.9698936Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9699238Z mov.b64 {%r1229, %r1230}, %rd187; 2026-02-21T08:16:48.9699414Z cvt.rn.f16x2.f32 %r1231, %r1230, %r1229; 2026-02-21T08:16:48.9699709Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9700010Z cvt.u64.u32 %rd188, %r589; 2026-02-21T08:16:48.9700165Z cvt.u64.u32 %rd189, %r590; 2026-02-21T08:16:48.9700327Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:16:48.9700483Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:16:48.9700760Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9701055Z mov.b64 {%r1232, %r1233}, %rd191; 2026-02-21T08:16:48.9701239Z 
cvt.rn.f16x2.f32 %r1234, %r1233, %r1232; 2026-02-21T08:16:48.9701519Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9701809Z cvt.u64.u32 %rd192, %r591; 2026-02-21T08:16:48.9701966Z cvt.u64.u32 %rd193, %r592; 2026-02-21T08:16:48.9702118Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:16:48.9702279Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:16:48.9702558Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9702855Z mov.b64 {%r1235, %r1236}, %rd195; 2026-02-21T08:16:48.9703043Z cvt.rn.f16x2.f32 %r1237, %r1236, %r1235; 2026-02-21T08:16:48.9703328Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9703604Z cvt.u64.u32 %rd196, %r593; 2026-02-21T08:16:48.9703748Z cvt.u64.u32 %rd197, %r594; 2026-02-21T08:16:48.9703901Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:16:48.9704049Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:16:48.9704309Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9704585Z mov.b64 {%r1238, %r1239}, %rd199; 2026-02-21T08:16:48.9704782Z cvt.rn.f16x2.f32 %r1240, %r1239, %r1238; 2026-02-21T08:16:48.9705058Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9705341Z cvt.u64.u32 %rd200, %r595; 2026-02-21T08:16:48.9705499Z cvt.u64.u32 %rd201, %r596; 2026-02-21T08:16:48.9705645Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:16:48.9705872Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:16:48.9706125Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9706406Z mov.b64 {%r1241, %r1242}, %rd203; 2026-02-21T08:16:48.9706569Z cvt.rn.f16x2.f32 %r1243, %r1242, %r1241; 2026-02-21T08:16:48.9706849Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9707134Z cvt.u64.u32 %rd204, %r598; 2026-02-21T08:16:48.9707281Z cvt.u64.u32 %rd205, %r599; 
2026-02-21T08:16:48.9707435Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:16:48.9707584Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:16:48.9707843Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9708116Z mov.b64 {%r1244, %r1245}, %rd207; 2026-02-21T08:16:48.9708342Z cvt.rn.f16x2.f32 %r1246, %r1245, %r1244; 2026-02-21T08:16:48.9708621Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9708913Z cvt.u64.u32 %rd208, %r600; 2026-02-21T08:16:48.9709072Z cvt.u64.u32 %rd209, %r601; 2026-02-21T08:16:48.9709225Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:16:48.9709388Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:16:48.9709652Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9709945Z mov.b64 {%r1247, %r1248}, %rd211; 2026-02-21T08:16:48.9710115Z cvt.rn.f16x2.f32 %r1249, %r1248, %r1247; 2026-02-21T08:16:48.9710407Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9710706Z cvt.u64.u32 %rd212, %r602; 2026-02-21T08:16:48.9710858Z cvt.u64.u32 %rd213, %r603; 2026-02-21T08:16:48.9711018Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:16:48.9711172Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:16:48.9711442Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9711735Z mov.b64 {%r1250, %r1251}, %rd215; 2026-02-21T08:16:48.9711914Z cvt.rn.f16x2.f32 %r1252, %r1251, %r1250; 2026-02-21T08:16:48.9712193Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9712483Z cvt.u64.u32 %rd216, %r604; 2026-02-21T08:16:48.9712639Z cvt.u64.u32 %rd217, %r605; 2026-02-21T08:16:48.9712788Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:16:48.9712948Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:16:48.9713209Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9713500Z mov.b64 {%r1253, 
%r1254}, %rd219; 2026-02-21T08:16:48.9713669Z cvt.rn.f16x2.f32 %r1255, %r1254, %r1253; 2026-02-21T08:16:48.9713964Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9714249Z cvt.u64.u32 %rd220, %r606; 2026-02-21T08:16:48.9714404Z cvt.u64.u32 %rd221, %r607; 2026-02-21T08:16:48.9714563Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:16:48.9714750Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:16:48.9715011Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9715295Z mov.b64 {%r1256, %r1257}, %rd223; 2026-02-21T08:16:48.9715469Z cvt.rn.f16x2.f32 %r1258, %r1257, %r1256; 2026-02-21T08:16:48.9715739Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9716024Z cvt.u64.u32 %rd224, %r608; 2026-02-21T08:16:48.9716175Z cvt.u64.u32 %rd225, %r609; 2026-02-21T08:16:48.9716358Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:16:48.9716550Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:16:48.9716812Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9717098Z mov.b64 {%r1259, %r1260}, %rd227; 2026-02-21T08:16:48.9717325Z cvt.rn.f16x2.f32 %r1261, %r1260, %r1259; 2026-02-21T08:16:48.9717601Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9717884Z cvt.u64.u32 %rd228, %r610; 2026-02-21T08:16:48.9718029Z cvt.u64.u32 %rd229, %r611; 2026-02-21T08:16:48.9718180Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:16:48.9718325Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:16:48.9718581Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9718857Z mov.b64 {%r1262, %r1263}, %rd231; 2026-02-21T08:16:48.9719029Z cvt.rn.f16x2.f32 %r1264, %r1263, %r1262; 2026-02-21T08:16:48.9719298Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9719584Z cvt.u64.u32 %rd232, %r612; 
2026-02-21T08:16:48.9719739Z cvt.u64.u32 %rd233, %r613; 2026-02-21T08:16:48.9719932Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:16:48.9720093Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:16:48.9720347Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9720627Z mov.b64 {%r1265, %r1266}, %rd235; 2026-02-21T08:16:48.9720793Z cvt.rn.f16x2.f32 %r1267, %r1266, %r1265; 2026-02-21T08:16:48.9721070Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9721351Z cvt.u64.u32 %rd236, %r615; 2026-02-21T08:16:48.9721496Z cvt.u64.u32 %rd237, %r616; 2026-02-21T08:16:48.9721651Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:16:48.9721800Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:16:48.9722060Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9722334Z mov.b64 {%r1268, %r1269}, %rd239; 2026-02-21T08:16:48.9722510Z cvt.rn.f16x2.f32 %r1270, %r1269, %r1268; 2026-02-21T08:16:48.9722783Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9723061Z cvt.u64.u32 %rd240, %r617; 2026-02-21T08:16:48.9723215Z cvt.u64.u32 %rd241, %r618; 2026-02-21T08:16:48.9723359Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:16:48.9723512Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:16:48.9723765Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9724040Z mov.b64 {%r1271, %r1272}, %rd243; 2026-02-21T08:16:48.9724201Z cvt.rn.f16x2.f32 %r1273, %r1272, %r1271; 2026-02-21T08:16:48.9724475Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9724795Z cvt.u64.u32 %rd244, %r619; 2026-02-21T08:16:48.9724945Z cvt.u64.u32 %rd245, %r620; 2026-02-21T08:16:48.9725102Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:16:48.9725251Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:16:48.9725522Z .loc 1 58 27 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9725811Z mov.b64 {%r1274, %r1275}, %rd247; 2026-02-21T08:16:48.9725989Z cvt.rn.f16x2.f32 %r1276, %r1275, %r1274; 2026-02-21T08:16:48.9726276Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9726550Z cvt.u64.u32 %rd248, %r621; 2026-02-21T08:16:48.9726708Z cvt.u64.u32 %rd249, %r622; 2026-02-21T08:16:48.9726853Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:16:48.9727011Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:16:48.9727272Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9727560Z mov.b64 {%r1277, %r1278}, %rd251; 2026-02-21T08:16:48.9727724Z cvt.rn.f16x2.f32 %r1279, %r1278, %r1277; 2026-02-21T08:16:48.9728008Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9728298Z cvt.u64.u32 %rd252, %r623; 2026-02-21T08:16:48.9728498Z cvt.u64.u32 %rd253, %r624; 2026-02-21T08:16:48.9728654Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:16:48.9728807Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:16:48.9729081Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9729368Z mov.b64 {%r1280, %r1281}, %rd255; 2026-02-21T08:16:48.9729543Z cvt.rn.f16x2.f32 %r1282, %r1281, %r1280; 2026-02-21T08:16:48.9729837Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9730119Z cvt.u64.u32 %rd256, %r625; 2026-02-21T08:16:48.9730278Z cvt.u64.u32 %rd257, %r626; 2026-02-21T08:16:48.9730425Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:16:48.9730585Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:16:48.9730851Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9731196Z mov.b64 {%r1283, %r1284}, %rd259; 2026-02-21T08:16:48.9731365Z cvt.rn.f16x2.f32 %r1285, %r1284, %r1283; 2026-02-21T08:16:48.9731649Z .loc 1 56 52 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9731938Z cvt.u64.u32 %rd260, %r627; 2026-02-21T08:16:48.9732084Z cvt.u64.u32 %rd261, %r628; 2026-02-21T08:16:48.9732235Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:16:48.9732381Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:16:48.9732645Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9732920Z mov.b64 {%r1286, %r1287}, %rd263; 2026-02-21T08:16:48.9733091Z cvt.rn.f16x2.f32 %r1288, %r1287, %r1286; 2026-02-21T08:16:48.9733371Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9733653Z cvt.u64.u32 %rd264, %r629; 2026-02-21T08:16:48.9733808Z cvt.u64.u32 %rd265, %r630; 2026-02-21T08:16:48.9733955Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:16:48.9734114Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:16:48.9734379Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9734694Z mov.b64 {%r1289, %r1290}, %rd267; 2026-02-21T08:16:48.9734862Z cvt.rn.f16x2.f32 %r1291, %r1290, %r1289; 2026-02-21T08:16:48.9735147Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9735428Z cvt.u64.u32 %rd268, %r632; 2026-02-21T08:16:48.9735573Z cvt.u64.u32 %rd269, %r633; 2026-02-21T08:16:48.9735727Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:16:48.9735875Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:16:48.9736136Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9736418Z mov.b64 {%r1292, %r1293}, %rd271; 2026-02-21T08:16:48.9736590Z cvt.rn.f16x2.f32 %r1294, %r1293, %r1292; 2026-02-21T08:16:48.9736869Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9737147Z cvt.u64.u32 %rd272, %r634; 2026-02-21T08:16:48.9737302Z cvt.u64.u32 %rd273, %r635; 2026-02-21T08:16:48.9737447Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:16:48.9737601Z or.b64 
%rd275, %rd272, %rd274; 2026-02-21T08:16:48.9737856Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9738142Z mov.b64 {%r1295, %r1296}, %rd275; 2026-02-21T08:16:48.9738305Z cvt.rn.f16x2.f32 %r1297, %r1296, %r1295; 2026-02-21T08:16:48.9738584Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9738869Z cvt.u64.u32 %rd276, %r636; 2026-02-21T08:16:48.9739012Z cvt.u64.u32 %rd277, %r637; 2026-02-21T08:16:48.9739169Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:16:48.9739322Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:16:48.9739595Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9739944Z mov.b64 {%r1298, %r1299}, %rd279; 2026-02-21T08:16:48.9740129Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T08:16:48.9740425Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9740720Z cvt.u64.u32 %rd280, %r638; 2026-02-21T08:16:48.9740885Z cvt.u64.u32 %rd281, %r639; 2026-02-21T08:16:48.9741042Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:16:48.9741209Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:16:48.9741486Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9741794Z mov.b64 {%r1301, %r1302}, %rd283; 2026-02-21T08:16:48.9741973Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T08:16:48.9742274Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9742644Z cvt.u64.u32 %rd284, %r640; 2026-02-21T08:16:48.9742803Z cvt.u64.u32 %rd285, %r641; 2026-02-21T08:16:48.9742962Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:16:48.9743118Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:16:48.9743394Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9743688Z mov.b64 {%r1304, %r1305}, %rd287; 2026-02-21T08:16:48.9743868Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 
2026-02-21T08:16:48.9744161Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9744451Z cvt.u64.u32 %rd288, %r642; 2026-02-21T08:16:48.9744612Z cvt.u64.u32 %rd289, %r643; 2026-02-21T08:16:48.9744795Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:16:48.9744961Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:16:48.9745234Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9745547Z mov.b64 {%r1307, %r1308}, %rd291; 2026-02-21T08:16:48.9745722Z cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T08:16:48.9746024Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9746322Z cvt.u64.u32 %rd292, %r644; 2026-02-21T08:16:48.9746476Z cvt.u64.u32 %rd293, %r645; 2026-02-21T08:16:48.9746637Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:16:48.9746804Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:16:48.9747087Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9747387Z mov.b64 {%r1310, %r1311}, %rd295; 2026-02-21T08:16:48.9747559Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T08:16:48.9747844Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9748125Z cvt.u64.u32 %rd296, %r646; 2026-02-21T08:16:48.9748276Z cvt.u64.u32 %rd297, %r647; 2026-02-21T08:16:48.9748422Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:16:48.9748581Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:16:48.9748842Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9749130Z mov.b64 {%r1313, %r1314}, %rd299; 2026-02-21T08:16:48.9749296Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 2026-02-21T08:16:48.9749582Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9749865Z cvt.u64.u32 %rd300, %r649; 2026-02-21T08:16:48.9750011Z cvt.u64.u32 %rd301, %r650; 2026-02-21T08:16:48.9750163Z shl.b64 %rd302, %rd301, 
32; 2026-02-21T08:16:48.9750310Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:16:48.9750576Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9750857Z mov.b64 {%r1316, %r1317}, %rd303; 2026-02-21T08:16:48.9751028Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T08:16:48.9751313Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9751644Z cvt.u64.u32 %rd304, %r651; 2026-02-21T08:16:48.9751797Z cvt.u64.u32 %rd305, %r652; 2026-02-21T08:16:48.9751940Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:16:48.9752093Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:16:48.9752352Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9752645Z mov.b64 {%r1319, %r1320}, %rd307; 2026-02-21T08:16:48.9752819Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T08:16:48.9753094Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9753381Z cvt.u64.u32 %rd308, %r653; 2026-02-21T08:16:48.9753527Z cvt.u64.u32 %rd309, %r654; 2026-02-21T08:16:48.9753681Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:16:48.9753830Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:16:48.9754155Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9754435Z mov.b64 {%r1322, %r1323}, %rd311; 2026-02-21T08:16:48.9754607Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T08:16:48.9754925Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9755215Z cvt.u64.u32 %rd312, %r655; 2026-02-21T08:16:48.9755372Z cvt.u64.u32 %rd313, %r656; 2026-02-21T08:16:48.9755523Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:16:48.9755684Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:16:48.9755954Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9756256Z mov.b64 {%r1325, %r1326}, %rd315; 2026-02-21T08:16:48.9756431Z 
cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T08:16:48.9756714Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9757015Z cvt.u64.u32 %rd316, %r657; 2026-02-21T08:16:48.9757169Z cvt.u64.u32 %rd317, %r658; 2026-02-21T08:16:48.9757324Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:16:48.9757473Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:16:48.9757749Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9758032Z mov.b64 {%r1328, %r1329}, %rd319; 2026-02-21T08:16:48.9758206Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T08:16:48.9758495Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9758781Z cvt.u64.u32 %rd320, %r659; 2026-02-21T08:16:48.9758937Z cvt.u64.u32 %rd321, %r660; 2026-02-21T08:16:48.9759086Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:16:48.9759243Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:16:48.9759510Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9759802Z mov.b64 {%r1331, %r1332}, %rd323; 2026-02-21T08:16:48.9759975Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 2026-02-21T08:16:48.9760260Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9760552Z cvt.u64.u32 %rd324, %r661; 2026-02-21T08:16:48.9760700Z cvt.u64.u32 %rd325, %r662; 2026-02-21T08:16:48.9760853Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:16:48.9761003Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:16:48.9761280Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9761567Z mov.b64 {%r1334, %r1335}, %rd327; 2026-02-21T08:16:48.9761740Z cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T08:16:48.9762032Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9762316Z cvt.u64.u32 %rd328, %r663; 2026-02-21T08:16:48.9762470Z cvt.u64.u32 %rd329, %r664; 
2026-02-21T08:16:48.9762615Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:16:48.9762774Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:16:48.9763088Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9763378Z mov.b64 {%r1337, %r1338}, %rd331; 2026-02-21T08:16:48.9763548Z cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T08:16:48.9763821Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9764105Z cvt.u64.u32 %rd332, %r666; 2026-02-21T08:16:48.9764249Z cvt.u64.u32 %rd333, %r667; 2026-02-21T08:16:48.9764400Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:16:48.9764547Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:16:48.9764834Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9765114Z mov.b64 {%r1340, %r1341}, %rd335; 2026-02-21T08:16:48.9765288Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T08:16:48.9765610Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9765893Z cvt.u64.u32 %rd336, %r668; 2026-02-21T08:16:48.9766051Z cvt.u64.u32 %rd337, %r669; 2026-02-21T08:16:48.9766197Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:16:48.9766353Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:16:48.9766609Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9766899Z mov.b64 {%r1343, %r1344}, %rd339; 2026-02-21T08:16:48.9767073Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T08:16:48.9767344Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9767625Z cvt.u64.u32 %rd340, %r670; 2026-02-21T08:16:48.9767770Z cvt.u64.u32 %rd341, %r671; 2026-02-21T08:16:48.9767922Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:16:48.9768070Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:16:48.9768333Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9768614Z mov.b64 {%r1346, 
%r1347}, %rd343; 2026-02-21T08:16:48.9768784Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T08:16:48.9769065Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9769344Z cvt.u64.u32 %rd344, %r672; 2026-02-21T08:16:48.9769498Z cvt.u64.u32 %rd345, %r673; 2026-02-21T08:16:48.9769644Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:16:48.9769798Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:16:48.9770055Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9770342Z mov.b64 {%r1349, %r1350}, %rd347; 2026-02-21T08:16:48.9770512Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 2026-02-21T08:16:48.9770784Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9771063Z cvt.u64.u32 %rd348, %r674; 2026-02-21T08:16:48.9771213Z cvt.u64.u32 %rd349, %r675; 2026-02-21T08:16:48.9771366Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:16:48.9771515Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:16:48.9771776Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9772053Z mov.b64 {%r1352, %r1353}, %rd351; 2026-02-21T08:16:48.9772226Z cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T08:16:48.9772502Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9772781Z cvt.u64.u32 %rd352, %r676; 2026-02-21T08:16:48.9772933Z cvt.u64.u32 %rd353, %r677; 2026-02-21T08:16:48.9773079Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:16:48.9773230Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:16:48.9773484Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9773768Z mov.b64 {%r1355, %r1356}, %rd355; 2026-02-21T08:16:48.9773941Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T08:16:48.9774263Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9774547Z cvt.u64.u32 %rd356, %r678; 
2026-02-21T08:16:48.9774719Z cvt.u64.u32 %rd357, %r679; 2026-02-21T08:16:48.9774875Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:16:48.9775024Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:16:48.9775287Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9775578Z mov.b64 {%r1358, %r1359}, %rd359; 2026-02-21T08:16:48.9775746Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T08:16:48.9776028Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9776307Z cvt.u64.u32 %rd360, %r680; 2026-02-21T08:16:48.9776461Z cvt.u64.u32 %rd361, %r681; 2026-02-21T08:16:48.9776610Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:16:48.9776848Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:16:48.9777115Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9777400Z mov.b64 {%r1361, %r1362}, %rd363; 2026-02-21T08:16:48.9777571Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T08:16:48.9777847Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9778141Z cvt.u64.u32 %rd364, %r683; 2026-02-21T08:16:48.9778289Z cvt.u64.u32 %rd365, %r684; 2026-02-21T08:16:48.9778442Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:16:48.9778593Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:16:48.9778861Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9779152Z mov.b64 {%r1364, %r1365}, %rd367; 2026-02-21T08:16:48.9779315Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T08:16:48.9779598Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9779881Z cvt.u64.u32 %rd368, %r685; 2026-02-21T08:16:48.9780034Z cvt.u64.u32 %rd369, %r686; 2026-02-21T08:16:48.9780180Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:16:48.9780334Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:16:48.9780588Z .loc 1 58 27 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9780874Z mov.b64 {%r1367, %r1368}, %rd371; 2026-02-21T08:16:48.9781045Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T08:16:48.9781316Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9781604Z cvt.u64.u32 %rd372, %r687; 2026-02-21T08:16:48.9781750Z cvt.u64.u32 %rd373, %r688; 2026-02-21T08:16:48.9781904Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:16:48.9782053Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:16:48.9782321Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9782610Z mov.b64 {%r1370, %r1371}, %rd375; 2026-02-21T08:16:48.9782773Z cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T08:16:48.9783049Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9783360Z cvt.u64.u32 %rd376, %r689; 2026-02-21T08:16:48.9783515Z cvt.u64.u32 %rd377, %r690; 2026-02-21T08:16:48.9783661Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:16:48.9783814Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:16:48.9784082Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9784378Z mov.b64 {%r1373, %r1374}, %rd379; 2026-02-21T08:16:48.9784557Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T08:16:48.9784871Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9785175Z cvt.u64.u32 %rd380, %r691; 2026-02-21T08:16:48.9785331Z cvt.u64.u32 %rd381, %r692; 2026-02-21T08:16:48.9785548Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:16:48.9785709Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:16:48.9785992Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9786301Z mov.b64 {%r1376, %r1377}, %rd383; 2026-02-21T08:16:48.9786480Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T08:16:48.9786780Z .loc 1 56 52 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9787071Z cvt.u64.u32 %rd384, %r693; 2026-02-21T08:16:48.9787236Z cvt.u64.u32 %rd385, %r694; 2026-02-21T08:16:48.9787393Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:16:48.9787561Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:16:48.9787833Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9788136Z mov.b64 {%r1379, %r1380}, %rd387; 2026-02-21T08:16:48.9788406Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 2026-02-21T08:16:48.9788697Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9788996Z cvt.u64.u32 %rd388, %r695; 2026-02-21T08:16:48.9789150Z cvt.u64.u32 %rd389, %r696; 2026-02-21T08:16:48.9789309Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:16:48.9789465Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:16:48.9789741Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9790046Z mov.b64 {%r1382, %r1383}, %rd391; 2026-02-21T08:16:48.9790219Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T08:16:48.9790513Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9790798Z cvt.u64.u32 %rd392, %r697; 2026-02-21T08:16:48.9790960Z cvt.u64.u32 %rd393, %r698; 2026-02-21T08:16:48.9791122Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:16:48.9791299Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:16:48.9791554Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9791837Z mov.b64 {%r1385, %r1386}, %rd395; 2026-02-21T08:16:48.9792007Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T08:16:48.9792275Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9792553Z cvt.u64.u32 %rd396, %r700; 2026-02-21T08:16:48.9792700Z cvt.u64.u32 %rd397, %r701; 2026-02-21T08:16:48.9792851Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:16:48.9792997Z or.b64 
%rd399, %rd396, %rd398; 2026-02-21T08:16:48.9793256Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9793538Z mov.b64 {%r1388, %r1389}, %rd399; 2026-02-21T08:16:48.9793703Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T08:16:48.9793981Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9794257Z cvt.u64.u32 %rd400, %r702; 2026-02-21T08:16:48.9794409Z cvt.u64.u32 %rd401, %r703; 2026-02-21T08:16:48.9794552Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:16:48.9794735Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:16:48.9794989Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9795272Z mov.b64 {%r1391, %r1392}, %rd403; 2026-02-21T08:16:48.9795444Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T08:16:48.9795711Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9795996Z cvt.u64.u32 %rd404, %r704; 2026-02-21T08:16:48.9796140Z cvt.u64.u32 %rd405, %r705; 2026-02-21T08:16:48.9796292Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:16:48.9796349Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:16:48.9796511Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9796630Z mov.b64 {%r1394, %r1395}, %rd407; 2026-02-21T08:16:48.9796694Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T08:16:48.9796861Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9796924Z cvt.u64.u32 %rd408, %r706; 2026-02-21T08:16:48.9796979Z cvt.u64.u32 %rd409, %r707; 2026-02-21T08:16:48.9797034Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:16:48.9797097Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:16:48.9797260Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9797316Z mov.b64 {%r1397, %r1398}, %rd411; 2026-02-21T08:16:48.9797379Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 
2026-02-21T08:16:48.9797547Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9797602Z cvt.u64.u32 %rd412, %r708; 2026-02-21T08:16:48.9797714Z cvt.u64.u32 %rd413, %r709; 2026-02-21T08:16:48.9797782Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:16:48.9797841Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:16:48.9798010Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9798074Z mov.b64 {%r1400, %r1401}, %rd415; 2026-02-21T08:16:48.9798138Z cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T08:16:48.9798301Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9798357Z cvt.u64.u32 %rd416, %r710; 2026-02-21T08:16:48.9798419Z cvt.u64.u32 %rd417, %r711; 2026-02-21T08:16:48.9798474Z shl.b64 %rd418, %rd417, 32; 2026-02-21T08:16:48.9798530Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:16:48.9798699Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9798756Z mov.b64 {%r1403, %r1404}, %rd419; 2026-02-21T08:16:48.9798823Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T08:16:48.9798994Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9799049Z cvt.u64.u32 %rd420, %r712; 2026-02-21T08:16:48.9799104Z cvt.u64.u32 %rd421, %r713; 2026-02-21T08:16:48.9799159Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:16:48.9799224Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:16:48.9799388Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9799444Z mov.b64 {%r1406, %r1407}, %rd423; 2026-02-21T08:16:48.9799516Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 2026-02-21T08:16:48.9799681Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9799737Z cvt.u64.u32 %rd424, %r714; 2026-02-21T08:16:48.9799802Z cvt.u64.u32 %rd425, %r715; 2026-02-21T08:16:48.9799858Z shl.b64 %rd426, %rd425, 
32; 2026-02-21T08:16:48.9799917Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:16:48.9800080Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9800145Z mov.b64 {%r1409, %r1410}, %rd427; 2026-02-21T08:16:48.9800208Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 2026-02-21T08:16:48.9800371Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9800433Z cvt.u64.u32 %rd428, %r717; 2026-02-21T08:16:48.9800488Z cvt.u64.u32 %rd429, %r718; 2026-02-21T08:16:48.9800543Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:16:48.9800605Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:16:48.9800771Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9800828Z mov.b64 {%r1412, %r1413}, %rd431; 2026-02-21T08:16:48.9800891Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T08:16:48.9801064Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9801157Z cvt.u64.u32 %rd432, %r719; 2026-02-21T08:16:48.9801211Z cvt.u64.u32 %rd433, %r720; 2026-02-21T08:16:48.9801274Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:16:48.9801330Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:16:48.9801496Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9801560Z mov.b64 {%r1415, %r1416}, %rd435; 2026-02-21T08:16:48.9801622Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T08:16:48.9801785Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9801840Z cvt.u64.u32 %rd436, %r721; 2026-02-21T08:16:48.9801902Z cvt.u64.u32 %rd437, %r722; 2026-02-21T08:16:48.9801958Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:16:48.9802013Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:16:48.9802226Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9802287Z mov.b64 {%r1418, %r1419}, %rd439; 2026-02-21T08:16:48.9802349Z 
cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T08:16:48.9802520Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9802574Z cvt.u64.u32 %rd440, %r723; 2026-02-21T08:16:48.9802629Z cvt.u64.u32 %rd441, %r724; 2026-02-21T08:16:48.9802684Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:16:48.9802748Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:16:48.9802914Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9802970Z mov.b64 {%r1421, %r1422}, %rd443; 2026-02-21T08:16:48.9803040Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T08:16:48.9803208Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9803262Z cvt.u64.u32 %rd444, %r725; 2026-02-21T08:16:48.9803324Z cvt.u64.u32 %rd445, %r726; 2026-02-21T08:16:48.9803383Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:16:48.9803439Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:16:48.9803604Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9803666Z mov.b64 {%r1424, %r1425}, %rd447; 2026-02-21T08:16:48.9803728Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 2026-02-21T08:16:48.9803892Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9803954Z cvt.u64.u32 %rd448, %r727; 2026-02-21T08:16:48.9804008Z cvt.u64.u32 %rd449, %r728; 2026-02-21T08:16:48.9804063Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:16:48.9804126Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:16:48.9804290Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9804344Z mov.b64 {%r1427, %r1428}, %rd451; 2026-02-21T08:16:48.9804408Z cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T08:16:48.9804577Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9804631Z cvt.u64.u32 %rd452, %r729; 2026-02-21T08:16:48.9804713Z cvt.u64.u32 %rd453, %r730; 
2026-02-21T08:16:48.9804776Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:16:48.9804831Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:16:48.9804991Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9805053Z mov.b64 {%r1430, %r1431}, %rd455; 2026-02-21T08:16:48.9805117Z cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T08:16:48.9805278Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9805333Z cvt.u64.u32 %rd456, %r731; 2026-02-21T08:16:48.9805393Z cvt.u64.u32 %rd457, %r732; 2026-02-21T08:16:48.9805448Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:16:48.9805503Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:16:48.9805672Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9805775Z mov.b64 {%r1433, %r1434}, %rd459; 2026-02-21T08:16:48.9805837Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T08:16:48.9806008Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9806063Z cvt.u64.u32 %rd460, %r734; 2026-02-21T08:16:48.9806118Z cvt.u64.u32 %rd461, %r735; 2026-02-21T08:16:48.9806174Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:16:48.9806239Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:16:48.9806400Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9806459Z mov.b64 {%r1436, %r1437}, %rd463; 2026-02-21T08:16:48.9806529Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T08:16:48.9806739Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9806800Z cvt.u64.u32 %rd464, %r736; 2026-02-21T08:16:48.9806863Z cvt.u64.u32 %rd465, %r737; 2026-02-21T08:16:48.9806919Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:16:48.9806976Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:16:48.9807139Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9807202Z mov.b64 {%r1439, 
%r1440}, %rd467; 2026-02-21T08:16:48.9807266Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T08:16:48.9807433Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9807496Z cvt.u64.u32 %rd468, %r738; 2026-02-21T08:16:48.9807554Z cvt.u64.u32 %rd469, %r739; 2026-02-21T08:16:48.9807609Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:16:48.9807672Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:16:48.9807835Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9807895Z mov.b64 {%r1442, %r1443}, %rd471; 2026-02-21T08:16:48.9807961Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 2026-02-21T08:16:48.9808132Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9808187Z cvt.u64.u32 %rd472, %r740; 2026-02-21T08:16:48.9808241Z cvt.u64.u32 %rd473, %r741; 2026-02-21T08:16:48.9808303Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:16:48.9808358Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:16:48.9808518Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9808580Z mov.b64 {%r1445, %r1446}, %rd475; 2026-02-21T08:16:48.9808643Z cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T08:16:48.9808804Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9808859Z cvt.u64.u32 %rd476, %r742; 2026-02-21T08:16:48.9808919Z cvt.u64.u32 %rd477, %r743; 2026-02-21T08:16:48.9808976Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:16:48.9809035Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:16:48.9809205Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9809261Z mov.b64 {%r1448, %r1449}, %rd479; 2026-02-21T08:16:48.9809324Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T08:16:48.9809494Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9809550Z cvt.u64.u32 %rd480, %r744; 
2026-02-21T08:16:48.9809603Z cvt.u64.u32 %rd481, %r745; 2026-02-21T08:16:48.9809659Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:16:48.9809721Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:16:48.9809885Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9809939Z mov.b64 {%r1451, %r1452}, %rd483; 2026-02-21T08:16:48.9810009Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T08:16:48.9810174Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9810286Z cvt.u64.u32 %rd484, %r746; 2026-02-21T08:16:48.9810348Z cvt.u64.u32 %rd485, %r747; 2026-02-21T08:16:48.9810403Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:16:48.9810460Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:16:48.9810620Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9810681Z mov.b64 {%r1454, %r1455}, %rd487; 2026-02-21T08:16:48.9810744Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T08:16:48.9810903Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9810966Z cvt.u64.u32 %rd488, %r748; 2026-02-21T08:16:48.9811020Z cvt.u64.u32 %rd489, %r749; 2026-02-21T08:16:48.9811074Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:16:48.9811136Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:16:48.9811338Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9811398Z mov.b64 {%r1457, %r1458}, %rd491; 2026-02-21T08:16:48.9811460Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T08:16:48.9811629Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9811685Z cvt.u64.u32 %rd492, %r751; 2026-02-21T08:16:48.9811739Z cvt.u64.u32 %rd493, %r752; 2026-02-21T08:16:48.9811804Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:16:48.9811860Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:16:48.9812021Z .loc 1 58 27 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9812085Z mov.b64 {%r1460, %r1461}, %rd495; 2026-02-21T08:16:48.9812147Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T08:16:48.9812307Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9812366Z cvt.u64.u32 %rd496, %r753; 2026-02-21T08:16:48.9812430Z cvt.u64.u32 %rd497, %r754; 2026-02-21T08:16:48.9812484Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:16:48.9812541Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:16:48.9812708Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9812763Z mov.b64 {%r1463, %r1464}, %rd499; 2026-02-21T08:16:48.9812824Z cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T08:16:48.9812991Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9813046Z cvt.u64.u32 %rd500, %r755; 2026-02-21T08:16:48.9813099Z cvt.u64.u32 %rd501, %r756; 2026-02-21T08:16:48.9813153Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:16:48.9813214Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:16:48.9813374Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9813432Z mov.b64 {%r1466, %r1467}, %rd503; 2026-02-21T08:16:48.9813505Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T08:16:48.9813661Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9813715Z cvt.u64.u32 %rd504, %r757; 2026-02-21T08:16:48.9813774Z cvt.u64.u32 %rd505, %r758; 2026-02-21T08:16:48.9813829Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:16:48.9813883Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:16:48.9814043Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9814105Z mov.b64 {%r1469, %r1470}, %rd507; 2026-02-21T08:16:48.9814168Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T08:16:48.9814326Z .loc 1 56 52 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9814387Z cvt.u64.u32 %rd508, %r759; 2026-02-21T08:16:48.9814441Z cvt.u64.u32 %rd509, %r760; 2026-02-21T08:16:48.9814498Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:16:48.9814604Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:16:48.9814799Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9814856Z mov.b64 {%r1472, %r1473}, %rd511; 2026-02-21T08:16:48.9814918Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 2026-02-21T08:16:48.9815083Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9815137Z cvt.u64.u32 %rd512, %r761; 2026-02-21T08:16:48.9815190Z cvt.u64.u32 %rd513, %r762; 2026-02-21T08:16:48.9815251Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:16:48.9815307Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:16:48.9815469Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9815531Z mov.b64 {%r1475, %r1476}, %rd515; 2026-02-21T08:16:48.9815593Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T08:16:48.9815802Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9815861Z cvt.u64.u32 %rd516, %r763; 2026-02-21T08:16:48.9815923Z cvt.u64.u32 %rd517, %r764; 2026-02-21T08:16:48.9815978Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:16:48.9816035Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:16:48.9816204Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9816260Z mov.b64 {%r1478, %r1479}, %rd519; 2026-02-21T08:16:48.9816322Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T08:16:48.9816492Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9816548Z cvt.u64.u32 %rd520, %r765; 2026-02-21T08:16:48.9816601Z cvt.u64.u32 %rd521, %r766; 2026-02-21T08:16:48.9816656Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:16:48.9816719Z or.b64 
%rd523, %rd520, %rd522; 2026-02-21T08:16:48.9816883Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9816942Z mov.b64 {%r1481, %r1482}, %rd523; 2026-02-21T08:16:48.9817013Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T08:16:48.9817173Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9817226Z cvt.u64.u32 %rd524, %r768; 2026-02-21T08:16:48.9817288Z cvt.u64.u32 %rd525, %r769; 2026-02-21T08:16:48.9817342Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:16:48.9817397Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:16:48.9817558Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9817619Z mov.b64 {%r1484, %r1485}, %rd527; 2026-02-21T08:16:48.9817681Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T08:16:48.9817845Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9817910Z cvt.u64.u32 %rd528, %r770; 2026-02-21T08:16:48.9817966Z cvt.u64.u32 %rd529, %r771; 2026-02-21T08:16:48.9818021Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:16:48.9818082Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:16:48.9818245Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9818301Z mov.b64 {%r1487, %r1488}, %rd531; 2026-02-21T08:16:48.9818362Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T08:16:48.9818533Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9818588Z cvt.u64.u32 %rd532, %r772; 2026-02-21T08:16:48.9818641Z cvt.u64.u32 %rd533, %r773; 2026-02-21T08:16:48.9818704Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:16:48.9818759Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:16:48.9818923Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9818988Z mov.b64 {%r1490, %r1491}, %rd535; 2026-02-21T08:16:48.9819106Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 
2026-02-21T08:16:48.9819273Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9819329Z cvt.u64.u32 %rd536, %r774; 2026-02-21T08:16:48.9819390Z cvt.u64.u32 %rd537, %r775; 2026-02-21T08:16:48.9819445Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:16:48.9819501Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:16:48.9819666Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9819722Z mov.b64 {%r1493, %r1494}, %rd539; 2026-02-21T08:16:48.9819783Z cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T08:16:48.9819953Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9820008Z cvt.u64.u32 %rd540, %r776; 2026-02-21T08:16:48.9820063Z cvt.u64.u32 %rd541, %r777; 2026-02-21T08:16:48.9820158Z shl.b64 %rd542, %rd541, 32; 2026-02-21T08:16:48.9820226Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:16:48.9820388Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9820444Z mov.b64 {%r1496, %r1497}, %rd543; 2026-02-21T08:16:48.9820513Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T08:16:48.9820674Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9820730Z cvt.u64.u32 %rd544, %r778; 2026-02-21T08:16:48.9820790Z cvt.u64.u32 %rd545, %r779; 2026-02-21T08:16:48.9820846Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:16:48.9820903Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:16:48.9821057Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9821121Z mov.b64 {%r1499, %r1500}, %rd547; 2026-02-21T08:16:48.9821183Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 2026-02-21T08:16:48.9821339Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9821403Z cvt.u64.u32 %rd548, %r780; 2026-02-21T08:16:48.9821458Z cvt.u64.u32 %rd549, %r781; 2026-02-21T08:16:48.9821514Z shl.b64 %rd550, %rd549, 
32; 2026-02-21T08:16:48.9821576Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:16:48.9821737Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9821793Z mov.b64 {%r1502, %r1503}, %rd551; 2026-02-21T08:16:48.9821856Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T08:16:48.9822024Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9822081Z cvt.u64.u32 %rd552, %r782; 2026-02-21T08:16:48.9822136Z cvt.u64.u32 %rd553, %r783; 2026-02-21T08:16:48.9822198Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:16:48.9822254Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:16:48.9822419Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9822483Z mov.b64 {%r1505, %r1506}, %rd555; 2026-02-21T08:16:48.9822545Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T08:16:48.9822707Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9822761Z cvt.u64.u32 %rd556, %r785; 2026-02-21T08:16:48.9822822Z cvt.u64.u32 %rd557, %r786; 2026-02-21T08:16:48.9822877Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:16:48.9822934Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:16:48.9823100Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9823156Z mov.b64 {%r1508, %r1509}, %rd559; 2026-02-21T08:16:48.9823218Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T08:16:48.9823392Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9823449Z cvt.u64.u32 %rd560, %r787; 2026-02-21T08:16:48.9823544Z cvt.u64.u32 %rd561, %r788; 2026-02-21T08:16:48.9823601Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:16:48.9823663Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:16:48.9823828Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9823884Z mov.b64 {%r1511, %r1512}, %rd563; 2026-02-21T08:16:48.9823952Z 
cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T08:16:48.9824116Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9824170Z cvt.u64.u32 %rd564, %r789; 2026-02-21T08:16:48.9824231Z cvt.u64.u32 %rd565, %r790; 2026-02-21T08:16:48.9824287Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:16:48.9824342Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:16:48.9824508Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9824610Z mov.b64 {%r1514, %r1515}, %rd567; 2026-02-21T08:16:48.9824696Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T08:16:48.9824859Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9824922Z cvt.u64.u32 %rd568, %r791; 2026-02-21T08:16:48.9824977Z cvt.u64.u32 %rd569, %r792; 2026-02-21T08:16:48.9825031Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:16:48.9825095Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:16:48.9825255Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9825310Z mov.b64 {%r1517, %r1518}, %rd571; 2026-02-21T08:16:48.9825372Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 2026-02-21T08:16:48.9825542Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9825597Z cvt.u64.u32 %rd572, %r793; 2026-02-21T08:16:48.9825651Z cvt.u64.u32 %rd573, %r794; 2026-02-21T08:16:48.9825715Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:16:48.9825771Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:16:48.9825932Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9825993Z mov.b64 {%r1520, %r1521}, %rd575; 2026-02-21T08:16:48.9826055Z cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T08:16:48.9826211Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9826265Z cvt.u64.u32 %rd576, %r795; 2026-02-21T08:16:48.9826325Z cvt.u64.u32 %rd577, %r796; 
2026-02-21T08:16:48.9826380Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:16:48.9826442Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:16:48.9826607Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9826662Z mov.b64 {%r1523, %r1524}, %rd579; 2026-02-21T08:16:48.9826724Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T08:16:48.9826889Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9826947Z cvt.u64.u32 %rd580, %r797; 2026-02-21T08:16:48.9827002Z cvt.u64.u32 %rd581, %r798; 2026-02-21T08:16:48.9827058Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:16:48.9827122Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:16:48.9827309Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9827367Z mov.b64 {%r1526, %r1527}, %rd583; 2026-02-21T08:16:48.9827438Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T08:16:48.9827606Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9827664Z cvt.u64.u32 %rd584, %r799; 2026-02-21T08:16:48.9827729Z cvt.u64.u32 %rd585, %r800; 2026-02-21T08:16:48.9827787Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:16:48.9827846Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:16:48.9828015Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9828148Z mov.b64 {%r1529, %r1530}, %rd587; 2026-02-21T08:16:48.9828215Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T08:16:48.9828383Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9828449Z cvt.u64.u32 %rd588, %r802; 2026-02-21T08:16:48.9828506Z cvt.u64.u32 %rd589, %r803; 2026-02-21T08:16:48.9828564Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:16:48.9828629Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:16:48.9828795Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9828851Z mov.b64 {%r1532, 
%r1533}, %rd591; 2026-02-21T08:16:48.9828916Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T08:16:48.9829092Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9829204Z cvt.u64.u32 %rd592, %r804; 2026-02-21T08:16:48.9829263Z cvt.u64.u32 %rd593, %r805; 2026-02-21T08:16:48.9829332Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:16:48.9829391Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:16:48.9829563Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9829627Z mov.b64 {%r1535, %r1536}, %rd595; 2026-02-21T08:16:48.9829693Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 2026-02-21T08:16:48.9829862Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9829919Z cvt.u64.u32 %rd596, %r806; 2026-02-21T08:16:48.9829985Z cvt.u64.u32 %rd597, %r807; 2026-02-21T08:16:48.9830043Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:16:48.9830102Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:16:48.9830283Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9830341Z mov.b64 {%r1538, %r1539}, %rd599; 2026-02-21T08:16:48.9830409Z cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T08:16:48.9830589Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9830646Z cvt.u64.u32 %rd600, %r808; 2026-02-21T08:16:48.9830703Z cvt.u64.u32 %rd601, %r809; 2026-02-21T08:16:48.9830761Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:16:48.9830825Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:16:48.9831001Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9831059Z mov.b64 {%r1541, %r1542}, %rd603; 2026-02-21T08:16:48.9831131Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T08:16:48.9831298Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9831356Z cvt.u64.u32 %rd604, %r810; 
2026-02-21T08:16:48.9831419Z cvt.u64.u32 %rd605, %r811; 2026-02-21T08:16:48.9831475Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:16:48.9831535Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:16:48.9831704Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9831768Z mov.b64 {%r1544, %r1545}, %rd607; 2026-02-21T08:16:48.9831834Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T08:16:48.9832000Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9832064Z cvt.u64.u32 %rd608, %r812; 2026-02-21T08:16:48.9832121Z cvt.u64.u32 %rd609, %r813; 2026-02-21T08:16:48.9832178Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:16:48.9832244Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:16:48.9832414Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9832472Z mov.b64 {%r1547, %r1548}, %rd611; 2026-02-21T08:16:48.9832537Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T08:16:48.9832716Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9832833Z cvt.u64.u32 %rd612, %r814; 2026-02-21T08:16:48.9832891Z cvt.u64.u32 %rd613, %r815; 2026-02-21T08:16:48.9832957Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:16:48.9833016Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:16:48.9833185Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9833251Z mov.b64 {%r1550, %r1551}, %rd615; 2026-02-21T08:16:48.9833318Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T08:16:48.9833485Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9833544Z cvt.u64.u32 %rd616, %r816; 2026-02-21T08:16:48.9833609Z cvt.u64.u32 %rd617, %r817; 2026-02-21T08:16:48.9833666Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:16:48.9833727Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:16:48.9833941Z .loc 1 58 27 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9834004Z mov.b64 {%r1553, %r1554}, %rd619; 2026-02-21T08:16:48.9834071Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T08:16:48.9834250Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9834309Z cvt.u64.u32 %rd620, %r819; 2026-02-21T08:16:48.9834367Z cvt.u64.u32 %rd621, %r820; 2026-02-21T08:16:48.9834426Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:16:48.9834493Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:16:48.9834661Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9834757Z mov.b64 {%r1556, %r1557}, %rd623; 2026-02-21T08:16:48.9834829Z cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T08:16:48.9834992Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9835052Z cvt.u64.u32 %rd624, %r821; 2026-02-21T08:16:48.9835117Z cvt.u64.u32 %rd625, %r822; 2026-02-21T08:16:48.9835178Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:16:48.9835237Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:16:48.9835403Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9835471Z mov.b64 {%r1559, %r1560}, %rd627; 2026-02-21T08:16:48.9835536Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T08:16:48.9835700Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9835766Z cvt.u64.u32 %rd628, %r823; 2026-02-21T08:16:48.9835822Z cvt.u64.u32 %rd629, %r824; 2026-02-21T08:16:48.9835891Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:16:48.9835952Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:16:48.9836108Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9836165Z mov.b64 {%r1562, %r1563}, %rd631; 2026-02-21T08:16:48.9836232Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T08:16:48.9836398Z .loc 1 56 52 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9836452Z cvt.u64.u32 %rd632, %r825; 2026-02-21T08:16:48.9836506Z cvt.u64.u32 %rd633, %r826; 2026-02-21T08:16:48.9836568Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:16:48.9836625Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:16:48.9836780Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9836842Z mov.b64 {%r1565, %r1566}, %rd635; 2026-02-21T08:16:48.9836905Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 2026-02-21T08:16:48.9837061Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9837117Z cvt.u64.u32 %rd636, %r827; 2026-02-21T08:16:48.9837180Z cvt.u64.u32 %rd637, %r828; 2026-02-21T08:16:48.9837235Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:16:48.9837293Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:16:48.9837510Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9837566Z mov.b64 {%r1568, %r1569}, %rd639; 2026-02-21T08:16:48.9837631Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T08:16:48.9837847Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9837914Z cvt.u64.u32 %rd640, %r829; 2026-02-21T08:16:48.9837984Z cvt.u64.u32 %rd641, %r830; 2026-02-21T08:16:48.9838044Z shl.b64 %rd642, %rd641, 32; 2026-02-21T08:16:48.9838109Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T08:16:48.9838272Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9838328Z mov.b64 {%r1571, %r1572}, %rd643; 2026-02-21T08:16:48.9838400Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T08:16:48.9838612Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9838672Z cvt.u64.u32 %rd644, %r831; 2026-02-21T08:16:48.9838734Z cvt.u64.u32 %rd645, %r832; 2026-02-21T08:16:48.9838790Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:16:48.9838847Z or.b64 
%rd647, %rd644, %rd646; 2026-02-21T08:16:48.9839011Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9839074Z mov.b64 {%r1574, %r1575}, %rd647; 2026-02-21T08:16:48.9839135Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T08:16:48.9839303Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9839365Z cvt.u64.u32 %rd648, %r833; 2026-02-21T08:16:48.9839418Z cvt.u64.u32 %rd649, %r834; 2026-02-21T08:16:48.9839472Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:16:48.9839533Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:16:48.9839700Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9839756Z mov.b64 {%r1577, %r1578}, %rd651; 2026-02-21T08:16:48.9839818Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T08:16:48.9839987Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9840042Z cvt.u64.u32 %rd652, %r836; 2026-02-21T08:16:48.9840096Z cvt.u64.u32 %rd653, %r837; 2026-02-21T08:16:48.9840158Z shl.b64 %rd654, %rd653, 32; 2026-02-21T08:16:48.9840213Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T08:16:48.9840376Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9840437Z mov.b64 {%r1580, %r1581}, %rd655; 2026-02-21T08:16:48.9840499Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T08:16:48.9840664Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9840719Z cvt.u64.u32 %rd656, %r838; 2026-02-21T08:16:48.9840782Z cvt.u64.u32 %rd657, %r839; 2026-02-21T08:16:48.9840840Z shl.b64 %rd658, %rd657, 32; 2026-02-21T08:16:48.9840895Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T08:16:48.9841065Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9841121Z mov.b64 {%r1583, %r1584}, %rd659; 2026-02-21T08:16:48.9841182Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 
2026-02-21T08:16:48.9841355Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9841409Z cvt.u64.u32 %rd660, %r840; 2026-02-21T08:16:48.9841464Z cvt.u64.u32 %rd661, %r841; 2026-02-21T08:16:48.9841518Z shl.b64 %rd662, %rd661, 32; 2026-02-21T08:16:48.9841581Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T08:16:48.9841745Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9841800Z mov.b64 {%r1586, %r1587}, %rd663; 2026-02-21T08:16:48.9841872Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T08:16:48.9842074Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9842131Z cvt.u64.u32 %rd664, %r842; 2026-02-21T08:16:48.9842193Z cvt.u64.u32 %rd665, %r843; 2026-02-21T08:16:48.9842250Z shl.b64 %rd666, %rd665, 32; 2026-02-21T08:16:48.9842306Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T08:16:48.9842468Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9842530Z mov.b64 {%r1589, %r1590}, %rd667; 2026-02-21T08:16:48.9842592Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T08:16:48.9842754Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9842816Z cvt.u64.u32 %rd668, %r844; 2026-02-21T08:16:48.9842871Z cvt.u64.u32 %rd669, %r845; 2026-02-21T08:16:48.9842926Z shl.b64 %rd670, %rd669, 32; 2026-02-21T08:16:48.9843040Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T08:16:48.9843206Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9843262Z mov.b64 {%r1592, %r1593}, %rd671; 2026-02-21T08:16:48.9843324Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T08:16:48.9843495Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9843549Z cvt.u64.u32 %rd672, %r846; 2026-02-21T08:16:48.9843602Z cvt.u64.u32 %rd673, %r847; 2026-02-21T08:16:48.9843663Z shl.b64 %rd674, %rd673, 
32; 2026-02-21T08:16:48.9843718Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T08:16:48.9843881Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9843942Z mov.b64 {%r1595, %r1596}, %rd675; 2026-02-21T08:16:48.9844004Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T08:16:48.9844171Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9844228Z cvt.u64.u32 %rd676, %r848; 2026-02-21T08:16:48.9844291Z cvt.u64.u32 %rd677, %r849; 2026-02-21T08:16:48.9844345Z shl.b64 %rd678, %rd677, 32; 2026-02-21T08:16:48.9844401Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T08:16:48.9844569Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9844624Z mov.b64 {%r1598, %r1599}, %rd679; 2026-02-21T08:16:48.9844711Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T08:16:48.9844882Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9844937Z cvt.u64.u32 %rd680, %r850; 2026-02-21T08:16:48.9844991Z cvt.u64.u32 %rd681, %r851; 2026-02-21T08:16:48.9845047Z shl.b64 %rd682, %rd681, 32; 2026-02-21T08:16:48.9845110Z or.b64 %rd683, %rd680, %rd682; 2026-02-21T08:16:48.9845276Z .loc 1 58 27 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:58:27 2026-02-21T08:16:48.9845334Z mov.b64 {%r1601, %r1602}, %rd683; 2026-02-21T08:16:48.9845405Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T08:16:48.9845567Z .loc 1 59 82 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:59:82 2026-02-21T08:16:48.9845669Z st.shared.v4.b32 [%r1173], {%r1222, %r1234, %r1246, %r1258}; 2026-02-21T08:16:48.9845771Z st.shared.v4.b32 [%r1172], {%r1270, %r1282, %r1294, %r1306}; 2026-02-21T08:16:48.9845860Z st.shared.v4.b32 [%r1170], {%r1318, %r1330, %r1342, %r1354}; 2026-02-21T08:16:48.9845948Z st.shared.v4.b32 [%r1168], {%r1366, %r1378, %r1390, %r1402}; 2026-02-21T08:16:48.9846009Z bar.sync 0, 128; 2026-02-21T08:16:48.9846064Z 
// begin inline asm 2026-02-21T08:16:48.9846218Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1013, %r1017, %r1021, %r1025}, [%r857]; 2026-02-21T08:16:48.9846272Z // end inline asm 2026-02-21T08:16:48.9846335Z // begin inline asm 2026-02-21T08:16:48.9846487Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1029, %r1033, %r1037, %r1041}, [%r862]; 2026-02-21T08:16:48.9846596Z // end inline asm 2026-02-21T08:16:48.9846656Z // begin inline asm 2026-02-21T08:16:48.9846803Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1045, %r1049, %r1053, %r1057}, [%r867]; 2026-02-21T08:16:48.9846856Z // end inline asm 2026-02-21T08:16:48.9846909Z // begin inline asm 2026-02-21T08:16:48.9847058Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1061, %r1065, %r1069, %r1073}, [%r872]; 2026-02-21T08:16:48.9847110Z // end inline asm 2026-02-21T08:16:48.9847163Z bar.sync 0, 128; 2026-02-21T08:16:48.9847261Z st.shared.v4.b32 [%r1173], {%r1414, %r1426, %r1438, %r1450}; 2026-02-21T08:16:48.9847351Z st.shared.v4.b32 [%r1172], {%r1462, %r1474, %r1486, %r1498}; 2026-02-21T08:16:48.9847437Z st.shared.v4.b32 [%r1170], {%r1510, %r1522, %r1534, %r1546}; 2026-02-21T08:16:48.9847530Z st.shared.v4.b32 [%r1168], {%r1558, %r1570, %r1582, %r1594}; 2026-02-21T08:16:48.9847583Z bar.sync 0, 128; 2026-02-21T08:16:48.9847681Z // begin inline asm 2026-02-21T08:16:48.9847829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1077, %r1081, %r1085, %r1089}, [%r857]; 2026-02-21T08:16:48.9847889Z // end inline asm 2026-02-21T08:16:48.9847942Z // begin inline asm 2026-02-21T08:16:48.9848083Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1093, %r1097, %r1101, %r1105}, [%r862]; 2026-02-21T08:16:48.9848143Z // end inline asm 2026-02-21T08:16:48.9848196Z // begin inline asm 2026-02-21T08:16:48.9848336Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1109, %r1113, %r1117, %r1121}, [%r867]; 2026-02-21T08:16:48.9848393Z // end inline asm 2026-02-21T08:16:48.9848445Z // begin inline asm 2026-02-21T08:16:48.9848585Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1125, %r1129, %r1133, %r1137}, [%r872]; 2026-02-21T08:16:48.9848635Z // end inline asm 2026-02-21T08:16:48.9848695Z bar.sync 0, 128; 2026-02-21T08:16:48.9848784Z st.shared.v4.b32 [%r1173], {%r1225, %r1237, %r1249, %r1261}; 2026-02-21T08:16:48.9848874Z st.shared.v4.b32 [%r1172], {%r1273, %r1285, %r1297, %r1309}; 2026-02-21T08:16:48.9848970Z st.shared.v4.b32 [%r1170], {%r1321, %r1333, %r1345, %r1357}; 2026-02-21T08:16:48.9849056Z st.shared.v4.b32 [%r1168], {%r1369, %r1381, %r1393, %r1405}; 2026-02-21T08:16:48.9849111Z bar.sync 0, 128; 2026-02-21T08:16:48.9849163Z // begin inline asm 2026-02-21T08:16:48.9849316Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1014, %r1018, %r1022, %r1026}, [%r857]; 2026-02-21T08:16:48.9849368Z // end inline asm 2026-02-21T08:16:48.9849421Z // begin inline asm 2026-02-21T08:16:48.9849571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1030, %r1034, %r1038, %r1042}, [%r862]; 2026-02-21T08:16:48.9849622Z // end inline asm 2026-02-21T08:16:48.9849675Z // begin inline asm 2026-02-21T08:16:48.9849821Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1046, %r1050, %r1054, %r1058}, [%r867]; 2026-02-21T08:16:48.9849872Z // end inline asm 2026-02-21T08:16:48.9849925Z // begin inline asm 2026-02-21T08:16:48.9850067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1062, %r1066, %r1070, %r1074}, [%r872]; 2026-02-21T08:16:48.9850129Z // end inline asm 2026-02-21T08:16:48.9850183Z bar.sync 0, 128; 2026-02-21T08:16:48.9850272Z st.shared.v4.b32 [%r1173], {%r1417, %r1429, %r1441, %r1453}; 2026-02-21T08:16:48.9850366Z st.shared.v4.b32 [%r1172], {%r1465, %r1477, %r1489, %r1501}; 2026-02-21T08:16:48.9850453Z st.shared.v4.b32 [%r1170], {%r1513, %r1525, %r1537, %r1549}; 2026-02-21T08:16:48.9850541Z st.shared.v4.b32 [%r1168], {%r1561, %r1573, %r1585, %r1597}; 2026-02-21T08:16:48.9850601Z bar.sync 0, 128; 2026-02-21T08:16:48.9850654Z // begin inline asm 2026-02-21T08:16:48.9850796Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1078, 
%r1082, %r1086, %r1090}, [%r857]; 2026-02-21T08:16:48.9850847Z // end inline asm 2026-02-21T08:16:48.9850909Z // begin inline asm 2026-02-21T08:16:48.9851051Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1094, %r1098, %r1102, %r1106}, [%r862]; 2026-02-21T08:16:48.9851103Z // end inline asm 2026-02-21T08:16:48.9851162Z // begin inline asm 2026-02-21T08:16:48.9851306Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1110, %r1114, %r1118, %r1122}, [%r867]; 2026-02-21T08:16:48.9851404Z // end inline asm 2026-02-21T08:16:48.9851456Z // begin inline asm 2026-02-21T08:16:48.9851603Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1126, %r1130, %r1134, %r1138}, [%r872]; 2026-02-21T08:16:48.9851654Z // end inline asm 2026-02-21T08:16:48.9851707Z bar.sync 0, 128; 2026-02-21T08:16:48.9851804Z st.shared.v4.b32 [%r1173], {%r1228, %r1240, %r1252, %r1264}; 2026-02-21T08:16:48.9851892Z st.shared.v4.b32 [%r1172], {%r1276, %r1288, %r1300, %r1312}; 2026-02-21T08:16:48.9851979Z st.shared.v4.b32 [%r1170], {%r1324, %r1336, %r1348, %r1360}; 2026-02-21T08:16:48.9852072Z st.shared.v4.b32 [%r1168], {%r1372, %r1384, %r1396, %r1408}; 2026-02-21T08:16:48.9852123Z bar.sync 0, 128; 2026-02-21T08:16:48.9852175Z // begin inline asm 2026-02-21T08:16:48.9852316Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1015, %r1019, %r1023, %r1027}, [%r857]; 2026-02-21T08:16:48.9852413Z // end inline asm 2026-02-21T08:16:48.9852470Z // begin inline asm 2026-02-21T08:16:48.9852613Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1031, %r1035, %r1039, %r1043}, [%r862]; 2026-02-21T08:16:48.9852672Z // end inline asm 2026-02-21T08:16:48.9852726Z // begin inline asm 2026-02-21T08:16:48.9852870Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1047, %r1051, %r1055, %r1059}, [%r867]; 2026-02-21T08:16:48.9852932Z // end inline asm 2026-02-21T08:16:48.9852988Z // begin inline asm 2026-02-21T08:16:48.9853126Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1063, %r1067, %r1071, %r1075}, [%r872]; 2026-02-21T08:16:48.9853176Z // end inline asm 
2026-02-21T08:16:48.9853235Z bar.sync 0, 128; 2026-02-21T08:16:48.9853324Z st.shared.v4.b32 [%r1173], {%r1420, %r1432, %r1444, %r1456}; 2026-02-21T08:16:48.9853412Z st.shared.v4.b32 [%r1172], {%r1468, %r1480, %r1492, %r1504}; 2026-02-21T08:16:48.9853507Z st.shared.v4.b32 [%r1170], {%r1516, %r1528, %r1540, %r1552}; 2026-02-21T08:16:48.9853596Z st.shared.v4.b32 [%r1168], {%r1564, %r1576, %r1588, %r1600}; 2026-02-21T08:16:48.9853651Z bar.sync 0, 128; 2026-02-21T08:16:48.9853703Z // begin inline asm 2026-02-21T08:16:48.9853850Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1079, %r1083, %r1087, %r1091}, [%r857]; 2026-02-21T08:16:48.9853900Z // end inline asm 2026-02-21T08:16:48.9853951Z // begin inline asm 2026-02-21T08:16:48.9854100Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1095, %r1099, %r1103, %r1107}, [%r862]; 2026-02-21T08:16:48.9854151Z // end inline asm 2026-02-21T08:16:48.9854203Z // begin inline asm 2026-02-21T08:16:48.9854353Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1111, %r1115, %r1119, %r1123}, [%r867]; 2026-02-21T08:16:48.9854403Z // end inline asm 2026-02-21T08:16:48.9854456Z // begin inline asm 2026-02-21T08:16:48.9854597Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1127, %r1131, %r1135, %r1139}, [%r872]; 2026-02-21T08:16:48.9854654Z // end inline asm 2026-02-21T08:16:48.9854726Z bar.sync 0, 128; 2026-02-21T08:16:48.9854818Z st.shared.v4.b32 [%r1173], {%r1231, %r1243, %r1255, %r1267}; 2026-02-21T08:16:48.9854915Z st.shared.v4.b32 [%r1172], {%r1279, %r1291, %r1303, %r1315}; 2026-02-21T08:16:48.9855002Z st.shared.v4.b32 [%r1170], {%r1327, %r1339, %r1351, %r1363}; 2026-02-21T08:16:48.9855088Z st.shared.v4.b32 [%r1168], {%r1375, %r1387, %r1399, %r1411}; 2026-02-21T08:16:48.9855147Z bar.sync 0, 128; 2026-02-21T08:16:48.9855198Z // begin inline asm 2026-02-21T08:16:48.9855341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1016, %r1020, %r1024, %r1028}, [%r857]; 2026-02-21T08:16:48.9855392Z // end inline asm 2026-02-21T08:16:48.9855451Z // begin inline asm 
2026-02-21T08:16:48.9855590Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1032, %r1036, %r1040, %r1044}, [%r862]; 2026-02-21T08:16:48.9855643Z // end inline asm 2026-02-21T08:16:48.9855701Z // begin inline asm 2026-02-21T08:16:48.9855843Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1048, %r1052, %r1056, %r1060}, [%r867]; 2026-02-21T08:16:48.9855893Z // end inline asm 2026-02-21T08:16:48.9855948Z // begin inline asm 2026-02-21T08:16:48.9856146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1064, %r1068, %r1072, %r1076}, [%r872]; 2026-02-21T08:16:48.9856199Z // end inline asm 2026-02-21T08:16:48.9856251Z bar.sync 0, 128; 2026-02-21T08:16:48.9856347Z st.shared.v4.b32 [%r1173], {%r1423, %r1435, %r1447, %r1459}; 2026-02-21T08:16:48.9856434Z st.shared.v4.b32 [%r1172], {%r1471, %r1483, %r1495, %r1507}; 2026-02-21T08:16:48.9856521Z st.shared.v4.b32 [%r1170], {%r1519, %r1531, %r1543, %r1555}; 2026-02-21T08:16:48.9856615Z st.shared.v4.b32 [%r1168], {%r1567, %r1579, %r1591, %r1603}; 2026-02-21T08:16:48.9856666Z bar.sync 0, 128; 2026-02-21T08:16:48.9856718Z // begin inline asm 2026-02-21T08:16:48.9856860Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1080, %r1084, %r1088, %r1092}, [%r857]; 2026-02-21T08:16:48.9856917Z // end inline asm 2026-02-21T08:16:48.9856969Z // begin inline asm 2026-02-21T08:16:48.9857158Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1096, %r1100, %r1104, %r1108}, [%r862]; 2026-02-21T08:16:48.9857220Z // end inline asm 2026-02-21T08:16:48.9857271Z // begin inline asm 2026-02-21T08:16:48.9857409Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1112, %r1116, %r1120, %r1124}, [%r867]; 2026-02-21T08:16:48.9857465Z // end inline asm 2026-02-21T08:16:48.9857517Z // begin inline asm 2026-02-21T08:16:48.9857653Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1128, %r1132, %r1136, %r1140}, [%r872]; 2026-02-21T08:16:48.9857703Z // end inline asm 2026-02-21T08:16:48.9857762Z // begin inline asm 2026-02-21T08:16:48.9857869Z st.global.v4.b32 [ %rd140 + 0 ], { %r1013, %r1014, 
%r1015, %r1016 }; 2026-02-21T08:16:48.9857920Z // end inline asm 2026-02-21T08:16:48.9857979Z // begin inline asm 2026-02-21T08:16:48.9858081Z st.global.v4.b32 [ %rd141 + 0 ], { %r1017, %r1018, %r1019, %r1020 }; 2026-02-21T08:16:48.9858132Z // end inline asm 2026-02-21T08:16:48.9858185Z // begin inline asm 2026-02-21T08:16:48.9858292Z st.global.v4.b32 [ %rd142 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T08:16:48.9858347Z // end inline asm 2026-02-21T08:16:48.9858398Z // begin inline asm 2026-02-21T08:16:48.9858499Z st.global.v4.b32 [ %rd143 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T08:16:48.9858550Z // end inline asm 2026-02-21T08:16:48.9858603Z // begin inline asm 2026-02-21T08:16:48.9858702Z st.global.v4.b32 [ %rd144 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T08:16:48.9858755Z // end inline asm 2026-02-21T08:16:48.9858807Z // begin inline asm 2026-02-21T08:16:48.9858900Z st.global.v4.b32 [ %rd145 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T08:16:48.9858958Z // end inline asm 2026-02-21T08:16:48.9859010Z // begin inline asm 2026-02-21T08:16:48.9859103Z st.global.v4.b32 [ %rd146 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T08:16:48.9859161Z // end inline asm 2026-02-21T08:16:48.9859214Z // begin inline asm 2026-02-21T08:16:48.9859306Z st.global.v4.b32 [ %rd147 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T08:16:48.9859359Z // end inline asm 2026-02-21T08:16:48.9859422Z // begin inline asm 2026-02-21T08:16:48.9859513Z st.global.v4.b32 [ %rd148 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T08:16:48.9859563Z // end inline asm 2026-02-21T08:16:48.9859621Z // begin inline asm 2026-02-21T08:16:48.9859712Z st.global.v4.b32 [ %rd149 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T08:16:48.9859763Z // end inline asm 2026-02-21T08:16:48.9859821Z // begin inline asm 2026-02-21T08:16:48.9859914Z st.global.v4.b32 [ %rd150 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T08:16:48.9859964Z // end 
inline asm 2026-02-21T08:16:48.9860016Z // begin inline asm 2026-02-21T08:16:48.9860116Z st.global.v4.b32 [ %rd151 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T08:16:48.9860167Z // end inline asm 2026-02-21T08:16:48.9860220Z // begin inline asm 2026-02-21T08:16:48.9860325Z st.global.v4.b32 [ %rd152 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T08:16:48.9860377Z // end inline asm 2026-02-21T08:16:48.9860431Z // begin inline asm 2026-02-21T08:16:48.9860564Z st.global.v4.b32 [ %rd153 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T08:16:48.9860623Z // end inline asm 2026-02-21T08:16:48.9860676Z // begin inline asm 2026-02-21T08:16:48.9860769Z st.global.v4.b32 [ %rd154 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T08:16:48.9860828Z // end inline asm 2026-02-21T08:16:48.9860880Z // begin inline asm 2026-02-21T08:16:48.9860975Z st.global.v4.b32 [ %rd155 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T08:16:48.9861032Z // end inline asm 2026-02-21T08:16:48.9861084Z // begin inline asm 2026-02-21T08:16:48.9861177Z st.global.v4.b32 [ %rd156 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T08:16:48.9861228Z // end inline asm 2026-02-21T08:16:48.9861287Z // begin inline asm 2026-02-21T08:16:48.9861383Z st.global.v4.b32 [ %rd157 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T08:16:48.9861434Z // end inline asm 2026-02-21T08:16:48.9861492Z // begin inline asm 2026-02-21T08:16:48.9861639Z st.global.v4.b32 [ %rd158 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T08:16:48.9861695Z // end inline asm 2026-02-21T08:16:48.9861748Z // begin inline asm 2026-02-21T08:16:48.9861849Z st.global.v4.b32 [ %rd159 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T08:16:48.9861902Z // end inline asm 2026-02-21T08:16:48.9861954Z // begin inline asm 2026-02-21T08:16:48.9862051Z st.global.v4.b32 [ %rd160 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T08:16:48.9862103Z // end inline asm 2026-02-21T08:16:48.9862156Z // begin inline asm 
2026-02-21T08:16:48.9862247Z st.global.v4.b32 [ %rd161 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T08:16:48.9862304Z // end inline asm 2026-02-21T08:16:48.9862356Z // begin inline asm 2026-02-21T08:16:48.9862449Z st.global.v4.b32 [ %rd162 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T08:16:48.9862507Z // end inline asm 2026-02-21T08:16:48.9862559Z // begin inline asm 2026-02-21T08:16:48.9862655Z st.global.v4.b32 [ %rd163 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T08:16:48.9862717Z // end inline asm 2026-02-21T08:16:48.9862768Z // begin inline asm 2026-02-21T08:16:48.9862860Z st.global.v4.b32 [ %rd164 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T08:16:48.9862911Z // end inline asm 2026-02-21T08:16:48.9862968Z // begin inline asm 2026-02-21T08:16:48.9863061Z st.global.v4.b32 [ %rd165 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T08:16:48.9863113Z // end inline asm 2026-02-21T08:16:48.9863173Z // begin inline asm 2026-02-21T08:16:48.9863266Z st.global.v4.b32 [ %rd166 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T08:16:48.9863318Z // end inline asm 2026-02-21T08:16:48.9863370Z // begin inline asm 2026-02-21T08:16:48.9863468Z st.global.v4.b32 [ %rd167 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T08:16:48.9863518Z // end inline asm 2026-02-21T08:16:48.9863571Z // begin inline asm 2026-02-21T08:16:48.9863673Z st.global.v4.b32 [ %rd168 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T08:16:48.9863726Z // end inline asm 2026-02-21T08:16:48.9863778Z // begin inline asm 2026-02-21T08:16:48.9863876Z st.global.v4.b32 [ %rd169 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T08:16:48.9863927Z // end inline asm 2026-02-21T08:16:48.9863980Z // begin inline asm 2026-02-21T08:16:48.9864071Z st.global.v4.b32 [ %rd170 + 0 ], { %r1133, %r1134, %r1135, %r1136 }; 2026-02-21T08:16:48.9864129Z // end inline asm 2026-02-21T08:16:48.9864182Z // begin inline asm 2026-02-21T08:16:48.9864273Z st.global.v4.b32 [ %rd171 + 0 
], { %r1137, %r1138, %r1139, %r1140 }; 2026-02-21T08:16:48.9864330Z // end inline asm 2026-02-21T08:16:48.9864411Z $L__BB0_16: // %._crit_edge 2026-02-21T08:16:48.9864579Z .loc 1 30 4 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:30:4 2026-02-21T08:16:48.9864633Z bar.sync 0, 128; 2026-02-21T08:16:48.9864721Z // begin inline asm 2026-02-21T08:16:48.9864846Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1604, 256; 2026-02-21T08:16:48.9864951Z // end inline asm 2026-02-21T08:16:48.9865055Z st.shared.v2.b32 [global_smem+155784], {50529027, 50529027}; 2026-02-21T08:16:48.9865109Z barrier.sync 1; 2026-02-21T08:16:48.9865190Z $L__BB0_17: // %common.ret 2026-02-21T08:16:48.9865360Z .loc 1 0 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:0 2026-02-21T08:16:48.9865412Z ret; 2026-02-21T08:16:48.9865505Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:16:48.9865587Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T08:16:48.9865755Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19 2026-02-21T08:16:48.9865815Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:16:48.9865873Z and.b16 %rs2, %rs1, 3; 2026-02-21T08:16:48.9865940Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:16:48.9866153Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9866213Z or.b32 %r5, %r4, 192; 2026-02-21T08:16:48.9866281Z mov.b32 %r28, global_smem; 2026-02-21T08:16:48.9866338Z add.s32 %r29, %r28, %r3; 2026-02-21T08:16:48.9866394Z bra.uni $L__BB0_2; 2026-02-21T08:16:48.9866493Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9866668Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9866722Z barrier.sync 1; 2026-02-21T08:16:48.9866777Z barrier.sync 1; 2026-02-21T08:16:48.9866862Z $L__BB0_2: // %.preheader 2026-02-21T08:16:48.9866948Z // =>This Loop Header: Depth=1 2026-02-21T08:16:48.9867029Z // Child Loop BB0_11 Depth 2 
2026-02-21T08:16:48.9867118Z // Child Loop BB0_6 Depth 2 2026-02-21T08:16:48.9867276Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19 2026-02-21T08:16:48.9867355Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:16:48.9867415Z barrier.sync 1; 2026-02-21T08:16:48.9867477Z ld.shared.b8 %r27, [%r29+155780]; 2026-02-21T08:16:48.9867537Z setp.gt.u32 %p3, %r27, 3; 2026-02-21T08:16:48.9867593Z @%p3 bra $L__BB0_4; 2026-02-21T08:16:48.9867677Z // %bb.3: // %.preheader 2026-02-21T08:16:48.9867761Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9867822Z $L_brx_0: .branchtargets 2026-02-21T08:16:48.9867881Z $L__BB0_5, 2026-02-21T08:16:48.9867932Z $L__BB0_10, 2026-02-21T08:16:48.9867982Z $L__BB0_13, 2026-02-21T08:16:48.9868030Z $L__BB0_17; 2026-02-21T08:16:48.9868094Z brx.idx %r27, $L_brx_0; 2026-02-21T08:16:48.9868184Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9868349Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9868425Z ld.shared.b32 %r193, [global_smem]; 2026-02-21T08:16:48.9868494Z ld.shared.b32 %r155, [global_smem+12]; 2026-02-21T08:16:48.9868548Z barrier.sync 1; 2026-02-21T08:16:48.9868713Z .loc 1 44 45 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:45 2026-02-21T08:16:48.9868774Z add.s32 %r156, %r1, -128; 2026-02-21T08:16:48.9868830Z shr.u32 %r7, %r156, 5; 2026-02-21T08:16:48.9868884Z shr.u32 %r157, %r1, 2; 2026-02-21T08:16:48.9868947Z bfe.u32 %r158, %r1, 2, 5; 2026-02-21T08:16:48.9869000Z or.b32 %r159, %r157, 224; 2026-02-21T08:16:48.9869158Z .loc 1 50 48 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:50:48 2026-02-21T08:16:48.9869218Z shl.b32 %r160, %r1, 3; 2026-02-21T08:16:48.9869271Z and.b32 %r161, %r160, 24; 2026-02-21T08:16:48.9869429Z .loc 1 44 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:44:32 2026-02-21T08:16:48.9869489Z add.s32 %r162, %r155, %r158; 2026-02-21T08:16:48.9869594Z add.s32 %r163, %r155, %r159; 
2026-02-21T08:16:48.9869649Z shl.b32 %r164, %r162, 11; 2026-02-21T08:16:48.9869815Z .loc 1 54 53 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:53 2026-02-21T08:16:48.9869878Z add.s32 %r165, %r164, 65536; 2026-02-21T08:16:48.9869935Z add.s32 %r166, %r164, 131072; 2026-02-21T08:16:48.9869992Z add.s32 %r167, %r164, 196608; 2026-02-21T08:16:48.9870053Z add.s32 %r168, %r164, 262144; 2026-02-21T08:16:48.9870105Z add.s32 %r169, %r164, 327680; 2026-02-21T08:16:48.9870158Z add.s32 %r170, %r164, 393216; 2026-02-21T08:16:48.9870212Z shl.b32 %r171, %r163, 11; 2026-02-21T08:16:48.9870384Z .loc 1 54 60 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:60 2026-02-21T08:16:48.9870441Z or.b32 %r172, %r164, %r161; 2026-02-21T08:16:48.9870495Z or.b32 %r173, %r165, %r161; 2026-02-21T08:16:48.9870594Z or.b32 %r174, %r166, %r161; 2026-02-21T08:16:48.9870651Z or.b32 %r175, %r167, %r161; 2026-02-21T08:16:48.9870703Z or.b32 %r176, %r168, %r161; 2026-02-21T08:16:48.9870755Z or.b32 %r177, %r169, %r161; 2026-02-21T08:16:48.9870814Z or.b32 %r178, %r170, %r161; 2026-02-21T08:16:48.9870867Z or.b32 %r179, %r171, %r161; 2026-02-21T08:16:48.9871032Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9871102Z mad.wide.s32 %rd13, %r172, 2, %rd6; 2026-02-21T08:16:48.9871165Z mad.wide.s32 %rd14, %r173, 2, %rd6; 2026-02-21T08:16:48.9871225Z mad.wide.s32 %rd15, %r174, 2, %rd6; 2026-02-21T08:16:48.9871294Z mad.wide.s32 %rd16, %r175, 2, %rd6; 2026-02-21T08:16:48.9871355Z mad.wide.s32 %rd17, %r176, 2, %rd6; 2026-02-21T08:16:48.9871417Z mad.wide.s32 %rd18, %r177, 2, %rd6; 2026-02-21T08:16:48.9871477Z mad.wide.s32 %rd19, %r178, 2, %rd6; 2026-02-21T08:16:48.9871543Z mad.wide.s32 %rd20, %r179, 2, %rd6; 2026-02-21T08:16:48.9871716Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9871776Z shl.b32 %r180, %r1, 4; 2026-02-21T08:16:48.9871841Z and.b32 %r181, %r180, 2032; 
2026-02-21T08:16:48.9871897Z shl.b32 %r182, %r1, 1; 2026-02-21T08:16:48.9871955Z and.b32 %r183, %r182, 48; 2026-02-21T08:16:48.9872021Z xor.b32 %r8, %r181, %r183; 2026-02-21T08:16:48.9872080Z add.s32 %r55, %r28, %r8; 2026-02-21T08:16:48.9872135Z mov.b32 %r56, 16; 2026-02-21T08:16:48.9872191Z // begin inline asm 2026-02-21T08:16:48.9872319Z cp.async.cg.shared.global [ %r55 + 0 ], [ %rd13 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9872375Z // end inline asm 2026-02-21T08:16:48.9872433Z add.s32 %r57, %r55, 2048; 2026-02-21T08:16:48.9872495Z // begin inline asm 2026-02-21T08:16:48.9872613Z cp.async.cg.shared.global [ %r57 + 0 ], [ %rd14 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9872668Z // end inline asm 2026-02-21T08:16:48.9872727Z add.s32 %r59, %r55, 4096; 2026-02-21T08:16:48.9872793Z // begin inline asm 2026-02-21T08:16:48.9872906Z cp.async.cg.shared.global [ %r59 + 0 ], [ %rd15 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9872964Z // end inline asm 2026-02-21T08:16:48.9873030Z add.s32 %r61, %r55, 6144; 2026-02-21T08:16:48.9873085Z // begin inline asm 2026-02-21T08:16:48.9873196Z cp.async.cg.shared.global [ %r61 + 0 ], [ %rd16 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9873252Z // end inline asm 2026-02-21T08:16:48.9873314Z add.s32 %r63, %r55, 8192; 2026-02-21T08:16:48.9873371Z // begin inline asm 2026-02-21T08:16:48.9873480Z cp.async.cg.shared.global [ %r63 + 0 ], [ %rd17 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9873543Z // end inline asm 2026-02-21T08:16:48.9873604Z add.s32 %r65, %r55, 10240; 2026-02-21T08:16:48.9873660Z // begin inline asm 2026-02-21T08:16:48.9873774Z cp.async.cg.shared.global [ %r65 + 0 ], [ %rd18 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9873829Z // end inline asm 2026-02-21T08:16:48.9873888Z add.s32 %r67, %r55, 12288; 2026-02-21T08:16:48.9873944Z // begin inline asm 2026-02-21T08:16:48.9874062Z cp.async.cg.shared.global [ %r67 + 0 ], [ %rd19 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9874159Z // end inline asm 2026-02-21T08:16:48.9874218Z add.s32 %r69, %r55, 14336; 
2026-02-21T08:16:48.9874280Z // begin inline asm 2026-02-21T08:16:48.9874386Z cp.async.cg.shared.global [ %r69 + 0 ], [ %rd20 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9874438Z // end inline asm 2026-02-21T08:16:48.9874500Z cp.async.commit_group; 2026-02-21T08:16:48.9874713Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9874775Z cvt.s64.s32 %rd62, %r164; 2026-02-21T08:16:48.9874833Z cvt.u64.u32 %rd63, %r161; 2026-02-21T08:16:48.9874900Z or.b64 %rd64, %rd62, %rd63; 2026-02-21T08:16:48.9874960Z shl.b64 %rd65, %rd64, 1; 2026-02-21T08:16:48.9875019Z add.s64 %rd66, %rd6, %rd65; 2026-02-21T08:16:48.9875077Z add.s64 %rd21, %rd66, 64; 2026-02-21T08:16:48.9875140Z cvt.s64.s32 %rd67, %r165; 2026-02-21T08:16:48.9875248Z or.b64 %rd68, %rd67, %rd63; 2026-02-21T08:16:48.9875307Z shl.b64 %rd69, %rd68, 1; 2026-02-21T08:16:48.9875375Z add.s64 %rd70, %rd6, %rd69; 2026-02-21T08:16:48.9875432Z add.s64 %rd22, %rd70, 64; 2026-02-21T08:16:48.9875489Z cvt.s64.s32 %rd71, %r166; 2026-02-21T08:16:48.9875556Z or.b64 %rd72, %rd71, %rd63; 2026-02-21T08:16:48.9875616Z shl.b64 %rd73, %rd72, 1; 2026-02-21T08:16:48.9875675Z add.s64 %rd74, %rd6, %rd73; 2026-02-21T08:16:48.9875732Z add.s64 %rd23, %rd74, 64; 2026-02-21T08:16:48.9875802Z cvt.s64.s32 %rd75, %r167; 2026-02-21T08:16:48.9875859Z or.b64 %rd76, %rd75, %rd63; 2026-02-21T08:16:48.9875916Z shl.b64 %rd77, %rd76, 1; 2026-02-21T08:16:48.9875981Z add.s64 %rd78, %rd6, %rd77; 2026-02-21T08:16:48.9876037Z add.s64 %rd24, %rd78, 64; 2026-02-21T08:16:48.9876093Z cvt.s64.s32 %rd79, %r168; 2026-02-21T08:16:48.9876150Z or.b64 %rd80, %rd79, %rd63; 2026-02-21T08:16:48.9876215Z shl.b64 %rd81, %rd80, 1; 2026-02-21T08:16:48.9876272Z add.s64 %rd82, %rd6, %rd81; 2026-02-21T08:16:48.9876331Z add.s64 %rd25, %rd82, 64; 2026-02-21T08:16:48.9876396Z cvt.s64.s32 %rd83, %r169; 2026-02-21T08:16:48.9876454Z or.b64 %rd84, %rd83, %rd63; 2026-02-21T08:16:48.9876511Z shl.b64 %rd85, %rd84, 1; 2026-02-21T08:16:48.9876568Z add.s64 
%rd86, %rd6, %rd85; 2026-02-21T08:16:48.9876635Z add.s64 %rd26, %rd86, 64; 2026-02-21T08:16:48.9876693Z cvt.s64.s32 %rd87, %r170; 2026-02-21T08:16:48.9876751Z or.b64 %rd88, %rd87, %rd63; 2026-02-21T08:16:48.9876816Z shl.b64 %rd89, %rd88, 1; 2026-02-21T08:16:48.9876873Z add.s64 %rd90, %rd6, %rd89; 2026-02-21T08:16:48.9876929Z add.s64 %rd27, %rd90, 64; 2026-02-21T08:16:48.9876985Z cvt.s64.s32 %rd91, %r171; 2026-02-21T08:16:48.9877047Z or.b64 %rd92, %rd91, %rd63; 2026-02-21T08:16:48.9877103Z shl.b64 %rd93, %rd92, 1; 2026-02-21T08:16:48.9877160Z add.s64 %rd94, %rd6, %rd93; 2026-02-21T08:16:48.9877224Z add.s64 %rd28, %rd94, 64; 2026-02-21T08:16:48.9877401Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9877459Z bar.sync 2, 128; 2026-02-21T08:16:48.9877525Z add.s32 %r71, %r55, 16384; 2026-02-21T08:16:48.9877580Z // begin inline asm 2026-02-21T08:16:48.9877689Z cp.async.cg.shared.global [ %r71 + 0 ], [ %rd21 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9877742Z // end inline asm 2026-02-21T08:16:48.9877806Z add.s32 %r73, %r55, 18432; 2026-02-21T08:16:48.9877861Z // begin inline asm 2026-02-21T08:16:48.9877969Z cp.async.cg.shared.global [ %r73 + 0 ], [ %rd22 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9878028Z // end inline asm 2026-02-21T08:16:48.9878084Z add.s32 %r75, %r55, 20480; 2026-02-21T08:16:48.9878139Z // begin inline asm 2026-02-21T08:16:48.9878245Z cp.async.cg.shared.global [ %r75 + 0 ], [ %rd23 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9878304Z // end inline asm 2026-02-21T08:16:48.9878360Z add.s32 %r77, %r55, 22528; 2026-02-21T08:16:48.9878415Z // begin inline asm 2026-02-21T08:16:48.9878528Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd24 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9878583Z // end inline asm 2026-02-21T08:16:48.9878723Z add.s32 %r79, %r55, 24576; 2026-02-21T08:16:48.9878779Z // begin inline asm 2026-02-21T08:16:48.9878894Z cp.async.cg.shared.global [ %r79 + 0 ], [ %rd25 + 0 ], 0x10, %r56; 
2026-02-21T08:16:48.9878948Z // end inline asm 2026-02-21T08:16:48.9879004Z add.s32 %r81, %r55, 26624; 2026-02-21T08:16:48.9879066Z // begin inline asm 2026-02-21T08:16:48.9879178Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd26 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9879229Z // end inline asm 2026-02-21T08:16:48.9879289Z add.s32 %r83, %r55, 28672; 2026-02-21T08:16:48.9879341Z // begin inline asm 2026-02-21T08:16:48.9879442Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd27 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9879493Z // end inline asm 2026-02-21T08:16:48.9879553Z add.s32 %r85, %r55, 30720; 2026-02-21T08:16:48.9879607Z // begin inline asm 2026-02-21T08:16:48.9879707Z cp.async.cg.shared.global [ %r85 + 0 ], [ %rd28 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9879804Z // end inline asm 2026-02-21T08:16:48.9879868Z cp.async.commit_group; 2026-02-21T08:16:48.9880039Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9880094Z add.s64 %rd29, %rd66, 128; 2026-02-21T08:16:48.9880158Z add.s64 %rd30, %rd70, 128; 2026-02-21T08:16:48.9880213Z add.s64 %rd31, %rd74, 128; 2026-02-21T08:16:48.9880267Z add.s64 %rd32, %rd78, 128; 2026-02-21T08:16:48.9880328Z add.s64 %rd33, %rd82, 128; 2026-02-21T08:16:48.9880382Z add.s64 %rd34, %rd86, 128; 2026-02-21T08:16:48.9880437Z add.s64 %rd35, %rd90, 128; 2026-02-21T08:16:48.9880491Z add.s64 %rd36, %rd94, 128; 2026-02-21T08:16:48.9880662Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9880717Z bar.sync 2, 128; 2026-02-21T08:16:48.9880770Z add.s32 %r87, %r55, 32768; 2026-02-21T08:16:48.9880832Z // begin inline asm 2026-02-21T08:16:48.9880934Z cp.async.cg.shared.global [ %r87 + 0 ], [ %rd29 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9880990Z // end inline asm 2026-02-21T08:16:48.9881053Z add.s32 %r89, %r55, 34816; 2026-02-21T08:16:48.9881105Z // begin inline asm 2026-02-21T08:16:48.9881205Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd30 + 0 ], 0x10, %r56; 
2026-02-21T08:16:48.9881258Z // end inline asm 2026-02-21T08:16:48.9881319Z add.s32 %r91, %r55, 36864; 2026-02-21T08:16:48.9881371Z // begin inline asm 2026-02-21T08:16:48.9881470Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd31 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9881526Z // end inline asm 2026-02-21T08:16:48.9881578Z add.s32 %r93, %r55, 38912; 2026-02-21T08:16:48.9881631Z // begin inline asm 2026-02-21T08:16:48.9881729Z cp.async.cg.shared.global [ %r93 + 0 ], [ %rd32 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9881787Z // end inline asm 2026-02-21T08:16:48.9881839Z add.s32 %r95, %r55, 40960; 2026-02-21T08:16:48.9881891Z // begin inline asm 2026-02-21T08:16:48.9881998Z cp.async.cg.shared.global [ %r95 + 0 ], [ %rd33 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9882052Z // end inline asm 2026-02-21T08:16:48.9882104Z add.s32 %r97, %r55, 43008; 2026-02-21T08:16:48.9882162Z // begin inline asm 2026-02-21T08:16:48.9882261Z cp.async.cg.shared.global [ %r97 + 0 ], [ %rd34 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9882311Z // end inline asm 2026-02-21T08:16:48.9882364Z add.s32 %r99, %r55, 45056; 2026-02-21T08:16:48.9882425Z // begin inline asm 2026-02-21T08:16:48.9882523Z cp.async.cg.shared.global [ %r99 + 0 ], [ %rd35 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9882575Z // end inline asm 2026-02-21T08:16:48.9882637Z add.s32 %r101, %r55, 47104; 2026-02-21T08:16:48.9882689Z // begin inline asm 2026-02-21T08:16:48.9882797Z cp.async.cg.shared.global [ %r101 + 0 ], [ %rd36 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9882847Z // end inline asm 2026-02-21T08:16:48.9882912Z cp.async.commit_group; 2026-02-21T08:16:48.9883079Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9883138Z add.s64 %rd37, %rd66, 192; 2026-02-21T08:16:48.9883238Z add.s64 %rd38, %rd70, 192; 2026-02-21T08:16:48.9883292Z add.s64 %rd39, %rd74, 192; 2026-02-21T08:16:48.9883346Z add.s64 %rd40, %rd78, 192; 2026-02-21T08:16:48.9883408Z add.s64 %rd41, %rd82, 192; 
2026-02-21T08:16:48.9883461Z add.s64 %rd42, %rd86, 192; 2026-02-21T08:16:48.9883515Z add.s64 %rd43, %rd90, 192; 2026-02-21T08:16:48.9883568Z add.s64 %rd44, %rd94, 192; 2026-02-21T08:16:48.9883743Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9883796Z bar.sync 2, 128; 2026-02-21T08:16:48.9883850Z add.s32 %r103, %r55, 49152; 2026-02-21T08:16:48.9883910Z // begin inline asm 2026-02-21T08:16:48.9884021Z cp.async.cg.shared.global [ %r103 + 0 ], [ %rd37 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9884074Z // end inline asm 2026-02-21T08:16:48.9884128Z add.s32 %r105, %r55, 51200; 2026-02-21T08:16:48.9884188Z // begin inline asm 2026-02-21T08:16:48.9884333Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd38 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9884388Z // end inline asm 2026-02-21T08:16:48.9884449Z add.s32 %r107, %r55, 53248; 2026-02-21T08:16:48.9884501Z // begin inline asm 2026-02-21T08:16:48.9884606Z cp.async.cg.shared.global [ %r107 + 0 ], [ %rd39 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9884658Z // end inline asm 2026-02-21T08:16:48.9884745Z add.s32 %r109, %r55, 55296; 2026-02-21T08:16:48.9884799Z // begin inline asm 2026-02-21T08:16:48.9884904Z cp.async.cg.shared.global [ %r109 + 0 ], [ %rd40 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9884964Z // end inline asm 2026-02-21T08:16:48.9885018Z add.s32 %r111, %r55, 57344; 2026-02-21T08:16:48.9885070Z // begin inline asm 2026-02-21T08:16:48.9885179Z cp.async.cg.shared.global [ %r111 + 0 ], [ %rd41 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9885231Z // end inline asm 2026-02-21T08:16:48.9885284Z add.s32 %r113, %r55, 59392; 2026-02-21T08:16:48.9885336Z // begin inline asm 2026-02-21T08:16:48.9885451Z cp.async.cg.shared.global [ %r113 + 0 ], [ %rd42 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9885505Z // end inline asm 2026-02-21T08:16:48.9885561Z add.s32 %r115, %r55, 61440; 2026-02-21T08:16:48.9885620Z // begin inline asm 2026-02-21T08:16:48.9885724Z cp.async.cg.shared.global [ %r115 + 0 ], [ 
%rd43 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9885775Z // end inline asm 2026-02-21T08:16:48.9885828Z add.s32 %r117, %r55, 63488; 2026-02-21T08:16:48.9885890Z // begin inline asm 2026-02-21T08:16:48.9885993Z cp.async.cg.shared.global [ %r117 + 0 ], [ %rd44 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9886044Z // end inline asm 2026-02-21T08:16:48.9886111Z cp.async.commit_group; 2026-02-21T08:16:48.9886277Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9886334Z add.s64 %rd45, %rd66, 256; 2026-02-21T08:16:48.9886396Z add.s64 %rd46, %rd70, 256; 2026-02-21T08:16:48.9886450Z add.s64 %rd47, %rd74, 256; 2026-02-21T08:16:48.9886508Z add.s64 %rd48, %rd78, 256; 2026-02-21T08:16:48.9886565Z add.s64 %rd49, %rd82, 256; 2026-02-21T08:16:48.9886628Z add.s64 %rd50, %rd86, 256; 2026-02-21T08:16:48.9886682Z add.s64 %rd51, %rd90, 256; 2026-02-21T08:16:48.9886736Z add.s64 %rd52, %rd94, 256; 2026-02-21T08:16:48.9886908Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9886961Z bar.sync 2, 128; 2026-02-21T08:16:48.9887016Z add.s32 %r119, %r55, 65536; 2026-02-21T08:16:48.9887069Z // begin inline asm 2026-02-21T08:16:48.9887181Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd45 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9887233Z // end inline asm 2026-02-21T08:16:48.9887288Z add.s32 %r121, %r55, 67584; 2026-02-21T08:16:48.9887349Z // begin inline asm 2026-02-21T08:16:48.9887452Z cp.async.cg.shared.global [ %r121 + 0 ], [ %rd46 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9887504Z // end inline asm 2026-02-21T08:16:48.9887559Z add.s32 %r123, %r55, 69632; 2026-02-21T08:16:48.9887622Z // begin inline asm 2026-02-21T08:16:48.9887783Z cp.async.cg.shared.global [ %r123 + 0 ], [ %rd47 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9887837Z // end inline asm 2026-02-21T08:16:48.9887899Z add.s32 %r125, %r55, 71680; 2026-02-21T08:16:48.9887953Z // begin inline asm 2026-02-21T08:16:48.9888057Z cp.async.cg.shared.global [ 
%r125 + 0 ], [ %rd48 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9888117Z // end inline asm 2026-02-21T08:16:48.9888170Z add.s32 %r127, %r55, 73728; 2026-02-21T08:16:48.9888223Z // begin inline asm 2026-02-21T08:16:48.9888327Z cp.async.cg.shared.global [ %r127 + 0 ], [ %rd49 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9888387Z // end inline asm 2026-02-21T08:16:48.9888438Z add.s32 %r129, %r55, 75776; 2026-02-21T08:16:48.9888492Z // begin inline asm 2026-02-21T08:16:48.9888601Z cp.async.cg.shared.global [ %r129 + 0 ], [ %rd50 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9888652Z // end inline asm 2026-02-21T08:16:48.9888705Z add.s32 %r131, %r55, 77824; 2026-02-21T08:16:48.9888815Z // begin inline asm 2026-02-21T08:16:48.9888932Z cp.async.cg.shared.global [ %r131 + 0 ], [ %rd51 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9888983Z // end inline asm 2026-02-21T08:16:48.9889038Z add.s32 %r133, %r55, 79872; 2026-02-21T08:16:48.9889097Z // begin inline asm 2026-02-21T08:16:48.9889199Z cp.async.cg.shared.global [ %r133 + 0 ], [ %rd52 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9889260Z // end inline asm 2026-02-21T08:16:48.9889327Z cp.async.commit_group; 2026-02-21T08:16:48.9889492Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9889549Z add.s64 %rd53, %rd66, 320; 2026-02-21T08:16:48.9889605Z add.s64 %rd54, %rd70, 320; 2026-02-21T08:16:48.9889667Z add.s64 %rd55, %rd74, 320; 2026-02-21T08:16:48.9889720Z add.s64 %rd56, %rd78, 320; 2026-02-21T08:16:48.9889774Z add.s64 %rd57, %rd82, 320; 2026-02-21T08:16:48.9889839Z add.s64 %rd58, %rd86, 320; 2026-02-21T08:16:48.9889895Z add.s64 %rd59, %rd90, 320; 2026-02-21T08:16:48.9889953Z add.s64 %rd60, %rd94, 320; 2026-02-21T08:16:48.9890119Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9890183Z bar.sync 2, 128; 2026-02-21T08:16:48.9890239Z add.s32 %r135, %r55, 81920; 2026-02-21T08:16:48.9890291Z // begin inline asm 2026-02-21T08:16:48.9890400Z 
cp.async.cg.shared.global [ %r135 + 0 ], [ %rd53 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9890451Z // end inline asm 2026-02-21T08:16:48.9890504Z add.s32 %r137, %r55, 83968; 2026-02-21T08:16:48.9890556Z // begin inline asm 2026-02-21T08:16:48.9890668Z cp.async.cg.shared.global [ %r137 + 0 ], [ %rd54 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9890718Z // end inline asm 2026-02-21T08:16:48.9890772Z add.s32 %r139, %r55, 86016; 2026-02-21T08:16:48.9890832Z // begin inline asm 2026-02-21T08:16:48.9890933Z cp.async.cg.shared.global [ %r139 + 0 ], [ %rd55 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9890984Z // end inline asm 2026-02-21T08:16:48.9891046Z add.s32 %r141, %r55, 88064; 2026-02-21T08:16:48.9891102Z // begin inline asm 2026-02-21T08:16:48.9891203Z cp.async.cg.shared.global [ %r141 + 0 ], [ %rd56 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9891254Z // end inline asm 2026-02-21T08:16:48.9891315Z add.s32 %r143, %r55, 90112; 2026-02-21T08:16:48.9891367Z // begin inline asm 2026-02-21T08:16:48.9891468Z cp.async.cg.shared.global [ %r143 + 0 ], [ %rd57 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9891523Z // end inline asm 2026-02-21T08:16:48.9891577Z add.s32 %r145, %r55, 92160; 2026-02-21T08:16:48.9891629Z // begin inline asm 2026-02-21T08:16:48.9891731Z cp.async.cg.shared.global [ %r145 + 0 ], [ %rd58 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9891787Z // end inline asm 2026-02-21T08:16:48.9891841Z add.s32 %r147, %r55, 94208; 2026-02-21T08:16:48.9891892Z // begin inline asm 2026-02-21T08:16:48.9892002Z cp.async.cg.shared.global [ %r147 + 0 ], [ %rd59 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9892052Z // end inline asm 2026-02-21T08:16:48.9892108Z add.s32 %r149, %r55, 96256; 2026-02-21T08:16:48.9892209Z // begin inline asm 2026-02-21T08:16:48.9892311Z cp.async.cg.shared.global [ %r149 + 0 ], [ %rd60 + 0 ], 0x10, %r56; 2026-02-21T08:16:48.9892363Z // end inline asm 2026-02-21T08:16:48.9892422Z cp.async.commit_group; 2026-02-21T08:16:48.9892597Z .loc 1 49 57 // 
cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9892652Z add.s32 %r184, %r5, %r171; 2026-02-21T08:16:48.9892709Z cvt.u64.u32 %rd1, %r184; 2026-02-21T08:16:48.9892768Z add.s32 %r185, %r4, %r164; 2026-02-21T08:16:48.9892824Z cvt.u64.u32 %rd2, %r185; 2026-02-21T08:16:48.9892880Z mov.pred %p104, 0; 2026-02-21T08:16:48.9892932Z mov.b32 %r1607, 0; 2026-02-21T08:16:48.9892990Z mov.b32 %r1606, 5; 2026-02-21T08:16:48.9893045Z mov.b32 %r1605, -1; 2026-02-21T08:16:48.9893098Z mov.b64 %rd684, 0; 2026-02-21T08:16:48.9893158Z mov.b32 %r1608, %r1607; 2026-02-21T08:16:48.9893211Z bra.uni $L__BB0_6; 2026-02-21T08:16:48.9893341Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:16:48.9893509Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9893579Z setp.lt.u64 %p25, %rd684, 1856; 2026-02-21T08:16:48.9893742Z .loc 1 55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9893798Z add.s32 %r238, %r1608, 1; 2026-02-21T08:16:48.9893864Z setp.eq.b32 %p26, %r238, 7; 2026-02-21T08:16:48.9893927Z selp.b32 %r1608, 0, %r238, %p26; 2026-02-21T08:16:48.9893983Z selp.b32 %r239, 1, 0, %p26; 2026-02-21T08:16:48.9894047Z xor.b32 %r1607, %r1607, %r239; 2026-02-21T08:16:48.9894213Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9894267Z add.s32 %r240, %r1606, 1; 2026-02-21T08:16:48.9894325Z setp.gt.s32 %p27, %r240, 5; 2026-02-21T08:16:48.9894392Z selp.b32 %r1606, 0, %r240, %p27; 2026-02-21T08:16:48.9894554Z .loc 1 54 60 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:60 2026-02-21T08:16:48.9894616Z add.s64 %rd119, %rd2, %rd684; 2026-02-21T08:16:48.9894811Z .loc 1 54 32 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:32 2026-02-21T08:16:48.9894869Z add.s64 %rd120, %rd1, %rd684; 2026-02-21T08:16:48.9894926Z cvt.u32.u64 %r241, %rd119; 2026-02-21T08:16:48.9894990Z add.s32 %r242, %r241, 192; 
2026-02-21T08:16:48.9895056Z mad.wide.s32 %rd111, %r242, 2, %rd6; 2026-02-21T08:16:48.9895116Z add.s32 %r243, %r241, 65728; 2026-02-21T08:16:48.9895180Z mad.wide.s32 %rd112, %r243, 2, %rd6; 2026-02-21T08:16:48.9895244Z add.s32 %r244, %r241, 131264; 2026-02-21T08:16:48.9895304Z mad.wide.s32 %rd113, %r244, 2, %rd6; 2026-02-21T08:16:48.9895360Z add.s32 %r245, %r241, 196800; 2026-02-21T08:16:48.9895426Z mad.wide.s32 %rd114, %r245, 2, %rd6; 2026-02-21T08:16:48.9895481Z add.s32 %r246, %r241, 262336; 2026-02-21T08:16:48.9895542Z mad.wide.s32 %rd115, %r246, 2, %rd6; 2026-02-21T08:16:48.9895606Z add.s32 %r247, %r241, 327872; 2026-02-21T08:16:48.9895663Z mad.wide.s32 %rd116, %r247, 2, %rd6; 2026-02-21T08:16:48.9895718Z add.s32 %r248, %r241, 393408; 2026-02-21T08:16:48.9895777Z mad.wide.s32 %rd117, %r248, 2, %rd6; 2026-02-21T08:16:48.9895843Z cvt.u32.u64 %r249, %rd120; 2026-02-21T08:16:48.9895902Z mad.wide.s32 %rd118, %r249, 2, %rd6; 2026-02-21T08:16:48.9896066Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9896128Z shl.b32 %r250, %r1606, 14; 2026-02-21T08:16:48.9896183Z add.s32 %r252, %r28, %r250; 2026-02-21T08:16:48.9896236Z bar.sync 2, 128; 2026-02-21T08:16:48.9896291Z add.s32 %r222, %r252, %r8; 2026-02-21T08:16:48.9896354Z selp.b32 %r223, 16, 0, %p25; 2026-02-21T08:16:48.9896408Z // begin inline asm 2026-02-21T08:16:48.9896522Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd111 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9896583Z // end inline asm 2026-02-21T08:16:48.9896701Z add.s32 %r224, %r222, 2048; 2026-02-21T08:16:48.9896755Z // begin inline asm 2026-02-21T08:16:48.9896876Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd112 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9896928Z // end inline asm 2026-02-21T08:16:48.9896982Z add.s32 %r226, %r222, 4096; 2026-02-21T08:16:48.9897034Z // begin inline asm 2026-02-21T08:16:48.9897151Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd113 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9897204Z 
// end inline asm 2026-02-21T08:16:48.9897257Z add.s32 %r228, %r222, 6144; 2026-02-21T08:16:48.9897316Z // begin inline asm 2026-02-21T08:16:48.9897425Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd114 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9897476Z // end inline asm 2026-02-21T08:16:48.9897529Z add.s32 %r230, %r222, 8192; 2026-02-21T08:16:48.9897589Z // begin inline asm 2026-02-21T08:16:48.9897697Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd115 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9897801Z // end inline asm 2026-02-21T08:16:48.9897869Z add.s32 %r232, %r222, 10240; 2026-02-21T08:16:48.9897922Z // begin inline asm 2026-02-21T08:16:48.9898028Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd116 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9898081Z // end inline asm 2026-02-21T08:16:48.9898143Z add.s32 %r234, %r222, 12288; 2026-02-21T08:16:48.9898196Z // begin inline asm 2026-02-21T08:16:48.9898304Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd117 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9898363Z // end inline asm 2026-02-21T08:16:48.9898416Z add.s32 %r236, %r222, 14336; 2026-02-21T08:16:48.9898468Z // begin inline asm 2026-02-21T08:16:48.9898581Z cp.async.cg.shared.global [ %r236 + 0 ], [ %rd118 + 0 ], 0x10, %r223; 2026-02-21T08:16:48.9898633Z // end inline asm 2026-02-21T08:16:48.9898692Z cp.async.commit_group; 2026-02-21T08:16:48.9898859Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9898925Z add.s64 %rd4, %rd684, 32; 2026-02-21T08:16:48.9898990Z setp.lt.u64 %p28, %rd684, 2016; 2026-02-21T08:16:48.9899050Z mov.pred %p104, -1; 2026-02-21T08:16:48.9899112Z mov.b64 %rd684, %rd4; 2026-02-21T08:16:48.9899166Z @%p28 bra $L__BB0_6; 2026-02-21T08:16:48.9899220Z bra.uni $L__BB0_9; 2026-02-21T08:16:48.9899312Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:16:48.9899410Z // => This Inner Loop Header: Depth=2 2026-02-21T08:16:48.9899466Z add.s32 %r188, %r1605, 1; 2026-02-21T08:16:48.9899524Z setp.gt.s32 %p11, %r188, 
5; 2026-02-21T08:16:48.9899593Z selp.b32 %r1605, 0, %r188, %p11; 2026-02-21T08:16:48.9899759Z .loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9899819Z cp.async.wait_group 5; 2026-02-21T08:16:48.9899878Z bar.sync 2, 128; 2026-02-21T08:16:48.9900045Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9900101Z shl.b32 %r189, %r1608, 3; 2026-02-21T08:16:48.9900159Z add.s32 %r191, %r28, %r189; 2026-02-21T08:16:48.9900221Z add.s32 %r186, %r191, 155712; 2026-02-21T08:16:48.9900386Z .loc 1 55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9900439Z // begin inline asm 2026-02-21T08:16:48.9900496Z 2026-02-21T08:16:48.9900545Z { 2026-02-21T08:16:48.9900603Z .reg .pred complete; 2026-02-21T08:16:48.9900657Z waitLoop: 2026-02-21T08:16:48.9900782Z mbarrier.try_wait.parity.shared.b64 complete, [%r186], %r1607; 2026-02-21T08:16:48.9900844Z @!complete bra.uni waitLoop; 2026-02-21T08:16:48.9900891Z } 2026-02-21T08:16:48.9900896Z 2026-02-21T08:16:48.9900956Z // end inline asm 2026-02-21T08:16:48.9901122Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9901192Z shfl.sync.idx.b32 %r192, %r7, 0, 31, -1; 2026-02-21T08:16:48.9901257Z setp.ne.b32 %p12, %r192, 0; 2026-02-21T08:16:48.9901358Z @%p12 bra $L__BB0_8; 2026-02-21T08:16:48.9901450Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:16:48.9901512Z setp.eq.b64 %p23, %rd684, 2016; 2026-02-21T08:16:48.9901685Z .loc 1 55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9901741Z shl.b32 %r201, %r1608, 13; 2026-02-21T08:16:48.9901795Z add.s32 %r203, %r28, %r201; 2026-02-21T08:16:48.9901858Z add.s32 %r204, %r203, 98304; 2026-02-21T08:16:48.9902022Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9902078Z add.s32 %r207, %r191, 155648; 2026-02-21T08:16:48.9902246Z 
.loc 1 54 85 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:54:85 2026-02-21T08:16:48.9902301Z shl.b32 %r208, %r1605, 14; 2026-02-21T08:16:48.9902355Z add.s32 %r209, %r28, %r208; 2026-02-21T08:16:48.9902567Z .loc 1 56 52 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:56:52 2026-02-21T08:16:48.9902632Z elect.sync %r210|%p14, -1; 2026-02-21T08:16:48.9902690Z bfe.u32 %r211, %r209, 4, 14; 2026-02-21T08:16:48.9902746Z cvt.u64.u32 %rd105, %r211; 2026-02-21T08:16:48.9902827Z or.b64 %rd95, %rd105, -9223371899348713472; 2026-02-21T08:16:48.9902883Z bfe.u32 %r212, %r204, 4, 14; 2026-02-21T08:16:48.9902939Z cvt.u64.u32 %rd106, %r212; 2026-02-21T08:16:48.9903015Z or.b64 %rd96, %rd106, -9223371899382267904; 2026-02-21T08:16:48.9903070Z mov.b32 %r194, 136314896; 2026-02-21T08:16:48.9903124Z // begin inline asm 2026-02-21T08:16:48.9903266Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 0 ], %rd95, %rd96, %r194, %p104; 2026-02-21T08:16:48.9903327Z // end inline asm 2026-02-21T08:16:48.9903382Z add.s32 %r213, %r209, 32; 2026-02-21T08:16:48.9903436Z bfe.u32 %r214, %r213, 4, 14; 2026-02-21T08:16:48.9903498Z cvt.u64.u32 %rd107, %r214; 2026-02-21T08:16:48.9903564Z or.b64 %rd97, %rd107, -9223371899348713472; 2026-02-21T08:16:48.9903621Z add.s32 %r215, %r203, 98336; 2026-02-21T08:16:48.9903682Z bfe.u32 %r216, %r215, 4, 14; 2026-02-21T08:16:48.9903737Z cvt.u64.u32 %rd108, %r216; 2026-02-21T08:16:48.9903800Z or.b64 %rd98, %rd108, -9223371899382267904; 2026-02-21T08:16:48.9903859Z mov.pred %p15, -1; 2026-02-21T08:16:48.9903918Z // begin inline asm 2026-02-21T08:16:48.9904049Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 0 ], %rd97, %rd98, %r194, %p15; 2026-02-21T08:16:48.9904101Z // end inline asm 2026-02-21T08:16:48.9904162Z add.s32 %r217, %r209, 8192; 2026-02-21T08:16:48.9904216Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T08:16:48.9904271Z cvt.u64.u32 %rd109, %r218; 2026-02-21T08:16:48.9904334Z or.b64 %rd99, %rd109, -9223371899348713472; 
2026-02-21T08:16:48.9904396Z // begin inline asm 2026-02-21T08:16:48.9904529Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 128 ], %rd99, %rd96, %r194, %p104; 2026-02-21T08:16:48.9904581Z // end inline asm 2026-02-21T08:16:48.9904646Z add.s32 %r219, %r209, 8224; 2026-02-21T08:16:48.9904732Z bfe.u32 %r220, %r219, 4, 14; 2026-02-21T08:16:48.9904789Z cvt.u64.u32 %rd110, %r220; 2026-02-21T08:16:48.9904866Z or.b64 %rd101, %rd110, -9223371899348713472; 2026-02-21T08:16:48.9904919Z // begin inline asm 2026-02-21T08:16:48.9905052Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 128 ], %rd101, %rd98, %r194, %p15; 2026-02-21T08:16:48.9905105Z // end inline asm 2026-02-21T08:16:48.9905166Z cvt.u64.u32 %rd103, %r207; 2026-02-21T08:16:48.9905220Z // begin inline asm 2026-02-21T08:16:48.9905343Z @%p14 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd103]; 2026-02-21T08:16:48.9905402Z // end inline asm 2026-02-21T08:16:48.9905462Z and.pred %p22, %p23, %p14; 2026-02-21T08:16:48.9905517Z add.s32 %r221, %r28, 155776; 2026-02-21T08:16:48.9905579Z cvt.u64.u32 %rd104, %r221; 2026-02-21T08:16:48.9905632Z // begin inline asm 2026-02-21T08:16:48.9905752Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd104]; 2026-02-21T08:16:48.9905861Z // end inline asm 2026-02-21T08:16:48.9905923Z bra.uni $L__BB0_8; 2026-02-21T08:16:48.9906020Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9906190Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9906264Z ld.shared.b32 %r45, [global_smem+8]; 2026-02-21T08:16:48.9906319Z barrier.sync 1; 2026-02-21T08:16:48.9906482Z .loc 1 21 67 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:21:67 2026-02-21T08:16:48.9906545Z mov.u32 %r32, %ctaid.x; 2026-02-21T08:16:48.9906600Z mov.u32 %r33, %ctaid.y; 2026-02-21T08:16:48.9906654Z mov.u32 %r34, %ctaid.z; 2026-02-21T08:16:48.9906712Z mov.u32 %r35, %nctaid.x; 2026-02-21T08:16:48.9906773Z mov.u32 %r36, %nctaid.y; 
2026-02-21T08:16:48.9906834Z mad.lo.s32 %r37, %r34, %r36, %r33; 2026-02-21T08:16:48.9906892Z mad.lo.s32 %r38, %r37, %r35, %r32; 2026-02-21T08:16:48.9907008Z shl.b32 %r39, %r38, 7; 2026-02-21T08:16:48.9907067Z cvt.s64.s32 %rd10, %r39; 2026-02-21T08:16:48.9907125Z add.s64 %rd11, %rd9, %rd10; 2026-02-21T08:16:48.9907186Z cvta.global.u64 %rd12, %rd11; 2026-02-21T08:16:48.9907251Z add.s32 %r18, %r1, -256; 2026-02-21T08:16:48.9907303Z mov.b32 %r1610, 0; 2026-02-21T08:16:48.9907358Z mov.b32 %r1609, -32; 2026-02-21T08:16:48.9907420Z mov.b32 %r1611, %r1610; 2026-02-21T08:16:48.9907512Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:16:48.9907603Z // => This Inner Loop Header: Depth=2 2026-02-21T08:16:48.9907771Z .loc 1 0 67 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:0:67 2026-02-21T08:16:48.9907831Z setp.lt.u32 %p6, %r18, 32; 2026-02-21T08:16:48.9907888Z setp.eq.b32 %p4, %r18, 0; 2026-02-21T08:16:48.9908057Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9908121Z add.s32 %r1609, %r1609, 32; 2026-02-21T08:16:48.9908177Z shl.b32 %r47, %r1611, 3; 2026-02-21T08:16:48.9908232Z add.s32 %r49, %r28, %r47; 2026-02-21T08:16:48.9908291Z add.s32 %r40, %r49, 155648; 2026-02-21T08:16:48.9908456Z .loc 1 55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9908510Z // begin inline asm 2026-02-21T08:16:48.9908557Z 2026-02-21T08:16:48.9908611Z { 2026-02-21T08:16:48.9908668Z .reg .pred complete; 2026-02-21T08:16:48.9908720Z waitLoop: 2026-02-21T08:16:48.9908843Z mbarrier.try_wait.parity.shared.b64 complete, [%r40], %r1610; 2026-02-21T08:16:48.9908903Z @!complete bra.uni waitLoop; 2026-02-21T08:16:48.9908950Z } 2026-02-21T08:16:48.9908955Z 2026-02-21T08:16:48.9909014Z // end inline asm 2026-02-21T08:16:48.9909180Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9909234Z add.s32 %r46, %r49, 155712; 2026-02-21T08:16:48.9909401Z .loc 1 
55 44 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:55:44 2026-02-21T08:16:48.9909463Z bar.sync 3, 64; 2026-02-21T08:16:48.9909517Z // begin inline asm 2026-02-21T08:16:48.9909621Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r46], 8192; 2026-02-21T08:16:48.9909682Z // end inline asm 2026-02-21T08:16:48.9909737Z shl.b32 %r50, %r1611, 13; 2026-02-21T08:16:48.9909791Z add.s32 %r51, %r28, %r50; 2026-02-21T08:16:48.9909853Z add.s32 %r43, %r51, 98304; 2026-02-21T08:16:48.9909907Z bar.sync 3, 64; 2026-02-21T08:16:48.9909967Z elect.sync %r52|%p7, -1; 2026-02-21T08:16:48.9910027Z and.pred %p5, %p6, %p7; 2026-02-21T08:16:48.9910087Z // begin inline asm 2026-02-21T08:16:48.9910330Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r43], [%rd12, {%r1609, %r45}], [%r46]; 2026-02-21T08:16:48.9910383Z // end inline asm 2026-02-21T08:16:48.9910446Z add.s32 %r53, %r1611, 1; 2026-02-21T08:16:48.9910506Z setp.eq.b32 %p8, %r53, 7; 2026-02-21T08:16:48.9910612Z selp.b32 %r1611, 0, %r53, %p8; 2026-02-21T08:16:48.9910668Z selp.b32 %r54, 1, 0, %p8; 2026-02-21T08:16:48.9910731Z xor.b32 %r1610, %r1610, %r54; 2026-02-21T08:16:48.9910895Z .loc 1 49 57 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:49:57 2026-02-21T08:16:48.9910958Z setp.lt.u32 %p9, %r1609, 2016; 2026-02-21T08:16:48.9911020Z @%p9 bra $L__BB0_11; 2026-02-21T08:16:48.9911114Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9911166Z barrier.sync 1; 2026-02-21T08:16:48.9911228Z bra.uni $L__BB0_2; 2026-02-21T08:16:48.9911318Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9911375Z cp.async.wait_group 0; 2026-02-21T08:16:48.9911427Z bar.sync 2, 128; 2026-02-21T08:16:48.9911488Z barrier.sync 1; 2026-02-21T08:16:48.9911540Z bra.uni $L__BB0_2; 2026-02-21T08:16:48.9911678Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:16:48.9911850Z .loc 1 19 0 // cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py:19 2026-02-21T08:16:48.9911903Z 
barrier.sync 1; 2026-02-21T08:16:48.9911956Z barrier.sync 1; 2026-02-21T08:16:48.9912011Z bra.uni $L__BB0_2; 2026-02-21T08:16:48.9912070Z $L__tmp1: 2026-02-21T08:16:48.9912123Z $L__func_end0: 2026-02-21T08:16:48.9912202Z // -- End function 2026-02-21T08:16:48.9912258Z } 2026-02-21T08:16:48.9912461Z .file 1 "/tmp/torchinductor_root/og/cogiw6bi7qpuoto5n2zawdb6einvrdqba3fnsxqpf5q66kwijirs.py" 2026-02-21T08:16:48.9912520Z .section .debug_abbrev 2026-02-21T08:16:48.9912574Z { 2026-02-21T08:16:48.9912660Z .b8 1 // Abbreviation Code 2026-02-21T08:16:48.9912745Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:16:48.9912822Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:16:48.9912910Z .b8 37 // DW_AT_producer 2026-02-21T08:16:48.9912985Z .b8 8 // DW_FORM_string 2026-02-21T08:16:48.9913055Z .b8 19 // DW_AT_language 2026-02-21T08:16:48.9913137Z .b8 5 // DW_FORM_data2 2026-02-21T08:16:48.9913209Z .b8 3 // DW_AT_name 2026-02-21T08:16:48.9913278Z .b8 8 // DW_FORM_string 2026-02-21T08:16:48.9913360Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:16:48.9913432Z .b8 6 // DW_FORM_data4 2026-02-21T08:16:48.9913503Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:16:48.9913573Z .b8 8 // DW_FORM_string 2026-02-21T08:16:48.9913645Z .b8 0 // EOM(1) 2026-02-21T08:16:48.9913711Z .b8 0 // EOM(2) 2026-02-21T08:16:48.9913777Z .b8 0 // EOM(3) 2026-02-21T08:16:48.9913834Z } 2026-02-21T08:16:48.9913891Z .section .debug_info 2026-02-21T08:16:48.9913938Z { 2026-02-21T08:16:48.9914023Z .b32 104 // Length of Unit 2026-02-21T08:16:48.9914105Z .b8 2 // DWARF version number 2026-02-21T08:16:48.9914153Z .b8 0 2026-02-21T08:16:48.9914261Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:16:48.9914352Z .b8 8 // Address Size (in bytes) 2026-02-21T08:16:48.9914447Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:16:48.9914525Z .b8 116 // DW_AT_producer 2026-02-21T08:16:48.9914584Z .b8 114 2026-02-21T08:16:48.9914637Z .b8 105 2026-02-21T08:16:48.9914715Z .b8 116 2026-02-21T08:16:48.9914765Z .b8 111 2026-02-21T08:16:48.9914822Z .b8 110 2026-02-21T08:16:48.9914873Z .b8 0 2026-02-21T08:16:48.9914948Z .b8 2 // DW_AT_language 2026-02-21T08:16:48.9915056Z .b8 0 2026-02-21T08:16:48.9915132Z .b8 99 // DW_AT_name 2026-02-21T08:16:48.9915181Z .b8 111 2026-02-21T08:16:48.9915231Z .b8 103 2026-02-21T08:16:48.9915287Z .b8 105 2026-02-21T08:16:48.9915337Z .b8 119 2026-02-21T08:16:48.9915387Z .b8 54 2026-02-21T08:16:48.9915445Z .b8 98 2026-02-21T08:16:48.9915495Z .b8 105 2026-02-21T08:16:48.9915544Z .b8 55 2026-02-21T08:16:48.9915594Z .b8 113 2026-02-21T08:16:48.9915650Z .b8 112 2026-02-21T08:16:48.9915700Z .b8 117 2026-02-21T08:16:48.9915749Z .b8 111 2026-02-21T08:16:48.9915806Z .b8 116 2026-02-21T08:16:48.9915856Z .b8 111 2026-02-21T08:16:48.9915905Z .b8 53 2026-02-21T08:16:48.9915954Z .b8 110 2026-02-21T08:16:48.9916010Z .b8 50 2026-02-21T08:16:48.9916059Z .b8 122 2026-02-21T08:16:48.9916108Z .b8 97 2026-02-21T08:16:48.9916156Z .b8 119 2026-02-21T08:16:48.9916212Z .b8 100 2026-02-21T08:16:48.9916261Z .b8 98 2026-02-21T08:16:48.9916312Z .b8 54 2026-02-21T08:16:48.9916417Z .b8 101 2026-02-21T08:16:48.9916471Z .b8 105 2026-02-21T08:16:48.9916520Z .b8 110 2026-02-21T08:16:48.9916570Z .b8 118 2026-02-21T08:16:48.9916628Z .b8 114 2026-02-21T08:16:48.9916678Z .b8 100 2026-02-21T08:16:48.9916728Z .b8 113 2026-02-21T08:16:48.9916784Z .b8 98 2026-02-21T08:16:48.9916833Z .b8 97 2026-02-21T08:16:48.9916883Z .b8 51 2026-02-21T08:16:48.9916934Z .b8 102 2026-02-21T08:16:48.9916991Z .b8 110 2026-02-21T08:16:48.9917042Z .b8 115 2026-02-21T08:16:48.9917091Z .b8 120 2026-02-21T08:16:48.9917147Z .b8 113 2026-02-21T08:16:48.9917197Z .b8 112 
2026-02-21T08:16:48.9917247Z .b8 102 2026-02-21T08:16:48.9917295Z .b8 53 2026-02-21T08:16:48.9917352Z .b8 113 2026-02-21T08:16:48.9917402Z .b8 54 2026-02-21T08:16:48.9917452Z .b8 54 2026-02-21T08:16:48.9917501Z .b8 107 2026-02-21T08:16:48.9917560Z .b8 119 2026-02-21T08:16:48.9917608Z .b8 105 2026-02-21T08:16:48.9917656Z .b8 106 2026-02-21T08:16:48.9917710Z .b8 105 2026-02-21T08:16:48.9917758Z .b8 114 2026-02-21T08:16:48.9917807Z .b8 115 2026-02-21T08:16:48.9917857Z .b8 46 2026-02-21T08:16:48.9917925Z .b8 112 2026-02-21T08:16:48.9917975Z .b8 121 2026-02-21T08:16:48.9918026Z .b8 0 2026-02-21T08:16:48.9918125Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:16:48.9918200Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:16:48.9918249Z .b8 116 2026-02-21T08:16:48.9918299Z .b8 109 2026-02-21T08:16:48.9918356Z .b8 112 2026-02-21T08:16:48.9918406Z .b8 47 2026-02-21T08:16:48.9918455Z .b8 116 2026-02-21T08:16:48.9918513Z .b8 111 2026-02-21T08:16:48.9918562Z .b8 114 2026-02-21T08:16:48.9918611Z .b8 99 2026-02-21T08:16:48.9918661Z .b8 104 2026-02-21T08:16:48.9918718Z .b8 105 2026-02-21T08:16:48.9918768Z .b8 110 2026-02-21T08:16:48.9918817Z .b8 100 2026-02-21T08:16:48.9918866Z .b8 117 2026-02-21T08:16:48.9918923Z .b8 99 2026-02-21T08:16:48.9918973Z .b8 116 2026-02-21T08:16:48.9919021Z .b8 111 2026-02-21T08:16:48.9919079Z .b8 114 2026-02-21T08:16:48.9919128Z .b8 95 2026-02-21T08:16:48.9919177Z .b8 114 2026-02-21T08:16:48.9919229Z .b8 111 2026-02-21T08:16:48.9919290Z .b8 111 2026-02-21T08:16:48.9919338Z .b8 116 2026-02-21T08:16:48.9919388Z .b8 47 2026-02-21T08:16:48.9919442Z .b8 111 2026-02-21T08:16:48.9919492Z .b8 103 2026-02-21T08:16:48.9919540Z .b8 0 2026-02-21T08:16:48.9919590Z } 2026-02-21T08:16:48.9919661Z .section .debug_macinfo { } 2026-02-21T08:16:48.9919666Z 2026-02-21T08:16:48.9919743Z ================================================================ 2026-02-21T08:16:48.9919846Z please share the reproducer above with Triton project. 
2026-02-21T08:16:50.5999508Z [38s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:16:50.5999799Z 2026-02-21T08:16:50.6004654Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=8, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:16:50.6006133Z 2026-02-21T08:16:50.6006476Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:16:50.6006721Z `ptxas` stderr: 2026-02-21T08:16:50.6007139Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 195 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:16:50.6007625Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:50.6007774Z 2026-02-21T08:16:50.6008175Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyo1td3d2.ptx -o /tmp/tmpyo1td3d2.ptx.o 2026-02-21T08:16:50.6008612Z 2026-02-21T08:16:50.6008737Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:16:50.6008927Z 2026-02-21T08:16:50.6009195Z ================================================================ 2026-02-21T08:16:50.6009419Z Internal Triton PTX codegen error 2026-02-21T08:16:50.6009584Z `ptxas` stderr: 2026-02-21T08:16:50.6009987Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 195 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:16:50.6010453Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:16:50.6010602Z 2026-02-21T08:16:50.6010966Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyo1td3d2.ptx -o /tmp/tmpyo1td3d2.ptx.o 2026-02-21T08:16:50.6011402Z 2026-02-21T08:16:50.6011405Z 2026-02-21T08:16:50.6011469Z // 2026-02-21T08:16:50.6011611Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:16:50.6011789Z // 2026-02-21T08:16:50.6011857Z 2026-02-21T08:16:50.6011912Z .version 8.7 2026-02-21T08:16:50.6012052Z .target sm_100a 2026-02-21T08:16:50.6012183Z .address_size 64 2026-02-21T08:16:50.6012274Z 2026-02-21T08:16:50.6012389Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:16:50.6012641Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:16:50.6012841Z // @_helion_matmul 2026-02-21T08:16:50.6013040Z .visible .entry _helion_matmul( 2026-02-21T08:16:50.6013243Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:16:50.6013489Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:16:50.6013729Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:16:50.6013968Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:16:50.6014202Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:16:50.6014400Z ) 2026-02-21T08:16:50.6014520Z .reqntid 128 2026-02-21T08:16:50.6014644Z .maxnreg 32 2026-02-21T08:16:50.6014819Z { 2026-02-21T08:16:50.6014944Z .reg .pred %p<92>; 2026-02-21T08:16:50.6015096Z .reg .b16 %rs<4>; 2026-02-21T08:16:50.6015236Z .reg .b32 %r<572>; 2026-02-21T08:16:50.6015381Z .reg .b64 %rd<198>; 2026-02-21T08:16:50.6015520Z $L__func_begin0: 2026-02-21T08:16:50.6015608Z 2026-02-21T08:16:50.6015661Z // %bb.0: 2026-02-21T08:16:50.6015901Z .loc 1 19 0 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:19 2026-02-21T08:16:50.6016197Z mov.u32 %r1, 
%tid.x; 2026-02-21T08:16:50.6016372Z ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T08:16:50.6016565Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:16:50.6016732Z mov.b32 %r38, global_smem; 2026-02-21T08:16:50.6016881Z // begin inline asm 2026-02-21T08:16:50.6017120Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r38], 128; 2026-02-21T08:16:50.6017356Z // end inline asm 2026-02-21T08:16:50.6017514Z ld.param.b64 %rd28, [_helion_matmul_param_3]; 2026-02-21T08:16:50.6017694Z bar.sync 0; 2026-02-21T08:16:50.6017840Z ld.shared.b32 %r564, [global_smem]; 2026-02-21T08:16:50.6018010Z bar.sync 0; 2026-02-21T08:16:50.6018222Z // begin inline asm 2026-02-21T08:16:50.6018425Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:16:50.6018645Z // end inline asm 2026-02-21T08:16:50.6018897Z .loc 1 21 67 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:21:67 2026-02-21T08:16:50.6019180Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:16:50.6019337Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:16:50.6019482Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:16:50.6019635Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:16:50.6019791Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:16:50.6019942Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T08:16:50.6020121Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:16:50.6020283Z shl.b32 %r53, %r52, 7; 2026-02-21T08:16:50.6020437Z cvt.s64.s32 %rd29, %r53; 2026-02-21T08:16:50.6020590Z add.s64 %rd25, %rd28, %rd29; 2026-02-21T08:16:50.6020755Z shl.b32 %r54, %r1, 2; 2026-02-21T08:16:50.6020963Z add.s32 %r39, %r38, %r54; 2026-02-21T08:16:50.6021123Z mov.b32 %r56, 0; 2026-02-21T08:16:50.6021268Z // begin inline asm 2026-02-21T08:16:50.6021413Z @%p1 st.shared.b32 [ %r39 + 0 ], %r56; 2026-02-21T08:16:50.6021585Z // end inline asm 2026-02-21T08:16:50.6021720Z bar.warp.sync -1; 2026-02-21T08:16:50.6021867Z setp.eq.b32 %p81, %r1, 0; 2026-02-21T08:16:50.6022016Z cvt.u64.u32 %rd10, %r38; 2026-02-21T08:16:50.6022163Z // begin inline asm 
2026-02-21T08:16:50.6022408Z @%p81 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T08:16:50.6022691Z // end inline asm 2026-02-21T08:16:50.6022822Z // begin inline asm 2026-02-21T08:16:50.6023047Z @%p81 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:16:50.6023297Z // end inline asm 2026-02-21T08:16:50.6023427Z mov.b32 %r134, 16; 2026-02-21T08:16:50.6023566Z // begin inline asm 2026-02-21T08:16:50.6023796Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r134; 2026-02-21T08:16:50.6024065Z // end inline asm 2026-02-21T08:16:50.6024192Z mov.b32 %r42, 128; 2026-02-21T08:16:50.6024328Z // begin inline asm 2026-02-21T08:16:50.6024554Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r42; 2026-02-21T08:16:50.6024845Z // end inline asm 2026-02-21T08:16:50.6024983Z mov.b32 %r43, 2048; 2026-02-21T08:16:50.6025119Z // begin inline asm 2026-02-21T08:16:50.6025359Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r43; 2026-02-21T08:16:50.6025627Z // end inline asm 2026-02-21T08:16:50.6025764Z // begin inline asm 2026-02-21T08:16:50.6025995Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r43; 2026-02-21T08:16:50.6026271Z // end inline asm 2026-02-21T08:16:50.6026411Z mov.b64 %rd18, 4096; 2026-02-21T08:16:50.6026550Z // begin inline asm 2026-02-21T08:16:50.6026805Z @%p81 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T08:16:50.6027089Z // end inline asm 2026-02-21T08:16:50.6027225Z mov.b32 %r45, 1; 2026-02-21T08:16:50.6027351Z // begin inline asm 2026-02-21T08:16:50.6027607Z @%p81 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r45; 2026-02-21T08:16:50.6027889Z // end inline asm 2026-02-21T08:16:50.6028024Z // begin inline asm 2026-02-21T08:16:50.6028273Z @%p81 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r45; 2026-02-21T08:16:50.6028547Z // end inline asm 2026-02-21T08:16:50.6028681Z // begin inline asm 2026-02-21T08:16:50.6028903Z @%p81 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T08:16:50.6029168Z // end inline asm 2026-02-21T08:16:50.6029296Z // begin inline asm 2026-02-21T08:16:50.6029545Z @%p81 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:16:50.6029826Z // end inline asm 2026-02-21T08:16:50.6029958Z // begin inline asm 2026-02-21T08:16:50.6030196Z @%p81 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:16:50.6030539Z // end inline asm 2026-02-21T08:16:50.6030676Z // begin inline asm 2026-02-21T08:16:50.6030896Z @%p81 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:16:50.6031165Z // end inline asm 2026-02-21T08:16:50.6031304Z // begin inline asm 2026-02-21T08:16:50.6031648Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T08:16:50.6032027Z // end inline asm 2026-02-21T08:16:50.6032164Z // begin inline asm 2026-02-21T08:16:50.6032384Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T08:16:50.6032632Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:16:50.6032833Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:16:50.6033006Z // end inline asm 2026-02-21T08:16:50.6033144Z bar.sync 0; 2026-02-21T08:16:50.6033343Z cvta.global.u64 %rd50, %rd25; 2026-02-21T08:16:50.6033618Z .loc 1 27 76 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:27:76 2026-02-21T08:16:50.6033913Z setp.gt.u32 %p21, %r3, 1023; 2026-02-21T08:16:50.6034069Z @%p21 bra $L__BB0_8; 2026-02-21T08:16:50.6034240Z // %bb.1: // %.lr.ph 2026-02-21T08:16:50.6034543Z .loc 1 0 76 // 
cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:0:76 2026-02-21T08:16:50.6034918Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T08:16:50.6035235Z .loc 1 47 48 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:47:48 2026-02-21T08:16:50.6035532Z shl.b32 %r184, %r1, 3; 2026-02-21T08:16:50.6035698Z and.b32 %r4, %r184, 8; 2026-02-21T08:16:50.6035964Z .loc 1 41 45 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:41:45 2026-02-21T08:16:50.6036270Z and.b32 %r185, %r1, 15; 2026-02-21T08:16:50.6036541Z .loc 1 39 45 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:39:45 2026-02-21T08:16:50.6036847Z shr.u32 %r186, %r1, 4; 2026-02-21T08:16:50.6037119Z .loc 1 34 33 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:34:33 2026-02-21T08:16:50.6037405Z shr.u32 %r187, %r3, 4; 2026-02-21T08:16:50.6037564Z and.b32 %r188, %r187, 48; 2026-02-21T08:16:50.6037837Z .loc 1 36 64 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:36:64 2026-02-21T08:16:50.6038138Z and.b32 %r12, %r3, 15; 2026-02-21T08:16:50.6038425Z .loc 1 36 30 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:36:30 2026-02-21T08:16:50.6038728Z or.b32 %r189, %r188, %r12; 2026-02-21T08:16:50.6039003Z .loc 1 38 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:38:27 2026-02-21T08:16:50.6039309Z shl.b32 %r190, %r189, 6; 2026-02-21T08:16:50.6039605Z .loc 1 39 45 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:39:45 2026-02-21T08:16:50.6039917Z or.b32 %r191, %r186, %r190; 2026-02-21T08:16:50.6040089Z bfe.u32 %r192, %r1, 4, 3; 2026-02-21T08:16:50.6040241Z bfe.u32 %r5, %r1, 1, 6; 2026-02-21T08:16:50.6040397Z shr.u32 %r193, %r1, 5; 2026-02-21T08:16:50.6040544Z shl.b32 %r194, %r1, 4; 2026-02-21T08:16:50.6040700Z and.b32 %r195, %r194, 1904; 2026-02-21T08:16:50.6040859Z bfe.s32 %r196, %r1, 3, 1; 2026-02-21T08:16:50.6041025Z and.b32 %r197, %r196, 144; 2026-02-21T08:16:50.6041184Z xor.b32 %r198, %r197, %r195; 2026-02-21T08:16:50.6041354Z 
add.s32 %r199, %r38, %r198; 2026-02-21T08:16:50.6041520Z add.s32 %r133, %r199, 32768; 2026-02-21T08:16:50.6041677Z add.s32 %r140, %r199, 34816; 2026-02-21T08:16:50.6041837Z add.s32 %r147, %r199, 36864; 2026-02-21T08:16:50.6041988Z add.s32 %r154, %r199, 38912; 2026-02-21T08:16:50.6042155Z add.s32 %r161, %r199, 40960; 2026-02-21T08:16:50.6042299Z add.s32 %r168, %r199, 43008; 2026-02-21T08:16:50.6042453Z add.s32 %r175, %r199, 45056; 2026-02-21T08:16:50.6042659Z add.s32 %r241, %r199, 47104; 2026-02-21T08:16:50.6042813Z shl.b32 %r200, %r1, 9; 2026-02-21T08:16:50.6042959Z and.b32 %r201, %r200, 3072; 2026-02-21T08:16:50.6043118Z shl.b32 %r202, %r185, 4; 2026-02-21T08:16:50.6043271Z and.b32 %r203, %r1, 96; 2026-02-21T08:16:50.6043416Z shl.b32 %r204, %r203, 3; 2026-02-21T08:16:50.6043569Z and.b32 %r206, %r54, 64; 2026-02-21T08:16:50.6043714Z or.b32 %r207, %r202, %r204; 2026-02-21T08:16:50.6043875Z xor.b32 %r208, %r207, %r206; 2026-02-21T08:16:50.6044026Z or.b32 %r209, %r208, %r201; 2026-02-21T08:16:50.6044182Z xor.b32 %r210, %r209, 32; 2026-02-21T08:16:50.6044329Z shl.b32 %r211, %r1, 5; 2026-02-21T08:16:50.6044482Z and.b32 %r212, %r211, 3168; 2026-02-21T08:16:50.6044635Z and.b32 %r213, %r194, 384; 2026-02-21T08:16:50.6044818Z and.b32 %r214, %r54, 16; 2026-02-21T08:16:50.6044966Z or.b32 %r215, %r212, %r213; 2026-02-21T08:16:50.6045111Z xor.b32 %r216, %r215, %r203; 2026-02-21T08:16:50.6045318Z add.s32 %r217, %r38, %r214; 2026-02-21T08:16:50.6045471Z add.s32 %r383, %r217, %r216; 2026-02-21T08:16:50.6045733Z .loc 1 39 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:39:32 2026-02-21T08:16:50.6046008Z or.b32 %r218, %r190, %r5; 2026-02-21T08:16:50.6046163Z or.b32 %r13, %r190, %r192; 2026-02-21T08:16:50.6046415Z .loc 1 40 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:40:27 2026-02-21T08:16:50.6046694Z shl.b32 %r219, %r3, 3; 2026-02-21T08:16:50.6046842Z and.b32 %r246, %r219, 1920; 2026-02-21T08:16:50.6047097Z .loc 1 41 32 // 
cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:41:32 2026-02-21T08:16:50.6047378Z and.b32 %r220, %r3, 240; 2026-02-21T08:16:50.6047526Z or.b32 %r221, %r220, %r185; 2026-02-21T08:16:50.6047785Z .loc 1 51 53 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:53 2026-02-21T08:16:50.6048058Z shl.b32 %r222, %r218, 11; 2026-02-21T08:16:50.6048312Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6048618Z shfl.sync.idx.b32 %r23, %r193, 0, 31, -1; 2026-02-21T08:16:50.6048795Z shl.b32 %r223, %r23, 21; 2026-02-21T08:16:50.6048949Z and.b32 %r224, %r223, 6291456; 2026-02-21T08:16:50.6049105Z add.s32 %r378, %r224, %r564; 2026-02-21T08:16:50.6049264Z mov.pred %p22, -1; 2026-02-21T08:16:50.6049401Z // begin inline asm 2026-02-21T08:16:50.6049759Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r378 + 0], 64, {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T08:16:50.6050126Z // end inline asm 2026-02-21T08:16:50.6050258Z // begin inline asm 2026-02-21T08:16:50.6050589Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r378 + 16], 64, {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T08:16:50.6050942Z // end inline asm 2026-02-21T08:16:50.6051081Z // begin inline asm 2026-02-21T08:16:50.6051400Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r378 + 32], 64, {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T08:16:50.6051761Z // end inline asm 2026-02-21T08:16:50.6051895Z // begin inline asm 2026-02-21T08:16:50.6052213Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r378 + 48], 64, {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T08:16:50.6052575Z // end inline asm 2026-02-21T08:16:50.6052704Z // begin inline asm 2026-02-21T08:16:50.6052858Z tcgen05.wait::st.sync.aligned; 
2026-02-21T08:16:50.6053014Z // end inline asm 2026-02-21T08:16:50.6053148Z bar.sync 0; 2026-02-21T08:16:50.6053387Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6053672Z add.s32 %r566, %r38, 49216; 2026-02-21T08:16:50.6053827Z // begin inline asm 2026-02-21T08:16:50.6054041Z @%p81 mbarrier.init.shared::cta.b64 [%r566], 1; 2026-02-21T08:16:50.6054232Z // end inline asm 2026-02-21T08:16:50.6054358Z bar.sync 0; 2026-02-21T08:16:50.6054490Z add.s32 %r124, %r38, 49224; 2026-02-21T08:16:50.6054635Z // begin inline asm 2026-02-21T08:16:50.6054858Z @%p81 mbarrier.init.shared::cta.b64 [%r124], 1; 2026-02-21T08:16:50.6055039Z // end inline asm 2026-02-21T08:16:50.6055178Z add.s32 %r125, %r38, 49152; 2026-02-21T08:16:50.6055329Z // begin inline asm 2026-02-21T08:16:50.6055483Z @%p81 mbarrier.init.shared::cta.b64 [%r125], 1; 2026-02-21T08:16:50.6055668Z // end inline asm 2026-02-21T08:16:50.6055794Z bar.sync 0; 2026-02-21T08:16:50.6055927Z add.s32 %r126, %r38, 49160; 2026-02-21T08:16:50.6056070Z // begin inline asm 2026-02-21T08:16:50.6056233Z @%p81 mbarrier.init.shared::cta.b64 [%r126], 1; 2026-02-21T08:16:50.6056409Z // end inline asm 2026-02-21T08:16:50.6056541Z bar.sync 0; 2026-02-21T08:16:50.6056714Z add.s32 %r127, %r38, 49168; 2026-02-21T08:16:50.6056871Z // begin inline asm 2026-02-21T08:16:50.6057033Z @%p81 mbarrier.init.shared::cta.b64 [%r127], 1; 2026-02-21T08:16:50.6057211Z // end inline asm 2026-02-21T08:16:50.6057348Z bar.sync 0; 2026-02-21T08:16:50.6057477Z add.s32 %r128, %r38, 49176; 2026-02-21T08:16:50.6057631Z // begin inline asm 2026-02-21T08:16:50.6057784Z @%p81 mbarrier.init.shared::cta.b64 [%r128], 1; 2026-02-21T08:16:50.6057966Z // end inline asm 2026-02-21T08:16:50.6058090Z bar.sync 0; 2026-02-21T08:16:50.6058221Z add.s32 %r129, %r38, 49184; 2026-02-21T08:16:50.6058366Z // begin inline asm 2026-02-21T08:16:50.6058528Z @%p81 mbarrier.init.shared::cta.b64 [%r129], 1; 2026-02-21T08:16:50.6058710Z // end 
inline asm 2026-02-21T08:16:50.6058835Z bar.sync 0; 2026-02-21T08:16:50.6058967Z add.s32 %r130, %r38, 49192; 2026-02-21T08:16:50.6059112Z // begin inline asm 2026-02-21T08:16:50.6059271Z @%p81 mbarrier.init.shared::cta.b64 [%r130], 1; 2026-02-21T08:16:50.6059448Z // end inline asm 2026-02-21T08:16:50.6059583Z bar.sync 0; 2026-02-21T08:16:50.6059710Z add.s32 %r131, %r38, 49200; 2026-02-21T08:16:50.6059863Z // begin inline asm 2026-02-21T08:16:50.6060022Z @%p81 mbarrier.init.shared::cta.b64 [%r131], 1; 2026-02-21T08:16:50.6060199Z // end inline asm 2026-02-21T08:16:50.6060333Z bar.sync 0; 2026-02-21T08:16:50.6060457Z add.s32 %r243, %r38, 49208; 2026-02-21T08:16:50.6060608Z // begin inline asm 2026-02-21T08:16:50.6060763Z @%p81 mbarrier.init.shared::cta.b64 [%r243], 1; 2026-02-21T08:16:50.6060945Z // end inline asm 2026-02-21T08:16:50.6061185Z .loc 1 51 60 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:60 2026-02-21T08:16:50.6061473Z or.b32 %r225, %r222, %r4; 2026-02-21T08:16:50.6061730Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6062014Z mad.wide.u32 %rd30, %r225, 2, %rd8; 2026-02-21T08:16:50.6062295Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6062565Z // begin inline asm 2026-02-21T08:16:50.6062766Z cp.async.cg.shared.global [ %r133 + 0 ], [ %rd30 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6062982Z // end inline asm 2026-02-21T08:16:50.6063124Z cp.async.commit_group; 2026-02-21T08:16:50.6063381Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6063658Z bar.sync 0; 2026-02-21T08:16:50.6063787Z // begin inline asm 2026-02-21T08:16:50.6063968Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r125], 4096; 2026-02-21T08:16:50.6064184Z // end inline asm 2026-02-21T08:16:50.6064419Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6064730Z // begin inline 
asm 2026-02-21T08:16:50.6064882Z fence.proxy.async.shared::cta; 2026-02-21T08:16:50.6065051Z // end inline asm 2026-02-21T08:16:50.6065176Z bar.sync 0; 2026-02-21T08:16:50.6065319Z elect.sync %r226|%p51, -1; 2026-02-21T08:16:50.6065551Z and.pred %p37, %p1, %p51; 2026-02-21T08:16:50.6065701Z // begin inline asm 2026-02-21T08:16:50.6066035Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r38], [%rd50, {%r56, %r246}], [%r125]; 2026-02-21T08:16:50.6066373Z // end inline asm 2026-02-21T08:16:50.6066622Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6066909Z add.s64 %rd32, %rd30, 32; 2026-02-21T08:16:50.6067174Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6067464Z // begin inline asm 2026-02-21T08:16:50.6067659Z cp.async.cg.shared.global [ %r140 + 0 ], [ %rd32 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6067893Z // end inline asm 2026-02-21T08:16:50.6068036Z cp.async.commit_group; 2026-02-21T08:16:50.6068362Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6068638Z bar.sync 0; 2026-02-21T08:16:50.6068770Z // begin inline asm 2026-02-21T08:16:50.6068951Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r126], 4096; 2026-02-21T08:16:50.6069167Z // end inline asm 2026-02-21T08:16:50.6069410Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6069688Z bar.sync 0; 2026-02-21T08:16:50.6069829Z elect.sync %r227|%p52, -1; 2026-02-21T08:16:50.6069990Z and.pred %p39, %p1, %p52; 2026-02-21T08:16:50.6070150Z add.s32 %r143, %r38, 4096; 2026-02-21T08:16:50.6070296Z // begin inline asm 2026-02-21T08:16:50.6070620Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r143], [%rd50, {%r134, %r246}], [%r126]; 2026-02-21T08:16:50.6070980Z // end inline asm 2026-02-21T08:16:50.6071220Z .loc 1 51 32 // 
cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6071507Z add.s64 %rd34, %rd30, 64; 2026-02-21T08:16:50.6071761Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6072038Z // begin inline asm 2026-02-21T08:16:50.6072229Z cp.async.cg.shared.global [ %r147 + 0 ], [ %rd34 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6072453Z // end inline asm 2026-02-21T08:16:50.6072595Z cp.async.commit_group; 2026-02-21T08:16:50.6072844Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6073118Z bar.sync 0; 2026-02-21T08:16:50.6073241Z // begin inline asm 2026-02-21T08:16:50.6073428Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r127], 4096; 2026-02-21T08:16:50.6073635Z // end inline asm 2026-02-21T08:16:50.6073875Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6074140Z bar.sync 0; 2026-02-21T08:16:50.6074276Z elect.sync %r228|%p53, -1; 2026-02-21T08:16:50.6074440Z and.pred %p41, %p1, %p53; 2026-02-21T08:16:50.6074592Z add.s32 %r150, %r38, 8192; 2026-02-21T08:16:50.6074773Z mov.b32 %r151, 32; 2026-02-21T08:16:50.6074909Z // begin inline asm 2026-02-21T08:16:50.6075240Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r150], [%rd50, {%r151, %r246}], [%r127]; 2026-02-21T08:16:50.6075600Z // end inline asm 2026-02-21T08:16:50.6075857Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6076158Z add.s64 %rd36, %rd30, 96; 2026-02-21T08:16:50.6076422Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6076709Z // begin inline asm 2026-02-21T08:16:50.6076908Z cp.async.cg.shared.global [ %r154 + 0 ], [ %rd36 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6077138Z // end inline asm 2026-02-21T08:16:50.6077279Z cp.async.commit_group; 2026-02-21T08:16:50.6077548Z .loc 1 46 89 // 
cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6077900Z bar.sync 0; 2026-02-21T08:16:50.6078038Z // begin inline asm 2026-02-21T08:16:50.6078241Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r128], 4096; 2026-02-21T08:16:50.6078471Z // end inline asm 2026-02-21T08:16:50.6078724Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6079008Z bar.sync 0; 2026-02-21T08:16:50.6079152Z elect.sync %r229|%p54, -1; 2026-02-21T08:16:50.6079325Z and.pred %p43, %p1, %p54; 2026-02-21T08:16:50.6079499Z add.s32 %r157, %r38, 12288; 2026-02-21T08:16:50.6079654Z mov.b32 %r158, 48; 2026-02-21T08:16:50.6079798Z // begin inline asm 2026-02-21T08:16:50.6080130Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r157], [%rd50, {%r158, %r246}], [%r128]; 2026-02-21T08:16:50.6080499Z // end inline asm 2026-02-21T08:16:50.6080810Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6081105Z add.s64 %rd38, %rd30, 128; 2026-02-21T08:16:50.6081380Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6081671Z // begin inline asm 2026-02-21T08:16:50.6081875Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd38 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6082108Z // end inline asm 2026-02-21T08:16:50.6082248Z cp.async.commit_group; 2026-02-21T08:16:50.6082517Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6082798Z bar.sync 0; 2026-02-21T08:16:50.6082934Z // begin inline asm 2026-02-21T08:16:50.6083122Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r129], 4096; 2026-02-21T08:16:50.6083346Z // end inline asm 2026-02-21T08:16:50.6083589Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6083880Z bar.sync 0; 2026-02-21T08:16:50.6084024Z elect.sync %r230|%p55, -1; 2026-02-21T08:16:50.6084186Z and.pred 
%p45, %p1, %p55; 2026-02-21T08:16:50.6084349Z add.s32 %r164, %r38, 16384; 2026-02-21T08:16:50.6084501Z mov.b32 %r165, 64; 2026-02-21T08:16:50.6084643Z // begin inline asm 2026-02-21T08:16:50.6085005Z @%p45 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r164], [%rd50, {%r165, %r246}], [%r129]; 2026-02-21T08:16:50.6085369Z // end inline asm 2026-02-21T08:16:50.6085627Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6085921Z add.s64 %rd40, %rd30, 160; 2026-02-21T08:16:50.6086184Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6086468Z // begin inline asm 2026-02-21T08:16:50.6086668Z cp.async.cg.shared.global [ %r168 + 0 ], [ %rd40 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6086884Z // end inline asm 2026-02-21T08:16:50.6087029Z cp.async.commit_group; 2026-02-21T08:16:50.6087286Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6087567Z bar.sync 0; 2026-02-21T08:16:50.6087697Z // begin inline asm 2026-02-21T08:16:50.6087876Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r130], 4096; 2026-02-21T08:16:50.6088090Z // end inline asm 2026-02-21T08:16:50.6088329Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6088605Z bar.sync 0; 2026-02-21T08:16:50.6088733Z elect.sync %r231|%p56, -1; 2026-02-21T08:16:50.6088897Z and.pred %p47, %p1, %p56; 2026-02-21T08:16:50.6089048Z add.s32 %r171, %r38, 20480; 2026-02-21T08:16:50.6089205Z mov.b32 %r172, 80; 2026-02-21T08:16:50.6089349Z // begin inline asm 2026-02-21T08:16:50.6089668Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r171], [%rd50, {%r172, %r246}], [%r130]; 2026-02-21T08:16:50.6090026Z // end inline asm 2026-02-21T08:16:50.6090314Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6090594Z add.s64 %rd42, %rd30, 192; 
2026-02-21T08:16:50.6090843Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6091117Z // begin inline asm 2026-02-21T08:16:50.6091312Z cp.async.cg.shared.global [ %r175 + 0 ], [ %rd42 + 0 ], 0x10, %r134; 2026-02-21T08:16:50.6091527Z // end inline asm 2026-02-21T08:16:50.6091666Z cp.async.commit_group; 2026-02-21T08:16:50.6091911Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6092184Z bar.sync 0; 2026-02-21T08:16:50.6092306Z // begin inline asm 2026-02-21T08:16:50.6092494Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r131], 4096; 2026-02-21T08:16:50.6092698Z // end inline asm 2026-02-21T08:16:50.6092989Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6093283Z bar.sync 0; 2026-02-21T08:16:50.6093414Z elect.sync %r232|%p57, -1; 2026-02-21T08:16:50.6093576Z and.pred %p49, %p1, %p57; 2026-02-21T08:16:50.6093728Z add.s32 %r178, %r38, 24576; 2026-02-21T08:16:50.6093880Z mov.b32 %r179, 96; 2026-02-21T08:16:50.6094011Z // begin inline asm 2026-02-21T08:16:50.6094338Z @%p49 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd50, {%r179, %r246}], [%r131]; 2026-02-21T08:16:50.6094726Z // end inline asm 2026-02-21T08:16:50.6094966Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6095253Z cp.async.wait_group 6; 2026-02-21T08:16:50.6095398Z bar.sync 0; 2026-02-21T08:16:50.6095635Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6095907Z // begin inline asm 2026-02-21T08:16:50.6096044Z 2026-02-21T08:16:50.6096158Z { 2026-02-21T08:16:50.6096284Z .reg .pred complete; 2026-02-21T08:16:50.6096431Z waitLoop: 2026-02-21T08:16:50.6096612Z mbarrier.try_wait.parity.shared.b64 complete, [%r125], %r56; 2026-02-21T08:16:50.6096843Z @!complete bra.uni waitLoop; 2026-02-21T08:16:50.6096987Z } 
2026-02-21T08:16:50.6097050Z 2026-02-21T08:16:50.6097111Z // end inline asm 2026-02-21T08:16:50.6097347Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6097630Z setp.ne.b32 %p58, %r23, 0; 2026-02-21T08:16:50.6097782Z @%p58 bra $L__BB0_3; 2026-02-21T08:16:50.6097925Z // %bb.2: 2026-02-21T08:16:50.6098160Z .loc 1 0 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:0:52 2026-02-21T08:16:50.6098439Z bfe.u32 %r236, %r38, 4, 14; 2026-02-21T08:16:50.6098600Z cvt.u64.u32 %rd47, %r236; 2026-02-21T08:16:50.6098763Z or.b64 %rd45, %rd47, -4611685949691133952; 2026-02-21T08:16:50.6098947Z add.s32 %r237, %r38, 32768; 2026-02-21T08:16:50.6099099Z bfe.u32 %r238, %r237, 4, 14; 2026-02-21T08:16:50.6099260Z cvt.u64.u32 %rd48, %r238; 2026-02-21T08:16:50.6099422Z or.b64 %rd44, %rd48, -4611685949699522560; 2026-02-21T08:16:50.6099712Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6100011Z elect.sync %r239|%p60, -1; 2026-02-21T08:16:50.6100172Z mov.b32 %r234, 69206032; 2026-02-21T08:16:50.6100326Z mov.pred %p59, 0; 2026-02-21T08:16:50.6100464Z // begin inline asm 2026-02-21T08:16:50.6100692Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r564 + 0 ], %rd44, %rd45, %r234, %p59; 2026-02-21T08:16:50.6100937Z // end inline asm 2026-02-21T08:16:50.6101078Z add.s32 %r240, %r38, 49216; 2026-02-21T08:16:50.6101225Z cvt.u64.u32 %rd46, %r240; 2026-02-21T08:16:50.6101378Z // begin inline asm 2026-02-21T08:16:50.6101582Z @%p60 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:16:50.6101804Z // end inline asm 2026-02-21T08:16:50.6102005Z $L__BB0_3: 2026-02-21T08:16:50.6102237Z .loc 1 0 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:0:52 2026-02-21T08:16:50.6102551Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T08:16:50.6102735Z add.s32 %r8, %r38, %r209; 2026-02-21T08:16:50.6102888Z add.s32 %r9, %r38, %r210; 2026-02-21T08:16:50.6103031Z 
add.s32 %r388, %r383, 512; 2026-02-21T08:16:50.6103186Z or.b32 %r14, %r13, 8; 2026-02-21T08:16:50.6103335Z or.b32 %r15, %r13, 16; 2026-02-21T08:16:50.6103478Z or.b32 %r16, %r13, 24; 2026-02-21T08:16:50.6103628Z or.b32 %r17, %r13, 32; 2026-02-21T08:16:50.6103768Z or.b32 %r18, %r13, 40; 2026-02-21T08:16:50.6103917Z or.b32 %r19, %r13, 48; 2026-02-21T08:16:50.6104058Z or.b32 %r20, %r191, 56; 2026-02-21T08:16:50.6104210Z shl.b32 %r22, %r221, 3; 2026-02-21T08:16:50.6104460Z .loc 1 51 32 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:32 2026-02-21T08:16:50.6104825Z add.s64 %rd49, %rd30, 224; 2026-02-21T08:16:50.6105088Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6105363Z bar.sync 0; 2026-02-21T08:16:50.6105495Z mov.b32 %r242, 16; 2026-02-21T08:16:50.6105627Z // begin inline asm 2026-02-21T08:16:50.6105824Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd49 + 0 ], 0x10, %r242; 2026-02-21T08:16:50.6106040Z // end inline asm 2026-02-21T08:16:50.6106180Z cp.async.commit_group; 2026-02-21T08:16:50.6106426Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6106711Z // begin inline asm 2026-02-21T08:16:50.6106904Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r243], 4096; 2026-02-21T08:16:50.6107110Z // end inline asm 2026-02-21T08:16:50.6107356Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6107625Z bar.sync 0; 2026-02-21T08:16:50.6107767Z elect.sync %r253|%p65, -1; 2026-02-21T08:16:50.6107926Z and.pred %p63, %p1, %p65; 2026-02-21T08:16:50.6108083Z add.s32 %r244, %r38, 28672; 2026-02-21T08:16:50.6108228Z mov.b32 %r245, 112; 2026-02-21T08:16:50.6108370Z // begin inline asm 2026-02-21T08:16:50.6108687Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r244], [%rd50, {%r245, %r246}], [%r243]; 2026-02-21T08:16:50.6109029Z // end inline asm 2026-02-21T08:16:50.6109272Z .loc 1 46 89 
// cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6109552Z cvt.u16.u32 %rs1, %r3; 2026-02-21T08:16:50.6109707Z shr.u16 %rs2, %rs1, 8; 2026-02-21T08:16:50.6109849Z and.b16 %rs3, %rs2, 3; 2026-02-21T08:16:50.6110008Z mul.wide.u16 %r254, %rs3, 1024; 2026-02-21T08:16:50.6110170Z shl.b32 %r255, %r12, 6; 2026-02-21T08:16:50.6110321Z or.b32 %r256, %r254, %r255; 2026-02-21T08:16:50.6110475Z or.b32 %r257, %r256, %r5; 2026-02-21T08:16:50.6110621Z shl.b32 %r258, %r257, 11; 2026-02-21T08:16:50.6110774Z or.b32 %r259, %r258, %r4; 2026-02-21T08:16:50.6110925Z mad.wide.u32 %rd52, %r259, 2, %rd8; 2026-02-21T08:16:50.6111097Z add.s64 %rd196, %rd52, 256; 2026-02-21T08:16:50.6111244Z mov.b32 %r570, 1; 2026-02-21T08:16:50.6111382Z mov.b32 %r569, 7; 2026-02-21T08:16:50.6111509Z mov.b32 %r565, 0; 2026-02-21T08:16:50.6111643Z mov.b64 %rd197, 0; 2026-02-21T08:16:50.6111778Z mov.b32 %r567, %r565; 2026-02-21T08:16:50.6111921Z mov.b32 %r568, %r565; 2026-02-21T08:16:50.6112062Z mov.b32 %r571, %r565; 2026-02-21T08:16:50.6112197Z bra.uni $L__BB0_4; 2026-02-21T08:16:50.6112383Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:16:50.6112695Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6112982Z setp.lt.u64 %p73, %rd197, 1920; 2026-02-21T08:16:50.6113249Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6113587Z // begin inline asm 2026-02-21T08:16:50.6113721Z 2026-02-21T08:16:50.6113830Z { 2026-02-21T08:16:50.6113952Z .reg .pred complete; 2026-02-21T08:16:50.6114090Z waitLoop: 2026-02-21T08:16:50.6114280Z mbarrier.try_wait.parity.shared.b64 complete, [%r566], %r565; 2026-02-21T08:16:50.6114506Z @!complete bra.uni waitLoop; 2026-02-21T08:16:50.6114659Z } 2026-02-21T08:16:50.6114756Z 2026-02-21T08:16:50.6114807Z // end inline asm 2026-02-21T08:16:50.6114948Z add.s32 %r289, %r570, 1; 2026-02-21T08:16:50.6115097Z setp.gt.s32 %p76, %r289, 1; 
2026-02-21T08:16:50.6115260Z selp.b32 %r570, 0, %r289, %p76; 2026-02-21T08:16:50.6115425Z selp.b32 %r290, 1, 0, %p76; 2026-02-21T08:16:50.6115578Z xor.b32 %r571, %r300, %r290; 2026-02-21T08:16:50.6115849Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6116136Z add.s32 %r291, %r569, 1; 2026-02-21T08:16:50.6116295Z setp.gt.s32 %p77, %r291, 7; 2026-02-21T08:16:50.6116521Z selp.b32 %r569, 0, %r291, %p77; 2026-02-21T08:16:50.6116794Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6117070Z shl.b32 %r292, %r569, 11; 2026-02-21T08:16:50.6117220Z bar.sync 0; 2026-02-21T08:16:50.6117355Z add.s32 %r282, %r133, %r292; 2026-02-21T08:16:50.6117507Z selp.b32 %r283, 16, 0, %p73; 2026-02-21T08:16:50.6117665Z // begin inline asm 2026-02-21T08:16:50.6117861Z cp.async.cg.shared.global [ %r282 + 0 ], [ %rd196 + 0 ], 0x10, %r283; 2026-02-21T08:16:50.6118087Z // end inline asm 2026-02-21T08:16:50.6118225Z cp.async.commit_group; 2026-02-21T08:16:50.6118491Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6118773Z shl.b32 %r293, %r569, 3; 2026-02-21T08:16:50.6118934Z add.s32 %r295, %r38, %r293; 2026-02-21T08:16:50.6119100Z add.s32 %r288, %r295, 49152; 2026-02-21T08:16:50.6119257Z and.pred %p71, %p81, %p73; 2026-02-21T08:16:50.6119415Z // begin inline asm 2026-02-21T08:16:50.6119601Z @%p71 mbarrier.arrive.expect_tx.shared.b64 _, [%r288], 4096; 2026-02-21T08:16:50.6119814Z // end inline asm 2026-02-21T08:16:50.6120051Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6120338Z shl.b32 %r296, %r569, 12; 2026-02-21T08:16:50.6120485Z add.s32 %r285, %r38, %r296; 2026-02-21T08:16:50.6120638Z bar.sync 0; 2026-02-21T08:16:50.6120776Z elect.sync %r297|%p78, -1; 2026-02-21T08:16:50.6120932Z and.pred %p79, %p73, %p78; 2026-02-21T08:16:50.6121091Z and.pred %p72, %p1, %p79; 2026-02-21T08:16:50.6121240Z 
cvt.u32.u64 %r298, %rd197; 2026-02-21T08:16:50.6121397Z add.s32 %r286, %r298, 128; 2026-02-21T08:16:50.6121547Z // begin inline asm 2026-02-21T08:16:50.6121885Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r285], [%rd50, {%r286, %r246}], [%r288]; 2026-02-21T08:16:50.6122246Z // end inline asm 2026-02-21T08:16:50.6122507Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6122808Z add.s64 %rd196, %rd196, 32; 2026-02-21T08:16:50.6122970Z setp.lt.u64 %p80, %rd197, 2016; 2026-02-21T08:16:50.6123146Z add.s64 %rd197, %rd197, 16; 2026-02-21T08:16:50.6123296Z mov.b32 %r565, %r300; 2026-02-21T08:16:50.6123448Z mov.b32 %r566, %r299; 2026-02-21T08:16:50.6123594Z @%p80 bra $L__BB0_4; 2026-02-21T08:16:50.6123746Z bra.uni $L__BB0_7; 2026-02-21T08:16:50.6123933Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:16:50.6124276Z .loc 1 0 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:0:89 2026-02-21T08:16:50.6124573Z mov.b32 %r300, %r571; 2026-02-21T08:16:50.6124870Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6125167Z add.s32 %r262, %r568, 1; 2026-02-21T08:16:50.6125329Z setp.gt.s32 %p67, %r262, 7; 2026-02-21T08:16:50.6125614Z selp.b32 %r568, 0, %r262, %p67; 2026-02-21T08:16:50.6125781Z selp.b32 %r263, 1, 0, %p67; 2026-02-21T08:16:50.6125947Z xor.b32 %r567, %r567, %r263; 2026-02-21T08:16:50.6126233Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6126527Z cp.async.wait_group 6; 2026-02-21T08:16:50.6126689Z bar.sync 0; 2026-02-21T08:16:50.6126938Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6127238Z shl.b32 %r264, %r568, 3; 2026-02-21T08:16:50.6127391Z add.s32 %r266, %r38, %r264; 2026-02-21T08:16:50.6127558Z add.s32 %r260, %r266, 49152; 2026-02-21T08:16:50.6127713Z // begin inline asm 2026-02-21T08:16:50.6127860Z 
2026-02-21T08:16:50.6127974Z { 2026-02-21T08:16:50.6128105Z .reg .pred complete; 2026-02-21T08:16:50.6128259Z waitLoop: 2026-02-21T08:16:50.6128508Z mbarrier.try_wait.parity.shared.b64 complete, [%r260], %r567; 2026-02-21T08:16:50.6128770Z @!complete bra.uni waitLoop; 2026-02-21T08:16:50.6128926Z } 2026-02-21T08:16:50.6129002Z 2026-02-21T08:16:50.6129060Z // end inline asm 2026-02-21T08:16:50.6129204Z shl.b32 %r267, %r570, 3; 2026-02-21T08:16:50.6129366Z add.s32 %r268, %r38, %r267; 2026-02-21T08:16:50.6129528Z add.s32 %r299, %r268, 49216; 2026-02-21T08:16:50.6129796Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6130079Z @%p58 bra $L__BB0_6; 2026-02-21T08:16:50.6130261Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:16:50.6130581Z .loc 1 52 44 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:52:44 2026-02-21T08:16:50.6130854Z shl.b32 %r271, %r568, 12; 2026-02-21T08:16:50.6131009Z add.s32 %r273, %r38, %r271; 2026-02-21T08:16:50.6131267Z .loc 1 51 85 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:51:85 2026-02-21T08:16:50.6131552Z shl.b32 %r274, %r568, 11; 2026-02-21T08:16:50.6131705Z add.s32 %r275, %r38, %r274; 2026-02-21T08:16:50.6131855Z add.s32 %r276, %r275, 32768; 2026-02-21T08:16:50.6132114Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6132405Z elect.sync %r277|%p69, -1; 2026-02-21T08:16:50.6132566Z bfe.u32 %r278, %r276, 4, 14; 2026-02-21T08:16:50.6132713Z cvt.u64.u32 %rd56, %r278; 2026-02-21T08:16:50.6132882Z or.b64 %rd53, %rd56, -4611685949699522560; 2026-02-21T08:16:50.6133058Z bfe.u32 %r279, %r273, 4, 14; 2026-02-21T08:16:50.6133214Z cvt.u64.u32 %rd57, %r279; 2026-02-21T08:16:50.6133378Z or.b64 %rd54, %rd57, -4611685949691133952; 2026-02-21T08:16:50.6133551Z mov.b32 %r270, 69206032; 2026-02-21T08:16:50.6133703Z mov.pred %p68, -1; 2026-02-21T08:16:50.6133842Z // begin inline asm 2026-02-21T08:16:50.6134069Z @%p69 
tcgen05.mma.cta_group::1.kind::f16 [ %r564 + 0 ], %rd53, %rd54, %r270, %p68; 2026-02-21T08:16:50.6134317Z // end inline asm 2026-02-21T08:16:50.6134460Z cvt.u64.u32 %rd55, %r299; 2026-02-21T08:16:50.6134602Z // begin inline asm 2026-02-21T08:16:50.6134840Z @%p69 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd55]; 2026-02-21T08:16:50.6135069Z // end inline asm 2026-02-21T08:16:50.6135198Z bra.uni $L__BB0_6; 2026-02-21T08:16:50.6135377Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:16:50.6135684Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6135982Z // begin inline asm 2026-02-21T08:16:50.6136107Z 2026-02-21T08:16:50.6136223Z { 2026-02-21T08:16:50.6136340Z .reg .pred complete; 2026-02-21T08:16:50.6136487Z waitLoop: 2026-02-21T08:16:50.6136677Z mbarrier.try_wait.parity.shared.b64 complete, [%r299], %r300; 2026-02-21T08:16:50.6136904Z @!complete bra.uni waitLoop; 2026-02-21T08:16:50.6137059Z } 2026-02-21T08:16:50.6137122Z 2026-02-21T08:16:50.6137176Z // end inline asm 2026-02-21T08:16:50.6137483Z .loc 1 46 89 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:46:89 2026-02-21T08:16:50.6137763Z cp.async.wait_group 0; 2026-02-21T08:16:50.6137917Z bar.sync 0; 2026-02-21T08:16:50.6138044Z // begin inline asm 2026-02-21T08:16:50.6138209Z @%p81 mbarrier.inval.shared::cta.b64 [%r125]; 2026-02-21T08:16:50.6138399Z // end inline asm 2026-02-21T08:16:50.6138528Z bar.sync 0; 2026-02-21T08:16:50.6138663Z // begin inline asm 2026-02-21T08:16:50.6138818Z @%p81 mbarrier.inval.shared::cta.b64 [%r126]; 2026-02-21T08:16:50.6139012Z // end inline asm 2026-02-21T08:16:50.6139144Z bar.sync 0; 2026-02-21T08:16:50.6139285Z // begin inline asm 2026-02-21T08:16:50.6139439Z @%p81 mbarrier.inval.shared::cta.b64 [%r127]; 2026-02-21T08:16:50.6139621Z // end inline asm 2026-02-21T08:16:50.6139744Z bar.sync 0; 2026-02-21T08:16:50.6139878Z // begin inline asm 2026-02-21T08:16:50.6140086Z @%p81 mbarrier.inval.shared::cta.b64 
[%r128]; 2026-02-21T08:16:50.6140262Z // end inline asm 2026-02-21T08:16:50.6140397Z bar.sync 0; 2026-02-21T08:16:50.6140518Z // begin inline asm 2026-02-21T08:16:50.6140676Z @%p81 mbarrier.inval.shared::cta.b64 [%r129]; 2026-02-21T08:16:50.6140851Z // end inline asm 2026-02-21T08:16:50.6140985Z bar.sync 0; 2026-02-21T08:16:50.6141107Z // begin inline asm 2026-02-21T08:16:50.6141265Z @%p81 mbarrier.inval.shared::cta.b64 [%r130]; 2026-02-21T08:16:50.6141437Z // end inline asm 2026-02-21T08:16:50.6141570Z bar.sync 0; 2026-02-21T08:16:50.6141698Z // begin inline asm 2026-02-21T08:16:50.6141850Z @%p81 mbarrier.inval.shared::cta.b64 [%r131]; 2026-02-21T08:16:50.6142034Z // end inline asm 2026-02-21T08:16:50.6142158Z bar.sync 0; 2026-02-21T08:16:50.6142285Z // begin inline asm 2026-02-21T08:16:50.6142437Z @%p81 mbarrier.inval.shared::cta.b64 [%r243]; 2026-02-21T08:16:50.6142619Z // end inline asm 2026-02-21T08:16:50.6142749Z add.s32 %r309, %r38, 49216; 2026-02-21T08:16:50.6142903Z // begin inline asm 2026-02-21T08:16:50.6143061Z @%p81 mbarrier.inval.shared::cta.b64 [%r309]; 2026-02-21T08:16:50.6143236Z // end inline asm 2026-02-21T08:16:50.6143370Z bar.sync 0; 2026-02-21T08:16:50.6143489Z // begin inline asm 2026-02-21T08:16:50.6143646Z @%p81 mbarrier.inval.shared::cta.b64 [%r124]; 2026-02-21T08:16:50.6143818Z // end inline asm 2026-02-21T08:16:50.6144064Z .loc 1 56 45 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:56:45 2026-02-21T08:16:50.6144349Z shl.b32 %r452, %r13, 11; 2026-02-21T08:16:50.6144501Z shl.b32 %r453, %r14, 11; 2026-02-21T08:16:50.6144651Z shl.b32 %r454, %r15, 11; 2026-02-21T08:16:50.6144822Z shl.b32 %r455, %r16, 11; 2026-02-21T08:16:50.6144968Z shl.b32 %r456, %r17, 11; 2026-02-21T08:16:50.6145107Z shl.b32 %r457, %r18, 11; 2026-02-21T08:16:50.6145254Z shl.b32 %r458, %r19, 11; 2026-02-21T08:16:50.6145394Z shl.b32 %r459, %r20, 11; 2026-02-21T08:16:50.6145653Z .loc 1 56 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:56:52 
2026-02-21T08:16:50.6145935Z or.b32 %r460, %r452, %r22; 2026-02-21T08:16:50.6146094Z or.b32 %r461, %r453, %r22; 2026-02-21T08:16:50.6146244Z or.b32 %r462, %r454, %r22; 2026-02-21T08:16:50.6146401Z or.b32 %r463, %r455, %r22; 2026-02-21T08:16:50.6146560Z or.b32 %r464, %r456, %r22; 2026-02-21T08:16:50.6146708Z or.b32 %r465, %r457, %r22; 2026-02-21T08:16:50.6146867Z or.b32 %r466, %r458, %r22; 2026-02-21T08:16:50.6147009Z or.b32 %r467, %r459, %r22; 2026-02-21T08:16:50.6147273Z .loc 1 56 24 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:56:24 2026-02-21T08:16:50.6147568Z mad.wide.u32 %rd60, %r460, 2, %rd9; 2026-02-21T08:16:50.6147754Z mad.wide.u32 %rd61, %r461, 2, %rd9; 2026-02-21T08:16:50.6147923Z mad.wide.u32 %rd62, %r462, 2, %rd9; 2026-02-21T08:16:50.6148097Z mad.wide.u32 %rd63, %r463, 2, %rd9; 2026-02-21T08:16:50.6148268Z mad.wide.u32 %rd64, %r464, 2, %rd9; 2026-02-21T08:16:50.6148433Z mad.wide.u32 %rd65, %r465, 2, %rd9; 2026-02-21T08:16:50.6148603Z mad.wide.u32 %rd66, %r466, 2, %rd9; 2026-02-21T08:16:50.6148822Z mad.wide.u32 %rd67, %r467, 2, %rd9; 2026-02-21T08:16:50.6149097Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6149371Z // begin inline asm 2026-02-21T08:16:50.6149737Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326}, [%r378 + 0], 64; 2026-02-21T08:16:50.6150134Z // end inline asm 2026-02-21T08:16:50.6150269Z // begin inline asm 2026-02-21T08:16:50.6150633Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343}, [%r378 + 16], 64; 2026-02-21T08:16:50.6151021Z // end inline asm 2026-02-21T08:16:50.6151159Z // begin inline asm 2026-02-21T08:16:50.6151557Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, 
%r359, %r360}, [%r378 + 32], 64; 2026-02-21T08:16:50.6151943Z // end inline asm 2026-02-21T08:16:50.6152080Z // begin inline asm 2026-02-21T08:16:50.6152427Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377}, [%r378 + 48], 64; 2026-02-21T08:16:50.6152807Z // end inline asm 2026-02-21T08:16:50.6152933Z // begin inline asm 2026-02-21T08:16:50.6153086Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:16:50.6153242Z // end inline asm 2026-02-21T08:16:50.6153378Z cvt.u64.u32 %rd68, %r311; 2026-02-21T08:16:50.6153535Z cvt.u64.u32 %rd69, %r312; 2026-02-21T08:16:50.6153679Z shl.b64 %rd70, %rd69, 32; 2026-02-21T08:16:50.6153832Z or.b64 %rd71, %rd68, %rd70; 2026-02-21T08:16:50.6154089Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6154384Z mov.b64 {%r468, %r469}, %rd71; 2026-02-21T08:16:50.6154550Z cvt.rn.f16x2.f32 %r470, %r469, %r468; 2026-02-21T08:16:50.6154866Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6155150Z cvt.u64.u32 %rd72, %r313; 2026-02-21T08:16:50.6155302Z cvt.u64.u32 %rd73, %r314; 2026-02-21T08:16:50.6155452Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:16:50.6155600Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:16:50.6155866Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6156160Z mov.b64 {%r471, %r472}, %rd75; 2026-02-21T08:16:50.6156330Z cvt.rn.f16x2.f32 %r473, %r472, %r471; 2026-02-21T08:16:50.6156605Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6156898Z cvt.u64.u32 %rd76, %r315; 2026-02-21T08:16:50.6157051Z cvt.u64.u32 %rd77, %r316; 2026-02-21T08:16:50.6157195Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:16:50.6157351Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:16:50.6157618Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 
2026-02-21T08:16:50.6157912Z mov.b64 {%r474, %r475}, %rd79; 2026-02-21T08:16:50.6158082Z cvt.rn.f16x2.f32 %r476, %r475, %r474; 2026-02-21T08:16:50.6158359Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6158633Z cvt.u64.u32 %rd80, %r317; 2026-02-21T08:16:50.6158783Z cvt.u64.u32 %rd81, %r318; 2026-02-21T08:16:50.6158933Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:16:50.6159077Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:16:50.6159339Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6159617Z mov.b64 {%r477, %r478}, %rd83; 2026-02-21T08:16:50.6159783Z cvt.rn.f16x2.f32 %r479, %r478, %r477; 2026-02-21T08:16:50.6160062Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6160405Z cvt.u64.u32 %rd84, %r319; 2026-02-21T08:16:50.6160554Z cvt.u64.u32 %rd85, %r320; 2026-02-21T08:16:50.6160697Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:16:50.6160846Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:16:50.6161099Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6161381Z mov.b64 {%r480, %r481}, %rd87; 2026-02-21T08:16:50.6161537Z cvt.rn.f16x2.f32 %r482, %r481, %r480; 2026-02-21T08:16:50.6161818Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6162094Z cvt.u64.u32 %rd88, %r321; 2026-02-21T08:16:50.6162245Z cvt.u64.u32 %rd89, %r322; 2026-02-21T08:16:50.6162396Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:16:50.6162543Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:16:50.6162887Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6163161Z mov.b64 {%r483, %r484}, %rd91; 2026-02-21T08:16:50.6163327Z cvt.rn.f16x2.f32 %r485, %r484, %r483; 2026-02-21T08:16:50.6163621Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6163916Z cvt.u64.u32 %rd92, %r323; 
2026-02-21T08:16:50.6164065Z cvt.u64.u32 %rd93, %r324; 2026-02-21T08:16:50.6164220Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:16:50.6164377Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:16:50.6164645Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6164986Z mov.b64 {%r486, %r487}, %rd95; 2026-02-21T08:16:50.6165152Z cvt.rn.f16x2.f32 %r488, %r487, %r486; 2026-02-21T08:16:50.6165437Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6165721Z cvt.u64.u32 %rd96, %r325; 2026-02-21T08:16:50.6165878Z cvt.u64.u32 %rd97, %r326; 2026-02-21T08:16:50.6166036Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:16:50.6166189Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:16:50.6166461Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6166751Z mov.b64 {%r489, %r490}, %rd99; 2026-02-21T08:16:50.6166924Z cvt.rn.f16x2.f32 %r491, %r490, %r489; 2026-02-21T08:16:50.6167205Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6167501Z cvt.u64.u32 %rd100, %r328; 2026-02-21T08:16:50.6167662Z cvt.u64.u32 %rd101, %r329; 2026-02-21T08:16:50.6167825Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:16:50.6167990Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:16:50.6168260Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6168565Z mov.b64 {%r492, %r493}, %rd103; 2026-02-21T08:16:50.6168737Z cvt.rn.f16x2.f32 %r494, %r493, %r492; 2026-02-21T08:16:50.6169032Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6169329Z cvt.u64.u32 %rd104, %r330; 2026-02-21T08:16:50.6169493Z cvt.u64.u32 %rd105, %r331; 2026-02-21T08:16:50.6169657Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:16:50.6169816Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:16:50.6170091Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 
2026-02-21T08:16:50.6170386Z mov.b64 {%r495, %r496}, %rd107; 2026-02-21T08:16:50.6170561Z cvt.rn.f16x2.f32 %r497, %r496, %r495; 2026-02-21T08:16:50.6170839Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6171142Z cvt.u64.u32 %rd108, %r332; 2026-02-21T08:16:50.6171297Z cvt.u64.u32 %rd109, %r333; 2026-02-21T08:16:50.6171459Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:16:50.6171620Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:16:50.6171888Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6172231Z mov.b64 {%r498, %r499}, %rd111; 2026-02-21T08:16:50.6172396Z cvt.rn.f16x2.f32 %r500, %r499, %r498; 2026-02-21T08:16:50.6172677Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6172956Z cvt.u64.u32 %rd112, %r334; 2026-02-21T08:16:50.6173112Z cvt.u64.u32 %rd113, %r335; 2026-02-21T08:16:50.6173265Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:16:50.6173415Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:16:50.6173690Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6173980Z mov.b64 {%r501, %r502}, %rd115; 2026-02-21T08:16:50.6174147Z cvt.rn.f16x2.f32 %r503, %r502, %r501; 2026-02-21T08:16:50.6174425Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6174798Z cvt.u64.u32 %rd116, %r336; 2026-02-21T08:16:50.6174948Z cvt.u64.u32 %rd117, %r337; 2026-02-21T08:16:50.6175102Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:16:50.6175255Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:16:50.6175513Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6175808Z mov.b64 {%r504, %r505}, %rd119; 2026-02-21T08:16:50.6175965Z cvt.rn.f16x2.f32 %r506, %r505, %r504; 2026-02-21T08:16:50.6176240Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6176518Z cvt.u64.u32 %rd120, 
%r338; 2026-02-21T08:16:50.6176674Z cvt.u64.u32 %rd121, %r339; 2026-02-21T08:16:50.6176828Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:16:50.6176978Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:16:50.6177244Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6177524Z mov.b64 {%r507, %r508}, %rd123; 2026-02-21T08:16:50.6177692Z cvt.rn.f16x2.f32 %r509, %r508, %r507; 2026-02-21T08:16:50.6177964Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6178254Z cvt.u64.u32 %rd124, %r340; 2026-02-21T08:16:50.6178398Z cvt.u64.u32 %rd125, %r341; 2026-02-21T08:16:50.6178553Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:16:50.6178708Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:16:50.6178969Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6179252Z mov.b64 {%r510, %r511}, %rd127; 2026-02-21T08:16:50.6179412Z cvt.rn.f16x2.f32 %r512, %r511, %r510; 2026-02-21T08:16:50.6179692Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6179970Z cvt.u64.u32 %rd128, %r342; 2026-02-21T08:16:50.6180128Z cvt.u64.u32 %rd129, %r343; 2026-02-21T08:16:50.6180285Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:16:50.6180437Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:16:50.6180704Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6180981Z mov.b64 {%r513, %r514}, %rd131; 2026-02-21T08:16:50.6181145Z cvt.rn.f16x2.f32 %r515, %r514, %r513; 2026-02-21T08:16:50.6181411Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6181691Z cvt.u64.u32 %rd132, %r345; 2026-02-21T08:16:50.6181838Z cvt.u64.u32 %rd133, %r346; 2026-02-21T08:16:50.6181989Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:16:50.6182142Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:16:50.6182401Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 
2026-02-21T08:16:50.6182686Z mov.b64 {%r516, %r517}, %rd135; 2026-02-21T08:16:50.6182845Z cvt.rn.f16x2.f32 %r518, %r517, %r516; 2026-02-21T08:16:50.6183120Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6183446Z cvt.u64.u32 %rd136, %r347; 2026-02-21T08:16:50.6183603Z cvt.u64.u32 %rd137, %r348; 2026-02-21T08:16:50.6183756Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:16:50.6183905Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T08:16:50.6184167Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6184440Z mov.b64 {%r519, %r520}, %rd139; 2026-02-21T08:16:50.6184605Z cvt.rn.f16x2.f32 %r521, %r520, %r519; 2026-02-21T08:16:50.6184907Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6185189Z cvt.u64.u32 %rd140, %r349; 2026-02-21T08:16:50.6185338Z cvt.u64.u32 %rd141, %r350; 2026-02-21T08:16:50.6185493Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:16:50.6185649Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:16:50.6185957Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6186244Z mov.b64 {%r522, %r523}, %rd143; 2026-02-21T08:16:50.6186405Z cvt.rn.f16x2.f32 %r524, %r523, %r522; 2026-02-21T08:16:50.6186685Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6186962Z cvt.u64.u32 %rd144, %r351; 2026-02-21T08:16:50.6187115Z cvt.u64.u32 %rd145, %r352; 2026-02-21T08:16:50.6187267Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:16:50.6187416Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:16:50.6187681Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6187960Z mov.b64 {%r525, %r526}, %rd147; 2026-02-21T08:16:50.6188125Z cvt.rn.f16x2.f32 %r527, %r526, %r525; 2026-02-21T08:16:50.6188396Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6188683Z cvt.u64.u32 %rd148, 
%r353; 2026-02-21T08:16:50.6188833Z cvt.u64.u32 %rd149, %r354; 2026-02-21T08:16:50.6188989Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:16:50.6189146Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:16:50.6189413Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6189709Z mov.b64 {%r528, %r529}, %rd151; 2026-02-21T08:16:50.6189868Z cvt.rn.f16x2.f32 %r530, %r529, %r528; 2026-02-21T08:16:50.6190146Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6190429Z cvt.u64.u32 %rd152, %r355; 2026-02-21T08:16:50.6190587Z cvt.u64.u32 %rd153, %r356; 2026-02-21T08:16:50.6190744Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:16:50.6190898Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:16:50.6191166Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6191449Z mov.b64 {%r531, %r532}, %rd155; 2026-02-21T08:16:50.6191617Z cvt.rn.f16x2.f32 %r533, %r532, %r531; 2026-02-21T08:16:50.6191890Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6192180Z cvt.u64.u32 %rd156, %r357; 2026-02-21T08:16:50.6192328Z cvt.u64.u32 %rd157, %r358; 2026-02-21T08:16:50.6192482Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:16:50.6192637Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:16:50.6192899Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6193192Z mov.b64 {%r534, %r535}, %rd159; 2026-02-21T08:16:50.6193350Z cvt.rn.f16x2.f32 %r536, %r535, %r534; 2026-02-21T08:16:50.6193632Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6193920Z cvt.u64.u32 %rd160, %r359; 2026-02-21T08:16:50.6194075Z cvt.u64.u32 %rd161, %r360; 2026-02-21T08:16:50.6194228Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:16:50.6194377Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:16:50.6194644Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 
2026-02-21T08:16:50.6195014Z mov.b64 {%r537, %r538}, %rd163; 2026-02-21T08:16:50.6195183Z cvt.rn.f16x2.f32 %r539, %r538, %r537; 2026-02-21T08:16:50.6195453Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6195734Z cvt.u64.u32 %rd164, %r362; 2026-02-21T08:16:50.6195881Z cvt.u64.u32 %rd165, %r363; 2026-02-21T08:16:50.6196035Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:16:50.6196192Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:16:50.6196450Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6196732Z mov.b64 {%r540, %r541}, %rd167; 2026-02-21T08:16:50.6196892Z cvt.rn.f16x2.f32 %r542, %r541, %r540; 2026-02-21T08:16:50.6197168Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6197504Z cvt.u64.u32 %rd168, %r364; 2026-02-21T08:16:50.6197664Z cvt.u64.u32 %rd169, %r365; 2026-02-21T08:16:50.6197814Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:16:50.6197963Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:16:50.6198229Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6198501Z mov.b64 {%r543, %r544}, %rd171; 2026-02-21T08:16:50.6198667Z cvt.rn.f16x2.f32 %r545, %r544, %r543; 2026-02-21T08:16:50.6198937Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6199215Z cvt.u64.u32 %rd172, %r366; 2026-02-21T08:16:50.6199361Z cvt.u64.u32 %rd173, %r367; 2026-02-21T08:16:50.6199514Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:16:50.6199671Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:16:50.6199932Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6200214Z mov.b64 {%r546, %r547}, %rd175; 2026-02-21T08:16:50.6200375Z cvt.rn.f16x2.f32 %r548, %r547, %r546; 2026-02-21T08:16:50.6200648Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6200922Z cvt.u64.u32 %rd176, 
%r368; 2026-02-21T08:16:50.6201078Z cvt.u64.u32 %rd177, %r369; 2026-02-21T08:16:50.6201232Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:16:50.6201384Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:16:50.6201654Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6201943Z mov.b64 {%r549, %r550}, %rd179; 2026-02-21T08:16:50.6202110Z cvt.rn.f16x2.f32 %r551, %r550, %r549; 2026-02-21T08:16:50.6202375Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6202664Z cvt.u64.u32 %rd180, %r370; 2026-02-21T08:16:50.6202808Z cvt.u64.u32 %rd181, %r371; 2026-02-21T08:16:50.6202966Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:16:50.6203127Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:16:50.6203382Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6203663Z mov.b64 {%r552, %r553}, %rd183; 2026-02-21T08:16:50.6203821Z cvt.rn.f16x2.f32 %r554, %r553, %r552; 2026-02-21T08:16:50.6204098Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6204378Z cvt.u64.u32 %rd184, %r372; 2026-02-21T08:16:50.6204535Z cvt.u64.u32 %rd185, %r373; 2026-02-21T08:16:50.6204718Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:16:50.6204871Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:16:50.6205140Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6205422Z mov.b64 {%r555, %r556}, %rd187; 2026-02-21T08:16:50.6205588Z cvt.rn.f16x2.f32 %r557, %r556, %r555; 2026-02-21T08:16:50.6205872Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6206277Z cvt.u64.u32 %rd188, %r374; 2026-02-21T08:16:50.6206430Z cvt.u64.u32 %rd189, %r375; 2026-02-21T08:16:50.6206589Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:16:50.6206751Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:16:50.6207026Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 
2026-02-21T08:16:50.6207335Z mov.b64 {%r558, %r559}, %rd191; 2026-02-21T08:16:50.6207503Z cvt.rn.f16x2.f32 %r560, %r559, %r558; 2026-02-21T08:16:50.6207802Z .loc 1 53 52 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:53:52 2026-02-21T08:16:50.6208099Z cvt.u64.u32 %rd192, %r376; 2026-02-21T08:16:50.6208263Z cvt.u64.u32 %rd193, %r377; 2026-02-21T08:16:50.6208425Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:16:50.6208582Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:16:50.6208921Z .loc 1 55 27 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:55:27 2026-02-21T08:16:50.6209212Z mov.b64 {%r561, %r562}, %rd195; 2026-02-21T08:16:50.6209384Z cvt.rn.f16x2.f32 %r563, %r562, %r561; 2026-02-21T08:16:50.6209665Z .loc 1 56 82 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:56:82 2026-02-21T08:16:50.6209992Z st.shared.v4.b32 [%r8], {%r470, %r482, %r494, %r506}; 2026-02-21T08:16:50.6210232Z st.shared.v4.b32 [%r9], {%r518, %r530, %r542, %r554}; 2026-02-21T08:16:50.6210425Z bar.sync 0; 2026-02-21T08:16:50.6210566Z // begin inline asm 2026-02-21T08:16:50.6210802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r419, %r423, %r427, %r431}, [%r383]; 2026-02-21T08:16:50.6211075Z // end inline asm 2026-02-21T08:16:50.6211215Z // begin inline asm 2026-02-21T08:16:50.6211453Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r435, %r439, %r443, %r447}, [%r388]; 2026-02-21T08:16:50.6211712Z // end inline asm 2026-02-21T08:16:50.6211855Z bar.sync 0; 2026-02-21T08:16:50.6212020Z st.shared.v4.b32 [%r8], {%r473, %r485, %r497, %r509}; 2026-02-21T08:16:50.6212256Z st.shared.v4.b32 [%r9], {%r521, %r533, %r545, %r557}; 2026-02-21T08:16:50.6212458Z bar.sync 0; 2026-02-21T08:16:50.6212590Z // begin inline asm 2026-02-21T08:16:50.6212833Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r420, %r424, %r428, %r432}, [%r383]; 2026-02-21T08:16:50.6213098Z // end inline asm 2026-02-21T08:16:50.6213241Z // begin inline asm 2026-02-21T08:16:50.6213472Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r436, %r440, %r444, %r448}, [%r388]; 2026-02-21T08:16:50.6213727Z // end inline asm 2026-02-21T08:16:50.6213853Z bar.sync 0; 2026-02-21T08:16:50.6214014Z st.shared.v4.b32 [%r8], {%r476, %r488, %r500, %r512}; 2026-02-21T08:16:50.6214235Z st.shared.v4.b32 [%r9], {%r524, %r536, %r548, %r560}; 2026-02-21T08:16:50.6214412Z bar.sync 0; 2026-02-21T08:16:50.6214543Z // begin inline asm 2026-02-21T08:16:50.6214785Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r421, %r425, %r429, %r433}, [%r383]; 2026-02-21T08:16:50.6215049Z // end inline asm 2026-02-21T08:16:50.6215180Z // begin inline asm 2026-02-21T08:16:50.6215406Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r437, %r441, %r445, %r449}, [%r388]; 2026-02-21T08:16:50.6215660Z // end inline asm 2026-02-21T08:16:50.6215794Z bar.sync 0; 2026-02-21T08:16:50.6215954Z st.shared.v4.b32 [%r8], {%r479, %r491, %r503, %r515}; 2026-02-21T08:16:50.6216168Z st.shared.v4.b32 [%r9], {%r527, %r539, %r551, %r563}; 2026-02-21T08:16:50.6216355Z bar.sync 0; 2026-02-21T08:16:50.6216478Z // begin inline asm 2026-02-21T08:16:50.6216704Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r422, %r426, %r430, %r434}, [%r383]; 2026-02-21T08:16:50.6216958Z // end inline asm 2026-02-21T08:16:50.6217095Z // begin inline asm 2026-02-21T08:16:50.6217307Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r438, %r442, %r446, %r450}, [%r388]; 2026-02-21T08:16:50.6217564Z // end inline asm 2026-02-21T08:16:50.6217698Z // begin inline asm 2026-02-21T08:16:50.6217877Z st.global.v4.b32 [ %rd60 + 0 ], { %r419, %r420, %r421, %r422 }; 2026-02-21T08:16:50.6218146Z // end inline asm 2026-02-21T08:16:50.6218272Z // begin inline asm 2026-02-21T08:16:50.6218450Z st.global.v4.b32 [ %rd61 + 0 ], { %r423, %r424, %r425, %r426 }; 2026-02-21T08:16:50.6218644Z // end inline asm 2026-02-21T08:16:50.6218779Z // begin inline asm 2026-02-21T08:16:50.6218947Z st.global.v4.b32 [ %rd62 + 0 ], { %r427, %r428, %r429, %r430 }; 2026-02-21T08:16:50.6219007Z // end inline asm 2026-02-21T08:16:50.6219060Z 
// begin inline asm 2026-02-21T08:16:50.6219152Z st.global.v4.b32 [ %rd63 + 0 ], { %r431, %r432, %r433, %r434 }; 2026-02-21T08:16:50.6219210Z // end inline asm 2026-02-21T08:16:50.6219262Z // begin inline asm 2026-02-21T08:16:50.6219351Z st.global.v4.b32 [ %rd64 + 0 ], { %r435, %r436, %r437, %r438 }; 2026-02-21T08:16:50.6219403Z // end inline asm 2026-02-21T08:16:50.6219466Z // begin inline asm 2026-02-21T08:16:50.6219557Z st.global.v4.b32 [ %rd65 + 0 ], { %r439, %r440, %r441, %r442 }; 2026-02-21T08:16:50.6219658Z // end inline asm 2026-02-21T08:16:50.6219724Z // begin inline asm 2026-02-21T08:16:50.6219816Z st.global.v4.b32 [ %rd66 + 0 ], { %r443, %r444, %r445, %r446 }; 2026-02-21T08:16:50.6219867Z // end inline asm 2026-02-21T08:16:50.6219926Z // begin inline asm 2026-02-21T08:16:50.6220016Z st.global.v4.b32 [ %rd67 + 0 ], { %r447, %r448, %r449, %r450 }; 2026-02-21T08:16:50.6220069Z // end inline asm 2026-02-21T08:16:50.6220147Z $L__BB0_8: // %._crit_edge 2026-02-21T08:16:50.6220329Z .loc 1 27 4 // cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py:27:4 2026-02-21T08:16:50.6220381Z bar.sync 0; 2026-02-21T08:16:50.6220434Z // begin inline asm 2026-02-21T08:16:50.6220557Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r564, 128; 2026-02-21T08:16:50.6220610Z // end inline asm 2026-02-21T08:16:50.6220661Z ret; 2026-02-21T08:16:50.6220713Z $L__tmp0: 2026-02-21T08:16:50.6220775Z $L__func_end0: 2026-02-21T08:16:50.6220860Z // -- End function 2026-02-21T08:16:50.6220913Z } 2026-02-21T08:16:50.6221127Z .file 1 "/tmp/torchinductor_root/sh/cshrsjarzal2mh5k2pyfhrjnyfvioxxzrm5hk5rb5bgruf42tfjz.py" 2026-02-21T08:16:50.6221187Z .section .debug_abbrev 2026-02-21T08:16:50.6221236Z { 2026-02-21T08:16:50.6221328Z .b8 1 // Abbreviation Code 2026-02-21T08:16:50.6221414Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:16:50.6221492Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:16:50.6221571Z .b8 37 // DW_AT_producer 2026-02-21T08:16:50.6221652Z .b8 8 // DW_FORM_string 
2026-02-21T08:16:50.6221725Z .b8 19 // DW_AT_language 2026-02-21T08:16:50.6221803Z .b8 5 // DW_FORM_data2 2026-02-21T08:16:50.6221882Z .b8 3 // DW_AT_name 2026-02-21T08:16:50.6221954Z .b8 8 // DW_FORM_string 2026-02-21T08:16:50.6222033Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:16:50.6222115Z .b8 6 // DW_FORM_data4 2026-02-21T08:16:50.6222186Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:16:50.6222256Z .b8 8 // DW_FORM_string 2026-02-21T08:16:50.6222323Z .b8 0 // EOM(1) 2026-02-21T08:16:50.6222398Z .b8 0 // EOM(2) 2026-02-21T08:16:50.6222462Z .b8 0 // EOM(3) 2026-02-21T08:16:50.6222511Z } 2026-02-21T08:16:50.6222575Z .section .debug_info 2026-02-21T08:16:50.6222625Z { 2026-02-21T08:16:50.6222704Z .b32 104 // Length of Unit 2026-02-21T08:16:50.6222791Z .b8 2 // DWARF version number 2026-02-21T08:16:50.6222841Z .b8 0 2026-02-21T08:16:50.6222958Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:16:50.6223084Z .b8 8 // Address Size (in bytes) 2026-02-21T08:16:50.6223188Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:16:50.6223265Z .b8 116 // DW_AT_producer 2026-02-21T08:16:50.6223316Z .b8 114 2026-02-21T08:16:50.6223372Z .b8 105 2026-02-21T08:16:50.6223422Z .b8 116 2026-02-21T08:16:50.6223471Z .b8 111 2026-02-21T08:16:50.6223519Z .b8 110 2026-02-21T08:16:50.6223574Z .b8 0 2026-02-21T08:16:50.6223644Z .b8 2 // DW_AT_language 2026-02-21T08:16:50.6223693Z .b8 0 2026-02-21T08:16:50.6223772Z .b8 99 // DW_AT_name 2026-02-21T08:16:50.6223821Z .b8 115 2026-02-21T08:16:50.6223869Z .b8 104 2026-02-21T08:16:50.6223916Z .b8 114 2026-02-21T08:16:50.6223973Z .b8 115 2026-02-21T08:16:50.6224023Z .b8 106 2026-02-21T08:16:50.6224073Z .b8 97 2026-02-21T08:16:50.6224129Z .b8 114 2026-02-21T08:16:50.6224217Z .b8 122 2026-02-21T08:16:50.6224271Z .b8 97 2026-02-21T08:16:50.6224319Z .b8 108 2026-02-21T08:16:50.6224377Z .b8 50 2026-02-21T08:16:50.6224427Z .b8 109 2026-02-21T08:16:50.6224475Z .b8 104 2026-02-21T08:16:50.6224532Z .b8 53 2026-02-21T08:16:50.6224581Z .b8 
107 2026-02-21T08:16:50.6224630Z .b8 50 2026-02-21T08:16:50.6224707Z .b8 112 2026-02-21T08:16:50.6224764Z .b8 121 2026-02-21T08:16:50.6224813Z .b8 102 2026-02-21T08:16:50.6224862Z .b8 104 2026-02-21T08:16:50.6224911Z .b8 114 2026-02-21T08:16:50.6224967Z .b8 106 2026-02-21T08:16:50.6225015Z .b8 110 2026-02-21T08:16:50.6225063Z .b8 121 2026-02-21T08:16:50.6225116Z .b8 102 2026-02-21T08:16:50.6225164Z .b8 118 2026-02-21T08:16:50.6225212Z .b8 105 2026-02-21T08:16:50.6225259Z .b8 111 2026-02-21T08:16:50.6225313Z .b8 120 2026-02-21T08:16:50.6225359Z .b8 120 2026-02-21T08:16:50.6225407Z .b8 122 2026-02-21T08:16:50.6225461Z .b8 114 2026-02-21T08:16:50.6225508Z .b8 109 2026-02-21T08:16:50.6225557Z .b8 53 2026-02-21T08:16:50.6225604Z .b8 104 2026-02-21T08:16:50.6225663Z .b8 107 2026-02-21T08:16:50.6225714Z .b8 53 2026-02-21T08:16:50.6225762Z .b8 114 2026-02-21T08:16:50.6225811Z .b8 98 2026-02-21T08:16:50.6225868Z .b8 53 2026-02-21T08:16:50.6225919Z .b8 98 2026-02-21T08:16:50.6225968Z .b8 103 2026-02-21T08:16:50.6226025Z .b8 114 2026-02-21T08:16:50.6226073Z .b8 117 2026-02-21T08:16:50.6226120Z .b8 102 2026-02-21T08:16:50.6226168Z .b8 52 2026-02-21T08:16:50.6226224Z .b8 50 2026-02-21T08:16:50.6226271Z .b8 116 2026-02-21T08:16:50.6226319Z .b8 102 2026-02-21T08:16:50.6226373Z .b8 106 2026-02-21T08:16:50.6226421Z .b8 122 2026-02-21T08:16:50.6226468Z .b8 46 2026-02-21T08:16:50.6226517Z .b8 112 2026-02-21T08:16:50.6226572Z .b8 121 2026-02-21T08:16:50.6226621Z .b8 0 2026-02-21T08:16:50.6226711Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:16:50.6226789Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:16:50.6226837Z .b8 116 2026-02-21T08:16:50.6226885Z .b8 109 2026-02-21T08:16:50.6226933Z .b8 112 2026-02-21T08:16:50.6226992Z .b8 47 2026-02-21T08:16:50.6227043Z .b8 116 2026-02-21T08:16:50.6227093Z .b8 111 2026-02-21T08:16:50.6227147Z .b8 114 2026-02-21T08:16:50.6227197Z .b8 99 2026-02-21T08:16:50.6227246Z .b8 104 2026-02-21T08:16:50.6227296Z .b8 105 2026-02-21T08:16:50.6227353Z .b8 110 
2026-02-21T08:16:50.6227403Z .b8 100 2026-02-21T08:16:50.6227452Z .b8 117 2026-02-21T08:16:50.6227501Z .b8 99 2026-02-21T08:16:50.6227560Z .b8 116 2026-02-21T08:16:50.6227608Z .b8 111 2026-02-21T08:16:50.6227656Z .b8 114 2026-02-21T08:16:50.6227713Z .b8 95 2026-02-21T08:16:50.6227762Z .b8 114 2026-02-21T08:16:50.6227811Z .b8 111 2026-02-21T08:16:50.6227859Z .b8 111 2026-02-21T08:16:50.6227914Z .b8 116 2026-02-21T08:16:50.6227961Z .b8 47 2026-02-21T08:16:50.6228008Z .b8 115 2026-02-21T08:16:50.6228064Z .b8 104 2026-02-21T08:16:50.6228111Z .b8 0 2026-02-21T08:16:50.6228159Z } 2026-02-21T08:16:50.6228223Z .section .debug_macinfo { } 2026-02-21T08:16:50.6228228Z 2026-02-21T08:16:50.6228312Z ================================================================ 2026-02-21T08:16:50.6228517Z please share the reproducer above with Triton project. 2026-02-21T08:16:50.8991557Z 2026-02-21T08:16:50.8996676Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 20.5 configs/s 2026-02-21T08:16:51.2462719Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2767.8 2026-02-21T08:16:51.2466365Z configs/s 2026-02-21T08:16:51.2850943Z [39s] Generation 2 complete: 2026-02-21T08:16:51.2855404Z error=25 2026-02-21T08:16:51.2859276Z ok=65 2026-02-21T08:16:51.2864213Z min=0.0368 2026-02-21T08:16:51.2868718Z mid=0.1536 2026-02-21T08:16:51.2870312Z max=6.8209 2026-02-21T08:16:51.2870509Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:16:51.2870747Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:16:51.2870969Z 'l2_groupings': [64], 2026-02-21T08:16:51.2871196Z 'load_eviction_policies': ['', ''], 2026-02-21T08:16:51.2871682Z 'loop_orders': [[1, 0]], 2026-02-21T08:16:51.2871891Z 'num_stages': 7, 2026-02-21T08:16:51.2872049Z 'num_warps': 2, 2026-02-21T08:16:51.2873219Z 'pid_type': 'flat', 2026-02-21T08:16:51.2873430Z 'range_flattens': [None, None], 2026-02-21T08:16:51.2873647Z 'range_multi_buffers': [None, None], 2026-02-21T08:16:51.2873838Z 
'range_num_stages': [0, 0], 2026-02-21T08:16:51.2874010Z 'range_unroll_factors': [0, 0], 2026-02-21T08:16:51.2874186Z 'range_warp_specializes': [None, True]} 2026-02-21T08:16:51.2874472Z [39s] Fitting surrogate: 275 points, 275 targets 2026-02-21T08:16:52.4839079Z [40s] Generation 3 starting: 77 neighbors, 5 active search path(s) 2026-02-21T08:16:56.3897990Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 10.2 configs/s 2026-02-21T08:17:00.1849089Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 21.4 configs/s 2026-02-21T08:17:01.4081662Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1028.5 2026-02-21T08:17:01.4082749Z configs/s 2026-02-21T08:17:01.4844910Z [49s] Generation 3 complete: 2026-02-21T08:17:01.4849580Z error=20 2026-02-21T08:17:01.4850993Z ok=63 2026-02-21T08:17:01.4851159Z min=0.0389 2026-02-21T08:17:01.4851287Z mid=0.0840 2026-02-21T08:17:01.4851417Z max=12.4268 2026-02-21T08:17:01.4851557Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:17:01.4851782Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:17:01.4851983Z 'l2_groupings': [64], 2026-02-21T08:17:01.4852148Z 'load_eviction_policies': ['', ''], 2026-02-21T08:17:01.4852334Z 'loop_orders': [[1, 0]], 2026-02-21T08:17:01.4852490Z 'num_stages': 7, 2026-02-21T08:17:01.4852633Z 'num_warps': 2, 2026-02-21T08:17:01.4852767Z 'pid_type': 'flat', 2026-02-21T08:17:01.4852926Z 'range_flattens': [None, None], 2026-02-21T08:17:01.4853097Z 'range_multi_buffers': [None, None], 2026-02-21T08:17:01.4853283Z 'range_num_stages': [0, 0], 2026-02-21T08:17:01.4853445Z 'range_unroll_factors': [0, 0], 2026-02-21T08:17:01.4853658Z 'range_warp_specializes': [None, True]} 2026-02-21T08:17:01.4864578Z [49s] Fitting surrogate: 358 points, 358 targets 2026-02-21T08:17:02.7040988Z [50s] Generation 4 starting: 75 neighbors, 5 active search path(s) 2026-02-21T08:17:10.0535320Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78/78 3.5 configs/s 
2026-02-21T08:17:13.6363250Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 78/78 22.1 configs/s 2026-02-21T08:17:15.4636715Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 552.9 2026-02-21T08:17:15.4641311Z configs/s 2026-02-21T08:17:15.5892658Z [63s] Generation 4 complete: 2026-02-21T08:17:15.5897467Z error=21 2026-02-21T08:17:15.5902099Z ok=60 2026-02-21T08:17:15.5906981Z min=0.0389 2026-02-21T08:17:15.5909341Z mid=0.0696 2026-02-21T08:17:15.5909541Z max=12.8123 2026-02-21T08:17:15.5909753Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:17:15.5910092Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:17:15.5910952Z 'l2_groupings': [32], 2026-02-21T08:17:15.5911160Z 'load_eviction_policies': ['', ''], 2026-02-21T08:17:15.5911401Z 'loop_orders': [[1, 0]], 2026-02-21T08:17:15.5911608Z 'num_stages': 7, 2026-02-21T08:17:15.5911785Z 'num_warps': 8, 2026-02-21T08:17:15.5911974Z 'pid_type': 'flat', 2026-02-21T08:17:15.5912173Z 'range_flattens': [None, None], 2026-02-21T08:17:15.5912407Z 'range_multi_buffers': [None, True], 2026-02-21T08:17:15.5912636Z 'range_num_stages': [0, 0], 2026-02-21T08:17:15.5912854Z 'range_unroll_factors': [0, 0], 2026-02-21T08:17:15.5917140Z 'range_warp_specializes': [None, True]} 2026-02-21T08:17:15.5917467Z [63s] Fitting surrogate: 439 points, 439 targets 2026-02-21T08:17:16.6581277Z [64s] Generation 5 starting: 56 neighbors, 4 active search path(s) 2026-02-21T08:17:48.9646471Z [97s] Timeout after 30s compiling Config(block_sizes=[1024, 256, 32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:17:48.9664099Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 0.6 configs/s 
2026-02-21T08:17:51.8592917Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 20.4 configs/s 2026-02-21T08:17:52.8573082Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 997.9 2026-02-21T08:17:52.8573509Z configs/s 2026-02-21T08:17:52.9320778Z [101s] Generation 5 complete: 2026-02-21T08:17:52.9321046Z error=16 2026-02-21T08:17:52.9321218Z timeout=1 2026-02-21T08:17:52.9321358Z ok=44 2026-02-21T08:17:52.9321655Z min=0.0389 2026-02-21T08:17:52.9321798Z mid=0.1105 2026-02-21T08:17:52.9321950Z max=11.0019 2026-02-21T08:17:52.9322151Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:17:52.9322410Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:17:52.9322648Z 'l2_groupings': [32], 2026-02-21T08:17:52.9322827Z 'load_eviction_policies': ['', ''], 2026-02-21T08:17:52.9323028Z 'loop_orders': [[1, 0]], 2026-02-21T08:17:52.9323198Z 'num_stages': 7, 2026-02-21T08:17:52.9323360Z 'num_warps': 8, 2026-02-21T08:17:52.9323511Z 'pid_type': 'flat', 2026-02-21T08:17:52.9323689Z 'range_flattens': [None, None], 2026-02-21T08:17:52.9323883Z 'range_multi_buffers': [None, True], 2026-02-21T08:17:52.9324088Z 'range_num_stages': [0, 0], 2026-02-21T08:17:52.9324271Z 'range_unroll_factors': [0, 0], 2026-02-21T08:17:52.9324466Z 'range_warp_specializes': [None, True]} 2026-02-21T08:17:52.9344210Z [101s] Fitting surrogate: 500 points, 500 targets 2026-02-21T08:17:53.9618216Z [102s] Generation 6 starting: 62 neighbors, 4 active search path(s) 2026-02-21T08:18:04.2967482Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64/64 2.1 configs/s 2026-02-21T08:18:07.4103957Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 64/64 21.3 configs/s 2026-02-21T08:18:08.7648678Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 754.9 2026-02-21T08:18:08.7649329Z configs/s 2026-02-21T08:18:08.8937792Z [116s] Generation 6 complete: 2026-02-21T08:18:08.8938215Z error=17 2026-02-21T08:18:08.8938478Z ok=50 
2026-02-21T08:18:08.8938705Z min=0.0369 2026-02-21T08:18:08.8938926Z mid=0.0901 2026-02-21T08:18:08.8939152Z max=12.7100 2026-02-21T08:18:08.8939399Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:18:08.8939853Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:18:08.8940278Z 'l2_groupings': [8], 2026-02-21T08:18:08.8940598Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:18:08.8940966Z 'loop_orders': [[0, 1]], 2026-02-21T08:18:08.8941259Z 'num_stages': 7, 2026-02-21T08:18:08.8941566Z 'num_warps': 4, 2026-02-21T08:18:08.8941824Z 'pid_type': 'flat', 2026-02-21T08:18:08.8942725Z 'range_flattens': [None, False], 2026-02-21T08:18:08.8943066Z 'range_multi_buffers': [None, False], 2026-02-21T08:18:08.8943423Z 'range_num_stages': [0, 0], 2026-02-21T08:18:08.8943737Z 'range_unroll_factors': [0, 0], 2026-02-21T08:18:08.8944088Z 'range_warp_specializes': [None, False]} 2026-02-21T08:18:08.8982040Z [116s] Fitting surrogate: 567 points, 567 targets 2026-02-21T08:18:10.7442607Z [118s] Generation 7 starting: 55 neighbors, 4 active search path(s) 2026-02-21T08:18:44.6025149Z [152s] Timeout after 30s compiling Config(block_sizes=[512, 256, 16], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=6, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:18:44.6043829Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0.6 configs/s 2026-02-21T08:18:47.2494553Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 56/56 21.6 configs/s 2026-02-21T08:18:48.1093201Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1156.8 2026-02-21T08:18:48.1093981Z configs/s 2026-02-21T08:18:48.1777413Z [156s] Generation 7 complete: 2026-02-21T08:18:48.1777791Z error=20 
2026-02-21T08:18:48.1778026Z timeout=1 2026-02-21T08:18:48.1778230Z ok=38 2026-02-21T08:18:48.1778450Z min=0.0369 2026-02-21T08:18:48.1778660Z mid=0.1556 2026-02-21T08:18:48.1778881Z max=13.0079 2026-02-21T08:18:48.1779113Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:18:48.1779481Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:18:48.1779871Z 'l2_groupings': [8], 2026-02-21T08:18:48.1780156Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:18:48.1780510Z 'loop_orders': [[0, 1]], 2026-02-21T08:18:48.1780765Z 'num_stages': 7, 2026-02-21T08:18:48.1781024Z 'num_warps': 4, 2026-02-21T08:18:48.1781267Z 'pid_type': 'flat', 2026-02-21T08:18:48.1781526Z 'range_flattens': [None, False], 2026-02-21T08:18:48.1781825Z 'range_multi_buffers': [None, False], 2026-02-21T08:18:48.1782132Z 'range_num_stages': [0, 0], 2026-02-21T08:18:48.1782410Z 'range_unroll_factors': [0, 0], 2026-02-21T08:18:48.1782711Z 'range_warp_specializes': [None, False]} 2026-02-21T08:18:48.1802603Z [156s] Fitting surrogate: 626 points, 626 targets 2026-02-21T08:18:49.0784322Z [157s] Generation 8 starting: 42 neighbors, 3 active search path(s) 2026-02-21T08:18:53.7161717Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43/43 6.0 configs/s 2026-02-21T08:18:55.1999536Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 43/43 30.4 configs/s 2026-02-21T08:18:55.8542323Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1515.3 2026-02-21T08:18:55.8543271Z configs/s 2026-02-21T08:18:55.9193065Z [163s] Generation 8 complete: 2026-02-21T08:18:55.9193647Z error=21 2026-02-21T08:18:55.9194011Z ok=24 2026-02-21T08:18:55.9194342Z min=0.0369 2026-02-21T08:18:55.9197132Z mid=0.0553 2026-02-21T08:18:55.9197536Z max=13.8722 2026-02-21T08:18:55.9197920Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:18:55.9198543Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:18:55.9199146Z 'l2_groupings': [8], 
2026-02-21T08:18:55.9199602Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:18:55.9200162Z 'loop_orders': [[0, 1]], 2026-02-21T08:18:55.9200570Z 'num_stages': 3, 2026-02-21T08:18:55.9200960Z 'num_warps': 8, 2026-02-21T08:18:55.9201361Z 'pid_type': 'flat', 2026-02-21T08:18:55.9201824Z 'range_flattens': [None, None], 2026-02-21T08:18:55.9202270Z 'range_multi_buffers': [None, False], 2026-02-21T08:18:55.9202731Z 'range_num_stages': [0, 0], 2026-02-21T08:18:55.9203571Z 'range_unroll_factors': [0, 0], 2026-02-21T08:18:55.9204085Z 'range_warp_specializes': [None, False]} 2026-02-21T08:18:55.9220624Z [164s] Fitting surrogate: 671 points, 671 targets 2026-02-21T08:18:56.5962224Z [164s] Generation 9 starting: 31 neighbors, 2 active search path(s) 2026-02-21T08:19:01.8712169Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/32 3.4 configs/s 2026-02-21T08:19:03.6076273Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 32/32 17.1 configs/s 2026-02-21T08:19:03.9254034Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3014.4 2026-02-21T08:19:03.9254604Z configs/s 2026-02-21T08:19:03.9644537Z [172s] Generation 9 complete: 2026-02-21T08:19:03.9645271Z error=9 2026-02-21T08:19:03.9645451Z ok=24 2026-02-21T08:19:03.9645658Z min=0.0368 2026-02-21T08:19:03.9645844Z mid=0.3338 2026-02-21T08:19:03.9646038Z max=13.7185 2026-02-21T08:19:03.9646293Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:19:03.9646636Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:19:03.9646997Z 'l2_groupings': [8], 2026-02-21T08:19:03.9647254Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:19:03.9647560Z 'loop_orders': [[0, 1]], 2026-02-21T08:19:03.9647789Z 'num_stages': 3, 2026-02-21T08:19:03.9648009Z 'num_warps': 8, 2026-02-21T08:19:03.9648215Z 'pid_type': 'flat', 2026-02-21T08:19:03.9648466Z 'range_flattens': [None, None], 2026-02-21T08:19:03.9648741Z 'range_multi_buffers': [None, False], 2026-02-21T08:19:03.9649022Z 
'range_num_stages': [0, 0], 2026-02-21T08:19:03.9649265Z 'range_unroll_factors': [0, 0], 2026-02-21T08:19:03.9649540Z 'range_warp_specializes': [None, False]} 2026-02-21T08:19:03.9669018Z [172s] Fitting surrogate: 704 points, 704 targets 2026-02-21T08:19:04.3958539Z [172s] Generation 10 starting: 15 neighbors, 1 active search path(s) 2026-02-21T08:19:05.3317159Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 31.2 configs/s 2026-02-21T08:19:06.0294195Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 25.2 configs/s 2026-02-21T08:19:06.1705140Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 6374.8 2026-02-21T08:19:06.1705935Z configs/s 2026-02-21T08:19:06.1974745Z [174s] Generation 10 complete: 2026-02-21T08:19:06.1975092Z error=4 2026-02-21T08:19:06.1975330Z ok=13 2026-02-21T08:19:06.1975527Z min=0.0349 2026-02-21T08:19:06.1975760Z mid=0.1146 2026-02-21T08:19:06.1975971Z max=0.2908 2026-02-21T08:19:06.1976186Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:19:06.1981122Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:19:06.1981455Z 'l2_groupings': [8], 2026-02-21T08:19:06.1981663Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:19:06.1981879Z 'loop_orders': [[0, 1]], 2026-02-21T08:19:06.1982054Z 'num_stages': 3, 2026-02-21T08:19:06.1982204Z 'num_warps': 8, 2026-02-21T08:19:06.1982422Z 'pid_type': 'flat', 2026-02-21T08:19:06.1982970Z 'range_flattens': [None, None], 2026-02-21T08:19:06.1983172Z 'range_multi_buffers': [None, False], 2026-02-21T08:19:06.1983367Z 'range_num_stages': [0, 0], 2026-02-21T08:19:06.1983551Z 'range_unroll_factors': [0, 0], 2026-02-21T08:19:06.1983735Z 'range_warp_specializes': [None, False]} 2026-02-21T08:19:06.1998113Z [174s] Fitting surrogate: 721 points, 721 targets 2026-02-21T08:19:06.6386054Z [174s] Generation 11 starting: 15 neighbors, 1 active search path(s) 2026-02-21T08:19:07.8615317Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 
15/15 19.7 configs/s 2026-02-21T08:19:08.1815793Z 2026-02-21T08:19:08.1817934Z 2026-02-21T08:19:08.1818353Z ================================================================ 2026-02-21T08:19:08.1818635Z Internal Triton PTX codegen error 2026-02-21T08:19:08.1818850Z `ptxas` stderr: 2026-02-21T08:19:08.1819674Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 265 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:19:08.1823848Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:19:08.1825311Z 2026-02-21T08:19:08.1831490Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpub2t9ujm.ptx -o /tmp/tmpub2t9ujm.ptx.o 2026-02-21T08:19:08.1835436Z 2026-02-21T08:19:08.1837182Z 2026-02-21T08:19:08.1837427Z // 2026-02-21T08:19:08.1837640Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:19:08.1837880Z // 2026-02-21T08:19:08.1837962Z 2026-02-21T08:19:08.1838033Z .version 8.7 2026-02-21T08:19:08.1838210Z .target sm_100a 2026-02-21T08:19:08.1838357Z .address_size 64 2026-02-21T08:19:08.1838440Z 2026-02-21T08:19:08.1838579Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:19:08.1838849Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:19:08.1839084Z // @_helion_matmul 2026-02-21T08:19:08.1839295Z .visible .entry _helion_matmul( 2026-02-21T08:19:08.1839518Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:19:08.1839787Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:19:08.1840037Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:19:08.1840302Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:19:08.1841012Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:19:08.1841216Z ) 2026-02-21T08:19:08.1841332Z .reqntid 256 2026-02-21T08:19:08.1841464Z .maxnreg 32 
2026-02-21T08:19:08.1841625Z { 2026-02-21T08:19:08.1841744Z .reg .pred %p<127>; 2026-02-21T08:19:08.1843864Z .reg .b32 %r<1242>; 2026-02-21T08:19:08.1844061Z .reg .b64 %rd<649>; 2026-02-21T08:19:08.1844340Z .loc 1 19 0 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:19:0 2026-02-21T08:19:08.1844641Z $L__func_begin0: 2026-02-21T08:19:08.1845236Z .loc 1 19 0 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:19:0 2026-02-21T08:19:08.1845488Z 2026-02-21T08:19:08.1845543Z // %bb.0: 2026-02-21T08:19:08.1845708Z ld.param.b64 %rd16, [_helion_matmul_param_0]; 2026-02-21T08:19:08.1845897Z $L__tmp0: 2026-02-21T08:19:08.1846139Z .loc 1 19 0 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:19 2026-02-21T08:19:08.1846468Z mov.u32 %r1, %tid.x; 2026-02-21T08:19:08.1846639Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:19:08.1846845Z ld.param.b64 %rd34, [_helion_matmul_param_2]; 2026-02-21T08:19:08.1847102Z mov.b32 %r37, global_smem; 2026-02-21T08:19:08.1847297Z // begin inline asm 2026-02-21T08:19:08.1847598Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r37], 512; 2026-02-21T08:19:08.1847906Z // end inline asm 2026-02-21T08:19:08.1848112Z ld.param.b64 %rd51, [_helion_matmul_param_3]; 2026-02-21T08:19:08.1848349Z bar.sync 0; 2026-02-21T08:19:08.1848534Z ld.shared.b32 %r1234, [global_smem]; 2026-02-21T08:19:08.1849109Z bar.sync 0; 2026-02-21T08:19:08.1849264Z // begin inline asm 2026-02-21T08:19:08.1849505Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:19:08.1849803Z // end inline asm 2026-02-21T08:19:08.1850111Z .loc 1 21 67 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:21:67 2026-02-21T08:19:08.1850449Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:19:08.1850620Z mov.u32 %r54, %ctaid.y; 2026-02-21T08:19:08.1850813Z mov.u32 %r55, %ctaid.z; 2026-02-21T08:19:08.1850973Z mov.u32 %r56, %nctaid.x; 2026-02-21T08:19:08.1851141Z mov.u32 %r57, %nctaid.y; 2026-02-21T08:19:08.1851307Z mad.lo.s32 %r58, %r55, %r57, 
%r54; 2026-02-21T08:19:08.1851498Z mad.lo.s32 %r59, %r58, %r56, %r3; 2026-02-21T08:19:08.1851681Z shl.b32 %r60, %r59, 8; 2026-02-21T08:19:08.1851828Z cvt.s64.s32 %rd52, %r60; 2026-02-21T08:19:08.1851990Z add.s64 %rd30, %rd51, %rd52; 2026-02-21T08:19:08.1852145Z shl.b32 %r61, %r1, 2; 2026-02-21T08:19:08.1852372Z add.s32 %r38, %r37, %r61; 2026-02-21T08:19:08.1852524Z mov.b32 %r47, 0; 2026-02-21T08:19:08.1852664Z // begin inline asm 2026-02-21T08:19:08.1852833Z @%p1 st.shared.b32 [ %r38 + 0 ], %r47; 2026-02-21T08:19:08.1853009Z // end inline asm 2026-02-21T08:19:08.1853148Z bar.warp.sync -1; 2026-02-21T08:19:08.1853302Z setp.eq.b32 %p115, %r1, 0; 2026-02-21T08:19:08.1853468Z cvt.u64.u32 %rd15, %r37; 2026-02-21T08:19:08.1853614Z // begin inline asm 2026-02-21T08:19:08.1853878Z @%p115 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd15 + 0 ], %rd16; 2026-02-21T08:19:08.1854163Z // end inline asm 2026-02-21T08:19:08.1854310Z // begin inline asm 2026-02-21T08:19:08.1854533Z @%p115 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1; 2026-02-21T08:19:08.1854837Z // end inline asm 2026-02-21T08:19:08.1854963Z mov.b32 %r40, 32; 2026-02-21T08:19:08.1855102Z // begin inline asm 2026-02-21T08:19:08.1855339Z @%p115 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r40; 2026-02-21T08:19:08.1855607Z // end inline asm 2026-02-21T08:19:08.1855746Z mov.b32 %r41, 256; 2026-02-21T08:19:08.1855876Z // begin inline asm 2026-02-21T08:19:08.1856109Z @%p115 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r41; 2026-02-21T08:19:08.1856365Z // end inline asm 2026-02-21T08:19:08.1856502Z mov.b32 %r42, 2048; 2026-02-21T08:19:08.1856637Z // begin inline asm 2026-02-21T08:19:08.1857215Z [176s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:19:08.1858460Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=4, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:19:08.1859585Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:19:08.1859812Z `ptxas` stderr: 2026-02-21T08:19:08.1860224Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 265 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:19:08.1860683Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:19:08.1860825Z 2026-02-21T08:19:08.1861242Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpub2t9ujm.ptx -o /tmp/tmpub2t9ujm.ptx.o 2026-02-21T08:19:08.1861677Z 2026-02-21T08:19:08.1861800Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:19:08.1862142Z @%p115 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r42; 2026-02-21T08:19:08.1862418Z // end inline asm 2026-02-21T08:19:08.1862568Z mov.b32 %r43, 4096; 2026-02-21T08:19:08.1862788Z // begin inline asm 2026-02-21T08:19:08.1863036Z @%p115 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r43; 2026-02-21T08:19:08.1863311Z // end inline asm 2026-02-21T08:19:08.1863447Z mov.b64 %rd23, 4096; 2026-02-21T08:19:08.1863599Z // begin inline asm 2026-02-21T08:19:08.1863848Z @%p115 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd15 + 0 ], 0x0, %rd23; 2026-02-21T08:19:08.1864147Z // end inline asm 2026-02-21T08:19:08.1864278Z mov.b32 %r44, 1; 2026-02-21T08:19:08.1864419Z // begin inline asm 2026-02-21T08:19:08.1864711Z @%p115 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r44; 2026-02-21T08:19:08.1864998Z // end inline asm 2026-02-21T08:19:08.1865132Z // begin inline asm 2026-02-21T08:19:08.1865376Z @%p115 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r44; 2026-02-21T08:19:08.1865669Z // end inline asm 2026-02-21T08:19:08.1865849Z // begin inline asm 2026-02-21T08:19:08.1866088Z @%p115 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x6; 2026-02-21T08:19:08.1866345Z // end inline asm 2026-02-21T08:19:08.1866483Z // begin inline asm 2026-02-21T08:19:08.1866730Z @%p115 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T08:19:08.1867011Z // end inline asm 2026-02-21T08:19:08.1867144Z // begin inline asm 2026-02-21T08:19:08.1867370Z @%p115 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x2; 2026-02-21T08:19:08.1867641Z // end inline asm 2026-02-21T08:19:08.1867766Z // begin inline asm 2026-02-21T08:19:08.1867997Z @%p115 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T08:19:08.1868249Z 
// end inline asm 2026-02-21T08:19:08.1868381Z // begin inline asm 2026-02-21T08:19:08.1868727Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd30 + 0 ], [ %rd15 + 0 ], 0x80; 2026-02-21T08:19:08.1869108Z // end inline asm 2026-02-21T08:19:08.1869240Z // begin inline asm 2026-02-21T08:19:08.1869439Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd30 + 0 ], 0x80; 2026-02-21T08:19:08.1869691Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:19:08.1869874Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:19:08.1870052Z // end inline asm 2026-02-21T08:19:08.1870183Z bar.sync 0; 2026-02-21T08:19:08.1870320Z cvta.global.u64 %rd102, %rd30; 2026-02-21T08:19:08.1870599Z .loc 1 23 71 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:23:71 2026-02-21T08:19:08.1870880Z add.s64 %rd48, %rd30, 128; 2026-02-21T08:19:08.1871036Z bar.sync 0; 2026-02-21T08:19:08.1871161Z // begin inline asm 2026-02-21T08:19:08.1871312Z @%p1 st.shared.b32 [ %r38 + 0 ], %r47; 2026-02-21T08:19:08.1871478Z // end inline asm 2026-02-21T08:19:08.1871620Z bar.warp.sync -1; 2026-02-21T08:19:08.1871766Z // begin inline asm 2026-02-21T08:19:08.1872015Z @%p115 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd15 + 0 ], %rd34; 2026-02-21T08:19:08.1872303Z // end inline asm 2026-02-21T08:19:08.1872429Z // begin inline asm 2026-02-21T08:19:08.1872653Z @%p115 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1; 2026-02-21T08:19:08.1872898Z // end inline asm 2026-02-21T08:19:08.1873030Z mov.b32 %r48, 64; 2026-02-21T08:19:08.1873161Z // begin inline asm 2026-02-21T08:19:08.1873396Z @%p115 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r48; 2026-02-21T08:19:08.1873665Z // end inline asm 2026-02-21T08:19:08.1873791Z // begin inline asm 2026-02-21T08:19:08.1874024Z @%p115 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r41; 2026-02-21T08:19:08.1874280Z // end inline asm 
2026-02-21T08:19:08.1874414Z // begin inline asm 2026-02-21T08:19:08.1874649Z @%p115 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r42; 2026-02-21T08:19:08.1874993Z // end inline asm 2026-02-21T08:19:08.1875204Z // begin inline asm 2026-02-21T08:19:08.1875450Z @%p115 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r43; 2026-02-21T08:19:08.1875731Z // end inline asm 2026-02-21T08:19:08.1875864Z // begin inline asm 2026-02-21T08:19:08.1876128Z @%p115 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd15 + 0 ], 0x0, %rd23; 2026-02-21T08:19:08.1876416Z // end inline asm 2026-02-21T08:19:08.1876559Z // begin inline asm 2026-02-21T08:19:08.1876817Z @%p115 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r44; 2026-02-21T08:19:08.1877122Z // end inline asm 2026-02-21T08:19:08.1877263Z // begin inline asm 2026-02-21T08:19:08.1877522Z @%p115 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r44; 2026-02-21T08:19:08.1877815Z // end inline asm 2026-02-21T08:19:08.1877950Z // begin inline asm 2026-02-21T08:19:08.1878252Z @%p115 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x6; 2026-02-21T08:19:08.1878531Z // end inline asm 2026-02-21T08:19:08.1878669Z // begin inline asm 2026-02-21T08:19:08.1878930Z @%p115 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T08:19:08.1879214Z // end inline asm 2026-02-21T08:19:08.1879352Z // begin inline asm 2026-02-21T08:19:08.1879587Z @%p115 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x3; 2026-02-21T08:19:08.1879868Z // end inline asm 2026-02-21T08:19:08.1880000Z // begin inline asm 2026-02-21T08:19:08.1880238Z @%p115 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T08:19:08.1880505Z // end inline asm 2026-02-21T08:19:08.1880639Z // begin inline asm 2026-02-21T08:19:08.1880999Z 
@%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd48 + 0 ], [ %rd15 + 0 ], 0x80; 2026-02-21T08:19:08.1881379Z // end inline asm 2026-02-21T08:19:08.1881523Z // begin inline asm 2026-02-21T08:19:08.1881732Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd48 + 0 ], 0x80; 2026-02-21T08:19:08.1881993Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:19:08.1882180Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:19:08.1882350Z // end inline asm 2026-02-21T08:19:08.1882485Z bar.sync 0; 2026-02-21T08:19:08.1882621Z cvta.global.u64 %rd134, %rd48; 2026-02-21T08:19:08.1882899Z .loc 1 29 75 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:29:75 2026-02-21T08:19:08.1883188Z setp.gt.u32 %p39, %r3, 127; 2026-02-21T08:19:08.1883353Z @%p39 bra $L__BB0_8; 2026-02-21T08:19:08.1883511Z // %bb.1: // %.lr.ph 2026-02-21T08:19:08.1883802Z .loc 1 0 75 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:0:75 2026-02-21T08:19:08.1884103Z ld.param.b64 %rd14, [_helion_matmul_param_1]; 2026-02-21T08:19:08.1884394Z .loc 1 48 48 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:48:48 2026-02-21T08:19:08.1884710Z shl.b32 %r409, %r1, 3; 2026-02-21T08:19:08.1884862Z and.b32 %r410, %r409, 24; 2026-02-21T08:19:08.1885120Z .loc 1 42 45 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:42:45 2026-02-21T08:19:08.1885396Z shr.u32 %r411, %r1, 2; 2026-02-21T08:19:08.1885549Z bfe.u32 %r4, %r1, 2, 6; 2026-02-21T08:19:08.1885695Z shr.u32 %r412, %r1, 5; 2026-02-21T08:19:08.1885844Z and.b32 %r413, %r1, 255; 2026-02-21T08:19:08.1885995Z shl.b32 %r414, %r413, 4; 2026-02-21T08:19:08.1886140Z shl.b32 %r415, %r1, 1; 2026-02-21T08:19:08.1886291Z and.b32 %r416, %r415, 48; 2026-02-21T08:19:08.1886440Z xor.b32 %r5, %r414, %r416; 2026-02-21T08:19:08.1886597Z add.s32 %r418, %r37, %r5; 2026-02-21T08:19:08.1886748Z add.s32 %r347, %r418, 229376; 2026-02-21T08:19:08.1886914Z add.s32 %r349, %r418, 233472; 2026-02-21T08:19:08.1887066Z 
add.s32 %r351, %r418, 237568; 2026-02-21T08:19:08.1887229Z add.s32 %r353, %r418, 241664; 2026-02-21T08:19:08.1887385Z add.s32 %r360, %r418, 245760; 2026-02-21T08:19:08.1887593Z add.s32 %r362, %r418, 249856; 2026-02-21T08:19:08.1887746Z add.s32 %r364, %r418, 253952; 2026-02-21T08:19:08.1887889Z add.s32 %r366, %r418, 258048; 2026-02-21T08:19:08.1888038Z add.s32 %r373, %r418, 262144; 2026-02-21T08:19:08.1888181Z add.s32 %r375, %r418, 266240; 2026-02-21T08:19:08.1888333Z add.s32 %r377, %r418, 270336; 2026-02-21T08:19:08.1888477Z add.s32 %r379, %r418, 274432; 2026-02-21T08:19:08.1888628Z add.s32 %r386, %r418, 278528; 2026-02-21T08:19:08.1888770Z add.s32 %r388, %r418, 282624; 2026-02-21T08:19:08.1888921Z add.s32 %r390, %r418, 286720; 2026-02-21T08:19:08.1889071Z add.s32 %r392, %r418, 290816; 2026-02-21T08:19:08.1889217Z add.s32 %r399, %r418, 294912; 2026-02-21T08:19:08.1889372Z add.s32 %r401, %r418, 299008; 2026-02-21T08:19:08.1889517Z add.s32 %r403, %r418, 303104; 2026-02-21T08:19:08.1889670Z add.s32 %r405, %r418, 307200; 2026-02-21T08:19:08.1889898Z or.b32 %r6, %r410, 160; 2026-02-21T08:19:08.1890055Z add.s32 %r483, %r418, 311296; 2026-02-21T08:19:08.1890200Z add.s32 %r485, %r418, 315392; 2026-02-21T08:19:08.1890350Z add.s32 %r487, %r418, 319488; 2026-02-21T08:19:08.1890497Z add.s32 %r489, %r418, 323584; 2026-02-21T08:19:08.1890650Z shl.b32 %r419, %r413, 7; 2026-02-21T08:19:08.1890800Z shl.b32 %r420, %r1, 4; 2026-02-21T08:19:08.1890945Z and.b32 %r421, %r420, 112; 2026-02-21T08:19:08.1891106Z or.b32 %r422, %r419, %r421; 2026-02-21T08:19:08.1891259Z xor.b32 %r423, %r422, 16; 2026-02-21T08:19:08.1891413Z xor.b32 %r424, %r422, 32; 2026-02-21T08:19:08.1891558Z xor.b32 %r425, %r422, 48; 2026-02-21T08:19:08.1891714Z xor.b32 %r426, %r422, 64; 2026-02-21T08:19:08.1891859Z xor.b32 %r427, %r422, 80; 2026-02-21T08:19:08.1892008Z xor.b32 %r428, %r422, 96; 2026-02-21T08:19:08.1892155Z xor.b32 %r429, %r422, 112; 2026-02-21T08:19:08.1892409Z .loc 1 29 75 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:29:75 2026-02-21T08:19:08.1892711Z cvt.u64.u32 %rd78, %r410; 2026-02-21T08:19:08.1892960Z .loc 1 36 33 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:36:33 2026-02-21T08:19:08.1893238Z shr.u32 %r430, %r3, 3; 2026-02-21T08:19:08.1893378Z and.b32 %r431, %r430, 14; 2026-02-21T08:19:08.1893625Z .loc 1 38 64 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:38:64 2026-02-21T08:19:08.1893893Z and.b32 %r432, %r3, 1; 2026-02-21T08:19:08.1894141Z .loc 1 38 30 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:38:30 2026-02-21T08:19:08.1894427Z or.b32 %r433, %r431, %r432; 2026-02-21T08:19:08.1894716Z .loc 1 40 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:40:27 2026-02-21T08:19:08.1894991Z shl.b32 %r844, %r433, 8; 2026-02-21T08:19:08.1895231Z .loc 1 41 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:41:27 2026-02-21T08:19:08.1895507Z shl.b32 %r434, %r3, 7; 2026-02-21T08:19:08.1895651Z and.b32 %r20, %r434, 1792; 2026-02-21T08:19:08.1895906Z .loc 1 42 32 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:42:32 2026-02-21T08:19:08.1896194Z or.b32 %r435, %r20, %r4; 2026-02-21T08:19:08.1896340Z or.b32 %r436, %r411, %r20; 2026-02-21T08:19:08.1896595Z .loc 1 53 80 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:80 2026-02-21T08:19:08.1896864Z shl.b32 %r437, %r435, 11; 2026-02-21T08:19:08.1897013Z shl.b32 %r438, %r436, 11; 2026-02-21T08:19:08.1897157Z or.b32 %r439, %r438, 393216; 2026-02-21T08:19:08.1897414Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.1897707Z shfl.sync.idx.b32 %r21, %r412, 0, 31, -1; 2026-02-21T08:19:08.1897883Z and.b32 %r22, %r21, 3; 2026-02-21T08:19:08.1898032Z shl.b32 %r440, %r22, 21; 2026-02-21T08:19:08.1898175Z add.s32 %r441, %r440, %r1234; 2026-02-21T08:19:08.1898332Z shl.b32 %r442, %r21, 6; 2026-02-21T08:19:08.1898541Z and.b32 %r443, %r442, 256; 
2026-02-21T08:19:08.1898697Z add.s32 %r842, %r441, %r443; 2026-02-21T08:19:08.1898849Z mov.pred %p83, -1; 2026-02-21T08:19:08.1898993Z // begin inline asm 2026-02-21T08:19:08.1899335Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 0], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1899688Z // end inline asm 2026-02-21T08:19:08.1899825Z // begin inline asm 2026-02-21T08:19:08.1900146Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 16], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1900505Z // end inline asm 2026-02-21T08:19:08.1900636Z // begin inline asm 2026-02-21T08:19:08.1900959Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 32], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1901375Z // end inline asm 2026-02-21T08:19:08.1901513Z // begin inline asm 2026-02-21T08:19:08.1901838Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 48], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1902198Z // end inline asm 2026-02-21T08:19:08.1902337Z // begin inline asm 2026-02-21T08:19:08.1902658Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 64], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1903017Z // end inline asm 2026-02-21T08:19:08.1903156Z // begin inline asm 2026-02-21T08:19:08.1903487Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 80], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1903843Z // end inline asm 2026-02-21T08:19:08.1903972Z // begin inline asm 2026-02-21T08:19:08.1904290Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 96], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, 
%r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1904648Z // end inline asm 2026-02-21T08:19:08.1904820Z // begin inline asm 2026-02-21T08:19:08.1905142Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 112], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1905492Z // end inline asm 2026-02-21T08:19:08.1905629Z // begin inline asm 2026-02-21T08:19:08.1905943Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 128], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1906308Z // end inline asm 2026-02-21T08:19:08.1906435Z // begin inline asm 2026-02-21T08:19:08.1906762Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 144], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1907127Z // end inline asm 2026-02-21T08:19:08.1907258Z // begin inline asm 2026-02-21T08:19:08.1907581Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 160], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1907931Z // end inline asm 2026-02-21T08:19:08.1908068Z // begin inline asm 2026-02-21T08:19:08.1908382Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 176], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1908732Z // end inline asm 2026-02-21T08:19:08.1908867Z // begin inline asm 2026-02-21T08:19:08.1909176Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 192], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1909542Z // end inline asm 2026-02-21T08:19:08.1909670Z // begin inline asm 2026-02-21T08:19:08.1909989Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 208], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, 
%r47, %r47, %r47}; 2026-02-21T08:19:08.1910405Z // end inline asm 2026-02-21T08:19:08.1910532Z // begin inline asm 2026-02-21T08:19:08.1910847Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 224], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1911191Z // end inline asm 2026-02-21T08:19:08.1911324Z // begin inline asm 2026-02-21T08:19:08.1911630Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r842 + 240], {%r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47, %r47}; 2026-02-21T08:19:08.1911984Z // end inline asm 2026-02-21T08:19:08.1912118Z // begin inline asm 2026-02-21T08:19:08.1912263Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:19:08.1912426Z // end inline asm 2026-02-21T08:19:08.1912553Z bar.sync 0; 2026-02-21T08:19:08.1912797Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1913124Z add.s32 %r1236, %r37, 327728; 2026-02-21T08:19:08.1913286Z // begin inline asm 2026-02-21T08:19:08.1913445Z @%p115 mbarrier.init.shared::cta.b64 [%r1236], 1; 2026-02-21T08:19:08.1913641Z // end inline asm 2026-02-21T08:19:08.1913772Z bar.sync 0; 2026-02-21T08:19:08.1913902Z add.s32 %r335, %r37, 327736; 2026-02-21T08:19:08.1914057Z // begin inline asm 2026-02-21T08:19:08.1914217Z @%p115 mbarrier.init.shared::cta.b64 [%r335], 1; 2026-02-21T08:19:08.1914408Z // end inline asm 2026-02-21T08:19:08.1914538Z add.s32 %r336, %r37, 327680; 2026-02-21T08:19:08.1914728Z // begin inline asm 2026-02-21T08:19:08.1914892Z @%p115 mbarrier.init.shared::cta.b64 [%r336], 1; 2026-02-21T08:19:08.1915086Z // end inline asm 2026-02-21T08:19:08.1915215Z bar.sync 0; 2026-02-21T08:19:08.1915351Z add.s32 %r337, %r37, 327688; 2026-02-21T08:19:08.1915507Z // begin inline asm 2026-02-21T08:19:08.1915661Z @%p115 mbarrier.init.shared::cta.b64 [%r337], 1; 2026-02-21T08:19:08.1915846Z // end inline asm 2026-02-21T08:19:08.1915972Z bar.sync 0; 
2026-02-21T08:19:08.1916108Z add.s32 %r338, %r37, 327696; 2026-02-21T08:19:08.1916255Z // begin inline asm 2026-02-21T08:19:08.1916419Z @%p115 mbarrier.init.shared::cta.b64 [%r338], 1; 2026-02-21T08:19:08.1916597Z // end inline asm 2026-02-21T08:19:08.1916731Z bar.sync 0; 2026-02-21T08:19:08.1916857Z add.s32 %r339, %r37, 327704; 2026-02-21T08:19:08.1917008Z // begin inline asm 2026-02-21T08:19:08.1917170Z @%p115 mbarrier.init.shared::cta.b64 [%r339], 1; 2026-02-21T08:19:08.1917347Z // end inline asm 2026-02-21T08:19:08.1917479Z bar.sync 0; 2026-02-21T08:19:08.1917602Z add.s32 %r340, %r37, 327712; 2026-02-21T08:19:08.1917758Z // begin inline asm 2026-02-21T08:19:08.1917915Z @%p115 mbarrier.init.shared::cta.b64 [%r340], 1; 2026-02-21T08:19:08.1918099Z // end inline asm 2026-02-21T08:19:08.1918222Z bar.sync 0; 2026-02-21T08:19:08.1918354Z add.s32 %r478, %r37, 327720; 2026-02-21T08:19:08.1918509Z // begin inline asm 2026-02-21T08:19:08.1918669Z @%p115 mbarrier.init.shared::cta.b64 [%r478], 1; 2026-02-21T08:19:08.1918863Z // end inline asm 2026-02-21T08:19:08.1918994Z bar.sync 0; 2026-02-21T08:19:08.1919130Z // begin inline asm 2026-02-21T08:19:08.1919327Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r336], 16384; 2026-02-21T08:19:08.1919561Z // end inline asm 2026-02-21T08:19:08.1919810Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1920109Z bar.sync 0; 2026-02-21T08:19:08.1920251Z elect.sync %r444|%p75, -1; 2026-02-21T08:19:08.1920419Z and.pred %p65, %p1, %p75; 2026-02-21T08:19:08.1920584Z add.s32 %r343, %r37, 131072; 2026-02-21T08:19:08.1920737Z // begin inline asm 2026-02-21T08:19:08.1921078Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r343], [%rd102, {%r47, %r844}], [%r336]; 2026-02-21T08:19:08.1921448Z // end inline asm 2026-02-21T08:19:08.1921708Z .loc 1 53 59 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:59 2026-02-21T08:19:08.1922059Z or.b32 %r445, 
%r437, %r410; 2026-02-21T08:19:08.1922228Z or.b32 %r446, %r439, %r410; 2026-02-21T08:19:08.1922499Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1922799Z mad.wide.u32 %rd54, %r445, 2, %rd14; 2026-02-21T08:19:08.1922986Z cvt.u64.u32 %rd3, %r437; 2026-02-21T08:19:08.1923143Z or.b64 %rd79, %rd3, %rd78; 2026-02-21T08:19:08.1923309Z shl.b64 %rd80, %rd79, 1; 2026-02-21T08:19:08.1923465Z add.s64 %rd4, %rd14, %rd80; 2026-02-21T08:19:08.1923633Z add.s64 %rd55, %rd4, 262144; 2026-02-21T08:19:08.1923794Z add.s64 %rd56, %rd4, 524288; 2026-02-21T08:19:08.1923970Z mad.wide.u32 %rd57, %r446, 2, %rd14; 2026-02-21T08:19:08.1924154Z mov.b32 %r484, 16; 2026-02-21T08:19:08.1924410Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1924735Z // begin inline asm 2026-02-21T08:19:08.1925004Z cp.async.cg.shared.global [ %r347 + 0 ], [ %rd54 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1925247Z // end inline asm 2026-02-21T08:19:08.1925388Z // begin inline asm 2026-02-21T08:19:08.1925599Z cp.async.cg.shared.global [ %r349 + 0 ], [ %rd55 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1925839Z // end inline asm 2026-02-21T08:19:08.1925978Z // begin inline asm 2026-02-21T08:19:08.1926185Z cp.async.cg.shared.global [ %r351 + 0 ], [ %rd56 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1926415Z // end inline asm 2026-02-21T08:19:08.1926563Z // begin inline asm 2026-02-21T08:19:08.1926760Z cp.async.cg.shared.global [ %r353 + 0 ], [ %rd57 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1926999Z // end inline asm 2026-02-21T08:19:08.1927155Z cp.async.commit_group; 2026-02-21T08:19:08.1927416Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1927697Z bar.sync 0; 2026-02-21T08:19:08.1927827Z // begin inline asm 2026-02-21T08:19:08.1928027Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r337], 16384; 2026-02-21T08:19:08.1928253Z // end inline asm 2026-02-21T08:19:08.1928504Z .loc 
1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1928776Z bar.sync 0; 2026-02-21T08:19:08.1928921Z elect.sync %r447|%p76, -1; 2026-02-21T08:19:08.1929087Z and.pred %p67, %p1, %p76; 2026-02-21T08:19:08.1929253Z add.s32 %r356, %r37, 147456; 2026-02-21T08:19:08.1929405Z // begin inline asm 2026-02-21T08:19:08.1929739Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r356], [%rd102, {%r40, %r844}], [%r337]; 2026-02-21T08:19:08.1930091Z // end inline asm 2026-02-21T08:19:08.1930336Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1930629Z add.s64 %rd59, %rd4, 64; 2026-02-21T08:19:08.1930782Z or.b32 %r448, %r445, 32; 2026-02-21T08:19:08.1930950Z mad.wide.u32 %rd81, %r448, 2, %rd14; 2026-02-21T08:19:08.1931127Z add.s64 %rd60, %rd81, 262144; 2026-02-21T08:19:08.1931296Z add.s64 %rd61, %rd81, 524288; 2026-02-21T08:19:08.1931457Z cvt.u64.u32 %rd5, %r439; 2026-02-21T08:19:08.1931609Z or.b64 %rd82, %rd5, %rd78; 2026-02-21T08:19:08.1931769Z shl.b64 %rd83, %rd82, 1; 2026-02-21T08:19:08.1931919Z add.s64 %rd6, %rd14, %rd83; 2026-02-21T08:19:08.1932081Z add.s64 %rd62, %rd6, 64; 2026-02-21T08:19:08.1932339Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1932621Z // begin inline asm 2026-02-21T08:19:08.1932816Z cp.async.cg.shared.global [ %r360 + 0 ], [ %rd59 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1933044Z // end inline asm 2026-02-21T08:19:08.1933185Z // begin inline asm 2026-02-21T08:19:08.1933379Z cp.async.cg.shared.global [ %r362 + 0 ], [ %rd60 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1933602Z // end inline asm 2026-02-21T08:19:08.1933736Z // begin inline asm 2026-02-21T08:19:08.1933936Z cp.async.cg.shared.global [ %r364 + 0 ], [ %rd61 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1934233Z // end inline asm 2026-02-21T08:19:08.1934374Z // begin inline asm 2026-02-21T08:19:08.1934560Z cp.async.cg.shared.global 
[ %r366 + 0 ], [ %rd62 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1934816Z // end inline asm 2026-02-21T08:19:08.1934959Z cp.async.commit_group; 2026-02-21T08:19:08.1935215Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1935488Z bar.sync 0; 2026-02-21T08:19:08.1935615Z // begin inline asm 2026-02-21T08:19:08.1935817Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r338], 16384; 2026-02-21T08:19:08.1936038Z // end inline asm 2026-02-21T08:19:08.1936288Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1936555Z bar.sync 0; 2026-02-21T08:19:08.1936694Z elect.sync %r449|%p77, -1; 2026-02-21T08:19:08.1936857Z and.pred %p69, %p1, %p77; 2026-02-21T08:19:08.1937069Z add.s32 %r369, %r37, 163840; 2026-02-21T08:19:08.1937233Z // begin inline asm 2026-02-21T08:19:08.1937549Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r369], [%rd102, {%r48, %r844}], [%r338]; 2026-02-21T08:19:08.1937892Z // end inline asm 2026-02-21T08:19:08.1938124Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1938407Z add.s64 %rd64, %rd4, 128; 2026-02-21T08:19:08.1938561Z or.b32 %r450, %r445, 64; 2026-02-21T08:19:08.1938715Z mad.wide.u32 %rd84, %r450, 2, %rd14; 2026-02-21T08:19:08.1938890Z add.s64 %rd65, %rd84, 262144; 2026-02-21T08:19:08.1939043Z add.s64 %rd66, %rd84, 524288; 2026-02-21T08:19:08.1939200Z add.s64 %rd67, %rd6, 128; 2026-02-21T08:19:08.1939449Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1939727Z // begin inline asm 2026-02-21T08:19:08.1939918Z cp.async.cg.shared.global [ %r373 + 0 ], [ %rd64 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1940140Z // end inline asm 2026-02-21T08:19:08.1940275Z // begin inline asm 2026-02-21T08:19:08.1940459Z cp.async.cg.shared.global [ %r375 + 0 ], [ %rd65 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1940679Z // end inline asm 
2026-02-21T08:19:08.1940806Z // begin inline asm 2026-02-21T08:19:08.1940996Z cp.async.cg.shared.global [ %r377 + 0 ], [ %rd66 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1941211Z // end inline asm 2026-02-21T08:19:08.1941343Z // begin inline asm 2026-02-21T08:19:08.1941524Z cp.async.cg.shared.global [ %r379 + 0 ], [ %rd67 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1941742Z // end inline asm 2026-02-21T08:19:08.1941882Z cp.async.commit_group; 2026-02-21T08:19:08.1942133Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1942407Z bar.sync 0; 2026-02-21T08:19:08.1942527Z // begin inline asm 2026-02-21T08:19:08.1942721Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r339], 16384; 2026-02-21T08:19:08.1942935Z // end inline asm 2026-02-21T08:19:08.1943170Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1943436Z bar.sync 0; 2026-02-21T08:19:08.1943576Z elect.sync %r451|%p78, -1; 2026-02-21T08:19:08.1943740Z and.pred %p71, %p1, %p78; 2026-02-21T08:19:08.1943891Z add.s32 %r382, %r37, 180224; 2026-02-21T08:19:08.1944047Z mov.b32 %r383, 96; 2026-02-21T08:19:08.1944181Z // begin inline asm 2026-02-21T08:19:08.1944507Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r382], [%rd102, {%r383, %r844}], [%r339]; 2026-02-21T08:19:08.1944880Z // end inline asm 2026-02-21T08:19:08.1955605Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1955988Z add.s64 %rd69, %rd4, 192; 2026-02-21T08:19:08.1956168Z or.b32 %r452, %r445, 96; 2026-02-21T08:19:08.1956348Z mad.wide.u32 %rd85, %r452, 2, %rd14; 2026-02-21T08:19:08.1961590Z add.s64 %rd70, %rd85, 262144; 2026-02-21T08:19:08.1961765Z add.s64 %rd71, %rd85, 524288; 2026-02-21T08:19:08.1961938Z add.s64 %rd72, %rd6, 192; 2026-02-21T08:19:08.1962226Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1962541Z // begin inline 
asm 2026-02-21T08:19:08.1962767Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd69 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1962992Z // end inline asm 2026-02-21T08:19:08.1963140Z // begin inline asm 2026-02-21T08:19:08.1963333Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd70 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1963562Z // end inline asm 2026-02-21T08:19:08.1963695Z // begin inline asm 2026-02-21T08:19:08.1963893Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd71 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1964109Z // end inline asm 2026-02-21T08:19:08.1964251Z // begin inline asm 2026-02-21T08:19:08.1964542Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd72 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1964841Z // end inline asm 2026-02-21T08:19:08.1964996Z cp.async.commit_group; 2026-02-21T08:19:08.1965260Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1965557Z bar.sync 0; 2026-02-21T08:19:08.1965689Z // begin inline asm 2026-02-21T08:19:08.1965905Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r340], 16384; 2026-02-21T08:19:08.1966132Z // end inline asm 2026-02-21T08:19:08.1966385Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1966682Z bar.sync 0; 2026-02-21T08:19:08.1966830Z elect.sync %r453|%p79, -1; 2026-02-21T08:19:08.1967013Z and.pred %p73, %p1, %p79; 2026-02-21T08:19:08.1967178Z add.s32 %r395, %r37, 196608; 2026-02-21T08:19:08.1967350Z mov.b32 %r396, 128; 2026-02-21T08:19:08.1967497Z // begin inline asm 2026-02-21T08:19:08.1967872Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r395], [%rd102, {%r396, %r844}], [%r340]; 2026-02-21T08:19:08.1968249Z // end inline asm 2026-02-21T08:19:08.1968499Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1968796Z add.s64 %rd74, %rd4, 256; 2026-02-21T08:19:08.1968953Z or.b32 %r454, %r445, 128; 2026-02-21T08:19:08.1969126Z mad.wide.u32 
%rd86, %r454, 2, %rd14; 2026-02-21T08:19:08.1969306Z add.s64 %rd75, %rd86, 262144; 2026-02-21T08:19:08.1969482Z add.s64 %rd76, %rd86, 524288; 2026-02-21T08:19:08.1969643Z add.s64 %rd77, %rd6, 256; 2026-02-21T08:19:08.1969916Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1970209Z // begin inline asm 2026-02-21T08:19:08.1970410Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd74 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1970646Z // end inline asm 2026-02-21T08:19:08.1970784Z // begin inline asm 2026-02-21T08:19:08.1970990Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd75 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1971215Z // end inline asm 2026-02-21T08:19:08.1971364Z // begin inline asm 2026-02-21T08:19:08.1971556Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd76 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1971788Z // end inline asm 2026-02-21T08:19:08.1971935Z // begin inline asm 2026-02-21T08:19:08.1972127Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd77 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1972357Z // end inline asm 2026-02-21T08:19:08.1972501Z cp.async.commit_group; 2026-02-21T08:19:08.1972773Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1973065Z bar.sync 0; 2026-02-21T08:19:08.1973207Z // begin inline asm 2026-02-21T08:19:08.1973347Z 2026-02-21T08:19:08.1973480Z { 2026-02-21T08:19:08.1973609Z .reg .pred complete; 2026-02-21T08:19:08.1973768Z waitLoop: 2026-02-21T08:19:08.1973976Z mbarrier.try_wait.parity.shared.b64 complete, [%r336], %r47; 2026-02-21T08:19:08.1974294Z @!complete bra.uni waitLoop; 2026-02-21T08:19:08.1974459Z } 2026-02-21T08:19:08.1974527Z 2026-02-21T08:19:08.1974583Z // end inline asm 2026-02-21T08:19:08.1974887Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1975169Z cp.async.wait_group 4; 2026-02-21T08:19:08.1975326Z bar.sync 0; 2026-02-21T08:19:08.1975568Z .loc 1 54 52 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.1975854Z setp.ne.b32 %p80, %r21, 0; 2026-02-21T08:19:08.1976017Z @%p80 bra $L__BB0_3; 2026-02-21T08:19:08.1976157Z // %bb.2: 2026-02-21T08:19:08.1976397Z .loc 1 0 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:0:52 2026-02-21T08:19:08.1976686Z add.s32 %r465, %r37, 139296; 2026-02-21T08:19:08.1976856Z bfe.u32 %r466, %r465, 4, 14; 2026-02-21T08:19:08.1977085Z cvt.u64.u32 %rd96, %r466; 2026-02-21T08:19:08.1977263Z or.b64 %rd93, %rd96, -9223371899348713472; 2026-02-21T08:19:08.1977453Z add.s32 %r467, %r37, 139264; 2026-02-21T08:19:08.1977606Z bfe.u32 %r468, %r467, 4, 14; 2026-02-21T08:19:08.1977768Z cvt.u64.u32 %rd97, %r468; 2026-02-21T08:19:08.1977928Z or.b64 %rd91, %rd97, -9223371899348713472; 2026-02-21T08:19:08.1978111Z add.s32 %r469, %r37, 229376; 2026-02-21T08:19:08.1978263Z add.s32 %r470, %r37, 229408; 2026-02-21T08:19:08.1978420Z bfe.u32 %r471, %r470, 4, 14; 2026-02-21T08:19:08.1978571Z cvt.u64.u32 %rd98, %r471; 2026-02-21T08:19:08.1978740Z or.b64 %rd90, %rd98, -9223371899348713472; 2026-02-21T08:19:08.1978914Z add.s32 %r472, %r37, 131104; 2026-02-21T08:19:08.1979076Z bfe.u32 %r473, %r472, 4, 14; 2026-02-21T08:19:08.1979236Z cvt.u64.u32 %rd99, %r473; 2026-02-21T08:19:08.1979398Z or.b64 %rd89, %rd99, -9223371899348713472; 2026-02-21T08:19:08.1979579Z bfe.u32 %r474, %r469, 4, 14; 2026-02-21T08:19:08.1979726Z cvt.u64.u32 %rd100, %r474; 2026-02-21T08:19:08.1979903Z or.b64 %rd88, %rd100, -9223371899348713472; 2026-02-21T08:19:08.1980081Z bfe.u32 %r475, %r343, 4, 14; 2026-02-21T08:19:08.1980243Z cvt.u64.u32 %rd101, %r475; 2026-02-21T08:19:08.1980404Z or.b64 %rd87, %rd101, -9223371899348713472; 2026-02-21T08:19:08.1980696Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.1980996Z elect.sync %r476|%p82, -1; 2026-02-21T08:19:08.1981154Z mov.b32 %r456, 138412048; 2026-02-21T08:19:08.1981312Z mov.pred %p81, 0; 
2026-02-21T08:19:08.1981452Z // begin inline asm 2026-02-21T08:19:08.1981691Z @%p82 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 0 ], %rd87, %rd88, %r456, %p81; 2026-02-21T08:19:08.1981946Z // end inline asm 2026-02-21T08:19:08.1982088Z // begin inline asm 2026-02-21T08:19:08.1982299Z @%p82 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 0 ], %rd89, %rd90, %r456, %p83; 2026-02-21T08:19:08.1982551Z // end inline asm 2026-02-21T08:19:08.1982692Z // begin inline asm 2026-02-21T08:19:08.1982907Z @%p82 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 256 ], %rd91, %rd88, %r456, %p81; 2026-02-21T08:19:08.1983165Z // end inline asm 2026-02-21T08:19:08.1983299Z // begin inline asm 2026-02-21T08:19:08.1983518Z @%p82 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 256 ], %rd93, %rd90, %r456, %p83; 2026-02-21T08:19:08.1983759Z // end inline asm 2026-02-21T08:19:08.1983905Z add.s32 %r477, %r37, 327728; 2026-02-21T08:19:08.1984068Z cvt.u64.u32 %rd95, %r477; 2026-02-21T08:19:08.1984214Z // begin inline asm 2026-02-21T08:19:08.1984424Z @%p82 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd95]; 2026-02-21T08:19:08.1984649Z // end inline asm 2026-02-21T08:19:08.1984830Z $L__BB0_3: 2026-02-21T08:19:08.1985065Z .loc 1 0 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:0:52 2026-02-21T08:19:08.1985349Z add.s32 %r11, %r37, %r422; 2026-02-21T08:19:08.1985499Z add.s32 %r12, %r37, %r423; 2026-02-21T08:19:08.1985657Z add.s32 %r13, %r37, %r424; 2026-02-21T08:19:08.1985881Z add.s32 %r14, %r37, %r425; 2026-02-21T08:19:08.1986027Z add.s32 %r15, %r37, %r426; 2026-02-21T08:19:08.1986180Z add.s32 %r16, %r37, %r427; 2026-02-21T08:19:08.1986323Z add.s32 %r17, %r37, %r428; 2026-02-21T08:19:08.1986479Z add.s32 %r18, %r37, %r429; 2026-02-21T08:19:08.1986730Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1987011Z // begin inline asm 2026-02-21T08:19:08.1987208Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r478], 16384; 
2026-02-21T08:19:08.1987435Z // end inline asm 2026-02-21T08:19:08.1987685Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.1987965Z bar.sync 0; 2026-02-21T08:19:08.1988112Z elect.sync %r496|%p93, -1; 2026-02-21T08:19:08.1988273Z and.pred %p91, %p1, %p93; 2026-02-21T08:19:08.1988437Z add.s32 %r479, %r37, 212992; 2026-02-21T08:19:08.1988636Z mov.b32 %r480, 160; 2026-02-21T08:19:08.1988790Z // begin inline asm 2026-02-21T08:19:08.1989120Z @%p91 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r479], [%rd102, {%r480, %r844}], [%r478]; 2026-02-21T08:19:08.1989493Z // end inline asm 2026-02-21T08:19:08.1989741Z .loc 1 53 34 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.1990021Z add.s64 %rd103, %rd4, 320; 2026-02-21T08:19:08.1990184Z cvt.u64.u32 %rd108, %r6; 2026-02-21T08:19:08.1990339Z add.s64 %rd109, %rd3, %rd108; 2026-02-21T08:19:08.1990504Z shl.b64 %rd110, %rd109, 1; 2026-02-21T08:19:08.1990659Z add.s64 %rd111, %rd14, %rd110; 2026-02-21T08:19:08.1990832Z add.s64 %rd104, %rd111, 262144; 2026-02-21T08:19:08.1990996Z add.s64 %rd105, %rd111, 524288; 2026-02-21T08:19:08.1991166Z add.s64 %rd106, %rd6, 320; 2026-02-21T08:19:08.1991431Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.1991708Z // begin inline asm 2026-02-21T08:19:08.1991918Z cp.async.cg.shared.global [ %r483 + 0 ], [ %rd103 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1992140Z // end inline asm 2026-02-21T08:19:08.1992280Z // begin inline asm 2026-02-21T08:19:08.1992477Z cp.async.cg.shared.global [ %r485 + 0 ], [ %rd104 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1992707Z // end inline asm 2026-02-21T08:19:08.1992843Z // begin inline asm 2026-02-21T08:19:08.1993044Z cp.async.cg.shared.global [ %r487 + 0 ], [ %rd105 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1993272Z // end inline asm 2026-02-21T08:19:08.1993403Z // begin inline asm 
2026-02-21T08:19:08.1993599Z cp.async.cg.shared.global [ %r489 + 0 ], [ %rd106 + 0 ], 0x10, %r484; 2026-02-21T08:19:08.1993816Z // end inline asm 2026-02-21T08:19:08.1993962Z cp.async.commit_group; 2026-02-21T08:19:08.1994226Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1994504Z shl.b64 %rd112, %rd5, 1; 2026-02-21T08:19:08.1994695Z add.s64 %rd7, %rd112, 384; 2026-02-21T08:19:08.1994856Z and.b32 %r497, %r1, 3; 2026-02-21T08:19:08.1995021Z mad.wide.u32 %rd647, %r497, 16, %rd14; 2026-02-21T08:19:08.1995190Z shl.b32 %r498, %r3, 18; 2026-02-21T08:19:08.1995345Z and.b32 %r499, %r498, 3670016; 2026-02-21T08:19:08.1995506Z shl.b32 %r500, %r4, 11; 2026-02-21T08:19:08.1995652Z or.b32 %r501, %r499, %r500; 2026-02-21T08:19:08.1995811Z mul.wide.u32 %rd9, %r501, 2; 2026-02-21T08:19:08.1995960Z mov.b32 %r1240, 1; 2026-02-21T08:19:08.1996099Z mov.b32 %r1239, 5; 2026-02-21T08:19:08.1996227Z mov.b32 %r1235, 0; 2026-02-21T08:19:08.1996362Z mov.b64 %rd648, 0; 2026-02-21T08:19:08.1996493Z mov.b32 %r1237, %r1235; 2026-02-21T08:19:08.1996643Z mov.b32 %r1238, %r1235; 2026-02-21T08:19:08.1996781Z mov.b32 %r1241, %r1235; 2026-02-21T08:19:08.1996927Z bra.uni $L__BB0_4; 2026-02-21T08:19:08.1997109Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:19:08.1997426Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.1997785Z setp.lt.u64 %p107, %rd648, 1856; 2026-02-21T08:19:08.1998056Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.1998335Z // begin inline asm 2026-02-21T08:19:08.1998464Z 2026-02-21T08:19:08.1998580Z { 2026-02-21T08:19:08.1998697Z .reg .pred complete; 2026-02-21T08:19:08.1998845Z waitLoop: 2026-02-21T08:19:08.1999039Z mbarrier.try_wait.parity.shared.b64 complete, [%r1236], %r1235; 2026-02-21T08:19:08.1999272Z @!complete bra.uni waitLoop; 2026-02-21T08:19:08.1999432Z } 2026-02-21T08:19:08.1999496Z 
2026-02-21T08:19:08.1999549Z // end inline asm 2026-02-21T08:19:08.1999690Z add.s32 %r550, %r1240, 1; 2026-02-21T08:19:08.1999844Z setp.gt.s32 %p110, %r550, 1; 2026-02-21T08:19:08.2000012Z selp.b32 %r1240, 0, %r550, %p110; 2026-02-21T08:19:08.2000187Z selp.b32 %r551, 1, 0, %p110; 2026-02-21T08:19:08.2000421Z xor.b32 %r35, %r1241, %r551; 2026-02-21T08:19:08.2000680Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.2000964Z add.s32 %r552, %r1239, 1; 2026-02-21T08:19:08.2001111Z setp.gt.s32 %p111, %r552, 5; 2026-02-21T08:19:08.2001273Z selp.b32 %r1239, 0, %r552, %p111; 2026-02-21T08:19:08.2001440Z shl.b32 %r553, %r1239, 3; 2026-02-21T08:19:08.2001585Z add.s32 %r555, %r37, %r553; 2026-02-21T08:19:08.2001744Z add.s32 %r541, %r555, 327680; 2026-02-21T08:19:08.2001901Z and.pred %p105, %p115, %p107; 2026-02-21T08:19:08.2002062Z // begin inline asm 2026-02-21T08:19:08.2002253Z @%p105 mbarrier.arrive.expect_tx.shared.b64 _, [%r541], 16384; 2026-02-21T08:19:08.2002474Z // end inline asm 2026-02-21T08:19:08.2002715Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.2002998Z shl.b32 %r556, %r1239, 14; 2026-02-21T08:19:08.2003154Z add.s32 %r557, %r37, %r556; 2026-02-21T08:19:08.2003308Z add.s32 %r538, %r557, 131072; 2026-02-21T08:19:08.2003463Z bar.sync 0; 2026-02-21T08:19:08.2003594Z elect.sync %r558|%p112, -1; 2026-02-21T08:19:08.2003757Z and.pred %p113, %p107, %p112; 2026-02-21T08:19:08.2003913Z and.pred %p106, %p1, %p113; 2026-02-21T08:19:08.2004072Z cvt.u32.u64 %r559, %rd648; 2026-02-21T08:19:08.2004219Z add.s32 %r539, %r559, 192; 2026-02-21T08:19:08.2004368Z // begin inline asm 2026-02-21T08:19:08.2004724Z @%p106 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r538], [%rd102, {%r539, %r844}], [%r541]; 2026-02-21T08:19:08.2005084Z // end inline asm 2026-02-21T08:19:08.2005329Z .loc 1 53 34 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:34 2026-02-21T08:19:08.2005607Z add.s64 %rd133, %rd647, %rd9; 2026-02-21T08:19:08.2005765Z add.s64 %rd129, %rd133, 384; 2026-02-21T08:19:08.2005916Z add.s64 %rd130, %rd133, 262528; 2026-02-21T08:19:08.2006083Z add.s64 %rd131, %rd133, 524672; 2026-02-21T08:19:08.2006347Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.2006633Z add.s64 %rd132, %rd647, %rd7; 2026-02-21T08:19:08.2006792Z add.s32 %r560, %r557, %r5; 2026-02-21T08:19:08.2006939Z add.s32 %r542, %r560, 229376; 2026-02-21T08:19:08.2007095Z selp.b32 %r543, 16, 0, %p107; 2026-02-21T08:19:08.2007243Z // begin inline asm 2026-02-21T08:19:08.2007448Z cp.async.cg.shared.global [ %r542 + 0 ], [ %rd129 + 0 ], 0x10, %r543; 2026-02-21T08:19:08.2007664Z // end inline asm 2026-02-21T08:19:08.2007800Z add.s32 %r544, %r560, 233472; 2026-02-21T08:19:08.2007943Z // begin inline asm 2026-02-21T08:19:08.2008139Z cp.async.cg.shared.global [ %r544 + 0 ], [ %rd130 + 0 ], 0x10, %r543; 2026-02-21T08:19:08.2008360Z // end inline asm 2026-02-21T08:19:08.2008490Z add.s32 %r546, %r560, 237568; 2026-02-21T08:19:08.2008641Z // begin inline asm 2026-02-21T08:19:08.2008832Z cp.async.cg.shared.global [ %r546 + 0 ], [ %rd131 + 0 ], 0x10, %r543; 2026-02-21T08:19:08.2009110Z // end inline asm 2026-02-21T08:19:08.2009247Z add.s32 %r548, %r560, 241664; 2026-02-21T08:19:08.2009409Z // begin inline asm 2026-02-21T08:19:08.2009607Z cp.async.cg.shared.global [ %r548 + 0 ], [ %rd132 + 0 ], 0x10, %r543; 2026-02-21T08:19:08.2009841Z // end inline asm 2026-02-21T08:19:08.2009988Z cp.async.commit_group; 2026-02-21T08:19:08.2010256Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.2010559Z add.s64 %rd647, %rd647, 64; 2026-02-21T08:19:08.2010722Z setp.lt.u64 %p114, %rd648, 1984; 2026-02-21T08:19:08.2010897Z add.s64 %rd648, %rd648, 32; 2026-02-21T08:19:08.2011051Z mov.b32 %r1235, %r1241; 
2026-02-21T08:19:08.2011211Z mov.b32 %r1236, %r561; 2026-02-21T08:19:08.2011359Z mov.b32 %r1241, %r35; 2026-02-21T08:19:08.2011514Z @%p114 bra $L__BB0_4; 2026-02-21T08:19:08.2011665Z bra.uni $L__BB0_7; 2026-02-21T08:19:08.2011905Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:19:08.2012138Z add.s32 %r504, %r1238, 1; 2026-02-21T08:19:08.2012301Z setp.gt.s32 %p95, %r504, 5; 2026-02-21T08:19:08.2012482Z selp.b32 %r1238, 0, %r504, %p95; 2026-02-21T08:19:08.2012656Z selp.b32 %r505, 1, 0, %p95; 2026-02-21T08:19:08.2012829Z xor.b32 %r1237, %r1237, %r505; 2026-02-21T08:19:08.2012993Z shl.b32 %r506, %r1238, 3; 2026-02-21T08:19:08.2013158Z add.s32 %r508, %r37, %r506; 2026-02-21T08:19:08.2013316Z add.s32 %r502, %r508, 327680; 2026-02-21T08:19:08.2013480Z bar.sync 0; 2026-02-21T08:19:08.2013621Z // begin inline asm 2026-02-21T08:19:08.2013757Z 2026-02-21T08:19:08.2013877Z { 2026-02-21T08:19:08.2014003Z .reg .pred complete; 2026-02-21T08:19:08.2014158Z waitLoop: 2026-02-21T08:19:08.2014353Z mbarrier.try_wait.parity.shared.b64 complete, [%r502], %r1237; 2026-02-21T08:19:08.2014605Z @!complete bra.uni waitLoop; 2026-02-21T08:19:08.2014784Z } 2026-02-21T08:19:08.2014855Z 2026-02-21T08:19:08.2014913Z // end inline asm 2026-02-21T08:19:08.2015160Z .loc 1 53 87 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.2015463Z cp.async.wait_group 4; 2026-02-21T08:19:08.2015621Z bar.sync 0; 2026-02-21T08:19:08.2015861Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.2016151Z shl.b32 %r509, %r1240, 3; 2026-02-21T08:19:08.2016306Z add.s32 %r510, %r37, %r509; 2026-02-21T08:19:08.2016468Z add.s32 %r561, %r510, 327728; 2026-02-21T08:19:08.2016731Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2017021Z @%p80 bra $L__BB0_6; 2026-02-21T08:19:08.2017222Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:19:08.2017549Z .loc 1 53 87 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:53:87 2026-02-21T08:19:08.2017836Z shl.b32 %r519, %r1238, 14; 2026-02-21T08:19:08.2017985Z add.s32 %r521, %r37, %r519; 2026-02-21T08:19:08.2018142Z add.s32 %r522, %r521, 229376; 2026-02-21T08:19:08.2018388Z .loc 1 52 31 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:52:31 2026-02-21T08:19:08.2018666Z add.s32 %r523, %r521, 131072; 2026-02-21T08:19:08.2018918Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2019196Z elect.sync %r524|%p97, -1; 2026-02-21T08:19:08.2019361Z bfe.u32 %r525, %r523, 4, 14; 2026-02-21T08:19:08.2019514Z cvt.u64.u32 %rd122, %r525; 2026-02-21T08:19:08.2019685Z or.b64 %rd113, %rd122, -9223371899348713472; 2026-02-21T08:19:08.2019863Z bfe.u32 %r526, %r522, 4, 14; 2026-02-21T08:19:08.2020019Z cvt.u64.u32 %rd123, %r526; 2026-02-21T08:19:08.2020178Z or.b64 %rd114, %rd123, -9223371899348713472; 2026-02-21T08:19:08.2020358Z mov.b32 %r512, 138412048; 2026-02-21T08:19:08.2020511Z mov.pred %p96, -1; 2026-02-21T08:19:08.2020653Z // begin inline asm 2026-02-21T08:19:08.2021185Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 0 ], %rd113, %rd114, %r512, %p96; 2026-02-21T08:19:08.2021437Z // end inline asm 2026-02-21T08:19:08.2021578Z add.s32 %r527, %r521, 131104; 2026-02-21T08:19:08.2021727Z bfe.u32 %r528, %r527, 4, 14; 2026-02-21T08:19:08.2021882Z cvt.u64.u32 %rd124, %r528; 2026-02-21T08:19:08.2022041Z or.b64 %rd115, %rd124, -9223371899348713472; 2026-02-21T08:19:08.2022225Z add.s32 %r529, %r521, 229408; 2026-02-21T08:19:08.2022379Z bfe.u32 %r530, %r529, 4, 14; 2026-02-21T08:19:08.2022527Z cvt.u64.u32 %rd125, %r530; 2026-02-21T08:19:08.2022692Z or.b64 %rd116, %rd125, -9223371899348713472; 2026-02-21T08:19:08.2022862Z // begin inline asm 2026-02-21T08:19:08.2023081Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 0 ], %rd115, %rd116, %r512, %p96; 2026-02-21T08:19:08.2023324Z // end inline asm 2026-02-21T08:19:08.2023464Z add.s32 
%r531, %r521, 139264; 2026-02-21T08:19:08.2023670Z bfe.u32 %r532, %r531, 4, 14; 2026-02-21T08:19:08.2023826Z cvt.u64.u32 %rd126, %r532; 2026-02-21T08:19:08.2023990Z or.b64 %rd117, %rd126, -9223371899348713472; 2026-02-21T08:19:08.2024159Z // begin inline asm 2026-02-21T08:19:08.2024380Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 256 ], %rd117, %rd114, %r512, %p96; 2026-02-21T08:19:08.2024630Z // end inline asm 2026-02-21T08:19:08.2024807Z add.s32 %r533, %r521, 139296; 2026-02-21T08:19:08.2024955Z bfe.u32 %r534, %r533, 4, 14; 2026-02-21T08:19:08.2025110Z cvt.u64.u32 %rd127, %r534; 2026-02-21T08:19:08.2025266Z or.b64 %rd119, %rd127, -9223371899348713472; 2026-02-21T08:19:08.2025439Z // begin inline asm 2026-02-21T08:19:08.2025658Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r1234 + 256 ], %rd119, %rd116, %r512, %p96; 2026-02-21T08:19:08.2025902Z // end inline asm 2026-02-21T08:19:08.2026037Z cvt.u64.u32 %rd121, %r561; 2026-02-21T08:19:08.2026182Z // begin inline asm 2026-02-21T08:19:08.2026391Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd121]; 2026-02-21T08:19:08.2026618Z // end inline asm 2026-02-21T08:19:08.2026752Z bra.uni $L__BB0_6; 2026-02-21T08:19:08.2026920Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:19:08.2027231Z .loc 1 0 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:0:52 2026-02-21T08:19:08.2027518Z setp.lt.u32 %p124, %r1, 128; 2026-02-21T08:19:08.2027667Z mov.b32 %r562, 1; 2026-02-21T08:19:08.2027914Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2028192Z // begin inline asm 2026-02-21T08:19:08.2028327Z 2026-02-21T08:19:08.2028434Z { 2026-02-21T08:19:08.2028560Z .reg .pred complete; 2026-02-21T08:19:08.2028698Z waitLoop: 2026-02-21T08:19:08.2028888Z mbarrier.try_wait.parity.shared.b64 complete, [%r561], %r562; 2026-02-21T08:19:08.2029120Z @!complete bra.uni waitLoop; 2026-02-21T08:19:08.2029267Z } 2026-02-21T08:19:08.2029329Z 2026-02-21T08:19:08.2029389Z 
// end inline asm 2026-02-21T08:19:08.2029632Z .loc 1 47 90 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:47:90 2026-02-21T08:19:08.2029916Z cp.async.wait_group 0; 2026-02-21T08:19:08.2030057Z bar.sync 0; 2026-02-21T08:19:08.2030189Z // begin inline asm 2026-02-21T08:19:08.2030353Z @%p115 mbarrier.inval.shared::cta.b64 [%r336]; 2026-02-21T08:19:08.2030540Z // end inline asm 2026-02-21T08:19:08.2030671Z bar.sync 0; 2026-02-21T08:19:08.2030794Z // begin inline asm 2026-02-21T08:19:08.2030956Z @%p115 mbarrier.inval.shared::cta.b64 [%r337]; 2026-02-21T08:19:08.2031135Z // end inline asm 2026-02-21T08:19:08.2031267Z bar.sync 0; 2026-02-21T08:19:08.2031387Z // begin inline asm 2026-02-21T08:19:08.2031546Z @%p115 mbarrier.inval.shared::cta.b64 [%r338]; 2026-02-21T08:19:08.2031721Z // end inline asm 2026-02-21T08:19:08.2031852Z bar.sync 0; 2026-02-21T08:19:08.2031973Z // begin inline asm 2026-02-21T08:19:08.2032132Z @%p115 mbarrier.inval.shared::cta.b64 [%r339]; 2026-02-21T08:19:08.2032316Z // end inline asm 2026-02-21T08:19:08.2032501Z bar.sync 0; 2026-02-21T08:19:08.2032628Z // begin inline asm 2026-02-21T08:19:08.2032777Z @%p115 mbarrier.inval.shared::cta.b64 [%r340]; 2026-02-21T08:19:08.2032956Z // end inline asm 2026-02-21T08:19:08.2033077Z bar.sync 0; 2026-02-21T08:19:08.2033202Z // begin inline asm 2026-02-21T08:19:08.2033353Z @%p115 mbarrier.inval.shared::cta.b64 [%r478]; 2026-02-21T08:19:08.2033532Z // end inline asm 2026-02-21T08:19:08.2033667Z add.s32 %r569, %r37, 327728; 2026-02-21T08:19:08.2033813Z // begin inline asm 2026-02-21T08:19:08.2033970Z @%p115 mbarrier.inval.shared::cta.b64 [%r569]; 2026-02-21T08:19:08.2034143Z // end inline asm 2026-02-21T08:19:08.2034272Z bar.sync 0; 2026-02-21T08:19:08.2034392Z // begin inline asm 2026-02-21T08:19:08.2034547Z @%p115 mbarrier.inval.shared::cta.b64 [%r335]; 2026-02-21T08:19:08.2034756Z // end inline asm 2026-02-21T08:19:08.2035156Z .loc 1 54 52 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2035452Z // begin inline asm 2026-02-21T08:19:08.2035817Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586}, [%r842 + 0]; 2026-02-21T08:19:08.2036215Z // end inline asm 2026-02-21T08:19:08.2036345Z // begin inline asm 2026-02-21T08:19:08.2036689Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603}, [%r842 + 16]; 2026-02-21T08:19:08.2037074Z // end inline asm 2026-02-21T08:19:08.2037209Z // begin inline asm 2026-02-21T08:19:08.2037561Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620}, [%r842 + 32]; 2026-02-21T08:19:08.2037939Z // end inline asm 2026-02-21T08:19:08.2038078Z // begin inline asm 2026-02-21T08:19:08.2038420Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637}, [%r842 + 48]; 2026-02-21T08:19:08.2038815Z // end inline asm 2026-02-21T08:19:08.2038942Z // begin inline asm 2026-02-21T08:19:08.2039294Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654}, [%r842 + 64]; 2026-02-21T08:19:08.2039668Z // end inline asm 2026-02-21T08:19:08.2039797Z // begin inline asm 2026-02-21T08:19:08.2040137Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671}, [%r842 + 80]; 2026-02-21T08:19:08.2040517Z // end inline asm 2026-02-21T08:19:08.2040652Z // begin inline asm 2026-02-21T08:19:08.2040991Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, 
%r682, %r683, %r684, %r685, %r686, %r687, %r688}, [%r842 + 96]; 2026-02-21T08:19:08.2041380Z // end inline asm 2026-02-21T08:19:08.2041517Z // begin inline asm 2026-02-21T08:19:08.2041864Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705}, [%r842 + 112]; 2026-02-21T08:19:08.2042254Z // end inline asm 2026-02-21T08:19:08.2042380Z // begin inline asm 2026-02-21T08:19:08.2042731Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722}, [%r842 + 128]; 2026-02-21T08:19:08.2043121Z // end inline asm 2026-02-21T08:19:08.2043248Z // begin inline asm 2026-02-21T08:19:08.2043599Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739}, [%r842 + 144]; 2026-02-21T08:19:08.2043979Z // end inline asm 2026-02-21T08:19:08.2044115Z // begin inline asm 2026-02-21T08:19:08.2044458Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756}, [%r842 + 160]; 2026-02-21T08:19:08.2044931Z // end inline asm 2026-02-21T08:19:08.2045062Z // begin inline asm 2026-02-21T08:19:08.2045393Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773}, [%r842 + 176]; 2026-02-21T08:19:08.2045767Z // end inline asm 2026-02-21T08:19:08.2045891Z // begin inline asm 2026-02-21T08:19:08.2046226Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790}, [%r842 + 192]; 2026-02-21T08:19:08.2046590Z // end inline asm 2026-02-21T08:19:08.2046723Z // begin inline asm 2026-02-21T08:19:08.2047112Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r792, 
%r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807}, [%r842 + 208]; 2026-02-21T08:19:08.2047478Z // end inline asm 2026-02-21T08:19:08.2047618Z // begin inline asm 2026-02-21T08:19:08.2047948Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824}, [%r842 + 224]; 2026-02-21T08:19:08.2048335Z // end inline asm 2026-02-21T08:19:08.2048467Z // begin inline asm 2026-02-21T08:19:08.2048807Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841}, [%r842 + 240]; 2026-02-21T08:19:08.2049181Z // end inline asm 2026-02-21T08:19:08.2049314Z // begin inline asm 2026-02-21T08:19:08.2049471Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:19:08.2049633Z // end inline asm 2026-02-21T08:19:08.2049778Z cvt.u64.u32 %rd135, %r571; 2026-02-21T08:19:08.2049935Z cvt.u64.u32 %rd136, %r572; 2026-02-21T08:19:08.2050098Z shl.b64 %rd137, %rd136, 32; 2026-02-21T08:19:08.2050260Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T08:19:08.2050540Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2050834Z mov.b64 {%r847, %r848}, %rd138; 2026-02-21T08:19:08.2051007Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T08:19:08.2051299Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2051582Z cvt.u64.u32 %rd139, %r573; 2026-02-21T08:19:08.2051750Z cvt.u64.u32 %rd140, %r574; 2026-02-21T08:19:08.2051903Z shl.b64 %rd141, %rd140, 32; 2026-02-21T08:19:08.2052070Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T08:19:08.2052337Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2052624Z mov.b64 {%r850, %r851}, %rd142; 2026-02-21T08:19:08.2052809Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T08:19:08.2053097Z .loc 1 54 52 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2053395Z cvt.u64.u32 %rd143, %r575; 2026-02-21T08:19:08.2053554Z cvt.u64.u32 %rd144, %r576; 2026-02-21T08:19:08.2053721Z shl.b64 %rd145, %rd144, 32; 2026-02-21T08:19:08.2053887Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T08:19:08.2054165Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2054460Z mov.b64 {%r853, %r854}, %rd146; 2026-02-21T08:19:08.2054634Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T08:19:08.2054972Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2055265Z cvt.u64.u32 %rd147, %r577; 2026-02-21T08:19:08.2055427Z cvt.u64.u32 %rd148, %r578; 2026-02-21T08:19:08.2055581Z shl.b64 %rd149, %rd148, 32; 2026-02-21T08:19:08.2055746Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T08:19:08.2056029Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2056361Z mov.b64 {%r856, %r857}, %rd150; 2026-02-21T08:19:08.2056538Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T08:19:08.2056812Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2057098Z cvt.u64.u32 %rd151, %r579; 2026-02-21T08:19:08.2057251Z cvt.u64.u32 %rd152, %r580; 2026-02-21T08:19:08.2057409Z shl.b64 %rd153, %rd152, 32; 2026-02-21T08:19:08.2057564Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T08:19:08.2057832Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2058119Z mov.b64 {%r859, %r860}, %rd154; 2026-02-21T08:19:08.2058286Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T08:19:08.2058575Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2058864Z cvt.u64.u32 %rd155, %r581; 2026-02-21T08:19:08.2059073Z cvt.u64.u32 %rd156, %r582; 2026-02-21T08:19:08.2059232Z shl.b64 %rd157, %rd156, 32; 2026-02-21T08:19:08.2059396Z or.b64 %rd158, %rd155, 
%rd157; 2026-02-21T08:19:08.2059668Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2059959Z mov.b64 {%r862, %r863}, %rd158; 2026-02-21T08:19:08.2060134Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T08:19:08.2060409Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2060682Z cvt.u64.u32 %rd159, %r583; 2026-02-21T08:19:08.2060828Z cvt.u64.u32 %rd160, %r584; 2026-02-21T08:19:08.2060982Z shl.b64 %rd161, %rd160, 32; 2026-02-21T08:19:08.2061131Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T08:19:08.2061393Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2061679Z mov.b64 {%r865, %r866}, %rd162; 2026-02-21T08:19:08.2061840Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T08:19:08.2062115Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2062382Z cvt.u64.u32 %rd163, %r585; 2026-02-21T08:19:08.2062538Z cvt.u64.u32 %rd164, %r586; 2026-02-21T08:19:08.2062686Z shl.b64 %rd165, %rd164, 32; 2026-02-21T08:19:08.2062847Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T08:19:08.2063111Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2063382Z mov.b64 {%r868, %r869}, %rd166; 2026-02-21T08:19:08.2063544Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T08:19:08.2063805Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2064080Z cvt.u64.u32 %rd167, %r588; 2026-02-21T08:19:08.2064227Z cvt.u64.u32 %rd168, %r589; 2026-02-21T08:19:08.2064379Z shl.b64 %rd169, %rd168, 32; 2026-02-21T08:19:08.2064528Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T08:19:08.2064834Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2065112Z mov.b64 {%r871, %r872}, %rd170; 2026-02-21T08:19:08.2065270Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T08:19:08.2065538Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2065815Z cvt.u64.u32 %rd171, %r590; 2026-02-21T08:19:08.2065973Z cvt.u64.u32 %rd172, %r591; 2026-02-21T08:19:08.2066119Z shl.b64 %rd173, %rd172, 32; 2026-02-21T08:19:08.2066275Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T08:19:08.2066538Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2066809Z mov.b64 {%r874, %r875}, %rd174; 2026-02-21T08:19:08.2066975Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T08:19:08.2067237Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2067520Z cvt.u64.u32 %rd175, %r592; 2026-02-21T08:19:08.2067729Z cvt.u64.u32 %rd176, %r593; 2026-02-21T08:19:08.2067879Z shl.b64 %rd177, %rd176, 32; 2026-02-21T08:19:08.2068025Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T08:19:08.2068278Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2068554Z mov.b64 {%r877, %r878}, %rd178; 2026-02-21T08:19:08.2068710Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T08:19:08.2068975Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2069243Z cvt.u64.u32 %rd179, %r594; 2026-02-21T08:19:08.2069396Z cvt.u64.u32 %rd180, %r595; 2026-02-21T08:19:08.2069541Z shl.b64 %rd181, %rd180, 32; 2026-02-21T08:19:08.2069696Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T08:19:08.2069950Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2070270Z mov.b64 {%r880, %r881}, %rd182; 2026-02-21T08:19:08.2070441Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T08:19:08.2070706Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2070768Z cvt.u64.u32 %rd183, %r596; 2026-02-21T08:19:08.2070825Z cvt.u64.u32 %rd184, %r597; 2026-02-21T08:19:08.2070881Z shl.b64 %rd185, %rd184, 32; 2026-02-21T08:19:08.2070938Z or.b64 %rd186, %rd183, 
%rd185; 2026-02-21T08:19:08.2071105Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2071162Z mov.b64 {%r883, %r884}, %rd186; 2026-02-21T08:19:08.2071224Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T08:19:08.2071389Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2071446Z cvt.u64.u32 %rd187, %r598; 2026-02-21T08:19:08.2071501Z cvt.u64.u32 %rd188, %r599; 2026-02-21T08:19:08.2071567Z shl.b64 %rd189, %rd188, 32; 2026-02-21T08:19:08.2071627Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T08:19:08.2071784Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2071841Z mov.b64 {%r886, %r887}, %rd190; 2026-02-21T08:19:08.2071909Z cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T08:19:08.2072065Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2072122Z cvt.u64.u32 %rd191, %r600; 2026-02-21T08:19:08.2072184Z cvt.u64.u32 %rd192, %r601; 2026-02-21T08:19:08.2072240Z shl.b64 %rd193, %rd192, 32; 2026-02-21T08:19:08.2072296Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T08:19:08.2072459Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2072516Z mov.b64 {%r889, %r890}, %rd194; 2026-02-21T08:19:08.2072578Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T08:19:08.2072736Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2072802Z cvt.u64.u32 %rd195, %r602; 2026-02-21T08:19:08.2072859Z cvt.u64.u32 %rd196, %r603; 2026-02-21T08:19:08.2072917Z shl.b64 %rd197, %rd196, 32; 2026-02-21T08:19:08.2072982Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T08:19:08.2073137Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2073195Z mov.b64 {%r892, %r893}, %rd198; 2026-02-21T08:19:08.2073263Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T08:19:08.2073417Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2073474Z cvt.u64.u32 %rd199, %r605; 2026-02-21T08:19:08.2073527Z cvt.u64.u32 %rd200, %r606; 2026-02-21T08:19:08.2073591Z shl.b64 %rd201, %rd200, 32; 2026-02-21T08:19:08.2073648Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T08:19:08.2073804Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2073907Z mov.b64 {%r895, %r896}, %rd202; 2026-02-21T08:19:08.2073968Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T08:19:08.2074125Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2074188Z cvt.u64.u32 %rd203, %r607; 2026-02-21T08:19:08.2074243Z cvt.u64.u32 %rd204, %r608; 2026-02-21T08:19:08.2074298Z shl.b64 %rd205, %rd204, 32; 2026-02-21T08:19:08.2074355Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T08:19:08.2074519Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2074575Z mov.b64 {%r898, %r899}, %rd206; 2026-02-21T08:19:08.2074635Z cvt.rn.f16x2.f32 %r900, %r899, %r898; 2026-02-21T08:19:08.2074834Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2074891Z cvt.u64.u32 %rd207, %r609; 2026-02-21T08:19:08.2074993Z cvt.u64.u32 %rd208, %r610; 2026-02-21T08:19:08.2075059Z shl.b64 %rd209, %rd208, 32; 2026-02-21T08:19:08.2075117Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T08:19:08.2075275Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2075332Z mov.b64 {%r901, %r902}, %rd210; 2026-02-21T08:19:08.2075401Z cvt.rn.f16x2.f32 %r903, %r902, %r901; 2026-02-21T08:19:08.2075561Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2075615Z cvt.u64.u32 %rd211, %r611; 2026-02-21T08:19:08.2075677Z cvt.u64.u32 %rd212, %r612; 2026-02-21T08:19:08.2075732Z shl.b64 %rd213, %rd212, 32; 2026-02-21T08:19:08.2075787Z or.b64 %rd214, %rd211, 
%rd213; 2026-02-21T08:19:08.2075950Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2076006Z mov.b64 {%r904, %r905}, %rd214; 2026-02-21T08:19:08.2076069Z cvt.rn.f16x2.f32 %r906, %r905, %r904; 2026-02-21T08:19:08.2076227Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2076290Z cvt.u64.u32 %rd215, %r613; 2026-02-21T08:19:08.2076346Z cvt.u64.u32 %rd216, %r614; 2026-02-21T08:19:08.2076403Z shl.b64 %rd217, %rd216, 32; 2026-02-21T08:19:08.2076468Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T08:19:08.2076628Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2076684Z mov.b64 {%r907, %r908}, %rd218; 2026-02-21T08:19:08.2076752Z cvt.rn.f16x2.f32 %r909, %r908, %r907; 2026-02-21T08:19:08.2076908Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2076965Z cvt.u64.u32 %rd219, %r615; 2026-02-21T08:19:08.2077019Z cvt.u64.u32 %rd220, %r616; 2026-02-21T08:19:08.2077083Z shl.b64 %rd221, %rd220, 32; 2026-02-21T08:19:08.2077141Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T08:19:08.2077301Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2077364Z mov.b64 {%r910, %r911}, %rd222; 2026-02-21T08:19:08.2077424Z cvt.rn.f16x2.f32 %r912, %r911, %r910; 2026-02-21T08:19:08.2077579Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2077641Z cvt.u64.u32 %rd223, %r617; 2026-02-21T08:19:08.2077696Z cvt.u64.u32 %rd224, %r618; 2026-02-21T08:19:08.2077751Z shl.b64 %rd225, %rd224, 32; 2026-02-21T08:19:08.2077806Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T08:19:08.2077968Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2078023Z mov.b64 {%r913, %r914}, %rd226; 2026-02-21T08:19:08.2078082Z cvt.rn.f16x2.f32 %r915, %r914, %r913; 2026-02-21T08:19:08.2078246Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2078349Z cvt.u64.u32 %rd227, %r619; 2026-02-21T08:19:08.2078404Z cvt.u64.u32 %rd228, %r620; 2026-02-21T08:19:08.2078465Z shl.b64 %rd229, %rd228, 32; 2026-02-21T08:19:08.2078521Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T08:19:08.2078684Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2078739Z mov.b64 {%r916, %r917}, %rd230; 2026-02-21T08:19:08.2078806Z cvt.rn.f16x2.f32 %r918, %r917, %r916; 2026-02-21T08:19:08.2078970Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2079025Z cvt.u64.u32 %rd231, %r622; 2026-02-21T08:19:08.2079087Z cvt.u64.u32 %rd232, %r623; 2026-02-21T08:19:08.2079142Z shl.b64 %rd233, %rd232, 32; 2026-02-21T08:19:08.2079197Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T08:19:08.2079396Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2079455Z mov.b64 {%r919, %r920}, %rd234; 2026-02-21T08:19:08.2079515Z cvt.rn.f16x2.f32 %r921, %r920, %r919; 2026-02-21T08:19:08.2079672Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2079736Z cvt.u64.u32 %rd235, %r624; 2026-02-21T08:19:08.2079792Z cvt.u64.u32 %rd236, %r625; 2026-02-21T08:19:08.2079846Z shl.b64 %rd237, %rd236, 32; 2026-02-21T08:19:08.2079910Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T08:19:08.2080069Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2080126Z mov.b64 {%r922, %r923}, %rd238; 2026-02-21T08:19:08.2080192Z cvt.rn.f16x2.f32 %r924, %r923, %r922; 2026-02-21T08:19:08.2080347Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2080402Z cvt.u64.u32 %rd239, %r626; 2026-02-21T08:19:08.2080460Z cvt.u64.u32 %rd240, %r627; 2026-02-21T08:19:08.2080522Z shl.b64 %rd241, %rd240, 32; 2026-02-21T08:19:08.2080582Z or.b64 %rd242, %rd239, 
%rd241; 2026-02-21T08:19:08.2080739Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2080803Z mov.b64 {%r925, %r926}, %rd242; 2026-02-21T08:19:08.2080862Z cvt.rn.f16x2.f32 %r927, %r926, %r925; 2026-02-21T08:19:08.2081020Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2081082Z cvt.u64.u32 %rd243, %r628; 2026-02-21T08:19:08.2081137Z cvt.u64.u32 %rd244, %r629; 2026-02-21T08:19:08.2081193Z shl.b64 %rd245, %rd244, 32; 2026-02-21T08:19:08.2081251Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T08:19:08.2081423Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2081485Z mov.b64 {%r928, %r929}, %rd246; 2026-02-21T08:19:08.2081546Z cvt.rn.f16x2.f32 %r930, %r929, %r928; 2026-02-21T08:19:08.2081715Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2081774Z cvt.u64.u32 %rd247, %r630; 2026-02-21T08:19:08.2081829Z cvt.u64.u32 %rd248, %r631; 2026-02-21T08:19:08.2081892Z shl.b64 %rd249, %rd248, 32; 2026-02-21T08:19:08.2081948Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T08:19:08.2082103Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2082159Z mov.b64 {%r931, %r932}, %rd250; 2026-02-21T08:19:08.2082227Z cvt.rn.f16x2.f32 %r933, %r932, %r931; 2026-02-21T08:19:08.2082386Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2082442Z cvt.u64.u32 %rd251, %r632; 2026-02-21T08:19:08.2082505Z cvt.u64.u32 %rd252, %r633; 2026-02-21T08:19:08.2082561Z shl.b64 %rd253, %rd252, 32; 2026-02-21T08:19:08.2082616Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T08:19:08.2082782Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2082881Z mov.b64 {%r934, %r935}, %rd254; 2026-02-21T08:19:08.2082943Z cvt.rn.f16x2.f32 %r936, %r935, %r934; 2026-02-21T08:19:08.2083103Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2083167Z cvt.u64.u32 %rd255, %r634; 2026-02-21T08:19:08.2083222Z cvt.u64.u32 %rd256, %r635; 2026-02-21T08:19:08.2083278Z shl.b64 %rd257, %rd256, 32; 2026-02-21T08:19:08.2083341Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T08:19:08.2083498Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2083553Z mov.b64 {%r937, %r938}, %rd258; 2026-02-21T08:19:08.2083618Z cvt.rn.f16x2.f32 %r939, %r938, %r937; 2026-02-21T08:19:08.2083775Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2083869Z cvt.u64.u32 %rd259, %r636; 2026-02-21T08:19:08.2083928Z cvt.u64.u32 %rd260, %r637; 2026-02-21T08:19:08.2083991Z shl.b64 %rd261, %rd260, 32; 2026-02-21T08:19:08.2084047Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T08:19:08.2084206Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2084269Z mov.b64 {%r940, %r941}, %rd262; 2026-02-21T08:19:08.2084329Z cvt.rn.f16x2.f32 %r942, %r941, %r940; 2026-02-21T08:19:08.2084489Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2084552Z cvt.u64.u32 %rd263, %r639; 2026-02-21T08:19:08.2084608Z cvt.u64.u32 %rd264, %r640; 2026-02-21T08:19:08.2084664Z shl.b64 %rd265, %rd264, 32; 2026-02-21T08:19:08.2084757Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T08:19:08.2084921Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2084979Z mov.b64 {%r943, %r944}, %rd266; 2026-02-21T08:19:08.2085042Z cvt.rn.f16x2.f32 %r945, %r944, %r943; 2026-02-21T08:19:08.2085203Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2085258Z cvt.u64.u32 %rd267, %r641; 2026-02-21T08:19:08.2085312Z cvt.u64.u32 %rd268, %r642; 2026-02-21T08:19:08.2085375Z shl.b64 %rd269, %rd268, 32; 2026-02-21T08:19:08.2085428Z or.b64 %rd270, %rd267, 
%rd269; 2026-02-21T08:19:08.2085584Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2085639Z mov.b64 {%r946, %r947}, %rd270; 2026-02-21T08:19:08.2085705Z cvt.rn.f16x2.f32 %r948, %r947, %r946; 2026-02-21T08:19:08.2085859Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2085914Z cvt.u64.u32 %rd271, %r643; 2026-02-21T08:19:08.2085976Z cvt.u64.u32 %rd272, %r644; 2026-02-21T08:19:08.2086034Z shl.b64 %rd273, %rd272, 32; 2026-02-21T08:19:08.2086090Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T08:19:08.2086253Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2086308Z mov.b64 {%r949, %r950}, %rd274; 2026-02-21T08:19:08.2086367Z cvt.rn.f16x2.f32 %r951, %r950, %r949; 2026-02-21T08:19:08.2086523Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2086586Z cvt.u64.u32 %rd275, %r645; 2026-02-21T08:19:08.2086641Z cvt.u64.u32 %rd276, %r646; 2026-02-21T08:19:08.2086697Z shl.b64 %rd277, %rd276, 32; 2026-02-21T08:19:08.2086761Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T08:19:08.2086916Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2086971Z mov.b64 {%r952, %r953}, %rd278; 2026-02-21T08:19:08.2087037Z cvt.rn.f16x2.f32 %r954, %r953, %r952; 2026-02-21T08:19:08.2087193Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2087316Z cvt.u64.u32 %rd279, %r647; 2026-02-21T08:19:08.2087371Z cvt.u64.u32 %rd280, %r648; 2026-02-21T08:19:08.2087434Z shl.b64 %rd281, %rd280, 32; 2026-02-21T08:19:08.2087490Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T08:19:08.2087650Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2087712Z mov.b64 {%r955, %r956}, %rd282; 2026-02-21T08:19:08.2087771Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T08:19:08.2087932Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2087995Z cvt.u64.u32 %rd283, %r649; 2026-02-21T08:19:08.2088051Z cvt.u64.u32 %rd284, %r650; 2026-02-21T08:19:08.2088107Z shl.b64 %rd285, %rd284, 32; 2026-02-21T08:19:08.2088163Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T08:19:08.2088380Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2088440Z mov.b64 {%r958, %r959}, %rd286; 2026-02-21T08:19:08.2088501Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T08:19:08.2088668Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2088725Z cvt.u64.u32 %rd287, %r651; 2026-02-21T08:19:08.2088779Z cvt.u64.u32 %rd288, %r652; 2026-02-21T08:19:08.2088834Z shl.b64 %rd289, %rd288, 32; 2026-02-21T08:19:08.2088898Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T08:19:08.2089059Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2089116Z mov.b64 {%r961, %r962}, %rd290; 2026-02-21T08:19:08.2089185Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T08:19:08.2089347Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2089403Z cvt.u64.u32 %rd291, %r653; 2026-02-21T08:19:08.2089470Z cvt.u64.u32 %rd292, %r654; 2026-02-21T08:19:08.2089528Z shl.b64 %rd293, %rd292, 32; 2026-02-21T08:19:08.2089586Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T08:19:08.2089750Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2089816Z mov.b64 {%r964, %r965}, %rd294; 2026-02-21T08:19:08.2089877Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T08:19:08.2090036Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2090101Z cvt.u64.u32 %rd295, %r656; 2026-02-21T08:19:08.2090156Z cvt.u64.u32 %rd296, %r657; 2026-02-21T08:19:08.2090210Z shl.b64 %rd297, %rd296, 32; 2026-02-21T08:19:08.2090274Z or.b64 %rd298, %rd295, 
%rd297; 2026-02-21T08:19:08.2090434Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2090490Z mov.b64 {%r967, %r968}, %rd298; 2026-02-21T08:19:08.2090552Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T08:19:08.2090717Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2090772Z cvt.u64.u32 %rd299, %r658; 2026-02-21T08:19:08.2090826Z cvt.u64.u32 %rd300, %r659; 2026-02-21T08:19:08.2090888Z shl.b64 %rd301, %rd300, 32; 2026-02-21T08:19:08.2090945Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T08:19:08.2091106Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2091168Z mov.b64 {%r970, %r971}, %rd302; 2026-02-21T08:19:08.2091228Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T08:19:08.2091389Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2091451Z cvt.u64.u32 %rd303, %r660; 2026-02-21T08:19:08.2091507Z cvt.u64.u32 %rd304, %r661; 2026-02-21T08:19:08.2091562Z shl.b64 %rd305, %rd304, 32; 2026-02-21T08:19:08.2091619Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T08:19:08.2091826Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2091882Z mov.b64 {%r973, %r974}, %rd306; 2026-02-21T08:19:08.2091942Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T08:19:08.2092105Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2092162Z cvt.u64.u32 %rd307, %r662; 2026-02-21T08:19:08.2092220Z cvt.u64.u32 %rd308, %r663; 2026-02-21T08:19:08.2092276Z shl.b64 %rd309, %rd308, 32; 2026-02-21T08:19:08.2092341Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T08:19:08.2092501Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2092558Z mov.b64 {%r976, %r977}, %rd310; 2026-02-21T08:19:08.2092627Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T08:19:08.2092872Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2092935Z cvt.u64.u32 %rd311, %r664; 2026-02-21T08:19:08.2093000Z cvt.u64.u32 %rd312, %r665; 2026-02-21T08:19:08.2093056Z shl.b64 %rd313, %rd312, 32; 2026-02-21T08:19:08.2093112Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T08:19:08.2093270Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2093334Z mov.b64 {%r979, %r980}, %rd314; 2026-02-21T08:19:08.2093396Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T08:19:08.2093552Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2093614Z cvt.u64.u32 %rd315, %r666; 2026-02-21T08:19:08.2093670Z cvt.u64.u32 %rd316, %r667; 2026-02-21T08:19:08.2093725Z shl.b64 %rd317, %rd316, 32; 2026-02-21T08:19:08.2093788Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T08:19:08.2093948Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2094006Z mov.b64 {%r982, %r983}, %rd318; 2026-02-21T08:19:08.2094068Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T08:19:08.2094233Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2094289Z cvt.u64.u32 %rd319, %r668; 2026-02-21T08:19:08.2094344Z cvt.u64.u32 %rd320, %r669; 2026-02-21T08:19:08.2094407Z shl.b64 %rd321, %rd320, 32; 2026-02-21T08:19:08.2094463Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T08:19:08.2094618Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2094717Z mov.b64 {%r985, %r986}, %rd322; 2026-02-21T08:19:08.2094778Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T08:19:08.2094936Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2094997Z cvt.u64.u32 %rd323, %r670; 2026-02-21T08:19:08.2095053Z cvt.u64.u32 %rd324, %r671; 2026-02-21T08:19:08.2095109Z shl.b64 %rd325, %rd324, 32; 2026-02-21T08:19:08.2095167Z or.b64 %rd326, %rd323, 
%rd325; 2026-02-21T08:19:08.2095329Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2095384Z mov.b64 {%r988, %r989}, %rd326; 2026-02-21T08:19:08.2095443Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T08:19:08.2095603Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2095657Z cvt.u64.u32 %rd327, %r673; 2026-02-21T08:19:08.2095713Z cvt.u64.u32 %rd328, %r674; 2026-02-21T08:19:08.2095768Z shl.b64 %rd329, %rd328, 32; 2026-02-21T08:19:08.2095828Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T08:19:08.2095983Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2096037Z mov.b64 {%r991, %r992}, %rd330; 2026-02-21T08:19:08.2096101Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T08:19:08.2096256Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2096373Z cvt.u64.u32 %rd331, %r675; 2026-02-21T08:19:08.2096430Z cvt.u64.u32 %rd332, %r676; 2026-02-21T08:19:08.2096483Z shl.b64 %rd333, %rd332, 32; 2026-02-21T08:19:08.2096536Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T08:19:08.2096691Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2096750Z mov.b64 {%r994, %r995}, %rd334; 2026-02-21T08:19:08.2096807Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T08:19:08.2096962Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2097022Z cvt.u64.u32 %rd335, %r677; 2026-02-21T08:19:08.2097076Z cvt.u64.u32 %rd336, %r678; 2026-02-21T08:19:08.2097131Z shl.b64 %rd337, %rd336, 32; 2026-02-21T08:19:08.2097190Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T08:19:08.2097406Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2097467Z mov.b64 {%r997, %r998}, %rd338; 2026-02-21T08:19:08.2097529Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T08:19:08.2097703Z .loc 1 54 52 
// c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2097759Z cvt.u64.u32 %rd339, %r679; 2026-02-21T08:19:08.2097812Z cvt.u64.u32 %rd340, %r680; 2026-02-21T08:19:08.2097875Z shl.b64 %rd341, %rd340, 32; 2026-02-21T08:19:08.2097933Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T08:19:08.2098103Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2098172Z mov.b64 {%r1000, %r1001}, %rd342; 2026-02-21T08:19:08.2098243Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T08:19:08.2098412Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2098474Z cvt.u64.u32 %rd343, %r681; 2026-02-21T08:19:08.2098539Z cvt.u64.u32 %rd344, %r682; 2026-02-21T08:19:08.2098596Z shl.b64 %rd345, %rd344, 32; 2026-02-21T08:19:08.2098651Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T08:19:08.2098818Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2098877Z mov.b64 {%r1003, %r1004}, %rd346; 2026-02-21T08:19:08.2098945Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T08:19:08.2099115Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2099170Z cvt.u64.u32 %rd347, %r683; 2026-02-21T08:19:08.2099223Z cvt.u64.u32 %rd348, %r684; 2026-02-21T08:19:08.2099278Z shl.b64 %rd349, %rd348, 32; 2026-02-21T08:19:08.2099338Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T08:19:08.2099504Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2099566Z mov.b64 {%r1006, %r1007}, %rd350; 2026-02-21T08:19:08.2099642Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T08:19:08.2099803Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2099860Z cvt.u64.u32 %rd351, %r685; 2026-02-21T08:19:08.2099919Z cvt.u64.u32 %rd352, %r686; 2026-02-21T08:19:08.2099972Z shl.b64 %rd353, %rd352, 32; 2026-02-21T08:19:08.2100028Z or.b64 
%rd354, %rd351, %rd353; 2026-02-21T08:19:08.2100192Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2100252Z mov.b64 {%r1009, %r1010}, %rd354; 2026-02-21T08:19:08.2100320Z cvt.rn.f16x2.f32 %r1011, %r1010, %r1009; 2026-02-21T08:19:08.2100487Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2100552Z cvt.u64.u32 %rd355, %r687; 2026-02-21T08:19:08.2100610Z cvt.u64.u32 %rd356, %r688; 2026-02-21T08:19:08.2100669Z shl.b64 %rd357, %rd356, 32; 2026-02-21T08:19:08.2100734Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T08:19:08.2100943Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2101004Z mov.b64 {%r1012, %r1013}, %rd358; 2026-02-21T08:19:08.2101074Z cvt.rn.f16x2.f32 %r1014, %r1013, %r1012; 2026-02-21T08:19:08.2101248Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2101306Z cvt.u64.u32 %rd359, %r690; 2026-02-21T08:19:08.2101365Z cvt.u64.u32 %rd360, %r691; 2026-02-21T08:19:08.2101432Z shl.b64 %rd361, %rd360, 32; 2026-02-21T08:19:08.2101492Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T08:19:08.2101658Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2101725Z mov.b64 {%r1015, %r1016}, %rd362; 2026-02-21T08:19:08.2101793Z cvt.rn.f16x2.f32 %r1017, %r1016, %r1015; 2026-02-21T08:19:08.2102015Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2102078Z cvt.u64.u32 %rd363, %r692; 2026-02-21T08:19:08.2102145Z cvt.u64.u32 %rd364, %r693; 2026-02-21T08:19:08.2102203Z shl.b64 %rd365, %rd364, 32; 2026-02-21T08:19:08.2102260Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T08:19:08.2102432Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2102492Z mov.b64 {%r1018, %r1019}, %rd366; 2026-02-21T08:19:08.2102559Z cvt.rn.f16x2.f32 %r1020, %r1019, %r1018; 
2026-02-21T08:19:08.2102736Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2102793Z cvt.u64.u32 %rd367, %r694; 2026-02-21T08:19:08.2102851Z cvt.u64.u32 %rd368, %r695; 2026-02-21T08:19:08.2102907Z shl.b64 %rd369, %rd368, 32; 2026-02-21T08:19:08.2102965Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T08:19:08.2103131Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2103192Z mov.b64 {%r1021, %r1022}, %rd370; 2026-02-21T08:19:08.2103259Z cvt.rn.f16x2.f32 %r1023, %r1022, %r1021; 2026-02-21T08:19:08.2103424Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2103477Z cvt.u64.u32 %rd371, %r696; 2026-02-21T08:19:08.2103537Z cvt.u64.u32 %rd372, %r697; 2026-02-21T08:19:08.2103592Z shl.b64 %rd373, %rd372, 32; 2026-02-21T08:19:08.2103647Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T08:19:08.2103813Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2103875Z mov.b64 {%r1024, %r1025}, %rd374; 2026-02-21T08:19:08.2103937Z cvt.rn.f16x2.f32 %r1026, %r1025, %r1024; 2026-02-21T08:19:08.2104099Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2104160Z cvt.u64.u32 %rd375, %r698; 2026-02-21T08:19:08.2104217Z cvt.u64.u32 %rd376, %r699; 2026-02-21T08:19:08.2104276Z shl.b64 %rd377, %rd376, 32; 2026-02-21T08:19:08.2104336Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T08:19:08.2104499Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2104558Z mov.b64 {%r1027, %r1028}, %rd378; 2026-02-21T08:19:08.2104621Z cvt.rn.f16x2.f32 %r1029, %r1028, %r1027; 2026-02-21T08:19:08.2104817Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2104873Z cvt.u64.u32 %rd379, %r700; 2026-02-21T08:19:08.2104928Z cvt.u64.u32 %rd380, %r701; 2026-02-21T08:19:08.2104989Z shl.b64 %rd381, %rd380, 
32; 2026-02-21T08:19:08.2105045Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T08:19:08.2105200Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2105258Z mov.b64 {%r1030, %r1031}, %rd382; 2026-02-21T08:19:08.2105324Z cvt.rn.f16x2.f32 %r1032, %r1031, %r1030; 2026-02-21T08:19:08.2105536Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2105590Z cvt.u64.u32 %rd383, %r702; 2026-02-21T08:19:08.2105649Z cvt.u64.u32 %rd384, %r703; 2026-02-21T08:19:08.2105702Z shl.b64 %rd385, %rd384, 32; 2026-02-21T08:19:08.2105760Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T08:19:08.2105932Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2105991Z mov.b64 {%r1033, %r1034}, %rd386; 2026-02-21T08:19:08.2106058Z cvt.rn.f16x2.f32 %r1035, %r1034, %r1033; 2026-02-21T08:19:08.2106230Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2106289Z cvt.u64.u32 %rd387, %r704; 2026-02-21T08:19:08.2106343Z cvt.u64.u32 %rd388, %r705; 2026-02-21T08:19:08.2106398Z shl.b64 %rd389, %rd388, 32; 2026-02-21T08:19:08.2106500Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T08:19:08.2106672Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2106729Z mov.b64 {%r1036, %r1037}, %rd390; 2026-02-21T08:19:08.2106798Z cvt.rn.f16x2.f32 %r1038, %r1037, %r1036; 2026-02-21T08:19:08.2106968Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2107022Z cvt.u64.u32 %rd391, %r707; 2026-02-21T08:19:08.2107081Z cvt.u64.u32 %rd392, %r708; 2026-02-21T08:19:08.2107135Z shl.b64 %rd393, %rd392, 32; 2026-02-21T08:19:08.2107188Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T08:19:08.2107343Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2107401Z mov.b64 {%r1039, %r1040}, %rd394; 2026-02-21T08:19:08.2107460Z 
cvt.rn.f16x2.f32 %r1041, %r1040, %r1039; 2026-02-21T08:19:08.2107615Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2107673Z cvt.u64.u32 %rd395, %r709; 2026-02-21T08:19:08.2107725Z cvt.u64.u32 %rd396, %r710; 2026-02-21T08:19:08.2107777Z shl.b64 %rd397, %rd396, 32; 2026-02-21T08:19:08.2107831Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T08:19:08.2107988Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2108045Z mov.b64 {%r1042, %r1043}, %rd398; 2026-02-21T08:19:08.2108108Z cvt.rn.f16x2.f32 %r1044, %r1043, %r1042; 2026-02-21T08:19:08.2108272Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2108328Z cvt.u64.u32 %rd399, %r711; 2026-02-21T08:19:08.2108382Z cvt.u64.u32 %rd400, %r712; 2026-02-21T08:19:08.2108444Z shl.b64 %rd401, %rd400, 32; 2026-02-21T08:19:08.2108500Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T08:19:08.2108663Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2108730Z mov.b64 {%r1045, %r1046}, %rd402; 2026-02-21T08:19:08.2108792Z cvt.rn.f16x2.f32 %r1047, %r1046, %r1045; 2026-02-21T08:19:08.2108948Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2109003Z cvt.u64.u32 %rd403, %r713; 2026-02-21T08:19:08.2109064Z cvt.u64.u32 %rd404, %r714; 2026-02-21T08:19:08.2109119Z shl.b64 %rd405, %rd404, 32; 2026-02-21T08:19:08.2109175Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T08:19:08.2109339Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2109395Z mov.b64 {%r1048, %r1049}, %rd406; 2026-02-21T08:19:08.2109457Z cvt.rn.f16x2.f32 %r1050, %r1049, %r1048; 2026-02-21T08:19:08.2109615Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2109670Z cvt.u64.u32 %rd407, %r715; 2026-02-21T08:19:08.2109726Z cvt.u64.u32 %rd408, %r716; 
2026-02-21T08:19:08.2109822Z shl.b64 %rd409, %rd408, 32; 2026-02-21T08:19:08.2109884Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T08:19:08.2110045Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2110101Z mov.b64 {%r1051, %r1052}, %rd410; 2026-02-21T08:19:08.2110171Z cvt.rn.f16x2.f32 %r1053, %r1052, %r1051; 2026-02-21T08:19:08.2110325Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2110381Z cvt.u64.u32 %rd411, %r717; 2026-02-21T08:19:08.2110442Z cvt.u64.u32 %rd412, %r718; 2026-02-21T08:19:08.2110497Z shl.b64 %rd413, %rd412, 32; 2026-02-21T08:19:08.2110552Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T08:19:08.2110707Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2110771Z mov.b64 {%r1054, %r1055}, %rd414; 2026-02-21T08:19:08.2110870Z cvt.rn.f16x2.f32 %r1056, %r1055, %r1054; 2026-02-21T08:19:08.2111031Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2111092Z cvt.u64.u32 %rd415, %r719; 2026-02-21T08:19:08.2111147Z cvt.u64.u32 %rd416, %r720; 2026-02-21T08:19:08.2111203Z shl.b64 %rd417, %rd416, 32; 2026-02-21T08:19:08.2111264Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T08:19:08.2111424Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2111481Z mov.b64 {%r1057, %r1058}, %rd418; 2026-02-21T08:19:08.2111544Z cvt.rn.f16x2.f32 %r1059, %r1058, %r1057; 2026-02-21T08:19:08.2111706Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2111761Z cvt.u64.u32 %rd419, %r721; 2026-02-21T08:19:08.2111816Z cvt.u64.u32 %rd420, %r722; 2026-02-21T08:19:08.2111876Z shl.b64 %rd421, %rd420, 32; 2026-02-21T08:19:08.2111933Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T08:19:08.2112093Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2112155Z mov.b64 {%r1060, 
%r1061}, %rd422; 2026-02-21T08:19:08.2112219Z cvt.rn.f16x2.f32 %r1062, %r1061, %r1060; 2026-02-21T08:19:08.2112374Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2112428Z cvt.u64.u32 %rd423, %r724; 2026-02-21T08:19:08.2112488Z cvt.u64.u32 %rd424, %r725; 2026-02-21T08:19:08.2112544Z shl.b64 %rd425, %rd424, 32; 2026-02-21T08:19:08.2112599Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T08:19:08.2112764Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2112820Z mov.b64 {%r1063, %r1064}, %rd426; 2026-02-21T08:19:08.2112882Z cvt.rn.f16x2.f32 %r1065, %r1064, %r1063; 2026-02-21T08:19:08.2113047Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2113104Z cvt.u64.u32 %rd427, %r726; 2026-02-21T08:19:08.2113158Z cvt.u64.u32 %rd428, %r727; 2026-02-21T08:19:08.2113214Z shl.b64 %rd429, %rd428, 32; 2026-02-21T08:19:08.2113275Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T08:19:08.2113432Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2113490Z mov.b64 {%r1066, %r1067}, %rd430; 2026-02-21T08:19:08.2113560Z cvt.rn.f16x2.f32 %r1068, %r1067, %r1066; 2026-02-21T08:19:08.2113718Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2113774Z cvt.u64.u32 %rd431, %r728; 2026-02-21T08:19:08.2113836Z cvt.u64.u32 %rd432, %r729; 2026-02-21T08:19:08.2113890Z shl.b64 %rd433, %rd432, 32; 2026-02-21T08:19:08.2113945Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T08:19:08.2114105Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2114204Z mov.b64 {%r1069, %r1070}, %rd434; 2026-02-21T08:19:08.2114265Z cvt.rn.f16x2.f32 %r1071, %r1070, %r1069; 2026-02-21T08:19:08.2114417Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2114474Z cvt.u64.u32 %rd435, %r730; 
2026-02-21T08:19:08.2114529Z cvt.u64.u32 %rd436, %r731; 2026-02-21T08:19:08.2114583Z shl.b64 %rd437, %rd436, 32; 2026-02-21T08:19:08.2114644Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T08:19:08.2114836Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2114894Z mov.b64 {%r1072, %r1073}, %rd438; 2026-02-21T08:19:08.2114957Z cvt.rn.f16x2.f32 %r1074, %r1073, %r1072; 2026-02-21T08:19:08.2115116Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2115172Z cvt.u64.u32 %rd439, %r732; 2026-02-21T08:19:08.2115273Z cvt.u64.u32 %rd440, %r733; 2026-02-21T08:19:08.2115345Z shl.b64 %rd441, %rd440, 32; 2026-02-21T08:19:08.2115404Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T08:19:08.2115560Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2115632Z mov.b64 {%r1075, %r1076}, %rd442; 2026-02-21T08:19:08.2115697Z cvt.rn.f16x2.f32 %r1077, %r1076, %r1075; 2026-02-21T08:19:08.2115853Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2115912Z cvt.u64.u32 %rd443, %r734; 2026-02-21T08:19:08.2115980Z cvt.u64.u32 %rd444, %r735; 2026-02-21T08:19:08.2116041Z shl.b64 %rd445, %rd444, 32; 2026-02-21T08:19:08.2116101Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T08:19:08.2116272Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2116331Z mov.b64 {%r1078, %r1079}, %rd446; 2026-02-21T08:19:08.2116396Z cvt.rn.f16x2.f32 %r1080, %r1079, %r1078; 2026-02-21T08:19:08.2116565Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2116619Z cvt.u64.u32 %rd447, %r736; 2026-02-21T08:19:08.2116674Z cvt.u64.u32 %rd448, %r737; 2026-02-21T08:19:08.2116729Z shl.b64 %rd449, %rd448, 32; 2026-02-21T08:19:08.2116793Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T08:19:08.2116950Z .loc 1 56 27 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2117006Z mov.b64 {%r1081, %r1082}, %rd450; 2026-02-21T08:19:08.2117077Z cvt.rn.f16x2.f32 %r1083, %r1082, %r1081; 2026-02-21T08:19:08.2117233Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2117287Z cvt.u64.u32 %rd451, %r738; 2026-02-21T08:19:08.2117352Z cvt.u64.u32 %rd452, %r739; 2026-02-21T08:19:08.2117406Z shl.b64 %rd453, %rd452, 32; 2026-02-21T08:19:08.2117463Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T08:19:08.2117622Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2117687Z mov.b64 {%r1084, %r1085}, %rd454; 2026-02-21T08:19:08.2117751Z cvt.rn.f16x2.f32 %r1086, %r1085, %r1084; 2026-02-21T08:19:08.2117911Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2117975Z cvt.u64.u32 %rd455, %r741; 2026-02-21T08:19:08.2118031Z cvt.u64.u32 %rd456, %r742; 2026-02-21T08:19:08.2118087Z shl.b64 %rd457, %rd456, 32; 2026-02-21T08:19:08.2118150Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T08:19:08.2118307Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2118364Z mov.b64 {%r1087, %r1088}, %rd458; 2026-02-21T08:19:08.2118427Z cvt.rn.f16x2.f32 %r1089, %r1088, %r1087; 2026-02-21T08:19:08.2118594Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2118694Z cvt.u64.u32 %rd459, %r743; 2026-02-21T08:19:08.2118749Z cvt.u64.u32 %rd460, %r744; 2026-02-21T08:19:08.2118813Z shl.b64 %rd461, %rd460, 32; 2026-02-21T08:19:08.2118869Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T08:19:08.2119026Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2119090Z mov.b64 {%r1090, %r1091}, %rd462; 2026-02-21T08:19:08.2119153Z cvt.rn.f16x2.f32 %r1092, %r1091, %r1090; 2026-02-21T08:19:08.2119309Z .loc 1 54 52 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2119365Z cvt.u64.u32 %rd463, %r745; 2026-02-21T08:19:08.2119427Z cvt.u64.u32 %rd464, %r746; 2026-02-21T08:19:08.2119482Z shl.b64 %rd465, %rd464, 32; 2026-02-21T08:19:08.2119538Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T08:19:08.2119748Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2119809Z mov.b64 {%r1093, %r1094}, %rd466; 2026-02-21T08:19:08.2119871Z cvt.rn.f16x2.f32 %r1095, %r1094, %r1093; 2026-02-21T08:19:08.2120034Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2120090Z cvt.u64.u32 %rd467, %r747; 2026-02-21T08:19:08.2120144Z cvt.u64.u32 %rd468, %r748; 2026-02-21T08:19:08.2120199Z shl.b64 %rd469, %rd468, 32; 2026-02-21T08:19:08.2120260Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T08:19:08.2120411Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2120468Z mov.b64 {%r1096, %r1097}, %rd470; 2026-02-21T08:19:08.2120537Z cvt.rn.f16x2.f32 %r1098, %r1097, %r1096; 2026-02-21T08:19:08.2120694Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2120748Z cvt.u64.u32 %rd471, %r749; 2026-02-21T08:19:08.2120810Z cvt.u64.u32 %rd472, %r750; 2026-02-21T08:19:08.2120869Z shl.b64 %rd473, %rd472, 32; 2026-02-21T08:19:08.2120923Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T08:19:08.2121077Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2121140Z mov.b64 {%r1099, %r1100}, %rd474; 2026-02-21T08:19:08.2121202Z cvt.rn.f16x2.f32 %r1101, %r1100, %r1099; 2026-02-21T08:19:08.2121357Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2121421Z cvt.u64.u32 %rd475, %r751; 2026-02-21T08:19:08.2121476Z cvt.u64.u32 %rd476, %r752; 2026-02-21T08:19:08.2121532Z shl.b64 %rd477, %rd476, 32; 2026-02-21T08:19:08.2121594Z or.b64 
%rd478, %rd475, %rd477; 2026-02-21T08:19:08.2121746Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2121801Z mov.b64 {%r1102, %r1103}, %rd478; 2026-02-21T08:19:08.2121866Z cvt.rn.f16x2.f32 %r1104, %r1103, %r1102; 2026-02-21T08:19:08.2122030Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2122085Z cvt.u64.u32 %rd479, %r753; 2026-02-21T08:19:08.2122140Z cvt.u64.u32 %rd480, %r754; 2026-02-21T08:19:08.2122202Z shl.b64 %rd481, %rd480, 32; 2026-02-21T08:19:08.2122257Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T08:19:08.2122415Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2122480Z mov.b64 {%r1105, %r1106}, %rd482; 2026-02-21T08:19:08.2122543Z cvt.rn.f16x2.f32 %r1107, %r1106, %r1105; 2026-02-21T08:19:08.2122698Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2122753Z cvt.u64.u32 %rd483, %r755; 2026-02-21T08:19:08.2122815Z cvt.u64.u32 %rd484, %r756; 2026-02-21T08:19:08.2122871Z shl.b64 %rd485, %rd484, 32; 2026-02-21T08:19:08.2122930Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T08:19:08.2123095Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2123195Z mov.b64 {%r1108, %r1109}, %rd486; 2026-02-21T08:19:08.2123260Z cvt.rn.f16x2.f32 %r1110, %r1109, %r1108; 2026-02-21T08:19:08.2123422Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2123478Z cvt.u64.u32 %rd487, %r758; 2026-02-21T08:19:08.2123532Z cvt.u64.u32 %rd488, %r759; 2026-02-21T08:19:08.2123589Z shl.b64 %rd489, %rd488, 32; 2026-02-21T08:19:08.2123655Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T08:19:08.2123810Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2123867Z mov.b64 {%r1111, %r1112}, %rd490; 2026-02-21T08:19:08.2123944Z cvt.rn.f16x2.f32 %r1113, %r1112, %r1111; 
2026-02-21T08:19:08.2124149Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2124211Z cvt.u64.u32 %rd491, %r760; 2026-02-21T08:19:08.2124274Z cvt.u64.u32 %rd492, %r761; 2026-02-21T08:19:08.2124329Z shl.b64 %rd493, %rd492, 32; 2026-02-21T08:19:08.2124385Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T08:19:08.2124549Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2124613Z mov.b64 {%r1114, %r1115}, %rd494; 2026-02-21T08:19:08.2124700Z cvt.rn.f16x2.f32 %r1116, %r1115, %r1114; 2026-02-21T08:19:08.2124865Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2124928Z cvt.u64.u32 %rd495, %r762; 2026-02-21T08:19:08.2124983Z cvt.u64.u32 %rd496, %r763; 2026-02-21T08:19:08.2125038Z shl.b64 %rd497, %rd496, 32; 2026-02-21T08:19:08.2125101Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T08:19:08.2125268Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2125324Z mov.b64 {%r1117, %r1118}, %rd498; 2026-02-21T08:19:08.2125391Z cvt.rn.f16x2.f32 %r1119, %r1118, %r1117; 2026-02-21T08:19:08.2125554Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2125609Z cvt.u64.u32 %rd499, %r764; 2026-02-21T08:19:08.2125665Z cvt.u64.u32 %rd500, %r765; 2026-02-21T08:19:08.2125729Z shl.b64 %rd501, %rd500, 32; 2026-02-21T08:19:08.2125785Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T08:19:08.2125947Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2126009Z mov.b64 {%r1120, %r1121}, %rd502; 2026-02-21T08:19:08.2126072Z cvt.rn.f16x2.f32 %r1122, %r1121, %r1120; 2026-02-21T08:19:08.2126234Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2126288Z cvt.u64.u32 %rd503, %r766; 2026-02-21T08:19:08.2126351Z cvt.u64.u32 %rd504, %r767; 2026-02-21T08:19:08.2126408Z shl.b64 %rd505, %rd504, 
32; 2026-02-21T08:19:08.2126468Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T08:19:08.2126635Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2126692Z mov.b64 {%r1123, %r1124}, %rd506; 2026-02-21T08:19:08.2126756Z cvt.rn.f16x2.f32 %r1125, %r1124, %r1123; 2026-02-21T08:19:08.2126923Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2126980Z cvt.u64.u32 %rd507, %r768; 2026-02-21T08:19:08.2127035Z cvt.u64.u32 %rd508, %r769; 2026-02-21T08:19:08.2127090Z shl.b64 %rd509, %rd508, 32; 2026-02-21T08:19:08.2127156Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T08:19:08.2127315Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2127371Z mov.b64 {%r1126, %r1127}, %rd510; 2026-02-21T08:19:08.2127444Z cvt.rn.f16x2.f32 %r1128, %r1127, %r1126; 2026-02-21T08:19:08.2127605Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2127711Z cvt.u64.u32 %rd511, %r770; 2026-02-21T08:19:08.2127774Z cvt.u64.u32 %rd512, %r771; 2026-02-21T08:19:08.2127829Z shl.b64 %rd513, %rd512, 32; 2026-02-21T08:19:08.2127884Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T08:19:08.2128044Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2128108Z mov.b64 {%r1129, %r1130}, %rd514; 2026-02-21T08:19:08.2128171Z cvt.rn.f16x2.f32 %r1131, %r1130, %r1129; 2026-02-21T08:19:08.2128333Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2128395Z cvt.u64.u32 %rd515, %r772; 2026-02-21T08:19:08.2128450Z cvt.u64.u32 %rd516, %r773; 2026-02-21T08:19:08.2128505Z shl.b64 %rd517, %rd516, 32; 2026-02-21T08:19:08.2128565Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T08:19:08.2128769Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2128830Z mov.b64 {%r1132, %r1133}, %rd518; 2026-02-21T08:19:08.2128894Z 
cvt.rn.f16x2.f32 %r1134, %r1133, %r1132; 2026-02-21T08:19:08.2129059Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2129113Z cvt.u64.u32 %rd519, %r775; 2026-02-21T08:19:08.2129167Z cvt.u64.u32 %rd520, %r776; 2026-02-21T08:19:08.2129229Z shl.b64 %rd521, %rd520, 32; 2026-02-21T08:19:08.2129284Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T08:19:08.2129443Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2129508Z mov.b64 {%r1135, %r1136}, %rd522; 2026-02-21T08:19:08.2129571Z cvt.rn.f16x2.f32 %r1137, %r1136, %r1135; 2026-02-21T08:19:08.2129731Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2129787Z cvt.u64.u32 %rd523, %r777; 2026-02-21T08:19:08.2129853Z cvt.u64.u32 %rd524, %r778; 2026-02-21T08:19:08.2129909Z shl.b64 %rd525, %rd524, 32; 2026-02-21T08:19:08.2129964Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T08:19:08.2130125Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2130181Z mov.b64 {%r1138, %r1139}, %rd526; 2026-02-21T08:19:08.2130246Z cvt.rn.f16x2.f32 %r1140, %r1139, %r1138; 2026-02-21T08:19:08.2130410Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2130465Z cvt.u64.u32 %rd527, %r779; 2026-02-21T08:19:08.2130521Z cvt.u64.u32 %rd528, %r780; 2026-02-21T08:19:08.2130576Z shl.b64 %rd529, %rd528, 32; 2026-02-21T08:19:08.2130639Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T08:19:08.2130799Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2130859Z mov.b64 {%r1141, %r1142}, %rd530; 2026-02-21T08:19:08.2130932Z cvt.rn.f16x2.f32 %r1143, %r1142, %r1141; 2026-02-21T08:19:08.2131092Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2131147Z cvt.u64.u32 %rd531, %r781; 2026-02-21T08:19:08.2131211Z cvt.u64.u32 %rd532, %r782; 
2026-02-21T08:19:08.2131266Z shl.b64 %rd533, %rd532, 32; 2026-02-21T08:19:08.2131323Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T08:19:08.2131480Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2131546Z mov.b64 {%r1144, %r1145}, %rd534; 2026-02-21T08:19:08.2131608Z cvt.rn.f16x2.f32 %r1146, %r1145, %r1144; 2026-02-21T08:19:08.2131766Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2131834Z cvt.u64.u32 %rd535, %r783; 2026-02-21T08:19:08.2131890Z cvt.u64.u32 %rd536, %r784; 2026-02-21T08:19:08.2131947Z shl.b64 %rd537, %rd536, 32; 2026-02-21T08:19:08.2132064Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T08:19:08.2132224Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2132282Z mov.b64 {%r1147, %r1148}, %rd538; 2026-02-21T08:19:08.2132345Z cvt.rn.f16x2.f32 %r1149, %r1148, %r1147; 2026-02-21T08:19:08.2132513Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2132569Z cvt.u64.u32 %rd539, %r785; 2026-02-21T08:19:08.2132627Z cvt.u64.u32 %rd540, %r786; 2026-02-21T08:19:08.2132693Z shl.b64 %rd541, %rd540, 32; 2026-02-21T08:19:08.2132749Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T08:19:08.2132902Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2132967Z mov.b64 {%r1150, %r1151}, %rd542; 2026-02-21T08:19:08.2133030Z cvt.rn.f16x2.f32 %r1152, %r1151, %r1150; 2026-02-21T08:19:08.2133223Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2133283Z cvt.u64.u32 %rd543, %r787; 2026-02-21T08:19:08.2133347Z cvt.u64.u32 %rd544, %r788; 2026-02-21T08:19:08.2133402Z shl.b64 %rd545, %rd544, 32; 2026-02-21T08:19:08.2133457Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T08:19:08.2133618Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2133676Z mov.b64 {%r1153, 
%r1154}, %rd546; 2026-02-21T08:19:08.2133738Z cvt.rn.f16x2.f32 %r1155, %r1154, %r1153; 2026-02-21T08:19:08.2133902Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2133957Z cvt.u64.u32 %rd547, %r789; 2026-02-21T08:19:08.2134012Z cvt.u64.u32 %rd548, %r790; 2026-02-21T08:19:08.2134067Z shl.b64 %rd549, %rd548, 32; 2026-02-21T08:19:08.2134131Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T08:19:08.2134291Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2134350Z mov.b64 {%r1156, %r1157}, %rd550; 2026-02-21T08:19:08.2134420Z cvt.rn.f16x2.f32 %r1158, %r1157, %r1156; 2026-02-21T08:19:08.2134580Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2134636Z cvt.u64.u32 %rd551, %r792; 2026-02-21T08:19:08.2134727Z cvt.u64.u32 %rd552, %r793; 2026-02-21T08:19:08.2134783Z shl.b64 %rd553, %rd552, 32; 2026-02-21T08:19:08.2134840Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T08:19:08.2134997Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2135062Z mov.b64 {%r1159, %r1160}, %rd554; 2026-02-21T08:19:08.2135124Z cvt.rn.f16x2.f32 %r1161, %r1160, %r1159; 2026-02-21T08:19:08.2135284Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2135349Z cvt.u64.u32 %rd555, %r794; 2026-02-21T08:19:08.2135408Z cvt.u64.u32 %rd556, %r795; 2026-02-21T08:19:08.2135463Z shl.b64 %rd557, %rd556, 32; 2026-02-21T08:19:08.2135528Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T08:19:08.2135687Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2135743Z mov.b64 {%r1162, %r1163}, %rd558; 2026-02-21T08:19:08.2135807Z cvt.rn.f16x2.f32 %r1164, %r1163, %r1162; 2026-02-21T08:19:08.2135971Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2136027Z cvt.u64.u32 %rd559, %r796; 
2026-02-21T08:19:08.2136081Z cvt.u64.u32 %rd560, %r797; 2026-02-21T08:19:08.2136145Z shl.b64 %rd561, %rd560, 32; 2026-02-21T08:19:08.2136201Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T08:19:08.2136361Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2136425Z mov.b64 {%r1165, %r1166}, %rd562; 2026-02-21T08:19:08.2136560Z cvt.rn.f16x2.f32 %r1167, %r1166, %r1165; 2026-02-21T08:19:08.2136716Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2136771Z cvt.u64.u32 %rd563, %r798; 2026-02-21T08:19:08.2136831Z cvt.u64.u32 %rd564, %r799; 2026-02-21T08:19:08.2136886Z shl.b64 %rd565, %rd564, 32; 2026-02-21T08:19:08.2136941Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T08:19:08.2137102Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2137158Z mov.b64 {%r1168, %r1169}, %rd566; 2026-02-21T08:19:08.2137220Z cvt.rn.f16x2.f32 %r1170, %r1169, %r1168; 2026-02-21T08:19:08.2137382Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2137438Z cvt.u64.u32 %rd567, %r800; 2026-02-21T08:19:08.2137493Z cvt.u64.u32 %rd568, %r801; 2026-02-21T08:19:08.2137638Z shl.b64 %rd569, %rd568, 32; 2026-02-21T08:19:08.2137706Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T08:19:08.2137864Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2137921Z mov.b64 {%r1171, %r1172}, %rd570; 2026-02-21T08:19:08.2137991Z cvt.rn.f16x2.f32 %r1173, %r1172, %r1171; 2026-02-21T08:19:08.2138148Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2138202Z cvt.u64.u32 %rd571, %r802; 2026-02-21T08:19:08.2138264Z cvt.u64.u32 %rd572, %r803; 2026-02-21T08:19:08.2138319Z shl.b64 %rd573, %rd572, 32; 2026-02-21T08:19:08.2138375Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T08:19:08.2138529Z .loc 1 56 27 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2138592Z mov.b64 {%r1174, %r1175}, %rd574; 2026-02-21T08:19:08.2138655Z cvt.rn.f16x2.f32 %r1176, %r1175, %r1174; 2026-02-21T08:19:08.2138813Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2138879Z cvt.u64.u32 %rd575, %r804; 2026-02-21T08:19:08.2138933Z cvt.u64.u32 %rd576, %r805; 2026-02-21T08:19:08.2138988Z shl.b64 %rd577, %rd576, 32; 2026-02-21T08:19:08.2139049Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T08:19:08.2139212Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2139268Z mov.b64 {%r1177, %r1178}, %rd578; 2026-02-21T08:19:08.2139329Z cvt.rn.f16x2.f32 %r1179, %r1178, %r1177; 2026-02-21T08:19:08.2139493Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2139549Z cvt.u64.u32 %rd579, %r806; 2026-02-21T08:19:08.2139604Z cvt.u64.u32 %rd580, %r807; 2026-02-21T08:19:08.2139668Z shl.b64 %rd581, %rd580, 32; 2026-02-21T08:19:08.2139724Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T08:19:08.2139885Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2139951Z mov.b64 {%r1180, %r1181}, %rd582; 2026-02-21T08:19:08.2140017Z cvt.rn.f16x2.f32 %r1182, %r1181, %r1180; 2026-02-21T08:19:08.2140185Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2140243Z cvt.u64.u32 %rd583, %r809; 2026-02-21T08:19:08.2140309Z cvt.u64.u32 %rd584, %r810; 2026-02-21T08:19:08.2140367Z shl.b64 %rd585, %rd584, 32; 2026-02-21T08:19:08.2140424Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T08:19:08.2140596Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2140656Z mov.b64 {%r1183, %r1184}, %rd586; 2026-02-21T08:19:08.2140722Z cvt.rn.f16x2.f32 %r1185, %r1184, %r1183; 2026-02-21T08:19:08.2140899Z .loc 1 54 52 // 
c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2140960Z cvt.u64.u32 %rd587, %r811; 2026-02-21T08:19:08.2141060Z cvt.u64.u32 %rd588, %r812; 2026-02-21T08:19:08.2141120Z shl.b64 %rd589, %rd588, 32; 2026-02-21T08:19:08.2141190Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T08:19:08.2141362Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2141422Z mov.b64 {%r1186, %r1187}, %rd590; 2026-02-21T08:19:08.2141494Z cvt.rn.f16x2.f32 %r1188, %r1187, %r1186; 2026-02-21T08:19:08.2141662Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2141721Z cvt.u64.u32 %rd591, %r813; 2026-02-21T08:19:08.2141785Z cvt.u64.u32 %rd592, %r814; 2026-02-21T08:19:08.2141843Z shl.b64 %rd593, %rd592, 32; 2026-02-21T08:19:08.2141900Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T08:19:08.2142067Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2142174Z mov.b64 {%r1189, %r1190}, %rd594; 2026-02-21T08:19:08.2142244Z cvt.rn.f16x2.f32 %r1191, %r1190, %r1189; 2026-02-21T08:19:08.2142411Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2142477Z cvt.u64.u32 %rd595, %r815; 2026-02-21T08:19:08.2142534Z cvt.u64.u32 %rd596, %r816; 2026-02-21T08:19:08.2142591Z shl.b64 %rd597, %rd596, 32; 2026-02-21T08:19:08.2142656Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T08:19:08.2142823Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2142884Z mov.b64 {%r1192, %r1193}, %rd598; 2026-02-21T08:19:08.2142949Z cvt.rn.f16x2.f32 %r1194, %r1193, %r1192; 2026-02-21T08:19:08.2143126Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2143183Z cvt.u64.u32 %rd599, %r817; 2026-02-21T08:19:08.2143241Z cvt.u64.u32 %rd600, %r818; 2026-02-21T08:19:08.2143308Z shl.b64 %rd601, %rd600, 32; 2026-02-21T08:19:08.2143369Z or.b64 
%rd602, %rd599, %rd601; 2026-02-21T08:19:08.2143537Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2143603Z mov.b64 {%r1195, %r1196}, %rd602; 2026-02-21T08:19:08.2143669Z cvt.rn.f16x2.f32 %r1197, %r1196, %r1195; 2026-02-21T08:19:08.2143836Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2143892Z cvt.u64.u32 %rd603, %r819; 2026-02-21T08:19:08.2143958Z cvt.u64.u32 %rd604, %r820; 2026-02-21T08:19:08.2144015Z shl.b64 %rd605, %rd604, 32; 2026-02-21T08:19:08.2144074Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T08:19:08.2144247Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2144306Z mov.b64 {%r1198, %r1199}, %rd606; 2026-02-21T08:19:08.2144372Z cvt.rn.f16x2.f32 %r1200, %r1199, %r1198; 2026-02-21T08:19:08.2144549Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2144609Z cvt.u64.u32 %rd607, %r821; 2026-02-21T08:19:08.2144665Z cvt.u64.u32 %rd608, %r822; 2026-02-21T08:19:08.2144753Z shl.b64 %rd609, %rd608, 32; 2026-02-21T08:19:08.2144820Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T08:19:08.2144987Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2145046Z mov.b64 {%r1201, %r1202}, %rd610; 2026-02-21T08:19:08.2145119Z cvt.rn.f16x2.f32 %r1203, %r1202, %r1201; 2026-02-21T08:19:08.2145286Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2145344Z cvt.u64.u32 %rd611, %r823; 2026-02-21T08:19:08.2145409Z cvt.u64.u32 %rd612, %r824; 2026-02-21T08:19:08.2145466Z shl.b64 %rd613, %rd612, 32; 2026-02-21T08:19:08.2145524Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T08:19:08.2145693Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2145809Z mov.b64 {%r1204, %r1205}, %rd614; 2026-02-21T08:19:08.2145874Z cvt.rn.f16x2.f32 %r1206, %r1205, %r1204; 
2026-02-21T08:19:08.2146036Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2146101Z cvt.u64.u32 %rd615, %r826; 2026-02-21T08:19:08.2146158Z cvt.u64.u32 %rd616, %r827; 2026-02-21T08:19:08.2146216Z shl.b64 %rd617, %rd616, 32; 2026-02-21T08:19:08.2146279Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T08:19:08.2146445Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2146504Z mov.b64 {%r1207, %r1208}, %rd618; 2026-02-21T08:19:08.2146569Z cvt.rn.f16x2.f32 %r1209, %r1208, %r1207; 2026-02-21T08:19:08.2146734Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2146836Z cvt.u64.u32 %rd619, %r828; 2026-02-21T08:19:08.2146897Z cvt.u64.u32 %rd620, %r829; 2026-02-21T08:19:08.2146961Z shl.b64 %rd621, %rd620, 32; 2026-02-21T08:19:08.2147020Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T08:19:08.2147182Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2147249Z mov.b64 {%r1210, %r1211}, %rd622; 2026-02-21T08:19:08.2147316Z cvt.rn.f16x2.f32 %r1212, %r1211, %r1210; 2026-02-21T08:19:08.2147478Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2147536Z cvt.u64.u32 %rd623, %r830; 2026-02-21T08:19:08.2147601Z cvt.u64.u32 %rd624, %r831; 2026-02-21T08:19:08.2147658Z shl.b64 %rd625, %rd624, 32; 2026-02-21T08:19:08.2147716Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T08:19:08.2147888Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2147949Z mov.b64 {%r1213, %r1214}, %rd626; 2026-02-21T08:19:08.2148014Z cvt.rn.f16x2.f32 %r1215, %r1214, %r1213; 2026-02-21T08:19:08.2148187Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2148245Z cvt.u64.u32 %rd627, %r832; 2026-02-21T08:19:08.2148310Z cvt.u64.u32 %rd628, %r833; 2026-02-21T08:19:08.2148365Z shl.b64 %rd629, %rd628, 
32; 2026-02-21T08:19:08.2148427Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T08:19:08.2148585Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2148642Z mov.b64 {%r1216, %r1217}, %rd630; 2026-02-21T08:19:08.2148711Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T08:19:08.2148867Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2148923Z cvt.u64.u32 %rd631, %r834; 2026-02-21T08:19:08.2148985Z cvt.u64.u32 %rd632, %r835; 2026-02-21T08:19:08.2149040Z shl.b64 %rd633, %rd632, 32; 2026-02-21T08:19:08.2149099Z or.b64 %rd634, %rd631, %rd633; 2026-02-21T08:19:08.2149258Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2149323Z mov.b64 {%r1219, %r1220}, %rd634; 2026-02-21T08:19:08.2149385Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T08:19:08.2149542Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2149606Z cvt.u64.u32 %rd635, %r836; 2026-02-21T08:19:08.2149661Z cvt.u64.u32 %rd636, %r837; 2026-02-21T08:19:08.2149718Z shl.b64 %rd637, %rd636, 32; 2026-02-21T08:19:08.2149782Z or.b64 %rd638, %rd635, %rd637; 2026-02-21T08:19:08.2149938Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2149996Z mov.b64 {%r1222, %r1223}, %rd638; 2026-02-21T08:19:08.2150066Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T08:19:08.2150233Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2150327Z cvt.u64.u32 %rd639, %r838; 2026-02-21T08:19:08.2150381Z cvt.u64.u32 %rd640, %r839; 2026-02-21T08:19:08.2150446Z shl.b64 %rd641, %rd640, 32; 2026-02-21T08:19:08.2150502Z or.b64 %rd642, %rd639, %rd641; 2026-02-21T08:19:08.2150659Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2150722Z mov.b64 {%r1225, %r1226}, %rd642; 2026-02-21T08:19:08.2150786Z 
cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T08:19:08.2150942Z .loc 1 54 52 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:54:52 2026-02-21T08:19:08.2151005Z cvt.u64.u32 %rd643, %r840; 2026-02-21T08:19:08.2151060Z cvt.u64.u32 %rd644, %r841; 2026-02-21T08:19:08.2151115Z shl.b64 %rd645, %rd644, 32; 2026-02-21T08:19:08.2151170Z or.b64 %rd646, %rd643, %rd645; 2026-02-21T08:19:08.2151372Z .loc 1 56 27 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:56:27 2026-02-21T08:19:08.2151434Z mov.b64 {%r1228, %r1229}, %rd646; 2026-02-21T08:19:08.2151499Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T08:19:08.2151666Z .loc 1 57 45 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:57:45 2026-02-21T08:19:08.2151739Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:19:08.2151791Z bar.sync 0; 2026-02-21T08:19:08.2151886Z st.shared.v4.b32 [%r11], {%r849, %r852, %r855, %r858}; 2026-02-21T08:19:08.2151995Z st.shared.v4.b32 [%r11+32768], {%r945, %r948, %r951, %r954}; 2026-02-21T08:19:08.2152101Z st.shared.v4.b32 [%r11+65536], {%r1041, %r1044, %r1047, %r1050}; 2026-02-21T08:19:08.2152199Z st.shared.v4.b32 [%r11+98304], {%r1137, %r1140, %r1143, %r1146}; 2026-02-21T08:19:08.2152292Z st.shared.v4.b32 [%r12], {%r861, %r864, %r867, %r870}; 2026-02-21T08:19:08.2152383Z st.shared.v4.b32 [%r12+32768], {%r957, %r960, %r963, %r966}; 2026-02-21T08:19:08.2152480Z st.shared.v4.b32 [%r12+65536], {%r1053, %r1056, %r1059, %r1062}; 2026-02-21T08:19:08.2152584Z st.shared.v4.b32 [%r12+98304], {%r1149, %r1152, %r1155, %r1158}; 2026-02-21T08:19:08.2152669Z st.shared.v4.b32 [%r13], {%r873, %r876, %r879, %r882}; 2026-02-21T08:19:08.2152762Z st.shared.v4.b32 [%r13+32768], {%r969, %r972, %r975, %r978}; 2026-02-21T08:19:08.2152862Z st.shared.v4.b32 [%r13+65536], {%r1065, %r1068, %r1071, %r1074}; 2026-02-21T08:19:08.2152955Z st.shared.v4.b32 [%r13+98304], {%r1161, %r1164, %r1167, %r1170}; 2026-02-21T08:19:08.2153038Z st.shared.v4.b32 [%r14], {%r885, %r888, %r891, %r894}; 
2026-02-21T08:19:08.2153126Z st.shared.v4.b32 [%r14+32768], {%r981, %r984, %r987, %r990}; 2026-02-21T08:19:08.2153226Z st.shared.v4.b32 [%r14+65536], {%r1077, %r1080, %r1083, %r1086}; 2026-02-21T08:19:08.2153319Z st.shared.v4.b32 [%r14+98304], {%r1173, %r1176, %r1179, %r1182}; 2026-02-21T08:19:08.2153402Z st.shared.v4.b32 [%r15], {%r897, %r900, %r903, %r906}; 2026-02-21T08:19:08.2153506Z st.shared.v4.b32 [%r15+32768], {%r993, %r996, %r999, %r1002}; 2026-02-21T08:19:08.2153601Z st.shared.v4.b32 [%r15+65536], {%r1089, %r1092, %r1095, %r1098}; 2026-02-21T08:19:08.2153692Z st.shared.v4.b32 [%r15+98304], {%r1185, %r1188, %r1191, %r1194}; 2026-02-21T08:19:08.2153781Z st.shared.v4.b32 [%r16], {%r909, %r912, %r915, %r918}; 2026-02-21T08:19:08.2153872Z st.shared.v4.b32 [%r16+32768], {%r1005, %r1008, %r1011, %r1014}; 2026-02-21T08:19:08.2153964Z st.shared.v4.b32 [%r16+65536], {%r1101, %r1104, %r1107, %r1110}; 2026-02-21T08:19:08.2154063Z st.shared.v4.b32 [%r16+98304], {%r1197, %r1200, %r1203, %r1206}; 2026-02-21T08:19:08.2154144Z st.shared.v4.b32 [%r17], {%r921, %r924, %r927, %r930}; 2026-02-21T08:19:08.2154234Z st.shared.v4.b32 [%r17+32768], {%r1017, %r1020, %r1023, %r1026}; 2026-02-21T08:19:08.2154324Z st.shared.v4.b32 [%r17+65536], {%r1113, %r1116, %r1119, %r1122}; 2026-02-21T08:19:08.2154423Z st.shared.v4.b32 [%r17+98304], {%r1209, %r1212, %r1215, %r1218}; 2026-02-21T08:19:08.2154506Z st.shared.v4.b32 [%r18], {%r933, %r936, %r939, %r942}; 2026-02-21T08:19:08.2154649Z st.shared.v4.b32 [%r18+32768], {%r1029, %r1032, %r1035, %r1038}; 2026-02-21T08:19:08.2154782Z st.shared.v4.b32 [%r18+65536], {%r1125, %r1128, %r1131, %r1134}; 2026-02-21T08:19:08.2154873Z st.shared.v4.b32 [%r18+98304], {%r1221, %r1224, %r1227, %r1230}; 2026-02-21T08:19:08.2154930Z // begin inline asm 2026-02-21T08:19:08.2155009Z fence.proxy.async.shared::cta; 2026-02-21T08:19:08.2155064Z // end inline asm 2026-02-21T08:19:08.2155116Z bar.sync 0; 2026-02-21T08:19:08.2155181Z elect.sync %r1231|%p125, -1; 
2026-02-21T08:19:08.2155250Z and.pred %p123, %p124, %p125; 2026-02-21T08:19:08.2155308Z shl.b32 %r1232, %r22, 15; 2026-02-21T08:19:08.2155368Z add.s32 %r845, %r37, %r1232; 2026-02-21T08:19:08.2155431Z shl.b32 %r1233, %r22, 6; 2026-02-21T08:19:08.2155488Z or.b32 %r843, %r1233, %r20; 2026-02-21T08:19:08.2155543Z // begin inline asm 2026-02-21T08:19:08.2155779Z @%p123 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd134, {%r843, %r844}], [%r845]; 2026-02-21T08:19:08.2155835Z // end inline asm 2026-02-21T08:19:08.2155900Z cp.async.bulk.commit_group; 2026-02-21T08:19:08.2155983Z $L__BB0_8: // %._crit_edge 2026-02-21T08:19:08.2156147Z .loc 1 29 75 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:29:75 2026-02-21T08:19:08.2156218Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:19:08.2156268Z bar.sync 0; 2026-02-21T08:19:08.2156434Z .loc 1 29 4 // c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py:29:4 2026-02-21T08:19:08.2156486Z bar.sync 0; 2026-02-21T08:19:08.2156540Z // begin inline asm 2026-02-21T08:19:08.2156659Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1234, 512; 2026-02-21T08:19:08.2156712Z // end inline asm 2026-02-21T08:19:08.2156762Z ret; 2026-02-21T08:19:08.2156815Z $L__tmp1: 2026-02-21T08:19:08.2156877Z $L__func_end0: 2026-02-21T08:19:08.2156959Z // -- End function 2026-02-21T08:19:08.2157010Z } 2026-02-21T08:19:08.2157223Z .file 1 "/tmp/torchinductor_root/6a/c6ak5m7qyy3f64kkv6pcde7qqg35xjoztcdnm4bm6yposplhhled.py" 2026-02-21T08:19:08.2157283Z .section .debug_abbrev 2026-02-21T08:19:08.2157332Z { 2026-02-21T08:19:08.2157421Z .b8 1 // Abbreviation Code 2026-02-21T08:19:08.2157513Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:19:08.2157591Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:19:08.2157668Z .b8 37 // DW_AT_producer 2026-02-21T08:19:08.2157748Z .b8 8 // DW_FORM_string 2026-02-21T08:19:08.2157820Z .b8 19 // DW_AT_language 2026-02-21T08:19:08.2157894Z .b8 5 // DW_FORM_data2 2026-02-21T08:19:08.2157973Z .b8 3 // DW_AT_name 
2026-02-21T08:19:08.2158046Z .b8 8 // DW_FORM_string 2026-02-21T08:19:08.2158126Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:19:08.2158198Z .b8 6 // DW_FORM_data4 2026-02-21T08:19:08.2158279Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:19:08.2158349Z .b8 8 // DW_FORM_string 2026-02-21T08:19:08.2158418Z .b8 0 // EOM(1) 2026-02-21T08:19:08.2158496Z .b8 0 // EOM(2) 2026-02-21T08:19:08.2158560Z .b8 0 // EOM(3) 2026-02-21T08:19:08.2158609Z } 2026-02-21T08:19:08.2158674Z .section .debug_info 2026-02-21T08:19:08.2158722Z { 2026-02-21T08:19:08.2158801Z .b32 104 // Length of Unit 2026-02-21T08:19:08.2158883Z .b8 2 // DWARF version number 2026-02-21T08:19:08.2158942Z .b8 0 2026-02-21T08:19:08.2159054Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:19:08.2159136Z .b8 8 // Address Size (in bytes) 2026-02-21T08:19:08.2159288Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:19:08.2159367Z .b8 116 // DW_AT_producer 2026-02-21T08:19:08.2159419Z .b8 114 2026-02-21T08:19:08.2159476Z .b8 105 2026-02-21T08:19:08.2159524Z .b8 116 2026-02-21T08:19:08.2159573Z .b8 111 2026-02-21T08:19:08.2159622Z .b8 110 2026-02-21T08:19:08.2159678Z .b8 0 2026-02-21T08:19:08.2159750Z .b8 2 // DW_AT_language 2026-02-21T08:19:08.2159797Z .b8 0 2026-02-21T08:19:08.2159869Z .b8 99 // DW_AT_name 2026-02-21T08:19:08.2159926Z .b8 54 2026-02-21T08:19:08.2159973Z .b8 97 2026-02-21T08:19:08.2160021Z .b8 107 2026-02-21T08:19:08.2160078Z .b8 53 2026-02-21T08:19:08.2160127Z .b8 109 2026-02-21T08:19:08.2160175Z .b8 55 2026-02-21T08:19:08.2160225Z .b8 113 2026-02-21T08:19:08.2160282Z .b8 121 2026-02-21T08:19:08.2160366Z .b8 121 2026-02-21T08:19:08.2160419Z .b8 51 2026-02-21T08:19:08.2160480Z .b8 102 2026-02-21T08:19:08.2160532Z .b8 54 2026-02-21T08:19:08.2160583Z .b8 52 2026-02-21T08:19:08.2160634Z .b8 107 2026-02-21T08:19:08.2160693Z .b8 107 2026-02-21T08:19:08.2160744Z .b8 118 2026-02-21T08:19:08.2160795Z .b8 54 2026-02-21T08:19:08.2160854Z .b8 112 2026-02-21T08:19:08.2160904Z .b8 99 
2026-02-21T08:19:08.2160955Z .b8 100 2026-02-21T08:19:08.2161008Z .b8 101 2026-02-21T08:19:08.2161066Z .b8 55 2026-02-21T08:19:08.2161117Z .b8 113 2026-02-21T08:19:08.2161168Z .b8 113 2026-02-21T08:19:08.2161218Z .b8 103 2026-02-21T08:19:08.2161278Z .b8 51 2026-02-21T08:19:08.2161329Z .b8 53 2026-02-21T08:19:08.2161380Z .b8 120 2026-02-21T08:19:08.2161437Z .b8 106 2026-02-21T08:19:08.2161487Z .b8 111 2026-02-21T08:19:08.2161538Z .b8 122 2026-02-21T08:19:08.2161589Z .b8 116 2026-02-21T08:19:08.2161646Z .b8 99 2026-02-21T08:19:08.2161698Z .b8 100 2026-02-21T08:19:08.2161748Z .b8 110 2026-02-21T08:19:08.2161806Z .b8 109 2026-02-21T08:19:08.2161859Z .b8 52 2026-02-21T08:19:08.2161912Z .b8 98 2026-02-21T08:19:08.2161964Z .b8 109 2026-02-21T08:19:08.2162020Z .b8 54 2026-02-21T08:19:08.2162071Z .b8 121 2026-02-21T08:19:08.2162121Z .b8 112 2026-02-21T08:19:08.2162178Z .b8 111 2026-02-21T08:19:08.2162228Z .b8 115 2026-02-21T08:19:08.2162280Z .b8 112 2026-02-21T08:19:08.2162330Z .b8 108 2026-02-21T08:19:08.2162389Z .b8 104 2026-02-21T08:19:08.2162440Z .b8 104 2026-02-21T08:19:08.2162490Z .b8 108 2026-02-21T08:19:08.2162540Z .b8 101 2026-02-21T08:19:08.2162599Z .b8 100 2026-02-21T08:19:08.2162649Z .b8 46 2026-02-21T08:19:08.2162702Z .b8 112 2026-02-21T08:19:08.2162759Z .b8 121 2026-02-21T08:19:08.2162809Z .b8 0 2026-02-21T08:19:08.2162901Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:19:08.2162974Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:19:08.2163032Z .b8 116 2026-02-21T08:19:08.2163083Z .b8 109 2026-02-21T08:19:08.2163135Z .b8 112 2026-02-21T08:19:08.2163191Z .b8 47 2026-02-21T08:19:08.2163246Z .b8 116 2026-02-21T08:19:08.2163299Z .b8 111 2026-02-21T08:19:08.2163351Z .b8 114 2026-02-21T08:19:08.2163408Z .b8 99 2026-02-21T08:19:08.2163458Z .b8 104 2026-02-21T08:19:08.2163510Z .b8 105 2026-02-21T08:19:08.2163568Z .b8 110 2026-02-21T08:19:08.2163619Z .b8 100 2026-02-21T08:19:08.2163671Z .b8 117 2026-02-21T08:19:08.2163722Z .b8 99 2026-02-21T08:19:08.2163781Z .b8 116 
2026-02-21T08:19:08.2163833Z .b8 111 2026-02-21T08:19:08.2163886Z .b8 114 2026-02-21T08:19:08.2163937Z .b8 95 2026-02-21T08:19:08.2163998Z .b8 114 2026-02-21T08:19:08.2164051Z .b8 111 2026-02-21T08:19:08.2164103Z .b8 111 2026-02-21T08:19:08.2164165Z .b8 116 2026-02-21T08:19:08.2164219Z .b8 47 2026-02-21T08:19:08.2164270Z .b8 54 2026-02-21T08:19:08.2164320Z .b8 97 2026-02-21T08:19:08.2164379Z .b8 0 2026-02-21T08:19:08.2164430Z } 2026-02-21T08:19:08.2164496Z .section .debug_macinfo { } 2026-02-21T08:19:08.2164501Z 2026-02-21T08:19:08.2164587Z ================================================================ 2026-02-21T08:19:08.2164725Z please share the reproducer above with Triton project. 2026-02-21T08:19:08.4937330Z 2026-02-21T08:19:08.4939815Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 26.4 configs/s 2026-02-21T08:19:09.0508366Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1761.0 2026-02-21T08:19:09.0509709Z configs/s 2026-02-21T08:19:09.1028915Z [177s] Generation 11 complete: 2026-02-21T08:19:09.1029238Z error=8 2026-02-21T08:19:09.1029399Z ok=9 2026-02-21T08:19:09.1029529Z min=0.0369 2026-02-21T08:19:09.1029701Z mid=0.0389 2026-02-21T08:19:09.1029840Z max=0.1003 2026-02-21T08:19:09.1030002Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:19:09.1030288Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:19:09.1030547Z 'l2_groupings': [8], 2026-02-21T08:19:09.1030744Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:19:09.1030945Z 'loop_orders': [[0, 1]], 2026-02-21T08:19:09.1031503Z 'num_stages': 3, 2026-02-21T08:19:09.1031680Z 'num_warps': 8, 2026-02-21T08:19:09.1031849Z 'pid_type': 'flat', 2026-02-21T08:19:09.1032021Z 'range_flattens': [None, None], 2026-02-21T08:19:09.1032230Z 'range_multi_buffers': [None, False], 2026-02-21T08:19:09.1032436Z 'range_num_stages': [0, 0], 2026-02-21T08:19:09.1032614Z 'range_unroll_factors': [0, 0], 2026-02-21T08:19:09.1032808Z 'range_warp_specializes': 
[None, False]} 2026-02-21T08:19:09.1052750Z [177s] Fitting surrogate: 738 points, 738 targets 2026-02-21T08:19:09.5244948Z [177s] Generation 12 starting: 12 neighbors, 1 active search path(s) 2026-02-21T08:19:10.6652650Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 12/12 16.0 configs/s 2026-02-21T08:19:11.2058077Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 12/12 25.1 configs/s 2026-02-21T08:19:12.1003015Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1111.3 2026-02-21T08:19:12.1004526Z configs/s 2026-02-21T08:19:12.1749122Z [180s] Generation 12 complete: 2026-02-21T08:19:12.1749490Z error=3 2026-02-21T08:19:12.1749645Z ok=11 2026-02-21T08:19:12.1749833Z min=0.0369 2026-02-21T08:19:12.1749985Z mid=0.0389 2026-02-21T08:19:12.1750112Z max=0.0430 2026-02-21T08:19:12.1750280Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:19:12.1750536Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:19:12.1750780Z 'l2_groupings': [8], 2026-02-21T08:19:12.1750955Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:19:12.1751165Z 'loop_orders': [[0, 1]], 2026-02-21T08:19:12.1751329Z 'num_stages': 3, 2026-02-21T08:19:12.1751477Z 'num_warps': 8, 2026-02-21T08:19:12.1751630Z 'pid_type': 'flat', 2026-02-21T08:19:12.1751795Z 'range_flattens': [None, None], 2026-02-21T08:19:12.1751992Z 'range_multi_buffers': [None, False], 2026-02-21T08:19:12.1752185Z 'range_num_stages': [0, 0], 2026-02-21T08:19:12.1752394Z 'range_unroll_factors': [0, 0], 2026-02-21T08:19:12.1752585Z 'range_warp_specializes': [None, False]} 2026-02-21T08:19:12.1774975Z [180s] Fitting surrogate: 752 points, 752 targets 2026-02-21T08:19:12.4988971Z [180s] Autotuning complete in 180.6s after searching 721 configs. 
2026-02-21T08:19:12.4989391Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:19:12.4990738Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T08:19:12.4991900Z 2026-02-21T08:19:12.4992210Z [180s] Code of selected kernel: /tmp/torchinductor_root/xv/cxvvzzxrz2uiglqio7bwbcs5ibnsakw3qffrck44pulpsoalb5tt.py 2026-02-21T08:19:27.9070214Z WARNING:tritonbench.utils.triton_op:Completed input ID 2: 2026-02-21T08:19:27.9071533Z (M, N, K) 2026-02-21T08:19:27.9071811Z ------------------ 2026-02-21T08:19:27.9072077Z (4096, 2048, 2048) 2026-02-21T08:19:27.9072247Z 2026-02-21T08:19:27.9077787Z 25%|██▌ | 2/8 [09:16<28:39, 286.61s/it]WARNING:tritonbench.utils.triton_op:Running input ID 3: 2026-02-21T08:19:27.9078423Z (M, N, K) 2026-02-21T08:19:27.9078647Z ------------------ 2026-02-21T08:19:27.9078904Z (2048, 4096, 2048) 2026-02-21T08:19:27.9081266Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:20:10.6848016Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:20:49.1184076Z Autotune Choices Stats: 2026-02-21T08:20:49.1186751Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.040832001715898514, "best_triton_pos": 0} 2026-02-21T08:20:49.1191250Z AUTOTUNE mm(2048x2048, 2048x4096) 2026-02-21T08:20:49.1197456Z strides: [2048, 1], [1, 2048] 
2026-02-21T08:20:49.1200595Z dtypes: torch.float16, torch.float16 2026-02-21T08:20:49.1206488Z triton_mm_55 0.0408 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:20:49.1207605Z triton_mm_49 0.0530 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:20:49.1208308Z triton_mm_56 0.0531 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:20:49.1209015Z triton_mm_54 0.0612 ms 66.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:20:49.1209692Z triton_mm_52 0.0633 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:20:49.1210353Z triton_mm_48 0.0634 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:20:49.1212440Z triton_mm_51 0.0634 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:20:49.1215443Z triton_mm_45 0.0645 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:20:49.1216217Z triton_mm_47 0.0695 ms 58.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:20:49.1216933Z triton_mm_50 0.0766 ms 53.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T08:20:49.1217541Z SingleProcess AUTOTUNE benchmarking takes 0.3936 seconds and 0.2901 seconds precompiling for 19 choices 2026-02-21T08:20:49.3968155Z INFO:tritonbench.utils.triton_op:Took 1132.37ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:21:28.1176880Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:21:28.1180572Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:21:28.1185183Z 'dtype': 'torch.float16', 2026-02-21T08:21:28.1189235Z 'shape': (2048, 2048), 2026-02-21T08:21:28.1193769Z 'stride': (2048, 1)}, 2026-02-21T08:21:28.1198346Z { 'device': 'cuda:0', 2026-02-21T08:21:28.1200035Z 'dtype': 'torch.float16', 2026-02-21T08:21:28.1200317Z 'shape': (2048, 4096), 2026-02-21T08:21:28.1205204Z 'stride': (1, 2048)}, 2026-02-21T08:21:28.1209372Z None), 2026-02-21T08:21:28.1213214Z 'kwargs': {}} 2026-02-21T08:21:28.1217361Z INFO:tritonbench.utils.triton_op:Took 4.56ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:21:28.2147557Z [0s] Autotune random seed: 2134884919 2026-02-21T08:21:28.3438453Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:21:32.0091116Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 37.5 configs/s 2026-02-21T08:21:45.4147109Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 7.4 configs/s 2026-02-21T08:21:45.4158122Z [17s] Adaptive compile timeout: 30s (90% percentile=2.2s, bounds=[30.0s, 30s]) 2026-02-21T08:21:45.4160531Z [17s] Initial random population of 100, 5 starting points: 2026-02-21T08:21:45.4161239Z error=11 2026-02-21T08:21:45.4165466Z ok=89 2026-02-21T08:21:45.4170018Z min=0.1597 2026-02-21T08:21:45.4173979Z mid=2.5601 2026-02-21T08:21:45.4178015Z max=267.5334 2026-02-21T08:21:45.4182627Z best={'block_sizes': 
[64, 128, 16], 2026-02-21T08:21:45.4186642Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:21:45.4190594Z 'l2_groupings': [4], 2026-02-21T08:21:45.4192286Z 'load_eviction_policies': ['', ''], 2026-02-21T08:21:45.4192529Z 'loop_orders': [[1, 0]], 2026-02-21T08:21:45.4192688Z 'num_stages': 8, 2026-02-21T08:21:45.4192838Z 'num_warps': 2, 2026-02-21T08:21:45.4193060Z 'pid_type': 'flat', 2026-02-21T08:21:45.4193227Z 'range_flattens': [None, None], 2026-02-21T08:21:45.4193401Z 'range_multi_buffers': [None, None], 2026-02-21T08:21:45.4193591Z 'range_num_stages': [0, 0], 2026-02-21T08:21:45.4193775Z 'range_unroll_factors': [0, 0], 2026-02-21T08:21:45.4198484Z 'range_warp_specializes': [None, None]} 2026-02-21T08:21:45.4200244Z [17s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:21:46.8768377Z [18s] Generation 1 starting: 84 neighbors, 5 active search path(s) 2026-02-21T08:21:51.0283476Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 38.7 configs/s 2026-02-21T08:21:54.7081112Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:21:54.7082444Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:54.7084445Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:54.7084642Z 2026-02-21T08:21:54.7084646Z 2026-02-21T08:21:54.7084983Z ================================================================ 2026-02-21T08:21:54.7085210Z Internal Triton PTX codegen error 2026-02-21T08:21:54.7085387Z `ptxas` stderr: 2026-02-21T08:21:54.7085809Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:21:54.7086294Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:54.7086437Z 2026-02-21T08:21:54.7086859Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpc9urzb6q.ptx -o /tmp/tmpc9urzb6q.ptx.o 2026-02-21T08:21:54.7087306Z 2026-02-21T08:21:54.7087309Z 2026-02-21T08:21:54.7087372Z // 2026-02-21T08:21:54.7087529Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:54.7087701Z // 2026-02-21T08:21:54.7087781Z 2026-02-21T08:21:54.7087840Z .version 8.7 2026-02-21T08:21:54.7087989Z .target sm_100a 2026-02-21T08:21:54.7088128Z .address_size 64 2026-02-21T08:21:54.7088528Z 2026-02-21T08:21:54.7088662Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:54.7088913Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:54.7089134Z // @_helion_matmul 2026-02-21T08:21:54.7089331Z .visible .entry _helion_matmul( 2026-02-21T08:21:54.7089550Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:54.7089799Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:54.7090037Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:54.7090280Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:54.7090534Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:54.7090746Z ) 2026-02-21T08:21:54.7090867Z .reqntid 128 2026-02-21T08:21:54.7091003Z .maxnreg 32 2026-02-21T08:21:54.7091126Z { 2026-02-21T08:21:54.7091260Z .reg .pred %p<120>; 2026-02-21T08:21:54.7091534Z .reg .b32 %r<406>; 2026-02-21T08:21:54.7091697Z .reg .b64 %rd<136>; 2026-02-21T08:21:54.7091983Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19:0 2026-02-21T08:21:54.7092289Z $L__func_begin0: 2026-02-21T08:21:54.7092549Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19:0 2026-02-21T08:21:54.7092786Z 
2026-02-21T08:21:54.7092841Z // %bb.0: 2026-02-21T08:21:54.7093008Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:21:54.7093202Z $L__tmp0: 2026-02-21T08:21:54.7093445Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19 2026-02-21T08:21:54.7093748Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:54.7093923Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:21:54.7094133Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:54.7094299Z mov.b32 %r30, global_smem; 2026-02-21T08:21:54.7094466Z // begin inline asm 2026-02-21T08:21:54.7094767Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r30], 64; 2026-02-21T08:21:54.7095029Z // end inline asm 2026-02-21T08:21:54.7095192Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T08:21:54.7095388Z bar.sync 0; 2026-02-21T08:21:54.7095540Z ld.shared.b32 %r397, [global_smem]; 2026-02-21T08:21:54.7095714Z bar.sync 0; 2026-02-21T08:21:54.7095853Z // begin inline asm 2026-02-21T08:21:54.7096062Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:54.7096298Z // end inline asm 2026-02-21T08:21:54.7096546Z .loc 1 21 67 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:21:67 2026-02-21T08:21:54.7096862Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:54.7097018Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:21:54.7097182Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:21:54.7097341Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:21:54.7097494Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:21:54.7097663Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T08:21:54.7097849Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:21:54.7098035Z shl.b32 %r53, %r52, 8; 2026-02-21T08:21:54.7098191Z cvt.s64.s32 %rd41, %r53; 2026-02-21T08:21:54.7098355Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T08:21:54.7098520Z shl.b32 %r54, %r1, 2; 2026-02-21T08:21:54.7098669Z add.s32 %r31, %r30, %r54; 2026-02-21T08:21:54.7098810Z mov.b32 %r40, 0; 2026-02-21T08:21:54.7098947Z // begin inline asm 2026-02-21T08:21:54.7099230Z 
`ptxas` stderr: 2026-02-21T08:21:54.7099623Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:54.7100076Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:54.7100216Z 2026-02-21T08:21:54.7100579Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpc9urzb6q.ptx -o /tmp/tmpc9urzb6q.ptx.o 2026-02-21T08:21:54.7101028Z 2026-02-21T08:21:54.7101152Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:21:54.7101478Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:54.7101647Z // end inline asm 2026-02-21T08:21:54.7101790Z bar.warp.sync -1; 2026-02-21T08:21:54.7101939Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T08:21:54.7102103Z cvt.u64.u32 %rd4, %r30; 2026-02-21T08:21:54.7102249Z // begin inline asm 2026-02-21T08:21:54.7102507Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:21:54.7102787Z // end inline asm 2026-02-21T08:21:54.7102920Z // begin inline asm 2026-02-21T08:21:54.7103170Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:54.7103422Z // end inline asm 2026-02-21T08:21:54.7103558Z mov.b32 %r33, 16; 2026-02-21T08:21:54.7103696Z // begin inline asm 2026-02-21T08:21:54.7103923Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:54.7104253Z // end inline asm 2026-02-21T08:21:54.7104384Z mov.b32 %r34, 64; 2026-02-21T08:21:54.7104519Z // begin inline asm 2026-02-21T08:21:54.7104766Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:54.7105034Z // end inline asm 2026-02-21T08:21:54.7105164Z mov.b32 %r35, 2048; 2026-02-21T08:21:54.7105305Z // begin inline asm 2026-02-21T08:21:54.7105544Z @%p110 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:54.7105808Z // end inline asm 2026-02-21T08:21:54.7105946Z // begin inline asm 2026-02-21T08:21:54.7106177Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:21:54.7106448Z // end inline asm 2026-02-21T08:21:54.7106577Z mov.b64 %rd12, 4096; 2026-02-21T08:21:54.7106722Z // begin inline asm 2026-02-21T08:21:54.7106974Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:54.7107252Z // end inline asm 2026-02-21T08:21:54.7107391Z mov.b32 %r37, 1; 2026-02-21T08:21:54.7107518Z // begin inline asm 2026-02-21T08:21:54.7107772Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:54.7108046Z // end inline asm 2026-02-21T08:21:54.7108178Z // begin inline asm 2026-02-21T08:21:54.7108414Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:54.7108699Z // end inline asm 2026-02-21T08:21:54.7108833Z // begin inline asm 2026-02-21T08:21:54.7109056Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:54.7109323Z // end inline asm 2026-02-21T08:21:54.7109448Z // begin inline asm 2026-02-21T08:21:54.7109698Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:54.7109968Z // end inline asm 2026-02-21T08:21:54.7110101Z // begin inline asm 2026-02-21T08:21:54.7110333Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:54.7110601Z // end inline asm 2026-02-21T08:21:54.7110734Z // begin inline asm 2026-02-21T08:21:54.7110951Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:54.7111206Z // end inline asm 2026-02-21T08:21:54.7111333Z // begin inline asm 2026-02-21T08:21:54.7111687Z 
@%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:54.7112054Z // end inline asm 2026-02-21T08:21:54.7112183Z // begin inline asm 2026-02-21T08:21:54.7112393Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:21:54.7112635Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:54.7112825Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:54.7112997Z // end inline asm 2026-02-21T08:21:54.7113136Z bar.sync 0; 2026-02-21T08:21:54.7113274Z cvta.global.u64 %rd59, %rd19; 2026-02-21T08:21:54.7113553Z .loc 1 22 67 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:22:67 2026-02-21T08:21:54.7113899Z add.s64 %rd37, %rd19, 128; 2026-02-21T08:21:54.7114047Z bar.sync 0; 2026-02-21T08:21:54.7114180Z // begin inline asm 2026-02-21T08:21:54.7114322Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:54.7114491Z // end inline asm 2026-02-21T08:21:54.7114624Z bar.warp.sync -1; 2026-02-21T08:21:54.7114793Z // begin inline asm 2026-02-21T08:21:54.7115039Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:21:54.7115323Z // end inline asm 2026-02-21T08:21:54.7115458Z // begin inline asm 2026-02-21T08:21:54.7115676Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:54.7115934Z // end inline asm 2026-02-21T08:21:54.7116063Z // begin inline asm 2026-02-21T08:21:54.7116361Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:54.7116624Z // end inline asm 2026-02-21T08:21:54.7116762Z // begin inline asm 2026-02-21T08:21:54.7116994Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:54.7117247Z // end inline asm 2026-02-21T08:21:54.7117385Z // begin inline asm 2026-02-21T08:21:54.7117617Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 
], 0x0, %r35; 2026-02-21T08:21:54.7117886Z // end inline asm 2026-02-21T08:21:54.7118015Z mov.b32 %r44, 4096; 2026-02-21T08:21:54.7118160Z // begin inline asm 2026-02-21T08:21:54.7118386Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r44; 2026-02-21T08:21:54.7118656Z // end inline asm 2026-02-21T08:21:54.7118792Z // begin inline asm 2026-02-21T08:21:54.7119041Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:54.7119322Z // end inline asm 2026-02-21T08:21:54.7119451Z // begin inline asm 2026-02-21T08:21:54.7119699Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:54.7119981Z // end inline asm 2026-02-21T08:21:54.7120115Z // begin inline asm 2026-02-21T08:21:54.7120365Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:54.7120643Z // end inline asm 2026-02-21T08:21:54.7120779Z // begin inline asm 2026-02-21T08:21:54.7121001Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:54.7121261Z // end inline asm 2026-02-21T08:21:54.7121389Z // begin inline asm 2026-02-21T08:21:54.7121641Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:54.7121913Z // end inline asm 2026-02-21T08:21:54.7122048Z // begin inline asm 2026-02-21T08:21:54.7122284Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:54.7122545Z // end inline asm 2026-02-21T08:21:54.7122683Z // begin inline asm 2026-02-21T08:21:54.7122904Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:54.7123161Z // end inline asm 2026-02-21T08:21:54.7123289Z // begin inline asm 2026-02-21T08:21:54.7123632Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 
+ 0 ], 0x80; 2026-02-21T08:21:54.7124005Z // end inline asm 2026-02-21T08:21:54.7124137Z // begin inline asm 2026-02-21T08:21:54.7124347Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:21:54.7124587Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:54.7124827Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:54.7124999Z // end inline asm 2026-02-21T08:21:54.7125136Z bar.sync 0; 2026-02-21T08:21:54.7125270Z cvta.global.u64 %rd60, %rd37; 2026-02-21T08:21:54.7125558Z .loc 1 28 131 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:28:131 2026-02-21T08:21:54.7125931Z setp.gt.u32 %p39, %r3, 2047; 2026-02-21T08:21:54.7126091Z @%p39 bra $L__BB0_8; 2026-02-21T08:21:54.7126258Z // %bb.1: // %.lr.ph 2026-02-21T08:21:54.7126548Z .loc 1 40 45 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:40:45 2026-02-21T08:21:54.7126845Z shl.b32 %r154, %r1, 3; 2026-02-21T08:21:54.7126994Z and.b32 %r155, %r154, 56; 2026-02-21T08:21:54.7127155Z bfe.u32 %r156, %r1, 3, 4; 2026-02-21T08:21:54.7127311Z shr.u32 %r157, %r1, 5; 2026-02-21T08:21:54.7127455Z shl.b32 %r158, %r1, 4; 2026-02-21T08:21:54.7127605Z and.b32 %r159, %r158, 176; 2026-02-21T08:21:54.7127758Z and.b32 %r160, %r1, 96; 2026-02-21T08:21:54.7127913Z shl.b32 %r161, %r160, 3; 2026-02-21T08:21:54.7128061Z bfe.s32 %r162, %r1, 2, 1; 2026-02-21T08:21:54.7128214Z and.b32 %r163, %r162, 1088; 2026-02-21T08:21:54.7128367Z and.b32 %r165, %r54, 64; 2026-02-21T08:21:54.7128523Z xor.b32 %r166, %r163, %r165; 2026-02-21T08:21:54.7128760Z add.s32 %r167, %r30, %r159; 2026-02-21T08:21:54.7128924Z add.s32 %r168, %r167, %r161; 2026-02-21T08:21:54.7129094Z shl.b32 %r169, %r1, 5; 2026-02-21T08:21:54.7129235Z and.b32 %r170, %r169, 1792; 2026-02-21T08:21:54.7129388Z and.b32 %r171, %r154, 48; 2026-02-21T08:21:54.7129533Z shl.b32 %r172, %r160, 1; 2026-02-21T08:21:54.7129685Z shl.b32 %r173, %r1, 6; 2026-02-21T08:21:54.7129825Z and.b32 %r174, %r173, 64; 2026-02-21T08:21:54.7129976Z xor.b32 %r175, 
%r172, %r174; 2026-02-21T08:21:54.7130126Z add.s32 %r176, %r30, %r170; 2026-02-21T08:21:54.7130281Z add.s32 %r177, %r176, %r171; 2026-02-21T08:21:54.7130536Z .loc 1 35 33 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:35:33 2026-02-21T08:21:54.7130819Z shr.u32 %r178, %r3, 5; 2026-02-21T08:21:54.7130968Z and.b32 %r179, %r178, 60; 2026-02-21T08:21:54.7131215Z .loc 1 37 64 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:37:64 2026-02-21T08:21:54.7131496Z and.b32 %r180, %r3, 3; 2026-02-21T08:21:54.7131743Z .loc 1 37 30 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:37:30 2026-02-21T08:21:54.7132020Z or.b32 %r181, %r179, %r180; 2026-02-21T08:21:54.7132269Z .loc 1 39 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:39:27 2026-02-21T08:21:54.7132547Z shl.b32 %r212, %r181, 6; 2026-02-21T08:21:54.7132801Z .loc 1 41 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:41:27 2026-02-21T08:21:54.7133070Z shl.b32 %r182, %r3, 4; 2026-02-21T08:21:54.7133221Z and.b32 %r208, %r182, 1984; 2026-02-21T08:21:54.7133474Z .loc 1 42 32 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:42:32 2026-02-21T08:21:54.7133762Z or.b32 %r9, %r208, %r156; 2026-02-21T08:21:54.7134021Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7134330Z shfl.sync.idx.b32 %r13, %r157, 0, 31, -1; 2026-02-21T08:21:54.7134525Z shl.b32 %r183, %r13, 21; 2026-02-21T08:21:54.7134732Z and.b32 %r184, %r183, 6291456; 2026-02-21T08:21:54.7134908Z add.s32 %r303, %r184, %r397; 2026-02-21T08:21:54.7135068Z mov.pred %p40, -1; 2026-02-21T08:21:54.7135220Z // begin inline asm 2026-02-21T08:21:54.7135577Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 0], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:54.7135967Z // end inline asm 2026-02-21T08:21:54.7136105Z // begin inline asm 2026-02-21T08:21:54.7136454Z @%p40 
tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 16], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:54.7136836Z // end inline asm 2026-02-21T08:21:54.7136975Z // begin inline asm 2026-02-21T08:21:54.7137134Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:54.7137296Z // end inline asm 2026-02-21T08:21:54.7137439Z bar.sync 0; 2026-02-21T08:21:54.7137742Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7138049Z add.s32 %r399, %r30, 28736; 2026-02-21T08:21:54.7138211Z // begin inline asm 2026-02-21T08:21:54.7138384Z @%p110 mbarrier.init.shared::cta.b64 [%r399], 1; 2026-02-21T08:21:54.7138593Z // end inline asm 2026-02-21T08:21:54.7138728Z bar.sync 0; 2026-02-21T08:21:54.7138869Z add.s32 %r90, %r30, 28744; 2026-02-21T08:21:54.7139023Z // begin inline asm 2026-02-21T08:21:54.7139197Z @%p110 mbarrier.init.shared::cta.b64 [%r90], 1; 2026-02-21T08:21:54.7139395Z // end inline asm 2026-02-21T08:21:54.7139543Z add.s32 %r91, %r30, 28672; 2026-02-21T08:21:54.7139700Z // begin inline asm 2026-02-21T08:21:54.7139870Z @%p110 mbarrier.init.shared::cta.b64 [%r91], 1; 2026-02-21T08:21:54.7140065Z // end inline asm 2026-02-21T08:21:54.7140197Z bar.sync 0; 2026-02-21T08:21:54.7140335Z add.s32 %r92, %r30, 28680; 2026-02-21T08:21:54.7140543Z // begin inline asm 2026-02-21T08:21:54.7140717Z @%p110 mbarrier.init.shared::cta.b64 [%r92], 1; 2026-02-21T08:21:54.7140901Z // end inline asm 2026-02-21T08:21:54.7141038Z bar.sync 0; 2026-02-21T08:21:54.7141168Z add.s32 %r93, %r30, 28688; 2026-02-21T08:21:54.7141326Z // begin inline asm 2026-02-21T08:21:54.7141497Z @%p110 mbarrier.init.shared::cta.b64 [%r93], 1; 2026-02-21T08:21:54.7141682Z // end inline asm 2026-02-21T08:21:54.7141822Z bar.sync 0; 2026-02-21T08:21:54.7141953Z add.s32 %r94, %r30, 28696; 2026-02-21T08:21:54.7142111Z // begin inline asm 2026-02-21T08:21:54.7142281Z @%p110 mbarrier.init.shared::cta.b64 [%r94], 
1; 2026-02-21T08:21:54.7142462Z // end inline asm 2026-02-21T08:21:54.7142587Z bar.sync 0; 2026-02-21T08:21:54.7142719Z add.s32 %r95, %r30, 28704; 2026-02-21T08:21:54.7142863Z // begin inline asm 2026-02-21T08:21:54.7143026Z @%p110 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T08:21:54.7143212Z // end inline asm 2026-02-21T08:21:54.7143340Z bar.sync 0; 2026-02-21T08:21:54.7143486Z add.s32 %r96, %r30, 28712; 2026-02-21T08:21:54.7143634Z // begin inline asm 2026-02-21T08:21:54.7143797Z @%p110 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T08:21:54.7143973Z // end inline asm 2026-02-21T08:21:54.7144106Z bar.sync 0; 2026-02-21T08:21:54.7144230Z add.s32 %r205, %r30, 28720; 2026-02-21T08:21:54.7144385Z // begin inline asm 2026-02-21T08:21:54.7144541Z @%p110 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:21:54.7144752Z // end inline asm 2026-02-21T08:21:54.7144883Z bar.sync 0; 2026-02-21T08:21:54.7145006Z // begin inline asm 2026-02-21T08:21:54.7145199Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r91], 4096; 2026-02-21T08:21:54.7145411Z // end inline asm 2026-02-21T08:21:54.7145654Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7145932Z // begin inline asm 2026-02-21T08:21:54.7146086Z fence.proxy.async.shared::cta; 2026-02-21T08:21:54.7146247Z // end inline asm 2026-02-21T08:21:54.7146382Z bar.sync 0; 2026-02-21T08:21:54.7146522Z elect.sync %r185|%p70, -1; 2026-02-21T08:21:54.7146681Z and.pred %p52, %p1, %p70; 2026-02-21T08:21:54.7146835Z // begin inline asm 2026-02-21T08:21:54.7147150Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r30], [%rd59, {%r40, %r208}], [%r91]; 2026-02-21T08:21:54.7147496Z // end inline asm 2026-02-21T08:21:54.7147729Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7148001Z bar.sync 0; 2026-02-21T08:21:54.7148134Z elect.sync %r186|%p71, -1; 2026-02-21T08:21:54.7148289Z and.pred %p53, 
%p1, %p71; 2026-02-21T08:21:54.7148443Z add.s32 %r103, %r30, 14336; 2026-02-21T08:21:54.7148588Z // begin inline asm 2026-02-21T08:21:54.7148906Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r103], [%rd60, {%r40, %r212}], [%r91]; 2026-02-21T08:21:54.7149246Z // end inline asm 2026-02-21T08:21:54.7149490Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7149822Z bar.sync 0; 2026-02-21T08:21:54.7149950Z // begin inline asm 2026-02-21T08:21:54.7150137Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r92], 4096; 2026-02-21T08:21:54.7150343Z // end inline asm 2026-02-21T08:21:54.7150582Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7150854Z bar.sync 0; 2026-02-21T08:21:54.7150991Z elect.sync %r187|%p72, -1; 2026-02-21T08:21:54.7151148Z and.pred %p55, %p1, %p72; 2026-02-21T08:21:54.7151308Z add.s32 %r108, %r30, 2048; 2026-02-21T08:21:54.7151454Z // begin inline asm 2026-02-21T08:21:54.7151768Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd59, {%r33, %r208}], [%r92]; 2026-02-21T08:21:54.7152110Z // end inline asm 2026-02-21T08:21:54.7152402Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7152681Z bar.sync 0; 2026-02-21T08:21:54.7152810Z elect.sync %r188|%p73, -1; 2026-02-21T08:21:54.7152977Z and.pred %p56, %p1, %p73; 2026-02-21T08:21:54.7153129Z add.s32 %r112, %r30, 16384; 2026-02-21T08:21:54.7153288Z // begin inline asm 2026-02-21T08:21:54.7153604Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd60, {%r33, %r212}], [%r92]; 2026-02-21T08:21:54.7153945Z // end inline asm 2026-02-21T08:21:54.7154186Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7154457Z bar.sync 0; 2026-02-21T08:21:54.7154586Z // begin inline asm 
2026-02-21T08:21:54.7154791Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r93], 4096; 2026-02-21T08:21:54.7155009Z // end inline asm 2026-02-21T08:21:54.7155245Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7155515Z bar.sync 0; 2026-02-21T08:21:54.7155654Z elect.sync %r189|%p74, -1; 2026-02-21T08:21:54.7155813Z and.pred %p58, %p1, %p74; 2026-02-21T08:21:54.7155971Z add.s32 %r117, %r30, 4096; 2026-02-21T08:21:54.7156116Z mov.b32 %r118, 32; 2026-02-21T08:21:54.7156256Z // begin inline asm 2026-02-21T08:21:54.7156568Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r208}], [%r93]; 2026-02-21T08:21:54.7156917Z // end inline asm 2026-02-21T08:21:54.7157161Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7157428Z bar.sync 0; 2026-02-21T08:21:54.7157570Z elect.sync %r190|%p75, -1; 2026-02-21T08:21:54.7157729Z and.pred %p59, %p1, %p75; 2026-02-21T08:21:54.7157887Z add.s32 %r121, %r30, 18432; 2026-02-21T08:21:54.7158036Z // begin inline asm 2026-02-21T08:21:54.7158357Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r212}], [%r93]; 2026-02-21T08:21:54.7158713Z // end inline asm 2026-02-21T08:21:54.7158949Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7159220Z bar.sync 0; 2026-02-21T08:21:54.7159343Z // begin inline asm 2026-02-21T08:21:54.7159537Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r94], 4096; 2026-02-21T08:21:54.7159747Z // end inline asm 2026-02-21T08:21:54.7159986Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7160250Z bar.sync 0; 2026-02-21T08:21:54.7160385Z elect.sync %r191|%p76, -1; 2026-02-21T08:21:54.7160545Z and.pred %p61, %p1, %p76; 2026-02-21T08:21:54.7160693Z add.s32 %r126, %r30, 6144; 2026-02-21T08:21:54.7160844Z mov.b32 
%r127, 48; 2026-02-21T08:21:54.7160977Z // begin inline asm 2026-02-21T08:21:54.7161295Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r126], [%rd59, {%r127, %r208}], [%r94]; 2026-02-21T08:21:54.7161752Z // end inline asm 2026-02-21T08:21:54.7161992Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7162256Z bar.sync 0; 2026-02-21T08:21:54.7162393Z elect.sync %r192|%p77, -1; 2026-02-21T08:21:54.7162555Z and.pred %p62, %p1, %p77; 2026-02-21T08:21:54.7162707Z add.s32 %r130, %r30, 20480; 2026-02-21T08:21:54.7162863Z // begin inline asm 2026-02-21T08:21:54.7163170Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r212}], [%r94]; 2026-02-21T08:21:54.7163520Z // end inline asm 2026-02-21T08:21:54.7163751Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7164033Z bar.sync 0; 2026-02-21T08:21:54.7164171Z // begin inline asm 2026-02-21T08:21:54.7164356Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r95], 4096; 2026-02-21T08:21:54.7164624Z // end inline asm 2026-02-21T08:21:54.7164898Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7165182Z bar.sync 0; 2026-02-21T08:21:54.7165313Z elect.sync %r193|%p78, -1; 2026-02-21T08:21:54.7165476Z and.pred %p64, %p1, %p78; 2026-02-21T08:21:54.7165625Z add.s32 %r135, %r30, 8192; 2026-02-21T08:21:54.7165779Z // begin inline asm 2026-02-21T08:21:54.7166097Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r34, %r208}], [%r95]; 2026-02-21T08:21:54.7166434Z // end inline asm 2026-02-21T08:21:54.7166677Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7166947Z bar.sync 0; 2026-02-21T08:21:54.7167083Z elect.sync %r194|%p79, -1; 2026-02-21T08:21:54.7167240Z and.pred %p65, %p1, %p79; 
2026-02-21T08:21:54.7167401Z add.s32 %r139, %r30, 22528; 2026-02-21T08:21:54.7167548Z // begin inline asm 2026-02-21T08:21:54.7167869Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r34, %r212}], [%r95]; 2026-02-21T08:21:54.7168227Z // end inline asm 2026-02-21T08:21:54.7168462Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7168742Z bar.sync 0; 2026-02-21T08:21:54.7168867Z // begin inline asm 2026-02-21T08:21:54.7169066Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 4096; 2026-02-21T08:21:54.7169274Z // end inline asm 2026-02-21T08:21:54.7169518Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7169790Z bar.sync 0; 2026-02-21T08:21:54.7169918Z elect.sync %r195|%p80, -1; 2026-02-21T08:21:54.7170078Z and.pred %p67, %p1, %p80; 2026-02-21T08:21:54.7170228Z add.s32 %r144, %r30, 10240; 2026-02-21T08:21:54.7170378Z mov.b32 %r145, 80; 2026-02-21T08:21:54.7170506Z // begin inline asm 2026-02-21T08:21:54.7170824Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r208}], [%r96]; 2026-02-21T08:21:54.7171169Z // end inline asm 2026-02-21T08:21:54.7171406Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7171680Z bar.sync 0; 2026-02-21T08:21:54.7171808Z elect.sync %r196|%p81, -1; 2026-02-21T08:21:54.7171967Z and.pred %p68, %p1, %p81; 2026-02-21T08:21:54.7172118Z add.s32 %r148, %r30, 24576; 2026-02-21T08:21:54.7172271Z // begin inline asm 2026-02-21T08:21:54.7172580Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r212}], [%r96]; 2026-02-21T08:21:54.7172924Z // end inline asm 2026-02-21T08:21:54.7173169Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7173446Z bar.sync 0; 2026-02-21T08:21:54.7173575Z // 
begin inline asm 2026-02-21T08:21:54.7173706Z 2026-02-21T08:21:54.7173902Z { 2026-02-21T08:21:54.7174024Z .reg .pred complete; 2026-02-21T08:21:54.7174173Z waitLoop: 2026-02-21T08:21:54.7174354Z mbarrier.try_wait.parity.shared.b64 complete, [%r91], %r40; 2026-02-21T08:21:54.7174591Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.7174778Z } 2026-02-21T08:21:54.7174852Z 2026-02-21T08:21:54.7174908Z // end inline asm 2026-02-21T08:21:54.7175157Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7175437Z setp.ne.b32 %p82, %r13, 0; 2026-02-21T08:21:54.7175598Z @%p82 bra $L__BB0_3; 2026-02-21T08:21:54.7175736Z // %bb.2: 2026-02-21T08:21:54.7175964Z .loc 1 0 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:52 2026-02-21T08:21:54.7176238Z bfe.u32 %r201, %r103, 4, 14; 2026-02-21T08:21:54.7176401Z cvt.u64.u32 %rd57, %r201; 2026-02-21T08:21:54.7176570Z or.b64 %rd55, %rd57, -4611685949699522560; 2026-02-21T08:21:54.7176797Z bfe.u32 %r202, %r30, 4, 14; 2026-02-21T08:21:54.7176957Z cvt.u64.u32 %rd58, %r202; 2026-02-21T08:21:54.7177114Z or.b64 %rd54, %rd58, -4611685949699522560; 2026-02-21T08:21:54.7177394Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7177670Z elect.sync %r203|%p84, -1; 2026-02-21T08:21:54.7177838Z mov.b32 %r198, 68157456; 2026-02-21T08:21:54.7177988Z mov.pred %p83, 0; 2026-02-21T08:21:54.7178137Z // begin inline asm 2026-02-21T08:21:54.7178369Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd54, %rd55, %r198, %p83; 2026-02-21T08:21:54.7178633Z // end inline asm 2026-02-21T08:21:54.7178779Z add.s32 %r204, %r30, 28736; 2026-02-21T08:21:54.7178936Z cvt.u64.u32 %rd56, %r204; 2026-02-21T08:21:54.7179095Z // begin inline asm 2026-02-21T08:21:54.7179307Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T08:21:54.7179550Z // end inline asm 2026-02-21T08:21:54.7179685Z $L__BB0_3: 2026-02-21T08:21:54.7179925Z .loc 1 
0 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:52 2026-02-21T08:21:54.7180242Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T08:21:54.7180434Z add.s32 %r4, %r168, %r166; 2026-02-21T08:21:54.7180601Z add.s32 %r308, %r177, %r175; 2026-02-21T08:21:54.7180757Z or.b32 %r7, %r212, %r155; 2026-02-21T08:21:54.7180911Z or.b32 %r10, %r9, 16; 2026-02-21T08:21:54.7181055Z or.b32 %r11, %r9, 32; 2026-02-21T08:21:54.7181204Z or.b32 %r12, %r9, 48; 2026-02-21T08:21:54.7181450Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7181734Z bar.sync 0; 2026-02-21T08:21:54.7181868Z // begin inline asm 2026-02-21T08:21:54.7182061Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r205], 4096; 2026-02-21T08:21:54.7182285Z // end inline asm 2026-02-21T08:21:54.7182526Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7182811Z bar.sync 0; 2026-02-21T08:21:54.7182947Z elect.sync %r219|%p90, -1; 2026-02-21T08:21:54.7183119Z and.pred %p87, %p1, %p90; 2026-02-21T08:21:54.7183276Z add.s32 %r206, %r30, 12288; 2026-02-21T08:21:54.7183435Z mov.b32 %r207, 96; 2026-02-21T08:21:54.7183583Z // begin inline asm 2026-02-21T08:21:54.7183922Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r206], [%rd59, {%r207, %r208}], [%r205]; 2026-02-21T08:21:54.7184300Z // end inline asm 2026-02-21T08:21:54.7184543Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7184860Z bar.sync 0; 2026-02-21T08:21:54.7185001Z elect.sync %r220|%p91, -1; 2026-02-21T08:21:54.7185193Z and.pred %p88, %p1, %p91; 2026-02-21T08:21:54.7185345Z add.s32 %r210, %r30, 26624; 2026-02-21T08:21:54.7185501Z // begin inline asm 2026-02-21T08:21:54.7185819Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r210], [%rd60, {%r207, %r212}], [%r205]; 2026-02-21T08:21:54.7186213Z // end inline asm 
2026-02-21T08:21:54.7186346Z mov.b32 %r403, 1; 2026-02-21T08:21:54.7186475Z mov.b32 %r402, 6; 2026-02-21T08:21:54.7186610Z mov.b32 %r398, 0; 2026-02-21T08:21:54.7186737Z mov.b32 %r400, %r398; 2026-02-21T08:21:54.7186885Z mov.b32 %r401, %r398; 2026-02-21T08:21:54.7187021Z mov.b32 %r404, %r398; 2026-02-21T08:21:54.7187165Z mov.b32 %r405, %r398; 2026-02-21T08:21:54.7187309Z bra.uni $L__BB0_4; 2026-02-21T08:21:54.7187490Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:54.7187822Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7188109Z setp.lt.u32 %p100, %r405, 1936; 2026-02-21T08:21:54.7188385Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7188654Z // begin inline asm 2026-02-21T08:21:54.7188841Z 2026-02-21T08:21:54.7188959Z { 2026-02-21T08:21:54.7189077Z .reg .pred complete; 2026-02-21T08:21:54.7189223Z waitLoop: 2026-02-21T08:21:54.7189406Z mbarrier.try_wait.parity.shared.b64 complete, [%r399], %r398; 2026-02-21T08:21:54.7189643Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.7189789Z } 2026-02-21T08:21:54.7189856Z 2026-02-21T08:21:54.7189910Z // end inline asm 2026-02-21T08:21:54.7190041Z add.s32 %r250, %r403, 1; 2026-02-21T08:21:54.7190200Z setp.gt.s32 %p103, %r250, 1; 2026-02-21T08:21:54.7190360Z selp.b32 %r403, 0, %r250, %p103; 2026-02-21T08:21:54.7190535Z selp.b32 %r251, 1, 0, %p103; 2026-02-21T08:21:54.7190695Z xor.b32 %r404, %r260, %r251; 2026-02-21T08:21:54.7190947Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7191225Z add.s32 %r252, %r402, 1; 2026-02-21T08:21:54.7191373Z setp.gt.s32 %p104, %r252, 6; 2026-02-21T08:21:54.7191534Z selp.b32 %r402, 0, %r252, %p104; 2026-02-21T08:21:54.7191694Z shl.b32 %r253, %r402, 3; 2026-02-21T08:21:54.7191850Z add.s32 %r255, %r30, %r253; 2026-02-21T08:21:54.7191997Z add.s32 %r245, %r255, 28672; 2026-02-21T08:21:54.7192148Z bar.sync 0; 
2026-02-21T08:21:54.7192283Z and.pred %p97, %p110, %p100; 2026-02-21T08:21:54.7192433Z // begin inline asm 2026-02-21T08:21:54.7192623Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r245], 4096; 2026-02-21T08:21:54.7192831Z // end inline asm 2026-02-21T08:21:54.7193073Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7193344Z shl.b32 %r256, %r402, 11; 2026-02-21T08:21:54.7193498Z add.s32 %r242, %r30, %r256; 2026-02-21T08:21:54.7193644Z bar.sync 0; 2026-02-21T08:21:54.7193784Z elect.sync %r257|%p105, -1; 2026-02-21T08:21:54.7193954Z and.pred %p106, %p100, %p105; 2026-02-21T08:21:54.7194118Z and.pred %p98, %p1, %p106; 2026-02-21T08:21:54.7194283Z add.s32 %r243, %r405, 112; 2026-02-21T08:21:54.7194430Z // begin inline asm 2026-02-21T08:21:54.7194778Z @%p98 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd59, {%r243, %r208}], [%r245]; 2026-02-21T08:21:54.7195135Z // end inline asm 2026-02-21T08:21:54.7195377Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7195654Z add.s32 %r246, %r242, 14336; 2026-02-21T08:21:54.7195808Z bar.sync 0; 2026-02-21T08:21:54.7195942Z elect.sync %r258|%p107, -1; 2026-02-21T08:21:54.7196102Z and.pred %p108, %p100, %p107; 2026-02-21T08:21:54.7196268Z and.pred %p99, %p1, %p108; 2026-02-21T08:21:54.7196417Z // begin inline asm 2026-02-21T08:21:54.7196750Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd60, {%r243, %r212}], [%r245]; 2026-02-21T08:21:54.7197100Z // end inline asm 2026-02-21T08:21:54.7197343Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7197636Z setp.lt.u32 %p109, %r405, 2016; 2026-02-21T08:21:54.7197861Z add.s32 %r405, %r405, 16; 2026-02-21T08:21:54.7198016Z mov.b32 %r398, %r260; 2026-02-21T08:21:54.7198154Z mov.b32 %r399, %r259; 2026-02-21T08:21:54.7198302Z @%p109 bra $L__BB0_4; 
2026-02-21T08:21:54.7198441Z bra.uni $L__BB0_7; 2026-02-21T08:21:54.7198631Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:54.7198940Z .loc 1 0 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:42 2026-02-21T08:21:54.7199216Z mov.b32 %r260, %r404; 2026-02-21T08:21:54.7199465Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:54.7199743Z add.s32 %r223, %r401, 1; 2026-02-21T08:21:54.7199904Z setp.gt.s32 %p93, %r223, 6; 2026-02-21T08:21:54.7200062Z selp.b32 %r401, 0, %r223, %p93; 2026-02-21T08:21:54.7200230Z selp.b32 %r224, 1, 0, %p93; 2026-02-21T08:21:54.7200433Z xor.b32 %r400, %r400, %r224; 2026-02-21T08:21:54.7200591Z shl.b32 %r225, %r401, 3; 2026-02-21T08:21:54.7200737Z add.s32 %r227, %r30, %r225; 2026-02-21T08:21:54.7200893Z add.s32 %r221, %r227, 28672; 2026-02-21T08:21:54.7201043Z bar.sync 0; 2026-02-21T08:21:54.7201167Z // begin inline asm 2026-02-21T08:21:54.7201304Z 2026-02-21T08:21:54.7201408Z { 2026-02-21T08:21:54.7201530Z .reg .pred complete; 2026-02-21T08:21:54.7201665Z waitLoop: 2026-02-21T08:21:54.7201849Z mbarrier.try_wait.parity.shared.b64 complete, [%r221], %r400; 2026-02-21T08:21:54.7202075Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.7202231Z } 2026-02-21T08:21:54.7202293Z 2026-02-21T08:21:54.7202346Z // end inline asm 2026-02-21T08:21:54.7202488Z shl.b32 %r228, %r403, 3; 2026-02-21T08:21:54.7202640Z add.s32 %r229, %r30, %r228; 2026-02-21T08:21:54.7202788Z add.s32 %r259, %r229, 28736; 2026-02-21T08:21:54.7203041Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7203314Z @%p82 bra $L__BB0_6; 2026-02-21T08:21:54.7203508Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:54.7203812Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:54.7204094Z shl.b32 %r232, %r401, 11; 2026-02-21T08:21:54.7204253Z add.s32 %r234, %r30, %r232; 2026-02-21T08:21:54.7204498Z .loc 1 52 44 
// c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:54.7204808Z add.s32 %r235, %r234, 14336; 2026-02-21T08:21:54.7205059Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7205351Z elect.sync %r236|%p95, -1; 2026-02-21T08:21:54.7205506Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:21:54.7205660Z cvt.u64.u32 %rd64, %r237; 2026-02-21T08:21:54.7205821Z or.b64 %rd61, %rd64, -4611685949699522560; 2026-02-21T08:21:54.7206004Z bfe.u32 %r238, %r235, 4, 14; 2026-02-21T08:21:54.7206161Z cvt.u64.u32 %rd65, %r238; 2026-02-21T08:21:54.7206322Z or.b64 %rd62, %rd65, -4611685949699522560; 2026-02-21T08:21:54.7206505Z mov.b32 %r231, 68157456; 2026-02-21T08:21:54.7206652Z mov.pred %p94, -1; 2026-02-21T08:21:54.7206799Z // begin inline asm 2026-02-21T08:21:54.7207022Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd61, %rd62, %r231, %p94; 2026-02-21T08:21:54.7207276Z // end inline asm 2026-02-21T08:21:54.7207409Z cvt.u64.u32 %rd63, %r259; 2026-02-21T08:21:54.7207561Z // begin inline asm 2026-02-21T08:21:54.7207774Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T08:21:54.7207996Z // end inline asm 2026-02-21T08:21:54.7208134Z bra.uni $L__BB0_6; 2026-02-21T08:21:54.7208303Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:54.7208609Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7208890Z // begin inline asm 2026-02-21T08:21:54.7209026Z 2026-02-21T08:21:54.7209187Z { 2026-02-21T08:21:54.7209310Z .reg .pred complete; 2026-02-21T08:21:54.7209452Z waitLoop: 2026-02-21T08:21:54.7209628Z mbarrier.try_wait.parity.shared.b64 complete, [%r259], %r260; 2026-02-21T08:21:54.7209858Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.7210003Z } 2026-02-21T08:21:54.7210065Z 2026-02-21T08:21:54.7210124Z // end inline asm 2026-02-21T08:21:54.7210356Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 
2026-02-21T08:21:54.7210625Z bar.sync 0; 2026-02-21T08:21:54.7210749Z // begin inline asm 2026-02-21T08:21:54.7210916Z @%p110 mbarrier.inval.shared::cta.b64 [%r91]; 2026-02-21T08:21:54.7211108Z // end inline asm 2026-02-21T08:21:54.7211237Z bar.sync 0; 2026-02-21T08:21:54.7211368Z // begin inline asm 2026-02-21T08:21:54.7211524Z @%p110 mbarrier.inval.shared::cta.b64 [%r92]; 2026-02-21T08:21:54.7211710Z // end inline asm 2026-02-21T08:21:54.7211836Z bar.sync 0; 2026-02-21T08:21:54.7212016Z // begin inline asm 2026-02-21T08:21:54.7212174Z @%p110 mbarrier.inval.shared::cta.b64 [%r93]; 2026-02-21T08:21:54.7212361Z // end inline asm 2026-02-21T08:21:54.7212488Z bar.sync 0; 2026-02-21T08:21:54.7212616Z // begin inline asm 2026-02-21T08:21:54.7212778Z @%p110 mbarrier.inval.shared::cta.b64 [%r94]; 2026-02-21T08:21:54.7212955Z // end inline asm 2026-02-21T08:21:54.7213094Z bar.sync 0; 2026-02-21T08:21:54.7213220Z // begin inline asm 2026-02-21T08:21:54.7213381Z @%p110 mbarrier.inval.shared::cta.b64 [%r95]; 2026-02-21T08:21:54.7213557Z // end inline asm 2026-02-21T08:21:54.7213689Z bar.sync 0; 2026-02-21T08:21:54.7213815Z // begin inline asm 2026-02-21T08:21:54.7213990Z @%p110 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T08:21:54.7214169Z // end inline asm 2026-02-21T08:21:54.7214304Z bar.sync 0; 2026-02-21T08:21:54.7214432Z // begin inline asm 2026-02-21T08:21:54.7214590Z @%p110 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:21:54.7214807Z // end inline asm 2026-02-21T08:21:54.7214939Z add.s32 %r268, %r30, 28736; 2026-02-21T08:21:54.7215097Z // begin inline asm 2026-02-21T08:21:54.7215254Z @%p110 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T08:21:54.7215438Z // end inline asm 2026-02-21T08:21:54.7215565Z bar.sync 0; 2026-02-21T08:21:54.7215695Z // begin inline asm 2026-02-21T08:21:54.7215854Z @%p110 mbarrier.inval.shared::cta.b64 [%r90]; 2026-02-21T08:21:54.7216029Z // end inline asm 2026-02-21T08:21:54.7216269Z .loc 1 56 45 // 
c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:45 2026-02-21T08:21:54.7216544Z shl.b32 %r341, %r9, 12; 2026-02-21T08:21:54.7216700Z shl.b32 %r342, %r10, 12; 2026-02-21T08:21:54.7216845Z shl.b32 %r343, %r11, 12; 2026-02-21T08:21:54.7216994Z shl.b32 %r344, %r12, 12; 2026-02-21T08:21:54.7217239Z .loc 1 56 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:52 2026-02-21T08:21:54.7217521Z or.b32 %r345, %r7, %r341; 2026-02-21T08:21:54.7217676Z or.b32 %r346, %r7, %r342; 2026-02-21T08:21:54.7217821Z or.b32 %r347, %r7, %r343; 2026-02-21T08:21:54.7217967Z or.b32 %r348, %r7, %r344; 2026-02-21T08:21:54.7218207Z .loc 1 56 24 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:24 2026-02-21T08:21:54.7218499Z mad.wide.u32 %rd68, %r345, 2, %rd3; 2026-02-21T08:21:54.7218671Z mad.wide.u32 %rd69, %r346, 2, %rd3; 2026-02-21T08:21:54.7218842Z mad.wide.u32 %rd70, %r347, 2, %rd3; 2026-02-21T08:21:54.7219003Z mad.wide.u32 %rd71, %r348, 2, %rd3; 2026-02-21T08:21:54.7219269Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7219551Z // begin inline asm 2026-02-21T08:21:54.7219913Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285}, [%r303 + 0], 32; 2026-02-21T08:21:54.7220307Z // end inline asm 2026-02-21T08:21:54.7220440Z // begin inline asm 2026-02-21T08:21:54.7220799Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302}, [%r303 + 16], 32; 2026-02-21T08:21:54.7221285Z // end inline asm 2026-02-21T08:21:54.7221428Z // begin inline asm 2026-02-21T08:21:54.7221597Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:54.7221767Z // end inline asm 2026-02-21T08:21:54.7221921Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:21:54.7222083Z cvt.u64.u32 %rd73, %r271; 2026-02-21T08:21:54.7222250Z shl.b64 %rd74, %rd73, 32; 
2026-02-21T08:21:54.7222416Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:21:54.7222703Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7223021Z mov.b64 {%r349, %r350}, %rd75; 2026-02-21T08:21:54.7223205Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:54.7223561Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7223856Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:21:54.7224015Z cvt.u64.u32 %rd77, %r273; 2026-02-21T08:21:54.7224167Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:21:54.7224324Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:21:54.7224589Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7224917Z mov.b64 {%r352, %r353}, %rd79; 2026-02-21T08:21:54.7225097Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:54.7225382Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7225677Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:21:54.7225828Z cvt.u64.u32 %rd81, %r275; 2026-02-21T08:21:54.7225984Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:21:54.7226135Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:21:54.7226415Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7226713Z mov.b64 {%r355, %r356}, %rd83; 2026-02-21T08:21:54.7226885Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T08:21:54.7227172Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7227457Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:21:54.7227615Z cvt.u64.u32 %rd85, %r277; 2026-02-21T08:21:54.7227763Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:21:54.7227920Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:21:54.7228185Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7228477Z mov.b64 {%r358, %r359}, %rd87; 2026-02-21T08:21:54.7228650Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 
2026-02-21T08:21:54.7228933Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7229281Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:21:54.7229425Z cvt.u64.u32 %rd89, %r279; 2026-02-21T08:21:54.7229575Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:21:54.7229721Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:21:54.7229980Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7230257Z mov.b64 {%r361, %r362}, %rd91; 2026-02-21T08:21:54.7230413Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T08:21:54.7230685Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7230956Z cvt.u64.u32 %rd92, %r280; 2026-02-21T08:21:54.7231104Z cvt.u64.u32 %rd93, %r281; 2026-02-21T08:21:54.7231247Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:21:54.7231399Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:21:54.7231648Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7231928Z mov.b64 {%r364, %r365}, %rd95; 2026-02-21T08:21:54.7232092Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T08:21:54.7232362Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7232691Z cvt.u64.u32 %rd96, %r282; 2026-02-21T08:21:54.7232834Z cvt.u64.u32 %rd97, %r283; 2026-02-21T08:21:54.7232982Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:21:54.7233125Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:21:54.7233383Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7233670Z mov.b64 {%r367, %r368}, %rd99; 2026-02-21T08:21:54.7233831Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T08:21:54.7234103Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7234372Z cvt.u64.u32 %rd100, %r284; 2026-02-21T08:21:54.7234539Z cvt.u64.u32 %rd101, %r285; 2026-02-21T08:21:54.7234716Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:21:54.7234881Z 
or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:21:54.7235183Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7235472Z mov.b64 {%r370, %r371}, %rd103; 2026-02-21T08:21:54.7235644Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T08:21:54.7235911Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7236200Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:21:54.7236350Z cvt.u64.u32 %rd105, %r288; 2026-02-21T08:21:54.7236507Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:21:54.7236657Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:21:54.7236920Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7237207Z mov.b64 {%r373, %r374}, %rd107; 2026-02-21T08:21:54.7237370Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T08:21:54.7237640Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7237910Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:21:54.7238069Z cvt.u64.u32 %rd109, %r290; 2026-02-21T08:21:54.7238219Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:21:54.7238376Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:21:54.7238631Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7238912Z mov.b64 {%r376, %r377}, %rd111; 2026-02-21T08:21:54.7239081Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T08:21:54.7239343Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7239629Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:21:54.7239774Z cvt.u64.u32 %rd113, %r292; 2026-02-21T08:21:54.7239925Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:21:54.7240073Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:21:54.7240328Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7240613Z mov.b64 {%r379, %r380}, %rd115; 2026-02-21T08:21:54.7240776Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 
2026-02-21T08:21:54.7241049Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7241329Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:21:54.7241484Z cvt.u64.u32 %rd117, %r294; 2026-02-21T08:21:54.7241631Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:21:54.7241790Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:21:54.7242042Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7242323Z mov.b64 {%r382, %r383}, %rd119; 2026-02-21T08:21:54.7242489Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T08:21:54.7242752Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7243036Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:21:54.7243184Z cvt.u64.u32 %rd121, %r296; 2026-02-21T08:21:54.7243338Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:21:54.7243489Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:21:54.7243748Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7244077Z mov.b64 {%r385, %r386}, %rd123; 2026-02-21T08:21:54.7244245Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T08:21:54.7244516Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7244814Z cvt.u64.u32 %rd124, %r297; 2026-02-21T08:21:54.7244979Z cvt.u64.u32 %rd125, %r298; 2026-02-21T08:21:54.7245127Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:21:54.7245285Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:21:54.7245543Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7245824Z mov.b64 {%r388, %r389}, %rd127; 2026-02-21T08:21:54.7245990Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T08:21:54.7246304Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7246597Z cvt.u64.u32 %rd128, %r299; 2026-02-21T08:21:54.7246749Z cvt.u64.u32 %rd129, %r300; 2026-02-21T08:21:54.7246905Z shl.b64 %rd130, %rd129, 32; 
2026-02-21T08:21:54.7247053Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:21:54.7247319Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7247599Z mov.b64 {%r391, %r392}, %rd131; 2026-02-21T08:21:54.7247759Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T08:21:54.7248035Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:54.7248304Z cvt.u64.u32 %rd132, %r301; 2026-02-21T08:21:54.7248458Z cvt.u64.u32 %rd133, %r302; 2026-02-21T08:21:54.7248605Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:21:54.7248777Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:21:54.7249029Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:54.7249321Z mov.b64 {%r394, %r395}, %rd135; 2026-02-21T08:21:54.7249491Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T08:21:54.7249759Z .loc 1 56 82 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:82 2026-02-21T08:21:54.7250074Z st.shared.v4.b32 [%r4], {%r351, %r363, %r375, %r387}; 2026-02-21T08:21:54.7250311Z bar.sync 0; 2026-02-21T08:21:54.7250462Z // begin inline asm 2026-02-21T08:21:54.7250727Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r328, %r332, %r336}, [%r308]; 2026-02-21T08:21:54.7251051Z // end inline asm 2026-02-21T08:21:54.7251210Z bar.sync 0; 2026-02-21T08:21:54.7251422Z st.shared.v4.b32 [%r4], {%r354, %r366, %r378, %r390}; 2026-02-21T08:21:54.7251616Z bar.sync 0; 2026-02-21T08:21:54.7251741Z // begin inline asm 2026-02-21T08:21:54.7251974Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r329, %r333, %r337}, [%r308]; 2026-02-21T08:21:54.7252227Z // end inline asm 2026-02-21T08:21:54.7252363Z bar.sync 0; 2026-02-21T08:21:54.7252518Z st.shared.v4.b32 [%r4], {%r357, %r369, %r381, %r393}; 2026-02-21T08:21:54.7252713Z bar.sync 0; 2026-02-21T08:21:54.7252834Z // begin inline asm 2026-02-21T08:21:54.7253061Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r330, %r334, %r338}, [%r308]; 
2026-02-21T08:21:54.7253321Z // end inline asm 2026-02-21T08:21:54.7253452Z bar.sync 0; 2026-02-21T08:21:54.7253610Z st.shared.v4.b32 [%r4], {%r360, %r372, %r384, %r396}; 2026-02-21T08:21:54.7253793Z bar.sync 0; 2026-02-21T08:21:54.7253920Z // begin inline asm 2026-02-21T08:21:54.7254135Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r331, %r335, %r339}, [%r308]; 2026-02-21T08:21:54.7254388Z // end inline asm 2026-02-21T08:21:54.7254516Z // begin inline asm 2026-02-21T08:21:54.7254726Z st.global.v4.b32 [ %rd68 + 0 ], { %r324, %r325, %r326, %r327 }; 2026-02-21T08:21:54.7254936Z // end inline asm 2026-02-21T08:21:54.7255063Z // begin inline asm 2026-02-21T08:21:54.7255247Z st.global.v4.b32 [ %rd69 + 0 ], { %r328, %r329, %r330, %r331 }; 2026-02-21T08:21:54.7255445Z // end inline asm 2026-02-21T08:21:54.7255649Z // begin inline asm 2026-02-21T08:21:54.7255818Z st.global.v4.b32 [ %rd70 + 0 ], { %r332, %r333, %r334, %r335 }; 2026-02-21T08:21:54.7256030Z // end inline asm 2026-02-21T08:21:54.7256157Z // begin inline asm 2026-02-21T08:21:54.7256335Z st.global.v4.b32 [ %rd71 + 0 ], { %r336, %r337, %r338, %r339 }; 2026-02-21T08:21:54.7256529Z // end inline asm 2026-02-21T08:21:54.7256688Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:54.7256983Z .loc 1 28 4 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:28:4 2026-02-21T08:21:54.7257249Z bar.sync 0; 2026-02-21T08:21:54.7257380Z // begin inline asm 2026-02-21T08:21:54.7257572Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r397, 64; 2026-02-21T08:21:54.7257784Z // end inline asm 2026-02-21T08:21:54.7257906Z ret; 2026-02-21T08:21:54.7258028Z $L__tmp1: 2026-02-21T08:21:54.7258147Z $L__func_end0: 2026-02-21T08:21:54.7258381Z // -- End function 2026-02-21T08:21:54.7258567Z } 2026-02-21T08:21:54.7258826Z .file 1 "/tmp/torchinductor_root/4k/c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py" 2026-02-21T08:21:54.7259149Z .section .debug_abbrev 2026-02-21T08:21:54.7259285Z { 2026-02-21T08:21:54.7259436Z .b8 1 
// Abbreviation Code 2026-02-21T08:21:54.7259648Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:54.7259864Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:54.7260066Z .b8 37 // DW_AT_producer 2026-02-21T08:21:54.7260271Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.7260470Z .b8 19 // DW_AT_language 2026-02-21T08:21:54.7260665Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:54.7260864Z .b8 3 // DW_AT_name 2026-02-21T08:21:54.7261055Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.7261258Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:54.7261454Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:54.7261650Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:21:54.7261848Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.7262037Z .b8 0 // EOM(1) 2026-02-21T08:21:54.7262226Z .b8 0 // EOM(2) 2026-02-21T08:21:54.7262406Z .b8 0 // EOM(3) 2026-02-21T08:21:54.7262575Z } 2026-02-21T08:21:54.7262693Z .section .debug_info 2026-02-21T08:21:54.7262833Z { 2026-02-21T08:21:54.7262969Z .b32 104 // Length of Unit 2026-02-21T08:21:54.7263188Z .b8 2 // DWARF version number 2026-02-21T08:21:54.7263382Z .b8 0 2026-02-21T08:21:54.7263558Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:21:54.7263810Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:54.7264037Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:54.7264269Z .b8 116 // DW_AT_producer 2026-02-21T08:21:54.7264444Z .b8 114 2026-02-21T08:21:54.7264573Z .b8 105 2026-02-21T08:21:54.7264729Z .b8 116 2026-02-21T08:21:54.7264851Z .b8 111 2026-02-21T08:21:54.7264970Z .b8 110 2026-02-21T08:21:54.7265083Z .b8 0 2026-02-21T08:21:54.7265230Z .b8 2 // DW_AT_language 2026-02-21T08:21:54.7265408Z .b8 0 2026-02-21T08:21:54.7265556Z .b8 99 // DW_AT_name 2026-02-21T08:21:54.7265735Z .b8 52 2026-02-21T08:21:54.7265860Z .b8 107 2026-02-21T08:21:54.7265973Z .b8 51 2026-02-21T08:21:54.7266096Z .b8 99 2026-02-21T08:21:54.7266209Z .b8 103 2026-02-21T08:21:54.7266330Z .b8 114 2026-02-21T08:21:54.7266446Z .b8 120 2026-02-21T08:21:54.7266645Z .b8 118 2026-02-21T08:21:54.7266761Z .b8 104 2026-02-21T08:21:54.7266872Z .b8 52 2026-02-21T08:21:54.7266990Z .b8 51 2026-02-21T08:21:54.7267102Z .b8 55 2026-02-21T08:21:54.7267226Z .b8 113 2026-02-21T08:21:54.7267338Z .b8 113 2026-02-21T08:21:54.7267458Z .b8 99 2026-02-21T08:21:54.7267571Z .b8 52 2026-02-21T08:21:54.7267690Z .b8 117 2026-02-21T08:21:54.7267801Z .b8 110 2026-02-21T08:21:54.7267921Z .b8 102 2026-02-21T08:21:54.7268032Z .b8 116 2026-02-21T08:21:54.7268151Z .b8 54 2026-02-21T08:21:54.7268263Z .b8 114 2026-02-21T08:21:54.7268382Z .b8 120 2026-02-21T08:21:54.7268500Z .b8 118 2026-02-21T08:21:54.7268609Z .b8 52 2026-02-21T08:21:54.7268728Z .b8 103 2026-02-21T08:21:54.7268840Z .b8 102 2026-02-21T08:21:54.7268958Z .b8 101 2026-02-21T08:21:54.7269069Z .b8 102 2026-02-21T08:21:54.7269187Z .b8 120 2026-02-21T08:21:54.7269298Z .b8 52 2026-02-21T08:21:54.7269418Z .b8 55 2026-02-21T08:21:54.7269530Z .b8 106 2026-02-21T08:21:54.7269647Z .b8 52 2026-02-21T08:21:54.7269827Z .b8 104 2026-02-21T08:21:54.7269953Z .b8 111 2026-02-21T08:21:54.7270064Z .b8 51 2026-02-21T08:21:54.7270184Z .b8 55 2026-02-21T08:21:54.7270294Z .b8 110 
2026-02-21T08:21:54.7270417Z .b8 116 2026-02-21T08:21:54.7270539Z .b8 104 2026-02-21T08:21:54.7270651Z .b8 120 2026-02-21T08:21:54.7270771Z .b8 104 2026-02-21T08:21:54.7270883Z .b8 97 2026-02-21T08:21:54.7271007Z .b8 115 2026-02-21T08:21:54.7271123Z .b8 102 2026-02-21T08:21:54.7271248Z .b8 122 2026-02-21T08:21:54.7271360Z .b8 120 2026-02-21T08:21:54.7271479Z .b8 119 2026-02-21T08:21:54.7271590Z .b8 104 2026-02-21T08:21:54.7271708Z .b8 106 2026-02-21T08:21:54.7271818Z .b8 46 2026-02-21T08:21:54.7271937Z .b8 112 2026-02-21T08:21:54.7272047Z .b8 121 2026-02-21T08:21:54.7272166Z .b8 0 2026-02-21T08:21:54.7272327Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:54.7272548Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:54.7272743Z .b8 116 2026-02-21T08:21:54.7272849Z .b8 109 2026-02-21T08:21:54.7272964Z .b8 112 2026-02-21T08:21:54.7273071Z .b8 47 2026-02-21T08:21:54.7273183Z .b8 116 2026-02-21T08:21:54.7273290Z .b8 111 2026-02-21T08:21:54.7273402Z .b8 114 2026-02-21T08:21:54.7273507Z .b8 99 2026-02-21T08:21:54.7273623Z .b8 104 2026-02-21T08:21:54.7273728Z .b8 105 2026-02-21T08:21:54.7273840Z .b8 110 2026-02-21T08:21:54.7273951Z .b8 100 2026-02-21T08:21:54.7274055Z .b8 117 2026-02-21T08:21:54.7274166Z .b8 99 2026-02-21T08:21:54.7274272Z .b8 116 2026-02-21T08:21:54.7274384Z .b8 111 2026-02-21T08:21:54.7274488Z .b8 114 2026-02-21T08:21:54.7274603Z .b8 95 2026-02-21T08:21:54.7274736Z .b8 114 2026-02-21T08:21:54.7274853Z .b8 111 2026-02-21T08:21:54.7274960Z .b8 111 2026-02-21T08:21:54.7275074Z .b8 116 2026-02-21T08:21:54.7275181Z .b8 47 2026-02-21T08:21:54.7275297Z .b8 52 2026-02-21T08:21:54.7275406Z .b8 107 2026-02-21T08:21:54.7275522Z .b8 0 2026-02-21T08:21:54.7275633Z } 2026-02-21T08:21:54.7275764Z .section .debug_macinfo { } 2026-02-21T08:21:54.7275869Z 2026-02-21T08:21:54.7275957Z ================================================================ 2026-02-21T08:21:54.7276189Z please share the reproducer above with Triton project. 
2026-02-21T08:21:54.8317660Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T08:21:54.8318900Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]), static_shapes=True)
2026-02-21T08:21:54.8320027Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T08:21:54.8320272Z `ptxas` stderr:
2026-02-21T08:21:54.8320712Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 186 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T08:21:54.8321521Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:21:54.8321670Z
2026-02-21T08:21:54.8322092Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppporpg62.ptx -o /tmp/tmppporpg62.ptx.o
2026-02-21T08:21:54.8322524Z
2026-02-21T08:21:54.8322650Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T08:21:54.8322838Z
2026-02-21T08:21:54.8322841Z
2026-02-21T08:21:54.8322844Z
2026-02-21T08:21:54.8323068Z ================================================================
2026-02-21T08:21:54.8323276Z Internal Triton PTX codegen error
2026-02-21T08:21:54.8323441Z `ptxas` stderr:
2026-02-21T08:21:54.8323856Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 186 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T08:21:54.8324398Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:54.8324557Z 2026-02-21T08:21:54.8325143Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppporpg62.ptx -o /tmp/tmppporpg62.ptx.o 2026-02-21T08:21:54.8325582Z 2026-02-21T08:21:54.8325593Z 2026-02-21T08:21:54.8325646Z // 2026-02-21T08:21:54.8325785Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:54.8325962Z // 2026-02-21T08:21:54.8326027Z 2026-02-21T08:21:54.8326081Z .version 8.7 2026-02-21T08:21:54.8326221Z .target sm_100a 2026-02-21T08:21:54.8326360Z .address_size 64 2026-02-21T08:21:54.8326442Z 2026-02-21T08:21:54.8326559Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:54.8326814Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:54.8327015Z // @_helion_matmul 2026-02-21T08:21:54.8327214Z .visible .entry _helion_matmul( 2026-02-21T08:21:54.8327423Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:54.8327678Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:54.8327922Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:54.8328154Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:54.8328400Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:54.8328590Z ) 2026-02-21T08:21:54.8328712Z .reqntid 128 2026-02-21T08:21:54.8328833Z .maxnreg 32 2026-02-21T08:21:54.8328957Z { 2026-02-21T08:21:54.8329077Z .reg .pred %p<85>; 2026-02-21T08:21:54.8329224Z .reg .b32 %r<390>; 2026-02-21T08:21:54.8329358Z .reg .b64 %rd<128>; 2026-02-21T08:21:54.8329499Z $L__func_begin0: 2026-02-21T08:21:54.8329579Z 2026-02-21T08:21:54.8329637Z // %bb.0: 2026-02-21T08:21:54.8329874Z .loc 1 19 0 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:19 2026-02-21T08:21:54.8330163Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:54.8330332Z 
ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T08:21:54.8330534Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:54.8330691Z mov.b32 %r31, global_smem; 2026-02-21T08:21:54.8330849Z // begin inline asm 2026-02-21T08:21:54.8331074Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r31], 64; 2026-02-21T08:21:54.8331333Z // end inline asm 2026-02-21T08:21:54.8331502Z ld.param.b64 %rd28, [_helion_matmul_param_3]; 2026-02-21T08:21:54.8331691Z bar.sync 0; 2026-02-21T08:21:54.8331838Z ld.shared.b32 %r382, [global_smem]; 2026-02-21T08:21:54.8332003Z bar.sync 0; 2026-02-21T08:21:54.8332135Z // begin inline asm 2026-02-21T08:21:54.8332328Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:54.8332389Z // end inline asm 2026-02-21T08:21:54.8332562Z .loc 1 21 67 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:21:67 2026-02-21T08:21:54.8332621Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:54.8332689Z mov.u32 %r40, %ctaid.y; 2026-02-21T08:21:54.8332831Z mov.u32 %r41, %ctaid.z; 2026-02-21T08:21:54.8332891Z mov.u32 %r42, %nctaid.x; 2026-02-21T08:21:54.8332949Z mov.u32 %r43, %nctaid.y; 2026-02-21T08:21:54.8333021Z mad.lo.s32 %r44, %r41, %r43, %r40; 2026-02-21T08:21:54.8333084Z mad.lo.s32 %r45, %r44, %r42, %r3; 2026-02-21T08:21:54.8333140Z shl.b32 %r46, %r45, 7; 2026-02-21T08:21:54.8333206Z cvt.s64.s32 %rd29, %r46; 2026-02-21T08:21:54.8333265Z add.s64 %rd25, %rd28, %rd29; 2026-02-21T08:21:54.8333321Z shl.b32 %r47, %r1, 2; 2026-02-21T08:21:54.8333378Z add.s32 %r32, %r31, %r47; 2026-02-21T08:21:54.8333439Z mov.b32 %r49, 0; 2026-02-21T08:21:54.8333494Z // begin inline asm 2026-02-21T08:21:54.8333561Z @%p1 st.shared.b32 [ %r32 + 0 ], %r49; 2026-02-21T08:21:54.8333622Z // end inline asm 2026-02-21T08:21:54.8333683Z bar.warp.sync -1; 2026-02-21T08:21:54.8333743Z setp.eq.b32 %p75, %r1, 0; 2026-02-21T08:21:54.8333800Z cvt.u64.u32 %rd10, %r31; 2026-02-21T08:21:54.8333920Z // begin inline asm 2026-02-21T08:21:54.8334093Z @%p75 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T08:21:54.8334150Z // end inline asm 2026-02-21T08:21:54.8334211Z // begin inline asm 2026-02-21T08:21:54.8334349Z @%p75 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:21:54.8334402Z // end inline asm 2026-02-21T08:21:54.8334462Z mov.b32 %r92, 16; 2026-02-21T08:21:54.8334516Z // begin inline asm 2026-02-21T08:21:54.8334701Z @%p75 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r92; 2026-02-21T08:21:54.8334757Z // end inline asm 2026-02-21T08:21:54.8334818Z mov.b32 %r35, 64; 2026-02-21T08:21:54.8334873Z // begin inline asm 2026-02-21T08:21:54.8335020Z @%p75 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r35; 2026-02-21T08:21:54.8335084Z // end inline asm 2026-02-21T08:21:54.8335142Z mov.b32 %r36, 2048; 2026-02-21T08:21:54.8335198Z // begin inline asm 2026-02-21T08:21:54.8335366Z @%p75 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r36; 2026-02-21T08:21:54.8335424Z // end inline asm 2026-02-21T08:21:54.8335482Z mov.b32 %r37, 4096; 2026-02-21T08:21:54.8335538Z // begin inline asm 2026-02-21T08:21:54.8335703Z @%p75 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r37; 2026-02-21T08:21:54.8335759Z // end inline asm 2026-02-21T08:21:54.8335817Z mov.b64 %rd18, 4096; 2026-02-21T08:21:54.8335881Z // begin inline asm 2026-02-21T08:21:54.8336048Z @%p75 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T08:21:54.8336103Z // end inline asm 2026-02-21T08:21:54.8336164Z mov.b32 %r38, 1; 2026-02-21T08:21:54.8336220Z // begin inline asm 2026-02-21T08:21:54.8336389Z @%p75 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r38; 2026-02-21T08:21:54.8336445Z // end inline asm 2026-02-21T08:21:54.8336509Z // begin inline asm 2026-02-21T08:21:54.8336680Z @%p75 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r38; 2026-02-21T08:21:54.8336739Z // end inline asm 2026-02-21T08:21:54.8336803Z // begin inline asm 2026-02-21T08:21:54.8336952Z @%p75 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T08:21:54.8337006Z // end inline asm 2026-02-21T08:21:54.8337068Z // begin inline asm 2026-02-21T08:21:54.8337233Z @%p75 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:21:54.8337289Z // end inline asm 2026-02-21T08:21:54.8337345Z // begin inline asm 2026-02-21T08:21:54.8337506Z @%p75 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:21:54.8337562Z // end inline asm 2026-02-21T08:21:54.8337616Z // begin inline asm 2026-02-21T08:21:54.8337772Z @%p75 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:21:54.8337827Z // end inline asm 2026-02-21T08:21:54.8337885Z // begin inline asm 2026-02-21T08:21:54.8338207Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T08:21:54.8338260Z // end inline asm 2026-02-21T08:21:54.8338314Z // begin inline asm 2026-02-21T08:21:54.8338443Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T08:21:54.8338513Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:54.8338585Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:54.8338636Z // end inline asm 2026-02-21T08:21:54.8338696Z bar.sync 0; 2026-02-21T08:21:54.8338761Z cvta.global.u64 %rd48, %rd25; 2026-02-21T08:21:54.8338940Z .loc 1 27 132 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:27:132 2026-02-21T08:21:54.8339010Z setp.gt.u32 %p21, %r3, 2047; 2026-02-21T08:21:54.8339067Z @%p21 bra $L__BB0_8; 2026-02-21T08:21:54.8339140Z // %bb.1: // %.lr.ph 2026-02-21T08:21:54.8339384Z .loc 1 0 132 // 
c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:0:132 2026-02-21T08:21:54.8339476Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T08:21:54.8339643Z .loc 1 47 48 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:47:48 2026-02-21T08:21:54.8339700Z and.b32 %r135, %r1, 1; 2026-02-21T08:21:54.8339767Z shl.b32 %r4, %r135, 3; 2026-02-21T08:21:54.8339927Z .loc 1 39 45 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:39:45 2026-02-21T08:21:54.8339982Z shl.b32 %r136, %r1, 3; 2026-02-21T08:21:54.8340048Z and.b32 %r137, %r136, 56; 2026-02-21T08:21:54.8340107Z bfe.u32 %r138, %r1, 3, 4; 2026-02-21T08:21:54.8340181Z bfe.u32 %r5, %r1, 1, 6; 2026-02-21T08:21:54.8340239Z shr.u32 %r139, %r1, 5; 2026-02-21T08:21:54.8340304Z shl.b32 %r140, %r1, 4; 2026-02-21T08:21:54.8340367Z and.b32 %r141, %r140, 1904; 2026-02-21T08:21:54.8340426Z bfe.s32 %r142, %r1, 3, 1; 2026-02-21T08:21:54.8340499Z and.b32 %r143, %r142, 144; 2026-02-21T08:21:54.8340564Z xor.b32 %r144, %r143, %r141; 2026-02-21T08:21:54.8340627Z add.s32 %r91, %r31, %r144; 2026-02-21T08:21:54.8340694Z add.s32 %r98, %r91, 2048; 2026-02-21T08:21:54.8340749Z add.s32 %r105, %r91, 4096; 2026-02-21T08:21:54.8340805Z add.s32 %r112, %r91, 6144; 2026-02-21T08:21:54.8340859Z add.s32 %r119, %r91, 8192; 2026-02-21T08:21:54.8340923Z add.s32 %r126, %r91, 10240; 2026-02-21T08:21:54.8340980Z add.s32 %r189, %r91, 12288; 2026-02-21T08:21:54.8341035Z and.b32 %r146, %r140, 176; 2026-02-21T08:21:54.8341096Z and.b32 %r147, %r1, 96; 2026-02-21T08:21:54.8341152Z shl.b32 %r148, %r147, 3; 2026-02-21T08:21:54.8341209Z bfe.s32 %r149, %r1, 2, 1; 2026-02-21T08:21:54.8341267Z and.b32 %r150, %r149, 1088; 2026-02-21T08:21:54.8341333Z and.b32 %r152, %r47, 64; 2026-02-21T08:21:54.8341393Z xor.b32 %r153, %r150, %r152; 2026-02-21T08:21:54.8341451Z add.s32 %r154, %r31, %r146; 2026-02-21T08:21:54.8341516Z add.s32 %r155, %r154, %r148; 2026-02-21T08:21:54.8341577Z shl.b32 %r156, %r1, 5; 2026-02-21T08:21:54.8341637Z and.b32 %r157, %r156, 
1792; 2026-02-21T08:21:54.8341695Z and.b32 %r158, %r136, 48; 2026-02-21T08:21:54.8341759Z shl.b32 %r159, %r147, 1; 2026-02-21T08:21:54.8341818Z shl.b32 %r160, %r135, 6; 2026-02-21T08:21:54.8341876Z xor.b32 %r161, %r159, %r160; 2026-02-21T08:21:54.8341942Z add.s32 %r162, %r31, %r157; 2026-02-21T08:21:54.8342000Z add.s32 %r163, %r162, %r158; 2026-02-21T08:21:54.8342172Z .loc 1 34 33 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:34:33 2026-02-21T08:21:54.8342229Z shr.u32 %r164, %r3, 5; 2026-02-21T08:21:54.8342294Z and.b32 %r165, %r164, 60; 2026-02-21T08:21:54.8342465Z .loc 1 36 64 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:36:64 2026-02-21T08:21:54.8342524Z and.b32 %r166, %r3, 3; 2026-02-21T08:21:54.8342701Z .loc 1 36 30 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:36:30 2026-02-21T08:21:54.8342763Z or.b32 %r167, %r165, %r166; 2026-02-21T08:21:54.8342976Z .loc 1 38 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:38:27 2026-02-21T08:21:54.8343043Z shl.b32 %r194, %r167, 6; 2026-02-21T08:21:54.8343209Z .loc 1 40 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:40:27 2026-02-21T08:21:54.8343268Z shl.b32 %r168, %r3, 4; 2026-02-21T08:21:54.8343334Z and.b32 %r169, %r168, 1984; 2026-02-21T08:21:54.8343499Z .loc 1 41 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:41:32 2026-02-21T08:21:54.8343557Z or.b32 %r170, %r169, %r5; 2026-02-21T08:21:54.8343617Z or.b32 %r12, %r169, %r138; 2026-02-21T08:21:54.8343790Z .loc 1 51 53 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:53 2026-02-21T08:21:54.8343848Z shl.b32 %r171, %r170, 11; 2026-02-21T08:21:54.8344063Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8344153Z shfl.sync.idx.b32 %r16, %r139, 0, 31, -1; 2026-02-21T08:21:54.8344212Z shl.b32 %r172, %r16, 21; 2026-02-21T08:21:54.8344274Z and.b32 %r173, %r172, 6291456; 2026-02-21T08:21:54.8344340Z add.s32 %r288, %r173, %r382; 
2026-02-21T08:21:54.8344402Z mov.pred %p22, -1; 2026-02-21T08:21:54.8344460Z // begin inline asm 2026-02-21T08:21:54.8344776Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r288 + 0], 32, {%r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49}; 2026-02-21T08:21:54.8344841Z // end inline asm 2026-02-21T08:21:54.8344898Z // begin inline asm 2026-02-21T08:21:54.8345163Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r288 + 16], 32, {%r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49}; 2026-02-21T08:21:54.8345228Z // end inline asm 2026-02-21T08:21:54.8345285Z // begin inline asm 2026-02-21T08:21:54.8345355Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:54.8345420Z // end inline asm 2026-02-21T08:21:54.8345479Z bar.sync 0; 2026-02-21T08:21:54.8345654Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8345713Z add.s32 %r384, %r31, 28736; 2026-02-21T08:21:54.8345777Z // begin inline asm 2026-02-21T08:21:54.8345865Z @%p75 mbarrier.init.shared::cta.b64 [%r384], 1; 2026-02-21T08:21:54.8345922Z // end inline asm 2026-02-21T08:21:54.8345987Z bar.sync 0; 2026-02-21T08:21:54.8346046Z add.s32 %r83, %r31, 28744; 2026-02-21T08:21:54.8346103Z // begin inline asm 2026-02-21T08:21:54.8346188Z @%p75 mbarrier.init.shared::cta.b64 [%r83], 1; 2026-02-21T08:21:54.8346252Z // end inline asm 2026-02-21T08:21:54.8346312Z add.s32 %r84, %r31, 28672; 2026-02-21T08:21:54.8346368Z // begin inline asm 2026-02-21T08:21:54.8346459Z @%p75 mbarrier.init.shared::cta.b64 [%r84], 1; 2026-02-21T08:21:54.8346515Z // end inline asm 2026-02-21T08:21:54.8346571Z bar.sync 0; 2026-02-21T08:21:54.8346640Z add.s32 %r85, %r31, 28680; 2026-02-21T08:21:54.8346700Z // begin inline asm 2026-02-21T08:21:54.8346781Z @%p75 mbarrier.init.shared::cta.b64 [%r85], 1; 2026-02-21T08:21:54.8346836Z // end inline asm 2026-02-21T08:21:54.8346899Z bar.sync 0; 2026-02-21T08:21:54.8346958Z 
add.s32 %r86, %r31, 28688; 2026-02-21T08:21:54.8347015Z // begin inline asm 2026-02-21T08:21:54.8347102Z @%p75 mbarrier.init.shared::cta.b64 [%r86], 1; 2026-02-21T08:21:54.8347157Z // end inline asm 2026-02-21T08:21:54.8347212Z bar.sync 0; 2026-02-21T08:21:54.8347271Z add.s32 %r87, %r31, 28696; 2026-02-21T08:21:54.8347335Z // begin inline asm 2026-02-21T08:21:54.8347412Z @%p75 mbarrier.init.shared::cta.b64 [%r87], 1; 2026-02-21T08:21:54.8347467Z // end inline asm 2026-02-21T08:21:54.8347530Z bar.sync 0; 2026-02-21T08:21:54.8347589Z add.s32 %r88, %r31, 28704; 2026-02-21T08:21:54.8347646Z // begin inline asm 2026-02-21T08:21:54.8347725Z @%p75 mbarrier.init.shared::cta.b64 [%r88], 1; 2026-02-21T08:21:54.8347793Z // end inline asm 2026-02-21T08:21:54.8347851Z bar.sync 0; 2026-02-21T08:21:54.8347963Z add.s32 %r89, %r31, 28712; 2026-02-21T08:21:54.8348026Z // begin inline asm 2026-02-21T08:21:54.8348103Z @%p75 mbarrier.init.shared::cta.b64 [%r89], 1; 2026-02-21T08:21:54.8348158Z // end inline asm 2026-02-21T08:21:54.8348218Z bar.sync 0; 2026-02-21T08:21:54.8348277Z add.s32 %r191, %r31, 28720; 2026-02-21T08:21:54.8348334Z // begin inline asm 2026-02-21T08:21:54.8348417Z @%p75 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T08:21:54.8348481Z // end inline asm 2026-02-21T08:21:54.8348658Z .loc 1 51 60 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:60 2026-02-21T08:21:54.8348718Z or.b32 %r174, %r171, %r4; 2026-02-21T08:21:54.8348901Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8348974Z mad.wide.u32 %rd30, %r174, 2, %rd8; 2026-02-21T08:21:54.8349202Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8349272Z // begin inline asm 2026-02-21T08:21:54.8349404Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd30 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8349459Z // end inline asm 2026-02-21T08:21:54.8349521Z cp.async.commit_group; 2026-02-21T08:21:54.8349687Z .loc 1 
46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8349739Z bar.sync 0; 2026-02-21T08:21:54.8349793Z // begin inline asm 2026-02-21T08:21:54.8349906Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r84], 2048; 2026-02-21T08:21:54.8349960Z // end inline asm 2026-02-21T08:21:54.8350124Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8350183Z bar.sync 0; 2026-02-21T08:21:54.8350247Z elect.sync %r175|%p46, -1; 2026-02-21T08:21:54.8350309Z and.pred %p34, %p1, %p46; 2026-02-21T08:21:54.8350365Z add.s32 %r94, %r31, 14336; 2026-02-21T08:21:54.8350428Z // begin inline asm 2026-02-21T08:21:54.8350665Z @%p34 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r94], [%rd48, {%r49, %r194}], [%r84]; 2026-02-21T08:21:54.8350720Z // end inline asm 2026-02-21T08:21:54.8350887Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8350946Z add.s64 %rd32, %rd30, 32; 2026-02-21T08:21:54.8351106Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8351168Z // begin inline asm 2026-02-21T08:21:54.8351281Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd32 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8351336Z // end inline asm 2026-02-21T08:21:54.8351398Z cp.async.commit_group; 2026-02-21T08:21:54.8351564Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8351616Z bar.sync 0; 2026-02-21T08:21:54.8351670Z // begin inline asm 2026-02-21T08:21:54.8351781Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r85], 2048; 2026-02-21T08:21:54.8351836Z // end inline asm 2026-02-21T08:21:54.8351997Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8352054Z bar.sync 0; 2026-02-21T08:21:54.8352114Z elect.sync %r176|%p47, -1; 2026-02-21T08:21:54.8352172Z and.pred %p36, %p1, %p47; 2026-02-21T08:21:54.8352227Z 
add.s32 %r101, %r31, 16384; 2026-02-21T08:21:54.8352286Z // begin inline asm 2026-02-21T08:21:54.8352516Z @%p36 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r101], [%rd48, {%r92, %r194}], [%r85]; 2026-02-21T08:21:54.8352570Z // end inline asm 2026-02-21T08:21:54.8352735Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8352792Z add.s64 %rd34, %rd30, 64; 2026-02-21T08:21:54.8352954Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8353054Z // begin inline asm 2026-02-21T08:21:54.8353168Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd34 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8353221Z // end inline asm 2026-02-21T08:21:54.8353281Z cp.async.commit_group; 2026-02-21T08:21:54.8353449Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8353500Z bar.sync 0; 2026-02-21T08:21:54.8353552Z // begin inline asm 2026-02-21T08:21:54.8353657Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r86], 2048; 2026-02-21T08:21:54.8353709Z // end inline asm 2026-02-21T08:21:54.8353869Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8353927Z bar.sync 0; 2026-02-21T08:21:54.8353987Z elect.sync %r177|%p48, -1; 2026-02-21T08:21:54.8354046Z and.pred %p38, %p1, %p48; 2026-02-21T08:21:54.8354102Z add.s32 %r108, %r31, 18432; 2026-02-21T08:21:54.8354200Z mov.b32 %r109, 32; 2026-02-21T08:21:54.8354256Z // begin inline asm 2026-02-21T08:21:54.8354488Z @%p38 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd48, {%r109, %r194}], [%r86]; 2026-02-21T08:21:54.8354547Z // end inline asm 2026-02-21T08:21:54.8354738Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8354795Z add.s64 %rd36, %rd30, 96; 2026-02-21T08:21:54.8354964Z .loc 1 51 85 // 
c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8355018Z // begin inline asm 2026-02-21T08:21:54.8355133Z cp.async.cg.shared.global [ %r112 + 0 ], [ %rd36 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8355187Z // end inline asm 2026-02-21T08:21:54.8355253Z cp.async.commit_group; 2026-02-21T08:21:54.8355413Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8355468Z bar.sync 0; 2026-02-21T08:21:54.8355528Z // begin inline asm 2026-02-21T08:21:54.8355629Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r87], 2048; 2026-02-21T08:21:54.8355682Z // end inline asm 2026-02-21T08:21:54.8355840Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8355899Z bar.sync 0; 2026-02-21T08:21:54.8355960Z elect.sync %r178|%p49, -1; 2026-02-21T08:21:54.8356018Z and.pred %p40, %p1, %p49; 2026-02-21T08:21:54.8356083Z add.s32 %r115, %r31, 20480; 2026-02-21T08:21:54.8356135Z mov.b32 %r116, 48; 2026-02-21T08:21:54.8356190Z // begin inline asm 2026-02-21T08:21:54.8356428Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r115], [%rd48, {%r116, %r194}], [%r87]; 2026-02-21T08:21:54.8356483Z // end inline asm 2026-02-21T08:21:54.8356651Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8356716Z add.s64 %rd38, %rd30, 128; 2026-02-21T08:21:54.8356889Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8356949Z // begin inline asm 2026-02-21T08:21:54.8357062Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd38 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8357129Z // end inline asm 2026-02-21T08:21:54.8357190Z cp.async.commit_group; 2026-02-21T08:21:54.8357352Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8357410Z bar.sync 0; 2026-02-21T08:21:54.8357463Z // begin inline asm 2026-02-21T08:21:54.8357562Z @%p75 
mbarrier.arrive.expect_tx.shared.b64 _, [%r88], 2048; 2026-02-21T08:21:54.8357615Z // end inline asm 2026-02-21T08:21:54.8357784Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8357834Z bar.sync 0; 2026-02-21T08:21:54.8357893Z elect.sync %r179|%p50, -1; 2026-02-21T08:21:54.8357962Z and.pred %p42, %p1, %p50; 2026-02-21T08:21:54.8358090Z add.s32 %r122, %r31, 22528; 2026-02-21T08:21:54.8358144Z // begin inline asm 2026-02-21T08:21:54.8358380Z @%p42 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r122], [%rd48, {%r35, %r194}], [%r88]; 2026-02-21T08:21:54.8358433Z // end inline asm 2026-02-21T08:21:54.8358596Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8358661Z add.s64 %rd40, %rd30, 160; 2026-02-21T08:21:54.8358825Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8358878Z // begin inline asm 2026-02-21T08:21:54.8358986Z cp.async.cg.shared.global [ %r126 + 0 ], [ %rd40 + 0 ], 0x10, %r92; 2026-02-21T08:21:54.8359047Z // end inline asm 2026-02-21T08:21:54.8359107Z cp.async.commit_group; 2026-02-21T08:21:54.8359319Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8359382Z bar.sync 0; 2026-02-21T08:21:54.8359438Z // begin inline asm 2026-02-21T08:21:54.8359537Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r89], 2048; 2026-02-21T08:21:54.8359590Z // end inline asm 2026-02-21T08:21:54.8359753Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8359805Z bar.sync 0; 2026-02-21T08:21:54.8359865Z elect.sync %r180|%p51, -1; 2026-02-21T08:21:54.8359935Z and.pred %p44, %p1, %p51; 2026-02-21T08:21:54.8359991Z add.s32 %r129, %r31, 24576; 2026-02-21T08:21:54.8360043Z mov.b32 %r130, 80; 2026-02-21T08:21:54.8360105Z // begin inline asm 2026-02-21T08:21:54.8360337Z @%p44 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r129], [%rd48, {%r130, %r194}], [%r89]; 2026-02-21T08:21:54.8360390Z // end inline asm 2026-02-21T08:21:54.8360548Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8360619Z cp.async.wait_group 5; 2026-02-21T08:21:54.8360673Z bar.sync 0; 2026-02-21T08:21:54.8360833Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8360895Z // begin inline asm 2026-02-21T08:21:54.8360943Z 2026-02-21T08:21:54.8360992Z { 2026-02-21T08:21:54.8361057Z .reg .pred complete; 2026-02-21T08:21:54.8361110Z waitLoop: 2026-02-21T08:21:54.8361219Z mbarrier.try_wait.parity.shared.b64 complete, [%r84], %r49; 2026-02-21T08:21:54.8361280Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.8361335Z } 2026-02-21T08:21:54.8361339Z 2026-02-21T08:21:54.8361392Z // end inline asm 2026-02-21T08:21:54.8361550Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8361617Z setp.ne.b32 %p52, %r16, 0; 2026-02-21T08:21:54.8361675Z @%p52 bra $L__BB0_3; 2026-02-21T08:21:54.8361725Z // %bb.2: 2026-02-21T08:21:54.8361885Z .loc 1 0 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:0:52 2026-02-21T08:21:54.8361951Z bfe.u32 %r185, %r94, 4, 14; 2026-02-21T08:21:54.8362010Z cvt.u64.u32 %rd45, %r185; 2026-02-21T08:21:54.8362081Z or.b64 %rd43, %rd45, -4611685949699522560; 2026-02-21T08:21:54.8362144Z bfe.u32 %r186, %r31, 4, 14; 2026-02-21T08:21:54.8362200Z cvt.u64.u32 %rd46, %r186; 2026-02-21T08:21:54.8362269Z or.b64 %rd42, %rd46, -4611685949699522560; 2026-02-21T08:21:54.8362432Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8362493Z elect.sync %r187|%p54, -1; 2026-02-21T08:21:54.8362548Z mov.b32 %r182, 68157456; 2026-02-21T08:21:54.8362604Z mov.pred %p53, 0; 2026-02-21T08:21:54.8362665Z // begin inline asm 2026-02-21T08:21:54.8362800Z @%p54 
tcgen05.mma.cta_group::1.kind::f16 [ %r382 + 0 ], %rd42, %rd43, %r182, %p53; 2026-02-21T08:21:54.8362852Z // end inline asm 2026-02-21T08:21:54.8362915Z add.s32 %r188, %r31, 28736; 2026-02-21T08:21:54.8362973Z cvt.u64.u32 %rd44, %r188; 2026-02-21T08:21:54.8363067Z // begin inline asm 2026-02-21T08:21:54.8363196Z @%p54 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd44]; 2026-02-21T08:21:54.8363250Z // end inline asm 2026-02-21T08:21:54.8363303Z $L__BB0_3: 2026-02-21T08:21:54.8363469Z .loc 1 0 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:0:52 2026-02-21T08:21:54.8363557Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T08:21:54.8363615Z add.s32 %r8, %r155, %r153; 2026-02-21T08:21:54.8363673Z add.s32 %r293, %r163, %r161; 2026-02-21T08:21:54.8363734Z or.b32 %r11, %r194, %r137; 2026-02-21T08:21:54.8363790Z or.b32 %r13, %r12, 16; 2026-02-21T08:21:54.8363843Z or.b32 %r14, %r12, 32; 2026-02-21T08:21:54.8363895Z or.b32 %r15, %r12, 48; 2026-02-21T08:21:54.8364068Z .loc 1 51 32 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:32 2026-02-21T08:21:54.8364164Z add.s64 %rd47, %rd30, 192; 2026-02-21T08:21:54.8364326Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8364390Z bar.sync 0; 2026-02-21T08:21:54.8364442Z mov.b32 %r190, 16; 2026-02-21T08:21:54.8364497Z // begin inline asm 2026-02-21T08:21:54.8364616Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd47 + 0 ], 0x10, %r190; 2026-02-21T08:21:54.8364696Z // end inline asm 2026-02-21T08:21:54.8364758Z cp.async.commit_group; 2026-02-21T08:21:54.8364921Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8364983Z // begin inline asm 2026-02-21T08:21:54.8365089Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r191], 2048; 2026-02-21T08:21:54.8365144Z // end inline asm 2026-02-21T08:21:54.8365317Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 
2026-02-21T08:21:54.8365369Z bar.sync 0; 2026-02-21T08:21:54.8365432Z elect.sync %r201|%p59, -1; 2026-02-21T08:21:54.8365502Z and.pred %p57, %p1, %p59; 2026-02-21T08:21:54.8365563Z add.s32 %r192, %r31, 26624; 2026-02-21T08:21:54.8365615Z mov.b32 %r193, 96; 2026-02-21T08:21:54.8365668Z // begin inline asm 2026-02-21T08:21:54.8365912Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r192], [%rd48, {%r193, %r194}], [%r191]; 2026-02-21T08:21:54.8365967Z // end inline asm 2026-02-21T08:21:54.8366125Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8366189Z shl.b32 %r202, %r3, 15; 2026-02-21T08:21:54.8366248Z and.b32 %r203, %r202, 4063232; 2026-02-21T08:21:54.8366303Z shl.b32 %r204, %r5, 11; 2026-02-21T08:21:54.8366366Z or.b32 %r205, %r203, %r204; 2026-02-21T08:21:54.8366421Z or.b32 %r206, %r205, %r4; 2026-02-21T08:21:54.8366488Z mad.wide.u32 %rd50, %r206, 2, %rd8; 2026-02-21T08:21:54.8366543Z add.s64 %rd126, %rd50, 224; 2026-02-21T08:21:54.8366607Z mov.b32 %r388, 1; 2026-02-21T08:21:54.8366663Z mov.b32 %r387, 6; 2026-02-21T08:21:54.8366718Z mov.b32 %r383, 0; 2026-02-21T08:21:54.8366780Z mov.b64 %rd127, 0; 2026-02-21T08:21:54.8366833Z mov.b32 %r385, %r383; 2026-02-21T08:21:54.8366887Z mov.b32 %r386, %r383; 2026-02-21T08:21:54.8366939Z mov.b32 %r389, %r383; 2026-02-21T08:21:54.8367001Z bra.uni $L__BB0_4; 2026-02-21T08:21:54.8367101Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:54.8367265Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8367336Z setp.lt.u64 %p67, %rd127, 1936; 2026-02-21T08:21:54.8367495Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8367549Z // begin inline asm 2026-02-21T08:21:54.8367602Z 2026-02-21T08:21:54.8367650Z { 2026-02-21T08:21:54.8367707Z .reg .pred complete; 2026-02-21T08:21:54.8367759Z waitLoop: 2026-02-21T08:21:54.8367885Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r384], %r383; 2026-02-21T08:21:54.8367999Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.8368048Z } 2026-02-21T08:21:54.8368052Z 2026-02-21T08:21:54.8368112Z // end inline asm 2026-02-21T08:21:54.8368169Z add.s32 %r234, %r388, 1; 2026-02-21T08:21:54.8368229Z setp.gt.s32 %p70, %r234, 1; 2026-02-21T08:21:54.8368290Z selp.b32 %r388, 0, %r234, %p70; 2026-02-21T08:21:54.8368356Z selp.b32 %r235, 1, 0, %p70; 2026-02-21T08:21:54.8368414Z xor.b32 %r389, %r245, %r235; 2026-02-21T08:21:54.8368579Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8368644Z add.s32 %r236, %r387, 1; 2026-02-21T08:21:54.8368703Z setp.gt.s32 %p71, %r236, 6; 2026-02-21T08:21:54.8368762Z selp.b32 %r387, 0, %r236, %p71; 2026-02-21T08:21:54.8368932Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8368985Z bar.sync 0; 2026-02-21T08:21:54.8369094Z shl.b32 %r237, %r387, 11; 2026-02-21T08:21:54.8369153Z add.s32 %r227, %r91, %r237; 2026-02-21T08:21:54.8369220Z selp.b32 %r228, 16, 0, %p67; 2026-02-21T08:21:54.8369272Z // begin inline asm 2026-02-21T08:21:54.8369387Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd126 + 0 ], 0x10, %r228; 2026-02-21T08:21:54.8369446Z // end inline asm 2026-02-21T08:21:54.8369504Z cp.async.commit_group; 2026-02-21T08:21:54.8369668Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8369722Z shl.b32 %r238, %r387, 3; 2026-02-21T08:21:54.8369783Z add.s32 %r240, %r31, %r238; 2026-02-21T08:21:54.8369840Z add.s32 %r233, %r240, 28672; 2026-02-21T08:21:54.8369900Z and.pred %p65, %p75, %p67; 2026-02-21T08:21:54.8369961Z // begin inline asm 2026-02-21T08:21:54.8370068Z @%p65 mbarrier.arrive.expect_tx.shared.b64 _, [%r233], 2048; 2026-02-21T08:21:54.8370122Z // end inline asm 2026-02-21T08:21:54.8370294Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 
2026-02-21T08:21:54.8370352Z add.s32 %r241, %r31, %r237; 2026-02-21T08:21:54.8370407Z add.s32 %r230, %r241, 14336; 2026-02-21T08:21:54.8370459Z bar.sync 0; 2026-02-21T08:21:54.8370526Z elect.sync %r242|%p72, -1; 2026-02-21T08:21:54.8370586Z and.pred %p73, %p67, %p72; 2026-02-21T08:21:54.8370645Z and.pred %p66, %p1, %p73; 2026-02-21T08:21:54.8370707Z cvt.u32.u64 %r243, %rd127; 2026-02-21T08:21:54.8370762Z add.s32 %r231, %r243, 112; 2026-02-21T08:21:54.8370814Z // begin inline asm 2026-02-21T08:21:54.8371054Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r230], [%rd48, {%r231, %r194}], [%r233]; 2026-02-21T08:21:54.8371115Z // end inline asm 2026-02-21T08:21:54.8371278Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8371335Z add.s64 %rd126, %rd126, 32; 2026-02-21T08:21:54.8371405Z setp.lt.u64 %p74, %rd127, 2016; 2026-02-21T08:21:54.8371462Z add.s64 %rd127, %rd127, 16; 2026-02-21T08:21:54.8371519Z mov.b32 %r383, %r245; 2026-02-21T08:21:54.8371581Z mov.b32 %r384, %r244; 2026-02-21T08:21:54.8371637Z @%p74 bra $L__BB0_4; 2026-02-21T08:21:54.8371691Z bra.uni $L__BB0_7; 2026-02-21T08:21:54.8371792Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:54.8371960Z .loc 1 0 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:0:42 2026-02-21T08:21:54.8372015Z mov.b32 %r245, %r389; 2026-02-21T08:21:54.8372178Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8372242Z add.s32 %r209, %r386, 1; 2026-02-21T08:21:54.8372301Z setp.gt.s32 %p61, %r209, 6; 2026-02-21T08:21:54.8372362Z selp.b32 %r386, 0, %r209, %p61; 2026-02-21T08:21:54.8372426Z selp.b32 %r210, 1, 0, %p61; 2026-02-21T08:21:54.8372482Z xor.b32 %r385, %r385, %r210; 2026-02-21T08:21:54.8372647Z .loc 1 51 85 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:51:85 2026-02-21T08:21:54.8372753Z cp.async.wait_group 5; 2026-02-21T08:21:54.8372813Z bar.sync 0; 
2026-02-21T08:21:54.8372981Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8373037Z shl.b32 %r211, %r386, 3; 2026-02-21T08:21:54.8373102Z add.s32 %r213, %r31, %r211; 2026-02-21T08:21:54.8373161Z add.s32 %r207, %r213, 28672; 2026-02-21T08:21:54.8373215Z // begin inline asm 2026-02-21T08:21:54.8373274Z 2026-02-21T08:21:54.8373331Z { 2026-02-21T08:21:54.8373390Z .reg .pred complete; 2026-02-21T08:21:54.8373442Z waitLoop: 2026-02-21T08:21:54.8373567Z mbarrier.try_wait.parity.shared.b64 complete, [%r207], %r385; 2026-02-21T08:21:54.8373628Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.8373676Z } 2026-02-21T08:21:54.8373679Z 2026-02-21T08:21:54.8373738Z // end inline asm 2026-02-21T08:21:54.8373794Z shl.b32 %r214, %r388, 3; 2026-02-21T08:21:54.8373936Z add.s32 %r215, %r31, %r214; 2026-02-21T08:21:54.8373995Z add.s32 %r244, %r215, 28736; 2026-02-21T08:21:54.8374168Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8374223Z @%p52 bra $L__BB0_6; 2026-02-21T08:21:54.8374318Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:54.8374486Z .loc 1 52 44 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:52:44 2026-02-21T08:21:54.8374541Z shl.b32 %r218, %r386, 11; 2026-02-21T08:21:54.8374596Z add.s32 %r220, %r31, %r218; 2026-02-21T08:21:54.8374657Z add.s32 %r221, %r220, 14336; 2026-02-21T08:21:54.8374842Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8374907Z elect.sync %r222|%p63, -1; 2026-02-21T08:21:54.8374966Z bfe.u32 %r223, %r220, 4, 14; 2026-02-21T08:21:54.8375035Z cvt.u64.u32 %rd54, %r223; 2026-02-21T08:21:54.8375111Z or.b64 %rd51, %rd54, -4611685949699522560; 2026-02-21T08:21:54.8375174Z bfe.u32 %r224, %r221, 4, 14; 2026-02-21T08:21:54.8375243Z cvt.u64.u32 %rd55, %r224; 2026-02-21T08:21:54.8375311Z or.b64 %rd52, %rd55, -4611685949699522560; 2026-02-21T08:21:54.8375366Z mov.b32 %r217, 68157456; 
2026-02-21T08:21:54.8375424Z mov.pred %p62, -1; 2026-02-21T08:21:54.8375486Z // begin inline asm 2026-02-21T08:21:54.8375622Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r382 + 0 ], %rd51, %rd52, %r217, %p62; 2026-02-21T08:21:54.8375676Z // end inline asm 2026-02-21T08:21:54.8375741Z cvt.u64.u32 %rd53, %r244; 2026-02-21T08:21:54.8375794Z // begin inline asm 2026-02-21T08:21:54.8375915Z @%p63 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd53]; 2026-02-21T08:21:54.8375976Z // end inline asm 2026-02-21T08:21:54.8376029Z bra.uni $L__BB0_6; 2026-02-21T08:21:54.8376120Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:54.8376281Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8376343Z // begin inline asm 2026-02-21T08:21:54.8376391Z 2026-02-21T08:21:54.8376439Z { 2026-02-21T08:21:54.8376505Z .reg .pred complete; 2026-02-21T08:21:54.8376557Z waitLoop: 2026-02-21T08:21:54.8376668Z mbarrier.try_wait.parity.shared.b64 complete, [%r244], %r245; 2026-02-21T08:21:54.8376739Z @!complete bra.uni waitLoop; 2026-02-21T08:21:54.8376789Z } 2026-02-21T08:21:54.8376792Z 2026-02-21T08:21:54.8376846Z // end inline asm 2026-02-21T08:21:54.8377003Z .loc 1 46 42 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:46:42 2026-02-21T08:21:54.8377069Z cp.async.wait_group 0; 2026-02-21T08:21:54.8377121Z bar.sync 0; 2026-02-21T08:21:54.8377174Z // begin inline asm 2026-02-21T08:21:54.8377259Z @%p75 mbarrier.inval.shared::cta.b64 [%r84]; 2026-02-21T08:21:54.8377313Z // end inline asm 2026-02-21T08:21:54.8377363Z bar.sync 0; 2026-02-21T08:21:54.8377419Z // begin inline asm 2026-02-21T08:21:54.8377571Z @%p75 mbarrier.inval.shared::cta.b64 [%r85]; 2026-02-21T08:21:54.8377624Z // end inline asm 2026-02-21T08:21:54.8377675Z bar.sync 0; 2026-02-21T08:21:54.8377734Z // begin inline asm 2026-02-21T08:21:54.8377809Z @%p75 mbarrier.inval.shared::cta.b64 [%r86]; 2026-02-21T08:21:54.8377861Z // end inline asm 2026-02-21T08:21:54.8377917Z 
bar.sync 0; 2026-02-21T08:21:54.8377970Z // begin inline asm 2026-02-21T08:21:54.8378043Z @%p75 mbarrier.inval.shared::cta.b64 [%r87]; 2026-02-21T08:21:54.8378095Z // end inline asm 2026-02-21T08:21:54.8378153Z bar.sync 0; 2026-02-21T08:21:54.8378206Z // begin inline asm 2026-02-21T08:21:54.8378278Z @%p75 mbarrier.inval.shared::cta.b64 [%r88]; 2026-02-21T08:21:54.8378337Z // end inline asm 2026-02-21T08:21:54.8378388Z bar.sync 0; 2026-02-21T08:21:54.8378441Z // begin inline asm 2026-02-21T08:21:54.8378512Z @%p75 mbarrier.inval.shared::cta.b64 [%r89]; 2026-02-21T08:21:54.8378572Z // end inline asm 2026-02-21T08:21:54.8378682Z bar.sync 0; 2026-02-21T08:21:54.8378739Z // begin inline asm 2026-02-21T08:21:54.8378823Z @%p75 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T08:21:54.8378876Z // end inline asm 2026-02-21T08:21:54.8378932Z add.s32 %r253, %r31, 28736; 2026-02-21T08:21:54.8378985Z // begin inline asm 2026-02-21T08:21:54.8379067Z @%p75 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T08:21:54.8379120Z // end inline asm 2026-02-21T08:21:54.8379172Z bar.sync 0; 2026-02-21T08:21:54.8379231Z // begin inline asm 2026-02-21T08:21:54.8379302Z @%p75 mbarrier.inval.shared::cta.b64 [%r83]; 2026-02-21T08:21:54.8379355Z // end inline asm 2026-02-21T08:21:54.8379527Z .loc 1 56 45 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:56:45 2026-02-21T08:21:54.8379585Z shl.b32 %r326, %r12, 12; 2026-02-21T08:21:54.8379641Z shl.b32 %r327, %r13, 12; 2026-02-21T08:21:54.8379695Z shl.b32 %r328, %r14, 12; 2026-02-21T08:21:54.8379757Z shl.b32 %r329, %r15, 12; 2026-02-21T08:21:54.8379926Z .loc 1 56 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:56:52 2026-02-21T08:21:54.8379986Z or.b32 %r330, %r11, %r326; 2026-02-21T08:21:54.8380050Z or.b32 %r331, %r11, %r327; 2026-02-21T08:21:54.8380104Z or.b32 %r332, %r11, %r328; 2026-02-21T08:21:54.8380159Z or.b32 %r333, %r11, %r329; 2026-02-21T08:21:54.8380322Z .loc 1 56 24 // 
c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:56:24 2026-02-21T08:21:54.8380401Z mad.wide.u32 %rd58, %r330, 2, %rd9; 2026-02-21T08:21:54.8380467Z mad.wide.u32 %rd59, %r331, 2, %rd9; 2026-02-21T08:21:54.8380529Z mad.wide.u32 %rd60, %r332, 2, %rd9; 2026-02-21T08:21:54.8380601Z mad.wide.u32 %rd61, %r333, 2, %rd9; 2026-02-21T08:21:54.8380768Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8380822Z // begin inline asm 2026-02-21T08:21:54.8381113Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270}, [%r288 + 0], 32; 2026-02-21T08:21:54.8381169Z // end inline asm 2026-02-21T08:21:54.8381223Z // begin inline asm 2026-02-21T08:21:54.8381513Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287}, [%r288 + 16], 32; 2026-02-21T08:21:54.8381566Z // end inline asm 2026-02-21T08:21:54.8381619Z // begin inline asm 2026-02-21T08:21:54.8381686Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:54.8381745Z // end inline asm 2026-02-21T08:21:54.8381804Z cvt.u64.u32 %rd62, %r255; 2026-02-21T08:21:54.8381861Z cvt.u64.u32 %rd63, %r256; 2026-02-21T08:21:54.8381922Z shl.b64 %rd64, %rd63, 32; 2026-02-21T08:21:54.8381978Z or.b64 %rd65, %rd62, %rd64; 2026-02-21T08:21:54.8382141Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8382210Z mov.b64 {%r334, %r335}, %rd65; 2026-02-21T08:21:54.8382318Z cvt.rn.f16x2.f32 %r336, %r335, %r334; 2026-02-21T08:21:54.8382484Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8382540Z cvt.u64.u32 %rd66, %r257; 2026-02-21T08:21:54.8382604Z cvt.u64.u32 %rd67, %r258; 2026-02-21T08:21:54.8382659Z shl.b64 %rd68, %rd67, 32; 2026-02-21T08:21:54.8382715Z or.b64 %rd69, %rd66, %rd68; 2026-02-21T08:21:54.8382885Z .loc 
1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8382943Z mov.b64 {%r337, %r338}, %rd69; 2026-02-21T08:21:54.8383008Z cvt.rn.f16x2.f32 %r339, %r338, %r337; 2026-02-21T08:21:54.8383180Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8383236Z cvt.u64.u32 %rd70, %r259; 2026-02-21T08:21:54.8383292Z cvt.u64.u32 %rd71, %r260; 2026-02-21T08:21:54.8383346Z shl.b64 %rd72, %rd71, 32; 2026-02-21T08:21:54.8383451Z or.b64 %rd73, %rd70, %rd72; 2026-02-21T08:21:54.8383615Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8383673Z mov.b64 {%r340, %r341}, %rd73; 2026-02-21T08:21:54.8383742Z cvt.rn.f16x2.f32 %r342, %r341, %r340; 2026-02-21T08:21:54.8383903Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8383958Z cvt.u64.u32 %rd74, %r261; 2026-02-21T08:21:54.8384020Z cvt.u64.u32 %rd75, %r262; 2026-02-21T08:21:54.8384076Z shl.b64 %rd76, %rd75, 32; 2026-02-21T08:21:54.8384132Z or.b64 %rd77, %rd74, %rd76; 2026-02-21T08:21:54.8384295Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8384360Z mov.b64 {%r343, %r344}, %rd77; 2026-02-21T08:21:54.8384421Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T08:21:54.8384586Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8384653Z cvt.u64.u32 %rd78, %r263; 2026-02-21T08:21:54.8384741Z cvt.u64.u32 %rd79, %r264; 2026-02-21T08:21:54.8384798Z shl.b64 %rd80, %rd79, 32; 2026-02-21T08:21:54.8384854Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T08:21:54.8385021Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8385079Z mov.b64 {%r346, %r347}, %rd81; 2026-02-21T08:21:54.8385141Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T08:21:54.8385311Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 
2026-02-21T08:21:54.8385365Z cvt.u64.u32 %rd82, %r265; 2026-02-21T08:21:54.8385419Z cvt.u64.u32 %rd83, %r266; 2026-02-21T08:21:54.8385482Z shl.b64 %rd84, %rd83, 32; 2026-02-21T08:21:54.8385541Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T08:21:54.8385713Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8385772Z mov.b64 {%r349, %r350}, %rd85; 2026-02-21T08:21:54.8385845Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:54.8386015Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8386073Z cvt.u64.u32 %rd86, %r267; 2026-02-21T08:21:54.8386139Z cvt.u64.u32 %rd87, %r268; 2026-02-21T08:21:54.8386199Z shl.b64 %rd88, %rd87, 32; 2026-02-21T08:21:54.8386256Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T08:21:54.8386435Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8386496Z mov.b64 {%r352, %r353}, %rd89; 2026-02-21T08:21:54.8386559Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:54.8386731Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8386796Z cvt.u64.u32 %rd90, %r269; 2026-02-21T08:21:54.8386854Z cvt.u64.u32 %rd91, %r270; 2026-02-21T08:21:54.8386914Z shl.b64 %rd92, %rd91, 32; 2026-02-21T08:21:54.8387029Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T08:21:54.8387203Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8387263Z mov.b64 {%r355, %r356}, %rd93; 2026-02-21T08:21:54.8387333Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T08:21:54.8387504Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8387561Z cvt.u64.u32 %rd94, %r272; 2026-02-21T08:21:54.8387619Z cvt.u64.u32 %rd95, %r273; 2026-02-21T08:21:54.8387683Z shl.b64 %rd96, %rd95, 32; 2026-02-21T08:21:54.8387741Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T08:21:54.8387911Z .loc 1 55 27 // 
c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8387978Z mov.b64 {%r358, %r359}, %rd97; 2026-02-21T08:21:54.8388043Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T08:21:54.8388262Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8388330Z cvt.u64.u32 %rd98, %r274; 2026-02-21T08:21:54.8388388Z cvt.u64.u32 %rd99, %r275; 2026-02-21T08:21:54.8388447Z shl.b64 %rd100, %rd99, 32; 2026-02-21T08:21:54.8388508Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T08:21:54.8388682Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8388745Z mov.b64 {%r361, %r362}, %rd101; 2026-02-21T08:21:54.8388809Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T08:21:54.8388980Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8389041Z cvt.u64.u32 %rd102, %r276; 2026-02-21T08:21:54.8389099Z cvt.u64.u32 %rd103, %r277; 2026-02-21T08:21:54.8389165Z shl.b64 %rd104, %rd103, 32; 2026-02-21T08:21:54.8389226Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T08:21:54.8389395Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8389461Z mov.b64 {%r364, %r365}, %rd105; 2026-02-21T08:21:54.8389535Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T08:21:54.8389706Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8389769Z cvt.u64.u32 %rd106, %r278; 2026-02-21T08:21:54.8389836Z cvt.u64.u32 %rd107, %r279; 2026-02-21T08:21:54.8389895Z shl.b64 %rd108, %rd107, 32; 2026-02-21T08:21:54.8389955Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T08:21:54.8390132Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8390192Z mov.b64 {%r367, %r368}, %rd109; 2026-02-21T08:21:54.8390256Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T08:21:54.8390426Z .loc 1 53 52 // 
c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8390493Z cvt.u64.u32 %rd110, %r280; 2026-02-21T08:21:54.8390552Z cvt.u64.u32 %rd111, %r281; 2026-02-21T08:21:54.8390613Z shl.b64 %rd112, %rd111, 32; 2026-02-21T08:21:54.8390679Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T08:21:54.8390849Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8390910Z mov.b64 {%r370, %r371}, %rd113; 2026-02-21T08:21:54.8390981Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T08:21:54.8391148Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8391206Z cvt.u64.u32 %rd114, %r282; 2026-02-21T08:21:54.8391262Z cvt.u64.u32 %rd115, %r283; 2026-02-21T08:21:54.8391327Z shl.b64 %rd116, %rd115, 32; 2026-02-21T08:21:54.8391386Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T08:21:54.8391555Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8391621Z mov.b64 {%r373, %r374}, %rd117; 2026-02-21T08:21:54.8391686Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T08:21:54.8391902Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8391968Z cvt.u64.u32 %rd118, %r284; 2026-02-21T08:21:54.8392027Z cvt.u64.u32 %rd119, %r285; 2026-02-21T08:21:54.8392085Z shl.b64 %rd120, %rd119, 32; 2026-02-21T08:21:54.8392144Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T08:21:54.8392323Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8392383Z mov.b64 {%r376, %r377}, %rd121; 2026-02-21T08:21:54.8392447Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T08:21:54.8392681Z .loc 1 53 52 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:53:52 2026-02-21T08:21:54.8392738Z cvt.u64.u32 %rd122, %r286; 2026-02-21T08:21:54.8392792Z cvt.u64.u32 %rd123, %r287; 2026-02-21T08:21:54.8392855Z shl.b64 %rd124, %rd123, 32; 2026-02-21T08:21:54.8392966Z or.b64 %rd125, %rd122, 
%rd124; 2026-02-21T08:21:54.8393137Z .loc 1 55 27 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:55:27 2026-02-21T08:21:54.8393193Z mov.b64 {%r379, %r380}, %rd125; 2026-02-21T08:21:54.8393261Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T08:21:54.8393422Z .loc 1 56 82 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:56:82 2026-02-21T08:21:54.8393513Z st.shared.v4.b32 [%r8], {%r336, %r348, %r360, %r372}; 2026-02-21T08:21:54.8393576Z bar.sync 0; 2026-02-21T08:21:54.8393632Z // begin inline asm 2026-02-21T08:21:54.8393782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r309, %r313, %r317, %r321}, [%r293]; 2026-02-21T08:21:54.8393844Z // end inline asm 2026-02-21T08:21:54.8393896Z bar.sync 0; 2026-02-21T08:21:54.8393985Z st.shared.v4.b32 [%r8], {%r339, %r351, %r363, %r375}; 2026-02-21T08:21:54.8394038Z bar.sync 0; 2026-02-21T08:21:54.8394099Z // begin inline asm 2026-02-21T08:21:54.8394252Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r310, %r314, %r318, %r322}, [%r293]; 2026-02-21T08:21:54.8394308Z // end inline asm 2026-02-21T08:21:54.8394365Z bar.sync 0; 2026-02-21T08:21:54.8394450Z st.shared.v4.b32 [%r8], {%r342, %r354, %r366, %r378}; 2026-02-21T08:21:54.8394501Z bar.sync 0; 2026-02-21T08:21:54.8394554Z // begin inline asm 2026-02-21T08:21:54.8394732Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r311, %r315, %r319, %r323}, [%r293]; 2026-02-21T08:21:54.8394785Z // end inline asm 2026-02-21T08:21:54.8394836Z bar.sync 0; 2026-02-21T08:21:54.8394927Z st.shared.v4.b32 [%r8], {%r345, %r357, %r369, %r381}; 2026-02-21T08:21:54.8394978Z bar.sync 0; 2026-02-21T08:21:54.8395032Z // begin inline asm 2026-02-21T08:21:54.8395177Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r312, %r316, %r320, %r324}, [%r293]; 2026-02-21T08:21:54.8395231Z // end inline asm 2026-02-21T08:21:54.8395285Z // begin inline asm 2026-02-21T08:21:54.8395386Z st.global.v4.b32 [ %rd58 + 0 ], { %r309, %r310, %r311, %r312 }; 2026-02-21T08:21:54.8395450Z // end inline asm 
2026-02-21T08:21:54.8395506Z // begin inline asm 2026-02-21T08:21:54.8395605Z st.global.v4.b32 [ %rd59 + 0 ], { %r313, %r314, %r315, %r316 }; 2026-02-21T08:21:54.8395664Z // end inline asm 2026-02-21T08:21:54.8395718Z // begin inline asm 2026-02-21T08:21:54.8395810Z st.global.v4.b32 [ %rd60 + 0 ], { %r317, %r318, %r319, %r320 }; 2026-02-21T08:21:54.8395862Z // end inline asm 2026-02-21T08:21:54.8395923Z // begin inline asm 2026-02-21T08:21:54.8396014Z st.global.v4.b32 [ %rd61 + 0 ], { %r321, %r322, %r323, %r324 }; 2026-02-21T08:21:54.8396066Z // end inline asm 2026-02-21T08:21:54.8396150Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:54.8396315Z .loc 1 27 4 // c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py:27:4 2026-02-21T08:21:54.8396366Z bar.sync 0; 2026-02-21T08:21:54.8396428Z // begin inline asm 2026-02-21T08:21:54.8396541Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r382, 64; 2026-02-21T08:21:54.8396597Z // end inline asm 2026-02-21T08:21:54.8396698Z ret; 2026-02-21T08:21:54.8396758Z $L__tmp0: 2026-02-21T08:21:54.8396813Z $L__func_end0: 2026-02-21T08:21:54.8396895Z // -- End function 2026-02-21T08:21:54.8396951Z } 2026-02-21T08:21:54.8397155Z .file 1 "/tmp/torchinductor_root/7w/c7w4ul3skbfdpqlo2gaqzbsjf5rsfh3ll3ntaqerqyeojvalak6y.py" 2026-02-21T08:21:54.8397215Z .section .debug_abbrev 2026-02-21T08:21:54.8397263Z { 2026-02-21T08:21:54.8397356Z .b8 1 // Abbreviation Code 2026-02-21T08:21:54.8397441Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:54.8397519Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:54.8397601Z .b8 37 // DW_AT_producer 2026-02-21T08:21:54.8397675Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.8397747Z .b8 19 // DW_AT_language 2026-02-21T08:21:54.8397875Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:54.8397955Z .b8 3 // DW_AT_name 2026-02-21T08:21:54.8398026Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.8398103Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:54.8398189Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:54.8398260Z .b8 27 // 
DW_AT_comp_dir 2026-02-21T08:21:54.8398329Z .b8 8 // DW_FORM_string 2026-02-21T08:21:54.8398405Z .b8 0 // EOM(1) 2026-02-21T08:21:54.8398473Z .b8 0 // EOM(2) 2026-02-21T08:21:54.8398537Z .b8 0 // EOM(3) 2026-02-21T08:21:54.8398593Z } 2026-02-21T08:21:54.8398649Z .section .debug_info 2026-02-21T08:21:54.8398698Z { 2026-02-21T08:21:54.8398780Z .b32 104 // Length of Unit 2026-02-21T08:21:54.8398872Z .b8 2 // DWARF version number 2026-02-21T08:21:54.8398921Z .b8 0 2026-02-21T08:21:54.8399036Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:21:54.8399127Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:54.8399223Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:54.8399301Z .b8 116 // DW_AT_producer 2026-02-21T08:21:54.8399359Z .b8 114 2026-02-21T08:21:54.8399409Z .b8 105 2026-02-21T08:21:54.8399459Z .b8 116 2026-02-21T08:21:54.8399507Z .b8 111 2026-02-21T08:21:54.8399563Z .b8 110 2026-02-21T08:21:54.8399612Z .b8 0 2026-02-21T08:21:54.8399682Z .b8 2 // DW_AT_language 2026-02-21T08:21:54.8399739Z .b8 0 2026-02-21T08:21:54.8399810Z .b8 99 // DW_AT_name 2026-02-21T08:21:54.8399860Z .b8 55 2026-02-21T08:21:54.8399911Z .b8 119 2026-02-21T08:21:54.8399970Z .b8 52 2026-02-21T08:21:54.8400022Z .b8 117 2026-02-21T08:21:54.8400071Z .b8 108 2026-02-21T08:21:54.8400120Z .b8 51 2026-02-21T08:21:54.8400174Z .b8 115 2026-02-21T08:21:54.8400223Z .b8 107 2026-02-21T08:21:54.8400270Z .b8 98 2026-02-21T08:21:54.8400325Z .b8 102 2026-02-21T08:21:54.8400375Z .b8 100 2026-02-21T08:21:54.8400423Z .b8 112 2026-02-21T08:21:54.8400470Z .b8 113 2026-02-21T08:21:54.8400527Z .b8 108 2026-02-21T08:21:54.8400575Z .b8 111 2026-02-21T08:21:54.8400622Z .b8 50 2026-02-21T08:21:54.8400678Z .b8 103 2026-02-21T08:21:54.8400726Z .b8 97 2026-02-21T08:21:54.8400774Z .b8 113 2026-02-21T08:21:54.8400822Z .b8 122 2026-02-21T08:21:54.8400877Z .b8 98 2026-02-21T08:21:54.8400925Z .b8 115 2026-02-21T08:21:54.8400973Z .b8 106 2026-02-21T08:21:54.8401028Z .b8 102 
2026-02-21T08:21:54.8401077Z .b8 53 2026-02-21T08:21:54.8401124Z .b8 114 2026-02-21T08:21:54.8401173Z .b8 115 2026-02-21T08:21:54.8401227Z .b8 102 2026-02-21T08:21:54.8401275Z .b8 104 2026-02-21T08:21:54.8401324Z .b8 51 2026-02-21T08:21:54.8401423Z .b8 108 2026-02-21T08:21:54.8401479Z .b8 108 2026-02-21T08:21:54.8401527Z .b8 51 2026-02-21T08:21:54.8401575Z .b8 110 2026-02-21T08:21:54.8401629Z .b8 116 2026-02-21T08:21:54.8401677Z .b8 97 2026-02-21T08:21:54.8401725Z .b8 113 2026-02-21T08:21:54.8401772Z .b8 101 2026-02-21T08:21:54.8401827Z .b8 114 2026-02-21T08:21:54.8401874Z .b8 113 2026-02-21T08:21:54.8401922Z .b8 121 2026-02-21T08:21:54.8401976Z .b8 101 2026-02-21T08:21:54.8402025Z .b8 111 2026-02-21T08:21:54.8402072Z .b8 106 2026-02-21T08:21:54.8402120Z .b8 118 2026-02-21T08:21:54.8402176Z .b8 97 2026-02-21T08:21:54.8402223Z .b8 108 2026-02-21T08:21:54.8402272Z .b8 97 2026-02-21T08:21:54.8402319Z .b8 107 2026-02-21T08:21:54.8402374Z .b8 54 2026-02-21T08:21:54.8402421Z .b8 121 2026-02-21T08:21:54.8402470Z .b8 46 2026-02-21T08:21:54.8402525Z .b8 112 2026-02-21T08:21:54.8402575Z .b8 121 2026-02-21T08:21:54.8402624Z .b8 0 2026-02-21T08:21:54.8402714Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:54.8402835Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:54.8402887Z .b8 116 2026-02-21T08:21:54.8402937Z .b8 109 2026-02-21T08:21:54.8402992Z .b8 112 2026-02-21T08:21:54.8403042Z .b8 47 2026-02-21T08:21:54.8403091Z .b8 116 2026-02-21T08:21:54.8403139Z .b8 111 2026-02-21T08:21:54.8403195Z .b8 114 2026-02-21T08:21:54.8403244Z .b8 99 2026-02-21T08:21:54.8403293Z .b8 104 2026-02-21T08:21:54.8403346Z .b8 105 2026-02-21T08:21:54.8403396Z .b8 110 2026-02-21T08:21:54.8403445Z .b8 100 2026-02-21T08:21:54.8403494Z .b8 117 2026-02-21T08:21:54.8403552Z .b8 99 2026-02-21T08:21:54.8403601Z .b8 116 2026-02-21T08:21:54.8403651Z .b8 111 2026-02-21T08:21:54.8403717Z .b8 114 2026-02-21T08:21:54.8403768Z .b8 95 2026-02-21T08:21:54.8403817Z .b8 114 2026-02-21T08:21:54.8403865Z .b8 111 
2026-02-21T08:21:54.8403923Z .b8 111 2026-02-21T08:21:54.8403970Z .b8 116 2026-02-21T08:21:54.8404019Z .b8 47 2026-02-21T08:21:54.8404066Z .b8 55 2026-02-21T08:21:54.8404121Z .b8 119 2026-02-21T08:21:54.8404171Z .b8 0 2026-02-21T08:21:54.8404222Z } 2026-02-21T08:21:54.8404292Z .section .debug_macinfo { } 2026-02-21T08:21:54.8404296Z 2026-02-21T08:21:54.8404371Z ================================================================ 2026-02-21T08:21:54.8404471Z please share the reproducer above with Triton project. 2026-02-21T08:21:55.0717537Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:21:55.0718848Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:55.0720189Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:55.0720443Z `ptxas` stderr: 2026-02-21T08:21:55.0720874Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 206 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.0721359Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.0721514Z 2026-02-21T08:21:55.0721921Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpkl9qkhdz.ptx -o /tmp/tmpkl9qkhdz.ptx.o 2026-02-21T08:21:55.0722371Z 2026-02-21T08:21:55.0722504Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
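Note on the failure reported above: ptxas rejects this config because the kernel was compiled with a 32-register cap (`maxnreg=32` in the Config line, `.maxnreg 32` in the PTX) while the message itself asks for a register target of 34 or higher. The following is a minimal sketch, not part of Helion or Triton, of how that message could be parsed when triaging logs like this one. The function name `parse_insufficient_registers` and the regular expression are hypothetical; the pattern is written only against the C7602 wording seen in this log, and other ptxas versions may phrase it differently.

```python
import re

# Hypothetical pattern, matched against the C7602 wording that appears in this log.
_C7602 = re.compile(
    r"\(C7602\) Insufficient registers \((?P<limit>\d+)\) to compile instruction "
    r"at line (?P<line>\d+) in function (?P<func>\w+)\. "
    r"Try to compile with register target of (?P<needed>\d+) or higher\."
)


def parse_insufficient_registers(stderr: str):
    """Return (function, limit, needed, ptx_line) for the first C7602 message, or None."""
    m = _C7602.search(stderr)
    if m is None:
        return None
    return m["func"], int(m["limit"]), int(m["needed"]), int(m["line"])


# Sample input copied from the ptxas stderr shown in the log.
stderr = (
    "ptxas fatal : (C7602) Insufficient registers (32) to compile instruction "
    "at line 206 in function _helion_matmul. Try to compile with register "
    "target of 34 or higher.\n"
    "ptxas fatal : Ptx assembly aborted due to errors\n"
)
result = parse_insufficient_registers(stderr)
```

Here `limit` is the cap the kernel was compiled with and `needed` is the minimum ptxas suggests, so a config with `limit < needed` can be flagged as expected to fail without inspecting the full PTX dump.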
2026-02-21T08:21:55.0722689Z 2026-02-21T08:21:55.0722869Z 2026-02-21T08:21:55.0722872Z 2026-02-21T08:21:55.0722954Z ================================================================ 2026-02-21T08:21:55.0723168Z Internal Triton PTX codegen error 2026-02-21T08:21:55.0723341Z `ptxas` stderr: 2026-02-21T08:21:55.0723752Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 206 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.0724529Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.0724749Z 2026-02-21T08:21:55.0725137Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpkl9qkhdz.ptx -o /tmp/tmpkl9qkhdz.ptx.o 2026-02-21T08:21:55.0725588Z 2026-02-21T08:21:55.0725591Z 2026-02-21T08:21:55.0725645Z // 2026-02-21T08:21:55.0725789Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:55.0725951Z // 2026-02-21T08:21:55.0726022Z 2026-02-21T08:21:55.0726077Z .version 8.7 2026-02-21T08:21:55.0726209Z .target sm_100a 2026-02-21T08:21:55.0726350Z .address_size 64 2026-02-21T08:21:55.0726431Z 2026-02-21T08:21:55.0726548Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:55.0726917Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:55.0727138Z // @_helion_matmul 2026-02-21T08:21:55.0727329Z .visible .entry _helion_matmul( 2026-02-21T08:21:55.0727549Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:55.0727797Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:55.0728053Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:55.0728298Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:55.0728560Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:55.0728770Z ) 2026-02-21T08:21:55.0728890Z .reqntid 256 2026-02-21T08:21:55.0729028Z .maxnreg 32 
2026-02-21T08:21:55.0729151Z { 2026-02-21T08:21:55.0729278Z .reg .pred %p<118>; 2026-02-21T08:21:55.0729420Z .reg .b32 %r<458>; 2026-02-21T08:21:55.0729562Z .reg .b64 %rd<123>; 2026-02-21T08:21:55.0729826Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19:0 2026-02-21T08:21:55.0730132Z $L__func_begin0: 2026-02-21T08:21:55.0730378Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19:0 2026-02-21T08:21:55.0730613Z 2026-02-21T08:21:55.0730664Z // %bb.0: 2026-02-21T08:21:55.0730815Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T08:21:55.0730998Z $L__tmp0: 2026-02-21T08:21:55.0731230Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19 2026-02-21T08:21:55.0731510Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:55.0731657Z shr.u32 %r2, %r1, 5; 2026-02-21T08:21:55.0731807Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:21:55.0731995Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T08:21:55.0732149Z @%p3 bra $L__BB0_16; 2026-02-21T08:21:55.0732285Z bra.uni $L__BB0_1; 2026-02-21T08:21:55.0732424Z $L__BB0_16: 2026-02-21T08:21:55.0732651Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:0 2026-02-21T08:21:55.0732961Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T08:21:55.0733166Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T08:21:55.0733459Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19 2026-02-21T08:21:55.0733754Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:21:55.0733947Z setp.lt.u32 %p27, %r1, 32; 2026-02-21T08:21:55.0734110Z mov.b32 %r139, global_smem; 2026-02-21T08:21:55.0734264Z // begin inline asm 2026-02-21T08:21:55.0734502Z @%p27 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r139], 64; 2026-02-21T08:21:55.0734775Z // end inline asm 2026-02-21T08:21:55.0734915Z bar.sync 0, 128; 2026-02-21T08:21:55.0735055Z ld.shared.b32 %r429, [global_smem]; 2026-02-21T08:21:55.0735225Z bar.sync 0, 128; 2026-02-21T08:21:55.0735351Z // 
begin inline asm 2026-02-21T08:21:55.0735554Z @%p27 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:55.0735780Z // end inline asm 2026-02-21T08:21:55.0736026Z .loc 1 21 67 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:21:67 2026-02-21T08:21:55.0736393Z mov.u32 %r41, %ctaid.x; 2026-02-21T08:21:55.0736541Z mov.u32 %r215, %ctaid.y; 2026-02-21T08:21:55.0736694Z mov.u32 %r216, %ctaid.z; 2026-02-21T08:21:55.0736841Z mov.u32 %r217, %nctaid.x; 2026-02-21T08:21:55.0736998Z mov.u32 %r218, %nctaid.y; 2026-02-21T08:21:55.0737151Z mad.lo.s32 %r219, %r216, %r218, %r215; 2026-02-21T08:21:55.0737337Z mad.lo.s32 %r220, %r219, %r217, %r41; 2026-02-21T08:21:55.0737513Z shl.b32 %r221, %r220, 8; 2026-02-21T08:21:55.0737662Z cvt.s64.s32 %rd54, %r221; 2026-02-21T08:21:55.0737824Z add.s64 %rd33, %rd6, %rd54; 2026-02-21T08:21:55.0737980Z shl.b32 %r222, %r1, 2; 2026-02-21T08:21:55.0738143Z add.s32 %r140, %r139, %r222; 2026-02-21T08:21:55.0738296Z mov.b32 %r457, 0; 2026-02-21T08:21:55.0738437Z // begin inline asm 2026-02-21T08:21:55.0738583Z @%p27 st.shared.b32 [ %r140 + 0 ], %r457; 2026-02-21T08:21:55.0738822Z // end inline asm 2026-02-21T08:21:55.0738964Z bar.warp.sync -1; 2026-02-21T08:21:55.0739112Z setp.eq.b32 %p100, %r1, 0; 2026-02-21T08:21:55.0739272Z cvt.u64.u32 %rd18, %r139; 2026-02-21T08:21:55.0739418Z // begin inline asm 2026-02-21T08:21:55.0739672Z @%p100 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd18 + 0 ], %rd3; 2026-02-21T08:21:55.0739952Z // end inline asm 2026-02-21T08:21:55.0740088Z // begin inline asm 2026-02-21T08:21:55.0740308Z @%p100 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:21:55.0740563Z // end inline asm 2026-02-21T08:21:55.0740691Z mov.b32 %r142, 16; 2026-02-21T08:21:55.0740829Z // begin inline asm 2026-02-21T08:21:55.0741070Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r142; 2026-02-21T08:21:55.0741336Z // end inline asm 
2026-02-21T08:21:55.0741470Z mov.b32 %r143, 64; 2026-02-21T08:21:55.0741599Z // begin inline asm 2026-02-21T08:21:55.0741835Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r143; 2026-02-21T08:21:55.0742104Z // end inline asm 2026-02-21T08:21:55.0742241Z mov.b32 %r144, 2048; 2026-02-21T08:21:55.0742374Z // begin inline asm 2026-02-21T08:21:55.0742617Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r144; 2026-02-21T08:21:55.0742889Z // end inline asm 2026-02-21T08:21:55.0743016Z // begin inline asm 2026-02-21T08:21:55.0743255Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r144; 2026-02-21T08:21:55.0743521Z // end inline asm 2026-02-21T08:21:55.0743654Z mov.b64 %rd26, 4096; 2026-02-21T08:21:55.0743789Z // begin inline asm 2026-02-21T08:21:55.0744046Z @%p100 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd18 + 0 ], 0x0, %rd26; 2026-02-21T08:21:55.0744328Z // end inline asm 2026-02-21T08:21:55.0744453Z mov.b32 %r146, 1; 2026-02-21T08:21:55.0744589Z // begin inline asm 2026-02-21T08:21:55.0744871Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r146; 2026-02-21T08:21:55.0745156Z // end inline asm 2026-02-21T08:21:55.0745282Z // begin inline asm 2026-02-21T08:21:55.0745541Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r146; 2026-02-21T08:21:55.0745821Z // end inline asm 2026-02-21T08:21:55.0745948Z // begin inline asm 2026-02-21T08:21:55.0746179Z @%p100 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x6; 2026-02-21T08:21:55.0746437Z // end inline asm 2026-02-21T08:21:55.0746571Z // begin inline asm 2026-02-21T08:21:55.0746811Z @%p100 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:21:55.0747101Z // end inline asm 2026-02-21T08:21:55.0747237Z // begin inline asm 2026-02-21T08:21:55.0747487Z @%p100 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:21:55.0747775Z // end inline asm 2026-02-21T08:21:55.0747915Z // begin inline asm 2026-02-21T08:21:55.0748156Z @%p100 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:21:55.0748483Z // end inline asm 2026-02-21T08:21:55.0748625Z // begin inline asm 2026-02-21T08:21:55.0748972Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd33 + 0 ], [ %rd18 + 0 ], 0x80; 2026-02-21T08:21:55.0749360Z // end inline asm 2026-02-21T08:21:55.0749498Z // begin inline asm 2026-02-21T08:21:55.0749709Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd33 + 0 ], 0x80; 2026-02-21T08:21:55.0749970Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.0750168Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.0750351Z // end inline asm 2026-02-21T08:21:55.0750483Z bar.sync 0, 128; 2026-02-21T08:21:55.0750745Z .loc 1 22 67 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:22:67 2026-02-21T08:21:55.0751058Z add.s64 %rd51, %rd33, 128; 2026-02-21T08:21:55.0751292Z bar.sync 0, 128; 2026-02-21T08:21:55.0751441Z // begin inline asm 2026-02-21T08:21:55.0751594Z @%p27 st.shared.b32 [ %r140 + 0 ], %r457; 2026-02-21T08:21:55.0751776Z // end inline asm 2026-02-21T08:21:55.0751912Z bar.warp.sync -1; 2026-02-21T08:21:55.0752058Z // begin inline asm 2026-02-21T08:21:55.0752304Z @%p100 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd18 + 0 ], %rd4; 2026-02-21T08:21:55.0752597Z // end inline asm 2026-02-21T08:21:55.0752729Z // begin inline asm 2026-02-21T08:21:55.0752958Z @%p100 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:21:55.0753222Z // end inline asm 2026-02-21T08:21:55.0753354Z // begin inline asm 2026-02-21T08:21:55.0753601Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r142; 2026-02-21T08:21:55.0753871Z // end 
inline asm 2026-02-21T08:21:55.0754008Z // begin inline asm 2026-02-21T08:21:55.0754246Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r143; 2026-02-21T08:21:55.0754524Z // end inline asm 2026-02-21T08:21:55.0754699Z // begin inline asm 2026-02-21T08:21:55.0754954Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r144; 2026-02-21T08:21:55.0755250Z // end inline asm 2026-02-21T08:21:55.0755384Z mov.b32 %r153, 4096; 2026-02-21T08:21:55.0755532Z // begin inline asm 2026-02-21T08:21:55.0755783Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r153; 2026-02-21T08:21:55.0756070Z // end inline asm 2026-02-21T08:21:55.0756204Z // begin inline asm 2026-02-21T08:21:55.0756468Z @%p100 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd18 + 0 ], 0x0, %rd26; 2026-02-21T08:21:55.0756765Z // end inline asm 2026-02-21T08:21:55.0756897Z // begin inline asm 2026-02-21T08:21:55.0757204Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r146; 2026-02-21T08:21:55.0757504Z // end inline asm 2026-02-21T08:21:55.0757647Z // begin inline asm 2026-02-21T08:21:55.0757909Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r146; 2026-02-21T08:21:55.0758203Z // end inline asm 2026-02-21T08:21:55.0758343Z // begin inline asm 2026-02-21T08:21:55.0758587Z @%p100 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x6; 2026-02-21T08:21:55.0758852Z // end inline asm 2026-02-21T08:21:55.0758979Z // begin inline asm 2026-02-21T08:21:55.0759227Z @%p100 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:21:55.0759506Z // end inline asm 2026-02-21T08:21:55.0759641Z // begin inline asm 2026-02-21T08:21:55.0759877Z @%p100 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:21:55.0760138Z // end inline asm 
2026-02-21T08:21:55.0760273Z // begin inline asm 2026-02-21T08:21:55.0760495Z @%p100 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:21:55.0760766Z // end inline asm 2026-02-21T08:21:55.0760944Z // begin inline asm 2026-02-21T08:21:55.0761285Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd51 + 0 ], [ %rd18 + 0 ], 0x80; 2026-02-21T08:21:55.0761664Z // end inline asm 2026-02-21T08:21:55.0761793Z // begin inline asm 2026-02-21T08:21:55.0762003Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd51 + 0 ], 0x80; 2026-02-21T08:21:55.0762246Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.0762439Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.0762608Z // end inline asm 2026-02-21T08:21:55.0762739Z bar.sync 0, 128; 2026-02-21T08:21:55.0762984Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0763280Z sub.s32 %r223, 2048, %r41; 2026-02-21T08:21:55.0763449Z mul.hi.s32 %r224, %r223, -580400985; 2026-02-21T08:21:55.0763619Z add.s32 %r225, %r224, %r223; 2026-02-21T08:21:55.0763849Z shr.u32 %r226, %r225, 31; 2026-02-21T08:21:55.0763998Z shr.s32 %r227, %r225, 14; 2026-02-21T08:21:55.0764155Z add.s32 %r228, %r227, %r226; 2026-02-21T08:21:55.0764308Z mul.lo.s32 %r229, %r228, 18944; 2026-02-21T08:21:55.0764477Z setp.ne.b32 %p91, %r223, %r229; 2026-02-21T08:21:55.0764638Z setp.lt.u32 %p92, %r41, 2049; 2026-02-21T08:21:55.0764832Z and.pred %p93, %p92, %p91; 2026-02-21T08:21:55.0764989Z selp.b32 %r230, 1, 0, %p93; 2026-02-21T08:21:55.0765153Z add.s32 %r231, %r228, %r230; 2026-02-21T08:21:55.0765312Z shl.b32 %r48, %r231, 7; 2026-02-21T08:21:55.0765570Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0765872Z shfl.sync.idx.b32 %r232, %r2, 0, 31, -1; 2026-02-21T08:21:55.0766048Z shl.b32 %r233, %r232, 21; 2026-02-21T08:21:55.0766203Z and.b32 %r234, %r233, 6291456; 
2026-02-21T08:21:55.0766356Z add.s32 %r156, %r234, %r429; 2026-02-21T08:21:55.0766515Z mov.pred %p65, -1; 2026-02-21T08:21:55.0766653Z // begin inline asm 2026-02-21T08:21:55.0767033Z @%p65 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r156 + 0], 32, {%r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457}; 2026-02-21T08:21:55.0767452Z // end inline asm 2026-02-21T08:21:55.0767586Z // begin inline asm 2026-02-21T08:21:55.0767965Z @%p65 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r156 + 16], 32, {%r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457, %r457}; 2026-02-21T08:21:55.0768360Z // end inline asm 2026-02-21T08:21:55.0768495Z // begin inline asm 2026-02-21T08:21:55.0768642Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:55.0768806Z // end inline asm 2026-02-21T08:21:55.0768940Z bar.sync 0, 128; 2026-02-21T08:21:55.0769190Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0769491Z add.s32 %r190, %r139, 30720; 2026-02-21T08:21:55.0769638Z // begin inline asm 2026-02-21T08:21:55.0769806Z @%p100 mbarrier.init.shared::cta.b64 [%r190], 1; 2026-02-21T08:21:55.0769996Z // end inline asm 2026-02-21T08:21:55.0770132Z bar.sync 0, 128; 2026-02-21T08:21:55.0770262Z add.s32 %r191, %r139, 30728; 2026-02-21T08:21:55.0770415Z // begin inline asm 2026-02-21T08:21:55.0770579Z @%p100 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T08:21:55.0770761Z // end inline asm 2026-02-21T08:21:55.0770895Z bar.sync 0, 128; 2026-02-21T08:21:55.0771025Z add.s32 %r192, %r139, 30736; 2026-02-21T08:21:55.0771179Z // begin inline asm 2026-02-21T08:21:55.0771334Z @%p100 mbarrier.init.shared::cta.b64 [%r192], 1; 2026-02-21T08:21:55.0771524Z // end inline asm 2026-02-21T08:21:55.0771650Z bar.sync 0, 128; 2026-02-21T08:21:55.0771787Z add.s32 %r193, %r139, 30744; 2026-02-21T08:21:55.0771939Z // begin inline asm 2026-02-21T08:21:55.0772093Z 
@%p100 mbarrier.init.shared::cta.b64 [%r193], 1; 2026-02-21T08:21:55.0772276Z // end inline asm 2026-02-21T08:21:55.0772401Z bar.sync 0, 128; 2026-02-21T08:21:55.0772539Z add.s32 %r194, %r139, 30752; 2026-02-21T08:21:55.0772743Z // begin inline asm 2026-02-21T08:21:55.0772903Z @%p100 mbarrier.init.shared::cta.b64 [%r194], 1; 2026-02-21T08:21:55.0773079Z // end inline asm 2026-02-21T08:21:55.0773211Z bar.sync 0, 128; 2026-02-21T08:21:55.0773339Z add.s32 %r195, %r139, 30760; 2026-02-21T08:21:55.0773494Z // begin inline asm 2026-02-21T08:21:55.0773656Z @%p100 mbarrier.init.shared::cta.b64 [%r195], 1; 2026-02-21T08:21:55.0773832Z // end inline asm 2026-02-21T08:21:55.0773966Z bar.sync 0, 128; 2026-02-21T08:21:55.0774095Z add.s32 %r196, %r139, 30768; 2026-02-21T08:21:55.0774248Z // begin inline asm 2026-02-21T08:21:55.0774402Z @%p100 mbarrier.init.shared::cta.b64 [%r196], 1; 2026-02-21T08:21:55.0774587Z // end inline asm 2026-02-21T08:21:55.0774747Z add.s32 %r197, %r139, 30784; 2026-02-21T08:21:55.0774904Z // begin inline asm 2026-02-21T08:21:55.0775066Z @%p100 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T08:21:55.0775302Z // end inline asm 2026-02-21T08:21:55.0775441Z bar.sync 0, 128; 2026-02-21T08:21:55.0775576Z add.s32 %r198, %r139, 30792; 2026-02-21T08:21:55.0775738Z // begin inline asm 2026-02-21T08:21:55.0775900Z @%p100 mbarrier.init.shared::cta.b64 [%r198], 1; 2026-02-21T08:21:55.0776096Z // end inline asm 2026-02-21T08:21:55.0776232Z bar.sync 0, 128; 2026-02-21T08:21:55.0776376Z add.s32 %r199, %r139, 30800; 2026-02-21T08:21:55.0776525Z // begin inline asm 2026-02-21T08:21:55.0776693Z @%p100 mbarrier.init.shared::cta.b64 [%r199], 1; 2026-02-21T08:21:55.0776882Z // end inline asm 2026-02-21T08:21:55.0777010Z bar.sync 0, 128; 2026-02-21T08:21:55.0777153Z add.s32 %r200, %r139, 30808; 2026-02-21T08:21:55.0777303Z // begin inline asm 2026-02-21T08:21:55.0777465Z @%p100 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T08:21:55.0777646Z // end inline asm 
2026-02-21T08:21:55.0777786Z bar.sync 0, 128; 2026-02-21T08:21:55.0777919Z add.s32 %r201, %r139, 30816; 2026-02-21T08:21:55.0778075Z // begin inline asm 2026-02-21T08:21:55.0778244Z @%p100 mbarrier.init.shared::cta.b64 [%r201], 1; 2026-02-21T08:21:55.0778429Z // end inline asm 2026-02-21T08:21:55.0778569Z bar.sync 0, 128; 2026-02-21T08:21:55.0778704Z add.s32 %r202, %r139, 30824; 2026-02-21T08:21:55.0778864Z // begin inline asm 2026-02-21T08:21:55.0779025Z @%p100 mbarrier.init.shared::cta.b64 [%r202], 1; 2026-02-21T08:21:55.0779215Z // end inline asm 2026-02-21T08:21:55.0779347Z bar.sync 0, 128; 2026-02-21T08:21:55.0779489Z add.s32 %r203, %r139, 30832; 2026-02-21T08:21:55.0779639Z // begin inline asm 2026-02-21T08:21:55.0779801Z @%p100 mbarrier.init.shared::cta.b64 [%r203], 1; 2026-02-21T08:21:55.0779990Z // end inline asm 2026-02-21T08:21:55.0780231Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0780511Z bar.sync 0, 128; 2026-02-21T08:21:55.0780644Z // begin inline asm 2026-02-21T08:21:55.0780818Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r190]; 2026-02-21T08:21:55.0781017Z // end inline asm 2026-02-21T08:21:55.0781155Z bar.sync 0, 128; 2026-02-21T08:21:55.0781285Z // begin inline asm 2026-02-21T08:21:55.0781454Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r191]; 2026-02-21T08:21:55.0781653Z // end inline asm 2026-02-21T08:21:55.0781783Z bar.sync 0, 128; 2026-02-21T08:21:55.0781922Z // begin inline asm 2026-02-21T08:21:55.0782085Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r192]; 2026-02-21T08:21:55.0782279Z // end inline asm 2026-02-21T08:21:55.0782407Z bar.sync 0, 128; 2026-02-21T08:21:55.0782544Z // begin inline asm 2026-02-21T08:21:55.0782708Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r193]; 2026-02-21T08:21:55.0782901Z // end inline asm 2026-02-21T08:21:55.0783038Z bar.sync 0, 128; 2026-02-21T08:21:55.0783169Z // begin inline asm 2026-02-21T08:21:55.0783342Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r194]; 
2026-02-21T08:21:55.0783529Z // end inline asm 2026-02-21T08:21:55.0783667Z bar.sync 0, 128; 2026-02-21T08:21:55.0783802Z // begin inline asm 2026-02-21T08:21:55.0784040Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r195]; 2026-02-21T08:21:55.0784223Z // end inline asm 2026-02-21T08:21:55.0784364Z bar.sync 0, 128; 2026-02-21T08:21:55.0784493Z // begin inline asm 2026-02-21T08:21:55.0784657Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r196]; 2026-02-21T08:21:55.0784879Z // end inline asm 2026-02-21T08:21:55.0785138Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0785450Z bar.sync 0, 128; 2026-02-21T08:21:55.0785582Z add.s32 %r211, %r139, 30848; 2026-02-21T08:21:55.0785737Z // begin inline asm 2026-02-21T08:21:55.0785894Z @%p100 mbarrier.init.shared::cta.b64 [%r211], 1; 2026-02-21T08:21:55.0786084Z // end inline asm 2026-02-21T08:21:55.0786214Z add.s32 %r411, %r139, 30864; 2026-02-21T08:21:55.0786370Z // begin inline asm 2026-02-21T08:21:55.0786533Z @%p100 mbarrier.init.shared::cta.b64 [%r411], 1; 2026-02-21T08:21:55.0786712Z // end inline asm 2026-02-21T08:21:55.0787022Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0787308Z bar.sync 0, 128; 2026-02-21T08:21:55.0787441Z // begin inline asm 2026-02-21T08:21:55.0787601Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r411]; 2026-02-21T08:21:55.0787794Z // end inline asm 2026-02-21T08:21:55.0788033Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0788339Z st.shared.b32 [global_smem+30872], 33554689; 2026-02-21T08:21:55.0788543Z st.shared.b32 [global_smem+28672], %r429; 2026-02-21T08:21:55.0788732Z st.shared.b32 [global_smem+28680], %r48; 2026-02-21T08:21:55.0788911Z barrier.sync 1; 2026-02-21T08:21:55.0789065Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:21:55.0789248Z barrier.sync 1; 2026-02-21T08:21:55.0789384Z setp.lt.s32 %p94, %r231, 1; 
2026-02-21T08:21:55.0789546Z @%p94 bra $L__BB0_23; 2026-02-21T08:21:55.0789708Z // %bb.17: // %.lr.ph7 2026-02-21T08:21:55.0790011Z .loc 1 0 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:131 2026-02-21T08:21:55.0790345Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T08:21:55.0790540Z bfe.u32 %r42, %r1, 3, 4; 2026-02-21T08:21:55.0790699Z or.b32 %r43, %r42, 16; 2026-02-21T08:21:55.0790848Z or.b32 %r44, %r42, 32; 2026-02-21T08:21:55.0790999Z or.b32 %r45, %r42, 48; 2026-02-21T08:21:55.0791141Z shl.b32 %r46, %r1, 3; 2026-02-21T08:21:55.0791297Z and.b32 %r47, %r46, 56; 2026-02-21T08:21:55.0791563Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0791871Z add.s32 %r454, %r41, -18944; 2026-02-21T08:21:55.0792038Z shl.b32 %r237, %r1, 4; 2026-02-21T08:21:55.0792184Z and.b32 %r238, %r237, 176; 2026-02-21T08:21:55.0792347Z and.b32 %r239, %r1, 96; 2026-02-21T08:21:55.0792498Z shl.b32 %r240, %r239, 3; 2026-02-21T08:21:55.0792659Z bfe.s32 %r241, %r1, 2, 1; 2026-02-21T08:21:55.0792818Z and.b32 %r242, %r241, 1088; 2026-02-21T08:21:55.0792986Z and.b32 %r244, %r222, 64; 2026-02-21T08:21:55.0793137Z xor.b32 %r245, %r242, %r244; 2026-02-21T08:21:55.0793302Z add.s32 %r247, %r139, 28672; 2026-02-21T08:21:55.0793467Z add.s32 %r248, %r247, %r238; 2026-02-21T08:21:55.0793621Z add.s32 %r249, %r248, %r240; 2026-02-21T08:21:55.0793805Z add.s32 %r51, %r249, %r245; 2026-02-21T08:21:55.0793960Z shl.b32 %r250, %r1, 5; 2026-02-21T08:21:55.0794115Z and.b32 %r251, %r250, 1792; 2026-02-21T08:21:55.0794269Z and.b32 %r252, %r46, 48; 2026-02-21T08:21:55.0794428Z shl.b32 %r253, %r239, 1; 2026-02-21T08:21:55.0794575Z shl.b32 %r254, %r1, 6; 2026-02-21T08:21:55.0794754Z and.b32 %r255, %r254, 64; 2026-02-21T08:21:55.0794908Z xor.b32 %r256, %r253, %r255; 2026-02-21T08:21:55.0795069Z add.s32 %r257, %r247, %r251; 2026-02-21T08:21:55.0795227Z add.s32 %r258, %r257, %r252; 2026-02-21T08:21:55.0795381Z add.s32 %r302, %r258, %r256; 
2026-02-21T08:21:55.0795547Z max.s32 %r447, %r48, 1; 2026-02-21T08:21:55.0795756Z mov.b32 %r452, -1; 2026-02-21T08:21:55.0795903Z mov.b32 %r455, %r457; 2026-02-21T08:21:55.0796046Z mov.b32 %r456, %r457; 2026-02-21T08:21:55.0796197Z bra.uni $L__BB0_18; 2026-02-21T08:21:55.0796389Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:21:55.0796723Z .loc 1 40 32 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:40:32 2026-02-21T08:21:55.0797020Z or.b32 %r335, %r456, %r47; 2026-02-21T08:21:55.0797285Z .loc 1 42 32 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:42:32 2026-02-21T08:21:55.0797578Z add.s32 %r336, %r455, %r42; 2026-02-21T08:21:55.0797731Z add.s32 %r337, %r455, %r43; 2026-02-21T08:21:55.0797891Z add.s32 %r338, %r455, %r44; 2026-02-21T08:21:55.0798044Z add.s32 %r339, %r455, %r45; 2026-02-21T08:21:55.0798370Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0798660Z bar.sync 0, 128; 2026-02-21T08:21:55.0798792Z // begin inline asm 2026-02-21T08:21:55.0798927Z 2026-02-21T08:21:55.0799035Z { 2026-02-21T08:21:55.0799159Z .reg .pred complete; 2026-02-21T08:21:55.0799300Z waitLoop: 2026-02-21T08:21:55.0799491Z mbarrier.try_wait.parity.shared.b64 complete, [%r211], %r457; 2026-02-21T08:21:55.0799716Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.0799866Z } 2026-02-21T08:21:55.0799928Z 2026-02-21T08:21:55.0799982Z // end inline asm 2026-02-21T08:21:55.0800119Z // begin inline asm 2026-02-21T08:21:55.0800480Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278}, [%r156 + 0], 32; 2026-02-21T08:21:55.0800865Z // end inline asm 2026-02-21T08:21:55.0801004Z // begin inline asm 2026-02-21T08:21:55.0801360Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295}, [%r156 + 16], 32; 
2026-02-21T08:21:55.0801769Z // end inline asm 2026-02-21T08:21:55.0801897Z // begin inline asm 2026-02-21T08:21:55.0802055Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:55.0802219Z // end inline asm 2026-02-21T08:21:55.0802345Z bar.sync 0, 128; 2026-02-21T08:21:55.0802479Z // begin inline asm 2026-02-21T08:21:55.0802647Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r411]; 2026-02-21T08:21:55.0802848Z // end inline asm 2026-02-21T08:21:55.0802983Z cvt.u64.u32 %rd59, %r263; 2026-02-21T08:21:55.0803147Z cvt.u64.u32 %rd60, %r264; 2026-02-21T08:21:55.0803298Z shl.b64 %rd61, %rd60, 32; 2026-02-21T08:21:55.0803453Z or.b64 %rd62, %rd59, %rd61; 2026-02-21T08:21:55.0803715Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0804007Z mov.b64 {%r341, %r342}, %rd62; 2026-02-21T08:21:55.0804176Z cvt.rn.f16x2.f32 %r343, %r342, %r341; 2026-02-21T08:21:55.0804459Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0804779Z cvt.u64.u32 %rd63, %r265; 2026-02-21T08:21:55.0804926Z cvt.u64.u32 %rd64, %r266; 2026-02-21T08:21:55.0805077Z shl.b64 %rd65, %rd64, 32; 2026-02-21T08:21:55.0805222Z or.b64 %rd66, %rd63, %rd65; 2026-02-21T08:21:55.0805491Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0805777Z mov.b64 {%r344, %r345}, %rd66; 2026-02-21T08:21:55.0805939Z cvt.rn.f16x2.f32 %r346, %r345, %r344; 2026-02-21T08:21:55.0806220Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0806496Z cvt.u64.u32 %rd67, %r267; 2026-02-21T08:21:55.0806646Z cvt.u64.u32 %rd68, %r268; 2026-02-21T08:21:55.0806790Z shl.b64 %rd69, %rd68, 32; 2026-02-21T08:21:55.0806943Z or.b64 %rd70, %rd67, %rd69; 2026-02-21T08:21:55.0807201Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0807566Z mov.b64 {%r347, %r348}, %rd70; 2026-02-21T08:21:55.0807733Z cvt.rn.f16x2.f32 %r349, %r348, 
%r347; 2026-02-21T08:21:55.0808000Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0808276Z cvt.u64.u32 %rd71, %r269; 2026-02-21T08:21:55.0808421Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:21:55.0808572Z shl.b64 %rd73, %rd72, 32; 2026-02-21T08:21:55.0808715Z or.b64 %rd74, %rd71, %rd73; 2026-02-21T08:21:55.0808972Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0809252Z mov.b64 {%r350, %r351}, %rd74; 2026-02-21T08:21:55.0809411Z cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T08:21:55.0809686Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0809961Z cvt.u64.u32 %rd75, %r271; 2026-02-21T08:21:55.0810163Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:21:55.0810311Z shl.b64 %rd77, %rd76, 32; 2026-02-21T08:21:55.0810461Z or.b64 %rd78, %rd75, %rd77; 2026-02-21T08:21:55.0810718Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0811002Z mov.b64 {%r353, %r354}, %rd78; 2026-02-21T08:21:55.0811169Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T08:21:55.0811441Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0811733Z cvt.u64.u32 %rd79, %r273; 2026-02-21T08:21:55.0811875Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:21:55.0812025Z shl.b64 %rd81, %rd80, 32; 2026-02-21T08:21:55.0812169Z or.b64 %rd82, %rd79, %rd81; 2026-02-21T08:21:55.0812434Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0812715Z mov.b64 {%r356, %r357}, %rd82; 2026-02-21T08:21:55.0812876Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T08:21:55.0813153Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0813437Z cvt.u64.u32 %rd83, %r275; 2026-02-21T08:21:55.0813593Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:21:55.0813738Z shl.b64 %rd85, %rd84, 32; 2026-02-21T08:21:55.0813892Z 
or.b64 %rd86, %rd83, %rd85; 2026-02-21T08:21:55.0814151Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0814436Z mov.b64 {%r359, %r360}, %rd86; 2026-02-21T08:21:55.0814602Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T08:21:55.0814903Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0815191Z cvt.u64.u32 %rd87, %r277; 2026-02-21T08:21:55.0815337Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:21:55.0815489Z shl.b64 %rd89, %rd88, 32; 2026-02-21T08:21:55.0815633Z or.b64 %rd90, %rd87, %rd89; 2026-02-21T08:21:55.0815904Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0816194Z mov.b64 {%r362, %r363}, %rd90; 2026-02-21T08:21:55.0816354Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T08:21:55.0816633Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0816914Z cvt.u64.u32 %rd91, %r280; 2026-02-21T08:21:55.0817064Z cvt.u64.u32 %rd92, %r281; 2026-02-21T08:21:55.0817208Z shl.b64 %rd93, %rd92, 32; 2026-02-21T08:21:55.0817360Z or.b64 %rd94, %rd91, %rd93; 2026-02-21T08:21:55.0817615Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0817903Z mov.b64 {%r365, %r366}, %rd94; 2026-02-21T08:21:55.0818068Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T08:21:55.0818337Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0818624Z cvt.u64.u32 %rd95, %r282; 2026-02-21T08:21:55.0818824Z cvt.u64.u32 %rd96, %r283; 2026-02-21T08:21:55.0818976Z shl.b64 %rd97, %rd96, 32; 2026-02-21T08:21:55.0819119Z or.b64 %rd98, %rd95, %rd97; 2026-02-21T08:21:55.0819374Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0819651Z mov.b64 {%r368, %r369}, %rd98; 2026-02-21T08:21:55.0819808Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T08:21:55.0820075Z .loc 1 53 52 // 
chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0820344Z cvt.u64.u32 %rd99, %r284; 2026-02-21T08:21:55.0820497Z cvt.u64.u32 %rd100, %r285; 2026-02-21T08:21:55.0820648Z shl.b64 %rd101, %rd100, 32; 2026-02-21T08:21:55.0820808Z or.b64 %rd102, %rd99, %rd101; 2026-02-21T08:21:55.0821059Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0821180Z mov.b64 {%r371, %r372}, %rd102; 2026-02-21T08:21:55.0821244Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T08:21:55.0821408Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0821475Z cvt.u64.u32 %rd103, %r286; 2026-02-21T08:21:55.0821531Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:21:55.0821586Z shl.b64 %rd105, %rd104, 32; 2026-02-21T08:21:55.0821652Z or.b64 %rd106, %rd103, %rd105; 2026-02-21T08:21:55.0821812Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0821870Z mov.b64 {%r374, %r375}, %rd106; 2026-02-21T08:21:55.0821931Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T08:21:55.0822099Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0822154Z cvt.u64.u32 %rd107, %r288; 2026-02-21T08:21:55.0822210Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:21:55.0822274Z shl.b64 %rd109, %rd108, 32; 2026-02-21T08:21:55.0822336Z or.b64 %rd110, %rd107, %rd109; 2026-02-21T08:21:55.0822502Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0822566Z mov.b64 {%r377, %r378}, %rd110; 2026-02-21T08:21:55.0822629Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T08:21:55.0822786Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0822841Z cvt.u64.u32 %rd111, %r290; 2026-02-21T08:21:55.0822905Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:21:55.0822962Z shl.b64 %rd113, %rd112, 32; 2026-02-21T08:21:55.0823019Z or.b64 %rd114, %rd111, 
%rd113; 2026-02-21T08:21:55.0823187Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0823244Z mov.b64 {%r380, %r381}, %rd114; 2026-02-21T08:21:55.0823304Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T08:21:55.0823473Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0823533Z cvt.u64.u32 %rd115, %r292; 2026-02-21T08:21:55.0823588Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:21:55.0823645Z shl.b64 %rd117, %rd116, 32; 2026-02-21T08:21:55.0823713Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T08:21:55.0823881Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0823939Z mov.b64 {%r383, %r384}, %rd118; 2026-02-21T08:21:55.0824012Z cvt.rn.f16x2.f32 %r385, %r384, %r383; 2026-02-21T08:21:55.0824178Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0824233Z cvt.u64.u32 %rd119, %r294; 2026-02-21T08:21:55.0824296Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:21:55.0824354Z shl.b64 %rd121, %rd120, 32; 2026-02-21T08:21:55.0824411Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T08:21:55.0824576Z .loc 1 55 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:55:27 2026-02-21T08:21:55.0824877Z mov.b64 {%r386, %r387}, %rd122; 2026-02-21T08:21:55.0824942Z cvt.rn.f16x2.f32 %r388, %r387, %r386; 2026-02-21T08:21:55.0825112Z .loc 1 56 45 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:56:45 2026-02-21T08:21:55.0825183Z shl.b32 %r389, %r336, 12; 2026-02-21T08:21:55.0825243Z shl.b32 %r390, %r337, 12; 2026-02-21T08:21:55.0825301Z shl.b32 %r391, %r338, 12; 2026-02-21T08:21:55.0825369Z shl.b32 %r392, %r339, 12; 2026-02-21T08:21:55.0825534Z .loc 1 56 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:56:52 2026-02-21T08:21:55.0825594Z add.s32 %r393, %r389, %r335; 2026-02-21T08:21:55.0825653Z add.s32 %r394, %r390, %r335; 2026-02-21T08:21:55.0825720Z add.s32 %r395, %r391, %r335; 
2026-02-21T08:21:55.0825776Z add.s32 %r396, %r392, %r335; 2026-02-21T08:21:55.0825986Z .loc 1 56 24 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:56:24 2026-02-21T08:21:55.0826066Z mad.wide.s32 %rd55, %r393, 2, %rd5; 2026-02-21T08:21:55.0826131Z mad.wide.s32 %rd56, %r394, 2, %rd5; 2026-02-21T08:21:55.0826191Z mad.wide.s32 %rd57, %r395, 2, %rd5; 2026-02-21T08:21:55.0826257Z mad.wide.s32 %rd58, %r396, 2, %rd5; 2026-02-21T08:21:55.0826424Z .loc 1 56 82 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:56:82 2026-02-21T08:21:55.0826478Z bar.sync 0, 128; 2026-02-21T08:21:55.0826570Z st.shared.v4.b32 [%r51], {%r343, %r355, %r367, %r379}; 2026-02-21T08:21:55.0826633Z bar.sync 0, 128; 2026-02-21T08:21:55.0826690Z // begin inline asm 2026-02-21T08:21:55.0826839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r318, %r322, %r326, %r330}, [%r302]; 2026-02-21T08:21:55.0826901Z // end inline asm 2026-02-21T08:21:55.0826953Z bar.sync 0, 128; 2026-02-21T08:21:55.0827040Z st.shared.v4.b32 [%r51], {%r346, %r358, %r370, %r382}; 2026-02-21T08:21:55.0827092Z bar.sync 0, 128; 2026-02-21T08:21:55.0827157Z // begin inline asm 2026-02-21T08:21:55.0827300Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r319, %r323, %r327, %r331}, [%r302]; 2026-02-21T08:21:55.0827355Z // end inline asm 2026-02-21T08:21:55.0827415Z bar.sync 0, 128; 2026-02-21T08:21:55.0827499Z st.shared.v4.b32 [%r51], {%r349, %r361, %r373, %r385}; 2026-02-21T08:21:55.0827553Z bar.sync 0, 128; 2026-02-21T08:21:55.0827615Z // begin inline asm 2026-02-21T08:21:55.0827757Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r320, %r324, %r328, %r332}, [%r302]; 2026-02-21T08:21:55.0827810Z // end inline asm 2026-02-21T08:21:55.0827861Z bar.sync 0, 128; 2026-02-21T08:21:55.0827950Z st.shared.v4.b32 [%r51], {%r352, %r364, %r376, %r388}; 2026-02-21T08:21:55.0828004Z bar.sync 0, 128; 2026-02-21T08:21:55.0828057Z // begin inline asm 2026-02-21T08:21:55.0828199Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r321, %r325, %r329, %r333}, 
[%r302]; 2026-02-21T08:21:55.0828251Z // end inline asm 2026-02-21T08:21:55.0828303Z // begin inline asm 2026-02-21T08:21:55.0828401Z st.global.v4.b32 [ %rd55 + 0 ], { %r318, %r319, %r320, %r321 }; 2026-02-21T08:21:55.0828463Z // end inline asm 2026-02-21T08:21:55.0828516Z // begin inline asm 2026-02-21T08:21:55.0828609Z st.global.v4.b32 [ %rd56 + 0 ], { %r322, %r323, %r324, %r325 }; 2026-02-21T08:21:55.0828667Z // end inline asm 2026-02-21T08:21:55.0828720Z // begin inline asm 2026-02-21T08:21:55.0828811Z st.global.v4.b32 [ %rd57 + 0 ], { %r326, %r327, %r328, %r329 }; 2026-02-21T08:21:55.0828863Z // end inline asm 2026-02-21T08:21:55.0828922Z // begin inline asm 2026-02-21T08:21:55.0829011Z st.global.v4.b32 [ %rd58 + 0 ], { %r330, %r331, %r332, %r333 }; 2026-02-21T08:21:55.0829062Z // end inline asm 2026-02-21T08:21:55.0829122Z mov.b32 %r453, 1; 2026-02-21T08:21:55.0829221Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:21:55.0829385Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0829451Z xor.b32 %r457, %r453, %r457; 2026-02-21T08:21:55.0829667Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0829728Z add.s32 %r447, %r447, -1; 2026-02-21T08:21:55.0829787Z setp.ne.b32 %p99, %r447, 0; 2026-02-21T08:21:55.0829852Z @%p99 bra $L__BB0_18; 2026-02-21T08:21:55.0829907Z bra.uni $L__BB0_23; 2026-02-21T08:21:55.0830007Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:55.0830071Z add.s32 %r260, %r452, 1; 2026-02-21T08:21:55.0830131Z setp.eq.b32 %p95, %r452, 127; 2026-02-21T08:21:55.0830190Z selp.b32 %r452, 0, %r260, %p95; 2026-02-21T08:21:55.0830257Z setp.eq.b32 %p96, %r452, 127; 2026-02-21T08:21:55.0830312Z @%p96 bra $L__BB0_21; 2026-02-21T08:21:55.0830407Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:21:55.0830575Z .loc 1 0 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:131 2026-02-21T08:21:55.0830676Z mov.b32 
%r453, 0; 2026-02-21T08:21:55.0830840Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0830899Z setp.ne.b32 %p97, %r452, 0; 2026-02-21T08:21:55.0830962Z @%p97 bra $L__BB0_22; 2026-02-21T08:21:55.0831036Z // %bb.20: // %.thread 2026-02-21T08:21:55.0831123Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:21:55.0831186Z add.s32 %r454, %r454, 18944; 2026-02-21T08:21:55.0831341Z .loc 1 34 35 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:34:35 2026-02-21T08:21:55.0831397Z shr.s32 %r398, %r454, 31; 2026-02-21T08:21:55.0831452Z shr.u32 %r399, %r398, 25; 2026-02-21T08:21:55.0831515Z add.s32 %r400, %r454, %r399; 2026-02-21T08:21:55.0831570Z shr.s32 %r401, %r400, 7; 2026-02-21T08:21:55.0831724Z .loc 1 35 33 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:35:33 2026-02-21T08:21:55.0831791Z shl.b32 %r402, %r401, 2; 2026-02-21T08:21:55.0831948Z .loc 1 36 39 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:36:39 2026-02-21T08:21:55.0832004Z sub.s32 %r403, 64, %r402; 2026-02-21T08:21:55.0832174Z .loc 1 36 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:36:52 2026-02-21T08:21:55.0832230Z min.s32 %r404, %r403, 4; 2026-02-21T08:21:55.0832389Z .loc 1 37 45 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:45 2026-02-21T08:21:55.0832451Z and.b32 %r405, %r400, -128; 2026-02-21T08:21:55.0832527Z sub.s32 %r406, %r454, %r405; 2026-02-21T08:21:55.0832691Z .loc 1 38 51 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:38:51 2026-02-21T08:21:55.0832749Z div.s32 %r407, %r406, %r404; 2026-02-21T08:21:55.0832924Z .loc 1 37 64 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:64 2026-02-21T08:21:55.0832988Z mul.lo.s32 %r408, %r407, %r404; 2026-02-21T08:21:55.0833050Z sub.s32 %r409, %r406, %r408; 2026-02-21T08:21:55.0833221Z .loc 1 37 30 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:30 2026-02-21T08:21:55.0833279Z add.s32 %r410, %r409, %r402; 
2026-02-21T08:21:55.0833443Z .loc 1 39 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:39:27 2026-02-21T08:21:55.0833510Z shl.b32 %r456, %r410, 6; 2026-02-21T08:21:55.0833670Z .loc 1 41 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:41:27 2026-02-21T08:21:55.0833726Z shl.b32 %r455, %r407, 6; 2026-02-21T08:21:55.0833895Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0833959Z bra.uni $L__BB0_22; 2026-02-21T08:21:55.0834055Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:21:55.0834221Z .loc 1 0 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:131 2026-02-21T08:21:55.0834355Z mov.b32 %r72, global_smem; 2026-02-21T08:21:55.0834414Z add.s32 %r73, %r72, %r3; 2026-02-21T08:21:55.0834470Z bra.uni $L__BB0_2; 2026-02-21T08:21:55.0834577Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0834790Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0834873Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0834930Z barrier.sync 1; 2026-02-21T08:21:55.0834993Z barrier.sync 1; 2026-02-21T08:21:55.0835071Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0835154Z $L__BB0_2: // %.preheader 2026-02-21T08:21:55.0835256Z // =>This Loop Header: Depth=1 2026-02-21T08:21:55.0835345Z // Child Loop BB0_11 Depth 2 2026-02-21T08:21:55.0835482Z // Child Loop BB0_7 Depth 2 2026-02-21T08:21:55.0835664Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19 2026-02-21T08:21:55.0835742Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:21:55.0835801Z barrier.sync 1; 2026-02-21T08:21:55.0835869Z ld.shared.b8 %r71, [%r73+30868]; 2026-02-21T08:21:55.0835937Z setp.gt.u32 %p4, %r71, 3; 2026-02-21T08:21:55.0835996Z @%p4 bra $L__BB0_4; 2026-02-21T08:21:55.0836076Z // %bb.3: // %.preheader 2026-02-21T08:21:55.0836173Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0836236Z $L_brx_0: .branchtargets 
2026-02-21T08:21:55.0836291Z $L__BB0_5, 2026-02-21T08:21:55.0836354Z $L__BB0_9, 2026-02-21T08:21:55.0836407Z $L__BB0_15, 2026-02-21T08:21:55.0836459Z $L__BB0_24; 2026-02-21T08:21:55.0836521Z brx.idx %r71, $L_brx_0; 2026-02-21T08:21:55.0836628Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0836808Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0836889Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0836974Z ld.shared.b32 %r120, [global_smem+28672]; 2026-02-21T08:21:55.0837050Z ld.shared.b32 %r431, [global_smem+28680]; 2026-02-21T08:21:55.0837107Z barrier.sync 1; 2026-02-21T08:21:55.0837169Z setp.lt.s32 %p17, %r431, 1; 2026-02-21T08:21:55.0837233Z @%p17 bra $L__BB0_8; 2026-02-21T08:21:55.0837308Z // %bb.6: // %.lr.ph4 2026-02-21T08:21:55.0837392Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0837571Z .loc 1 0 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:131 2026-02-21T08:21:55.0837627Z mov.b32 %r435, -1; 2026-02-21T08:21:55.0837686Z mov.pred %p117, 0; 2026-02-21T08:21:55.0837747Z mov.b32 %r432, 0; 2026-02-21T08:21:55.0837806Z mov.b32 %r433, %r432; 2026-02-21T08:21:55.0837862Z mov.b32 %r434, %r432; 2026-02-21T08:21:55.0837961Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:21:55.0838060Z // => This Inner Loop Header: Depth=2 2026-02-21T08:21:55.0838234Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0838293Z add.s32 %r124, %r435, 1; 2026-02-21T08:21:55.0838362Z setp.eq.b32 %p24, %r435, 127; 2026-02-21T08:21:55.0838425Z selp.b32 %r435, 0, %r124, %p24; 2026-02-21T08:21:55.0838482Z shl.b32 %r125, %r434, 3; 2026-02-21T08:21:55.0838548Z add.s32 %r127, %r72, %r125; 2026-02-21T08:21:55.0838606Z add.s32 %r128, %r127, 30720; 2026-02-21T08:21:55.0838665Z add.s32 %r118, %r127, 30784; 2026-02-21T08:21:55.0838833Z .loc 1 51 31 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:51:31 
2026-02-21T08:21:55.0838898Z shl.b32 %r129, %r434, 11; 2026-02-21T08:21:55.0838958Z add.s32 %r130, %r72, %r129; 2026-02-21T08:21:55.0839186Z .loc 1 52 44 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:52:44 2026-02-21T08:21:55.0839250Z add.s32 %r131, %r130, 14336; 2026-02-21T08:21:55.0839419Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0839480Z bar.warp.sync -1; 2026-02-21T08:21:55.0839546Z // begin inline asm 2026-02-21T08:21:55.0839598Z 2026-02-21T08:21:55.0839648Z { 2026-02-21T08:21:55.0839710Z .reg .pred complete; 2026-02-21T08:21:55.0839776Z waitLoop: 2026-02-21T08:21:55.0839900Z mbarrier.try_wait.parity.shared.b64 complete, [%r118], %r433; 2026-02-21T08:21:55.0839965Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.0840022Z } 2026-02-21T08:21:55.0840026Z 2026-02-21T08:21:55.0840081Z // end inline asm 2026-02-21T08:21:55.0840261Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0840363Z setp.eq.b32 %p23, %r435, 127; 2026-02-21T08:21:55.0840550Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0840613Z elect.sync %r132|%p20, -1; 2026-02-21T08:21:55.0840673Z bfe.u32 %r133, %r130, 4, 14; 2026-02-21T08:21:55.0840741Z cvt.u64.u32 %rd16, %r133; 2026-02-21T08:21:55.0840813Z or.b64 %rd12, %rd16, -4611685949699522560; 2026-02-21T08:21:55.0840870Z bfe.u32 %r134, %r131, 4, 14; 2026-02-21T08:21:55.0840935Z cvt.u64.u32 %rd17, %r134; 2026-02-21T08:21:55.0841004Z or.b64 %rd13, %rd17, -4611685949699522560; 2026-02-21T08:21:55.0841061Z mov.b32 %r121, 68157456; 2026-02-21T08:21:55.0841116Z // begin inline asm 2026-02-21T08:21:55.0841272Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r120 + 0 ], %rd12, %rd13, %r121, %p117; 2026-02-21T08:21:55.0841329Z // end inline asm 2026-02-21T08:21:55.0841386Z cvt.u64.u32 %rd14, %r128; 2026-02-21T08:21:55.0841447Z // begin inline asm 2026-02-21T08:21:55.0841575Z @%p20 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd14]; 2026-02-21T08:21:55.0841631Z // end inline asm 2026-02-21T08:21:55.0841701Z and.pred %p22, %p23, %p20; 2026-02-21T08:21:55.0841757Z add.s32 %r135, %r72, 30848; 2026-02-21T08:21:55.0841813Z cvt.u64.u32 %rd15, %r135; 2026-02-21T08:21:55.0841868Z // begin inline asm 2026-02-21T08:21:55.0841996Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd15]; 2026-02-21T08:21:55.0842049Z // end inline asm 2026-02-21T08:21:55.0842212Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0842278Z setp.ne.b32 %p117, %r435, 127; 2026-02-21T08:21:55.0842443Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0842500Z selp.b32 %r136, 1, 0, %p23; 2026-02-21T08:21:55.0842562Z xor.b32 %r432, %r432, %r136; 2026-02-21T08:21:55.0842616Z add.s32 %r122, %r72, 30864; 2026-02-21T08:21:55.0842672Z // begin inline asm 2026-02-21T08:21:55.0842719Z 2026-02-21T08:21:55.0842780Z { 2026-02-21T08:21:55.0842840Z @!%p23 bra.uni skipWait; 2026-02-21T08:21:55.0842899Z .reg .pred complete; 2026-02-21T08:21:55.0842958Z waitLoop: 2026-02-21T08:21:55.0843075Z mbarrier.try_wait.parity.shared.b64 complete, [%r122], %r432; 2026-02-21T08:21:55.0843135Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.0843188Z skipWait: 2026-02-21T08:21:55.0843243Z } 2026-02-21T08:21:55.0843247Z 2026-02-21T08:21:55.0843300Z // end inline asm 2026-02-21T08:21:55.0843463Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0843527Z add.s32 %r137, %r434, 1; 2026-02-21T08:21:55.0843584Z setp.eq.b32 %p25, %r137, 7; 2026-02-21T08:21:55.0843645Z selp.b32 %r434, 0, %r137, %p25; 2026-02-21T08:21:55.0843701Z selp.b32 %r138, 1, 0, %p25; 2026-02-21T08:21:55.0843764Z xor.b32 %r433, %r433, %r138; 2026-02-21T08:21:55.0843942Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0844049Z add.s32 %r431, 
%r431, -1; 2026-02-21T08:21:55.0844117Z setp.ne.b32 %p26, %r431, 0; 2026-02-21T08:21:55.0844172Z @%p26 bra $L__BB0_7; 2026-02-21T08:21:55.0844252Z $L__BB0_8: // %._crit_edge5 2026-02-21T08:21:55.0844344Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0844398Z barrier.sync 1; 2026-02-21T08:21:55.0844471Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0844525Z bra.uni $L__BB0_2; 2026-02-21T08:21:55.0844620Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0844821Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0844894Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0844974Z ld.shared.b32 %r436, [global_smem+28680]; 2026-02-21T08:21:55.0845027Z barrier.sync 1; 2026-02-21T08:21:55.0845236Z .loc 1 21 67 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:21:67 2026-02-21T08:21:55.0845307Z mov.u32 %r17, %ctaid.x; 2026-02-21T08:21:55.0845364Z mov.u32 %r74, %ctaid.y; 2026-02-21T08:21:55.0845419Z mov.u32 %r75, %ctaid.z; 2026-02-21T08:21:55.0845474Z mov.u32 %r76, %nctaid.x; 2026-02-21T08:21:55.0845538Z mov.u32 %r77, %nctaid.y; 2026-02-21T08:21:55.0845600Z mad.lo.s32 %r78, %r75, %r77, %r74; 2026-02-21T08:21:55.0845660Z mad.lo.s32 %r79, %r78, %r76, %r17; 2026-02-21T08:21:55.0845722Z shl.b32 %r80, %r79, 8; 2026-02-21T08:21:55.0845888Z .loc 1 22 67 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:22:67 2026-02-21T08:21:55.0845945Z cvt.s64.s32 %rd7, %r80; 2026-02-21T08:21:55.0846004Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T08:21:55.0846066Z add.s64 %rd9, %rd8, 128; 2026-02-21T08:21:55.0846126Z cvta.global.u64 %rd11, %rd9; 2026-02-21T08:21:55.0846292Z .loc 1 21 67 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:21:67 2026-02-21T08:21:55.0846363Z cvta.global.u64 %rd10, %rd8; 2026-02-21T08:21:55.0846529Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0846589Z setp.lt.s32 %p5, %r436, 1; 2026-02-21T08:21:55.0846653Z 
@%p5 bra $L__BB0_14; 2026-02-21T08:21:55.0846726Z // %bb.10: // %.lr.ph 2026-02-21T08:21:55.0846812Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0846868Z add.s32 %r446, %r17, -18944; 2026-02-21T08:21:55.0846932Z add.s32 %r19, %r1, -128; 2026-02-21T08:21:55.0846985Z mov.b32 %r443, -1; 2026-02-21T08:21:55.0847036Z mov.b32 %r437, 0; 2026-02-21T08:21:55.0847098Z mov.b32 %r438, %r437; 2026-02-21T08:21:55.0847152Z mov.b32 %r445, %r437; 2026-02-21T08:21:55.0847205Z mov.b32 %r444, %r437; 2026-02-21T08:21:55.0847265Z mov.b32 %r441, %r437; 2026-02-21T08:21:55.0847320Z bra.uni $L__BB0_11; 2026-02-21T08:21:55.0847416Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:21:55.0847584Z .loc 1 0 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0:131 2026-02-21T08:21:55.0847651Z selp.b32 %r101, 0, %r441, %p8; 2026-02-21T08:21:55.0847710Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T08:21:55.0847769Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T08:21:55.0847942Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0847997Z shl.b32 %r108, %r438, 3; 2026-02-21T08:21:55.0848052Z add.s32 %r110, %r72, %r108; 2026-02-21T08:21:55.0848107Z add.s32 %r97, %r110, 30720; 2026-02-21T08:21:55.0848275Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0848330Z // begin inline asm 2026-02-21T08:21:55.0848377Z 2026-02-21T08:21:55.0848433Z { 2026-02-21T08:21:55.0848489Z .reg .pred complete; 2026-02-21T08:21:55.0848545Z waitLoop: 2026-02-21T08:21:55.0848665Z mbarrier.try_wait.parity.shared.b64 complete, [%r97], %r437; 2026-02-21T08:21:55.0848780Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.0848829Z } 2026-02-21T08:21:55.0848834Z 2026-02-21T08:21:55.0848888Z // end inline asm 2026-02-21T08:21:55.0849064Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0849123Z add.s32 %r103, %r110, 30784; 2026-02-21T08:21:55.0849278Z .loc 1 0 0 // 
chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0849341Z bar.sync 3, 64; 2026-02-21T08:21:55.0849396Z // begin inline asm 2026-02-21T08:21:55.0849503Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r103], 4096; 2026-02-21T08:21:55.0849563Z // end inline asm 2026-02-21T08:21:55.0849724Z .loc 1 51 31 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:51:31 2026-02-21T08:21:55.0849778Z shl.b32 %r111, %r438, 11; 2026-02-21T08:21:55.0849892Z add.s32 %r100, %r72, %r111; 2026-02-21T08:21:55.0850053Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0850108Z bar.sync 3, 64; 2026-02-21T08:21:55.0850171Z elect.sync %r112|%p13, -1; 2026-02-21T08:21:55.0850239Z and.pred %p10, %p12, %p13; 2026-02-21T08:21:55.0850294Z // begin inline asm 2026-02-21T08:21:55.0850542Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r100], [%rd10, {%r101, %r445}], [%r103]; 2026-02-21T08:21:55.0850603Z // end inline asm 2026-02-21T08:21:55.0850767Z .loc 1 52 44 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:52:44 2026-02-21T08:21:55.0850823Z add.s32 %r104, %r100, 14336; 2026-02-21T08:21:55.0850975Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0851035Z bar.sync 3, 64; 2026-02-21T08:21:55.0851095Z elect.sync %r113|%p14, -1; 2026-02-21T08:21:55.0851156Z and.pred %p11, %p12, %p14; 2026-02-21T08:21:55.0851223Z // begin inline asm 2026-02-21T08:21:55.0851455Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r104], [%rd11, {%r101, %r444}], [%r103]; 2026-02-21T08:21:55.0851509Z // end inline asm 2026-02-21T08:21:55.0851683Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0851739Z add.s32 %r441, %r101, 16; 2026-02-21T08:21:55.0851892Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0851955Z add.s32 %r114, 
%r438, 1; 2026-02-21T08:21:55.0852012Z setp.eq.b32 %p15, %r114, 7; 2026-02-21T08:21:55.0852071Z selp.b32 %r438, 0, %r114, %p15; 2026-02-21T08:21:55.0852127Z selp.b32 %r115, 1, 0, %p15; 2026-02-21T08:21:55.0852190Z xor.b32 %r437, %r437, %r115; 2026-02-21T08:21:55.0852363Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0852422Z add.s32 %r436, %r436, -1; 2026-02-21T08:21:55.0852488Z setp.ne.b32 %p16, %r436, 0; 2026-02-21T08:21:55.0852544Z @%p16 bra $L__BB0_11; 2026-02-21T08:21:55.0852598Z bra.uni $L__BB0_14; 2026-02-21T08:21:55.0852694Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:21:55.0852791Z // => This Inner Loop Header: Depth=2 2026-02-21T08:21:55.0852847Z add.s32 %r83, %r443, 1; 2026-02-21T08:21:55.0852906Z setp.eq.b32 %p6, %r443, 127; 2026-02-21T08:21:55.0852974Z selp.b32 %r443, 0, %r83, %p6; 2026-02-21T08:21:55.0853033Z setp.ne.b32 %p7, %r443, 0; 2026-02-21T08:21:55.0853090Z setp.eq.b32 %p8, %r443, 0; 2026-02-21T08:21:55.0853151Z @%p7 bra $L__BB0_13; 2026-02-21T08:21:55.0853240Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:21:55.0853295Z add.s32 %r446, %r446, 18944; 2026-02-21T08:21:55.0853458Z .loc 1 34 35 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:34:35 2026-02-21T08:21:55.0853562Z shr.s32 %r84, %r446, 31; 2026-02-21T08:21:55.0853617Z shr.u32 %r85, %r84, 25; 2026-02-21T08:21:55.0853671Z add.s32 %r86, %r446, %r85; 2026-02-21T08:21:55.0853734Z shr.s32 %r87, %r86, 7; 2026-02-21T08:21:55.0853897Z .loc 1 35 33 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:35:33 2026-02-21T08:21:55.0853951Z shl.b32 %r88, %r87, 2; 2026-02-21T08:21:55.0854124Z .loc 1 36 39 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:36:39 2026-02-21T08:21:55.0854178Z sub.s32 %r89, 64, %r88; 2026-02-21T08:21:55.0854342Z .loc 1 36 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:36:52 2026-02-21T08:21:55.0854397Z min.s32 %r90, %r89, 4; 2026-02-21T08:21:55.0854569Z .loc 
1 37 45 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:45 2026-02-21T08:21:55.0854717Z and.b32 %r91, %r86, -128; 2026-02-21T08:21:55.0854777Z sub.s32 %r92, %r446, %r91; 2026-02-21T08:21:55.0854947Z .loc 1 38 51 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:38:51 2026-02-21T08:21:55.0855003Z div.s32 %r93, %r92, %r90; 2026-02-21T08:21:55.0855167Z .loc 1 37 64 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:64 2026-02-21T08:21:55.0855231Z mul.lo.s32 %r94, %r93, %r90; 2026-02-21T08:21:55.0855287Z sub.s32 %r95, %r92, %r94; 2026-02-21T08:21:55.0855443Z .loc 1 37 30 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:37:30 2026-02-21T08:21:55.0855498Z add.s32 %r96, %r95, %r88; 2026-02-21T08:21:55.0855666Z .loc 1 39 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:39:27 2026-02-21T08:21:55.0855723Z shl.b32 %r444, %r96, 6; 2026-02-21T08:21:55.0855888Z .loc 1 41 27 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:41:27 2026-02-21T08:21:55.0855952Z shl.b32 %r445, %r93, 6; 2026-02-21T08:21:55.0856009Z bra.uni $L__BB0_13; 2026-02-21T08:21:55.0856090Z $L__BB0_14: // %._crit_edge 2026-02-21T08:21:55.0856181Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0856349Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0856404Z barrier.sync 1; 2026-02-21T08:21:55.0856485Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:21:55.0856541Z bra.uni $L__BB0_2; 2026-02-21T08:21:55.0856632Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:21:55.0856791Z .loc 1 19 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:19 2026-02-21T08:21:55.0856853Z barrier.sync 1; 2026-02-21T08:21:55.0856907Z barrier.sync 1; 2026-02-21T08:21:55.0856961Z bra.uni $L__BB0_2; 2026-02-21T08:21:55.0857049Z $L__BB0_23: // %._crit_edge8 2026-02-21T08:21:55.0857227Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0857281Z 
barrier.sync 1; 2026-02-21T08:21:55.0857361Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:21:55.0857517Z .loc 1 53 52 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:53:52 2026-02-21T08:21:55.0857571Z bar.sync 0, 128; 2026-02-21T08:21:55.0857623Z // begin inline asm 2026-02-21T08:21:55.0857679Z 2026-02-21T08:21:55.0857727Z { 2026-02-21T08:21:55.0857785Z .reg .pred complete; 2026-02-21T08:21:55.0857847Z waitLoop: 2026-02-21T08:21:55.0857963Z mbarrier.try_wait.parity.shared.b64 complete, [%r411], %r457; 2026-02-21T08:21:55.0858026Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.0858075Z } 2026-02-21T08:21:55.0858078Z 2026-02-21T08:21:55.0858143Z // end inline asm 2026-02-21T08:21:55.0858312Z .loc 1 28 131 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:131 2026-02-21T08:21:55.0858418Z bar.sync 0, 128; 2026-02-21T08:21:55.0858481Z // begin inline asm 2026-02-21T08:21:55.0858566Z @%p100 mbarrier.inval.shared::cta.b64 [%r411]; 2026-02-21T08:21:55.0858620Z // end inline asm 2026-02-21T08:21:55.0858678Z // begin inline asm 2026-02-21T08:21:55.0858761Z @%p100 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T08:21:55.0858812Z // end inline asm 2026-02-21T08:21:55.0858863Z // begin inline asm 2026-02-21T08:21:55.0858945Z @%p100 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T08:21:55.0858996Z // end inline asm 2026-02-21T08:21:55.0859047Z bar.sync 0, 128; 2026-02-21T08:21:55.0859106Z // begin inline asm 2026-02-21T08:21:55.0859179Z @%p100 mbarrier.inval.shared::cta.b64 [%r198]; 2026-02-21T08:21:55.0859230Z // end inline asm 2026-02-21T08:21:55.0859282Z bar.sync 0, 128; 2026-02-21T08:21:55.0859343Z // begin inline asm 2026-02-21T08:21:55.0859414Z @%p100 mbarrier.inval.shared::cta.b64 [%r199]; 2026-02-21T08:21:55.0859518Z // end inline asm 2026-02-21T08:21:55.0859579Z bar.sync 0, 128; 2026-02-21T08:21:55.0859631Z // begin inline asm 2026-02-21T08:21:55.0859704Z @%p100 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T08:21:55.0859763Z // 
end inline asm 2026-02-21T08:21:55.0859816Z bar.sync 0, 128; 2026-02-21T08:21:55.0859869Z // begin inline asm 2026-02-21T08:21:55.0859942Z @%p100 mbarrier.inval.shared::cta.b64 [%r201]; 2026-02-21T08:21:55.0860001Z // end inline asm 2026-02-21T08:21:55.0860052Z bar.sync 0, 128; 2026-02-21T08:21:55.0860104Z // begin inline asm 2026-02-21T08:21:55.0860186Z @%p100 mbarrier.inval.shared::cta.b64 [%r202]; 2026-02-21T08:21:55.0860236Z // end inline asm 2026-02-21T08:21:55.0860287Z bar.sync 0, 128; 2026-02-21T08:21:55.0860340Z // begin inline asm 2026-02-21T08:21:55.0860422Z @%p100 mbarrier.inval.shared::cta.b64 [%r203]; 2026-02-21T08:21:55.0860475Z // end inline asm 2026-02-21T08:21:55.0860528Z // begin inline asm 2026-02-21T08:21:55.0860611Z @%p100 mbarrier.inval.shared::cta.b64 [%r190]; 2026-02-21T08:21:55.0860666Z // end inline asm 2026-02-21T08:21:55.0860717Z bar.sync 0, 128; 2026-02-21T08:21:55.0860772Z // begin inline asm 2026-02-21T08:21:55.0860854Z @%p100 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T08:21:55.0860905Z // end inline asm 2026-02-21T08:21:55.0860957Z bar.sync 0, 128; 2026-02-21T08:21:55.0861018Z // begin inline asm 2026-02-21T08:21:55.0861093Z @%p100 mbarrier.inval.shared::cta.b64 [%r192]; 2026-02-21T08:21:55.0861144Z // end inline asm 2026-02-21T08:21:55.0861204Z bar.sync 0, 128; 2026-02-21T08:21:55.0861257Z // begin inline asm 2026-02-21T08:21:55.0861328Z @%p100 mbarrier.inval.shared::cta.b64 [%r193]; 2026-02-21T08:21:55.0861379Z // end inline asm 2026-02-21T08:21:55.0861437Z bar.sync 0, 128; 2026-02-21T08:21:55.0861489Z // begin inline asm 2026-02-21T08:21:55.0861561Z @%p100 mbarrier.inval.shared::cta.b64 [%r194]; 2026-02-21T08:21:55.0861618Z // end inline asm 2026-02-21T08:21:55.0861668Z bar.sync 0, 128; 2026-02-21T08:21:55.0861722Z // begin inline asm 2026-02-21T08:21:55.0861797Z @%p100 mbarrier.inval.shared::cta.b64 [%r195]; 2026-02-21T08:21:55.0861854Z // end inline asm 2026-02-21T08:21:55.0861905Z bar.sync 0, 128; 
2026-02-21T08:21:55.0861958Z // begin inline asm 2026-02-21T08:21:55.0862037Z @%p100 mbarrier.inval.shared::cta.b64 [%r196]; 2026-02-21T08:21:55.0862089Z // end inline asm 2026-02-21T08:21:55.0862249Z .loc 1 28 4 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:28:4 2026-02-21T08:21:55.0862309Z bar.sync 0, 128; 2026-02-21T08:21:55.0862362Z // begin inline asm 2026-02-21T08:21:55.0862473Z @%p27 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r429, 64; 2026-02-21T08:21:55.0862524Z // end inline asm 2026-02-21T08:21:55.0862608Z st.shared.b32 [global_smem+30872], 50529027; 2026-02-21T08:21:55.0862662Z barrier.sync 1; 2026-02-21T08:21:55.0862741Z $L__BB0_24: // %common.ret 2026-02-21T08:21:55.0862901Z .loc 1 0 0 // chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py:0 2026-02-21T08:21:55.0862995Z ret; 2026-02-21T08:21:55.0863048Z $L__tmp1: 2026-02-21T08:21:55.0863102Z $L__func_end0: 2026-02-21T08:21:55.0863188Z // -- End function 2026-02-21T08:21:55.0863236Z } 2026-02-21T08:21:55.0863434Z .file 1 "/tmp/torchinductor_root/hj/chjweijwjvkys5jtuuc22z44vxktlin72dezckq7aiuhqcwx7la5.py" 2026-02-21T08:21:55.0863501Z .section .debug_abbrev 2026-02-21T08:21:55.0863549Z { 2026-02-21T08:21:55.0863635Z .b8 1 // Abbreviation Code 2026-02-21T08:21:55.0863726Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:55.0863804Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:55.0863881Z .b8 37 // DW_AT_producer 2026-02-21T08:21:55.0863952Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.0864073Z .b8 19 // DW_AT_language 2026-02-21T08:21:55.0864147Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:55.0864223Z .b8 3 // DW_AT_name 2026-02-21T08:21:55.0864302Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.0864377Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:55.0864448Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:55.0864528Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:21:55.0864598Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.0864664Z .b8 0 // EOM(1) 2026-02-21T08:21:55.0864781Z .b8 0 // EOM(2) 
2026-02-21T08:21:55.0864855Z .b8 0 // EOM(3) 2026-02-21T08:21:55.0864905Z } 2026-02-21T08:21:55.0864963Z .section .debug_info 2026-02-21T08:21:55.0865028Z { 2026-02-21T08:21:55.0865111Z .b32 104 // Length of Unit 2026-02-21T08:21:55.0865196Z .b8 2 // DWARF version number 2026-02-21T08:21:55.0865254Z .b8 0 2026-02-21T08:21:55.0865364Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:21:55.0865451Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:55.0865550Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:55.0865640Z .b8 116 // DW_AT_producer 2026-02-21T08:21:55.0865696Z .b8 114 2026-02-21T08:21:55.0865751Z .b8 105 2026-02-21T08:21:55.0865814Z .b8 116 2026-02-21T08:21:55.0865866Z .b8 111 2026-02-21T08:21:55.0865917Z .b8 110 2026-02-21T08:21:55.0865968Z .b8 0 2026-02-21T08:21:55.0866052Z .b8 2 // DW_AT_language 2026-02-21T08:21:55.0866102Z .b8 0 2026-02-21T08:21:55.0866173Z .b8 99 // DW_AT_name 2026-02-21T08:21:55.0866230Z .b8 104 2026-02-21T08:21:55.0866282Z .b8 106 2026-02-21T08:21:55.0866332Z .b8 119 2026-02-21T08:21:55.0866381Z .b8 101 2026-02-21T08:21:55.0866437Z .b8 105 2026-02-21T08:21:55.0866486Z .b8 106 2026-02-21T08:21:55.0866533Z .b8 119 2026-02-21T08:21:55.0866588Z .b8 106 2026-02-21T08:21:55.0866635Z .b8 118 2026-02-21T08:21:55.0866684Z .b8 107 2026-02-21T08:21:55.0866731Z .b8 121 2026-02-21T08:21:55.0866787Z .b8 115 2026-02-21T08:21:55.0866837Z .b8 53 2026-02-21T08:21:55.0866885Z .b8 106 2026-02-21T08:21:55.0866940Z .b8 116 2026-02-21T08:21:55.0866988Z .b8 117 2026-02-21T08:21:55.0867037Z .b8 117 2026-02-21T08:21:55.0867086Z .b8 99 2026-02-21T08:21:55.0867144Z .b8 50 2026-02-21T08:21:55.0867193Z .b8 50 2026-02-21T08:21:55.0867241Z .b8 122 2026-02-21T08:21:55.0867289Z .b8 52 2026-02-21T08:21:55.0867345Z .b8 52 2026-02-21T08:21:55.0867393Z .b8 118 2026-02-21T08:21:55.0867441Z .b8 120 2026-02-21T08:21:55.0867496Z .b8 107 2026-02-21T08:21:55.0867544Z .b8 116 2026-02-21T08:21:55.0867591Z .b8 108 2026-02-21T08:21:55.0867638Z 
.b8 105 2026-02-21T08:21:55.0867696Z .b8 110 2026-02-21T08:21:55.0867796Z .b8 55 2026-02-21T08:21:55.0867845Z .b8 50 2026-02-21T08:21:55.0867900Z .b8 100 2026-02-21T08:21:55.0867950Z .b8 101 2026-02-21T08:21:55.0867998Z .b8 122 2026-02-21T08:21:55.0868045Z .b8 99 2026-02-21T08:21:55.0868100Z .b8 107 2026-02-21T08:21:55.0868148Z .b8 113 2026-02-21T08:21:55.0868195Z .b8 55 2026-02-21T08:21:55.0868244Z .b8 97 2026-02-21T08:21:55.0868299Z .b8 105 2026-02-21T08:21:55.0868348Z .b8 117 2026-02-21T08:21:55.0868396Z .b8 104 2026-02-21T08:21:55.0868449Z .b8 113 2026-02-21T08:21:55.0868496Z .b8 99 2026-02-21T08:21:55.0868544Z .b8 119 2026-02-21T08:21:55.0868591Z .b8 120 2026-02-21T08:21:55.0868646Z .b8 55 2026-02-21T08:21:55.0868693Z .b8 108 2026-02-21T08:21:55.0868739Z .b8 97 2026-02-21T08:21:55.0868792Z .b8 53 2026-02-21T08:21:55.0868840Z .b8 46 2026-02-21T08:21:55.0868888Z .b8 112 2026-02-21T08:21:55.0868936Z .b8 121 2026-02-21T08:21:55.0868993Z .b8 0 2026-02-21T08:21:55.0869199Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:55.0869274Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:55.0869332Z .b8 116 2026-02-21T08:21:55.0869380Z .b8 109 2026-02-21T08:21:55.0869428Z .b8 112 2026-02-21T08:21:55.0869476Z .b8 47 2026-02-21T08:21:55.0869533Z .b8 116 2026-02-21T08:21:55.0869582Z .b8 111 2026-02-21T08:21:55.0869630Z .b8 114 2026-02-21T08:21:55.0869685Z .b8 99 2026-02-21T08:21:55.0869734Z .b8 104 2026-02-21T08:21:55.0869782Z .b8 105 2026-02-21T08:21:55.0869831Z .b8 110 2026-02-21T08:21:55.0869887Z .b8 100 2026-02-21T08:21:55.0869936Z .b8 117 2026-02-21T08:21:55.0869984Z .b8 99 2026-02-21T08:21:55.0870033Z .b8 116 2026-02-21T08:21:55.0870090Z .b8 111 2026-02-21T08:21:55.0870138Z .b8 114 2026-02-21T08:21:55.0870186Z .b8 95 2026-02-21T08:21:55.0870244Z .b8 114 2026-02-21T08:21:55.0870293Z .b8 111 2026-02-21T08:21:55.0870343Z .b8 111 2026-02-21T08:21:55.0870393Z .b8 116 2026-02-21T08:21:55.0870454Z .b8 47 2026-02-21T08:21:55.0870504Z .b8 104 2026-02-21T08:21:55.0870553Z .b8 106 
2026-02-21T08:21:55.0870616Z .b8 0 2026-02-21T08:21:55.0870670Z } 2026-02-21T08:21:55.0870735Z .section .debug_macinfo { } 2026-02-21T08:21:55.0870739Z 2026-02-21T08:21:55.0870814Z ================================================================ 2026-02-21T08:21:55.0870922Z please share the reproducer above with Triton project. 2026-02-21T08:21:55.4269421Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:21:55.4270616Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:55.4270816Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:55.4270912Z `ptxas` stderr: 2026-02-21T08:21:55.4271351Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.4271466Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.4271474Z 2026-02-21T08:21:55.4271979Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxp_yb4ou.ptx -o /tmp/tmpxp_yb4ou.ptx.o 2026-02-21T08:21:55.4271985Z 2026-02-21T08:21:55.4272161Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:21:55.4272172Z 2026-02-21T08:21:55.4272358Z 2026-02-21T08:21:55.4272363Z 2026-02-21T08:21:55.4272467Z ================================================================ 2026-02-21T08:21:55.4272557Z Internal Triton PTX codegen error 2026-02-21T08:21:55.4272631Z `ptxas` stderr: 2026-02-21T08:21:55.4273070Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.4273479Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.4273485Z 2026-02-21T08:21:55.4273972Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxp_yb4ou.ptx -o /tmp/tmpxp_yb4ou.ptx.o 2026-02-21T08:21:55.4273985Z 2026-02-21T08:21:55.4273988Z 2026-02-21T08:21:55.4274058Z // 2026-02-21T08:21:55.4274149Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:55.4274205Z // 2026-02-21T08:21:55.4274217Z 2026-02-21T08:21:55.4274279Z .version 8.7 2026-02-21T08:21:55.4274341Z .target sm_100a 2026-02-21T08:21:55.4274401Z .address_size 64 2026-02-21T08:21:55.4274404Z 2026-02-21T08:21:55.4274543Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:55.4274840Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:55.4274936Z // @_helion_matmul 2026-02-21T08:21:55.4275018Z .visible .entry _helion_matmul( 2026-02-21T08:21:55.4275134Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:55.4275245Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:55.4275346Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:55.4275437Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:55.4275531Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:55.4275580Z ) 2026-02-21T08:21:55.4275642Z .reqntid 128 2026-02-21T08:21:55.4275696Z .maxnreg 32 
2026-02-21T08:21:55.4275746Z { 2026-02-21T08:21:55.4275815Z .reg .pred %p<120>; 2026-02-21T08:21:55.4275870Z .reg .b32 %r<406>; 2026-02-21T08:21:55.4275923Z .reg .b64 %rd<136>; 2026-02-21T08:21:55.4276099Z .loc 1 19 0 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:19:0 2026-02-21T08:21:55.4276163Z $L__func_begin0: 2026-02-21T08:21:55.4276327Z .loc 1 19 0 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:19:0 2026-02-21T08:21:55.4276332Z 2026-02-21T08:21:55.4276383Z // %bb.0: 2026-02-21T08:21:55.4276475Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:21:55.4276527Z $L__tmp0: 2026-02-21T08:21:55.4276687Z .loc 1 19 0 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:19 2026-02-21T08:21:55.4276751Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:55.4276834Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:21:55.4276895Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:55.4276954Z mov.b32 %r30, global_smem; 2026-02-21T08:21:55.4277016Z // begin inline asm 2026-02-21T08:21:55.4277170Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r30], 64; 2026-02-21T08:21:55.4277224Z // end inline asm 2026-02-21T08:21:55.4277315Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T08:21:55.4277371Z bar.sync 0; 2026-02-21T08:21:55.4277439Z ld.shared.b32 %r397, [global_smem]; 2026-02-21T08:21:55.4277498Z bar.sync 0; 2026-02-21T08:21:55.4277551Z // begin inline asm 2026-02-21T08:21:55.4277668Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:55.4277723Z // end inline asm 2026-02-21T08:21:55.4277892Z .loc 1 21 67 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:21:67 2026-02-21T08:21:55.4277949Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:55.4278007Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:21:55.4278070Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:21:55.4278127Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:21:55.4278183Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:21:55.4278245Z mad.lo.s32 %r51, %r48, %r50, %r47; 
2026-02-21T08:21:55.4278314Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:21:55.4278368Z shl.b32 %r53, %r52, 8; 2026-02-21T08:21:55.4278425Z cvt.s64.s32 %rd41, %r53; 2026-02-21T08:21:55.4278496Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T08:21:55.4278613Z shl.b32 %r54, %r1, 2; 2026-02-21T08:21:55.4278672Z add.s32 %r31, %r30, %r54; 2026-02-21T08:21:55.4278725Z mov.b32 %r40, 0; 2026-02-21T08:21:55.4278788Z // begin inline asm 2026-02-21T08:21:55.4278859Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.4278912Z // end inline asm 2026-02-21T08:21:55.4278988Z bar.warp.sync -1; 2026-02-21T08:21:55.4279049Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T08:21:55.4279108Z cvt.u64.u32 %rd4, %r30; 2026-02-21T08:21:55.4279161Z // begin inline asm 2026-02-21T08:21:55.4279334Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:21:55.4279387Z // end inline asm 2026-02-21T08:21:55.4279439Z // begin inline asm 2026-02-21T08:21:55.4279589Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.4279643Z // end inline asm 2026-02-21T08:21:55.4279698Z mov.b32 %r33, 16; 2026-02-21T08:21:55.4279822Z // begin inline asm 2026-02-21T08:21:55.4279977Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.4280030Z // end inline asm 2026-02-21T08:21:55.4280090Z mov.b32 %r34, 64; 2026-02-21T08:21:55.4280144Z // begin inline asm 2026-02-21T08:21:55.4280290Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.4280343Z // end inline asm 2026-02-21T08:21:55.4280404Z mov.b32 %r35, 2048; 2026-02-21T08:21:55.4280457Z // begin inline asm 2026-02-21T08:21:55.4280613Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.4280672Z // end inline asm 2026-02-21T08:21:55.4280724Z // begin inline asm 2026-02-21T08:21:55.4280878Z @%p110 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:21:55.4280937Z // end inline asm 2026-02-21T08:21:55.4280992Z mov.b64 %rd12, 4096; 2026-02-21T08:21:55.4281048Z // begin inline asm 2026-02-21T08:21:55.4281217Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.4281277Z // end inline asm 2026-02-21T08:21:55.4281328Z mov.b32 %r37, 1; 2026-02-21T08:21:55.4281379Z // begin inline asm 2026-02-21T08:21:55.4281557Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.4281610Z // end inline asm 2026-02-21T08:21:55.4281663Z // begin inline asm 2026-02-21T08:21:55.4281835Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:55.4281886Z // end inline asm 2026-02-21T08:21:55.4281939Z // begin inline asm 2026-02-21T08:21:55.4282086Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.4282146Z // end inline asm 2026-02-21T08:21:55.4282199Z // begin inline asm 2026-02-21T08:21:55.4282366Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.4282427Z // end inline asm 2026-02-21T08:21:55.4282480Z // begin inline asm 2026-02-21T08:21:55.4282632Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.4282690Z // end inline asm 2026-02-21T08:21:55.4282743Z // begin inline asm 2026-02-21T08:21:55.4282885Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.4282937Z // end inline asm 2026-02-21T08:21:55.4282996Z // begin inline asm 2026-02-21T08:21:55.4283260Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.4283313Z // end inline asm 2026-02-21T08:21:55.4283373Z // begin 
inline asm 2026-02-21T08:21:55.4283497Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:21:55.4283567Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.4283647Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.4283699Z // end inline asm 2026-02-21T08:21:55.4283797Z bar.sync 0; 2026-02-21T08:21:55.4283860Z cvta.global.u64 %rd59, %rd19; 2026-02-21T08:21:55.4284031Z .loc 1 22 67 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:22:67 2026-02-21T08:21:55.4284090Z add.s64 %rd37, %rd19, 128; 2026-02-21T08:21:55.4284140Z bar.sync 0; 2026-02-21T08:21:55.4284201Z // begin inline asm 2026-02-21T08:21:55.4284263Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.4284314Z // end inline asm 2026-02-21T08:21:55.4284377Z bar.warp.sync -1; 2026-02-21T08:21:55.4284430Z // begin inline asm 2026-02-21T08:21:55.4284589Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:21:55.4284641Z // end inline asm 2026-02-21T08:21:55.4284743Z // begin inline asm 2026-02-21T08:21:55.4284882Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.4284991Z // end inline asm 2026-02-21T08:21:55.4285051Z // begin inline asm 2026-02-21T08:21:55.4285204Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.4285255Z // end inline asm 2026-02-21T08:21:55.4285316Z // begin inline asm 2026-02-21T08:21:55.4285464Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.4285517Z // end inline asm 2026-02-21T08:21:55.4285569Z // begin inline asm 2026-02-21T08:21:55.4285730Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.4285783Z // end inline asm 2026-02-21T08:21:55.4285837Z mov.b32 %r44, 4096; 2026-02-21T08:21:55.4285899Z // begin inline asm 2026-02-21T08:21:55.4286055Z @%p110 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r44; 2026-02-21T08:21:55.4286107Z // end inline asm 2026-02-21T08:21:55.4286167Z // begin inline asm 2026-02-21T08:21:55.4286332Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.4286388Z // end inline asm 2026-02-21T08:21:55.4286440Z // begin inline asm 2026-02-21T08:21:55.4286615Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.4286668Z // end inline asm 2026-02-21T08:21:55.4286722Z // begin inline asm 2026-02-21T08:21:55.4286895Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:55.4286948Z // end inline asm 2026-02-21T08:21:55.4287001Z // begin inline asm 2026-02-21T08:21:55.4287160Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.4287213Z // end inline asm 2026-02-21T08:21:55.4287267Z // begin inline asm 2026-02-21T08:21:55.4287442Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.4287495Z // end inline asm 2026-02-21T08:21:55.4287550Z // begin inline asm 2026-02-21T08:21:55.4287703Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.4287768Z // end inline asm 2026-02-21T08:21:55.4287820Z // begin inline asm 2026-02-21T08:21:55.4287963Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.4288025Z // end inline asm 2026-02-21T08:21:55.4288078Z // begin inline asm 2026-02-21T08:21:55.4288341Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.4288399Z // end inline asm 2026-02-21T08:21:55.4288452Z // begin inline asm 2026-02-21T08:21:55.4288575Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 
+ 0 ], 0x80; 2026-02-21T08:21:55.4288650Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.4288721Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.4288772Z // end inline asm 2026-02-21T08:21:55.4288823Z bar.sync 0; 2026-02-21T08:21:55.4288898Z cvta.global.u64 %rd60, %rd37; 2026-02-21T08:21:55.4289147Z .loc 1 28 131 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:28:131 2026-02-21T08:21:55.4289212Z setp.gt.u32 %p39, %r3, 2047; 2026-02-21T08:21:55.4289274Z @%p39 bra $L__BB0_8; 2026-02-21T08:21:55.4289349Z // %bb.1: // %.lr.ph 2026-02-21T08:21:55.4289513Z .loc 1 40 45 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:40:45 2026-02-21T08:21:55.4289569Z shl.b32 %r154, %r1, 3; 2026-02-21T08:21:55.4289634Z and.b32 %r155, %r154, 56; 2026-02-21T08:21:55.4289694Z bfe.u32 %r156, %r1, 3, 4; 2026-02-21T08:21:55.4289749Z shr.u32 %r157, %r1, 5; 2026-02-21T08:21:55.4289812Z shl.b32 %r158, %r1, 4; 2026-02-21T08:21:55.4289869Z and.b32 %r159, %r158, 176; 2026-02-21T08:21:55.4289924Z and.b32 %r160, %r1, 96; 2026-02-21T08:21:55.4289986Z shl.b32 %r161, %r160, 3; 2026-02-21T08:21:55.4290042Z bfe.s32 %r162, %r1, 2, 1; 2026-02-21T08:21:55.4290140Z and.b32 %r163, %r162, 1088; 2026-02-21T08:21:55.4290199Z and.b32 %r165, %r54, 64; 2026-02-21T08:21:55.4290264Z xor.b32 %r166, %r163, %r165; 2026-02-21T08:21:55.4290321Z add.s32 %r167, %r30, %r159; 2026-02-21T08:21:55.4290378Z add.s32 %r168, %r167, %r161; 2026-02-21T08:21:55.4290440Z shl.b32 %r169, %r1, 5; 2026-02-21T08:21:55.4290497Z and.b32 %r170, %r169, 1792; 2026-02-21T08:21:55.4290552Z and.b32 %r171, %r154, 48; 2026-02-21T08:21:55.4290606Z shl.b32 %r172, %r160, 1; 2026-02-21T08:21:55.4290668Z shl.b32 %r173, %r1, 6; 2026-02-21T08:21:55.4290722Z and.b32 %r174, %r173, 64; 2026-02-21T08:21:55.4290776Z xor.b32 %r175, %r172, %r174; 2026-02-21T08:21:55.4290839Z add.s32 %r176, %r30, %r170; 2026-02-21T08:21:55.4290894Z add.s32 %r177, %r176, %r171; 2026-02-21T08:21:55.4291057Z .loc 1 35 33 // 
caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:35:33 2026-02-21T08:21:55.4291112Z shr.u32 %r178, %r3, 5; 2026-02-21T08:21:55.4291175Z and.b32 %r179, %r178, 62; 2026-02-21T08:21:55.4291338Z .loc 1 37 64 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:37:64 2026-02-21T08:21:55.4291395Z and.b32 %r180, %r3, 1; 2026-02-21T08:21:55.4291560Z .loc 1 37 30 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:37:30 2026-02-21T08:21:55.4291614Z or.b32 %r181, %r179, %r180; 2026-02-21T08:21:55.4291770Z .loc 1 39 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:39:27 2026-02-21T08:21:55.4291830Z shl.b32 %r212, %r181, 6; 2026-02-21T08:21:55.4291985Z .loc 1 41 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:41:27 2026-02-21T08:21:55.4292039Z shl.b32 %r182, %r3, 5; 2026-02-21T08:21:55.4292094Z and.b32 %r208, %r182, 1984; 2026-02-21T08:21:55.4292255Z .loc 1 42 32 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:42:32 2026-02-21T08:21:55.4292310Z or.b32 %r9, %r208, %r156; 2026-02-21T08:21:55.4292465Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4292548Z shfl.sync.idx.b32 %r13, %r157, 0, 31, -1; 2026-02-21T08:21:55.4292604Z shl.b32 %r183, %r13, 21; 2026-02-21T08:21:55.4292664Z and.b32 %r184, %r183, 6291456; 2026-02-21T08:21:55.4292726Z add.s32 %r303, %r184, %r397; 2026-02-21T08:21:55.4292783Z mov.pred %p40, -1; 2026-02-21T08:21:55.4292836Z // begin inline asm 2026-02-21T08:21:55.4293107Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 0], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.4293166Z // end inline asm 2026-02-21T08:21:55.4293219Z // begin inline asm 2026-02-21T08:21:55.4293473Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 16], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.4293533Z // end inline 
asm 2026-02-21T08:21:55.4293588Z // begin inline asm 2026-02-21T08:21:55.4293706Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:55.4293765Z // end inline asm 2026-02-21T08:21:55.4293817Z bar.sync 0; 2026-02-21T08:21:55.4293978Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4294043Z add.s32 %r399, %r30, 28736; 2026-02-21T08:21:55.4294096Z // begin inline asm 2026-02-21T08:21:55.4294179Z @%p110 mbarrier.init.shared::cta.b64 [%r399], 1; 2026-02-21T08:21:55.4294233Z // end inline asm 2026-02-21T08:21:55.4294293Z bar.sync 0; 2026-02-21T08:21:55.4294351Z add.s32 %r90, %r30, 28744; 2026-02-21T08:21:55.4294404Z // begin inline asm 2026-02-21T08:21:55.4294493Z @%p110 mbarrier.init.shared::cta.b64 [%r90], 1; 2026-02-21T08:21:55.4294545Z // end inline asm 2026-02-21T08:21:55.4294600Z add.s32 %r91, %r30, 28672; 2026-02-21T08:21:55.4294654Z // begin inline asm 2026-02-21T08:21:55.4294765Z @%p110 mbarrier.init.shared::cta.b64 [%r91], 1; 2026-02-21T08:21:55.4294868Z // end inline asm 2026-02-21T08:21:55.4294924Z bar.sync 0; 2026-02-21T08:21:55.4294987Z add.s32 %r92, %r30, 28680; 2026-02-21T08:21:55.4295041Z // begin inline asm 2026-02-21T08:21:55.4295118Z @%p110 mbarrier.init.shared::cta.b64 [%r92], 1; 2026-02-21T08:21:55.4295172Z // end inline asm 2026-02-21T08:21:55.4295232Z bar.sync 0; 2026-02-21T08:21:55.4295287Z add.s32 %r93, %r30, 28688; 2026-02-21T08:21:55.4295341Z // begin inline asm 2026-02-21T08:21:55.4295429Z @%p110 mbarrier.init.shared::cta.b64 [%r93], 1; 2026-02-21T08:21:55.4295485Z // end inline asm 2026-02-21T08:21:55.4295537Z bar.sync 0; 2026-02-21T08:21:55.4295631Z add.s32 %r94, %r30, 28696; 2026-02-21T08:21:55.4295685Z // begin inline asm 2026-02-21T08:21:55.4295761Z @%p110 mbarrier.init.shared::cta.b64 [%r94], 1; 2026-02-21T08:21:55.4295813Z // end inline asm 2026-02-21T08:21:55.4295873Z bar.sync 0; 2026-02-21T08:21:55.4295927Z add.s32 %r95, %r30, 28704; 2026-02-21T08:21:55.4295980Z // begin inline asm 
2026-02-21T08:21:55.4296064Z @%p110 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T08:21:55.4296118Z // end inline asm 2026-02-21T08:21:55.4296169Z bar.sync 0; 2026-02-21T08:21:55.4296224Z add.s32 %r96, %r30, 28712; 2026-02-21T08:21:55.4296283Z // begin inline asm 2026-02-21T08:21:55.4296359Z @%p110 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T08:21:55.4296410Z // end inline asm 2026-02-21T08:21:55.4296467Z bar.sync 0; 2026-02-21T08:21:55.4296523Z add.s32 %r205, %r30, 28720; 2026-02-21T08:21:55.4296577Z // begin inline asm 2026-02-21T08:21:55.4296655Z @%p110 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:21:55.4296716Z // end inline asm 2026-02-21T08:21:55.4296767Z bar.sync 0; 2026-02-21T08:21:55.4296820Z // begin inline asm 2026-02-21T08:21:55.4296937Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r91], 4096; 2026-02-21T08:21:55.4296990Z // end inline asm 2026-02-21T08:21:55.4297149Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4297212Z // begin inline asm 2026-02-21T08:21:55.4297282Z fence.proxy.async.shared::cta; 2026-02-21T08:21:55.4297334Z // end inline asm 2026-02-21T08:21:55.4297385Z bar.sync 0; 2026-02-21T08:21:55.4297454Z elect.sync %r185|%p70, -1; 2026-02-21T08:21:55.4297514Z and.pred %p52, %p1, %p70; 2026-02-21T08:21:55.4297567Z // begin inline asm 2026-02-21T08:21:55.4297805Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r30], [%rd59, {%r40, %r208}], [%r91]; 2026-02-21T08:21:55.4297858Z // end inline asm 2026-02-21T08:21:55.4298017Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4298076Z bar.sync 0; 2026-02-21T08:21:55.4298136Z elect.sync %r186|%p71, -1; 2026-02-21T08:21:55.4298194Z and.pred %p53, %p1, %p71; 2026-02-21T08:21:55.4298251Z add.s32 %r103, %r30, 14336; 2026-02-21T08:21:55.4298312Z // begin inline asm 2026-02-21T08:21:55.4298544Z @%p53 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r103], [%rd60, {%r40, %r212}], [%r91]; 2026-02-21T08:21:55.4298652Z // end inline asm 2026-02-21T08:21:55.4298815Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4298866Z bar.sync 0; 2026-02-21T08:21:55.4298919Z // begin inline asm 2026-02-21T08:21:55.4299031Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r92], 4096; 2026-02-21T08:21:55.4299083Z // end inline asm 2026-02-21T08:21:55.4299238Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4299289Z bar.sync 0; 2026-02-21T08:21:55.4299356Z elect.sync %r187|%p72, -1; 2026-02-21T08:21:55.4299416Z and.pred %p55, %p1, %p72; 2026-02-21T08:21:55.4299471Z add.s32 %r108, %r30, 2048; 2026-02-21T08:21:55.4299529Z // begin inline asm 2026-02-21T08:21:55.4299799Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd59, {%r33, %r208}], [%r92]; 2026-02-21T08:21:55.4299856Z // end inline asm 2026-02-21T08:21:55.4300022Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4300074Z bar.sync 0; 2026-02-21T08:21:55.4300133Z elect.sync %r188|%p73, -1; 2026-02-21T08:21:55.4300192Z and.pred %p56, %p1, %p73; 2026-02-21T08:21:55.4300257Z add.s32 %r112, %r30, 16384; 2026-02-21T08:21:55.4300311Z // begin inline asm 2026-02-21T08:21:55.4300538Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd60, {%r33, %r212}], [%r92]; 2026-02-21T08:21:55.4300600Z // end inline asm 2026-02-21T08:21:55.4300756Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4300806Z bar.sync 0; 2026-02-21T08:21:55.4300865Z // begin inline asm 2026-02-21T08:21:55.4300969Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r93], 4096; 2026-02-21T08:21:55.4301024Z // end inline asm 2026-02-21T08:21:55.4301185Z .loc 1 51 
31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4301243Z bar.sync 0; 2026-02-21T08:21:55.4301303Z elect.sync %r189|%p74, -1; 2026-02-21T08:21:55.4301362Z and.pred %p58, %p1, %p74; 2026-02-21T08:21:55.4301423Z add.s32 %r117, %r30, 4096; 2026-02-21T08:21:55.4301475Z mov.b32 %r118, 32; 2026-02-21T08:21:55.4301526Z // begin inline asm 2026-02-21T08:21:55.4301770Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r208}], [%r93]; 2026-02-21T08:21:55.4301822Z // end inline asm 2026-02-21T08:21:55.4301978Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4302028Z bar.sync 0; 2026-02-21T08:21:55.4302095Z elect.sync %r190|%p75, -1; 2026-02-21T08:21:55.4302152Z and.pred %p59, %p1, %p75; 2026-02-21T08:21:55.4302209Z add.s32 %r121, %r30, 18432; 2026-02-21T08:21:55.4302272Z // begin inline asm 2026-02-21T08:21:55.4302502Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r212}], [%r93]; 2026-02-21T08:21:55.4302554Z // end inline asm 2026-02-21T08:21:55.4302720Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4302771Z bar.sync 0; 2026-02-21T08:21:55.4302824Z // begin inline asm 2026-02-21T08:21:55.4302926Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r94], 4096; 2026-02-21T08:21:55.4302987Z // end inline asm 2026-02-21T08:21:55.4303144Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4303195Z bar.sync 0; 2026-02-21T08:21:55.4303262Z elect.sync %r191|%p76, -1; 2026-02-21T08:21:55.4303322Z and.pred %p61, %p1, %p76; 2026-02-21T08:21:55.4303379Z add.s32 %r126, %r30, 6144; 2026-02-21T08:21:55.4303432Z mov.b32 %r127, 48; 2026-02-21T08:21:55.4303496Z // begin inline asm 2026-02-21T08:21:55.4303775Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes 
[%r126], [%rd59, {%r127, %r208}], [%r94]; 2026-02-21T08:21:55.4303830Z // end inline asm 2026-02-21T08:21:55.4304002Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4304054Z bar.sync 0; 2026-02-21T08:21:55.4304113Z elect.sync %r192|%p77, -1; 2026-02-21T08:21:55.4304179Z and.pred %p62, %p1, %p77; 2026-02-21T08:21:55.4304235Z add.s32 %r130, %r30, 20480; 2026-02-21T08:21:55.4304288Z // begin inline asm 2026-02-21T08:21:55.4304555Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r212}], [%r94]; 2026-02-21T08:21:55.4304607Z // end inline asm 2026-02-21T08:21:55.4304799Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4304850Z bar.sync 0; 2026-02-21T08:21:55.4304964Z // begin inline asm 2026-02-21T08:21:55.4305070Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r95], 4096; 2026-02-21T08:21:55.4305126Z // end inline asm 2026-02-21T08:21:55.4305306Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4305362Z bar.sync 0; 2026-02-21T08:21:55.4305427Z elect.sync %r193|%p78, -1; 2026-02-21T08:21:55.4305498Z and.pred %p64, %p1, %p78; 2026-02-21T08:21:55.4305557Z add.s32 %r135, %r30, 8192; 2026-02-21T08:21:55.4305614Z // begin inline asm 2026-02-21T08:21:55.4305855Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r34, %r208}], [%r95]; 2026-02-21T08:21:55.4305920Z // end inline asm 2026-02-21T08:21:55.4306088Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4306144Z bar.sync 0; 2026-02-21T08:21:55.4306214Z elect.sync %r194|%p79, -1; 2026-02-21T08:21:55.4306277Z and.pred %p65, %p1, %p79; 2026-02-21T08:21:55.4306340Z add.s32 %r139, %r30, 22528; 2026-02-21T08:21:55.4306405Z // begin inline asm 2026-02-21T08:21:55.4306646Z @%p65 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r34, %r212}], [%r95]; 2026-02-21T08:21:55.4306702Z // end inline asm 2026-02-21T08:21:55.4306869Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4306930Z bar.sync 0; 2026-02-21T08:21:55.4306986Z // begin inline asm 2026-02-21T08:21:55.4307094Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 4096; 2026-02-21T08:21:55.4307157Z // end inline asm 2026-02-21T08:21:55.4307323Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4307376Z bar.sync 0; 2026-02-21T08:21:55.4307445Z elect.sync %r195|%p80, -1; 2026-02-21T08:21:55.4307506Z and.pred %p67, %p1, %p80; 2026-02-21T08:21:55.4307566Z add.s32 %r144, %r30, 10240; 2026-02-21T08:21:55.4307624Z mov.b32 %r145, 80; 2026-02-21T08:21:55.4307687Z // begin inline asm 2026-02-21T08:21:55.4307932Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r208}], [%r96]; 2026-02-21T08:21:55.4307987Z // end inline asm 2026-02-21T08:21:55.4308159Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4308215Z bar.sync 0; 2026-02-21T08:21:55.4308275Z elect.sync %r196|%p81, -1; 2026-02-21T08:21:55.4308342Z and.pred %p68, %p1, %p81; 2026-02-21T08:21:55.4308399Z add.s32 %r148, %r30, 24576; 2026-02-21T08:21:55.4308456Z // begin inline asm 2026-02-21T08:21:55.4308697Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r212}], [%r96]; 2026-02-21T08:21:55.4308760Z // end inline asm 2026-02-21T08:21:55.4308934Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4309065Z bar.sync 0; 2026-02-21T08:21:55.4309129Z // begin inline asm 2026-02-21T08:21:55.4309180Z 2026-02-21T08:21:55.4309229Z { 2026-02-21T08:21:55.4309291Z .reg .pred complete; 
2026-02-21T08:21:55.4309352Z waitLoop: 2026-02-21T08:21:55.4309470Z mbarrier.try_wait.parity.shared.b64 complete, [%r91], %r40; 2026-02-21T08:21:55.4309535Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.4309592Z } 2026-02-21T08:21:55.4309596Z 2026-02-21T08:21:55.4309650Z // end inline asm 2026-02-21T08:21:55.4309813Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4309882Z setp.ne.b32 %p82, %r13, 0; 2026-02-21T08:21:55.4309941Z @%p82 bra $L__BB0_3; 2026-02-21T08:21:55.4309992Z // %bb.2: 2026-02-21T08:21:55.4310153Z .loc 1 0 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:0:52 2026-02-21T08:21:55.4310270Z bfe.u32 %r201, %r103, 4, 14; 2026-02-21T08:21:55.4310333Z cvt.u64.u32 %rd57, %r201; 2026-02-21T08:21:55.4310414Z or.b64 %rd55, %rd57, -4611685949699522560; 2026-02-21T08:21:55.4310480Z bfe.u32 %r202, %r30, 4, 14; 2026-02-21T08:21:55.4310538Z cvt.u64.u32 %rd58, %r202; 2026-02-21T08:21:55.4310610Z or.b64 %rd54, %rd58, -4611685949699522560; 2026-02-21T08:21:55.4310787Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4310850Z elect.sync %r203|%p84, -1; 2026-02-21T08:21:55.4310907Z mov.b32 %r198, 68157456; 2026-02-21T08:21:55.4310966Z mov.pred %p83, 0; 2026-02-21T08:21:55.4311030Z // begin inline asm 2026-02-21T08:21:55.4311175Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd54, %rd55, %r198, %p83; 2026-02-21T08:21:55.4311230Z // end inline asm 2026-02-21T08:21:55.4311294Z add.s32 %r204, %r30, 28736; 2026-02-21T08:21:55.4311353Z cvt.u64.u32 %rd56, %r204; 2026-02-21T08:21:55.4311409Z // begin inline asm 2026-02-21T08:21:55.4311544Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T08:21:55.4311601Z // end inline asm 2026-02-21T08:21:55.4311654Z $L__BB0_3: 2026-02-21T08:21:55.4311822Z .loc 1 0 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:0:52 2026-02-21T08:21:55.4311916Z ld.param.b64 %rd3, 
[_helion_matmul_param_2]; 2026-02-21T08:21:55.4311976Z add.s32 %r4, %r168, %r166; 2026-02-21T08:21:55.4312037Z add.s32 %r308, %r177, %r175; 2026-02-21T08:21:55.4312102Z or.b32 %r7, %r212, %r155; 2026-02-21T08:21:55.4312158Z or.b32 %r10, %r9, 16; 2026-02-21T08:21:55.4312214Z or.b32 %r11, %r9, 32; 2026-02-21T08:21:55.4312269Z or.b32 %r12, %r9, 48; 2026-02-21T08:21:55.4312449Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4312504Z bar.sync 0; 2026-02-21T08:21:55.4312562Z // begin inline asm 2026-02-21T08:21:55.4312685Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r205], 4096; 2026-02-21T08:21:55.4312745Z // end inline asm 2026-02-21T08:21:55.4312915Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4312976Z bar.sync 0; 2026-02-21T08:21:55.4313039Z elect.sync %r219|%p90, -1; 2026-02-21T08:21:55.4313100Z and.pred %p87, %p1, %p90; 2026-02-21T08:21:55.4313158Z add.s32 %r206, %r30, 12288; 2026-02-21T08:21:55.4313221Z mov.b32 %r207, 96; 2026-02-21T08:21:55.4313277Z // begin inline asm 2026-02-21T08:21:55.4313531Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r206], [%rd59, {%r207, %r208}], [%r205]; 2026-02-21T08:21:55.4313595Z // end inline asm 2026-02-21T08:21:55.4313762Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4313815Z bar.sync 0; 2026-02-21T08:21:55.4313884Z elect.sync %r220|%p91, -1; 2026-02-21T08:21:55.4313955Z and.pred %p88, %p1, %p91; 2026-02-21T08:21:55.4314013Z add.s32 %r210, %r30, 26624; 2026-02-21T08:21:55.4314109Z // begin inline asm 2026-02-21T08:21:55.4314343Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r210], [%rd60, {%r207, %r212}], [%r205]; 2026-02-21T08:21:55.4314395Z // end inline asm 2026-02-21T08:21:55.4314449Z mov.b32 %r403, 1; 2026-02-21T08:21:55.4314508Z mov.b32 %r402, 6; 2026-02-21T08:21:55.4314560Z mov.b32 
%r398, 0; 2026-02-21T08:21:55.4314614Z mov.b32 %r400, %r398; 2026-02-21T08:21:55.4314694Z mov.b32 %r401, %r398; 2026-02-21T08:21:55.4314759Z mov.b32 %r404, %r398; 2026-02-21T08:21:55.4314814Z mov.b32 %r405, %r398; 2026-02-21T08:21:55.4314869Z bra.uni $L__BB0_4; 2026-02-21T08:21:55.4314978Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.4315139Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4315203Z setp.lt.u32 %p100, %r405, 1936; 2026-02-21T08:21:55.4315423Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4315482Z // begin inline asm 2026-02-21T08:21:55.4315531Z 2026-02-21T08:21:55.4315580Z { 2026-02-21T08:21:55.4315645Z .reg .pred complete; 2026-02-21T08:21:55.4315696Z waitLoop: 2026-02-21T08:21:55.4315813Z mbarrier.try_wait.parity.shared.b64 complete, [%r399], %r398; 2026-02-21T08:21:55.4315885Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.4315933Z } 2026-02-21T08:21:55.4315937Z 2026-02-21T08:21:55.4315991Z // end inline asm 2026-02-21T08:21:55.4316057Z add.s32 %r250, %r403, 1; 2026-02-21T08:21:55.4316120Z setp.gt.s32 %p103, %r250, 1; 2026-02-21T08:21:55.4316183Z selp.b32 %r403, 0, %r250, %p103; 2026-02-21T08:21:55.4316242Z selp.b32 %r251, 1, 0, %p103; 2026-02-21T08:21:55.4316305Z xor.b32 %r404, %r260, %r251; 2026-02-21T08:21:55.4316463Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4316522Z add.s32 %r252, %r402, 1; 2026-02-21T08:21:55.4316590Z setp.gt.s32 %p104, %r252, 6; 2026-02-21T08:21:55.4316649Z selp.b32 %r402, 0, %r252, %p104; 2026-02-21T08:21:55.4316703Z shl.b32 %r253, %r402, 3; 2026-02-21T08:21:55.4316757Z add.s32 %r255, %r30, %r253; 2026-02-21T08:21:55.4316819Z add.s32 %r245, %r255, 28672; 2026-02-21T08:21:55.4316869Z bar.sync 0; 2026-02-21T08:21:55.4316927Z and.pred %p97, %p110, %p100; 2026-02-21T08:21:55.4316988Z // begin inline asm 2026-02-21T08:21:55.4317092Z @%p97 
mbarrier.arrive.expect_tx.shared.b64 _, [%r245], 4096; 2026-02-21T08:21:55.4317144Z // end inline asm 2026-02-21T08:21:55.4317302Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4317365Z shl.b32 %r256, %r402, 11; 2026-02-21T08:21:55.4317421Z add.s32 %r242, %r30, %r256; 2026-02-21T08:21:55.4317470Z bar.sync 0; 2026-02-21T08:21:55.4317535Z elect.sync %r257|%p105, -1; 2026-02-21T08:21:55.4317598Z and.pred %p106, %p100, %p105; 2026-02-21T08:21:55.4317660Z and.pred %p98, %p1, %p106; 2026-02-21T08:21:55.4317724Z add.s32 %r243, %r405, 112; 2026-02-21T08:21:55.4317777Z // begin inline asm 2026-02-21T08:21:55.4318005Z @%p98 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd59, {%r243, %r208}], [%r245]; 2026-02-21T08:21:55.4318057Z // end inline asm 2026-02-21T08:21:55.4318220Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4318275Z add.s32 %r246, %r242, 14336; 2026-02-21T08:21:55.4318324Z bar.sync 0; 2026-02-21T08:21:55.4318389Z elect.sync %r258|%p107, -1; 2026-02-21T08:21:55.4318451Z and.pred %p108, %p100, %p107; 2026-02-21T08:21:55.4318508Z and.pred %p99, %p1, %p108; 2026-02-21T08:21:55.4318568Z // begin inline asm 2026-02-21T08:21:55.4318797Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd60, {%r243, %r212}], [%r245]; 2026-02-21T08:21:55.4318852Z // end inline asm 2026-02-21T08:21:55.4319011Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4319132Z setp.lt.u32 %p109, %r405, 2016; 2026-02-21T08:21:55.4319187Z add.s32 %r405, %r405, 16; 2026-02-21T08:21:55.4319241Z mov.b32 %r398, %r260; 2026-02-21T08:21:55.4319303Z mov.b32 %r399, %r259; 2026-02-21T08:21:55.4319359Z @%p109 bra $L__BB0_4; 2026-02-21T08:21:55.4319412Z bra.uni $L__BB0_7; 2026-02-21T08:21:55.4319516Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:55.4319674Z 
.loc 1 0 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:0:42 2026-02-21T08:21:55.4319728Z mov.b32 %r260, %r404; 2026-02-21T08:21:55.4319883Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4319948Z add.s32 %r223, %r401, 1; 2026-02-21T08:21:55.4320007Z setp.gt.s32 %p93, %r223, 6; 2026-02-21T08:21:55.4320163Z selp.b32 %r401, 0, %r223, %p93; 2026-02-21T08:21:55.4320232Z selp.b32 %r224, 1, 0, %p93; 2026-02-21T08:21:55.4320288Z xor.b32 %r400, %r400, %r224; 2026-02-21T08:21:55.4320343Z shl.b32 %r225, %r401, 3; 2026-02-21T08:21:55.4320405Z add.s32 %r227, %r30, %r225; 2026-02-21T08:21:55.4320460Z add.s32 %r221, %r227, 28672; 2026-02-21T08:21:55.4320511Z bar.sync 0; 2026-02-21T08:21:55.4320568Z // begin inline asm 2026-02-21T08:21:55.4320626Z 2026-02-21T08:21:55.4320677Z { 2026-02-21T08:21:55.4320736Z .reg .pred complete; 2026-02-21T08:21:55.4320797Z waitLoop: 2026-02-21T08:21:55.4320911Z mbarrier.try_wait.parity.shared.b64 complete, [%r221], %r400; 2026-02-21T08:21:55.4320972Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.4321020Z } 2026-02-21T08:21:55.4321023Z 2026-02-21T08:21:55.4321083Z // end inline asm 2026-02-21T08:21:55.4321137Z shl.b32 %r228, %r403, 3; 2026-02-21T08:21:55.4321194Z add.s32 %r229, %r30, %r228; 2026-02-21T08:21:55.4321257Z add.s32 %r259, %r229, 28736; 2026-02-21T08:21:55.4321422Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4321482Z @%p82 bra $L__BB0_6; 2026-02-21T08:21:55.4321582Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.4321754Z .loc 1 51 31 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:51:31 2026-02-21T08:21:55.4321814Z shl.b32 %r232, %r401, 11; 2026-02-21T08:21:55.4321869Z add.s32 %r234, %r30, %r232; 2026-02-21T08:21:55.4322034Z .loc 1 52 44 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:52:44 2026-02-21T08:21:55.4322090Z add.s32 %r235, %r234, 14336; 2026-02-21T08:21:55.4322248Z 
.loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4322316Z elect.sync %r236|%p95, -1; 2026-02-21T08:21:55.4322373Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:21:55.4322432Z cvt.u64.u32 %rd64, %r237; 2026-02-21T08:21:55.4322509Z or.b64 %rd61, %rd64, -4611685949699522560; 2026-02-21T08:21:55.4322568Z bfe.u32 %r238, %r235, 4, 14; 2026-02-21T08:21:55.4322623Z cvt.u64.u32 %rd65, %r238; 2026-02-21T08:21:55.4322691Z or.b64 %rd62, %rd65, -4611685949699522560; 2026-02-21T08:21:55.4322753Z mov.b32 %r231, 68157456; 2026-02-21T08:21:55.4322808Z mov.pred %p94, -1; 2026-02-21T08:21:55.4322861Z // begin inline asm 2026-02-21T08:21:55.4323005Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd61, %rd62, %r231, %p94; 2026-02-21T08:21:55.4323058Z // end inline asm 2026-02-21T08:21:55.4323115Z cvt.u64.u32 %rd63, %r259; 2026-02-21T08:21:55.4323168Z // begin inline asm 2026-02-21T08:21:55.4323298Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T08:21:55.4323350Z // end inline asm 2026-02-21T08:21:55.4323403Z bra.uni $L__BB0_6; 2026-02-21T08:21:55.4323502Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:55.4323668Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4323761Z // begin inline asm 2026-02-21T08:21:55.4323816Z 2026-02-21T08:21:55.4323863Z { 2026-02-21T08:21:55.4323920Z .reg .pred complete; 2026-02-21T08:21:55.4323972Z waitLoop: 2026-02-21T08:21:55.4324092Z mbarrier.try_wait.parity.shared.b64 complete, [%r259], %r260; 2026-02-21T08:21:55.4324153Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.4324199Z } 2026-02-21T08:21:55.4324203Z 2026-02-21T08:21:55.4324260Z // end inline asm 2026-02-21T08:21:55.4324414Z .loc 1 47 42 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:47:42 2026-02-21T08:21:55.4324467Z bar.sync 0; 2026-02-21T08:21:55.4324518Z // begin inline asm 2026-02-21T08:21:55.4324605Z @%p110 
mbarrier.inval.shared::cta.b64 [%r91]; 2026-02-21T08:21:55.4324656Z // end inline asm 2026-02-21T08:21:55.4324730Z bar.sync 0; 2026-02-21T08:21:55.4324791Z // begin inline asm 2026-02-21T08:21:55.4324934Z @%p110 mbarrier.inval.shared::cta.b64 [%r92]; 2026-02-21T08:21:55.4324991Z // end inline asm 2026-02-21T08:21:55.4325049Z bar.sync 0; 2026-02-21T08:21:55.4325102Z // begin inline asm 2026-02-21T08:21:55.4325177Z @%p110 mbarrier.inval.shared::cta.b64 [%r93]; 2026-02-21T08:21:55.4325229Z // end inline asm 2026-02-21T08:21:55.4325289Z bar.sync 0; 2026-02-21T08:21:55.4325344Z // begin inline asm 2026-02-21T08:21:55.4325418Z @%p110 mbarrier.inval.shared::cta.b64 [%r94]; 2026-02-21T08:21:55.4325478Z // end inline asm 2026-02-21T08:21:55.4325529Z bar.sync 0; 2026-02-21T08:21:55.4325582Z // begin inline asm 2026-02-21T08:21:55.4325656Z @%p110 mbarrier.inval.shared::cta.b64 [%r95]; 2026-02-21T08:21:55.4325717Z // end inline asm 2026-02-21T08:21:55.4325769Z bar.sync 0; 2026-02-21T08:21:55.4325821Z // begin inline asm 2026-02-21T08:21:55.4325899Z @%p110 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T08:21:55.4325951Z // end inline asm 2026-02-21T08:21:55.4326001Z bar.sync 0; 2026-02-21T08:21:55.4326055Z // begin inline asm 2026-02-21T08:21:55.4326143Z @%p110 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:21:55.4326196Z // end inline asm 2026-02-21T08:21:55.4326252Z add.s32 %r268, %r30, 28736; 2026-02-21T08:21:55.4326313Z // begin inline asm 2026-02-21T08:21:55.4326390Z @%p110 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T08:21:55.4326442Z // end inline asm 2026-02-21T08:21:55.4326499Z bar.sync 0; 2026-02-21T08:21:55.4326552Z // begin inline asm 2026-02-21T08:21:55.4326624Z @%p110 mbarrier.inval.shared::cta.b64 [%r90]; 2026-02-21T08:21:55.4326675Z // end inline asm 2026-02-21T08:21:55.4326839Z .loc 1 56 45 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:56:45 2026-02-21T08:21:55.4326896Z shl.b32 %r341, %r9, 12; 2026-02-21T08:21:55.4326951Z shl.b32 
%r342, %r10, 12; 2026-02-21T08:21:55.4327014Z shl.b32 %r343, %r11, 12; 2026-02-21T08:21:55.4327068Z shl.b32 %r344, %r12, 12; 2026-02-21T08:21:55.4327231Z .loc 1 56 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:56:52 2026-02-21T08:21:55.4327298Z or.b32 %r345, %r7, %r341; 2026-02-21T08:21:55.4327352Z or.b32 %r346, %r7, %r342; 2026-02-21T08:21:55.4327405Z or.b32 %r347, %r7, %r343; 2026-02-21T08:21:55.4327458Z or.b32 %r348, %r7, %r344; 2026-02-21T08:21:55.4327622Z .loc 1 56 24 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:56:24 2026-02-21T08:21:55.4327689Z mad.wide.u32 %rd68, %r345, 2, %rd3; 2026-02-21T08:21:55.4327753Z mad.wide.u32 %rd69, %r346, 2, %rd3; 2026-02-21T08:21:55.4327824Z mad.wide.u32 %rd70, %r347, 2, %rd3; 2026-02-21T08:21:55.4327885Z mad.wide.u32 %rd71, %r348, 2, %rd3; 2026-02-21T08:21:55.4328037Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4328100Z // begin inline asm 2026-02-21T08:21:55.4340331Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285}, [%r303 + 0], 32; 2026-02-21T08:21:55.4340656Z // end inline asm 2026-02-21T08:21:55.4340726Z // begin inline asm 2026-02-21T08:21:55.4341031Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302}, [%r303 + 16], 32; 2026-02-21T08:21:55.4341154Z // end inline asm 2026-02-21T08:21:55.4341214Z // begin inline asm 2026-02-21T08:21:55.4341292Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:55.4341357Z // end inline asm 2026-02-21T08:21:55.4341420Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:21:55.4341476Z cvt.u64.u32 %rd73, %r271; 2026-02-21T08:21:55.4341545Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:21:55.4341606Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:21:55.4341795Z .loc 1 55 27 // 
caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4341861Z mov.b64 {%r349, %r350}, %rd75; 2026-02-21T08:21:55.4342026Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:55.4342202Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4342263Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:21:55.4342331Z cvt.u64.u32 %rd77, %r273; 2026-02-21T08:21:55.4342389Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:21:55.4342449Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:21:55.4342625Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4342687Z mov.b64 {%r352, %r353}, %rd79; 2026-02-21T08:21:55.4342756Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:55.4342922Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4342990Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:21:55.4343047Z cvt.u64.u32 %rd81, %r275; 2026-02-21T08:21:55.4343104Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:21:55.4343178Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:21:55.4343344Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4343410Z mov.b64 {%r355, %r356}, %rd83; 2026-02-21T08:21:55.4343486Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T08:21:55.4343653Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4343711Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:21:55.4343767Z cvt.u64.u32 %rd85, %r277; 2026-02-21T08:21:55.4343831Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:21:55.4343887Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:21:55.4344052Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4344118Z mov.b64 {%r358, %r359}, %rd87; 2026-02-21T08:21:55.4344180Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T08:21:55.4344342Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 
2026-02-21T08:21:55.4344407Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:21:55.4344465Z cvt.u64.u32 %rd89, %r279; 2026-02-21T08:21:55.4344523Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:21:55.4344579Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:21:55.4344789Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4344847Z mov.b64 {%r361, %r362}, %rd91; 2026-02-21T08:21:55.4344911Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T08:21:55.4345078Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4345136Z cvt.u64.u32 %rd92, %r280; 2026-02-21T08:21:55.4345190Z cvt.u64.u32 %rd93, %r281; 2026-02-21T08:21:55.4345258Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:21:55.4345317Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:21:55.4345477Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4345540Z mov.b64 {%r364, %r365}, %rd95; 2026-02-21T08:21:55.4345666Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T08:21:55.4345833Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4345891Z cvt.u64.u32 %rd96, %r282; 2026-02-21T08:21:55.4345961Z cvt.u64.u32 %rd97, %r283; 2026-02-21T08:21:55.4346018Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:21:55.4346078Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:21:55.4346238Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4346307Z mov.b64 {%r367, %r368}, %rd99; 2026-02-21T08:21:55.4346369Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T08:21:55.4346531Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4346602Z cvt.u64.u32 %rd100, %r284; 2026-02-21T08:21:55.4346661Z cvt.u64.u32 %rd101, %r285; 2026-02-21T08:21:55.4346719Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:21:55.4346859Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:21:55.4347026Z .loc 1 55 27 // 
caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4347086Z mov.b64 {%r370, %r371}, %rd103; 2026-02-21T08:21:55.4347149Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T08:21:55.4347318Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4347376Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:21:55.4347434Z cvt.u64.u32 %rd105, %r288; 2026-02-21T08:21:55.4347501Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:21:55.4347559Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:21:55.4347721Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4347790Z mov.b64 {%r373, %r374}, %rd107; 2026-02-21T08:21:55.4347854Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T08:21:55.4348020Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4348091Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:21:55.4348149Z cvt.u64.u32 %rd109, %r290; 2026-02-21T08:21:55.4348206Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:21:55.4348264Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:21:55.4348431Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4348491Z mov.b64 {%r376, %r377}, %rd111; 2026-02-21T08:21:55.4348557Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T08:21:55.4348737Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4348796Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:21:55.4348855Z cvt.u64.u32 %rd113, %r292; 2026-02-21T08:21:55.4348914Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:21:55.4348985Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:21:55.4349145Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4349206Z mov.b64 {%r379, %r380}, %rd115; 2026-02-21T08:21:55.4349276Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T08:21:55.4349434Z .loc 1 53 52 // 
caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4349492Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:21:55.4349558Z cvt.u64.u32 %rd117, %r294; 2026-02-21T08:21:55.4349617Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:21:55.4349678Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:21:55.4349845Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4349913Z mov.b64 {%r382, %r383}, %rd119; 2026-02-21T08:21:55.4349977Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T08:21:55.4350144Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4350213Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:21:55.4350276Z cvt.u64.u32 %rd121, %r296; 2026-02-21T08:21:55.4350383Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:21:55.4350453Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:21:55.4350623Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4350682Z mov.b64 {%r385, %r386}, %rd123; 2026-02-21T08:21:55.4350748Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T08:21:55.4350926Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4350986Z cvt.u64.u32 %rd124, %r297; 2026-02-21T08:21:55.4351047Z cvt.u64.u32 %rd125, %r298; 2026-02-21T08:21:55.4351118Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:21:55.4351177Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:21:55.4351347Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4351416Z mov.b64 {%r388, %r389}, %rd127; 2026-02-21T08:21:55.4351532Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T08:21:55.4351703Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4351776Z cvt.u64.u32 %rd128, %r299; 2026-02-21T08:21:55.4351834Z cvt.u64.u32 %rd129, %r300; 2026-02-21T08:21:55.4351897Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:21:55.4351957Z or.b64 %rd131, %rd128, 
%rd130; 2026-02-21T08:21:55.4352133Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4352196Z mov.b64 {%r391, %r392}, %rd131; 2026-02-21T08:21:55.4352260Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T08:21:55.4352438Z .loc 1 53 52 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:53:52 2026-02-21T08:21:55.4352497Z cvt.u64.u32 %rd132, %r301; 2026-02-21T08:21:55.4352555Z cvt.u64.u32 %rd133, %r302; 2026-02-21T08:21:55.4352617Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:21:55.4352686Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:21:55.4352851Z .loc 1 55 27 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:55:27 2026-02-21T08:21:55.4352916Z mov.b64 {%r394, %r395}, %rd135; 2026-02-21T08:21:55.4352988Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T08:21:55.4353152Z .loc 1 56 82 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:56:82 2026-02-21T08:21:55.4353251Z st.shared.v4.b32 [%r4], {%r351, %r363, %r375, %r387}; 2026-02-21T08:21:55.4353317Z bar.sync 0; 2026-02-21T08:21:55.4353376Z // begin inline asm 2026-02-21T08:21:55.4353537Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r328, %r332, %r336}, [%r308]; 2026-02-21T08:21:55.4353594Z // end inline asm 2026-02-21T08:21:55.4353658Z bar.sync 0; 2026-02-21T08:21:55.4353752Z st.shared.v4.b32 [%r4], {%r354, %r366, %r378, %r390}; 2026-02-21T08:21:55.4353807Z bar.sync 0; 2026-02-21T08:21:55.4353872Z // begin inline asm 2026-02-21T08:21:55.4354025Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r329, %r333, %r337}, [%r308]; 2026-02-21T08:21:55.4354084Z // end inline asm 2026-02-21T08:21:55.4354147Z bar.sync 0; 2026-02-21T08:21:55.4354237Z st.shared.v4.b32 [%r4], {%r357, %r369, %r381, %r393}; 2026-02-21T08:21:55.4354292Z bar.sync 0; 2026-02-21T08:21:55.4354349Z // begin inline asm 2026-02-21T08:21:55.4354508Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r330, %r334, %r338}, [%r308]; 2026-02-21T08:21:55.4354566Z // end inline asm 
2026-02-21T08:21:55.4354620Z bar.sync 0; 2026-02-21T08:21:55.4354743Z st.shared.v4.b32 [%r4], {%r360, %r372, %r384, %r396}; 2026-02-21T08:21:55.4354801Z bar.sync 0; 2026-02-21T08:21:55.4354858Z // begin inline asm 2026-02-21T08:21:55.4355004Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r331, %r335, %r339}, [%r308]; 2026-02-21T08:21:55.4355072Z // end inline asm 2026-02-21T08:21:55.4355131Z // begin inline asm 2026-02-21T08:21:55.4355239Z st.global.v4.b32 [ %rd68 + 0 ], { %r324, %r325, %r326, %r327 }; 2026-02-21T08:21:55.4355309Z // end inline asm 2026-02-21T08:21:55.4355422Z // begin inline asm 2026-02-21T08:21:55.4355524Z st.global.v4.b32 [ %rd69 + 0 ], { %r328, %r329, %r330, %r331 }; 2026-02-21T08:21:55.4355583Z // end inline asm 2026-02-21T08:21:55.4355650Z // begin inline asm 2026-02-21T08:21:55.4355749Z st.global.v4.b32 [ %rd70 + 0 ], { %r332, %r333, %r334, %r335 }; 2026-02-21T08:21:55.4355807Z // end inline asm 2026-02-21T08:21:55.4355874Z // begin inline asm 2026-02-21T08:21:55.4355970Z st.global.v4.b32 [ %rd71 + 0 ], { %r336, %r337, %r338, %r339 }; 2026-02-21T08:21:55.4356028Z // end inline asm 2026-02-21T08:21:55.4356122Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:55.4356299Z .loc 1 28 4 // caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py:28:4 2026-02-21T08:21:55.4356355Z bar.sync 0; 2026-02-21T08:21:55.4356412Z // begin inline asm 2026-02-21T08:21:55.4356547Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r397, 64; 2026-02-21T08:21:55.4356653Z // end inline asm 2026-02-21T08:21:55.4356713Z ret; 2026-02-21T08:21:55.4356782Z $L__tmp1: 2026-02-21T08:21:55.4356841Z $L__func_end0: 2026-02-21T08:21:55.4356931Z // -- End function 2026-02-21T08:21:55.4356985Z } 2026-02-21T08:21:55.4357208Z .file 1 "/tmp/torchinductor_root/ap/caphyqnog3wc263l6r2bpz2rf3wt6lvqce47mpnakemwf4mc2vc2.py" 2026-02-21T08:21:55.4357280Z .section .debug_abbrev 2026-02-21T08:21:55.4357333Z { 2026-02-21T08:21:55.4357428Z .b8 1 // Abbreviation Code 
2026-02-21T08:21:55.4357511Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:55.4357593Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:55.4357678Z .b8 37 // DW_AT_producer 2026-02-21T08:21:55.4357752Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.4357823Z .b8 19 // DW_AT_language 2026-02-21T08:21:55.4357900Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:55.4357984Z .b8 3 // DW_AT_name 2026-02-21T08:21:55.4358056Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.4358132Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:55.4358214Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:55.4358289Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:21:55.4358358Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.4358436Z .b8 0 // EOM(1) 2026-02-21T08:21:55.4358503Z .b8 0 // EOM(2) 2026-02-21T08:21:55.4358568Z .b8 0 // EOM(3) 2026-02-21T08:21:55.4358619Z } 2026-02-21T08:21:55.4358687Z .section .debug_info 2026-02-21T08:21:55.4358738Z { 2026-02-21T08:21:55.4358819Z .b32 104 // Length of Unit 2026-02-21T08:21:55.4358918Z .b8 2 // DWARF version number 2026-02-21T08:21:55.4358971Z .b8 0 2026-02-21T08:21:55.4359085Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:21:55.4359181Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:55.4359277Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:55.4359356Z .b8 116 // DW_AT_producer 2026-02-21T08:21:55.4359408Z .b8 114 2026-02-21T08:21:55.4359471Z .b8 105 2026-02-21T08:21:55.4359526Z .b8 116 2026-02-21T08:21:55.4359575Z .b8 111 2026-02-21T08:21:55.4359634Z .b8 110 2026-02-21T08:21:55.4359686Z .b8 0 2026-02-21T08:21:55.4359758Z .b8 2 // DW_AT_language 2026-02-21T08:21:55.4359808Z .b8 0 2026-02-21T08:21:55.4359893Z .b8 99 // DW_AT_name 2026-02-21T08:21:55.4359944Z .b8 97 2026-02-21T08:21:55.4359995Z .b8 112 2026-02-21T08:21:55.4360055Z .b8 104 2026-02-21T08:21:55.4360147Z .b8 121 2026-02-21T08:21:55.4360199Z .b8 113 2026-02-21T08:21:55.4360248Z .b8 110 2026-02-21T08:21:55.4360309Z .b8 111 2026-02-21T08:21:55.4360360Z .b8 103 2026-02-21T08:21:55.4360411Z .b8 51 2026-02-21T08:21:55.4360470Z .b8 119 2026-02-21T08:21:55.4360521Z .b8 99 2026-02-21T08:21:55.4360569Z .b8 50 2026-02-21T08:21:55.4360618Z .b8 54 2026-02-21T08:21:55.4360677Z .b8 51 2026-02-21T08:21:55.4360727Z .b8 108 2026-02-21T08:21:55.4360777Z .b8 54 2026-02-21T08:21:55.4360825Z .b8 114 2026-02-21T08:21:55.4360882Z .b8 50 2026-02-21T08:21:55.4360933Z .b8 98 2026-02-21T08:21:55.4360981Z .b8 112 2026-02-21T08:21:55.4361037Z .b8 122 2026-02-21T08:21:55.4361087Z .b8 50 2026-02-21T08:21:55.4361136Z .b8 114 2026-02-21T08:21:55.4361186Z .b8 102 2026-02-21T08:21:55.4361244Z .b8 51 2026-02-21T08:21:55.4361293Z .b8 119 2026-02-21T08:21:55.4361343Z .b8 116 2026-02-21T08:21:55.4361401Z .b8 54 2026-02-21T08:21:55.4361449Z .b8 108 2026-02-21T08:21:55.4361499Z .b8 118 2026-02-21T08:21:55.4361586Z .b8 113 2026-02-21T08:21:55.4361648Z .b8 99 2026-02-21T08:21:55.4361700Z .b8 101 2026-02-21T08:21:55.4361748Z .b8 52 2026-02-21T08:21:55.4361805Z .b8 55 2026-02-21T08:21:55.4361855Z .b8 109 2026-02-21T08:21:55.4361905Z .b8 112 2026-02-21T08:21:55.4361954Z .b8 110 2026-02-21T08:21:55.4362012Z .b8 97 
2026-02-21T08:21:55.4362061Z .b8 107 2026-02-21T08:21:55.4362111Z .b8 101 2026-02-21T08:21:55.4362159Z .b8 109 2026-02-21T08:21:55.4362223Z .b8 119 2026-02-21T08:21:55.4362274Z .b8 102 2026-02-21T08:21:55.4362322Z .b8 52 2026-02-21T08:21:55.4362380Z .b8 109 2026-02-21T08:21:55.4362429Z .b8 99 2026-02-21T08:21:55.4362477Z .b8 50 2026-02-21T08:21:55.4362526Z .b8 118 2026-02-21T08:21:55.4362585Z .b8 99 2026-02-21T08:21:55.4362635Z .b8 50 2026-02-21T08:21:55.4362683Z .b8 46 2026-02-21T08:21:55.4362742Z .b8 112 2026-02-21T08:21:55.4362791Z .b8 121 2026-02-21T08:21:55.4362844Z .b8 0 2026-02-21T08:21:55.4362933Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:55.4363021Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:55.4363075Z .b8 116 2026-02-21T08:21:55.4363124Z .b8 109 2026-02-21T08:21:55.4363182Z .b8 112 2026-02-21T08:21:55.4363232Z .b8 47 2026-02-21T08:21:55.4363281Z .b8 116 2026-02-21T08:21:55.4363330Z .b8 111 2026-02-21T08:21:55.4363388Z .b8 114 2026-02-21T08:21:55.4363436Z .b8 99 2026-02-21T08:21:55.4363486Z .b8 104 2026-02-21T08:21:55.4363535Z .b8 105 2026-02-21T08:21:55.4363595Z .b8 110 2026-02-21T08:21:55.4363644Z .b8 100 2026-02-21T08:21:55.4363692Z .b8 117 2026-02-21T08:21:55.4363751Z .b8 99 2026-02-21T08:21:55.4363801Z .b8 116 2026-02-21T08:21:55.4363851Z .b8 111 2026-02-21T08:21:55.4363900Z .b8 114 2026-02-21T08:21:55.4363959Z .b8 95 2026-02-21T08:21:55.4364007Z .b8 114 2026-02-21T08:21:55.4364058Z .b8 111 2026-02-21T08:21:55.4364116Z .b8 111 2026-02-21T08:21:55.4364166Z .b8 116 2026-02-21T08:21:55.4364218Z .b8 47 2026-02-21T08:21:55.4364266Z .b8 97 2026-02-21T08:21:55.4364325Z .b8 112 2026-02-21T08:21:55.4364376Z .b8 0 2026-02-21T08:21:55.4364429Z } 2026-02-21T08:21:55.4364504Z .section .debug_macinfo { } 2026-02-21T08:21:55.4364510Z 2026-02-21T08:21:55.4364585Z ================================================================ 2026-02-21T08:21:55.4364719Z please share the reproducer above with Triton project. 
2026-02-21T08:21:55.6337590Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:21:55.6337865Z 2026-02-21T08:21:55.6337948Z 2026-02-21T08:21:55.6338048Z 2026-02-21T08:21:55.6338169Z ================================================================ 2026-02-21T08:21:55.6338429Z Internal Triton PTX codegen error 2026-02-21T08:21:55.6338699Z `ptxas` stderr: 2026-02-21T08:21:55.6339146Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 279 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.6339639Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.6340910Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:55.6342487Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:55.6342728Z `ptxas` stderr: 2026-02-21T08:21:55.6343135Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 279 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.6343590Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.6343740Z 2026-02-21T08:21:55.6344249Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjd98g37m.ptx -o /tmp/tmpjd98g37m.ptx.o 2026-02-21T08:21:55.6344877Z 2026-02-21T08:21:55.6345015Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:21:55.6345208Z 2026-02-21T08:21:55.6345584Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjd98g37m.ptx -o /tmp/tmpjd98g37m.ptx.o 2026-02-21T08:21:55.6346024Z 2026-02-21T08:21:55.6346027Z 2026-02-21T08:21:55.6346080Z // 2026-02-21T08:21:55.6346230Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:55.6346400Z // 2026-02-21T08:21:55.6346466Z 2026-02-21T08:21:55.6346532Z .version 8.7 2026-02-21T08:21:55.6346665Z .target sm_100a 2026-02-21T08:21:55.6346807Z .address_size 64 2026-02-21T08:21:55.6346891Z 2026-02-21T08:21:55.6347009Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:55.6347260Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:55.6347481Z // @_helion_matmul 2026-02-21T08:21:55.6347687Z .visible .entry _helion_matmul( 2026-02-21T08:21:55.6347894Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:55.6348144Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:55.6348388Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:55.6348621Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:55.6348865Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:55.6349056Z ) 2026-02-21T08:21:55.6349180Z .reqntid 128 2026-02-21T08:21:55.6349305Z .maxnreg 32 2026-02-21T08:21:55.6349430Z { 2026-02-21T08:21:55.6349551Z .reg .pred %p<141>; 2026-02-21T08:21:55.6349701Z .reg .b32 %r<365>; 2026-02-21T08:21:55.6349840Z .reg .b64 %rd<153>; 2026-02-21T08:21:55.6350108Z .loc 1 19 0 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:19:0 2026-02-21T08:21:55.6350406Z $L__func_begin0: 2026-02-21T08:21:55.6350657Z .loc 1 19 0 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:19:0 2026-02-21T08:21:55.6350896Z 2026-02-21T08:21:55.6350950Z // %bb.0: 2026-02-21T08:21:55.6351097Z ld.param.b64 %rd5, 
[_helion_matmul_param_0]; 2026-02-21T08:21:55.6351288Z $L__tmp0: 2026-02-21T08:21:55.6351523Z .loc 1 19 0 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:19 2026-02-21T08:21:55.6351810Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:55.6351984Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:21:55.6352178Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:55.6352360Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T08:21:55.6352549Z mov.b32 %r27, global_smem; 2026-02-21T08:21:55.6352709Z // begin inline asm 2026-02-21T08:21:55.6352932Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r27], 64; 2026-02-21T08:21:55.6353176Z // end inline asm 2026-02-21T08:21:55.6353332Z ld.param.b64 %rd58, [_helion_matmul_param_3]; 2026-02-21T08:21:55.6353661Z bar.sync 0; 2026-02-21T08:21:55.6353809Z ld.shared.b32 %r356, [global_smem]; 2026-02-21T08:21:55.6353978Z bar.sync 0; 2026-02-21T08:21:55.6354111Z // begin inline asm 2026-02-21T08:21:55.6354310Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:55.6354542Z // end inline asm 2026-02-21T08:21:55.6354847Z .loc 1 21 67 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:21:67 2026-02-21T08:21:55.6355148Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:55.6355297Z mov.u32 %r52, %ctaid.y; 2026-02-21T08:21:55.6355450Z mov.u32 %r53, %ctaid.z; 2026-02-21T08:21:55.6355606Z mov.u32 %r54, %nctaid.x; 2026-02-21T08:21:55.6355759Z mov.u32 %r55, %nctaid.y; 2026-02-21T08:21:55.6355924Z mad.lo.s32 %r56, %r53, %r55, %r52; 2026-02-21T08:21:55.6356101Z mad.lo.s32 %r57, %r56, %r54, %r3; 2026-02-21T08:21:55.6356282Z mul.lo.s32 %r58, %r57, 384; 2026-02-21T08:21:55.6356507Z cvt.s64.s32 %rd59, %r58; 2026-02-21T08:21:55.6356676Z add.s64 %rd19, %rd58, %rd59; 2026-02-21T08:21:55.6356835Z shl.b32 %r59, %r1, 2; 2026-02-21T08:21:55.6356993Z add.s32 %r28, %r27, %r59; 2026-02-21T08:21:55.6357143Z mov.b32 %r37, 0; 2026-02-21T08:21:55.6357292Z // begin inline asm 2026-02-21T08:21:55.6357451Z @%p1 
st.shared.b32 [ %r28 + 0 ], %r37; 2026-02-21T08:21:55.6357620Z // end inline asm 2026-02-21T08:21:55.6357764Z bar.warp.sync -1; 2026-02-21T08:21:55.6357918Z setp.eq.b32 %p128, %r1, 0; 2026-02-21T08:21:55.6358091Z cvt.u64.u32 %rd4, %r27; 2026-02-21T08:21:55.6358244Z // begin inline asm 2026-02-21T08:21:55.6358509Z @%p128 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:21:55.6358801Z // end inline asm 2026-02-21T08:21:55.6358938Z // begin inline asm 2026-02-21T08:21:55.6359168Z @%p128 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.6359413Z // end inline asm 2026-02-21T08:21:55.6359553Z mov.b32 %r30, 16; 2026-02-21T08:21:55.6359688Z // begin inline asm 2026-02-21T08:21:55.6359925Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T08:21:55.6360184Z // end inline asm 2026-02-21T08:21:55.6360321Z mov.b32 %r31, 64; 2026-02-21T08:21:55.6360450Z // begin inline asm 2026-02-21T08:21:55.6360679Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r31; 2026-02-21T08:21:55.6360938Z // end inline asm 2026-02-21T08:21:55.6361067Z mov.b32 %r32, 2048; 2026-02-21T08:21:55.6361209Z // begin inline asm 2026-02-21T08:21:55.6361439Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T08:21:55.6361709Z // end inline asm 2026-02-21T08:21:55.6361836Z // begin inline asm 2026-02-21T08:21:55.6362074Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T08:21:55.6362338Z // end inline asm 2026-02-21T08:21:55.6362469Z mov.b64 %rd12, 4096; 2026-02-21T08:21:55.6362612Z // begin inline asm 2026-02-21T08:21:55.6362858Z @%p128 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.6363135Z // end inline asm 2026-02-21T08:21:55.6363263Z mov.b32 %r34, 1; 2026-02-21T08:21:55.6363396Z // begin inline 
asm 2026-02-21T08:21:55.6363640Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r34; 2026-02-21T08:21:55.6363921Z // end inline asm 2026-02-21T08:21:55.6364055Z // begin inline asm 2026-02-21T08:21:55.6364298Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.6364574Z // end inline asm 2026-02-21T08:21:55.6364732Z // begin inline asm 2026-02-21T08:21:55.6364968Z @%p128 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.6365220Z // end inline asm 2026-02-21T08:21:55.6365355Z // begin inline asm 2026-02-21T08:21:55.6365608Z @%p128 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6365957Z // end inline asm 2026-02-21T08:21:55.6366093Z // begin inline asm 2026-02-21T08:21:55.6366323Z @%p128 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.6366589Z // end inline asm 2026-02-21T08:21:55.6366719Z // begin inline asm 2026-02-21T08:21:55.6366956Z @%p128 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6367216Z // end inline asm 2026-02-21T08:21:55.6367356Z // begin inline asm 2026-02-21T08:21:55.6367705Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.6368079Z // end inline asm 2026-02-21T08:21:55.6368226Z // begin inline asm 2026-02-21T08:21:55.6368431Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:21:55.6368684Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.6368928Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.6369110Z // end inline asm 2026-02-21T08:21:55.6369244Z bar.sync 0; 2026-02-21T08:21:55.6369379Z cvta.global.u64 %rd79, %rd19; 2026-02-21T08:21:55.6369664Z .loc 1 22 67 // 
cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:22:67 2026-02-21T08:21:55.6369956Z add.s32 %r60, %r58, 128; 2026-02-21T08:21:55.6370113Z cvt.s64.s32 %rd60, %r60; 2026-02-21T08:21:55.6370262Z add.s64 %rd37, %rd58, %rd60; 2026-02-21T08:21:55.6370422Z bar.sync 0; 2026-02-21T08:21:55.6370545Z // begin inline asm 2026-02-21T08:21:55.6370698Z @%p1 st.shared.b32 [ %r28 + 0 ], %r37; 2026-02-21T08:21:55.6370870Z // end inline asm 2026-02-21T08:21:55.6371006Z bar.warp.sync -1; 2026-02-21T08:21:55.6371148Z // begin inline asm 2026-02-21T08:21:55.6371390Z @%p128 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:21:55.6371672Z // end inline asm 2026-02-21T08:21:55.6371802Z // begin inline asm 2026-02-21T08:21:55.6372028Z @%p128 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.6372280Z // end inline asm 2026-02-21T08:21:55.6372415Z // begin inline asm 2026-02-21T08:21:55.6372646Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T08:21:55.6372901Z // end inline asm 2026-02-21T08:21:55.6373034Z // begin inline asm 2026-02-21T08:21:55.6373260Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r31; 2026-02-21T08:21:55.6373520Z // end inline asm 2026-02-21T08:21:55.6373646Z // begin inline asm 2026-02-21T08:21:55.6373886Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T08:21:55.6374163Z // end inline asm 2026-02-21T08:21:55.6374291Z mov.b32 %r41, 4096; 2026-02-21T08:21:55.6374432Z // begin inline asm 2026-02-21T08:21:55.6374661Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r41; 2026-02-21T08:21:55.6374970Z // end inline asm 2026-02-21T08:21:55.6375101Z // begin inline asm 2026-02-21T08:21:55.6375348Z @%p128 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.6375620Z // end 
inline asm 2026-02-21T08:21:55.6375755Z // begin inline asm 2026-02-21T08:21:55.6376003Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r34; 2026-02-21T08:21:55.6376280Z // end inline asm 2026-02-21T08:21:55.6376413Z // begin inline asm 2026-02-21T08:21:55.6376674Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.6376968Z // end inline asm 2026-02-21T08:21:55.6377104Z // begin inline asm 2026-02-21T08:21:55.6377351Z @%p128 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.6377627Z // end inline asm 2026-02-21T08:21:55.6377762Z // begin inline asm 2026-02-21T08:21:55.6378021Z @%p128 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6378373Z // end inline asm 2026-02-21T08:21:55.6378514Z // begin inline asm 2026-02-21T08:21:55.6378749Z @%p128 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.6379031Z // end inline asm 2026-02-21T08:21:55.6379163Z // begin inline asm 2026-02-21T08:21:55.6379401Z @%p128 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6379677Z // end inline asm 2026-02-21T08:21:55.6379810Z // begin inline asm 2026-02-21T08:21:55.6380165Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.6380547Z // end inline asm 2026-02-21T08:21:55.6380689Z // begin inline asm 2026-02-21T08:21:55.6380899Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:21:55.6381159Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.6381431Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.6381615Z // end inline asm 2026-02-21T08:21:55.6381758Z bar.sync 0; 2026-02-21T08:21:55.6381898Z cvta.global.u64 %rd80, %rd37; 2026-02-21T08:21:55.6382187Z .loc 1 24 71 
// cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:24:71 2026-02-21T08:21:55.6382483Z add.s32 %r61, %r58, 256; 2026-02-21T08:21:55.6382644Z cvt.s64.s32 %rd61, %r61; 2026-02-21T08:21:55.6382803Z add.s64 %rd55, %rd58, %rd61; 2026-02-21T08:21:55.6382964Z bar.sync 0; 2026-02-21T08:21:55.6383101Z // begin inline asm 2026-02-21T08:21:55.6383250Z @%p1 st.shared.b32 [ %r28 + 0 ], %r37; 2026-02-21T08:21:55.6383429Z // end inline asm 2026-02-21T08:21:55.6383565Z bar.warp.sync -1; 2026-02-21T08:21:55.6383714Z // begin inline asm 2026-02-21T08:21:55.6383964Z @%p128 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T08:21:55.6384257Z // end inline asm 2026-02-21T08:21:55.6384396Z // begin inline asm 2026-02-21T08:21:55.6384621Z @%p128 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.6384920Z // end inline asm 2026-02-21T08:21:55.6385048Z // begin inline asm 2026-02-21T08:21:55.6385285Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r31; 2026-02-21T08:21:55.6385545Z // end inline asm 2026-02-21T08:21:55.6385680Z // begin inline asm 2026-02-21T08:21:55.6385905Z @%p128 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r31; 2026-02-21T08:21:55.6386166Z // end inline asm 2026-02-21T08:21:55.6386293Z // begin inline asm 2026-02-21T08:21:55.6386532Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r41; 2026-02-21T08:21:55.6386806Z // end inline asm 2026-02-21T08:21:55.6386934Z // begin inline asm 2026-02-21T08:21:55.6387173Z @%p128 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T08:21:55.6387439Z // end inline asm 2026-02-21T08:21:55.6387576Z mov.b64 %rd48, 8192; 2026-02-21T08:21:55.6387716Z // begin inline asm 2026-02-21T08:21:55.6387970Z @%p128 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd48; 2026-02-21T08:21:55.6388262Z // end 
inline asm 2026-02-21T08:21:55.6388391Z // begin inline asm 2026-02-21T08:21:55.6388643Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r34; 2026-02-21T08:21:55.6388919Z // end inline asm 2026-02-21T08:21:55.6389052Z // begin inline asm 2026-02-21T08:21:55.6389295Z @%p128 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.6389577Z // end inline asm 2026-02-21T08:21:55.6389709Z // begin inline asm 2026-02-21T08:21:55.6389934Z @%p128 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.6390200Z // end inline asm 2026-02-21T08:21:55.6390327Z // begin inline asm 2026-02-21T08:21:55.6390582Z @%p128 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6390911Z // end inline asm 2026-02-21T08:21:55.6391046Z // begin inline asm 2026-02-21T08:21:55.6391269Z @%p128 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T08:21:55.6391534Z // end inline asm 2026-02-21T08:21:55.6391667Z // begin inline asm 2026-02-21T08:21:55.6391886Z @%p128 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.6392145Z // end inline asm 2026-02-21T08:21:55.6392273Z // begin inline asm 2026-02-21T08:21:55.6392612Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.6392973Z // end inline asm 2026-02-21T08:21:55.6393110Z // begin inline asm 2026-02-21T08:21:55.6393315Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T08:21:55.6393551Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.6393786Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.6393960Z // end inline asm 2026-02-21T08:21:55.6394094Z bar.sync 0; 2026-02-21T08:21:55.6394226Z cvta.global.u64 %rd88, %rd55; 2026-02-21T08:21:55.6394505Z .loc 1 30 99 
// cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:30:99 2026-02-21T08:21:55.6394844Z setp.gt.u32 %p57, %r3, 2047; 2026-02-21T08:21:55.6395009Z @%p57 bra $L__BB0_8; 2026-02-21T08:21:55.6395174Z // %bb.1: // %.lr.ph 2026-02-21T08:21:55.6395478Z .loc 1 0 99 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:0:99 2026-02-21T08:21:55.6395768Z shr.u32 %r161, %r1, 5; 2026-02-21T08:21:55.6395913Z shl.b32 %r162, %r1, 7; 2026-02-21T08:21:55.6396068Z and.b32 %r163, %r162, 1920; 2026-02-21T08:21:55.6396220Z shl.b32 %r164, %r1, 6; 2026-02-21T08:21:55.6396372Z and.b32 %r165, %r164, 6144; 2026-02-21T08:21:55.6396520Z shl.b32 %r166, %r1, 4; 2026-02-21T08:21:55.6396674Z and.b32 %r167, %r166, 112; 2026-02-21T08:21:55.6396834Z and.b32 %r169, %r59, 64; 2026-02-21T08:21:55.6396983Z or.b32 %r170, %r165, %r167; 2026-02-21T08:21:55.6397144Z xor.b32 %r171, %r170, %r169; 2026-02-21T08:21:55.6397293Z or.b32 %r172, %r171, %r163; 2026-02-21T08:21:55.6397446Z add.s32 %r173, %r27, 28672; 2026-02-21T08:21:55.6397593Z xor.b32 %r174, %r172, 16; 2026-02-21T08:21:55.6397751Z xor.b32 %r175, %r172, 32; 2026-02-21T08:21:55.6397897Z xor.b32 %r176, %r172, 48; 2026-02-21T08:21:55.6398172Z .loc 1 37 33 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:37:33 2026-02-21T08:21:55.6398464Z shr.u32 %r177, %r3, 5; 2026-02-21T08:21:55.6398609Z and.b32 %r178, %r177, 60; 2026-02-21T08:21:55.6398870Z .loc 1 39 64 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:39:64 2026-02-21T08:21:55.6399160Z and.b32 %r179, %r3, 3; 2026-02-21T08:21:55.6399415Z .loc 1 39 30 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:39:30 2026-02-21T08:21:55.6399691Z or.b32 %r180, %r178, %r179; 2026-02-21T08:21:55.6399949Z .loc 1 41 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:41:27 2026-02-21T08:21:55.6400231Z shl.b32 %r303, %r180, 6; 2026-02-21T08:21:55.6400482Z .loc 1 42 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:42:27 2026-02-21T08:21:55.6400769Z shl.b32 
%r181, %r3, 4; 2026-02-21T08:21:55.6400908Z and.b32 %r304, %r181, 1984; 2026-02-21T08:21:55.6401165Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6401469Z shfl.sync.idx.b32 %r10, %r161, 0, 31, -1; 2026-02-21T08:21:55.6401653Z shl.b32 %r182, %r10, 21; 2026-02-21T08:21:55.6401800Z and.b32 %r183, %r182, 6291456; 2026-02-21T08:21:55.6401960Z add.s32 %r302, %r183, %r356; 2026-02-21T08:21:55.6402116Z mov.pred %p58, -1; 2026-02-21T08:21:55.6402253Z // begin inline asm 2026-02-21T08:21:55.6402612Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r302 + 0], 32, {%r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37}; 2026-02-21T08:21:55.6403036Z // end inline asm 2026-02-21T08:21:55.6403175Z // begin inline asm 2026-02-21T08:21:55.6403503Z @%p58 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r302 + 16], 32, {%r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37, %r37}; 2026-02-21T08:21:55.6403873Z // end inline asm 2026-02-21T08:21:55.6404011Z // begin inline asm 2026-02-21T08:21:55.6404157Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:55.6404320Z // end inline asm 2026-02-21T08:21:55.6404447Z bar.sync 0; 2026-02-21T08:21:55.6404783Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6405065Z add.s32 %r358, %r27, 36928; 2026-02-21T08:21:55.6405218Z // begin inline asm 2026-02-21T08:21:55.6405379Z @%p128 mbarrier.init.shared::cta.b64 [%r358], 1; 2026-02-21T08:21:55.6405620Z // end inline asm 2026-02-21T08:21:55.6405758Z bar.sync 0; 2026-02-21T08:21:55.6405884Z add.s32 %r97, %r27, 36936; 2026-02-21T08:21:55.6406038Z // begin inline asm 2026-02-21T08:21:55.6406194Z @%p128 mbarrier.init.shared::cta.b64 [%r97], 1; 2026-02-21T08:21:55.6406382Z // end inline asm 2026-02-21T08:21:55.6406514Z add.s32 %r98, %r27, 36864; 2026-02-21T08:21:55.6406666Z // begin inline asm 2026-02-21T08:21:55.6406820Z @%p128 
mbarrier.init.shared::cta.b64 [%r98], 1; 2026-02-21T08:21:55.6407006Z // end inline asm 2026-02-21T08:21:55.6407139Z bar.sync 0; 2026-02-21T08:21:55.6407264Z add.s32 %r99, %r27, 36872; 2026-02-21T08:21:55.6407415Z // begin inline asm 2026-02-21T08:21:55.6407571Z @%p128 mbarrier.init.shared::cta.b64 [%r99], 1; 2026-02-21T08:21:55.6407757Z // end inline asm 2026-02-21T08:21:55.6407884Z bar.sync 0; 2026-02-21T08:21:55.6408017Z add.s32 %r100, %r27, 36880; 2026-02-21T08:21:55.6408165Z // begin inline asm 2026-02-21T08:21:55.6408333Z @%p128 mbarrier.init.shared::cta.b64 [%r100], 1; 2026-02-21T08:21:55.6408517Z // end inline asm 2026-02-21T08:21:55.6408650Z bar.sync 0; 2026-02-21T08:21:55.6408782Z add.s32 %r101, %r27, 36888; 2026-02-21T08:21:55.6408925Z // begin inline asm 2026-02-21T08:21:55.6409086Z @%p128 mbarrier.init.shared::cta.b64 [%r101], 1; 2026-02-21T08:21:55.6409265Z // end inline asm 2026-02-21T08:21:55.6409399Z bar.sync 0; 2026-02-21T08:21:55.6409522Z add.s32 %r102, %r27, 36896; 2026-02-21T08:21:55.6409672Z // begin inline asm 2026-02-21T08:21:55.6409825Z @%p128 mbarrier.init.shared::cta.b64 [%r102], 1; 2026-02-21T08:21:55.6410009Z // end inline asm 2026-02-21T08:21:55.6410132Z bar.sync 0; 2026-02-21T08:21:55.6410261Z add.s32 %r103, %r27, 36904; 2026-02-21T08:21:55.6410412Z // begin inline asm 2026-02-21T08:21:55.6410566Z @%p128 mbarrier.init.shared::cta.b64 [%r103], 1; 2026-02-21T08:21:55.6410749Z // end inline asm 2026-02-21T08:21:55.6410871Z bar.sync 0; 2026-02-21T08:21:55.6411003Z add.s32 %r204, %r27, 36912; 2026-02-21T08:21:55.6411150Z // begin inline asm 2026-02-21T08:21:55.6411313Z @%p128 mbarrier.init.shared::cta.b64 [%r204], 1; 2026-02-21T08:21:55.6411489Z // end inline asm 2026-02-21T08:21:55.6411620Z bar.sync 0; 2026-02-21T08:21:55.6411740Z // begin inline asm 2026-02-21T08:21:55.6411925Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r98], 4096; 2026-02-21T08:21:55.6412138Z // end inline asm 2026-02-21T08:21:55.6412375Z .loc 1 51 31 // 
cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6412655Z // begin inline asm 2026-02-21T08:21:55.6412799Z fence.proxy.async.shared::cta; 2026-02-21T08:21:55.6412961Z // end inline asm 2026-02-21T08:21:55.6413084Z bar.sync 0; 2026-02-21T08:21:55.6413221Z elect.sync %r184|%p88, -1; 2026-02-21T08:21:55.6413377Z and.pred %p70, %p1, %p88; 2026-02-21T08:21:55.6413528Z // begin inline asm 2026-02-21T08:21:55.6413846Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r27], [%rd79, {%r37, %r304}], [%r98]; 2026-02-21T08:21:55.6414258Z // end inline asm 2026-02-21T08:21:55.6414510Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6414822Z bar.sync 0; 2026-02-21T08:21:55.6414961Z elect.sync %r185|%p89, -1; 2026-02-21T08:21:55.6415118Z and.pred %p71, %p1, %p89; 2026-02-21T08:21:55.6415277Z add.s32 %r110, %r27, 14336; 2026-02-21T08:21:55.6415432Z // begin inline asm 2026-02-21T08:21:55.6415741Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r110], [%rd80, {%r37, %r303}], [%r98]; 2026-02-21T08:21:55.6416097Z // end inline asm 2026-02-21T08:21:55.6416341Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6416622Z bar.sync 0; 2026-02-21T08:21:55.6416744Z // begin inline asm 2026-02-21T08:21:55.6416935Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r99], 4096; 2026-02-21T08:21:55.6417214Z // end inline asm 2026-02-21T08:21:55.6417460Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6417748Z bar.sync 0; 2026-02-21T08:21:55.6417881Z elect.sync %r186|%p90, -1; 2026-02-21T08:21:55.6418047Z and.pred %p73, %p1, %p90; 2026-02-21T08:21:55.6418195Z add.s32 %r115, %r27, 2048; 2026-02-21T08:21:55.6418346Z // begin inline asm 2026-02-21T08:21:55.6418655Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r115], 
[%rd79, {%r30, %r304}], [%r99]; 2026-02-21T08:21:55.6419007Z // end inline asm 2026-02-21T08:21:55.6419251Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6419527Z bar.sync 0; 2026-02-21T08:21:55.6419665Z elect.sync %r187|%p91, -1; 2026-02-21T08:21:55.6419819Z and.pred %p74, %p1, %p91; 2026-02-21T08:21:55.6419975Z add.s32 %r119, %r27, 16384; 2026-02-21T08:21:55.6420121Z // begin inline asm 2026-02-21T08:21:55.6420438Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r119], [%rd80, {%r30, %r303}], [%r99]; 2026-02-21T08:21:55.6420795Z // end inline asm 2026-02-21T08:21:55.6421035Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6421344Z bar.sync 0; 2026-02-21T08:21:55.6421473Z // begin inline asm 2026-02-21T08:21:55.6421675Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r100], 4096; 2026-02-21T08:21:55.6421896Z // end inline asm 2026-02-21T08:21:55.6422157Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6422451Z bar.sync 0; 2026-02-21T08:21:55.6422596Z elect.sync %r188|%p92, -1; 2026-02-21T08:21:55.6422766Z and.pred %p76, %p1, %p92; 2026-02-21T08:21:55.6422920Z add.s32 %r124, %r27, 4096; 2026-02-21T08:21:55.6423075Z mov.b32 %r125, 32; 2026-02-21T08:21:55.6423211Z // begin inline asm 2026-02-21T08:21:55.6423559Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r124], [%rd79, {%r125, %r304}], [%r100]; 2026-02-21T08:21:55.6423937Z // end inline asm 2026-02-21T08:21:55.6424193Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6424487Z bar.sync 0; 2026-02-21T08:21:55.6424622Z elect.sync %r189|%p93, -1; 2026-02-21T08:21:55.6424820Z and.pred %p77, %p1, %p93; 2026-02-21T08:21:55.6424977Z add.s32 %r128, %r27, 18432; 2026-02-21T08:21:55.6425138Z // begin inline asm 2026-02-21T08:21:55.6425474Z @%p77 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r128], [%rd80, {%r125, %r303}], [%r100]; 2026-02-21T08:21:55.6425846Z // end inline asm 2026-02-21T08:21:55.6426102Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6426402Z bar.sync 0; 2026-02-21T08:21:55.6426542Z // begin inline asm 2026-02-21T08:21:55.6426740Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r101], 4096; 2026-02-21T08:21:55.6427046Z // end inline asm 2026-02-21T08:21:55.6427304Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6427610Z bar.sync 0; 2026-02-21T08:21:55.6427747Z elect.sync %r190|%p94, -1; 2026-02-21T08:21:55.6427920Z and.pred %p79, %p1, %p94; 2026-02-21T08:21:55.6428077Z add.s32 %r133, %r27, 6144; 2026-02-21T08:21:55.6428236Z mov.b32 %r134, 48; 2026-02-21T08:21:55.6428388Z // begin inline asm 2026-02-21T08:21:55.6428728Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd79, {%r134, %r304}], [%r101]; 2026-02-21T08:21:55.6429119Z // end inline asm 2026-02-21T08:21:55.6429373Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6429666Z bar.sync 0; 2026-02-21T08:21:55.6429797Z elect.sync %r191|%p95, -1; 2026-02-21T08:21:55.6430013Z and.pred %p80, %p1, %p95; 2026-02-21T08:21:55.6430176Z add.s32 %r137, %r27, 20480; 2026-02-21T08:21:55.6430322Z // begin inline asm 2026-02-21T08:21:55.6430638Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r137], [%rd80, {%r134, %r303}], [%r101]; 2026-02-21T08:21:55.6430983Z // end inline asm 2026-02-21T08:21:55.6431231Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6431514Z bar.sync 0; 2026-02-21T08:21:55.6431644Z // begin inline asm 2026-02-21T08:21:55.6431825Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r102], 4096; 2026-02-21T08:21:55.6432046Z // 
end inline asm 2026-02-21T08:21:55.6432296Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6432570Z bar.sync 0; 2026-02-21T08:21:55.6432707Z elect.sync %r192|%p96, -1; 2026-02-21T08:21:55.6432861Z and.pred %p82, %p1, %p96; 2026-02-21T08:21:55.6433020Z add.s32 %r142, %r27, 8192; 2026-02-21T08:21:55.6433167Z // begin inline asm 2026-02-21T08:21:55.6433488Z @%p82 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r142], [%rd79, {%r31, %r304}], [%r102]; 2026-02-21T08:21:55.6433840Z // end inline asm 2026-02-21T08:21:55.6434082Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6434371Z bar.sync 0; 2026-02-21T08:21:55.6434499Z elect.sync %r193|%p97, -1; 2026-02-21T08:21:55.6434658Z and.pred %p83, %p1, %p97; 2026-02-21T08:21:55.6434873Z add.s32 %r146, %r27, 22528; 2026-02-21T08:21:55.6435028Z // begin inline asm 2026-02-21T08:21:55.6435340Z @%p83 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r146], [%rd80, {%r31, %r303}], [%r102]; 2026-02-21T08:21:55.6435694Z // end inline asm 2026-02-21T08:21:55.6435945Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6436231Z bar.sync 0; 2026-02-21T08:21:55.6436365Z // begin inline asm 2026-02-21T08:21:55.6436550Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r103], 4096; 2026-02-21T08:21:55.6436771Z // end inline asm 2026-02-21T08:21:55.6437009Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6437294Z bar.sync 0; 2026-02-21T08:21:55.6437431Z elect.sync %r194|%p98, -1; 2026-02-21T08:21:55.6437586Z and.pred %p85, %p1, %p98; 2026-02-21T08:21:55.6437745Z add.s32 %r151, %r27, 10240; 2026-02-21T08:21:55.6437893Z mov.b32 %r152, 80; 2026-02-21T08:21:55.6438032Z // begin inline asm 2026-02-21T08:21:55.6438340Z @%p85 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r151], [%rd79, {%r152, %r304}], [%r103]; 2026-02-21T08:21:55.6438699Z // end inline asm 2026-02-21T08:21:55.6438943Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6439232Z bar.sync 0; 2026-02-21T08:21:55.6439437Z elect.sync %r195|%p99, -1; 2026-02-21T08:21:55.6439593Z and.pred %p86, %p1, %p99; 2026-02-21T08:21:55.6439753Z add.s32 %r155, %r27, 24576; 2026-02-21T08:21:55.6439905Z // begin inline asm 2026-02-21T08:21:55.6440223Z @%p86 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r155], [%rd80, {%r152, %r303}], [%r103]; 2026-02-21T08:21:55.6440568Z // end inline asm 2026-02-21T08:21:55.6440816Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6441106Z bar.sync 0; 2026-02-21T08:21:55.6441237Z // begin inline asm 2026-02-21T08:21:55.6441384Z 2026-02-21T08:21:55.6441501Z { 2026-02-21T08:21:55.6441634Z .reg .pred complete; 2026-02-21T08:21:55.6441775Z waitLoop: 2026-02-21T08:21:55.6441959Z mbarrier.try_wait.parity.shared.b64 complete, [%r98], %r37; 2026-02-21T08:21:55.6442180Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.6442385Z } 2026-02-21T08:21:55.6442451Z 2026-02-21T08:21:55.6442506Z // end inline asm 2026-02-21T08:21:55.6442755Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6443047Z setp.ne.b32 %p100, %r10, 0; 2026-02-21T08:21:55.6443200Z @%p100 bra $L__BB0_3; 2026-02-21T08:21:55.6443346Z // %bb.2: 2026-02-21T08:21:55.6443574Z .loc 1 0 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:0:52 2026-02-21T08:21:55.6443869Z bfe.u32 %r200, %r110, 4, 14; 2026-02-21T08:21:55.6444023Z cvt.u64.u32 %rd77, %r200; 2026-02-21T08:21:55.6444193Z or.b64 %rd75, %rd77, -4611685949699522560; 2026-02-21T08:21:55.6444372Z bfe.u32 %r201, %r27, 4, 14; 2026-02-21T08:21:55.6444528Z cvt.u64.u32 %rd78, %r201; 
2026-02-21T08:21:55.6444724Z or.b64 %rd74, %rd78, -4611685949699522560; 2026-02-21T08:21:55.6445012Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6445306Z elect.sync %r202|%p102, -1; 2026-02-21T08:21:55.6445461Z mov.b32 %r197, 68157456; 2026-02-21T08:21:55.6445615Z mov.pred %p101, 0; 2026-02-21T08:21:55.6445751Z // begin inline asm 2026-02-21T08:21:55.6445981Z @%p102 tcgen05.mma.cta_group::1.kind::f16 [ %r356 + 0 ], %rd74, %rd75, %r197, %p101; 2026-02-21T08:21:55.6446235Z // end inline asm 2026-02-21T08:21:55.6446369Z add.s32 %r203, %r27, 36928; 2026-02-21T08:21:55.6446425Z cvt.u64.u32 %rd76, %r203; 2026-02-21T08:21:55.6446568Z // begin inline asm 2026-02-21T08:21:55.6446690Z @%p102 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd76]; 2026-02-21T08:21:55.6446743Z // end inline asm 2026-02-21T08:21:55.6446802Z $L__BB0_3: 2026-02-21T08:21:55.6446963Z .loc 1 0 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:0:52 2026-02-21T08:21:55.6447021Z add.s32 %r4, %r173, %r172; 2026-02-21T08:21:55.6447078Z add.s32 %r5, %r173, %r174; 2026-02-21T08:21:55.6447141Z add.s32 %r6, %r173, %r175; 2026-02-21T08:21:55.6447199Z add.s32 %r7, %r173, %r176; 2026-02-21T08:21:55.6447362Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6447424Z bar.sync 0; 2026-02-21T08:21:55.6447479Z // begin inline asm 2026-02-21T08:21:55.6447585Z @%p128 mbarrier.arrive.expect_tx.shared.b64 _, [%r204], 4096; 2026-02-21T08:21:55.6447647Z // end inline asm 2026-02-21T08:21:55.6447808Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6447861Z bar.sync 0; 2026-02-21T08:21:55.6447923Z elect.sync %r218|%p108, -1; 2026-02-21T08:21:55.6447992Z and.pred %p105, %p1, %p108; 2026-02-21T08:21:55.6448047Z add.s32 %r205, %r27, 12288; 2026-02-21T08:21:55.6448100Z mov.b32 %r206, 96; 2026-02-21T08:21:55.6448160Z // begin inline asm 
2026-02-21T08:21:55.6448398Z @%p105 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r205], [%rd79, {%r206, %r304}], [%r204]; 2026-02-21T08:21:55.6448452Z // end inline asm 2026-02-21T08:21:55.6448682Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6448734Z bar.sync 0; 2026-02-21T08:21:55.6448794Z elect.sync %r219|%p109, -1; 2026-02-21T08:21:55.6448853Z and.pred %p106, %p1, %p109; 2026-02-21T08:21:55.6448916Z add.s32 %r209, %r27, 26624; 2026-02-21T08:21:55.6448970Z // begin inline asm 2026-02-21T08:21:55.6449210Z @%p106 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r209], [%rd80, {%r206, %r303}], [%r204]; 2026-02-21T08:21:55.6449273Z // end inline asm 2026-02-21T08:21:55.6449329Z mov.b32 %r362, 1; 2026-02-21T08:21:55.6449384Z mov.b32 %r361, 6; 2026-02-21T08:21:55.6449446Z mov.b32 %r357, 0; 2026-02-21T08:21:55.6449503Z mov.b32 %r359, %r357; 2026-02-21T08:21:55.6449558Z mov.b32 %r360, %r357; 2026-02-21T08:21:55.6449611Z mov.b32 %r363, %r357; 2026-02-21T08:21:55.6449672Z mov.b32 %r364, %r357; 2026-02-21T08:21:55.6449785Z bra.uni $L__BB0_4; 2026-02-21T08:21:55.6449889Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.6450065Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6450128Z setp.lt.u32 %p118, %r364, 1936; 2026-02-21T08:21:55.6450293Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6450355Z // begin inline asm 2026-02-21T08:21:55.6450404Z 2026-02-21T08:21:55.6450452Z { 2026-02-21T08:21:55.6450512Z .reg .pred complete; 2026-02-21T08:21:55.6450572Z waitLoop: 2026-02-21T08:21:55.6450687Z mbarrier.try_wait.parity.shared.b64 complete, [%r358], %r357; 2026-02-21T08:21:55.6450750Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.6450805Z } 2026-02-21T08:21:55.6450810Z 2026-02-21T08:21:55.6450862Z // end inline asm 2026-02-21T08:21:55.6450918Z 
add.s32 %r249, %r362, 1; 2026-02-21T08:21:55.6450983Z setp.gt.s32 %p121, %r249, 1; 2026-02-21T08:21:55.6451056Z selp.b32 %r362, 0, %r249, %p121; 2026-02-21T08:21:55.6451116Z selp.b32 %r250, 1, 0, %p121; 2026-02-21T08:21:55.6451171Z xor.b32 %r363, %r259, %r250; 2026-02-21T08:21:55.6451347Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6451402Z add.s32 %r251, %r361, 1; 2026-02-21T08:21:55.6451460Z setp.gt.s32 %p122, %r251, 6; 2026-02-21T08:21:55.6451519Z selp.b32 %r361, 0, %r251, %p122; 2026-02-21T08:21:55.6451584Z shl.b32 %r252, %r361, 3; 2026-02-21T08:21:55.6451640Z add.s32 %r254, %r27, %r252; 2026-02-21T08:21:55.6451695Z add.s32 %r244, %r254, 36864; 2026-02-21T08:21:55.6451759Z bar.sync 0; 2026-02-21T08:21:55.6451822Z and.pred %p115, %p128, %p118; 2026-02-21T08:21:55.6451876Z // begin inline asm 2026-02-21T08:21:55.6451989Z @%p115 mbarrier.arrive.expect_tx.shared.b64 _, [%r244], 4096; 2026-02-21T08:21:55.6452042Z // end inline asm 2026-02-21T08:21:55.6452208Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6452268Z shl.b32 %r255, %r361, 11; 2026-02-21T08:21:55.6452331Z add.s32 %r241, %r27, %r255; 2026-02-21T08:21:55.6452383Z bar.sync 0; 2026-02-21T08:21:55.6452442Z elect.sync %r256|%p123, -1; 2026-02-21T08:21:55.6452510Z and.pred %p124, %p118, %p123; 2026-02-21T08:21:55.6452568Z and.pred %p116, %p1, %p124; 2026-02-21T08:21:55.6452624Z add.s32 %r242, %r364, 112; 2026-02-21T08:21:55.6452677Z // begin inline asm 2026-02-21T08:21:55.6452917Z @%p116 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r241], [%rd79, {%r242, %r304}], [%r244]; 2026-02-21T08:21:55.6452970Z // end inline asm 2026-02-21T08:21:55.6453135Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6453196Z add.s32 %r245, %r241, 14336; 2026-02-21T08:21:55.6453247Z bar.sync 0; 2026-02-21T08:21:55.6453307Z elect.sync %r257|%p125, 
-1; 2026-02-21T08:21:55.6453417Z and.pred %p126, %p118, %p125; 2026-02-21T08:21:55.6453476Z and.pred %p117, %p1, %p126; 2026-02-21T08:21:55.6453529Z // begin inline asm 2026-02-21T08:21:55.6453761Z @%p117 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r245], [%rd80, {%r242, %r303}], [%r244]; 2026-02-21T08:21:55.6453820Z // end inline asm 2026-02-21T08:21:55.6453986Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6454046Z setp.lt.u32 %p127, %r364, 2016; 2026-02-21T08:21:55.6454108Z add.s32 %r364, %r364, 16; 2026-02-21T08:21:55.6454163Z mov.b32 %r357, %r259; 2026-02-21T08:21:55.6454216Z mov.b32 %r358, %r258; 2026-02-21T08:21:55.6454278Z @%p127 bra $L__BB0_4; 2026-02-21T08:21:55.6454331Z bra.uni $L__BB0_7; 2026-02-21T08:21:55.6454430Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:55.6454637Z .loc 1 0 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:0:42 2026-02-21T08:21:55.6454724Z mov.b32 %r259, %r363; 2026-02-21T08:21:55.6454892Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6454948Z add.s32 %r222, %r360, 1; 2026-02-21T08:21:55.6455017Z setp.gt.s32 %p111, %r222, 6; 2026-02-21T08:21:55.6455078Z selp.b32 %r360, 0, %r222, %p111; 2026-02-21T08:21:55.6455135Z selp.b32 %r223, 1, 0, %p111; 2026-02-21T08:21:55.6455194Z xor.b32 %r359, %r359, %r223; 2026-02-21T08:21:55.6455248Z shl.b32 %r224, %r360, 3; 2026-02-21T08:21:55.6455303Z add.s32 %r226, %r27, %r224; 2026-02-21T08:21:55.6455356Z add.s32 %r220, %r226, 36864; 2026-02-21T08:21:55.6455416Z bar.sync 0; 2026-02-21T08:21:55.6455469Z // begin inline asm 2026-02-21T08:21:55.6455517Z 2026-02-21T08:21:55.6455571Z { 2026-02-21T08:21:55.6455629Z .reg .pred complete; 2026-02-21T08:21:55.6455681Z waitLoop: 2026-02-21T08:21:55.6455798Z mbarrier.try_wait.parity.shared.b64 complete, [%r220], %r359; 2026-02-21T08:21:55.6455869Z @!complete bra.uni waitLoop; 
2026-02-21T08:21:55.6455917Z } 2026-02-21T08:21:55.6455920Z 2026-02-21T08:21:55.6455973Z // end inline asm 2026-02-21T08:21:55.6456034Z shl.b32 %r227, %r362, 3; 2026-02-21T08:21:55.6456090Z add.s32 %r228, %r27, %r227; 2026-02-21T08:21:55.6456146Z add.s32 %r258, %r228, 36928; 2026-02-21T08:21:55.6456321Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6456378Z @%p100 bra $L__BB0_6; 2026-02-21T08:21:55.6456474Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.6456643Z .loc 1 51 31 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:51:31 2026-02-21T08:21:55.6456707Z shl.b32 %r231, %r360, 11; 2026-02-21T08:21:55.6456763Z add.s32 %r233, %r27, %r231; 2026-02-21T08:21:55.6456934Z .loc 1 52 44 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:52:44 2026-02-21T08:21:55.6457003Z add.s32 %r234, %r233, 14336; 2026-02-21T08:21:55.6457169Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6457229Z elect.sync %r235|%p113, -1; 2026-02-21T08:21:55.6457297Z bfe.u32 %r236, %r233, 4, 14; 2026-02-21T08:21:55.6457355Z cvt.u64.u32 %rd84, %r236; 2026-02-21T08:21:55.6457427Z or.b64 %rd81, %rd84, -4611685949699522560; 2026-02-21T08:21:55.6457485Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:21:55.6457549Z cvt.u64.u32 %rd85, %r237; 2026-02-21T08:21:55.6457617Z or.b64 %rd82, %rd85, -4611685949699522560; 2026-02-21T08:21:55.6457671Z mov.b32 %r230, 68157456; 2026-02-21T08:21:55.6457737Z mov.pred %p112, -1; 2026-02-21T08:21:55.6457792Z // begin inline asm 2026-02-21T08:21:55.6457932Z @%p113 tcgen05.mma.cta_group::1.kind::f16 [ %r356 + 0 ], %rd81, %rd82, %r230, %p112; 2026-02-21T08:21:55.6457991Z // end inline asm 2026-02-21T08:21:55.6458048Z cvt.u64.u32 %rd83, %r258; 2026-02-21T08:21:55.6458216Z // begin inline asm 2026-02-21T08:21:55.6458342Z @%p113 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd83]; 2026-02-21T08:21:55.6458400Z // end inline asm 
2026-02-21T08:21:55.6458455Z bra.uni $L__BB0_6; 2026-02-21T08:21:55.6458546Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:55.6458721Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6458774Z // begin inline asm 2026-02-21T08:21:55.6458822Z 2026-02-21T08:21:55.6458870Z { 2026-02-21T08:21:55.6458934Z .reg .pred complete; 2026-02-21T08:21:55.6458987Z waitLoop: 2026-02-21T08:21:55.6459097Z mbarrier.try_wait.parity.shared.b64 complete, [%r258], %r259; 2026-02-21T08:21:55.6459165Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.6459212Z } 2026-02-21T08:21:55.6459215Z 2026-02-21T08:21:55.6459266Z // end inline asm 2026-02-21T08:21:55.6459489Z .loc 1 47 42 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:47:42 2026-02-21T08:21:55.6459549Z bar.sync 0; 2026-02-21T08:21:55.6459604Z // begin inline asm 2026-02-21T08:21:55.6459685Z @%p128 mbarrier.inval.shared::cta.b64 [%r98]; 2026-02-21T08:21:55.6459746Z // end inline asm 2026-02-21T08:21:55.6459799Z bar.sync 0; 2026-02-21T08:21:55.6459853Z // begin inline asm 2026-02-21T08:21:55.6459940Z @%p128 mbarrier.inval.shared::cta.b64 [%r99]; 2026-02-21T08:21:55.6459992Z // end inline asm 2026-02-21T08:21:55.6460042Z bar.sync 0; 2026-02-21T08:21:55.6460094Z // begin inline asm 2026-02-21T08:21:55.6460183Z @%p128 mbarrier.inval.shared::cta.b64 [%r100]; 2026-02-21T08:21:55.6460236Z // end inline asm 2026-02-21T08:21:55.6460288Z bar.sync 0; 2026-02-21T08:21:55.6460349Z // begin inline asm 2026-02-21T08:21:55.6460428Z @%p128 mbarrier.inval.shared::cta.b64 [%r101]; 2026-02-21T08:21:55.6460480Z // end inline asm 2026-02-21T08:21:55.6460531Z bar.sync 0; 2026-02-21T08:21:55.6460592Z // begin inline asm 2026-02-21T08:21:55.6460667Z @%p128 mbarrier.inval.shared::cta.b64 [%r102]; 2026-02-21T08:21:55.6460721Z // end inline asm 2026-02-21T08:21:55.6460778Z bar.sync 0; 2026-02-21T08:21:55.6460831Z // begin inline asm 2026-02-21T08:21:55.6460905Z @%p128 
mbarrier.inval.shared::cta.b64 [%r103]; 2026-02-21T08:21:55.6460963Z // end inline asm 2026-02-21T08:21:55.6461014Z bar.sync 0; 2026-02-21T08:21:55.6461066Z // begin inline asm 2026-02-21T08:21:55.6461141Z @%p128 mbarrier.inval.shared::cta.b64 [%r204]; 2026-02-21T08:21:55.6461200Z // end inline asm 2026-02-21T08:21:55.6461255Z add.s32 %r267, %r27, 36928; 2026-02-21T08:21:55.6461307Z // begin inline asm 2026-02-21T08:21:55.6461387Z @%p128 mbarrier.inval.shared::cta.b64 [%r267]; 2026-02-21T08:21:55.6461438Z // end inline asm 2026-02-21T08:21:55.6461489Z bar.sync 0; 2026-02-21T08:21:55.6461541Z // begin inline asm 2026-02-21T08:21:55.6461621Z @%p128 mbarrier.inval.shared::cta.b64 [%r97]; 2026-02-21T08:21:55.6461673Z // end inline asm 2026-02-21T08:21:55.6461838Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6461900Z // begin inline asm 2026-02-21T08:21:55.6462190Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284}, [%r302 + 0], 32; 2026-02-21T08:21:55.6462244Z // end inline asm 2026-02-21T08:21:55.6462303Z // begin inline asm 2026-02-21T08:21:55.6462593Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301}, [%r302 + 16], 32; 2026-02-21T08:21:55.6462646Z // end inline asm 2026-02-21T08:21:55.6462705Z // begin inline asm 2026-02-21T08:21:55.6462774Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:55.6462824Z // end inline asm 2026-02-21T08:21:55.6462881Z cvt.u64.u32 %rd89, %r269; 2026-02-21T08:21:55.6462946Z cvt.u64.u32 %rd90, %r270; 2026-02-21T08:21:55.6463003Z shl.b64 %rd91, %rd90, 32; 2026-02-21T08:21:55.6463125Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T08:21:55.6463303Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6463362Z mov.b64 {%r307, %r308}, %rd92; 
2026-02-21T08:21:55.6463429Z cvt.rn.f16x2.f32 %r309, %r308, %r307; 2026-02-21T08:21:55.6463599Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6463661Z cvt.u64.u32 %rd93, %r271; 2026-02-21T08:21:55.6463714Z cvt.u64.u32 %rd94, %r272; 2026-02-21T08:21:55.6463770Z shl.b64 %rd95, %rd94, 32; 2026-02-21T08:21:55.6463834Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T08:21:55.6464001Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6464059Z mov.b64 {%r310, %r311}, %rd96; 2026-02-21T08:21:55.6464131Z cvt.rn.f16x2.f32 %r312, %r311, %r310; 2026-02-21T08:21:55.6464367Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6464432Z cvt.u64.u32 %rd97, %r273; 2026-02-21T08:21:55.6464491Z cvt.u64.u32 %rd98, %r274; 2026-02-21T08:21:55.6464558Z shl.b64 %rd99, %rd98, 32; 2026-02-21T08:21:55.6464620Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T08:21:55.6465381Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6465455Z mov.b64 {%r313, %r314}, %rd100; 2026-02-21T08:21:55.6465524Z cvt.rn.f16x2.f32 %r315, %r314, %r313; 2026-02-21T08:21:55.6465702Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6465775Z cvt.u64.u32 %rd101, %r275; 2026-02-21T08:21:55.6465838Z cvt.u64.u32 %rd102, %r276; 2026-02-21T08:21:55.6465906Z shl.b64 %rd103, %rd102, 32; 2026-02-21T08:21:55.6465973Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:21:55.6466167Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6466237Z mov.b64 {%r316, %r317}, %rd104; 2026-02-21T08:21:55.6466306Z cvt.rn.f16x2.f32 %r318, %r317, %r316; 2026-02-21T08:21:55.6466496Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6466560Z cvt.u64.u32 %rd105, %r277; 2026-02-21T08:21:55.6466623Z cvt.u64.u32 %rd106, %r278; 
2026-02-21T08:21:55.6466700Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:21:55.6466765Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:21:55.6466944Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6467008Z mov.b64 {%r319, %r320}, %rd108; 2026-02-21T08:21:55.6467086Z cvt.rn.f16x2.f32 %r321, %r320, %r319; 2026-02-21T08:21:55.6467260Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6467319Z cvt.u64.u32 %rd109, %r279; 2026-02-21T08:21:55.6467387Z cvt.u64.u32 %rd110, %r280; 2026-02-21T08:21:55.6467448Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:21:55.6467507Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:21:55.6467686Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6467746Z mov.b64 {%r322, %r323}, %rd112; 2026-02-21T08:21:55.6467809Z cvt.rn.f16x2.f32 %r324, %r323, %r322; 2026-02-21T08:21:55.6467982Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6468048Z cvt.u64.u32 %rd113, %r281; 2026-02-21T08:21:55.6468107Z cvt.u64.u32 %rd114, %r282; 2026-02-21T08:21:55.6468164Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:21:55.6468230Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:21:55.6468400Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6468459Z mov.b64 {%r325, %r326}, %rd116; 2026-02-21T08:21:55.6468531Z cvt.rn.f16x2.f32 %r327, %r326, %r325; 2026-02-21T08:21:55.6468762Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6468822Z cvt.u64.u32 %rd117, %r283; 2026-02-21T08:21:55.6468881Z cvt.u64.u32 %rd118, %r284; 2026-02-21T08:21:55.6468950Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:21:55.6469010Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:21:55.6469186Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6469253Z mov.b64 {%r328, %r329}, %rd120; 
2026-02-21T08:21:55.6469317Z cvt.rn.f16x2.f32 %r330, %r329, %r328; 2026-02-21T08:21:55.6469490Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6469556Z cvt.u64.u32 %rd121, %r286; 2026-02-21T08:21:55.6469614Z cvt.u64.u32 %rd122, %r287; 2026-02-21T08:21:55.6469673Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:21:55.6469794Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:21:55.6469976Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6470037Z mov.b64 {%r331, %r332}, %rd124; 2026-02-21T08:21:55.6470102Z cvt.rn.f16x2.f32 %r333, %r332, %r331; 2026-02-21T08:21:55.6470279Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6470339Z cvt.u64.u32 %rd125, %r288; 2026-02-21T08:21:55.6470398Z cvt.u64.u32 %rd126, %r289; 2026-02-21T08:21:55.6470466Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:21:55.6470524Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:21:55.6470691Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6470749Z mov.b64 {%r334, %r335}, %rd128; 2026-02-21T08:21:55.6470819Z cvt.rn.f16x2.f32 %r336, %r335, %r334; 2026-02-21T08:21:55.6470988Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6471049Z cvt.u64.u32 %rd129, %r290; 2026-02-21T08:21:55.6471114Z cvt.u64.u32 %rd130, %r291; 2026-02-21T08:21:55.6471171Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:21:55.6471230Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:21:55.6471407Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6471466Z mov.b64 {%r337, %r338}, %rd132; 2026-02-21T08:21:55.6471528Z cvt.rn.f16x2.f32 %r339, %r338, %r337; 2026-02-21T08:21:55.6471693Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6471758Z cvt.u64.u32 %rd133, %r292; 2026-02-21T08:21:55.6471816Z cvt.u64.u32 %rd134, %r293; 
2026-02-21T08:21:55.6471875Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:21:55.6471940Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:21:55.6472112Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6472170Z mov.b64 {%r340, %r341}, %rd136; 2026-02-21T08:21:55.6472242Z cvt.rn.f16x2.f32 %r342, %r341, %r340; 2026-02-21T08:21:55.6472410Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6472467Z cvt.u64.u32 %rd137, %r294; 2026-02-21T08:21:55.6472524Z cvt.u64.u32 %rd138, %r295; 2026-02-21T08:21:55.6472589Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:21:55.6472647Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:21:55.6472818Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6472882Z mov.b64 {%r343, %r344}, %rd140; 2026-02-21T08:21:55.6472941Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T08:21:55.6473107Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6473168Z cvt.u64.u32 %rd141, %r296; 2026-02-21T08:21:55.6473224Z cvt.u64.u32 %rd142, %r297; 2026-02-21T08:21:55.6473280Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:21:55.6473379Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:21:55.6473558Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6473615Z mov.b64 {%r346, %r347}, %rd144; 2026-02-21T08:21:55.6473674Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T08:21:55.6473849Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6473904Z cvt.u64.u32 %rd145, %r298; 2026-02-21T08:21:55.6473958Z cvt.u64.u32 %rd146, %r299; 2026-02-21T08:21:55.6474023Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:21:55.6474078Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:21:55.6474245Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6474300Z mov.b64 {%r349, %r350}, %rd148; 
2026-02-21T08:21:55.6474370Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:55.6474593Z .loc 1 53 52 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:53:52 2026-02-21T08:21:55.6474655Z cvt.u64.u32 %rd149, %r300; 2026-02-21T08:21:55.6474761Z cvt.u64.u32 %rd150, %r301; 2026-02-21T08:21:55.6474820Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:21:55.6474877Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:21:55.6475054Z .loc 1 55 27 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:55:27 2026-02-21T08:21:55.6475111Z mov.b64 {%r352, %r353}, %rd152; 2026-02-21T08:21:55.6475171Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:55.6475338Z .loc 1 56 45 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:56:45 2026-02-21T08:21:55.6475417Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:21:55.6475469Z bar.sync 0; 2026-02-21T08:21:55.6475561Z st.shared.v4.b32 [%r4], {%r309, %r312, %r315, %r318}; 2026-02-21T08:21:55.6475658Z st.shared.v4.b32 [%r5], {%r321, %r324, %r327, %r330}; 2026-02-21T08:21:55.6475742Z st.shared.v4.b32 [%r6], {%r333, %r336, %r339, %r342}; 2026-02-21T08:21:55.6475827Z st.shared.v4.b32 [%r7], {%r345, %r348, %r351, %r354}; 2026-02-21T08:21:55.6475890Z // begin inline asm 2026-02-21T08:21:55.6475963Z fence.proxy.async.shared::cta; 2026-02-21T08:21:55.6476016Z // end inline asm 2026-02-21T08:21:55.6476067Z bar.sync 0; 2026-02-21T08:21:55.6476141Z elect.sync %r355|%p139, -1; 2026-02-21T08:21:55.6476200Z and.pred %p137, %p1, %p139; 2026-02-21T08:21:55.6476255Z // begin inline asm 2026-02-21T08:21:55.6476442Z @%p137 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd88, {%r303, %r304}], [%r173]; 2026-02-21T08:21:55.6476497Z // end inline asm 2026-02-21T08:21:55.6476561Z cp.async.bulk.commit_group; 2026-02-21T08:21:55.6476637Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:55.6476812Z .loc 1 30 99 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:30:99 2026-02-21T08:21:55.6476883Z cp.async.bulk.wait_group.read 0; 
2026-02-21T08:21:55.6476938Z bar.sync 0; 2026-02-21T08:21:55.6477115Z .loc 1 30 4 // cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py:30:4 2026-02-21T08:21:55.6477167Z bar.sync 0; 2026-02-21T08:21:55.6477221Z // begin inline asm 2026-02-21T08:21:55.6477339Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r356, 64; 2026-02-21T08:21:55.6477392Z // end inline asm 2026-02-21T08:21:55.6477443Z ret; 2026-02-21T08:21:55.6477496Z $L__tmp1: 2026-02-21T08:21:55.6477557Z $L__func_end0: 2026-02-21T08:21:55.6477637Z // -- End function 2026-02-21T08:21:55.6477685Z } 2026-02-21T08:21:55.6477901Z .file 1 "/tmp/torchinductor_root/zn/cznbjqbmwhxjkdvswq5vptbw4dqutoe6lg57dhxmfpya5mu4lnjr.py" 2026-02-21T08:21:55.6477961Z .section .debug_abbrev 2026-02-21T08:21:55.6478009Z { 2026-02-21T08:21:55.6478091Z .b8 1 // Abbreviation Code 2026-02-21T08:21:55.6478183Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:55.6478311Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:55.6478387Z .b8 37 // DW_AT_producer 2026-02-21T08:21:55.6478467Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.6478538Z .b8 19 // DW_AT_language 2026-02-21T08:21:55.6478612Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:55.6478689Z .b8 3 // DW_AT_name 2026-02-21T08:21:55.6478759Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.6478834Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:55.6478913Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:55.6478984Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:21:55.6479053Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.6479172Z .b8 0 // EOM(1) 2026-02-21T08:21:55.6479249Z .b8 0 // EOM(2) 2026-02-21T08:21:55.6479313Z .b8 0 // EOM(3) 2026-02-21T08:21:55.6479362Z } 2026-02-21T08:21:55.6479424Z .section .debug_info 2026-02-21T08:21:55.6479472Z { 2026-02-21T08:21:55.6479550Z .b32 104 // Length of Unit 2026-02-21T08:21:55.6479638Z .b8 2 // DWARF version number 2026-02-21T08:21:55.6479688Z .b8 0 2026-02-21T08:21:55.6479799Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:21:55.6479885Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:55.6479989Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:55.6480065Z .b8 116 // DW_AT_producer 2026-02-21T08:21:55.6480116Z .b8 114 2026-02-21T08:21:55.6480172Z .b8 105 2026-02-21T08:21:55.6480224Z .b8 116 2026-02-21T08:21:55.6480275Z .b8 111 2026-02-21T08:21:55.6480322Z .b8 110 2026-02-21T08:21:55.6480378Z .b8 0 2026-02-21T08:21:55.6480448Z .b8 2 // DW_AT_language 2026-02-21T08:21:55.6480496Z .b8 0 2026-02-21T08:21:55.6480574Z .b8 99 // DW_AT_name 2026-02-21T08:21:55.6480623Z .b8 122 2026-02-21T08:21:55.6480672Z .b8 110 2026-02-21T08:21:55.6480720Z .b8 98 2026-02-21T08:21:55.6480778Z .b8 106 2026-02-21T08:21:55.6480826Z .b8 113 2026-02-21T08:21:55.6480874Z .b8 98 2026-02-21T08:21:55.6480929Z .b8 109 2026-02-21T08:21:55.6480976Z .b8 119 2026-02-21T08:21:55.6481024Z .b8 104 2026-02-21T08:21:55.6481072Z .b8 120 2026-02-21T08:21:55.6481128Z .b8 106 2026-02-21T08:21:55.6481176Z .b8 107 2026-02-21T08:21:55.6481224Z .b8 100 2026-02-21T08:21:55.6481273Z .b8 118 2026-02-21T08:21:55.6481328Z .b8 115 2026-02-21T08:21:55.6481377Z .b8 119 2026-02-21T08:21:55.6481424Z .b8 113 2026-02-21T08:21:55.6481481Z .b8 53 2026-02-21T08:21:55.6481531Z .b8 118 2026-02-21T08:21:55.6481581Z .b8 112 2026-02-21T08:21:55.6481631Z .b8 116 2026-02-21T08:21:55.6481690Z .b8 98 2026-02-21T08:21:55.6481740Z .b8 119 2026-02-21T08:21:55.6481790Z .b8 52 2026-02-21T08:21:55.6481848Z .b8 100 2026-02-21T08:21:55.6481897Z .b8 113 2026-02-21T08:21:55.6481945Z .b8 117 2026-02-21T08:21:55.6481997Z .b8 116 2026-02-21T08:21:55.6482054Z .b8 111 2026-02-21T08:21:55.6482103Z .b8 101 2026-02-21T08:21:55.6482151Z .b8 54 2026-02-21T08:21:55.6482206Z .b8 108 2026-02-21T08:21:55.6482254Z .b8 103 2026-02-21T08:21:55.6482302Z .b8 53 2026-02-21T08:21:55.6482350Z .b8 55 2026-02-21T08:21:55.6482405Z .b8 100 2026-02-21T08:21:55.6482453Z .b8 104 2026-02-21T08:21:55.6482501Z .b8 120 2026-02-21T08:21:55.6482550Z .b8 109 
2026-02-21T08:21:55.6482606Z .b8 102 2026-02-21T08:21:55.6482653Z .b8 112 2026-02-21T08:21:55.6482701Z .b8 121 2026-02-21T08:21:55.6482756Z .b8 97 2026-02-21T08:21:55.6482805Z .b8 53 2026-02-21T08:21:55.6482853Z .b8 109 2026-02-21T08:21:55.6482902Z .b8 117 2026-02-21T08:21:55.6482961Z .b8 52 2026-02-21T08:21:55.6483054Z .b8 108 2026-02-21T08:21:55.6483104Z .b8 110 2026-02-21T08:21:55.6483159Z .b8 106 2026-02-21T08:21:55.6483207Z .b8 114 2026-02-21T08:21:55.6483254Z .b8 46 2026-02-21T08:21:55.6483301Z .b8 112 2026-02-21T08:21:55.6483357Z .b8 121 2026-02-21T08:21:55.6483405Z .b8 0 2026-02-21T08:21:55.6483493Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:55.6483571Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:55.6483619Z .b8 116 2026-02-21T08:21:55.6483667Z .b8 109 2026-02-21T08:21:55.6483714Z .b8 112 2026-02-21T08:21:55.6483769Z .b8 47 2026-02-21T08:21:55.6483816Z .b8 116 2026-02-21T08:21:55.6483863Z .b8 111 2026-02-21T08:21:55.6483918Z .b8 114 2026-02-21T08:21:55.6483965Z .b8 99 2026-02-21T08:21:55.6484014Z .b8 104 2026-02-21T08:21:55.6484062Z .b8 105 2026-02-21T08:21:55.6484117Z .b8 110 2026-02-21T08:21:55.6484165Z .b8 100 2026-02-21T08:21:55.6484212Z .b8 117 2026-02-21T08:21:55.6484260Z .b8 99 2026-02-21T08:21:55.6484357Z .b8 116 2026-02-21T08:21:55.6484406Z .b8 111 2026-02-21T08:21:55.6484458Z .b8 114 2026-02-21T08:21:55.6484512Z .b8 95 2026-02-21T08:21:55.6484561Z .b8 114 2026-02-21T08:21:55.6484609Z .b8 111 2026-02-21T08:21:55.6484655Z .b8 111 2026-02-21T08:21:55.6484747Z .b8 116 2026-02-21T08:21:55.6484796Z .b8 47 2026-02-21T08:21:55.6484844Z .b8 122 2026-02-21T08:21:55.6484899Z .b8 110 2026-02-21T08:21:55.6484948Z .b8 0 2026-02-21T08:21:55.6484996Z } 2026-02-21T08:21:55.6485057Z .section .debug_macinfo { } 2026-02-21T08:21:55.6485062Z 2026-02-21T08:21:55.6485146Z ================================================================ 2026-02-21T08:21:55.6485245Z please share the reproducer above with Triton project. 
2026-02-21T08:21:55.8768500Z 2026-02-21T08:21:55.8768553Z 2026-02-21T08:21:55.8768559Z 2026-02-21T08:21:55.8768854Z ================================================================ 2026-02-21T08:21:55.8768947Z Internal Triton PTX codegen error 2026-02-21T08:21:55.8769051Z `ptxas` stderr: 2026-02-21T08:21:55.8769472Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.8769584Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.8769589Z 2026-02-21T08:21:55.8769823Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:21:55.8770923Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:55.8771064Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:55.8771131Z `ptxas` stderr: 2026-02-21T08:21:55.8771551Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_r0c69zu.ptx -o /tmp/tmp_r0c69zu.ptx.o 2026-02-21T08:21:55.8772075Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:21:55.8772079Z 2026-02-21T08:21:55.8772083Z 2026-02-21T08:21:55.8772139Z // 2026-02-21T08:21:55.8772221Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:55.8772275Z // 2026-02-21T08:21:55.8772279Z 2026-02-21T08:21:55.8772337Z .version 8.7 2026-02-21T08:21:55.8772397Z .target sm_100a 2026-02-21T08:21:55.8772466Z .address_size 64 2026-02-21T08:21:55.8772470Z 2026-02-21T08:21:55.8772606Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:55.8772690Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:55.8772788Z // @_helion_matmul 2026-02-21T08:21:55.8773123Z .visible .entry _helion_matmul( 2026-02-21T08:21:55.8773238Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:55.8773354Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:55.8773453Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:55.8773551Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:55.8773654Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:55.8773722Z ) 2026-02-21T08:21:55.8773780Z .reqntid 128 2026-02-21T08:21:55.8773839Z .maxnreg 32 2026-02-21T08:21:55.8773903Z { 2026-02-21T08:21:55.8773972Z .reg .pred %p<120>; 2026-02-21T08:21:55.8774033Z .reg .b32 %r<406>; 2026-02-21T08:21:55.8774095Z .reg .b64 %rd<136>; 2026-02-21T08:21:55.8774299Z .loc 1 19 0 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:19:0 2026-02-21T08:21:55.8774455Z $L__func_begin0: 2026-02-21T08:21:55.8774649Z .loc 1 19 0 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:19:0 2026-02-21T08:21:55.8774653Z 2026-02-21T08:21:55.8774908Z // %bb.0: 2026-02-21T08:21:55.8774998Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:21:55.8775055Z $L__tmp0: 2026-02-21T08:21:55.8775238Z .loc 1 19 0 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:19 2026-02-21T08:21:55.8775298Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:55.8775388Z 
ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:21:55.8775456Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:55.8775531Z mov.b32 %r30, global_smem; 2026-02-21T08:21:55.8775590Z // begin inline asm 2026-02-21T08:21:55.8775745Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r30], 64; 2026-02-21T08:21:55.8775812Z // end inline asm 2026-02-21T08:21:55.8775899Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T08:21:55.8775959Z bar.sync 0; 2026-02-21T08:21:55.8776042Z ld.shared.b32 %r397, [global_smem]; 2026-02-21T08:21:55.8776102Z bar.sync 0; 2026-02-21T08:21:55.8776160Z // begin inline asm 2026-02-21T08:21:55.8776295Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:55.8776356Z // end inline asm 2026-02-21T08:21:55.8776531Z .loc 1 21 67 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:21:67 2026-02-21T08:21:55.8776591Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:55.8776658Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:21:55.8776715Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:21:55.8776773Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:21:55.8776829Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:21:55.8776901Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T08:21:55.8776964Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:21:55.8777021Z shl.b32 %r53, %r52, 8; 2026-02-21T08:21:55.8777088Z cvt.s64.s32 %rd41, %r53; 2026-02-21T08:21:55.8777149Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T08:21:55.8777206Z shl.b32 %r54, %r1, 2; 2026-02-21T08:21:55.8777265Z add.s32 %r31, %r30, %r54; 2026-02-21T08:21:55.8777324Z mov.b32 %r40, 0; 2026-02-21T08:21:55.8777379Z // begin inline asm 2026-02-21T08:21:55.8777446Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.8777504Z // end inline asm 2026-02-21T08:21:55.8777564Z bar.warp.sync -1; 2026-02-21T08:21:55.8777625Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T08:21:55.8777689Z cvt.u64.u32 %rd4, %r30; 2026-02-21T08:21:55.8777742Z // begin inline asm 2026-02-21T08:21:55.8777913Z @%p110 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:21:55.8777965Z // end inline asm 2026-02-21T08:21:55.8778026Z // begin inline asm 2026-02-21T08:21:55.8778166Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.8778218Z // end inline asm 2026-02-21T08:21:55.8778280Z mov.b32 %r33, 16; 2026-02-21T08:21:55.8778332Z // begin inline asm 2026-02-21T08:21:55.8778483Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.8778614Z // end inline asm 2026-02-21T08:21:55.8778667Z mov.b32 %r34, 64; 2026-02-21T08:21:55.8778719Z // begin inline asm 2026-02-21T08:21:55.8778865Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.8778927Z // end inline asm 2026-02-21T08:21:55.8778983Z mov.b32 %r35, 2048; 2026-02-21T08:21:55.8779038Z // begin inline asm 2026-02-21T08:21:55.8779203Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.8779256Z // end inline asm 2026-02-21T08:21:55.8779310Z // begin inline asm 2026-02-21T08:21:55.8779467Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:21:55.8779521Z // end inline asm 2026-02-21T08:21:55.8779575Z mov.b64 %rd12, 4096; 2026-02-21T08:21:55.8779629Z // begin inline asm 2026-02-21T08:21:55.8779992Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.8780050Z // end inline asm 2026-02-21T08:21:55.8780102Z mov.b32 %r37, 1; 2026-02-21T08:21:55.8780165Z // begin inline asm 2026-02-21T08:21:55.8780334Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.8780387Z // end inline asm 2026-02-21T08:21:55.8780449Z // begin inline asm 2026-02-21T08:21:55.8780617Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 
+ 0 ], 0x1, %r37; 2026-02-21T08:21:55.8780670Z // end inline asm 2026-02-21T08:21:55.8780725Z // begin inline asm 2026-02-21T08:21:55.8780888Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.8780941Z // end inline asm 2026-02-21T08:21:55.8780996Z // begin inline asm 2026-02-21T08:21:55.8781173Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.8781229Z // end inline asm 2026-02-21T08:21:55.8781285Z // begin inline asm 2026-02-21T08:21:55.8781447Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.8781501Z // end inline asm 2026-02-21T08:21:55.8781556Z // begin inline asm 2026-02-21T08:21:55.8781702Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.8781764Z // end inline asm 2026-02-21T08:21:55.8781828Z // begin inline asm 2026-02-21T08:21:55.8782088Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.8782150Z // end inline asm 2026-02-21T08:21:55.8782202Z // begin inline asm 2026-02-21T08:21:55.8782328Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:21:55.8782407Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.8782478Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.8782531Z // end inline asm 2026-02-21T08:21:55.8782591Z bar.sync 0; 2026-02-21T08:21:55.8782658Z cvta.global.u64 %rd59, %rd19; 2026-02-21T08:21:55.8782832Z .loc 1 22 67 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:22:67 2026-02-21T08:21:55.8782891Z add.s64 %rd37, %rd19, 128; 2026-02-21T08:21:55.8782951Z bar.sync 0; 2026-02-21T08:21:55.8783004Z // begin inline asm 2026-02-21T08:21:55.8783068Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.8783128Z // end inline asm 2026-02-21T08:21:55.8783185Z bar.warp.sync -1; 
2026-02-21T08:21:55.8783239Z // begin inline asm 2026-02-21T08:21:55.8783396Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:21:55.8783457Z // end inline asm 2026-02-21T08:21:55.8783510Z // begin inline asm 2026-02-21T08:21:55.8783648Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.8783709Z // end inline asm 2026-02-21T08:21:55.8783763Z // begin inline asm 2026-02-21T08:21:55.8783911Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.8784016Z // end inline asm 2026-02-21T08:21:55.8784070Z // begin inline asm 2026-02-21T08:21:55.8784214Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.8784268Z // end inline asm 2026-02-21T08:21:55.8784328Z // begin inline asm 2026-02-21T08:21:55.8784480Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.8784533Z // end inline asm 2026-02-21T08:21:55.8784594Z mov.b32 %r44, 4096; 2026-02-21T08:21:55.8784648Z // begin inline asm 2026-02-21T08:21:55.8784833Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r44; 2026-02-21T08:21:55.8784892Z // end inline asm 2026-02-21T08:21:55.8784945Z // begin inline asm 2026-02-21T08:21:55.8785157Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.8785219Z // end inline asm 2026-02-21T08:21:55.8785276Z // begin inline asm 2026-02-21T08:21:55.8785440Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.8785493Z // end inline asm 2026-02-21T08:21:55.8785554Z // begin inline asm 2026-02-21T08:21:55.8785719Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:55.8785772Z // end inline asm 2026-02-21T08:21:55.8785835Z // begin 
inline asm 2026-02-21T08:21:55.8785983Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.8786036Z // end inline asm 2026-02-21T08:21:55.8786097Z // begin inline asm 2026-02-21T08:21:55.8786263Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.8786317Z // end inline asm 2026-02-21T08:21:55.8786370Z // begin inline asm 2026-02-21T08:21:55.8786532Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.8786589Z // end inline asm 2026-02-21T08:21:55.8786643Z // begin inline asm 2026-02-21T08:21:55.8786789Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.8786842Z // end inline asm 2026-02-21T08:21:55.8786894Z // begin inline asm 2026-02-21T08:21:55.8787154Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.8787207Z // end inline asm 2026-02-21T08:21:55.8787259Z // begin inline asm 2026-02-21T08:21:55.8787388Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:21:55.8787457Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.8787527Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.8787578Z // end inline asm 2026-02-21T08:21:55.8787635Z bar.sync 0; 2026-02-21T08:21:55.8787698Z cvta.global.u64 %rd60, %rd37; 2026-02-21T08:21:55.8787879Z .loc 1 28 131 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:28:131 2026-02-21T08:21:55.8787950Z setp.gt.u32 %p39, %r3, 2047; 2026-02-21T08:21:55.8788009Z @%p39 bra $L__BB0_8; 2026-02-21T08:21:55.8788083Z // %bb.1: // %.lr.ph 2026-02-21T08:21:55.8788253Z .loc 1 40 45 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:40:45 2026-02-21T08:21:55.8788316Z shl.b32 %r154, %r1, 3; 2026-02-21T08:21:55.8788373Z and.b32 %r155, %r154, 56; 2026-02-21T08:21:55.8788432Z bfe.u32 %r156, 
%r1, 3, 4; 2026-02-21T08:21:55.8788494Z shr.u32 %r157, %r1, 5; 2026-02-21T08:21:55.8788548Z shl.b32 %r158, %r1, 4; 2026-02-21T08:21:55.8788606Z and.b32 %r159, %r158, 176; 2026-02-21T08:21:55.8788667Z and.b32 %r160, %r1, 96; 2026-02-21T08:21:55.8788724Z shl.b32 %r161, %r160, 3; 2026-02-21T08:21:55.8788780Z bfe.s32 %r162, %r1, 2, 1; 2026-02-21T08:21:55.8788837Z and.b32 %r163, %r162, 1088; 2026-02-21T08:21:55.8788902Z and.b32 %r165, %r54, 64; 2026-02-21T08:21:55.8789014Z xor.b32 %r166, %r163, %r165; 2026-02-21T08:21:55.8789072Z add.s32 %r167, %r30, %r159; 2026-02-21T08:21:55.8789138Z add.s32 %r168, %r167, %r161; 2026-02-21T08:21:55.8789193Z shl.b32 %r169, %r1, 5; 2026-02-21T08:21:55.8789248Z and.b32 %r170, %r169, 1792; 2026-02-21T08:21:55.8789303Z and.b32 %r171, %r154, 48; 2026-02-21T08:21:55.8789365Z shl.b32 %r172, %r160, 1; 2026-02-21T08:21:55.8789420Z shl.b32 %r173, %r1, 6; 2026-02-21T08:21:55.8789476Z and.b32 %r174, %r173, 64; 2026-02-21T08:21:55.8789540Z xor.b32 %r175, %r172, %r174; 2026-02-21T08:21:55.8789596Z add.s32 %r176, %r30, %r170; 2026-02-21T08:21:55.8789653Z add.s32 %r177, %r176, %r171; 2026-02-21T08:21:55.8789820Z .loc 1 35 33 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:35:33 2026-02-21T08:21:55.8789884Z shr.u32 %r178, %r3, 5; 2026-02-21T08:21:55.8789941Z and.b32 %r179, %r178, 60; 2026-02-21T08:21:55.8790166Z .loc 1 37 64 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:37:64 2026-02-21T08:21:55.8790235Z and.b32 %r180, %r3, 3; 2026-02-21T08:21:55.8790400Z .loc 1 37 30 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:37:30 2026-02-21T08:21:55.8790455Z or.b32 %r181, %r179, %r180; 2026-02-21T08:21:55.8790630Z .loc 1 39 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:39:27 2026-02-21T08:21:55.8790688Z shl.b32 %r212, %r181, 6; 2026-02-21T08:21:55.8790850Z .loc 1 41 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:41:27 2026-02-21T08:21:55.8790913Z shl.b32 %r182, %r3, 4; 
2026-02-21T08:21:55.8790966Z and.b32 %r208, %r182, 1984; 2026-02-21T08:21:55.8791128Z .loc 1 42 32 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:42:32 2026-02-21T08:21:55.8791183Z or.b32 %r9, %r208, %r156; 2026-02-21T08:21:55.8791355Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8791433Z shfl.sync.idx.b32 %r13, %r157, 0, 31, -1; 2026-02-21T08:21:55.8791489Z shl.b32 %r183, %r13, 21; 2026-02-21T08:21:55.8791558Z and.b32 %r184, %r183, 6291456; 2026-02-21T08:21:55.8791613Z add.s32 %r303, %r184, %r397; 2026-02-21T08:21:55.8791672Z mov.pred %p40, -1; 2026-02-21T08:21:55.8791727Z // begin inline asm 2026-02-21T08:21:55.8792007Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 0], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.8792061Z // end inline asm 2026-02-21T08:21:55.8792114Z // begin inline asm 2026-02-21T08:21:55.8792378Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 16], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.8792431Z // end inline asm 2026-02-21T08:21:55.8792484Z // begin inline asm 2026-02-21T08:21:55.8792561Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:55.8792616Z // end inline asm 2026-02-21T08:21:55.8792669Z bar.sync 0; 2026-02-21T08:21:55.8792845Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8792902Z add.s32 %r399, %r30, 28736; 2026-02-21T08:21:55.8792956Z // begin inline asm 2026-02-21T08:21:55.8793043Z @%p110 mbarrier.init.shared::cta.b64 [%r399], 1; 2026-02-21T08:21:55.8793103Z // end inline asm 2026-02-21T08:21:55.8793155Z bar.sync 0; 2026-02-21T08:21:55.8793212Z add.s32 %r90, %r30, 28744; 2026-02-21T08:21:55.8793273Z // begin inline asm 2026-02-21T08:21:55.8793356Z @%p110 mbarrier.init.shared::cta.b64 [%r90], 1; 2026-02-21T08:21:55.8793410Z // end inline asm 
2026-02-21T08:21:55.8793467Z add.s32 %r91, %r30, 28672; 2026-02-21T08:21:55.8793529Z // begin inline asm 2026-02-21T08:21:55.8793607Z @%p110 mbarrier.init.shared::cta.b64 [%r91], 1; 2026-02-21T08:21:55.8793660Z // end inline asm 2026-02-21T08:21:55.8793721Z bar.sync 0; 2026-02-21T08:21:55.8793777Z add.s32 %r92, %r30, 28680; 2026-02-21T08:21:55.8793879Z // begin inline asm 2026-02-21T08:21:55.8793956Z @%p110 mbarrier.init.shared::cta.b64 [%r92], 1; 2026-02-21T08:21:55.8794015Z // end inline asm 2026-02-21T08:21:55.8794066Z bar.sync 0; 2026-02-21T08:21:55.8794122Z add.s32 %r93, %r30, 28688; 2026-02-21T08:21:55.8794183Z // begin inline asm 2026-02-21T08:21:55.8794257Z @%p110 mbarrier.init.shared::cta.b64 [%r93], 1; 2026-02-21T08:21:55.8794309Z // end inline asm 2026-02-21T08:21:55.8794365Z bar.sync 0; 2026-02-21T08:21:55.8794420Z add.s32 %r94, %r30, 28696; 2026-02-21T08:21:55.8794472Z // begin inline asm 2026-02-21T08:21:55.8794547Z @%p110 mbarrier.init.shared::cta.b64 [%r94], 1; 2026-02-21T08:21:55.8794607Z // end inline asm 2026-02-21T08:21:55.8794658Z bar.sync 0; 2026-02-21T08:21:55.8794750Z add.s32 %r95, %r30, 28704; 2026-02-21T08:21:55.8794811Z // begin inline asm 2026-02-21T08:21:55.8794888Z @%p110 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T08:21:55.8794992Z // end inline asm 2026-02-21T08:21:55.8795047Z bar.sync 0; 2026-02-21T08:21:55.8795111Z add.s32 %r96, %r30, 28712; 2026-02-21T08:21:55.8795164Z // begin inline asm 2026-02-21T08:21:55.8795240Z @%p110 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T08:21:55.8795300Z // end inline asm 2026-02-21T08:21:55.8795351Z bar.sync 0; 2026-02-21T08:21:55.8795409Z add.s32 %r205, %r30, 28720; 2026-02-21T08:21:55.8795463Z // begin inline asm 2026-02-21T08:21:55.8795549Z @%p110 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:21:55.8795603Z // end inline asm 2026-02-21T08:21:55.8795654Z bar.sync 0; 2026-02-21T08:21:55.8795717Z // begin inline asm 2026-02-21T08:21:55.8795828Z @%p110 
mbarrier.arrive.expect_tx.shared.b64 _, [%r91], 4096; 2026-02-21T08:21:55.8795882Z // end inline asm 2026-02-21T08:21:55.8796056Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8796111Z // begin inline asm 2026-02-21T08:21:55.8796183Z fence.proxy.async.shared::cta; 2026-02-21T08:21:55.8796238Z // end inline asm 2026-02-21T08:21:55.8796298Z bar.sync 0; 2026-02-21T08:21:55.8796361Z elect.sync %r185|%p70, -1; 2026-02-21T08:21:55.8796423Z and.pred %p52, %p1, %p70; 2026-02-21T08:21:55.8796485Z // begin inline asm 2026-02-21T08:21:55.8796724Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r30], [%rd59, {%r40, %r208}], [%r91]; 2026-02-21T08:21:55.8796778Z // end inline asm 2026-02-21T08:21:55.8796948Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8797002Z bar.sync 0; 2026-02-21T08:21:55.8797062Z elect.sync %r186|%p71, -1; 2026-02-21T08:21:55.8797123Z and.pred %p53, %p1, %p71; 2026-02-21T08:21:55.8797190Z add.s32 %r103, %r30, 14336; 2026-02-21T08:21:55.8797244Z // begin inline asm 2026-02-21T08:21:55.8797488Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r103], [%rd60, {%r40, %r212}], [%r91]; 2026-02-21T08:21:55.8797554Z // end inline asm 2026-02-21T08:21:55.8797718Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8797770Z bar.sync 0; 2026-02-21T08:21:55.8797836Z // begin inline asm 2026-02-21T08:21:55.8797944Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r92], 4096; 2026-02-21T08:21:55.8797998Z // end inline asm 2026-02-21T08:21:55.8798157Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8798219Z bar.sync 0; 2026-02-21T08:21:55.8798280Z elect.sync %r187|%p72, -1; 2026-02-21T08:21:55.8798339Z and.pred %p55, %p1, %p72; 2026-02-21T08:21:55.8798404Z add.s32 %r108, %r30, 2048; 
2026-02-21T08:21:55.8798457Z // begin inline asm 2026-02-21T08:21:55.8798687Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd59, {%r33, %r208}], [%r92]; 2026-02-21T08:21:55.8798749Z // end inline asm 2026-02-21T08:21:55.8798974Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8799027Z bar.sync 0; 2026-02-21T08:21:55.8799085Z elect.sync %r188|%p73, -1; 2026-02-21T08:21:55.8799153Z and.pred %p56, %p1, %p73; 2026-02-21T08:21:55.8799211Z add.s32 %r112, %r30, 16384; 2026-02-21T08:21:55.8799265Z // begin inline asm 2026-02-21T08:21:55.8799496Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd60, {%r33, %r212}], [%r92]; 2026-02-21T08:21:55.8799550Z // end inline asm 2026-02-21T08:21:55.8799720Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8799778Z bar.sync 0; 2026-02-21T08:21:55.8799833Z // begin inline asm 2026-02-21T08:21:55.8799939Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r93], 4096; 2026-02-21T08:21:55.8799994Z // end inline asm 2026-02-21T08:21:55.8800213Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8800268Z bar.sync 0; 2026-02-21T08:21:55.8800328Z elect.sync %r189|%p74, -1; 2026-02-21T08:21:55.8800395Z and.pred %p58, %p1, %p74; 2026-02-21T08:21:55.8800452Z add.s32 %r117, %r30, 4096; 2026-02-21T08:21:55.8800505Z mov.b32 %r118, 32; 2026-02-21T08:21:55.8800558Z // begin inline asm 2026-02-21T08:21:55.8800807Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r208}], [%r93]; 2026-02-21T08:21:55.8800860Z // end inline asm 2026-02-21T08:21:55.8801022Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8801083Z bar.sync 0; 2026-02-21T08:21:55.8801142Z elect.sync %r190|%p75, -1; 2026-02-21T08:21:55.8801203Z and.pred 
%p59, %p1, %p75; 2026-02-21T08:21:55.8801267Z add.s32 %r121, %r30, 18432; 2026-02-21T08:21:55.8801322Z // begin inline asm 2026-02-21T08:21:55.8801549Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r212}], [%r93]; 2026-02-21T08:21:55.8801612Z // end inline asm 2026-02-21T08:21:55.8801774Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8801827Z bar.sync 0; 2026-02-21T08:21:55.8801879Z // begin inline asm 2026-02-21T08:21:55.8801990Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r94], 4096; 2026-02-21T08:21:55.8802043Z // end inline asm 2026-02-21T08:21:55.8802202Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8802263Z bar.sync 0; 2026-02-21T08:21:55.8802323Z elect.sync %r191|%p76, -1; 2026-02-21T08:21:55.8802382Z and.pred %p61, %p1, %p76; 2026-02-21T08:21:55.8802437Z add.s32 %r126, %r30, 6144; 2026-02-21T08:21:55.8802496Z mov.b32 %r127, 48; 2026-02-21T08:21:55.8802550Z // begin inline asm 2026-02-21T08:21:55.8802787Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r126], [%rd59, {%r127, %r208}], [%r94]; 2026-02-21T08:21:55.8802849Z // end inline asm 2026-02-21T08:21:55.8803008Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8803060Z bar.sync 0; 2026-02-21T08:21:55.8803126Z elect.sync %r192|%p77, -1; 2026-02-21T08:21:55.8803184Z and.pred %p62, %p1, %p77; 2026-02-21T08:21:55.8803240Z add.s32 %r130, %r30, 20480; 2026-02-21T08:21:55.8803294Z // begin inline asm 2026-02-21T08:21:55.8803531Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r212}], [%r94]; 2026-02-21T08:21:55.8803585Z // end inline asm 2026-02-21T08:21:55.8803747Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8803805Z bar.sync 0; 
2026-02-21T08:21:55.8803858Z // begin inline asm 2026-02-21T08:21:55.8803962Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r95], 4096; 2026-02-21T08:21:55.8804066Z // end inline asm 2026-02-21T08:21:55.8804233Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8804283Z bar.sync 0; 2026-02-21T08:21:55.8804351Z elect.sync %r193|%p78, -1; 2026-02-21T08:21:55.8804409Z and.pred %p64, %p1, %p78; 2026-02-21T08:21:55.8804464Z add.s32 %r135, %r30, 8192; 2026-02-21T08:21:55.8804519Z // begin inline asm 2026-02-21T08:21:55.8804802Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r34, %r208}], [%r95]; 2026-02-21T08:21:55.8804857Z // end inline asm 2026-02-21T08:21:55.8805021Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8805081Z bar.sync 0; 2026-02-21T08:21:55.8805139Z elect.sync %r194|%p79, -1; 2026-02-21T08:21:55.8805197Z and.pred %p65, %p1, %p79; 2026-02-21T08:21:55.8805309Z add.s32 %r139, %r30, 22528; 2026-02-21T08:21:55.8805368Z // begin inline asm 2026-02-21T08:21:55.8805596Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r34, %r212}], [%r95]; 2026-02-21T08:21:55.8805649Z // end inline asm 2026-02-21T08:21:55.8805824Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8805876Z bar.sync 0; 2026-02-21T08:21:55.8805929Z // begin inline asm 2026-02-21T08:21:55.8806043Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 4096; 2026-02-21T08:21:55.8806098Z // end inline asm 2026-02-21T08:21:55.8806263Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8806322Z bar.sync 0; 2026-02-21T08:21:55.8806382Z elect.sync %r195|%p80, -1; 2026-02-21T08:21:55.8806442Z and.pred %p67, %p1, %p80; 2026-02-21T08:21:55.8806498Z add.s32 %r144, %r30, 10240; 2026-02-21T08:21:55.8806562Z mov.b32 
%r145, 80; 2026-02-21T08:21:55.8806618Z // begin inline asm 2026-02-21T08:21:55.8806848Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r208}], [%r96]; 2026-02-21T08:21:55.8806912Z // end inline asm 2026-02-21T08:21:55.8807078Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8807132Z bar.sync 0; 2026-02-21T08:21:55.8807199Z elect.sync %r196|%p81, -1; 2026-02-21T08:21:55.8807260Z and.pred %p68, %p1, %p81; 2026-02-21T08:21:55.8807319Z add.s32 %r148, %r30, 24576; 2026-02-21T08:21:55.8807373Z // begin inline asm 2026-02-21T08:21:55.8807616Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r212}], [%r96]; 2026-02-21T08:21:55.8807670Z // end inline asm 2026-02-21T08:21:55.8807836Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8807898Z bar.sync 0; 2026-02-21T08:21:55.8807956Z // begin inline asm 2026-02-21T08:21:55.8808004Z 2026-02-21T08:21:55.8808060Z { 2026-02-21T08:21:55.8808120Z .reg .pred complete; 2026-02-21T08:21:55.8808174Z waitLoop: 2026-02-21T08:21:55.8808288Z mbarrier.try_wait.parity.shared.b64 complete, [%r91], %r40; 2026-02-21T08:21:55.8808358Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.8808405Z } 2026-02-21T08:21:55.8808409Z 2026-02-21T08:21:55.8808463Z // end inline asm 2026-02-21T08:21:55.8808634Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8808694Z setp.ne.b32 %p82, %r13, 0; 2026-02-21T08:21:55.8808749Z @%p82 bra $L__BB0_3; 2026-02-21T08:21:55.8808801Z // %bb.2: 2026-02-21T08:21:55.8808976Z .loc 1 0 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:0:52 2026-02-21T08:21:55.8809038Z bfe.u32 %r201, %r103, 4, 14; 2026-02-21T08:21:55.8809100Z cvt.u64.u32 %rd57, %r201; 2026-02-21T08:21:55.8809181Z or.b64 %rd55, %rd57, -4611685949699522560; 2026-02-21T08:21:55.8809299Z 
bfe.u32 %r202, %r30, 4, 14; 2026-02-21T08:21:55.8809357Z cvt.u64.u32 %rd58, %r202; 2026-02-21T08:21:55.8809434Z or.b64 %rd54, %rd58, -4611685949699522560; 2026-02-21T08:21:55.8809602Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8809663Z elect.sync %r203|%p84, -1; 2026-02-21T08:21:55.8809718Z mov.b32 %r198, 68157456; 2026-02-21T08:21:55.8809781Z mov.pred %p83, 0; 2026-02-21T08:21:55.8809835Z // begin inline asm 2026-02-21T08:21:55.8809974Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd54, %rd55, %r198, %p83; 2026-02-21T08:21:55.8810035Z // end inline asm 2026-02-21T08:21:55.8810091Z add.s32 %r204, %r30, 28736; 2026-02-21T08:21:55.8810149Z cvt.u64.u32 %rd56, %r204; 2026-02-21T08:21:55.8810206Z // begin inline asm 2026-02-21T08:21:55.8810384Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T08:21:55.8810447Z // end inline asm 2026-02-21T08:21:55.8810502Z $L__BB0_3: 2026-02-21T08:21:55.8810685Z .loc 1 0 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:0:52 2026-02-21T08:21:55.8810771Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T08:21:55.8810831Z add.s32 %r4, %r168, %r166; 2026-02-21T08:21:55.8810899Z add.s32 %r308, %r177, %r175; 2026-02-21T08:21:55.8810957Z or.b32 %r7, %r212, %r155; 2026-02-21T08:21:55.8811013Z or.b32 %r10, %r9, 16; 2026-02-21T08:21:55.8811070Z or.b32 %r11, %r9, 32; 2026-02-21T08:21:55.8811135Z or.b32 %r12, %r9, 48; 2026-02-21T08:21:55.8811306Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8811360Z bar.sync 0; 2026-02-21T08:21:55.8811424Z // begin inline asm 2026-02-21T08:21:55.8811537Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r205], 4096; 2026-02-21T08:21:55.8811594Z // end inline asm 2026-02-21T08:21:55.8811774Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8811831Z bar.sync 0; 2026-02-21T08:21:55.8811895Z elect.sync 
%r219|%p90, -1; 2026-02-21T08:21:55.8811955Z and.pred %p87, %p1, %p90; 2026-02-21T08:21:55.8812021Z add.s32 %r206, %r30, 12288; 2026-02-21T08:21:55.8812076Z mov.b32 %r207, 96; 2026-02-21T08:21:55.8812131Z // begin inline asm 2026-02-21T08:21:55.8812383Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r206], [%rd59, {%r207, %r208}], [%r205]; 2026-02-21T08:21:55.8812440Z // end inline asm 2026-02-21T08:21:55.8812611Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8812670Z bar.sync 0; 2026-02-21T08:21:55.8812733Z elect.sync %r220|%p91, -1; 2026-02-21T08:21:55.8812793Z and.pred %p88, %p1, %p91; 2026-02-21T08:21:55.8812850Z add.s32 %r210, %r30, 26624; 2026-02-21T08:21:55.8812915Z // begin inline asm 2026-02-21T08:21:55.8813165Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r210], [%rd60, {%r207, %r212}], [%r205]; 2026-02-21T08:21:55.8813220Z // end inline asm 2026-02-21T08:21:55.8813282Z mov.b32 %r403, 1; 2026-02-21T08:21:55.8813338Z mov.b32 %r402, 6; 2026-02-21T08:21:55.8813392Z mov.b32 %r398, 0; 2026-02-21T08:21:55.8813449Z mov.b32 %r400, %r398; 2026-02-21T08:21:55.8813512Z mov.b32 %r401, %r398; 2026-02-21T08:21:55.8813567Z mov.b32 %r404, %r398; 2026-02-21T08:21:55.8813622Z mov.b32 %r405, %r398; 2026-02-21T08:21:55.8813687Z bra.uni $L__BB0_4; 2026-02-21T08:21:55.8813794Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.8813971Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8814045Z setp.lt.u32 %p100, %r405, 1936; 2026-02-21T08:21:55.8814218Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8814317Z // begin inline asm 2026-02-21T08:21:55.8814368Z 2026-02-21T08:21:55.8814427Z { 2026-02-21T08:21:55.8814489Z .reg .pred complete; 2026-02-21T08:21:55.8814544Z waitLoop: 2026-02-21T08:21:55.8814699Z mbarrier.try_wait.parity.shared.b64 
complete, [%r399], %r398; 2026-02-21T08:21:55.8814768Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.8814819Z } 2026-02-21T08:21:55.8814823Z 2026-02-21T08:21:55.8814885Z // end inline asm 2026-02-21T08:21:55.8814946Z add.s32 %r250, %r403, 1; 2026-02-21T08:21:55.8815010Z setp.gt.s32 %p103, %r250, 1; 2026-02-21T08:21:55.8815077Z selp.b32 %r403, 0, %r250, %p103; 2026-02-21T08:21:55.8815147Z selp.b32 %r251, 1, 0, %p103; 2026-02-21T08:21:55.8815209Z xor.b32 %r404, %r260, %r251; 2026-02-21T08:21:55.8815386Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8815454Z add.s32 %r252, %r402, 1; 2026-02-21T08:21:55.8815568Z setp.gt.s32 %p104, %r252, 6; 2026-02-21T08:21:55.8815644Z selp.b32 %r402, 0, %r252, %p104; 2026-02-21T08:21:55.8815704Z shl.b32 %r253, %r402, 3; 2026-02-21T08:21:55.8815778Z add.s32 %r255, %r30, %r253; 2026-02-21T08:21:55.8815839Z add.s32 %r245, %r255, 28672; 2026-02-21T08:21:55.8815896Z bar.sync 0; 2026-02-21T08:21:55.8815971Z and.pred %p97, %p110, %p100; 2026-02-21T08:21:55.8816030Z // begin inline asm 2026-02-21T08:21:55.8816140Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r245], 4096; 2026-02-21T08:21:55.8816196Z // end inline asm 2026-02-21T08:21:55.8816374Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8816433Z shl.b32 %r256, %r402, 11; 2026-02-21T08:21:55.8816490Z add.s32 %r242, %r30, %r256; 2026-02-21T08:21:55.8816553Z bar.sync 0; 2026-02-21T08:21:55.8816615Z elect.sync %r257|%p105, -1; 2026-02-21T08:21:55.8816680Z and.pred %p106, %p100, %p105; 2026-02-21T08:21:55.8816750Z and.pred %p98, %p1, %p106; 2026-02-21T08:21:55.8816810Z add.s32 %r243, %r405, 112; 2026-02-21T08:21:55.8816871Z // begin inline asm 2026-02-21T08:21:55.8817113Z @%p98 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd59, {%r243, %r208}], [%r245]; 2026-02-21T08:21:55.8817178Z // end inline asm 2026-02-21T08:21:55.8817353Z .loc 1 52 44 // 
cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8817412Z add.s32 %r246, %r242, 14336; 2026-02-21T08:21:55.8817474Z bar.sync 0; 2026-02-21T08:21:55.8817537Z elect.sync %r258|%p107, -1; 2026-02-21T08:21:55.8817610Z and.pred %p108, %p100, %p107; 2026-02-21T08:21:55.8817676Z and.pred %p99, %p1, %p108; 2026-02-21T08:21:55.8817730Z // begin inline asm 2026-02-21T08:21:55.8817958Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd60, {%r243, %r212}], [%r245]; 2026-02-21T08:21:55.8818012Z // end inline asm 2026-02-21T08:21:55.8818189Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8818255Z setp.lt.u32 %p109, %r405, 2016; 2026-02-21T08:21:55.8818312Z add.s32 %r405, %r405, 16; 2026-02-21T08:21:55.8818374Z mov.b32 %r398, %r260; 2026-02-21T08:21:55.8818428Z mov.b32 %r399, %r259; 2026-02-21T08:21:55.8818484Z @%p109 bra $L__BB0_4; 2026-02-21T08:21:55.8818546Z bra.uni $L__BB0_7; 2026-02-21T08:21:55.8818646Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:55.8818812Z .loc 1 0 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:0:57 2026-02-21T08:21:55.8818865Z mov.b32 %r260, %r404; 2026-02-21T08:21:55.8819036Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8819092Z add.s32 %r223, %r401, 1; 2026-02-21T08:21:55.8819150Z setp.gt.s32 %p93, %r223, 6; 2026-02-21T08:21:55.8819214Z selp.b32 %r401, 0, %r223, %p93; 2026-02-21T08:21:55.8819273Z selp.b32 %r224, 1, 0, %p93; 2026-02-21T08:21:55.8819382Z xor.b32 %r400, %r400, %r224; 2026-02-21T08:21:55.8819444Z shl.b32 %r225, %r401, 3; 2026-02-21T08:21:55.8819500Z add.s32 %r227, %r30, %r225; 2026-02-21T08:21:55.8819553Z add.s32 %r221, %r227, 28672; 2026-02-21T08:21:55.8819603Z bar.sync 0; 2026-02-21T08:21:55.8819663Z // begin inline asm 2026-02-21T08:21:55.8819712Z 2026-02-21T08:21:55.8819760Z { 2026-02-21T08:21:55.8819823Z .reg .pred 
complete; 2026-02-21T08:21:55.8819876Z waitLoop: 2026-02-21T08:21:55.8819990Z mbarrier.try_wait.parity.shared.b64 complete, [%r221], %r400; 2026-02-21T08:21:55.8820051Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.8820104Z } 2026-02-21T08:21:55.8820107Z 2026-02-21T08:21:55.8820160Z // end inline asm 2026-02-21T08:21:55.8820214Z shl.b32 %r228, %r403, 3; 2026-02-21T08:21:55.8820276Z add.s32 %r229, %r30, %r228; 2026-02-21T08:21:55.8820331Z add.s32 %r259, %r229, 28736; 2026-02-21T08:21:55.8820538Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8820600Z @%p82 bra $L__BB0_6; 2026-02-21T08:21:55.8820702Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.8820870Z .loc 1 51 31 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:51:31 2026-02-21T08:21:55.8820925Z shl.b32 %r232, %r401, 11; 2026-02-21T08:21:55.8820988Z add.s32 %r234, %r30, %r232; 2026-02-21T08:21:55.8821151Z .loc 1 52 44 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:52:44 2026-02-21T08:21:55.8821206Z add.s32 %r235, %r234, 14336; 2026-02-21T08:21:55.8821375Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8821436Z elect.sync %r236|%p95, -1; 2026-02-21T08:21:55.8821492Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:21:55.8821558Z cvt.u64.u32 %rd64, %r237; 2026-02-21T08:21:55.8821629Z or.b64 %rd61, %rd64, -4611685949699522560; 2026-02-21T08:21:55.8821689Z bfe.u32 %r238, %r235, 4, 14; 2026-02-21T08:21:55.8821745Z cvt.u64.u32 %rd65, %r238; 2026-02-21T08:21:55.8821820Z or.b64 %rd62, %rd65, -4611685949699522560; 2026-02-21T08:21:55.8821875Z mov.b32 %r231, 68157456; 2026-02-21T08:21:55.8821931Z mov.pred %p94, -1; 2026-02-21T08:21:55.8821994Z // begin inline asm 2026-02-21T08:21:55.8822132Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd61, %rd62, %r231, %p94; 2026-02-21T08:21:55.8822185Z // end inline asm 2026-02-21T08:21:55.8822240Z cvt.u64.u32 %rd63, %r259; 
2026-02-21T08:21:55.8822301Z // begin inline asm 2026-02-21T08:21:55.8822422Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T08:21:55.8822476Z // end inline asm 2026-02-21T08:21:55.8822539Z bra.uni $L__BB0_6; 2026-02-21T08:21:55.8822632Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:55.8822799Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8822863Z // begin inline asm 2026-02-21T08:21:55.8822912Z 2026-02-21T08:21:55.8822960Z { 2026-02-21T08:21:55.8823017Z .reg .pred complete; 2026-02-21T08:21:55.8823081Z waitLoop: 2026-02-21T08:21:55.8823195Z mbarrier.try_wait.parity.shared.b64 complete, [%r259], %r260; 2026-02-21T08:21:55.8823259Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.8823316Z } 2026-02-21T08:21:55.8823320Z 2026-02-21T08:21:55.8823374Z // end inline asm 2026-02-21T08:21:55.8823541Z .loc 1 47 57 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:47:57 2026-02-21T08:21:55.8823594Z bar.sync 0; 2026-02-21T08:21:55.8823655Z // begin inline asm 2026-02-21T08:21:55.8823735Z @%p110 mbarrier.inval.shared::cta.b64 [%r91]; 2026-02-21T08:21:55.8823787Z // end inline asm 2026-02-21T08:21:55.8823847Z bar.sync 0; 2026-02-21T08:21:55.8823901Z // begin inline asm 2026-02-21T08:21:55.8823981Z @%p110 mbarrier.inval.shared::cta.b64 [%r92]; 2026-02-21T08:21:55.8824041Z // end inline asm 2026-02-21T08:21:55.8824135Z bar.sync 0; 2026-02-21T08:21:55.8824189Z // begin inline asm 2026-02-21T08:21:55.8824265Z @%p110 mbarrier.inval.shared::cta.b64 [%r93]; 2026-02-21T08:21:55.8824325Z // end inline asm 2026-02-21T08:21:55.8824376Z bar.sync 0; 2026-02-21T08:21:55.8824429Z // begin inline asm 2026-02-21T08:21:55.8824510Z @%p110 mbarrier.inval.shared::cta.b64 [%r94]; 2026-02-21T08:21:55.8824563Z // end inline asm 2026-02-21T08:21:55.8824615Z bar.sync 0; 2026-02-21T08:21:55.8824692Z // begin inline asm 2026-02-21T08:21:55.8824777Z @%p110 mbarrier.inval.shared::cta.b64 [%r95]; 
2026-02-21T08:21:55.8824831Z // end inline asm 2026-02-21T08:21:55.8824883Z bar.sync 0; 2026-02-21T08:21:55.8824942Z // begin inline asm 2026-02-21T08:21:55.8825016Z @%p110 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T08:21:55.8825070Z // end inline asm 2026-02-21T08:21:55.8825121Z bar.sync 0; 2026-02-21T08:21:55.8825183Z // begin inline asm 2026-02-21T08:21:55.8825308Z @%p110 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:21:55.8825364Z // end inline asm 2026-02-21T08:21:55.8825429Z add.s32 %r268, %r30, 28736; 2026-02-21T08:21:55.8825484Z // begin inline asm 2026-02-21T08:21:55.8825564Z @%p110 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T08:21:55.8825624Z // end inline asm 2026-02-21T08:21:55.8825675Z bar.sync 0; 2026-02-21T08:21:55.8825729Z // begin inline asm 2026-02-21T08:21:55.8825802Z @%p110 mbarrier.inval.shared::cta.b64 [%r90]; 2026-02-21T08:21:55.8825862Z // end inline asm 2026-02-21T08:21:55.8826022Z .loc 1 56 45 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:56:45 2026-02-21T08:21:55.8826082Z shl.b32 %r341, %r9, 12; 2026-02-21T08:21:55.8826147Z shl.b32 %r342, %r10, 12; 2026-02-21T08:21:55.8826204Z shl.b32 %r343, %r11, 12; 2026-02-21T08:21:55.8826259Z shl.b32 %r344, %r12, 12; 2026-02-21T08:21:55.8826421Z .loc 1 56 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:56:52 2026-02-21T08:21:55.8826488Z or.b32 %r345, %r7, %r341; 2026-02-21T08:21:55.8826544Z or.b32 %r346, %r7, %r342; 2026-02-21T08:21:55.8826599Z or.b32 %r347, %r7, %r343; 2026-02-21T08:21:55.8826661Z or.b32 %r348, %r7, %r344; 2026-02-21T08:21:55.8826816Z .loc 1 56 24 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:56:24 2026-02-21T08:21:55.8826883Z mad.wide.u32 %rd68, %r345, 2, %rd3; 2026-02-21T08:21:55.8826954Z mad.wide.u32 %rd69, %r346, 2, %rd3; 2026-02-21T08:21:55.8827014Z mad.wide.u32 %rd70, %r347, 2, %rd3; 2026-02-21T08:21:55.8827072Z mad.wide.u32 %rd71, %r348, 2, %rd3; 2026-02-21T08:21:55.8827230Z .loc 1 53 52 // 
cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8827291Z // begin inline asm 2026-02-21T08:21:55.8827587Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285}, [%r303 + 0], 32; 2026-02-21T08:21:55.8827641Z // end inline asm 2026-02-21T08:21:55.8827704Z // begin inline asm 2026-02-21T08:21:55.8827982Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302}, [%r303 + 16], 32; 2026-02-21T08:21:55.8828036Z // end inline asm 2026-02-21T08:21:55.8828094Z // begin inline asm 2026-02-21T08:21:55.8828162Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:55.8828214Z // end inline asm 2026-02-21T08:21:55.8828275Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:21:55.8828332Z cvt.u64.u32 %rd73, %r271; 2026-02-21T08:21:55.8828387Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:21:55.8828442Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:21:55.8828608Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8828669Z mov.b64 {%r349, %r350}, %rd75; 2026-02-21T08:21:55.8828734Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:55.8828899Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8829068Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:21:55.8829124Z cvt.u64.u32 %rd77, %r273; 2026-02-21T08:21:55.8829178Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:21:55.8829241Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:21:55.8829405Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8829464Z mov.b64 {%r352, %r353}, %rd79; 2026-02-21T08:21:55.8829535Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:55.8829697Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8829751Z cvt.u64.u32 %rd80, %r274; 
2026-02-21T08:21:55.8829812Z cvt.u64.u32 %rd81, %r275; 2026-02-21T08:21:55.8829869Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:21:55.8829924Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:21:55.8830131Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8830201Z mov.b64 {%r355, %r356}, %rd83; 2026-02-21T08:21:55.8830266Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T08:21:55.8830430Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8830495Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:21:55.8830550Z cvt.u64.u32 %rd85, %r277; 2026-02-21T08:21:55.8830605Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:21:55.8830669Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:21:55.8830833Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8830891Z mov.b64 {%r358, %r359}, %rd87; 2026-02-21T08:21:55.8830954Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T08:21:55.8831127Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8831184Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:21:55.8831242Z cvt.u64.u32 %rd89, %r279; 2026-02-21T08:21:55.8831315Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:21:55.8831373Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:21:55.8831536Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8831604Z mov.b64 {%r361, %r362}, %rd91; 2026-02-21T08:21:55.8831666Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T08:21:55.8831833Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8831887Z cvt.u64.u32 %rd92, %r280; 2026-02-21T08:21:55.8831950Z cvt.u64.u32 %rd93, %r281; 2026-02-21T08:21:55.8832005Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:21:55.8832062Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:21:55.8832230Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8832285Z mov.b64 
{%r364, %r365}, %rd95; 2026-02-21T08:21:55.8832349Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T08:21:55.8832521Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8832578Z cvt.u64.u32 %rd96, %r282; 2026-02-21T08:21:55.8832633Z cvt.u64.u32 %rd97, %r283; 2026-02-21T08:21:55.8832687Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:21:55.8832751Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:21:55.8832912Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8832970Z mov.b64 {%r367, %r368}, %rd99; 2026-02-21T08:21:55.8833038Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T08:21:55.8833202Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8833261Z cvt.u64.u32 %rd100, %r284; 2026-02-21T08:21:55.8833326Z cvt.u64.u32 %rd101, %r285; 2026-02-21T08:21:55.8833383Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:21:55.8833445Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:21:55.8833606Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8833714Z mov.b64 {%r370, %r371}, %rd103; 2026-02-21T08:21:55.8833775Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T08:21:55.8833941Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8834005Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:21:55.8834061Z cvt.u64.u32 %rd105, %r288; 2026-02-21T08:21:55.8834117Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:21:55.8834183Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:21:55.8834346Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8834404Z mov.b64 {%r373, %r374}, %rd107; 2026-02-21T08:21:55.8834466Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T08:21:55.8834713Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8834778Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:21:55.8834835Z cvt.u64.u32 
%rd109, %r290; 2026-02-21T08:21:55.8834899Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:21:55.8834957Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:21:55.8835122Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8835187Z mov.b64 {%r376, %r377}, %rd111; 2026-02-21T08:21:55.8835251Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T08:21:55.8835413Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8835471Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:21:55.8835533Z cvt.u64.u32 %rd113, %r292; 2026-02-21T08:21:55.8835591Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:21:55.8835647Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:21:55.8835819Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8835879Z mov.b64 {%r379, %r380}, %rd115; 2026-02-21T08:21:55.8835943Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T08:21:55.8836111Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8836168Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:21:55.8836223Z cvt.u64.u32 %rd117, %r294; 2026-02-21T08:21:55.8836280Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:21:55.8836345Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:21:55.8836508Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8836566Z mov.b64 {%r382, %r383}, %rd119; 2026-02-21T08:21:55.8836636Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T08:21:55.8836800Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8836856Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:21:55.8836920Z cvt.u64.u32 %rd121, %r296; 2026-02-21T08:21:55.8836978Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:21:55.8837037Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:21:55.8837198Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8837263Z mov.b64 {%r385, 
%r386}, %rd123; 2026-02-21T08:21:55.8837324Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T08:21:55.8837488Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8837553Z cvt.u64.u32 %rd124, %r297; 2026-02-21T08:21:55.8837608Z cvt.u64.u32 %rd125, %r298; 2026-02-21T08:21:55.8837664Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:21:55.8837726Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:21:55.8837887Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8837945Z mov.b64 {%r388, %r389}, %rd127; 2026-02-21T08:21:55.8838006Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T08:21:55.8838176Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8838283Z cvt.u64.u32 %rd128, %r299; 2026-02-21T08:21:55.8838338Z cvt.u64.u32 %rd129, %r300; 2026-02-21T08:21:55.8838401Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:21:55.8838457Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:21:55.8838621Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8838684Z mov.b64 {%r391, %r392}, %rd131; 2026-02-21T08:21:55.8838747Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T08:21:55.8838910Z .loc 1 53 52 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:53:52 2026-02-21T08:21:55.8838966Z cvt.u64.u32 %rd132, %r301; 2026-02-21T08:21:55.8839029Z cvt.u64.u32 %rd133, %r302; 2026-02-21T08:21:55.8839086Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:21:55.8839143Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:21:55.8839359Z .loc 1 55 27 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:55:27 2026-02-21T08:21:55.8839421Z mov.b64 {%r394, %r395}, %rd135; 2026-02-21T08:21:55.8839484Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T08:21:55.8839650Z .loc 1 56 82 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:56:82 2026-02-21T08:21:55.8839745Z st.shared.v4.b32 [%r4], {%r351, %r363, %r375, %r387}; 
2026-02-21T08:21:55.8839798Z bar.sync 0; 2026-02-21T08:21:55.8839854Z // begin inline asm 2026-02-21T08:21:55.8840010Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r328, %r332, %r336}, [%r308]; 2026-02-21T08:21:55.8840066Z // end inline asm 2026-02-21T08:21:55.8840118Z bar.sync 0; 2026-02-21T08:21:55.8840218Z st.shared.v4.b32 [%r4], {%r354, %r366, %r378, %r390}; 2026-02-21T08:21:55.8840275Z bar.sync 0; 2026-02-21T08:21:55.8840333Z // begin inline asm 2026-02-21T08:21:55.8840476Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r329, %r333, %r337}, [%r308]; 2026-02-21T08:21:55.8840541Z // end inline asm 2026-02-21T08:21:55.8840596Z bar.sync 0; 2026-02-21T08:21:55.8840685Z st.shared.v4.b32 [%r4], {%r357, %r369, %r381, %r393}; 2026-02-21T08:21:55.8840746Z bar.sync 0; 2026-02-21T08:21:55.8840800Z // begin inline asm 2026-02-21T08:21:55.8840939Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r330, %r334, %r338}, [%r308]; 2026-02-21T08:21:55.8840999Z // end inline asm 2026-02-21T08:21:55.8841051Z bar.sync 0; 2026-02-21T08:21:55.8841134Z st.shared.v4.b32 [%r4], {%r360, %r372, %r384, %r396}; 2026-02-21T08:21:55.8841186Z bar.sync 0; 2026-02-21T08:21:55.8841248Z // begin inline asm 2026-02-21T08:21:55.8841387Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r331, %r335, %r339}, [%r308]; 2026-02-21T08:21:55.8841441Z // end inline asm 2026-02-21T08:21:55.8841502Z // begin inline asm 2026-02-21T08:21:55.8841603Z st.global.v4.b32 [ %rd68 + 0 ], { %r324, %r325, %r326, %r327 }; 2026-02-21T08:21:55.8841655Z // end inline asm 2026-02-21T08:21:55.8841712Z // begin inline asm 2026-02-21T08:21:55.8841819Z st.global.v4.b32 [ %rd69 + 0 ], { %r328, %r329, %r330, %r331 }; 2026-02-21T08:21:55.8841872Z // end inline asm 2026-02-21T08:21:55.8841926Z // begin inline asm 2026-02-21T08:21:55.8842027Z st.global.v4.b32 [ %rd70 + 0 ], { %r332, %r333, %r334, %r335 }; 2026-02-21T08:21:55.8842080Z // end inline asm 2026-02-21T08:21:55.8842134Z // begin inline asm 
2026-02-21T08:21:55.8842233Z st.global.v4.b32 [ %rd71 + 0 ], { %r336, %r337, %r338, %r339 }; 2026-02-21T08:21:55.8842287Z // end inline asm 2026-02-21T08:21:55.8842366Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:55.8842535Z .loc 1 28 4 // cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py:28:4 2026-02-21T08:21:55.8842596Z bar.sync 0; 2026-02-21T08:21:55.8842650Z // begin inline asm 2026-02-21T08:21:55.8842762Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r397, 64; 2026-02-21T08:21:55.8842824Z // end inline asm 2026-02-21T08:21:55.8842874Z ret; 2026-02-21T08:21:55.8842928Z $L__tmp1: 2026-02-21T08:21:55.8843022Z $L__func_end0: 2026-02-21T08:21:55.8843111Z // -- End function 2026-02-21T08:21:55.8843161Z } 2026-02-21T08:21:55.8843367Z .file 1 "/tmp/torchinductor_root/wa/cwa4zcvnvywo2j3rrsplgwbdi72byup4wzeewyffne6tgcypj6lw.py" 2026-02-21T08:21:55.8843435Z .section .debug_abbrev 2026-02-21T08:21:55.8843483Z { 2026-02-21T08:21:55.8843568Z .b8 1 // Abbreviation Code 2026-02-21T08:21:55.8843657Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:55.8843735Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:55.8843811Z .b8 37 // DW_AT_producer 2026-02-21T08:21:55.8843881Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.8843959Z .b8 19 // DW_AT_language 2026-02-21T08:21:55.8844100Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:55.8844174Z .b8 3 // DW_AT_name 2026-02-21T08:21:55.8844254Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.8844330Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:55.8844404Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:55.8844483Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:21:55.8844552Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.8844619Z .b8 0 // EOM(1) 2026-02-21T08:21:55.8844711Z .b8 0 // EOM(2) 2026-02-21T08:21:55.8844784Z .b8 0 // EOM(3) 2026-02-21T08:21:55.8844834Z } 2026-02-21T08:21:55.8844894Z .section .debug_info 2026-02-21T08:21:55.8844950Z { 2026-02-21T08:21:55.8845032Z .b32 104 // Length of Unit 2026-02-21T08:21:55.8845118Z .b8 2 // DWARF version number 
2026-02-21T08:21:55.8845178Z .b8 0 2026-02-21T08:21:55.8845291Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:21:55.8845378Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:55.8845473Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:55.8845559Z .b8 116 // DW_AT_producer 2026-02-21T08:21:55.8845611Z .b8 114 2026-02-21T08:21:55.8845663Z .b8 105 2026-02-21T08:21:55.8845719Z .b8 116 2026-02-21T08:21:55.8845768Z .b8 111 2026-02-21T08:21:55.8845816Z .b8 110 2026-02-21T08:21:55.8845867Z .b8 0 2026-02-21T08:21:55.8845947Z .b8 2 // DW_AT_language 2026-02-21T08:21:55.8845996Z .b8 0 2026-02-21T08:21:55.8846070Z .b8 99 // DW_AT_name 2026-02-21T08:21:55.8846126Z .b8 119 2026-02-21T08:21:55.8846177Z .b8 97 2026-02-21T08:21:55.8846227Z .b8 52 2026-02-21T08:21:55.8846281Z .b8 122 2026-02-21T08:21:55.8846339Z .b8 99 2026-02-21T08:21:55.8846391Z .b8 118 2026-02-21T08:21:55.8846441Z .b8 110 2026-02-21T08:21:55.8846499Z .b8 118 2026-02-21T08:21:55.8846547Z .b8 121 2026-02-21T08:21:55.8846599Z .b8 119 2026-02-21T08:21:55.8846650Z .b8 111 2026-02-21T08:21:55.8846707Z .b8 50 2026-02-21T08:21:55.8846823Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.8846828Z 2026-02-21T08:21:55.8847218Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_r0c69zu.ptx -o /tmp/tmp_r0c69zu.ptx.o 2026-02-21T08:21:55.8847222Z 2026-02-21T08:21:55.8847346Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:21:55.8847397Z .b8 106 2026-02-21T08:21:55.8847454Z .b8 51 2026-02-21T08:21:55.8847504Z .b8 114 2026-02-21T08:21:55.8847556Z .b8 114 2026-02-21T08:21:55.8847606Z .b8 115 2026-02-21T08:21:55.8847667Z .b8 112 2026-02-21T08:21:55.8847724Z .b8 108 2026-02-21T08:21:55.8847778Z .b8 103 2026-02-21T08:21:55.8847898Z .b8 119 2026-02-21T08:21:55.8847948Z .b8 98 2026-02-21T08:21:55.8847999Z .b8 100 2026-02-21T08:21:55.8848048Z .b8 105 2026-02-21T08:21:55.8848107Z .b8 55 2026-02-21T08:21:55.8848156Z .b8 50 2026-02-21T08:21:55.8848205Z .b8 98 2026-02-21T08:21:55.8848254Z .b8 121 2026-02-21T08:21:55.8848309Z .b8 117 2026-02-21T08:21:55.8848358Z .b8 112 2026-02-21T08:21:55.8848406Z .b8 52 2026-02-21T08:21:55.8848462Z .b8 119 2026-02-21T08:21:55.8848510Z .b8 122 2026-02-21T08:21:55.8848559Z .b8 101 2026-02-21T08:21:55.8848607Z .b8 101 2026-02-21T08:21:55.8848664Z .b8 119 2026-02-21T08:21:55.8848713Z .b8 121 2026-02-21T08:21:55.8848762Z .b8 102 2026-02-21T08:21:55.8848817Z .b8 102 2026-02-21T08:21:55.8848866Z .b8 110 2026-02-21T08:21:55.8848915Z .b8 101 2026-02-21T08:21:55.8848963Z .b8 54 2026-02-21T08:21:55.8849021Z .b8 116 2026-02-21T08:21:55.8849072Z .b8 103 2026-02-21T08:21:55.8849121Z .b8 99 2026-02-21T08:21:55.8849178Z .b8 121 2026-02-21T08:21:55.8849228Z .b8 112 2026-02-21T08:21:55.8849327Z .b8 106 2026-02-21T08:21:55.8849383Z .b8 54 2026-02-21T08:21:55.8849439Z .b8 108 2026-02-21T08:21:55.8849489Z .b8 119 2026-02-21T08:21:55.8849538Z .b8 46 2026-02-21T08:21:55.8849586Z .b8 112 2026-02-21T08:21:55.8849644Z .b8 121 2026-02-21T08:21:55.8849693Z .b8 0 2026-02-21T08:21:55.8849786Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:55.8849867Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:55.8849915Z .b8 116 2026-02-21T08:21:55.8849964Z .b8 109 2026-02-21T08:21:55.8850012Z .b8 112 2026-02-21T08:21:55.8850067Z .b8 47 2026-02-21T08:21:55.8850116Z .b8 116 2026-02-21T08:21:55.8850165Z .b8 111 2026-02-21T08:21:55.8850219Z .b8 114 2026-02-21T08:21:55.8850267Z .b8 99 
2026-02-21T08:21:55.8850315Z .b8 104 2026-02-21T08:21:55.8850363Z .b8 105 2026-02-21T08:21:55.8850418Z .b8 110 2026-02-21T08:21:55.8850466Z .b8 100 2026-02-21T08:21:55.8850514Z .b8 117 2026-02-21T08:21:55.8850569Z .b8 99 2026-02-21T08:21:55.8850618Z .b8 116 2026-02-21T08:21:55.8850671Z .b8 111 2026-02-21T08:21:55.8850725Z .b8 114 2026-02-21T08:21:55.8850785Z .b8 95 2026-02-21T08:21:55.8850834Z .b8 114 2026-02-21T08:21:55.8850883Z .b8 111 2026-02-21T08:21:55.8850930Z .b8 111 2026-02-21T08:21:55.8850985Z .b8 116 2026-02-21T08:21:55.8851033Z .b8 47 2026-02-21T08:21:55.8851081Z .b8 119 2026-02-21T08:21:55.8851135Z .b8 97 2026-02-21T08:21:55.8851183Z .b8 0 2026-02-21T08:21:55.8851233Z } 2026-02-21T08:21:55.8851297Z .section .debug_macinfo { } 2026-02-21T08:21:55.8851307Z 2026-02-21T08:21:55.8851381Z ================================================================ 2026-02-21T08:21:55.8851479Z please share the reproducer above with Triton project. 2026-02-21T08:21:55.9769083Z 2026-02-21T08:21:55.9769115Z 2026-02-21T08:21:55.9769120Z 2026-02-21T08:21:55.9769319Z ================================================================ 2026-02-21T08:21:55.9769608Z Internal Triton PTX codegen error 2026-02-21T08:21:55.9769971Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:21:55.9771220Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:21:55.9772437Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:21:55.9772676Z `ptxas` stderr: 2026-02-21T08:21:55.9772826Z `ptxas` stderr: 2026-02-21T08:21:55.9773386Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:21:55.9774107Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:21:55.9774990Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.9775137Z 2026-02-21T08:21:55.9775542Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4krkxsnj.ptx -o /tmp/tmp4krkxsnj.ptx.o 2026-02-21T08:21:55.9776001Z 2026-02-21T08:21:55.9776005Z 2026-02-21T08:21:55.9776060Z // 2026-02-21T08:21:55.9776201Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:21:55.9776384Z // 2026-02-21T08:21:55.9776451Z 2026-02-21T08:21:55.9776513Z .version 8.7 2026-02-21T08:21:55.9776644Z .target sm_100a 2026-02-21T08:21:55.9776780Z .address_size 64 2026-02-21T08:21:55.9776860Z 2026-02-21T08:21:55.9776978Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:21:55.9777230Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:21:55.9777520Z // @_helion_matmul 2026-02-21T08:21:55.9777732Z .visible .entry _helion_matmul( 2026-02-21T08:21:55.9777941Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:21:55.9778188Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:21:55.9778439Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:21:55.9778670Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:21:55.9778911Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:21:55.9779108Z ) 2026-02-21T08:21:55.9779231Z .reqntid 128 2026-02-21T08:21:55.9779352Z .maxnreg 32 2026-02-21T08:21:55.9779475Z { 2026-02-21T08:21:55.9779594Z .reg .pred %p<120>; 2026-02-21T08:21:55.9779742Z .reg .b32 %r<406>; 2026-02-21T08:21:55.9779885Z .reg .b64 %rd<136>; 2026-02-21T08:21:55.9780131Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19:0 2026-02-21T08:21:55.9780420Z $L__func_begin0: 2026-02-21T08:21:55.9780652Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19:0 2026-02-21T08:21:55.9780887Z 
2026-02-21T08:21:55.9780938Z // %bb.0: 2026-02-21T08:21:55.9781083Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:21:55.9781267Z $L__tmp0: 2026-02-21T08:21:55.9781481Z .loc 1 19 0 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:19 2026-02-21T08:21:55.9781747Z mov.u32 %r1, %tid.x; 2026-02-21T08:21:55.9781917Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:21:55.9782107Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:21:55.9782270Z mov.b32 %r30, global_smem; 2026-02-21T08:21:55.9782420Z // begin inline asm 2026-02-21T08:21:55.9782644Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r30], 64; 2026-02-21T08:21:55.9782877Z // end inline asm 2026-02-21T08:21:55.9783038Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T08:21:55.9783221Z bar.sync 0; 2026-02-21T08:21:55.9783362Z ld.shared.b32 %r397, [global_smem]; 2026-02-21T08:21:55.9783535Z bar.sync 0; 2026-02-21T08:21:55.9783659Z // begin inline asm 2026-02-21T08:21:55.9783862Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:21:55.9784082Z // end inline asm 2026-02-21T08:21:55.9784325Z .loc 1 21 67 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:21:67 2026-02-21T08:21:55.9784599Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:21:55.9784798Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:21:55.9784957Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:21:55.9785110Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:21:55.9785270Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:21:55.9785429Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T08:21:55.9785622Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:21:55.9785792Z shl.b32 %r53, %r52, 8; 2026-02-21T08:21:55.9785960Z cvt.s64.s32 %rd41, %r53; 2026-02-21T08:21:55.9786122Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T08:21:55.9786300Z shl.b32 %r54, %r1, 2; 2026-02-21T08:21:55.9786514Z add.s32 %r31, %r30, %r54; 2026-02-21T08:21:55.9786675Z mov.b32 %r40, 0; 2026-02-21T08:21:55.9786819Z // begin inline asm 2026-02-21T08:21:55.9786972Z @%p1 
st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.9787150Z // end inline asm 2026-02-21T08:21:55.9787284Z bar.warp.sync -1; 2026-02-21T08:21:55.9787436Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T08:21:55.9787589Z cvt.u64.u32 %rd4, %r30; 2026-02-21T08:21:55.9787741Z // begin inline asm 2026-02-21T08:21:55.9787985Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:21:55.9788267Z // end inline asm 2026-02-21T08:21:55.9788395Z // begin inline asm 2026-02-21T08:21:55.9788620Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.9788874Z // end inline asm 2026-02-21T08:21:55.9788999Z mov.b32 %r33, 16; 2026-02-21T08:21:55.9789135Z // begin inline asm 2026-02-21T08:21:55.9789420Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.9789695Z // end inline asm 2026-02-21T08:21:55.9789821Z mov.b32 %r34, 64; 2026-02-21T08:21:55.9789959Z // begin inline asm 2026-02-21T08:21:55.9790190Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.9790444Z // end inline asm 2026-02-21T08:21:55.9790580Z mov.b32 %r35, 2048; 2026-02-21T08:21:55.9790713Z // begin inline asm 2026-02-21T08:21:55.9790953Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.9791221Z // end inline asm 2026-02-21T08:21:55.9791356Z // begin inline asm 2026-02-21T08:21:55.9791592Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:21:55.9791864Z // end inline asm 2026-02-21T08:21:55.9791996Z mov.b64 %rd12, 4096; 2026-02-21T08:21:55.9792131Z // begin inline asm 2026-02-21T08:21:55.9792384Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.9792660Z // end inline asm 2026-02-21T08:21:55.9792792Z mov.b32 %r37, 1; 2026-02-21T08:21:55.9792917Z // begin inline 
asm 2026-02-21T08:21:55.9793176Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.9793457Z // end inline asm 2026-02-21T08:21:55.9793584Z // begin inline asm 2026-02-21T08:21:55.9793833Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:55.9794116Z // end inline asm 2026-02-21T08:21:55.9794253Z // begin inline asm 2026-02-21T08:21:55.9794477Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.9794774Z // end inline asm 2026-02-21T08:21:55.9794903Z // begin inline asm 2026-02-21T08:21:55.9795157Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.9795441Z // end inline asm 2026-02-21T08:21:55.9795578Z // begin inline asm 2026-02-21T08:21:55.9795823Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.9796088Z // end inline asm 2026-02-21T08:21:55.9796231Z // begin inline asm 2026-02-21T08:21:55.9796464Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.9796732Z // end inline asm 2026-02-21T08:21:55.9796877Z // begin inline asm 2026-02-21T08:21:55.9797222Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.9797605Z // end inline asm 2026-02-21T08:21:55.9797733Z // begin inline asm 2026-02-21T08:21:55.9797941Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:21:55.9798187Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.9798376Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.9798545Z // end inline asm 2026-02-21T08:21:55.9798682Z bar.sync 0; 2026-02-21T08:21:55.9798825Z cvta.global.u64 %rd59, %rd19; 2026-02-21T08:21:55.9799150Z .loc 1 22 67 // 
c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:22:67 2026-02-21T08:21:55.9799434Z add.s64 %rd37, %rd19, 128; 2026-02-21T08:21:55.9799583Z bar.sync 0; 2026-02-21T08:21:55.9799717Z // begin inline asm 2026-02-21T08:21:55.9799860Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:21:55.9800031Z // end inline asm 2026-02-21T08:21:55.9800164Z bar.warp.sync -1; 2026-02-21T08:21:55.9800307Z // begin inline asm 2026-02-21T08:21:55.9800549Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:21:55.9800823Z // end inline asm 2026-02-21T08:21:55.9800957Z // begin inline asm 2026-02-21T08:21:55.9801171Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.9801419Z // end inline asm 2026-02-21T08:21:55.9801547Z // begin inline asm 2026-02-21T08:21:55.9801861Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:21:55.9802152Z // end inline asm 2026-02-21T08:21:55.9802285Z // begin inline asm 2026-02-21T08:21:55.9802531Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:21:55.9802809Z // end inline asm 2026-02-21T08:21:55.9802946Z // begin inline asm 2026-02-21T08:21:55.9803198Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:21:55.9803466Z // end inline asm 2026-02-21T08:21:55.9803607Z mov.b32 %r44, 4096; 2026-02-21T08:21:55.9803741Z // begin inline asm 2026-02-21T08:21:55.9803979Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r44; 2026-02-21T08:21:55.9804273Z // end inline asm 2026-02-21T08:21:55.9804407Z // begin inline asm 2026-02-21T08:21:55.9804662Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:21:55.9804988Z // end inline asm 2026-02-21T08:21:55.9805134Z // begin inline asm 2026-02-21T08:21:55.9805388Z @%p110 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:21:55.9805681Z // end inline asm 2026-02-21T08:21:55.9805813Z // begin inline asm 2026-02-21T08:21:55.9806071Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:21:55.9806360Z // end inline asm 2026-02-21T08:21:55.9806493Z // begin inline asm 2026-02-21T08:21:55.9806736Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:21:55.9807001Z // end inline asm 2026-02-21T08:21:55.9807142Z // begin inline asm 2026-02-21T08:21:55.9807391Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.9807696Z // end inline asm 2026-02-21T08:21:55.9807830Z // begin inline asm 2026-02-21T08:21:55.9808077Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:21:55.9808358Z // end inline asm 2026-02-21T08:21:55.9808490Z // begin inline asm 2026-02-21T08:21:55.9808726Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:21:55.9808987Z // end inline asm 2026-02-21T08:21:55.9809125Z // begin inline asm 2026-02-21T08:21:55.9809469Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:21:55.9809853Z // end inline asm 2026-02-21T08:21:55.9809995Z // begin inline asm 2026-02-21T08:21:55.9810205Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:21:55.9810461Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:21:55.9810649Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:21:55.9810834Z // end inline asm 2026-02-21T08:21:55.9810965Z bar.sync 0; 2026-02-21T08:21:55.9811112Z cvta.global.u64 %rd60, %rd37; 2026-02-21T08:21:55.9811401Z .loc 1 28 131 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:28:131 2026-02-21T08:21:55.9811773Z 
setp.gt.u32 %p39, %r3, 2047; 2026-02-21T08:21:55.9811947Z @%p39 bra $L__BB0_8; 2026-02-21T08:21:55.9812111Z // %bb.1: // %.lr.ph 2026-02-21T08:21:55.9812417Z .loc 1 40 45 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:40:45 2026-02-21T08:21:55.9812724Z shl.b32 %r154, %r1, 3; 2026-02-21T08:21:55.9812882Z and.b32 %r155, %r154, 56; 2026-02-21T08:21:55.9813036Z bfe.u32 %r156, %r1, 3, 4; 2026-02-21T08:21:55.9813198Z shr.u32 %r157, %r1, 5; 2026-02-21T08:21:55.9813355Z shl.b32 %r158, %r1, 4; 2026-02-21T08:21:55.9813505Z and.b32 %r159, %r158, 176; 2026-02-21T08:21:55.9813672Z and.b32 %r160, %r1, 96; 2026-02-21T08:21:55.9813822Z shl.b32 %r161, %r160, 3; 2026-02-21T08:21:55.9813982Z bfe.s32 %r162, %r1, 2, 1; 2026-02-21T08:21:55.9814130Z and.b32 %r163, %r162, 1088; 2026-02-21T08:21:55.9814287Z and.b32 %r165, %r54, 64; 2026-02-21T08:21:55.9814490Z xor.b32 %r166, %r163, %r165; 2026-02-21T08:21:55.9814657Z add.s32 %r167, %r30, %r159; 2026-02-21T08:21:55.9814835Z add.s32 %r168, %r167, %r161; 2026-02-21T08:21:55.9814990Z shl.b32 %r169, %r1, 5; 2026-02-21T08:21:55.9815139Z and.b32 %r170, %r169, 1792; 2026-02-21T08:21:55.9815289Z and.b32 %r171, %r154, 48; 2026-02-21T08:21:55.9815443Z shl.b32 %r172, %r160, 1; 2026-02-21T08:21:55.9815586Z shl.b32 %r173, %r1, 6; 2026-02-21T08:21:55.9815733Z and.b32 %r174, %r173, 64; 2026-02-21T08:21:55.9815876Z xor.b32 %r175, %r172, %r174; 2026-02-21T08:21:55.9816033Z add.s32 %r176, %r30, %r170; 2026-02-21T08:21:55.9816178Z add.s32 %r177, %r176, %r171; 2026-02-21T08:21:55.9816443Z .loc 1 35 33 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:35:33 2026-02-21T08:21:55.9816728Z shr.u32 %r178, %r3, 5; 2026-02-21T08:21:55.9816870Z and.b32 %r179, %r178, 60; 2026-02-21T08:21:55.9817124Z .loc 1 37 64 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:37:64 2026-02-21T08:21:55.9817394Z and.b32 %r180, %r3, 3; 2026-02-21T08:21:55.9817644Z .loc 1 37 30 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:37:30 
2026-02-21T08:21:55.9817924Z or.b32 %r181, %r179, %r180; 2026-02-21T08:21:55.9818178Z .loc 1 39 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:39:27 2026-02-21T08:21:55.9818447Z shl.b32 %r212, %r181, 6; 2026-02-21T08:21:55.9818696Z .loc 1 41 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:41:27 2026-02-21T08:21:55.9818971Z shl.b32 %r182, %r3, 4; 2026-02-21T08:21:55.9819112Z and.b32 %r208, %r182, 1984; 2026-02-21T08:21:55.9819365Z .loc 1 42 32 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:42:32 2026-02-21T08:21:55.9819635Z or.b32 %r9, %r208, %r156; 2026-02-21T08:21:55.9819887Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9820175Z shfl.sync.idx.b32 %r13, %r157, 0, 31, -1; 2026-02-21T08:21:55.9820360Z shl.b32 %r183, %r13, 21; 2026-02-21T08:21:55.9820515Z and.b32 %r184, %r183, 6291456; 2026-02-21T08:21:55.9820669Z add.s32 %r303, %r184, %r397; 2026-02-21T08:21:55.9820827Z mov.pred %p40, -1; 2026-02-21T08:21:55.9820964Z // begin inline asm 2026-02-21T08:21:55.9821317Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 0], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.9821678Z // end inline asm 2026-02-21T08:21:55.9821817Z // begin inline asm 2026-02-21T08:21:55.9822153Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 16], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:21:55.9822509Z // end inline asm 2026-02-21T08:21:55.9822648Z // begin inline asm 2026-02-21T08:21:55.9822796Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:21:55.9822966Z // end inline asm 2026-02-21T08:21:55.9823153Z bar.sync 0; 2026-02-21T08:21:55.9823397Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9823669Z add.s32 %r399, %r30, 28736; 2026-02-21T08:21:55.9823824Z // begin inline asm 
2026-02-21T08:21:55.9823992Z @%p110 mbarrier.init.shared::cta.b64 [%r399], 1; 2026-02-21T08:21:55.9824174Z // end inline asm 2026-02-21T08:21:55.9824309Z bar.sync 0; 2026-02-21T08:21:55.9824435Z add.s32 %r90, %r30, 28744; 2026-02-21T08:21:55.9824587Z // begin inline asm 2026-02-21T08:21:55.9824774Z @%p110 mbarrier.init.shared::cta.b64 [%r90], 1; 2026-02-21T08:21:55.9824965Z // end inline asm 2026-02-21T08:21:55.9825098Z add.s32 %r91, %r30, 28672; 2026-02-21T08:21:55.9825252Z // begin inline asm 2026-02-21T08:21:55.9825416Z @%p110 mbarrier.init.shared::cta.b64 [%r91], 1; 2026-02-21T08:21:55.9825595Z // end inline asm 2026-02-21T08:21:55.9825727Z bar.sync 0; 2026-02-21T08:21:55.9825901Z add.s32 %r92, %r30, 28680; 2026-02-21T08:21:55.9826060Z // begin inline asm 2026-02-21T08:21:55.9826216Z @%p110 mbarrier.init.shared::cta.b64 [%r92], 1; 2026-02-21T08:21:55.9826399Z // end inline asm 2026-02-21T08:21:55.9826525Z bar.sync 0; 2026-02-21T08:21:55.9826656Z add.s32 %r93, %r30, 28688; 2026-02-21T08:21:55.9826800Z // begin inline asm 2026-02-21T08:21:55.9826962Z @%p110 mbarrier.init.shared::cta.b64 [%r93], 1; 2026-02-21T08:21:55.9827145Z // end inline asm 2026-02-21T08:21:55.9827269Z bar.sync 0; 2026-02-21T08:21:55.9827400Z add.s32 %r94, %r30, 28696; 2026-02-21T08:21:55.9827544Z // begin inline asm 2026-02-21T08:21:55.9827707Z @%p110 mbarrier.init.shared::cta.b64 [%r94], 1; 2026-02-21T08:21:55.9827883Z // end inline asm 2026-02-21T08:21:55.9828017Z bar.sync 0; 2026-02-21T08:21:55.9828138Z add.s32 %r95, %r30, 28704; 2026-02-21T08:21:55.9828288Z // begin inline asm 2026-02-21T08:21:55.9828441Z @%p110 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T08:21:55.9828628Z // end inline asm 2026-02-21T08:21:55.9828762Z bar.sync 0; 2026-02-21T08:21:55.9828885Z add.s32 %r96, %r30, 28712; 2026-02-21T08:21:55.9829034Z // begin inline asm 2026-02-21T08:21:55.9829187Z @%p110 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T08:21:55.9829369Z // end inline asm 
2026-02-21T08:21:55.9829493Z bar.sync 0; 2026-02-21T08:21:55.9829624Z add.s32 %r205, %r30, 28720; 2026-02-21T08:21:55.9829769Z // begin inline asm 2026-02-21T08:21:55.9829932Z @%p110 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:21:55.9830121Z // end inline asm 2026-02-21T08:21:55.9830246Z bar.sync 0; 2026-02-21T08:21:55.9830373Z // begin inline asm 2026-02-21T08:21:55.9830555Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r91], 4096; 2026-02-21T08:21:55.9830771Z // end inline asm 2026-02-21T08:21:55.9831005Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9831298Z // begin inline asm 2026-02-21T08:21:55.9831451Z fence.proxy.async.shared::cta; 2026-02-21T08:21:55.9831622Z // end inline asm 2026-02-21T08:21:55.9831747Z bar.sync 0; 2026-02-21T08:21:55.9831885Z elect.sync %r185|%p70, -1; 2026-02-21T08:21:55.9832046Z and.pred %p52, %p1, %p70; 2026-02-21T08:21:55.9832195Z // begin inline asm 2026-02-21T08:21:55.9832522Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r30], [%rd59, {%r40, %r208}], [%r91]; 2026-02-21T08:21:55.9832870Z // end inline asm 2026-02-21T08:21:55.9833108Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9833374Z bar.sync 0; 2026-02-21T08:21:55.9833513Z elect.sync %r186|%p71, -1; 2026-02-21T08:21:55.9833675Z and.pred %p53, %p1, %p71; 2026-02-21T08:21:55.9833824Z add.s32 %r103, %r30, 14336; 2026-02-21T08:21:55.9833975Z // begin inline asm 2026-02-21T08:21:55.9834286Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r103], [%rd60, {%r40, %r212}], [%r91]; 2026-02-21T08:21:55.9834638Z // end inline asm 2026-02-21T08:21:55.9835017Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9835304Z bar.sync 0; 2026-02-21T08:21:55.9835429Z // begin inline asm 2026-02-21T08:21:55.9835622Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, 
[%r92], 4096; 2026-02-21T08:21:55.9835839Z // end inline asm 2026-02-21T08:21:55.9836076Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9836357Z bar.sync 0; 2026-02-21T08:21:55.9836489Z elect.sync %r187|%p72, -1; 2026-02-21T08:21:55.9836655Z and.pred %p55, %p1, %p72; 2026-02-21T08:21:55.9836806Z add.s32 %r108, %r30, 2048; 2026-02-21T08:21:55.9836960Z // begin inline asm 2026-02-21T08:21:55.9837288Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd59, {%r33, %r208}], [%r92]; 2026-02-21T08:21:55.9837624Z // end inline asm 2026-02-21T08:21:55.9837918Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9838188Z bar.sync 0; 2026-02-21T08:21:55.9838324Z elect.sync %r188|%p73, -1; 2026-02-21T08:21:55.9838479Z and.pred %p56, %p1, %p73; 2026-02-21T08:21:55.9838635Z add.s32 %r112, %r30, 16384; 2026-02-21T08:21:55.9838779Z // begin inline asm 2026-02-21T08:21:55.9839095Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd60, {%r33, %r212}], [%r92]; 2026-02-21T08:21:55.9839435Z // end inline asm 2026-02-21T08:21:55.9839664Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9839935Z bar.sync 0; 2026-02-21T08:21:55.9840058Z // begin inline asm 2026-02-21T08:21:55.9840248Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r93], 4096; 2026-02-21T08:21:55.9840456Z // end inline asm 2026-02-21T08:21:55.9840697Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9840972Z bar.sync 0; 2026-02-21T08:21:55.9841102Z elect.sync %r189|%p74, -1; 2026-02-21T08:21:55.9841265Z and.pred %p58, %p1, %p74; 2026-02-21T08:21:55.9841414Z add.s32 %r117, %r30, 4096; 2026-02-21T08:21:55.9841567Z mov.b32 %r118, 32; 2026-02-21T08:21:55.9841698Z // begin inline asm 2026-02-21T08:21:55.9842026Z @%p58 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r208}], [%r93]; 2026-02-21T08:21:55.9842372Z // end inline asm 2026-02-21T08:21:55.9842612Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9842881Z bar.sync 0; 2026-02-21T08:21:55.9843010Z elect.sync %r190|%p75, -1; 2026-02-21T08:21:55.9843169Z and.pred %p59, %p1, %p75; 2026-02-21T08:21:55.9843318Z add.s32 %r121, %r30, 18432; 2026-02-21T08:21:55.9843471Z // begin inline asm 2026-02-21T08:21:55.9843777Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r212}], [%r93]; 2026-02-21T08:21:55.9844117Z // end inline asm 2026-02-21T08:21:55.9844356Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9844629Z bar.sync 0; 2026-02-21T08:21:55.9844781Z // begin inline asm 2026-02-21T08:21:55.9844966Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r94], 4096; 2026-02-21T08:21:55.9845183Z // end inline asm 2026-02-21T08:21:55.9845427Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9845708Z bar.sync 0; 2026-02-21T08:21:55.9845838Z elect.sync %r191|%p76, -1; 2026-02-21T08:21:55.9846002Z and.pred %p61, %p1, %p76; 2026-02-21T08:21:55.9846161Z add.s32 %r126, %r30, 6144; 2026-02-21T08:21:55.9846305Z mov.b32 %r127, 48; 2026-02-21T08:21:55.9846443Z // begin inline asm 2026-02-21T08:21:55.9846761Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r126], [%rd59, {%r127, %r208}], [%r94]; 2026-02-21T08:21:55.9847180Z // end inline asm 2026-02-21T08:21:55.9847417Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9847701Z bar.sync 0; 2026-02-21T08:21:55.9847835Z elect.sync %r192|%p77, -1; 2026-02-21T08:21:55.9847998Z and.pred %p62, %p1, %p77; 2026-02-21T08:21:55.9848158Z add.s32 %r130, %r30, 20480; 
2026-02-21T08:21:55.9848307Z // begin inline asm 2026-02-21T08:21:55.9848637Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r212}], [%r94]; 2026-02-21T08:21:55.9849002Z // end inline asm 2026-02-21T08:21:55.9849259Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9849548Z bar.sync 0; 2026-02-21T08:21:55.9849687Z // begin inline asm 2026-02-21T08:21:55.9849943Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r95], 4096; 2026-02-21T08:21:55.9850164Z // end inline asm 2026-02-21T08:21:55.9850412Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9850694Z bar.sync 0; 2026-02-21T08:21:55.9850837Z elect.sync %r193|%p78, -1; 2026-02-21T08:21:55.9850997Z and.pred %p64, %p1, %p78; 2026-02-21T08:21:55.9851162Z add.s32 %r135, %r30, 8192; 2026-02-21T08:21:55.9851313Z // begin inline asm 2026-02-21T08:21:55.9851648Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r34, %r208}], [%r95]; 2026-02-21T08:21:55.9852008Z // end inline asm 2026-02-21T08:21:55.9852252Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9852545Z bar.sync 0; 2026-02-21T08:21:55.9852684Z elect.sync %r194|%p79, -1; 2026-02-21T08:21:55.9852859Z and.pred %p65, %p1, %p79; 2026-02-21T08:21:55.9853021Z add.s32 %r139, %r30, 22528; 2026-02-21T08:21:55.9853189Z // begin inline asm 2026-02-21T08:21:55.9853522Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r34, %r212}], [%r95]; 2026-02-21T08:21:55.9853871Z // end inline asm 2026-02-21T08:21:55.9854127Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9854413Z bar.sync 0; 2026-02-21T08:21:55.9854548Z // begin inline asm 2026-02-21T08:21:55.9854763Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 4096; 
2026-02-21T08:21:55.9854993Z // end inline asm 2026-02-21T08:21:55.9855233Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9855519Z bar.sync 0; 2026-02-21T08:21:55.9855659Z elect.sync %r195|%p80, -1; 2026-02-21T08:21:55.9855821Z and.pred %p67, %p1, %p80; 2026-02-21T08:21:55.9855986Z add.s32 %r144, %r30, 10240; 2026-02-21T08:21:55.9856141Z mov.b32 %r145, 80; 2026-02-21T08:21:55.9856286Z // begin inline asm 2026-02-21T08:21:55.9856613Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r208}], [%r96]; 2026-02-21T08:21:55.9857010Z // end inline asm 2026-02-21T08:21:55.9857248Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9857521Z bar.sync 0; 2026-02-21T08:21:55.9857656Z elect.sync %r196|%p81, -1; 2026-02-21T08:21:55.9857809Z and.pred %p68, %p1, %p81; 2026-02-21T08:21:55.9857965Z add.s32 %r148, %r30, 24576; 2026-02-21T08:21:55.9858109Z // begin inline asm 2026-02-21T08:21:55.9858423Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r212}], [%r96]; 2026-02-21T08:21:55.9858766Z // end inline asm 2026-02-21T08:21:55.9859003Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9859286Z bar.sync 0; 2026-02-21T08:21:55.9859466Z // begin inline asm 2026-02-21T08:21:55.9859601Z 2026-02-21T08:21:55.9859710Z { 2026-02-21T08:21:55.9859833Z .reg .pred complete; 2026-02-21T08:21:55.9859971Z waitLoop: 2026-02-21T08:21:55.9860154Z mbarrier.try_wait.parity.shared.b64 complete, [%r91], %r40; 2026-02-21T08:21:55.9860377Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.9860530Z } 2026-02-21T08:21:55.9860593Z 2026-02-21T08:21:55.9860653Z // end inline asm 2026-02-21T08:21:55.9860881Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9861172Z setp.ne.b32 %p82, %r13, 0; 
2026-02-21T08:21:55.9861323Z @%p82 bra $L__BB0_3; 2026-02-21T08:21:55.9861466Z // %bb.2: 2026-02-21T08:21:55.9861684Z .loc 1 0 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:52 2026-02-21T08:21:55.9861964Z bfe.u32 %r201, %r103, 4, 14; 2026-02-21T08:21:55.9862117Z cvt.u64.u32 %rd57, %r201; 2026-02-21T08:21:55.9862339Z or.b64 %rd55, %rd57, -4611685949699522560; 2026-02-21T08:21:55.9862528Z bfe.u32 %r202, %r30, 4, 14; 2026-02-21T08:21:55.9862678Z cvt.u64.u32 %rd58, %r202; 2026-02-21T08:21:55.9862845Z or.b64 %rd54, %rd58, -4611685949699522560; 2026-02-21T08:21:55.9863131Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9863420Z elect.sync %r203|%p84, -1; 2026-02-21T08:21:55.9863574Z mov.b32 %r198, 68157456; 2026-02-21T08:21:55.9863731Z mov.pred %p83, 0; 2026-02-21T08:21:55.9863867Z // begin inline asm 2026-02-21T08:21:55.9864092Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd54, %rd55, %r198, %p83; 2026-02-21T08:21:55.9864342Z // end inline asm 2026-02-21T08:21:55.9864474Z add.s32 %r204, %r30, 28736; 2026-02-21T08:21:55.9864634Z cvt.u64.u32 %rd56, %r204; 2026-02-21T08:21:55.9864801Z // begin inline asm 2026-02-21T08:21:55.9865011Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T08:21:55.9865236Z // end inline asm 2026-02-21T08:21:55.9865377Z $L__BB0_3: 2026-02-21T08:21:55.9865600Z .loc 1 0 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:52 2026-02-21T08:21:55.9865909Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T08:21:55.9866101Z add.s32 %r4, %r168, %r166; 2026-02-21T08:21:55.9866251Z add.s32 %r308, %r177, %r175; 2026-02-21T08:21:55.9866410Z or.b32 %r7, %r212, %r155; 2026-02-21T08:21:55.9866554Z or.b32 %r10, %r9, 16; 2026-02-21T08:21:55.9866701Z or.b32 %r11, %r9, 32; 2026-02-21T08:21:55.9866838Z or.b32 %r12, %r9, 48; 2026-02-21T08:21:55.9867082Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 
2026-02-21T08:21:55.9867351Z bar.sync 0; 2026-02-21T08:21:55.9867483Z // begin inline asm 2026-02-21T08:21:55.9867680Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r205], 4096; 2026-02-21T08:21:55.9867892Z // end inline asm 2026-02-21T08:21:55.9868134Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9868402Z bar.sync 0; 2026-02-21T08:21:55.9868537Z elect.sync %r219|%p90, -1; 2026-02-21T08:21:55.9868693Z and.pred %p87, %p1, %p90; 2026-02-21T08:21:55.9868848Z add.s32 %r206, %r30, 12288; 2026-02-21T08:21:55.9868992Z mov.b32 %r207, 96; 2026-02-21T08:21:55.9869129Z // begin inline asm 2026-02-21T08:21:55.9869457Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r206], [%rd59, {%r207, %r208}], [%r205]; 2026-02-21T08:21:55.9869810Z // end inline asm 2026-02-21T08:21:55.9870051Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9870317Z bar.sync 0; 2026-02-21T08:21:55.9870451Z elect.sync %r220|%p91, -1; 2026-02-21T08:21:55.9870604Z and.pred %p88, %p1, %p91; 2026-02-21T08:21:55.9870759Z add.s32 %r210, %r30, 26624; 2026-02-21T08:21:55.9870904Z // begin inline asm 2026-02-21T08:21:55.9871222Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r210], [%rd60, {%r207, %r212}], [%r205]; 2026-02-21T08:21:55.9871617Z // end inline asm 2026-02-21T08:21:55.9871743Z mov.b32 %r403, 1; 2026-02-21T08:21:55.9871879Z mov.b32 %r402, 6; 2026-02-21T08:21:55.9872006Z mov.b32 %r398, 0; 2026-02-21T08:21:55.9872141Z mov.b32 %r400, %r398; 2026-02-21T08:21:55.9872279Z mov.b32 %r401, %r398; 2026-02-21T08:21:55.9872425Z mov.b32 %r404, %r398; 2026-02-21T08:21:55.9872560Z mov.b32 %r405, %r398; 2026-02-21T08:21:55.9872701Z bra.uni $L__BB0_4; 2026-02-21T08:21:55.9872893Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.9873204Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 
2026-02-21T08:21:55.9873499Z setp.lt.u32 %p100, %r405, 1936; 2026-02-21T08:21:55.9873816Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9874099Z // begin inline asm 2026-02-21T08:21:55.9874231Z 2026-02-21T08:21:55.9874347Z { 2026-02-21T08:21:55.9874461Z .reg .pred complete; 2026-02-21T08:21:55.9874606Z waitLoop: 2026-02-21T08:21:55.9874829Z mbarrier.try_wait.parity.shared.b64 complete, [%r399], %r398; 2026-02-21T08:21:55.9875057Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.9875215Z } 2026-02-21T08:21:55.9875219Z 2026-02-21T08:21:55.9875273Z // end inline asm 2026-02-21T08:21:55.9875330Z add.s32 %r250, %r403, 1; 2026-02-21T08:21:55.9875400Z setp.gt.s32 %p103, %r250, 1; 2026-02-21T08:21:55.9875464Z selp.b32 %r403, 0, %r250, %p103; 2026-02-21T08:21:55.9875523Z selp.b32 %r251, 1, 0, %p103; 2026-02-21T08:21:55.9875578Z xor.b32 %r404, %r260, %r251; 2026-02-21T08:21:55.9875751Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9875807Z add.s32 %r252, %r402, 1; 2026-02-21T08:21:55.9875865Z setp.gt.s32 %p104, %r252, 6; 2026-02-21T08:21:55.9875937Z selp.b32 %r402, 0, %r252, %p104; 2026-02-21T08:21:55.9875995Z shl.b32 %r253, %r402, 3; 2026-02-21T08:21:55.9876051Z add.s32 %r255, %r30, %r253; 2026-02-21T08:21:55.9876113Z add.s32 %r245, %r255, 28672; 2026-02-21T08:21:55.9876165Z bar.sync 0; 2026-02-21T08:21:55.9876225Z and.pred %p97, %p110, %p100; 2026-02-21T08:21:55.9876278Z // begin inline asm 2026-02-21T08:21:55.9876392Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r245], 4096; 2026-02-21T08:21:55.9876444Z // end inline asm 2026-02-21T08:21:55.9876601Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9876663Z shl.b32 %r256, %r402, 11; 2026-02-21T08:21:55.9876719Z add.s32 %r242, %r30, %r256; 2026-02-21T08:21:55.9876770Z bar.sync 0; 2026-02-21T08:21:55.9876831Z elect.sync %r257|%p105, -1; 2026-02-21T08:21:55.9876901Z 
and.pred %p106, %p100, %p105; 2026-02-21T08:21:55.9876962Z and.pred %p98, %p1, %p106; 2026-02-21T08:21:55.9877019Z add.s32 %r243, %r405, 112; 2026-02-21T08:21:55.9877083Z // begin inline asm 2026-02-21T08:21:55.9877313Z @%p98 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd59, {%r243, %r208}], [%r245]; 2026-02-21T08:21:55.9877365Z // end inline asm 2026-02-21T08:21:55.9877534Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9877590Z add.s32 %r246, %r242, 14336; 2026-02-21T08:21:55.9877641Z bar.sync 0; 2026-02-21T08:21:55.9877701Z elect.sync %r258|%p107, -1; 2026-02-21T08:21:55.9877769Z and.pred %p108, %p100, %p107; 2026-02-21T08:21:55.9877828Z and.pred %p99, %p1, %p108; 2026-02-21T08:21:55.9877881Z // begin inline asm 2026-02-21T08:21:55.9878117Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd60, {%r243, %r212}], [%r245]; 2026-02-21T08:21:55.9878170Z // end inline asm 2026-02-21T08:21:55.9878334Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9878455Z setp.lt.u32 %p109, %r405, 2016; 2026-02-21T08:21:55.9878514Z add.s32 %r405, %r405, 16; 2026-02-21T08:21:55.9878572Z mov.b32 %r398, %r260; 2026-02-21T08:21:55.9878628Z mov.b32 %r399, %r259; 2026-02-21T08:21:55.9878694Z @%p109 bra $L__BB0_4; 2026-02-21T08:21:55.9878750Z bra.uni $L__BB0_7; 2026-02-21T08:21:55.9878855Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:21:55.9879023Z .loc 1 0 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:0:42 2026-02-21T08:21:55.9879079Z mov.b32 %r260, %r404; 2026-02-21T08:21:55.9879245Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9879310Z add.s32 %r223, %r401, 1; 2026-02-21T08:21:55.9879372Z setp.gt.s32 %p93, %r223, 6; 2026-02-21T08:21:55.9879433Z selp.b32 %r401, 0, %r223, %p93; 2026-02-21T08:21:55.9879538Z selp.b32 
%r224, 1, 0, %p93; 2026-02-21T08:21:55.9879606Z xor.b32 %r400, %r400, %r224; 2026-02-21T08:21:55.9879661Z shl.b32 %r225, %r401, 3; 2026-02-21T08:21:55.9879716Z add.s32 %r227, %r30, %r225; 2026-02-21T08:21:55.9879778Z add.s32 %r221, %r227, 28672; 2026-02-21T08:21:55.9879828Z bar.sync 0; 2026-02-21T08:21:55.9879882Z // begin inline asm 2026-02-21T08:21:55.9879929Z 2026-02-21T08:21:55.9879985Z { 2026-02-21T08:21:55.9880042Z .reg .pred complete; 2026-02-21T08:21:55.9880093Z waitLoop: 2026-02-21T08:21:55.9880217Z mbarrier.try_wait.parity.shared.b64 complete, [%r221], %r400; 2026-02-21T08:21:55.9880278Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.9880325Z } 2026-02-21T08:21:55.9880328Z 2026-02-21T08:21:55.9880388Z // end inline asm 2026-02-21T08:21:55.9880442Z shl.b32 %r228, %r403, 3; 2026-02-21T08:21:55.9880496Z add.s32 %r229, %r30, %r228; 2026-02-21T08:21:55.9880551Z add.s32 %r259, %r229, 28736; 2026-02-21T08:21:55.9880719Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9880778Z @%p82 bra $L__BB0_6; 2026-02-21T08:21:55.9880874Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:21:55.9881042Z .loc 1 51 31 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:51:31 2026-02-21T08:21:55.9881098Z shl.b32 %r232, %r401, 11; 2026-02-21T08:21:55.9881154Z add.s32 %r234, %r30, %r232; 2026-02-21T08:21:55.9881327Z .loc 1 52 44 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:52:44 2026-02-21T08:21:55.9881385Z add.s32 %r235, %r234, 14336; 2026-02-21T08:21:55.9881555Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9881625Z elect.sync %r236|%p95, -1; 2026-02-21T08:21:55.9881691Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:21:55.9881748Z cvt.u64.u32 %rd64, %r237; 2026-02-21T08:21:55.9881818Z or.b64 %rd61, %rd64, -4611685949699522560; 2026-02-21T08:21:55.9881884Z bfe.u32 %r238, %r235, 4, 14; 2026-02-21T08:21:55.9881943Z cvt.u64.u32 %rd65, %r238; 
2026-02-21T08:21:55.9882012Z or.b64 %rd62, %rd65, -4611685949699522560; 2026-02-21T08:21:55.9882066Z mov.b32 %r231, 68157456; 2026-02-21T08:21:55.9882130Z mov.pred %p94, -1; 2026-02-21T08:21:55.9882183Z // begin inline asm 2026-02-21T08:21:55.9882322Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r397 + 0 ], %rd61, %rd62, %r231, %p94; 2026-02-21T08:21:55.9882382Z // end inline asm 2026-02-21T08:21:55.9882438Z cvt.u64.u32 %rd63, %r259; 2026-02-21T08:21:55.9882491Z // begin inline asm 2026-02-21T08:21:55.9882619Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T08:21:55.9882671Z // end inline asm 2026-02-21T08:21:55.9882726Z bra.uni $L__BB0_6; 2026-02-21T08:21:55.9882819Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:21:55.9882998Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9883053Z // begin inline asm 2026-02-21T08:21:55.9883158Z 2026-02-21T08:21:55.9883214Z { 2026-02-21T08:21:55.9883270Z .reg .pred complete; 2026-02-21T08:21:55.9883322Z waitLoop: 2026-02-21T08:21:55.9883433Z mbarrier.try_wait.parity.shared.b64 complete, [%r259], %r260; 2026-02-21T08:21:55.9883500Z @!complete bra.uni waitLoop; 2026-02-21T08:21:55.9883547Z } 2026-02-21T08:21:55.9883550Z 2026-02-21T08:21:55.9883602Z // end inline asm 2026-02-21T08:21:55.9883767Z .loc 1 47 42 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:47:42 2026-02-21T08:21:55.9883820Z bar.sync 0; 2026-02-21T08:21:55.9883874Z // begin inline asm 2026-02-21T08:21:55.9883961Z @%p110 mbarrier.inval.shared::cta.b64 [%r91]; 2026-02-21T08:21:55.9884014Z // end inline asm 2026-02-21T08:21:55.9884066Z bar.sync 0; 2026-02-21T08:21:55.9884118Z // begin inline asm 2026-02-21T08:21:55.9884204Z @%p110 mbarrier.inval.shared::cta.b64 [%r92]; 2026-02-21T08:21:55.9884295Z // end inline asm 2026-02-21T08:21:55.9884352Z bar.sync 0; 2026-02-21T08:21:55.9884415Z // begin inline asm 2026-02-21T08:21:55.9884493Z @%p110 mbarrier.inval.shared::cta.b64 
[%r93]; 2026-02-21T08:21:55.9884546Z // end inline asm 2026-02-21T08:21:55.9884597Z bar.sync 0; 2026-02-21T08:21:55.9884658Z // begin inline asm 2026-02-21T08:21:55.9884757Z @%p110 mbarrier.inval.shared::cta.b64 [%r94]; 2026-02-21T08:21:55.9884808Z // end inline asm 2026-02-21T08:21:55.9884867Z bar.sync 0; 2026-02-21T08:21:55.9884920Z // begin inline asm 2026-02-21T08:21:55.9884994Z @%p110 mbarrier.inval.shared::cta.b64 [%r95]; 2026-02-21T08:21:55.9885051Z // end inline asm 2026-02-21T08:21:55.9885101Z bar.sync 0; 2026-02-21T08:21:55.9885153Z // begin inline asm 2026-02-21T08:21:55.9885225Z @%p110 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T08:21:55.9885284Z // end inline asm 2026-02-21T08:21:55.9885335Z bar.sync 0; 2026-02-21T08:21:55.9885387Z // begin inline asm 2026-02-21T08:21:55.9885476Z @%p110 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:21:55.9885531Z // end inline asm 2026-02-21T08:21:55.9885587Z add.s32 %r268, %r30, 28736; 2026-02-21T08:21:55.9885640Z // begin inline asm 2026-02-21T08:21:55.9885724Z @%p110 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T08:21:55.9885776Z // end inline asm 2026-02-21T08:21:55.9885827Z bar.sync 0; 2026-02-21T08:21:55.9885886Z // begin inline asm 2026-02-21T08:21:55.9885959Z @%p110 mbarrier.inval.shared::cta.b64 [%r90]; 2026-02-21T08:21:55.9886012Z // end inline asm 2026-02-21T08:21:55.9886177Z .loc 1 56 45 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:45 2026-02-21T08:21:55.9886235Z shl.b32 %r341, %r9, 12; 2026-02-21T08:21:55.9886290Z shl.b32 %r342, %r10, 12; 2026-02-21T08:21:55.9886346Z shl.b32 %r343, %r11, 12; 2026-02-21T08:21:55.9886408Z shl.b32 %r344, %r12, 12; 2026-02-21T08:21:55.9886570Z .loc 1 56 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:52 2026-02-21T08:21:55.9886628Z or.b32 %r345, %r7, %r341; 2026-02-21T08:21:55.9886694Z or.b32 %r346, %r7, %r342; 2026-02-21T08:21:55.9886748Z or.b32 %r347, %r7, %r343; 2026-02-21T08:21:55.9886801Z or.b32 %r348, %r7, %r344; 
2026-02-21T08:21:55.9886962Z .loc 1 56 24 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:24 2026-02-21T08:21:55.9887035Z mad.wide.u32 %rd68, %r345, 2, %rd3; 2026-02-21T08:21:55.9887101Z mad.wide.u32 %rd69, %r346, 2, %rd3; 2026-02-21T08:21:55.9887163Z mad.wide.u32 %rd70, %r347, 2, %rd3; 2026-02-21T08:21:55.9887229Z mad.wide.u32 %rd71, %r348, 2, %rd3; 2026-02-21T08:21:55.9887389Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9887444Z // begin inline asm 2026-02-21T08:21:55.9887747Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285}, [%r303 + 0], 32; 2026-02-21T08:21:55.9887804Z // end inline asm 2026-02-21T08:21:55.9887915Z // begin inline asm 2026-02-21T08:21:55.9888212Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302}, [%r303 + 16], 32; 2026-02-21T08:21:55.9888266Z // end inline asm 2026-02-21T08:21:55.9888320Z // begin inline asm 2026-02-21T08:21:55.9888388Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:21:55.9888449Z // end inline asm 2026-02-21T08:21:55.9888506Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:21:55.9888562Z cvt.u64.u32 %rd73, %r271; 2026-02-21T08:21:55.9888627Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:21:55.9888685Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:21:55.9888847Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9888917Z mov.b64 {%r349, %r350}, %rd75; 2026-02-21T08:21:55.9888984Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T08:21:55.9889198Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9889261Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:21:55.9889331Z cvt.u64.u32 %rd77, %r273; 2026-02-21T08:21:55.9889389Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:21:55.9889449Z or.b64 %rd79, 
%rd76, %rd78; 2026-02-21T08:21:55.9889626Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9889686Z mov.b64 {%r352, %r353}, %rd79; 2026-02-21T08:21:55.9889750Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T08:21:55.9889917Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9889972Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:21:55.9890027Z cvt.u64.u32 %rd81, %r275; 2026-02-21T08:21:55.9890082Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:21:55.9890146Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:21:55.9890306Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9890367Z mov.b64 {%r355, %r356}, %rd83; 2026-02-21T08:21:55.9890434Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T08:21:55.9890589Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9890643Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:21:55.9890697Z cvt.u64.u32 %rd85, %r277; 2026-02-21T08:21:55.9890758Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:21:55.9890817Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:21:55.9890982Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9891048Z mov.b64 {%r358, %r359}, %rd87; 2026-02-21T08:21:55.9891113Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T08:21:55.9891280Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9891344Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:21:55.9891403Z cvt.u64.u32 %rd89, %r279; 2026-02-21T08:21:55.9891463Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:21:55.9891522Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:21:55.9891693Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9891753Z mov.b64 {%r361, %r362}, %rd91; 2026-02-21T08:21:55.9891814Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T08:21:55.9891987Z .loc 1 53 52 // 
c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9892044Z cvt.u64.u32 %rd92, %r280; 2026-02-21T08:21:55.9892101Z cvt.u64.u32 %rd93, %r281; 2026-02-21T08:21:55.9892164Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:21:55.9892223Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:21:55.9892389Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9892449Z mov.b64 {%r364, %r365}, %rd95; 2026-02-21T08:21:55.9892522Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T08:21:55.9892688Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9892791Z cvt.u64.u32 %rd96, %r282; 2026-02-21T08:21:55.9892858Z cvt.u64.u32 %rd97, %r283; 2026-02-21T08:21:55.9892916Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:21:55.9892974Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:21:55.9893149Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9893208Z mov.b64 {%r367, %r368}, %rd99; 2026-02-21T08:21:55.9893272Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T08:21:55.9893438Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9893506Z cvt.u64.u32 %rd100, %r284; 2026-02-21T08:21:55.9893564Z cvt.u64.u32 %rd101, %r285; 2026-02-21T08:21:55.9893622Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:21:55.9893688Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:21:55.9893897Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9893963Z mov.b64 {%r370, %r371}, %rd103; 2026-02-21T08:21:55.9894033Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T08:21:55.9894193Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9894252Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:21:55.9894309Z cvt.u64.u32 %rd105, %r288; 2026-02-21T08:21:55.9894374Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:21:55.9894433Z or.b64 %rd107, %rd104, %rd106; 
2026-02-21T08:21:55.9894593Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9894662Z mov.b64 {%r373, %r374}, %rd107; 2026-02-21T08:21:55.9894757Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T08:21:55.9894920Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9894985Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:21:55.9895046Z cvt.u64.u32 %rd109, %r290; 2026-02-21T08:21:55.9895104Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:21:55.9895162Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:21:55.9895333Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9895393Z mov.b64 {%r376, %r377}, %rd111; 2026-02-21T08:21:55.9895455Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T08:21:55.9895624Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9895684Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:21:55.9895743Z cvt.u64.u32 %rd113, %r292; 2026-02-21T08:21:55.9895808Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:21:55.9895866Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:21:55.9896030Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9896092Z mov.b64 {%r379, %r380}, %rd115; 2026-02-21T08:21:55.9896165Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T08:21:55.9896329Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9896387Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:21:55.9896452Z cvt.u64.u32 %rd117, %r294; 2026-02-21T08:21:55.9896509Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:21:55.9896568Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:21:55.9896737Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9896796Z mov.b64 {%r382, %r383}, %rd119; 2026-02-21T08:21:55.9896858Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T08:21:55.9897017Z .loc 1 53 52 // 
c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9897083Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:21:55.9897140Z cvt.u64.u32 %rd121, %r296; 2026-02-21T08:21:55.9897200Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:21:55.9897268Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:21:55.9897487Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9897550Z mov.b64 {%r385, %r386}, %rd123; 2026-02-21T08:21:55.9897623Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T08:21:55.9897791Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9897851Z cvt.u64.u32 %rd124, %r297; 2026-02-21T08:21:55.9897909Z cvt.u64.u32 %rd125, %r298; 2026-02-21T08:21:55.9897978Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:21:55.9898038Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:21:55.9898205Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9898272Z mov.b64 {%r388, %r389}, %rd127; 2026-02-21T08:21:55.9898335Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T08:21:55.9898564Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9898634Z cvt.u64.u32 %rd128, %r299; 2026-02-21T08:21:55.9898702Z cvt.u64.u32 %rd129, %r300; 2026-02-21T08:21:55.9898758Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:21:55.9898814Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:21:55.9898977Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9899033Z mov.b64 {%r391, %r392}, %rd131; 2026-02-21T08:21:55.9899093Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T08:21:55.9899256Z .loc 1 53 52 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:53:52 2026-02-21T08:21:55.9899313Z cvt.u64.u32 %rd132, %r301; 2026-02-21T08:21:55.9899368Z cvt.u64.u32 %rd133, %r302; 2026-02-21T08:21:55.9899432Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:21:55.9899489Z or.b64 %rd135, %rd132, 
%rd134; 2026-02-21T08:21:55.9899645Z .loc 1 55 27 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:55:27 2026-02-21T08:21:55.9899704Z mov.b64 {%r394, %r395}, %rd135; 2026-02-21T08:21:55.9899771Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T08:21:55.9899929Z .loc 1 56 82 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:56:82 2026-02-21T08:21:55.9900019Z st.shared.v4.b32 [%r4], {%r351, %r363, %r375, %r387}; 2026-02-21T08:21:55.9900080Z bar.sync 0; 2026-02-21T08:21:55.9900136Z // begin inline asm 2026-02-21T08:21:55.9900284Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r328, %r332, %r336}, [%r308]; 2026-02-21T08:21:55.9900343Z // end inline asm 2026-02-21T08:21:55.9900395Z bar.sync 0; 2026-02-21T08:21:55.9900483Z st.shared.v4.b32 [%r4], {%r354, %r366, %r378, %r390}; 2026-02-21T08:21:55.9900534Z bar.sync 0; 2026-02-21T08:21:55.9900596Z // begin inline asm 2026-02-21T08:21:55.9900742Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r329, %r333, %r337}, [%r308]; 2026-02-21T08:21:55.9900798Z // end inline asm 2026-02-21T08:21:55.9900857Z bar.sync 0; 2026-02-21T08:21:55.9900944Z st.shared.v4.b32 [%r4], {%r357, %r369, %r381, %r393}; 2026-02-21T08:21:55.9900996Z bar.sync 0; 2026-02-21T08:21:55.9901049Z // begin inline asm 2026-02-21T08:21:55.9901197Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r330, %r334, %r338}, [%r308]; 2026-02-21T08:21:55.9901250Z // end inline asm 2026-02-21T08:21:55.9901300Z bar.sync 0; 2026-02-21T08:21:55.9901388Z st.shared.v4.b32 [%r4], {%r360, %r372, %r384, %r396}; 2026-02-21T08:21:55.9901439Z bar.sync 0; 2026-02-21T08:21:55.9901492Z // begin inline asm 2026-02-21T08:21:55.9901637Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r331, %r335, %r339}, [%r308]; 2026-02-21T08:21:55.9901690Z // end inline asm 2026-02-21T08:21:55.9901744Z // begin inline asm 2026-02-21T08:21:55.9901844Z st.global.v4.b32 [ %rd68 + 0 ], { %r324, %r325, %r326, %r327 }; 2026-02-21T08:21:55.9901905Z // end inline asm 
2026-02-21T08:21:55.9901957Z // begin inline asm 2026-02-21T08:21:55.9902054Z st.global.v4.b32 [ %rd69 + 0 ], { %r328, %r329, %r330, %r331 }; 2026-02-21T08:21:55.9902156Z // end inline asm 2026-02-21T08:21:55.9902209Z // begin inline asm 2026-02-21T08:21:55.9902302Z st.global.v4.b32 [ %rd70 + 0 ], { %r332, %r333, %r334, %r335 }; 2026-02-21T08:21:55.9902353Z // end inline asm 2026-02-21T08:21:55.9902414Z // begin inline asm 2026-02-21T08:21:55.9902503Z st.global.v4.b32 [ %rd71 + 0 ], { %r336, %r337, %r338, %r339 }; 2026-02-21T08:21:55.9902555Z // end inline asm 2026-02-21T08:21:55.9902637Z $L__BB0_8: // %._crit_edge 2026-02-21T08:21:55.9902795Z .loc 1 28 4 // c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py:28:4 2026-02-21T08:21:55.9902847Z bar.sync 0; 2026-02-21T08:21:55.9902907Z // begin inline asm 2026-02-21T08:21:55.9903018Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r397, 64; 2026-02-21T08:21:55.9903070Z // end inline asm 2026-02-21T08:21:55.9903119Z ret; 2026-02-21T08:21:55.9903277Z $L__tmp1: 2026-02-21T08:21:55.9903333Z $L__func_end0: 2026-02-21T08:21:55.9903415Z // -- End function 2026-02-21T08:21:55.9903469Z } 2026-02-21T08:21:55.9903661Z .file 1 "/tmp/torchinductor_root/4k/c4k3cgrxvh437qqc4unft6rxv4gfefx47j4ho37nthxhasfzxwhj.py" 2026-02-21T08:21:55.9903720Z .section .debug_abbrev 2026-02-21T08:21:55.9903768Z { 2026-02-21T08:21:55.9903858Z .b8 1 // Abbreviation Code 2026-02-21T08:21:55.9903940Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:21:55.9904016Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:21:55.9904100Z .b8 37 // DW_AT_producer 2026-02-21T08:21:55.9904171Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.9904241Z .b8 19 // DW_AT_language 2026-02-21T08:21:55.9904322Z .b8 5 // DW_FORM_data2 2026-02-21T08:21:55.9904397Z .b8 3 // DW_AT_name 2026-02-21T08:21:55.9904470Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.9904545Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:21:55.9904627Z .b8 6 // DW_FORM_data4 2026-02-21T08:21:55.9904721Z .b8 27 // 
DW_AT_comp_dir 2026-02-21T08:21:55.9904790Z .b8 8 // DW_FORM_string 2026-02-21T08:21:55.9904866Z .b8 0 // EOM(1) 2026-02-21T08:21:55.9904932Z .b8 0 // EOM(2) 2026-02-21T08:21:55.9904998Z .b8 0 // EOM(3) 2026-02-21T08:21:55.9905054Z } 2026-02-21T08:21:55.9905111Z .section .debug_info 2026-02-21T08:21:55.9905158Z { 2026-02-21T08:21:55.9905238Z .b32 104 // Length of Unit 2026-02-21T08:21:55.9905326Z .b8 2 // DWARF version number 2026-02-21T08:21:55.9905379Z .b8 0 2026-02-21T08:21:55.9905494Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:21:55.9905588Z .b8 8 // Address Size (in bytes) 2026-02-21T08:21:55.9905685Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:21:55.9905762Z .b8 116 // DW_AT_producer 2026-02-21T08:21:55.9905824Z .b8 114 2026-02-21T08:21:55.9905878Z .b8 105 2026-02-21T08:21:55.9905930Z .b8 116 2026-02-21T08:21:55.9905980Z .b8 111 2026-02-21T08:21:55.9906039Z .b8 110 2026-02-21T08:21:55.9906088Z .b8 0 2026-02-21T08:21:55.9906157Z .b8 2 // DW_AT_language 2026-02-21T08:21:55.9906217Z .b8 0 2026-02-21T08:21:55.9906291Z .b8 99 // DW_AT_name 2026-02-21T08:21:55.9906344Z .b8 52 2026-02-21T08:21:55.9906395Z .b8 107 2026-02-21T08:21:55.9906459Z .b8 51 2026-02-21T08:21:55.9906511Z .b8 99 2026-02-21T08:21:55.9906563Z .b8 103 2026-02-21T08:21:55.9906611Z .b8 114 2026-02-21T08:21:55.9906720Z .b8 120 2026-02-21T08:21:55.9906770Z .b8 118 2026-02-21T08:21:55.9906818Z .b8 104 2026-02-21T08:21:55.9906871Z .b8 52 2026-02-21T08:21:55.9906918Z .b8 51 2026-02-21T08:21:55.9906966Z .b8 55 2026-02-21T08:21:55.9907013Z .b8 113 2026-02-21T08:21:55.9907069Z .b8 113 2026-02-21T08:21:55.9907118Z .b8 99 2026-02-21T08:21:55.9907166Z .b8 52 2026-02-21T08:21:55.9907221Z .b8 117 2026-02-21T08:21:55.9907268Z .b8 110 2026-02-21T08:21:55.9907317Z .b8 102 2026-02-21T08:21:55.9907365Z .b8 116 2026-02-21T08:21:55.9907420Z .b8 54 2026-02-21T08:21:55.9907468Z .b8 114 2026-02-21T08:21:55.9907516Z .b8 120 2026-02-21T08:21:55.9907570Z .b8 118 
2026-02-21T08:21:55.9907619Z .b8 52 2026-02-21T08:21:55.9907667Z .b8 103 2026-02-21T08:21:55.9907715Z .b8 102 2026-02-21T08:21:55.9907772Z .b8 101 2026-02-21T08:21:55.9907820Z .b8 102 2026-02-21T08:21:55.9907868Z .b8 120 2026-02-21T08:21:55.9907916Z .b8 52 2026-02-21T08:21:55.9907972Z .b8 55 2026-02-21T08:21:55.9908075Z .b8 106 2026-02-21T08:21:55.9908127Z .b8 52 2026-02-21T08:21:55.9908182Z .b8 104 2026-02-21T08:21:55.9908231Z .b8 111 2026-02-21T08:21:55.9908279Z .b8 51 2026-02-21T08:21:55.9908326Z .b8 55 2026-02-21T08:21:55.9908383Z .b8 110 2026-02-21T08:21:55.9908431Z .b8 116 2026-02-21T08:21:55.9908480Z .b8 104 2026-02-21T08:21:55.9908534Z .b8 120 2026-02-21T08:21:55.9908583Z .b8 104 2026-02-21T08:21:55.9908631Z .b8 97 2026-02-21T08:21:55.9908678Z .b8 115 2026-02-21T08:21:55.9908733Z .b8 102 2026-02-21T08:21:55.9908781Z .b8 122 2026-02-21T08:21:55.9908829Z .b8 120 2026-02-21T08:21:55.9908877Z .b8 119 2026-02-21T08:21:55.9908933Z .b8 104 2026-02-21T08:21:55.9908980Z .b8 106 2026-02-21T08:21:55.9909029Z .b8 46 2026-02-21T08:21:55.9909083Z .b8 112 2026-02-21T08:21:55.9909130Z .b8 121 2026-02-21T08:21:55.9909177Z .b8 0 2026-02-21T08:21:55.9909264Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:21:55.9909342Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:21:55.9909392Z .b8 116 2026-02-21T08:21:55.9909440Z .b8 109 2026-02-21T08:21:55.9909496Z .b8 112 2026-02-21T08:21:55.9909543Z .b8 47 2026-02-21T08:21:55.9909591Z .b8 116 2026-02-21T08:21:55.9909638Z .b8 111 2026-02-21T08:21:55.9909694Z .b8 114 2026-02-21T08:21:55.9909742Z .b8 99 2026-02-21T08:21:55.9909789Z .b8 104 2026-02-21T08:21:55.9909843Z .b8 105 2026-02-21T08:21:55.9909891Z .b8 110 2026-02-21T08:21:55.9909938Z .b8 100 2026-02-21T08:21:55.9909986Z .b8 117 2026-02-21T08:21:55.9910041Z .b8 99 2026-02-21T08:21:55.9910088Z .b8 116 2026-02-21T08:21:55.9910136Z .b8 111 2026-02-21T08:21:55.9910190Z .b8 114 2026-02-21T08:21:55.9910238Z .b8 95 2026-02-21T08:21:55.9910285Z .b8 114 2026-02-21T08:21:55.9910333Z .b8 111 
2026-02-21T08:21:55.9910388Z .b8 111 2026-02-21T08:21:55.9910437Z .b8 116 2026-02-21T08:21:55.9910484Z .b8 47 2026-02-21T08:21:55.9910533Z .b8 52 2026-02-21T08:21:55.9910588Z .b8 107 2026-02-21T08:21:55.9910637Z .b8 0 2026-02-21T08:21:55.9910685Z } 2026-02-21T08:21:55.9910756Z .section .debug_macinfo { } 2026-02-21T08:21:55.9910762Z 2026-02-21T08:21:55.9910837Z ================================================================ 2026-02-21T08:21:55.9910935Z please share the reproducer above with Triton project. 2026-02-21T08:21:55.9911044Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:21:55.9911048Z 2026-02-21T08:21:55.9911426Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4krkxsnj.ptx -o /tmp/tmp4krkxsnj.ptx.o 2026-02-21T08:21:55.9911430Z 2026-02-21T08:21:55.9911556Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:21:55.9911560Z 2026-02-21T08:21:55.9911965Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 17.9 configs/s 2026-02-21T08:21:56.9140089Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1047.5 2026-02-21T08:21:56.9140895Z configs/s 2026-02-21T08:21:56.9806200Z [28s] Generation 1 complete: 2026-02-21T08:21:56.9811289Z error=19 2026-02-21T08:21:56.9812758Z ok=71 2026-02-21T08:21:56.9812922Z min=0.0778 2026-02-21T08:21:56.9813070Z mid=0.5181 2026-02-21T08:21:56.9813208Z max=3.8892 2026-02-21T08:21:56.9813349Z best={'block_sizes': [64, 128, 32], 2026-02-21T08:21:56.9813591Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:21:56.9813810Z 'l2_groupings': [4], 2026-02-21T08:21:56.9813976Z 'load_eviction_policies': ['', ''], 2026-02-21T08:21:56.9814153Z 'loop_orders': [[1, 0]], 2026-02-21T08:21:56.9814305Z 'num_stages': 8, 2026-02-21T08:21:56.9814439Z 'num_warps': 4, 2026-02-21T08:21:56.9814579Z 'pid_type': 'flat', 2026-02-21T08:21:56.9815416Z 'range_flattens': 
[None, None], 2026-02-21T08:21:56.9815601Z 'range_multi_buffers': [None, None], 2026-02-21T08:21:56.9815789Z 'range_num_stages': [0, 0], 2026-02-21T08:21:56.9815950Z 'range_unroll_factors': [0, 0], 2026-02-21T08:21:56.9816132Z 'range_warp_specializes': [None, None]} 2026-02-21T08:21:56.9822788Z [28s] Fitting surrogate: 190 points, 190 targets 2026-02-21T08:21:58.2801450Z [29s] Generation 2 starting: 87 neighbors, 5 active search path(s) 2026-02-21T08:22:01.6981906Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90/90 63.0 configs/s 2026-02-21T08:22:03.1909402Z 2026-02-21T08:22:03.1909453Z 2026-02-21T08:22:03.1909870Z ================================================================ 2026-02-21T08:22:03.1910251Z Internal Triton PTX codegen error 2026-02-21T08:22:03.1910445Z `ptxas` stderr: 2026-02-21T08:22:03.1910920Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 189 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:22:03.1911480Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:22:03.1911644Z 2026-02-21T08:22:03.1912107Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp32gw7rai.ptx -o /tmp/tmp32gw7rai.ptx.o 2026-02-21T08:22:03.1912581Z 2026-02-21T08:22:03.1912585Z 2026-02-21T08:22:03.1912654Z // 2026-02-21T08:22:03.1912794Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:22:03.1912972Z // 2026-02-21T08:22:03.1913036Z 2026-02-21T08:22:03.1913091Z .version 8.7 2026-02-21T08:22:03.1913231Z .target sm_100a 2026-02-21T08:22:03.1913360Z .address_size 64 2026-02-21T08:22:03.1913453Z 2026-02-21T08:22:03.1913574Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:22:03.1913841Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:22:03.1914054Z // @_helion_matmul 2026-02-21T08:22:03.1914262Z .visible .entry _helion_matmul( 2026-02-21T08:22:03.1914476Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:22:03.1916399Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:22:03.1916649Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:22:03.1916910Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:22:03.1917174Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:22:03.1917371Z ) 2026-02-21T08:22:03.1917494Z .reqntid 384 2026-02-21T08:22:03.1917619Z .maxnreg 32 2026-02-21T08:22:03.1917742Z { 2026-02-21T08:22:03.1917861Z .reg .pred %p<105>; 2026-02-21T08:22:03.1918011Z .reg .b16 %rs<3>; 2026-02-21T08:22:03.1918160Z .reg .b32 %r<1606>; 2026-02-21T08:22:03.1918301Z .reg .b64 %rd<685>; 2026-02-21T08:22:03.1918555Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19:0 2026-02-21T08:22:03.1918833Z $L__func_begin0: 2026-02-21T08:22:03.1919073Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19:0 
2026-02-21T08:22:03.1919293Z 2026-02-21T08:22:03.1919345Z // %bb.0: 2026-02-21T08:22:03.1919498Z ld.param.b64 %rd9, [_helion_matmul_param_3]; 2026-02-21T08:22:03.1919677Z $L__tmp0: 2026-02-21T08:22:03.1919910Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19 2026-02-21T08:22:03.1920555Z mov.u32 %r1, %tid.x; 2026-02-21T08:22:03.1920696Z shr.u32 %r2, %r1, 5; 2026-02-21T08:22:03.1920854Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:22:03.1921033Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:22:03.1921189Z @%p2 bra $L__BB0_14; 2026-02-21T08:22:03.1921324Z bra.uni $L__BB0_1; 2026-02-21T08:22:03.1921465Z $L__BB0_14: 2026-02-21T08:22:03.1921693Z .loc 1 0 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:0:0 2026-02-21T08:22:03.1921993Z ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T08:22:03.1922276Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19 2026-02-21T08:22:03.1922576Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:22:03.1922769Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T08:22:03.1922927Z mov.b32 %r253, global_smem; 2026-02-21T08:22:03.1923288Z // begin inline asm 2026-02-21T08:22:03.1923539Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r253], 256; 2026-02-21T08:22:03.1923792Z // end inline asm 2026-02-21T08:22:03.1923934Z bar.sync 0, 128; 2026-02-21T08:22:03.1924094Z ld.shared.b32 %r1598, [global_smem]; 2026-02-21T08:22:03.1924266Z bar.sync 0, 128; 2026-02-21T08:22:03.1924414Z // begin inline asm 2026-02-21T08:22:03.1924624Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:22:03.1924889Z // end inline asm 2026-02-21T08:22:03.1925143Z .loc 1 21 67 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:21:67 2026-02-21T08:22:03.1925438Z mov.u32 %r26, %ctaid.x; 2026-02-21T08:22:03.1925593Z mov.u32 %r262, %ctaid.y; 2026-02-21T08:22:03.1925739Z mov.u32 %r263, %ctaid.z; 2026-02-21T08:22:03.1925896Z mov.u32 %r264, %nctaid.x; 
2026-02-21T08:22:03.1926053Z mov.u32 %r265, %nctaid.y; 2026-02-21T08:22:03.1926211Z mad.lo.s32 %r266, %r263, %r265, %r262; 2026-02-21T08:22:03.1926396Z mad.lo.s32 %r267, %r266, %r264, %r26; 2026-02-21T08:22:03.1926568Z shl.b32 %r268, %r267, 7; 2026-02-21T08:22:03.1926725Z cvt.s64.s32 %rd139, %r268; 2026-02-21T08:22:03.1926886Z add.s64 %rd136, %rd9, %rd139; 2026-02-21T08:22:03.1927049Z shl.b32 %r269, %r1, 2; 2026-02-21T08:22:03.1927197Z add.s32 %r254, %r253, %r269; 2026-02-21T08:22:03.1927352Z mov.b32 %r271, 0; 2026-02-21T08:22:03.1927483Z // begin inline asm 2026-02-21T08:22:03.1927640Z @%p29 st.shared.b32 [ %r254 + 0 ], %r271; 2026-02-21T08:22:03.1927815Z // end inline asm 2026-02-21T08:22:03.1927950Z bar.warp.sync -1; 2026-02-21T08:22:03.1928098Z setp.eq.b32 %p66, %r1, 0; 2026-02-21T08:22:03.1928249Z cvt.u64.u32 %rd121, %r253; 2026-02-21T08:22:03.1928404Z // begin inline asm 2026-02-21T08:22:03.1928653Z @%p66 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd121 + 0 ], %rd7; 2026-02-21T08:22:03.1928940Z // end inline asm 2026-02-21T08:22:03.1929070Z // begin inline asm 2026-02-21T08:22:03.1929300Z @%p66 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1; 2026-02-21T08:22:03.1929554Z // end inline asm 2026-02-21T08:22:03.1929684Z mov.b32 %r256, 32; 2026-02-21T08:22:03.1929824Z // begin inline asm 2026-02-21T08:22:03.1930056Z @%p66 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r256; 2026-02-21T08:22:03.1930332Z // end inline asm 2026-02-21T08:22:03.1930462Z mov.b32 %r257, 128; 2026-02-21T08:22:03.1930605Z // begin inline asm 2026-02-21T08:22:03.1930834Z @%p66 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r257; 2026-02-21T08:22:03.1931107Z // end inline asm 2026-02-21T08:22:03.1931244Z mov.b32 %r258, 2048; 2026-02-21T08:22:03.1931380Z // begin inline asm 2026-02-21T08:22:03.1931631Z @%p66 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r258; 
2026-02-21T08:22:03.1931909Z // end inline asm 2026-02-21T08:22:03.1932045Z mov.b32 %r259, 4096; 2026-02-21T08:22:03.1932184Z // begin inline asm 2026-02-21T08:22:03.1932433Z @%p66 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r259; 2026-02-21T08:22:03.1932783Z // end inline asm 2026-02-21T08:22:03.1932917Z mov.b64 %rd129, 4096; 2026-02-21T08:22:03.1933069Z // begin inline asm 2026-02-21T08:22:03.1933319Z @%p66 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd121 + 0 ], 0x0, %rd129; 2026-02-21T08:22:03.1933608Z // end inline asm 2026-02-21T08:22:03.1933734Z mov.b32 %r260, 1; 2026-02-21T08:22:03.1933873Z // begin inline asm 2026-02-21T08:22:03.1934122Z @%p66 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0, %r260; 2026-02-21T08:22:03.1934409Z // end inline asm 2026-02-21T08:22:03.1934551Z // begin inline asm 2026-02-21T08:22:03.1934829Z @%p66 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x1, %r260; 2026-02-21T08:22:03.1935110Z // end inline asm 2026-02-21T08:22:03.1935236Z // begin inline asm 2026-02-21T08:22:03.1935533Z @%p66 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x6; 2026-02-21T08:22:03.1935794Z // end inline asm 2026-02-21T08:22:03.1935931Z // begin inline asm 2026-02-21T08:22:03.1936182Z @%p66 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0; 2026-02-21T08:22:03.1936461Z // end inline asm 2026-02-21T08:22:03.1936593Z // begin inline asm 2026-02-21T08:22:03.1936823Z @%p66 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x2; 2026-02-21T08:22:03.1937085Z // end inline asm 2026-02-21T08:22:03.1937213Z // begin inline asm 2026-02-21T08:22:03.1937443Z @%p66 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd121 + 0 ], 0x0; 2026-02-21T08:22:03.1937697Z // end inline asm 2026-02-21T08:22:03.1937834Z // begin inline asm 2026-02-21T08:22:03.1938201Z @%p29 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd136 + 0 ], [ %rd121 + 0 ], 0x80; 2026-02-21T08:22:03.1938587Z // end inline asm 2026-02-21T08:22:03.1938727Z // begin inline asm 2026-02-21T08:22:03.1938941Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd136 + 0 ], 0x80; 2026-02-21T08:22:03.1939188Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T08:22:03.1939381Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:22:03.1939554Z // end inline asm 2026-02-21T08:22:03.1939690Z bar.sync 0, 128; 2026-02-21T08:22:03.1939949Z .loc 1 30 75 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:30:75 2026-02-21T08:22:03.1940258Z setp.gt.u32 %p49, %r26, 255; 2026-02-21T08:22:03.1940424Z @%p49 bra $L__BB0_16; 2026-02-21T08:22:03.1940599Z // %bb.15: // %.lr.ph 2026-02-21T08:22:03.1940910Z .loc 1 0 75 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:0:75 2026-02-21T08:22:03.1941227Z ld.param.b64 %rd8, [_helion_matmul_param_2]; 2026-02-21T08:22:03.1941541Z .loc 1 44 45 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:44:45 2026-02-21T08:22:03.1941834Z shl.b32 %r1141, %r1, 3; 2026-02-21T08:22:03.1942002Z and.b32 %r1142, %r1141, 120; 2026-02-21T08:22:03.1942270Z .loc 1 42 45 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:45 2026-02-21T08:22:03.1942564Z shr.u32 %r1143, %r1, 4; 2026-02-21T08:22:03.1942834Z .loc 1 41 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:41:27 2026-02-21T08:22:03.1943120Z shl.b32 %r1144, %r26, 8; 2026-02-21T08:22:03.1943286Z and.b32 %r1145, %r1144, 1792; 2026-02-21T08:22:03.1943556Z .loc 1 42 45 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:45 2026-02-21T08:22:03.1943849Z or.b32 %r1146, %r1143, %r1145; 2026-02-21T08:22:03.1944016Z and.b32 %r1148, %r269, 16; 2026-02-21T08:22:03.1944183Z add.s32 %r1150, %r253, %r1148; 2026-02-21T08:22:03.1944353Z and.b32 %r1151, %r1, 96; 2026-02-21T08:22:03.1944505Z shl.b32 %r1152, %r1151, 6; 
2026-02-21T08:22:03.1944699Z shl.b32 %r1153, %r1, 5; 2026-02-21T08:22:03.1944916Z and.b32 %r1154, %r1153, 96; 2026-02-21T08:22:03.1945085Z or.b32 %r1155, %r1152, %r1154; 2026-02-21T08:22:03.1945244Z shl.b32 %r1156, %r1, 4; 2026-02-21T08:22:03.1945399Z and.b32 %r1157, %r1156, 384; 2026-02-21T08:22:03.1945557Z or.b32 %r1158, %r1157, %r1151; 2026-02-21T08:22:03.1945727Z xor.b32 %r1159, %r1155, %r1158; 2026-02-21T08:22:03.1945894Z add.s32 %r857, %r1150, %r1159; 2026-02-21T08:22:03.1946060Z add.s32 %r872, %r857, 1536; 2026-02-21T08:22:03.1946225Z add.s32 %r867, %r857, 1024; 2026-02-21T08:22:03.1946381Z add.s32 %r862, %r857, 512; 2026-02-21T08:22:03.1946544Z shl.b32 %r1160, %r1, 10; 2026-02-21T08:22:03.1946698Z and.b32 %r1161, %r1160, 6144; 2026-02-21T08:22:03.1946869Z and.b32 %r1162, %r1156, 2032; 2026-02-21T08:22:03.1947029Z or.b32 %r1163, %r1161, %r1162; 2026-02-21T08:22:03.1947192Z xor.b32 %r1164, %r1163, 96; 2026-02-21T08:22:03.1947349Z add.s32 %r1165, %r253, %r1164; 2026-02-21T08:22:03.1947573Z xor.b32 %r1166, %r1163, 64; 2026-02-21T08:22:03.1947737Z add.s32 %r1167, %r253, %r1166; 2026-02-21T08:22:03.1947912Z xor.b32 %r1168, %r1163, 32; 2026-02-21T08:22:03.1948083Z add.s32 %r1169, %r253, %r1168; 2026-02-21T08:22:03.1948251Z add.s32 %r1170, %r253, %r1163; 2026-02-21T08:22:03.1948543Z .loc 1 41 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:41:27 2026-02-21T08:22:03.1948892Z or.b32 %r1171, %r1144, %r1143; 2026-02-21T08:22:03.1949158Z .loc 1 42 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:32 2026-02-21T08:22:03.1949435Z and.b32 %r1172, %r1171, 1799; 2026-02-21T08:22:03.1949598Z shl.b32 %r1173, %r1146, 12; 2026-02-21T08:22:03.1949864Z .loc 1 43 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:43:27 2026-02-21T08:22:03.1950143Z shl.b32 %r1174, %r26, 4; 2026-02-21T08:22:03.1950302Z and.b32 %r1175, %r1174, 3968; 2026-02-21T08:22:03.1950557Z .loc 1 44 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:44:32 
2026-02-21T08:22:03.1950845Z or.b32 %r1176, %r1175, %r1142; 2026-02-21T08:22:03.1951100Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.1951401Z shfl.sync.idx.b32 %r1177, %r2, 0, 31, -1; 2026-02-21T08:22:03.1951593Z shl.b32 %r1178, %r1177, 21; 2026-02-21T08:22:03.1951754Z and.b32 %r1179, %r1178, 6291456; 2026-02-21T08:22:03.1952414Z [34s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:22:03.1953645Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:22:03.1954928Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:22:03.1955164Z `ptxas` stderr: 2026-02-21T08:22:03.1955580Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 189 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:22:03.1956052Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:22:03.1956195Z 2026-02-21T08:22:03.1956597Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp32gw7rai.ptx -o /tmp/tmp32gw7rai.ptx.o 2026-02-21T08:22:03.1957047Z 2026-02-21T08:22:03.1957176Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:22:03.1957422Z add.s32 %r270, %r1179, %r1598; 2026-02-21T08:22:03.1957589Z mov.pred %p50, -1; 2026-02-21T08:22:03.1957733Z // begin inline asm 2026-02-21T08:22:03.1958115Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 0], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1958557Z // end inline asm 2026-02-21T08:22:03.1958693Z // begin inline asm 2026-02-21T08:22:03.1959047Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 16], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1959427Z // end inline asm 2026-02-21T08:22:03.1959562Z // begin inline asm 2026-02-21T08:22:03.1959897Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 32], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1960271Z // end inline asm 2026-02-21T08:22:03.1960400Z // begin inline asm 2026-02-21T08:22:03.1960818Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 48], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1961196Z // end inline asm 2026-02-21T08:22:03.1961327Z // begin inline asm 2026-02-21T08:22:03.1961667Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 64], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1962034Z // end inline asm 2026-02-21T08:22:03.1962168Z // begin inline asm 2026-02-21T08:22:03.1962513Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 80], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1962891Z // end inline asm 2026-02-21T08:22:03.1963030Z // begin inline asm 2026-02-21T08:22:03.1963367Z @%p50 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 96], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1963762Z // end inline asm 2026-02-21T08:22:03.1963896Z // begin inline asm 2026-02-21T08:22:03.1964256Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 112], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1964634Z // end inline asm 2026-02-21T08:22:03.1964793Z // begin inline asm 2026-02-21T08:22:03.1965142Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 128], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1965512Z // end inline asm 2026-02-21T08:22:03.1965645Z // begin inline asm 2026-02-21T08:22:03.1965986Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 144], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1966365Z // end inline asm 2026-02-21T08:22:03.1966494Z // begin inline asm 2026-02-21T08:22:03.1966843Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 160], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1967226Z // end inline asm 2026-02-21T08:22:03.1967354Z // begin inline asm 2026-02-21T08:22:03.1967707Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 176], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1968078Z // end inline asm 2026-02-21T08:22:03.1968215Z // begin inline asm 2026-02-21T08:22:03.1968563Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 192], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1968934Z // end 
inline asm 2026-02-21T08:22:03.1969071Z // begin inline asm 2026-02-21T08:22:03.1969411Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 208], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1969844Z // end inline asm 2026-02-21T08:22:03.1969974Z // begin inline asm 2026-02-21T08:22:03.1970317Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 224], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1970709Z // end inline asm 2026-02-21T08:22:03.1970835Z // begin inline asm 2026-02-21T08:22:03.1971187Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r270 + 240], {%r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271, %r271}; 2026-02-21T08:22:03.1971555Z // end inline asm 2026-02-21T08:22:03.1971690Z // begin inline asm 2026-02-21T08:22:03.1971837Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:22:03.1972001Z // end inline asm 2026-02-21T08:22:03.1972134Z bar.sync 0, 128; 2026-02-21T08:22:03.1972381Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.1972725Z add.s32 %r542, %r253, 155648; 2026-02-21T08:22:03.1972881Z // begin inline asm 2026-02-21T08:22:03.1973047Z @%p66 mbarrier.init.shared::cta.b64 [%r542], 1; 2026-02-21T08:22:03.1973228Z // end inline asm 2026-02-21T08:22:03.1973365Z bar.sync 0, 128; 2026-02-21T08:22:03.1973497Z add.s32 %r543, %r253, 155656; 2026-02-21T08:22:03.1973654Z // begin inline asm 2026-02-21T08:22:03.1973817Z @%p66 mbarrier.init.shared::cta.b64 [%r543], 1; 2026-02-21T08:22:03.1974000Z // end inline asm 2026-02-21T08:22:03.1974133Z bar.sync 0, 128; 2026-02-21T08:22:03.1974265Z add.s32 %r544, %r253, 155664; 2026-02-21T08:22:03.1974423Z // begin inline asm 2026-02-21T08:22:03.1974576Z @%p66 mbarrier.init.shared::cta.b64 [%r544], 1; 2026-02-21T08:22:03.1974788Z // end 
inline asm 2026-02-21T08:22:03.1974916Z bar.sync 0, 128; 2026-02-21T08:22:03.1975053Z add.s32 %r545, %r253, 155672; 2026-02-21T08:22:03.1975200Z // begin inline asm 2026-02-21T08:22:03.1975365Z @%p66 mbarrier.init.shared::cta.b64 [%r545], 1; 2026-02-21T08:22:03.1975554Z // end inline asm 2026-02-21T08:22:03.1975685Z bar.sync 0, 128; 2026-02-21T08:22:03.1975831Z add.s32 %r546, %r253, 155680; 2026-02-21T08:22:03.1975979Z // begin inline asm 2026-02-21T08:22:03.1976139Z @%p66 mbarrier.init.shared::cta.b64 [%r546], 1; 2026-02-21T08:22:03.1976316Z // end inline asm 2026-02-21T08:22:03.1976448Z bar.sync 0, 128; 2026-02-21T08:22:03.1976584Z add.s32 %r547, %r253, 155688; 2026-02-21T08:22:03.1976744Z // begin inline asm 2026-02-21T08:22:03.1976905Z @%p66 mbarrier.init.shared::cta.b64 [%r547], 1; 2026-02-21T08:22:03.1977088Z // end inline asm 2026-02-21T08:22:03.1977221Z bar.sync 0, 128; 2026-02-21T08:22:03.1977351Z add.s32 %r548, %r253, 155696; 2026-02-21T08:22:03.1977504Z // begin inline asm 2026-02-21T08:22:03.1977659Z @%p66 mbarrier.init.shared::cta.b64 [%r548], 1; 2026-02-21T08:22:03.1977842Z // end inline asm 2026-02-21T08:22:03.1977970Z add.s32 %r549, %r253, 155712; 2026-02-21T08:22:03.1978121Z // begin inline asm 2026-02-21T08:22:03.1978277Z @%p66 mbarrier.init.shared::cta.b64 [%r549], 1; 2026-02-21T08:22:03.1978463Z // end inline asm 2026-02-21T08:22:03.1978596Z bar.sync 0, 128; 2026-02-21T08:22:03.1978725Z add.s32 %r550, %r253, 155720; 2026-02-21T08:22:03.1978880Z // begin inline asm 2026-02-21T08:22:03.1979031Z @%p66 mbarrier.init.shared::cta.b64 [%r550], 1; 2026-02-21T08:22:03.1979212Z // end inline asm 2026-02-21T08:22:03.1979339Z bar.sync 0, 128; 2026-02-21T08:22:03.1979475Z add.s32 %r551, %r253, 155728; 2026-02-21T08:22:03.1979620Z // begin inline asm 2026-02-21T08:22:03.1979781Z @%p66 mbarrier.init.shared::cta.b64 [%r551], 1; 2026-02-21T08:22:03.1979951Z // end inline asm 2026-02-21T08:22:03.1980083Z bar.sync 0, 128; 2026-02-21T08:22:03.1980218Z add.s32 
%r552, %r253, 155736; 2026-02-21T08:22:03.1980365Z // begin inline asm 2026-02-21T08:22:03.1980524Z @%p66 mbarrier.init.shared::cta.b64 [%r552], 1; 2026-02-21T08:22:03.1980697Z // end inline asm 2026-02-21T08:22:03.1980830Z bar.sync 0, 128; 2026-02-21T08:22:03.1980962Z add.s32 %r553, %r253, 155744; 2026-02-21T08:22:03.1981183Z // begin inline asm 2026-02-21T08:22:03.1981336Z @%p66 mbarrier.init.shared::cta.b64 [%r553], 1; 2026-02-21T08:22:03.1981518Z // end inline asm 2026-02-21T08:22:03.1981650Z bar.sync 0, 128; 2026-02-21T08:22:03.1981780Z add.s32 %r554, %r253, 155752; 2026-02-21T08:22:03.1981934Z // begin inline asm 2026-02-21T08:22:03.1982088Z @%p66 mbarrier.init.shared::cta.b64 [%r554], 1; 2026-02-21T08:22:03.1982271Z // end inline asm 2026-02-21T08:22:03.1982400Z bar.sync 0, 128; 2026-02-21T08:22:03.1982538Z add.s32 %r555, %r253, 155760; 2026-02-21T08:22:03.1982686Z // begin inline asm 2026-02-21T08:22:03.1982848Z @%p66 mbarrier.init.shared::cta.b64 [%r555], 1; 2026-02-21T08:22:03.1983026Z // end inline asm 2026-02-21T08:22:03.1983280Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.1983573Z bar.sync 0, 128; 2026-02-21T08:22:03.1983761Z // begin inline asm 2026-02-21T08:22:03.1983936Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r542]; 2026-02-21T08:22:03.1984126Z // end inline asm 2026-02-21T08:22:03.1984276Z bar.sync 0, 128; 2026-02-21T08:22:03.1984408Z // begin inline asm 2026-02-21T08:22:03.1984580Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r543]; 2026-02-21T08:22:03.1984806Z // end inline asm 2026-02-21T08:22:03.1984948Z bar.sync 0, 128; 2026-02-21T08:22:03.1985090Z // begin inline asm 2026-02-21T08:22:03.1985256Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r544]; 2026-02-21T08:22:03.1985456Z // end inline asm 2026-02-21T08:22:03.1985588Z bar.sync 0, 128; 2026-02-21T08:22:03.1985728Z // begin inline asm 2026-02-21T08:22:03.1985892Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r545]; 
2026-02-21T08:22:03.1986090Z // end inline asm 2026-02-21T08:22:03.1986220Z bar.sync 0, 128; 2026-02-21T08:22:03.1986363Z // begin inline asm 2026-02-21T08:22:03.1986538Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r546]; 2026-02-21T08:22:03.1986735Z // end inline asm 2026-02-21T08:22:03.1986877Z bar.sync 0, 128; 2026-02-21T08:22:03.1987008Z // begin inline asm 2026-02-21T08:22:03.1987182Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r547]; 2026-02-21T08:22:03.1987370Z // end inline asm 2026-02-21T08:22:03.1987508Z bar.sync 0, 128; 2026-02-21T08:22:03.1987643Z // begin inline asm 2026-02-21T08:22:03.1987812Z @%p66 mbarrier.arrive.shared::cta.b64 _, [%r548]; 2026-02-21T08:22:03.1988002Z // end inline asm 2026-02-21T08:22:03.1988259Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.1988556Z bar.sync 0, 128; 2026-02-21T08:22:03.1988692Z add.s32 %r563, %r253, 155776; 2026-02-21T08:22:03.1988854Z // begin inline asm 2026-02-21T08:22:03.1989016Z @%p66 mbarrier.init.shared::cta.b64 [%r563], 1; 2026-02-21T08:22:03.1989208Z // end inline asm 2026-02-21T08:22:03.1989381Z st.shared.v2.b32 [global_smem+155784], {0, 33685761}; 2026-02-21T08:22:03.1989610Z st.shared.b32 [global_smem], %r1598; 2026-02-21T08:22:03.1989832Z st.shared.v2.b32 [global_smem+8], {%r1175, %r1145}; 2026-02-21T08:22:03.1990032Z barrier.sync 1; 2026-02-21T08:22:03.1990179Z barrier.sync 1; 2026-02-21T08:22:03.1990317Z barrier.sync 1; 2026-02-21T08:22:03.1990578Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.1990875Z bar.sync 0, 128; 2026-02-21T08:22:03.1991017Z // begin inline asm 2026-02-21T08:22:03.1991150Z 2026-02-21T08:22:03.1991274Z { 2026-02-21T08:22:03.1991400Z .reg .pred complete; 2026-02-21T08:22:03.1991555Z waitLoop: 2026-02-21T08:22:03.1991764Z mbarrier.try_wait.parity.shared.b64 complete, [%r563], %r271; 2026-02-21T08:22:03.1991996Z @!complete bra.uni waitLoop; 2026-02-21T08:22:03.1992159Z } 
2026-02-21T08:22:03.1992222Z 2026-02-21T08:22:03.1992277Z // end inline asm 2026-02-21T08:22:03.1992521Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.1992797Z bar.sync 0, 128; 2026-02-21T08:22:03.1993001Z // begin inline asm 2026-02-21T08:22:03.1993162Z @%p66 mbarrier.inval.shared::cta.b64 [%r563]; 2026-02-21T08:22:03.1993348Z // end inline asm 2026-02-21T08:22:03.1993479Z // begin inline asm 2026-02-21T08:22:03.1993638Z @%p66 mbarrier.inval.shared::cta.b64 [%r549]; 2026-02-21T08:22:03.1993823Z // end inline asm 2026-02-21T08:22:03.1993949Z bar.sync 0, 128; 2026-02-21T08:22:03.1994088Z // begin inline asm 2026-02-21T08:22:03.1994240Z @%p66 mbarrier.inval.shared::cta.b64 [%r550]; 2026-02-21T08:22:03.1994424Z // end inline asm 2026-02-21T08:22:03.1994550Z bar.sync 0, 128; 2026-02-21T08:22:03.1994716Z // begin inline asm 2026-02-21T08:22:03.1994872Z @%p66 mbarrier.inval.shared::cta.b64 [%r551]; 2026-02-21T08:22:03.1995055Z // end inline asm 2026-02-21T08:22:03.1995188Z bar.sync 0, 128; 2026-02-21T08:22:03.1995315Z // begin inline asm 2026-02-21T08:22:03.1995475Z @%p66 mbarrier.inval.shared::cta.b64 [%r552]; 2026-02-21T08:22:03.1995711Z // end inline asm 2026-02-21T08:22:03.1995852Z bar.sync 0, 128; 2026-02-21T08:22:03.1995984Z // begin inline asm 2026-02-21T08:22:03.1996149Z @%p66 mbarrier.inval.shared::cta.b64 [%r553]; 2026-02-21T08:22:03.1996329Z // end inline asm 2026-02-21T08:22:03.1996466Z bar.sync 0, 128; 2026-02-21T08:22:03.1996603Z // begin inline asm 2026-02-21T08:22:03.1996760Z @%p66 mbarrier.inval.shared::cta.b64 [%r554]; 2026-02-21T08:22:03.1996944Z // end inline asm 2026-02-21T08:22:03.1997074Z bar.sync 0, 128; 2026-02-21T08:22:03.1997213Z // begin inline asm 2026-02-21T08:22:03.1997369Z @%p66 mbarrier.inval.shared::cta.b64 [%r555]; 2026-02-21T08:22:03.1997554Z // end inline asm 2026-02-21T08:22:03.1997683Z // begin inline asm 2026-02-21T08:22:03.1997846Z @%p66 mbarrier.inval.shared::cta.b64 [%r542]; 
2026-02-21T08:22:03.1998022Z // end inline asm 2026-02-21T08:22:03.1998159Z bar.sync 0, 128; 2026-02-21T08:22:03.1998295Z // begin inline asm 2026-02-21T08:22:03.1998452Z @%p66 mbarrier.inval.shared::cta.b64 [%r543]; 2026-02-21T08:22:03.1998644Z // end inline asm 2026-02-21T08:22:03.1998774Z bar.sync 0, 128; 2026-02-21T08:22:03.1998916Z // begin inline asm 2026-02-21T08:22:03.1999074Z @%p66 mbarrier.inval.shared::cta.b64 [%r544]; 2026-02-21T08:22:03.1999259Z // end inline asm 2026-02-21T08:22:03.1999390Z bar.sync 0, 128; 2026-02-21T08:22:03.1999528Z // begin inline asm 2026-02-21T08:22:03.1999690Z @%p66 mbarrier.inval.shared::cta.b64 [%r545]; 2026-02-21T08:22:03.1999870Z // end inline asm 2026-02-21T08:22:03.2000009Z bar.sync 0, 128; 2026-02-21T08:22:03.2000142Z // begin inline asm 2026-02-21T08:22:03.2000315Z @%p66 mbarrier.inval.shared::cta.b64 [%r546]; 2026-02-21T08:22:03.2000493Z // end inline asm 2026-02-21T08:22:03.2000633Z bar.sync 0, 128; 2026-02-21T08:22:03.2000763Z // begin inline asm 2026-02-21T08:22:03.2000930Z @%p66 mbarrier.inval.shared::cta.b64 [%r547]; 2026-02-21T08:22:03.2001105Z // end inline asm 2026-02-21T08:22:03.2001242Z bar.sync 0, 128; 2026-02-21T08:22:03.2001380Z // begin inline asm 2026-02-21T08:22:03.2001542Z @%p66 mbarrier.inval.shared::cta.b64 [%r548]; 2026-02-21T08:22:03.2001725Z // end inline asm 2026-02-21T08:22:03.2001966Z .loc 1 42 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:32 2026-02-21T08:22:03.2002264Z shl.b32 %r1180, %r1172, 12; 2026-02-21T08:22:03.2002524Z .loc 1 59 45 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:59:45 2026-02-21T08:22:03.2002814Z or.b32 %r1181, %r1173, %r1176; 2026-02-21T08:22:03.2002978Z or.b32 %r1182, %r1180, %r1176; 2026-02-21T08:22:03.2003245Z .loc 1 59 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:59:52 2026-02-21T08:22:03.2003537Z or.b32 %r1183, %r1182, 32768; 2026-02-21T08:22:03.2003694Z or.b32 %r1184, %r1182, 65536; 2026-02-21T08:22:03.2003853Z 
or.b32 %r1185, %r1182, 98304; 2026-02-21T08:22:03.2004006Z or.b32 %r1186, %r1182, 131072; 2026-02-21T08:22:03.2004172Z or.b32 %r1187, %r1182, 163840; 2026-02-21T08:22:03.2004401Z or.b32 %r1188, %r1182, 196608; 2026-02-21T08:22:03.2004558Z or.b32 %r1189, %r1181, 229376; 2026-02-21T08:22:03.2004735Z or.b32 %r1190, %r1182, 262144; 2026-02-21T08:22:03.2004894Z or.b32 %r1191, %r1182, 294912; 2026-02-21T08:22:03.2005049Z or.b32 %r1192, %r1182, 327680; 2026-02-21T08:22:03.2005197Z or.b32 %r1193, %r1182, 360448; 2026-02-21T08:22:03.2005351Z or.b32 %r1194, %r1182, 393216; 2026-02-21T08:22:03.2005497Z or.b32 %r1195, %r1182, 425984; 2026-02-21T08:22:03.2005648Z or.b32 %r1196, %r1182, 458752; 2026-02-21T08:22:03.2005793Z or.b32 %r1197, %r1181, 491520; 2026-02-21T08:22:03.2005944Z or.b32 %r1198, %r1182, 524288; 2026-02-21T08:22:03.2006089Z or.b32 %r1199, %r1182, 557056; 2026-02-21T08:22:03.2006239Z or.b32 %r1200, %r1182, 589824; 2026-02-21T08:22:03.2006391Z or.b32 %r1201, %r1182, 622592; 2026-02-21T08:22:03.2006538Z or.b32 %r1202, %r1182, 655360; 2026-02-21T08:22:03.2006694Z or.b32 %r1203, %r1182, 688128; 2026-02-21T08:22:03.2006900Z or.b32 %r1204, %r1182, 720896; 2026-02-21T08:22:03.2007060Z or.b32 %r1205, %r1181, 753664; 2026-02-21T08:22:03.2007206Z or.b32 %r1206, %r1182, 786432; 2026-02-21T08:22:03.2007357Z or.b32 %r1207, %r1182, 819200; 2026-02-21T08:22:03.2007504Z or.b32 %r1208, %r1182, 851968; 2026-02-21T08:22:03.2007657Z or.b32 %r1209, %r1182, 884736; 2026-02-21T08:22:03.2007805Z or.b32 %r1210, %r1182, 917504; 2026-02-21T08:22:03.2007963Z or.b32 %r1211, %r1182, 950272; 2026-02-21T08:22:03.2008118Z or.b32 %r1212, %r1182, 983040; 2026-02-21T08:22:03.2008270Z or.b32 %r1213, %r1181, 1015808; 2026-02-21T08:22:03.2008542Z .loc 1 59 24 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:59:24 2026-02-21T08:22:03.2008829Z mad.wide.u32 %rd140, %r1182, 2, %rd8; 2026-02-21T08:22:03.2009019Z mad.wide.u32 %rd141, %r1183, 2, %rd8; 2026-02-21T08:22:03.2009190Z mad.wide.u32 
%rd142, %r1184, 2, %rd8; 2026-02-21T08:22:03.2009366Z mad.wide.u32 %rd143, %r1185, 2, %rd8; 2026-02-21T08:22:03.2009541Z mad.wide.u32 %rd144, %r1186, 2, %rd8; 2026-02-21T08:22:03.2009707Z mad.wide.u32 %rd145, %r1187, 2, %rd8; 2026-02-21T08:22:03.2009880Z mad.wide.u32 %rd146, %r1188, 2, %rd8; 2026-02-21T08:22:03.2010041Z mad.wide.u32 %rd147, %r1189, 2, %rd8; 2026-02-21T08:22:03.2010210Z mad.wide.u32 %rd148, %r1190, 2, %rd8; 2026-02-21T08:22:03.2010372Z mad.wide.u32 %rd149, %r1191, 2, %rd8; 2026-02-21T08:22:03.2010542Z mad.wide.u32 %rd150, %r1192, 2, %rd8; 2026-02-21T08:22:03.2010705Z mad.wide.u32 %rd151, %r1193, 2, %rd8; 2026-02-21T08:22:03.2010877Z mad.wide.u32 %rd152, %r1194, 2, %rd8; 2026-02-21T08:22:03.2011038Z mad.wide.u32 %rd153, %r1195, 2, %rd8; 2026-02-21T08:22:03.2011206Z mad.wide.u32 %rd154, %r1196, 2, %rd8; 2026-02-21T08:22:03.2011374Z mad.wide.u32 %rd155, %r1197, 2, %rd8; 2026-02-21T08:22:03.2011538Z mad.wide.u32 %rd156, %r1198, 2, %rd8; 2026-02-21T08:22:03.2011709Z mad.wide.u32 %rd157, %r1199, 2, %rd8; 2026-02-21T08:22:03.2011871Z mad.wide.u32 %rd158, %r1200, 2, %rd8; 2026-02-21T08:22:03.2012042Z mad.wide.u32 %rd159, %r1201, 2, %rd8; 2026-02-21T08:22:03.2012207Z mad.wide.u32 %rd160, %r1202, 2, %rd8; 2026-02-21T08:22:03.2012377Z mad.wide.u32 %rd161, %r1203, 2, %rd8; 2026-02-21T08:22:03.2012539Z mad.wide.u32 %rd162, %r1204, 2, %rd8; 2026-02-21T08:22:03.2012708Z mad.wide.u32 %rd163, %r1205, 2, %rd8; 2026-02-21T08:22:03.2012875Z mad.wide.u32 %rd164, %r1206, 2, %rd8; 2026-02-21T08:22:03.2013036Z mad.wide.u32 %rd165, %r1207, 2, %rd8; 2026-02-21T08:22:03.2013207Z mad.wide.u32 %rd166, %r1208, 2, %rd8; 2026-02-21T08:22:03.2013371Z mad.wide.u32 %rd167, %r1209, 2, %rd8; 2026-02-21T08:22:03.2013541Z mad.wide.u32 %rd168, %r1210, 2, %rd8; 2026-02-21T08:22:03.2013702Z mad.wide.u32 %rd169, %r1211, 2, %rd8; 2026-02-21T08:22:03.2013870Z mad.wide.u32 %rd170, %r1212, 2, %rd8; 2026-02-21T08:22:03.2014031Z mad.wide.u32 %rd171, %r1213, 2, %rd8; 2026-02-21T08:22:03.2014311Z 
.loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2014592Z // begin inline asm 2026-02-21T08:22:03.2015024Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596}, [%r270 + 0]; 2026-02-21T08:22:03.2015410Z // end inline asm 2026-02-21T08:22:03.2015542Z // begin inline asm 2026-02-21T08:22:03.2015887Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613}, [%r270 + 16]; 2026-02-21T08:22:03.2016257Z // end inline asm 2026-02-21T08:22:03.2016388Z // begin inline asm 2026-02-21T08:22:03.2016735Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630}, [%r270 + 32]; 2026-02-21T08:22:03.2017092Z // end inline asm 2026-02-21T08:22:03.2017228Z // begin inline asm 2026-02-21T08:22:03.2017616Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647}, [%r270 + 48]; 2026-02-21T08:22:03.2017991Z // end inline asm 2026-02-21T08:22:03.2018128Z // begin inline asm 2026-02-21T08:22:03.2018459Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664}, [%r270 + 64]; 2026-02-21T08:22:03.2018830Z // end inline asm 2026-02-21T08:22:03.2018958Z // begin inline asm 2026-02-21T08:22:03.2019297Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681}, [%r270 + 80]; 2026-02-21T08:22:03.2019674Z // end inline asm 2026-02-21T08:22:03.2019814Z // begin inline asm 2026-02-21T08:22:03.2020163Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r683, %r684, %r685, %r686, %r687, %r688, %r689, 
%r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698}, [%r270 + 96]; 2026-02-21T08:22:03.2020528Z // end inline asm 2026-02-21T08:22:03.2020677Z // begin inline asm 2026-02-21T08:22:03.2021006Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715}, [%r270 + 112]; 2026-02-21T08:22:03.2021388Z // end inline asm 2026-02-21T08:22:03.2021516Z // begin inline asm 2026-02-21T08:22:03.2021850Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732}, [%r270 + 128]; 2026-02-21T08:22:03.2022222Z // end inline asm 2026-02-21T08:22:03.2022349Z // begin inline asm 2026-02-21T08:22:03.2022685Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749}, [%r270 + 144]; 2026-02-21T08:22:03.2023046Z // end inline asm 2026-02-21T08:22:03.2023181Z // begin inline asm 2026-02-21T08:22:03.2023514Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766}, [%r270 + 160]; 2026-02-21T08:22:03.2023877Z // end inline asm 2026-02-21T08:22:03.2024011Z // begin inline asm 2026-02-21T08:22:03.2024337Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783}, [%r270 + 176]; 2026-02-21T08:22:03.2024760Z // end inline asm 2026-02-21T08:22:03.2024889Z // begin inline asm 2026-02-21T08:22:03.2025229Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800}, [%r270 + 192]; 2026-02-21T08:22:03.2025610Z // end inline asm 2026-02-21T08:22:03.2025738Z // begin inline asm 2026-02-21T08:22:03.2026087Z tcgen05.ld.sync.aligned.32x32b.x16.b32 
{%r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817}, [%r270 + 208]; 2026-02-21T08:22:03.2026452Z // end inline asm 2026-02-21T08:22:03.2026644Z // begin inline asm 2026-02-21T08:22:03.2026979Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834}, [%r270 + 224]; 2026-02-21T08:22:03.2027348Z // end inline asm 2026-02-21T08:22:03.2027479Z // begin inline asm 2026-02-21T08:22:03.2027820Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846, %r847, %r848, %r849, %r850, %r851}, [%r270 + 240]; 2026-02-21T08:22:03.2028187Z // end inline asm 2026-02-21T08:22:03.2028331Z // begin inline asm 2026-02-21T08:22:03.2028494Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:22:03.2028657Z // end inline asm 2026-02-21T08:22:03.2028808Z cvt.u64.u32 %rd172, %r581; 2026-02-21T08:22:03.2028979Z cvt.u64.u32 %rd173, %r582; 2026-02-21T08:22:03.2029143Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:22:03.2029366Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:22:03.2029644Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2029947Z mov.b64 {%r1214, %r1215}, %rd175; 2026-02-21T08:22:03.2030133Z cvt.rn.f16x2.f32 %r1216, %r1215, %r1214; 2026-02-21T08:22:03.2030431Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2030720Z cvt.u64.u32 %rd176, %r583; 2026-02-21T08:22:03.2030887Z cvt.u64.u32 %rd177, %r584; 2026-02-21T08:22:03.2031051Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:22:03.2031212Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:22:03.2031487Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2031776Z mov.b64 {%r1217, %r1218}, %rd179; 2026-02-21T08:22:03.2031966Z cvt.rn.f16x2.f32 %r1219, %r1218, %r1217; 2026-02-21T08:22:03.2032257Z .loc 1 
56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2032555Z cvt.u64.u32 %rd180, %r585; 2026-02-21T08:22:03.2032724Z cvt.u64.u32 %rd181, %r586; 2026-02-21T08:22:03.2032882Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:22:03.2033049Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:22:03.2033314Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2033607Z mov.b64 {%r1220, %r1221}, %rd183; 2026-02-21T08:22:03.2033784Z cvt.rn.f16x2.f32 %r1222, %r1221, %r1220; 2026-02-21T08:22:03.2034077Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2034359Z cvt.u64.u32 %rd184, %r587; 2026-02-21T08:22:03.2034520Z cvt.u64.u32 %rd185, %r588; 2026-02-21T08:22:03.2034706Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:22:03.2034868Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:22:03.2035149Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2035450Z mov.b64 {%r1223, %r1224}, %rd187; 2026-02-21T08:22:03.2035635Z cvt.rn.f16x2.f32 %r1225, %r1224, %r1223; 2026-02-21T08:22:03.2035922Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2036224Z cvt.u64.u32 %rd188, %r589; 2026-02-21T08:22:03.2036378Z cvt.u64.u32 %rd189, %r590; 2026-02-21T08:22:03.2036525Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:22:03.2036682Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:22:03.2036941Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2037224Z mov.b64 {%r1226, %r1227}, %rd191; 2026-02-21T08:22:03.2037391Z cvt.rn.f16x2.f32 %r1228, %r1227, %r1226; 2026-02-21T08:22:03.2037674Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2037954Z cvt.u64.u32 %rd192, %r591; 2026-02-21T08:22:03.2038111Z cvt.u64.u32 %rd193, %r592; 2026-02-21T08:22:03.2038263Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:22:03.2038516Z 
or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:22:03.2038771Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2039037Z mov.b64 {%r1229, %r1230}, %rd195; 2026-02-21T08:22:03.2039206Z cvt.rn.f16x2.f32 %r1231, %r1230, %r1229; 2026-02-21T08:22:03.2039471Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2039744Z cvt.u64.u32 %rd196, %r593; 2026-02-21T08:22:03.2039896Z cvt.u64.u32 %rd197, %r594; 2026-02-21T08:22:03.2040042Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:22:03.2040202Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:22:03.2040459Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2040741Z mov.b64 {%r1232, %r1233}, %rd199; 2026-02-21T08:22:03.2040957Z cvt.rn.f16x2.f32 %r1234, %r1233, %r1232; 2026-02-21T08:22:03.2041242Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2041526Z cvt.u64.u32 %rd200, %r595; 2026-02-21T08:22:03.2041680Z cvt.u64.u32 %rd201, %r596; 2026-02-21T08:22:03.2041833Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:22:03.2041982Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:22:03.2042245Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2042518Z mov.b64 {%r1235, %r1236}, %rd203; 2026-02-21T08:22:03.2042691Z cvt.rn.f16x2.f32 %r1237, %r1236, %r1235; 2026-02-21T08:22:03.2042961Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2043251Z cvt.u64.u32 %rd204, %r598; 2026-02-21T08:22:03.2043408Z cvt.u64.u32 %rd205, %r599; 2026-02-21T08:22:03.2043559Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:22:03.2043724Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:22:03.2043982Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2044264Z mov.b64 {%r1238, %r1239}, %rd207; 2026-02-21T08:22:03.2044429Z cvt.rn.f16x2.f32 %r1240, %r1239, %r1238; 
2026-02-21T08:22:03.2044728Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2045014Z cvt.u64.u32 %rd208, %r600; 2026-02-21T08:22:03.2045162Z cvt.u64.u32 %rd209, %r601; 2026-02-21T08:22:03.2045315Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:22:03.2045466Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:22:03.2045724Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2045993Z mov.b64 {%r1241, %r1242}, %rd211; 2026-02-21T08:22:03.2046165Z cvt.rn.f16x2.f32 %r1243, %r1242, %r1241; 2026-02-21T08:22:03.2046433Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2046711Z cvt.u64.u32 %rd212, %r602; 2026-02-21T08:22:03.2046866Z cvt.u64.u32 %rd213, %r603; 2026-02-21T08:22:03.2047013Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:22:03.2047170Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:22:03.2047421Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2047695Z mov.b64 {%r1244, %r1245}, %rd215; 2026-02-21T08:22:03.2047861Z cvt.rn.f16x2.f32 %r1246, %r1245, %r1244; 2026-02-21T08:22:03.2048141Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2048414Z cvt.u64.u32 %rd216, %r604; 2026-02-21T08:22:03.2048560Z cvt.u64.u32 %rd217, %r605; 2026-02-21T08:22:03.2048712Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:22:03.2048859Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:22:03.2049114Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2049384Z mov.b64 {%r1247, %r1248}, %rd219; 2026-02-21T08:22:03.2049630Z cvt.rn.f16x2.f32 %r1249, %r1248, %r1247; 2026-02-21T08:22:03.2049894Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2050166Z cvt.u64.u32 %rd220, %r606; 2026-02-21T08:22:03.2050316Z cvt.u64.u32 %rd221, %r607; 2026-02-21T08:22:03.2050462Z shl.b64 %rd222, %rd221, 
32; 2026-02-21T08:22:03.2050617Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:22:03.2050871Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2051156Z mov.b64 {%r1250, %r1251}, %rd223; 2026-02-21T08:22:03.2051320Z cvt.rn.f16x2.f32 %r1252, %r1251, %r1250; 2026-02-21T08:22:03.2051594Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2051873Z cvt.u64.u32 %rd224, %r608; 2026-02-21T08:22:03.2052023Z cvt.u64.u32 %rd225, %r609; 2026-02-21T08:22:03.2052228Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:22:03.2052382Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:22:03.2052641Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2052909Z mov.b64 {%r1253, %r1254}, %rd227; 2026-02-21T08:22:03.2053083Z cvt.rn.f16x2.f32 %r1255, %r1254, %r1253; 2026-02-21T08:22:03.2053349Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2053637Z cvt.u64.u32 %rd228, %r610; 2026-02-21T08:22:03.2053792Z cvt.u64.u32 %rd229, %r611; 2026-02-21T08:22:03.2053941Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:22:03.2054102Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:22:03.2054353Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2054645Z mov.b64 {%r1256, %r1257}, %rd231; 2026-02-21T08:22:03.2054843Z cvt.rn.f16x2.f32 %r1258, %r1257, %r1256; 2026-02-21T08:22:03.2055130Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2055412Z cvt.u64.u32 %rd232, %r612; 2026-02-21T08:22:03.2055559Z cvt.u64.u32 %rd233, %r613; 2026-02-21T08:22:03.2055717Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:22:03.2055866Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:22:03.2056130Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2056405Z mov.b64 {%r1259, %r1260}, %rd235; 2026-02-21T08:22:03.2056577Z 
cvt.rn.f16x2.f32 %r1261, %r1260, %r1259; 2026-02-21T08:22:03.2056848Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2057124Z cvt.u64.u32 %rd236, %r615; 2026-02-21T08:22:03.2057282Z cvt.u64.u32 %rd237, %r616; 2026-02-21T08:22:03.2057428Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:22:03.2057586Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:22:03.2057843Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2058127Z mov.b64 {%r1262, %r1263}, %rd239; 2026-02-21T08:22:03.2058292Z cvt.rn.f16x2.f32 %r1264, %r1263, %r1262; 2026-02-21T08:22:03.2058574Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2058849Z cvt.u64.u32 %rd240, %r617; 2026-02-21T08:22:03.2058996Z cvt.u64.u32 %rd241, %r618; 2026-02-21T08:22:03.2059150Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:22:03.2059299Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:22:03.2059563Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2059836Z mov.b64 {%r1265, %r1266}, %rd243; 2026-02-21T08:22:03.2060008Z cvt.rn.f16x2.f32 %r1267, %r1266, %r1265; 2026-02-21T08:22:03.2060279Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2060557Z cvt.u64.u32 %rd244, %r619; 2026-02-21T08:22:03.2060761Z cvt.u64.u32 %rd245, %r620; 2026-02-21T08:22:03.2060907Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:22:03.2061063Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:22:03.2061314Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2061590Z mov.b64 {%r1268, %r1269}, %rd247; 2026-02-21T08:22:03.2061756Z cvt.rn.f16x2.f32 %r1270, %r1269, %r1268; 2026-02-21T08:22:03.2062032Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2062307Z cvt.u64.u32 %rd248, %r621; 2026-02-21T08:22:03.2062454Z cvt.u64.u32 %rd249, %r622; 
2026-02-21T08:22:03.2062605Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:22:03.2062753Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:22:03.2063011Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2063330Z mov.b64 {%r1271, %r1272}, %rd251; 2026-02-21T08:22:03.2063509Z cvt.rn.f16x2.f32 %r1273, %r1272, %r1271; 2026-02-21T08:22:03.2063776Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2064057Z cvt.u64.u32 %rd252, %r623; 2026-02-21T08:22:03.2064211Z cvt.u64.u32 %rd253, %r624; 2026-02-21T08:22:03.2064360Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:22:03.2064519Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:22:03.2064798Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2065085Z mov.b64 {%r1274, %r1275}, %rd255; 2026-02-21T08:22:03.2065254Z cvt.rn.f16x2.f32 %r1276, %r1275, %r1274; 2026-02-21T08:22:03.2065545Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2065821Z cvt.u64.u32 %rd256, %r625; 2026-02-21T08:22:03.2065967Z cvt.u64.u32 %rd257, %r626; 2026-02-21T08:22:03.2066124Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:22:03.2066279Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:22:03.2066541Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2066811Z mov.b64 {%r1277, %r1278}, %rd259; 2026-02-21T08:22:03.2066983Z cvt.rn.f16x2.f32 %r1279, %r1278, %r1277; 2026-02-21T08:22:03.2067254Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2067534Z cvt.u64.u32 %rd260, %r627; 2026-02-21T08:22:03.2067689Z cvt.u64.u32 %rd261, %r628; 2026-02-21T08:22:03.2067837Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:22:03.2067995Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:22:03.2068248Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2068525Z mov.b64 {%r1280, 
%r1281}, %rd263; 2026-02-21T08:22:03.2068691Z cvt.rn.f16x2.f32 %r1282, %r1281, %r1280; 2026-02-21T08:22:03.2068968Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2069244Z cvt.u64.u32 %rd264, %r629; 2026-02-21T08:22:03.2069391Z cvt.u64.u32 %rd265, %r630; 2026-02-21T08:22:03.2069544Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:22:03.2069692Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:22:03.2069950Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2070218Z mov.b64 {%r1283, %r1284}, %rd267; 2026-02-21T08:22:03.2070391Z cvt.rn.f16x2.f32 %r1285, %r1284, %r1283; 2026-02-21T08:22:03.2070658Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2070934Z cvt.u64.u32 %rd268, %r632; 2026-02-21T08:22:03.2071087Z cvt.u64.u32 %rd269, %r633; 2026-02-21T08:22:03.2071234Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:22:03.2071388Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:22:03.2071646Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2071988Z mov.b64 {%r1286, %r1287}, %rd271; 2026-02-21T08:22:03.2072153Z cvt.rn.f16x2.f32 %r1288, %r1287, %r1286; 2026-02-21T08:22:03.2072427Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2072716Z cvt.u64.u32 %rd272, %r634; 2026-02-21T08:22:03.2072869Z cvt.u64.u32 %rd273, %r635; 2026-02-21T08:22:03.2073032Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:22:03.2073190Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:22:03.2073467Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2073756Z mov.b64 {%r1289, %r1290}, %rd275; 2026-02-21T08:22:03.2073941Z cvt.rn.f16x2.f32 %r1291, %r1290, %r1289; 2026-02-21T08:22:03.2074230Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2074580Z cvt.u64.u32 %rd276, %r636; 
2026-02-21T08:22:03.2074778Z cvt.u64.u32 %rd277, %r637; 2026-02-21T08:22:03.2074938Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:22:03.2075106Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:22:03.2075383Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2075682Z mov.b64 {%r1292, %r1293}, %rd279; 2026-02-21T08:22:03.2075861Z cvt.rn.f16x2.f32 %r1294, %r1293, %r1292; 2026-02-21T08:22:03.2076171Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2076479Z cvt.u64.u32 %rd280, %r638; 2026-02-21T08:22:03.2076637Z cvt.u64.u32 %rd281, %r639; 2026-02-21T08:22:03.2076801Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:22:03.2076960Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:22:03.2077238Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2077540Z mov.b64 {%r1295, %r1296}, %rd283; 2026-02-21T08:22:03.2077730Z cvt.rn.f16x2.f32 %r1297, %r1296, %r1295; 2026-02-21T08:22:03.2078031Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2078322Z cvt.u64.u32 %rd284, %r640; 2026-02-21T08:22:03.2078485Z cvt.u64.u32 %rd285, %r641; 2026-02-21T08:22:03.2078641Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:22:03.2078807Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:22:03.2079086Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2079385Z mov.b64 {%r1298, %r1299}, %rd287; 2026-02-21T08:22:03.2079558Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T08:22:03.2079856Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2080159Z cvt.u64.u32 %rd288, %r642; 2026-02-21T08:22:03.2080314Z cvt.u64.u32 %rd289, %r643; 2026-02-21T08:22:03.2080477Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:22:03.2080636Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:22:03.2080919Z .loc 1 58 27 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2081227Z mov.b64 {%r1301, %r1302}, %rd291; 2026-02-21T08:22:03.2081399Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T08:22:03.2081679Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2081959Z cvt.u64.u32 %rd292, %r644; 2026-02-21T08:22:03.2082114Z cvt.u64.u32 %rd293, %r645; 2026-02-21T08:22:03.2082260Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:22:03.2082417Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:22:03.2082673Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2082957Z mov.b64 {%r1304, %r1305}, %rd295; 2026-02-21T08:22:03.2083124Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 2026-02-21T08:22:03.2083411Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2083737Z cvt.u64.u32 %rd296, %r646; 2026-02-21T08:22:03.2083883Z cvt.u64.u32 %rd297, %r647; 2026-02-21T08:22:03.2084035Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:22:03.2084183Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:22:03.2084444Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2084741Z mov.b64 {%r1307, %r1308}, %rd299; 2026-02-21T08:22:03.2084917Z cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T08:22:03.2085189Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2085459Z cvt.u64.u32 %rd300, %r649; 2026-02-21T08:22:03.2085617Z cvt.u64.u32 %rd301, %r650; 2026-02-21T08:22:03.2085766Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:22:03.2085923Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:22:03.2086239Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2086527Z mov.b64 {%r1310, %r1311}, %rd303; 2026-02-21T08:22:03.2086694Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T08:22:03.2086975Z .loc 1 56 52 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2087264Z cvt.u64.u32 %rd304, %r651; 2026-02-21T08:22:03.2087413Z cvt.u64.u32 %rd305, %r652; 2026-02-21T08:22:03.2087573Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:22:03.2087722Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:22:03.2087984Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2088252Z mov.b64 {%r1313, %r1314}, %rd307; 2026-02-21T08:22:03.2088425Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 2026-02-21T08:22:03.2088712Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2088997Z cvt.u64.u32 %rd308, %r653; 2026-02-21T08:22:03.2089157Z cvt.u64.u32 %rd309, %r654; 2026-02-21T08:22:03.2089301Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:22:03.2089455Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:22:03.2089709Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2089993Z mov.b64 {%r1316, %r1317}, %rd311; 2026-02-21T08:22:03.2090158Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T08:22:03.2090432Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2090709Z cvt.u64.u32 %rd312, %r655; 2026-02-21T08:22:03.2090856Z cvt.u64.u32 %rd313, %r656; 2026-02-21T08:22:03.2091010Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:22:03.2091161Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:22:03.2091423Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2091695Z mov.b64 {%r1319, %r1320}, %rd315; 2026-02-21T08:22:03.2091869Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T08:22:03.2092151Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2092418Z cvt.u64.u32 %rd316, %r657; 2026-02-21T08:22:03.2092570Z cvt.u64.u32 %rd317, %r658; 2026-02-21T08:22:03.2092716Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:22:03.2092873Z or.b64 
%rd319, %rd316, %rd318; 2026-02-21T08:22:03.2093124Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2093413Z mov.b64 {%r1322, %r1323}, %rd319; 2026-02-21T08:22:03.2093577Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T08:22:03.2093853Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2094133Z cvt.u64.u32 %rd320, %r659; 2026-02-21T08:22:03.2094282Z cvt.u64.u32 %rd321, %r660; 2026-02-21T08:22:03.2094434Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:22:03.2094585Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:22:03.2094932Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2095206Z mov.b64 {%r1325, %r1326}, %rd323; 2026-02-21T08:22:03.2095379Z cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T08:22:03.2095660Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2095931Z cvt.u64.u32 %rd324, %r661; 2026-02-21T08:22:03.2096088Z cvt.u64.u32 %rd325, %r662; 2026-02-21T08:22:03.2096235Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:22:03.2096396Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:22:03.2096656Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2096944Z mov.b64 {%r1328, %r1329}, %rd327; 2026-02-21T08:22:03.2097111Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T08:22:03.2097453Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2097743Z cvt.u64.u32 %rd328, %r663; 2026-02-21T08:22:03.2097890Z cvt.u64.u32 %rd329, %r664; 2026-02-21T08:22:03.2098048Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:22:03.2098203Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:22:03.2098469Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2098740Z mov.b64 {%r1331, %r1332}, %rd331; 2026-02-21T08:22:03.2098914Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 
2026-02-21T08:22:03.2099200Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2099473Z cvt.u64.u32 %rd332, %r666; 2026-02-21T08:22:03.2099633Z cvt.u64.u32 %rd333, %r667; 2026-02-21T08:22:03.2099783Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:22:03.2099945Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:22:03.2100202Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2100487Z mov.b64 {%r1334, %r1335}, %rd335; 2026-02-21T08:22:03.2100652Z cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T08:22:03.2100931Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2101208Z cvt.u64.u32 %rd336, %r668; 2026-02-21T08:22:03.2101355Z cvt.u64.u32 %rd337, %r669; 2026-02-21T08:22:03.2101510Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:22:03.2101661Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:22:03.2101922Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2102192Z mov.b64 {%r1337, %r1338}, %rd339; 2026-02-21T08:22:03.2102365Z cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T08:22:03.2102642Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2102920Z cvt.u64.u32 %rd340, %r670; 2026-02-21T08:22:03.2103075Z cvt.u64.u32 %rd341, %r671; 2026-02-21T08:22:03.2103224Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:22:03.2103380Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:22:03.2103633Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2103919Z mov.b64 {%r1340, %r1341}, %rd343; 2026-02-21T08:22:03.2104083Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T08:22:03.2104359Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2104642Z cvt.u64.u32 %rd344, %r672; 2026-02-21T08:22:03.2104815Z cvt.u64.u32 %rd345, %r673; 2026-02-21T08:22:03.2104970Z shl.b64 %rd346, %rd345, 
32; 2026-02-21T08:22:03.2105120Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:22:03.2105377Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2105658Z mov.b64 {%r1343, %r1344}, %rd347; 2026-02-21T08:22:03.2105833Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T08:22:03.2106165Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2106448Z cvt.u64.u32 %rd348, %r674; 2026-02-21T08:22:03.2106600Z cvt.u64.u32 %rd349, %r675; 2026-02-21T08:22:03.2106745Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:22:03.2106904Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:22:03.2107159Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2107438Z mov.b64 {%r1346, %r1347}, %rd351; 2026-02-21T08:22:03.2107613Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T08:22:03.2107891Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2108170Z cvt.u64.u32 %rd352, %r676; 2026-02-21T08:22:03.2108317Z cvt.u64.u32 %rd353, %r677; 2026-02-21T08:22:03.2108470Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:22:03.2108670Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:22:03.2108939Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2109220Z mov.b64 {%r1349, %r1350}, %rd355; 2026-02-21T08:22:03.2109405Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 2026-02-21T08:22:03.2109681Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2109946Z cvt.u64.u32 %rd356, %r678; 2026-02-21T08:22:03.2110101Z cvt.u64.u32 %rd357, %r679; 2026-02-21T08:22:03.2110247Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:22:03.2110404Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:22:03.2110655Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2110929Z mov.b64 {%r1352, %r1353}, %rd359; 2026-02-21T08:22:03.2111103Z 
cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T08:22:03.2111373Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2111649Z cvt.u64.u32 %rd360, %r680; 2026-02-21T08:22:03.2111796Z cvt.u64.u32 %rd361, %r681; 2026-02-21T08:22:03.2111948Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:22:03.2112098Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:22:03.2112357Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2112632Z mov.b64 {%r1355, %r1356}, %rd363; 2026-02-21T08:22:03.2112805Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T08:22:03.2113079Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2113346Z cvt.u64.u32 %rd364, %r683; 2026-02-21T08:22:03.2113501Z cvt.u64.u32 %rd365, %r684; 2026-02-21T08:22:03.2113646Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:22:03.2113804Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:22:03.2114061Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2114339Z mov.b64 {%r1358, %r1359}, %rd367; 2026-02-21T08:22:03.2114511Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T08:22:03.2114796Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2115083Z cvt.u64.u32 %rd368, %r685; 2026-02-21T08:22:03.2115230Z cvt.u64.u32 %rd369, %r686; 2026-02-21T08:22:03.2115384Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:22:03.2115545Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:22:03.2115834Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2116120Z mov.b64 {%r1361, %r1362}, %rd371; 2026-02-21T08:22:03.2116301Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T08:22:03.2116592Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2116878Z cvt.u64.u32 %rd372, %r687; 2026-02-21T08:22:03.2117041Z cvt.u64.u32 %rd373, %r688; 
2026-02-21T08:22:03.2117258Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:22:03.2117424Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:22:03.2117690Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2117989Z mov.b64 {%r1364, %r1365}, %rd375; 2026-02-21T08:22:03.2118172Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T08:22:03.2118463Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2118754Z cvt.u64.u32 %rd376, %r689; 2026-02-21T08:22:03.2118910Z cvt.u64.u32 %rd377, %r690; 2026-02-21T08:22:03.2119073Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:22:03.2119230Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:22:03.2119509Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2119808Z mov.b64 {%r1367, %r1368}, %rd379; 2026-02-21T08:22:03.2120044Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T08:22:03.2120347Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2120631Z cvt.u64.u32 %rd380, %r691; 2026-02-21T08:22:03.2120794Z cvt.u64.u32 %rd381, %r692; 2026-02-21T08:22:03.2120946Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:22:03.2121112Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:22:03.2121381Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2121683Z mov.b64 {%r1370, %r1371}, %rd383; 2026-02-21T08:22:03.2121863Z cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T08:22:03.2122144Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2122435Z cvt.u64.u32 %rd384, %r693; 2026-02-21T08:22:03.2122591Z cvt.u64.u32 %rd385, %r694; 2026-02-21T08:22:03.2122753Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:22:03.2122911Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:22:03.2123192Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2123530Z mov.b64 {%r1373, 
%r1374}, %rd387; 2026-02-21T08:22:03.2123698Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T08:22:03.2123970Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2124238Z cvt.u64.u32 %rd388, %r695; 2026-02-21T08:22:03.2124393Z cvt.u64.u32 %rd389, %r696; 2026-02-21T08:22:03.2124539Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:22:03.2124714Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:22:03.2124965Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2125239Z mov.b64 {%r1376, %r1377}, %rd391; 2026-02-21T08:22:03.2125410Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T08:22:03.2125678Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2125963Z cvt.u64.u32 %rd392, %r697; 2026-02-21T08:22:03.2126109Z cvt.u64.u32 %rd393, %r698; 2026-02-21T08:22:03.2126263Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:22:03.2126412Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:22:03.2126670Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2126941Z mov.b64 {%r1379, %r1380}, %rd395; 2026-02-21T08:22:03.2127105Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 2026-02-21T08:22:03.2127379Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2127645Z cvt.u64.u32 %rd396, %r700; 2026-02-21T08:22:03.2127799Z cvt.u64.u32 %rd397, %r701; 2026-02-21T08:22:03.2127944Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:22:03.2128101Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:22:03.2128352Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2128685Z mov.b64 {%r1382, %r1383}, %rd399; 2026-02-21T08:22:03.2128857Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T08:22:03.2129122Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2129398Z cvt.u64.u32 %rd400, %r702; 
2026-02-21T08:22:03.2129543Z cvt.u64.u32 %rd401, %r703; 2026-02-21T08:22:03.2129696Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:22:03.2129844Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:22:03.2130104Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2130380Z mov.b64 {%r1385, %r1386}, %rd403; 2026-02-21T08:22:03.2130548Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T08:22:03.2130825Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2131095Z cvt.u64.u32 %rd404, %r704; 2026-02-21T08:22:03.2131325Z cvt.u64.u32 %rd405, %r705; 2026-02-21T08:22:03.2131477Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:22:03.2131634Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:22:03.2131884Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2132157Z mov.b64 {%r1388, %r1389}, %rd407; 2026-02-21T08:22:03.2132327Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T08:22:03.2132591Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2132863Z cvt.u64.u32 %rd408, %r706; 2026-02-21T08:22:03.2132919Z cvt.u64.u32 %rd409, %r707; 2026-02-21T08:22:03.2132975Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:22:03.2133040Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:22:03.2133193Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2133250Z mov.b64 {%r1391, %r1392}, %rd411; 2026-02-21T08:22:03.2133323Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T08:22:03.2133479Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2133536Z cvt.u64.u32 %rd412, %r708; 2026-02-21T08:22:03.2133592Z cvt.u64.u32 %rd413, %r709; 2026-02-21T08:22:03.2133656Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:22:03.2133712Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:22:03.2133863Z .loc 1 58 27 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2133929Z mov.b64 {%r1394, %r1395}, %rd415; 2026-02-21T08:22:03.2133993Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T08:22:03.2134151Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2134213Z cvt.u64.u32 %rd416, %r710; 2026-02-21T08:22:03.2134270Z cvt.u64.u32 %rd417, %r711; 2026-02-21T08:22:03.2134326Z shl.b64 %rd418, %rd417, 32; 2026-02-21T08:22:03.2134384Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:22:03.2134548Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2134605Z mov.b64 {%r1397, %r1398}, %rd419; 2026-02-21T08:22:03.2134692Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 2026-02-21T08:22:03.2134863Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2134919Z cvt.u64.u32 %rd420, %r712; 2026-02-21T08:22:03.2134974Z cvt.u64.u32 %rd421, %r713; 2026-02-21T08:22:03.2135038Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:22:03.2135095Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:22:03.2135252Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2135309Z mov.b64 {%r1400, %r1401}, %rd423; 2026-02-21T08:22:03.2135380Z cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T08:22:03.2135539Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2135650Z cvt.u64.u32 %rd424, %r714; 2026-02-21T08:22:03.2135714Z cvt.u64.u32 %rd425, %r715; 2026-02-21T08:22:03.2135769Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:22:03.2135826Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:22:03.2135989Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2136046Z mov.b64 {%r1403, %r1404}, %rd427; 2026-02-21T08:22:03.2136109Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T08:22:03.2136264Z .loc 1 56 52 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2136326Z cvt.u64.u32 %rd428, %r717; 2026-02-21T08:22:03.2136379Z cvt.u64.u32 %rd429, %r718; 2026-02-21T08:22:03.2136433Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:22:03.2136494Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:22:03.2136693Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2136752Z mov.b64 {%r1406, %r1407}, %rd431; 2026-02-21T08:22:03.2136825Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 2026-02-21T08:22:03.2136984Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2137040Z cvt.u64.u32 %rd432, %r719; 2026-02-21T08:22:03.2137095Z cvt.u64.u32 %rd433, %r720; 2026-02-21T08:22:03.2137158Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:22:03.2137215Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:22:03.2137379Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2137444Z mov.b64 {%r1409, %r1410}, %rd435; 2026-02-21T08:22:03.2137507Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 2026-02-21T08:22:03.2137665Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2137727Z cvt.u64.u32 %rd436, %r721; 2026-02-21T08:22:03.2137786Z cvt.u64.u32 %rd437, %r722; 2026-02-21T08:22:03.2137840Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:22:03.2137899Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:22:03.2138065Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2138122Z mov.b64 {%r1412, %r1413}, %rd439; 2026-02-21T08:22:03.2138183Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T08:22:03.2138348Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2138405Z cvt.u64.u32 %rd440, %r723; 2026-02-21T08:22:03.2138460Z cvt.u64.u32 %rd441, %r724; 2026-02-21T08:22:03.2138523Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:22:03.2138579Z or.b64 
%rd443, %rd440, %rd442; 2026-02-21T08:22:03.2138740Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2138796Z mov.b64 {%r1415, %r1416}, %rd443; 2026-02-21T08:22:03.2138868Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T08:22:03.2139032Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2139089Z cvt.u64.u32 %rd444, %r725; 2026-02-21T08:22:03.2139153Z cvt.u64.u32 %rd445, %r726; 2026-02-21T08:22:03.2139207Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:22:03.2139264Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:22:03.2139432Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2139489Z mov.b64 {%r1418, %r1419}, %rd447; 2026-02-21T08:22:03.2139552Z cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T08:22:03.2139715Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2139779Z cvt.u64.u32 %rd448, %r727; 2026-02-21T08:22:03.2139836Z cvt.u64.u32 %rd449, %r728; 2026-02-21T08:22:03.2139891Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:22:03.2139956Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:22:03.2140122Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2140271Z mov.b64 {%r1421, %r1422}, %rd451; 2026-02-21T08:22:03.2140342Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T08:22:03.2140500Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2140556Z cvt.u64.u32 %rd452, %r729; 2026-02-21T08:22:03.2140611Z cvt.u64.u32 %rd453, %r730; 2026-02-21T08:22:03.2140673Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:22:03.2140731Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:22:03.2140889Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2140955Z mov.b64 {%r1424, %r1425}, %rd455; 2026-02-21T08:22:03.2141017Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 
2026-02-21T08:22:03.2141213Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2141279Z cvt.u64.u32 %rd456, %r731; 2026-02-21T08:22:03.2141334Z cvt.u64.u32 %rd457, %r732; 2026-02-21T08:22:03.2141390Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:22:03.2141446Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:22:03.2141615Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2141673Z mov.b64 {%r1427, %r1428}, %rd459; 2026-02-21T08:22:03.2141738Z cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T08:22:03.2141900Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2141956Z cvt.u64.u32 %rd460, %r734; 2026-02-21T08:22:03.2142011Z cvt.u64.u32 %rd461, %r735; 2026-02-21T08:22:03.2142075Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:22:03.2142131Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:22:03.2142289Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2142347Z mov.b64 {%r1430, %r1431}, %rd463; 2026-02-21T08:22:03.2142421Z cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T08:22:03.2142580Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2142638Z cvt.u64.u32 %rd464, %r736; 2026-02-21T08:22:03.2142700Z cvt.u64.u32 %rd465, %r737; 2026-02-21T08:22:03.2142756Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:22:03.2142811Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:22:03.2142977Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2143034Z mov.b64 {%r1433, %r1434}, %rd467; 2026-02-21T08:22:03.2143098Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T08:22:03.2143257Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2143320Z cvt.u64.u32 %rd468, %r738; 2026-02-21T08:22:03.2143375Z cvt.u64.u32 %rd469, %r739; 2026-02-21T08:22:03.2143432Z shl.b64 %rd470, %rd469, 
32; 2026-02-21T08:22:03.2143496Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:22:03.2143656Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2143711Z mov.b64 {%r1436, %r1437}, %rd471; 2026-02-21T08:22:03.2143781Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T08:22:03.2143938Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2143994Z cvt.u64.u32 %rd472, %r740; 2026-02-21T08:22:03.2144050Z cvt.u64.u32 %rd473, %r741; 2026-02-21T08:22:03.2144112Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:22:03.2144169Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:22:03.2144328Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2144392Z mov.b64 {%r1439, %r1440}, %rd475; 2026-02-21T08:22:03.2144455Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T08:22:03.2144615Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2144743Z cvt.u64.u32 %rd476, %r742; 2026-02-21T08:22:03.2144799Z cvt.u64.u32 %rd477, %r743; 2026-02-21T08:22:03.2144854Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:22:03.2144910Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:22:03.2145077Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2145134Z mov.b64 {%r1442, %r1443}, %rd479; 2026-02-21T08:22:03.2145196Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 2026-02-21T08:22:03.2145358Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2145414Z cvt.u64.u32 %rd480, %r744; 2026-02-21T08:22:03.2145468Z cvt.u64.u32 %rd481, %r745; 2026-02-21T08:22:03.2145529Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:22:03.2145584Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:22:03.2145790Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2145851Z mov.b64 {%r1445, %r1446}, %rd483; 2026-02-21T08:22:03.2145919Z 
cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T08:22:03.2146074Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2146129Z cvt.u64.u32 %rd484, %r746; 2026-02-21T08:22:03.2146191Z cvt.u64.u32 %rd485, %r747; 2026-02-21T08:22:03.2146247Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:22:03.2146303Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:22:03.2146464Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2146522Z mov.b64 {%r1448, %r1449}, %rd487; 2026-02-21T08:22:03.2146585Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T08:22:03.2146738Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2146803Z cvt.u64.u32 %rd488, %r748; 2026-02-21T08:22:03.2146859Z cvt.u64.u32 %rd489, %r749; 2026-02-21T08:22:03.2146915Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:22:03.2146979Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:22:03.2147132Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2147189Z mov.b64 {%r1451, %r1452}, %rd491; 2026-02-21T08:22:03.2147262Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T08:22:03.2147415Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2147471Z cvt.u64.u32 %rd492, %r751; 2026-02-21T08:22:03.2147525Z cvt.u64.u32 %rd493, %r752; 2026-02-21T08:22:03.2147591Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:22:03.2147648Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:22:03.2147800Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2147867Z mov.b64 {%r1454, %r1455}, %rd495; 2026-02-21T08:22:03.2147933Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T08:22:03.2148085Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2148149Z cvt.u64.u32 %rd496, %r753; 2026-02-21T08:22:03.2148205Z cvt.u64.u32 %rd497, %r754; 
2026-02-21T08:22:03.2148262Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:22:03.2148321Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:22:03.2148484Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2148541Z mov.b64 {%r1457, %r1458}, %rd499; 2026-02-21T08:22:03.2148611Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T08:22:03.2148772Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2148827Z cvt.u64.u32 %rd500, %r755; 2026-02-21T08:22:03.2148881Z cvt.u64.u32 %rd501, %r756; 2026-02-21T08:22:03.2148944Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:22:03.2149066Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:22:03.2149228Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2149285Z mov.b64 {%r1460, %r1461}, %rd503; 2026-02-21T08:22:03.2149355Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T08:22:03.2149515Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2149571Z cvt.u64.u32 %rd504, %r757; 2026-02-21T08:22:03.2149634Z cvt.u64.u32 %rd505, %r758; 2026-02-21T08:22:03.2149690Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:22:03.2149747Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:22:03.2149918Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2149974Z mov.b64 {%r1463, %r1464}, %rd507; 2026-02-21T08:22:03.2150038Z cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T08:22:03.2150244Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2150312Z cvt.u64.u32 %rd508, %r759; 2026-02-21T08:22:03.2150366Z cvt.u64.u32 %rd509, %r760; 2026-02-21T08:22:03.2150421Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:22:03.2150486Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:22:03.2150646Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2150703Z mov.b64 {%r1466, 
%r1467}, %rd511; 2026-02-21T08:22:03.2150773Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T08:22:03.2150932Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2150988Z cvt.u64.u32 %rd512, %r761; 2026-02-21T08:22:03.2151044Z cvt.u64.u32 %rd513, %r762; 2026-02-21T08:22:03.2151108Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:22:03.2151163Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:22:03.2151324Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2151391Z mov.b64 {%r1469, %r1470}, %rd515; 2026-02-21T08:22:03.2151454Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T08:22:03.2151614Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2151678Z cvt.u64.u32 %rd516, %r763; 2026-02-21T08:22:03.2151734Z cvt.u64.u32 %rd517, %r764; 2026-02-21T08:22:03.2151789Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:22:03.2151845Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:22:03.2152015Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2152072Z mov.b64 {%r1472, %r1473}, %rd519; 2026-02-21T08:22:03.2152136Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 2026-02-21T08:22:03.2152303Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2152361Z cvt.u64.u32 %rd520, %r765; 2026-02-21T08:22:03.2152417Z cvt.u64.u32 %rd521, %r766; 2026-02-21T08:22:03.2152479Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:22:03.2152536Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:22:03.2152699Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2152755Z mov.b64 {%r1475, %r1476}, %rd523; 2026-02-21T08:22:03.2152824Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T08:22:03.2152985Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2153040Z cvt.u64.u32 %rd524, %r768; 
2026-02-21T08:22:03.2153102Z cvt.u64.u32 %rd525, %r769; 2026-02-21T08:22:03.2153158Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:22:03.2153213Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:22:03.2153381Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2153440Z mov.b64 {%r1478, %r1479}, %rd527; 2026-02-21T08:22:03.2153546Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T08:22:03.2153709Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2153772Z cvt.u64.u32 %rd528, %r770; 2026-02-21T08:22:03.2153827Z cvt.u64.u32 %rd529, %r771; 2026-02-21T08:22:03.2153882Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:22:03.2153947Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:22:03.2154105Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2154161Z mov.b64 {%r1481, %r1482}, %rd531; 2026-02-21T08:22:03.2154232Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T08:22:03.2154390Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2154445Z cvt.u64.u32 %rd532, %r772; 2026-02-21T08:22:03.2154500Z cvt.u64.u32 %rd533, %r773; 2026-02-21T08:22:03.2154606Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:22:03.2154667Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:22:03.2154850Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2154914Z mov.b64 {%r1484, %r1485}, %rd535; 2026-02-21T08:22:03.2154978Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T08:22:03.2155137Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2155199Z cvt.u64.u32 %rd536, %r774; 2026-02-21T08:22:03.2155255Z cvt.u64.u32 %rd537, %r775; 2026-02-21T08:22:03.2155310Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:22:03.2155367Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:22:03.2155533Z .loc 1 58 27 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2155590Z mov.b64 {%r1487, %r1488}, %rd539; 2026-02-21T08:22:03.2155654Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T08:22:03.2155822Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2155881Z cvt.u64.u32 %rd540, %r776; 2026-02-21T08:22:03.2155936Z cvt.u64.u32 %rd541, %r777; 2026-02-21T08:22:03.2155997Z shl.b64 %rd542, %rd541, 32; 2026-02-21T08:22:03.2156055Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:22:03.2156215Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2156272Z mov.b64 {%r1490, %r1491}, %rd543; 2026-02-21T08:22:03.2156344Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 2026-02-21T08:22:03.2156506Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2156562Z cvt.u64.u32 %rd544, %r778; 2026-02-21T08:22:03.2156627Z cvt.u64.u32 %rd545, %r779; 2026-02-21T08:22:03.2156684Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:22:03.2156743Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:22:03.2156914Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2156978Z mov.b64 {%r1493, %r1494}, %rd547; 2026-02-21T08:22:03.2157044Z cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T08:22:03.2157209Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2157277Z cvt.u64.u32 %rd548, %r780; 2026-02-21T08:22:03.2157333Z cvt.u64.u32 %rd549, %r781; 2026-02-21T08:22:03.2157390Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:22:03.2157454Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:22:03.2157612Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2157669Z mov.b64 {%r1496, %r1497}, %rd551; 2026-02-21T08:22:03.2157739Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T08:22:03.2157899Z .loc 1 56 52 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2157957Z cvt.u64.u32 %rd552, %r782; 2026-02-21T08:22:03.2158066Z cvt.u64.u32 %rd553, %r783; 2026-02-21T08:22:03.2158128Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:22:03.2158184Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:22:03.2158360Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2158425Z mov.b64 {%r1499, %r1500}, %rd555; 2026-02-21T08:22:03.2158492Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 2026-02-21T08:22:03.2158658Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2158722Z cvt.u64.u32 %rd556, %r785; 2026-02-21T08:22:03.2158779Z cvt.u64.u32 %rd557, %r786; 2026-02-21T08:22:03.2158836Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:22:03.2158895Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:22:03.2159069Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2159186Z mov.b64 {%r1502, %r1503}, %rd559; 2026-02-21T08:22:03.2159259Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T08:22:03.2159430Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2159487Z cvt.u64.u32 %rd560, %r787; 2026-02-21T08:22:03.2159545Z cvt.u64.u32 %rd561, %r788; 2026-02-21T08:22:03.2159609Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:22:03.2159668Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:22:03.2159837Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2159896Z mov.b64 {%r1505, %r1506}, %rd563; 2026-02-21T08:22:03.2159971Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T08:22:03.2160136Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2160196Z cvt.u64.u32 %rd564, %r789; 2026-02-21T08:22:03.2160260Z cvt.u64.u32 %rd565, %r790; 2026-02-21T08:22:03.2160322Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:22:03.2160382Z or.b64 
%rd567, %rd564, %rd566; 2026-02-21T08:22:03.2160560Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2160619Z mov.b64 {%r1508, %r1509}, %rd567; 2026-02-21T08:22:03.2160684Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T08:22:03.2160855Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2160920Z cvt.u64.u32 %rd568, %r791; 2026-02-21T08:22:03.2160977Z cvt.u64.u32 %rd569, %r792; 2026-02-21T08:22:03.2161035Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:22:03.2161099Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:22:03.2161273Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2161331Z mov.b64 {%r1511, %r1512}, %rd571; 2026-02-21T08:22:03.2161405Z cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T08:22:03.2161576Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2161637Z cvt.u64.u32 %rd572, %r793; 2026-02-21T08:22:03.2161695Z cvt.u64.u32 %rd573, %r794; 2026-02-21T08:22:03.2161759Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:22:03.2161819Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:22:03.2161984Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2162051Z mov.b64 {%r1514, %r1515}, %rd575; 2026-02-21T08:22:03.2162115Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T08:22:03.2162279Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2162344Z cvt.u64.u32 %rd576, %r795; 2026-02-21T08:22:03.2162402Z cvt.u64.u32 %rd577, %r796; 2026-02-21T08:22:03.2162459Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:22:03.2162517Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:22:03.2162694Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2162801Z mov.b64 {%r1517, %r1518}, %rd579; 2026-02-21T08:22:03.2162868Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 
2026-02-21T08:22:03.2163044Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2163102Z cvt.u64.u32 %rd580, %r797; 2026-02-21T08:22:03.2163159Z cvt.u64.u32 %rd581, %r798; 2026-02-21T08:22:03.2163224Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:22:03.2163283Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:22:03.2163452Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2163512Z mov.b64 {%r1520, %r1521}, %rd583; 2026-02-21T08:22:03.2163586Z cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T08:22:03.2163757Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2163860Z cvt.u64.u32 %rd584, %r799; 2026-02-21T08:22:03.2163930Z cvt.u64.u32 %rd585, %r800; 2026-02-21T08:22:03.2163991Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:22:03.2164048Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:22:03.2164224Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2164286Z mov.b64 {%r1523, %r1524}, %rd587; 2026-02-21T08:22:03.2164353Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T08:22:03.2164521Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2164589Z cvt.u64.u32 %rd588, %r802; 2026-02-21T08:22:03.2164647Z cvt.u64.u32 %rd589, %r803; 2026-02-21T08:22:03.2164726Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:22:03.2164793Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:22:03.2164956Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2165016Z mov.b64 {%r1526, %r1527}, %rd591; 2026-02-21T08:22:03.2165090Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T08:22:03.2165261Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2165319Z cvt.u64.u32 %rd592, %r804; 2026-02-21T08:22:03.2165376Z cvt.u64.u32 %rd593, %r805; 2026-02-21T08:22:03.2165442Z shl.b64 %rd594, %rd593, 
32; 2026-02-21T08:22:03.2165501Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:22:03.2165666Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2165735Z mov.b64 {%r1529, %r1530}, %rd595; 2026-02-21T08:22:03.2165801Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T08:22:03.2165973Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2166040Z cvt.u64.u32 %rd596, %r806; 2026-02-21T08:22:03.2166100Z cvt.u64.u32 %rd597, %r807; 2026-02-21T08:22:03.2166160Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:22:03.2166222Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:22:03.2166404Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2166461Z mov.b64 {%r1532, %r1533}, %rd599; 2026-02-21T08:22:03.2166526Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T08:22:03.2166698Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2166758Z cvt.u64.u32 %rd600, %r808; 2026-02-21T08:22:03.2166816Z cvt.u64.u32 %rd601, %r809; 2026-02-21T08:22:03.2166885Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:22:03.2166944Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:22:03.2167101Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2167159Z mov.b64 {%r1535, %r1536}, %rd603; 2026-02-21T08:22:03.2167229Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 2026-02-21T08:22:03.2167385Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2167536Z cvt.u64.u32 %rd604, %r810; 2026-02-21T08:22:03.2167599Z cvt.u64.u32 %rd605, %r811; 2026-02-21T08:22:03.2167654Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:22:03.2167712Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:22:03.2167875Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2167933Z mov.b64 {%r1538, %r1539}, %rd607; 2026-02-21T08:22:03.2167996Z 
cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T08:22:03.2168152Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2168217Z cvt.u64.u32 %rd608, %r812; 2026-02-21T08:22:03.2168272Z cvt.u64.u32 %rd609, %r813; 2026-02-21T08:22:03.2168329Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:22:03.2168394Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:22:03.2168610Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2168672Z mov.b64 {%r1541, %r1542}, %rd611; 2026-02-21T08:22:03.2168745Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T08:22:03.2168904Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2168961Z cvt.u64.u32 %rd612, %r814; 2026-02-21T08:22:03.2169018Z cvt.u64.u32 %rd613, %r815; 2026-02-21T08:22:03.2169081Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:22:03.2169142Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:22:03.2169301Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2169367Z mov.b64 {%r1544, %r1545}, %rd615; 2026-02-21T08:22:03.2169431Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T08:22:03.2169591Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2169653Z cvt.u64.u32 %rd616, %r816; 2026-02-21T08:22:03.2169711Z cvt.u64.u32 %rd617, %r817; 2026-02-21T08:22:03.2169769Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:22:03.2169826Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:22:03.2169994Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2170052Z mov.b64 {%r1547, %r1548}, %rd619; 2026-02-21T08:22:03.2170117Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T08:22:03.2170284Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2170340Z cvt.u64.u32 %rd620, %r819; 2026-02-21T08:22:03.2170394Z cvt.u64.u32 %rd621, %r820; 
2026-02-21T08:22:03.2170457Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:22:03.2170513Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:22:03.2170673Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2170730Z mov.b64 {%r1550, %r1551}, %rd623; 2026-02-21T08:22:03.2170803Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T08:22:03.2170963Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2171017Z cvt.u64.u32 %rd624, %r821; 2026-02-21T08:22:03.2171079Z cvt.u64.u32 %rd625, %r822; 2026-02-21T08:22:03.2171133Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:22:03.2171188Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:22:03.2171354Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2171411Z mov.b64 {%r1553, %r1554}, %rd627; 2026-02-21T08:22:03.2171475Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T08:22:03.2171636Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2171698Z cvt.u64.u32 %rd628, %r823; 2026-02-21T08:22:03.2171752Z cvt.u64.u32 %rd629, %r824; 2026-02-21T08:22:03.2171807Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:22:03.2171872Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:22:03.2172079Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2172135Z mov.b64 {%r1556, %r1557}, %rd631; 2026-02-21T08:22:03.2172204Z cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T08:22:03.2172365Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2172420Z cvt.u64.u32 %rd632, %r825; 2026-02-21T08:22:03.2172475Z cvt.u64.u32 %rd633, %r826; 2026-02-21T08:22:03.2172537Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:22:03.2172593Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:22:03.2172751Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2172815Z mov.b64 {%r1559, 
%r1560}, %rd635; 2026-02-21T08:22:03.2172878Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T08:22:03.2173082Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2173149Z cvt.u64.u32 %rd636, %r827; 2026-02-21T08:22:03.2173204Z cvt.u64.u32 %rd637, %r828; 2026-02-21T08:22:03.2173260Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:22:03.2173316Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:22:03.2173480Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2173536Z mov.b64 {%r1562, %r1563}, %rd639; 2026-02-21T08:22:03.2173599Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T08:22:03.2173761Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2173818Z cvt.u64.u32 %rd640, %r829; 2026-02-21T08:22:03.2173872Z cvt.u64.u32 %rd641, %r830; 2026-02-21T08:22:03.2173935Z shl.b64 %rd642, %rd641, 32; 2026-02-21T08:22:03.2173993Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T08:22:03.2174151Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2174210Z mov.b64 {%r1565, %r1566}, %rd643; 2026-02-21T08:22:03.2174283Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 2026-02-21T08:22:03.2174439Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2174496Z cvt.u64.u32 %rd644, %r831; 2026-02-21T08:22:03.2174560Z cvt.u64.u32 %rd645, %r832; 2026-02-21T08:22:03.2174616Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:22:03.2174703Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T08:22:03.2174876Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2174935Z mov.b64 {%r1568, %r1569}, %rd647; 2026-02-21T08:22:03.2174998Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T08:22:03.2175162Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2175225Z cvt.u64.u32 %rd648, %r833; 
2026-02-21T08:22:03.2175281Z cvt.u64.u32 %rd649, %r834; 2026-02-21T08:22:03.2175340Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:22:03.2175403Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:22:03.2175563Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2175621Z mov.b64 {%r1571, %r1572}, %rd651; 2026-02-21T08:22:03.2175690Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T08:22:03.2175845Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2175901Z cvt.u64.u32 %rd652, %r836; 2026-02-21T08:22:03.2175956Z cvt.u64.u32 %rd653, %r837; 2026-02-21T08:22:03.2176018Z shl.b64 %rd654, %rd653, 32; 2026-02-21T08:22:03.2176073Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T08:22:03.2176230Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2176294Z mov.b64 {%r1574, %r1575}, %rd655; 2026-02-21T08:22:03.2176359Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T08:22:03.2176572Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2176635Z cvt.u64.u32 %rd656, %r838; 2026-02-21T08:22:03.2176690Z cvt.u64.u32 %rd657, %r839; 2026-02-21T08:22:03.2176745Z shl.b64 %rd658, %rd657, 32; 2026-02-21T08:22:03.2176801Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T08:22:03.2176966Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2177023Z mov.b64 {%r1577, %r1578}, %rd659; 2026-02-21T08:22:03.2177086Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T08:22:03.2177250Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2177305Z cvt.u64.u32 %rd660, %r840; 2026-02-21T08:22:03.2177359Z cvt.u64.u32 %rd661, %r841; 2026-02-21T08:22:03.2177422Z shl.b64 %rd662, %rd661, 32; 2026-02-21T08:22:03.2177525Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T08:22:03.2177690Z .loc 1 58 27 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2177746Z mov.b64 {%r1580, %r1581}, %rd663; 2026-02-21T08:22:03.2177815Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T08:22:03.2177969Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2178025Z cvt.u64.u32 %rd664, %r842; 2026-02-21T08:22:03.2178086Z cvt.u64.u32 %rd665, %r843; 2026-02-21T08:22:03.2178140Z shl.b64 %rd666, %rd665, 32; 2026-02-21T08:22:03.2178197Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T08:22:03.2178361Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2178418Z mov.b64 {%r1583, %r1584}, %rd667; 2026-02-21T08:22:03.2178480Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T08:22:03.2178640Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2178705Z cvt.u64.u32 %rd668, %r844; 2026-02-21T08:22:03.2178760Z cvt.u64.u32 %rd669, %r845; 2026-02-21T08:22:03.2178815Z shl.b64 %rd670, %rd669, 32; 2026-02-21T08:22:03.2178877Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T08:22:03.2179036Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2179092Z mov.b64 {%r1586, %r1587}, %rd671; 2026-02-21T08:22:03.2179162Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T08:22:03.2179319Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2179374Z cvt.u64.u32 %rd672, %r846; 2026-02-21T08:22:03.2179427Z cvt.u64.u32 %rd673, %r847; 2026-02-21T08:22:03.2179489Z shl.b64 %rd674, %rd673, 32; 2026-02-21T08:22:03.2179545Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T08:22:03.2179706Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2179771Z mov.b64 {%r1589, %r1590}, %rd675; 2026-02-21T08:22:03.2179833Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T08:22:03.2179984Z .loc 1 56 52 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2180046Z cvt.u64.u32 %rd676, %r848; 2026-02-21T08:22:03.2180100Z cvt.u64.u32 %rd677, %r849; 2026-02-21T08:22:03.2180154Z shl.b64 %rd678, %rd677, 32; 2026-02-21T08:22:03.2180211Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T08:22:03.2180374Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2180430Z mov.b64 {%r1592, %r1593}, %rd679; 2026-02-21T08:22:03.2180492Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T08:22:03.2180654Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2180709Z cvt.u64.u32 %rd680, %r850; 2026-02-21T08:22:03.2180765Z cvt.u64.u32 %rd681, %r851; 2026-02-21T08:22:03.2180871Z shl.b64 %rd682, %rd681, 32; 2026-02-21T08:22:03.2180927Z or.b64 %rd683, %rd680, %rd682; 2026-02-21T08:22:03.2181085Z .loc 1 58 27 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:58:27 2026-02-21T08:22:03.2181141Z mov.b64 {%r1595, %r1596}, %rd683; 2026-02-21T08:22:03.2181211Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T08:22:03.2181366Z .loc 1 59 82 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:59:82 2026-02-21T08:22:03.2181469Z st.shared.v4.b32 [%r1170], {%r1216, %r1228, %r1240, %r1252}; 2026-02-21T08:22:03.2181576Z st.shared.v4.b32 [%r1169], {%r1264, %r1276, %r1288, %r1300}; 2026-02-21T08:22:03.2181669Z st.shared.v4.b32 [%r1167], {%r1312, %r1324, %r1336, %r1348}; 2026-02-21T08:22:03.2181756Z st.shared.v4.b32 [%r1165], {%r1360, %r1372, %r1384, %r1396}; 2026-02-21T08:22:03.2181818Z bar.sync 0, 128; 2026-02-21T08:22:03.2181875Z // begin inline asm 2026-02-21T08:22:03.2182126Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1013, %r1017, %r1021, %r1025}, [%r857]; 2026-02-21T08:22:03.2182185Z // end inline asm 2026-02-21T08:22:03.2182248Z // begin inline asm 2026-02-21T08:22:03.2182399Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1029, %r1033, %r1037, %r1041}, [%r862]; 
2026-02-21T08:22:03.2182452Z // end inline asm 2026-02-21T08:22:03.2182514Z // begin inline asm 2026-02-21T08:22:03.2182656Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1045, %r1049, %r1053, %r1057}, [%r867]; 2026-02-21T08:22:03.2182709Z // end inline asm 2026-02-21T08:22:03.2182769Z // begin inline asm 2026-02-21T08:22:03.2182911Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1061, %r1065, %r1069, %r1073}, [%r872]; 2026-02-21T08:22:03.2182965Z // end inline asm 2026-02-21T08:22:03.2183019Z bar.sync 0, 128; 2026-02-21T08:22:03.2183122Z st.shared.v4.b32 [%r1170], {%r1408, %r1420, %r1432, %r1444}; 2026-02-21T08:22:03.2183213Z st.shared.v4.b32 [%r1169], {%r1456, %r1468, %r1480, %r1492}; 2026-02-21T08:22:03.2183307Z st.shared.v4.b32 [%r1167], {%r1504, %r1516, %r1528, %r1540}; 2026-02-21T08:22:03.2183409Z st.shared.v4.b32 [%r1165], {%r1552, %r1564, %r1576, %r1588}; 2026-02-21T08:22:03.2183471Z bar.sync 0, 128; 2026-02-21T08:22:03.2183528Z // begin inline asm 2026-02-21T08:22:03.2183673Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1077, %r1081, %r1085, %r1089}, [%r857]; 2026-02-21T08:22:03.2183734Z // end inline asm 2026-02-21T08:22:03.2183788Z // begin inline asm 2026-02-21T08:22:03.2183928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1093, %r1097, %r1101, %r1105}, [%r862]; 2026-02-21T08:22:03.2183987Z // end inline asm 2026-02-21T08:22:03.2184040Z // begin inline asm 2026-02-21T08:22:03.2184179Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1109, %r1113, %r1117, %r1121}, [%r867]; 2026-02-21T08:22:03.2184237Z // end inline asm 2026-02-21T08:22:03.2184292Z // begin inline asm 2026-02-21T08:22:03.2184430Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1125, %r1129, %r1133, %r1137}, [%r872]; 2026-02-21T08:22:03.2184483Z // end inline asm 2026-02-21T08:22:03.2184546Z bar.sync 0, 128; 2026-02-21T08:22:03.2184637Z st.shared.v4.b32 [%r1170], {%r1219, %r1231, %r1243, %r1255}; 2026-02-21T08:22:03.2184758Z st.shared.v4.b32 [%r1169], {%r1267, %r1279, %r1291, %r1303}; 
2026-02-21T08:22:03.2184856Z st.shared.v4.b32 [%r1167], {%r1315, %r1327, %r1339, %r1351}; 2026-02-21T08:22:03.2184944Z st.shared.v4.b32 [%r1165], {%r1363, %r1375, %r1387, %r1399}; 2026-02-21T08:22:03.2184998Z bar.sync 0, 128; 2026-02-21T08:22:03.2185059Z // begin inline asm 2026-02-21T08:22:03.2185202Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1014, %r1018, %r1022, %r1026}, [%r857]; 2026-02-21T08:22:03.2185254Z // end inline asm 2026-02-21T08:22:03.2185306Z // begin inline asm 2026-02-21T08:22:03.2185454Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1030, %r1034, %r1038, %r1042}, [%r862]; 2026-02-21T08:22:03.2185506Z // end inline asm 2026-02-21T08:22:03.2185558Z // begin inline asm 2026-02-21T08:22:03.2185709Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1046, %r1050, %r1054, %r1058}, [%r867]; 2026-02-21T08:22:03.2185813Z // end inline asm 2026-02-21T08:22:03.2185866Z // begin inline asm 2026-02-21T08:22:03.2186008Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1062, %r1066, %r1070, %r1074}, [%r872]; 2026-02-21T08:22:03.2186068Z // end inline asm 2026-02-21T08:22:03.2186121Z bar.sync 0, 128; 2026-02-21T08:22:03.2186210Z st.shared.v4.b32 [%r1170], {%r1411, %r1423, %r1435, %r1447}; 2026-02-21T08:22:03.2186307Z st.shared.v4.b32 [%r1169], {%r1459, %r1471, %r1483, %r1495}; 2026-02-21T08:22:03.2186394Z st.shared.v4.b32 [%r1167], {%r1507, %r1519, %r1531, %r1543}; 2026-02-21T08:22:03.2186482Z st.shared.v4.b32 [%r1165], {%r1555, %r1567, %r1579, %r1591}; 2026-02-21T08:22:03.2186541Z bar.sync 0, 128; 2026-02-21T08:22:03.2186596Z // begin inline asm 2026-02-21T08:22:03.2186736Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1078, %r1082, %r1086, %r1090}, [%r857]; 2026-02-21T08:22:03.2186786Z // end inline asm 2026-02-21T08:22:03.2186892Z // begin inline asm 2026-02-21T08:22:03.2187037Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1094, %r1098, %r1102, %r1106}, [%r862]; 2026-02-21T08:22:03.2187089Z // end inline asm 2026-02-21T08:22:03.2187149Z // begin inline asm 
2026-02-21T08:22:03.2187289Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1110, %r1114, %r1118, %r1122}, [%r867]; 2026-02-21T08:22:03.2187341Z // end inline asm 2026-02-21T08:22:03.2187400Z // begin inline asm 2026-02-21T08:22:03.2187541Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1126, %r1130, %r1134, %r1138}, [%r872]; 2026-02-21T08:22:03.2187592Z // end inline asm 2026-02-21T08:22:03.2187645Z bar.sync 0, 128; 2026-02-21T08:22:03.2187741Z st.shared.v4.b32 [%r1170], {%r1222, %r1234, %r1246, %r1258}; 2026-02-21T08:22:03.2187828Z st.shared.v4.b32 [%r1169], {%r1270, %r1282, %r1294, %r1306}; 2026-02-21T08:22:03.2187915Z st.shared.v4.b32 [%r1167], {%r1318, %r1330, %r1342, %r1354}; 2026-02-21T08:22:03.2188010Z st.shared.v4.b32 [%r1165], {%r1366, %r1378, %r1390, %r1402}; 2026-02-21T08:22:03.2188061Z bar.sync 0, 128; 2026-02-21T08:22:03.2188119Z // begin inline asm 2026-02-21T08:22:03.2188262Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1015, %r1019, %r1023, %r1027}, [%r857]; 2026-02-21T08:22:03.2188320Z // end inline asm 2026-02-21T08:22:03.2188372Z // begin inline asm 2026-02-21T08:22:03.2188513Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1031, %r1035, %r1039, %r1043}, [%r862]; 2026-02-21T08:22:03.2188571Z // end inline asm 2026-02-21T08:22:03.2188623Z // begin inline asm 2026-02-21T08:22:03.2188763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1047, %r1051, %r1055, %r1059}, [%r867]; 2026-02-21T08:22:03.2188820Z // end inline asm 2026-02-21T08:22:03.2188871Z // begin inline asm 2026-02-21T08:22:03.2189011Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1063, %r1067, %r1071, %r1075}, [%r872]; 2026-02-21T08:22:03.2189062Z // end inline asm 2026-02-21T08:22:03.2189122Z bar.sync 0, 128; 2026-02-21T08:22:03.2189213Z st.shared.v4.b32 [%r1170], {%r1414, %r1426, %r1438, %r1450}; 2026-02-21T08:22:03.2189304Z st.shared.v4.b32 [%r1169], {%r1462, %r1474, %r1486, %r1498}; 2026-02-21T08:22:03.2189400Z st.shared.v4.b32 [%r1167], {%r1510, %r1522, %r1534, %r1546}; 
2026-02-21T08:22:03.2189486Z st.shared.v4.b32 [%r1165], {%r1558, %r1570, %r1582, %r1594}; 2026-02-21T08:22:03.2189537Z bar.sync 0, 128; 2026-02-21T08:22:03.2189596Z // begin inline asm 2026-02-21T08:22:03.2189738Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1079, %r1083, %r1087, %r1091}, [%r857]; 2026-02-21T08:22:03.2189788Z // end inline asm 2026-02-21T08:22:03.2189839Z // begin inline asm 2026-02-21T08:22:03.2189989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1095, %r1099, %r1103, %r1107}, [%r862]; 2026-02-21T08:22:03.2190040Z // end inline asm 2026-02-21T08:22:03.2190093Z // begin inline asm 2026-02-21T08:22:03.2190239Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1111, %r1115, %r1119, %r1123}, [%r867]; 2026-02-21T08:22:03.2190292Z // end inline asm 2026-02-21T08:22:03.2190347Z // begin inline asm 2026-02-21T08:22:03.2190488Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1127, %r1131, %r1135, %r1139}, [%r872]; 2026-02-21T08:22:03.2190601Z // end inline asm 2026-02-21T08:22:03.2190654Z bar.sync 0, 128; 2026-02-21T08:22:03.2190744Z st.shared.v4.b32 [%r1170], {%r1225, %r1237, %r1249, %r1261}; 2026-02-21T08:22:03.2190843Z st.shared.v4.b32 [%r1169], {%r1273, %r1285, %r1297, %r1309}; 2026-02-21T08:22:03.2190933Z st.shared.v4.b32 [%r1167], {%r1321, %r1333, %r1345, %r1357}; 2026-02-21T08:22:03.2191022Z st.shared.v4.b32 [%r1165], {%r1369, %r1381, %r1393, %r1405}; 2026-02-21T08:22:03.2191082Z bar.sync 0, 128; 2026-02-21T08:22:03.2191135Z // begin inline asm 2026-02-21T08:22:03.2191275Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1016, %r1020, %r1024, %r1028}, [%r857]; 2026-02-21T08:22:03.2191328Z // end inline asm 2026-02-21T08:22:03.2191391Z // begin inline asm 2026-02-21T08:22:03.2191531Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1032, %r1036, %r1040, %r1044}, [%r862]; 2026-02-21T08:22:03.2191624Z // end inline asm 2026-02-21T08:22:03.2191692Z // begin inline asm 2026-02-21T08:22:03.2191839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1048, %r1052, %r1056, %r1060}, 
[%r867]; 2026-02-21T08:22:03.2191894Z // end inline asm 2026-02-21T08:22:03.2191961Z // begin inline asm 2026-02-21T08:22:03.2192108Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1064, %r1068, %r1072, %r1076}, [%r872]; 2026-02-21T08:22:03.2192161Z // end inline asm 2026-02-21T08:22:03.2192213Z bar.sync 0, 128; 2026-02-21T08:22:03.2192312Z st.shared.v4.b32 [%r1170], {%r1417, %r1429, %r1441, %r1453}; 2026-02-21T08:22:03.2192402Z st.shared.v4.b32 [%r1169], {%r1465, %r1477, %r1489, %r1501}; 2026-02-21T08:22:03.2192491Z st.shared.v4.b32 [%r1167], {%r1513, %r1525, %r1537, %r1549}; 2026-02-21T08:22:03.2192585Z st.shared.v4.b32 [%r1165], {%r1561, %r1573, %r1585, %r1597}; 2026-02-21T08:22:03.2192636Z bar.sync 0, 128; 2026-02-21T08:22:03.2192689Z // begin inline asm 2026-02-21T08:22:03.2192843Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1080, %r1084, %r1088, %r1092}, [%r857]; 2026-02-21T08:22:03.2192898Z // end inline asm 2026-02-21T08:22:03.2192951Z // begin inline asm 2026-02-21T08:22:03.2193094Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1096, %r1100, %r1104, %r1108}, [%r862]; 2026-02-21T08:22:03.2193153Z // end inline asm 2026-02-21T08:22:03.2193206Z // begin inline asm 2026-02-21T08:22:03.2193350Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1112, %r1116, %r1120, %r1124}, [%r867]; 2026-02-21T08:22:03.2193408Z // end inline asm 2026-02-21T08:22:03.2193461Z // begin inline asm 2026-02-21T08:22:03.2193602Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1128, %r1132, %r1136, %r1140}, [%r872]; 2026-02-21T08:22:03.2193654Z // end inline asm 2026-02-21T08:22:03.2193714Z // begin inline asm 2026-02-21T08:22:03.2193822Z st.global.v4.b32 [ %rd140 + 0 ], { %r1013, %r1014, %r1015, %r1016 }; 2026-02-21T08:22:03.2193874Z // end inline asm 2026-02-21T08:22:03.2193934Z // begin inline asm 2026-02-21T08:22:03.2194039Z st.global.v4.b32 [ %rd141 + 0 ], { %r1017, %r1018, %r1019, %r1020 }; 2026-02-21T08:22:03.2194094Z // end inline asm 2026-02-21T08:22:03.2194156Z // begin inline asm 
2026-02-21T08:22:03.2194256Z st.global.v4.b32 [ %rd142 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T08:22:03.2194310Z // end inline asm 2026-02-21T08:22:03.2194363Z // begin inline asm 2026-02-21T08:22:03.2194467Z st.global.v4.b32 [ %rd143 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T08:22:03.2194519Z // end inline asm 2026-02-21T08:22:03.2194572Z // begin inline asm 2026-02-21T08:22:03.2194698Z st.global.v4.b32 [ %rd144 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T08:22:03.2194751Z // end inline asm 2026-02-21T08:22:03.2194805Z // begin inline asm 2026-02-21T08:22:03.2194900Z st.global.v4.b32 [ %rd145 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T08:22:03.2194961Z // end inline asm 2026-02-21T08:22:03.2195015Z // begin inline asm 2026-02-21T08:22:03.2195111Z st.global.v4.b32 [ %rd146 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T08:22:03.2195224Z // end inline asm 2026-02-21T08:22:03.2195280Z // begin inline asm 2026-02-21T08:22:03.2195375Z st.global.v4.b32 [ %rd147 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T08:22:03.2195434Z // end inline asm 2026-02-21T08:22:03.2195487Z // begin inline asm 2026-02-21T08:22:03.2195580Z st.global.v4.b32 [ %rd148 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T08:22:03.2195633Z // end inline asm 2026-02-21T08:22:03.2195693Z // begin inline asm 2026-02-21T08:22:03.2197562Z st.global.v4.b32 [ %rd149 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T08:22:03.2197619Z // end inline asm 2026-02-21T08:22:03.2197673Z // begin inline asm 2026-02-21T08:22:03.2197778Z st.global.v4.b32 [ %rd150 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T08:22:03.2197831Z // end inline asm 2026-02-21T08:22:03.2197886Z // begin inline asm 2026-02-21T08:22:03.2197988Z st.global.v4.b32 [ %rd151 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T08:22:03.2198102Z // end inline asm 2026-02-21T08:22:03.2198160Z // begin inline asm 2026-02-21T08:22:03.2198254Z st.global.v4.b32 [ %rd152 + 0 
], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T08:22:03.2198313Z // end inline asm 2026-02-21T08:22:03.2198368Z // begin inline asm 2026-02-21T08:22:03.2198460Z st.global.v4.b32 [ %rd153 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T08:22:03.2198518Z // end inline asm 2026-02-21T08:22:03.2198571Z // begin inline asm 2026-02-21T08:22:03.2198665Z st.global.v4.b32 [ %rd154 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T08:22:03.2198758Z // end inline asm 2026-02-21T08:22:03.2198813Z // begin inline asm 2026-02-21T08:22:03.2198920Z st.global.v4.b32 [ %rd155 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T08:22:03.2198975Z // end inline asm 2026-02-21T08:22:03.2199030Z // begin inline asm 2026-02-21T08:22:03.2199132Z st.global.v4.b32 [ %rd156 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T08:22:03.2199184Z // end inline asm 2026-02-21T08:22:03.2199239Z // begin inline asm 2026-02-21T08:22:03.2199333Z st.global.v4.b32 [ %rd157 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T08:22:03.2199394Z // end inline asm 2026-02-21T08:22:03.2199446Z // begin inline asm 2026-02-21T08:22:03.2199539Z st.global.v4.b32 [ %rd158 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T08:22:03.2199600Z // end inline asm 2026-02-21T08:22:03.2199652Z // begin inline asm 2026-02-21T08:22:03.2199745Z st.global.v4.b32 [ %rd159 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T08:22:03.2199803Z // end inline asm 2026-02-21T08:22:03.2199855Z // begin inline asm 2026-02-21T08:22:03.2199948Z st.global.v4.b32 [ %rd160 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T08:22:03.2200000Z // end inline asm 2026-02-21T08:22:03.2200060Z // begin inline asm 2026-02-21T08:22:03.2200153Z st.global.v4.b32 [ %rd161 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T08:22:03.2200205Z // end inline asm 2026-02-21T08:22:03.2200266Z // begin inline asm 2026-02-21T08:22:03.2200359Z st.global.v4.b32 [ %rd162 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 
2026-02-21T08:22:03.2200414Z // end inline asm 2026-02-21T08:22:03.2200466Z // begin inline asm 2026-02-21T08:22:03.2200566Z st.global.v4.b32 [ %rd163 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T08:22:03.2200618Z // end inline asm 2026-02-21T08:22:03.2200671Z // begin inline asm 2026-02-21T08:22:03.2200773Z st.global.v4.b32 [ %rd164 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T08:22:03.2200825Z // end inline asm 2026-02-21T08:22:03.2200880Z // begin inline asm 2026-02-21T08:22:03.2200980Z st.global.v4.b32 [ %rd165 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T08:22:03.2201032Z // end inline asm 2026-02-21T08:22:03.2201085Z // begin inline asm 2026-02-21T08:22:03.2201180Z st.global.v4.b32 [ %rd166 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T08:22:03.2201241Z // end inline asm 2026-02-21T08:22:03.2201293Z // begin inline asm 2026-02-21T08:22:03.2201388Z st.global.v4.b32 [ %rd167 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T08:22:03.2201470Z // end inline asm 2026-02-21T08:22:03.2201523Z // begin inline asm 2026-02-21T08:22:03.2201617Z st.global.v4.b32 [ %rd168 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T08:22:03.2201671Z // end inline asm 2026-02-21T08:22:03.2201733Z // begin inline asm 2026-02-21T08:22:03.2201831Z st.global.v4.b32 [ %rd169 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T08:22:03.2201885Z // end inline asm 2026-02-21T08:22:03.2201946Z // begin inline asm 2026-02-21T08:22:03.2202139Z st.global.v4.b32 [ %rd170 + 0 ], { %r1133, %r1134, %r1135, %r1136 }; 2026-02-21T08:22:03.2202193Z // end inline asm 2026-02-21T08:22:03.2202255Z // begin inline asm 2026-02-21T08:22:03.2202354Z st.global.v4.b32 [ %rd171 + 0 ], { %r1137, %r1138, %r1139, %r1140 }; 2026-02-21T08:22:03.2202411Z // end inline asm 2026-02-21T08:22:03.2202500Z $L__BB0_16: // %._crit_edge 2026-02-21T08:22:03.2202746Z .loc 1 30 4 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:30:4 2026-02-21T08:22:03.2202806Z bar.sync 0, 
128; 2026-02-21T08:22:03.2202862Z // begin inline asm 2026-02-21T08:22:03.2203000Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1598, 256; 2026-02-21T08:22:03.2203056Z // end inline asm 2026-02-21T08:22:03.2203158Z st.shared.v2.b32 [global_smem+155784], {50529027, 50529027}; 2026-02-21T08:22:03.2203215Z barrier.sync 1; 2026-02-21T08:22:03.2203306Z $L__BB0_17: // %common.ret 2026-02-21T08:22:03.2203475Z .loc 1 0 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:0 2026-02-21T08:22:03.2203528Z ret; 2026-02-21T08:22:03.2203634Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:22:03.2203720Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T08:22:03.2203884Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19 2026-02-21T08:22:03.2203956Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:22:03.2204018Z and.b16 %rs2, %rs1, 3; 2026-02-21T08:22:03.2204081Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:22:03.2204255Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2204314Z or.b32 %r5, %r4, 192; 2026-02-21T08:22:03.2204376Z mov.b32 %r28, global_smem; 2026-02-21T08:22:03.2204436Z add.s32 %r29, %r28, %r3; 2026-02-21T08:22:03.2204501Z bra.uni $L__BB0_2; 2026-02-21T08:22:03.2204603Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2204818Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2204884Z barrier.sync 1; 2026-02-21T08:22:03.2204942Z barrier.sync 1; 2026-02-21T08:22:03.2205024Z $L__BB0_2: // %.preheader 2026-02-21T08:22:03.2205121Z // =>This Loop Header: Depth=1 2026-02-21T08:22:03.2205221Z // Child Loop BB0_11 Depth 2 2026-02-21T08:22:03.2205310Z // Child Loop BB0_6 Depth 2 2026-02-21T08:22:03.2205473Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19 2026-02-21T08:22:03.2205566Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:22:03.2205624Z barrier.sync 1; 2026-02-21T08:22:03.2205693Z ld.shared.b8 
%r27, [%r29+155780]; 2026-02-21T08:22:03.2205767Z setp.gt.u32 %p3, %r27, 3; 2026-02-21T08:22:03.2205833Z @%p3 bra $L__BB0_4; 2026-02-21T08:22:03.2205920Z // %bb.3: // %.preheader 2026-02-21T08:22:03.2206013Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2206092Z $L_brx_0: .branchtargets 2026-02-21T08:22:03.2206150Z $L__BB0_5, 2026-02-21T08:22:03.2206206Z $L__BB0_10, 2026-02-21T08:22:03.2206272Z $L__BB0_13, 2026-02-21T08:22:03.2206329Z $L__BB0_17; 2026-02-21T08:22:03.2206424Z brx.idx %r27, $L_brx_0; 2026-02-21T08:22:03.2206536Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2206712Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2206783Z ld.shared.b32 %r193, [global_smem]; 2026-02-21T08:22:03.2206858Z ld.shared.b32 %r155, [global_smem+12]; 2026-02-21T08:22:03.2206924Z barrier.sync 1; 2026-02-21T08:22:03.2207091Z .loc 1 42 45 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:45 2026-02-21T08:22:03.2207192Z add.s32 %r156, %r1, -128; 2026-02-21T08:22:03.2207261Z shr.u32 %r7, %r156, 5; 2026-02-21T08:22:03.2207319Z shr.u32 %r157, %r1, 2; 2026-02-21T08:22:03.2207383Z bfe.u32 %r158, %r1, 2, 5; 2026-02-21T08:22:03.2207440Z or.b32 %r159, %r157, 224; 2026-02-21T08:22:03.2207668Z .loc 1 50 48 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:50:48 2026-02-21T08:22:03.2207728Z shl.b32 %r160, %r1, 3; 2026-02-21T08:22:03.2207786Z and.b32 %r161, %r160, 24; 2026-02-21T08:22:03.2207961Z .loc 1 42 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:42:32 2026-02-21T08:22:03.2208023Z add.s32 %r162, %r155, %r158; 2026-02-21T08:22:03.2208084Z add.s32 %r163, %r155, %r159; 2026-02-21T08:22:03.2208148Z shl.b32 %r164, %r162, 11; 2026-02-21T08:22:03.2208316Z .loc 1 54 53 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:53 2026-02-21T08:22:03.2208378Z add.s32 %r165, %r164, 65536; 2026-02-21T08:22:03.2208440Z add.s32 %r166, %r164, 131072; 2026-02-21T08:22:03.2208507Z add.s32 
%r167, %r164, 196608; 2026-02-21T08:22:03.2208564Z add.s32 %r168, %r164, 262144; 2026-02-21T08:22:03.2208621Z add.s32 %r169, %r164, 327680; 2026-02-21T08:22:03.2208686Z add.s32 %r170, %r164, 393216; 2026-02-21T08:22:03.2208744Z shl.b32 %r171, %r163, 11; 2026-02-21T08:22:03.2208916Z .loc 1 54 60 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:60 2026-02-21T08:22:03.2208986Z or.b32 %r172, %r164, %r161; 2026-02-21T08:22:03.2209045Z or.b32 %r173, %r165, %r161; 2026-02-21T08:22:03.2209102Z or.b32 %r174, %r166, %r161; 2026-02-21T08:22:03.2209157Z or.b32 %r175, %r167, %r161; 2026-02-21T08:22:03.2209222Z or.b32 %r176, %r168, %r161; 2026-02-21T08:22:03.2209277Z or.b32 %r177, %r169, %r161; 2026-02-21T08:22:03.2209333Z or.b32 %r178, %r170, %r161; 2026-02-21T08:22:03.2209395Z or.b32 %r179, %r171, %r161; 2026-02-21T08:22:03.2209580Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2209644Z mad.wide.s32 %rd13, %r172, 2, %rd6; 2026-02-21T08:22:03.2209713Z mad.wide.s32 %rd14, %r173, 2, %rd6; 2026-02-21T08:22:03.2209774Z mad.wide.s32 %rd15, %r174, 2, %rd6; 2026-02-21T08:22:03.2209833Z mad.wide.s32 %rd16, %r175, 2, %rd6; 2026-02-21T08:22:03.2209892Z mad.wide.s32 %rd17, %r176, 2, %rd6; 2026-02-21T08:22:03.2209958Z mad.wide.s32 %rd18, %r177, 2, %rd6; 2026-02-21T08:22:03.2210016Z mad.wide.s32 %rd19, %r178, 2, %rd6; 2026-02-21T08:22:03.2210074Z mad.wide.s32 %rd20, %r179, 2, %rd6; 2026-02-21T08:22:03.2210237Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2210291Z shl.b32 %r180, %r1, 4; 2026-02-21T08:22:03.2210345Z and.b32 %r181, %r180, 2032; 2026-02-21T08:22:03.2210399Z shl.b32 %r182, %r1, 1; 2026-02-21T08:22:03.2210459Z and.b32 %r183, %r182, 48; 2026-02-21T08:22:03.2210517Z xor.b32 %r8, %r181, %r183; 2026-02-21T08:22:03.2210572Z add.s32 %r55, %r28, %r8; 2026-02-21T08:22:03.2210631Z mov.b32 %r56, 16; 2026-02-21T08:22:03.2210684Z // begin inline asm 
2026-02-21T08:22:03.2210799Z cp.async.cg.shared.global [ %r55 + 0 ], [ %rd13 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2210851Z // end inline asm 2026-02-21T08:22:03.2210913Z add.s32 %r57, %r55, 2048; 2026-02-21T08:22:03.2210969Z // begin inline asm 2026-02-21T08:22:03.2211083Z cp.async.cg.shared.global [ %r57 + 0 ], [ %rd14 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2211168Z // end inline asm 2026-02-21T08:22:03.2211223Z add.s32 %r59, %r55, 4096; 2026-02-21T08:22:03.2211276Z // begin inline asm 2026-02-21T08:22:03.2211391Z cp.async.cg.shared.global [ %r59 + 0 ], [ %rd15 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2211446Z // end inline asm 2026-02-21T08:22:03.2211499Z add.s32 %r61, %r55, 6144; 2026-02-21T08:22:03.2211553Z // begin inline asm 2026-02-21T08:22:03.2211663Z cp.async.cg.shared.global [ %r61 + 0 ], [ %rd16 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2211742Z // end inline asm 2026-02-21T08:22:03.2211796Z add.s32 %r63, %r55, 8192; 2026-02-21T08:22:03.2211856Z // begin inline asm 2026-02-21T08:22:03.2211957Z cp.async.cg.shared.global [ %r63 + 0 ], [ %rd17 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2212010Z // end inline asm 2026-02-21T08:22:03.2212066Z add.s32 %r65, %r55, 10240; 2026-02-21T08:22:03.2212128Z // begin inline asm 2026-02-21T08:22:03.2212271Z cp.async.cg.shared.global [ %r65 + 0 ], [ %rd18 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2212326Z // end inline asm 2026-02-21T08:22:03.2212390Z add.s32 %r67, %r55, 12288; 2026-02-21T08:22:03.2212443Z // begin inline asm 2026-02-21T08:22:03.2212544Z cp.async.cg.shared.global [ %r67 + 0 ], [ %rd19 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2212597Z // end inline asm 2026-02-21T08:22:03.2212660Z add.s32 %r69, %r55, 14336; 2026-02-21T08:22:03.2212713Z // begin inline asm 2026-02-21T08:22:03.2212814Z cp.async.cg.shared.global [ %r69 + 0 ], [ %rd20 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2212875Z // end inline asm 2026-02-21T08:22:03.2212936Z cp.async.commit_group; 2026-02-21T08:22:03.2213096Z .loc 1 54 32 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2213160Z cvt.s64.s32 %rd62, %r164; 2026-02-21T08:22:03.2213217Z cvt.u64.u32 %rd63, %r161; 2026-02-21T08:22:03.2213276Z or.b64 %rd64, %rd62, %rd63; 2026-02-21T08:22:03.2213337Z shl.b64 %rd65, %rd64, 1; 2026-02-21T08:22:03.2213406Z add.s64 %rd66, %rd6, %rd65; 2026-02-21T08:22:03.2213461Z add.s64 %rd21, %rd66, 64; 2026-02-21T08:22:03.2213520Z cvt.s64.s32 %rd67, %r165; 2026-02-21T08:22:03.2213585Z or.b64 %rd68, %rd67, %rd63; 2026-02-21T08:22:03.2213641Z shl.b64 %rd69, %rd68, 1; 2026-02-21T08:22:03.2213699Z add.s64 %rd70, %rd6, %rd69; 2026-02-21T08:22:03.2213754Z add.s64 %rd22, %rd70, 64; 2026-02-21T08:22:03.2213817Z cvt.s64.s32 %rd71, %r166; 2026-02-21T08:22:03.2213872Z or.b64 %rd72, %rd71, %rd63; 2026-02-21T08:22:03.2213928Z shl.b64 %rd73, %rd72, 1; 2026-02-21T08:22:03.2213992Z add.s64 %rd74, %rd6, %rd73; 2026-02-21T08:22:03.2214046Z add.s64 %rd23, %rd74, 64; 2026-02-21T08:22:03.2214101Z cvt.s64.s32 %rd75, %r167; 2026-02-21T08:22:03.2214157Z or.b64 %rd76, %rd75, %rd63; 2026-02-21T08:22:03.2214218Z shl.b64 %rd77, %rd76, 1; 2026-02-21T08:22:03.2214273Z add.s64 %rd78, %rd6, %rd77; 2026-02-21T08:22:03.2214327Z add.s64 %rd24, %rd78, 64; 2026-02-21T08:22:03.2214392Z cvt.s64.s32 %rd79, %r168; 2026-02-21T08:22:03.2214448Z or.b64 %rd80, %rd79, %rd63; 2026-02-21T08:22:03.2214503Z shl.b64 %rd81, %rd80, 1; 2026-02-21T08:22:03.2214567Z add.s64 %rd82, %rd6, %rd81; 2026-02-21T08:22:03.2214622Z add.s64 %rd25, %rd82, 64; 2026-02-21T08:22:03.2214712Z cvt.s64.s32 %rd83, %r169; 2026-02-21T08:22:03.2214769Z or.b64 %rd84, %rd83, %rd63; 2026-02-21T08:22:03.2214831Z shl.b64 %rd85, %rd84, 1; 2026-02-21T08:22:03.2214885Z add.s64 %rd86, %rd6, %rd85; 2026-02-21T08:22:03.2214938Z add.s64 %rd26, %rd86, 64; 2026-02-21T08:22:03.2215000Z cvt.s64.s32 %rd87, %r170; 2026-02-21T08:22:03.2215056Z or.b64 %rd88, %rd87, %rd63; 2026-02-21T08:22:03.2215111Z shl.b64 %rd89, %rd88, 1; 2026-02-21T08:22:03.2215166Z add.s64 %rd90, 
%rd6, %rd89; 2026-02-21T08:22:03.2215229Z add.s64 %rd27, %rd90, 64; 2026-02-21T08:22:03.2215284Z cvt.s64.s32 %rd91, %r171; 2026-02-21T08:22:03.2215339Z or.b64 %rd92, %rd91, %rd63; 2026-02-21T08:22:03.2215402Z shl.b64 %rd93, %rd92, 1; 2026-02-21T08:22:03.2215460Z add.s64 %rd94, %rd6, %rd93; 2026-02-21T08:22:03.2215550Z add.s64 %rd28, %rd94, 64; 2026-02-21T08:22:03.2215710Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2215773Z bar.sync 2, 128; 2026-02-21T08:22:03.2215831Z add.s32 %r71, %r55, 16384; 2026-02-21T08:22:03.2215886Z // begin inline asm 2026-02-21T08:22:03.2215996Z cp.async.cg.shared.global [ %r71 + 0 ], [ %rd21 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2216057Z // end inline asm 2026-02-21T08:22:03.2216143Z add.s32 %r73, %r55, 18432; 2026-02-21T08:22:03.2216205Z // begin inline asm 2026-02-21T08:22:03.2216307Z cp.async.cg.shared.global [ %r73 + 0 ], [ %rd22 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2216362Z // end inline asm 2026-02-21T08:22:03.2216423Z add.s32 %r75, %r55, 20480; 2026-02-21T08:22:03.2216476Z // begin inline asm 2026-02-21T08:22:03.2216577Z cp.async.cg.shared.global [ %r75 + 0 ], [ %rd23 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2216677Z // end inline asm 2026-02-21T08:22:03.2216740Z add.s32 %r77, %r55, 22528; 2026-02-21T08:22:03.2216795Z // begin inline asm 2026-02-21T08:22:03.2216896Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd24 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2216956Z // end inline asm 2026-02-21T08:22:03.2217009Z add.s32 %r79, %r55, 24576; 2026-02-21T08:22:03.2217062Z // begin inline asm 2026-02-21T08:22:03.2217170Z cp.async.cg.shared.global [ %r79 + 0 ], [ %rd25 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2217223Z // end inline asm 2026-02-21T08:22:03.2217278Z add.s32 %r81, %r55, 26624; 2026-02-21T08:22:03.2217332Z // begin inline asm 2026-02-21T08:22:03.2217442Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd26 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2217494Z // end inline asm 
2026-02-21T08:22:03.2217548Z add.s32 %r83, %r55, 28672; 2026-02-21T08:22:03.2217609Z // begin inline asm 2026-02-21T08:22:03.2217712Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd27 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2217767Z // end inline asm 2026-02-21T08:22:03.2217823Z add.s32 %r85, %r55, 30720; 2026-02-21T08:22:03.2217884Z // begin inline asm 2026-02-21T08:22:03.2217985Z cp.async.cg.shared.global [ %r85 + 0 ], [ %rd28 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2218038Z // end inline asm 2026-02-21T08:22:03.2218104Z cp.async.commit_group; 2026-02-21T08:22:03.2218270Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2218326Z add.s64 %rd29, %rd66, 128; 2026-02-21T08:22:03.2218381Z add.s64 %rd30, %rd70, 128; 2026-02-21T08:22:03.2218445Z add.s64 %rd31, %rd74, 128; 2026-02-21T08:22:03.2218499Z add.s64 %rd32, %rd78, 128; 2026-02-21T08:22:03.2218553Z add.s64 %rd33, %rd82, 128; 2026-02-21T08:22:03.2218615Z add.s64 %rd34, %rd86, 128; 2026-02-21T08:22:03.2218670Z add.s64 %rd35, %rd90, 128; 2026-02-21T08:22:03.2218723Z add.s64 %rd36, %rd94, 128; 2026-02-21T08:22:03.2218893Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2218950Z bar.sync 2, 128; 2026-02-21T08:22:03.2219004Z add.s32 %r87, %r55, 32768; 2026-02-21T08:22:03.2219057Z // begin inline asm 2026-02-21T08:22:03.2219168Z cp.async.cg.shared.global [ %r87 + 0 ], [ %rd29 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2219220Z // end inline asm 2026-02-21T08:22:03.2219273Z add.s32 %r89, %r55, 34816; 2026-02-21T08:22:03.2219332Z // begin inline asm 2026-02-21T08:22:03.2219433Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd30 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2219487Z // end inline asm 2026-02-21T08:22:03.2219541Z add.s32 %r91, %r55, 36864; 2026-02-21T08:22:03.2219601Z // begin inline asm 2026-02-21T08:22:03.2219703Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd31 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2219754Z // end inline asm 
2026-02-21T08:22:03.2219814Z add.s32 %r93, %r55, 38912; 2026-02-21T08:22:03.2219866Z // begin inline asm 2026-02-21T08:22:03.2219968Z cp.async.cg.shared.global [ %r93 + 0 ], [ %rd32 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2220065Z // end inline asm 2026-02-21T08:22:03.2220118Z add.s32 %r95, %r55, 40960; 2026-02-21T08:22:03.2220169Z // begin inline asm 2026-02-21T08:22:03.2220271Z cp.async.cg.shared.global [ %r95 + 0 ], [ %rd33 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2220330Z // end inline asm 2026-02-21T08:22:03.2220383Z add.s32 %r97, %r55, 43008; 2026-02-21T08:22:03.2220436Z // begin inline asm 2026-02-21T08:22:03.2220542Z cp.async.cg.shared.global [ %r97 + 0 ], [ %rd34 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2220594Z // end inline asm 2026-02-21T08:22:03.2220674Z add.s32 %r99, %r55, 45056; 2026-02-21T08:22:03.2220726Z // begin inline asm 2026-02-21T08:22:03.2220835Z cp.async.cg.shared.global [ %r99 + 0 ], [ %rd35 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2220887Z // end inline asm 2026-02-21T08:22:03.2220942Z add.s32 %r101, %r55, 47104; 2026-02-21T08:22:03.2221003Z // begin inline asm 2026-02-21T08:22:03.2221153Z cp.async.cg.shared.global [ %r101 + 0 ], [ %rd36 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2221206Z // end inline asm 2026-02-21T08:22:03.2221268Z cp.async.commit_group; 2026-02-21T08:22:03.2221431Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2221487Z add.s64 %rd37, %rd66, 192; 2026-02-21T08:22:03.2221542Z add.s64 %rd38, %rd70, 192; 2026-02-21T08:22:03.2221603Z add.s64 %rd39, %rd74, 192; 2026-02-21T08:22:03.2221658Z add.s64 %rd40, %rd78, 192; 2026-02-21T08:22:03.2221713Z add.s64 %rd41, %rd82, 192; 2026-02-21T08:22:03.2221775Z add.s64 %rd42, %rd86, 192; 2026-02-21T08:22:03.2221829Z add.s64 %rd43, %rd90, 192; 2026-02-21T08:22:03.2221885Z add.s64 %rd44, %rd94, 192; 2026-02-21T08:22:03.2222039Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2222100Z bar.sync 2, 
128; 2026-02-21T08:22:03.2222156Z add.s32 %r103, %r55, 49152; 2026-02-21T08:22:03.2222208Z // begin inline asm 2026-02-21T08:22:03.2222328Z cp.async.cg.shared.global [ %r103 + 0 ], [ %rd37 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2222382Z // end inline asm 2026-02-21T08:22:03.2222438Z add.s32 %r105, %r55, 51200; 2026-02-21T08:22:03.2222491Z // begin inline asm 2026-02-21T08:22:03.2222607Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd38 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2222659Z // end inline asm 2026-02-21T08:22:03.2222715Z add.s32 %r107, %r55, 53248; 2026-02-21T08:22:03.2222781Z // begin inline asm 2026-02-21T08:22:03.2222888Z cp.async.cg.shared.global [ %r107 + 0 ], [ %rd39 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2222944Z // end inline asm 2026-02-21T08:22:03.2223000Z add.s32 %r109, %r55, 55296; 2026-02-21T08:22:03.2223063Z // begin inline asm 2026-02-21T08:22:03.2223170Z cp.async.cg.shared.global [ %r109 + 0 ], [ %rd40 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2223222Z // end inline asm 2026-02-21T08:22:03.2223284Z add.s32 %r111, %r55, 57344; 2026-02-21T08:22:03.2223339Z // begin inline asm 2026-02-21T08:22:03.2223444Z cp.async.cg.shared.global [ %r111 + 0 ], [ %rd41 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2223504Z // end inline asm 2026-02-21T08:22:03.2223559Z add.s32 %r113, %r55, 59392; 2026-02-21T08:22:03.2223612Z // begin inline asm 2026-02-21T08:22:03.2223715Z cp.async.cg.shared.global [ %r113 + 0 ], [ %rd42 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2223777Z // end inline asm 2026-02-21T08:22:03.2223832Z add.s32 %r115, %r55, 61440; 2026-02-21T08:22:03.2223885Z // begin inline asm 2026-02-21T08:22:03.2223994Z cp.async.cg.shared.global [ %r115 + 0 ], [ %rd43 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2224048Z // end inline asm 2026-02-21T08:22:03.2224101Z add.s32 %r117, %r55, 63488; 2026-02-21T08:22:03.2224153Z // begin inline asm 2026-02-21T08:22:03.2224264Z cp.async.cg.shared.global [ %r117 + 0 ], [ %rd44 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2224315Z // end 
inline asm 2026-02-21T08:22:03.2224374Z cp.async.commit_group; 2026-02-21T08:22:03.2224536Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2224617Z add.s64 %rd45, %rd66, 256; 2026-02-21T08:22:03.2224706Z add.s64 %rd46, %rd70, 256; 2026-02-21T08:22:03.2224769Z add.s64 %rd47, %rd74, 256; 2026-02-21T08:22:03.2224823Z add.s64 %rd48, %rd78, 256; 2026-02-21T08:22:03.2224877Z add.s64 %rd49, %rd82, 256; 2026-02-21T08:22:03.2224932Z add.s64 %rd50, %rd86, 256; 2026-02-21T08:22:03.2224995Z add.s64 %rd51, %rd90, 256; 2026-02-21T08:22:03.2225050Z add.s64 %rd52, %rd94, 256; 2026-02-21T08:22:03.2225211Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2225301Z bar.sync 2, 128; 2026-02-21T08:22:03.2225356Z add.s32 %r119, %r55, 65536; 2026-02-21T08:22:03.2225411Z // begin inline asm 2026-02-21T08:22:03.2225515Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd45 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2225576Z // end inline asm 2026-02-21T08:22:03.2225630Z add.s32 %r121, %r55, 67584; 2026-02-21T08:22:03.2225736Z // begin inline asm 2026-02-21T08:22:03.2225853Z cp.async.cg.shared.global [ %r121 + 0 ], [ %rd46 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2225908Z // end inline asm 2026-02-21T08:22:03.2225964Z add.s32 %r123, %r55, 69632; 2026-02-21T08:22:03.2226026Z // begin inline asm 2026-02-21T08:22:03.2226132Z cp.async.cg.shared.global [ %r123 + 0 ], [ %rd47 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2226187Z // end inline asm 2026-02-21T08:22:03.2226241Z add.s32 %r125, %r55, 71680; 2026-02-21T08:22:03.2226303Z // begin inline asm 2026-02-21T08:22:03.2226408Z cp.async.cg.shared.global [ %r125 + 0 ], [ %rd48 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2226461Z // end inline asm 2026-02-21T08:22:03.2226521Z add.s32 %r127, %r55, 73728; 2026-02-21T08:22:03.2226574Z // begin inline asm 2026-02-21T08:22:03.2226677Z cp.async.cg.shared.global [ %r127 + 0 ], [ %rd49 + 0 ], 0x10, %r56; 
2026-02-21T08:22:03.2226729Z // end inline asm 2026-02-21T08:22:03.2226792Z add.s32 %r129, %r55, 75776; 2026-02-21T08:22:03.2226846Z // begin inline asm 2026-02-21T08:22:03.2226950Z cp.async.cg.shared.global [ %r129 + 0 ], [ %rd50 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2227007Z // end inline asm 2026-02-21T08:22:03.2227060Z add.s32 %r131, %r55, 77824; 2026-02-21T08:22:03.2227113Z // begin inline asm 2026-02-21T08:22:03.2227217Z cp.async.cg.shared.global [ %r131 + 0 ], [ %rd51 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2227275Z // end inline asm 2026-02-21T08:22:03.2227330Z add.s32 %r133, %r55, 79872; 2026-02-21T08:22:03.2227383Z // begin inline asm 2026-02-21T08:22:03.2227494Z cp.async.cg.shared.global [ %r133 + 0 ], [ %rd52 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2227544Z // end inline asm 2026-02-21T08:22:03.2227603Z cp.async.commit_group; 2026-02-21T08:22:03.2227770Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2227826Z add.s64 %rd53, %rd66, 320; 2026-02-21T08:22:03.2227881Z add.s64 %rd54, %rd70, 320; 2026-02-21T08:22:03.2227938Z add.s64 %rd55, %rd74, 320; 2026-02-21T08:22:03.2228000Z add.s64 %rd56, %rd78, 320; 2026-02-21T08:22:03.2228056Z add.s64 %rd57, %rd82, 320; 2026-02-21T08:22:03.2228110Z add.s64 %rd58, %rd86, 320; 2026-02-21T08:22:03.2228171Z add.s64 %rd59, %rd90, 320; 2026-02-21T08:22:03.2228224Z add.s64 %rd60, %rd94, 320; 2026-02-21T08:22:03.2228384Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2228437Z bar.sync 2, 128; 2026-02-21T08:22:03.2228500Z add.s32 %r135, %r55, 81920; 2026-02-21T08:22:03.2228555Z // begin inline asm 2026-02-21T08:22:03.2228659Z cp.async.cg.shared.global [ %r135 + 0 ], [ %rd53 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2228720Z // end inline asm 2026-02-21T08:22:03.2228776Z add.s32 %r137, %r55, 83968; 2026-02-21T08:22:03.2228829Z // begin inline asm 2026-02-21T08:22:03.2228942Z cp.async.cg.shared.global [ %r137 + 0 ], [ %rd54 + 0 ], 
0x10, %r56; 2026-02-21T08:22:03.2228994Z // end inline asm 2026-02-21T08:22:03.2229052Z add.s32 %r139, %r55, 86016; 2026-02-21T08:22:03.2229137Z // begin inline asm 2026-02-21T08:22:03.2229249Z cp.async.cg.shared.global [ %r139 + 0 ], [ %rd55 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2229302Z // end inline asm 2026-02-21T08:22:03.2229358Z add.s32 %r141, %r55, 88064; 2026-02-21T08:22:03.2229418Z // begin inline asm 2026-02-21T08:22:03.2229521Z cp.async.cg.shared.global [ %r141 + 0 ], [ %rd56 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2229576Z // end inline asm 2026-02-21T08:22:03.2229631Z add.s32 %r143, %r55, 90112; 2026-02-21T08:22:03.2229766Z // begin inline asm 2026-02-21T08:22:03.2229871Z cp.async.cg.shared.global [ %r143 + 0 ], [ %rd57 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2229927Z // end inline asm 2026-02-21T08:22:03.2229992Z add.s32 %r145, %r55, 92160; 2026-02-21T08:22:03.2230048Z // begin inline asm 2026-02-21T08:22:03.2230153Z cp.async.cg.shared.global [ %r145 + 0 ], [ %rd58 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2230206Z // end inline asm 2026-02-21T08:22:03.2230316Z add.s32 %r147, %r55, 94208; 2026-02-21T08:22:03.2230374Z // begin inline asm 2026-02-21T08:22:03.2230477Z cp.async.cg.shared.global [ %r147 + 0 ], [ %rd59 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2230539Z // end inline asm 2026-02-21T08:22:03.2230594Z add.s32 %r149, %r55, 96256; 2026-02-21T08:22:03.2230649Z // begin inline asm 2026-02-21T08:22:03.2230759Z cp.async.cg.shared.global [ %r149 + 0 ], [ %rd60 + 0 ], 0x10, %r56; 2026-02-21T08:22:03.2230811Z // end inline asm 2026-02-21T08:22:03.2230870Z cp.async.commit_group; 2026-02-21T08:22:03.2231036Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2231106Z add.s32 %r184, %r5, %r171; 2026-02-21T08:22:03.2231167Z cvt.u64.u32 %rd1, %r184; 2026-02-21T08:22:03.2231224Z add.s32 %r185, %r4, %r164; 2026-02-21T08:22:03.2231293Z cvt.u64.u32 %rd2, %r185; 2026-02-21T08:22:03.2231349Z mov.pred %p104, 0; 
2026-02-21T08:22:03.2231401Z mov.b32 %r1601, 0; 2026-02-21T08:22:03.2231455Z mov.b32 %r1600, 5; 2026-02-21T08:22:03.2231522Z mov.b32 %r1599, -1; 2026-02-21T08:22:03.2231576Z mov.b64 %rd684, 0; 2026-02-21T08:22:03.2231632Z mov.b32 %r1602, %r1601; 2026-02-21T08:22:03.2231692Z bra.uni $L__BB0_6; 2026-02-21T08:22:03.2231788Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:22:03.2231953Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2232024Z setp.lt.u64 %p25, %rd684, 1856; 2026-02-21T08:22:03.2232187Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.2232245Z add.s32 %r238, %r1602, 1; 2026-02-21T08:22:03.2232304Z setp.eq.b32 %p26, %r238, 7; 2026-02-21T08:22:03.2232375Z selp.b32 %r1602, 0, %r238, %p26; 2026-02-21T08:22:03.2232434Z selp.b32 %r239, 1, 0, %p26; 2026-02-21T08:22:03.2232494Z xor.b32 %r1601, %r1601, %r239; 2026-02-21T08:22:03.2232664Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2232721Z add.s32 %r240, %r1600, 1; 2026-02-21T08:22:03.2232779Z setp.gt.s32 %p27, %r240, 5; 2026-02-21T08:22:03.2232840Z selp.b32 %r1600, 0, %r240, %p27; 2026-02-21T08:22:03.2233012Z .loc 1 54 60 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:60 2026-02-21T08:22:03.2233072Z add.s64 %rd119, %rd2, %rd684; 2026-02-21T08:22:03.2233228Z .loc 1 54 32 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:32 2026-02-21T08:22:03.2233294Z add.s64 %rd120, %rd1, %rd684; 2026-02-21T08:22:03.2233351Z cvt.u32.u64 %r241, %rd119; 2026-02-21T08:22:03.2233407Z add.s32 %r242, %r241, 192; 2026-02-21T08:22:03.2233479Z mad.wide.s32 %rd111, %r242, 2, %rd6; 2026-02-21T08:22:03.2233537Z add.s32 %r243, %r241, 65728; 2026-02-21T08:22:03.2233598Z mad.wide.s32 %rd112, %r243, 2, %rd6; 2026-02-21T08:22:03.2233654Z add.s32 %r244, %r241, 131264; 2026-02-21T08:22:03.2233723Z mad.wide.s32 %rd113, %r244, 2, %rd6; 
2026-02-21T08:22:03.2233802Z add.s32 %r245, %r241, 196800; 2026-02-21T08:22:03.2233863Z mad.wide.s32 %rd114, %r245, 2, %rd6; 2026-02-21T08:22:03.2233924Z add.s32 %r246, %r241, 262336; 2026-02-21T08:22:03.2233983Z mad.wide.s32 %rd115, %r246, 2, %rd6; 2026-02-21T08:22:03.2234038Z add.s32 %r247, %r241, 327872; 2026-02-21T08:22:03.2234103Z mad.wide.s32 %rd116, %r247, 2, %rd6; 2026-02-21T08:22:03.2234157Z add.s32 %r248, %r241, 393408; 2026-02-21T08:22:03.2234215Z mad.wide.s32 %rd117, %r248, 2, %rd6; 2026-02-21T08:22:03.2234293Z cvt.u32.u64 %r249, %rd120; 2026-02-21T08:22:03.2234358Z mad.wide.s32 %rd118, %r249, 2, %rd6; 2026-02-21T08:22:03.2234521Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2234576Z shl.b32 %r250, %r1600, 14; 2026-02-21T08:22:03.2234636Z add.s32 %r252, %r28, %r250; 2026-02-21T08:22:03.2234716Z bar.sync 2, 128; 2026-02-21T08:22:03.2234830Z add.s32 %r222, %r252, %r8; 2026-02-21T08:22:03.2234889Z selp.b32 %r223, 16, 0, %p25; 2026-02-21T08:22:03.2234953Z // begin inline asm 2026-02-21T08:22:03.2235069Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd111 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2235121Z // end inline asm 2026-02-21T08:22:03.2235182Z add.s32 %r224, %r222, 2048; 2026-02-21T08:22:03.2235236Z // begin inline asm 2026-02-21T08:22:03.2235348Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd112 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2235406Z // end inline asm 2026-02-21T08:22:03.2235462Z add.s32 %r226, %r222, 4096; 2026-02-21T08:22:03.2235515Z // begin inline asm 2026-02-21T08:22:03.2235625Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd113 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2235686Z // end inline asm 2026-02-21T08:22:03.2235742Z add.s32 %r228, %r222, 6144; 2026-02-21T08:22:03.2235797Z // begin inline asm 2026-02-21T08:22:03.2235912Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd114 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2235966Z // end inline asm 2026-02-21T08:22:03.2236023Z add.s32 %r230, 
%r222, 8192; 2026-02-21T08:22:03.2236076Z // begin inline asm 2026-02-21T08:22:03.2236191Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd115 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2236243Z // end inline asm 2026-02-21T08:22:03.2236299Z add.s32 %r232, %r222, 10240; 2026-02-21T08:22:03.2236361Z // begin inline asm 2026-02-21T08:22:03.2236468Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd116 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2236522Z // end inline asm 2026-02-21T08:22:03.2236579Z add.s32 %r234, %r222, 12288; 2026-02-21T08:22:03.2236642Z // begin inline asm 2026-02-21T08:22:03.2236750Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd117 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2236802Z // end inline asm 2026-02-21T08:22:03.2236865Z add.s32 %r236, %r222, 14336; 2026-02-21T08:22:03.2236918Z // begin inline asm 2026-02-21T08:22:03.2237025Z cp.async.cg.shared.global [ %r236 + 0 ], [ %rd118 + 0 ], 0x10, %r223; 2026-02-21T08:22:03.2237087Z // end inline asm 2026-02-21T08:22:03.2237148Z cp.async.commit_group; 2026-02-21T08:22:03.2237310Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2237369Z add.s64 %rd4, %rd684, 32; 2026-02-21T08:22:03.2237445Z setp.lt.u64 %p28, %rd684, 2016; 2026-02-21T08:22:03.2237505Z mov.pred %p104, -1; 2026-02-21T08:22:03.2237562Z mov.b64 %rd684, %rd4; 2026-02-21T08:22:03.2237626Z @%p28 bra $L__BB0_6; 2026-02-21T08:22:03.2237683Z bra.uni $L__BB0_9; 2026-02-21T08:22:03.2237777Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:22:03.2237869Z // => This Inner Loop Header: Depth=2 2026-02-21T08:22:03.2237931Z add.s32 %r188, %r1599, 1; 2026-02-21T08:22:03.2237990Z setp.gt.s32 %p11, %r188, 5; 2026-02-21T08:22:03.2238051Z selp.b32 %r1599, 0, %r188, %p11; 2026-02-21T08:22:03.2238223Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2238314Z cp.async.wait_group 5; 2026-02-21T08:22:03.2238367Z bar.sync 2, 128; 2026-02-21T08:22:03.2238534Z .loc 1 49 79 // 
cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2238591Z shl.b32 %r189, %r1602, 3; 2026-02-21T08:22:03.2238646Z add.s32 %r191, %r28, %r189; 2026-02-21T08:22:03.2238703Z add.s32 %r186, %r191, 155712; 2026-02-21T08:22:03.2238868Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.2238950Z // begin inline asm 2026-02-21T08:22:03.2239001Z 2026-02-21T08:22:03.2239058Z { 2026-02-21T08:22:03.2239119Z .reg .pred complete; 2026-02-21T08:22:03.2239175Z waitLoop: 2026-02-21T08:22:03.2239298Z mbarrier.try_wait.parity.shared.b64 complete, [%r186], %r1601; 2026-02-21T08:22:03.2239360Z @!complete bra.uni waitLoop; 2026-02-21T08:22:03.2239410Z } 2026-02-21T08:22:03.2239417Z 2026-02-21T08:22:03.2239509Z // end inline asm 2026-02-21T08:22:03.2239677Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 2026-02-21T08:22:03.2239751Z shfl.sync.idx.b32 %r192, %r7, 0, 31, -1; 2026-02-21T08:22:03.2239810Z setp.ne.b32 %p12, %r192, 0; 2026-02-21T08:22:03.2239872Z @%p12 bra $L__BB0_8; 2026-02-21T08:22:03.2239965Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:22:03.2240029Z setp.eq.b64 %p23, %rd684, 2016; 2026-02-21T08:22:03.2240195Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.2240253Z shl.b32 %r201, %r1602, 13; 2026-02-21T08:22:03.2240309Z add.s32 %r203, %r28, %r201; 2026-02-21T08:22:03.2240366Z add.s32 %r204, %r203, 98304; 2026-02-21T08:22:03.2240532Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2240589Z add.s32 %r207, %r191, 155648; 2026-02-21T08:22:03.2240751Z .loc 1 54 85 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:54:85 2026-02-21T08:22:03.2240816Z shl.b32 %r208, %r1599, 14; 2026-02-21T08:22:03.2240872Z add.s32 %r209, %r28, %r208; 2026-02-21T08:22:03.2241030Z .loc 1 56 52 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:56:52 
2026-02-21T08:22:03.2241098Z elect.sync %r210|%p14, -1; 2026-02-21T08:22:03.2241157Z bfe.u32 %r211, %r209, 4, 14; 2026-02-21T08:22:03.2241214Z cvt.u64.u32 %rd105, %r211; 2026-02-21T08:22:03.2241287Z or.b64 %rd95, %rd105, -9223371899348713472; 2026-02-21T08:22:03.2241353Z bfe.u32 %r212, %r204, 4, 14; 2026-02-21T08:22:03.2241410Z cvt.u64.u32 %rd106, %r212; 2026-02-21T08:22:03.2241479Z or.b64 %rd96, %rd106, -9223371899382267904; 2026-02-21T08:22:03.2241541Z mov.b32 %r194, 136314896; 2026-02-21T08:22:03.2241594Z // begin inline asm 2026-02-21T08:22:03.2241740Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 0 ], %rd95, %rd96, %r194, %p104; 2026-02-21T08:22:03.2241804Z // end inline asm 2026-02-21T08:22:03.2241860Z add.s32 %r213, %r209, 32; 2026-02-21T08:22:03.2241915Z bfe.u32 %r214, %r213, 4, 14; 2026-02-21T08:22:03.2241972Z cvt.u64.u32 %rd107, %r214; 2026-02-21T08:22:03.2242045Z or.b64 %rd97, %rd107, -9223371899348713472; 2026-02-21T08:22:03.2242100Z add.s32 %r215, %r203, 98336; 2026-02-21T08:22:03.2242153Z bfe.u32 %r216, %r215, 4, 14; 2026-02-21T08:22:03.2242213Z cvt.u64.u32 %rd108, %r216; 2026-02-21T08:22:03.2242277Z or.b64 %rd98, %rd108, -9223371899382267904; 2026-02-21T08:22:03.2242337Z mov.pred %p15, -1; 2026-02-21T08:22:03.2242390Z // begin inline asm 2026-02-21T08:22:03.2242535Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 0 ], %rd97, %rd98, %r194, %p15; 2026-02-21T08:22:03.2242588Z // end inline asm 2026-02-21T08:22:03.2242641Z add.s32 %r217, %r209, 8192; 2026-02-21T08:22:03.2242702Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T08:22:03.2242757Z cvt.u64.u32 %rd109, %r218; 2026-02-21T08:22:03.2242824Z or.b64 %rd99, %rd109, -9223371899348713472; 2026-02-21T08:22:03.2242902Z // begin inline asm 2026-02-21T08:22:03.2243043Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 128 ], %rd99, %rd96, %r194, %p104; 2026-02-21T08:22:03.2243095Z // end inline asm 2026-02-21T08:22:03.2243150Z add.s32 %r219, %r209, 8224; 2026-02-21T08:22:03.2243211Z bfe.u32 %r220, %r219, 
4, 14; 2026-02-21T08:22:03.2243265Z cvt.u64.u32 %rd110, %r220; 2026-02-21T08:22:03.2243333Z or.b64 %rd101, %rd110, -9223371899348713472; 2026-02-21T08:22:03.2243391Z // begin inline asm 2026-02-21T08:22:03.2243548Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r193 + 128 ], %rd101, %rd98, %r194, %p15; 2026-02-21T08:22:03.2243601Z // end inline asm 2026-02-21T08:22:03.2243655Z cvt.u64.u32 %rd103, %r207; 2026-02-21T08:22:03.2243716Z // begin inline asm 2026-02-21T08:22:03.2243837Z @%p14 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd103]; 2026-02-21T08:22:03.2243889Z // end inline asm 2026-02-21T08:22:03.2243998Z and.pred %p22, %p23, %p14; 2026-02-21T08:22:03.2244055Z add.s32 %r221, %r28, 155776; 2026-02-21T08:22:03.2244112Z cvt.u64.u32 %rd104, %r221; 2026-02-21T08:22:03.2244173Z // begin inline asm 2026-02-21T08:22:03.2244293Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd104]; 2026-02-21T08:22:03.2244345Z // end inline asm 2026-02-21T08:22:03.2244398Z bra.uni $L__BB0_8; 2026-02-21T08:22:03.2244502Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2244665Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2244762Z ld.shared.b32 %r45, [global_smem+8]; 2026-02-21T08:22:03.2244825Z barrier.sync 1; 2026-02-21T08:22:03.2244988Z .loc 1 21 67 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:21:67 2026-02-21T08:22:03.2245046Z mov.u32 %r32, %ctaid.x; 2026-02-21T08:22:03.2245112Z mov.u32 %r33, %ctaid.y; 2026-02-21T08:22:03.2245168Z mov.u32 %r34, %ctaid.z; 2026-02-21T08:22:03.2245229Z mov.u32 %r35, %nctaid.x; 2026-02-21T08:22:03.2245289Z mov.u32 %r36, %nctaid.y; 2026-02-21T08:22:03.2245360Z mad.lo.s32 %r37, %r34, %r36, %r33; 2026-02-21T08:22:03.2245432Z mad.lo.s32 %r38, %r37, %r35, %r32; 2026-02-21T08:22:03.2257013Z shl.b32 %r39, %r38, 7; 2026-02-21T08:22:03.2257173Z cvt.s64.s32 %rd10, %r39; 2026-02-21T08:22:03.2257254Z add.s64 %rd11, %rd9, %rd10; 2026-02-21T08:22:03.2257325Z 
cvta.global.u64 %rd12, %rd11; 2026-02-21T08:22:03.2257391Z add.s32 %r18, %r1, -256; 2026-02-21T08:22:03.2257453Z mov.b32 %r1604, 0; 2026-02-21T08:22:03.2257525Z mov.b32 %r1603, -32; 2026-02-21T08:22:03.2257585Z mov.b32 %r1605, %r1604; 2026-02-21T08:22:03.2257708Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:22:03.2257816Z // => This Inner Loop Header: Depth=2 2026-02-21T08:22:03.2258002Z .loc 1 0 67 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:0:67 2026-02-21T08:22:03.2258078Z setp.lt.u32 %p6, %r18, 32; 2026-02-21T08:22:03.2258148Z setp.eq.b32 %p4, %r18, 0; 2026-02-21T08:22:03.2258331Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2258393Z add.s32 %r1603, %r1603, 32; 2026-02-21T08:22:03.2258453Z shl.b32 %r47, %r1605, 3; 2026-02-21T08:22:03.2258523Z add.s32 %r49, %r28, %r47; 2026-02-21T08:22:03.2258580Z add.s32 %r40, %r49, 155648; 2026-02-21T08:22:03.2258747Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.2258819Z // begin inline asm 2026-02-21T08:22:03.2258872Z 2026-02-21T08:22:03.2258923Z { 2026-02-21T08:22:03.2258985Z .reg .pred complete; 2026-02-21T08:22:03.2259053Z waitLoop: 2026-02-21T08:22:03.2259178Z mbarrier.try_wait.parity.shared.b64 complete, [%r40], %r1604; 2026-02-21T08:22:03.2259243Z @!complete bra.uni waitLoop; 2026-02-21T08:22:03.2259301Z } 2026-02-21T08:22:03.2259307Z 2026-02-21T08:22:03.2259369Z // end inline asm 2026-02-21T08:22:03.2259661Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2259732Z add.s32 %r46, %r49, 155712; 2026-02-21T08:22:03.2259899Z .loc 1 55 44 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:55:44 2026-02-21T08:22:03.2259956Z bar.sync 3, 64; 2026-02-21T08:22:03.2260014Z // begin inline asm 2026-02-21T08:22:03.2260136Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r46], 8192; 2026-02-21T08:22:03.2260238Z // end inline asm 
2026-02-21T08:22:03.2260299Z shl.b32 %r50, %r1605, 13; 2026-02-21T08:22:03.2260367Z add.s32 %r51, %r28, %r50; 2026-02-21T08:22:03.2260426Z add.s32 %r43, %r51, 98304; 2026-02-21T08:22:03.2260481Z bar.sync 3, 64; 2026-02-21T08:22:03.2260545Z elect.sync %r52|%p7, -1; 2026-02-21T08:22:03.2260618Z and.pred %p5, %p6, %p7; 2026-02-21T08:22:03.2260674Z // begin inline asm 2026-02-21T08:22:03.2260995Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r43], [%rd12, {%r1603, %r45}], [%r46]; 2026-02-21T08:22:03.2261063Z // end inline asm 2026-02-21T08:22:03.2261121Z add.s32 %r53, %r1605, 1; 2026-02-21T08:22:03.2261180Z setp.eq.b32 %p8, %r53, 7; 2026-02-21T08:22:03.2261253Z selp.b32 %r1605, 0, %r53, %p8; 2026-02-21T08:22:03.2261311Z selp.b32 %r54, 1, 0, %p8; 2026-02-21T08:22:03.2261370Z xor.b32 %r1604, %r1604, %r54; 2026-02-21T08:22:03.2261529Z .loc 1 49 79 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:49:79 2026-02-21T08:22:03.2261604Z setp.lt.u32 %p9, %r1603, 2016; 2026-02-21T08:22:03.2261662Z @%p9 bra $L__BB0_11; 2026-02-21T08:22:03.2261763Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2261830Z barrier.sync 1; 2026-02-21T08:22:03.2261885Z bra.uni $L__BB0_2; 2026-02-21T08:22:03.2261983Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2262049Z cp.async.wait_group 0; 2026-02-21T08:22:03.2262116Z bar.sync 2, 128; 2026-02-21T08:22:03.2262172Z barrier.sync 1; 2026-02-21T08:22:03.2262227Z bra.uni $L__BB0_2; 2026-02-21T08:22:03.2262328Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:22:03.2262484Z .loc 1 19 0 // cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py:19 2026-02-21T08:22:03.2262539Z barrier.sync 1; 2026-02-21T08:22:03.2262601Z barrier.sync 1; 2026-02-21T08:22:03.2262657Z bra.uni $L__BB0_2; 2026-02-21T08:22:03.2262713Z $L__tmp1: 2026-02-21T08:22:03.2262769Z $L__func_end0: 2026-02-21T08:22:03.2262861Z // -- End function 2026-02-21T08:22:03.2262912Z } 2026-02-21T08:22:03.2263116Z 
.file 1 "/tmp/torchinductor_root/to/cto4qetrbkizhy3a5x5ekgmzzpdc6mz7o3rte43nxnqz3kcf25sg.py" 2026-02-21T08:22:03.2263187Z .section .debug_abbrev 2026-02-21T08:22:03.2263239Z { 2026-02-21T08:22:03.2263329Z .b8 1 // Abbreviation Code 2026-02-21T08:22:03.2263418Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:22:03.2263506Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:22:03.2263583Z .b8 37 // DW_AT_producer 2026-02-21T08:22:03.2263656Z .b8 8 // DW_FORM_string 2026-02-21T08:22:03.2263739Z .b8 19 // DW_AT_language 2026-02-21T08:22:03.2263817Z .b8 5 // DW_FORM_data2 2026-02-21T08:22:03.2263894Z .b8 3 // DW_AT_name 2026-02-21T08:22:03.2263977Z .b8 8 // DW_FORM_string 2026-02-21T08:22:03.2264055Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:22:03.2264130Z .b8 6 // DW_FORM_data4 2026-02-21T08:22:03.2264212Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:22:03.2264284Z .b8 8 // DW_FORM_string 2026-02-21T08:22:03.2264380Z .b8 0 // EOM(1) 2026-02-21T08:22:03.2264446Z .b8 0 // EOM(2) 2026-02-21T08:22:03.2264522Z .b8 0 // EOM(3) 2026-02-21T08:22:03.2264572Z } 2026-02-21T08:22:03.2264632Z .section .debug_info 2026-02-21T08:22:03.2264729Z { 2026-02-21T08:22:03.2264812Z .b32 104 // Length of Unit 2026-02-21T08:22:03.2264898Z .b8 2 // DWARF version number 2026-02-21T08:22:03.2264978Z .b8 0 2026-02-21T08:22:03.2265109Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:22:03.2265200Z .b8 8 // Address Size (in bytes) 2026-02-21T08:22:03.2265301Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:22:03.2265392Z .b8 116 // DW_AT_producer 2026-02-21T08:22:03.2265499Z .b8 114 2026-02-21T08:22:03.2265554Z .b8 105 2026-02-21T08:22:03.2265615Z .b8 116 2026-02-21T08:22:03.2265667Z .b8 111 2026-02-21T08:22:03.2265719Z .b8 110 2026-02-21T08:22:03.2265769Z .b8 0 2026-02-21T08:22:03.2265853Z .b8 2 // DW_AT_language 2026-02-21T08:22:03.2265905Z .b8 0 2026-02-21T08:22:03.2265980Z .b8 99 // DW_AT_name 2026-02-21T08:22:03.2266042Z .b8 116 2026-02-21T08:22:03.2266091Z .b8 111 2026-02-21T08:22:03.2266141Z .b8 52 2026-02-21T08:22:03.2266192Z .b8 113 2026-02-21T08:22:03.2266252Z .b8 101 2026-02-21T08:22:03.2266304Z .b8 116 2026-02-21T08:22:03.2266354Z .b8 114 2026-02-21T08:22:03.2266406Z .b8 98 2026-02-21T08:22:03.2266465Z .b8 107 2026-02-21T08:22:03.2266515Z .b8 105 2026-02-21T08:22:03.2266565Z .b8 122 2026-02-21T08:22:03.2266623Z .b8 104 2026-02-21T08:22:03.2266673Z .b8 121 2026-02-21T08:22:03.2266724Z .b8 51 2026-02-21T08:22:03.2266773Z .b8 97 2026-02-21T08:22:03.2266834Z .b8 53 2026-02-21T08:22:03.2266889Z .b8 120 2026-02-21T08:22:03.2266941Z .b8 53 2026-02-21T08:22:03.2267003Z .b8 101 2026-02-21T08:22:03.2267055Z .b8 107 2026-02-21T08:22:03.2267105Z .b8 103 2026-02-21T08:22:03.2267157Z .b8 109 2026-02-21T08:22:03.2267219Z .b8 122 2026-02-21T08:22:03.2267269Z .b8 122 2026-02-21T08:22:03.2267318Z .b8 112 2026-02-21T08:22:03.2267378Z .b8 100 2026-02-21T08:22:03.2267428Z .b8 99 2026-02-21T08:22:03.2267478Z .b8 54 2026-02-21T08:22:03.2267527Z .b8 109 2026-02-21T08:22:03.2267587Z .b8 122 2026-02-21T08:22:03.2267638Z .b8 55 2026-02-21T08:22:03.2267688Z .b8 111 2026-02-21T08:22:03.2267739Z .b8 51 2026-02-21T08:22:03.2267799Z .b8 114 2026-02-21T08:22:03.2267849Z .b8 116 2026-02-21T08:22:03.2267901Z .b8 101 2026-02-21T08:22:03.2267959Z .b8 52 2026-02-21T08:22:03.2268007Z .b8 51 2026-02-21T08:22:03.2268058Z .b8 110 
2026-02-21T08:22:03.2268107Z .b8 120 2026-02-21T08:22:03.2268167Z .b8 110 2026-02-21T08:22:03.2268220Z .b8 113 2026-02-21T08:22:03.2268269Z .b8 122 2026-02-21T08:22:03.2268328Z .b8 51 2026-02-21T08:22:03.2268381Z .b8 107 2026-02-21T08:22:03.2268433Z .b8 99 2026-02-21T08:22:03.2268484Z .b8 102 2026-02-21T08:22:03.2268545Z .b8 50 2026-02-21T08:22:03.2268593Z .b8 53 2026-02-21T08:22:03.2268643Z .b8 115 2026-02-21T08:22:03.2268691Z .b8 103 2026-02-21T08:22:03.2268749Z .b8 46 2026-02-21T08:22:03.2268799Z .b8 112 2026-02-21T08:22:03.2268848Z .b8 121 2026-02-21T08:22:03.2268906Z .b8 0 2026-02-21T08:22:03.2268995Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:22:03.2269070Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:22:03.2269122Z .b8 116 2026-02-21T08:22:03.2269183Z .b8 109 2026-02-21T08:22:03.2269234Z .b8 112 2026-02-21T08:22:03.2269283Z .b8 47 2026-02-21T08:22:03.2269344Z .b8 116 2026-02-21T08:22:03.2269393Z .b8 111 2026-02-21T08:22:03.2269445Z .b8 114 2026-02-21T08:22:03.2269493Z .b8 99 2026-02-21T08:22:03.2269553Z .b8 104 2026-02-21T08:22:03.2269605Z .b8 105 2026-02-21T08:22:03.2269655Z .b8 110 2026-02-21T08:22:03.2269714Z .b8 100 2026-02-21T08:22:03.2269767Z .b8 117 2026-02-21T08:22:03.2269818Z .b8 99 2026-02-21T08:22:03.2269909Z .b8 116 2026-02-21T08:22:03.2269973Z .b8 111 2026-02-21T08:22:03.2270024Z .b8 114 2026-02-21T08:22:03.2270077Z .b8 95 2026-02-21T08:22:03.2270138Z .b8 114 2026-02-21T08:22:03.2270187Z .b8 111 2026-02-21T08:22:03.2270236Z .b8 111 2026-02-21T08:22:03.2270285Z .b8 116 2026-02-21T08:22:03.2270343Z .b8 47 2026-02-21T08:22:03.2270393Z .b8 116 2026-02-21T08:22:03.2270443Z .b8 111 2026-02-21T08:22:03.2270493Z .b8 0 2026-02-21T08:22:03.2270552Z } 2026-02-21T08:22:03.2270618Z .section .debug_macinfo { } 2026-02-21T08:22:03.2270647Z 2026-02-21T08:22:03.2270728Z ================================================================ 2026-02-21T08:22:03.2270842Z please share the reproducer above with Triton project. 
2026-02-21T08:22:06.7957581Z 2026-02-21T08:22:06.7958329Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 90/90 17.8 configs/s 2026-02-21T08:22:07.3330045Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1820.4 2026-02-21T08:22:07.3330446Z configs/s 2026-02-21T08:22:07.3839056Z [39s] Generation 2 complete: 2026-02-21T08:22:07.3843560Z error=6 2026-02-21T08:22:07.3847434Z ok=86 2026-02-21T08:22:07.3851578Z min=0.0389 2026-02-21T08:22:07.3857090Z mid=0.1761 2026-02-21T08:22:07.3860588Z max=3.6947 2026-02-21T08:22:07.3863346Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:22:07.3868388Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:22:07.3868747Z 'l2_groupings': [64], 2026-02-21T08:22:07.3868946Z 'load_eviction_policies': ['', ''], 2026-02-21T08:22:07.3869167Z 'loop_orders': [[1, 0]], 2026-02-21T08:22:07.3869339Z 'num_stages': 7, 2026-02-21T08:22:07.3869518Z 'num_warps': 8, 2026-02-21T08:22:07.3869715Z 'pid_type': 'flat', 2026-02-21T08:22:07.3869871Z 'range_flattens': [None, None], 2026-02-21T08:22:07.3870078Z 'range_multi_buffers': [None, True], 2026-02-21T08:22:07.3874034Z 'range_num_stages': [0, 0], 2026-02-21T08:22:07.3875703Z 'range_unroll_factors': [0, 0], 2026-02-21T08:22:07.3875970Z 'range_warp_specializes': [None, True]} 2026-02-21T08:22:07.3876191Z [39s] Fitting surrogate: 282 points, 282 targets 2026-02-21T08:22:08.4975467Z [40s] Generation 3 starting: 80 neighbors, 5 active search path(s) 2026-02-21T08:22:13.0877548Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83/83 8.0 configs/s 2026-02-21T08:22:17.3011809Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 83/83 19.7 configs/s 2026-02-21T08:22:18.0861364Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1262.2 2026-02-21T08:22:18.0862164Z configs/s 2026-02-21T08:22:18.1492819Z [49s] Generation 3 complete: 2026-02-21T08:22:18.1497800Z error=14 2026-02-21T08:22:18.1502506Z ok=72 2026-02-21T08:22:18.1507260Z 
min=0.0369 2026-02-21T08:22:18.1511327Z mid=0.1166 2026-02-21T08:22:18.1512830Z max=6.4564 2026-02-21T08:22:18.1513089Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:22:18.1513732Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:22:18.1513996Z 'l2_groupings': [64], 2026-02-21T08:22:18.1514186Z 'load_eviction_policies': ['', 'last'], 2026-02-21T08:22:18.1514404Z 'loop_orders': [[1, 0]], 2026-02-21T08:22:18.1514578Z 'num_stages': 7, 2026-02-21T08:22:18.1514915Z 'num_warps': 2, 2026-02-21T08:22:18.1515079Z 'pid_type': 'flat', 2026-02-21T08:22:18.1515266Z 'range_flattens': [None, None], 2026-02-21T08:22:18.1515477Z 'range_multi_buffers': [None, True], 2026-02-21T08:22:18.1515697Z 'range_num_stages': [0, 0], 2026-02-21T08:22:18.1515895Z 'range_unroll_factors': [0, 0], 2026-02-21T08:22:18.1516099Z 'range_warp_specializes': [None, True]} 2026-02-21T08:22:18.1516352Z [49s] Fitting surrogate: 368 points, 368 targets 2026-02-21T08:22:19.4822477Z [51s] Generation 4 starting: 78 neighbors, 5 active search path(s) 2026-02-21T08:22:26.5550137Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 5.8 configs/s 2026-02-21T08:22:30.8175790Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 19.0 configs/s 2026-02-21T08:22:31.5223196Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1403.1 2026-02-21T08:22:31.5224630Z configs/s 2026-02-21T08:22:31.5853104Z [63s] Generation 4 complete: 2026-02-21T08:22:31.5857373Z error=20 2026-02-21T08:22:31.5862653Z ok=63 2026-02-21T08:22:31.5867257Z min=0.0369 2026-02-21T08:22:31.5871870Z mid=0.1004 2026-02-21T08:22:31.5876691Z max=10.3025 2026-02-21T08:22:31.5881390Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:22:31.5881787Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:22:31.5882060Z 'l2_groupings': [64], 2026-02-21T08:22:31.5886127Z 'load_eviction_policies': ['', 'last'], 2026-02-21T08:22:31.5888155Z 'loop_orders': [[1, 0]], 
2026-02-21T08:22:31.5888369Z 'num_stages': 6, 2026-02-21T08:22:31.5888559Z 'num_warps': 4, 2026-02-21T08:22:31.5888720Z 'pid_type': 'flat', 2026-02-21T08:22:31.5888913Z 'range_flattens': [None, None], 2026-02-21T08:22:31.5889110Z 'range_multi_buffers': [None, True], 2026-02-21T08:22:31.5889317Z 'range_num_stages': [0, 0], 2026-02-21T08:22:31.5889496Z 'range_unroll_factors': [0, 0], 2026-02-21T08:22:31.5889696Z 'range_warp_specializes': [None, True]} 2026-02-21T08:22:31.5890000Z [63s] Fitting surrogate: 451 points, 451 targets 2026-02-21T08:22:32.7287159Z [64s] Generation 5 starting: 71 neighbors, 5 active search path(s) 2026-02-21T08:22:45.1796129Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74/74 1.2 configs/s 2026-02-21T08:22:48.1436001Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 74/74 25.2 configs/s 2026-02-21T08:22:49.2931513Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 868.4 2026-02-21T08:22:49.2932453Z configs/s 2026-02-21T08:22:49.3770552Z [81s] Generation 5 complete: 2026-02-21T08:22:49.3772300Z error=25 2026-02-21T08:22:49.3772470Z ok=52 2026-02-21T08:22:49.3772607Z min=0.0369 2026-02-21T08:22:49.3772786Z mid=0.0553 2026-02-21T08:22:49.3772919Z max=3.0894 2026-02-21T08:22:49.3773061Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:22:49.3773322Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:22:49.3776951Z 'l2_groupings': [64], 2026-02-21T08:22:49.3780783Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:22:49.3785231Z 'loop_orders': [[1, 0]], 2026-02-21T08:22:49.3789356Z 'num_stages': 6, 2026-02-21T08:22:49.3794170Z 'num_warps': 2, 2026-02-21T08:22:49.3794444Z 'pid_type': 'flat', 2026-02-21T08:22:49.3794659Z 'range_flattens': [None, None], 2026-02-21T08:22:49.3794966Z 'range_multi_buffers': [None, True], 2026-02-21T08:22:49.3799208Z 'range_num_stages': [0, 0], 2026-02-21T08:22:49.3804388Z 'range_unroll_factors': [0, 0], 2026-02-21T08:22:49.3806016Z 
'range_warp_specializes': [None, True]} 2026-02-21T08:22:49.3806363Z [81s] Fitting surrogate: 528 points, 528 targets 2026-02-21T08:22:50.3159190Z [81s] Generation 6 starting: 47 neighbors, 3 active search path(s) 2026-02-21T08:22:55.9846601Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48/48 3.7 configs/s 2026-02-21T08:22:58.3290476Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 48/48 20.5 configs/s 2026-02-21T08:22:59.0216237Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1424.3 2026-02-21T08:22:59.0217812Z configs/s 2026-02-21T08:22:59.0832635Z [90s] Generation 6 complete: 2026-02-21T08:22:59.0834525Z error=11 2026-02-21T08:22:59.0834759Z ok=40 2026-02-21T08:22:59.0834901Z min=0.0369 2026-02-21T08:22:59.0835029Z mid=0.1208 2026-02-21T08:22:59.0835160Z max=13.1645 2026-02-21T08:22:59.0835305Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:22:59.0835557Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:22:59.0836378Z 'l2_groupings': [64], 2026-02-21T08:22:59.0836572Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:22:59.0836770Z 'loop_orders': [[1, 0]], 2026-02-21T08:22:59.0836925Z 'num_stages': 6, 2026-02-21T08:22:59.0837074Z 'num_warps': 2, 2026-02-21T08:22:59.0837213Z 'pid_type': 'flat', 2026-02-21T08:22:59.0837377Z 'range_flattens': [None, None], 2026-02-21T08:22:59.0837555Z 'range_multi_buffers': [None, True], 2026-02-21T08:22:59.0837746Z 'range_num_stages': [0, 0], 2026-02-21T08:22:59.0837918Z 'range_unroll_factors': [0, 0], 2026-02-21T08:22:59.0838255Z 'range_warp_specializes': [None, True]} 2026-02-21T08:22:59.0855242Z [90s] Fitting surrogate: 579 points, 579 targets 2026-02-21T08:23:00.0519737Z [91s] Generation 7 starting: 50 neighbors, 3 active search path(s) 2026-02-21T08:23:04.0817288Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51/51 13.0 configs/s 2026-02-21T08:23:06.2213938Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 51/51 24.6 
configs/s 2026-02-21T08:23:06.4498058Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 4128.4 2026-02-21T08:23:06.4503564Z configs/s 2026-02-21T08:23:06.4835771Z [98s] Generation 7 complete: 2026-02-21T08:23:06.4836062Z error=17 2026-02-21T08:23:06.4842151Z ok=37 2026-02-21T08:23:06.4846252Z min=0.0369 2026-02-21T08:23:06.4850516Z mid=0.1208 2026-02-21T08:23:06.4852808Z max=8.2781 2026-02-21T08:23:06.4859656Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:23:06.4861868Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:23:06.4862099Z 'l2_groupings': [64], 2026-02-21T08:23:06.4862265Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:23:06.4862501Z 'loop_orders': [[1, 0]], 2026-02-21T08:23:06.4867964Z 'num_stages': 6, 2026-02-21T08:23:06.4869574Z 'num_warps': 2, 2026-02-21T08:23:06.4869825Z 'pid_type': 'flat', 2026-02-21T08:23:06.4870006Z 'range_flattens': [None, None], 2026-02-21T08:23:06.4875676Z 'range_multi_buffers': [None, True], 2026-02-21T08:23:06.4877852Z 'range_num_stages': [0, 0], 2026-02-21T08:23:06.4883415Z 'range_unroll_factors': [0, 0], 2026-02-21T08:23:06.4887445Z 'range_warp_specializes': [None, True]} 2026-02-21T08:23:06.4889514Z [98s] Fitting surrogate: 633 points, 633 targets 2026-02-21T08:23:07.0914316Z [98s] Generation 8 starting: 23 neighbors, 2 active search path(s) 2026-02-21T08:23:09.7761737Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24/24 8.6 configs/s 2026-02-21T08:23:10.7711755Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 24/24 22.4 configs/s 2026-02-21T08:23:11.1675422Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2450.5 2026-02-21T08:23:11.1675833Z configs/s 2026-02-21T08:23:11.2127588Z [102s] Generation 8 complete: 2026-02-21T08:23:11.2129493Z error=9 2026-02-21T08:23:11.2129739Z ok=17 2026-02-21T08:23:11.2137033Z min=0.0369 2026-02-21T08:23:11.2140990Z mid=0.0757 2026-02-21T08:23:11.2145663Z max=8.6231 2026-02-21T08:23:11.2149272Z 
best={'block_sizes': [256, 256, 32], 2026-02-21T08:23:11.2153408Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:23:11.2154874Z 'l2_groupings': [64], 2026-02-21T08:23:11.2155056Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:23:11.2155259Z 'loop_orders': [[1, 0]], 2026-02-21T08:23:11.2155412Z 'num_stages': 6, 2026-02-21T08:23:11.2155556Z 'num_warps': 2, 2026-02-21T08:23:11.2155694Z 'pid_type': 'flat', 2026-02-21T08:23:11.2155875Z 'range_flattens': [None, None], 2026-02-21T08:23:11.2156052Z 'range_multi_buffers': [None, True], 2026-02-21T08:23:11.2156249Z 'range_num_stages': [0, 0], 2026-02-21T08:23:11.2156419Z 'range_unroll_factors': [0, 0], 2026-02-21T08:23:11.2156594Z 'range_warp_specializes': [None, True]} 2026-02-21T08:23:11.2156805Z [102s] Fitting surrogate: 659 points, 659 targets 2026-02-21T08:23:11.6823452Z [103s] Generation 9 starting: 14 neighbors, 1 active search path(s) 2026-02-21T08:23:16.2181496Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14/14 3.0 configs/s 2026-02-21T08:23:16.6860491Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 14/14 32.9 configs/s 2026-02-21T08:23:16.6865213Z [108s] Generation 9 complete: 2026-02-21T08:23:16.6867805Z error=7 2026-02-21T08:23:16.6873449Z ok=9 2026-02-21T08:23:16.6877544Z min=0.0369 2026-02-21T08:23:16.6879080Z mid=0.0533 2026-02-21T08:23:16.6879232Z max=3.3050 2026-02-21T08:23:16.6879431Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:23:16.6880049Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:23:16.6880275Z 'l2_groupings': [64], 2026-02-21T08:23:16.6884545Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:23:16.6886795Z 'loop_orders': [[1, 0]], 2026-02-21T08:23:16.6886995Z 'num_stages': 6, 2026-02-21T08:23:16.6887147Z 'num_warps': 2, 2026-02-21T08:23:16.6887289Z 'pid_type': 'flat', 2026-02-21T08:23:16.6887472Z 'range_flattens': [None, None], 2026-02-21T08:23:16.6887653Z 'range_multi_buffers': [None, True], 
2026-02-21T08:23:16.6887849Z 'range_num_stages': [0, 0], 2026-02-21T08:23:16.6888009Z 'range_unroll_factors': [0, 0], 2026-02-21T08:23:16.6888190Z 'range_warp_specializes': [None, True]} 2026-02-21T08:23:16.6888504Z [108s] Fitting surrogate: 675 points, 675 targets 2026-02-21T08:23:17.0006137Z [108s] Autotuning complete in 108.7s after searching 653 configs. 2026-02-21T08:23:17.0007812Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:23:17.0008977Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T08:23:17.0015657Z 2026-02-21T08:23:17.0015985Z [108s] Code of selected kernel: /tmp/torchinductor_root/hg/chg4yh6iinuzt4w627dhumfckppd5lnetlkykkdogp75zyntbhdo.py 2026-02-21T08:23:45.9721667Z WARNING:tritonbench.utils.triton_op:Completed input ID 3: 2026-02-21T08:23:45.9722112Z (M, N, K) 2026-02-21T08:23:45.9722329Z ------------------ 2026-02-21T08:23:45.9722556Z (2048, 4096, 2048) 2026-02-21T08:23:45.9722696Z 2026-02-21T08:23:45.9737704Z 38%|███▊ | 3/8 [13:35<22:47, 273.58s/it]WARNING:tritonbench.utils.triton_op:Running input ID 5: 2026-02-21T08:23:45.9738222Z (M, N, K) 2026-02-21T08:23:45.9738462Z ------------------ 2026-02-21T08:23:45.9738688Z (1024, 8192, 1024) 2026-02-21T08:23:45.9741268Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:24:33.2741596Z INFO:tritonbench.utils.triton_op:Took 0.02ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:25:08.8344515Z Autotune Choices Stats: 2026-02-21T08:25:08.8348886Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_74", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.028511999174952507, "best_triton_pos": 0} 2026-02-21T08:25:08.8354952Z AUTOTUNE mm(1024x1024, 1024x8192) 2026-02-21T08:25:08.8359531Z strides: [1024, 1], [1, 1024] 2026-02-21T08:25:08.8363479Z dtypes: torch.float16, torch.float16 2026-02-21T08:25:08.8368315Z triton_mm_74 0.0285 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:25:08.8372798Z triton_mm_68 0.0346 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:25:08.8373891Z triton_mm_75 0.0357 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:25:08.8374621Z triton_mm_66 0.0387 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:25:08.8375390Z triton_mm_67 0.0387 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:25:08.8376041Z triton_mm_71 0.0388 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:25:08.8376819Z triton_mm_73 0.0399 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:25:08.8377483Z triton_mm_70 0.0407 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:25:08.8378136Z triton_mm_64 0.0408 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:25:08.8378795Z triton_mm_69 0.0481 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T08:25:08.8379388Z SingleProcess AUTOTUNE benchmarking takes 0.3126 seconds and 0.2967 seconds precompiling for 19 choices 2026-02-21T08:25:09.1163529Z INFO:tritonbench.utils.triton_op:Took 1067.22ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:25:45.0883971Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:25:45.0888916Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:25:45.0893443Z 'dtype': 'torch.float16', 2026-02-21T08:25:45.0897996Z 'shape': (1024, 1024), 2026-02-21T08:25:45.0902574Z 'stride': (1024, 1)}, 2026-02-21T08:25:45.0906495Z { 'device': 'cuda:0', 2026-02-21T08:25:45.0910446Z 'dtype': 'torch.float16', 2026-02-21T08:25:45.0912082Z 'shape': (1024, 8192), 2026-02-21T08:25:45.0912300Z 'stride': (1, 1024)}, 2026-02-21T08:25:45.0928545Z None), 2026-02-21T08:25:45.0928736Z 'kwargs': {}} 2026-02-21T08:25:45.0929045Z INFO:tritonbench.utils.triton_op:Took 4.74ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:25:45.1816892Z [0s] Autotune random seed: 2134884919 2026-02-21T08:25:45.2841929Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:25:49.4914998Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 48.9 configs/s 2026-02-21T08:25:57.7382318Z [12s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:25:57.7382642Z 2026-02-21T08:25:57.7382647Z 2026-02-21T08:25:57.7383041Z ================================================================ 2026-02-21T08:25:57.7383265Z Internal Triton PTX codegen error 2026-02-21T08:25:57.7383436Z `ptxas` stderr: 2026-02-21T08:25:57.7383862Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:25:57.7384344Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:25:57.7384490Z 2026-02-21T08:25:57.7385158Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5ga16to_.ptx -o /tmp/tmp5ga16to_.ptx.o 2026-02-21T08:25:57.7385613Z 2026-02-21T08:25:57.7385617Z 2026-02-21T08:25:57.7385673Z // 2026-02-21T08:25:57.7385840Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:25:57.7386028Z // 2026-02-21T08:25:57.7386097Z 2026-02-21T08:25:57.7386157Z .version 8.7 2026-02-21T08:25:57.7386523Z .target sm_100a 2026-02-21T08:25:57.7386669Z .address_size 64 2026-02-21T08:25:57.7386769Z 2026-02-21T08:25:57.7386898Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:25:57.7387163Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:25:57.7387370Z // @_helion_matmul 2026-02-21T08:25:57.7387570Z .visible .entry _helion_matmul( 2026-02-21T08:25:57.7387779Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:25:57.7388184Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:25:57.7388427Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:25:57.7388672Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:25:57.7389812Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 
0]], maxnreg=32, num_sm_multiplier=2, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:25:57.7390957Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:25:57.7391400Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:25:57.7391625Z ) 2026-02-21T08:25:57.7391737Z .reqntid 128 2026-02-21T08:25:57.7391868Z .maxnreg 32 2026-02-21T08:25:57.7391985Z { 2026-02-21T08:25:57.7392113Z .reg .pred %p<73>; 2026-02-21T08:25:57.7392252Z .reg .b16 %rs<8>; 2026-02-21T08:25:57.7392391Z .reg .b32 %r<548>; 2026-02-21T08:25:57.7392524Z .reg .b64 %rd<191>; 2026-02-21T08:25:57.7392666Z $L__func_begin0: 2026-02-21T08:25:57.7392745Z 2026-02-21T08:25:57.7392803Z // %bb.0: 2026-02-21T08:25:57.7393043Z .loc 1 19 0 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:19 2026-02-21T08:25:57.7393332Z mov.u32 %r1, %tid.x; 2026-02-21T08:25:57.7393501Z ld.param.b64 %rd13, [_helion_matmul_param_1]; 2026-02-21T08:25:57.7393707Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:25:57.7393866Z mov.b32 %r55, global_smem; 2026-02-21T08:25:57.7394024Z // begin inline asm 2026-02-21T08:25:57.7394259Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r55], 128; 2026-02-21T08:25:57.7394511Z // end inline asm 2026-02-21T08:25:57.7394721Z ld.param.b64 %rd30, [_helion_matmul_param_3]; 2026-02-21T08:25:57.7394908Z bar.sync 0; 2026-02-21T08:25:57.7395059Z ld.shared.b32 %r539, [global_smem]; 2026-02-21T08:25:57.7395231Z bar.sync 0; 2026-02-21T08:25:57.7395371Z // begin inline asm 2026-02-21T08:25:57.7395571Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:25:57.7395801Z // end inline asm 2026-02-21T08:25:57.7396062Z .loc 1 21 67 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:21:67 2026-02-21T08:25:57.7396382Z mov.u32 
%r64, %ctaid.x; 2026-02-21T08:25:57.7396582Z mov.u32 %r65, %ctaid.y; 2026-02-21T08:25:57.7396728Z mov.u32 %r66, %ctaid.z; 2026-02-21T08:25:57.7396881Z mov.u32 %r67, %nctaid.x; 2026-02-21T08:25:57.7397028Z mov.u32 %r68, %nctaid.y; 2026-02-21T08:25:57.7397184Z mad.lo.s32 %r69, %r66, %r68, %r65; 2026-02-21T08:25:57.7397351Z mad.lo.s32 %r70, %r69, %r67, %r64; 2026-02-21T08:25:57.7397519Z shl.b32 %r71, %r70, 7; 2026-02-21T08:25:57.7397668Z cvt.s64.s32 %rd31, %r71; 2026-02-21T08:25:57.7397825Z add.s64 %rd27, %rd30, %rd31; 2026-02-21T08:25:57.7397979Z shl.b32 %r72, %r1, 2; 2026-02-21T08:25:57.7398128Z add.s32 %r56, %r55, %r72; 2026-02-21T08:25:57.7398280Z mov.b32 %r57, 0; 2026-02-21T08:25:57.7398432Z // begin inline asm 2026-02-21T08:25:57.7398587Z @%p1 st.shared.b32 [ %r56 + 0 ], %r57; 2026-02-21T08:25:57.7398754Z // end inline asm 2026-02-21T08:25:57.7398897Z bar.warp.sync -1; 2026-02-21T08:25:57.7399042Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T08:25:57.7399271Z cvt.u64.u32 %rd12, %r55; 2026-02-21T08:25:57.7399415Z // begin inline asm 2026-02-21T08:25:57.7399668Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd12 + 0 ], %rd13; 2026-02-21T08:25:57.7399942Z // end inline asm 2026-02-21T08:25:57.7400071Z // begin inline asm 2026-02-21T08:25:57.7400290Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1; 2026-02-21T08:25:57.7400535Z // end inline asm 2026-02-21T08:25:57.7400669Z mov.b32 %r58, 16; 2026-02-21T08:25:57.7400797Z // begin inline asm 2026-02-21T08:25:57.7401066Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r58; 2026-02-21T08:25:57.7401318Z // end inline asm 2026-02-21T08:25:57.7401453Z mov.b32 %r59, 128; 2026-02-21T08:25:57.7401590Z // begin inline asm 2026-02-21T08:25:57.7403369Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r59; 2026-02-21T08:25:57.7403651Z // end inline asm 2026-02-21T08:25:57.7403793Z mov.b32 %r60, 1024; 2026-02-21T08:25:57.7403937Z // 
begin inline asm 2026-02-21T08:25:57.7404181Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r60; 2026-02-21T08:25:57.7404459Z // end inline asm 2026-02-21T08:25:57.7404589Z mov.b32 %r61, 8192; 2026-02-21T08:25:57.7404775Z // begin inline asm 2026-02-21T08:25:57.7405007Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r61; 2026-02-21T08:25:57.7405285Z // end inline asm 2026-02-21T08:25:57.7405415Z mov.b64 %rd20, 2048; 2026-02-21T08:25:57.7405566Z // begin inline asm 2026-02-21T08:25:57.7405818Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd12 + 0 ], 0x0, %rd20; 2026-02-21T08:25:57.7406100Z // end inline asm 2026-02-21T08:25:57.7406244Z mov.b32 %r62, 1; 2026-02-21T08:25:57.7406378Z // begin inline asm 2026-02-21T08:25:57.7406684Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r62; 2026-02-21T08:25:57.7406960Z // end inline asm 2026-02-21T08:25:57.7407097Z // begin inline asm 2026-02-21T08:25:57.7407337Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r62; 2026-02-21T08:25:57.7407620Z // end inline asm 2026-02-21T08:25:57.7407747Z // begin inline asm 2026-02-21T08:25:57.7407975Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x6; 2026-02-21T08:25:57.7408238Z // end inline asm 2026-02-21T08:25:57.7408365Z // begin inline asm 2026-02-21T08:25:57.7408612Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0; 2026-02-21T08:25:57.7408885Z // end inline asm 2026-02-21T08:25:57.7409021Z // begin inline asm 2026-02-21T08:25:57.7409247Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1; 2026-02-21T08:25:57.7409522Z // end inline asm 2026-02-21T08:25:57.7409663Z // begin inline asm 2026-02-21T08:25:57.7409893Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0; 2026-02-21T08:25:57.7410165Z // 
end inline asm 2026-02-21T08:25:57.7410348Z // begin inline asm 2026-02-21T08:25:57.7410714Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd27 + 0 ], [ %rd12 + 0 ], 0x80; 2026-02-21T08:25:57.7411114Z // end inline asm 2026-02-21T08:25:57.7411261Z // begin inline asm 2026-02-21T08:25:57.7411480Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd27 + 0 ], 0x80; 2026-02-21T08:25:57.7411739Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:25:57.7411941Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:25:57.7412127Z // end inline asm 2026-02-21T08:25:57.7412275Z bar.sync 0; 2026-02-21T08:25:57.7412422Z cvta.global.u64 %rd44, %rd27; 2026-02-21T08:25:57.7412723Z .loc 1 28 35 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:28:35 2026-02-21T08:25:57.7413032Z shl.b32 %r540, %r64, 2; 2026-02-21T08:25:57.7413343Z .loc 1 29 37 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:29:37 2026-02-21T08:25:57.7413650Z add.s32 %r73, %r540, 4; 2026-02-21T08:25:57.7413921Z .loc 1 29 49 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:29:49 2026-02-21T08:25:57.7414223Z min.s32 %r4, %r73, 1024; 2026-02-21T08:25:57.7414490Z .loc 1 30 74 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:30:74 2026-02-21T08:25:57.7414834Z setp.ge.s32 %p21, %r540, %r4; 2026-02-21T08:25:57.7415001Z @%p21 bra $L__BB0_9; 2026-02-21T08:25:57.7415173Z // %bb.1: // %.lr.ph 2026-02-21T08:25:57.7415523Z .loc 1 0 74 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:0:74 2026-02-21T08:25:57.7415846Z ld.param.b64 %rd11, [_helion_matmul_param_2]; 2026-02-21T08:25:57.7416078Z ld.param.b64 %rd10, [_helion_matmul_param_0]; 2026-02-21T08:25:57.7416270Z shr.u32 %r5, %r1, 5; 2026-02-21T08:25:57.7416499Z and.b32 %r6, %r1, 15; 2026-02-21T08:25:57.7416651Z shl.b32 %r7, %r6, 3; 2026-02-21T08:25:57.7416805Z bfe.u32 %r8, %r1, 1, 6; 2026-02-21T08:25:57.7416956Z shr.u32 %r74, %r1, 4; 2026-02-21T08:25:57.7417114Z bfe.u32 %r9, %r1, 4, 
3; 2026-02-21T08:25:57.7417275Z or.b32 %r10, %r9, 8; 2026-02-21T08:25:57.7417419Z or.b32 %r11, %r9, 16; 2026-02-21T08:25:57.7417577Z or.b32 %r12, %r9, 24; 2026-02-21T08:25:57.7417734Z or.b32 %r13, %r9, 32; 2026-02-21T08:25:57.7417886Z or.b32 %r14, %r9, 40; 2026-02-21T08:25:57.7418027Z or.b32 %r15, %r9, 48; 2026-02-21T08:25:57.7418184Z or.b32 %r16, %r74, 56; 2026-02-21T08:25:57.7418337Z shl.b32 %r75, %r1, 3; 2026-02-21T08:25:57.7418484Z and.b32 %r17, %r75, 8; 2026-02-21T08:25:57.7418624Z shl.b32 %r76, %r1, 4; 2026-02-21T08:25:57.7418772Z and.b32 %r77, %r76, 1904; 2026-02-21T08:25:57.7418925Z bfe.s32 %r78, %r1, 3, 1; 2026-02-21T08:25:57.7419078Z and.b32 %r79, %r78, 144; 2026-02-21T08:25:57.7419232Z xor.b32 %r80, %r79, %r77; 2026-02-21T08:25:57.7419383Z add.s32 %r82, %r55, 16384; 2026-02-21T08:25:57.7419546Z add.s32 %r180, %r82, %r80; 2026-02-21T08:25:57.7419693Z add.s32 %r83, %r55, %r80; 2026-02-21T08:25:57.7419850Z add.s32 %r187, %r83, 18432; 2026-02-21T08:25:57.7420001Z add.s32 %r194, %r83, 20480; 2026-02-21T08:25:57.7420159Z bfe.u32 %r84, %r82, 4, 14; 2026-02-21T08:25:57.7420306Z cvt.u64.u32 %rd32, %r84; 2026-02-21T08:25:57.7420478Z or.b64 %rd40, %rd32, -4611685949699522560; 2026-02-21T08:25:57.7420661Z bfe.u32 %r85, %r55, 4, 14; 2026-02-21T08:25:57.7420809Z cvt.u64.u32 %rd33, %r85; 2026-02-21T08:25:57.7420972Z or.b64 %rd41, %rd33, -4611685949691133952; 2026-02-21T08:25:57.7421145Z add.s32 %r224, %r83, 22528; 2026-02-21T08:25:57.7421297Z shl.b32 %r86, %r1, 9; 2026-02-21T08:25:57.7421436Z and.b32 %r87, %r86, 3072; 2026-02-21T08:25:57.7421584Z shl.b32 %r88, %r6, 4; 2026-02-21T08:25:57.7421720Z and.b32 %r89, %r1, 96; 2026-02-21T08:25:57.7421864Z shl.b32 %r90, %r89, 3; 2026-02-21T08:25:57.7422004Z and.b32 %r92, %r72, 64; 2026-02-21T08:25:57.7422153Z or.b32 %r93, %r88, %r90; 2026-02-21T08:25:57.7422301Z xor.b32 %r94, %r93, %r92; 2026-02-21T08:25:57.7422477Z or.b32 %r95, %r94, %r87; 2026-02-21T08:25:57.7422626Z add.s32 %r22, %r55, %r95; 2026-02-21T08:25:57.7422771Z 
xor.b32 %r96, %r95, 32; 2026-02-21T08:25:57.7422918Z add.s32 %r23, %r55, %r96; 2026-02-21T08:25:57.7423060Z shl.b32 %r97, %r1, 5; 2026-02-21T08:25:57.7423202Z and.b32 %r98, %r97, 3168; 2026-02-21T08:25:57.7423344Z and.b32 %r99, %r76, 384; 2026-02-21T08:25:57.7423492Z and.b32 %r100, %r72, 16; 2026-02-21T08:25:57.7423634Z or.b32 %r101, %r98, %r99; 2026-02-21T08:25:57.7423790Z xor.b32 %r102, %r101, %r89; 2026-02-21T08:25:57.7423950Z add.s32 %r103, %r55, %r100; 2026-02-21T08:25:57.7424104Z add.s32 %r358, %r103, %r102; 2026-02-21T08:25:57.7424265Z add.s32 %r363, %r358, 512; 2026-02-21T08:25:57.7424523Z .loc 1 30 74 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:30:74 2026-02-21T08:25:57.7424842Z shl.b32 %r104, %r8, 10; 2026-02-21T08:25:57.7425158Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7425458Z or.b32 %r105, %r104, %r17; 2026-02-21T08:25:57.7425610Z or.b32 %r26, %r105, 64; 2026-02-21T08:25:57.7425774Z bra.uni $L__BB0_2; 2026-02-21T08:25:57.7425975Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:25:57.7426302Z .loc 1 0 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:0:90 2026-02-21T08:25:57.7426599Z mov.b32 %r279, 1; 2026-02-21T08:25:57.7426846Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7427166Z // begin inline asm 2026-02-21T08:25:57.7427298Z 2026-02-21T08:25:57.7427416Z { 2026-02-21T08:25:57.7427535Z .reg .pred complete; 2026-02-21T08:25:57.7427682Z waitLoop: 2026-02-21T08:25:57.7427919Z mbarrier.try_wait.parity.shared.b64 complete, [%r278], %r279; 2026-02-21T08:25:57.7428154Z @!complete bra.uni waitLoop; 2026-02-21T08:25:57.7428318Z } 2026-02-21T08:25:57.7428384Z 2026-02-21T08:25:57.7428443Z // end inline asm 2026-02-21T08:25:57.7428705Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7428997Z cp.async.wait_group 0; 2026-02-21T08:25:57.7429157Z bar.sync 0; 
2026-02-21T08:25:57.7429288Z // begin inline asm 2026-02-21T08:25:57.7429465Z @%p4 mbarrier.inval.shared::cta.b64 [%r176]; 2026-02-21T08:25:57.7429659Z // end inline asm 2026-02-21T08:25:57.7429795Z bar.sync 0; 2026-02-21T08:25:57.7429935Z // begin inline asm 2026-02-21T08:25:57.7430096Z @%p4 mbarrier.inval.shared::cta.b64 [%r177]; 2026-02-21T08:25:57.7430283Z // end inline asm 2026-02-21T08:25:57.7430414Z bar.sync 0; 2026-02-21T08:25:57.7430549Z // begin inline asm 2026-02-21T08:25:57.7430707Z @%p4 mbarrier.inval.shared::cta.b64 [%r178]; 2026-02-21T08:25:57.7430897Z // end inline asm 2026-02-21T08:25:57.7431037Z bar.sync 0; 2026-02-21T08:25:57.7431164Z // begin inline asm 2026-02-21T08:25:57.7431331Z @%p4 mbarrier.inval.shared::cta.b64 [%r226]; 2026-02-21T08:25:57.7431511Z // end inline asm 2026-02-21T08:25:57.7431656Z add.s32 %r284, %r55, 24608; 2026-02-21T08:25:57.7431809Z // begin inline asm 2026-02-21T08:25:57.7431974Z @%p4 mbarrier.inval.shared::cta.b64 [%r284]; 2026-02-21T08:25:57.7432151Z // end inline asm 2026-02-21T08:25:57.7432291Z bar.sync 0; 2026-02-21T08:25:57.7432417Z // begin inline asm 2026-02-21T08:25:57.7432585Z @%p4 mbarrier.inval.shared::cta.b64 [%r175]; 2026-02-21T08:25:57.7432770Z // end inline asm 2026-02-21T08:25:57.7433022Z .loc 1 59 45 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:59:45 2026-02-21T08:25:57.7433324Z shl.b32 %r427, %r31, 13; 2026-02-21T08:25:57.7433476Z shl.b32 %r428, %r32, 13; 2026-02-21T08:25:57.7433632Z shl.b32 %r429, %r33, 13; 2026-02-21T08:25:57.7433778Z shl.b32 %r430, %r34, 13; 2026-02-21T08:25:57.7433933Z shl.b32 %r431, %r35, 13; 2026-02-21T08:25:57.7434080Z shl.b32 %r432, %r36, 13; 2026-02-21T08:25:57.7434235Z shl.b32 %r433, %r37, 13; 2026-02-21T08:25:57.7434413Z shl.b32 %r434, %r38, 13; 2026-02-21T08:25:57.7434665Z .loc 1 59 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:59:52 2026-02-21T08:25:57.7434983Z add.s32 %r435, %r427, %r30; 2026-02-21T08:25:57.7435135Z add.s32 %r436, %r428, 
%r30; 2026-02-21T08:25:57.7435290Z add.s32 %r437, %r429, %r30; 2026-02-21T08:25:57.7435436Z add.s32 %r438, %r430, %r30; 2026-02-21T08:25:57.7435590Z add.s32 %r439, %r431, %r30; 2026-02-21T08:25:57.7435739Z add.s32 %r440, %r432, %r30; 2026-02-21T08:25:57.7435888Z add.s32 %r441, %r433, %r30; 2026-02-21T08:25:57.7436043Z add.s32 %r442, %r434, %r30; 2026-02-21T08:25:57.7436297Z .loc 1 59 24 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:59:24 2026-02-21T08:25:57.7436591Z mad.wide.s32 %rd53, %r435, 2, %rd11; 2026-02-21T08:25:57.7436770Z mad.wide.s32 %rd54, %r436, 2, %rd11; 2026-02-21T08:25:57.7436977Z mad.wide.s32 %rd55, %r437, 2, %rd11; 2026-02-21T08:25:57.7437145Z mad.wide.s32 %rd56, %r438, 2, %rd11; 2026-02-21T08:25:57.7437316Z mad.wide.s32 %rd57, %r439, 2, %rd11; 2026-02-21T08:25:57.7437476Z mad.wide.s32 %rd58, %r440, 2, %rd11; 2026-02-21T08:25:57.7437647Z mad.wide.s32 %rd59, %r441, 2, %rd11; 2026-02-21T08:25:57.7437815Z mad.wide.s32 %rd60, %r442, 2, %rd11; 2026-02-21T08:25:57.7438078Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7438366Z // begin inline asm 2026-02-21T08:25:57.7438761Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301}, [%r353 + 0], 64; 2026-02-21T08:25:57.7439156Z // end inline asm 2026-02-21T08:25:57.7439287Z // begin inline asm 2026-02-21T08:25:57.7439688Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r303, %r304, %r305, %r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318}, [%r353 + 16], 64; 2026-02-21T08:25:57.7440087Z // end inline asm 2026-02-21T08:25:57.7440216Z // begin inline asm 2026-02-21T08:25:57.7440569Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335}, [%r353 + 32], 64; 2026-02-21T08:25:57.7440943Z // end inline asm 
2026-02-21T08:25:57.7441081Z // begin inline asm 2026-02-21T08:25:57.7441422Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352}, [%r353 + 48], 64; 2026-02-21T08:25:57.7441805Z // end inline asm 2026-02-21T08:25:57.7441941Z // begin inline asm 2026-02-21T08:25:57.7442087Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:25:57.7442253Z // end inline asm 2026-02-21T08:25:57.7442385Z cvt.u64.u32 %rd61, %r286; 2026-02-21T08:25:57.7442543Z cvt.u64.u32 %rd62, %r287; 2026-02-21T08:25:57.7442689Z shl.b64 %rd63, %rd62, 32; 2026-02-21T08:25:57.7442843Z or.b64 %rd64, %rd61, %rd63; 2026-02-21T08:25:57.7443113Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7443396Z mov.b64 {%r443, %r444}, %rd64; 2026-02-21T08:25:57.7443569Z cvt.rn.f16x2.f32 %r445, %r444, %r443; 2026-02-21T08:25:57.7443847Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7444130Z cvt.u64.u32 %rd65, %r288; 2026-02-21T08:25:57.7444277Z cvt.u64.u32 %rd66, %r289; 2026-02-21T08:25:57.7444428Z shl.b64 %rd67, %rd66, 32; 2026-02-21T08:25:57.7444573Z or.b64 %rd68, %rd65, %rd67; 2026-02-21T08:25:57.7444909Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7445205Z mov.b64 {%r446, %r447}, %rd68; 2026-02-21T08:25:57.7445375Z cvt.rn.f16x2.f32 %r448, %r447, %r446; 2026-02-21T08:25:57.7445668Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7445987Z cvt.u64.u32 %rd69, %r290; 2026-02-21T08:25:57.7446150Z cvt.u64.u32 %rd70, %r291; 2026-02-21T08:25:57.7446300Z shl.b64 %rd71, %rd70, 32; 2026-02-21T08:25:57.7446465Z or.b64 %rd72, %rd69, %rd71; 2026-02-21T08:25:57.7446733Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7447028Z mov.b64 {%r449, %r450}, %rd72; 2026-02-21T08:25:57.7447206Z 
cvt.rn.f16x2.f32 %r451, %r450, %r449; 2026-02-21T08:25:57.7447490Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7447774Z cvt.u64.u32 %rd73, %r292; 2026-02-21T08:25:57.7447919Z cvt.u64.u32 %rd74, %r293; 2026-02-21T08:25:57.7448071Z shl.b64 %rd75, %rd74, 32; 2026-02-21T08:25:57.7448216Z or.b64 %rd76, %rd73, %rd75; 2026-02-21T08:25:57.7448513Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7448802Z mov.b64 {%r452, %r453}, %rd76; 2026-02-21T08:25:57.7448964Z cvt.rn.f16x2.f32 %r454, %r453, %r452; 2026-02-21T08:25:57.7449241Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7449524Z cvt.u64.u32 %rd77, %r294; 2026-02-21T08:25:57.7449676Z cvt.u64.u32 %rd78, %r295; 2026-02-21T08:25:57.7449819Z shl.b64 %rd79, %rd78, 32; 2026-02-21T08:25:57.7449971Z or.b64 %rd80, %rd77, %rd79; 2026-02-21T08:25:57.7450229Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7450537Z mov.b64 {%r455, %r456}, %rd80; 2026-02-21T08:25:57.7450703Z cvt.rn.f16x2.f32 %r457, %r456, %r455; 2026-02-21T08:25:57.7450979Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7451288Z cvt.u64.u32 %rd81, %r296; 2026-02-21T08:25:57.7451431Z cvt.u64.u32 %rd82, %r297; 2026-02-21T08:25:57.7451581Z shl.b64 %rd83, %rd82, 32; 2026-02-21T08:25:57.7451723Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T08:25:57.7451980Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7452259Z mov.b64 {%r458, %r459}, %rd84; 2026-02-21T08:25:57.7452418Z cvt.rn.f16x2.f32 %r460, %r459, %r458; 2026-02-21T08:25:57.7452698Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7452987Z cvt.u64.u32 %rd85, %r298; 2026-02-21T08:25:57.7453144Z cvt.u64.u32 %rd86, %r299; 2026-02-21T08:25:57.7453292Z shl.b64 %rd87, %rd86, 32; 
2026-02-21T08:25:57.7453451Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T08:25:57.7453714Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7454010Z mov.b64 {%r461, %r462}, %rd88; 2026-02-21T08:25:57.7454181Z cvt.rn.f16x2.f32 %r463, %r462, %r461; 2026-02-21T08:25:57.7454460Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7454802Z cvt.u64.u32 %rd89, %r300; 2026-02-21T08:25:57.7454953Z cvt.u64.u32 %rd90, %r301; 2026-02-21T08:25:57.7455111Z shl.b64 %rd91, %rd90, 32; 2026-02-21T08:25:57.7455261Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T08:25:57.7455536Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7455836Z mov.b64 {%r464, %r465}, %rd92; 2026-02-21T08:25:57.7456004Z cvt.rn.f16x2.f32 %r466, %r465, %r464; 2026-02-21T08:25:57.7456301Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7456601Z cvt.u64.u32 %rd93, %r303; 2026-02-21T08:25:57.7456763Z cvt.u64.u32 %rd94, %r304; 2026-02-21T08:25:57.7456916Z shl.b64 %rd95, %rd94, 32; 2026-02-21T08:25:57.7457078Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T08:25:57.7457351Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7457684Z mov.b64 {%r467, %r468}, %rd96; 2026-02-21T08:25:57.7457854Z cvt.rn.f16x2.f32 %r469, %r468, %r467; 2026-02-21T08:25:57.7458136Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7458436Z cvt.u64.u32 %rd97, %r305; 2026-02-21T08:25:57.7458588Z cvt.u64.u32 %rd98, %r306; 2026-02-21T08:25:57.7458744Z shl.b64 %rd99, %rd98, 32; 2026-02-21T08:25:57.7458898Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T08:25:57.7459183Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7459480Z mov.b64 {%r470, %r471}, %rd100; 2026-02-21T08:25:57.7459651Z cvt.rn.f16x2.f32 %r472, %r471, %r470; 
2026-02-21T08:25:57.7459939Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7460228Z cvt.u64.u32 %rd101, %r307; 2026-02-21T08:25:57.7460417Z cvt.u64.u32 %rd102, %r308; 2026-02-21T08:25:57.7460576Z shl.b64 %rd103, %rd102, 32; 2026-02-21T08:25:57.7460740Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:25:57.7461018Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7461298Z mov.b64 {%r473, %r474}, %rd104; 2026-02-21T08:25:57.7461466Z cvt.rn.f16x2.f32 %r475, %r474, %r473; 2026-02-21T08:25:57.7461731Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7462042Z cvt.u64.u32 %rd105, %r309; 2026-02-21T08:25:57.7462190Z cvt.u64.u32 %rd106, %r310; 2026-02-21T08:25:57.7462343Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:25:57.7462491Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:25:57.7462783Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7463074Z mov.b64 {%r476, %r477}, %rd108; 2026-02-21T08:25:57.7463242Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T08:25:57.7463525Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7463804Z cvt.u64.u32 %rd109, %r311; 2026-02-21T08:25:57.7463963Z cvt.u64.u32 %rd110, %r312; 2026-02-21T08:25:57.7464113Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:25:57.7464274Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:25:57.7464539Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7464867Z mov.b64 {%r479, %r480}, %rd112; 2026-02-21T08:25:57.7465034Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T08:25:57.7465309Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7465595Z cvt.u64.u32 %rd113, %r313; 2026-02-21T08:25:57.7465741Z cvt.u64.u32 %rd114, %r314; 2026-02-21T08:25:57.7465899Z shl.b64 %rd115, %rd114, 32; 
2026-02-21T08:25:57.7466049Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:25:57.7466329Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7466629Z mov.b64 {%r482, %r483}, %rd116; 2026-02-21T08:25:57.7466790Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T08:25:57.7467070Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7467351Z cvt.u64.u32 %rd117, %r315; 2026-02-21T08:25:57.7467511Z cvt.u64.u32 %rd118, %r316; 2026-02-21T08:25:57.7467662Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:25:57.7467823Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:25:57.7468100Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7468388Z mov.b64 {%r485, %r486}, %rd120; 2026-02-21T08:25:57.7468554Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T08:25:57.7468832Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7469183Z cvt.u64.u32 %rd121, %r317; 2026-02-21T08:25:57.7469335Z cvt.u64.u32 %rd122, %r318; 2026-02-21T08:25:57.7469489Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:25:57.7469642Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:25:57.7469937Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7470240Z mov.b64 {%r488, %r489}, %rd124; 2026-02-21T08:25:57.7470403Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T08:25:57.7470688Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7470989Z cvt.u64.u32 %rd125, %r320; 2026-02-21T08:25:57.7471149Z cvt.u64.u32 %rd126, %r321; 2026-02-21T08:25:57.7471299Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:25:57.7471462Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:25:57.7471747Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7472067Z mov.b64 {%r491, %r492}, %rd128; 2026-02-21T08:25:57.7472246Z cvt.rn.f16x2.f32 %r493, 
%r492, %r491; 2026-02-21T08:25:57.7472547Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7472852Z cvt.u64.u32 %rd129, %r322; 2026-02-21T08:25:57.7473008Z cvt.u64.u32 %rd130, %r323; 2026-02-21T08:25:57.7473168Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:25:57.7473325Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:25:57.7473622Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7473966Z mov.b64 {%r494, %r495}, %rd132; 2026-02-21T08:25:57.7474129Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T08:25:57.7474409Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7474762Z cvt.u64.u32 %rd133, %r324; 2026-02-21T08:25:57.7474923Z cvt.u64.u32 %rd134, %r325; 2026-02-21T08:25:57.7475072Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:25:57.7475231Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:25:57.7475492Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7475781Z mov.b64 {%r497, %r498}, %rd136; 2026-02-21T08:25:57.7475949Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T08:25:57.7476219Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7476514Z cvt.u64.u32 %rd137, %r326; 2026-02-21T08:25:57.7476663Z cvt.u64.u32 %rd138, %r327; 2026-02-21T08:25:57.7476819Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:25:57.7476969Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:25:57.7477238Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7477540Z mov.b64 {%r500, %r501}, %rd140; 2026-02-21T08:25:57.7477703Z cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T08:25:57.7477989Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7478262Z cvt.u64.u32 %rd141, %r328; 2026-02-21T08:25:57.7478415Z cvt.u64.u32 %rd142, %r329; 2026-02-21T08:25:57.7478561Z shl.b64 %rd143, %rd142, 
32; 2026-02-21T08:25:57.7478720Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:25:57.7478980Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7479265Z mov.b64 {%r503, %r504}, %rd144; 2026-02-21T08:25:57.7479432Z cvt.rn.f16x2.f32 %r505, %r504, %r503; 2026-02-21T08:25:57.7479698Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7479982Z cvt.u64.u32 %rd145, %r330; 2026-02-21T08:25:57.7480128Z cvt.u64.u32 %rd146, %r331; 2026-02-21T08:25:57.7480279Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:25:57.7480428Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:25:57.7480691Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7481001Z mov.b64 {%r506, %r507}, %rd148; 2026-02-21T08:25:57.7481162Z cvt.rn.f16x2.f32 %r508, %r507, %r506; 2026-02-21T08:25:57.7481436Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7481717Z cvt.u64.u32 %rd149, %r332; 2026-02-21T08:25:57.7481874Z cvt.u64.u32 %rd150, %r333; 2026-02-21T08:25:57.7482020Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:25:57.7482180Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:25:57.7482436Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7482725Z mov.b64 {%r509, %r510}, %rd152; 2026-02-21T08:25:57.7482893Z cvt.rn.f16x2.f32 %r511, %r510, %r509; 2026-02-21T08:25:57.7483167Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7483472Z cvt.u64.u32 %rd153, %r334; 2026-02-21T08:25:57.7483623Z cvt.u64.u32 %rd154, %r335; 2026-02-21T08:25:57.7483775Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:25:57.7483924Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:25:57.7484184Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7484464Z mov.b64 {%r512, %r513}, %rd156; 2026-02-21T08:25:57.7484620Z cvt.rn.f16x2.f32 %r514, 
%r513, %r512; 2026-02-21T08:25:57.7484935Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7485240Z cvt.u64.u32 %rd157, %r337; 2026-02-21T08:25:57.7485392Z cvt.u64.u32 %rd158, %r338; 2026-02-21T08:25:57.7485538Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:25:57.7485695Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:25:57.7485975Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7486267Z mov.b64 {%r515, %r516}, %rd160; 2026-02-21T08:25:57.7486433Z cvt.rn.f16x2.f32 %r517, %r516, %r515; 2026-02-21T08:25:57.7486701Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7486983Z cvt.u64.u32 %rd161, %r339; 2026-02-21T08:25:57.7487131Z cvt.u64.u32 %rd162, %r340; 2026-02-21T08:25:57.7487285Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:25:57.7487433Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:25:57.7487703Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7487990Z mov.b64 {%r518, %r519}, %rd164; 2026-02-21T08:25:57.7488151Z cvt.rn.f16x2.f32 %r520, %r519, %r518; 2026-02-21T08:25:57.7488428Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7488706Z cvt.u64.u32 %rd165, %r341; 2026-02-21T08:25:57.7488862Z cvt.u64.u32 %rd166, %r342; 2026-02-21T08:25:57.7489008Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:25:57.7489166Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:25:57.7489432Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7489722Z mov.b64 {%r521, %r522}, %rd168; 2026-02-21T08:25:57.7489892Z cvt.rn.f16x2.f32 %r523, %r522, %r521; 2026-02-21T08:25:57.7490171Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7490455Z cvt.u64.u32 %rd169, %r343; 2026-02-21T08:25:57.7490601Z cvt.u64.u32 %rd170, %r344; 2026-02-21T08:25:57.7490756Z shl.b64 %rd171, %rd170, 
32; 2026-02-21T08:25:57.7490904Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:25:57.7491170Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7491455Z mov.b64 {%r524, %r525}, %rd172; 2026-02-21T08:25:57.7491616Z cvt.rn.f16x2.f32 %r526, %r525, %r524; 2026-02-21T08:25:57.7491901Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7492211Z cvt.u64.u32 %rd173, %r345; 2026-02-21T08:25:57.7492364Z cvt.u64.u32 %rd174, %r346; 2026-02-21T08:25:57.7492509Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:25:57.7492668Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:25:57.7492923Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7493205Z mov.b64 {%r527, %r528}, %rd176; 2026-02-21T08:25:57.7493372Z cvt.rn.f16x2.f32 %r529, %r528, %r527; 2026-02-21T08:25:57.7493644Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7493933Z cvt.u64.u32 %rd177, %r347; 2026-02-21T08:25:57.7494080Z cvt.u64.u32 %rd178, %r348; 2026-02-21T08:25:57.7494233Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:25:57.7494383Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:25:57.7494729Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7495039Z mov.b64 {%r530, %r531}, %rd180; 2026-02-21T08:25:57.7495211Z cvt.rn.f16x2.f32 %r532, %r531, %r530; 2026-02-21T08:25:57.7495504Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7495796Z cvt.u64.u32 %rd181, %r349; 2026-02-21T08:25:57.7495961Z cvt.u64.u32 %rd182, %r350; 2026-02-21T08:25:57.7496119Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:25:57.7496287Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:25:57.7496585Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7496888Z mov.b64 {%r533, %r534}, %rd184; 2026-02-21T08:25:57.7497061Z cvt.rn.f16x2.f32 %r535, 
%r534, %r533; 2026-02-21T08:25:57.7497419Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7497715Z cvt.u64.u32 %rd185, %r351; 2026-02-21T08:25:57.7497868Z cvt.u64.u32 %rd186, %r352; 2026-02-21T08:25:57.7498028Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:25:57.7498183Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:25:57.7498456Z .loc 1 58 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:58:27 2026-02-21T08:25:57.7498758Z mov.b64 {%r536, %r537}, %rd188; 2026-02-21T08:25:57.7498924Z cvt.rn.f16x2.f32 %r538, %r537, %r536; 2026-02-21T08:25:57.7499210Z .loc 1 59 82 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:59:82 2026-02-21T08:25:57.7499540Z st.shared.v4.b32 [%r22], {%r445, %r457, %r469, %r481}; 2026-02-21T08:25:57.7499786Z st.shared.v4.b32 [%r23], {%r493, %r505, %r517, %r529}; 2026-02-21T08:25:57.7499986Z bar.sync 0; 2026-02-21T08:25:57.7500131Z // begin inline asm 2026-02-21T08:25:57.7500375Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r354, %r355, %r356, %r357}, [%r358]; 2026-02-21T08:25:57.7500657Z // end inline asm 2026-02-21T08:25:57.7500803Z // begin inline asm 2026-02-21T08:25:57.7501042Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r359, %r360, %r361, %r362}, [%r363]; 2026-02-21T08:25:57.7501319Z // end inline asm 2026-02-21T08:25:57.7501459Z bar.sync 0; 2026-02-21T08:25:57.7501632Z st.shared.v4.b32 [%r22], {%r448, %r460, %r472, %r484}; 2026-02-21T08:25:57.7501860Z st.shared.v4.b32 [%r23], {%r496, %r508, %r520, %r532}; 2026-02-21T08:25:57.7502059Z bar.sync 0; 2026-02-21T08:25:57.7502189Z // begin inline asm 2026-02-21T08:25:57.7502425Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r364, %r365, %r366, %r367}, [%r358]; 2026-02-21T08:25:57.7502692Z // end inline asm 2026-02-21T08:25:57.7502835Z // begin inline asm 2026-02-21T08:25:57.7503059Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r369, %r370, %r371, %r372}, [%r363]; 2026-02-21T08:25:57.7503302Z // end inline asm 2026-02-21T08:25:57.7503435Z 
bar.sync 0; 2026-02-21T08:25:57.7503590Z st.shared.v4.b32 [%r22], {%r451, %r463, %r475, %r487}; 2026-02-21T08:25:57.7503814Z st.shared.v4.b32 [%r23], {%r499, %r511, %r523, %r535}; 2026-02-21T08:25:57.7504027Z bar.sync 0; 2026-02-21T08:25:57.7504160Z // begin inline asm 2026-02-21T08:25:57.7504384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r374, %r375, %r376, %r377}, [%r358]; 2026-02-21T08:25:57.7504638Z // end inline asm 2026-02-21T08:25:57.7504820Z // begin inline asm 2026-02-21T08:25:57.7505036Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r379, %r380, %r381, %r382}, [%r363]; 2026-02-21T08:25:57.7505289Z // end inline asm 2026-02-21T08:25:57.7505418Z bar.sync 0; 2026-02-21T08:25:57.7505584Z st.shared.v4.b32 [%r22], {%r454, %r466, %r478, %r490}; 2026-02-21T08:25:57.7505802Z st.shared.v4.b32 [%r23], {%r502, %r514, %r526, %r538}; 2026-02-21T08:25:57.7505993Z bar.sync 0; 2026-02-21T08:25:57.7506124Z // begin inline asm 2026-02-21T08:25:57.7506341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r384, %r385, %r386, %r387}, [%r358]; 2026-02-21T08:25:57.7506599Z // end inline asm 2026-02-21T08:25:57.7506728Z // begin inline asm 2026-02-21T08:25:57.7506982Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r389, %r390, %r391, %r392}, [%r363]; 2026-02-21T08:25:57.7507226Z // end inline asm 2026-02-21T08:25:57.7507362Z // begin inline asm 2026-02-21T08:25:57.7507537Z st.global.v4.b32 [ %rd53 + 0 ], { %r354, %r364, %r374, %r384 }; 2026-02-21T08:25:57.7507746Z // end inline asm 2026-02-21T08:25:57.7507882Z // begin inline asm 2026-02-21T08:25:57.7508053Z st.global.v4.b32 [ %rd54 + 0 ], { %r355, %r365, %r375, %r385 }; 2026-02-21T08:25:57.7508259Z // end inline asm 2026-02-21T08:25:57.7508385Z // begin inline asm 2026-02-21T08:25:57.7508591Z st.global.v4.b32 [ %rd55 + 0 ], { %r356, %r366, %r376, %r386 }; 2026-02-21T08:25:57.7508787Z // end inline asm 2026-02-21T08:25:57.7508920Z // begin inline asm 2026-02-21T08:25:57.7509088Z st.global.v4.b32 [ %rd56 + 0 ], { %r357, %r367, %r377, %r387 }; 
2026-02-21T08:25:57.7509294Z // end inline asm 2026-02-21T08:25:57.7509464Z // begin inline asm 2026-02-21T08:25:57.7509644Z st.global.v4.b32 [ %rd57 + 0 ], { %r359, %r369, %r379, %r389 }; 2026-02-21T08:25:57.7509848Z // end inline asm 2026-02-21T08:25:57.7509975Z // begin inline asm 2026-02-21T08:25:57.7510154Z st.global.v4.b32 [ %rd58 + 0 ], { %r360, %r370, %r380, %r390 }; 2026-02-21T08:25:57.7510348Z // end inline asm 2026-02-21T08:25:57.7510487Z // begin inline asm 2026-02-21T08:25:57.7510661Z st.global.v4.b32 [ %rd59 + 0 ], { %r361, %r371, %r381, %r391 }; 2026-02-21T08:25:57.7510880Z // end inline asm 2026-02-21T08:25:57.7511008Z // begin inline asm 2026-02-21T08:25:57.7511183Z st.global.v4.b32 [ %rd60 + 0 ], { %r362, %r372, %r382, %r392 }; 2026-02-21T08:25:57.7511386Z // end inline asm 2026-02-21T08:25:57.7511634Z .loc 1 30 74 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:30:74 2026-02-21T08:25:57.7511926Z add.s32 %r540, %r540, 1; 2026-02-21T08:25:57.7512080Z setp.ne.b32 %p71, %r540, %r4; 2026-02-21T08:25:57.7512248Z @%p71 bra $L__BB0_2; 2026-02-21T08:25:57.7512387Z bra.uni $L__BB0_9; 2026-02-21T08:25:57.7512571Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:25:57.7512804Z // Child Loop BB0_5 Depth 2 2026-02-21T08:25:57.7513110Z .loc 1 36 35 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:36:35 2026-02-21T08:25:57.7513397Z shr.s32 %r203, %r540, 31; 2026-02-21T08:25:57.7513545Z shr.u32 %r204, %r203, 22; 2026-02-21T08:25:57.7513702Z add.s32 %r205, %r540, %r204; 2026-02-21T08:25:57.7513964Z .loc 1 39 45 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:39:45 2026-02-21T08:25:57.7514257Z and.b32 %r206, %r205, 64512; 2026-02-21T08:25:57.7514409Z sub.s32 %r207, %r540, %r206; 2026-02-21T08:25:57.7514704Z .loc 1 39 64 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:39:64 2026-02-21T08:25:57.7514996Z cvt.u16.u32 %rs1, %r207; 2026-02-21T08:25:57.7515258Z .loc 1 40 51 // 
cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:40:51 2026-02-21T08:25:57.7515580Z shr.s16 %rs2, %rs1, 15; 2026-02-21T08:25:57.7515726Z shr.u16 %rs3, %rs2, 10; 2026-02-21T08:25:57.7515876Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T08:25:57.7516023Z shr.s16 %rs5, %rs4, 6; 2026-02-21T08:25:57.7516275Z .loc 1 39 64 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:39:64 2026-02-21T08:25:57.7516558Z and.b16 %rs6, %rs4, -64; 2026-02-21T08:25:57.7516706Z sub.s16 %rs7, %rs1, %rs6; 2026-02-21T08:25:57.7516961Z .loc 1 41 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:41:27 2026-02-21T08:25:57.7517243Z shl.b32 %r208, %r205, 3; 2026-02-21T08:25:57.7517401Z and.b32 %r209, %r208, -8192; 2026-02-21T08:25:57.7517559Z mad.wide.s16 %r229, %rs7, 128, %r209; 2026-02-21T08:25:57.7517839Z .loc 1 43 27 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:43:27 2026-02-21T08:25:57.7518125Z mul.wide.s16 %r210, %rs5, 64; 2026-02-21T08:25:57.7518416Z .loc 1 44 32 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:44:32 2026-02-21T08:25:57.7518710Z or.b32 %r211, %r210, %r8; 2026-02-21T08:25:57.7518959Z .loc 1 54 53 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:53 2026-02-21T08:25:57.7519240Z shl.b32 %r212, %r211, 10; 2026-02-21T08:25:57.7519486Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7519786Z shfl.sync.idx.b32 %r39, %r5, 0, 31, -1; 2026-02-21T08:25:57.7519989Z shl.b32 %r213, %r39, 21; 2026-02-21T08:25:57.7520144Z and.b32 %r214, %r213, 6291456; 2026-02-21T08:25:57.7520307Z add.s32 %r353, %r214, %r539; 2026-02-21T08:25:57.7520457Z mov.pred %p22, -1; 2026-02-21T08:25:57.7520603Z mov.b32 %r541, 0; 2026-02-21T08:25:57.7520736Z // begin inline asm 2026-02-21T08:25:57.7521159Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r353 + 0], 64, {%r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541}; 2026-02-21T08:25:57.7521558Z // 
end inline asm 2026-02-21T08:25:57.7521700Z // begin inline asm 2026-02-21T08:25:57.7522082Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r353 + 16], 64, {%r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541}; 2026-02-21T08:25:57.7522472Z // end inline asm 2026-02-21T08:25:57.7522609Z // begin inline asm 2026-02-21T08:25:57.7522973Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r353 + 32], 64, {%r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541}; 2026-02-21T08:25:57.7523371Z // end inline asm 2026-02-21T08:25:57.7523500Z // begin inline asm 2026-02-21T08:25:57.7523859Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r353 + 48], 64, {%r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541, %r541}; 2026-02-21T08:25:57.7524264Z // end inline asm 2026-02-21T08:25:57.7524393Z // begin inline asm 2026-02-21T08:25:57.7524551Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:25:57.7524733Z // end inline asm 2026-02-21T08:25:57.7524870Z bar.sync 0; 2026-02-21T08:25:57.7525105Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7525390Z add.s32 %r542, %r55, 24608; 2026-02-21T08:25:57.7525539Z // begin inline asm 2026-02-21T08:25:57.7525706Z @%p4 mbarrier.init.shared::cta.b64 [%r542], 1; 2026-02-21T08:25:57.7525894Z // end inline asm 2026-02-21T08:25:57.7526020Z bar.sync 0; 2026-02-21T08:25:57.7526155Z add.s32 %r175, %r55, 24616; 2026-02-21T08:25:57.7526304Z // begin inline asm 2026-02-21T08:25:57.7526470Z @%p4 mbarrier.init.shared::cta.b64 [%r175], 1; 2026-02-21T08:25:57.7526651Z // end inline asm 2026-02-21T08:25:57.7526788Z add.s32 %r176, %r55, 24576; 2026-02-21T08:25:57.7526936Z // begin inline asm 2026-02-21T08:25:57.7527095Z @%p4 mbarrier.init.shared::cta.b64 [%r176], 1; 2026-02-21T08:25:57.7527310Z // end inline asm 2026-02-21T08:25:57.7527435Z bar.sync 0; 
2026-02-21T08:25:57.7527565Z add.s32 %r177, %r55, 24584; 2026-02-21T08:25:57.7527710Z // begin inline asm 2026-02-21T08:25:57.7527870Z @%p4 mbarrier.init.shared::cta.b64 [%r177], 1; 2026-02-21T08:25:57.7528046Z // end inline asm 2026-02-21T08:25:57.7528179Z bar.sync 0; 2026-02-21T08:25:57.7528303Z add.s32 %r178, %r55, 24592; 2026-02-21T08:25:57.7528456Z // begin inline asm 2026-02-21T08:25:57.7528609Z @%p4 mbarrier.init.shared::cta.b64 [%r178], 1; 2026-02-21T08:25:57.7528790Z // end inline asm 2026-02-21T08:25:57.7528920Z bar.sync 0; 2026-02-21T08:25:57.7529045Z add.s32 %r226, %r55, 24600; 2026-02-21T08:25:57.7529199Z // begin inline asm 2026-02-21T08:25:57.7529351Z @%p4 mbarrier.init.shared::cta.b64 [%r226], 1; 2026-02-21T08:25:57.7529533Z // end inline asm 2026-02-21T08:25:57.7529775Z .loc 1 54 60 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:60 2026-02-21T08:25:57.7530089Z or.b32 %r215, %r212, %r17; 2026-02-21T08:25:57.7530356Z .loc 1 54 32 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:32 2026-02-21T08:25:57.7530660Z mad.wide.s32 %rd34, %r215, 2, %rd10; 2026-02-21T08:25:57.7530840Z mov.b32 %r181, 16; 2026-02-21T08:25:57.7531095Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7531382Z // begin inline asm 2026-02-21T08:25:57.7531586Z cp.async.cg.shared.global [ %r180 + 0 ], [ %rd34 + 0 ], 0x10, %r181; 2026-02-21T08:25:57.7531842Z // end inline asm 2026-02-21T08:25:57.7531981Z cp.async.commit_group; 2026-02-21T08:25:57.7532253Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7532546Z bar.sync 0; 2026-02-21T08:25:57.7532669Z // begin inline asm 2026-02-21T08:25:57.7532887Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r176], 4096; 2026-02-21T08:25:57.7533102Z // end inline asm 2026-02-21T08:25:57.7533350Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7533627Z // begin inline asm 
2026-02-21T08:25:57.7533779Z fence.proxy.async.shared::cta; 2026-02-21T08:25:57.7533936Z // end inline asm 2026-02-21T08:25:57.7534070Z bar.sync 0; 2026-02-21T08:25:57.7534207Z elect.sync %r216|%p39, -1; 2026-02-21T08:25:57.7534366Z and.pred %p33, %p1, %p39; 2026-02-21T08:25:57.7534524Z // begin inline asm 2026-02-21T08:25:57.7534897Z @%p33 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r55], [%rd44, {%r541, %r229}], [%r176]; 2026-02-21T08:25:57.7535261Z // end inline asm 2026-02-21T08:25:57.7535505Z .loc 1 54 32 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:32 2026-02-21T08:25:57.7535792Z add.s64 %rd36, %rd34, 32; 2026-02-21T08:25:57.7536064Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7536343Z // begin inline asm 2026-02-21T08:25:57.7536543Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd36 + 0 ], 0x10, %r181; 2026-02-21T08:25:57.7536760Z // end inline asm 2026-02-21T08:25:57.7536902Z cp.async.commit_group; 2026-02-21T08:25:57.7537154Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7537440Z bar.sync 0; 2026-02-21T08:25:57.7537564Z // begin inline asm 2026-02-21T08:25:57.7537751Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r177], 4096; 2026-02-21T08:25:57.7537974Z // end inline asm 2026-02-21T08:25:57.7538212Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7538501Z bar.sync 0; 2026-02-21T08:25:57.7538631Z elect.sync %r217|%p40, -1; 2026-02-21T08:25:57.7538793Z and.pred %p35, %p1, %p40; 2026-02-21T08:25:57.7538944Z add.s32 %r190, %r55, 4096; 2026-02-21T08:25:57.7539097Z // begin inline asm 2026-02-21T08:25:57.7539453Z @%p35 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r190], [%rd44, {%r181, %r229}], [%r177]; 2026-02-21T08:25:57.7539814Z // end inline asm 2026-02-21T08:25:57.7540067Z .loc 1 54 32 // 
cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:32 2026-02-21T08:25:57.7540360Z add.s64 %rd38, %rd34, 64; 2026-02-21T08:25:57.7540630Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7540921Z // begin inline asm 2026-02-21T08:25:57.7541133Z cp.async.cg.shared.global [ %r194 + 0 ], [ %rd38 + 0 ], 0x10, %r181; 2026-02-21T08:25:57.7541358Z // end inline asm 2026-02-21T08:25:57.7541508Z cp.async.commit_group; 2026-02-21T08:25:57.7541776Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7542060Z bar.sync 0; 2026-02-21T08:25:57.7542199Z // begin inline asm 2026-02-21T08:25:57.7542412Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r178], 4096; 2026-02-21T08:25:57.7542643Z // end inline asm 2026-02-21T08:25:57.7542895Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7543190Z bar.sync 0; 2026-02-21T08:25:57.7543329Z elect.sync %r218|%p41, -1; 2026-02-21T08:25:57.7543505Z and.pred %p37, %p1, %p41; 2026-02-21T08:25:57.7543669Z add.s32 %r197, %r55, 8192; 2026-02-21T08:25:57.7543820Z mov.b32 %r198, 32; 2026-02-21T08:25:57.7543999Z // begin inline asm 2026-02-21T08:25:57.7544329Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r197], [%rd44, {%r198, %r229}], [%r178]; 2026-02-21T08:25:57.7544739Z // end inline asm 2026-02-21T08:25:57.7544991Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7545333Z cp.async.wait_group 2; 2026-02-21T08:25:57.7545495Z bar.sync 0; 2026-02-21T08:25:57.7545745Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7546051Z // begin inline asm 2026-02-21T08:25:57.7546184Z 2026-02-21T08:25:57.7546305Z { 2026-02-21T08:25:57.7546429Z .reg .pred complete; 2026-02-21T08:25:57.7546585Z waitLoop: 2026-02-21T08:25:57.7546779Z mbarrier.try_wait.parity.shared.b64 complete, 
[%r176], %r541; 2026-02-21T08:25:57.7547029Z @!complete bra.uni waitLoop; 2026-02-21T08:25:57.7547187Z } 2026-02-21T08:25:57.7547262Z 2026-02-21T08:25:57.7547319Z // end inline asm 2026-02-21T08:25:57.7547573Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7547859Z setp.ne.b32 %p42, %r39, 0; 2026-02-21T08:25:57.7548019Z @%p42 bra $L__BB0_4; 2026-02-21T08:25:57.7548204Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:25:57.7548426Z elect.sync %r221|%p44, -1; 2026-02-21T08:25:57.7548578Z mov.b32 %r220, 69206032; 2026-02-21T08:25:57.7548732Z mov.pred %p43, 0; 2026-02-21T08:25:57.7548866Z // begin inline asm 2026-02-21T08:25:57.7549090Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r539 + 0 ], %rd40, %rd41, %r220, %p43; 2026-02-21T08:25:57.7549348Z // end inline asm 2026-02-21T08:25:57.7549481Z add.s32 %r223, %r55, 24608; 2026-02-21T08:25:57.7549642Z cvt.u64.u32 %rd42, %r223; 2026-02-21T08:25:57.7549789Z // begin inline asm 2026-02-21T08:25:57.7549994Z @%p44 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd42]; 2026-02-21T08:25:57.7550220Z // end inline asm 2026-02-21T08:25:57.7550400Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:25:57.7550720Z .loc 1 0 0 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:0 2026-02-21T08:25:57.7550995Z cvt.s32.s16 %r28, %rs5; 2026-02-21T08:25:57.7551153Z or.b32 %r30, %r229, %r7; 2026-02-21T08:25:57.7551300Z or.b32 %r31, %r210, %r9; 2026-02-21T08:25:57.7551452Z or.b32 %r32, %r210, %r10; 2026-02-21T08:25:57.7551625Z or.b32 %r33, %r210, %r11; 2026-02-21T08:25:57.7551772Z or.b32 %r34, %r210, %r12; 2026-02-21T08:25:57.7551913Z or.b32 %r35, %r210, %r13; 2026-02-21T08:25:57.7552063Z or.b32 %r36, %r210, %r14; 2026-02-21T08:25:57.7552212Z or.b32 %r37, %r210, %r15; 2026-02-21T08:25:57.7552352Z or.b32 %r38, %r210, %r16; 2026-02-21T08:25:57.7552611Z .loc 1 54 32 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:32 2026-02-21T08:25:57.7552887Z 
add.s64 %rd43, %rd34, 96; 2026-02-21T08:25:57.7553148Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7553441Z bar.sync 0; 2026-02-21T08:25:57.7553577Z // begin inline asm 2026-02-21T08:25:57.7553771Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd43 + 0 ], 0x10, %r181; 2026-02-21T08:25:57.7554001Z // end inline asm 2026-02-21T08:25:57.7554141Z cp.async.commit_group; 2026-02-21T08:25:57.7554424Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7554990Z // begin inline asm 2026-02-21T08:25:57.7555176Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r226], 4096; 2026-02-21T08:25:57.7555385Z // end inline asm 2026-02-21T08:25:57.7555619Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7555900Z bar.sync 0; 2026-02-21T08:25:57.7556032Z elect.sync %r236|%p49, -1; 2026-02-21T08:25:57.7556196Z and.pred %p47, %p1, %p49; 2026-02-21T08:25:57.7556405Z add.s32 %r227, %r55, 12288; 2026-02-21T08:25:57.7556552Z mov.b32 %r228, 48; 2026-02-21T08:25:57.7556695Z // begin inline asm 2026-02-21T08:25:57.7557016Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r227], [%rd44, {%r228, %r229}], [%r226]; 2026-02-21T08:25:57.7557404Z // end inline asm 2026-02-21T08:25:57.7557648Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7557935Z shl.b32 %r237, %r28, 16; 2026-02-21T08:25:57.7558089Z or.b32 %r238, %r26, %r237; 2026-02-21T08:25:57.7558243Z mad.wide.s32 %rd189, %r238, 2, %rd10; 2026-02-21T08:25:57.7558418Z mov.b32 %r546, 1; 2026-02-21T08:25:57.7558549Z mov.b32 %r545, 3; 2026-02-21T08:25:57.7558694Z mov.b64 %rd190, -16; 2026-02-21T08:25:57.7558832Z mov.b32 %r543, %r541; 2026-02-21T08:25:57.7558977Z mov.b32 %r544, %r541; 2026-02-21T08:25:57.7559112Z mov.b32 %r547, %r541; 2026-02-21T08:25:57.7559257Z bra.uni $L__BB0_5; 2026-02-21T08:25:57.7559430Z $L__BB0_7: // 
in Loop: Header=BB0_5 Depth=2 2026-02-21T08:25:57.7559748Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7560030Z add.s64 %rd8, %rd190, 16; 2026-02-21T08:25:57.7560185Z setp.lt.u64 %p57, %rd8, 960; 2026-02-21T08:25:57.7560455Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7560731Z // begin inline asm 2026-02-21T08:25:57.7560867Z 2026-02-21T08:25:57.7560973Z { 2026-02-21T08:25:57.7561096Z .reg .pred complete; 2026-02-21T08:25:57.7561234Z waitLoop: 2026-02-21T08:25:57.7561422Z mbarrier.try_wait.parity.shared.b64 complete, [%r542], %r541; 2026-02-21T08:25:57.7561655Z @!complete bra.uni waitLoop; 2026-02-21T08:25:57.7561801Z } 2026-02-21T08:25:57.7561863Z 2026-02-21T08:25:57.7561924Z // end inline asm 2026-02-21T08:25:57.7562057Z add.s32 %r268, %r546, 1; 2026-02-21T08:25:57.7562214Z setp.gt.s32 %p60, %r268, 1; 2026-02-21T08:25:57.7562369Z selp.b32 %r546, 0, %r268, %p60; 2026-02-21T08:25:57.7562539Z selp.b32 %r269, 1, 0, %p60; 2026-02-21T08:25:57.7562687Z xor.b32 %r52, %r547, %r269; 2026-02-21T08:25:57.7562951Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7563243Z add.s32 %r270, %r545, 1; 2026-02-21T08:25:57.7563397Z setp.gt.s32 %p61, %r270, 3; 2026-02-21T08:25:57.7563601Z selp.b32 %r545, 0, %r270, %p61; 2026-02-21T08:25:57.7563868Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7564154Z shl.b32 %r271, %r545, 11; 2026-02-21T08:25:57.7564295Z bar.sync 0; 2026-02-21T08:25:57.7564433Z add.s32 %r261, %r180, %r271; 2026-02-21T08:25:57.7564586Z selp.b32 %r262, 16, 0, %p57; 2026-02-21T08:25:57.7564779Z // begin inline asm 2026-02-21T08:25:57.7564984Z cp.async.cg.shared.global [ %r261 + 0 ], [ %rd189 + 0 ], 0x10, %r262; 2026-02-21T08:25:57.7565207Z // end inline asm 2026-02-21T08:25:57.7565353Z cp.async.commit_group; 2026-02-21T08:25:57.7565616Z .loc 1 49 90 // 
cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7565915Z shl.b32 %r272, %r545, 3; 2026-02-21T08:25:57.7566061Z add.s32 %r274, %r55, %r272; 2026-02-21T08:25:57.7566221Z add.s32 %r267, %r274, 24576; 2026-02-21T08:25:57.7566401Z and.pred %p55, %p4, %p57; 2026-02-21T08:25:57.7566561Z // begin inline asm 2026-02-21T08:25:57.7566752Z @%p55 mbarrier.arrive.expect_tx.shared.b64 _, [%r267], 4096; 2026-02-21T08:25:57.7566965Z // end inline asm 2026-02-21T08:25:57.7567213Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7567491Z shl.b32 %r275, %r545, 12; 2026-02-21T08:25:57.7567644Z add.s32 %r264, %r55, %r275; 2026-02-21T08:25:57.7567787Z bar.sync 0; 2026-02-21T08:25:57.7567927Z elect.sync %r276|%p62, -1; 2026-02-21T08:25:57.7568112Z and.pred %p63, %p57, %p62; 2026-02-21T08:25:57.7568274Z and.pred %p56, %p1, %p63; 2026-02-21T08:25:57.7568430Z cvt.u32.u64 %r277, %rd190; 2026-02-21T08:25:57.7568575Z add.s32 %r265, %r277, 80; 2026-02-21T08:25:57.7568724Z // begin inline asm 2026-02-21T08:25:57.7569064Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd44, {%r265, %r229}], [%r267]; 2026-02-21T08:25:57.7569421Z // end inline asm 2026-02-21T08:25:57.7569667Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7569963Z add.s64 %rd189, %rd189, 32; 2026-02-21T08:25:57.7570126Z setp.lt.u64 %p64, %rd8, 992; 2026-02-21T08:25:57.7570276Z mov.b64 %rd190, %rd8; 2026-02-21T08:25:57.7570422Z mov.b32 %r541, %r547; 2026-02-21T08:25:57.7570561Z mov.b32 %r542, %r278; 2026-02-21T08:25:57.7570705Z mov.b32 %r547, %r52; 2026-02-21T08:25:57.7570842Z @%p64 bra $L__BB0_5; 2026-02-21T08:25:57.7570987Z bra.uni $L__BB0_8; 2026-02-21T08:25:57.7571161Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:25:57.7571407Z // => This Inner Loop Header: Depth=2 2026-02-21T08:25:57.7571608Z add.s32 %r241, %r544, 1; 
2026-02-21T08:25:57.7571765Z setp.gt.s32 %p51, %r241, 3; 2026-02-21T08:25:57.7571932Z selp.b32 %r544, 0, %r241, %p51; 2026-02-21T08:25:57.7572095Z selp.b32 %r242, 1, 0, %p51; 2026-02-21T08:25:57.7572254Z xor.b32 %r543, %r543, %r242; 2026-02-21T08:25:57.7572519Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7572824Z cp.async.wait_group 2; 2026-02-21T08:25:57.7572970Z bar.sync 0; 2026-02-21T08:25:57.7573220Z .loc 1 49 90 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:49:90 2026-02-21T08:25:57.7573514Z shl.b32 %r243, %r544, 3; 2026-02-21T08:25:57.7573660Z add.s32 %r245, %r55, %r243; 2026-02-21T08:25:57.7573817Z add.s32 %r239, %r245, 24576; 2026-02-21T08:25:57.7573963Z // begin inline asm 2026-02-21T08:25:57.7574099Z 2026-02-21T08:25:57.7574205Z { 2026-02-21T08:25:57.7574326Z .reg .pred complete; 2026-02-21T08:25:57.7574465Z waitLoop: 2026-02-21T08:25:57.7574650Z mbarrier.try_wait.parity.shared.b64 complete, [%r239], %r543; 2026-02-21T08:25:57.7574912Z @!complete bra.uni waitLoop; 2026-02-21T08:25:57.7575067Z } 2026-02-21T08:25:57.7575132Z 2026-02-21T08:25:57.7575195Z // end inline asm 2026-02-21T08:25:57.7575363Z shl.b32 %r246, %r546, 3; 2026-02-21T08:25:57.7575513Z add.s32 %r247, %r55, %r246; 2026-02-21T08:25:57.7575663Z add.s32 %r278, %r247, 24608; 2026-02-21T08:25:57.7575926Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7576203Z @%p42 bra $L__BB0_7; 2026-02-21T08:25:57.7576387Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:25:57.7576702Z .loc 1 55 44 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:55:44 2026-02-21T08:25:57.7576986Z shl.b32 %r250, %r544, 12; 2026-02-21T08:25:57.7577141Z add.s32 %r252, %r55, %r250; 2026-02-21T08:25:57.7577392Z .loc 1 54 85 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:54:85 2026-02-21T08:25:57.7577673Z shl.b32 %r253, %r544, 11; 2026-02-21T08:25:57.7577821Z add.s32 %r254, %r55, %r253; 
2026-02-21T08:25:57.7578096Z add.s32 %r255, %r254, 16384; 2026-02-21T08:25:57.7578355Z .loc 1 56 52 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:56:52 2026-02-21T08:25:57.7578649Z elect.sync %r256|%p53, -1; 2026-02-21T08:25:57.7578812Z bfe.u32 %r257, %r255, 4, 14; 2026-02-21T08:25:57.7578961Z cvt.u64.u32 %rd49, %r257; 2026-02-21T08:25:57.7579132Z or.b64 %rd46, %rd49, -4611685949699522560; 2026-02-21T08:25:57.7579312Z bfe.u32 %r258, %r252, 4, 14; 2026-02-21T08:25:57.7579469Z cvt.u64.u32 %rd50, %r258; 2026-02-21T08:25:57.7579660Z or.b64 %rd47, %rd50, -4611685949691133952; 2026-02-21T08:25:57.7579842Z mov.b32 %r249, 69206032; 2026-02-21T08:25:57.7579988Z mov.pred %p52, -1; 2026-02-21T08:25:57.7580134Z // begin inline asm 2026-02-21T08:25:57.7580355Z @%p53 tcgen05.mma.cta_group::1.kind::f16 [ %r539 + 0 ], %rd46, %rd47, %r249, %p52; 2026-02-21T08:25:57.7580645Z // end inline asm 2026-02-21T08:25:57.7580786Z cvt.u64.u32 %rd48, %r278; 2026-02-21T08:25:57.7580931Z // begin inline asm 2026-02-21T08:25:57.7581140Z @%p53 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd48]; 2026-02-21T08:25:57.7581364Z // end inline asm 2026-02-21T08:25:57.7581503Z bra.uni $L__BB0_7; 2026-02-21T08:25:57.7581659Z $L__BB0_9: // %._crit_edge 2026-02-21T08:25:57.7581959Z .loc 1 30 4 // cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py:30:4 2026-02-21T08:25:57.7582241Z bar.sync 0; 2026-02-21T08:25:57.7582368Z // begin inline asm 2026-02-21T08:25:57.7582568Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r539, 128; 2026-02-21T08:25:57.7582794Z // end inline asm 2026-02-21T08:25:57.7582929Z ret; 2026-02-21T08:25:57.7583047Z $L__tmp0: 2026-02-21T08:25:57.7583172Z $L__func_end0: 2026-02-21T08:25:57.7583320Z // -- End function 2026-02-21T08:25:57.7583509Z } 2026-02-21T08:25:57.7583781Z .file 1 "/tmp/torchinductor_root/ag/cageupf6ppomguhll42jfqvqf7mavob4q6mo3hwqpzybbeyera3g.py" 2026-02-21T08:25:57.7584117Z .section .debug_abbrev 2026-02-21T08:25:57.7584268Z { 
2026-02-21T08:25:57.7584418Z .b8 1 // Abbreviation Code 2026-02-21T08:25:57.7584652Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:25:57.7584954Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:25:57.7585178Z .b8 37 // DW_AT_producer 2026-02-21T08:25:57.7585394Z .b8 8 // DW_FORM_string 2026-02-21T08:25:57.7585608Z .b8 19 // DW_AT_language 2026-02-21T08:25:57.7585826Z .b8 5 // DW_FORM_data2 2026-02-21T08:25:57.7586036Z .b8 3 // DW_AT_name 2026-02-21T08:25:57.7586247Z .b8 8 // DW_FORM_string 2026-02-21T08:25:57.7586455Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:25:57.7586674Z .b8 6 // DW_FORM_data4 2026-02-21T08:25:57.7586881Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:25:57.7587126Z .b8 8 // DW_FORM_string 2026-02-21T08:25:57.7587329Z .b8 0 // EOM(1) 2026-02-21T08:25:57.7587521Z .b8 0 // EOM(2) 2026-02-21T08:25:57.7587719Z .b8 0 // EOM(3) 2026-02-21T08:25:57.7587890Z } 2026-02-21T08:25:57.7588020Z .section .debug_info 2026-02-21T08:25:57.7588160Z { 2026-02-21T08:25:57.7588313Z .b32 104 // Length of Unit 2026-02-21T08:25:57.7588534Z .b8 2 // DWARF version number 2026-02-21T08:25:57.7588733Z .b8 0 2026-02-21T08:25:57.7588917Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:25:57.7589187Z .b8 8 // Address Size (in bytes) 2026-02-21T08:25:57.7589460Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:25:57.7589708Z .b8 116 // DW_AT_producer 2026-02-21T08:25:57.7589907Z .b8 114 2026-02-21T08:25:57.7590031Z .b8 105 2026-02-21T08:25:57.7590160Z .b8 116 2026-02-21T08:25:57.7590279Z .b8 111 2026-02-21T08:25:57.7590407Z .b8 110 2026-02-21T08:25:57.7590524Z .b8 0 2026-02-21T08:25:57.7590676Z .b8 2 // DW_AT_language 2026-02-21T08:25:57.7590868Z .b8 0 2026-02-21T08:25:57.7591012Z .b8 99 // DW_AT_name 2026-02-21T08:25:57.7591233Z .b8 97 2026-02-21T08:25:57.7591354Z .b8 103 2026-02-21T08:25:57.7591477Z .b8 101 2026-02-21T08:25:57.7591591Z .b8 117 2026-02-21T08:25:57.7591714Z .b8 112 2026-02-21T08:25:57.7591828Z .b8 102 2026-02-21T08:25:57.7591954Z .b8 54 2026-02-21T08:25:57.7592081Z .b8 112 2026-02-21T08:25:57.7592215Z .b8 112 2026-02-21T08:25:57.7592324Z .b8 111 2026-02-21T08:25:57.7592468Z .b8 109 2026-02-21T08:25:57.7592577Z .b8 103 2026-02-21T08:25:57.7592692Z .b8 117 2026-02-21T08:25:57.7592808Z .b8 104 2026-02-21T08:25:57.7592916Z .b8 108 2026-02-21T08:25:57.7593030Z .b8 108 2026-02-21T08:25:57.7593136Z .b8 52 2026-02-21T08:25:57.7593252Z .b8 50 2026-02-21T08:25:57.7593360Z .b8 106 2026-02-21T08:25:57.7593474Z .b8 102 2026-02-21T08:25:57.7593581Z .b8 113 2026-02-21T08:25:57.7593696Z .b8 118 2026-02-21T08:25:57.7593802Z .b8 113 2026-02-21T08:25:57.7593918Z .b8 102 2026-02-21T08:25:57.7594026Z .b8 55 2026-02-21T08:25:57.7594144Z .b8 109 2026-02-21T08:25:57.7594251Z .b8 97 2026-02-21T08:25:57.7594366Z .b8 118 2026-02-21T08:25:57.7594482Z .b8 111 2026-02-21T08:25:57.7594589Z .b8 98 2026-02-21T08:25:57.7594749Z .b8 52 2026-02-21T08:25:57.7594858Z .b8 113 2026-02-21T08:25:57.7594974Z .b8 54 2026-02-21T08:25:57.7595081Z .b8 109 2026-02-21T08:25:57.7595196Z .b8 111 2026-02-21T08:25:57.7595302Z .b8 51 2026-02-21T08:25:57.7595418Z .b8 104 2026-02-21T08:25:57.7595525Z .b8 119 2026-02-21T08:25:57.7595643Z .b8 113 
2026-02-21T08:25:57.7595750Z .b8 112 2026-02-21T08:25:57.7595812Z .b8 122 2026-02-21T08:25:57.7595863Z .b8 121 2026-02-21T08:25:57.7595913Z .b8 98 2026-02-21T08:25:57.7595962Z .b8 98 2026-02-21T08:25:57.7596018Z .b8 101 2026-02-21T08:25:57.7596067Z .b8 121 2026-02-21T08:25:57.7596115Z .b8 101 2026-02-21T08:25:57.7596168Z .b8 114 2026-02-21T08:25:57.7596216Z .b8 97 2026-02-21T08:25:57.7596264Z .b8 51 2026-02-21T08:25:57.7596313Z .b8 103 2026-02-21T08:25:57.7596368Z .b8 46 2026-02-21T08:25:57.7596414Z .b8 112 2026-02-21T08:25:57.7596463Z .b8 121 2026-02-21T08:25:57.7596517Z .b8 0 2026-02-21T08:25:57.7596609Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:25:57.7596683Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:25:57.7596733Z .b8 116 2026-02-21T08:25:57.7596790Z .b8 109 2026-02-21T08:25:57.7596839Z .b8 112 2026-02-21T08:25:57.7596889Z .b8 47 2026-02-21T08:25:57.7596946Z .b8 116 2026-02-21T08:25:57.7596994Z .b8 111 2026-02-21T08:25:57.7597044Z .b8 114 2026-02-21T08:25:57.7597093Z .b8 99 2026-02-21T08:25:57.7597150Z .b8 104 2026-02-21T08:25:57.7597203Z .b8 105 2026-02-21T08:25:57.7597286Z .b8 110 2026-02-21T08:25:57.7597342Z .b8 100 2026-02-21T08:25:57.7597392Z .b8 117 2026-02-21T08:25:57.7597442Z .b8 99 2026-02-21T08:25:57.7597491Z .b8 116 2026-02-21T08:25:57.7597550Z .b8 111 2026-02-21T08:25:57.7597600Z .b8 114 2026-02-21T08:25:57.7597651Z .b8 95 2026-02-21T08:25:57.7597700Z .b8 114 2026-02-21T08:25:57.7597759Z .b8 111 2026-02-21T08:25:57.7597807Z .b8 111 2026-02-21T08:25:57.7597855Z .b8 116 2026-02-21T08:25:57.7597910Z .b8 47 2026-02-21T08:25:57.7597957Z .b8 97 2026-02-21T08:25:57.7598007Z .b8 103 2026-02-21T08:25:57.7598054Z .b8 0 2026-02-21T08:25:57.7598111Z } 2026-02-21T08:25:57.7598174Z .section .debug_macinfo { } 2026-02-21T08:25:57.7598178Z 2026-02-21T08:25:57.7598253Z ================================================================ 2026-02-21T08:25:57.7598359Z please share the reproducer above with Triton project. 
2026-02-21T08:25:57.7598440Z `ptxas` stderr: 2026-02-21T08:25:57.7598827Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:25:57.7598927Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:25:57.7598931Z 2026-02-21T08:25:57.7599310Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5ga16to_.ptx -o /tmp/tmp5ga16to_.ptx.o 2026-02-21T08:25:57.7599315Z 2026-02-21T08:25:57.7599436Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:26:01.0467726Z 2026-02-21T08:26:01.0468449Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 8.6 configs/s 2026-02-21T08:26:01.0479580Z [15s] Adaptive compile timeout: 30s (90% percentile=2.7s, bounds=[30.0s, 30s]) 2026-02-21T08:26:01.7567845Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 1336.2 configs/s 2026-02-21T08:26:01.8206956Z [16s] Initial random population of 100, 5 starting points: 2026-02-21T08:26:01.8208790Z error=4 2026-02-21T08:26:01.8208958Z ok=96 2026-02-21T08:26:01.8209093Z min=0.1188 2026-02-21T08:26:01.8209220Z mid=2.0655 2026-02-21T08:26:01.8209351Z max=154.0699 2026-02-21T08:26:01.8209500Z best={'block_sizes': [128, 32, 16], 2026-02-21T08:26:01.8209774Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:26:01.8210039Z 'l2_groupings': [2], 2026-02-21T08:26:01.8210216Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:26:01.8210408Z 'loop_orders': [[1, 0]], 2026-02-21T08:26:01.8210582Z 'num_stages': 5, 2026-02-21T08:26:01.8210723Z 'num_warps': 2, 2026-02-21T08:26:01.8210858Z 'pid_type': 'flat', 2026-02-21T08:26:01.8211013Z 'range_flattens': [None, False], 2026-02-21T08:26:01.8211189Z 'range_multi_buffers': [None, None], 2026-02-21T08:26:01.8211372Z 'range_num_stages': [0, 0], 
2026-02-21T08:26:01.8211542Z 'range_unroll_factors': [0, 0], 2026-02-21T08:26:01.8211725Z 'range_warp_specializes': [None, None]} 2026-02-21T08:26:01.8225190Z [16s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:26:03.0716342Z [17s] Generation 1 starting: 85 neighbors, 5 active search path(s) 2026-02-21T08:26:07.9255612Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 45.2 configs/s 2026-02-21T08:26:12.2962522Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 20.6 configs/s 2026-02-21T08:26:13.2183228Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1072.9 2026-02-21T08:26:13.2183955Z configs/s 2026-02-21T08:26:13.2846784Z [28s] Generation 1 complete: 2026-02-21T08:26:13.2849947Z error=13 2026-02-21T08:26:13.2854573Z ok=78 2026-02-21T08:26:13.2856145Z min=0.0492 2026-02-21T08:26:13.2856356Z mid=0.1229 2026-02-21T08:26:13.2859916Z max=0.9433 2026-02-21T08:26:13.2860121Z best={'block_sizes': [64, 128, 32], 2026-02-21T08:26:13.2860380Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:26:13.2860625Z 'l2_groupings': [4], 2026-02-21T08:26:13.2861120Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:26:13.2861321Z 'loop_orders': [[0, 1]], 2026-02-21T08:26:13.2861470Z 'num_stages': 8, 2026-02-21T08:26:13.2861613Z 'num_warps': 4, 2026-02-21T08:26:13.2861748Z 'pid_type': 'flat', 2026-02-21T08:26:13.2861911Z 'range_flattens': [None, None], 2026-02-21T08:26:13.2862083Z 'range_multi_buffers': [None, False], 2026-02-21T08:26:13.2862272Z 'range_num_stages': [0, 0], 2026-02-21T08:26:13.2862433Z 'range_unroll_factors': [0, 0], 2026-02-21T08:26:13.2862716Z 'range_warp_specializes': [None, False]} 2026-02-21T08:26:13.2863284Z [28s] Fitting surrogate: 191 points, 191 targets 2026-02-21T08:26:14.5598684Z [29s] Generation 2 starting: 84 neighbors, 5 active search path(s) 2026-02-21T08:26:19.7898187Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 5.1 configs/s 
2026-02-21T08:26:24.7235495Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 17.8 configs/s 2026-02-21T08:26:25.6025602Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1135.6 2026-02-21T08:26:25.6029379Z configs/s 2026-02-21T08:26:25.6714467Z [40s] Generation 2 complete: 2026-02-21T08:26:25.6718226Z error=4 2026-02-21T08:26:25.6722914Z ok=85 2026-02-21T08:26:25.6727551Z min=0.0308 2026-02-21T08:26:25.6732153Z mid=0.0716 2026-02-21T08:26:25.6733565Z max=5.8103 2026-02-21T08:26:25.6733739Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:26:25.6734006Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:26:25.6734228Z 'l2_groupings': [4], 2026-02-21T08:26:25.6734401Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:26:25.6734596Z 'loop_orders': [[0, 1]], 2026-02-21T08:26:25.6734825Z 'num_stages': 8, 2026-02-21T08:26:25.6734973Z 'num_warps': 4, 2026-02-21T08:26:25.6735315Z 'pid_type': 'flat', 2026-02-21T08:26:25.6735491Z 'range_flattens': [None, None], 2026-02-21T08:26:25.6735671Z 'range_multi_buffers': [None, True], 2026-02-21T08:26:25.6735874Z 'range_num_stages': [0, 0], 2026-02-21T08:26:25.6736042Z 'range_unroll_factors': [0, 0], 2026-02-21T08:26:25.6736233Z 'range_warp_specializes': [None, False]} 2026-02-21T08:26:25.6736453Z [40s] Fitting surrogate: 280 points, 280 targets 2026-02-21T08:26:26.9375366Z [41s] Generation 3 starting: 81 neighbors, 5 active search path(s) 2026-02-21T08:26:31.0991171Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83/83 15.9 configs/s 2026-02-21T08:26:34.8856877Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 83/83 22.3 configs/s 2026-02-21T08:26:35.8750363Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1056.3 2026-02-21T08:26:35.8751071Z configs/s 2026-02-21T08:26:35.9777269Z [50s] Generation 3 complete: 2026-02-21T08:26:35.9777760Z error=22 2026-02-21T08:26:35.9778025Z ok=65 2026-02-21T08:26:35.9778303Z min=0.0286 
2026-02-21T08:26:35.9778560Z mid=0.0573 2026-02-21T08:26:35.9779235Z max=2.1242 2026-02-21T08:26:35.9779546Z best={'block_sizes': [512, 128, 32], 2026-02-21T08:26:35.9780034Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:26:35.9780508Z 'l2_groupings': [2], 2026-02-21T08:26:35.9780854Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:26:35.9781224Z 'loop_orders': [[0, 1]], 2026-02-21T08:26:35.9781508Z 'num_stages': 5, 2026-02-21T08:26:35.9781755Z 'num_warps': 8, 2026-02-21T08:26:35.9782014Z 'pid_type': 'flat', 2026-02-21T08:26:35.9782302Z 'range_flattens': [None, False], 2026-02-21T08:26:35.9782665Z 'range_multi_buffers': [None, None], 2026-02-21T08:26:35.9783004Z 'range_num_stages': [0, 0], 2026-02-21T08:26:35.9783332Z 'range_unroll_factors': [0, 0], 2026-02-21T08:26:35.9783672Z 'range_warp_specializes': [None, False]} 2026-02-21T08:26:35.9812535Z [50s] Fitting surrogate: 367 points, 367 targets 2026-02-21T08:26:37.7172386Z [52s] Generation 4 starting: 76 neighbors, 5 active search path(s) 2026-02-21T08:26:59.1456911Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77/77 0.8 configs/s 2026-02-21T08:27:03.8036829Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 77/77 16.7 configs/s 2026-02-21T08:27:04.6458356Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1287.0 2026-02-21T08:27:04.6459314Z configs/s 2026-02-21T08:27:04.7217909Z [79s] Generation 4 complete: 2026-02-21T08:27:04.7218872Z error=12 2026-02-21T08:27:04.7219207Z ok=69 2026-02-21T08:27:04.7219537Z min=0.0247 2026-02-21T08:27:04.7219878Z mid=0.0573 2026-02-21T08:27:04.7220199Z max=13.6346 2026-02-21T08:27:04.7220575Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:27:04.7221193Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:27:04.7221781Z 'l2_groupings': [32], 2026-02-21T08:27:04.7222220Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:27:04.7222752Z 'loop_orders': [[0, 1]], 
2026-02-21T08:27:04.7223176Z 'num_stages': 7, 2026-02-21T08:27:04.7223576Z 'num_warps': 8, 2026-02-21T08:27:04.7223953Z 'pid_type': 'flat', 2026-02-21T08:27:04.7224361Z 'range_flattens': [None, False], 2026-02-21T08:27:04.7224856Z 'range_multi_buffers': [None, None], 2026-02-21T08:27:04.7225307Z 'range_num_stages': [0, 0], 2026-02-21T08:27:04.7225695Z 'range_unroll_factors': [0, 0], 2026-02-21T08:27:04.7226131Z 'range_warp_specializes': [None, False]} 2026-02-21T08:27:04.7251382Z [79s] Fitting surrogate: 448 points, 448 targets 2026-02-21T08:27:06.1111395Z [80s] Generation 5 starting: 60 neighbors, 5 active search path(s) 2026-02-21T08:27:23.1796999Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60/60 1.1 configs/s 2026-02-21T08:27:25.8369912Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 60/60 23.1 configs/s 2026-02-21T08:27:27.3534041Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 669.3 2026-02-21T08:27:27.3535443Z configs/s 2026-02-21T08:27:27.4911337Z [102s] Generation 5 complete: 2026-02-21T08:27:27.4911894Z error=19 2026-02-21T08:27:27.4912242Z ok=46 2026-02-21T08:27:27.4912591Z min=0.0245 2026-02-21T08:27:27.4912969Z mid=0.0410 2026-02-21T08:27:27.4913317Z max=6.1051 2026-02-21T08:27:27.4913700Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:27:27.4914383Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:27:27.4915562Z 'l2_groupings': [32], 2026-02-21T08:27:27.4916151Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:27:27.4916804Z 'loop_orders': [[0, 1]], 2026-02-21T08:27:27.4917272Z 'num_stages': 7, 2026-02-21T08:27:27.4917678Z 'num_warps': 8, 2026-02-21T08:27:27.4918089Z 'pid_type': 'flat', 2026-02-21T08:27:27.4918539Z 'range_flattens': [None, False], 2026-02-21T08:27:27.4919095Z 'range_multi_buffers': [None, None], 2026-02-21T08:27:27.4919646Z 'range_num_stages': [0, 0], 2026-02-21T08:27:27.4920172Z 'range_unroll_factors': [0, 0], 2026-02-21T08:27:27.4921087Z 
'range_warp_specializes': [None, False]} 2026-02-21T08:27:27.4938009Z [102s] Fitting surrogate: 513 points, 513 targets 2026-02-21T08:27:28.8769865Z [103s] Generation 6 starting: 67 neighbors, 5 active search path(s) 2026-02-21T08:27:35.5491162Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68/68 16.8 configs/s 2026-02-21T08:27:38.5261219Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 68/68 23.0 configs/s 2026-02-21T08:27:40.3276512Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 561.2 2026-02-21T08:27:40.3277126Z configs/s 2026-02-21T08:27:40.4569057Z [115s] Generation 6 complete: 2026-02-21T08:27:40.4572741Z error=20 2026-02-21T08:27:40.4576825Z ok=52 2026-02-21T08:27:40.4580207Z min=0.0245 2026-02-21T08:27:40.4583009Z mid=0.0389 2026-02-21T08:27:40.4585842Z max=3.8133 2026-02-21T08:27:40.4590169Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:27:40.4594001Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:27:40.4597942Z 'l2_groupings': [32], 2026-02-21T08:27:40.4602157Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:27:40.4606229Z 'loop_orders': [[0, 1]], 2026-02-21T08:27:40.4609817Z 'num_stages': 7, 2026-02-21T08:27:40.4613917Z 'num_warps': 8, 2026-02-21T08:27:40.4615213Z 'pid_type': 'flat', 2026-02-21T08:27:40.4615397Z 'range_flattens': [None, False], 2026-02-21T08:27:40.4615591Z 'range_multi_buffers': [None, None], 2026-02-21T08:27:40.4616038Z 'range_num_stages': [0, 0], 2026-02-21T08:27:40.4616214Z 'range_unroll_factors': [0, 0], 2026-02-21T08:27:40.4616410Z 'range_warp_specializes': [None, False]} 2026-02-21T08:27:40.4616635Z [115s] Fitting surrogate: 585 points, 585 targets 2026-02-21T08:27:41.1115241Z [115s] Generation 7 starting: 31 neighbors, 2 active search path(s) 2026-02-21T08:28:12.1017385Z [146s] Timeout after 30s compiling Config(block_sizes=[512, 256, 32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['first', 'last'], 
loop_orders=[[1, 0]], num_stages=5, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T08:28:12.1033952Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 31/31 0.5 configs/s 2026-02-21T08:28:13.5042968Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 31/31 24.2 configs/s 2026-02-21T08:28:14.4886447Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1021.6 2026-02-21T08:28:14.4886933Z configs/s 2026-02-21T08:28:14.5715885Z [149s] Generation 7 complete: 2026-02-21T08:28:14.5717788Z error=10 2026-02-21T08:28:14.5717962Z timeout=1 2026-02-21T08:28:14.5718421Z ok=23 2026-02-21T08:28:14.5718565Z min=0.0245 2026-02-21T08:28:14.5718699Z mid=0.0308 2026-02-21T08:28:14.5718827Z max=6.6120 2026-02-21T08:28:14.5718972Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:28:14.5719217Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:28:14.5719436Z 'l2_groupings': [32], 2026-02-21T08:28:14.5719608Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:28:14.5719787Z 'loop_orders': [[0, 1]], 2026-02-21T08:28:14.5719942Z 'num_stages': 7, 2026-02-21T08:28:14.5720076Z 'num_warps': 8, 2026-02-21T08:28:14.5720220Z 'pid_type': 'flat', 2026-02-21T08:28:14.5720376Z 'range_flattens': [None, False], 2026-02-21T08:28:14.5720562Z 'range_multi_buffers': [None, None], 2026-02-21T08:28:14.5720746Z 'range_num_stages': [0, 0], 2026-02-21T08:28:14.5720907Z 'range_unroll_factors': [0, 0], 2026-02-21T08:28:14.5721088Z 'range_warp_specializes': [None, False]} 2026-02-21T08:28:14.5736369Z [149s] Fitting surrogate: 619 points, 619 targets 2026-02-21T08:28:15.0401101Z [149s] Generation 8 starting: 16 neighbors, 1 active search path(s) 2026-02-21T08:28:16.8065515Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 16.4 configs/s 2026-02-21T08:28:17.3755831Z Generation 8: exploring 
neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 16/16 32.2 configs/s 2026-02-21T08:28:17.8564154Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2056.0 2026-02-21T08:28:17.8568887Z configs/s 2026-02-21T08:28:17.9047330Z [152s] Generation 8 complete: 2026-02-21T08:28:17.9047624Z error=7 2026-02-21T08:28:17.9047791Z ok=11 2026-02-21T08:28:17.9047929Z min=0.0245 2026-02-21T08:28:17.9048091Z mid=0.0287 2026-02-21T08:28:17.9048226Z max=2.8182 2026-02-21T08:28:17.9048374Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:28:17.9048618Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:28:17.9048847Z 'l2_groupings': [32], 2026-02-21T08:28:17.9049039Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:28:17.9049231Z 'loop_orders': [[0, 1]], 2026-02-21T08:28:17.9049633Z 'num_stages': 7, 2026-02-21T08:28:17.9049806Z 'num_warps': 8, 2026-02-21T08:28:17.9049948Z 'pid_type': 'flat', 2026-02-21T08:28:17.9050146Z 'range_flattens': [None, False], 2026-02-21T08:28:17.9050332Z 'range_multi_buffers': [None, None], 2026-02-21T08:28:17.9050520Z 'range_num_stages': [0, 0], 2026-02-21T08:28:17.9050690Z 'range_unroll_factors': [0, 0], 2026-02-21T08:28:17.9050866Z 'range_warp_specializes': [None, False]} 2026-02-21T08:28:17.9073406Z [152s] Fitting surrogate: 637 points, 637 targets 2026-02-21T08:28:18.3181360Z [153s] Generation 9 starting: 11 neighbors, 1 active search path(s) 2026-02-21T08:28:26.5902793Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11/11 1.2 configs/s 2026-02-21T08:28:27.0268769Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 11/11 27.1 configs/s 2026-02-21T08:28:27.3607047Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2926.2 2026-02-21T08:28:27.3607569Z configs/s 2026-02-21T08:28:27.3992082Z [162s] Generation 9 complete: 2026-02-21T08:28:27.3992397Z error=5 2026-02-21T08:28:27.3992565Z ok=8 2026-02-21T08:28:27.3992744Z min=0.0245 2026-02-21T08:28:27.3992922Z mid=0.0286 
2026-02-21T08:28:27.3993098Z max=5.6863 2026-02-21T08:28:27.3993306Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:28:27.3993637Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:28:27.3993962Z 'l2_groupings': [32], 2026-02-21T08:28:27.3994194Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:28:27.3994488Z 'loop_orders': [[0, 1]], 2026-02-21T08:28:27.3994991Z 'num_stages': 7, 2026-02-21T08:28:27.3995202Z 'num_warps': 8, 2026-02-21T08:28:27.3995393Z 'pid_type': 'flat', 2026-02-21T08:28:27.3995621Z 'range_flattens': [None, False], 2026-02-21T08:28:27.3995877Z 'range_multi_buffers': [None, None], 2026-02-21T08:28:27.3996158Z 'range_num_stages': [0, 0], 2026-02-21T08:28:27.3996402Z 'range_unroll_factors': [0, 0], 2026-02-21T08:28:27.3996673Z 'range_warp_specializes': [None, False]} 2026-02-21T08:28:27.4012520Z [162s] Fitting surrogate: 650 points, 650 targets 2026-02-21T08:28:27.8644619Z [162s] Generation 10 starting: 17 neighbors, 1 active search path(s) 2026-02-21T08:28:29.3920328Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 17.4 configs/s 2026-02-21T08:28:30.1199607Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 20.8 configs/s 2026-02-21T08:28:30.8807152Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1316.0 2026-02-21T08:28:30.8811297Z configs/s 2026-02-21T08:28:30.9484351Z [165s] Generation 10 complete: 2026-02-21T08:28:30.9484606Z error=5 2026-02-21T08:28:30.9484797Z ok=14 2026-02-21T08:28:30.9484943Z min=0.0245 2026-02-21T08:28:30.9485087Z mid=0.0286 2026-02-21T08:28:30.9485239Z max=0.1473 2026-02-21T08:28:30.9485390Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:28:30.9485918Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:28:30.9486285Z 'l2_groupings': [32], 2026-02-21T08:28:30.9486459Z 'load_eviction_policies': ['last', ''], 2026-02-21T08:28:30.9486658Z 'loop_orders': [[0, 1]], 2026-02-21T08:28:30.9486810Z 'num_stages': 7, 
2026-02-21T08:28:30.9486954Z 'num_warps': 8, 2026-02-21T08:28:30.9487089Z 'pid_type': 'flat', 2026-02-21T08:28:30.9487248Z 'range_flattens': [None, False], 2026-02-21T08:28:30.9487430Z 'range_multi_buffers': [None, None], 2026-02-21T08:28:30.9487606Z 'range_num_stages': [0, 0], 2026-02-21T08:28:30.9487858Z 'range_unroll_factors': [0, 0], 2026-02-21T08:28:30.9488034Z 'range_warp_specializes': [None, False]} 2026-02-21T08:28:30.9505518Z [165s] Fitting surrogate: 669 points, 669 targets 2026-02-21T08:28:31.2184016Z [165s] Autotuning complete in 165.9s after searching 638 configs. 2026-02-21T08:28:31.2184366Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:28:31.2185586Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_stages=7, num_warps=8, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T08:28:31.2186591Z 2026-02-21T08:28:31.2186839Z [165s] Code of selected kernel: /tmp/torchinductor_root/v3/cv3as7xm5qdngmhtyqyyug243xb47ax6zh4f6j4dffr4jwvopuqi.py 2026-02-21T08:28:56.1090061Z WARNING:tritonbench.utils.triton_op:Completed input ID 5: 2026-02-21T08:28:56.1090677Z (M, N, K) 2026-02-21T08:28:56.1090958Z ------------------ 2026-02-21T08:28:56.1091278Z (1024, 8192, 1024) 2026-02-21T08:28:56.1091466Z 2026-02-21T08:28:56.1101849Z 50%|█████ | 4/8 [18:45<19:12, 288.01s/it]WARNING:tritonbench.utils.triton_op:Running input ID 6: 2026-02-21T08:28:56.1102669Z (M, N, K) 2026-02-21T08:28:56.1104557Z ------------------ 2026-02-21T08:28:56.1105155Z (8192, 2048, 2048) 2026-02-21T08:28:56.1105850Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:29:43.6107514Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get 
benchmark function for triton_tutorial_matmul 2026-02-21T08:30:21.4834492Z Autotune Choices Stats: 2026-02-21T08:30:21.4840013Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_93", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.07574400305747986, "best_triton_pos": 0} 2026-02-21T08:30:21.4841433Z AUTOTUNE mm(8192x2048, 2048x2048) 2026-02-21T08:30:21.4841619Z strides: [2048, 1], [1, 2048] 2026-02-21T08:30:21.4841852Z dtypes: torch.float16, torch.float16 2026-02-21T08:30:21.4844380Z triton_mm_93 0.0757 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:30:21.4845537Z triton_mm_94 0.0900 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:30:21.4846236Z triton_mm_87 0.1002 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:30:21.4846914Z triton_mm_92 0.1002 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:30:21.4847584Z triton_mm_89 0.1178 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:30:21.4848350Z triton_mm_90 0.1218 ms 62.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:30:21.4849024Z triton_mm_83 0.1239 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:30:21.4849703Z triton_mm_86 0.1248 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:30:21.4850360Z triton_mm_85 0.1248 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:30:21.4851107Z triton_mm_88 0.1495 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T08:30:21.4851678Z SingleProcess AUTOTUNE benchmarking takes 0.5081 seconds and 0.2532 seconds precompiling for 19 choices 2026-02-21T08:30:21.7652745Z INFO:tritonbench.utils.triton_op:Took 1211.66ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:31:04.7076820Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:31:04.7077188Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:31:04.7077484Z 'dtype': 'torch.float16', 2026-02-21T08:31:04.7077775Z 'shape': (8192, 2048), 2026-02-21T08:31:04.7078037Z 'stride': (2048, 1)}, 2026-02-21T08:31:04.7078306Z { 'device': 'cuda:0', 2026-02-21T08:31:04.7078596Z 'dtype': 'torch.float16', 2026-02-21T08:31:04.7078877Z 'shape': (2048, 2048), 2026-02-21T08:31:04.7079135Z 'stride': (1, 2048)}, 2026-02-21T08:31:04.7079387Z None), 2026-02-21T08:31:04.7079592Z 'kwargs': {}} 2026-02-21T08:31:04.7129109Z INFO:tritonbench.utils.triton_op:Took 5.45ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:31:04.8055283Z [0s] Autotune random seed: 2134884919 2026-02-21T08:31:04.9339183Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:31:11.4755199Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 26.1 
configs/s 2026-02-21T08:31:14.1222423Z 2026-02-21T08:31:14.1223068Z [9s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:31:14.1224467Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 16], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:31:14.1226029Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:31:14.1226570Z `ptxas` stderr: 2026-02-21T08:31:14.1226997Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 160 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:31:14.1227462Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:31:14.1227617Z 2026-02-21T08:31:14.1228010Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpgm_j5556.ptx -o /tmp/tmpgm_j5556.ptx.o 2026-02-21T08:31:14.1228442Z 2026-02-21T08:31:14.1228582Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:31:14.1228769Z 2026-02-21T08:31:14.1228853Z ================================================================ 2026-02-21T08:31:14.1229066Z Internal Triton PTX codegen error 2026-02-21T08:31:14.1229237Z `ptxas` stderr: 2026-02-21T08:31:14.1229728Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 160 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:31:14.1230200Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:31:14.1230352Z 2026-02-21T08:31:14.1230712Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpgm_j5556.ptx -o /tmp/tmpgm_j5556.ptx.o 2026-02-21T08:31:14.1231130Z 2026-02-21T08:31:14.1231146Z 2026-02-21T08:31:14.1231205Z // 2026-02-21T08:31:14.1231430Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:31:14.1231608Z // 2026-02-21T08:31:14.1231675Z 2026-02-21T08:31:14.1231744Z .version 8.7 2026-02-21T08:31:14.1231880Z .target sm_100a 2026-02-21T08:31:14.1232014Z .address_size 64 2026-02-21T08:31:14.1232095Z 2026-02-21T08:31:14.1232215Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:31:14.1232467Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:31:14.1232711Z // @_helion_matmul 2026-02-21T08:31:14.1232913Z .visible .entry _helion_matmul( 2026-02-21T08:31:14.1233120Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:31:14.1233370Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:31:14.1233606Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:31:14.1233847Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:31:14.1234085Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:31:14.1234289Z ) 2026-02-21T08:31:14.1234409Z .reqntid 256 2026-02-21T08:31:14.1234530Z .maxnreg 32 2026-02-21T08:31:14.1234653Z { 2026-02-21T08:31:14.1234857Z .reg .pred %p<43>; 2026-02-21T08:31:14.1235013Z .reg .b16 %rs<3>; 2026-02-21T08:31:14.1235147Z .reg .b32 %r<342>; 2026-02-21T08:31:14.1235289Z .reg .b64 %rd<185>; 2026-02-21T08:31:14.1235508Z $L__func_begin0: 2026-02-21T08:31:14.1235596Z 2026-02-21T08:31:14.1235649Z // %bb.0: 2026-02-21T08:31:14.1235893Z .loc 1 19 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:19 2026-02-21T08:31:14.1236185Z mov.u32 %r1, 
%tid.x; 2026-02-21T08:31:14.1236333Z shr.u32 %r2, %r1, 5; 2026-02-21T08:31:14.1236487Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:31:14.1236677Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:31:14.1236827Z @%p1 bra $L__BB0_10; 2026-02-21T08:31:14.1236972Z bra.uni $L__BB0_1; 2026-02-21T08:31:14.1237105Z $L__BB0_10: 2026-02-21T08:31:14.1237348Z .loc 1 0 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:0:0 2026-02-21T08:31:14.1237652Z ld.param.b64 %rd9, [_helion_matmul_param_3]; 2026-02-21T08:31:14.1237864Z ld.param.b64 %rd8, [_helion_matmul_param_2]; 2026-02-21T08:31:14.1238154Z .loc 1 19 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:19 2026-02-21T08:31:14.1238449Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:31:14.1238640Z setp.lt.u32 %p14, %r1, 32; 2026-02-21T08:31:14.1238798Z mov.b32 %r239, global_smem; 2026-02-21T08:31:14.1239005Z // begin inline asm 2026-02-21T08:31:14.1239241Z @%p14 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r239], 32; 2026-02-21T08:31:14.1239491Z // end inline asm 2026-02-21T08:31:14.1239622Z bar.sync 0, 128; 2026-02-21T08:31:14.1239776Z ld.shared.b32 %r336, [global_smem]; 2026-02-21T08:31:14.1239948Z bar.sync 0, 128; 2026-02-21T08:31:14.1240076Z // begin inline asm 2026-02-21T08:31:14.1240282Z @%p14 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:31:14.1240515Z // end inline asm 2026-02-21T08:31:14.1240783Z .loc 1 21 71 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:21:71 2026-02-21T08:31:14.1241079Z mov.u32 %r341, %ctaid.x; 2026-02-21T08:31:14.1241242Z mov.u32 %r248, %ctaid.y; 2026-02-21T08:31:14.1241392Z mov.u32 %r249, %ctaid.z; 2026-02-21T08:31:14.1241553Z mov.u32 %r250, %nctaid.x; 2026-02-21T08:31:14.1241762Z mov.u32 %r251, %nctaid.y; 2026-02-21T08:31:14.1241932Z mad.lo.s32 %r252, %r249, %r251, %r248; 2026-02-21T08:31:14.1242140Z mad.lo.s32 %r253, %r252, %r250, %r341; 2026-02-21T08:31:14.1242320Z shl.b32 %r254, %r253, 7; 2026-02-21T08:31:14.1242487Z 
cvt.s64.s32 %rd150, %r254; 2026-02-21T08:31:14.1242655Z add.s64 %rd147, %rd9, %rd150; 2026-02-21T08:31:14.1242828Z shl.b32 %r255, %r1, 2; 2026-02-21T08:31:14.1242989Z add.s32 %r240, %r239, %r255; 2026-02-21T08:31:14.1243157Z mov.b32 %r241, 0; 2026-02-21T08:31:14.1243306Z // begin inline asm 2026-02-21T08:31:14.1243500Z @%p14 st.shared.b32 [ %r240 + 0 ], %r241; 2026-02-21T08:31:14.1243686Z // end inline asm 2026-02-21T08:31:14.1243826Z bar.warp.sync -1; 2026-02-21T08:31:14.1243978Z setp.eq.b32 %p17, %r1, 0; 2026-02-21T08:31:14.1244135Z cvt.u64.u32 %rd132, %r239; 2026-02-21T08:31:14.1244301Z // begin inline asm 2026-02-21T08:31:14.1244560Z @%p17 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd132 + 0 ], %rd8; 2026-02-21T08:31:14.1244900Z // end inline asm 2026-02-21T08:31:14.1245037Z // begin inline asm 2026-02-21T08:31:14.1245277Z @%p17 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1; 2026-02-21T08:31:14.1245545Z // end inline asm 2026-02-21T08:31:14.1245679Z mov.b32 %r242, 16; 2026-02-21T08:31:14.1245824Z // begin inline asm 2026-02-21T08:31:14.1246066Z @%p17 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r242; 2026-02-21T08:31:14.1246348Z // end inline asm 2026-02-21T08:31:14.1246484Z mov.b32 %r243, 128; 2026-02-21T08:31:14.1246639Z // begin inline asm 2026-02-21T08:31:14.1246886Z @%p17 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r243; 2026-02-21T08:31:14.1247165Z // end inline asm 2026-02-21T08:31:14.1247306Z mov.b32 %r244, 2048; 2026-02-21T08:31:14.1247450Z // begin inline asm 2026-02-21T08:31:14.1247743Z @%p17 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r244; 2026-02-21T08:31:14.1248025Z // end inline asm 2026-02-21T08:31:14.1248165Z mov.b32 %r245, 8192; 2026-02-21T08:31:14.1248317Z // begin inline asm 2026-02-21T08:31:14.1248560Z @%p17 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r245; 
2026-02-21T08:31:14.1248832Z // end inline asm 2026-02-21T08:31:14.1248962Z mov.b64 %rd140, 4096; 2026-02-21T08:31:14.1249111Z // begin inline asm 2026-02-21T08:31:14.1249362Z @%p17 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd132 + 0 ], 0x0, %rd140; 2026-02-21T08:31:14.1249654Z // end inline asm 2026-02-21T08:31:14.1249783Z mov.b32 %r246, 1; 2026-02-21T08:31:14.1249923Z // begin inline asm 2026-02-21T08:31:14.1250184Z @%p17 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0, %r246; 2026-02-21T08:31:14.1250461Z // end inline asm 2026-02-21T08:31:14.1250602Z // begin inline asm 2026-02-21T08:31:14.1251004Z @%p17 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1, %r246; 2026-02-21T08:31:14.1251296Z // end inline asm 2026-02-21T08:31:14.1251459Z // begin inline asm 2026-02-21T08:31:14.1251703Z @%p17 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x6; 2026-02-21T08:31:14.1251976Z // end inline asm 2026-02-21T08:31:14.1252109Z // begin inline asm 2026-02-21T08:31:14.1252365Z @%p17 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0; 2026-02-21T08:31:14.1252640Z // end inline asm 2026-02-21T08:31:14.1252775Z // begin inline asm 2026-02-21T08:31:14.1253006Z @%p17 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x1; 2026-02-21T08:31:14.1253281Z // end inline asm 2026-02-21T08:31:14.1253407Z // begin inline asm 2026-02-21T08:31:14.1253639Z @%p17 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd132 + 0 ], 0x0; 2026-02-21T08:31:14.1253898Z // end inline asm 2026-02-21T08:31:14.1254025Z // begin inline asm 2026-02-21T08:31:14.1254413Z @%p14 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd147 + 0 ], [ %rd132 + 0 ], 0x80; 2026-02-21T08:31:14.1255893Z // end inline asm 2026-02-21T08:31:14.1256036Z // begin inline asm 2026-02-21T08:31:14.1256243Z @%p14 
fence.proxy.tensormap::generic.acquire.gpu [ %rd147 + 0 ], 0x80; 2026-02-21T08:31:14.1256499Z @%p14 cp.async.bulk.commit_group ; 2026-02-21T08:31:14.1256691Z @%p14 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:31:14.1256863Z // end inline asm 2026-02-21T08:31:14.1257002Z bar.sync 0, 128; 2026-02-21T08:31:14.1257143Z cvta.global.u64 %rd151, %rd147; 2026-02-21T08:31:14.1257463Z .loc 1 27 94 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:27:94 2026-02-21T08:31:14.1257759Z setp.gt.u32 %p34, %r341, 8191; 2026-02-21T08:31:14.1257927Z @%p34 bra $L__BB0_13; 2026-02-21T08:31:14.1258085Z // %bb.11: // %.lr.ph 2026-02-21T08:31:14.1258381Z .loc 1 0 94 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:0:94 2026-02-21T08:31:14.1258666Z shl.b32 %r256, %r1, 5; 2026-02-21T08:31:14.1258818Z and.b32 %r257, %r256, 3936; 2026-02-21T08:31:14.1258988Z bfe.s32 %r258, %r1, 2, 1; 2026-02-21T08:31:14.1259137Z and.b32 %r259, %r258, 144; 2026-02-21T08:31:14.1259296Z or.b32 %r260, %r259, %r257; 2026-02-21T08:31:14.1259447Z add.s32 %r262, %r239, 24576; 2026-02-21T08:31:14.1259603Z add.s32 %r15, %r262, %r260; 2026-02-21T08:31:14.1259751Z xor.b32 %r263, %r260, 16; 2026-02-21T08:31:14.1259900Z add.s32 %r16, %r262, %r263; 2026-02-21T08:31:14.1260162Z .loc 1 27 94 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:27:94 2026-02-21T08:31:14.1260436Z shl.b32 %r340, %r341, 2; 2026-02-21T08:31:14.1260587Z shl.b32 %r339, %r341, 7; 2026-02-21T08:31:14.1260778Z $L__BB0_12: // =>This Inner Loop Header: Depth=1 2026-02-21T08:31:14.1261135Z .loc 1 38 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:38:27 2026-02-21T08:31:14.1261414Z and.b32 %r305, %r339, 384; 2026-02-21T08:31:14.1261573Z and.b32 %r306, %r341, 7680; 2026-02-21T08:31:14.1261729Z or.b32 %r303, %r305, %r306; 2026-02-21T08:31:14.1261983Z .loc 1 40 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:40:27 2026-02-21T08:31:14.1262271Z and.b32 %r302, %r340, 2032; 2026-02-21T08:31:14.1262524Z .loc 1 
53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1262821Z shfl.sync.idx.b32 %r307, %r2, 0, 31, -1; 2026-02-21T08:31:14.1262999Z shl.b32 %r308, %r307, 21; 2026-02-21T08:31:14.1263159Z and.b32 %r309, %r308, 6291456; 2026-02-21T08:31:14.1263319Z add.s32 %r264, %r309, %r336; 2026-02-21T08:31:14.1263484Z mov.pred %p35, -1; 2026-02-21T08:31:14.1263632Z mov.b32 %r265, 0; 2026-02-21T08:31:14.1263771Z // begin inline asm 2026-02-21T08:31:14.1264145Z @%p35 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r264 + 0], {%r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265, %r265}; 2026-02-21T08:31:14.1264560Z // end inline asm 2026-02-21T08:31:14.1264754Z // begin inline asm 2026-02-21T08:31:14.1264904Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:31:14.1265069Z // end inline asm 2026-02-21T08:31:14.1265196Z bar.sync 0, 128; 2026-02-21T08:31:14.1265443Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1265728Z add.s32 %r281, %r239, 31744; 2026-02-21T08:31:14.1265877Z // begin inline asm 2026-02-21T08:31:14.1266047Z @%p17 mbarrier.init.shared::cta.b64 [%r281], 1; 2026-02-21T08:31:14.1266229Z // end inline asm 2026-02-21T08:31:14.1266384Z st.shared.b32 [global_smem+31752], 16777730; 2026-02-21T08:31:14.1266574Z st.shared.b32 [global_smem], %r336; 2026-02-21T08:31:14.1266777Z st.shared.v2.b32 [global_smem+8], {%r303, %r302}; 2026-02-21T08:31:14.1266977Z barrier.sync 1; 2026-02-21T08:31:14.1267110Z barrier.sync 1; 2026-02-21T08:31:14.1267284Z barrier.sync 1; 2026-02-21T08:31:14.1267533Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1267823Z bar.sync 0, 128; 2026-02-21T08:31:14.1267954Z // begin inline asm 2026-02-21T08:31:14.1268090Z 2026-02-21T08:31:14.1268198Z { 2026-02-21T08:31:14.1268322Z .reg .pred complete; 2026-02-21T08:31:14.1268460Z waitLoop: 2026-02-21T08:31:14.1268654Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r281], %r265; 2026-02-21T08:31:14.1268886Z @!complete bra.uni waitLoop; 2026-02-21T08:31:14.1269097Z } 2026-02-21T08:31:14.1269163Z 2026-02-21T08:31:14.1269226Z // end inline asm 2026-02-21T08:31:14.1269470Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1269753Z bar.sync 0, 128; 2026-02-21T08:31:14.1269886Z // begin inline asm 2026-02-21T08:31:14.1270059Z @%p17 mbarrier.inval.shared::cta.b64 [%r281]; 2026-02-21T08:31:14.1270243Z // end inline asm 2026-02-21T08:31:14.1270494Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1270782Z // begin inline asm 2026-02-21T08:31:14.1271154Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300}, [%r264 + 0]; 2026-02-21T08:31:14.1271554Z // end inline asm 2026-02-21T08:31:14.1271689Z // begin inline asm 2026-02-21T08:31:14.1271848Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:31:14.1272010Z // end inline asm 2026-02-21T08:31:14.1272159Z cvt.u64.u32 %rd152, %r285; 2026-02-21T08:31:14.1272317Z cvt.u64.u32 %rd153, %r286; 2026-02-21T08:31:14.1272480Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:31:14.1272646Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:31:14.1272956Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1273252Z mov.b64 {%r311, %r312}, %rd155; 2026-02-21T08:31:14.1273422Z cvt.rn.f16x2.f32 %r313, %r312, %r311; 2026-02-21T08:31:14.1273706Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1273993Z cvt.u64.u32 %rd156, %r287; 2026-02-21T08:31:14.1274168Z cvt.u64.u32 %rd157, %r288; 2026-02-21T08:31:14.1274324Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:31:14.1274477Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:31:14.1274779Z .loc 1 55 27 // 
cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1275062Z mov.b64 {%r314, %r315}, %rd159; 2026-02-21T08:31:14.1275233Z cvt.rn.f16x2.f32 %r316, %r315, %r314; 2026-02-21T08:31:14.1275507Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1275790Z cvt.u64.u32 %rd160, %r289; 2026-02-21T08:31:14.1275937Z cvt.u64.u32 %rd161, %r290; 2026-02-21T08:31:14.1276093Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:31:14.1276251Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:31:14.1276544Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1276832Z mov.b64 {%r317, %r318}, %rd163; 2026-02-21T08:31:14.1276999Z cvt.rn.f16x2.f32 %r319, %r318, %r317; 2026-02-21T08:31:14.1277283Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1277559Z cvt.u64.u32 %rd164, %r291; 2026-02-21T08:31:14.1277719Z cvt.u64.u32 %rd165, %r292; 2026-02-21T08:31:14.1277878Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:31:14.1278034Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:31:14.1278300Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1278581Z mov.b64 {%r320, %r321}, %rd167; 2026-02-21T08:31:14.1278753Z cvt.rn.f16x2.f32 %r322, %r321, %r320; 2026-02-21T08:31:14.1279056Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1279341Z cvt.u64.u32 %rd168, %r293; 2026-02-21T08:31:14.1279488Z cvt.u64.u32 %rd169, %r294; 2026-02-21T08:31:14.1279641Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:31:14.1279795Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:31:14.1280054Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1280343Z mov.b64 {%r323, %r324}, %rd171; 2026-02-21T08:31:14.1280501Z cvt.rn.f16x2.f32 %r325, %r324, %r323; 2026-02-21T08:31:14.1280869Z .loc 1 53 52 // 
cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1281142Z cvt.u64.u32 %rd172, %r295; 2026-02-21T08:31:14.1281294Z cvt.u64.u32 %rd173, %r296; 2026-02-21T08:31:14.1281446Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:31:14.1281593Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:31:14.1281861Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1282147Z mov.b64 {%r326, %r327}, %rd175; 2026-02-21T08:31:14.1282320Z cvt.rn.f16x2.f32 %r328, %r327, %r326; 2026-02-21T08:31:14.1282588Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1282870Z cvt.u64.u32 %rd176, %r297; 2026-02-21T08:31:14.1283017Z cvt.u64.u32 %rd177, %r298; 2026-02-21T08:31:14.1283172Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:31:14.1283328Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:31:14.1283598Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1283895Z mov.b64 {%r329, %r330}, %rd179; 2026-02-21T08:31:14.1284061Z cvt.rn.f16x2.f32 %r331, %r330, %r329; 2026-02-21T08:31:14.1284349Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1284665Z cvt.u64.u32 %rd180, %r299; 2026-02-21T08:31:14.1284870Z cvt.u64.u32 %rd181, %r300; 2026-02-21T08:31:14.1285038Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:31:14.1285203Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:31:14.1285483Z .loc 1 55 27 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:55:27 2026-02-21T08:31:14.1285771Z mov.b64 {%r332, %r333}, %rd183; 2026-02-21T08:31:14.1285949Z cvt.rn.f16x2.f32 %r334, %r333, %r332; 2026-02-21T08:31:14.1286231Z .loc 1 56 45 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:56:45 2026-02-21T08:31:14.1286551Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:31:14.1286734Z bar.sync 0, 128; 2026-02-21T08:31:14.1286916Z st.shared.v4.b32 [%r15], {%r313, %r316, %r319, %r322}; 2026-02-21T08:31:14.1287165Z 
st.shared.v4.b32 [%r16], {%r325, %r328, %r331, %r334}; 2026-02-21T08:31:14.1287364Z // begin inline asm 2026-02-21T08:31:14.1287532Z fence.proxy.async.shared::cta; 2026-02-21T08:31:14.1287702Z // end inline asm 2026-02-21T08:31:14.1287844Z bar.sync 0, 128; 2026-02-21T08:31:14.1287990Z elect.sync %r335|%p40, -1; 2026-02-21T08:31:14.1288190Z and.pred %p38, %p14, %p40; 2026-02-21T08:31:14.1288346Z // begin inline asm 2026-02-21T08:31:14.1288624Z @%p38 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd151, {%r302, %r303}], [%r262]; 2026-02-21T08:31:14.1288934Z // end inline asm 2026-02-21T08:31:14.1289082Z cp.async.bulk.commit_group; 2026-02-21T08:31:14.1289367Z .loc 1 27 94 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:27:94 2026-02-21T08:31:14.1289669Z add.s32 %r22, %r341, 148; 2026-02-21T08:31:14.1289833Z add.s32 %r340, %r340, 592; 2026-02-21T08:31:14.1289989Z add.s32 %r339, %r339, 18944; 2026-02-21T08:31:14.1290159Z setp.lt.u32 %p41, %r341, 8044; 2026-02-21T08:31:14.1290320Z mov.b32 %r341, %r22; 2026-02-21T08:31:14.1290473Z @%p41 bra $L__BB0_12; 2026-02-21T08:31:14.1290654Z $L__BB0_13: // %._crit_edge 2026-02-21T08:31:14.1290860Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:31:14.1291069Z bar.sync 0, 128; 2026-02-21T08:31:14.1291328Z .loc 1 27 4 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:27:4 2026-02-21T08:31:14.1291627Z bar.sync 0, 128; 2026-02-21T08:31:14.1291771Z // begin inline asm 2026-02-21T08:31:14.1291972Z @%p14 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r336, 32; 2026-02-21T08:31:14.1292193Z // end inline asm 2026-02-21T08:31:14.1292343Z st.shared.b32 [global_smem+31752], 50529027; 2026-02-21T08:31:14.1292530Z barrier.sync 1; 2026-02-21T08:31:14.1292685Z $L__BB0_14: // %common.ret 2026-02-21T08:31:14.1293011Z .loc 1 0 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:0 2026-02-21T08:31:14.1293284Z ret; 2026-02-21T08:31:14.1293448Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:31:14.1293677Z 
ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T08:31:14.1293885Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T08:31:14.1294188Z .loc 1 19 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:19 2026-02-21T08:31:14.1294467Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:31:14.1294624Z and.b16 %rs2, %rs1, 1; 2026-02-21T08:31:14.1294802Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:31:14.1295077Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1295360Z or.b32 %r5, %r4, 112; 2026-02-21T08:31:14.1295518Z mov.b32 %r26, global_smem; 2026-02-21T08:31:14.1295675Z add.s32 %r27, %r26, %r3; 2026-02-21T08:31:14.1295838Z bra.uni $L__BB0_2; 2026-02-21T08:31:14.1296022Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:31:14.1296337Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1296624Z barrier.sync 1; 2026-02-21T08:31:14.1296785Z barrier.sync 1; 2026-02-21T08:31:14.1296957Z $L__BB0_2: // %.preheader 2026-02-21T08:31:14.1297178Z // =>This Loop Header: Depth=1 2026-02-21T08:31:14.1297408Z // Child Loop BB0_6 Depth 2 2026-02-21T08:31:14.1297708Z .loc 1 19 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:19 2026-02-21T08:31:14.1298004Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:31:14.1298188Z barrier.sync 1; 2026-02-21T08:31:14.1298329Z ld.shared.b8 %r25, [%r27+31748]; 2026-02-21T08:31:14.1298500Z setp.gt.u32 %p2, %r25, 3; 2026-02-21T08:31:14.1298653Z @%p2 bra $L__BB0_4; 2026-02-21T08:31:14.1298822Z // %bb.3: // %.preheader 2026-02-21T08:31:14.1299035Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:31:14.1299249Z $L_brx_0: .branchtargets 2026-02-21T08:31:14.1299402Z $L__BB0_5, 2026-02-21T08:31:14.1299526Z $L__BB0_8, 2026-02-21T08:31:14.1299654Z $L__BB0_9, 2026-02-21T08:31:14.1299773Z $L__BB0_14; 2026-02-21T08:31:14.1299943Z brx.idx %r25, $L_brx_0; 2026-02-21T08:31:14.1300114Z $L__BB0_5: // %.peel.next 2026-02-21T08:31:14.1300331Z // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T08:31:14.1300641Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1300948Z ld.shared.b32 %r136, [global_smem]; 2026-02-21T08:31:14.1301155Z ld.shared.v2.b32 {%r158, %r159}, [global_smem+8]; 2026-02-21T08:31:14.1301349Z barrier.sync 1; 2026-02-21T08:31:14.1301597Z .loc 1 39 45 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:39:45 2026-02-21T08:31:14.1301878Z bfe.u32 %r160, %r1, 1, 4; 2026-02-21T08:31:14.1302138Z .loc 1 41 45 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:41:45 2026-02-21T08:31:14.1302414Z shl.b32 %r161, %r1, 3; 2026-02-21T08:31:14.1302572Z and.b32 %r162, %r161, 8; 2026-02-21T08:31:14.1302858Z .loc 1 39 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:39:32 2026-02-21T08:31:14.1303143Z add.s32 %r163, %r158, %r160; 2026-02-21T08:31:14.1303403Z .loc 1 41 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:41:32 2026-02-21T08:31:14.1303675Z add.s32 %r164, %r159, %r160; 2026-02-21T08:31:14.1303934Z .loc 1 39 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:39:32 2026-02-21T08:31:14.1304205Z shl.b32 %r165, %r163, 11; 2026-02-21T08:31:14.1304495Z .loc 1 51 53 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:53 2026-02-21T08:31:14.1304807Z add.s32 %r166, %r165, 32768; 2026-02-21T08:31:14.1304959Z add.s32 %r167, %r165, 65536; 2026-02-21T08:31:14.1305115Z add.s32 %r168, %r165, 98304; 2026-02-21T08:31:14.1305266Z add.s32 %r169, %r165, 131072; 2026-02-21T08:31:14.1305429Z add.s32 %r170, %r165, 163840; 2026-02-21T08:31:14.1305581Z add.s32 %r171, %r165, 196608; 2026-02-21T08:31:14.1305739Z add.s32 %r172, %r165, 229376; 2026-02-21T08:31:14.1305999Z .loc 1 52 80 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:80 2026-02-21T08:31:14.1306291Z shl.b32 %r173, %r164, 11; 2026-02-21T08:31:14.1306558Z .loc 1 51 60 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:60 2026-02-21T08:31:14.1306846Z 
or.b32 %r174, %r165, %r162; 2026-02-21T08:31:14.1307014Z or.b32 %r175, %r166, %r162; 2026-02-21T08:31:14.1307172Z or.b32 %r176, %r167, %r162; 2026-02-21T08:31:14.1307337Z or.b32 %r177, %r168, %r162; 2026-02-21T08:31:14.1307483Z or.b32 %r178, %r169, %r162; 2026-02-21T08:31:14.1307633Z or.b32 %r179, %r170, %r162; 2026-02-21T08:31:14.1307777Z or.b32 %r180, %r171, %r162; 2026-02-21T08:31:14.1307928Z or.b32 %r181, %r172, %r162; 2026-02-21T08:31:14.1308220Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1308515Z mad.wide.s32 %rd10, %r174, 2, %rd6; 2026-02-21T08:31:14.1308699Z mad.wide.s32 %rd11, %r175, 2, %rd6; 2026-02-21T08:31:14.1308863Z mad.wide.s32 %rd12, %r176, 2, %rd6; 2026-02-21T08:31:14.1309033Z mad.wide.s32 %rd13, %r177, 2, %rd6; 2026-02-21T08:31:14.1309195Z mad.wide.s32 %rd14, %r178, 2, %rd6; 2026-02-21T08:31:14.1309366Z mad.wide.s32 %rd15, %r179, 2, %rd6; 2026-02-21T08:31:14.1309526Z mad.wide.s32 %rd16, %r180, 2, %rd6; 2026-02-21T08:31:14.1309694Z mad.wide.s32 %rd17, %r181, 2, %rd6; 2026-02-21T08:31:14.1309970Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1310252Z shl.b32 %r182, %r1, 4; 2026-02-21T08:31:14.1310405Z and.b32 %r183, %r182, 368; 2026-02-21T08:31:14.1310553Z bfe.s32 %r184, %r1, 3, 1; 2026-02-21T08:31:14.1310707Z and.b32 %r185, %r184, 144; 2026-02-21T08:31:14.1310856Z xor.b32 %r7, %r185, %r183; 2026-02-21T08:31:14.1311012Z add.s32 %r28, %r26, %r7; 2026-02-21T08:31:14.1311153Z mov.b32 %r29, 16; 2026-02-21T08:31:14.1311324Z // begin inline asm 2026-02-21T08:31:14.1311528Z cp.async.cg.shared.global [ %r28 + 0 ], [ %rd10 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1311747Z // end inline asm 2026-02-21T08:31:14.1311884Z add.s32 %r30, %r28, 512; 2026-02-21T08:31:14.1312026Z // begin inline asm 2026-02-21T08:31:14.1312226Z cp.async.cg.shared.global [ %r30 + 0 ], [ %rd11 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1312438Z // end inline asm 
2026-02-21T08:31:14.1312576Z add.s32 %r32, %r28, 1024; 2026-02-21T08:31:14.1312724Z // begin inline asm 2026-02-21T08:31:14.1312917Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd12 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1313135Z // end inline asm 2026-02-21T08:31:14.1313267Z add.s32 %r34, %r28, 1536; 2026-02-21T08:31:14.1313416Z // begin inline asm 2026-02-21T08:31:14.1313601Z cp.async.cg.shared.global [ %r34 + 0 ], [ %rd13 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1313819Z // end inline asm 2026-02-21T08:31:14.1313998Z add.s32 %r36, %r28, 2048; 2026-02-21T08:31:14.1314155Z // begin inline asm 2026-02-21T08:31:14.1314338Z cp.async.cg.shared.global [ %r36 + 0 ], [ %rd14 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1314554Z // end inline asm 2026-02-21T08:31:14.1314715Z add.s32 %r38, %r28, 2560; 2026-02-21T08:31:14.1314867Z // begin inline asm 2026-02-21T08:31:14.1315057Z cp.async.cg.shared.global [ %r38 + 0 ], [ %rd15 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1315266Z // end inline asm 2026-02-21T08:31:14.1315410Z add.s32 %r40, %r28, 3072; 2026-02-21T08:31:14.1315557Z // begin inline asm 2026-02-21T08:31:14.1315790Z cp.async.cg.shared.global [ %r40 + 0 ], [ %rd16 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1315998Z // end inline asm 2026-02-21T08:31:14.1316146Z add.s32 %r42, %r28, 3584; 2026-02-21T08:31:14.1316294Z // begin inline asm 2026-02-21T08:31:14.1316488Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd17 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1316702Z // end inline asm 2026-02-21T08:31:14.1316838Z cp.async.commit_group; 2026-02-21T08:31:14.1317103Z .loc 1 52 59 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:59 2026-02-21T08:31:14.1317386Z or.b32 %r186, %r173, %r162; 2026-02-21T08:31:14.1317652Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1317939Z mad.wide.s32 %rd18, %r186, 2, %rd7; 2026-02-21T08:31:14.1318213Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 
2026-02-21T08:31:14.1318502Z add.s32 %r187, %r26, 28672; 2026-02-21T08:31:14.1318656Z add.s32 %r44, %r187, %r7; 2026-02-21T08:31:14.1318808Z // begin inline asm 2026-02-21T08:31:14.1318992Z cp.async.cg.shared.global [ %r44 + 0 ], [ %rd18 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1319209Z // end inline asm 2026-02-21T08:31:14.1319343Z cp.async.commit_group; 2026-02-21T08:31:14.1319632Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1319920Z cvt.s64.s32 %rd77, %r165; 2026-02-21T08:31:14.1320077Z cvt.u64.u32 %rd78, %r162; 2026-02-21T08:31:14.1320229Z or.b64 %rd79, %rd77, %rd78; 2026-02-21T08:31:14.1320378Z shl.b64 %rd80, %rd79, 1; 2026-02-21T08:31:14.1320533Z add.s64 %rd81, %rd6, %rd80; 2026-02-21T08:31:14.1320681Z add.s64 %rd19, %rd81, 32; 2026-02-21T08:31:14.1320831Z cvt.s64.s32 %rd82, %r166; 2026-02-21T08:31:14.1320974Z or.b64 %rd83, %rd82, %rd78; 2026-02-21T08:31:14.1321127Z shl.b64 %rd84, %rd83, 1; 2026-02-21T08:31:14.1321271Z add.s64 %rd85, %rd6, %rd84; 2026-02-21T08:31:14.1321425Z add.s64 %rd20, %rd85, 32; 2026-02-21T08:31:14.1321570Z cvt.s64.s32 %rd86, %r167; 2026-02-21T08:31:14.1321720Z or.b64 %rd87, %rd86, %rd78; 2026-02-21T08:31:14.1321874Z shl.b64 %rd88, %rd87, 1; 2026-02-21T08:31:14.1322018Z add.s64 %rd89, %rd6, %rd88; 2026-02-21T08:31:14.1322171Z add.s64 %rd21, %rd89, 32; 2026-02-21T08:31:14.1322315Z cvt.s64.s32 %rd90, %r168; 2026-02-21T08:31:14.1322469Z or.b64 %rd91, %rd90, %rd78; 2026-02-21T08:31:14.1322614Z shl.b64 %rd92, %rd91, 1; 2026-02-21T08:31:14.1322796Z add.s64 %rd93, %rd6, %rd92; 2026-02-21T08:31:14.1322941Z add.s64 %rd22, %rd93, 32; 2026-02-21T08:31:14.1323094Z cvt.s64.s32 %rd94, %r169; 2026-02-21T08:31:14.1323237Z or.b64 %rd95, %rd94, %rd78; 2026-02-21T08:31:14.1323390Z shl.b64 %rd96, %rd95, 1; 2026-02-21T08:31:14.1323541Z add.s64 %rd97, %rd6, %rd96; 2026-02-21T08:31:14.1323686Z add.s64 %rd23, %rd97, 32; 2026-02-21T08:31:14.1323838Z cvt.s64.s32 %rd98, %r170; 2026-02-21T08:31:14.1323981Z or.b64 
%rd99, %rd98, %rd78; 2026-02-21T08:31:14.1324139Z shl.b64 %rd100, %rd99, 1; 2026-02-21T08:31:14.1324288Z add.s64 %rd101, %rd6, %rd100; 2026-02-21T08:31:14.1324454Z add.s64 %rd24, %rd101, 32; 2026-02-21T08:31:14.1324605Z cvt.s64.s32 %rd102, %r171; 2026-02-21T08:31:14.1324810Z or.b64 %rd103, %rd102, %rd78; 2026-02-21T08:31:14.1324964Z shl.b64 %rd104, %rd103, 1; 2026-02-21T08:31:14.1325119Z add.s64 %rd105, %rd6, %rd104; 2026-02-21T08:31:14.1325302Z add.s64 %rd25, %rd105, 32; 2026-02-21T08:31:14.1325451Z cvt.s64.s32 %rd106, %r172; 2026-02-21T08:31:14.1325605Z or.b64 %rd107, %rd106, %rd78; 2026-02-21T08:31:14.1325753Z shl.b64 %rd108, %rd107, 1; 2026-02-21T08:31:14.1325906Z add.s64 %rd109, %rd6, %rd108; 2026-02-21T08:31:14.1326056Z add.s64 %rd26, %rd109, 32; 2026-02-21T08:31:14.1326316Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1326598Z bar.warp.sync -1; 2026-02-21T08:31:14.1326745Z add.s32 %r46, %r28, 4096; 2026-02-21T08:31:14.1326928Z // begin inline asm 2026-02-21T08:31:14.1327116Z cp.async.cg.shared.global [ %r46 + 0 ], [ %rd19 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1327339Z // end inline asm 2026-02-21T08:31:14.1327474Z add.s32 %r48, %r28, 4608; 2026-02-21T08:31:14.1327631Z // begin inline asm 2026-02-21T08:31:14.1327829Z cp.async.cg.shared.global [ %r48 + 0 ], [ %rd20 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1328061Z // end inline asm 2026-02-21T08:31:14.1328198Z add.s32 %r50, %r28, 5120; 2026-02-21T08:31:14.1328356Z // begin inline asm 2026-02-21T08:31:14.1328553Z cp.async.cg.shared.global [ %r50 + 0 ], [ %rd21 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1328779Z // end inline asm 2026-02-21T08:31:14.1328920Z add.s32 %r52, %r28, 5632; 2026-02-21T08:31:14.1329069Z // begin inline asm 2026-02-21T08:31:14.1329265Z cp.async.cg.shared.global [ %r52 + 0 ], [ %rd22 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1329481Z // end inline asm 2026-02-21T08:31:14.1329619Z add.s32 %r54, %r28, 6144; 2026-02-21T08:31:14.1329768Z // begin 
inline asm 2026-02-21T08:31:14.1329965Z cp.async.cg.shared.global [ %r54 + 0 ], [ %rd23 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1330185Z // end inline asm 2026-02-21T08:31:14.1330326Z add.s32 %r56, %r28, 6656; 2026-02-21T08:31:14.1330479Z // begin inline asm 2026-02-21T08:31:14.1330698Z cp.async.cg.shared.global [ %r56 + 0 ], [ %rd24 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1330926Z // end inline asm 2026-02-21T08:31:14.1331064Z add.s32 %r58, %r28, 7168; 2026-02-21T08:31:14.1331221Z // begin inline asm 2026-02-21T08:31:14.1331412Z cp.async.cg.shared.global [ %r58 + 0 ], [ %rd25 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1331638Z // end inline asm 2026-02-21T08:31:14.1331770Z add.s32 %r60, %r28, 7680; 2026-02-21T08:31:14.1331926Z // begin inline asm 2026-02-21T08:31:14.1332122Z cp.async.cg.shared.global [ %r60 + 0 ], [ %rd26 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1332344Z // end inline asm 2026-02-21T08:31:14.1332492Z cp.async.commit_group; 2026-02-21T08:31:14.1332761Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1333064Z cvt.s64.s32 %rd110, %r173; 2026-02-21T08:31:14.1333225Z or.b64 %rd111, %rd110, %rd78; 2026-02-21T08:31:14.1333395Z shl.b64 %rd112, %rd111, 1; 2026-02-21T08:31:14.1333554Z add.s64 %rd113, %rd7, %rd112; 2026-02-21T08:31:14.1333726Z add.s64 %rd27, %rd113, 32; 2026-02-21T08:31:14.1334000Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1334331Z add.s32 %r62, %r28, 29184; 2026-02-21T08:31:14.1334491Z // begin inline asm 2026-02-21T08:31:14.1334727Z cp.async.cg.shared.global [ %r62 + 0 ], [ %rd27 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1334954Z // end inline asm 2026-02-21T08:31:14.1335094Z cp.async.commit_group; 2026-02-21T08:31:14.1335364Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1335641Z add.s64 %rd28, %rd81, 64; 2026-02-21T08:31:14.1335795Z add.s64 %rd29, %rd85, 64; 2026-02-21T08:31:14.1335948Z 
add.s64 %rd30, %rd89, 64; 2026-02-21T08:31:14.1336094Z add.s64 %rd31, %rd93, 64; 2026-02-21T08:31:14.1336247Z add.s64 %rd32, %rd97, 64; 2026-02-21T08:31:14.1336393Z add.s64 %rd33, %rd101, 64; 2026-02-21T08:31:14.1336546Z add.s64 %rd34, %rd105, 64; 2026-02-21T08:31:14.1336696Z add.s64 %rd35, %rd109, 64; 2026-02-21T08:31:14.1336986Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1337270Z bar.warp.sync -1; 2026-02-21T08:31:14.1337415Z add.s32 %r64, %r28, 8192; 2026-02-21T08:31:14.1337561Z // begin inline asm 2026-02-21T08:31:14.1337746Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd28 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1337962Z // end inline asm 2026-02-21T08:31:14.1338090Z add.s32 %r66, %r28, 8704; 2026-02-21T08:31:14.1338237Z // begin inline asm 2026-02-21T08:31:14.1338421Z cp.async.cg.shared.global [ %r66 + 0 ], [ %rd29 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1338674Z // end inline asm 2026-02-21T08:31:14.1338803Z add.s32 %r68, %r28, 9216; 2026-02-21T08:31:14.1338952Z // begin inline asm 2026-02-21T08:31:14.1339134Z cp.async.cg.shared.global [ %r68 + 0 ], [ %rd30 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1339354Z // end inline asm 2026-02-21T08:31:14.1339490Z add.s32 %r70, %r28, 9728; 2026-02-21T08:31:14.1339631Z // begin inline asm 2026-02-21T08:31:14.1339822Z cp.async.cg.shared.global [ %r70 + 0 ], [ %rd31 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1340029Z // end inline asm 2026-02-21T08:31:14.1340168Z add.s32 %r72, %r28, 10240; 2026-02-21T08:31:14.1340312Z // begin inline asm 2026-02-21T08:31:14.1340501Z cp.async.cg.shared.global [ %r72 + 0 ], [ %rd32 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1340710Z // end inline asm 2026-02-21T08:31:14.1340847Z add.s32 %r74, %r28, 10752; 2026-02-21T08:31:14.1341001Z // begin inline asm 2026-02-21T08:31:14.1341180Z cp.async.cg.shared.global [ %r74 + 0 ], [ %rd33 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1341397Z // end inline asm 2026-02-21T08:31:14.1341530Z add.s32 %r76, %r28, 11264; 
2026-02-21T08:31:14.1341681Z // begin inline asm 2026-02-21T08:31:14.1341862Z cp.async.cg.shared.global [ %r76 + 0 ], [ %rd34 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1342077Z // end inline asm 2026-02-21T08:31:14.1342233Z add.s32 %r78, %r28, 11776; 2026-02-21T08:31:14.1342448Z // begin inline asm 2026-02-21T08:31:14.1342664Z cp.async.cg.shared.global [ %r78 + 0 ], [ %rd35 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1342915Z // end inline asm 2026-02-21T08:31:14.1343108Z cp.async.commit_group; 2026-02-21T08:31:14.1343394Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1343718Z add.s64 %rd36, %rd113, 64; 2026-02-21T08:31:14.1344014Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1344359Z add.s32 %r80, %r28, 29696; 2026-02-21T08:31:14.1344534Z // begin inline asm 2026-02-21T08:31:14.1344779Z cp.async.cg.shared.global [ %r80 + 0 ], [ %rd36 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1345026Z // end inline asm 2026-02-21T08:31:14.1345175Z cp.async.commit_group; 2026-02-21T08:31:14.1345463Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1345785Z add.s64 %rd37, %rd81, 96; 2026-02-21T08:31:14.1345942Z add.s64 %rd38, %rd85, 96; 2026-02-21T08:31:14.1346092Z add.s64 %rd39, %rd89, 96; 2026-02-21T08:31:14.1346274Z add.s64 %rd40, %rd93, 96; 2026-02-21T08:31:14.1346421Z add.s64 %rd41, %rd97, 96; 2026-02-21T08:31:14.1346581Z add.s64 %rd42, %rd101, 96; 2026-02-21T08:31:14.1346743Z add.s64 %rd43, %rd105, 96; 2026-02-21T08:31:14.1346896Z add.s64 %rd44, %rd109, 96; 2026-02-21T08:31:14.1347171Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1347475Z bar.warp.sync -1; 2026-02-21T08:31:14.1347623Z add.s32 %r82, %r28, 12288; 2026-02-21T08:31:14.1347771Z // begin inline asm 2026-02-21T08:31:14.1347966Z cp.async.cg.shared.global [ %r82 + 0 ], [ %rd37 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1348194Z // 
end inline asm 2026-02-21T08:31:14.1348335Z add.s32 %r84, %r28, 12800; 2026-02-21T08:31:14.1348480Z // begin inline asm 2026-02-21T08:31:14.1348672Z cp.async.cg.shared.global [ %r84 + 0 ], [ %rd38 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1348910Z // end inline asm 2026-02-21T08:31:14.1349072Z add.s32 %r86, %r28, 13312; 2026-02-21T08:31:14.1349229Z // begin inline asm 2026-02-21T08:31:14.1349417Z cp.async.cg.shared.global [ %r86 + 0 ], [ %rd39 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1349658Z // end inline asm 2026-02-21T08:31:14.1349790Z add.s32 %r88, %r28, 13824; 2026-02-21T08:31:14.1349944Z // begin inline asm 2026-02-21T08:31:14.1350131Z cp.async.cg.shared.global [ %r88 + 0 ], [ %rd40 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1350369Z // end inline asm 2026-02-21T08:31:14.1350509Z add.s32 %r90, %r28, 14336; 2026-02-21T08:31:14.1350690Z // begin inline asm 2026-02-21T08:31:14.1350890Z cp.async.cg.shared.global [ %r90 + 0 ], [ %rd41 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1351120Z // end inline asm 2026-02-21T08:31:14.1351274Z add.s32 %r92, %r28, 14848; 2026-02-21T08:31:14.1351423Z // begin inline asm 2026-02-21T08:31:14.1351617Z cp.async.cg.shared.global [ %r92 + 0 ], [ %rd42 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1351834Z // end inline asm 2026-02-21T08:31:14.1351977Z add.s32 %r94, %r28, 15360; 2026-02-21T08:31:14.1352137Z // begin inline asm 2026-02-21T08:31:14.1352324Z cp.async.cg.shared.global [ %r94 + 0 ], [ %rd43 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1352547Z // end inline asm 2026-02-21T08:31:14.1352685Z add.s32 %r96, %r28, 15872; 2026-02-21T08:31:14.1352859Z // begin inline asm 2026-02-21T08:31:14.1353044Z cp.async.cg.shared.global [ %r96 + 0 ], [ %rd44 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1353263Z // end inline asm 2026-02-21T08:31:14.1353401Z cp.async.commit_group; 2026-02-21T08:31:14.1353669Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1353967Z add.s64 %rd45, %rd113, 96; 2026-02-21T08:31:14.1354228Z .loc 
1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1354551Z add.s32 %r98, %r28, 30208; 2026-02-21T08:31:14.1354721Z // begin inline asm 2026-02-21T08:31:14.1354914Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd45 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1355129Z // end inline asm 2026-02-21T08:31:14.1355271Z cp.async.commit_group; 2026-02-21T08:31:14.1355530Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1355824Z add.s64 %rd46, %rd81, 128; 2026-02-21T08:31:14.1355979Z add.s64 %rd47, %rd85, 128; 2026-02-21T08:31:14.1356127Z add.s64 %rd48, %rd89, 128; 2026-02-21T08:31:14.1356281Z add.s64 %rd49, %rd93, 128; 2026-02-21T08:31:14.1356428Z add.s64 %rd50, %rd97, 128; 2026-02-21T08:31:14.1356585Z add.s64 %rd51, %rd101, 128; 2026-02-21T08:31:14.1356738Z add.s64 %rd52, %rd105, 128; 2026-02-21T08:31:14.1356897Z add.s64 %rd53, %rd109, 128; 2026-02-21T08:31:14.1357158Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1357454Z bar.warp.sync -1; 2026-02-21T08:31:14.1357599Z add.s32 %r100, %r28, 16384; 2026-02-21T08:31:14.1357748Z // begin inline asm 2026-02-21T08:31:14.1357994Z cp.async.cg.shared.global [ %r100 + 0 ], [ %rd46 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1358207Z // end inline asm 2026-02-21T08:31:14.1358346Z add.s32 %r102, %r28, 16896; 2026-02-21T08:31:14.1358493Z // begin inline asm 2026-02-21T08:31:14.1358687Z cp.async.cg.shared.global [ %r102 + 0 ], [ %rd47 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1358899Z // end inline asm 2026-02-21T08:31:14.1359035Z add.s32 %r104, %r28, 17408; 2026-02-21T08:31:14.1359180Z // begin inline asm 2026-02-21T08:31:14.1359373Z cp.async.cg.shared.global [ %r104 + 0 ], [ %rd48 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1359594Z // end inline asm 2026-02-21T08:31:14.1359725Z add.s32 %r106, %r28, 17920; 2026-02-21T08:31:14.1359880Z // begin inline asm 2026-02-21T08:31:14.1360065Z cp.async.cg.shared.global [ 
%r106 + 0 ], [ %rd49 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1360289Z // end inline asm 2026-02-21T08:31:14.1360420Z add.s32 %r108, %r28, 18432; 2026-02-21T08:31:14.1360599Z // begin inline asm 2026-02-21T08:31:14.1360785Z cp.async.cg.shared.global [ %r108 + 0 ], [ %rd50 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1361005Z // end inline asm 2026-02-21T08:31:14.1361140Z add.s32 %r110, %r28, 18944; 2026-02-21T08:31:14.1361285Z // begin inline asm 2026-02-21T08:31:14.1361473Z cp.async.cg.shared.global [ %r110 + 0 ], [ %rd51 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1361686Z // end inline asm 2026-02-21T08:31:14.1361823Z add.s32 %r112, %r28, 19456; 2026-02-21T08:31:14.1361974Z // begin inline asm 2026-02-21T08:31:14.1362165Z cp.async.cg.shared.global [ %r112 + 0 ], [ %rd52 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1362422Z // end inline asm 2026-02-21T08:31:14.1362562Z add.s32 %r114, %r28, 19968; 2026-02-21T08:31:14.1362715Z // begin inline asm 2026-02-21T08:31:14.1362899Z cp.async.cg.shared.global [ %r114 + 0 ], [ %rd53 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1363123Z // end inline asm 2026-02-21T08:31:14.1363255Z cp.async.commit_group; 2026-02-21T08:31:14.1363519Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1363809Z add.s64 %rd54, %rd113, 128; 2026-02-21T08:31:14.1364074Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1364365Z add.s32 %r116, %r28, 30720; 2026-02-21T08:31:14.1364519Z // begin inline asm 2026-02-21T08:31:14.1364747Z cp.async.cg.shared.global [ %r116 + 0 ], [ %rd54 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1364961Z // end inline asm 2026-02-21T08:31:14.1365104Z cp.async.commit_group; 2026-02-21T08:31:14.1365358Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1365643Z add.s64 %rd55, %rd81, 160; 2026-02-21T08:31:14.1365792Z add.s64 %rd56, %rd85, 160; 2026-02-21T08:31:14.1365946Z add.s64 %rd57, %rd89, 160; 
2026-02-21T08:31:14.1366122Z add.s64 %rd58, %rd93, 160; 2026-02-21T08:31:14.1366277Z add.s64 %rd59, %rd97, 160; 2026-02-21T08:31:14.1366433Z add.s64 %rd60, %rd101, 160; 2026-02-21T08:31:14.1366582Z add.s64 %rd61, %rd105, 160; 2026-02-21T08:31:14.1366735Z add.s64 %rd62, %rd109, 160; 2026-02-21T08:31:14.1366983Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1367271Z bar.warp.sync -1; 2026-02-21T08:31:14.1367405Z add.s32 %r118, %r28, 20480; 2026-02-21T08:31:14.1367557Z // begin inline asm 2026-02-21T08:31:14.1367742Z cp.async.cg.shared.global [ %r118 + 0 ], [ %rd55 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1367964Z // end inline asm 2026-02-21T08:31:14.1368103Z add.s32 %r120, %r28, 20992; 2026-02-21T08:31:14.1368248Z // begin inline asm 2026-02-21T08:31:14.1368438Z cp.async.cg.shared.global [ %r120 + 0 ], [ %rd56 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1368650Z // end inline asm 2026-02-21T08:31:14.1368786Z add.s32 %r122, %r28, 21504; 2026-02-21T08:31:14.1368931Z // begin inline asm 2026-02-21T08:31:14.1369121Z cp.async.cg.shared.global [ %r122 + 0 ], [ %rd57 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1369358Z // end inline asm 2026-02-21T08:31:14.1369494Z add.s32 %r124, %r28, 22016; 2026-02-21T08:31:14.1369637Z // begin inline asm 2026-02-21T08:31:14.1369825Z cp.async.cg.shared.global [ %r124 + 0 ], [ %rd58 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1370044Z // end inline asm 2026-02-21T08:31:14.1370172Z add.s32 %r126, %r28, 22528; 2026-02-21T08:31:14.1370326Z // begin inline asm 2026-02-21T08:31:14.1370508Z cp.async.cg.shared.global [ %r126 + 0 ], [ %rd59 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1370733Z // end inline asm 2026-02-21T08:31:14.1370867Z add.s32 %r128, %r28, 23040; 2026-02-21T08:31:14.1371025Z // begin inline asm 2026-02-21T08:31:14.1371216Z cp.async.cg.shared.global [ %r128 + 0 ], [ %rd60 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1371445Z // end inline asm 2026-02-21T08:31:14.1371588Z add.s32 %r130, %r28, 23552; 
2026-02-21T08:31:14.1371745Z // begin inline asm 2026-02-21T08:31:14.1371975Z cp.async.cg.shared.global [ %r130 + 0 ], [ %rd61 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1372197Z // end inline asm 2026-02-21T08:31:14.1372339Z add.s32 %r132, %r28, 24064; 2026-02-21T08:31:14.1372492Z // begin inline asm 2026-02-21T08:31:14.1372690Z cp.async.cg.shared.global [ %r132 + 0 ], [ %rd62 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1372915Z // end inline asm 2026-02-21T08:31:14.1373059Z cp.async.commit_group; 2026-02-21T08:31:14.1373333Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1373627Z add.s64 %rd63, %rd113, 160; 2026-02-21T08:31:14.1373937Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1374234Z add.s32 %r134, %r28, 31232; 2026-02-21T08:31:14.1374395Z // begin inline asm 2026-02-21T08:31:14.1374587Z cp.async.cg.shared.global [ %r134 + 0 ], [ %rd63 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1374851Z // end inline asm 2026-02-21T08:31:14.1374990Z cp.async.commit_group; 2026-02-21T08:31:14.1375268Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1375579Z cp.async.wait_group 10; 2026-02-21T08:31:14.1375738Z bar.warp.sync -1; 2026-02-21T08:31:14.1376002Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1376307Z elect.sync %r188|%p4, -1; 2026-02-21T08:31:14.1376477Z bfe.u32 %r189, %r26, 4, 14; 2026-02-21T08:31:14.1376634Z cvt.u64.u32 %rd114, %r189; 2026-02-21T08:31:14.1376817Z or.b64 %rd64, %rd114, -4611685949691133952; 2026-02-21T08:31:14.1377013Z bfe.u32 %r190, %r187, 4, 14; 2026-02-21T08:31:14.1377173Z cvt.u64.u32 %rd115, %r190; 2026-02-21T08:31:14.1377350Z or.b64 %rd65, %rd115, -4611685949705814016; 2026-02-21T08:31:14.1377534Z mov.b32 %r137, 134479888; 2026-02-21T08:31:14.1377696Z mov.pred %p3, 0; 2026-02-21T08:31:14.1377865Z // begin inline asm 2026-02-21T08:31:14.1378107Z @%p4 
tcgen05.mma.cta_group::1.kind::f16 [ %r136 + 0 ], %rd64, %rd65, %r137, %p3; 2026-02-21T08:31:14.1378351Z // end inline asm 2026-02-21T08:31:14.1378490Z add.s32 %r191, %r26, 31744; 2026-02-21T08:31:14.1378640Z cvt.u64.u32 %rd66, %r191; 2026-02-21T08:31:14.1378793Z // begin inline asm 2026-02-21T08:31:14.1378995Z @%p3 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T08:31:14.1379212Z // end inline asm 2026-02-21T08:31:14.1379456Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1379738Z add.s64 %rd67, %rd81, 192; 2026-02-21T08:31:14.1379892Z add.s64 %rd68, %rd85, 192; 2026-02-21T08:31:14.1380038Z add.s64 %rd69, %rd89, 192; 2026-02-21T08:31:14.1380190Z add.s64 %rd70, %rd93, 192; 2026-02-21T08:31:14.1380335Z add.s64 %rd71, %rd97, 192; 2026-02-21T08:31:14.1380487Z add.s64 %rd72, %rd101, 192; 2026-02-21T08:31:14.1380645Z add.s64 %rd73, %rd105, 192; 2026-02-21T08:31:14.1380794Z add.s64 %rd74, %rd109, 192; 2026-02-21T08:31:14.1381052Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1381359Z bar.warp.sync -1; 2026-02-21T08:31:14.1381506Z // begin inline asm 2026-02-21T08:31:14.1381697Z cp.async.cg.shared.global [ %r28 + 0 ], [ %rd67 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1381927Z // end inline asm 2026-02-21T08:31:14.1382059Z // begin inline asm 2026-02-21T08:31:14.1382252Z cp.async.cg.shared.global [ %r30 + 0 ], [ %rd68 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1382470Z // end inline asm 2026-02-21T08:31:14.1382600Z // begin inline asm 2026-02-21T08:31:14.1382790Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd69 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1382997Z // end inline asm 2026-02-21T08:31:14.1383130Z // begin inline asm 2026-02-21T08:31:14.1383311Z cp.async.cg.shared.global [ %r34 + 0 ], [ %rd70 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1383530Z // end inline asm 2026-02-21T08:31:14.1383658Z // begin inline asm 2026-02-21T08:31:14.1383871Z 
cp.async.cg.shared.global [ %r36 + 0 ], [ %rd71 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1384088Z // end inline asm 2026-02-21T08:31:14.1384216Z // begin inline asm 2026-02-21T08:31:14.1384402Z cp.async.cg.shared.global [ %r38 + 0 ], [ %rd72 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1384607Z // end inline asm 2026-02-21T08:31:14.1384779Z // begin inline asm 2026-02-21T08:31:14.1384960Z cp.async.cg.shared.global [ %r40 + 0 ], [ %rd73 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1385177Z // end inline asm 2026-02-21T08:31:14.1385306Z // begin inline asm 2026-02-21T08:31:14.1385522Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd74 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1385737Z // end inline asm 2026-02-21T08:31:14.1385871Z cp.async.commit_group; 2026-02-21T08:31:14.1386131Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 2026-02-21T08:31:14.1386419Z add.s64 %rd75, %rd113, 192; 2026-02-21T08:31:14.1386692Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1386969Z // begin inline asm 2026-02-21T08:31:14.1387158Z cp.async.cg.shared.global [ %r44 + 0 ], [ %rd75 + 0 ], 0x10, %r29; 2026-02-21T08:31:14.1387366Z // end inline asm 2026-02-21T08:31:14.1387506Z cp.async.commit_group; 2026-02-21T08:31:14.1387765Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1388043Z add.s32 %r192, %r5, %r173; 2026-02-21T08:31:14.1388201Z cvt.u64.u32 %rd1, %r192; 2026-02-21T08:31:14.1388350Z add.s32 %r193, %r4, %r165; 2026-02-21T08:31:14.1388503Z cvt.u64.u32 %rd2, %r193; 2026-02-21T08:31:14.1388645Z mov.b32 %r337, 0; 2026-02-21T08:31:14.1388782Z mov.b64 %rd184, 0; 2026-02-21T08:31:14.1388911Z mov.b32 %r338, %r337; 2026-02-21T08:31:14.1389099Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:31:14.1389373Z // => This Inner Loop Header: Depth=2 2026-02-21T08:31:14.1389581Z add.s64 %rd4, %rd184, 16; 2026-02-21T08:31:14.1389746Z setp.lt.u64 %p9, %rd4, 1952; 
2026-02-21T08:31:14.1389901Z add.s32 %r214, %r337, 1; 2026-02-21T08:31:14.1390057Z setp.gt.s32 %p10, %r214, 5; 2026-02-21T08:31:14.1390216Z selp.b32 %r337, 0, %r214, %p10; 2026-02-21T08:31:14.1390492Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1390774Z cp.async.wait_group 10; 2026-02-21T08:31:14.1390934Z bar.warp.sync -1; 2026-02-21T08:31:14.1391080Z shl.b32 %r215, %r337, 12; 2026-02-21T08:31:14.1391232Z add.s32 %r217, %r26, %r215; 2026-02-21T08:31:14.1391506Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1391781Z shl.b32 %r218, %r337, 9; 2026-02-21T08:31:14.1391933Z add.s32 %r219, %r26, %r218; 2026-02-21T08:31:14.1392085Z add.s32 %r220, %r219, 28672; 2026-02-21T08:31:14.1392347Z .loc 1 53 52 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:53:52 2026-02-21T08:31:14.1392658Z setp.eq.b64 %p11, %rd184, 2016; 2026-02-21T08:31:14.1392823Z elect.sync %r221|%p7, -1; 2026-02-21T08:31:14.1392981Z bfe.u32 %r222, %r217, 4, 14; 2026-02-21T08:31:14.1393132Z cvt.u64.u32 %rd128, %r222; 2026-02-21T08:31:14.1393303Z or.b64 %rd116, %rd128, -4611685949691133952; 2026-02-21T08:31:14.1393479Z bfe.u32 %r223, %r220, 4, 14; 2026-02-21T08:31:14.1393635Z cvt.u64.u32 %rd129, %r223; 2026-02-21T08:31:14.1393796Z or.b64 %rd117, %rd129, -4611685949705814016; 2026-02-21T08:31:14.1393980Z mov.pred %p6, -1; 2026-02-21T08:31:14.1394114Z // begin inline asm 2026-02-21T08:31:14.1394345Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r136 + 0 ], %rd116, %rd117, %r137, %p6; 2026-02-21T08:31:14.1394600Z // end inline asm 2026-02-21T08:31:14.1394777Z and.pred %p8, %p11, %p7; 2026-02-21T08:31:14.1394937Z // begin inline asm 2026-02-21T08:31:14.1395137Z @%p8 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T08:31:14.1395443Z // end inline asm 2026-02-21T08:31:14.1395684Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 
2026-02-21T08:31:14.1395977Z add.s32 %r225, %r338, 1; 2026-02-21T08:31:14.1396136Z setp.gt.s32 %p12, %r225, 5; 2026-02-21T08:31:14.1396293Z selp.b32 %r338, 0, %r225, %p12; 2026-02-21T08:31:14.1396561Z .loc 1 51 60 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:60 2026-02-21T08:31:14.1396845Z add.s64 %rd130, %rd2, %rd184; 2026-02-21T08:31:14.1397042Z cvt.u32.u64 %r226, %rd130; 2026-02-21T08:31:14.1397200Z add.s32 %r227, %r226, 112; 2026-02-21T08:31:14.1397465Z .loc 1 51 32 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:32 2026-02-21T08:31:14.1397763Z mad.wide.s32 %rd119, %r227, 2, %rd6; 2026-02-21T08:31:14.1397947Z add.s32 %r228, %r226, 32880; 2026-02-21T08:31:14.1398113Z mad.wide.s32 %rd120, %r228, 2, %rd6; 2026-02-21T08:31:14.1398285Z add.s32 %r229, %r226, 65648; 2026-02-21T08:31:14.1398451Z mad.wide.s32 %rd121, %r229, 2, %rd6; 2026-02-21T08:31:14.1398618Z add.s32 %r230, %r226, 98416; 2026-02-21T08:31:14.1398781Z mad.wide.s32 %rd122, %r230, 2, %rd6; 2026-02-21T08:31:14.1398950Z add.s32 %r231, %r226, 131184; 2026-02-21T08:31:14.1399117Z mad.wide.s32 %rd123, %r231, 2, %rd6; 2026-02-21T08:31:14.1399284Z add.s32 %r232, %r226, 163952; 2026-02-21T08:31:14.1399451Z mad.wide.s32 %rd124, %r232, 2, %rd6; 2026-02-21T08:31:14.1399626Z add.s32 %r233, %r226, 196720; 2026-02-21T08:31:14.1399787Z mad.wide.s32 %rd125, %r233, 2, %rd6; 2026-02-21T08:31:14.1399961Z add.s32 %r234, %r226, 229488; 2026-02-21T08:31:14.1400116Z mad.wide.s32 %rd126, %r234, 2, %rd6; 2026-02-21T08:31:14.1400395Z .loc 1 51 85 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:51:85 2026-02-21T08:31:14.1400728Z shl.b32 %r235, %r338, 12; 2026-02-21T08:31:14.1400888Z add.s32 %r236, %r26, %r235; 2026-02-21T08:31:14.1401043Z bar.warp.sync -1; 2026-02-21T08:31:14.1401198Z add.s32 %r196, %r236, %r7; 2026-02-21T08:31:14.1401362Z selp.b32 %r197, 16, 0, %p9; 2026-02-21T08:31:14.1401512Z // begin inline asm 2026-02-21T08:31:14.1401716Z cp.async.cg.shared.global [ %r196 + 0 ], [ 
%rd119 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1401939Z // end inline asm 2026-02-21T08:31:14.1402078Z add.s32 %r198, %r196, 512; 2026-02-21T08:31:14.1402223Z // begin inline asm 2026-02-21T08:31:14.1402420Z cp.async.cg.shared.global [ %r198 + 0 ], [ %rd120 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1402637Z // end inline asm 2026-02-21T08:31:14.1402777Z add.s32 %r200, %r196, 1024; 2026-02-21T08:31:14.1402934Z // begin inline asm 2026-02-21T08:31:14.1403123Z cp.async.cg.shared.global [ %r200 + 0 ], [ %rd121 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1403345Z // end inline asm 2026-02-21T08:31:14.1403476Z add.s32 %r202, %r196, 1536; 2026-02-21T08:31:14.1403631Z // begin inline asm 2026-02-21T08:31:14.1403820Z cp.async.cg.shared.global [ %r202 + 0 ], [ %rd122 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1404090Z // end inline asm 2026-02-21T08:31:14.1404220Z add.s32 %r204, %r196, 2048; 2026-02-21T08:31:14.1404376Z // begin inline asm 2026-02-21T08:31:14.1404562Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd123 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1404821Z // end inline asm 2026-02-21T08:31:14.1404958Z add.s32 %r206, %r196, 2560; 2026-02-21T08:31:14.1405104Z // begin inline asm 2026-02-21T08:31:14.1405303Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd124 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1405519Z // end inline asm 2026-02-21T08:31:14.1405659Z add.s32 %r208, %r196, 3072; 2026-02-21T08:31:14.1405806Z // begin inline asm 2026-02-21T08:31:14.1405999Z cp.async.cg.shared.global [ %r208 + 0 ], [ %rd125 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1406211Z // end inline asm 2026-02-21T08:31:14.1406346Z add.s32 %r210, %r196, 3584; 2026-02-21T08:31:14.1406500Z // begin inline asm 2026-02-21T08:31:14.1406721Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd126 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1406943Z // end inline asm 2026-02-21T08:31:14.1407074Z cp.async.commit_group; 2026-02-21T08:31:14.1407335Z .loc 1 52 34 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:34 
2026-02-21T08:31:14.1407629Z add.s64 %rd131, %rd1, %rd184; 2026-02-21T08:31:14.1407789Z cvt.u32.u64 %r237, %rd131; 2026-02-21T08:31:14.1407944Z mad.wide.s32 %rd127, %r237, 2, %rd7; 2026-02-21T08:31:14.1408222Z .loc 1 52 87 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:52:87 2026-02-21T08:31:14.1408537Z shl.b32 %r238, %r338, 9; 2026-02-21T08:31:14.1408684Z add.s32 %r212, %r44, %r238; 2026-02-21T08:31:14.1408839Z // begin inline asm 2026-02-21T08:31:14.1409025Z cp.async.cg.shared.global [ %r212 + 0 ], [ %rd127 + 0 ], 0x10, %r197; 2026-02-21T08:31:14.1409245Z // end inline asm 2026-02-21T08:31:14.1409380Z cp.async.commit_group; 2026-02-21T08:31:14.1409643Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1409934Z setp.lt.u64 %p13, %rd4, 2032; 2026-02-21T08:31:14.1410087Z mov.b64 %rd184, %rd4; 2026-02-21T08:31:14.1410234Z @%p13 bra $L__BB0_6; 2026-02-21T08:31:14.1410398Z // %bb.7: // %.loopexit 2026-02-21T08:31:14.1410626Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:31:14.1410834Z cp.async.wait_group 0; 2026-02-21T08:31:14.1410990Z bar.warp.sync -1; 2026-02-21T08:31:14.1411123Z barrier.sync 1; 2026-02-21T08:31:14.1411266Z bra.uni $L__BB0_2; 2026-02-21T08:31:14.1411438Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:31:14.1411751Z .loc 1 46 79 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:46:79 2026-02-21T08:31:14.1412034Z barrier.sync 1; 2026-02-21T08:31:14.1412194Z barrier.sync 1; 2026-02-21T08:31:14.1412334Z bra.uni $L__BB0_2; 2026-02-21T08:31:14.1412506Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:31:14.1412821Z .loc 1 19 0 // cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py:19 2026-02-21T08:31:14.1413104Z barrier.sync 1; 2026-02-21T08:31:14.1413240Z barrier.sync 1; 2026-02-21T08:31:14.1413376Z bra.uni $L__BB0_2; 2026-02-21T08:31:14.1413508Z $L__tmp0: 2026-02-21T08:31:14.1413632Z $L__func_end0: 2026-02-21T08:31:14.1413783Z // -- End function 
2026-02-21T08:31:14.1413966Z } 2026-02-21T08:31:14.1414234Z .file 1 "/tmp/torchinductor_root/ja/cjaxthaytmd3jwsk6ps35kxex7uwo5bhlbesuwgrqobktaadgut5.py" 2026-02-21T08:31:14.1414570Z .section .debug_abbrev 2026-02-21T08:31:14.1414757Z { 2026-02-21T08:31:14.1414919Z .b8 1 // Abbreviation Code 2026-02-21T08:31:14.1415156Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:31:14.1415384Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:31:14.1415605Z .b8 37 // DW_AT_producer 2026-02-21T08:31:14.1415845Z .b8 8 // DW_FORM_string 2026-02-21T08:31:14.1416056Z .b8 19 // DW_AT_language 2026-02-21T08:31:14.1416259Z .b8 5 // DW_FORM_data2 2026-02-21T08:31:14.1416473Z .b8 3 // DW_AT_name 2026-02-21T08:31:14.1416672Z .b8 8 // DW_FORM_string 2026-02-21T08:31:14.1416885Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:31:14.1417097Z .b8 6 // DW_FORM_data4 2026-02-21T08:31:14.1417300Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:31:14.1417508Z .b8 8 // DW_FORM_string 2026-02-21T08:31:14.1417703Z .b8 0 // EOM(1) 2026-02-21T08:31:14.1417903Z .b8 0 // EOM(2) 2026-02-21T08:31:14.1418115Z .b8 0 // EOM(3) 2026-02-21T08:31:14.1418301Z } 2026-02-21T08:31:14.1418433Z .section .debug_info 2026-02-21T08:31:14.1418573Z { 2026-02-21T08:31:14.1418726Z .b32 104 // Length of Unit 2026-02-21T08:31:14.1418946Z .b8 2 // DWARF version number 2026-02-21T08:31:14.1419148Z .b8 0 2026-02-21T08:31:14.1419329Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:31:14.1419594Z .b8 8 // Address Size (in bytes) 2026-02-21T08:31:14.1419866Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:31:14.1420114Z .b8 116 // DW_AT_producer 2026-02-21T08:31:14.1420313Z .b8 114 2026-02-21T08:31:14.1420436Z .b8 105 2026-02-21T08:31:14.1420567Z .b8 116 2026-02-21T08:31:14.1420686Z .b8 111 2026-02-21T08:31:14.1420809Z .b8 110 2026-02-21T08:31:14.1420920Z .b8 0 2026-02-21T08:31:14.1421069Z .b8 2 // DW_AT_language 2026-02-21T08:31:14.1421250Z .b8 0 2026-02-21T08:31:14.1421397Z .b8 99 // DW_AT_name 2026-02-21T08:31:14.1421574Z .b8 106 2026-02-21T08:31:14.1421695Z .b8 97 2026-02-21T08:31:14.1421810Z .b8 120 2026-02-21T08:31:14.1421930Z .b8 116 2026-02-21T08:31:14.1422048Z .b8 104 2026-02-21T08:31:14.1422160Z .b8 97 2026-02-21T08:31:14.1422287Z .b8 121 2026-02-21T08:31:14.1422394Z .b8 116 2026-02-21T08:31:14.1422509Z .b8 109 2026-02-21T08:31:14.1422618Z .b8 100 2026-02-21T08:31:14.1422732Z .b8 51 2026-02-21T08:31:14.1422841Z .b8 106 2026-02-21T08:31:14.1422953Z .b8 119 2026-02-21T08:31:14.1423060Z .b8 115 2026-02-21T08:31:14.1423174Z .b8 107 2026-02-21T08:31:14.1423278Z .b8 54 2026-02-21T08:31:14.1423392Z .b8 112 2026-02-21T08:31:14.1423499Z .b8 115 2026-02-21T08:31:14.1423612Z .b8 51 2026-02-21T08:31:14.1423758Z .b8 53 2026-02-21T08:31:14.1423866Z .b8 107 2026-02-21T08:31:14.1423977Z .b8 120 2026-02-21T08:31:14.1424084Z .b8 101 2026-02-21T08:31:14.1424198Z .b8 120 2026-02-21T08:31:14.1424304Z .b8 55 2026-02-21T08:31:14.1424418Z .b8 117 2026-02-21T08:31:14.1424525Z .b8 119 2026-02-21T08:31:14.1424637Z .b8 111 2026-02-21T08:31:14.1424772Z .b8 53 2026-02-21T08:31:14.1424886Z .b8 98 2026-02-21T08:31:14.1424993Z .b8 104 2026-02-21T08:31:14.1425107Z .b8 108 2026-02-21T08:31:14.1425213Z .b8 98 2026-02-21T08:31:14.1425332Z .b8 101 2026-02-21T08:31:14.1425441Z .b8 115 2026-02-21T08:31:14.1425557Z .b8 117 2026-02-21T08:31:14.1425672Z .b8 119 2026-02-21T08:31:14.1425784Z .b8 103 2026-02-21T08:31:14.1425901Z .b8 114 
2026-02-21T08:31:14.1426010Z .b8 113 2026-02-21T08:31:14.1426128Z .b8 111 2026-02-21T08:31:14.1426237Z .b8 98 2026-02-21T08:31:14.1426355Z .b8 107 2026-02-21T08:31:14.1426465Z .b8 116 2026-02-21T08:31:14.1426595Z .b8 97 2026-02-21T08:31:14.1426704Z .b8 97 2026-02-21T08:31:14.1426818Z .b8 100 2026-02-21T08:31:14.1426929Z .b8 103 2026-02-21T08:31:14.1427043Z .b8 117 2026-02-21T08:31:14.1427151Z .b8 116 2026-02-21T08:31:14.1427266Z .b8 53 2026-02-21T08:31:14.1427348Z .b8 46 2026-02-21T08:31:14.1427404Z .b8 112 2026-02-21T08:31:14.1427453Z .b8 121 2026-02-21T08:31:14.1427502Z .b8 0 2026-02-21T08:31:14.1427594Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:31:14.1427677Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:31:14.1427726Z .b8 116 2026-02-21T08:31:14.1427773Z .b8 109 2026-02-21T08:31:14.1427828Z .b8 112 2026-02-21T08:31:14.1427876Z .b8 47 2026-02-21T08:31:14.1427923Z .b8 116 2026-02-21T08:31:14.1427972Z .b8 111 2026-02-21T08:31:14.1428027Z .b8 114 2026-02-21T08:31:14.1428074Z .b8 99 2026-02-21T08:31:14.1428123Z .b8 104 2026-02-21T08:31:14.1428179Z .b8 105 2026-02-21T08:31:14.1428227Z .b8 110 2026-02-21T08:31:14.1428275Z .b8 100 2026-02-21T08:31:14.1428323Z .b8 117 2026-02-21T08:31:14.1428379Z .b8 99 2026-02-21T08:31:14.1428427Z .b8 116 2026-02-21T08:31:14.1428477Z .b8 111 2026-02-21T08:31:14.1428526Z .b8 114 2026-02-21T08:31:14.1428581Z .b8 95 2026-02-21T08:31:14.1428658Z .b8 114 2026-02-21T08:31:14.1428709Z .b8 111 2026-02-21T08:31:14.1428765Z .b8 111 2026-02-21T08:31:14.1428815Z .b8 116 2026-02-21T08:31:14.1428862Z .b8 47 2026-02-21T08:31:14.1428909Z .b8 106 2026-02-21T08:31:14.1428965Z .b8 97 2026-02-21T08:31:14.1429013Z .b8 0 2026-02-21T08:31:14.1429061Z } 2026-02-21T08:31:14.1429131Z .section .debug_macinfo { } 2026-02-21T08:31:14.1429136Z 2026-02-21T08:31:14.1429210Z ================================================================ 2026-02-21T08:31:14.1429307Z please share the reproducer above with Triton project. 
2026-02-21T08:31:33.3913706Z 2026-02-21T08:31:33.3917594Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 4.5 configs/s 2026-02-21T08:31:33.3926920Z [28s] Adaptive compile timeout: 30s (90% percentile=4.3s, bounds=[30.0s, 30s]) 2026-02-21T08:31:33.8198491Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━ 763/763 1501.9 configs/s 2026-02-21T08:31:33.9123676Z [28s] Initial random population of 100, 5 starting points: 2026-02-21T08:31:33.9126925Z error=13 2026-02-21T08:31:33.9131973Z ok=87 2026-02-21T08:31:33.9133491Z min=0.2642 2026-02-21T08:31:33.9133658Z mid=5.1016 2026-02-21T08:31:33.9133835Z max=535.8069 2026-02-21T08:31:33.9134001Z best={'block_sizes': [64, 128, 16], 2026-02-21T08:31:33.9134222Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:31:33.9139182Z 'l2_groupings': [4], 2026-02-21T08:31:33.9140680Z 'load_eviction_policies': ['', ''], 2026-02-21T08:31:33.9140928Z 'loop_orders': [[1, 0]], 2026-02-21T08:31:33.9141096Z 'num_stages': 8, 2026-02-21T08:31:33.9141270Z 'num_warps': 2, 2026-02-21T08:31:33.9141411Z 'pid_type': 'flat', 2026-02-21T08:31:33.9141572Z 'range_flattens': [None, None], 2026-02-21T08:31:33.9141750Z 'range_multi_buffers': [None, None], 2026-02-21T08:31:33.9141936Z 'range_num_stages': [0, 0], 2026-02-21T08:31:33.9142097Z 'range_unroll_factors': [0, 0], 2026-02-21T08:31:33.9142524Z 'range_warp_specializes': [None, None]} 2026-02-21T08:31:33.9142750Z [28s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:31:35.0955092Z [30s] Generation 1 starting: 81 neighbors, 5 active search path(s) 2026-02-21T08:31:39.2310725Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 38.7 configs/s 2026-02-21T08:31:44.4399827Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 16.4 configs/s 2026-02-21T08:31:45.3584335Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1028.7 2026-02-21T08:31:45.3595085Z configs/s 2026-02-21T08:31:45.4382822Z [40s] Generation 1 complete: 
2026-02-21T08:31:45.4386597Z error=3 2026-02-21T08:31:45.4391051Z ok=84 2026-02-21T08:31:45.4394935Z min=0.1517 2026-02-21T08:31:45.4399541Z mid=0.7537 2026-02-21T08:31:45.4404141Z max=11.3925 2026-02-21T08:31:45.4407951Z best={'block_sizes': [256, 128, 16], 2026-02-21T08:31:45.4408308Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:31:45.4408551Z 'l2_groupings': [4], 2026-02-21T08:31:45.4412939Z 'load_eviction_policies': ['', ''], 2026-02-21T08:31:45.4417996Z 'loop_orders': [[1, 0]], 2026-02-21T08:31:45.4419368Z 'num_stages': 8, 2026-02-21T08:31:45.4419555Z 'num_warps': 4, 2026-02-21T08:31:45.4419722Z 'pid_type': 'flat', 2026-02-21T08:31:45.4419894Z 'range_flattens': [None, None], 2026-02-21T08:31:45.4420096Z 'range_multi_buffers': [None, None], 2026-02-21T08:31:45.4420295Z 'range_num_stages': [0, 0], 2026-02-21T08:31:45.4420465Z 'range_unroll_factors': [0, 0], 2026-02-21T08:31:45.4420675Z 'range_warp_specializes': [None, True]} 2026-02-21T08:31:45.4420990Z [40s] Fitting surrogate: 187 points, 187 targets 2026-02-21T08:31:46.6678818Z [41s] Generation 2 starting: 86 neighbors, 5 active search path(s) 2026-02-21T08:31:55.2421091Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 2.3 configs/s 2026-02-21T08:32:00.0342897Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 18.8 configs/s 2026-02-21T08:32:01.7874524Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 561.1 2026-02-21T08:32:01.7875240Z configs/s 2026-02-21T08:32:01.8813786Z [56s] Generation 2 complete: 2026-02-21T08:32:01.8818135Z error=9 2026-02-21T08:32:01.8820207Z ok=83 2026-02-21T08:32:01.8820378Z min=0.1125 2026-02-21T08:32:01.8820509Z mid=0.2540 2026-02-21T08:32:01.8820641Z max=16.8336 2026-02-21T08:32:01.8820782Z best={'block_sizes': [256, 256, 16], 2026-02-21T08:32:01.8821014Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:32:01.8821523Z 'l2_groupings': [4], 2026-02-21T08:32:01.8821695Z 'load_eviction_policies': ['', ''], 
2026-02-21T08:32:01.8821877Z 'loop_orders': [[1, 0]], 2026-02-21T08:32:01.8822042Z 'num_stages': 8, 2026-02-21T08:32:01.8822191Z 'num_warps': 8, 2026-02-21T08:32:01.8822331Z 'pid_type': 'flat', 2026-02-21T08:32:01.8822507Z 'range_flattens': [None, None], 2026-02-21T08:32:01.8822687Z 'range_multi_buffers': [None, None], 2026-02-21T08:32:01.8822878Z 'range_num_stages': [0, 0], 2026-02-21T08:32:01.8823049Z 'range_unroll_factors': [0, 0], 2026-02-21T08:32:01.8823228Z 'range_warp_specializes': [None, True]} 2026-02-21T08:32:01.8829550Z [56s] Fitting surrogate: 279 points, 279 targets 2026-02-21T08:32:03.0265054Z [58s] Generation 3 starting: 81 neighbors, 5 active search path(s) 2026-02-21T08:32:07.1101236Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 26.7 configs/s 2026-02-21T08:32:10.4411839Z 2026-02-21T08:32:10.4413688Z [65s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:32:10.4415537Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:32:10.4416737Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:32:10.4416972Z `ptxas` stderr: 2026-02-21T08:32:10.4417386Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 203 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:32:10.4417861Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:32:10.4418017Z 2026-02-21T08:32:10.4418424Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp24e9tolg.ptx -o /tmp/tmp24e9tolg.ptx.o 2026-02-21T08:32:10.4418863Z 2026-02-21T08:32:10.4419002Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:32:10.4419259Z 2026-02-21T08:32:10.4419356Z ================================================================ 2026-02-21T08:32:10.4419581Z Internal Triton PTX codegen error 2026-02-21T08:32:10.4419873Z `ptxas` stderr: 2026-02-21T08:32:10.4420312Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 203 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:32:10.4420813Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:32:10.4420966Z 2026-02-21T08:32:10.4421377Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp24e9tolg.ptx -o /tmp/tmp24e9tolg.ptx.o 2026-02-21T08:32:10.4421825Z 2026-02-21T08:32:10.4421828Z 2026-02-21T08:32:10.4421892Z // 2026-02-21T08:32:10.4422062Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:32:10.4422247Z // 2026-02-21T08:32:10.4422333Z 2026-02-21T08:32:10.4422395Z .version 8.7 2026-02-21T08:32:10.4422540Z .target sm_100a 2026-02-21T08:32:10.4422691Z .address_size 64 2026-02-21T08:32:10.4422779Z 2026-02-21T08:32:10.4422984Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:32:10.4423246Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:32:10.4423466Z // @_helion_matmul 2026-02-21T08:32:10.4423665Z .visible .entry _helion_matmul( 2026-02-21T08:32:10.4423891Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:32:10.4424154Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T08:32:10.4424412Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:32:10.4424806Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:32:10.4425060Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:32:10.4425275Z ) 2026-02-21T08:32:10.4425396Z .reqntid 256 2026-02-21T08:32:10.4425540Z .maxnreg 32 2026-02-21T08:32:10.4425670Z { 2026-02-21T08:32:10.4425819Z .reg .pred %p<100>; 2026-02-21T08:32:10.4425981Z .reg .b16 %rs<4>; 2026-02-21T08:32:10.4426143Z .reg .b32 %r<938>; 2026-02-21T08:32:10.4426295Z .reg .b64 %rd<363>; 2026-02-21T08:32:10.4426452Z $L__func_begin0: 2026-02-21T08:32:10.4426549Z 2026-02-21T08:32:10.4426607Z // %bb.0: 2026-02-21T08:32:10.4426867Z .loc 1 19 0 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:19 2026-02-21T08:32:10.4427163Z mov.u32 %r1, %tid.x; 2026-02-21T08:32:10.4427347Z ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T08:32:10.4427549Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:32:10.4427722Z mov.b32 %r51, global_smem; 2026-02-21T08:32:10.4427881Z // begin inline asm 2026-02-21T08:32:10.4428137Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r51], 256; 2026-02-21T08:32:10.4428385Z // end inline asm 2026-02-21T08:32:10.4428556Z ld.param.b64 %rd29, [_helion_matmul_param_3]; 2026-02-21T08:32:10.4428756Z bar.sync 0; 2026-02-21T08:32:10.4428907Z ld.shared.b32 %r930, [global_smem]; 2026-02-21T08:32:10.4429124Z bar.sync 0; 2026-02-21T08:32:10.4429253Z // begin inline asm 2026-02-21T08:32:10.4429460Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:32:10.4429679Z // end inline asm 2026-02-21T08:32:10.4429936Z .loc 1 21 67 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:21:67 2026-02-21T08:32:10.4430222Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:32:10.4430378Z mov.u32 %r60, %ctaid.y; 2026-02-21T08:32:10.4430529Z mov.u32 %r61, %ctaid.z; 2026-02-21T08:32:10.4430673Z mov.u32 %r62, 
%nctaid.x; 2026-02-21T08:32:10.4430823Z mov.u32 %r63, %nctaid.y; 2026-02-21T08:32:10.4430980Z mad.lo.s32 %r64, %r61, %r63, %r60; 2026-02-21T08:32:10.4431162Z mad.lo.s32 %r65, %r64, %r62, %r3; 2026-02-21T08:32:10.4431325Z shl.b32 %r66, %r65, 7; 2026-02-21T08:32:10.4431476Z cvt.s64.s32 %rd30, %r66; 2026-02-21T08:32:10.4431626Z add.s64 %rd26, %rd29, %rd30; 2026-02-21T08:32:10.4431783Z shl.b32 %r67, %r1, 2; 2026-02-21T08:32:10.4431925Z add.s32 %r52, %r51, %r67; 2026-02-21T08:32:10.4432074Z mov.b32 %r69, 0; 2026-02-21T08:32:10.4432214Z // begin inline asm 2026-02-21T08:32:10.4432408Z @%p1 st.shared.b32 [ %r52 + 0 ], %r69; 2026-02-21T08:32:10.4432583Z // end inline asm 2026-02-21T08:32:10.4432717Z bar.warp.sync -1; 2026-02-21T08:32:10.4432866Z setp.eq.b32 %p89, %r1, 0; 2026-02-21T08:32:10.4433022Z cvt.u64.u32 %rd11, %r51; 2026-02-21T08:32:10.4433173Z // begin inline asm 2026-02-21T08:32:10.4433418Z @%p89 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T08:32:10.4433710Z // end inline asm 2026-02-21T08:32:10.4433846Z // begin inline asm 2026-02-21T08:32:10.4434068Z @%p89 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T08:32:10.4434329Z // end inline asm 2026-02-21T08:32:10.4434461Z mov.b32 %r215, 16; 2026-02-21T08:32:10.4434605Z // begin inline asm 2026-02-21T08:32:10.4434877Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r215; 2026-02-21T08:32:10.4435152Z // end inline asm 2026-02-21T08:32:10.4435280Z mov.b32 %r55, 128; 2026-02-21T08:32:10.4435456Z // begin inline asm 2026-02-21T08:32:10.4435702Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r55; 2026-02-21T08:32:10.4435989Z // end inline asm 2026-02-21T08:32:10.4436138Z mov.b32 %r56, 2048; 2026-02-21T08:32:10.4436285Z // begin inline asm 2026-02-21T08:32:10.4436531Z @%p89 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r56; 
2026-02-21T08:32:10.4436822Z // end inline asm 2026-02-21T08:32:10.4436962Z // begin inline asm 2026-02-21T08:32:10.4437233Z @%p89 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r56; 2026-02-21T08:32:10.4437528Z // end inline asm 2026-02-21T08:32:10.4437667Z mov.b64 %rd19, 4096; 2026-02-21T08:32:10.4437807Z // begin inline asm 2026-02-21T08:32:10.4438062Z @%p89 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T08:32:10.4438355Z // end inline asm 2026-02-21T08:32:10.4438492Z mov.b32 %r58, 1; 2026-02-21T08:32:10.4438622Z // begin inline asm 2026-02-21T08:32:10.4438881Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r58; 2026-02-21T08:32:10.4439168Z // end inline asm 2026-02-21T08:32:10.4439297Z // begin inline asm 2026-02-21T08:32:10.4439549Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r58; 2026-02-21T08:32:10.4439824Z // end inline asm 2026-02-21T08:32:10.4439960Z // begin inline asm 2026-02-21T08:32:10.4440188Z @%p89 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T08:32:10.4440454Z // end inline asm 2026-02-21T08:32:10.4440583Z // begin inline asm 2026-02-21T08:32:10.4440834Z @%p89 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T08:32:10.4441121Z // end inline asm 2026-02-21T08:32:10.4441251Z // begin inline asm 2026-02-21T08:32:10.4441519Z @%p89 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T08:32:10.4441793Z // end inline asm 2026-02-21T08:32:10.4441932Z // begin inline asm 2026-02-21T08:32:10.4442157Z @%p89 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T08:32:10.4442426Z // end inline asm 2026-02-21T08:32:10.4442565Z // begin inline asm 2026-02-21T08:32:10.4442907Z @%p1 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T08:32:10.4443285Z // end inline asm 2026-02-21T08:32:10.4443413Z // begin inline asm 2026-02-21T08:32:10.4443624Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T08:32:10.4443867Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:32:10.4444058Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:32:10.4444235Z // end inline asm 2026-02-21T08:32:10.4444365Z bar.sync 0; 2026-02-21T08:32:10.4444510Z cvta.global.u64 %rd68, %rd26; 2026-02-21T08:32:10.4444823Z .loc 1 30 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:30:52 2026-02-21T08:32:10.4445164Z setp.gt.u32 %p21, %r3, 511; 2026-02-21T08:32:10.4445330Z @%p21 bra $L__BB0_8; 2026-02-21T08:32:10.4445507Z // %bb.1: // %.lr.ph 2026-02-21T08:32:10.4445807Z .loc 1 0 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:0:52 2026-02-21T08:32:10.4446131Z ld.param.b64 %rd9, [_helion_matmul_param_0]; 2026-02-21T08:32:10.4446449Z .loc 1 44 45 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:44:45 2026-02-21T08:32:10.4446740Z shl.b32 %r279, %r1, 3; 2026-02-21T08:32:10.4447007Z .loc 1 50 48 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:50:48 2026-02-21T08:32:10.4447298Z and.b32 %r280, %r279, 8; 2026-02-21T08:32:10.4447575Z .loc 1 44 45 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:44:45 2026-02-21T08:32:10.4447862Z and.b32 %r281, %r279, 120; 2026-02-21T08:32:10.4448176Z .loc 1 42 45 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:42:45 2026-02-21T08:32:10.4448462Z shr.u32 %r282, %r1, 4; 2026-02-21T08:32:10.4448712Z .loc 1 37 33 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:37:33 2026-02-21T08:32:10.4449005Z shr.u32 %r283, %r3, 4; 2026-02-21T08:32:10.4449149Z and.b32 %r284, %r283, 16; 2026-02-21T08:32:10.4449413Z .loc 1 39 64 // 
cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:39:64 2026-02-21T08:32:10.4449732Z and.b32 %r17, %r3, 15; 2026-02-21T08:32:10.4450000Z .loc 1 39 30 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:39:30 2026-02-21T08:32:10.4450294Z or.b32 %r285, %r284, %r17; 2026-02-21T08:32:10.4450559Z .loc 1 41 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:41:27 2026-02-21T08:32:10.4450849Z shl.b32 %r286, %r285, 8; 2026-02-21T08:32:10.4451112Z .loc 1 42 45 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:42:45 2026-02-21T08:32:10.4451411Z or.b32 %r287, %r282, %r286; 2026-02-21T08:32:10.4451573Z bfe.u32 %r288, %r1, 4, 4; 2026-02-21T08:32:10.4451736Z bfe.u32 %r4, %r1, 1, 7; 2026-02-21T08:32:10.4451892Z shr.u32 %r289, %r1, 5; 2026-02-21T08:32:10.4452038Z shl.b32 %r290, %r1, 4; 2026-02-21T08:32:10.4452193Z and.b32 %r291, %r290, 3952; 2026-02-21T08:32:10.4452349Z bfe.s32 %r292, %r1, 3, 1; 2026-02-21T08:32:10.4452511Z and.b32 %r293, %r292, 144; 2026-02-21T08:32:10.4452668Z xor.b32 %r5, %r293, %r291; 2026-02-21T08:32:10.4452827Z or.b32 %r6, %r280, 112; 2026-02-21T08:32:10.4452975Z add.s32 %r214, %r51, %r5; 2026-02-21T08:32:10.4453137Z add.s32 %r346, %r214, 57344; 2026-02-21T08:32:10.4453295Z add.s32 %r348, %r214, 61440; 2026-02-21T08:32:10.4453455Z shl.b32 %r295, %r1, 11; 2026-02-21T08:32:10.4453635Z and.b32 %r296, %r295, 12288; 2026-02-21T08:32:10.4453787Z and.b32 %r297, %r290, 4080; 2026-02-21T08:32:10.4453946Z or.b32 %r298, %r296, %r297; 2026-02-21T08:32:10.4454094Z xor.b32 %r299, %r298, 32; 2026-02-21T08:32:10.4454244Z xor.b32 %r300, %r298, 64; 2026-02-21T08:32:10.4454389Z xor.b32 %r301, %r298, 96; 2026-02-21T08:32:10.4454539Z shl.b32 %r302, %r1, 7; 2026-02-21T08:32:10.4454716Z and.b32 %r303, %r302, 12288; 2026-02-21T08:32:10.4454875Z shl.b32 %r304, %r1, 5; 2026-02-21T08:32:10.4455017Z and.b32 %r305, %r304, 864; 2026-02-21T08:32:10.4455172Z and.b32 %r306, %r1, 224; 2026-02-21T08:32:10.4455324Z and.b32 %r308, %r67, 16; 
2026-02-21T08:32:10.4455470Z or.b32 %r309, %r303, %r305; 2026-02-21T08:32:10.4455629Z xor.b32 %r310, %r309, %r306; 2026-02-21T08:32:10.4455781Z add.s32 %r311, %r51, %r308; 2026-02-21T08:32:10.4455941Z add.s32 %r565, %r311, %r310; 2026-02-21T08:32:10.4456092Z add.s32 %r268, %r214, 49152; 2026-02-21T08:32:10.4456251Z add.s32 %r270, %r214, 53248; 2026-02-21T08:32:10.4456399Z add.s32 %r259, %r214, 40960; 2026-02-21T08:32:10.4456556Z add.s32 %r261, %r214, 45056; 2026-02-21T08:32:10.4456708Z add.s32 %r250, %r214, 32768; 2026-02-21T08:32:10.4456921Z add.s32 %r252, %r214, 36864; 2026-02-21T08:32:10.4457072Z add.s32 %r241, %r214, 24576; 2026-02-21T08:32:10.4457217Z add.s32 %r243, %r214, 28672; 2026-02-21T08:32:10.4457372Z add.s32 %r232, %r214, 16384; 2026-02-21T08:32:10.4457517Z add.s32 %r234, %r214, 20480; 2026-02-21T08:32:10.4457670Z add.s32 %r223, %r214, 8192; 2026-02-21T08:32:10.4457818Z add.s32 %r225, %r214, 12288; 2026-02-21T08:32:10.4457972Z add.s32 %r216, %r214, 4096; 2026-02-21T08:32:10.4458230Z .loc 1 42 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:42:32 2026-02-21T08:32:10.4458518Z or.b32 %r312, %r286, %r4; 2026-02-21T08:32:10.4458676Z or.b32 %r18, %r286, %r288; 2026-02-21T08:32:10.4458930Z .loc 1 43 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:43:27 2026-02-21T08:32:10.4459211Z shl.b32 %r313, %r3, 3; 2026-02-21T08:32:10.4459392Z and.b32 %r353, %r313, 1920; 2026-02-21T08:32:10.4459654Z .loc 1 54 53 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:53 2026-02-21T08:32:10.4459953Z shl.b32 %r314, %r312, 11; 2026-02-21T08:32:10.4460208Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4460512Z shfl.sync.idx.b32 %r36, %r289, 0, 31, -1; 2026-02-21T08:32:10.4460688Z shl.b32 %r315, %r36, 21; 2026-02-21T08:32:10.4460844Z and.b32 %r316, %r315, 6291456; 2026-02-21T08:32:10.4461059Z add.s32 %r317, %r316, %r930; 2026-02-21T08:32:10.4461217Z shl.b32 %r318, %r36, 5; 
2026-02-21T08:32:10.4461365Z and.b32 %r319, %r318, 128; 2026-02-21T08:32:10.4461524Z add.s32 %r560, %r317, %r319; 2026-02-21T08:32:10.4461677Z mov.pred %p22, -1; 2026-02-21T08:32:10.4461833Z // begin inline asm 2026-02-21T08:32:10.4462192Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 0], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4462581Z // end inline asm 2026-02-21T08:32:10.4462731Z // begin inline asm 2026-02-21T08:32:10.4463086Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 16], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4463475Z // end inline asm 2026-02-21T08:32:10.4463614Z // begin inline asm 2026-02-21T08:32:10.4463953Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 32], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4464337Z // end inline asm 2026-02-21T08:32:10.4464475Z // begin inline asm 2026-02-21T08:32:10.4464846Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 48], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4465252Z // end inline asm 2026-02-21T08:32:10.4465398Z // begin inline asm 2026-02-21T08:32:10.4465723Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 64], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4466094Z // end inline asm 2026-02-21T08:32:10.4466238Z // begin inline asm 2026-02-21T08:32:10.4466561Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 80], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4466930Z // end inline asm 2026-02-21T08:32:10.4467065Z // begin inline asm 2026-02-21T08:32:10.4467402Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 96], {%r69, 
%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4467781Z // end inline asm 2026-02-21T08:32:10.4467927Z // begin inline asm 2026-02-21T08:32:10.4468263Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r560 + 112], {%r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69, %r69}; 2026-02-21T08:32:10.4468640Z // end inline asm 2026-02-21T08:32:10.4468822Z // begin inline asm 2026-02-21T08:32:10.4468977Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:32:10.4469152Z // end inline asm 2026-02-21T08:32:10.4469285Z bar.sync 0; 2026-02-21T08:32:10.4469542Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4469847Z add.s32 %r932, %r51, 98368; 2026-02-21T08:32:10.4469996Z // begin inline asm 2026-02-21T08:32:10.4470165Z @%p89 mbarrier.init.shared::cta.b64 [%r932], 1; 2026-02-21T08:32:10.4470350Z // end inline asm 2026-02-21T08:32:10.4470486Z bar.sync 0; 2026-02-21T08:32:10.4470613Z add.s32 %r205, %r51, 98376; 2026-02-21T08:32:10.4470767Z // begin inline asm 2026-02-21T08:32:10.4470925Z @%p89 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:32:10.4471113Z // end inline asm 2026-02-21T08:32:10.4471244Z add.s32 %r206, %r51, 98304; 2026-02-21T08:32:10.4471400Z // begin inline asm 2026-02-21T08:32:10.4471593Z @%p89 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T08:32:10.4471775Z // end inline asm 2026-02-21T08:32:10.4471910Z bar.sync 0; 2026-02-21T08:32:10.4472035Z add.s32 %r207, %r51, 98312; 2026-02-21T08:32:10.4472188Z // begin inline asm 2026-02-21T08:32:10.4472342Z @%p89 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T08:32:10.4472525Z // end inline asm 2026-02-21T08:32:10.4472650Z bar.sync 0; 2026-02-21T08:32:10.4472782Z add.s32 %r208, %r51, 98320; 2026-02-21T08:32:10.4472931Z // begin inline asm 2026-02-21T08:32:10.4473085Z @%p89 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T08:32:10.4473310Z // end inline asm 
2026-02-21T08:32:10.4473434Z bar.sync 0; 2026-02-21T08:32:10.4473567Z add.s32 %r209, %r51, 98328; 2026-02-21T08:32:10.4473711Z // begin inline asm 2026-02-21T08:32:10.4473875Z @%p89 mbarrier.init.shared::cta.b64 [%r209], 1; 2026-02-21T08:32:10.4474051Z // end inline asm 2026-02-21T08:32:10.4474185Z bar.sync 0; 2026-02-21T08:32:10.4474308Z add.s32 %r210, %r51, 98336; 2026-02-21T08:32:10.4474463Z // begin inline asm 2026-02-21T08:32:10.4474625Z @%p89 mbarrier.init.shared::cta.b64 [%r210], 1; 2026-02-21T08:32:10.4474840Z // end inline asm 2026-02-21T08:32:10.4474976Z bar.sync 0; 2026-02-21T08:32:10.4475101Z add.s32 %r211, %r51, 98344; 2026-02-21T08:32:10.4475257Z // begin inline asm 2026-02-21T08:32:10.4475412Z @%p89 mbarrier.init.shared::cta.b64 [%r211], 1; 2026-02-21T08:32:10.4475598Z // end inline asm 2026-02-21T08:32:10.4475725Z bar.sync 0; 2026-02-21T08:32:10.4475863Z add.s32 %r212, %r51, 98352; 2026-02-21T08:32:10.4476014Z // begin inline asm 2026-02-21T08:32:10.4476182Z @%p89 mbarrier.init.shared::cta.b64 [%r212], 1; 2026-02-21T08:32:10.4476369Z // end inline asm 2026-02-21T08:32:10.4476502Z bar.sync 0; 2026-02-21T08:32:10.4476645Z add.s32 %r350, %r51, 98360; 2026-02-21T08:32:10.4476798Z // begin inline asm 2026-02-21T08:32:10.4476994Z @%p89 mbarrier.init.shared::cta.b64 [%r350], 1; 2026-02-21T08:32:10.4477173Z // end inline asm 2026-02-21T08:32:10.4477425Z .loc 1 54 60 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:60 2026-02-21T08:32:10.4477707Z or.b32 %r320, %r314, %r280; 2026-02-21T08:32:10.4477973Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4478272Z mad.wide.u32 %rd31, %r320, 2, %rd9; 2026-02-21T08:32:10.4478443Z cvt.u64.u32 %rd2, %r314; 2026-02-21T08:32:10.4478605Z add.s64 %rd32, %rd31, 524288; 2026-02-21T08:32:10.4478867Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4479153Z // begin inline asm 2026-02-21T08:32:10.4479353Z 
cp.async.cg.shared.global [ %r214 + 0 ], [ %rd31 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4479583Z // end inline asm 2026-02-21T08:32:10.4479720Z // begin inline asm 2026-02-21T08:32:10.4479912Z cp.async.cg.shared.global [ %r216 + 0 ], [ %rd32 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4480137Z // end inline asm 2026-02-21T08:32:10.4480273Z cp.async.commit_group; 2026-02-21T08:32:10.4480571Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4480843Z bar.sync 0; 2026-02-21T08:32:10.4480978Z // begin inline asm 2026-02-21T08:32:10.4481161Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r206], 4096; 2026-02-21T08:32:10.4481380Z // end inline asm 2026-02-21T08:32:10.4481621Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4481896Z bar.sync 0; 2026-02-21T08:32:10.4482039Z elect.sync %r321|%p55, -1; 2026-02-21T08:32:10.4482399Z and.pred %p41, %p1, %p55; 2026-02-21T08:32:10.4482557Z add.s32 %r219, %r51, 65536; 2026-02-21T08:32:10.4482703Z // begin inline asm 2026-02-21T08:32:10.4483034Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r219], [%rd68, {%r69, %r353}], [%r206]; 2026-02-21T08:32:10.4483397Z // end inline asm 2026-02-21T08:32:10.4483662Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4483951Z add.s64 %rd34, %rd31, 32; 2026-02-21T08:32:10.4484102Z or.b32 %r322, %r320, 16; 2026-02-21T08:32:10.4484265Z mad.wide.u32 %rd52, %r322, 2, %rd9; 2026-02-21T08:32:10.4484434Z add.s64 %rd35, %rd52, 524288; 2026-02-21T08:32:10.4484728Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4485004Z // begin inline asm 2026-02-21T08:32:10.4485233Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd34 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4485455Z // end inline asm 2026-02-21T08:32:10.4485583Z // begin inline asm 2026-02-21T08:32:10.4485780Z 
cp.async.cg.shared.global [ %r225 + 0 ], [ %rd35 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4485995Z // end inline asm 2026-02-21T08:32:10.4486144Z cp.async.commit_group; 2026-02-21T08:32:10.4486409Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4486696Z bar.sync 0; 2026-02-21T08:32:10.4486822Z // begin inline asm 2026-02-21T08:32:10.4487014Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r207], 4096; 2026-02-21T08:32:10.4487230Z // end inline asm 2026-02-21T08:32:10.4487466Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4487756Z bar.sync 0; 2026-02-21T08:32:10.4487888Z elect.sync %r323|%p56, -1; 2026-02-21T08:32:10.4488055Z and.pred %p43, %p1, %p56; 2026-02-21T08:32:10.4488207Z add.s32 %r228, %r51, 69632; 2026-02-21T08:32:10.4488361Z // begin inline asm 2026-02-21T08:32:10.4488683Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r228], [%rd68, {%r215, %r353}], [%r207]; 2026-02-21T08:32:10.4489037Z // end inline asm 2026-02-21T08:32:10.4489313Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4489591Z add.s64 %rd37, %rd31, 64; 2026-02-21T08:32:10.4489747Z or.b32 %r324, %r320, 32; 2026-02-21T08:32:10.4489902Z mad.wide.u32 %rd53, %r324, 2, %rd9; 2026-02-21T08:32:10.4490078Z add.s64 %rd38, %rd53, 524288; 2026-02-21T08:32:10.4490335Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4490624Z // begin inline asm 2026-02-21T08:32:10.4490822Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd37 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4491043Z // end inline asm 2026-02-21T08:32:10.4491181Z // begin inline asm 2026-02-21T08:32:10.4491366Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd38 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4491585Z // end inline asm 2026-02-21T08:32:10.4491719Z cp.async.commit_group; 2026-02-21T08:32:10.4491978Z .loc 1 49 89 // 
cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4492262Z bar.sync 0; 2026-02-21T08:32:10.4492384Z // begin inline asm 2026-02-21T08:32:10.4492574Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r208], 4096; 2026-02-21T08:32:10.4492817Z // end inline asm 2026-02-21T08:32:10.4493060Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4493336Z bar.sync 0; 2026-02-21T08:32:10.4493473Z elect.sync %r325|%p57, -1; 2026-02-21T08:32:10.4493633Z and.pred %p45, %p1, %p57; 2026-02-21T08:32:10.4493792Z add.s32 %r237, %r51, 73728; 2026-02-21T08:32:10.4493937Z mov.b32 %r238, 32; 2026-02-21T08:32:10.4494077Z // begin inline asm 2026-02-21T08:32:10.4494406Z @%p45 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r237], [%rd68, {%r238, %r353}], [%r208]; 2026-02-21T08:32:10.4494783Z // end inline asm 2026-02-21T08:32:10.4495029Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4495305Z add.s64 %rd40, %rd31, 96; 2026-02-21T08:32:10.4495492Z or.b32 %r326, %r320, 48; 2026-02-21T08:32:10.4495648Z mad.wide.u32 %rd54, %r326, 2, %rd9; 2026-02-21T08:32:10.4495823Z add.s64 %rd41, %rd54, 524288; 2026-02-21T08:32:10.4496085Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4496358Z // begin inline asm 2026-02-21T08:32:10.4496552Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd40 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4496765Z // end inline asm 2026-02-21T08:32:10.4496904Z // begin inline asm 2026-02-21T08:32:10.4497093Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd41 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4497351Z // end inline asm 2026-02-21T08:32:10.4497486Z cp.async.commit_group; 2026-02-21T08:32:10.4497751Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4498039Z bar.sync 0; 2026-02-21T08:32:10.4498163Z // begin inline asm 
2026-02-21T08:32:10.4498354Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r209], 4096; 2026-02-21T08:32:10.4498562Z // end inline asm 2026-02-21T08:32:10.4498806Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4499084Z bar.sync 0; 2026-02-21T08:32:10.4499222Z elect.sync %r327|%p58, -1; 2026-02-21T08:32:10.4499381Z and.pred %p47, %p1, %p58; 2026-02-21T08:32:10.4499539Z add.s32 %r246, %r51, 77824; 2026-02-21T08:32:10.4499691Z mov.b32 %r247, 48; 2026-02-21T08:32:10.4499823Z // begin inline asm 2026-02-21T08:32:10.4500150Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd68, {%r247, %r353}], [%r209]; 2026-02-21T08:32:10.4500491Z // end inline asm 2026-02-21T08:32:10.4500738Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4501016Z add.s64 %rd43, %rd31, 128; 2026-02-21T08:32:10.4501224Z or.b32 %r328, %r320, 64; 2026-02-21T08:32:10.4501390Z mad.wide.u32 %rd55, %r328, 2, %rd9; 2026-02-21T08:32:10.4501563Z add.s64 %rd44, %rd55, 524288; 2026-02-21T08:32:10.4501839Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4502123Z // begin inline asm 2026-02-21T08:32:10.4502326Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd43 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4502546Z // end inline asm 2026-02-21T08:32:10.4502686Z // begin inline asm 2026-02-21T08:32:10.4502876Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd44 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4503101Z // end inline asm 2026-02-21T08:32:10.4503245Z cp.async.commit_group; 2026-02-21T08:32:10.4503503Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4503791Z bar.sync 0; 2026-02-21T08:32:10.4503919Z // begin inline asm 2026-02-21T08:32:10.4504111Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r210], 4096; 2026-02-21T08:32:10.4504322Z // end inline asm 
2026-02-21T08:32:10.4504574Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4504923Z bar.sync 0; 2026-02-21T08:32:10.4505055Z elect.sync %r329|%p59, -1; 2026-02-21T08:32:10.4505221Z and.pred %p49, %p1, %p59; 2026-02-21T08:32:10.4505374Z add.s32 %r255, %r51, 81920; 2026-02-21T08:32:10.4505528Z mov.b32 %r256, 64; 2026-02-21T08:32:10.4505660Z // begin inline asm 2026-02-21T08:32:10.4505976Z @%p49 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r255], [%rd68, {%r256, %r353}], [%r210]; 2026-02-21T08:32:10.4506327Z // end inline asm 2026-02-21T08:32:10.4506578Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4506879Z add.s64 %rd46, %rd31, 160; 2026-02-21T08:32:10.4507039Z or.b32 %r330, %r320, 80; 2026-02-21T08:32:10.4507212Z mad.wide.u32 %rd56, %r330, 2, %rd9; 2026-02-21T08:32:10.4507385Z add.s64 %rd47, %rd56, 524288; 2026-02-21T08:32:10.4507693Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4507986Z // begin inline asm 2026-02-21T08:32:10.4508206Z cp.async.cg.shared.global [ %r259 + 0 ], [ %rd46 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4508432Z // end inline asm 2026-02-21T08:32:10.4508577Z // begin inline asm 2026-02-21T08:32:10.4508778Z cp.async.cg.shared.global [ %r261 + 0 ], [ %rd47 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4509005Z // end inline asm 2026-02-21T08:32:10.4509155Z cp.async.commit_group; 2026-02-21T08:32:10.4509459Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4509759Z bar.sync 0; 2026-02-21T08:32:10.4509889Z // begin inline asm 2026-02-21T08:32:10.4510089Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r211], 4096; 2026-02-21T08:32:10.4510310Z // end inline asm 2026-02-21T08:32:10.4510568Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4510865Z bar.sync 0; 
2026-02-21T08:32:10.4511003Z elect.sync %r331|%p60, -1; 2026-02-21T08:32:10.4511176Z and.pred %p51, %p1, %p60; 2026-02-21T08:32:10.4511334Z add.s32 %r264, %r51, 86016; 2026-02-21T08:32:10.4511496Z mov.b32 %r265, 80; 2026-02-21T08:32:10.4511636Z // begin inline asm 2026-02-21T08:32:10.4511974Z @%p51 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd68, {%r265, %r353}], [%r211]; 2026-02-21T08:32:10.4512352Z // end inline asm 2026-02-21T08:32:10.4512608Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4512918Z add.s64 %rd49, %rd31, 192; 2026-02-21T08:32:10.4513074Z or.b32 %r332, %r320, 96; 2026-02-21T08:32:10.4513242Z mad.wide.u32 %rd57, %r332, 2, %rd9; 2026-02-21T08:32:10.4513443Z add.s64 %rd50, %rd57, 524288; 2026-02-21T08:32:10.4513725Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4513996Z // begin inline asm 2026-02-21T08:32:10.4514193Z cp.async.cg.shared.global [ %r268 + 0 ], [ %rd49 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4514412Z // end inline asm 2026-02-21T08:32:10.4514540Z // begin inline asm 2026-02-21T08:32:10.4514761Z cp.async.cg.shared.global [ %r270 + 0 ], [ %rd50 + 0 ], 0x10, %r215; 2026-02-21T08:32:10.4514978Z // end inline asm 2026-02-21T08:32:10.4515117Z cp.async.commit_group; 2026-02-21T08:32:10.4515375Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4515664Z bar.sync 0; 2026-02-21T08:32:10.4515787Z // begin inline asm 2026-02-21T08:32:10.4515974Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r212], 4096; 2026-02-21T08:32:10.4516190Z // end inline asm 2026-02-21T08:32:10.4516432Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4516711Z bar.sync 0; 2026-02-21T08:32:10.4516842Z elect.sync %r333|%p61, -1; 2026-02-21T08:32:10.4517036Z and.pred %p53, %p1, %p61; 2026-02-21T08:32:10.4517188Z add.s32 %r273, %r51, 
90112; 2026-02-21T08:32:10.4517341Z mov.b32 %r274, 96; 2026-02-21T08:32:10.4517472Z // begin inline asm 2026-02-21T08:32:10.4517796Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r273], [%rd68, {%r274, %r353}], [%r212]; 2026-02-21T08:32:10.4518155Z // end inline asm 2026-02-21T08:32:10.4518401Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4518702Z cp.async.wait_group 6; 2026-02-21T08:32:10.4518851Z bar.sync 0; 2026-02-21T08:32:10.4519099Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4519378Z // begin inline asm 2026-02-21T08:32:10.4519513Z 2026-02-21T08:32:10.4519632Z { 2026-02-21T08:32:10.4519751Z .reg .pred complete; 2026-02-21T08:32:10.4519924Z waitLoop: 2026-02-21T08:32:10.4520111Z mbarrier.try_wait.parity.shared.b64 complete, [%r206], %r69; 2026-02-21T08:32:10.4520342Z @!complete bra.uni waitLoop; 2026-02-21T08:32:10.4520488Z } 2026-02-21T08:32:10.4520559Z 2026-02-21T08:32:10.4520613Z // end inline asm 2026-02-21T08:32:10.4520855Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4521155Z setp.ne.b32 %p62, %r36, 0; 2026-02-21T08:32:10.4521314Z @%p62 bra $L__BB0_3; 2026-02-21T08:32:10.4521452Z // %bb.2: 2026-02-21T08:32:10.4521718Z .loc 1 0 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:0:52 2026-02-21T08:32:10.4521997Z add.s32 %r339, %r51, 4096; 2026-02-21T08:32:10.4522157Z bfe.u32 %r340, %r339, 4, 14; 2026-02-21T08:32:10.4522310Z cvt.u64.u32 %rd63, %r340; 2026-02-21T08:32:10.4522481Z or.b64 %rd60, %rd63, -4611685949674356736; 2026-02-21T08:32:10.4522660Z bfe.u32 %r342, %r219, 4, 14; 2026-02-21T08:32:10.4522819Z cvt.u64.u32 %rd64, %r342; 2026-02-21T08:32:10.4522982Z or.b64 %rd59, %rd64, -4611685949691133952; 2026-02-21T08:32:10.4523164Z bfe.u32 %r343, %r51, 4, 14; 2026-02-21T08:32:10.4523317Z cvt.u64.u32 %rd65, %r343; 2026-02-21T08:32:10.4523471Z or.b64 
%rd58, %rd65, -4611685949674356736; 2026-02-21T08:32:10.4523759Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4524044Z elect.sync %r344|%p64, -1; 2026-02-21T08:32:10.4524202Z mov.b32 %r335, 136314896; 2026-02-21T08:32:10.4524348Z mov.pred %p63, 0; 2026-02-21T08:32:10.4524490Z // begin inline asm 2026-02-21T08:32:10.4524749Z @%p64 tcgen05.mma.cta_group::1.kind::f16 [ %r930 + 0 ], %rd58, %rd59, %r335, %p63; 2026-02-21T08:32:10.4525000Z // end inline asm 2026-02-21T08:32:10.4525137Z // begin inline asm 2026-02-21T08:32:10.4525384Z @%p64 tcgen05.mma.cta_group::1.kind::f16 [ %r930 + 128 ], %rd60, %rd59, %r335, %p63; 2026-02-21T08:32:10.4525643Z // end inline asm 2026-02-21T08:32:10.4525775Z add.s32 %r345, %r51, 98368; 2026-02-21T08:32:10.4525936Z cvt.u64.u32 %rd62, %r345; 2026-02-21T08:32:10.4526085Z // begin inline asm 2026-02-21T08:32:10.4526297Z @%p64 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd62]; 2026-02-21T08:32:10.4526528Z // end inline asm 2026-02-21T08:32:10.4526655Z $L__BB0_3: 2026-02-21T08:32:10.4526893Z .loc 1 0 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:0:52 2026-02-21T08:32:10.4527197Z ld.param.b64 %rd10, [_helion_matmul_param_2]; 2026-02-21T08:32:10.4527393Z add.s32 %r9, %r51, %r298; 2026-02-21T08:32:10.4527542Z add.s32 %r10, %r51, %r299; 2026-02-21T08:32:10.4527700Z add.s32 %r11, %r51, %r300; 2026-02-21T08:32:10.4527847Z add.s32 %r12, %r51, %r301; 2026-02-21T08:32:10.4528007Z add.s32 %r570, %r565, 1024; 2026-02-21T08:32:10.4528169Z add.s32 %r575, %r565, 2048; 2026-02-21T08:32:10.4528321Z add.s32 %r580, %r565, 3072; 2026-02-21T08:32:10.4528479Z or.b32 %r19, %r18, 16; 2026-02-21T08:32:10.4528624Z or.b32 %r20, %r18, 32; 2026-02-21T08:32:10.4528810Z or.b32 %r21, %r287, 48; 2026-02-21T08:32:10.4528953Z or.b32 %r22, %r18, 64; 2026-02-21T08:32:10.4529097Z or.b32 %r23, %r18, 80; 2026-02-21T08:32:10.4529232Z or.b32 %r24, %r18, 96; 2026-02-21T08:32:10.4529378Z or.b32 %r25, 
%r287, 112; 2026-02-21T08:32:10.4529524Z or.b32 %r26, %r18, 128; 2026-02-21T08:32:10.4529672Z or.b32 %r27, %r18, 144; 2026-02-21T08:32:10.4529818Z or.b32 %r28, %r18, 160; 2026-02-21T08:32:10.4529958Z or.b32 %r29, %r287, 176; 2026-02-21T08:32:10.4530105Z or.b32 %r30, %r18, 192; 2026-02-21T08:32:10.4530242Z or.b32 %r31, %r18, 208; 2026-02-21T08:32:10.4530387Z or.b32 %r32, %r18, 224; 2026-02-21T08:32:10.4530525Z or.b32 %r33, %r287, 240; 2026-02-21T08:32:10.4530675Z or.b32 %r35, %r353, %r281; 2026-02-21T08:32:10.4530935Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4531220Z add.s64 %rd66, %rd31, 224; 2026-02-21T08:32:10.4531395Z cvt.u64.u32 %rd70, %r6; 2026-02-21T08:32:10.4531553Z add.s64 %rd71, %rd2, %rd70; 2026-02-21T08:32:10.4531710Z shl.b64 %rd72, %rd71, 1; 2026-02-21T08:32:10.4531856Z add.s64 %rd73, %rd9, %rd72; 2026-02-21T08:32:10.4532015Z add.s64 %rd67, %rd73, 524288; 2026-02-21T08:32:10.4532274Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4532551Z bar.sync 0; 2026-02-21T08:32:10.4532677Z mov.b32 %r347, 16; 2026-02-21T08:32:10.4532819Z // begin inline asm 2026-02-21T08:32:10.4533117Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd66 + 0 ], 0x10, %r347; 2026-02-21T08:32:10.4533343Z // end inline asm 2026-02-21T08:32:10.4533480Z // begin inline asm 2026-02-21T08:32:10.4533674Z cp.async.cg.shared.global [ %r348 + 0 ], [ %rd67 + 0 ], 0x10, %r347; 2026-02-21T08:32:10.4533895Z // end inline asm 2026-02-21T08:32:10.4534032Z cp.async.commit_group; 2026-02-21T08:32:10.4534295Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4534582Z // begin inline asm 2026-02-21T08:32:10.4534816Z @%p89 mbarrier.arrive.expect_tx.shared.b64 _, [%r350], 4096; 2026-02-21T08:32:10.4535030Z // end inline asm 2026-02-21T08:32:10.4535280Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 
2026-02-21T08:32:10.4535565Z bar.sync 0; 2026-02-21T08:32:10.4535700Z elect.sync %r360|%p71, -1; 2026-02-21T08:32:10.4535868Z and.pred %p69, %p1, %p71; 2026-02-21T08:32:10.4536024Z add.s32 %r351, %r51, 94208; 2026-02-21T08:32:10.4536182Z mov.b32 %r352, 112; 2026-02-21T08:32:10.4536318Z // begin inline asm 2026-02-21T08:32:10.4536654Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r351], [%rd68, {%r352, %r353}], [%r350]; 2026-02-21T08:32:10.4537014Z // end inline asm 2026-02-21T08:32:10.4537289Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4537582Z cvt.u16.u32 %rs1, %r3; 2026-02-21T08:32:10.4537735Z shr.u16 %rs2, %rs1, 8; 2026-02-21T08:32:10.4537890Z and.b16 %rs3, %rs2, 1; 2026-02-21T08:32:10.4538044Z mul.wide.u16 %r361, %rs3, 4096; 2026-02-21T08:32:10.4538217Z shl.b32 %r362, %r17, 8; 2026-02-21T08:32:10.4538361Z or.b32 %r363, %r361, %r362; 2026-02-21T08:32:10.4538519Z or.b32 %r364, %r363, %r4; 2026-02-21T08:32:10.4538681Z mul.wide.u32 %rd74, %r364, 4096; 2026-02-21T08:32:10.4538841Z and.b32 %r365, %r1, 1; 2026-02-21T08:32:10.4538994Z mul.wide.u32 %rd75, %r365, 16; 2026-02-21T08:32:10.4539153Z or.b64 %rd76, %rd74, %rd75; 2026-02-21T08:32:10.4539310Z add.s64 %rd77, %rd76, %rd9; 2026-02-21T08:32:10.4539462Z add.s64 %rd361, %rd77, 524544; 2026-02-21T08:32:10.4539621Z mov.b32 %r936, 1; 2026-02-21T08:32:10.4539752Z mov.b32 %r935, 7; 2026-02-21T08:32:10.4539888Z mov.b32 %r931, 0; 2026-02-21T08:32:10.4540019Z mov.b64 %rd362, 0; 2026-02-21T08:32:10.4540159Z mov.b32 %r933, %r931; 2026-02-21T08:32:10.4540301Z mov.b32 %r934, %r931; 2026-02-21T08:32:10.4540488Z mov.b32 %r937, %r931; 2026-02-21T08:32:10.4540635Z bra.uni $L__BB0_4; 2026-02-21T08:32:10.4540815Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:32:10.4541141Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4541439Z setp.lt.u64 %p81, %rd362, 1920; 
2026-02-21T08:32:10.4541720Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4541996Z // begin inline asm 2026-02-21T08:32:10.4542132Z 2026-02-21T08:32:10.4542247Z { 2026-02-21T08:32:10.4542365Z .reg .pred complete; 2026-02-21T08:32:10.4542511Z waitLoop: 2026-02-21T08:32:10.4542698Z mbarrier.try_wait.parity.shared.b64 complete, [%r932], %r931; 2026-02-21T08:32:10.4542935Z @!complete bra.uni waitLoop; 2026-02-21T08:32:10.4543082Z } 2026-02-21T08:32:10.4543152Z 2026-02-21T08:32:10.4543233Z // end inline asm 2026-02-21T08:32:10.4543369Z add.s32 %r401, %r936, 1; 2026-02-21T08:32:10.4543526Z setp.gt.s32 %p84, %r401, 1; 2026-02-21T08:32:10.4543680Z selp.b32 %r936, 0, %r401, %p84; 2026-02-21T08:32:10.4543847Z selp.b32 %r402, 1, 0, %p84; 2026-02-21T08:32:10.4544005Z xor.b32 %r937, %r414, %r402; 2026-02-21T08:32:10.4544262Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4544546Z add.s32 %r403, %r935, 1; 2026-02-21T08:32:10.4544721Z setp.gt.s32 %p85, %r403, 7; 2026-02-21T08:32:10.4544934Z selp.b32 %r935, 0, %r403, %p85; 2026-02-21T08:32:10.4545195Z .loc 1 54 32 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:32 2026-02-21T08:32:10.4545482Z add.s64 %rd86, %rd361, -524288; 2026-02-21T08:32:10.4545754Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4546032Z shl.b32 %r404, %r935, 13; 2026-02-21T08:32:10.4546191Z add.s32 %r406, %r51, %r404; 2026-02-21T08:32:10.4546341Z bar.sync 0; 2026-02-21T08:32:10.4546482Z add.s32 %r392, %r406, %r5; 2026-02-21T08:32:10.4546636Z selp.b32 %r393, 16, 0, %p81; 2026-02-21T08:32:10.4546800Z // begin inline asm 2026-02-21T08:32:10.4546999Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd86 + 0 ], 0x10, %r393; 2026-02-21T08:32:10.4560797Z // end inline asm 2026-02-21T08:32:10.4561041Z add.s32 %r394, %r392, 4096; 2026-02-21T08:32:10.4561234Z // begin inline asm 2026-02-21T08:32:10.4561459Z 
cp.async.cg.shared.global [ %r394 + 0 ], [ %rd361 + 0 ], 0x10, %r393; 2026-02-21T08:32:10.4561725Z // end inline asm 2026-02-21T08:32:10.4561876Z cp.async.commit_group; 2026-02-21T08:32:10.4562173Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4562469Z shl.b32 %r407, %r935, 3; 2026-02-21T08:32:10.4562734Z add.s32 %r408, %r51, %r407; 2026-02-21T08:32:10.4562913Z add.s32 %r400, %r408, 98304; 2026-02-21T08:32:10.4563092Z and.pred %p79, %p89, %p81; 2026-02-21T08:32:10.4563273Z // begin inline asm 2026-02-21T08:32:10.4563475Z @%p79 mbarrier.arrive.expect_tx.shared.b64 _, [%r400], 4096; 2026-02-21T08:32:10.4563711Z // end inline asm 2026-02-21T08:32:10.4563969Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4564278Z shl.b32 %r409, %r935, 12; 2026-02-21T08:32:10.4564436Z add.s32 %r410, %r51, %r409; 2026-02-21T08:32:10.4564606Z add.s32 %r397, %r410, 65536; 2026-02-21T08:32:10.4564816Z bar.sync 0; 2026-02-21T08:32:10.4564958Z elect.sync %r411|%p86, -1; 2026-02-21T08:32:10.4565131Z and.pred %p87, %p81, %p86; 2026-02-21T08:32:10.4565289Z and.pred %p80, %p1, %p87; 2026-02-21T08:32:10.4565458Z cvt.u32.u64 %r412, %rd362; 2026-02-21T08:32:10.4565611Z add.s32 %r398, %r412, 128; 2026-02-21T08:32:10.4565772Z // begin inline asm 2026-02-21T08:32:10.4566116Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r397], [%rd68, {%r398, %r353}], [%r400]; 2026-02-21T08:32:10.4566551Z // end inline asm 2026-02-21T08:32:10.4566818Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4567114Z add.s64 %rd361, %rd361, 32; 2026-02-21T08:32:10.4567292Z setp.lt.u64 %p88, %rd362, 2016; 2026-02-21T08:32:10.4567461Z add.s64 %rd362, %rd362, 16; 2026-02-21T08:32:10.4567628Z mov.b32 %r931, %r414; 2026-02-21T08:32:10.4567776Z mov.b32 %r932, %r413; 2026-02-21T08:32:10.4567933Z @%p88 bra $L__BB0_4; 2026-02-21T08:32:10.4568081Z 
bra.uni $L__BB0_7; 2026-02-21T08:32:10.4568284Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:32:10.4568619Z .loc 1 0 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:0:89 2026-02-21T08:32:10.4568901Z mov.b32 %r414, %r937; 2026-02-21T08:32:10.4569210Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4569499Z add.s32 %r368, %r934, 1; 2026-02-21T08:32:10.4569670Z setp.gt.s32 %p73, %r368, 7; 2026-02-21T08:32:10.4569838Z selp.b32 %r934, 0, %r368, %p73; 2026-02-21T08:32:10.4570011Z selp.b32 %r369, 1, 0, %p73; 2026-02-21T08:32:10.4570167Z xor.b32 %r933, %r933, %r369; 2026-02-21T08:32:10.4570438Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4570747Z cp.async.wait_group 6; 2026-02-21T08:32:10.4570905Z bar.sync 0; 2026-02-21T08:32:10.4571199Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4571488Z shl.b32 %r370, %r934, 3; 2026-02-21T08:32:10.4571648Z add.s32 %r372, %r51, %r370; 2026-02-21T08:32:10.4571799Z add.s32 %r366, %r372, 98304; 2026-02-21T08:32:10.4571960Z // begin inline asm 2026-02-21T08:32:10.4572104Z 2026-02-21T08:32:10.4572218Z { 2026-02-21T08:32:10.4572348Z .reg .pred complete; 2026-02-21T08:32:10.4572495Z waitLoop: 2026-02-21T08:32:10.4572697Z mbarrier.try_wait.parity.shared.b64 complete, [%r366], %r933; 2026-02-21T08:32:10.4572929Z @!complete bra.uni waitLoop; 2026-02-21T08:32:10.4573088Z } 2026-02-21T08:32:10.4573156Z 2026-02-21T08:32:10.4573213Z // end inline asm 2026-02-21T08:32:10.4573360Z shl.b32 %r373, %r936, 3; 2026-02-21T08:32:10.4573510Z add.s32 %r374, %r51, %r373; 2026-02-21T08:32:10.4573670Z add.s32 %r413, %r374, 98368; 2026-02-21T08:32:10.4573944Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4574222Z @%p62 bra $L__BB0_6; 2026-02-21T08:32:10.4574418Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 
2026-02-21T08:32:10.4574766Z .loc 1 55 44 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:55:44 2026-02-21T08:32:10.4575089Z shl.b32 %r379, %r934, 12; 2026-02-21T08:32:10.4575243Z add.s32 %r381, %r51, %r379; 2026-02-21T08:32:10.4575405Z add.s32 %r382, %r381, 65536; 2026-02-21T08:32:10.4575670Z .loc 1 54 85 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:54:85 2026-02-21T08:32:10.4575951Z shl.b32 %r383, %r934, 13; 2026-02-21T08:32:10.4576108Z add.s32 %r384, %r51, %r383; 2026-02-21T08:32:10.4576362Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4576658Z elect.sync %r385|%p75, -1; 2026-02-21T08:32:10.4576818Z bfe.u32 %r386, %r384, 4, 14; 2026-02-21T08:32:10.4576983Z cvt.u64.u32 %rd83, %r386; 2026-02-21T08:32:10.4577150Z or.b64 %rd78, %rd83, -4611685949674356736; 2026-02-21T08:32:10.4577341Z bfe.u32 %r387, %r382, 4, 14; 2026-02-21T08:32:10.4577502Z cvt.u64.u32 %rd84, %r387; 2026-02-21T08:32:10.4577665Z or.b64 %rd79, %rd84, -4611685949691133952; 2026-02-21T08:32:10.4577850Z mov.b32 %r376, 136314896; 2026-02-21T08:32:10.4578001Z mov.pred %p74, -1; 2026-02-21T08:32:10.4578151Z // begin inline asm 2026-02-21T08:32:10.4578379Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r930 + 0 ], %rd78, %rd79, %r376, %p74; 2026-02-21T08:32:10.4578679Z // end inline asm 2026-02-21T08:32:10.4578816Z add.s32 %r388, %r384, 4096; 2026-02-21T08:32:10.4578978Z bfe.u32 %r389, %r388, 4, 14; 2026-02-21T08:32:10.4579139Z cvt.u64.u32 %rd85, %r389; 2026-02-21T08:32:10.4579304Z or.b64 %rd80, %rd85, -4611685949674356736; 2026-02-21T08:32:10.4579492Z // begin inline asm 2026-02-21T08:32:10.4579716Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r930 + 128 ], %rd80, %rd79, %r376, %p74; 2026-02-21T08:32:10.4579978Z // end inline asm 2026-02-21T08:32:10.4580117Z cvt.u64.u32 %rd82, %r413; 2026-02-21T08:32:10.4580277Z // begin inline asm 2026-02-21T08:32:10.4580481Z @%p75 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd82]; 
2026-02-21T08:32:10.4580718Z // end inline asm 2026-02-21T08:32:10.4580858Z bra.uni $L__BB0_6; 2026-02-21T08:32:10.4581034Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:32:10.4581392Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4581677Z // begin inline asm 2026-02-21T08:32:10.4581816Z 2026-02-21T08:32:10.4581928Z { 2026-02-21T08:32:10.4582061Z .reg .pred complete; 2026-02-21T08:32:10.4582205Z waitLoop: 2026-02-21T08:32:10.4582397Z mbarrier.try_wait.parity.shared.b64 complete, [%r413], %r414; 2026-02-21T08:32:10.4582633Z @!complete bra.uni waitLoop; 2026-02-21T08:32:10.4582783Z } 2026-02-21T08:32:10.4582847Z 2026-02-21T08:32:10.4582943Z // end inline asm 2026-02-21T08:32:10.4583188Z .loc 1 49 89 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:49:89 2026-02-21T08:32:10.4583486Z cp.async.wait_group 0; 2026-02-21T08:32:10.4583634Z bar.sync 0; 2026-02-21T08:32:10.4583770Z // begin inline asm 2026-02-21T08:32:10.4583939Z @%p89 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T08:32:10.4584141Z // end inline asm 2026-02-21T08:32:10.4584280Z bar.sync 0; 2026-02-21T08:32:10.4584408Z // begin inline asm 2026-02-21T08:32:10.4584582Z @%p89 mbarrier.inval.shared::cta.b64 [%r207]; 2026-02-21T08:32:10.4584801Z // end inline asm 2026-02-21T08:32:10.4584940Z bar.sync 0; 2026-02-21T08:32:10.4585067Z // begin inline asm 2026-02-21T08:32:10.4585231Z @%p89 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T08:32:10.4585410Z // end inline asm 2026-02-21T08:32:10.4585547Z bar.sync 0; 2026-02-21T08:32:10.4585672Z // begin inline asm 2026-02-21T08:32:10.4585836Z @%p89 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T08:32:10.4586025Z // end inline asm 2026-02-21T08:32:10.4586152Z bar.sync 0; 2026-02-21T08:32:10.4586286Z // begin inline asm 2026-02-21T08:32:10.4586442Z @%p89 mbarrier.inval.shared::cta.b64 [%r210]; 2026-02-21T08:32:10.4586628Z // end inline asm 2026-02-21T08:32:10.4586762Z bar.sync 0; 
2026-02-21T08:32:10.4586950Z // begin inline asm 2026-02-21T08:32:10.4587107Z @%p89 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T08:32:10.4587295Z // end inline asm 2026-02-21T08:32:10.4587425Z bar.sync 0; 2026-02-21T08:32:10.4587559Z // begin inline asm 2026-02-21T08:32:10.4587722Z @%p89 mbarrier.inval.shared::cta.b64 [%r212]; 2026-02-21T08:32:10.4587899Z // end inline asm 2026-02-21T08:32:10.4588040Z bar.sync 0; 2026-02-21T08:32:10.4588165Z // begin inline asm 2026-02-21T08:32:10.4588333Z @%p89 mbarrier.inval.shared::cta.b64 [%r350]; 2026-02-21T08:32:10.4588515Z // end inline asm 2026-02-21T08:32:10.4588671Z add.s32 %r423, %r51, 98368; 2026-02-21T08:32:10.4588828Z // begin inline asm 2026-02-21T08:32:10.4589007Z @%p89 mbarrier.inval.shared::cta.b64 [%r423]; 2026-02-21T08:32:10.4589201Z // end inline asm 2026-02-21T08:32:10.4589327Z bar.sync 0; 2026-02-21T08:32:10.4589461Z // begin inline asm 2026-02-21T08:32:10.4589615Z @%p89 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:32:10.4589801Z // end inline asm 2026-02-21T08:32:10.4590044Z .loc 1 59 45 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:59:45 2026-02-21T08:32:10.4590369Z shl.b32 %r706, %r18, 11; 2026-02-21T08:32:10.4590522Z shl.b32 %r707, %r19, 11; 2026-02-21T08:32:10.4590680Z shl.b32 %r708, %r20, 11; 2026-02-21T08:32:10.4590827Z shl.b32 %r709, %r21, 11; 2026-02-21T08:32:10.4590982Z shl.b32 %r710, %r22, 11; 2026-02-21T08:32:10.4591135Z shl.b32 %r711, %r23, 11; 2026-02-21T08:32:10.4591281Z shl.b32 %r712, %r24, 11; 2026-02-21T08:32:10.4591435Z shl.b32 %r713, %r25, 11; 2026-02-21T08:32:10.4591578Z shl.b32 %r714, %r26, 11; 2026-02-21T08:32:10.4591732Z shl.b32 %r715, %r27, 11; 2026-02-21T08:32:10.4591876Z shl.b32 %r716, %r28, 11; 2026-02-21T08:32:10.4592028Z shl.b32 %r717, %r29, 11; 2026-02-21T08:32:10.4592172Z shl.b32 %r718, %r30, 11; 2026-02-21T08:32:10.4592318Z shl.b32 %r719, %r31, 11; 2026-02-21T08:32:10.4592471Z shl.b32 %r720, %r32, 11; 2026-02-21T08:32:10.4592614Z shl.b32 %r721, 
%r33, 11; 2026-02-21T08:32:10.4592913Z .loc 1 59 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:59:52 2026-02-21T08:32:10.4593229Z or.b32 %r722, %r706, %r35; 2026-02-21T08:32:10.4593399Z or.b32 %r723, %r707, %r35; 2026-02-21T08:32:10.4593564Z or.b32 %r724, %r708, %r35; 2026-02-21T08:32:10.4593717Z or.b32 %r725, %r709, %r35; 2026-02-21T08:32:10.4593875Z or.b32 %r726, %r710, %r35; 2026-02-21T08:32:10.4594026Z or.b32 %r727, %r711, %r35; 2026-02-21T08:32:10.4594182Z or.b32 %r728, %r712, %r35; 2026-02-21T08:32:10.4594333Z or.b32 %r729, %r713, %r35; 2026-02-21T08:32:10.4594490Z or.b32 %r730, %r714, %r35; 2026-02-21T08:32:10.4594712Z or.b32 %r731, %r715, %r35; 2026-02-21T08:32:10.4594880Z or.b32 %r732, %r716, %r35; 2026-02-21T08:32:10.4595039Z or.b32 %r733, %r717, %r35; 2026-02-21T08:32:10.4595203Z or.b32 %r734, %r718, %r35; 2026-02-21T08:32:10.4595367Z or.b32 %r735, %r719, %r35; 2026-02-21T08:32:10.4595522Z or.b32 %r736, %r720, %r35; 2026-02-21T08:32:10.4595692Z or.b32 %r737, %r721, %r35; 2026-02-21T08:32:10.4595964Z .loc 1 59 24 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:59:24 2026-02-21T08:32:10.4596281Z mad.wide.u32 %rd89, %r722, 2, %rd10; 2026-02-21T08:32:10.4596472Z mad.wide.u32 %rd90, %r723, 2, %rd10; 2026-02-21T08:32:10.4596668Z mad.wide.u32 %rd91, %r724, 2, %rd10; 2026-02-21T08:32:10.4596851Z mad.wide.u32 %rd92, %r725, 2, %rd10; 2026-02-21T08:32:10.4597039Z mad.wide.u32 %rd93, %r726, 2, %rd10; 2026-02-21T08:32:10.4597227Z mad.wide.u32 %rd94, %r727, 2, %rd10; 2026-02-21T08:32:10.4597404Z mad.wide.u32 %rd95, %r728, 2, %rd10; 2026-02-21T08:32:10.4597590Z mad.wide.u32 %rd96, %r729, 2, %rd10; 2026-02-21T08:32:10.4597767Z mad.wide.u32 %rd97, %r730, 2, %rd10; 2026-02-21T08:32:10.4597949Z mad.wide.u32 %rd98, %r731, 2, %rd10; 2026-02-21T08:32:10.4598123Z mad.wide.u32 %rd99, %r732, 2, %rd10; 2026-02-21T08:32:10.4598311Z mad.wide.u32 %rd100, %r733, 2, %rd10; 2026-02-21T08:32:10.4598543Z mad.wide.u32 %rd101, %r734, 2, %rd10; 
2026-02-21T08:32:10.4598744Z mad.wide.u32 %rd102, %r735, 2, %rd10; 2026-02-21T08:32:10.4598923Z mad.wide.u32 %rd103, %r736, 2, %rd10; 2026-02-21T08:32:10.4599106Z mad.wide.u32 %rd104, %r737, 2, %rd10; 2026-02-21T08:32:10.4599399Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4599691Z // begin inline asm 2026-02-21T08:32:10.4600071Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440}, [%r560 + 0]; 2026-02-21T08:32:10.4600473Z // end inline asm 2026-02-21T08:32:10.4600625Z // begin inline asm 2026-02-21T08:32:10.4600994Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457}, [%r560 + 16]; 2026-02-21T08:32:10.4601370Z // end inline asm 2026-02-21T08:32:10.4601508Z // begin inline asm 2026-02-21T08:32:10.4601844Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474}, [%r560 + 32]; 2026-02-21T08:32:10.4602264Z // end inline asm 2026-02-21T08:32:10.4602394Z // begin inline asm 2026-02-21T08:32:10.4602739Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491}, [%r560 + 48]; 2026-02-21T08:32:10.4603118Z // end inline asm 2026-02-21T08:32:10.4603257Z // begin inline asm 2026-02-21T08:32:10.4603602Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508}, [%r560 + 64]; 2026-02-21T08:32:10.4603986Z // end inline asm 2026-02-21T08:32:10.4604124Z // begin inline asm 2026-02-21T08:32:10.4604456Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525}, 
[%r560 + 80]; 2026-02-21T08:32:10.4604872Z // end inline asm 2026-02-21T08:32:10.4605024Z // begin inline asm 2026-02-21T08:32:10.4605363Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542}, [%r560 + 96]; 2026-02-21T08:32:10.4605751Z // end inline asm 2026-02-21T08:32:10.4605877Z // begin inline asm 2026-02-21T08:32:10.4606212Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559}, [%r560 + 112]; 2026-02-21T08:32:10.4606569Z // end inline asm 2026-02-21T08:32:10.4606731Z // begin inline asm 2026-02-21T08:32:10.4606885Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:32:10.4607043Z // end inline asm 2026-02-21T08:32:10.4607181Z cvt.u64.u32 %rd105, %r425; 2026-02-21T08:32:10.4607334Z cvt.u64.u32 %rd106, %r426; 2026-02-21T08:32:10.4607491Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:32:10.4607648Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:32:10.4607924Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4608213Z mov.b64 {%r738, %r739}, %rd108; 2026-02-21T08:32:10.4608388Z cvt.rn.f16x2.f32 %r740, %r739, %r738; 2026-02-21T08:32:10.4608669Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4608953Z cvt.u64.u32 %rd109, %r427; 2026-02-21T08:32:10.4609109Z cvt.u64.u32 %rd110, %r428; 2026-02-21T08:32:10.4609257Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:32:10.4609417Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:32:10.4609683Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4609968Z mov.b64 {%r741, %r742}, %rd112; 2026-02-21T08:32:10.4610132Z cvt.rn.f16x2.f32 %r743, %r742, %r741; 2026-02-21T08:32:10.4610445Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4610736Z cvt.u64.u32 %rd113, %r429; 
2026-02-21T08:32:10.4610887Z cvt.u64.u32 %rd114, %r430; 2026-02-21T08:32:10.4611049Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:32:10.4611201Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:32:10.4611471Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4611749Z mov.b64 {%r744, %r745}, %rd116; 2026-02-21T08:32:10.4611917Z cvt.rn.f16x2.f32 %r746, %r745, %r744; 2026-02-21T08:32:10.4612196Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4612475Z cvt.u64.u32 %rd117, %r431; 2026-02-21T08:32:10.4612630Z cvt.u64.u32 %rd118, %r432; 2026-02-21T08:32:10.4612775Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:32:10.4612931Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:32:10.4613197Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4613487Z mov.b64 {%r747, %r748}, %rd120; 2026-02-21T08:32:10.4613646Z cvt.rn.f16x2.f32 %r749, %r748, %r747; 2026-02-21T08:32:10.4613948Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4614230Z cvt.u64.u32 %rd121, %r433; 2026-02-21T08:32:10.4614378Z cvt.u64.u32 %rd122, %r434; 2026-02-21T08:32:10.4614532Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:32:10.4614712Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:32:10.4614977Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4615254Z mov.b64 {%r750, %r751}, %rd124; 2026-02-21T08:32:10.4615424Z cvt.rn.f16x2.f32 %r752, %r751, %r750; 2026-02-21T08:32:10.4615700Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4615974Z cvt.u64.u32 %rd125, %r435; 2026-02-21T08:32:10.4616130Z cvt.u64.u32 %rd126, %r436; 2026-02-21T08:32:10.4616276Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:32:10.4616479Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:32:10.4616742Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4617027Z mov.b64 {%r753, %r754}, %rd128; 2026-02-21T08:32:10.4617191Z cvt.rn.f16x2.f32 %r755, %r754, %r753; 2026-02-21T08:32:10.4617470Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4617761Z cvt.u64.u32 %rd129, %r437; 2026-02-21T08:32:10.4617913Z cvt.u64.u32 %rd130, %r438; 2026-02-21T08:32:10.4618093Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:32:10.4618243Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:32:10.4618512Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4618797Z mov.b64 {%r756, %r757}, %rd132; 2026-02-21T08:32:10.4618966Z cvt.rn.f16x2.f32 %r758, %r757, %r756; 2026-02-21T08:32:10.4619247Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4619526Z cvt.u64.u32 %rd133, %r439; 2026-02-21T08:32:10.4619681Z cvt.u64.u32 %rd134, %r440; 2026-02-21T08:32:10.4619829Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:32:10.4619986Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:32:10.4620245Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4620534Z mov.b64 {%r759, %r760}, %rd136; 2026-02-21T08:32:10.4620695Z cvt.rn.f16x2.f32 %r761, %r760, %r759; 2026-02-21T08:32:10.4620978Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4621261Z cvt.u64.u32 %rd137, %r442; 2026-02-21T08:32:10.4621408Z cvt.u64.u32 %rd138, %r443; 2026-02-21T08:32:10.4621565Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:32:10.4621718Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:32:10.4622021Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4622299Z mov.b64 {%r762, %r763}, %rd140; 2026-02-21T08:32:10.4622465Z cvt.rn.f16x2.f32 %r764, %r763, %r762; 2026-02-21T08:32:10.4622741Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4623018Z cvt.u64.u32 %rd141, 
%r444; 2026-02-21T08:32:10.4623171Z cvt.u64.u32 %rd142, %r445; 2026-02-21T08:32:10.4623318Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:32:10.4623476Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:32:10.4623732Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4624013Z mov.b64 {%r765, %r766}, %rd144; 2026-02-21T08:32:10.4624172Z cvt.rn.f16x2.f32 %r767, %r766, %r765; 2026-02-21T08:32:10.4624444Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4624769Z cvt.u64.u32 %rd145, %r446; 2026-02-21T08:32:10.4624920Z cvt.u64.u32 %rd146, %r447; 2026-02-21T08:32:10.4625076Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:32:10.4625253Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:32:10.4625514Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4625784Z mov.b64 {%r768, %r769}, %rd148; 2026-02-21T08:32:10.4625952Z cvt.rn.f16x2.f32 %r770, %r769, %r768; 2026-02-21T08:32:10.4626228Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4626499Z cvt.u64.u32 %rd149, %r448; 2026-02-21T08:32:10.4626654Z cvt.u64.u32 %rd150, %r449; 2026-02-21T08:32:10.4626798Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:32:10.4626954Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:32:10.4627207Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4627488Z mov.b64 {%r771, %r772}, %rd152; 2026-02-21T08:32:10.4627644Z cvt.rn.f16x2.f32 %r773, %r772, %r771; 2026-02-21T08:32:10.4627940Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4628223Z cvt.u64.u32 %rd153, %r450; 2026-02-21T08:32:10.4628367Z cvt.u64.u32 %rd154, %r451; 2026-02-21T08:32:10.4628517Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:32:10.4628665Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:32:10.4628927Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4629201Z mov.b64 {%r774, %r775}, %rd156; 2026-02-21T08:32:10.4629406Z cvt.rn.f16x2.f32 %r776, %r775, %r774; 2026-02-21T08:32:10.4629678Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4629949Z cvt.u64.u32 %rd157, %r452; 2026-02-21T08:32:10.4630103Z cvt.u64.u32 %rd158, %r453; 2026-02-21T08:32:10.4630249Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:32:10.4630405Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:32:10.4630661Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4630944Z mov.b64 {%r777, %r778}, %rd160; 2026-02-21T08:32:10.4631102Z cvt.rn.f16x2.f32 %r779, %r778, %r777; 2026-02-21T08:32:10.4631376Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4631657Z cvt.u64.u32 %rd161, %r454; 2026-02-21T08:32:10.4631804Z cvt.u64.u32 %rd162, %r455; 2026-02-21T08:32:10.4631958Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:32:10.4632109Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:32:10.4632371Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4632646Z mov.b64 {%r780, %r781}, %rd164; 2026-02-21T08:32:10.4632816Z cvt.rn.f16x2.f32 %r782, %r781, %r780; 2026-02-21T08:32:10.4633118Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4633399Z cvt.u64.u32 %rd165, %r456; 2026-02-21T08:32:10.4633554Z cvt.u64.u32 %rd166, %r457; 2026-02-21T08:32:10.4633703Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:32:10.4633859Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:32:10.4634113Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4634394Z mov.b64 {%r783, %r784}, %rd168; 2026-02-21T08:32:10.4634550Z cvt.rn.f16x2.f32 %r785, %r784, %r783; 2026-02-21T08:32:10.4634868Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4635157Z cvt.u64.u32 %rd169, 
%r459; 2026-02-21T08:32:10.4635304Z cvt.u64.u32 %rd170, %r460; 2026-02-21T08:32:10.4635456Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:32:10.4635606Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:32:10.4635874Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4636150Z mov.b64 {%r786, %r787}, %rd172; 2026-02-21T08:32:10.4636341Z cvt.rn.f16x2.f32 %r788, %r787, %r786; 2026-02-21T08:32:10.4636615Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4636893Z cvt.u64.u32 %rd173, %r461; 2026-02-21T08:32:10.4637050Z cvt.u64.u32 %rd174, %r462; 2026-02-21T08:32:10.4637196Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:32:10.4637351Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:32:10.4637615Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4637915Z mov.b64 {%r789, %r790}, %rd176; 2026-02-21T08:32:10.4638081Z cvt.rn.f16x2.f32 %r791, %r790, %r789; 2026-02-21T08:32:10.4638372Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4638686Z cvt.u64.u32 %rd177, %r463; 2026-02-21T08:32:10.4638838Z cvt.u64.u32 %rd178, %r464; 2026-02-21T08:32:10.4639023Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:32:10.4639179Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:32:10.4639451Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4639765Z mov.b64 {%r792, %r793}, %rd180; 2026-02-21T08:32:10.4639937Z cvt.rn.f16x2.f32 %r794, %r793, %r792; 2026-02-21T08:32:10.4640218Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4640519Z cvt.u64.u32 %rd181, %r465; 2026-02-21T08:32:10.4640710Z cvt.u64.u32 %rd182, %r466; 2026-02-21T08:32:10.4640862Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:32:10.4641026Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:32:10.4641292Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4641585Z mov.b64 {%r795, %r796}, %rd184; 2026-02-21T08:32:10.4641753Z cvt.rn.f16x2.f32 %r797, %r796, %r795; 2026-02-21T08:32:10.4642039Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4642334Z cvt.u64.u32 %rd185, %r467; 2026-02-21T08:32:10.4642486Z cvt.u64.u32 %rd186, %r468; 2026-02-21T08:32:10.4642644Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:32:10.4642799Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:32:10.4643074Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4643134Z mov.b64 {%r798, %r799}, %rd188; 2026-02-21T08:32:10.4643197Z cvt.rn.f16x2.f32 %r800, %r799, %r798; 2026-02-21T08:32:10.4643375Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4643434Z cvt.u64.u32 %rd189, %r469; 2026-02-21T08:32:10.4643493Z cvt.u64.u32 %rd190, %r470; 2026-02-21T08:32:10.4643561Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:32:10.4643652Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:32:10.4643829Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4643900Z mov.b64 {%r801, %r802}, %rd192; 2026-02-21T08:32:10.4643964Z cvt.rn.f16x2.f32 %r803, %r802, %r801; 2026-02-21T08:32:10.4644138Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4644196Z cvt.u64.u32 %rd193, %r471; 2026-02-21T08:32:10.4644261Z cvt.u64.u32 %rd194, %r472; 2026-02-21T08:32:10.4644319Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:32:10.4644380Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:32:10.4644558Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4644618Z mov.b64 {%r804, %r805}, %rd196; 2026-02-21T08:32:10.4644714Z cvt.rn.f16x2.f32 %r806, %r805, %r804; 2026-02-21T08:32:10.4644894Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4644952Z cvt.u64.u32 %rd197, 
%r473; 2026-02-21T08:32:10.4645010Z cvt.u64.u32 %rd198, %r474; 2026-02-21T08:32:10.4645096Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:32:10.4645161Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:32:10.4645331Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4645392Z mov.b64 {%r807, %r808}, %rd200; 2026-02-21T08:32:10.4645462Z cvt.rn.f16x2.f32 %r809, %r808, %r807; 2026-02-21T08:32:10.4645631Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4645691Z cvt.u64.u32 %rd201, %r476; 2026-02-21T08:32:10.4645757Z cvt.u64.u32 %rd202, %r477; 2026-02-21T08:32:10.4645816Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:32:10.4645874Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:32:10.4646043Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4646111Z mov.b64 {%r810, %r811}, %rd204; 2026-02-21T08:32:10.4646198Z cvt.rn.f16x2.f32 %r812, %r811, %r810; 2026-02-21T08:32:10.4646395Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4646462Z cvt.u64.u32 %rd205, %r478; 2026-02-21T08:32:10.4646520Z cvt.u64.u32 %rd206, %r479; 2026-02-21T08:32:10.4646579Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:32:10.4646636Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:32:10.4646821Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4646904Z mov.b64 {%r813, %r814}, %rd208; 2026-02-21T08:32:10.4646962Z cvt.rn.f16x2.f32 %r815, %r814, %r813; 2026-02-21T08:32:10.4647133Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4647188Z cvt.u64.u32 %rd209, %r480; 2026-02-21T08:32:10.4647243Z cvt.u64.u32 %rd210, %r481; 2026-02-21T08:32:10.4647307Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:32:10.4647363Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:32:10.4647528Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4647591Z mov.b64 {%r816, %r817}, %rd212; 2026-02-21T08:32:10.4647650Z cvt.rn.f16x2.f32 %r818, %r817, %r816; 2026-02-21T08:32:10.4647807Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4647863Z cvt.u64.u32 %rd213, %r482; 2026-02-21T08:32:10.4647924Z cvt.u64.u32 %rd214, %r483; 2026-02-21T08:32:10.4647980Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:32:10.4648035Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:32:10.4648199Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4648252Z mov.b64 {%r819, %r820}, %rd216; 2026-02-21T08:32:10.4648310Z cvt.rn.f16x2.f32 %r821, %r820, %r819; 2026-02-21T08:32:10.4648499Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4648558Z cvt.u64.u32 %rd217, %r484; 2026-02-21T08:32:10.4648613Z cvt.u64.u32 %rd218, %r485; 2026-02-21T08:32:10.4648668Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:32:10.4648732Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:32:10.4648893Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4648948Z mov.b64 {%r822, %r823}, %rd220; 2026-02-21T08:32:10.4649011Z cvt.rn.f16x2.f32 %r824, %r823, %r822; 2026-02-21T08:32:10.4649170Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4649227Z cvt.u64.u32 %rd221, %r486; 2026-02-21T08:32:10.4649289Z cvt.u64.u32 %rd222, %r487; 2026-02-21T08:32:10.4649345Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:32:10.4649400Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:32:10.4649564Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4649629Z mov.b64 {%r825, %r826}, %rd224; 2026-02-21T08:32:10.4649710Z cvt.rn.f16x2.f32 %r827, %r826, %r825; 2026-02-21T08:32:10.4649873Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4649935Z cvt.u64.u32 %rd225, 
%r488; 2026-02-21T08:32:10.4649988Z cvt.u64.u32 %rd226, %r489; 2026-02-21T08:32:10.4650043Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:32:10.4650097Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:32:10.4650263Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4650321Z mov.b64 {%r828, %r829}, %rd228; 2026-02-21T08:32:10.4650379Z cvt.rn.f16x2.f32 %r830, %r829, %r828; 2026-02-21T08:32:10.4650542Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4650599Z cvt.u64.u32 %rd229, %r490; 2026-02-21T08:32:10.4650655Z cvt.u64.u32 %rd230, %r491; 2026-02-21T08:32:10.4650796Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:32:10.4650856Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:32:10.4651018Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4651075Z mov.b64 {%r831, %r832}, %rd232; 2026-02-21T08:32:10.4651142Z cvt.rn.f16x2.f32 %r833, %r832, %r831; 2026-02-21T08:32:10.4651304Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4651359Z cvt.u64.u32 %rd233, %r493; 2026-02-21T08:32:10.4651450Z cvt.u64.u32 %rd234, %r494; 2026-02-21T08:32:10.4651505Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:32:10.4651562Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:32:10.4651737Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4651795Z mov.b64 {%r834, %r835}, %rd236; 2026-02-21T08:32:10.4651857Z cvt.rn.f16x2.f32 %r836, %r835, %r834; 2026-02-21T08:32:10.4652032Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4652093Z cvt.u64.u32 %rd237, %r495; 2026-02-21T08:32:10.4652149Z cvt.u64.u32 %rd238, %r496; 2026-02-21T08:32:10.4652206Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:32:10.4652277Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:32:10.4652445Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4652504Z mov.b64 {%r837, %r838}, %rd240; 2026-02-21T08:32:10.4652579Z cvt.rn.f16x2.f32 %r839, %r838, %r837; 2026-02-21T08:32:10.4652748Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4652806Z cvt.u64.u32 %rd241, %r497; 2026-02-21T08:32:10.4652860Z cvt.u64.u32 %rd242, %r498; 2026-02-21T08:32:10.4652922Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:32:10.4652997Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:32:10.4653165Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4653231Z mov.b64 {%r840, %r841}, %rd244; 2026-02-21T08:32:10.4653291Z cvt.rn.f16x2.f32 %r842, %r841, %r840; 2026-02-21T08:32:10.4653460Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4653524Z cvt.u64.u32 %rd245, %r499; 2026-02-21T08:32:10.4653580Z cvt.u64.u32 %rd246, %r500; 2026-02-21T08:32:10.4653636Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:32:10.4653694Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:32:10.4653868Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4653924Z mov.b64 {%r843, %r844}, %rd248; 2026-02-21T08:32:10.4653983Z cvt.rn.f16x2.f32 %r845, %r844, %r843; 2026-02-21T08:32:10.4654157Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4654215Z cvt.u64.u32 %rd249, %r501; 2026-02-21T08:32:10.4654290Z cvt.u64.u32 %rd250, %r502; 2026-02-21T08:32:10.4654353Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:32:10.4654409Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:32:10.4654575Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4654630Z mov.b64 {%r846, %r847}, %rd252; 2026-02-21T08:32:10.4654726Z cvt.rn.f16x2.f32 %r848, %r847, %r846; 2026-02-21T08:32:10.4654892Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4654950Z cvt.u64.u32 %rd253, 
%r503; 2026-02-21T08:32:10.4655011Z cvt.u64.u32 %rd254, %r504; 2026-02-21T08:32:10.4655066Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:32:10.4655122Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:32:10.4655295Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4655390Z mov.b64 {%r849, %r850}, %rd256; 2026-02-21T08:32:10.4655452Z cvt.rn.f16x2.f32 %r851, %r850, %r849; 2026-02-21T08:32:10.4655618Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4655683Z cvt.u64.u32 %rd257, %r505; 2026-02-21T08:32:10.4655737Z cvt.u64.u32 %rd258, %r506; 2026-02-21T08:32:10.4655791Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:32:10.4655856Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:32:10.4656019Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4656104Z mov.b64 {%r852, %r853}, %rd260; 2026-02-21T08:32:10.4656172Z cvt.rn.f16x2.f32 %r854, %r853, %r852; 2026-02-21T08:32:10.4656337Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4656393Z cvt.u64.u32 %rd261, %r507; 2026-02-21T08:32:10.4656449Z cvt.u64.u32 %rd262, %r508; 2026-02-21T08:32:10.4656513Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:32:10.4656570Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:32:10.4656734Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4656796Z mov.b64 {%r855, %r856}, %rd264; 2026-02-21T08:32:10.4656854Z cvt.rn.f16x2.f32 %r857, %r856, %r855; 2026-02-21T08:32:10.4657020Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4657081Z cvt.u64.u32 %rd265, %r510; 2026-02-21T08:32:10.4657135Z cvt.u64.u32 %rd266, %r511; 2026-02-21T08:32:10.4657191Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:32:10.4657246Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:32:10.4657418Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4657473Z mov.b64 {%r858, %r859}, %rd268; 2026-02-21T08:32:10.4657557Z cvt.rn.f16x2.f32 %r860, %r859, %r858; 2026-02-21T08:32:10.4657732Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4657788Z cvt.u64.u32 %rd269, %r512; 2026-02-21T08:32:10.4657843Z cvt.u64.u32 %rd270, %r513; 2026-02-21T08:32:10.4657904Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:32:10.4657959Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:32:10.4658124Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4658182Z mov.b64 {%r861, %r862}, %rd272; 2026-02-21T08:32:10.4658248Z cvt.rn.f16x2.f32 %r863, %r862, %r861; 2026-02-21T08:32:10.4658415Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4658471Z cvt.u64.u32 %rd273, %r514; 2026-02-21T08:32:10.4658534Z cvt.u64.u32 %rd274, %r515; 2026-02-21T08:32:10.4658589Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:32:10.4658646Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:32:10.4658816Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4658901Z mov.b64 {%r864, %r865}, %rd276; 2026-02-21T08:32:10.4658960Z cvt.rn.f16x2.f32 %r866, %r865, %r864; 2026-02-21T08:32:10.4659116Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4659178Z cvt.u64.u32 %rd277, %r516; 2026-02-21T08:32:10.4659231Z cvt.u64.u32 %rd278, %r517; 2026-02-21T08:32:10.4659285Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:32:10.4659348Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:32:10.4659505Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4659561Z mov.b64 {%r867, %r868}, %rd280; 2026-02-21T08:32:10.4659626Z cvt.rn.f16x2.f32 %r869, %r868, %r867; 2026-02-21T08:32:10.4659784Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4659841Z cvt.u64.u32 %rd281, 
%r518; 2026-02-21T08:32:10.4659915Z cvt.u64.u32 %rd282, %r519; 2026-02-21T08:32:10.4659981Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:32:10.4660037Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:32:10.4660193Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4660257Z mov.b64 {%r870, %r871}, %rd284; 2026-02-21T08:32:10.4660318Z cvt.rn.f16x2.f32 %r872, %r871, %r870; 2026-02-21T08:32:10.4660475Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4660562Z cvt.u64.u32 %rd285, %r520; 2026-02-21T08:32:10.4660616Z cvt.u64.u32 %rd286, %r521; 2026-02-21T08:32:10.4660671Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:32:10.4660727Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:32:10.4660903Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4660960Z mov.b64 {%r873, %r874}, %rd288; 2026-02-21T08:32:10.4661021Z cvt.rn.f16x2.f32 %r875, %r874, %r873; 2026-02-21T08:32:10.4661196Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4661253Z cvt.u64.u32 %rd289, %r522; 2026-02-21T08:32:10.4661308Z cvt.u64.u32 %rd290, %r523; 2026-02-21T08:32:10.4661371Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:32:10.4661427Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:32:10.4661591Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4661647Z mov.b64 {%r876, %r877}, %rd292; 2026-02-21T08:32:10.4661713Z cvt.rn.f16x2.f32 %r878, %r877, %r876; 2026-02-21T08:32:10.4661877Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4661932Z cvt.u64.u32 %rd293, %r524; 2026-02-21T08:32:10.4661993Z cvt.u64.u32 %rd294, %r525; 2026-02-21T08:32:10.4662070Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:32:10.4662127Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:32:10.4662303Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4662363Z mov.b64 {%r879, %r880}, %rd296; 2026-02-21T08:32:10.4662421Z cvt.rn.f16x2.f32 %r881, %r880, %r879; 2026-02-21T08:32:10.4662587Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4662649Z cvt.u64.u32 %rd297, %r527; 2026-02-21T08:32:10.4662702Z cvt.u64.u32 %rd298, %r528; 2026-02-21T08:32:10.4662757Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:32:10.4662824Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:32:10.4662988Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4663044Z mov.b64 {%r882, %r883}, %rd300; 2026-02-21T08:32:10.4663111Z cvt.rn.f16x2.f32 %r884, %r883, %r882; 2026-02-21T08:32:10.4663275Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4663332Z cvt.u64.u32 %rd301, %r529; 2026-02-21T08:32:10.4663412Z cvt.u64.u32 %rd302, %r530; 2026-02-21T08:32:10.4663475Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:32:10.4663530Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:32:10.4663694Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4663758Z mov.b64 {%r885, %r886}, %rd304; 2026-02-21T08:32:10.4663817Z cvt.rn.f16x2.f32 %r887, %r886, %r885; 2026-02-21T08:32:10.4663978Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4664042Z cvt.u64.u32 %rd305, %r531; 2026-02-21T08:32:10.4664096Z cvt.u64.u32 %rd306, %r532; 2026-02-21T08:32:10.4664152Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:32:10.4664207Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:32:10.4664378Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4664455Z mov.b64 {%r888, %r889}, %rd308; 2026-02-21T08:32:10.4664519Z cvt.rn.f16x2.f32 %r890, %r889, %r888; 2026-02-21T08:32:10.4664717Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4664774Z cvt.u64.u32 %rd309, 
%r533; 2026-02-21T08:32:10.4664828Z cvt.u64.u32 %rd310, %r534; 2026-02-21T08:32:10.4664888Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:32:10.4664944Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:32:10.4665109Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4665195Z mov.b64 {%r891, %r892}, %rd312; 2026-02-21T08:32:10.4665264Z cvt.rn.f16x2.f32 %r893, %r892, %r891; 2026-02-21T08:32:10.4665428Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4665483Z cvt.u64.u32 %rd313, %r535; 2026-02-21T08:32:10.4665545Z cvt.u64.u32 %rd314, %r536; 2026-02-21T08:32:10.4665602Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:32:10.4665658Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:32:10.4665825Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4665881Z mov.b64 {%r894, %r895}, %rd316; 2026-02-21T08:32:10.4665938Z cvt.rn.f16x2.f32 %r896, %r895, %r894; 2026-02-21T08:32:10.4666099Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4666160Z cvt.u64.u32 %rd317, %r537; 2026-02-21T08:32:10.4666216Z cvt.u64.u32 %rd318, %r538; 2026-02-21T08:32:10.4666270Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:32:10.4666333Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:32:10.4666498Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4666554Z mov.b64 {%r897, %r898}, %rd320; 2026-02-21T08:32:10.4666656Z cvt.rn.f16x2.f32 %r899, %r898, %r897; 2026-02-21T08:32:10.4666820Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4666877Z cvt.u64.u32 %rd321, %r539; 2026-02-21T08:32:10.4666931Z cvt.u64.u32 %rd322, %r540; 2026-02-21T08:32:10.4666992Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:32:10.4667047Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:32:10.4667209Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4667272Z mov.b64 {%r900, %r901}, %rd324; 2026-02-21T08:32:10.4667333Z cvt.rn.f16x2.f32 %r902, %r901, %r900; 2026-02-21T08:32:10.4667496Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4667558Z cvt.u64.u32 %rd325, %r541; 2026-02-21T08:32:10.4667613Z cvt.u64.u32 %rd326, %r542; 2026-02-21T08:32:10.4667667Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:32:10.4667723Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:32:10.4667895Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4667977Z mov.b64 {%r903, %r904}, %rd328; 2026-02-21T08:32:10.4668036Z cvt.rn.f16x2.f32 %r905, %r904, %r903; 2026-02-21T08:32:10.4668205Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4668260Z cvt.u64.u32 %rd329, %r544; 2026-02-21T08:32:10.4668315Z cvt.u64.u32 %rd330, %r545; 2026-02-21T08:32:10.4668377Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:32:10.4668433Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:32:10.4668597Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4668654Z mov.b64 {%r906, %r907}, %rd332; 2026-02-21T08:32:10.4668724Z cvt.rn.f16x2.f32 %r908, %r907, %r906; 2026-02-21T08:32:10.4668887Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4668942Z cvt.u64.u32 %rd333, %r546; 2026-02-21T08:32:10.4669030Z cvt.u64.u32 %rd334, %r547; 2026-02-21T08:32:10.4669088Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:32:10.4669144Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:32:10.4669314Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4669371Z mov.b64 {%r909, %r910}, %rd336; 2026-02-21T08:32:10.4669432Z cvt.rn.f16x2.f32 %r911, %r910, %r909; 2026-02-21T08:32:10.4669591Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4669684Z cvt.u64.u32 %rd337, 
%r548; 2026-02-21T08:32:10.4669741Z cvt.u64.u32 %rd338, %r549; 2026-02-21T08:32:10.4669800Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:32:10.4669868Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:32:10.4670031Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4670089Z mov.b64 {%r912, %r913}, %rd340; 2026-02-21T08:32:10.4670156Z cvt.rn.f16x2.f32 %r914, %r913, %r912; 2026-02-21T08:32:10.4670319Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4670374Z cvt.u64.u32 %rd341, %r550; 2026-02-21T08:32:10.4670429Z cvt.u64.u32 %rd342, %r551; 2026-02-21T08:32:10.4670491Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:32:10.4670547Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:32:10.4670705Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4670769Z mov.b64 {%r915, %r916}, %rd344; 2026-02-21T08:32:10.4670828Z cvt.rn.f16x2.f32 %r917, %r916, %r915; 2026-02-21T08:32:10.4670989Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4671051Z cvt.u64.u32 %rd345, %r552; 2026-02-21T08:32:10.4671127Z cvt.u64.u32 %rd346, %r553; 2026-02-21T08:32:10.4671185Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:32:10.4671242Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:32:10.4671412Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4671468Z mov.b64 {%r918, %r919}, %rd348; 2026-02-21T08:32:10.4671526Z cvt.rn.f16x2.f32 %r920, %r919, %r918; 2026-02-21T08:32:10.4671693Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4671748Z cvt.u64.u32 %rd349, %r554; 2026-02-21T08:32:10.4671802Z cvt.u64.u32 %rd350, %r555; 2026-02-21T08:32:10.4671865Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:32:10.4671920Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:32:10.4672082Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 
2026-02-21T08:32:10.4672137Z mov.b64 {%r921, %r922}, %rd352; 2026-02-21T08:32:10.4672206Z cvt.rn.f16x2.f32 %r923, %r922, %r921; 2026-02-21T08:32:10.4672370Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4672457Z cvt.u64.u32 %rd353, %r556; 2026-02-21T08:32:10.4672521Z cvt.u64.u32 %rd354, %r557; 2026-02-21T08:32:10.4672577Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:32:10.4672632Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:32:10.4672800Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4672856Z mov.b64 {%r924, %r925}, %rd356; 2026-02-21T08:32:10.4672915Z cvt.rn.f16x2.f32 %r926, %r925, %r924; 2026-02-21T08:32:10.4673074Z .loc 1 56 52 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:56:52 2026-02-21T08:32:10.4673136Z cvt.u64.u32 %rd357, %r558; 2026-02-21T08:32:10.4673190Z cvt.u64.u32 %rd358, %r559; 2026-02-21T08:32:10.4673244Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:32:10.4673308Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:32:10.4673489Z .loc 1 58 27 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:58:27 2026-02-21T08:32:10.4673547Z mov.b64 {%r927, %r928}, %rd360; 2026-02-21T08:32:10.4673613Z cvt.rn.f16x2.f32 %r929, %r928, %r927; 2026-02-21T08:32:10.4673774Z .loc 1 59 82 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:59:82 2026-02-21T08:32:10.4673867Z st.shared.v4.b32 [%r9], {%r740, %r752, %r764, %r776}; 2026-02-21T08:32:10.4673958Z st.shared.v4.b32 [%r10], {%r788, %r800, %r812, %r824}; 2026-02-21T08:32:10.4674050Z st.shared.v4.b32 [%r11], {%r836, %r848, %r860, %r872}; 2026-02-21T08:32:10.4674154Z st.shared.v4.b32 [%r12], {%r884, %r896, %r908, %r920}; 2026-02-21T08:32:10.4674208Z bar.sync 0; 2026-02-21T08:32:10.4674272Z // begin inline asm 2026-02-21T08:32:10.4674422Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r641, %r645, %r649, %r653}, [%r565]; 2026-02-21T08:32:10.4674475Z // end inline asm 2026-02-21T08:32:10.4674538Z // begin 
inline asm 2026-02-21T08:32:10.4674709Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r657, %r661, %r665, %r669}, [%r570]; 2026-02-21T08:32:10.4674764Z // end inline asm 2026-02-21T08:32:10.4674820Z // begin inline asm 2026-02-21T08:32:10.4674967Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r673, %r677, %r681, %r685}, [%r575]; 2026-02-21T08:32:10.4675020Z // end inline asm 2026-02-21T08:32:10.4675074Z // begin inline asm 2026-02-21T08:32:10.4675216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r689, %r693, %r697, %r701}, [%r580]; 2026-02-21T08:32:10.4675268Z // end inline asm 2026-02-21T08:32:10.4675320Z bar.sync 0; 2026-02-21T08:32:10.4675406Z st.shared.v4.b32 [%r9], {%r743, %r755, %r767, %r779}; 2026-02-21T08:32:10.4675496Z st.shared.v4.b32 [%r10], {%r791, %r803, %r815, %r827}; 2026-02-21T08:32:10.4675578Z st.shared.v4.b32 [%r11], {%r839, %r851, %r863, %r875}; 2026-02-21T08:32:10.4675659Z st.shared.v4.b32 [%r12], {%r887, %r899, %r911, %r923}; 2026-02-21T08:32:10.4675718Z bar.sync 0; 2026-02-21T08:32:10.4675799Z // begin inline asm 2026-02-21T08:32:10.4675942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r642, %r646, %r650, %r654}, [%r565]; 2026-02-21T08:32:10.4676002Z // end inline asm 2026-02-21T08:32:10.4676055Z // begin inline asm 2026-02-21T08:32:10.4676196Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r658, %r662, %r666, %r670}, [%r570]; 2026-02-21T08:32:10.4676247Z // end inline asm 2026-02-21T08:32:10.4676308Z // begin inline asm 2026-02-21T08:32:10.4676443Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r674, %r678, %r682, %r686}, [%r575]; 2026-02-21T08:32:10.4676495Z // end inline asm 2026-02-21T08:32:10.4676557Z // begin inline asm 2026-02-21T08:32:10.4676695Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r690, %r694, %r698, %r702}, [%r580]; 2026-02-21T08:32:10.4676747Z // end inline asm 2026-02-21T08:32:10.4676800Z bar.sync 0; 2026-02-21T08:32:10.4676891Z st.shared.v4.b32 [%r9], {%r746, %r758, %r770, %r782}; 2026-02-21T08:32:10.4676975Z st.shared.v4.b32 [%r10], {%r794, 
%r806, %r818, %r830}; 2026-02-21T08:32:10.4677057Z st.shared.v4.b32 [%r11], {%r842, %r854, %r866, %r878}; 2026-02-21T08:32:10.4677148Z st.shared.v4.b32 [%r12], {%r890, %r902, %r914, %r926}; 2026-02-21T08:32:10.4677227Z bar.sync 0; 2026-02-21T08:32:10.4677282Z // begin inline asm 2026-02-21T08:32:10.4677427Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r643, %r647, %r651, %r655}, [%r565]; 2026-02-21T08:32:10.4677479Z // end inline asm 2026-02-21T08:32:10.4677532Z // begin inline asm 2026-02-21T08:32:10.4677670Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r659, %r663, %r667, %r671}, [%r570]; 2026-02-21T08:32:10.4677733Z // end inline asm 2026-02-21T08:32:10.4677788Z // begin inline asm 2026-02-21T08:32:10.4677925Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r675, %r679, %r683, %r687}, [%r575]; 2026-02-21T08:32:10.4677989Z // end inline asm 2026-02-21T08:32:10.4678044Z // begin inline asm 2026-02-21T08:32:10.4678184Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r691, %r695, %r699, %r703}, [%r580]; 2026-02-21T08:32:10.4678245Z // end inline asm 2026-02-21T08:32:10.4678292Z bar.sync 0; 2026-02-21T08:32:10.4678396Z st.shared.v4.b32 [%r9], {%r749, %r761, %r773, %r785}; 2026-02-21T08:32:10.4678482Z st.shared.v4.b32 [%r10], {%r797, %r809, %r821, %r833}; 2026-02-21T08:32:10.4678568Z st.shared.v4.b32 [%r11], {%r845, %r857, %r869, %r881}; 2026-02-21T08:32:10.4678648Z st.shared.v4.b32 [%r12], {%r893, %r905, %r917, %r929}; 2026-02-21T08:32:10.4678698Z bar.sync 0; 2026-02-21T08:32:10.4678753Z // begin inline asm 2026-02-21T08:32:10.4678894Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r644, %r648, %r652, %r656}, [%r565]; 2026-02-21T08:32:10.4678942Z // end inline asm 2026-02-21T08:32:10.4679018Z // begin inline asm 2026-02-21T08:32:10.4679162Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r660, %r664, %r668, %r672}, [%r570]; 2026-02-21T08:32:10.4679214Z // end inline asm 2026-02-21T08:32:10.4679268Z // begin inline asm 2026-02-21T08:32:10.4679410Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r676, %r680, %r684, %r688}, [%r575]; 2026-02-21T08:32:10.4679461Z // end inline asm 2026-02-21T08:32:10.4679513Z // begin inline asm 2026-02-21T08:32:10.4679655Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r692, %r696, %r700, %r704}, [%r580]; 2026-02-21T08:32:10.4679708Z // end inline asm 2026-02-21T08:32:10.4679761Z // begin inline asm 2026-02-21T08:32:10.4679861Z st.global.v4.b32 [ %rd89 + 0 ], { %r641, %r642, %r643, %r644 }; 2026-02-21T08:32:10.4679920Z // end inline asm 2026-02-21T08:32:10.4679972Z // begin inline asm 2026-02-21T08:32:10.4680068Z st.global.v4.b32 [ %rd90 + 0 ], { %r645, %r646, %r647, %r648 }; 2026-02-21T08:32:10.4680126Z // end inline asm 2026-02-21T08:32:10.4680179Z // begin inline asm 2026-02-21T08:32:10.4680271Z st.global.v4.b32 [ %rd91 + 0 ], { %r649, %r650, %r651, %r652 }; 2026-02-21T08:32:10.4680324Z // end inline asm 2026-02-21T08:32:10.4680387Z // begin inline asm 2026-02-21T08:32:10.4680480Z st.global.v4.b32 [ %rd92 + 0 ], { %r653, %r654, %r655, %r656 }; 2026-02-21T08:32:10.4680557Z // end inline asm 2026-02-21T08:32:10.4680621Z // begin inline asm 2026-02-21T08:32:10.4680717Z st.global.v4.b32 [ %rd93 + 0 ], { %r657, %r658, %r659, %r660 }; 2026-02-21T08:32:10.4680774Z // end inline asm 2026-02-21T08:32:10.4680829Z // begin inline asm 2026-02-21T08:32:10.4680933Z st.global.v4.b32 [ %rd94 + 0 ], { %r661, %r662, %r663, %r664 }; 2026-02-21T08:32:10.4680988Z // end inline asm 2026-02-21T08:32:10.4681044Z // begin inline asm 2026-02-21T08:32:10.4681147Z st.global.v4.b32 [ %rd95 + 0 ], { %r665, %r666, %r667, %r668 }; 2026-02-21T08:32:10.4681201Z // end inline asm 2026-02-21T08:32:10.4681255Z // begin inline asm 2026-02-21T08:32:10.4681359Z st.global.v4.b32 [ %rd96 + 0 ], { %r669, %r670, %r671, %r672 }; 2026-02-21T08:32:10.4681413Z // end inline asm 2026-02-21T08:32:10.4681467Z // begin inline asm 2026-02-21T08:32:10.4681562Z st.global.v4.b32 [ %rd97 + 0 ], { %r673, %r674, %r675, %r676 }; 2026-02-21T08:32:10.4681622Z // end inline asm 
2026-02-21T08:32:10.4681677Z // begin inline asm 2026-02-21T08:32:10.4681772Z st.global.v4.b32 [ %rd98 + 0 ], { %r677, %r678, %r679, %r680 }; 2026-02-21T08:32:10.4681833Z // end inline asm 2026-02-21T08:32:10.4681914Z // begin inline asm 2026-02-21T08:32:10.4682008Z st.global.v4.b32 [ %rd99 + 0 ], { %r681, %r682, %r683, %r684 }; 2026-02-21T08:32:10.4682063Z // end inline asm 2026-02-21T08:32:10.4682125Z // begin inline asm 2026-02-21T08:32:10.4682227Z st.global.v4.b32 [ %rd100 + 0 ], { %r685, %r686, %r687, %r688 }; 2026-02-21T08:32:10.4682280Z // end inline asm 2026-02-21T08:32:10.4682342Z // begin inline asm 2026-02-21T08:32:10.4682442Z st.global.v4.b32 [ %rd101 + 0 ], { %r689, %r690, %r691, %r692 }; 2026-02-21T08:32:10.4682496Z // end inline asm 2026-02-21T08:32:10.4682556Z // begin inline asm 2026-02-21T08:32:10.4682653Z st.global.v4.b32 [ %rd102 + 0 ], { %r693, %r694, %r695, %r696 }; 2026-02-21T08:32:10.4682707Z // end inline asm 2026-02-21T08:32:10.4682762Z // begin inline asm 2026-02-21T08:32:10.4682868Z st.global.v4.b32 [ %rd103 + 0 ], { %r697, %r698, %r699, %r700 }; 2026-02-21T08:32:10.4682922Z // end inline asm 2026-02-21T08:32:10.4682997Z // begin inline asm 2026-02-21T08:32:10.4683102Z st.global.v4.b32 [ %rd104 + 0 ], { %r701, %r702, %r703, %r704 }; 2026-02-21T08:32:10.4683156Z // end inline asm 2026-02-21T08:32:10.4683237Z $L__BB0_8: // %._crit_edge 2026-02-21T08:32:10.4683415Z .loc 1 30 4 // cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py:30:4 2026-02-21T08:32:10.4683476Z bar.sync 0; 2026-02-21T08:32:10.4683531Z // begin inline asm 2026-02-21T08:32:10.4683655Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r930, 256; 2026-02-21T08:32:10.4683738Z // end inline asm 2026-02-21T08:32:10.4683790Z ret; 2026-02-21T08:32:10.4683846Z $L__tmp0: 2026-02-21T08:32:10.4683908Z $L__func_end0: 2026-02-21T08:32:10.4683993Z // -- End function 2026-02-21T08:32:10.4684045Z } 2026-02-21T08:32:10.4684267Z .file 1 
"/tmp/torchinductor_root/vi/cvipho6htlx2xmxryzwurv62cp6wjrz2xclu4dymnurfy2nwixpp.py" 2026-02-21T08:32:10.4684334Z .section .debug_abbrev 2026-02-21T08:32:10.4684385Z { 2026-02-21T08:32:10.4684474Z .b8 1 // Abbreviation Code 2026-02-21T08:32:10.4684571Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:32:10.4684654Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:32:10.4684771Z .b8 37 // DW_AT_producer 2026-02-21T08:32:10.4684848Z .b8 8 // DW_FORM_string 2026-02-21T08:32:10.4684932Z .b8 19 // DW_AT_language 2026-02-21T08:32:10.4685010Z .b8 5 // DW_FORM_data2 2026-02-21T08:32:10.4685086Z .b8 3 // DW_AT_name 2026-02-21T08:32:10.4685167Z .b8 8 // DW_FORM_string 2026-02-21T08:32:10.4685247Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:32:10.4685348Z .b8 6 // DW_FORM_data4 2026-02-21T08:32:10.4685434Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:32:10.4685511Z .b8 8 // DW_FORM_string 2026-02-21T08:32:10.4685585Z .b8 0 // EOM(1) 2026-02-21T08:32:10.4685665Z .b8 0 // EOM(2) 2026-02-21T08:32:10.4685734Z .b8 0 // EOM(3) 2026-02-21T08:32:10.4685786Z } 2026-02-21T08:32:10.4685849Z .section .debug_info 2026-02-21T08:32:10.4685910Z { 2026-02-21T08:32:10.4685995Z .b32 104 // Length of Unit 2026-02-21T08:32:10.4686082Z .b8 2 // DWARF version number 2026-02-21T08:32:10.4686147Z .b8 0 2026-02-21T08:32:10.4686269Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:32:10.4686359Z .b8 8 // Address Size (in bytes) 2026-02-21T08:32:10.4686464Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:32:10.4686557Z .b8 116 // DW_AT_producer 2026-02-21T08:32:10.4686638Z .b8 114 2026-02-21T08:32:10.4686690Z .b8 105 2026-02-21T08:32:10.4686748Z .b8 116 2026-02-21T08:32:10.4686800Z .b8 111 2026-02-21T08:32:10.4686850Z .b8 110 2026-02-21T08:32:10.4686902Z .b8 0 2026-02-21T08:32:10.4686981Z .b8 2 // DW_AT_language 2026-02-21T08:32:10.4687031Z .b8 0 2026-02-21T08:32:10.4687106Z .b8 99 // DW_AT_name 2026-02-21T08:32:10.4687164Z .b8 118 2026-02-21T08:32:10.4687217Z .b8 105 2026-02-21T08:32:10.4687267Z .b8 112 2026-02-21T08:32:10.4687318Z .b8 104 2026-02-21T08:32:10.4687377Z .b8 111 2026-02-21T08:32:10.4687428Z .b8 54 2026-02-21T08:32:10.4687478Z .b8 104 2026-02-21T08:32:10.4687534Z .b8 116 2026-02-21T08:32:10.4687584Z .b8 108 2026-02-21T08:32:10.4687634Z .b8 120 2026-02-21T08:32:10.4687684Z .b8 50 2026-02-21T08:32:10.4687742Z .b8 120 2026-02-21T08:32:10.4687793Z .b8 109 2026-02-21T08:32:10.4687843Z .b8 120 2026-02-21T08:32:10.4687926Z .b8 114 2026-02-21T08:32:10.4687979Z .b8 121 2026-02-21T08:32:10.4688030Z .b8 122 2026-02-21T08:32:10.4688079Z .b8 119 2026-02-21T08:32:10.4688138Z .b8 117 2026-02-21T08:32:10.4688189Z .b8 114 2026-02-21T08:32:10.4688239Z .b8 118 2026-02-21T08:32:10.4688289Z .b8 54 2026-02-21T08:32:10.4688346Z .b8 50 2026-02-21T08:32:10.4688396Z .b8 99 2026-02-21T08:32:10.4688446Z .b8 112 2026-02-21T08:32:10.4688504Z .b8 54 2026-02-21T08:32:10.4688558Z .b8 119 2026-02-21T08:32:10.4688608Z .b8 106 2026-02-21T08:32:10.4688658Z .b8 114 2026-02-21T08:32:10.4688742Z .b8 122 2026-02-21T08:32:10.4688793Z .b8 50 2026-02-21T08:32:10.4688843Z .b8 120 2026-02-21T08:32:10.4688900Z .b8 99 2026-02-21T08:32:10.4688949Z .b8 108 2026-02-21T08:32:10.4689001Z .b8 117 2026-02-21T08:32:10.4689050Z .b8 52 2026-02-21T08:32:10.4689107Z .b8 100 2026-02-21T08:32:10.4689158Z .b8 121 2026-02-21T08:32:10.4689209Z .b8 109 
2026-02-21T08:32:10.4689266Z .b8 110 2026-02-21T08:32:10.4689317Z .b8 117 2026-02-21T08:32:10.4689369Z .b8 114 2026-02-21T08:32:10.4689419Z .b8 102 2026-02-21T08:32:10.4689478Z .b8 121 2026-02-21T08:32:10.4689529Z .b8 50 2026-02-21T08:32:10.4689579Z .b8 110 2026-02-21T08:32:10.4689629Z .b8 119 2026-02-21T08:32:10.4689685Z .b8 105 2026-02-21T08:32:10.4689735Z .b8 120 2026-02-21T08:32:10.4689785Z .b8 112 2026-02-21T08:32:10.4689841Z .b8 112 2026-02-21T08:32:10.4689892Z .b8 46 2026-02-21T08:32:10.4689942Z .b8 112 2026-02-21T08:32:10.4689991Z .b8 121 2026-02-21T08:32:10.4690050Z .b8 0 2026-02-21T08:32:10.4690153Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:32:10.4690227Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:32:10.4690283Z .b8 116 2026-02-21T08:32:10.4690331Z .b8 109 2026-02-21T08:32:10.4690379Z .b8 112 2026-02-21T08:32:10.4690426Z .b8 47 2026-02-21T08:32:10.4690481Z .b8 116 2026-02-21T08:32:10.4690529Z .b8 111 2026-02-21T08:32:10.4690576Z .b8 114 2026-02-21T08:32:10.4690664Z .b8 99 2026-02-21T08:32:10.4690715Z .b8 104 2026-02-21T08:32:10.4690764Z .b8 105 2026-02-21T08:32:10.4690813Z .b8 110 2026-02-21T08:32:10.4690870Z .b8 100 2026-02-21T08:32:10.4690919Z .b8 117 2026-02-21T08:32:10.4690966Z .b8 99 2026-02-21T08:32:10.4691014Z .b8 116 2026-02-21T08:32:10.4691069Z .b8 111 2026-02-21T08:32:10.4691116Z .b8 114 2026-02-21T08:32:10.4691164Z .b8 95 2026-02-21T08:32:10.4691218Z .b8 114 2026-02-21T08:32:10.4691266Z .b8 111 2026-02-21T08:32:10.4691315Z .b8 111 2026-02-21T08:32:10.4691364Z .b8 116 2026-02-21T08:32:10.4691421Z .b8 47 2026-02-21T08:32:10.4691471Z .b8 118 2026-02-21T08:32:10.4691520Z .b8 105 2026-02-21T08:32:10.4691580Z .b8 0 2026-02-21T08:32:10.4691628Z } 2026-02-21T08:32:10.4691691Z .section .debug_macinfo { } 2026-02-21T08:32:10.4691695Z 2026-02-21T08:32:10.4691770Z ================================================================ 2026-02-21T08:32:10.4691878Z please share the reproducer above with Triton project. 
2026-02-21T08:32:10.9182287Z 2026-02-21T08:32:10.9185245Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 22.4 configs/s 2026-02-21T08:32:12.5166846Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 810.9 2026-02-21T08:32:12.5167541Z configs/s 2026-02-21T08:32:12.5945016Z [67s] Generation 3 complete: 2026-02-21T08:32:12.5950509Z error=22 2026-02-21T08:32:12.5955599Z ok=64 2026-02-21T08:32:12.5960213Z min=0.0748 2026-02-21T08:32:12.5961745Z mid=0.1884 2026-02-21T08:32:12.5961910Z max=4.3582 2026-02-21T08:32:12.5962063Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:32:12.5962312Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:32:12.5962527Z 'l2_groupings': [4], 2026-02-21T08:32:12.5962687Z 'load_eviction_policies': ['', ''], 2026-02-21T08:32:12.5962876Z 'loop_orders': [[1, 0]], 2026-02-21T08:32:12.5963032Z 'num_stages': 8, 2026-02-21T08:32:12.5963177Z 'num_warps': 4, 2026-02-21T08:32:12.5963322Z 'pid_type': 'flat', 2026-02-21T08:32:12.5963482Z 'range_flattens': [None, None], 2026-02-21T08:32:12.5963924Z 'range_multi_buffers': [None, None], 2026-02-21T08:32:12.5964133Z 'range_num_stages': [0, 0], 2026-02-21T08:32:12.5964313Z 'range_unroll_factors': [0, 0], 2026-02-21T08:32:12.5964491Z 'range_warp_specializes': [None, True]} 2026-02-21T08:32:12.5964803Z [67s] Fitting surrogate: 365 points, 365 targets 2026-02-21T08:32:13.7650908Z [68s] Generation 4 starting: 81 neighbors, 5 active search path(s) 2026-02-21T08:32:46.2027560Z [101s] Timeout after 30s compiling Config(block_sizes=[256, 256, 256], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=None, num_sm_multiplier=16, num_stages=2, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]) 2026-02-21T08:32:46.2042333Z Generation 4: 
precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82/82 0.4 configs/s 2026-02-21T08:32:49.0892205Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 82/82 29.1 configs/s 2026-02-21T08:32:50.7386938Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 598.7 2026-02-21T08:32:50.7391679Z configs/s 2026-02-21T08:32:50.8317935Z [105s] Generation 4 complete: 2026-02-21T08:32:50.8319806Z error=37 2026-02-21T08:32:50.8319984Z timeout=1 2026-02-21T08:32:50.8320152Z ok=48 2026-02-21T08:32:50.8320298Z min=0.0758 2026-02-21T08:32:50.8320454Z mid=0.1290 2026-02-21T08:32:50.8320630Z max=11.8498 2026-02-21T08:32:50.8320778Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:32:50.8321002Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:32:50.8321228Z 'l2_groupings': [2], 2026-02-21T08:32:50.8321393Z 'load_eviction_policies': ['', ''], 2026-02-21T08:32:50.8321576Z 'loop_orders': [[0, 1]], 2026-02-21T08:32:50.8322084Z 'num_stages': 8, 2026-02-21T08:32:50.8322239Z 'num_warps': 4, 2026-02-21T08:32:50.8322398Z 'pid_type': 'flat', 2026-02-21T08:32:50.8322564Z 'range_flattens': [None, None], 2026-02-21T08:32:50.8322753Z 'range_multi_buffers': [None, None], 2026-02-21T08:32:50.8322936Z 'range_num_stages': [0, 0], 2026-02-21T08:32:50.8323114Z 'range_unroll_factors': [0, 0], 2026-02-21T08:32:50.8323297Z 'range_warp_specializes': [None, True]} 2026-02-21T08:32:50.8340159Z [105s] Fitting surrogate: 451 points, 451 targets 2026-02-21T08:32:51.9407522Z [107s] Generation 5 starting: 75 neighbors, 5 active search path(s) 2026-02-21T08:32:56.3179743Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75/75 15.3 configs/s 2026-02-21T08:32:58.1486005Z 2026-02-21T08:32:58.1491993Z 2026-02-21T08:32:58.1495063Z ================================================================ 2026-02-21T08:32:58.1498732Z Internal Triton PTX codegen error 2026-02-21T08:32:58.1502588Z `ptxas` stderr: 2026-02-21T08:32:58.1505912Z ptxas fatal : (C7602) Insufficient registers (32) 
to compile instruction at line 173 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:32:58.1509592Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:32:58.1513442Z 2026-02-21T08:32:58.1514023Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpk5czzt3i.ptx -o /tmp/tmpk5czzt3i.ptx.o 2026-02-21T08:32:58.1514515Z 2026-02-21T08:32:58.1514519Z 2026-02-21T08:32:58.1514578Z // 2026-02-21T08:32:58.1514764Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:32:58.1514964Z // 2026-02-21T08:32:58.1515039Z 2026-02-21T08:32:58.1515096Z .version 8.7 2026-02-21T08:32:58.1515243Z .target sm_100a 2026-02-21T08:32:58.1515381Z .address_size 64 2026-02-21T08:32:58.1515467Z 2026-02-21T08:32:58.1515598Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:32:58.1515861Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:32:58.1516722Z // @_helion_matmul 2026-02-21T08:32:58.1516932Z .visible .entry _helion_matmul( 2026-02-21T08:32:58.1517172Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:32:58.1517450Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:32:58.1517703Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:32:58.1517962Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:32:58.1518219Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:32:58.1518422Z ) 2026-02-21T08:32:58.1518625Z .reqntid 384 2026-02-21T08:32:58.1518755Z .maxnreg 32 2026-02-21T08:32:58.1518873Z { 2026-02-21T08:32:58.1519003Z .reg .pred %p<70>; 2026-02-21T08:32:58.1519152Z .reg .b16 %rs<3>; 2026-02-21T08:32:58.1519288Z .reg .b32 %r<1275>; 2026-02-21T08:32:58.1519435Z .reg .b64 %rd<641>; 2026-02-21T08:32:58.1519571Z $L__func_begin0: 2026-02-21T08:32:58.1519976Z [113s] Triton compile failed. This likely indicates a bug in Triton. 
Skipping failing config. 2026-02-21T08:32:58.1521211Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 64], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=2, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:32:58.1522317Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:32:58.1522548Z `ptxas` stderr: 2026-02-21T08:32:58.1522944Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 173 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:32:58.1523406Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:32:58.1523609Z 2026-02-21T08:32:58.1523997Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpk5czzt3i.ptx -o /tmp/tmpk5czzt3i.ptx.o 2026-02-21T08:32:58.1524417Z 2026-02-21T08:32:58.1524542Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:32:58.1524780Z 2026-02-21T08:32:58.1524831Z // %bb.0: 2026-02-21T08:32:58.1525081Z .loc 1 19 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:19 2026-02-21T08:32:58.1525361Z mov.u32 %r1, %tid.x; 2026-02-21T08:32:58.1525511Z shr.u32 %r2, %r1, 5; 2026-02-21T08:32:58.1525663Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:32:58.1525850Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:32:58.1526001Z @%p2 bra $L__BB0_13; 2026-02-21T08:32:58.1526146Z bra.uni $L__BB0_1; 2026-02-21T08:32:58.1526279Z $L__BB0_13: 2026-02-21T08:32:58.1526519Z .loc 1 0 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:0:0 2026-02-21T08:32:58.1526831Z ld.param.b64 %rd24, [_helion_matmul_param_3]; 2026-02-21T08:32:58.1527082Z ld.param.b64 %rd23, [_helion_matmul_param_2]; 2026-02-21T08:32:58.1527385Z .loc 1 19 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:19 2026-02-21T08:32:58.1527688Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:32:58.1527893Z setp.lt.u32 %p26, %r1, 32; 2026-02-21T08:32:58.1528062Z mov.b32 %r287, global_smem; 2026-02-21T08:32:58.1528234Z // begin inline asm 2026-02-21T08:32:58.1528485Z @%p26 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r287], 256; 2026-02-21T08:32:58.1528725Z // end inline asm 2026-02-21T08:32:58.1528862Z bar.sync 0, 128; 2026-02-21T08:32:58.1529007Z ld.shared.b32 %r1274, [global_smem]; 2026-02-21T08:32:58.1529181Z bar.sync 0, 128; 2026-02-21T08:32:58.1529311Z // begin inline asm 2026-02-21T08:32:58.1529519Z @%p26 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:32:58.1529740Z // end inline asm 2026-02-21T08:32:58.1530026Z .loc 1 21 71 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:21:71 2026-02-21T08:32:58.1530317Z mov.u32 %r33, %ctaid.x; 2026-02-21T08:32:58.1530467Z mov.u32 %r296, %ctaid.y; 2026-02-21T08:32:58.1530622Z mov.u32 %r297, %ctaid.z; 2026-02-21T08:32:58.1530769Z mov.u32 %r298, %nctaid.x; 2026-02-21T08:32:58.1530931Z mov.u32 %r299, %nctaid.y; 
2026-02-21T08:32:58.1531085Z mad.lo.s32 %r300, %r297, %r299, %r296; 2026-02-21T08:32:58.1531271Z mad.lo.s32 %r301, %r300, %r298, %r33; 2026-02-21T08:32:58.1531438Z shl.b32 %r302, %r301, 7; 2026-02-21T08:32:58.1531628Z cvt.s64.s32 %rd126, %r302; 2026-02-21T08:32:58.1531786Z add.s64 %rd20, %rd24, %rd126; 2026-02-21T08:32:58.1531950Z shl.b32 %r303, %r1, 2; 2026-02-21T08:32:58.1532102Z add.s32 %r288, %r287, %r303; 2026-02-21T08:32:58.1532249Z mov.b32 %r305, 0; 2026-02-21T08:32:58.1532389Z // begin inline asm 2026-02-21T08:32:58.1532539Z @%p26 st.shared.b32 [ %r288 + 0 ], %r305; 2026-02-21T08:32:58.1532714Z // end inline asm 2026-02-21T08:32:58.1532848Z bar.warp.sync -1; 2026-02-21T08:32:58.1532994Z setp.eq.b32 %p63, %r1, 0; 2026-02-21T08:32:58.1533143Z cvt.u64.u32 %rd108, %r287; 2026-02-21T08:32:58.1533301Z // begin inline asm 2026-02-21T08:32:58.1533560Z @%p63 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd108 + 0 ], %rd23; 2026-02-21T08:32:58.1533841Z // end inline asm 2026-02-21T08:32:58.1533976Z // begin inline asm 2026-02-21T08:32:58.1534199Z @%p63 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x1; 2026-02-21T08:32:58.1534456Z // end inline asm 2026-02-21T08:32:58.1534585Z mov.b32 %r290, 64; 2026-02-21T08:32:58.1534757Z // begin inline asm 2026-02-21T08:32:58.1534989Z @%p63 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x0, %r290; 2026-02-21T08:32:58.1535256Z // end inline asm 2026-02-21T08:32:58.1535388Z mov.b32 %r291, 256; 2026-02-21T08:32:58.1535555Z // begin inline asm 2026-02-21T08:32:58.1535795Z @%p63 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x1, %r291; 2026-02-21T08:32:58.1536055Z // end inline asm 2026-02-21T08:32:58.1536190Z mov.b32 %r292, 2048; 2026-02-21T08:32:58.1536325Z // begin inline asm 2026-02-21T08:32:58.1536569Z @%p63 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x0, %r292; 2026-02-21T08:32:58.1536835Z // end inline asm 
2026-02-21T08:32:58.1536969Z mov.b32 %r293, 8192; 2026-02-21T08:32:58.1537109Z // begin inline asm 2026-02-21T08:32:58.1537344Z @%p63 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x1, %r293; 2026-02-21T08:32:58.1537623Z // end inline asm 2026-02-21T08:32:58.1537753Z mov.b64 %rd116, 4096; 2026-02-21T08:32:58.1537898Z // begin inline asm 2026-02-21T08:32:58.1538146Z @%p63 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd108 + 0 ], 0x0, %rd116; 2026-02-21T08:32:58.1538437Z // end inline asm 2026-02-21T08:32:58.1538570Z mov.b32 %r294, 1; 2026-02-21T08:32:58.1538700Z // begin inline asm 2026-02-21T08:32:58.1538960Z @%p63 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x0, %r294; 2026-02-21T08:32:58.1539282Z // end inline asm 2026-02-21T08:32:58.1539422Z // begin inline asm 2026-02-21T08:32:58.1539671Z @%p63 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x1, %r294; 2026-02-21T08:32:58.1539959Z // end inline asm 2026-02-21T08:32:58.1540093Z // begin inline asm 2026-02-21T08:32:58.1540330Z @%p63 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x6; 2026-02-21T08:32:58.1540606Z // end inline asm 2026-02-21T08:32:58.1540745Z // begin inline asm 2026-02-21T08:32:58.1540998Z @%p63 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x0; 2026-02-21T08:32:58.1541274Z // end inline asm 2026-02-21T08:32:58.1541413Z // begin inline asm 2026-02-21T08:32:58.1541650Z @%p63 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x3; 2026-02-21T08:32:58.1541918Z // end inline asm 2026-02-21T08:32:58.1542090Z // begin inline asm 2026-02-21T08:32:58.1542315Z @%p63 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd108 + 0 ], 0x0; 2026-02-21T08:32:58.1542575Z // end inline asm 2026-02-21T08:32:58.1542702Z // begin inline asm 2026-02-21T08:32:58.1543047Z @%p26 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd20 + 0 ], [ %rd108 + 0 ], 0x80; 2026-02-21T08:32:58.1543424Z // end inline asm 2026-02-21T08:32:58.1543557Z // begin inline asm 2026-02-21T08:32:58.1543764Z @%p26 fence.proxy.tensormap::generic.acquire.gpu [ %rd20 + 0 ], 0x80; 2026-02-21T08:32:58.1544042Z @%p26 cp.async.bulk.commit_group ; 2026-02-21T08:32:58.1544237Z @%p26 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:32:58.1544407Z // end inline asm 2026-02-21T08:32:58.1544541Z bar.sync 0, 128; 2026-02-21T08:32:58.1544826Z .loc 1 30 75 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:30:75 2026-02-21T08:32:58.1545122Z setp.gt.u32 %p46, %r33, 511; 2026-02-21T08:32:58.1545281Z @%p46 bra $L__BB0_15; 2026-02-21T08:32:58.1545452Z // %bb.14: // %.lr.ph 2026-02-21T08:32:58.1545747Z .loc 1 21 71 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:21:71 2026-02-21T08:32:58.1546044Z cvta.global.u64 %rd127, %rd20; 2026-02-21T08:32:58.1546217Z setp.lt.u32 %p66, %r1, 64; 2026-02-21T08:32:58.1546369Z shl.b32 %r855, %r1, 7; 2026-02-21T08:32:58.1546521Z and.b32 %r856, %r855, 16256; 2026-02-21T08:32:58.1546669Z shl.b32 %r857, %r1, 4; 2026-02-21T08:32:58.1546821Z and.b32 %r858, %r857, 112; 2026-02-21T08:32:58.1546970Z or.b32 %r859, %r856, %r858; 2026-02-21T08:32:58.1547126Z xor.b32 %r860, %r859, 112; 2026-02-21T08:32:58.1547277Z add.s32 %r862, %r287, %r860; 2026-02-21T08:32:58.1547424Z xor.b32 %r863, %r859, 96; 2026-02-21T08:32:58.1547577Z add.s32 %r864, %r287, %r863; 2026-02-21T08:32:58.1547755Z xor.b32 %r865, %r859, 80; 2026-02-21T08:32:58.1547909Z add.s32 %r866, %r287, %r865; 2026-02-21T08:32:58.1548055Z xor.b32 %r867, %r859, 64; 2026-02-21T08:32:58.1548204Z add.s32 %r868, %r287, %r867; 2026-02-21T08:32:58.1548349Z xor.b32 %r869, %r859, 48; 2026-02-21T08:32:58.1548498Z add.s32 %r870, %r287, %r869; 2026-02-21T08:32:58.1548642Z xor.b32 %r871, %r859, 32; 2026-02-21T08:32:58.1548791Z add.s32 %r872, %r287, %r871; 
2026-02-21T08:32:58.1548942Z xor.b32 %r873, %r859, 16; 2026-02-21T08:32:58.1549084Z add.s32 %r874, %r287, %r873; 2026-02-21T08:32:58.1549238Z add.s32 %r875, %r287, %r859; 2026-02-21T08:32:58.1549493Z .loc 1 37 33 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:37:33 2026-02-21T08:32:58.1549777Z shr.u32 %r876, %r33, 4; 2026-02-21T08:32:58.1549921Z and.b32 %r877, %r876, 30; 2026-02-21T08:32:58.1550181Z .loc 1 39 64 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:39:64 2026-02-21T08:32:58.1550471Z and.b32 %r878, %r33, 1; 2026-02-21T08:32:58.1550736Z .loc 1 39 30 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:39:30 2026-02-21T08:32:58.1551065Z or.b32 %r879, %r877, %r878; 2026-02-21T08:32:58.1551329Z .loc 1 41 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:41:27 2026-02-21T08:32:58.1551619Z shl.b32 %r853, %r879, 8; 2026-02-21T08:32:58.1551881Z .loc 1 43 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:43:27 2026-02-21T08:32:58.1552169Z shl.b32 %r880, %r33, 6; 2026-02-21T08:32:58.1552315Z and.b32 %r881, %r880, 1920; 2026-02-21T08:32:58.1552587Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1552891Z shfl.sync.idx.b32 %r882, %r2, 0, 31, -1; 2026-02-21T08:32:58.1553087Z shl.b32 %r883, %r882, 21; 2026-02-21T08:32:58.1553250Z and.b32 %r884, %r883, 6291456; 2026-02-21T08:32:58.1553416Z add.s32 %r304, %r884, %r1274; 2026-02-21T08:32:58.1553584Z mov.pred %p47, -1; 2026-02-21T08:32:58.1553759Z // begin inline asm 2026-02-21T08:32:58.1554142Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 0], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1554546Z // end inline asm 2026-02-21T08:32:58.1554732Z // begin inline asm 2026-02-21T08:32:58.1555117Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 16], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, 
%r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1555512Z // end inline asm 2026-02-21T08:32:58.1555690Z // begin inline asm 2026-02-21T08:32:58.1556062Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 32], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1556466Z // end inline asm 2026-02-21T08:32:58.1556607Z // begin inline asm 2026-02-21T08:32:58.1556964Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 48], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1557376Z // end inline asm 2026-02-21T08:32:58.1557509Z // begin inline asm 2026-02-21T08:32:58.1557870Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 64], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1558310Z // end inline asm 2026-02-21T08:32:58.1558440Z // begin inline asm 2026-02-21T08:32:58.1558788Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 80], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1559173Z // end inline asm 2026-02-21T08:32:58.1559306Z // begin inline asm 2026-02-21T08:32:58.1559687Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 96], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1560067Z // end inline asm 2026-02-21T08:32:58.1560201Z // begin inline asm 2026-02-21T08:32:58.1560555Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 112], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1560946Z // end inline asm 2026-02-21T08:32:58.1561073Z // begin inline asm 2026-02-21T08:32:58.1561420Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 
[%r304 + 128], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1561809Z // end inline asm 2026-02-21T08:32:58.1561944Z // begin inline asm 2026-02-21T08:32:58.1562292Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 144], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1562673Z // end inline asm 2026-02-21T08:32:58.1562813Z // begin inline asm 2026-02-21T08:32:58.1563167Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 160], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1563597Z // end inline asm 2026-02-21T08:32:58.1563726Z // begin inline asm 2026-02-21T08:32:58.1564074Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 176], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1564462Z // end inline asm 2026-02-21T08:32:58.1564590Z // begin inline asm 2026-02-21T08:32:58.1564969Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 192], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1565348Z // end inline asm 2026-02-21T08:32:58.1565486Z // begin inline asm 2026-02-21T08:32:58.1565864Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 208], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1566243Z // end inline asm 2026-02-21T08:32:58.1566377Z // begin inline asm 2026-02-21T08:32:58.1566714Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 224], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1567094Z // end inline asm 2026-02-21T08:32:58.1567224Z 
// begin inline asm 2026-02-21T08:32:58.1567580Z @%p47 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r304 + 240], {%r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305, %r305}; 2026-02-21T08:32:58.1567994Z // end inline asm 2026-02-21T08:32:58.1568122Z // begin inline asm 2026-02-21T08:32:58.1568279Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:32:58.1568438Z // end inline asm 2026-02-21T08:32:58.1568573Z bar.sync 0, 128; 2026-02-21T08:32:58.1568827Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1569133Z add.s32 %r576, %r287, 114688; 2026-02-21T08:32:58.1569285Z // begin inline asm 2026-02-21T08:32:58.1569454Z @%p63 mbarrier.init.shared::cta.b64 [%r576], 1; 2026-02-21T08:32:58.1569646Z // end inline asm 2026-02-21T08:32:58.1569812Z st.shared.v2.b32 [global_smem+114696], {0, 50397698}; 2026-02-21T08:32:58.1570033Z st.shared.b32 [global_smem+65536], %r1274; 2026-02-21T08:32:58.1570242Z st.shared.v2.b32 [global_smem+65544], {%r853, %r881}; 2026-02-21T08:32:58.1570439Z barrier.sync 1; 2026-02-21T08:32:58.1570573Z barrier.sync 1; 2026-02-21T08:32:58.1570708Z barrier.sync 1; 2026-02-21T08:32:58.1570953Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1571238Z bar.sync 0, 128; 2026-02-21T08:32:58.1571373Z // begin inline asm 2026-02-21T08:32:58.1571502Z 2026-02-21T08:32:58.1571618Z { 2026-02-21T08:32:58.1571766Z .reg .pred complete; 2026-02-21T08:32:58.1571915Z waitLoop: 2026-02-21T08:32:58.1572100Z mbarrier.try_wait.parity.shared.b64 complete, [%r576], %r305; 2026-02-21T08:32:58.1572338Z @!complete bra.uni waitLoop; 2026-02-21T08:32:58.1572486Z } 2026-02-21T08:32:58.1572554Z 2026-02-21T08:32:58.1572607Z // end inline asm 2026-02-21T08:32:58.1572857Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1573137Z bar.sync 0, 128; 2026-02-21T08:32:58.1573272Z // begin inline asm 
2026-02-21T08:32:58.1573433Z @%p63 mbarrier.inval.shared::cta.b64 [%r576]; 2026-02-21T08:32:58.1573622Z // end inline asm 2026-02-21T08:32:58.1573862Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1574148Z // begin inline asm 2026-02-21T08:32:58.1574495Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595}, [%r304 + 0]; 2026-02-21T08:32:58.1574904Z // end inline asm 2026-02-21T08:32:58.1575043Z // begin inline asm 2026-02-21T08:32:58.1575405Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612}, [%r304 + 16]; 2026-02-21T08:32:58.1575790Z // end inline asm 2026-02-21T08:32:58.1575916Z // begin inline asm 2026-02-21T08:32:58.1576254Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629}, [%r304 + 32]; 2026-02-21T08:32:58.1576622Z // end inline asm 2026-02-21T08:32:58.1576747Z // begin inline asm 2026-02-21T08:32:58.1577080Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646}, [%r304 + 48]; 2026-02-21T08:32:58.1577449Z // end inline asm 2026-02-21T08:32:58.1577580Z // begin inline asm 2026-02-21T08:32:58.1577930Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663}, [%r304 + 64]; 2026-02-21T08:32:58.1578307Z // end inline asm 2026-02-21T08:32:58.1578442Z // begin inline asm 2026-02-21T08:32:58.1578764Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680}, [%r304 + 80]; 2026-02-21T08:32:58.1579134Z // end inline asm 
2026-02-21T08:32:58.1579260Z // begin inline asm 2026-02-21T08:32:58.1579597Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697}, [%r304 + 96]; 2026-02-21T08:32:58.1579986Z // end inline asm 2026-02-21T08:32:58.1580119Z // begin inline asm 2026-02-21T08:32:58.1580465Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714}, [%r304 + 112]; 2026-02-21T08:32:58.1580826Z // end inline asm 2026-02-21T08:32:58.1580960Z // begin inline asm 2026-02-21T08:32:58.1581287Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731}, [%r304 + 128]; 2026-02-21T08:32:58.1581668Z // end inline asm 2026-02-21T08:32:58.1581794Z // begin inline asm 2026-02-21T08:32:58.1582128Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748}, [%r304 + 144]; 2026-02-21T08:32:58.1582501Z // end inline asm 2026-02-21T08:32:58.1582626Z // begin inline asm 2026-02-21T08:32:58.1582958Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765}, [%r304 + 160]; 2026-02-21T08:32:58.1583314Z // end inline asm 2026-02-21T08:32:58.1583469Z // begin inline asm 2026-02-21T08:32:58.1583805Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782}, [%r304 + 176]; 2026-02-21T08:32:58.1584168Z // end inline asm 2026-02-21T08:32:58.1584302Z // begin inline asm 2026-02-21T08:32:58.1584642Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799}, [%r304 + 
192]; 2026-02-21T08:32:58.1585061Z // end inline asm 2026-02-21T08:32:58.1585187Z // begin inline asm 2026-02-21T08:32:58.1585518Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816}, [%r304 + 208]; 2026-02-21T08:32:58.1585886Z // end inline asm 2026-02-21T08:32:58.1586012Z // begin inline asm 2026-02-21T08:32:58.1586351Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833}, [%r304 + 224]; 2026-02-21T08:32:58.1586713Z // end inline asm 2026-02-21T08:32:58.1586869Z // begin inline asm 2026-02-21T08:32:58.1587197Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846, %r847, %r848, %r849, %r850}, [%r304 + 240]; 2026-02-21T08:32:58.1587583Z // end inline asm 2026-02-21T08:32:58.1587717Z // begin inline asm 2026-02-21T08:32:58.1587862Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:32:58.1588024Z // end inline asm 2026-02-21T08:32:58.1588159Z cvt.u64.u32 %rd128, %r580; 2026-02-21T08:32:58.1588323Z cvt.u64.u32 %rd129, %r581; 2026-02-21T08:32:58.1588474Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:32:58.1588639Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:32:58.1588906Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1589201Z mov.b64 {%r885, %r886}, %rd131; 2026-02-21T08:32:58.1589377Z cvt.rn.f16x2.f32 %r887, %r886, %r885; 2026-02-21T08:32:58.1589669Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1589959Z cvt.u64.u32 %rd132, %r582; 2026-02-21T08:32:58.1590109Z cvt.u64.u32 %rd133, %r583; 2026-02-21T08:32:58.1590264Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:32:58.1590418Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:32:58.1590683Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1590974Z mov.b64 {%r888, %r889}, %rd135; 2026-02-21T08:32:58.1591168Z cvt.rn.f16x2.f32 %r890, %r889, %r888; 2026-02-21T08:32:58.1591447Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1591721Z cvt.u64.u32 %rd136, %r584; 2026-02-21T08:32:58.1591874Z cvt.u64.u32 %rd137, %r585; 2026-02-21T08:32:58.1592022Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:32:58.1592182Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T08:32:58.1592438Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1592731Z mov.b64 {%r891, %r892}, %rd139; 2026-02-21T08:32:58.1592898Z cvt.rn.f16x2.f32 %r893, %r892, %r891; 2026-02-21T08:32:58.1593170Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1593448Z cvt.u64.u32 %rd140, %r586; 2026-02-21T08:32:58.1593593Z cvt.u64.u32 %rd141, %r587; 2026-02-21T08:32:58.1593744Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:32:58.1593892Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:32:58.1594163Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1594460Z mov.b64 {%r894, %r895}, %rd143; 2026-02-21T08:32:58.1594627Z cvt.rn.f16x2.f32 %r896, %r895, %r894; 2026-02-21T08:32:58.1594973Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1595263Z cvt.u64.u32 %rd144, %r588; 2026-02-21T08:32:58.1595423Z cvt.u64.u32 %rd145, %r589; 2026-02-21T08:32:58.1595573Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:32:58.1595736Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:32:58.1596005Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1596308Z mov.b64 {%r897, %r898}, %rd147; 2026-02-21T08:32:58.1596483Z cvt.rn.f16x2.f32 %r899, %r898, %r897; 2026-02-21T08:32:58.1596763Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1597063Z cvt.u64.u32 %rd148, 
%r590; 2026-02-21T08:32:58.1597218Z cvt.u64.u32 %rd149, %r591; 2026-02-21T08:32:58.1597379Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:32:58.1597533Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:32:58.1597810Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1598106Z mov.b64 {%r900, %r901}, %rd151; 2026-02-21T08:32:58.1598275Z cvt.rn.f16x2.f32 %r902, %r901, %r900; 2026-02-21T08:32:58.1598586Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1598878Z cvt.u64.u32 %rd152, %r592; 2026-02-21T08:32:58.1599037Z cvt.u64.u32 %rd153, %r593; 2026-02-21T08:32:58.1599191Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:32:58.1599355Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:32:58.1599616Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1599917Z mov.b64 {%r903, %r904}, %rd155; 2026-02-21T08:32:58.1600092Z cvt.rn.f16x2.f32 %r905, %r904, %r903; 2026-02-21T08:32:58.1600370Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1600662Z cvt.u64.u32 %rd156, %r594; 2026-02-21T08:32:58.1600816Z cvt.u64.u32 %rd157, %r595; 2026-02-21T08:32:58.1600975Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:32:58.1601156Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:32:58.1601430Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1601721Z mov.b64 {%r906, %r907}, %rd159; 2026-02-21T08:32:58.1601887Z cvt.rn.f16x2.f32 %r908, %r907, %r906; 2026-02-21T08:32:58.1602180Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1602449Z cvt.u64.u32 %rd160, %r597; 2026-02-21T08:32:58.1602603Z cvt.u64.u32 %rd161, %r598; 2026-02-21T08:32:58.1602799Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:32:58.1602956Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:32:58.1603206Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1603483Z mov.b64 {%r909, %r910}, %rd163; 2026-02-21T08:32:58.1603648Z cvt.rn.f16x2.f32 %r911, %r910, %r909; 2026-02-21T08:32:58.1603917Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1604198Z cvt.u64.u32 %rd164, %r599; 2026-02-21T08:32:58.1604344Z cvt.u64.u32 %rd165, %r600; 2026-02-21T08:32:58.1604496Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:32:58.1604645Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:32:58.1604942Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1605222Z mov.b64 {%r912, %r913}, %rd167; 2026-02-21T08:32:58.1605380Z cvt.rn.f16x2.f32 %r914, %r913, %r912; 2026-02-21T08:32:58.1605653Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1605927Z cvt.u64.u32 %rd168, %r601; 2026-02-21T08:32:58.1606078Z cvt.u64.u32 %rd169, %r602; 2026-02-21T08:32:58.1606222Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:32:58.1606375Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:32:58.1606658Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1606941Z mov.b64 {%r915, %r916}, %rd171; 2026-02-21T08:32:58.1607104Z cvt.rn.f16x2.f32 %r917, %r916, %r915; 2026-02-21T08:32:58.1607370Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1607658Z cvt.u64.u32 %rd172, %r603; 2026-02-21T08:32:58.1607804Z cvt.u64.u32 %rd173, %r604; 2026-02-21T08:32:58.1607955Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:32:58.1608103Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:32:58.1608362Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1608648Z mov.b64 {%r918, %r919}, %rd175; 2026-02-21T08:32:58.1608804Z cvt.rn.f16x2.f32 %r920, %r919, %r918; 2026-02-21T08:32:58.1609074Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1609343Z cvt.u64.u32 %rd176, 
%r605; 2026-02-21T08:32:58.1609499Z cvt.u64.u32 %rd177, %r606; 2026-02-21T08:32:58.1609666Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:32:58.1609823Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:32:58.1610075Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1610367Z mov.b64 {%r921, %r922}, %rd179; 2026-02-21T08:32:58.1610535Z cvt.rn.f16x2.f32 %r923, %r922, %r921; 2026-02-21T08:32:58.1610801Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1611081Z cvt.u64.u32 %rd180, %r607; 2026-02-21T08:32:58.1611229Z cvt.u64.u32 %rd181, %r608; 2026-02-21T08:32:58.1611382Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:32:58.1611531Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:32:58.1611790Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1612067Z mov.b64 {%r924, %r925}, %rd183; 2026-02-21T08:32:58.1612249Z cvt.rn.f16x2.f32 %r926, %r925, %r924; 2026-02-21T08:32:58.1612517Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1612786Z cvt.u64.u32 %rd184, %r609; 2026-02-21T08:32:58.1612938Z cvt.u64.u32 %rd185, %r610; 2026-02-21T08:32:58.1613082Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:32:58.1613237Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:32:58.1613485Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1613764Z mov.b64 {%r927, %r928}, %rd187; 2026-02-21T08:32:58.1613954Z cvt.rn.f16x2.f32 %r929, %r928, %r927; 2026-02-21T08:32:58.1614215Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1614498Z cvt.u64.u32 %rd188, %r611; 2026-02-21T08:32:58.1614646Z cvt.u64.u32 %rd189, %r612; 2026-02-21T08:32:58.1614835Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:32:58.1614985Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:32:58.1615244Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1615523Z mov.b64 {%r930, %r931}, %rd191; 2026-02-21T08:32:58.1615681Z cvt.rn.f16x2.f32 %r932, %r931, %r930; 2026-02-21T08:32:58.1615952Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1616223Z cvt.u64.u32 %rd192, %r614; 2026-02-21T08:32:58.1616378Z cvt.u64.u32 %rd193, %r615; 2026-02-21T08:32:58.1616524Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:32:58.1616680Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:32:58.1616931Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1617221Z mov.b64 {%r933, %r934}, %rd195; 2026-02-21T08:32:58.1617385Z cvt.rn.f16x2.f32 %r935, %r934, %r933; 2026-02-21T08:32:58.1617672Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1617956Z cvt.u64.u32 %rd196, %r616; 2026-02-21T08:32:58.1618102Z cvt.u64.u32 %rd197, %r617; 2026-02-21T08:32:58.1618253Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:32:58.1618401Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:32:58.1618660Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1618940Z mov.b64 {%r936, %r937}, %rd199; 2026-02-21T08:32:58.1619096Z cvt.rn.f16x2.f32 %r938, %r937, %r936; 2026-02-21T08:32:58.1619366Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1619641Z cvt.u64.u32 %rd200, %r618; 2026-02-21T08:32:58.1619793Z cvt.u64.u32 %rd201, %r619; 2026-02-21T08:32:58.1619939Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:32:58.1620097Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:32:58.1620350Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1620632Z mov.b64 {%r939, %r940}, %rd203; 2026-02-21T08:32:58.1620823Z cvt.rn.f16x2.f32 %r941, %r940, %r939; 2026-02-21T08:32:58.1621091Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1621375Z cvt.u64.u32 %rd204, 
%r620; 2026-02-21T08:32:58.1621521Z cvt.u64.u32 %rd205, %r621; 2026-02-21T08:32:58.1621674Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:32:58.1621823Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:32:58.1622088Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1622372Z mov.b64 {%r942, %r943}, %rd207; 2026-02-21T08:32:58.1622527Z cvt.rn.f16x2.f32 %r944, %r943, %r942; 2026-02-21T08:32:58.1622798Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1623076Z cvt.u64.u32 %rd208, %r622; 2026-02-21T08:32:58.1623229Z cvt.u64.u32 %rd209, %r623; 2026-02-21T08:32:58.1623399Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:32:58.1623563Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:32:58.1623820Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1624102Z mov.b64 {%r945, %r946}, %rd211; 2026-02-21T08:32:58.1624266Z cvt.rn.f16x2.f32 %r947, %r946, %r945; 2026-02-21T08:32:58.1624535Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1624855Z cvt.u64.u32 %rd212, %r624; 2026-02-21T08:32:58.1625040Z cvt.u64.u32 %rd213, %r625; 2026-02-21T08:32:58.1625199Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:32:58.1625357Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:32:58.1625625Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1625916Z mov.b64 {%r948, %r949}, %rd215; 2026-02-21T08:32:58.1626078Z cvt.rn.f16x2.f32 %r950, %r949, %r948; 2026-02-21T08:32:58.1626360Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1626642Z cvt.u64.u32 %rd216, %r626; 2026-02-21T08:32:58.1626802Z cvt.u64.u32 %rd217, %r627; 2026-02-21T08:32:58.1626953Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:32:58.1627115Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:32:58.1627378Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1627675Z mov.b64 {%r951, %r952}, %rd219; 2026-02-21T08:32:58.1627842Z cvt.rn.f16x2.f32 %r953, %r952, %r951; 2026-02-21T08:32:58.1628122Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1628414Z cvt.u64.u32 %rd220, %r628; 2026-02-21T08:32:58.1628565Z cvt.u64.u32 %rd221, %r629; 2026-02-21T08:32:58.1628722Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:32:58.1628901Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:32:58.1629163Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1629438Z mov.b64 {%r954, %r955}, %rd223; 2026-02-21T08:32:58.1629594Z cvt.rn.f16x2.f32 %r956, %r955, %r954; 2026-02-21T08:32:58.1629859Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1630135Z cvt.u64.u32 %rd224, %r631; 2026-02-21T08:32:58.1630288Z cvt.u64.u32 %rd225, %r632; 2026-02-21T08:32:58.1630433Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:32:58.1630589Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:32:58.1630839Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1631119Z mov.b64 {%r957, %r958}, %rd227; 2026-02-21T08:32:58.1631283Z cvt.rn.f16x2.f32 %r959, %r958, %r957; 2026-02-21T08:32:58.1631545Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1631821Z cvt.u64.u32 %rd228, %r633; 2026-02-21T08:32:58.1631969Z cvt.u64.u32 %rd229, %r634; 2026-02-21T08:32:58.1632182Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:32:58.1632331Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:32:58.1632592Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1632873Z mov.b64 {%r960, %r961}, %rd231; 2026-02-21T08:32:58.1633029Z cvt.rn.f16x2.f32 %r962, %r961, %r960; 2026-02-21T08:32:58.1633297Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1633570Z cvt.u64.u32 %rd232, 
%r635; 2026-02-21T08:32:58.1633725Z cvt.u64.u32 %rd233, %r636; 2026-02-21T08:32:58.1633872Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:32:58.1634032Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:32:58.1634283Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1634563Z mov.b64 {%r963, %r964}, %rd235; 2026-02-21T08:32:58.1634787Z cvt.rn.f16x2.f32 %r965, %r964, %r963; 2026-02-21T08:32:58.1635052Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1635328Z cvt.u64.u32 %rd236, %r637; 2026-02-21T08:32:58.1635473Z cvt.u64.u32 %rd237, %r638; 2026-02-21T08:32:58.1635624Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:32:58.1635772Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:32:58.1636031Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1636335Z mov.b64 {%r966, %r967}, %rd239; 2026-02-21T08:32:58.1636495Z cvt.rn.f16x2.f32 %r968, %r967, %r966; 2026-02-21T08:32:58.1636767Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1637038Z cvt.u64.u32 %rd240, %r639; 2026-02-21T08:32:58.1637194Z cvt.u64.u32 %rd241, %r640; 2026-02-21T08:32:58.1637339Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:32:58.1637494Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:32:58.1637748Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1638033Z mov.b64 {%r969, %r970}, %rd243; 2026-02-21T08:32:58.1638196Z cvt.rn.f16x2.f32 %r971, %r970, %r969; 2026-02-21T08:32:58.1638482Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1638779Z cvt.u64.u32 %rd244, %r641; 2026-02-21T08:32:58.1638932Z cvt.u64.u32 %rd245, %r642; 2026-02-21T08:32:58.1639092Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:32:58.1639247Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:32:58.1639520Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1639815Z mov.b64 {%r972, %r973}, %rd247; 2026-02-21T08:32:58.1639982Z cvt.rn.f16x2.f32 %r974, %r973, %r972; 2026-02-21T08:32:58.1640289Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1640592Z cvt.u64.u32 %rd248, %r643; 2026-02-21T08:32:58.1640751Z cvt.u64.u32 %rd249, %r644; 2026-02-21T08:32:58.1640903Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:32:58.1641065Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:32:58.1641334Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1641637Z mov.b64 {%r975, %r976}, %rd251; 2026-02-21T08:32:58.1641809Z cvt.rn.f16x2.f32 %r977, %r976, %r975; 2026-02-21T08:32:58.1642093Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1642388Z cvt.u64.u32 %rd252, %r645; 2026-02-21T08:32:58.1642542Z cvt.u64.u32 %rd253, %r646; 2026-02-21T08:32:58.1642701Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:32:58.1642857Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:32:58.1643137Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1643441Z mov.b64 {%r978, %r979}, %rd255; 2026-02-21T08:32:58.1643629Z cvt.rn.f16x2.f32 %r980, %r979, %r978; 2026-02-21T08:32:58.1643909Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1644193Z cvt.u64.u32 %rd256, %r648; 2026-02-21T08:32:58.1644353Z cvt.u64.u32 %rd257, %r649; 2026-02-21T08:32:58.1644505Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:32:58.1644699Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:32:58.1644966Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1645272Z mov.b64 {%r981, %r982}, %rd259; 2026-02-21T08:32:58.1645452Z cvt.rn.f16x2.f32 %r983, %r982, %r981; 2026-02-21T08:32:58.1645729Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1646021Z cvt.u64.u32 %rd260, 
%r650; 2026-02-21T08:32:58.1646173Z cvt.u64.u32 %rd261, %r651; 2026-02-21T08:32:58.1646374Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:32:58.1646535Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:32:58.1646822Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1647105Z mov.b64 {%r984, %r985}, %rd263; 2026-02-21T08:32:58.1647265Z cvt.rn.f16x2.f32 %r986, %r985, %r984; 2026-02-21T08:32:58.1647540Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1647820Z cvt.u64.u32 %rd264, %r652; 2026-02-21T08:32:58.1648000Z cvt.u64.u32 %rd265, %r653; 2026-02-21T08:32:58.1648147Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:32:58.1648307Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:32:58.1648559Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1648840Z mov.b64 {%r987, %r988}, %rd267; 2026-02-21T08:32:58.1649003Z cvt.rn.f16x2.f32 %r989, %r988, %r987; 2026-02-21T08:32:58.1649265Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1649550Z cvt.u64.u32 %rd268, %r654; 2026-02-21T08:32:58.1649694Z cvt.u64.u32 %rd269, %r655; 2026-02-21T08:32:58.1649842Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:32:58.1649989Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:32:58.1650245Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1650519Z mov.b64 {%r990, %r991}, %rd271; 2026-02-21T08:32:58.1650677Z cvt.rn.f16x2.f32 %r992, %r991, %r990; 2026-02-21T08:32:58.1650943Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1651218Z cvt.u64.u32 %rd272, %r656; 2026-02-21T08:32:58.1651371Z cvt.u64.u32 %rd273, %r657; 2026-02-21T08:32:58.1651538Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:32:58.1651697Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:32:58.1651951Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 
2026-02-21T08:32:58.1652237Z mov.b64 {%r993, %r994}, %rd275; 2026-02-21T08:32:58.1652402Z cvt.rn.f16x2.f32 %r995, %r994, %r993; 2026-02-21T08:32:58.1652662Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1652938Z cvt.u64.u32 %rd276, %r658; 2026-02-21T08:32:58.1653086Z cvt.u64.u32 %rd277, %r659; 2026-02-21T08:32:58.1653239Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:32:58.1653389Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:32:58.1653648Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1653930Z mov.b64 {%r996, %r997}, %rd279; 2026-02-21T08:32:58.1654090Z cvt.rn.f16x2.f32 %r998, %r997, %r996; 2026-02-21T08:32:58.1654361Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1654633Z cvt.u64.u32 %rd280, %r660; 2026-02-21T08:32:58.1655127Z cvt.u64.u32 %rd281, %r661; 2026-02-21T08:32:58.1655275Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:32:58.1655436Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:32:58.1655692Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1655994Z mov.b64 {%r999, %r1000}, %rd283; 2026-02-21T08:32:58.1656178Z cvt.rn.f16x2.f32 %r1001, %r1000, %r999; 2026-02-21T08:32:58.1656459Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1656757Z cvt.u64.u32 %rd284, %r662; 2026-02-21T08:32:58.1656907Z cvt.u64.u32 %rd285, %r663; 2026-02-21T08:32:58.1657066Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:32:58.1657222Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:32:58.1657487Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1657813Z mov.b64 {%r1002, %r1003}, %rd287; 2026-02-21T08:32:58.1657993Z cvt.rn.f16x2.f32 %r1004, %r1003, %r1002; 2026-02-21T08:32:58.1658276Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1658553Z cvt.u64.u32 
%rd288, %r665; 2026-02-21T08:32:58.1658708Z cvt.u64.u32 %rd289, %r666; 2026-02-21T08:32:58.1658854Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:32:58.1659012Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:32:58.1659269Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1659588Z mov.b64 {%r1005, %r1006}, %rd291; 2026-02-21T08:32:58.1659769Z cvt.rn.f16x2.f32 %r1007, %r1006, %r1005; 2026-02-21T08:32:58.1660047Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1660327Z cvt.u64.u32 %rd292, %r667; 2026-02-21T08:32:58.1660473Z cvt.u64.u32 %rd293, %r668; 2026-02-21T08:32:58.1660629Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:32:58.1660779Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:32:58.1661042Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1661324Z mov.b64 {%r1008, %r1009}, %rd295; 2026-02-21T08:32:58.1661492Z cvt.rn.f16x2.f32 %r1010, %r1009, %r1008; 2026-02-21T08:32:58.1661766Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1662040Z cvt.u64.u32 %rd296, %r669; 2026-02-21T08:32:58.1662195Z cvt.u64.u32 %rd297, %r670; 2026-02-21T08:32:58.1662340Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:32:58.1662494Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:32:58.1662748Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1663032Z mov.b64 {%r1011, %r1012}, %rd299; 2026-02-21T08:32:58.1663233Z cvt.rn.f16x2.f32 %r1013, %r1012, %r1011; 2026-02-21T08:32:58.1663511Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1663793Z cvt.u64.u32 %rd300, %r671; 2026-02-21T08:32:58.1663937Z cvt.u64.u32 %rd301, %r672; 2026-02-21T08:32:58.1664090Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:32:58.1664237Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:32:58.1664507Z .loc 1 58 27 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1664833Z mov.b64 {%r1014, %r1015}, %rd303; 2026-02-21T08:32:58.1665006Z cvt.rn.f16x2.f32 %r1016, %r1015, %r1014; 2026-02-21T08:32:58.1665293Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1665571Z cvt.u64.u32 %rd304, %r673; 2026-02-21T08:32:58.1665724Z cvt.u64.u32 %rd305, %r674; 2026-02-21T08:32:58.1665871Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:32:58.1666030Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:32:58.1666291Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1666605Z mov.b64 {%r1017, %r1018}, %rd307; 2026-02-21T08:32:58.1666779Z cvt.rn.f16x2.f32 %r1019, %r1018, %r1017; 2026-02-21T08:32:58.1667050Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1667330Z cvt.u64.u32 %rd308, %r675; 2026-02-21T08:32:58.1667475Z cvt.u64.u32 %rd309, %r676; 2026-02-21T08:32:58.1667627Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:32:58.1667776Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:32:58.1668043Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1668323Z mov.b64 {%r1020, %r1021}, %rd311; 2026-02-21T08:32:58.1668489Z cvt.rn.f16x2.f32 %r1022, %r1021, %r1020; 2026-02-21T08:32:58.1668769Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1669063Z cvt.u64.u32 %rd312, %r677; 2026-02-21T08:32:58.1669220Z cvt.u64.u32 %rd313, %r678; 2026-02-21T08:32:58.1669366Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:32:58.1669523Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:32:58.1669774Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1670052Z mov.b64 {%r1023, %r1024}, %rd315; 2026-02-21T08:32:58.1670221Z cvt.rn.f16x2.f32 %r1025, %r1024, %r1023; 2026-02-21T08:32:58.1670489Z .loc 1 56 52 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1670807Z cvt.u64.u32 %rd316, %r679; 2026-02-21T08:32:58.1670959Z cvt.u64.u32 %rd317, %r680; 2026-02-21T08:32:58.1671117Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:32:58.1671272Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:32:58.1671541Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1671824Z mov.b64 {%r1026, %r1027}, %rd319; 2026-02-21T08:32:58.1671996Z cvt.rn.f16x2.f32 %r1028, %r1027, %r1026; 2026-02-21T08:32:58.1672286Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1672568Z cvt.u64.u32 %rd320, %r682; 2026-02-21T08:32:58.1672726Z cvt.u64.u32 %rd321, %r683; 2026-02-21T08:32:58.1672877Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:32:58.1673039Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:32:58.1673298Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1673592Z mov.b64 {%r1029, %r1030}, %rd323; 2026-02-21T08:32:58.1673769Z cvt.rn.f16x2.f32 %r1031, %r1030, %r1029; 2026-02-21T08:32:58.1674042Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1674330Z cvt.u64.u32 %rd324, %r684; 2026-02-21T08:32:58.1674509Z cvt.u64.u32 %rd325, %r685; 2026-02-21T08:32:58.1674666Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:32:58.1674840Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:32:58.1675104Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1675388Z mov.b64 {%r1032, %r1033}, %rd327; 2026-02-21T08:32:58.1675555Z cvt.rn.f16x2.f32 %r1034, %r1033, %r1032; 2026-02-21T08:32:58.1675836Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1676120Z cvt.u64.u32 %rd328, %r686; 2026-02-21T08:32:58.1676276Z cvt.u64.u32 %rd329, %r687; 2026-02-21T08:32:58.1676428Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:32:58.1676591Z or.b64 
%rd331, %rd328, %rd330; 2026-02-21T08:32:58.1676854Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1677130Z mov.b64 {%r1035, %r1036}, %rd331; 2026-02-21T08:32:58.1677305Z cvt.rn.f16x2.f32 %r1037, %r1036, %r1035; 2026-02-21T08:32:58.1677577Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1677880Z cvt.u64.u32 %rd332, %r688; 2026-02-21T08:32:58.1678027Z cvt.u64.u32 %rd333, %r689; 2026-02-21T08:32:58.1678179Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:32:58.1678328Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:32:58.1678586Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1678867Z mov.b64 {%r1038, %r1039}, %rd335; 2026-02-21T08:32:58.1679036Z cvt.rn.f16x2.f32 %r1040, %r1039, %r1038; 2026-02-21T08:32:58.1679318Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1679591Z cvt.u64.u32 %rd336, %r690; 2026-02-21T08:32:58.1679744Z cvt.u64.u32 %rd337, %r691; 2026-02-21T08:32:58.1679888Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:32:58.1680047Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:32:58.1680338Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1680621Z mov.b64 {%r1041, %r1042}, %rd339; 2026-02-21T08:32:58.1680801Z cvt.rn.f16x2.f32 %r1043, %r1042, %r1041; 2026-02-21T08:32:58.1681082Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1681372Z cvt.u64.u32 %rd340, %r692; 2026-02-21T08:32:58.1681528Z cvt.u64.u32 %rd341, %r693; 2026-02-21T08:32:58.1681698Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:32:58.1681859Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:32:58.1682161Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1682454Z mov.b64 {%r1044, %r1045}, %rd343; 2026-02-21T08:32:58.1682626Z cvt.rn.f16x2.f32 %r1046, %r1045, %r1044; 
2026-02-21T08:32:58.1682914Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1683202Z cvt.u64.u32 %rd344, %r694; 2026-02-21T08:32:58.1683363Z cvt.u64.u32 %rd345, %r695; 2026-02-21T08:32:58.1683516Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:32:58.1683680Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:32:58.1683951Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1684234Z mov.b64 {%r1047, %r1048}, %rd347; 2026-02-21T08:32:58.1684415Z cvt.rn.f16x2.f32 %r1049, %r1048, %r1047; 2026-02-21T08:32:58.1684732Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1685033Z cvt.u64.u32 %rd348, %r696; 2026-02-21T08:32:58.1685186Z cvt.u64.u32 %rd349, %r697; 2026-02-21T08:32:58.1685348Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:32:58.1685505Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:32:58.1685804Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1686101Z mov.b64 {%r1050, %r1051}, %rd351; 2026-02-21T08:32:58.1686280Z cvt.rn.f16x2.f32 %r1052, %r1051, %r1050; 2026-02-21T08:32:58.1686581Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1686869Z cvt.u64.u32 %rd352, %r699; 2026-02-21T08:32:58.1687031Z cvt.u64.u32 %rd353, %r700; 2026-02-21T08:32:58.1687185Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:32:58.1687349Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:32:58.1687625Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1687918Z mov.b64 {%r1053, %r1054}, %rd355; 2026-02-21T08:32:58.1688099Z cvt.rn.f16x2.f32 %r1055, %r1054, %r1053; 2026-02-21T08:32:58.1688391Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1688698Z cvt.u64.u32 %rd356, %r701; 2026-02-21T08:32:58.1688853Z cvt.u64.u32 %rd357, %r702; 2026-02-21T08:32:58.1689015Z shl.b64 %rd358, %rd357, 
32; 2026-02-21T08:32:58.1689176Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:32:58.1689486Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1689780Z mov.b64 {%r1056, %r1057}, %rd359; 2026-02-21T08:32:58.1689943Z cvt.rn.f16x2.f32 %r1058, %r1057, %r1056; 2026-02-21T08:32:58.1690225Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1690502Z cvt.u64.u32 %rd360, %r703; 2026-02-21T08:32:58.1690656Z cvt.u64.u32 %rd361, %r704; 2026-02-21T08:32:58.1690806Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:32:58.1690968Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:32:58.1691240Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1691516Z mov.b64 {%r1059, %r1060}, %rd363; 2026-02-21T08:32:58.1691690Z cvt.rn.f16x2.f32 %r1061, %r1060, %r1059; 2026-02-21T08:32:58.1691997Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1692273Z cvt.u64.u32 %rd364, %r705; 2026-02-21T08:32:58.1692417Z cvt.u64.u32 %rd365, %r706; 2026-02-21T08:32:58.1692569Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:32:58.1692717Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:32:58.1692973Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1693257Z mov.b64 {%r1062, %r1063}, %rd367; 2026-02-21T08:32:58.1693418Z cvt.rn.f16x2.f32 %r1064, %r1063, %r1062; 2026-02-21T08:32:58.1693720Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1693990Z cvt.u64.u32 %rd368, %r707; 2026-02-21T08:32:58.1694144Z cvt.u64.u32 %rd369, %r708; 2026-02-21T08:32:58.1694288Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:32:58.1694447Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:32:58.1694728Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1695000Z mov.b64 {%r1065, %r1066}, %rd371; 2026-02-21T08:32:58.1695169Z 
cvt.rn.f16x2.f32 %r1067, %r1066, %r1065; 2026-02-21T08:32:58.1695432Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1695706Z cvt.u64.u32 %rd372, %r709; 2026-02-21T08:32:58.1695851Z cvt.u64.u32 %rd373, %r710; 2026-02-21T08:32:58.1696002Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:32:58.1696151Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:32:58.1696412Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1696692Z mov.b64 {%r1068, %r1069}, %rd375; 2026-02-21T08:32:58.1696854Z cvt.rn.f16x2.f32 %r1070, %r1069, %r1068; 2026-02-21T08:32:58.1697152Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1697426Z cvt.u64.u32 %rd376, %r711; 2026-02-21T08:32:58.1697583Z cvt.u64.u32 %rd377, %r712; 2026-02-21T08:32:58.1697730Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:32:58.1697888Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:32:58.1698145Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1698420Z mov.b64 {%r1071, %r1072}, %rd379; 2026-02-21T08:32:58.1698593Z cvt.rn.f16x2.f32 %r1073, %r1072, %r1071; 2026-02-21T08:32:58.1698859Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1699140Z cvt.u64.u32 %rd380, %r713; 2026-02-21T08:32:58.1699287Z cvt.u64.u32 %rd381, %r714; 2026-02-21T08:32:58.1699441Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:32:58.1699596Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:32:58.1699850Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1700140Z mov.b64 {%r1074, %r1075}, %rd383; 2026-02-21T08:32:58.1700308Z cvt.rn.f16x2.f32 %r1076, %r1075, %r1074; 2026-02-21T08:32:58.1700617Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1700887Z cvt.u64.u32 %rd384, %r716; 2026-02-21T08:32:58.1701043Z cvt.u64.u32 %rd385, %r717; 
2026-02-21T08:32:58.1701189Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:32:58.1701343Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:32:58.1701605Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1701879Z mov.b64 {%r1077, %r1078}, %rd387; 2026-02-21T08:32:58.1702052Z cvt.rn.f16x2.f32 %r1079, %r1078, %r1077; 2026-02-21T08:32:58.1702320Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1702592Z cvt.u64.u32 %rd388, %r718; 2026-02-21T08:32:58.1702740Z cvt.u64.u32 %rd389, %r719; 2026-02-21T08:32:58.1702896Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:32:58.1703079Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:32:58.1703333Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1703612Z mov.b64 {%r1080, %r1081}, %rd391; 2026-02-21T08:32:58.1703773Z cvt.rn.f16x2.f32 %r1082, %r1081, %r1080; 2026-02-21T08:32:58.1704049Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1704317Z cvt.u64.u32 %rd392, %r720; 2026-02-21T08:32:58.1704468Z cvt.u64.u32 %rd393, %r721; 2026-02-21T08:32:58.1704639Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:32:58.1704837Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:32:58.1705097Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1705371Z mov.b64 {%r1083, %r1084}, %rd395; 2026-02-21T08:32:58.1705544Z cvt.rn.f16x2.f32 %r1085, %r1084, %r1083; 2026-02-21T08:32:58.1705817Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1706093Z cvt.u64.u32 %rd396, %r722; 2026-02-21T08:32:58.1706240Z cvt.u64.u32 %rd397, %r723; 2026-02-21T08:32:58.1706394Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:32:58.1706549Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:32:58.1706805Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1707087Z mov.b64 {%r1086, 
%r1087}, %rd399; 2026-02-21T08:32:58.1707250Z cvt.rn.f16x2.f32 %r1088, %r1087, %r1086; 2026-02-21T08:32:58.1707534Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1707805Z cvt.u64.u32 %rd400, %r724; 2026-02-21T08:32:58.1707959Z cvt.u64.u32 %rd401, %r725; 2026-02-21T08:32:58.1708104Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:32:58.1708259Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:32:58.1708550Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1708829Z mov.b64 {%r1089, %r1090}, %rd403; 2026-02-21T08:32:58.1709002Z cvt.rn.f16x2.f32 %r1091, %r1090, %r1089; 2026-02-21T08:32:58.1709272Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1709550Z cvt.u64.u32 %rd404, %r726; 2026-02-21T08:32:58.1709695Z cvt.u64.u32 %rd405, %r727; 2026-02-21T08:32:58.1709847Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:32:58.1710002Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:32:58.1710253Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1710538Z mov.b64 {%r1092, %r1093}, %rd407; 2026-02-21T08:32:58.1710703Z cvt.rn.f16x2.f32 %r1094, %r1093, %r1092; 2026-02-21T08:32:58.1710975Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1711245Z cvt.u64.u32 %rd408, %r728; 2026-02-21T08:32:58.1711402Z cvt.u64.u32 %rd409, %r729; 2026-02-21T08:32:58.1711576Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:32:58.1711730Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:32:58.1711994Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1712269Z mov.b64 {%r1095, %r1096}, %rd411; 2026-02-21T08:32:58.1712442Z cvt.rn.f16x2.f32 %r1097, %r1096, %r1095; 2026-02-21T08:32:58.1712716Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1712999Z cvt.u64.u32 %rd412, %r730; 
2026-02-21T08:32:58.1713147Z cvt.u64.u32 %rd413, %r731; 2026-02-21T08:32:58.1713301Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:32:58.1713460Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:32:58.1713720Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1714005Z mov.b64 {%r1098, %r1099}, %rd415; 2026-02-21T08:32:58.1714198Z cvt.rn.f16x2.f32 %r1100, %r1099, %r1098; 2026-02-21T08:32:58.1714474Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1714772Z cvt.u64.u32 %rd416, %r733; 2026-02-21T08:32:58.1714925Z cvt.u64.u32 %rd417, %r734; 2026-02-21T08:32:58.1715071Z shl.b64 %rd418, %rd417, 32; 2026-02-21T08:32:58.1715228Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:32:58.1715495Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1715805Z mov.b64 {%r1101, %r1102}, %rd419; 2026-02-21T08:32:58.1715976Z cvt.rn.f16x2.f32 %r1103, %r1102, %r1101; 2026-02-21T08:32:58.1716243Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1716521Z cvt.u64.u32 %rd420, %r735; 2026-02-21T08:32:58.1716664Z cvt.u64.u32 %rd421, %r736; 2026-02-21T08:32:58.1716818Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:32:58.1716974Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:32:58.1717232Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1717514Z mov.b64 {%r1104, %r1105}, %rd423; 2026-02-21T08:32:58.1717676Z cvt.rn.f16x2.f32 %r1106, %r1105, %r1104; 2026-02-21T08:32:58.1717958Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1718232Z cvt.u64.u32 %rd424, %r737; 2026-02-21T08:32:58.1718385Z cvt.u64.u32 %rd425, %r738; 2026-02-21T08:32:58.1718541Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:32:58.1718689Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:32:58.1718953Z .loc 1 58 27 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1719235Z mov.b64 {%r1107, %r1108}, %rd427; 2026-02-21T08:32:58.1719404Z cvt.rn.f16x2.f32 %r1109, %r1108, %r1107; 2026-02-21T08:32:58.1719697Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1719989Z cvt.u64.u32 %rd428, %r739; 2026-02-21T08:32:58.1720135Z cvt.u64.u32 %rd429, %r740; 2026-02-21T08:32:58.1720288Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:32:58.1720445Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:32:58.1720700Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1720984Z mov.b64 {%r1110, %r1111}, %rd431; 2026-02-21T08:32:58.1721148Z cvt.rn.f16x2.f32 %r1112, %r1111, %r1110; 2026-02-21T08:32:58.1721428Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1721701Z cvt.u64.u32 %rd432, %r741; 2026-02-21T08:32:58.1721853Z cvt.u64.u32 %rd433, %r742; 2026-02-21T08:32:58.1722004Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:32:58.1722157Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:32:58.1722425Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1722698Z mov.b64 {%r1113, %r1114}, %rd435; 2026-02-21T08:32:58.1722904Z cvt.rn.f16x2.f32 %r1115, %r1114, %r1113; 2026-02-21T08:32:58.1723169Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1723454Z cvt.u64.u32 %rd436, %r743; 2026-02-21T08:32:58.1723600Z cvt.u64.u32 %rd437, %r744; 2026-02-21T08:32:58.1723752Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:32:58.1723910Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:32:58.1724160Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1724446Z mov.b64 {%r1116, %r1117}, %rd439; 2026-02-21T08:32:58.1724610Z cvt.rn.f16x2.f32 %r1118, %r1117, %r1116; 2026-02-21T08:32:58.1724917Z .loc 1 56 52 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1725198Z cvt.u64.u32 %rd440, %r745; 2026-02-21T08:32:58.1725383Z cvt.u64.u32 %rd441, %r746; 2026-02-21T08:32:58.1725550Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:32:58.1725714Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:32:58.1725999Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1726298Z mov.b64 {%r1119, %r1120}, %rd443; 2026-02-21T08:32:58.1726487Z cvt.rn.f16x2.f32 %r1121, %r1120, %r1119; 2026-02-21T08:32:58.1726775Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1727093Z cvt.u64.u32 %rd444, %r747; 2026-02-21T08:32:58.1727246Z cvt.u64.u32 %rd445, %r748; 2026-02-21T08:32:58.1727405Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:32:58.1727569Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:32:58.1727829Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1728288Z mov.b64 {%r1122, %r1123}, %rd447; 2026-02-21T08:32:58.1728463Z cvt.rn.f16x2.f32 %r1124, %r1123, %r1122; 2026-02-21T08:32:58.1728755Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1729039Z cvt.u64.u32 %rd448, %r750; 2026-02-21T08:32:58.1729201Z cvt.u64.u32 %rd449, %r751; 2026-02-21T08:32:58.1729363Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:32:58.1729522Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:32:58.1729797Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1730087Z mov.b64 {%r1125, %r1126}, %rd451; 2026-02-21T08:32:58.1730268Z cvt.rn.f16x2.f32 %r1127, %r1126, %r1125; 2026-02-21T08:32:58.1730548Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1730839Z cvt.u64.u32 %rd452, %r752; 2026-02-21T08:32:58.1731001Z cvt.u64.u32 %rd453, %r753; 2026-02-21T08:32:58.1731202Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:32:58.1731370Z or.b64 
%rd455, %rd452, %rd454; 2026-02-21T08:32:58.1731634Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1731929Z mov.b64 {%r1128, %r1129}, %rd455; 2026-02-21T08:32:58.1732101Z cvt.rn.f16x2.f32 %r1130, %r1129, %r1128; 2026-02-21T08:32:58.1732385Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1732678Z cvt.u64.u32 %rd456, %r754; 2026-02-21T08:32:58.1732845Z cvt.u64.u32 %rd457, %r755; 2026-02-21T08:32:58.1733001Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:32:58.1733160Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:32:58.1733428Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1733712Z mov.b64 {%r1131, %r1132}, %rd459; 2026-02-21T08:32:58.1733882Z cvt.rn.f16x2.f32 %r1133, %r1132, %r1131; 2026-02-21T08:32:58.1734151Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1734422Z cvt.u64.u32 %rd460, %r756; 2026-02-21T08:32:58.1734603Z cvt.u64.u32 %rd461, %r757; 2026-02-21T08:32:58.1734781Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:32:58.1734938Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:32:58.1735196Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1735476Z mov.b64 {%r1134, %r1135}, %rd463; 2026-02-21T08:32:58.1735643Z cvt.rn.f16x2.f32 %r1136, %r1135, %r1134; 2026-02-21T08:32:58.1735924Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1736204Z cvt.u64.u32 %rd464, %r758; 2026-02-21T08:32:58.1736350Z cvt.u64.u32 %rd465, %r759; 2026-02-21T08:32:58.1736501Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:32:58.1736649Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:32:58.1736920Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1737216Z mov.b64 {%r1137, %r1138}, %rd467; 2026-02-21T08:32:58.1737389Z cvt.rn.f16x2.f32 %r1139, %r1138, %r1137; 
2026-02-21T08:32:58.1737652Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1737939Z cvt.u64.u32 %rd468, %r760; 2026-02-21T08:32:58.1738088Z cvt.u64.u32 %rd469, %r761; 2026-02-21T08:32:58.1738231Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:32:58.1738388Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:32:58.1738637Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1738946Z mov.b64 {%r1140, %r1141}, %rd471; 2026-02-21T08:32:58.1739110Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 2026-02-21T08:32:58.1739382Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1739662Z cvt.u64.u32 %rd472, %r762; 2026-02-21T08:32:58.1739808Z cvt.u64.u32 %rd473, %r763; 2026-02-21T08:32:58.1739960Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:32:58.1740110Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:32:58.1740373Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1740648Z mov.b64 {%r1143, %r1144}, %rd475; 2026-02-21T08:32:58.1740821Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T08:32:58.1741087Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1741364Z cvt.u64.u32 %rd476, %r764; 2026-02-21T08:32:58.1741523Z cvt.u64.u32 %rd477, %r765; 2026-02-21T08:32:58.1741670Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:32:58.1741826Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:32:58.1742080Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1742415Z mov.b64 {%r1146, %r1147}, %rd479; 2026-02-21T08:32:58.1742583Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T08:32:58.1742864Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1743142Z cvt.u64.u32 %rd480, %r767; 2026-02-21T08:32:58.1743288Z cvt.u64.u32 %rd481, %r768; 2026-02-21T08:32:58.1743441Z shl.b64 %rd482, %rd481, 
32; 2026-02-21T08:32:58.1743590Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:32:58.1743854Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1744136Z mov.b64 {%r1149, %r1150}, %rd483; 2026-02-21T08:32:58.1744313Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T08:32:58.1744582Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1744650Z cvt.u64.u32 %rd484, %r769; 2026-02-21T08:32:58.1744754Z cvt.u64.u32 %rd485, %r770; 2026-02-21T08:32:58.1744813Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:32:58.1744879Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:32:58.1745042Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1745126Z mov.b64 {%r1152, %r1153}, %rd487; 2026-02-21T08:32:58.1745198Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T08:32:58.1745360Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1745415Z cvt.u64.u32 %rd488, %r771; 2026-02-21T08:32:58.1745470Z cvt.u64.u32 %rd489, %r772; 2026-02-21T08:32:58.1745534Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:32:58.1745592Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:32:58.1745755Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1745821Z mov.b64 {%r1155, %r1156}, %rd491; 2026-02-21T08:32:58.1745886Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T08:32:58.1746049Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1746132Z cvt.u64.u32 %rd492, %r773; 2026-02-21T08:32:58.1746190Z cvt.u64.u32 %rd493, %r774; 2026-02-21T08:32:58.1746247Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:32:58.1746303Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:32:58.1746478Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1746535Z mov.b64 {%r1158, %r1159}, %rd495; 2026-02-21T08:32:58.1746598Z 
cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T08:32:58.1746765Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1746840Z cvt.u64.u32 %rd496, %r775; 2026-02-21T08:32:58.1746895Z cvt.u64.u32 %rd497, %r776; 2026-02-21T08:32:58.1746958Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:32:58.1747014Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:32:58.1747175Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1747233Z mov.b64 {%r1161, %r1162}, %rd499; 2026-02-21T08:32:58.1747307Z cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T08:32:58.1747462Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1747516Z cvt.u64.u32 %rd500, %r777; 2026-02-21T08:32:58.1747578Z cvt.u64.u32 %rd501, %r778; 2026-02-21T08:32:58.1747634Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:32:58.1747690Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:32:58.1747860Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1747919Z mov.b64 {%r1164, %r1165}, %rd503; 2026-02-21T08:32:58.1747981Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T08:32:58.1748139Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1748224Z cvt.u64.u32 %rd504, %r779; 2026-02-21T08:32:58.1748280Z cvt.u64.u32 %rd505, %r780; 2026-02-21T08:32:58.1748336Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:32:58.1748400Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:32:58.1748562Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1748619Z mov.b64 {%r1167, %r1168}, %rd507; 2026-02-21T08:32:58.1748689Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T08:32:58.1748852Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1748906Z cvt.u64.u32 %rd508, %r781; 2026-02-21T08:32:58.1748962Z cvt.u64.u32 %rd509, %r782; 
2026-02-21T08:32:58.1749023Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:32:58.1749080Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:32:58.1749238Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1749302Z mov.b64 {%r1170, %r1171}, %rd511; 2026-02-21T08:32:58.1749365Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T08:32:58.1749528Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1749612Z cvt.u64.u32 %rd512, %r784; 2026-02-21T08:32:58.1749667Z cvt.u64.u32 %rd513, %r785; 2026-02-21T08:32:58.1749721Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:32:58.1749777Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:32:58.1749947Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1750004Z mov.b64 {%r1173, %r1174}, %rd515; 2026-02-21T08:32:58.1750069Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T08:32:58.1750237Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1750292Z cvt.u64.u32 %rd516, %r786; 2026-02-21T08:32:58.1750346Z cvt.u64.u32 %rd517, %r787; 2026-02-21T08:32:58.1750411Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:32:58.1750468Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:32:58.1750644Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1750703Z mov.b64 {%r1176, %r1177}, %rd519; 2026-02-21T08:32:58.1750776Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T08:32:58.1750935Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1750992Z cvt.u64.u32 %rd520, %r788; 2026-02-21T08:32:58.1751058Z cvt.u64.u32 %rd521, %r789; 2026-02-21T08:32:58.1751114Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:32:58.1751193Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:32:58.1751359Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1751416Z mov.b64 {%r1179, 
%r1180}, %rd523; 2026-02-21T08:32:58.1751478Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T08:32:58.1751641Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1751706Z cvt.u64.u32 %rd524, %r790; 2026-02-21T08:32:58.1751762Z cvt.u64.u32 %rd525, %r791; 2026-02-21T08:32:58.1751817Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:32:58.1751879Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:32:58.1752036Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1752091Z mov.b64 {%r1182, %r1183}, %rd527; 2026-02-21T08:32:58.1752159Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T08:32:58.1752317Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1752374Z cvt.u64.u32 %rd528, %r792; 2026-02-21T08:32:58.1752431Z cvt.u64.u32 %rd529, %r793; 2026-02-21T08:32:58.1752492Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:32:58.1752548Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:32:58.1752730Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1752798Z mov.b64 {%r1185, %r1186}, %rd531; 2026-02-21T08:32:58.1752862Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T08:32:58.1753021Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1753084Z cvt.u64.u32 %rd532, %r794; 2026-02-21T08:32:58.1753140Z cvt.u64.u32 %rd533, %r795; 2026-02-21T08:32:58.1753195Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:32:58.1753250Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:32:58.1753416Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1753475Z mov.b64 {%r1188, %r1189}, %rd535; 2026-02-21T08:32:58.1753537Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 2026-02-21T08:32:58.1753706Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1753763Z cvt.u64.u32 %rd536, %r796; 
2026-02-21T08:32:58.1753819Z cvt.u64.u32 %rd537, %r797; 2026-02-21T08:32:58.1753883Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:32:58.1753959Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:32:58.1754119Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1754175Z mov.b64 {%r1191, %r1192}, %rd539; 2026-02-21T08:32:58.1754246Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T08:32:58.1754402Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1754456Z cvt.u64.u32 %rd540, %r798; 2026-02-21T08:32:58.1754520Z cvt.u64.u32 %rd541, %r799; 2026-02-21T08:32:58.1754575Z shl.b64 %rd542, %rd541, 32; 2026-02-21T08:32:58.1754631Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:32:58.1754836Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1754892Z mov.b64 {%r1194, %r1195}, %rd543; 2026-02-21T08:32:58.1754955Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T08:32:58.1755132Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1755197Z cvt.u64.u32 %rd544, %r801; 2026-02-21T08:32:58.1755253Z cvt.u64.u32 %rd545, %r802; 2026-02-21T08:32:58.1755310Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:32:58.1755372Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:32:58.1755533Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1755589Z mov.b64 {%r1197, %r1198}, %rd547; 2026-02-21T08:32:58.1755697Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 2026-02-21T08:32:58.1755861Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1755917Z cvt.u64.u32 %rd548, %r803; 2026-02-21T08:32:58.1755971Z cvt.u64.u32 %rd549, %r804; 2026-02-21T08:32:58.1756033Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:32:58.1756090Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:32:58.1756253Z .loc 1 58 27 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1756317Z mov.b64 {%r1200, %r1201}, %rd551; 2026-02-21T08:32:58.1756379Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T08:32:58.1756541Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1756603Z cvt.u64.u32 %rd552, %r805; 2026-02-21T08:32:58.1756658Z cvt.u64.u32 %rd553, %r806; 2026-02-21T08:32:58.1756713Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:32:58.1756770Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:32:58.1756934Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1756990Z mov.b64 {%r1203, %r1204}, %rd555; 2026-02-21T08:32:58.1757051Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T08:32:58.1757241Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1757299Z cvt.u64.u32 %rd556, %r807; 2026-02-21T08:32:58.1757355Z cvt.u64.u32 %rd557, %r808; 2026-02-21T08:32:58.1757416Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:32:58.1757472Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:32:58.1757634Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1757691Z mov.b64 {%r1206, %r1207}, %rd559; 2026-02-21T08:32:58.1757760Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T08:32:58.1757922Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1757980Z cvt.u64.u32 %rd560, %r809; 2026-02-21T08:32:58.1758041Z cvt.u64.u32 %rd561, %r810; 2026-02-21T08:32:58.1758097Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:32:58.1758152Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:32:58.1758320Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1758378Z mov.b64 {%r1209, %r1210}, %rd563; 2026-02-21T08:32:58.1758441Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T08:32:58.1758632Z .loc 1 56 52 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1758695Z cvt.u64.u32 %rd564, %r811; 2026-02-21T08:32:58.1758750Z cvt.u64.u32 %rd565, %r812; 2026-02-21T08:32:58.1758806Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:32:58.1758870Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:32:58.1759035Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1759095Z mov.b64 {%r1212, %r1213}, %rd567; 2026-02-21T08:32:58.1759164Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T08:32:58.1759328Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1759384Z cvt.u64.u32 %rd568, %r813; 2026-02-21T08:32:58.1759442Z cvt.u64.u32 %rd569, %r814; 2026-02-21T08:32:58.1759507Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:32:58.1759583Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:32:58.1759741Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1759806Z mov.b64 {%r1215, %r1216}, %rd571; 2026-02-21T08:32:58.1759871Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T08:32:58.1760028Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1760095Z cvt.u64.u32 %rd572, %r815; 2026-02-21T08:32:58.1760171Z cvt.u64.u32 %rd573, %r816; 2026-02-21T08:32:58.1760227Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:32:58.1760283Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:32:58.1760454Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1760510Z mov.b64 {%r1218, %r1219}, %rd575; 2026-02-21T08:32:58.1760573Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T08:32:58.1760742Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1760799Z cvt.u64.u32 %rd576, %r818; 2026-02-21T08:32:58.1760853Z cvt.u64.u32 %rd577, %r819; 2026-02-21T08:32:58.1760915Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:32:58.1760971Z or.b64 
%rd579, %rd576, %rd578; 2026-02-21T08:32:58.1761129Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1761186Z mov.b64 {%r1221, %r1222}, %rd579; 2026-02-21T08:32:58.1761256Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T08:32:58.1761416Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1761471Z cvt.u64.u32 %rd580, %r820; 2026-02-21T08:32:58.1761533Z cvt.u64.u32 %rd581, %r821; 2026-02-21T08:32:58.1761589Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:32:58.1761666Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:32:58.1761835Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1761892Z mov.b64 {%r1224, %r1225}, %rd583; 2026-02-21T08:32:58.1761955Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T08:32:58.1762115Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1762180Z cvt.u64.u32 %rd584, %r822; 2026-02-21T08:32:58.1762236Z cvt.u64.u32 %rd585, %r823; 2026-02-21T08:32:58.1762292Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:32:58.1762354Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:32:58.1762513Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1762569Z mov.b64 {%r1227, %r1228}, %rd587; 2026-02-21T08:32:58.1762638Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T08:32:58.1762800Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1762856Z cvt.u64.u32 %rd588, %r824; 2026-02-21T08:32:58.1762911Z cvt.u64.u32 %rd589, %r825; 2026-02-21T08:32:58.1762995Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:32:58.1763051Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:32:58.1763212Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1763275Z mov.b64 {%r1230, %r1231}, %rd591; 2026-02-21T08:32:58.1763338Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 
2026-02-21T08:32:58.1763498Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1763562Z cvt.u64.u32 %rd592, %r826; 2026-02-21T08:32:58.1763617Z cvt.u64.u32 %rd593, %r827; 2026-02-21T08:32:58.1763671Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:32:58.1763728Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:32:58.1763898Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1763955Z mov.b64 {%r1233, %r1234}, %rd595; 2026-02-21T08:32:58.1764039Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 2026-02-21T08:32:58.1764207Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1764263Z cvt.u64.u32 %rd596, %r828; 2026-02-21T08:32:58.1764317Z cvt.u64.u32 %rd597, %r829; 2026-02-21T08:32:58.1764379Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:32:58.1764433Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:32:58.1764593Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1764706Z mov.b64 {%r1236, %r1237}, %rd599; 2026-02-21T08:32:58.1764778Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T08:32:58.1764941Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1764998Z cvt.u64.u32 %rd600, %r830; 2026-02-21T08:32:58.1765063Z cvt.u64.u32 %rd601, %r831; 2026-02-21T08:32:58.1765120Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:32:58.1765177Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:32:58.1765344Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1765401Z mov.b64 {%r1239, %r1240}, %rd603; 2026-02-21T08:32:58.1765464Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T08:32:58.1765621Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1765684Z cvt.u64.u32 %rd604, %r832; 2026-02-21T08:32:58.1765738Z cvt.u64.u32 %rd605, %r833; 2026-02-21T08:32:58.1765794Z shl.b64 %rd606, %rd605, 
32; 2026-02-21T08:32:58.1765856Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:32:58.1766016Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1766072Z mov.b64 {%r1242, %r1243}, %rd607; 2026-02-21T08:32:58.1766165Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 2026-02-21T08:32:58.1766324Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1766381Z cvt.u64.u32 %rd608, %r835; 2026-02-21T08:32:58.1766437Z cvt.u64.u32 %rd609, %r836; 2026-02-21T08:32:58.1766501Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:32:58.1766557Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:32:58.1766718Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1766781Z mov.b64 {%r1245, %r1246}, %rd611; 2026-02-21T08:32:58.1766843Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T08:32:58.1767003Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1767064Z cvt.u64.u32 %rd612, %r837; 2026-02-21T08:32:58.1767120Z cvt.u64.u32 %rd613, %r838; 2026-02-21T08:32:58.1767175Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:32:58.1767232Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:32:58.1767398Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1767480Z mov.b64 {%r1248, %r1249}, %rd615; 2026-02-21T08:32:58.1767542Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T08:32:58.1767709Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1767764Z cvt.u64.u32 %rd616, %r839; 2026-02-21T08:32:58.1767819Z cvt.u64.u32 %rd617, %r840; 2026-02-21T08:32:58.1767881Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:32:58.1767938Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:32:58.1768097Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1768154Z mov.b64 {%r1251, %r1252}, %rd619; 2026-02-21T08:32:58.1768225Z 
cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T08:32:58.1768384Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1768438Z cvt.u64.u32 %rd620, %r841; 2026-02-21T08:32:58.1768527Z cvt.u64.u32 %rd621, %r842; 2026-02-21T08:32:58.1768586Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:32:58.1768643Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:32:58.1768812Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1768869Z mov.b64 {%r1254, %r1255}, %rd623; 2026-02-21T08:32:58.1768931Z cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T08:32:58.1769087Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1769178Z cvt.u64.u32 %rd624, %r843; 2026-02-21T08:32:58.1769234Z cvt.u64.u32 %rd625, %r844; 2026-02-21T08:32:58.1769289Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:32:58.1769353Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:32:58.1769516Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1769578Z mov.b64 {%r1257, %r1258}, %rd627; 2026-02-21T08:32:58.1769653Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T08:32:58.1769819Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1769884Z cvt.u64.u32 %rd628, %r845; 2026-02-21T08:32:58.1769941Z cvt.u64.u32 %rd629, %r846; 2026-02-21T08:32:58.1770008Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:32:58.1770068Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:32:58.1770231Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1770300Z mov.b64 {%r1260, %r1261}, %rd631; 2026-02-21T08:32:58.1770366Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T08:32:58.1770531Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1770597Z cvt.u64.u32 %rd632, %r847; 2026-02-21T08:32:58.1770677Z cvt.u64.u32 %rd633, %r848; 
2026-02-21T08:32:58.1770739Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:32:58.1770798Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:32:58.1770975Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1771034Z mov.b64 {%r1263, %r1264}, %rd635; 2026-02-21T08:32:58.1771100Z cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T08:32:58.1771275Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1771333Z cvt.u64.u32 %rd636, %r849; 2026-02-21T08:32:58.1771390Z cvt.u64.u32 %rd637, %r850; 2026-02-21T08:32:58.1771457Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:32:58.1771517Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:32:58.1771685Z .loc 1 58 27 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:58:27 2026-02-21T08:32:58.1771743Z mov.b64 {%r1266, %r1267}, %rd639; 2026-02-21T08:32:58.1771818Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T08:32:58.1771984Z .loc 1 59 45 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:59:45 2026-02-21T08:32:58.1772080Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:32:58.1772144Z bar.sync 0, 128; 2026-02-21T08:32:58.1772243Z st.shared.v4.b32 [%r875], {%r887, %r890, %r893, %r896}; 2026-02-21T08:32:58.1772347Z st.shared.v4.b32 [%r875+32768], {%r983, %r986, %r989, %r992}; 2026-02-21T08:32:58.1772462Z st.shared.v4.b32 [%r875+16384], {%r1079, %r1082, %r1085, %r1088}; 2026-02-21T08:32:58.1772562Z st.shared.v4.b32 [%r875+49152], {%r1175, %r1178, %r1181, %r1184}; 2026-02-21T08:32:58.1772654Z st.shared.v4.b32 [%r874], {%r899, %r902, %r905, %r908}; 2026-02-21T08:32:58.1772757Z st.shared.v4.b32 [%r874+32768], {%r995, %r998, %r1001, %r1004}; 2026-02-21T08:32:58.1772865Z st.shared.v4.b32 [%r874+16384], {%r1091, %r1094, %r1097, %r1100}; 2026-02-21T08:32:58.1772962Z st.shared.v4.b32 [%r874+49152], {%r1187, %r1190, %r1193, %r1196}; 2026-02-21T08:32:58.1773050Z st.shared.v4.b32 [%r872], {%r911, %r914, %r917, %r920}; 2026-02-21T08:32:58.1773187Z 
st.shared.v4.b32 [%r872+32768], {%r1007, %r1010, %r1013, %r1016}; 2026-02-21T08:32:58.1773287Z st.shared.v4.b32 [%r872+16384], {%r1103, %r1106, %r1109, %r1112}; 2026-02-21T08:32:58.1773383Z st.shared.v4.b32 [%r872+49152], {%r1199, %r1202, %r1205, %r1208}; 2026-02-21T08:32:58.1773477Z st.shared.v4.b32 [%r870], {%r923, %r926, %r929, %r932}; 2026-02-21T08:32:58.1773575Z st.shared.v4.b32 [%r870+32768], {%r1019, %r1022, %r1025, %r1028}; 2026-02-21T08:32:58.1773672Z st.shared.v4.b32 [%r870+16384], {%r1115, %r1118, %r1121, %r1124}; 2026-02-21T08:32:58.1773790Z st.shared.v4.b32 [%r870+49152], {%r1211, %r1214, %r1217, %r1220}; 2026-02-21T08:32:58.1773877Z st.shared.v4.b32 [%r868], {%r935, %r938, %r941, %r944}; 2026-02-21T08:32:58.1773973Z st.shared.v4.b32 [%r868+32768], {%r1031, %r1034, %r1037, %r1040}; 2026-02-21T08:32:58.1774069Z st.shared.v4.b32 [%r868+16384], {%r1127, %r1130, %r1133, %r1136}; 2026-02-21T08:32:58.1774171Z st.shared.v4.b32 [%r868+49152], {%r1223, %r1226, %r1229, %r1232}; 2026-02-21T08:32:58.1774260Z st.shared.v4.b32 [%r866], {%r947, %r950, %r953, %r956}; 2026-02-21T08:32:58.1774356Z st.shared.v4.b32 [%r866+32768], {%r1043, %r1046, %r1049, %r1052}; 2026-02-21T08:32:58.1774458Z st.shared.v4.b32 [%r866+16384], {%r1139, %r1142, %r1145, %r1148}; 2026-02-21T08:32:58.1774553Z st.shared.v4.b32 [%r866+49152], {%r1235, %r1238, %r1241, %r1244}; 2026-02-21T08:32:58.1774638Z st.shared.v4.b32 [%r864], {%r959, %r962, %r965, %r968}; 2026-02-21T08:32:58.1774786Z st.shared.v4.b32 [%r864+32768], {%r1055, %r1058, %r1061, %r1064}; 2026-02-21T08:32:58.1774884Z st.shared.v4.b32 [%r864+16384], {%r1151, %r1154, %r1157, %r1160}; 2026-02-21T08:32:58.1774980Z st.shared.v4.b32 [%r864+49152], {%r1247, %r1250, %r1253, %r1256}; 2026-02-21T08:32:58.1775072Z st.shared.v4.b32 [%r862], {%r971, %r974, %r977, %r980}; 2026-02-21T08:32:58.1775193Z st.shared.v4.b32 [%r862+32768], {%r1067, %r1070, %r1073, %r1076}; 2026-02-21T08:32:58.1775292Z st.shared.v4.b32 [%r862+16384], {%r1163, %r1166, 
%r1169, %r1172}; 2026-02-21T08:32:58.1775389Z st.shared.v4.b32 [%r862+49152], {%r1259, %r1262, %r1265, %r1268}; 2026-02-21T08:32:58.1775455Z // begin inline asm 2026-02-21T08:32:58.1775534Z fence.proxy.async.shared::cta; 2026-02-21T08:32:58.1775590Z // end inline asm 2026-02-21T08:32:58.1775655Z bar.sync 0, 128; 2026-02-21T08:32:58.1775723Z elect.sync %r1269|%p67, -1; 2026-02-21T08:32:58.1775788Z and.pred %p65, %p66, %p67; 2026-02-21T08:32:58.1775848Z and.b32 %r1270, %r882, 1; 2026-02-21T08:32:58.1775916Z shl.b32 %r1271, %r1270, 15; 2026-02-21T08:32:58.1775977Z add.s32 %r854, %r287, %r1271; 2026-02-21T08:32:58.1776036Z and.b32 %r1272, %r33, 30; 2026-02-21T08:32:58.1776106Z or.b32 %r1273, %r1270, %r1272; 2026-02-21T08:32:58.1776166Z shl.b32 %r852, %r1273, 6; 2026-02-21T08:32:58.1776223Z // begin inline asm 2026-02-21T08:32:58.1776422Z @%p65 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd127, {%r852, %r853}], [%r854]; 2026-02-21T08:32:58.1776478Z // end inline asm 2026-02-21T08:32:58.1776546Z cp.async.bulk.commit_group; 2026-02-21T08:32:58.1776661Z $L__BB0_15: // %._crit_edge 2026-02-21T08:32:58.1776841Z .loc 1 30 75 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:30:75 2026-02-21T08:32:58.1776925Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:32:58.1776979Z bar.sync 0, 128; 2026-02-21T08:32:58.1777145Z .loc 1 30 4 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:30:4 2026-02-21T08:32:58.1777199Z bar.sync 0, 128; 2026-02-21T08:32:58.1777255Z // begin inline asm 2026-02-21T08:32:58.1777377Z @%p26 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1274, 256; 2026-02-21T08:32:58.1777433Z // end inline asm 2026-02-21T08:32:58.1777532Z st.shared.v2.b32 [global_smem+114696], {67372036, 67372036}; 2026-02-21T08:32:58.1777587Z barrier.sync 1; 2026-02-21T08:32:58.1777678Z $L__BB0_16: // %common.ret 2026-02-21T08:32:58.1777863Z .loc 1 0 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:0 2026-02-21T08:32:58.1777914Z ret; 
2026-02-21T08:32:58.1778016Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:32:58.1778101Z ld.param.b64 %rd22, [_helion_matmul_param_1]; 2026-02-21T08:32:58.1778181Z ld.param.b64 %rd21, [_helion_matmul_param_0]; 2026-02-21T08:32:58.1778347Z .loc 1 19 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:19 2026-02-21T08:32:58.1778409Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:32:58.1778494Z and.b16 %rs2, %rs1, 7; 2026-02-21T08:32:58.1778553Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:32:58.1778727Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1778780Z or.b32 %r5, %r4, 64; 2026-02-21T08:32:58.1778838Z mov.b32 %r35, global_smem; 2026-02-21T08:32:58.1778906Z add.s32 %r36, %r35, %r3; 2026-02-21T08:32:58.1778961Z bra.uni $L__BB0_2; 2026-02-21T08:32:58.1779059Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1779230Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1779284Z barrier.sync 1; 2026-02-21T08:32:58.1779337Z barrier.sync 1; 2026-02-21T08:32:58.1779414Z $L__BB0_2: // %.preheader 2026-02-21T08:32:58.1779509Z // =>This Loop Header: Depth=1 2026-02-21T08:32:58.1779594Z // Child Loop BB0_6 Depth 2 2026-02-21T08:32:58.1779748Z .loc 1 19 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:19 2026-02-21T08:32:58.1779834Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:32:58.1779887Z barrier.sync 1; 2026-02-21T08:32:58.1779948Z ld.shared.b8 %r34, [%r36+114692]; 2026-02-21T08:32:58.1780039Z setp.gt.u32 %p3, %r34, 4; 2026-02-21T08:32:58.1780097Z @%p3 bra $L__BB0_4; 2026-02-21T08:32:58.1780173Z // %bb.3: // %.preheader 2026-02-21T08:32:58.1780260Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1780327Z $L_brx_0: .branchtargets 2026-02-21T08:32:58.1780380Z $L__BB0_5, 2026-02-21T08:32:58.1780430Z $L__BB0_10, 2026-02-21T08:32:58.1780485Z $L__BB0_11, 2026-02-21T08:32:58.1780533Z $L__BB0_12, 2026-02-21T08:32:58.1780583Z $L__BB0_16; 
2026-02-21T08:32:58.1780643Z brx.idx %r34, $L_brx_0; 2026-02-21T08:32:58.1780741Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1780909Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1780968Z add.s32 %r86, %r35, 65536; 2026-02-21T08:32:58.1781048Z ld.shared.b32 %r194, [global_smem+65536]; 2026-02-21T08:32:58.1781139Z ld.shared.v2.b32 {%r87, %r88}, [global_smem+65544]; 2026-02-21T08:32:58.1781192Z barrier.sync 1; 2026-02-21T08:32:58.1781364Z .loc 1 42 45 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:42:45 2026-02-21T08:32:58.1781443Z add.s32 %r89, %r1, -128; 2026-02-21T08:32:58.1781499Z shr.u32 %r7, %r89, 5; 2026-02-21T08:32:58.1781554Z shr.u32 %r90, %r1, 3; 2026-02-21T08:32:58.1781622Z bfe.u32 %r91, %r1, 3, 4; 2026-02-21T08:32:58.1781677Z or.b32 %r92, %r91, 16; 2026-02-21T08:32:58.1781730Z or.b32 %r93, %r91, 32; 2026-02-21T08:32:58.1781788Z or.b32 %r94, %r91, 48; 2026-02-21T08:32:58.1781840Z or.b32 %r95, %r91, 64; 2026-02-21T08:32:58.1781894Z or.b32 %r96, %r91, 80; 2026-02-21T08:32:58.1781946Z or.b32 %r97, %r91, 96; 2026-02-21T08:32:58.1782009Z or.b32 %r98, %r90, 112; 2026-02-21T08:32:58.1782063Z or.b32 %r99, %r90, 240; 2026-02-21T08:32:58.1782230Z .loc 1 50 48 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:50:48 2026-02-21T08:32:58.1782291Z shl.b32 %r100, %r1, 3; 2026-02-21T08:32:58.1782345Z and.b32 %r101, %r100, 56; 2026-02-21T08:32:58.1782526Z .loc 1 42 32 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:42:32 2026-02-21T08:32:58.1782593Z add.s32 %r102, %r87, %r91; 2026-02-21T08:32:58.1782648Z add.s32 %r103, %r87, %r92; 2026-02-21T08:32:58.1782701Z add.s32 %r104, %r87, %r93; 2026-02-21T08:32:58.1782754Z add.s32 %r105, %r87, %r94; 2026-02-21T08:32:58.1782814Z add.s32 %r106, %r87, %r95; 2026-02-21T08:32:58.1782868Z add.s32 %r107, %r87, %r96; 2026-02-21T08:32:58.1782921Z add.s32 %r108, %r87, %r97; 2026-02-21T08:32:58.1782981Z add.s32 %r109, %r87, %r98; 
2026-02-21T08:32:58.1783055Z add.s32 %r110, %r87, %r99; 2026-02-21T08:32:58.1783215Z .loc 1 44 32 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:44:32 2026-02-21T08:32:58.1783269Z add.s32 %r111, %r88, %r91; 2026-02-21T08:32:58.1783330Z add.s32 %r112, %r88, %r92; 2026-02-21T08:32:58.1783385Z add.s32 %r113, %r88, %r93; 2026-02-21T08:32:58.1783439Z add.s32 %r114, %r88, %r94; 2026-02-21T08:32:58.1783502Z add.s32 %r115, %r88, %r95; 2026-02-21T08:32:58.1783556Z add.s32 %r116, %r88, %r96; 2026-02-21T08:32:58.1783610Z add.s32 %r117, %r88, %r97; 2026-02-21T08:32:58.1783664Z add.s32 %r118, %r88, %r98; 2026-02-21T08:32:58.1783828Z .loc 1 42 32 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:42:32 2026-02-21T08:32:58.1783885Z shl.b32 %r119, %r102, 11; 2026-02-21T08:32:58.1784044Z .loc 1 54 53 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:53 2026-02-21T08:32:58.1784107Z shl.b32 %r120, %r103, 11; 2026-02-21T08:32:58.1784165Z shl.b32 %r121, %r104, 11; 2026-02-21T08:32:58.1784219Z shl.b32 %r122, %r105, 11; 2026-02-21T08:32:58.1784280Z shl.b32 %r123, %r106, 11; 2026-02-21T08:32:58.1784333Z shl.b32 %r124, %r107, 11; 2026-02-21T08:32:58.1784385Z shl.b32 %r125, %r108, 11; 2026-02-21T08:32:58.1784438Z shl.b32 %r126, %r109, 11; 2026-02-21T08:32:58.1784517Z shl.b32 %r127, %r110, 11; 2026-02-21T08:32:58.1784724Z .loc 1 55 80 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:80 2026-02-21T08:32:58.1784779Z shl.b32 %r128, %r111, 11; 2026-02-21T08:32:58.1784840Z shl.b32 %r129, %r112, 11; 2026-02-21T08:32:58.1784894Z shl.b32 %r130, %r113, 11; 2026-02-21T08:32:58.1784947Z shl.b32 %r131, %r114, 11; 2026-02-21T08:32:58.1785007Z shl.b32 %r132, %r115, 11; 2026-02-21T08:32:58.1785061Z shl.b32 %r133, %r116, 11; 2026-02-21T08:32:58.1785115Z shl.b32 %r134, %r117, 11; 2026-02-21T08:32:58.1785168Z shl.b32 %r135, %r118, 11; 2026-02-21T08:32:58.1785336Z .loc 1 54 53 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:53 
2026-02-21T08:32:58.1785393Z or.b32 %r136, %r119, %r101; 2026-02-21T08:32:58.1785551Z .loc 1 54 60 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:60 2026-02-21T08:32:58.1785613Z or.b32 %r137, %r120, %r101; 2026-02-21T08:32:58.1785669Z or.b32 %r138, %r121, %r101; 2026-02-21T08:32:58.1785725Z or.b32 %r139, %r122, %r101; 2026-02-21T08:32:58.1785778Z or.b32 %r140, %r123, %r101; 2026-02-21T08:32:58.1785865Z or.b32 %r141, %r124, %r101; 2026-02-21T08:32:58.1785920Z or.b32 %r142, %r125, %r101; 2026-02-21T08:32:58.1785974Z or.b32 %r143, %r126, %r101; 2026-02-21T08:32:58.1786040Z add.s32 %r144, %r136, 262144; 2026-02-21T08:32:58.1786096Z add.s32 %r145, %r136, 294912; 2026-02-21T08:32:58.1786151Z add.s32 %r146, %r136, 327680; 2026-02-21T08:32:58.1786214Z add.s32 %r147, %r136, 360448; 2026-02-21T08:32:58.1786268Z add.s32 %r148, %r136, 393216; 2026-02-21T08:32:58.1786323Z add.s32 %r149, %r136, 425984; 2026-02-21T08:32:58.1786377Z add.s32 %r150, %r136, 458752; 2026-02-21T08:32:58.1786440Z or.b32 %r151, %r127, %r101; 2026-02-21T08:32:58.1786602Z .loc 1 54 32 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:32 2026-02-21T08:32:58.1786669Z mad.wide.s32 %rd25, %r136, 2, %rd21; 2026-02-21T08:32:58.1786739Z mad.wide.s32 %rd26, %r137, 2, %rd21; 2026-02-21T08:32:58.1786824Z mad.wide.s32 %rd27, %r138, 2, %rd21; 2026-02-21T08:32:58.1786886Z mad.wide.s32 %rd28, %r139, 2, %rd21; 2026-02-21T08:32:58.1786946Z mad.wide.s32 %rd29, %r140, 2, %rd21; 2026-02-21T08:32:58.1787012Z mad.wide.s32 %rd30, %r141, 2, %rd21; 2026-02-21T08:32:58.1787069Z mad.wide.s32 %rd31, %r142, 2, %rd21; 2026-02-21T08:32:58.1787129Z mad.wide.s32 %rd32, %r143, 2, %rd21; 2026-02-21T08:32:58.1787195Z mad.wide.s32 %rd33, %r144, 2, %rd21; 2026-02-21T08:32:58.1787255Z mad.wide.s32 %rd34, %r145, 2, %rd21; 2026-02-21T08:32:58.1787313Z mad.wide.s32 %rd35, %r146, 2, %rd21; 2026-02-21T08:32:58.1787423Z mad.wide.s32 %rd36, %r147, 2, %rd21; 2026-02-21T08:32:58.1787482Z mad.wide.s32 %rd37, %r148, 2, %rd21; 
2026-02-21T08:32:58.1787539Z mad.wide.s32 %rd38, %r149, 2, %rd21; 2026-02-21T08:32:58.1787596Z mad.wide.s32 %rd39, %r150, 2, %rd21; 2026-02-21T08:32:58.1787663Z mad.wide.s32 %rd40, %r151, 2, %rd21; 2026-02-21T08:32:58.1787828Z .loc 1 54 85 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:85 2026-02-21T08:32:58.1787885Z shl.b32 %r152, %r1, 4; 2026-02-21T08:32:58.1787950Z and.b32 %r153, %r152, 2032; 2026-02-21T08:32:58.1788006Z shl.b32 %r154, %r1, 1; 2026-02-21T08:32:58.1788063Z and.b32 %r155, %r154, 112; 2026-02-21T08:32:58.1788123Z xor.b32 %r156, %r153, %r155; 2026-02-21T08:32:58.1788188Z add.s32 %r213, %r86, %r156; 2026-02-21T08:32:58.1788241Z mov.b32 %r38, 16; 2026-02-21T08:32:58.1788295Z // begin inline asm 2026-02-21T08:32:58.1788420Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd25 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1788475Z // end inline asm 2026-02-21T08:32:58.1788529Z add.s32 %r215, %r213, 2048; 2026-02-21T08:32:58.1788588Z // begin inline asm 2026-02-21T08:32:58.1788700Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd26 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1788754Z // end inline asm 2026-02-21T08:32:58.1788833Z add.s32 %r217, %r213, 4096; 2026-02-21T08:32:58.1788896Z // begin inline asm 2026-02-21T08:32:58.1789004Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd27 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1789058Z // end inline asm 2026-02-21T08:32:58.1789118Z add.s32 %r219, %r213, 6144; 2026-02-21T08:32:58.1789171Z // begin inline asm 2026-02-21T08:32:58.1789275Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd28 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1789327Z // end inline asm 2026-02-21T08:32:58.1789388Z add.s32 %r221, %r213, 8192; 2026-02-21T08:32:58.1789440Z // begin inline asm 2026-02-21T08:32:58.1789547Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd29 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1789607Z // end inline asm 2026-02-21T08:32:58.1789665Z add.s32 %r223, %r213, 10240; 2026-02-21T08:32:58.1789717Z // begin inline asm 
2026-02-21T08:32:58.1789823Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd30 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1789880Z // end inline asm 2026-02-21T08:32:58.1789937Z add.s32 %r225, %r213, 12288; 2026-02-21T08:32:58.1789990Z // begin inline asm 2026-02-21T08:32:58.1790102Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd31 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1790189Z // end inline asm 2026-02-21T08:32:58.1790244Z add.s32 %r227, %r213, 14336; 2026-02-21T08:32:58.1790303Z // begin inline asm 2026-02-21T08:32:58.1790406Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd32 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1790459Z // end inline asm 2026-02-21T08:32:58.1790513Z add.s32 %r229, %r213, 16384; 2026-02-21T08:32:58.1790574Z // begin inline asm 2026-02-21T08:32:58.1790677Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd33 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1790730Z // end inline asm 2026-02-21T08:32:58.1790791Z add.s32 %r231, %r213, 18432; 2026-02-21T08:32:58.1790845Z // begin inline asm 2026-02-21T08:32:58.1790947Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd34 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1790999Z // end inline asm 2026-02-21T08:32:58.1791062Z add.s32 %r233, %r213, 20480; 2026-02-21T08:32:58.1791115Z // begin inline asm 2026-02-21T08:32:58.1791237Z cp.async.cg.shared.global [ %r233 + 0 ], [ %rd35 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1791301Z // end inline asm 2026-02-21T08:32:58.1791357Z add.s32 %r235, %r213, 22528; 2026-02-21T08:32:58.1791412Z // begin inline asm 2026-02-21T08:32:58.1791516Z cp.async.cg.shared.global [ %r235 + 0 ], [ %rd36 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1791579Z // end inline asm 2026-02-21T08:32:58.1791634Z add.s32 %r237, %r213, 24576; 2026-02-21T08:32:58.1791688Z // begin inline asm 2026-02-21T08:32:58.1791798Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd37 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1791869Z // end inline asm 2026-02-21T08:32:58.1791923Z add.s32 %r239, %r213, 26624; 2026-02-21T08:32:58.1791983Z // begin 
inline asm 2026-02-21T08:32:58.1792086Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd38 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1792137Z // end inline asm 2026-02-21T08:32:58.1792193Z add.s32 %r241, %r213, 28672; 2026-02-21T08:32:58.1792253Z // begin inline asm 2026-02-21T08:32:58.1792356Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd39 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1792408Z // end inline asm 2026-02-21T08:32:58.1792469Z add.s32 %r243, %r213, 30720; 2026-02-21T08:32:58.1792523Z // begin inline asm 2026-02-21T08:32:58.1792625Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd40 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1792677Z // end inline asm 2026-02-21T08:32:58.1792742Z cp.async.commit_group; 2026-02-21T08:32:58.1792901Z .loc 1 55 59 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:59 2026-02-21T08:32:58.1792958Z or.b32 %r157, %r128, %r101; 2026-02-21T08:32:58.1793019Z or.b32 %r158, %r129, %r101; 2026-02-21T08:32:58.1793073Z or.b32 %r159, %r130, %r101; 2026-02-21T08:32:58.1793128Z or.b32 %r160, %r131, %r101; 2026-02-21T08:32:58.1793191Z or.b32 %r161, %r132, %r101; 2026-02-21T08:32:58.1793244Z or.b32 %r162, %r133, %r101; 2026-02-21T08:32:58.1793316Z or.b32 %r163, %r134, %r101; 2026-02-21T08:32:58.1793370Z or.b32 %r164, %r135, %r101; 2026-02-21T08:32:58.1793545Z .loc 1 55 34 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:34 2026-02-21T08:32:58.1793607Z mad.wide.s32 %rd41, %r157, 2, %rd22; 2026-02-21T08:32:58.1793667Z mad.wide.s32 %rd42, %r158, 2, %rd22; 2026-02-21T08:32:58.1793733Z mad.wide.s32 %rd43, %r159, 2, %rd22; 2026-02-21T08:32:58.1793791Z mad.wide.s32 %rd44, %r160, 2, %rd22; 2026-02-21T08:32:58.1793851Z mad.wide.s32 %rd45, %r161, 2, %rd22; 2026-02-21T08:32:58.1793909Z mad.wide.s32 %rd46, %r162, 2, %rd22; 2026-02-21T08:32:58.1793978Z mad.wide.s32 %rd47, %r163, 2, %rd22; 2026-02-21T08:32:58.1794036Z mad.wide.s32 %rd48, %r164, 2, %rd22; 2026-02-21T08:32:58.1794199Z .loc 1 55 87 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:87 2026-02-21T08:32:58.1794261Z add.s32 %r165, %r35, 98304; 2026-02-21T08:32:58.1794320Z add.s32 %r245, %r165, %r156; 2026-02-21T08:32:58.1794374Z // begin inline asm 2026-02-21T08:32:58.1794489Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd41 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1794578Z // end inline asm 2026-02-21T08:32:58.1794633Z add.s32 %r247, %r245, 2048; 2026-02-21T08:32:58.1794718Z // begin inline asm 2026-02-21T08:32:58.1794834Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd42 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1794886Z // end inline asm 2026-02-21T08:32:58.1794942Z add.s32 %r249, %r245, 4096; 2026-02-21T08:32:58.1795000Z // begin inline asm 2026-02-21T08:32:58.1795106Z cp.async.cg.shared.global [ %r249 + 0 ], [ %rd43 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1795161Z // end inline asm 2026-02-21T08:32:58.1795222Z add.s32 %r251, %r245, 6144; 2026-02-21T08:32:58.1795276Z // begin inline asm 2026-02-21T08:32:58.1795380Z cp.async.cg.shared.global [ %r251 + 0 ], [ %rd44 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1795431Z // end inline asm 2026-02-21T08:32:58.1795496Z add.s32 %r253, %r245, 8192; 2026-02-21T08:32:58.1795549Z // begin inline asm 2026-02-21T08:32:58.1795679Z cp.async.cg.shared.global [ %r253 + 0 ], [ %rd45 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1795744Z // end inline asm 2026-02-21T08:32:58.1795800Z add.s32 %r255, %r245, 10240; 2026-02-21T08:32:58.1795853Z // begin inline asm 2026-02-21T08:32:58.1795956Z cp.async.cg.shared.global [ %r255 + 0 ], [ %rd46 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1796015Z // end inline asm 2026-02-21T08:32:58.1796070Z add.s32 %r257, %r245, 12288; 2026-02-21T08:32:58.1796122Z // begin inline asm 2026-02-21T08:32:58.1796232Z cp.async.cg.shared.global [ %r257 + 0 ], [ %rd47 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1796308Z // end inline asm 2026-02-21T08:32:58.1796363Z add.s32 %r259, %r245, 14336; 2026-02-21T08:32:58.1796415Z // begin inline asm 
2026-02-21T08:32:58.1796526Z cp.async.cg.shared.global [ %r259 + 0 ], [ %rd48 + 0 ], 0x10, %r38; 2026-02-21T08:32:58.1796577Z // end inline asm 2026-02-21T08:32:58.1796637Z cp.async.commit_group; 2026-02-21T08:32:58.1796705Z bfe.u32 %r166, %r86, 4, 14; 2026-02-21T08:32:58.1796763Z cvt.u64.u32 %rd50, %r166; 2026-02-21T08:32:58.1796829Z or.b64 %rd62, %rd50, 4611686293439512576; 2026-02-21T08:32:58.1796896Z bfe.u32 %r167, %r165, 4, 14; 2026-02-21T08:32:58.1796953Z cvt.u64.u32 %rd51, %r167; 2026-02-21T08:32:58.1797018Z or.b64 %rd63, %rd51, 4611686293372403712; 2026-02-21T08:32:58.1797072Z add.s32 %r168, %r35, 65568; 2026-02-21T08:32:58.1797135Z bfe.u32 %r169, %r168, 4, 14; 2026-02-21T08:32:58.1797193Z cvt.u64.u32 %rd52, %r169; 2026-02-21T08:32:58.1797256Z or.b64 %rd64, %rd52, 4611686293439512576; 2026-02-21T08:32:58.1797317Z add.s32 %r170, %r35, 98336; 2026-02-21T08:32:58.1797373Z bfe.u32 %r171, %r170, 4, 14; 2026-02-21T08:32:58.1797428Z cvt.u64.u32 %rd53, %r171; 2026-02-21T08:32:58.1797489Z or.b64 %rd65, %rd53, 4611686293372403712; 2026-02-21T08:32:58.1797551Z add.s32 %r172, %r35, 65600; 2026-02-21T08:32:58.1797604Z bfe.u32 %r173, %r172, 4, 14; 2026-02-21T08:32:58.1797682Z cvt.u64.u32 %rd54, %r173; 2026-02-21T08:32:58.1797752Z or.b64 %rd66, %rd54, 4611686293439512576; 2026-02-21T08:32:58.1797808Z add.s32 %r174, %r35, 98368; 2026-02-21T08:32:58.1797864Z bfe.u32 %r175, %r174, 4, 14; 2026-02-21T08:32:58.1797919Z cvt.u64.u32 %rd55, %r175; 2026-02-21T08:32:58.1797988Z or.b64 %rd67, %rd55, 4611686293372403712; 2026-02-21T08:32:58.1798042Z add.s32 %r176, %r35, 65632; 2026-02-21T08:32:58.1798096Z bfe.u32 %r177, %r176, 4, 14; 2026-02-21T08:32:58.1798159Z cvt.u64.u32 %rd56, %r177; 2026-02-21T08:32:58.1798219Z or.b64 %rd68, %rd56, 4611686293439512576; 2026-02-21T08:32:58.1798273Z add.s32 %r178, %r35, 98400; 2026-02-21T08:32:58.1798337Z bfe.u32 %r179, %r178, 4, 14; 2026-02-21T08:32:58.1798392Z cvt.u64.u32 %rd57, %r179; 2026-02-21T08:32:58.1798451Z or.b64 %rd69, %rd57, 
4611686293372403712; 2026-02-21T08:32:58.1798506Z add.s32 %r180, %r35, 81920; 2026-02-21T08:32:58.1798569Z bfe.u32 %r181, %r180, 4, 14; 2026-02-21T08:32:58.1798625Z cvt.u64.u32 %rd58, %r181; 2026-02-21T08:32:58.1798688Z or.b64 %rd70, %rd58, 4611686293439512576; 2026-02-21T08:32:58.1798750Z add.s32 %r182, %r35, 81952; 2026-02-21T08:32:58.1798832Z bfe.u32 %r183, %r182, 4, 14; 2026-02-21T08:32:58.1798887Z cvt.u64.u32 %rd59, %r183; 2026-02-21T08:32:58.1798948Z or.b64 %rd72, %rd59, 4611686293439512576; 2026-02-21T08:32:58.1799011Z add.s32 %r184, %r35, 81984; 2026-02-21T08:32:58.1799065Z bfe.u32 %r185, %r184, 4, 14; 2026-02-21T08:32:58.1799120Z cvt.u64.u32 %rd60, %r185; 2026-02-21T08:32:58.1799185Z or.b64 %rd74, %rd60, 4611686293439512576; 2026-02-21T08:32:58.1799238Z add.s32 %r186, %r35, 82016; 2026-02-21T08:32:58.1799292Z bfe.u32 %r187, %r186, 4, 14; 2026-02-21T08:32:58.1799348Z cvt.u64.u32 %rd61, %r187; 2026-02-21T08:32:58.1799414Z or.b64 %rd76, %rd61, 4611686293439512576; 2026-02-21T08:32:58.1799585Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1799642Z add.s32 %r188, %r5, %r135; 2026-02-21T08:32:58.1799707Z cvt.u64.u32 %rd13, %r188; 2026-02-21T08:32:58.1799764Z add.s32 %r189, %r4, %r128; 2026-02-21T08:32:58.1799838Z cvt.u64.u32 %rd14, %r189; 2026-02-21T08:32:58.1799913Z add.s32 %r190, %r5, %r127; 2026-02-21T08:32:58.1799969Z cvt.u64.u32 %rd15, %r190; 2026-02-21T08:32:58.1800024Z add.s32 %r191, %r4, %r119; 2026-02-21T08:32:58.1800078Z cvt.u64.u32 %rd16, %r191; 2026-02-21T08:32:58.1800141Z add.s32 %r192, %r5, %r126; 2026-02-21T08:32:58.1800195Z cvt.u64.u32 %rd17, %r192; 2026-02-21T08:32:58.1800251Z mov.pred %p69, 0; 2026-02-21T08:32:58.1800310Z mov.b64 %rd640, 0; 2026-02-21T08:32:58.1800363Z bra.uni $L__BB0_6; 2026-02-21T08:32:58.1800478Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:32:58.1800647Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 
2026-02-21T08:32:58.1800719Z setp.lt.u64 %p25, %rd640, 1984; 2026-02-21T08:32:58.1800776Z add.s64 %rd19, %rd640, 64; 2026-02-21T08:32:58.1800942Z .loc 1 54 60 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:60 2026-02-21T08:32:58.1801010Z add.s64 %rd103, %rd16, %rd640; 2026-02-21T08:32:58.1801070Z add.s64 %rd104, %rd17, %rd640; 2026-02-21T08:32:58.1801231Z .loc 1 54 32 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:32 2026-02-21T08:32:58.1801296Z add.s64 %rd105, %rd15, %rd640; 2026-02-21T08:32:58.1801351Z cvt.u32.u64 %r261, %rd103; 2026-02-21T08:32:58.1801404Z add.s32 %r262, %r261, 64; 2026-02-21T08:32:58.1801465Z mad.wide.s32 %rd79, %r262, 2, %rd21; 2026-02-21T08:32:58.1801529Z add.s32 %r263, %r261, 32832; 2026-02-21T08:32:58.1801590Z mad.wide.s32 %rd80, %r263, 2, %rd21; 2026-02-21T08:32:58.1801646Z add.s32 %r264, %r261, 65600; 2026-02-21T08:32:58.1801713Z mad.wide.s32 %rd81, %r264, 2, %rd21; 2026-02-21T08:32:58.1801767Z add.s32 %r265, %r261, 98368; 2026-02-21T08:32:58.1801825Z mad.wide.s32 %rd82, %r265, 2, %rd21; 2026-02-21T08:32:58.1801904Z add.s32 %r266, %r261, 131136; 2026-02-21T08:32:58.1801965Z mad.wide.s32 %rd83, %r266, 2, %rd21; 2026-02-21T08:32:58.1802025Z add.s32 %r267, %r261, 163904; 2026-02-21T08:32:58.1802087Z mad.wide.s32 %rd84, %r267, 2, %rd21; 2026-02-21T08:32:58.1802151Z add.s32 %r268, %r261, 196672; 2026-02-21T08:32:58.1802211Z mad.wide.s32 %rd85, %r268, 2, %rd21; 2026-02-21T08:32:58.1802266Z cvt.u32.u64 %r269, %rd104; 2026-02-21T08:32:58.1802330Z mad.wide.s32 %rd86, %r269, 2, %rd21; 2026-02-21T08:32:58.1802385Z add.s32 %r270, %r261, 262208; 2026-02-21T08:32:58.1802444Z mad.wide.s32 %rd87, %r270, 2, %rd21; 2026-02-21T08:32:58.1802498Z add.s32 %r271, %r261, 294976; 2026-02-21T08:32:58.1802564Z mad.wide.s32 %rd88, %r271, 2, %rd21; 2026-02-21T08:32:58.1802618Z add.s32 %r272, %r261, 327744; 2026-02-21T08:32:58.1802676Z mad.wide.s32 %rd89, %r272, 2, %rd21; 2026-02-21T08:32:58.1802737Z add.s32 %r273, %r261, 360512; 
2026-02-21T08:32:58.1802793Z mad.wide.s32 %rd90, %r273, 2, %rd21; 2026-02-21T08:32:58.1802849Z add.s32 %r274, %r261, 393280; 2026-02-21T08:32:58.1802909Z mad.wide.s32 %rd91, %r274, 2, %rd21; 2026-02-21T08:32:58.1802970Z add.s32 %r275, %r261, 426048; 2026-02-21T08:32:58.1803051Z mad.wide.s32 %rd92, %r275, 2, %rd21; 2026-02-21T08:32:58.1803105Z add.s32 %r276, %r261, 458816; 2026-02-21T08:32:58.1803169Z mad.wide.s32 %rd93, %r276, 2, %rd21; 2026-02-21T08:32:58.1803223Z cvt.u32.u64 %r277, %rd105; 2026-02-21T08:32:58.1803280Z mad.wide.s32 %rd94, %r277, 2, %rd21; 2026-02-21T08:32:58.1803456Z .loc 1 54 85 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:85 2026-02-21T08:32:58.1803511Z bar.sync 2, 128; 2026-02-21T08:32:58.1803571Z selp.b32 %r214, 16, 0, %p25; 2026-02-21T08:32:58.1803625Z // begin inline asm 2026-02-21T08:32:58.1803745Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd79 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1803798Z // end inline asm 2026-02-21T08:32:58.1803851Z // begin inline asm 2026-02-21T08:32:58.1803969Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd80 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1804022Z // end inline asm 2026-02-21T08:32:58.1804094Z // begin inline asm 2026-02-21T08:32:58.1804203Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd81 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1804262Z // end inline asm 2026-02-21T08:32:58.1804315Z // begin inline asm 2026-02-21T08:32:58.1804420Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd82 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1804478Z // end inline asm 2026-02-21T08:32:58.1804532Z // begin inline asm 2026-02-21T08:32:58.1804638Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd83 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1804753Z // end inline asm 2026-02-21T08:32:58.1804806Z // begin inline asm 2026-02-21T08:32:58.1804910Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd84 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1804962Z // end inline asm 2026-02-21T08:32:58.1805022Z // begin inline asm 
2026-02-21T08:32:58.1805128Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd85 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1805180Z // end inline asm 2026-02-21T08:32:58.1805241Z // begin inline asm 2026-02-21T08:32:58.1805345Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd86 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1805399Z // end inline asm 2026-02-21T08:32:58.1805453Z // begin inline asm 2026-02-21T08:32:58.1805566Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd87 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1805618Z // end inline asm 2026-02-21T08:32:58.1805673Z // begin inline asm 2026-02-21T08:32:58.1805786Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd88 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1805841Z // end inline asm 2026-02-21T08:32:58.1805896Z // begin inline asm 2026-02-21T08:32:58.1806008Z cp.async.cg.shared.global [ %r233 + 0 ], [ %rd89 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1806060Z // end inline asm 2026-02-21T08:32:58.1806114Z // begin inline asm 2026-02-21T08:32:58.1806216Z cp.async.cg.shared.global [ %r235 + 0 ], [ %rd90 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1806316Z // end inline asm 2026-02-21T08:32:58.1806371Z // begin inline asm 2026-02-21T08:32:58.1806478Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd91 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1806537Z // end inline asm 2026-02-21T08:32:58.1806590Z // begin inline asm 2026-02-21T08:32:58.1806694Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd92 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1806747Z // end inline asm 2026-02-21T08:32:58.1806807Z // begin inline asm 2026-02-21T08:32:58.1806911Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd93 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1806963Z // end inline asm 2026-02-21T08:32:58.1807025Z // begin inline asm 2026-02-21T08:32:58.1807131Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd94 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1807181Z // end inline asm 2026-02-21T08:32:58.1807247Z cp.async.commit_group; 2026-02-21T08:32:58.1807411Z .loc 1 55 59 // 
cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:59 2026-02-21T08:32:58.1807472Z add.s64 %rd106, %rd14, %rd640; 2026-02-21T08:32:58.1807634Z .loc 1 55 34 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:34 2026-02-21T08:32:58.1807729Z add.s64 %rd107, %rd13, %rd640; 2026-02-21T08:32:58.1807786Z cvt.u32.u64 %r278, %rd106; 2026-02-21T08:32:58.1807843Z add.s32 %r279, %r278, 64; 2026-02-21T08:32:58.1807911Z mad.wide.s32 %rd95, %r279, 2, %rd22; 2026-02-21T08:32:58.1807967Z add.s32 %r280, %r278, 32832; 2026-02-21T08:32:58.1808026Z mad.wide.s32 %rd96, %r280, 2, %rd22; 2026-02-21T08:32:58.1808081Z add.s32 %r281, %r278, 65600; 2026-02-21T08:32:58.1808147Z mad.wide.s32 %rd97, %r281, 2, %rd22; 2026-02-21T08:32:58.1808203Z add.s32 %r282, %r278, 98368; 2026-02-21T08:32:58.1808261Z mad.wide.s32 %rd98, %r282, 2, %rd22; 2026-02-21T08:32:58.1808324Z add.s32 %r283, %r278, 131136; 2026-02-21T08:32:58.1808383Z mad.wide.s32 %rd99, %r283, 2, %rd22; 2026-02-21T08:32:58.1808438Z add.s32 %r284, %r278, 163904; 2026-02-21T08:32:58.1808509Z mad.wide.s32 %rd100, %r284, 2, %rd22; 2026-02-21T08:32:58.1808564Z add.s32 %r285, %r278, 196672; 2026-02-21T08:32:58.1808651Z mad.wide.s32 %rd101, %r285, 2, %rd22; 2026-02-21T08:32:58.1808710Z cvt.u32.u64 %r286, %rd107; 2026-02-21T08:32:58.1808778Z mad.wide.s32 %rd102, %r286, 2, %rd22; 2026-02-21T08:32:58.1808943Z .loc 1 55 87 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:55:87 2026-02-21T08:32:58.1808998Z // begin inline asm 2026-02-21T08:32:58.1809111Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd95 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1809163Z // end inline asm 2026-02-21T08:32:58.1809216Z // begin inline asm 2026-02-21T08:32:58.1809347Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd96 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1809406Z // end inline asm 2026-02-21T08:32:58.1809458Z // begin inline asm 2026-02-21T08:32:58.1809562Z cp.async.cg.shared.global [ %r249 + 0 ], [ %rd97 + 0 ], 0x10, %r214; 
2026-02-21T08:32:58.1809620Z // end inline asm 2026-02-21T08:32:58.1809674Z // begin inline asm 2026-02-21T08:32:58.1809781Z cp.async.cg.shared.global [ %r251 + 0 ], [ %rd98 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1809841Z // end inline asm 2026-02-21T08:32:58.1809893Z // begin inline asm 2026-02-21T08:32:58.1809997Z cp.async.cg.shared.global [ %r253 + 0 ], [ %rd99 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1810048Z // end inline asm 2026-02-21T08:32:58.1810108Z // begin inline asm 2026-02-21T08:32:58.1810218Z cp.async.cg.shared.global [ %r255 + 0 ], [ %rd100 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1810269Z // end inline asm 2026-02-21T08:32:58.1810327Z // begin inline asm 2026-02-21T08:32:58.1810437Z cp.async.cg.shared.global [ %r257 + 0 ], [ %rd101 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1810487Z // end inline asm 2026-02-21T08:32:58.1810540Z // begin inline asm 2026-02-21T08:32:58.1810650Z cp.async.cg.shared.global [ %r259 + 0 ], [ %rd102 + 0 ], 0x10, %r214; 2026-02-21T08:32:58.1810701Z // end inline asm 2026-02-21T08:32:58.1810782Z cp.async.commit_group; 2026-02-21T08:32:58.1810849Z mov.pred %p69, -1; 2026-02-21T08:32:58.1810906Z mov.b64 %rd640, %rd19; 2026-02-21T08:32:58.1811079Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1811143Z @%p25 bra $L__BB0_6; 2026-02-21T08:32:58.1811196Z bra.uni $L__BB0_9; 2026-02-21T08:32:58.1811286Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:32:58.1811376Z // => This Inner Loop Header: Depth=2 2026-02-21T08:32:58.1811545Z .loc 1 54 85 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:54:85 2026-02-21T08:32:58.1811606Z cp.async.wait_group 0; 2026-02-21T08:32:58.1811660Z bar.sync 2, 128; 2026-02-21T08:32:58.1811828Z .loc 1 56 52 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:56:52 2026-02-21T08:32:58.1811898Z shfl.sync.idx.b32 %r193, %r7, 0, 31, -1; 2026-02-21T08:32:58.1811959Z setp.ne.b32 %p5, %r193, 0; 2026-02-21T08:32:58.1812021Z @%p5 bra 
$L__BB0_8; 2026-02-21T08:32:58.1812112Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:32:58.1812194Z setp.eq.b64 %p23, %rd640, 1984; 2026-02-21T08:32:58.1812257Z elect.sync %r210|%p7, -1; 2026-02-21T08:32:58.1812320Z mov.b32 %r195, 136314896; 2026-02-21T08:32:58.1812373Z // begin inline asm 2026-02-21T08:32:58.1812514Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 0 ], %rd62, %rd63, %r195, %p69; 2026-02-21T08:32:58.1812574Z // end inline asm 2026-02-21T08:32:58.1812630Z mov.pred %p8, -1; 2026-02-21T08:32:58.1812683Z // begin inline asm 2026-02-21T08:32:58.1812821Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 0 ], %rd64, %rd65, %r195, %p8; 2026-02-21T08:32:58.1812875Z // end inline asm 2026-02-21T08:32:58.1812928Z // begin inline asm 2026-02-21T08:32:58.1813053Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 0 ], %rd66, %rd67, %r195, %p8; 2026-02-21T08:32:58.1813115Z // end inline asm 2026-02-21T08:32:58.1813169Z // begin inline asm 2026-02-21T08:32:58.1813311Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 0 ], %rd68, %rd69, %r195, %p8; 2026-02-21T08:32:58.1813378Z // end inline asm 2026-02-21T08:32:58.1813434Z // begin inline asm 2026-02-21T08:32:58.1813569Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 128 ], %rd70, %rd63, %r195, %p69; 2026-02-21T08:32:58.1813632Z // end inline asm 2026-02-21T08:32:58.1813688Z // begin inline asm 2026-02-21T08:32:58.1813819Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 128 ], %rd72, %rd65, %r195, %p8; 2026-02-21T08:32:58.1813873Z // end inline asm 2026-02-21T08:32:58.1813962Z // begin inline asm 2026-02-21T08:32:58.1814091Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 128 ], %rd74, %rd67, %r195, %p8; 2026-02-21T08:32:58.1814144Z // end inline asm 2026-02-21T08:32:58.1814207Z // begin inline asm 2026-02-21T08:32:58.1814334Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r194 + 128 ], %rd76, %rd69, %r195, %p8; 2026-02-21T08:32:58.1814390Z // end inline asm 2026-02-21T08:32:58.1814459Z and.pred 
%p22, %p23, %p7; 2026-02-21T08:32:58.1814520Z add.s32 %r212, %r35, 114688; 2026-02-21T08:32:58.1814579Z cvt.u64.u32 %rd78, %r212; 2026-02-21T08:32:58.1814636Z // begin inline asm 2026-02-21T08:32:58.1814803Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd78]; 2026-02-21T08:32:58.1814858Z // end inline asm 2026-02-21T08:32:58.1814914Z bra.uni $L__BB0_8; 2026-02-21T08:32:58.1815021Z $L__BB0_12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1815198Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1815258Z barrier.sync 1; 2026-02-21T08:32:58.1815321Z barrier.sync 1; 2026-02-21T08:32:58.1815378Z bra.uni $L__BB0_2; 2026-02-21T08:32:58.1815472Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1815535Z cp.async.wait_group 0; 2026-02-21T08:32:58.1815624Z bar.sync 2, 128; 2026-02-21T08:32:58.1815681Z barrier.sync 1; 2026-02-21T08:32:58.1815738Z bra.uni $L__BB0_2; 2026-02-21T08:32:58.1815844Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1816017Z .loc 1 49 111 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:49:111 2026-02-21T08:32:58.1816073Z barrier.sync 1; 2026-02-21T08:32:58.1816129Z barrier.sync 1; 2026-02-21T08:32:58.1816192Z bra.uni $L__BB0_2; 2026-02-21T08:32:58.1816283Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:32:58.1816445Z .loc 1 19 0 // cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py:19 2026-02-21T08:32:58.1816510Z barrier.sync 1; 2026-02-21T08:32:58.1816565Z barrier.sync 1; 2026-02-21T08:32:58.1816620Z bra.uni $L__BB0_2; 2026-02-21T08:32:58.1816681Z $L__tmp0: 2026-02-21T08:32:58.1816739Z $L__func_end0: 2026-02-21T08:32:58.1816822Z // -- End function 2026-02-21T08:32:58.1816873Z } 2026-02-21T08:32:58.1817088Z .file 1 "/tmp/torchinductor_root/wu/cwuiji3qxrfvlgddv2uxyef2xd7uzc23cl664bauyfphrfe5myxw.py" 2026-02-21T08:32:58.1817180Z .section .debug_abbrev 2026-02-21T08:32:58.1817231Z { 2026-02-21T08:32:58.1817327Z .b8 1 // 
Abbreviation Code 2026-02-21T08:32:58.1817416Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:32:58.1817497Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:32:58.1817582Z .b8 37 // DW_AT_producer 2026-02-21T08:32:58.1817656Z .b8 8 // DW_FORM_string 2026-02-21T08:32:58.1817733Z .b8 19 // DW_AT_language 2026-02-21T08:32:58.1817810Z .b8 5 // DW_FORM_data2 2026-02-21T08:32:58.1817892Z .b8 3 // DW_AT_name 2026-02-21T08:32:58.1817967Z .b8 8 // DW_FORM_string 2026-02-21T08:32:58.1818071Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:32:58.1818156Z .b8 6 // DW_FORM_data4 2026-02-21T08:32:58.1818232Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:32:58.1818303Z .b8 8 // DW_FORM_string 2026-02-21T08:32:58.1818381Z .b8 0 // EOM(1) 2026-02-21T08:32:58.1818449Z .b8 0 // EOM(2) 2026-02-21T08:32:58.1818513Z .b8 0 // EOM(3) 2026-02-21T08:32:58.1818564Z } 2026-02-21T08:32:58.1818658Z .section .debug_info 2026-02-21T08:32:58.1818707Z { 2026-02-21T08:32:58.1818790Z .b32 104 // Length of Unit 2026-02-21T08:32:58.1818882Z .b8 2 // DWARF version number 2026-02-21T08:32:58.1818934Z .b8 0 2026-02-21T08:32:58.1819053Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:32:58.1819143Z .b8 8 // Address Size (in bytes) 2026-02-21T08:32:58.1819248Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:32:58.1819327Z .b8 116 // DW_AT_producer 2026-02-21T08:32:58.1819381Z .b8 114 2026-02-21T08:32:58.1819440Z .b8 105 2026-02-21T08:32:58.1819493Z .b8 116 2026-02-21T08:32:58.1819544Z .b8 111 2026-02-21T08:32:58.1819600Z .b8 110 2026-02-21T08:32:58.1819652Z .b8 0 2026-02-21T08:32:58.1819724Z .b8 2 // DW_AT_language 2026-02-21T08:32:58.1819774Z .b8 0 2026-02-21T08:32:58.1819858Z .b8 99 // DW_AT_name 2026-02-21T08:32:58.1819908Z .b8 119 2026-02-21T08:32:58.1819959Z .b8 117 2026-02-21T08:32:58.1820014Z .b8 105 2026-02-21T08:32:58.1820065Z .b8 106 2026-02-21T08:32:58.1820115Z .b8 105 2026-02-21T08:32:58.1820166Z .b8 51 2026-02-21T08:32:58.1820224Z .b8 113 2026-02-21T08:32:58.1820295Z .b8 120 2026-02-21T08:32:58.1820347Z .b8 114 2026-02-21T08:32:58.1820398Z .b8 102 2026-02-21T08:32:58.1820462Z .b8 118 2026-02-21T08:32:58.1820517Z .b8 108 2026-02-21T08:32:58.1820572Z .b8 103 2026-02-21T08:32:58.1820630Z .b8 100 2026-02-21T08:32:58.1820683Z .b8 100 2026-02-21T08:32:58.1820738Z .b8 118 2026-02-21T08:32:58.1820793Z .b8 50 2026-02-21T08:32:58.1820854Z .b8 117 2026-02-21T08:32:58.1820907Z .b8 120 2026-02-21T08:32:58.1820958Z .b8 121 2026-02-21T08:32:58.1821017Z .b8 101 2026-02-21T08:32:58.1821069Z .b8 102 2026-02-21T08:32:58.1821120Z .b8 50 2026-02-21T08:32:58.1821172Z .b8 120 2026-02-21T08:32:58.1821232Z .b8 100 2026-02-21T08:32:58.1821286Z .b8 55 2026-02-21T08:32:58.1821346Z .b8 117 2026-02-21T08:32:58.1821395Z .b8 122 2026-02-21T08:32:58.1821451Z .b8 99 2026-02-21T08:32:58.1821500Z .b8 50 2026-02-21T08:32:58.1821550Z .b8 51 2026-02-21T08:32:58.1821605Z .b8 99 2026-02-21T08:32:58.1821655Z .b8 108 2026-02-21T08:32:58.1821704Z .b8 54 2026-02-21T08:32:58.1821755Z .b8 54 2026-02-21T08:32:58.1821813Z .b8 52 2026-02-21T08:32:58.1821863Z .b8 98 2026-02-21T08:32:58.1821914Z .b8 97 2026-02-21T08:32:58.1821972Z .b8 117 
2026-02-21T08:32:58.1822041Z .b8 121 2026-02-21T08:32:58.1822089Z .b8 102 2026-02-21T08:32:58.1822137Z .b8 112 2026-02-21T08:32:58.1822192Z .b8 104 2026-02-21T08:32:58.1822239Z .b8 114 2026-02-21T08:32:58.1822286Z .b8 102 2026-02-21T08:32:58.1822339Z .b8 101 2026-02-21T08:32:58.1822387Z .b8 53 2026-02-21T08:32:58.1822434Z .b8 109 2026-02-21T08:32:58.1822480Z .b8 121 2026-02-21T08:32:58.1822536Z .b8 120 2026-02-21T08:32:58.1822583Z .b8 119 2026-02-21T08:32:58.1822630Z .b8 46 2026-02-21T08:32:58.1822677Z .b8 112 2026-02-21T08:32:58.1822733Z .b8 121 2026-02-21T08:32:58.1822780Z .b8 0 2026-02-21T08:32:58.1822867Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:32:58.1822946Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:32:58.1822993Z .b8 116 2026-02-21T08:32:58.1823040Z .b8 109 2026-02-21T08:32:58.1823087Z .b8 112 2026-02-21T08:32:58.1823143Z .b8 47 2026-02-21T08:32:58.1823191Z .b8 116 2026-02-21T08:32:58.1823238Z .b8 111 2026-02-21T08:32:58.1823312Z .b8 114 2026-02-21T08:32:58.1823361Z .b8 99 2026-02-21T08:32:58.1823410Z .b8 104 2026-02-21T08:32:58.1823457Z .b8 105 2026-02-21T08:32:58.1823513Z .b8 110 2026-02-21T08:32:58.1823562Z .b8 100 2026-02-21T08:32:58.1823610Z .b8 117 2026-02-21T08:32:58.1823676Z .b8 99 2026-02-21T08:32:58.1823723Z .b8 116 2026-02-21T08:32:58.1823770Z .b8 111 2026-02-21T08:32:58.1823818Z .b8 114 2026-02-21T08:32:58.1823874Z .b8 95 2026-02-21T08:32:58.1823922Z .b8 114 2026-02-21T08:32:58.1823968Z .b8 111 2026-02-21T08:32:58.1824016Z .b8 111 2026-02-21T08:32:58.1824103Z .b8 116 2026-02-21T08:32:58.1824151Z .b8 47 2026-02-21T08:32:58.1824198Z .b8 119 2026-02-21T08:32:58.1824253Z .b8 117 2026-02-21T08:32:58.1824300Z .b8 0 2026-02-21T08:32:58.1824348Z } 2026-02-21T08:32:58.1824410Z .section .debug_macinfo { } 2026-02-21T08:32:58.1824422Z 2026-02-21T08:32:58.1824499Z ================================================================ 2026-02-21T08:32:58.1824600Z please share the reproducer above with Triton project. 
2026-02-21T08:32:59.5362323Z 2026-02-21T08:32:59.5367554Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 75/75 23.8 configs/s 2026-02-21T08:33:01.7097091Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 456.9 2026-02-21T08:33:01.7101257Z configs/s 2026-02-21T08:33:01.8118850Z [116s] Generation 5 complete: 2026-02-21T08:33:01.8123150Z error=27 2026-02-21T08:33:01.8127174Z ok=53 2026-02-21T08:33:01.8128655Z min=0.0757 2026-02-21T08:33:01.8128836Z mid=0.1107 2026-02-21T08:33:01.8128988Z max=7.0287 2026-02-21T08:33:01.8129140Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:33:01.8129366Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:33:01.8129587Z 'l2_groupings': [2], 2026-02-21T08:33:01.8129748Z 'load_eviction_policies': ['', ''], 2026-02-21T08:33:01.8129935Z 'loop_orders': [[0, 1]], 2026-02-21T08:33:01.8130363Z 'num_stages': 8, 2026-02-21T08:33:01.8130522Z 'num_warps': 8, 2026-02-21T08:33:01.8130680Z 'pid_type': 'flat', 2026-02-21T08:33:01.8130846Z 'range_flattens': [None, None], 2026-02-21T08:33:01.8131035Z 'range_multi_buffers': [None, None], 2026-02-21T08:33:01.8131219Z 'range_num_stages': [0, 0], 2026-02-21T08:33:01.8131391Z 'range_unroll_factors': [0, 0], 2026-02-21T08:33:01.8131566Z 'range_warp_specializes': [None, True]} 2026-02-21T08:33:01.8141638Z [116s] Fitting surrogate: 531 points, 531 targets 2026-02-21T08:33:02.7435564Z [117s] Generation 6 starting: 52 neighbors, 3 active search path(s) 2026-02-21T08:33:34.4966422Z [149s] Timeout after 30s compiling Config(block_sizes=[256, 1024, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T08:33:34.7958365Z [149s] Timeout after 30s compiling 
Config(block_sizes=[1024, 256, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T08:33:34.7980929Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 52/52 0.5 configs/s 2026-02-21T08:33:36.9578809Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 52/52 23.3 configs/s 2026-02-21T08:33:39.8729444Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 346.1 2026-02-21T08:33:39.8732637Z configs/s 2026-02-21T08:33:40.0660400Z [155s] Generation 6 complete: 2026-02-21T08:33:40.0660858Z error=21 2026-02-21T08:33:40.0661163Z timeout=2 2026-02-21T08:33:40.0661400Z ok=33 2026-02-21T08:33:40.0661638Z min=0.0738 2026-02-21T08:33:40.0662401Z mid=0.1065 2026-02-21T08:33:40.0662664Z max=24.4751 2026-02-21T08:33:40.0662964Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:33:40.0663420Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:33:40.0663813Z 'l2_groupings': [2], 2026-02-21T08:33:40.0664119Z 'load_eviction_policies': ['', ''], 2026-02-21T08:33:40.0664508Z 'loop_orders': [[0, 1]], 2026-02-21T08:33:40.0665194Z 'num_stages': 8, 2026-02-21T08:33:40.0665424Z 'num_warps': 8, 2026-02-21T08:33:40.0665675Z 'pid_type': 'flat', 2026-02-21T08:33:40.0666112Z 'range_flattens': [None, None], 2026-02-21T08:33:40.0666429Z 'range_multi_buffers': [None, None], 2026-02-21T08:33:40.0666747Z 'range_num_stages': [0, 0], 2026-02-21T08:33:40.0667038Z 'range_unroll_factors': [0, 0], 2026-02-21T08:33:40.0667362Z 'range_warp_specializes': [None, True]} 2026-02-21T08:33:40.0704663Z [155s] Fitting surrogate: 587 points, 587 targets 2026-02-21T08:33:41.3427384Z [156s] Generation 7 starting: 52 neighbors, 3 active search path(s) 2026-02-21T08:34:02.7279827Z Generation 7: 
precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53/53 0.5 configs/s 2026-02-21T08:34:05.3621023Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 53/53 20.2 configs/s 2026-02-21T08:34:08.0287398Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 375.8 2026-02-21T08:34:08.0288443Z configs/s 2026-02-21T08:34:08.1770577Z [183s] Generation 7 complete: 2026-02-21T08:34:08.1770967Z error=15 2026-02-21T08:34:08.1771204Z ok=41 2026-02-21T08:34:08.1771382Z min=0.0738 2026-02-21T08:34:08.1771571Z mid=0.1106 2026-02-21T08:34:08.1771744Z max=16.0768 2026-02-21T08:34:08.1771956Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:34:08.1772260Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:34:08.1772560Z 'l2_groupings': [4], 2026-02-21T08:34:08.1772831Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:34:08.1773117Z 'loop_orders': [[0, 1]], 2026-02-21T08:34:08.1773374Z 'num_stages': 6, 2026-02-21T08:34:08.1774091Z 'num_warps': 8, 2026-02-21T08:34:08.1774311Z 'pid_type': 'flat', 2026-02-21T08:34:08.1774540Z 'range_flattens': [None, False], 2026-02-21T08:34:08.1775128Z 'range_multi_buffers': [None, True], 2026-02-21T08:34:08.1775401Z 'range_num_stages': [0, 0], 2026-02-21T08:34:08.1775665Z 'range_unroll_factors': [0, 0], 2026-02-21T08:34:08.1775930Z 'range_warp_specializes': [None, False]} 2026-02-21T08:34:08.1804101Z [183s] Fitting surrogate: 643 points, 643 targets 2026-02-21T08:34:09.1584657Z [184s] Generation 8 starting: 49 neighbors, 3 active search path(s) 2026-02-21T08:34:17.2556474Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50/50 8.3 configs/s 2026-02-21T08:34:17.8611765Z 2026-02-21T08:34:17.8611784Z 2026-02-21T08:34:17.8612266Z ================================================================ 2026-02-21T08:34:17.8612863Z Internal Triton PTX codegen error 2026-02-21T08:34:17.8613237Z `ptxas` stderr: 2026-02-21T08:34:17.8614575Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 
125 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:34:17.8616102Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:34:17.8616460Z 2026-02-21T08:34:17.8617378Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2r0xr_3m.ptx -o /tmp/tmp2r0xr_3m.ptx.o 2026-02-21T08:34:17.8618418Z 2026-02-21T08:34:17.8618629Z 2026-02-21T08:34:17.8618770Z // 2026-02-21T08:34:17.8619102Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:34:17.8619560Z // 2026-02-21T08:34:17.8619723Z 2026-02-21T08:34:17.8619859Z .version 8.7 2026-02-21T08:34:17.8620206Z .target sm_100a 2026-02-21T08:34:17.8620559Z .address_size 64 2026-02-21T08:34:17.8620785Z 2026-02-21T08:34:17.8621103Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:34:17.8621737Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:34:17.8622225Z // @_helion_matmul 2026-02-21T08:34:17.8622698Z .visible .entry _helion_matmul( 2026-02-21T08:34:17.8623183Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:34:17.8623815Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:34:17.8624413Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:34:17.8625073Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:34:17.8625672Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:34:17.8626138Z ) 2026-02-21T08:34:17.8626405Z .reqntid 384 2026-02-21T08:34:17.8626683Z .maxnreg 32 2026-02-21T08:34:17.8626968Z { 2026-02-21T08:34:17.8627242Z .reg .pred %p<65>; 2026-02-21T08:34:17.8627555Z .reg .b16 %rs<3>; 2026-02-21T08:34:17.8627853Z .reg .b32 %r<2930>; 2026-02-21T08:34:17.8628323Z .reg .b64 %rd<1213>; 2026-02-21T08:34:17.8628655Z $L__func_begin0: 2026-02-21T08:34:17.8628841Z 2026-02-21T08:34:17.8628959Z // %bb.0: 2026-02-21T08:34:17.8629536Z .loc 1 14 0 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:14 2026-02-21T08:34:17.8630235Z mov.u32 %r1, %tid.x; 2026-02-21T08:34:17.8630546Z shr.u32 %r2, %r1, 5; 2026-02-21T08:34:17.8630882Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:34:17.8631303Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:34:17.8631632Z @%p2 bra $L__BB0_13; 2026-02-21T08:34:17.8631938Z bra.uni $L__BB0_1; 2026-02-21T08:34:17.8632247Z $L__BB0_13: 2026-02-21T08:34:17.8632563Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:34:17.8632983Z setp.lt.u32 %p26, %r1, 32; 2026-02-21T08:34:17.8633333Z mov.b32 %r382, global_smem; 2026-02-21T08:34:17.8633680Z // begin inline asm 2026-02-21T08:34:17.8634211Z @%p26 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r382], 512; 2026-02-21T08:34:17.8635983Z [192s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:34:17.8638892Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=2, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:34:17.8641806Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:34:17.8642348Z `ptxas` stderr: 2026-02-21T08:34:17.8643319Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 125 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:34:17.8644485Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:34:17.8644878Z 2026-02-21T08:34:17.8646030Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2r0xr_3m.ptx -o /tmp/tmp2r0xr_3m.ptx.o 2026-02-21T08:34:17.8647174Z 2026-02-21T08:34:17.8647463Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:34:17.8648023Z // end inline asm 2026-02-21T08:34:17.8648312Z bar.sync 0, 128; 2026-02-21T08:34:17.8648635Z ld.shared.b32 %r2929, [global_smem]; 2026-02-21T08:34:17.8649009Z bar.sync 0, 128; 2026-02-21T08:34:17.8649307Z // begin inline asm 2026-02-21T08:34:17.8649766Z @%p26 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:34:17.8650409Z // end inline asm 2026-02-21T08:34:17.8650980Z .loc 1 21 30 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:21:30 2026-02-21T08:34:17.8651656Z mov.u32 %r41, %ctaid.x; 2026-02-21T08:34:17.8652256Z .loc 1 23 75 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:23:75 2026-02-21T08:34:17.8652932Z setp.gt.u32 %p28, %r41, 255; 2026-02-21T08:34:17.8653287Z @%p28 bra $L__BB0_15; 2026-02-21T08:34:17.8653633Z // %bb.14: // %.lr.ph 2026-02-21T08:34:17.8654313Z .loc 1 0 75 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:0:75 2026-02-21T08:34:17.8655097Z ld.param.b64 %rd23, [_helion_matmul_param_2]; 2026-02-21T08:34:17.8655785Z .loc 1 35 45 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:35:45 2026-02-21T08:34:17.8656452Z and.b32 %r42, %r1, 96; 2026-02-21T08:34:17.8656786Z bfe.u32 %r43, %r1, 5, 2; 2026-02-21T08:34:17.8657108Z shl.b32 %r2051, %r1, 3; 2026-02-21T08:34:17.8657428Z and.b32 %r2052, %r2051, 248; 2026-02-21T08:34:17.8657775Z and.b32 %r2053, %r2051, 128; 2026-02-21T08:34:17.8658119Z add.s32 %r2055, %r382, %r2053; 2026-02-21T08:34:17.8658484Z bfe.s32 %r2056, %r1, 3, 1; 2026-02-21T08:34:17.8658911Z and.b32 %r2057, %r2056, 
8256; 2026-02-21T08:34:17.8659244Z shr.u32 %r2058, %r42, 1; 2026-02-21T08:34:17.8659574Z or.b32 %r2059, %r2057, %r2058; 2026-02-21T08:34:17.8659910Z shl.b32 %r2060, %r42, 6; 2026-02-21T08:34:17.8660226Z and.b32 %r2061, %r1, 7; 2026-02-21T08:34:17.8660530Z shl.b32 %r2062, %r2061, 4; 2026-02-21T08:34:17.8660863Z or.b32 %r2063, %r2060, %r2062; 2026-02-21T08:34:17.8661208Z xor.b32 %r2064, %r2059, %r2063; 2026-02-21T08:34:17.8661615Z add.s32 %r1479, %r2055, %r2064; 2026-02-21T08:34:17.8662044Z add.s32 %r1514, %r1479, 1792; 2026-02-21T08:34:17.8662439Z add.s32 %r1509, %r1479, 1536; 2026-02-21T08:34:17.8662813Z add.s32 %r1504, %r1479, 1280; 2026-02-21T08:34:17.8663139Z add.s32 %r1499, %r1479, 1024; 2026-02-21T08:34:17.8663470Z add.s32 %r1494, %r1479, 768; 2026-02-21T08:34:17.8663793Z add.s32 %r1489, %r1479, 512; 2026-02-21T08:34:17.8664121Z add.s32 %r1484, %r1479, 256; 2026-02-21T08:34:17.8664443Z shl.b32 %r2065, %r2061, 11; 2026-02-21T08:34:17.8664837Z shl.b32 %r2066, %r1, 4; 2026-02-21T08:34:17.8665143Z and.b32 %r2067, %r2066, 2032; 2026-02-21T08:34:17.8665485Z or.b32 %r2068, %r2065, %r2067; 2026-02-21T08:34:17.8665918Z xor.b32 %r2069, %r2068, 112; 2026-02-21T08:34:17.8666243Z add.s32 %r2070, %r382, %r2069; 2026-02-21T08:34:17.8666578Z xor.b32 %r2071, %r2068, 96; 2026-02-21T08:34:17.8666901Z add.s32 %r2072, %r382, %r2071; 2026-02-21T08:34:17.8667239Z xor.b32 %r2073, %r2068, 80; 2026-02-21T08:34:17.8667563Z add.s32 %r2074, %r382, %r2073; 2026-02-21T08:34:17.8667886Z xor.b32 %r2075, %r2068, 64; 2026-02-21T08:34:17.8668192Z add.s32 %r2076, %r382, %r2075; 2026-02-21T08:34:17.8668510Z xor.b32 %r2077, %r2068, 48; 2026-02-21T08:34:17.8668824Z add.s32 %r2078, %r382, %r2077; 2026-02-21T08:34:17.8669149Z xor.b32 %r2079, %r2068, 32; 2026-02-21T08:34:17.8669469Z add.s32 %r2080, %r382, %r2079; 2026-02-21T08:34:17.8669784Z xor.b32 %r2081, %r2068, 16; 2026-02-21T08:34:17.8670104Z add.s32 %r2082, %r382, %r2081; 2026-02-21T08:34:17.8670426Z add.s32 %r2083, %r382, %r2068; 
2026-02-21T08:34:17.8670760Z setp.eq.b32 %p61, %r1, 0; 2026-02-21T08:34:17.8671411Z .loc 1 30 33 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:30:33 2026-02-21T08:34:17.8672049Z shr.u32 %r2084, %r41, 3; 2026-02-21T08:34:17.8672344Z and.b32 %r2085, %r2084, 30; 2026-02-21T08:34:17.8672908Z .loc 1 32 64 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:32:64 2026-02-21T08:34:17.8673526Z and.b32 %r2086, %r41, 1; 2026-02-21T08:34:17.8674066Z .loc 1 32 30 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:32:30 2026-02-21T08:34:17.8674834Z or.b32 %r2087, %r2085, %r2086; 2026-02-21T08:34:17.8675415Z .loc 1 34 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:34:27 2026-02-21T08:34:17.8676053Z shl.b32 %r2088, %r2087, 8; 2026-02-21T08:34:17.8676629Z .loc 1 35 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:35:32 2026-02-21T08:34:17.8677275Z or.b32 %r2089, %r2088, %r43; 2026-02-21T08:34:17.8677857Z .loc 1 36 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:36:27 2026-02-21T08:34:17.8678485Z shl.b32 %r2090, %r41, 7; 2026-02-21T08:34:17.8678808Z and.b32 %r2091, %r2090, 1792; 2026-02-21T08:34:17.8679387Z .loc 1 37 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:37:32 2026-02-21T08:34:17.8680026Z or.b32 %r2092, %r2091, %r2052; 2026-02-21T08:34:17.8680607Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.8681287Z shfl.sync.idx.b32 %r2093, %r2, 0, 31, -1; 2026-02-21T08:34:17.8681683Z shl.b32 %r2094, %r2093, 21; 2026-02-21T08:34:17.8682011Z and.b32 %r2095, %r2094, 6291456; 2026-02-21T08:34:17.8682370Z add.s32 %r383, %r2095, %r2929; 2026-02-21T08:34:17.8682702Z mov.pred %p29, -1; 2026-02-21T08:34:17.8683001Z mov.b32 %r384, 0; 2026-02-21T08:34:17.8683371Z // begin inline asm 2026-02-21T08:34:17.8684362Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 0], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, 
%r384, %r384, %r384}; 2026-02-21T08:34:17.8685557Z // end inline asm 2026-02-21T08:34:17.8685835Z // begin inline asm 2026-02-21T08:34:17.8686672Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 16], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8687574Z // end inline asm 2026-02-21T08:34:17.8687859Z // begin inline asm 2026-02-21T08:34:17.8688676Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 32], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8689588Z // end inline asm 2026-02-21T08:34:17.8689863Z // begin inline asm 2026-02-21T08:34:17.8690680Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 48], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8691582Z // end inline asm 2026-02-21T08:34:17.8691950Z // begin inline asm 2026-02-21T08:34:17.8692767Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 64], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8693652Z // end inline asm 2026-02-21T08:34:17.8693936Z // begin inline asm 2026-02-21T08:34:17.8694788Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 80], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8695695Z // end inline asm 2026-02-21T08:34:17.8695976Z // begin inline asm 2026-02-21T08:34:17.8696785Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 96], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8697702Z // end inline asm 2026-02-21T08:34:17.8697984Z // begin inline asm 2026-02-21T08:34:17.8698891Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 112], 
{%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8699822Z // end inline asm 2026-02-21T08:34:17.8700095Z // begin inline asm 2026-02-21T08:34:17.8700999Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 128], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8701935Z // end inline asm 2026-02-21T08:34:17.8702361Z // begin inline asm 2026-02-21T08:34:17.8703266Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 144], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8704202Z // end inline asm 2026-02-21T08:34:17.8704544Z // begin inline asm 2026-02-21T08:34:17.8705463Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 160], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8706407Z // end inline asm 2026-02-21T08:34:17.8706677Z // begin inline asm 2026-02-21T08:34:17.8707511Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 176], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8708414Z // end inline asm 2026-02-21T08:34:17.8708677Z // begin inline asm 2026-02-21T08:34:17.8709480Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 192], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8710361Z // end inline asm 2026-02-21T08:34:17.8710634Z // begin inline asm 2026-02-21T08:34:17.8711539Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 208], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8712421Z // end inline asm 2026-02-21T08:34:17.8712699Z // begin inline 
asm 2026-02-21T08:34:17.8713489Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 224], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8714371Z // end inline asm 2026-02-21T08:34:17.8714640Z // begin inline asm 2026-02-21T08:34:17.8715519Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 240], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8716431Z // end inline asm 2026-02-21T08:34:17.8716696Z // begin inline asm 2026-02-21T08:34:17.8717487Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 256], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8718351Z // end inline asm 2026-02-21T08:34:17.8718626Z // begin inline asm 2026-02-21T08:34:17.8719409Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 272], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8720389Z // end inline asm 2026-02-21T08:34:17.8720666Z // begin inline asm 2026-02-21T08:34:17.8721448Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 288], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8722329Z // end inline asm 2026-02-21T08:34:17.8722593Z // begin inline asm 2026-02-21T08:34:17.8723380Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 304], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8724261Z // end inline asm 2026-02-21T08:34:17.8724525Z // begin inline asm 2026-02-21T08:34:17.8725487Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 320], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, 
%r384}; 2026-02-21T08:34:17.8726424Z // end inline asm 2026-02-21T08:34:17.8726701Z // begin inline asm 2026-02-21T08:34:17.8727495Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 336], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8728394Z // end inline asm 2026-02-21T08:34:17.8728670Z // begin inline asm 2026-02-21T08:34:17.8729478Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 352], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8730496Z // end inline asm 2026-02-21T08:34:17.8730772Z // begin inline asm 2026-02-21T08:34:17.8731593Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 368], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8732502Z // end inline asm 2026-02-21T08:34:17.8732786Z // begin inline asm 2026-02-21T08:34:17.8733607Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 384], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8734507Z // end inline asm 2026-02-21T08:34:17.8734862Z // begin inline asm 2026-02-21T08:34:17.8735668Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 400], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8736565Z // end inline asm 2026-02-21T08:34:17.8736842Z // begin inline asm 2026-02-21T08:34:17.8737661Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 416], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8738560Z // end inline asm 2026-02-21T08:34:17.8738820Z // begin inline asm 2026-02-21T08:34:17.8739712Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 432], {%r384, 
%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8740599Z // end inline asm 2026-02-21T08:34:17.8740877Z // begin inline asm 2026-02-21T08:34:17.8741681Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 448], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8742557Z // end inline asm 2026-02-21T08:34:17.8742830Z // begin inline asm 2026-02-21T08:34:17.8743615Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 464], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8744500Z // end inline asm 2026-02-21T08:34:17.8744840Z // begin inline asm 2026-02-21T08:34:17.8745651Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 480], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8746551Z // end inline asm 2026-02-21T08:34:17.8746815Z // begin inline asm 2026-02-21T08:34:17.8747706Z @%p29 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 496], {%r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384, %r384}; 2026-02-21T08:34:17.8748573Z // end inline asm 2026-02-21T08:34:17.8748846Z // begin inline asm 2026-02-21T08:34:17.8749157Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:34:17.8749489Z // end inline asm 2026-02-21T08:34:17.8749757Z bar.sync 0, 128; 2026-02-21T08:34:17.8750289Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.8750944Z add.s32 %r927, %r382, 65536; 2026-02-21T08:34:17.8751253Z // begin inline asm 2026-02-21T08:34:17.8751595Z @%p61 mbarrier.init.shared::cta.b64 [%r927], 1; 2026-02-21T08:34:17.8751992Z // end inline asm 2026-02-21T08:34:17.8752348Z st.shared.v2.b32 [global_smem+65544], {0, 50397698}; 
2026-02-21T08:34:17.8752874Z st.shared.b32 [global_smem], %r2929; 2026-02-21T08:34:17.8753323Z st.shared.v2.b32 [global_smem+8], {%r2088, %r2091}; 2026-02-21T08:34:17.8753741Z barrier.sync 1; 2026-02-21T08:34:17.8754011Z barrier.sync 1; 2026-02-21T08:34:17.8754280Z barrier.sync 1; 2026-02-21T08:34:17.8754869Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.8755516Z bar.sync 0, 128; 2026-02-21T08:34:17.8755788Z // begin inline asm 2026-02-21T08:34:17.8756093Z 2026-02-21T08:34:17.8756336Z { 2026-02-21T08:34:17.8756669Z .reg .pred complete; 2026-02-21T08:34:17.8756964Z waitLoop: 2026-02-21T08:34:17.8757369Z mbarrier.try_wait.parity.shared.b64 complete, [%r927], %r384; 2026-02-21T08:34:17.8757971Z @!complete bra.uni waitLoop; 2026-02-21T08:34:17.8758369Z } 2026-02-21T08:34:17.8758582Z 2026-02-21T08:34:17.8758722Z // end inline asm 2026-02-21T08:34:17.8759422Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.8760097Z bar.sync 0, 128; 2026-02-21T08:34:17.8760370Z // begin inline asm 2026-02-21T08:34:17.8760720Z @%p61 mbarrier.inval.shared::cta.b64 [%r927]; 2026-02-21T08:34:17.8761130Z // end inline asm 2026-02-21T08:34:17.8761660Z .loc 1 35 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:35:32 2026-02-21T08:34:17.8762313Z shl.b32 %r2096, %r2089, 11; 2026-02-21T08:34:17.8762871Z .loc 1 52 45 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:52:45 2026-02-21T08:34:17.8763502Z or.b32 %r2097, %r2096, %r2092; 2026-02-21T08:34:17.8764064Z .loc 1 52 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:52:52 2026-02-21T08:34:17.8764738Z or.b32 %r2098, %r2097, 8192; 2026-02-21T08:34:17.8765059Z or.b32 %r2099, %r2097, 16384; 2026-02-21T08:34:17.8765394Z or.b32 %r2100, %r2097, 24576; 2026-02-21T08:34:17.8765804Z or.b32 %r2101, %r2097, 32768; 2026-02-21T08:34:17.8766130Z or.b32 %r2102, %r2097, 40960; 2026-02-21T08:34:17.8766465Z or.b32 %r2103, %r2097, 
49152; 2026-02-21T08:34:17.8766767Z or.b32 %r2104, %r2097, 57344; 2026-02-21T08:34:17.8767067Z or.b32 %r2105, %r2097, 65536; 2026-02-21T08:34:17.8767372Z or.b32 %r2106, %r2097, 73728; 2026-02-21T08:34:17.8767682Z or.b32 %r2107, %r2097, 81920; 2026-02-21T08:34:17.8767980Z or.b32 %r2108, %r2097, 90112; 2026-02-21T08:34:17.8768286Z or.b32 %r2109, %r2097, 98304; 2026-02-21T08:34:17.8768601Z or.b32 %r2110, %r2097, 106496; 2026-02-21T08:34:17.8768917Z or.b32 %r2111, %r2097, 114688; 2026-02-21T08:34:17.8769245Z or.b32 %r2112, %r2097, 122880; 2026-02-21T08:34:17.8769556Z or.b32 %r2113, %r2097, 131072; 2026-02-21T08:34:17.8769879Z or.b32 %r2114, %r2097, 139264; 2026-02-21T08:34:17.8770189Z or.b32 %r2115, %r2097, 147456; 2026-02-21T08:34:17.8770507Z or.b32 %r2116, %r2097, 155648; 2026-02-21T08:34:17.8770818Z or.b32 %r2117, %r2097, 163840; 2026-02-21T08:34:17.8771132Z or.b32 %r2118, %r2097, 172032; 2026-02-21T08:34:17.8771444Z or.b32 %r2119, %r2097, 180224; 2026-02-21T08:34:17.8771847Z or.b32 %r2120, %r2097, 188416; 2026-02-21T08:34:17.8772168Z or.b32 %r2121, %r2097, 196608; 2026-02-21T08:34:17.8772480Z or.b32 %r2122, %r2097, 204800; 2026-02-21T08:34:17.8772797Z or.b32 %r2123, %r2097, 212992; 2026-02-21T08:34:17.8773109Z or.b32 %r2124, %r2097, 221184; 2026-02-21T08:34:17.8773428Z or.b32 %r2125, %r2097, 229376; 2026-02-21T08:34:17.8773737Z or.b32 %r2126, %r2097, 237568; 2026-02-21T08:34:17.8774057Z or.b32 %r2127, %r2097, 245760; 2026-02-21T08:34:17.8774374Z or.b32 %r2128, %r2097, 253952; 2026-02-21T08:34:17.8774760Z or.b32 %r2129, %r2097, 262144; 2026-02-21T08:34:17.8775102Z or.b32 %r2130, %r2097, 270336; 2026-02-21T08:34:17.8775426Z or.b32 %r2131, %r2097, 278528; 2026-02-21T08:34:17.8775762Z or.b32 %r2132, %r2097, 286720; 2026-02-21T08:34:17.8776072Z or.b32 %r2133, %r2097, 294912; 2026-02-21T08:34:17.8776395Z or.b32 %r2134, %r2097, 303104; 2026-02-21T08:34:17.8776784Z or.b32 %r2135, %r2097, 311296; 2026-02-21T08:34:17.8777110Z or.b32 %r2136, %r2097, 319488; 
2026-02-21T08:34:17.8777426Z or.b32 %r2137, %r2097, 327680; 2026-02-21T08:34:17.8777743Z or.b32 %r2138, %r2097, 335872; 2026-02-21T08:34:17.8778051Z or.b32 %r2139, %r2097, 344064; 2026-02-21T08:34:17.8778372Z or.b32 %r2140, %r2097, 352256; 2026-02-21T08:34:17.8778693Z or.b32 %r2141, %r2097, 360448; 2026-02-21T08:34:17.8779004Z or.b32 %r2142, %r2097, 368640; 2026-02-21T08:34:17.8779319Z or.b32 %r2143, %r2097, 376832; 2026-02-21T08:34:17.8779627Z or.b32 %r2144, %r2097, 385024; 2026-02-21T08:34:17.8780023Z or.b32 %r2145, %r2097, 393216; 2026-02-21T08:34:17.8780335Z or.b32 %r2146, %r2097, 401408; 2026-02-21T08:34:17.8780651Z or.b32 %r2147, %r2097, 409600; 2026-02-21T08:34:17.8780956Z or.b32 %r2148, %r2097, 417792; 2026-02-21T08:34:17.8781274Z or.b32 %r2149, %r2097, 425984; 2026-02-21T08:34:17.8781593Z or.b32 %r2150, %r2097, 434176; 2026-02-21T08:34:17.8781902Z or.b32 %r2151, %r2097, 442368; 2026-02-21T08:34:17.8782225Z or.b32 %r2152, %r2097, 450560; 2026-02-21T08:34:17.8782540Z or.b32 %r2153, %r2097, 458752; 2026-02-21T08:34:17.8782861Z or.b32 %r2154, %r2097, 466944; 2026-02-21T08:34:17.8783171Z or.b32 %r2155, %r2097, 475136; 2026-02-21T08:34:17.8783491Z or.b32 %r2156, %r2097, 483328; 2026-02-21T08:34:17.8783798Z or.b32 %r2157, %r2097, 491520; 2026-02-21T08:34:17.8784111Z or.b32 %r2158, %r2097, 499712; 2026-02-21T08:34:17.8784426Z or.b32 %r2159, %r2097, 507904; 2026-02-21T08:34:17.8784830Z or.b32 %r2160, %r2097, 516096; 2026-02-21T08:34:17.8785414Z .loc 1 52 24 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:52:24 2026-02-21T08:34:17.8786050Z mad.wide.u32 %rd124, %r2097, 2, %rd23; 2026-02-21T08:34:17.8786432Z mad.wide.u32 %rd125, %r2098, 2, %rd23; 2026-02-21T08:34:17.8786802Z mad.wide.u32 %rd126, %r2099, 2, %rd23; 2026-02-21T08:34:17.8787253Z mad.wide.u32 %rd127, %r2100, 2, %rd23; 2026-02-21T08:34:17.8787620Z mad.wide.u32 %rd128, %r2101, 2, %rd23; 2026-02-21T08:34:17.8787997Z mad.wide.u32 %rd129, %r2102, 2, %rd23; 2026-02-21T08:34:17.8788358Z mad.wide.u32 
%rd130, %r2103, 2, %rd23; 2026-02-21T08:34:17.8788726Z mad.wide.u32 %rd131, %r2104, 2, %rd23; 2026-02-21T08:34:17.8789097Z mad.wide.u32 %rd132, %r2105, 2, %rd23; 2026-02-21T08:34:17.8789456Z mad.wide.u32 %rd133, %r2106, 2, %rd23; 2026-02-21T08:34:17.8789828Z mad.wide.u32 %rd134, %r2107, 2, %rd23; 2026-02-21T08:34:17.8790189Z mad.wide.u32 %rd135, %r2108, 2, %rd23; 2026-02-21T08:34:17.8790551Z mad.wide.u32 %rd136, %r2109, 2, %rd23; 2026-02-21T08:34:17.8790912Z mad.wide.u32 %rd137, %r2110, 2, %rd23; 2026-02-21T08:34:17.8791280Z mad.wide.u32 %rd138, %r2111, 2, %rd23; 2026-02-21T08:34:17.8791637Z mad.wide.u32 %rd139, %r2112, 2, %rd23; 2026-02-21T08:34:17.8792004Z mad.wide.u32 %rd140, %r2113, 2, %rd23; 2026-02-21T08:34:17.8792367Z mad.wide.u32 %rd141, %r2114, 2, %rd23; 2026-02-21T08:34:17.8792730Z mad.wide.u32 %rd142, %r2115, 2, %rd23; 2026-02-21T08:34:17.8793100Z mad.wide.u32 %rd143, %r2116, 2, %rd23; 2026-02-21T08:34:17.8793556Z mad.wide.u32 %rd144, %r2117, 2, %rd23; 2026-02-21T08:34:17.8793926Z mad.wide.u32 %rd145, %r2118, 2, %rd23; 2026-02-21T08:34:17.8794284Z mad.wide.u32 %rd146, %r2119, 2, %rd23; 2026-02-21T08:34:17.8794661Z mad.wide.u32 %rd147, %r2120, 2, %rd23; 2026-02-21T08:34:17.8795119Z mad.wide.u32 %rd148, %r2121, 2, %rd23; 2026-02-21T08:34:17.8795498Z mad.wide.u32 %rd149, %r2122, 2, %rd23; 2026-02-21T08:34:17.8795874Z mad.wide.u32 %rd150, %r2123, 2, %rd23; 2026-02-21T08:34:17.8796257Z mad.wide.u32 %rd151, %r2124, 2, %rd23; 2026-02-21T08:34:17.8796645Z mad.wide.u32 %rd152, %r2125, 2, %rd23; 2026-02-21T08:34:17.8797022Z mad.wide.u32 %rd153, %r2126, 2, %rd23; 2026-02-21T08:34:17.8797403Z mad.wide.u32 %rd154, %r2127, 2, %rd23; 2026-02-21T08:34:17.8797775Z mad.wide.u32 %rd155, %r2128, 2, %rd23; 2026-02-21T08:34:17.8798157Z mad.wide.u32 %rd156, %r2129, 2, %rd23; 2026-02-21T08:34:17.8798531Z mad.wide.u32 %rd157, %r2130, 2, %rd23; 2026-02-21T08:34:17.8798985Z mad.wide.u32 %rd158, %r2131, 2, %rd23; 2026-02-21T08:34:17.8799378Z mad.wide.u32 %rd159, %r2132, 2, %rd23; 
2026-02-21T08:34:17.8799748Z mad.wide.u32 %rd160, %r2133, 2, %rd23; 2026-02-21T08:34:17.8800129Z mad.wide.u32 %rd161, %r2134, 2, %rd23; 2026-02-21T08:34:17.8800500Z mad.wide.u32 %rd162, %r2135, 2, %rd23; 2026-02-21T08:34:17.8800880Z mad.wide.u32 %rd163, %r2136, 2, %rd23; 2026-02-21T08:34:17.8801244Z mad.wide.u32 %rd164, %r2137, 2, %rd23; 2026-02-21T08:34:17.8801628Z mad.wide.u32 %rd165, %r2138, 2, %rd23; 2026-02-21T08:34:17.8802097Z mad.wide.u32 %rd166, %r2139, 2, %rd23; 2026-02-21T08:34:17.8802480Z mad.wide.u32 %rd167, %r2140, 2, %rd23; 2026-02-21T08:34:17.8802858Z mad.wide.u32 %rd168, %r2141, 2, %rd23; 2026-02-21T08:34:17.8803227Z mad.wide.u32 %rd169, %r2142, 2, %rd23; 2026-02-21T08:34:17.8803612Z mad.wide.u32 %rd170, %r2143, 2, %rd23; 2026-02-21T08:34:17.8803987Z mad.wide.u32 %rd171, %r2144, 2, %rd23; 2026-02-21T08:34:17.8804366Z mad.wide.u32 %rd172, %r2145, 2, %rd23; 2026-02-21T08:34:17.8804788Z mad.wide.u32 %rd173, %r2146, 2, %rd23; 2026-02-21T08:34:17.8805178Z mad.wide.u32 %rd174, %r2147, 2, %rd23; 2026-02-21T08:34:17.8805548Z mad.wide.u32 %rd175, %r2148, 2, %rd23; 2026-02-21T08:34:17.8805927Z mad.wide.u32 %rd176, %r2149, 2, %rd23; 2026-02-21T08:34:17.8806307Z mad.wide.u32 %rd177, %r2150, 2, %rd23; 2026-02-21T08:34:17.8806678Z mad.wide.u32 %rd178, %r2151, 2, %rd23; 2026-02-21T08:34:17.8807056Z mad.wide.u32 %rd179, %r2152, 2, %rd23; 2026-02-21T08:34:17.8807430Z mad.wide.u32 %rd180, %r2153, 2, %rd23; 2026-02-21T08:34:17.8807815Z mad.wide.u32 %rd181, %r2154, 2, %rd23; 2026-02-21T08:34:17.8808184Z mad.wide.u32 %rd182, %r2155, 2, %rd23; 2026-02-21T08:34:17.8808567Z mad.wide.u32 %rd183, %r2156, 2, %rd23; 2026-02-21T08:34:17.8808938Z mad.wide.u32 %rd184, %r2157, 2, %rd23; 2026-02-21T08:34:17.8809320Z mad.wide.u32 %rd185, %r2158, 2, %rd23; 2026-02-21T08:34:17.8809784Z mad.wide.u32 %rd186, %r2159, 2, %rd23; 2026-02-21T08:34:17.8810165Z mad.wide.u32 %rd187, %r2160, 2, %rd23; 2026-02-21T08:34:17.8810800Z .loc 1 49 52 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.8811433Z // begin inline asm 2026-02-21T08:34:17.8812257Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r931, %r932, %r933, %r934, %r935, %r936, %r937, %r938, %r939, %r940, %r941, %r942, %r943, %r944, %r945, %r946}, [%r383 + 0]; 2026-02-21T08:34:17.8813133Z // end inline asm 2026-02-21T08:34:17.8813434Z cvt.u64.u32 %rd188, %r935; 2026-02-21T08:34:17.8813774Z cvt.u64.u32 %rd189, %r936; 2026-02-21T08:34:17.8814106Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:34:17.8814444Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:34:17.8814850Z cvt.u64.u32 %rd192, %r937; 2026-02-21T08:34:17.8815182Z cvt.u64.u32 %rd193, %r938; 2026-02-21T08:34:17.8815496Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:34:17.8815836Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:34:17.8816161Z cvt.u64.u32 %rd196, %r943; 2026-02-21T08:34:17.8816491Z cvt.u64.u32 %rd197, %r944; 2026-02-21T08:34:17.8816894Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:34:17.8817228Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:34:17.8817569Z cvt.u64.u32 %rd200, %r945; 2026-02-21T08:34:17.8817881Z cvt.u64.u32 %rd201, %r946; 2026-02-21T08:34:17.8818207Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:34:17.8818527Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:34:17.8818859Z // begin inline asm 2026-02-21T08:34:17.8819661Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r948, %r949, %r950, %r951, %r952, %r953, %r954, %r955, %r956, %r957, %r958, %r959, %r960, %r961, %r962, %r963}, [%r383 + 16]; 2026-02-21T08:34:17.8820550Z // end inline asm 2026-02-21T08:34:17.8820842Z cvt.u64.u32 %rd204, %r952; 2026-02-21T08:34:17.8821161Z cvt.u64.u32 %rd205, %r953; 2026-02-21T08:34:17.8821488Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:34:17.8821809Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:34:17.8822157Z cvt.u64.u32 %rd208, %r954; 2026-02-21T08:34:17.8822470Z cvt.u64.u32 %rd209, %r955; 2026-02-21T08:34:17.8822868Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:34:17.8823201Z or.b64 
%rd211, %rd208, %rd210; 2026-02-21T08:34:17.8823537Z cvt.u64.u32 %rd212, %r960; 2026-02-21T08:34:17.8823849Z cvt.u64.u32 %rd213, %r961; 2026-02-21T08:34:17.8824173Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:34:17.8824502Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:34:17.8824881Z cvt.u64.u32 %rd216, %r962; 2026-02-21T08:34:17.8825209Z cvt.u64.u32 %rd217, %r963; 2026-02-21T08:34:17.8825529Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:34:17.8825937Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:34:17.8826260Z // begin inline asm 2026-02-21T08:34:17.8827063Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r965, %r966, %r967, %r968, %r969, %r970, %r971, %r972, %r973, %r974, %r975, %r976, %r977, %r978, %r979, %r980}, [%r383 + 32]; 2026-02-21T08:34:17.8827940Z // end inline asm 2026-02-21T08:34:17.8828233Z cvt.u64.u32 %rd220, %r969; 2026-02-21T08:34:17.8828555Z cvt.u64.u32 %rd221, %r970; 2026-02-21T08:34:17.8828870Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:34:17.8829203Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:34:17.8829532Z cvt.u64.u32 %rd224, %r971; 2026-02-21T08:34:17.8829857Z cvt.u64.u32 %rd225, %r972; 2026-02-21T08:34:17.8830170Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:34:17.8830500Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:34:17.8830837Z cvt.u64.u32 %rd228, %r977; 2026-02-21T08:34:17.8831292Z cvt.u64.u32 %rd229, %r978; 2026-02-21T08:34:17.8831733Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:34:17.8832112Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:34:17.8832461Z cvt.u64.u32 %rd232, %r979; 2026-02-21T08:34:17.8832779Z cvt.u64.u32 %rd233, %r980; 2026-02-21T08:34:17.8833106Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:34:17.8833428Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:34:17.8833763Z // begin inline asm 2026-02-21T08:34:17.8834651Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r982, %r983, %r984, %r985, %r986, %r987, %r988, %r989, %r990, %r991, %r992, %r993, %r994, %r995, %r996, %r997}, [%r383 + 48]; 2026-02-21T08:34:17.8835640Z // end inline asm 
2026-02-21T08:34:17.8835921Z cvt.u64.u32 %rd236, %r986; 2026-02-21T08:34:17.8836248Z cvt.u64.u32 %rd237, %r987; 2026-02-21T08:34:17.8836575Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:34:17.8836898Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:34:17.8837239Z cvt.u64.u32 %rd240, %r988; 2026-02-21T08:34:17.8837553Z cvt.u64.u32 %rd241, %r989; 2026-02-21T08:34:17.8837880Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:34:17.8838201Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:34:17.8838543Z cvt.u64.u32 %rd244, %r994; 2026-02-21T08:34:17.8838853Z cvt.u64.u32 %rd245, %r995; 2026-02-21T08:34:17.8839180Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:34:17.8839507Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:34:17.8839838Z cvt.u64.u32 %rd248, %r996; 2026-02-21T08:34:17.8840160Z cvt.u64.u32 %rd249, %r997; 2026-02-21T08:34:17.8840477Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:34:17.8840812Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:34:17.8841139Z // begin inline asm 2026-02-21T08:34:17.8842115Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r999, %r1000, %r1001, %r1002, %r1003, %r1004, %r1005, %r1006, %r1007, %r1008, %r1009, %r1010, %r1011, %r1012, %r1013, %r1014}, [%r383 + 64]; 2026-02-21T08:34:17.8843077Z // end inline asm 2026-02-21T08:34:17.8843369Z cvt.u64.u32 %rd252, %r1003; 2026-02-21T08:34:17.8843697Z cvt.u64.u32 %rd253, %r1004; 2026-02-21T08:34:17.8844017Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:34:17.8844346Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:34:17.8844728Z cvt.u64.u32 %rd256, %r1005; 2026-02-21T08:34:17.8845059Z cvt.u64.u32 %rd257, %r1006; 2026-02-21T08:34:17.8845384Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:34:17.8845718Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:34:17.8846047Z cvt.u64.u32 %rd260, %r1011; 2026-02-21T08:34:17.8846379Z cvt.u64.u32 %rd261, %r1012; 2026-02-21T08:34:17.8846699Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:34:17.8847114Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:34:17.8847463Z cvt.u64.u32 %rd264, %r1013; 
2026-02-21T08:34:17.8847784Z cvt.u64.u32 %rd265, %r1014; 2026-02-21T08:34:17.8848117Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:34:17.8848434Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:34:17.8848775Z // begin inline asm 2026-02-21T08:34:17.8849663Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1016, %r1017, %r1018, %r1019, %r1020, %r1021, %r1022, %r1023, %r1024, %r1025, %r1026, %r1027, %r1028, %r1029, %r1030, %r1031}, [%r383 + 80]; 2026-02-21T08:34:17.8850642Z // end inline asm 2026-02-21T08:34:17.8851001Z cvt.u64.u32 %rd268, %r1020; 2026-02-21T08:34:17.8851322Z cvt.u64.u32 %rd269, %r1021; 2026-02-21T08:34:17.8851647Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:34:17.8851965Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:34:17.8852307Z cvt.u64.u32 %rd272, %r1022; 2026-02-21T08:34:17.8852624Z cvt.u64.u32 %rd273, %r1023; 2026-02-21T08:34:17.8852951Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:34:17.8853267Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:34:17.8853610Z cvt.u64.u32 %rd276, %r1028; 2026-02-21T08:34:17.8853931Z cvt.u64.u32 %rd277, %r1029; 2026-02-21T08:34:17.8854249Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:34:17.8854576Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:34:17.8854956Z cvt.u64.u32 %rd280, %r1030; 2026-02-21T08:34:17.8855287Z cvt.u64.u32 %rd281, %r1031; 2026-02-21T08:34:17.8855606Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:34:17.8855936Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:34:17.8856263Z // begin inline asm 2026-02-21T08:34:17.8857150Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1033, %r1034, %r1035, %r1036, %r1037, %r1038, %r1039, %r1040, %r1041, %r1042, %r1043, %r1044, %r1045, %r1046, %r1047, %r1048}, [%r383 + 96]; 2026-02-21T08:34:17.8858108Z // end inline asm 2026-02-21T08:34:17.8858400Z cvt.u64.u32 %rd284, %r1037; 2026-02-21T08:34:17.8858732Z cvt.u64.u32 %rd285, %r1038; 2026-02-21T08:34:17.8859328Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:34:17.8859671Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:34:17.8860000Z cvt.u64.u32 %rd288, 
%r1039; 2026-02-21T08:34:17.8860324Z cvt.u64.u32 %rd289, %r1040; 2026-02-21T08:34:17.8860640Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:34:17.8860972Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:34:17.8861297Z cvt.u64.u32 %rd292, %r1045; 2026-02-21T08:34:17.8861620Z cvt.u64.u32 %rd293, %r1046; 2026-02-21T08:34:17.8861948Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:34:17.8862266Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:34:17.8862601Z cvt.u64.u32 %rd296, %r1047; 2026-02-21T08:34:17.8862920Z cvt.u64.u32 %rd297, %r1048; 2026-02-21T08:34:17.8863241Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:34:17.8863559Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:34:17.8863889Z // begin inline asm 2026-02-21T08:34:17.8864839Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1050, %r1051, %r1052, %r1053, %r1054, %r1055, %r1056, %r1057, %r1058, %r1059, %r1060, %r1061, %r1062, %r1063, %r1064, %r1065}, [%r383 + 112]; 2026-02-21T08:34:17.8865828Z // end inline asm 2026-02-21T08:34:17.8866198Z cvt.u64.u32 %rd300, %r1054; 2026-02-21T08:34:17.8866519Z cvt.u64.u32 %rd301, %r1055; 2026-02-21T08:34:17.8866850Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:34:17.8867166Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:34:17.8867504Z cvt.u64.u32 %rd304, %r1056; 2026-02-21T08:34:17.8867821Z cvt.u64.u32 %rd305, %r1057; 2026-02-21T08:34:17.8868146Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:34:17.8868464Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:34:17.8868796Z cvt.u64.u32 %rd308, %r1062; 2026-02-21T08:34:17.8869119Z cvt.u64.u32 %rd309, %r1063; 2026-02-21T08:34:17.8869447Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:34:17.8869770Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:34:17.8870097Z cvt.u64.u32 %rd312, %r1064; 2026-02-21T08:34:17.8870417Z cvt.u64.u32 %rd313, %r1065; 2026-02-21T08:34:17.8870735Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:34:17.8871066Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:34:17.8871389Z // begin inline asm 2026-02-21T08:34:17.8872358Z tcgen05.ld.sync.aligned.32x32b.x16.b32 
{%r1067, %r1068, %r1069, %r1070, %r1071, %r1072, %r1073, %r1074, %r1075, %r1076, %r1077, %r1078, %r1079, %r1080, %r1081, %r1082}, [%r383 + 128]; 2026-02-21T08:34:17.8873354Z // end inline asm 2026-02-21T08:34:17.8873640Z cvt.u64.u32 %rd316, %r1071; 2026-02-21T08:34:17.8873967Z cvt.u64.u32 %rd317, %r1072; 2026-02-21T08:34:17.8874285Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:34:17.8874614Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:34:17.8874996Z cvt.u64.u32 %rd320, %r1073; 2026-02-21T08:34:17.8875408Z cvt.u64.u32 %rd321, %r1074; 2026-02-21T08:34:17.8875728Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:34:17.8876059Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:34:17.8876388Z cvt.u64.u32 %rd324, %r1079; 2026-02-21T08:34:17.8876713Z cvt.u64.u32 %rd325, %r1080; 2026-02-21T08:34:17.8877046Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:34:17.8877378Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:34:17.8877731Z cvt.u64.u32 %rd328, %r1081; 2026-02-21T08:34:17.8878056Z cvt.u64.u32 %rd329, %r1082; 2026-02-21T08:34:17.8878390Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:34:17.8878714Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:34:17.8879049Z // begin inline asm 2026-02-21T08:34:17.8879928Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1084, %r1085, %r1086, %r1087, %r1088, %r1089, %r1090, %r1091, %r1092, %r1093, %r1094, %r1095, %r1096, %r1097, %r1098, %r1099}, [%r383 + 144]; 2026-02-21T08:34:17.8880903Z // end inline asm 2026-02-21T08:34:17.8881188Z cvt.u64.u32 %rd332, %r1088; 2026-02-21T08:34:17.8881519Z cvt.u64.u32 %rd333, %r1089; 2026-02-21T08:34:17.8881849Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:34:17.8882165Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:34:17.8882500Z cvt.u64.u32 %rd336, %r1090; 2026-02-21T08:34:17.8882816Z cvt.u64.u32 %rd337, %r1091; 2026-02-21T08:34:17.8883229Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:34:17.8883570Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:34:17.8883905Z cvt.u64.u32 %rd340, %r1096; 2026-02-21T08:34:17.8884238Z cvt.u64.u32 %rd341, 
%r1097; 2026-02-21T08:34:17.8884556Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:34:17.8884962Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:34:17.8885364Z cvt.u64.u32 %rd344, %r1098; 2026-02-21T08:34:17.8885718Z cvt.u64.u32 %rd345, %r1099; 2026-02-21T08:34:17.8886046Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:34:17.8886366Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:34:17.8886699Z // begin inline asm 2026-02-21T08:34:17.8887589Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1101, %r1102, %r1103, %r1104, %r1105, %r1106, %r1107, %r1108, %r1109, %r1110, %r1111, %r1112, %r1113, %r1114, %r1115, %r1116}, [%r383 + 160]; 2026-02-21T08:34:17.8888587Z // end inline asm 2026-02-21T08:34:17.8888887Z cvt.u64.u32 %rd348, %r1105; 2026-02-21T08:34:17.8889205Z cvt.u64.u32 %rd349, %r1106; 2026-02-21T08:34:17.8889545Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:34:17.8889867Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:34:17.8890212Z cvt.u64.u32 %rd352, %r1107; 2026-02-21T08:34:17.8890626Z cvt.u64.u32 %rd353, %r1108; 2026-02-21T08:34:17.8890959Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:34:17.8891278Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:34:17.8891620Z cvt.u64.u32 %rd356, %r1113; 2026-02-21T08:34:17.8891939Z cvt.u64.u32 %rd357, %r1114; 2026-02-21T08:34:17.8892272Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:34:17.8892601Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:34:17.8892930Z cvt.u64.u32 %rd360, %r1115; 2026-02-21T08:34:17.8893259Z cvt.u64.u32 %rd361, %r1116; 2026-02-21T08:34:17.8893580Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:34:17.8893907Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:34:17.8894237Z // begin inline asm 2026-02-21T08:34:17.8895183Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1118, %r1119, %r1120, %r1121, %r1122, %r1123, %r1124, %r1125, %r1126, %r1127, %r1128, %r1129, %r1130, %r1131, %r1132, %r1133}, [%r383 + 176]; 2026-02-21T08:34:17.8896165Z // end inline asm 2026-02-21T08:34:17.8896592Z cvt.u64.u32 %rd364, %r1122; 2026-02-21T08:34:17.8896935Z cvt.u64.u32 
%rd365, %r1123; 2026-02-21T08:34:17.8897257Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:34:17.8897591Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:34:17.8897915Z cvt.u64.u32 %rd368, %r1124; 2026-02-21T08:34:17.8898248Z cvt.u64.u32 %rd369, %r1125; 2026-02-21T08:34:17.8898572Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:34:17.8898898Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:34:17.8899226Z cvt.u64.u32 %rd372, %r1130; 2026-02-21T08:34:17.8899559Z cvt.u64.u32 %rd373, %r1131; 2026-02-21T08:34:17.8899973Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:34:17.8900303Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:34:17.8900645Z cvt.u64.u32 %rd376, %r1132; 2026-02-21T08:34:17.8900967Z cvt.u64.u32 %rd377, %r1133; 2026-02-21T08:34:17.8901287Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:34:17.8901613Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:34:17.8901959Z // begin inline asm 2026-02-21T08:34:17.8902875Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1135, %r1136, %r1137, %r1138, %r1139, %r1140, %r1141, %r1142, %r1143, %r1144, %r1145, %r1146, %r1147, %r1148, %r1149, %r1150}, [%r383 + 192]; 2026-02-21T08:34:17.8903874Z // end inline asm 2026-02-21T08:34:17.8904161Z cvt.u64.u32 %rd380, %r1139; 2026-02-21T08:34:17.8904480Z cvt.u64.u32 %rd381, %r1140; 2026-02-21T08:34:17.8904866Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:34:17.8905188Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:34:17.8905529Z cvt.u64.u32 %rd384, %r1141; 2026-02-21T08:34:17.8905851Z cvt.u64.u32 %rd385, %r1142; 2026-02-21T08:34:17.8906199Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:34:17.8906528Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:34:17.8906855Z cvt.u64.u32 %rd388, %r1147; 2026-02-21T08:34:17.8907189Z cvt.u64.u32 %rd389, %r1148; 2026-02-21T08:34:17.8907507Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:34:17.8907925Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:34:17.8908270Z cvt.u64.u32 %rd392, %r1149; 2026-02-21T08:34:17.8908603Z cvt.u64.u32 %rd393, %r1150; 2026-02-21T08:34:17.8908924Z shl.b64 %rd394, %rd393, 32; 
2026-02-21T08:34:17.8909251Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:34:17.8909581Z // begin inline asm 2026-02-21T08:34:17.8910481Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1152, %r1153, %r1154, %r1155, %r1156, %r1157, %r1158, %r1159, %r1160, %r1161, %r1162, %r1163, %r1164, %r1165, %r1166, %r1167}, [%r383 + 208]; 2026-02-21T08:34:17.8911464Z // end inline asm 2026-02-21T08:34:17.8911737Z cvt.u64.u32 %rd396, %r1156; 2026-02-21T08:34:17.8912062Z cvt.u64.u32 %rd397, %r1157; 2026-02-21T08:34:17.8912386Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:34:17.8912711Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:34:17.8913038Z cvt.u64.u32 %rd400, %r1158; 2026-02-21T08:34:17.8913365Z cvt.u64.u32 %rd401, %r1159; 2026-02-21T08:34:17.8913689Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:34:17.8914015Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:34:17.8914359Z cvt.u64.u32 %rd404, %r1164; 2026-02-21T08:34:17.8914780Z cvt.u64.u32 %rd405, %r1165; 2026-02-21T08:34:17.8915218Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:34:17.8915540Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:34:17.8915879Z cvt.u64.u32 %rd408, %r1166; 2026-02-21T08:34:17.8916198Z cvt.u64.u32 %rd409, %r1167; 2026-02-21T08:34:17.8916531Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:34:17.8916849Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:34:17.8917186Z // begin inline asm 2026-02-21T08:34:17.8918086Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1169, %r1170, %r1171, %r1172, %r1173, %r1174, %r1175, %r1176, %r1177, %r1178, %r1179, %r1180, %r1181, %r1182, %r1183, %r1184}, [%r383 + 224]; 2026-02-21T08:34:17.8919067Z // end inline asm 2026-02-21T08:34:17.8919360Z cvt.u64.u32 %rd412, %r1173; 2026-02-21T08:34:17.8919677Z cvt.u64.u32 %rd413, %r1174; 2026-02-21T08:34:17.8920014Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:34:17.8920335Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:34:17.8920678Z cvt.u64.u32 %rd416, %r1175; 2026-02-21T08:34:17.8921083Z cvt.u64.u32 %rd417, %r1176; 2026-02-21T08:34:17.8921424Z shl.b64 %rd418, %rd417, 
32; 2026-02-21T08:34:17.8921756Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:34:17.8922084Z cvt.u64.u32 %rd420, %r1181; 2026-02-21T08:34:17.8922420Z cvt.u64.u32 %rd421, %r1182; 2026-02-21T08:34:17.8922743Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:34:17.8923071Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:34:17.8923397Z cvt.u64.u32 %rd424, %r1183; 2026-02-21T08:34:17.8923729Z cvt.u64.u32 %rd425, %r1184; 2026-02-21T08:34:17.8924053Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:34:17.8924466Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:34:17.8924842Z // begin inline asm 2026-02-21T08:34:17.8925740Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1186, %r1187, %r1188, %r1189, %r1190, %r1191, %r1192, %r1193, %r1194, %r1195, %r1196, %r1197, %r1198, %r1199, %r1200, %r1201}, [%r383 + 240]; 2026-02-21T08:34:17.8926727Z // end inline asm 2026-02-21T08:34:17.8927001Z cvt.u64.u32 %rd428, %r1190; 2026-02-21T08:34:17.8927335Z cvt.u64.u32 %rd429, %r1191; 2026-02-21T08:34:17.8927655Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:34:17.8927985Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:34:17.8928315Z cvt.u64.u32 %rd432, %r1192; 2026-02-21T08:34:17.8928647Z cvt.u64.u32 %rd433, %r1193; 2026-02-21T08:34:17.8928979Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:34:17.8929299Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:34:17.8929638Z cvt.u64.u32 %rd436, %r1198; 2026-02-21T08:34:17.8929958Z cvt.u64.u32 %rd437, %r1199; 2026-02-21T08:34:17.8930282Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:34:17.8930612Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:34:17.8930948Z cvt.u64.u32 %rd440, %r1200; 2026-02-21T08:34:17.8931267Z cvt.u64.u32 %rd441, %r1201; 2026-02-21T08:34:17.8931591Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:34:17.8931917Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:34:17.8932342Z // begin inline asm 2026-02-21T08:34:17.8933243Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1203, %r1204, %r1205, %r1206, %r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215, %r1216, %r1217, 
%r1218}, [%r383 + 256]; 2026-02-21T08:34:17.8934215Z // end inline asm 2026-02-21T08:34:17.8934506Z cvt.u64.u32 %rd444, %r1207; 2026-02-21T08:34:17.8934894Z cvt.u64.u32 %rd445, %r1208; 2026-02-21T08:34:17.8935227Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:34:17.8935547Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:34:17.8935889Z cvt.u64.u32 %rd448, %r1209; 2026-02-21T08:34:17.8936210Z cvt.u64.u32 %rd449, %r1210; 2026-02-21T08:34:17.8936542Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:34:17.8936869Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:34:17.8937204Z cvt.u64.u32 %rd452, %r1215; 2026-02-21T08:34:17.8937537Z cvt.u64.u32 %rd453, %r1216; 2026-02-21T08:34:17.8937858Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:34:17.8938184Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:34:17.8938521Z cvt.u64.u32 %rd456, %r1217; 2026-02-21T08:34:17.8938855Z cvt.u64.u32 %rd457, %r1218; 2026-02-21T08:34:17.8939177Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:34:17.8939596Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:34:17.8939941Z // begin inline asm 2026-02-21T08:34:17.8940818Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1220, %r1221, %r1222, %r1223, %r1224, %r1225, %r1226, %r1227, %r1228, %r1229, %r1230, %r1231, %r1232, %r1233, %r1234, %r1235}, [%r383 + 272]; 2026-02-21T08:34:17.8941783Z // end inline asm 2026-02-21T08:34:17.8942054Z cvt.u64.u32 %rd460, %r1224; 2026-02-21T08:34:17.8942382Z cvt.u64.u32 %rd461, %r1225; 2026-02-21T08:34:17.8942702Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:34:17.8943032Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:34:17.8943358Z cvt.u64.u32 %rd464, %r1226; 2026-02-21T08:34:17.8943682Z cvt.u64.u32 %rd465, %r1227; 2026-02-21T08:34:17.8944011Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:34:17.8944328Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:34:17.8944716Z cvt.u64.u32 %rd468, %r1232; 2026-02-21T08:34:17.8945042Z cvt.u64.u32 %rd469, %r1233; 2026-02-21T08:34:17.8945450Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:34:17.8945783Z or.b64 %rd471, %rd468, %rd470; 
2026-02-21T08:34:17.8946120Z cvt.u64.u32 %rd472, %r1234; 2026-02-21T08:34:17.8946437Z cvt.u64.u32 %rd473, %r1235; 2026-02-21T08:34:17.8946763Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:34:17.8947087Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:34:17.8947425Z // begin inline asm 2026-02-21T08:34:17.8948305Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1237, %r1238, %r1239, %r1240, %r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249, %r1250, %r1251, %r1252}, [%r383 + 288]; 2026-02-21T08:34:17.8949344Z // end inline asm 2026-02-21T08:34:17.8949627Z cvt.u64.u32 %rd476, %r1241; 2026-02-21T08:34:17.8949947Z cvt.u64.u32 %rd477, %r1242; 2026-02-21T08:34:17.8950272Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:34:17.8950587Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:34:17.8950931Z cvt.u64.u32 %rd480, %r1243; 2026-02-21T08:34:17.8951256Z cvt.u64.u32 %rd481, %r1244; 2026-02-21T08:34:17.8951575Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:34:17.8951907Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:34:17.8952233Z cvt.u64.u32 %rd484, %r1249; 2026-02-21T08:34:17.8952561Z cvt.u64.u32 %rd485, %r1250; 2026-02-21T08:34:17.8952878Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:34:17.8953208Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:34:17.8953536Z cvt.u64.u32 %rd488, %r1251; 2026-02-21T08:34:17.8953855Z cvt.u64.u32 %rd489, %r1252; 2026-02-21T08:34:17.8954171Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:34:17.8954504Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:34:17.8954904Z // begin inline asm 2026-02-21T08:34:17.8955785Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1254, %r1255, %r1256, %r1257, %r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266, %r1267, %r1268, %r1269}, [%r383 + 304]; 2026-02-21T08:34:17.8956783Z // end inline asm 2026-02-21T08:34:17.8957140Z cvt.u64.u32 %rd492, %r1258; 2026-02-21T08:34:17.8957480Z cvt.u64.u32 %rd493, %r1259; 2026-02-21T08:34:17.8957799Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:34:17.8958131Z or.b64 %rd495, %rd492, 
%rd494; 2026-02-21T08:34:17.8958459Z cvt.u64.u32 %rd496, %r1260; 2026-02-21T08:34:17.8958785Z cvt.u64.u32 %rd497, %r1261; 2026-02-21T08:34:17.8959115Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:34:17.8959436Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:34:17.8959779Z cvt.u64.u32 %rd500, %r1266; 2026-02-21T08:34:17.8960099Z cvt.u64.u32 %rd501, %r1267; 2026-02-21T08:34:17.8960427Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:34:17.8960755Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:34:17.8961094Z cvt.u64.u32 %rd504, %r1268; 2026-02-21T08:34:17.8961412Z cvt.u64.u32 %rd505, %r1269; 2026-02-21T08:34:17.8961738Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:34:17.8962059Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:34:17.8962390Z // begin inline asm 2026-02-21T08:34:17.8963282Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1271, %r1272, %r1273, %r1274, %r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283, %r1284, %r1285, %r1286}, [%r383 + 320]; 2026-02-21T08:34:17.8964339Z // end inline asm 2026-02-21T08:34:17.8964636Z cvt.u64.u32 %rd508, %r1275; 2026-02-21T08:34:17.8965012Z cvt.u64.u32 %rd509, %r1276; 2026-02-21T08:34:17.8965346Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:34:17.8965668Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:34:17.8966003Z cvt.u64.u32 %rd512, %r1277; 2026-02-21T08:34:17.8966331Z cvt.u64.u32 %rd513, %r1278; 2026-02-21T08:34:17.8966652Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:34:17.8966985Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:34:17.8967313Z cvt.u64.u32 %rd516, %r1283; 2026-02-21T08:34:17.8967639Z cvt.u64.u32 %rd517, %r1284; 2026-02-21T08:34:17.8967956Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:34:17.8968284Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:34:17.8968615Z cvt.u64.u32 %rd520, %r1285; 2026-02-21T08:34:17.8968939Z cvt.u64.u32 %rd521, %r1286; 2026-02-21T08:34:17.8969261Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:34:17.8969669Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:34:17.8970012Z // begin inline asm 
2026-02-21T08:34:17.8970902Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1288, %r1289, %r1290, %r1291, %r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300, %r1301, %r1302, %r1303}, [%r383 + 336]; 2026-02-21T08:34:17.8971876Z // end inline asm 2026-02-21T08:34:17.8972155Z cvt.u64.u32 %rd524, %r1292; 2026-02-21T08:34:17.8972489Z cvt.u64.u32 %rd525, %r1293; 2026-02-21T08:34:17.8972813Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:34:17.8973231Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:34:17.8973570Z cvt.u64.u32 %rd528, %r1294; 2026-02-21T08:34:17.8973886Z cvt.u64.u32 %rd529, %r1295; 2026-02-21T08:34:17.8974212Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:34:17.8974523Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:34:17.8974918Z cvt.u64.u32 %rd532, %r1300; 2026-02-21T08:34:17.8975246Z cvt.u64.u32 %rd533, %r1301; 2026-02-21T08:34:17.8975576Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:34:17.8975896Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:34:17.8976232Z cvt.u64.u32 %rd536, %r1302; 2026-02-21T08:34:17.8976550Z cvt.u64.u32 %rd537, %r1303; 2026-02-21T08:34:17.8976872Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:34:17.8977202Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:34:17.8977529Z // begin inline asm 2026-02-21T08:34:17.8978420Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1305, %r1306, %r1307, %r1308, %r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317, %r1318, %r1319, %r1320}, [%r383 + 352]; 2026-02-21T08:34:17.8979389Z // end inline asm 2026-02-21T08:34:17.8979676Z cvt.u64.u32 %rd540, %r1309; 2026-02-21T08:34:17.8979993Z cvt.u64.u32 %rd541, %r1310; 2026-02-21T08:34:17.8980329Z shl.b64 %rd542, %rd541, 32; 2026-02-21T08:34:17.8980652Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:34:17.8980993Z cvt.u64.u32 %rd544, %r1311; 2026-02-21T08:34:17.8981401Z cvt.u64.u32 %rd545, %r1312; 2026-02-21T08:34:17.8981726Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:34:17.8982059Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:34:17.8982386Z cvt.u64.u32 
%rd548, %r1317; 2026-02-21T08:34:17.8982720Z cvt.u64.u32 %rd549, %r1318; 2026-02-21T08:34:17.8983033Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:34:17.8983359Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:34:17.8983684Z cvt.u64.u32 %rd552, %r1319; 2026-02-21T08:34:17.8984009Z cvt.u64.u32 %rd553, %r1320; 2026-02-21T08:34:17.8984335Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:34:17.8984655Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:34:17.8985081Z // begin inline asm 2026-02-21T08:34:17.8985978Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1322, %r1323, %r1324, %r1325, %r1326, %r1327, %r1328, %r1329, %r1330, %r1331, %r1332, %r1333, %r1334, %r1335, %r1336, %r1337}, [%r383 + 368]; 2026-02-21T08:34:17.8986956Z // end inline asm 2026-02-21T08:34:17.8987232Z cvt.u64.u32 %rd556, %r1326; 2026-02-21T08:34:17.8987564Z cvt.u64.u32 %rd557, %r1327; 2026-02-21T08:34:17.8987885Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:34:17.8988219Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:34:17.8988679Z cvt.u64.u32 %rd560, %r1328; 2026-02-21T08:34:17.8989001Z cvt.u64.u32 %rd561, %r1329; 2026-02-21T08:34:17.8989327Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:34:17.8989641Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:34:17.8989976Z cvt.u64.u32 %rd564, %r1334; 2026-02-21T08:34:17.8990293Z cvt.u64.u32 %rd565, %r1335; 2026-02-21T08:34:17.8990621Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:34:17.8990937Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:34:17.8991277Z cvt.u64.u32 %rd568, %r1336; 2026-02-21T08:34:17.8991599Z cvt.u64.u32 %rd569, %r1337; 2026-02-21T08:34:17.8991921Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:34:17.8992248Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:34:17.8992572Z // begin inline asm 2026-02-21T08:34:17.8993527Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1339, %r1340, %r1341, %r1342, %r1343, %r1344, %r1345, %r1346, %r1347, %r1348, %r1349, %r1350, %r1351, %r1352, %r1353, %r1354}, [%r383 + 384]; 2026-02-21T08:34:17.8994501Z // end inline asm 2026-02-21T08:34:17.8994840Z 
cvt.u64.u32 %rd572, %r1343; 2026-02-21T08:34:17.8995157Z cvt.u64.u32 %rd573, %r1344; 2026-02-21T08:34:17.8995490Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:34:17.8995817Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:34:17.8996147Z cvt.u64.u32 %rd576, %r1345; 2026-02-21T08:34:17.8996470Z cvt.u64.u32 %rd577, %r1346; 2026-02-21T08:34:17.8996787Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:34:17.8997118Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:34:17.8997530Z cvt.u64.u32 %rd580, %r1351; 2026-02-21T08:34:17.8997856Z cvt.u64.u32 %rd581, %r1352; 2026-02-21T08:34:17.8998171Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:34:17.8998504Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:34:17.8998829Z cvt.u64.u32 %rd584, %r1353; 2026-02-21T08:34:17.8999162Z cvt.u64.u32 %rd585, %r1354; 2026-02-21T08:34:17.8999492Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:34:17.8999819Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:34:17.9000156Z // begin inline asm 2026-02-21T08:34:17.9001037Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1356, %r1357, %r1358, %r1359, %r1360, %r1361, %r1362, %r1363, %r1364, %r1365, %r1366, %r1367, %r1368, %r1369, %r1370, %r1371}, [%r383 + 400]; 2026-02-21T08:34:17.9002012Z // end inline asm 2026-02-21T08:34:17.9002285Z cvt.u64.u32 %rd588, %r1360; 2026-02-21T08:34:17.9002612Z cvt.u64.u32 %rd589, %r1361; 2026-02-21T08:34:17.9002936Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:34:17.9003257Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:34:17.9003602Z cvt.u64.u32 %rd592, %r1362; 2026-02-21T08:34:17.9003923Z cvt.u64.u32 %rd593, %r1363; 2026-02-21T08:34:17.9004255Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:34:17.9004577Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:34:17.9004995Z cvt.u64.u32 %rd596, %r1368; 2026-02-21T08:34:17.9005318Z cvt.u64.u32 %rd597, %r1369; 2026-02-21T08:34:17.9005741Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:34:17.9006067Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:34:17.9006403Z cvt.u64.u32 %rd600, %r1370; 2026-02-21T08:34:17.9006734Z cvt.u64.u32 
%rd601, %r1371; 2026-02-21T08:34:17.9007050Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:34:17.9007384Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:34:17.9007709Z // begin inline asm 2026-02-21T08:34:17.9008600Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1373, %r1374, %r1375, %r1376, %r1377, %r1378, %r1379, %r1380, %r1381, %r1382, %r1383, %r1384, %r1385, %r1386, %r1387, %r1388}, [%r383 + 416]; 2026-02-21T08:34:17.9009565Z // end inline asm 2026-02-21T08:34:17.9009852Z cvt.u64.u32 %rd604, %r1375; 2026-02-21T08:34:17.9010170Z cvt.u64.u32 %rd605, %r1376; 2026-02-21T08:34:17.9010496Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:34:17.9010825Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:34:17.9011154Z cvt.u64.u32 %rd608, %r1377; 2026-02-21T08:34:17.9011486Z cvt.u64.u32 %rd609, %r1378; 2026-02-21T08:34:17.9011807Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:34:17.9012140Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:34:17.9012554Z cvt.u64.u32 %rd612, %r1379; 2026-02-21T08:34:17.9012884Z cvt.u64.u32 %rd613, %r1380; 2026-02-21T08:34:17.9013202Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:34:17.9013529Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:34:17.9013869Z cvt.u64.u32 %rd616, %r1383; 2026-02-21T08:34:17.9014188Z cvt.u64.u32 %rd617, %r1384; 2026-02-21T08:34:17.9014512Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:34:17.9014897Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:34:17.9015236Z cvt.u64.u32 %rd620, %r1385; 2026-02-21T08:34:17.9015559Z cvt.u64.u32 %rd621, %r1386; 2026-02-21T08:34:17.9015888Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:34:17.9016208Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:34:17.9016547Z cvt.u64.u32 %rd624, %r1387; 2026-02-21T08:34:17.9016864Z cvt.u64.u32 %rd625, %r1388; 2026-02-21T08:34:17.9017193Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:34:17.9017519Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:34:17.9017845Z // begin inline asm 2026-02-21T08:34:17.9018805Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1390, %r1391, %r1392, %r1393, %r1394, %r1395, 
%r1396, %r1397, %r1398, %r1399, %r1400, %r1401, %r1402, %r1403, %r1404, %r1405}, [%r383 + 432]; 2026-02-21T08:34:17.9019784Z // end inline asm 2026-02-21T08:34:17.9020079Z cvt.u64.u32 %rd628, %r1392; 2026-02-21T08:34:17.9020396Z cvt.u64.u32 %rd629, %r1393; 2026-02-21T08:34:17.9020723Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:34:17.9021042Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:34:17.9021379Z cvt.u64.u32 %rd632, %r1394; 2026-02-21T08:34:17.9021784Z cvt.u64.u32 %rd633, %r1395; 2026-02-21T08:34:17.9022099Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:34:17.9022429Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:34:17.9022760Z cvt.u64.u32 %rd636, %r1396; 2026-02-21T08:34:17.9023086Z cvt.u64.u32 %rd637, %r1397; 2026-02-21T08:34:17.9023403Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:34:17.9023735Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:34:17.9024064Z cvt.u64.u32 %rd640, %r1402; 2026-02-21T08:34:17.9024395Z cvt.u64.u32 %rd641, %r1403; 2026-02-21T08:34:17.9024764Z shl.b64 %rd642, %rd641, 32; 2026-02-21T08:34:17.9025091Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T08:34:17.9025427Z cvt.u64.u32 %rd644, %r1404; 2026-02-21T08:34:17.9025746Z cvt.u64.u32 %rd645, %r1405; 2026-02-21T08:34:17.9026068Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:34:17.9026391Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T08:34:17.9026725Z // begin inline asm 2026-02-21T08:34:17.9027598Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1407, %r1408, %r1409, %r1410, %r1411, %r1412, %r1413, %r1414, %r1415, %r1416, %r1417, %r1418, %r1419, %r1420, %r1421, %r1422}, [%r383 + 448]; 2026-02-21T08:34:17.9028571Z // end inline asm 2026-02-21T08:34:17.9028857Z cvt.u64.u32 %rd648, %r1409; 2026-02-21T08:34:17.9029173Z cvt.u64.u32 %rd649, %r1410; 2026-02-21T08:34:17.9029503Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:34:17.9029898Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:34:17.9030253Z cvt.u64.u32 %rd652, %r1411; 2026-02-21T08:34:17.9030568Z cvt.u64.u32 %rd653, %r1412; 2026-02-21T08:34:17.9030896Z shl.b64 %rd654, 
%rd653, 32; 2026-02-21T08:34:17.9031215Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T08:34:17.9031550Z cvt.u64.u32 %rd656, %r1413; 2026-02-21T08:34:17.9031868Z cvt.u64.u32 %rd657, %r1414; 2026-02-21T08:34:17.9032197Z shl.b64 %rd658, %rd657, 32; 2026-02-21T08:34:17.9032526Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T08:34:17.9032855Z cvt.u64.u32 %rd660, %r1417; 2026-02-21T08:34:17.9033178Z cvt.u64.u32 %rd661, %r1418; 2026-02-21T08:34:17.9033494Z shl.b64 %rd662, %rd661, 32; 2026-02-21T08:34:17.9033820Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T08:34:17.9034148Z cvt.u64.u32 %rd664, %r1419; 2026-02-21T08:34:17.9034473Z cvt.u64.u32 %rd665, %r1420; 2026-02-21T08:34:17.9034839Z shl.b64 %rd666, %rd665, 32; 2026-02-21T08:34:17.9035168Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T08:34:17.9035511Z cvt.u64.u32 %rd668, %r1421; 2026-02-21T08:34:17.9035832Z cvt.u64.u32 %rd669, %r1422; 2026-02-21T08:34:17.9036161Z shl.b64 %rd670, %rd669, 32; 2026-02-21T08:34:17.9036562Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T08:34:17.9036897Z // begin inline asm 2026-02-21T08:34:17.9037774Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1424, %r1425, %r1426, %r1427, %r1428, %r1429, %r1430, %r1431, %r1432, %r1433, %r1434, %r1435, %r1436, %r1437, %r1438, %r1439}, [%r383 + 464]; 2026-02-21T08:34:17.9038740Z // end inline asm 2026-02-21T08:34:17.9039015Z cvt.u64.u32 %rd672, %r1426; 2026-02-21T08:34:17.9039340Z cvt.u64.u32 %rd673, %r1427; 2026-02-21T08:34:17.9039672Z shl.b64 %rd674, %rd673, 32; 2026-02-21T08:34:17.9039989Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T08:34:17.9040329Z cvt.u64.u32 %rd676, %r1428; 2026-02-21T08:34:17.9040647Z cvt.u64.u32 %rd677, %r1429; 2026-02-21T08:34:17.9040976Z shl.b64 %rd678, %rd677, 32; 2026-02-21T08:34:17.9041295Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T08:34:17.9041630Z cvt.u64.u32 %rd680, %r1430; 2026-02-21T08:34:17.9042015Z cvt.u64.u32 %rd681, %r1431; 2026-02-21T08:34:17.9042349Z shl.b64 %rd682, %rd681, 32; 2026-02-21T08:34:17.9042675Z or.b64 %rd683, %rd680, %rd682; 
2026-02-21T08:34:17.9043016Z cvt.u64.u32 %rd684, %r1434; 2026-02-21T08:34:17.9043347Z cvt.u64.u32 %rd685, %r1435; 2026-02-21T08:34:17.9043666Z shl.b64 %rd686, %rd685, 32; 2026-02-21T08:34:17.9043997Z or.b64 %rd687, %rd684, %rd686; 2026-02-21T08:34:17.9044329Z cvt.u64.u32 %rd688, %r1436; 2026-02-21T08:34:17.9044656Z cvt.u64.u32 %rd689, %r1437; 2026-02-21T08:34:17.9045037Z shl.b64 %rd690, %rd689, 32; 2026-02-21T08:34:17.9045439Z or.b64 %rd691, %rd688, %rd690; 2026-02-21T08:34:17.9045770Z cvt.u64.u32 %rd692, %r1438; 2026-02-21T08:34:17.9046097Z cvt.u64.u32 %rd693, %r1439; 2026-02-21T08:34:17.9046541Z shl.b64 %rd694, %rd693, 32; 2026-02-21T08:34:17.9046942Z or.b64 %rd695, %rd692, %rd694; 2026-02-21T08:34:17.9047431Z // begin inline asm 2026-02-21T08:34:17.9048512Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1441, %r1442, %r1443, %r1444, %r1445, %r1446, %r1447, %r1448, %r1449, %r1450, %r1451, %r1452, %r1453, %r1454, %r1455, %r1456}, [%r383 + 480]; 2026-02-21T08:34:17.9049512Z // end inline asm 2026-02-21T08:34:17.9049828Z cvt.u64.u32 %rd696, %r1443; 2026-02-21T08:34:17.9050222Z cvt.u64.u32 %rd697, %r1444; 2026-02-21T08:34:17.9050580Z shl.b64 %rd698, %rd697, 32; 2026-02-21T08:34:17.9050972Z or.b64 %rd699, %rd696, %rd698; 2026-02-21T08:34:17.9051354Z cvt.u64.u32 %rd700, %r1445; 2026-02-21T08:34:17.9051712Z cvt.u64.u32 %rd701, %r1446; 2026-02-21T08:34:17.9052062Z shl.b64 %rd702, %rd701, 32; 2026-02-21T08:34:17.9052390Z or.b64 %rd703, %rd700, %rd702; 2026-02-21T08:34:17.9052755Z cvt.u64.u32 %rd704, %r1447; 2026-02-21T08:34:17.9053078Z cvt.u64.u32 %rd705, %r1448; 2026-02-21T08:34:17.9053427Z shl.b64 %rd706, %rd705, 32; 2026-02-21T08:34:17.9053764Z or.b64 %rd707, %rd704, %rd706; 2026-02-21T08:34:17.9054126Z cvt.u64.u32 %rd708, %r1451; 2026-02-21T08:34:17.9054541Z cvt.u64.u32 %rd709, %r1452; 2026-02-21T08:34:17.9054955Z shl.b64 %rd710, %rd709, 32; 2026-02-21T08:34:17.9055297Z or.b64 %rd711, %rd708, %rd710; 2026-02-21T08:34:17.9055641Z cvt.u64.u32 %rd712, %r1453; 
2026-02-21T08:34:17.9055999Z cvt.u64.u32 %rd713, %r1454; 2026-02-21T08:34:17.9056344Z shl.b64 %rd714, %rd713, 32; 2026-02-21T08:34:17.9056689Z or.b64 %rd715, %rd712, %rd714; 2026-02-21T08:34:17.9057038Z cvt.u64.u32 %rd716, %r1455; 2026-02-21T08:34:17.9057390Z cvt.u64.u32 %rd717, %r1456; 2026-02-21T08:34:17.9057730Z shl.b64 %rd718, %rd717, 32; 2026-02-21T08:34:17.9058070Z or.b64 %rd719, %rd716, %rd718; 2026-02-21T08:34:17.9058432Z // begin inline asm 2026-02-21T08:34:17.9059356Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1458, %r1459, %r1460, %r1461, %r1462, %r1463, %r1464, %r1465, %r1466, %r1467, %r1468, %r1469, %r1470, %r1471, %r1472, %r1473}, [%r383 + 496]; 2026-02-21T08:34:17.9060364Z // end inline asm 2026-02-21T08:34:17.9060667Z cvt.u64.u32 %rd720, %r1460; 2026-02-21T08:34:17.9061019Z cvt.u64.u32 %rd721, %r1461; 2026-02-21T08:34:17.9061363Z shl.b64 %rd722, %rd721, 32; 2026-02-21T08:34:17.9061811Z or.b64 %rd723, %rd720, %rd722; 2026-02-21T08:34:17.9062164Z cvt.u64.u32 %rd724, %r1462; 2026-02-21T08:34:17.9062491Z cvt.u64.u32 %rd725, %r1463; 2026-02-21T08:34:17.9062813Z shl.b64 %rd726, %rd725, 32; 2026-02-21T08:34:17.9063132Z or.b64 %rd727, %rd724, %rd726; 2026-02-21T08:34:17.9063503Z cvt.u64.u32 %rd728, %r1464; 2026-02-21T08:34:17.9063819Z cvt.u64.u32 %rd729, %r1465; 2026-02-21T08:34:17.9064143Z shl.b64 %rd730, %rd729, 32; 2026-02-21T08:34:17.9064459Z or.b64 %rd731, %rd728, %rd730; 2026-02-21T08:34:17.9064867Z cvt.u64.u32 %rd732, %r1468; 2026-02-21T08:34:17.9065183Z cvt.u64.u32 %rd733, %r1469; 2026-02-21T08:34:17.9065506Z shl.b64 %rd734, %rd733, 32; 2026-02-21T08:34:17.9065818Z or.b64 %rd735, %rd732, %rd734; 2026-02-21T08:34:17.9066151Z cvt.u64.u32 %rd736, %r1470; 2026-02-21T08:34:17.9066469Z cvt.u64.u32 %rd737, %r1471; 2026-02-21T08:34:17.9066784Z shl.b64 %rd738, %rd737, 32; 2026-02-21T08:34:17.9067191Z or.b64 %rd739, %rd736, %rd738; 2026-02-21T08:34:17.9067533Z cvt.u64.u32 %rd740, %r1472; 2026-02-21T08:34:17.9067863Z cvt.u64.u32 %rd741, %r1473; 
2026-02-21T08:34:17.9068172Z shl.b64 %rd742, %rd741, 32; 2026-02-21T08:34:17.9068502Z or.b64 %rd743, %rd740, %rd742; 2026-02-21T08:34:17.9068835Z // begin inline asm 2026-02-21T08:34:17.9069160Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:34:17.9069507Z // end inline asm 2026-02-21T08:34:17.9069784Z cvt.u64.u32 %rd744, %r931; 2026-02-21T08:34:17.9070120Z cvt.u64.u32 %rd745, %r932; 2026-02-21T08:34:17.9070435Z shl.b64 %rd746, %rd745, 32; 2026-02-21T08:34:17.9070837Z or.b64 %rd747, %rd744, %rd746; 2026-02-21T08:34:17.9071455Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9072123Z mov.b64 {%r2161, %r2162}, %rd747; 2026-02-21T08:34:17.9072504Z cvt.rn.f16x2.f32 %r2163, %r2162, %r2161; 2026-02-21T08:34:17.9073165Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9073804Z cvt.u64.u32 %rd748, %r933; 2026-02-21T08:34:17.9074125Z cvt.u64.u32 %rd749, %r934; 2026-02-21T08:34:17.9074450Z shl.b64 %rd750, %rd749, 32; 2026-02-21T08:34:17.9074819Z or.b64 %rd751, %rd748, %rd750; 2026-02-21T08:34:17.9075429Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9076075Z mov.b64 {%r2164, %r2165}, %rd751; 2026-02-21T08:34:17.9076455Z cvt.rn.f16x2.f32 %r2166, %r2165, %r2164; 2026-02-21T08:34:17.9076845Z mov.b64 {%r2167, %r2168}, %rd191; 2026-02-21T08:34:17.9077220Z cvt.rn.f16x2.f32 %r2169, %r2168, %r2167; 2026-02-21T08:34:17.9077603Z mov.b64 {%r2170, %r2171}, %rd195; 2026-02-21T08:34:17.9077965Z cvt.rn.f16x2.f32 %r2172, %r2171, %r2170; 2026-02-21T08:34:17.9078744Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9079392Z cvt.u64.u32 %rd752, %r939; 2026-02-21T08:34:17.9079720Z cvt.u64.u32 %rd753, %r940; 2026-02-21T08:34:17.9080036Z shl.b64 %rd754, %rd753, 32; 2026-02-21T08:34:17.9080355Z or.b64 %rd755, %rd752, %rd754; 2026-02-21T08:34:17.9080929Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9081550Z mov.b64 {%r2173, %r2174}, %rd755; 2026-02-21T08:34:17.9081907Z cvt.rn.f16x2.f32 %r2175, %r2174, %r2173; 2026-02-21T08:34:17.9082504Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9083124Z cvt.u64.u32 %rd756, %r941; 2026-02-21T08:34:17.9083431Z cvt.u64.u32 %rd757, %r942; 2026-02-21T08:34:17.9083746Z shl.b64 %rd758, %rd757, 32; 2026-02-21T08:34:17.9084059Z or.b64 %rd759, %rd756, %rd758; 2026-02-21T08:34:17.9084635Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9085329Z mov.b64 {%r2176, %r2177}, %rd759; 2026-02-21T08:34:17.9085685Z cvt.rn.f16x2.f32 %r2178, %r2177, %r2176; 2026-02-21T08:34:17.9086144Z mov.b64 {%r2179, %r2180}, %rd199; 2026-02-21T08:34:17.9086495Z cvt.rn.f16x2.f32 %r2181, %r2180, %r2179; 2026-02-21T08:34:17.9086858Z mov.b64 {%r2182, %r2183}, %rd203; 2026-02-21T08:34:17.9087200Z cvt.rn.f16x2.f32 %r2184, %r2183, %r2182; 2026-02-21T08:34:17.9087812Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9088427Z cvt.u64.u32 %rd760, %r948; 2026-02-21T08:34:17.9088737Z cvt.u64.u32 %rd761, %r949; 2026-02-21T08:34:17.9089059Z shl.b64 %rd762, %rd761, 32; 2026-02-21T08:34:17.9089372Z or.b64 %rd763, %rd760, %rd762; 2026-02-21T08:34:17.9089948Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9090564Z mov.b64 {%r2185, %r2186}, %rd763; 2026-02-21T08:34:17.9090926Z cvt.rn.f16x2.f32 %r2187, %r2186, %r2185; 2026-02-21T08:34:17.9091601Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9092233Z cvt.u64.u32 %rd764, %r950; 2026-02-21T08:34:17.9092550Z cvt.u64.u32 %rd765, %r951; 2026-02-21T08:34:17.9092864Z shl.b64 %rd766, %rd765, 32; 2026-02-21T08:34:17.9093187Z or.b64 %rd767, %rd764, %rd766; 2026-02-21T08:34:17.9093755Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9094375Z mov.b64 {%r2188, %r2189}, %rd767; 2026-02-21T08:34:17.9094765Z cvt.rn.f16x2.f32 %r2190, %r2189, %r2188; 2026-02-21T08:34:17.9095262Z mov.b64 {%r2191, %r2192}, %rd207; 2026-02-21T08:34:17.9095609Z cvt.rn.f16x2.f32 %r2193, %r2192, %r2191; 2026-02-21T08:34:17.9095977Z mov.b64 {%r2194, %r2195}, %rd211; 2026-02-21T08:34:17.9096332Z cvt.rn.f16x2.f32 %r2196, %r2195, %r2194; 2026-02-21T08:34:17.9096942Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9097568Z cvt.u64.u32 %rd768, %r956; 2026-02-21T08:34:17.9097878Z cvt.u64.u32 %rd769, %r957; 2026-02-21T08:34:17.9098202Z shl.b64 %rd770, %rd769, 32; 2026-02-21T08:34:17.9098509Z or.b64 %rd771, %rd768, %rd770; 2026-02-21T08:34:17.9099083Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9099706Z mov.b64 {%r2197, %r2198}, %rd771; 2026-02-21T08:34:17.9100055Z cvt.rn.f16x2.f32 %r2199, %r2198, %r2197; 2026-02-21T08:34:17.9100671Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9101284Z cvt.u64.u32 %rd772, %r958; 2026-02-21T08:34:17.9101653Z cvt.u64.u32 %rd773, %r959; 2026-02-21T08:34:17.9101967Z shl.b64 %rd774, %rd773, 32; 2026-02-21T08:34:17.9102297Z or.b64 %rd775, %rd772, %rd774; 2026-02-21T08:34:17.9102942Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9103567Z mov.b64 {%r2200, %r2201}, %rd775; 2026-02-21T08:34:17.9103931Z cvt.rn.f16x2.f32 %r2202, %r2201, %r2200; 2026-02-21T08:34:17.9104302Z mov.b64 {%r2203, %r2204}, %rd215; 2026-02-21T08:34:17.9104660Z cvt.rn.f16x2.f32 %r2205, %r2204, %r2203; 2026-02-21T08:34:17.9105066Z mov.b64 {%r2206, %r2207}, %rd219; 2026-02-21T08:34:17.9105422Z cvt.rn.f16x2.f32 %r2208, %r2207, %r2206; 2026-02-21T08:34:17.9106020Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9106640Z cvt.u64.u32 %rd776, %r965; 2026-02-21T08:34:17.9106962Z cvt.u64.u32 %rd777, %r966; 2026-02-21T08:34:17.9107273Z shl.b64 %rd778, %rd777, 32; 2026-02-21T08:34:17.9107594Z or.b64 %rd779, %rd776, %rd778; 2026-02-21T08:34:17.9108161Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9108781Z mov.b64 {%r2209, %r2210}, %rd779; 2026-02-21T08:34:17.9109132Z cvt.rn.f16x2.f32 %r2211, %r2210, %r2209; 2026-02-21T08:34:17.9109746Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9110453Z cvt.u64.u32 %rd780, %r967; 2026-02-21T08:34:17.9110770Z cvt.u64.u32 %rd781, %r968; 2026-02-21T08:34:17.9111084Z shl.b64 %rd782, %rd781, 32; 2026-02-21T08:34:17.9111395Z or.b64 %rd783, %rd780, %rd782; 2026-02-21T08:34:17.9111970Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9112657Z mov.b64 {%r2212, %r2213}, %rd783; 2026-02-21T08:34:17.9113126Z cvt.rn.f16x2.f32 %r2214, %r2213, %r2212; 2026-02-21T08:34:17.9113543Z mov.b64 {%r2215, %r2216}, %rd223; 2026-02-21T08:34:17.9113910Z cvt.rn.f16x2.f32 %r2217, %r2216, %r2215; 2026-02-21T08:34:17.9114291Z mov.b64 {%r2218, %r2219}, %rd227; 2026-02-21T08:34:17.9114645Z cvt.rn.f16x2.f32 %r2220, %r2219, %r2218; 2026-02-21T08:34:17.9115361Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9116064Z cvt.u64.u32 %rd784, %r973; 2026-02-21T08:34:17.9116404Z cvt.u64.u32 %rd785, %r974; 2026-02-21T08:34:17.9116723Z shl.b64 %rd786, %rd785, 32; 2026-02-21T08:34:17.9117053Z or.b64 %rd787, %rd784, %rd786; 2026-02-21T08:34:17.9117632Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9118272Z mov.b64 {%r2221, %r2222}, %rd787; 2026-02-21T08:34:17.9118640Z cvt.rn.f16x2.f32 %r2223, %r2222, %r2221; 2026-02-21T08:34:17.9119265Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9119984Z cvt.u64.u32 %rd788, %r975; 2026-02-21T08:34:17.9120301Z cvt.u64.u32 %rd789, %r976; 2026-02-21T08:34:17.9120626Z shl.b64 %rd790, %rd789, 32; 2026-02-21T08:34:17.9120951Z or.b64 %rd791, %rd788, %rd790; 2026-02-21T08:34:17.9121544Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9122188Z mov.b64 {%r2224, %r2225}, %rd791; 2026-02-21T08:34:17.9122545Z cvt.rn.f16x2.f32 %r2226, %r2225, %r2224; 2026-02-21T08:34:17.9122929Z mov.b64 {%r2227, %r2228}, %rd231; 2026-02-21T08:34:17.9123288Z cvt.rn.f16x2.f32 %r2229, %r2228, %r2227; 2026-02-21T08:34:17.9123670Z mov.b64 {%r2230, %r2231}, %rd235; 2026-02-21T08:34:17.9124026Z cvt.rn.f16x2.f32 %r2232, %r2231, %r2230; 2026-02-21T08:34:17.9124659Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9125351Z cvt.u64.u32 %rd792, %r982; 2026-02-21T08:34:17.9125683Z cvt.u64.u32 %rd793, %r983; 2026-02-21T08:34:17.9126012Z shl.b64 %rd794, %rd793, 32; 2026-02-21T08:34:17.9126336Z or.b64 %rd795, %rd792, %rd794; 2026-02-21T08:34:17.9126937Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9127646Z mov.b64 {%r2233, %r2234}, %rd795; 2026-02-21T08:34:17.9128021Z cvt.rn.f16x2.f32 %r2235, %r2234, %r2233; 2026-02-21T08:34:17.9128646Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9129289Z cvt.u64.u32 %rd796, %r984; 2026-02-21T08:34:17.9129606Z cvt.u64.u32 %rd797, %r985; 2026-02-21T08:34:17.9129929Z shl.b64 %rd798, %rd797, 32; 2026-02-21T08:34:17.9130259Z or.b64 %rd799, %rd796, %rd798; 2026-02-21T08:34:17.9130837Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9131479Z mov.b64 {%r2236, %r2237}, %rd799; 2026-02-21T08:34:17.9131842Z cvt.rn.f16x2.f32 %r2238, %r2237, %r2236; 2026-02-21T08:34:17.9132220Z mov.b64 {%r2239, %r2240}, %rd239; 2026-02-21T08:34:17.9132575Z cvt.rn.f16x2.f32 
%r2241, %r2240, %r2239; 2026-02-21T08:34:17.9132951Z mov.b64 {%r2242, %r2243}, %rd243; 2026-02-21T08:34:17.9133316Z cvt.rn.f16x2.f32 %r2244, %r2243, %r2242; 2026-02-21T08:34:17.9133939Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9134581Z cvt.u64.u32 %rd800, %r990; 2026-02-21T08:34:17.9135027Z cvt.u64.u32 %rd801, %r991; 2026-02-21T08:34:17.9135354Z shl.b64 %rd802, %rd801, 32; 2026-02-21T08:34:17.9135676Z or.b64 %rd803, %rd800, %rd802; 2026-02-21T08:34:17.9136278Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9136916Z mov.b64 {%r2245, %r2246}, %rd803; 2026-02-21T08:34:17.9137291Z cvt.rn.f16x2.f32 %r2247, %r2246, %r2245; 2026-02-21T08:34:17.9137930Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9138570Z cvt.u64.u32 %rd804, %r992; 2026-02-21T08:34:17.9138893Z cvt.u64.u32 %rd805, %r993; 2026-02-21T08:34:17.9139211Z shl.b64 %rd806, %rd805, 32; 2026-02-21T08:34:17.9139544Z or.b64 %rd807, %rd804, %rd806; 2026-02-21T08:34:17.9140135Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9140843Z mov.b64 {%r2248, %r2249}, %rd807; 2026-02-21T08:34:17.9141222Z cvt.rn.f16x2.f32 %r2250, %r2249, %r2248; 2026-02-21T08:34:17.9141595Z mov.b64 {%r2251, %r2252}, %rd247; 2026-02-21T08:34:17.9141957Z cvt.rn.f16x2.f32 %r2253, %r2252, %r2251; 2026-02-21T08:34:17.9142332Z mov.b64 {%r2254, %r2255}, %rd251; 2026-02-21T08:34:17.9142699Z cvt.rn.f16x2.f32 %r2256, %r2255, %r2254; 2026-02-21T08:34:17.9143323Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9143963Z cvt.u64.u32 %rd808, %r999; 2026-02-21T08:34:17.9144351Z cvt.u64.u32 %rd809, %r1000; 2026-02-21T08:34:17.9144750Z shl.b64 %rd810, %rd809, 32; 2026-02-21T08:34:17.9145084Z or.b64 %rd811, %rd808, %rd810; 2026-02-21T08:34:17.9145671Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9146319Z mov.b64 {%r2257, %r2258}, %rd811; 2026-02-21T08:34:17.9146683Z cvt.rn.f16x2.f32 %r2259, %r2258, %r2257; 2026-02-21T08:34:17.9147318Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9147952Z cvt.u64.u32 %rd812, %r1001; 2026-02-21T08:34:17.9148281Z cvt.u64.u32 %rd813, %r1002; 2026-02-21T08:34:17.9148609Z shl.b64 %rd814, %rd813, 32; 2026-02-21T08:34:17.9148929Z or.b64 %rd815, %rd812, %rd814; 2026-02-21T08:34:17.9149521Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9150153Z mov.b64 {%r2260, %r2261}, %rd815; 2026-02-21T08:34:17.9150527Z cvt.rn.f16x2.f32 %r2262, %r2261, %r2260; 2026-02-21T08:34:17.9150897Z mov.b64 {%r2263, %r2264}, %rd255; 2026-02-21T08:34:17.9151265Z cvt.rn.f16x2.f32 %r2265, %r2264, %r2263; 2026-02-21T08:34:17.9151633Z mov.b64 {%r2266, %r2267}, %rd259; 2026-02-21T08:34:17.9151996Z cvt.rn.f16x2.f32 %r2268, %r2267, %r2266; 2026-02-21T08:34:17.9152695Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9153348Z cvt.u64.u32 %rd816, %r1007; 2026-02-21T08:34:17.9153706Z cvt.u64.u32 %rd817, %r1008; 2026-02-21T08:34:17.9154028Z shl.b64 %rd818, %rd817, 32; 2026-02-21T08:34:17.9154358Z or.b64 %rd819, %rd816, %rd818; 2026-02-21T08:34:17.9154998Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9155644Z mov.b64 {%r2269, %r2270}, %rd819; 2026-02-21T08:34:17.9156003Z cvt.rn.f16x2.f32 %r2271, %r2270, %r2269; 2026-02-21T08:34:17.9156637Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9157280Z cvt.u64.u32 %rd820, %r1009; 2026-02-21T08:34:17.9157600Z cvt.u64.u32 %rd821, %r1010; 2026-02-21T08:34:17.9157930Z shl.b64 %rd822, %rd821, 32; 2026-02-21T08:34:17.9158257Z or.b64 %rd823, %rd820, %rd822; 2026-02-21T08:34:17.9158861Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9159502Z mov.b64 {%r2272, %r2273}, %rd823; 2026-02-21T08:34:17.9159957Z cvt.rn.f16x2.f32 %r2274, %r2273, %r2272; 2026-02-21T08:34:17.9160338Z mov.b64 {%r2275, %r2276}, %rd263; 2026-02-21T08:34:17.9160698Z cvt.rn.f16x2.f32 %r2277, %r2276, %r2275; 2026-02-21T08:34:17.9161079Z mov.b64 {%r2278, %r2279}, %rd267; 2026-02-21T08:34:17.9161435Z cvt.rn.f16x2.f32 %r2280, %r2279, %r2278; 2026-02-21T08:34:17.9162062Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9162704Z cvt.u64.u32 %rd824, %r1016; 2026-02-21T08:34:17.9163033Z cvt.u64.u32 %rd825, %r1017; 2026-02-21T08:34:17.9163353Z shl.b64 %rd826, %rd825, 32; 2026-02-21T08:34:17.9163682Z or.b64 %rd827, %rd824, %rd826; 2026-02-21T08:34:17.9164274Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9164975Z mov.b64 {%r2281, %r2282}, %rd827; 2026-02-21T08:34:17.9165413Z cvt.rn.f16x2.f32 %r2283, %r2282, %r2281; 2026-02-21T08:34:17.9166050Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9166695Z cvt.u64.u32 %rd828, %r1018; 2026-02-21T08:34:17.9167020Z cvt.u64.u32 %rd829, %r1019; 2026-02-21T08:34:17.9167354Z shl.b64 %rd830, %rd829, 32; 2026-02-21T08:34:17.9167684Z or.b64 %rd831, %rd828, %rd830; 2026-02-21T08:34:17.9168265Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9169021Z mov.b64 {%r2284, %r2285}, %rd831; 2026-02-21T08:34:17.9169385Z cvt.rn.f16x2.f32 %r2286, %r2285, %r2284; 2026-02-21T08:34:17.9169766Z mov.b64 {%r2287, %r2288}, %rd271; 2026-02-21T08:34:17.9170122Z cvt.rn.f16x2.f32 %r2289, %r2288, %r2287; 2026-02-21T08:34:17.9170502Z mov.b64 {%r2290, %r2291}, %rd275; 2026-02-21T08:34:17.9170862Z cvt.rn.f16x2.f32 %r2292, %r2291, %r2290; 2026-02-21T08:34:17.9171496Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9172150Z cvt.u64.u32 %rd832, %r1024; 2026-02-21T08:34:17.9172473Z cvt.u64.u32 %rd833, %r1025; 2026-02-21T08:34:17.9172803Z shl.b64 %rd834, %rd833, 32; 2026-02-21T08:34:17.9173129Z or.b64 %rd835, %rd832, %rd834; 2026-02-21T08:34:17.9173720Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9174347Z mov.b64 {%r2293, %r2294}, %rd835; 2026-02-21T08:34:17.9174764Z cvt.rn.f16x2.f32 %r2295, %r2294, %r2293; 2026-02-21T08:34:17.9175583Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9176280Z cvt.u64.u32 %rd836, %r1026; 2026-02-21T08:34:17.9176609Z cvt.u64.u32 %rd837, %r1027; 2026-02-21T08:34:17.9176947Z shl.b64 %rd838, %rd837, 32; 2026-02-21T08:34:17.9177356Z or.b64 %rd839, %rd836, %rd838; 2026-02-21T08:34:17.9177966Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9178647Z mov.b64 {%r2296, %r2297}, %rd839; 2026-02-21T08:34:17.9179027Z cvt.rn.f16x2.f32 %r2298, %r2297, %r2296; 2026-02-21T08:34:17.9179424Z mov.b64 {%r2299, %r2300}, %rd279; 2026-02-21T08:34:17.9179811Z cvt.rn.f16x2.f32 %r2301, %r2300, %r2299; 2026-02-21T08:34:17.9180199Z mov.b64 {%r2302, %r2303}, %rd283; 2026-02-21T08:34:17.9180579Z cvt.rn.f16x2.f32 %r2304, %r2303, %r2302; 2026-02-21T08:34:17.9181219Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9181872Z cvt.u64.u32 %rd840, %r1033; 2026-02-21T08:34:17.9182217Z cvt.u64.u32 %rd841, %r1034; 2026-02-21T08:34:17.9182562Z shl.b64 %rd842, %rd841, 32; 2026-02-21T08:34:17.9182906Z or.b64 %rd843, %rd840, %rd842; 2026-02-21T08:34:17.9183523Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9184192Z mov.b64 {%r2305, %r2306}, %rd843; 2026-02-21T08:34:17.9184580Z cvt.rn.f16x2.f32 %r2307, %r2306, %r2305; 2026-02-21T08:34:17.9185389Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9186063Z cvt.u64.u32 %rd844, %r1035; 2026-02-21T08:34:17.9186408Z cvt.u64.u32 %rd845, %r1036; 2026-02-21T08:34:17.9186750Z shl.b64 %rd846, %rd845, 32; 2026-02-21T08:34:17.9187094Z or.b64 %rd847, %rd844, %rd846; 2026-02-21T08:34:17.9187704Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9188345Z mov.b64 {%r2308, %r2309}, %rd847; 2026-02-21T08:34:17.9188708Z cvt.rn.f16x2.f32 %r2310, %r2309, %r2308; 2026-02-21T08:34:17.9189095Z mov.b64 {%r2311, %r2312}, %rd287; 2026-02-21T08:34:17.9189455Z cvt.rn.f16x2.f32 %r2313, %r2312, %r2311; 2026-02-21T08:34:17.9189838Z mov.b64 {%r2314, %r2315}, %rd291; 2026-02-21T08:34:17.9190203Z cvt.rn.f16x2.f32 %r2316, %r2315, %r2314; 2026-02-21T08:34:17.9190905Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9191554Z cvt.u64.u32 %rd848, %r1041; 2026-02-21T08:34:17.9191889Z cvt.u64.u32 %rd849, %r1042; 2026-02-21T08:34:17.9192205Z shl.b64 %rd850, %rd849, 32; 2026-02-21T08:34:17.9192539Z or.b64 %rd851, %rd848, %rd850; 2026-02-21T08:34:17.9193139Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9193789Z mov.b64 {%r2317, %r2318}, %rd851; 2026-02-21T08:34:17.9194226Z cvt.rn.f16x2.f32 %r2319, %r2318, %r2317; 2026-02-21T08:34:17.9194928Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9195582Z cvt.u64.u32 %rd852, %r1043; 2026-02-21T08:34:17.9195906Z cvt.u64.u32 %rd853, %r1044; 2026-02-21T08:34:17.9196225Z shl.b64 %rd854, %rd853, 32; 2026-02-21T08:34:17.9196537Z or.b64 %rd855, %rd852, %rd854; 2026-02-21T08:34:17.9197114Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9197727Z mov.b64 {%r2320, %r2321}, %rd855; 2026-02-21T08:34:17.9198081Z cvt.rn.f16x2.f32 %r2322, %r2321, %r2320; 2026-02-21T08:34:17.9198451Z mov.b64 {%r2323, %r2324}, %rd295; 2026-02-21T08:34:17.9198794Z 
cvt.rn.f16x2.f32 %r2325, %r2324, %r2323; 2026-02-21T08:34:17.9199160Z mov.b64 {%r2326, %r2327}, %rd299; 2026-02-21T08:34:17.9199504Z cvt.rn.f16x2.f32 %r2328, %r2327, %r2326; 2026-02-21T08:34:17.9200112Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9200731Z cvt.u64.u32 %rd856, %r1050; 2026-02-21T08:34:17.9201054Z cvt.u64.u32 %rd857, %r1051; 2026-02-21T08:34:17.9201362Z shl.b64 %rd858, %rd857, 32; 2026-02-21T08:34:17.9201678Z or.b64 %rd859, %rd856, %rd858; 2026-02-21T08:34:17.9202319Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9202945Z mov.b64 {%r2329, %r2330}, %rd859; 2026-02-21T08:34:17.9203312Z cvt.rn.f16x2.f32 %r2331, %r2330, %r2329; 2026-02-21T08:34:17.9203915Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9204536Z cvt.u64.u32 %rd860, %r1052; 2026-02-21T08:34:17.9204901Z cvt.u64.u32 %rd861, %r1053; 2026-02-21T08:34:17.9205223Z shl.b64 %rd862, %rd861, 32; 2026-02-21T08:34:17.9205532Z or.b64 %rd863, %rd860, %rd862; 2026-02-21T08:34:17.9206107Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9206731Z mov.b64 {%r2332, %r2333}, %rd863; 2026-02-21T08:34:17.9207080Z cvt.rn.f16x2.f32 %r2334, %r2333, %r2332; 2026-02-21T08:34:17.9207450Z mov.b64 {%r2335, %r2336}, %rd303; 2026-02-21T08:34:17.9207799Z cvt.rn.f16x2.f32 %r2337, %r2336, %r2335; 2026-02-21T08:34:17.9208171Z mov.b64 {%r2338, %r2339}, %rd307; 2026-02-21T08:34:17.9208515Z cvt.rn.f16x2.f32 %r2340, %r2339, %r2338; 2026-02-21T08:34:17.9209137Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9209833Z cvt.u64.u32 %rd864, %r1058; 2026-02-21T08:34:17.9210144Z cvt.u64.u32 %rd865, %r1059; 2026-02-21T08:34:17.9210459Z shl.b64 %rd866, %rd865, 32; 2026-02-21T08:34:17.9210764Z or.b64 %rd867, %rd864, %rd866; 2026-02-21T08:34:17.9211332Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9211941Z mov.b64 {%r2341, %r2342}, %rd867; 2026-02-21T08:34:17.9212298Z cvt.rn.f16x2.f32 %r2343, %r2342, %r2341; 2026-02-21T08:34:17.9212898Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9213513Z cvt.u64.u32 %rd868, %r1060; 2026-02-21T08:34:17.9213829Z cvt.u64.u32 %rd869, %r1061; 2026-02-21T08:34:17.9214140Z shl.b64 %rd870, %rd869, 32; 2026-02-21T08:34:17.9214454Z or.b64 %rd871, %rd868, %rd870; 2026-02-21T08:34:17.9215143Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9215775Z mov.b64 {%r2344, %r2345}, %rd871; 2026-02-21T08:34:17.9216122Z cvt.rn.f16x2.f32 %r2346, %r2345, %r2344; 2026-02-21T08:34:17.9216490Z mov.b64 {%r2347, %r2348}, %rd311; 2026-02-21T08:34:17.9216841Z cvt.rn.f16x2.f32 %r2349, %r2348, %r2347; 2026-02-21T08:34:17.9217201Z mov.b64 {%r2350, %r2351}, %rd315; 2026-02-21T08:34:17.9217552Z cvt.rn.f16x2.f32 %r2352, %r2351, %r2350; 2026-02-21T08:34:17.9218221Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9218846Z cvt.u64.u32 %rd872, %r1067; 2026-02-21T08:34:17.9219159Z cvt.u64.u32 %rd873, %r1068; 2026-02-21T08:34:17.9219474Z shl.b64 %rd874, %rd873, 32; 2026-02-21T08:34:17.9219787Z or.b64 %rd875, %rd872, %rd874; 2026-02-21T08:34:17.9220372Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9221003Z mov.b64 {%r2353, %r2354}, %rd875; 2026-02-21T08:34:17.9221356Z cvt.rn.f16x2.f32 %r2355, %r2354, %r2353; 2026-02-21T08:34:17.9221978Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9222592Z cvt.u64.u32 %rd876, %r1069; 2026-02-21T08:34:17.9222912Z cvt.u64.u32 %rd877, %r1070; 2026-02-21T08:34:17.9223222Z shl.b64 %rd878, %rd877, 32; 2026-02-21T08:34:17.9223546Z or.b64 %rd879, %rd876, %rd878; 2026-02-21T08:34:17.9224125Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9224788Z mov.b64 {%r2356, %r2357}, %rd879; 2026-02-21T08:34:17.9225153Z cvt.rn.f16x2.f32 %r2358, %r2357, %r2356; 2026-02-21T08:34:17.9225518Z mov.b64 {%r2359, %r2360}, %rd319; 2026-02-21T08:34:17.9225880Z cvt.rn.f16x2.f32 %r2361, %r2360, %r2359; 2026-02-21T08:34:17.9226307Z mov.b64 {%r2362, %r2363}, %rd323; 2026-02-21T08:34:17.9226676Z cvt.rn.f16x2.f32 %r2364, %r2363, %r2362; 2026-02-21T08:34:17.9227283Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9227917Z cvt.u64.u32 %rd880, %r1075; 2026-02-21T08:34:17.9228242Z cvt.u64.u32 %rd881, %r1076; 2026-02-21T08:34:17.9228545Z shl.b64 %rd882, %rd881, 32; 2026-02-21T08:34:17.9228868Z or.b64 %rd883, %rd880, %rd882; 2026-02-21T08:34:17.9229431Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9230053Z mov.b64 {%r2365, %r2366}, %rd883; 2026-02-21T08:34:17.9230395Z cvt.rn.f16x2.f32 %r2367, %r2366, %r2365; 2026-02-21T08:34:17.9231005Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9231621Z cvt.u64.u32 %rd884, %r1077; 2026-02-21T08:34:17.9231931Z cvt.u64.u32 %rd885, %r1078; 2026-02-21T08:34:17.9232249Z shl.b64 %rd886, %rd885, 32; 2026-02-21T08:34:17.9232560Z or.b64 %rd887, %rd884, %rd886; 2026-02-21T08:34:17.9233205Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9233810Z mov.b64 {%r2368, %r2369}, %rd887; 2026-02-21T08:34:17.9234173Z cvt.rn.f16x2.f32 %r2370, %r2369, %r2368; 2026-02-21T08:34:17.9234533Z mov.b64 {%r2371, %r2372}, %rd327; 2026-02-21T08:34:17.9234983Z cvt.rn.f16x2.f32 %r2373, %r2372, %r2371; 2026-02-21T08:34:17.9235352Z mov.b64 {%r2374, %r2375}, %rd331; 2026-02-21T08:34:17.9235701Z cvt.rn.f16x2.f32 %r2376, %r2375, %r2374; 2026-02-21T08:34:17.9236324Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9236949Z cvt.u64.u32 %rd888, %r1084; 2026-02-21T08:34:17.9237273Z cvt.u64.u32 %rd889, %r1085; 2026-02-21T08:34:17.9237587Z shl.b64 %rd890, %rd889, 32; 2026-02-21T08:34:17.9237912Z or.b64 %rd891, %rd888, %rd890; 2026-02-21T08:34:17.9238544Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9239169Z mov.b64 {%r2377, %r2378}, %rd891; 2026-02-21T08:34:17.9239530Z cvt.rn.f16x2.f32 %r2379, %r2378, %r2377; 2026-02-21T08:34:17.9240132Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9240753Z cvt.u64.u32 %rd892, %r1086; 2026-02-21T08:34:17.9241068Z cvt.u64.u32 %rd893, %r1087; 2026-02-21T08:34:17.9241391Z shl.b64 %rd894, %rd893, 32; 2026-02-21T08:34:17.9241704Z or.b64 %rd895, %rd892, %rd894; 2026-02-21T08:34:17.9242352Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9242976Z mov.b64 {%r2380, %r2381}, %rd895; 2026-02-21T08:34:17.9243325Z cvt.rn.f16x2.f32 %r2382, %r2381, %r2380; 2026-02-21T08:34:17.9243695Z mov.b64 {%r2383, %r2384}, %rd335; 2026-02-21T08:34:17.9244041Z cvt.rn.f16x2.f32 %r2385, %r2384, %r2383; 2026-02-21T08:34:17.9244406Z mov.b64 {%r2386, %r2387}, %rd339; 2026-02-21T08:34:17.9244805Z cvt.rn.f16x2.f32 %r2388, %r2387, %r2386; 2026-02-21T08:34:17.9245423Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9246233Z cvt.u64.u32 %rd896, %r1092; 2026-02-21T08:34:17.9271776Z cvt.u64.u32 %rd897, %r1093; 2026-02-21T08:34:17.9272221Z shl.b64 %rd898, %rd897, 32; 2026-02-21T08:34:17.9272607Z or.b64 %rd899, %rd896, %rd898; 2026-02-21T08:34:17.9273302Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9274083Z mov.b64 {%r2389, %r2390}, %rd899; 2026-02-21T08:34:17.9274502Z cvt.rn.f16x2.f32 %r2391, %r2390, %r2389; 2026-02-21T08:34:17.9275285Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9276001Z cvt.u64.u32 %rd900, %r1094; 2026-02-21T08:34:17.9276582Z cvt.u64.u32 %rd901, %r1095; 2026-02-21T08:34:17.9276920Z shl.b64 %rd902, %rd901, 32; 2026-02-21T08:34:17.9277275Z or.b64 %rd903, %rd900, %rd902; 2026-02-21T08:34:17.9277895Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9278544Z mov.b64 {%r2392, %r2393}, %rd903; 2026-02-21T08:34:17.9278930Z cvt.rn.f16x2.f32 %r2394, %r2393, %r2392; 2026-02-21T08:34:17.9279317Z mov.b64 {%r2395, %r2396}, %rd343; 2026-02-21T08:34:17.9279695Z cvt.rn.f16x2.f32 %r2397, %r2396, %r2395; 2026-02-21T08:34:17.9280075Z mov.b64 {%r2398, %r2399}, %rd347; 2026-02-21T08:34:17.9280451Z cvt.rn.f16x2.f32 %r2400, %r2399, %r2398; 2026-02-21T08:34:17.9281090Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9281728Z cvt.u64.u32 %rd904, %r1101; 2026-02-21T08:34:17.9282073Z cvt.u64.u32 %rd905, %r1102; 2026-02-21T08:34:17.9282397Z shl.b64 %rd906, %rd905, 32; 2026-02-21T08:34:17.9282746Z or.b64 %rd907, %rd904, %rd906; 2026-02-21T08:34:17.9283346Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9284082Z mov.b64 {%r2401, %r2402}, %rd907; 2026-02-21T08:34:17.9284449Z cvt.rn.f16x2.f32 %r2403, %r2402, %r2401; 2026-02-21T08:34:17.9285172Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9285831Z cvt.u64.u32 %rd908, %r1103; 2026-02-21T08:34:17.9286163Z cvt.u64.u32 %rd909, %r1104; 2026-02-21T08:34:17.9286503Z shl.b64 %rd910, %rd909, 32; 2026-02-21T08:34:17.9286837Z or.b64 %rd911, %rd908, %rd910; 2026-02-21T08:34:17.9287439Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9288077Z mov.b64 {%r2404, %r2405}, %rd911; 2026-02-21T08:34:17.9288457Z cvt.rn.f16x2.f32 %r2406, %r2405, %r2404; 2026-02-21T08:34:17.9288842Z mov.b64 {%r2407, %r2408}, %rd351; 2026-02-21T08:34:17.9289205Z 
cvt.rn.f16x2.f32 %r2409, %r2408, %r2407; 2026-02-21T08:34:17.9289687Z mov.b64 {%r2410, %r2411}, %rd355; 2026-02-21T08:34:17.9290055Z cvt.rn.f16x2.f32 %r2412, %r2411, %r2410; 2026-02-21T08:34:17.9290697Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9291329Z cvt.u64.u32 %rd912, %r1109; 2026-02-21T08:34:17.9291664Z cvt.u64.u32 %rd913, %r1110; 2026-02-21T08:34:17.9291984Z shl.b64 %rd914, %rd913, 32; 2026-02-21T08:34:17.9292324Z or.b64 %rd915, %rd912, %rd914; 2026-02-21T08:34:17.9292923Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9293657Z mov.b64 {%r2413, %r2414}, %rd915; 2026-02-21T08:34:17.9294037Z cvt.rn.f16x2.f32 %r2415, %r2414, %r2413; 2026-02-21T08:34:17.9294663Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9295364Z cvt.u64.u32 %rd916, %r1111; 2026-02-21T08:34:17.9295691Z cvt.u64.u32 %rd917, %r1112; 2026-02-21T08:34:17.9296024Z shl.b64 %rd918, %rd917, 32; 2026-02-21T08:34:17.9296352Z or.b64 %rd919, %rd916, %rd918; 2026-02-21T08:34:17.9296948Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9297591Z mov.b64 {%r2416, %r2417}, %rd919; 2026-02-21T08:34:17.9297952Z cvt.rn.f16x2.f32 %r2418, %r2417, %r2416; 2026-02-21T08:34:17.9298338Z mov.b64 {%r2419, %r2420}, %rd359; 2026-02-21T08:34:17.9298696Z cvt.rn.f16x2.f32 %r2421, %r2420, %r2419; 2026-02-21T08:34:17.9299082Z mov.b64 {%r2422, %r2423}, %rd363; 2026-02-21T08:34:17.9299444Z cvt.rn.f16x2.f32 %r2424, %r2423, %r2422; 2026-02-21T08:34:17.9300073Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9300712Z cvt.u64.u32 %rd920, %r1118; 2026-02-21T08:34:17.9301033Z cvt.u64.u32 %rd921, %r1119; 2026-02-21T08:34:17.9301517Z shl.b64 %rd922, %rd921, 32; 2026-02-21T08:34:17.9301847Z or.b64 %rd923, %rd920, %rd922; 2026-02-21T08:34:17.9302453Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9303099Z mov.b64 {%r2425, %r2426}, %rd923; 2026-02-21T08:34:17.9303475Z cvt.rn.f16x2.f32 %r2427, %r2426, %r2425; 2026-02-21T08:34:17.9304100Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9304791Z cvt.u64.u32 %rd924, %r1120; 2026-02-21T08:34:17.9305133Z cvt.u64.u32 %rd925, %r1121; 2026-02-21T08:34:17.9305450Z shl.b64 %rd926, %rd925, 32; 2026-02-21T08:34:17.9305793Z or.b64 %rd927, %rd924, %rd926; 2026-02-21T08:34:17.9306377Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9307030Z mov.b64 {%r2428, %r2429}, %rd927; 2026-02-21T08:34:17.9307388Z cvt.rn.f16x2.f32 %r2430, %r2429, %r2428; 2026-02-21T08:34:17.9307777Z mov.b64 {%r2431, %r2432}, %rd367; 2026-02-21T08:34:17.9308149Z cvt.rn.f16x2.f32 %r2433, %r2432, %r2431; 2026-02-21T08:34:17.9308522Z mov.b64 {%r2434, %r2435}, %rd371; 2026-02-21T08:34:17.9308961Z cvt.rn.f16x2.f32 %r2436, %r2435, %r2434; 2026-02-21T08:34:17.9309583Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9310237Z cvt.u64.u32 %rd928, %r1126; 2026-02-21T08:34:17.9310563Z cvt.u64.u32 %rd929, %r1127; 2026-02-21T08:34:17.9310897Z shl.b64 %rd930, %rd929, 32; 2026-02-21T08:34:17.9311226Z or.b64 %rd931, %rd928, %rd930; 2026-02-21T08:34:17.9311823Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9312476Z mov.b64 {%r2437, %r2438}, %rd931; 2026-02-21T08:34:17.9312840Z cvt.rn.f16x2.f32 %r2439, %r2438, %r2437; 2026-02-21T08:34:17.9313480Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9314117Z cvt.u64.u32 %rd932, %r1128; 2026-02-21T08:34:17.9314524Z cvt.u64.u32 %rd933, %r1129; 2026-02-21T08:34:17.9314908Z shl.b64 %rd934, %rd933, 32; 2026-02-21T08:34:17.9315250Z or.b64 %rd935, %rd932, %rd934; 2026-02-21T08:34:17.9315846Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9316484Z mov.b64 {%r2440, %r2441}, %rd935; 2026-02-21T08:34:17.9316862Z cvt.rn.f16x2.f32 %r2442, %r2441, %r2440; 2026-02-21T08:34:17.9317242Z mov.b64 {%r2443, %r2444}, %rd375; 2026-02-21T08:34:17.9317392Z cvt.rn.f16x2.f32 %r2445, %r2444, %r2443; 2026-02-21T08:34:17.9317583Z mov.b64 {%r2446, %r2447}, %rd379; 2026-02-21T08:34:17.9317722Z cvt.rn.f16x2.f32 %r2448, %r2447, %r2446; 2026-02-21T08:34:17.9318100Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9318228Z cvt.u64.u32 %rd936, %r1135; 2026-02-21T08:34:17.9318350Z cvt.u64.u32 %rd937, %r1136; 2026-02-21T08:34:17.9318467Z shl.b64 %rd938, %rd937, 32; 2026-02-21T08:34:17.9318600Z or.b64 %rd939, %rd936, %rd938; 2026-02-21T08:34:17.9318968Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9319088Z mov.b64 {%r2449, %r2450}, %rd939; 2026-02-21T08:34:17.9319237Z cvt.rn.f16x2.f32 %r2451, %r2450, %r2449; 2026-02-21T08:34:17.9319600Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9319715Z cvt.u64.u32 %rd940, %r1137; 2026-02-21T08:34:17.9319828Z cvt.u64.u32 %rd941, %r1138; 2026-02-21T08:34:17.9319958Z shl.b64 %rd942, %rd941, 32; 2026-02-21T08:34:17.9320075Z or.b64 %rd943, %rd940, %rd942; 2026-02-21T08:34:17.9320444Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9320577Z mov.b64 {%r2452, %r2453}, %rd943; 2026-02-21T08:34:17.9320770Z cvt.rn.f16x2.f32 %r2454, %r2453, %r2452; 2026-02-21T08:34:17.9320889Z mov.b64 {%r2455, %r2456}, %rd383; 2026-02-21T08:34:17.9321038Z cvt.rn.f16x2.f32 %r2457, %r2456, %r2455; 2026-02-21T08:34:17.9321157Z mov.b64 {%r2458, %r2459}, %rd387; 2026-02-21T08:34:17.9321292Z cvt.rn.f16x2.f32 %r2460, %r2459, %r2458; 2026-02-21T08:34:17.9321668Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9321800Z cvt.u64.u32 %rd944, %r1143; 2026-02-21T08:34:17.9321918Z cvt.u64.u32 %rd945, %r1144; 2026-02-21T08:34:17.9322036Z shl.b64 %rd946, %rd945, 32; 2026-02-21T08:34:17.9322170Z or.b64 %rd947, %rd944, %rd946; 2026-02-21T08:34:17.9322541Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9322661Z mov.b64 {%r2461, %r2462}, %rd947; 2026-02-21T08:34:17.9322796Z cvt.rn.f16x2.f32 %r2463, %r2462, %r2461; 2026-02-21T08:34:17.9323182Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9323300Z cvt.u64.u32 %rd948, %r1145; 2026-02-21T08:34:17.9323420Z cvt.u64.u32 %rd949, %r1146; 2026-02-21T08:34:17.9323663Z shl.b64 %rd950, %rd949, 32; 2026-02-21T08:34:17.9323780Z or.b64 %rd951, %rd948, %rd950; 2026-02-21T08:34:17.9324149Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9324280Z mov.b64 {%r2464, %r2465}, %rd951; 2026-02-21T08:34:17.9324414Z cvt.rn.f16x2.f32 %r2466, %r2465, %r2464; 2026-02-21T08:34:17.9324531Z mov.b64 {%r2467, %r2468}, %rd391; 2026-02-21T08:34:17.9324663Z cvt.rn.f16x2.f32 %r2469, %r2468, %r2467; 2026-02-21T08:34:17.9324853Z mov.b64 {%r2470, %r2471}, %rd395; 2026-02-21T08:34:17.9324987Z cvt.rn.f16x2.f32 %r2472, %r2471, %r2470; 2026-02-21T08:34:17.9325352Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9325484Z cvt.u64.u32 %rd952, %r1152; 2026-02-21T08:34:17.9325603Z cvt.u64.u32 %rd953, %r1153; 2026-02-21T08:34:17.9325717Z shl.b64 %rd954, %rd953, 32; 2026-02-21T08:34:17.9325902Z or.b64 %rd955, %rd952, %rd954; 2026-02-21T08:34:17.9326283Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9326399Z mov.b64 {%r2473, %r2474}, %rd955; 2026-02-21T08:34:17.9326535Z cvt.rn.f16x2.f32 %r2475, %r2474, %r2473; 2026-02-21T08:34:17.9326936Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9327072Z cvt.u64.u32 %rd956, %r1154; 2026-02-21T08:34:17.9327206Z cvt.u64.u32 %rd957, %r1155; 2026-02-21T08:34:17.9327424Z shl.b64 %rd958, %rd957, 32; 2026-02-21T08:34:17.9327576Z or.b64 %rd959, %rd956, %rd958; 2026-02-21T08:34:17.9327966Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9328112Z mov.b64 {%r2476, %r2477}, %rd959; 2026-02-21T08:34:17.9328281Z cvt.rn.f16x2.f32 %r2478, %r2477, %r2476; 2026-02-21T08:34:17.9328422Z mov.b64 {%r2479, %r2480}, %rd399; 2026-02-21T08:34:17.9328576Z cvt.rn.f16x2.f32 %r2481, %r2480, %r2479; 2026-02-21T08:34:17.9328724Z mov.b64 {%r2482, %r2483}, %rd403; 2026-02-21T08:34:17.9328892Z cvt.rn.f16x2.f32 %r2484, %r2483, %r2482; 2026-02-21T08:34:17.9329281Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9329428Z cvt.u64.u32 %rd960, %r1160; 2026-02-21T08:34:17.9329573Z cvt.u64.u32 %rd961, %r1161; 2026-02-21T08:34:17.9329708Z shl.b64 %rd962, %rd961, 32; 2026-02-21T08:34:17.9329846Z or.b64 %rd963, %rd960, %rd962; 2026-02-21T08:34:17.9330306Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9330466Z mov.b64 {%r2485, %r2486}, %rd963; 2026-02-21T08:34:17.9330620Z cvt.rn.f16x2.f32 %r2487, %r2486, %r2485; 2026-02-21T08:34:17.9331232Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9331399Z cvt.u64.u32 %rd964, %r1162; 2026-02-21T08:34:17.9331562Z cvt.u64.u32 %rd965, %r1163; 2026-02-21T08:34:17.9331735Z shl.b64 %rd966, %rd965, 32; 2026-02-21T08:34:17.9331884Z or.b64 %rd967, %rd964, %rd966; 2026-02-21T08:34:17.9332280Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9332419Z mov.b64 {%r2488, %r2489}, %rd967; 2026-02-21T08:34:17.9332555Z cvt.rn.f16x2.f32 %r2490, %r2489, %r2488; 2026-02-21T08:34:17.9332694Z mov.b64 {%r2491, %r2492}, %rd407; 2026-02-21T08:34:17.9332840Z 
cvt.rn.f16x2.f32 %r2493, %r2492, %r2491; 2026-02-21T08:34:17.9332989Z mov.b64 {%r2494, %r2495}, %rd411; 2026-02-21T08:34:17.9333121Z cvt.rn.f16x2.f32 %r2496, %r2495, %r2494; 2026-02-21T08:34:17.9333493Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9333635Z cvt.u64.u32 %rd968, %r1169; 2026-02-21T08:34:17.9333750Z cvt.u64.u32 %rd969, %r1170; 2026-02-21T08:34:17.9333880Z shl.b64 %rd970, %rd969, 32; 2026-02-21T08:34:17.9334089Z or.b64 %rd971, %rd968, %rd970; 2026-02-21T08:34:17.9334464Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9334600Z mov.b64 {%r2497, %r2498}, %rd971; 2026-02-21T08:34:17.9334820Z cvt.rn.f16x2.f32 %r2499, %r2498, %r2497; 2026-02-21T08:34:17.9335209Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9335336Z cvt.u64.u32 %rd972, %r1171; 2026-02-21T08:34:17.9335466Z cvt.u64.u32 %rd973, %r1172; 2026-02-21T08:34:17.9335603Z shl.b64 %rd974, %rd973, 32; 2026-02-21T08:34:17.9335727Z or.b64 %rd975, %rd972, %rd974; 2026-02-21T08:34:17.9336100Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9336238Z mov.b64 {%r2500, %r2501}, %rd975; 2026-02-21T08:34:17.9336379Z cvt.rn.f16x2.f32 %r2502, %r2501, %r2500; 2026-02-21T08:34:17.9336627Z mov.b64 {%r2503, %r2504}, %rd415; 2026-02-21T08:34:17.9336787Z cvt.rn.f16x2.f32 %r2505, %r2504, %r2503; 2026-02-21T08:34:17.9336931Z mov.b64 {%r2506, %r2507}, %rd419; 2026-02-21T08:34:17.9337065Z cvt.rn.f16x2.f32 %r2508, %r2507, %r2506; 2026-02-21T08:34:17.9337437Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9337577Z cvt.u64.u32 %rd976, %r1177; 2026-02-21T08:34:17.9337702Z cvt.u64.u32 %rd977, %r1178; 2026-02-21T08:34:17.9337827Z shl.b64 %rd978, %rd977, 32; 2026-02-21T08:34:17.9338028Z or.b64 %rd979, %rd976, %rd978; 2026-02-21T08:34:17.9338400Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9338527Z mov.b64 {%r2509, %r2510}, %rd979; 2026-02-21T08:34:17.9338670Z cvt.rn.f16x2.f32 %r2511, %r2510, %r2509; 2026-02-21T08:34:17.9339065Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9339185Z cvt.u64.u32 %rd980, %r1179; 2026-02-21T08:34:17.9339313Z cvt.u64.u32 %rd981, %r1180; 2026-02-21T08:34:17.9339451Z shl.b64 %rd982, %rd981, 32; 2026-02-21T08:34:17.9339578Z or.b64 %rd983, %rd980, %rd982; 2026-02-21T08:34:17.9339955Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9340075Z mov.b64 {%r2512, %r2513}, %rd983; 2026-02-21T08:34:17.9340202Z cvt.rn.f16x2.f32 %r2514, %r2513, %r2512; 2026-02-21T08:34:17.9340308Z mov.b64 {%r2515, %r2516}, %rd423; 2026-02-21T08:34:17.9340440Z cvt.rn.f16x2.f32 %r2517, %r2516, %r2515; 2026-02-21T08:34:17.9340564Z mov.b64 {%r2518, %r2519}, %rd427; 2026-02-21T08:34:17.9340690Z cvt.rn.f16x2.f32 %r2520, %r2519, %r2518; 2026-02-21T08:34:17.9341041Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9341220Z cvt.u64.u32 %rd984, %r1186; 2026-02-21T08:34:17.9341341Z cvt.u64.u32 %rd985, %r1187; 2026-02-21T08:34:17.9341451Z shl.b64 %rd986, %rd985, 32; 2026-02-21T08:34:17.9341578Z or.b64 %rd987, %rd984, %rd986; 2026-02-21T08:34:17.9341936Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9342049Z mov.b64 {%r2521, %r2522}, %rd987; 2026-02-21T08:34:17.9342180Z cvt.rn.f16x2.f32 %r2523, %r2522, %r2521; 2026-02-21T08:34:17.9342545Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9342659Z cvt.u64.u32 %rd988, %r1188; 2026-02-21T08:34:17.9342769Z cvt.u64.u32 %rd989, %r1189; 2026-02-21T08:34:17.9342889Z shl.b64 %rd990, %rd989, 32; 2026-02-21T08:34:17.9343002Z or.b64 %rd991, %rd988, %rd990; 2026-02-21T08:34:17.9343359Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9343486Z mov.b64 {%r2524, %r2525}, %rd991; 2026-02-21T08:34:17.9343623Z cvt.rn.f16x2.f32 %r2526, %r2525, %r2524; 2026-02-21T08:34:17.9343734Z mov.b64 {%r2527, %r2528}, %rd431; 2026-02-21T08:34:17.9343933Z cvt.rn.f16x2.f32 %r2529, %r2528, %r2527; 2026-02-21T08:34:17.9344042Z mov.b64 {%r2530, %r2531}, %rd435; 2026-02-21T08:34:17.9344170Z cvt.rn.f16x2.f32 %r2532, %r2531, %r2530; 2026-02-21T08:34:17.9344534Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9344646Z cvt.u64.u32 %rd992, %r1194; 2026-02-21T08:34:17.9344826Z cvt.u64.u32 %rd993, %r1195; 2026-02-21T08:34:17.9344939Z shl.b64 %rd994, %rd993, 32; 2026-02-21T08:34:17.9345069Z or.b64 %rd995, %rd992, %rd994; 2026-02-21T08:34:17.9345429Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9345541Z mov.b64 {%r2533, %r2534}, %rd995; 2026-02-21T08:34:17.9345677Z cvt.rn.f16x2.f32 %r2535, %r2534, %r2533; 2026-02-21T08:34:17.9346083Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9346199Z cvt.u64.u32 %rd996, %r1196; 2026-02-21T08:34:17.9346320Z cvt.u64.u32 %rd997, %r1197; 2026-02-21T08:34:17.9346431Z shl.b64 %rd998, %rd997, 32; 2026-02-21T08:34:17.9346545Z or.b64 %rd999, %rd996, %rd998; 2026-02-21T08:34:17.9346900Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9347021Z mov.b64 {%r2536, %r2537}, %rd999; 2026-02-21T08:34:17.9347146Z cvt.rn.f16x2.f32 %r2538, %r2537, %r2536; 2026-02-21T08:34:17.9347317Z mov.b64 {%r2539, %r2540}, %rd439; 2026-02-21T08:34:17.9347452Z cvt.rn.f16x2.f32 %r2541, %r2540, %r2539; 2026-02-21T08:34:17.9347561Z mov.b64 {%r2542, %r2543}, %rd443; 2026-02-21T08:34:17.9347687Z cvt.rn.f16x2.f32 %r2544, %r2543, %r2542; 2026-02-21T08:34:17.9348055Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 
2026-02-21T08:34:17.9348173Z cvt.u64.u32 %rd1000, %r1203; 2026-02-21T08:34:17.9348284Z cvt.u64.u32 %rd1001, %r1204; 2026-02-21T08:34:17.9348407Z shl.b64 %rd1002, %rd1001, 32; 2026-02-21T08:34:17.9348530Z or.b64 %rd1003, %rd1000, %rd1002; 2026-02-21T08:34:17.9348882Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9348999Z mov.b64 {%r2545, %r2546}, %rd1003; 2026-02-21T08:34:17.9349135Z cvt.rn.f16x2.f32 %r2547, %r2546, %r2545; 2026-02-21T08:34:17.9349490Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9349605Z cvt.u64.u32 %rd1004, %r1205; 2026-02-21T08:34:17.9349725Z cvt.u64.u32 %rd1005, %r1206; 2026-02-21T08:34:17.9349840Z shl.b64 %rd1006, %rd1005, 32; 2026-02-21T08:34:17.9349953Z or.b64 %rd1007, %rd1004, %rd1006; 2026-02-21T08:34:17.9350361Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9350486Z mov.b64 {%r2548, %r2549}, %rd1007; 2026-02-21T08:34:17.9350617Z cvt.rn.f16x2.f32 %r2550, %r2549, %r2548; 2026-02-21T08:34:17.9350729Z mov.b64 {%r2551, %r2552}, %rd447; 2026-02-21T08:34:17.9350863Z cvt.rn.f16x2.f32 %r2553, %r2552, %r2551; 2026-02-21T08:34:17.9350973Z mov.b64 {%r2554, %r2555}, %rd451; 2026-02-21T08:34:17.9351098Z cvt.rn.f16x2.f32 %r2556, %r2555, %r2554; 2026-02-21T08:34:17.9351459Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9351567Z cvt.u64.u32 %rd1008, %r1211; 2026-02-21T08:34:17.9351682Z cvt.u64.u32 %rd1009, %r1212; 2026-02-21T08:34:17.9351795Z shl.b64 %rd1010, %rd1009, 32; 2026-02-21T08:34:17.9351914Z or.b64 %rd1011, %rd1008, %rd1010; 2026-02-21T08:34:17.9352267Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9352380Z mov.b64 {%r2557, %r2558}, %rd1011; 2026-02-21T08:34:17.9352518Z cvt.rn.f16x2.f32 %r2559, %r2558, %r2557; 2026-02-21T08:34:17.9352866Z .loc 1 49 52 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9353040Z cvt.u64.u32 %rd1012, %r1213; 2026-02-21T08:34:17.9353158Z cvt.u64.u32 %rd1013, %r1214; 2026-02-21T08:34:17.9353271Z shl.b64 %rd1014, %rd1013, 32; 2026-02-21T08:34:17.9353386Z or.b64 %rd1015, %rd1012, %rd1014; 2026-02-21T08:34:17.9353737Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9353859Z mov.b64 {%r2560, %r2561}, %rd1015; 2026-02-21T08:34:17.9353986Z cvt.rn.f16x2.f32 %r2562, %r2561, %r2560; 2026-02-21T08:34:17.9354095Z mov.b64 {%r2563, %r2564}, %rd455; 2026-02-21T08:34:17.9354230Z cvt.rn.f16x2.f32 %r2565, %r2564, %r2563; 2026-02-21T08:34:17.9354339Z mov.b64 {%r2566, %r2567}, %rd459; 2026-02-21T08:34:17.9354464Z cvt.rn.f16x2.f32 %r2568, %r2567, %r2566; 2026-02-21T08:34:17.9354877Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9355083Z cvt.u64.u32 %rd1016, %r1220; 2026-02-21T08:34:17.9355201Z cvt.u64.u32 %rd1017, %r1221; 2026-02-21T08:34:17.9355314Z shl.b64 %rd1018, %rd1017, 32; 2026-02-21T08:34:17.9355442Z or.b64 %rd1019, %rd1016, %rd1018; 2026-02-21T08:34:17.9355803Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9355914Z mov.b64 {%r2569, %r2570}, %rd1019; 2026-02-21T08:34:17.9356042Z cvt.rn.f16x2.f32 %r2571, %r2570, %r2569; 2026-02-21T08:34:17.9356406Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9356581Z cvt.u64.u32 %rd1020, %r1222; 2026-02-21T08:34:17.9356692Z cvt.u64.u32 %rd1021, %r1223; 2026-02-21T08:34:17.9356813Z shl.b64 %rd1022, %rd1021, 32; 2026-02-21T08:34:17.9356925Z or.b64 %rd1023, %rd1020, %rd1022; 2026-02-21T08:34:17.9357283Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9357404Z mov.b64 {%r2572, %r2573}, %rd1023; 2026-02-21T08:34:17.9357532Z cvt.rn.f16x2.f32 %r2574, %r2573, %r2572; 
2026-02-21T08:34:17.9357643Z mov.b64 {%r2575, %r2576}, %rd463; 2026-02-21T08:34:17.9357769Z cvt.rn.f16x2.f32 %r2577, %r2576, %r2575; 2026-02-21T08:34:17.9357887Z mov.b64 {%r2578, %r2579}, %rd467; 2026-02-21T08:34:17.9358012Z cvt.rn.f16x2.f32 %r2580, %r2579, %r2578; 2026-02-21T08:34:17.9358362Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9358489Z cvt.u64.u32 %rd1024, %r1228; 2026-02-21T08:34:17.9358598Z cvt.u64.u32 %rd1025, %r1229; 2026-02-21T08:34:17.9358709Z shl.b64 %rd1026, %rd1025, 32; 2026-02-21T08:34:17.9358824Z or.b64 %rd1027, %rd1024, %rd1026; 2026-02-21T08:34:17.9359177Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9359378Z mov.b64 {%r2581, %r2582}, %rd1027; 2026-02-21T08:34:17.9359513Z cvt.rn.f16x2.f32 %r2583, %r2582, %r2581; 2026-02-21T08:34:17.9359877Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9359987Z cvt.u64.u32 %rd1028, %r1230; 2026-02-21T08:34:17.9360097Z cvt.u64.u32 %rd1029, %r1231; 2026-02-21T08:34:17.9360217Z shl.b64 %rd1030, %rd1029, 32; 2026-02-21T08:34:17.9360329Z or.b64 %rd1031, %rd1028, %rd1030; 2026-02-21T08:34:17.9360680Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9360807Z mov.b64 {%r2584, %r2585}, %rd1031; 2026-02-21T08:34:17.9360934Z cvt.rn.f16x2.f32 %r2586, %r2585, %r2584; 2026-02-21T08:34:17.9361045Z mov.b64 {%r2587, %r2588}, %rd471; 2026-02-21T08:34:17.9361174Z cvt.rn.f16x2.f32 %r2589, %r2588, %r2587; 2026-02-21T08:34:17.9361292Z mov.b64 {%r2590, %r2591}, %rd475; 2026-02-21T08:34:17.9361418Z cvt.rn.f16x2.f32 %r2592, %r2591, %r2590; 2026-02-21T08:34:17.9361773Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9361960Z cvt.u64.u32 %rd1032, %r1237; 2026-02-21T08:34:17.9362070Z cvt.u64.u32 %rd1033, %r1238; 2026-02-21T08:34:17.9362185Z shl.b64 %rd1034, %rd1033, 32; 
2026-02-21T08:34:17.9362308Z or.b64 %rd1035, %rd1032, %rd1034; 2026-02-21T08:34:17.9362664Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9362775Z mov.b64 {%r2593, %r2594}, %rd1035; 2026-02-21T08:34:17.9362901Z cvt.rn.f16x2.f32 %r2595, %r2594, %r2593; 2026-02-21T08:34:17.9363269Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9363378Z cvt.u64.u32 %rd1036, %r1239; 2026-02-21T08:34:17.9363490Z cvt.u64.u32 %rd1037, %r1240; 2026-02-21T08:34:17.9363612Z shl.b64 %rd1038, %rd1037, 32; 2026-02-21T08:34:17.9363729Z or.b64 %rd1039, %rd1036, %rd1038; 2026-02-21T08:34:17.9364127Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9364251Z mov.b64 {%r2596, %r2597}, %rd1039; 2026-02-21T08:34:17.9364378Z cvt.rn.f16x2.f32 %r2598, %r2597, %r2596; 2026-02-21T08:34:17.9364487Z mov.b64 {%r2599, %r2600}, %rd479; 2026-02-21T08:34:17.9364614Z cvt.rn.f16x2.f32 %r2601, %r2600, %r2599; 2026-02-21T08:34:17.9364796Z mov.b64 {%r2602, %r2603}, %rd483; 2026-02-21T08:34:17.9364923Z cvt.rn.f16x2.f32 %r2604, %r2603, %r2602; 2026-02-21T08:34:17.9365279Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9365457Z cvt.u64.u32 %rd1040, %r1245; 2026-02-21T08:34:17.9365569Z cvt.u64.u32 %rd1041, %r1246; 2026-02-21T08:34:17.9365682Z shl.b64 %rd1042, %rd1041, 32; 2026-02-21T08:34:17.9365806Z or.b64 %rd1043, %rd1040, %rd1042; 2026-02-21T08:34:17.9366168Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9366285Z mov.b64 {%r2605, %r2606}, %rd1043; 2026-02-21T08:34:17.9366416Z cvt.rn.f16x2.f32 %r2607, %r2606, %r2605; 2026-02-21T08:34:17.9366781Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9366898Z cvt.u64.u32 %rd1044, %r1247; 2026-02-21T08:34:17.9367007Z cvt.u64.u32 %rd1045, %r1248; 2026-02-21T08:34:17.9367131Z 
shl.b64 %rd1046, %rd1045, 32; 2026-02-21T08:34:17.9367243Z or.b64 %rd1047, %rd1044, %rd1046; 2026-02-21T08:34:17.9367599Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9367723Z mov.b64 {%r2608, %r2609}, %rd1047; 2026-02-21T08:34:17.9367852Z cvt.rn.f16x2.f32 %r2610, %r2609, %r2608; 2026-02-21T08:34:17.9367960Z mov.b64 {%r2611, %r2612}, %rd487; 2026-02-21T08:34:17.9368088Z cvt.rn.f16x2.f32 %r2613, %r2612, %r2611; 2026-02-21T08:34:17.9368262Z mov.b64 {%r2614, %r2615}, %rd491; 2026-02-21T08:34:17.9368389Z cvt.rn.f16x2.f32 %r2616, %r2615, %r2614; 2026-02-21T08:34:17.9368744Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9368869Z cvt.u64.u32 %rd1048, %r1254; 2026-02-21T08:34:17.9368980Z cvt.u64.u32 %rd1049, %r1255; 2026-02-21T08:34:17.9369091Z shl.b64 %rd1050, %rd1049, 32; 2026-02-21T08:34:17.9369213Z or.b64 %rd1051, %rd1048, %rd1050; 2026-02-21T08:34:17.9369565Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9369683Z mov.b64 {%r2617, %r2618}, %rd1051; 2026-02-21T08:34:17.9369835Z cvt.rn.f16x2.f32 %r2619, %r2618, %r2617; 2026-02-21T08:34:17.9370271Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9370401Z cvt.u64.u32 %rd1052, %r1256; 2026-02-21T08:34:17.9370531Z cvt.u64.u32 %rd1053, %r1257; 2026-02-21T08:34:17.9370712Z shl.b64 %rd1054, %rd1053, 32; 2026-02-21T08:34:17.9370846Z or.b64 %rd1055, %rd1052, %rd1054; 2026-02-21T08:34:17.9371294Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9371510Z mov.b64 {%r2620, %r2621}, %rd1055; 2026-02-21T08:34:17.9371667Z cvt.rn.f16x2.f32 %r2622, %r2621, %r2620; 2026-02-21T08:34:17.9371804Z mov.b64 {%r2623, %r2624}, %rd495; 2026-02-21T08:34:17.9371958Z cvt.rn.f16x2.f32 %r2625, %r2624, %r2623; 2026-02-21T08:34:17.9372082Z mov.b64 {%r2626, %r2627}, %rd499; 2026-02-21T08:34:17.9372214Z 
cvt.rn.f16x2.f32 %r2628, %r2627, %r2626; 2026-02-21T08:34:17.9372581Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9372706Z cvt.u64.u32 %rd1056, %r1262; 2026-02-21T08:34:17.9372818Z cvt.u64.u32 %rd1057, %r1263; 2026-02-21T08:34:17.9372932Z shl.b64 %rd1058, %rd1057, 32; 2026-02-21T08:34:17.9373056Z or.b64 %rd1059, %rd1056, %rd1058; 2026-02-21T08:34:17.9373486Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9373608Z mov.b64 {%r2629, %r2630}, %rd1059; 2026-02-21T08:34:17.9373736Z cvt.rn.f16x2.f32 %r2631, %r2630, %r2629; 2026-02-21T08:34:17.9374116Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9374231Z cvt.u64.u32 %rd1060, %r1264; 2026-02-21T08:34:17.9374339Z cvt.u64.u32 %rd1061, %r1265; 2026-02-21T08:34:17.9374464Z shl.b64 %rd1062, %rd1061, 32; 2026-02-21T08:34:17.9374578Z or.b64 %rd1063, %rd1060, %rd1062; 2026-02-21T08:34:17.9375037Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9375169Z mov.b64 {%r2632, %r2633}, %rd1063; 2026-02-21T08:34:17.9375299Z cvt.rn.f16x2.f32 %r2634, %r2633, %r2632; 2026-02-21T08:34:17.9375412Z mov.b64 {%r2635, %r2636}, %rd503; 2026-02-21T08:34:17.9375546Z cvt.rn.f16x2.f32 %r2637, %r2636, %r2635; 2026-02-21T08:34:17.9375668Z mov.b64 {%r2638, %r2639}, %rd507; 2026-02-21T08:34:17.9375797Z cvt.rn.f16x2.f32 %r2640, %r2639, %r2638; 2026-02-21T08:34:17.9376168Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9376295Z cvt.u64.u32 %rd1064, %r1271; 2026-02-21T08:34:17.9376408Z cvt.u64.u32 %rd1065, %r1272; 2026-02-21T08:34:17.9376525Z shl.b64 %rd1066, %rd1065, 32; 2026-02-21T08:34:17.9376651Z or.b64 %rd1067, %rd1064, %rd1066; 2026-02-21T08:34:17.9377013Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9377130Z mov.b64 {%r2641, %r2642}, %rd1067; 
2026-02-21T08:34:17.9377261Z cvt.rn.f16x2.f32 %r2643, %r2642, %r2641; 2026-02-21T08:34:17.9377635Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9377749Z cvt.u64.u32 %rd1068, %r1273; 2026-02-21T08:34:17.9377922Z cvt.u64.u32 %rd1069, %r1274; 2026-02-21T08:34:17.9378050Z shl.b64 %rd1070, %rd1069, 32; 2026-02-21T08:34:17.9378170Z or.b64 %rd1071, %rd1068, %rd1070; 2026-02-21T08:34:17.9378538Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9378666Z mov.b64 {%r2644, %r2645}, %rd1071; 2026-02-21T08:34:17.9378796Z cvt.rn.f16x2.f32 %r2646, %r2645, %r2644; 2026-02-21T08:34:17.9378910Z mov.b64 {%r2647, %r2648}, %rd511; 2026-02-21T08:34:17.9379040Z cvt.rn.f16x2.f32 %r2649, %r2648, %r2647; 2026-02-21T08:34:17.9379163Z mov.b64 {%r2650, %r2651}, %rd515; 2026-02-21T08:34:17.9379300Z cvt.rn.f16x2.f32 %r2652, %r2651, %r2650; 2026-02-21T08:34:17.9379664Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9379786Z cvt.u64.u32 %rd1072, %r1279; 2026-02-21T08:34:17.9379903Z cvt.u64.u32 %rd1073, %r1280; 2026-02-21T08:34:17.9380024Z shl.b64 %rd1074, %rd1073, 32; 2026-02-21T08:34:17.9380140Z or.b64 %rd1075, %rd1072, %rd1074; 2026-02-21T08:34:17.9380518Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9380698Z mov.b64 {%r2653, %r2654}, %rd1075; 2026-02-21T08:34:17.9380828Z cvt.rn.f16x2.f32 %r2655, %r2654, %r2653; 2026-02-21T08:34:17.9381207Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9381323Z cvt.u64.u32 %rd1076, %r1281; 2026-02-21T08:34:17.9381434Z cvt.u64.u32 %rd1077, %r1282; 2026-02-21T08:34:17.9381556Z shl.b64 %rd1078, %rd1077, 32; 2026-02-21T08:34:17.9381673Z or.b64 %rd1079, %rd1076, %rd1078; 2026-02-21T08:34:17.9382034Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9382158Z mov.b64 
{%r2656, %r2657}, %rd1079; 2026-02-21T08:34:17.9382288Z cvt.rn.f16x2.f32 %r2658, %r2657, %r2656; 2026-02-21T08:34:17.9382402Z mov.b64 {%r2659, %r2660}, %rd519; 2026-02-21T08:34:17.9382534Z cvt.rn.f16x2.f32 %r2661, %r2660, %r2659; 2026-02-21T08:34:17.9382712Z mov.b64 {%r2662, %r2663}, %rd523; 2026-02-21T08:34:17.9382845Z cvt.rn.f16x2.f32 %r2664, %r2663, %r2662; 2026-02-21T08:34:17.9383211Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9383338Z cvt.u64.u32 %rd1080, %r1288; 2026-02-21T08:34:17.9383454Z cvt.u64.u32 %rd1081, %r1289; 2026-02-21T08:34:17.9383568Z shl.b64 %rd1082, %rd1081, 32; 2026-02-21T08:34:17.9383685Z or.b64 %rd1083, %rd1080, %rd1082; 2026-02-21T08:34:17.9384061Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9384239Z mov.b64 {%r2665, %r2666}, %rd1083; 2026-02-21T08:34:17.9384371Z cvt.rn.f16x2.f32 %r2667, %r2666, %r2665; 2026-02-21T08:34:17.9384811Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9384931Z cvt.u64.u32 %rd1084, %r1290; 2026-02-21T08:34:17.9385049Z cvt.u64.u32 %rd1085, %r1291; 2026-02-21T08:34:17.9385172Z shl.b64 %rd1086, %rd1085, 32; 2026-02-21T08:34:17.9385291Z or.b64 %rd1087, %rd1084, %rd1086; 2026-02-21T08:34:17.9385659Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9385785Z mov.b64 {%r2668, %r2669}, %rd1087; 2026-02-21T08:34:17.9385916Z cvt.rn.f16x2.f32 %r2670, %r2669, %r2668; 2026-02-21T08:34:17.9386030Z mov.b64 {%r2671, %r2672}, %rd527; 2026-02-21T08:34:17.9386161Z cvt.rn.f16x2.f32 %r2673, %r2672, %r2671; 2026-02-21T08:34:17.9386288Z mov.b64 {%r2674, %r2675}, %rd531; 2026-02-21T08:34:17.9386418Z cvt.rn.f16x2.f32 %r2676, %r2675, %r2674; 2026-02-21T08:34:17.9386781Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9386906Z cvt.u64.u32 %rd1088, %r1296; 2026-02-21T08:34:17.9387079Z 
cvt.u64.u32 %rd1089, %r1297; 2026-02-21T08:34:17.9387191Z shl.b64 %rd1090, %rd1089, 32; 2026-02-21T08:34:17.9387311Z or.b64 %rd1091, %rd1088, %rd1090; 2026-02-21T08:34:17.9387690Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9387808Z mov.b64 {%r2677, %r2678}, %rd1091; 2026-02-21T08:34:17.9387941Z cvt.rn.f16x2.f32 %r2679, %r2678, %r2677; 2026-02-21T08:34:17.9388319Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9388435Z cvt.u64.u32 %rd1092, %r1298; 2026-02-21T08:34:17.9388545Z cvt.u64.u32 %rd1093, %r1299; 2026-02-21T08:34:17.9388670Z shl.b64 %rd1094, %rd1093, 32; 2026-02-21T08:34:17.9388788Z or.b64 %rd1095, %rd1092, %rd1094; 2026-02-21T08:34:17.9389158Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9389274Z mov.b64 {%r2680, %r2681}, %rd1095; 2026-02-21T08:34:17.9389418Z cvt.rn.f16x2.f32 %r2682, %r2681, %r2680; 2026-02-21T08:34:17.9389532Z mov.b64 {%r2683, %r2684}, %rd535; 2026-02-21T08:34:17.9389666Z cvt.rn.f16x2.f32 %r2685, %r2684, %r2683; 2026-02-21T08:34:17.9389849Z mov.b64 {%r2686, %r2687}, %rd539; 2026-02-21T08:34:17.9389983Z cvt.rn.f16x2.f32 %r2688, %r2687, %r2686; 2026-02-21T08:34:17.9390357Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9390493Z cvt.u64.u32 %rd1096, %r1305; 2026-02-21T08:34:17.9390608Z cvt.u64.u32 %rd1097, %r1306; 2026-02-21T08:34:17.9390725Z shl.b64 %rd1098, %rd1097, 32; 2026-02-21T08:34:17.9390841Z or.b64 %rd1099, %rd1096, %rd1098; 2026-02-21T08:34:17.9391221Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9391339Z mov.b64 {%r2689, %r2690}, %rd1099; 2026-02-21T08:34:17.9391471Z cvt.rn.f16x2.f32 %r2691, %r2690, %r2689; 2026-02-21T08:34:17.9391848Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9392029Z cvt.u64.u32 %rd1100, %r1307; 
2026-02-21T08:34:17.9392152Z cvt.u64.u32 %rd1101, %r1308; 2026-02-21T08:34:17.9392277Z shl.b64 %rd1102, %rd1101, 32; 2026-02-21T08:34:17.9392393Z or.b64 %rd1103, %rd1100, %rd1102; 2026-02-21T08:34:17.9392758Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9392875Z mov.b64 {%r2692, %r2693}, %rd1103; 2026-02-21T08:34:17.9393019Z cvt.rn.f16x2.f32 %r2694, %r2693, %r2692; 2026-02-21T08:34:17.9393133Z mov.b64 {%r2695, %r2696}, %rd543; 2026-02-21T08:34:17.9393340Z cvt.rn.f16x2.f32 %r2697, %r2696, %r2695; 2026-02-21T08:34:17.9393468Z mov.b64 {%r2698, %r2699}, %rd547; 2026-02-21T08:34:17.9393602Z cvt.rn.f16x2.f32 %r2700, %r2699, %r2698; 2026-02-21T08:34:17.9393966Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9394096Z cvt.u64.u32 %rd1104, %r1313; 2026-02-21T08:34:17.9394211Z cvt.u64.u32 %rd1105, %r1314; 2026-02-21T08:34:17.9394328Z shl.b64 %rd1106, %rd1105, 32; 2026-02-21T08:34:17.9394448Z or.b64 %rd1107, %rd1104, %rd1106; 2026-02-21T08:34:17.9394957Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9395077Z mov.b64 {%r2701, %r2702}, %rd1107; 2026-02-21T08:34:17.9395209Z cvt.rn.f16x2.f32 %r2703, %r2702, %r2701; 2026-02-21T08:34:17.9395586Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9395702Z cvt.u64.u32 %rd1108, %r1315; 2026-02-21T08:34:17.9395821Z cvt.u64.u32 %rd1109, %r1316; 2026-02-21T08:34:17.9395947Z shl.b64 %rd1110, %rd1109, 32; 2026-02-21T08:34:17.9396060Z or.b64 %rd1111, %rd1108, %rd1110; 2026-02-21T08:34:17.9396426Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9396600Z mov.b64 {%r2704, %r2705}, %rd1111; 2026-02-21T08:34:17.9396742Z cvt.rn.f16x2.f32 %r2706, %r2705, %r2704; 2026-02-21T08:34:17.9396859Z mov.b64 {%r2707, %r2708}, %rd551; 2026-02-21T08:34:17.9396998Z cvt.rn.f16x2.f32 %r2709, %r2708, %r2707; 
2026-02-21T08:34:17.9397122Z mov.b64 {%r2710, %r2711}, %rd555; 2026-02-21T08:34:17.9397252Z cvt.rn.f16x2.f32 %r2712, %r2711, %r2710; 2026-02-21T08:34:17.9397614Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9397737Z cvt.u64.u32 %rd1112, %r1322; 2026-02-21T08:34:17.9397851Z cvt.u64.u32 %rd1113, %r1323; 2026-02-21T08:34:17.9397966Z shl.b64 %rd1114, %rd1113, 32; 2026-02-21T08:34:17.9398083Z or.b64 %rd1115, %rd1112, %rd1114; 2026-02-21T08:34:17.9398458Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9398574Z mov.b64 {%r2713, %r2714}, %rd1115; 2026-02-21T08:34:17.9398701Z cvt.rn.f16x2.f32 %r2715, %r2714, %r2713; 2026-02-21T08:34:17.9399085Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9399257Z cvt.u64.u32 %rd1116, %r1324; 2026-02-21T08:34:17.9399370Z cvt.u64.u32 %rd1117, %r1325; 2026-02-21T08:34:17.9399494Z shl.b64 %rd1118, %rd1117, 32; 2026-02-21T08:34:17.9399610Z or.b64 %rd1119, %rd1116, %rd1118; 2026-02-21T08:34:17.9399975Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9400091Z mov.b64 {%r2716, %r2717}, %rd1119; 2026-02-21T08:34:17.9400234Z cvt.rn.f16x2.f32 %r2718, %r2717, %r2716; 2026-02-21T08:34:17.9400352Z mov.b64 {%r2719, %r2720}, %rd559; 2026-02-21T08:34:17.9400480Z cvt.rn.f16x2.f32 %r2721, %r2720, %r2719; 2026-02-21T08:34:17.9400604Z mov.b64 {%r2722, %r2723}, %rd563; 2026-02-21T08:34:17.9400728Z cvt.rn.f16x2.f32 %r2724, %r2723, %r2722; 2026-02-21T08:34:17.9401100Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9401225Z cvt.u64.u32 %rd1120, %r1330; 2026-02-21T08:34:17.9401394Z cvt.u64.u32 %rd1121, %r1331; 2026-02-21T08:34:17.9401514Z shl.b64 %rd1122, %rd1121, 32; 2026-02-21T08:34:17.9401631Z or.b64 %rd1123, %rd1120, %rd1122; 2026-02-21T08:34:17.9402010Z .loc 1 51 27 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9402128Z mov.b64 {%r2725, %r2726}, %rd1123; 2026-02-21T08:34:17.9402260Z cvt.rn.f16x2.f32 %r2727, %r2726, %r2725; 2026-02-21T08:34:17.9402640Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9402811Z cvt.u64.u32 %rd1124, %r1332; 2026-02-21T08:34:17.9402927Z cvt.u64.u32 %rd1125, %r1333; 2026-02-21T08:34:17.9403052Z shl.b64 %rd1126, %rd1125, 32; 2026-02-21T08:34:17.9403169Z or.b64 %rd1127, %rd1124, %rd1126; 2026-02-21T08:34:17.9403540Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9403658Z mov.b64 {%r2728, %r2729}, %rd1127; 2026-02-21T08:34:17.9403804Z cvt.rn.f16x2.f32 %r2730, %r2729, %r2728; 2026-02-21T08:34:17.9403921Z mov.b64 {%r2731, %r2732}, %rd567; 2026-02-21T08:34:17.9404054Z cvt.rn.f16x2.f32 %r2733, %r2732, %r2731; 2026-02-21T08:34:17.9404180Z mov.b64 {%r2734, %r2735}, %rd571; 2026-02-21T08:34:17.9404313Z cvt.rn.f16x2.f32 %r2736, %r2735, %r2734; 2026-02-21T08:34:17.9404723Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9404849Z cvt.u64.u32 %rd1128, %r1339; 2026-02-21T08:34:17.9404966Z cvt.u64.u32 %rd1129, %r1340; 2026-02-21T08:34:17.9405082Z shl.b64 %rd1130, %rd1129, 32; 2026-02-21T08:34:17.9405198Z or.b64 %rd1131, %rd1128, %rd1130; 2026-02-21T08:34:17.9405572Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9405687Z mov.b64 {%r2737, %r2738}, %rd1131; 2026-02-21T08:34:17.9405877Z cvt.rn.f16x2.f32 %r2739, %r2738, %r2737; 2026-02-21T08:34:17.9406255Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9406375Z cvt.u64.u32 %rd1132, %r1341; 2026-02-21T08:34:17.9406491Z cvt.u64.u32 %rd1133, %r1342; 2026-02-21T08:34:17.9406617Z shl.b64 %rd1134, %rd1133, 32; 2026-02-21T08:34:17.9406731Z or.b64 %rd1135, %rd1132, %rd1134; 
2026-02-21T08:34:17.9407097Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9407215Z mov.b64 {%r2740, %r2741}, %rd1135; 2026-02-21T08:34:17.9407356Z cvt.rn.f16x2.f32 %r2742, %r2741, %r2740; 2026-02-21T08:34:17.9407474Z mov.b64 {%r2743, %r2744}, %rd575; 2026-02-21T08:34:17.9407603Z cvt.rn.f16x2.f32 %r2745, %r2744, %r2743; 2026-02-21T08:34:17.9407726Z mov.b64 {%r2746, %r2747}, %rd579; 2026-02-21T08:34:17.9407858Z cvt.rn.f16x2.f32 %r2748, %r2747, %r2746; 2026-02-21T08:34:17.9408221Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9408350Z cvt.u64.u32 %rd1136, %r1347; 2026-02-21T08:34:17.9408527Z cvt.u64.u32 %rd1137, %r1348; 2026-02-21T08:34:17.9408641Z shl.b64 %rd1138, %rd1137, 32; 2026-02-21T08:34:17.9408757Z or.b64 %rd1139, %rd1136, %rd1138; 2026-02-21T08:34:17.9409135Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9409251Z mov.b64 {%r2749, %r2750}, %rd1139; 2026-02-21T08:34:17.9409379Z cvt.rn.f16x2.f32 %r2751, %r2750, %r2749; 2026-02-21T08:34:17.9409759Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9409876Z cvt.u64.u32 %rd1140, %r1349; 2026-02-21T08:34:17.9409986Z cvt.u64.u32 %rd1141, %r1350; 2026-02-21T08:34:17.9410110Z shl.b64 %rd1142, %rd1141, 32; 2026-02-21T08:34:17.9410225Z or.b64 %rd1143, %rd1140, %rd1142; 2026-02-21T08:34:17.9410594Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9410764Z mov.b64 {%r2752, %r2753}, %rd1143; 2026-02-21T08:34:17.9410911Z cvt.rn.f16x2.f32 %r2754, %r2753, %r2752; 2026-02-21T08:34:17.9411026Z mov.b64 {%r2755, %r2756}, %rd583; 2026-02-21T08:34:17.9411160Z cvt.rn.f16x2.f32 %r2757, %r2756, %r2755; 2026-02-21T08:34:17.9411285Z mov.b64 {%r2758, %r2759}, %rd587; 2026-02-21T08:34:17.9411415Z cvt.rn.f16x2.f32 %r2760, %r2759, %r2758; 2026-02-21T08:34:17.9411784Z .loc 1 49 52 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9411963Z cvt.u64.u32 %rd1144, %r1356; 2026-02-21T08:34:17.9412077Z cvt.u64.u32 %rd1145, %r1357; 2026-02-21T08:34:17.9412195Z shl.b64 %rd1146, %rd1145, 32; 2026-02-21T08:34:17.9412311Z or.b64 %rd1147, %rd1144, %rd1146; 2026-02-21T08:34:17.9412686Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9412804Z mov.b64 {%r2761, %r2762}, %rd1147; 2026-02-21T08:34:17.9412937Z cvt.rn.f16x2.f32 %r2763, %r2762, %r2761; 2026-02-21T08:34:17.9413322Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9413439Z cvt.u64.u32 %rd1148, %r1358; 2026-02-21T08:34:17.9413554Z cvt.u64.u32 %rd1149, %r1359; 2026-02-21T08:34:17.9413677Z shl.b64 %rd1150, %rd1149, 32; 2026-02-21T08:34:17.9413792Z or.b64 %rd1151, %rd1148, %rd1150; 2026-02-21T08:34:17.9414161Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9414281Z mov.b64 {%r2764, %r2765}, %rd1151; 2026-02-21T08:34:17.9414424Z cvt.rn.f16x2.f32 %r2766, %r2765, %r2764; 2026-02-21T08:34:17.9414536Z mov.b64 {%r2767, %r2768}, %rd591; 2026-02-21T08:34:17.9414665Z cvt.rn.f16x2.f32 %r2769, %r2768, %r2767; 2026-02-21T08:34:17.9414840Z mov.b64 {%r2770, %r2771}, %rd595; 2026-02-21T08:34:17.9415025Z cvt.rn.f16x2.f32 %r2772, %r2771, %r2770; 2026-02-21T08:34:17.9415396Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9415526Z cvt.u64.u32 %rd1152, %r1364; 2026-02-21T08:34:17.9415642Z cvt.u64.u32 %rd1153, %r1365; 2026-02-21T08:34:17.9415758Z shl.b64 %rd1154, %rd1153, 32; 2026-02-21T08:34:17.9415876Z or.b64 %rd1155, %rd1152, %rd1154; 2026-02-21T08:34:17.9416253Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9416370Z mov.b64 {%r2773, %r2774}, %rd1155; 2026-02-21T08:34:17.9416500Z cvt.rn.f16x2.f32 %r2775, %r2774, %r2773; 
2026-02-21T08:34:17.9416874Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9416990Z cvt.u64.u32 %rd1156, %r1366; 2026-02-21T08:34:17.9417104Z cvt.u64.u32 %rd1157, %r1367; 2026-02-21T08:34:17.9417230Z shl.b64 %rd1158, %rd1157, 32; 2026-02-21T08:34:17.9417348Z or.b64 %rd1159, %rd1156, %rd1158; 2026-02-21T08:34:17.9417718Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9417897Z mov.b64 {%r2776, %r2777}, %rd1159; 2026-02-21T08:34:17.9418033Z cvt.rn.f16x2.f32 %r2778, %r2777, %r2776; 2026-02-21T08:34:17.9418148Z mov.b64 {%r2779, %r2780}, %rd599; 2026-02-21T08:34:17.9418279Z cvt.rn.f16x2.f32 %r2781, %r2780, %r2779; 2026-02-21T08:34:17.9418403Z mov.b64 {%r2782, %r2783}, %rd603; 2026-02-21T08:34:17.9418533Z cvt.rn.f16x2.f32 %r2784, %r2783, %r2782; 2026-02-21T08:34:17.9418905Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9419031Z cvt.u64.u32 %rd1160, %r1373; 2026-02-21T08:34:17.9419147Z cvt.u64.u32 %rd1161, %r1374; 2026-02-21T08:34:17.9419260Z shl.b64 %rd1162, %rd1161, 32; 2026-02-21T08:34:17.9419372Z or.b64 %rd1163, %rd1160, %rd1162; 2026-02-21T08:34:17.9419754Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9419929Z mov.b64 {%r2785, %r2786}, %rd1163; 2026-02-21T08:34:17.9420063Z cvt.rn.f16x2.f32 %r2787, %r2786, %r2785; 2026-02-21T08:34:17.9420187Z mov.b64 {%r2788, %r2789}, %rd607; 2026-02-21T08:34:17.9420316Z cvt.rn.f16x2.f32 %r2790, %r2789, %r2788; 2026-02-21T08:34:17.9420430Z mov.b64 {%r2791, %r2792}, %rd611; 2026-02-21T08:34:17.9420561Z cvt.rn.f16x2.f32 %r2793, %r2792, %r2791; 2026-02-21T08:34:17.9420682Z mov.b64 {%r2794, %r2795}, %rd615; 2026-02-21T08:34:17.9420809Z cvt.rn.f16x2.f32 %r2796, %r2795, %r2794; 2026-02-21T08:34:17.9421174Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9421347Z cvt.u64.u32 %rd1164, 
%r1381; 2026-02-21T08:34:17.9421465Z cvt.u64.u32 %rd1165, %r1382; 2026-02-21T08:34:17.9421583Z shl.b64 %rd1166, %rd1165, 32; 2026-02-21T08:34:17.9421711Z or.b64 %rd1167, %rd1164, %rd1166; 2026-02-21T08:34:17.9422084Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9422204Z mov.b64 {%r2797, %r2798}, %rd1167; 2026-02-21T08:34:17.9422339Z cvt.rn.f16x2.f32 %r2799, %r2798, %r2797; 2026-02-21T08:34:17.9422465Z mov.b64 {%r2800, %r2801}, %rd619; 2026-02-21T08:34:17.9422596Z cvt.rn.f16x2.f32 %r2802, %r2801, %r2800; 2026-02-21T08:34:17.9422705Z mov.b64 {%r2803, %r2804}, %rd623; 2026-02-21T08:34:17.9422847Z cvt.rn.f16x2.f32 %r2805, %r2804, %r2803; 2026-02-21T08:34:17.9422960Z mov.b64 {%r2806, %r2807}, %rd627; 2026-02-21T08:34:17.9423092Z cvt.rn.f16x2.f32 %r2808, %r2807, %r2806; 2026-02-21T08:34:17.9423472Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9423588Z cvt.u64.u32 %rd1168, %r1390; 2026-02-21T08:34:17.9423701Z cvt.u64.u32 %rd1169, %r1391; 2026-02-21T08:34:17.9423815Z shl.b64 %rd1170, %rd1169, 32; 2026-02-21T08:34:17.9423941Z or.b64 %rd1171, %rd1168, %rd1170; 2026-02-21T08:34:17.9424378Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9424498Z mov.b64 {%r2809, %r2810}, %rd1171; 2026-02-21T08:34:17.9424635Z cvt.rn.f16x2.f32 %r2811, %r2810, %r2809; 2026-02-21T08:34:17.9424810Z mov.b64 {%r2812, %r2813}, %rd631; 2026-02-21T08:34:17.9424941Z cvt.rn.f16x2.f32 %r2814, %r2813, %r2812; 2026-02-21T08:34:17.9425066Z mov.b64 {%r2815, %r2816}, %rd635; 2026-02-21T08:34:17.9425195Z cvt.rn.f16x2.f32 %r2817, %r2816, %r2815; 2026-02-21T08:34:17.9425310Z mov.b64 {%r2818, %r2819}, %rd639; 2026-02-21T08:34:17.9425443Z cvt.rn.f16x2.f32 %r2820, %r2819, %r2818; 2026-02-21T08:34:17.9425818Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9425933Z cvt.u64.u32 %rd1172, %r1398; 
2026-02-21T08:34:17.9426045Z cvt.u64.u32 %rd1173, %r1399; 2026-02-21T08:34:17.9426169Z shl.b64 %rd1174, %rd1173, 32; 2026-02-21T08:34:17.9426284Z or.b64 %rd1175, %rd1172, %rd1174; 2026-02-21T08:34:17.9426646Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9426865Z mov.b64 {%r2821, %r2822}, %rd1175; 2026-02-21T08:34:17.9426997Z cvt.rn.f16x2.f32 %r2823, %r2822, %r2821; 2026-02-21T08:34:17.9427364Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9427482Z cvt.u64.u32 %rd1176, %r1400; 2026-02-21T08:34:17.9427606Z cvt.u64.u32 %rd1177, %r1401; 2026-02-21T08:34:17.9427720Z shl.b64 %rd1178, %rd1177, 32; 2026-02-21T08:34:17.9427835Z or.b64 %rd1179, %rd1176, %rd1178; 2026-02-21T08:34:17.9428210Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9428326Z mov.b64 {%r2824, %r2825}, %rd1179; 2026-02-21T08:34:17.9428459Z cvt.rn.f16x2.f32 %r2826, %r2825, %r2824; 2026-02-21T08:34:17.9428580Z mov.b64 {%r2827, %r2828}, %rd643; 2026-02-21T08:34:17.9428715Z cvt.rn.f16x2.f32 %r2829, %r2828, %r2827; 2026-02-21T08:34:17.9428910Z mov.b64 {%r2830, %r2831}, %rd647; 2026-02-21T08:34:17.9429045Z cvt.rn.f16x2.f32 %r2832, %r2831, %r2830; 2026-02-21T08:34:17.9429420Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9429536Z cvt.u64.u32 %rd1180, %r1407; 2026-02-21T08:34:17.9429652Z cvt.u64.u32 %rd1181, %r1408; 2026-02-21T08:34:17.9429778Z shl.b64 %rd1182, %rd1181, 32; 2026-02-21T08:34:17.9429892Z or.b64 %rd1183, %rd1180, %rd1182; 2026-02-21T08:34:17.9430262Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9430442Z mov.b64 {%r2833, %r2834}, %rd1183; 2026-02-21T08:34:17.9430574Z cvt.rn.f16x2.f32 %r2835, %r2834, %r2833; 2026-02-21T08:34:17.9430688Z mov.b64 {%r2836, %r2837}, %rd651; 2026-02-21T08:34:17.9430817Z cvt.rn.f16x2.f32 %r2838, %r2837, %r2836; 
2026-02-21T08:34:17.9430944Z mov.b64 {%r2839, %r2840}, %rd655; 2026-02-21T08:34:17.9431074Z cvt.rn.f16x2.f32 %r2841, %r2840, %r2839; 2026-02-21T08:34:17.9431192Z mov.b64 {%r2842, %r2843}, %rd659; 2026-02-21T08:34:17.9431332Z cvt.rn.f16x2.f32 %r2844, %r2843, %r2842; 2026-02-21T08:34:17.9431706Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9431822Z cvt.u64.u32 %rd1184, %r1415; 2026-02-21T08:34:17.9431940Z cvt.u64.u32 %rd1185, %r1416; 2026-02-21T08:34:17.9432063Z shl.b64 %rd1186, %rd1185, 32; 2026-02-21T08:34:17.9432177Z or.b64 %rd1187, %rd1184, %rd1186; 2026-02-21T08:34:17.9432541Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9432670Z mov.b64 {%r2845, %r2846}, %rd1187; 2026-02-21T08:34:17.9432802Z cvt.rn.f16x2.f32 %r2847, %r2846, %r2845; 2026-02-21T08:34:17.9432915Z mov.b64 {%r2848, %r2849}, %rd663; 2026-02-21T08:34:17.9433054Z cvt.rn.f16x2.f32 %r2850, %r2849, %r2848; 2026-02-21T08:34:17.9433222Z mov.b64 {%r2851, %r2852}, %rd667; 2026-02-21T08:34:17.9433355Z cvt.rn.f16x2.f32 %r2853, %r2852, %r2851; 2026-02-21T08:34:17.9433473Z mov.b64 {%r2854, %r2855}, %rd671; 2026-02-21T08:34:17.9433610Z cvt.rn.f16x2.f32 %r2856, %r2855, %r2854; 2026-02-21T08:34:17.9433977Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9434092Z cvt.u64.u32 %rd1188, %r1424; 2026-02-21T08:34:17.9434214Z cvt.u64.u32 %rd1189, %r1425; 2026-02-21T08:34:17.9434327Z shl.b64 %rd1190, %rd1189, 32; 2026-02-21T08:34:17.9434464Z or.b64 %rd1191, %rd1188, %rd1190; 2026-02-21T08:34:17.9434917Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9435037Z mov.b64 {%r2857, %r2858}, %rd1191; 2026-02-21T08:34:17.9435167Z cvt.rn.f16x2.f32 %r2859, %r2858, %r2857; 2026-02-21T08:34:17.9435284Z mov.b64 {%r2860, %r2861}, %rd675; 2026-02-21T08:34:17.9435426Z cvt.rn.f16x2.f32 %r2862, %r2861, %r2860; 2026-02-21T08:34:17.9435540Z 
mov.b64 {%r2863, %r2864}, %rd679; 2026-02-21T08:34:17.9435674Z cvt.rn.f16x2.f32 %r2865, %r2864, %r2863; 2026-02-21T08:34:17.9435855Z mov.b64 {%r2866, %r2867}, %rd683; 2026-02-21T08:34:17.9435986Z cvt.rn.f16x2.f32 %r2868, %r2867, %r2866; 2026-02-21T08:34:17.9436354Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9436481Z cvt.u64.u32 %rd1192, %r1432; 2026-02-21T08:34:17.9436594Z cvt.u64.u32 %rd1193, %r1433; 2026-02-21T08:34:17.9436711Z shl.b64 %rd1194, %rd1193, 32; 2026-02-21T08:34:17.9436825Z or.b64 %rd1195, %rd1192, %rd1194; 2026-02-21T08:34:17.9437204Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9437321Z mov.b64 {%r2869, %r2870}, %rd1195; 2026-02-21T08:34:17.9437452Z cvt.rn.f16x2.f32 %r2871, %r2870, %r2869; 2026-02-21T08:34:17.9437575Z mov.b64 {%r2872, %r2873}, %rd687; 2026-02-21T08:34:17.9437709Z cvt.rn.f16x2.f32 %r2874, %r2873, %r2872; 2026-02-21T08:34:17.9437909Z mov.b64 {%r2875, %r2876}, %rd691; 2026-02-21T08:34:17.9438075Z cvt.rn.f16x2.f32 %r2877, %r2876, %r2875; 2026-02-21T08:34:17.9438213Z mov.b64 {%r2878, %r2879}, %rd695; 2026-02-21T08:34:17.9438377Z cvt.rn.f16x2.f32 %r2880, %r2879, %r2878; 2026-02-21T08:34:17.9438744Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9438908Z cvt.u64.u32 %rd1196, %r1441; 2026-02-21T08:34:17.9439044Z cvt.u64.u32 %rd1197, %r1442; 2026-02-21T08:34:17.9439186Z shl.b64 %rd1198, %rd1197, 32; 2026-02-21T08:34:17.9439395Z or.b64 %rd1199, %rd1196, %rd1198; 2026-02-21T08:34:17.9439777Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9439900Z mov.b64 {%r2881, %r2882}, %rd1199; 2026-02-21T08:34:17.9440078Z cvt.rn.f16x2.f32 %r2883, %r2882, %r2881; 2026-02-21T08:34:17.9440240Z mov.b64 {%r2884, %r2885}, %rd699; 2026-02-21T08:34:17.9440430Z cvt.rn.f16x2.f32 %r2886, %r2885, %r2884; 2026-02-21T08:34:17.9440581Z mov.b64 {%r2887, %r2888}, 
%rd703; 2026-02-21T08:34:17.9440796Z cvt.rn.f16x2.f32 %r2889, %r2888, %r2887; 2026-02-21T08:34:17.9440963Z mov.b64 {%r2890, %r2891}, %rd707; 2026-02-21T08:34:17.9441155Z cvt.rn.f16x2.f32 %r2892, %r2891, %r2890; 2026-02-21T08:34:17.9441618Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9441788Z cvt.u64.u32 %rd1200, %r1449; 2026-02-21T08:34:17.9441944Z cvt.u64.u32 %rd1201, %r1450; 2026-02-21T08:34:17.9442092Z shl.b64 %rd1202, %rd1201, 32; 2026-02-21T08:34:17.9442234Z or.b64 %rd1203, %rd1200, %rd1202; 2026-02-21T08:34:17.9442609Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9442737Z mov.b64 {%r2893, %r2894}, %rd1203; 2026-02-21T08:34:17.9442978Z cvt.rn.f16x2.f32 %r2895, %r2894, %r2893; 2026-02-21T08:34:17.9443114Z mov.b64 {%r2896, %r2897}, %rd711; 2026-02-21T08:34:17.9443280Z cvt.rn.f16x2.f32 %r2898, %r2897, %r2896; 2026-02-21T08:34:17.9443436Z mov.b64 {%r2899, %r2900}, %rd715; 2026-02-21T08:34:17.9443601Z cvt.rn.f16x2.f32 %r2901, %r2900, %r2899; 2026-02-21T08:34:17.9443736Z mov.b64 {%r2902, %r2903}, %rd719; 2026-02-21T08:34:17.9443886Z cvt.rn.f16x2.f32 %r2904, %r2903, %r2902; 2026-02-21T08:34:17.9444274Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9444419Z cvt.u64.u32 %rd1204, %r1458; 2026-02-21T08:34:17.9444554Z cvt.u64.u32 %rd1205, %r1459; 2026-02-21T08:34:17.9444751Z shl.b64 %rd1206, %rd1205, 32; 2026-02-21T08:34:17.9444880Z or.b64 %rd1207, %rd1204, %rd1206; 2026-02-21T08:34:17.9445250Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9445389Z mov.b64 {%r2905, %r2906}, %rd1207; 2026-02-21T08:34:17.9445523Z cvt.rn.f16x2.f32 %r2907, %r2906, %r2905; 2026-02-21T08:34:17.9445635Z mov.b64 {%r2908, %r2909}, %rd723; 2026-02-21T08:34:17.9445781Z cvt.rn.f16x2.f32 %r2910, %r2909, %r2908; 2026-02-21T08:34:17.9445989Z mov.b64 {%r2911, %r2912}, %rd727; 
2026-02-21T08:34:17.9446120Z cvt.rn.f16x2.f32 %r2913, %r2912, %r2911; 2026-02-21T08:34:17.9446245Z mov.b64 {%r2914, %r2915}, %rd731; 2026-02-21T08:34:17.9446401Z cvt.rn.f16x2.f32 %r2916, %r2915, %r2914; 2026-02-21T08:34:17.9446778Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9446893Z cvt.u64.u32 %rd1208, %r1466; 2026-02-21T08:34:17.9447031Z cvt.u64.u32 %rd1209, %r1467; 2026-02-21T08:34:17.9447157Z shl.b64 %rd1210, %rd1209, 32; 2026-02-21T08:34:17.9447284Z or.b64 %rd1211, %rd1208, %rd1210; 2026-02-21T08:34:17.9447653Z .loc 1 51 27 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:51:27 2026-02-21T08:34:17.9447790Z mov.b64 {%r2917, %r2918}, %rd1211; 2026-02-21T08:34:17.9447922Z cvt.rn.f16x2.f32 %r2919, %r2918, %r2917; 2026-02-21T08:34:17.9448099Z mov.b64 {%r2920, %r2921}, %rd735; 2026-02-21T08:34:17.9448269Z cvt.rn.f16x2.f32 %r2922, %r2921, %r2920; 2026-02-21T08:34:17.9448380Z mov.b64 {%r2923, %r2924}, %rd739; 2026-02-21T08:34:17.9448512Z cvt.rn.f16x2.f32 %r2925, %r2924, %r2923; 2026-02-21T08:34:17.9448661Z mov.b64 {%r2926, %r2927}, %rd743; 2026-02-21T08:34:17.9448800Z cvt.rn.f16x2.f32 %r2928, %r2927, %r2926; 2026-02-21T08:34:17.9449176Z .loc 1 52 82 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:52:82 2026-02-21T08:34:17.9449407Z st.shared.v4.b32 [%r2083], {%r2163, %r2175, %r2187, %r2199}; 2026-02-21T08:34:17.9449677Z st.shared.v4.b32 [%r2082], {%r2211, %r2223, %r2235, %r2247}; 2026-02-21T08:34:17.9449886Z st.shared.v4.b32 [%r2080], {%r2259, %r2271, %r2283, %r2295}; 2026-02-21T08:34:17.9450094Z st.shared.v4.b32 [%r2078], {%r2307, %r2319, %r2331, %r2343}; 2026-02-21T08:34:17.9450310Z st.shared.v4.b32 [%r2076], {%r2355, %r2367, %r2379, %r2391}; 2026-02-21T08:34:17.9450501Z st.shared.v4.b32 [%r2074], {%r2403, %r2415, %r2427, %r2439}; 2026-02-21T08:34:17.9450710Z st.shared.v4.b32 [%r2072], {%r2451, %r2463, %r2475, %r2487}; 2026-02-21T08:34:17.9450922Z st.shared.v4.b32 [%r2070], {%r2499, %r2511, 
%r2523, %r2535}; 2026-02-21T08:34:17.9451033Z bar.sync 0, 128; 2026-02-21T08:34:17.9451151Z // begin inline asm 2026-02-21T08:34:17.9451529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1795, %r1799, %r1803, %r1807}, [%r1479]; 2026-02-21T08:34:17.9451656Z // end inline asm 2026-02-21T08:34:17.9451762Z // begin inline asm 2026-02-21T08:34:17.9452126Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1811, %r1815, %r1819, %r1823}, [%r1484]; 2026-02-21T08:34:17.9452239Z // end inline asm 2026-02-21T08:34:17.9452346Z // begin inline asm 2026-02-21T08:34:17.9452672Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1827, %r1831, %r1835, %r1839}, [%r1489]; 2026-02-21T08:34:17.9452783Z // end inline asm 2026-02-21T08:34:17.9452943Z // begin inline asm 2026-02-21T08:34:17.9453265Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1843, %r1847, %r1851, %r1855}, [%r1494]; 2026-02-21T08:34:17.9453372Z // end inline asm 2026-02-21T08:34:17.9453491Z // begin inline asm 2026-02-21T08:34:17.9453809Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1859, %r1863, %r1867, %r1871}, [%r1499]; 2026-02-21T08:34:17.9453910Z // end inline asm 2026-02-21T08:34:17.9454023Z // begin inline asm 2026-02-21T08:34:17.9454341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1875, %r1879, %r1883, %r1887}, [%r1504]; 2026-02-21T08:34:17.9454445Z // end inline asm 2026-02-21T08:34:17.9454556Z // begin inline asm 2026-02-21T08:34:17.9454935Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1891, %r1895, %r1899, %r1903}, [%r1509]; 2026-02-21T08:34:17.9455042Z // end inline asm 2026-02-21T08:34:17.9455149Z // begin inline asm 2026-02-21T08:34:17.9455477Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1907, %r1911, %r1915, %r1919}, [%r1514]; 2026-02-21T08:34:17.9455579Z // end inline asm 2026-02-21T08:34:17.9455683Z bar.sync 0, 128; 2026-02-21T08:34:17.9455890Z st.shared.v4.b32 [%r2083], {%r2547, %r2559, %r2571, %r2583}; 2026-02-21T08:34:17.9456135Z st.shared.v4.b32 [%r2082], {%r2595, %r2607, %r2619, %r2631}; 2026-02-21T08:34:17.9456323Z 
st.shared.v4.b32 [%r2080], {%r2643, %r2655, %r2667, %r2679}; 2026-02-21T08:34:17.9456511Z st.shared.v4.b32 [%r2078], {%r2691, %r2703, %r2715, %r2727}; 2026-02-21T08:34:17.9456708Z st.shared.v4.b32 [%r2076], {%r2739, %r2751, %r2763, %r2775}; 2026-02-21T08:34:17.9456895Z st.shared.v4.b32 [%r2074], {%r2787, %r2799, %r2811, %r2823}; 2026-02-21T08:34:17.9457085Z st.shared.v4.b32 [%r2072], {%r2835, %r2847, %r2859, %r2871}; 2026-02-21T08:34:17.9457280Z st.shared.v4.b32 [%r2070], {%r2883, %r2895, %r2907, %r2919}; 2026-02-21T08:34:17.9457383Z bar.sync 0, 128; 2026-02-21T08:34:17.9457488Z // begin inline asm 2026-02-21T08:34:17.9457829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1923, %r1927, %r1931, %r1935}, [%r1479]; 2026-02-21T08:34:17.9457930Z // end inline asm 2026-02-21T08:34:17.9458035Z // begin inline asm 2026-02-21T08:34:17.9458413Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1939, %r1943, %r1947, %r1951}, [%r1484]; 2026-02-21T08:34:17.9458530Z // end inline asm 2026-02-21T08:34:17.9458636Z // begin inline asm 2026-02-21T08:34:17.9458957Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1955, %r1959, %r1963, %r1967}, [%r1489]; 2026-02-21T08:34:17.9459076Z // end inline asm 2026-02-21T08:34:17.9459184Z // begin inline asm 2026-02-21T08:34:17.9459502Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1971, %r1975, %r1979, %r1983}, [%r1494]; 2026-02-21T08:34:17.9459667Z // end inline asm 2026-02-21T08:34:17.9459774Z // begin inline asm 2026-02-21T08:34:17.9460096Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1987, %r1991, %r1995, %r1999}, [%r1499]; 2026-02-21T08:34:17.9460199Z // end inline asm 2026-02-21T08:34:17.9460317Z // begin inline asm 2026-02-21T08:34:17.9460641Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2003, %r2007, %r2011, %r2015}, [%r1504]; 2026-02-21T08:34:17.9460744Z // end inline asm 2026-02-21T08:34:17.9460863Z // begin inline asm 2026-02-21T08:34:17.9461191Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2019, %r2023, %r2027, %r2031}, [%r1509]; 
2026-02-21T08:34:17.9461293Z // end inline asm 2026-02-21T08:34:17.9461398Z // begin inline asm 2026-02-21T08:34:17.9461735Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2035, %r2039, %r2043, %r2047}, [%r1514]; 2026-02-21T08:34:17.9461833Z // end inline asm 2026-02-21T08:34:17.9461937Z bar.sync 0, 128; 2026-02-21T08:34:17.9462143Z st.shared.v4.b32 [%r2083], {%r2166, %r2178, %r2190, %r2202}; 2026-02-21T08:34:17.9462337Z st.shared.v4.b32 [%r2082], {%r2214, %r2226, %r2238, %r2250}; 2026-02-21T08:34:17.9462524Z st.shared.v4.b32 [%r2080], {%r2262, %r2274, %r2286, %r2298}; 2026-02-21T08:34:17.9462722Z st.shared.v4.b32 [%r2078], {%r2310, %r2322, %r2334, %r2346}; 2026-02-21T08:34:17.9462964Z st.shared.v4.b32 [%r2076], {%r2358, %r2370, %r2382, %r2394}; 2026-02-21T08:34:17.9463151Z st.shared.v4.b32 [%r2074], {%r2406, %r2418, %r2430, %r2442}; 2026-02-21T08:34:17.9463349Z st.shared.v4.b32 [%r2072], {%r2454, %r2466, %r2478, %r2490}; 2026-02-21T08:34:17.9463542Z st.shared.v4.b32 [%r2070], {%r2502, %r2514, %r2526, %r2538}; 2026-02-21T08:34:17.9463646Z bar.sync 0, 128; 2026-02-21T08:34:17.9463753Z // begin inline asm 2026-02-21T08:34:17.9464086Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1796, %r1800, %r1804, %r1808}, [%r1479]; 2026-02-21T08:34:17.9464189Z // end inline asm 2026-02-21T08:34:17.9464291Z // begin inline asm 2026-02-21T08:34:17.9464625Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1812, %r1816, %r1820, %r1824}, [%r1484]; 2026-02-21T08:34:17.9464785Z // end inline asm 2026-02-21T08:34:17.9464892Z // begin inline asm 2026-02-21T08:34:17.9465214Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1828, %r1832, %r1836, %r1840}, [%r1489]; 2026-02-21T08:34:17.9465328Z // end inline asm 2026-02-21T08:34:17.9465436Z // begin inline asm 2026-02-21T08:34:17.9465758Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1844, %r1848, %r1852, %r1856}, [%r1494]; 2026-02-21T08:34:17.9465876Z // end inline asm 2026-02-21T08:34:17.9466068Z // begin inline asm 2026-02-21T08:34:17.9466387Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1860, %r1864, %r1868, %r1872}, [%r1499]; 2026-02-21T08:34:17.9466502Z // end inline asm 2026-02-21T08:34:17.9466607Z // begin inline asm 2026-02-21T08:34:17.9466928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1876, %r1880, %r1884, %r1888}, [%r1504]; 2026-02-21T08:34:17.9467035Z // end inline asm 2026-02-21T08:34:17.9467154Z // begin inline asm 2026-02-21T08:34:17.9467473Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1892, %r1896, %r1900, %r1904}, [%r1509]; 2026-02-21T08:34:17.9467574Z // end inline asm 2026-02-21T08:34:17.9467689Z // begin inline asm 2026-02-21T08:34:17.9468009Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1908, %r1912, %r1916, %r1920}, [%r1514]; 2026-02-21T08:34:17.9468111Z // end inline asm 2026-02-21T08:34:17.9468217Z bar.sync 0, 128; 2026-02-21T08:34:17.9468473Z st.shared.v4.b32 [%r2083], {%r2550, %r2562, %r2574, %r2586}; 2026-02-21T08:34:17.9468668Z st.shared.v4.b32 [%r2082], {%r2598, %r2610, %r2622, %r2634}; 2026-02-21T08:34:17.9468854Z st.shared.v4.b32 [%r2080], {%r2646, %r2658, %r2670, %r2682}; 2026-02-21T08:34:17.9469052Z st.shared.v4.b32 [%r2078], {%r2694, %r2706, %r2718, %r2730}; 2026-02-21T08:34:17.9469240Z st.shared.v4.b32 [%r2076], {%r2742, %r2754, %r2766, %r2778}; 2026-02-21T08:34:17.9469426Z st.shared.v4.b32 [%r2074], {%r2790, %r2802, %r2814, %r2826}; 2026-02-21T08:34:17.9469621Z st.shared.v4.b32 [%r2072], {%r2838, %r2850, %r2862, %r2874}; 2026-02-21T08:34:17.9469855Z st.shared.v4.b32 [%r2070], {%r2886, %r2898, %r2910, %r2922}; 2026-02-21T08:34:17.9469955Z bar.sync 0, 128; 2026-02-21T08:34:17.9470071Z // begin inline asm 2026-02-21T08:34:17.9470395Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1924, %r1928, %r1932, %r1936}, [%r1479]; 2026-02-21T08:34:17.9470502Z // end inline asm 2026-02-21T08:34:17.9470607Z // begin inline asm 2026-02-21T08:34:17.9470942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1940, %r1944, %r1948, %r1952}, [%r1484]; 2026-02-21T08:34:17.9471047Z // end inline asm 
2026-02-21T08:34:17.9471153Z // begin inline asm 2026-02-21T08:34:17.9471485Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1956, %r1960, %r1964, %r1968}, [%r1489]; 2026-02-21T08:34:17.9471588Z // end inline asm 2026-02-21T08:34:17.9471694Z // begin inline asm 2026-02-21T08:34:17.9472012Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1972, %r1976, %r1980, %r1984}, [%r1494]; 2026-02-21T08:34:17.9472123Z // end inline asm 2026-02-21T08:34:17.9472231Z // begin inline asm 2026-02-21T08:34:17.9472548Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1988, %r1992, %r1996, %r2000}, [%r1499]; 2026-02-21T08:34:17.9472662Z // end inline asm 2026-02-21T08:34:17.9472768Z // begin inline asm 2026-02-21T08:34:17.9473085Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2004, %r2008, %r2012, %r2016}, [%r1504]; 2026-02-21T08:34:17.9473250Z // end inline asm 2026-02-21T08:34:17.9473357Z // begin inline asm 2026-02-21T08:34:17.9473677Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2020, %r2024, %r2028, %r2032}, [%r1509]; 2026-02-21T08:34:17.9473782Z // end inline asm 2026-02-21T08:34:17.9473897Z // begin inline asm 2026-02-21T08:34:17.9474219Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2036, %r2040, %r2044, %r2048}, [%r1514]; 2026-02-21T08:34:17.9474317Z // end inline asm 2026-02-21T08:34:17.9474428Z bar.sync 0, 128; 2026-02-21T08:34:17.9474616Z st.shared.v4.b32 [%r2083], {%r2169, %r2181, %r2193, %r2205}; 2026-02-21T08:34:17.9474853Z st.shared.v4.b32 [%r2082], {%r2217, %r2229, %r2241, %r2253}; 2026-02-21T08:34:17.9475056Z st.shared.v4.b32 [%r2080], {%r2265, %r2277, %r2289, %r2301}; 2026-02-21T08:34:17.9475244Z st.shared.v4.b32 [%r2078], {%r2313, %r2325, %r2337, %r2349}; 2026-02-21T08:34:17.9475429Z st.shared.v4.b32 [%r2076], {%r2361, %r2373, %r2385, %r2397}; 2026-02-21T08:34:17.9475613Z st.shared.v4.b32 [%r2074], {%r2409, %r2421, %r2433, %r2445}; 2026-02-21T08:34:17.9475810Z st.shared.v4.b32 [%r2072], {%r2457, %r2469, %r2481, %r2493}; 2026-02-21T08:34:17.9476046Z st.shared.v4.b32 [%r2070], 
{%r2505, %r2517, %r2529, %r2541}; 2026-02-21T08:34:17.9476149Z bar.sync 0, 128; 2026-02-21T08:34:17.9476265Z // begin inline asm 2026-02-21T08:34:17.9476591Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1797, %r1801, %r1805, %r1809}, [%r1479]; 2026-02-21T08:34:17.9476693Z // end inline asm 2026-02-21T08:34:17.9476809Z // begin inline asm 2026-02-21T08:34:17.9477131Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1813, %r1817, %r1821, %r1825}, [%r1484]; 2026-02-21T08:34:17.9477235Z // end inline asm 2026-02-21T08:34:17.9477341Z // begin inline asm 2026-02-21T08:34:17.9477670Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1829, %r1833, %r1837, %r1841}, [%r1489]; 2026-02-21T08:34:17.9477775Z // end inline asm 2026-02-21T08:34:17.9477881Z // begin inline asm 2026-02-21T08:34:17.9478209Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1845, %r1849, %r1853, %r1857}, [%r1494]; 2026-02-21T08:34:17.9478314Z // end inline asm 2026-02-21T08:34:17.9478472Z // begin inline asm 2026-02-21T08:34:17.9478799Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1861, %r1865, %r1869, %r1873}, [%r1499]; 2026-02-21T08:34:17.9478918Z // end inline asm 2026-02-21T08:34:17.9479027Z // begin inline asm 2026-02-21T08:34:17.9479346Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1877, %r1881, %r1885, %r1889}, [%r1504]; 2026-02-21T08:34:17.9479461Z // end inline asm 2026-02-21T08:34:17.9479569Z // begin inline asm 2026-02-21T08:34:17.9479889Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1893, %r1897, %r1901, %r1905}, [%r1509]; 2026-02-21T08:34:17.9480054Z // end inline asm 2026-02-21T08:34:17.9480158Z // begin inline asm 2026-02-21T08:34:17.9480478Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1909, %r1913, %r1917, %r1921}, [%r1514]; 2026-02-21T08:34:17.9480583Z // end inline asm 2026-02-21T08:34:17.9480701Z bar.sync 0, 128; 2026-02-21T08:34:17.9480895Z st.shared.v4.b32 [%r2083], {%r2553, %r2565, %r2577, %r2589}; 2026-02-21T08:34:17.9481090Z st.shared.v4.b32 [%r2082], {%r2601, %r2613, %r2625, %r2637}; 
2026-02-21T08:34:17.9481296Z st.shared.v4.b32 [%r2080], {%r2649, %r2661, %r2673, %r2685}; 2026-02-21T08:34:17.9481485Z st.shared.v4.b32 [%r2078], {%r2697, %r2709, %r2721, %r2733}; 2026-02-21T08:34:17.9481674Z st.shared.v4.b32 [%r2076], {%r2745, %r2757, %r2769, %r2781}; 2026-02-21T08:34:17.9481872Z st.shared.v4.b32 [%r2074], {%r2793, %r2805, %r2817, %r2829}; 2026-02-21T08:34:17.9482059Z st.shared.v4.b32 [%r2072], {%r2841, %r2853, %r2865, %r2877}; 2026-02-21T08:34:17.9482248Z st.shared.v4.b32 [%r2070], {%r2889, %r2901, %r2913, %r2925}; 2026-02-21T08:34:17.9482351Z bar.sync 0, 128; 2026-02-21T08:34:17.9482471Z // begin inline asm 2026-02-21T08:34:17.9482792Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1925, %r1929, %r1933, %r1937}, [%r1479]; 2026-02-21T08:34:17.9482895Z // end inline asm 2026-02-21T08:34:17.9483086Z // begin inline asm 2026-02-21T08:34:17.9483418Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1941, %r1945, %r1949, %r1953}, [%r1484]; 2026-02-21T08:34:17.9483526Z // end inline asm 2026-02-21T08:34:17.9483640Z // begin inline asm 2026-02-21T08:34:17.9483966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1957, %r1961, %r1965, %r1969}, [%r1489]; 2026-02-21T08:34:17.9484068Z // end inline asm 2026-02-21T08:34:17.9484175Z // begin inline asm 2026-02-21T08:34:17.9484540Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1973, %r1977, %r1981, %r1985}, [%r1494]; 2026-02-21T08:34:17.9484664Z // end inline asm 2026-02-21T08:34:17.9484860Z // begin inline asm 2026-02-21T08:34:17.9485259Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1989, %r1993, %r1997, %r2001}, [%r1499]; 2026-02-21T08:34:17.9485386Z // end inline asm 2026-02-21T08:34:17.9485514Z // begin inline asm 2026-02-21T08:34:17.9485929Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2005, %r2009, %r2013, %r2017}, [%r1504]; 2026-02-21T08:34:17.9486054Z // end inline asm 2026-02-21T08:34:17.9486168Z // begin inline asm 2026-02-21T08:34:17.9486498Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2021, %r2025, %r2029, %r2033}, 
[%r1509]; 2026-02-21T08:34:17.9486677Z // end inline asm 2026-02-21T08:34:17.9486786Z // begin inline asm 2026-02-21T08:34:17.9487115Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2037, %r2041, %r2045, %r2049}, [%r1514]; 2026-02-21T08:34:17.9487236Z // end inline asm 2026-02-21T08:34:17.9487344Z bar.sync 0, 128; 2026-02-21T08:34:17.9487561Z st.shared.v4.b32 [%r2083], {%r2172, %r2184, %r2196, %r2208}; 2026-02-21T08:34:17.9487775Z st.shared.v4.b32 [%r2082], {%r2220, %r2232, %r2244, %r2256}; 2026-02-21T08:34:17.9488002Z st.shared.v4.b32 [%r2080], {%r2268, %r2280, %r2292, %r2304}; 2026-02-21T08:34:17.9488219Z st.shared.v4.b32 [%r2078], {%r2316, %r2328, %r2340, %r2352}; 2026-02-21T08:34:17.9488433Z st.shared.v4.b32 [%r2076], {%r2364, %r2376, %r2388, %r2400}; 2026-02-21T08:34:17.9488657Z st.shared.v4.b32 [%r2074], {%r2412, %r2424, %r2436, %r2448}; 2026-02-21T08:34:17.9488875Z st.shared.v4.b32 [%r2072], {%r2460, %r2472, %r2484, %r2496}; 2026-02-21T08:34:17.9489143Z st.shared.v4.b32 [%r2070], {%r2508, %r2520, %r2532, %r2544}; 2026-02-21T08:34:17.9489282Z bar.sync 0, 128; 2026-02-21T08:34:17.9489395Z // begin inline asm 2026-02-21T08:34:17.9489749Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1798, %r1802, %r1806, %r1810}, [%r1479]; 2026-02-21T08:34:17.9489851Z // end inline asm 2026-02-21T08:34:17.9489987Z // begin inline asm 2026-02-21T08:34:17.9490338Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1814, %r1818, %r1822, %r1826}, [%r1484]; 2026-02-21T08:34:17.9490462Z // end inline asm 2026-02-21T08:34:17.9490647Z // begin inline asm 2026-02-21T08:34:17.9491000Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1830, %r1834, %r1838, %r1842}, [%r1489]; 2026-02-21T08:34:17.9491129Z // end inline asm 2026-02-21T08:34:17.9491249Z // begin inline asm 2026-02-21T08:34:17.9491602Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1846, %r1850, %r1854, %r1858}, [%r1494]; 2026-02-21T08:34:17.9491716Z // end inline asm 2026-02-21T08:34:17.9491856Z // begin inline asm 2026-02-21T08:34:17.9492215Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1862, %r1866, %r1870, %r1874}, [%r1499]; 2026-02-21T08:34:17.9492329Z // end inline asm 2026-02-21T08:34:17.9492462Z // begin inline asm 2026-02-21T08:34:17.9492818Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1878, %r1882, %r1886, %r1890}, [%r1504]; 2026-02-21T08:34:17.9492938Z // end inline asm 2026-02-21T08:34:17.9493064Z // begin inline asm 2026-02-21T08:34:17.9493437Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1894, %r1898, %r1902, %r1906}, [%r1509]; 2026-02-21T08:34:17.9493552Z // end inline asm 2026-02-21T08:34:17.9493677Z // begin inline asm 2026-02-21T08:34:17.9494021Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1910, %r1914, %r1918, %r1922}, [%r1514]; 2026-02-21T08:34:17.9494151Z // end inline asm 2026-02-21T08:34:17.9494273Z bar.sync 0, 128; 2026-02-21T08:34:17.9494541Z st.shared.v4.b32 [%r2083], {%r2556, %r2568, %r2580, %r2592}; 2026-02-21T08:34:17.9494840Z st.shared.v4.b32 [%r2082], {%r2604, %r2616, %r2628, %r2640}; 2026-02-21T08:34:17.9495046Z st.shared.v4.b32 [%r2080], {%r2652, %r2664, %r2676, %r2688}; 2026-02-21T08:34:17.9495259Z st.shared.v4.b32 [%r2078], {%r2700, %r2712, %r2724, %r2736}; 2026-02-21T08:34:17.9495477Z st.shared.v4.b32 [%r2076], {%r2748, %r2760, %r2772, %r2784}; 2026-02-21T08:34:17.9495692Z st.shared.v4.b32 [%r2074], {%r2796, %r2808, %r2820, %r2832}; 2026-02-21T08:34:17.9495904Z st.shared.v4.b32 [%r2072], {%r2844, %r2856, %r2868, %r2880}; 2026-02-21T08:34:17.9496116Z st.shared.v4.b32 [%r2070], {%r2892, %r2904, %r2916, %r2928}; 2026-02-21T08:34:17.9496241Z bar.sync 0, 128; 2026-02-21T08:34:17.9496358Z // begin inline asm 2026-02-21T08:34:17.9496715Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1926, %r1930, %r1934, %r1938}, [%r1479]; 2026-02-21T08:34:17.9496833Z // end inline asm 2026-02-21T08:34:17.9496951Z // begin inline asm 2026-02-21T08:34:17.9497327Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1942, %r1946, %r1950, %r1954}, [%r1484]; 2026-02-21T08:34:17.9497452Z // end inline asm 
2026-02-21T08:34:17.9497621Z // begin inline asm 2026-02-21T08:34:17.9497978Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1958, %r1962, %r1966, %r1970}, [%r1489]; 2026-02-21T08:34:17.9498088Z // end inline asm 2026-02-21T08:34:17.9498217Z // begin inline asm 2026-02-21T08:34:17.9498575Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1974, %r1978, %r1982, %r1986}, [%r1494]; 2026-02-21T08:34:17.9498689Z // end inline asm 2026-02-21T08:34:17.9498820Z // begin inline asm 2026-02-21T08:34:17.9499161Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1990, %r1994, %r1998, %r2002}, [%r1499]; 2026-02-21T08:34:17.9499286Z // end inline asm 2026-02-21T08:34:17.9499398Z // begin inline asm 2026-02-21T08:34:17.9499739Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2006, %r2010, %r2014, %r2018}, [%r1504]; 2026-02-21T08:34:17.9499848Z // end inline asm 2026-02-21T08:34:17.9499959Z // begin inline asm 2026-02-21T08:34:17.9500348Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2022, %r2026, %r2030, %r2034}, [%r1509]; 2026-02-21T08:34:17.9500456Z // end inline asm 2026-02-21T08:34:17.9500570Z // begin inline asm 2026-02-21T08:34:17.9500908Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2038, %r2042, %r2046, %r2050}, [%r1514]; 2026-02-21T08:34:17.9501014Z // end inline asm 2026-02-21T08:34:17.9501124Z // begin inline asm 2026-02-21T08:34:17.9501357Z st.global.v4.b32 [ %rd124 + 0 ], { %r1795, %r1796, %r1797, %r1798 }; 2026-02-21T08:34:17.9501477Z // end inline asm 2026-02-21T08:34:17.9501588Z // begin inline asm 2026-02-21T08:34:17.9501813Z st.global.v4.b32 [ %rd125 + 0 ], { %r1799, %r1800, %r1801, %r1802 }; 2026-02-21T08:34:17.9501968Z // end inline asm 2026-02-21T08:34:17.9502080Z // begin inline asm 2026-02-21T08:34:17.9502297Z st.global.v4.b32 [ %rd126 + 0 ], { %r1803, %r1804, %r1805, %r1806 }; 2026-02-21T08:34:17.9502403Z // end inline asm 2026-02-21T08:34:17.9502525Z // begin inline asm 2026-02-21T08:34:17.9502741Z st.global.v4.b32 [ %rd127 + 0 ], { %r1807, %r1808, %r1809, %r1810 }; 
2026-02-21T08:34:17.9502851Z // end inline asm 2026-02-21T08:34:17.9502978Z // begin inline asm 2026-02-21T08:34:17.9503194Z st.global.v4.b32 [ %rd128 + 0 ], { %r1811, %r1812, %r1813, %r1814 }; 2026-02-21T08:34:17.9503301Z // end inline asm 2026-02-21T08:34:17.9503423Z // begin inline asm 2026-02-21T08:34:17.9503636Z st.global.v4.b32 [ %rd129 + 0 ], { %r1815, %r1816, %r1817, %r1818 }; 2026-02-21T08:34:17.9503745Z // end inline asm 2026-02-21T08:34:17.9503853Z // begin inline asm 2026-02-21T08:34:17.9504074Z st.global.v4.b32 [ %rd130 + 0 ], { %r1819, %r1820, %r1821, %r1822 }; 2026-02-21T08:34:17.9504183Z // end inline asm 2026-02-21T08:34:17.9504292Z // begin inline asm 2026-02-21T08:34:17.9504510Z st.global.v4.b32 [ %rd131 + 0 ], { %r1823, %r1824, %r1825, %r1826 }; 2026-02-21T08:34:17.9504618Z // end inline asm 2026-02-21T08:34:17.9504840Z // begin inline asm 2026-02-21T08:34:17.9505130Z st.global.v4.b32 [ %rd132 + 0 ], { %r1827, %r1828, %r1829, %r1830 }; 2026-02-21T08:34:17.9505246Z // end inline asm 2026-02-21T08:34:17.9505361Z // begin inline asm 2026-02-21T08:34:17.9505580Z st.global.v4.b32 [ %rd133 + 0 ], { %r1831, %r1832, %r1833, %r1834 }; 2026-02-21T08:34:17.9505703Z // end inline asm 2026-02-21T08:34:17.9505812Z // begin inline asm 2026-02-21T08:34:17.9506020Z st.global.v4.b32 [ %rd134 + 0 ], { %r1835, %r1836, %r1837, %r1838 }; 2026-02-21T08:34:17.9506137Z // end inline asm 2026-02-21T08:34:17.9506248Z // begin inline asm 2026-02-21T08:34:17.9506460Z st.global.v4.b32 [ %rd135 + 0 ], { %r1839, %r1840, %r1841, %r1842 }; 2026-02-21T08:34:17.9506567Z // end inline asm 2026-02-21T08:34:17.9506690Z // begin inline asm 2026-02-21T08:34:17.9506901Z st.global.v4.b32 [ %rd136 + 0 ], { %r1843, %r1844, %r1845, %r1846 }; 2026-02-21T08:34:17.9507007Z // end inline asm 2026-02-21T08:34:17.9507124Z // begin inline asm 2026-02-21T08:34:17.9507336Z st.global.v4.b32 [ %rd137 + 0 ], { %r1847, %r1848, %r1849, %r1850 }; 2026-02-21T08:34:17.9507444Z // end inline asm 
2026-02-21T08:34:17.9507557Z // begin inline asm 2026-02-21T08:34:17.9507778Z st.global.v4.b32 [ %rd138 + 0 ], { %r1851, %r1852, %r1853, %r1854 }; 2026-02-21T08:34:17.9507947Z // end inline asm 2026-02-21T08:34:17.9508056Z // begin inline asm 2026-02-21T08:34:17.9508276Z st.global.v4.b32 [ %rd139 + 0 ], { %r1855, %r1856, %r1857, %r1858 }; 2026-02-21T08:34:17.9508384Z // end inline asm 2026-02-21T08:34:17.9508495Z // begin inline asm 2026-02-21T08:34:17.9508707Z st.global.v4.b32 [ %rd140 + 0 ], { %r1859, %r1860, %r1861, %r1862 }; 2026-02-21T08:34:17.9508826Z // end inline asm 2026-02-21T08:34:17.9508940Z // begin inline asm 2026-02-21T08:34:17.9509148Z st.global.v4.b32 [ %rd141 + 0 ], { %r1863, %r1864, %r1865, %r1866 }; 2026-02-21T08:34:17.9509268Z // end inline asm 2026-02-21T08:34:17.9509379Z // begin inline asm 2026-02-21T08:34:17.9509588Z st.global.v4.b32 [ %rd142 + 0 ], { %r1867, %r1868, %r1869, %r1870 }; 2026-02-21T08:34:17.9509698Z // end inline asm 2026-02-21T08:34:17.9509807Z // begin inline asm 2026-02-21T08:34:17.9510076Z st.global.v4.b32 [ %rd143 + 0 ], { %r1871, %r1872, %r1873, %r1874 }; 2026-02-21T08:34:17.9510184Z // end inline asm 2026-02-21T08:34:17.9510304Z // begin inline asm 2026-02-21T08:34:17.9510516Z st.global.v4.b32 [ %rd144 + 0 ], { %r1875, %r1876, %r1877, %r1878 }; 2026-02-21T08:34:17.9510625Z // end inline asm 2026-02-21T08:34:17.9510749Z // begin inline asm 2026-02-21T08:34:17.9510960Z st.global.v4.b32 [ %rd145 + 0 ], { %r1879, %r1880, %r1881, %r1882 }; 2026-02-21T08:34:17.9511069Z // end inline asm 2026-02-21T08:34:17.9511180Z // begin inline asm 2026-02-21T08:34:17.9511460Z st.global.v4.b32 [ %rd146 + 0 ], { %r1883, %r1884, %r1885, %r1886 }; 2026-02-21T08:34:17.9511568Z // end inline asm 2026-02-21T08:34:17.9511679Z // begin inline asm 2026-02-21T08:34:17.9511902Z st.global.v4.b32 [ %rd147 + 0 ], { %r1887, %r1888, %r1889, %r1890 }; 2026-02-21T08:34:17.9512012Z // end inline asm 2026-02-21T08:34:17.9512126Z // begin inline asm 
2026-02-21T08:34:17.9512350Z st.global.v4.b32 [ %rd148 + 0 ], { %r1891, %r1892, %r1893, %r1894 }; 2026-02-21T08:34:17.9512461Z // end inline asm 2026-02-21T08:34:17.9512572Z // begin inline asm 2026-02-21T08:34:17.9512783Z st.global.v4.b32 [ %rd149 + 0 ], { %r1895, %r1896, %r1897, %r1898 }; 2026-02-21T08:34:17.9512895Z // end inline asm 2026-02-21T08:34:17.9513008Z // begin inline asm 2026-02-21T08:34:17.9513221Z st.global.v4.b32 [ %rd150 + 0 ], { %r1899, %r1900, %r1901, %r1902 }; 2026-02-21T08:34:17.9513335Z // end inline asm 2026-02-21T08:34:17.9513445Z // begin inline asm 2026-02-21T08:34:17.9513657Z st.global.v4.b32 [ %rd151 + 0 ], { %r1903, %r1904, %r1905, %r1906 }; 2026-02-21T08:34:17.9513767Z // end inline asm 2026-02-21T08:34:17.9513889Z // begin inline asm 2026-02-21T08:34:17.9514100Z st.global.v4.b32 [ %rd152 + 0 ], { %r1907, %r1908, %r1909, %r1910 }; 2026-02-21T08:34:17.9514206Z // end inline asm 2026-02-21T08:34:17.9514326Z // begin inline asm 2026-02-21T08:34:17.9514586Z st.global.v4.b32 [ %rd153 + 0 ], { %r1911, %r1912, %r1913, %r1914 }; 2026-02-21T08:34:17.9514752Z // end inline asm 2026-02-21T08:34:17.9514882Z // begin inline asm 2026-02-21T08:34:17.9515094Z st.global.v4.b32 [ %rd154 + 0 ], { %r1915, %r1916, %r1917, %r1918 }; 2026-02-21T08:34:17.9515205Z // end inline asm 2026-02-21T08:34:17.9515318Z // begin inline asm 2026-02-21T08:34:17.9515534Z st.global.v4.b32 [ %rd155 + 0 ], { %r1919, %r1920, %r1921, %r1922 }; 2026-02-21T08:34:17.9515641Z // end inline asm 2026-02-21T08:34:17.9515750Z // begin inline asm 2026-02-21T08:34:17.9515969Z st.global.v4.b32 [ %rd156 + 0 ], { %r1923, %r1924, %r1925, %r1926 }; 2026-02-21T08:34:17.9516076Z // end inline asm 2026-02-21T08:34:17.9516184Z // begin inline asm 2026-02-21T08:34:17.9516392Z st.global.v4.b32 [ %rd157 + 0 ], { %r1927, %r1928, %r1929, %r1930 }; 2026-02-21T08:34:17.9516510Z // end inline asm 2026-02-21T08:34:17.9516620Z // begin inline asm 2026-02-21T08:34:17.9516832Z st.global.v4.b32 [ %rd158 + 0 
], { %r1931, %r1932, %r1933, %r1934 }; 2026-02-21T08:34:17.9516948Z // end inline asm 2026-02-21T08:34:17.9517059Z // begin inline asm 2026-02-21T08:34:17.9517320Z st.global.v4.b32 [ %rd159 + 0 ], { %r1935, %r1936, %r1937, %r1938 }; 2026-02-21T08:34:17.9517436Z // end inline asm 2026-02-21T08:34:17.9517547Z // begin inline asm 2026-02-21T08:34:17.9517758Z st.global.v4.b32 [ %rd160 + 0 ], { %r1939, %r1940, %r1941, %r1942 }; 2026-02-21T08:34:17.9517865Z // end inline asm 2026-02-21T08:34:17.9517980Z // begin inline asm 2026-02-21T08:34:17.9518192Z st.global.v4.b32 [ %rd161 + 0 ], { %r1943, %r1944, %r1945, %r1946 }; 2026-02-21T08:34:17.9518300Z // end inline asm 2026-02-21T08:34:17.9518420Z // begin inline asm 2026-02-21T08:34:17.9518628Z st.global.v4.b32 [ %rd162 + 0 ], { %r1947, %r1948, %r1949, %r1950 }; 2026-02-21T08:34:17.9518735Z // end inline asm 2026-02-21T08:34:17.9518847Z // begin inline asm 2026-02-21T08:34:17.9519069Z st.global.v4.b32 [ %rd163 + 0 ], { %r1951, %r1952, %r1953, %r1954 }; 2026-02-21T08:34:17.9519175Z // end inline asm 2026-02-21T08:34:17.9519335Z // begin inline asm 2026-02-21T08:34:17.9519558Z st.global.v4.b32 [ %rd164 + 0 ], { %r1955, %r1956, %r1957, %r1958 }; 2026-02-21T08:34:17.9519670Z // end inline asm 2026-02-21T08:34:17.9519780Z // begin inline asm 2026-02-21T08:34:17.9519987Z st.global.v4.b32 [ %rd165 + 0 ], { %r1959, %r1960, %r1961, %r1962 }; 2026-02-21T08:34:17.9520106Z // end inline asm 2026-02-21T08:34:17.9520217Z // begin inline asm 2026-02-21T08:34:17.9520426Z st.global.v4.b32 [ %rd166 + 0 ], { %r1963, %r1964, %r1965, %r1966 }; 2026-02-21T08:34:17.9520539Z // end inline asm 2026-02-21T08:34:17.9520702Z // begin inline asm 2026-02-21T08:34:17.9520918Z st.global.v4.b32 [ %rd167 + 0 ], { %r1967, %r1968, %r1969, %r1970 }; 2026-02-21T08:34:17.9521034Z // end inline asm 2026-02-21T08:34:17.9521139Z // begin inline asm 2026-02-21T08:34:17.9521350Z st.global.v4.b32 [ %rd168 + 0 ], { %r1971, %r1972, %r1973, %r1974 }; 
2026-02-21T08:34:17.9521456Z // end inline asm 2026-02-21T08:34:17.9521578Z // begin inline asm 2026-02-21T08:34:17.9521787Z st.global.v4.b32 [ %rd169 + 0 ], { %r1975, %r1976, %r1977, %r1978 }; 2026-02-21T08:34:17.9521896Z // end inline asm 2026-02-21T08:34:17.9522015Z // begin inline asm 2026-02-21T08:34:17.9522226Z st.global.v4.b32 [ %rd170 + 0 ], { %r1979, %r1980, %r1981, %r1982 }; 2026-02-21T08:34:17.9522335Z // end inline asm 2026-02-21T08:34:17.9522447Z // begin inline asm 2026-02-21T08:34:17.9522667Z st.global.v4.b32 [ %rd171 + 0 ], { %r1983, %r1984, %r1985, %r1986 }; 2026-02-21T08:34:17.9522774Z // end inline asm 2026-02-21T08:34:17.9522883Z // begin inline asm 2026-02-21T08:34:17.9523104Z st.global.v4.b32 [ %rd172 + 0 ], { %r1987, %r1988, %r1989, %r1990 }; 2026-02-21T08:34:17.9523210Z // end inline asm 2026-02-21T08:34:17.9523319Z // begin inline asm 2026-02-21T08:34:17.9523539Z st.global.v4.b32 [ %rd173 + 0 ], { %r1991, %r1992, %r1993, %r1994 }; 2026-02-21T08:34:17.9523641Z // end inline asm 2026-02-21T08:34:17.9523847Z // begin inline asm 2026-02-21T08:34:17.9524063Z st.global.v4.b32 [ %rd174 + 0 ], { %r1995, %r1996, %r1997, %r1998 }; 2026-02-21T08:34:17.9524185Z // end inline asm 2026-02-21T08:34:17.9524296Z // begin inline asm 2026-02-21T08:34:17.9524507Z st.global.v4.b32 [ %rd175 + 0 ], { %r1999, %r2000, %r2001, %r2002 }; 2026-02-21T08:34:17.9524623Z // end inline asm 2026-02-21T08:34:17.9524802Z // begin inline asm 2026-02-21T08:34:17.9525012Z st.global.v4.b32 [ %rd176 + 0 ], { %r2003, %r2004, %r2005, %r2006 }; 2026-02-21T08:34:17.9525120Z // end inline asm 2026-02-21T08:34:17.9525241Z // begin inline asm 2026-02-21T08:34:17.9525454Z st.global.v4.b32 [ %rd177 + 0 ], { %r2007, %r2008, %r2009, %r2010 }; 2026-02-21T08:34:17.9525559Z // end inline asm 2026-02-21T08:34:17.9525683Z // begin inline asm 2026-02-21T08:34:17.9525896Z st.global.v4.b32 [ %rd178 + 0 ], { %r2011, %r2012, %r2013, %r2014 }; 2026-02-21T08:34:17.9526001Z // end inline asm 
2026-02-21T08:34:17.9526122Z // begin inline asm 2026-02-21T08:34:17.9526333Z st.global.v4.b32 [ %rd179 + 0 ], { %r2015, %r2016, %r2017, %r2018 }; 2026-02-21T08:34:17.9526442Z // end inline asm 2026-02-21T08:34:17.9526615Z // begin inline asm 2026-02-21T08:34:17.9526835Z st.global.v4.b32 [ %rd180 + 0 ], { %r2019, %r2020, %r2021, %r2022 }; 2026-02-21T08:34:17.9526944Z // end inline asm 2026-02-21T08:34:17.9527056Z // begin inline asm 2026-02-21T08:34:17.9527282Z st.global.v4.b32 [ %rd181 + 0 ], { %r2023, %r2024, %r2025, %r2026 }; 2026-02-21T08:34:17.9527391Z // end inline asm 2026-02-21T08:34:17.9527497Z // begin inline asm 2026-02-21T08:34:17.9527710Z st.global.v4.b32 [ %rd182 + 0 ], { %r2027, %r2028, %r2029, %r2030 }; 2026-02-21T08:34:17.9527829Z // end inline asm 2026-02-21T08:34:17.9527939Z // begin inline asm 2026-02-21T08:34:17.9528152Z st.global.v4.b32 [ %rd183 + 0 ], { %r2031, %r2032, %r2033, %r2034 }; 2026-02-21T08:34:17.9528269Z // end inline asm 2026-02-21T08:34:17.9528380Z // begin inline asm 2026-02-21T08:34:17.9528596Z st.global.v4.b32 [ %rd184 + 0 ], { %r2035, %r2036, %r2037, %r2038 }; 2026-02-21T08:34:17.9528708Z // end inline asm 2026-02-21T08:34:17.9528877Z // begin inline asm 2026-02-21T08:34:17.9529092Z st.global.v4.b32 [ %rd185 + 0 ], { %r2039, %r2040, %r2041, %r2042 }; 2026-02-21T08:34:17.9529197Z // end inline asm 2026-02-21T08:34:17.9529318Z // begin inline asm 2026-02-21T08:34:17.9529534Z st.global.v4.b32 [ %rd186 + 0 ], { %r2043, %r2044, %r2045, %r2046 }; 2026-02-21T08:34:17.9529643Z // end inline asm 2026-02-21T08:34:17.9529766Z // begin inline asm 2026-02-21T08:34:17.9529974Z st.global.v4.b32 [ %rd187 + 0 ], { %r2047, %r2048, %r2049, %r2050 }; 2026-02-21T08:34:17.9530130Z // end inline asm 2026-02-21T08:34:17.9530297Z $L__BB0_15: // %._crit_edge 2026-02-21T08:34:17.9530696Z .loc 1 23 4 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:23:4 2026-02-21T08:34:17.9530805Z bar.sync 0, 128; 2026-02-21T08:34:17.9530914Z // begin inline 
asm 2026-02-21T08:34:17.9531208Z @%p26 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2929, 512; 2026-02-21T08:34:17.9531320Z // end inline asm 2026-02-21T08:34:17.9531534Z st.shared.v2.b32 [global_smem+65544], {67372036, 67372036}; 2026-02-21T08:34:17.9531655Z barrier.sync 1; 2026-02-21T08:34:17.9531817Z $L__BB0_16: // %common.ret 2026-02-21T08:34:17.9532176Z .loc 1 0 0 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:0 2026-02-21T08:34:17.9532278Z ret; 2026-02-21T08:34:17.9532482Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:34:17.9532661Z ld.param.b64 %rd22, [_helion_matmul_param_1]; 2026-02-21T08:34:17.9532835Z ld.param.b64 %rd21, [_helion_matmul_param_0]; 2026-02-21T08:34:17.9533204Z .loc 1 14 0 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:14 2026-02-21T08:34:17.9533327Z cvt.u16.u32 %rs1, %r1; 2026-02-21T08:34:17.9533443Z and.b16 %rs2, %rs1, 7; 2026-02-21T08:34:17.9533632Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T08:34:17.9534024Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9534141Z or.b32 %r5, %r4, 64; 2026-02-21T08:34:17.9534263Z mov.b32 %r45, global_smem; 2026-02-21T08:34:17.9534388Z add.s32 %r46, %r45, %r3; 2026-02-21T08:34:17.9534501Z bra.uni $L__BB0_2; 2026-02-21T08:34:17.9534761Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9536896Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9537173Z barrier.sync 1; 2026-02-21T08:34:17.9537313Z barrier.sync 1; 2026-02-21T08:34:17.9537480Z $L__BB0_2: // %.preheader 2026-02-21T08:34:17.9537619Z // =>This Loop Header: Depth=1 2026-02-21T08:34:17.9537757Z // Child Loop BB0_6 Depth 2 2026-02-21T08:34:17.9538038Z .loc 1 14 0 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:14 2026-02-21T08:34:17.9538188Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:34:17.9538780Z barrier.sync 1; 2026-02-21T08:34:17.9538904Z ld.shared.b8 %r44, [%r46+65540]; 
2026-02-21T08:34:17.9539038Z setp.gt.u32 %p3, %r44, 4; 2026-02-21T08:34:17.9539147Z @%p3 bra $L__BB0_4; 2026-02-21T08:34:17.9539281Z // %bb.3: // %.preheader 2026-02-21T08:34:17.9539423Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9539528Z $L_brx_0: .branchtargets 2026-02-21T08:34:17.9539646Z $L__BB0_5, 2026-02-21T08:34:17.9539732Z $L__BB0_10, 2026-02-21T08:34:17.9539825Z $L__BB0_11, 2026-02-21T08:34:17.9539922Z $L__BB0_12, 2026-02-21T08:34:17.9540016Z $L__BB0_16; 2026-02-21T08:34:17.9540133Z brx.idx %r44, $L_brx_0; 2026-02-21T08:34:17.9540284Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9540635Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9540752Z ld.shared.b32 %r265, [global_smem]; 2026-02-21T08:34:17.9540890Z ld.shared.v2.b32 {%r112, %r113}, [global_smem+8]; 2026-02-21T08:34:17.9540968Z barrier.sync 1; 2026-02-21T08:34:17.9541207Z .loc 1 35 45 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:35:45 2026-02-21T08:34:17.9541306Z add.s32 %r114, %r1, -128; 2026-02-21T08:34:17.9541388Z shr.u32 %r7, %r114, 5; 2026-02-21T08:34:17.9541462Z shr.u32 %r115, %r1, 3; 2026-02-21T08:34:17.9541552Z bfe.u32 %r116, %r1, 3, 4; 2026-02-21T08:34:17.9541723Z or.b32 %r117, %r116, 16; 2026-02-21T08:34:17.9541797Z or.b32 %r118, %r116, 32; 2026-02-21T08:34:17.9541869Z or.b32 %r119, %r116, 48; 2026-02-21T08:34:17.9541950Z or.b32 %r120, %r116, 64; 2026-02-21T08:34:17.9542023Z or.b32 %r121, %r116, 80; 2026-02-21T08:34:17.9542097Z or.b32 %r122, %r116, 96; 2026-02-21T08:34:17.9542184Z or.b32 %r123, %r115, 112; 2026-02-21T08:34:17.9542261Z or.b32 %r124, %r116, 128; 2026-02-21T08:34:17.9542341Z or.b32 %r125, %r116, 144; 2026-02-21T08:34:17.9542419Z or.b32 %r126, %r116, 160; 2026-02-21T08:34:17.9542503Z or.b32 %r127, %r116, 176; 2026-02-21T08:34:17.9542577Z or.b32 %r128, %r116, 192; 2026-02-21T08:34:17.9542651Z or.b32 %r129, %r116, 208; 2026-02-21T08:34:17.9542734Z or.b32 %r130, %r116, 224; 
2026-02-21T08:34:17.9542807Z or.b32 %r131, %r115, 240; 2026-02-21T08:34:17.9543042Z .loc 1 43 48 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:43:48 2026-02-21T08:34:17.9543121Z shl.b32 %r132, %r1, 3; 2026-02-21T08:34:17.9543209Z and.b32 %r133, %r132, 56; 2026-02-21T08:34:17.9543435Z .loc 1 35 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:35:32 2026-02-21T08:34:17.9543517Z add.s32 %r134, %r112, %r116; 2026-02-21T08:34:17.9543603Z add.s32 %r135, %r112, %r117; 2026-02-21T08:34:17.9543765Z add.s32 %r136, %r112, %r118; 2026-02-21T08:34:17.9543861Z add.s32 %r137, %r112, %r119; 2026-02-21T08:34:17.9543961Z add.s32 %r138, %r112, %r120; 2026-02-21T08:34:17.9544055Z add.s32 %r139, %r112, %r121; 2026-02-21T08:34:17.9544143Z add.s32 %r140, %r112, %r122; 2026-02-21T08:34:17.9544234Z add.s32 %r141, %r112, %r123; 2026-02-21T08:34:17.9544340Z add.s32 %r142, %r112, %r124; 2026-02-21T08:34:17.9544433Z add.s32 %r143, %r112, %r125; 2026-02-21T08:34:17.9544524Z add.s32 %r144, %r112, %r126; 2026-02-21T08:34:17.9544956Z add.s32 %r145, %r112, %r127; 2026-02-21T08:34:17.9545059Z add.s32 %r146, %r112, %r128; 2026-02-21T08:34:17.9545155Z add.s32 %r147, %r112, %r129; 2026-02-21T08:34:17.9545258Z add.s32 %r148, %r112, %r130; 2026-02-21T08:34:17.9545366Z add.s32 %r149, %r112, %r131; 2026-02-21T08:34:17.9545651Z .loc 1 37 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:37:32 2026-02-21T08:34:17.9545747Z add.s32 %r150, %r113, %r116; 2026-02-21T08:34:17.9545855Z add.s32 %r151, %r113, %r117; 2026-02-21T08:34:17.9545954Z add.s32 %r152, %r113, %r118; 2026-02-21T08:34:17.9546037Z add.s32 %r153, %r113, %r119; 2026-02-21T08:34:17.9546176Z add.s32 %r154, %r113, %r120; 2026-02-21T08:34:17.9546250Z add.s32 %r155, %r113, %r121; 2026-02-21T08:34:17.9546322Z add.s32 %r156, %r113, %r122; 2026-02-21T08:34:17.9546397Z add.s32 %r157, %r113, %r123; 2026-02-21T08:34:17.9546481Z add.s32 %r158, %r113, %r124; 2026-02-21T08:34:17.9546555Z add.s32 %r159, %r113, %r125; 
2026-02-21T08:34:17.9546641Z add.s32 %r160, %r113, %r126; 2026-02-21T08:34:17.9546743Z add.s32 %r161, %r113, %r127; 2026-02-21T08:34:17.9546831Z add.s32 %r162, %r113, %r128; 2026-02-21T08:34:17.9546942Z add.s32 %r163, %r113, %r129; 2026-02-21T08:34:17.9547039Z add.s32 %r164, %r113, %r130; 2026-02-21T08:34:17.9547133Z add.s32 %r165, %r113, %r131; 2026-02-21T08:34:17.9547395Z .loc 1 47 53 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:53 2026-02-21T08:34:17.9547473Z shl.b32 %r166, %r134, 11; 2026-02-21T08:34:17.9547563Z shl.b32 %r167, %r135, 11; 2026-02-21T08:34:17.9547690Z shl.b32 %r168, %r136, 11; 2026-02-21T08:34:17.9547772Z shl.b32 %r169, %r137, 11; 2026-02-21T08:34:17.9547847Z shl.b32 %r170, %r138, 11; 2026-02-21T08:34:17.9547938Z shl.b32 %r171, %r139, 11; 2026-02-21T08:34:17.9548012Z shl.b32 %r172, %r140, 11; 2026-02-21T08:34:17.9548087Z shl.b32 %r173, %r141, 11; 2026-02-21T08:34:17.9548182Z shl.b32 %r174, %r142, 11; 2026-02-21T08:34:17.9548257Z shl.b32 %r175, %r143, 11; 2026-02-21T08:34:17.9548332Z shl.b32 %r176, %r144, 11; 2026-02-21T08:34:17.9548415Z shl.b32 %r177, %r145, 11; 2026-02-21T08:34:17.9548553Z shl.b32 %r178, %r146, 11; 2026-02-21T08:34:17.9548631Z shl.b32 %r179, %r147, 11; 2026-02-21T08:34:17.9548709Z shl.b32 %r180, %r148, 11; 2026-02-21T08:34:17.9548802Z shl.b32 %r181, %r149, 11; 2026-02-21T08:34:17.9549046Z .loc 1 48 80 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:80 2026-02-21T08:34:17.9549128Z shl.b32 %r182, %r150, 11; 2026-02-21T08:34:17.9549215Z shl.b32 %r183, %r151, 11; 2026-02-21T08:34:17.9549299Z shl.b32 %r184, %r152, 11; 2026-02-21T08:34:17.9549376Z shl.b32 %r185, %r153, 11; 2026-02-21T08:34:17.9549450Z shl.b32 %r186, %r154, 11; 2026-02-21T08:34:17.9549542Z shl.b32 %r187, %r155, 11; 2026-02-21T08:34:17.9549616Z shl.b32 %r188, %r156, 11; 2026-02-21T08:34:17.9549691Z shl.b32 %r189, %r157, 11; 2026-02-21T08:34:17.9549772Z shl.b32 %r190, %r158, 11; 2026-02-21T08:34:17.9549857Z shl.b32 %r191, %r159, 11; 
2026-02-21T08:34:17.9549934Z shl.b32 %r192, %r160, 11; 2026-02-21T08:34:17.9550008Z shl.b32 %r193, %r161, 11; 2026-02-21T08:34:17.9550093Z shl.b32 %r194, %r162, 11; 2026-02-21T08:34:17.9550167Z shl.b32 %r195, %r163, 11; 2026-02-21T08:34:17.9550240Z shl.b32 %r196, %r164, 11; 2026-02-21T08:34:17.9550323Z shl.b32 %r197, %r165, 11; 2026-02-21T08:34:17.9550617Z .loc 1 47 60 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:60 2026-02-21T08:34:17.9550705Z or.b32 %r198, %r166, %r133; 2026-02-21T08:34:17.9550790Z or.b32 %r199, %r167, %r133; 2026-02-21T08:34:17.9550873Z or.b32 %r200, %r168, %r133; 2026-02-21T08:34:17.9550951Z or.b32 %r201, %r169, %r133; 2026-02-21T08:34:17.9551025Z or.b32 %r202, %r170, %r133; 2026-02-21T08:34:17.9551110Z or.b32 %r203, %r171, %r133; 2026-02-21T08:34:17.9551184Z or.b32 %r204, %r172, %r133; 2026-02-21T08:34:17.9551259Z or.b32 %r205, %r173, %r133; 2026-02-21T08:34:17.9551342Z or.b32 %r206, %r174, %r133; 2026-02-21T08:34:17.9551418Z or.b32 %r207, %r175, %r133; 2026-02-21T08:34:17.9551507Z or.b32 %r208, %r176, %r133; 2026-02-21T08:34:17.9551583Z or.b32 %r209, %r177, %r133; 2026-02-21T08:34:17.9551671Z or.b32 %r210, %r178, %r133; 2026-02-21T08:34:17.9551744Z or.b32 %r211, %r179, %r133; 2026-02-21T08:34:17.9551816Z or.b32 %r212, %r180, %r133; 2026-02-21T08:34:17.9551895Z or.b32 %r213, %r181, %r133; 2026-02-21T08:34:17.9552133Z .loc 1 47 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:32 2026-02-21T08:34:17.9552225Z mad.wide.s32 %rd24, %r198, 2, %rd21; 2026-02-21T08:34:17.9552347Z mad.wide.s32 %rd25, %r199, 2, %rd21; 2026-02-21T08:34:17.9552439Z mad.wide.s32 %rd26, %r200, 2, %rd21; 2026-02-21T08:34:17.9552524Z mad.wide.s32 %rd27, %r201, 2, %rd21; 2026-02-21T08:34:17.9552603Z mad.wide.s32 %rd28, %r202, 2, %rd21; 2026-02-21T08:34:17.9552691Z mad.wide.s32 %rd29, %r203, 2, %rd21; 2026-02-21T08:34:17.9552771Z mad.wide.s32 %rd30, %r204, 2, %rd21; 2026-02-21T08:34:17.9552852Z mad.wide.s32 %rd31, %r205, 2, %rd21; 
2026-02-21T08:34:17.9552937Z mad.wide.s32 %rd32, %r206, 2, %rd21; 2026-02-21T08:34:17.9553023Z mad.wide.s32 %rd33, %r207, 2, %rd21; 2026-02-21T08:34:17.9553104Z mad.wide.s32 %rd34, %r208, 2, %rd21; 2026-02-21T08:34:17.9553184Z mad.wide.s32 %rd35, %r209, 2, %rd21; 2026-02-21T08:34:17.9553289Z mad.wide.s32 %rd36, %r210, 2, %rd21; 2026-02-21T08:34:17.9553370Z mad.wide.s32 %rd37, %r211, 2, %rd21; 2026-02-21T08:34:17.9553453Z mad.wide.s32 %rd38, %r212, 2, %rd21; 2026-02-21T08:34:17.9553571Z mad.wide.s32 %rd39, %r213, 2, %rd21; 2026-02-21T08:34:17.9553812Z .loc 1 47 85 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:85 2026-02-21T08:34:17.9553895Z shl.b32 %r214, %r1, 4; 2026-02-21T08:34:17.9553994Z and.b32 %r215, %r214, 2032; 2026-02-21T08:34:17.9554083Z shl.b32 %r216, %r1, 1; 2026-02-21T08:34:17.9554169Z and.b32 %r217, %r216, 112; 2026-02-21T08:34:17.9554252Z xor.b32 %r218, %r215, %r217; 2026-02-21T08:34:17.9554351Z add.s32 %r284, %r45, %r218; 2026-02-21T08:34:17.9554454Z mov.b32 %r48, 16; 2026-02-21T08:34:17.9554565Z // begin inline asm 2026-02-21T08:34:17.9554854Z cp.async.cg.shared.global [ %r284 + 0 ], [ %rd24 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9554930Z // end inline asm 2026-02-21T08:34:17.9555006Z add.s32 %r286, %r284, 2048; 2026-02-21T08:34:17.9555078Z // begin inline asm 2026-02-21T08:34:17.9555259Z cp.async.cg.shared.global [ %r286 + 0 ], [ %rd25 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9555333Z // end inline asm 2026-02-21T08:34:17.9555411Z add.s32 %r288, %r284, 4096; 2026-02-21T08:34:17.9555495Z // begin inline asm 2026-02-21T08:34:17.9555654Z cp.async.cg.shared.global [ %r288 + 0 ], [ %rd26 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9555726Z // end inline asm 2026-02-21T08:34:17.9555804Z add.s32 %r290, %r284, 6144; 2026-02-21T08:34:17.9555893Z // begin inline asm 2026-02-21T08:34:17.9556051Z cp.async.cg.shared.global [ %r290 + 0 ], [ %rd27 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9556121Z // end inline asm 2026-02-21T08:34:17.9556207Z add.s32 
%r292, %r284, 8192; 2026-02-21T08:34:17.9556284Z // begin inline asm 2026-02-21T08:34:17.9556447Z cp.async.cg.shared.global [ %r292 + 0 ], [ %rd28 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9556526Z // end inline asm 2026-02-21T08:34:17.9556607Z add.s32 %r294, %r284, 10240; 2026-02-21T08:34:17.9556682Z // begin inline asm 2026-02-21T08:34:17.9556895Z cp.async.cg.shared.global [ %r294 + 0 ], [ %rd29 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9557001Z // end inline asm 2026-02-21T08:34:17.9557081Z add.s32 %r296, %r284, 12288; 2026-02-21T08:34:17.9557156Z // begin inline asm 2026-02-21T08:34:17.9557319Z cp.async.cg.shared.global [ %r296 + 0 ], [ %rd30 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9557391Z // end inline asm 2026-02-21T08:34:17.9557468Z add.s32 %r298, %r284, 14336; 2026-02-21T08:34:17.9557540Z // begin inline asm 2026-02-21T08:34:17.9557708Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd31 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9557779Z // end inline asm 2026-02-21T08:34:17.9557855Z add.s32 %r300, %r284, 16384; 2026-02-21T08:34:17.9557937Z // begin inline asm 2026-02-21T08:34:17.9558078Z cp.async.cg.shared.global [ %r300 + 0 ], [ %rd32 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9558152Z // end inline asm 2026-02-21T08:34:17.9558227Z add.s32 %r302, %r284, 18432; 2026-02-21T08:34:17.9558309Z // begin inline asm 2026-02-21T08:34:17.9558454Z cp.async.cg.shared.global [ %r302 + 0 ], [ %rd33 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9558524Z // end inline asm 2026-02-21T08:34:17.9558610Z add.s32 %r304, %r284, 20480; 2026-02-21T08:34:17.9558725Z // begin inline asm 2026-02-21T08:34:17.9558870Z cp.async.cg.shared.global [ %r304 + 0 ], [ %rd34 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9558952Z // end inline asm 2026-02-21T08:34:17.9559026Z add.s32 %r306, %r284, 22528; 2026-02-21T08:34:17.9559097Z // begin inline asm 2026-02-21T08:34:17.9559241Z cp.async.cg.shared.global [ %r306 + 0 ], [ %rd35 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9559323Z // end inline asm 2026-02-21T08:34:17.9559396Z 
add.s32 %r308, %r284, 24576; 2026-02-21T08:34:17.9559469Z // begin inline asm 2026-02-21T08:34:17.9559620Z cp.async.cg.shared.global [ %r308 + 0 ], [ %rd36 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9559690Z // end inline asm 2026-02-21T08:34:17.9559766Z add.s32 %r310, %r284, 26624; 2026-02-21T08:34:17.9559841Z // begin inline asm 2026-02-21T08:34:17.9559992Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd37 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9560064Z // end inline asm 2026-02-21T08:34:17.9560207Z add.s32 %r312, %r284, 28672; 2026-02-21T08:34:17.9560292Z // begin inline asm 2026-02-21T08:34:17.9560436Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd38 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9560507Z // end inline asm 2026-02-21T08:34:17.9560588Z add.s32 %r314, %r284, 30720; 2026-02-21T08:34:17.9560662Z // begin inline asm 2026-02-21T08:34:17.9560802Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd39 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9560873Z // end inline asm 2026-02-21T08:34:17.9560963Z cp.async.commit_group; 2026-02-21T08:34:17.9561249Z .loc 1 48 59 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:59 2026-02-21T08:34:17.9561328Z or.b32 %r219, %r182, %r133; 2026-02-21T08:34:17.9561414Z or.b32 %r220, %r183, %r133; 2026-02-21T08:34:17.9561489Z or.b32 %r221, %r184, %r133; 2026-02-21T08:34:17.9561564Z or.b32 %r222, %r185, %r133; 2026-02-21T08:34:17.9561642Z or.b32 %r223, %r186, %r133; 2026-02-21T08:34:17.9561728Z or.b32 %r224, %r187, %r133; 2026-02-21T08:34:17.9561805Z or.b32 %r225, %r188, %r133; 2026-02-21T08:34:17.9561879Z or.b32 %r226, %r189, %r133; 2026-02-21T08:34:17.9561960Z or.b32 %r227, %r190, %r133; 2026-02-21T08:34:17.9562034Z or.b32 %r228, %r191, %r133; 2026-02-21T08:34:17.9562108Z or.b32 %r229, %r192, %r133; 2026-02-21T08:34:17.9562182Z or.b32 %r230, %r193, %r133; 2026-02-21T08:34:17.9562266Z or.b32 %r231, %r194, %r133; 2026-02-21T08:34:17.9562339Z or.b32 %r232, %r195, %r133; 2026-02-21T08:34:17.9562412Z or.b32 %r233, %r196, %r133; 
2026-02-21T08:34:17.9562494Z or.b32 %r234, %r197, %r133; 2026-02-21T08:34:17.9562739Z .loc 1 48 34 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:34 2026-02-21T08:34:17.9562829Z mad.wide.s32 %rd40, %r219, 2, %rd22; 2026-02-21T08:34:17.9562920Z mad.wide.s32 %rd41, %r220, 2, %rd22; 2026-02-21T08:34:17.9563037Z mad.wide.s32 %rd42, %r221, 2, %rd22; 2026-02-21T08:34:17.9563122Z mad.wide.s32 %rd43, %r222, 2, %rd22; 2026-02-21T08:34:17.9563204Z mad.wide.s32 %rd44, %r223, 2, %rd22; 2026-02-21T08:34:17.9563297Z mad.wide.s32 %rd45, %r224, 2, %rd22; 2026-02-21T08:34:17.9563377Z mad.wide.s32 %rd46, %r225, 2, %rd22; 2026-02-21T08:34:17.9563455Z mad.wide.s32 %rd47, %r226, 2, %rd22; 2026-02-21T08:34:17.9563543Z mad.wide.s32 %rd48, %r227, 2, %rd22; 2026-02-21T08:34:17.9563623Z mad.wide.s32 %rd49, %r228, 2, %rd22; 2026-02-21T08:34:17.9563701Z mad.wide.s32 %rd50, %r229, 2, %rd22; 2026-02-21T08:34:17.9563789Z mad.wide.s32 %rd51, %r230, 2, %rd22; 2026-02-21T08:34:17.9563869Z mad.wide.s32 %rd52, %r231, 2, %rd22; 2026-02-21T08:34:17.9563954Z mad.wide.s32 %rd53, %r232, 2, %rd22; 2026-02-21T08:34:17.9564037Z mad.wide.s32 %rd54, %r233, 2, %rd22; 2026-02-21T08:34:17.9564133Z mad.wide.s32 %rd55, %r234, 2, %rd22; 2026-02-21T08:34:17.9564374Z .loc 1 48 87 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:87 2026-02-21T08:34:17.9564451Z add.s32 %r235, %r45, 32768; 2026-02-21T08:34:17.9564541Z add.s32 %r316, %r235, %r218; 2026-02-21T08:34:17.9564656Z // begin inline asm 2026-02-21T08:34:17.9564836Z cp.async.cg.shared.global [ %r316 + 0 ], [ %rd40 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9564908Z // end inline asm 2026-02-21T08:34:17.9564990Z add.s32 %r318, %r316, 2048; 2026-02-21T08:34:17.9565062Z // begin inline asm 2026-02-21T08:34:17.9565204Z cp.async.cg.shared.global [ %r318 + 0 ], [ %rd41 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9565285Z // end inline asm 2026-02-21T08:34:17.9565360Z add.s32 %r320, %r316, 4096; 2026-02-21T08:34:17.9565435Z // begin inline asm 
2026-02-21T08:34:17.9565588Z cp.async.cg.shared.global [ %r320 + 0 ], [ %rd42 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9565660Z // end inline asm 2026-02-21T08:34:17.9565733Z add.s32 %r322, %r316, 6144; 2026-02-21T08:34:17.9565805Z // begin inline asm 2026-02-21T08:34:17.9565958Z cp.async.cg.shared.global [ %r322 + 0 ], [ %rd43 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9566030Z // end inline asm 2026-02-21T08:34:17.9566145Z add.s32 %r324, %r316, 8192; 2026-02-21T08:34:17.9566231Z // begin inline asm 2026-02-21T08:34:17.9566379Z cp.async.cg.shared.global [ %r324 + 0 ], [ %rd44 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9566450Z // end inline asm 2026-02-21T08:34:17.9566527Z add.s32 %r326, %r316, 10240; 2026-02-21T08:34:17.9566610Z // begin inline asm 2026-02-21T08:34:17.9566751Z cp.async.cg.shared.global [ %r326 + 0 ], [ %rd45 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9566826Z // end inline asm 2026-02-21T08:34:17.9566911Z add.s32 %r328, %r316, 12288; 2026-02-21T08:34:17.9567017Z // begin inline asm 2026-02-21T08:34:17.9567161Z cp.async.cg.shared.global [ %r328 + 0 ], [ %rd46 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9567230Z // end inline asm 2026-02-21T08:34:17.9567314Z add.s32 %r330, %r316, 14336; 2026-02-21T08:34:17.9567386Z // begin inline asm 2026-02-21T08:34:17.9567530Z cp.async.cg.shared.global [ %r330 + 0 ], [ %rd47 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9567613Z // end inline asm 2026-02-21T08:34:17.9567690Z add.s32 %r332, %r316, 16384; 2026-02-21T08:34:17.9567763Z // begin inline asm 2026-02-21T08:34:17.9567916Z cp.async.cg.shared.global [ %r332 + 0 ], [ %rd48 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9567985Z // end inline asm 2026-02-21T08:34:17.9568061Z add.s32 %r334, %r316, 18432; 2026-02-21T08:34:17.9568132Z // begin inline asm 2026-02-21T08:34:17.9568281Z cp.async.cg.shared.global [ %r334 + 0 ], [ %rd49 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9568352Z // end inline asm 2026-02-21T08:34:17.9568429Z add.s32 %r336, %r316, 20480; 2026-02-21T08:34:17.9568510Z // begin 
inline asm 2026-02-21T08:34:17.9568653Z cp.async.cg.shared.global [ %r336 + 0 ], [ %rd50 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9568724Z // end inline asm 2026-02-21T08:34:17.9568798Z add.s32 %r338, %r316, 22528; 2026-02-21T08:34:17.9568880Z // begin inline asm 2026-02-21T08:34:17.9569062Z cp.async.cg.shared.global [ %r338 + 0 ], [ %rd51 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9569134Z // end inline asm 2026-02-21T08:34:17.9569217Z add.s32 %r340, %r316, 24576; 2026-02-21T08:34:17.9569292Z // begin inline asm 2026-02-21T08:34:17.9569435Z cp.async.cg.shared.global [ %r340 + 0 ], [ %rd52 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9569515Z // end inline asm 2026-02-21T08:34:17.9569589Z add.s32 %r342, %r316, 26624; 2026-02-21T08:34:17.9569663Z // begin inline asm 2026-02-21T08:34:17.9569801Z cp.async.cg.shared.global [ %r342 + 0 ], [ %rd53 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9569882Z // end inline asm 2026-02-21T08:34:17.9569955Z add.s32 %r344, %r316, 28672; 2026-02-21T08:34:17.9570029Z // begin inline asm 2026-02-21T08:34:17.9570178Z cp.async.cg.shared.global [ %r344 + 0 ], [ %rd54 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9570249Z // end inline asm 2026-02-21T08:34:17.9570323Z add.s32 %r346, %r316, 30720; 2026-02-21T08:34:17.9570399Z // begin inline asm 2026-02-21T08:34:17.9570546Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd55 + 0 ], 0x10, %r48; 2026-02-21T08:34:17.9570619Z // end inline asm 2026-02-21T08:34:17.9570705Z cp.async.commit_group; 2026-02-21T08:34:17.9570793Z bfe.u32 %r236, %r45, 4, 14; 2026-02-21T08:34:17.9570919Z cvt.u64.u32 %rd57, %r236; 2026-02-21T08:34:17.9571013Z or.b64 %rd69, %rd57, 4611686293439512576; 2026-02-21T08:34:17.9571090Z bfe.u32 %r237, %r235, 4, 14; 2026-02-21T08:34:17.9571175Z cvt.u64.u32 %rd58, %r237; 2026-02-21T08:34:17.9571261Z or.b64 %rd70, %rd58, 4611686293439512576; 2026-02-21T08:34:17.9571337Z add.s32 %r238, %r45, 32; 2026-02-21T08:34:17.9571418Z bfe.u32 %r239, %r238, 4, 14; 2026-02-21T08:34:17.9571494Z cvt.u64.u32 %rd59, %r239; 
2026-02-21T08:34:17.9571579Z or.b64 %rd71, %rd59, 4611686293439512576; 2026-02-21T08:34:17.9571656Z add.s32 %r240, %r45, 32800; 2026-02-21T08:34:17.9571739Z bfe.u32 %r241, %r240, 4, 14; 2026-02-21T08:34:17.9571818Z cvt.u64.u32 %rd60, %r241; 2026-02-21T08:34:17.9571903Z or.b64 %rd72, %rd60, 4611686293439512576; 2026-02-21T08:34:17.9571988Z add.s32 %r242, %r45, 64; 2026-02-21T08:34:17.9572060Z bfe.u32 %r243, %r242, 4, 14; 2026-02-21T08:34:17.9572171Z cvt.u64.u32 %rd61, %r243; 2026-02-21T08:34:17.9572268Z or.b64 %rd73, %rd61, 4611686293439512576; 2026-02-21T08:34:17.9572342Z add.s32 %r244, %r45, 32832; 2026-02-21T08:34:17.9572415Z bfe.u32 %r245, %r244, 4, 14; 2026-02-21T08:34:17.9572490Z cvt.u64.u32 %rd62, %r245; 2026-02-21T08:34:17.9572581Z or.b64 %rd74, %rd62, 4611686293439512576; 2026-02-21T08:34:17.9572656Z add.s32 %r246, %r45, 96; 2026-02-21T08:34:17.9572729Z bfe.u32 %r247, %r246, 4, 14; 2026-02-21T08:34:17.9572813Z cvt.u64.u32 %rd63, %r247; 2026-02-21T08:34:17.9572895Z or.b64 %rd75, %rd63, 4611686293439512576; 2026-02-21T08:34:17.9573000Z add.s32 %r248, %r45, 32864; 2026-02-21T08:34:17.9573077Z bfe.u32 %r249, %r248, 4, 14; 2026-02-21T08:34:17.9573168Z cvt.u64.u32 %rd64, %r249; 2026-02-21T08:34:17.9573252Z or.b64 %rd76, %rd64, 4611686293439512576; 2026-02-21T08:34:17.9573326Z add.s32 %r250, %r45, 16384; 2026-02-21T08:34:17.9573416Z bfe.u32 %r251, %r250, 4, 14; 2026-02-21T08:34:17.9573494Z cvt.u64.u32 %rd65, %r251; 2026-02-21T08:34:17.9573578Z or.b64 %rd77, %rd65, 4611686293439512576; 2026-02-21T08:34:17.9573654Z add.s32 %r252, %r45, 16416; 2026-02-21T08:34:17.9573736Z bfe.u32 %r253, %r252, 4, 14; 2026-02-21T08:34:17.9573809Z cvt.u64.u32 %rd66, %r253; 2026-02-21T08:34:17.9573894Z or.b64 %rd79, %rd66, 4611686293439512576; 2026-02-21T08:34:17.9573980Z add.s32 %r254, %r45, 16448; 2026-02-21T08:34:17.9574053Z bfe.u32 %r255, %r254, 4, 14; 2026-02-21T08:34:17.9574127Z cvt.u64.u32 %rd67, %r255; 2026-02-21T08:34:17.9574221Z or.b64 %rd81, %rd67, 4611686293439512576; 
2026-02-21T08:34:17.9574298Z add.s32 %r256, %r45, 16480; 2026-02-21T08:34:17.9574371Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T08:34:17.9574446Z cvt.u64.u32 %rd68, %r257; 2026-02-21T08:34:17.9574537Z or.b64 %rd83, %rd68, 4611686293439512576; 2026-02-21T08:34:17.9574868Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9574953Z add.s32 %r258, %r5, %r197; 2026-02-21T08:34:17.9575042Z cvt.u64.u32 %rd13, %r258; 2026-02-21T08:34:17.9575122Z add.s32 %r259, %r4, %r182; 2026-02-21T08:34:17.9575196Z cvt.u64.u32 %rd14, %r259; 2026-02-21T08:34:17.9575271Z add.s32 %r260, %r5, %r189; 2026-02-21T08:34:17.9575351Z cvt.u64.u32 %rd15, %r260; 2026-02-21T08:34:17.9575425Z add.s32 %r261, %r5, %r181; 2026-02-21T08:34:17.9575497Z cvt.u64.u32 %rd16, %r261; 2026-02-21T08:34:17.9575578Z add.s32 %r262, %r4, %r166; 2026-02-21T08:34:17.9575650Z cvt.u64.u32 %rd17, %r262; 2026-02-21T08:34:17.9575723Z add.s32 %r263, %r5, %r173; 2026-02-21T08:34:17.9575798Z cvt.u64.u32 %rd18, %r263; 2026-02-21T08:34:17.9575886Z mov.pred %p64, 0; 2026-02-21T08:34:17.9575961Z mov.b64 %rd1212, 0; 2026-02-21T08:34:17.9576036Z bra.uni $L__BB0_6; 2026-02-21T08:34:17.9576179Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:34:17.9576407Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9576498Z setp.lt.u64 %p25, %rd1212, 1984; 2026-02-21T08:34:17.9576662Z add.s64 %rd20, %rd1212, 64; 2026-02-21T08:34:17.9576892Z .loc 1 47 60 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:60 2026-02-21T08:34:17.9576977Z add.s64 %rd118, %rd17, %rd1212; 2026-02-21T08:34:17.9577059Z add.s64 %rd119, %rd18, %rd1212; 2026-02-21T08:34:17.9577290Z .loc 1 47 32 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:32 2026-02-21T08:34:17.9577370Z add.s64 %rd120, %rd16, %rd1212; 2026-02-21T08:34:17.9577448Z cvt.u32.u64 %r348, %rd118; 2026-02-21T08:34:17.9577534Z add.s32 %r349, %r348, 64; 2026-02-21T08:34:17.9577619Z 
mad.wide.s32 %rd86, %r349, 2, %rd21; 2026-02-21T08:34:17.9577693Z add.s32 %r350, %r348, 32832; 2026-02-21T08:34:17.9577783Z mad.wide.s32 %rd87, %r350, 2, %rd21; 2026-02-21T08:34:17.9577862Z add.s32 %r351, %r348, 65600; 2026-02-21T08:34:17.9577941Z mad.wide.s32 %rd88, %r351, 2, %rd21; 2026-02-21T08:34:17.9578054Z add.s32 %r352, %r348, 98368; 2026-02-21T08:34:17.9578150Z mad.wide.s32 %rd89, %r352, 2, %rd21; 2026-02-21T08:34:17.9578231Z add.s32 %r353, %r348, 131136; 2026-02-21T08:34:17.9578313Z mad.wide.s32 %rd90, %r353, 2, %rd21; 2026-02-21T08:34:17.9578401Z add.s32 %r354, %r348, 163904; 2026-02-21T08:34:17.9578481Z mad.wide.s32 %rd91, %r354, 2, %rd21; 2026-02-21T08:34:17.9578566Z add.s32 %r355, %r348, 196672; 2026-02-21T08:34:17.9578645Z mad.wide.s32 %rd92, %r355, 2, %rd21; 2026-02-21T08:34:17.9578727Z cvt.u32.u64 %r356, %rd119; 2026-02-21T08:34:17.9578852Z mad.wide.s32 %rd93, %r356, 2, %rd21; 2026-02-21T08:34:17.9578924Z add.s32 %r357, %r348, 262208; 2026-02-21T08:34:17.9579011Z mad.wide.s32 %rd94, %r357, 2, %rd21; 2026-02-21T08:34:17.9579084Z add.s32 %r358, %r348, 294976; 2026-02-21T08:34:17.9579161Z mad.wide.s32 %rd95, %r358, 2, %rd21; 2026-02-21T08:34:17.9579243Z add.s32 %r359, %r348, 327744; 2026-02-21T08:34:17.9579322Z mad.wide.s32 %rd96, %r359, 2, %rd21; 2026-02-21T08:34:17.9579396Z add.s32 %r360, %r348, 360512; 2026-02-21T08:34:17.9579474Z mad.wide.s32 %rd97, %r360, 2, %rd21; 2026-02-21T08:34:17.9579554Z add.s32 %r361, %r348, 393280; 2026-02-21T08:34:17.9579630Z mad.wide.s32 %rd98, %r361, 2, %rd21; 2026-02-21T08:34:17.9579702Z add.s32 %r362, %r348, 426048; 2026-02-21T08:34:17.9579785Z mad.wide.s32 %rd99, %r362, 2, %rd21; 2026-02-21T08:34:17.9579857Z add.s32 %r363, %r348, 458816; 2026-02-21T08:34:17.9579940Z mad.wide.s32 %rd100, %r363, 2, %rd21; 2026-02-21T08:34:17.9580019Z cvt.u32.u64 %r364, %rd120; 2026-02-21T08:34:17.9580113Z mad.wide.s32 %rd101, %r364, 2, %rd21; 2026-02-21T08:34:17.9580335Z .loc 1 47 85 // 
cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:85 2026-02-21T08:34:17.9580406Z bar.sync 2, 128; 2026-02-21T08:34:17.9580493Z selp.b32 %r285, 16, 0, %p25; 2026-02-21T08:34:17.9580564Z // begin inline asm 2026-02-21T08:34:17.9580750Z cp.async.cg.shared.global [ %r284 + 0 ], [ %rd86 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9580831Z // end inline asm 2026-02-21T08:34:17.9580907Z // begin inline asm 2026-02-21T08:34:17.9581057Z cp.async.cg.shared.global [ %r286 + 0 ], [ %rd87 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9581127Z // end inline asm 2026-02-21T08:34:17.9581206Z // begin inline asm 2026-02-21T08:34:17.9581349Z cp.async.cg.shared.global [ %r288 + 0 ], [ %rd88 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9581420Z // end inline asm 2026-02-21T08:34:17.9581501Z // begin inline asm 2026-02-21T08:34:17.9581640Z cp.async.cg.shared.global [ %r290 + 0 ], [ %rd89 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9581713Z // end inline asm 2026-02-21T08:34:17.9581782Z // begin inline asm 2026-02-21T08:34:17.9581930Z cp.async.cg.shared.global [ %r292 + 0 ], [ %rd90 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9582001Z // end inline asm 2026-02-21T08:34:17.9582071Z // begin inline asm 2026-02-21T08:34:17.9582217Z cp.async.cg.shared.global [ %r294 + 0 ], [ %rd91 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9582291Z // end inline asm 2026-02-21T08:34:17.9582365Z // begin inline asm 2026-02-21T08:34:17.9582575Z cp.async.cg.shared.global [ %r296 + 0 ], [ %rd92 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9582647Z // end inline asm 2026-02-21T08:34:17.9582720Z // begin inline asm 2026-02-21T08:34:17.9582859Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd93 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9582943Z // end inline asm 2026-02-21T08:34:17.9583015Z // begin inline asm 2026-02-21T08:34:17.9583155Z cp.async.cg.shared.global [ %r300 + 0 ], [ %rd94 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9583237Z // end inline asm 2026-02-21T08:34:17.9583308Z // begin inline asm 2026-02-21T08:34:17.9583448Z 
cp.async.cg.shared.global [ %r302 + 0 ], [ %rd95 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9583516Z // end inline asm 2026-02-21T08:34:17.9583597Z // begin inline asm 2026-02-21T08:34:17.9583738Z cp.async.cg.shared.global [ %r304 + 0 ], [ %rd96 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9583807Z // end inline asm 2026-02-21T08:34:17.9583887Z // begin inline asm 2026-02-21T08:34:17.9584060Z cp.async.cg.shared.global [ %r306 + 0 ], [ %rd97 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9584137Z // end inline asm 2026-02-21T08:34:17.9584217Z // begin inline asm 2026-02-21T08:34:17.9584354Z cp.async.cg.shared.global [ %r308 + 0 ], [ %rd98 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9584425Z // end inline asm 2026-02-21T08:34:17.9584496Z // begin inline asm 2026-02-21T08:34:17.9584642Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd99 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9584753Z // end inline asm 2026-02-21T08:34:17.9584863Z // begin inline asm 2026-02-21T08:34:17.9585022Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd100 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9585093Z // end inline asm 2026-02-21T08:34:17.9585165Z // begin inline asm 2026-02-21T08:34:17.9585310Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd101 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9585393Z // end inline asm 2026-02-21T08:34:17.9585472Z cp.async.commit_group; 2026-02-21T08:34:17.9585695Z .loc 1 48 59 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:59 2026-02-21T08:34:17.9585786Z add.s64 %rd121, %rd14, %rd1212; 2026-02-21T08:34:17.9585863Z add.s64 %rd122, %rd15, %rd1212; 2026-02-21T08:34:17.9586080Z .loc 1 48 34 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:34 2026-02-21T08:34:17.9586167Z add.s64 %rd123, %rd13, %rd1212; 2026-02-21T08:34:17.9586243Z cvt.u32.u64 %r365, %rd121; 2026-02-21T08:34:17.9586318Z add.s32 %r366, %r365, 64; 2026-02-21T08:34:17.9586403Z mad.wide.s32 %rd102, %r366, 2, %rd22; 2026-02-21T08:34:17.9586487Z add.s32 %r367, %r365, 32832; 2026-02-21T08:34:17.9586569Z 
mad.wide.s32 %rd103, %r367, 2, %rd22; 2026-02-21T08:34:17.9586644Z add.s32 %r368, %r365, 65600; 2026-02-21T08:34:17.9586730Z mad.wide.s32 %rd104, %r368, 2, %rd22; 2026-02-21T08:34:17.9586850Z add.s32 %r369, %r365, 98368; 2026-02-21T08:34:17.9586935Z mad.wide.s32 %rd105, %r369, 2, %rd22; 2026-02-21T08:34:17.9587011Z add.s32 %r370, %r365, 131136; 2026-02-21T08:34:17.9587100Z mad.wide.s32 %rd106, %r370, 2, %rd22; 2026-02-21T08:34:17.9587174Z add.s32 %r371, %r365, 163904; 2026-02-21T08:34:17.9587253Z mad.wide.s32 %rd107, %r371, 2, %rd22; 2026-02-21T08:34:17.9587334Z add.s32 %r372, %r365, 196672; 2026-02-21T08:34:17.9587412Z mad.wide.s32 %rd108, %r372, 2, %rd22; 2026-02-21T08:34:17.9587488Z cvt.u32.u64 %r373, %rd122; 2026-02-21T08:34:17.9587572Z mad.wide.s32 %rd109, %r373, 2, %rd22; 2026-02-21T08:34:17.9587646Z add.s32 %r374, %r365, 262208; 2026-02-21T08:34:17.9587725Z mad.wide.s32 %rd110, %r374, 2, %rd22; 2026-02-21T08:34:17.9587796Z add.s32 %r375, %r365, 294976; 2026-02-21T08:34:17.9587881Z mad.wide.s32 %rd111, %r375, 2, %rd22; 2026-02-21T08:34:17.9587954Z add.s32 %r376, %r365, 327744; 2026-02-21T08:34:17.9588033Z mad.wide.s32 %rd112, %r376, 2, %rd22; 2026-02-21T08:34:17.9588115Z add.s32 %r377, %r365, 360512; 2026-02-21T08:34:17.9588194Z mad.wide.s32 %rd113, %r377, 2, %rd22; 2026-02-21T08:34:17.9588268Z add.s32 %r378, %r365, 393280; 2026-02-21T08:34:17.9588389Z mad.wide.s32 %rd114, %r378, 2, %rd22; 2026-02-21T08:34:17.9588470Z add.s32 %r379, %r365, 426048; 2026-02-21T08:34:17.9588548Z mad.wide.s32 %rd115, %r379, 2, %rd22; 2026-02-21T08:34:17.9588621Z add.s32 %r380, %r365, 458816; 2026-02-21T08:34:17.9588707Z mad.wide.s32 %rd116, %r380, 2, %rd22; 2026-02-21T08:34:17.9588781Z cvt.u32.u64 %r381, %rd123; 2026-02-21T08:34:17.9588856Z mad.wide.s32 %rd117, %r381, 2, %rd22; 2026-02-21T08:34:17.9589086Z .loc 1 48 87 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:48:87 2026-02-21T08:34:17.9589159Z // begin inline asm 2026-02-21T08:34:17.9589304Z 
cp.async.cg.shared.global [ %r316 + 0 ], [ %rd102 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9589371Z // end inline asm 2026-02-21T08:34:17.9589451Z // begin inline asm 2026-02-21T08:34:17.9589596Z cp.async.cg.shared.global [ %r318 + 0 ], [ %rd103 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9589663Z // end inline asm 2026-02-21T08:34:17.9589778Z // begin inline asm 2026-02-21T08:34:17.9589925Z cp.async.cg.shared.global [ %r320 + 0 ], [ %rd104 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9589995Z // end inline asm 2026-02-21T08:34:17.9590064Z // begin inline asm 2026-02-21T08:34:17.9590213Z cp.async.cg.shared.global [ %r322 + 0 ], [ %rd105 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9590282Z // end inline asm 2026-02-21T08:34:17.9590352Z // begin inline asm 2026-02-21T08:34:17.9590500Z cp.async.cg.shared.global [ %r324 + 0 ], [ %rd106 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9590598Z // end inline asm 2026-02-21T08:34:17.9590671Z // begin inline asm 2026-02-21T08:34:17.9590829Z cp.async.cg.shared.global [ %r326 + 0 ], [ %rd107 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9590914Z // end inline asm 2026-02-21T08:34:17.9590997Z // begin inline asm 2026-02-21T08:34:17.9591167Z cp.async.cg.shared.global [ %r328 + 0 ], [ %rd108 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9591259Z // end inline asm 2026-02-21T08:34:17.9591346Z // begin inline asm 2026-02-21T08:34:17.9591513Z cp.async.cg.shared.global [ %r330 + 0 ], [ %rd109 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9591603Z // end inline asm 2026-02-21T08:34:17.9591689Z // begin inline asm 2026-02-21T08:34:17.9591877Z cp.async.cg.shared.global [ %r332 + 0 ], [ %rd110 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9591962Z // end inline asm 2026-02-21T08:34:17.9592071Z // begin inline asm 2026-02-21T08:34:17.9592263Z cp.async.cg.shared.global [ %r334 + 0 ], [ %rd111 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9592348Z // end inline asm 2026-02-21T08:34:17.9592519Z // begin inline asm 2026-02-21T08:34:17.9592711Z cp.async.cg.shared.global [ %r336 + 0 ], [ %rd112 
+ 0 ], 0x10, %r285; 2026-02-21T08:34:17.9592796Z // end inline asm 2026-02-21T08:34:17.9592881Z // begin inline asm 2026-02-21T08:34:17.9593079Z cp.async.cg.shared.global [ %r338 + 0 ], [ %rd113 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9593192Z // end inline asm 2026-02-21T08:34:17.9593270Z // begin inline asm 2026-02-21T08:34:17.9593441Z cp.async.cg.shared.global [ %r340 + 0 ], [ %rd114 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9593521Z // end inline asm 2026-02-21T08:34:17.9593598Z // begin inline asm 2026-02-21T08:34:17.9593760Z cp.async.cg.shared.global [ %r342 + 0 ], [ %rd115 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9593836Z // end inline asm 2026-02-21T08:34:17.9593909Z // begin inline asm 2026-02-21T08:34:17.9594053Z cp.async.cg.shared.global [ %r344 + 0 ], [ %rd116 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9594131Z // end inline asm 2026-02-21T08:34:17.9594205Z // begin inline asm 2026-02-21T08:34:17.9594349Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd117 + 0 ], 0x10, %r285; 2026-02-21T08:34:17.9594430Z // end inline asm 2026-02-21T08:34:17.9594511Z cp.async.commit_group; 2026-02-21T08:34:17.9594600Z mov.pred %p64, -1; 2026-02-21T08:34:17.9594723Z mov.b64 %rd1212, %rd20; 2026-02-21T08:34:17.9594967Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9595047Z @%p25 bra $L__BB0_6; 2026-02-21T08:34:17.9595161Z bra.uni $L__BB0_9; 2026-02-21T08:34:17.9595298Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:34:17.9595424Z // => This Inner Loop Header: Depth=2 2026-02-21T08:34:17.9595645Z .loc 1 47 85 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:47:85 2026-02-21T08:34:17.9595738Z cp.async.wait_group 0; 2026-02-21T08:34:17.9595809Z bar.sync 2, 128; 2026-02-21T08:34:17.9596036Z .loc 1 49 52 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:49:52 2026-02-21T08:34:17.9596141Z shfl.sync.idx.b32 %r264, %r7, 0, 31, -1; 2026-02-21T08:34:17.9596224Z setp.ne.b32 %p5, %r264, 0; 
2026-02-21T08:34:17.9596300Z @%p5 bra $L__BB0_8; 2026-02-21T08:34:17.9596429Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:34:17.9596528Z setp.eq.b64 %p23, %rd1212, 1984; 2026-02-21T08:34:17.9596655Z elect.sync %r281|%p7, -1; 2026-02-21T08:34:17.9596734Z mov.b32 %r266, 138412048; 2026-02-21T08:34:17.9596817Z // begin inline asm 2026-02-21T08:34:17.9597044Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 0 ], %rd69, %rd70, %r266, %p64; 2026-02-21T08:34:17.9597116Z // end inline asm 2026-02-21T08:34:17.9597193Z mov.pred %p8, -1; 2026-02-21T08:34:17.9597276Z // begin inline asm 2026-02-21T08:34:17.9597460Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 0 ], %rd71, %rd72, %r266, %p8; 2026-02-21T08:34:17.9597572Z // end inline asm 2026-02-21T08:34:17.9597656Z // begin inline asm 2026-02-21T08:34:17.9597827Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 0 ], %rd73, %rd74, %r266, %p8; 2026-02-21T08:34:17.9597899Z // end inline asm 2026-02-21T08:34:17.9597979Z // begin inline asm 2026-02-21T08:34:17.9598145Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 0 ], %rd75, %rd76, %r266, %p8; 2026-02-21T08:34:17.9598217Z // end inline asm 2026-02-21T08:34:17.9598289Z // begin inline asm 2026-02-21T08:34:17.9598483Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 256 ], %rd77, %rd70, %r266, %p64; 2026-02-21T08:34:17.9598554Z // end inline asm 2026-02-21T08:34:17.9598626Z // begin inline asm 2026-02-21T08:34:17.9598809Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 256 ], %rd79, %rd72, %r266, %p8; 2026-02-21T08:34:17.9598880Z // end inline asm 2026-02-21T08:34:17.9598951Z // begin inline asm 2026-02-21T08:34:17.9599127Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 256 ], %rd81, %rd74, %r266, %p8; 2026-02-21T08:34:17.9599201Z // end inline asm 2026-02-21T08:34:17.9599273Z // begin inline asm 2026-02-21T08:34:17.9599446Z @%p7 tcgen05.mma.cta_group::1.kind::f16 [ %r265 + 256 ], %rd83, %rd76, %r266, %p8; 2026-02-21T08:34:17.9599517Z // end inline asm 
2026-02-21T08:34:17.9599601Z and.pred %p22, %p23, %p7; 2026-02-21T08:34:17.9599797Z add.s32 %r283, %r45, 65536; 2026-02-21T08:34:17.9599889Z cvt.u64.u32 %rd85, %r283; 2026-02-21T08:34:17.9599964Z // begin inline asm 2026-02-21T08:34:17.9600143Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd85]; 2026-02-21T08:34:17.9600221Z // end inline asm 2026-02-21T08:34:17.9600293Z bra.uni $L__BB0_8; 2026-02-21T08:34:17.9600425Z $L__BB0_12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9600672Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9600747Z barrier.sync 1; 2026-02-21T08:34:17.9600824Z barrier.sync 1; 2026-02-21T08:34:17.9600898Z bra.uni $L__BB0_2; 2026-02-21T08:34:17.9601039Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9601124Z cp.async.wait_group 0; 2026-02-21T08:34:17.9601197Z bar.sync 2, 128; 2026-02-21T08:34:17.9601280Z barrier.sync 1; 2026-02-21T08:34:17.9601353Z bra.uni $L__BB0_2; 2026-02-21T08:34:17.9601484Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9601723Z .loc 1 42 111 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:42:111 2026-02-21T08:34:17.9601842Z barrier.sync 1; 2026-02-21T08:34:17.9601918Z barrier.sync 1; 2026-02-21T08:34:17.9601995Z bra.uni $L__BB0_2; 2026-02-21T08:34:17.9602134Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:34:17.9602356Z .loc 1 14 0 // cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py:14 2026-02-21T08:34:17.9602431Z barrier.sync 1; 2026-02-21T08:34:17.9602513Z barrier.sync 1; 2026-02-21T08:34:17.9602585Z bra.uni $L__BB0_2; 2026-02-21T08:34:17.9602658Z $L__tmp0: 2026-02-21T08:34:17.9602729Z $L__func_end0: 2026-02-21T08:34:17.9602847Z // -- End function 2026-02-21T08:34:17.9602923Z } 2026-02-21T08:34:17.9603203Z .file 1 "/tmp/torchinductor_root/mu/cmuor7t4v45ykkb2hcs6ectiwhjqhz4vn6jqolrowau5xjqr4sm5.py" 2026-02-21T08:34:17.9603292Z .section .debug_abbrev 2026-02-21T08:34:17.9603394Z { 
2026-02-21T08:34:17.9603522Z .b8 1 // Abbreviation Code 2026-02-21T08:34:17.9603639Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:34:17.9603757Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:34:17.9603864Z .b8 37 // DW_AT_producer 2026-02-21T08:34:17.9603964Z .b8 8 // DW_FORM_string 2026-02-21T08:34:17.9604074Z .b8 19 // DW_AT_language 2026-02-21T08:34:17.9604211Z .b8 5 // DW_FORM_data2 2026-02-21T08:34:17.9604315Z .b8 3 // DW_AT_name 2026-02-21T08:34:17.9604421Z .b8 8 // DW_FORM_string 2026-02-21T08:34:17.9604526Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:34:17.9604628Z .b8 6 // DW_FORM_data4 2026-02-21T08:34:17.9604783Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:34:17.9604890Z .b8 8 // DW_FORM_string 2026-02-21T08:34:17.9604982Z .b8 0 // EOM(1) 2026-02-21T08:34:17.9605073Z .b8 0 // EOM(2) 2026-02-21T08:34:17.9605168Z .b8 0 // EOM(3) 2026-02-21T08:34:17.9605233Z } 2026-02-21T08:34:17.9605314Z .section .debug_info 2026-02-21T08:34:17.9605387Z { 2026-02-21T08:34:17.9605496Z .b32 104 // Length of Unit 2026-02-21T08:34:17.9605613Z .b8 2 // DWARF version number 2026-02-21T08:34:17.9605681Z .b8 0 2026-02-21T08:34:17.9605845Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:34:17.9605962Z .b8 8 // Address Size (in bytes) 2026-02-21T08:34:17.9606157Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:34:17.9606278Z .b8 116 // DW_AT_producer 2026-02-21T08:34:17.9606351Z .b8 114 2026-02-21T08:34:17.9606420Z .b8 105 2026-02-21T08:34:17.9606493Z .b8 116 2026-02-21T08:34:17.9606558Z .b8 111 2026-02-21T08:34:17.9606624Z .b8 110 2026-02-21T08:34:17.9606689Z .b8 0 2026-02-21T08:34:17.9606800Z .b8 2 // DW_AT_language 2026-02-21T08:34:17.9606865Z .b8 0 2026-02-21T08:34:17.9606964Z .b8 99 // DW_AT_name 2026-02-21T08:34:17.9607037Z .b8 109 2026-02-21T08:34:17.9607104Z .b8 117 2026-02-21T08:34:17.9607168Z .b8 111 2026-02-21T08:34:17.9607233Z .b8 114 2026-02-21T08:34:17.9607310Z .b8 55 2026-02-21T08:34:17.9607377Z .b8 116 2026-02-21T08:34:17.9607443Z .b8 52 2026-02-21T08:34:17.9607510Z .b8 118 2026-02-21T08:34:17.9607586Z .b8 52 2026-02-21T08:34:17.9607653Z .b8 53 2026-02-21T08:34:17.9607720Z .b8 121 2026-02-21T08:34:17.9607798Z .b8 107 2026-02-21T08:34:17.9607863Z .b8 107 2026-02-21T08:34:17.9607928Z .b8 98 2026-02-21T08:34:17.9607994Z .b8 50 2026-02-21T08:34:17.9608109Z .b8 104 2026-02-21T08:34:17.9608176Z .b8 99 2026-02-21T08:34:17.9608241Z .b8 115 2026-02-21T08:34:17.9608313Z .b8 54 2026-02-21T08:34:17.9608379Z .b8 101 2026-02-21T08:34:17.9608443Z .b8 99 2026-02-21T08:34:17.9608506Z .b8 116 2026-02-21T08:34:17.9608583Z .b8 105 2026-02-21T08:34:17.9608647Z .b8 119 2026-02-21T08:34:17.9608712Z .b8 104 2026-02-21T08:34:17.9608776Z .b8 106 2026-02-21T08:34:17.9608850Z .b8 113 2026-02-21T08:34:17.9608916Z .b8 104 2026-02-21T08:34:17.9608981Z .b8 122 2026-02-21T08:34:17.9609053Z .b8 52 2026-02-21T08:34:17.9609118Z .b8 118 2026-02-21T08:34:17.9609181Z .b8 110 2026-02-21T08:34:17.9609245Z .b8 54 2026-02-21T08:34:17.9609317Z .b8 106 2026-02-21T08:34:17.9609382Z .b8 113 2026-02-21T08:34:17.9609447Z .b8 111 2026-02-21T08:34:17.9609520Z .b8 108 2026-02-21T08:34:17.9609586Z .b8 114 2026-02-21T08:34:17.9609649Z .b8 111 
2026-02-21T08:34:17.9609715Z .b8 119 2026-02-21T08:34:17.9609787Z .b8 97 2026-02-21T08:34:17.9609853Z .b8 117 2026-02-21T08:34:17.9609955Z .b8 53 2026-02-21T08:34:17.9610033Z .b8 120 2026-02-21T08:34:17.9610099Z .b8 106 2026-02-21T08:34:17.9610164Z .b8 113 2026-02-21T08:34:17.9610229Z .b8 114 2026-02-21T08:34:17.9610302Z .b8 52 2026-02-21T08:34:17.9610366Z .b8 115 2026-02-21T08:34:17.9610431Z .b8 109 2026-02-21T08:34:17.9610496Z .b8 53 2026-02-21T08:34:17.9610569Z .b8 46 2026-02-21T08:34:17.9610635Z .b8 112 2026-02-21T08:34:17.9610701Z .b8 121 2026-02-21T08:34:17.9610772Z .b8 0 2026-02-21T08:34:17.9610896Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:34:17.9611036Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:34:17.9611102Z .b8 116 2026-02-21T08:34:17.9611177Z .b8 109 2026-02-21T08:34:17.9611242Z .b8 112 2026-02-21T08:34:17.9611308Z .b8 47 2026-02-21T08:34:17.9611380Z .b8 116 2026-02-21T08:34:17.9611446Z .b8 111 2026-02-21T08:34:17.9611510Z .b8 114 2026-02-21T08:34:17.9611576Z .b8 99 2026-02-21T08:34:17.9611655Z .b8 104 2026-02-21T08:34:17.9611722Z .b8 105 2026-02-21T08:34:17.9611790Z .b8 110 2026-02-21T08:34:17.9611870Z .b8 100 2026-02-21T08:34:17.9611938Z .b8 117 2026-02-21T08:34:17.9612008Z .b8 99 2026-02-21T08:34:17.9612074Z .b8 116 2026-02-21T08:34:17.9612148Z .b8 111 2026-02-21T08:34:17.9612214Z .b8 114 2026-02-21T08:34:17.9612278Z .b8 95 2026-02-21T08:34:17.9612343Z .b8 114 2026-02-21T08:34:17.9612417Z .b8 111 2026-02-21T08:34:17.9612483Z .b8 111 2026-02-21T08:34:17.9612547Z .b8 116 2026-02-21T08:34:17.9612620Z .b8 47 2026-02-21T08:34:17.9612687Z .b8 109 2026-02-21T08:34:17.9612750Z .b8 117 2026-02-21T08:34:17.9612816Z .b8 0 2026-02-21T08:34:17.9612890Z } 2026-02-21T08:34:17.9612976Z .section .debug_macinfo { } 2026-02-21T08:34:17.9612985Z 2026-02-21T08:34:17.9613091Z ================================================================ 2026-02-21T08:34:17.9613239Z please share the reproducer above with Triton project. 
2026-02-21T08:34:19.9618273Z 2026-02-21T08:34:19.9619871Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 50/50 24.0 configs/s 2026-02-21T08:34:22.8140276Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 351.8 2026-02-21T08:34:22.8141136Z configs/s 2026-02-21T08:34:22.9654185Z [198s] Generation 8 complete: 2026-02-21T08:34:22.9654593Z error=17 2026-02-21T08:34:22.9656736Z ok=35 2026-02-21T08:34:22.9656956Z min=0.0696 2026-02-21T08:34:22.9657193Z mid=0.0983 2026-02-21T08:34:22.9657417Z max=4.1042 2026-02-21T08:34:22.9657667Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:34:22.9658108Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:34:22.9658558Z 'l2_groupings': [2], 2026-02-21T08:34:22.9658890Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:34:22.9659279Z 'loop_orders': [[0, 1]], 2026-02-21T08:34:22.9659582Z 'maxnreg': 128, 2026-02-21T08:34:22.9659847Z 'num_sm_multiplier': 8, 2026-02-21T08:34:22.9660141Z 'num_stages': 3, 2026-02-21T08:34:22.9660400Z 'num_warps': 2, 2026-02-21T08:34:22.9660705Z 'pid_type': 'persistent_blocked', 2026-02-21T08:34:22.9661097Z 'range_flattens': [None, True], 2026-02-21T08:34:22.9662023Z 'range_multi_buffers': [True, False], 2026-02-21T08:34:22.9662417Z 'range_num_stages': [0, 0], 2026-02-21T08:34:22.9662745Z 'range_unroll_factors': [0, 0], 2026-02-21T08:34:22.9663107Z 'range_warp_specializes': [None, True]} 2026-02-21T08:34:22.9687186Z [198s] Fitting surrogate: 695 points, 695 targets 2026-02-21T08:34:23.9817183Z [199s] Generation 9 starting: 46 neighbors, 3 active search path(s) 2026-02-21T08:34:29.5830131Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46/46 6.0 configs/s 2026-02-21T08:34:31.3379086Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 46/46 27.2 configs/s 2026-02-21T08:34:34.2973044Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 338.1 2026-02-21T08:34:34.2974065Z configs/s 2026-02-21T08:34:34.4481454Z [209s] Generation 9 
complete: 2026-02-21T08:34:34.4485750Z error=19 2026-02-21T08:34:34.4490295Z ok=30 2026-02-21T08:34:34.4494007Z min=0.0706 2026-02-21T08:34:34.4498100Z mid=0.0788 2026-02-21T08:34:34.4501787Z max=6.9320 2026-02-21T08:34:34.4505905Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:34:34.4507375Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:34:34.4507635Z 'l2_groupings': [2], 2026-02-21T08:34:34.4507859Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:34:34.4508106Z 'loop_orders': [[0, 1]], 2026-02-21T08:34:34.4508314Z 'maxnreg': 128, 2026-02-21T08:34:34.4508815Z 'num_sm_multiplier': 4, 2026-02-21T08:34:34.4509019Z 'num_stages': 3, 2026-02-21T08:34:34.4509213Z 'num_warps': 4, 2026-02-21T08:34:34.4509414Z 'pid_type': 'persistent_blocked', 2026-02-21T08:34:34.4509650Z 'range_flattens': [None, True], 2026-02-21T08:34:34.4509875Z 'range_multi_buffers': [True, False], 2026-02-21T08:34:34.4510125Z 'range_num_stages': [0, 0], 2026-02-21T08:34:34.4510334Z 'range_unroll_factors': [0, 0], 2026-02-21T08:34:34.4510572Z 'range_warp_specializes': [None, True]} 2026-02-21T08:34:34.4510841Z [209s] Fitting surrogate: 744 points, 744 targets 2026-02-21T08:34:35.2991019Z [210s] Generation 10 starting: 34 neighbors, 2 active search path(s) 2026-02-21T08:34:38.8826661Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/34 10.5 configs/s 2026-02-21T08:34:40.3134005Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 34/34 24.1 configs/s 2026-02-21T08:34:42.3930338Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 478.9 2026-02-21T08:34:42.3931880Z configs/s 2026-02-21T08:34:42.5112369Z [217s] Generation 10 complete: 2026-02-21T08:34:42.5112805Z error=12 2026-02-21T08:34:42.5113042Z ok=24 2026-02-21T08:34:42.5113304Z min=0.0706 2026-02-21T08:34:42.5113536Z mid=0.0789 2026-02-21T08:34:42.5113771Z max=4.1841 2026-02-21T08:34:42.5114626Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:34:42.5115532Z 'indexing': ['pointer', 
'pointer', 'pointer'], 2026-02-21T08:34:42.5115987Z 'l2_groupings': [2], 2026-02-21T08:34:42.5116341Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:34:42.5116745Z 'loop_orders': [[0, 1]], 2026-02-21T08:34:42.5117068Z 'maxnreg': 128, 2026-02-21T08:34:42.5117380Z 'num_sm_multiplier': 4, 2026-02-21T08:34:42.5117681Z 'num_stages': 3, 2026-02-21T08:34:42.5118007Z 'num_warps': 4, 2026-02-21T08:34:42.5118275Z 'pid_type': 'persistent_blocked', 2026-02-21T08:34:42.5118617Z 'range_flattens': [None, True], 2026-02-21T08:34:42.5119020Z 'range_multi_buffers': [True, False], 2026-02-21T08:34:42.5119390Z 'range_num_stages': [0, 0], 2026-02-21T08:34:42.5119776Z 'range_unroll_factors': [0, 0], 2026-02-21T08:34:42.5120136Z 'range_warp_specializes': [None, True]} 2026-02-21T08:34:42.5145340Z [217s] Fitting surrogate: 780 points, 780 targets 2026-02-21T08:34:43.0103732Z [218s] Generation 11 starting: 14 neighbors, 1 active search path(s) 2026-02-21T08:34:46.9032410Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14/14 2.3 configs/s 2026-02-21T08:34:47.5120764Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 14/14 23.7 configs/s 2026-02-21T08:34:48.0973067Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1624.4 2026-02-21T08:34:48.0973519Z configs/s 2026-02-21T08:34:48.1538560Z [223s] Generation 11 complete: 2026-02-21T08:34:48.1539128Z error=5 2026-02-21T08:34:48.1539473Z ok=11 2026-02-21T08:34:48.1539809Z min=0.0716 2026-02-21T08:34:48.1540210Z mid=0.1094 2026-02-21T08:34:48.1540533Z max=4.8814 2026-02-21T08:34:48.1540903Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:34:48.1541497Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:34:48.1542064Z 'l2_groupings': [2], 2026-02-21T08:34:48.1542502Z 'load_eviction_policies': ['first', ''], 2026-02-21T08:34:48.1543040Z 'loop_orders': [[0, 1]], 2026-02-21T08:34:48.1543471Z 'maxnreg': 128, 2026-02-21T08:34:48.1544358Z 'num_sm_multiplier': 4, 2026-02-21T08:34:48.1545185Z 
'num_stages': 3, 2026-02-21T08:34:48.1545574Z 'num_warps': 4, 2026-02-21T08:34:48.1545952Z 'pid_type': 'persistent_blocked', 2026-02-21T08:34:48.1546452Z 'range_flattens': [None, True], 2026-02-21T08:34:48.1546923Z 'range_multi_buffers': [True, False], 2026-02-21T08:34:48.1547420Z 'range_num_stages': [0, 0], 2026-02-21T08:34:48.1547854Z 'range_unroll_factors': [0, 0], 2026-02-21T08:34:48.1548336Z 'range_warp_specializes': [None, True]} 2026-02-21T08:34:48.1575344Z [223s] Fitting surrogate: 796 points, 796 targets 2026-02-21T08:34:48.4827298Z [223s] Autotuning complete in 223.5s after searching 761 configs. 2026-02-21T08:34:48.4828058Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:34:48.4830550Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=4, num_stages=3, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T08:34:48.4832902Z 2026-02-21T08:34:48.4833431Z [223s] Code of selected kernel: /tmp/torchinductor_root/bl/cblpuyesptln6d4u4v2urg7r6ts6isd75ewp7tzj4zlv5wwm5q3f.py 2026-02-21T08:35:22.6735625Z WARNING:tritonbench.utils.triton_op:Completed input ID 6: 2026-02-21T08:35:22.6736108Z (M, N, K) 2026-02-21T08:35:22.6736495Z ------------------ 2026-02-21T08:35:22.6736763Z (8192, 2048, 2048) 2026-02-21T08:35:22.6737774Z 2026-02-21T08:35:22.6748910Z 62%|██████▎ | 5/8 [25:11<16:10, 323.55s/it]WARNING:tritonbench.utils.triton_op:Running input ID 8: 2026-02-21T08:35:22.6752989Z (M, N, K) 2026-02-21T08:35:22.6756868Z ------------------- 2026-02-21T08:35:22.6760717Z (12288, 1024, 1024) 2026-02-21T08:35:22.6765309Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 
2026-02-21T08:36:10.6562928Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:36:49.6090980Z Autotune Choices Stats: 2026-02-21T08:36:49.6092970Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_112", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.0387520007789135, "best_triton_pos": 0} 2026-02-21T08:36:49.6097620Z AUTOTUNE mm(12288x1024, 1024x1024) 2026-02-21T08:36:49.6098054Z strides: [1024, 1], [1, 1024] 2026-02-21T08:36:49.6098482Z dtypes: torch.float16, torch.float16 2026-02-21T08:36:49.6099727Z triton_mm_112 0.0388 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:36:49.6101764Z triton_mm_111 0.0429 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:36:49.6104158Z triton_mm_106 0.0469 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:36:49.6106180Z triton_mm_113 0.0490 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:36:49.6108206Z triton_mm_108 0.0531 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:36:49.6110193Z triton_mm_105 0.0551 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:36:49.6112362Z 
triton_mm_109 0.0551 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:36:49.6114324Z triton_mm_104 0.0551 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:36:49.6116342Z triton_mm_102 0.0572 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:36:49.6118483Z triton_mm_107 0.0705 ms 55.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T08:36:49.6120137Z SingleProcess AUTOTUNE benchmarking takes 0.3622 seconds and 0.2527 seconds precompiling for 19 choices 2026-02-21T08:36:49.8895869Z INFO:tritonbench.utils.triton_op:Took 1062.29ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:37:30.1569285Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:37:30.1571613Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:37:30.1571903Z 'dtype': 'torch.float16', 2026-02-21T08:37:30.1572138Z 'shape': (12288, 1024), 2026-02-21T08:37:30.1572358Z 'stride': (1024, 1)}, 2026-02-21T08:37:30.1572581Z { 'device': 'cuda:0', 2026-02-21T08:37:30.1572799Z 'dtype': 'torch.float16', 2026-02-21T08:37:30.1573020Z 'shape': (1024, 1024), 2026-02-21T08:37:30.1573289Z 'stride': (1, 1024)}, 2026-02-21T08:37:30.1573494Z None), 2026-02-21T08:37:30.1573657Z 'kwargs': {}} 2026-02-21T08:37:30.1618287Z INFO:tritonbench.utils.triton_op:Took 5.34ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:37:30.2545541Z [0s] Autotune random seed: 2134884919 2026-02-21T08:37:30.3680631Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 
2026-02-21T08:37:37.4253627Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 48.1 configs/s 2026-02-21T08:37:46.8304067Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━ 100/100 10.6 configs/s 2026-02-21T08:37:46.8319469Z [16s] Adaptive compile timeout: 30s (90% percentile=5.2s, bounds=[30.0s, 30s]) 2026-02-21T08:37:46.8323703Z [16s] Initial random population of 100, 5 starting points: 2026-02-21T08:37:46.8327389Z error=16 2026-02-21T08:37:46.8331370Z ok=84 2026-02-21T08:37:46.8335268Z min=0.0717 2026-02-21T08:37:46.8338758Z mid=1.9262 2026-02-21T08:37:46.8343879Z max=67.3525 2026-02-21T08:37:46.8347918Z best={'block_sizes': [128, 64, 32], 2026-02-21T08:37:46.8352285Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:37:46.8353459Z 'l2_groupings': [2], 2026-02-21T08:37:46.8353679Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:37:46.8353878Z 'loop_orders': [[0, 1]], 2026-02-21T08:37:46.8354050Z 'maxnreg': 256, 2026-02-21T08:37:46.8354510Z 'num_sm_multiplier': 2, 2026-02-21T08:37:46.8354724Z 'num_stages': 4, 2026-02-21T08:37:46.8354861Z 'num_warps': 8, 2026-02-21T08:37:46.8355020Z 'pid_type': 'persistent_blocked', 2026-02-21T08:37:46.8355209Z 'range_flattens': [False, None], 2026-02-21T08:37:46.8355388Z 'range_multi_buffers': [None, True], 2026-02-21T08:37:46.8355580Z 'range_num_stages': [0, 0], 2026-02-21T08:37:46.8355743Z 'range_unroll_factors': [0, 0], 2026-02-21T08:37:46.8355928Z 'range_warp_specializes': [True, None]} 2026-02-21T08:37:46.8356134Z [16s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:37:48.0340552Z [17s] Generation 1 starting: 80 neighbors, 5 active search path(s) 2026-02-21T08:37:54.7096643Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 6.2 configs/s 2026-02-21T08:37:58.9732970Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 19.9 configs/s 2026-02-21T08:38:00.2866588Z Generation 1: verifying top configs 100% 
━━━━━━━━━━━━━ 1000/1000 1244.3 2026-02-21T08:38:00.2867460Z configs/s 2026-02-21T08:38:00.3543725Z [29s] Generation 1 complete: 2026-02-21T08:38:00.3547734Z error=12 2026-02-21T08:38:00.3549729Z ok=74 2026-02-21T08:38:00.3549925Z min=0.0471 2026-02-21T08:38:00.3550069Z mid=0.2212 2026-02-21T08:38:00.3550211Z max=2.0333 2026-02-21T08:38:00.3550361Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:38:00.3550671Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:38:00.3551302Z 'l2_groupings': [2], 2026-02-21T08:38:00.3551505Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:38:00.3551717Z 'loop_orders': [[1, 0]], 2026-02-21T08:38:00.3551897Z 'maxnreg': 256, 2026-02-21T08:38:00.3552070Z 'num_sm_multiplier': 2, 2026-02-21T08:38:00.3552239Z 'num_stages': 4, 2026-02-21T08:38:00.3552402Z 'num_warps': 2, 2026-02-21T08:38:00.3552572Z 'pid_type': 'persistent_blocked', 2026-02-21T08:38:00.3552785Z 'range_flattens': [False, None], 2026-02-21T08:38:00.3552987Z 'range_multi_buffers': [None, True], 2026-02-21T08:38:00.3553190Z 'range_num_stages': [0, 0], 2026-02-21T08:38:00.3553369Z 'range_unroll_factors': [0, 0], 2026-02-21T08:38:00.3553569Z 'range_warp_specializes': [True, None]} 2026-02-21T08:38:00.3561188Z [29s] Fitting surrogate: 186 points, 186 targets 2026-02-21T08:38:01.7307722Z [31s] Generation 2 starting: 89 neighbors, 5 active search path(s) 2026-02-21T08:38:09.1162000Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93/93 5.4 configs/s 2026-02-21T08:38:13.3804565Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 93/93 22.1 configs/s 2026-02-21T08:38:15.7374797Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 426.8 2026-02-21T08:38:15.7376168Z configs/s 2026-02-21T08:38:15.8678947Z [45s] Generation 2 complete: 2026-02-21T08:38:15.8682986Z error=24 2026-02-21T08:38:15.8684460Z ok=71 2026-02-21T08:38:15.8684629Z min=0.0471 2026-02-21T08:38:15.8686597Z mid=0.1126 
2026-02-21T08:38:15.8686722Z max=9.2375 2026-02-21T08:38:15.8686867Z best={'block_sizes': [128, 128, 32], 2026-02-21T08:38:15.8687131Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:38:15.8687373Z 'l2_groupings': [2], 2026-02-21T08:38:15.8687549Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:38:15.8687741Z 'loop_orders': [[1, 0]], 2026-02-21T08:38:15.8687904Z 'maxnreg': 256, 2026-02-21T08:38:15.8688058Z 'num_sm_multiplier': 2, 2026-02-21T08:38:15.8688220Z 'num_stages': 4, 2026-02-21T08:38:15.8688350Z 'num_warps': 1, 2026-02-21T08:38:15.8688505Z 'pid_type': 'persistent_blocked', 2026-02-21T08:38:15.8688683Z 'range_flattens': [False, None], 2026-02-21T08:38:15.8688863Z 'range_multi_buffers': [None, True], 2026-02-21T08:38:15.8689048Z 'range_num_stages': [0, 0], 2026-02-21T08:38:15.8689209Z 'range_unroll_factors': [0, 0], 2026-02-21T08:38:15.8689391Z 'range_warp_specializes': [True, None]} 2026-02-21T08:38:15.8698672Z [45s] Fitting surrogate: 281 points, 281 targets 2026-02-21T08:38:17.0225112Z [46s] Generation 3 starting: 84 neighbors, 5 active search path(s) 2026-02-21T08:38:50.0866675Z [79s] Timeout after 30s compiling Config(block_sizes=[256, 512, 32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=1, num_stages=5, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, False]) 2026-02-21T08:38:50.0882575Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 0.4 configs/s 2026-02-21T08:38:53.8264140Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 23.7 configs/s 2026-02-21T08:38:56.5375517Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 371.7 2026-02-21T08:38:56.5376046Z configs/s 2026-02-21T08:38:56.7042443Z 
[86s] Generation 3 complete: 2026-02-21T08:38:56.7047724Z error=25 2026-02-21T08:38:56.7052008Z timeout=1 2026-02-21T08:38:56.7056089Z ok=63 2026-02-21T08:38:56.7060556Z min=0.0430 2026-02-21T08:38:56.7064907Z mid=0.0716 2026-02-21T08:38:56.7067263Z max=6.3847 2026-02-21T08:38:56.7067535Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:38:56.7067930Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T08:38:56.7068609Z 'l2_groupings': [4], 2026-02-21T08:38:56.7068879Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:38:56.7069200Z 'loop_orders': [[1, 0]], 2026-02-21T08:38:56.7069450Z 'num_stages': 2, 2026-02-21T08:38:56.7069694Z 'num_warps': 4, 2026-02-21T08:38:56.7069915Z 'pid_type': 'flat', 2026-02-21T08:38:56.7070265Z 'range_flattens': [None, None], 2026-02-21T08:38:56.7070569Z 'range_multi_buffers': [None, True], 2026-02-21T08:38:56.7070873Z 'range_num_stages': [0, 0], 2026-02-21T08:38:56.7071164Z 'range_unroll_factors': [0, 0], 2026-02-21T08:38:56.7071458Z 'range_warp_specializes': [None, None]} 2026-02-21T08:38:56.7071818Z [86s] Fitting surrogate: 370 points, 370 targets 2026-02-21T08:38:57.9486455Z [87s] Generation 4 starting: 85 neighbors, 5 active search path(s) 2026-02-21T08:39:02.7596549Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86/86 20.2 configs/s 2026-02-21T08:39:06.1741142Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 86/86 25.7 configs/s 2026-02-21T08:39:09.9573556Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 296.3 2026-02-21T08:39:09.9573888Z configs/s 2026-02-21T08:39:10.1578969Z [99s] Generation 4 complete: 2026-02-21T08:39:10.1580969Z error=29 2026-02-21T08:39:10.1581142Z ok=61 2026-02-21T08:39:10.1581291Z min=0.0430 2026-02-21T08:39:10.1581423Z mid=0.0574 2026-02-21T08:39:10.1581553Z max=2.0634 2026-02-21T08:39:10.1581976Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:39:10.1582214Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 
2026-02-21T08:39:10.1582437Z 'l2_groupings': [4], 2026-02-21T08:39:10.1582597Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:39:10.1582783Z 'loop_orders': [[1, 0]], 2026-02-21T08:39:10.1582933Z 'num_stages': 2, 2026-02-21T08:39:10.1583074Z 'num_warps': 4, 2026-02-21T08:39:10.1583207Z 'pid_type': 'flat', 2026-02-21T08:39:10.1583361Z 'range_flattens': [None, None], 2026-02-21T08:39:10.1583550Z 'range_multi_buffers': [None, True], 2026-02-21T08:39:10.1583727Z 'range_num_stages': [0, 0], 2026-02-21T08:39:10.1583893Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:10.1584065Z 'range_warp_specializes': [None, None]} 2026-02-21T08:39:10.1599652Z [99s] Fitting surrogate: 460 points, 460 targets 2026-02-21T08:39:11.0695393Z [100s] Generation 5 starting: 57 neighbors, 4 active search path(s) 2026-02-21T08:39:16.0211673Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 10.8 configs/s 2026-02-21T08:39:18.7488442Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 21.7 configs/s 2026-02-21T08:39:20.8355253Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 483.0 2026-02-21T08:39:20.8355871Z configs/s 2026-02-21T08:39:20.9736400Z [110s] Generation 5 complete: 2026-02-21T08:39:20.9736668Z error=13 2026-02-21T08:39:20.9736812Z ok=48 2026-02-21T08:39:20.9737262Z min=0.0389 2026-02-21T08:39:20.9737397Z mid=0.0594 2026-02-21T08:39:20.9737523Z max=4.6126 2026-02-21T08:39:20.9737661Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:39:20.9737896Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:39:20.9738114Z 'l2_groupings': [64], 2026-02-21T08:39:20.9738298Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:39:20.9738491Z 'loop_orders': [[0, 1]], 2026-02-21T08:39:20.9738657Z 'maxnreg': 256, 2026-02-21T08:39:20.9738797Z 'num_sm_multiplier': 1, 2026-02-21T08:39:20.9738965Z 'num_stages': 5, 2026-02-21T08:39:20.9739101Z 'num_warps': 1, 2026-02-21T08:39:20.9739258Z 'pid_type': 'persistent_blocked', 
2026-02-21T08:39:20.9739441Z 'range_flattens': [True, True], 2026-02-21T08:39:20.9739614Z 'range_multi_buffers': [None, False], 2026-02-21T08:39:20.9739798Z 'range_num_stages': [0, 0], 2026-02-21T08:39:20.9739956Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:20.9740136Z 'range_warp_specializes': [None, True]} 2026-02-21T08:39:20.9761228Z [110s] Fitting surrogate: 521 points, 521 targets 2026-02-21T08:39:21.9381055Z [111s] Generation 6 starting: 53 neighbors, 3 active search path(s) 2026-02-21T08:39:25.4339849Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54/54 16.9 configs/s 2026-02-21T08:39:27.7860234Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 54/54 23.6 configs/s 2026-02-21T08:39:29.3945139Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 624.6 2026-02-21T08:39:29.3945500Z configs/s 2026-02-21T08:39:29.4996617Z [119s] Generation 6 complete: 2026-02-21T08:39:29.4996949Z error=14 2026-02-21T08:39:29.4997094Z ok=42 2026-02-21T08:39:29.4997230Z min=0.0389 2026-02-21T08:39:29.4997393Z mid=0.0614 2026-02-21T08:39:29.4997527Z max=0.8294 2026-02-21T08:39:29.4997675Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:39:29.4997955Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:39:29.4998266Z 'l2_groupings': [64], 2026-02-21T08:39:29.4998466Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:39:29.4998678Z 'loop_orders': [[0, 1]], 2026-02-21T08:39:29.4998844Z 'maxnreg': 256, 2026-02-21T08:39:29.4998990Z 'num_sm_multiplier': 1, 2026-02-21T08:39:29.4999157Z 'num_stages': 5, 2026-02-21T08:39:29.4999309Z 'num_warps': 1, 2026-02-21T08:39:29.4999465Z 'pid_type': 'persistent_blocked', 2026-02-21T08:39:29.4999650Z 'range_flattens': [True, True], 2026-02-21T08:39:29.5000147Z 'range_multi_buffers': [None, False], 2026-02-21T08:39:29.5000330Z 'range_num_stages': [0, 0], 2026-02-21T08:39:29.5000505Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:29.5000718Z 'range_warp_specializes': 
[True, None]} 2026-02-21T08:39:29.5018128Z [119s] Fitting surrogate: 577 points, 577 targets 2026-02-21T08:39:30.2700576Z [119s] Generation 7 starting: 39 neighbors, 2 active search path(s) 2026-02-21T08:39:38.4393885Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40/40 1.1 configs/s 2026-02-21T08:39:40.0453130Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 40/40 25.9 configs/s 2026-02-21T08:39:41.2285746Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 845.2 2026-02-21T08:39:41.2286091Z configs/s 2026-02-21T08:39:41.3179392Z [130s] Generation 7 complete: 2026-02-21T08:39:41.3182991Z error=13 2026-02-21T08:39:41.3187289Z ok=28 2026-02-21T08:39:41.3189252Z min=0.0370 2026-02-21T08:39:41.3189438Z mid=0.0573 2026-02-21T08:39:41.3189635Z max=0.2621 2026-02-21T08:39:41.3189789Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:39:41.3190056Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:39:41.3190313Z 'l2_groupings': [64], 2026-02-21T08:39:41.3194205Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T08:39:41.3196162Z 'loop_orders': [[0, 1]], 2026-02-21T08:39:41.3196365Z 'maxnreg': 256, 2026-02-21T08:39:41.3196780Z 'num_sm_multiplier': 1, 2026-02-21T08:39:41.3196943Z 'num_stages': 6, 2026-02-21T08:39:41.3197076Z 'num_warps': 1, 2026-02-21T08:39:41.3197231Z 'pid_type': 'persistent_blocked', 2026-02-21T08:39:41.3197414Z 'range_flattens': [True, True], 2026-02-21T08:39:41.3197598Z 'range_multi_buffers': [None, True], 2026-02-21T08:39:41.3197785Z 'range_num_stages': [0, 0], 2026-02-21T08:39:41.3197953Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:41.3198138Z 'range_warp_specializes': [True, None]} 2026-02-21T08:39:41.3200217Z [130s] Fitting surrogate: 618 points, 618 targets 2026-02-21T08:39:42.0190838Z [131s] Generation 8 starting: 33 neighbors, 2 active search path(s) 2026-02-21T08:39:46.7271213Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/34 2.6 configs/s 
2026-02-21T08:39:48.0013993Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 34/34 28.2 configs/s 2026-02-21T08:39:49.2694282Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 789.6 2026-02-21T08:39:49.2695318Z configs/s 2026-02-21T08:39:49.3571441Z [138s] Generation 8 complete: 2026-02-21T08:39:49.3575313Z error=13 2026-02-21T08:39:49.3576977Z ok=22 2026-02-21T08:39:49.3577150Z min=0.0379 2026-02-21T08:39:49.3577279Z mid=0.0511 2026-02-21T08:39:49.3577407Z max=1.1644 2026-02-21T08:39:49.3577854Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:39:49.3578146Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:39:49.3578396Z 'l2_groupings': [64], 2026-02-21T08:39:49.3578572Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:39:49.3578757Z 'loop_orders': [[0, 1]], 2026-02-21T08:39:49.3578915Z 'maxnreg': 256, 2026-02-21T08:39:49.3579062Z 'num_sm_multiplier': 1, 2026-02-21T08:39:49.3579213Z 'num_stages': 6, 2026-02-21T08:39:49.3579353Z 'num_warps': 1, 2026-02-21T08:39:49.3579503Z 'pid_type': 'persistent_blocked', 2026-02-21T08:39:49.3579688Z 'range_flattens': [True, True], 2026-02-21T08:39:49.3579867Z 'range_multi_buffers': [None, False], 2026-02-21T08:39:49.3580050Z 'range_num_stages': [0, 0], 2026-02-21T08:39:49.3580208Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:49.3580384Z 'range_warp_specializes': [True, None]} 2026-02-21T08:39:49.3597757Z [138s] Fitting surrogate: 653 points, 653 targets 2026-02-21T08:39:50.0659630Z [139s] Generation 9 starting: 33 neighbors, 2 active search path(s) 2026-02-21T08:39:52.3921287Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/34 19.4 configs/s 2026-02-21T08:39:53.2028080Z 2026-02-21T08:39:53.2028092Z 2026-02-21T08:39:53.2028461Z ================================================================ 2026-02-21T08:39:53.2028777Z Internal Triton PTX codegen error 2026-02-21T08:39:53.2028983Z `ptxas` stderr: 2026-02-21T08:39:53.2029515Z ptxas fatal : 
(C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:39:53.2030057Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:39:53.2030251Z 2026-02-21T08:39:53.2030708Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpirsf_ryc.ptx -o /tmp/tmpirsf_ryc.ptx.o 2026-02-21T08:39:53.2031214Z 2026-02-21T08:39:53.2031218Z 2026-02-21T08:39:53.2031284Z // 2026-02-21T08:39:53.2031423Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:39:53.2031617Z // 2026-02-21T08:39:53.2032002Z 2026-02-21T08:39:53.2032099Z .version 8.7 2026-02-21T08:39:53.2032252Z .target sm_100a 2026-02-21T08:39:53.2033772Z .address_size 64 2026-02-21T08:39:53.2036972Z 2026-02-21T08:39:53.2037129Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:39:53.2037413Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:39:53.2041359Z // @_helion_matmul 2026-02-21T08:39:53.2044928Z .visible .entry _helion_matmul( 2026-02-21T08:39:53.2049610Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:39:53.2053791Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:39:53.2057071Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:39:53.2057370Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:39:53.2057639Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:39:53.2057846Z ) 2026-02-21T08:39:53.2057973Z .reqntid 256 2026-02-21T08:39:53.2058107Z .maxnreg 32 2026-02-21T08:39:53.2058236Z { 2026-02-21T08:39:53.2058357Z .reg .pred %p<130>; 2026-02-21T08:39:53.2058536Z .reg .b32 %r<1656>; 2026-02-21T08:39:53.2058674Z .reg .b64 %rd<609>; 2026-02-21T08:39:53.2058963Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19:0 2026-02-21T08:39:53.2059502Z $L__func_begin0: 
2026-02-21T08:39:53.2059771Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19:0 2026-02-21T08:39:53.2060021Z 2026-02-21T08:39:53.2060074Z // %bb.0: 2026-02-21T08:39:53.2060233Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T08:39:53.2060420Z $L__tmp0: 2026-02-21T08:39:53.2060658Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19 2026-02-21T08:39:53.2060962Z mov.u32 %r1, %tid.x; 2026-02-21T08:39:53.2061316Z shr.u32 %r2, %r1, 5; 2026-02-21T08:39:53.2061484Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:39:53.2061675Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T08:39:53.2061828Z @%p3 bra $L__BB0_16; 2026-02-21T08:39:53.2061976Z bra.uni $L__BB0_1; 2026-02-21T08:39:53.2062113Z $L__BB0_16: 2026-02-21T08:39:53.2062352Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:0 2026-02-21T08:39:53.2062652Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T08:39:53.2062870Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T08:39:53.2063156Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19 2026-02-21T08:39:53.2063449Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:39:53.2063673Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T08:39:53.2063840Z mov.b32 %r198, global_smem; 2026-02-21T08:39:53.2064011Z // begin inline asm 2026-02-21T08:39:53.2064266Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r198], 256; 2026-02-21T08:39:53.2064517Z // end inline asm 2026-02-21T08:39:53.2064662Z bar.sync 0, 128; 2026-02-21T08:39:53.2064956Z ld.shared.b32 %r1627, [global_smem]; 2026-02-21T08:39:53.2065135Z bar.sync 0, 128; 2026-02-21T08:39:53.2065269Z // begin inline asm 2026-02-21T08:39:53.2065482Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:39:53.2065708Z // end inline asm 2026-02-21T08:39:53.2065968Z .loc 1 21 68 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:21:68 2026-02-21T08:39:53.2066261Z mov.u32 %r509, %ctaid.x; 
2026-02-21T08:39:53.2066412Z mov.u32 %r510, %ctaid.y; 2026-02-21T08:39:53.2066568Z mov.u32 %r511, %ctaid.z; 2026-02-21T08:39:53.2066716Z mov.u32 %r512, %nctaid.x; 2026-02-21T08:39:53.2066878Z mov.u32 %r513, %nctaid.y; 2026-02-21T08:39:53.2067037Z mad.lo.s32 %r514, %r511, %r513, %r510; 2026-02-21T08:39:53.2067229Z mad.lo.s32 %r515, %r514, %r512, %r509; 2026-02-21T08:39:53.2067403Z shl.b32 %r516, %r515, 8; 2026-02-21T08:39:53.2067560Z cvt.s64.s32 %rd64, %r516; 2026-02-21T08:39:53.2067756Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T08:39:53.2067911Z shl.b32 %r517, %r1, 2; 2026-02-21T08:39:53.2068070Z add.s32 %r199, %r198, %r517; 2026-02-21T08:39:53.2068221Z mov.b32 %r1655, 0; 2026-02-21T08:39:53.2068360Z // begin inline asm 2026-02-21T08:39:53.2068511Z @%p33 st.shared.b32 [ %r199 + 0 ], %r1655; 2026-02-21T08:39:53.2068695Z // end inline asm 2026-02-21T08:39:53.2068831Z bar.warp.sync -1; 2026-02-21T08:39:53.2068983Z setp.eq.b32 %p114, %r1, 0; 2026-02-21T08:39:53.2069134Z cvt.u64.u32 %rd28, %r198; 2026-02-21T08:39:53.2069323Z // begin inline asm 2026-02-21T08:39:53.2069576Z @%p114 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T08:39:53.2069852Z // end inline asm 2026-02-21T08:39:53.2069989Z // begin inline asm 2026-02-21T08:39:53.2070204Z @%p114 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T08:39:53.2070458Z // end inline asm 2026-02-21T08:39:53.2070585Z mov.b32 %r201, 32; 2026-02-21T08:39:53.2070742Z // begin inline asm 2026-02-21T08:39:53.2070981Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r201; 2026-02-21T08:39:53.2071250Z // end inline asm 2026-02-21T08:39:53.2071386Z mov.b32 %r202, 256; 2026-02-21T08:39:53.2071522Z // begin inline asm 2026-02-21T08:39:53.2071754Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r202; 2026-02-21T08:39:53.2072011Z // end inline asm 2026-02-21T08:39:53.2072146Z mov.b32 %r203, 1024; 
2026-02-21T08:39:53.2072280Z // begin inline asm 2026-02-21T08:39:53.2072525Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r203; 2026-02-21T08:39:53.2072797Z // end inline asm 2026-02-21T08:39:53.2072926Z mov.b32 %r204, 12288; 2026-02-21T08:39:53.2073073Z // begin inline asm 2026-02-21T08:39:53.2073339Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r204; 2026-02-21T08:39:53.2073612Z // end inline asm 2026-02-21T08:39:53.2073773Z mov.b64 %rd36, 2048; 2026-02-21T08:39:53.2073912Z // begin inline asm 2026-02-21T08:39:53.2074165Z @%p114 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T08:39:53.2074447Z // end inline asm 2026-02-21T08:39:53.2074584Z mov.b32 %r205, 1; 2026-02-21T08:39:53.2074755Z // begin inline asm 2026-02-21T08:39:53.2075015Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r205; 2026-02-21T08:39:53.2075303Z // end inline asm 2026-02-21T08:39:53.2075436Z // begin inline asm 2026-02-21T08:39:53.2075700Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r205; 2026-02-21T08:39:53.2075976Z // end inline asm 2026-02-21T08:39:53.2076114Z // begin inline asm 2026-02-21T08:39:53.2076343Z @%p114 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T08:39:53.2076609Z // end inline asm 2026-02-21T08:39:53.2076746Z // begin inline asm 2026-02-21T08:39:53.2076998Z @%p114 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T08:39:53.2077319Z // end inline asm 2026-02-21T08:39:53.2077447Z // begin inline asm 2026-02-21T08:39:53.2077683Z @%p114 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T08:39:53.2077945Z // end inline asm 2026-02-21T08:39:53.2078079Z // begin inline asm 2026-02-21T08:39:53.2078309Z @%p114 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T08:39:53.2078560Z // end inline asm 2026-02-21T08:39:53.2078696Z // begin inline asm 2026-02-21T08:39:53.2079034Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T08:39:53.2079409Z // end inline asm 2026-02-21T08:39:53.2079540Z // begin inline asm 2026-02-21T08:39:53.2079756Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T08:39:53.2080037Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T08:39:53.2080227Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:39:53.2080404Z // end inline asm 2026-02-21T08:39:53.2080530Z bar.sync 0, 128; 2026-02-21T08:39:53.2080785Z .loc 1 22 67 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:22:67 2026-02-21T08:39:53.2081070Z add.s64 %rd61, %rd43, 128; 2026-02-21T08:39:53.2081224Z bar.sync 0, 128; 2026-02-21T08:39:53.2081353Z // begin inline asm 2026-02-21T08:39:53.2081506Z @%p33 st.shared.b32 [ %r199 + 0 ], %r1655; 2026-02-21T08:39:53.2081712Z // end inline asm 2026-02-21T08:39:53.2081846Z bar.warp.sync -1; 2026-02-21T08:39:53.2081988Z // begin inline asm 2026-02-21T08:39:53.2082229Z @%p114 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T08:39:53.2082503Z // end inline asm 2026-02-21T08:39:53.2082630Z // begin inline asm 2026-02-21T08:39:53.2082855Z @%p114 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T08:39:53.2083106Z // end inline asm 2026-02-21T08:39:53.2083242Z // begin inline asm 2026-02-21T08:39:53.2083480Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r201; 2026-02-21T08:39:53.2083748Z // end inline asm 2026-02-21T08:39:53.2083888Z mov.b32 %r210, 128; 2026-02-21T08:39:53.2084027Z // begin inline asm 2026-02-21T08:39:53.2084259Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 
], 0x1, %r210; 2026-02-21T08:39:53.2084516Z // end inline asm 2026-02-21T08:39:53.2084650Z // begin inline asm 2026-02-21T08:39:53.2084918Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r203; 2026-02-21T08:39:53.2085199Z // end inline asm 2026-02-21T08:39:53.2085334Z // begin inline asm 2026-02-21T08:39:53.2085569Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r203; 2026-02-21T08:39:53.2085878Z // end inline asm 2026-02-21T08:39:53.2086010Z // begin inline asm 2026-02-21T08:39:53.2086271Z @%p114 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T08:39:53.2086551Z // end inline asm 2026-02-21T08:39:53.2086691Z // begin inline asm 2026-02-21T08:39:53.2086952Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r205; 2026-02-21T08:39:53.2087231Z // end inline asm 2026-02-21T08:39:53.2087367Z // begin inline asm 2026-02-21T08:39:53.2087613Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r205; 2026-02-21T08:39:53.2087902Z // end inline asm 2026-02-21T08:39:53.2088031Z // begin inline asm 2026-02-21T08:39:53.2088266Z @%p114 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T08:39:53.2088532Z // end inline asm 2026-02-21T08:39:53.2088664Z // begin inline asm 2026-02-21T08:39:53.2088916Z @%p114 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T08:39:53.2089208Z // end inline asm 2026-02-21T08:39:53.2089348Z // begin inline asm 2026-02-21T08:39:53.2089615Z @%p114 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T08:39:53.2089895Z // end inline asm 2026-02-21T08:39:53.2090027Z // begin inline asm 2026-02-21T08:39:53.2090261Z @%p114 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T08:39:53.2090528Z // end inline asm 
2026-02-21T08:39:53.2090660Z // begin inline asm 2026-02-21T08:39:53.2091013Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T08:39:53.2091390Z // end inline asm 2026-02-21T08:39:53.2091530Z // begin inline asm 2026-02-21T08:39:53.2091741Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T08:39:53.2091997Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T08:39:53.2092198Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:39:53.2092378Z // end inline asm 2026-02-21T08:39:53.2092550Z bar.sync 0, 128; 2026-02-21T08:39:53.2092809Z .loc 1 29 35 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:29:35 2026-02-21T08:39:53.2093110Z mul.lo.s32 %r41, %r509, 3; 2026-02-21T08:39:53.2093381Z .loc 1 30 37 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:30:37 2026-02-21T08:39:53.2093678Z add.s32 %r518, %r41, 3; 2026-02-21T08:39:53.2093952Z .loc 1 30 49 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:30:49 2026-02-21T08:39:53.2094279Z min.s32 %r519, %r518, 384; 2026-02-21T08:39:53.2094564Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2094896Z sub.s32 %r522, %r519, %r41; 2026-02-21T08:39:53.2095068Z shl.b32 %r1645, %r522, 5; 2026-02-21T08:39:53.2095339Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2095674Z shfl.sync.idx.b32 %r523, %r2, 0, 31, -1; 2026-02-21T08:39:53.2095873Z shl.b32 %r524, %r523, 21; 2026-02-21T08:39:53.2096032Z and.b32 %r525, %r524, 6291456; 2026-02-21T08:39:53.2096207Z add.s32 %r215, %r525, %r1627; 2026-02-21T08:39:53.2096369Z mov.pred %p71, -1; 2026-02-21T08:39:53.2096521Z // begin inline asm 2026-02-21T08:39:53.2096928Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 0], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 
2026-02-21T08:39:53.2097372Z // end inline asm 2026-02-21T08:39:53.2097525Z // begin inline asm 2026-02-21T08:39:53.2097913Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 16], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2098339Z // end inline asm 2026-02-21T08:39:53.2098502Z // begin inline asm 2026-02-21T08:39:53.2098884Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 32], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2099297Z // end inline asm 2026-02-21T08:39:53.2099433Z // begin inline asm 2026-02-21T08:39:53.2099806Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 48], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2100222Z // end inline asm 2026-02-21T08:39:53.2100358Z // begin inline asm 2026-02-21T08:39:53.2100726Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 64], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2101147Z // end inline asm 2026-02-21T08:39:53.2101277Z // begin inline asm 2026-02-21T08:39:53.2101655Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 80], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2102118Z // end inline asm 2026-02-21T08:39:53.2102246Z // begin inline asm 2026-02-21T08:39:53.2102615Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 96], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2103014Z // end inline asm 2026-02-21T08:39:53.2103148Z // begin inline asm 
2026-02-21T08:39:53.2103529Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 112], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2103931Z // end inline asm 2026-02-21T08:39:53.2104066Z // begin inline asm 2026-02-21T08:39:53.2104435Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 128], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2104910Z // end inline asm 2026-02-21T08:39:53.2105042Z // begin inline asm 2026-02-21T08:39:53.2105416Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 144], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2105830Z // end inline asm 2026-02-21T08:39:53.2105956Z // begin inline asm 2026-02-21T08:39:53.2106333Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 160], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2106766Z // end inline asm 2026-02-21T08:39:53.2106902Z // begin inline asm 2026-02-21T08:39:53.2107282Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 176], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2107686Z // end inline asm 2026-02-21T08:39:53.2107825Z // begin inline asm 2026-02-21T08:39:53.2108209Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 192], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2108625Z // end inline asm 2026-02-21T08:39:53.2108752Z // begin inline asm 2026-02-21T08:39:53.2109131Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 208], {%r1655, %r1655, 
%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2109547Z // end inline asm 2026-02-21T08:39:53.2109675Z // begin inline asm 2026-02-21T08:39:53.2110055Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 224], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2110499Z // end inline asm 2026-02-21T08:39:53.2110645Z // begin inline asm 2026-02-21T08:39:53.2111041Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r215 + 240], {%r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655, %r1655}; 2026-02-21T08:39:53.2111455Z // end inline asm 2026-02-21T08:39:53.2111590Z // begin inline asm 2026-02-21T08:39:53.2111738Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:39:53.2111901Z // end inline asm 2026-02-21T08:39:53.2112027Z bar.sync 0, 128; 2026-02-21T08:39:53.2112282Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2112582Z add.s32 %r487, %r198, 155648; 2026-02-21T08:39:53.2112739Z // begin inline asm 2026-02-21T08:39:53.2112909Z @%p114 mbarrier.init.shared::cta.b64 [%r487], 1; 2026-02-21T08:39:53.2113096Z // end inline asm 2026-02-21T08:39:53.2113231Z bar.sync 0, 128; 2026-02-21T08:39:53.2113362Z add.s32 %r488, %r198, 155656; 2026-02-21T08:39:53.2113517Z // begin inline asm 2026-02-21T08:39:53.2113679Z @%p114 mbarrier.init.shared::cta.b64 [%r488], 1; 2026-02-21T08:39:53.2113871Z // end inline asm 2026-02-21T08:39:53.2114031Z bar.sync 0, 128; 2026-02-21T08:39:53.2114168Z add.s32 %r489, %r198, 155664; 2026-02-21T08:39:53.2114322Z // begin inline asm 2026-02-21T08:39:53.2114476Z @%p114 mbarrier.init.shared::cta.b64 [%r489], 1; 2026-02-21T08:39:53.2114663Z // end inline asm 2026-02-21T08:39:53.2114816Z bar.sync 0, 128; 2026-02-21T08:39:53.2114954Z add.s32 %r490, %r198, 155672; 
2026-02-21T08:39:53.2115100Z // begin inline asm 2026-02-21T08:39:53.2115263Z @%p114 mbarrier.init.shared::cta.b64 [%r490], 1; 2026-02-21T08:39:53.2115444Z // end inline asm 2026-02-21T08:39:53.2115577Z bar.sync 0, 128; 2026-02-21T08:39:53.2115706Z add.s32 %r491, %r198, 155680; 2026-02-21T08:39:53.2115862Z // begin inline asm 2026-02-21T08:39:53.2116022Z @%p114 mbarrier.init.shared::cta.b64 [%r491], 1; 2026-02-21T08:39:53.2116197Z // end inline asm 2026-02-21T08:39:53.2116332Z bar.sync 0, 128; 2026-02-21T08:39:53.2116461Z add.s32 %r492, %r198, 155688; 2026-02-21T08:39:53.2116648Z // begin inline asm 2026-02-21T08:39:53.2116805Z @%p114 mbarrier.init.shared::cta.b64 [%r492], 1; 2026-02-21T08:39:53.2116990Z // end inline asm 2026-02-21T08:39:53.2117119Z add.s32 %r493, %r198, 155696; 2026-02-21T08:39:53.2117271Z // begin inline asm 2026-02-21T08:39:53.2117433Z @%p114 mbarrier.init.shared::cta.b64 [%r493], 1; 2026-02-21T08:39:53.2117611Z // end inline asm 2026-02-21T08:39:53.2117744Z bar.sync 0, 128; 2026-02-21T08:39:53.2117874Z add.s32 %r494, %r198, 155704; 2026-02-21T08:39:53.2118057Z // begin inline asm 2026-02-21T08:39:53.2118212Z @%p114 mbarrier.init.shared::cta.b64 [%r494], 1; 2026-02-21T08:39:53.2118394Z // end inline asm 2026-02-21T08:39:53.2118520Z bar.sync 0, 128; 2026-02-21T08:39:53.2118658Z add.s32 %r495, %r198, 155712; 2026-02-21T08:39:53.2118804Z // begin inline asm 2026-02-21T08:39:53.2118970Z @%p114 mbarrier.init.shared::cta.b64 [%r495], 1; 2026-02-21T08:39:53.2119173Z // end inline asm 2026-02-21T08:39:53.2119301Z bar.sync 0, 128; 2026-02-21T08:39:53.2119440Z add.s32 %r496, %r198, 155720; 2026-02-21T08:39:53.2119585Z // begin inline asm 2026-02-21T08:39:53.2119747Z @%p114 mbarrier.init.shared::cta.b64 [%r496], 1; 2026-02-21T08:39:53.2119921Z // end inline asm 2026-02-21T08:39:53.2120054Z bar.sync 0, 128; 2026-02-21T08:39:53.2120182Z add.s32 %r497, %r198, 155728; 2026-02-21T08:39:53.2120336Z // begin inline asm 2026-02-21T08:39:53.2120495Z @%p114 
mbarrier.init.shared::cta.b64 [%r497], 1; 2026-02-21T08:39:53.2120672Z // end inline asm 2026-02-21T08:39:53.2120806Z bar.sync 0, 128; 2026-02-21T08:39:53.2120935Z add.s32 %r498, %r198, 155736; 2026-02-21T08:39:53.2121088Z // begin inline asm 2026-02-21T08:39:53.2121242Z @%p114 mbarrier.init.shared::cta.b64 [%r498], 1; 2026-02-21T08:39:53.2121425Z // end inline asm 2026-02-21T08:39:53.2121698Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2121981Z bar.sync 0, 128; 2026-02-21T08:39:53.2122117Z // begin inline asm 2026-02-21T08:39:53.2122284Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r487]; 2026-02-21T08:39:53.2122480Z // end inline asm 2026-02-21T08:39:53.2122606Z bar.sync 0, 128; 2026-02-21T08:39:53.2122740Z // begin inline asm 2026-02-21T08:39:53.2122901Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r488]; 2026-02-21T08:39:53.2123095Z // end inline asm 2026-02-21T08:39:53.2123219Z bar.sync 0, 128; 2026-02-21T08:39:53.2123350Z // begin inline asm 2026-02-21T08:39:53.2123509Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r489]; 2026-02-21T08:39:53.2123697Z // end inline asm 2026-02-21T08:39:53.2123827Z bar.sync 0, 128; 2026-02-21T08:39:53.2123954Z // begin inline asm 2026-02-21T08:39:53.2124116Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r490]; 2026-02-21T08:39:53.2124296Z // end inline asm 2026-02-21T08:39:53.2124426Z bar.sync 0, 128; 2026-02-21T08:39:53.2124553Z // begin inline asm 2026-02-21T08:39:53.2124746Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r491]; 2026-02-21T08:39:53.2124929Z // end inline asm 2026-02-21T08:39:53.2125093Z bar.sync 0, 128; 2026-02-21T08:39:53.2125226Z // begin inline asm 2026-02-21T08:39:53.2125384Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r492]; 2026-02-21T08:39:53.2125569Z // end inline asm 2026-02-21T08:39:53.2125815Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2126098Z bar.sync 0, 128; 2026-02-21T08:39:53.2126231Z add.s32 %r505, 
%r198, 155744; 2026-02-21T08:39:53.2126390Z // begin inline asm 2026-02-21T08:39:53.2126549Z @%p114 mbarrier.init.shared::cta.b64 [%r505], 1; 2026-02-21T08:39:53.2126735Z // end inline asm 2026-02-21T08:39:53.2126879Z add.s32 %r1611, %r198, 155760; 2026-02-21T08:39:53.2127034Z // begin inline asm 2026-02-21T08:39:53.2127201Z @%p114 mbarrier.init.shared::cta.b64 [%r1611], 1; 2026-02-21T08:39:53.2127390Z // end inline asm 2026-02-21T08:39:53.2127678Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2127968Z bar.sync 0, 128; 2026-02-21T08:39:53.2128102Z // begin inline asm 2026-02-21T08:39:53.2128280Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r1611]; 2026-02-21T08:39:53.2128477Z // end inline asm 2026-02-21T08:39:53.2128729Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2129037Z st.shared.b32 [global_smem+155768], 33554689; 2026-02-21T08:39:53.2129252Z st.shared.b32 [global_smem+147456], %r1627; 2026-02-21T08:39:53.2129462Z barrier.sync 1; 2026-02-21T08:39:53.2129625Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:39:53.2129802Z barrier.sync 1; 2026-02-21T08:39:53.2129949Z setp.lt.s32 %p108, %r1645, 1; 2026-02-21T08:39:53.2130107Z @%p108 bra $L__BB0_23; 2026-02-21T08:39:53.2130280Z // %bb.17: // %.lr.ph10 2026-02-21T08:39:53.2130586Z .loc 1 0 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:107 2026-02-21T08:39:53.2130897Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T08:39:53.2131111Z shr.u32 %r520, %r1, 4; 2026-02-21T08:39:53.2131270Z bfe.u32 %r42, %r1, 4, 3; 2026-02-21T08:39:53.2131431Z or.b32 %r43, %r42, 8; 2026-02-21T08:39:53.2131578Z or.b32 %r44, %r42, 16; 2026-02-21T08:39:53.2131732Z or.b32 %r45, %r42, 24; 2026-02-21T08:39:53.2131876Z or.b32 %r46, %r42, 32; 2026-02-21T08:39:53.2132025Z or.b32 %r47, %r42, 40; 2026-02-21T08:39:53.2132174Z or.b32 %r48, %r42, 48; 2026-02-21T08:39:53.2132318Z or.b32 %r49, %r520, 56; 
2026-02-21T08:39:53.2132473Z or.b32 %r50, %r42, 64; 2026-02-21T08:39:53.2132615Z or.b32 %r51, %r42, 72; 2026-02-21T08:39:53.2132762Z or.b32 %r52, %r42, 80; 2026-02-21T08:39:53.2132900Z or.b32 %r53, %r42, 88; 2026-02-21T08:39:53.2133046Z or.b32 %r54, %r42, 96; 2026-02-21T08:39:53.2133188Z or.b32 %r55, %r42, 104; 2026-02-21T08:39:53.2133372Z or.b32 %r56, %r42, 112; 2026-02-21T08:39:53.2133523Z or.b32 %r57, %r520, 120; 2026-02-21T08:39:53.2133682Z or.b32 %r58, %r42, 128; 2026-02-21T08:39:53.2133832Z or.b32 %r59, %r42, 136; 2026-02-21T08:39:53.2133975Z or.b32 %r60, %r42, 144; 2026-02-21T08:39:53.2134122Z or.b32 %r61, %r42, 152; 2026-02-21T08:39:53.2134264Z or.b32 %r62, %r42, 160; 2026-02-21T08:39:53.2134413Z or.b32 %r63, %r42, 168; 2026-02-21T08:39:53.2134557Z or.b32 %r64, %r42, 176; 2026-02-21T08:39:53.2134734Z or.b32 %r65, %r520, 184; 2026-02-21T08:39:53.2134882Z or.b32 %r66, %r42, 192; 2026-02-21T08:39:53.2135033Z or.b32 %r67, %r42, 200; 2026-02-21T08:39:53.2135178Z or.b32 %r68, %r42, 208; 2026-02-21T08:39:53.2135330Z or.b32 %r69, %r42, 216; 2026-02-21T08:39:53.2135480Z or.b32 %r70, %r42, 224; 2026-02-21T08:39:53.2135624Z or.b32 %r71, %r42, 232; 2026-02-21T08:39:53.2135777Z or.b32 %r72, %r42, 240; 2026-02-21T08:39:53.2135927Z or.b32 %r73, %r520, 248; 2026-02-21T08:39:53.2136090Z shl.b32 %r521, %r1, 3; 2026-02-21T08:39:53.2136245Z and.b32 %r74, %r521, 120; 2026-02-21T08:39:53.2136537Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2136872Z add.s32 %r1652, %r41, -1; 2026-02-21T08:39:53.2137035Z shl.b32 %r528, %r1, 10; 2026-02-21T08:39:53.2137185Z and.b32 %r529, %r528, 6144; 2026-02-21T08:39:53.2137349Z shl.b32 %r530, %r1, 4; 2026-02-21T08:39:53.2137505Z and.b32 %r531, %r530, 2032; 2026-02-21T08:39:53.2137663Z or.b32 %r532, %r529, %r531; 2026-02-21T08:39:53.2137828Z add.s32 %r534, %r198, 147456; 2026-02-21T08:39:53.2137987Z add.s32 %r78, %r534, %r532; 2026-02-21T08:39:53.2138153Z xor.b32 %r535, %r532, 32; 
2026-02-21T08:39:53.2138305Z add.s32 %r79, %r534, %r535; 2026-02-21T08:39:53.2138466Z xor.b32 %r536, %r532, 64; 2026-02-21T08:39:53.2138616Z add.s32 %r80, %r534, %r536; 2026-02-21T08:39:53.2138777Z xor.b32 %r537, %r532, 96; 2026-02-21T08:39:53.2138926Z add.s32 %r81, %r534, %r537; 2026-02-21T08:39:53.2139087Z and.b32 %r538, %r1, 96; 2026-02-21T08:39:53.2139244Z shl.b32 %r539, %r538, 6; 2026-02-21T08:39:53.2139420Z shl.b32 %r540, %r1, 5; 2026-02-21T08:39:53.2139579Z and.b32 %r541, %r540, 96; 2026-02-21T08:39:53.2139731Z and.b32 %r542, %r530, 384; 2026-02-21T08:39:53.2139901Z and.b32 %r544, %r517, 16; 2026-02-21T08:39:53.2140040Z or.b32 %r545, %r539, %r541; 2026-02-21T08:39:53.2140192Z or.b32 %r546, %r542, %r538; 2026-02-21T08:39:53.2140340Z xor.b32 %r547, %r545, %r546; 2026-02-21T08:39:53.2140498Z add.s32 %r548, %r534, %r544; 2026-02-21T08:39:53.2140654Z add.s32 %r830, %r548, %r547; 2026-02-21T08:39:53.2140799Z add.s32 %r835, %r830, 512; 2026-02-21T08:39:53.2140986Z add.s32 %r840, %r830, 1024; 2026-02-21T08:39:53.2141129Z add.s32 %r845, %r830, 1536; 2026-02-21T08:39:53.2141281Z mov.b32 %r1650, -1; 2026-02-21T08:39:53.2141417Z mov.b32 %r1653, %r1655; 2026-02-21T08:39:53.2141563Z mov.b32 %r1654, %r1655; 2026-02-21T08:39:53.2141701Z bra.uni $L__BB0_18; 2026-02-21T08:39:53.2141893Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:39:53.2142209Z .loc 1 43 32 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:43:32 2026-02-21T08:39:53.2142496Z add.s32 %r1115, %r1654, %r42; 2026-02-21T08:39:53.2142657Z add.s32 %r1116, %r1654, %r43; 2026-02-21T08:39:53.2142809Z add.s32 %r1117, %r1654, %r44; 2026-02-21T08:39:53.2142963Z add.s32 %r1118, %r1654, %r45; 2026-02-21T08:39:53.2143108Z add.s32 %r1119, %r1654, %r46; 2026-02-21T08:39:53.2143261Z add.s32 %r1120, %r1654, %r47; 2026-02-21T08:39:53.2143408Z add.s32 %r1121, %r1654, %r48; 2026-02-21T08:39:53.2143563Z add.s32 %r1122, %r1654, %r49; 2026-02-21T08:39:53.2143710Z add.s32 %r1123, %r1654, %r50; 
2026-02-21T08:39:53.2143862Z add.s32 %r1124, %r1654, %r51; 2026-02-21T08:39:53.2144015Z add.s32 %r1125, %r1654, %r52; 2026-02-21T08:39:53.2144162Z add.s32 %r1126, %r1654, %r53; 2026-02-21T08:39:53.2144320Z add.s32 %r1127, %r1654, %r54; 2026-02-21T08:39:53.2144516Z add.s32 %r1128, %r1654, %r55; 2026-02-21T08:39:53.2144698Z add.s32 %r1129, %r1654, %r56; 2026-02-21T08:39:53.2144845Z add.s32 %r1130, %r1654, %r57; 2026-02-21T08:39:53.2144999Z add.s32 %r1131, %r1654, %r58; 2026-02-21T08:39:53.2145145Z add.s32 %r1132, %r1654, %r59; 2026-02-21T08:39:53.2145299Z add.s32 %r1133, %r1654, %r60; 2026-02-21T08:39:53.2145450Z add.s32 %r1134, %r1654, %r61; 2026-02-21T08:39:53.2145611Z add.s32 %r1135, %r1654, %r62; 2026-02-21T08:39:53.2145767Z add.s32 %r1136, %r1654, %r63; 2026-02-21T08:39:53.2145914Z add.s32 %r1137, %r1654, %r64; 2026-02-21T08:39:53.2146071Z add.s32 %r1138, %r1654, %r65; 2026-02-21T08:39:53.2146216Z add.s32 %r1139, %r1654, %r66; 2026-02-21T08:39:53.2146367Z add.s32 %r1140, %r1654, %r67; 2026-02-21T08:39:53.2146512Z add.s32 %r1141, %r1654, %r68; 2026-02-21T08:39:53.2146667Z add.s32 %r1142, %r1654, %r69; 2026-02-21T08:39:53.2146812Z add.s32 %r1143, %r1654, %r70; 2026-02-21T08:39:53.2146964Z add.s32 %r1144, %r1654, %r71; 2026-02-21T08:39:53.2147119Z add.s32 %r1145, %r1654, %r72; 2026-02-21T08:39:53.2147265Z add.s32 %r1146, %r1654, %r73; 2026-02-21T08:39:53.2147558Z .loc 1 45 32 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:45:32 2026-02-21T08:39:53.2147836Z or.b32 %r1147, %r1653, %r74; 2026-02-21T08:39:53.2148099Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2148374Z bar.sync 0, 128; 2026-02-21T08:39:53.2148513Z // begin inline asm 2026-02-21T08:39:53.2148644Z 2026-02-21T08:39:53.2148759Z { 2026-02-21T08:39:53.2148884Z .reg .pred complete; 2026-02-21T08:39:53.2149026Z waitLoop: 2026-02-21T08:39:53.2149216Z mbarrier.try_wait.parity.shared.b64 complete, [%r505], %r1655; 2026-02-21T08:39:53.2149447Z 
@!complete bra.uni waitLoop; 2026-02-21T08:39:53.2149600Z } 2026-02-21T08:39:53.2149664Z 2026-02-21T08:39:53.2149718Z // end inline asm 2026-02-21T08:39:53.2149856Z // begin inline asm 2026-02-21T08:39:53.2150233Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568}, [%r215 + 0]; 2026-02-21T08:39:53.2150618Z // end inline asm 2026-02-21T08:39:53.2150753Z // begin inline asm 2026-02-21T08:39:53.2151092Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585}, [%r215 + 16]; 2026-02-21T08:39:53.2151482Z // end inline asm 2026-02-21T08:39:53.2151612Z // begin inline asm 2026-02-21T08:39:53.2151953Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602}, [%r215 + 32]; 2026-02-21T08:39:53.2152353Z // end inline asm 2026-02-21T08:39:53.2152490Z // begin inline asm 2026-02-21T08:39:53.2152839Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619}, [%r215 + 48]; 2026-02-21T08:39:53.2153205Z // end inline asm 2026-02-21T08:39:53.2153346Z // begin inline asm 2026-02-21T08:39:53.2153682Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636}, [%r215 + 64]; 2026-02-21T08:39:53.2154058Z // end inline asm 2026-02-21T08:39:53.2154188Z // begin inline asm 2026-02-21T08:39:53.2154534Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653}, [%r215 + 80]; 2026-02-21T08:39:53.2154941Z // end inline asm 2026-02-21T08:39:53.2155083Z // begin inline asm 2026-02-21T08:39:53.2155423Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670}, [%r215 + 96]; 2026-02-21T08:39:53.2155794Z // end inline asm 2026-02-21T08:39:53.2155930Z // begin inline asm 2026-02-21T08:39:53.2156306Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687}, [%r215 + 112]; 2026-02-21T08:39:53.2156691Z // end inline asm 2026-02-21T08:39:53.2156832Z // begin inline asm 2026-02-21T08:39:53.2157168Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704}, [%r215 + 128]; 2026-02-21T08:39:53.2157544Z // end inline asm 2026-02-21T08:39:53.2157678Z // begin inline asm 2026-02-21T08:39:53.2158021Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721}, [%r215 + 144]; 2026-02-21T08:39:53.2158397Z // end inline asm 2026-02-21T08:39:53.2158530Z // begin inline asm 2026-02-21T08:39:53.2158879Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738}, [%r215 + 160]; 2026-02-21T08:39:53.2159262Z // end inline asm 2026-02-21T08:39:53.2159400Z // begin inline asm 2026-02-21T08:39:53.2159764Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755}, [%r215 + 176]; 2026-02-21T08:39:53.2160144Z // end inline asm 2026-02-21T08:39:53.2160277Z // begin inline asm 2026-02-21T08:39:53.2160608Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772}, [%r215 + 192]; 2026-02-21T08:39:53.2160981Z // end inline asm 
2026-02-21T08:39:53.2161109Z // begin inline asm 2026-02-21T08:39:53.2161447Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789}, [%r215 + 208]; 2026-02-21T08:39:53.2161821Z // end inline asm 2026-02-21T08:39:53.2161950Z // begin inline asm 2026-02-21T08:39:53.2162314Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r791, %r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806}, [%r215 + 224]; 2026-02-21T08:39:53.2162698Z // end inline asm 2026-02-21T08:39:53.2162832Z // begin inline asm 2026-02-21T08:39:53.2163160Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r808, %r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823}, [%r215 + 240]; 2026-02-21T08:39:53.2163531Z // end inline asm 2026-02-21T08:39:53.2163664Z // begin inline asm 2026-02-21T08:39:53.2163812Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:39:53.2164003Z // end inline asm 2026-02-21T08:39:53.2164129Z bar.sync 0, 128; 2026-02-21T08:39:53.2164263Z // begin inline asm 2026-02-21T08:39:53.2164426Z @%p114 mbarrier.arrive.shared::cta.b64 _, [%r1611]; 2026-02-21T08:39:53.2164625Z // end inline asm 2026-02-21T08:39:53.2164783Z cvt.u64.u32 %rd97, %r553; 2026-02-21T08:39:53.2164945Z cvt.u64.u32 %rd98, %r554; 2026-02-21T08:39:53.2165096Z shl.b64 %rd99, %rd98, 32; 2026-02-21T08:39:53.2165256Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T08:39:53.2165536Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2165832Z mov.b64 {%r1149, %r1150}, %rd100; 2026-02-21T08:39:53.2166020Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T08:39:53.2166315Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2166608Z cvt.u64.u32 %rd101, %r555; 2026-02-21T08:39:53.2166764Z cvt.u64.u32 %rd102, %r556; 2026-02-21T08:39:53.2166932Z shl.b64 %rd103, %rd102, 32; 
2026-02-21T08:39:53.2167099Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T08:39:53.2167373Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2167664Z mov.b64 {%r1152, %r1153}, %rd104; 2026-02-21T08:39:53.2167870Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T08:39:53.2168163Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2168440Z cvt.u64.u32 %rd105, %r557; 2026-02-21T08:39:53.2168595Z cvt.u64.u32 %rd106, %r558; 2026-02-21T08:39:53.2168744Z shl.b64 %rd107, %rd106, 32; 2026-02-21T08:39:53.2168904Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T08:39:53.2169173Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2169454Z mov.b64 {%r1155, %r1156}, %rd108; 2026-02-21T08:39:53.2169628Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T08:39:53.2169904Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2170185Z cvt.u64.u32 %rd109, %r559; 2026-02-21T08:39:53.2170335Z cvt.u64.u32 %rd110, %r560; 2026-02-21T08:39:53.2170492Z shl.b64 %rd111, %rd110, 32; 2026-02-21T08:39:53.2170650Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T08:39:53.2170909Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2171226Z mov.b64 {%r1158, %r1159}, %rd112; 2026-02-21T08:39:53.2171400Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T08:39:53.2171692Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2171973Z cvt.u64.u32 %rd113, %r561; 2026-02-21T08:39:53.2172133Z cvt.u64.u32 %rd114, %r562; 2026-02-21T08:39:53.2172285Z shl.b64 %rd115, %rd114, 32; 2026-02-21T08:39:53.2172451Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T08:39:53.2172722Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2173007Z mov.b64 {%r1161, %r1162}, %rd116; 2026-02-21T08:39:53.2173184Z 
cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T08:39:53.2173465Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2173751Z cvt.u64.u32 %rd117, %r563; 2026-02-21T08:39:53.2173979Z cvt.u64.u32 %rd118, %r564; 2026-02-21T08:39:53.2174139Z shl.b64 %rd119, %rd118, 32; 2026-02-21T08:39:53.2174296Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T08:39:53.2174549Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2174861Z mov.b64 {%r1164, %r1165}, %rd120; 2026-02-21T08:39:53.2175027Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T08:39:53.2175318Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2175637Z cvt.u64.u32 %rd121, %r565; 2026-02-21T08:39:53.2175799Z cvt.u64.u32 %rd122, %r566; 2026-02-21T08:39:53.2175953Z shl.b64 %rd123, %rd122, 32; 2026-02-21T08:39:53.2176118Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T08:39:53.2176396Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2176682Z mov.b64 {%r1167, %r1168}, %rd124; 2026-02-21T08:39:53.2176865Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T08:39:53.2177152Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2177446Z cvt.u64.u32 %rd125, %r567; 2026-02-21T08:39:53.2177601Z cvt.u64.u32 %rd126, %r568; 2026-02-21T08:39:53.2177764Z shl.b64 %rd127, %rd126, 32; 2026-02-21T08:39:53.2177934Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T08:39:53.2178204Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2178503Z mov.b64 {%r1170, %r1171}, %rd128; 2026-02-21T08:39:53.2178680Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T08:39:53.2178974Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2179258Z cvt.u64.u32 %rd129, %r570; 2026-02-21T08:39:53.2179446Z cvt.u64.u32 %rd130, %r571; 
2026-02-21T08:39:53.2179601Z shl.b64 %rd131, %rd130, 32; 2026-02-21T08:39:53.2179766Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T08:39:53.2180040Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2180333Z mov.b64 {%r1173, %r1174}, %rd132; 2026-02-21T08:39:53.2180513Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T08:39:53.2180791Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2181079Z cvt.u64.u32 %rd133, %r572; 2026-02-21T08:39:53.2181233Z cvt.u64.u32 %rd134, %r573; 2026-02-21T08:39:53.2181398Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:39:53.2181563Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:39:53.2181826Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2182117Z mov.b64 {%r1176, %r1177}, %rd136; 2026-02-21T08:39:53.2182291Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T08:39:53.2182582Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2182902Z cvt.u64.u32 %rd137, %r574; 2026-02-21T08:39:53.2183057Z cvt.u64.u32 %rd138, %r575; 2026-02-21T08:39:53.2183206Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:39:53.2183361Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:39:53.2183621Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2183892Z mov.b64 {%r1179, %r1180}, %rd140; 2026-02-21T08:39:53.2184064Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T08:39:53.2184336Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2184613Z cvt.u64.u32 %rd141, %r576; 2026-02-21T08:39:53.2184785Z cvt.u64.u32 %rd142, %r577; 2026-02-21T08:39:53.2184938Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:39:53.2185096Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:39:53.2185384Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2185676Z mov.b64 {%r1182, 
%r1183}, %rd144; 2026-02-21T08:39:53.2185841Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T08:39:53.2186126Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2186407Z cvt.u64.u32 %rd145, %r578; 2026-02-21T08:39:53.2186562Z cvt.u64.u32 %rd146, %r579; 2026-02-21T08:39:53.2186715Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:39:53.2186866Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:39:53.2187170Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2187440Z mov.b64 {%r1185, %r1186}, %rd148; 2026-02-21T08:39:53.2187613Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T08:39:53.2187880Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2188163Z cvt.u64.u32 %rd149, %r580; 2026-02-21T08:39:53.2188311Z cvt.u64.u32 %rd150, %r581; 2026-02-21T08:39:53.2188468Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:39:53.2188626Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:39:53.2188878Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2189155Z mov.b64 {%r1188, %r1189}, %rd152; 2026-02-21T08:39:53.2189321Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 2026-02-21T08:39:53.2189603Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2189875Z cvt.u64.u32 %rd153, %r582; 2026-02-21T08:39:53.2190031Z cvt.u64.u32 %rd154, %r583; 2026-02-21T08:39:53.2190185Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:39:53.2190337Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:39:53.2190624Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2190901Z mov.b64 {%r1191, %r1192}, %rd156; 2026-02-21T08:39:53.2191075Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T08:39:53.2191355Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2191638Z cvt.u64.u32 %rd157, %r584; 
2026-02-21T08:39:53.2191785Z cvt.u64.u32 %rd158, %r585; 2026-02-21T08:39:53.2191937Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:39:53.2192094Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:39:53.2192357Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2192642Z mov.b64 {%r1194, %r1195}, %rd160; 2026-02-21T08:39:53.2192806Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T08:39:53.2193091Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2193368Z cvt.u64.u32 %rd161, %r587; 2026-02-21T08:39:53.2193523Z cvt.u64.u32 %rd162, %r588; 2026-02-21T08:39:53.2193677Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:39:53.2193828Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:39:53.2194117Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2194391Z mov.b64 {%r1197, %r1198}, %rd164; 2026-02-21T08:39:53.2194562Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 2026-02-21T08:39:53.2194880Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2195168Z cvt.u64.u32 %rd165, %r589; 2026-02-21T08:39:53.2195314Z cvt.u64.u32 %rd166, %r590; 2026-02-21T08:39:53.2195470Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:39:53.2195627Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:39:53.2195890Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2196179Z mov.b64 {%r1200, %r1201}, %rd168; 2026-02-21T08:39:53.2196347Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T08:39:53.2196665Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2196939Z cvt.u64.u32 %rd169, %r591; 2026-02-21T08:39:53.2197094Z cvt.u64.u32 %rd170, %r592; 2026-02-21T08:39:53.2197248Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:39:53.2197396Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:39:53.2197658Z .loc 1 58 27 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2197930Z mov.b64 {%r1203, %r1204}, %rd172; 2026-02-21T08:39:53.2198101Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T08:39:53.2198401Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2198692Z cvt.u64.u32 %rd173, %r593; 2026-02-21T08:39:53.2198841Z cvt.u64.u32 %rd174, %r594; 2026-02-21T08:39:53.2198997Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:39:53.2199156Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:39:53.2199413Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2199699Z mov.b64 {%r1206, %r1207}, %rd176; 2026-02-21T08:39:53.2199865Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T08:39:53.2200147Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2200426Z cvt.u64.u32 %rd177, %r595; 2026-02-21T08:39:53.2200588Z cvt.u64.u32 %rd178, %r596; 2026-02-21T08:39:53.2200756Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:39:53.2200908Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:39:53.2201173Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2201448Z mov.b64 {%r1209, %r1210}, %rd180; 2026-02-21T08:39:53.2201618Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T08:39:53.2201920Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2202207Z cvt.u64.u32 %rd181, %r597; 2026-02-21T08:39:53.2202356Z cvt.u64.u32 %rd182, %r598; 2026-02-21T08:39:53.2202511Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:39:53.2202667Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:39:53.2202922Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2203203Z mov.b64 {%r1212, %r1213}, %rd184; 2026-02-21T08:39:53.2203372Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T08:39:53.2203655Z .loc 1 56 52 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2203929Z cvt.u64.u32 %rd185, %r599; 2026-02-21T08:39:53.2204083Z cvt.u64.u32 %rd186, %r600; 2026-02-21T08:39:53.2204238Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:39:53.2204387Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:39:53.2204648Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2204953Z mov.b64 {%r1215, %r1216}, %rd188; 2026-02-21T08:39:53.2205129Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T08:39:53.2205431Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2205713Z cvt.u64.u32 %rd189, %r601; 2026-02-21T08:39:53.2205865Z cvt.u64.u32 %rd190, %r602; 2026-02-21T08:39:53.2206011Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:39:53.2206166Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:39:53.2206418Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2206702Z mov.b64 {%r1218, %r1219}, %rd192; 2026-02-21T08:39:53.2206864Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T08:39:53.2207142Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2207413Z cvt.u64.u32 %rd193, %r604; 2026-02-21T08:39:53.2207566Z cvt.u64.u32 %rd194, %r605; 2026-02-21T08:39:53.2207718Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:39:53.2207893Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:39:53.2208156Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2208431Z mov.b64 {%r1221, %r1222}, %rd196; 2026-02-21T08:39:53.2208604Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T08:39:53.2208875Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2209163Z cvt.u64.u32 %rd197, %r606; 2026-02-21T08:39:53.2209315Z cvt.u64.u32 %rd198, %r607; 2026-02-21T08:39:53.2209491Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:39:53.2209650Z or.b64 
%rd200, %rd197, %rd199; 2026-02-21T08:39:53.2209906Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2210183Z mov.b64 {%r1224, %r1225}, %rd200; 2026-02-21T08:39:53.2210349Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T08:39:53.2210629Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2210898Z cvt.u64.u32 %rd201, %r608; 2026-02-21T08:39:53.2211057Z cvt.u64.u32 %rd202, %r609; 2026-02-21T08:39:53.2211217Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:39:53.2211371Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:39:53.2211635Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2211911Z mov.b64 {%r1227, %r1228}, %rd204; 2026-02-21T08:39:53.2212081Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T08:39:53.2212348Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2212630Z cvt.u64.u32 %rd205, %r610; 2026-02-21T08:39:53.2212782Z cvt.u64.u32 %rd206, %r611; 2026-02-21T08:39:53.2212927Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:39:53.2213085Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:39:53.2213374Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2213668Z mov.b64 {%r1230, %r1231}, %rd208; 2026-02-21T08:39:53.2213833Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T08:39:53.2214113Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2214397Z cvt.u64.u32 %rd209, %r612; 2026-02-21T08:39:53.2214550Z cvt.u64.u32 %rd210, %r613; 2026-02-21T08:39:53.2214733Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:39:53.2214887Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:39:53.2215152Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2215434Z mov.b64 {%r1233, %r1234}, %rd212; 2026-02-21T08:39:53.2215606Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 
2026-02-21T08:39:53.2215878Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2216161Z cvt.u64.u32 %rd213, %r614; 2026-02-21T08:39:53.2216317Z cvt.u64.u32 %rd214, %r615; 2026-02-21T08:39:53.2216463Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:39:53.2216648Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:39:53.2216898Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2217176Z mov.b64 {%r1236, %r1237}, %rd216; 2026-02-21T08:39:53.2217337Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T08:39:53.2217609Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2217886Z cvt.u64.u32 %rd217, %r616; 2026-02-21T08:39:53.2218036Z cvt.u64.u32 %rd218, %r617; 2026-02-21T08:39:53.2218188Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:39:53.2218336Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:39:53.2218593Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2218863Z mov.b64 {%r1239, %r1240}, %rd220; 2026-02-21T08:39:53.2219059Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T08:39:53.2219332Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2219621Z cvt.u64.u32 %rd221, %r618; 2026-02-21T08:39:53.2219779Z cvt.u64.u32 %rd222, %r619; 2026-02-21T08:39:53.2219932Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:39:53.2220095Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:39:53.2220362Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2220657Z mov.b64 {%r1242, %r1243}, %rd224; 2026-02-21T08:39:53.2220862Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 2026-02-21T08:39:53.2221160Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2221463Z cvt.u64.u32 %rd225, %r621; 2026-02-21T08:39:53.2221617Z cvt.u64.u32 %rd226, %r622; 2026-02-21T08:39:53.2221779Z shl.b64 %rd227, %rd226, 
32; 2026-02-21T08:39:53.2221940Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:39:53.2222227Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2222531Z mov.b64 {%r1245, %r1246}, %rd228; 2026-02-21T08:39:53.2222715Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T08:39:53.2223004Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2223301Z cvt.u64.u32 %rd229, %r623; 2026-02-21T08:39:53.2223460Z cvt.u64.u32 %rd230, %r624; 2026-02-21T08:39:53.2223613Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:39:53.2223777Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:39:53.2224044Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2224343Z mov.b64 {%r1248, %r1249}, %rd232; 2026-02-21T08:39:53.2224515Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T08:39:53.2224877Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2225181Z cvt.u64.u32 %rd233, %r625; 2026-02-21T08:39:53.2225337Z cvt.u64.u32 %rd234, %r626; 2026-02-21T08:39:53.2225499Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:39:53.2225658Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:39:53.2225933Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2226219Z mov.b64 {%r1251, %r1252}, %rd236; 2026-02-21T08:39:53.2226401Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T08:39:53.2226686Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2226984Z cvt.u64.u32 %rd237, %r627; 2026-02-21T08:39:53.2227148Z cvt.u64.u32 %rd238, %r628; 2026-02-21T08:39:53.2227312Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:39:53.2227469Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:39:53.2227724Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2228004Z mov.b64 {%r1254, %r1255}, %rd240; 2026-02-21T08:39:53.2228199Z 
cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T08:39:53.2228480Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2228770Z cvt.u64.u32 %rd241, %r629; 2026-02-21T08:39:53.2228917Z cvt.u64.u32 %rd242, %r630; 2026-02-21T08:39:53.2229070Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:39:53.2229218Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:39:53.2229482Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2229756Z mov.b64 {%r1257, %r1258}, %rd244; 2026-02-21T08:39:53.2229928Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T08:39:53.2230202Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2230482Z cvt.u64.u32 %rd245, %r631; 2026-02-21T08:39:53.2230635Z cvt.u64.u32 %rd246, %r632; 2026-02-21T08:39:53.2230823Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:39:53.2230984Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:39:53.2231266Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2231555Z mov.b64 {%r1260, %r1261}, %rd248; 2026-02-21T08:39:53.2231722Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T08:39:53.2232005Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2232294Z cvt.u64.u32 %rd249, %r633; 2026-02-21T08:39:53.2232474Z cvt.u64.u32 %rd250, %r634; 2026-02-21T08:39:53.2232637Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:39:53.2232793Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:39:53.2233067Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2233354Z mov.b64 {%r1263, %r1264}, %rd252; 2026-02-21T08:39:53.2233538Z cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T08:39:53.2233817Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2234109Z cvt.u64.u32 %rd253, %r635; 2026-02-21T08:39:53.2234269Z cvt.u64.u32 %rd254, %r636; 
2026-02-21T08:39:53.2234421Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:39:53.2234586Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:39:53.2234878Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2235171Z mov.b64 {%r1266, %r1267}, %rd256; 2026-02-21T08:39:53.2235338Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T08:39:53.2235623Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2235908Z cvt.u64.u32 %rd257, %r638; 2026-02-21T08:39:53.2236056Z cvt.u64.u32 %rd258, %r639; 2026-02-21T08:39:53.2236243Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:39:53.2236394Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:39:53.2236659Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2236930Z mov.b64 {%r1269, %r1270}, %rd260; 2026-02-21T08:39:53.2237101Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T08:39:53.2237365Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2237644Z cvt.u64.u32 %rd261, %r640; 2026-02-21T08:39:53.2237798Z cvt.u64.u32 %rd262, %r641; 2026-02-21T08:39:53.2237943Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:39:53.2238099Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:39:53.2238351Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2238630Z mov.b64 {%r1272, %r1273}, %rd264; 2026-02-21T08:39:53.2238791Z cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T08:39:53.2239066Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2239340Z cvt.u64.u32 %rd265, %r642; 2026-02-21T08:39:53.2239527Z cvt.u64.u32 %rd266, %r643; 2026-02-21T08:39:53.2239681Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:39:53.2239829Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:39:53.2240091Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2240376Z mov.b64 {%r1275, 
%r1276}, %rd268; 2026-02-21T08:39:53.2240545Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T08:39:53.2240821Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2241103Z cvt.u64.u32 %rd269, %r644; 2026-02-21T08:39:53.2241257Z cvt.u64.u32 %rd270, %r645; 2026-02-21T08:39:53.2241405Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:39:53.2241563Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:39:53.2241817Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2242128Z mov.b64 {%r1278, %r1279}, %rd272; 2026-02-21T08:39:53.2242302Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T08:39:53.2242587Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2242874Z cvt.u64.u32 %rd273, %r646; 2026-02-21T08:39:53.2243025Z cvt.u64.u32 %rd274, %r647; 2026-02-21T08:39:53.2243185Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:39:53.2243340Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:39:53.2243607Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2243911Z mov.b64 {%r1281, %r1282}, %rd276; 2026-02-21T08:39:53.2244087Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T08:39:53.2244362Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2244649Z cvt.u64.u32 %rd277, %r648; 2026-02-21T08:39:53.2244847Z cvt.u64.u32 %rd278, %r649; 2026-02-21T08:39:53.2244995Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:39:53.2245155Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:39:53.2245408Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2245689Z mov.b64 {%r1284, %r1285}, %rd280; 2026-02-21T08:39:53.2245853Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T08:39:53.2246131Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2246408Z cvt.u64.u32 %rd281, %r650; 
2026-02-21T08:39:53.2246557Z cvt.u64.u32 %rd282, %r651; 2026-02-21T08:39:53.2246711Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:39:53.2246862Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:39:53.2247124Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2247427Z mov.b64 {%r1287, %r1288}, %rd284; 2026-02-21T08:39:53.2247602Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T08:39:53.2247875Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2248155Z cvt.u64.u32 %rd285, %r652; 2026-02-21T08:39:53.2248312Z cvt.u64.u32 %rd286, %r653; 2026-02-21T08:39:53.2248458Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:39:53.2248615Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:39:53.2248869Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2249147Z mov.b64 {%r1290, %r1291}, %rd288; 2026-02-21T08:39:53.2249315Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 2026-02-21T08:39:53.2249592Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2249867Z cvt.u64.u32 %rd289, %r655; 2026-02-21T08:39:53.2250013Z cvt.u64.u32 %rd290, %r656; 2026-02-21T08:39:53.2250164Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:39:53.2250314Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:39:53.2250577Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2250880Z mov.b64 {%r1293, %r1294}, %rd292; 2026-02-21T08:39:53.2251051Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T08:39:53.2251324Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2251602Z cvt.u64.u32 %rd293, %r657; 2026-02-21T08:39:53.2251756Z cvt.u64.u32 %rd294, %r658; 2026-02-21T08:39:53.2251900Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:39:53.2252059Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:39:53.2252313Z .loc 1 58 27 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2252591Z mov.b64 {%r1296, %r1297}, %rd296; 2026-02-21T08:39:53.2252760Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T08:39:53.2253036Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2253339Z cvt.u64.u32 %rd297, %r659; 2026-02-21T08:39:53.2253488Z cvt.u64.u32 %rd298, %r660; 2026-02-21T08:39:53.2253644Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:39:53.2253794Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:39:53.2254054Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2254335Z mov.b64 {%r1299, %r1300}, %rd300; 2026-02-21T08:39:53.2254509Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T08:39:53.2254823Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2255135Z cvt.u64.u32 %rd301, %r661; 2026-02-21T08:39:53.2255295Z cvt.u64.u32 %rd302, %r662; 2026-02-21T08:39:53.2255448Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:39:53.2255609Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:39:53.2255870Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2256154Z mov.b64 {%r1302, %r1303}, %rd304; 2026-02-21T08:39:53.2256322Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T08:39:53.2256604Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2256884Z cvt.u64.u32 %rd305, %r663; 2026-02-21T08:39:53.2257030Z cvt.u64.u32 %rd306, %r664; 2026-02-21T08:39:53.2257186Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:39:53.2257336Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:39:53.2257597Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2257876Z mov.b64 {%r1305, %r1306}, %rd308; 2026-02-21T08:39:53.2258050Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T08:39:53.2258330Z .loc 1 56 52 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2258630Z cvt.u64.u32 %rd309, %r665; 2026-02-21T08:39:53.2258786Z cvt.u64.u32 %rd310, %r666; 2026-02-21T08:39:53.2258935Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:39:53.2259096Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:39:53.2259351Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2259633Z mov.b64 {%r1308, %r1309}, %rd312; 2026-02-21T08:39:53.2259799Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T08:39:53.2260075Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2260354Z cvt.u64.u32 %rd313, %r667; 2026-02-21T08:39:53.2260502Z cvt.u64.u32 %rd314, %r668; 2026-02-21T08:39:53.2260655Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:39:53.2260803Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:39:53.2261060Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2261335Z mov.b64 {%r1311, %r1312}, %rd316; 2026-02-21T08:39:53.2261504Z cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T08:39:53.2261779Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2262077Z cvt.u64.u32 %rd317, %r669; 2026-02-21T08:39:53.2262229Z cvt.u64.u32 %rd318, %r670; 2026-02-21T08:39:53.2262373Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:39:53.2262528Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:39:53.2262779Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2263058Z mov.b64 {%r1314, %r1315}, %rd320; 2026-02-21T08:39:53.2263229Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T08:39:53.2263517Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2263808Z cvt.u64.u32 %rd321, %r672; 2026-02-21T08:39:53.2263961Z cvt.u64.u32 %rd322, %r673; 2026-02-21T08:39:53.2264119Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:39:53.2264277Z or.b64 
%rd324, %rd321, %rd323; 2026-02-21T08:39:53.2264577Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2264906Z mov.b64 {%r1317, %r1318}, %rd324; 2026-02-21T08:39:53.2265090Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T08:39:53.2265384Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2265680Z cvt.u64.u32 %rd325, %r674; 2026-02-21T08:39:53.2265844Z cvt.u64.u32 %rd326, %r675; 2026-02-21T08:39:53.2266002Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:39:53.2266173Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:39:53.2266475Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2266781Z mov.b64 {%r1320, %r1321}, %rd328; 2026-02-21T08:39:53.2266954Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T08:39:53.2267250Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2267554Z cvt.u64.u32 %rd329, %r676; 2026-02-21T08:39:53.2267709Z cvt.u64.u32 %rd330, %r677; 2026-02-21T08:39:53.2267869Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:39:53.2268025Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:39:53.2268297Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2268583Z mov.b64 {%r1323, %r1324}, %rd332; 2026-02-21T08:39:53.2268762Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T08:39:53.2269050Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2269335Z cvt.u64.u32 %rd333, %r678; 2026-02-21T08:39:53.2269495Z cvt.u64.u32 %rd334, %r679; 2026-02-21T08:39:53.2269647Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:39:53.2269811Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:39:53.2270108Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2270421Z mov.b64 {%r1326, %r1327}, %rd336; 2026-02-21T08:39:53.2270601Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 
2026-02-21T08:39:53.2270907Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2271209Z cvt.u64.u32 %rd337, %r680; 2026-02-21T08:39:53.2271363Z cvt.u64.u32 %rd338, %r681; 2026-02-21T08:39:53.2271524Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:39:53.2271678Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:39:53.2271950Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2272242Z mov.b64 {%r1329, %r1330}, %rd340; 2026-02-21T08:39:53.2272421Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T08:39:53.2272704Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2272981Z cvt.u64.u32 %rd341, %r682; 2026-02-21T08:39:53.2273143Z cvt.u64.u32 %rd342, %r683; 2026-02-21T08:39:53.2273202Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:39:53.2273264Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:39:53.2273484Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2273542Z mov.b64 {%r1332, %r1333}, %rd344; 2026-02-21T08:39:53.2273607Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T08:39:53.2273769Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2273833Z cvt.u64.u32 %rd345, %r684; 2026-02-21T08:39:53.2273887Z cvt.u64.u32 %rd346, %r685; 2026-02-21T08:39:53.2273944Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:39:53.2274010Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:39:53.2274171Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2274229Z mov.b64 {%r1335, %r1336}, %rd348; 2026-02-21T08:39:53.2274300Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 2026-02-21T08:39:53.2274485Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2274545Z cvt.u64.u32 %rd349, %r686; 2026-02-21T08:39:53.2274600Z cvt.u64.u32 %rd350, %r687; 2026-02-21T08:39:53.2274663Z shl.b64 %rd351, %rd350, 
32; 2026-02-21T08:39:53.2274753Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:39:53.2274915Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2274985Z mov.b64 {%r1338, %r1339}, %rd352; 2026-02-21T08:39:53.2275050Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T08:39:53.2275241Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2275305Z cvt.u64.u32 %rd353, %r689; 2026-02-21T08:39:53.2275362Z cvt.u64.u32 %rd354, %r690; 2026-02-21T08:39:53.2275417Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:39:53.2275477Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:39:53.2275650Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2275708Z mov.b64 {%r1341, %r1342}, %rd356; 2026-02-21T08:39:53.2275771Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T08:39:53.2275943Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2275999Z cvt.u64.u32 %rd357, %r691; 2026-02-21T08:39:53.2276055Z cvt.u64.u32 %rd358, %r692; 2026-02-21T08:39:53.2276120Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:39:53.2276178Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:39:53.2276343Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2276400Z mov.b64 {%r1344, %r1345}, %rd360; 2026-02-21T08:39:53.2276475Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T08:39:53.2276665Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2276728Z cvt.u64.u32 %rd361, %r693; 2026-02-21T08:39:53.2276797Z cvt.u64.u32 %rd362, %r694; 2026-02-21T08:39:53.2276858Z shl.b64 %rd363, %rd362, 32; 2026-02-21T08:39:53.2276916Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T08:39:53.2277089Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2277146Z mov.b64 {%r1347, %r1348}, %rd364; 2026-02-21T08:39:53.2277210Z 
cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T08:39:53.2277372Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2277436Z cvt.u64.u32 %rd365, %r695; 2026-02-21T08:39:53.2277491Z cvt.u64.u32 %rd366, %r696; 2026-02-21T08:39:53.2277547Z shl.b64 %rd367, %rd366, 32; 2026-02-21T08:39:53.2277611Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T08:39:53.2277773Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2277830Z mov.b64 {%r1350, %r1351}, %rd368; 2026-02-21T08:39:53.2277900Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T08:39:53.2278137Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2278193Z cvt.u64.u32 %rd369, %r697; 2026-02-21T08:39:53.2278249Z cvt.u64.u32 %rd370, %r698; 2026-02-21T08:39:53.2278315Z shl.b64 %rd371, %rd370, 32; 2026-02-21T08:39:53.2278372Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T08:39:53.2278536Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2278605Z mov.b64 {%r1353, %r1354}, %rd372; 2026-02-21T08:39:53.2278669Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T08:39:53.2278834Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2278896Z cvt.u64.u32 %rd373, %r699; 2026-02-21T08:39:53.2278953Z cvt.u64.u32 %rd374, %r700; 2026-02-21T08:39:53.2279011Z shl.b64 %rd375, %rd374, 32; 2026-02-21T08:39:53.2279094Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T08:39:53.2279268Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2279324Z mov.b64 {%r1356, %r1357}, %rd376; 2026-02-21T08:39:53.2279387Z cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T08:39:53.2279557Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2279615Z cvt.u64.u32 %rd377, %r701; 2026-02-21T08:39:53.2279671Z cvt.u64.u32 %rd378, %r702; 
2026-02-21T08:39:53.2279755Z shl.b64 %rd379, %rd378, 32; 2026-02-21T08:39:53.2279813Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T08:39:53.2279979Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2280036Z mov.b64 {%r1359, %r1360}, %rd380; 2026-02-21T08:39:53.2280107Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T08:39:53.2280274Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2280331Z cvt.u64.u32 %rd381, %r703; 2026-02-21T08:39:53.2280393Z cvt.u64.u32 %rd382, %r704; 2026-02-21T08:39:53.2280448Z shl.b64 %rd383, %rd382, 32; 2026-02-21T08:39:53.2280504Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T08:39:53.2280676Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2280733Z mov.b64 {%r1362, %r1363}, %rd384; 2026-02-21T08:39:53.2280795Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T08:39:53.2280960Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2281023Z cvt.u64.u32 %rd385, %r706; 2026-02-21T08:39:53.2281077Z cvt.u64.u32 %rd386, %r707; 2026-02-21T08:39:53.2281132Z shl.b64 %rd387, %rd386, 32; 2026-02-21T08:39:53.2281214Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T08:39:53.2281376Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2281434Z mov.b64 {%r1365, %r1366}, %rd388; 2026-02-21T08:39:53.2281504Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T08:39:53.2281665Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2281719Z cvt.u64.u32 %rd389, %r708; 2026-02-21T08:39:53.2281774Z cvt.u64.u32 %rd390, %r709; 2026-02-21T08:39:53.2281835Z shl.b64 %rd391, %rd390, 32; 2026-02-21T08:39:53.2281891Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T08:39:53.2282053Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2282119Z mov.b64 {%r1368, 
%r1369}, %rd392; 2026-02-21T08:39:53.2282181Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T08:39:53.2282341Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2282403Z cvt.u64.u32 %rd393, %r710; 2026-02-21T08:39:53.2282457Z cvt.u64.u32 %rd394, %r711; 2026-02-21T08:39:53.2282534Z shl.b64 %rd395, %rd394, 32; 2026-02-21T08:39:53.2282590Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T08:39:53.2282758Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2282815Z mov.b64 {%r1371, %r1372}, %rd396; 2026-02-21T08:39:53.2282877Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T08:39:53.2283045Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2283103Z cvt.u64.u32 %rd397, %r712; 2026-02-21T08:39:53.2283158Z cvt.u64.u32 %rd398, %r713; 2026-02-21T08:39:53.2283220Z shl.b64 %rd399, %rd398, 32; 2026-02-21T08:39:53.2283277Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T08:39:53.2283438Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2283496Z mov.b64 {%r1374, %r1375}, %rd400; 2026-02-21T08:39:53.2283588Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T08:39:53.2283753Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2283809Z cvt.u64.u32 %rd401, %r714; 2026-02-21T08:39:53.2283872Z cvt.u64.u32 %rd402, %r715; 2026-02-21T08:39:53.2283927Z shl.b64 %rd403, %rd402, 32; 2026-02-21T08:39:53.2283983Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T08:39:53.2284153Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2284233Z mov.b64 {%r1377, %r1378}, %rd404; 2026-02-21T08:39:53.2284294Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T08:39:53.2284455Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2284519Z cvt.u64.u32 %rd405, %r716; 
2026-02-21T08:39:53.2284574Z cvt.u64.u32 %rd406, %r717; 2026-02-21T08:39:53.2284630Z shl.b64 %rd407, %rd406, 32; 2026-02-21T08:39:53.2284721Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T08:39:53.2284884Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2284942Z mov.b64 {%r1380, %r1381}, %rd408; 2026-02-21T08:39:53.2285013Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T08:39:53.2285175Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2285233Z cvt.u64.u32 %rd409, %r718; 2026-02-21T08:39:53.2285294Z cvt.u64.u32 %rd410, %r719; 2026-02-21T08:39:53.2285362Z shl.b64 %rd411, %rd410, 32; 2026-02-21T08:39:53.2285420Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T08:39:53.2285585Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2285652Z mov.b64 {%r1383, %r1384}, %rd412; 2026-02-21T08:39:53.2285741Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T08:39:53.2285906Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2285972Z cvt.u64.u32 %rd413, %r720; 2026-02-21T08:39:53.2286026Z cvt.u64.u32 %rd414, %r721; 2026-02-21T08:39:53.2286082Z shl.b64 %rd415, %rd414, 32; 2026-02-21T08:39:53.2286139Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T08:39:53.2286310Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2286367Z mov.b64 {%r1386, %r1387}, %rd416; 2026-02-21T08:39:53.2286430Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T08:39:53.2286600Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2286656Z cvt.u64.u32 %rd417, %r723; 2026-02-21T08:39:53.2286711Z cvt.u64.u32 %rd418, %r724; 2026-02-21T08:39:53.2286774Z shl.b64 %rd419, %rd418, 32; 2026-02-21T08:39:53.2286831Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T08:39:53.2286994Z .loc 1 58 27 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2287075Z mov.b64 {%r1389, %r1390}, %rd420; 2026-02-21T08:39:53.2287145Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T08:39:53.2287306Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2287361Z cvt.u64.u32 %rd421, %r725; 2026-02-21T08:39:53.2287425Z cvt.u64.u32 %rd422, %r726; 2026-02-21T08:39:53.2287481Z shl.b64 %rd423, %rd422, 32; 2026-02-21T08:39:53.2287537Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T08:39:53.2287707Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2287763Z mov.b64 {%r1392, %r1393}, %rd424; 2026-02-21T08:39:53.2287825Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T08:39:53.2287985Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2288050Z cvt.u64.u32 %rd425, %r727; 2026-02-21T08:39:53.2288130Z cvt.u64.u32 %rd426, %r728; 2026-02-21T08:39:53.2288191Z shl.b64 %rd427, %rd426, 32; 2026-02-21T08:39:53.2288255Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T08:39:53.2288418Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2288475Z mov.b64 {%r1395, %r1396}, %rd428; 2026-02-21T08:39:53.2288545Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T08:39:53.2288707Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2288792Z cvt.u64.u32 %rd429, %r729; 2026-02-21T08:39:53.2288848Z cvt.u64.u32 %rd430, %r730; 2026-02-21T08:39:53.2288910Z shl.b64 %rd431, %rd430, 32; 2026-02-21T08:39:53.2288966Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T08:39:53.2289127Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2289192Z mov.b64 {%r1398, %r1399}, %rd432; 2026-02-21T08:39:53.2289257Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T08:39:53.2289416Z .loc 1 56 52 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2289477Z cvt.u64.u32 %rd433, %r731; 2026-02-21T08:39:53.2289531Z cvt.u64.u32 %rd434, %r732; 2026-02-21T08:39:53.2289586Z shl.b64 %rd435, %rd434, 32; 2026-02-21T08:39:53.2289643Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T08:39:53.2289811Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2289870Z mov.b64 {%r1401, %r1402}, %rd436; 2026-02-21T08:39:53.2289932Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T08:39:53.2290100Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2290156Z cvt.u64.u32 %rd437, %r733; 2026-02-21T08:39:53.2290209Z cvt.u64.u32 %rd438, %r734; 2026-02-21T08:39:53.2290304Z shl.b64 %rd439, %rd438, 32; 2026-02-21T08:39:53.2290363Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T08:39:53.2290527Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2290582Z mov.b64 {%r1404, %r1405}, %rd440; 2026-02-21T08:39:53.2290650Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T08:39:53.2290810Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2290864Z cvt.u64.u32 %rd441, %r735; 2026-02-21T08:39:53.2290925Z cvt.u64.u32 %rd442, %r736; 2026-02-21T08:39:53.2290981Z shl.b64 %rd443, %rd442, 32; 2026-02-21T08:39:53.2291038Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T08:39:53.2291205Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2291262Z mov.b64 {%r1407, %r1408}, %rd444; 2026-02-21T08:39:53.2291323Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T08:39:53.2291484Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2291566Z cvt.u64.u32 %rd445, %r737; 2026-02-21T08:39:53.2291621Z cvt.u64.u32 %rd446, %r738; 2026-02-21T08:39:53.2291675Z shl.b64 %rd447, %rd446, 32; 2026-02-21T08:39:53.2291738Z or.b64 
%rd448, %rd445, %rd447; 2026-02-21T08:39:53.2291895Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2291951Z mov.b64 {%r1410, %r1411}, %rd448; 2026-02-21T08:39:53.2292022Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T08:39:53.2292186Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2292240Z cvt.u64.u32 %rd449, %r740; 2026-02-21T08:39:53.2292296Z cvt.u64.u32 %rd450, %r741; 2026-02-21T08:39:53.2292357Z shl.b64 %rd451, %rd450, 32; 2026-02-21T08:39:53.2292413Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T08:39:53.2292598Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2292664Z mov.b64 {%r1413, %r1414}, %rd452; 2026-02-21T08:39:53.2292729Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T08:39:53.2292886Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2292948Z cvt.u64.u32 %rd453, %r742; 2026-02-21T08:39:53.2293004Z cvt.u64.u32 %rd454, %r743; 2026-02-21T08:39:53.2293060Z shl.b64 %rd455, %rd454, 32; 2026-02-21T08:39:53.2293116Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T08:39:53.2293284Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2293362Z mov.b64 {%r1416, %r1417}, %rd456; 2026-02-21T08:39:53.2293427Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T08:39:53.2293595Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2293653Z cvt.u64.u32 %rd457, %r744; 2026-02-21T08:39:53.2293711Z cvt.u64.u32 %rd458, %r745; 2026-02-21T08:39:53.2293776Z shl.b64 %rd459, %rd458, 32; 2026-02-21T08:39:53.2293835Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T08:39:53.2293997Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2294054Z mov.b64 {%r1419, %r1420}, %rd460; 2026-02-21T08:39:53.2294129Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 
2026-02-21T08:39:53.2294293Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2294351Z cvt.u64.u32 %rd461, %r746; 2026-02-21T08:39:53.2294414Z cvt.u64.u32 %rd462, %r747; 2026-02-21T08:39:53.2294470Z shl.b64 %rd463, %rd462, 32; 2026-02-21T08:39:53.2294526Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T08:39:53.2294725Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2294809Z mov.b64 {%r1422, %r1423}, %rd464; 2026-02-21T08:39:53.2294875Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T08:39:53.2295036Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2295099Z cvt.u64.u32 %rd465, %r748; 2026-02-21T08:39:53.2295153Z cvt.u64.u32 %rd466, %r749; 2026-02-21T08:39:53.2295208Z shl.b64 %rd467, %rd466, 32; 2026-02-21T08:39:53.2295272Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T08:39:53.2295435Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2295492Z mov.b64 {%r1425, %r1426}, %rd468; 2026-02-21T08:39:53.2295563Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T08:39:53.2295723Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2295778Z cvt.u64.u32 %rd469, %r750; 2026-02-21T08:39:53.2295833Z cvt.u64.u32 %rd470, %r751; 2026-02-21T08:39:53.2295897Z shl.b64 %rd471, %rd470, 32; 2026-02-21T08:39:53.2295954Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T08:39:53.2296116Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2296210Z mov.b64 {%r1428, %r1429}, %rd472; 2026-02-21T08:39:53.2296274Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 2026-02-21T08:39:53.2296433Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2296495Z cvt.u64.u32 %rd473, %r752; 2026-02-21T08:39:53.2296550Z cvt.u64.u32 %rd474, %r753; 2026-02-21T08:39:53.2296605Z shl.b64 %rd475, %rd474, 
32; 2026-02-21T08:39:53.2296662Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T08:39:53.2296828Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2296884Z mov.b64 {%r1431, %r1432}, %rd476; 2026-02-21T08:39:53.2296947Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T08:39:53.2297141Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2297198Z cvt.u64.u32 %rd477, %r754; 2026-02-21T08:39:53.2297255Z cvt.u64.u32 %rd478, %r755; 2026-02-21T08:39:53.2297318Z shl.b64 %rd479, %rd478, 32; 2026-02-21T08:39:53.2297375Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T08:39:53.2297537Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2297593Z mov.b64 {%r1434, %r1435}, %rd480; 2026-02-21T08:39:53.2297664Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T08:39:53.2297825Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2297910Z cvt.u64.u32 %rd481, %r757; 2026-02-21T08:39:53.2297973Z cvt.u64.u32 %rd482, %r758; 2026-02-21T08:39:53.2298028Z shl.b64 %rd483, %rd482, 32; 2026-02-21T08:39:53.2298083Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T08:39:53.2298255Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2298313Z mov.b64 {%r1437, %r1438}, %rd484; 2026-02-21T08:39:53.2298376Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T08:39:53.2298538Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2298599Z cvt.u64.u32 %rd485, %r759; 2026-02-21T08:39:53.2298654Z cvt.u64.u32 %rd486, %r760; 2026-02-21T08:39:53.2298709Z shl.b64 %rd487, %rd486, 32; 2026-02-21T08:39:53.2298771Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T08:39:53.2298931Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2298988Z mov.b64 {%r1440, %r1441}, %rd488; 2026-02-21T08:39:53.2299057Z 
cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T08:39:53.2299216Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2299292Z cvt.u64.u32 %rd489, %r761; 2026-02-21T08:39:53.2299349Z cvt.u64.u32 %rd490, %r762; 2026-02-21T08:39:53.2299411Z shl.b64 %rd491, %rd490, 32; 2026-02-21T08:39:53.2299467Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T08:39:53.2299627Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2299692Z mov.b64 {%r1443, %r1444}, %rd492; 2026-02-21T08:39:53.2299754Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T08:39:53.2299917Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2299978Z cvt.u64.u32 %rd493, %r763; 2026-02-21T08:39:53.2300033Z cvt.u64.u32 %rd494, %r764; 2026-02-21T08:39:53.2300088Z shl.b64 %rd495, %rd494, 32; 2026-02-21T08:39:53.2300144Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T08:39:53.2300314Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2300371Z mov.b64 {%r1446, %r1447}, %rd496; 2026-02-21T08:39:53.2300435Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T08:39:53.2300605Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2300682Z cvt.u64.u32 %rd497, %r765; 2026-02-21T08:39:53.2300738Z cvt.u64.u32 %rd498, %r766; 2026-02-21T08:39:53.2300799Z shl.b64 %rd499, %rd498, 32; 2026-02-21T08:39:53.2300856Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T08:39:53.2301017Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2301072Z mov.b64 {%r1449, %r1450}, %rd500; 2026-02-21T08:39:53.2301145Z cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T08:39:53.2301305Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2301361Z cvt.u64.u32 %rd501, %r767; 2026-02-21T08:39:53.2301422Z cvt.u64.u32 %rd502, %r768; 
2026-02-21T08:39:53.2301479Z shl.b64 %rd503, %rd502, 32; 2026-02-21T08:39:53.2301535Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T08:39:53.2301728Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2301789Z mov.b64 {%r1452, %r1453}, %rd504; 2026-02-21T08:39:53.2301853Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T08:39:53.2302014Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2302078Z cvt.u64.u32 %rd505, %r769; 2026-02-21T08:39:53.2302133Z cvt.u64.u32 %rd506, %r770; 2026-02-21T08:39:53.2302188Z shl.b64 %rd507, %rd506, 32; 2026-02-21T08:39:53.2302274Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T08:39:53.2302433Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2302492Z mov.b64 {%r1455, %r1456}, %rd508; 2026-02-21T08:39:53.2302565Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T08:39:53.2302731Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2302790Z cvt.u64.u32 %rd509, %r771; 2026-02-21T08:39:53.2302847Z cvt.u64.u32 %rd510, %r772; 2026-02-21T08:39:53.2302910Z shl.b64 %rd511, %rd510, 32; 2026-02-21T08:39:53.2302967Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T08:39:53.2303124Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2303188Z mov.b64 {%r1458, %r1459}, %rd512; 2026-02-21T08:39:53.2303252Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T08:39:53.2303413Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2303478Z cvt.u64.u32 %rd513, %r774; 2026-02-21T08:39:53.2303533Z cvt.u64.u32 %rd514, %r775; 2026-02-21T08:39:53.2303589Z shl.b64 %rd515, %rd514, 32; 2026-02-21T08:39:53.2303644Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T08:39:53.2303832Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2303891Z mov.b64 {%r1461, 
%r1462}, %rd516; 2026-02-21T08:39:53.2303956Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T08:39:53.2304131Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2304187Z cvt.u64.u32 %rd517, %r776; 2026-02-21T08:39:53.2304243Z cvt.u64.u32 %rd518, %r777; 2026-02-21T08:39:53.2304305Z shl.b64 %rd519, %rd518, 32; 2026-02-21T08:39:53.2304363Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T08:39:53.2304523Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2304580Z mov.b64 {%r1464, %r1465}, %rd520; 2026-02-21T08:39:53.2304650Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T08:39:53.2304845Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2304903Z cvt.u64.u32 %rd521, %r778; 2026-02-21T08:39:53.2304965Z cvt.u64.u32 %rd522, %r779; 2026-02-21T08:39:53.2305021Z shl.b64 %rd523, %rd522, 32; 2026-02-21T08:39:53.2305105Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T08:39:53.2305272Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2305329Z mov.b64 {%r1467, %r1468}, %rd524; 2026-02-21T08:39:53.2305692Z [142s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:39:53.2306690Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:39:53.2306822Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:39:53.2306909Z `ptxas` stderr: 2026-02-21T08:39:53.2307270Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:39:53.2307366Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:39:53.2307371Z 2026-02-21T08:39:53.2307780Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpirsf_ryc.ptx -o /tmp/tmpirsf_ryc.ptx.o 2026-02-21T08:39:53.2307831Z 2026-02-21T08:39:53.2307963Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:39:53.2308035Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T08:39:53.2308210Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2308280Z cvt.u64.u32 %rd525, %r780; 2026-02-21T08:39:53.2308340Z cvt.u64.u32 %rd526, %r781; 2026-02-21T08:39:53.2308400Z shl.b64 %rd527, %rd526, 32; 2026-02-21T08:39:53.2308469Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T08:39:53.2308642Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2308703Z mov.b64 {%r1470, %r1471}, %rd528; 2026-02-21T08:39:53.2308776Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T08:39:53.2308947Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2309005Z cvt.u64.u32 %rd529, %r782; 2026-02-21T08:39:53.2309064Z cvt.u64.u32 %rd530, %r783; 2026-02-21T08:39:53.2309128Z shl.b64 %rd531, %rd530, 32; 2026-02-21T08:39:53.2309186Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T08:39:53.2309357Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2309454Z mov.b64 {%r1473, %r1474}, %rd532; 2026-02-21T08:39:53.2309522Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T08:39:53.2309695Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2309761Z cvt.u64.u32 %rd533, %r784; 2026-02-21T08:39:53.2309819Z cvt.u64.u32 %rd534, %r785; 2026-02-21T08:39:53.2309877Z shl.b64 %rd535, %rd534, 32; 2026-02-21T08:39:53.2309934Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T08:39:53.2310110Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2310169Z mov.b64 {%r1476, %r1477}, %rd536; 2026-02-21T08:39:53.2310236Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T08:39:53.2310409Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2310467Z cvt.u64.u32 %rd537, %r786; 2026-02-21T08:39:53.2310524Z 
cvt.u64.u32 %rd538, %r787; 2026-02-21T08:39:53.2310590Z shl.b64 %rd539, %rd538, 32; 2026-02-21T08:39:53.2310648Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T08:39:53.2310818Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2310906Z mov.b64 {%r1479, %r1480}, %rd540; 2026-02-21T08:39:53.2310979Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T08:39:53.2311153Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2311211Z cvt.u64.u32 %rd541, %r788; 2026-02-21T08:39:53.2311276Z cvt.u64.u32 %rd542, %r789; 2026-02-21T08:39:53.2311333Z shl.b64 %rd543, %rd542, 32; 2026-02-21T08:39:53.2311392Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T08:39:53.2311561Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2311629Z mov.b64 {%r1482, %r1483}, %rd544; 2026-02-21T08:39:53.2311695Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T08:39:53.2311865Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2311956Z cvt.u64.u32 %rd545, %r791; 2026-02-21T08:39:53.2312018Z cvt.u64.u32 %rd546, %r792; 2026-02-21T08:39:53.2312077Z shl.b64 %rd547, %rd546, 32; 2026-02-21T08:39:53.2312144Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T08:39:53.2312316Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2312376Z mov.b64 {%r1485, %r1486}, %rd548; 2026-02-21T08:39:53.2312450Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T08:39:53.2312622Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2312704Z cvt.u64.u32 %rd549, %r793; 2026-02-21T08:39:53.2312763Z cvt.u64.u32 %rd550, %r794; 2026-02-21T08:39:53.2312831Z shl.b64 %rd551, %rd550, 32; 2026-02-21T08:39:53.2312892Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T08:39:53.2313064Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 
2026-02-21T08:39:53.2313135Z mov.b64 {%r1488, %r1489}, %rd552; 2026-02-21T08:39:53.2313203Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T08:39:53.2313371Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2313439Z cvt.u64.u32 %rd553, %r795; 2026-02-21T08:39:53.2313497Z cvt.u64.u32 %rd554, %r796; 2026-02-21T08:39:53.2313556Z shl.b64 %rd555, %rd554, 32; 2026-02-21T08:39:53.2313615Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T08:39:53.2313787Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2313848Z mov.b64 {%r1491, %r1492}, %rd556; 2026-02-21T08:39:53.2313913Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T08:39:53.2314089Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2314171Z cvt.u64.u32 %rd557, %r797; 2026-02-21T08:39:53.2314230Z cvt.u64.u32 %rd558, %r798; 2026-02-21T08:39:53.2314296Z shl.b64 %rd559, %rd558, 32; 2026-02-21T08:39:53.2314356Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T08:39:53.2314529Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2314586Z mov.b64 {%r1494, %r1495}, %rd560; 2026-02-21T08:39:53.2314656Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T08:39:53.2314851Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2314907Z cvt.u64.u32 %rd561, %r799; 2026-02-21T08:39:53.2314971Z cvt.u64.u32 %rd562, %r800; 2026-02-21T08:39:53.2315026Z shl.b64 %rd563, %rd562, 32; 2026-02-21T08:39:53.2315081Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T08:39:53.2315244Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2315309Z mov.b64 {%r1497, %r1498}, %rd564; 2026-02-21T08:39:53.2315372Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T08:39:53.2315535Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2315626Z 
cvt.u64.u32 %rd565, %r801; 2026-02-21T08:39:53.2315682Z cvt.u64.u32 %rd566, %r802; 2026-02-21T08:39:53.2315738Z shl.b64 %rd567, %rd566, 32; 2026-02-21T08:39:53.2315801Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T08:39:53.2315964Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2316021Z mov.b64 {%r1500, %r1501}, %rd568; 2026-02-21T08:39:53.2316094Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T08:39:53.2316258Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2316313Z cvt.u64.u32 %rd569, %r803; 2026-02-21T08:39:53.2316368Z cvt.u64.u32 %rd570, %r804; 2026-02-21T08:39:53.2316431Z shl.b64 %rd571, %rd570, 32; 2026-02-21T08:39:53.2316490Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T08:39:53.2316678Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2316745Z mov.b64 {%r1503, %r1504}, %rd572; 2026-02-21T08:39:53.2316809Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T08:39:53.2316963Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2317026Z cvt.u64.u32 %rd573, %r805; 2026-02-21T08:39:53.2317080Z cvt.u64.u32 %rd574, %r806; 2026-02-21T08:39:53.2317136Z shl.b64 %rd575, %rd574, 32; 2026-02-21T08:39:53.2317216Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T08:39:53.2317384Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2317441Z mov.b64 {%r1506, %r1507}, %rd576; 2026-02-21T08:39:53.2317503Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T08:39:53.2317673Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2317730Z cvt.u64.u32 %rd577, %r808; 2026-02-21T08:39:53.2317786Z cvt.u64.u32 %rd578, %r809; 2026-02-21T08:39:53.2317848Z shl.b64 %rd579, %rd578, 32; 2026-02-21T08:39:53.2317904Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T08:39:53.2318067Z .loc 1 58 27 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2318125Z mov.b64 {%r1509, %r1510}, %rd580; 2026-02-21T08:39:53.2318193Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T08:39:53.2318360Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2318417Z cvt.u64.u32 %rd581, %r810; 2026-02-21T08:39:53.2318478Z cvt.u64.u32 %rd582, %r811; 2026-02-21T08:39:53.2318533Z shl.b64 %rd583, %rd582, 32; 2026-02-21T08:39:53.2318589Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T08:39:53.2318777Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2318844Z mov.b64 {%r1512, %r1513}, %rd584; 2026-02-21T08:39:53.2318907Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T08:39:53.2319067Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2319129Z cvt.u64.u32 %rd585, %r812; 2026-02-21T08:39:53.2319182Z cvt.u64.u32 %rd586, %r813; 2026-02-21T08:39:53.2319237Z shl.b64 %rd587, %rd586, 32; 2026-02-21T08:39:53.2319300Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T08:39:53.2319463Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2319521Z mov.b64 {%r1515, %r1516}, %rd588; 2026-02-21T08:39:53.2319590Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T08:39:53.2319753Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2319807Z cvt.u64.u32 %rd589, %r814; 2026-02-21T08:39:53.2319862Z cvt.u64.u32 %rd590, %r815; 2026-02-21T08:39:53.2319925Z shl.b64 %rd591, %rd590, 32; 2026-02-21T08:39:53.2320005Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T08:39:53.2320166Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2320230Z mov.b64 {%r1518, %r1519}, %rd592; 2026-02-21T08:39:53.2320293Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T08:39:53.2320456Z .loc 1 56 52 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2320517Z cvt.u64.u32 %rd593, %r816; 2026-02-21T08:39:53.2320574Z cvt.u64.u32 %rd594, %r817; 2026-02-21T08:39:53.2320630Z shl.b64 %rd595, %rd594, 32; 2026-02-21T08:39:53.2320687Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T08:39:53.2320859Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2320916Z mov.b64 {%r1521, %r1522}, %rd596; 2026-02-21T08:39:53.2320981Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T08:39:53.2321167Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2321225Z cvt.u64.u32 %rd597, %r818; 2026-02-21T08:39:53.2321281Z cvt.u64.u32 %rd598, %r819; 2026-02-21T08:39:53.2321346Z shl.b64 %rd599, %rd598, 32; 2026-02-21T08:39:53.2321403Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T08:39:53.2321568Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2321627Z mov.b64 {%r1524, %r1525}, %rd600; 2026-02-21T08:39:53.2321729Z cvt.rn.f16x2.f32 %r1526, %r1525, %r1524; 2026-02-21T08:39:53.2321894Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2321950Z cvt.u64.u32 %rd601, %r820; 2026-02-21T08:39:53.2322023Z cvt.u64.u32 %rd602, %r821; 2026-02-21T08:39:53.2322080Z shl.b64 %rd603, %rd602, 32; 2026-02-21T08:39:53.2322138Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T08:39:53.2322301Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2322367Z mov.b64 {%r1527, %r1528}, %rd604; 2026-02-21T08:39:53.2322430Z cvt.rn.f16x2.f32 %r1529, %r1528, %r1527; 2026-02-21T08:39:53.2322592Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2322656Z cvt.u64.u32 %rd605, %r822; 2026-02-21T08:39:53.2322710Z cvt.u64.u32 %rd606, %r823; 2026-02-21T08:39:53.2322765Z shl.b64 %rd607, %rd606, 32; 2026-02-21T08:39:53.2322830Z or.b64 
%rd608, %rd605, %rd607; 2026-02-21T08:39:53.2322993Z .loc 1 58 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:58:27 2026-02-21T08:39:53.2323048Z mov.b64 {%r1530, %r1531}, %rd608; 2026-02-21T08:39:53.2323116Z cvt.rn.f16x2.f32 %r1532, %r1531, %r1530; 2026-02-21T08:39:53.2323340Z .loc 1 59 45 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:59:45 2026-02-21T08:39:53.2323399Z shl.b32 %r1533, %r1115, 10; 2026-02-21T08:39:53.2323455Z shl.b32 %r1534, %r1116, 10; 2026-02-21T08:39:53.2323516Z shl.b32 %r1535, %r1117, 10; 2026-02-21T08:39:53.2323570Z shl.b32 %r1536, %r1118, 10; 2026-02-21T08:39:53.2323624Z shl.b32 %r1537, %r1119, 10; 2026-02-21T08:39:53.2323684Z shl.b32 %r1538, %r1120, 10; 2026-02-21T08:39:53.2323737Z shl.b32 %r1539, %r1121, 10; 2026-02-21T08:39:53.2323791Z shl.b32 %r1540, %r1122, 10; 2026-02-21T08:39:53.2323845Z shl.b32 %r1541, %r1123, 10; 2026-02-21T08:39:53.2323907Z shl.b32 %r1542, %r1124, 10; 2026-02-21T08:39:53.2323963Z shl.b32 %r1543, %r1125, 10; 2026-02-21T08:39:53.2324016Z shl.b32 %r1544, %r1126, 10; 2026-02-21T08:39:53.2324077Z shl.b32 %r1545, %r1127, 10; 2026-02-21T08:39:53.2324131Z shl.b32 %r1546, %r1128, 10; 2026-02-21T08:39:53.2324185Z shl.b32 %r1547, %r1129, 10; 2026-02-21T08:39:53.2324239Z shl.b32 %r1548, %r1130, 10; 2026-02-21T08:39:53.2324302Z shl.b32 %r1549, %r1131, 10; 2026-02-21T08:39:53.2324355Z shl.b32 %r1550, %r1132, 10; 2026-02-21T08:39:53.2324410Z shl.b32 %r1551, %r1133, 10; 2026-02-21T08:39:53.2324497Z shl.b32 %r1552, %r1134, 10; 2026-02-21T08:39:53.2324552Z shl.b32 %r1553, %r1135, 10; 2026-02-21T08:39:53.2324607Z shl.b32 %r1554, %r1136, 10; 2026-02-21T08:39:53.2324700Z shl.b32 %r1555, %r1137, 10; 2026-02-21T08:39:53.2324757Z shl.b32 %r1556, %r1138, 10; 2026-02-21T08:39:53.2324811Z shl.b32 %r1557, %r1139, 10; 2026-02-21T08:39:53.2324865Z shl.b32 %r1558, %r1140, 10; 2026-02-21T08:39:53.2324927Z shl.b32 %r1559, %r1141, 10; 2026-02-21T08:39:53.2324983Z shl.b32 %r1560, %r1142, 10; 2026-02-21T08:39:53.2325035Z 
shl.b32 %r1561, %r1143, 10; 2026-02-21T08:39:53.2325096Z shl.b32 %r1562, %r1144, 10; 2026-02-21T08:39:53.2325150Z shl.b32 %r1563, %r1145, 10; 2026-02-21T08:39:53.2325204Z shl.b32 %r1564, %r1146, 10; 2026-02-21T08:39:53.2325367Z .loc 1 59 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:59:52 2026-02-21T08:39:53.2325435Z add.s32 %r1565, %r1533, %r1147; 2026-02-21T08:39:53.2325537Z add.s32 %r1566, %r1534, %r1147; 2026-02-21T08:39:53.2325596Z add.s32 %r1567, %r1535, %r1147; 2026-02-21T08:39:53.2325659Z add.s32 %r1568, %r1536, %r1147; 2026-02-21T08:39:53.2325714Z add.s32 %r1569, %r1537, %r1147; 2026-02-21T08:39:53.2325768Z add.s32 %r1570, %r1538, %r1147; 2026-02-21T08:39:53.2325829Z add.s32 %r1571, %r1539, %r1147; 2026-02-21T08:39:53.2325884Z add.s32 %r1572, %r1540, %r1147; 2026-02-21T08:39:53.2325939Z add.s32 %r1573, %r1541, %r1147; 2026-02-21T08:39:53.2325993Z add.s32 %r1574, %r1542, %r1147; 2026-02-21T08:39:53.2326083Z add.s32 %r1575, %r1543, %r1147; 2026-02-21T08:39:53.2326137Z add.s32 %r1576, %r1544, %r1147; 2026-02-21T08:39:53.2326192Z add.s32 %r1577, %r1545, %r1147; 2026-02-21T08:39:53.2326253Z add.s32 %r1578, %r1546, %r1147; 2026-02-21T08:39:53.2326309Z add.s32 %r1579, %r1547, %r1147; 2026-02-21T08:39:53.2326365Z add.s32 %r1580, %r1548, %r1147; 2026-02-21T08:39:53.2326419Z add.s32 %r1581, %r1549, %r1147; 2026-02-21T08:39:53.2326483Z add.s32 %r1582, %r1550, %r1147; 2026-02-21T08:39:53.2326539Z add.s32 %r1583, %r1551, %r1147; 2026-02-21T08:39:53.2326593Z add.s32 %r1584, %r1552, %r1147; 2026-02-21T08:39:53.2326654Z add.s32 %r1585, %r1553, %r1147; 2026-02-21T08:39:53.2326708Z add.s32 %r1586, %r1554, %r1147; 2026-02-21T08:39:53.2326761Z add.s32 %r1587, %r1555, %r1147; 2026-02-21T08:39:53.2326817Z add.s32 %r1588, %r1556, %r1147; 2026-02-21T08:39:53.2326881Z add.s32 %r1589, %r1557, %r1147; 2026-02-21T08:39:53.2326935Z add.s32 %r1590, %r1558, %r1147; 2026-02-21T08:39:53.2326993Z add.s32 %r1591, %r1559, %r1147; 2026-02-21T08:39:53.2327055Z add.s32 %r1592, 
%r1560, %r1147; 2026-02-21T08:39:53.2327110Z add.s32 %r1593, %r1561, %r1147; 2026-02-21T08:39:53.2327164Z add.s32 %r1594, %r1562, %r1147; 2026-02-21T08:39:53.2327225Z add.s32 %r1595, %r1563, %r1147; 2026-02-21T08:39:53.2327279Z add.s32 %r1596, %r1564, %r1147; 2026-02-21T08:39:53.2327466Z .loc 1 59 24 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:59:24 2026-02-21T08:39:53.2327535Z mad.wide.s32 %rd65, %r1565, 2, %rd5; 2026-02-21T08:39:53.2327608Z mad.wide.s32 %rd66, %r1566, 2, %rd5; 2026-02-21T08:39:53.2327670Z mad.wide.s32 %rd67, %r1567, 2, %rd5; 2026-02-21T08:39:53.2327729Z mad.wide.s32 %rd68, %r1568, 2, %rd5; 2026-02-21T08:39:53.2327796Z mad.wide.s32 %rd69, %r1569, 2, %rd5; 2026-02-21T08:39:53.2327854Z mad.wide.s32 %rd70, %r1570, 2, %rd5; 2026-02-21T08:39:53.2327913Z mad.wide.s32 %rd71, %r1571, 2, %rd5; 2026-02-21T08:39:53.2327979Z mad.wide.s32 %rd72, %r1572, 2, %rd5; 2026-02-21T08:39:53.2328039Z mad.wide.s32 %rd73, %r1573, 2, %rd5; 2026-02-21T08:39:53.2328098Z mad.wide.s32 %rd74, %r1574, 2, %rd5; 2026-02-21T08:39:53.2328156Z mad.wide.s32 %rd75, %r1575, 2, %rd5; 2026-02-21T08:39:53.2328222Z mad.wide.s32 %rd76, %r1576, 2, %rd5; 2026-02-21T08:39:53.2328281Z mad.wide.s32 %rd77, %r1577, 2, %rd5; 2026-02-21T08:39:53.2328342Z mad.wide.s32 %rd78, %r1578, 2, %rd5; 2026-02-21T08:39:53.2328412Z mad.wide.s32 %rd79, %r1579, 2, %rd5; 2026-02-21T08:39:53.2328498Z mad.wide.s32 %rd80, %r1580, 2, %rd5; 2026-02-21T08:39:53.2328557Z mad.wide.s32 %rd81, %r1581, 2, %rd5; 2026-02-21T08:39:53.2328619Z mad.wide.s32 %rd82, %r1582, 2, %rd5; 2026-02-21T08:39:53.2328687Z mad.wide.s32 %rd83, %r1583, 2, %rd5; 2026-02-21T08:39:53.2328746Z mad.wide.s32 %rd84, %r1584, 2, %rd5; 2026-02-21T08:39:53.2328804Z mad.wide.s32 %rd85, %r1585, 2, %rd5; 2026-02-21T08:39:53.2328869Z mad.wide.s32 %rd86, %r1586, 2, %rd5; 2026-02-21T08:39:53.2328927Z mad.wide.s32 %rd87, %r1587, 2, %rd5; 2026-02-21T08:39:53.2328986Z mad.wide.s32 %rd88, %r1588, 2, %rd5; 2026-02-21T08:39:53.2329050Z mad.wide.s32 %rd89, 
%r1589, 2, %rd5; 2026-02-21T08:39:53.2329109Z mad.wide.s32 %rd90, %r1590, 2, %rd5; 2026-02-21T08:39:53.2329166Z mad.wide.s32 %rd91, %r1591, 2, %rd5; 2026-02-21T08:39:53.2329225Z mad.wide.s32 %rd92, %r1592, 2, %rd5; 2026-02-21T08:39:53.2329292Z mad.wide.s32 %rd93, %r1593, 2, %rd5; 2026-02-21T08:39:53.2329370Z mad.wide.s32 %rd94, %r1594, 2, %rd5; 2026-02-21T08:39:53.2329432Z mad.wide.s32 %rd95, %r1595, 2, %rd5; 2026-02-21T08:39:53.2329497Z mad.wide.s32 %rd96, %r1596, 2, %rd5; 2026-02-21T08:39:53.2329658Z .loc 1 59 82 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:59:82 2026-02-21T08:39:53.2329712Z bar.sync 0, 128; 2026-02-21T08:39:53.2329812Z st.shared.v4.b32 [%r78], {%r1151, %r1163, %r1175, %r1187}; 2026-02-21T08:39:53.2329908Z st.shared.v4.b32 [%r79], {%r1199, %r1211, %r1223, %r1235}; 2026-02-21T08:39:53.2329997Z st.shared.v4.b32 [%r80], {%r1247, %r1259, %r1271, %r1283}; 2026-02-21T08:39:53.2330115Z st.shared.v4.b32 [%r81], {%r1295, %r1307, %r1319, %r1331}; 2026-02-21T08:39:53.2330176Z bar.sync 0, 128; 2026-02-21T08:39:53.2330234Z // begin inline asm 2026-02-21T08:39:53.2330386Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r986, %r990, %r994, %r998}, [%r830]; 2026-02-21T08:39:53.2330449Z // end inline asm 2026-02-21T08:39:53.2330504Z // begin inline asm 2026-02-21T08:39:53.2330656Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1002, %r1006, %r1010, %r1014}, [%r835]; 2026-02-21T08:39:53.2330710Z // end inline asm 2026-02-21T08:39:53.2330772Z // begin inline asm 2026-02-21T08:39:53.2330920Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1018, %r1022, %r1026, %r1030}, [%r840]; 2026-02-21T08:39:53.2330973Z // end inline asm 2026-02-21T08:39:53.2331034Z // begin inline asm 2026-02-21T08:39:53.2331179Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1034, %r1038, %r1042, %r1046}, [%r845]; 2026-02-21T08:39:53.2331233Z // end inline asm 2026-02-21T08:39:53.2331295Z bar.sync 0, 128; 2026-02-21T08:39:53.2331385Z st.shared.v4.b32 [%r78], {%r1343, %r1355, %r1367, %r1379}; 
2026-02-21T08:39:53.2331473Z st.shared.v4.b32 [%r79], {%r1391, %r1403, %r1415, %r1427}; 2026-02-21T08:39:53.2331559Z st.shared.v4.b32 [%r80], {%r1439, %r1451, %r1463, %r1475}; 2026-02-21T08:39:53.2331676Z st.shared.v4.b32 [%r81], {%r1487, %r1499, %r1511, %r1523}; 2026-02-21T08:39:53.2331731Z bar.sync 0, 128; 2026-02-21T08:39:53.2331787Z // begin inline asm 2026-02-21T08:39:53.2331941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1050, %r1054, %r1058, %r1062}, [%r830]; 2026-02-21T08:39:53.2331993Z // end inline asm 2026-02-21T08:39:53.2332048Z // begin inline asm 2026-02-21T08:39:53.2332191Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1066, %r1070, %r1074, %r1078}, [%r835]; 2026-02-21T08:39:53.2332252Z // end inline asm 2026-02-21T08:39:53.2332305Z // begin inline asm 2026-02-21T08:39:53.2332447Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1082, %r1086, %r1090, %r1094}, [%r840]; 2026-02-21T08:39:53.2332507Z // end inline asm 2026-02-21T08:39:53.2332560Z // begin inline asm 2026-02-21T08:39:53.2332699Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1098, %r1102, %r1106, %r1110}, [%r845]; 2026-02-21T08:39:53.2332757Z // end inline asm 2026-02-21T08:39:53.2332808Z bar.sync 0, 128; 2026-02-21T08:39:53.2332897Z st.shared.v4.b32 [%r78], {%r1154, %r1166, %r1178, %r1190}; 2026-02-21T08:39:53.2332984Z st.shared.v4.b32 [%r79], {%r1202, %r1214, %r1226, %r1238}; 2026-02-21T08:39:53.2333104Z st.shared.v4.b32 [%r80], {%r1250, %r1262, %r1274, %r1286}; 2026-02-21T08:39:53.2333188Z st.shared.v4.b32 [%r81], {%r1298, %r1310, %r1322, %r1334}; 2026-02-21T08:39:53.2333241Z bar.sync 0, 128; 2026-02-21T08:39:53.2333302Z // begin inline asm 2026-02-21T08:39:53.2333443Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r987, %r991, %r995, %r999}, [%r830]; 2026-02-21T08:39:53.2333494Z // end inline asm 2026-02-21T08:39:53.2333553Z // begin inline asm 2026-02-21T08:39:53.2333695Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1003, %r1007, %r1011, %r1015}, [%r835]; 2026-02-21T08:39:53.2333747Z // end inline 
asm 2026-02-21T08:39:53.2333800Z // begin inline asm 2026-02-21T08:39:53.2333947Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1019, %r1023, %r1027, %r1031}, [%r840]; 2026-02-21T08:39:53.2333998Z // end inline asm 2026-02-21T08:39:53.2334053Z // begin inline asm 2026-02-21T08:39:53.2334223Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1035, %r1039, %r1043, %r1047}, [%r845]; 2026-02-21T08:39:53.2334279Z // end inline asm 2026-02-21T08:39:53.2334330Z bar.sync 0, 128; 2026-02-21T08:39:53.2334417Z st.shared.v4.b32 [%r78], {%r1346, %r1358, %r1370, %r1382}; 2026-02-21T08:39:53.2334509Z st.shared.v4.b32 [%r79], {%r1394, %r1406, %r1418, %r1430}; 2026-02-21T08:39:53.2334594Z st.shared.v4.b32 [%r80], {%r1442, %r1454, %r1466, %r1478}; 2026-02-21T08:39:53.2334700Z st.shared.v4.b32 [%r81], {%r1490, %r1502, %r1514, %r1526}; 2026-02-21T08:39:53.2334762Z bar.sync 0, 128; 2026-02-21T08:39:53.2334847Z // begin inline asm 2026-02-21T08:39:53.2334994Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1051, %r1055, %r1059, %r1063}, [%r830]; 2026-02-21T08:39:53.2335053Z // end inline asm 2026-02-21T08:39:53.2335106Z // begin inline asm 2026-02-21T08:39:53.2335251Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1067, %r1071, %r1075, %r1079}, [%r835]; 2026-02-21T08:39:53.2335303Z // end inline asm 2026-02-21T08:39:53.2335364Z // begin inline asm 2026-02-21T08:39:53.2335505Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1083, %r1087, %r1091, %r1095}, [%r840]; 2026-02-21T08:39:53.2335559Z // end inline asm 2026-02-21T08:39:53.2335621Z // begin inline asm 2026-02-21T08:39:53.2335762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1099, %r1103, %r1107, %r1111}, [%r845]; 2026-02-21T08:39:53.2335813Z // end inline asm 2026-02-21T08:39:53.2335873Z bar.sync 0, 128; 2026-02-21T08:39:53.2335960Z st.shared.v4.b32 [%r78], {%r1157, %r1169, %r1181, %r1193}; 2026-02-21T08:39:53.2336047Z st.shared.v4.b32 [%r79], {%r1205, %r1217, %r1229, %r1241}; 2026-02-21T08:39:53.2336135Z st.shared.v4.b32 [%r80], {%r1253, %r1265, 
%r1277, %r1289}; 2026-02-21T08:39:53.2336228Z st.shared.v4.b32 [%r81], {%r1301, %r1313, %r1325, %r1337}; 2026-02-21T08:39:53.2336283Z bar.sync 0, 128; 2026-02-21T08:39:53.2336338Z // begin inline asm 2026-02-21T08:39:53.2336524Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r988, %r992, %r996, %r1000}, [%r830]; 2026-02-21T08:39:53.2336583Z // end inline asm 2026-02-21T08:39:53.2336638Z // begin inline asm 2026-02-21T08:39:53.2336780Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1004, %r1008, %r1012, %r1016}, [%r835]; 2026-02-21T08:39:53.2336844Z // end inline asm 2026-02-21T08:39:53.2336897Z // begin inline asm 2026-02-21T08:39:53.2337038Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1024, %r1028, %r1032}, [%r840]; 2026-02-21T08:39:53.2337098Z // end inline asm 2026-02-21T08:39:53.2337152Z // begin inline asm 2026-02-21T08:39:53.2337293Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1036, %r1040, %r1044, %r1048}, [%r845]; 2026-02-21T08:39:53.2337356Z // end inline asm 2026-02-21T08:39:53.2337408Z bar.sync 0, 128; 2026-02-21T08:39:53.2337494Z st.shared.v4.b32 [%r78], {%r1349, %r1361, %r1373, %r1385}; 2026-02-21T08:39:53.2337580Z st.shared.v4.b32 [%r79], {%r1397, %r1409, %r1421, %r1433}; 2026-02-21T08:39:53.2337673Z st.shared.v4.b32 [%r80], {%r1445, %r1457, %r1469, %r1481}; 2026-02-21T08:39:53.2337762Z st.shared.v4.b32 [%r81], {%r1493, %r1505, %r1517, %r1529}; 2026-02-21T08:39:53.2337841Z bar.sync 0, 128; 2026-02-21T08:39:53.2337901Z // begin inline asm 2026-02-21T08:39:53.2338042Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1052, %r1056, %r1060, %r1064}, [%r830]; 2026-02-21T08:39:53.2338095Z // end inline asm 2026-02-21T08:39:53.2338154Z // begin inline asm 2026-02-21T08:39:53.2338298Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1068, %r1072, %r1076, %r1080}, [%r835]; 2026-02-21T08:39:53.2338350Z // end inline asm 2026-02-21T08:39:53.2338403Z // begin inline asm 2026-02-21T08:39:53.2338550Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1084, %r1088, %r1092, %r1096}, 
[%r840]; 2026-02-21T08:39:53.2338602Z // end inline asm 2026-02-21T08:39:53.2338656Z // begin inline asm 2026-02-21T08:39:53.2338802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1100, %r1104, %r1108, %r1112}, [%r845]; 2026-02-21T08:39:53.2338855Z // end inline asm 2026-02-21T08:39:53.2338908Z bar.sync 0, 128; 2026-02-21T08:39:53.2339021Z st.shared.v4.b32 [%r78], {%r1160, %r1172, %r1184, %r1196}; 2026-02-21T08:39:53.2339116Z st.shared.v4.b32 [%r79], {%r1208, %r1220, %r1232, %r1244}; 2026-02-21T08:39:53.2339203Z st.shared.v4.b32 [%r80], {%r1256, %r1268, %r1280, %r1292}; 2026-02-21T08:39:53.2339288Z st.shared.v4.b32 [%r81], {%r1304, %r1316, %r1328, %r1340}; 2026-02-21T08:39:53.2339348Z bar.sync 0, 128; 2026-02-21T08:39:53.2339400Z // begin inline asm 2026-02-21T08:39:53.2339541Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r989, %r993, %r997, %r1001}, [%r830]; 2026-02-21T08:39:53.2339622Z // end inline asm 2026-02-21T08:39:53.2339676Z // begin inline asm 2026-02-21T08:39:53.2339817Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1005, %r1009, %r1013, %r1017}, [%r835]; 2026-02-21T08:39:53.2339869Z // end inline asm 2026-02-21T08:39:53.2339929Z // begin inline asm 2026-02-21T08:39:53.2340069Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1021, %r1025, %r1029, %r1033}, [%r840]; 2026-02-21T08:39:53.2340121Z // end inline asm 2026-02-21T08:39:53.2340183Z // begin inline asm 2026-02-21T08:39:53.2340322Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1037, %r1041, %r1045, %r1049}, [%r845]; 2026-02-21T08:39:53.2340373Z // end inline asm 2026-02-21T08:39:53.2340433Z bar.sync 0, 128; 2026-02-21T08:39:53.2340520Z st.shared.v4.b32 [%r78], {%r1352, %r1364, %r1376, %r1388}; 2026-02-21T08:39:53.2340605Z st.shared.v4.b32 [%r79], {%r1400, %r1412, %r1424, %r1436}; 2026-02-21T08:39:53.2340690Z st.shared.v4.b32 [%r80], {%r1448, %r1460, %r1472, %r1484}; 2026-02-21T08:39:53.2340781Z st.shared.v4.b32 [%r81], {%r1496, %r1508, %r1520, %r1532}; 2026-02-21T08:39:53.2340835Z bar.sync 0, 128; 
2026-02-21T08:39:53.2340888Z // begin inline asm 2026-02-21T08:39:53.2341033Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1053, %r1057, %r1061, %r1065}, [%r830]; 2026-02-21T08:39:53.2341083Z // end inline asm 2026-02-21T08:39:53.2341135Z // begin inline asm 2026-02-21T08:39:53.2341294Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1069, %r1073, %r1077, %r1081}, [%r835]; 2026-02-21T08:39:53.2341355Z // end inline asm 2026-02-21T08:39:53.2341408Z // begin inline asm 2026-02-21T08:39:53.2341549Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1085, %r1089, %r1093, %r1097}, [%r840]; 2026-02-21T08:39:53.2341608Z // end inline asm 2026-02-21T08:39:53.2341661Z // begin inline asm 2026-02-21T08:39:53.2341798Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1101, %r1105, %r1109, %r1113}, [%r845]; 2026-02-21T08:39:53.2341856Z // end inline asm 2026-02-21T08:39:53.2341908Z // begin inline asm 2026-02-21T08:39:53.2342007Z st.global.v4.b32 [ %rd65 + 0 ], { %r986, %r987, %r988, %r989 }; 2026-02-21T08:39:53.2342061Z // end inline asm 2026-02-21T08:39:53.2342120Z // begin inline asm 2026-02-21T08:39:53.2342212Z st.global.v4.b32 [ %rd66 + 0 ], { %r990, %r991, %r992, %r993 }; 2026-02-21T08:39:53.2342265Z // end inline asm 2026-02-21T08:39:53.2342325Z // begin inline asm 2026-02-21T08:39:53.2342418Z st.global.v4.b32 [ %rd67 + 0 ], { %r994, %r995, %r996, %r997 }; 2026-02-21T08:39:53.2342472Z // end inline asm 2026-02-21T08:39:53.2342561Z // begin inline asm 2026-02-21T08:39:53.2342668Z st.global.v4.b32 [ %rd68 + 0 ], { %r998, %r999, %r1000, %r1001 }; 2026-02-21T08:39:53.2342720Z // end inline asm 2026-02-21T08:39:53.2342773Z // begin inline asm 2026-02-21T08:39:53.2342881Z st.global.v4.b32 [ %rd69 + 0 ], { %r1002, %r1003, %r1004, %r1005 }; 2026-02-21T08:39:53.2342933Z // end inline asm 2026-02-21T08:39:53.2342985Z // begin inline asm 2026-02-21T08:39:53.2343092Z st.global.v4.b32 [ %rd70 + 0 ], { %r1006, %r1007, %r1008, %r1009 }; 2026-02-21T08:39:53.2343145Z // end inline asm 
2026-02-21T08:39:53.2343199Z // begin inline asm 2026-02-21T08:39:53.2343293Z st.global.v4.b32 [ %rd71 + 0 ], { %r1010, %r1011, %r1012, %r1013 }; 2026-02-21T08:39:53.2343354Z // end inline asm 2026-02-21T08:39:53.2343406Z // begin inline asm 2026-02-21T08:39:53.2343501Z st.global.v4.b32 [ %rd72 + 0 ], { %r1014, %r1015, %r1016, %r1017 }; 2026-02-21T08:39:53.2343561Z // end inline asm 2026-02-21T08:39:53.2343636Z // begin inline asm 2026-02-21T08:39:53.2343734Z st.global.v4.b32 [ %rd73 + 0 ], { %r1018, %r1019, %r1020, %r1021 }; 2026-02-21T08:39:53.2343787Z // end inline asm 2026-02-21T08:39:53.2343848Z // begin inline asm 2026-02-21T08:39:53.2343942Z st.global.v4.b32 [ %rd74 + 0 ], { %r1022, %r1023, %r1024, %r1025 }; 2026-02-21T08:39:53.2343994Z // end inline asm 2026-02-21T08:39:53.2344055Z // begin inline asm 2026-02-21T08:39:53.2344148Z st.global.v4.b32 [ %rd75 + 0 ], { %r1026, %r1027, %r1028, %r1029 }; 2026-02-21T08:39:53.2344200Z // end inline asm 2026-02-21T08:39:53.2344277Z // begin inline asm 2026-02-21T08:39:53.2344378Z st.global.v4.b32 [ %rd76 + 0 ], { %r1030, %r1031, %r1032, %r1033 }; 2026-02-21T08:39:53.2344430Z // end inline asm 2026-02-21T08:39:53.2344484Z // begin inline asm 2026-02-21T08:39:53.2344587Z st.global.v4.b32 [ %rd77 + 0 ], { %r1034, %r1035, %r1036, %r1037 }; 2026-02-21T08:39:53.2344642Z // end inline asm 2026-02-21T08:39:53.2344723Z // begin inline asm 2026-02-21T08:39:53.2344826Z st.global.v4.b32 [ %rd78 + 0 ], { %r1038, %r1039, %r1040, %r1041 }; 2026-02-21T08:39:53.2344881Z // end inline asm 2026-02-21T08:39:53.2344935Z // begin inline asm 2026-02-21T08:39:53.2345028Z st.global.v4.b32 [ %rd79 + 0 ], { %r1042, %r1043, %r1044, %r1045 }; 2026-02-21T08:39:53.2345087Z // end inline asm 2026-02-21T08:39:53.2345140Z // begin inline asm 2026-02-21T08:39:53.2345233Z st.global.v4.b32 [ %rd80 + 0 ], { %r1046, %r1047, %r1048, %r1049 }; 2026-02-21T08:39:53.2345291Z // end inline asm 2026-02-21T08:39:53.2345343Z // begin inline asm 
2026-02-21T08:39:53.2345437Z st.global.v4.b32 [ %rd81 + 0 ], { %r1050, %r1051, %r1052, %r1053 }; 2026-02-21T08:39:53.2345489Z // end inline asm 2026-02-21T08:39:53.2345549Z // begin inline asm 2026-02-21T08:39:53.2345643Z st.global.v4.b32 [ %rd82 + 0 ], { %r1054, %r1055, %r1056, %r1057 }; 2026-02-21T08:39:53.2345695Z // end inline asm 2026-02-21T08:39:53.2345782Z // begin inline asm 2026-02-21T08:39:53.2345880Z st.global.v4.b32 [ %rd83 + 0 ], { %r1058, %r1059, %r1060, %r1061 }; 2026-02-21T08:39:53.2345935Z // end inline asm 2026-02-21T08:39:53.2345996Z // begin inline asm 2026-02-21T08:39:53.2346090Z st.global.v4.b32 [ %rd84 + 0 ], { %r1062, %r1063, %r1064, %r1065 }; 2026-02-21T08:39:53.2346143Z // end inline asm 2026-02-21T08:39:53.2346197Z // begin inline asm 2026-02-21T08:39:53.2346298Z st.global.v4.b32 [ %rd85 + 0 ], { %r1066, %r1067, %r1068, %r1069 }; 2026-02-21T08:39:53.2346351Z // end inline asm 2026-02-21T08:39:53.2346404Z // begin inline asm 2026-02-21T08:39:53.2346507Z st.global.v4.b32 [ %rd86 + 0 ], { %r1070, %r1071, %r1072, %r1073 }; 2026-02-21T08:39:53.2346560Z // end inline asm 2026-02-21T08:39:53.2346614Z // begin inline asm 2026-02-21T08:39:53.2346709Z st.global.v4.b32 [ %rd87 + 0 ], { %r1074, %r1075, %r1076, %r1077 }; 2026-02-21T08:39:53.2346769Z // end inline asm 2026-02-21T08:39:53.2346822Z // begin inline asm 2026-02-21T08:39:53.2346917Z st.global.v4.b32 [ %rd88 + 0 ], { %r1078, %r1079, %r1080, %r1081 }; 2026-02-21T08:39:53.2346978Z // end inline asm 2026-02-21T08:39:53.2347060Z // begin inline asm 2026-02-21T08:39:53.2347153Z st.global.v4.b32 [ %rd89 + 0 ], { %r1082, %r1083, %r1084, %r1085 }; 2026-02-21T08:39:53.2347206Z // end inline asm 2026-02-21T08:39:53.2347267Z // begin inline asm 2026-02-21T08:39:53.2347358Z st.global.v4.b32 [ %rd90 + 0 ], { %r1086, %r1087, %r1088, %r1089 }; 2026-02-21T08:39:53.2347409Z // end inline asm 2026-02-21T08:39:53.2347470Z // begin inline asm 2026-02-21T08:39:53.2347563Z st.global.v4.b32 [ %rd91 + 0 ], { 
%r1090, %r1091, %r1092, %r1093 }; 2026-02-21T08:39:53.2347617Z // end inline asm 2026-02-21T08:39:53.2347676Z // begin inline asm 2026-02-21T08:39:53.2347769Z st.global.v4.b32 [ %rd92 + 0 ], { %r1094, %r1095, %r1096, %r1097 }; 2026-02-21T08:39:53.2347821Z // end inline asm 2026-02-21T08:39:53.2347872Z // begin inline asm 2026-02-21T08:39:53.2347973Z st.global.v4.b32 [ %rd93 + 0 ], { %r1098, %r1099, %r1100, %r1101 }; 2026-02-21T08:39:53.2348024Z // end inline asm 2026-02-21T08:39:53.2348102Z // begin inline asm 2026-02-21T08:39:53.2348205Z st.global.v4.b32 [ %rd94 + 0 ], { %r1102, %r1103, %r1104, %r1105 }; 2026-02-21T08:39:53.2348257Z // end inline asm 2026-02-21T08:39:53.2348309Z // begin inline asm 2026-02-21T08:39:53.2348402Z st.global.v4.b32 [ %rd95 + 0 ], { %r1106, %r1107, %r1108, %r1109 }; 2026-02-21T08:39:53.2348460Z // end inline asm 2026-02-21T08:39:53.2348513Z // begin inline asm 2026-02-21T08:39:53.2348607Z st.global.v4.b32 [ %rd96 + 0 ], { %r1110, %r1111, %r1112, %r1113 }; 2026-02-21T08:39:53.2348697Z // end inline asm 2026-02-21T08:39:53.2348749Z mov.b32 %r1651, 1; 2026-02-21T08:39:53.2348850Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:39:53.2349029Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2349093Z xor.b32 %r1655, %r1651, %r1655; 2026-02-21T08:39:53.2349272Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2349335Z add.s32 %r1645, %r1645, -1; 2026-02-21T08:39:53.2349405Z setp.ne.b32 %p113, %r1645, 0; 2026-02-21T08:39:53.2349464Z @%p113 bra $L__BB0_18; 2026-02-21T08:39:53.2349519Z bra.uni $L__BB0_23; 2026-02-21T08:39:53.2349629Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T08:39:53.2349687Z add.s32 %r550, %r1650, 1; 2026-02-21T08:39:53.2349749Z setp.eq.b32 %p109, %r1650, 31; 2026-02-21T08:39:53.2349816Z selp.b32 %r1650, 0, %r550, %p109; 2026-02-21T08:39:53.2349878Z setp.eq.b32 %p110, %r1650, 31; 
2026-02-21T08:39:53.2349935Z @%p110 bra $L__BB0_21; 2026-02-21T08:39:53.2350032Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:39:53.2350214Z .loc 1 0 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:107 2026-02-21T08:39:53.2350291Z mov.b32 %r1651, 0; 2026-02-21T08:39:53.2350485Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2350559Z setp.ne.b32 %p111, %r1650, 0; 2026-02-21T08:39:53.2350617Z @%p111 bra $L__BB0_22; 2026-02-21T08:39:53.2350693Z // %bb.20: // %.thread 2026-02-21T08:39:53.2350793Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:39:53.2350854Z add.s32 %r1652, %r1652, 1; 2026-02-21T08:39:53.2351028Z .loc 1 37 35 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:37:35 2026-02-21T08:39:53.2351091Z shr.s32 %r1598, %r1652, 31; 2026-02-21T08:39:53.2351157Z shr.u32 %r1599, %r1598, 23; 2026-02-21T08:39:53.2351218Z add.s32 %r1600, %r1652, %r1599; 2026-02-21T08:39:53.2351276Z shr.s32 %r1601, %r1600, 9; 2026-02-21T08:39:53.2351458Z .loc 1 38 33 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:38:33 2026-02-21T08:39:53.2351516Z shl.b32 %r1602, %r1601, 6; 2026-02-21T08:39:53.2351687Z .loc 1 39 39 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:39:39 2026-02-21T08:39:53.2351777Z sub.s32 %r1603, 48, %r1602; 2026-02-21T08:39:53.2351951Z .loc 1 39 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:39:52 2026-02-21T08:39:53.2352009Z min.s32 %r1604, %r1603, 64; 2026-02-21T08:39:53.2352177Z .loc 1 40 45 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:45 2026-02-21T08:39:53.2352249Z and.b32 %r1605, %r1600, -512; 2026-02-21T08:39:53.2352314Z sub.s32 %r1606, %r1652, %r1605; 2026-02-21T08:39:53.2352486Z .loc 1 41 51 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:41:51 2026-02-21T08:39:53.2352566Z div.s32 %r1607, %r1606, %r1604; 2026-02-21T08:39:53.2352734Z .loc 1 40 64 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:64 2026-02-21T08:39:53.2352799Z mul.lo.s32 %r1608, %r1607, %r1604; 2026-02-21T08:39:53.2352891Z sub.s32 %r1609, %r1606, %r1608; 2026-02-21T08:39:53.2353062Z .loc 1 40 30 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:30 2026-02-21T08:39:53.2353121Z add.s32 %r1610, %r1609, %r1602; 2026-02-21T08:39:53.2353300Z .loc 1 42 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:42:27 2026-02-21T08:39:53.2353360Z shl.b32 %r1654, %r1610, 8; 2026-02-21T08:39:53.2353532Z .loc 1 44 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:44:27 2026-02-21T08:39:53.2353614Z shl.b32 %r1653, %r1607, 7; 2026-02-21T08:39:53.2353807Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2353864Z bra.uni $L__BB0_22; 2026-02-21T08:39:53.2353962Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:39:53.2354142Z .loc 1 0 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:107 2026-02-21T08:39:53.2354205Z mov.b32 %r104, global_smem; 2026-02-21T08:39:53.2354265Z add.s32 %r105, %r104, %r3; 2026-02-21T08:39:53.2354334Z mov.u32 %r154, %ctaid.x; 2026-02-21T08:39:53.2354393Z mul.lo.s32 %r155, %r154, 3; 2026-02-21T08:39:53.2354453Z add.s32 %r156, %r155, 3; 2026-02-21T08:39:53.2354511Z min.s32 %r157, %r156, 384; 2026-02-21T08:39:53.2354579Z sub.s32 %r158, %r157, %r155; 2026-02-21T08:39:53.2354639Z shl.b32 %r5, %r158, 5; 2026-02-21T08:39:53.2354729Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T08:39:53.2354794Z bra.uni $L__BB0_2; 2026-02-21T08:39:53.2354898Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2355079Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2355169Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2355228Z barrier.sync 1; 2026-02-21T08:39:53.2355313Z barrier.sync 1; 2026-02-21T08:39:53.2355394Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2355486Z 
$L__BB0_2: // %.preheader 2026-02-21T08:39:53.2355580Z // =>This Loop Header: Depth=1 2026-02-21T08:39:53.2355670Z // Child Loop BB0_11 Depth 2 2026-02-21T08:39:53.2355767Z // Child Loop BB0_7 Depth 2 2026-02-21T08:39:53.2355936Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19 2026-02-21T08:39:53.2356013Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:39:53.2356079Z barrier.sync 1; 2026-02-21T08:39:53.2356144Z ld.shared.b8 %r103, [%r105+155764]; 2026-02-21T08:39:53.2356207Z setp.gt.u32 %p4, %r103, 3; 2026-02-21T08:39:53.2356267Z @%p4 bra $L__BB0_4; 2026-02-21T08:39:53.2356356Z // %bb.3: // %.preheader 2026-02-21T08:39:53.2356449Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2356514Z $L_brx_0: .branchtargets 2026-02-21T08:39:53.2356605Z $L__BB0_5, 2026-02-21T08:39:53.2356658Z $L__BB0_9, 2026-02-21T08:39:53.2356710Z $L__BB0_15, 2026-02-21T08:39:53.2356760Z $L__BB0_24; 2026-02-21T08:39:53.2356831Z brx.idx %r103, $L_brx_0; 2026-02-21T08:39:53.2356928Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2357109Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2357192Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2357273Z ld.shared.b32 %r163, [global_smem+147456]; 2026-02-21T08:39:53.2357330Z barrier.sync 1; 2026-02-21T08:39:53.2357396Z @%p17 bra $L__BB0_8; 2026-02-21T08:39:53.2357471Z // %bb.6: // %.lr.ph7 2026-02-21T08:39:53.2357557Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2357762Z .loc 1 0 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:107 2026-02-21T08:39:53.2357831Z mov.b32 %r1633, -1; 2026-02-21T08:39:53.2357894Z mov.pred %p129, 0; 2026-02-21T08:39:53.2357949Z mov.b32 %r1630, 0; 2026-02-21T08:39:53.2358012Z mov.b32 %r1629, %r5; 2026-02-21T08:39:53.2358070Z mov.b32 %r1631, %r1630; 2026-02-21T08:39:53.2358127Z mov.b32 %r1632, %r1630; 2026-02-21T08:39:53.2358235Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 
2026-02-21T08:39:53.2358329Z // => This Inner Loop Header: Depth=2 2026-02-21T08:39:53.2358527Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2358585Z add.s32 %r173, %r1633, 1; 2026-02-21T08:39:53.2358655Z setp.eq.b32 %p30, %r1633, 31; 2026-02-21T08:39:53.2358717Z selp.b32 %r1633, 0, %r173, %p30; 2026-02-21T08:39:53.2358773Z shl.b32 %r174, %r1632, 3; 2026-02-21T08:39:53.2358837Z add.s32 %r176, %r104, %r174; 2026-02-21T08:39:53.2358896Z add.s32 %r177, %r176, 155648; 2026-02-21T08:39:53.2358953Z add.s32 %r161, %r176, 155696; 2026-02-21T08:39:53.2359120Z .loc 1 54 31 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:54:31 2026-02-21T08:39:53.2359185Z shl.b32 %r178, %r1632, 14; 2026-02-21T08:39:53.2359243Z add.s32 %r179, %r104, %r178; 2026-02-21T08:39:53.2359406Z .loc 1 55 44 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:55:44 2026-02-21T08:39:53.2359470Z shl.b32 %r180, %r1632, 13; 2026-02-21T08:39:53.2359526Z add.s32 %r181, %r104, %r180; 2026-02-21T08:39:53.2359583Z add.s32 %r182, %r181, 98304; 2026-02-21T08:39:53.2359751Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2359810Z bar.warp.sync -1; 2026-02-21T08:39:53.2359866Z // begin inline asm 2026-02-21T08:39:53.2359916Z 2026-02-21T08:39:53.2360004Z { 2026-02-21T08:39:53.2360066Z .reg .pred complete; 2026-02-21T08:39:53.2360120Z waitLoop: 2026-02-21T08:39:53.2360251Z mbarrier.try_wait.parity.shared.b64 complete, [%r161], %r1631; 2026-02-21T08:39:53.2360317Z @!complete bra.uni waitLoop; 2026-02-21T08:39:53.2360365Z } 2026-02-21T08:39:53.2360369Z 2026-02-21T08:39:53.2360424Z // end inline asm 2026-02-21T08:39:53.2360600Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2360661Z setp.eq.b32 %p29, %r1633, 31; 2026-02-21T08:39:53.2360825Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2360898Z 
elect.sync %r183|%p20, -1; 2026-02-21T08:39:53.2360958Z bfe.u32 %r184, %r179, 4, 14; 2026-02-21T08:39:53.2361017Z cvt.u64.u32 %rd22, %r184; 2026-02-21T08:39:53.2361100Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T08:39:53.2361159Z bfe.u32 %r185, %r182, 4, 14; 2026-02-21T08:39:53.2361220Z cvt.u64.u32 %rd23, %r185; 2026-02-21T08:39:53.2361289Z or.b64 %rd13, %rd23, -9223371899382267904; 2026-02-21T08:39:53.2361353Z mov.b32 %r164, 136314896; 2026-02-21T08:39:53.2361430Z // begin inline asm 2026-02-21T08:39:53.2361578Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r163 + 0 ], %rd12, %rd13, %r164, %p129; 2026-02-21T08:39:53.2361639Z // end inline asm 2026-02-21T08:39:53.2361694Z add.s32 %r186, %r179, 32; 2026-02-21T08:39:53.2361749Z bfe.u32 %r187, %r186, 4, 14; 2026-02-21T08:39:53.2361812Z cvt.u64.u32 %rd24, %r187; 2026-02-21T08:39:53.2361877Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T08:39:53.2361932Z add.s32 %r188, %r181, 98336; 2026-02-21T08:39:53.2361988Z bfe.u32 %r189, %r188, 4, 14; 2026-02-21T08:39:53.2362051Z cvt.u64.u32 %rd25, %r189; 2026-02-21T08:39:53.2362114Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T08:39:53.2362174Z mov.pred %p21, -1; 2026-02-21T08:39:53.2362235Z // begin inline asm 2026-02-21T08:39:53.2362373Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r163 + 0 ], %rd14, %rd15, %r164, %p21; 2026-02-21T08:39:53.2362425Z // end inline asm 2026-02-21T08:39:53.2362505Z add.s32 %r190, %r179, 8192; 2026-02-21T08:39:53.2362571Z bfe.u32 %r191, %r190, 4, 14; 2026-02-21T08:39:53.2362627Z cvt.u64.u32 %rd26, %r191; 2026-02-21T08:39:53.2362691Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T08:39:53.2362755Z // begin inline asm 2026-02-21T08:39:53.2362888Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r163 + 128 ], %rd16, %rd13, %r164, %p129; 2026-02-21T08:39:53.2362942Z // end inline asm 2026-02-21T08:39:53.2363004Z add.s32 %r192, %r179, 8224; 2026-02-21T08:39:53.2363060Z bfe.u32 %r193, %r192, 4, 14; 2026-02-21T08:39:53.2363140Z 
cvt.u64.u32 %rd27, %r193; 2026-02-21T08:39:53.2363203Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T08:39:53.2363267Z // begin inline asm 2026-02-21T08:39:53.2363399Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r163 + 128 ], %rd18, %rd15, %r164, %p21; 2026-02-21T08:39:53.2363453Z // end inline asm 2026-02-21T08:39:53.2363517Z cvt.u64.u32 %rd20, %r177; 2026-02-21T08:39:53.2363573Z // begin inline asm 2026-02-21T08:39:53.2363695Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T08:39:53.2363757Z // end inline asm 2026-02-21T08:39:53.2363820Z and.pred %p28, %p29, %p20; 2026-02-21T08:39:53.2363876Z add.s32 %r194, %r104, 155744; 2026-02-21T08:39:53.2363933Z cvt.u64.u32 %rd21, %r194; 2026-02-21T08:39:53.2363995Z // begin inline asm 2026-02-21T08:39:53.2364114Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T08:39:53.2364167Z // end inline asm 2026-02-21T08:39:53.2364344Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2364406Z setp.ne.b32 %p129, %r1633, 31; 2026-02-21T08:39:53.2364575Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2364639Z selp.b32 %r195, 1, 0, %p29; 2026-02-21T08:39:53.2364747Z xor.b32 %r1630, %r1630, %r195; 2026-02-21T08:39:53.2364806Z add.s32 %r171, %r104, 155760; 2026-02-21T08:39:53.2364861Z // begin inline asm 2026-02-21T08:39:53.2364918Z 2026-02-21T08:39:53.2364966Z { 2026-02-21T08:39:53.2365027Z @!%p29 bra.uni skipWait; 2026-02-21T08:39:53.2365090Z .reg .pred complete; 2026-02-21T08:39:53.2365144Z waitLoop: 2026-02-21T08:39:53.2365262Z mbarrier.try_wait.parity.shared.b64 complete, [%r171], %r1630; 2026-02-21T08:39:53.2365323Z @!complete bra.uni waitLoop; 2026-02-21T08:39:53.2365385Z skipWait: 2026-02-21T08:39:53.2365436Z } 2026-02-21T08:39:53.2365439Z 2026-02-21T08:39:53.2365493Z // end inline asm 2026-02-21T08:39:53.2365659Z .loc 1 0 0 // 
ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2365715Z add.s32 %r196, %r1632, 1; 2026-02-21T08:39:53.2365773Z setp.eq.b32 %p31, %r196, 6; 2026-02-21T08:39:53.2365835Z selp.b32 %r1632, 0, %r196, %p31; 2026-02-21T08:39:53.2365901Z selp.b32 %r197, 1, 0, %p31; 2026-02-21T08:39:53.2365959Z xor.b32 %r1631, %r1631, %r197; 2026-02-21T08:39:53.2366133Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2366230Z add.s32 %r1629, %r1629, -1; 2026-02-21T08:39:53.2366290Z setp.ne.b32 %p32, %r1629, 0; 2026-02-21T08:39:53.2366347Z @%p32 bra $L__BB0_7; 2026-02-21T08:39:53.2366437Z $L__BB0_8: // %._crit_edge8 2026-02-21T08:39:53.2366525Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2366581Z barrier.sync 1; 2026-02-21T08:39:53.2366659Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2366722Z bra.uni $L__BB0_2; 2026-02-21T08:39:53.2366815Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2366986Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2367071Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2367125Z barrier.sync 1; 2026-02-21T08:39:53.2367315Z .loc 1 21 68 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:21:68 2026-02-21T08:39:53.2367383Z mov.u32 %r107, %ctaid.y; 2026-02-21T08:39:53.2367440Z mov.u32 %r108, %ctaid.z; 2026-02-21T08:39:53.2367496Z mov.u32 %r109, %nctaid.x; 2026-02-21T08:39:53.2367550Z mov.u32 %r110, %nctaid.y; 2026-02-21T08:39:53.2367623Z mad.lo.s32 %r111, %r108, %r110, %r107; 2026-02-21T08:39:53.2367687Z mad.lo.s32 %r112, %r111, %r109, %r154; 2026-02-21T08:39:53.2367743Z shl.b32 %r113, %r112, 8; 2026-02-21T08:39:53.2367913Z .loc 1 22 67 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:22:67 2026-02-21T08:39:53.2368040Z cvt.s64.s32 %rd7, %r113; 2026-02-21T08:39:53.2368098Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T08:39:53.2368163Z add.s64 %rd9, %rd8, 128; 
2026-02-21T08:39:53.2368224Z cvta.global.u64 %rd11, %rd9; 2026-02-21T08:39:53.2368391Z .loc 1 21 68 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:21:68 2026-02-21T08:39:53.2368452Z cvta.global.u64 %rd10, %rd8; 2026-02-21T08:39:53.2368632Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2368690Z shl.b32 %r1634, %r158, 5; 2026-02-21T08:39:53.2368752Z setp.lt.s32 %p5, %r1634, 1; 2026-02-21T08:39:53.2368818Z @%p5 bra $L__BB0_14; 2026-02-21T08:39:53.2368892Z // %bb.10: // %.lr.ph 2026-02-21T08:39:53.2368977Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2369046Z add.s32 %r1644, %r155, -1; 2026-02-21T08:39:53.2369103Z add.s32 %r19, %r1, -128; 2026-02-21T08:39:53.2369157Z mov.b32 %r1641, -1; 2026-02-21T08:39:53.2369210Z mov.b32 %r1635, 0; 2026-02-21T08:39:53.2369273Z mov.b32 %r1636, %r1635; 2026-02-21T08:39:53.2369328Z mov.b32 %r1643, %r1635; 2026-02-21T08:39:53.2369381Z mov.b32 %r1642, %r1635; 2026-02-21T08:39:53.2369467Z mov.b32 %r1639, %r1635; 2026-02-21T08:39:53.2369525Z bra.uni $L__BB0_11; 2026-02-21T08:39:53.2369627Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:39:53.2369795Z .loc 1 0 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0:107 2026-02-21T08:39:53.2369865Z selp.b32 %r137, 0, %r1639, %p8; 2026-02-21T08:39:53.2369924Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T08:39:53.2369982Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T08:39:53.2370154Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2370212Z shl.b32 %r144, %r1636, 3; 2026-02-21T08:39:53.2370268Z add.s32 %r146, %r104, %r144; 2026-02-21T08:39:53.2370332Z add.s32 %r133, %r146, 155648; 2026-02-21T08:39:53.2370485Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2370541Z // begin inline asm 2026-02-21T08:39:53.2370589Z 2026-02-21T08:39:53.2370644Z { 2026-02-21T08:39:53.2370701Z .reg .pred complete; 2026-02-21T08:39:53.2370754Z 
waitLoop: 2026-02-21T08:39:53.2370898Z mbarrier.try_wait.parity.shared.b64 complete, [%r133], %r1635; 2026-02-21T08:39:53.2370959Z @!complete bra.uni waitLoop; 2026-02-21T08:39:53.2371006Z } 2026-02-21T08:39:53.2371009Z 2026-02-21T08:39:53.2371063Z // end inline asm 2026-02-21T08:39:53.2371236Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2371294Z add.s32 %r139, %r146, 155696; 2026-02-21T08:39:53.2371454Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2371517Z bar.sync 3, 64; 2026-02-21T08:39:53.2371571Z // begin inline asm 2026-02-21T08:39:53.2371681Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r139], 24576; 2026-02-21T08:39:53.2371743Z // end inline asm 2026-02-21T08:39:53.2371906Z .loc 1 54 31 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:54:31 2026-02-21T08:39:53.2371985Z shl.b32 %r147, %r1636, 14; 2026-02-21T08:39:53.2372044Z add.s32 %r136, %r104, %r147; 2026-02-21T08:39:53.2372210Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2372265Z bar.sync 3, 64; 2026-02-21T08:39:53.2372325Z elect.sync %r148|%p13, -1; 2026-02-21T08:39:53.2372393Z and.pred %p10, %p12, %p13; 2026-02-21T08:39:53.2372447Z // begin inline asm 2026-02-21T08:39:53.2372692Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r136], [%rd10, {%r137, %r1642}], [%r139]; 2026-02-21T08:39:53.2372778Z // end inline asm 2026-02-21T08:39:53.2372939Z .loc 1 55 44 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:55:44 2026-02-21T08:39:53.2372995Z shl.b32 %r149, %r1636, 13; 2026-02-21T08:39:53.2373051Z add.s32 %r150, %r104, %r149; 2026-02-21T08:39:53.2373114Z add.s32 %r140, %r150, 98304; 2026-02-21T08:39:53.2373268Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2373322Z bar.sync 3, 64; 2026-02-21T08:39:53.2373390Z elect.sync %r151|%p14, -1; 
2026-02-21T08:39:53.2373448Z and.pred %p11, %p12, %p14; 2026-02-21T08:39:53.2373501Z // begin inline asm 2026-02-21T08:39:53.2373752Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r140], [%rd11, {%r137, %r1643}], [%r139]; 2026-02-21T08:39:53.2373805Z // end inline asm 2026-02-21T08:39:53.2373970Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2374036Z add.s32 %r1639, %r137, 32; 2026-02-21T08:39:53.2374183Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2374240Z add.s32 %r152, %r1636, 1; 2026-02-21T08:39:53.2374297Z setp.eq.b32 %p15, %r152, 6; 2026-02-21T08:39:53.2374387Z selp.b32 %r1636, 0, %r152, %p15; 2026-02-21T08:39:53.2374447Z selp.b32 %r153, 1, 0, %p15; 2026-02-21T08:39:53.2374506Z xor.b32 %r1635, %r1635, %r153; 2026-02-21T08:39:53.2374710Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2374766Z add.s32 %r1634, %r1634, -1; 2026-02-21T08:39:53.2374825Z setp.ne.b32 %p16, %r1634, 0; 2026-02-21T08:39:53.2374881Z @%p16 bra $L__BB0_11; 2026-02-21T08:39:53.2374943Z bra.uni $L__BB0_14; 2026-02-21T08:39:53.2375039Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:39:53.2375133Z // => This Inner Loop Header: Depth=2 2026-02-21T08:39:53.2375198Z add.s32 %r119, %r1641, 1; 2026-02-21T08:39:53.2375257Z setp.eq.b32 %p6, %r1641, 31; 2026-02-21T08:39:53.2375317Z selp.b32 %r1641, 0, %r119, %p6; 2026-02-21T08:39:53.2375382Z setp.ne.b32 %p7, %r1641, 0; 2026-02-21T08:39:53.2375441Z setp.eq.b32 %p8, %r1641, 0; 2026-02-21T08:39:53.2375497Z @%p7 bra $L__BB0_13; 2026-02-21T08:39:53.2375591Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:39:53.2375686Z add.s32 %r1644, %r1644, 1; 2026-02-21T08:39:53.2375851Z .loc 1 37 35 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:37:35 2026-02-21T08:39:53.2375908Z shr.s32 %r120, %r1644, 31; 2026-02-21T08:39:53.2375971Z shr.u32 %r121, 
%r120, 23; 2026-02-21T08:39:53.2376028Z add.s32 %r122, %r1644, %r121; 2026-02-21T08:39:53.2376084Z shr.s32 %r123, %r122, 9; 2026-02-21T08:39:53.2376256Z .loc 1 38 33 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:38:33 2026-02-21T08:39:53.2376314Z shl.b32 %r124, %r123, 6; 2026-02-21T08:39:53.2376475Z .loc 1 39 39 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:39:39 2026-02-21T08:39:53.2376529Z sub.s32 %r125, 48, %r124; 2026-02-21T08:39:53.2376696Z .loc 1 39 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:39:52 2026-02-21T08:39:53.2376751Z min.s32 %r126, %r125, 64; 2026-02-21T08:39:53.2376941Z .loc 1 40 45 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:45 2026-02-21T08:39:53.2377010Z and.b32 %r127, %r122, -512; 2026-02-21T08:39:53.2377070Z sub.s32 %r128, %r1644, %r127; 2026-02-21T08:39:53.2377231Z .loc 1 41 51 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:41:51 2026-02-21T08:39:53.2377299Z div.s32 %r129, %r128, %r126; 2026-02-21T08:39:53.2377459Z .loc 1 40 64 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:64 2026-02-21T08:39:53.2377560Z mul.lo.s32 %r130, %r129, %r126; 2026-02-21T08:39:53.2377625Z sub.s32 %r131, %r128, %r130; 2026-02-21T08:39:53.2377784Z .loc 1 40 30 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:40:30 2026-02-21T08:39:53.2377839Z add.s32 %r132, %r131, %r124; 2026-02-21T08:39:53.2377997Z .loc 1 42 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:42:27 2026-02-21T08:39:53.2378062Z shl.b32 %r1642, %r132, 8; 2026-02-21T08:39:53.2378218Z .loc 1 44 27 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:44:27 2026-02-21T08:39:53.2378272Z shl.b32 %r1643, %r129, 7; 2026-02-21T08:39:53.2378333Z bra.uni $L__BB0_13; 2026-02-21T08:39:53.2378411Z $L__BB0_14: // %._crit_edge 2026-02-21T08:39:53.2378495Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2378664Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 
2026-02-21T08:39:53.2378720Z barrier.sync 1; 2026-02-21T08:39:53.2378794Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:39:53.2378847Z bra.uni $L__BB0_2; 2026-02-21T08:39:53.2378943Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:39:53.2379120Z .loc 1 19 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:19 2026-02-21T08:39:53.2379178Z barrier.sync 1; 2026-02-21T08:39:53.2379240Z barrier.sync 1; 2026-02-21T08:39:53.2379294Z bra.uni $L__BB0_2; 2026-02-21T08:39:53.2379378Z $L__BB0_23: // %._crit_edge11 2026-02-21T08:39:53.2379552Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2379606Z barrier.sync 1; 2026-02-21T08:39:53.2379679Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:39:53.2379842Z .loc 1 56 52 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:56:52 2026-02-21T08:39:53.2379903Z bar.sync 0, 128; 2026-02-21T08:39:53.2379957Z // begin inline asm 2026-02-21T08:39:53.2380006Z 2026-02-21T08:39:53.2380060Z { 2026-02-21T08:39:53.2380117Z .reg .pred complete; 2026-02-21T08:39:53.2380172Z waitLoop: 2026-02-21T08:39:53.2380292Z mbarrier.try_wait.parity.shared.b64 complete, [%r1611], %r1655; 2026-02-21T08:39:53.2380361Z @!complete bra.uni waitLoop; 2026-02-21T08:39:53.2380410Z } 2026-02-21T08:39:53.2380413Z 2026-02-21T08:39:53.2380491Z // end inline asm 2026-02-21T08:39:53.2380660Z .loc 1 31 107 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:107 2026-02-21T08:39:53.2380713Z bar.sync 0, 128; 2026-02-21T08:39:53.2380766Z // begin inline asm 2026-02-21T08:39:53.2380859Z @%p114 mbarrier.inval.shared::cta.b64 [%r1611]; 2026-02-21T08:39:53.2380912Z // end inline asm 2026-02-21T08:39:53.2380965Z // begin inline asm 2026-02-21T08:39:53.2381047Z @%p114 mbarrier.inval.shared::cta.b64 [%r505]; 2026-02-21T08:39:53.2381110Z // end inline asm 2026-02-21T08:39:53.2381163Z // begin inline asm 2026-02-21T08:39:53.2381241Z @%p114 mbarrier.inval.shared::cta.b64 [%r493]; 
2026-02-21T08:39:53.2381301Z // end inline asm 2026-02-21T08:39:53.2381354Z bar.sync 0, 128; 2026-02-21T08:39:53.2381405Z // begin inline asm 2026-02-21T08:39:53.2381487Z @%p114 mbarrier.inval.shared::cta.b64 [%r494]; 2026-02-21T08:39:53.2381539Z // end inline asm 2026-02-21T08:39:53.2381611Z bar.sync 0, 128; 2026-02-21T08:39:53.2381666Z // begin inline asm 2026-02-21T08:39:53.2381750Z @%p114 mbarrier.inval.shared::cta.b64 [%r495]; 2026-02-21T08:39:53.2381802Z // end inline asm 2026-02-21T08:39:53.2381854Z bar.sync 0, 128; 2026-02-21T08:39:53.2381914Z // begin inline asm 2026-02-21T08:39:53.2381989Z @%p114 mbarrier.inval.shared::cta.b64 [%r496]; 2026-02-21T08:39:53.2382041Z // end inline asm 2026-02-21T08:39:53.2382092Z bar.sync 0, 128; 2026-02-21T08:39:53.2382152Z // begin inline asm 2026-02-21T08:39:53.2382247Z @%p114 mbarrier.inval.shared::cta.b64 [%r497]; 2026-02-21T08:39:53.2382298Z // end inline asm 2026-02-21T08:39:53.2382357Z bar.sync 0, 128; 2026-02-21T08:39:53.2382411Z // begin inline asm 2026-02-21T08:39:53.2382484Z @%p114 mbarrier.inval.shared::cta.b64 [%r498]; 2026-02-21T08:39:53.2382535Z // end inline asm 2026-02-21T08:39:53.2382597Z // begin inline asm 2026-02-21T08:39:53.2382672Z @%p114 mbarrier.inval.shared::cta.b64 [%r487]; 2026-02-21T08:39:53.2382724Z // end inline asm 2026-02-21T08:39:53.2382785Z bar.sync 0, 128; 2026-02-21T08:39:53.2382838Z // begin inline asm 2026-02-21T08:39:53.2382912Z @%p114 mbarrier.inval.shared::cta.b64 [%r488]; 2026-02-21T08:39:53.2382970Z // end inline asm 2026-02-21T08:39:53.2383022Z bar.sync 0, 128; 2026-02-21T08:39:53.2383074Z // begin inline asm 2026-02-21T08:39:53.2383149Z @%p114 mbarrier.inval.shared::cta.b64 [%r489]; 2026-02-21T08:39:53.2383208Z // end inline asm 2026-02-21T08:39:53.2383261Z bar.sync 0, 128; 2026-02-21T08:39:53.2383316Z // begin inline asm 2026-02-21T08:39:53.2383397Z @%p114 mbarrier.inval.shared::cta.b64 [%r490]; 2026-02-21T08:39:53.2383449Z // end inline asm 2026-02-21T08:39:53.2383500Z 
bar.sync 0, 128; 2026-02-21T08:39:53.2383554Z // begin inline asm 2026-02-21T08:39:53.2383636Z @%p114 mbarrier.inval.shared::cta.b64 [%r491]; 2026-02-21T08:39:53.2383713Z // end inline asm 2026-02-21T08:39:53.2383766Z bar.sync 0, 128; 2026-02-21T08:39:53.2383827Z // begin inline asm 2026-02-21T08:39:53.2383904Z @%p114 mbarrier.inval.shared::cta.b64 [%r492]; 2026-02-21T08:39:53.2383959Z // end inline asm 2026-02-21T08:39:53.2384121Z .loc 1 31 4 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:31:4 2026-02-21T08:39:53.2384182Z bar.sync 0, 128; 2026-02-21T08:39:53.2384235Z // begin inline asm 2026-02-21T08:39:53.2384349Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1627, 256; 2026-02-21T08:39:53.2384410Z // end inline asm 2026-02-21T08:39:53.2384486Z st.shared.b32 [global_smem+155768], 50529027; 2026-02-21T08:39:53.2384543Z barrier.sync 1; 2026-02-21T08:39:53.2384634Z $L__BB0_24: // %common.ret 2026-02-21T08:39:53.2384829Z .loc 1 0 0 // ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py:0 2026-02-21T08:39:53.2384882Z ret; 2026-02-21T08:39:53.2384937Z $L__tmp1: 2026-02-21T08:39:53.2385013Z $L__func_end0: 2026-02-21T08:39:53.2385096Z // -- End function 2026-02-21T08:39:53.2385173Z } 2026-02-21T08:39:53.2385386Z .file 1 "/tmp/torchinductor_root/tq/ctqhxf2c272wvksabzcvyyhdejdmw5hwlznu7lhf7wqnk6gsvuqt.py" 2026-02-21T08:39:53.2385446Z .section .debug_abbrev 2026-02-21T08:39:53.2385494Z { 2026-02-21T08:39:53.2385589Z .b8 1 // Abbreviation Code 2026-02-21T08:39:53.2385674Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:39:53.2385750Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:39:53.2385830Z .b8 37 // DW_AT_producer 2026-02-21T08:39:53.2385916Z .b8 8 // DW_FORM_string 2026-02-21T08:39:53.2385992Z .b8 19 // DW_AT_language 2026-02-21T08:39:53.2386068Z .b8 5 // DW_FORM_data2 2026-02-21T08:39:53.2386153Z .b8 3 // DW_AT_name 2026-02-21T08:39:53.2386248Z .b8 8 // DW_FORM_string 2026-02-21T08:39:53.2386328Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:39:53.2386409Z .b8 6 
// DW_FORM_data4 2026-02-21T08:39:53.2386480Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:39:53.2386550Z .b8 8 // DW_FORM_string 2026-02-21T08:39:53.2386616Z .b8 0 // EOM(1) 2026-02-21T08:39:53.2386688Z .b8 0 // EOM(2) 2026-02-21T08:39:53.2386779Z .b8 0 // EOM(3) 2026-02-21T08:39:53.2386828Z } 2026-02-21T08:39:53.2386894Z .section .debug_info 2026-02-21T08:39:53.2386942Z { 2026-02-21T08:39:53.2387021Z .b32 104 // Length of Unit 2026-02-21T08:39:53.2387109Z .b8 2 // DWARF version number 2026-02-21T08:39:53.2387160Z .b8 0 2026-02-21T08:39:53.2387274Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:39:53.2387362Z .b8 8 // Address Size (in bytes) 2026-02-21T08:39:53.2387464Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:39:53.2387541Z .b8 116 // DW_AT_producer 2026-02-21T08:39:53.2387593Z .b8 114 2026-02-21T08:39:53.2387651Z .b8 105 2026-02-21T08:39:53.2387700Z .b8 116 2026-02-21T08:39:53.2387749Z .b8 111 2026-02-21T08:39:53.2387798Z .b8 110 2026-02-21T08:39:53.2387854Z .b8 0 2026-02-21T08:39:53.2387925Z .b8 2 // DW_AT_language 2026-02-21T08:39:53.2387974Z .b8 0 2026-02-21T08:39:53.2388053Z .b8 99 // DW_AT_name 2026-02-21T08:39:53.2388103Z .b8 116 2026-02-21T08:39:53.2388151Z .b8 113 2026-02-21T08:39:53.2388199Z .b8 104 2026-02-21T08:39:53.2388255Z .b8 120 2026-02-21T08:39:53.2388303Z .b8 102 2026-02-21T08:39:53.2388377Z .b8 50 2026-02-21T08:39:53.2388434Z .b8 99 2026-02-21T08:39:53.2388483Z .b8 50 2026-02-21T08:39:53.2388532Z .b8 55 2026-02-21T08:39:53.2388581Z .b8 50 2026-02-21T08:39:53.2388636Z .b8 119 2026-02-21T08:39:53.2388685Z .b8 118 2026-02-21T08:39:53.2388735Z .b8 107 2026-02-21T08:39:53.2388790Z .b8 115 2026-02-21T08:39:53.2388838Z .b8 97 2026-02-21T08:39:53.2388886Z .b8 98 2026-02-21T08:39:53.2388933Z .b8 122 2026-02-21T08:39:53.2388986Z .b8 99 2026-02-21T08:39:53.2389033Z .b8 118 2026-02-21T08:39:53.2389081Z .b8 121 2026-02-21T08:39:53.2389128Z .b8 121 2026-02-21T08:39:53.2389183Z .b8 104 2026-02-21T08:39:53.2389232Z .b8 100 
2026-02-21T08:39:53.2389281Z .b8 101 2026-02-21T08:39:53.2389333Z .b8 106 2026-02-21T08:39:53.2389382Z .b8 100 2026-02-21T08:39:53.2389430Z .b8 109 2026-02-21T08:39:53.2389477Z .b8 119 2026-02-21T08:39:53.2389532Z .b8 53 2026-02-21T08:39:53.2389578Z .b8 104 2026-02-21T08:39:53.2389626Z .b8 119 2026-02-21T08:39:53.2389678Z .b8 108 2026-02-21T08:39:53.2389726Z .b8 122 2026-02-21T08:39:53.2389774Z .b8 110 2026-02-21T08:39:53.2389822Z .b8 117 2026-02-21T08:39:53.2389877Z .b8 55 2026-02-21T08:39:53.2389926Z .b8 108 2026-02-21T08:39:53.2390001Z .b8 104 2026-02-21T08:39:53.2390048Z .b8 102 2026-02-21T08:39:53.2390104Z .b8 55 2026-02-21T08:39:53.2390154Z .b8 119 2026-02-21T08:39:53.2390203Z .b8 113 2026-02-21T08:39:53.2390261Z .b8 110 2026-02-21T08:39:53.2390308Z .b8 107 2026-02-21T08:39:53.2390356Z .b8 54 2026-02-21T08:39:53.2390404Z .b8 103 2026-02-21T08:39:53.2390460Z .b8 115 2026-02-21T08:39:53.2390507Z .b8 118 2026-02-21T08:39:53.2390556Z .b8 117 2026-02-21T08:39:53.2390611Z .b8 113 2026-02-21T08:39:53.2390660Z .b8 116 2026-02-21T08:39:53.2390709Z .b8 46 2026-02-21T08:39:53.2390758Z .b8 112 2026-02-21T08:39:53.2390816Z .b8 121 2026-02-21T08:39:53.2390865Z .b8 0 2026-02-21T08:39:53.2390955Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:39:53.2391036Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:39:53.2391086Z .b8 116 2026-02-21T08:39:53.2391137Z .b8 109 2026-02-21T08:39:53.2391184Z .b8 112 2026-02-21T08:39:53.2391240Z .b8 47 2026-02-21T08:39:53.2391312Z .b8 116 2026-02-21T08:39:53.2391363Z .b8 111 2026-02-21T08:39:53.2391419Z .b8 114 2026-02-21T08:39:53.2391467Z .b8 99 2026-02-21T08:39:53.2391514Z .b8 104 2026-02-21T08:39:53.2391560Z .b8 105 2026-02-21T08:39:53.2391615Z .b8 110 2026-02-21T08:39:53.2391663Z .b8 100 2026-02-21T08:39:53.2391710Z .b8 117 2026-02-21T08:39:53.2391757Z .b8 99 2026-02-21T08:39:53.2391812Z .b8 116 2026-02-21T08:39:53.2391859Z .b8 111 2026-02-21T08:39:53.2391907Z .b8 114 2026-02-21T08:39:53.2391961Z .b8 95 2026-02-21T08:39:53.2392009Z .b8 114 
2026-02-21T08:39:53.2392078Z .b8 111 2026-02-21T08:39:53.2392125Z .b8 111 2026-02-21T08:39:53.2392179Z .b8 116 2026-02-21T08:39:53.2392227Z .b8 47 2026-02-21T08:39:53.2392276Z .b8 116 2026-02-21T08:39:53.2392332Z .b8 113 2026-02-21T08:39:53.2392379Z .b8 0 2026-02-21T08:39:53.2392427Z } 2026-02-21T08:39:53.2392489Z .section .debug_macinfo { } 2026-02-21T08:39:53.2392494Z 2026-02-21T08:39:53.2392576Z ================================================================ 2026-02-21T08:39:53.2392678Z please share the reproducer above with Triton project. 2026-02-21T08:39:53.9399582Z 2026-02-21T08:39:53.9404527Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 34/34 22.8 configs/s 2026-02-21T08:39:55.3377109Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 717.5 2026-02-21T08:39:55.3381359Z configs/s 2026-02-21T08:39:55.4331981Z [145s] Generation 9 complete: 2026-02-21T08:39:55.4332230Z error=12 2026-02-21T08:39:55.4336490Z ok=23 2026-02-21T08:39:55.4340400Z min=0.0388 2026-02-21T08:39:55.4347722Z mid=0.0389 2026-02-21T08:39:55.4352486Z max=1.1458 2026-02-21T08:39:55.4352720Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:39:55.4353030Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:39:55.4353288Z 'l2_groupings': [64], 2026-02-21T08:39:55.4358670Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:39:55.4363051Z 'loop_orders': [[0, 1]], 2026-02-21T08:39:55.4368202Z 'maxnreg': 256, 2026-02-21T08:39:55.4371961Z 'num_sm_multiplier': 1, 2026-02-21T08:39:55.4373547Z 'num_stages': 6, 2026-02-21T08:39:55.4373730Z 'num_warps': 2, 2026-02-21T08:39:55.4373902Z 'pid_type': 'persistent_blocked', 2026-02-21T08:39:55.4374098Z 'range_flattens': [True, True], 2026-02-21T08:39:55.4374290Z 'range_multi_buffers': [None, False], 2026-02-21T08:39:55.4374474Z 'range_num_stages': [0, 0], 2026-02-21T08:39:55.4374644Z 'range_unroll_factors': [0, 0], 2026-02-21T08:39:55.4374886Z 'range_warp_specializes': [True, None]} 
2026-02-21T08:39:55.4375187Z [145s] Fitting surrogate: 688 points, 688 targets 2026-02-21T08:39:56.1043503Z [145s] Generation 10 starting: 33 neighbors, 2 active search path(s) 2026-02-21T08:39:58.7961833Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/34 14.1 configs/s 2026-02-21T08:40:00.6331268Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 34/34 19.0 configs/s 2026-02-21T08:40:01.6509662Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 979.9 2026-02-21T08:40:01.6510436Z configs/s 2026-02-21T08:40:01.7302075Z [151s] Generation 10 complete: 2026-02-21T08:40:01.7302318Z error=13 2026-02-21T08:40:01.7302444Z ok=22 2026-02-21T08:40:01.7302573Z min=0.0377 2026-02-21T08:40:01.7302701Z mid=0.0552 2026-02-21T08:40:01.7302815Z max=2.0004 2026-02-21T08:40:01.7302954Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:01.7303196Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:01.7303468Z 'l2_groupings': [64], 2026-02-21T08:40:01.7303628Z 'load_eviction_policies': ['', 'first'], 2026-02-21T08:40:01.7303812Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:01.7303975Z 'maxnreg': 256, 2026-02-21T08:40:01.7304122Z 'num_sm_multiplier': 1, 2026-02-21T08:40:01.7304287Z 'num_stages': 6, 2026-02-21T08:40:01.7304418Z 'num_warps': 2, 2026-02-21T08:40:01.7304933Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:01.7305128Z 'range_flattens': [True, True], 2026-02-21T08:40:01.7305327Z 'range_multi_buffers': [True, False], 2026-02-21T08:40:01.7305507Z 'range_num_stages': [0, 0], 2026-02-21T08:40:01.7305682Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:01.7305857Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:01.7324220Z [151s] Fitting surrogate: 723 points, 723 targets 2026-02-21T08:40:02.3294627Z [151s] Generation 11 starting: 31 neighbors, 2 active search path(s) 2026-02-21T08:40:10.8619744Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/32 1.8 configs/s 
2026-02-21T08:40:12.0628304Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 32/32 28.2 configs/s 2026-02-21T08:40:13.0657979Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 994.1 2026-02-21T08:40:13.0662558Z configs/s 2026-02-21T08:40:13.1411265Z [162s] Generation 11 complete: 2026-02-21T08:40:13.1415198Z error=12 2026-02-21T08:40:13.1420536Z ok=21 2026-02-21T08:40:13.1424222Z min=0.0388 2026-02-21T08:40:13.1425866Z mid=0.0492 2026-02-21T08:40:13.1426093Z max=1.4387 2026-02-21T08:40:13.1429016Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:13.1429306Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:13.1429549Z 'l2_groupings': [64], 2026-02-21T08:40:13.1429724Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:13.1429896Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:13.1430056Z 'maxnreg': 256, 2026-02-21T08:40:13.1430221Z 'num_sm_multiplier': 1, 2026-02-21T08:40:13.1430372Z 'num_stages': 6, 2026-02-21T08:40:13.1430511Z 'num_warps': 2, 2026-02-21T08:40:13.1430659Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:13.1430849Z 'range_flattens': [True, True], 2026-02-21T08:40:13.1431022Z 'range_multi_buffers': [True, False], 2026-02-21T08:40:13.1431209Z 'range_num_stages': [0, 0], 2026-02-21T08:40:13.1431369Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:13.1431557Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:13.1439032Z [162s] Fitting surrogate: 756 points, 756 targets 2026-02-21T08:40:13.8606079Z [163s] Generation 12 starting: 36 neighbors, 2 active search path(s) 2026-02-21T08:40:16.3039135Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 37/37 18.6 configs/s 2026-02-21T08:40:17.3478125Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 37/37 38.2 configs/s 2026-02-21T08:40:18.4879904Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 877.3 2026-02-21T08:40:18.4885166Z configs/s 2026-02-21T08:40:18.5724952Z [168s] Generation 12 complete: 
2026-02-21T08:40:18.5730056Z error=20 2026-02-21T08:40:18.5732176Z ok=18 2026-02-21T08:40:18.5732395Z min=0.0388 2026-02-21T08:40:18.5736511Z mid=0.0389 2026-02-21T08:40:18.5741753Z max=0.1352 2026-02-21T08:40:18.5743823Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:18.5744166Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:18.5744753Z 'l2_groupings': [64], 2026-02-21T08:40:18.5744934Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:18.5745120Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:18.5745290Z 'maxnreg': 256, 2026-02-21T08:40:18.5745439Z 'num_sm_multiplier': 1, 2026-02-21T08:40:18.5745610Z 'num_stages': 6, 2026-02-21T08:40:18.5745756Z 'num_warps': 1, 2026-02-21T08:40:18.5745903Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:18.5746092Z 'range_flattens': [True, True], 2026-02-21T08:40:18.5746371Z 'range_multi_buffers': [None, False], 2026-02-21T08:40:18.5746558Z 'range_num_stages': [0, 0], 2026-02-21T08:40:18.5746722Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:18.5746906Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:18.5752116Z [168s] Fitting surrogate: 794 points, 794 targets 2026-02-21T08:40:19.2066358Z [168s] Generation 13 starting: 30 neighbors, 2 active search path(s) 2026-02-21T08:40:22.5814124Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 31/31 6.4 configs/s 2026-02-21T08:40:23.9804042Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 31/31 23.1 configs/s 2026-02-21T08:40:25.2313756Z Generation 13: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 801.9 2026-02-21T08:40:25.2315162Z configs/s 2026-02-21T08:40:25.3210281Z [174s] Generation 13 complete: 2026-02-21T08:40:25.3210590Z error=8 2026-02-21T08:40:25.3210752Z ok=24 2026-02-21T08:40:25.3210918Z min=0.0369 2026-02-21T08:40:25.3211094Z mid=0.0512 2026-02-21T08:40:25.3211231Z max=2.5406 2026-02-21T08:40:25.3211430Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:25.3211714Z 'indexing': 
['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:25.3211978Z 'l2_groupings': [64], 2026-02-21T08:40:25.3212422Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:25.3212611Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:25.3212786Z 'maxnreg': 256, 2026-02-21T08:40:25.3212946Z 'num_sm_multiplier': 1, 2026-02-21T08:40:25.3213096Z 'num_stages': 6, 2026-02-21T08:40:25.3213238Z 'num_warps': 2, 2026-02-21T08:40:25.3213382Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:25.3213567Z 'range_flattens': [True, True], 2026-02-21T08:40:25.3213739Z 'range_multi_buffers': [None, False], 2026-02-21T08:40:25.3213923Z 'range_num_stages': [0, 0], 2026-02-21T08:40:25.3214084Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:25.3214265Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:25.3235445Z [174s] Fitting surrogate: 826 points, 826 targets 2026-02-21T08:40:25.9264440Z [175s] Generation 14 starting: 29 neighbors, 2 active search path(s) 2026-02-21T08:40:28.2854305Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 30/30 17.1 configs/s 2026-02-21T08:40:29.2487734Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 30/30 31.4 configs/s 2026-02-21T08:40:30.4945103Z Generation 14: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 803.3 2026-02-21T08:40:30.4945760Z configs/s 2026-02-21T08:40:30.5855932Z [180s] Generation 14 complete: 2026-02-21T08:40:30.5860531Z error=14 2026-02-21T08:40:30.5863662Z ok=17 2026-02-21T08:40:30.5867673Z min=0.0370 2026-02-21T08:40:30.5871047Z mid=0.0389 2026-02-21T08:40:30.5874522Z max=0.0696 2026-02-21T08:40:30.5876999Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:30.5877290Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:30.5877568Z 'l2_groupings': [64], 2026-02-21T08:40:30.5883012Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:30.5885073Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:30.5885284Z 'maxnreg': 256, 2026-02-21T08:40:30.5885447Z 
'num_sm_multiplier': 1, 2026-02-21T08:40:30.5885608Z 'num_stages': 6, 2026-02-21T08:40:30.5885772Z 'num_warps': 2, 2026-02-21T08:40:30.5885929Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:30.5886364Z 'range_flattens': [True, True], 2026-02-21T08:40:30.5886563Z 'range_multi_buffers': [None, False], 2026-02-21T08:40:30.5886767Z 'range_num_stages': [0, 0], 2026-02-21T08:40:30.5886940Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:30.5887136Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:30.5887354Z [180s] Fitting surrogate: 857 points, 857 targets 2026-02-21T08:40:31.0057147Z [180s] Generation 15 starting: 14 neighbors, 1 active search path(s) 2026-02-21T08:40:32.6332720Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 16.6 configs/s 2026-02-21T08:40:33.2257276Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 26.5 configs/s 2026-02-21T08:40:33.5624337Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2857.4 2026-02-21T08:40:33.5632038Z configs/s 2026-02-21T08:40:33.6028814Z [183s] Generation 15 complete: 2026-02-21T08:40:33.6029139Z error=6 2026-02-21T08:40:33.6029325Z ok=10 2026-02-21T08:40:33.6029520Z min=0.0369 2026-02-21T08:40:33.6029707Z mid=0.0818 2026-02-21T08:40:33.6029880Z max=4.4523 2026-02-21T08:40:33.6030080Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:33.6030457Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:33.6030842Z 'l2_groupings': [64], 2026-02-21T08:40:33.6031084Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:33.6031361Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:33.6031589Z 'maxnreg': 256, 2026-02-21T08:40:33.6031819Z 'num_sm_multiplier': 1, 2026-02-21T08:40:33.6032042Z 'num_stages': 6, 2026-02-21T08:40:33.6032249Z 'num_warps': 2, 2026-02-21T08:40:33.6032480Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:33.6032757Z 'range_flattens': [True, True], 2026-02-21T08:40:33.6033013Z 'range_multi_buffers': [None, False], 
2026-02-21T08:40:33.6033564Z 'range_num_stages': [0, 0], 2026-02-21T08:40:33.6033809Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:33.6034078Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:33.6060394Z [183s] Fitting surrogate: 873 points, 873 targets 2026-02-21T08:40:33.9375550Z [183s] Generation 16 starting: 9 neighbors, 1 active search path(s) 2026-02-21T08:40:35.6527327Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/10 9.8 configs/s 2026-02-21T08:40:35.8497932Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 10/10 84.2 configs/s 2026-02-21T08:40:35.9980396Z Generation 16: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 6143.2 2026-02-21T08:40:35.9984439Z configs/s 2026-02-21T08:40:36.0260213Z [185s] Generation 16 complete: 2026-02-21T08:40:36.0260439Z error=7 2026-02-21T08:40:36.0260572Z ok=4 2026-02-21T08:40:36.0260692Z min=0.0368 2026-02-21T08:40:36.0260829Z mid=0.0635 2026-02-21T08:40:36.0260949Z max=0.0839 2026-02-21T08:40:36.0261116Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:40:36.0261377Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:40:36.0261944Z 'l2_groupings': [64], 2026-02-21T08:40:36.0262128Z 'load_eviction_policies': ['', ''], 2026-02-21T08:40:36.0262313Z 'loop_orders': [[0, 1]], 2026-02-21T08:40:36.0262484Z 'maxnreg': 256, 2026-02-21T08:40:36.0262629Z 'num_sm_multiplier': 1, 2026-02-21T08:40:36.0262799Z 'num_stages': 6, 2026-02-21T08:40:36.0262940Z 'num_warps': 2, 2026-02-21T08:40:36.0263096Z 'pid_type': 'persistent_blocked', 2026-02-21T08:40:36.0263276Z 'range_flattens': [True, True], 2026-02-21T08:40:36.0263461Z 'range_multi_buffers': [None, False], 2026-02-21T08:40:36.0263642Z 'range_num_stages': [0, 0], 2026-02-21T08:40:36.0263810Z 'range_unroll_factors': [0, 0], 2026-02-21T08:40:36.0263991Z 'range_warp_specializes': [True, None]} 2026-02-21T08:40:36.0290799Z [185s] Fitting surrogate: 884 points, 884 targets 2026-02-21T08:40:36.2793854Z [185s] Autotuning 
complete in 185.9s after searching 858 configs. 2026-02-21T08:40:36.2794511Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:40:36.2797089Z @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=1, num_stages=6, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:40:36.2798388Z 2026-02-21T08:40:36.2798660Z [185s] Code of selected kernel: /tmp/torchinductor_root/xt/cxtg55mxbm4lbnzbkz2bjeyuogblq5tjzuhzmawgje2zibm5mfqr.py 2026-02-21T08:41:02.1140164Z WARNING:tritonbench.utils.triton_op:Completed input ID 8: 2026-02-21T08:41:02.1144398Z (M, N, K) 2026-02-21T08:41:02.1148849Z ------------------- 2026-02-21T08:41:02.1150431Z (12288, 1024, 1024) 2026-02-21T08:41:02.1150631Z 2026-02-21T08:41:02.1156567Z 75%|███████▌ | 6/8 [30:51<10:57, 328.95s/it]WARNING:tritonbench.utils.triton_op:Running input ID 9: 2026-02-21T08:41:02.1157655Z (M, N, K) 2026-02-21T08:41:02.1157825Z ------------------- 2026-02-21T08:41:02.1157977Z (1024, 12288, 1024) 2026-02-21T08:41:02.1158316Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:41:49.8855946Z INFO:tritonbench.utils.triton_op:Took 0.02ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:42:26.5361919Z Autotune Choices Stats: 2026-02-21T08:42:26.5365254Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_131", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.04076800122857094, "best_triton_pos": 0} 2026-02-21T08:42:26.5370784Z AUTOTUNE 
mm(1024x1024, 1024x12288) 2026-02-21T08:42:26.5374988Z strides: [1024, 1], [1, 1024] 2026-02-21T08:42:26.5379796Z dtypes: torch.float16, torch.float16 2026-02-21T08:42:26.5381600Z triton_mm_131 0.0408 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:42:26.5382442Z triton_mm_130 0.0448 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:42:26.5383171Z triton_mm_125 0.0460 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:42:26.5383877Z triton_mm_132 0.0510 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:42:26.5384583Z triton_mm_128 0.0542 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:42:26.5385604Z triton_mm_123 0.0551 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:42:26.5386327Z triton_mm_124 0.0551 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:42:26.5387006Z triton_mm_127 0.0551 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:42:26.5387680Z triton_mm_121 0.0592 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:42:26.5388449Z triton_mm_120 0.0694 ms 58.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2026-02-21T08:42:26.5389027Z SingleProcess AUTOTUNE benchmarking takes 0.3636 seconds and 0.2586 seconds precompiling for 19 choices 2026-02-21T08:42:26.8155467Z INFO:tritonbench.utils.triton_op:Took 1073.42ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:43:05.8283802Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:43:05.8287920Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:43:05.8292399Z 'dtype': 'torch.float16', 2026-02-21T08:43:05.8294334Z 'shape': (1024, 1024), 2026-02-21T08:43:05.8295080Z 'stride': (1024, 1)}, 2026-02-21T08:43:05.8295272Z { 'device': 'cuda:0', 2026-02-21T08:43:05.8295457Z 'dtype': 'torch.float16', 2026-02-21T08:43:05.8295649Z 'shape': (1024, 12288), 2026-02-21T08:43:05.8295822Z 'stride': (1, 1024)}, 2026-02-21T08:43:05.8296000Z None), 2026-02-21T08:43:05.8296140Z 'kwargs': {}} 2026-02-21T08:43:05.8352893Z INFO:tritonbench.utils.triton_op:Took 7.27ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:43:05.9280195Z [0s] Autotune random seed: 2134884919 2026-02-21T08:43:06.0297554Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:43:11.0822002Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 50.7 configs/s 2026-02-21T08:43:21.1416074Z [15s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:43:21.1416399Z 2026-02-21T08:43:21.1422747Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:43:21.1423923Z 2026-02-21T08:43:21.1424117Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:43:21.1424366Z `ptxas` stderr: 2026-02-21T08:43:21.1425133Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 250 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:43:21.1425650Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:43:21.1425817Z 2026-02-21T08:43:21.1426257Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp1eom2vlw.ptx -o /tmp/tmp1eom2vlw.ptx.o 2026-02-21T08:43:21.1426753Z 2026-02-21T08:43:21.1426894Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:43:21.1427168Z ================================================================ 2026-02-21T08:43:21.1427394Z Internal Triton PTX codegen error 2026-02-21T08:43:21.1427680Z `ptxas` stderr: 2026-02-21T08:43:21.1428106Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 250 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:43:21.1428588Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:43:21.1428736Z 2026-02-21T08:43:21.1429136Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp1eom2vlw.ptx -o /tmp/tmp1eom2vlw.ptx.o 2026-02-21T08:43:21.1429594Z 2026-02-21T08:43:21.1429597Z 2026-02-21T08:43:21.1429657Z // 2026-02-21T08:43:21.1429809Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:43:21.1430002Z // 2026-02-21T08:43:21.1430073Z 2026-02-21T08:43:21.1430140Z .version 8.7 2026-02-21T08:43:21.1430281Z .target sm_100a 2026-02-21T08:43:21.1430420Z .address_size 64 2026-02-21T08:43:21.1430499Z 2026-02-21T08:43:21.1430690Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:43:21.1430946Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:43:21.1431147Z // @_helion_matmul 2026-02-21T08:43:21.1431346Z .visible .entry _helion_matmul( 2026-02-21T08:43:21.1431560Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:43:21.1431805Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:43:21.1432054Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:43:21.1432373Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:43:21.1432618Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:43:21.1432812Z ) 2026-02-21T08:43:21.1432936Z .reqntid 128 2026-02-21T08:43:21.1433059Z .maxnreg 32 2026-02-21T08:43:21.1433184Z { 2026-02-21T08:43:21.1433305Z .reg .pred %p<73>; 2026-02-21T08:43:21.1433459Z .reg .b32 %r<546>; 2026-02-21T08:43:21.1433606Z .reg .b64 %rd<189>; 2026-02-21T08:43:21.1433748Z $L__func_begin0: 2026-02-21T08:43:21.1433828Z 2026-02-21T08:43:21.1433893Z // %bb.0: 2026-02-21T08:43:21.1434129Z .loc 1 19 0 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:19 2026-02-21T08:43:21.1434426Z mov.u32 %r1, %tid.x; 2026-02-21T08:43:21.1434596Z 
ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T08:43:21.1434861Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:43:21.1435023Z mov.b32 %r55, global_smem; 2026-02-21T08:43:21.1435185Z // begin inline asm 2026-02-21T08:43:21.1435436Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r55], 128; 2026-02-21T08:43:21.1435682Z // end inline asm 2026-02-21T08:43:21.1435860Z ld.param.b64 %rd28, [_helion_matmul_param_3]; 2026-02-21T08:43:21.1436045Z bar.sync 0; 2026-02-21T08:43:21.1436195Z ld.shared.b32 %r537, [global_smem]; 2026-02-21T08:43:21.1436364Z bar.sync 0; 2026-02-21T08:43:21.1436538Z // begin inline asm 2026-02-21T08:43:21.1436745Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:43:21.1436981Z // end inline asm 2026-02-21T08:43:21.1437237Z .loc 1 21 68 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:21:68 2026-02-21T08:43:21.1437527Z mov.u32 %r64, %ctaid.x; 2026-02-21T08:43:21.1437688Z mov.u32 %r65, %ctaid.y; 2026-02-21T08:43:21.1437835Z mov.u32 %r66, %ctaid.z; 2026-02-21T08:43:21.1437992Z mov.u32 %r67, %nctaid.x; 2026-02-21T08:43:21.1438145Z mov.u32 %r68, %nctaid.y; 2026-02-21T08:43:21.1438312Z mad.lo.s32 %r69, %r66, %r68, %r65; 2026-02-21T08:43:21.1438489Z mad.lo.s32 %r70, %r69, %r67, %r64; 2026-02-21T08:43:21.1438665Z shl.b32 %r71, %r70, 7; 2026-02-21T08:43:21.1438822Z cvt.s64.s32 %rd29, %r71; 2026-02-21T08:43:21.1438978Z add.s64 %rd25, %rd28, %rd29; 2026-02-21T08:43:21.1439146Z shl.b32 %r72, %r1, 2; 2026-02-21T08:43:21.1439292Z add.s32 %r56, %r55, %r72; 2026-02-21T08:43:21.1439447Z mov.b32 %r57, 0; 2026-02-21T08:43:21.1439580Z // begin inline asm 2026-02-21T08:43:21.1439741Z @%p1 st.shared.b32 [ %r56 + 0 ], %r57; 2026-02-21T08:43:21.1439962Z // end inline asm 2026-02-21T08:43:21.1440104Z bar.warp.sync -1; 2026-02-21T08:43:21.1440245Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T08:43:21.1440396Z cvt.u64.u32 %rd10, %r55; 2026-02-21T08:43:21.1440542Z // begin inline asm 2026-02-21T08:43:21.1440780Z @%p4 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T08:43:21.1441055Z // end inline asm 2026-02-21T08:43:21.1441183Z // begin inline asm 2026-02-21T08:43:21.1441399Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:43:21.1441640Z // end inline asm 2026-02-21T08:43:21.1441774Z mov.b32 %r58, 16; 2026-02-21T08:43:21.1441905Z // begin inline asm 2026-02-21T08:43:21.1442136Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r58; 2026-02-21T08:43:21.1442396Z // end inline asm 2026-02-21T08:43:21.1442525Z mov.b32 %r59, 128; 2026-02-21T08:43:21.1442663Z // begin inline asm 2026-02-21T08:43:21.1442916Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r59; 2026-02-21T08:43:21.1443181Z // end inline asm 2026-02-21T08:43:21.1443311Z mov.b32 %r60, 1024; 2026-02-21T08:43:21.1443458Z // begin inline asm 2026-02-21T08:43:21.1443700Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r60; 2026-02-21T08:43:21.1443966Z // end inline asm 2026-02-21T08:43:21.1444102Z mov.b32 %r61, 12288; 2026-02-21T08:43:21.1444239Z // begin inline asm 2026-02-21T08:43:21.1444473Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r61; 2026-02-21T08:43:21.1444801Z // end inline asm 2026-02-21T08:43:21.1444938Z mov.b64 %rd18, 2048; 2026-02-21T08:43:21.1445075Z // begin inline asm 2026-02-21T08:43:21.1445322Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T08:43:21.1445603Z // end inline asm 2026-02-21T08:43:21.1445730Z mov.b32 %r62, 1; 2026-02-21T08:43:21.1445866Z // begin inline asm 2026-02-21T08:43:21.1446114Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r62; 2026-02-21T08:43:21.1446398Z // end inline asm 2026-02-21T08:43:21.1446526Z // begin inline asm 2026-02-21T08:43:21.1446779Z @%p4 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r62; 2026-02-21T08:43:21.1447058Z // end inline asm 2026-02-21T08:43:21.1447188Z // begin inline asm 2026-02-21T08:43:21.1447414Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T08:43:21.1447669Z // end inline asm 2026-02-21T08:43:21.1447804Z // begin inline asm 2026-02-21T08:43:21.1448044Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:43:21.1448321Z // end inline asm 2026-02-21T08:43:21.1448449Z // begin inline asm 2026-02-21T08:43:21.1448708Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:43:21.1448975Z // end inline asm 2026-02-21T08:43:21.1449109Z // begin inline asm 2026-02-21T08:43:21.1449479Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:43:21.1449729Z // end inline asm 2026-02-21T08:43:21.1449871Z // begin inline asm 2026-02-21T08:43:21.1450208Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T08:43:21.1450578Z // end inline asm 2026-02-21T08:43:21.1450721Z // begin inline asm 2026-02-21T08:43:21.1450927Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T08:43:21.1451186Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:43:21.1451375Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:43:21.1451561Z // end inline asm 2026-02-21T08:43:21.1451693Z bar.sync 0; 2026-02-21T08:43:21.1451841Z cvta.global.u64 %rd42, %rd25; 2026-02-21T08:43:21.1452119Z .loc 1 28 35 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:28:35 2026-02-21T08:43:21.1452458Z mul.lo.s32 %r538, %r64, 6; 2026-02-21T08:43:21.1452727Z .loc 1 29 37 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:29:37 2026-02-21T08:43:21.1453013Z add.s32 %r73, %r538, 6; 2026-02-21T08:43:21.1453274Z .loc 1 
29 49 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:29:49 2026-02-21T08:43:21.1453556Z min.s32 %r4, %r73, 1536; 2026-02-21T08:43:21.1453817Z .loc 1 30 74 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:30:74 2026-02-21T08:43:21.1454122Z setp.ge.s32 %p21, %r538, %r4; 2026-02-21T08:43:21.1454285Z @%p21 bra $L__BB0_9; 2026-02-21T08:43:21.1454452Z // %bb.1: // %.lr.ph 2026-02-21T08:43:21.1454776Z .loc 1 0 74 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:0:74 2026-02-21T08:43:21.1455095Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T08:43:21.1455339Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T08:43:21.1455533Z shr.u32 %r5, %r1, 5; 2026-02-21T08:43:21.1455673Z and.b32 %r6, %r1, 15; 2026-02-21T08:43:21.1455820Z shl.b32 %r7, %r6, 3; 2026-02-21T08:43:21.1455966Z bfe.u32 %r8, %r1, 1, 6; 2026-02-21T08:43:21.1456113Z shr.u32 %r74, %r1, 4; 2026-02-21T08:43:21.1465847Z bfe.u32 %r9, %r1, 4, 3; 2026-02-21T08:43:21.1466095Z or.b32 %r10, %r9, 8; 2026-02-21T08:43:21.1466261Z or.b32 %r11, %r9, 16; 2026-02-21T08:43:21.1466427Z or.b32 %r12, %r9, 24; 2026-02-21T08:43:21.1466693Z or.b32 %r13, %r9, 32; 2026-02-21T08:43:21.1466838Z or.b32 %r14, %r9, 40; 2026-02-21T08:43:21.1466992Z or.b32 %r15, %r9, 48; 2026-02-21T08:43:21.1467139Z or.b32 %r16, %r74, 56; 2026-02-21T08:43:21.1467306Z shl.b32 %r75, %r1, 3; 2026-02-21T08:43:21.1467454Z and.b32 %r17, %r75, 8; 2026-02-21T08:43:21.1467614Z shl.b32 %r76, %r1, 4; 2026-02-21T08:43:21.1467773Z and.b32 %r77, %r76, 1904; 2026-02-21T08:43:21.1467954Z bfe.s32 %r78, %r1, 3, 1; 2026-02-21T08:43:21.1468118Z and.b32 %r79, %r78, 144; 2026-02-21T08:43:21.1468290Z xor.b32 %r80, %r79, %r77; 2026-02-21T08:43:21.1468463Z add.s32 %r82, %r55, 16384; 2026-02-21T08:43:21.1468625Z add.s32 %r180, %r82, %r80; 2026-02-21T08:43:21.1468790Z add.s32 %r83, %r55, %r80; 2026-02-21T08:43:21.1468946Z add.s32 %r187, %r83, 18432; 2026-02-21T08:43:21.1469115Z add.s32 %r194, %r83, 20480; 2026-02-21T08:43:21.1469272Z bfe.u32 
%r84, %r82, 4, 14; 2026-02-21T08:43:21.1469438Z cvt.u64.u32 %rd30, %r84; 2026-02-21T08:43:21.1469615Z or.b64 %rd38, %rd30, -4611685949699522560; 2026-02-21T08:43:21.1469813Z bfe.u32 %r85, %r55, 4, 14; 2026-02-21T08:43:21.1469968Z cvt.u64.u32 %rd31, %r85; 2026-02-21T08:43:21.1470142Z or.b64 %rd39, %rd31, -4611685949691133952; 2026-02-21T08:43:21.1470332Z add.s32 %r229, %r83, 22528; 2026-02-21T08:43:21.1470490Z shl.b32 %r86, %r1, 9; 2026-02-21T08:43:21.1470719Z and.b32 %r87, %r86, 3072; 2026-02-21T08:43:21.1470894Z shl.b32 %r88, %r6, 4; 2026-02-21T08:43:21.1471055Z and.b32 %r89, %r1, 96; 2026-02-21T08:43:21.1471207Z shl.b32 %r90, %r89, 3; 2026-02-21T08:43:21.1471380Z and.b32 %r92, %r72, 64; 2026-02-21T08:43:21.1471530Z or.b32 %r93, %r88, %r90; 2026-02-21T08:43:21.1471684Z xor.b32 %r94, %r93, %r92; 2026-02-21T08:43:21.1471832Z or.b32 %r95, %r94, %r87; 2026-02-21T08:43:21.1471993Z add.s32 %r22, %r55, %r95; 2026-02-21T08:43:21.1472154Z xor.b32 %r96, %r95, 32; 2026-02-21T08:43:21.1472299Z add.s32 %r23, %r55, %r96; 2026-02-21T08:43:21.1472462Z shl.b32 %r97, %r1, 5; 2026-02-21T08:43:21.1472608Z and.b32 %r98, %r97, 3168; 2026-02-21T08:43:21.1472765Z and.b32 %r99, %r76, 384; 2026-02-21T08:43:21.1472912Z and.b32 %r100, %r72, 16; 2026-02-21T08:43:21.1473065Z or.b32 %r101, %r98, %r99; 2026-02-21T08:43:21.1473211Z xor.b32 %r102, %r101, %r89; 2026-02-21T08:43:21.1473372Z add.s32 %r103, %r55, %r100; 2026-02-21T08:43:21.1473531Z add.s32 %r364, %r103, %r102; 2026-02-21T08:43:21.1473697Z add.s32 %r369, %r364, 512; 2026-02-21T08:43:21.1473988Z .loc 1 30 74 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:30:74 2026-02-21T08:43:21.1474334Z shl.b32 %r104, %r8, 10; 2026-02-21T08:43:21.1474607Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1474930Z or.b32 %r105, %r104, %r17; 2026-02-21T08:43:21.1475091Z or.b32 %r26, %r105, 64; 2026-02-21T08:43:21.1475239Z bra.uni $L__BB0_2; 2026-02-21T08:43:21.1475440Z $L__BB0_8: // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T08:43:21.1475778Z .loc 1 0 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:0:90 2026-02-21T08:43:21.1476056Z mov.b32 %r285, 1; 2026-02-21T08:43:21.1476315Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1476598Z // begin inline asm 2026-02-21T08:43:21.1476749Z 2026-02-21T08:43:21.1476865Z { 2026-02-21T08:43:21.1477002Z .reg .pred complete; 2026-02-21T08:43:21.1477188Z waitLoop: 2026-02-21T08:43:21.1477398Z mbarrier.try_wait.parity.shared.b64 complete, [%r284], %r285; 2026-02-21T08:43:21.1477647Z @!complete bra.uni waitLoop; 2026-02-21T08:43:21.1477801Z } 2026-02-21T08:43:21.1477869Z 2026-02-21T08:43:21.1477936Z // end inline asm 2026-02-21T08:43:21.1478187Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1478492Z cp.async.wait_group 0; 2026-02-21T08:43:21.1478641Z bar.sync 0; 2026-02-21T08:43:21.1478826Z // begin inline asm 2026-02-21T08:43:21.1478996Z @%p4 mbarrier.inval.shared::cta.b64 [%r176]; 2026-02-21T08:43:21.1479188Z // end inline asm 2026-02-21T08:43:21.1479324Z bar.sync 0; 2026-02-21T08:43:21.1479450Z // begin inline asm 2026-02-21T08:43:21.1479612Z @%p4 mbarrier.inval.shared::cta.b64 [%r177]; 2026-02-21T08:43:21.1479792Z // end inline asm 2026-02-21T08:43:21.1479930Z bar.sync 0; 2026-02-21T08:43:21.1480056Z // begin inline asm 2026-02-21T08:43:21.1480221Z @%p4 mbarrier.inval.shared::cta.b64 [%r178]; 2026-02-21T08:43:21.1480399Z // end inline asm 2026-02-21T08:43:21.1480536Z bar.sync 0; 2026-02-21T08:43:21.1480660Z // begin inline asm 2026-02-21T08:43:21.1480821Z @%p4 mbarrier.inval.shared::cta.b64 [%r231]; 2026-02-21T08:43:21.1481007Z // end inline asm 2026-02-21T08:43:21.1481141Z add.s32 %r290, %r55, 24608; 2026-02-21T08:43:21.1481301Z // begin inline asm 2026-02-21T08:43:21.1481454Z @%p4 mbarrier.inval.shared::cta.b64 [%r290]; 2026-02-21T08:43:21.1481641Z // end inline asm 2026-02-21T08:43:21.1481771Z 
bar.sync 0; 2026-02-21T08:43:21.1481909Z // begin inline asm 2026-02-21T08:43:21.1482063Z @%p4 mbarrier.inval.shared::cta.b64 [%r175]; 2026-02-21T08:43:21.1482251Z // end inline asm 2026-02-21T08:43:21.1482490Z .loc 1 59 53 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:59:53 2026-02-21T08:43:21.1482834Z mad.lo.s32 %r433, %r31, 12288, %r30; 2026-02-21T08:43:21.1483024Z mad.lo.s32 %r434, %r32, 12288, %r30; 2026-02-21T08:43:21.1483195Z mad.lo.s32 %r435, %r33, 12288, %r30; 2026-02-21T08:43:21.1483374Z mad.lo.s32 %r436, %r34, 12288, %r30; 2026-02-21T08:43:21.1483540Z mad.lo.s32 %r437, %r35, 12288, %r30; 2026-02-21T08:43:21.1483714Z mad.lo.s32 %r438, %r36, 12288, %r30; 2026-02-21T08:43:21.1483877Z mad.lo.s32 %r439, %r37, 12288, %r30; 2026-02-21T08:43:21.1484049Z mad.lo.s32 %r440, %r38, 12288, %r30; 2026-02-21T08:43:21.1484327Z .loc 1 59 24 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:59:24 2026-02-21T08:43:21.1484621Z mad.wide.s32 %rd52, %r433, 2, %rd9; 2026-02-21T08:43:21.1484835Z mad.wide.s32 %rd53, %r434, 2, %rd9; 2026-02-21T08:43:21.1485004Z mad.wide.s32 %rd54, %r435, 2, %rd9; 2026-02-21T08:43:21.1485180Z mad.wide.s32 %rd55, %r436, 2, %rd9; 2026-02-21T08:43:21.1485346Z mad.wide.s32 %rd56, %r437, 2, %rd9; 2026-02-21T08:43:21.1485527Z mad.wide.s32 %rd57, %r438, 2, %rd9; 2026-02-21T08:43:21.1485696Z mad.wide.s32 %rd58, %r439, 2, %rd9; 2026-02-21T08:43:21.1485874Z mad.wide.s32 %rd59, %r440, 2, %rd9; 2026-02-21T08:43:21.1489221Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1489539Z // begin inline asm 2026-02-21T08:43:21.1489934Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302, %r303, %r304, %r305, %r306, %r307}, [%r359 + 0], 64; 2026-02-21T08:43:21.1490329Z // end inline asm 2026-02-21T08:43:21.1490486Z // begin inline asm 2026-02-21T08:43:21.1490852Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r309, %r310, %r311, %r312, 
%r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324}, [%r359 + 16], 64; 2026-02-21T08:43:21.1491259Z // end inline asm 2026-02-21T08:43:21.1491403Z // begin inline asm 2026-02-21T08:43:21.1491860Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341}, [%r359 + 32], 64; 2026-02-21T08:43:21.1492266Z // end inline asm 2026-02-21T08:43:21.1492407Z // begin inline asm 2026-02-21T08:43:21.1492771Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358}, [%r359 + 48], 64; 2026-02-21T08:43:21.1493163Z // end inline asm 2026-02-21T08:43:21.1493300Z // begin inline asm 2026-02-21T08:43:21.1493475Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:43:21.1493644Z // end inline asm 2026-02-21T08:43:21.1493844Z cvt.u64.u32 %rd60, %r292; 2026-02-21T08:43:21.1494003Z cvt.u64.u32 %rd61, %r293; 2026-02-21T08:43:21.1494163Z shl.b64 %rd62, %rd61, 32; 2026-02-21T08:43:21.1494316Z or.b64 %rd63, %rd60, %rd62; 2026-02-21T08:43:21.1494604Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1494976Z mov.b64 {%r441, %r442}, %rd63; 2026-02-21T08:43:21.1495153Z cvt.rn.f16x2.f32 %r443, %r442, %r441; 2026-02-21T08:43:21.1495449Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1495738Z cvt.u64.u32 %rd64, %r294; 2026-02-21T08:43:21.1495900Z cvt.u64.u32 %rd65, %r295; 2026-02-21T08:43:21.1496050Z shl.b64 %rd66, %rd65, 32; 2026-02-21T08:43:21.1496210Z or.b64 %rd67, %rd64, %rd66; 2026-02-21T08:43:21.1496477Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1496773Z mov.b64 {%r444, %r445}, %rd67; 2026-02-21T08:43:21.1496957Z cvt.rn.f16x2.f32 %r446, %r445, %r444; 2026-02-21T08:43:21.1497239Z .loc 1 56 52 // 
cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1497531Z cvt.u64.u32 %rd68, %r296; 2026-02-21T08:43:21.1497680Z cvt.u64.u32 %rd69, %r297; 2026-02-21T08:43:21.1497872Z shl.b64 %rd70, %rd69, 32; 2026-02-21T08:43:21.1498022Z or.b64 %rd71, %rd68, %rd70; 2026-02-21T08:43:21.1498297Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1498594Z mov.b64 {%r447, %r448}, %rd71; 2026-02-21T08:43:21.1498759Z cvt.rn.f16x2.f32 %r449, %r448, %r447; 2026-02-21T08:43:21.1499048Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1499335Z cvt.u64.u32 %rd72, %r298; 2026-02-21T08:43:21.1499493Z cvt.u64.u32 %rd73, %r299; 2026-02-21T08:43:21.1499639Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:43:21.1499800Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:43:21.1500064Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1500361Z mov.b64 {%r450, %r451}, %rd75; 2026-02-21T08:43:21.1500534Z cvt.rn.f16x2.f32 %r452, %r451, %r450; 2026-02-21T08:43:21.1500815Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1501106Z cvt.u64.u32 %rd76, %r300; 2026-02-21T08:43:21.1501290Z cvt.u64.u32 %rd77, %r301; 2026-02-21T08:43:21.1501451Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:43:21.1501600Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:43:21.1501862Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1502155Z mov.b64 {%r453, %r454}, %rd79; 2026-02-21T08:43:21.1502313Z cvt.rn.f16x2.f32 %r455, %r454, %r453; 2026-02-21T08:43:21.1502603Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1502887Z cvt.u64.u32 %rd80, %r302; 2026-02-21T08:43:21.1503047Z cvt.u64.u32 %rd81, %r303; 2026-02-21T08:43:21.1503209Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:43:21.1503359Z or.b64 %rd83, %rd80, %rd82; 
2026-02-21T08:43:21.1503625Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1503897Z mov.b64 {%r456, %r457}, %rd83; 2026-02-21T08:43:21.1504095Z cvt.rn.f16x2.f32 %r458, %r457, %r456; 2026-02-21T08:43:21.1504362Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1504639Z cvt.u64.u32 %rd84, %r304; 2026-02-21T08:43:21.1504834Z cvt.u64.u32 %rd85, %r305; 2026-02-21T08:43:21.1504979Z shl.b64 %rd86, %rd85, 32; 2026-02-21T08:43:21.1505134Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:43:21.1505389Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1505740Z mov.b64 {%r459, %r460}, %rd87; 2026-02-21T08:43:21.1505904Z cvt.rn.f16x2.f32 %r461, %r460, %r459; 2026-02-21T08:43:21.1506187Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1506469Z cvt.u64.u32 %rd88, %r306; 2026-02-21T08:43:21.1506628Z cvt.u64.u32 %rd89, %r307; 2026-02-21T08:43:21.1506785Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:43:21.1506936Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:43:21.1507210Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1507502Z mov.b64 {%r462, %r463}, %rd91; 2026-02-21T08:43:21.1507675Z cvt.rn.f16x2.f32 %r464, %r463, %r462; 2026-02-21T08:43:21.1507953Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1508248Z cvt.u64.u32 %rd92, %r309; 2026-02-21T08:43:21.1508408Z cvt.u64.u32 %rd93, %r310; 2026-02-21T08:43:21.1508566Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:43:21.1508731Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:43:21.1509004Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1509307Z mov.b64 {%r465, %r466}, %rd95; 2026-02-21T08:43:21.1509478Z cvt.rn.f16x2.f32 %r467, %r466, %r465; 2026-02-21T08:43:21.1509802Z .loc 1 56 52 // 
cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1510113Z cvt.u64.u32 %rd96, %r311; 2026-02-21T08:43:21.1510272Z cvt.u64.u32 %rd97, %r312; 2026-02-21T08:43:21.1510423Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:43:21.1510581Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:43:21.1510847Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1511151Z mov.b64 {%r468, %r469}, %rd99; 2026-02-21T08:43:21.1511325Z cvt.rn.f16x2.f32 %r470, %r469, %r468; 2026-02-21T08:43:21.1511608Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1511916Z cvt.u64.u32 %rd100, %r313; 2026-02-21T08:43:21.1512077Z cvt.u64.u32 %rd101, %r314; 2026-02-21T08:43:21.1512240Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:43:21.1512400Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:43:21.1512678Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1512987Z mov.b64 {%r471, %r472}, %rd103; 2026-02-21T08:43:21.1513189Z cvt.rn.f16x2.f32 %r473, %r472, %r471; 2026-02-21T08:43:21.1513477Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1513767Z cvt.u64.u32 %rd104, %r315; 2026-02-21T08:43:21.1513930Z cvt.u64.u32 %rd105, %r316; 2026-02-21T08:43:21.1514086Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:43:21.1514253Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:43:21.1514519Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1514846Z mov.b64 {%r474, %r475}, %rd107; 2026-02-21T08:43:21.1515024Z cvt.rn.f16x2.f32 %r476, %r475, %r474; 2026-02-21T08:43:21.1515316Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1515612Z cvt.u64.u32 %rd108, %r317; 2026-02-21T08:43:21.1515768Z cvt.u64.u32 %rd109, %r318; 2026-02-21T08:43:21.1515957Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:43:21.1516119Z or.b64 %rd111, %rd108, %rd110; 
2026-02-21T08:43:21.1516395Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1516744Z mov.b64 {%r477, %r478}, %rd111; 2026-02-21T08:43:21.1516908Z cvt.rn.f16x2.f32 %r479, %r478, %r477; 2026-02-21T08:43:21.1517183Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1517460Z cvt.u64.u32 %rd112, %r319; 2026-02-21T08:43:21.1517644Z cvt.u64.u32 %rd113, %r320; 2026-02-21T08:43:21.1517791Z shl.b64 %rd114, %rd113, 32; 2026-02-21T08:43:21.1517950Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:43:21.1518204Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1518494Z mov.b64 {%r480, %r481}, %rd115; 2026-02-21T08:43:21.1518661Z cvt.rn.f16x2.f32 %r482, %r481, %r480; 2026-02-21T08:43:21.1518928Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1519203Z cvt.u64.u32 %rd116, %r321; 2026-02-21T08:43:21.1519349Z cvt.u64.u32 %rd117, %r322; 2026-02-21T08:43:21.1519495Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:43:21.1519634Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:43:21.1519890Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1520173Z mov.b64 {%r483, %r484}, %rd119; 2026-02-21T08:43:21.1520334Z cvt.rn.f16x2.f32 %r485, %r484, %r483; 2026-02-21T08:43:21.1520608Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1520883Z cvt.u64.u32 %rd120, %r323; 2026-02-21T08:43:21.1521044Z cvt.u64.u32 %rd121, %r324; 2026-02-21T08:43:21.1521192Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:43:21.1521368Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:43:21.1521625Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1521915Z mov.b64 {%r486, %r487}, %rd123; 2026-02-21T08:43:21.1522083Z cvt.rn.f16x2.f32 %r488, %r487, %r486; 2026-02-21T08:43:21.1522346Z .loc 1 56 52 // 
cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1522629Z cvt.u64.u32 %rd124, %r326; 2026-02-21T08:43:21.1522774Z cvt.u64.u32 %rd125, %r327; 2026-02-21T08:43:21.1522918Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:43:21.1523061Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:43:21.1523313Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1523585Z mov.b64 {%r489, %r490}, %rd127; 2026-02-21T08:43:21.1523736Z cvt.rn.f16x2.f32 %r491, %r490, %r489; 2026-02-21T08:43:21.1523999Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1524278Z cvt.u64.u32 %rd128, %r328; 2026-02-21T08:43:21.1524456Z cvt.u64.u32 %rd129, %r329; 2026-02-21T08:43:21.1524603Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:43:21.1524779Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:43:21.1525039Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1525334Z mov.b64 {%r492, %r493}, %rd131; 2026-02-21T08:43:21.1525503Z cvt.rn.f16x2.f32 %r494, %r493, %r492; 2026-02-21T08:43:21.1525775Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1526062Z cvt.u64.u32 %rd132, %r330; 2026-02-21T08:43:21.1526209Z cvt.u64.u32 %rd133, %r331; 2026-02-21T08:43:21.1526361Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:43:21.1526511Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:43:21.1526780Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1527071Z mov.b64 {%r495, %r496}, %rd135; 2026-02-21T08:43:21.1527269Z cvt.rn.f16x2.f32 %r497, %r496, %r495; 2026-02-21T08:43:21.1527538Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1527815Z cvt.u64.u32 %rd136, %r332; 2026-02-21T08:43:21.1527966Z cvt.u64.u32 %rd137, %r333; 2026-02-21T08:43:21.1528111Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:43:21.1528265Z or.b64 %rd139, %rd136, 
%rd138; 2026-02-21T08:43:21.1528516Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1528829Z mov.b64 {%r498, %r499}, %rd139; 2026-02-21T08:43:21.1528999Z cvt.rn.f16x2.f32 %r500, %r499, %r498; 2026-02-21T08:43:21.1529267Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1529551Z cvt.u64.u32 %rd140, %r334; 2026-02-21T08:43:21.1529698Z cvt.u64.u32 %rd141, %r335; 2026-02-21T08:43:21.1529851Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:43:21.1530001Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:43:21.1530266Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1530555Z mov.b64 {%r501, %r502}, %rd143; 2026-02-21T08:43:21.1530715Z cvt.rn.f16x2.f32 %r503, %r502, %r501; 2026-02-21T08:43:21.1530989Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1531270Z cvt.u64.u32 %rd144, %r336; 2026-02-21T08:43:21.1531427Z cvt.u64.u32 %rd145, %r337; 2026-02-21T08:43:21.1531579Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:43:21.1531740Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:43:21.1531998Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1532280Z mov.b64 {%r504, %r505}, %rd147; 2026-02-21T08:43:21.1532476Z cvt.rn.f16x2.f32 %r506, %r505, %r504; 2026-02-21T08:43:21.1532754Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1533030Z cvt.u64.u32 %rd148, %r338; 2026-02-21T08:43:21.1533177Z cvt.u64.u32 %rd149, %r339; 2026-02-21T08:43:21.1533332Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:43:21.1533480Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:43:21.1533740Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1534028Z mov.b64 {%r507, %r508}, %rd151; 2026-02-21T08:43:21.1534187Z cvt.rn.f16x2.f32 %r509, %r508, %r507; 2026-02-21T08:43:21.1534460Z .loc 1 56 52 
// cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1534770Z cvt.u64.u32 %rd152, %r340; 2026-02-21T08:43:21.1534923Z cvt.u64.u32 %rd153, %r341; 2026-02-21T08:43:21.1535070Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:43:21.1535229Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:43:21.1535493Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1535811Z mov.b64 {%r510, %r511}, %rd155; 2026-02-21T08:43:21.1535979Z cvt.rn.f16x2.f32 %r512, %r511, %r510; 2026-02-21T08:43:21.1536243Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1536528Z cvt.u64.u32 %rd156, %r343; 2026-02-21T08:43:21.1536677Z cvt.u64.u32 %rd157, %r344; 2026-02-21T08:43:21.1536832Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:43:21.1536980Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:43:21.1537239Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1537516Z mov.b64 {%r513, %r514}, %rd159; 2026-02-21T08:43:21.1537674Z cvt.rn.f16x2.f32 %r515, %r514, %r513; 2026-02-21T08:43:21.1537946Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1538220Z cvt.u64.u32 %rd160, %r345; 2026-02-21T08:43:21.1538399Z cvt.u64.u32 %rd161, %r346; 2026-02-21T08:43:21.1538549Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:43:21.1538707Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:43:21.1538961Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1539243Z mov.b64 {%r516, %r517}, %rd163; 2026-02-21T08:43:21.1539408Z cvt.rn.f16x2.f32 %r518, %r517, %r516; 2026-02-21T08:43:21.1539669Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1540000Z cvt.u64.u32 %rd164, %r347; 2026-02-21T08:43:21.1540146Z cvt.u64.u32 %rd165, %r348; 2026-02-21T08:43:21.1540298Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:43:21.1540447Z or.b64 %rd167, %rd164, 
%rd166; 2026-02-21T08:43:21.1540712Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1541002Z mov.b64 {%r519, %r520}, %rd167; 2026-02-21T08:43:21.1541161Z cvt.rn.f16x2.f32 %r521, %r520, %r519; 2026-02-21T08:43:21.1541439Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1541724Z cvt.u64.u32 %rd168, %r349; 2026-02-21T08:43:21.1541880Z cvt.u64.u32 %rd169, %r350; 2026-02-21T08:43:21.1542026Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:43:21.1542188Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:43:21.1542447Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1542744Z mov.b64 {%r522, %r523}, %rd171; 2026-02-21T08:43:21.1542912Z cvt.rn.f16x2.f32 %r524, %r523, %r522; 2026-02-21T08:43:21.1543182Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1543469Z cvt.u64.u32 %rd172, %r351; 2026-02-21T08:43:21.1543615Z cvt.u64.u32 %rd173, %r352; 2026-02-21T08:43:21.1543792Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:43:21.1543943Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:43:21.1544206Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1544490Z mov.b64 {%r525, %r526}, %rd175; 2026-02-21T08:43:21.1544646Z cvt.rn.f16x2.f32 %r527, %r526, %r525; 2026-02-21T08:43:21.1544952Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1545229Z cvt.u64.u32 %rd176, %r353; 2026-02-21T08:43:21.1545380Z cvt.u64.u32 %rd177, %r354; 2026-02-21T08:43:21.1545526Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:43:21.1545685Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:43:21.1545941Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1546233Z mov.b64 {%r528, %r529}, %rd179; 2026-02-21T08:43:21.1546402Z cvt.rn.f16x2.f32 %r530, %r529, %r528; 2026-02-21T08:43:21.1546675Z .loc 1 56 52 
// cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1546966Z cvt.u64.u32 %rd180, %r355; 2026-02-21T08:43:21.1547144Z cvt.u64.u32 %rd181, %r356; 2026-02-21T08:43:21.1547296Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:43:21.1547443Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:43:21.1547702Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1547983Z mov.b64 {%r531, %r532}, %rd183; 2026-02-21T08:43:21.1548140Z cvt.rn.f16x2.f32 %r533, %r532, %r531; 2026-02-21T08:43:21.1548411Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1548686Z cvt.u64.u32 %rd184, %r357; 2026-02-21T08:43:21.1548838Z cvt.u64.u32 %rd185, %r358; 2026-02-21T08:43:21.1548983Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:43:21.1549136Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:43:21.1549395Z .loc 1 58 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:58:27 2026-02-21T08:43:21.1549715Z mov.b64 {%r534, %r535}, %rd187; 2026-02-21T08:43:21.1549885Z cvt.rn.f16x2.f32 %r536, %r535, %r534; 2026-02-21T08:43:21.1550149Z .loc 1 59 83 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:59:83 2026-02-21T08:43:21.1550471Z st.shared.v4.b32 [%r22], {%r443, %r455, %r467, %r479}; 2026-02-21T08:43:21.1550698Z st.shared.v4.b32 [%r23], {%r491, %r503, %r515, %r527}; 2026-02-21T08:43:21.1550898Z bar.sync 0; 2026-02-21T08:43:21.1551026Z // begin inline asm 2026-02-21T08:43:21.1551262Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r360, %r361, %r362, %r363}, [%r364]; 2026-02-21T08:43:21.1551567Z // end inline asm 2026-02-21T08:43:21.1551700Z // begin inline asm 2026-02-21T08:43:21.1551942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r365, %r366, %r367, %r368}, [%r369]; 2026-02-21T08:43:21.1552207Z // end inline asm 2026-02-21T08:43:21.1552348Z bar.sync 0; 2026-02-21T08:43:21.1552514Z st.shared.v4.b32 [%r22], {%r446, %r458, %r470, %r482}; 2026-02-21T08:43:21.1552757Z st.shared.v4.b32 
[%r23], {%r494, %r506, %r518, %r530}; 2026-02-21T08:43:21.1552954Z bar.sync 0; 2026-02-21T08:43:21.1553093Z // begin inline asm 2026-02-21T08:43:21.1553332Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r370, %r371, %r372, %r373}, [%r364]; 2026-02-21T08:43:21.1553607Z // end inline asm 2026-02-21T08:43:21.1553753Z // begin inline asm 2026-02-21T08:43:21.1553981Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r375, %r376, %r377, %r378}, [%r369]; 2026-02-21T08:43:21.1554248Z // end inline asm 2026-02-21T08:43:21.1554382Z bar.sync 0; 2026-02-21T08:43:21.1554554Z st.shared.v4.b32 [%r22], {%r449, %r461, %r473, %r485}; 2026-02-21T08:43:21.1554814Z st.shared.v4.b32 [%r23], {%r497, %r509, %r521, %r533}; 2026-02-21T08:43:21.1555014Z bar.sync 0; 2026-02-21T08:43:21.1555153Z // begin inline asm 2026-02-21T08:43:21.1555407Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r380, %r381, %r382, %r383}, [%r364]; 2026-02-21T08:43:21.1555680Z // end inline asm 2026-02-21T08:43:21.1555816Z // begin inline asm 2026-02-21T08:43:21.1556046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r386, %r387, %r388}, [%r369]; 2026-02-21T08:43:21.1556303Z // end inline asm 2026-02-21T08:43:21.1556442Z bar.sync 0; 2026-02-21T08:43:21.1556603Z st.shared.v4.b32 [%r22], {%r452, %r464, %r476, %r488}; 2026-02-21T08:43:21.1556837Z st.shared.v4.b32 [%r23], {%r500, %r512, %r524, %r536}; 2026-02-21T08:43:21.1557032Z bar.sync 0; 2026-02-21T08:43:21.1557161Z // begin inline asm 2026-02-21T08:43:21.1557396Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r390, %r391, %r392, %r393}, [%r364]; 2026-02-21T08:43:21.1557656Z // end inline asm 2026-02-21T08:43:21.1557798Z // begin inline asm 2026-02-21T08:43:21.1558021Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r395, %r396, %r397, %r398}, [%r369]; 2026-02-21T08:43:21.1558283Z // end inline asm 2026-02-21T08:43:21.1558416Z // begin inline asm 2026-02-21T08:43:21.1558610Z st.global.v4.b32 [ %rd52 + 0 ], { %r360, %r370, %r380, %r390 }; 2026-02-21T08:43:21.1558828Z // end inline asm 
2026-02-21T08:43:21.1558961Z // begin inline asm 2026-02-21T08:43:21.1559178Z st.global.v4.b32 [ %rd53 + 0 ], { %r361, %r371, %r381, %r391 }; 2026-02-21T08:43:21.1559387Z // end inline asm 2026-02-21T08:43:21.1559528Z // begin inline asm 2026-02-21T08:43:21.1559704Z st.global.v4.b32 [ %rd54 + 0 ], { %r362, %r372, %r382, %r392 }; 2026-02-21T08:43:21.1559918Z // end inline asm 2026-02-21T08:43:21.1560053Z // begin inline asm 2026-02-21T08:43:21.1560228Z st.global.v4.b32 [ %rd55 + 0 ], { %r363, %r373, %r383, %r393 }; 2026-02-21T08:43:21.1560424Z // end inline asm 2026-02-21T08:43:21.1560555Z // begin inline asm 2026-02-21T08:43:21.1560727Z st.global.v4.b32 [ %rd56 + 0 ], { %r365, %r375, %r385, %r395 }; 2026-02-21T08:43:21.1560922Z // end inline asm 2026-02-21T08:43:21.1561055Z // begin inline asm 2026-02-21T08:43:21.1561222Z st.global.v4.b32 [ %rd57 + 0 ], { %r366, %r376, %r386, %r396 }; 2026-02-21T08:43:21.1561424Z // end inline asm 2026-02-21T08:43:21.1561551Z // begin inline asm 2026-02-21T08:43:21.1561756Z st.global.v4.b32 [ %rd58 + 0 ], { %r367, %r377, %r387, %r397 }; 2026-02-21T08:43:21.1561954Z // end inline asm 2026-02-21T08:43:21.1562086Z // begin inline asm 2026-02-21T08:43:21.1562260Z st.global.v4.b32 [ %rd59 + 0 ], { %r368, %r378, %r388, %r398 }; 2026-02-21T08:43:21.1562459Z // end inline asm 2026-02-21T08:43:21.1562710Z .loc 1 30 74 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:30:74 2026-02-21T08:43:21.1562991Z add.s32 %r538, %r538, 1; 2026-02-21T08:43:21.1563159Z setp.ne.b32 %p71, %r538, %r4; 2026-02-21T08:43:21.1563348Z @%p71 bra $L__BB0_2; 2026-02-21T08:43:21.1563497Z bra.uni $L__BB0_9; 2026-02-21T08:43:21.1563673Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:43:21.1563916Z // Child Loop BB0_5 Depth 2 2026-02-21T08:43:21.1564236Z .loc 1 36 35 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:36:35 2026-02-21T08:43:21.1564528Z shr.s32 %r203, %r538, 31; 2026-02-21T08:43:21.1564713Z shr.u32 %r204, %r203, 22; 
2026-02-21T08:43:21.1564867Z add.s32 %r205, %r538, %r204; 2026-02-21T08:43:21.1565024Z shr.s32 %r206, %r205, 10; 2026-02-21T08:43:21.1565277Z .loc 1 37 33 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:37:33 2026-02-21T08:43:21.1565567Z shl.b32 %r207, %r206, 6; 2026-02-21T08:43:21.1565825Z .loc 1 38 39 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:38:39 2026-02-21T08:43:21.1566102Z sub.s32 %r208, 96, %r207; 2026-02-21T08:43:21.1566356Z .loc 1 38 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:38:52 2026-02-21T08:43:21.1566630Z min.s32 %r209, %r208, 64; 2026-02-21T08:43:21.1566885Z .loc 1 39 45 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:39:45 2026-02-21T08:43:21.1567195Z and.b32 %r210, %r205, -1024; 2026-02-21T08:43:21.1567363Z sub.s32 %r211, %r538, %r210; 2026-02-21T08:43:21.1567628Z .loc 1 40 51 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:40:51 2026-02-21T08:43:21.1567912Z div.s32 %r28, %r211, %r209; 2026-02-21T08:43:21.1568171Z .loc 1 39 64 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:39:64 2026-02-21T08:43:21.1568459Z mul.lo.s32 %r212, %r28, %r209; 2026-02-21T08:43:21.1568621Z sub.s32 %r213, %r211, %r212; 2026-02-21T08:43:21.1568874Z .loc 1 39 30 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:39:30 2026-02-21T08:43:21.1569163Z add.s32 %r214, %r213, %r207; 2026-02-21T08:43:21.1569423Z .loc 1 41 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:41:27 2026-02-21T08:43:21.1569702Z shl.b32 %r234, %r214, 7; 2026-02-21T08:43:21.1569955Z .loc 1 43 27 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:43:27 2026-02-21T08:43:21.1570231Z shl.b32 %r215, %r28, 6; 2026-02-21T08:43:21.1570484Z .loc 1 44 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:44:32 2026-02-21T08:43:21.1570790Z or.b32 %r216, %r215, %r8; 2026-02-21T08:43:21.1571041Z .loc 1 54 53 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:53 2026-02-21T08:43:21.1571317Z 
shl.b32 %r217, %r216, 10; 2026-02-21T08:43:21.1571565Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1571863Z shfl.sync.idx.b32 %r39, %r5, 0, 31, -1; 2026-02-21T08:43:21.1572038Z shl.b32 %r218, %r39, 21; 2026-02-21T08:43:21.1572192Z and.b32 %r219, %r218, 6291456; 2026-02-21T08:43:21.1572344Z add.s32 %r359, %r219, %r537; 2026-02-21T08:43:21.1572502Z mov.pred %p22, -1; 2026-02-21T08:43:21.1572637Z mov.b32 %r539, 0; 2026-02-21T08:43:21.1572776Z // begin inline asm 2026-02-21T08:43:21.1573189Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r359 + 0], 64, {%r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539}; 2026-02-21T08:43:21.1573584Z // end inline asm 2026-02-21T08:43:21.1573729Z // begin inline asm 2026-02-21T08:43:21.1574109Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r359 + 16], 64, {%r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539}; 2026-02-21T08:43:21.1574522Z // end inline asm 2026-02-21T08:43:21.1574663Z // begin inline asm 2026-02-21T08:43:21.1575076Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r359 + 32], 64, {%r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539}; 2026-02-21T08:43:21.1575527Z // end inline asm 2026-02-21T08:43:21.1575661Z // begin inline asm 2026-02-21T08:43:21.1576038Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r359 + 48], 64, {%r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539, %r539}; 2026-02-21T08:43:21.1576443Z // end inline asm 2026-02-21T08:43:21.1576583Z // begin inline asm 2026-02-21T08:43:21.1576735Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:43:21.1576903Z // end inline asm 2026-02-21T08:43:21.1577038Z bar.sync 0; 2026-02-21T08:43:21.1577284Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 
2026-02-21T08:43:21.1577587Z add.s32 %r540, %r55, 24608; 2026-02-21T08:43:21.1577739Z // begin inline asm 2026-02-21T08:43:21.1577910Z @%p4 mbarrier.init.shared::cta.b64 [%r540], 1; 2026-02-21T08:43:21.1578099Z // end inline asm 2026-02-21T08:43:21.1578235Z bar.sync 0; 2026-02-21T08:43:21.1578373Z add.s32 %r175, %r55, 24616; 2026-02-21T08:43:21.1578529Z // begin inline asm 2026-02-21T08:43:21.1578692Z @%p4 mbarrier.init.shared::cta.b64 [%r175], 1; 2026-02-21T08:43:21.1578870Z // end inline asm 2026-02-21T08:43:21.1579043Z add.s32 %r176, %r55, 24576; 2026-02-21T08:43:21.1579193Z // begin inline asm 2026-02-21T08:43:21.1579357Z @%p4 mbarrier.init.shared::cta.b64 [%r176], 1; 2026-02-21T08:43:21.1579536Z // end inline asm 2026-02-21T08:43:21.1579670Z bar.sync 0; 2026-02-21T08:43:21.1579795Z add.s32 %r177, %r55, 24584; 2026-02-21T08:43:21.1579948Z // begin inline asm 2026-02-21T08:43:21.1580106Z @%p4 mbarrier.init.shared::cta.b64 [%r177], 1; 2026-02-21T08:43:21.1580281Z // end inline asm 2026-02-21T08:43:21.1580413Z bar.sync 0; 2026-02-21T08:43:21.1580537Z add.s32 %r178, %r55, 24592; 2026-02-21T08:43:21.1580689Z // begin inline asm 2026-02-21T08:43:21.1580843Z @%p4 mbarrier.init.shared::cta.b64 [%r178], 1; 2026-02-21T08:43:21.1581030Z // end inline asm 2026-02-21T08:43:21.1581154Z bar.sync 0; 2026-02-21T08:43:21.1581284Z add.s32 %r231, %r55, 24600; 2026-02-21T08:43:21.1581429Z // begin inline asm 2026-02-21T08:43:21.1581585Z @%p4 mbarrier.init.shared::cta.b64 [%r231], 1; 2026-02-21T08:43:21.1581769Z // end inline asm 2026-02-21T08:43:21.1582012Z .loc 1 54 60 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:60 2026-02-21T08:43:21.1582333Z or.b32 %r220, %r217, %r17; 2026-02-21T08:43:21.1582589Z .loc 1 54 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:32 2026-02-21T08:43:21.1582883Z mad.wide.s32 %rd32, %r220, 2, %rd8; 2026-02-21T08:43:21.1583051Z mov.b32 %r181, 16; 2026-02-21T08:43:21.1583298Z .loc 1 54 85 // 
cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1583578Z // begin inline asm 2026-02-21T08:43:21.1583776Z cp.async.cg.shared.global [ %r180 + 0 ], [ %rd32 + 0 ], 0x10, %r181; 2026-02-21T08:43:21.1584004Z // end inline asm 2026-02-21T08:43:21.1584142Z cp.async.commit_group; 2026-02-21T08:43:21.1584400Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1584710Z bar.sync 0; 2026-02-21T08:43:21.1584850Z // begin inline asm 2026-02-21T08:43:21.1585065Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r176], 4096; 2026-02-21T08:43:21.1585287Z // end inline asm 2026-02-21T08:43:21.1585530Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1585807Z // begin inline asm 2026-02-21T08:43:21.1585962Z fence.proxy.async.shared::cta; 2026-02-21T08:43:21.1586122Z // end inline asm 2026-02-21T08:43:21.1586255Z bar.sync 0; 2026-02-21T08:43:21.1586388Z elect.sync %r221|%p39, -1; 2026-02-21T08:43:21.1586559Z and.pred %p33, %p1, %p39; 2026-02-21T08:43:21.1586709Z // begin inline asm 2026-02-21T08:43:21.1587070Z @%p33 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r55], [%rd42, {%r539, %r234}], [%r176]; 2026-02-21T08:43:21.1587420Z // end inline asm 2026-02-21T08:43:21.1587659Z .loc 1 54 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:32 2026-02-21T08:43:21.1587951Z add.s64 %rd34, %rd32, 32; 2026-02-21T08:43:21.1588203Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1588488Z // begin inline asm 2026-02-21T08:43:21.1588680Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd34 + 0 ], 0x10, %r181; 2026-02-21T08:43:21.1588902Z // end inline asm 2026-02-21T08:43:21.1589044Z cp.async.commit_group; 2026-02-21T08:43:21.1589293Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1589574Z bar.sync 0; 2026-02-21T08:43:21.1589695Z // 
begin inline asm 2026-02-21T08:43:21.1589883Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r177], 4096; 2026-02-21T08:43:21.1590086Z // end inline asm 2026-02-21T08:43:21.1590324Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1590599Z bar.sync 0; 2026-02-21T08:43:21.1590737Z elect.sync %r222|%p40, -1; 2026-02-21T08:43:21.1590941Z and.pred %p35, %p1, %p40; 2026-02-21T08:43:21.1591095Z add.s32 %r190, %r55, 4096; 2026-02-21T08:43:21.1591247Z // begin inline asm 2026-02-21T08:43:21.1591561Z @%p35 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r190], [%rd42, {%r181, %r234}], [%r177]; 2026-02-21T08:43:21.1591917Z // end inline asm 2026-02-21T08:43:21.1592156Z .loc 1 54 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:32 2026-02-21T08:43:21.1592441Z add.s64 %rd36, %rd32, 64; 2026-02-21T08:43:21.1592702Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1592980Z // begin inline asm 2026-02-21T08:43:21.1593179Z cp.async.cg.shared.global [ %r194 + 0 ], [ %rd36 + 0 ], 0x10, %r181; 2026-02-21T08:43:21.1593395Z // end inline asm 2026-02-21T08:43:21.1593535Z cp.async.commit_group; 2026-02-21T08:43:21.1593792Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1594070Z bar.sync 0; 2026-02-21T08:43:21.1594194Z // begin inline asm 2026-02-21T08:43:21.1594453Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r178], 4096; 2026-02-21T08:43:21.1594665Z // end inline asm 2026-02-21T08:43:21.1594929Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1595215Z bar.sync 0; 2026-02-21T08:43:21.1595345Z elect.sync %r223|%p41, -1; 2026-02-21T08:43:21.1595514Z and.pred %p37, %p1, %p41; 2026-02-21T08:43:21.1595679Z add.s32 %r197, %r55, 8192; 2026-02-21T08:43:21.1595846Z mov.b32 %r198, 32; 2026-02-21T08:43:21.1595994Z // begin inline asm 
2026-02-21T08:43:21.1596331Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r197], [%rd42, {%r198, %r234}], [%r178]; 2026-02-21T08:43:21.1596709Z // end inline asm 2026-02-21T08:43:21.1596974Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1597287Z cp.async.wait_group 2; 2026-02-21T08:43:21.1597484Z bar.sync 0; 2026-02-21T08:43:21.1597735Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1598023Z // begin inline asm 2026-02-21T08:43:21.1598169Z 2026-02-21T08:43:21.1598295Z { 2026-02-21T08:43:21.1598423Z .reg .pred complete; 2026-02-21T08:43:21.1598579Z waitLoop: 2026-02-21T08:43:21.1598774Z mbarrier.try_wait.parity.shared.b64 complete, [%r176], %r539; 2026-02-21T08:43:21.1599025Z @!complete bra.uni waitLoop; 2026-02-21T08:43:21.1599180Z } 2026-02-21T08:43:21.1599284Z 2026-02-21T08:43:21.1599341Z // end inline asm 2026-02-21T08:43:21.1599593Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1599896Z setp.ne.b32 %p42, %r39, 0; 2026-02-21T08:43:21.1600063Z @%p42 bra $L__BB0_4; 2026-02-21T08:43:21.1600255Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:43:21.1600490Z elect.sync %r226|%p44, -1; 2026-02-21T08:43:21.1600652Z mov.b32 %r225, 69206032; 2026-02-21T08:43:21.1600815Z mov.pred %p43, 0; 2026-02-21T08:43:21.1600958Z // begin inline asm 2026-02-21T08:43:21.1601194Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r537 + 0 ], %rd38, %rd39, %r225, %p43; 2026-02-21T08:43:21.1601445Z // end inline asm 2026-02-21T08:43:21.1601592Z add.s32 %r228, %r55, 24608; 2026-02-21T08:43:21.1601758Z cvt.u64.u32 %rd40, %r228; 2026-02-21T08:43:21.1601912Z // begin inline asm 2026-02-21T08:43:21.1602130Z @%p44 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T08:43:21.1602361Z // end inline asm 2026-02-21T08:43:21.1602546Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T08:43:21.1602882Z .loc 1 0 0 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:0 2026-02-21T08:43:21.1603182Z or.b32 %r30, %r234, %r7; 2026-02-21T08:43:21.1603355Z or.b32 %r31, %r215, %r9; 2026-02-21T08:43:21.1603512Z or.b32 %r32, %r215, %r10; 2026-02-21T08:43:21.1603667Z or.b32 %r33, %r215, %r11; 2026-02-21T08:43:21.1603812Z or.b32 %r34, %r215, %r12; 2026-02-21T08:43:21.1603959Z or.b32 %r35, %r215, %r13; 2026-02-21T08:43:21.1604098Z or.b32 %r36, %r215, %r14; 2026-02-21T08:43:21.1604246Z or.b32 %r37, %r215, %r15; 2026-02-21T08:43:21.1604385Z or.b32 %r38, %r215, %r16; 2026-02-21T08:43:21.1604646Z .loc 1 54 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:32 2026-02-21T08:43:21.1604953Z add.s64 %rd41, %rd32, 96; 2026-02-21T08:43:21.1605214Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1605500Z bar.sync 0; 2026-02-21T08:43:21.1605626Z // begin inline asm 2026-02-21T08:43:21.1605830Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd41 + 0 ], 0x10, %r181; 2026-02-21T08:43:21.1606052Z // end inline asm 2026-02-21T08:43:21.1606201Z cp.async.commit_group; 2026-02-21T08:43:21.1606461Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1606797Z // begin inline asm 2026-02-21T08:43:21.1606989Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r231], 4096; 2026-02-21T08:43:21.1607202Z // end inline asm 2026-02-21T08:43:21.1607443Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1607722Z bar.sync 0; 2026-02-21T08:43:21.1607864Z elect.sync %r241|%p49, -1; 2026-02-21T08:43:21.1608023Z and.pred %p47, %p1, %p49; 2026-02-21T08:43:21.1608183Z add.s32 %r232, %r55, 12288; 2026-02-21T08:43:21.1608330Z mov.b32 %r233, 48; 2026-02-21T08:43:21.1608471Z // begin inline asm 2026-02-21T08:43:21.1608798Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r232], [%rd42, {%r233, %r234}], 
[%r231]; 2026-02-21T08:43:21.1609150Z // end inline asm 2026-02-21T08:43:21.1609455Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1609755Z shl.b32 %r242, %r28, 16; 2026-02-21T08:43:21.1609913Z or.b32 %r243, %r26, %r242; 2026-02-21T08:43:21.1610067Z cvt.u64.u32 %rd5, %r243; 2026-02-21T08:43:21.1610222Z mov.b32 %r544, 1; 2026-02-21T08:43:21.1610357Z mov.b32 %r543, 3; 2026-02-21T08:43:21.1610498Z mov.b64 %rd188, 0; 2026-02-21T08:43:21.1610642Z mov.b32 %r541, %r539; 2026-02-21T08:43:21.1610785Z mov.b32 %r542, %r539; 2026-02-21T08:43:21.1610933Z mov.b32 %r545, %r539; 2026-02-21T08:43:21.1611072Z bra.uni $L__BB0_5; 2026-02-21T08:43:21.1611302Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:43:21.1611640Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1611970Z setp.lt.u64 %p57, %rd188, 960; 2026-02-21T08:43:21.1612259Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1612558Z // begin inline asm 2026-02-21T08:43:21.1612695Z 2026-02-21T08:43:21.1612808Z { 2026-02-21T08:43:21.1612936Z .reg .pred complete; 2026-02-21T08:43:21.1613079Z waitLoop: 2026-02-21T08:43:21.1613273Z mbarrier.try_wait.parity.shared.b64 complete, [%r540], %r539; 2026-02-21T08:43:21.1613503Z @!complete bra.uni waitLoop; 2026-02-21T08:43:21.1613660Z } 2026-02-21T08:43:21.1613727Z 2026-02-21T08:43:21.1613782Z // end inline asm 2026-02-21T08:43:21.1613925Z add.s32 %r273, %r544, 1; 2026-02-21T08:43:21.1614076Z setp.gt.s32 %p60, %r273, 1; 2026-02-21T08:43:21.1614245Z selp.b32 %r544, 0, %r273, %p60; 2026-02-21T08:43:21.1614420Z selp.b32 %r274, 1, 0, %p60; 2026-02-21T08:43:21.1614574Z xor.b32 %r52, %r545, %r274; 2026-02-21T08:43:21.1614891Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1615204Z add.s32 %r275, %r543, 1; 2026-02-21T08:43:21.1615413Z setp.gt.s32 %p61, %r275, 3; 
2026-02-21T08:43:21.1615574Z selp.b32 %r543, 0, %r275, %p61; 2026-02-21T08:43:21.1615856Z .loc 1 54 32 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:32 2026-02-21T08:43:21.1616154Z add.s64 %rd51, %rd5, %rd188; 2026-02-21T08:43:21.1616317Z cvt.u32.u64 %r276, %rd51; 2026-02-21T08:43:21.1616491Z mad.wide.s32 %rd49, %r276, 2, %rd8; 2026-02-21T08:43:21.1616795Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1617109Z shl.b32 %r277, %r543, 11; 2026-02-21T08:43:21.1617256Z bar.sync 0; 2026-02-21T08:43:21.1617399Z add.s32 %r266, %r180, %r277; 2026-02-21T08:43:21.1617556Z selp.b32 %r267, 16, 0, %p57; 2026-02-21T08:43:21.1617727Z // begin inline asm 2026-02-21T08:43:21.1617932Z cp.async.cg.shared.global [ %r266 + 0 ], [ %rd49 + 0 ], 0x10, %r267; 2026-02-21T08:43:21.1618173Z // end inline asm 2026-02-21T08:43:21.1618323Z cp.async.commit_group; 2026-02-21T08:43:21.1618573Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1618854Z shl.b32 %r278, %r543, 3; 2026-02-21T08:43:21.1619042Z add.s32 %r280, %r55, %r278; 2026-02-21T08:43:21.1619201Z add.s32 %r272, %r280, 24576; 2026-02-21T08:43:21.1619352Z and.pred %p55, %p4, %p57; 2026-02-21T08:43:21.1619509Z // begin inline asm 2026-02-21T08:43:21.1619694Z @%p55 mbarrier.arrive.expect_tx.shared.b64 _, [%r272], 4096; 2026-02-21T08:43:21.1619908Z // end inline asm 2026-02-21T08:43:21.1620148Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1620425Z shl.b32 %r281, %r543, 12; 2026-02-21T08:43:21.1620582Z add.s32 %r269, %r55, %r281; 2026-02-21T08:43:21.1620728Z bar.sync 0; 2026-02-21T08:43:21.1620873Z elect.sync %r282|%p62, -1; 2026-02-21T08:43:21.1621030Z and.pred %p63, %p57, %p62; 2026-02-21T08:43:21.1621193Z and.pred %p56, %p1, %p63; 2026-02-21T08:43:21.1621346Z cvt.u32.u64 %r283, %rd188; 2026-02-21T08:43:21.1621500Z add.s32 %r270, %r283, 64; 2026-02-21T08:43:21.1621677Z // 
begin inline asm 2026-02-21T08:43:21.1622000Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r269], [%rd42, {%r270, %r234}], [%r272]; 2026-02-21T08:43:21.1622344Z // end inline asm 2026-02-21T08:43:21.1622579Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1622857Z add.s64 %rd7, %rd188, 16; 2026-02-21T08:43:21.1623010Z setp.lt.u64 %p64, %rd188, 992; 2026-02-21T08:43:21.1623175Z mov.b64 %rd188, %rd7; 2026-02-21T08:43:21.1623345Z mov.b32 %r539, %r545; 2026-02-21T08:43:21.1623481Z mov.b32 %r540, %r284; 2026-02-21T08:43:21.1623621Z mov.b32 %r545, %r52; 2026-02-21T08:43:21.1623760Z @%p64 bra $L__BB0_5; 2026-02-21T08:43:21.1623905Z bra.uni $L__BB0_8; 2026-02-21T08:43:21.1624079Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:43:21.1624325Z // => This Inner Loop Header: Depth=2 2026-02-21T08:43:21.1624528Z add.s32 %r246, %r542, 1; 2026-02-21T08:43:21.1624717Z setp.gt.s32 %p51, %r246, 3; 2026-02-21T08:43:21.1624883Z selp.b32 %r542, 0, %r246, %p51; 2026-02-21T08:43:21.1625043Z selp.b32 %r247, 1, 0, %p51; 2026-02-21T08:43:21.1625201Z xor.b32 %r541, %r541, %r247; 2026-02-21T08:43:21.1625455Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1625742Z cp.async.wait_group 2; 2026-02-21T08:43:21.1625890Z bar.sync 0; 2026-02-21T08:43:21.1626128Z .loc 1 49 90 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:49:90 2026-02-21T08:43:21.1626411Z shl.b32 %r248, %r542, 3; 2026-02-21T08:43:21.1626568Z add.s32 %r250, %r55, %r248; 2026-02-21T08:43:21.1626725Z add.s32 %r244, %r250, 24576; 2026-02-21T08:43:21.1626873Z // begin inline asm 2026-02-21T08:43:21.1627011Z 2026-02-21T08:43:21.1627146Z { 2026-02-21T08:43:21.1627272Z .reg .pred complete; 2026-02-21T08:43:21.1627411Z waitLoop: 2026-02-21T08:43:21.1627601Z mbarrier.try_wait.parity.shared.b64 complete, [%r244], %r541; 2026-02-21T08:43:21.1627827Z @!complete bra.uni waitLoop; 
2026-02-21T08:43:21.1627977Z } 2026-02-21T08:43:21.1628041Z 2026-02-21T08:43:21.1628095Z // end inline asm 2026-02-21T08:43:21.1628236Z shl.b32 %r251, %r544, 3; 2026-02-21T08:43:21.1628388Z add.s32 %r252, %r55, %r251; 2026-02-21T08:43:21.1628537Z add.s32 %r284, %r252, 24608; 2026-02-21T08:43:21.1628804Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1629086Z @%p42 bra $L__BB0_7; 2026-02-21T08:43:21.1629275Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T08:43:21.1629592Z .loc 1 55 44 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:55:44 2026-02-21T08:43:21.1629874Z shl.b32 %r255, %r542, 12; 2026-02-21T08:43:21.1630028Z add.s32 %r257, %r55, %r255; 2026-02-21T08:43:21.1630283Z .loc 1 54 85 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:54:85 2026-02-21T08:43:21.1630600Z shl.b32 %r258, %r542, 11; 2026-02-21T08:43:21.1630745Z add.s32 %r259, %r55, %r258; 2026-02-21T08:43:21.1630901Z add.s32 %r260, %r259, 16384; 2026-02-21T08:43:21.1631154Z .loc 1 56 52 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:56:52 2026-02-21T08:43:21.1631444Z elect.sync %r261|%p53, -1; 2026-02-21T08:43:21.1631600Z bfe.u32 %r262, %r260, 4, 14; 2026-02-21T08:43:21.1631754Z cvt.u64.u32 %rd47, %r262; 2026-02-21T08:43:21.1631923Z or.b64 %rd44, %rd47, -4611685949699522560; 2026-02-21T08:43:21.1632098Z bfe.u32 %r263, %r257, 4, 14; 2026-02-21T08:43:21.1632254Z cvt.u64.u32 %rd48, %r263; 2026-02-21T08:43:21.1632410Z or.b64 %rd45, %rd48, -4611685949691133952; 2026-02-21T08:43:21.1632589Z mov.b32 %r254, 69206032; 2026-02-21T08:43:21.1632734Z mov.pred %p52, -1; 2026-02-21T08:43:21.1632878Z // begin inline asm 2026-02-21T08:43:21.1633119Z @%p53 tcgen05.mma.cta_group::1.kind::f16 [ %r537 + 0 ], %rd44, %rd45, %r254, %p52; 2026-02-21T08:43:21.1633375Z // end inline asm 2026-02-21T08:43:21.1633517Z cvt.u64.u32 %rd46, %r284; 2026-02-21T08:43:21.1633661Z // begin inline asm 2026-02-21T08:43:21.1633870Z @%p53 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:43:21.1634096Z // end inline asm 2026-02-21T08:43:21.1634234Z bra.uni $L__BB0_7; 2026-02-21T08:43:21.1634391Z $L__BB0_9: // %._crit_edge 2026-02-21T08:43:21.1634715Z .loc 1 30 4 // cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py:30:4 2026-02-21T08:43:21.1635013Z bar.sync 0; 2026-02-21T08:43:21.1635148Z // begin inline asm 2026-02-21T08:43:21.1635346Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r537, 128; 2026-02-21T08:43:21.1635566Z // end inline asm 2026-02-21T08:43:21.1635705Z ret; 2026-02-21T08:43:21.1635829Z $L__tmp0: 2026-02-21T08:43:21.1635976Z $L__func_end0: 2026-02-21T08:43:21.1636136Z // -- End function 2026-02-21T08:43:21.1636330Z } 2026-02-21T08:43:21.1636599Z .file 1 "/tmp/torchinductor_root/we/cwe2lpjq5bn44e3cjvvjlij3oe6joopmnlhlmf3zlestiln4gpsn.py" 2026-02-21T08:43:21.1636926Z .section .debug_abbrev 2026-02-21T08:43:21.1637078Z { 2026-02-21T08:43:21.1637223Z .b8 1 // Abbreviation Code 2026-02-21T08:43:21.1637448Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:43:21.1637658Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:43:21.1637870Z .b8 37 // DW_AT_producer 2026-02-21T08:43:21.1638067Z .b8 8 // DW_FORM_string 2026-02-21T08:43:21.1638267Z .b8 19 // DW_AT_language 2026-02-21T08:43:21.1638463Z .b8 5 // DW_FORM_data2 2026-02-21T08:43:21.1638707Z .b8 3 // DW_AT_name 2026-02-21T08:43:21.1638912Z .b8 8 // DW_FORM_string 2026-02-21T08:43:21.1639108Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:43:21.1639317Z .b8 6 // DW_FORM_data4 2026-02-21T08:43:21.1639522Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:43:21.1639734Z .b8 8 // DW_FORM_string 2026-02-21T08:43:21.1639933Z .b8 0 // EOM(1) 2026-02-21T08:43:21.1640138Z .b8 0 // EOM(2) 2026-02-21T08:43:21.1640337Z .b8 0 // EOM(3) 2026-02-21T08:43:21.1640506Z } 2026-02-21T08:43:21.1640638Z .section .debug_info 2026-02-21T08:43:21.1640781Z { 2026-02-21T08:43:21.1640937Z .b32 104 // Length of Unit 2026-02-21T08:43:21.1641164Z .b8 2 // DWARF 
version number 2026-02-21T08:43:21.1641360Z .b8 0 2026-02-21T08:43:21.1641543Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:43:21.1641834Z .b8 8 // Address Size (in bytes) 2026-02-21T08:43:21.1642078Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:43:21.1642317Z .b8 116 // DW_AT_producer 2026-02-21T08:43:21.1642513Z .b8 114 2026-02-21T08:43:21.1642634Z .b8 105 2026-02-21T08:43:21.1642756Z .b8 116 2026-02-21T08:43:21.1642869Z .b8 111 2026-02-21T08:43:21.1642987Z .b8 110 2026-02-21T08:43:21.1643099Z .b8 0 2026-02-21T08:43:21.1643248Z .b8 2 // DW_AT_language 2026-02-21T08:43:21.1643427Z .b8 0 2026-02-21T08:43:21.1643572Z .b8 99 // DW_AT_name 2026-02-21T08:43:21.1643757Z .b8 119 2026-02-21T08:43:21.1643870Z .b8 101 2026-02-21T08:43:21.1643991Z .b8 50 2026-02-21T08:43:21.1644109Z .b8 108 2026-02-21T08:43:21.1644231Z .b8 112 2026-02-21T08:43:21.1644344Z .b8 106 2026-02-21T08:43:21.1644468Z .b8 113 2026-02-21T08:43:21.1644630Z .b8 53 2026-02-21T08:43:21.1644789Z .b8 98 2026-02-21T08:43:21.1644905Z .b8 110 2026-02-21T08:43:21.1645027Z .b8 52 2026-02-21T08:43:21.1645143Z .b8 52 2026-02-21T08:43:21.1645275Z .b8 101 2026-02-21T08:43:21.1645396Z .b8 51 2026-02-21T08:43:21.1645522Z .b8 99 2026-02-21T08:43:21.1645645Z .b8 106 2026-02-21T08:43:21.1645761Z .b8 118 2026-02-21T08:43:21.1645881Z .b8 118 2026-02-21T08:43:21.1645997Z .b8 106 2026-02-21T08:43:21.1646119Z .b8 108 2026-02-21T08:43:21.1646231Z .b8 105 2026-02-21T08:43:21.1646354Z .b8 106 2026-02-21T08:43:21.1646495Z .b8 51 2026-02-21T08:43:21.1646616Z .b8 111 2026-02-21T08:43:21.1646727Z .b8 101 2026-02-21T08:43:21.1646847Z .b8 54 2026-02-21T08:43:21.1646959Z .b8 106 2026-02-21T08:43:21.1647080Z .b8 111 2026-02-21T08:43:21.1647192Z .b8 111 2026-02-21T08:43:21.1647312Z .b8 112 2026-02-21T08:43:21.1647430Z .b8 109 2026-02-21T08:43:21.1647553Z .b8 110 2026-02-21T08:43:21.1647668Z .b8 108 2026-02-21T08:43:21.1647774Z .b8 104 2026-02-21T08:43:21.1647889Z .b8 108 2026-02-21T08:43:21.1647995Z 
.b8 109 2026-02-21T08:43:21.1648108Z .b8 102 2026-02-21T08:43:21.1648214Z .b8 51 2026-02-21T08:43:21.1648327Z .b8 122 2026-02-21T08:43:21.1648432Z .b8 108 2026-02-21T08:43:21.1648545Z .b8 101 2026-02-21T08:43:21.1648593Z .b8 115 2026-02-21T08:43:21.1648642Z .b8 116 2026-02-21T08:43:21.1648697Z .b8 105 2026-02-21T08:43:21.1648745Z .b8 108 2026-02-21T08:43:21.1648793Z .b8 110 2026-02-21T08:43:21.1648847Z .b8 52 2026-02-21T08:43:21.1648895Z .b8 103 2026-02-21T08:43:21.1648943Z .b8 112 2026-02-21T08:43:21.1648991Z .b8 115 2026-02-21T08:43:21.1649047Z .b8 110 2026-02-21T08:43:21.1649094Z .b8 46 2026-02-21T08:43:21.1649142Z .b8 112 2026-02-21T08:43:21.1649190Z .b8 121 2026-02-21T08:43:21.1649245Z .b8 0 2026-02-21T08:43:21.1649336Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:43:21.1649409Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:43:21.1649492Z .b8 116 2026-02-21T08:43:21.1649541Z .b8 109 2026-02-21T08:43:21.1649590Z .b8 112 2026-02-21T08:43:21.1649637Z .b8 47 2026-02-21T08:43:21.1649694Z .b8 116 2026-02-21T08:43:21.1649744Z .b8 111 2026-02-21T08:43:21.1649791Z .b8 114 2026-02-21T08:43:21.1649846Z .b8 99 2026-02-21T08:43:21.1649894Z .b8 104 2026-02-21T08:43:21.1649944Z .b8 105 2026-02-21T08:43:21.1649991Z .b8 110 2026-02-21T08:43:21.1650047Z .b8 100 2026-02-21T08:43:21.1650097Z .b8 117 2026-02-21T08:43:21.1650144Z .b8 99 2026-02-21T08:43:21.1650201Z .b8 116 2026-02-21T08:43:21.1650252Z .b8 111 2026-02-21T08:43:21.1650301Z .b8 114 2026-02-21T08:43:21.1650351Z .b8 95 2026-02-21T08:43:21.1650411Z .b8 114 2026-02-21T08:43:21.1650461Z .b8 111 2026-02-21T08:43:21.1650513Z .b8 111 2026-02-21T08:43:21.1650561Z .b8 116 2026-02-21T08:43:21.1650617Z .b8 47 2026-02-21T08:43:21.1650666Z .b8 119 2026-02-21T08:43:21.1650713Z .b8 101 2026-02-21T08:43:21.1650770Z .b8 0 2026-02-21T08:43:21.1650819Z } 2026-02-21T08:43:21.1650883Z .section .debug_macinfo { } 2026-02-21T08:43:21.1650887Z 2026-02-21T08:43:21.1650971Z ================================================================ 
2026-02-21T08:43:21.1651100Z please share the reproducer above with Triton project. 2026-02-21T08:43:24.8738466Z 2026-02-21T08:43:24.8743347Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 7.2 configs/s 2026-02-21T08:43:24.8751439Z [18s] Adaptive compile timeout: 30s (90% percentile=3.4s, bounds=[30.0s, 30s]) 2026-02-21T08:43:25.8270832Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 991.2 configs/s 2026-02-21T08:43:25.9083149Z [19s] Initial random population of 100, 5 starting points: 2026-02-21T08:43:25.9087620Z error=4 2026-02-21T08:43:25.9089779Z ok=96 2026-02-21T08:43:25.9089937Z min=0.1619 2026-02-21T08:43:25.9090073Z mid=3.0915 2026-02-21T08:43:25.9090196Z max=230.1665 2026-02-21T08:43:25.9090356Z best={'block_sizes': [128, 32, 16], 2026-02-21T08:43:25.9090643Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:43:25.9090905Z 'l2_groupings': [2], 2026-02-21T08:43:25.9091305Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:43:25.9091523Z 'loop_orders': [[1, 0]], 2026-02-21T08:43:25.9091684Z 'num_stages': 5, 2026-02-21T08:43:25.9091819Z 'num_warps': 2, 2026-02-21T08:43:25.9091964Z 'pid_type': 'flat', 2026-02-21T08:43:25.9092116Z 'range_flattens': [None, False], 2026-02-21T08:43:25.9092297Z 'range_multi_buffers': [None, None], 2026-02-21T08:43:25.9092472Z 'range_num_stages': [0, 0], 2026-02-21T08:43:25.9092640Z 'range_unroll_factors': [0, 0], 2026-02-21T08:43:25.9092819Z 'range_warp_specializes': [None, None]} 2026-02-21T08:43:25.9101140Z [19s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:43:27.0159108Z [20s] Generation 1 starting: 85 neighbors, 5 active search path(s) 2026-02-21T08:43:31.5675694Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 18.8 configs/s 2026-02-21T08:43:35.9927429Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 20.4 configs/s 2026-02-21T08:43:37.8370247Z Generation 1: verifying top configs 100% 
━━━━━━━━━━━━━━ 1000/1000 732.5 2026-02-21T08:43:37.8370602Z configs/s 2026-02-21T08:43:37.9156561Z [31s] Generation 1 complete: 2026-02-21T08:43:37.9160851Z error=12 2026-02-21T08:43:37.9165961Z ok=79 2026-02-21T08:43:37.9170482Z min=0.0716 2026-02-21T08:43:37.9171895Z mid=0.2089 2026-02-21T08:43:37.9172055Z max=1.3772 2026-02-21T08:43:37.9172208Z best={'block_sizes': [64, 128, 32], 2026-02-21T08:43:37.9172457Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:43:37.9172720Z 'l2_groupings': [4], 2026-02-21T08:43:37.9172899Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:43:37.9173120Z 'loop_orders': [[1, 0]], 2026-02-21T08:43:37.9173282Z 'num_stages': 7, 2026-02-21T08:43:37.9173422Z 'num_warps': 4, 2026-02-21T08:43:37.9173570Z 'pid_type': 'flat', 2026-02-21T08:43:37.9174029Z 'range_flattens': [None, None], 2026-02-21T08:43:37.9174243Z 'range_multi_buffers': [None, False], 2026-02-21T08:43:37.9174440Z 'range_num_stages': [0, 0], 2026-02-21T08:43:37.9174626Z 'range_unroll_factors': [0, 0], 2026-02-21T08:43:37.9175072Z 'range_warp_specializes': [None, False]} 2026-02-21T08:43:37.9175304Z [31s] Fitting surrogate: 191 points, 191 targets 2026-02-21T08:43:39.1981219Z [33s] Generation 2 starting: 88 neighbors, 5 active search path(s) 2026-02-21T08:43:50.3042468Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90/90 2.4 configs/s 2026-02-21T08:43:55.1851456Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 90/90 18.6 configs/s 2026-02-21T08:43:55.4495990Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3567.5 2026-02-21T08:43:55.4496786Z configs/s 2026-02-21T08:43:55.4865401Z [49s] Generation 2 complete: 2026-02-21T08:43:55.4867843Z error=13 2026-02-21T08:43:55.4868007Z ok=81 2026-02-21T08:43:55.4868165Z min=0.0450 2026-02-21T08:43:55.4868307Z mid=0.1251 2026-02-21T08:43:55.4868439Z max=8.7658 2026-02-21T08:43:55.4868589Z best={'block_sizes': [256, 128, 32], 2026-02-21T08:43:55.4869224Z 'indexing': 
['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:43:55.4869455Z 'l2_groupings': [4], 2026-02-21T08:43:55.4869633Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:43:55.4869825Z 'loop_orders': [[1, 0]], 2026-02-21T08:43:55.4869983Z 'num_stages': 7, 2026-02-21T08:43:55.4870120Z 'num_warps': 4, 2026-02-21T08:43:55.4870264Z 'pid_type': 'flat', 2026-02-21T08:43:55.4870416Z 'range_flattens': [None, False], 2026-02-21T08:43:55.4870611Z 'range_multi_buffers': [None, False], 2026-02-21T08:43:55.4870787Z 'range_num_stages': [0, 0], 2026-02-21T08:43:55.4870957Z 'range_unroll_factors': [0, 0], 2026-02-21T08:43:55.4871129Z 'range_warp_specializes': [None, False]} 2026-02-21T08:43:55.4884665Z [49s] Fitting surrogate: 285 points, 285 targets 2026-02-21T08:43:56.8033795Z [50s] Generation 3 starting: 89 neighbors, 5 active search path(s) 2026-02-21T08:44:10.9393303Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90/90 2.3 configs/s 2026-02-21T08:44:14.9219752Z 2026-02-21T08:44:14.9219794Z 2026-02-21T08:44:14.9220282Z ================================================================ 2026-02-21T08:44:14.9220876Z Internal Triton PTX codegen error 2026-02-21T08:44:14.9221197Z `ptxas` stderr: 2026-02-21T08:44:14.9222011Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 153 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:44:14.9223314Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:44:14.9223602Z 2026-02-21T08:44:14.9224385Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy3psav1r.ptx -o /tmp/tmpy3psav1r.ptx.o 2026-02-21T08:44:14.9225441Z 2026-02-21T08:44:14.9225448Z 2026-02-21T08:44:14.9225663Z // 2026-02-21T08:44:14.9225908Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:44:14.9226235Z // 2026-02-21T08:44:14.9226365Z 2026-02-21T08:44:14.9226465Z .version 8.7 2026-02-21T08:44:14.9226713Z .target sm_100a 2026-02-21T08:44:14.9226948Z .address_size 64 2026-02-21T08:44:14.9227105Z 2026-02-21T08:44:14.9227325Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:44:14.9227832Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:44:14.9228226Z // @_helion_matmul 2026-02-21T08:44:14.9228599Z .visible .entry _helion_matmul( 2026-02-21T08:44:14.9228998Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:44:14.9229484Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:44:14.9229959Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:44:14.9230435Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:44:14.9230916Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:44:14.9231293Z ) 2026-02-21T08:44:14.9231507Z .reqntid 256 2026-02-21T08:44:14.9231735Z .maxnreg 32 2026-02-21T08:44:14.9231944Z { 2026-02-21T08:44:14.9232151Z .reg .pred %p<42>; 2026-02-21T08:44:14.9232415Z .reg .b32 %r<1522>; 2026-02-21T08:44:14.9232667Z .reg .b64 %rd<644>; 2026-02-21T08:44:14.9232919Z $L__func_begin0: 2026-02-21T08:44:14.9233067Z 2026-02-21T08:44:14.9233153Z // %bb.0: 2026-02-21T08:44:14.9233605Z .loc 1 14 0 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:14 2026-02-21T08:44:14.9234163Z mov.u32 %r1, %tid.x; 2026-02-21T08:44:14.9234435Z 
setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:44:14.9234809Z mov.b32 %r71, global_smem; 2026-02-21T08:44:14.9235088Z // begin inline asm 2026-02-21T08:44:14.9235529Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r71], 512; 2026-02-21T08:44:14.9235990Z // end inline asm 2026-02-21T08:44:14.9236233Z bar.sync 0; 2026-02-21T08:44:14.9236486Z ld.shared.b32 %r1515, [global_smem]; 2026-02-21T08:44:14.9236805Z bar.sync 0; 2026-02-21T08:44:14.9237042Z // begin inline asm 2026-02-21T08:44:14.9237538Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:44:14.9237968Z // end inline asm 2026-02-21T08:44:14.9238465Z .loc 1 20 46 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:20:46 2026-02-21T08:44:14.9239027Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:44:14.9239581Z .loc 1 20 99 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:20:99 2026-02-21T08:44:14.9240287Z setp.gt.u32 %p3, %r3, 191; 2026-02-21T08:44:14.9240658Z @%p3 bra $L__BB0_8; 2026-02-21T08:44:14.9240938Z // %bb.1: // %.lr.ph 2026-02-21T08:44:14.9241502Z .loc 1 0 99 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:0:99 2026-02-21T08:44:14.9242092Z ld.param.b64 %rd19, [_helion_matmul_param_1]; 2026-02-21T08:44:14.9242499Z ld.param.b64 %rd18, [_helion_matmul_param_0]; 2026-02-21T08:44:14.9243171Z .loc 1 34 45 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:34:45 2026-02-21T08:44:14.9243738Z shl.b32 %r406, %r1, 3; 2026-02-21T08:44:14.9244234Z .loc 1 40 48 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:40:48 2026-02-21T08:44:14.9244820Z and.b32 %r407, %r406, 8; 2026-02-21T08:44:14.9245315Z .loc 1 34 45 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:34:45 2026-02-21T08:44:14.9245861Z and.b32 %r408, %r406, 504; 2026-02-21T08:44:14.9246364Z .loc 1 32 45 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:32:45 2026-02-21T08:44:14.9246970Z and.b32 %r409, %r1, 192; 2026-02-21T08:44:14.9247253Z bfe.u32 %r410, %r1, 6, 2; 
2026-02-21T08:44:14.9247536Z shr.u32 %r411, %r1, 1; 2026-02-21T08:44:14.9247798Z bfe.u32 %r4, %r1, 1, 7; 2026-02-21T08:44:14.9248065Z shr.u32 %r412, %r1, 5; 2026-02-21T08:44:14.9248387Z setp.eq.b32 %p39, %r1, 0; 2026-02-21T08:44:14.9248675Z shl.b32 %r413, %r1, 4; 2026-02-21T08:44:14.9248938Z and.b32 %r414, %r413, 3952; 2026-02-21T08:44:14.9249229Z bfe.s32 %r415, %r1, 3, 1; 2026-02-21T08:44:14.9249496Z and.b32 %r416, %r415, 144; 2026-02-21T08:44:14.9249781Z xor.b32 %r5, %r416, %r414; 2026-02-21T08:44:14.9250053Z add.s32 %r348, %r71, %r5; 2026-02-21T08:44:14.9250336Z add.s32 %r346, %r348, 114688; 2026-02-21T08:44:14.9250629Z add.s32 %r350, %r348, 4096; 2026-02-21T08:44:14.9250907Z add.s32 %r352, %r348, 8192; 2026-02-21T08:44:14.9251196Z add.s32 %r354, %r348, 12288; 2026-02-21T08:44:14.9251480Z add.s32 %r356, %r348, 118784; 2026-02-21T08:44:14.9251767Z add.s32 %r358, %r348, 16384; 2026-02-21T08:44:14.9252042Z add.s32 %r360, %r348, 20480; 2026-02-21T08:44:14.9252319Z add.s32 %r362, %r348, 24576; 2026-02-21T08:44:14.9252588Z add.s32 %r364, %r348, 28672; 2026-02-21T08:44:14.9252867Z add.s32 %r366, %r348, 122880; 2026-02-21T08:44:14.9253142Z add.s32 %r368, %r348, 32768; 2026-02-21T08:44:14.9253419Z add.s32 %r370, %r348, 36864; 2026-02-21T08:44:14.9253698Z add.s32 %r372, %r348, 40960; 2026-02-21T08:44:14.9253970Z add.s32 %r374, %r348, 45056; 2026-02-21T08:44:14.9254248Z add.s32 %r376, %r348, 126976; 2026-02-21T08:44:14.9254520Z add.s32 %r378, %r348, 49152; 2026-02-21T08:44:14.9254848Z add.s32 %r380, %r348, 53248; 2026-02-21T08:44:14.9255120Z add.s32 %r382, %r348, 57344; 2026-02-21T08:44:14.9255396Z add.s32 %r384, %r348, 61440; 2026-02-21T08:44:14.9255668Z add.s32 %r386, %r348, 131072; 2026-02-21T08:44:14.9255946Z add.s32 %r388, %r348, 65536; 2026-02-21T08:44:14.9256222Z add.s32 %r390, %r348, 69632; 2026-02-21T08:44:14.9256489Z add.s32 %r392, %r348, 73728; 2026-02-21T08:44:14.9256761Z add.s32 %r394, %r348, 77824; 2026-02-21T08:44:14.9257028Z add.s32 %r396, %r348, 135168; 
2026-02-21T08:44:14.9257307Z add.s32 %r398, %r348, 81920; 2026-02-21T08:44:14.9257574Z add.s32 %r400, %r348, 86016; 2026-02-21T08:44:14.9257850Z add.s32 %r402, %r348, 90112; 2026-02-21T08:44:14.9258117Z add.s32 %r404, %r348, 94208; 2026-02-21T08:44:14.9258397Z or.b32 %r7, %r407, 96; 2026-02-21T08:44:14.9258720Z add.s32 %r473, %r348, 139264; 2026-02-21T08:44:14.9258994Z add.s32 %r475, %r348, 98304; 2026-02-21T08:44:14.9259271Z add.s32 %r477, %r348, 102400; 2026-02-21T08:44:14.9259542Z add.s32 %r479, %r348, 106496; 2026-02-21T08:44:14.9259824Z add.s32 %r481, %r348, 110592; 2026-02-21T08:44:14.9260101Z and.b32 %r418, %r1, 7; 2026-02-21T08:44:14.9260370Z shl.b32 %r419, %r418, 12; 2026-02-21T08:44:14.9260639Z and.b32 %r420, %r413, 2032; 2026-02-21T08:44:14.9260920Z and.b32 %r421, %r411, 64; 2026-02-21T08:44:14.9261191Z xor.b32 %r422, %r420, %r421; 2026-02-21T08:44:14.9261474Z or.b32 %r423, %r422, %r419; 2026-02-21T08:44:14.9261745Z xor.b32 %r424, %r423, 16; 2026-02-21T08:44:14.9262018Z xor.b32 %r425, %r423, 32; 2026-02-21T08:44:14.9262285Z xor.b32 %r426, %r423, 48; 2026-02-21T08:44:14.9262540Z shl.b32 %r427, %r1, 6; 2026-02-21T08:44:14.9262806Z and.b32 %r428, %r427, 14336; 2026-02-21T08:44:14.9263135Z shl.b32 %r429, %r418, 4; 2026-02-21T08:44:14.9263409Z shr.u32 %r430, %r409, 2; 2026-02-21T08:44:14.9263672Z and.b32 %r431, %r415, 16448; 2026-02-21T08:44:14.9263951Z and.b32 %r432, %r406, 128; 2026-02-21T08:44:14.9264219Z or.b32 %r433, %r428, %r429; 2026-02-21T08:44:14.9264496Z or.b32 %r434, %r431, %r430; 2026-02-21T08:44:14.9264820Z xor.b32 %r435, %r434, %r433; 2026-02-21T08:44:14.9265100Z add.s32 %r436, %r71, %r432; 2026-02-21T08:44:14.9265375Z add.s32 %r814, %r436, %r435; 2026-02-21T08:44:14.9265872Z .loc 1 31 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:31:27 2026-02-21T08:44:14.9266492Z shl.b32 %r437, %r3, 7; 2026-02-21T08:44:14.9266751Z and.b32 %r438, %r437, 896; 2026-02-21T08:44:14.9267250Z .loc 1 32 32 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:32:32 2026-02-21T08:44:14.9267783Z or.b32 %r439, %r438, %r4; 2026-02-21T08:44:14.9268113Z or.b32 %r25, %r438, %r410; 2026-02-21T08:44:14.9268614Z .loc 1 33 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:33:27 2026-02-21T08:44:14.9269156Z shl.b32 %r440, %r3, 6; 2026-02-21T08:44:14.9269425Z and.b32 %r441, %r440, 15872; 2026-02-21T08:44:14.9269915Z .loc 1 34 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:34:32 2026-02-21T08:44:14.9270462Z or.b32 %r442, %r441, %r4; 2026-02-21T08:44:14.9270729Z or.b32 %r443, %r411, %r441; 2026-02-21T08:44:14.9271226Z .loc 1 44 53 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:53 2026-02-21T08:44:14.9271778Z shl.b32 %r444, %r439, 10; 2026-02-21T08:44:14.9272264Z .loc 1 45 80 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:80 2026-02-21T08:44:14.9272811Z shl.b32 %r445, %r442, 10; 2026-02-21T08:44:14.9273074Z shl.b32 %r446, %r443, 10; 2026-02-21T08:44:14.9273345Z or.b32 %r447, %r446, 393216; 2026-02-21T08:44:14.9273838Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9274422Z shfl.sync.idx.b32 %r58, %r412, 0, 31, -1; 2026-02-21T08:44:14.9274798Z shl.b32 %r448, %r58, 21; 2026-02-21T08:44:14.9275083Z and.b32 %r449, %r448, 6291456; 2026-02-21T08:44:14.9275405Z add.s32 %r450, %r449, %r1515; 2026-02-21T08:44:14.9275687Z shl.b32 %r451, %r58, 5; 2026-02-21T08:44:14.9275959Z and.b32 %r452, %r451, 128; 2026-02-21T08:44:14.9276231Z add.s32 %r809, %r450, %r452; 2026-02-21T08:44:14.9276518Z mov.pred %p4, -1; 2026-02-21T08:44:14.9276771Z mov.b32 %r1516, 0; 2026-02-21T08:44:14.9277030Z // begin inline asm 2026-02-21T08:44:14.9277793Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 0], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9278620Z // end inline asm 2026-02-21T08:44:14.9278865Z 
// begin inline asm 2026-02-21T08:44:14.9279619Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 16], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9280512Z // end inline asm 2026-02-21T08:44:14.9280746Z // begin inline asm 2026-02-21T08:44:14.9281502Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 32], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9282317Z // end inline asm 2026-02-21T08:44:14.9282552Z // begin inline asm 2026-02-21T08:44:14.9283308Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 48], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9284111Z // end inline asm 2026-02-21T08:44:14.9284354Z // begin inline asm 2026-02-21T08:44:14.9285202Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 64], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9286027Z // end inline asm 2026-02-21T08:44:14.9286268Z // begin inline asm 2026-02-21T08:44:14.9287006Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 80], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9287831Z // end inline asm 2026-02-21T08:44:14.9288061Z // begin inline asm 2026-02-21T08:44:14.9288817Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 96], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9289674Z // end inline asm 2026-02-21T08:44:14.9289914Z // begin inline asm 2026-02-21T08:44:14.9290721Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 112], {%r1516, 
%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9291551Z // end inline asm 2026-02-21T08:44:14.9291792Z // begin inline asm 2026-02-21T08:44:14.9292538Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 256], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9293369Z // end inline asm 2026-02-21T08:44:14.9293609Z // begin inline asm 2026-02-21T08:44:14.9294352Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 272], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9295221Z // end inline asm 2026-02-21T08:44:14.9295451Z // begin inline asm 2026-02-21T08:44:14.9296201Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 288], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9297020Z // end inline asm 2026-02-21T08:44:14.9297259Z // begin inline asm 2026-02-21T08:44:14.9298017Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 304], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9298825Z // end inline asm 2026-02-21T08:44:14.9299067Z // begin inline asm 2026-02-21T08:44:14.9299807Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 320], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9300621Z // end inline asm 2026-02-21T08:44:14.9300861Z // begin inline asm 2026-02-21T08:44:14.9301597Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 336], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, 
%r1516}; 2026-02-21T08:44:14.9302403Z // end inline asm 2026-02-21T08:44:14.9302639Z // begin inline asm 2026-02-21T08:44:14.9303383Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 352], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9304280Z // end inline asm 2026-02-21T08:44:14.9304526Z // begin inline asm 2026-02-21T08:44:14.9305332Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r809 + 368], {%r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516, %r1516}; 2026-02-21T08:44:14.9306132Z // end inline asm 2026-02-21T08:44:14.9306371Z // begin inline asm 2026-02-21T08:44:14.9306640Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:44:14.9306936Z // end inline asm 2026-02-21T08:44:14.9307160Z bar.sync 0; 2026-02-21T08:44:14.9307615Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9308158Z add.s32 %r1517, %r71, 143360; 2026-02-21T08:44:14.9308432Z // begin inline asm 2026-02-21T08:44:14.9308787Z @%p39 mbarrier.init.shared::cta.b64 [%r1517], 1; 2026-02-21T08:44:14.9309125Z // end inline asm 2026-02-21T08:44:14.9309357Z bar.sync 0; 2026-02-21T08:44:14.9309581Z add.s32 %r345, %r71, 143368; 2026-02-21T08:44:14.9309857Z // begin inline asm 2026-02-21T08:44:14.9310136Z @%p39 mbarrier.init.shared::cta.b64 [%r345], 1; 2026-02-21T08:44:14.9310479Z // end inline asm 2026-02-21T08:44:14.9310927Z .loc 1 44 60 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:60 2026-02-21T08:44:14.9311445Z or.b32 %r453, %r444, %r407; 2026-02-21T08:44:14.9311984Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9312512Z mad.wide.u32 %rd21, %r453, 2, %rd18; 2026-02-21T08:44:14.9312816Z mov.b32 %r474, 16; 2026-02-21T08:44:14.9313256Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 
2026-02-21T08:44:14.9313824Z // begin inline asm 2026-02-21T08:44:14.9314180Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd21 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9314591Z // end inline asm 2026-02-21T08:44:14.9314913Z cp.async.commit_group; 2026-02-21T08:44:14.9315375Z .loc 1 45 59 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:59 2026-02-21T08:44:14.9315881Z or.b32 %r454, %r445, %r407; 2026-02-21T08:44:14.9316139Z or.b32 %r455, %r447, %r407; 2026-02-21T08:44:14.9316608Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9317165Z mad.wide.u32 %rd22, %r454, 2, %rd19; 2026-02-21T08:44:14.9317480Z cvt.u64.u32 %rd51, %r407; 2026-02-21T08:44:14.9317764Z cvt.u64.u32 %rd1, %r445; 2026-02-21T08:44:14.9318036Z or.b64 %rd52, %rd1, %rd51; 2026-02-21T08:44:14.9318319Z shl.b64 %rd53, %rd52, 1; 2026-02-21T08:44:14.9318588Z add.s64 %rd2, %rd19, %rd53; 2026-02-21T08:44:14.9318882Z add.s64 %rd23, %rd2, 262144; 2026-02-21T08:44:14.9319161Z add.s64 %rd24, %rd2, 524288; 2026-02-21T08:44:14.9319458Z mad.wide.u32 %rd25, %r455, 2, %rd19; 2026-02-21T08:44:14.9319996Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9320542Z // begin inline asm 2026-02-21T08:44:14.9320911Z cp.async.cg.shared.global [ %r348 + 0 ], [ %rd22 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9321331Z // end inline asm 2026-02-21T08:44:14.9321576Z // begin inline asm 2026-02-21T08:44:14.9321932Z cp.async.cg.shared.global [ %r350 + 0 ], [ %rd23 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9322358Z // end inline asm 2026-02-21T08:44:14.9322593Z // begin inline asm 2026-02-21T08:44:14.9322953Z cp.async.cg.shared.global [ %r352 + 0 ], [ %rd24 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9323370Z // end inline asm 2026-02-21T08:44:14.9323610Z // begin inline asm 2026-02-21T08:44:14.9323972Z cp.async.cg.shared.global [ %r354 + 0 ], [ %rd25 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9324390Z // end inline asm 
2026-02-21T08:44:14.9324640Z cp.async.commit_group; 2026-02-21T08:44:14.9325174Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9325823Z add.s64 %rd26, %rd21, 32; 2026-02-21T08:44:14.9326319Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9326863Z bar.sync 0; 2026-02-21T08:44:14.9327085Z // begin inline asm 2026-02-21T08:44:14.9327444Z cp.async.cg.shared.global [ %r356 + 0 ], [ %rd26 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9327862Z // end inline asm 2026-02-21T08:44:14.9328108Z cp.async.commit_group; 2026-02-21T08:44:14.9328596Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9329142Z add.s64 %rd27, %rd2, 32; 2026-02-21T08:44:14.9329410Z or.b32 %r456, %r454, 16; 2026-02-21T08:44:14.9329687Z mad.wide.u32 %rd54, %r456, 2, %rd19; 2026-02-21T08:44:14.9330009Z add.s64 %rd28, %rd54, 262144; 2026-02-21T08:44:14.9330346Z add.s64 %rd29, %rd54, 524288; 2026-02-21T08:44:14.9330635Z cvt.u64.u32 %rd4, %r447; 2026-02-21T08:44:14.9330914Z or.b64 %rd55, %rd4, %rd51; 2026-02-21T08:44:14.9331188Z shl.b64 %rd56, %rd55, 1; 2026-02-21T08:44:14.9331462Z add.s64 %rd5, %rd19, %rd56; 2026-02-21T08:44:14.9331732Z add.s64 %rd30, %rd5, 32; 2026-02-21T08:44:14.9332220Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9332758Z // begin inline asm 2026-02-21T08:44:14.9333124Z cp.async.cg.shared.global [ %r358 + 0 ], [ %rd27 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9333594Z // end inline asm 2026-02-21T08:44:14.9333833Z // begin inline asm 2026-02-21T08:44:14.9334192Z cp.async.cg.shared.global [ %r360 + 0 ], [ %rd28 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9334605Z // end inline asm 2026-02-21T08:44:14.9334916Z // begin inline asm 2026-02-21T08:44:14.9335318Z cp.async.cg.shared.global [ %r362 + 0 ], [ %rd29 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9335736Z // end inline asm 2026-02-21T08:44:14.9335974Z 
// begin inline asm 2026-02-21T08:44:14.9336331Z cp.async.cg.shared.global [ %r364 + 0 ], [ %rd30 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9336736Z // end inline asm 2026-02-21T08:44:14.9336990Z cp.async.commit_group; 2026-02-21T08:44:14.9337481Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9338022Z add.s64 %rd31, %rd21, 64; 2026-02-21T08:44:14.9338519Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9339056Z bar.sync 0; 2026-02-21T08:44:14.9339285Z // begin inline asm 2026-02-21T08:44:14.9339633Z cp.async.cg.shared.global [ %r366 + 0 ], [ %rd31 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9340053Z // end inline asm 2026-02-21T08:44:14.9340289Z cp.async.commit_group; 2026-02-21T08:44:14.9340782Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9341330Z add.s64 %rd32, %rd2, 64; 2026-02-21T08:44:14.9341595Z or.b32 %r457, %r454, 32; 2026-02-21T08:44:14.9341876Z mad.wide.u32 %rd57, %r457, 2, %rd19; 2026-02-21T08:44:14.9342187Z add.s64 %rd33, %rd57, 262144; 2026-02-21T08:44:14.9342476Z add.s64 %rd34, %rd57, 524288; 2026-02-21T08:44:14.9342750Z add.s64 %rd35, %rd5, 64; 2026-02-21T08:44:14.9343241Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9343777Z // begin inline asm 2026-02-21T08:44:14.9344136Z cp.async.cg.shared.global [ %r368 + 0 ], [ %rd32 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9344559Z // end inline asm 2026-02-21T08:44:14.9344831Z // begin inline asm 2026-02-21T08:44:14.9345195Z cp.async.cg.shared.global [ %r370 + 0 ], [ %rd33 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9345615Z // end inline asm 2026-02-21T08:44:14.9345856Z // begin inline asm 2026-02-21T08:44:14.9346208Z cp.async.cg.shared.global [ %r372 + 0 ], [ %rd34 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9346637Z // end inline asm 2026-02-21T08:44:14.9346931Z // begin inline asm 2026-02-21T08:44:14.9347288Z 
cp.async.cg.shared.global [ %r374 + 0 ], [ %rd35 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9347704Z // end inline asm 2026-02-21T08:44:14.9347946Z cp.async.commit_group; 2026-02-21T08:44:14.9348436Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9348977Z add.s64 %rd36, %rd21, 96; 2026-02-21T08:44:14.9349472Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9350012Z bar.sync 0; 2026-02-21T08:44:14.9350246Z // begin inline asm 2026-02-21T08:44:14.9350600Z cp.async.cg.shared.global [ %r376 + 0 ], [ %rd36 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9351014Z // end inline asm 2026-02-21T08:44:14.9351262Z cp.async.commit_group; 2026-02-21T08:44:14.9351745Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9352349Z add.s64 %rd37, %rd2, 96; 2026-02-21T08:44:14.9352616Z or.b32 %r458, %r454, 48; 2026-02-21T08:44:14.9352902Z mad.wide.u32 %rd58, %r458, 2, %rd19; 2026-02-21T08:44:14.9353211Z add.s64 %rd38, %rd58, 262144; 2026-02-21T08:44:14.9353501Z add.s64 %rd39, %rd58, 524288; 2026-02-21T08:44:14.9353785Z add.s64 %rd40, %rd5, 96; 2026-02-21T08:44:14.9354267Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9354869Z // begin inline asm 2026-02-21T08:44:14.9355268Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd37 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9355695Z // end inline asm 2026-02-21T08:44:14.9355929Z // begin inline asm 2026-02-21T08:44:14.9356291Z cp.async.cg.shared.global [ %r380 + 0 ], [ %rd38 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9356705Z // end inline asm 2026-02-21T08:44:14.9356996Z // begin inline asm 2026-02-21T08:44:14.9357353Z cp.async.cg.shared.global [ %r382 + 0 ], [ %rd39 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9357766Z // end inline asm 2026-02-21T08:44:14.9358010Z // begin inline asm 2026-02-21T08:44:14.9358358Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd40 + 0 ], 
0x10, %r474; 2026-02-21T08:44:14.9358781Z // end inline asm 2026-02-21T08:44:14.9359023Z cp.async.commit_group; 2026-02-21T08:44:14.9359509Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9360056Z add.s64 %rd41, %rd21, 128; 2026-02-21T08:44:14.9360562Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9361118Z bar.sync 0; 2026-02-21T08:44:14.9361340Z // begin inline asm 2026-02-21T08:44:14.9361699Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd41 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9362110Z // end inline asm 2026-02-21T08:44:14.9362357Z cp.async.commit_group; 2026-02-21T08:44:14.9362841Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9363416Z add.s64 %rd42, %rd2, 128; 2026-02-21T08:44:14.9363686Z or.b32 %r459, %r454, 64; 2026-02-21T08:44:14.9363972Z mad.wide.u32 %rd59, %r459, 2, %rd19; 2026-02-21T08:44:14.9364289Z add.s64 %rd43, %rd59, 262144; 2026-02-21T08:44:14.9364570Z add.s64 %rd44, %rd59, 524288; 2026-02-21T08:44:14.9364893Z add.s64 %rd45, %rd5, 128; 2026-02-21T08:44:14.9365378Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9365931Z // begin inline asm 2026-02-21T08:44:14.9366284Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd42 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9366709Z // end inline asm 2026-02-21T08:44:14.9366942Z // begin inline asm 2026-02-21T08:44:14.9367300Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd43 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9367719Z // end inline asm 2026-02-21T08:44:14.9367950Z // begin inline asm 2026-02-21T08:44:14.9368313Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd44 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9368791Z // end inline asm 2026-02-21T08:44:14.9369027Z // begin inline asm 2026-02-21T08:44:14.9369378Z cp.async.cg.shared.global [ %r394 + 0 ], [ %rd45 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9369799Z // end 
inline asm 2026-02-21T08:44:14.9370039Z cp.async.commit_group; 2026-02-21T08:44:14.9370532Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9371088Z add.s64 %rd46, %rd21, 160; 2026-02-21T08:44:14.9371585Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9372124Z bar.sync 0; 2026-02-21T08:44:14.9372346Z // begin inline asm 2026-02-21T08:44:14.9372704Z cp.async.cg.shared.global [ %r396 + 0 ], [ %rd46 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9373116Z // end inline asm 2026-02-21T08:44:14.9373364Z cp.async.commit_group; 2026-02-21T08:44:14.9373901Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9374447Z add.s64 %rd47, %rd2, 160; 2026-02-21T08:44:14.9374769Z or.b32 %r460, %r454, 80; 2026-02-21T08:44:14.9375053Z mad.wide.u32 %rd60, %r460, 2, %rd19; 2026-02-21T08:44:14.9375379Z add.s64 %rd48, %rd60, 262144; 2026-02-21T08:44:14.9375657Z add.s64 %rd49, %rd60, 524288; 2026-02-21T08:44:14.9375943Z add.s64 %rd50, %rd5, 160; 2026-02-21T08:44:14.9376424Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9377026Z // begin inline asm 2026-02-21T08:44:14.9377386Z cp.async.cg.shared.global [ %r398 + 0 ], [ %rd47 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9377796Z // end inline asm 2026-02-21T08:44:14.9378036Z // begin inline asm 2026-02-21T08:44:14.9378386Z cp.async.cg.shared.global [ %r400 + 0 ], [ %rd48 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9378890Z // end inline asm 2026-02-21T08:44:14.9379124Z // begin inline asm 2026-02-21T08:44:14.9379482Z cp.async.cg.shared.global [ %r402 + 0 ], [ %rd49 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9379891Z // end inline asm 2026-02-21T08:44:14.9380129Z // begin inline asm 2026-02-21T08:44:14.9380489Z cp.async.cg.shared.global [ %r404 + 0 ], [ %rd50 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9380906Z // end inline asm 2026-02-21T08:44:14.9381157Z 
cp.async.commit_group; 2026-02-21T08:44:14.9381643Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9382215Z cp.async.wait_group 10; 2026-02-21T08:44:14.9382491Z bar.sync 0; 2026-02-21T08:44:14.9382939Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9383486Z setp.ne.b32 %p22, %r58, 0; 2026-02-21T08:44:14.9383774Z @%p22 bra $L__BB0_3; 2026-02-21T08:44:14.9384026Z // %bb.2: 2026-02-21T08:44:14.9384455Z .loc 1 0 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:0:52 2026-02-21T08:44:14.9385069Z add.s32 %r466, %r71, 8192; 2026-02-21T08:44:14.9385355Z bfe.u32 %r467, %r466, 4, 14; 2026-02-21T08:44:14.9385640Z cvt.u64.u32 %rd66, %r467; 2026-02-21T08:44:14.9385939Z or.b64 %rd64, %rd66, -4611685949640802304; 2026-02-21T08:44:14.9386281Z bfe.u32 %r468, %r71, 4, 14; 2026-02-21T08:44:14.9386557Z cvt.u64.u32 %rd67, %r468; 2026-02-21T08:44:14.9386857Z or.b64 %rd62, %rd67, -4611685949640802304; 2026-02-21T08:44:14.9387190Z add.s32 %r469, %r71, 114688; 2026-02-21T08:44:14.9387466Z bfe.u32 %r470, %r469, 4, 14; 2026-02-21T08:44:14.9387748Z cvt.u64.u32 %rd68, %r470; 2026-02-21T08:44:14.9388057Z or.b64 %rd61, %rd68, -4611685949691133952; 2026-02-21T08:44:14.9388811Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9389378Z elect.sync %r471|%p24, -1; 2026-02-21T08:44:14.9389668Z mov.b32 %r462, 138412048; 2026-02-21T08:44:14.9389934Z mov.pred %p23, 0; 2026-02-21T08:44:14.9390184Z // begin inline asm 2026-02-21T08:44:14.9390613Z @%p24 tcgen05.mma.cta_group::1.kind::f16 [ %r1515 + 0 ], %rd61, %rd62, %r462, %p23; 2026-02-21T08:44:14.9391165Z // end inline asm 2026-02-21T08:44:14.9391405Z // begin inline asm 2026-02-21T08:44:14.9391813Z @%p24 tcgen05.mma.cta_group::1.kind::f16 [ %r1515 + 256 ], %rd61, %rd64, %r462, %p23; 2026-02-21T08:44:14.9392303Z // end inline asm 2026-02-21T08:44:14.9392542Z add.s32 %r472, %r71, 143360; 
2026-02-21T08:44:14.9392826Z cvt.u64.u32 %rd65, %r472; 2026-02-21T08:44:14.9393088Z // begin inline asm 2026-02-21T08:44:14.9393471Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd65]; 2026-02-21T08:44:14.9393906Z // end inline asm 2026-02-21T08:44:14.9394135Z $L__BB0_3: 2026-02-21T08:44:14.9394579Z .loc 1 0 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:0:52 2026-02-21T08:44:14.9395220Z ld.param.b64 %rd20, [_helion_matmul_param_2]; 2026-02-21T08:44:14.9395581Z add.s32 %r13, %r71, %r423; 2026-02-21T08:44:14.9395905Z add.s32 %r14, %r71, %r424; 2026-02-21T08:44:14.9396185Z add.s32 %r15, %r71, %r425; 2026-02-21T08:44:14.9396459Z add.s32 %r16, %r71, %r426; 2026-02-21T08:44:14.9396735Z add.s32 %r819, %r814, 256; 2026-02-21T08:44:14.9397009Z add.s32 %r824, %r814, 512; 2026-02-21T08:44:14.9397272Z add.s32 %r829, %r814, 768; 2026-02-21T08:44:14.9397554Z add.s32 %r834, %r814, 1024; 2026-02-21T08:44:14.9397834Z add.s32 %r839, %r814, 1280; 2026-02-21T08:44:14.9398121Z add.s32 %r844, %r814, 1536; 2026-02-21T08:44:14.9398390Z add.s32 %r849, %r814, 1792; 2026-02-21T08:44:14.9398714Z or.b32 %r26, %r25, 4; 2026-02-21T08:44:14.9398963Z or.b32 %r27, %r25, 8; 2026-02-21T08:44:14.9399222Z or.b32 %r28, %r25, 12; 2026-02-21T08:44:14.9399481Z or.b32 %r29, %r25, 16; 2026-02-21T08:44:14.9399746Z or.b32 %r30, %r25, 20; 2026-02-21T08:44:14.9400002Z or.b32 %r31, %r25, 24; 2026-02-21T08:44:14.9400301Z or.b32 %r32, %r25, 28; 2026-02-21T08:44:14.9400558Z or.b32 %r33, %r25, 32; 2026-02-21T08:44:14.9400808Z or.b32 %r34, %r25, 36; 2026-02-21T08:44:14.9401065Z or.b32 %r35, %r25, 40; 2026-02-21T08:44:14.9401312Z or.b32 %r36, %r25, 44; 2026-02-21T08:44:14.9401564Z or.b32 %r37, %r25, 48; 2026-02-21T08:44:14.9401809Z or.b32 %r38, %r25, 52; 2026-02-21T08:44:14.9402064Z or.b32 %r39, %r25, 56; 2026-02-21T08:44:14.9402313Z or.b32 %r40, %r25, 60; 2026-02-21T08:44:14.9402571Z or.b32 %r41, %r25, 64; 2026-02-21T08:44:14.9402823Z or.b32 %r42, %r25, 68; 2026-02-21T08:44:14.9403067Z 
or.b32 %r43, %r25, 72; 2026-02-21T08:44:14.9403323Z or.b32 %r44, %r25, 76; 2026-02-21T08:44:14.9403571Z or.b32 %r45, %r25, 80; 2026-02-21T08:44:14.9403827Z or.b32 %r46, %r25, 84; 2026-02-21T08:44:14.9404072Z or.b32 %r47, %r25, 88; 2026-02-21T08:44:14.9404325Z or.b32 %r48, %r25, 92; 2026-02-21T08:44:14.9404570Z or.b32 %r49, %r25, 96; 2026-02-21T08:44:14.9404864Z or.b32 %r50, %r25, 100; 2026-02-21T08:44:14.9405729Z [68s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:44:14.9408089Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 512, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, False]), static_shapes=True) 2026-02-21T08:44:14.9410329Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:44:14.9410772Z `ptxas` stderr: 2026-02-21T08:44:14.9411567Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 153 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:44:14.9412500Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:44:14.9412792Z 2026-02-21T08:44:14.9413727Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy3psav1r.ptx -o /tmp/tmpy3psav1r.ptx.o 2026-02-21T08:44:14.9414719Z 2026-02-21T08:44:14.9414954Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:44:14.9415414Z or.b32 %r51, %r25, 104; 2026-02-21T08:44:14.9415677Z or.b32 %r52, %r25, 108; 2026-02-21T08:44:14.9415943Z or.b32 %r53, %r25, 112; 2026-02-21T08:44:14.9416198Z or.b32 %r54, %r25, 116; 2026-02-21T08:44:14.9416462Z or.b32 %r55, %r25, 120; 2026-02-21T08:44:14.9416712Z or.b32 %r56, %r25, 124; 2026-02-21T08:44:14.9416980Z or.b32 %r57, %r441, %r408; 2026-02-21T08:44:14.9417490Z .loc 1 44 32 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:32 2026-02-21T08:44:14.9418039Z add.s64 %rd69, %rd21, 192; 2026-02-21T08:44:14.9418544Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9419083Z bar.sync 0; 2026-02-21T08:44:14.9419375Z // begin inline asm 2026-02-21T08:44:14.9419749Z cp.async.cg.shared.global [ %r473 + 0 ], [ %rd69 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9420180Z // end inline asm 2026-02-21T08:44:14.9420434Z cp.async.commit_group; 2026-02-21T08:44:14.9420922Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9421479Z add.s64 %rd70, %rd2, 192; 2026-02-21T08:44:14.9421758Z cvt.u64.u32 %rd75, %r7; 2026-02-21T08:44:14.9422040Z add.s64 %rd76, %rd1, %rd75; 2026-02-21T08:44:14.9422318Z shl.b64 %rd77, %rd76, 1; 2026-02-21T08:44:14.9422656Z add.s64 %rd78, %rd19, %rd77; 2026-02-21T08:44:14.9422940Z add.s64 %rd71, %rd78, 262144; 2026-02-21T08:44:14.9423229Z add.s64 %rd72, %rd78, 524288; 2026-02-21T08:44:14.9423506Z add.s64 %rd73, %rd5, 192; 2026-02-21T08:44:14.9424058Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9424615Z // begin inline asm 2026-02-21T08:44:14.9425028Z cp.async.cg.shared.global [ %r475 + 0 ], [ %rd70 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9425461Z // end inline asm 2026-02-21T08:44:14.9425697Z // begin inline asm 2026-02-21T08:44:14.9426062Z cp.async.cg.shared.global [ %r477 + 0 ], [ %rd71 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9426477Z // end inline asm 
2026-02-21T08:44:14.9426716Z // begin inline asm 2026-02-21T08:44:14.9427079Z cp.async.cg.shared.global [ %r479 + 0 ], [ %rd72 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9427494Z // end inline asm 2026-02-21T08:44:14.9427732Z // begin inline asm 2026-02-21T08:44:14.9428089Z cp.async.cg.shared.global [ %r481 + 0 ], [ %rd73 + 0 ], 0x10, %r474; 2026-02-21T08:44:14.9428510Z // end inline asm 2026-02-21T08:44:14.9428752Z cp.async.commit_group; 2026-02-21T08:44:14.9429245Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9429790Z and.b32 %r487, %r1, 1; 2026-02-21T08:44:14.9430067Z mul.wide.u32 %rd6, %r487, 16; 2026-02-21T08:44:14.9430361Z shl.b64 %rd79, %rd4, 1; 2026-02-21T08:44:14.9430630Z add.s64 %rd80, %rd79, %rd19; 2026-02-21T08:44:14.9430924Z add.s64 %rd643, %rd80, 224; 2026-02-21T08:44:14.9431200Z shl.b32 %r488, %r3, 16; 2026-02-21T08:44:14.9431477Z and.b32 %r489, %r488, 16252928; 2026-02-21T08:44:14.9431764Z shl.b32 %r490, %r4, 10; 2026-02-21T08:44:14.9432029Z or.b32 %r491, %r489, %r490; 2026-02-21T08:44:14.9432317Z mad.wide.u32 %rd642, %r491, 2, %rd19; 2026-02-21T08:44:14.9432634Z and.b32 %r492, %r3, 7; 2026-02-21T08:44:14.9432892Z shl.b32 %r493, %r492, 17; 2026-02-21T08:44:14.9433166Z or.b32 %r494, %r493, %r490; 2026-02-21T08:44:14.9433463Z mad.wide.u32 %rd81, %r494, 2, %rd18; 2026-02-21T08:44:14.9433777Z add.s64 %rd641, %rd81, 224; 2026-02-21T08:44:14.9434055Z mov.b32 %r1520, 1; 2026-02-21T08:44:14.9434291Z mov.b32 %r1519, 6; 2026-02-21T08:44:14.9434540Z mov.b64 %rd640, -16; 2026-02-21T08:44:14.9434826Z mov.b32 %r1518, %r1516; 2026-02-21T08:44:14.9435098Z mov.b32 %r1521, %r1516; 2026-02-21T08:44:14.9435357Z bra.uni $L__BB0_4; 2026-02-21T08:44:14.9435762Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:44:14.9436381Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9436956Z add.s64 %rd640, %rd640, 16; 2026-02-21T08:44:14.9437265Z setp.lt.u64 %p35, %rd640, 
912; 2026-02-21T08:44:14.9437785Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9438336Z // begin inline asm 2026-02-21T08:44:14.9438567Z 2026-02-21T08:44:14.9438762Z { 2026-02-21T08:44:14.9438968Z .reg .pred complete; 2026-02-21T08:44:14.9439225Z waitLoop: 2026-02-21T08:44:14.9439565Z mbarrier.try_wait.parity.shared.b64 complete, [%r1517], %r1516; 2026-02-21T08:44:14.9440015Z @!complete bra.uni waitLoop; 2026-02-21T08:44:14.9440293Z } 2026-02-21T08:44:14.9440404Z 2026-02-21T08:44:14.9440499Z // end inline asm 2026-02-21T08:44:14.9440791Z add.s32 %r527, %r1520, 1; 2026-02-21T08:44:14.9441071Z setp.gt.s32 %p36, %r527, 1; 2026-02-21T08:44:14.9441367Z selp.b32 %r1520, 0, %r527, %p36; 2026-02-21T08:44:14.9441666Z selp.b32 %r528, 1, 0, %p36; 2026-02-21T08:44:14.9441953Z xor.b32 %r69, %r1521, %r528; 2026-02-21T08:44:14.9442452Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9443009Z add.s32 %r529, %r1519, 1; 2026-02-21T08:44:14.9443293Z setp.gt.s32 %p37, %r529, 6; 2026-02-21T08:44:14.9443582Z selp.b32 %r1519, 0, %r529, %p37; 2026-02-21T08:44:14.9444157Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9444749Z add.s64 %rd90, %rd641, %rd6; 2026-02-21T08:44:14.9445042Z shl.b32 %r530, %r1519, 12; 2026-02-21T08:44:14.9445307Z bar.sync 0; 2026-02-21T08:44:14.9445604Z add.s32 %r517, %r346, %r530; 2026-02-21T08:44:14.9445887Z selp.b32 %r518, 16, 0, %p35; 2026-02-21T08:44:14.9446168Z // begin inline asm 2026-02-21T08:44:14.9446543Z cp.async.cg.shared.global [ %r517 + 0 ], [ %rd90 + 0 ], 0x10, %r518; 2026-02-21T08:44:14.9446961Z // end inline asm 2026-02-21T08:44:14.9447212Z cp.async.commit_group; 2026-02-21T08:44:14.9447696Z .loc 1 45 34 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:34 2026-02-21T08:44:14.9448245Z add.s64 %rd95, %rd642, %rd6; 2026-02-21T08:44:14.9448524Z add.s64 %rd91, %rd95, 224; 
2026-02-21T08:44:14.9448811Z add.s64 %rd92, %rd95, 262368; 2026-02-21T08:44:14.9449097Z add.s64 %rd93, %rd95, 524512; 2026-02-21T08:44:14.9449603Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9450160Z add.s64 %rd94, %rd643, %rd6; 2026-02-21T08:44:14.9450439Z shl.b32 %r531, %r1519, 14; 2026-02-21T08:44:14.9450717Z add.s32 %r533, %r71, %r531; 2026-02-21T08:44:14.9450993Z add.s32 %r519, %r533, %r5; 2026-02-21T08:44:14.9451272Z // begin inline asm 2026-02-21T08:44:14.9451634Z cp.async.cg.shared.global [ %r519 + 0 ], [ %rd91 + 0 ], 0x10, %r518; 2026-02-21T08:44:14.9452065Z // end inline asm 2026-02-21T08:44:14.9452304Z add.s32 %r521, %r519, 4096; 2026-02-21T08:44:14.9452580Z // begin inline asm 2026-02-21T08:44:14.9452941Z cp.async.cg.shared.global [ %r521 + 0 ], [ %rd92 + 0 ], 0x10, %r518; 2026-02-21T08:44:14.9453358Z // end inline asm 2026-02-21T08:44:14.9453603Z add.s32 %r523, %r519, 8192; 2026-02-21T08:44:14.9453872Z // begin inline asm 2026-02-21T08:44:14.9454236Z cp.async.cg.shared.global [ %r523 + 0 ], [ %rd93 + 0 ], 0x10, %r518; 2026-02-21T08:44:14.9454653Z // end inline asm 2026-02-21T08:44:14.9454955Z add.s32 %r525, %r519, 12288; 2026-02-21T08:44:14.9455231Z // begin inline asm 2026-02-21T08:44:14.9455602Z cp.async.cg.shared.global [ %r525 + 0 ], [ %rd94 + 0 ], 0x10, %r518; 2026-02-21T08:44:14.9456037Z // end inline asm 2026-02-21T08:44:14.9456290Z cp.async.commit_group; 2026-02-21T08:44:14.9456804Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9462023Z add.s64 %rd643, %rd643, 32; 2026-02-21T08:44:14.9462325Z add.s64 %rd642, %rd642, 32; 2026-02-21T08:44:14.9462605Z add.s64 %rd641, %rd641, 32; 2026-02-21T08:44:14.9462908Z setp.lt.u64 %p38, %rd640, 992; 2026-02-21T08:44:14.9463209Z mov.b32 %r1516, %r1521; 2026-02-21T08:44:14.9463494Z mov.b32 %r1517, %r534; 2026-02-21T08:44:14.9463770Z mov.b32 %r1521, %r69; 2026-02-21T08:44:14.9464029Z @%p38 bra $L__BB0_4; 
2026-02-21T08:44:14.9464295Z bra.uni $L__BB0_7; 2026-02-21T08:44:14.9464640Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:44:14.9465123Z add.s32 %r496, %r1518, 1; 2026-02-21T08:44:14.9465407Z setp.gt.s32 %p29, %r496, 6; 2026-02-21T08:44:14.9465711Z selp.b32 %r1518, 0, %r496, %p29; 2026-02-21T08:44:14.9466245Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9466912Z cp.async.wait_group 10; 2026-02-21T08:44:14.9467202Z bar.sync 0; 2026-02-21T08:44:14.9467658Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 2026-02-21T08:44:14.9468226Z shl.b32 %r497, %r1520, 3; 2026-02-21T08:44:14.9468502Z add.s32 %r499, %r71, %r497; 2026-02-21T08:44:14.9468794Z add.s32 %r534, %r499, 143360; 2026-02-21T08:44:14.9469305Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9469870Z @%p22 bra $L__BB0_6; 2026-02-21T08:44:14.9470302Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:44:14.9470936Z .loc 1 45 87 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:45:87 2026-02-21T08:44:14.9471506Z shl.b32 %r504, %r1518, 14; 2026-02-21T08:44:14.9471791Z add.s32 %r506, %r71, %r504; 2026-02-21T08:44:14.9472367Z .loc 1 44 85 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:44:85 2026-02-21T08:44:14.9472930Z shl.b32 %r507, %r1518, 12; 2026-02-21T08:44:14.9473222Z add.s32 %r508, %r71, %r507; 2026-02-21T08:44:14.9473504Z add.s32 %r509, %r508, 114688; 2026-02-21T08:44:14.9474028Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9474608Z elect.sync %r510|%p31, -1; 2026-02-21T08:44:14.9474938Z bfe.u32 %r511, %r509, 4, 14; 2026-02-21T08:44:14.9475234Z cvt.u64.u32 %rd87, %r511; 2026-02-21T08:44:14.9475538Z or.b64 %rd82, %rd87, -4611685949691133952; 2026-02-21T08:44:14.9475887Z bfe.u32 %r512, %r506, 4, 14; 2026-02-21T08:44:14.9476172Z cvt.u64.u32 %rd88, %r512; 2026-02-21T08:44:14.9476475Z 
or.b64 %rd83, %rd88, -4611685949640802304; 2026-02-21T08:44:14.9476804Z mov.b32 %r501, 138412048; 2026-02-21T08:44:14.9477093Z mov.pred %p30, -1; 2026-02-21T08:44:14.9477362Z // begin inline asm 2026-02-21T08:44:14.9477805Z @%p31 tcgen05.mma.cta_group::1.kind::f16 [ %r1515 + 0 ], %rd82, %rd83, %r501, %p30; 2026-02-21T08:44:14.9478310Z // end inline asm 2026-02-21T08:44:14.9478560Z add.s32 %r513, %r506, 8192; 2026-02-21T08:44:14.9478852Z bfe.u32 %r514, %r513, 4, 14; 2026-02-21T08:44:14.9479136Z cvt.u64.u32 %rd89, %r514; 2026-02-21T08:44:14.9479437Z or.b64 %rd85, %rd89, -4611685949640802304; 2026-02-21T08:44:14.9479764Z // begin inline asm 2026-02-21T08:44:14.9480196Z @%p31 tcgen05.mma.cta_group::1.kind::f16 [ %r1515 + 256 ], %rd82, %rd85, %r501, %p30; 2026-02-21T08:44:14.9480696Z // end inline asm 2026-02-21T08:44:14.9480943Z cvt.u64.u32 %rd86, %r534; 2026-02-21T08:44:14.9481224Z // begin inline asm 2026-02-21T08:44:14.9481610Z @%p31 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T08:44:14.9482062Z // end inline asm 2026-02-21T08:44:14.9482300Z bra.uni $L__BB0_6; 2026-02-21T08:44:14.9482626Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:44:14.9483246Z .loc 1 0 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:0:52 2026-02-21T08:44:14.9483873Z mov.b32 %r535, 1; 2026-02-21T08:44:14.9484360Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9484959Z // begin inline asm 2026-02-21T08:44:14.9485209Z 2026-02-21T08:44:14.9485401Z { 2026-02-21T08:44:14.9485620Z .reg .pred complete; 2026-02-21T08:44:14.9485875Z waitLoop: 2026-02-21T08:44:14.9486226Z mbarrier.try_wait.parity.shared.b64 complete, [%r534], %r535; 2026-02-21T08:44:14.9486673Z @!complete bra.uni waitLoop; 2026-02-21T08:44:14.9486954Z } 2026-02-21T08:44:14.9487067Z 2026-02-21T08:44:14.9487169Z // end inline asm 2026-02-21T08:44:14.9487640Z .loc 1 39 57 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:39:57 
2026-02-21T08:44:14.9488209Z cp.async.wait_group 0; 2026-02-21T08:44:14.9488484Z bar.sync 0; 2026-02-21T08:44:14.9488731Z add.s32 %r536, %r71, 143360; 2026-02-21T08:44:14.9489013Z // begin inline asm 2026-02-21T08:44:14.9489376Z @%p39 mbarrier.inval.shared::cta.b64 [%r536]; 2026-02-21T08:44:14.9489730Z // end inline asm 2026-02-21T08:44:14.9489972Z bar.sync 0; 2026-02-21T08:44:14.9490205Z // begin inline asm 2026-02-21T08:44:14.9490499Z @%p39 mbarrier.inval.shared::cta.b64 [%r345]; 2026-02-21T08:44:14.9490858Z // end inline asm 2026-02-21T08:44:14.9491326Z .loc 1 49 53 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:49:53 2026-02-21T08:44:14.9491902Z mad.lo.s32 %r1099, %r25, 12288, %r57; 2026-02-21T08:44:14.9492277Z mad.lo.s32 %r1100, %r26, 12288, %r57; 2026-02-21T08:44:14.9492608Z mad.lo.s32 %r1101, %r27, 12288, %r57; 2026-02-21T08:44:14.9492922Z mad.lo.s32 %r1102, %r28, 12288, %r57; 2026-02-21T08:44:14.9493246Z mad.lo.s32 %r1103, %r29, 12288, %r57; 2026-02-21T08:44:14.9493568Z mad.lo.s32 %r1104, %r30, 12288, %r57; 2026-02-21T08:44:14.9493923Z mad.lo.s32 %r1105, %r31, 12288, %r57; 2026-02-21T08:44:14.9494246Z mad.lo.s32 %r1106, %r32, 12288, %r57; 2026-02-21T08:44:14.9494557Z mad.lo.s32 %r1107, %r33, 12288, %r57; 2026-02-21T08:44:14.9494922Z mad.lo.s32 %r1108, %r34, 12288, %r57; 2026-02-21T08:44:14.9495240Z mad.lo.s32 %r1109, %r35, 12288, %r57; 2026-02-21T08:44:14.9495562Z mad.lo.s32 %r1110, %r36, 12288, %r57; 2026-02-21T08:44:14.9495879Z mad.lo.s32 %r1111, %r37, 12288, %r57; 2026-02-21T08:44:14.9496200Z mad.lo.s32 %r1112, %r38, 12288, %r57; 2026-02-21T08:44:14.9496534Z mad.lo.s32 %r1113, %r39, 12288, %r57; 2026-02-21T08:44:14.9496850Z mad.lo.s32 %r1114, %r40, 12288, %r57; 2026-02-21T08:44:14.9497179Z mad.lo.s32 %r1115, %r41, 12288, %r57; 2026-02-21T08:44:14.9497491Z mad.lo.s32 %r1116, %r42, 12288, %r57; 2026-02-21T08:44:14.9497815Z mad.lo.s32 %r1117, %r43, 12288, %r57; 2026-02-21T08:44:14.9498127Z mad.lo.s32 %r1118, %r44, 12288, %r57; 
2026-02-21T08:44:14.9498448Z mad.lo.s32 %r1119, %r45, 12288, %r57; 2026-02-21T08:44:14.9498764Z mad.lo.s32 %r1120, %r46, 12288, %r57; 2026-02-21T08:44:14.9499081Z mad.lo.s32 %r1121, %r47, 12288, %r57; 2026-02-21T08:44:14.9499404Z mad.lo.s32 %r1122, %r48, 12288, %r57; 2026-02-21T08:44:14.9499719Z mad.lo.s32 %r1123, %r49, 12288, %r57; 2026-02-21T08:44:14.9500038Z mad.lo.s32 %r1124, %r50, 12288, %r57; 2026-02-21T08:44:14.9500352Z mad.lo.s32 %r1125, %r51, 12288, %r57; 2026-02-21T08:44:14.9500669Z mad.lo.s32 %r1126, %r52, 12288, %r57; 2026-02-21T08:44:14.9500979Z mad.lo.s32 %r1127, %r53, 12288, %r57; 2026-02-21T08:44:14.9501298Z mad.lo.s32 %r1128, %r54, 12288, %r57; 2026-02-21T08:44:14.9501609Z mad.lo.s32 %r1129, %r55, 12288, %r57; 2026-02-21T08:44:14.9501936Z mad.lo.s32 %r1130, %r56, 12288, %r57; 2026-02-21T08:44:14.9502483Z .loc 1 49 24 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:49:24 2026-02-21T08:44:14.9503049Z mad.wide.u32 %rd96, %r1099, 2, %rd20; 2026-02-21T08:44:14.9503379Z mad.wide.u32 %rd97, %r1100, 2, %rd20; 2026-02-21T08:44:14.9503702Z mad.wide.u32 %rd98, %r1101, 2, %rd20; 2026-02-21T08:44:14.9504043Z mad.wide.u32 %rd99, %r1102, 2, %rd20; 2026-02-21T08:44:14.9504439Z mad.wide.u32 %rd100, %r1103, 2, %rd20; 2026-02-21T08:44:14.9505028Z mad.wide.u32 %rd101, %r1104, 2, %rd20; 2026-02-21T08:44:14.9505421Z mad.wide.u32 %rd102, %r1105, 2, %rd20; 2026-02-21T08:44:14.9505763Z mad.wide.u32 %rd103, %r1106, 2, %rd20; 2026-02-21T08:44:14.9506100Z mad.wide.u32 %rd104, %r1107, 2, %rd20; 2026-02-21T08:44:14.9506430Z mad.wide.u32 %rd105, %r1108, 2, %rd20; 2026-02-21T08:44:14.9506766Z mad.wide.u32 %rd106, %r1109, 2, %rd20; 2026-02-21T08:44:14.9507094Z mad.wide.u32 %rd107, %r1110, 2, %rd20; 2026-02-21T08:44:14.9507434Z mad.wide.u32 %rd108, %r1111, 2, %rd20; 2026-02-21T08:44:14.9507765Z mad.wide.u32 %rd109, %r1112, 2, %rd20; 2026-02-21T08:44:14.9508103Z mad.wide.u32 %rd110, %r1113, 2, %rd20; 2026-02-21T08:44:14.9508435Z mad.wide.u32 %rd111, %r1114, 2, %rd20; 
2026-02-21T08:44:14.9508770Z mad.wide.u32 %rd112, %r1115, 2, %rd20; 2026-02-21T08:44:14.9509108Z mad.wide.u32 %rd113, %r1116, 2, %rd20; 2026-02-21T08:44:14.9509485Z mad.wide.u32 %rd114, %r1117, 2, %rd20; 2026-02-21T08:44:14.9509831Z mad.wide.u32 %rd115, %r1118, 2, %rd20; 2026-02-21T08:44:14.9510164Z mad.wide.u32 %rd116, %r1119, 2, %rd20; 2026-02-21T08:44:14.9510505Z mad.wide.u32 %rd117, %r1120, 2, %rd20; 2026-02-21T08:44:14.9510830Z mad.wide.u32 %rd118, %r1121, 2, %rd20; 2026-02-21T08:44:14.9511170Z mad.wide.u32 %rd119, %r1122, 2, %rd20; 2026-02-21T08:44:14.9511501Z mad.wide.u32 %rd120, %r1123, 2, %rd20; 2026-02-21T08:44:14.9511848Z mad.wide.u32 %rd121, %r1124, 2, %rd20; 2026-02-21T08:44:14.9512190Z mad.wide.u32 %rd122, %r1125, 2, %rd20; 2026-02-21T08:44:14.9512568Z mad.wide.u32 %rd123, %r1126, 2, %rd20; 2026-02-21T08:44:14.9512906Z mad.wide.u32 %rd124, %r1127, 2, %rd20; 2026-02-21T08:44:14.9513247Z mad.wide.u32 %rd125, %r1128, 2, %rd20; 2026-02-21T08:44:14.9513586Z mad.wide.u32 %rd126, %r1129, 2, %rd20; 2026-02-21T08:44:14.9513973Z mad.wide.u32 %rd127, %r1130, 2, %rd20; 2026-02-21T08:44:14.9514534Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9515154Z // begin inline asm 2026-02-21T08:44:14.9515884Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553}, [%r809 + 0]; 2026-02-21T08:44:14.9516667Z // end inline asm 2026-02-21T08:44:14.9516912Z // begin inline asm 2026-02-21T08:44:14.9517629Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570}, [%r809 + 16]; 2026-02-21T08:44:14.9518410Z // end inline asm 2026-02-21T08:44:14.9518658Z // begin inline asm 2026-02-21T08:44:14.9519364Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, 
%r586, %r587}, [%r809 + 32]; 2026-02-21T08:44:14.9520130Z // end inline asm 2026-02-21T08:44:14.9520378Z // begin inline asm 2026-02-21T08:44:14.9521076Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604}, [%r809 + 48]; 2026-02-21T08:44:14.9521856Z // end inline asm 2026-02-21T08:44:14.9522089Z // begin inline asm 2026-02-21T08:44:14.9522796Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621}, [%r809 + 64]; 2026-02-21T08:44:14.9523577Z // end inline asm 2026-02-21T08:44:14.9523813Z // begin inline asm 2026-02-21T08:44:14.9524511Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638}, [%r809 + 80]; 2026-02-21T08:44:14.9525323Z // end inline asm 2026-02-21T08:44:14.9525571Z // begin inline asm 2026-02-21T08:44:14.9526275Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655}, [%r809 + 96]; 2026-02-21T08:44:14.9527091Z // end inline asm 2026-02-21T08:44:14.9527336Z // begin inline asm 2026-02-21T08:44:14.9528025Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672}, [%r809 + 112]; 2026-02-21T08:44:14.9528805Z // end inline asm 2026-02-21T08:44:14.9529044Z // begin inline asm 2026-02-21T08:44:14.9529747Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689}, [%r809 + 256]; 2026-02-21T08:44:14.9530637Z // end inline asm 2026-02-21T08:44:14.9530922Z // begin inline asm 2026-02-21T08:44:14.9531641Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r691, %r692, %r693, %r694, %r695, %r696, %r697, 
%r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706}, [%r809 + 272]; 2026-02-21T08:44:14.9532405Z // end inline asm 2026-02-21T08:44:14.9532648Z // begin inline asm 2026-02-21T08:44:14.9533374Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723}, [%r809 + 288]; 2026-02-21T08:44:14.9534151Z // end inline asm 2026-02-21T08:44:14.9534397Z // begin inline asm 2026-02-21T08:44:14.9535130Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740}, [%r809 + 304]; 2026-02-21T08:44:14.9535912Z // end inline asm 2026-02-21T08:44:14.9536148Z // begin inline asm 2026-02-21T08:44:14.9536889Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757}, [%r809 + 320]; 2026-02-21T08:44:14.9537648Z // end inline asm 2026-02-21T08:44:14.9537897Z // begin inline asm 2026-02-21T08:44:14.9538633Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774}, [%r809 + 336]; 2026-02-21T08:44:14.9539409Z // end inline asm 2026-02-21T08:44:14.9539658Z // begin inline asm 2026-02-21T08:44:14.9540352Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791}, [%r809 + 352]; 2026-02-21T08:44:14.9541121Z // end inline asm 2026-02-21T08:44:14.9541360Z // begin inline asm 2026-02-21T08:44:14.9542067Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808}, [%r809 + 368]; 2026-02-21T08:44:14.9542850Z // end inline asm 2026-02-21T08:44:14.9543093Z // begin inline asm 2026-02-21T08:44:14.9543376Z tcgen05.wait::ld.sync.aligned; 
2026-02-21T08:44:14.9543673Z // end inline asm 2026-02-21T08:44:14.9543928Z cvt.u64.u32 %rd128, %r538; 2026-02-21T08:44:14.9544217Z cvt.u64.u32 %rd129, %r539; 2026-02-21T08:44:14.9544514Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:44:14.9544854Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:44:14.9545405Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9545989Z mov.b64 {%r1131, %r1132}, %rd131; 2026-02-21T08:44:14.9546323Z cvt.rn.f16x2.f32 %r1133, %r1132, %r1131; 2026-02-21T08:44:14.9546903Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9547473Z cvt.u64.u32 %rd132, %r540; 2026-02-21T08:44:14.9547773Z cvt.u64.u32 %rd133, %r541; 2026-02-21T08:44:14.9548063Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:44:14.9548364Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:44:14.9548901Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9549467Z mov.b64 {%r1134, %r1135}, %rd135; 2026-02-21T08:44:14.9549811Z cvt.rn.f16x2.f32 %r1136, %r1135, %r1134; 2026-02-21T08:44:14.9550375Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9550989Z cvt.u64.u32 %rd136, %r542; 2026-02-21T08:44:14.9551272Z cvt.u64.u32 %rd137, %r543; 2026-02-21T08:44:14.9551563Z shl.b64 %rd138, %rd137, 32; 2026-02-21T08:44:14.9551850Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T08:44:14.9552378Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9552955Z mov.b64 {%r1137, %r1138}, %rd139; 2026-02-21T08:44:14.9553284Z cvt.rn.f16x2.f32 %r1139, %r1138, %r1137; 2026-02-21T08:44:14.9553849Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9554411Z cvt.u64.u32 %rd140, %r544; 2026-02-21T08:44:14.9554756Z cvt.u64.u32 %rd141, %r545; 2026-02-21T08:44:14.9555037Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:44:14.9555333Z or.b64 %rd143, %rd140, 
%rd142; 2026-02-21T08:44:14.9555932Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9556496Z mov.b64 {%r1140, %r1141}, %rd143; 2026-02-21T08:44:14.9556830Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 2026-02-21T08:44:14.9557388Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9557957Z cvt.u64.u32 %rd144, %r546; 2026-02-21T08:44:14.9558242Z cvt.u64.u32 %rd145, %r547; 2026-02-21T08:44:14.9558538Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:44:14.9558828Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:44:14.9559422Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9559993Z mov.b64 {%r1143, %r1144}, %rd147; 2026-02-21T08:44:14.9560314Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T08:44:14.9560923Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9561486Z cvt.u64.u32 %rd148, %r548; 2026-02-21T08:44:14.9561785Z cvt.u64.u32 %rd149, %r549; 2026-02-21T08:44:14.9562064Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:44:14.9562361Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:44:14.9562885Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9563453Z mov.b64 {%r1146, %r1147}, %rd151; 2026-02-21T08:44:14.9563789Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T08:44:14.9564346Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9564963Z cvt.u64.u32 %rd152, %r550; 2026-02-21T08:44:14.9565244Z cvt.u64.u32 %rd153, %r551; 2026-02-21T08:44:14.9565532Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:44:14.9565817Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:44:14.9566339Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9566909Z mov.b64 {%r1149, %r1150}, %rd155; 2026-02-21T08:44:14.9567230Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 
2026-02-21T08:44:14.9567801Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9568338Z cvt.u64.u32 %rd156, %r552; 2026-02-21T08:44:14.9568621Z cvt.u64.u32 %rd157, %r553; 2026-02-21T08:44:14.9568895Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:44:14.9569181Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:44:14.9569683Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9570227Z mov.b64 {%r1152, %r1153}, %rd159; 2026-02-21T08:44:14.9570544Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T08:44:14.9571079Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9571632Z cvt.u64.u32 %rd160, %r555; 2026-02-21T08:44:14.9571910Z cvt.u64.u32 %rd161, %r556; 2026-02-21T08:44:14.9572192Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:44:14.9572480Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:44:14.9573034Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9573583Z mov.b64 {%r1155, %r1156}, %rd163; 2026-02-21T08:44:14.9573892Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T08:44:14.9574432Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9575015Z cvt.u64.u32 %rd164, %r557; 2026-02-21T08:44:14.9575300Z cvt.u64.u32 %rd165, %r558; 2026-02-21T08:44:14.9575578Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:44:14.9575866Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:44:14.9576387Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9576937Z mov.b64 {%r1158, %r1159}, %rd167; 2026-02-21T08:44:14.9577258Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T08:44:14.9577859Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9578408Z cvt.u64.u32 %rd168, %r559; 2026-02-21T08:44:14.9578684Z cvt.u64.u32 %rd169, %r560; 2026-02-21T08:44:14.9578968Z shl.b64 %rd170, %rd169, 
32; 2026-02-21T08:44:14.9579254Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:44:14.9579756Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9580310Z mov.b64 {%r1161, %r1162}, %rd171; 2026-02-21T08:44:14.9580623Z cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T08:44:14.9581217Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9581757Z cvt.u64.u32 %rd172, %r561; 2026-02-21T08:44:14.9582040Z cvt.u64.u32 %rd173, %r562; 2026-02-21T08:44:14.9582311Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:44:14.9582641Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:44:14.9583156Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9583704Z mov.b64 {%r1164, %r1165}, %rd175; 2026-02-21T08:44:14.9584026Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T08:44:14.9584562Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9585156Z cvt.u64.u32 %rd176, %r563; 2026-02-21T08:44:14.9585426Z cvt.u64.u32 %rd177, %r564; 2026-02-21T08:44:14.9585707Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:44:14.9585993Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:44:14.9586498Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9587053Z mov.b64 {%r1167, %r1168}, %rd179; 2026-02-21T08:44:14.9587363Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T08:44:14.9587915Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9588458Z cvt.u64.u32 %rd180, %r565; 2026-02-21T08:44:14.9588742Z cvt.u64.u32 %rd181, %r566; 2026-02-21T08:44:14.9589018Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:44:14.9589305Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:44:14.9589818Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9590362Z mov.b64 {%r1170, %r1171}, %rd183; 2026-02-21T08:44:14.9590682Z 
cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T08:44:14.9591223Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9591773Z cvt.u64.u32 %rd184, %r567; 2026-02-21T08:44:14.9592046Z cvt.u64.u32 %rd185, %r568; 2026-02-21T08:44:14.9592323Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:44:14.9592612Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:44:14.9593117Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9593669Z mov.b64 {%r1173, %r1174}, %rd187; 2026-02-21T08:44:14.9593974Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T08:44:14.9594568Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9595159Z cvt.u64.u32 %rd188, %r569; 2026-02-21T08:44:14.9595443Z cvt.u64.u32 %rd189, %r570; 2026-02-21T08:44:14.9595714Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:44:14.9595999Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:44:14.9596511Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9597055Z mov.b64 {%r1176, %r1177}, %rd191; 2026-02-21T08:44:14.9597373Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T08:44:14.9597909Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9598462Z cvt.u64.u32 %rd192, %r572; 2026-02-21T08:44:14.9598739Z cvt.u64.u32 %rd193, %r573; 2026-02-21T08:44:14.9599022Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:44:14.9599354Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:44:14.9599864Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9600424Z mov.b64 {%r1179, %r1180}, %rd195; 2026-02-21T08:44:14.9600731Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T08:44:14.9601276Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9601819Z cvt.u64.u32 %rd196, %r574; 2026-02-21T08:44:14.9602105Z cvt.u64.u32 %rd197, %r575; 
2026-02-21T08:44:14.9602426Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:44:14.9602712Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:44:14.9603221Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9603765Z mov.b64 {%r1182, %r1183}, %rd199; 2026-02-21T08:44:14.9604123Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T08:44:14.9604665Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9605275Z cvt.u64.u32 %rd200, %r576; 2026-02-21T08:44:14.9605551Z cvt.u64.u32 %rd201, %r577; 2026-02-21T08:44:14.9605832Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:44:14.9606121Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:44:14.9606624Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9607179Z mov.b64 {%r1185, %r1186}, %rd203; 2026-02-21T08:44:14.9607491Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T08:44:14.9608036Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9608584Z cvt.u64.u32 %rd204, %r578; 2026-02-21T08:44:14.9608868Z cvt.u64.u32 %rd205, %r579; 2026-02-21T08:44:14.9609151Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:44:14.9609428Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:44:14.9609949Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9610495Z mov.b64 {%r1188, %r1189}, %rd207; 2026-02-21T08:44:14.9610819Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 2026-02-21T08:44:14.9611358Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9611907Z cvt.u64.u32 %rd208, %r580; 2026-02-21T08:44:14.9612184Z cvt.u64.u32 %rd209, %r581; 2026-02-21T08:44:14.9612464Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:44:14.9612751Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:44:14.9613255Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9613805Z mov.b64 {%r1191, 
%r1192}, %rd211; 2026-02-21T08:44:14.9614119Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T08:44:14.9614666Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9615247Z cvt.u64.u32 %rd212, %r582; 2026-02-21T08:44:14.9615532Z cvt.u64.u32 %rd213, %r583; 2026-02-21T08:44:14.9615865Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:44:14.9616139Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:44:14.9616654Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9617194Z mov.b64 {%r1194, %r1195}, %rd215; 2026-02-21T08:44:14.9617513Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T08:44:14.9618048Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9618602Z cvt.u64.u32 %rd216, %r584; 2026-02-21T08:44:14.9618872Z cvt.u64.u32 %rd217, %r585; 2026-02-21T08:44:14.9619150Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:44:14.9619432Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:44:14.9619933Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9620493Z mov.b64 {%r1197, %r1198}, %rd219; 2026-02-21T08:44:14.9620850Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 2026-02-21T08:44:14.9621402Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9621942Z cvt.u64.u32 %rd220, %r586; 2026-02-21T08:44:14.9622220Z cvt.u64.u32 %rd221, %r587; 2026-02-21T08:44:14.9622503Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:44:14.9622781Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:44:14.9623290Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9623833Z mov.b64 {%r1200, %r1201}, %rd223; 2026-02-21T08:44:14.9624198Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T08:44:14.9624772Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9625327Z cvt.u64.u32 %rd224, %r589; 
2026-02-21T08:44:14.9625606Z cvt.u64.u32 %rd225, %r590; 2026-02-21T08:44:14.9625937Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:44:14.9626225Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:44:14.9626725Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9627276Z mov.b64 {%r1203, %r1204}, %rd227; 2026-02-21T08:44:14.9627587Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T08:44:14.9628120Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9628661Z cvt.u64.u32 %rd228, %r591; 2026-02-21T08:44:14.9628940Z cvt.u64.u32 %rd229, %r592; 2026-02-21T08:44:14.9629221Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:44:14.9629494Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:44:14.9630000Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9630532Z mov.b64 {%r1206, %r1207}, %rd231; 2026-02-21T08:44:14.9630852Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T08:44:14.9631383Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9631933Z cvt.u64.u32 %rd232, %r593; 2026-02-21T08:44:14.9632207Z cvt.u64.u32 %rd233, %r594; 2026-02-21T08:44:14.9632486Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:44:14.9632766Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:44:14.9633268Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9633819Z mov.b64 {%r1209, %r1210}, %rd235; 2026-02-21T08:44:14.9634127Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T08:44:14.9634715Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9635257Z cvt.u64.u32 %rd236, %r595; 2026-02-21T08:44:14.9635535Z cvt.u64.u32 %rd237, %r596; 2026-02-21T08:44:14.9635814Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:44:14.9636090Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:44:14.9636607Z .loc 1 48 27 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9637156Z mov.b64 {%r1212, %r1213}, %rd239; 2026-02-21T08:44:14.9637570Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T08:44:14.9638105Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9638649Z cvt.u64.u32 %rd240, %r597; 2026-02-21T08:44:14.9638748Z cvt.u64.u32 %rd241, %r598; 2026-02-21T08:44:14.9638855Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:44:14.9638956Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:44:14.9639277Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9639381Z mov.b64 {%r1215, %r1216}, %rd243; 2026-02-21T08:44:14.9639505Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T08:44:14.9639820Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9639919Z cvt.u64.u32 %rd244, %r599; 2026-02-21T08:44:14.9640027Z cvt.u64.u32 %rd245, %r600; 2026-02-21T08:44:14.9640169Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:44:14.9640272Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:44:14.9640598Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9640699Z mov.b64 {%r1218, %r1219}, %rd247; 2026-02-21T08:44:14.9640815Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T08:44:14.9641127Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9641275Z cvt.u64.u32 %rd248, %r601; 2026-02-21T08:44:14.9641373Z cvt.u64.u32 %rd249, %r602; 2026-02-21T08:44:14.9641473Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:44:14.9641583Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:44:14.9641900Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9642043Z mov.b64 {%r1221, %r1222}, %rd251; 2026-02-21T08:44:14.9642172Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T08:44:14.9642488Z .loc 1 46 52 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9642588Z cvt.u64.u32 %rd252, %r603; 2026-02-21T08:44:14.9642685Z cvt.u64.u32 %rd253, %r604; 2026-02-21T08:44:14.9642795Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:44:14.9642894Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:44:14.9643210Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9643320Z mov.b64 {%r1224, %r1225}, %rd255; 2026-02-21T08:44:14.9643436Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T08:44:14.9643751Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9643857Z cvt.u64.u32 %rd256, %r606; 2026-02-21T08:44:14.9643955Z cvt.u64.u32 %rd257, %r607; 2026-02-21T08:44:14.9644056Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:44:14.9644157Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:44:14.9644484Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9644589Z mov.b64 {%r1227, %r1228}, %rd259; 2026-02-21T08:44:14.9644736Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T08:44:14.9645007Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9645112Z cvt.u64.u32 %rd260, %r608; 2026-02-21T08:44:14.9645212Z cvt.u64.u32 %rd261, %r609; 2026-02-21T08:44:14.9645324Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:44:14.9645430Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:44:14.9645744Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9645849Z mov.b64 {%r1230, %r1231}, %rd263; 2026-02-21T08:44:14.9645978Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T08:44:14.9646297Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9646398Z cvt.u64.u32 %rd264, %r610; 2026-02-21T08:44:14.9646571Z cvt.u64.u32 %rd265, %r611; 2026-02-21T08:44:14.9646671Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:44:14.9646771Z or.b64 
%rd267, %rd264, %rd266; 2026-02-21T08:44:14.9647092Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9647196Z mov.b64 {%r1233, %r1234}, %rd267; 2026-02-21T08:44:14.9647309Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 2026-02-21T08:44:14.9647625Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9647734Z cvt.u64.u32 %rd268, %r612; 2026-02-21T08:44:14.9647832Z cvt.u64.u32 %rd269, %r613; 2026-02-21T08:44:14.9647933Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:44:14.9648042Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:44:14.9648355Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9648505Z mov.b64 {%r1236, %r1237}, %rd271; 2026-02-21T08:44:14.9648628Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T08:44:14.9648945Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9649045Z cvt.u64.u32 %rd272, %r614; 2026-02-21T08:44:14.9649142Z cvt.u64.u32 %rd273, %r615; 2026-02-21T08:44:14.9649249Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:44:14.9649349Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:44:14.9649664Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9649821Z mov.b64 {%r1239, %r1240}, %rd275; 2026-02-21T08:44:14.9649936Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T08:44:14.9650254Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9650414Z cvt.u64.u32 %rd276, %r616; 2026-02-21T08:44:14.9650513Z cvt.u64.u32 %rd277, %r617; 2026-02-21T08:44:14.9650615Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:44:14.9650716Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:44:14.9651042Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9651141Z mov.b64 {%r1242, %r1243}, %rd279; 2026-02-21T08:44:14.9651255Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 
2026-02-21T08:44:14.9651582Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9651680Z cvt.u64.u32 %rd280, %r618; 2026-02-21T08:44:14.9651779Z cvt.u64.u32 %rd281, %r619; 2026-02-21T08:44:14.9651884Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:44:14.9651982Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:44:14.9652293Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9652397Z mov.b64 {%r1245, %r1246}, %rd283; 2026-02-21T08:44:14.9652521Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T08:44:14.9652841Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9652941Z cvt.u64.u32 %rd284, %r620; 2026-02-21T08:44:14.9653048Z cvt.u64.u32 %rd285, %r621; 2026-02-21T08:44:14.9653146Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:44:14.9653244Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:44:14.9653565Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9653665Z mov.b64 {%r1248, %r1249}, %rd287; 2026-02-21T08:44:14.9653783Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T08:44:14.9654094Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9654199Z cvt.u64.u32 %rd288, %r623; 2026-02-21T08:44:14.9654296Z cvt.u64.u32 %rd289, %r624; 2026-02-21T08:44:14.9654396Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:44:14.9654505Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:44:14.9654873Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9655016Z mov.b64 {%r1251, %r1252}, %rd291; 2026-02-21T08:44:14.9655136Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T08:44:14.9655451Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9655548Z cvt.u64.u32 %rd292, %r625; 2026-02-21T08:44:14.9655645Z cvt.u64.u32 %rd293, %r626; 2026-02-21T08:44:14.9655750Z shl.b64 %rd294, %rd293, 
32; 2026-02-21T08:44:14.9655851Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:44:14.9656161Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9656270Z mov.b64 {%r1254, %r1255}, %rd295; 2026-02-21T08:44:14.9656382Z cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T08:44:14.9656695Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9656843Z cvt.u64.u32 %rd296, %r627; 2026-02-21T08:44:14.9656943Z cvt.u64.u32 %rd297, %r628; 2026-02-21T08:44:14.9657040Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:44:14.9657141Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:44:14.9657464Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9657565Z mov.b64 {%r1257, %r1258}, %rd299; 2026-02-21T08:44:14.9657680Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T08:44:14.9658001Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9658139Z cvt.u64.u32 %rd300, %r629; 2026-02-21T08:44:14.9658236Z cvt.u64.u32 %rd301, %r630; 2026-02-21T08:44:14.9658344Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:44:14.9658443Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:44:14.9658801Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9658903Z mov.b64 {%r1260, %r1261}, %rd303; 2026-02-21T08:44:14.9659029Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T08:44:14.9659342Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9659439Z cvt.u64.u32 %rd304, %r631; 2026-02-21T08:44:14.9659546Z cvt.u64.u32 %rd305, %r632; 2026-02-21T08:44:14.9659643Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:44:14.9659740Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:44:14.9660061Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9660162Z mov.b64 {%r1263, %r1264}, %rd307; 2026-02-21T08:44:14.9660274Z 
cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T08:44:14.9660589Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9660698Z cvt.u64.u32 %rd308, %r633; 2026-02-21T08:44:14.9660796Z cvt.u64.u32 %rd309, %r634; 2026-02-21T08:44:14.9660895Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:44:14.9661005Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:44:14.9661317Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9661416Z mov.b64 {%r1266, %r1267}, %rd311; 2026-02-21T08:44:14.9661536Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T08:44:14.9661848Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9661945Z cvt.u64.u32 %rd312, %r635; 2026-02-21T08:44:14.9662045Z cvt.u64.u32 %rd313, %r636; 2026-02-21T08:44:14.9662150Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:44:14.9662251Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:44:14.9662562Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9662674Z mov.b64 {%r1269, %r1270}, %rd315; 2026-02-21T08:44:14.9662787Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T08:44:14.9663102Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9663248Z cvt.u64.u32 %rd316, %r637; 2026-02-21T08:44:14.9663344Z cvt.u64.u32 %rd317, %r638; 2026-02-21T08:44:14.9663440Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:44:14.9663538Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:44:14.9663860Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9663958Z mov.b64 {%r1272, %r1273}, %rd319; 2026-02-21T08:44:14.9664075Z cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T08:44:14.9664395Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9664491Z cvt.u64.u32 %rd320, %r640; 2026-02-21T08:44:14.9664587Z cvt.u64.u32 %rd321, %r641; 
2026-02-21T08:44:14.9664744Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:44:14.9664847Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:44:14.9665205Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9665306Z mov.b64 {%r1275, %r1276}, %rd323; 2026-02-21T08:44:14.9665427Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T08:44:14.9665738Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9665837Z cvt.u64.u32 %rd324, %r642; 2026-02-21T08:44:14.9665942Z cvt.u64.u32 %rd325, %r643; 2026-02-21T08:44:14.9666038Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:44:14.9666181Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:44:14.9666502Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9666601Z mov.b64 {%r1278, %r1279}, %rd327; 2026-02-21T08:44:14.9666716Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T08:44:14.9667083Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9667195Z cvt.u64.u32 %rd328, %r644; 2026-02-21T08:44:14.9667293Z cvt.u64.u32 %rd329, %r645; 2026-02-21T08:44:14.9667390Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:44:14.9667499Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:44:14.9667812Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9667911Z mov.b64 {%r1281, %r1282}, %rd331; 2026-02-21T08:44:14.9668029Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T08:44:14.9668343Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9668444Z cvt.u64.u32 %rd332, %r646; 2026-02-21T08:44:14.9668543Z cvt.u64.u32 %rd333, %r647; 2026-02-21T08:44:14.9668650Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:44:14.9668750Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:44:14.9669066Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9669177Z mov.b64 {%r1284, 
%r1285}, %rd335; 2026-02-21T08:44:14.9669295Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T08:44:14.9669604Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9669711Z cvt.u64.u32 %rd336, %r648; 2026-02-21T08:44:14.9669811Z cvt.u64.u32 %rd337, %r649; 2026-02-21T08:44:14.9669911Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:44:14.9670008Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:44:14.9670327Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9670431Z mov.b64 {%r1287, %r1288}, %rd339; 2026-02-21T08:44:14.9670545Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T08:44:14.9670866Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9670966Z cvt.u64.u32 %rd340, %r650; 2026-02-21T08:44:14.9671065Z cvt.u64.u32 %rd341, %r651; 2026-02-21T08:44:14.9671172Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:44:14.9671270Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:44:14.9671632Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9671732Z mov.b64 {%r1290, %r1291}, %rd343; 2026-02-21T08:44:14.9671854Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 2026-02-21T08:44:14.9672167Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9672265Z cvt.u64.u32 %rd344, %r652; 2026-02-21T08:44:14.9672374Z cvt.u64.u32 %rd345, %r653; 2026-02-21T08:44:14.9672471Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:44:14.9672569Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:44:14.9672891Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9672989Z mov.b64 {%r1293, %r1294}, %rd347; 2026-02-21T08:44:14.9673104Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T08:44:14.9673460Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9673573Z cvt.u64.u32 %rd348, %r654; 
2026-02-21T08:44:14.9673672Z cvt.u64.u32 %rd349, %r655; 2026-02-21T08:44:14.9673771Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:44:14.9673878Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:44:14.9674197Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9674300Z mov.b64 {%r1296, %r1297}, %rd351; 2026-02-21T08:44:14.9674453Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T08:44:14.9674816Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9674916Z cvt.u64.u32 %rd352, %r657; 2026-02-21T08:44:14.9675014Z cvt.u64.u32 %rd353, %r658; 2026-02-21T08:44:14.9675123Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:44:14.9675265Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:44:14.9675583Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9675692Z mov.b64 {%r1299, %r1300}, %rd355; 2026-02-21T08:44:14.9675809Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T08:44:14.9676123Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9676231Z cvt.u64.u32 %rd356, %r659; 2026-02-21T08:44:14.9676343Z cvt.u64.u32 %rd357, %r660; 2026-02-21T08:44:14.9676443Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:44:14.9676543Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:44:14.9676869Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9676970Z mov.b64 {%r1302, %r1303}, %rd359; 2026-02-21T08:44:14.9677082Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T08:44:14.9677405Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9677506Z cvt.u64.u32 %rd360, %r661; 2026-02-21T08:44:14.9677606Z cvt.u64.u32 %rd361, %r662; 2026-02-21T08:44:14.9677711Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:44:14.9677808Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:44:14.9678122Z .loc 1 48 27 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9678221Z mov.b64 {%r1305, %r1306}, %rd363; 2026-02-21T08:44:14.9678345Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T08:44:14.9678658Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9678763Z cvt.u64.u32 %rd364, %r663; 2026-02-21T08:44:14.9678869Z cvt.u64.u32 %rd365, %r664; 2026-02-21T08:44:14.9678970Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:44:14.9679068Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:44:14.9679394Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9679493Z mov.b64 {%r1308, %r1309}, %rd367; 2026-02-21T08:44:14.9679610Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T08:44:14.9679972Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9680081Z cvt.u64.u32 %rd368, %r665; 2026-02-21T08:44:14.9680178Z cvt.u64.u32 %rd369, %r666; 2026-02-21T08:44:14.9680278Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:44:14.9680382Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:44:14.9680698Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9680801Z mov.b64 {%r1311, %r1312}, %rd371; 2026-02-21T08:44:14.9680925Z cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T08:44:14.9681239Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9681337Z cvt.u64.u32 %rd372, %r667; 2026-02-21T08:44:14.9681435Z cvt.u64.u32 %rd373, %r668; 2026-02-21T08:44:14.9681541Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:44:14.9681690Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:44:14.9682007Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9682114Z mov.b64 {%r1314, %r1315}, %rd375; 2026-02-21T08:44:14.9682228Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T08:44:14.9682540Z .loc 1 46 52 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9682645Z cvt.u64.u32 %rd376, %r669; 2026-02-21T08:44:14.9682743Z cvt.u64.u32 %rd377, %r670; 2026-02-21T08:44:14.9682889Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:44:14.9682988Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:44:14.9683322Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9683422Z mov.b64 {%r1317, %r1318}, %rd379; 2026-02-21T08:44:14.9683570Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T08:44:14.9683902Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9684002Z cvt.u64.u32 %rd380, %r671; 2026-02-21T08:44:14.9684101Z cvt.u64.u32 %rd381, %r672; 2026-02-21T08:44:14.9684205Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:44:14.9684303Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:44:14.9684617Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9684763Z mov.b64 {%r1320, %r1321}, %rd383; 2026-02-21T08:44:14.9684886Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T08:44:14.9685196Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9685295Z cvt.u64.u32 %rd384, %r674; 2026-02-21T08:44:14.9685401Z cvt.u64.u32 %rd385, %r675; 2026-02-21T08:44:14.9685500Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:44:14.9685599Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:44:14.9685920Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9686021Z mov.b64 {%r1323, %r1324}, %rd387; 2026-02-21T08:44:14.9686135Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T08:44:14.9686444Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9686551Z cvt.u64.u32 %rd388, %r676; 2026-02-21T08:44:14.9686648Z cvt.u64.u32 %rd389, %r677; 2026-02-21T08:44:14.9686746Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:44:14.9686849Z or.b64 
%rd391, %rd388, %rd390; 2026-02-21T08:44:14.9687159Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9687261Z mov.b64 {%r1326, %r1327}, %rd391; 2026-02-21T08:44:14.9687382Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 2026-02-21T08:44:14.9687696Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9687790Z cvt.u64.u32 %rd392, %r678; 2026-02-21T08:44:14.9687887Z cvt.u64.u32 %rd393, %r679; 2026-02-21T08:44:14.9688047Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:44:14.9688144Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:44:14.9688454Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9688562Z mov.b64 {%r1329, %r1330}, %rd395; 2026-02-21T08:44:14.9688674Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T08:44:14.9688989Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9689097Z cvt.u64.u32 %rd396, %r680; 2026-02-21T08:44:14.9689196Z cvt.u64.u32 %rd397, %r681; 2026-02-21T08:44:14.9689292Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:44:14.9689388Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:44:14.9689709Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9689810Z mov.b64 {%r1332, %r1333}, %rd399; 2026-02-21T08:44:14.9689970Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T08:44:14.9690298Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9690393Z cvt.u64.u32 %rd400, %r682; 2026-02-21T08:44:14.9690488Z cvt.u64.u32 %rd401, %r683; 2026-02-21T08:44:14.9690592Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:44:14.9690691Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:44:14.9690996Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9691139Z mov.b64 {%r1335, %r1336}, %rd403; 2026-02-21T08:44:14.9691262Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 
2026-02-21T08:44:14.9691570Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9691665Z cvt.u64.u32 %rd404, %r684; 2026-02-21T08:44:14.9691808Z cvt.u64.u32 %rd405, %r685; 2026-02-21T08:44:14.9691904Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:44:14.9692001Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:44:14.9692318Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9692415Z mov.b64 {%r1338, %r1339}, %rd407; 2026-02-21T08:44:14.9692524Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T08:44:14.9692829Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9692932Z cvt.u64.u32 %rd408, %r686; 2026-02-21T08:44:14.9693023Z cvt.u64.u32 %rd409, %r687; 2026-02-21T08:44:14.9693123Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:44:14.9693226Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:44:14.9693511Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9693605Z mov.b64 {%r1341, %r1342}, %rd411; 2026-02-21T08:44:14.9693723Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T08:44:14.9694017Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9694110Z cvt.u64.u32 %rd412, %r688; 2026-02-21T08:44:14.9694199Z cvt.u64.u32 %rd413, %r689; 2026-02-21T08:44:14.9694299Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:44:14.9694387Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:44:14.9694709Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9694813Z mov.b64 {%r1344, %r1345}, %rd415; 2026-02-21T08:44:14.9694923Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T08:44:14.9695218Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9695318Z cvt.u64.u32 %rd416, %r691; 2026-02-21T08:44:14.9695405Z cvt.u64.u32 %rd417, %r692; 2026-02-21T08:44:14.9695495Z shl.b64 %rd418, %rd417, 
32; 2026-02-21T08:44:14.9695585Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:44:14.9695885Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9696032Z mov.b64 {%r1347, %r1348}, %rd419; 2026-02-21T08:44:14.9696138Z cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T08:44:14.9696437Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9696529Z cvt.u64.u32 %rd420, %r693; 2026-02-21T08:44:14.9696618Z cvt.u64.u32 %rd421, %r694; 2026-02-21T08:44:14.9696717Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:44:14.9696806Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:44:14.9697098Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9697190Z mov.b64 {%r1350, %r1351}, %rd423; 2026-02-21T08:44:14.9697305Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T08:44:14.9697588Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9697684Z cvt.u64.u32 %rd424, %r695; 2026-02-21T08:44:14.9697838Z cvt.u64.u32 %rd425, %r696; 2026-02-21T08:44:14.9697932Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:44:14.9698025Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:44:14.9698318Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9698411Z mov.b64 {%r1353, %r1354}, %rd427; 2026-02-21T08:44:14.9698521Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T08:44:14.9698813Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9698980Z cvt.u64.u32 %rd428, %r697; 2026-02-21T08:44:14.9699067Z cvt.u64.u32 %rd429, %r698; 2026-02-21T08:44:14.9699156Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:44:14.9699259Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:44:14.9699586Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9699683Z mov.b64 {%r1356, %r1357}, %rd431; 2026-02-21T08:44:14.9699801Z 
cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T08:44:14.9700106Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9700198Z cvt.u64.u32 %rd432, %r699; 2026-02-21T08:44:14.9700291Z cvt.u64.u32 %rd433, %r700; 2026-02-21T08:44:14.9700392Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:44:14.9700486Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:44:14.9700779Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9700883Z mov.b64 {%r1359, %r1360}, %rd435; 2026-02-21T08:44:14.9700992Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T08:44:14.9701293Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9701392Z cvt.u64.u32 %rd436, %r701; 2026-02-21T08:44:14.9701484Z cvt.u64.u32 %rd437, %r702; 2026-02-21T08:44:14.9701578Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:44:14.9701672Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:44:14.9701965Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9702053Z mov.b64 {%r1362, %r1363}, %rd439; 2026-02-21T08:44:14.9702157Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T08:44:14.9702446Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9702533Z cvt.u64.u32 %rd440, %r703; 2026-02-21T08:44:14.9702621Z cvt.u64.u32 %rd441, %r704; 2026-02-21T08:44:14.9702719Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:44:14.9702809Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:44:14.9703092Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9703184Z mov.b64 {%r1365, %r1366}, %rd443; 2026-02-21T08:44:14.9703301Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T08:44:14.9703583Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9703721Z cvt.u64.u32 %rd444, %r705; 2026-02-21T08:44:14.9703820Z cvt.u64.u32 %rd445, %r706; 
2026-02-21T08:44:14.9703911Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:44:14.9704004Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:44:14.9704290Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9704382Z mov.b64 {%r1368, %r1369}, %rd447; 2026-02-21T08:44:14.9704487Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T08:44:14.9704831Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9704928Z cvt.u64.u32 %rd448, %r708; 2026-02-21T08:44:14.9705016Z cvt.u64.u32 %rd449, %r709; 2026-02-21T08:44:14.9705107Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:44:14.9705209Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:44:14.9705543Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9705637Z mov.b64 {%r1371, %r1372}, %rd451; 2026-02-21T08:44:14.9705751Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T08:44:14.9706038Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9706128Z cvt.u64.u32 %rd452, %r710; 2026-02-21T08:44:14.9706218Z cvt.u64.u32 %rd453, %r711; 2026-02-21T08:44:14.9706324Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:44:14.9706494Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:44:14.9706832Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9706938Z mov.b64 {%r1374, %r1375}, %rd455; 2026-02-21T08:44:14.9707045Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T08:44:14.9707377Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9707485Z cvt.u64.u32 %rd456, %r712; 2026-02-21T08:44:14.9707579Z cvt.u64.u32 %rd457, %r713; 2026-02-21T08:44:14.9707675Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:44:14.9707768Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:44:14.9708073Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9708170Z mov.b64 {%r1377, 
%r1378}, %rd459; 2026-02-21T08:44:14.9708280Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T08:44:14.9708585Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9708682Z cvt.u64.u32 %rd460, %r714; 2026-02-21T08:44:14.9708768Z cvt.u64.u32 %rd461, %r715; 2026-02-21T08:44:14.9708860Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:44:14.9708949Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:44:14.9709219Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9709302Z mov.b64 {%r1380, %r1381}, %rd463; 2026-02-21T08:44:14.9709409Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T08:44:14.9709673Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9709759Z cvt.u64.u32 %rd464, %r716; 2026-02-21T08:44:14.9709855Z cvt.u64.u32 %rd465, %r717; 2026-02-21T08:44:14.9709938Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:44:14.9710025Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:44:14.9710311Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9710399Z mov.b64 {%r1383, %r1384}, %rd467; 2026-02-21T08:44:14.9710498Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T08:44:14.9710770Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9710863Z cvt.u64.u32 %rd468, %r718; 2026-02-21T08:44:14.9710949Z cvt.u64.u32 %rd469, %r719; 2026-02-21T08:44:14.9711037Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:44:14.9711137Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:44:14.9711404Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9711545Z mov.b64 {%r1386, %r1387}, %rd471; 2026-02-21T08:44:14.9711651Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T08:44:14.9711919Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9712006Z cvt.u64.u32 %rd472, %r720; 
2026-02-21T08:44:14.9712091Z cvt.u64.u32 %rd473, %r721; 2026-02-21T08:44:14.9712193Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:44:14.9712282Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:44:14.9712557Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9712662Z mov.b64 {%r1389, %r1390}, %rd475; 2026-02-21T08:44:14.9712770Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T08:44:14.9713136Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9713240Z cvt.u64.u32 %rd476, %r722; 2026-02-21T08:44:14.9713337Z cvt.u64.u32 %rd477, %r723; 2026-02-21T08:44:14.9713436Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:44:14.9713532Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:44:14.9713855Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9713950Z mov.b64 {%r1392, %r1393}, %rd479; 2026-02-21T08:44:14.9714064Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T08:44:14.9714423Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9714523Z cvt.u64.u32 %rd480, %r725; 2026-02-21T08:44:14.9714619Z cvt.u64.u32 %rd481, %r726; 2026-02-21T08:44:14.9714788Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:44:14.9714887Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:44:14.9715253Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9715354Z mov.b64 {%r1395, %r1396}, %rd483; 2026-02-21T08:44:14.9715474Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T08:44:14.9715793Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9715889Z cvt.u64.u32 %rd484, %r727; 2026-02-21T08:44:14.9715995Z cvt.u64.u32 %rd485, %r728; 2026-02-21T08:44:14.9716094Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:44:14.9716190Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:44:14.9716515Z .loc 1 48 27 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9716613Z mov.b64 {%r1398, %r1399}, %rd487; 2026-02-21T08:44:14.9716727Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T08:44:14.9717037Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9717146Z cvt.u64.u32 %rd488, %r729; 2026-02-21T08:44:14.9717245Z cvt.u64.u32 %rd489, %r730; 2026-02-21T08:44:14.9717343Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:44:14.9717452Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:44:14.9717762Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9717861Z mov.b64 {%r1401, %r1402}, %rd491; 2026-02-21T08:44:14.9717982Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T08:44:14.9718299Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9718400Z cvt.u64.u32 %rd492, %r731; 2026-02-21T08:44:14.9718499Z cvt.u64.u32 %rd493, %r732; 2026-02-21T08:44:14.9718607Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:44:14.9718704Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:44:14.9719021Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9719130Z mov.b64 {%r1404, %r1405}, %rd495; 2026-02-21T08:44:14.9719246Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T08:44:14.9719558Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9719723Z cvt.u64.u32 %rd496, %r733; 2026-02-21T08:44:14.9719817Z cvt.u64.u32 %rd497, %r734; 2026-02-21T08:44:14.9719916Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:44:14.9720014Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:44:14.9720334Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9720434Z mov.b64 {%r1407, %r1408}, %rd499; 2026-02-21T08:44:14.9720549Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T08:44:14.9720866Z .loc 1 46 52 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9720964Z cvt.u64.u32 %rd500, %r735; 2026-02-21T08:44:14.9721062Z cvt.u64.u32 %rd501, %r736; 2026-02-21T08:44:14.9721173Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:44:14.9721273Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:44:14.9721631Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9721734Z mov.b64 {%r1410, %r1411}, %rd503; 2026-02-21T08:44:14.9721857Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T08:44:14.9722172Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9722267Z cvt.u64.u32 %rd504, %r737; 2026-02-21T08:44:14.9722374Z cvt.u64.u32 %rd505, %r738; 2026-02-21T08:44:14.9722472Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:44:14.9722612Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:44:14.9722938Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9723039Z mov.b64 {%r1413, %r1414}, %rd507; 2026-02-21T08:44:14.9723154Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T08:44:14.9723497Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9723604Z cvt.u64.u32 %rd508, %r739; 2026-02-21T08:44:14.9723706Z cvt.u64.u32 %rd509, %r740; 2026-02-21T08:44:14.9723803Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:44:14.9723910Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:44:14.9724223Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9724321Z mov.b64 {%r1416, %r1417}, %rd511; 2026-02-21T08:44:14.9724442Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T08:44:14.9724800Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9724901Z cvt.u64.u32 %rd512, %r742; 2026-02-21T08:44:14.9724998Z cvt.u64.u32 %rd513, %r743; 2026-02-21T08:44:14.9725097Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:44:14.9725196Z or.b64 
%rd515, %rd512, %rd514; 2026-02-21T08:44:14.9725515Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9725624Z mov.b64 {%r1419, %r1420}, %rd515; 2026-02-21T08:44:14.9725738Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 2026-02-21T08:44:14.9726053Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9726158Z cvt.u64.u32 %rd516, %r744; 2026-02-21T08:44:14.9726256Z cvt.u64.u32 %rd517, %r745; 2026-02-21T08:44:14.9726352Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:44:14.9726446Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:44:14.9726769Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9726870Z mov.b64 {%r1422, %r1423}, %rd519; 2026-02-21T08:44:14.9726984Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T08:44:14.9727303Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9727402Z cvt.u64.u32 %rd520, %r746; 2026-02-21T08:44:14.9727498Z cvt.u64.u32 %rd521, %r747; 2026-02-21T08:44:14.9727606Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:44:14.9727754Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:44:14.9728069Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9728165Z mov.b64 {%r1425, %r1426}, %rd523; 2026-02-21T08:44:14.9728287Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T08:44:14.9728592Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9728689Z cvt.u64.u32 %rd524, %r748; 2026-02-21T08:44:14.9728797Z cvt.u64.u32 %rd525, %r749; 2026-02-21T08:44:14.9728897Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:44:14.9728997Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:44:14.9729317Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9729420Z mov.b64 {%r1428, %r1429}, %rd527; 2026-02-21T08:44:14.9729530Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 
2026-02-21T08:44:14.9729907Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9730017Z cvt.u64.u32 %rd528, %r750; 2026-02-21T08:44:14.9730115Z cvt.u64.u32 %rd529, %r751; 2026-02-21T08:44:14.9730213Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:44:14.9730323Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:44:14.9730636Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9730737Z mov.b64 {%r1431, %r1432}, %rd531; 2026-02-21T08:44:14.9730899Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T08:44:14.9731212Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9731312Z cvt.u64.u32 %rd532, %r752; 2026-02-21T08:44:14.9731412Z cvt.u64.u32 %rd533, %r753; 2026-02-21T08:44:14.9731556Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:44:14.9731655Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:44:14.9731972Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9732082Z mov.b64 {%r1434, %r1435}, %rd535; 2026-02-21T08:44:14.9732196Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T08:44:14.9732513Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9732621Z cvt.u64.u32 %rd536, %r754; 2026-02-21T08:44:14.9732719Z cvt.u64.u32 %rd537, %r755; 2026-02-21T08:44:14.9732815Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:44:14.9732918Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:44:14.9733240Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9733340Z mov.b64 {%r1437, %r1438}, %rd539; 2026-02-21T08:44:14.9733451Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T08:44:14.9733775Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9733877Z cvt.u64.u32 %rd540, %r756; 2026-02-21T08:44:14.9733977Z cvt.u64.u32 %rd541, %r757; 2026-02-21T08:44:14.9734082Z shl.b64 %rd542, %rd541, 
32; 2026-02-21T08:44:14.9734182Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:44:14.9734495Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9734593Z mov.b64 {%r1440, %r1441}, %rd543; 2026-02-21T08:44:14.9734767Z cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T08:44:14.9735083Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9735186Z cvt.u64.u32 %rd544, %r759; 2026-02-21T08:44:14.9735296Z cvt.u64.u32 %rd545, %r760; 2026-02-21T08:44:14.9735394Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:44:14.9735492Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:44:14.9735818Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9735920Z mov.b64 {%r1443, %r1444}, %rd547; 2026-02-21T08:44:14.9736091Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T08:44:14.9736405Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9736512Z cvt.u64.u32 %rd548, %r761; 2026-02-21T08:44:14.9736610Z cvt.u64.u32 %rd549, %r762; 2026-02-21T08:44:14.9736705Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:44:14.9736814Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:44:14.9737127Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9737230Z mov.b64 {%r1446, %r1447}, %rd551; 2026-02-21T08:44:14.9737351Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T08:44:14.9737669Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9737770Z cvt.u64.u32 %rd552, %r763; 2026-02-21T08:44:14.9737866Z cvt.u64.u32 %rd553, %r764; 2026-02-21T08:44:14.9738029Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:44:14.9738129Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:44:14.9738444Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9738551Z mov.b64 {%r1449, %r1450}, %rd555; 2026-02-21T08:44:14.9738661Z 
cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T08:44:14.9738969Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9739075Z cvt.u64.u32 %rd556, %r765; 2026-02-21T08:44:14.9739216Z cvt.u64.u32 %rd557, %r766; 2026-02-21T08:44:14.9739312Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:44:14.9739411Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:44:14.9739731Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9739869Z mov.b64 {%r1452, %r1453}, %rd559; 2026-02-21T08:44:14.9739985Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T08:44:14.9740312Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9740412Z cvt.u64.u32 %rd560, %r767; 2026-02-21T08:44:14.9740508Z cvt.u64.u32 %rd561, %r768; 2026-02-21T08:44:14.9740617Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:44:14.9740717Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:44:14.9741028Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9741125Z mov.b64 {%r1455, %r1456}, %rd563; 2026-02-21T08:44:14.9741249Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T08:44:14.9741561Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9741659Z cvt.u64.u32 %rd564, %r769; 2026-02-21T08:44:14.9741765Z cvt.u64.u32 %rd565, %r770; 2026-02-21T08:44:14.9741866Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:44:14.9741965Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:44:14.9742286Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9742386Z mov.b64 {%r1458, %r1459}, %rd567; 2026-02-21T08:44:14.9742499Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T08:44:14.9742808Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9742915Z cvt.u64.u32 %rd568, %r771; 2026-02-21T08:44:14.9743009Z cvt.u64.u32 %rd569, %r772; 
2026-02-21T08:44:14.9743108Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:44:14.9743217Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:44:14.9743529Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9743625Z mov.b64 {%r1461, %r1462}, %rd571; 2026-02-21T08:44:14.9743747Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T08:44:14.9744062Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9744161Z cvt.u64.u32 %rd572, %r773; 2026-02-21T08:44:14.9744297Z cvt.u64.u32 %rd573, %r774; 2026-02-21T08:44:14.9744405Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:44:14.9744501Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:44:14.9744839Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9744948Z mov.b64 {%r1464, %r1465}, %rd575; 2026-02-21T08:44:14.9745061Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T08:44:14.9745373Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9745481Z cvt.u64.u32 %rd576, %r776; 2026-02-21T08:44:14.9745578Z cvt.u64.u32 %rd577, %r777; 2026-02-21T08:44:14.9745679Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:44:14.9745776Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:44:14.9746124Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9746278Z mov.b64 {%r1467, %r1468}, %rd579; 2026-02-21T08:44:14.9746389Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T08:44:14.9746708Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9746807Z cvt.u64.u32 %rd580, %r778; 2026-02-21T08:44:14.9746905Z cvt.u64.u32 %rd581, %r779; 2026-02-21T08:44:14.9747006Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:44:14.9747104Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:44:14.9747414Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9747556Z mov.b64 {%r1470, 
%r1471}, %rd583; 2026-02-21T08:44:14.9747680Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T08:44:14.9747996Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9748131Z cvt.u64.u32 %rd584, %r780; 2026-02-21T08:44:14.9748242Z cvt.u64.u32 %rd585, %r781; 2026-02-21T08:44:14.9748343Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:44:14.9748444Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:44:14.9748769Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9748869Z mov.b64 {%r1473, %r1474}, %rd587; 2026-02-21T08:44:14.9748982Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T08:44:14.9749300Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9749405Z cvt.u64.u32 %rd588, %r782; 2026-02-21T08:44:14.9749506Z cvt.u64.u32 %rd589, %r783; 2026-02-21T08:44:14.9749603Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:44:14.9749713Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:44:14.9750027Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9750127Z mov.b64 {%r1476, %r1477}, %rd591; 2026-02-21T08:44:14.9750249Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T08:44:14.9750570Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9750672Z cvt.u64.u32 %rd592, %r784; 2026-02-21T08:44:14.9750772Z cvt.u64.u32 %rd593, %r785; 2026-02-21T08:44:14.9750881Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:44:14.9750981Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:44:14.9751298Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9751404Z mov.b64 {%r1479, %r1480}, %rd595; 2026-02-21T08:44:14.9751521Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T08:44:14.9751838Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9751943Z cvt.u64.u32 %rd596, %r786; 
2026-02-21T08:44:14.9752041Z cvt.u64.u32 %rd597, %r787; 2026-02-21T08:44:14.9752141Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:44:14.9752238Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:44:14.9752564Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9752720Z mov.b64 {%r1482, %r1483}, %rd599; 2026-02-21T08:44:14.9752831Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T08:44:14.9753156Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9753252Z cvt.u64.u32 %rd600, %r788; 2026-02-21T08:44:14.9753347Z cvt.u64.u32 %rd601, %r789; 2026-02-21T08:44:14.9753454Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:44:14.9753552Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:44:14.9753870Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9753968Z mov.b64 {%r1485, %r1486}, %rd603; 2026-02-21T08:44:14.9754096Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T08:44:14.9754414Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9754562Z cvt.u64.u32 %rd604, %r790; 2026-02-21T08:44:14.9754711Z cvt.u64.u32 %rd605, %r791; 2026-02-21T08:44:14.9754813Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:44:14.9754917Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:44:14.9755240Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9755338Z mov.b64 {%r1488, %r1489}, %rd607; 2026-02-21T08:44:14.9755449Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T08:44:14.9755760Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9755913Z cvt.u64.u32 %rd608, %r793; 2026-02-21T08:44:14.9756013Z cvt.u64.u32 %rd609, %r794; 2026-02-21T08:44:14.9756114Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:44:14.9756222Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:44:14.9756575Z .loc 1 48 27 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9756679Z mov.b64 {%r1491, %r1492}, %rd611; 2026-02-21T08:44:14.9756798Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T08:44:14.9757113Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9757211Z cvt.u64.u32 %rd612, %r795; 2026-02-21T08:44:14.9757307Z cvt.u64.u32 %rd613, %r796; 2026-02-21T08:44:14.9757413Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:44:14.9757512Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:44:14.9757826Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9757934Z mov.b64 {%r1494, %r1495}, %rd615; 2026-02-21T08:44:14.9758047Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T08:44:14.9758357Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9758459Z cvt.u64.u32 %rd616, %r797; 2026-02-21T08:44:14.9758557Z cvt.u64.u32 %rd617, %r798; 2026-02-21T08:44:14.9758655Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:44:14.9758753Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:44:14.9759077Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9759178Z mov.b64 {%r1497, %r1498}, %rd619; 2026-02-21T08:44:14.9759289Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T08:44:14.9759614Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9759711Z cvt.u64.u32 %rd620, %r799; 2026-02-21T08:44:14.9759807Z cvt.u64.u32 %rd621, %r800; 2026-02-21T08:44:14.9759913Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:44:14.9760011Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:44:14.9760329Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9760430Z mov.b64 {%r1500, %r1501}, %rd623; 2026-02-21T08:44:14.9760552Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T08:44:14.9760866Z .loc 1 46 52 // 
c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9761040Z cvt.u64.u32 %rd624, %r801; 2026-02-21T08:44:14.9761144Z cvt.u64.u32 %rd625, %r802; 2026-02-21T08:44:14.9761243Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:44:14.9761343Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:44:14.9761661Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9761759Z mov.b64 {%r1503, %r1504}, %rd627; 2026-02-21T08:44:14.9761871Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T08:44:14.9762184Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9762291Z cvt.u64.u32 %rd628, %r803; 2026-02-21T08:44:14.9762387Z cvt.u64.u32 %rd629, %r804; 2026-02-21T08:44:14.9762486Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:44:14.9762596Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:44:14.9762949Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9763052Z mov.b64 {%r1506, %r1507}, %rd631; 2026-02-21T08:44:14.9763171Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T08:44:14.9763483Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9763580Z cvt.u64.u32 %rd632, %r805; 2026-02-21T08:44:14.9763674Z cvt.u64.u32 %rd633, %r806; 2026-02-21T08:44:14.9763779Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:44:14.9763877Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:44:14.9764224Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9764333Z mov.b64 {%r1509, %r1510}, %rd635; 2026-02-21T08:44:14.9764445Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T08:44:14.9764835Z .loc 1 46 52 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:46:52 2026-02-21T08:44:14.9764942Z cvt.u64.u32 %rd636, %r807; 2026-02-21T08:44:14.9765044Z cvt.u64.u32 %rd637, %r808; 2026-02-21T08:44:14.9765148Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:44:14.9765243Z or.b64 
%rd639, %rd636, %rd638; 2026-02-21T08:44:14.9765562Z .loc 1 48 27 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:48:27 2026-02-21T08:44:14.9765662Z mov.b64 {%r1512, %r1513}, %rd639; 2026-02-21T08:44:14.9765773Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T08:44:14.9766102Z .loc 1 49 83 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:49:83 2026-02-21T08:44:14.9766278Z st.shared.v4.b32 [%r13], {%r1133, %r1145, %r1157, %r1169}; 2026-02-21T08:44:14.9766467Z st.shared.v4.b32 [%r13+2048], {%r1325, %r1337, %r1349, %r1361}; 2026-02-21T08:44:14.9766644Z st.shared.v4.b32 [%r14], {%r1181, %r1193, %r1205, %r1217}; 2026-02-21T08:44:14.9766828Z st.shared.v4.b32 [%r14+2048], {%r1373, %r1385, %r1397, %r1409}; 2026-02-21T08:44:14.9766993Z st.shared.v4.b32 [%r15], {%r1229, %r1241, %r1253, %r1265}; 2026-02-21T08:44:14.9767172Z st.shared.v4.b32 [%r15+2048], {%r1421, %r1433, %r1445, %r1457}; 2026-02-21T08:44:14.9767347Z st.shared.v4.b32 [%r16], {%r1277, %r1289, %r1301, %r1313}; 2026-02-21T08:44:14.9767524Z st.shared.v4.b32 [%r16+2048], {%r1469, %r1481, %r1493, %r1505}; 2026-02-21T08:44:14.9767618Z bar.sync 0; 2026-02-21T08:44:14.9767724Z // begin inline asm 2026-02-21T08:44:14.9768014Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r970, %r974, %r978, %r982}, [%r814]; 2026-02-21T08:44:14.9768111Z // end inline asm 2026-02-21T08:44:14.9768211Z // begin inline asm 2026-02-21T08:44:14.9768493Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r986, %r990, %r994, %r998}, [%r819]; 2026-02-21T08:44:14.9768586Z // end inline asm 2026-02-21T08:44:14.9768680Z // begin inline asm 2026-02-21T08:44:14.9768987Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1002, %r1006, %r1010, %r1014}, [%r824]; 2026-02-21T08:44:14.9769082Z // end inline asm 2026-02-21T08:44:14.9769171Z // begin inline asm 2026-02-21T08:44:14.9769470Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1018, %r1022, %r1026, %r1030}, [%r829]; 2026-02-21T08:44:14.9769625Z // end inline asm 2026-02-21T08:44:14.9769720Z // 
begin inline asm 2026-02-21T08:44:14.9770011Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1034, %r1038, %r1042, %r1046}, [%r834]; 2026-02-21T08:44:14.9770102Z // end inline asm 2026-02-21T08:44:14.9770196Z // begin inline asm 2026-02-21T08:44:14.9770470Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1050, %r1054, %r1058, %r1062}, [%r839]; 2026-02-21T08:44:14.9770570Z // end inline asm 2026-02-21T08:44:14.9770663Z // begin inline asm 2026-02-21T08:44:14.9770938Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1066, %r1070, %r1074, %r1078}, [%r844]; 2026-02-21T08:44:14.9771042Z // end inline asm 2026-02-21T08:44:14.9771135Z // begin inline asm 2026-02-21T08:44:14.9771417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1082, %r1086, %r1090, %r1094}, [%r849]; 2026-02-21T08:44:14.9771509Z // end inline asm 2026-02-21T08:44:14.9771655Z bar.sync 0; 2026-02-21T08:44:14.9771822Z st.shared.v4.b32 [%r13], {%r1136, %r1148, %r1160, %r1172}; 2026-02-21T08:44:14.9772006Z st.shared.v4.b32 [%r13+2048], {%r1328, %r1340, %r1352, %r1364}; 2026-02-21T08:44:14.9772182Z st.shared.v4.b32 [%r14], {%r1184, %r1196, %r1208, %r1220}; 2026-02-21T08:44:14.9772360Z st.shared.v4.b32 [%r14+2048], {%r1376, %r1388, %r1400, %r1412}; 2026-02-21T08:44:14.9772522Z st.shared.v4.b32 [%r15], {%r1232, %r1244, %r1256, %r1268}; 2026-02-21T08:44:14.9772706Z st.shared.v4.b32 [%r15+2048], {%r1424, %r1436, %r1448, %r1460}; 2026-02-21T08:44:14.9772916Z st.shared.v4.b32 [%r16], {%r1280, %r1292, %r1304, %r1316}; 2026-02-21T08:44:14.9773094Z st.shared.v4.b32 [%r16+2048], {%r1472, %r1484, %r1496, %r1508}; 2026-02-21T08:44:14.9773191Z bar.sync 0; 2026-02-21T08:44:14.9773287Z // begin inline asm 2026-02-21T08:44:14.9773593Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r971, %r975, %r979, %r983}, [%r814]; 2026-02-21T08:44:14.9773678Z // end inline asm 2026-02-21T08:44:14.9773781Z // begin inline asm 2026-02-21T08:44:14.9774056Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r987, %r991, %r995, %r999}, [%r819]; 
2026-02-21T08:44:14.9774142Z // end inline asm 2026-02-21T08:44:14.9774241Z // begin inline asm 2026-02-21T08:44:14.9774520Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1003, %r1007, %r1011, %r1015}, [%r824]; 2026-02-21T08:44:14.9774606Z // end inline asm 2026-02-21T08:44:14.9774741Z // begin inline asm 2026-02-21T08:44:14.9775033Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1019, %r1023, %r1027, %r1031}, [%r829]; 2026-02-21T08:44:14.9775128Z // end inline asm 2026-02-21T08:44:14.9775222Z // begin inline asm 2026-02-21T08:44:14.9775509Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1035, %r1039, %r1043, %r1047}, [%r834]; 2026-02-21T08:44:14.9775601Z // end inline asm 2026-02-21T08:44:14.9775695Z // begin inline asm 2026-02-21T08:44:14.9775983Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1051, %r1055, %r1059, %r1063}, [%r839]; 2026-02-21T08:44:14.9776077Z // end inline asm 2026-02-21T08:44:14.9776173Z // begin inline asm 2026-02-21T08:44:14.9776454Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1067, %r1071, %r1075, %r1079}, [%r844]; 2026-02-21T08:44:14.9776557Z // end inline asm 2026-02-21T08:44:14.9776646Z // begin inline asm 2026-02-21T08:44:14.9776922Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1083, %r1087, %r1091, %r1095}, [%r849]; 2026-02-21T08:44:14.9777020Z // end inline asm 2026-02-21T08:44:14.9777107Z bar.sync 0; 2026-02-21T08:44:14.9777271Z st.shared.v4.b32 [%r13], {%r1139, %r1151, %r1163, %r1175}; 2026-02-21T08:44:14.9777450Z st.shared.v4.b32 [%r13+2048], {%r1331, %r1343, %r1355, %r1367}; 2026-02-21T08:44:14.9777619Z st.shared.v4.b32 [%r14], {%r1187, %r1199, %r1211, %r1223}; 2026-02-21T08:44:14.9777793Z st.shared.v4.b32 [%r14+2048], {%r1379, %r1391, %r1403, %r1415}; 2026-02-21T08:44:14.9777955Z st.shared.v4.b32 [%r15], {%r1235, %r1247, %r1259, %r1271}; 2026-02-21T08:44:14.9778141Z st.shared.v4.b32 [%r15+2048], {%r1427, %r1439, %r1451, %r1463}; 2026-02-21T08:44:14.9778362Z st.shared.v4.b32 [%r16], {%r1283, %r1295, %r1307, %r1319}; 
2026-02-21T08:44:14.9778537Z st.shared.v4.b32 [%r16+2048], {%r1475, %r1487, %r1499, %r1511}; 2026-02-21T08:44:14.9778631Z bar.sync 0; 2026-02-21T08:44:14.9778727Z // begin inline asm 2026-02-21T08:44:14.9778994Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r972, %r976, %r980, %r984}, [%r814]; 2026-02-21T08:44:14.9779086Z // end inline asm 2026-02-21T08:44:14.9779190Z // begin inline asm 2026-02-21T08:44:14.9779460Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r988, %r992, %r996, %r1000}, [%r819]; 2026-02-21T08:44:14.9779554Z // end inline asm 2026-02-21T08:44:14.9779657Z // begin inline asm 2026-02-21T08:44:14.9779937Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1004, %r1008, %r1012, %r1016}, [%r824]; 2026-02-21T08:44:14.9780027Z // end inline asm 2026-02-21T08:44:14.9780128Z // begin inline asm 2026-02-21T08:44:14.9780461Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1024, %r1028, %r1032}, [%r829]; 2026-02-21T08:44:14.9780551Z // end inline asm 2026-02-21T08:44:14.9780643Z // begin inline asm 2026-02-21T08:44:14.9780926Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1036, %r1040, %r1044, %r1048}, [%r834]; 2026-02-21T08:44:14.9781016Z // end inline asm 2026-02-21T08:44:14.9781108Z // begin inline asm 2026-02-21T08:44:14.9781394Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1052, %r1056, %r1060, %r1064}, [%r839]; 2026-02-21T08:44:14.9781486Z // end inline asm 2026-02-21T08:44:14.9781575Z // begin inline asm 2026-02-21T08:44:14.9781908Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1068, %r1072, %r1076, %r1080}, [%r844]; 2026-02-21T08:44:14.9781997Z // end inline asm 2026-02-21T08:44:14.9782091Z // begin inline asm 2026-02-21T08:44:14.9782363Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1084, %r1088, %r1092, %r1096}, [%r849]; 2026-02-21T08:44:14.9782464Z // end inline asm 2026-02-21T08:44:14.9782594Z bar.sync 0; 2026-02-21T08:44:14.9782763Z st.shared.v4.b32 [%r13], {%r1142, %r1154, %r1166, %r1178}; 2026-02-21T08:44:14.9782954Z st.shared.v4.b32 [%r13+2048], {%r1334, 
%r1346, %r1358, %r1370}; 2026-02-21T08:44:14.9783119Z st.shared.v4.b32 [%r14], {%r1190, %r1202, %r1214, %r1226}; 2026-02-21T08:44:14.9783297Z st.shared.v4.b32 [%r14+2048], {%r1382, %r1394, %r1406, %r1418}; 2026-02-21T08:44:14.9783462Z st.shared.v4.b32 [%r15], {%r1238, %r1250, %r1262, %r1274}; 2026-02-21T08:44:14.9783645Z st.shared.v4.b32 [%r15+2048], {%r1430, %r1442, %r1454, %r1466}; 2026-02-21T08:44:14.9783807Z st.shared.v4.b32 [%r16], {%r1286, %r1298, %r1310, %r1322}; 2026-02-21T08:44:14.9783980Z st.shared.v4.b32 [%r16+2048], {%r1478, %r1490, %r1502, %r1514}; 2026-02-21T08:44:14.9784079Z bar.sync 0; 2026-02-21T08:44:14.9784170Z // begin inline asm 2026-02-21T08:44:14.9784444Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r973, %r977, %r981, %r985}, [%r814]; 2026-02-21T08:44:14.9784545Z // end inline asm 2026-02-21T08:44:14.9784639Z // begin inline asm 2026-02-21T08:44:14.9784956Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r989, %r993, %r997, %r1001}, [%r819]; 2026-02-21T08:44:14.9785049Z // end inline asm 2026-02-21T08:44:14.9785153Z // begin inline asm 2026-02-21T08:44:14.9785433Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1005, %r1009, %r1013, %r1017}, [%r824]; 2026-02-21T08:44:14.9785525Z // end inline asm 2026-02-21T08:44:14.9785621Z // begin inline asm 2026-02-21T08:44:14.9785900Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1021, %r1025, %r1029, %r1033}, [%r829]; 2026-02-21T08:44:14.9785991Z // end inline asm 2026-02-21T08:44:14.9786103Z // begin inline asm 2026-02-21T08:44:14.9786383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1037, %r1041, %r1045, %r1049}, [%r834]; 2026-02-21T08:44:14.9786478Z // end inline asm 2026-02-21T08:44:14.9786573Z // begin inline asm 2026-02-21T08:44:14.9786863Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1053, %r1057, %r1061, %r1065}, [%r839]; 2026-02-21T08:44:14.9786951Z // end inline asm 2026-02-21T08:44:14.9787044Z // begin inline asm 2026-02-21T08:44:14.9787326Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1069, %r1073, %r1077, 
%r1081}, [%r844]; 2026-02-21T08:44:14.9787479Z // end inline asm 2026-02-21T08:44:14.9787570Z // begin inline asm 2026-02-21T08:44:14.9787851Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1085, %r1089, %r1093, %r1097}, [%r849]; 2026-02-21T08:44:14.9787950Z // end inline asm 2026-02-21T08:44:14.9788045Z // begin inline asm 2026-02-21T08:44:14.9788227Z st.global.v4.b32 [ %rd96 + 0 ], { %r970, %r971, %r972, %r973 }; 2026-02-21T08:44:14.9788326Z // end inline asm 2026-02-21T08:44:14.9788422Z // begin inline asm 2026-02-21T08:44:14.9788602Z st.global.v4.b32 [ %rd97 + 0 ], { %r974, %r975, %r976, %r977 }; 2026-02-21T08:44:14.9788703Z // end inline asm 2026-02-21T08:44:14.9788797Z // begin inline asm 2026-02-21T08:44:14.9788970Z st.global.v4.b32 [ %rd98 + 0 ], { %r978, %r979, %r980, %r981 }; 2026-02-21T08:44:14.9789066Z // end inline asm 2026-02-21T08:44:14.9789170Z // begin inline asm 2026-02-21T08:44:14.9789389Z st.global.v4.b32 [ %rd99 + 0 ], { %r982, %r983, %r984, %r985 }; 2026-02-21T08:44:14.9789482Z // end inline asm 2026-02-21T08:44:14.9789578Z // begin inline asm 2026-02-21T08:44:14.9789754Z st.global.v4.b32 [ %rd100 + 0 ], { %r986, %r987, %r988, %r989 }; 2026-02-21T08:44:14.9789847Z // end inline asm 2026-02-21T08:44:14.9789935Z // begin inline asm 2026-02-21T08:44:14.9790118Z st.global.v4.b32 [ %rd101 + 0 ], { %r990, %r991, %r992, %r993 }; 2026-02-21T08:44:14.9790210Z // end inline asm 2026-02-21T08:44:14.9790297Z // begin inline asm 2026-02-21T08:44:14.9790530Z st.global.v4.b32 [ %rd102 + 0 ], { %r994, %r995, %r996, %r997 }; 2026-02-21T08:44:14.9790623Z // end inline asm 2026-02-21T08:44:14.9790717Z // begin inline asm 2026-02-21T08:44:14.9790908Z st.global.v4.b32 [ %rd103 + 0 ], { %r998, %r999, %r1000, %r1001 }; 2026-02-21T08:44:14.9790994Z // end inline asm 2026-02-21T08:44:14.9791156Z // begin inline asm 2026-02-21T08:44:14.9791343Z st.global.v4.b32 [ %rd104 + 0 ], { %r1002, %r1003, %r1004, %r1005 }; 2026-02-21T08:44:14.9791446Z // end inline asm 
2026-02-21T08:44:14.9791537Z // begin inline asm 2026-02-21T08:44:14.9791722Z st.global.v4.b32 [ %rd105 + 0 ], { %r1006, %r1007, %r1008, %r1009 }; 2026-02-21T08:44:14.9791818Z // end inline asm 2026-02-21T08:44:14.9791911Z // begin inline asm 2026-02-21T08:44:14.9792094Z st.global.v4.b32 [ %rd106 + 0 ], { %r1010, %r1011, %r1012, %r1013 }; 2026-02-21T08:44:14.9792184Z // end inline asm 2026-02-21T08:44:14.9792287Z // begin inline asm 2026-02-21T08:44:14.9792466Z st.global.v4.b32 [ %rd107 + 0 ], { %r1014, %r1015, %r1016, %r1017 }; 2026-02-21T08:44:14.9792555Z // end inline asm 2026-02-21T08:44:14.9792658Z // begin inline asm 2026-02-21T08:44:14.9792839Z st.global.v4.b32 [ %rd108 + 0 ], { %r1018, %r1019, %r1020, %r1021 }; 2026-02-21T08:44:14.9792924Z // end inline asm 2026-02-21T08:44:14.9793017Z // begin inline asm 2026-02-21T08:44:14.9793208Z st.global.v4.b32 [ %rd109 + 0 ], { %r1022, %r1023, %r1024, %r1025 }; 2026-02-21T08:44:14.9793301Z // end inline asm 2026-02-21T08:44:14.9793389Z // begin inline asm 2026-02-21T08:44:14.9793581Z st.global.v4.b32 [ %rd110 + 0 ], { %r1026, %r1027, %r1028, %r1029 }; 2026-02-21T08:44:14.9793672Z // end inline asm 2026-02-21T08:44:14.9793767Z // begin inline asm 2026-02-21T08:44:14.9793956Z st.global.v4.b32 [ %rd111 + 0 ], { %r1030, %r1031, %r1032, %r1033 }; 2026-02-21T08:44:14.9794042Z // end inline asm 2026-02-21T08:44:14.9794136Z // begin inline asm 2026-02-21T08:44:14.9794314Z st.global.v4.b32 [ %rd112 + 0 ], { %r1034, %r1035, %r1036, %r1037 }; 2026-02-21T08:44:14.9794412Z // end inline asm 2026-02-21T08:44:14.9794502Z // begin inline asm 2026-02-21T08:44:14.9794738Z st.global.v4.b32 [ %rd113 + 0 ], { %r1038, %r1039, %r1040, %r1041 }; 2026-02-21T08:44:14.9794836Z // end inline asm 2026-02-21T08:44:14.9794925Z // begin inline asm 2026-02-21T08:44:14.9795108Z st.global.v4.b32 [ %rd114 + 0 ], { %r1042, %r1043, %r1044, %r1045 }; 2026-02-21T08:44:14.9795198Z // end inline asm 2026-02-21T08:44:14.9795305Z // begin inline asm 
2026-02-21T08:44:14.9795539Z st.global.v4.b32 [ %rd115 + 0 ], { %r1046, %r1047, %r1048, %r1049 }; 2026-02-21T08:44:14.9795631Z // end inline asm 2026-02-21T08:44:14.9795733Z // begin inline asm 2026-02-21T08:44:14.9795912Z st.global.v4.b32 [ %rd116 + 0 ], { %r1050, %r1051, %r1052, %r1053 }; 2026-02-21T08:44:14.9796000Z // end inline asm 2026-02-21T08:44:14.9796100Z // begin inline asm 2026-02-21T08:44:14.9796277Z st.global.v4.b32 [ %rd117 + 0 ], { %r1054, %r1055, %r1056, %r1057 }; 2026-02-21T08:44:14.9796370Z // end inline asm 2026-02-21T08:44:14.9796462Z // begin inline asm 2026-02-21T08:44:14.9796648Z st.global.v4.b32 [ %rd118 + 0 ], { %r1058, %r1059, %r1060, %r1061 }; 2026-02-21T08:44:14.9796738Z // end inline asm 2026-02-21T08:44:14.9796833Z // begin inline asm 2026-02-21T08:44:14.9797019Z st.global.v4.b32 [ %rd119 + 0 ], { %r1062, %r1063, %r1064, %r1065 }; 2026-02-21T08:44:14.9797109Z // end inline asm 2026-02-21T08:44:14.9797198Z // begin inline asm 2026-02-21T08:44:14.9797425Z st.global.v4.b32 [ %rd120 + 0 ], { %r1066, %r1067, %r1068, %r1069 }; 2026-02-21T08:44:14.9797521Z // end inline asm 2026-02-21T08:44:14.9797613Z // begin inline asm 2026-02-21T08:44:14.9797794Z st.global.v4.b32 [ %rd121 + 0 ], { %r1070, %r1071, %r1072, %r1073 }; 2026-02-21T08:44:14.9797895Z // end inline asm 2026-02-21T08:44:14.9797986Z // begin inline asm 2026-02-21T08:44:14.9798165Z st.global.v4.b32 [ %rd122 + 0 ], { %r1074, %r1075, %r1076, %r1077 }; 2026-02-21T08:44:14.9798260Z // end inline asm 2026-02-21T08:44:14.9798391Z // begin inline asm 2026-02-21T08:44:14.9798571Z st.global.v4.b32 [ %rd123 + 0 ], { %r1078, %r1079, %r1080, %r1081 }; 2026-02-21T08:44:14.9798658Z // end inline asm 2026-02-21T08:44:14.9798771Z // begin inline asm 2026-02-21T08:44:14.9798953Z st.global.v4.b32 [ %rd124 + 0 ], { %r1082, %r1083, %r1084, %r1085 }; 2026-02-21T08:44:14.9799041Z // end inline asm 2026-02-21T08:44:14.9799184Z // begin inline asm 2026-02-21T08:44:14.9799370Z st.global.v4.b32 [ %rd125 + 0 
], { %r1086, %r1087, %r1088, %r1089 }; 2026-02-21T08:44:14.9799466Z // end inline asm 2026-02-21T08:44:14.9799564Z // begin inline asm 2026-02-21T08:44:14.9799753Z st.global.v4.b32 [ %rd126 + 0 ], { %r1090, %r1091, %r1092, %r1093 }; 2026-02-21T08:44:14.9799841Z // end inline asm 2026-02-21T08:44:14.9799928Z // begin inline asm 2026-02-21T08:44:14.9800117Z st.global.v4.b32 [ %rd127 + 0 ], { %r1094, %r1095, %r1096, %r1097 }; 2026-02-21T08:44:14.9800204Z // end inline asm 2026-02-21T08:44:14.9800345Z $L__BB0_8: // %._crit_edge 2026-02-21T08:44:14.9800684Z .loc 1 20 4 // c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py:20:4 2026-02-21T08:44:14.9800775Z bar.sync 0; 2026-02-21T08:44:14.9800861Z // begin inline asm 2026-02-21T08:44:14.9801078Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1515, 512; 2026-02-21T08:44:14.9801181Z // end inline asm 2026-02-21T08:44:14.9801263Z ret; 2026-02-21T08:44:14.9801353Z $L__tmp0: 2026-02-21T08:44:14.9801456Z $L__func_end0: 2026-02-21T08:44:14.9801598Z // -- End function 2026-02-21T08:44:14.9801684Z } 2026-02-21T08:44:14.9802072Z .file 1 "/tmp/torchinductor_root/3n/c3nnb4coikumoajcuglpos7vcds4j4y5xkbqw33kgxfjkkc67xay.py" 2026-02-21T08:44:14.9802185Z .section .debug_abbrev 2026-02-21T08:44:14.9802264Z { 2026-02-21T08:44:14.9802407Z .b8 1 // Abbreviation Code 2026-02-21T08:44:14.9802565Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:44:14.9802702Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:44:14.9802838Z .b8 37 // DW_AT_producer 2026-02-21T08:44:14.9802974Z .b8 8 // DW_FORM_string 2026-02-21T08:44:14.9803099Z .b8 19 // DW_AT_language 2026-02-21T08:44:14.9803228Z .b8 5 // DW_FORM_data2 2026-02-21T08:44:14.9803353Z .b8 3 // DW_AT_name 2026-02-21T08:44:14.9803531Z .b8 8 // DW_FORM_string 2026-02-21T08:44:14.9803663Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:44:14.9803792Z .b8 6 // DW_FORM_data4 2026-02-21T08:44:14.9803929Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:44:14.9804052Z .b8 8 // DW_FORM_string 2026-02-21T08:44:14.9804167Z .b8 0 
// EOM(1) 2026-02-21T08:44:14.9804288Z .b8 0 // EOM(2) 2026-02-21T08:44:14.9804398Z .b8 0 // EOM(3) 2026-02-21T08:44:14.9804478Z } 2026-02-21T08:44:14.9804576Z .section .debug_info 2026-02-21T08:44:14.9804666Z { 2026-02-21T08:44:14.9804843Z .b32 104 // Length of Unit 2026-02-21T08:44:14.9804985Z .b8 2 // DWARF version number 2026-02-21T08:44:14.9805127Z .b8 0 2026-02-21T08:44:14.9805333Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:44:14.9805488Z .b8 8 // Address Size (in bytes) 2026-02-21T08:44:14.9805666Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:44:14.9805798Z .b8 116 // DW_AT_producer 2026-02-21T08:44:14.9805882Z .b8 114 2026-02-21T08:44:14.9805965Z .b8 105 2026-02-21T08:44:14.9806055Z .b8 116 2026-02-21T08:44:14.9806137Z .b8 111 2026-02-21T08:44:14.9806257Z .b8 110 2026-02-21T08:44:14.9806348Z .b8 0 2026-02-21T08:44:14.9806474Z .b8 2 // DW_AT_language 2026-02-21T08:44:14.9806555Z .b8 0 2026-02-21T08:44:14.9806680Z .b8 99 // DW_AT_name 2026-02-21T08:44:14.9806768Z .b8 51 2026-02-21T08:44:14.9806850Z .b8 110 2026-02-21T08:44:14.9806979Z .b8 110 2026-02-21T08:44:14.9807064Z .b8 98 2026-02-21T08:44:14.9807145Z .b8 52 2026-02-21T08:44:14.9807229Z .b8 99 2026-02-21T08:44:14.9807315Z .b8 111 2026-02-21T08:44:14.9807406Z .b8 105 2026-02-21T08:44:14.9807485Z .b8 107 2026-02-21T08:44:14.9807565Z .b8 117 2026-02-21T08:44:14.9807653Z .b8 109 2026-02-21T08:44:14.9807734Z .b8 111 2026-02-21T08:44:14.9807813Z .b8 97 2026-02-21T08:44:14.9807893Z .b8 106 2026-02-21T08:44:14.9807983Z .b8 99 2026-02-21T08:44:14.9808065Z .b8 117 2026-02-21T08:44:14.9808148Z .b8 103 2026-02-21T08:44:14.9808231Z .b8 108 2026-02-21T08:44:14.9808330Z .b8 112 2026-02-21T08:44:14.9808411Z .b8 111 2026-02-21T08:44:14.9808492Z .b8 115 2026-02-21T08:44:14.9808579Z .b8 55 2026-02-21T08:44:14.9808656Z .b8 118 2026-02-21T08:44:14.9808736Z .b8 99 2026-02-21T08:44:14.9808814Z .b8 100 2026-02-21T08:44:14.9808901Z .b8 115 2026-02-21T08:44:14.9808978Z .b8 52 
2026-02-21T08:44:14.9809057Z .b8 106 2026-02-21T08:44:14.9809142Z .b8 52 2026-02-21T08:44:14.9809218Z .b8 121 2026-02-21T08:44:14.9809299Z .b8 53 2026-02-21T08:44:14.9809377Z .b8 120 2026-02-21T08:44:14.9809466Z .b8 107 2026-02-21T08:44:14.9809545Z .b8 98 2026-02-21T08:44:14.9809623Z .b8 113 2026-02-21T08:44:14.9809703Z .b8 119 2026-02-21T08:44:14.9809790Z .b8 51 2026-02-21T08:44:14.9809867Z .b8 51 2026-02-21T08:44:14.9809949Z .b8 107 2026-02-21T08:44:14.9810038Z .b8 103 2026-02-21T08:44:14.9810121Z .b8 120 2026-02-21T08:44:14.9810202Z .b8 102 2026-02-21T08:44:14.9810279Z .b8 106 2026-02-21T08:44:14.9810365Z .b8 107 2026-02-21T08:44:14.9810443Z .b8 107 2026-02-21T08:44:14.9810521Z .b8 99 2026-02-21T08:44:14.9810607Z .b8 54 2026-02-21T08:44:14.9810687Z .b8 55 2026-02-21T08:44:14.9810766Z .b8 120 2026-02-21T08:44:14.9810845Z .b8 97 2026-02-21T08:44:14.9810933Z .b8 121 2026-02-21T08:44:14.9811010Z .b8 46 2026-02-21T08:44:14.9811091Z .b8 112 2026-02-21T08:44:14.9811177Z .b8 121 2026-02-21T08:44:14.9811259Z .b8 0 2026-02-21T08:44:14.9811414Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:44:14.9811543Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:44:14.9811632Z .b8 116 2026-02-21T08:44:14.9811714Z .b8 109 2026-02-21T08:44:14.9811850Z .b8 112 2026-02-21T08:44:14.9811936Z .b8 47 2026-02-21T08:44:14.9812013Z .b8 116 2026-02-21T08:44:14.9812094Z .b8 111 2026-02-21T08:44:14.9812173Z .b8 114 2026-02-21T08:44:14.9812261Z .b8 99 2026-02-21T08:44:14.9812341Z .b8 104 2026-02-21T08:44:14.9812421Z .b8 105 2026-02-21T08:44:14.9812500Z .b8 110 2026-02-21T08:44:14.9812588Z .b8 100 2026-02-21T08:44:14.9812665Z .b8 117 2026-02-21T08:44:14.9812740Z .b8 99 2026-02-21T08:44:14.9812825Z .b8 116 2026-02-21T08:44:14.9812900Z .b8 111 2026-02-21T08:44:14.9812976Z .b8 114 2026-02-21T08:44:14.9813051Z .b8 95 2026-02-21T08:44:14.9813134Z .b8 114 2026-02-21T08:44:14.9813212Z .b8 111 2026-02-21T08:44:14.9813287Z .b8 111 2026-02-21T08:44:14.9813370Z .b8 116 2026-02-21T08:44:14.9813443Z .b8 47 
2026-02-21T08:44:14.9813515Z .b8 51 2026-02-21T08:44:14.9813582Z .b8 110 2026-02-21T08:44:14.9813662Z .b8 0 2026-02-21T08:44:14.9813735Z } 2026-02-21T08:44:14.9813837Z .section .debug_macinfo { } 2026-02-21T08:44:14.9813848Z 2026-02-21T08:44:14.9814018Z ================================================================ 2026-02-21T08:44:14.9814194Z please share the reproducer above with Triton project. 2026-02-21T08:44:15.0725449Z 2026-02-21T08:44:15.0726849Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 90/90 21.6 configs/s 2026-02-21T08:44:16.7986091Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 591.9 2026-02-21T08:44:16.7986624Z configs/s 2026-02-21T08:44:16.9505767Z [70s] Generation 3 complete: 2026-02-21T08:44:16.9506133Z error=30 2026-02-21T08:44:16.9506349Z ok=65 2026-02-21T08:44:16.9506549Z min=0.0430 2026-02-21T08:44:16.9506763Z mid=0.0799 2026-02-21T08:44:16.9506958Z max=1.2033 2026-02-21T08:44:16.9507192Z best={'block_sizes': [256, 256, 32], 2026-02-21T08:44:16.9508032Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T08:44:16.9508481Z 'l2_groupings': [2], 2026-02-21T08:44:16.9508769Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:44:16.9509119Z 'loop_orders': [[1, 0]], 2026-02-21T08:44:16.9509411Z 'num_stages': 5, 2026-02-21T08:44:16.9509641Z 'num_warps': 8, 2026-02-21T08:44:16.9509901Z 'pid_type': 'flat', 2026-02-21T08:44:16.9510157Z 'range_flattens': [None, False], 2026-02-21T08:44:16.9510462Z 'range_multi_buffers': [None, None], 2026-02-21T08:44:16.9510786Z 'range_num_stages': [0, 0], 2026-02-21T08:44:16.9511093Z 'range_unroll_factors': [0, 0], 2026-02-21T08:44:16.9511431Z 'range_warp_specializes': [None, None]} 2026-02-21T08:44:16.9546598Z [70s] Fitting surrogate: 380 points, 380 targets 2026-02-21T08:44:18.6488620Z [72s] Generation 4 starting: 85 neighbors, 5 active search path(s) 2026-02-21T08:44:54.8557470Z [108s] Timeout after 30s compiling Config(block_sizes=[512, 512, 
32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=7, num_warps=2, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T08:44:54.8576900Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 0.2 configs/s 2026-02-21T08:44:58.9833345Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 20.4 configs/s 2026-02-21T08:45:01.3978634Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 417.2 2026-02-21T08:45:01.3979380Z configs/s 2026-02-21T08:45:01.5361147Z [115s] Generation 4 complete: 2026-02-21T08:45:01.5361566Z error=14 2026-02-21T08:45:01.5361854Z timeout=1 2026-02-21T08:45:01.5362082Z ok=75 2026-02-21T08:45:01.5362337Z min=0.0409 2026-02-21T08:45:01.5362564Z mid=0.0697 2026-02-21T08:45:01.5362832Z max=0.8294 2026-02-21T08:45:01.5363159Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:01.5365295Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:01.5365735Z 'l2_groupings': [4], 2026-02-21T08:45:01.5366276Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:01.5366647Z 'loop_orders': [[1, 0]], 2026-02-21T08:45:01.5366982Z 'num_stages': 3, 2026-02-21T08:45:01.5367277Z 'num_warps': 4, 2026-02-21T08:45:01.5367547Z 'pid_type': 'flat', 2026-02-21T08:45:01.5367902Z 'range_flattens': [None, True], 2026-02-21T08:45:01.5368241Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:01.5368632Z 'range_num_stages': [0, 0], 2026-02-21T08:45:01.5368950Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:01.5369481Z 'range_warp_specializes': [None, False]} 2026-02-21T08:45:01.5385070Z [115s] Fitting surrogate: 470 points, 470 targets 2026-02-21T08:45:02.6090346Z [116s] Generation 5 starting: 62 neighbors, 4 active search path(s) 2026-02-21T08:45:07.6085387Z Generation 5: precompiling 
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63/63 20.4 configs/s 2026-02-21T08:45:11.2465123Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 63/63 17.5 configs/s 2026-02-21T08:45:14.4896972Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 379.2 2026-02-21T08:45:14.4897652Z configs/s 2026-02-21T08:45:14.6563426Z [128s] Generation 5 complete: 2026-02-21T08:45:14.6566889Z error=4 2026-02-21T08:45:14.6570011Z ok=62 2026-02-21T08:45:14.6574051Z min=0.0410 2026-02-21T08:45:14.6577893Z mid=0.0655 2026-02-21T08:45:14.6580962Z max=10.1796 2026-02-21T08:45:14.6585207Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:14.6589109Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:14.6590880Z 'l2_groupings': [4], 2026-02-21T08:45:14.6591116Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:14.6591399Z 'loop_orders': [[1, 0]], 2026-02-21T08:45:14.6591595Z 'num_stages': 3, 2026-02-21T08:45:14.6591813Z 'num_warps': 4, 2026-02-21T08:45:14.6592361Z 'pid_type': 'flat', 2026-02-21T08:45:14.6592599Z 'range_flattens': [None, True], 2026-02-21T08:45:14.6592837Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:14.6593100Z 'range_num_stages': [0, 0], 2026-02-21T08:45:14.6593327Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:14.6593545Z 'range_warp_specializes': [None, False]} 2026-02-21T08:45:14.6593821Z [128s] Fitting surrogate: 536 points, 536 targets 2026-02-21T08:45:15.5414875Z [129s] Generation 6 starting: 45 neighbors, 3 active search path(s) 2026-02-21T08:45:18.7944133Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46/46 23.0 configs/s 2026-02-21T08:45:20.7460270Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 46/46 23.3 configs/s 2026-02-21T08:45:23.2833514Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 397.5 2026-02-21T08:45:23.2837086Z configs/s 2026-02-21T08:45:23.4472513Z [137s] Generation 6 complete: 2026-02-21T08:45:23.4478539Z error=13 2026-02-21T08:45:23.4478892Z ok=35 
2026-02-21T08:45:23.4479226Z min=0.0410 2026-02-21T08:45:23.4479821Z mid=0.0471 2026-02-21T08:45:23.4480121Z max=0.7209 2026-02-21T08:45:23.4480486Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:23.4480986Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:23.4481509Z 'l2_groupings': [4], 2026-02-21T08:45:23.4481942Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:23.4482413Z 'loop_orders': [[1, 0]], 2026-02-21T08:45:23.4482835Z 'num_stages': 3, 2026-02-21T08:45:23.4483171Z 'num_warps': 4, 2026-02-21T08:45:23.4483526Z 'pid_type': 'flat', 2026-02-21T08:45:23.4483879Z 'range_flattens': [None, True], 2026-02-21T08:45:23.4484317Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:23.4485127Z 'range_num_stages': [0, 0], 2026-02-21T08:45:23.4485546Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:23.4485959Z 'range_warp_specializes': [None, False]} 2026-02-21T08:45:23.4494705Z [137s] Fitting surrogate: 584 points, 584 targets 2026-02-21T08:45:23.8750718Z [137s] Generation 7 starting: 15 neighbors, 1 active search path(s) 2026-02-21T08:45:32.2303073Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 0.7 configs/s 2026-02-21T08:45:33.0517319Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 15/15 19.5 configs/s 2026-02-21T08:45:34.0092536Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1038.6 2026-02-21T08:45:34.0096325Z configs/s 2026-02-21T08:45:34.0821313Z [148s] Generation 7 complete: 2026-02-21T08:45:34.0822004Z error=1 2026-02-21T08:45:34.0822326Z ok=16 2026-02-21T08:45:34.0822612Z min=0.0409 2026-02-21T08:45:34.0822944Z mid=0.0471 2026-02-21T08:45:34.0823229Z max=0.8500 2026-02-21T08:45:34.0823618Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:34.0825026Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:34.0825514Z 'l2_groupings': [4], 2026-02-21T08:45:34.0825937Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:34.0826397Z 'loop_orders': [[1, 
0]], 2026-02-21T08:45:34.0826799Z 'num_stages': 3, 2026-02-21T08:45:34.0827119Z 'num_warps': 4, 2026-02-21T08:45:34.0827474Z 'pid_type': 'flat', 2026-02-21T08:45:34.0827859Z 'range_flattens': [None, True], 2026-02-21T08:45:34.0828273Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:34.0828733Z 'range_num_stages': [0, 0], 2026-02-21T08:45:34.0829116Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:34.0829569Z 'range_warp_specializes': [None, False]} 2026-02-21T08:45:34.0844788Z [148s] Fitting surrogate: 601 points, 601 targets 2026-02-21T08:45:34.5537970Z [148s] Generation 8 starting: 15 neighbors, 1 active search path(s) 2026-02-21T08:45:36.0824056Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 10.8 configs/s 2026-02-21T08:45:36.6866088Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 15/15 23.9 configs/s 2026-02-21T08:45:37.3650303Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1455.9 2026-02-21T08:45:37.3651039Z configs/s 2026-02-21T08:45:37.4198613Z [151s] Generation 8 complete: 2026-02-21T08:45:37.4199017Z error=5 2026-02-21T08:45:37.4199343Z ok=12 2026-02-21T08:45:37.4199661Z min=0.0410 2026-02-21T08:45:37.4199952Z mid=0.0471 2026-02-21T08:45:37.4200275Z max=1.8668 2026-02-21T08:45:37.4200594Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:37.4201089Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:37.4201538Z 'l2_groupings': [4], 2026-02-21T08:45:37.4201958Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:37.4202384Z 'loop_orders': [[1, 0]], 2026-02-21T08:45:37.4202754Z 'num_stages': 3, 2026-02-21T08:45:37.4203057Z 'num_warps': 4, 2026-02-21T08:45:37.4203393Z 'pid_type': 'flat', 2026-02-21T08:45:37.4203763Z 'range_flattens': [None, True], 2026-02-21T08:45:37.4204172Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:37.4225579Z 'range_num_stages': [0, 0], 2026-02-21T08:45:37.4225953Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:37.4226553Z 
'range_warp_specializes': [None, False]} 2026-02-21T08:45:37.4226979Z [151s] Fitting surrogate: 618 points, 618 targets 2026-02-21T08:45:37.8645753Z [151s] Generation 9 starting: 14 neighbors, 1 active search path(s) 2026-02-21T08:45:39.3623452Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14/14 16.3 configs/s 2026-02-21T08:45:40.0198397Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 14/14 23.4 configs/s 2026-02-21T08:45:40.7890130Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1289.8 2026-02-21T08:45:40.7890778Z configs/s 2026-02-21T08:45:40.8492799Z [154s] Generation 9 complete: 2026-02-21T08:45:40.8493144Z error=3 2026-02-21T08:45:40.8493399Z ok=13 2026-02-21T08:45:40.8493773Z min=0.0410 2026-02-21T08:45:40.8495828Z mid=0.0451 2026-02-21T08:45:40.8496156Z max=1.5432 2026-02-21T08:45:40.8501345Z best={'block_sizes': [128, 128, 64], 2026-02-21T08:45:40.8501990Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:45:40.8502274Z 'l2_groupings': [4], 2026-02-21T08:45:40.8502525Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T08:45:40.8502774Z 'loop_orders': [[1, 0]], 2026-02-21T08:45:40.8503003Z 'num_stages': 3, 2026-02-21T08:45:40.8503190Z 'num_warps': 4, 2026-02-21T08:45:40.8503398Z 'pid_type': 'flat', 2026-02-21T08:45:40.8503591Z 'range_flattens': [None, True], 2026-02-21T08:45:40.8503839Z 'range_multi_buffers': [None, None], 2026-02-21T08:45:40.8504193Z 'range_num_stages': [0, 0], 2026-02-21T08:45:40.8504432Z 'range_unroll_factors': [0, 0], 2026-02-21T08:45:40.8504879Z 'range_warp_specializes': [None, False]} 2026-02-21T08:45:40.8517785Z [154s] Fitting surrogate: 634 points, 634 targets 2026-02-21T08:45:41.1380888Z [155s] Autotuning complete in 155.1s after searching 606 configs. 
2026-02-21T08:45:41.1382197Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:45:41.1383296Z @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=3, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T08:45:41.1384210Z 2026-02-21T08:45:41.1384490Z [155s] Code of selected kernel: /tmp/torchinductor_root/di/cditdwuxt6ikc7mej4zj2az27q7t5hjspusyd44r5co6ho7sjl6v.py 2026-02-21T08:46:08.0882518Z WARNING:tritonbench.utils.triton_op:Completed input ID 9: 2026-02-21T08:46:08.0888742Z (M, N, K) 2026-02-21T08:46:08.0892688Z ------------------- 2026-02-21T08:46:08.0896703Z (1024, 12288, 1024) 2026-02-21T08:46:08.0898509Z 2026-02-21T08:46:08.0899489Z 88%|████████▊ | 7/8 [35:57<05:21, 321.44s/it]WARNING:tritonbench.utils.triton_op:Running input ID 11: 2026-02-21T08:46:08.0899966Z (M, N, K) 2026-02-21T08:46:08.0902649Z ------------------- 2026-02-21T08:46:08.0902879Z (2048, 12288, 2048) 2026-02-21T08:46:08.0903196Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T08:46:54.7977746Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T08:47:32.2664818Z Autotune Choices Stats: 2026-02-21T08:47:32.2668245Z {"num_choices": 19, "num_triton_choices": 19, "best_kernel": "triton_mm_150", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.13609600067138672, "best_triton_pos": 0} 2026-02-21T08:47:32.2678852Z AUTOTUNE mm(2048x2048, 2048x12288) 2026-02-21T08:47:32.2679465Z strides: [2048, 1], [1, 2048] 
2026-02-21T08:47:32.2683257Z dtypes: torch.float16, torch.float16 2026-02-21T08:47:32.2684634Z triton_mm_150 0.1361 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:47:32.2687036Z triton_mm_151 0.1465 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T08:47:32.2690426Z triton_mm_144 0.1496 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:47:32.2692158Z triton_mm_149 0.1647 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:47:32.2693932Z triton_mm_143 0.1863 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:47:32.2696007Z triton_mm_147 0.1884 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T08:47:32.2697805Z triton_mm_142 0.1904 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:47:32.2699615Z triton_mm_140 0.1944 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2026-02-21T08:47:32.2701610Z triton_mm_146 0.2056 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T08:47:32.2703392Z triton_mm_139 0.2375 
ms 57.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2026-02-21T08:47:32.2704872Z SingleProcess AUTOTUNE benchmarking takes 0.6174 seconds and 0.2499 seconds precompiling for 19 choices 2026-02-21T08:47:32.5500603Z INFO:tritonbench.utils.triton_op:Took 1323.73ms to get benchmark function for pt2_triton_matmul 2026-02-21T08:48:14.2941496Z WARNING:__main__:Input tensor metadata: 2026-02-21T08:48:14.2942215Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T08:48:14.2942498Z 'dtype': 'torch.float16', 2026-02-21T08:48:14.2942858Z 'shape': (2048, 2048), 2026-02-21T08:48:14.2946565Z 'stride': (2048, 1)}, 2026-02-21T08:48:14.2949199Z { 'device': 'cuda:0', 2026-02-21T08:48:14.2949819Z 'dtype': 'torch.float16', 2026-02-21T08:48:14.2950371Z 'shape': (2048, 12288), 2026-02-21T08:48:14.2950852Z 'stride': (1, 2048)}, 2026-02-21T08:48:14.2951346Z None), 2026-02-21T08:48:14.2951751Z 'kwargs': {}} 2026-02-21T08:48:14.2996505Z INFO:tritonbench.utils.triton_op:Took 5.81ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T08:48:14.3909882Z [0s] Autotune random seed: 2134884919 2026-02-21T08:48:14.5209336Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T08:48:21.7412456Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 23.3 configs/s 2026-02-21T08:48:48.6262782Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 3.7 configs/s 2026-02-21T08:48:48.6275921Z [34s] Adaptive compile timeout: 30s (90% percentile=5.3s, bounds=[30.0s, 30s]) 2026-02-21T08:48:48.6280453Z [34s] Initial random population of 100, 5 starting points: 2026-02-21T08:48:48.6282540Z error=13 2026-02-21T08:48:48.6282801Z ok=87 2026-02-21T08:48:48.6287462Z min=0.3931 2026-02-21T08:48:48.6292553Z mid=7.0246 2026-02-21T08:48:48.6296434Z max=802.5815 2026-02-21T08:48:48.6300938Z 
best={'block_sizes': [64, 128, 16], 2026-02-21T08:48:48.6304956Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:48:48.6308099Z 'l2_groupings': [4], 2026-02-21T08:48:48.6310212Z 'load_eviction_policies': ['', ''], 2026-02-21T08:48:48.6310803Z 'loop_orders': [[1, 0]], 2026-02-21T08:48:48.6311009Z 'num_stages': 8, 2026-02-21T08:48:48.6311223Z 'num_warps': 2, 2026-02-21T08:48:48.6311432Z 'pid_type': 'flat', 2026-02-21T08:48:48.6311628Z 'range_flattens': [None, None], 2026-02-21T08:48:48.6311878Z 'range_multi_buffers': [None, None], 2026-02-21T08:48:48.6312107Z 'range_num_stages': [0, 0], 2026-02-21T08:48:48.6312338Z 'range_unroll_factors': [0, 0], 2026-02-21T08:48:48.6312553Z 'range_warp_specializes': [None, None]} 2026-02-21T08:48:48.6312839Z [34s] Fitting surrogate: 100 points, 100 targets 2026-02-21T08:48:49.8469875Z [35s] Generation 1 starting: 79 neighbors, 5 active search path(s) 2026-02-21T08:48:56.0333712Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 42.3 configs/s 2026-02-21T08:49:00.1933413Z [45s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:49:00.1933941Z 2026-02-21T08:49:00.1934206Z 2026-02-21T08:49:00.1934375Z ================================================================ 2026-02-21T08:49:00.1935022Z Internal Triton PTX codegen error 2026-02-21T08:49:00.1935364Z `ptxas` stderr: 2026-02-21T08:49:00.1936172Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:49:00.1937036Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:00.1937344Z 2026-02-21T08:49:00.1938204Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpznlw3zqi.ptx -o /tmp/tmpznlw3zqi.ptx.o 2026-02-21T08:49:00.1939050Z 2026-02-21T08:49:00.1939056Z 2026-02-21T08:49:00.1939165Z // 2026-02-21T08:49:00.1939441Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:49:00.1939809Z // 2026-02-21T08:49:00.1939934Z 2026-02-21T08:49:00.1940046Z .version 8.7 2026-02-21T08:49:00.1940350Z .target sm_100a 2026-02-21T08:49:00.1940652Z .address_size 64 2026-02-21T08:49:00.1940811Z 2026-02-21T08:49:00.1941026Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:49:00.1941540Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:49:00.1941932Z // @_helion_matmul 2026-02-21T08:49:00.1942340Z .visible .entry _helion_matmul( 2026-02-21T08:49:00.1942753Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:49:00.1943246Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:49:00.1943759Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:49:00.1944217Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:49:00.1944760Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:49:00.1945153Z ) 2026-02-21T08:49:00.1945533Z .reqntid 128 2026-02-21T08:49:00.1945791Z .maxnreg 32 2026-02-21T08:49:00.1946063Z { 2026-02-21T08:49:00.1946306Z .reg .pred %p<120>; 2026-02-21T08:49:00.1946619Z .reg .b32 %r<402>; 2026-02-21T08:49:00.1946891Z .reg .b64 %rd<136>; 2026-02-21T08:49:00.1947432Z .loc 1 19 0 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:19:0 2026-02-21T08:49:00.1947993Z $L__func_begin0: 2026-02-21T08:49:00.1948456Z .loc 1 19 0 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:19:0 2026-02-21T08:49:00.1948922Z 
2026-02-21T08:49:00.1949029Z // %bb.0: 2026-02-21T08:49:00.1949321Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T08:49:00.1949711Z $L__tmp0: 2026-02-21T08:49:00.1950166Z .loc 1 19 0 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:19 2026-02-21T08:49:00.1950724Z mov.u32 %r1, %tid.x; 2026-02-21T08:49:00.1951071Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T08:49:00.1951460Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:49:00.1951808Z mov.b32 %r30, global_smem; 2026-02-21T08:49:00.1952114Z // begin inline asm 2026-02-21T08:49:00.1952735Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r30], 64; 2026-02-21T08:49:00.1953179Z // end inline asm 2026-02-21T08:49:00.1953527Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T08:49:00.1953890Z bar.sync 0; 2026-02-21T08:49:00.1954197Z ld.shared.b32 %r393, [global_smem]; 2026-02-21T08:49:00.1954562Z bar.sync 0; 2026-02-21T08:49:00.1954869Z // begin inline asm 2026-02-21T08:49:00.1955300Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:49:00.1955741Z // end inline asm 2026-02-21T08:49:00.1956262Z .loc 1 21 67 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:21:67 2026-02-21T08:49:00.1958794Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:49:00.1960918Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:49:00.1961349Z `ptxas` stderr: 2026-02-21T08:49:00.1962146Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function 
_helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:49:00.1963099Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:00.1963402Z 2026-02-21T08:49:00.1964136Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpznlw3zqi.ptx -o /tmp/tmpznlw3zqi.ptx.o 2026-02-21T08:49:00.1965014Z 2026-02-21T08:49:00.1965321Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:49:00.1965793Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:49:00.1966152Z mov.u32 %r47, %ctaid.y; 2026-02-21T08:49:00.1966470Z mov.u32 %r48, %ctaid.z; 2026-02-21T08:49:00.1966807Z mov.u32 %r49, %nctaid.x; 2026-02-21T08:49:00.1967119Z mov.u32 %r50, %nctaid.y; 2026-02-21T08:49:00.1967479Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T08:49:00.1967868Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T08:49:00.1968199Z shl.b32 %r53, %r52, 8; 2026-02-21T08:49:00.1968552Z cvt.s64.s32 %rd41, %r53; 2026-02-21T08:49:00.1968872Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T08:49:00.1969220Z shl.b32 %r54, %r1, 2; 2026-02-21T08:49:00.1969511Z add.s32 %r31, %r30, %r54; 2026-02-21T08:49:00.1969857Z mov.b32 %r40, 0; 2026-02-21T08:49:00.1970145Z // begin inline asm 2026-02-21T08:49:00.1970481Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:49:00.1970823Z // end inline asm 2026-02-21T08:49:00.1971230Z bar.warp.sync -1; 2026-02-21T08:49:00.1971540Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T08:49:00.1971906Z cvt.u64.u32 %rd4, %r30; 2026-02-21T08:49:00.1972262Z // begin inline asm 2026-02-21T08:49:00.1972748Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T08:49:00.1973290Z // end inline asm 2026-02-21T08:49:00.1973571Z // begin inline asm 2026-02-21T08:49:00.1974034Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:49:00.1974495Z // end inline asm 2026-02-21T08:49:00.1974843Z mov.b32 %r33, 
16; 2026-02-21T08:49:00.1975161Z // begin inline asm 2026-02-21T08:49:00.1975607Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:49:00.1976175Z // end inline asm 2026-02-21T08:49:00.1976430Z mov.b32 %r34, 64; 2026-02-21T08:49:00.1976720Z // begin inline asm 2026-02-21T08:49:00.1977186Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:49:00.1977732Z // end inline asm 2026-02-21T08:49:00.1977987Z mov.b32 %r35, 2048; 2026-02-21T08:49:00.1978306Z // begin inline asm 2026-02-21T08:49:00.1978890Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:49:00.1979410Z // end inline asm 2026-02-21T08:49:00.1979721Z // begin inline asm 2026-02-21T08:49:00.1980177Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r35; 2026-02-21T08:49:00.1980744Z // end inline asm 2026-02-21T08:49:00.1981003Z mov.b64 %rd12, 4096; 2026-02-21T08:49:00.1981313Z // begin inline asm 2026-02-21T08:49:00.1981803Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:49:00.1982366Z // end inline asm 2026-02-21T08:49:00.1982669Z mov.b32 %r37, 1; 2026-02-21T08:49:00.1982931Z // begin inline asm 2026-02-21T08:49:00.1983444Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:49:00.1984001Z // end inline asm 2026-02-21T08:49:00.1984309Z // begin inline asm 2026-02-21T08:49:00.1984892Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:49:00.1985482Z // end inline asm 2026-02-21T08:49:00.1985775Z // begin inline asm 2026-02-21T08:49:00.1986233Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:49:00.1986762Z // end inline asm 2026-02-21T08:49:00.1987022Z // begin inline asm 2026-02-21T08:49:00.1987548Z @%p110 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:49:00.1988146Z // end inline asm 2026-02-21T08:49:00.1988454Z // begin inline asm 2026-02-21T08:49:00.1988951Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:49:00.1989454Z // end inline asm 2026-02-21T08:49:00.1989759Z // begin inline asm 2026-02-21T08:49:00.1990197Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:49:00.1990721Z // end inline asm 2026-02-21T08:49:00.1990993Z // begin inline asm 2026-02-21T08:49:00.1991649Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:49:00.1992385Z // end inline asm 2026-02-21T08:49:00.1992647Z // begin inline asm 2026-02-21T08:49:00.1993091Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:49:00.1993554Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:49:00.1993985Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:49:00.1994337Z // end inline asm 2026-02-21T08:49:00.1994625Z bar.sync 0; 2026-02-21T08:49:00.1994940Z cvta.global.u64 %rd59, %rd19; 2026-02-21T08:49:00.1995527Z .loc 1 22 68 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:22:68 2026-02-21T08:49:00.1996110Z add.s64 %rd37, %rd19, 128; 2026-02-21T08:49:00.1996418Z bar.sync 0; 2026-02-21T08:49:00.1996786Z // begin inline asm 2026-02-21T08:49:00.1997080Z @%p1 st.shared.b32 [ %r31 + 0 ], %r40; 2026-02-21T08:49:00.1997464Z // end inline asm 2026-02-21T08:49:00.1997743Z bar.warp.sync -1; 2026-02-21T08:49:00.1998043Z // begin inline asm 2026-02-21T08:49:00.1998498Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T08:49:00.1999066Z // end inline asm 2026-02-21T08:49:00.1999331Z // begin inline asm 2026-02-21T08:49:00.1999812Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 
0x1; 2026-02-21T08:49:00.2000314Z // end inline asm 2026-02-21T08:49:00.2000583Z // begin inline asm 2026-02-21T08:49:00.2001095Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r33; 2026-02-21T08:49:00.2001578Z // end inline asm 2026-02-21T08:49:00.2001894Z // begin inline asm 2026-02-21T08:49:00.2002350Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r34; 2026-02-21T08:49:00.2002875Z // end inline asm 2026-02-21T08:49:00.2003187Z // begin inline asm 2026-02-21T08:49:00.2003663Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r35; 2026-02-21T08:49:00.2004289Z // end inline asm 2026-02-21T08:49:00.2004558Z mov.b32 %r44, 12288; 2026-02-21T08:49:00.2004903Z // begin inline asm 2026-02-21T08:49:00.2005378Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r44; 2026-02-21T08:49:00.2005941Z // end inline asm 2026-02-21T08:49:00.2006213Z // begin inline asm 2026-02-21T08:49:00.2006752Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T08:49:00.2007324Z // end inline asm 2026-02-21T08:49:00.2007606Z // begin inline asm 2026-02-21T08:49:00.2008150Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T08:49:00.2008730Z // end inline asm 2026-02-21T08:49:00.2009038Z // begin inline asm 2026-02-21T08:49:00.2009542Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T08:49:00.2010222Z // end inline asm 2026-02-21T08:49:00.2010540Z // begin inline asm 2026-02-21T08:49:00.2011024Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T08:49:00.2011587Z // end inline asm 2026-02-21T08:49:00.2011870Z // begin inline asm 2026-02-21T08:49:00.2012426Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:49:00.2012986Z 
// end inline asm 2026-02-21T08:49:00.2013309Z // begin inline asm 2026-02-21T08:49:00.2013815Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T08:49:00.2014447Z // end inline asm 2026-02-21T08:49:00.2014789Z // begin inline asm 2026-02-21T08:49:00.2015265Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T08:49:00.2015813Z // end inline asm 2026-02-21T08:49:00.2016085Z // begin inline asm 2026-02-21T08:49:00.2016783Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T08:49:00.2017506Z // end inline asm 2026-02-21T08:49:00.2017776Z // begin inline asm 2026-02-21T08:49:00.2018216Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T08:49:00.2018701Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:49:00.2019089Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:49:00.2019438Z // end inline asm 2026-02-21T08:49:00.2019736Z bar.sync 0; 2026-02-21T08:49:00.2020004Z cvta.global.u64 %rd60, %rd37; 2026-02-21T08:49:00.2020570Z .loc 1 28 131 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:28:131 2026-02-21T08:49:00.2021182Z setp.gt.u32 %p39, %r3, 6143; 2026-02-21T08:49:00.2021490Z @%p39 bra $L__BB0_8; 2026-02-21T08:49:00.2021847Z // %bb.1: // %.lr.ph 2026-02-21T08:49:00.2022535Z .loc 1 40 45 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:40:45 2026-02-21T08:49:00.2023145Z shl.b32 %r154, %r1, 3; 2026-02-21T08:49:00.2023455Z and.b32 %r155, %r154, 56; 2026-02-21T08:49:00.2023792Z bfe.u32 %r156, %r1, 3, 4; 2026-02-21T08:49:00.2024093Z shr.u32 %r157, %r1, 5; 2026-02-21T08:49:00.2024429Z shl.b32 %r158, %r1, 4; 2026-02-21T08:49:00.2024802Z and.b32 %r159, %r158, 176; 2026-02-21T08:49:00.2025115Z and.b32 %r160, %r1, 96; 2026-02-21T08:49:00.2025469Z shl.b32 %r161, %r160, 3; 2026-02-21T08:49:00.2025774Z bfe.s32 %r162, %r1, 2, 1; 2026-02-21T08:49:00.2026102Z 
and.b32 %r163, %r162, 1088; 2026-02-21T08:49:00.2026425Z and.b32 %r165, %r54, 64; 2026-02-21T08:49:00.2026774Z xor.b32 %r166, %r163, %r165; 2026-02-21T08:49:00.2027071Z add.s32 %r167, %r30, %r159; 2026-02-21T08:49:00.2027410Z add.s32 %r168, %r167, %r161; 2026-02-21T08:49:00.2027725Z shl.b32 %r169, %r1, 5; 2026-02-21T08:49:00.2028055Z and.b32 %r170, %r169, 1792; 2026-02-21T08:49:00.2028381Z and.b32 %r171, %r154, 48; 2026-02-21T08:49:00.2028689Z shl.b32 %r172, %r160, 1; 2026-02-21T08:49:00.2029034Z shl.b32 %r173, %r1, 6; 2026-02-21T08:49:00.2029381Z and.b32 %r174, %r173, 64; 2026-02-21T08:49:00.2029714Z xor.b32 %r175, %r172, %r174; 2026-02-21T08:49:00.2030024Z add.s32 %r176, %r30, %r170; 2026-02-21T08:49:00.2030363Z add.s32 %r177, %r176, %r171; 2026-02-21T08:49:00.2030875Z .loc 1 35 33 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:35:33 2026-02-21T08:49:00.2031466Z shr.u32 %r178, %r3, 5; 2026-02-21T08:49:00.2031784Z and.b32 %r179, %r178, 252; 2026-02-21T08:49:00.2032299Z .loc 1 37 64 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:37:64 2026-02-21T08:49:00.2032876Z and.b32 %r180, %r3, 3; 2026-02-21T08:49:00.2033369Z .loc 1 37 30 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:37:30 2026-02-21T08:49:00.2033955Z or.b32 %r181, %r179, %r180; 2026-02-21T08:49:00.2034539Z .loc 1 39 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:39:27 2026-02-21T08:49:00.2035173Z shl.b32 %r212, %r181, 6; 2026-02-21T08:49:00.2035720Z .loc 1 41 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:41:27 2026-02-21T08:49:00.2036263Z shl.b32 %r182, %r3, 4; 2026-02-21T08:49:00.2036596Z and.b32 %r208, %r182, 1984; 2026-02-21T08:49:00.2037114Z .loc 1 42 32 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:42:32 2026-02-21T08:49:00.2037674Z or.b32 %r9, %r208, %r156; 2026-02-21T08:49:00.2038171Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2038858Z shfl.sync.idx.b32 %r13, %r157, 0, 
31, -1; 2026-02-21T08:49:00.2039255Z shl.b32 %r183, %r13, 21; 2026-02-21T08:49:00.2039572Z and.b32 %r184, %r183, 6291456; 2026-02-21T08:49:00.2039927Z add.s32 %r303, %r184, %r393; 2026-02-21T08:49:00.2040230Z mov.pred %p40, -1; 2026-02-21T08:49:00.2040547Z // begin inline asm 2026-02-21T08:49:00.2041224Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 0], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:49:00.2041967Z // end inline asm 2026-02-21T08:49:00.2042244Z // begin inline asm 2026-02-21T08:49:00.2042915Z @%p40 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r303 + 16], 32, {%r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40, %r40}; 2026-02-21T08:49:00.2043657Z // end inline asm 2026-02-21T08:49:00.2043925Z // begin inline asm 2026-02-21T08:49:00.2044268Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:49:00.2044594Z // end inline asm 2026-02-21T08:49:00.2044919Z bar.sync 0; 2026-02-21T08:49:00.2045400Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2045998Z add.s32 %r395, %r30, 28736; 2026-02-21T08:49:00.2046408Z // begin inline asm 2026-02-21T08:49:00.2046750Z @%p110 mbarrier.init.shared::cta.b64 [%r395], 1; 2026-02-21T08:49:00.2047172Z // end inline asm 2026-02-21T08:49:00.2047435Z bar.sync 0; 2026-02-21T08:49:00.2047740Z add.s32 %r90, %r30, 28744; 2026-02-21T08:49:00.2048051Z // begin inline asm 2026-02-21T08:49:00.2048394Z @%p110 mbarrier.init.shared::cta.b64 [%r90], 1; 2026-02-21T08:49:00.2048763Z // end inline asm 2026-02-21T08:49:00.2049076Z add.s32 %r91, %r30, 28672; 2026-02-21T08:49:00.2049376Z // begin inline asm 2026-02-21T08:49:00.2049728Z @%p110 mbarrier.init.shared::cta.b64 [%r91], 1; 2026-02-21T08:49:00.2050093Z // end inline asm 2026-02-21T08:49:00.2050364Z bar.sync 0; 2026-02-21T08:49:00.2050653Z add.s32 %r92, %r30, 28680; 2026-02-21T08:49:00.2050962Z // begin inline asm 
2026-02-21T08:49:00.2051349Z @%p110 mbarrier.init.shared::cta.b64 [%r92], 1; 2026-02-21T08:49:00.2051715Z // end inline asm 2026-02-21T08:49:00.2052033Z bar.sync 0; 2026-02-21T08:49:00.2052307Z add.s32 %r93, %r30, 28688; 2026-02-21T08:49:00.2052661Z // begin inline asm 2026-02-21T08:49:00.2052996Z @%p110 mbarrier.init.shared::cta.b64 [%r93], 1; 2026-02-21T08:49:00.2053494Z // end inline asm 2026-02-21T08:49:00.2053812Z bar.sync 0; 2026-02-21T08:49:00.2054078Z add.s32 %r94, %r30, 28696; 2026-02-21T08:49:00.2054441Z // begin inline asm 2026-02-21T08:49:00.2054830Z @%p110 mbarrier.init.shared::cta.b64 [%r94], 1; 2026-02-21T08:49:00.2055225Z // end inline asm 2026-02-21T08:49:00.2055505Z bar.sync 0; 2026-02-21T08:49:00.2055812Z add.s32 %r95, %r30, 28704; 2026-02-21T08:49:00.2056133Z // begin inline asm 2026-02-21T08:49:00.2056505Z @%p110 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T08:49:00.2056938Z // end inline asm 2026-02-21T08:49:00.2057217Z bar.sync 0; 2026-02-21T08:49:00.2057512Z add.s32 %r96, %r30, 28712; 2026-02-21T08:49:00.2057841Z // begin inline asm 2026-02-21T08:49:00.2058225Z @%p110 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T08:49:00.2058589Z // end inline asm 2026-02-21T08:49:00.2058892Z bar.sync 0; 2026-02-21T08:49:00.2059171Z add.s32 %r205, %r30, 28720; 2026-02-21T08:49:00.2059594Z // begin inline asm 2026-02-21T08:49:00.2059930Z @%p110 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T08:49:00.2060356Z // end inline asm 2026-02-21T08:49:00.2060645Z bar.sync 0; 2026-02-21T08:49:00.2060889Z // begin inline asm 2026-02-21T08:49:00.2061317Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r91], 4096; 2026-02-21T08:49:00.2061754Z // end inline asm 2026-02-21T08:49:00.2062284Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2062848Z // begin inline asm 2026-02-21T08:49:00.2063262Z fence.proxy.async.shared::cta; 2026-02-21T08:49:00.2063592Z // end inline asm 2026-02-21T08:49:00.2063882Z 
bar.sync 0; 2026-02-21T08:49:00.2064177Z elect.sync %r185|%p70, -1; 2026-02-21T08:49:00.2064506Z and.pred %p52, %p1, %p70; 2026-02-21T08:49:00.2064905Z // begin inline asm 2026-02-21T08:49:00.2065523Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r30], [%rd59, {%r40, %r208}], [%r91]; 2026-02-21T08:49:00.2066212Z // end inline asm 2026-02-21T08:49:00.2066699Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2067279Z bar.sync 0; 2026-02-21T08:49:00.2067544Z elect.sync %r186|%p71, -1; 2026-02-21T08:49:00.2067909Z and.pred %p53, %p1, %p71; 2026-02-21T08:49:00.2068264Z add.s32 %r103, %r30, 14336; 2026-02-21T08:49:00.2068560Z // begin inline asm 2026-02-21T08:49:00.2069201Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r103], [%rd60, {%r40, %r212}], [%r91]; 2026-02-21T08:49:00.2069877Z // end inline asm 2026-02-21T08:49:00.2070388Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2070936Z bar.sync 0; 2026-02-21T08:49:00.2071233Z // begin inline asm 2026-02-21T08:49:00.2071714Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r92], 4096; 2026-02-21T08:49:00.2072129Z // end inline asm 2026-02-21T08:49:00.2072646Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2073196Z bar.sync 0; 2026-02-21T08:49:00.2073497Z elect.sync %r187|%p72, -1; 2026-02-21T08:49:00.2073827Z and.pred %p55, %p1, %p72; 2026-02-21T08:49:00.2074198Z add.s32 %r108, %r30, 2048; 2026-02-21T08:49:00.2074491Z // begin inline asm 2026-02-21T08:49:00.2075187Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd59, {%r33, %r208}], [%r92]; 2026-02-21T08:49:00.2075893Z // end inline asm 2026-02-21T08:49:00.2076394Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2076968Z bar.sync 0; 
2026-02-21T08:49:00.2077289Z elect.sync %r188|%p73, -1; 2026-02-21T08:49:00.2077623Z and.pred %p56, %p1, %p73; 2026-02-21T08:49:00.2077931Z add.s32 %r112, %r30, 16384; 2026-02-21T08:49:00.2078280Z // begin inline asm 2026-02-21T08:49:00.2078924Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd60, {%r33, %r212}], [%r92]; 2026-02-21T08:49:00.2079682Z // end inline asm 2026-02-21T08:49:00.2080203Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2080766Z bar.sync 0; 2026-02-21T08:49:00.2081060Z // begin inline asm 2026-02-21T08:49:00.2081436Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r93], 4096; 2026-02-21T08:49:00.2081896Z // end inline asm 2026-02-21T08:49:00.2082387Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2082964Z bar.sync 0; 2026-02-21T08:49:00.2083271Z elect.sync %r189|%p74, -1; 2026-02-21T08:49:00.2083602Z and.pred %p58, %p1, %p74; 2026-02-21T08:49:00.2083957Z add.s32 %r117, %r30, 4096; 2026-02-21T08:49:00.2084248Z mov.b32 %r118, 32; 2026-02-21T08:49:00.2084582Z // begin inline asm 2026-02-21T08:49:00.2085328Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r208}], [%r93]; 2026-02-21T08:49:00.2086055Z // end inline asm 2026-02-21T08:49:00.2086561Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2087112Z bar.sync 0; 2026-02-21T08:49:00.2087404Z elect.sync %r190|%p75, -1; 2026-02-21T08:49:00.2087732Z and.pred %p59, %p1, %p75; 2026-02-21T08:49:00.2088079Z add.s32 %r121, %r30, 18432; 2026-02-21T08:49:00.2088397Z // begin inline asm 2026-02-21T08:49:00.2089130Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r212}], [%r93]; 2026-02-21T08:49:00.2089790Z // end inline asm 2026-02-21T08:49:00.2090313Z .loc 1 47 42 // 
chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2090891Z bar.sync 0; 2026-02-21T08:49:00.2091152Z // begin inline asm 2026-02-21T08:49:00.2091578Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r94], 4096; 2026-02-21T08:49:00.2091991Z // end inline asm 2026-02-21T08:49:00.2092510Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2093065Z bar.sync 0; 2026-02-21T08:49:00.2093387Z elect.sync %r191|%p76, -1; 2026-02-21T08:49:00.2093735Z and.pred %p61, %p1, %p76; 2026-02-21T08:49:00.2094097Z add.s32 %r126, %r30, 6144; 2026-02-21T08:49:00.2094433Z mov.b32 %r127, 48; 2026-02-21T08:49:00.2094756Z // begin inline asm 2026-02-21T08:49:00.2095452Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r126], [%rd59, {%r127, %r208}], [%r94]; 2026-02-21T08:49:00.2096140Z // end inline asm 2026-02-21T08:49:00.2096662Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2097295Z bar.sync 0; 2026-02-21T08:49:00.2097612Z elect.sync %r192|%p77, -1; 2026-02-21T08:49:00.2097973Z and.pred %p62, %p1, %p77; 2026-02-21T08:49:00.2098315Z add.s32 %r130, %r30, 20480; 2026-02-21T08:49:00.2098669Z // begin inline asm 2026-02-21T08:49:00.2099289Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r212}], [%r94]; 2026-02-21T08:49:00.2100024Z // end inline asm 2026-02-21T08:49:00.2100518Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2101138Z bar.sync 0; 2026-02-21T08:49:00.2101387Z // begin inline asm 2026-02-21T08:49:00.2101802Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r95], 4096; 2026-02-21T08:49:00.2102255Z // end inline asm 2026-02-21T08:49:00.2102716Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2103301Z bar.sync 0; 2026-02-21T08:49:00.2103563Z elect.sync %r193|%p78, 
-1; 2026-02-21T08:49:00.2103911Z and.pred %p64, %p1, %p78; 2026-02-21T08:49:00.2104226Z add.s32 %r135, %r30, 8192; 2026-02-21T08:49:00.2104635Z // begin inline asm 2026-02-21T08:49:00.2105311Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r34, %r208}], [%r95]; 2026-02-21T08:49:00.2105971Z // end inline asm 2026-02-21T08:49:00.2106474Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2107020Z bar.sync 0; 2026-02-21T08:49:00.2107311Z elect.sync %r194|%p79, -1; 2026-02-21T08:49:00.2107622Z and.pred %p65, %p1, %p79; 2026-02-21T08:49:00.2107980Z add.s32 %r139, %r30, 22528; 2026-02-21T08:49:00.2108291Z // begin inline asm 2026-02-21T08:49:00.2108909Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r34, %r212}], [%r95]; 2026-02-21T08:49:00.2109636Z // end inline asm 2026-02-21T08:49:00.2110112Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2110770Z bar.sync 0; 2026-02-21T08:49:00.2111028Z // begin inline asm 2026-02-21T08:49:00.2111435Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 4096; 2026-02-21T08:49:00.2111857Z // end inline asm 2026-02-21T08:49:00.2112349Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2112933Z bar.sync 0; 2026-02-21T08:49:00.2113207Z elect.sync %r195|%p80, -1; 2026-02-21T08:49:00.2113555Z and.pred %p67, %p1, %p80; 2026-02-21T08:49:00.2113874Z add.s32 %r144, %r30, 10240; 2026-02-21T08:49:00.2114307Z mov.b32 %r145, 80; 2026-02-21T08:49:00.2114571Z // begin inline asm 2026-02-21T08:49:00.2115257Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r208}], [%r96]; 2026-02-21T08:49:00.2115909Z // end inline asm 2026-02-21T08:49:00.2116441Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 
2026-02-21T08:49:00.2117001Z bar.sync 0; 2026-02-21T08:49:00.2117263Z elect.sync %r196|%p81, -1; 2026-02-21T08:49:00.2117618Z and.pred %p68, %p1, %p81; 2026-02-21T08:49:00.2117913Z add.s32 %r148, %r30, 24576; 2026-02-21T08:49:00.2118240Z // begin inline asm 2026-02-21T08:49:00.2118854Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r212}], [%r96]; 2026-02-21T08:49:00.2119539Z // end inline asm 2026-02-21T08:49:00.2120045Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2120583Z bar.sync 0; 2026-02-21T08:49:00.2120882Z // begin inline asm 2026-02-21T08:49:00.2121142Z 2026-02-21T08:49:00.2121393Z { 2026-02-21T08:49:00.2121640Z .reg .pred complete; 2026-02-21T08:49:00.2121970Z waitLoop: 2026-02-21T08:49:00.2122317Z mbarrier.try_wait.parity.shared.b64 complete, [%r91], %r40; 2026-02-21T08:49:00.2122871Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.2123181Z } 2026-02-21T08:49:00.2123361Z 2026-02-21T08:49:00.2123471Z // end inline asm 2026-02-21T08:49:00.2123996Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2124555Z setp.ne.b32 %p82, %r13, 0; 2026-02-21T08:49:00.2124925Z @%p82 bra $L__BB0_3; 2026-02-21T08:49:00.2125204Z // %bb.2: 2026-02-21T08:49:00.2125702Z .loc 1 0 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:0:52 2026-02-21T08:49:00.2126234Z bfe.u32 %r201, %r103, 4, 14; 2026-02-21T08:49:00.2126599Z cvt.u64.u32 %rd57, %r201; 2026-02-21T08:49:00.2126921Z or.b64 %rd55, %rd57, -4611685949699522560; 2026-02-21T08:49:00.2127300Z bfe.u32 %r202, %r30, 4, 14; 2026-02-21T08:49:00.2127656Z cvt.u64.u32 %rd58, %r202; 2026-02-21T08:49:00.2127973Z or.b64 %rd54, %rd58, -4611685949699522560; 2026-02-21T08:49:00.2128578Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2129142Z elect.sync %r203|%p84, -1; 2026-02-21T08:49:00.2129563Z mov.b32 %r198, 
68157456; 2026-02-21T08:49:00.2129881Z mov.pred %p83, 0; 2026-02-21T08:49:00.2130191Z // begin inline asm 2026-02-21T08:49:00.2130665Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r393 + 0 ], %rd54, %rd55, %r198, %p83; 2026-02-21T08:49:00.2131148Z // end inline asm 2026-02-21T08:49:00.2131447Z add.s32 %r204, %r30, 28736; 2026-02-21T08:49:00.2131757Z cvt.u64.u32 %rd56, %r204; 2026-02-21T08:49:00.2132110Z // begin inline asm 2026-02-21T08:49:00.2132505Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T08:49:00.2132980Z // end inline asm 2026-02-21T08:49:00.2133246Z $L__BB0_3: 2026-02-21T08:49:00.2133726Z .loc 1 0 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:0:52 2026-02-21T08:49:00.2134367Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T08:49:00.2134763Z add.s32 %r4, %r168, %r166; 2026-02-21T08:49:00.2135126Z add.s32 %r308, %r177, %r175; 2026-02-21T08:49:00.2135515Z or.b32 %r7, %r212, %r155; 2026-02-21T08:49:00.2135861Z or.b32 %r10, %r9, 16; 2026-02-21T08:49:00.2136131Z or.b32 %r11, %r9, 32; 2026-02-21T08:49:00.2136468Z or.b32 %r12, %r9, 48; 2026-02-21T08:49:00.2136982Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2137578Z bar.sync 0; 2026-02-21T08:49:00.2137858Z // begin inline asm 2026-02-21T08:49:00.2138292Z @%p110 mbarrier.arrive.expect_tx.shared.b64 _, [%r205], 4096; 2026-02-21T08:49:00.2138776Z // end inline asm 2026-02-21T08:49:00.2139357Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2139968Z bar.sync 0; 2026-02-21T08:49:00.2140263Z elect.sync %r219|%p90, -1; 2026-02-21T08:49:00.2140626Z and.pred %p87, %p1, %p90; 2026-02-21T08:49:00.2140963Z add.s32 %r206, %r30, 12288; 2026-02-21T08:49:00.2141318Z mov.b32 %r207, 96; 2026-02-21T08:49:00.2141595Z // begin inline asm 2026-02-21T08:49:00.2142292Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r206], [%rd59, {%r207, 
%r208}], [%r205]; 2026-02-21T08:49:00.2143033Z // end inline asm 2026-02-21T08:49:00.2143544Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2144145Z bar.sync 0; 2026-02-21T08:49:00.2144445Z elect.sync %r220|%p91, -1; 2026-02-21T08:49:00.2144856Z and.pred %p88, %p1, %p91; 2026-02-21T08:49:00.2145161Z add.s32 %r210, %r30, 26624; 2026-02-21T08:49:00.2145517Z // begin inline asm 2026-02-21T08:49:00.2146173Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r210], [%rd60, {%r207, %r212}], [%r205]; 2026-02-21T08:49:00.2146850Z // end inline asm 2026-02-21T08:49:00.2147142Z mov.b32 %r399, 1; 2026-02-21T08:49:00.2147413Z mov.b32 %r398, 6; 2026-02-21T08:49:00.2147791Z mov.b32 %r394, 0; 2026-02-21T08:49:00.2148073Z mov.b32 %r396, %r394; 2026-02-21T08:49:00.2148389Z mov.b32 %r397, %r394; 2026-02-21T08:49:00.2148679Z mov.b32 %r400, %r394; 2026-02-21T08:49:00.2148998Z mov.b32 %r401, %r394; 2026-02-21T08:49:00.2149312Z bra.uni $L__BB0_4; 2026-02-21T08:49:00.2149685Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:49:00.2150357Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2150918Z setp.lt.u32 %p100, %r401, 1936; 2026-02-21T08:49:00.2151487Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2152038Z // begin inline asm 2026-02-21T08:49:00.2152351Z 2026-02-21T08:49:00.2152576Z { 2026-02-21T08:49:00.2152855Z .reg .pred complete; 2026-02-21T08:49:00.2153171Z waitLoop: 2026-02-21T08:49:00.2153545Z mbarrier.try_wait.parity.shared.b64 complete, [%r395], %r394; 2026-02-21T08:49:00.2154031Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.2154330Z } 2026-02-21T08:49:00.2154469Z 2026-02-21T08:49:00.2154609Z // end inline asm 2026-02-21T08:49:00.2154978Z add.s32 %r250, %r399, 1; 2026-02-21T08:49:00.2155339Z setp.gt.s32 %p103, %r250, 1; 2026-02-21T08:49:00.2155672Z selp.b32 %r399, 0, %r250, %p103; 
2026-02-21T08:49:00.2156030Z selp.b32 %r251, 1, 0, %p103; 2026-02-21T08:49:00.2156348Z xor.b32 %r400, %r260, %r251; 2026-02-21T08:49:00.2156913Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2157469Z add.s32 %r252, %r398, 1; 2026-02-21T08:49:00.2157786Z setp.gt.s32 %p104, %r252, 6; 2026-02-21T08:49:00.2158145Z selp.b32 %r398, 0, %r252, %p104; 2026-02-21T08:49:00.2158455Z shl.b32 %r253, %r398, 3; 2026-02-21T08:49:00.2158795Z add.s32 %r255, %r30, %r253; 2026-02-21T08:49:00.2159110Z add.s32 %r245, %r255, 28672; 2026-02-21T08:49:00.2159429Z bar.sync 0; 2026-02-21T08:49:00.2159714Z and.pred %p97, %p110, %p100; 2026-02-21T08:49:00.2160068Z // begin inline asm 2026-02-21T08:49:00.2160543Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r245], 4096; 2026-02-21T08:49:00.2160977Z // end inline asm 2026-02-21T08:49:00.2161498Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2162025Z shl.b32 %r256, %r398, 11; 2026-02-21T08:49:00.2162375Z add.s32 %r242, %r30, %r256; 2026-02-21T08:49:00.2162693Z bar.sync 0; 2026-02-21T08:49:00.2162989Z elect.sync %r257|%p105, -1; 2026-02-21T08:49:00.2163312Z and.pred %p106, %p100, %p105; 2026-02-21T08:49:00.2163674Z and.pred %p98, %p1, %p106; 2026-02-21T08:49:00.2164087Z add.s32 %r243, %r401, 112; 2026-02-21T08:49:00.2164402Z // begin inline asm 2026-02-21T08:49:00.2165109Z @%p98 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd59, {%r243, %r208}], [%r245]; 2026-02-21T08:49:00.2165778Z // end inline asm 2026-02-21T08:49:00.2166293Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2166842Z add.s32 %r246, %r242, 14336; 2026-02-21T08:49:00.2167186Z bar.sync 0; 2026-02-21T08:49:00.2167473Z elect.sync %r258|%p107, -1; 2026-02-21T08:49:00.2167796Z and.pred %p108, %p100, %p107; 2026-02-21T08:49:00.2168156Z and.pred %p99, %p1, %p108; 2026-02-21T08:49:00.2168456Z // 
begin inline asm 2026-02-21T08:49:00.2169105Z @%p99 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd60, {%r243, %r212}], [%r245]; 2026-02-21T08:49:00.2169808Z // end inline asm 2026-02-21T08:49:00.2170322Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2170906Z setp.lt.u32 %p109, %r401, 2016; 2026-02-21T08:49:00.2171222Z add.s32 %r401, %r401, 16; 2026-02-21T08:49:00.2171574Z mov.b32 %r394, %r260; 2026-02-21T08:49:00.2171879Z mov.b32 %r395, %r259; 2026-02-21T08:49:00.2172253Z @%p109 bra $L__BB0_4; 2026-02-21T08:49:00.2172544Z bra.uni $L__BB0_7; 2026-02-21T08:49:00.2172951Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:49:00.2173568Z .loc 1 0 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:0:42 2026-02-21T08:49:00.2174152Z mov.b32 %r260, %r400; 2026-02-21T08:49:00.2174665Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2175243Z add.s32 %r223, %r397, 1; 2026-02-21T08:49:00.2175598Z setp.gt.s32 %p93, %r223, 6; 2026-02-21T08:49:00.2175919Z selp.b32 %r397, 0, %r223, %p93; 2026-02-21T08:49:00.2176269Z selp.b32 %r224, 1, 0, %p93; 2026-02-21T08:49:00.2176589Z xor.b32 %r396, %r396, %r224; 2026-02-21T08:49:00.2176935Z shl.b32 %r225, %r397, 3; 2026-02-21T08:49:00.2177213Z add.s32 %r227, %r30, %r225; 2026-02-21T08:49:00.2177538Z add.s32 %r221, %r227, 28672; 2026-02-21T08:49:00.2177843Z bar.sync 0; 2026-02-21T08:49:00.2178133Z // begin inline asm 2026-02-21T08:49:00.2178414Z 2026-02-21T08:49:00.2178632Z { 2026-02-21T08:49:00.2178916Z .reg .pred complete; 2026-02-21T08:49:00.2179270Z waitLoop: 2026-02-21T08:49:00.2179652Z mbarrier.try_wait.parity.shared.b64 complete, [%r221], %r396; 2026-02-21T08:49:00.2180121Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.2180452Z } 2026-02-21T08:49:00.2180573Z 2026-02-21T08:49:00.2180683Z // end inline asm 2026-02-21T08:49:00.2180992Z shl.b32 %r228, %r399, 3; 
2026-02-21T08:49:00.2181306Z add.s32 %r229, %r30, %r228; 2026-02-21T08:49:00.2181655Z add.s32 %r259, %r229, 28736; 2026-02-21T08:49:00.2182221Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2182815Z @%p82 bra $L__BB0_6; 2026-02-21T08:49:00.2183223Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:49:00.2183888Z .loc 1 51 31 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:51:31 2026-02-21T08:49:00.2184475Z shl.b32 %r232, %r397, 11; 2026-02-21T08:49:00.2184909Z add.s32 %r234, %r30, %r232; 2026-02-21T08:49:00.2185499Z .loc 1 52 44 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:52:44 2026-02-21T08:49:00.2186108Z add.s32 %r235, %r234, 14336; 2026-02-21T08:49:00.2186639Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2187254Z elect.sync %r236|%p95, -1; 2026-02-21T08:49:00.2187572Z bfe.u32 %r237, %r234, 4, 14; 2026-02-21T08:49:00.2187932Z cvt.u64.u32 %rd64, %r237; 2026-02-21T08:49:00.2188349Z or.b64 %rd61, %rd64, -4611685949699522560; 2026-02-21T08:49:00.2188743Z bfe.u32 %r238, %r235, 4, 14; 2026-02-21T08:49:00.2189087Z cvt.u64.u32 %rd65, %r238; 2026-02-21T08:49:00.2189412Z or.b64 %rd62, %rd65, -4611685949699522560; 2026-02-21T08:49:00.2189781Z mov.b32 %r231, 68157456; 2026-02-21T08:49:00.2190089Z mov.pred %p94, -1; 2026-02-21T08:49:00.2190417Z // begin inline asm 2026-02-21T08:49:00.2190838Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r393 + 0 ], %rd61, %rd62, %r231, %p94; 2026-02-21T08:49:00.2191375Z // end inline asm 2026-02-21T08:49:00.2191649Z cvt.u64.u32 %rd63, %r259; 2026-02-21T08:49:00.2191974Z // begin inline asm 2026-02-21T08:49:00.2192419Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T08:49:00.2192868Z // end inline asm 2026-02-21T08:49:00.2193159Z bra.uni $L__BB0_6; 2026-02-21T08:49:00.2193503Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:49:00.2194136Z .loc 1 53 52 // 
chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2194991Z // begin inline asm 2026-02-21T08:49:00.2195299Z 2026-02-21T08:49:00.2195521Z { 2026-02-21T08:49:00.2195804Z .reg .pred complete; 2026-02-21T08:49:00.2196096Z waitLoop: 2026-02-21T08:49:00.2196473Z mbarrier.try_wait.parity.shared.b64 complete, [%r259], %r260; 2026-02-21T08:49:00.2197047Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.2197341Z } 2026-02-21T08:49:00.2197467Z 2026-02-21T08:49:00.2197626Z // end inline asm 2026-02-21T08:49:00.2198126Z .loc 1 47 42 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:47:42 2026-02-21T08:49:00.2198681Z bar.sync 0; 2026-02-21T08:49:00.2198951Z // begin inline asm 2026-02-21T08:49:00.2199320Z @%p110 mbarrier.inval.shared::cta.b64 [%r91]; 2026-02-21T08:49:00.2199676Z // end inline asm 2026-02-21T08:49:00.2199975Z bar.sync 0; 2026-02-21T08:49:00.2200254Z // begin inline asm 2026-02-21T08:49:00.2200573Z @%p110 mbarrier.inval.shared::cta.b64 [%r92]; 2026-02-21T08:49:00.2200962Z // end inline asm 2026-02-21T08:49:00.2201227Z bar.sync 0; 2026-02-21T08:49:00.2201522Z // begin inline asm 2026-02-21T08:49:00.2201818Z @%p110 mbarrier.inval.shared::cta.b64 [%r93]; 2026-02-21T08:49:00.2202187Z // end inline asm 2026-02-21T08:49:00.2202449Z bar.sync 0; 2026-02-21T08:49:00.2202756Z // begin inline asm 2026-02-21T08:49:00.2203096Z @%p110 mbarrier.inval.shared::cta.b64 [%r94]; 2026-02-21T08:49:00.2203454Z // end inline asm 2026-02-21T08:49:00.2203823Z bar.sync 0; 2026-02-21T08:49:00.2204082Z // begin inline asm 2026-02-21T08:49:00.2204427Z @%p110 mbarrier.inval.shared::cta.b64 [%r95]; 2026-02-21T08:49:00.2204831Z // end inline asm 2026-02-21T08:49:00.2205139Z bar.sync 0; 2026-02-21T08:49:00.2205391Z // begin inline asm 2026-02-21T08:49:00.2205748Z @%p110 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T08:49:00.2206115Z // end inline asm 2026-02-21T08:49:00.2206400Z bar.sync 0; 2026-02-21T08:49:00.2206675Z // begin inline asm 
2026-02-21T08:49:00.2207011Z @%p110 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T08:49:00.2207434Z // end inline asm 2026-02-21T08:49:00.2207704Z add.s32 %r268, %r30, 28736; 2026-02-21T08:49:00.2208050Z // begin inline asm 2026-02-21T08:49:00.2208374Z @%p110 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T08:49:00.2208763Z // end inline asm 2026-02-21T08:49:00.2209030Z bar.sync 0; 2026-02-21T08:49:00.2209326Z // begin inline asm 2026-02-21T08:49:00.2209758Z @%p110 mbarrier.inval.shared::cta.b64 [%r90]; 2026-02-21T08:49:00.2210161Z // end inline asm 2026-02-21T08:49:00.2210692Z .loc 1 56 53 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:56:53 2026-02-21T08:49:00.2211257Z mad.lo.s32 %r341, %r9, 12288, %r7; 2026-02-21T08:49:00.2211645Z mad.lo.s32 %r342, %r10, 12288, %r7; 2026-02-21T08:49:00.2211981Z mad.lo.s32 %r343, %r11, 12288, %r7; 2026-02-21T08:49:00.2212363Z mad.lo.s32 %r344, %r12, 12288, %r7; 2026-02-21T08:49:00.2212913Z .loc 1 56 24 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:56:24 2026-02-21T08:49:00.2213567Z mad.wide.u32 %rd68, %r341, 2, %rd3; 2026-02-21T08:49:00.2213943Z mad.wide.u32 %rd69, %r342, 2, %rd3; 2026-02-21T08:49:00.2214268Z mad.wide.u32 %rd70, %r343, 2, %rd3; 2026-02-21T08:49:00.2214648Z mad.wide.u32 %rd71, %r344, 2, %rd3; 2026-02-21T08:49:00.2215225Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2215810Z // begin inline asm 2026-02-21T08:49:00.2216503Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285}, [%r303 + 0], 32; 2026-02-21T08:49:00.2217297Z // end inline asm 2026-02-21T08:49:00.2217591Z // begin inline asm 2026-02-21T08:49:00.2218302Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302}, [%r303 + 16], 32; 2026-02-21T08:49:00.2219083Z // end inline asm 
2026-02-21T08:49:00.2219358Z // begin inline asm 2026-02-21T08:49:00.2219681Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:49:00.2219999Z // end inline asm 2026-02-21T08:49:00.2220316Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:49:00.2220621Z cvt.u64.u32 %rd73, %r271; 2026-02-21T08:49:00.2221014Z shl.b64 %rd74, %rd73, 32; 2026-02-21T08:49:00.2221366Z or.b64 %rd75, %rd72, %rd74; 2026-02-21T08:49:00.2221877Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2222466Z mov.b64 {%r345, %r346}, %rd75; 2026-02-21T08:49:00.2222802Z cvt.rn.f16x2.f32 %r347, %r346, %r345; 2026-02-21T08:49:00.2223388Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2223933Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:49:00.2224256Z cvt.u64.u32 %rd77, %r273; 2026-02-21T08:49:00.2224600Z shl.b64 %rd78, %rd77, 32; 2026-02-21T08:49:00.2224962Z or.b64 %rd79, %rd76, %rd78; 2026-02-21T08:49:00.2225545Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2226149Z mov.b64 {%r348, %r349}, %rd79; 2026-02-21T08:49:00.2226521Z cvt.rn.f16x2.f32 %r350, %r349, %r348; 2026-02-21T08:49:00.2227104Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2227708Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:49:00.2228118Z cvt.u64.u32 %rd81, %r275; 2026-02-21T08:49:00.2228475Z shl.b64 %rd82, %rd81, 32; 2026-02-21T08:49:00.2228809Z or.b64 %rd83, %rd80, %rd82; 2026-02-21T08:49:00.2229351Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2229956Z mov.b64 {%r351, %r352}, %rd83; 2026-02-21T08:49:00.2230309Z cvt.rn.f16x2.f32 %r353, %r352, %r351; 2026-02-21T08:49:00.2230913Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2231492Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:49:00.2231847Z cvt.u64.u32 %rd85, %r277; 2026-02-21T08:49:00.2232180Z shl.b64 %rd86, 
%rd85, 32; 2026-02-21T08:49:00.2232499Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T08:49:00.2233076Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2233715Z mov.b64 {%r354, %r355}, %rd87; 2026-02-21T08:49:00.2234097Z cvt.rn.f16x2.f32 %r356, %r355, %r354; 2026-02-21T08:49:00.2234637Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2235283Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:49:00.2235571Z cvt.u64.u32 %rd89, %r279; 2026-02-21T08:49:00.2235910Z shl.b64 %rd90, %rd89, 32; 2026-02-21T08:49:00.2236252Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T08:49:00.2236749Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2237426Z mov.b64 {%r357, %r358}, %rd91; 2026-02-21T08:49:00.2237749Z cvt.rn.f16x2.f32 %r359, %r358, %r357; 2026-02-21T08:49:00.2238317Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2238868Z cvt.u64.u32 %rd92, %r280; 2026-02-21T08:49:00.2239191Z cvt.u64.u32 %rd93, %r281; 2026-02-21T08:49:00.2239492Z shl.b64 %rd94, %rd93, 32; 2026-02-21T08:49:00.2239833Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T08:49:00.2240364Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2240927Z mov.b64 {%r360, %r361}, %rd95; 2026-02-21T08:49:00.2241294Z cvt.rn.f16x2.f32 %r362, %r361, %r360; 2026-02-21T08:49:00.2241809Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2242417Z cvt.u64.u32 %rd96, %r282; 2026-02-21T08:49:00.2242712Z cvt.u64.u32 %rd97, %r283; 2026-02-21T08:49:00.2243051Z shl.b64 %rd98, %rd97, 32; 2026-02-21T08:49:00.2243392Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T08:49:00.2243899Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2244471Z mov.b64 {%r363, %r364}, %rd99; 2026-02-21T08:49:00.2244904Z cvt.rn.f16x2.f32 %r365, %r364, %r363; 
2026-02-21T08:49:00.2245462Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2246035Z cvt.u64.u32 %rd100, %r284; 2026-02-21T08:49:00.2246419Z cvt.u64.u32 %rd101, %r285; 2026-02-21T08:49:00.2246724Z shl.b64 %rd102, %rd101, 32; 2026-02-21T08:49:00.2247077Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T08:49:00.2247622Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2248191Z mov.b64 {%r366, %r367}, %rd103; 2026-02-21T08:49:00.2248562Z cvt.rn.f16x2.f32 %r368, %r367, %r366; 2026-02-21T08:49:00.2249118Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2249695Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:49:00.2249993Z cvt.u64.u32 %rd105, %r288; 2026-02-21T08:49:00.2250348Z shl.b64 %rd106, %rd105, 32; 2026-02-21T08:49:00.2250696Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T08:49:00.2251205Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2251784Z mov.b64 {%r369, %r370}, %rd107; 2026-02-21T08:49:00.2252183Z cvt.rn.f16x2.f32 %r371, %r370, %r369; 2026-02-21T08:49:00.2252781Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2253329Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:49:00.2253685Z cvt.u64.u32 %rd109, %r290; 2026-02-21T08:49:00.2253993Z shl.b64 %rd110, %rd109, 32; 2026-02-21T08:49:00.2254322Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T08:49:00.2254925Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2255469Z mov.b64 {%r372, %r373}, %rd111; 2026-02-21T08:49:00.2255842Z cvt.rn.f16x2.f32 %r374, %r373, %r372; 2026-02-21T08:49:00.2256383Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2256955Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:49:00.2257271Z cvt.u64.u32 %rd113, %r292; 2026-02-21T08:49:00.2257670Z shl.b64 %rd114, %rd113, 32; 
2026-02-21T08:49:00.2258023Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T08:49:00.2258556Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2259134Z mov.b64 {%r375, %r376}, %rd115; 2026-02-21T08:49:00.2259464Z cvt.rn.f16x2.f32 %r377, %r376, %r375; 2026-02-21T08:49:00.2260022Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2260579Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:49:00.2260980Z cvt.u64.u32 %rd117, %r294; 2026-02-21T08:49:00.2261279Z shl.b64 %rd118, %rd117, 32; 2026-02-21T08:49:00.2261633Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T08:49:00.2262173Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2262726Z mov.b64 {%r378, %r379}, %rd119; 2026-02-21T08:49:00.2263081Z cvt.rn.f16x2.f32 %r380, %r379, %r378; 2026-02-21T08:49:00.2263625Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2264216Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:49:00.2264509Z cvt.u64.u32 %rd121, %r296; 2026-02-21T08:49:00.2264858Z shl.b64 %rd122, %rd121, 32; 2026-02-21T08:49:00.2265206Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T08:49:00.2265720Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2266318Z mov.b64 {%r381, %r382}, %rd123; 2026-02-21T08:49:00.2266630Z cvt.rn.f16x2.f32 %r383, %r382, %r381; 2026-02-21T08:49:00.2267194Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2267731Z cvt.u64.u32 %rd124, %r297; 2026-02-21T08:49:00.2268076Z cvt.u64.u32 %rd125, %r298; 2026-02-21T08:49:00.2268460Z shl.b64 %rd126, %rd125, 32; 2026-02-21T08:49:00.2268794Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T08:49:00.2269380Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2269962Z mov.b64 {%r384, %r385}, %rd127; 2026-02-21T08:49:00.2270331Z cvt.rn.f16x2.f32 %r386, 
%r385, %r384; 2026-02-21T08:49:00.2270919Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2271514Z cvt.u64.u32 %rd128, %r299; 2026-02-21T08:49:00.2271843Z cvt.u64.u32 %rd129, %r300; 2026-02-21T08:49:00.2272199Z shl.b64 %rd130, %rd129, 32; 2026-02-21T08:49:00.2272550Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T08:49:00.2273112Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2273711Z mov.b64 {%r387, %r388}, %rd131; 2026-02-21T08:49:00.2274072Z cvt.rn.f16x2.f32 %r389, %r388, %r387; 2026-02-21T08:49:00.2274727Z .loc 1 53 52 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:53:52 2026-02-21T08:49:00.2275325Z cvt.u64.u32 %rd132, %r301; 2026-02-21T08:49:00.2275734Z cvt.u64.u32 %rd133, %r302; 2026-02-21T08:49:00.2276070Z shl.b64 %rd134, %rd133, 32; 2026-02-21T08:49:00.2276440Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T08:49:00.2277012Z .loc 1 55 27 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:55:27 2026-02-21T08:49:00.2277621Z mov.b64 {%r390, %r391}, %rd135; 2026-02-21T08:49:00.2277978Z cvt.rn.f16x2.f32 %r392, %r391, %r390; 2026-02-21T08:49:00.2278526Z .loc 1 56 83 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:56:83 2026-02-21T08:49:00.2279160Z st.shared.v4.b32 [%r4], {%r347, %r359, %r371, %r383}; 2026-02-21T08:49:00.2279552Z bar.sync 0; 2026-02-21T08:49:00.2279854Z // begin inline asm 2026-02-21T08:49:00.2280344Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r328, %r332, %r336}, [%r308]; 2026-02-21T08:49:00.2280872Z // end inline asm 2026-02-21T08:49:00.2281164Z bar.sync 0; 2026-02-21T08:49:00.2281553Z st.shared.v4.b32 [%r4], {%r350, %r362, %r374, %r386}; 2026-02-21T08:49:00.2281977Z bar.sync 0; 2026-02-21T08:49:00.2282219Z // begin inline asm 2026-02-21T08:49:00.2282729Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r329, %r333, %r337}, [%r308]; 2026-02-21T08:49:00.2283230Z // end inline asm 2026-02-21T08:49:00.2283534Z bar.sync 
0; 2026-02-21T08:49:00.2283891Z st.shared.v4.b32 [%r4], {%r353, %r365, %r377, %r389}; 2026-02-21T08:49:00.2284266Z bar.sync 0; 2026-02-21T08:49:00.2284546Z // begin inline asm 2026-02-21T08:49:00.2285041Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r330, %r334, %r338}, [%r308]; 2026-02-21T08:49:00.2285636Z // end inline asm 2026-02-21T08:49:00.2285904Z bar.sync 0; 2026-02-21T08:49:00.2286249Z st.shared.v4.b32 [%r4], {%r356, %r368, %r380, %r392}; 2026-02-21T08:49:00.2286616Z bar.sync 0; 2026-02-21T08:49:00.2286913Z // begin inline asm 2026-02-21T08:49:00.2287357Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r331, %r335, %r339}, [%r308]; 2026-02-21T08:49:00.2287877Z // end inline asm 2026-02-21T08:49:00.2288193Z // begin inline asm 2026-02-21T08:49:00.2288549Z st.global.v4.b32 [ %rd68 + 0 ], { %r324, %r325, %r326, %r327 }; 2026-02-21T08:49:00.2288975Z // end inline asm 2026-02-21T08:49:00.2289236Z // begin inline asm 2026-02-21T08:49:00.2289624Z st.global.v4.b32 [ %rd69 + 0 ], { %r328, %r329, %r330, %r331 }; 2026-02-21T08:49:00.2290024Z // end inline asm 2026-02-21T08:49:00.2290336Z // begin inline asm 2026-02-21T08:49:00.2290684Z st.global.v4.b32 [ %rd70 + 0 ], { %r332, %r333, %r334, %r335 }; 2026-02-21T08:49:00.2291100Z // end inline asm 2026-02-21T08:49:00.2291403Z // begin inline asm 2026-02-21T08:49:00.2291764Z st.global.v4.b32 [ %rd71 + 0 ], { %r336, %r337, %r338, %r339 }; 2026-02-21T08:49:00.2292187Z // end inline asm 2026-02-21T08:49:00.2292492Z $L__BB0_8: // %._crit_edge 2026-02-21T08:49:00.2293184Z .loc 1 28 4 // chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py:28:4 2026-02-21T08:49:00.2293739Z bar.sync 0; 2026-02-21T08:49:00.2294048Z // begin inline asm 2026-02-21T08:49:00.2294420Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r393, 64; 2026-02-21T08:49:00.2294950Z // end inline asm 2026-02-21T08:49:00.2295251Z ret; 2026-02-21T08:49:00.2295486Z $L__tmp1: 2026-02-21T08:49:00.2295768Z $L__func_end0: 2026-02-21T08:49:00.2296075Z // -- 
End function 2026-02-21T08:49:00.2296475Z } 2026-02-21T08:49:00.2296967Z .file 1 "/tmp/torchinductor_root/hi/chiefixkehjxszq27nbbd7hmnhohap4qh76mxfdy5qwfkks2zq6d.py" 2026-02-21T08:49:00.2297630Z .section .debug_abbrev 2026-02-21T08:49:00.2297890Z { 2026-02-21T08:49:00.2298216Z .b8 1 // Abbreviation Code 2026-02-21T08:49:00.2298676Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:49:00.2299076Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:49:00.2299519Z .b8 37 // DW_AT_producer 2026-02-21T08:49:00.2299916Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.2300442Z .b8 19 // DW_AT_language 2026-02-21T08:49:00.2300848Z .b8 5 // DW_FORM_data2 2026-02-21T08:49:00.2301265Z .b8 3 // DW_AT_name 2026-02-21T08:49:00.2301652Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.2302098Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:49:00.2302516Z .b8 6 // DW_FORM_data4 2026-02-21T08:49:00.2302924Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:49:00.2303351Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.2303713Z .b8 0 // EOM(1) 2026-02-21T08:49:00.2304126Z .b8 0 // EOM(2) 2026-02-21T08:49:00.2304498Z .b8 0 // EOM(3) 2026-02-21T08:49:00.2304941Z } 2026-02-21T08:49:00.2305237Z .section .debug_info 2026-02-21T08:49:00.2305519Z { 2026-02-21T08:49:00.2305838Z .b32 104 // Length of Unit 2026-02-21T08:49:00.2306249Z .b8 2 // DWARF version number 2026-02-21T08:49:00.2306654Z .b8 0 2026-02-21T08:49:00.2307004Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:49:00.2307506Z .b8 8 // Address Size (in bytes) 2026-02-21T08:49:00.2307980Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:49:00.2308534Z .b8 116 // DW_AT_producer 2026-02-21T08:49:00.2308939Z .b8 114 2026-02-21T08:49:00.2309185Z .b8 105 2026-02-21T08:49:00.2309459Z .b8 116 2026-02-21T08:49:00.2309688Z .b8 111 2026-02-21T08:49:00.2309938Z .b8 110 2026-02-21T08:49:00.2310179Z .b8 0 2026-02-21T08:49:00.2310493Z .b8 2 // DW_AT_language 2026-02-21T08:49:00.2310842Z .b8 0 2026-02-21T08:49:00.2311131Z .b8 99 // DW_AT_name 2026-02-21T08:49:00.2311492Z .b8 104 2026-02-21T08:49:00.2311756Z .b8 105 2026-02-21T08:49:00.2312013Z .b8 101 2026-02-21T08:49:00.2312224Z .b8 102 2026-02-21T08:49:00.2312479Z .b8 105 2026-02-21T08:49:00.2312711Z .b8 120 2026-02-21T08:49:00.2312978Z .b8 107 2026-02-21T08:49:00.2313203Z .b8 101 2026-02-21T08:49:00.2313466Z .b8 104 2026-02-21T08:49:00.2313707Z .b8 106 2026-02-21T08:49:00.2313990Z .b8 120 2026-02-21T08:49:00.2314237Z .b8 115 2026-02-21T08:49:00.2314494Z .b8 122 2026-02-21T08:49:00.2314782Z .b8 113 2026-02-21T08:49:00.2315062Z .b8 50 2026-02-21T08:49:00.2315311Z .b8 55 2026-02-21T08:49:00.2315582Z .b8 110 2026-02-21T08:49:00.2315820Z .b8 98 2026-02-21T08:49:00.2316090Z .b8 98 2026-02-21T08:49:00.2316364Z .b8 100 2026-02-21T08:49:00.2316591Z .b8 55 2026-02-21T08:49:00.2316950Z .b8 104 2026-02-21T08:49:00.2317197Z .b8 109 2026-02-21T08:49:00.2317479Z .b8 110 2026-02-21T08:49:00.2317707Z .b8 104 2026-02-21T08:49:00.2317981Z .b8 111 2026-02-21T08:49:00.2318210Z .b8 104 2026-02-21T08:49:00.2318486Z .b8 97 2026-02-21T08:49:00.2318729Z .b8 112 2026-02-21T08:49:00.2318988Z .b8 52 2026-02-21T08:49:00.2319231Z .b8 113 2026-02-21T08:49:00.2319503Z .b8 104 2026-02-21T08:49:00.2319741Z .b8 55 2026-02-21T08:49:00.2320007Z .b8 54 2026-02-21T08:49:00.2320283Z .b8 109 2026-02-21T08:49:00.2320520Z .b8 120 2026-02-21T08:49:00.2320817Z .b8 102 2026-02-21T08:49:00.2321035Z .b8 100 2026-02-21T08:49:00.2321298Z .b8 121 
2026-02-21T08:49:00.2321528Z .b8 53 2026-02-21T08:49:00.2321797Z .b8 113 2026-02-21T08:49:00.2322027Z .b8 119 2026-02-21T08:49:00.2322270Z .b8 102 2026-02-21T08:49:00.2322492Z .b8 107 2026-02-21T08:49:00.2322790Z .b8 107 2026-02-21T08:49:00.2339351Z .b8 115 2026-02-21T08:49:00.2339648Z .b8 50 2026-02-21T08:49:00.2339889Z .b8 122 2026-02-21T08:49:00.2340158Z .b8 113 2026-02-21T08:49:00.2340389Z .b8 54 2026-02-21T08:49:00.2340659Z .b8 100 2026-02-21T08:49:00.2340879Z .b8 46 2026-02-21T08:49:00.2341133Z .b8 112 2026-02-21T08:49:00.2341529Z .b8 121 2026-02-21T08:49:00.2341788Z .b8 0 2026-02-21T08:49:00.2342125Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:49:00.2342583Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:49:00.2342937Z .b8 116 2026-02-21T08:49:00.2343193Z .b8 109 2026-02-21T08:49:00.2343453Z .b8 112 2026-02-21T08:49:00.2343673Z .b8 47 2026-02-21T08:49:00.2343921Z .b8 116 2026-02-21T08:49:00.2344140Z .b8 111 2026-02-21T08:49:00.2344387Z .b8 114 2026-02-21T08:49:00.2344601Z .b8 99 2026-02-21T08:49:00.2344903Z .b8 104 2026-02-21T08:49:00.2345128Z .b8 105 2026-02-21T08:49:00.2345383Z .b8 110 2026-02-21T08:49:00.2345599Z .b8 100 2026-02-21T08:49:00.2345839Z .b8 117 2026-02-21T08:49:00.2346054Z .b8 99 2026-02-21T08:49:00.2346304Z .b8 116 2026-02-21T08:49:00.2346526Z .b8 111 2026-02-21T08:49:00.2346771Z .b8 114 2026-02-21T08:49:00.2347011Z .b8 95 2026-02-21T08:49:00.2347235Z .b8 114 2026-02-21T08:49:00.2347488Z .b8 111 2026-02-21T08:49:00.2347803Z .b8 111 2026-02-21T08:49:00.2348059Z .b8 116 2026-02-21T08:49:00.2348278Z .b8 47 2026-02-21T08:49:00.2348526Z .b8 104 2026-02-21T08:49:00.2348745Z .b8 105 2026-02-21T08:49:00.2348986Z .b8 0 2026-02-21T08:49:00.2349213Z } 2026-02-21T08:49:00.2349501Z .section .debug_macinfo { } 2026-02-21T08:49:00.2349713Z 2026-02-21T08:49:00.2349865Z ================================================================ 2026-02-21T08:49:00.2350340Z please share the reproducer above with Triton project. 
2026-02-21T08:49:00.5505006Z [46s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:49:00.5505757Z 2026-02-21T08:49:00.5509100Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]), static_shapes=True) 2026-02-21T08:49:00.5510302Z 2026-02-21T08:49:00.5510560Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:49:00.5510791Z 2026-02-21T08:49:00.5510934Z `ptxas` stderr: 2026-02-21T08:49:00.5511147Z ================================================================ 2026-02-21T08:49:00.5515391Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 186 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:49:00.5515940Z Internal Triton PTX codegen error 2026-02-21T08:49:00.5519568Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:00.5519881Z `ptxas` stderr: 2026-02-21T08:49:00.5522603Z 2026-02-21T08:49:00.5523332Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 186 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:49:00.5527683Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmptmrs4vxy.ptx -o /tmp/tmptmrs4vxy.ptx.o 2026-02-21T08:49:00.5528308Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:00.5528530Z 2026-02-21T08:49:00.5528535Z 2026-02-21T08:49:00.5528949Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmptmrs4vxy.ptx -o /tmp/tmptmrs4vxy.ptx.o 2026-02-21T08:49:00.5529435Z 2026-02-21T08:49:00.5529439Z 2026-02-21T08:49:00.5529525Z // 2026-02-21T08:49:00.5529739Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:49:00.5530005Z // 2026-02-21T08:49:00.5530427Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:49:00.5530627Z 2026-02-21T08:49:00.5530707Z .version 8.7 2026-02-21T08:49:00.5530908Z .target sm_100a 2026-02-21T08:49:00.5531082Z .address_size 64 2026-02-21T08:49:00.5531217Z 2026-02-21T08:49:00.5531531Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:49:00.5531822Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:49:00.5532091Z // @_helion_matmul 2026-02-21T08:49:00.5532350Z .visible .entry _helion_matmul( 2026-02-21T08:49:00.5532599Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:49:00.5532912Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:49:00.5533192Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:49:00.5533490Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:49:00.5533770Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:49:00.5534027Z ) 2026-02-21T08:49:00.5534207Z .reqntid 128 2026-02-21T08:49:00.5534369Z .maxnreg 32 2026-02-21T08:49:00.5534554Z { 2026-02-21T08:49:00.5534776Z .reg .pred %p<85>; 2026-02-21T08:49:00.5535040Z .reg .b32 %r<386>; 
2026-02-21T08:49:00.5535240Z .reg .b64 %rd<128>; 2026-02-21T08:49:00.5535445Z $L__func_begin0: 2026-02-21T08:49:00.5535545Z 2026-02-21T08:49:00.5535654Z // %bb.0: 2026-02-21T08:49:00.5535961Z .loc 1 19 0 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:19 2026-02-21T08:49:00.5536277Z mov.u32 %r1, %tid.x; 2026-02-21T08:49:00.5536512Z ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T08:49:00.5536769Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:49:00.5536965Z mov.b32 %r31, global_smem; 2026-02-21T08:49:00.5537232Z // begin inline asm 2026-02-21T08:49:00.5537499Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r31], 64; 2026-02-21T08:49:00.5537803Z // end inline asm 2026-02-21T08:49:00.5538003Z ld.param.b64 %rd28, [_helion_matmul_param_3]; 2026-02-21T08:49:00.5538268Z bar.sync 0; 2026-02-21T08:49:00.5538447Z ld.shared.b32 %r378, [global_smem]; 2026-02-21T08:49:00.5538677Z bar.sync 0; 2026-02-21T08:49:00.5538867Z // begin inline asm 2026-02-21T08:49:00.5539109Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:49:00.5539402Z // end inline asm 2026-02-21T08:49:00.5539695Z .loc 1 21 68 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:21:68 2026-02-21T08:49:00.5540053Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:49:00.5540244Z mov.u32 %r40, %ctaid.y; 2026-02-21T08:49:00.5540455Z mov.u32 %r41, %ctaid.z; 2026-02-21T08:49:00.5540642Z mov.u32 %r42, %nctaid.x; 2026-02-21T08:49:00.5540863Z mov.u32 %r43, %nctaid.y; 2026-02-21T08:49:00.5541083Z mad.lo.s32 %r44, %r41, %r43, %r40; 2026-02-21T08:49:00.5541295Z mad.lo.s32 %r45, %r44, %r42, %r3; 2026-02-21T08:49:00.5541520Z shl.b32 %r46, %r45, 7; 2026-02-21T08:49:00.5541705Z cvt.s64.s32 %rd29, %r46; 2026-02-21T08:49:00.5541923Z add.s64 %rd25, %rd28, %rd29; 2026-02-21T08:49:00.5542116Z shl.b32 %r47, %r1, 2; 2026-02-21T08:49:00.5542367Z add.s32 %r32, %r31, %r47; 2026-02-21T08:49:00.5542553Z mov.b32 %r49, 0; 2026-02-21T08:49:00.5542753Z // begin inline asm 
2026-02-21T08:49:00.5542945Z @%p1 st.shared.b32 [ %r32 + 0 ], %r49; 2026-02-21T08:49:00.5543184Z // end inline asm 2026-02-21T08:49:00.5543384Z bar.warp.sync -1; 2026-02-21T08:49:00.5543567Z setp.eq.b32 %p75, %r1, 0; 2026-02-21T08:49:00.5543787Z cvt.u64.u32 %rd10, %r31; 2026-02-21T08:49:00.5543968Z // begin inline asm 2026-02-21T08:49:00.5544281Z @%p75 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T08:49:00.5544598Z // end inline asm 2026-02-21T08:49:00.5544838Z // begin inline asm 2026-02-21T08:49:00.5545096Z @%p75 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:49:00.5545414Z // end inline asm 2026-02-21T08:49:00.5545610Z mov.b32 %r92, 16; 2026-02-21T08:49:00.5545783Z // begin inline asm 2026-02-21T08:49:00.5546081Z @%p75 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r92; 2026-02-21T08:49:00.5546382Z // end inline asm 2026-02-21T08:49:00.5546576Z mov.b32 %r35, 64; 2026-02-21T08:49:00.5546775Z // begin inline asm 2026-02-21T08:49:00.5547063Z @%p75 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r35; 2026-02-21T08:49:00.5547382Z // end inline asm 2026-02-21T08:49:00.5547549Z mov.b32 %r36, 2048; 2026-02-21T08:49:00.5547745Z // begin inline asm 2026-02-21T08:49:00.5548018Z @%p75 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r36; 2026-02-21T08:49:00.5548344Z // end inline asm 2026-02-21T08:49:00.5548509Z mov.b32 %r37, 12288; 2026-02-21T08:49:00.5548715Z // begin inline asm 2026-02-21T08:49:00.5548990Z @%p75 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r37; 2026-02-21T08:49:00.5549326Z // end inline asm 2026-02-21T08:49:00.5549525Z mov.b64 %rd18, 4096; 2026-02-21T08:49:00.5549707Z // begin inline asm 2026-02-21T08:49:00.5550024Z @%p75 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T08:49:00.5550389Z // end inline asm 
2026-02-21T08:49:00.5550598Z mov.b32 %r38, 1; 2026-02-21T08:49:00.5550745Z // begin inline asm 2026-02-21T08:49:00.5551050Z @%p75 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r38; 2026-02-21T08:49:00.5551408Z // end inline asm 2026-02-21T08:49:00.5551620Z // begin inline asm 2026-02-21T08:49:00.5551928Z @%p75 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r38; 2026-02-21T08:49:00.5552292Z // end inline asm 2026-02-21T08:49:00.5552467Z // begin inline asm 2026-02-21T08:49:00.5552801Z @%p75 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T08:49:00.5553118Z // end inline asm 2026-02-21T08:49:00.5553323Z // begin inline asm 2026-02-21T08:49:00.5553651Z @%p75 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:49:00.5553991Z // end inline asm 2026-02-21T08:49:00.5554203Z // begin inline asm 2026-02-21T08:49:00.5554488Z @%p75 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T08:49:00.5554859Z // end inline asm 2026-02-21T08:49:00.5555036Z // begin inline asm 2026-02-21T08:49:00.5555345Z @%p75 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T08:49:00.5555653Z // end inline asm 2026-02-21T08:49:00.5555860Z // begin inline asm 2026-02-21T08:49:00.5556289Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T08:49:00.5556730Z // end inline asm 2026-02-21T08:49:00.5556950Z // begin inline asm 2026-02-21T08:49:00.5557191Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T08:49:00.5557510Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:49:00.5557736Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:49:00.5557980Z // end inline asm 2026-02-21T08:49:00.5558208Z bar.sync 0; 2026-02-21T08:49:00.5558388Z cvta.global.u64 %rd48, %rd25; 
2026-02-21T08:49:00.5558746Z .loc 1 27 132 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:27:132 2026-02-21T08:49:00.5559087Z setp.gt.u32 %p21, %r3, 6143; 2026-02-21T08:49:00.5559311Z @%p21 bra $L__BB0_8; 2026-02-21T08:49:00.5559512Z // %bb.1: // %.lr.ph 2026-02-21T08:49:00.5559874Z .loc 1 0 132 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:0:132 2026-02-21T08:49:00.5560249Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T08:49:00.5560561Z .loc 1 47 48 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:47:48 2026-02-21T08:49:00.5560841Z and.b32 %r135, %r1, 1; 2026-02-21T08:49:00.5560995Z shl.b32 %r4, %r135, 3; 2026-02-21T08:49:00.5561247Z .loc 1 39 45 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:39:45 2026-02-21T08:49:00.5561523Z shl.b32 %r136, %r1, 3; 2026-02-21T08:49:00.5561675Z and.b32 %r137, %r136, 56; 2026-02-21T08:49:00.5561828Z bfe.u32 %r138, %r1, 3, 4; 2026-02-21T08:49:00.5562021Z bfe.u32 %r5, %r1, 1, 6; 2026-02-21T08:49:00.5562164Z shr.u32 %r139, %r1, 5; 2026-02-21T08:49:00.5562308Z shl.b32 %r140, %r1, 4; 2026-02-21T08:49:00.5562451Z and.b32 %r141, %r140, 1904; 2026-02-21T08:49:00.5562612Z bfe.s32 %r142, %r1, 3, 1; 2026-02-21T08:49:00.5562764Z and.b32 %r143, %r142, 144; 2026-02-21T08:49:00.5562914Z xor.b32 %r144, %r143, %r141; 2026-02-21T08:49:00.5563074Z add.s32 %r91, %r31, %r144; 2026-02-21T08:49:00.5563223Z add.s32 %r98, %r91, 2048; 2026-02-21T08:49:00.5563375Z add.s32 %r105, %r91, 4096; 2026-02-21T08:49:00.5563521Z add.s32 %r112, %r91, 6144; 2026-02-21T08:49:00.5563673Z add.s32 %r119, %r91, 8192; 2026-02-21T08:49:00.5563820Z add.s32 %r126, %r91, 10240; 2026-02-21T08:49:00.5563978Z add.s32 %r189, %r91, 12288; 2026-02-21T08:49:00.5564124Z and.b32 %r146, %r140, 176; 2026-02-21T08:49:00.5564277Z and.b32 %r147, %r1, 96; 2026-02-21T08:49:00.5564430Z shl.b32 %r148, %r147, 3; 2026-02-21T08:49:00.5564607Z bfe.s32 %r149, %r1, 2, 1; 2026-02-21T08:49:00.5564786Z and.b32 %r150, %r149, 1088; 
2026-02-21T08:49:00.5564931Z and.b32 %r152, %r47, 64; 2026-02-21T08:49:00.5565085Z xor.b32 %r153, %r150, %r152; 2026-02-21T08:49:00.5565233Z add.s32 %r154, %r31, %r146; 2026-02-21T08:49:00.5565387Z add.s32 %r155, %r154, %r148; 2026-02-21T08:49:00.5565533Z shl.b32 %r156, %r1, 5; 2026-02-21T08:49:00.5565683Z and.b32 %r157, %r156, 1792; 2026-02-21T08:49:00.5565826Z and.b32 %r158, %r136, 48; 2026-02-21T08:49:00.5565979Z shl.b32 %r159, %r147, 1; 2026-02-21T08:49:00.5566162Z shl.b32 %r160, %r135, 6; 2026-02-21T08:49:00.5566310Z xor.b32 %r161, %r159, %r160; 2026-02-21T08:49:00.5566468Z add.s32 %r162, %r31, %r157; 2026-02-21T08:49:00.5566620Z add.s32 %r163, %r162, %r158; 2026-02-21T08:49:00.5566889Z .loc 1 34 33 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:34:33 2026-02-21T08:49:00.5567166Z shr.u32 %r164, %r3, 5; 2026-02-21T08:49:00.5567322Z and.b32 %r165, %r164, 252; 2026-02-21T08:49:00.5567578Z .loc 1 36 64 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:36:64 2026-02-21T08:49:00.5567865Z and.b32 %r166, %r3, 3; 2026-02-21T08:49:00.5568117Z .loc 1 36 30 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:36:30 2026-02-21T08:49:00.5568396Z or.b32 %r167, %r165, %r166; 2026-02-21T08:49:00.5568658Z .loc 1 38 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:38:27 2026-02-21T08:49:00.5568937Z shl.b32 %r194, %r167, 6; 2026-02-21T08:49:00.5569195Z .loc 1 40 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:40:27 2026-02-21T08:49:00.5569473Z shl.b32 %r168, %r3, 4; 2026-02-21T08:49:00.5569632Z and.b32 %r169, %r168, 1984; 2026-02-21T08:49:00.5569927Z .loc 1 41 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:41:32 2026-02-21T08:49:00.5570205Z or.b32 %r170, %r169, %r5; 2026-02-21T08:49:00.5570362Z or.b32 %r12, %r169, %r138; 2026-02-21T08:49:00.5570612Z .loc 1 51 53 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:53 2026-02-21T08:49:00.5570892Z shl.b32 %r171, %r170, 11; 2026-02-21T08:49:00.5571146Z 
.loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5571447Z shfl.sync.idx.b32 %r16, %r139, 0, 31, -1; 2026-02-21T08:49:00.5571632Z shl.b32 %r172, %r16, 21; 2026-02-21T08:49:00.5571779Z and.b32 %r173, %r172, 6291456; 2026-02-21T08:49:00.5571946Z add.s32 %r288, %r173, %r378; 2026-02-21T08:49:00.5572098Z mov.pred %p22, -1; 2026-02-21T08:49:00.5572246Z // begin inline asm 2026-02-21T08:49:00.5572590Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r288 + 0], 32, {%r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49}; 2026-02-21T08:49:00.5572978Z // end inline asm 2026-02-21T08:49:00.5573117Z // begin inline asm 2026-02-21T08:49:00.5573452Z @%p22 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r288 + 16], 32, {%r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49, %r49}; 2026-02-21T08:49:00.5573856Z // end inline asm 2026-02-21T08:49:00.5573987Z // begin inline asm 2026-02-21T08:49:00.5574141Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:49:00.5574299Z // end inline asm 2026-02-21T08:49:00.5574433Z bar.sync 0; 2026-02-21T08:49:00.5574697Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5574984Z add.s32 %r380, %r31, 28736; 2026-02-21T08:49:00.5575139Z // begin inline asm 2026-02-21T08:49:00.5575298Z @%p75 mbarrier.init.shared::cta.b64 [%r380], 1; 2026-02-21T08:49:00.5575489Z // end inline asm 2026-02-21T08:49:00.5575614Z bar.sync 0; 2026-02-21T08:49:00.5575747Z add.s32 %r83, %r31, 28744; 2026-02-21T08:49:00.5575895Z // begin inline asm 2026-02-21T08:49:00.5576123Z @%p75 mbarrier.init.shared::cta.b64 [%r83], 1; 2026-02-21T08:49:00.5576306Z // end inline asm 2026-02-21T08:49:00.5576443Z add.s32 %r84, %r31, 28672; 2026-02-21T08:49:00.5576587Z // begin inline asm 2026-02-21T08:49:00.5576751Z @%p75 mbarrier.init.shared::cta.b64 [%r84], 1; 2026-02-21T08:49:00.5576938Z // end inline asm 
2026-02-21T08:49:00.5577064Z bar.sync 0; 2026-02-21T08:49:00.5577199Z add.s32 %r85, %r31, 28680; 2026-02-21T08:49:00.5577345Z // begin inline asm 2026-02-21T08:49:00.5577509Z @%p75 mbarrier.init.shared::cta.b64 [%r85], 1; 2026-02-21T08:49:00.5577688Z // end inline asm 2026-02-21T08:49:00.5577855Z bar.sync 0; 2026-02-21T08:49:00.5577981Z add.s32 %r86, %r31, 28688; 2026-02-21T08:49:00.5578132Z // begin inline asm 2026-02-21T08:49:00.5578290Z @%p75 mbarrier.init.shared::cta.b64 [%r86], 1; 2026-02-21T08:49:00.5578465Z // end inline asm 2026-02-21T08:49:00.5578600Z bar.sync 0; 2026-02-21T08:49:00.5578725Z add.s32 %r87, %r31, 28696; 2026-02-21T08:49:00.5578880Z // begin inline asm 2026-02-21T08:49:00.5579036Z @%p75 mbarrier.init.shared::cta.b64 [%r87], 1; 2026-02-21T08:49:00.5579223Z // end inline asm 2026-02-21T08:49:00.5579351Z bar.sync 0; 2026-02-21T08:49:00.5579482Z add.s32 %r88, %r31, 28704; 2026-02-21T08:49:00.5579625Z // begin inline asm 2026-02-21T08:49:00.5579782Z @%p75 mbarrier.init.shared::cta.b64 [%r88], 1; 2026-02-21T08:49:00.5579963Z // end inline asm 2026-02-21T08:49:00.5580088Z bar.sync 0; 2026-02-21T08:49:00.5580221Z add.s32 %r89, %r31, 28712; 2026-02-21T08:49:00.5580363Z // begin inline asm 2026-02-21T08:49:00.5580525Z @%p75 mbarrier.init.shared::cta.b64 [%r89], 1; 2026-02-21T08:49:00.5580700Z // end inline asm 2026-02-21T08:49:00.5580833Z bar.sync 0; 2026-02-21T08:49:00.5580958Z add.s32 %r191, %r31, 28720; 2026-02-21T08:49:00.5581114Z // begin inline asm 2026-02-21T08:49:00.5581269Z @%p75 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T08:49:00.5581484Z // end inline asm 2026-02-21T08:49:00.5581733Z .loc 1 51 60 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:60 2026-02-21T08:49:00.5582018Z or.b32 %r174, %r171, %r4; 2026-02-21T08:49:00.5582276Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5582560Z mad.wide.u32 %rd30, %r174, 2, %rd8; 2026-02-21T08:49:00.5582841Z .loc 1 51 85 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5583126Z // begin inline asm 2026-02-21T08:49:00.5583319Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd30 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5583541Z // end inline asm 2026-02-21T08:49:00.5583677Z cp.async.commit_group; 2026-02-21T08:49:00.5583931Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5584196Z bar.sync 0; 2026-02-21T08:49:00.5584327Z // begin inline asm 2026-02-21T08:49:00.5584507Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r84], 2048; 2026-02-21T08:49:00.5584746Z // end inline asm 2026-02-21T08:49:00.5584991Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5585290Z bar.sync 0; 2026-02-21T08:49:00.5585430Z elect.sync %r175|%p46, -1; 2026-02-21T08:49:00.5585586Z and.pred %p34, %p1, %p46; 2026-02-21T08:49:00.5585745Z add.s32 %r94, %r31, 14336; 2026-02-21T08:49:00.5585888Z // begin inline asm 2026-02-21T08:49:00.5586210Z @%p34 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r94], [%rd48, {%r49, %r194}], [%r84]; 2026-02-21T08:49:00.5586556Z // end inline asm 2026-02-21T08:49:00.5586800Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5587089Z add.s64 %rd32, %rd30, 32; 2026-02-21T08:49:00.5587340Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5587618Z // begin inline asm 2026-02-21T08:49:00.5587832Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd32 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5588057Z // end inline asm 2026-02-21T08:49:00.5588191Z cp.async.commit_group; 2026-02-21T08:49:00.5588459Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5588743Z bar.sync 0; 2026-02-21T08:49:00.5588868Z // begin inline asm 2026-02-21T08:49:00.5589059Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r85], 2048; 
2026-02-21T08:49:00.5589271Z // end inline asm 2026-02-21T08:49:00.5589517Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5589822Z bar.sync 0; 2026-02-21T08:49:00.5589962Z elect.sync %r176|%p47, -1; 2026-02-21T08:49:00.5590119Z and.pred %p36, %p1, %p47; 2026-02-21T08:49:00.5590279Z add.s32 %r101, %r31, 16384; 2026-02-21T08:49:00.5590432Z // begin inline asm 2026-02-21T08:49:00.5590748Z @%p36 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r101], [%rd48, {%r92, %r194}], [%r85]; 2026-02-21T08:49:00.5591128Z // end inline asm 2026-02-21T08:49:00.5591376Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5591680Z add.s64 %rd34, %rd30, 64; 2026-02-21T08:49:00.5591943Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5592244Z // begin inline asm 2026-02-21T08:49:00.5592452Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd34 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5592680Z // end inline asm 2026-02-21T08:49:00.5592828Z cp.async.commit_group; 2026-02-21T08:49:00.5593091Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5593392Z bar.sync 0; 2026-02-21T08:49:00.5593521Z // begin inline asm 2026-02-21T08:49:00.5593742Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r86], 2048; 2026-02-21T08:49:00.5593961Z // end inline asm 2026-02-21T08:49:00.5594213Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5594503Z bar.sync 0; 2026-02-21T08:49:00.5594638Z elect.sync %r177|%p48, -1; 2026-02-21T08:49:00.5594832Z and.pred %p38, %p1, %p48; 2026-02-21T08:49:00.5594988Z add.s32 %r108, %r31, 18432; 2026-02-21T08:49:00.5595147Z mov.b32 %r109, 32; 2026-02-21T08:49:00.5595284Z // begin inline asm 2026-02-21T08:49:00.5595618Z @%p38 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd48, 
{%r109, %r194}], [%r86]; 2026-02-21T08:49:00.5595971Z // end inline asm 2026-02-21T08:49:00.5596225Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5596529Z add.s64 %rd36, %rd30, 96; 2026-02-21T08:49:00.5596794Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5597097Z // begin inline asm 2026-02-21T08:49:00.5597297Z cp.async.cg.shared.global [ %r112 + 0 ], [ %rd36 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5597559Z // end inline asm 2026-02-21T08:49:00.5597698Z cp.async.commit_group; 2026-02-21T08:49:00.5597966Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5598254Z bar.sync 0; 2026-02-21T08:49:00.5598381Z // begin inline asm 2026-02-21T08:49:00.5598575Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r87], 2048; 2026-02-21T08:49:00.5598794Z // end inline asm 2026-02-21T08:49:00.5599035Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5599304Z bar.sync 0; 2026-02-21T08:49:00.5599439Z elect.sync %r178|%p49, -1; 2026-02-21T08:49:00.5599594Z and.pred %p40, %p1, %p49; 2026-02-21T08:49:00.5599752Z add.s32 %r115, %r31, 20480; 2026-02-21T08:49:00.5599904Z mov.b32 %r116, 48; 2026-02-21T08:49:00.5600036Z // begin inline asm 2026-02-21T08:49:00.5600386Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r115], [%rd48, {%r116, %r194}], [%r87]; 2026-02-21T08:49:00.5600729Z // end inline asm 2026-02-21T08:49:00.5600983Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5601268Z add.s64 %rd38, %rd30, 128; 2026-02-21T08:49:00.5601532Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5601814Z // begin inline asm 2026-02-21T08:49:00.5602030Z cp.async.cg.shared.global [ %r119 + 0 ], [ %rd38 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5602254Z // end inline asm 
2026-02-21T08:49:00.5602389Z cp.async.commit_group; 2026-02-21T08:49:00.5602646Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5602913Z bar.sync 0; 2026-02-21T08:49:00.5603047Z // begin inline asm 2026-02-21T08:49:00.5603226Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r88], 2048; 2026-02-21T08:49:00.5603438Z // end inline asm 2026-02-21T08:49:00.5603679Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5603950Z bar.sync 0; 2026-02-21T08:49:00.5604087Z elect.sync %r179|%p50, -1; 2026-02-21T08:49:00.5604242Z and.pred %p42, %p1, %p50; 2026-02-21T08:49:00.5604400Z add.s32 %r122, %r31, 22528; 2026-02-21T08:49:00.5604548Z // begin inline asm 2026-02-21T08:49:00.5604897Z @%p42 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r122], [%rd48, {%r35, %r194}], [%r88]; 2026-02-21T08:49:00.5605242Z // end inline asm 2026-02-21T08:49:00.5605477Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5605767Z add.s64 %rd40, %rd30, 160; 2026-02-21T08:49:00.5606043Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5606328Z // begin inline asm 2026-02-21T08:49:00.5606522Z cp.async.cg.shared.global [ %r126 + 0 ], [ %rd40 + 0 ], 0x10, %r92; 2026-02-21T08:49:00.5606741Z // end inline asm 2026-02-21T08:49:00.5606873Z cp.async.commit_group; 2026-02-21T08:49:00.5607134Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5607409Z bar.sync 0; 2026-02-21T08:49:00.5607529Z // begin inline asm 2026-02-21T08:49:00.5607712Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r89], 2048; 2026-02-21T08:49:00.5607915Z // end inline asm 2026-02-21T08:49:00.5608153Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5608420Z bar.sync 0; 2026-02-21T08:49:00.5608553Z elect.sync %r180|%p51, 
-1; 2026-02-21T08:49:00.5608713Z and.pred %p44, %p1, %p51; 2026-02-21T08:49:00.5608865Z add.s32 %r129, %r31, 24576; 2026-02-21T08:49:00.5609016Z mov.b32 %r130, 80; 2026-02-21T08:49:00.5609149Z // begin inline asm 2026-02-21T08:49:00.5609492Z @%p44 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r129], [%rd48, {%r130, %r194}], [%r89]; 2026-02-21T08:49:00.5609827Z // end inline asm 2026-02-21T08:49:00.5610066Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5610343Z cp.async.wait_group 5; 2026-02-21T08:49:00.5610495Z bar.sync 0; 2026-02-21T08:49:00.5610726Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5610993Z // begin inline asm 2026-02-21T08:49:00.5611129Z 2026-02-21T08:49:00.5611239Z { 2026-02-21T08:49:00.5611369Z .reg .pred complete; 2026-02-21T08:49:00.5611513Z waitLoop: 2026-02-21T08:49:00.5611703Z mbarrier.try_wait.parity.shared.b64 complete, [%r84], %r49; 2026-02-21T08:49:00.5611930Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.5612082Z } 2026-02-21T08:49:00.5612146Z 2026-02-21T08:49:00.5612229Z // end inline asm 2026-02-21T08:49:00.5612471Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5612755Z setp.ne.b32 %p52, %r16, 0; 2026-02-21T08:49:00.5612906Z @%p52 bra $L__BB0_3; 2026-02-21T08:49:00.5613051Z // %bb.2: 2026-02-21T08:49:00.5613276Z .loc 1 0 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:0:52 2026-02-21T08:49:00.5613558Z bfe.u32 %r185, %r94, 4, 14; 2026-02-21T08:49:00.5613714Z cvt.u64.u32 %rd45, %r185; 2026-02-21T08:49:00.5613911Z or.b64 %rd43, %rd45, -4611685949699522560; 2026-02-21T08:49:00.5614092Z bfe.u32 %r186, %r31, 4, 14; 2026-02-21T08:49:00.5614239Z cvt.u64.u32 %rd46, %r186; 2026-02-21T08:49:00.5614404Z or.b64 %rd42, %rd46, -4611685949699522560; 2026-02-21T08:49:00.5614705Z .loc 1 53 52 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5614988Z elect.sync %r187|%p54, -1; 2026-02-21T08:49:00.5615141Z mov.b32 %r182, 68157456; 2026-02-21T08:49:00.5615297Z mov.pred %p53, 0; 2026-02-21T08:49:00.5615432Z // begin inline asm 2026-02-21T08:49:00.5615658Z @%p54 tcgen05.mma.cta_group::1.kind::f16 [ %r378 + 0 ], %rd42, %rd43, %r182, %p53; 2026-02-21T08:49:00.5615906Z // end inline asm 2026-02-21T08:49:00.5616037Z add.s32 %r188, %r31, 28736; 2026-02-21T08:49:00.5616192Z cvt.u64.u32 %rd44, %r188; 2026-02-21T08:49:00.5616334Z // begin inline asm 2026-02-21T08:49:00.5616539Z @%p54 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd44]; 2026-02-21T08:49:00.5616764Z // end inline asm 2026-02-21T08:49:00.5616900Z $L__BB0_3: 2026-02-21T08:49:00.5617123Z .loc 1 0 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:0:52 2026-02-21T08:49:00.5617424Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T08:49:00.5617640Z add.s32 %r8, %r155, %r153; 2026-02-21T08:49:00.5617796Z add.s32 %r293, %r163, %r161; 2026-02-21T08:49:00.5617954Z or.b32 %r11, %r194, %r137; 2026-02-21T08:49:00.5618100Z or.b32 %r13, %r12, 16; 2026-02-21T08:49:00.5618252Z or.b32 %r14, %r12, 32; 2026-02-21T08:49:00.5618391Z or.b32 %r15, %r12, 48; 2026-02-21T08:49:00.5618641Z .loc 1 51 32 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:32 2026-02-21T08:49:00.5618912Z add.s64 %rd47, %rd30, 192; 2026-02-21T08:49:00.5619166Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5619444Z bar.sync 0; 2026-02-21T08:49:00.5619570Z mov.b32 %r190, 16; 2026-02-21T08:49:00.5619710Z // begin inline asm 2026-02-21T08:49:00.5619907Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd47 + 0 ], 0x10, %r190; 2026-02-21T08:49:00.5620130Z // end inline asm 2026-02-21T08:49:00.5620265Z cp.async.commit_group; 2026-02-21T08:49:00.5620519Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 
2026-02-21T08:49:00.5620788Z // begin inline asm 2026-02-21T08:49:00.5620978Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r191], 2048; 2026-02-21T08:49:00.5621252Z // end inline asm 2026-02-21T08:49:00.5621493Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5621780Z bar.sync 0; 2026-02-21T08:49:00.5621916Z elect.sync %r201|%p59, -1; 2026-02-21T08:49:00.5622085Z and.pred %p57, %p1, %p59; 2026-02-21T08:49:00.5622242Z add.s32 %r192, %r31, 26624; 2026-02-21T08:49:00.5622401Z mov.b32 %r193, 96; 2026-02-21T08:49:00.5622547Z // begin inline asm 2026-02-21T08:49:00.5622893Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r192], [%rd48, {%r193, %r194}], [%r191]; 2026-02-21T08:49:00.5623267Z // end inline asm 2026-02-21T08:49:00.5623512Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5623802Z shl.b32 %r202, %r3, 15; 2026-02-21T08:49:00.5623984Z and.b32 %r203, %r202, 4063232; 2026-02-21T08:49:00.5624151Z shl.b32 %r204, %r5, 11; 2026-02-21T08:49:00.5624296Z or.b32 %r205, %r203, %r204; 2026-02-21T08:49:00.5624450Z or.b32 %r206, %r205, %r4; 2026-02-21T08:49:00.5624609Z mad.wide.u32 %rd50, %r206, 2, %rd8; 2026-02-21T08:49:00.5624816Z add.s64 %rd126, %rd50, 224; 2026-02-21T08:49:00.5624972Z mov.b32 %r384, 1; 2026-02-21T08:49:00.5625101Z mov.b32 %r383, 6; 2026-02-21T08:49:00.5625236Z mov.b32 %r379, 0; 2026-02-21T08:49:00.5625365Z mov.b64 %rd127, 0; 2026-02-21T08:49:00.5625504Z mov.b32 %r381, %r379; 2026-02-21T08:49:00.5625675Z mov.b32 %r382, %r379; 2026-02-21T08:49:00.5625821Z mov.b32 %r385, %r379; 2026-02-21T08:49:00.5625955Z bra.uni $L__BB0_4; 2026-02-21T08:49:00.5626140Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:49:00.5626462Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5626740Z setp.lt.u64 %p67, %rd127, 1936; 2026-02-21T08:49:00.5627020Z .loc 1 53 52 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5627293Z // begin inline asm 2026-02-21T08:49:00.5627426Z 2026-02-21T08:49:00.5627531Z { 2026-02-21T08:49:00.5627654Z .reg .pred complete; 2026-02-21T08:49:00.5627793Z waitLoop: 2026-02-21T08:49:00.5627983Z mbarrier.try_wait.parity.shared.b64 complete, [%r380], %r379; 2026-02-21T08:49:00.5628217Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.5628364Z } 2026-02-21T08:49:00.5628425Z 2026-02-21T08:49:00.5628488Z // end inline asm 2026-02-21T08:49:00.5628620Z add.s32 %r234, %r384, 1; 2026-02-21T08:49:00.5628777Z setp.gt.s32 %p70, %r234, 1; 2026-02-21T08:49:00.5628930Z selp.b32 %r384, 0, %r234, %p70; 2026-02-21T08:49:00.5629096Z selp.b32 %r235, 1, 0, %p70; 2026-02-21T08:49:00.5629246Z xor.b32 %r385, %r245, %r235; 2026-02-21T08:49:00.5629526Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5629817Z add.s32 %r236, %r383, 1; 2026-02-21T08:49:00.5629969Z setp.gt.s32 %p71, %r236, 6; 2026-02-21T08:49:00.5630129Z selp.b32 %r383, 0, %r236, %p71; 2026-02-21T08:49:00.5630391Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5630675Z bar.sync 0; 2026-02-21T08:49:00.5630803Z shl.b32 %r237, %r383, 11; 2026-02-21T08:49:00.5630960Z add.s32 %r227, %r91, %r237; 2026-02-21T08:49:00.5631112Z selp.b32 %r228, 16, 0, %p67; 2026-02-21T08:49:00.5631271Z // begin inline asm 2026-02-21T08:49:00.5631474Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd126 + 0 ], 0x10, %r228; 2026-02-21T08:49:00.5631691Z // end inline asm 2026-02-21T08:49:00.5631833Z cp.async.commit_group; 2026-02-21T08:49:00.5632077Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5632355Z shl.b32 %r238, %r383, 3; 2026-02-21T08:49:00.5632499Z add.s32 %r240, %r31, %r238; 2026-02-21T08:49:00.5632654Z add.s32 %r233, %r240, 28672; 2026-02-21T08:49:00.5632838Z and.pred %p65, %p75, %p67; 
2026-02-21T08:49:00.5632995Z // begin inline asm 2026-02-21T08:49:00.5633181Z @%p65 mbarrier.arrive.expect_tx.shared.b64 _, [%r233], 2048; 2026-02-21T08:49:00.5633392Z // end inline asm 2026-02-21T08:49:00.5633633Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5633912Z add.s32 %r241, %r31, %r237; 2026-02-21T08:49:00.5634064Z add.s32 %r230, %r241, 14336; 2026-02-21T08:49:00.5634211Z bar.sync 0; 2026-02-21T08:49:00.5634352Z elect.sync %r242|%p72, -1; 2026-02-21T08:49:00.5634508Z and.pred %p73, %p67, %p72; 2026-02-21T08:49:00.5634697Z and.pred %p66, %p1, %p73; 2026-02-21T08:49:00.5634863Z cvt.u32.u64 %r243, %rd127; 2026-02-21T08:49:00.5635018Z add.s32 %r231, %r243, 112; 2026-02-21T08:49:00.5635177Z // begin inline asm 2026-02-21T08:49:00.5635538Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r230], [%rd48, {%r231, %r194}], [%r233]; 2026-02-21T08:49:00.5635925Z // end inline asm 2026-02-21T08:49:00.5636180Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5636484Z add.s64 %rd126, %rd126, 32; 2026-02-21T08:49:00.5636648Z setp.lt.u64 %p74, %rd127, 2016; 2026-02-21T08:49:00.5636823Z add.s64 %rd127, %rd127, 16; 2026-02-21T08:49:00.5636982Z mov.b32 %r379, %r245; 2026-02-21T08:49:00.5637128Z mov.b32 %r380, %r244; 2026-02-21T08:49:00.5637306Z @%p74 bra $L__BB0_4; 2026-02-21T08:49:00.5637450Z bra.uni $L__BB0_7; 2026-02-21T08:49:00.5637644Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:49:00.5637982Z .loc 1 0 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:0:42 2026-02-21T08:49:00.5638275Z mov.b32 %r245, %r385; 2026-02-21T08:49:00.5638532Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5638827Z add.s32 %r209, %r382, 1; 2026-02-21T08:49:00.5638992Z setp.gt.s32 %p61, %r209, 6; 2026-02-21T08:49:00.5639156Z selp.b32 %r382, 0, %r209, %p61; 
2026-02-21T08:49:00.5639333Z selp.b32 %r210, 1, 0, %p61; 2026-02-21T08:49:00.5639490Z xor.b32 %r381, %r381, %r210; 2026-02-21T08:49:00.5639764Z .loc 1 51 85 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:51:85 2026-02-21T08:49:00.5640066Z cp.async.wait_group 5; 2026-02-21T08:49:00.5640228Z bar.sync 0; 2026-02-21T08:49:00.5640483Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5640769Z shl.b32 %r211, %r382, 3; 2026-02-21T08:49:00.5640930Z add.s32 %r213, %r31, %r211; 2026-02-21T08:49:00.5641088Z add.s32 %r207, %r213, 28672; 2026-02-21T08:49:00.5641251Z // begin inline asm 2026-02-21T08:49:00.5641413Z 2026-02-21T08:49:00.5641536Z { 2026-02-21T08:49:00.5641660Z .reg .pred complete; 2026-02-21T08:49:00.5641815Z waitLoop: 2026-02-21T08:49:00.5642003Z mbarrier.try_wait.parity.shared.b64 complete, [%r207], %r381; 2026-02-21T08:49:00.5642247Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.5642403Z } 2026-02-21T08:49:00.5642469Z 2026-02-21T08:49:00.5642524Z // end inline asm 2026-02-21T08:49:00.5642671Z shl.b32 %r214, %r384, 3; 2026-02-21T08:49:00.5642823Z add.s32 %r215, %r31, %r214; 2026-02-21T08:49:00.5642990Z add.s32 %r244, %r215, 28736; 2026-02-21T08:49:00.5643243Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5643528Z @%p52 bra $L__BB0_6; 2026-02-21T08:49:00.5643708Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:49:00.5644027Z .loc 1 52 44 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:52:44 2026-02-21T08:49:00.5644307Z shl.b32 %r218, %r382, 11; 2026-02-21T08:49:00.5644452Z add.s32 %r220, %r31, %r218; 2026-02-21T08:49:00.5644610Z add.s32 %r221, %r220, 14336; 2026-02-21T08:49:00.5644920Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5645212Z elect.sync %r222|%p63, -1; 2026-02-21T08:49:00.5645368Z bfe.u32 %r223, %r220, 4, 14; 2026-02-21T08:49:00.5645524Z cvt.u64.u32 %rd54, %r223; 
2026-02-21T08:49:00.5645694Z or.b64 %rd51, %rd54, -4611685949699522560; 2026-02-21T08:49:00.5645868Z bfe.u32 %r224, %r221, 4, 14; 2026-02-21T08:49:00.5646024Z cvt.u64.u32 %rd55, %r224; 2026-02-21T08:49:00.5646094Z or.b64 %rd52, %rd55, -4611685949699522560; 2026-02-21T08:49:00.5646152Z mov.b32 %r217, 68157456; 2026-02-21T08:49:00.5646218Z mov.pred %p62, -1; 2026-02-21T08:49:00.5646276Z // begin inline asm 2026-02-21T08:49:00.5646417Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r378 + 0 ], %rd51, %rd52, %r217, %p62; 2026-02-21T08:49:00.5646475Z // end inline asm 2026-02-21T08:49:00.5646541Z cvt.u64.u32 %rd53, %r244; 2026-02-21T08:49:00.5646624Z // begin inline asm 2026-02-21T08:49:00.5646748Z @%p63 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd53]; 2026-02-21T08:49:00.5646807Z // end inline asm 2026-02-21T08:49:00.5646861Z bra.uni $L__BB0_6; 2026-02-21T08:49:00.5646952Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:49:00.5647122Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5647175Z // begin inline asm 2026-02-21T08:49:00.5647223Z 2026-02-21T08:49:00.5647294Z { 2026-02-21T08:49:00.5647360Z .reg .pred complete; 2026-02-21T08:49:00.5647412Z waitLoop: 2026-02-21T08:49:00.5647522Z mbarrier.try_wait.parity.shared.b64 complete, [%r244], %r245; 2026-02-21T08:49:00.5647590Z @!complete bra.uni waitLoop; 2026-02-21T08:49:00.5647637Z } 2026-02-21T08:49:00.5647641Z 2026-02-21T08:49:00.5647694Z // end inline asm 2026-02-21T08:49:00.5647855Z .loc 1 46 42 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:46:42 2026-02-21T08:49:00.5647923Z cp.async.wait_group 0; 2026-02-21T08:49:00.5647974Z bar.sync 0; 2026-02-21T08:49:00.5648028Z // begin inline asm 2026-02-21T08:49:00.5648116Z @%p75 mbarrier.inval.shared::cta.b64 [%r84]; 2026-02-21T08:49:00.5648168Z // end inline asm 2026-02-21T08:49:00.5648218Z bar.sync 0; 2026-02-21T08:49:00.5648272Z // begin inline asm 2026-02-21T08:49:00.5648355Z @%p75 
mbarrier.inval.shared::cta.b64 [%r85]; 2026-02-21T08:49:00.5648407Z // end inline asm 2026-02-21T08:49:00.5648457Z bar.sync 0; 2026-02-21T08:49:00.5648522Z // begin inline asm 2026-02-21T08:49:00.5648596Z @%p75 mbarrier.inval.shared::cta.b64 [%r86]; 2026-02-21T08:49:00.5648649Z // end inline asm 2026-02-21T08:49:00.5648706Z bar.sync 0; 2026-02-21T08:49:00.5648759Z // begin inline asm 2026-02-21T08:49:00.5648832Z @%p75 mbarrier.inval.shared::cta.b64 [%r87]; 2026-02-21T08:49:00.5648906Z // end inline asm 2026-02-21T08:49:00.5648968Z bar.sync 0; 2026-02-21T08:49:00.5649024Z // begin inline asm 2026-02-21T08:49:00.5649097Z @%p75 mbarrier.inval.shared::cta.b64 [%r88]; 2026-02-21T08:49:00.5649159Z // end inline asm 2026-02-21T08:49:00.5649211Z bar.sync 0; 2026-02-21T08:49:00.5649264Z // begin inline asm 2026-02-21T08:49:00.5649336Z @%p75 mbarrier.inval.shared::cta.b64 [%r89]; 2026-02-21T08:49:00.5649399Z // end inline asm 2026-02-21T08:49:00.5649452Z bar.sync 0; 2026-02-21T08:49:00.5649508Z // begin inline asm 2026-02-21T08:49:00.5649594Z @%p75 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T08:49:00.5649647Z // end inline asm 2026-02-21T08:49:00.5649706Z add.s32 %r253, %r31, 28736; 2026-02-21T08:49:00.5649759Z // begin inline asm 2026-02-21T08:49:00.5649842Z @%p75 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T08:49:00.5649893Z // end inline asm 2026-02-21T08:49:00.5649944Z bar.sync 0; 2026-02-21T08:49:00.5650006Z // begin inline asm 2026-02-21T08:49:00.5650082Z @%p75 mbarrier.inval.shared::cta.b64 [%r83]; 2026-02-21T08:49:00.5650134Z // end inline asm 2026-02-21T08:49:00.5650306Z .loc 1 56 53 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:56:53 2026-02-21T08:49:00.5650395Z mad.lo.s32 %r326, %r12, 12288, %r11; 2026-02-21T08:49:00.5650459Z mad.lo.s32 %r327, %r13, 12288, %r11; 2026-02-21T08:49:00.5650520Z mad.lo.s32 %r328, %r14, 12288, %r11; 2026-02-21T08:49:00.5650585Z mad.lo.s32 %r329, %r15, 12288, %r11; 2026-02-21T08:49:00.5650747Z .loc 1 56 24 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:56:24 2026-02-21T08:49:00.5650814Z mad.wide.u32 %rd58, %r326, 2, %rd9; 2026-02-21T08:49:00.5650884Z mad.wide.u32 %rd59, %r327, 2, %rd9; 2026-02-21T08:49:00.5650943Z mad.wide.u32 %rd60, %r328, 2, %rd9; 2026-02-21T08:49:00.5651001Z mad.wide.u32 %rd61, %r329, 2, %rd9; 2026-02-21T08:49:00.5651170Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5651225Z // begin inline asm 2026-02-21T08:49:00.5651542Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270}, [%r288 + 0], 32; 2026-02-21T08:49:00.5651605Z // end inline asm 2026-02-21T08:49:00.5651660Z // begin inline asm 2026-02-21T08:49:00.5651936Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287}, [%r288 + 16], 32; 2026-02-21T08:49:00.5651988Z // end inline asm 2026-02-21T08:49:00.5652049Z // begin inline asm 2026-02-21T08:49:00.5652138Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:49:00.5652190Z // end inline asm 2026-02-21T08:49:00.5652253Z cvt.u64.u32 %rd62, %r255; 2026-02-21T08:49:00.5652309Z cvt.u64.u32 %rd63, %r256; 2026-02-21T08:49:00.5652366Z shl.b64 %rd64, %rd63, 32; 2026-02-21T08:49:00.5652421Z or.b64 %rd65, %rd62, %rd64; 2026-02-21T08:49:00.5652592Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5652654Z mov.b64 {%r330, %r331}, %rd65; 2026-02-21T08:49:00.5652721Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T08:49:00.5652892Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5652949Z cvt.u64.u32 %rd66, %r257; 2026-02-21T08:49:00.5653004Z cvt.u64.u32 %rd67, %r258; 2026-02-21T08:49:00.5653065Z shl.b64 %rd68, %rd67, 32; 2026-02-21T08:49:00.5653122Z or.b64 %rd69, %rd66, %rd68; 2026-02-21T08:49:00.5653281Z .loc 
1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5653342Z mov.b64 {%r333, %r334}, %rd69; 2026-02-21T08:49:00.5653412Z cvt.rn.f16x2.f32 %r335, %r334, %r333; 2026-02-21T08:49:00.5653572Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5653671Z cvt.u64.u32 %rd70, %r259; 2026-02-21T08:49:00.5653735Z cvt.u64.u32 %rd71, %r260; 2026-02-21T08:49:00.5653793Z shl.b64 %rd72, %rd71, 32; 2026-02-21T08:49:00.5653849Z or.b64 %rd73, %rd70, %rd72; 2026-02-21T08:49:00.5654007Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5654064Z mov.b64 {%r336, %r337}, %rd73; 2026-02-21T08:49:00.5654126Z cvt.rn.f16x2.f32 %r338, %r337, %r336; 2026-02-21T08:49:00.5654286Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5654346Z cvt.u64.u32 %rd74, %r261; 2026-02-21T08:49:00.5654403Z cvt.u64.u32 %rd75, %r262; 2026-02-21T08:49:00.5654456Z shl.b64 %rd76, %rd75, 32; 2026-02-21T08:49:00.5654517Z or.b64 %rd77, %rd74, %rd76; 2026-02-21T08:49:00.5654704Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5654763Z mov.b64 {%r339, %r340}, %rd77; 2026-02-21T08:49:00.5654834Z cvt.rn.f16x2.f32 %r341, %r340, %r339; 2026-02-21T08:49:00.5654994Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5655074Z cvt.u64.u32 %rd78, %r263; 2026-02-21T08:49:00.5655129Z cvt.u64.u32 %rd79, %r264; 2026-02-21T08:49:00.5655192Z shl.b64 %rd80, %rd79, 32; 2026-02-21T08:49:00.5655248Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T08:49:00.5655408Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5655471Z mov.b64 {%r342, %r343}, %rd81; 2026-02-21T08:49:00.5655531Z cvt.rn.f16x2.f32 %r344, %r343, %r342; 2026-02-21T08:49:00.5655695Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 
2026-02-21T08:49:00.5655757Z cvt.u64.u32 %rd82, %r265; 2026-02-21T08:49:00.5655812Z cvt.u64.u32 %rd83, %r266; 2026-02-21T08:49:00.5655866Z shl.b64 %rd84, %rd83, 32; 2026-02-21T08:49:00.5655923Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T08:49:00.5656116Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5656179Z mov.b64 {%r345, %r346}, %rd85; 2026-02-21T08:49:00.5656240Z cvt.rn.f16x2.f32 %r347, %r346, %r345; 2026-02-21T08:49:00.5656410Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5656465Z cvt.u64.u32 %rd86, %r267; 2026-02-21T08:49:00.5656519Z cvt.u64.u32 %rd87, %r268; 2026-02-21T08:49:00.5656583Z shl.b64 %rd88, %rd87, 32; 2026-02-21T08:49:00.5656639Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T08:49:00.5656830Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5656889Z mov.b64 {%r348, %r349}, %rd89; 2026-02-21T08:49:00.5656959Z cvt.rn.f16x2.f32 %r350, %r349, %r348; 2026-02-21T08:49:00.5657125Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5657183Z cvt.u64.u32 %rd90, %r269; 2026-02-21T08:49:00.5657249Z cvt.u64.u32 %rd91, %r270; 2026-02-21T08:49:00.5657307Z shl.b64 %rd92, %rd91, 32; 2026-02-21T08:49:00.5657365Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T08:49:00.5657535Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5657595Z mov.b64 {%r351, %r352}, %rd93; 2026-02-21T08:49:00.5657658Z cvt.rn.f16x2.f32 %r353, %r352, %r351; 2026-02-21T08:49:00.5657824Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5657893Z cvt.u64.u32 %rd94, %r272; 2026-02-21T08:49:00.5657950Z cvt.u64.u32 %rd95, %r273; 2026-02-21T08:49:00.5658009Z shl.b64 %rd96, %rd95, 32; 2026-02-21T08:49:00.5658078Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T08:49:00.5658239Z .loc 1 55 27 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5658323Z mov.b64 {%r354, %r355}, %rd97; 2026-02-21T08:49:00.5658393Z cvt.rn.f16x2.f32 %r356, %r355, %r354; 2026-02-21T08:49:00.5658558Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5658614Z cvt.u64.u32 %rd98, %r274; 2026-02-21T08:49:00.5658669Z cvt.u64.u32 %rd99, %r275; 2026-02-21T08:49:00.5658733Z shl.b64 %rd100, %rd99, 32; 2026-02-21T08:49:00.5658791Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T08:49:00.5658953Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5659020Z mov.b64 {%r357, %r358}, %rd101; 2026-02-21T08:49:00.5659082Z cvt.rn.f16x2.f32 %r359, %r358, %r357; 2026-02-21T08:49:00.5659245Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5659310Z cvt.u64.u32 %rd102, %r276; 2026-02-21T08:49:00.5659366Z cvt.u64.u32 %rd103, %r277; 2026-02-21T08:49:00.5659424Z shl.b64 %rd104, %rd103, 32; 2026-02-21T08:49:00.5659481Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T08:49:00.5659651Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5659734Z mov.b64 {%r360, %r361}, %rd105; 2026-02-21T08:49:00.5659794Z cvt.rn.f16x2.f32 %r362, %r361, %r360; 2026-02-21T08:49:00.5659956Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5660011Z cvt.u64.u32 %rd106, %r278; 2026-02-21T08:49:00.5660066Z cvt.u64.u32 %rd107, %r279; 2026-02-21T08:49:00.5660126Z shl.b64 %rd108, %rd107, 32; 2026-02-21T08:49:00.5660184Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T08:49:00.5660342Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5660398Z mov.b64 {%r363, %r364}, %rd109; 2026-02-21T08:49:00.5660467Z cvt.rn.f16x2.f32 %r365, %r364, %r363; 2026-02-21T08:49:00.5660626Z .loc 1 53 52 // 
cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5660704Z cvt.u64.u32 %rd110, %r280; 2026-02-21T08:49:00.5660769Z cvt.u64.u32 %rd111, %r281; 2026-02-21T08:49:00.5660825Z shl.b64 %rd112, %rd111, 32; 2026-02-21T08:49:00.5660883Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T08:49:00.5661053Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5661111Z mov.b64 {%r366, %r367}, %rd113; 2026-02-21T08:49:00.5661170Z cvt.rn.f16x2.f32 %r368, %r367, %r366; 2026-02-21T08:49:00.5661332Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5661422Z cvt.u64.u32 %rd114, %r282; 2026-02-21T08:49:00.5661478Z cvt.u64.u32 %rd115, %r283; 2026-02-21T08:49:00.5661534Z shl.b64 %rd116, %rd115, 32; 2026-02-21T08:49:00.5661597Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T08:49:00.5661761Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5661820Z mov.b64 {%r369, %r370}, %rd117; 2026-02-21T08:49:00.5661888Z cvt.rn.f16x2.f32 %r371, %r370, %r369; 2026-02-21T08:49:00.5662050Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5662106Z cvt.u64.u32 %rd118, %r284; 2026-02-21T08:49:00.5662161Z cvt.u64.u32 %rd119, %r285; 2026-02-21T08:49:00.5662223Z shl.b64 %rd120, %rd119, 32; 2026-02-21T08:49:00.5662278Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T08:49:00.5662437Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5662501Z mov.b64 {%r372, %r373}, %rd121; 2026-02-21T08:49:00.5662561Z cvt.rn.f16x2.f32 %r374, %r373, %r372; 2026-02-21T08:49:00.5662717Z .loc 1 53 52 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:53:52 2026-02-21T08:49:00.5662801Z cvt.u64.u32 %rd122, %r286; 2026-02-21T08:49:00.5662857Z cvt.u64.u32 %rd123, %r287; 2026-02-21T08:49:00.5662914Z shl.b64 %rd124, %rd123, 32; 2026-02-21T08:49:00.5662971Z or.b64 %rd125, %rd122, 
%rd124; 2026-02-21T08:49:00.5663136Z .loc 1 55 27 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:55:27 2026-02-21T08:49:00.5663192Z mov.b64 {%r375, %r376}, %rd125; 2026-02-21T08:49:00.5663251Z cvt.rn.f16x2.f32 %r377, %r376, %r375; 2026-02-21T08:49:00.5663414Z .loc 1 56 83 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:56:83 2026-02-21T08:49:00.5663505Z st.shared.v4.b32 [%r8], {%r332, %r344, %r356, %r368}; 2026-02-21T08:49:00.5663559Z bar.sync 0; 2026-02-21T08:49:00.5663620Z // begin inline asm 2026-02-21T08:49:00.5663766Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r309, %r313, %r317, %r321}, [%r293]; 2026-02-21T08:49:00.5663820Z // end inline asm 2026-02-21T08:49:00.5663869Z bar.sync 0; 2026-02-21T08:49:00.5663966Z st.shared.v4.b32 [%r8], {%r335, %r347, %r359, %r371}; 2026-02-21T08:49:00.5664017Z bar.sync 0; 2026-02-21T08:49:00.5664073Z // begin inline asm 2026-02-21T08:49:00.5664223Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r310, %r314, %r318, %r322}, [%r293]; 2026-02-21T08:49:00.5664298Z // end inline asm 2026-02-21T08:49:00.5664349Z bar.sync 0; 2026-02-21T08:49:00.5664432Z st.shared.v4.b32 [%r8], {%r338, %r350, %r362, %r374}; 2026-02-21T08:49:00.5664491Z bar.sync 0; 2026-02-21T08:49:00.5664544Z // begin inline asm 2026-02-21T08:49:00.5664707Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r311, %r315, %r319, %r323}, [%r293]; 2026-02-21T08:49:00.5664770Z // end inline asm 2026-02-21T08:49:00.5664822Z bar.sync 0; 2026-02-21T08:49:00.5664904Z st.shared.v4.b32 [%r8], {%r341, %r353, %r365, %r377}; 2026-02-21T08:49:00.5664955Z bar.sync 0; 2026-02-21T08:49:00.5665017Z // begin inline asm 2026-02-21T08:49:00.5665151Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r312, %r316, %r320, %r324}, [%r293]; 2026-02-21T08:49:00.5665206Z // end inline asm 2026-02-21T08:49:00.5665267Z // begin inline asm 2026-02-21T08:49:00.5665397Z st.global.v4.b32 [ %rd58 + 0 ], { %r309, %r310, %r311, %r312 }; 2026-02-21T08:49:00.5665452Z // end inline asm 
2026-02-21T08:49:00.5665513Z // begin inline asm 2026-02-21T08:49:00.5665611Z st.global.v4.b32 [ %rd59 + 0 ], { %r313, %r314, %r315, %r316 }; 2026-02-21T08:49:00.5665665Z // end inline asm 2026-02-21T08:49:00.5665719Z // begin inline asm 2026-02-21T08:49:00.5665819Z st.global.v4.b32 [ %rd60 + 0 ], { %r317, %r318, %r319, %r320 }; 2026-02-21T08:49:00.5665872Z // end inline asm 2026-02-21T08:49:00.5665927Z // begin inline asm 2026-02-21T08:49:00.5666060Z st.global.v4.b32 [ %rd61 + 0 ], { %r321, %r322, %r323, %r324 }; 2026-02-21T08:49:00.5666116Z // end inline asm 2026-02-21T08:49:00.5666195Z $L__BB0_8: // %._crit_edge 2026-02-21T08:49:00.5666365Z .loc 1 27 4 // cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py:27:4 2026-02-21T08:49:00.5666429Z bar.sync 0; 2026-02-21T08:49:00.5666486Z // begin inline asm 2026-02-21T08:49:00.5666602Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r378, 64; 2026-02-21T08:49:00.5666665Z // end inline asm 2026-02-21T08:49:00.5666714Z ret; 2026-02-21T08:49:00.5666767Z $L__tmp0: 2026-02-21T08:49:00.5666826Z $L__func_end0: 2026-02-21T08:49:00.5666905Z // -- End function 2026-02-21T08:49:00.5666956Z } 2026-02-21T08:49:00.5667157Z .file 1 "/tmp/torchinductor_root/qz/cqzx3t3q74pyrdg2r7hiqf44harqrlznpa5umwqpucq2j5pezotb.py" 2026-02-21T08:49:00.5667226Z .section .debug_abbrev 2026-02-21T08:49:00.5667275Z { 2026-02-21T08:49:00.5667360Z .b8 1 // Abbreviation Code 2026-02-21T08:49:00.5667448Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:49:00.5667525Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:49:00.5667602Z .b8 37 // DW_AT_producer 2026-02-21T08:49:00.5667697Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.5667781Z .b8 19 // DW_AT_language 2026-02-21T08:49:00.5667858Z .b8 5 // DW_FORM_data2 2026-02-21T08:49:00.5667931Z .b8 3 // DW_AT_name 2026-02-21T08:49:00.5668008Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.5668082Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:49:00.5668155Z .b8 6 // DW_FORM_data4 2026-02-21T08:49:00.5668233Z .b8 27 // 
DW_AT_comp_dir 2026-02-21T08:49:00.5668303Z .b8 8 // DW_FORM_string 2026-02-21T08:49:00.5668371Z .b8 0 // EOM(1) 2026-02-21T08:49:00.5668447Z .b8 0 // EOM(2) 2026-02-21T08:49:00.5668512Z .b8 0 // EOM(3) 2026-02-21T08:49:00.5668560Z } 2026-02-21T08:49:00.5668619Z .section .debug_info 2026-02-21T08:49:00.5668676Z { 2026-02-21T08:49:00.5668759Z .b32 104 // Length of Unit 2026-02-21T08:49:00.5668868Z .b8 2 // DWARF version number 2026-02-21T08:49:00.5668924Z .b8 0 2026-02-21T08:49:00.5669036Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:49:00.5669123Z .b8 8 // Address Size (in bytes) 2026-02-21T08:49:00.5669218Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:49:00.5669303Z .b8 116 // DW_AT_producer 2026-02-21T08:49:00.5669356Z .b8 114 2026-02-21T08:49:00.5669407Z .b8 105 2026-02-21T08:49:00.5669465Z .b8 116 2026-02-21T08:49:00.5669514Z .b8 111 2026-02-21T08:49:00.5669561Z .b8 110 2026-02-21T08:49:00.5669608Z .b8 0 2026-02-21T08:49:00.5669684Z .b8 2 // DW_AT_language 2026-02-21T08:49:00.5669733Z .b8 0 2026-02-21T08:49:00.5669805Z .b8 99 // DW_AT_name 2026-02-21T08:49:00.5669880Z .b8 113 2026-02-21T08:49:00.5669931Z .b8 122 2026-02-21T08:49:00.5669979Z .b8 120 2026-02-21T08:49:00.5670027Z .b8 51 2026-02-21T08:49:00.5670083Z .b8 116 2026-02-21T08:49:00.5670132Z .b8 51 2026-02-21T08:49:00.5670180Z .b8 113 2026-02-21T08:49:00.5670233Z .b8 55 2026-02-21T08:49:00.5670280Z .b8 52 2026-02-21T08:49:00.5670328Z .b8 112 2026-02-21T08:49:00.5670375Z .b8 121 2026-02-21T08:49:00.5670431Z .b8 114 2026-02-21T08:49:00.5670479Z .b8 100 2026-02-21T08:49:00.5670526Z .b8 103 2026-02-21T08:49:00.5670579Z .b8 50 2026-02-21T08:49:00.5670656Z .b8 114 2026-02-21T08:49:00.5670702Z .b8 55 2026-02-21T08:49:00.5670749Z .b8 104 2026-02-21T08:49:00.5670804Z .b8 105 2026-02-21T08:49:00.5670851Z .b8 113 2026-02-21T08:49:00.5670898Z .b8 102 2026-02-21T08:49:00.5670952Z .b8 52 2026-02-21T08:49:00.5670999Z .b8 52 2026-02-21T08:49:00.5671046Z .b8 104 
2026-02-21T08:49:00.5671093Z .b8 97 2026-02-21T08:49:00.5671148Z .b8 114 2026-02-21T08:49:00.5671196Z .b8 113 2026-02-21T08:49:00.5671243Z .b8 114 2026-02-21T08:49:00.5671292Z .b8 108 2026-02-21T08:49:00.5671349Z .b8 122 2026-02-21T08:49:00.5671396Z .b8 110 2026-02-21T08:49:00.5671443Z .b8 112 2026-02-21T08:49:00.5671497Z .b8 97 2026-02-21T08:49:00.5671544Z .b8 53 2026-02-21T08:49:00.5671593Z .b8 117 2026-02-21T08:49:00.5671641Z .b8 109 2026-02-21T08:49:00.5671697Z .b8 119 2026-02-21T08:49:00.5671745Z .b8 113 2026-02-21T08:49:00.5671791Z .b8 112 2026-02-21T08:49:00.5671846Z .b8 117 2026-02-21T08:49:00.5671893Z .b8 99 2026-02-21T08:49:00.5671942Z .b8 113 2026-02-21T08:49:00.5671989Z .b8 50 2026-02-21T08:49:00.5672047Z .b8 106 2026-02-21T08:49:00.5672095Z .b8 53 2026-02-21T08:49:00.5672142Z .b8 112 2026-02-21T08:49:00.5672189Z .b8 101 2026-02-21T08:49:00.5672245Z .b8 122 2026-02-21T08:49:00.5672295Z .b8 111 2026-02-21T08:49:00.5672343Z .b8 116 2026-02-21T08:49:00.5672401Z .b8 98 2026-02-21T08:49:00.5672450Z .b8 46 2026-02-21T08:49:00.5672499Z .b8 112 2026-02-21T08:49:00.5672572Z .b8 121 2026-02-21T08:49:00.5672630Z .b8 0 2026-02-21T08:49:00.5672720Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:49:00.5672794Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:49:00.5672849Z .b8 116 2026-02-21T08:49:00.5672897Z .b8 109 2026-02-21T08:49:00.5672944Z .b8 112 2026-02-21T08:49:00.5672990Z .b8 47 2026-02-21T08:49:00.5673045Z .b8 116 2026-02-21T08:49:00.5673093Z .b8 111 2026-02-21T08:49:00.5673141Z .b8 114 2026-02-21T08:49:00.5673196Z .b8 99 2026-02-21T08:49:00.5673245Z .b8 104 2026-02-21T08:49:00.5673293Z .b8 105 2026-02-21T08:49:00.5673340Z .b8 110 2026-02-21T08:49:00.5673398Z .b8 100 2026-02-21T08:49:00.5673445Z .b8 117 2026-02-21T08:49:00.5673492Z .b8 99 2026-02-21T08:49:00.5673547Z .b8 116 2026-02-21T08:49:00.5673596Z .b8 111 2026-02-21T08:49:00.5673645Z .b8 114 2026-02-21T08:49:00.5673693Z .b8 95 2026-02-21T08:49:00.5673747Z .b8 114 2026-02-21T08:49:00.5673794Z .b8 111 
2026-02-21T08:49:00.5673842Z .b8 111 2026-02-21T08:49:00.5673890Z .b8 116 2026-02-21T08:49:00.5673945Z .b8 47 2026-02-21T08:49:00.5673993Z .b8 113 2026-02-21T08:49:00.5674042Z .b8 122 2026-02-21T08:49:00.5674117Z .b8 0 2026-02-21T08:49:00.5674165Z } 2026-02-21T08:49:00.5674228Z .section .debug_macinfo { } 2026-02-21T08:49:00.5674232Z 2026-02-21T08:49:00.5674306Z ================================================================ 2026-02-21T08:49:00.5674411Z please share the reproducer above with Triton project. 2026-02-21T08:49:01.2976786Z [46s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:49:01.2977047Z 2026-02-21T08:49:01.2979089Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:49:01.2980518Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:49:01.2980843Z `ptxas` stderr: 2026-02-21T08:49:01.2981362Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 206 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:49:01.2981887Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:01.2982060Z 2026-02-21T08:49:01.2982533Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxs_ha0e0.ptx -o /tmp/tmpxs_ha0e0.ptx.o 2026-02-21T08:49:01.2983116Z 2026-02-21T08:49:01.2983262Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:49:01.2983646Z 2026-02-21T08:49:01.2983650Z 2026-02-21T08:49:01.2983737Z ================================================================ 2026-02-21T08:49:01.2983944Z Internal Triton PTX codegen error 2026-02-21T08:49:01.2984124Z `ptxas` stderr: 2026-02-21T08:49:01.2984547Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 206 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:49:01.2985278Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:49:01.2985441Z 2026-02-21T08:49:01.2985829Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxs_ha0e0.ptx -o /tmp/tmpxs_ha0e0.ptx.o 2026-02-21T08:49:01.2986324Z 2026-02-21T08:49:01.2986328Z 2026-02-21T08:49:01.2986389Z // 2026-02-21T08:49:01.2986527Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:49:01.2986702Z // 2026-02-21T08:49:01.2986768Z 2026-02-21T08:49:01.2986823Z .version 8.7 2026-02-21T08:49:01.2986961Z .target sm_100a 2026-02-21T08:49:01.2987090Z .address_size 64 2026-02-21T08:49:01.2987177Z 2026-02-21T08:49:01.2987353Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:49:01.2987602Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:49:01.2987827Z // @_helion_matmul 2026-02-21T08:49:01.2988042Z .visible .entry _helion_matmul( 2026-02-21T08:49:01.2988256Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:49:01.2988520Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:49:01.2988763Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:49:01.2989015Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:49:01.2989255Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:49:01.2989459Z ) 2026-02-21T08:49:01.2989574Z .reqntid 256 2026-02-21T08:49:01.2989705Z .maxnreg 32 2026-02-21T08:49:01.2989829Z 
{ 2026-02-21T08:49:01.2989950Z .reg .pred %p<118>; 2026-02-21T08:49:01.2990103Z .reg .b32 %r<454>; 2026-02-21T08:49:01.2990241Z .reg .b64 %rd<123>; 2026-02-21T08:49:01.2990510Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19:0 2026-02-21T08:49:01.2990843Z $L__func_begin0: 2026-02-21T08:49:01.2991093Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19:0 2026-02-21T08:49:01.2991320Z 2026-02-21T08:49:01.2991379Z // %bb.0: 2026-02-21T08:49:01.2991530Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T08:49:01.2991715Z $L__tmp0: 2026-02-21T08:49:01.2991949Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19 2026-02-21T08:49:01.2992237Z mov.u32 %r1, %tid.x; 2026-02-21T08:49:01.2992377Z shr.u32 %r2, %r1, 5; 2026-02-21T08:49:01.2992538Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:49:01.2992724Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T08:49:01.2992882Z @%p3 bra $L__BB0_16; 2026-02-21T08:49:01.2993021Z bra.uni $L__BB0_1; 2026-02-21T08:49:01.2993165Z $L__BB0_16: 2026-02-21T08:49:01.2993432Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:0 2026-02-21T08:49:01.2993744Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T08:49:01.2993952Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T08:49:01.2994232Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19 2026-02-21T08:49:01.2994534Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:49:01.2994752Z setp.lt.u32 %p27, %r1, 32; 2026-02-21T08:49:01.2994922Z mov.b32 %r139, global_smem; 2026-02-21T08:49:01.2995084Z // begin inline asm 2026-02-21T08:49:01.2995346Z @%p27 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r139], 64; 2026-02-21T08:49:01.2995596Z // end inline asm 2026-02-21T08:49:01.2995727Z bar.sync 0, 128; 2026-02-21T08:49:01.2995877Z ld.shared.b32 %r425, [global_smem]; 2026-02-21T08:49:01.2996042Z bar.sync 0, 128; 2026-02-21T08:49:01.2996179Z // begin inline asm 
2026-02-21T08:49:01.2996376Z @%p27 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:49:01.2996609Z // end inline asm 2026-02-21T08:49:01.2996865Z .loc 1 21 67 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:21:67 2026-02-21T08:49:01.2997153Z mov.u32 %r41, %ctaid.x; 2026-02-21T08:49:01.2997311Z mov.u32 %r215, %ctaid.y; 2026-02-21T08:49:01.2997462Z mov.u32 %r216, %ctaid.z; 2026-02-21T08:49:01.2997621Z mov.u32 %r217, %nctaid.x; 2026-02-21T08:49:01.2997774Z mov.u32 %r218, %nctaid.y; 2026-02-21T08:49:01.2997937Z mad.lo.s32 %r219, %r216, %r218, %r215; 2026-02-21T08:49:01.2998112Z mad.lo.s32 %r220, %r219, %r217, %r41; 2026-02-21T08:49:01.2998288Z shl.b32 %r221, %r220, 8; 2026-02-21T08:49:01.2998432Z cvt.s64.s32 %rd54, %r221; 2026-02-21T08:49:01.2998587Z add.s64 %rd33, %rd6, %rd54; 2026-02-21T08:49:01.2998747Z shl.b32 %r222, %r1, 2; 2026-02-21T08:49:01.2998894Z add.s32 %r140, %r139, %r222; 2026-02-21T08:49:01.2999051Z mov.b32 %r453, 0; 2026-02-21T08:49:01.2999213Z // begin inline asm 2026-02-21T08:49:01.2999373Z @%p27 st.shared.b32 [ %r140 + 0 ], %r453; 2026-02-21T08:49:01.2999542Z // end inline asm 2026-02-21T08:49:01.2999684Z bar.warp.sync -1; 2026-02-21T08:49:01.2999824Z setp.eq.b32 %p100, %r1, 0; 2026-02-21T08:49:01.2999985Z cvt.u64.u32 %rd18, %r139; 2026-02-21T08:49:01.3000129Z // begin inline asm 2026-02-21T08:49:01.3000378Z @%p100 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd18 + 0 ], %rd3; 2026-02-21T08:49:01.3000659Z // end inline asm 2026-02-21T08:49:01.3000790Z // begin inline asm 2026-02-21T08:49:01.3001015Z @%p100 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:49:01.3001260Z // end inline asm 2026-02-21T08:49:01.3001393Z mov.b32 %r142, 16; 2026-02-21T08:49:01.3001524Z // begin inline asm 2026-02-21T08:49:01.3001763Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r142; 2026-02-21T08:49:01.3002042Z // end inline asm 
2026-02-21T08:49:01.3002175Z mov.b32 %r143, 64; 2026-02-21T08:49:01.3002318Z // begin inline asm 2026-02-21T08:49:01.3002553Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r143; 2026-02-21T08:49:01.3002859Z // end inline asm 2026-02-21T08:49:01.3002994Z mov.b32 %r144, 2048; 2026-02-21T08:49:01.3003143Z // begin inline asm 2026-02-21T08:49:01.3003390Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r144; 2026-02-21T08:49:01.3003677Z // end inline asm 2026-02-21T08:49:01.3003817Z // begin inline asm 2026-02-21T08:49:01.3004060Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r144; 2026-02-21T08:49:01.3004346Z // end inline asm 2026-02-21T08:49:01.3004480Z mov.b64 %rd26, 4096; 2026-02-21T08:49:01.3004625Z // begin inline asm 2026-02-21T08:49:01.3004914Z @%p100 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd18 + 0 ], 0x0, %rd26; 2026-02-21T08:49:01.3005219Z // end inline asm 2026-02-21T08:49:01.3005360Z mov.b32 %r146, 1; 2026-02-21T08:49:01.3005496Z // begin inline asm 2026-02-21T08:49:01.3005793Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r146; 2026-02-21T08:49:01.3006098Z // end inline asm 2026-02-21T08:49:01.3006242Z // begin inline asm 2026-02-21T08:49:01.3006501Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r146; 2026-02-21T08:49:01.3006830Z // end inline asm 2026-02-21T08:49:01.3007001Z // begin inline asm 2026-02-21T08:49:01.3007293Z @%p100 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x6; 2026-02-21T08:49:01.3007607Z // end inline asm 2026-02-21T08:49:01.3007742Z // begin inline asm 2026-02-21T08:49:01.3008009Z @%p100 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:49:01.3008338Z // end inline asm 2026-02-21T08:49:01.3008499Z // begin inline asm 2026-02-21T08:49:01.3008738Z @%p100 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:49:01.3009018Z // end inline asm 2026-02-21T08:49:01.3009160Z // begin inline asm 2026-02-21T08:49:01.3009393Z @%p100 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:49:01.3009665Z // end inline asm 2026-02-21T08:49:01.3009799Z // begin inline asm 2026-02-21T08:49:01.3010163Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd33 + 0 ], [ %rd18 + 0 ], 0x80; 2026-02-21T08:49:01.3010522Z // end inline asm 2026-02-21T08:49:01.3010655Z // begin inline asm 2026-02-21T08:49:01.3010863Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd33 + 0 ], 0x80; 2026-02-21T08:49:01.3011106Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:49:01.3011299Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:49:01.3011465Z // end inline asm 2026-02-21T08:49:01.3011598Z bar.sync 0, 128; 2026-02-21T08:49:01.3011876Z .loc 1 22 68 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:22:68 2026-02-21T08:49:01.3012178Z add.s64 %rd51, %rd33, 128; 2026-02-21T08:49:01.3012327Z bar.sync 0, 128; 2026-02-21T08:49:01.3012459Z // begin inline asm 2026-02-21T08:49:01.3012611Z @%p27 st.shared.b32 [ %r140 + 0 ], %r453; 2026-02-21T08:49:01.3012777Z // end inline asm 2026-02-21T08:49:01.3012916Z bar.warp.sync -1; 2026-02-21T08:49:01.3013048Z // begin inline asm 2026-02-21T08:49:01.3013293Z @%p100 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd18 + 0 ], %rd4; 2026-02-21T08:49:01.3013566Z // end inline asm 2026-02-21T08:49:01.3013700Z // begin inline asm 2026-02-21T08:49:01.3013913Z @%p100 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:49:01.3014164Z // end inline asm 2026-02-21T08:49:01.3014296Z // begin inline asm 2026-02-21T08:49:01.3014520Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r142; 2026-02-21T08:49:01.3014829Z // end 
inline asm 2026-02-21T08:49:01.3014957Z // begin inline asm 2026-02-21T08:49:01.3015189Z @%p100 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r143; 2026-02-21T08:49:01.3015471Z // end inline asm 2026-02-21T08:49:01.3015608Z // begin inline asm 2026-02-21T08:49:01.3015851Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r144; 2026-02-21T08:49:01.3016120Z // end inline asm 2026-02-21T08:49:01.3016267Z mov.b32 %r153, 12288; 2026-02-21T08:49:01.3016413Z // begin inline asm 2026-02-21T08:49:01.3016646Z @%p100 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r153; 2026-02-21T08:49:01.3016919Z // end inline asm 2026-02-21T08:49:01.3017045Z // begin inline asm 2026-02-21T08:49:01.3017298Z @%p100 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd18 + 0 ], 0x0, %rd26; 2026-02-21T08:49:01.3017582Z // end inline asm 2026-02-21T08:49:01.3017716Z // begin inline asm 2026-02-21T08:49:01.3017971Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r146; 2026-02-21T08:49:01.3018291Z // end inline asm 2026-02-21T08:49:01.3018428Z // begin inline asm 2026-02-21T08:49:01.3018673Z @%p100 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r146; 2026-02-21T08:49:01.3018955Z // end inline asm 2026-02-21T08:49:01.3019080Z // begin inline asm 2026-02-21T08:49:01.3019309Z @%p100 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x6; 2026-02-21T08:49:01.3019570Z // end inline asm 2026-02-21T08:49:01.3019696Z // begin inline asm 2026-02-21T08:49:01.3019944Z @%p100 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:49:01.3020246Z // end inline asm 2026-02-21T08:49:01.3020379Z // begin inline asm 2026-02-21T08:49:01.3020603Z @%p100 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T08:49:01.3020873Z // end inline asm 
2026-02-21T08:49:01.3021004Z // begin inline asm 2026-02-21T08:49:01.3021224Z @%p100 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T08:49:01.3021477Z // end inline asm 2026-02-21T08:49:01.3021604Z // begin inline asm 2026-02-21T08:49:01.3021942Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd51 + 0 ], [ %rd18 + 0 ], 0x80; 2026-02-21T08:49:01.3022310Z // end inline asm 2026-02-21T08:49:01.3022445Z // begin inline asm 2026-02-21T08:49:01.3022651Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd51 + 0 ], 0x80; 2026-02-21T08:49:01.3022892Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T08:49:01.3023085Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:49:01.3023253Z // end inline asm 2026-02-21T08:49:01.3023386Z bar.sync 0, 128; 2026-02-21T08:49:01.3023643Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3023944Z sub.s32 %r223, 6144, %r41; 2026-02-21T08:49:01.3024130Z mul.hi.s32 %r224, %r223, -580400985; 2026-02-21T08:49:01.3024309Z add.s32 %r225, %r224, %r223; 2026-02-21T08:49:01.3024472Z shr.u32 %r226, %r225, 31; 2026-02-21T08:49:01.3024621Z shr.s32 %r227, %r225, 14; 2026-02-21T08:49:01.3024803Z add.s32 %r228, %r227, %r226; 2026-02-21T08:49:01.3024962Z mul.lo.s32 %r229, %r228, 18944; 2026-02-21T08:49:01.3025137Z setp.ne.b32 %p91, %r223, %r229; 2026-02-21T08:49:01.3025300Z setp.lt.u32 %p92, %r41, 6145; 2026-02-21T08:49:01.3025471Z and.pred %p93, %p92, %p91; 2026-02-21T08:49:01.3025628Z selp.b32 %r230, 1, 0, %p93; 2026-02-21T08:49:01.3025789Z add.s32 %r231, %r228, %r230; 2026-02-21T08:49:01.3025939Z shl.b32 %r48, %r231, 7; 2026-02-21T08:49:01.3026212Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3026525Z shfl.sync.idx.b32 %r232, %r2, 0, 31, -1; 2026-02-21T08:49:01.3026705Z shl.b32 %r233, %r232, 21; 2026-02-21T08:49:01.3026870Z and.b32 %r234, %r233, 6291456; 
2026-02-21T08:49:01.3027029Z add.s32 %r156, %r234, %r425; 2026-02-21T08:49:01.3027188Z mov.pred %p65, -1; 2026-02-21T08:49:01.3027328Z // begin inline asm 2026-02-21T08:49:01.3027735Z @%p65 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r156 + 0], 32, {%r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453}; 2026-02-21T08:49:01.3028141Z // end inline asm 2026-02-21T08:49:01.3028271Z // begin inline asm 2026-02-21T08:49:01.3028634Z @%p65 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r156 + 16], 32, {%r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453, %r453}; 2026-02-21T08:49:01.3029023Z // end inline asm 2026-02-21T08:49:01.3029152Z // begin inline asm 2026-02-21T08:49:01.3029297Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:49:01.3029456Z // end inline asm 2026-02-21T08:49:01.3029576Z bar.sync 0, 128; 2026-02-21T08:49:01.3029834Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3030132Z add.s32 %r190, %r139, 30720; 2026-02-21T08:49:01.3030302Z // begin inline asm 2026-02-21T08:49:01.3030472Z @%p100 mbarrier.init.shared::cta.b64 [%r190], 1; 2026-02-21T08:49:01.3030654Z // end inline asm 2026-02-21T08:49:01.3030784Z bar.sync 0, 128; 2026-02-21T08:49:01.3030905Z add.s32 %r191, %r139, 30728; 2026-02-21T08:49:01.3031045Z // begin inline asm 2026-02-21T08:49:01.3031193Z @%p100 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T08:49:01.3031372Z // end inline asm 2026-02-21T08:49:01.3031491Z bar.sync 0, 128; 2026-02-21T08:49:01.3031612Z add.s32 %r192, %r139, 30736; 2026-02-21T08:49:01.3031780Z // begin inline asm 2026-02-21T08:49:01.3031926Z @%p100 mbarrier.init.shared::cta.b64 [%r192], 1; 2026-02-21T08:49:01.3032109Z // end inline asm 2026-02-21T08:49:01.3032233Z bar.sync 0, 128; 2026-02-21T08:49:01.3032367Z add.s32 %r193, %r139, 30744; 2026-02-21T08:49:01.3032509Z // begin inline asm 2026-02-21T08:49:01.3032669Z 
@%p100 mbarrier.init.shared::cta.b64 [%r193], 1; 2026-02-21T08:49:01.3032848Z // end inline asm 2026-02-21T08:49:01.3032973Z bar.sync 0, 128; 2026-02-21T08:49:01.3033109Z add.s32 %r194, %r139, 30752; 2026-02-21T08:49:01.3033252Z // begin inline asm 2026-02-21T08:49:01.3033413Z @%p100 mbarrier.init.shared::cta.b64 [%r194], 1; 2026-02-21T08:49:01.3033589Z // end inline asm 2026-02-21T08:49:01.3033721Z bar.sync 0, 128; 2026-02-21T08:49:01.3033848Z add.s32 %r195, %r139, 30760; 2026-02-21T08:49:01.3033997Z // begin inline asm 2026-02-21T08:49:01.3034149Z @%p100 mbarrier.init.shared::cta.b64 [%r195], 1; 2026-02-21T08:49:01.3034330Z // end inline asm 2026-02-21T08:49:01.3034461Z bar.sync 0, 128; 2026-02-21T08:49:01.3034587Z add.s32 %r196, %r139, 30768; 2026-02-21T08:49:01.3034767Z // begin inline asm 2026-02-21T08:49:01.3034921Z @%p100 mbarrier.init.shared::cta.b64 [%r196], 1; 2026-02-21T08:49:01.3035104Z // end inline asm 2026-02-21T08:49:01.3035232Z add.s32 %r197, %r139, 30784; 2026-02-21T08:49:01.3035415Z // begin inline asm 2026-02-21T08:49:01.3035581Z @%p100 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T08:49:01.3035780Z // end inline asm 2026-02-21T08:49:01.3035925Z bar.sync 0, 128; 2026-02-21T08:49:01.3036061Z add.s32 %r198, %r139, 30792; 2026-02-21T08:49:01.3036222Z // begin inline asm 2026-02-21T08:49:01.3036380Z @%p100 mbarrier.init.shared::cta.b64 [%r198], 1; 2026-02-21T08:49:01.3036567Z // end inline asm 2026-02-21T08:49:01.3036697Z bar.sync 0, 128; 2026-02-21T08:49:01.3036836Z add.s32 %r199, %r139, 30800; 2026-02-21T08:49:01.3036985Z // begin inline asm 2026-02-21T08:49:01.3037150Z @%p100 mbarrier.init.shared::cta.b64 [%r199], 1; 2026-02-21T08:49:01.3037335Z // end inline asm 2026-02-21T08:49:01.3037472Z bar.sync 0, 128; 2026-02-21T08:49:01.3037609Z add.s32 %r200, %r139, 30808; 2026-02-21T08:49:01.3037759Z // begin inline asm 2026-02-21T08:49:01.3037924Z @%p100 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T08:49:01.3038101Z // end inline asm 
2026-02-21T08:49:01.3038240Z bar.sync 0, 128; 2026-02-21T08:49:01.3038373Z add.s32 %r201, %r139, 30816; 2026-02-21T08:49:01.3038530Z // begin inline asm 2026-02-21T08:49:01.3038709Z @%p100 mbarrier.init.shared::cta.b64 [%r201], 1; 2026-02-21T08:49:01.3038891Z // end inline asm 2026-02-21T08:49:01.3039014Z bar.sync 0, 128; 2026-02-21T08:49:01.3039146Z add.s32 %r202, %r139, 30824; 2026-02-21T08:49:01.3039295Z // begin inline asm 2026-02-21T08:49:01.3039445Z @%p100 mbarrier.init.shared::cta.b64 [%r202], 1; 2026-02-21T08:49:01.3039628Z // end inline asm 2026-02-21T08:49:01.3039751Z bar.sync 0, 128; 2026-02-21T08:49:01.3039884Z add.s32 %r203, %r139, 30832; 2026-02-21T08:49:01.3040033Z // begin inline asm 2026-02-21T08:49:01.3040196Z @%p100 mbarrier.init.shared::cta.b64 [%r203], 1; 2026-02-21T08:49:01.3040373Z // end inline asm 2026-02-21T08:49:01.3040613Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3040899Z bar.sync 0, 128; 2026-02-21T08:49:01.3041025Z // begin inline asm 2026-02-21T08:49:01.3041224Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r190]; 2026-02-21T08:49:01.3041418Z // end inline asm 2026-02-21T08:49:01.3041552Z bar.sync 0, 128; 2026-02-21T08:49:01.3041679Z // begin inline asm 2026-02-21T08:49:01.3041852Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r191]; 2026-02-21T08:49:01.3042037Z // end inline asm 2026-02-21T08:49:01.3042171Z bar.sync 0, 128; 2026-02-21T08:49:01.3042302Z // begin inline asm 2026-02-21T08:49:01.3042461Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r192]; 2026-02-21T08:49:01.3042655Z // end inline asm 2026-02-21T08:49:01.3042809Z bar.sync 0, 128; 2026-02-21T08:49:01.3042942Z // begin inline asm 2026-02-21T08:49:01.3043097Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r193]; 2026-02-21T08:49:01.3043284Z // end inline asm 2026-02-21T08:49:01.3043409Z bar.sync 0, 128; 2026-02-21T08:49:01.3043542Z // begin inline asm 2026-02-21T08:49:01.3043699Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r194]; 
2026-02-21T08:49:01.3043885Z // end inline asm 2026-02-21T08:49:01.3044016Z bar.sync 0, 128; 2026-02-21T08:49:01.3044144Z // begin inline asm 2026-02-21T08:49:01.3044308Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r195]; 2026-02-21T08:49:01.3044486Z // end inline asm 2026-02-21T08:49:01.3044615Z bar.sync 0, 128; 2026-02-21T08:49:01.3044771Z // begin inline asm 2026-02-21T08:49:01.3044937Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r196]; 2026-02-21T08:49:01.3045115Z // end inline asm 2026-02-21T08:49:01.3045376Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3045693Z bar.sync 0, 128; 2026-02-21T08:49:01.3045829Z add.s32 %r211, %r139, 30848; 2026-02-21T08:49:01.3045995Z // begin inline asm 2026-02-21T08:49:01.3046159Z @%p100 mbarrier.init.shared::cta.b64 [%r211], 1; 2026-02-21T08:49:01.3046357Z // end inline asm 2026-02-21T08:49:01.3046494Z add.s32 %r407, %r139, 30864; 2026-02-21T08:49:01.3046686Z // begin inline asm 2026-02-21T08:49:01.3046851Z @%p100 mbarrier.init.shared::cta.b64 [%r407], 1; 2026-02-21T08:49:01.3047043Z // end inline asm 2026-02-21T08:49:01.3047301Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3047588Z bar.sync 0, 128; 2026-02-21T08:49:01.3047725Z // begin inline asm 2026-02-21T08:49:01.3047887Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r407]; 2026-02-21T08:49:01.3048082Z // end inline asm 2026-02-21T08:49:01.3048336Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3048665Z st.shared.b32 [global_smem+30872], 33554689; 2026-02-21T08:49:01.3048875Z st.shared.b32 [global_smem+28672], %r425; 2026-02-21T08:49:01.3049072Z st.shared.b32 [global_smem+28680], %r48; 2026-02-21T08:49:01.3049257Z barrier.sync 1; 2026-02-21T08:49:01.3049413Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:49:01.3049603Z barrier.sync 1; 2026-02-21T08:49:01.3049745Z setp.lt.s32 %p94, %r231, 1; 
2026-02-21T08:49:01.3049915Z @%p94 bra $L__BB0_23; 2026-02-21T08:49:01.3050081Z // %bb.17: // %.lr.ph7 2026-02-21T08:49:01.3050425Z .loc 1 0 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:131 2026-02-21T08:49:01.3050750Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T08:49:01.3050945Z bfe.u32 %r42, %r1, 3, 4; 2026-02-21T08:49:01.3051106Z or.b32 %r43, %r42, 16; 2026-02-21T08:49:01.3051256Z or.b32 %r44, %r42, 32; 2026-02-21T08:49:01.3051410Z or.b32 %r45, %r42, 48; 2026-02-21T08:49:01.3051551Z shl.b32 %r46, %r1, 3; 2026-02-21T08:49:01.3051709Z and.b32 %r47, %r46, 56; 2026-02-21T08:49:01.3051983Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3052292Z add.s32 %r450, %r41, -18944; 2026-02-21T08:49:01.3052456Z shl.b32 %r237, %r1, 4; 2026-02-21T08:49:01.3052606Z and.b32 %r238, %r237, 176; 2026-02-21T08:49:01.3052763Z and.b32 %r239, %r1, 96; 2026-02-21T08:49:01.3052950Z shl.b32 %r240, %r239, 3; 2026-02-21T08:49:01.3053112Z bfe.s32 %r241, %r1, 2, 1; 2026-02-21T08:49:01.3053270Z and.b32 %r242, %r241, 1088; 2026-02-21T08:49:01.3053431Z and.b32 %r244, %r222, 64; 2026-02-21T08:49:01.3053583Z xor.b32 %r245, %r242, %r244; 2026-02-21T08:49:01.3053745Z add.s32 %r247, %r139, 28672; 2026-02-21T08:49:01.3053900Z add.s32 %r248, %r247, %r238; 2026-02-21T08:49:01.3054062Z add.s32 %r249, %r248, %r240; 2026-02-21T08:49:01.3054224Z add.s32 %r51, %r249, %r245; 2026-02-21T08:49:01.3054379Z shl.b32 %r250, %r1, 5; 2026-02-21T08:49:01.3054561Z and.b32 %r251, %r250, 1792; 2026-02-21T08:49:01.3054742Z and.b32 %r252, %r46, 48; 2026-02-21T08:49:01.3054903Z shl.b32 %r253, %r239, 1; 2026-02-21T08:49:01.3055063Z shl.b32 %r254, %r1, 6; 2026-02-21T08:49:01.3055207Z and.b32 %r255, %r254, 64; 2026-02-21T08:49:01.3055354Z xor.b32 %r256, %r253, %r255; 2026-02-21T08:49:01.3055509Z add.s32 %r257, %r247, %r251; 2026-02-21T08:49:01.3055655Z add.s32 %r258, %r257, %r252; 2026-02-21T08:49:01.3055807Z add.s32 %r302, %r258, %r256; 
2026-02-21T08:49:01.3055959Z max.s32 %r443, %r48, 1; 2026-02-21T08:49:01.3056100Z mov.b32 %r448, -1; 2026-02-21T08:49:01.3056241Z mov.b32 %r451, %r453; 2026-02-21T08:49:01.3056379Z mov.b32 %r452, %r453; 2026-02-21T08:49:01.3056524Z bra.uni $L__BB0_18; 2026-02-21T08:49:01.3056708Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:49:01.3057031Z .loc 1 40 32 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:40:32 2026-02-21T08:49:01.3057315Z or.b32 %r335, %r452, %r47; 2026-02-21T08:49:01.3057573Z .loc 1 42 32 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:42:32 2026-02-21T08:49:01.3057853Z add.s32 %r336, %r451, %r42; 2026-02-21T08:49:01.3058000Z add.s32 %r337, %r451, %r43; 2026-02-21T08:49:01.3058152Z add.s32 %r338, %r451, %r44; 2026-02-21T08:49:01.3058320Z add.s32 %r339, %r451, %r45; 2026-02-21T08:49:01.3058581Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3058855Z bar.sync 0, 128; 2026-02-21T08:49:01.3058994Z // begin inline asm 2026-02-21T08:49:01.3059125Z 2026-02-21T08:49:01.3059241Z { 2026-02-21T08:49:01.3059365Z .reg .pred complete; 2026-02-21T08:49:01.3059505Z waitLoop: 2026-02-21T08:49:01.3059694Z mbarrier.try_wait.parity.shared.b64 complete, [%r211], %r453; 2026-02-21T08:49:01.3059921Z @!complete bra.uni waitLoop; 2026-02-21T08:49:01.3060075Z } 2026-02-21T08:49:01.3060138Z 2026-02-21T08:49:01.3060195Z // end inline asm 2026-02-21T08:49:01.3060331Z // begin inline asm 2026-02-21T08:49:01.3060685Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278}, [%r156 + 0], 32; 2026-02-21T08:49:01.3061069Z // end inline asm 2026-02-21T08:49:01.3061206Z // begin inline asm 2026-02-21T08:49:01.3061555Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295}, [%r156 + 16], 32; 
2026-02-21T08:49:01.3061991Z // end inline asm 2026-02-21T08:49:01.3062118Z // begin inline asm 2026-02-21T08:49:01.3062270Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:49:01.3062426Z // end inline asm 2026-02-21T08:49:01.3062558Z bar.sync 0, 128; 2026-02-21T08:49:01.3062685Z // begin inline asm 2026-02-21T08:49:01.3062858Z @%p100 mbarrier.arrive.shared::cta.b64 _, [%r407]; 2026-02-21T08:49:01.3063057Z // end inline asm 2026-02-21T08:49:01.3063192Z cvt.u64.u32 %rd59, %r263; 2026-02-21T08:49:01.3063349Z cvt.u64.u32 %rd60, %r264; 2026-02-21T08:49:01.3063496Z shl.b64 %rd61, %rd60, 32; 2026-02-21T08:49:01.3063648Z or.b64 %rd62, %rd59, %rd61; 2026-02-21T08:49:01.3063905Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3064206Z mov.b64 {%r341, %r342}, %rd62; 2026-02-21T08:49:01.3064393Z cvt.rn.f16x2.f32 %r343, %r342, %r341; 2026-02-21T08:49:01.3064707Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3064999Z cvt.u64.u32 %rd63, %r265; 2026-02-21T08:49:01.3065143Z cvt.u64.u32 %rd64, %r266; 2026-02-21T08:49:01.3065295Z shl.b64 %rd65, %rd64, 32; 2026-02-21T08:49:01.3065438Z or.b64 %rd66, %rd63, %rd65; 2026-02-21T08:49:01.3065703Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3065983Z mov.b64 {%r344, %r345}, %rd66; 2026-02-21T08:49:01.3066183Z cvt.rn.f16x2.f32 %r346, %r345, %r344; 2026-02-21T08:49:01.3066461Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3066736Z cvt.u64.u32 %rd67, %r267; 2026-02-21T08:49:01.3066887Z cvt.u64.u32 %rd68, %r268; 2026-02-21T08:49:01.3067030Z shl.b64 %rd69, %rd68, 32; 2026-02-21T08:49:01.3067178Z or.b64 %rd70, %rd67, %rd69; 2026-02-21T08:49:01.3067433Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3067721Z mov.b64 {%r347, %r348}, %rd70; 2026-02-21T08:49:01.3067881Z cvt.rn.f16x2.f32 %r349, %r348, 
%r347; 2026-02-21T08:49:01.3068156Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3068439Z cvt.u64.u32 %rd71, %r269; 2026-02-21T08:49:01.3068583Z cvt.u64.u32 %rd72, %r270; 2026-02-21T08:49:01.3068732Z shl.b64 %rd73, %rd72, 32; 2026-02-21T08:49:01.3068875Z or.b64 %rd74, %rd71, %rd73; 2026-02-21T08:49:01.3069136Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3069410Z mov.b64 {%r350, %r351}, %rd74; 2026-02-21T08:49:01.3069576Z cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T08:49:01.3069878Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3070160Z cvt.u64.u32 %rd75, %r271; 2026-02-21T08:49:01.3070314Z cvt.u64.u32 %rd76, %r272; 2026-02-21T08:49:01.3070459Z shl.b64 %rd77, %rd76, 32; 2026-02-21T08:49:01.3070610Z or.b64 %rd78, %rd75, %rd77; 2026-02-21T08:49:01.3070867Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3071156Z mov.b64 {%r353, %r354}, %rd78; 2026-02-21T08:49:01.3071312Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T08:49:01.3071591Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3071876Z cvt.u64.u32 %rd79, %r273; 2026-02-21T08:49:01.3072018Z cvt.u64.u32 %rd80, %r274; 2026-02-21T08:49:01.3072166Z shl.b64 %rd81, %rd80, 32; 2026-02-21T08:49:01.3072307Z or.b64 %rd82, %rd79, %rd81; 2026-02-21T08:49:01.3072569Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3072852Z mov.b64 {%r356, %r357}, %rd82; 2026-02-21T08:49:01.3073019Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T08:49:01.3073330Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3073609Z cvt.u64.u32 %rd83, %r275; 2026-02-21T08:49:01.3073760Z cvt.u64.u32 %rd84, %r276; 2026-02-21T08:49:01.3073903Z shl.b64 %rd85, %rd84, 32; 2026-02-21T08:49:01.3074052Z 
or.b64 %rd86, %rd83, %rd85; 2026-02-21T08:49:01.3074310Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3074605Z mov.b64 {%r359, %r360}, %rd86; 2026-02-21T08:49:01.3074797Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T08:49:01.3075087Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3075381Z cvt.u64.u32 %rd87, %r277; 2026-02-21T08:49:01.3075524Z cvt.u64.u32 %rd88, %r278; 2026-02-21T08:49:01.3075676Z shl.b64 %rd89, %rd88, 32; 2026-02-21T08:49:01.3075841Z or.b64 %rd90, %rd87, %rd89; 2026-02-21T08:49:01.3076106Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3076387Z mov.b64 {%r362, %r363}, %rd90; 2026-02-21T08:49:01.3076554Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T08:49:01.3076833Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3077116Z cvt.u64.u32 %rd91, %r280; 2026-02-21T08:49:01.3077265Z cvt.u64.u32 %rd92, %r281; 2026-02-21T08:49:01.3077434Z shl.b64 %rd93, %rd92, 32; 2026-02-21T08:49:01.3077582Z or.b64 %rd94, %rd91, %rd93; 2026-02-21T08:49:01.3077837Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3078123Z mov.b64 {%r365, %r366}, %rd94; 2026-02-21T08:49:01.3078278Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T08:49:01.3078553Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3078842Z cvt.u64.u32 %rd95, %r282; 2026-02-21T08:49:01.3078983Z cvt.u64.u32 %rd96, %r283; 2026-02-21T08:49:01.3079133Z shl.b64 %rd97, %rd96, 32; 2026-02-21T08:49:01.3079275Z or.b64 %rd98, %rd95, %rd97; 2026-02-21T08:49:01.3079533Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3079816Z mov.b64 {%r368, %r369}, %rd98; 2026-02-21T08:49:01.3079976Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T08:49:01.3080249Z .loc 1 53 52 // 
c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3080534Z cvt.u64.u32 %rd99, %r284; 2026-02-21T08:49:01.3080687Z cvt.u64.u32 %rd100, %r285; 2026-02-21T08:49:01.3080838Z shl.b64 %rd101, %rd100, 32; 2026-02-21T08:49:01.3080997Z or.b64 %rd102, %rd99, %rd101; 2026-02-21T08:49:01.3081281Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3081352Z mov.b64 {%r371, %r372}, %rd102; 2026-02-21T08:49:01.3081414Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T08:49:01.3081579Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3081645Z cvt.u64.u32 %rd103, %r286; 2026-02-21T08:49:01.3081699Z cvt.u64.u32 %rd104, %r287; 2026-02-21T08:49:01.3081755Z shl.b64 %rd105, %rd104, 32; 2026-02-21T08:49:01.3081813Z or.b64 %rd106, %rd103, %rd105; 2026-02-21T08:49:01.3081986Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3082046Z mov.b64 {%r374, %r375}, %rd106; 2026-02-21T08:49:01.3082108Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T08:49:01.3082278Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3082336Z cvt.u64.u32 %rd107, %r288; 2026-02-21T08:49:01.3082391Z cvt.u64.u32 %rd108, %r289; 2026-02-21T08:49:01.3082454Z shl.b64 %rd109, %rd108, 32; 2026-02-21T08:49:01.3082540Z or.b64 %rd110, %rd107, %rd109; 2026-02-21T08:49:01.3082702Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3082761Z mov.b64 {%r377, %r378}, %rd110; 2026-02-21T08:49:01.3082832Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T08:49:01.3082990Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3083046Z cvt.u64.u32 %rd111, %r290; 2026-02-21T08:49:01.3083109Z cvt.u64.u32 %rd112, %r291; 2026-02-21T08:49:01.3083164Z shl.b64 %rd113, %rd112, 32; 2026-02-21T08:49:01.3083221Z or.b64 %rd114, %rd111, 
%rd113; 2026-02-21T08:49:01.3083385Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3083443Z mov.b64 {%r380, %r381}, %rd114; 2026-02-21T08:49:01.3083502Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T08:49:01.3083688Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3083756Z cvt.u64.u32 %rd115, %r292; 2026-02-21T08:49:01.3083811Z cvt.u64.u32 %rd116, %r293; 2026-02-21T08:49:01.3083868Z shl.b64 %rd117, %rd116, 32; 2026-02-21T08:49:01.3083930Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T08:49:01.3084099Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3084156Z mov.b64 {%r383, %r384}, %rd118; 2026-02-21T08:49:01.3084247Z cvt.rn.f16x2.f32 %r385, %r384, %r383; 2026-02-21T08:49:01.3084405Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3084459Z cvt.u64.u32 %rd119, %r294; 2026-02-21T08:49:01.3084513Z cvt.u64.u32 %rd120, %r295; 2026-02-21T08:49:01.3084578Z shl.b64 %rd121, %rd120, 32; 2026-02-21T08:49:01.3084635Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T08:49:01.3084826Z .loc 1 55 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:55:27 2026-02-21T08:49:01.3084891Z mov.b64 {%r386, %r387}, %rd122; 2026-02-21T08:49:01.3084950Z cvt.rn.f16x2.f32 %r388, %r387, %r386; 2026-02-21T08:49:01.3085106Z .loc 1 56 53 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:56:53 2026-02-21T08:49:01.3085178Z mad.lo.s32 %r389, %r336, 12288, %r335; 2026-02-21T08:49:01.3085241Z mad.lo.s32 %r390, %r337, 12288, %r335; 2026-02-21T08:49:01.3085299Z mad.lo.s32 %r391, %r338, 12288, %r335; 2026-02-21T08:49:01.3085359Z mad.lo.s32 %r392, %r339, 12288, %r335; 2026-02-21T08:49:01.3085526Z .loc 1 56 24 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:56:24 2026-02-21T08:49:01.3085590Z mad.wide.s32 %rd55, %r389, 2, %rd5; 2026-02-21T08:49:01.3085650Z mad.wide.s32 %rd56, %r390, 2, 
%rd5; 2026-02-21T08:49:01.3085740Z mad.wide.s32 %rd57, %r391, 2, %rd5; 2026-02-21T08:49:01.3085801Z mad.wide.s32 %rd58, %r392, 2, %rd5; 2026-02-21T08:49:01.3085969Z .loc 1 56 83 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:56:83 2026-02-21T08:49:01.3086031Z bar.sync 0, 128; 2026-02-21T08:49:01.3086122Z st.shared.v4.b32 [%r51], {%r343, %r355, %r367, %r379}; 2026-02-21T08:49:01.3086178Z bar.sync 0, 128; 2026-02-21T08:49:01.3086235Z // begin inline asm 2026-02-21T08:49:01.3086390Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r318, %r322, %r326, %r330}, [%r302]; 2026-02-21T08:49:01.3086443Z // end inline asm 2026-02-21T08:49:01.3086499Z bar.sync 0, 128; 2026-02-21T08:49:01.3086593Z st.shared.v4.b32 [%r51], {%r346, %r358, %r370, %r382}; 2026-02-21T08:49:01.3086646Z bar.sync 0, 128; 2026-02-21T08:49:01.3086699Z // begin inline asm 2026-02-21T08:49:01.3086842Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r319, %r323, %r327, %r331}, [%r302]; 2026-02-21T08:49:01.3086904Z // end inline asm 2026-02-21T08:49:01.3086956Z bar.sync 0, 128; 2026-02-21T08:49:01.3087042Z st.shared.v4.b32 [%r51], {%r349, %r361, %r373, %r385}; 2026-02-21T08:49:01.3087218Z bar.sync 0, 128; 2026-02-21T08:49:01.3087272Z // begin inline asm 2026-02-21T08:49:01.3087410Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r320, %r324, %r328, %r332}, [%r302]; 2026-02-21T08:49:01.3087468Z // end inline asm 2026-02-21T08:49:01.3087520Z bar.sync 0, 128; 2026-02-21T08:49:01.3087603Z st.shared.v4.b32 [%r51], {%r352, %r364, %r376, %r388}; 2026-02-21T08:49:01.3087656Z bar.sync 0, 128; 2026-02-21T08:49:01.3087718Z // begin inline asm 2026-02-21T08:49:01.3087853Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r321, %r325, %r329, %r333}, [%r302]; 2026-02-21T08:49:01.3087907Z // end inline asm 2026-02-21T08:49:01.3087967Z // begin inline asm 2026-02-21T08:49:01.3088066Z st.global.v4.b32 [ %rd55 + 0 ], { %r318, %r319, %r320, %r321 }; 2026-02-21T08:49:01.3088119Z // end inline asm 2026-02-21T08:49:01.3088174Z // begin 
inline asm 2026-02-21T08:49:01.3088277Z st.global.v4.b32 [ %rd56 + 0 ], { %r322, %r323, %r324, %r325 }; 2026-02-21T08:49:01.3088363Z // end inline asm 2026-02-21T08:49:01.3088419Z // begin inline asm 2026-02-21T08:49:01.3088520Z st.global.v4.b32 [ %rd57 + 0 ], { %r326, %r327, %r328, %r329 }; 2026-02-21T08:49:01.3088573Z // end inline asm 2026-02-21T08:49:01.3088627Z // begin inline asm 2026-02-21T08:49:01.3088724Z st.global.v4.b32 [ %rd58 + 0 ], { %r330, %r331, %r332, %r333 }; 2026-02-21T08:49:01.3088779Z // end inline asm 2026-02-21T08:49:01.3088834Z mov.b32 %r449, 1; 2026-02-21T08:49:01.3088936Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:49:01.3089145Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3089208Z xor.b32 %r453, %r449, %r453; 2026-02-21T08:49:01.3089391Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3089463Z add.s32 %r443, %r443, -1; 2026-02-21T08:49:01.3089527Z setp.ne.b32 %p99, %r443, 0; 2026-02-21T08:49:01.3089588Z @%p99 bra $L__BB0_18; 2026-02-21T08:49:01.3089655Z bra.uni $L__BB0_23; 2026-02-21T08:49:01.3089761Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T08:49:01.3089821Z add.s32 %r260, %r448, 1; 2026-02-21T08:49:01.3089886Z setp.eq.b32 %p95, %r448, 127; 2026-02-21T08:49:01.3089957Z selp.b32 %r448, 0, %r260, %p95; 2026-02-21T08:49:01.3090020Z setp.eq.b32 %p96, %r448, 127; 2026-02-21T08:49:01.3090077Z @%p96 bra $L__BB0_21; 2026-02-21T08:49:01.3090182Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:49:01.3090360Z .loc 1 0 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:131 2026-02-21T08:49:01.3090416Z mov.b32 %r449, 0; 2026-02-21T08:49:01.3090604Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3090701Z setp.ne.b32 %p97, %r448, 0; 2026-02-21T08:49:01.3090761Z @%p97 bra $L__BB0_22; 2026-02-21T08:49:01.3090837Z // %bb.20: // %.thread 
2026-02-21T08:49:01.3090938Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T08:49:01.3090999Z add.s32 %r450, %r450, 18944; 2026-02-21T08:49:01.3091176Z .loc 1 34 35 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:34:35 2026-02-21T08:49:01.3091323Z shr.s32 %r394, %r450, 31; 2026-02-21T08:49:01.3091405Z shr.u32 %r395, %r394, 25; 2026-02-21T08:49:01.3091490Z add.s32 %r396, %r450, %r395; 2026-02-21T08:49:01.3091576Z shr.s32 %r397, %r396, 7; 2026-02-21T08:49:01.3091804Z .loc 1 35 33 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:35:33 2026-02-21T08:49:01.3091888Z shl.b32 %r398, %r397, 2; 2026-02-21T08:49:01.3092090Z .loc 1 36 39 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:36:39 2026-02-21T08:49:01.3092208Z sub.s32 %r399, 192, %r398; 2026-02-21T08:49:01.3092406Z .loc 1 36 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:36:52 2026-02-21T08:49:01.3092506Z min.s32 %r400, %r399, 4; 2026-02-21T08:49:01.3092730Z .loc 1 37 45 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:45 2026-02-21T08:49:01.3092812Z and.b32 %r401, %r396, -128; 2026-02-21T08:49:01.3092894Z sub.s32 %r402, %r450, %r401; 2026-02-21T08:49:01.3093123Z .loc 1 38 51 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:38:51 2026-02-21T08:49:01.3093204Z div.s32 %r403, %r402, %r400; 2026-02-21T08:49:01.3093401Z .loc 1 37 64 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:64 2026-02-21T08:49:01.3093485Z mul.lo.s32 %r404, %r403, %r400; 2026-02-21T08:49:01.3093593Z sub.s32 %r405, %r402, %r404; 2026-02-21T08:49:01.3093797Z .loc 1 37 30 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:30 2026-02-21T08:49:01.3093905Z add.s32 %r406, %r405, %r398; 2026-02-21T08:49:01.3094134Z .loc 1 39 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:39:27 2026-02-21T08:49:01.3094215Z shl.b32 %r452, %r406, 6; 2026-02-21T08:49:01.3094409Z .loc 1 41 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:41:27 
2026-02-21T08:49:01.3094517Z shl.b32 %r451, %r403, 6; 2026-02-21T08:49:01.3094759Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3094868Z bra.uni $L__BB0_22; 2026-02-21T08:49:01.3095015Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:49:01.3095212Z .loc 1 0 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:131 2026-02-21T08:49:01.3095298Z mov.b32 %r72, global_smem; 2026-02-21T08:49:01.3095380Z add.s32 %r73, %r72, %r3; 2026-02-21T08:49:01.3095489Z bra.uni $L__BB0_2; 2026-02-21T08:49:01.3095613Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3095817Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3095953Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3096035Z barrier.sync 1; 2026-02-21T08:49:01.3096115Z barrier.sync 1; 2026-02-21T08:49:01.3096245Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3096348Z $L__BB0_2: // %.preheader 2026-02-21T08:49:01.3096463Z // =>This Loop Header: Depth=1 2026-02-21T08:49:01.3096577Z // Child Loop BB0_11 Depth 2 2026-02-21T08:49:01.3096717Z // Child Loop BB0_7 Depth 2 2026-02-21T08:49:01.3096939Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19 2026-02-21T08:49:01.3097037Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:49:01.3097148Z barrier.sync 1; 2026-02-21T08:49:01.3097237Z ld.shared.b8 %r71, [%r73+30868]; 2026-02-21T08:49:01.3097322Z setp.gt.u32 %p4, %r71, 3; 2026-02-21T08:49:01.3097433Z @%p4 bra $L__BB0_4; 2026-02-21T08:49:01.3097538Z // %bb.3: // %.preheader 2026-02-21T08:49:01.3097648Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3097740Z $L_brx_0: .branchtargets 2026-02-21T08:49:01.3097844Z $L__BB0_5, 2026-02-21T08:49:01.3097916Z $L__BB0_9, 2026-02-21T08:49:01.3097987Z $L__BB0_15, 2026-02-21T08:49:01.3098091Z $L__BB0_24; 2026-02-21T08:49:01.3098171Z brx.idx %r71, $L_brx_0; 2026-02-21T08:49:01.3098284Z $L__BB0_5: // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T08:49:01.3098477Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3098601Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3098699Z ld.shared.b32 %r120, [global_smem+28672]; 2026-02-21T08:49:01.3098794Z ld.shared.b32 %r427, [global_smem+28680]; 2026-02-21T08:49:01.3098923Z barrier.sync 1; 2026-02-21T08:49:01.3099006Z setp.lt.s32 %p17, %r427, 1; 2026-02-21T08:49:01.3099084Z @%p17 bra $L__BB0_8; 2026-02-21T08:49:01.3099206Z // %bb.6: // %.lr.ph4 2026-02-21T08:49:01.3099310Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3099502Z .loc 1 0 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:131 2026-02-21T08:49:01.3099580Z mov.b32 %r431, -1; 2026-02-21T08:49:01.3099687Z mov.pred %p117, 0; 2026-02-21T08:49:01.3099761Z mov.b32 %r428, 0; 2026-02-21T08:49:01.3099839Z mov.b32 %r429, %r428; 2026-02-21T08:49:01.3099941Z mov.b32 %r430, %r428; 2026-02-21T08:49:01.3100052Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:49:01.3100161Z // => This Inner Loop Header: Depth=2 2026-02-21T08:49:01.3100402Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3100484Z add.s32 %r124, %r431, 1; 2026-02-21T08:49:01.3100567Z setp.eq.b32 %p24, %r431, 127; 2026-02-21T08:49:01.3100651Z selp.b32 %r431, 0, %r124, %p24; 2026-02-21T08:49:01.3100752Z shl.b32 %r125, %r430, 3; 2026-02-21T08:49:01.3100832Z add.s32 %r127, %r72, %r125; 2026-02-21T08:49:01.3100909Z add.s32 %r128, %r127, 30720; 2026-02-21T08:49:01.3101011Z add.s32 %r118, %r127, 30784; 2026-02-21T08:49:01.3101223Z .loc 1 51 31 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:51:31 2026-02-21T08:49:01.3101302Z shl.b32 %r129, %r430, 11; 2026-02-21T08:49:01.3101380Z add.s32 %r130, %r72, %r129; 2026-02-21T08:49:01.3101593Z .loc 1 52 44 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:52:44 2026-02-21T08:49:01.3101671Z add.s32 %r131, %r130, 14336; 
2026-02-21T08:49:01.3101857Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3101971Z bar.warp.sync -1; 2026-02-21T08:49:01.3102046Z // begin inline asm 2026-02-21T08:49:01.3102117Z 2026-02-21T08:49:01.3102215Z { 2026-02-21T08:49:01.3102296Z .reg .pred complete; 2026-02-21T08:49:01.3102370Z waitLoop: 2026-02-21T08:49:01.3102507Z mbarrier.try_wait.parity.shared.b64 complete, [%r118], %r429; 2026-02-21T08:49:01.3102614Z @!complete bra.uni waitLoop; 2026-02-21T08:49:01.3102681Z } 2026-02-21T08:49:01.3102688Z 2026-02-21T08:49:01.3102763Z // end inline asm 2026-02-21T08:49:01.3102991Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3103071Z setp.eq.b32 %p23, %r431, 127; 2026-02-21T08:49:01.3103279Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3103389Z elect.sync %r132|%p20, -1; 2026-02-21T08:49:01.3103467Z bfe.u32 %r133, %r130, 4, 14; 2026-02-21T08:49:01.3103544Z cvt.u64.u32 %rd16, %r133; 2026-02-21T08:49:01.3103635Z or.b64 %rd12, %rd16, -4611685949699522560; 2026-02-21T08:49:01.3103737Z bfe.u32 %r134, %r131, 4, 14; 2026-02-21T08:49:01.3103816Z cvt.u64.u32 %rd17, %r134; 2026-02-21T08:49:01.3103906Z or.b64 %rd13, %rd17, -4611685949699522560; 2026-02-21T08:49:01.3104006Z mov.b32 %r121, 68157456; 2026-02-21T08:49:01.3104079Z // begin inline asm 2026-02-21T08:49:01.3104243Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r120 + 0 ], %rd12, %rd13, %r121, %p117; 2026-02-21T08:49:01.3104342Z // end inline asm 2026-02-21T08:49:01.3104421Z cvt.u64.u32 %rd14, %r128; 2026-02-21T08:49:01.3104497Z // begin inline asm 2026-02-21T08:49:01.3104642Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd14]; 2026-02-21T08:49:01.3104768Z // end inline asm 2026-02-21T08:49:01.3104854Z and.pred %p22, %p23, %p20; 2026-02-21T08:49:01.3104931Z add.s32 %r135, %r72, 30848; 2026-02-21T08:49:01.3105034Z cvt.u64.u32 %rd15, %r135; 
2026-02-21T08:49:01.3105136Z // begin inline asm 2026-02-21T08:49:01.3105277Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd15]; 2026-02-21T08:49:01.3105350Z // end inline asm 2026-02-21T08:49:01.3105567Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3105649Z setp.ne.b32 %p117, %r431, 127; 2026-02-21T08:49:01.3105839Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3105953Z selp.b32 %r136, 1, 0, %p23; 2026-02-21T08:49:01.3106030Z xor.b32 %r428, %r428, %r136; 2026-02-21T08:49:01.3106106Z add.s32 %r122, %r72, 30864; 2026-02-21T08:49:01.3106211Z // begin inline asm 2026-02-21T08:49:01.3106280Z 2026-02-21T08:49:01.3106351Z { 2026-02-21T08:49:01.3106432Z @!%p23 bra.uni skipWait; 2026-02-21T08:49:01.3106544Z .reg .pred complete; 2026-02-21T08:49:01.3106621Z waitLoop: 2026-02-21T08:49:01.3106787Z mbarrier.try_wait.parity.shared.b64 complete, [%r122], %r428; 2026-02-21T08:49:01.3106899Z @!complete bra.uni waitLoop; 2026-02-21T08:49:01.3106974Z skipWait: 2026-02-21T08:49:01.3107045Z } 2026-02-21T08:49:01.3107049Z 2026-02-21T08:49:01.3107125Z // end inline asm 2026-02-21T08:49:01.3107331Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3107414Z add.s32 %r137, %r430, 1; 2026-02-21T08:49:01.3107490Z setp.eq.b32 %p25, %r137, 7; 2026-02-21T08:49:01.3107604Z selp.b32 %r430, 0, %r137, %p25; 2026-02-21T08:49:01.3107712Z selp.b32 %r138, 1, 0, %p25; 2026-02-21T08:49:01.3107787Z xor.b32 %r429, %r429, %r138; 2026-02-21T08:49:01.3108014Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3108098Z add.s32 %r427, %r427, -1; 2026-02-21T08:49:01.3108178Z setp.ne.b32 %p26, %r427, 0; 2026-02-21T08:49:01.3108253Z @%p26 bra $L__BB0_7; 2026-02-21T08:49:01.3108382Z $L__BB0_8: // %._crit_edge5 2026-02-21T08:49:01.3108489Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3108565Z 
barrier.sync 1; 2026-02-21T08:49:01.3108689Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3108763Z bra.uni $L__BB0_2; 2026-02-21T08:49:01.3108875Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3109085Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3109181Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3109274Z ld.shared.b32 %r432, [global_smem+28680]; 2026-02-21T08:49:01.3109348Z barrier.sync 1; 2026-02-21T08:49:01.3109562Z .loc 1 21 67 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:21:67 2026-02-21T08:49:01.3109640Z mov.u32 %r17, %ctaid.x; 2026-02-21T08:49:01.3109738Z mov.u32 %r74, %ctaid.y; 2026-02-21T08:49:01.3109841Z mov.u32 %r75, %ctaid.z; 2026-02-21T08:49:01.3109922Z mov.u32 %r76, %nctaid.x; 2026-02-21T08:49:01.3109998Z mov.u32 %r77, %nctaid.y; 2026-02-21T08:49:01.3110081Z mad.lo.s32 %r78, %r75, %r77, %r74; 2026-02-21T08:49:01.3110190Z mad.lo.s32 %r79, %r78, %r76, %r17; 2026-02-21T08:49:01.3110265Z shl.b32 %r80, %r79, 8; 2026-02-21T08:49:01.3110452Z .loc 1 22 68 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:22:68 2026-02-21T08:49:01.3110555Z cvt.s64.s32 %rd7, %r80; 2026-02-21T08:49:01.3110634Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T08:49:01.3110712Z add.s64 %rd9, %rd8, 128; 2026-02-21T08:49:01.3110819Z cvta.global.u64 %rd11, %rd9; 2026-02-21T08:49:01.3111007Z .loc 1 21 67 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:21:67 2026-02-21T08:49:01.3111088Z cvta.global.u64 %rd10, %rd8; 2026-02-21T08:49:01.3111278Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3111386Z setp.lt.s32 %p5, %r432, 1; 2026-02-21T08:49:01.3111487Z @%p5 bra $L__BB0_14; 2026-02-21T08:49:01.3111582Z // %bb.10: // %.lr.ph 2026-02-21T08:49:01.3111715Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3111793Z add.s32 %r442, %r17, -18944; 2026-02-21T08:49:01.3111869Z add.s32 %r19, %r1, -128; 
2026-02-21T08:49:01.3111969Z mov.b32 %r439, -1; 2026-02-21T08:49:01.3112043Z mov.b32 %r433, 0; 2026-02-21T08:49:01.3112118Z mov.b32 %r434, %r433; 2026-02-21T08:49:01.3112195Z mov.b32 %r441, %r433; 2026-02-21T08:49:01.3112297Z mov.b32 %r440, %r433; 2026-02-21T08:49:01.3112371Z mov.b32 %r437, %r433; 2026-02-21T08:49:01.3112450Z bra.uni $L__BB0_11; 2026-02-21T08:49:01.3112590Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:49:01.3112782Z .loc 1 0 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0:131 2026-02-21T08:49:01.3112883Z selp.b32 %r101, 0, %r437, %p8; 2026-02-21T08:49:01.3112962Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T08:49:01.3113067Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T08:49:01.3113259Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3113336Z shl.b32 %r108, %r434, 3; 2026-02-21T08:49:01.3113439Z add.s32 %r110, %r72, %r108; 2026-02-21T08:49:01.3113522Z add.s32 %r97, %r110, 30720; 2026-02-21T08:49:01.3113703Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3113838Z // begin inline asm 2026-02-21T08:49:01.3113887Z 2026-02-21T08:49:01.3113944Z { 2026-02-21T08:49:01.3114030Z .reg .pred complete; 2026-02-21T08:49:01.3114132Z waitLoop: 2026-02-21T08:49:01.3114266Z mbarrier.try_wait.parity.shared.b64 complete, [%r97], %r433; 2026-02-21T08:49:01.3114346Z @!complete bra.uni waitLoop; 2026-02-21T08:49:01.3114441Z } 2026-02-21T08:49:01.3114445Z 2026-02-21T08:49:01.3114521Z // end inline asm 2026-02-21T08:49:01.3114752Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3114857Z add.s32 %r103, %r110, 30784; 2026-02-21T08:49:01.3115040Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3115113Z bar.sync 3, 64; 2026-02-21T08:49:01.3115191Z // begin inline asm 2026-02-21T08:49:01.3115350Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r103], 4096; 
2026-02-21T08:49:01.3115426Z // end inline asm 2026-02-21T08:49:01.3115618Z .loc 1 51 31 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:51:31 2026-02-21T08:49:01.3115721Z shl.b32 %r111, %r434, 11; 2026-02-21T08:49:01.3115799Z add.s32 %r100, %r72, %r111; 2026-02-21T08:49:01.3116011Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3116113Z bar.sync 3, 64; 2026-02-21T08:49:01.3116199Z elect.sync %r112|%p13, -1; 2026-02-21T08:49:01.3116281Z and.pred %p10, %p12, %p13; 2026-02-21T08:49:01.3116359Z // begin inline asm 2026-02-21T08:49:01.3116650Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r100], [%rd10, {%r101, %r441}], [%r103]; 2026-02-21T08:49:01.3116727Z // end inline asm 2026-02-21T08:49:01.3116918Z .loc 1 52 44 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:52:44 2026-02-21T08:49:01.3117023Z add.s32 %r104, %r100, 14336; 2026-02-21T08:49:01.3117203Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3117278Z bar.sync 3, 64; 2026-02-21T08:49:01.3117384Z elect.sync %r113|%p14, -1; 2026-02-21T08:49:01.3117465Z and.pred %p11, %p12, %p14; 2026-02-21T08:49:01.3117542Z // begin inline asm 2026-02-21T08:49:01.3117809Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r104], [%rd11, {%r101, %r440}], [%r103]; 2026-02-21T08:49:01.3117915Z // end inline asm 2026-02-21T08:49:01.3118136Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3118211Z add.s32 %r437, %r101, 16; 2026-02-21T08:49:01.3118414Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3118493Z add.s32 %r114, %r434, 1; 2026-02-21T08:49:01.3118571Z setp.eq.b32 %p15, %r114, 7; 2026-02-21T08:49:01.3118680Z selp.b32 %r434, 0, %r114, %p15; 2026-02-21T08:49:01.3118761Z selp.b32 %r115, 1, 0, %p15; 2026-02-21T08:49:01.3118836Z xor.b32 %r433, %r433, %r115; 
2026-02-21T08:49:01.3119031Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3119133Z add.s32 %r432, %r432, -1; 2026-02-21T08:49:01.3119208Z setp.ne.b32 %p16, %r432, 0; 2026-02-21T08:49:01.3119288Z @%p16 bra $L__BB0_11; 2026-02-21T08:49:01.3119391Z bra.uni $L__BB0_14; 2026-02-21T08:49:01.3119536Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:49:01.3119640Z // => This Inner Loop Header: Depth=2 2026-02-21T08:49:01.3119743Z add.s32 %r83, %r439, 1; 2026-02-21T08:49:01.3119822Z setp.eq.b32 %p6, %r439, 127; 2026-02-21T08:49:01.3119902Z selp.b32 %r439, 0, %r83, %p6; 2026-02-21T08:49:01.3119981Z setp.ne.b32 %p7, %r439, 0; 2026-02-21T08:49:01.3120088Z setp.eq.b32 %p8, %r439, 0; 2026-02-21T08:49:01.3120164Z @%p7 bra $L__BB0_13; 2026-02-21T08:49:01.3120302Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T08:49:01.3120407Z add.s32 %r442, %r442, 18944; 2026-02-21T08:49:01.3120596Z .loc 1 34 35 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:34:35 2026-02-21T08:49:01.3120674Z shr.s32 %r84, %r442, 31; 2026-02-21T08:49:01.3120776Z shr.u32 %r85, %r84, 25; 2026-02-21T08:49:01.3120840Z add.s32 %r86, %r442, %r85; 2026-02-21T08:49:01.3120918Z shr.s32 %r87, %r86, 7; 2026-02-21T08:49:01.3121097Z .loc 1 35 33 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:35:33 2026-02-21T08:49:01.3121199Z shl.b32 %r88, %r87, 2; 2026-02-21T08:49:01.3121379Z .loc 1 36 39 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:36:39 2026-02-21T08:49:01.3121454Z sub.s32 %r89, 192, %r88; 2026-02-21T08:49:01.3121668Z .loc 1 36 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:36:52 2026-02-21T08:49:01.3121737Z min.s32 %r90, %r89, 4; 2026-02-21T08:49:01.3121914Z .loc 1 37 45 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:45 2026-02-21T08:49:01.3122016Z and.b32 %r91, %r86, -128; 2026-02-21T08:49:01.3122093Z sub.s32 %r92, %r442, %r91; 2026-02-21T08:49:01.3122307Z .loc 1 38 51 // 
c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:38:51 2026-02-21T08:49:01.3122376Z div.s32 %r93, %r92, %r90; 2026-02-21T08:49:01.3122564Z .loc 1 37 64 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:64 2026-02-21T08:49:01.3122645Z mul.lo.s32 %r94, %r93, %r90; 2026-02-21T08:49:01.3122720Z sub.s32 %r95, %r92, %r94; 2026-02-21T08:49:01.3122932Z .loc 1 37 30 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:37:30 2026-02-21T08:49:01.3122999Z add.s32 %r96, %r95, %r88; 2026-02-21T08:49:01.3123176Z .loc 1 39 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:39:27 2026-02-21T08:49:01.3123273Z shl.b32 %r440, %r96, 6; 2026-02-21T08:49:01.3123456Z .loc 1 41 27 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:41:27 2026-02-21T08:49:01.3123523Z shl.b32 %r441, %r93, 6; 2026-02-21T08:49:01.3123578Z bra.uni $L__BB0_13; 2026-02-21T08:49:01.3123666Z $L__BB0_14: // %._crit_edge 2026-02-21T08:49:01.3123752Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3123945Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3124008Z barrier.sync 1; 2026-02-21T08:49:01.3124082Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:49:01.3124136Z bra.uni $L__BB0_2; 2026-02-21T08:49:01.3124233Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:49:01.3124392Z .loc 1 19 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:19 2026-02-21T08:49:01.3124448Z barrier.sync 1; 2026-02-21T08:49:01.3124501Z barrier.sync 1; 2026-02-21T08:49:01.3124560Z bra.uni $L__BB0_2; 2026-02-21T08:49:01.3124638Z $L__BB0_23: // %._crit_edge8 2026-02-21T08:49:01.3124837Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3124903Z barrier.sync 1; 2026-02-21T08:49:01.3124999Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:49:01.3125169Z .loc 1 53 52 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:53:52 2026-02-21T08:49:01.3125231Z bar.sync 0, 128; 
2026-02-21T08:49:01.3125285Z // begin inline asm 2026-02-21T08:49:01.3125333Z 2026-02-21T08:49:01.3125381Z { 2026-02-21T08:49:01.3125448Z .reg .pred complete; 2026-02-21T08:49:01.3125501Z waitLoop: 2026-02-21T08:49:01.3125619Z mbarrier.try_wait.parity.shared.b64 complete, [%r407], %r453; 2026-02-21T08:49:01.3125688Z @!complete bra.uni waitLoop; 2026-02-21T08:49:01.3125759Z } 2026-02-21T08:49:01.3125763Z 2026-02-21T08:49:01.3125816Z // end inline asm 2026-02-21T08:49:01.3125994Z .loc 1 28 131 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:131 2026-02-21T08:49:01.3126049Z bar.sync 0, 128; 2026-02-21T08:49:01.3126104Z // begin inline asm 2026-02-21T08:49:01.3126191Z @%p100 mbarrier.inval.shared::cta.b64 [%r407]; 2026-02-21T08:49:01.3126251Z // end inline asm 2026-02-21T08:49:01.3126307Z // begin inline asm 2026-02-21T08:49:01.3126390Z @%p100 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T08:49:01.3126448Z // end inline asm 2026-02-21T08:49:01.3126499Z // begin inline asm 2026-02-21T08:49:01.3126577Z @%p100 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T08:49:01.3126628Z // end inline asm 2026-02-21T08:49:01.3126689Z bar.sync 0, 128; 2026-02-21T08:49:01.3126741Z // begin inline asm 2026-02-21T08:49:01.3126813Z @%p100 mbarrier.inval.shared::cta.b64 [%r198]; 2026-02-21T08:49:01.3126871Z // end inline asm 2026-02-21T08:49:01.3126924Z bar.sync 0, 128; 2026-02-21T08:49:01.3126975Z // begin inline asm 2026-02-21T08:49:01.3127052Z @%p100 mbarrier.inval.shared::cta.b64 [%r199]; 2026-02-21T08:49:01.3127103Z // end inline asm 2026-02-21T08:49:01.3127154Z bar.sync 0, 128; 2026-02-21T08:49:01.3127205Z // begin inline asm 2026-02-21T08:49:01.3127311Z @%p100 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T08:49:01.3127363Z // end inline asm 2026-02-21T08:49:01.3127415Z bar.sync 0, 128; 2026-02-21T08:49:01.3127477Z // begin inline asm 2026-02-21T08:49:01.3127549Z @%p100 mbarrier.inval.shared::cta.b64 [%r201]; 2026-02-21T08:49:01.3127600Z // end inline asm 
2026-02-21T08:49:01.3127651Z bar.sync 0, 128; 2026-02-21T08:49:01.3127711Z // begin inline asm 2026-02-21T08:49:01.3127783Z @%p100 mbarrier.inval.shared::cta.b64 [%r202]; 2026-02-21T08:49:01.3127833Z // end inline asm 2026-02-21T08:49:01.3127890Z bar.sync 0, 128; 2026-02-21T08:49:01.3127942Z // begin inline asm 2026-02-21T08:49:01.3128015Z @%p100 mbarrier.inval.shared::cta.b64 [%r203]; 2026-02-21T08:49:01.3128067Z // end inline asm 2026-02-21T08:49:01.3128126Z // begin inline asm 2026-02-21T08:49:01.3128197Z @%p100 mbarrier.inval.shared::cta.b64 [%r190]; 2026-02-21T08:49:01.3128248Z // end inline asm 2026-02-21T08:49:01.3128306Z bar.sync 0, 128; 2026-02-21T08:49:01.3128358Z // begin inline asm 2026-02-21T08:49:01.3128433Z @%p100 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T08:49:01.3128491Z // end inline asm 2026-02-21T08:49:01.3128574Z bar.sync 0, 128; 2026-02-21T08:49:01.3128627Z // begin inline asm 2026-02-21T08:49:01.3128700Z @%p100 mbarrier.inval.shared::cta.b64 [%r192]; 2026-02-21T08:49:01.3128757Z // end inline asm 2026-02-21T08:49:01.3128808Z bar.sync 0, 128; 2026-02-21T08:49:01.3128861Z // begin inline asm 2026-02-21T08:49:01.3128943Z @%p100 mbarrier.inval.shared::cta.b64 [%r193]; 2026-02-21T08:49:01.3128993Z // end inline asm 2026-02-21T08:49:01.3129045Z bar.sync 0, 128; 2026-02-21T08:49:01.3129097Z // begin inline asm 2026-02-21T08:49:01.3129180Z @%p100 mbarrier.inval.shared::cta.b64 [%r194]; 2026-02-21T08:49:01.3129232Z // end inline asm 2026-02-21T08:49:01.3129282Z bar.sync 0, 128; 2026-02-21T08:49:01.3129341Z // begin inline asm 2026-02-21T08:49:01.3129415Z @%p100 mbarrier.inval.shared::cta.b64 [%r195]; 2026-02-21T08:49:01.3129467Z // end inline asm 2026-02-21T08:49:01.3129519Z bar.sync 0, 128; 2026-02-21T08:49:01.3129579Z // begin inline asm 2026-02-21T08:49:01.3129673Z @%p100 mbarrier.inval.shared::cta.b64 [%r196]; 2026-02-21T08:49:01.3129729Z // end inline asm 2026-02-21T08:49:01.3129902Z .loc 1 28 4 // 
c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:28:4 2026-02-21T08:49:01.3129955Z bar.sync 0, 128; 2026-02-21T08:49:01.3130008Z // begin inline asm 2026-02-21T08:49:01.3130129Z @%p27 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r425, 64; 2026-02-21T08:49:01.3130182Z // end inline asm 2026-02-21T08:49:01.3130257Z st.shared.b32 [global_smem+30872], 50529027; 2026-02-21T08:49:01.3130332Z barrier.sync 1; 2026-02-21T08:49:01.3130421Z $L__BB0_24: // %common.ret 2026-02-21T08:49:01.3130578Z .loc 1 0 0 // c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py:0 2026-02-21T08:49:01.3130628Z ret; 2026-02-21T08:49:01.3130688Z $L__tmp1: 2026-02-21T08:49:01.3130742Z $L__func_end0: 2026-02-21T08:49:01.3130820Z // -- End function 2026-02-21T08:49:01.3130880Z } 2026-02-21T08:49:01.3131086Z .file 1 "/tmp/torchinductor_root/7n/c7njqeph4ntxdg5yjdluhdbvwm4efqdsfpdpfecg4e5zmbnkgiv7.py" 2026-02-21T08:49:01.3131149Z .section .debug_abbrev 2026-02-21T08:49:01.3131199Z { 2026-02-21T08:49:01.3131293Z .b8 1 // Abbreviation Code 2026-02-21T08:49:01.3131380Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:49:01.3131460Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:49:01.3131549Z .b8 37 // DW_AT_producer 2026-02-21T08:49:01.3131624Z .b8 8 // DW_FORM_string 2026-02-21T08:49:01.3131700Z .b8 19 // DW_AT_language 2026-02-21T08:49:01.3131835Z .b8 5 // DW_FORM_data2 2026-02-21T08:49:01.3131958Z .b8 3 // DW_AT_name 2026-02-21T08:49:01.3132055Z .b8 8 // DW_FORM_string 2026-02-21T08:49:01.3132157Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:49:01.3132282Z .b8 6 // DW_FORM_data4 2026-02-21T08:49:01.3132378Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:49:01.3132472Z .b8 8 // DW_FORM_string 2026-02-21T08:49:01.3132585Z .b8 0 // EOM(1) 2026-02-21T08:49:01.3132679Z .b8 0 // EOM(2) 2026-02-21T08:49:01.3132769Z .b8 0 // EOM(3) 2026-02-21T08:49:01.3132873Z } 2026-02-21T08:49:01.3132954Z .section .debug_info 2026-02-21T08:49:01.3133030Z { 2026-02-21T08:49:01.3133134Z .b32 104 // Length of Unit 
2026-02-21T08:49:01.3133269Z .b8 2 // DWARF version number 2026-02-21T08:49:01.3133342Z .b8 0 2026-02-21T08:49:01.3133483Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:49:01.3133622Z .b8 8 // Address Size (in bytes) 2026-02-21T08:49:01.3133765Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:49:01.3133866Z .b8 116 // DW_AT_producer 2026-02-21T08:49:01.3133972Z .b8 114 2026-02-21T08:49:01.3134046Z .b8 105 2026-02-21T08:49:01.3134120Z .b8 116 2026-02-21T08:49:01.3134194Z .b8 111 2026-02-21T08:49:01.3134295Z .b8 110 2026-02-21T08:49:01.3134369Z .b8 0 2026-02-21T08:49:01.3134467Z .b8 2 // DW_AT_language 2026-02-21T08:49:01.3134542Z .b8 0 2026-02-21T08:49:01.3134692Z .b8 99 // DW_AT_name 2026-02-21T08:49:01.3134769Z .b8 55 2026-02-21T08:49:01.3134843Z .b8 110 2026-02-21T08:49:01.3134944Z .b8 106 2026-02-21T08:49:01.3135019Z .b8 113 2026-02-21T08:49:01.3135096Z .b8 101 2026-02-21T08:49:01.3135171Z .b8 112 2026-02-21T08:49:01.3135278Z .b8 104 2026-02-21T08:49:01.3135350Z .b8 52 2026-02-21T08:49:01.3135465Z .b8 110 2026-02-21T08:49:01.3135569Z .b8 116 2026-02-21T08:49:01.3135645Z .b8 120 2026-02-21T08:49:01.3135720Z .b8 100 2026-02-21T08:49:01.3135792Z .b8 103 2026-02-21T08:49:01.3135891Z .b8 53 2026-02-21T08:49:01.3135964Z .b8 121 2026-02-21T08:49:01.3136037Z .b8 106 2026-02-21T08:49:01.3136138Z .b8 100 2026-02-21T08:49:01.3136214Z .b8 108 2026-02-21T08:49:01.3136287Z .b8 117 2026-02-21T08:49:01.3136356Z .b8 104 2026-02-21T08:49:01.3136461Z .b8 100 2026-02-21T08:49:01.3136534Z .b8 98 2026-02-21T08:49:01.3136604Z .b8 118 2026-02-21T08:49:01.3136708Z .b8 119 2026-02-21T08:49:01.3136808Z .b8 109 2026-02-21T08:49:01.3136881Z .b8 52 2026-02-21T08:49:01.3136950Z .b8 101 2026-02-21T08:49:01.3137049Z .b8 102 2026-02-21T08:49:01.3137121Z .b8 113 2026-02-21T08:49:01.3137190Z .b8 100 2026-02-21T08:49:01.3137261Z .b8 115 2026-02-21T08:49:01.3137357Z .b8 102 2026-02-21T08:49:01.3137430Z .b8 112 2026-02-21T08:49:01.3137503Z .b8 100 
2026-02-21T08:49:01.3137602Z .b8 112 2026-02-21T08:49:01.3137675Z .b8 102 2026-02-21T08:49:01.3137748Z .b8 101 2026-02-21T08:49:01.3137819Z .b8 99 2026-02-21T08:49:01.3137915Z .b8 103 2026-02-21T08:49:01.3137985Z .b8 52 2026-02-21T08:49:01.3138057Z .b8 101 2026-02-21T08:49:01.3138158Z .b8 53 2026-02-21T08:49:01.3138226Z .b8 122 2026-02-21T08:49:01.3138296Z .b8 109 2026-02-21T08:49:01.3138366Z .b8 98 2026-02-21T08:49:01.3138465Z .b8 110 2026-02-21T08:49:01.3138537Z .b8 107 2026-02-21T08:49:01.3138609Z .b8 103 2026-02-21T08:49:01.3138681Z .b8 105 2026-02-21T08:49:01.3138779Z .b8 118 2026-02-21T08:49:01.3138849Z .b8 55 2026-02-21T08:49:01.3138921Z .b8 46 2026-02-21T08:49:01.3139019Z .b8 112 2026-02-21T08:49:01.3139091Z .b8 121 2026-02-21T08:49:01.3139163Z .b8 0 2026-02-21T08:49:01.3139278Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:49:01.3139408Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:49:01.3139491Z .b8 116 2026-02-21T08:49:01.3139592Z .b8 109 2026-02-21T08:49:01.3139689Z .b8 112 2026-02-21T08:49:01.3139758Z .b8 47 2026-02-21T08:49:01.3139828Z .b8 116 2026-02-21T08:49:01.3139900Z .b8 111 2026-02-21T08:49:01.3139999Z .b8 114 2026-02-21T08:49:01.3140066Z .b8 99 2026-02-21T08:49:01.3140135Z .b8 104 2026-02-21T08:49:01.3140230Z .b8 105 2026-02-21T08:49:01.3140299Z .b8 110 2026-02-21T08:49:01.3140368Z .b8 100 2026-02-21T08:49:01.3140435Z .b8 117 2026-02-21T08:49:01.3140533Z .b8 99 2026-02-21T08:49:01.3140599Z .b8 116 2026-02-21T08:49:01.3140667Z .b8 111 2026-02-21T08:49:01.3140735Z .b8 114 2026-02-21T08:49:01.3140828Z .b8 95 2026-02-21T08:49:01.3140896Z .b8 114 2026-02-21T08:49:01.3140966Z .b8 111 2026-02-21T08:49:01.3141060Z .b8 111 2026-02-21T08:49:01.3141127Z .b8 116 2026-02-21T08:49:01.3141197Z .b8 47 2026-02-21T08:49:01.3141265Z .b8 55 2026-02-21T08:49:01.3141361Z .b8 110 2026-02-21T08:49:01.3141429Z .b8 0 2026-02-21T08:49:01.3141499Z } 2026-02-21T08:49:01.3141612Z .section .debug_macinfo { } 2026-02-21T08:49:01.3141616Z 2026-02-21T08:49:01.3141714Z 
================================================================ 2026-02-21T08:49:01.3141835Z please share the reproducer above with Triton project. 2026-02-21T08:49:01.3141866Z 2026-02-21T08:49:01.3142304Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 16.1 configs/s 2026-02-21T08:49:02.1040022Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━ 912/912 1037.4 2026-02-21T08:49:02.1040834Z configs/s 2026-02-21T08:49:02.1968703Z [47s] Generation 1 complete: 2026-02-21T08:49:02.1973267Z error=9 2026-02-21T08:49:02.1974843Z ok=76 2026-02-21T08:49:02.1975109Z min=0.2232 2026-02-21T08:49:02.1975279Z mid=1.0352 2026-02-21T08:49:02.1975469Z max=5.2511 2026-02-21T08:49:02.1975707Z best={'block_sizes': [256, 128, 16], 2026-02-21T08:49:02.1980255Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:49:02.1983416Z 'l2_groupings': [4], 2026-02-21T08:49:02.1985001Z 'load_eviction_policies': ['', ''], 2026-02-21T08:49:02.1985314Z 'loop_orders': [[1, 0]], 2026-02-21T08:49:02.1985752Z 'num_stages': 8, 2026-02-21T08:49:02.1985986Z 'num_warps': 4, 2026-02-21T08:49:02.1986203Z 'pid_type': 'flat', 2026-02-21T08:49:02.1986446Z 'range_flattens': [None, None], 2026-02-21T08:49:02.1986701Z 'range_multi_buffers': [None, None], 2026-02-21T08:49:02.1986929Z 'range_num_stages': [0, 0], 2026-02-21T08:49:02.1987160Z 'range_unroll_factors': [0, 0], 2026-02-21T08:49:02.1987379Z 'range_warp_specializes': [None, True]} 2026-02-21T08:49:02.1987670Z [47s] Fitting surrogate: 185 points, 185 targets 2026-02-21T08:49:03.4506016Z [48s] Generation 2 starting: 86 neighbors, 5 active search path(s) 2026-02-21T08:49:11.7586654Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 2.8 configs/s 2026-02-21T08:49:16.8524524Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 17.4 configs/s 2026-02-21T08:49:18.4957776Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 592.5 2026-02-21T08:49:18.4958359Z configs/s 
2026-02-21T08:49:18.5848012Z [64s] Generation 2 complete: 2026-02-21T08:49:18.5851860Z error=5 2026-02-21T08:49:18.5856249Z ok=86 2026-02-21T08:49:18.5857763Z min=0.1228 2026-02-21T08:49:18.5857970Z mid=0.5284 2026-02-21T08:49:18.5858164Z max=7.3139 2026-02-21T08:49:18.5858349Z best={'block_sizes': [128, 256, 64], 2026-02-21T08:49:18.5858644Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:49:18.5858888Z 'l2_groupings': [64], 2026-02-21T08:49:18.5859130Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:49:18.5859381Z 'loop_orders': [[0, 1]], 2026-02-21T08:49:18.5859610Z 'num_stages': 3, 2026-02-21T08:49:18.5859814Z 'num_warps': 4, 2026-02-21T08:49:18.5859988Z 'pid_type': 'flat', 2026-02-21T08:49:18.5860207Z 'range_flattens': [None, None], 2026-02-21T08:49:18.5860432Z 'range_multi_buffers': [None, True], 2026-02-21T08:49:18.5860688Z 'range_num_stages': [0, 0], 2026-02-21T08:49:18.5860897Z 'range_unroll_factors': [0, 0], 2026-02-21T08:49:18.5861143Z 'range_warp_specializes': [None, True]} 2026-02-21T08:49:18.5865801Z [64s] Fitting surrogate: 276 points, 276 targets 2026-02-21T08:49:19.8337029Z [65s] Generation 3 starting: 88 neighbors, 5 active search path(s) 2026-02-21T08:49:32.5902503Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92/92 1.5 configs/s 2026-02-21T08:49:38.2396499Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 92/92 16.4 configs/s 2026-02-21T08:49:40.4647599Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 569.1 2026-02-21T08:49:40.4648152Z configs/s 2026-02-21T08:49:40.5584058Z [86s] Generation 3 complete: 2026-02-21T08:49:40.5587756Z error=24 2026-02-21T08:49:40.5591600Z ok=70 2026-02-21T08:49:40.5595613Z min=0.1054 2026-02-21T08:49:40.5597691Z mid=0.4362 2026-02-21T08:49:40.5603577Z max=37.7793 2026-02-21T08:49:40.5607728Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:49:40.5609918Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:49:40.5610189Z 'l2_groupings': 
[64], 2026-02-21T08:49:40.5610436Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:49:40.5610670Z 'loop_orders': [[0, 1]], 2026-02-21T08:49:40.5610906Z 'num_stages': 3, 2026-02-21T08:49:40.5611085Z 'num_warps': 8, 2026-02-21T08:49:40.5611292Z 'pid_type': 'flat', 2026-02-21T08:49:40.5611486Z 'range_flattens': [None, None], 2026-02-21T08:49:40.5611727Z 'range_multi_buffers': [None, True], 2026-02-21T08:49:40.5611972Z 'range_num_stages': [0, 0], 2026-02-21T08:49:40.5612289Z 'range_unroll_factors': [0, 0], 2026-02-21T08:49:40.5612537Z 'range_warp_specializes': [None, True]} 2026-02-21T08:49:40.5612784Z [86s] Fitting surrogate: 370 points, 370 targets 2026-02-21T08:49:41.7063548Z [87s] Generation 4 starting: 78 neighbors, 5 active search path(s) 2026-02-21T08:50:15.7957396Z [121s] Timeout after 30s compiling Config(block_sizes=[1024, 256, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=1, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:50:15.7977305Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81/81 0.2 configs/s 2026-02-21T08:50:19.3131312Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 81/81 22.6 configs/s 2026-02-21T08:50:21.9307030Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 377.7 2026-02-21T08:50:21.9311221Z configs/s 2026-02-21T08:50:22.0414836Z [127s] Generation 4 complete: 2026-02-21T08:50:22.0416585Z error=23 2026-02-21T08:50:22.0416794Z timeout=1 2026-02-21T08:50:22.0417058Z ok=60 2026-02-21T08:50:22.0417268Z min=0.1065 2026-02-21T08:50:22.0417736Z mid=0.3400 2026-02-21T08:50:22.0417943Z max=5.9966 2026-02-21T08:50:22.0418140Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:50:22.0422159Z 'indexing': ['pointer', 'pointer', 
'pointer'], 2026-02-21T08:50:22.0426047Z 'l2_groupings': [64], 2026-02-21T08:50:22.0428129Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:50:22.0428492Z 'loop_orders': [[0, 1]], 2026-02-21T08:50:22.0432665Z 'num_stages': 3, 2026-02-21T08:50:22.0437813Z 'num_warps': 8, 2026-02-21T08:50:22.0442433Z 'pid_type': 'flat', 2026-02-21T08:50:22.0446202Z 'range_flattens': [None, None], 2026-02-21T08:50:22.0450814Z 'range_multi_buffers': [None, True], 2026-02-21T08:50:22.0454911Z 'range_num_stages': [0, 0], 2026-02-21T08:50:22.0458699Z 'range_unroll_factors': [0, 0], 2026-02-21T08:50:22.0462518Z 'range_warp_specializes': [None, True]} 2026-02-21T08:50:22.0462926Z [127s] Fitting surrogate: 454 points, 454 targets 2026-02-21T08:50:23.0851934Z [128s] Generation 5 starting: 73 neighbors, 5 active search path(s) 2026-02-21T08:50:33.2023002Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76/76 1.5 configs/s 2026-02-21T08:50:37.8095805Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 76/76 16.4 configs/s 2026-02-21T08:50:40.7823042Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 333.5 2026-02-21T08:50:40.7827274Z configs/s 2026-02-21T08:50:40.9085202Z [146s] Generation 5 complete: 2026-02-21T08:50:40.9089981Z error=10 2026-02-21T08:50:40.9093691Z ok=69 2026-02-21T08:50:40.9095427Z min=0.1054 2026-02-21T08:50:40.9095715Z mid=0.3235 2026-02-21T08:50:40.9095916Z max=20.1457 2026-02-21T08:50:40.9100164Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:50:40.9105355Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:50:40.9107047Z 'l2_groupings': [64], 2026-02-21T08:50:40.9107331Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:50:40.9107670Z 'loop_orders': [[0, 1]], 2026-02-21T08:50:40.9107924Z 'num_stages': 3, 2026-02-21T08:50:40.9108183Z 'num_warps': 8, 2026-02-21T08:50:40.9108766Z 'pid_type': 'flat', 2026-02-21T08:50:40.9109031Z 'range_flattens': [None, None], 2026-02-21T08:50:40.9109328Z 
'range_multi_buffers': [None, None], 2026-02-21T08:50:40.9109594Z 'range_num_stages': [0, 0], 2026-02-21T08:50:40.9109876Z 'range_unroll_factors': [0, 0], 2026-02-21T08:50:40.9110144Z 'range_warp_specializes': [None, True]} 2026-02-21T08:50:40.9114059Z [146s] Fitting surrogate: 533 points, 533 targets 2026-02-21T08:50:41.9871333Z [147s] Generation 6 starting: 64 neighbors, 4 active search path(s) 2026-02-21T08:50:46.6408891Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67/67 9.1 configs/s 2026-02-21T08:50:50.0527717Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 67/67 19.6 configs/s 2026-02-21T08:50:52.2126209Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 456.6 2026-02-21T08:50:52.2129607Z configs/s 2026-02-21T08:50:52.3125278Z [157s] Generation 6 complete: 2026-02-21T08:50:52.3129081Z error=17 2026-02-21T08:50:52.3134076Z ok=52 2026-02-21T08:50:52.3137945Z min=0.1075 2026-02-21T08:50:52.3139820Z mid=0.3849 2026-02-21T08:50:52.3140059Z max=19.6240 2026-02-21T08:50:52.3140259Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:50:52.3140568Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:50:52.3140815Z 'l2_groupings': [64], 2026-02-21T08:50:52.3141129Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:50:52.3145105Z 'loop_orders': [[0, 1]], 2026-02-21T08:50:52.3147211Z 'num_stages': 3, 2026-02-21T08:50:52.3147507Z 'num_warps': 8, 2026-02-21T08:50:52.3152957Z 'pid_type': 'flat', 2026-02-21T08:50:52.3154994Z 'range_flattens': [None, None], 2026-02-21T08:50:52.3155303Z 'range_multi_buffers': [None, None], 2026-02-21T08:50:52.3155545Z 'range_num_stages': [0, 0], 2026-02-21T08:50:52.3155789Z 'range_unroll_factors': [0, 0], 2026-02-21T08:50:52.3156427Z 'range_warp_specializes': [None, True]} 2026-02-21T08:50:52.3156739Z [157s] Fitting surrogate: 602 points, 602 targets 2026-02-21T08:50:53.4341731Z [158s] Generation 7 starting: 64 neighbors, 4 active search path(s) 2026-02-21T08:51:27.7631983Z [193s] 
Timeout after 30s compiling Config(block_sizes=[1024, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], num_stages=7, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:51:27.7651916Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66/66 0.4 configs/s 2026-02-21T08:51:30.6224297Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 66/66 23.2 configs/s 2026-02-21T08:51:32.9145235Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 430.4 2026-02-21T08:51:32.9146559Z configs/s 2026-02-21T08:51:33.0229355Z [198s] Generation 7 complete: 2026-02-21T08:51:33.0233710Z error=24 2026-02-21T08:51:33.0235210Z timeout=1 2026-02-21T08:51:33.0235456Z ok=44 2026-02-21T08:51:33.0238660Z min=0.1076 2026-02-21T08:51:33.0238918Z mid=0.1822 2026-02-21T08:51:33.0243536Z max=23.6703 2026-02-21T08:51:33.0248080Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:51:33.0252118Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:51:33.0253627Z 'l2_groupings': [64], 2026-02-21T08:51:33.0253972Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:51:33.0254272Z 'loop_orders': [[0, 1]], 2026-02-21T08:51:33.0256142Z 'num_stages': 3, 2026-02-21T08:51:33.0256401Z 'num_warps': 8, 2026-02-21T08:51:33.0256602Z 'pid_type': 'flat', 2026-02-21T08:51:33.0256843Z 'range_flattens': [None, None], 2026-02-21T08:51:33.0257074Z 'range_multi_buffers': [None, None], 2026-02-21T08:51:33.0257347Z 'range_num_stages': [0, 0], 2026-02-21T08:51:33.0257581Z 'range_unroll_factors': [0, 0], 2026-02-21T08:51:33.0258029Z 'range_warp_specializes': [None, True]} 2026-02-21T08:51:33.0258334Z [198s] Fitting surrogate: 671 points, 671 targets 2026-02-21T08:51:34.1005688Z [199s] Generation 8 starting: 60 neighbors, 4 active search 
path(s) 2026-02-21T08:52:08.7874491Z [234s] Timeout after 30s compiling Config(block_sizes=[1024, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=7, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:52:08.7891916Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62/62 0.2 configs/s 2026-02-21T08:52:11.0513314Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 62/62 27.8 configs/s 2026-02-21T08:52:12.9954119Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 507.5 2026-02-21T08:52:12.9957603Z configs/s 2026-02-21T08:52:13.0881407Z [238s] Generation 8 complete: 2026-02-21T08:52:13.0884122Z error=35 2026-02-21T08:52:13.0884329Z timeout=1 2026-02-21T08:52:13.0884537Z ok=29 2026-02-21T08:52:13.0884757Z min=0.1086 2026-02-21T08:52:13.0884951Z mid=0.1905 2026-02-21T08:52:13.0885147Z max=39.7998 2026-02-21T08:52:13.0885334Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:52:13.0885628Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:52:13.0885869Z 'l2_groupings': [64], 2026-02-21T08:52:13.0886131Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:52:13.0886363Z 'loop_orders': [[0, 1]], 2026-02-21T08:52:13.0886585Z 'num_stages': 3, 2026-02-21T08:52:13.0886762Z 'num_warps': 8, 2026-02-21T08:52:13.0886966Z 'pid_type': 'flat', 2026-02-21T08:52:13.0887162Z 'range_flattens': [None, None], 2026-02-21T08:52:13.0887680Z 'range_multi_buffers': [None, None], 2026-02-21T08:52:13.0887945Z 'range_num_stages': [0, 0], 2026-02-21T08:52:13.0888156Z 'range_unroll_factors': [0, 0], 2026-02-21T08:52:13.0888407Z 'range_warp_specializes': [None, True]} 2026-02-21T08:52:13.0909354Z [238s] Fitting surrogate: 736 points, 736 targets 2026-02-21T08:52:13.9167049Z [239s] 
Generation 9 starting: 44 neighbors, 3 active search path(s) 2026-02-21T08:52:20.6805362Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46/46 2.1 configs/s 2026-02-21T08:52:22.7093261Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 46/46 22.8 configs/s 2026-02-21T08:52:24.3189858Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 611.5 2026-02-21T08:52:24.3194420Z configs/s 2026-02-21T08:52:24.4060358Z [249s] Generation 9 complete: 2026-02-21T08:52:24.4065357Z error=14 2026-02-21T08:52:24.4066735Z ok=34 2026-02-21T08:52:24.4067006Z min=0.1095 2026-02-21T08:52:24.4067195Z mid=0.3195 2026-02-21T08:52:24.4067391Z max=18.2826 2026-02-21T08:52:24.4067597Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:52:24.4068189Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:52:24.4068446Z 'l2_groupings': [64], 2026-02-21T08:52:24.4068695Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:52:24.4068937Z 'loop_orders': [[0, 1]], 2026-02-21T08:52:24.4069173Z 'num_stages': 3, 2026-02-21T08:52:24.4069357Z 'num_warps': 8, 2026-02-21T08:52:24.4069580Z 'pid_type': 'flat', 2026-02-21T08:52:24.4069813Z 'range_flattens': [None, None], 2026-02-21T08:52:24.4070043Z 'range_multi_buffers': [None, None], 2026-02-21T08:52:24.4070309Z 'range_num_stages': [0, 0], 2026-02-21T08:52:24.4070535Z 'range_unroll_factors': [0, 0], 2026-02-21T08:52:24.4070789Z 'range_warp_specializes': [None, True]} 2026-02-21T08:52:24.4089793Z [249s] Fitting surrogate: 784 points, 784 targets 2026-02-21T08:52:25.3482791Z [250s] Generation 10 starting: 51 neighbors, 3 active search path(s) 2026-02-21T08:52:37.2111892Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53/53 1.5 configs/s 2026-02-21T08:52:40.4311315Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 53/53 16.7 configs/s 2026-02-21T08:52:42.3672586Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 509.2 2026-02-21T08:52:42.3673483Z configs/s 
2026-02-21T08:52:42.4658973Z [267s] Generation 10 complete: 2026-02-21T08:52:42.4659317Z error=10 2026-02-21T08:52:42.4659559Z ok=45 2026-02-21T08:52:42.4659757Z min=0.1095 2026-02-21T08:52:42.4660230Z mid=0.4310 2026-02-21T08:52:42.4660393Z max=34.3634 2026-02-21T08:52:42.4660608Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:52:42.4660861Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:52:42.4661128Z 'l2_groupings': [64], 2026-02-21T08:52:42.4661366Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:52:42.4661691Z 'loop_orders': [[0, 1]], 2026-02-21T08:52:42.4661920Z 'num_stages': 3, 2026-02-21T08:52:42.4662110Z 'num_warps': 8, 2026-02-21T08:52:42.4662327Z 'pid_type': 'flat', 2026-02-21T08:52:42.4662518Z 'range_flattens': [None, None], 2026-02-21T08:52:42.4662773Z 'range_multi_buffers': [None, None], 2026-02-21T08:52:42.4662991Z 'range_num_stages': [0, 0], 2026-02-21T08:52:42.4663218Z 'range_unroll_factors': [0, 0], 2026-02-21T08:52:42.4663456Z 'range_warp_specializes': [None, True]} 2026-02-21T08:52:42.4691891Z [267s] Fitting surrogate: 839 points, 839 targets 2026-02-21T08:52:43.4066812Z [268s] Generation 11 starting: 54 neighbors, 3 active search path(s) 2026-02-21T08:52:58.2972514Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0.7 configs/s 2026-02-21T08:52:59.7115539Z 2026-02-21T08:52:59.7115566Z 2026-02-21T08:52:59.7116267Z ================================================================ 2026-02-21T08:52:59.7117055Z Internal Triton PTX codegen error 2026-02-21T08:52:59.7117640Z `ptxas` stderr: 2026-02-21T08:52:59.7118924Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 226 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:52:59.7120701Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:59.7121095Z 2026-02-21T08:52:59.7122168Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp7a78taq0.ptx -o /tmp/tmp7a78taq0.ptx.o 2026-02-21T08:52:59.7123440Z 2026-02-21T08:52:59.7123448Z 2026-02-21T08:52:59.7123593Z // 2026-02-21T08:52:59.7123994Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:59.7124442Z // 2026-02-21T08:52:59.7124611Z 2026-02-21T08:52:59.7124865Z .version 8.7 2026-02-21T08:52:59.7125206Z .target sm_100a 2026-02-21T08:52:59.7125587Z .address_size 64 2026-02-21T08:52:59.7125801Z 2026-02-21T08:52:59.7126098Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:52:59.7126782Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:59.7127369Z // @_helion_matmul 2026-02-21T08:52:59.7127894Z .visible .entry _helion_matmul( 2026-02-21T08:52:59.7128488Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:52:59.7129135Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:52:59.7129843Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:52:59.7130530Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:52:59.7131212Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:52:59.7132016Z ) 2026-02-21T08:52:59.7132314Z .reqntid 384 2026-02-21T08:52:59.7132676Z .maxnreg 32 2026-02-21T08:52:59.7132982Z { 2026-02-21T08:52:59.7133345Z .reg .pred %p<230>; 2026-02-21T08:52:59.7133733Z .reg .b32 %r<2352>; 2026-02-21T08:52:59.7134663Z .reg .b64 %rd<1390>; 2026-02-21T08:52:59.7135400Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19:0 2026-02-21T08:52:59.7136218Z $L__func_begin0: 2026-02-21T08:52:59.7137043Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19:0 2026-02-21T08:52:59.7137757Z 
2026-02-21T08:52:59.7137899Z // %bb.0: 2026-02-21T08:52:59.7138336Z ld.param.b64 %rd91, [_helion_matmul_param_3]; 2026-02-21T08:52:59.7138829Z $L__tmp0: 2026-02-21T08:52:59.7139482Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19 2026-02-21T08:52:59.7140253Z mov.u32 %r1, %tid.x; 2026-02-21T08:52:59.7140663Z shr.u32 %r2, %r1, 5; 2026-02-21T08:52:59.7141069Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:52:59.7141590Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:52:59.7142016Z @%p2 bra $L__BB0_14; 2026-02-21T08:52:59.7142381Z bra.uni $L__BB0_1; 2026-02-21T08:52:59.7142794Z $L__BB0_14: 2026-02-21T08:52:59.7143610Z .loc 1 0 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:0:0 2026-02-21T08:52:59.7144485Z ld.param.b64 %rd90, [_helion_matmul_param_2]; 2026-02-21T08:52:59.7145091Z ld.param.b64 %rd88, [_helion_matmul_param_0]; 2026-02-21T08:52:59.7145866Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19 2026-02-21T08:52:59.7146694Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:52:59.7147161Z setp.lt.u32 %p147, %r1, 32; 2026-02-21T08:52:59.7147600Z mov.b32 %r425, global_smem; 2026-02-21T08:52:59.7147986Z // begin inline asm 2026-02-21T08:52:59.7148606Z @%p147 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r425], 512; 2026-02-21T08:52:59.7149314Z // end inline asm 2026-02-21T08:52:59.7149692Z bar.sync 0, 128; 2026-02-21T08:52:59.7150056Z ld.shared.b32 %r2348, [global_smem]; 2026-02-21T08:52:59.7150549Z bar.sync 0, 128; 2026-02-21T08:52:59.7150894Z // begin inline asm 2026-02-21T08:52:59.7151446Z @%p147 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:52:59.7152091Z // end inline asm 2026-02-21T08:52:59.7152757Z .loc 1 21 67 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:21:67 2026-02-21T08:52:59.7153719Z mov.u32 %r24, %ctaid.x; 2026-02-21T08:52:59.7154103Z mov.u32 %r442, %ctaid.y; 2026-02-21T08:52:59.7154529Z mov.u32 %r443, %ctaid.z; 2026-02-21T08:52:59.7155006Z 
mov.u32 %r444, %nctaid.x; 2026-02-21T08:52:59.7155450Z mov.u32 %r445, %nctaid.y; 2026-02-21T08:52:59.7155915Z mad.lo.s32 %r446, %r443, %r445, %r442; 2026-02-21T08:52:59.7156390Z mad.lo.s32 %r447, %r446, %r444, %r24; 2026-02-21T08:52:59.7156876Z shl.b32 %r448, %r447, 8; 2026-02-21T08:52:59.7157270Z cvt.s64.s32 %rd362, %r448; 2026-02-21T08:52:59.7157715Z add.s64 %rd341, %rd91, %rd362; 2026-02-21T08:52:59.7158110Z shl.b32 %r449, %r1, 2; 2026-02-21T08:52:59.7158517Z add.s32 %r426, %r425, %r449; 2026-02-21T08:52:59.7158912Z mov.b32 %r451, 0; 2026-02-21T08:52:59.7159289Z // begin inline asm 2026-02-21T08:52:59.7159679Z @%p147 st.shared.b32 [ %r426 + 0 ], %r451; 2026-02-21T08:52:59.7160167Z // end inline asm 2026-02-21T08:52:59.7160563Z bar.warp.sync -1; 2026-02-21T08:52:59.7160929Z setp.eq.b32 %p218, %r1, 0; 2026-02-21T08:52:59.7161371Z cvt.u64.u32 %rd326, %r425; 2026-02-21T08:52:59.7161771Z // begin inline asm 2026-02-21T08:52:59.7162485Z @%p218 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd326 + 0 ], %rd88; 2026-02-21T08:52:59.7163264Z // end inline asm 2026-02-21T08:52:59.7163651Z // begin inline asm 2026-02-21T08:52:59.7164214Z @%p218 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1; 2026-02-21T08:52:59.7164976Z // end inline asm 2026-02-21T08:52:59.7165491Z mov.b32 %r428, 64; 2026-02-21T08:52:59.7165841Z // begin inline asm 2026-02-21T08:52:59.7166495Z @%p218 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r428; 2026-02-21T08:52:59.7167222Z // end inline asm 2026-02-21T08:52:59.7167596Z mov.b32 %r429, 256; 2026-02-21T08:52:59.7167960Z // begin inline asm 2026-02-21T08:52:59.7168772Z @%p218 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r429; 2026-02-21T08:52:59.7169508Z // end inline asm 2026-02-21T08:52:59.7169876Z mov.b32 %r430, 2048; 2026-02-21T08:52:59.7170254Z // begin inline asm 2026-02-21T08:52:59.7170863Z @%p218 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r430; 2026-02-21T08:52:59.7171624Z // end inline asm 2026-02-21T08:52:59.7171950Z // begin inline asm 2026-02-21T08:52:59.7172618Z @%p218 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r430; 2026-02-21T08:52:59.7173331Z // end inline asm 2026-02-21T08:52:59.7173699Z mov.b64 %rd334, 4096; 2026-02-21T08:52:59.7174096Z // begin inline asm 2026-02-21T08:52:59.7174877Z @%p218 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd326 + 0 ], 0x0, %rd334; 2026-02-21T08:52:59.7175709Z // end inline asm 2026-02-21T08:52:59.7176053Z mov.b32 %r432, 1; 2026-02-21T08:52:59.7176564Z // begin inline asm 2026-02-21T08:52:59.7177251Z @%p218 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r432; 2026-02-21T08:52:59.7178083Z // end inline asm 2026-02-21T08:52:59.7178428Z // begin inline asm 2026-02-21T08:52:59.7179146Z @%p218 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r432; 2026-02-21T08:52:59.7179963Z // end inline asm 2026-02-21T08:52:59.7180305Z // begin inline asm 2026-02-21T08:52:59.7180985Z @%p218 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x6; 2026-02-21T08:52:59.7181708Z // end inline asm 2026-02-21T08:52:59.7182086Z // begin inline asm 2026-02-21T08:52:59.7182767Z @%p218 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0; 2026-02-21T08:52:59.7183543Z // end inline asm 2026-02-21T08:52:59.7183923Z // begin inline asm 2026-02-21T08:52:59.7184552Z @%p218 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x3; 2026-02-21T08:52:59.7185388Z // end inline asm 2026-02-21T08:52:59.7185722Z // begin inline asm 2026-02-21T08:52:59.7186371Z @%p218 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0; 2026-02-21T08:52:59.7187298Z // end inline asm 2026-02-21T08:52:59.7187640Z // begin inline asm 
2026-02-21T08:52:59.7188646Z @%p147 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd341 + 0 ], [ %rd326 + 0 ], 0x80; 2026-02-21T08:52:59.7189639Z // end inline asm 2026-02-21T08:52:59.7190018Z // begin inline asm 2026-02-21T08:52:59.7190572Z @%p147 fence.proxy.tensormap::generic.acquire.gpu [ %rd341 + 0 ], 0x80; 2026-02-21T08:52:59.7191291Z @%p147 cp.async.bulk.commit_group ; 2026-02-21T08:52:59.7191822Z @%p147 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:52:59.7192275Z // end inline asm 2026-02-21T08:52:59.7192648Z bar.sync 0, 128; 2026-02-21T08:52:59.7193331Z .loc 1 23 73 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:23:73 2026-02-21T08:52:59.7194148Z add.s64 %rd87, %rd341, 128; 2026-02-21T08:52:59.7194544Z bar.sync 0, 128; 2026-02-21T08:52:59.7194985Z // begin inline asm 2026-02-21T08:52:59.7195355Z @%p147 st.shared.b32 [ %r426 + 0 ], %r451; 2026-02-21T08:52:59.7195813Z // end inline asm 2026-02-21T08:52:59.7196187Z bar.warp.sync -1; 2026-02-21T08:52:59.7196526Z // begin inline asm 2026-02-21T08:52:59.7197214Z @%p218 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd326 + 0 ], %rd90; 2026-02-21T08:52:59.7197936Z // end inline asm 2026-02-21T08:52:59.7198303Z // begin inline asm 2026-02-21T08:52:59.7198879Z @%p218 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1; 2026-02-21T08:52:59.7199742Z // end inline asm 2026-02-21T08:52:59.7200077Z // begin inline asm 2026-02-21T08:52:59.7200768Z @%p218 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r428; 2026-02-21T08:52:59.7201537Z // end inline asm 2026-02-21T08:52:59.7201863Z // begin inline asm 2026-02-21T08:52:59.7202530Z @%p218 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r429; 2026-02-21T08:52:59.7203375Z // end inline asm 2026-02-21T08:52:59.7203764Z mov.b32 %r438, 12288; 2026-02-21T08:52:59.7204120Z // begin inline asm 2026-02-21T08:52:59.7204876Z @%p218 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r438; 2026-02-21T08:52:59.7205703Z // end inline asm 2026-02-21T08:52:59.7206046Z // begin inline asm 2026-02-21T08:52:59.7206732Z @%p218 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r430; 2026-02-21T08:52:59.7207509Z // end inline asm 2026-02-21T08:52:59.7207882Z mov.b64 %rd352, 24576; 2026-02-21T08:52:59.7208237Z // begin inline asm 2026-02-21T08:52:59.7208950Z @%p218 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd326 + 0 ], 0x0, %rd352; 2026-02-21T08:52:59.7209767Z // end inline asm 2026-02-21T08:52:59.7210108Z // begin inline asm 2026-02-21T08:52:59.7210929Z @%p218 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0, %r432; 2026-02-21T08:52:59.7211761Z // end inline asm 2026-02-21T08:52:59.7212146Z // begin inline asm 2026-02-21T08:52:59.7212832Z @%p218 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x1, %r432; 2026-02-21T08:52:59.7213665Z // end inline asm 2026-02-21T08:52:59.7213988Z // begin inline asm 2026-02-21T08:52:59.7214622Z @%p218 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x6; 2026-02-21T08:52:59.7215456Z // end inline asm 2026-02-21T08:52:59.7215795Z // begin inline asm 2026-02-21T08:52:59.7216501Z @%p218 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0; 2026-02-21T08:52:59.7217274Z // end inline asm 2026-02-21T08:52:59.7217665Z // begin inline asm 2026-02-21T08:52:59.7218334Z @%p218 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x3; 2026-02-21T08:52:59.7219102Z // end inline asm 2026-02-21T08:52:59.7219484Z // begin inline asm 2026-02-21T08:52:59.7220098Z @%p218 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd326 + 0 ], 0x0; 2026-02-21T08:52:59.7220930Z // end inline asm 2026-02-21T08:52:59.7221254Z // begin inline asm 2026-02-21T08:52:59.7222202Z @%p147 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd87 + 0 ], [ %rd326 + 0 ], 0x80; 2026-02-21T08:52:59.7223195Z // end inline asm 2026-02-21T08:52:59.7223560Z // begin inline asm 2026-02-21T08:52:59.7224121Z @%p147 fence.proxy.tensormap::generic.acquire.gpu [ %rd87 + 0 ], 0x80; 2026-02-21T08:52:59.7224881Z @%p147 cp.async.bulk.commit_group ; 2026-02-21T08:52:59.7225412Z @%p147 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:52:59.7225877Z // end inline asm 2026-02-21T08:52:59.7226252Z bar.sync 0, 128; 2026-02-21T08:52:59.7226872Z .loc 1 32 85 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:32:85 2026-02-21T08:52:59.7227678Z setp.gt.u32 %p185, %r24, 383; 2026-02-21T08:52:59.7228097Z @%p185 bra $L__BB0_16; 2026-02-21T08:52:59.7228556Z // %bb.15: // %.lr.ph 2026-02-21T08:52:59.7229361Z .loc 1 23 73 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:23:73 2026-02-21T08:52:59.7230152Z cvta.global.u64 %rd363, %rd87; 2026-02-21T08:52:59.7230640Z setp.lt.u32 %p226, %r1, 128; 2026-02-21T08:52:59.7231060Z shl.b32 %r1550, %r1, 7; 2026-02-21T08:52:59.7231487Z and.b32 %r1551, %r1550, 16256; 2026-02-21T08:52:59.7231898Z shl.b32 %r1552, %r1, 4; 2026-02-21T08:52:59.7232321Z and.b32 %r1553, %r1552, 112; 2026-02-21T08:52:59.7232719Z or.b32 %r1554, %r1551, %r1553; 2026-02-21T08:52:59.7233277Z xor.b32 %r1555, %r1554, 112; 2026-02-21T08:52:59.7233703Z add.s32 %r1557, %r425, %r1555; 2026-02-21T08:52:59.7234112Z xor.b32 %r1558, %r1554, 96; 2026-02-21T08:52:59.7234552Z add.s32 %r1559, %r425, %r1558; 2026-02-21T08:52:59.7235009Z xor.b32 %r1560, %r1554, 80; 2026-02-21T08:52:59.7235435Z add.s32 %r1561, %r425, %r1560; 2026-02-21T08:52:59.7235831Z xor.b32 %r1562, %r1554, 64; 2026-02-21T08:52:59.7236403Z add.s32 %r1563, %r425, %r1562; 2026-02-21T08:52:59.7236828Z xor.b32 %r1564, %r1554, 48; 2026-02-21T08:52:59.7237274Z add.s32 %r1565, %r425, %r1564; 2026-02-21T08:52:59.7237680Z xor.b32 %r1566, %r1554, 32; 2026-02-21T08:52:59.7238120Z 
add.s32 %r1567, %r425, %r1566; 2026-02-21T08:52:59.7238570Z xor.b32 %r1568, %r1554, 16; 2026-02-21T08:52:59.7238962Z add.s32 %r1569, %r425, %r1568; 2026-02-21T08:52:59.7239391Z add.s32 %r1570, %r425, %r1554; 2026-02-21T08:52:59.7240082Z .loc 1 43 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:43:27 2026-02-21T08:52:59.7240903Z shl.b32 %r1571, %r24, 10; 2026-02-21T08:52:59.7241289Z and.b32 %r1572, %r1571, 1024; 2026-02-21T08:52:59.7242005Z .loc 1 44 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:44:27 2026-02-21T08:52:59.7242810Z shl.b32 %r1573, %r24, 5; 2026-02-21T08:52:59.7243321Z and.b32 %r1547, %r1573, 16320; 2026-02-21T08:52:59.7244100Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7244968Z shfl.sync.idx.b32 %r1574, %r2, 0, 31, -1; 2026-02-21T08:52:59.7245458Z and.b32 %r1575, %r1574, 3; 2026-02-21T08:52:59.7245825Z shl.b32 %r1576, %r1575, 21; 2026-02-21T08:52:59.7246243Z add.s32 %r450, %r1576, %r2348; 2026-02-21T08:52:59.7246646Z mov.pred %p186, -1; 2026-02-21T08:52:59.7247040Z // begin inline asm 2026-02-21T08:52:59.7248028Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 0], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7249085Z // end inline asm 2026-02-21T08:52:59.7249478Z // begin inline asm 2026-02-21T08:52:59.7250460Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 16], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7251592Z // end inline asm 2026-02-21T08:52:59.7251924Z // begin inline asm 2026-02-21T08:52:59.7252945Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 32], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7254176Z // end inline asm 2026-02-21T08:52:59.7254514Z // begin inline asm 
2026-02-21T08:52:59.7255601Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 48], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7256713Z // end inline asm 2026-02-21T08:52:59.7257103Z // begin inline asm 2026-02-21T08:52:59.7258059Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 64], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7259165Z // end inline asm 2026-02-21T08:52:59.7259541Z // begin inline asm 2026-02-21T08:52:59.7260496Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 80], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7261648Z // end inline asm 2026-02-21T08:52:59.7261999Z // begin inline asm 2026-02-21T08:52:59.7263024Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 96], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7264124Z // end inline asm 2026-02-21T08:52:59.7264467Z // begin inline asm 2026-02-21T08:52:59.7265567Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 112], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7266763Z // end inline asm 2026-02-21T08:52:59.7267142Z // begin inline asm 2026-02-21T08:52:59.7268176Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 128], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7269273Z // end inline asm 2026-02-21T08:52:59.7269741Z // begin inline asm 2026-02-21T08:52:59.7270670Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 144], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 
2026-02-21T08:52:59.7271744Z // end inline asm 2026-02-21T08:52:59.7272066Z // begin inline asm 2026-02-21T08:52:59.7273042Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 160], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7274159Z // end inline asm 2026-02-21T08:52:59.7274493Z // begin inline asm 2026-02-21T08:52:59.7275604Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 176], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7276631Z // end inline asm 2026-02-21T08:52:59.7277109Z // begin inline asm 2026-02-21T08:52:59.7278101Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 192], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7279198Z // end inline asm 2026-02-21T08:52:59.7279570Z // begin inline asm 2026-02-21T08:52:59.7280578Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 208], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7281705Z // end inline asm 2026-02-21T08:52:59.7282044Z // begin inline asm 2026-02-21T08:52:59.7283016Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 224], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7284101Z // end inline asm 2026-02-21T08:52:59.7284474Z // begin inline asm 2026-02-21T08:52:59.7285575Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 240], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7286833Z // end inline asm 2026-02-21T08:52:59.7287211Z // begin inline asm 2026-02-21T08:52:59.7288193Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 256], {%r451, 
%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7289272Z // end inline asm 2026-02-21T08:52:59.7289617Z // begin inline asm 2026-02-21T08:52:59.7290627Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 272], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7291738Z // end inline asm 2026-02-21T08:52:59.7292085Z // begin inline asm 2026-02-21T08:52:59.7293139Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 288], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7294211Z // end inline asm 2026-02-21T08:52:59.7294579Z // begin inline asm 2026-02-21T08:52:59.7295635Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 304], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7296683Z // end inline asm 2026-02-21T08:52:59.7297046Z // begin inline asm 2026-02-21T08:52:59.7297978Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 320], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7299117Z // end inline asm 2026-02-21T08:52:59.7299591Z // begin inline asm 2026-02-21T08:52:59.7300587Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 336], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7301686Z // end inline asm 2026-02-21T08:52:59.7302036Z // begin inline asm 2026-02-21T08:52:59.7303138Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 352], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7304249Z // end inline asm 2026-02-21T08:52:59.7304642Z // begin inline 
asm 2026-02-21T08:52:59.7305693Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 368], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7306811Z // end inline asm 2026-02-21T08:52:59.7307176Z // begin inline asm 2026-02-21T08:52:59.7308143Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 384], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7309257Z // end inline asm 2026-02-21T08:52:59.7309587Z // begin inline asm 2026-02-21T08:52:59.7310754Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 400], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7311928Z // end inline asm 2026-02-21T08:52:59.7312269Z // begin inline asm 2026-02-21T08:52:59.7313240Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 416], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7314331Z // end inline asm 2026-02-21T08:52:59.7314766Z // begin inline asm 2026-02-21T08:52:59.7315724Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 432], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7316899Z // end inline asm 2026-02-21T08:52:59.7317276Z // begin inline asm 2026-02-21T08:52:59.7318232Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 448], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7319277Z // end inline asm 2026-02-21T08:52:59.7319600Z // begin inline asm 2026-02-21T08:52:59.7320578Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 464], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, 
%r451}; 2026-02-21T08:52:59.7321776Z // end inline asm 2026-02-21T08:52:59.7322105Z // begin inline asm 2026-02-21T08:52:59.7323137Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 480], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7324237Z // end inline asm 2026-02-21T08:52:59.7324628Z // begin inline asm 2026-02-21T08:52:59.7325628Z @%p186 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r450 + 496], {%r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451, %r451}; 2026-02-21T08:52:59.7326748Z // end inline asm 2026-02-21T08:52:59.7327121Z // begin inline asm 2026-02-21T08:52:59.7327506Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:52:59.7327951Z // end inline asm 2026-02-21T08:52:59.7328284Z bar.sync 0, 128; 2026-02-21T08:52:59.7329000Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7329796Z add.s32 %r994, %r425, 278528; 2026-02-21T08:52:59.7330233Z // begin inline asm 2026-02-21T08:52:59.7330669Z @%p218 mbarrier.init.shared::cta.b64 [%r994], 1; 2026-02-21T08:52:59.7331185Z // end inline asm 2026-02-21T08:52:59.7331554Z add.s32 %r995, %r425, 278544; 2026-02-21T08:52:59.7331945Z // begin inline asm 2026-02-21T08:52:59.7332402Z @%p218 mbarrier.init.shared::cta.b64 [%r995], 1; 2026-02-21T08:52:59.7333031Z // end inline asm 2026-02-21T08:52:59.7333701Z .loc 1 55 31 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:55:31 2026-02-21T08:52:59.7334457Z bar.sync 0, 128; 2026-02-21T08:52:59.7334922Z // begin inline asm 2026-02-21T08:52:59.7335369Z @%p218 mbarrier.arrive.shared::cta.b64 _, [%r994]; 2026-02-21T08:52:59.7335914Z // end inline asm 2026-02-21T08:52:59.7336722Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7337482Z bar.sync 0, 128; 2026-02-21T08:52:59.7337869Z add.s32 %r997, %r425, 278560; 2026-02-21T08:52:59.7338258Z 
// begin inline asm 2026-02-21T08:52:59.7338716Z @%p218 mbarrier.init.shared::cta.b64 [%r997], 1; 2026-02-21T08:52:59.7339211Z // end inline asm 2026-02-21T08:52:59.7339671Z st.shared.v2.b32 [global_smem+278568], {0, 33685761}; 2026-02-21T08:52:59.7340230Z st.shared.b32 [global_smem+262144], %r2348; 2026-02-21T08:52:59.7340849Z st.shared.v2.b32 [global_smem+262152], {%r1572, %r1547}; 2026-02-21T08:52:59.7341454Z barrier.sync 1; 2026-02-21T08:52:59.7341801Z barrier.sync 1; 2026-02-21T08:52:59.7342171Z barrier.sync 1; 2026-02-21T08:52:59.7342802Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7343677Z bar.sync 0, 128; 2026-02-21T08:52:59.7344017Z // begin inline asm 2026-02-21T08:52:59.7344375Z 2026-02-21T08:52:59.7344649Z { 2026-02-21T08:52:59.7345036Z .reg .pred complete; 2026-02-21T08:52:59.7345374Z waitLoop: 2026-02-21T08:52:59.7345864Z mbarrier.try_wait.parity.shared.b64 complete, [%r997], %r451; 2026-02-21T08:52:59.7346489Z @!complete bra.uni waitLoop; 2026-02-21T08:52:59.7346869Z } 2026-02-21T08:52:59.7347030Z 2026-02-21T08:52:59.7347201Z // end inline asm 2026-02-21T08:52:59.7347853Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7348658Z bar.sync 0, 128; 2026-02-21T08:52:59.7349001Z // begin inline asm 2026-02-21T08:52:59.7349438Z @%p218 mbarrier.inval.shared::cta.b64 [%r997]; 2026-02-21T08:52:59.7349904Z // end inline asm 2026-02-21T08:52:59.7350278Z // begin inline asm 2026-02-21T08:52:59.7350723Z @%p218 mbarrier.inval.shared::cta.b64 [%r995]; 2026-02-21T08:52:59.7351217Z // end inline asm 2026-02-21T08:52:59.7351591Z // begin inline asm 2026-02-21T08:52:59.7351996Z @%p218 mbarrier.inval.shared::cta.b64 [%r994]; 2026-02-21T08:52:59.7352506Z // end inline asm 2026-02-21T08:52:59.7353299Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7354135Z // begin inline asm 2026-02-21T08:52:59.7355306Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1003, %r1004, %r1005, %r1006, %r1007, %r1008, %r1009, %r1010, %r1011, %r1012, %r1013, %r1014, %r1015, %r1016, %r1017, %r1018}, [%r450 + 0]; 2026-02-21T08:52:59.7356396Z // end inline asm 2026-02-21T08:52:59.7356772Z // begin inline asm 2026-02-21T08:52:59.7357820Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1020, %r1021, %r1022, %r1023, %r1024, %r1025, %r1026, %r1027, %r1028, %r1029, %r1030, %r1031, %r1032, %r1033, %r1034, %r1035}, [%r450 + 16]; 2026-02-21T08:52:59.7359012Z // end inline asm 2026-02-21T08:52:59.7359351Z // begin inline asm 2026-02-21T08:52:59.7360466Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1037, %r1038, %r1039, %r1040, %r1041, %r1042, %r1043, %r1044, %r1045, %r1046, %r1047, %r1048, %r1049, %r1050, %r1051, %r1052}, [%r450 + 32]; 2026-02-21T08:52:59.7361640Z // end inline asm 2026-02-21T08:52:59.7361979Z cvt.u64.u32 %rd364, %r1037; 2026-02-21T08:52:59.7362422Z cvt.u64.u32 %rd365, %r1038; 2026-02-21T08:52:59.7362822Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:52:59.7363268Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:52:59.7363676Z cvt.u64.u32 %rd368, %r1039; 2026-02-21T08:52:59.7364106Z cvt.u64.u32 %rd369, %r1040; 2026-02-21T08:52:59.7364490Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:52:59.7364998Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:52:59.7365623Z cvt.u64.u32 %rd372, %r1041; 2026-02-21T08:52:59.7366032Z cvt.u64.u32 %rd373, %r1042; 2026-02-21T08:52:59.7366459Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:52:59.7366859Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:52:59.7367300Z cvt.u64.u32 %rd376, %r1043; 2026-02-21T08:52:59.7367675Z cvt.u64.u32 %rd377, %r1044; 2026-02-21T08:52:59.7368083Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:52:59.7368557Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:52:59.7369001Z cvt.u64.u32 %rd380, %r1045; 2026-02-21T08:52:59.7369415Z cvt.u64.u32 %rd381, %r1046; 2026-02-21T08:52:59.7369790Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:52:59.7370202Z or.b64 %rd383, %rd380, %rd382; 
2026-02-21T08:52:59.7370584Z cvt.u64.u32 %rd384, %r1047; 2026-02-21T08:52:59.7371003Z cvt.u64.u32 %rd385, %r1048; 2026-02-21T08:52:59.7371388Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:52:59.7371828Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:52:59.7372227Z cvt.u64.u32 %rd388, %r1049; 2026-02-21T08:52:59.7372670Z cvt.u64.u32 %rd389, %r1050; 2026-02-21T08:52:59.7373062Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:52:59.7373475Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:52:59.7373897Z cvt.u64.u32 %rd392, %r1051; 2026-02-21T08:52:59.7374285Z cvt.u64.u32 %rd393, %r1052; 2026-02-21T08:52:59.7374892Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:52:59.7375298Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:52:59.7375735Z // begin inline asm 2026-02-21T08:52:59.7376756Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1054, %r1055, %r1056, %r1057, %r1058, %r1059, %r1060, %r1061, %r1062, %r1063, %r1064, %r1065, %r1066, %r1067, %r1068, %r1069}, [%r450 + 48]; 2026-02-21T08:52:59.7377989Z // end inline asm 2026-02-21T08:52:59.7378335Z cvt.u64.u32 %rd396, %r1054; 2026-02-21T08:52:59.7378784Z cvt.u64.u32 %rd397, %r1055; 2026-02-21T08:52:59.7379218Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:52:59.7379612Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:52:59.7380027Z cvt.u64.u32 %rd400, %r1056; 2026-02-21T08:52:59.7380420Z cvt.u64.u32 %rd401, %r1057; 2026-02-21T08:52:59.7380859Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:52:59.7381252Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:52:59.7381699Z cvt.u64.u32 %rd404, %r1058; 2026-02-21T08:52:59.7382083Z cvt.u64.u32 %rd405, %r1059; 2026-02-21T08:52:59.7382506Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:52:59.7382934Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T08:52:59.7383351Z cvt.u64.u32 %rd408, %r1060; 2026-02-21T08:52:59.7383932Z cvt.u64.u32 %rd409, %r1061; 2026-02-21T08:52:59.7384321Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:52:59.7384825Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:52:59.7385241Z cvt.u64.u32 %rd412, %r1062; 
2026-02-21T08:52:59.7385662Z cvt.u64.u32 %rd413, %r1063; 2026-02-21T08:52:59.7386033Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:52:59.7386462Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:52:59.7386862Z cvt.u64.u32 %rd416, %r1064; 2026-02-21T08:52:59.7387296Z cvt.u64.u32 %rd417, %r1065; 2026-02-21T08:52:59.7387728Z shl.b64 %rd418, %rd417, 32; 2026-02-21T08:52:59.7388116Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:52:59.7388552Z cvt.u64.u32 %rd420, %r1066; 2026-02-21T08:52:59.7388944Z cvt.u64.u32 %rd421, %r1067; 2026-02-21T08:52:59.7389381Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:52:59.7389796Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:52:59.7390246Z cvt.u64.u32 %rd424, %r1068; 2026-02-21T08:52:59.7390639Z cvt.u64.u32 %rd425, %r1069; 2026-02-21T08:52:59.7391071Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:52:59.7391456Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:52:59.7391883Z // begin inline asm 2026-02-21T08:52:59.7392895Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1071, %r1072, %r1073, %r1074, %r1075, %r1076, %r1077, %r1078, %r1079, %r1080, %r1081, %r1082, %r1083, %r1084, %r1085, %r1086}, [%r450 + 64]; 2026-02-21T08:52:59.7394005Z // end inline asm 2026-02-21T08:52:59.7394371Z // begin inline asm 2026-02-21T08:52:59.7395437Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1088, %r1089, %r1090, %r1091, %r1092, %r1093, %r1094, %r1095, %r1096, %r1097, %r1098, %r1099, %r1100, %r1101, %r1102, %r1103}, [%r450 + 80]; 2026-02-21T08:52:59.7396726Z // end inline asm 2026-02-21T08:52:59.7397105Z // begin inline asm 2026-02-21T08:52:59.7398130Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1105, %r1106, %r1107, %r1108, %r1109, %r1110, %r1111, %r1112, %r1113, %r1114, %r1115, %r1116, %r1117, %r1118, %r1119, %r1120}, [%r450 + 96]; 2026-02-21T08:52:59.7399397Z // end inline asm 2026-02-21T08:52:59.7399750Z cvt.u64.u32 %rd428, %r1105; 2026-02-21T08:52:59.7400176Z cvt.u64.u32 %rd429, %r1106; 2026-02-21T08:52:59.7400558Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:52:59.7400984Z or.b64 
%rd431, %rd428, %rd430; 2026-02-21T08:52:59.7401382Z cvt.u64.u32 %rd432, %r1107; 2026-02-21T08:52:59.7401820Z cvt.u64.u32 %rd433, %r1108; 2026-02-21T08:52:59.7402264Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:52:59.7402658Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T08:52:59.7403125Z cvt.u64.u32 %rd436, %r1109; 2026-02-21T08:52:59.7403527Z cvt.u64.u32 %rd437, %r1110; 2026-02-21T08:52:59.7403952Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:52:59.7404328Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:52:59.7404824Z cvt.u64.u32 %rd440, %r1111; 2026-02-21T08:52:59.7405220Z cvt.u64.u32 %rd441, %r1112; 2026-02-21T08:52:59.7405772Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:52:59.7406181Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:52:59.7406619Z cvt.u64.u32 %rd444, %r1113; 2026-02-21T08:52:59.7407041Z cvt.u64.u32 %rd445, %r1114; 2026-02-21T08:52:59.7407435Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:52:59.7407884Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:52:59.7408311Z cvt.u64.u32 %rd448, %r1115; 2026-02-21T08:52:59.7408752Z cvt.u64.u32 %rd449, %r1116; 2026-02-21T08:52:59.7409146Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:52:59.7409594Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:52:59.7409990Z cvt.u64.u32 %rd452, %r1117; 2026-02-21T08:52:59.7410409Z cvt.u64.u32 %rd453, %r1118; 2026-02-21T08:52:59.7410841Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:52:59.7411241Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:52:59.7411693Z cvt.u64.u32 %rd456, %r1119; 2026-02-21T08:52:59.7412083Z cvt.u64.u32 %rd457, %r1120; 2026-02-21T08:52:59.7412511Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:52:59.7412908Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:52:59.7413346Z // begin inline asm 2026-02-21T08:52:59.7414460Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1122, %r1123, %r1124, %r1125, %r1126, %r1127, %r1128, %r1129, %r1130, %r1131, %r1132, %r1133, %r1134, %r1135, %r1136, %r1137}, [%r450 + 112]; 2026-02-21T08:52:59.7415858Z // end inline asm 2026-02-21T08:52:59.7416227Z cvt.u64.u32 %rd460, 
%r1122; 2026-02-21T08:52:59.7416593Z cvt.u64.u32 %rd461, %r1123; 2026-02-21T08:52:59.7417013Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:52:59.7417397Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T08:52:59.7417854Z cvt.u64.u32 %rd464, %r1124; 2026-02-21T08:52:59.7418232Z cvt.u64.u32 %rd465, %r1125; 2026-02-21T08:52:59.7418655Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:52:59.7419028Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:52:59.7419451Z cvt.u64.u32 %rd468, %r1126; 2026-02-21T08:52:59.7419844Z cvt.u64.u32 %rd469, %r1127; 2026-02-21T08:52:59.7420288Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:52:59.7420714Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:52:59.7421129Z cvt.u64.u32 %rd472, %r1128; 2026-02-21T08:52:59.7421573Z cvt.u64.u32 %rd473, %r1129; 2026-02-21T08:52:59.7421966Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:52:59.7422380Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:52:59.7422772Z cvt.u64.u32 %rd476, %r1130; 2026-02-21T08:52:59.7423200Z cvt.u64.u32 %rd477, %r1131; 2026-02-21T08:52:59.7423590Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:52:59.7424016Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:52:59.7424462Z cvt.u64.u32 %rd480, %r1132; 2026-02-21T08:52:59.7424935Z cvt.u64.u32 %rd481, %r1133; 2026-02-21T08:52:59.7425358Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:52:59.7425897Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:52:59.7426355Z cvt.u64.u32 %rd484, %r1134; 2026-02-21T08:52:59.7426759Z cvt.u64.u32 %rd485, %r1135; 2026-02-21T08:52:59.7427188Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:52:59.7427585Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:52:59.7428035Z cvt.u64.u32 %rd488, %r1136; 2026-02-21T08:52:59.7428426Z cvt.u64.u32 %rd489, %r1137; 2026-02-21T08:52:59.7428940Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:52:59.7429386Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:52:59.7429786Z // begin inline asm 2026-02-21T08:52:59.7430878Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1139, %r1140, %r1141, %r1142, %r1143, %r1144, %r1145, %r1146, %r1147, %r1148, 
%r1149, %r1150, %r1151, %r1152, %r1153, %r1154}, [%r450 + 128]; 2026-02-21T08:52:59.7432033Z // end inline asm 2026-02-21T08:52:59.7432436Z // begin inline asm 2026-02-21T08:52:59.7433502Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1156, %r1157, %r1158, %r1159, %r1160, %r1161, %r1162, %r1163, %r1164, %r1165, %r1166, %r1167, %r1168, %r1169, %r1170, %r1171}, [%r450 + 144]; 2026-02-21T08:52:59.7434723Z // end inline asm 2026-02-21T08:52:59.7435110Z // begin inline asm 2026-02-21T08:52:59.7436273Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1173, %r1174, %r1175, %r1176, %r1177, %r1178, %r1179, %r1180, %r1181, %r1182, %r1183, %r1184, %r1185, %r1186, %r1187, %r1188}, [%r450 + 160]; 2026-02-21T08:52:59.7437461Z // end inline asm 2026-02-21T08:52:59.7437816Z cvt.u64.u32 %rd492, %r1173; 2026-02-21T08:52:59.7438270Z cvt.u64.u32 %rd493, %r1174; 2026-02-21T08:52:59.7438682Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:52:59.7439116Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:52:59.7439551Z cvt.u64.u32 %rd496, %r1175; 2026-02-21T08:52:59.7439945Z cvt.u64.u32 %rd497, %r1176; 2026-02-21T08:52:59.7440357Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:52:59.7440722Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:52:59.7441148Z cvt.u64.u32 %rd500, %r1177; 2026-02-21T08:52:59.7441540Z cvt.u64.u32 %rd501, %r1178; 2026-02-21T08:52:59.7441962Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:52:59.7442346Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:52:59.7442777Z cvt.u64.u32 %rd504, %r1179; 2026-02-21T08:52:59.7443150Z cvt.u64.u32 %rd505, %r1180; 2026-02-21T08:52:59.7443558Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:52:59.7443981Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:52:59.7444391Z cvt.u64.u32 %rd508, %r1181; 2026-02-21T08:52:59.7444894Z cvt.u64.u32 %rd509, %r1182; 2026-02-21T08:52:59.7445387Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:52:59.7445831Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:52:59.7446241Z cvt.u64.u32 %rd512, %r1183; 2026-02-21T08:52:59.7446654Z cvt.u64.u32 %rd513, %r1184; 
2026-02-21T08:52:59.7447027Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:52:59.7447446Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:52:59.7447843Z cvt.u64.u32 %rd516, %r1185; 2026-02-21T08:52:59.7448274Z cvt.u64.u32 %rd517, %r1186; 2026-02-21T08:52:59.7448715Z shl.b64 %rd518, %rd517, 32; 2026-02-21T08:52:59.7449113Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:52:59.7449555Z cvt.u64.u32 %rd520, %r1187; 2026-02-21T08:52:59.7449942Z cvt.u64.u32 %rd521, %r1188; 2026-02-21T08:52:59.7450377Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:52:59.7450778Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:52:59.7451232Z // begin inline asm 2026-02-21T08:52:59.7452309Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1190, %r1191, %r1192, %r1193, %r1194, %r1195, %r1196, %r1197, %r1198, %r1199, %r1200, %r1201, %r1202, %r1203, %r1204, %r1205}, [%r450 + 176]; 2026-02-21T08:52:59.7453457Z // end inline asm 2026-02-21T08:52:59.7453830Z cvt.u64.u32 %rd524, %r1190; 2026-02-21T08:52:59.7454228Z cvt.u64.u32 %rd525, %r1191; 2026-02-21T08:52:59.7454664Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:52:59.7455144Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:52:59.7455593Z cvt.u64.u32 %rd528, %r1192; 2026-02-21T08:52:59.7455987Z cvt.u64.u32 %rd529, %r1193; 2026-02-21T08:52:59.7456550Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:52:59.7456953Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:52:59.7457393Z cvt.u64.u32 %rd532, %r1194; 2026-02-21T08:52:59.7457843Z cvt.u64.u32 %rd533, %r1195; 2026-02-21T08:52:59.7458244Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:52:59.7458671Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:52:59.7459071Z cvt.u64.u32 %rd536, %r1196; 2026-02-21T08:52:59.7459498Z cvt.u64.u32 %rd537, %r1197; 2026-02-21T08:52:59.7459994Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:52:59.7460447Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:52:59.7460850Z cvt.u64.u32 %rd540, %r1198; 2026-02-21T08:52:59.7461278Z cvt.u64.u32 %rd541, %r1199; 2026-02-21T08:52:59.7461662Z shl.b64 %rd542, %rd541, 32; 
2026-02-21T08:52:59.7462097Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:52:59.7462559Z cvt.u64.u32 %rd544, %r1200; 2026-02-21T08:52:59.7462965Z cvt.u64.u32 %rd545, %r1201; 2026-02-21T08:52:59.7463394Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:52:59.7463780Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:52:59.7464225Z cvt.u64.u32 %rd548, %r1202; 2026-02-21T08:52:59.7464607Z cvt.u64.u32 %rd549, %r1203; 2026-02-21T08:52:59.7465074Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:52:59.7465451Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:52:59.7465887Z cvt.u64.u32 %rd552, %r1204; 2026-02-21T08:52:59.7466418Z cvt.u64.u32 %rd553, %r1205; 2026-02-21T08:52:59.7466805Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:52:59.7467222Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:52:59.7467615Z // begin inline asm 2026-02-21T08:52:59.7468682Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215, %r1216, %r1217, %r1218, %r1219, %r1220, %r1221, %r1222}, [%r450 + 192]; 2026-02-21T08:52:59.7469829Z // end inline asm 2026-02-21T08:52:59.7470210Z // begin inline asm 2026-02-21T08:52:59.7471203Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1224, %r1225, %r1226, %r1227, %r1228, %r1229, %r1230, %r1231, %r1232, %r1233, %r1234, %r1235, %r1236, %r1237, %r1238, %r1239}, [%r450 + 208]; 2026-02-21T08:52:59.7472406Z // end inline asm 2026-02-21T08:52:59.7472777Z // begin inline asm 2026-02-21T08:52:59.7473807Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249, %r1250, %r1251, %r1252, %r1253, %r1254, %r1255, %r1256}, [%r450 + 224]; 2026-02-21T08:52:59.7475109Z // end inline asm 2026-02-21T08:52:59.7475453Z cvt.u64.u32 %rd556, %r1241; 2026-02-21T08:52:59.7475894Z cvt.u64.u32 %rd557, %r1242; 2026-02-21T08:52:59.7476423Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:52:59.7476840Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:52:59.7477264Z cvt.u64.u32 %rd560, %r1243; 2026-02-21T08:52:59.7477657Z 
cvt.u64.u32 %rd561, %r1244; 2026-02-21T08:52:59.7478101Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:52:59.7478500Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:52:59.7478941Z cvt.u64.u32 %rd564, %r1245; 2026-02-21T08:52:59.7479332Z cvt.u64.u32 %rd565, %r1246; 2026-02-21T08:52:59.7479767Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:52:59.7480163Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:52:59.7480629Z cvt.u64.u32 %rd568, %r1247; 2026-02-21T08:52:59.7481083Z cvt.u64.u32 %rd569, %r1248; 2026-02-21T08:52:59.7481491Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:52:59.7481932Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:52:59.7482341Z cvt.u64.u32 %rd572, %r1249; 2026-02-21T08:52:59.7482745Z cvt.u64.u32 %rd573, %r1250; 2026-02-21T08:52:59.7483129Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:52:59.7483563Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:52:59.7483969Z cvt.u64.u32 %rd576, %r1251; 2026-02-21T08:52:59.7484405Z cvt.u64.u32 %rd577, %r1252; 2026-02-21T08:52:59.7484834Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:52:59.7485257Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:52:59.7485690Z cvt.u64.u32 %rd580, %r1253; 2026-02-21T08:52:59.7486082Z cvt.u64.u32 %rd581, %r1254; 2026-02-21T08:52:59.7486536Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:52:59.7487069Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:52:59.7487507Z cvt.u64.u32 %rd584, %r1255; 2026-02-21T08:52:59.7487902Z cvt.u64.u32 %rd585, %r1256; 2026-02-21T08:52:59.7488325Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:52:59.7488699Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:52:59.7489113Z // begin inline asm 2026-02-21T08:52:59.7490275Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266, %r1267, %r1268, %r1269, %r1270, %r1271, %r1272, %r1273}, [%r450 + 240]; 2026-02-21T08:52:59.7491396Z // end inline asm 2026-02-21T08:52:59.7491768Z cvt.u64.u32 %rd588, %r1258; 2026-02-21T08:52:59.7492160Z cvt.u64.u32 %rd589, %r1259; 2026-02-21T08:52:59.7492592Z shl.b64 %rd590, 
%rd589, 32; 2026-02-21T08:52:59.7492990Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:52:59.7493432Z cvt.u64.u32 %rd592, %r1260; 2026-02-21T08:52:59.7493830Z cvt.u64.u32 %rd593, %r1261; 2026-02-21T08:52:59.7494250Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:52:59.7495089Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:52:59.7495498Z cvt.u64.u32 %rd596, %r1262; 2026-02-21T08:52:59.7495925Z cvt.u64.u32 %rd597, %r1263; 2026-02-21T08:52:59.7496326Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:52:59.7496755Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:52:59.7497155Z cvt.u64.u32 %rd600, %r1264; 2026-02-21T08:52:59.7497681Z cvt.u64.u32 %rd601, %r1265; 2026-02-21T08:52:59.7498089Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:52:59.7498536Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:52:59.7498959Z cvt.u64.u32 %rd604, %r1266; 2026-02-21T08:52:59.7499384Z cvt.u64.u32 %rd605, %r1267; 2026-02-21T08:52:59.7499811Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:52:59.7500204Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:52:59.7500626Z cvt.u64.u32 %rd608, %r1268; 2026-02-21T08:52:59.7501007Z cvt.u64.u32 %rd609, %r1269; 2026-02-21T08:52:59.7501435Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:52:59.7501827Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:52:59.7502276Z cvt.u64.u32 %rd612, %r1270; 2026-02-21T08:52:59.7502660Z cvt.u64.u32 %rd613, %r1271; 2026-02-21T08:52:59.7503076Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:52:59.7503461Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T08:52:59.7503903Z cvt.u64.u32 %rd616, %r1272; 2026-02-21T08:52:59.7504344Z cvt.u64.u32 %rd617, %r1273; 2026-02-21T08:52:59.7504811Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:52:59.7505252Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:52:59.7505664Z // begin inline asm 2026-02-21T08:52:59.7506791Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283, %r1284, %r1285, %r1286, %r1287, %r1288, %r1289, %r1290}, [%r450 + 256]; 2026-02-21T08:52:59.7507926Z // end inline asm 
2026-02-21T08:52:59.7508307Z // begin inline asm 2026-02-21T08:52:59.7509370Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300, %r1301, %r1302, %r1303, %r1304, %r1305, %r1306, %r1307}, [%r450 + 272]; 2026-02-21T08:52:59.7510531Z // end inline asm 2026-02-21T08:52:59.7510928Z // begin inline asm 2026-02-21T08:52:59.7511968Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317, %r1318, %r1319, %r1320, %r1321, %r1322, %r1323, %r1324}, [%r450 + 288]; 2026-02-21T08:52:59.7513076Z // end inline asm 2026-02-21T08:52:59.7513406Z cvt.u64.u32 %rd620, %r1309; 2026-02-21T08:52:59.7513840Z cvt.u64.u32 %rd621, %r1310; 2026-02-21T08:52:59.7514270Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:52:59.7514655Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:52:59.7515167Z cvt.u64.u32 %rd624, %r1311; 2026-02-21T08:52:59.7515558Z cvt.u64.u32 %rd625, %r1312; 2026-02-21T08:52:59.7515978Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:52:59.7516370Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:52:59.7516829Z cvt.u64.u32 %rd628, %r1313; 2026-02-21T08:52:59.7517215Z cvt.u64.u32 %rd629, %r1314; 2026-02-21T08:52:59.7517652Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:52:59.7518175Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:52:59.7518607Z cvt.u64.u32 %rd632, %r1315; 2026-02-21T08:52:59.7519014Z cvt.u64.u32 %rd633, %r1316; 2026-02-21T08:52:59.7519399Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:52:59.7519828Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:52:59.7520233Z cvt.u64.u32 %rd636, %r1317; 2026-02-21T08:52:59.7520674Z cvt.u64.u32 %rd637, %r1318; 2026-02-21T08:52:59.7521160Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:52:59.7521603Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:52:59.7522004Z cvt.u64.u32 %rd640, %r1319; 2026-02-21T08:52:59.7522436Z cvt.u64.u32 %rd641, %r1320; 2026-02-21T08:52:59.7522878Z shl.b64 %rd642, %rd641, 32; 2026-02-21T08:52:59.7523282Z or.b64 %rd643, %rd640, 
%rd642; 2026-02-21T08:52:59.7523738Z cvt.u64.u32 %rd644, %r1321; 2026-02-21T08:52:59.7524143Z cvt.u64.u32 %rd645, %r1322; 2026-02-21T08:52:59.7524580Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:52:59.7525019Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T08:52:59.7525450Z cvt.u64.u32 %rd648, %r1323; 2026-02-21T08:52:59.7525841Z cvt.u64.u32 %rd649, %r1324; 2026-02-21T08:52:59.7526296Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:52:59.7526687Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:52:59.7527122Z // begin inline asm 2026-02-21T08:52:59.7528323Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1326, %r1327, %r1328, %r1329, %r1330, %r1331, %r1332, %r1333, %r1334, %r1335, %r1336, %r1337, %r1338, %r1339, %r1340, %r1341}, [%r450 + 304]; 2026-02-21T08:52:59.7529554Z // end inline asm 2026-02-21T08:52:59.7529952Z cvt.u64.u32 %rd652, %r1326; 2026-02-21T08:52:59.7530353Z cvt.u64.u32 %rd653, %r1327; 2026-02-21T08:52:59.7530777Z shl.b64 %rd654, %rd653, 32; 2026-02-21T08:52:59.7531157Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T08:52:59.7531590Z cvt.u64.u32 %rd656, %r1328; 2026-02-21T08:52:59.7532020Z cvt.u64.u32 %rd657, %r1329; 2026-02-21T08:52:59.7532426Z shl.b64 %rd658, %rd657, 32; 2026-02-21T08:52:59.7532863Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T08:52:59.7533270Z cvt.u64.u32 %rd660, %r1330; 2026-02-21T08:52:59.7533705Z cvt.u64.u32 %rd661, %r1331; 2026-02-21T08:52:59.7534094Z shl.b64 %rd662, %rd661, 32; 2026-02-21T08:52:59.7534532Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T08:52:59.7535035Z cvt.u64.u32 %rd664, %r1332; 2026-02-21T08:52:59.7535472Z cvt.u64.u32 %rd665, %r1333; 2026-02-21T08:52:59.7535862Z shl.b64 %rd666, %rd665, 32; 2026-02-21T08:52:59.7536292Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T08:52:59.7536714Z cvt.u64.u32 %rd668, %r1334; 2026-02-21T08:52:59.7537205Z cvt.u64.u32 %rd669, %r1335; 2026-02-21T08:52:59.7537617Z shl.b64 %rd670, %rd669, 32; 2026-02-21T08:52:59.7538006Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T08:52:59.7538449Z cvt.u64.u32 %rd672, %r1336; 
2026-02-21T08:52:59.7538833Z cvt.u64.u32 %rd673, %r1337; 2026-02-21T08:52:59.7539252Z shl.b64 %rd674, %rd673, 32; 2026-02-21T08:52:59.7539635Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T08:52:59.7540068Z cvt.u64.u32 %rd676, %r1338; 2026-02-21T08:52:59.7540463Z cvt.u64.u32 %rd677, %r1339; 2026-02-21T08:52:59.7540913Z shl.b64 %rd678, %rd677, 32; 2026-02-21T08:52:59.7541339Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T08:52:59.7541761Z cvt.u64.u32 %rd680, %r1340; 2026-02-21T08:52:59.7542203Z cvt.u64.u32 %rd681, %r1341; 2026-02-21T08:52:59.7542587Z shl.b64 %rd682, %rd681, 32; 2026-02-21T08:52:59.7543009Z or.b64 %rd683, %rd680, %rd682; 2026-02-21T08:52:59.7543406Z // begin inline asm 2026-02-21T08:52:59.7544499Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1343, %r1344, %r1345, %r1346, %r1347, %r1348, %r1349, %r1350, %r1351, %r1352, %r1353, %r1354, %r1355, %r1356, %r1357, %r1358}, [%r450 + 320]; 2026-02-21T08:52:59.7545764Z // end inline asm 2026-02-21T08:52:59.7546106Z // begin inline asm 2026-02-21T08:52:59.7547239Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1360, %r1361, %r1362, %r1363, %r1364, %r1365, %r1366, %r1367, %r1368, %r1369, %r1370, %r1371, %r1372, %r1373, %r1374, %r1375}, [%r450 + 336]; 2026-02-21T08:52:59.7548412Z // end inline asm 2026-02-21T08:52:59.7548905Z cvt.u64.u32 %rd684, %r1368; 2026-02-21T08:52:59.7549285Z cvt.u64.u32 %rd685, %r1369; 2026-02-21T08:52:59.7549717Z shl.b64 %rd686, %rd685, 32; 2026-02-21T08:52:59.7550118Z or.b64 %rd687, %rd684, %rd686; 2026-02-21T08:52:59.7550561Z // begin inline asm 2026-02-21T08:52:59.7551732Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1377, %r1378, %r1379, %r1380, %r1381, %r1382, %r1383, %r1384, %r1385, %r1386, %r1387, %r1388, %r1389, %r1390, %r1391, %r1392}, [%r450 + 352]; 2026-02-21T08:52:59.7552955Z // end inline asm 2026-02-21T08:52:59.7553351Z cvt.u64.u32 %rd688, %r1377; 2026-02-21T08:52:59.7553756Z cvt.u64.u32 %rd689, %r1378; 2026-02-21T08:52:59.7554193Z shl.b64 %rd690, %rd689, 32; 2026-02-21T08:52:59.7554571Z 
or.b64 %rd691, %rd688, %rd690; 2026-02-21T08:52:59.7555080Z cvt.u64.u32 %rd692, %r1379; 2026-02-21T08:52:59.7555477Z cvt.u64.u32 %rd693, %r1380; 2026-02-21T08:52:59.7555924Z shl.b64 %rd694, %rd693, 32; 2026-02-21T08:52:59.7556372Z or.b64 %rd695, %rd692, %rd694; 2026-02-21T08:52:59.7556781Z cvt.u64.u32 %rd696, %r1381; 2026-02-21T08:52:59.7557212Z cvt.u64.u32 %rd697, %r1382; 2026-02-21T08:52:59.7557596Z shl.b64 %rd698, %rd697, 32; 2026-02-21T08:52:59.7558024Z or.b64 %rd699, %rd696, %rd698; 2026-02-21T08:52:59.7558441Z cvt.u64.u32 %rd700, %r1383; 2026-02-21T08:52:59.7558998Z cvt.u64.u32 %rd701, %r1384; 2026-02-21T08:52:59.7559396Z shl.b64 %rd702, %rd701, 32; 2026-02-21T08:52:59.7559828Z or.b64 %rd703, %rd700, %rd702; 2026-02-21T08:52:59.7560262Z cvt.u64.u32 %rd704, %r1385; 2026-02-21T08:52:59.7560630Z cvt.u64.u32 %rd705, %r1386; 2026-02-21T08:52:59.7561031Z shl.b64 %rd706, %rd705, 32; 2026-02-21T08:52:59.7561405Z or.b64 %rd707, %rd704, %rd706; 2026-02-21T08:52:59.7561844Z cvt.u64.u32 %rd708, %r1387; 2026-02-21T08:52:59.7562225Z cvt.u64.u32 %rd709, %r1388; 2026-02-21T08:52:59.7562641Z shl.b64 %rd710, %rd709, 32; 2026-02-21T08:52:59.7563021Z or.b64 %rd711, %rd708, %rd710; 2026-02-21T08:52:59.7563456Z cvt.u64.u32 %rd712, %r1389; 2026-02-21T08:52:59.7563848Z cvt.u64.u32 %rd713, %r1390; 2026-02-21T08:52:59.7564273Z shl.b64 %rd714, %rd713, 32; 2026-02-21T08:52:59.7564783Z or.b64 %rd715, %rd712, %rd714; 2026-02-21T08:52:59.7565190Z cvt.u64.u32 %rd716, %r1391; 2026-02-21T08:52:59.7565626Z cvt.u64.u32 %rd717, %r1392; 2026-02-21T08:52:59.7566023Z shl.b64 %rd718, %rd717, 32; 2026-02-21T08:52:59.7566455Z or.b64 %rd719, %rd716, %rd718; 2026-02-21T08:52:59.7566832Z // begin inline asm 2026-02-21T08:52:59.7567915Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1394, %r1395, %r1396, %r1397, %r1398, %r1399, %r1400, %r1401, %r1402, %r1403, %r1404, %r1405, %r1406, %r1407, %r1408, %r1409}, [%r450 + 368]; 2026-02-21T08:52:59.7569172Z // end inline asm 2026-02-21T08:52:59.7569549Z cvt.u64.u32 
%rd720, %r1394; 2026-02-21T08:52:59.7569977Z cvt.u64.u32 %rd721, %r1395; 2026-02-21T08:52:59.7570383Z shl.b64 %rd722, %rd721, 32; 2026-02-21T08:52:59.7570831Z or.b64 %rd723, %rd720, %rd722; 2026-02-21T08:52:59.7571235Z cvt.u64.u32 %rd724, %r1396; 2026-02-21T08:52:59.7571691Z cvt.u64.u32 %rd725, %r1397; 2026-02-21T08:52:59.7572092Z shl.b64 %rd726, %rd725, 32; 2026-02-21T08:52:59.7572513Z or.b64 %rd727, %rd724, %rd726; 2026-02-21T08:52:59.7572896Z cvt.u64.u32 %rd728, %r1398; 2026-02-21T08:52:59.7573322Z cvt.u64.u32 %rd729, %r1399; 2026-02-21T08:52:59.7573757Z shl.b64 %rd730, %rd729, 32; 2026-02-21T08:52:59.7574157Z or.b64 %rd731, %rd728, %rd730; 2026-02-21T08:52:59.7574590Z cvt.u64.u32 %rd732, %r1400; 2026-02-21T08:52:59.7575060Z cvt.u64.u32 %rd733, %r1401; 2026-02-21T08:52:59.7575486Z shl.b64 %rd734, %rd733, 32; 2026-02-21T08:52:59.7575901Z or.b64 %rd735, %rd732, %rd734; 2026-02-21T08:52:59.7576360Z cvt.u64.u32 %rd736, %r1402; 2026-02-21T08:52:59.7576765Z cvt.u64.u32 %rd737, %r1403; 2026-02-21T08:52:59.7577213Z shl.b64 %rd738, %rd737, 32; 2026-02-21T08:52:59.7577619Z or.b64 %rd739, %rd736, %rd738; 2026-02-21T08:52:59.7578061Z cvt.u64.u32 %rd740, %r1404; 2026-02-21T08:52:59.7578517Z cvt.u64.u32 %rd741, %r1405; 2026-02-21T08:52:59.7579031Z shl.b64 %rd742, %rd741, 32; 2026-02-21T08:52:59.7579462Z or.b64 %rd743, %rd740, %rd742; 2026-02-21T08:52:59.7579880Z cvt.u64.u32 %rd744, %r1406; 2026-02-21T08:52:59.7580306Z cvt.u64.u32 %rd745, %r1407; 2026-02-21T08:52:59.7580689Z shl.b64 %rd746, %rd745, 32; 2026-02-21T08:52:59.7581110Z or.b64 %rd747, %rd744, %rd746; 2026-02-21T08:52:59.7581514Z cvt.u64.u32 %rd748, %r1408; 2026-02-21T08:52:59.7582047Z cvt.u64.u32 %rd749, %r1409; 2026-02-21T08:52:59.7582472Z shl.b64 %rd750, %rd749, 32; 2026-02-21T08:52:59.7582915Z or.b64 %rd751, %rd748, %rd750; 2026-02-21T08:52:59.7583351Z // begin inline asm 2026-02-21T08:52:59.7584355Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1411, %r1412, %r1413, %r1414, %r1415, %r1416, %r1417, %r1418, %r1419, 
%r1420, %r1421, %r1422, %r1423, %r1424, %r1425, %r1426}, [%r450 + 384]; 2026-02-21T08:52:59.7585548Z // end inline asm 2026-02-21T08:52:59.7585878Z // begin inline asm 2026-02-21T08:52:59.7586925Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1428, %r1429, %r1430, %r1431, %r1432, %r1433, %r1434, %r1435, %r1436, %r1437, %r1438, %r1439, %r1440, %r1441, %r1442, %r1443}, [%r450 + 400]; 2026-02-21T08:52:59.7588082Z // end inline asm 2026-02-21T08:52:59.7588439Z cvt.u64.u32 %rd752, %r1436; 2026-02-21T08:52:59.7588875Z cvt.u64.u32 %rd753, %r1437; 2026-02-21T08:52:59.7589409Z shl.b64 %rd754, %rd753, 32; 2026-02-21T08:52:59.7589862Z or.b64 %rd755, %rd752, %rd754; 2026-02-21T08:52:59.7590259Z cvt.u64.u32 %rd756, %r1438; 2026-02-21T08:52:59.7590671Z cvt.u64.u32 %rd757, %r1439; 2026-02-21T08:52:59.7591051Z shl.b64 %rd758, %rd757, 32; 2026-02-21T08:52:59.7591487Z or.b64 %rd759, %rd756, %rd758; 2026-02-21T08:52:59.7591896Z cvt.u64.u32 %rd760, %r1440; 2026-02-21T08:52:59.7592321Z cvt.u64.u32 %rd761, %r1441; 2026-02-21T08:52:59.7592740Z shl.b64 %rd762, %rd761, 32; 2026-02-21T08:52:59.7593131Z or.b64 %rd763, %rd760, %rd762; 2026-02-21T08:52:59.7593569Z cvt.u64.u32 %rd764, %r1442; 2026-02-21T08:52:59.7593961Z cvt.u64.u32 %rd765, %r1443; 2026-02-21T08:52:59.7594413Z shl.b64 %rd766, %rd765, 32; 2026-02-21T08:52:59.7594879Z or.b64 %rd767, %rd764, %rd766; 2026-02-21T08:52:59.7595319Z // begin inline asm 2026-02-21T08:52:59.7596359Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1445, %r1446, %r1447, %r1448, %r1449, %r1450, %r1451, %r1452, %r1453, %r1454, %r1455, %r1456, %r1457, %r1458, %r1459, %r1460}, [%r450 + 416]; 2026-02-21T08:52:59.7597501Z // end inline asm 2026-02-21T08:52:59.7597885Z cvt.u64.u32 %rd768, %r1445; 2026-02-21T08:52:59.7598391Z cvt.u64.u32 %rd769, %r1446; 2026-02-21T08:52:59.7598809Z shl.b64 %rd770, %rd769, 32; 2026-02-21T08:52:59.7599195Z or.b64 %rd771, %rd768, %rd770; 2026-02-21T08:52:59.7599634Z cvt.u64.u32 %rd772, %r1447; 2026-02-21T08:52:59.7600030Z cvt.u64.u32 %rd773, 
%r1448; 2026-02-21T08:52:59.7600469Z shl.b64 %rd774, %rd773, 32; 2026-02-21T08:52:59.7600862Z or.b64 %rd775, %rd772, %rd774; 2026-02-21T08:52:59.7601320Z cvt.u64.u32 %rd776, %r1449; 2026-02-21T08:52:59.7601747Z cvt.u64.u32 %rd777, %r1450; 2026-02-21T08:52:59.7602139Z shl.b64 %rd778, %rd777, 32; 2026-02-21T08:52:59.7602554Z or.b64 %rd779, %rd776, %rd778; 2026-02-21T08:52:59.7602956Z cvt.u64.u32 %rd780, %r1451; 2026-02-21T08:52:59.7603393Z cvt.u64.u32 %rd781, %r1452; 2026-02-21T08:52:59.7603792Z shl.b64 %rd782, %rd781, 32; 2026-02-21T08:52:59.7604225Z or.b64 %rd783, %rd780, %rd782; 2026-02-21T08:52:59.7604624Z cvt.u64.u32 %rd784, %r1453; 2026-02-21T08:52:59.7605118Z cvt.u64.u32 %rd785, %r1454; 2026-02-21T08:52:59.7605515Z shl.b64 %rd786, %rd785, 32; 2026-02-21T08:52:59.7605950Z or.b64 %rd787, %rd784, %rd786; 2026-02-21T08:52:59.7606414Z cvt.u64.u32 %rd788, %r1455; 2026-02-21T08:52:59.7606810Z cvt.u64.u32 %rd789, %r1456; 2026-02-21T08:52:59.7607231Z shl.b64 %rd790, %rd789, 32; 2026-02-21T08:52:59.7607619Z or.b64 %rd791, %rd788, %rd790; 2026-02-21T08:52:59.7608047Z cvt.u64.u32 %rd792, %r1457; 2026-02-21T08:52:59.7608415Z cvt.u64.u32 %rd793, %r1458; 2026-02-21T08:52:59.7608828Z shl.b64 %rd794, %rd793, 32; 2026-02-21T08:52:59.7609320Z or.b64 %rd795, %rd792, %rd794; 2026-02-21T08:52:59.7609760Z cvt.u64.u32 %rd796, %r1459; 2026-02-21T08:52:59.7610137Z cvt.u64.u32 %rd797, %r1460; 2026-02-21T08:52:59.7610548Z shl.b64 %rd798, %rd797, 32; 2026-02-21T08:52:59.7610973Z or.b64 %rd799, %rd796, %rd798; 2026-02-21T08:52:59.7611361Z // begin inline asm 2026-02-21T08:52:59.7612588Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1462, %r1463, %r1464, %r1465, %r1466, %r1467, %r1468, %r1469, %r1470, %r1471, %r1472, %r1473, %r1474, %r1475, %r1476, %r1477}, [%r450 + 432]; 2026-02-21T08:52:59.7613780Z // end inline asm 2026-02-21T08:52:59.7614151Z cvt.u64.u32 %rd800, %r1462; 2026-02-21T08:52:59.7614534Z cvt.u64.u32 %rd801, %r1463; 2026-02-21T08:52:59.7615034Z shl.b64 %rd802, %rd801, 32; 
2026-02-21T08:52:59.7615474Z or.b64 %rd803, %rd800, %rd802; 2026-02-21T08:52:59.7615884Z cvt.u64.u32 %rd804, %r1464; 2026-02-21T08:52:59.7616315Z cvt.u64.u32 %rd805, %r1465; 2026-02-21T08:52:59.7616701Z shl.b64 %rd806, %rd805, 32; 2026-02-21T08:52:59.7617131Z or.b64 %rd807, %rd804, %rd806; 2026-02-21T08:52:59.7617535Z cvt.u64.u32 %rd808, %r1466; 2026-02-21T08:52:59.7617976Z cvt.u64.u32 %rd809, %r1467; 2026-02-21T08:52:59.7618385Z shl.b64 %rd810, %rd809, 32; 2026-02-21T08:52:59.7618818Z or.b64 %rd811, %rd808, %rd810; 2026-02-21T08:52:59.7619338Z cvt.u64.u32 %rd812, %r1468; 2026-02-21T08:52:59.7619785Z cvt.u64.u32 %rd813, %r1469; 2026-02-21T08:52:59.7620204Z shl.b64 %rd814, %rd813, 32; 2026-02-21T08:52:59.7620585Z or.b64 %rd815, %rd812, %rd814; 2026-02-21T08:52:59.7621017Z cvt.u64.u32 %rd816, %r1470; 2026-02-21T08:52:59.7621407Z cvt.u64.u32 %rd817, %r1471; 2026-02-21T08:52:59.7621835Z shl.b64 %rd818, %rd817, 32; 2026-02-21T08:52:59.7622220Z or.b64 %rd819, %rd816, %rd818; 2026-02-21T08:52:59.7622654Z cvt.u64.u32 %rd820, %r1472; 2026-02-21T08:52:59.7623035Z cvt.u64.u32 %rd821, %r1473; 2026-02-21T08:52:59.7623462Z shl.b64 %rd822, %rd821, 32; 2026-02-21T08:52:59.7623900Z or.b64 %rd823, %rd820, %rd822; 2026-02-21T08:52:59.7624376Z cvt.u64.u32 %rd824, %r1474; 2026-02-21T08:52:59.7624876Z cvt.u64.u32 %rd825, %r1475; 2026-02-21T08:52:59.7625292Z shl.b64 %rd826, %rd825, 32; 2026-02-21T08:52:59.7625730Z or.b64 %rd827, %rd824, %rd826; 2026-02-21T08:52:59.7626117Z cvt.u64.u32 %rd828, %r1476; 2026-02-21T08:52:59.7626542Z cvt.u64.u32 %rd829, %r1477; 2026-02-21T08:52:59.7626939Z shl.b64 %rd830, %rd829, 32; 2026-02-21T08:52:59.7627382Z or.b64 %rd831, %rd828, %rd830; 2026-02-21T08:52:59.7627902Z // begin inline asm 2026-02-21T08:52:59.7628959Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1479, %r1480, %r1481, %r1482, %r1483, %r1484, %r1485, %r1486, %r1487, %r1488, %r1489, %r1490, %r1491, %r1492, %r1493, %r1494}, [%r450 + 448]; 2026-02-21T08:52:59.7630199Z // end inline asm 
2026-02-21T08:52:59.7630535Z // begin inline asm 2026-02-21T08:52:59.7631591Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1496, %r1497, %r1498, %r1499, %r1500, %r1501, %r1502, %r1503, %r1504, %r1505, %r1506, %r1507, %r1508, %r1509, %r1510, %r1511}, [%r450 + 464]; 2026-02-21T08:52:59.7632662Z // end inline asm 2026-02-21T08:52:59.7633035Z cvt.u64.u32 %rd832, %r1504; 2026-02-21T08:52:59.7633420Z cvt.u64.u32 %rd833, %r1505; 2026-02-21T08:52:59.7633832Z shl.b64 %rd834, %rd833, 32; 2026-02-21T08:52:59.7634250Z or.b64 %rd835, %rd832, %rd834; 2026-02-21T08:52:59.7634631Z cvt.u64.u32 %rd836, %r1506; 2026-02-21T08:52:59.7635098Z cvt.u64.u32 %rd837, %r1507; 2026-02-21T08:52:59.7635482Z shl.b64 %rd838, %rd837, 32; 2026-02-21T08:52:59.7635939Z or.b64 %rd839, %rd836, %rd838; 2026-02-21T08:52:59.7636334Z cvt.u64.u32 %rd840, %r1508; 2026-02-21T08:52:59.7636769Z cvt.u64.u32 %rd841, %r1509; 2026-02-21T08:52:59.7637164Z shl.b64 %rd842, %rd841, 32; 2026-02-21T08:52:59.7637578Z or.b64 %rd843, %rd840, %rd842; 2026-02-21T08:52:59.7637969Z cvt.u64.u32 %rd844, %r1510; 2026-02-21T08:52:59.7638387Z cvt.u64.u32 %rd845, %r1511; 2026-02-21T08:52:59.7638853Z shl.b64 %rd846, %rd845, 32; 2026-02-21T08:52:59.7639347Z or.b64 %rd847, %rd844, %rd846; 2026-02-21T08:52:59.7639945Z // begin inline asm 2026-02-21T08:52:59.7640967Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1513, %r1514, %r1515, %r1516, %r1517, %r1518, %r1519, %r1520, %r1521, %r1522, %r1523, %r1524, %r1525, %r1526, %r1527, %r1528}, [%r450 + 480]; 2026-02-21T08:52:59.7642206Z // end inline asm 2026-02-21T08:52:59.7642545Z cvt.u64.u32 %rd848, %r1513; 2026-02-21T08:52:59.7642977Z cvt.u64.u32 %rd849, %r1514; 2026-02-21T08:52:59.7643488Z shl.b64 %rd850, %rd849, 32; 2026-02-21T08:52:59.7643861Z or.b64 %rd851, %rd848, %rd850; 2026-02-21T08:52:59.7644288Z cvt.u64.u32 %rd852, %r1515; 2026-02-21T08:52:59.7644656Z cvt.u64.u32 %rd853, %r1516; 2026-02-21T08:52:59.7645139Z shl.b64 %rd854, %rd853, 32; 2026-02-21T08:52:59.7645510Z or.b64 %rd855, %rd852, 
%rd854; 2026-02-21T08:52:59.7645923Z cvt.u64.u32 %rd856, %r1517; 2026-02-21T08:52:59.7646281Z cvt.u64.u32 %rd857, %r1518; 2026-02-21T08:52:59.7646708Z shl.b64 %rd858, %rd857, 32; 2026-02-21T08:52:59.7647105Z or.b64 %rd859, %rd856, %rd858; 2026-02-21T08:52:59.7647561Z cvt.u64.u32 %rd860, %r1519; 2026-02-21T08:52:59.7647985Z cvt.u64.u32 %rd861, %r1520; 2026-02-21T08:52:59.7648377Z shl.b64 %rd862, %rd861, 32; 2026-02-21T08:52:59.7648807Z or.b64 %rd863, %rd860, %rd862; 2026-02-21T08:52:59.7649176Z cvt.u64.u32 %rd864, %r1521; 2026-02-21T08:52:59.7649705Z cvt.u64.u32 %rd865, %r1522; 2026-02-21T08:52:59.7650117Z shl.b64 %rd866, %rd865, 32; 2026-02-21T08:52:59.7650560Z or.b64 %rd867, %rd864, %rd866; 2026-02-21T08:52:59.7650946Z cvt.u64.u32 %rd868, %r1523; 2026-02-21T08:52:59.7651368Z cvt.u64.u32 %rd869, %r1524; 2026-02-21T08:52:59.7651802Z shl.b64 %rd870, %rd869, 32; 2026-02-21T08:52:59.7652197Z or.b64 %rd871, %rd868, %rd870; 2026-02-21T08:52:59.7652644Z cvt.u64.u32 %rd872, %r1525; 2026-02-21T08:52:59.7653058Z cvt.u64.u32 %rd873, %r1526; 2026-02-21T08:52:59.7653495Z shl.b64 %rd874, %rd873, 32; 2026-02-21T08:52:59.7653886Z or.b64 %rd875, %rd872, %rd874; 2026-02-21T08:52:59.7654321Z cvt.u64.u32 %rd876, %r1527; 2026-02-21T08:52:59.7654757Z cvt.u64.u32 %rd877, %r1528; 2026-02-21T08:52:59.7655177Z shl.b64 %rd878, %rd877, 32; 2026-02-21T08:52:59.7655550Z or.b64 %rd879, %rd876, %rd878; 2026-02-21T08:52:59.7655980Z // begin inline asm 2026-02-21T08:52:59.7657038Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1530, %r1531, %r1532, %r1533, %r1534, %r1535, %r1536, %r1537, %r1538, %r1539, %r1540, %r1541, %r1542, %r1543, %r1544, %r1545}, [%r450 + 496]; 2026-02-21T08:52:59.7658151Z // end inline asm 2026-02-21T08:52:59.7658643Z cvt.u64.u32 %rd880, %r1530; 2026-02-21T08:52:59.7659040Z cvt.u64.u32 %rd881, %r1531; 2026-02-21T08:52:59.7659476Z shl.b64 %rd882, %rd881, 32; 2026-02-21T08:52:59.7659869Z or.b64 %rd883, %rd880, %rd882; 2026-02-21T08:52:59.7660295Z cvt.u64.u32 %rd884, %r1532; 
2026-02-21T08:52:59.7660670Z cvt.u64.u32 %rd885, %r1533; 2026-02-21T08:52:59.7661094Z shl.b64 %rd886, %rd885, 32; 2026-02-21T08:52:59.7661578Z or.b64 %rd887, %rd884, %rd886; 2026-02-21T08:52:59.7662094Z cvt.u64.u32 %rd888, %r1534; 2026-02-21T08:52:59.7662613Z cvt.u64.u32 %rd889, %r1535; 2026-02-21T08:52:59.7663044Z shl.b64 %rd890, %rd889, 32; 2026-02-21T08:52:59.7663471Z or.b64 %rd891, %rd888, %rd890; 2026-02-21T08:52:59.7663845Z cvt.u64.u32 %rd892, %r1536; 2026-02-21T08:52:59.7664283Z cvt.u64.u32 %rd893, %r1537; 2026-02-21T08:52:59.7664752Z shl.b64 %rd894, %rd893, 32; 2026-02-21T08:52:59.7665195Z or.b64 %rd895, %rd892, %rd894; 2026-02-21T08:52:59.7665652Z cvt.u64.u32 %rd896, %r1538; 2026-02-21T08:52:59.7666055Z cvt.u64.u32 %rd897, %r1539; 2026-02-21T08:52:59.7666477Z shl.b64 %rd898, %rd897, 32; 2026-02-21T08:52:59.7666857Z or.b64 %rd899, %rd896, %rd898; 2026-02-21T08:52:59.7667292Z cvt.u64.u32 %rd900, %r1540; 2026-02-21T08:52:59.7667682Z cvt.u64.u32 %rd901, %r1541; 2026-02-21T08:52:59.7668109Z shl.b64 %rd902, %rd901, 32; 2026-02-21T08:52:59.7668497Z or.b64 %rd903, %rd900, %rd902; 2026-02-21T08:52:59.7668933Z cvt.u64.u32 %rd904, %r1542; 2026-02-21T08:52:59.7669312Z cvt.u64.u32 %rd905, %r1543; 2026-02-21T08:52:59.7669859Z shl.b64 %rd906, %rd905, 32; 2026-02-21T08:52:59.7670296Z or.b64 %rd907, %rd904, %rd906; 2026-02-21T08:52:59.7670714Z cvt.u64.u32 %rd908, %r1544; 2026-02-21T08:52:59.7671145Z cvt.u64.u32 %rd909, %r1545; 2026-02-21T08:52:59.7671540Z shl.b64 %rd910, %rd909, 32; 2026-02-21T08:52:59.7671986Z or.b64 %rd911, %rd908, %rd910; 2026-02-21T08:52:59.7672369Z // begin inline asm 2026-02-21T08:52:59.7672889Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:52:59.7673305Z // end inline asm 2026-02-21T08:52:59.7673691Z cvt.u64.u32 %rd912, %r1003; 2026-02-21T08:52:59.7674080Z cvt.u64.u32 %rd913, %r1004; 2026-02-21T08:52:59.7674258Z shl.b64 %rd914, %rd913, 32; 2026-02-21T08:52:59.7674402Z or.b64 %rd915, %rd912, %rd914; 2026-02-21T08:52:59.7674929Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7675124Z mov.b64 {%r1577, %r1578}, %rd915; 2026-02-21T08:52:59.7675295Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T08:52:59.7675756Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7675940Z cvt.u64.u32 %rd916, %r1005; 2026-02-21T08:52:59.7676088Z cvt.u64.u32 %rd917, %r1006; 2026-02-21T08:52:59.7676233Z shl.b64 %rd918, %rd917, 32; 2026-02-21T08:52:59.7676486Z or.b64 %rd919, %rd916, %rd918; 2026-02-21T08:52:59.7676981Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7677137Z mov.b64 {%r1580, %r1581}, %rd919; 2026-02-21T08:52:59.7677306Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T08:52:59.7677789Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7677932Z cvt.u64.u32 %rd920, %r1007; 2026-02-21T08:52:59.7678068Z cvt.u64.u32 %rd921, %r1008; 2026-02-21T08:52:59.7678238Z shl.b64 %rd922, %rd921, 32; 2026-02-21T08:52:59.7678377Z or.b64 %rd923, %rd920, %rd922; 2026-02-21T08:52:59.7678796Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7678938Z mov.b64 {%r1583, %r1584}, %rd923; 2026-02-21T08:52:59.7679136Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T08:52:59.7679572Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7679714Z cvt.u64.u32 %rd924, %r1009; 2026-02-21T08:52:59.7679895Z cvt.u64.u32 %rd925, %r1010; 2026-02-21T08:52:59.7680145Z shl.b64 %rd926, %rd925, 32; 2026-02-21T08:52:59.7680286Z or.b64 %rd927, %rd924, %rd926; 2026-02-21T08:52:59.7680743Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7680883Z mov.b64 {%r1586, %r1587}, %rd927; 2026-02-21T08:52:59.7681045Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T08:52:59.7681467Z .loc 1 57 52 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7681651Z cvt.u64.u32 %rd928, %r1011; 2026-02-21T08:52:59.7681791Z cvt.u64.u32 %rd929, %r1012; 2026-02-21T08:52:59.7681928Z shl.b64 %rd930, %rd929, 32; 2026-02-21T08:52:59.7682118Z or.b64 %rd931, %rd928, %rd930; 2026-02-21T08:52:59.7682581Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7682727Z mov.b64 {%r1589, %r1590}, %rd931; 2026-02-21T08:52:59.7682935Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T08:52:59.7683386Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7683535Z cvt.u64.u32 %rd932, %r1013; 2026-02-21T08:52:59.7683677Z cvt.u64.u32 %rd933, %r1014; 2026-02-21T08:52:59.7683856Z shl.b64 %rd934, %rd933, 32; 2026-02-21T08:52:59.7683995Z or.b64 %rd935, %rd932, %rd934; 2026-02-21T08:52:59.7684408Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7684765Z mov.b64 {%r1592, %r1593}, %rd935; 2026-02-21T08:52:59.7684929Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T08:52:59.7685371Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7685554Z cvt.u64.u32 %rd936, %r1015; 2026-02-21T08:52:59.7685701Z cvt.u64.u32 %rd937, %r1016; 2026-02-21T08:52:59.7685842Z shl.b64 %rd938, %rd937, 32; 2026-02-21T08:52:59.7689606Z or.b64 %rd939, %rd936, %rd938; 2026-02-21T08:52:59.7690357Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7690509Z mov.b64 {%r1595, %r1596}, %rd939; 2026-02-21T08:52:59.7690676Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T08:52:59.7691156Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7691302Z cvt.u64.u32 %rd940, %r1017; 2026-02-21T08:52:59.7691445Z cvt.u64.u32 %rd941, %r1018; 2026-02-21T08:52:59.7691623Z shl.b64 %rd942, %rd941, 32; 2026-02-21T08:52:59.7691768Z 
or.b64 %rd943, %rd940, %rd942; 2026-02-21T08:52:59.7692199Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7692342Z mov.b64 {%r1598, %r1599}, %rd943; 2026-02-21T08:52:59.7692665Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T08:52:59.7693154Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7693306Z cvt.u64.u32 %rd944, %r1020; 2026-02-21T08:52:59.7693487Z cvt.u64.u32 %rd945, %r1021; 2026-02-21T08:52:59.7693624Z shl.b64 %rd946, %rd945, 32; 2026-02-21T08:52:59.7693771Z or.b64 %rd947, %rd944, %rd946; 2026-02-21T08:52:59.7694268Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7694417Z mov.b64 {%r1601, %r1602}, %rd947; 2026-02-21T08:52:59.7694579Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T08:52:59.7695160Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7695299Z cvt.u64.u32 %rd948, %r1022; 2026-02-21T08:52:59.7695442Z cvt.u64.u32 %rd949, %r1023; 2026-02-21T08:52:59.7695582Z shl.b64 %rd950, %rd949, 32; 2026-02-21T08:52:59.7695766Z or.b64 %rd951, %rd948, %rd950; 2026-02-21T08:52:59.7696211Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7696486Z mov.b64 {%r1604, %r1605}, %rd951; 2026-02-21T08:52:59.7696691Z cvt.rn.f16x2.f32 %r1606, %r1605, %r1604; 2026-02-21T08:52:59.7697146Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7697285Z cvt.u64.u32 %rd952, %r1024; 2026-02-21T08:52:59.7697464Z cvt.u64.u32 %rd953, %r1025; 2026-02-21T08:52:59.7697609Z shl.b64 %rd954, %rd953, 32; 2026-02-21T08:52:59.7697753Z or.b64 %rd955, %rd952, %rd954; 2026-02-21T08:52:59.7698190Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7698375Z mov.b64 {%r1607, %r1608}, %rd955; 2026-02-21T08:52:59.7698542Z cvt.rn.f16x2.f32 %r1609, %r1608, 
%r1607; 2026-02-21T08:52:59.7699009Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7699205Z cvt.u64.u32 %rd956, %r1026; 2026-02-21T08:52:59.7699350Z cvt.u64.u32 %rd957, %r1027; 2026-02-21T08:52:59.7699498Z shl.b64 %rd958, %rd957, 32; 2026-02-21T08:52:59.7699642Z or.b64 %rd959, %rd956, %rd958; 2026-02-21T08:52:59.7700121Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7700269Z mov.b64 {%r1610, %r1611}, %rd959; 2026-02-21T08:52:59.7700434Z cvt.rn.f16x2.f32 %r1612, %r1611, %r1610; 2026-02-21T08:52:59.7700891Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7701142Z cvt.u64.u32 %rd960, %r1028; 2026-02-21T08:52:59.7701277Z cvt.u64.u32 %rd961, %r1029; 2026-02-21T08:52:59.7701453Z shl.b64 %rd962, %rd961, 32; 2026-02-21T08:52:59.7701594Z or.b64 %rd963, %rd960, %rd962; 2026-02-21T08:52:59.7702030Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7702221Z mov.b64 {%r1613, %r1614}, %rd963; 2026-02-21T08:52:59.7702460Z cvt.rn.f16x2.f32 %r1615, %r1614, %r1613; 2026-02-21T08:52:59.7702896Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7703038Z cvt.u64.u32 %rd964, %r1030; 2026-02-21T08:52:59.7703210Z cvt.u64.u32 %rd965, %r1031; 2026-02-21T08:52:59.7703344Z shl.b64 %rd966, %rd965, 32; 2026-02-21T08:52:59.7703487Z or.b64 %rd967, %rd964, %rd966; 2026-02-21T08:52:59.7703949Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7704098Z mov.b64 {%r1616, %r1617}, %rd967; 2026-02-21T08:52:59.7704256Z cvt.rn.f16x2.f32 %r1618, %r1617, %r1616; 2026-02-21T08:52:59.7704790Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7704943Z cvt.u64.u32 %rd968, %r1032; 2026-02-21T08:52:59.7705191Z cvt.u64.u32 %rd969, %r1033; 2026-02-21T08:52:59.7705339Z shl.b64 
%rd970, %rd969, 32; 2026-02-21T08:52:59.7705522Z or.b64 %rd971, %rd968, %rd970; 2026-02-21T08:52:59.7705982Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7706129Z mov.b64 {%r1619, %r1620}, %rd971; 2026-02-21T08:52:59.7706342Z cvt.rn.f16x2.f32 %r1621, %r1620, %r1619; 2026-02-21T08:52:59.7706786Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7706927Z cvt.u64.u32 %rd972, %r1034; 2026-02-21T08:52:59.7707104Z cvt.u64.u32 %rd973, %r1035; 2026-02-21T08:52:59.7707257Z shl.b64 %rd974, %rd973, 32; 2026-02-21T08:52:59.7707399Z or.b64 %rd975, %rd972, %rd974; 2026-02-21T08:52:59.7707855Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7708045Z mov.b64 {%r1622, %r1623}, %rd975; 2026-02-21T08:52:59.7708218Z cvt.rn.f16x2.f32 %r1624, %r1623, %r1622; 2026-02-21T08:52:59.7708365Z mov.b64 {%r1625, %r1626}, %rd367; 2026-02-21T08:52:59.7708575Z cvt.rn.f16x2.f32 %r1627, %r1626, %r1625; 2026-02-21T08:52:59.7708828Z mov.b64 {%r1628, %r1629}, %rd371; 2026-02-21T08:52:59.7708992Z cvt.rn.f16x2.f32 %r1630, %r1629, %r1628; 2026-02-21T08:52:59.7709138Z mov.b64 {%r1631, %r1632}, %rd375; 2026-02-21T08:52:59.7709340Z cvt.rn.f16x2.f32 %r1633, %r1632, %r1631; 2026-02-21T08:52:59.7709481Z mov.b64 {%r1634, %r1635}, %rd379; 2026-02-21T08:52:59.7709643Z cvt.rn.f16x2.f32 %r1636, %r1635, %r1634; 2026-02-21T08:52:59.7709825Z mov.b64 {%r1637, %r1638}, %rd383; 2026-02-21T08:52:59.7709996Z cvt.rn.f16x2.f32 %r1639, %r1638, %r1637; 2026-02-21T08:52:59.7710144Z mov.b64 {%r1640, %r1641}, %rd387; 2026-02-21T08:52:59.7710351Z cvt.rn.f16x2.f32 %r1642, %r1641, %r1640; 2026-02-21T08:52:59.7710501Z mov.b64 {%r1643, %r1644}, %rd391; 2026-02-21T08:52:59.7710669Z cvt.rn.f16x2.f32 %r1645, %r1644, %r1643; 2026-02-21T08:52:59.7710821Z mov.b64 {%r1646, %r1647}, %rd395; 2026-02-21T08:52:59.7711027Z cvt.rn.f16x2.f32 %r1648, %r1647, %r1646; 2026-02-21T08:52:59.7711179Z mov.b64 
{%r1649, %r1650}, %rd399; 2026-02-21T08:52:59.7711347Z cvt.rn.f16x2.f32 %r1651, %r1650, %r1649; 2026-02-21T08:52:59.7711533Z mov.b64 {%r1652, %r1653}, %rd403; 2026-02-21T08:52:59.7711702Z cvt.rn.f16x2.f32 %r1654, %r1653, %r1652; 2026-02-21T08:52:59.7711848Z mov.b64 {%r1655, %r1656}, %rd407; 2026-02-21T08:52:59.7712059Z cvt.rn.f16x2.f32 %r1657, %r1656, %r1655; 2026-02-21T08:52:59.7712209Z mov.b64 {%r1658, %r1659}, %rd411; 2026-02-21T08:52:59.7712372Z cvt.rn.f16x2.f32 %r1660, %r1659, %r1658; 2026-02-21T08:52:59.7712616Z mov.b64 {%r1661, %r1662}, %rd415; 2026-02-21T08:52:59.7712818Z cvt.rn.f16x2.f32 %r1663, %r1662, %r1661; 2026-02-21T08:52:59.7712955Z mov.b64 {%r1664, %r1665}, %rd419; 2026-02-21T08:52:59.7713110Z cvt.rn.f16x2.f32 %r1666, %r1665, %r1664; 2026-02-21T08:52:59.7713288Z mov.b64 {%r1667, %r1668}, %rd423; 2026-02-21T08:52:59.7713453Z cvt.rn.f16x2.f32 %r1669, %r1668, %r1667; 2026-02-21T08:52:59.7713596Z mov.b64 {%r1670, %r1671}, %rd427; 2026-02-21T08:52:59.7713849Z cvt.rn.f16x2.f32 %r1672, %r1671, %r1670; 2026-02-21T08:52:59.7714353Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7714499Z cvt.u64.u32 %rd976, %r1071; 2026-02-21T08:52:59.7714640Z cvt.u64.u32 %rd977, %r1072; 2026-02-21T08:52:59.7714914Z shl.b64 %rd978, %rd977, 32; 2026-02-21T08:52:59.7715081Z or.b64 %rd979, %rd976, %rd978; 2026-02-21T08:52:59.7715598Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7715846Z mov.b64 {%r1673, %r1674}, %rd979; 2026-02-21T08:52:59.7716034Z cvt.rn.f16x2.f32 %r1675, %r1674, %r1673; 2026-02-21T08:52:59.7716578Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7716753Z cvt.u64.u32 %rd980, %r1073; 2026-02-21T08:52:59.7717057Z cvt.u64.u32 %rd981, %r1074; 2026-02-21T08:52:59.7717213Z shl.b64 %rd982, %rd981, 32; 2026-02-21T08:52:59.7717357Z or.b64 %rd983, %rd980, %rd982; 2026-02-21T08:52:59.7717857Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7718004Z mov.b64 {%r1676, %r1677}, %rd983; 2026-02-21T08:52:59.7718170Z cvt.rn.f16x2.f32 %r1678, %r1677, %r1676; 2026-02-21T08:52:59.7718640Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7718777Z cvt.u64.u32 %rd984, %r1075; 2026-02-21T08:52:59.7718914Z cvt.u64.u32 %rd985, %r1076; 2026-02-21T08:52:59.7719056Z shl.b64 %rd986, %rd985, 32; 2026-02-21T08:52:59.7719234Z or.b64 %rd987, %rd984, %rd986; 2026-02-21T08:52:59.7719672Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7719819Z mov.b64 {%r1679, %r1680}, %rd987; 2026-02-21T08:52:59.7720026Z cvt.rn.f16x2.f32 %r1681, %r1680, %r1679; 2026-02-21T08:52:59.7720457Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7720702Z cvt.u64.u32 %rd988, %r1077; 2026-02-21T08:52:59.7720880Z cvt.u64.u32 %rd989, %r1078; 2026-02-21T08:52:59.7721038Z shl.b64 %rd990, %rd989, 32; 2026-02-21T08:52:59.7721178Z or.b64 %rd991, %rd988, %rd990; 2026-02-21T08:52:59.7721604Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7721790Z mov.b64 {%r1682, %r1683}, %rd991; 2026-02-21T08:52:59.7721955Z cvt.rn.f16x2.f32 %r1684, %r1683, %r1682; 2026-02-21T08:52:59.7722416Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7722606Z cvt.u64.u32 %rd992, %r1079; 2026-02-21T08:52:59.7722750Z cvt.u64.u32 %rd993, %r1080; 2026-02-21T08:52:59.7722894Z shl.b64 %rd994, %rd993, 32; 2026-02-21T08:52:59.7723074Z or.b64 %rd995, %rd992, %rd994; 2026-02-21T08:52:59.7723517Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7723659Z mov.b64 {%r1685, %r1686}, %rd995; 2026-02-21T08:52:59.7723819Z cvt.rn.f16x2.f32 %r1687, %r1686, %r1685; 2026-02-21T08:52:59.7724269Z .loc 1 57 52 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7724407Z cvt.u64.u32 %rd996, %r1081; 2026-02-21T08:52:59.7724537Z cvt.u64.u32 %rd997, %r1082; 2026-02-21T08:52:59.7724761Z shl.b64 %rd998, %rd997, 32; 2026-02-21T08:52:59.7724906Z or.b64 %rd999, %rd996, %rd998; 2026-02-21T08:52:59.7725431Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7725612Z mov.b64 {%r1688, %r1689}, %rd999; 2026-02-21T08:52:59.7725776Z cvt.rn.f16x2.f32 %r1690, %r1689, %r1688; 2026-02-21T08:52:59.7726194Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7726433Z cvt.u64.u32 %rd1000, %r1083; 2026-02-21T08:52:59.7726580Z cvt.u64.u32 %rd1001, %r1084; 2026-02-21T08:52:59.7726729Z shl.b64 %rd1002, %rd1001, 32; 2026-02-21T08:52:59.7726869Z or.b64 %rd1003, %rd1000, %rd1002; 2026-02-21T08:52:59.7727334Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7727482Z mov.b64 {%r1691, %r1692}, %rd1003; 2026-02-21T08:52:59.7727645Z cvt.rn.f16x2.f32 %r1693, %r1692, %r1691; 2026-02-21T08:52:59.7728127Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7728280Z cvt.u64.u32 %rd1004, %r1085; 2026-02-21T08:52:59.7728422Z cvt.u64.u32 %rd1005, %r1086; 2026-02-21T08:52:59.7728567Z shl.b64 %rd1006, %rd1005, 32; 2026-02-21T08:52:59.7728744Z or.b64 %rd1007, %rd1004, %rd1006; 2026-02-21T08:52:59.7729284Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7729442Z mov.b64 {%r1694, %r1695}, %rd1007; 2026-02-21T08:52:59.7729646Z cvt.rn.f16x2.f32 %r1696, %r1695, %r1694; 2026-02-21T08:52:59.7730058Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7730198Z cvt.u64.u32 %rd1008, %r1088; 2026-02-21T08:52:59.7730378Z cvt.u64.u32 %rd1009, %r1089; 2026-02-21T08:52:59.7730522Z shl.b64 %rd1010, %rd1009, 32; 
2026-02-21T08:52:59.7730665Z or.b64 %rd1011, %rd1008, %rd1010; 2026-02-21T08:52:59.7731105Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7731298Z mov.b64 {%r1697, %r1698}, %rd1011; 2026-02-21T08:52:59.7731460Z cvt.rn.f16x2.f32 %r1699, %r1698, %r1697; 2026-02-21T08:52:59.7731899Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7732081Z cvt.u64.u32 %rd1012, %r1090; 2026-02-21T08:52:59.7732220Z cvt.u64.u32 %rd1013, %r1091; 2026-02-21T08:52:59.7732365Z shl.b64 %rd1014, %rd1013, 32; 2026-02-21T08:52:59.7732642Z or.b64 %rd1015, %rd1012, %rd1014; 2026-02-21T08:52:59.7733077Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7733226Z mov.b64 {%r1700, %r1701}, %rd1015; 2026-02-21T08:52:59.7733437Z cvt.rn.f16x2.f32 %r1702, %r1701, %r1700; 2026-02-21T08:52:59.7733911Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7734062Z cvt.u64.u32 %rd1016, %r1092; 2026-02-21T08:52:59.7734211Z cvt.u64.u32 %rd1017, %r1093; 2026-02-21T08:52:59.7734390Z shl.b64 %rd1018, %rd1017, 32; 2026-02-21T08:52:59.7734539Z or.b64 %rd1019, %rd1016, %rd1018; 2026-02-21T08:52:59.7735055Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7735242Z mov.b64 {%r1703, %r1704}, %rd1019; 2026-02-21T08:52:59.7735406Z cvt.rn.f16x2.f32 %r1705, %r1704, %r1703; 2026-02-21T08:52:59.7735829Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7736011Z cvt.u64.u32 %rd1020, %r1094; 2026-02-21T08:52:59.7736149Z cvt.u64.u32 %rd1021, %r1095; 2026-02-21T08:52:59.7736296Z shl.b64 %rd1022, %rd1021, 32; 2026-02-21T08:52:59.7736436Z or.b64 %rd1023, %rd1020, %rd1022; 2026-02-21T08:52:59.7736918Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7737061Z mov.b64 {%r1706, %r1707}, %rd1023; 
2026-02-21T08:52:59.7737355Z cvt.rn.f16x2.f32 %r1708, %r1707, %r1706; 2026-02-21T08:52:59.7737847Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7738024Z cvt.u64.u32 %rd1024, %r1096; 2026-02-21T08:52:59.7738215Z cvt.u64.u32 %rd1025, %r1097; 2026-02-21T08:52:59.7738436Z shl.b64 %rd1026, %rd1025, 32; 2026-02-21T08:52:59.7738618Z or.b64 %rd1027, %rd1024, %rd1026; 2026-02-21T08:52:59.7739170Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7739326Z mov.b64 {%r1709, %r1710}, %rd1027; 2026-02-21T08:52:59.7739533Z cvt.rn.f16x2.f32 %r1711, %r1710, %r1709; 2026-02-21T08:52:59.7739997Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7740145Z cvt.u64.u32 %rd1028, %r1098; 2026-02-21T08:52:59.7740330Z cvt.u64.u32 %rd1029, %r1099; 2026-02-21T08:52:59.7740474Z shl.b64 %rd1030, %rd1029, 32; 2026-02-21T08:52:59.7740627Z or.b64 %rd1031, %rd1028, %rd1030; 2026-02-21T08:52:59.7741106Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7741246Z mov.b64 {%r1712, %r1713}, %rd1031; 2026-02-21T08:52:59.7741399Z cvt.rn.f16x2.f32 %r1714, %r1713, %r1712; 2026-02-21T08:52:59.7741917Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7742107Z cvt.u64.u32 %rd1032, %r1100; 2026-02-21T08:52:59.7742252Z cvt.u64.u32 %rd1033, %r1101; 2026-02-21T08:52:59.7742397Z shl.b64 %rd1034, %rd1033, 32; 2026-02-21T08:52:59.7742579Z or.b64 %rd1035, %rd1032, %rd1034; 2026-02-21T08:52:59.7743025Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7743171Z mov.b64 {%r1715, %r1716}, %rd1035; 2026-02-21T08:52:59.7743374Z cvt.rn.f16x2.f32 %r1717, %r1716, %r1715; 2026-02-21T08:52:59.7743803Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7743945Z cvt.u64.u32 %rd1036, %r1102; 
2026-02-21T08:52:59.7744084Z cvt.u64.u32 %rd1037, %r1103; 2026-02-21T08:52:59.7744260Z shl.b64 %rd1038, %rd1037, 32; 2026-02-21T08:52:59.7744402Z or.b64 %rd1039, %rd1036, %rd1038; 2026-02-21T08:52:59.7744936Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7745242Z mov.b64 {%r1718, %r1719}, %rd1039; 2026-02-21T08:52:59.7745412Z cvt.rn.f16x2.f32 %r1720, %r1719, %r1718; 2026-02-21T08:52:59.7745561Z mov.b64 {%r1721, %r1722}, %rd431; 2026-02-21T08:52:59.7745766Z cvt.rn.f16x2.f32 %r1723, %r1722, %r1721; 2026-02-21T08:52:59.7745911Z mov.b64 {%r1724, %r1725}, %rd435; 2026-02-21T08:52:59.7746070Z cvt.rn.f16x2.f32 %r1726, %r1725, %r1724; 2026-02-21T08:52:59.7746212Z mov.b64 {%r1727, %r1728}, %rd439; 2026-02-21T08:52:59.7746410Z cvt.rn.f16x2.f32 %r1729, %r1728, %r1727; 2026-02-21T08:52:59.7746553Z mov.b64 {%r1730, %r1731}, %rd443; 2026-02-21T08:52:59.7746713Z cvt.rn.f16x2.f32 %r1732, %r1731, %r1730; 2026-02-21T08:52:59.7746891Z mov.b64 {%r1733, %r1734}, %rd447; 2026-02-21T08:52:59.7747049Z cvt.rn.f16x2.f32 %r1735, %r1734, %r1733; 2026-02-21T08:52:59.7747185Z mov.b64 {%r1736, %r1737}, %rd451; 2026-02-21T08:52:59.7747344Z cvt.rn.f16x2.f32 %r1738, %r1737, %r1736; 2026-02-21T08:52:59.7747528Z mov.b64 {%r1739, %r1740}, %rd455; 2026-02-21T08:52:59.7747686Z cvt.rn.f16x2.f32 %r1741, %r1740, %r1739; 2026-02-21T08:52:59.7747822Z mov.b64 {%r1742, %r1743}, %rd459; 2026-02-21T08:52:59.7748011Z cvt.rn.f16x2.f32 %r1744, %r1743, %r1742; 2026-02-21T08:52:59.7748147Z mov.b64 {%r1745, %r1746}, %rd463; 2026-02-21T08:52:59.7748307Z cvt.rn.f16x2.f32 %r1747, %r1746, %r1745; 2026-02-21T08:52:59.7748483Z mov.b64 {%r1748, %r1749}, %rd467; 2026-02-21T08:52:59.7748642Z cvt.rn.f16x2.f32 %r1750, %r1749, %r1748; 2026-02-21T08:52:59.7748782Z mov.b64 {%r1751, %r1752}, %rd471; 2026-02-21T08:52:59.7749051Z cvt.rn.f16x2.f32 %r1753, %r1752, %r1751; 2026-02-21T08:52:59.7749233Z mov.b64 {%r1754, %r1755}, %rd475; 2026-02-21T08:52:59.7749392Z cvt.rn.f16x2.f32 
%r1756, %r1755, %r1754; 2026-02-21T08:52:59.7749527Z mov.b64 {%r1757, %r1758}, %rd479; 2026-02-21T08:52:59.7749723Z cvt.rn.f16x2.f32 %r1759, %r1758, %r1757; 2026-02-21T08:52:59.7749864Z mov.b64 {%r1760, %r1761}, %rd483; 2026-02-21T08:52:59.7750093Z cvt.rn.f16x2.f32 %r1762, %r1761, %r1760; 2026-02-21T08:52:59.7750246Z mov.b64 {%r1763, %r1764}, %rd487; 2026-02-21T08:52:59.7750440Z cvt.rn.f16x2.f32 %r1765, %r1764, %r1763; 2026-02-21T08:52:59.7750579Z mov.b64 {%r1766, %r1767}, %rd491; 2026-02-21T08:52:59.7750741Z cvt.rn.f16x2.f32 %r1768, %r1767, %r1766; 2026-02-21T08:52:59.7751246Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7751401Z cvt.u64.u32 %rd1040, %r1139; 2026-02-21T08:52:59.7751548Z cvt.u64.u32 %rd1041, %r1140; 2026-02-21T08:52:59.7751730Z shl.b64 %rd1042, %rd1041, 32; 2026-02-21T08:52:59.7751874Z or.b64 %rd1043, %rd1040, %rd1042; 2026-02-21T08:52:59.7752338Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7752483Z mov.b64 {%r1769, %r1770}, %rd1043; 2026-02-21T08:52:59.7752779Z cvt.rn.f16x2.f32 %r1771, %r1770, %r1769; 2026-02-21T08:52:59.7753212Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7753357Z cvt.u64.u32 %rd1044, %r1141; 2026-02-21T08:52:59.7753535Z cvt.u64.u32 %rd1045, %r1142; 2026-02-21T08:52:59.7753677Z shl.b64 %rd1046, %rd1045, 32; 2026-02-21T08:52:59.7753821Z or.b64 %rd1047, %rd1044, %rd1046; 2026-02-21T08:52:59.7754294Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7754440Z mov.b64 {%r1772, %r1773}, %rd1047; 2026-02-21T08:52:59.7754605Z cvt.rn.f16x2.f32 %r1774, %r1773, %r1772; 2026-02-21T08:52:59.7755115Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7755303Z cvt.u64.u32 %rd1048, %r1143; 2026-02-21T08:52:59.7755444Z cvt.u64.u32 %rd1049, %r1144; 2026-02-21T08:52:59.7755582Z shl.b64 %rd1050, 
%rd1049, 32; 2026-02-21T08:52:59.7755762Z or.b64 %rd1051, %rd1048, %rd1050; 2026-02-21T08:52:59.7756196Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7756444Z mov.b64 {%r1775, %r1776}, %rd1051; 2026-02-21T08:52:59.7756649Z cvt.rn.f16x2.f32 %r1777, %r1776, %r1775; 2026-02-21T08:52:59.7757116Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7757267Z cvt.u64.u32 %rd1052, %r1145; 2026-02-21T08:52:59.7757412Z cvt.u64.u32 %rd1053, %r1146; 2026-02-21T08:52:59.7757596Z shl.b64 %rd1054, %rd1053, 32; 2026-02-21T08:52:59.7757736Z or.b64 %rd1055, %rd1052, %rd1054; 2026-02-21T08:52:59.7758209Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7758397Z mov.b64 {%r1778, %r1779}, %rd1055; 2026-02-21T08:52:59.7758559Z cvt.rn.f16x2.f32 %r1780, %r1779, %r1778; 2026-02-21T08:52:59.7758986Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7759160Z cvt.u64.u32 %rd1056, %r1147; 2026-02-21T08:52:59.7759298Z cvt.u64.u32 %rd1057, %r1148; 2026-02-21T08:52:59.7759441Z shl.b64 %rd1058, %rd1057, 32; 2026-02-21T08:52:59.7759582Z or.b64 %rd1059, %rd1056, %rd1058; 2026-02-21T08:52:59.7760057Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7760204Z mov.b64 {%r1781, %r1782}, %rd1059; 2026-02-21T08:52:59.7760365Z cvt.rn.f16x2.f32 %r1783, %r1782, %r1781; 2026-02-21T08:52:59.7760832Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7761084Z cvt.u64.u32 %rd1060, %r1149; 2026-02-21T08:52:59.7761227Z cvt.u64.u32 %rd1061, %r1150; 2026-02-21T08:52:59.7761404Z shl.b64 %rd1062, %rd1061, 32; 2026-02-21T08:52:59.7761545Z or.b64 %rd1063, %rd1060, %rd1062; 2026-02-21T08:52:59.7761984Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7762238Z mov.b64 {%r1784, %r1785}, 
%rd1063; 2026-02-21T08:52:59.7762416Z cvt.rn.f16x2.f32 %r1786, %r1785, %r1784; 2026-02-21T08:52:59.7762892Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7763044Z cvt.u64.u32 %rd1064, %r1151; 2026-02-21T08:52:59.7763227Z cvt.u64.u32 %rd1065, %r1152; 2026-02-21T08:52:59.7763370Z shl.b64 %rd1066, %rd1065, 32; 2026-02-21T08:52:59.7763514Z or.b64 %rd1067, %rd1064, %rd1066; 2026-02-21T08:52:59.7764000Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7764146Z mov.b64 {%r1787, %r1788}, %rd1067; 2026-02-21T08:52:59.7764304Z cvt.rn.f16x2.f32 %r1789, %r1788, %r1787; 2026-02-21T08:52:59.7764811Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7765057Z cvt.u64.u32 %rd1068, %r1153; 2026-02-21T08:52:59.7765202Z cvt.u64.u32 %rd1069, %r1154; 2026-02-21T08:52:59.7765350Z shl.b64 %rd1070, %rd1069, 32; 2026-02-21T08:52:59.7765532Z or.b64 %rd1071, %rd1068, %rd1070; 2026-02-21T08:52:59.7765978Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7766123Z mov.b64 {%r1790, %r1791}, %rd1071; 2026-02-21T08:52:59.7766328Z cvt.rn.f16x2.f32 %r1792, %r1791, %r1790; 2026-02-21T08:52:59.7766767Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7766907Z cvt.u64.u32 %rd1072, %r1156; 2026-02-21T08:52:59.7767089Z cvt.u64.u32 %rd1073, %r1157; 2026-02-21T08:52:59.7767234Z shl.b64 %rd1074, %rd1073, 32; 2026-02-21T08:52:59.7767378Z or.b64 %rd1075, %rd1072, %rd1074; 2026-02-21T08:52:59.7767815Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7768007Z mov.b64 {%r1793, %r1794}, %rd1075; 2026-02-21T08:52:59.7768171Z cvt.rn.f16x2.f32 %r1795, %r1794, %r1793; 2026-02-21T08:52:59.7768636Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7768917Z cvt.u64.u32 %rd1076, 
%r1158; 2026-02-21T08:52:59.7769061Z cvt.u64.u32 %rd1077, %r1159; 2026-02-21T08:52:59.7769204Z shl.b64 %rd1078, %rd1077, 32; 2026-02-21T08:52:59.7769384Z or.b64 %rd1079, %rd1076, %rd1078; 2026-02-21T08:52:59.7769825Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7769964Z mov.b64 {%r1796, %r1797}, %rd1079; 2026-02-21T08:52:59.7770111Z cvt.rn.f16x2.f32 %r1798, %r1797, %r1796; 2026-02-21T08:52:59.7770522Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7770662Z cvt.u64.u32 %rd1080, %r1160; 2026-02-21T08:52:59.7770801Z cvt.u64.u32 %rd1081, %r1161; 2026-02-21T08:52:59.7770980Z shl.b64 %rd1082, %rd1081, 32; 2026-02-21T08:52:59.7771110Z or.b64 %rd1083, %rd1080, %rd1082; 2026-02-21T08:52:59.7771538Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7771718Z mov.b64 {%r1799, %r1800}, %rd1083; 2026-02-21T08:52:59.7771867Z cvt.rn.f16x2.f32 %r1801, %r1800, %r1799; 2026-02-21T08:52:59.7772295Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7772429Z cvt.u64.u32 %rd1084, %r1162; 2026-02-21T08:52:59.7772600Z cvt.u64.u32 %rd1085, %r1163; 2026-02-21T08:52:59.7772740Z shl.b64 %rd1086, %rd1085, 32; 2026-02-21T08:52:59.7772992Z or.b64 %rd1087, %rd1084, %rd1086; 2026-02-21T08:52:59.7773459Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7773599Z mov.b64 {%r1802, %r1803}, %rd1087; 2026-02-21T08:52:59.7773746Z cvt.rn.f16x2.f32 %r1804, %r1803, %r1802; 2026-02-21T08:52:59.7774308Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7774462Z cvt.u64.u32 %rd1088, %r1164; 2026-02-21T08:52:59.7774612Z cvt.u64.u32 %rd1089, %r1165; 2026-02-21T08:52:59.7774800Z shl.b64 %rd1090, %rd1089, 32; 2026-02-21T08:52:59.7774989Z or.b64 %rd1091, %rd1088, %rd1090; 2026-02-21T08:52:59.7775442Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7775585Z mov.b64 {%r1805, %r1806}, %rd1091; 2026-02-21T08:52:59.7775786Z cvt.rn.f16x2.f32 %r1807, %r1806, %r1805; 2026-02-21T08:52:59.7776204Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7776342Z cvt.u64.u32 %rd1092, %r1166; 2026-02-21T08:52:59.7776523Z cvt.u64.u32 %rd1093, %r1167; 2026-02-21T08:52:59.7776660Z shl.b64 %rd1094, %rd1093, 32; 2026-02-21T08:52:59.7776805Z or.b64 %rd1095, %rd1092, %rd1094; 2026-02-21T08:52:59.7777341Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7777538Z mov.b64 {%r1808, %r1809}, %rd1095; 2026-02-21T08:52:59.7777702Z cvt.rn.f16x2.f32 %r1810, %r1809, %r1808; 2026-02-21T08:52:59.7778146Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7778329Z cvt.u64.u32 %rd1096, %r1168; 2026-02-21T08:52:59.7778469Z cvt.u64.u32 %rd1097, %r1169; 2026-02-21T08:52:59.7778609Z shl.b64 %rd1098, %rd1097, 32; 2026-02-21T08:52:59.7778796Z or.b64 %rd1099, %rd1096, %rd1098; 2026-02-21T08:52:59.7779239Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7779388Z mov.b64 {%r1811, %r1812}, %rd1099; 2026-02-21T08:52:59.7779559Z cvt.rn.f16x2.f32 %r1813, %r1812, %r1811; 2026-02-21T08:52:59.7780047Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7780187Z cvt.u64.u32 %rd1100, %r1170; 2026-02-21T08:52:59.7780337Z cvt.u64.u32 %rd1101, %r1171; 2026-02-21T08:52:59.7780629Z shl.b64 %rd1102, %rd1101, 32; 2026-02-21T08:52:59.7780748Z or.b64 %rd1103, %rd1100, %rd1102; 2026-02-21T08:52:59.7781219Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7781403Z mov.b64 {%r1814, %r1815}, %rd1103; 2026-02-21T08:52:59.7781566Z cvt.rn.f16x2.f32 %r1816, %r1815, %r1814; 2026-02-21T08:52:59.7781705Z mov.b64 {%r1817, 
%r1818}, %rd495; 2026-02-21T08:52:59.7781861Z cvt.rn.f16x2.f32 %r1819, %r1818, %r1817; 2026-02-21T08:52:59.7782039Z mov.b64 {%r1820, %r1821}, %rd499; 2026-02-21T08:52:59.7782196Z cvt.rn.f16x2.f32 %r1822, %r1821, %r1820; 2026-02-21T08:52:59.7782337Z mov.b64 {%r1823, %r1824}, %rd503; 2026-02-21T08:52:59.7782531Z cvt.rn.f16x2.f32 %r1825, %r1824, %r1823; 2026-02-21T08:52:59.7782676Z mov.b64 {%r1826, %r1827}, %rd507; 2026-02-21T08:52:59.7782848Z cvt.rn.f16x2.f32 %r1828, %r1827, %r1826; 2026-02-21T08:52:59.7783039Z mov.b64 {%r1829, %r1830}, %rd511; 2026-02-21T08:52:59.7783207Z cvt.rn.f16x2.f32 %r1831, %r1830, %r1829; 2026-02-21T08:52:59.7783350Z mov.b64 {%r1832, %r1833}, %rd515; 2026-02-21T08:52:59.7783511Z cvt.rn.f16x2.f32 %r1834, %r1833, %r1832; 2026-02-21T08:52:59.7783687Z mov.b64 {%r1835, %r1836}, %rd519; 2026-02-21T08:52:59.7783844Z cvt.rn.f16x2.f32 %r1837, %r1836, %r1835; 2026-02-21T08:52:59.7783982Z mov.b64 {%r1838, %r1839}, %rd523; 2026-02-21T08:52:59.7784181Z cvt.rn.f16x2.f32 %r1840, %r1839, %r1838; 2026-02-21T08:52:59.7784320Z mov.b64 {%r1841, %r1842}, %rd527; 2026-02-21T08:52:59.7784585Z cvt.rn.f16x2.f32 %r1843, %r1842, %r1841; 2026-02-21T08:52:59.7784785Z mov.b64 {%r1844, %r1845}, %rd531; 2026-02-21T08:52:59.7784982Z cvt.rn.f16x2.f32 %r1846, %r1845, %r1844; 2026-02-21T08:52:59.7785125Z mov.b64 {%r1847, %r1848}, %rd535; 2026-02-21T08:52:59.7785291Z cvt.rn.f16x2.f32 %r1849, %r1848, %r1847; 2026-02-21T08:52:59.7785472Z mov.b64 {%r1850, %r1851}, %rd539; 2026-02-21T08:52:59.7785715Z cvt.rn.f16x2.f32 %r1852, %r1851, %r1850; 2026-02-21T08:52:59.7785872Z mov.b64 {%r1853, %r1854}, %rd543; 2026-02-21T08:52:59.7786075Z cvt.rn.f16x2.f32 %r1855, %r1854, %r1853; 2026-02-21T08:52:59.7786217Z mov.b64 {%r1856, %r1857}, %rd547; 2026-02-21T08:52:59.7786382Z cvt.rn.f16x2.f32 %r1858, %r1857, %r1856; 2026-02-21T08:52:59.7786530Z mov.b64 {%r1859, %r1860}, %rd551; 2026-02-21T08:52:59.7786733Z cvt.rn.f16x2.f32 %r1861, %r1860, %r1859; 2026-02-21T08:52:59.7786884Z mov.b64 {%r1862, 
%r1863}, %rd555; 2026-02-21T08:52:59.7787043Z cvt.rn.f16x2.f32 %r1864, %r1863, %r1862; 2026-02-21T08:52:59.7787521Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7787661Z cvt.u64.u32 %rd1104, %r1207; 2026-02-21T08:52:59.7787815Z cvt.u64.u32 %rd1105, %r1208; 2026-02-21T08:52:59.7787957Z shl.b64 %rd1106, %rd1105, 32; 2026-02-21T08:52:59.7788225Z or.b64 %rd1107, %rd1104, %rd1106; 2026-02-21T08:52:59.7788690Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7788836Z mov.b64 {%r1865, %r1866}, %rd1107; 2026-02-21T08:52:59.7789045Z cvt.rn.f16x2.f32 %r1867, %r1866, %r1865; 2026-02-21T08:52:59.7789487Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7789630Z cvt.u64.u32 %rd1108, %r1209; 2026-02-21T08:52:59.7789808Z cvt.u64.u32 %rd1109, %r1210; 2026-02-21T08:52:59.7789948Z shl.b64 %rd1110, %rd1109, 32; 2026-02-21T08:52:59.7790092Z or.b64 %rd1111, %rd1108, %rd1110; 2026-02-21T08:52:59.7790533Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7790720Z mov.b64 {%r1868, %r1869}, %rd1111; 2026-02-21T08:52:59.7790880Z cvt.rn.f16x2.f32 %r1870, %r1869, %r1868; 2026-02-21T08:52:59.7791310Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7791504Z cvt.u64.u32 %rd1112, %r1211; 2026-02-21T08:52:59.7791654Z cvt.u64.u32 %rd1113, %r1212; 2026-02-21T08:52:59.7791902Z shl.b64 %rd1114, %rd1113, 32; 2026-02-21T08:52:59.7792084Z or.b64 %rd1115, %rd1112, %rd1114; 2026-02-21T08:52:59.7792525Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7792674Z mov.b64 {%r1871, %r1872}, %rd1115; 2026-02-21T08:52:59.7792873Z cvt.rn.f16x2.f32 %r1873, %r1872, %r1871; 2026-02-21T08:52:59.7793295Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7793433Z cvt.u64.u32 
%rd1116, %r1213; 2026-02-21T08:52:59.7793565Z cvt.u64.u32 %rd1117, %r1214; 2026-02-21T08:52:59.7793741Z shl.b64 %rd1118, %rd1117, 32; 2026-02-21T08:52:59.7793880Z or.b64 %rd1119, %rd1116, %rd1118; 2026-02-21T08:52:59.7794312Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7794495Z mov.b64 {%r1874, %r1875}, %rd1119; 2026-02-21T08:52:59.7794662Z cvt.rn.f16x2.f32 %r1876, %r1875, %r1874; 2026-02-21T08:52:59.7795161Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7795337Z cvt.u64.u32 %rd1120, %r1215; 2026-02-21T08:52:59.7795479Z cvt.u64.u32 %rd1121, %r1216; 2026-02-21T08:52:59.7795617Z shl.b64 %rd1122, %rd1121, 32; 2026-02-21T08:52:59.7795760Z or.b64 %rd1123, %rd1120, %rd1122; 2026-02-21T08:52:59.7796224Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7796484Z mov.b64 {%r1877, %r1878}, %rd1123; 2026-02-21T08:52:59.7796650Z cvt.rn.f16x2.f32 %r1879, %r1878, %r1877; 2026-02-21T08:52:59.7797127Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7797281Z cvt.u64.u32 %rd1124, %r1217; 2026-02-21T08:52:59.7797425Z cvt.u64.u32 %rd1125, %r1218; 2026-02-21T08:52:59.7797692Z shl.b64 %rd1126, %rd1125, 32; 2026-02-21T08:52:59.7797846Z or.b64 %rd1127, %rd1124, %rd1126; 2026-02-21T08:52:59.7798301Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7798448Z mov.b64 {%r1880, %r1881}, %rd1127; 2026-02-21T08:52:59.7798651Z cvt.rn.f16x2.f32 %r1882, %r1881, %r1880; 2026-02-21T08:52:59.7799074Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7799218Z cvt.u64.u32 %rd1128, %r1219; 2026-02-21T08:52:59.7799403Z cvt.u64.u32 %rd1129, %r1220; 2026-02-21T08:52:59.7799543Z shl.b64 %rd1130, %rd1129, 32; 2026-02-21T08:52:59.7799685Z or.b64 %rd1131, %rd1128, %rd1130; 2026-02-21T08:52:59.7800169Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7800413Z mov.b64 {%r1883, %r1884}, %rd1131; 2026-02-21T08:52:59.7800590Z cvt.rn.f16x2.f32 %r1885, %r1884, %r1883; 2026-02-21T08:52:59.7801040Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7801221Z cvt.u64.u32 %rd1132, %r1221; 2026-02-21T08:52:59.7801362Z cvt.u64.u32 %rd1133, %r1222; 2026-02-21T08:52:59.7801503Z shl.b64 %rd1134, %rd1133, 32; 2026-02-21T08:52:59.7801683Z or.b64 %rd1135, %rd1132, %rd1134; 2026-02-21T08:52:59.7802122Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7802265Z mov.b64 {%r1886, %r1887}, %rd1135; 2026-02-21T08:52:59.7802468Z cvt.rn.f16x2.f32 %r1888, %r1887, %r1886; 2026-02-21T08:52:59.7802933Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7803085Z cvt.u64.u32 %rd1136, %r1224; 2026-02-21T08:52:59.7803231Z cvt.u64.u32 %rd1137, %r1225; 2026-02-21T08:52:59.7803418Z shl.b64 %rd1138, %rd1137, 32; 2026-02-21T08:52:59.7803562Z or.b64 %rd1139, %rd1136, %rd1138; 2026-02-21T08:52:59.7804022Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7804314Z mov.b64 {%r1889, %r1890}, %rd1139; 2026-02-21T08:52:59.7804476Z cvt.rn.f16x2.f32 %r1891, %r1890, %r1889; 2026-02-21T08:52:59.7804890Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7805041Z cvt.u64.u32 %rd1140, %r1226; 2026-02-21T08:52:59.7805179Z cvt.u64.u32 %rd1141, %r1227; 2026-02-21T08:52:59.7805318Z shl.b64 %rd1142, %rd1141, 32; 2026-02-21T08:52:59.7805463Z or.b64 %rd1143, %rd1140, %rd1142; 2026-02-21T08:52:59.7805903Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7806047Z mov.b64 {%r1892, %r1893}, %rd1143; 2026-02-21T08:52:59.7806210Z cvt.rn.f16x2.f32 %r1894, %r1893, %r1892; 2026-02-21T08:52:59.7806690Z .loc 1 57 52 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7806835Z cvt.u64.u32 %rd1144, %r1228; 2026-02-21T08:52:59.7806965Z cvt.u64.u32 %rd1145, %r1229; 2026-02-21T08:52:59.7807152Z shl.b64 %rd1146, %rd1145, 32; 2026-02-21T08:52:59.7807302Z or.b64 %rd1147, %rd1144, %rd1146; 2026-02-21T08:52:59.7807733Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7807880Z mov.b64 {%r1895, %r1896}, %rd1147; 2026-02-21T08:52:59.7808080Z cvt.rn.f16x2.f32 %r1897, %r1896, %r1895; 2026-02-21T08:52:59.7808648Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7808800Z cvt.u64.u32 %rd1148, %r1230; 2026-02-21T08:52:59.7808993Z cvt.u64.u32 %rd1149, %r1231; 2026-02-21T08:52:59.7809140Z shl.b64 %rd1150, %rd1149, 32; 2026-02-21T08:52:59.7809286Z or.b64 %rd1151, %rd1148, %rd1150; 2026-02-21T08:52:59.7809883Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7810048Z mov.b64 {%r1898, %r1899}, %rd1151; 2026-02-21T08:52:59.7810217Z cvt.rn.f16x2.f32 %r1900, %r1899, %r1898; 2026-02-21T08:52:59.7810646Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7810820Z cvt.u64.u32 %rd1152, %r1232; 2026-02-21T08:52:59.7810960Z cvt.u64.u32 %rd1153, %r1233; 2026-02-21T08:52:59.7811102Z shl.b64 %rd1154, %rd1153, 32; 2026-02-21T08:52:59.7811300Z or.b64 %rd1155, %rd1152, %rd1154; 2026-02-21T08:52:59.7811741Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7811886Z mov.b64 {%r1901, %r1902}, %rd1155; 2026-02-21T08:52:59.7812084Z cvt.rn.f16x2.f32 %r1903, %r1902, %r1901; 2026-02-21T08:52:59.7812619Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7812774Z cvt.u64.u32 %rd1156, %r1234; 2026-02-21T08:52:59.7812923Z cvt.u64.u32 %rd1157, %r1235; 2026-02-21T08:52:59.7813113Z shl.b64 %rd1158, %rd1157, 32; 
2026-02-21T08:52:59.7813255Z or.b64 %rd1159, %rd1156, %rd1158; 2026-02-21T08:52:59.7813688Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7813866Z mov.b64 {%r1904, %r1905}, %rd1159; 2026-02-21T08:52:59.7814030Z cvt.rn.f16x2.f32 %r1906, %r1905, %r1904; 2026-02-21T08:52:59.7814481Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7814725Z cvt.u64.u32 %rd1160, %r1236; 2026-02-21T08:52:59.7814881Z cvt.u64.u32 %rd1161, %r1237; 2026-02-21T08:52:59.7815028Z shl.b64 %rd1162, %rd1161, 32; 2026-02-21T08:52:59.7815175Z or.b64 %rd1163, %rd1160, %rd1162; 2026-02-21T08:52:59.7815660Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7815807Z mov.b64 {%r1907, %r1908}, %rd1163; 2026-02-21T08:52:59.7815971Z cvt.rn.f16x2.f32 %r1909, %r1908, %r1907; 2026-02-21T08:52:59.7816526Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7816661Z cvt.u64.u32 %rd1164, %r1238; 2026-02-21T08:52:59.7816800Z cvt.u64.u32 %rd1165, %r1239; 2026-02-21T08:52:59.7816971Z shl.b64 %rd1166, %rd1165, 32; 2026-02-21T08:52:59.7817108Z or.b64 %rd1167, %rd1164, %rd1166; 2026-02-21T08:52:59.7817533Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7817682Z mov.b64 {%r1910, %r1911}, %rd1167; 2026-02-21T08:52:59.7817882Z cvt.rn.f16x2.f32 %r1912, %r1911, %r1910; 2026-02-21T08:52:59.7818025Z mov.b64 {%r1913, %r1914}, %rd559; 2026-02-21T08:52:59.7818184Z cvt.rn.f16x2.f32 %r1915, %r1914, %r1913; 2026-02-21T08:52:59.7818360Z mov.b64 {%r1916, %r1917}, %rd563; 2026-02-21T08:52:59.7818523Z cvt.rn.f16x2.f32 %r1918, %r1917, %r1916; 2026-02-21T08:52:59.7818667Z mov.b64 {%r1919, %r1920}, %rd567; 2026-02-21T08:52:59.7818856Z cvt.rn.f16x2.f32 %r1921, %r1920, %r1919; 2026-02-21T08:52:59.7818998Z mov.b64 {%r1922, %r1923}, %rd571; 2026-02-21T08:52:59.7819155Z cvt.rn.f16x2.f32 %r1924, %r1923, 
%r1922; 2026-02-21T08:52:59.7819296Z mov.b64 {%r1925, %r1926}, %rd575; 2026-02-21T08:52:59.7819490Z cvt.rn.f16x2.f32 %r1927, %r1926, %r1925; 2026-02-21T08:52:59.7819631Z mov.b64 {%r1928, %r1929}, %rd579; 2026-02-21T08:52:59.7819789Z cvt.rn.f16x2.f32 %r1930, %r1929, %r1928; 2026-02-21T08:52:59.7819963Z mov.b64 {%r1931, %r1932}, %rd583; 2026-02-21T08:52:59.7820235Z cvt.rn.f16x2.f32 %r1933, %r1932, %r1931; 2026-02-21T08:52:59.7820385Z mov.b64 {%r1934, %r1935}, %rd587; 2026-02-21T08:52:59.7820593Z cvt.rn.f16x2.f32 %r1936, %r1935, %r1934; 2026-02-21T08:52:59.7820742Z mov.b64 {%r1937, %r1938}, %rd591; 2026-02-21T08:52:59.7820901Z cvt.rn.f16x2.f32 %r1939, %r1938, %r1937; 2026-02-21T08:52:59.7821045Z mov.b64 {%r1940, %r1941}, %rd595; 2026-02-21T08:52:59.7821325Z cvt.rn.f16x2.f32 %r1942, %r1941, %r1940; 2026-02-21T08:52:59.7821484Z mov.b64 {%r1943, %r1944}, %rd599; 2026-02-21T08:52:59.7821649Z cvt.rn.f16x2.f32 %r1945, %r1944, %r1943; 2026-02-21T08:52:59.7821829Z mov.b64 {%r1946, %r1947}, %rd603; 2026-02-21T08:52:59.7821988Z cvt.rn.f16x2.f32 %r1948, %r1947, %r1946; 2026-02-21T08:52:59.7822126Z mov.b64 {%r1949, %r1950}, %rd607; 2026-02-21T08:52:59.7822283Z cvt.rn.f16x2.f32 %r1951, %r1950, %r1949; 2026-02-21T08:52:59.7822463Z mov.b64 {%r1952, %r1953}, %rd611; 2026-02-21T08:52:59.7822627Z cvt.rn.f16x2.f32 %r1954, %r1953, %r1952; 2026-02-21T08:52:59.7822775Z mov.b64 {%r1955, %r1956}, %rd615; 2026-02-21T08:52:59.7822982Z cvt.rn.f16x2.f32 %r1957, %r1956, %r1955; 2026-02-21T08:52:59.7823131Z mov.b64 {%r1958, %r1959}, %rd619; 2026-02-21T08:52:59.7823296Z cvt.rn.f16x2.f32 %r1960, %r1959, %r1958; 2026-02-21T08:52:59.7823910Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7824068Z cvt.u64.u32 %rd1168, %r1275; 2026-02-21T08:52:59.7824212Z cvt.u64.u32 %rd1169, %r1276; 2026-02-21T08:52:59.7824358Z shl.b64 %rd1170, %rd1169, 32; 2026-02-21T08:52:59.7824541Z or.b64 %rd1171, %rd1168, %rd1170; 2026-02-21T08:52:59.7825045Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7825189Z mov.b64 {%r1961, %r1962}, %rd1171; 2026-02-21T08:52:59.7825388Z cvt.rn.f16x2.f32 %r1963, %r1962, %r1961; 2026-02-21T08:52:59.7825829Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7825979Z cvt.u64.u32 %rd1172, %r1277; 2026-02-21T08:52:59.7826164Z cvt.u64.u32 %rd1173, %r1278; 2026-02-21T08:52:59.7826316Z shl.b64 %rd1174, %rd1173, 32; 2026-02-21T08:52:59.7826466Z or.b64 %rd1175, %rd1172, %rd1174; 2026-02-21T08:52:59.7826920Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7827110Z mov.b64 {%r1964, %r1965}, %rd1175; 2026-02-21T08:52:59.7827382Z cvt.rn.f16x2.f32 %r1966, %r1965, %r1964; 2026-02-21T08:52:59.7827824Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7828001Z cvt.u64.u32 %rd1176, %r1279; 2026-02-21T08:52:59.7828141Z cvt.u64.u32 %rd1177, %r1280; 2026-02-21T08:52:59.7828277Z shl.b64 %rd1178, %rd1177, 32; 2026-02-21T08:52:59.7828459Z or.b64 %rd1179, %rd1176, %rd1178; 2026-02-21T08:52:59.7828901Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7829051Z mov.b64 {%r1967, %r1968}, %rd1179; 2026-02-21T08:52:59.7829213Z cvt.rn.f16x2.f32 %r1969, %r1968, %r1967; 2026-02-21T08:52:59.7829690Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7829839Z cvt.u64.u32 %rd1180, %r1281; 2026-02-21T08:52:59.7829979Z cvt.u64.u32 %rd1181, %r1282; 2026-02-21T08:52:59.7830161Z shl.b64 %rd1182, %rd1181, 32; 2026-02-21T08:52:59.7830307Z or.b64 %rd1183, %rd1180, %rd1182; 2026-02-21T08:52:59.7830736Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7830915Z mov.b64 {%r1970, %r1971}, %rd1183; 2026-02-21T08:52:59.7831074Z cvt.rn.f16x2.f32 %r1972, %r1971, %r1970; 2026-02-21T08:52:59.7831530Z .loc 1 57 52 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7831680Z cvt.u64.u32 %rd1184, %r1283; 2026-02-21T08:52:59.7831979Z cvt.u64.u32 %rd1185, %r1284; 2026-02-21T08:52:59.7832127Z shl.b64 %rd1186, %rd1185, 32; 2026-02-21T08:52:59.7832269Z or.b64 %rd1187, %rd1184, %rd1186; 2026-02-21T08:52:59.7832779Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7832934Z mov.b64 {%r1973, %r1974}, %rd1187; 2026-02-21T08:52:59.7833100Z cvt.rn.f16x2.f32 %r1975, %r1974, %r1973; 2026-02-21T08:52:59.7833657Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7833814Z cvt.u64.u32 %rd1188, %r1285; 2026-02-21T08:52:59.7833958Z cvt.u64.u32 %rd1189, %r1286; 2026-02-21T08:52:59.7834096Z shl.b64 %rd1190, %rd1189, 32; 2026-02-21T08:52:59.7834280Z or.b64 %rd1191, %rd1188, %rd1190; 2026-02-21T08:52:59.7834804Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7834955Z mov.b64 {%r1976, %r1977}, %rd1191; 2026-02-21T08:52:59.7835170Z cvt.rn.f16x2.f32 %r1978, %r1977, %r1976; 2026-02-21T08:52:59.7835614Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7835758Z cvt.u64.u32 %rd1192, %r1287; 2026-02-21T08:52:59.7835935Z cvt.u64.u32 %rd1193, %r1288; 2026-02-21T08:52:59.7836162Z shl.b64 %rd1194, %rd1193, 32; 2026-02-21T08:52:59.7836315Z or.b64 %rd1195, %rd1192, %rd1194; 2026-02-21T08:52:59.7836756Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7836950Z mov.b64 {%r1979, %r1980}, %rd1195; 2026-02-21T08:52:59.7837116Z cvt.rn.f16x2.f32 %r1981, %r1980, %r1979; 2026-02-21T08:52:59.7837577Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7837769Z cvt.u64.u32 %rd1196, %r1289; 2026-02-21T08:52:59.7837916Z cvt.u64.u32 %rd1197, %r1290; 2026-02-21T08:52:59.7838063Z shl.b64 %rd1198, %rd1197, 32; 
2026-02-21T08:52:59.7838253Z or.b64 %rd1199, %rd1196, %rd1198; 2026-02-21T08:52:59.7838696Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7838839Z mov.b64 {%r1982, %r1983}, %rd1199; 2026-02-21T08:52:59.7839000Z cvt.rn.f16x2.f32 %r1984, %r1983, %r1982; 2026-02-21T08:52:59.7839456Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7839598Z cvt.u64.u32 %rd1200, %r1292; 2026-02-21T08:52:59.7839857Z cvt.u64.u32 %rd1201, %r1293; 2026-02-21T08:52:59.7840040Z shl.b64 %rd1202, %rd1201, 32; 2026-02-21T08:52:59.7840182Z or.b64 %rd1203, %rd1200, %rd1202; 2026-02-21T08:52:59.7840610Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7840793Z mov.b64 {%r1985, %r1986}, %rd1203; 2026-02-21T08:52:59.7840950Z cvt.rn.f16x2.f32 %r1987, %r1986, %r1985; 2026-02-21T08:52:59.7841379Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7842333Z [285s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:59.7845207Z Config: @helion.kernel(config=helion.Config(block_sizes=[1024, 64, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T08:52:59.7845555Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:59.7845734Z `ptxas` stderr: 2026-02-21T08:52:59.7846639Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 226 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:52:59.7846971Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:59.7846982Z 2026-02-21T08:52:59.7848029Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp7a78taq0.ptx -o /tmp/tmp7a78taq0.ptx.o 2026-02-21T08:52:59.7848039Z 2026-02-21T08:52:59.7848454Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:52:59.7848621Z cvt.u64.u32 %rd1204, %r1294; 2026-02-21T08:52:59.7848813Z cvt.u64.u32 %rd1205, %r1295; 2026-02-21T08:52:59.7848965Z shl.b64 %rd1206, %rd1205, 32; 2026-02-21T08:52:59.7849115Z or.b64 %rd1207, %rd1204, %rd1206; 2026-02-21T08:52:59.7849620Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7849773Z mov.b64 {%r1988, %r1989}, %rd1207; 2026-02-21T08:52:59.7849940Z cvt.rn.f16x2.f32 %r1990, %r1989, %r1988; 2026-02-21T08:52:59.7850425Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7850565Z cvt.u64.u32 %rd1208, %r1296; 2026-02-21T08:52:59.7850707Z cvt.u64.u32 %rd1209, %r1297; 2026-02-21T08:52:59.7850845Z shl.b64 %rd1210, %rd1209, 32; 2026-02-21T08:52:59.7851118Z or.b64 %rd1211, %rd1208, %rd1210; 2026-02-21T08:52:59.7851585Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7851733Z mov.b64 {%r1991, %r1992}, %rd1211; 2026-02-21T08:52:59.7851942Z cvt.rn.f16x2.f32 %r1993, %r1992, %r1991; 2026-02-21T08:52:59.7852383Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7852524Z cvt.u64.u32 %rd1212, %r1298; 2026-02-21T08:52:59.7852706Z cvt.u64.u32 %rd1213, %r1299; 2026-02-21T08:52:59.7852850Z shl.b64 %rd1214, %rd1213, 32; 2026-02-21T08:52:59.7852997Z or.b64 %rd1215, %rd1212, %rd1214; 2026-02-21T08:52:59.7853438Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7853626Z mov.b64 
{%r1994, %r1995}, %rd1215; 2026-02-21T08:52:59.7853789Z cvt.rn.f16x2.f32 %r1996, %r1995, %r1994; 2026-02-21T08:52:59.7854244Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7854432Z cvt.u64.u32 %rd1216, %r1300; 2026-02-21T08:52:59.7854582Z cvt.u64.u32 %rd1217, %r1301; 2026-02-21T08:52:59.7854891Z shl.b64 %rd1218, %rd1217, 32; 2026-02-21T08:52:59.7855076Z or.b64 %rd1219, %rd1216, %rd1218; 2026-02-21T08:52:59.7855541Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7855687Z mov.b64 {%r1997, %r1998}, %rd1219; 2026-02-21T08:52:59.7855848Z cvt.rn.f16x2.f32 %r1999, %r1998, %r1997; 2026-02-21T08:52:59.7856297Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7856445Z cvt.u64.u32 %rd1220, %r1302; 2026-02-21T08:52:59.7856588Z cvt.u64.u32 %rd1221, %r1303; 2026-02-21T08:52:59.7856766Z shl.b64 %rd1222, %rd1221, 32; 2026-02-21T08:52:59.7856916Z or.b64 %rd1223, %rd1220, %rd1222; 2026-02-21T08:52:59.7857365Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7857550Z mov.b64 {%r2000, %r2001}, %rd1223; 2026-02-21T08:52:59.7857717Z cvt.rn.f16x2.f32 %r2002, %r2001, %r2000; 2026-02-21T08:52:59.7858163Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7858309Z cvt.u64.u32 %rd1224, %r1304; 2026-02-21T08:52:59.7858484Z cvt.u64.u32 %rd1225, %r1305; 2026-02-21T08:52:59.7858627Z shl.b64 %rd1226, %rd1225, 32; 2026-02-21T08:52:59.7858771Z or.b64 %rd1227, %rd1224, %rd1226; 2026-02-21T08:52:59.7859247Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7859502Z mov.b64 {%r2003, %r2004}, %rd1227; 2026-02-21T08:52:59.7859673Z cvt.rn.f16x2.f32 %r2005, %r2004, %r2003; 2026-02-21T08:52:59.7860179Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7860332Z 
cvt.u64.u32 %rd1228, %r1306; 2026-02-21T08:52:59.7860480Z cvt.u64.u32 %rd1229, %r1307; 2026-02-21T08:52:59.7860697Z shl.b64 %rd1230, %rd1229, 32; 2026-02-21T08:52:59.7860902Z or.b64 %rd1231, %rd1228, %rd1230; 2026-02-21T08:52:59.7861342Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7861482Z mov.b64 {%r2006, %r2007}, %rd1231; 2026-02-21T08:52:59.7861677Z cvt.rn.f16x2.f32 %r2008, %r2007, %r2006; 2026-02-21T08:52:59.7861812Z mov.b64 {%r2009, %r2010}, %rd623; 2026-02-21T08:52:59.7861969Z cvt.rn.f16x2.f32 %r2011, %r2010, %r2009; 2026-02-21T08:52:59.7862146Z mov.b64 {%r2012, %r2013}, %rd627; 2026-02-21T08:52:59.7862304Z cvt.rn.f16x2.f32 %r2014, %r2013, %r2012; 2026-02-21T08:52:59.7862441Z mov.b64 {%r2015, %r2016}, %rd631; 2026-02-21T08:52:59.7862595Z cvt.rn.f16x2.f32 %r2017, %r2016, %r2015; 2026-02-21T08:52:59.7862774Z mov.b64 {%r2018, %r2019}, %rd635; 2026-02-21T08:52:59.7862932Z cvt.rn.f16x2.f32 %r2020, %r2019, %r2018; 2026-02-21T08:52:59.7863161Z mov.b64 {%r2021, %r2022}, %rd639; 2026-02-21T08:52:59.7863372Z cvt.rn.f16x2.f32 %r2023, %r2022, %r2021; 2026-02-21T08:52:59.7863517Z mov.b64 {%r2024, %r2025}, %rd643; 2026-02-21T08:52:59.7863684Z cvt.rn.f16x2.f32 %r2026, %r2025, %r2024; 2026-02-21T08:52:59.7863821Z mov.b64 {%r2027, %r2028}, %rd647; 2026-02-21T08:52:59.7864014Z cvt.rn.f16x2.f32 %r2029, %r2028, %r2027; 2026-02-21T08:52:59.7864148Z mov.b64 {%r2030, %r2031}, %rd651; 2026-02-21T08:52:59.7864307Z cvt.rn.f16x2.f32 %r2032, %r2031, %r2030; 2026-02-21T08:52:59.7864486Z mov.b64 {%r2033, %r2034}, %rd655; 2026-02-21T08:52:59.7864645Z cvt.rn.f16x2.f32 %r2035, %r2034, %r2033; 2026-02-21T08:52:59.7864868Z mov.b64 {%r2036, %r2037}, %rd659; 2026-02-21T08:52:59.7865070Z cvt.rn.f16x2.f32 %r2038, %r2037, %r2036; 2026-02-21T08:52:59.7865207Z mov.b64 {%r2039, %r2040}, %rd663; 2026-02-21T08:52:59.7865369Z cvt.rn.f16x2.f32 %r2041, %r2040, %r2039; 2026-02-21T08:52:59.7865513Z mov.b64 {%r2042, %r2043}, %rd667; 
2026-02-21T08:52:59.7865726Z cvt.rn.f16x2.f32 %r2044, %r2043, %r2042; 2026-02-21T08:52:59.7865876Z mov.b64 {%r2045, %r2046}, %rd671; 2026-02-21T08:52:59.7866044Z cvt.rn.f16x2.f32 %r2047, %r2046, %r2045; 2026-02-21T08:52:59.7866318Z mov.b64 {%r2048, %r2049}, %rd675; 2026-02-21T08:52:59.7866482Z cvt.rn.f16x2.f32 %r2050, %r2049, %r2048; 2026-02-21T08:52:59.7866626Z mov.b64 {%r2051, %r2052}, %rd679; 2026-02-21T08:52:59.7866799Z cvt.rn.f16x2.f32 %r2053, %r2052, %r2051; 2026-02-21T08:52:59.7866979Z mov.b64 {%r2054, %r2055}, %rd683; 2026-02-21T08:52:59.7867143Z cvt.rn.f16x2.f32 %r2056, %r2055, %r2054; 2026-02-21T08:52:59.7867577Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7867761Z cvt.u64.u32 %rd1232, %r1343; 2026-02-21T08:52:59.7867905Z cvt.u64.u32 %rd1233, %r1344; 2026-02-21T08:52:59.7868049Z shl.b64 %rd1234, %rd1233, 32; 2026-02-21T08:52:59.7868233Z or.b64 %rd1235, %rd1232, %rd1234; 2026-02-21T08:52:59.7868685Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7868834Z mov.b64 {%r2057, %r2058}, %rd1235; 2026-02-21T08:52:59.7869003Z cvt.rn.f16x2.f32 %r2059, %r2058, %r2057; 2026-02-21T08:52:59.7869480Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7869624Z cvt.u64.u32 %rd1236, %r1345; 2026-02-21T08:52:59.7869765Z cvt.u64.u32 %rd1237, %r1346; 2026-02-21T08:52:59.7869950Z shl.b64 %rd1238, %rd1237, 32; 2026-02-21T08:52:59.7870095Z or.b64 %rd1239, %rd1236, %rd1238; 2026-02-21T08:52:59.7870539Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7870830Z mov.b64 {%r2060, %r2061}, %rd1239; 2026-02-21T08:52:59.7870997Z cvt.rn.f16x2.f32 %r2062, %r2061, %r2060; 2026-02-21T08:52:59.7871454Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7871647Z cvt.u64.u32 %rd1240, %r1347; 2026-02-21T08:52:59.7871799Z cvt.u64.u32 %rd1241, %r1348; 
2026-02-21T08:52:59.7872047Z shl.b64 %rd1242, %rd1241, 32; 2026-02-21T08:52:59.7872209Z or.b64 %rd1243, %rd1240, %rd1242; 2026-02-21T08:52:59.7872700Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7872849Z mov.b64 {%r2063, %r2064}, %rd1243; 2026-02-21T08:52:59.7873009Z cvt.rn.f16x2.f32 %r2065, %r2064, %r2063; 2026-02-21T08:52:59.7873463Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7873606Z cvt.u64.u32 %rd1244, %r1349; 2026-02-21T08:52:59.7873752Z cvt.u64.u32 %rd1245, %r1350; 2026-02-21T08:52:59.7873895Z shl.b64 %rd1246, %rd1245, 32; 2026-02-21T08:52:59.7874083Z or.b64 %rd1247, %rd1244, %rd1246; 2026-02-21T08:52:59.7874524Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7874828Z mov.b64 {%r2066, %r2067}, %rd1247; 2026-02-21T08:52:59.7875044Z cvt.rn.f16x2.f32 %r2068, %r2067, %r2066; 2026-02-21T08:52:59.7875479Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7875626Z cvt.u64.u32 %rd1248, %r1351; 2026-02-21T08:52:59.7875807Z cvt.u64.u32 %rd1249, %r1352; 2026-02-21T08:52:59.7875952Z shl.b64 %rd1250, %rd1249, 32; 2026-02-21T08:52:59.7876093Z or.b64 %rd1251, %rd1248, %rd1250; 2026-02-21T08:52:59.7876532Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7876718Z mov.b64 {%r2069, %r2070}, %rd1251; 2026-02-21T08:52:59.7876893Z cvt.rn.f16x2.f32 %r2071, %r2070, %r2069; 2026-02-21T08:52:59.7877367Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7877556Z cvt.u64.u32 %rd1252, %r1353; 2026-02-21T08:52:59.7877704Z cvt.u64.u32 %rd1253, %r1354; 2026-02-21T08:52:59.7877855Z shl.b64 %rd1254, %rd1253, 32; 2026-02-21T08:52:59.7878048Z or.b64 %rd1255, %rd1252, %rd1254; 2026-02-21T08:52:59.7878496Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 
2026-02-21T08:52:59.7878737Z mov.b64 {%r2072, %r2073}, %rd1255; 2026-02-21T08:52:59.7878935Z cvt.rn.f16x2.f32 %r2074, %r2073, %r2072; 2026-02-21T08:52:59.7879374Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7879519Z cvt.u64.u32 %rd1256, %r1355; 2026-02-21T08:52:59.7879667Z cvt.u64.u32 %rd1257, %r1356; 2026-02-21T08:52:59.7879852Z shl.b64 %rd1258, %rd1257, 32; 2026-02-21T08:52:59.7879996Z or.b64 %rd1259, %rd1256, %rd1258; 2026-02-21T08:52:59.7880441Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7880625Z mov.b64 {%r2075, %r2076}, %rd1259; 2026-02-21T08:52:59.7880792Z cvt.rn.f16x2.f32 %r2077, %r2076, %r2075; 2026-02-21T08:52:59.7881233Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7881413Z cvt.u64.u32 %rd1260, %r1357; 2026-02-21T08:52:59.7881554Z cvt.u64.u32 %rd1261, %r1358; 2026-02-21T08:52:59.7881696Z shl.b64 %rd1262, %rd1261, 32; 2026-02-21T08:52:59.7881836Z or.b64 %rd1263, %rd1260, %rd1262; 2026-02-21T08:52:59.7882305Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7882452Z mov.b64 {%r2078, %r2079}, %rd1263; 2026-02-21T08:52:59.7882633Z cvt.rn.f16x2.f32 %r2080, %r2079, %r2078; 2026-02-21T08:52:59.7883250Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7883397Z cvt.u64.u32 %rd1264, %r1360; 2026-02-21T08:52:59.7883541Z cvt.u64.u32 %rd1265, %r1361; 2026-02-21T08:52:59.7883726Z shl.b64 %rd1266, %rd1265, 32; 2026-02-21T08:52:59.7883876Z or.b64 %rd1267, %rd1264, %rd1266; 2026-02-21T08:52:59.7884385Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7884528Z mov.b64 {%r2081, %r2082}, %rd1267; 2026-02-21T08:52:59.7884788Z cvt.rn.f16x2.f32 %r2083, %r2082, %r2081; 2026-02-21T08:52:59.7885207Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 
2026-02-21T08:52:59.7885348Z cvt.u64.u32 %rd1268, %r1362; 2026-02-21T08:52:59.7885523Z cvt.u64.u32 %rd1269, %r1363; 2026-02-21T08:52:59.7885662Z shl.b64 %rd1270, %rd1269, 32; 2026-02-21T08:52:59.7885801Z or.b64 %rd1271, %rd1268, %rd1270; 2026-02-21T08:52:59.7886271Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7886412Z mov.b64 {%r2084, %r2085}, %rd1271; 2026-02-21T08:52:59.7886568Z cvt.rn.f16x2.f32 %r2086, %r2085, %r2084; 2026-02-21T08:52:59.7887070Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7887259Z cvt.u64.u32 %rd1272, %r1364; 2026-02-21T08:52:59.7887397Z cvt.u64.u32 %rd1273, %r1365; 2026-02-21T08:52:59.7887540Z shl.b64 %rd1274, %rd1273, 32; 2026-02-21T08:52:59.7887721Z or.b64 %rd1275, %rd1272, %rd1274; 2026-02-21T08:52:59.7888150Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7888299Z mov.b64 {%r2087, %r2088}, %rd1275; 2026-02-21T08:52:59.7888511Z cvt.rn.f16x2.f32 %r2089, %r2088, %r2087; 2026-02-21T08:52:59.7888957Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7889105Z cvt.u64.u32 %rd1276, %r1366; 2026-02-21T08:52:59.7889257Z cvt.u64.u32 %rd1277, %r1367; 2026-02-21T08:52:59.7889443Z shl.b64 %rd1278, %rd1277, 32; 2026-02-21T08:52:59.7889589Z or.b64 %rd1279, %rd1276, %rd1278; 2026-02-21T08:52:59.7890023Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7890205Z mov.b64 {%r2090, %r2091}, %rd1279; 2026-02-21T08:52:59.7890364Z cvt.rn.f16x2.f32 %r2092, %r2091, %r2090; 2026-02-21T08:52:59.7890607Z mov.b64 {%r2093, %r2094}, %rd687; 2026-02-21T08:52:59.7890808Z cvt.rn.f16x2.f32 %r2095, %r2094, %r2093; 2026-02-21T08:52:59.7891248Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7891398Z cvt.u64.u32 %rd1280, %r1370; 2026-02-21T08:52:59.7891543Z cvt.u64.u32 %rd1281, 
%r1371; 2026-02-21T08:52:59.7891729Z shl.b64 %rd1282, %rd1281, 32; 2026-02-21T08:52:59.7891872Z or.b64 %rd1283, %rd1280, %rd1282; 2026-02-21T08:52:59.7892316Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7892503Z mov.b64 {%r2096, %r2097}, %rd1283; 2026-02-21T08:52:59.7892665Z cvt.rn.f16x2.f32 %r2098, %r2097, %r2096; 2026-02-21T08:52:59.7893108Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7893293Z cvt.u64.u32 %rd1284, %r1372; 2026-02-21T08:52:59.7893445Z cvt.u64.u32 %rd1285, %r1373; 2026-02-21T08:52:59.7893590Z shl.b64 %rd1286, %rd1285, 32; 2026-02-21T08:52:59.7893736Z or.b64 %rd1287, %rd1284, %rd1286; 2026-02-21T08:52:59.7894237Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7894390Z mov.b64 {%r2099, %r2100}, %rd1287; 2026-02-21T08:52:59.7894557Z cvt.rn.f16x2.f32 %r2101, %r2100, %r2099; 2026-02-21T08:52:59.7895110Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7895358Z cvt.u64.u32 %rd1288, %r1374; 2026-02-21T08:52:59.7895498Z cvt.u64.u32 %rd1289, %r1375; 2026-02-21T08:52:59.7895683Z shl.b64 %rd1290, %rd1289, 32; 2026-02-21T08:52:59.7895830Z or.b64 %rd1291, %rd1288, %rd1290; 2026-02-21T08:52:59.7896256Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7896481Z mov.b64 {%r2102, %r2103}, %rd1291; 2026-02-21T08:52:59.7896692Z cvt.rn.f16x2.f32 %r2104, %r2103, %r2102; 2026-02-21T08:52:59.7896839Z mov.b64 {%r2105, %r2106}, %rd691; 2026-02-21T08:52:59.7897002Z cvt.rn.f16x2.f32 %r2107, %r2106, %r2105; 2026-02-21T08:52:59.7897190Z mov.b64 {%r2108, %r2109}, %rd695; 2026-02-21T08:52:59.7897353Z cvt.rn.f16x2.f32 %r2110, %r2109, %r2108; 2026-02-21T08:52:59.7897500Z mov.b64 {%r2111, %r2112}, %rd699; 2026-02-21T08:52:59.7897704Z cvt.rn.f16x2.f32 %r2113, %r2112, %r2111; 2026-02-21T08:52:59.7897846Z mov.b64 {%r2114, %r2115}, 
%rd703; 2026-02-21T08:52:59.7898017Z cvt.rn.f16x2.f32 %r2116, %r2115, %r2114; 2026-02-21T08:52:59.7898164Z mov.b64 {%r2117, %r2118}, %rd707; 2026-02-21T08:52:59.7898365Z cvt.rn.f16x2.f32 %r2119, %r2118, %r2117; 2026-02-21T08:52:59.7898511Z mov.b64 {%r2120, %r2121}, %rd711; 2026-02-21T08:52:59.7898774Z cvt.rn.f16x2.f32 %r2122, %r2121, %r2120; 2026-02-21T08:52:59.7898959Z mov.b64 {%r2123, %r2124}, %rd715; 2026-02-21T08:52:59.7899127Z cvt.rn.f16x2.f32 %r2125, %r2124, %r2123; 2026-02-21T08:52:59.7899278Z mov.b64 {%r2126, %r2127}, %rd719; 2026-02-21T08:52:59.7899442Z cvt.rn.f16x2.f32 %r2128, %r2127, %r2126; 2026-02-21T08:52:59.7899625Z mov.b64 {%r2129, %r2130}, %rd723; 2026-02-21T08:52:59.7899796Z cvt.rn.f16x2.f32 %r2131, %r2130, %r2129; 2026-02-21T08:52:59.7899940Z mov.b64 {%r2132, %r2133}, %rd727; 2026-02-21T08:52:59.7900147Z cvt.rn.f16x2.f32 %r2134, %r2133, %r2132; 2026-02-21T08:52:59.7900291Z mov.b64 {%r2135, %r2136}, %rd731; 2026-02-21T08:52:59.7900456Z cvt.rn.f16x2.f32 %r2137, %r2136, %r2135; 2026-02-21T08:52:59.7900645Z mov.b64 {%r2138, %r2139}, %rd735; 2026-02-21T08:52:59.7900813Z cvt.rn.f16x2.f32 %r2140, %r2139, %r2138; 2026-02-21T08:52:59.7900955Z mov.b64 {%r2141, %r2142}, %rd739; 2026-02-21T08:52:59.7901113Z cvt.rn.f16x2.f32 %r2143, %r2142, %r2141; 2026-02-21T08:52:59.7901296Z mov.b64 {%r2144, %r2145}, %rd743; 2026-02-21T08:52:59.7901456Z cvt.rn.f16x2.f32 %r2146, %r2145, %r2144; 2026-02-21T08:52:59.7901596Z mov.b64 {%r2147, %r2148}, %rd747; 2026-02-21T08:52:59.7901895Z cvt.rn.f16x2.f32 %r2149, %r2148, %r2147; 2026-02-21T08:52:59.7902036Z mov.b64 {%r2150, %r2151}, %rd751; 2026-02-21T08:52:59.7902196Z cvt.rn.f16x2.f32 %r2152, %r2151, %r2150; 2026-02-21T08:52:59.7902645Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7902832Z cvt.u64.u32 %rd1292, %r1411; 2026-02-21T08:52:59.7902978Z cvt.u64.u32 %rd1293, %r1412; 2026-02-21T08:52:59.7903120Z shl.b64 %rd1294, %rd1293, 32; 2026-02-21T08:52:59.7903309Z or.b64 
%rd1295, %rd1292, %rd1294; 2026-02-21T08:52:59.7903747Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7903890Z mov.b64 {%r2153, %r2154}, %rd1295; 2026-02-21T08:52:59.7904103Z cvt.rn.f16x2.f32 %r2155, %r2154, %r2153; 2026-02-21T08:52:59.7904549Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7904746Z cvt.u64.u32 %rd1296, %r1413; 2026-02-21T08:52:59.7904899Z cvt.u64.u32 %rd1297, %r1414; 2026-02-21T08:52:59.7905082Z shl.b64 %rd1298, %rd1297, 32; 2026-02-21T08:52:59.7905230Z or.b64 %rd1299, %rd1296, %rd1298; 2026-02-21T08:52:59.7905697Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7905889Z mov.b64 {%r2156, %r2157}, %rd1299; 2026-02-21T08:52:59.7906053Z cvt.rn.f16x2.f32 %r2158, %r2157, %r2156; 2026-02-21T08:52:59.7906490Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7906778Z cvt.u64.u32 %rd1300, %r1415; 2026-02-21T08:52:59.7906916Z cvt.u64.u32 %rd1301, %r1416; 2026-02-21T08:52:59.7907051Z shl.b64 %rd1302, %rd1301, 32; 2026-02-21T08:52:59.7907183Z or.b64 %rd1303, %rd1300, %rd1302; 2026-02-21T08:52:59.7907705Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7907855Z mov.b64 {%r2159, %r2160}, %rd1303; 2026-02-21T08:52:59.7908013Z cvt.rn.f16x2.f32 %r2161, %r2160, %r2159; 2026-02-21T08:52:59.7908470Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7908615Z cvt.u64.u32 %rd1304, %r1417; 2026-02-21T08:52:59.7908755Z cvt.u64.u32 %rd1305, %r1418; 2026-02-21T08:52:59.7908928Z shl.b64 %rd1306, %rd1305, 32; 2026-02-21T08:52:59.7909065Z or.b64 %rd1307, %rd1304, %rd1306; 2026-02-21T08:52:59.7909484Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7909662Z mov.b64 {%r2162, %r2163}, %rd1307; 2026-02-21T08:52:59.7909822Z 
cvt.rn.f16x2.f32 %r2164, %r2163, %r2162; 2026-02-21T08:52:59.7910331Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7910480Z cvt.u64.u32 %rd1308, %r1419; 2026-02-21T08:52:59.7910661Z cvt.u64.u32 %rd1309, %r1420; 2026-02-21T08:52:59.7910807Z shl.b64 %rd1310, %rd1309, 32; 2026-02-21T08:52:59.7910953Z or.b64 %rd1311, %rd1308, %rd1310; 2026-02-21T08:52:59.7911453Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7911593Z mov.b64 {%r2165, %r2166}, %rd1311; 2026-02-21T08:52:59.7911758Z cvt.rn.f16x2.f32 %r2167, %r2166, %r2165; 2026-02-21T08:52:59.7912244Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7912391Z cvt.u64.u32 %rd1312, %r1421; 2026-02-21T08:52:59.7912528Z cvt.u64.u32 %rd1313, %r1422; 2026-02-21T08:52:59.7912670Z shl.b64 %rd1314, %rd1313, 32; 2026-02-21T08:52:59.7912847Z or.b64 %rd1315, %rd1312, %rd1314; 2026-02-21T08:52:59.7913281Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7913426Z mov.b64 {%r2168, %r2169}, %rd1315; 2026-02-21T08:52:59.7913628Z cvt.rn.f16x2.f32 %r2170, %r2169, %r2168; 2026-02-21T08:52:59.7914189Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7914335Z cvt.u64.u32 %rd1316, %r1423; 2026-02-21T08:52:59.7914518Z cvt.u64.u32 %rd1317, %r1424; 2026-02-21T08:52:59.7914660Z shl.b64 %rd1318, %rd1317, 32; 2026-02-21T08:52:59.7914880Z or.b64 %rd1319, %rd1316, %rd1318; 2026-02-21T08:52:59.7915314Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7915501Z mov.b64 {%r2171, %r2172}, %rd1319; 2026-02-21T08:52:59.7915661Z cvt.rn.f16x2.f32 %r2173, %r2172, %r2171; 2026-02-21T08:52:59.7916101Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7916284Z cvt.u64.u32 %rd1320, %r1425; 2026-02-21T08:52:59.7916433Z 
cvt.u64.u32 %rd1321, %r1426; 2026-02-21T08:52:59.7916580Z shl.b64 %rd1322, %rd1321, 32; 2026-02-21T08:52:59.7916771Z or.b64 %rd1323, %rd1320, %rd1322; 2026-02-21T08:52:59.7917232Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7917382Z mov.b64 {%r2174, %r2175}, %rd1323; 2026-02-21T08:52:59.7917549Z cvt.rn.f16x2.f32 %r2176, %r2175, %r2174; 2026-02-21T08:52:59.7918021Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7918159Z cvt.u64.u32 %rd1324, %r1428; 2026-02-21T08:52:59.7918294Z cvt.u64.u32 %rd1325, %r1429; 2026-02-21T08:52:59.7918604Z shl.b64 %rd1326, %rd1325, 32; 2026-02-21T08:52:59.7918742Z or.b64 %rd1327, %rd1324, %rd1326; 2026-02-21T08:52:59.7919180Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7919360Z mov.b64 {%r2177, %r2178}, %rd1327; 2026-02-21T08:52:59.7919526Z cvt.rn.f16x2.f32 %r2179, %r2178, %r2177; 2026-02-21T08:52:59.7920040Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7920190Z cvt.u64.u32 %rd1328, %r1430; 2026-02-21T08:52:59.7920372Z cvt.u64.u32 %rd1329, %r1431; 2026-02-21T08:52:59.7920513Z shl.b64 %rd1330, %rd1329, 32; 2026-02-21T08:52:59.7920654Z or.b64 %rd1331, %rd1328, %rd1330; 2026-02-21T08:52:59.7921129Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7921273Z mov.b64 {%r2180, %r2181}, %rd1331; 2026-02-21T08:52:59.7921438Z cvt.rn.f16x2.f32 %r2182, %r2181, %r2180; 2026-02-21T08:52:59.7921934Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7922083Z cvt.u64.u32 %rd1332, %r1432; 2026-02-21T08:52:59.7922230Z cvt.u64.u32 %rd1333, %r1433; 2026-02-21T08:52:59.7922378Z shl.b64 %rd1334, %rd1333, 32; 2026-02-21T08:52:59.7922659Z or.b64 %rd1335, %rd1332, %rd1334; 2026-02-21T08:52:59.7923124Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7923275Z mov.b64 {%r2183, %r2184}, %rd1335; 2026-02-21T08:52:59.7923473Z cvt.rn.f16x2.f32 %r2185, %r2184, %r2183; 2026-02-21T08:52:59.7923887Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7924031Z cvt.u64.u32 %rd1336, %r1434; 2026-02-21T08:52:59.7924211Z cvt.u64.u32 %rd1337, %r1435; 2026-02-21T08:52:59.7924357Z shl.b64 %rd1338, %rd1337, 32; 2026-02-21T08:52:59.7924500Z or.b64 %rd1339, %rd1336, %rd1338; 2026-02-21T08:52:59.7925019Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7925204Z mov.b64 {%r2186, %r2187}, %rd1339; 2026-02-21T08:52:59.7925367Z cvt.rn.f16x2.f32 %r2188, %r2187, %r2186; 2026-02-21T08:52:59.7925517Z mov.b64 {%r2189, %r2190}, %rd755; 2026-02-21T08:52:59.7925714Z cvt.rn.f16x2.f32 %r2191, %r2190, %r2189; 2026-02-21T08:52:59.7925860Z mov.b64 {%r2192, %r2193}, %rd759; 2026-02-21T08:52:59.7926125Z cvt.rn.f16x2.f32 %r2194, %r2193, %r2192; 2026-02-21T08:52:59.7926302Z mov.b64 {%r2195, %r2196}, %rd763; 2026-02-21T08:52:59.7926461Z cvt.rn.f16x2.f32 %r2197, %r2196, %r2195; 2026-02-21T08:52:59.7926600Z mov.b64 {%r2198, %r2199}, %rd767; 2026-02-21T08:52:59.7926758Z cvt.rn.f16x2.f32 %r2200, %r2199, %r2198; 2026-02-21T08:52:59.7926947Z mov.b64 {%r2201, %r2202}, %rd771; 2026-02-21T08:52:59.7927108Z cvt.rn.f16x2.f32 %r2203, %r2202, %r2201; 2026-02-21T08:52:59.7927251Z mov.b64 {%r2204, %r2205}, %rd775; 2026-02-21T08:52:59.7927463Z cvt.rn.f16x2.f32 %r2206, %r2205, %r2204; 2026-02-21T08:52:59.7927609Z mov.b64 {%r2207, %r2208}, %rd779; 2026-02-21T08:52:59.7927773Z cvt.rn.f16x2.f32 %r2209, %r2208, %r2207; 2026-02-21T08:52:59.7927916Z mov.b64 {%r2210, %r2211}, %rd783; 2026-02-21T08:52:59.7928121Z cvt.rn.f16x2.f32 %r2212, %r2211, %r2210; 2026-02-21T08:52:59.7928262Z mov.b64 {%r2213, %r2214}, %rd787; 2026-02-21T08:52:59.7928427Z cvt.rn.f16x2.f32 %r2215, %r2214, %r2213; 
2026-02-21T08:52:59.7928608Z mov.b64 {%r2216, %r2217}, %rd791; 2026-02-21T08:52:59.7928766Z cvt.rn.f16x2.f32 %r2218, %r2217, %r2216; 2026-02-21T08:52:59.7928905Z mov.b64 {%r2219, %r2220}, %rd795; 2026-02-21T08:52:59.7929098Z cvt.rn.f16x2.f32 %r2221, %r2220, %r2219; 2026-02-21T08:52:59.7929236Z mov.b64 {%r2222, %r2223}, %rd799; 2026-02-21T08:52:59.7929382Z cvt.rn.f16x2.f32 %r2224, %r2223, %r2222; 2026-02-21T08:52:59.7929518Z mov.b64 {%r2225, %r2226}, %rd803; 2026-02-21T08:52:59.7929706Z cvt.rn.f16x2.f32 %r2227, %r2226, %r2225; 2026-02-21T08:52:59.7929949Z mov.b64 {%r2228, %r2229}, %rd807; 2026-02-21T08:52:59.7930109Z cvt.rn.f16x2.f32 %r2230, %r2229, %r2228; 2026-02-21T08:52:59.7930283Z mov.b64 {%r2231, %r2232}, %rd811; 2026-02-21T08:52:59.7930441Z cvt.rn.f16x2.f32 %r2233, %r2232, %r2231; 2026-02-21T08:52:59.7930585Z mov.b64 {%r2234, %r2235}, %rd815; 2026-02-21T08:52:59.7930745Z cvt.rn.f16x2.f32 %r2236, %r2235, %r2234; 2026-02-21T08:52:59.7930992Z mov.b64 {%r2237, %r2238}, %rd819; 2026-02-21T08:52:59.7931159Z cvt.rn.f16x2.f32 %r2239, %r2238, %r2237; 2026-02-21T08:52:59.7931301Z mov.b64 {%r2240, %r2241}, %rd823; 2026-02-21T08:52:59.7931497Z cvt.rn.f16x2.f32 %r2242, %r2241, %r2240; 2026-02-21T08:52:59.7931634Z mov.b64 {%r2243, %r2244}, %rd827; 2026-02-21T08:52:59.7931793Z cvt.rn.f16x2.f32 %r2245, %r2244, %r2243; 2026-02-21T08:52:59.7931973Z mov.b64 {%r2246, %r2247}, %rd831; 2026-02-21T08:52:59.7932129Z cvt.rn.f16x2.f32 %r2248, %r2247, %r2246; 2026-02-21T08:52:59.7932573Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7932721Z cvt.u64.u32 %rd1340, %r1479; 2026-02-21T08:52:59.7932904Z cvt.u64.u32 %rd1341, %r1480; 2026-02-21T08:52:59.7933050Z shl.b64 %rd1342, %rd1341, 32; 2026-02-21T08:52:59.7933198Z or.b64 %rd1343, %rd1340, %rd1342; 2026-02-21T08:52:59.7933769Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7933922Z mov.b64 {%r2249, %r2250}, %rd1343; 
2026-02-21T08:52:59.7934093Z cvt.rn.f16x2.f32 %r2251, %r2250, %r2249; 2026-02-21T08:52:59.7934571Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7934770Z cvt.u64.u32 %rd1344, %r1481; 2026-02-21T08:52:59.7934910Z cvt.u64.u32 %rd1345, %r1482; 2026-02-21T08:52:59.7935051Z shl.b64 %rd1346, %rd1345, 32; 2026-02-21T08:52:59.7935232Z or.b64 %rd1347, %rd1344, %rd1346; 2026-02-21T08:52:59.7935678Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7935821Z mov.b64 {%r2252, %r2253}, %rd1347; 2026-02-21T08:52:59.7936025Z cvt.rn.f16x2.f32 %r2254, %r2253, %r2252; 2026-02-21T08:52:59.7936472Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7936617Z cvt.u64.u32 %rd1348, %r1483; 2026-02-21T08:52:59.7936797Z cvt.u64.u32 %rd1349, %r1484; 2026-02-21T08:52:59.7937037Z shl.b64 %rd1350, %rd1349, 32; 2026-02-21T08:52:59.7937178Z or.b64 %rd1351, %rd1348, %rd1350; 2026-02-21T08:52:59.7937610Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7937789Z mov.b64 {%r2255, %r2256}, %rd1351; 2026-02-21T08:52:59.7937951Z cvt.rn.f16x2.f32 %r2257, %r2256, %r2255; 2026-02-21T08:52:59.7938398Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7938593Z cvt.u64.u32 %rd1352, %r1485; 2026-02-21T08:52:59.7938741Z cvt.u64.u32 %rd1353, %r1486; 2026-02-21T08:52:59.7938891Z shl.b64 %rd1354, %rd1353, 32; 2026-02-21T08:52:59.7939076Z or.b64 %rd1355, %rd1352, %rd1354; 2026-02-21T08:52:59.7939526Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7939675Z mov.b64 {%r2258, %r2259}, %rd1355; 2026-02-21T08:52:59.7939845Z cvt.rn.f16x2.f32 %r2260, %r2259, %r2258; 2026-02-21T08:52:59.7940307Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7940451Z cvt.u64.u32 %rd1356, %r1487; 
2026-02-21T08:52:59.7940583Z cvt.u64.u32 %rd1357, %r1488; 2026-02-21T08:52:59.7940759Z shl.b64 %rd1358, %rd1357, 32; 2026-02-21T08:52:59.7940900Z or.b64 %rd1359, %rd1356, %rd1358; 2026-02-21T08:52:59.7941337Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7941622Z mov.b64 {%r2261, %r2262}, %rd1359; 2026-02-21T08:52:59.7941785Z cvt.rn.f16x2.f32 %r2263, %r2262, %r2261; 2026-02-21T08:52:59.7942225Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7942367Z cvt.u64.u32 %rd1360, %r1489; 2026-02-21T08:52:59.7942557Z cvt.u64.u32 %rd1361, %r1490; 2026-02-21T08:52:59.7942699Z shl.b64 %rd1362, %rd1361, 32; 2026-02-21T08:52:59.7942910Z or.b64 %rd1363, %rd1360, %rd1362; 2026-02-21T08:52:59.7943396Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7943541Z mov.b64 {%r2264, %r2265}, %rd1363; 2026-02-21T08:52:59.7943710Z cvt.rn.f16x2.f32 %r2266, %r2265, %r2264; 2026-02-21T08:52:59.7944215Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7944366Z cvt.u64.u32 %rd1364, %r1491; 2026-02-21T08:52:59.7944512Z cvt.u64.u32 %rd1365, %r1492; 2026-02-21T08:52:59.7944660Z shl.b64 %rd1366, %rd1365, 32; 2026-02-21T08:52:59.7944905Z or.b64 %rd1367, %rd1364, %rd1366; 2026-02-21T08:52:59.7945358Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7945503Z mov.b64 {%r2267, %r2268}, %rd1367; 2026-02-21T08:52:59.7945791Z cvt.rn.f16x2.f32 %r2269, %r2268, %r2267; 2026-02-21T08:52:59.7946226Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7946372Z cvt.u64.u32 %rd1368, %r1493; 2026-02-21T08:52:59.7946549Z cvt.u64.u32 %rd1369, %r1494; 2026-02-21T08:52:59.7946691Z shl.b64 %rd1370, %rd1369, 32; 2026-02-21T08:52:59.7946838Z or.b64 %rd1371, %rd1368, %rd1370; 2026-02-21T08:52:59.7947278Z .loc 1 59 27 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7947459Z mov.b64 {%r2270, %r2271}, %rd1371; 2026-02-21T08:52:59.7947621Z cvt.rn.f16x2.f32 %r2272, %r2271, %r2270; 2026-02-21T08:52:59.7948063Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7948245Z cvt.u64.u32 %rd1372, %r1496; 2026-02-21T08:52:59.7948388Z cvt.u64.u32 %rd1373, %r1497; 2026-02-21T08:52:59.7948540Z shl.b64 %rd1374, %rd1373, 32; 2026-02-21T08:52:59.7948728Z or.b64 %rd1375, %rd1372, %rd1374; 2026-02-21T08:52:59.7949168Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7949415Z mov.b64 {%r2273, %r2274}, %rd1375; 2026-02-21T08:52:59.7949586Z cvt.rn.f16x2.f32 %r2275, %r2274, %r2273; 2026-02-21T08:52:59.7950097Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7950245Z cvt.u64.u32 %rd1376, %r1498; 2026-02-21T08:52:59.7950394Z cvt.u64.u32 %rd1377, %r1499; 2026-02-21T08:52:59.7950577Z shl.b64 %rd1378, %rd1377, 32; 2026-02-21T08:52:59.7950726Z or.b64 %rd1379, %rd1376, %rd1378; 2026-02-21T08:52:59.7951163Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7951350Z mov.b64 {%r2276, %r2277}, %rd1379; 2026-02-21T08:52:59.7951505Z cvt.rn.f16x2.f32 %r2278, %r2277, %r2276; 2026-02-21T08:52:59.7951926Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7952106Z cvt.u64.u32 %rd1380, %r1500; 2026-02-21T08:52:59.7952251Z cvt.u64.u32 %rd1381, %r1501; 2026-02-21T08:52:59.7952394Z shl.b64 %rd1382, %rd1381, 32; 2026-02-21T08:52:59.7952537Z or.b64 %rd1383, %rd1380, %rd1382; 2026-02-21T08:52:59.7953006Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7953146Z mov.b64 {%r2279, %r2280}, %rd1383; 2026-02-21T08:52:59.7953307Z cvt.rn.f16x2.f32 %r2281, %r2280, %r2279; 2026-02-21T08:52:59.7953769Z .loc 1 57 52 // 
co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.7954019Z cvt.u64.u32 %rd1384, %r1502; 2026-02-21T08:52:59.7954157Z cvt.u64.u32 %rd1385, %r1503; 2026-02-21T08:52:59.7954300Z shl.b64 %rd1386, %rd1385, 32; 2026-02-21T08:52:59.7954480Z or.b64 %rd1387, %rd1384, %rd1386; 2026-02-21T08:52:59.7954965Z .loc 1 59 27 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:59:27 2026-02-21T08:52:59.7955180Z mov.b64 {%r2282, %r2283}, %rd1387; 2026-02-21T08:52:59.7955388Z cvt.rn.f16x2.f32 %r2284, %r2283, %r2282; 2026-02-21T08:52:59.7955534Z mov.b64 {%r2285, %r2286}, %rd835; 2026-02-21T08:52:59.7955702Z cvt.rn.f16x2.f32 %r2287, %r2286, %r2285; 2026-02-21T08:52:59.7955886Z mov.b64 {%r2288, %r2289}, %rd839; 2026-02-21T08:52:59.7956042Z cvt.rn.f16x2.f32 %r2290, %r2289, %r2288; 2026-02-21T08:52:59.7956186Z mov.b64 {%r2291, %r2292}, %rd843; 2026-02-21T08:52:59.7956356Z cvt.rn.f16x2.f32 %r2293, %r2292, %r2291; 2026-02-21T08:52:59.7956539Z mov.b64 {%r2294, %r2295}, %rd847; 2026-02-21T08:52:59.7956704Z cvt.rn.f16x2.f32 %r2296, %r2295, %r2294; 2026-02-21T08:52:59.7956844Z mov.b64 {%r2297, %r2298}, %rd851; 2026-02-21T08:52:59.7957043Z cvt.rn.f16x2.f32 %r2299, %r2298, %r2297; 2026-02-21T08:52:59.7957180Z mov.b64 {%r2300, %r2301}, %rd855; 2026-02-21T08:52:59.7957423Z cvt.rn.f16x2.f32 %r2302, %r2301, %r2300; 2026-02-21T08:52:59.7957615Z mov.b64 {%r2303, %r2304}, %rd859; 2026-02-21T08:52:59.7957777Z cvt.rn.f16x2.f32 %r2305, %r2304, %r2303; 2026-02-21T08:52:59.7957925Z mov.b64 {%r2306, %r2307}, %rd863; 2026-02-21T08:52:59.7958087Z cvt.rn.f16x2.f32 %r2308, %r2307, %r2306; 2026-02-21T08:52:59.7958269Z mov.b64 {%r2309, %r2310}, %rd867; 2026-02-21T08:52:59.7958430Z cvt.rn.f16x2.f32 %r2311, %r2310, %r2309; 2026-02-21T08:52:59.7958576Z mov.b64 {%r2312, %r2313}, %rd871; 2026-02-21T08:52:59.7958776Z cvt.rn.f16x2.f32 %r2314, %r2313, %r2312; 2026-02-21T08:52:59.7958916Z mov.b64 {%r2315, %r2316}, %rd875; 2026-02-21T08:52:59.7959078Z cvt.rn.f16x2.f32 %r2317, %r2316, 
%r2315; 2026-02-21T08:52:59.7959220Z mov.b64 {%r2318, %r2319}, %rd879; 2026-02-21T08:52:59.7959419Z cvt.rn.f16x2.f32 %r2320, %r2319, %r2318; 2026-02-21T08:52:59.7959564Z mov.b64 {%r2321, %r2322}, %rd883; 2026-02-21T08:52:59.7959726Z cvt.rn.f16x2.f32 %r2323, %r2322, %r2321; 2026-02-21T08:52:59.7959901Z mov.b64 {%r2324, %r2325}, %rd887; 2026-02-21T08:52:59.7960060Z cvt.rn.f16x2.f32 %r2326, %r2325, %r2324; 2026-02-21T08:52:59.7960205Z mov.b64 {%r2327, %r2328}, %rd891; 2026-02-21T08:52:59.7960529Z cvt.rn.f16x2.f32 %r2329, %r2328, %r2327; 2026-02-21T08:52:59.7960679Z mov.b64 {%r2330, %r2331}, %rd895; 2026-02-21T08:52:59.7960844Z cvt.rn.f16x2.f32 %r2332, %r2331, %r2330; 2026-02-21T08:52:59.7960991Z mov.b64 {%r2333, %r2334}, %rd899; 2026-02-21T08:52:59.7961199Z cvt.rn.f16x2.f32 %r2335, %r2334, %r2333; 2026-02-21T08:52:59.7961353Z mov.b64 {%r2336, %r2337}, %rd903; 2026-02-21T08:52:59.7961521Z cvt.rn.f16x2.f32 %r2338, %r2337, %r2336; 2026-02-21T08:52:59.7961704Z mov.b64 {%r2339, %r2340}, %rd907; 2026-02-21T08:52:59.7961868Z cvt.rn.f16x2.f32 %r2341, %r2340, %r2339; 2026-02-21T08:52:59.7962016Z mov.b64 {%r2342, %r2343}, %rd911; 2026-02-21T08:52:59.7962182Z cvt.rn.f16x2.f32 %r2344, %r2343, %r2342; 2026-02-21T08:52:59.7962669Z .loc 1 60 45 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:60:45 2026-02-21T08:52:59.7962908Z st.shared.v4.b32 [%r1570], {%r1579, %r1582, %r1585, %r1588}; 2026-02-21T08:52:59.7963178Z st.shared.v4.b32 [%r1570+16384], {%r1675, %r1678, %r1681, %r1684}; 2026-02-21T08:52:59.7963478Z st.shared.v4.b32 [%r1570+32768], {%r1771, %r1774, %r1777, %r1780}; 2026-02-21T08:52:59.7963741Z st.shared.v4.b32 [%r1570+49152], {%r1867, %r1870, %r1873, %r1876}; 2026-02-21T08:52:59.7964005Z st.shared.v4.b32 [%r1570+65536], {%r1963, %r1966, %r1969, %r1972}; 2026-02-21T08:52:59.7964306Z st.shared.v4.b32 [%r1570+81920], {%r2059, %r2062, %r2065, %r2068}; 2026-02-21T08:52:59.7964567Z st.shared.v4.b32 [%r1570+98304], {%r2155, %r2158, %r2161, %r2164}; 
2026-02-21T08:52:59.7965000Z st.shared.v4.b32 [%r1570+114688], {%r2251, %r2254, %r2257, %r2260}; 2026-02-21T08:52:59.7965277Z st.shared.v4.b32 [%r1569], {%r1591, %r1594, %r1597, %r1600}; 2026-02-21T08:52:59.7965535Z st.shared.v4.b32 [%r1569+16384], {%r1687, %r1690, %r1693, %r1696}; 2026-02-21T08:52:59.7965797Z st.shared.v4.b32 [%r1569+32768], {%r1783, %r1786, %r1789, %r1792}; 2026-02-21T08:52:59.7966169Z st.shared.v4.b32 [%r1569+49152], {%r1879, %r1882, %r1885, %r1888}; 2026-02-21T08:52:59.7966450Z st.shared.v4.b32 [%r1569+65536], {%r1975, %r1978, %r1981, %r1984}; 2026-02-21T08:52:59.7966727Z st.shared.v4.b32 [%r1569+81920], {%r2071, %r2074, %r2077, %r2080}; 2026-02-21T08:52:59.7966994Z st.shared.v4.b32 [%r1569+98304], {%r2167, %r2170, %r2173, %r2176}; 2026-02-21T08:52:59.7967299Z st.shared.v4.b32 [%r1569+114688], {%r2263, %r2266, %r2269, %r2272}; 2026-02-21T08:52:59.7967543Z st.shared.v4.b32 [%r1567], {%r1603, %r1606, %r1609, %r1612}; 2026-02-21T08:52:59.7967807Z st.shared.v4.b32 [%r1567+16384], {%r1699, %r1702, %r1705, %r1708}; 2026-02-21T08:52:59.7968099Z st.shared.v4.b32 [%r1567+32768], {%r1795, %r1798, %r1801, %r1804}; 2026-02-21T08:52:59.7968342Z st.shared.v4.b32 [%r1567+49152], {%r1891, %r1894, %r1897, %r1900}; 2026-02-21T08:52:59.7968684Z st.shared.v4.b32 [%r1567+65536], {%r1987, %r1990, %r1993, %r1996}; 2026-02-21T08:52:59.7968987Z st.shared.v4.b32 [%r1567+81920], {%r2083, %r2086, %r2089, %r2092}; 2026-02-21T08:52:59.7969251Z st.shared.v4.b32 [%r1567+98304], {%r2179, %r2182, %r2185, %r2188}; 2026-02-21T08:52:59.7969518Z st.shared.v4.b32 [%r1567+114688], {%r2275, %r2278, %r2281, %r2284}; 2026-02-21T08:52:59.7969800Z st.shared.v4.b32 [%r1565], {%r1615, %r1618, %r1621, %r1624}; 2026-02-21T08:52:59.7970062Z st.shared.v4.b32 [%r1565+16384], {%r1711, %r1714, %r1717, %r1720}; 2026-02-21T08:52:59.7970317Z st.shared.v4.b32 [%r1565+32768], {%r1807, %r1810, %r1813, %r1816}; 2026-02-21T08:52:59.7970571Z st.shared.v4.b32 [%r1565+49152], {%r1903, %r1906, %r1909, 
%r1912}; 2026-02-21T08:52:59.7970874Z st.shared.v4.b32 [%r1565+65536], {%r1999, %r2002, %r2005, %r2008}; 2026-02-21T08:52:59.7971128Z st.shared.v4.b32 [%r1565+81920], {%r2095, %r2098, %r2101, %r2104}; 2026-02-21T08:52:59.7971383Z st.shared.v4.b32 [%r1565+98304], {%r2191, %r2194, %r2197, %r2200}; 2026-02-21T08:52:59.7971696Z st.shared.v4.b32 [%r1565+114688], {%r2287, %r2290, %r2293, %r2296}; 2026-02-21T08:52:59.7971951Z st.shared.v4.b32 [%r1563], {%r1627, %r1630, %r1633, %r1636}; 2026-02-21T08:52:59.7972328Z st.shared.v4.b32 [%r1563+16384], {%r1723, %r1726, %r1729, %r1732}; 2026-02-21T08:52:59.7972637Z st.shared.v4.b32 [%r1563+32768], {%r1819, %r1822, %r1825, %r1828}; 2026-02-21T08:52:59.7972898Z st.shared.v4.b32 [%r1563+49152], {%r1915, %r1918, %r1921, %r1924}; 2026-02-21T08:52:59.7973153Z st.shared.v4.b32 [%r1563+65536], {%r2011, %r2014, %r2017, %r2020}; 2026-02-21T08:52:59.7973452Z st.shared.v4.b32 [%r1563+81920], {%r2107, %r2110, %r2113, %r2116}; 2026-02-21T08:52:59.7973718Z st.shared.v4.b32 [%r1563+98304], {%r2203, %r2206, %r2209, %r2212}; 2026-02-21T08:52:59.7973967Z st.shared.v4.b32 [%r1563+114688], {%r2299, %r2302, %r2305, %r2308}; 2026-02-21T08:52:59.7974187Z st.shared.v4.b32 [%r1561], {%r1639, %r1642, %r1645, %r1648}; 2026-02-21T08:52:59.7974480Z st.shared.v4.b32 [%r1561+16384], {%r1735, %r1738, %r1741, %r1744}; 2026-02-21T08:52:59.7974794Z st.shared.v4.b32 [%r1561+32768], {%r1831, %r1834, %r1837, %r1840}; 2026-02-21T08:52:59.7975052Z st.shared.v4.b32 [%r1561+49152], {%r1927, %r1930, %r1933, %r1936}; 2026-02-21T08:52:59.7975350Z st.shared.v4.b32 [%r1561+65536], {%r2023, %r2026, %r2029, %r2032}; 2026-02-21T08:52:59.7975606Z st.shared.v4.b32 [%r1561+81920], {%r2119, %r2122, %r2125, %r2128}; 2026-02-21T08:52:59.7975857Z st.shared.v4.b32 [%r1561+98304], {%r2215, %r2218, %r2221, %r2224}; 2026-02-21T08:52:59.7976162Z st.shared.v4.b32 [%r1561+114688], {%r2311, %r2314, %r2317, %r2320}; 2026-02-21T08:52:59.7976388Z st.shared.v4.b32 [%r1559], {%r1651, %r1654, 
%r1657, %r1660}; 2026-02-21T08:52:59.7976742Z st.shared.v4.b32 [%r1559+16384], {%r1747, %r1750, %r1753, %r1756}; 2026-02-21T08:52:59.7977029Z st.shared.v4.b32 [%r1559+32768], {%r1843, %r1846, %r1849, %r1852}; 2026-02-21T08:52:59.7977287Z st.shared.v4.b32 [%r1559+49152], {%r1939, %r1942, %r1945, %r1948}; 2026-02-21T08:52:59.7977550Z st.shared.v4.b32 [%r1559+65536], {%r2035, %r2038, %r2041, %r2044}; 2026-02-21T08:52:59.7977888Z st.shared.v4.b32 [%r1559+81920], {%r2131, %r2134, %r2137, %r2140}; 2026-02-21T08:52:59.7978199Z st.shared.v4.b32 [%r1559+98304], {%r2227, %r2230, %r2233, %r2236}; 2026-02-21T08:52:59.7978458Z st.shared.v4.b32 [%r1559+114688], {%r2323, %r2326, %r2329, %r2332}; 2026-02-21T08:52:59.7978704Z st.shared.v4.b32 [%r1557], {%r1663, %r1666, %r1669, %r1672}; 2026-02-21T08:52:59.7979006Z st.shared.v4.b32 [%r1557+16384], {%r1759, %r1762, %r1765, %r1768}; 2026-02-21T08:52:59.7979261Z st.shared.v4.b32 [%r1557+32768], {%r1855, %r1858, %r1861, %r1864}; 2026-02-21T08:52:59.7979517Z st.shared.v4.b32 [%r1557+49152], {%r1951, %r1954, %r1957, %r1960}; 2026-02-21T08:52:59.7979800Z st.shared.v4.b32 [%r1557+65536], {%r2047, %r2050, %r2053, %r2056}; 2026-02-21T08:52:59.7980057Z st.shared.v4.b32 [%r1557+81920], {%r2143, %r2146, %r2149, %r2152}; 2026-02-21T08:52:59.7980404Z st.shared.v4.b32 [%r1557+98304], {%r2239, %r2242, %r2245, %r2248}; 2026-02-21T08:52:59.7980714Z st.shared.v4.b32 [%r1557+114688], {%r2335, %r2338, %r2341, %r2344}; 2026-02-21T08:52:59.7980864Z // begin inline asm 2026-02-21T08:52:59.7981067Z fence.proxy.async.shared::cta; 2026-02-21T08:52:59.7981208Z // end inline asm 2026-02-21T08:52:59.7981382Z bar.sync 0, 128; 2026-02-21T08:52:59.7981540Z elect.sync %r2345|%p227, -1; 2026-02-21T08:52:59.7981687Z and.pred %p225, %p226, %p227; 2026-02-21T08:52:59.7981867Z shl.b32 %r2346, %r1575, 15; 2026-02-21T08:52:59.7982011Z add.s32 %r1549, %r425, %r2346; 2026-02-21T08:52:59.7982152Z shl.b32 %r2347, %r1575, 8; 2026-02-21T08:52:59.7982293Z or.b32 %r1548, %r2347, 
%r1572; 2026-02-21T08:52:59.7982475Z // begin inline asm 2026-02-21T08:52:59.7982981Z @%p225 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd363, {%r1547, %r1548}], [%r1549]; 2026-02-21T08:52:59.7983120Z // end inline asm 2026-02-21T08:52:59.7983327Z cp.async.bulk.commit_group; 2026-02-21T08:52:59.7983507Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:52:59.7983656Z bar.sync 0, 128; 2026-02-21T08:52:59.7983897Z $L__BB0_16: // %._crit_edge 2026-02-21T08:52:59.7984462Z .loc 1 32 4 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:32:4 2026-02-21T08:52:59.7984602Z bar.sync 0, 128; 2026-02-21T08:52:59.7984803Z // begin inline asm 2026-02-21T08:52:59.7985135Z @%p147 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2348, 512; 2026-02-21T08:52:59.7985266Z // end inline asm 2026-02-21T08:52:59.7985508Z st.shared.v2.b32 [global_smem+278568], {50529027, 50529027}; 2026-02-21T08:52:59.7985686Z barrier.sync 1; 2026-02-21T08:52:59.7985877Z $L__BB0_17: // %common.ret 2026-02-21T08:52:59.7986304Z .loc 1 0 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:0 2026-02-21T08:52:59.7986468Z ret; 2026-02-21T08:52:59.7986698Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:52:59.7986905Z ld.param.b64 %rd89, [_helion_matmul_param_1]; 2026-02-21T08:52:59.7987325Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19 2026-02-21T08:52:59.7987506Z and.b32 %r25, %r1, 15; 2026-02-21T08:52:59.7987660Z mad.wide.u32 %rd1, %r25, 16, %rd89; 2026-02-21T08:52:59.7987808Z bfe.u32 %r26, %r1, 4, 3; 2026-02-21T08:52:59.7988285Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7988430Z or.b32 %r4, %r26, 56; 2026-02-21T08:52:59.7988568Z or.b32 %r5, %r26, 48; 2026-02-21T08:52:59.7988745Z or.b32 %r6, %r26, 40; 2026-02-21T08:52:59.7988997Z or.b32 %r7, %r26, 32; 2026-02-21T08:52:59.7989135Z or.b32 %r8, %r26, 24; 2026-02-21T08:52:59.7989272Z or.b32 %r9, %r26, 16; 2026-02-21T08:52:59.7989447Z or.b32 %r10, %r26, 8; 
2026-02-21T08:52:59.7989587Z bra.uni $L__BB0_2; 2026-02-21T08:52:59.7989842Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.7990323Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7990543Z barrier.sync 1; 2026-02-21T08:52:59.7990690Z barrier.sync 1; 2026-02-21T08:52:59.7990871Z $L__BB0_2: // %.preheader 2026-02-21T08:52:59.7991115Z // =>This Loop Header: Depth=1 2026-02-21T08:52:59.7991324Z // Child Loop BB0_11 Depth 2 2026-02-21T08:52:59.7991532Z // Child Loop BB0_6 Depth 2 2026-02-21T08:52:59.7991998Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19 2026-02-21T08:52:59.7992190Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:52:59.7992331Z barrier.sync 1; 2026-02-21T08:52:59.7992517Z mov.b32 %r28, global_smem; 2026-02-21T08:52:59.7992658Z add.s32 %r29, %r28, %r3; 2026-02-21T08:52:59.7992809Z ld.shared.b8 %r27, [%r29+278564]; 2026-02-21T08:52:59.7993051Z setp.gt.u32 %p3, %r27, 3; 2026-02-21T08:52:59.7993233Z @%p3 bra $L__BB0_4; 2026-02-21T08:52:59.7993418Z // %bb.3: // %.preheader 2026-02-21T08:52:59.7993635Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.7993824Z $L_brx_0: .branchtargets 2026-02-21T08:52:59.7993952Z $L__BB0_5, 2026-02-21T08:52:59.7994096Z $L__BB0_10, 2026-02-21T08:52:59.7994266Z $L__BB0_13, 2026-02-21T08:52:59.7994400Z $L__BB0_17; 2026-02-21T08:52:59.7994548Z brx.idx %r27, $L_brx_0; 2026-02-21T08:52:59.7994874Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.7995375Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.7995522Z add.s32 %r86, %r28, 262144; 2026-02-21T08:52:59.7995707Z ld.shared.b32 %r293, [global_smem+262144]; 2026-02-21T08:52:59.7995920Z ld.shared.b32 %r87, [global_smem+262156]; 2026-02-21T08:52:59.7996057Z barrier.sync 1; 2026-02-21T08:52:59.7996465Z .loc 1 45 45 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:45:45 2026-02-21T08:52:59.7996705Z add.s32 %r88, 
%r1, -128; 2026-02-21T08:52:59.7996889Z shr.u32 %r12, %r88, 5; 2026-02-21T08:52:59.7997028Z and.b32 %r89, %r1, 112; 2026-02-21T08:52:59.7997464Z .loc 1 45 32 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:45:32 2026-02-21T08:52:59.7997652Z add.s32 %r91, %r87, %r26; 2026-02-21T08:52:59.7998083Z .loc 1 56 80 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:56:80 2026-02-21T08:52:59.7998222Z shl.b32 %r92, %r91, 11; 2026-02-21T08:52:59.7998396Z shl.b32 %r93, %r1, 4; 2026-02-21T08:52:59.7998535Z and.b32 %r94, %r93, 112; 2026-02-21T08:52:59.7998670Z shl.b32 %r95, %r89, 3; 2026-02-21T08:52:59.7998806Z shl.b32 %r96, %r1, 10; 2026-02-21T08:52:59.7998989Z and.b32 %r97, %r96, 8192; 2026-02-21T08:52:59.7999130Z or.b32 %r98, %r94, %r95; 2026-02-21T08:52:59.7999265Z xor.b32 %r99, %r98, %r89; 2026-02-21T08:52:59.7999450Z add.s32 %r100, %r86, %r97; 2026-02-21T08:52:59.7999594Z add.s32 %r13, %r100, %r99; 2026-02-21T08:52:59.7999740Z bfe.u32 %r101, %r28, 4, 14; 2026-02-21T08:52:59.7999887Z cvt.u64.u32 %rd100, %r101; 2026-02-21T08:52:59.8000109Z or.b64 %rd196, %rd100, 4611686293842165760; 2026-02-21T08:52:59.8000259Z bfe.u32 %r102, %r86, 4, 14; 2026-02-21T08:52:59.8000407Z cvt.u64.u32 %rd101, %r102; 2026-02-21T08:52:59.8000614Z or.b64 %rd197, %rd101, 4611686293338849280; 2026-02-21T08:52:59.8000759Z add.s32 %r103, %r28, 32; 2026-02-21T08:52:59.8000905Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T08:52:59.8001198Z cvt.u64.u32 %rd102, %r104; 2026-02-21T08:52:59.8001362Z or.b64 %rd198, %rd102, 4611686293842165760; 2026-02-21T08:52:59.8001503Z add.s32 %r105, %r28, 262176; 2026-02-21T08:52:59.8001636Z bfe.u32 %r106, %r105, 4, 14; 2026-02-21T08:52:59.8001817Z cvt.u64.u32 %rd103, %r106; 2026-02-21T08:52:59.8001985Z or.b64 %rd199, %rd103, 4611686293338849280; 2026-02-21T08:52:59.8002129Z add.s32 %r107, %r28, 64; 2026-02-21T08:52:59.8002386Z bfe.u32 %r108, %r107, 4, 14; 2026-02-21T08:52:59.8002542Z cvt.u64.u32 %rd104, %r108; 2026-02-21T08:52:59.8002717Z or.b64 %rd200, 
%rd104, 4611686293842165760; 2026-02-21T08:52:59.8002859Z add.s32 %r109, %r28, 262208; 2026-02-21T08:52:59.8003041Z bfe.u32 %r110, %r109, 4, 14; 2026-02-21T08:52:59.8003186Z cvt.u64.u32 %rd105, %r110; 2026-02-21T08:52:59.8003353Z or.b64 %rd201, %rd105, 4611686293338849280; 2026-02-21T08:52:59.8003531Z add.s32 %r111, %r28, 96; 2026-02-21T08:52:59.8003671Z bfe.u32 %r112, %r111, 4, 14; 2026-02-21T08:52:59.8003816Z cvt.u64.u32 %rd106, %r112; 2026-02-21T08:52:59.8003979Z or.b64 %rd202, %rd106, 4611686293842165760; 2026-02-21T08:52:59.8004159Z add.s32 %r113, %r28, 262240; 2026-02-21T08:52:59.8004303Z bfe.u32 %r114, %r113, 4, 14; 2026-02-21T08:52:59.8004444Z cvt.u64.u32 %rd107, %r114; 2026-02-21T08:52:59.8004823Z or.b64 %rd203, %rd107, 4611686293338849280; 2026-02-21T08:52:59.8004978Z add.s32 %r115, %r28, 131072; 2026-02-21T08:52:59.8005128Z bfe.u32 %r116, %r115, 4, 14; 2026-02-21T08:52:59.8005320Z cvt.u64.u32 %rd108, %r116; 2026-02-21T08:52:59.8005489Z or.b64 %rd204, %rd108, 4611686293842165760; 2026-02-21T08:52:59.8005633Z add.s32 %r117, %r28, 270336; 2026-02-21T08:52:59.8005778Z bfe.u32 %r118, %r117, 4, 14; 2026-02-21T08:52:59.8005962Z cvt.u64.u32 %rd109, %r118; 2026-02-21T08:52:59.8006128Z or.b64 %rd205, %rd109, 4611686293338849280; 2026-02-21T08:52:59.8006270Z add.s32 %r119, %r28, 131104; 2026-02-21T08:52:59.8006453Z bfe.u32 %r120, %r119, 4, 14; 2026-02-21T08:52:59.8006602Z cvt.u64.u32 %rd110, %r120; 2026-02-21T08:52:59.8006765Z or.b64 %rd206, %rd110, 4611686293842165760; 2026-02-21T08:52:59.8006907Z add.s32 %r121, %r28, 270368; 2026-02-21T08:52:59.8007080Z bfe.u32 %r122, %r121, 4, 14; 2026-02-21T08:52:59.8007217Z cvt.u64.u32 %rd111, %r122; 2026-02-21T08:52:59.8007377Z or.b64 %rd207, %rd111, 4611686293338849280; 2026-02-21T08:52:59.8007555Z add.s32 %r123, %r28, 131136; 2026-02-21T08:52:59.8007696Z bfe.u32 %r124, %r123, 4, 14; 2026-02-21T08:52:59.8007946Z cvt.u64.u32 %rd112, %r124; 2026-02-21T08:52:59.8008111Z or.b64 %rd208, %rd112, 4611686293842165760; 
2026-02-21T08:52:59.8008290Z add.s32 %r125, %r28, 270400; 2026-02-21T08:52:59.8008435Z bfe.u32 %r126, %r125, 4, 14; 2026-02-21T08:52:59.8008576Z cvt.u64.u32 %rd113, %r126; 2026-02-21T08:52:59.8008780Z or.b64 %rd209, %rd113, 4611686293338849280; 2026-02-21T08:52:59.8008919Z add.s32 %r127, %r28, 131168; 2026-02-21T08:52:59.8009063Z bfe.u32 %r128, %r127, 4, 14; 2026-02-21T08:52:59.8009242Z cvt.u64.u32 %rd114, %r128; 2026-02-21T08:52:59.8009406Z or.b64 %rd210, %rd114, 4611686293842165760; 2026-02-21T08:52:59.8009544Z add.s32 %r129, %r28, 270432; 2026-02-21T08:52:59.8009682Z bfe.u32 %r130, %r129, 4, 14; 2026-02-21T08:52:59.8009861Z cvt.u64.u32 %rd115, %r130; 2026-02-21T08:52:59.8010024Z or.b64 %rd211, %rd115, 4611686293338849280; 2026-02-21T08:52:59.8010168Z add.s32 %r131, %r28, 16384; 2026-02-21T08:52:59.8010346Z bfe.u32 %r132, %r131, 4, 14; 2026-02-21T08:52:59.8010493Z cvt.u64.u32 %rd116, %r132; 2026-02-21T08:52:59.8010660Z or.b64 %rd212, %rd116, 4611686293842165760; 2026-02-21T08:52:59.8010801Z add.s32 %r133, %r28, 16416; 2026-02-21T08:52:59.8010984Z bfe.u32 %r134, %r133, 4, 14; 2026-02-21T08:52:59.8011131Z cvt.u64.u32 %rd117, %r134; 2026-02-21T08:52:59.8011293Z or.b64 %rd214, %rd117, 4611686293842165760; 2026-02-21T08:52:59.8011471Z add.s32 %r135, %r28, 16448; 2026-02-21T08:52:59.8011621Z bfe.u32 %r136, %r135, 4, 14; 2026-02-21T08:52:59.8011763Z cvt.u64.u32 %rd118, %r136; 2026-02-21T08:52:59.8012069Z or.b64 %rd216, %rd118, 4611686293842165760; 2026-02-21T08:52:59.8012213Z add.s32 %r137, %r28, 16480; 2026-02-21T08:52:59.8012353Z bfe.u32 %r138, %r137, 4, 14; 2026-02-21T08:52:59.8012491Z cvt.u64.u32 %rd119, %r138; 2026-02-21T08:52:59.8012693Z or.b64 %rd218, %rd119, 4611686293842165760; 2026-02-21T08:52:59.8012836Z add.s32 %r139, %r28, 147456; 2026-02-21T08:52:59.8012976Z bfe.u32 %r140, %r139, 4, 14; 2026-02-21T08:52:59.8013236Z cvt.u64.u32 %rd120, %r140; 2026-02-21T08:52:59.8013416Z or.b64 %rd220, %rd120, 4611686293842165760; 2026-02-21T08:52:59.8013557Z add.s32 %r141, 
%r28, 147488; 2026-02-21T08:52:59.8013699Z bfe.u32 %r142, %r141, 4, 14; 2026-02-21T08:52:59.8013888Z cvt.u64.u32 %rd121, %r142; 2026-02-21T08:52:59.8014052Z or.b64 %rd222, %rd121, 4611686293842165760; 2026-02-21T08:52:59.8014192Z add.s32 %r143, %r28, 147520; 2026-02-21T08:52:59.8014375Z bfe.u32 %r144, %r143, 4, 14; 2026-02-21T08:52:59.8014520Z cvt.u64.u32 %rd122, %r144; 2026-02-21T08:52:59.8014768Z or.b64 %rd224, %rd122, 4611686293842165760; 2026-02-21T08:52:59.8014879Z add.s32 %r145, %r28, 147552; 2026-02-21T08:52:59.8015021Z bfe.u32 %r146, %r145, 4, 14; 2026-02-21T08:52:59.8015126Z cvt.u64.u32 %rd123, %r146; 2026-02-21T08:52:59.8015247Z or.b64 %rd226, %rd123, 4611686293842165760; 2026-02-21T08:52:59.8015471Z add.s32 %r147, %r28, 32768; 2026-02-21T08:52:59.8015575Z bfe.u32 %r148, %r147, 4, 14; 2026-02-21T08:52:59.8015684Z cvt.u64.u32 %rd124, %r148; 2026-02-21T08:52:59.8015844Z or.b64 %rd228, %rd124, 4611686293842165760; 2026-02-21T08:52:59.8015950Z add.s32 %r149, %r28, 32800; 2026-02-21T08:52:59.8016065Z bfe.u32 %r150, %r149, 4, 14; 2026-02-21T08:52:59.8016168Z cvt.u64.u32 %rd125, %r150; 2026-02-21T08:52:59.8016321Z or.b64 %rd230, %rd125, 4611686293842165760; 2026-02-21T08:52:59.8016426Z add.s32 %r151, %r28, 32832; 2026-02-21T08:52:59.8016527Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T08:52:59.8016664Z cvt.u64.u32 %rd126, %r152; 2026-02-21T08:52:59.8016786Z or.b64 %rd232, %rd126, 4611686293842165760; 2026-02-21T08:52:59.8016888Z add.s32 %r153, %r28, 32864; 2026-02-21T08:52:59.8016990Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T08:52:59.8017130Z cvt.u64.u32 %rd127, %r154; 2026-02-21T08:52:59.8017249Z or.b64 %rd234, %rd127, 4611686293842165760; 2026-02-21T08:52:59.8017352Z add.s32 %r155, %r28, 163840; 2026-02-21T08:52:59.8017489Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T08:52:59.8017594Z cvt.u64.u32 %rd128, %r156; 2026-02-21T08:52:59.8017789Z or.b64 %rd236, %rd128, 4611686293842165760; 2026-02-21T08:52:59.8017895Z add.s32 %r157, %r28, 163872; 2026-02-21T08:52:59.8018032Z 
bfe.u32 %r158, %r157, 4, 14; 2026-02-21T08:52:59.8018134Z cvt.u64.u32 %rd129, %r158; 2026-02-21T08:52:59.8018250Z or.b64 %rd238, %rd129, 4611686293842165760; 2026-02-21T08:52:59.8018384Z add.s32 %r159, %r28, 163904; 2026-02-21T08:52:59.8018484Z bfe.u32 %r160, %r159, 4, 14; 2026-02-21T08:52:59.8018589Z cvt.u64.u32 %rd130, %r160; 2026-02-21T08:52:59.8018745Z or.b64 %rd240, %rd130, 4611686293842165760; 2026-02-21T08:52:59.8018850Z add.s32 %r161, %r28, 163936; 2026-02-21T08:52:59.8018951Z bfe.u32 %r162, %r161, 4, 14; 2026-02-21T08:52:59.8019053Z cvt.u64.u32 %rd131, %r162; 2026-02-21T08:52:59.8019202Z or.b64 %rd242, %rd131, 4611686293842165760; 2026-02-21T08:52:59.8019307Z add.s32 %r163, %r28, 49152; 2026-02-21T08:52:59.8019413Z bfe.u32 %r164, %r163, 4, 14; 2026-02-21T08:52:59.8019551Z cvt.u64.u32 %rd132, %r164; 2026-02-21T08:52:59.8019668Z or.b64 %rd244, %rd132, 4611686293842165760; 2026-02-21T08:52:59.8019768Z add.s32 %r165, %r28, 49184; 2026-02-21T08:52:59.8019868Z bfe.u32 %r166, %r165, 4, 14; 2026-02-21T08:52:59.8020006Z cvt.u64.u32 %rd133, %r166; 2026-02-21T08:52:59.8020126Z or.b64 %rd246, %rd133, 4611686293842165760; 2026-02-21T08:52:59.8020229Z add.s32 %r167, %r28, 49216; 2026-02-21T08:52:59.8020364Z bfe.u32 %r168, %r167, 4, 14; 2026-02-21T08:52:59.8020468Z cvt.u64.u32 %rd134, %r168; 2026-02-21T08:52:59.8020584Z or.b64 %rd248, %rd134, 4611686293842165760; 2026-02-21T08:52:59.8020778Z add.s32 %r169, %r28, 49248; 2026-02-21T08:52:59.8020884Z bfe.u32 %r170, %r169, 4, 14; 2026-02-21T08:52:59.8020988Z cvt.u64.u32 %rd135, %r170; 2026-02-21T08:52:59.8021107Z or.b64 %rd250, %rd135, 4611686293842165760; 2026-02-21T08:52:59.8021245Z add.s32 %r171, %r28, 180224; 2026-02-21T08:52:59.8021347Z bfe.u32 %r172, %r171, 4, 14; 2026-02-21T08:52:59.8021449Z cvt.u64.u32 %rd136, %r172; 2026-02-21T08:52:59.8021665Z or.b64 %rd252, %rd136, 4611686293842165760; 2026-02-21T08:52:59.8021774Z add.s32 %r173, %r28, 180256; 2026-02-21T08:52:59.8021876Z bfe.u32 %r174, %r173, 4, 14; 
2026-02-21T08:52:59.8021979Z cvt.u64.u32 %rd137, %r174; 2026-02-21T08:52:59.8022130Z or.b64 %rd254, %rd137, 4611686293842165760; 2026-02-21T08:52:59.8022232Z add.s32 %r175, %r28, 180288; 2026-02-21T08:52:59.8022336Z bfe.u32 %r176, %r175, 4, 14; 2026-02-21T08:52:59.8022469Z cvt.u64.u32 %rd138, %r176; 2026-02-21T08:52:59.8022589Z or.b64 %rd256, %rd138, 4611686293842165760; 2026-02-21T08:52:59.8022695Z add.s32 %r177, %r28, 180320; 2026-02-21T08:52:59.8022797Z bfe.u32 %r178, %r177, 4, 14; 2026-02-21T08:52:59.8022934Z cvt.u64.u32 %rd139, %r178; 2026-02-21T08:52:59.8023052Z or.b64 %rd258, %rd139, 4611686293842165760; 2026-02-21T08:52:59.8023153Z add.s32 %r179, %r28, 65536; 2026-02-21T08:52:59.8023327Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T08:52:59.8023434Z cvt.u64.u32 %rd140, %r180; 2026-02-21T08:52:59.8023551Z or.b64 %rd260, %rd140, 4611686293842165760; 2026-02-21T08:52:59.8023687Z add.s32 %r181, %r28, 65568; 2026-02-21T08:52:59.8023791Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T08:52:59.8023893Z cvt.u64.u32 %rd141, %r182; 2026-02-21T08:52:59.8024009Z or.b64 %rd262, %rd141, 4611686293842165760; 2026-02-21T08:52:59.8024142Z add.s32 %r183, %r28, 65600; 2026-02-21T08:52:59.8024243Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T08:52:59.8024346Z cvt.u64.u32 %rd142, %r184; 2026-02-21T08:52:59.8024495Z or.b64 %rd264, %rd142, 4611686293842165760; 2026-02-21T08:52:59.8024612Z add.s32 %r185, %r28, 65632; 2026-02-21T08:52:59.8024788Z bfe.u32 %r186, %r185, 4, 14; 2026-02-21T08:52:59.8024895Z cvt.u64.u32 %rd143, %r186; 2026-02-21T08:52:59.8025048Z or.b64 %rd266, %rd143, 4611686293842165760; 2026-02-21T08:52:59.8025149Z add.s32 %r187, %r28, 196608; 2026-02-21T08:52:59.8025252Z bfe.u32 %r188, %r187, 4, 14; 2026-02-21T08:52:59.8025386Z cvt.u64.u32 %rd144, %r188; 2026-02-21T08:52:59.8025507Z or.b64 %rd268, %rd144, 4611686293842165760; 2026-02-21T08:52:59.8025659Z add.s32 %r189, %r28, 196640; 2026-02-21T08:52:59.8025763Z bfe.u32 %r190, %r189, 4, 14; 2026-02-21T08:52:59.8025899Z cvt.u64.u32 
%rd145, %r190; 2026-02-21T08:52:59.8026013Z or.b64 %rd270, %rd145, 4611686293842165760; 2026-02-21T08:52:59.8026115Z add.s32 %r191, %r28, 196672; 2026-02-21T08:52:59.8026249Z bfe.u32 %r192, %r191, 4, 14; 2026-02-21T08:52:59.8026353Z cvt.u64.u32 %rd146, %r192; 2026-02-21T08:52:59.8026469Z or.b64 %rd272, %rd146, 4611686293842165760; 2026-02-21T08:52:59.8026603Z add.s32 %r193, %r28, 196704; 2026-02-21T08:52:59.8026705Z bfe.u32 %r194, %r193, 4, 14; 2026-02-21T08:52:59.8026811Z cvt.u64.u32 %rd147, %r194; 2026-02-21T08:52:59.8026928Z or.b64 %rd274, %rd147, 4611686293842165760; 2026-02-21T08:52:59.8027063Z add.s32 %r195, %r28, 81920; 2026-02-21T08:52:59.8027168Z bfe.u32 %r196, %r195, 4, 14; 2026-02-21T08:52:59.8027273Z cvt.u64.u32 %rd148, %r196; 2026-02-21T08:52:59.8027427Z or.b64 %rd276, %rd148, 4611686293842165760; 2026-02-21T08:52:59.8027532Z add.s32 %r197, %r28, 81952; 2026-02-21T08:52:59.8027633Z bfe.u32 %r198, %r197, 4, 14; 2026-02-21T08:52:59.8027740Z cvt.u64.u32 %rd149, %r198; 2026-02-21T08:52:59.8027898Z or.b64 %rd278, %rd149, 4611686293842165760; 2026-02-21T08:52:59.8027999Z add.s32 %r199, %r28, 81984; 2026-02-21T08:52:59.8028103Z bfe.u32 %r200, %r199, 4, 14; 2026-02-21T08:52:59.8028238Z cvt.u64.u32 %rd150, %r200; 2026-02-21T08:52:59.8028355Z or.b64 %rd280, %rd150, 4611686293842165760; 2026-02-21T08:52:59.8028504Z add.s32 %r201, %r28, 82016; 2026-02-21T08:52:59.8028609Z bfe.u32 %r202, %r201, 4, 14; 2026-02-21T08:52:59.8028743Z cvt.u64.u32 %rd151, %r202; 2026-02-21T08:52:59.8028859Z or.b64 %rd282, %rd151, 4611686293842165760; 2026-02-21T08:52:59.8028961Z add.s32 %r203, %r28, 212992; 2026-02-21T08:52:59.8029098Z bfe.u32 %r204, %r203, 4, 14; 2026-02-21T08:52:59.8029206Z cvt.u64.u32 %rd152, %r204; 2026-02-21T08:52:59.8029369Z or.b64 %rd284, %rd152, 4611686293842165760; 2026-02-21T08:52:59.8029514Z add.s32 %r205, %r28, 213024; 2026-02-21T08:52:59.8029617Z bfe.u32 %r206, %r205, 4, 14; 2026-02-21T08:52:59.8029721Z cvt.u64.u32 %rd153, %r206; 2026-02-21T08:52:59.8029837Z 
or.b64 %rd286, %rd153, 4611686293842165760; 2026-02-21T08:52:59.8029973Z add.s32 %r207, %r28, 213056; 2026-02-21T08:52:59.8030076Z bfe.u32 %r208, %r207, 4, 14; 2026-02-21T08:52:59.8030181Z cvt.u64.u32 %rd154, %r208; 2026-02-21T08:52:59.8030336Z or.b64 %rd288, %rd154, 4611686293842165760; 2026-02-21T08:52:59.8030440Z add.s32 %r209, %r28, 213088; 2026-02-21T08:52:59.8030542Z bfe.u32 %r210, %r209, 4, 14; 2026-02-21T08:52:59.8030647Z cvt.u64.u32 %rd155, %r210; 2026-02-21T08:52:59.8030800Z or.b64 %rd290, %rd155, 4611686293842165760; 2026-02-21T08:52:59.8030903Z add.s32 %r211, %r28, 98304; 2026-02-21T08:52:59.8031005Z bfe.u32 %r212, %r211, 4, 14; 2026-02-21T08:52:59.8031184Z cvt.u64.u32 %rd156, %r212; 2026-02-21T08:52:59.8031306Z or.b64 %rd292, %rd156, 4611686293842165760; 2026-02-21T08:52:59.8031410Z add.s32 %r213, %r28, 98336; 2026-02-21T08:52:59.8031551Z bfe.u32 %r214, %r213, 4, 14; 2026-02-21T08:52:59.8031657Z cvt.u64.u32 %rd157, %r214; 2026-02-21T08:52:59.8031772Z or.b64 %rd294, %rd157, 4611686293842165760; 2026-02-21T08:52:59.8031877Z add.s32 %r215, %r28, 98368; 2026-02-21T08:52:59.8032021Z bfe.u32 %r216, %r215, 4, 14; 2026-02-21T08:52:59.8032125Z cvt.u64.u32 %rd158, %r216; 2026-02-21T08:52:59.8032242Z or.b64 %rd296, %rd158, 4611686293842165760; 2026-02-21T08:52:59.8032373Z add.s32 %r217, %r28, 98400; 2026-02-21T08:52:59.8032479Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T08:52:59.8032579Z cvt.u64.u32 %rd159, %r218; 2026-02-21T08:52:59.8032694Z or.b64 %rd298, %rd159, 4611686293842165760; 2026-02-21T08:52:59.8032832Z add.s32 %r219, %r28, 229376; 2026-02-21T08:52:59.8032930Z bfe.u32 %r220, %r219, 4, 14; 2026-02-21T08:52:59.8033033Z cvt.u64.u32 %rd160, %r220; 2026-02-21T08:52:59.8033182Z or.b64 %rd300, %rd160, 4611686293842165760; 2026-02-21T08:52:59.8033286Z add.s32 %r221, %r28, 229408; 2026-02-21T08:52:59.8033426Z bfe.u32 %r222, %r221, 4, 14; 2026-02-21T08:52:59.8033529Z cvt.u64.u32 %rd161, %r222; 2026-02-21T08:52:59.8033678Z or.b64 %rd302, %rd161, 4611686293842165760; 
2026-02-21T08:52:59.8033778Z add.s32 %r223, %r28, 229440; 2026-02-21T08:52:59.8033879Z bfe.u32 %r224, %r223, 4, 14; 2026-02-21T08:52:59.8034017Z cvt.u64.u32 %rd162, %r224; 2026-02-21T08:52:59.8034134Z or.b64 %rd304, %rd162, 4611686293842165760; 2026-02-21T08:52:59.8034235Z add.s32 %r225, %r28, 229472; 2026-02-21T08:52:59.8034373Z bfe.u32 %r226, %r225, 4, 14; 2026-02-21T08:52:59.8034475Z cvt.u64.u32 %rd163, %r226; 2026-02-21T08:52:59.8034594Z or.b64 %rd306, %rd163, 4611686293842165760; 2026-02-21T08:52:59.8034777Z add.s32 %r227, %r28, 114688; 2026-02-21T08:52:59.8034919Z bfe.u32 %r228, %r227, 4, 14; 2026-02-21T08:52:59.8035024Z cvt.u64.u32 %rd164, %r228; 2026-02-21T08:52:59.8035143Z or.b64 %rd308, %rd164, 4611686293842165760; 2026-02-21T08:52:59.8035281Z add.s32 %r229, %r28, 114720; 2026-02-21T08:52:59.8035387Z bfe.u32 %r230, %r229, 4, 14; 2026-02-21T08:52:59.8035496Z cvt.u64.u32 %rd165, %r230; 2026-02-21T08:52:59.8035616Z or.b64 %rd310, %rd165, 4611686293842165760; 2026-02-21T08:52:59.8035752Z add.s32 %r231, %r28, 114752; 2026-02-21T08:52:59.8035852Z bfe.u32 %r232, %r231, 4, 14; 2026-02-21T08:52:59.8035955Z cvt.u64.u32 %rd166, %r232; 2026-02-21T08:52:59.8036103Z or.b64 %rd312, %rd166, 4611686293842165760; 2026-02-21T08:52:59.8036205Z add.s32 %r233, %r28, 114784; 2026-02-21T08:52:59.8036361Z bfe.u32 %r234, %r233, 4, 14; 2026-02-21T08:52:59.8036466Z cvt.u64.u32 %rd167, %r234; 2026-02-21T08:52:59.8036626Z or.b64 %rd314, %rd167, 4611686293842165760; 2026-02-21T08:52:59.8036723Z add.s32 %r235, %r28, 245760; 2026-02-21T08:52:59.8036825Z bfe.u32 %r236, %r235, 4, 14; 2026-02-21T08:52:59.8036962Z cvt.u64.u32 %rd168, %r236; 2026-02-21T08:52:59.8037079Z or.b64 %rd316, %rd168, 4611686293842165760; 2026-02-21T08:52:59.8037245Z add.s32 %r237, %r28, 245792; 2026-02-21T08:52:59.8037388Z bfe.u32 %r238, %r237, 4, 14; 2026-02-21T08:52:59.8037492Z cvt.u64.u32 %rd169, %r238; 2026-02-21T08:52:59.8037609Z or.b64 %rd318, %rd169, 4611686293842165760; 2026-02-21T08:52:59.8037707Z add.s32 
%r239, %r28, 245824; 2026-02-21T08:52:59.8037846Z bfe.u32 %r240, %r239, 4, 14; 2026-02-21T08:52:59.8037948Z cvt.u64.u32 %rd170, %r240; 2026-02-21T08:52:59.8038064Z or.b64 %rd320, %rd170, 4611686293842165760; 2026-02-21T08:52:59.8038198Z add.s32 %r241, %r28, 245856; 2026-02-21T08:52:59.8038301Z bfe.u32 %r242, %r241, 4, 14; 2026-02-21T08:52:59.8038406Z cvt.u64.u32 %rd171, %r242; 2026-02-21T08:52:59.8038520Z or.b64 %rd322, %rd171, 4611686293842165760; 2026-02-21T08:52:59.8038846Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.8038953Z add.s32 %r243, %r4, %r87; 2026-02-21T08:52:59.8039100Z shl.b32 %r244, %r243, 11; 2026-02-21T08:52:59.8039257Z mad.wide.s32 %rd74, %r244, 2, %rd1; 2026-02-21T08:52:59.8039364Z add.s32 %r245, %r5, %r87; 2026-02-21T08:52:59.8039464Z shl.b32 %r246, %r245, 11; 2026-02-21T08:52:59.8039608Z mad.wide.s32 %rd75, %r246, 2, %rd1; 2026-02-21T08:52:59.8039710Z add.s32 %r247, %r6, %r87; 2026-02-21T08:52:59.8039810Z shl.b32 %r248, %r247, 11; 2026-02-21T08:52:59.8039924Z mad.wide.s32 %rd76, %r248, 2, %rd1; 2026-02-21T08:52:59.8040062Z add.s32 %r249, %r7, %r87; 2026-02-21T08:52:59.8040164Z shl.b32 %r250, %r249, 11; 2026-02-21T08:52:59.8040280Z mad.wide.s32 %rd77, %r250, 2, %rd1; 2026-02-21T08:52:59.8040424Z add.s32 %r251, %r8, %r87; 2026-02-21T08:52:59.8040527Z shl.b32 %r252, %r251, 11; 2026-02-21T08:52:59.8040636Z mad.wide.s32 %rd78, %r252, 2, %rd1; 2026-02-21T08:52:59.8040738Z add.s32 %r253, %r9, %r87; 2026-02-21T08:52:59.8040875Z shl.b32 %r254, %r253, 11; 2026-02-21T08:52:59.8040985Z mad.wide.s32 %rd79, %r254, 2, %rd1; 2026-02-21T08:52:59.8041093Z add.s32 %r255, %r10, %r87; 2026-02-21T08:52:59.8041234Z shl.b32 %r256, %r255, 11; 2026-02-21T08:52:59.8041346Z mad.wide.s32 %rd80, %r256, 2, %rd1; 2026-02-21T08:52:59.8041511Z mad.wide.s32 %rd81, %r92, 2, %rd1; 2026-02-21T08:52:59.8041618Z mov.pred %p229, 0; 2026-02-21T08:52:59.8041754Z mov.b32 %r2349, 0; 2026-02-21T08:52:59.8041864Z mov.b64 %rd1389, -128; 
2026-02-21T08:52:59.8041964Z mov.b64 %rd1388, 0; 2026-02-21T08:52:59.8042102Z bra.uni $L__BB0_6; 2026-02-21T08:52:59.8042271Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:52:59.8042558Z .loc 1 55 31 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:55:31 2026-02-21T08:52:59.8042698Z xor.b32 %r2349, %r2349, 1; 2026-02-21T08:52:59.8042997Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.8043107Z add.s64 %rd1389, %rd1389, 128; 2026-02-21T08:52:59.8043217Z add.s64 %rd1388, %rd1388, 256; 2026-02-21T08:52:59.8043368Z setp.lt.u64 %p146, %rd1389, 1920; 2026-02-21T08:52:59.8043478Z mov.pred %p229, -1; 2026-02-21T08:52:59.8043584Z @%p146 bra $L__BB0_6; 2026-02-21T08:52:59.8043715Z bra.uni $L__BB0_9; 2026-02-21T08:52:59.8043881Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:59.8044043Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:59.8044366Z .loc 1 56 34 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:56:34 2026-02-21T08:52:59.8044477Z add.s64 %rd173, %rd81, %rd1388; 2026-02-21T08:52:59.8044624Z add.s64 %rd176, %rd80, %rd1388; 2026-02-21T08:52:59.8044781Z add.s64 %rd179, %rd79, %rd1388; 2026-02-21T08:52:59.8044923Z add.s64 %rd182, %rd78, %rd1388; 2026-02-21T08:52:59.8045027Z add.s64 %rd185, %rd77, %rd1388; 2026-02-21T08:52:59.8045130Z add.s64 %rd188, %rd76, %rd1388; 2026-02-21T08:52:59.8045268Z add.s64 %rd191, %rd75, %rd1388; 2026-02-21T08:52:59.8045601Z .loc 1 56 87 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:56:87 2026-02-21T08:52:59.8045710Z add.s64 %rd194, %rd74, %rd1388; 2026-02-21T08:52:59.8045843Z // begin inline asm 2026-02-21T08:52:59.8045944Z mov.u64 %rd172, 0x0; 2026-02-21T08:52:59.8046133Z createpolicy.fractional.L2::evict_last.b64 %rd172, 1.0; 2026-02-21T08:52:59.8046233Z // end inline asm 2026-02-21T08:52:59.8046367Z // begin inline asm 2026-02-21T08:52:59.8046467Z mov.u32 %r257, 0x0; 2026-02-21T08:52:59.8046565Z mov.u32 %r258, 0x0; 
2026-02-21T08:52:59.8046694Z mov.u32 %r259, 0x0; 2026-02-21T08:52:59.8046796Z mov.u32 %r260, 0x0; 2026-02-21T08:52:59.8047095Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r257, %r258, %r259, %r260 }, [ %rd173 + 0 ], %rd172; 2026-02-21T08:52:59.8047195Z // end inline asm 2026-02-21T08:52:59.8047333Z // begin inline asm 2026-02-21T08:52:59.8047434Z mov.u64 %rd175, 0x0; 2026-02-21T08:52:59.8047662Z createpolicy.fractional.L2::evict_last.b64 %rd175, 1.0; 2026-02-21T08:52:59.8047798Z // end inline asm 2026-02-21T08:52:59.8047901Z // begin inline asm 2026-02-21T08:52:59.8048001Z mov.u32 %r261, 0x0; 2026-02-21T08:52:59.8048097Z mov.u32 %r262, 0x0; 2026-02-21T08:52:59.8048231Z mov.u32 %r263, 0x0; 2026-02-21T08:52:59.8048329Z mov.u32 %r264, 0x0; 2026-02-21T08:52:59.8048619Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r261, %r262, %r263, %r264 }, [ %rd176 + 0 ], %rd175; 2026-02-21T08:52:59.8048749Z // end inline asm 2026-02-21T08:52:59.8048848Z // begin inline asm 2026-02-21T08:52:59.8048948Z mov.u64 %rd178, 0x0; 2026-02-21T08:52:59.8049159Z createpolicy.fractional.L2::evict_last.b64 %rd178, 1.0; 2026-02-21T08:52:59.8049260Z // end inline asm 2026-02-21T08:52:59.8049358Z // begin inline asm 2026-02-21T08:52:59.8049452Z mov.u32 %r265, 0x0; 2026-02-21T08:52:59.8049581Z mov.u32 %r266, 0x0; 2026-02-21T08:52:59.8049677Z mov.u32 %r267, 0x0; 2026-02-21T08:52:59.8049771Z mov.u32 %r268, 0x0; 2026-02-21T08:52:59.8050086Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r265, %r266, %r267, %r268 }, [ %rd179 + 0 ], %rd178; 2026-02-21T08:52:59.8070160Z // end inline asm 2026-02-21T08:52:59.8070536Z // begin inline asm 2026-02-21T08:52:59.8070625Z mov.u64 %rd181, 0x0; 2026-02-21T08:52:59.8070837Z createpolicy.fractional.L2::evict_last.b64 %rd181, 1.0; 2026-02-21T08:52:59.8070916Z // end inline asm 2026-02-21T08:52:59.8070994Z // begin inline asm 2026-02-21T08:52:59.8071070Z mov.u32 %r269, 0x0; 2026-02-21T08:52:59.8071153Z mov.u32 %r270, 0x0; 2026-02-21T08:52:59.8071226Z 
mov.u32 %r271, 0x0; 2026-02-21T08:52:59.8071300Z mov.u32 %r272, 0x0; 2026-02-21T08:52:59.8071619Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r269, %r270, %r271, %r272 }, [ %rd182 + 0 ], %rd181; 2026-02-21T08:52:59.8071697Z // end inline asm 2026-02-21T08:52:59.8071773Z // begin inline asm 2026-02-21T08:52:59.8071848Z mov.u64 %rd184, 0x0; 2026-02-21T08:52:59.8072022Z createpolicy.fractional.L2::evict_last.b64 %rd184, 1.0; 2026-02-21T08:52:59.8072101Z // end inline asm 2026-02-21T08:52:59.8072177Z // begin inline asm 2026-02-21T08:52:59.8072261Z mov.u32 %r273, 0x0; 2026-02-21T08:52:59.8072336Z mov.u32 %r274, 0x0; 2026-02-21T08:52:59.8072407Z mov.u32 %r275, 0x0; 2026-02-21T08:52:59.8072478Z mov.u32 %r276, 0x0; 2026-02-21T08:52:59.8072766Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r273, %r274, %r275, %r276 }, [ %rd185 + 0 ], %rd184; 2026-02-21T08:52:59.8072842Z // end inline asm 2026-02-21T08:52:59.8072917Z // begin inline asm 2026-02-21T08:52:59.8073003Z mov.u64 %rd187, 0x0; 2026-02-21T08:52:59.8073163Z createpolicy.fractional.L2::evict_last.b64 %rd187, 1.0; 2026-02-21T08:52:59.8073321Z // end inline asm 2026-02-21T08:52:59.8073409Z // begin inline asm 2026-02-21T08:52:59.8073484Z mov.u32 %r277, 0x0; 2026-02-21T08:52:59.8073557Z mov.u32 %r278, 0x0; 2026-02-21T08:52:59.8073629Z mov.u32 %r279, 0x0; 2026-02-21T08:52:59.8073710Z mov.u32 %r280, 0x0; 2026-02-21T08:52:59.8073970Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r277, %r278, %r279, %r280 }, [ %rd188 + 0 ], %rd187; 2026-02-21T08:52:59.8074043Z // end inline asm 2026-02-21T08:52:59.8074175Z // begin inline asm 2026-02-21T08:52:59.8074257Z mov.u64 %rd190, 0x0; 2026-02-21T08:52:59.8074415Z createpolicy.fractional.L2::evict_last.b64 %rd190, 1.0; 2026-02-21T08:52:59.8074500Z // end inline asm 2026-02-21T08:52:59.8074576Z // begin inline asm 2026-02-21T08:52:59.8074649Z mov.u32 %r281, 0x0; 2026-02-21T08:52:59.8074798Z mov.u32 %r282, 0x0; 2026-02-21T08:52:59.8074886Z mov.u32 %r283, 0x0; 
2026-02-21T08:52:59.8074960Z mov.u32 %r284, 0x0; 2026-02-21T08:52:59.8075221Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r281, %r282, %r283, %r284 }, [ %rd191 + 0 ], %rd190; 2026-02-21T08:52:59.8075310Z // end inline asm 2026-02-21T08:52:59.8075386Z // begin inline asm 2026-02-21T08:52:59.8075461Z mov.u64 %rd193, 0x0; 2026-02-21T08:52:59.8075614Z createpolicy.fractional.L2::evict_last.b64 %rd193, 1.0; 2026-02-21T08:52:59.8075697Z // end inline asm 2026-02-21T08:52:59.8075831Z // begin inline asm 2026-02-21T08:52:59.8075907Z mov.u32 %r285, 0x0; 2026-02-21T08:52:59.8075994Z mov.u32 %r286, 0x0; 2026-02-21T08:52:59.8076068Z mov.u32 %r287, 0x0; 2026-02-21T08:52:59.8076139Z mov.u32 %r288, 0x0; 2026-02-21T08:52:59.8076410Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r285, %r286, %r287, %r288 }, [ %rd194 + 0 ], %rd193; 2026-02-21T08:52:59.8076482Z // end inline asm 2026-02-21T08:52:59.8076556Z bar.sync 2, 128; 2026-02-21T08:52:59.8076685Z st.shared.v4.b32 [%r13], {%r257, %r258, %r259, %r260}; 2026-02-21T08:52:59.8076846Z st.shared.v4.b32 [%r13+1024], {%r261, %r262, %r263, %r264}; 2026-02-21T08:52:59.8076989Z st.shared.v4.b32 [%r13+2048], {%r265, %r266, %r267, %r268}; 2026-02-21T08:52:59.8077127Z st.shared.v4.b32 [%r13+3072], {%r269, %r270, %r271, %r272}; 2026-02-21T08:52:59.8077273Z st.shared.v4.b32 [%r13+4096], {%r273, %r274, %r275, %r276}; 2026-02-21T08:52:59.8077409Z st.shared.v4.b32 [%r13+5120], {%r277, %r278, %r279, %r280}; 2026-02-21T08:52:59.8077545Z st.shared.v4.b32 [%r13+6144], {%r281, %r282, %r283, %r284}; 2026-02-21T08:52:59.8077694Z st.shared.v4.b32 [%r13+7168], {%r285, %r286, %r287, %r288}; 2026-02-21T08:52:59.8078036Z .loc 1 55 31 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:55:31 2026-02-21T08:52:59.8078121Z add.s32 %r289, %r28, 278544; 2026-02-21T08:52:59.8078200Z // begin inline asm 2026-02-21T08:52:59.8078281Z 2026-02-21T08:52:59.8078347Z { 2026-02-21T08:52:59.8078432Z .reg .pred complete; 2026-02-21T08:52:59.8078512Z waitLoop: 
2026-02-21T08:52:59.8078696Z mbarrier.try_wait.parity.shared.b64 complete, [%r289], %r2349; 2026-02-21T08:52:59.8078789Z @!complete bra.uni waitLoop; 2026-02-21T08:52:59.8078851Z } 2026-02-21T08:52:59.8078869Z 2026-02-21T08:52:59.8078941Z // end inline asm 2026-02-21T08:52:59.8079214Z .loc 1 57 52 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:57:52 2026-02-21T08:52:59.8079287Z // begin inline asm 2026-02-21T08:52:59.8079396Z fence.proxy.async.shared::cta; 2026-02-21T08:52:59.8079468Z // end inline asm 2026-02-21T08:52:59.8079540Z bar.sync 2, 128; 2026-02-21T08:52:59.8079661Z shfl.sync.idx.b32 %r292, %r12, 0, 31, -1; 2026-02-21T08:52:59.8079747Z setp.ne.b32 %p13, %r292, 0; 2026-02-21T08:52:59.8079822Z @%p13 bra $L__BB0_8; 2026-02-21T08:52:59.8079963Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:52:59.8080065Z setp.eq.b64 %p144, %rd1388, 3840; 2026-02-21T08:52:59.8080157Z elect.sync %r421|%p15, -1; 2026-02-21T08:52:59.8080234Z mov.b32 %r294, 135266320; 2026-02-21T08:52:59.8080320Z // begin inline asm 2026-02-21T08:52:59.8080614Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd196, %rd197, %r294, %p229; 2026-02-21T08:52:59.8080690Z // end inline asm 2026-02-21T08:52:59.8080785Z mov.pred %p16, -1; 2026-02-21T08:52:59.8080860Z // begin inline asm 2026-02-21T08:52:59.8081077Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd198, %rd199, %r294, %p16; 2026-02-21T08:52:59.8081151Z // end inline asm 2026-02-21T08:52:59.8081237Z // begin inline asm 2026-02-21T08:52:59.8081499Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd200, %rd201, %r294, %p16; 2026-02-21T08:52:59.8081576Z // end inline asm 2026-02-21T08:52:59.8081700Z // begin inline asm 2026-02-21T08:52:59.8081906Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd202, %rd203, %r294, %p16; 2026-02-21T08:52:59.8081980Z // end inline asm 2026-02-21T08:52:59.8082067Z // begin inline asm 2026-02-21T08:52:59.8082272Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ 
%r293 + 0 ], %rd204, %rd205, %r294, %p16; 2026-02-21T08:52:59.8082346Z // end inline asm 2026-02-21T08:52:59.8082421Z // begin inline asm 2026-02-21T08:52:59.8082633Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd206, %rd207, %r294, %p16; 2026-02-21T08:52:59.8082704Z // end inline asm 2026-02-21T08:52:59.8082779Z // begin inline asm 2026-02-21T08:52:59.8083028Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd208, %rd209, %r294, %p16; 2026-02-21T08:52:59.8083104Z // end inline asm 2026-02-21T08:52:59.8083182Z // begin inline asm 2026-02-21T08:52:59.8083400Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 0 ], %rd210, %rd211, %r294, %p16; 2026-02-21T08:52:59.8083472Z // end inline asm 2026-02-21T08:52:59.8083546Z // begin inline asm 2026-02-21T08:52:59.8083776Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd212, %rd197, %r294, %p229; 2026-02-21T08:52:59.8083849Z // end inline asm 2026-02-21T08:52:59.8083925Z // begin inline asm 2026-02-21T08:52:59.8084133Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd214, %rd199, %r294, %p16; 2026-02-21T08:52:59.8084219Z // end inline asm 2026-02-21T08:52:59.8084295Z // begin inline asm 2026-02-21T08:52:59.8084503Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd216, %rd201, %r294, %p16; 2026-02-21T08:52:59.8084585Z // end inline asm 2026-02-21T08:52:59.8084662Z // begin inline asm 2026-02-21T08:52:59.8084963Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd218, %rd203, %r294, %p16; 2026-02-21T08:52:59.8085052Z // end inline asm 2026-02-21T08:52:59.8085382Z // begin inline asm 2026-02-21T08:52:59.8085588Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd220, %rd205, %r294, %p16; 2026-02-21T08:52:59.8085663Z // end inline asm 2026-02-21T08:52:59.8085751Z // begin inline asm 2026-02-21T08:52:59.8085959Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd222, %rd207, %r294, %p16; 2026-02-21T08:52:59.8086033Z // end inline asm 
2026-02-21T08:52:59.8086119Z // begin inline asm 2026-02-21T08:52:59.8086324Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd224, %rd209, %r294, %p16; 2026-02-21T08:52:59.8086398Z // end inline asm 2026-02-21T08:52:59.8086485Z // begin inline asm 2026-02-21T08:52:59.8086687Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 64 ], %rd226, %rd211, %r294, %p16; 2026-02-21T08:52:59.8086761Z // end inline asm 2026-02-21T08:52:59.8086838Z // begin inline asm 2026-02-21T08:52:59.8087069Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd228, %rd197, %r294, %p229; 2026-02-21T08:52:59.8087146Z // end inline asm 2026-02-21T08:52:59.8087220Z // begin inline asm 2026-02-21T08:52:59.8087444Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd230, %rd199, %r294, %p16; 2026-02-21T08:52:59.8087517Z // end inline asm 2026-02-21T08:52:59.8087591Z // begin inline asm 2026-02-21T08:52:59.8087812Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd232, %rd201, %r294, %p16; 2026-02-21T08:52:59.8087885Z // end inline asm 2026-02-21T08:52:59.8087958Z // begin inline asm 2026-02-21T08:52:59.8089312Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd234, %rd203, %r294, %p16; 2026-02-21T08:52:59.8089388Z // end inline asm 2026-02-21T08:52:59.8089462Z // begin inline asm 2026-02-21T08:52:59.8089676Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd236, %rd205, %r294, %p16; 2026-02-21T08:52:59.8089758Z // end inline asm 2026-02-21T08:52:59.8089832Z // begin inline asm 2026-02-21T08:52:59.8090089Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd238, %rd207, %r294, %p16; 2026-02-21T08:52:59.8090176Z // end inline asm 2026-02-21T08:52:59.8090250Z // begin inline asm 2026-02-21T08:52:59.8090466Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd240, %rd209, %r294, %p16; 2026-02-21T08:52:59.8090548Z // end inline asm 2026-02-21T08:52:59.8090621Z // begin inline asm 2026-02-21T08:52:59.8090833Z @%p15 
tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 128 ], %rd242, %rd211, %r294, %p16; 2026-02-21T08:52:59.8090904Z // end inline asm 2026-02-21T08:52:59.8090992Z // begin inline asm 2026-02-21T08:52:59.8091210Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd244, %rd197, %r294, %p229; 2026-02-21T08:52:59.8091280Z // end inline asm 2026-02-21T08:52:59.8091364Z // begin inline asm 2026-02-21T08:52:59.8091628Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd246, %rd199, %r294, %p16; 2026-02-21T08:52:59.8091706Z // end inline asm 2026-02-21T08:52:59.8091793Z // begin inline asm 2026-02-21T08:52:59.8092010Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd248, %rd201, %r294, %p16; 2026-02-21T08:52:59.8092080Z // end inline asm 2026-02-21T08:52:59.8092179Z // begin inline asm 2026-02-21T08:52:59.8092389Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd250, %rd203, %r294, %p16; 2026-02-21T08:52:59.8092463Z // end inline asm 2026-02-21T08:52:59.8092537Z // begin inline asm 2026-02-21T08:52:59.8092754Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd252, %rd205, %r294, %p16; 2026-02-21T08:52:59.8092825Z // end inline asm 2026-02-21T08:52:59.8092899Z // begin inline asm 2026-02-21T08:52:59.8093115Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd254, %rd207, %r294, %p16; 2026-02-21T08:52:59.8093187Z // end inline asm 2026-02-21T08:52:59.8093262Z // begin inline asm 2026-02-21T08:52:59.8093480Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd256, %rd209, %r294, %p16; 2026-02-21T08:52:59.8093555Z // end inline asm 2026-02-21T08:52:59.8093697Z // begin inline asm 2026-02-21T08:52:59.8093909Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 192 ], %rd258, %rd211, %r294, %p16; 2026-02-21T08:52:59.8093991Z // end inline asm 2026-02-21T08:52:59.8094071Z // begin inline asm 2026-02-21T08:52:59.8094286Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd260, %rd197, %r294, %p229; 
2026-02-21T08:52:59.8094372Z // end inline asm 2026-02-21T08:52:59.8094448Z // begin inline asm 2026-02-21T08:52:59.8094657Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd262, %rd199, %r294, %p16; 2026-02-21T08:52:59.8094811Z // end inline asm 2026-02-21T08:52:59.8094891Z // begin inline asm 2026-02-21T08:52:59.8095099Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd264, %rd201, %r294, %p16; 2026-02-21T08:52:59.8095182Z // end inline asm 2026-02-21T08:52:59.8095258Z // begin inline asm 2026-02-21T08:52:59.8095470Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd266, %rd203, %r294, %p16; 2026-02-21T08:52:59.8095545Z // end inline asm 2026-02-21T08:52:59.8095629Z // begin inline asm 2026-02-21T08:52:59.8095851Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd268, %rd205, %r294, %p16; 2026-02-21T08:52:59.8095921Z // end inline asm 2026-02-21T08:52:59.8096006Z // begin inline asm 2026-02-21T08:52:59.8096212Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd270, %rd207, %r294, %p16; 2026-02-21T08:52:59.8096285Z // end inline asm 2026-02-21T08:52:59.8096371Z // begin inline asm 2026-02-21T08:52:59.8096648Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd272, %rd209, %r294, %p16; 2026-02-21T08:52:59.8096723Z // end inline asm 2026-02-21T08:52:59.8096801Z // begin inline asm 2026-02-21T08:52:59.8097029Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 256 ], %rd274, %rd211, %r294, %p16; 2026-02-21T08:52:59.8097103Z // end inline asm 2026-02-21T08:52:59.8097178Z // begin inline asm 2026-02-21T08:52:59.8097457Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd276, %rd197, %r294, %p229; 2026-02-21T08:52:59.8097535Z // end inline asm 2026-02-21T08:52:59.8097610Z // begin inline asm 2026-02-21T08:52:59.8097836Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd278, %rd199, %r294, %p16; 2026-02-21T08:52:59.8097910Z // end inline asm 2026-02-21T08:52:59.8097986Z // begin 
inline asm 2026-02-21T08:52:59.8098212Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd280, %rd201, %r294, %p16; 2026-02-21T08:52:59.8098285Z // end inline asm 2026-02-21T08:52:59.8098361Z // begin inline asm 2026-02-21T08:52:59.8098569Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd282, %rd203, %r294, %p16; 2026-02-21T08:52:59.8098653Z // end inline asm 2026-02-21T08:52:59.8098725Z // begin inline asm 2026-02-21T08:52:59.8098982Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd284, %rd205, %r294, %p16; 2026-02-21T08:52:59.8099072Z // end inline asm 2026-02-21T08:52:59.8099151Z // begin inline asm 2026-02-21T08:52:59.8099368Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd286, %rd207, %r294, %p16; 2026-02-21T08:52:59.8099455Z // end inline asm 2026-02-21T08:52:59.8099531Z // begin inline asm 2026-02-21T08:52:59.8099742Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd288, %rd209, %r294, %p16; 2026-02-21T08:52:59.8099815Z // end inline asm 2026-02-21T08:52:59.8099898Z // begin inline asm 2026-02-21T08:52:59.8100111Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 320 ], %rd290, %rd211, %r294, %p16; 2026-02-21T08:52:59.8100187Z // end inline asm 2026-02-21T08:52:59.8100270Z // begin inline asm 2026-02-21T08:52:59.8100488Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd292, %rd197, %r294, %p229; 2026-02-21T08:52:59.8100560Z // end inline asm 2026-02-21T08:52:59.8100642Z // begin inline asm 2026-02-21T08:52:59.8100854Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd294, %rd199, %r294, %p16; 2026-02-21T08:52:59.8100928Z // end inline asm 2026-02-21T08:52:59.8101002Z // begin inline asm 2026-02-21T08:52:59.8101334Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd296, %rd201, %r294, %p16; 2026-02-21T08:52:59.8101410Z // end inline asm 2026-02-21T08:52:59.8101487Z // begin inline asm 2026-02-21T08:52:59.8101709Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 
384 ], %rd298, %rd203, %r294, %p16; 2026-02-21T08:52:59.8101782Z // end inline asm 2026-02-21T08:52:59.8101856Z // begin inline asm 2026-02-21T08:52:59.8102071Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd300, %rd205, %r294, %p16; 2026-02-21T08:52:59.8102146Z // end inline asm 2026-02-21T08:52:59.8102220Z // begin inline asm 2026-02-21T08:52:59.8102440Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd302, %rd207, %r294, %p16; 2026-02-21T08:52:59.8102513Z // end inline asm 2026-02-21T08:52:59.8102584Z // begin inline asm 2026-02-21T08:52:59.8102798Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd304, %rd209, %r294, %p16; 2026-02-21T08:52:59.8102879Z // end inline asm 2026-02-21T08:52:59.8102956Z // begin inline asm 2026-02-21T08:52:59.8103169Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 384 ], %rd306, %rd211, %r294, %p16; 2026-02-21T08:52:59.8103249Z // end inline asm 2026-02-21T08:52:59.8103323Z // begin inline asm 2026-02-21T08:52:59.8103539Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd308, %rd197, %r294, %p229; 2026-02-21T08:52:59.8103621Z // end inline asm 2026-02-21T08:52:59.8103694Z // begin inline asm 2026-02-21T08:52:59.8103946Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd310, %rd199, %r294, %p16; 2026-02-21T08:52:59.8104034Z // end inline asm 2026-02-21T08:52:59.8104110Z // begin inline asm 2026-02-21T08:52:59.8104316Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd312, %rd201, %r294, %p16; 2026-02-21T08:52:59.8104389Z // end inline asm 2026-02-21T08:52:59.8104474Z // begin inline asm 2026-02-21T08:52:59.8104790Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd314, %rd203, %r294, %p16; 2026-02-21T08:52:59.8104869Z // end inline asm 2026-02-21T08:52:59.8104956Z // begin inline asm 2026-02-21T08:52:59.8105173Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd316, %rd205, %r294, %p16; 2026-02-21T08:52:59.8105246Z // end inline asm 
2026-02-21T08:52:59.8105330Z // begin inline asm 2026-02-21T08:52:59.8105549Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd318, %rd207, %r294, %p16; 2026-02-21T08:52:59.8105621Z // end inline asm 2026-02-21T08:52:59.8105699Z // begin inline asm 2026-02-21T08:52:59.8105916Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd320, %rd209, %r294, %p16; 2026-02-21T08:52:59.8105990Z // end inline asm 2026-02-21T08:52:59.8106063Z // begin inline asm 2026-02-21T08:52:59.8106322Z @%p15 tcgen05.mma.cta_group::1.kind::f16 [ %r293 + 448 ], %rd322, %rd211, %r294, %p16; 2026-02-21T08:52:59.8106397Z // end inline asm 2026-02-21T08:52:59.8106484Z add.s32 %r423, %r28, 278528; 2026-02-21T08:52:59.8106582Z cvt.u64.u32 %rd324, %r423; 2026-02-21T08:52:59.8106659Z // begin inline asm 2026-02-21T08:52:59.8106851Z @%p15 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd324]; 2026-02-21T08:52:59.8106922Z // end inline asm 2026-02-21T08:52:59.8107018Z and.pred %p143, %p144, %p15; 2026-02-21T08:52:59.8107097Z add.s32 %r424, %r28, 278560; 2026-02-21T08:52:59.8107178Z cvt.u64.u32 %rd325, %r424; 2026-02-21T08:52:59.8107264Z // begin inline asm 2026-02-21T08:52:59.8107459Z @%p143 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd325]; 2026-02-21T08:52:59.8107534Z // end inline asm 2026-02-21T08:52:59.8107618Z bra.uni $L__BB0_8; 2026-02-21T08:52:59.8107765Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.8108043Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.8108152Z ld.shared.b32 %r16, [global_smem+262152]; 2026-02-21T08:52:59.8108243Z barrier.sync 1; 2026-02-21T08:52:59.8108568Z .loc 1 21 67 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:21:67 2026-02-21T08:52:59.8108650Z mov.u32 %r31, %ctaid.x; 2026-02-21T08:52:59.8108743Z mov.u32 %r32, %ctaid.y; 2026-02-21T08:52:59.8108820Z mov.u32 %r33, %ctaid.z; 2026-02-21T08:52:59.8108899Z mov.u32 %r34, %nctaid.x; 
2026-02-21T08:52:59.8108979Z mov.u32 %r35, %nctaid.y; 2026-02-21T08:52:59.8109080Z mad.lo.s32 %r36, %r33, %r35, %r32; 2026-02-21T08:52:59.8109163Z mad.lo.s32 %r37, %r36, %r34, %r31; 2026-02-21T08:52:59.8109244Z shl.b32 %r38, %r37, 8; 2026-02-21T08:52:59.8109339Z cvt.s64.s32 %rd92, %r38; 2026-02-21T08:52:59.8109418Z add.s64 %rd93, %rd91, %rd92; 2026-02-21T08:52:59.8109506Z cvta.global.u64 %rd94, %rd93; 2026-02-21T08:52:59.8109599Z add.s32 %r17, %r1, -256; 2026-02-21T08:52:59.8109676Z shr.u32 %r18, %r17, 5; 2026-02-21T08:52:59.8109751Z mov.b32 %r2350, 0; 2026-02-21T08:52:59.8109827Z mov.b32 %r2351, %r2350; 2026-02-21T08:52:59.8109984Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:59.8110123Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:59.8110391Z .loc 1 0 67 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:0:67 2026-02-21T08:52:59.8110488Z setp.lt.u32 %p9, %r17, 64; 2026-02-21T08:52:59.8110574Z setp.eq.b32 %p4, %r17, 0; 2026-02-21T08:52:59.8110847Z .loc 1 55 31 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:55:31 2026-02-21T08:52:59.8110988Z add.s32 %r39, %r28, 278528; 2026-02-21T08:52:59.8111065Z // begin inline asm 2026-02-21T08:52:59.8111133Z 2026-02-21T08:52:59.8111198Z { 2026-02-21T08:52:59.8111288Z .reg .pred complete; 2026-02-21T08:52:59.8111356Z waitLoop: 2026-02-21T08:52:59.8111556Z mbarrier.try_wait.parity.shared.b64 complete, [%r39], %r2350; 2026-02-21T08:52:59.8111643Z @!complete bra.uni waitLoop; 2026-02-21T08:52:59.8111705Z } 2026-02-21T08:52:59.8111749Z 2026-02-21T08:52:59.8111826Z // end inline asm 2026-02-21T08:52:59.8111910Z bar.sync 3, 64; 2026-02-21T08:52:59.8111987Z add.s32 %r41, %r28, 278544; 2026-02-21T08:52:59.8112061Z // begin inline asm 2026-02-21T08:52:59.8112232Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r41], 262144; 2026-02-21T08:52:59.8112304Z // end inline asm 2026-02-21T08:52:59.8112378Z bar.sync 3, 64; 2026-02-21T08:52:59.8112478Z shfl.sync.idx.b32 %r59, %r18, 0, 31, -1; 
2026-02-21T08:52:59.8112573Z elect.sync %r60|%p10, -1; 2026-02-21T08:52:59.8112661Z and.pred %p5, %p9, %p10; 2026-02-21T08:52:59.8112739Z shl.b32 %r61, %r59, 14; 2026-02-21T08:52:59.8112824Z and.b32 %r62, %r61, 114688; 2026-02-21T08:52:59.8112899Z shl.b32 %r63, %r62, 1; 2026-02-21T08:52:59.8112976Z add.s32 %r42, %r28, %r63; 2026-02-21T08:52:59.8113051Z shl.b32 %r64, %r59, 8; 2026-02-21T08:52:59.8113173Z and.b32 %r65, %r64, 768; 2026-02-21T08:52:59.8113252Z shl.b32 %r66, %r59, 4; 2026-02-21T08:52:59.8113327Z and.b32 %r67, %r66, 64; 2026-02-21T08:52:59.8113413Z add.s32 %r43, %r2351, %r67; 2026-02-21T08:52:59.8113488Z add.s32 %r44, %r65, %r16; 2026-02-21T08:52:59.8113562Z // begin inline asm 2026-02-21T08:52:59.8113962Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r42], [%rd94, {%r43, %r44}], [%r41]; 2026-02-21T08:52:59.8114034Z // end inline asm 2026-02-21T08:52:59.8114107Z add.s32 %r68, %r59, 2; 2026-02-21T08:52:59.8114180Z shl.b32 %r69, %r68, 15; 2026-02-21T08:52:59.8114264Z and.b32 %r70, %r69, 229376; 2026-02-21T08:52:59.8114342Z add.s32 %r46, %r28, %r70; 2026-02-21T08:52:59.8114417Z shl.b32 %r71, %r68, 8; 2026-02-21T08:52:59.8114499Z and.b32 %r72, %r71, 768; 2026-02-21T08:52:59.8114573Z shl.b32 %r73, %r68, 4; 2026-02-21T08:52:59.8114646Z and.b32 %r74, %r73, 64; 2026-02-21T08:52:59.8114784Z add.s32 %r47, %r2351, %r74; 2026-02-21T08:52:59.8114875Z add.s32 %r48, %r72, %r16; 2026-02-21T08:52:59.8114948Z // begin inline asm 2026-02-21T08:52:59.8115330Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r46], [%rd94, {%r47, %r48}], [%r41]; 2026-02-21T08:52:59.8115463Z // end inline asm 2026-02-21T08:52:59.8115540Z xor.b32 %r75, %r62, 65536; 2026-02-21T08:52:59.8115614Z shl.b32 %r76, %r75, 1; 2026-02-21T08:52:59.8115696Z add.s32 %r50, %r28, %r76; 2026-02-21T08:52:59.8115771Z xor.b32 %r51, %r43, 64; 2026-02-21T08:52:59.8115844Z // begin inline asm 2026-02-21T08:52:59.8116216Z @%p5 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r50], [%rd94, {%r51, %r44}], [%r41]; 2026-02-21T08:52:59.8116298Z // end inline asm 2026-02-21T08:52:59.8116371Z add.s32 %r77, %r59, 6; 2026-02-21T08:52:59.8116444Z shl.b32 %r78, %r77, 15; 2026-02-21T08:52:59.8116528Z and.b32 %r79, %r78, 229376; 2026-02-21T08:52:59.8116604Z add.s32 %r54, %r28, %r79; 2026-02-21T08:52:59.8116679Z shl.b32 %r80, %r77, 8; 2026-02-21T08:52:59.8116763Z and.b32 %r81, %r80, 768; 2026-02-21T08:52:59.8116837Z shl.b32 %r82, %r77, 4; 2026-02-21T08:52:59.8116913Z and.b32 %r83, %r82, 64; 2026-02-21T08:52:59.8116993Z add.s32 %r55, %r2351, %r83; 2026-02-21T08:52:59.8117077Z add.s32 %r56, %r81, %r16; 2026-02-21T08:52:59.8117151Z // begin inline asm 2026-02-21T08:52:59.8117511Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r54], [%rd94, {%r55, %r56}], [%r41]; 2026-02-21T08:52:59.8117590Z // end inline asm 2026-02-21T08:52:59.8117667Z xor.b32 %r2350, %r2350, 1; 2026-02-21T08:52:59.8117934Z .loc 1 50 57 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:50:57 2026-02-21T08:52:59.8118063Z add.s32 %r22, %r2351, 128; 2026-02-21T08:52:59.8118160Z setp.lt.u32 %p11, %r2351, 1920; 2026-02-21T08:52:59.8118238Z mov.b32 %r2351, %r22; 2026-02-21T08:52:59.8118313Z @%p11 bra $L__BB0_11; 2026-02-21T08:52:59.8118457Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.8118548Z barrier.sync 1; 2026-02-21T08:52:59.8118663Z bra.uni $L__BB0_2; 2026-02-21T08:52:59.8118808Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.8118895Z barrier.sync 1; 2026-02-21T08:52:59.8118967Z bra.uni $L__BB0_2; 2026-02-21T08:52:59.8119101Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:59.8119368Z .loc 1 19 0 // co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py:19 2026-02-21T08:52:59.8119443Z barrier.sync 1; 2026-02-21T08:52:59.8119515Z barrier.sync 1; 2026-02-21T08:52:59.8119590Z bra.uni $L__BB0_2; 
2026-02-21T08:52:59.8119667Z $L__tmp1: 2026-02-21T08:52:59.8119739Z $L__func_end0: 2026-02-21T08:52:59.8119851Z // -- End function 2026-02-21T08:52:59.8119923Z } 2026-02-21T08:52:59.8120249Z .file 1 "/tmp/torchinductor_root/o7/co7elh3tbppfg4psps6d2b5qj5hjanrarxntmdvki6gbqit5kiaj.py" 2026-02-21T08:52:59.8120393Z .section .debug_abbrev 2026-02-21T08:52:59.8120461Z { 2026-02-21T08:52:59.8120593Z .b8 1 // Abbreviation Code 2026-02-21T08:52:59.8120717Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:59.8120829Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:59.8120948Z .b8 37 // DW_AT_producer 2026-02-21T08:52:59.8121053Z .b8 8 // DW_FORM_string 2026-02-21T08:52:59.8121158Z .b8 19 // DW_AT_language 2026-02-21T08:52:59.8121270Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:59.8121373Z .b8 3 // DW_AT_name 2026-02-21T08:52:59.8121476Z .b8 8 // DW_FORM_string 2026-02-21T08:52:59.8121593Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:59.8121699Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:59.8121804Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:59.8121905Z .b8 8 // DW_FORM_string 2026-02-21T08:52:59.8122049Z .b8 0 // EOM(1) 2026-02-21T08:52:59.8122143Z .b8 0 // EOM(2) 2026-02-21T08:52:59.8122229Z .b8 0 // EOM(3) 2026-02-21T08:52:59.8122300Z } 2026-02-21T08:52:59.8122378Z .section .debug_info 2026-02-21T08:52:59.8122441Z { 2026-02-21T08:52:59.8122554Z .b32 104 // Length of Unit 2026-02-21T08:52:59.8122684Z .b8 2 // DWARF version number 2026-02-21T08:52:59.8122750Z .b8 0 2026-02-21T08:52:59.8122916Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:59.8123049Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:59.8123193Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:52:59.8123303Z .b8 116 // DW_AT_producer 2026-02-21T08:52:59.8123380Z .b8 114 2026-02-21T08:52:59.8123447Z .b8 105 2026-02-21T08:52:59.8123510Z .b8 116 2026-02-21T08:52:59.8123575Z .b8 111 2026-02-21T08:52:59.8123648Z .b8 110 2026-02-21T08:52:59.8123712Z .b8 0 2026-02-21T08:52:59.8123811Z .b8 2 // DW_AT_language 2026-02-21T08:52:59.8123882Z .b8 0 2026-02-21T08:52:59.8123982Z .b8 99 // DW_AT_name 2026-02-21T08:52:59.8124047Z .b8 111 2026-02-21T08:52:59.8124110Z .b8 55 2026-02-21T08:52:59.8124222Z .b8 101 2026-02-21T08:52:59.8124286Z .b8 108 2026-02-21T08:52:59.8124352Z .b8 104 2026-02-21T08:52:59.8124426Z .b8 51 2026-02-21T08:52:59.8124490Z .b8 116 2026-02-21T08:52:59.8124553Z .b8 98 2026-02-21T08:52:59.8124616Z .b8 112 2026-02-21T08:52:59.8124745Z .b8 112 2026-02-21T08:52:59.8124811Z .b8 102 2026-02-21T08:52:59.8124875Z .b8 103 2026-02-21T08:52:59.8124940Z .b8 52 2026-02-21T08:52:59.8125014Z .b8 112 2026-02-21T08:52:59.8125076Z .b8 115 2026-02-21T08:52:59.8125184Z .b8 112 2026-02-21T08:52:59.8125260Z .b8 115 2026-02-21T08:52:59.8125325Z .b8 54 2026-02-21T08:52:59.8125390Z .b8 100 2026-02-21T08:52:59.8125453Z .b8 50 2026-02-21T08:52:59.8125525Z .b8 98 2026-02-21T08:52:59.8125590Z .b8 53 2026-02-21T08:52:59.8125654Z .b8 113 2026-02-21T08:52:59.8125723Z .b8 106 2026-02-21T08:52:59.8125786Z .b8 53 2026-02-21T08:52:59.8125849Z .b8 104 2026-02-21T08:52:59.8125911Z .b8 106 2026-02-21T08:52:59.8125981Z .b8 97 2026-02-21T08:52:59.8126047Z .b8 110 2026-02-21T08:52:59.8126111Z .b8 114 2026-02-21T08:52:59.8126185Z .b8 97 2026-02-21T08:52:59.8126248Z .b8 114 2026-02-21T08:52:59.8126312Z .b8 120 2026-02-21T08:52:59.8126374Z .b8 110 2026-02-21T08:52:59.8126445Z .b8 116 2026-02-21T08:52:59.8126508Z .b8 109 2026-02-21T08:52:59.8126571Z .b8 100 2026-02-21T08:52:59.8126633Z .b8 118 2026-02-21T08:52:59.8126704Z .b8 107 
2026-02-21T08:52:59.8126766Z .b8 105 2026-02-21T08:52:59.8126869Z .b8 54 2026-02-21T08:52:59.8126946Z .b8 103 2026-02-21T08:52:59.8127009Z .b8 98 2026-02-21T08:52:59.8127076Z .b8 113 2026-02-21T08:52:59.8127140Z .b8 105 2026-02-21T08:52:59.8127211Z .b8 116 2026-02-21T08:52:59.8127273Z .b8 53 2026-02-21T08:52:59.8127337Z .b8 107 2026-02-21T08:52:59.8127411Z .b8 105 2026-02-21T08:52:59.8127474Z .b8 97 2026-02-21T08:52:59.8127539Z .b8 106 2026-02-21T08:52:59.8127602Z .b8 46 2026-02-21T08:52:59.8127674Z .b8 112 2026-02-21T08:52:59.8127738Z .b8 121 2026-02-21T08:52:59.8127803Z .b8 0 2026-02-21T08:52:59.8127935Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:59.8128048Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:59.8128115Z .b8 116 2026-02-21T08:52:59.8128178Z .b8 109 2026-02-21T08:52:59.8128249Z .b8 112 2026-02-21T08:52:59.8128313Z .b8 47 2026-02-21T08:52:59.8128378Z .b8 116 2026-02-21T08:52:59.8128442Z .b8 111 2026-02-21T08:52:59.8128515Z .b8 114 2026-02-21T08:52:59.8128577Z .b8 99 2026-02-21T08:52:59.8128645Z .b8 104 2026-02-21T08:52:59.8128718Z .b8 105 2026-02-21T08:52:59.8128782Z .b8 110 2026-02-21T08:52:59.8128848Z .b8 100 2026-02-21T08:52:59.8128912Z .b8 117 2026-02-21T08:52:59.8129028Z .b8 99 2026-02-21T08:52:59.8129093Z .b8 116 2026-02-21T08:52:59.8129157Z .b8 111 2026-02-21T08:52:59.8129230Z .b8 114 2026-02-21T08:52:59.8129293Z .b8 95 2026-02-21T08:52:59.8129357Z .b8 114 2026-02-21T08:52:59.8129419Z .b8 111 2026-02-21T08:52:59.8129491Z .b8 111 2026-02-21T08:52:59.8129555Z .b8 116 2026-02-21T08:52:59.8129617Z .b8 47 2026-02-21T08:52:59.8129686Z .b8 111 2026-02-21T08:52:59.8129749Z .b8 55 2026-02-21T08:52:59.8129813Z .b8 0 2026-02-21T08:52:59.8129877Z } 2026-02-21T08:52:59.8129973Z .section .debug_macinfo { } 2026-02-21T08:52:59.8129981Z 2026-02-21T08:52:59.8130087Z ================================================================ 2026-02-21T08:52:59.8130234Z please share the reproducer above with Triton project. 
2026-02-21T08:53:00.6441257Z 2026-02-21T08:53:00.6441277Z 2026-02-21T08:53:00.6441283Z 2026-02-21T08:53:00.6442172Z ================================================================ 2026-02-21T08:53:00.6442846Z Internal Triton PTX codegen error 2026-02-21T08:53:00.6443317Z `ptxas` stderr: 2026-02-21T08:53:00.6444627Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 226 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:53:00.6446026Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:53:00.6446431Z 2026-02-21T08:53:00.6447485Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpc1p32dc8.ptx -o /tmp/tmpc1p32dc8.ptx.o 2026-02-21T08:53:00.6449221Z 2026-02-21T08:53:00.6449229Z 2026-02-21T08:53:00.6449364Z // 2026-02-21T08:53:00.6449706Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:53:00.6450158Z // 2026-02-21T08:53:00.6450316Z 2026-02-21T08:53:00.6450453Z .version 8.7 2026-02-21T08:53:00.6450731Z .target sm_100a 2026-02-21T08:53:00.6451017Z .address_size 64 2026-02-21T08:53:00.6451187Z 2026-02-21T08:53:00.6451620Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:53:00.6452252Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:53:00.6452738Z // @_helion_matmul 2026-02-21T08:53:00.6453194Z .visible .entry _helion_matmul( 2026-02-21T08:53:00.6453659Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:53:00.6454240Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:53:00.6454925Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:53:00.6455562Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:53:00.6456161Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:53:00.6456626Z ) 2026-02-21T08:53:00.6456864Z .reqntid 384 2026-02-21T08:53:00.6457106Z .maxnreg 32 
2026-02-21T08:53:00.6457341Z { 2026-02-21T08:53:00.6457755Z .reg .pred %p<131>; 2026-02-21T08:53:00.6458061Z .reg .b32 %r<2297>; 2026-02-21T08:53:00.6458345Z .reg .b64 %rd<1291>; 2026-02-21T08:53:00.6458909Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19:0 2026-02-21T08:53:00.6459570Z $L__func_begin0: 2026-02-21T08:53:00.6460090Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19:0 2026-02-21T08:53:00.6460640Z 2026-02-21T08:53:00.6460732Z // %bb.0: 2026-02-21T08:53:00.6461040Z ld.param.b64 %rd67, [_helion_matmul_param_3]; 2026-02-21T08:53:00.6461478Z $L__tmp0: 2026-02-21T08:53:00.6462002Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19 2026-02-21T08:53:00.6462652Z mov.u32 %r1, %tid.x; 2026-02-21T08:53:00.6462979Z shr.u32 %r2, %r1, 5; 2026-02-21T08:53:00.6463303Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:53:00.6463707Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T08:53:00.6464011Z @%p2 bra $L__BB0_14; 2026-02-21T08:53:00.6464308Z bra.uni $L__BB0_1; 2026-02-21T08:53:00.6464579Z $L__BB0_14: 2026-02-21T08:53:00.6465197Z .loc 1 0 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:0:0 2026-02-21T08:53:00.6466117Z ld.param.b64 %rd66, [_helion_matmul_param_2]; 2026-02-21T08:53:00.6466591Z ld.param.b64 %rd64, [_helion_matmul_param_0]; 2026-02-21T08:53:00.6467276Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19 2026-02-21T08:53:00.6468010Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T08:53:00.6468444Z setp.lt.u32 %p48, %r1, 32; 2026-02-21T08:53:00.6468784Z mov.b32 %r370, global_smem; 2026-02-21T08:53:00.6469139Z // begin inline asm 2026-02-21T08:53:00.6469686Z @%p48 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r370], 512; 2026-02-21T08:53:00.6470256Z // end inline asm 2026-02-21T08:53:00.6470528Z bar.sync 0, 128; 2026-02-21T08:53:00.6470826Z ld.shared.b32 %r2293, [global_smem]; 2026-02-21T08:53:00.6471212Z bar.sync 0, 128; 
2026-02-21T08:53:00.6471485Z // begin inline asm 2026-02-21T08:53:00.6471942Z @%p48 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:53:00.6472463Z // end inline asm 2026-02-21T08:53:00.6473005Z .loc 1 21 67 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:21:67 2026-02-21T08:53:00.6473694Z mov.u32 %r48, %ctaid.x; 2026-02-21T08:53:00.6474020Z mov.u32 %r387, %ctaid.y; 2026-02-21T08:53:00.6474353Z mov.u32 %r388, %ctaid.z; 2026-02-21T08:53:00.6474667Z mov.u32 %r389, %nctaid.x; 2026-02-21T08:53:00.6475092Z mov.u32 %r390, %nctaid.y; 2026-02-21T08:53:00.6475437Z mad.lo.s32 %r391, %r388, %r390, %r387; 2026-02-21T08:53:00.6475999Z mad.lo.s32 %r392, %r391, %r389, %r48; 2026-02-21T08:53:00.6476363Z shl.b32 %r393, %r392, 8; 2026-02-21T08:53:00.6476671Z cvt.s64.s32 %rd263, %r393; 2026-02-21T08:53:00.6477014Z add.s64 %rd242, %rd67, %rd263; 2026-02-21T08:53:00.6477363Z shl.b32 %r394, %r1, 2; 2026-02-21T08:53:00.6477698Z add.s32 %r371, %r370, %r394; 2026-02-21T08:53:00.6478031Z mov.b32 %r396, 0; 2026-02-21T08:53:00.6478441Z // begin inline asm 2026-02-21T08:53:00.6478777Z @%p48 st.shared.b32 [ %r371 + 0 ], %r396; 2026-02-21T08:53:00.6479167Z // end inline asm 2026-02-21T08:53:00.6479445Z bar.warp.sync -1; 2026-02-21T08:53:00.6479769Z setp.eq.b32 %p119, %r1, 0; 2026-02-21T08:53:00.6480127Z cvt.u64.u32 %rd227, %r370; 2026-02-21T08:53:00.6480462Z // begin inline asm 2026-02-21T08:53:00.6481079Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd227 + 0 ], %rd64; 2026-02-21T08:53:00.6481770Z // end inline asm 2026-02-21T08:53:00.6482055Z // begin inline asm 2026-02-21T08:53:00.6482562Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1; 2026-02-21T08:53:00.6483137Z // end inline asm 2026-02-21T08:53:00.6483402Z mov.b32 %r373, 64; 2026-02-21T08:53:00.6483667Z // begin inline asm 2026-02-21T08:53:00.6484352Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r373; 
2026-02-21T08:53:00.6485090Z // end inline asm 2026-02-21T08:53:00.6485360Z mov.b32 %r374, 256; 2026-02-21T08:53:00.6485647Z // begin inline asm 2026-02-21T08:53:00.6486186Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r374; 2026-02-21T08:53:00.6486865Z // end inline asm 2026-02-21T08:53:00.6487125Z mov.b32 %r375, 2048; 2026-02-21T08:53:00.6487432Z // begin inline asm 2026-02-21T08:53:00.6488022Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r375; 2026-02-21T08:53:00.6488693Z // end inline asm 2026-02-21T08:53:00.6488958Z // begin inline asm 2026-02-21T08:53:00.6489536Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r375; 2026-02-21T08:53:00.6490236Z // end inline asm 2026-02-21T08:53:00.6490511Z mov.b64 %rd235, 4096; 2026-02-21T08:53:00.6490810Z // begin inline asm 2026-02-21T08:53:00.6491410Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd227 + 0 ], 0x0, %rd235; 2026-02-21T08:53:00.6492119Z // end inline asm 2026-02-21T08:53:00.6492400Z mov.b32 %r377, 1; 2026-02-21T08:53:00.6492853Z // begin inline asm 2026-02-21T08:53:00.6493467Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r377; 2026-02-21T08:53:00.6494187Z // end inline asm 2026-02-21T08:53:00.6494461Z // begin inline asm 2026-02-21T08:53:00.6495130Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r377; 2026-02-21T08:53:00.6495846Z // end inline asm 2026-02-21T08:53:00.6496119Z // begin inline asm 2026-02-21T08:53:00.6496671Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x6; 2026-02-21T08:53:00.6497309Z // end inline asm 2026-02-21T08:53:00.6497576Z // begin inline asm 2026-02-21T08:53:00.6498175Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0; 2026-02-21T08:53:00.6498907Z // end inline asm 
2026-02-21T08:53:00.6499195Z // begin inline asm 2026-02-21T08:53:00.6499762Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x3; 2026-02-21T08:53:00.6500433Z // end inline asm 2026-02-21T08:53:00.6500702Z // begin inline asm 2026-02-21T08:53:00.6501213Z @%p119 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0; 2026-02-21T08:53:00.6501849Z // end inline asm 2026-02-21T08:53:00.6502128Z // begin inline asm 2026-02-21T08:53:00.6502982Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd242 + 0 ], [ %rd227 + 0 ], 0x80; 2026-02-21T08:53:00.6503925Z // end inline asm 2026-02-21T08:53:00.6504358Z // begin inline asm 2026-02-21T08:53:00.6504957Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd242 + 0 ], 0x80; 2026-02-21T08:53:00.6505567Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:53:00.6505987Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:53:00.6506377Z // end inline asm 2026-02-21T08:53:00.6506652Z bar.sync 0, 128; 2026-02-21T08:53:00.6507317Z .loc 1 23 73 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:23:73 2026-02-21T08:53:00.6508009Z add.s64 %rd63, %rd242, 128; 2026-02-21T08:53:00.6508334Z bar.sync 0, 128; 2026-02-21T08:53:00.6508606Z // begin inline asm 2026-02-21T08:53:00.6508915Z @%p48 st.shared.b32 [ %r371 + 0 ], %r396; 2026-02-21T08:53:00.6509282Z // end inline asm 2026-02-21T08:53:00.6509561Z bar.warp.sync -1; 2026-02-21T08:53:00.6509834Z // begin inline asm 2026-02-21T08:53:00.6510417Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd227 + 0 ], %rd66; 2026-02-21T08:53:00.6511135Z // end inline asm 2026-02-21T08:53:00.6511408Z // begin inline asm 2026-02-21T08:53:00.6511926Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1; 2026-02-21T08:53:00.6512550Z // end inline asm 2026-02-21T08:53:00.6512824Z // begin inline asm 2026-02-21T08:53:00.6513473Z @%p119 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r373; 2026-02-21T08:53:00.6514155Z // end inline asm 2026-02-21T08:53:00.6514435Z // begin inline asm 2026-02-21T08:53:00.6515076Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r374; 2026-02-21T08:53:00.6515721Z // end inline asm 2026-02-21T08:53:00.6516003Z mov.b32 %r383, 12288; 2026-02-21T08:53:00.6516300Z // begin inline asm 2026-02-21T08:53:00.6516908Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r383; 2026-02-21T08:53:00.6517616Z // end inline asm 2026-02-21T08:53:00.6517892Z // begin inline asm 2026-02-21T08:53:00.6518489Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r375; 2026-02-21T08:53:00.6519165Z // end inline asm 2026-02-21T08:53:00.6519439Z mov.b64 %rd253, 24576; 2026-02-21T08:53:00.6519737Z // begin inline asm 2026-02-21T08:53:00.6520354Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd227 + 0 ], 0x0, %rd253; 2026-02-21T08:53:00.6521074Z // end inline asm 2026-02-21T08:53:00.6521353Z // begin inline asm 2026-02-21T08:53:00.6521966Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0, %r377; 2026-02-21T08:53:00.6522843Z // end inline asm 2026-02-21T08:53:00.6523138Z // begin inline asm 2026-02-21T08:53:00.6523753Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x1, %r377; 2026-02-21T08:53:00.6524472Z // end inline asm 2026-02-21T08:53:00.6524833Z // begin inline asm 2026-02-21T08:53:00.6525367Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x6; 2026-02-21T08:53:00.6526165Z // end inline asm 2026-02-21T08:53:00.6526486Z // begin inline asm 2026-02-21T08:53:00.6527109Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0; 2026-02-21T08:53:00.6527796Z // end inline asm 2026-02-21T08:53:00.6528070Z // begin 
inline asm 2026-02-21T08:53:00.6528619Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x3; 2026-02-21T08:53:00.6529290Z // end inline asm 2026-02-21T08:53:00.6529586Z // begin inline asm 2026-02-21T08:53:00.6530134Z @%p119 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd227 + 0 ], 0x0; 2026-02-21T08:53:00.6530775Z // end inline asm 2026-02-21T08:53:00.6531039Z // begin inline asm 2026-02-21T08:53:00.6531861Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd227 + 0 ], 0x80; 2026-02-21T08:53:00.6532748Z // end inline asm 2026-02-21T08:53:00.6533023Z // begin inline asm 2026-02-21T08:53:00.6533660Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T08:53:00.6534228Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:53:00.6534639Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:53:00.6535098Z // end inline asm 2026-02-21T08:53:00.6535369Z bar.sync 0, 128; 2026-02-21T08:53:00.6535917Z .loc 1 32 75 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:32:75 2026-02-21T08:53:00.6536706Z setp.gt.u32 %p86, %r48, 383; 2026-02-21T08:53:00.6537082Z @%p86 bra $L__BB0_16; 2026-02-21T08:53:00.6537423Z // %bb.15: // %.lr.ph 2026-02-21T08:53:00.6538110Z .loc 1 23 73 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:23:73 2026-02-21T08:53:00.6538769Z cvta.global.u64 %rd264, %rd63; 2026-02-21T08:53:00.6539145Z setp.lt.u32 %p127, %r1, 128; 2026-02-21T08:53:00.6539489Z shl.b32 %r1495, %r1, 7; 2026-02-21T08:53:00.6539822Z and.b32 %r1496, %r1495, 16256; 2026-02-21T08:53:00.6540169Z shl.b32 %r1497, %r1, 4; 2026-02-21T08:53:00.6540477Z and.b32 %r1498, %r1497, 112; 2026-02-21T08:53:00.6540813Z or.b32 %r1499, %r1496, %r1498; 2026-02-21T08:53:00.6541149Z xor.b32 %r1500, %r1499, 112; 2026-02-21T08:53:00.6541488Z add.s32 %r1502, %r370, %r1500; 2026-02-21T08:53:00.6541831Z xor.b32 %r1503, %r1499, 96; 2026-02-21T08:53:00.6542345Z add.s32 
%r1504, %r370, %r1503; 2026-02-21T08:53:00.6542719Z xor.b32 %r1505, %r1499, 80; 2026-02-21T08:53:00.6543057Z add.s32 %r1506, %r370, %r1505; 2026-02-21T08:53:00.6543399Z xor.b32 %r1507, %r1499, 64; 2026-02-21T08:53:00.6543741Z add.s32 %r1508, %r370, %r1507; 2026-02-21T08:53:00.6544085Z xor.b32 %r1509, %r1499, 48; 2026-02-21T08:53:00.6544416Z add.s32 %r1510, %r370, %r1509; 2026-02-21T08:53:00.6544837Z xor.b32 %r1511, %r1499, 32; 2026-02-21T08:53:00.6545152Z add.s32 %r1512, %r370, %r1511; 2026-02-21T08:53:00.6545496Z xor.b32 %r1513, %r1499, 16; 2026-02-21T08:53:00.6545826Z add.s32 %r1514, %r370, %r1513; 2026-02-21T08:53:00.6546186Z add.s32 %r1515, %r370, %r1499; 2026-02-21T08:53:00.6546799Z .loc 1 43 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:43:27 2026-02-21T08:53:00.6547468Z shl.b32 %r1516, %r48, 8; 2026-02-21T08:53:00.6547788Z and.b32 %r1493, %r1516, 1792; 2026-02-21T08:53:00.6548404Z .loc 1 44 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:44:27 2026-02-21T08:53:00.6549130Z shl.b32 %r1517, %r48, 5; 2026-02-21T08:53:00.6549608Z and.b32 %r1518, %r1517, 16128; 2026-02-21T08:53:00.6550235Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6550942Z shfl.sync.idx.b32 %r1519, %r2, 0, 31, -1; 2026-02-21T08:53:00.6551337Z and.b32 %r1520, %r1519, 3; 2026-02-21T08:53:00.6551657Z shl.b32 %r1521, %r1520, 21; 2026-02-21T08:53:00.6551991Z add.s32 %r395, %r1521, %r2293; 2026-02-21T08:53:00.6552341Z mov.pred %p87, -1; 2026-02-21T08:53:00.6552647Z // begin inline asm 2026-02-21T08:53:00.6553573Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 0], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6554574Z // end inline asm 2026-02-21T08:53:00.6554961Z // begin inline asm 2026-02-21T08:53:00.6555921Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 16], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, 
%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6556927Z // end inline asm 2026-02-21T08:53:00.6557197Z // begin inline asm 2026-02-21T08:53:00.6558031Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 32], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6559011Z // end inline asm 2026-02-21T08:53:00.6559277Z // begin inline asm 2026-02-21T08:53:00.6560141Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 48], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6561280Z // end inline asm 2026-02-21T08:53:00.6561575Z // begin inline asm 2026-02-21T08:53:00.6562469Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 64], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6563498Z // end inline asm 2026-02-21T08:53:00.6563891Z // begin inline asm 2026-02-21T08:53:00.6564870Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 80], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6565895Z // end inline asm 2026-02-21T08:53:00.6566166Z // begin inline asm 2026-02-21T08:53:00.6567115Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 96], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6568185Z // end inline asm 2026-02-21T08:53:00.6568465Z // begin inline asm 2026-02-21T08:53:00.6569401Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 112], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6570403Z // end inline asm 2026-02-21T08:53:00.6570803Z // begin inline asm 2026-02-21T08:53:00.6571736Z @%p87 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 128], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6572739Z // end inline asm 2026-02-21T08:53:00.6573022Z // begin inline asm 2026-02-21T08:53:00.6573926Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 144], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6575058Z // end inline asm 2026-02-21T08:53:00.6575341Z // begin inline asm 2026-02-21T08:53:00.6576266Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 160], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6577263Z // end inline asm 2026-02-21T08:53:00.6577538Z // begin inline asm 2026-02-21T08:53:00.6578461Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 176], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6579456Z // end inline asm 2026-02-21T08:53:00.6579876Z // begin inline asm 2026-02-21T08:53:00.6580831Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 192], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6581835Z // end inline asm 2026-02-21T08:53:00.6582116Z // begin inline asm 2026-02-21T08:53:00.6582955Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 208], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6583932Z // end inline asm 2026-02-21T08:53:00.6584200Z // begin inline asm 2026-02-21T08:53:00.6585157Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 224], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6586146Z // end 
inline asm 2026-02-21T08:53:00.6586418Z // begin inline asm 2026-02-21T08:53:00.6587335Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 240], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6588376Z // end inline asm 2026-02-21T08:53:00.6588676Z // begin inline asm 2026-02-21T08:53:00.6589546Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 256], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6590566Z // end inline asm 2026-02-21T08:53:00.6590976Z // begin inline asm 2026-02-21T08:53:00.6591862Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 272], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6592907Z // end inline asm 2026-02-21T08:53:00.6593192Z // begin inline asm 2026-02-21T08:53:00.6594242Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 288], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6595314Z // end inline asm 2026-02-21T08:53:00.6595599Z // begin inline asm 2026-02-21T08:53:00.6596494Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 304], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6597473Z // end inline asm 2026-02-21T08:53:00.6597752Z // begin inline asm 2026-02-21T08:53:00.6598646Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 320], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6599672Z // end inline asm 2026-02-21T08:53:00.6599947Z // begin inline asm 2026-02-21T08:53:00.6600963Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 336], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, 
%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6601964Z // end inline asm 2026-02-21T08:53:00.6602241Z // begin inline asm 2026-02-21T08:53:00.6603145Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 352], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6604130Z // end inline asm 2026-02-21T08:53:00.6604410Z // begin inline asm 2026-02-21T08:53:00.6605462Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 368], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6606462Z // end inline asm 2026-02-21T08:53:00.6606745Z // begin inline asm 2026-02-21T08:53:00.6607577Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 384], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6608541Z // end inline asm 2026-02-21T08:53:00.6608814Z // begin inline asm 2026-02-21T08:53:00.6609683Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 400], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6610802Z // end inline asm 2026-02-21T08:53:00.6611075Z // begin inline asm 2026-02-21T08:53:00.6611966Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 416], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6612987Z // end inline asm 2026-02-21T08:53:00.6613261Z // begin inline asm 2026-02-21T08:53:00.6614129Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 432], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6615244Z // end inline asm 2026-02-21T08:53:00.6615534Z // begin inline asm 2026-02-21T08:53:00.6616442Z @%p87 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 448], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6617505Z // end inline asm 2026-02-21T08:53:00.6617795Z // begin inline asm 2026-02-21T08:53:00.6618718Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 464], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6619745Z // end inline asm 2026-02-21T08:53:00.6620014Z // begin inline asm 2026-02-21T08:53:00.6620934Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 480], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6622122Z // end inline asm 2026-02-21T08:53:00.6622409Z // begin inline asm 2026-02-21T08:53:00.6623317Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r395 + 496], {%r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396, %r396}; 2026-02-21T08:53:00.6624378Z // end inline asm 2026-02-21T08:53:00.6624664Z // begin inline asm 2026-02-21T08:53:00.6625189Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:53:00.6625574Z // end inline asm 2026-02-21T08:53:00.6625841Z bar.sync 0, 128; 2026-02-21T08:53:00.6626385Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.6627077Z add.s32 %r939, %r370, 131072; 2026-02-21T08:53:00.6627438Z // begin inline asm 2026-02-21T08:53:00.6627798Z @%p119 mbarrier.init.shared::cta.b64 [%r939], 1; 2026-02-21T08:53:00.6628244Z // end inline asm 2026-02-21T08:53:00.6628527Z add.s32 %r940, %r370, 131088; 2026-02-21T08:53:00.6628865Z // begin inline asm 2026-02-21T08:53:00.6629225Z @%p119 mbarrier.init.shared::cta.b64 [%r940], 1; 2026-02-21T08:53:00.6629662Z // end inline asm 2026-02-21T08:53:00.6630257Z .loc 1 55 31 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:55:31 
2026-02-21T08:53:00.6631063Z bar.sync 0, 128; 2026-02-21T08:53:00.6631360Z // begin inline asm 2026-02-21T08:53:00.6631728Z @%p119 mbarrier.arrive.shared::cta.b64 _, [%r939]; 2026-02-21T08:53:00.6632171Z // end inline asm 2026-02-21T08:53:00.6632681Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.6633320Z bar.sync 0, 128; 2026-02-21T08:53:00.6633601Z add.s32 %r942, %r370, 131104; 2026-02-21T08:53:00.6633922Z // begin inline asm 2026-02-21T08:53:00.6634272Z @%p119 mbarrier.init.shared::cta.b64 [%r942], 1; 2026-02-21T08:53:00.6634757Z // end inline asm 2026-02-21T08:53:00.6635118Z st.shared.v2.b32 [global_smem+131112], {0, 33685761}; 2026-02-21T08:53:00.6635601Z st.shared.b32 [global_smem+65536], %r2293; 2026-02-21T08:53:00.6636095Z st.shared.v2.b32 [global_smem+65544], {%r1493, %r1518}; 2026-02-21T08:53:00.6636577Z barrier.sync 1; 2026-02-21T08:53:00.6636846Z barrier.sync 1; 2026-02-21T08:53:00.6637115Z barrier.sync 1; 2026-02-21T08:53:00.6637675Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6638356Z bar.sync 0, 128; 2026-02-21T08:53:00.6638758Z // begin inline asm 2026-02-21T08:53:00.6639033Z 2026-02-21T08:53:00.6639246Z { 2026-02-21T08:53:00.6639489Z .reg .pred complete; 2026-02-21T08:53:00.6639790Z waitLoop: 2026-02-21T08:53:00.6640206Z mbarrier.try_wait.parity.shared.b64 complete, [%r942], %r396; 2026-02-21T08:53:00.6640761Z @!complete bra.uni waitLoop; 2026-02-21T08:53:00.6641083Z } 2026-02-21T08:53:00.6641210Z 2026-02-21T08:53:00.6641323Z // end inline asm 2026-02-21T08:53:00.6641857Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.6642545Z bar.sync 0, 128; 2026-02-21T08:53:00.6643760Z [286s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:53:00.6647168Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:53:00.6650360Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:53:00.6650913Z `ptxas` stderr: 2026-02-21T08:53:00.6651886Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 226 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:53:00.6653249Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:53:00.6653590Z 2026-02-21T08:53:00.6654610Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpc1p32dc8.ptx -o /tmp/tmpc1p32dc8.ptx.o 2026-02-21T08:53:00.6655916Z 2026-02-21T08:53:00.6656319Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:53:00.6656908Z // begin inline asm 2026-02-21T08:53:00.6657249Z @%p119 mbarrier.inval.shared::cta.b64 [%r942]; 2026-02-21T08:53:00.6657655Z // end inline asm 2026-02-21T08:53:00.6657916Z // begin inline asm 2026-02-21T08:53:00.6658259Z @%p119 mbarrier.inval.shared::cta.b64 [%r940]; 2026-02-21T08:53:00.6658679Z // end inline asm 2026-02-21T08:53:00.6658939Z // begin inline asm 2026-02-21T08:53:00.6659278Z @%p119 mbarrier.inval.shared::cta.b64 [%r939]; 2026-02-21T08:53:00.6659685Z // end inline asm 2026-02-21T08:53:00.6660219Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6660874Z // begin inline asm 2026-02-21T08:53:00.6661870Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r948, %r949, %r950, %r951, %r952, %r953, %r954, %r955, %r956, %r957, %r958, %r959, %r960, %r961, %r962, %r963}, [%r395 + 0]; 2026-02-21T08:53:00.6662916Z // end inline asm 2026-02-21T08:53:00.6663187Z // begin inline asm 2026-02-21T08:53:00.6664058Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r965, %r966, %r967, %r968, %r969, %r970, %r971, %r972, %r973, %r974, %r975, %r976, %r977, %r978, %r979, %r980}, [%r395 + 16]; 2026-02-21T08:53:00.6665134Z // end inline asm 2026-02-21T08:53:00.6665412Z // begin inline asm 2026-02-21T08:53:00.6666265Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r982, %r983, %r984, %r985, %r986, %r987, %r988, %r989, %r990, %r991, %r992, %r993, %r994, %r995, %r996, %r997}, [%r395 + 32]; 2026-02-21T08:53:00.6667283Z // end inline asm 2026-02-21T08:53:00.6667579Z cvt.u64.u32 %rd265, %r982; 2026-02-21T08:53:00.6667913Z cvt.u64.u32 %rd266, %r983; 2026-02-21T08:53:00.6668257Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:53:00.6668602Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:53:00.6668957Z cvt.u64.u32 %rd269, %r984; 2026-02-21T08:53:00.6669280Z cvt.u64.u32 %rd270, %r985; 2026-02-21T08:53:00.6669602Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:53:00.6669935Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:53:00.6670417Z cvt.u64.u32 
%rd273, %r986; 2026-02-21T08:53:00.6670745Z cvt.u64.u32 %rd274, %r987; 2026-02-21T08:53:00.6671079Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:53:00.6671416Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:53:00.6671752Z cvt.u64.u32 %rd277, %r988; 2026-02-21T08:53:00.6672082Z cvt.u64.u32 %rd278, %r989; 2026-02-21T08:53:00.6672403Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:53:00.6672747Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:53:00.6673100Z cvt.u64.u32 %rd281, %r990; 2026-02-21T08:53:00.6673451Z cvt.u64.u32 %rd282, %r991; 2026-02-21T08:53:00.6673778Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:53:00.6674118Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:53:00.6674467Z cvt.u64.u32 %rd285, %r992; 2026-02-21T08:53:00.6674893Z cvt.u64.u32 %rd286, %r993; 2026-02-21T08:53:00.6675224Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:53:00.6675545Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:53:00.6675886Z cvt.u64.u32 %rd289, %r994; 2026-02-21T08:53:00.6676209Z cvt.u64.u32 %rd290, %r995; 2026-02-21T08:53:00.6676538Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:53:00.6676870Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:53:00.6677213Z cvt.u64.u32 %rd293, %r996; 2026-02-21T08:53:00.6677531Z cvt.u64.u32 %rd294, %r997; 2026-02-21T08:53:00.6677858Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:53:00.6678191Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:53:00.6678519Z // begin inline asm 2026-02-21T08:53:00.6679535Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r999, %r1000, %r1001, %r1002, %r1003, %r1004, %r1005, %r1006, %r1007, %r1008, %r1009, %r1010, %r1011, %r1012, %r1013, %r1014}, [%r395 + 48]; 2026-02-21T08:53:00.6680753Z // end inline asm 2026-02-21T08:53:00.6681039Z cvt.u64.u32 %rd297, %r999; 2026-02-21T08:53:00.6681353Z cvt.u64.u32 %rd298, %r1000; 2026-02-21T08:53:00.6681669Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:53:00.6681985Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:53:00.6682415Z cvt.u64.u32 %rd301, %r1001; 2026-02-21T08:53:00.6682761Z cvt.u64.u32 %rd302, %r1002; 
2026-02-21T08:53:00.6683088Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:53:00.6683418Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:53:00.6683742Z cvt.u64.u32 %rd305, %r1003; 2026-02-21T08:53:00.6684064Z cvt.u64.u32 %rd306, %r1004; 2026-02-21T08:53:00.6684386Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:53:00.6684781Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:53:00.6685126Z cvt.u64.u32 %rd309, %r1005; 2026-02-21T08:53:00.6685476Z cvt.u64.u32 %rd310, %r1006; 2026-02-21T08:53:00.6685830Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:53:00.6686159Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:53:00.6686501Z cvt.u64.u32 %rd313, %r1007; 2026-02-21T08:53:00.6686840Z cvt.u64.u32 %rd314, %r1008; 2026-02-21T08:53:00.6687184Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:53:00.6687513Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:53:00.6687964Z cvt.u64.u32 %rd317, %r1009; 2026-02-21T08:53:00.6688309Z cvt.u64.u32 %rd318, %r1010; 2026-02-21T08:53:00.6688641Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:53:00.6688974Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:53:00.6689325Z cvt.u64.u32 %rd321, %r1011; 2026-02-21T08:53:00.6689662Z cvt.u64.u32 %rd322, %r1012; 2026-02-21T08:53:00.6689989Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:53:00.6690319Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:53:00.6690648Z cvt.u64.u32 %rd325, %r1013; 2026-02-21T08:53:00.6690977Z cvt.u64.u32 %rd326, %r1014; 2026-02-21T08:53:00.6691302Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:53:00.6691650Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:53:00.6691999Z // begin inline asm 2026-02-21T08:53:00.6693015Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1016, %r1017, %r1018, %r1019, %r1020, %r1021, %r1022, %r1023, %r1024, %r1025, %r1026, %r1027, %r1028, %r1029, %r1030, %r1031}, [%r395 + 64]; 2026-02-21T08:53:00.6694125Z // end inline asm 2026-02-21T08:53:00.6694394Z // begin inline asm 2026-02-21T08:53:00.6695475Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1033, %r1034, %r1035, %r1036, %r1037, %r1038, %r1039, %r1040, 
%r1041, %r1042, %r1043, %r1044, %r1045, %r1046, %r1047, %r1048}, [%r395 + 80]; 2026-02-21T08:53:00.6696672Z // end inline asm 2026-02-21T08:53:00.6696956Z // begin inline asm 2026-02-21T08:53:00.6697930Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1050, %r1051, %r1052, %r1053, %r1054, %r1055, %r1056, %r1057, %r1058, %r1059, %r1060, %r1061, %r1062, %r1063, %r1064, %r1065}, [%r395 + 96]; 2026-02-21T08:53:00.6699027Z // end inline asm 2026-02-21T08:53:00.6699327Z cvt.u64.u32 %rd329, %r1050; 2026-02-21T08:53:00.6699669Z cvt.u64.u32 %rd330, %r1051; 2026-02-21T08:53:00.6700020Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:53:00.6700335Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:53:00.6700676Z cvt.u64.u32 %rd333, %r1052; 2026-02-21T08:53:00.6700998Z cvt.u64.u32 %rd334, %r1053; 2026-02-21T08:53:00.6701342Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:53:00.6701686Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:53:00.6702029Z cvt.u64.u32 %rd337, %r1054; 2026-02-21T08:53:00.6702368Z cvt.u64.u32 %rd338, %r1055; 2026-02-21T08:53:00.6702688Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:53:00.6703019Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:53:00.6703358Z cvt.u64.u32 %rd341, %r1056; 2026-02-21T08:53:00.6703693Z cvt.u64.u32 %rd342, %r1057; 2026-02-21T08:53:00.6704028Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:53:00.6704387Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:53:00.6704807Z cvt.u64.u32 %rd345, %r1058; 2026-02-21T08:53:00.6705155Z cvt.u64.u32 %rd346, %r1059; 2026-02-21T08:53:00.6705612Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:53:00.6705941Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:53:00.6706282Z cvt.u64.u32 %rd349, %r1060; 2026-02-21T08:53:00.6706593Z cvt.u64.u32 %rd350, %r1061; 2026-02-21T08:53:00.6706899Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:53:00.6707219Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:53:00.6707555Z cvt.u64.u32 %rd353, %r1062; 2026-02-21T08:53:00.6707974Z cvt.u64.u32 %rd354, %r1063; 2026-02-21T08:53:00.6708318Z shl.b64 %rd355, %rd354, 32; 
2026-02-21T08:53:00.6708633Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:53:00.6708960Z cvt.u64.u32 %rd357, %r1064; 2026-02-21T08:53:00.6709280Z cvt.u64.u32 %rd358, %r1065; 2026-02-21T08:53:00.6709591Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:53:00.6709911Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:53:00.6710236Z // begin inline asm 2026-02-21T08:53:00.6711233Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1067, %r1068, %r1069, %r1070, %r1071, %r1072, %r1073, %r1074, %r1075, %r1076, %r1077, %r1078, %r1079, %r1080, %r1081, %r1082}, [%r395 + 112]; 2026-02-21T08:53:00.6712342Z // end inline asm 2026-02-21T08:53:00.6712621Z cvt.u64.u32 %rd361, %r1067; 2026-02-21T08:53:00.6712939Z cvt.u64.u32 %rd362, %r1068; 2026-02-21T08:53:00.6713258Z shl.b64 %rd363, %rd362, 32; 2026-02-21T08:53:00.6713705Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T08:53:00.6714064Z cvt.u64.u32 %rd365, %r1069; 2026-02-21T08:53:00.6714412Z cvt.u64.u32 %rd366, %r1070; 2026-02-21T08:53:00.6714807Z shl.b64 %rd367, %rd366, 32; 2026-02-21T08:53:00.6715146Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T08:53:00.6715477Z cvt.u64.u32 %rd369, %r1071; 2026-02-21T08:53:00.6715805Z cvt.u64.u32 %rd370, %r1072; 2026-02-21T08:53:00.6716129Z shl.b64 %rd371, %rd370, 32; 2026-02-21T08:53:00.6716467Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T08:53:00.6716826Z cvt.u64.u32 %rd373, %r1073; 2026-02-21T08:53:00.6717170Z cvt.u64.u32 %rd374, %r1074; 2026-02-21T08:53:00.6717504Z shl.b64 %rd375, %rd374, 32; 2026-02-21T08:53:00.6717843Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T08:53:00.6718197Z cvt.u64.u32 %rd377, %r1075; 2026-02-21T08:53:00.6718528Z cvt.u64.u32 %rd378, %r1076; 2026-02-21T08:53:00.6718857Z shl.b64 %rd379, %rd378, 32; 2026-02-21T08:53:00.6719169Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T08:53:00.6719498Z cvt.u64.u32 %rd381, %r1077; 2026-02-21T08:53:00.6719824Z cvt.u64.u32 %rd382, %r1078; 2026-02-21T08:53:00.6720170Z shl.b64 %rd383, %rd382, 32; 2026-02-21T08:53:00.6720683Z or.b64 %rd384, %rd381, %rd383; 
2026-02-21T08:53:00.6721022Z cvt.u64.u32 %rd385, %r1079; 2026-02-21T08:53:00.6721352Z cvt.u64.u32 %rd386, %r1080; 2026-02-21T08:53:00.6721673Z shl.b64 %rd387, %rd386, 32; 2026-02-21T08:53:00.6722011Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T08:53:00.6722350Z cvt.u64.u32 %rd389, %r1081; 2026-02-21T08:53:00.6722693Z cvt.u64.u32 %rd390, %r1082; 2026-02-21T08:53:00.6723034Z shl.b64 %rd391, %rd390, 32; 2026-02-21T08:53:00.6723372Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T08:53:00.6723725Z // begin inline asm 2026-02-21T08:53:00.6724801Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1084, %r1085, %r1086, %r1087, %r1088, %r1089, %r1090, %r1091, %r1092, %r1093, %r1094, %r1095, %r1096, %r1097, %r1098, %r1099}, [%r395 + 128]; 2026-02-21T08:53:00.6725852Z // end inline asm 2026-02-21T08:53:00.6726126Z // begin inline asm 2026-02-21T08:53:00.6727121Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1101, %r1102, %r1103, %r1104, %r1105, %r1106, %r1107, %r1108, %r1109, %r1110, %r1111, %r1112, %r1113, %r1114, %r1115, %r1116}, [%r395 + 144]; 2026-02-21T08:53:00.6728182Z // end inline asm 2026-02-21T08:53:00.6728467Z // begin inline asm 2026-02-21T08:53:00.6729480Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1118, %r1119, %r1120, %r1121, %r1122, %r1123, %r1124, %r1125, %r1126, %r1127, %r1128, %r1129, %r1130, %r1131, %r1132, %r1133}, [%r395 + 160]; 2026-02-21T08:53:00.6730535Z // end inline asm 2026-02-21T08:53:00.6730817Z cvt.u64.u32 %rd393, %r1118; 2026-02-21T08:53:00.6731260Z cvt.u64.u32 %rd394, %r1119; 2026-02-21T08:53:00.6731570Z shl.b64 %rd395, %rd394, 32; 2026-02-21T08:53:00.6731881Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T08:53:00.6732216Z cvt.u64.u32 %rd397, %r1120; 2026-02-21T08:53:00.6732536Z cvt.u64.u32 %rd398, %r1121; 2026-02-21T08:53:00.6732863Z shl.b64 %rd399, %rd398, 32; 2026-02-21T08:53:00.6733191Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T08:53:00.6733523Z cvt.u64.u32 %rd401, %r1122; 2026-02-21T08:53:00.6733939Z cvt.u64.u32 %rd402, %r1123; 2026-02-21T08:53:00.6734277Z 
shl.b64 %rd403, %rd402, 32; 2026-02-21T08:53:00.6734606Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T08:53:00.6734988Z cvt.u64.u32 %rd405, %r1124; 2026-02-21T08:53:00.6735330Z cvt.u64.u32 %rd406, %r1125; 2026-02-21T08:53:00.6735655Z shl.b64 %rd407, %rd406, 32; 2026-02-21T08:53:00.6735971Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T08:53:00.6736319Z cvt.u64.u32 %rd409, %r1126; 2026-02-21T08:53:00.6736644Z cvt.u64.u32 %rd410, %r1127; 2026-02-21T08:53:00.6736976Z shl.b64 %rd411, %rd410, 32; 2026-02-21T08:53:00.6737297Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T08:53:00.6737623Z cvt.u64.u32 %rd413, %r1128; 2026-02-21T08:53:00.6737938Z cvt.u64.u32 %rd414, %r1129; 2026-02-21T08:53:00.6738267Z shl.b64 %rd415, %rd414, 32; 2026-02-21T08:53:00.6738588Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T08:53:00.6739043Z cvt.u64.u32 %rd417, %r1130; 2026-02-21T08:53:00.6739390Z cvt.u64.u32 %rd418, %r1131; 2026-02-21T08:53:00.6739729Z shl.b64 %rd419, %rd418, 32; 2026-02-21T08:53:00.6740068Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T08:53:00.6740406Z cvt.u64.u32 %rd421, %r1132; 2026-02-21T08:53:00.6740737Z cvt.u64.u32 %rd422, %r1133; 2026-02-21T08:53:00.6741072Z shl.b64 %rd423, %rd422, 32; 2026-02-21T08:53:00.6741426Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T08:53:00.6741776Z // begin inline asm 2026-02-21T08:53:00.6742789Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1135, %r1136, %r1137, %r1138, %r1139, %r1140, %r1141, %r1142, %r1143, %r1144, %r1145, %r1146, %r1147, %r1148, %r1149, %r1150}, [%r395 + 176]; 2026-02-21T08:53:00.6743849Z // end inline asm 2026-02-21T08:53:00.6744124Z cvt.u64.u32 %rd425, %r1135; 2026-02-21T08:53:00.6744462Z cvt.u64.u32 %rd426, %r1136; 2026-02-21T08:53:00.6744858Z shl.b64 %rd427, %rd426, 32; 2026-02-21T08:53:00.6745202Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T08:53:00.6745543Z cvt.u64.u32 %rd429, %r1137; 2026-02-21T08:53:00.6745872Z cvt.u64.u32 %rd430, %r1138; 2026-02-21T08:53:00.6746196Z shl.b64 %rd431, %rd430, 32; 2026-02-21T08:53:00.6746670Z or.b64 %rd432, 
%rd429, %rd431; 2026-02-21T08:53:00.6747015Z cvt.u64.u32 %rd433, %r1139; 2026-02-21T08:53:00.6747364Z cvt.u64.u32 %rd434, %r1140; 2026-02-21T08:53:00.6747709Z shl.b64 %rd435, %rd434, 32; 2026-02-21T08:53:00.6748042Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T08:53:00.6748390Z cvt.u64.u32 %rd437, %r1141; 2026-02-21T08:53:00.6748728Z cvt.u64.u32 %rd438, %r1142; 2026-02-21T08:53:00.6749070Z shl.b64 %rd439, %rd438, 32; 2026-02-21T08:53:00.6749395Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T08:53:00.6749733Z cvt.u64.u32 %rd441, %r1143; 2026-02-21T08:53:00.6750049Z cvt.u64.u32 %rd442, %r1144; 2026-02-21T08:53:00.6750371Z shl.b64 %rd443, %rd442, 32; 2026-02-21T08:53:00.6750700Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T08:53:00.6751052Z cvt.u64.u32 %rd445, %r1145; 2026-02-21T08:53:00.6751392Z cvt.u64.u32 %rd446, %r1146; 2026-02-21T08:53:00.6751718Z shl.b64 %rd447, %rd446, 32; 2026-02-21T08:53:00.6752051Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T08:53:00.6752387Z cvt.u64.u32 %rd449, %r1147; 2026-02-21T08:53:00.6752717Z cvt.u64.u32 %rd450, %r1148; 2026-02-21T08:53:00.6753043Z shl.b64 %rd451, %rd450, 32; 2026-02-21T08:53:00.6753385Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T08:53:00.6753737Z cvt.u64.u32 %rd453, %r1149; 2026-02-21T08:53:00.6754078Z cvt.u64.u32 %rd454, %r1150; 2026-02-21T08:53:00.6754411Z shl.b64 %rd455, %rd454, 32; 2026-02-21T08:53:00.6754790Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T08:53:00.6755142Z // begin inline asm 2026-02-21T08:53:00.6756194Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1152, %r1153, %r1154, %r1155, %r1156, %r1157, %r1158, %r1159, %r1160, %r1161, %r1162, %r1163, %r1164, %r1165, %r1166, %r1167}, [%r395 + 192]; 2026-02-21T08:53:00.6757245Z // end inline asm 2026-02-21T08:53:00.6757520Z // begin inline asm 2026-02-21T08:53:00.6758584Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1169, %r1170, %r1171, %r1172, %r1173, %r1174, %r1175, %r1176, %r1177, %r1178, %r1179, %r1180, %r1181, %r1182, %r1183, %r1184}, [%r395 + 208]; 
2026-02-21T08:53:00.6759666Z // end inline asm 2026-02-21T08:53:00.6759942Z // begin inline asm 2026-02-21T08:53:00.6760907Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1186, %r1187, %r1188, %r1189, %r1190, %r1191, %r1192, %r1193, %r1194, %r1195, %r1196, %r1197, %r1198, %r1199, %r1200, %r1201}, [%r395 + 224]; 2026-02-21T08:53:00.6761981Z // end inline asm 2026-02-21T08:53:00.6762267Z cvt.u64.u32 %rd457, %r1186; 2026-02-21T08:53:00.6762595Z cvt.u64.u32 %rd458, %r1187; 2026-02-21T08:53:00.6762941Z shl.b64 %rd459, %rd458, 32; 2026-02-21T08:53:00.6763274Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T08:53:00.6763627Z cvt.u64.u32 %rd461, %r1188; 2026-02-21T08:53:00.6763962Z cvt.u64.u32 %rd462, %r1189; 2026-02-21T08:53:00.6764285Z shl.b64 %rd463, %rd462, 32; 2026-02-21T08:53:00.6764620Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T08:53:00.6765194Z cvt.u64.u32 %rd465, %r1190; 2026-02-21T08:53:00.6765547Z cvt.u64.u32 %rd466, %r1191; 2026-02-21T08:53:00.6765887Z shl.b64 %rd467, %rd466, 32; 2026-02-21T08:53:00.6766247Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T08:53:00.6766604Z cvt.u64.u32 %rd469, %r1192; 2026-02-21T08:53:00.6766937Z cvt.u64.u32 %rd470, %r1193; 2026-02-21T08:53:00.6767283Z shl.b64 %rd471, %rd470, 32; 2026-02-21T08:53:00.6767622Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T08:53:00.6767974Z cvt.u64.u32 %rd473, %r1194; 2026-02-21T08:53:00.6768292Z cvt.u64.u32 %rd474, %r1195; 2026-02-21T08:53:00.6768611Z shl.b64 %rd475, %rd474, 32; 2026-02-21T08:53:00.6768947Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T08:53:00.6769294Z cvt.u64.u32 %rd477, %r1196; 2026-02-21T08:53:00.6769623Z cvt.u64.u32 %rd478, %r1197; 2026-02-21T08:53:00.6769959Z shl.b64 %rd479, %rd478, 32; 2026-02-21T08:53:00.6770290Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T08:53:00.6770633Z cvt.u64.u32 %rd481, %r1198; 2026-02-21T08:53:00.6770969Z cvt.u64.u32 %rd482, %r1199; 2026-02-21T08:53:00.6771298Z shl.b64 %rd483, %rd482, 32; 2026-02-21T08:53:00.6771645Z or.b64 %rd484, %rd481, %rd483; 
2026-02-21T08:53:00.6772127Z cvt.u64.u32 %rd485, %r1200; 2026-02-21T08:53:00.6772482Z cvt.u64.u32 %rd486, %r1201; 2026-02-21T08:53:00.6772810Z shl.b64 %rd487, %rd486, 32; 2026-02-21T08:53:00.6773159Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T08:53:00.6773509Z // begin inline asm 2026-02-21T08:53:00.6774468Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1203, %r1204, %r1205, %r1206, %r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215, %r1216, %r1217, %r1218}, [%r395 + 240]; 2026-02-21T08:53:00.6775674Z // end inline asm 2026-02-21T08:53:00.6775963Z cvt.u64.u32 %rd489, %r1203; 2026-02-21T08:53:00.6776304Z cvt.u64.u32 %rd490, %r1204; 2026-02-21T08:53:00.6776629Z shl.b64 %rd491, %rd490, 32; 2026-02-21T08:53:00.6776967Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T08:53:00.6777304Z cvt.u64.u32 %rd493, %r1205; 2026-02-21T08:53:00.6777642Z cvt.u64.u32 %rd494, %r1206; 2026-02-21T08:53:00.6777976Z shl.b64 %rd495, %rd494, 32; 2026-02-21T08:53:00.6778325Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T08:53:00.6778703Z cvt.u64.u32 %rd497, %r1207; 2026-02-21T08:53:00.6779032Z cvt.u64.u32 %rd498, %r1208; 2026-02-21T08:53:00.6779372Z shl.b64 %rd499, %rd498, 32; 2026-02-21T08:53:00.6779700Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T08:53:00.6780040Z cvt.u64.u32 %rd501, %r1209; 2026-02-21T08:53:00.6780356Z cvt.u64.u32 %rd502, %r1210; 2026-02-21T08:53:00.6780682Z shl.b64 %rd503, %rd502, 32; 2026-02-21T08:53:00.6780993Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T08:53:00.6781459Z cvt.u64.u32 %rd505, %r1211; 2026-02-21T08:53:00.6781781Z cvt.u64.u32 %rd506, %r1212; 2026-02-21T08:53:00.6782115Z shl.b64 %rd507, %rd506, 32; 2026-02-21T08:53:00.6782442Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T08:53:00.6782769Z cvt.u64.u32 %rd509, %r1213; 2026-02-21T08:53:00.6783092Z cvt.u64.u32 %rd510, %r1214; 2026-02-21T08:53:00.6783412Z shl.b64 %rd511, %rd510, 32; 2026-02-21T08:53:00.6783744Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T08:53:00.6784174Z cvt.u64.u32 %rd513, %r1215; 
2026-02-21T08:53:00.6784535Z cvt.u64.u32 %rd514, %r1216; 2026-02-21T08:53:00.6784913Z shl.b64 %rd515, %rd514, 32; 2026-02-21T08:53:00.6785241Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T08:53:00.6785573Z cvt.u64.u32 %rd517, %r1217; 2026-02-21T08:53:00.6785917Z cvt.u64.u32 %rd518, %r1218; 2026-02-21T08:53:00.6786256Z shl.b64 %rd519, %rd518, 32; 2026-02-21T08:53:00.6786596Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T08:53:00.6786924Z // begin inline asm 2026-02-21T08:53:00.6787884Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1220, %r1221, %r1222, %r1223, %r1224, %r1225, %r1226, %r1227, %r1228, %r1229, %r1230, %r1231, %r1232, %r1233, %r1234, %r1235}, [%r395 + 256]; 2026-02-21T08:53:00.6788993Z // end inline asm 2026-02-21T08:53:00.6789270Z // begin inline asm 2026-02-21T08:53:00.6790361Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1237, %r1238, %r1239, %r1240, %r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249, %r1250, %r1251, %r1252}, [%r395 + 272]; 2026-02-21T08:53:00.6791523Z // end inline asm 2026-02-21T08:53:00.6791814Z // begin inline asm 2026-02-21T08:53:00.6792792Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1254, %r1255, %r1256, %r1257, %r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266, %r1267, %r1268, %r1269}, [%r395 + 288]; 2026-02-21T08:53:00.6793825Z // end inline asm 2026-02-21T08:53:00.6794124Z cvt.u64.u32 %rd521, %r1254; 2026-02-21T08:53:00.6794461Z cvt.u64.u32 %rd522, %r1255; 2026-02-21T08:53:00.6794882Z shl.b64 %rd523, %rd522, 32; 2026-02-21T08:53:00.6795224Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T08:53:00.6795556Z cvt.u64.u32 %rd525, %r1256; 2026-02-21T08:53:00.6795882Z cvt.u64.u32 %rd526, %r1257; 2026-02-21T08:53:00.6796201Z shl.b64 %rd527, %rd526, 32; 2026-02-21T08:53:00.6796535Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T08:53:00.6796887Z cvt.u64.u32 %rd529, %r1258; 2026-02-21T08:53:00.6797234Z cvt.u64.u32 %rd530, %r1259; 2026-02-21T08:53:00.6797557Z shl.b64 %rd531, %rd530, 32; 2026-02-21T08:53:00.6797898Z 
or.b64 %rd532, %rd529, %rd531; 2026-02-21T08:53:00.6798379Z cvt.u64.u32 %rd533, %r1260; 2026-02-21T08:53:00.6798711Z cvt.u64.u32 %rd534, %r1261; 2026-02-21T08:53:00.6799035Z shl.b64 %rd535, %rd534, 32; 2026-02-21T08:53:00.6799343Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T08:53:00.6799682Z cvt.u64.u32 %rd537, %r1262; 2026-02-21T08:53:00.6800005Z cvt.u64.u32 %rd538, %r1263; 2026-02-21T08:53:00.6800336Z shl.b64 %rd539, %rd538, 32; 2026-02-21T08:53:00.6800664Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T08:53:00.6801016Z cvt.u64.u32 %rd541, %r1264; 2026-02-21T08:53:00.6801332Z cvt.u64.u32 %rd542, %r1265; 2026-02-21T08:53:00.6801655Z shl.b64 %rd543, %rd542, 32; 2026-02-21T08:53:00.6801989Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T08:53:00.6802324Z cvt.u64.u32 %rd545, %r1266; 2026-02-21T08:53:00.6802660Z cvt.u64.u32 %rd546, %r1267; 2026-02-21T08:53:00.6802999Z shl.b64 %rd547, %rd546, 32; 2026-02-21T08:53:00.6803356Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T08:53:00.6803706Z cvt.u64.u32 %rd549, %r1268; 2026-02-21T08:53:00.6804047Z cvt.u64.u32 %rd550, %r1269; 2026-02-21T08:53:00.6804365Z shl.b64 %rd551, %rd550, 32; 2026-02-21T08:53:00.6804757Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T08:53:00.6805092Z // begin inline asm 2026-02-21T08:53:00.6806006Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1271, %r1272, %r1273, %r1274, %r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283, %r1284, %r1285, %r1286}, [%r395 + 304]; 2026-02-21T08:53:00.6807076Z // end inline asm 2026-02-21T08:53:00.6807471Z cvt.u64.u32 %rd553, %r1271; 2026-02-21T08:53:00.6807794Z cvt.u64.u32 %rd554, %r1272; 2026-02-21T08:53:00.6808105Z shl.b64 %rd555, %rd554, 32; 2026-02-21T08:53:00.6808438Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T08:53:00.6808765Z cvt.u64.u32 %rd557, %r1273; 2026-02-21T08:53:00.6809094Z cvt.u64.u32 %rd558, %r1274; 2026-02-21T08:53:00.6809434Z shl.b64 %rd559, %rd558, 32; 2026-02-21T08:53:00.6809771Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T08:53:00.6810219Z cvt.u64.u32 
%rd561, %r1275; 2026-02-21T08:53:00.6810578Z cvt.u64.u32 %rd562, %r1276; 2026-02-21T08:53:00.6810923Z shl.b64 %rd563, %rd562, 32; 2026-02-21T08:53:00.6811248Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T08:53:00.6811581Z cvt.u64.u32 %rd565, %r1277; 2026-02-21T08:53:00.6811902Z cvt.u64.u32 %rd566, %r1278; 2026-02-21T08:53:00.6812227Z shl.b64 %rd567, %rd566, 32; 2026-02-21T08:53:00.6812553Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T08:53:00.6812903Z cvt.u64.u32 %rd569, %r1279; 2026-02-21T08:53:00.6813244Z cvt.u64.u32 %rd570, %r1280; 2026-02-21T08:53:00.6813569Z shl.b64 %rd571, %rd570, 32; 2026-02-21T08:53:00.6813903Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T08:53:00.6814235Z cvt.u64.u32 %rd573, %r1281; 2026-02-21T08:53:00.6814568Z cvt.u64.u32 %rd574, %r1282; 2026-02-21T08:53:00.6815017Z shl.b64 %rd575, %rd574, 32; 2026-02-21T08:53:00.6815506Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T08:53:00.6815882Z cvt.u64.u32 %rd577, %r1283; 2026-02-21T08:53:00.6816223Z cvt.u64.u32 %rd578, %r1284; 2026-02-21T08:53:00.6816556Z shl.b64 %rd579, %rd578, 32; 2026-02-21T08:53:00.6816910Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T08:53:00.6817266Z cvt.u64.u32 %rd581, %r1285; 2026-02-21T08:53:00.6817596Z cvt.u64.u32 %rd582, %r1286; 2026-02-21T08:53:00.6817925Z shl.b64 %rd583, %rd582, 32; 2026-02-21T08:53:00.6818240Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T08:53:00.6818578Z // begin inline asm 2026-02-21T08:53:00.6819565Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1288, %r1289, %r1290, %r1291, %r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300, %r1301, %r1302, %r1303}, [%r395 + 320]; 2026-02-21T08:53:00.6820630Z // end inline asm 2026-02-21T08:53:00.6820898Z // begin inline asm 2026-02-21T08:53:00.6821909Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1305, %r1306, %r1307, %r1308, %r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317, %r1318, %r1319, %r1320}, [%r395 + 336]; 2026-02-21T08:53:00.6823033Z // end inline asm 2026-02-21T08:53:00.6823315Z 
cvt.u64.u32 %rd585, %r1313; 2026-02-21T08:53:00.6823789Z cvt.u64.u32 %rd586, %r1314; 2026-02-21T08:53:00.6824104Z shl.b64 %rd587, %rd586, 32; 2026-02-21T08:53:00.6824441Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T08:53:00.6824818Z // begin inline asm 2026-02-21T08:53:00.6825670Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1322, %r1323, %r1324, %r1325, %r1326, %r1327, %r1328, %r1329, %r1330, %r1331, %r1332, %r1333, %r1334, %r1335, %r1336, %r1337}, [%r395 + 352]; 2026-02-21T08:53:00.6826748Z // end inline asm 2026-02-21T08:53:00.6827031Z cvt.u64.u32 %rd589, %r1322; 2026-02-21T08:53:00.6827361Z cvt.u64.u32 %rd590, %r1323; 2026-02-21T08:53:00.6827688Z shl.b64 %rd591, %rd590, 32; 2026-02-21T08:53:00.6828034Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T08:53:00.6828395Z cvt.u64.u32 %rd593, %r1324; 2026-02-21T08:53:00.6828745Z cvt.u64.u32 %rd594, %r1325; 2026-02-21T08:53:00.6829075Z shl.b64 %rd595, %rd594, 32; 2026-02-21T08:53:00.6829421Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T08:53:00.6829772Z cvt.u64.u32 %rd597, %r1326; 2026-02-21T08:53:00.6830090Z cvt.u64.u32 %rd598, %r1327; 2026-02-21T08:53:00.6830413Z shl.b64 %rd599, %rd598, 32; 2026-02-21T08:53:00.6830722Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T08:53:00.6831055Z cvt.u64.u32 %rd601, %r1328; 2026-02-21T08:53:00.6831365Z cvt.u64.u32 %rd602, %r1329; 2026-02-21T08:53:00.6831689Z shl.b64 %rd603, %rd602, 32; 2026-02-21T08:53:00.6832013Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T08:53:00.6832342Z cvt.u64.u32 %rd605, %r1330; 2026-02-21T08:53:00.6832791Z cvt.u64.u32 %rd606, %r1331; 2026-02-21T08:53:00.6833112Z shl.b64 %rd607, %rd606, 32; 2026-02-21T08:53:00.6833443Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T08:53:00.6833769Z cvt.u64.u32 %rd609, %r1332; 2026-02-21T08:53:00.6834098Z cvt.u64.u32 %rd610, %r1333; 2026-02-21T08:53:00.6834428Z shl.b64 %rd611, %rd610, 32; 2026-02-21T08:53:00.6834857Z or.b64 %rd612, %rd609, %rd611; 2026-02-21T08:53:00.6835306Z cvt.u64.u32 %rd613, %r1334; 2026-02-21T08:53:00.6835665Z cvt.u64.u32 
%rd614, %r1335; 2026-02-21T08:53:00.6835999Z shl.b64 %rd615, %rd614, 32; 2026-02-21T08:53:00.6836331Z or.b64 %rd616, %rd613, %rd615; 2026-02-21T08:53:00.6836670Z cvt.u64.u32 %rd617, %r1336; 2026-02-21T08:53:00.6836983Z cvt.u64.u32 %rd618, %r1337; 2026-02-21T08:53:00.6837314Z shl.b64 %rd619, %rd618, 32; 2026-02-21T08:53:00.6837644Z or.b64 %rd620, %rd617, %rd619; 2026-02-21T08:53:00.6837987Z // begin inline asm 2026-02-21T08:53:00.6838958Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1339, %r1340, %r1341, %r1342, %r1343, %r1344, %r1345, %r1346, %r1347, %r1348, %r1349, %r1350, %r1351, %r1352, %r1353, %r1354}, [%r395 + 368]; 2026-02-21T08:53:00.6840025Z // end inline asm 2026-02-21T08:53:00.6840309Z cvt.u64.u32 %rd621, %r1339; 2026-02-21T08:53:00.6840659Z cvt.u64.u32 %rd622, %r1340; 2026-02-21T08:53:00.6841011Z shl.b64 %rd623, %rd622, 32; 2026-02-21T08:53:00.6841461Z or.b64 %rd624, %rd621, %rd623; 2026-02-21T08:53:00.6841838Z cvt.u64.u32 %rd625, %r1341; 2026-02-21T08:53:00.6842175Z cvt.u64.u32 %rd626, %r1342; 2026-02-21T08:53:00.6842515Z shl.b64 %rd627, %rd626, 32; 2026-02-21T08:53:00.6842837Z or.b64 %rd628, %rd625, %rd627; 2026-02-21T08:53:00.6843168Z cvt.u64.u32 %rd629, %r1343; 2026-02-21T08:53:00.6843495Z cvt.u64.u32 %rd630, %r1344; 2026-02-21T08:53:00.6843835Z shl.b64 %rd631, %rd630, 32; 2026-02-21T08:53:00.6844169Z or.b64 %rd632, %rd629, %rd631; 2026-02-21T08:53:00.6844519Z cvt.u64.u32 %rd633, %r1345; 2026-02-21T08:53:00.6844975Z cvt.u64.u32 %rd634, %r1346; 2026-02-21T08:53:00.6845303Z shl.b64 %rd635, %rd634, 32; 2026-02-21T08:53:00.6845636Z or.b64 %rd636, %rd633, %rd635; 2026-02-21T08:53:00.6845974Z cvt.u64.u32 %rd637, %r1347; 2026-02-21T08:53:00.6846310Z cvt.u64.u32 %rd638, %r1348; 2026-02-21T08:53:00.6846642Z shl.b64 %rd639, %rd638, 32; 2026-02-21T08:53:00.6846996Z or.b64 %rd640, %rd637, %rd639; 2026-02-21T08:53:00.6847350Z cvt.u64.u32 %rd641, %r1349; 2026-02-21T08:53:00.6847688Z cvt.u64.u32 %rd642, %r1350; 2026-02-21T08:53:00.6848037Z shl.b64 %rd643, %rd642, 32; 
2026-02-21T08:53:00.6848530Z or.b64 %rd644, %rd641, %rd643; 2026-02-21T08:53:00.6848874Z cvt.u64.u32 %rd645, %r1351; 2026-02-21T08:53:00.6849190Z cvt.u64.u32 %rd646, %r1352; 2026-02-21T08:53:00.6849516Z shl.b64 %rd647, %rd646, 32; 2026-02-21T08:53:00.6849839Z or.b64 %rd648, %rd645, %rd647; 2026-02-21T08:53:00.6850190Z cvt.u64.u32 %rd649, %r1353; 2026-02-21T08:53:00.6850517Z cvt.u64.u32 %rd650, %r1354; 2026-02-21T08:53:00.6850856Z shl.b64 %rd651, %rd650, 32; 2026-02-21T08:53:00.6851189Z or.b64 %rd652, %rd649, %rd651; 2026-02-21T08:53:00.6851526Z // begin inline asm 2026-02-21T08:53:00.6852513Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1356, %r1357, %r1358, %r1359, %r1360, %r1361, %r1362, %r1363, %r1364, %r1365, %r1366, %r1367, %r1368, %r1369, %r1370, %r1371}, [%r395 + 384]; 2026-02-21T08:53:00.6853657Z // end inline asm 2026-02-21T08:53:00.6853950Z // begin inline asm 2026-02-21T08:53:00.6854989Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1373, %r1374, %r1375, %r1376, %r1377, %r1378, %r1379, %r1380, %r1381, %r1382, %r1383, %r1384, %r1385, %r1386, %r1387, %r1388}, [%r395 + 400]; 2026-02-21T08:53:00.6856002Z // end inline asm 2026-02-21T08:53:00.6856276Z cvt.u64.u32 %rd653, %r1381; 2026-02-21T08:53:00.6856609Z cvt.u64.u32 %rd654, %r1382; 2026-02-21T08:53:00.6856937Z shl.b64 %rd655, %rd654, 32; 2026-02-21T08:53:00.6857259Z or.b64 %rd656, %rd653, %rd655; 2026-02-21T08:53:00.6857596Z cvt.u64.u32 %rd657, %r1383; 2026-02-21T08:53:00.6857905Z cvt.u64.u32 %rd658, %r1384; 2026-02-21T08:53:00.6858362Z shl.b64 %rd659, %rd658, 32; 2026-02-21T08:53:00.6858683Z or.b64 %rd660, %rd657, %rd659; 2026-02-21T08:53:00.6859025Z cvt.u64.u32 %rd661, %r1385; 2026-02-21T08:53:00.6859361Z cvt.u64.u32 %rd662, %r1386; 2026-02-21T08:53:00.6859693Z shl.b64 %rd663, %rd662, 32; 2026-02-21T08:53:00.6860019Z or.b64 %rd664, %rd661, %rd663; 2026-02-21T08:53:00.6860372Z cvt.u64.u32 %rd665, %r1387; 2026-02-21T08:53:00.6860820Z cvt.u64.u32 %rd666, %r1388; 2026-02-21T08:53:00.6861162Z shl.b64 %rd667, %rd666, 
32; 2026-02-21T08:53:00.6861490Z or.b64 %rd668, %rd665, %rd667; 2026-02-21T08:53:00.6861821Z // begin inline asm 2026-02-21T08:53:00.6862824Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1390, %r1391, %r1392, %r1393, %r1394, %r1395, %r1396, %r1397, %r1398, %r1399, %r1400, %r1401, %r1402, %r1403, %r1404, %r1405}, [%r395 + 416]; 2026-02-21T08:53:00.6863903Z // end inline asm 2026-02-21T08:53:00.6864192Z cvt.u64.u32 %rd669, %r1390; 2026-02-21T08:53:00.6864549Z cvt.u64.u32 %rd670, %r1391; 2026-02-21T08:53:00.6864972Z shl.b64 %rd671, %rd670, 32; 2026-02-21T08:53:00.6865330Z or.b64 %rd672, %rd669, %rd671; 2026-02-21T08:53:00.6865689Z cvt.u64.u32 %rd673, %r1392; 2026-02-21T08:53:00.6866028Z cvt.u64.u32 %rd674, %r1393; 2026-02-21T08:53:00.6866355Z shl.b64 %rd675, %rd674, 32; 2026-02-21T08:53:00.6866807Z or.b64 %rd676, %rd673, %rd675; 2026-02-21T08:53:00.6867166Z cvt.u64.u32 %rd677, %r1394; 2026-02-21T08:53:00.6867497Z cvt.u64.u32 %rd678, %r1395; 2026-02-21T08:53:00.6867821Z shl.b64 %rd679, %rd678, 32; 2026-02-21T08:53:00.6868146Z or.b64 %rd680, %rd677, %rd679; 2026-02-21T08:53:00.6868493Z cvt.u64.u32 %rd681, %r1396; 2026-02-21T08:53:00.6868821Z cvt.u64.u32 %rd682, %r1397; 2026-02-21T08:53:00.6869155Z shl.b64 %rd683, %rd682, 32; 2026-02-21T08:53:00.6869482Z or.b64 %rd684, %rd681, %rd683; 2026-02-21T08:53:00.6869822Z cvt.u64.u32 %rd685, %r1398; 2026-02-21T08:53:00.6870140Z cvt.u64.u32 %rd686, %r1399; 2026-02-21T08:53:00.6870475Z shl.b64 %rd687, %rd686, 32; 2026-02-21T08:53:00.6870810Z or.b64 %rd688, %rd685, %rd687; 2026-02-21T08:53:00.6871160Z cvt.u64.u32 %rd689, %r1400; 2026-02-21T08:53:00.6871515Z cvt.u64.u32 %rd690, %r1401; 2026-02-21T08:53:00.6871851Z shl.b64 %rd691, %rd690, 32; 2026-02-21T08:53:00.6872191Z or.b64 %rd692, %rd689, %rd691; 2026-02-21T08:53:00.6872542Z cvt.u64.u32 %rd693, %r1402; 2026-02-21T08:53:00.6872890Z cvt.u64.u32 %rd694, %r1403; 2026-02-21T08:53:00.6873219Z shl.b64 %rd695, %rd694, 32; 2026-02-21T08:53:00.6873543Z or.b64 %rd696, %rd693, %rd695; 
2026-02-21T08:53:00.6874018Z cvt.u64.u32 %rd697, %r1404; 2026-02-21T08:53:00.6874350Z cvt.u64.u32 %rd698, %r1405; 2026-02-21T08:53:00.6874743Z shl.b64 %rd699, %rd698, 32; 2026-02-21T08:53:00.6875099Z or.b64 %rd700, %rd697, %rd699; 2026-02-21T08:53:00.6875437Z // begin inline asm 2026-02-21T08:53:00.6876392Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1407, %r1408, %r1409, %r1410, %r1411, %r1412, %r1413, %r1414, %r1415, %r1416, %r1417, %r1418, %r1419, %r1420, %r1421, %r1422}, [%r395 + 432]; 2026-02-21T08:53:00.6877498Z // end inline asm 2026-02-21T08:53:00.6877790Z cvt.u64.u32 %rd701, %r1407; 2026-02-21T08:53:00.6878133Z cvt.u64.u32 %rd702, %r1408; 2026-02-21T08:53:00.6878459Z shl.b64 %rd703, %rd702, 32; 2026-02-21T08:53:00.6878790Z or.b64 %rd704, %rd701, %rd703; 2026-02-21T08:53:00.6879130Z cvt.u64.u32 %rd705, %r1409; 2026-02-21T08:53:00.6879448Z cvt.u64.u32 %rd706, %r1410; 2026-02-21T08:53:00.6879765Z shl.b64 %rd707, %rd706, 32; 2026-02-21T08:53:00.6880073Z or.b64 %rd708, %rd705, %rd707; 2026-02-21T08:53:00.6880409Z cvt.u64.u32 %rd709, %r1411; 2026-02-21T08:53:00.6880724Z cvt.u64.u32 %rd710, %r1412; 2026-02-21T08:53:00.6881054Z shl.b64 %rd711, %rd710, 32; 2026-02-21T08:53:00.6881377Z or.b64 %rd712, %rd709, %rd711; 2026-02-21T08:53:00.6881706Z cvt.u64.u32 %rd713, %r1413; 2026-02-21T08:53:00.6882017Z cvt.u64.u32 %rd714, %r1414; 2026-02-21T08:53:00.6882332Z shl.b64 %rd715, %rd714, 32; 2026-02-21T08:53:00.6882650Z or.b64 %rd716, %rd713, %rd715; 2026-02-21T08:53:00.6883098Z cvt.u64.u32 %rd717, %r1415; 2026-02-21T08:53:00.6883431Z cvt.u64.u32 %rd718, %r1416; 2026-02-21T08:53:00.6883768Z shl.b64 %rd719, %rd718, 32; 2026-02-21T08:53:00.6884110Z or.b64 %rd720, %rd717, %rd719; 2026-02-21T08:53:00.6884435Z cvt.u64.u32 %rd721, %r1417; 2026-02-21T08:53:00.6884852Z cvt.u64.u32 %rd722, %r1418; 2026-02-21T08:53:00.6885196Z shl.b64 %rd723, %rd722, 32; 2026-02-21T08:53:00.6885643Z or.b64 %rd724, %rd721, %rd723; 2026-02-21T08:53:00.6885984Z cvt.u64.u32 %rd725, %r1419; 
2026-02-21T08:53:00.6886324Z cvt.u64.u32 %rd726, %r1420; 2026-02-21T08:53:00.6886661Z shl.b64 %rd727, %rd726, 32; 2026-02-21T08:53:00.6886996Z or.b64 %rd728, %rd725, %rd727; 2026-02-21T08:53:00.6887348Z cvt.u64.u32 %rd729, %r1421; 2026-02-21T08:53:00.6887678Z cvt.u64.u32 %rd730, %r1422; 2026-02-21T08:53:00.6888010Z shl.b64 %rd731, %rd730, 32; 2026-02-21T08:53:00.6888334Z or.b64 %rd732, %rd729, %rd731; 2026-02-21T08:53:00.6888676Z // begin inline asm 2026-02-21T08:53:00.6889693Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1424, %r1425, %r1426, %r1427, %r1428, %r1429, %r1430, %r1431, %r1432, %r1433, %r1434, %r1435, %r1436, %r1437, %r1438, %r1439}, [%r395 + 448]; 2026-02-21T08:53:00.6890817Z // end inline asm 2026-02-21T08:53:00.6891109Z // begin inline asm 2026-02-21T08:53:00.6892161Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1441, %r1442, %r1443, %r1444, %r1445, %r1446, %r1447, %r1448, %r1449, %r1450, %r1451, %r1452, %r1453, %r1454, %r1455, %r1456}, [%r395 + 464]; 2026-02-21T08:53:00.6893287Z // end inline asm 2026-02-21T08:53:00.6893571Z cvt.u64.u32 %rd733, %r1449; 2026-02-21T08:53:00.6893912Z cvt.u64.u32 %rd734, %r1450; 2026-02-21T08:53:00.6894239Z shl.b64 %rd735, %rd734, 32; 2026-02-21T08:53:00.6894575Z or.b64 %rd736, %rd733, %rd735; 2026-02-21T08:53:00.6894985Z cvt.u64.u32 %rd737, %r1451; 2026-02-21T08:53:00.6895318Z cvt.u64.u32 %rd738, %r1452; 2026-02-21T08:53:00.6895657Z shl.b64 %rd739, %rd738, 32; 2026-02-21T08:53:00.6896010Z or.b64 %rd740, %rd737, %rd739; 2026-02-21T08:53:00.6896368Z cvt.u64.u32 %rd741, %r1453; 2026-02-21T08:53:00.6896701Z cvt.u64.u32 %rd742, %r1454; 2026-02-21T08:53:00.6897046Z shl.b64 %rd743, %rd742, 32; 2026-02-21T08:53:00.6897380Z or.b64 %rd744, %rd741, %rd743; 2026-02-21T08:53:00.6897716Z cvt.u64.u32 %rd745, %r1455; 2026-02-21T08:53:00.6898023Z cvt.u64.u32 %rd746, %r1456; 2026-02-21T08:53:00.6898352Z shl.b64 %rd747, %rd746, 32; 2026-02-21T08:53:00.6898689Z or.b64 %rd748, %rd745, %rd747; 2026-02-21T08:53:00.6899155Z // begin inline asm 
2026-02-21T08:53:00.6900124Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1458, %r1459, %r1460, %r1461, %r1462, %r1463, %r1464, %r1465, %r1466, %r1467, %r1468, %r1469, %r1470, %r1471, %r1472, %r1473}, [%r395 + 480]; 2026-02-21T08:53:00.6901173Z // end inline asm 2026-02-21T08:53:00.6901462Z cvt.u64.u32 %rd749, %r1458; 2026-02-21T08:53:00.6901799Z cvt.u64.u32 %rd750, %r1459; 2026-02-21T08:53:00.6902141Z shl.b64 %rd751, %rd750, 32; 2026-02-21T08:53:00.6902486Z or.b64 %rd752, %rd749, %rd751; 2026-02-21T08:53:00.6902836Z cvt.u64.u32 %rd753, %r1460; 2026-02-21T08:53:00.6903169Z cvt.u64.u32 %rd754, %r1461; 2026-02-21T08:53:00.6903490Z shl.b64 %rd755, %rd754, 32; 2026-02-21T08:53:00.6903813Z or.b64 %rd756, %rd753, %rd755; 2026-02-21T08:53:00.6904128Z cvt.u64.u32 %rd757, %r1462; 2026-02-21T08:53:00.6904448Z cvt.u64.u32 %rd758, %r1463; 2026-02-21T08:53:00.6904854Z shl.b64 %rd759, %rd758, 32; 2026-02-21T08:53:00.6905202Z or.b64 %rd760, %rd757, %rd759; 2026-02-21T08:53:00.6905537Z cvt.u64.u32 %rd761, %r1464; 2026-02-21T08:53:00.6905863Z cvt.u64.u32 %rd762, %r1465; 2026-02-21T08:53:00.6906185Z shl.b64 %rd763, %rd762, 32; 2026-02-21T08:53:00.6906499Z or.b64 %rd764, %rd761, %rd763; 2026-02-21T08:53:00.6906836Z cvt.u64.u32 %rd765, %r1466; 2026-02-21T08:53:00.6907151Z cvt.u64.u32 %rd766, %r1467; 2026-02-21T08:53:00.6907480Z shl.b64 %rd767, %rd766, 32; 2026-02-21T08:53:00.6907802Z or.b64 %rd768, %rd765, %rd767; 2026-02-21T08:53:00.6908160Z cvt.u64.u32 %rd769, %r1468; 2026-02-21T08:53:00.6908624Z cvt.u64.u32 %rd770, %r1469; 2026-02-21T08:53:00.6908950Z shl.b64 %rd771, %rd770, 32; 2026-02-21T08:53:00.6909286Z or.b64 %rd772, %rd769, %rd771; 2026-02-21T08:53:00.6909636Z cvt.u64.u32 %rd773, %r1470; 2026-02-21T08:53:00.6909965Z cvt.u64.u32 %rd774, %r1471; 2026-02-21T08:53:00.6910278Z shl.b64 %rd775, %rd774, 32; 2026-02-21T08:53:00.6910608Z or.b64 %rd776, %rd773, %rd775; 2026-02-21T08:53:00.6911076Z cvt.u64.u32 %rd777, %r1472; 2026-02-21T08:53:00.6911436Z cvt.u64.u32 %rd778, %r1473; 
2026-02-21T08:53:00.6911771Z shl.b64 %rd779, %rd778, 32; 2026-02-21T08:53:00.6912110Z or.b64 %rd780, %rd777, %rd779; 2026-02-21T08:53:00.6912441Z // begin inline asm 2026-02-21T08:53:00.6913407Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1475, %r1476, %r1477, %r1478, %r1479, %r1480, %r1481, %r1482, %r1483, %r1484, %r1485, %r1486, %r1487, %r1488, %r1489, %r1490}, [%r395 + 496]; 2026-02-21T08:53:00.6914552Z // end inline asm 2026-02-21T08:53:00.6914916Z cvt.u64.u32 %rd781, %r1475; 2026-02-21T08:53:00.6915263Z cvt.u64.u32 %rd782, %r1476; 2026-02-21T08:53:00.6915596Z shl.b64 %rd783, %rd782, 32; 2026-02-21T08:53:00.6915940Z or.b64 %rd784, %rd781, %rd783; 2026-02-21T08:53:00.6916271Z cvt.u64.u32 %rd785, %r1477; 2026-02-21T08:53:00.6916588Z cvt.u64.u32 %rd786, %r1478; 2026-02-21T08:53:00.6916910Z shl.b64 %rd787, %rd786, 32; 2026-02-21T08:53:00.6917142Z or.b64 %rd788, %rd785, %rd787; 2026-02-21T08:53:00.6917274Z cvt.u64.u32 %rd789, %r1479; 2026-02-21T08:53:00.6917388Z cvt.u64.u32 %rd790, %r1480; 2026-02-21T08:53:00.6917508Z shl.b64 %rd791, %rd790, 32; 2026-02-21T08:53:00.6917621Z or.b64 %rd792, %rd789, %rd791; 2026-02-21T08:53:00.6917731Z cvt.u64.u32 %rd793, %r1481; 2026-02-21T08:53:00.6917843Z cvt.u64.u32 %rd794, %r1482; 2026-02-21T08:53:00.6917959Z shl.b64 %rd795, %rd794, 32; 2026-02-21T08:53:00.6918072Z or.b64 %rd796, %rd793, %rd795; 2026-02-21T08:53:00.6918182Z cvt.u64.u32 %rd797, %r1483; 2026-02-21T08:53:00.6918298Z cvt.u64.u32 %rd798, %r1484; 2026-02-21T08:53:00.6918412Z shl.b64 %rd799, %rd798, 32; 2026-02-21T08:53:00.6918523Z or.b64 %rd800, %rd797, %rd799; 2026-02-21T08:53:00.6918640Z cvt.u64.u32 %rd801, %r1485; 2026-02-21T08:53:00.6918750Z cvt.u64.u32 %rd802, %r1486; 2026-02-21T08:53:00.6918859Z shl.b64 %rd803, %rd802, 32; 2026-02-21T08:53:00.6918968Z or.b64 %rd804, %rd801, %rd803; 2026-02-21T08:53:00.6919092Z cvt.u64.u32 %rd805, %r1487; 2026-02-21T08:53:00.6919206Z cvt.u64.u32 %rd806, %r1488; 2026-02-21T08:53:00.6919317Z shl.b64 %rd807, %rd806, 32; 
2026-02-21T08:53:00.6919546Z or.b64 %rd808, %rd805, %rd807; 2026-02-21T08:53:00.6919657Z cvt.u64.u32 %rd809, %r1489; 2026-02-21T08:53:00.6919768Z cvt.u64.u32 %rd810, %r1490; 2026-02-21T08:53:00.6919878Z shl.b64 %rd811, %rd810, 32; 2026-02-21T08:53:00.6919999Z or.b64 %rd812, %rd809, %rd811; 2026-02-21T08:53:00.6920113Z // begin inline asm 2026-02-21T08:53:00.6920259Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:53:00.6920377Z // end inline asm 2026-02-21T08:53:00.6920497Z cvt.u64.u32 %rd813, %r948; 2026-02-21T08:53:00.6920608Z cvt.u64.u32 %rd814, %r949; 2026-02-21T08:53:00.6920716Z shl.b64 %rd815, %rd814, 32; 2026-02-21T08:53:00.6920836Z or.b64 %rd816, %rd813, %rd815; 2026-02-21T08:53:00.6921261Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6921384Z mov.b64 {%r1522, %r1523}, %rd816; 2026-02-21T08:53:00.6921540Z cvt.rn.f16x2.f32 %r1524, %r1523, %r1522; 2026-02-21T08:53:00.6921929Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6922047Z cvt.u64.u32 %rd817, %r950; 2026-02-21T08:53:00.6922167Z cvt.u64.u32 %rd818, %r951; 2026-02-21T08:53:00.6922272Z shl.b64 %rd819, %rd818, 32; 2026-02-21T08:53:00.6922381Z or.b64 %rd820, %rd817, %rd819; 2026-02-21T08:53:00.6922742Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6922867Z mov.b64 {%r1525, %r1526}, %rd820; 2026-02-21T08:53:00.6923125Z cvt.rn.f16x2.f32 %r1527, %r1526, %r1525; 2026-02-21T08:53:00.6923512Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6923637Z cvt.u64.u32 %rd821, %r952; 2026-02-21T08:53:00.6923751Z cvt.u64.u32 %rd822, %r953; 2026-02-21T08:53:00.6923868Z shl.b64 %rd823, %rd822, 32; 2026-02-21T08:53:00.6923991Z or.b64 %rd824, %rd821, %rd823; 2026-02-21T08:53:00.6924443Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6924572Z mov.b64 {%r1528, %r1529}, %rd824; 
2026-02-21T08:53:00.6924784Z cvt.rn.f16x2.f32 %r1530, %r1529, %r1528; 2026-02-21T08:53:00.6925167Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6925279Z cvt.u64.u32 %rd825, %r954; 2026-02-21T08:53:00.6925390Z cvt.u64.u32 %rd826, %r955; 2026-02-21T08:53:00.6925508Z shl.b64 %rd827, %rd826, 32; 2026-02-21T08:53:00.6925626Z or.b64 %rd828, %rd825, %rd827; 2026-02-21T08:53:00.6926004Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6926128Z mov.b64 {%r1531, %r1532}, %rd828; 2026-02-21T08:53:00.6926263Z cvt.rn.f16x2.f32 %r1533, %r1532, %r1531; 2026-02-21T08:53:00.6926753Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6926884Z cvt.u64.u32 %rd829, %r956; 2026-02-21T08:53:00.6927008Z cvt.u64.u32 %rd830, %r957; 2026-02-21T08:53:00.6927118Z shl.b64 %rd831, %rd830, 32; 2026-02-21T08:53:00.6927234Z or.b64 %rd832, %rd829, %rd831; 2026-02-21T08:53:00.6927621Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6927733Z mov.b64 {%r1534, %r1535}, %rd832; 2026-02-21T08:53:00.6927861Z cvt.rn.f16x2.f32 %r1536, %r1535, %r1534; 2026-02-21T08:53:00.6928233Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6928344Z cvt.u64.u32 %rd833, %r958; 2026-02-21T08:53:00.6928448Z cvt.u64.u32 %rd834, %r959; 2026-02-21T08:53:00.6928552Z shl.b64 %rd835, %rd834, 32; 2026-02-21T08:53:00.6928666Z or.b64 %rd836, %rd833, %rd835; 2026-02-21T08:53:00.6929007Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6929119Z mov.b64 {%r1537, %r1538}, %rd836; 2026-02-21T08:53:00.6929370Z cvt.rn.f16x2.f32 %r1539, %r1538, %r1537; 2026-02-21T08:53:00.6929736Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6929845Z cvt.u64.u32 %rd837, %r960; 2026-02-21T08:53:00.6929965Z 
cvt.u64.u32 %rd838, %r961; 2026-02-21T08:53:00.6930074Z shl.b64 %rd839, %rd838, 32; 2026-02-21T08:53:00.6930184Z or.b64 %rd840, %rd837, %rd839; 2026-02-21T08:53:00.6930547Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6930666Z mov.b64 {%r1540, %r1541}, %rd840; 2026-02-21T08:53:00.6930789Z cvt.rn.f16x2.f32 %r1542, %r1541, %r1540; 2026-02-21T08:53:00.6931144Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6931261Z cvt.u64.u32 %rd841, %r962; 2026-02-21T08:53:00.6931369Z cvt.u64.u32 %rd842, %r963; 2026-02-21T08:53:00.6931477Z shl.b64 %rd843, %rd842, 32; 2026-02-21T08:53:00.6931596Z or.b64 %rd844, %rd841, %rd843; 2026-02-21T08:53:00.6931959Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6932066Z mov.b64 {%r1543, %r1544}, %rd844; 2026-02-21T08:53:00.6932191Z cvt.rn.f16x2.f32 %r1545, %r1544, %r1543; 2026-02-21T08:53:00.6932574Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6932688Z cvt.u64.u32 %rd845, %r965; 2026-02-21T08:53:00.6932924Z cvt.u64.u32 %rd846, %r966; 2026-02-21T08:53:00.6933042Z shl.b64 %rd847, %rd846, 32; 2026-02-21T08:53:00.6933151Z or.b64 %rd848, %rd845, %rd847; 2026-02-21T08:53:00.6933521Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6933646Z mov.b64 {%r1546, %r1547}, %rd848; 2026-02-21T08:53:00.6933780Z cvt.rn.f16x2.f32 %r1548, %r1547, %r1546; 2026-02-21T08:53:00.6934242Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6934367Z cvt.u64.u32 %rd849, %r967; 2026-02-21T08:53:00.6934487Z cvt.u64.u32 %rd850, %r968; 2026-02-21T08:53:00.6934594Z shl.b64 %rd851, %rd850, 32; 2026-02-21T08:53:00.6934752Z or.b64 %rd852, %rd849, %rd851; 2026-02-21T08:53:00.6935142Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 
2026-02-21T08:53:00.6935254Z mov.b64 {%r1549, %r1550}, %rd852; 2026-02-21T08:53:00.6935389Z cvt.rn.f16x2.f32 %r1551, %r1550, %r1549; 2026-02-21T08:53:00.6935780Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6935893Z cvt.u64.u32 %rd853, %r969; 2026-02-21T08:53:00.6936008Z cvt.u64.u32 %rd854, %r970; 2026-02-21T08:53:00.6936205Z shl.b64 %rd855, %rd854, 32; 2026-02-21T08:53:00.6936333Z or.b64 %rd856, %rd853, %rd855; 2026-02-21T08:53:00.6936720Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6936833Z mov.b64 {%r1552, %r1553}, %rd856; 2026-02-21T08:53:00.6936970Z cvt.rn.f16x2.f32 %r1554, %r1553, %r1552; 2026-02-21T08:53:00.6937346Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6937457Z cvt.u64.u32 %rd857, %r971; 2026-02-21T08:53:00.6937574Z cvt.u64.u32 %rd858, %r972; 2026-02-21T08:53:00.6937682Z shl.b64 %rd859, %rd858, 32; 2026-02-21T08:53:00.6937795Z or.b64 %rd860, %rd857, %rd859; 2026-02-21T08:53:00.6938176Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6938295Z mov.b64 {%r1555, %r1556}, %rd860; 2026-02-21T08:53:00.6938428Z cvt.rn.f16x2.f32 %r1557, %r1556, %r1555; 2026-02-21T08:53:00.6938826Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6938951Z cvt.u64.u32 %rd861, %r973; 2026-02-21T08:53:00.6939183Z cvt.u64.u32 %rd862, %r974; 2026-02-21T08:53:00.6939297Z shl.b64 %rd863, %rd862, 32; 2026-02-21T08:53:00.6939417Z or.b64 %rd864, %rd861, %rd863; 2026-02-21T08:53:00.6939792Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6939908Z mov.b64 {%r1558, %r1559}, %rd864; 2026-02-21T08:53:00.6940039Z cvt.rn.f16x2.f32 %r1560, %r1559, %r1558; 2026-02-21T08:53:00.6940426Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6940557Z 
cvt.u64.u32 %rd865, %r975; 2026-02-21T08:53:00.6940666Z cvt.u64.u32 %rd866, %r976; 2026-02-21T08:53:00.6940783Z shl.b64 %rd867, %rd866, 32; 2026-02-21T08:53:00.6940893Z or.b64 %rd868, %rd865, %rd867; 2026-02-21T08:53:00.6941253Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6941375Z mov.b64 {%r1561, %r1562}, %rd868; 2026-02-21T08:53:00.6941505Z cvt.rn.f16x2.f32 %r1563, %r1562, %r1561; 2026-02-21T08:53:00.6941881Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6941994Z cvt.u64.u32 %rd869, %r977; 2026-02-21T08:53:00.6942117Z cvt.u64.u32 %rd870, %r978; 2026-02-21T08:53:00.6942227Z shl.b64 %rd871, %rd870, 32; 2026-02-21T08:53:00.6942340Z or.b64 %rd872, %rd869, %rd871; 2026-02-21T08:53:00.6942722Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6942963Z mov.b64 {%r1564, %r1565}, %rd872; 2026-02-21T08:53:00.6943095Z cvt.rn.f16x2.f32 %r1566, %r1565, %r1564; 2026-02-21T08:53:00.6943483Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6943597Z cvt.u64.u32 %rd873, %r979; 2026-02-21T08:53:00.6943708Z cvt.u64.u32 %rd874, %r980; 2026-02-21T08:53:00.6943890Z shl.b64 %rd875, %rd874, 32; 2026-02-21T08:53:00.6944027Z or.b64 %rd876, %rd873, %rd875; 2026-02-21T08:53:00.6944416Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6944533Z mov.b64 {%r1567, %r1568}, %rd876; 2026-02-21T08:53:00.6944750Z cvt.rn.f16x2.f32 %r1569, %r1568, %r1567; 2026-02-21T08:53:00.6944876Z mov.b64 {%r1570, %r1571}, %rd268; 2026-02-21T08:53:00.6945018Z cvt.rn.f16x2.f32 %r1572, %r1571, %r1570; 2026-02-21T08:53:00.6945143Z mov.b64 {%r1573, %r1574}, %rd272; 2026-02-21T08:53:00.6945281Z cvt.rn.f16x2.f32 %r1575, %r1574, %r1573; 2026-02-21T08:53:00.6945394Z mov.b64 {%r1576, %r1577}, %rd276; 2026-02-21T08:53:00.6945528Z cvt.rn.f16x2.f32 %r1578, %r1577, %r1576; 
2026-02-21T08:53:00.6945649Z mov.b64 {%r1579, %r1580}, %rd280; 2026-02-21T08:53:00.6945783Z cvt.rn.f16x2.f32 %r1581, %r1580, %r1579; 2026-02-21T08:53:00.6945986Z mov.b64 {%r1582, %r1583}, %rd284; 2026-02-21T08:53:00.6946139Z cvt.rn.f16x2.f32 %r1584, %r1583, %r1582; 2026-02-21T08:53:00.6946257Z mov.b64 {%r1585, %r1586}, %rd288; 2026-02-21T08:53:00.6946390Z cvt.rn.f16x2.f32 %r1587, %r1586, %r1585; 2026-02-21T08:53:00.6946502Z mov.b64 {%r1588, %r1589}, %rd292; 2026-02-21T08:53:00.6946641Z cvt.rn.f16x2.f32 %r1590, %r1589, %r1588; 2026-02-21T08:53:00.6946751Z mov.b64 {%r1591, %r1592}, %rd296; 2026-02-21T08:53:00.6946877Z cvt.rn.f16x2.f32 %r1593, %r1592, %r1591; 2026-02-21T08:53:00.6946993Z mov.b64 {%r1594, %r1595}, %rd300; 2026-02-21T08:53:00.6947117Z cvt.rn.f16x2.f32 %r1596, %r1595, %r1594; 2026-02-21T08:53:00.6947225Z mov.b64 {%r1597, %r1598}, %rd304; 2026-02-21T08:53:00.6947355Z cvt.rn.f16x2.f32 %r1599, %r1598, %r1597; 2026-02-21T08:53:00.6947464Z mov.b64 {%r1600, %r1601}, %rd308; 2026-02-21T08:53:00.6947592Z cvt.rn.f16x2.f32 %r1602, %r1601, %r1600; 2026-02-21T08:53:00.6947698Z mov.b64 {%r1603, %r1604}, %rd312; 2026-02-21T08:53:00.6947836Z cvt.rn.f16x2.f32 %r1605, %r1604, %r1603; 2026-02-21T08:53:00.6947950Z mov.b64 {%r1606, %r1607}, %rd316; 2026-02-21T08:53:00.6948178Z cvt.rn.f16x2.f32 %r1608, %r1607, %r1606; 2026-02-21T08:53:00.6948297Z mov.b64 {%r1609, %r1610}, %rd320; 2026-02-21T08:53:00.6948428Z cvt.rn.f16x2.f32 %r1611, %r1610, %r1609; 2026-02-21T08:53:00.6948539Z mov.b64 {%r1612, %r1613}, %rd324; 2026-02-21T08:53:00.6948679Z cvt.rn.f16x2.f32 %r1614, %r1613, %r1612; 2026-02-21T08:53:00.6948790Z mov.b64 {%r1615, %r1616}, %rd328; 2026-02-21T08:53:00.6948919Z cvt.rn.f16x2.f32 %r1617, %r1616, %r1615; 2026-02-21T08:53:00.6949299Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6949421Z cvt.u64.u32 %rd877, %r1016; 2026-02-21T08:53:00.6949531Z cvt.u64.u32 %rd878, %r1017; 2026-02-21T08:53:00.6949640Z shl.b64 %rd879, 
%rd878, 32; 2026-02-21T08:53:00.6949758Z or.b64 %rd880, %rd877, %rd879; 2026-02-21T08:53:00.6950132Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6950246Z mov.b64 {%r1618, %r1619}, %rd880; 2026-02-21T08:53:00.6950377Z cvt.rn.f16x2.f32 %r1620, %r1619, %r1618; 2026-02-21T08:53:00.6950756Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6950869Z cvt.u64.u32 %rd881, %r1018; 2026-02-21T08:53:00.6950981Z cvt.u64.u32 %rd882, %r1019; 2026-02-21T08:53:00.6951104Z shl.b64 %rd883, %rd882, 32; 2026-02-21T08:53:00.6951220Z or.b64 %rd884, %rd881, %rd883; 2026-02-21T08:53:00.6951604Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6951877Z mov.b64 {%r1621, %r1622}, %rd884; 2026-02-21T08:53:00.6952011Z cvt.rn.f16x2.f32 %r1623, %r1622, %r1621; 2026-02-21T08:53:00.6952380Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6952505Z cvt.u64.u32 %rd885, %r1020; 2026-02-21T08:53:00.6952615Z cvt.u64.u32 %rd886, %r1021; 2026-02-21T08:53:00.6952805Z shl.b64 %rd887, %rd886, 32; 2026-02-21T08:53:00.6952925Z or.b64 %rd888, %rd885, %rd887; 2026-02-21T08:53:00.6953280Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6953385Z mov.b64 {%r1624, %r1625}, %rd888; 2026-02-21T08:53:00.6953505Z cvt.rn.f16x2.f32 %r1626, %r1625, %r1624; 2026-02-21T08:53:00.6953865Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6953973Z cvt.u64.u32 %rd889, %r1022; 2026-02-21T08:53:00.6954084Z cvt.u64.u32 %rd890, %r1023; 2026-02-21T08:53:00.6954200Z shl.b64 %rd891, %rd890, 32; 2026-02-21T08:53:00.6954310Z or.b64 %rd892, %rd889, %rd891; 2026-02-21T08:53:00.6954754Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6954951Z mov.b64 {%r1627, %r1628}, %rd892; 
2026-02-21T08:53:00.6955089Z cvt.rn.f16x2.f32 %r1629, %r1628, %r1627; 2026-02-21T08:53:00.6955456Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6955567Z cvt.u64.u32 %rd893, %r1024; 2026-02-21T08:53:00.6955684Z cvt.u64.u32 %rd894, %r1025; 2026-02-21T08:53:00.6955790Z shl.b64 %rd895, %rd894, 32; 2026-02-21T08:53:00.6955898Z or.b64 %rd896, %rd893, %rd895; 2026-02-21T08:53:00.6956274Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6956386Z mov.b64 {%r1630, %r1631}, %rd896; 2026-02-21T08:53:00.6956515Z cvt.rn.f16x2.f32 %r1632, %r1631, %r1630; 2026-02-21T08:53:00.6956877Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6956997Z cvt.u64.u32 %rd897, %r1026; 2026-02-21T08:53:00.6957107Z cvt.u64.u32 %rd898, %r1027; 2026-02-21T08:53:00.6957223Z shl.b64 %rd899, %rd898, 32; 2026-02-21T08:53:00.6957349Z or.b64 %rd900, %rd897, %rd899; 2026-02-21T08:53:00.6957734Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6957955Z mov.b64 {%r1633, %r1634}, %rd900; 2026-02-21T08:53:00.6958098Z cvt.rn.f16x2.f32 %r1635, %r1634, %r1633; 2026-02-21T08:53:00.6958489Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6958605Z cvt.u64.u32 %rd901, %r1028; 2026-02-21T08:53:00.6958715Z cvt.u64.u32 %rd902, %r1029; 2026-02-21T08:53:00.6958834Z shl.b64 %rd903, %rd902, 32; 2026-02-21T08:53:00.6958947Z or.b64 %rd904, %rd901, %rd903; 2026-02-21T08:53:00.6959301Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6959421Z mov.b64 {%r1636, %r1637}, %rd904; 2026-02-21T08:53:00.6959547Z cvt.rn.f16x2.f32 %r1638, %r1637, %r1636; 2026-02-21T08:53:00.6959913Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6960038Z cvt.u64.u32 %rd905, %r1030; 2026-02-21T08:53:00.6960156Z 
cvt.u64.u32 %rd906, %r1031; 2026-02-21T08:53:00.6960267Z shl.b64 %rd907, %rd906, 32; 2026-02-21T08:53:00.6960380Z or.b64 %rd908, %rd905, %rd907; 2026-02-21T08:53:00.6960763Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6960877Z mov.b64 {%r1639, %r1640}, %rd908; 2026-02-21T08:53:00.6961004Z cvt.rn.f16x2.f32 %r1641, %r1640, %r1639; 2026-02-21T08:53:00.6961385Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6961636Z cvt.u64.u32 %rd909, %r1033; 2026-02-21T08:53:00.6961752Z cvt.u64.u32 %rd910, %r1034; 2026-02-21T08:53:00.6961874Z shl.b64 %rd911, %rd910, 32; 2026-02-21T08:53:00.6961990Z or.b64 %rd912, %rd909, %rd911; 2026-02-21T08:53:00.6962379Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6962572Z mov.b64 {%r1642, %r1643}, %rd912; 2026-02-21T08:53:00.6962727Z cvt.rn.f16x2.f32 %r1644, %r1643, %r1642; 2026-02-21T08:53:00.6963121Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6963238Z cvt.u64.u32 %rd913, %r1035; 2026-02-21T08:53:00.6963363Z cvt.u64.u32 %rd914, %r1036; 2026-02-21T08:53:00.6963478Z shl.b64 %rd915, %rd914, 32; 2026-02-21T08:53:00.6963599Z or.b64 %rd916, %rd913, %rd915; 2026-02-21T08:53:00.6963986Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6964104Z mov.b64 {%r1645, %r1646}, %rd916; 2026-02-21T08:53:00.6964237Z cvt.rn.f16x2.f32 %r1647, %r1646, %r1645; 2026-02-21T08:53:00.6964633Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6964930Z cvt.u64.u32 %rd917, %r1037; 2026-02-21T08:53:00.6965050Z cvt.u64.u32 %rd918, %r1038; 2026-02-21T08:53:00.6965165Z shl.b64 %rd919, %rd918, 32; 2026-02-21T08:53:00.6965289Z or.b64 %rd920, %rd917, %rd919; 2026-02-21T08:53:00.6965648Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 
2026-02-21T08:53:00.6965761Z mov.b64 {%r1648, %r1649}, %rd920; 2026-02-21T08:53:00.6965901Z cvt.rn.f16x2.f32 %r1650, %r1649, %r1648; 2026-02-21T08:53:00.6966285Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6966398Z cvt.u64.u32 %rd921, %r1039; 2026-02-21T08:53:00.6966512Z cvt.u64.u32 %rd922, %r1040; 2026-02-21T08:53:00.6966632Z shl.b64 %rd923, %rd922, 32; 2026-02-21T08:53:00.6966746Z or.b64 %rd924, %rd921, %rd923; 2026-02-21T08:53:00.6967130Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6967253Z mov.b64 {%r1651, %r1652}, %rd924; 2026-02-21T08:53:00.6967385Z cvt.rn.f16x2.f32 %r1653, %r1652, %r1651; 2026-02-21T08:53:00.6967763Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6967993Z cvt.u64.u32 %rd925, %r1041; 2026-02-21T08:53:00.6968108Z cvt.u64.u32 %rd926, %r1042; 2026-02-21T08:53:00.6968217Z shl.b64 %rd927, %rd926, 32; 2026-02-21T08:53:00.6968332Z or.b64 %rd928, %rd925, %rd927; 2026-02-21T08:53:00.6968718Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6968835Z mov.b64 {%r1654, %r1655}, %rd928; 2026-02-21T08:53:00.6968968Z cvt.rn.f16x2.f32 %r1656, %r1655, %r1654; 2026-02-21T08:53:00.6969379Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6969496Z cvt.u64.u32 %rd929, %r1043; 2026-02-21T08:53:00.6969608Z cvt.u64.u32 %rd930, %r1044; 2026-02-21T08:53:00.6969733Z shl.b64 %rd931, %rd930, 32; 2026-02-21T08:53:00.6969851Z or.b64 %rd932, %rd929, %rd931; 2026-02-21T08:53:00.6970243Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6970360Z mov.b64 {%r1657, %r1658}, %rd932; 2026-02-21T08:53:00.6970502Z cvt.rn.f16x2.f32 %r1659, %r1658, %r1657; 2026-02-21T08:53:00.6970878Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 
2026-02-21T08:53:00.6970987Z cvt.u64.u32 %rd933, %r1045; 2026-02-21T08:53:00.6971104Z cvt.u64.u32 %rd934, %r1046; 2026-02-21T08:53:00.6971211Z shl.b64 %rd935, %rd934, 32; 2026-02-21T08:53:00.6971439Z or.b64 %rd936, %rd933, %rd935; 2026-02-21T08:53:00.6971821Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6971936Z mov.b64 {%r1660, %r1661}, %rd936; 2026-02-21T08:53:00.6972067Z cvt.rn.f16x2.f32 %r1662, %r1661, %r1660; 2026-02-21T08:53:00.6972446Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6972644Z cvt.u64.u32 %rd937, %r1047; 2026-02-21T08:53:00.6972771Z cvt.u64.u32 %rd938, %r1048; 2026-02-21T08:53:00.6972883Z shl.b64 %rd939, %rd938, 32; 2026-02-21T08:53:00.6973006Z or.b64 %rd940, %rd937, %rd939; 2026-02-21T08:53:00.6973385Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6973499Z mov.b64 {%r1663, %r1664}, %rd940; 2026-02-21T08:53:00.6973636Z cvt.rn.f16x2.f32 %r1665, %r1664, %r1663; 2026-02-21T08:53:00.6973748Z mov.b64 {%r1666, %r1667}, %rd332; 2026-02-21T08:53:00.6973881Z cvt.rn.f16x2.f32 %r1668, %r1667, %r1666; 2026-02-21T08:53:00.6973990Z mov.b64 {%r1669, %r1670}, %rd336; 2026-02-21T08:53:00.6974127Z cvt.rn.f16x2.f32 %r1671, %r1670, %r1669; 2026-02-21T08:53:00.6974236Z mov.b64 {%r1672, %r1673}, %rd340; 2026-02-21T08:53:00.6974364Z cvt.rn.f16x2.f32 %r1674, %r1673, %r1672; 2026-02-21T08:53:00.6974553Z mov.b64 {%r1675, %r1676}, %rd344; 2026-02-21T08:53:00.6974758Z cvt.rn.f16x2.f32 %r1677, %r1676, %r1675; 2026-02-21T08:53:00.6974883Z mov.b64 {%r1678, %r1679}, %rd348; 2026-02-21T08:53:00.6975015Z cvt.rn.f16x2.f32 %r1680, %r1679, %r1678; 2026-02-21T08:53:00.6975141Z mov.b64 {%r1681, %r1682}, %rd352; 2026-02-21T08:53:00.6975278Z cvt.rn.f16x2.f32 %r1683, %r1682, %r1681; 2026-02-21T08:53:00.6975393Z mov.b64 {%r1684, %r1685}, %rd356; 2026-02-21T08:53:00.6975541Z cvt.rn.f16x2.f32 %r1686, %r1685, %r1684; 
2026-02-21T08:53:00.6975658Z mov.b64 {%r1687, %r1688}, %rd360; 2026-02-21T08:53:00.6975789Z cvt.rn.f16x2.f32 %r1689, %r1688, %r1687; 2026-02-21T08:53:00.6975913Z mov.b64 {%r1690, %r1691}, %rd364; 2026-02-21T08:53:00.6976044Z cvt.rn.f16x2.f32 %r1692, %r1691, %r1690; 2026-02-21T08:53:00.6976155Z mov.b64 {%r1693, %r1694}, %rd368; 2026-02-21T08:53:00.6976285Z cvt.rn.f16x2.f32 %r1695, %r1694, %r1693; 2026-02-21T08:53:00.6976403Z mov.b64 {%r1696, %r1697}, %rd372; 2026-02-21T08:53:00.6976536Z cvt.rn.f16x2.f32 %r1698, %r1697, %r1696; 2026-02-21T08:53:00.6976647Z mov.b64 {%r1699, %r1700}, %rd376; 2026-02-21T08:53:00.6976885Z cvt.rn.f16x2.f32 %r1701, %r1700, %r1699; 2026-02-21T08:53:00.6976996Z mov.b64 {%r1702, %r1703}, %rd380; 2026-02-21T08:53:00.6977125Z cvt.rn.f16x2.f32 %r1704, %r1703, %r1702; 2026-02-21T08:53:00.6977244Z mov.b64 {%r1705, %r1706}, %rd384; 2026-02-21T08:53:00.6977364Z cvt.rn.f16x2.f32 %r1707, %r1706, %r1705; 2026-02-21T08:53:00.6977466Z mov.b64 {%r1708, %r1709}, %rd388; 2026-02-21T08:53:00.6977583Z cvt.rn.f16x2.f32 %r1710, %r1709, %r1708; 2026-02-21T08:53:00.6977695Z mov.b64 {%r1711, %r1712}, %rd392; 2026-02-21T08:53:00.6977823Z cvt.rn.f16x2.f32 %r1713, %r1712, %r1711; 2026-02-21T08:53:00.6978192Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6978311Z cvt.u64.u32 %rd941, %r1084; 2026-02-21T08:53:00.6978419Z cvt.u64.u32 %rd942, %r1085; 2026-02-21T08:53:00.6978526Z shl.b64 %rd943, %rd942, 32; 2026-02-21T08:53:00.6978636Z or.b64 %rd944, %rd941, %rd943; 2026-02-21T08:53:00.6979018Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6979127Z mov.b64 {%r1714, %r1715}, %rd944; 2026-02-21T08:53:00.6979250Z cvt.rn.f16x2.f32 %r1716, %r1715, %r1714; 2026-02-21T08:53:00.6979618Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6979726Z cvt.u64.u32 %rd945, %r1086; 2026-02-21T08:53:00.6979832Z cvt.u64.u32 %rd946, %r1087; 
2026-02-21T08:53:00.6979945Z shl.b64 %rd947, %rd946, 32; 2026-02-21T08:53:00.6980167Z or.b64 %rd948, %rd945, %rd947; 2026-02-21T08:53:00.6980533Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6980640Z mov.b64 {%r1717, %r1718}, %rd948; 2026-02-21T08:53:00.6980781Z cvt.rn.f16x2.f32 %r1719, %r1718, %r1717; 2026-02-21T08:53:00.6981225Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6981353Z cvt.u64.u32 %rd949, %r1088; 2026-02-21T08:53:00.6981478Z cvt.u64.u32 %rd950, %r1089; 2026-02-21T08:53:00.6981592Z shl.b64 %rd951, %rd950, 32; 2026-02-21T08:53:00.6981705Z or.b64 %rd952, %rd949, %rd951; 2026-02-21T08:53:00.6982076Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6982187Z mov.b64 {%r1720, %r1721}, %rd952; 2026-02-21T08:53:00.6982319Z cvt.rn.f16x2.f32 %r1722, %r1721, %r1720; 2026-02-21T08:53:00.6982722Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6982832Z cvt.u64.u32 %rd953, %r1090; 2026-02-21T08:53:00.6982943Z cvt.u64.u32 %rd954, %r1091; 2026-02-21T08:53:00.6983052Z shl.b64 %rd955, %rd954, 32; 2026-02-21T08:53:00.6983171Z or.b64 %rd956, %rd953, %rd955; 2026-02-21T08:53:00.6983627Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6983759Z mov.b64 {%r1723, %r1724}, %rd956; 2026-02-21T08:53:00.6983901Z cvt.rn.f16x2.f32 %r1725, %r1724, %r1723; 2026-02-21T08:53:00.6984286Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6984397Z cvt.u64.u32 %rd957, %r1092; 2026-02-21T08:53:00.6984514Z cvt.u64.u32 %rd958, %r1093; 2026-02-21T08:53:00.6984627Z shl.b64 %rd959, %rd958, 32; 2026-02-21T08:53:00.6984808Z or.b64 %rd960, %rd957, %rd959; 2026-02-21T08:53:00.6985194Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6985319Z mov.b64 
{%r1726, %r1727}, %rd960; 2026-02-21T08:53:00.6985444Z cvt.rn.f16x2.f32 %r1728, %r1727, %r1726; 2026-02-21T08:53:00.6985811Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6985930Z cvt.u64.u32 %rd961, %r1094; 2026-02-21T08:53:00.6986045Z cvt.u64.u32 %rd962, %r1095; 2026-02-21T08:53:00.6986153Z shl.b64 %rd963, %rd962, 32; 2026-02-21T08:53:00.6986395Z or.b64 %rd964, %rd961, %rd963; 2026-02-21T08:53:00.6986782Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6986895Z mov.b64 {%r1729, %r1730}, %rd964; 2026-02-21T08:53:00.6987029Z cvt.rn.f16x2.f32 %r1731, %r1730, %r1729; 2026-02-21T08:53:00.6987423Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6987548Z cvt.u64.u32 %rd965, %r1096; 2026-02-21T08:53:00.6987663Z cvt.u64.u32 %rd966, %r1097; 2026-02-21T08:53:00.6987785Z shl.b64 %rd967, %rd966, 32; 2026-02-21T08:53:00.6987899Z or.b64 %rd968, %rd965, %rd967; 2026-02-21T08:53:00.6988268Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6988394Z mov.b64 {%r1732, %r1733}, %rd968; 2026-02-21T08:53:00.6988531Z cvt.rn.f16x2.f32 %r1734, %r1733, %r1732; 2026-02-21T08:53:00.6988920Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6989032Z cvt.u64.u32 %rd969, %r1098; 2026-02-21T08:53:00.6989152Z cvt.u64.u32 %rd970, %r1099; 2026-02-21T08:53:00.6989265Z shl.b64 %rd971, %rd970, 32; 2026-02-21T08:53:00.6989375Z or.b64 %rd972, %rd969, %rd971; 2026-02-21T08:53:00.6989738Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6989848Z mov.b64 {%r1735, %r1736}, %rd972; 2026-02-21T08:53:00.6990098Z cvt.rn.f16x2.f32 %r1737, %r1736, %r1735; 2026-02-21T08:53:00.6990494Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6990607Z cvt.u64.u32 %rd973, %r1101; 
2026-02-21T08:53:00.6990719Z cvt.u64.u32 %rd974, %r1102; 2026-02-21T08:53:00.6990835Z shl.b64 %rd975, %rd974, 32; 2026-02-21T08:53:00.6990957Z or.b64 %rd976, %rd973, %rd975; 2026-02-21T08:53:00.6991417Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6991543Z mov.b64 {%r1738, %r1739}, %rd976; 2026-02-21T08:53:00.6991684Z cvt.rn.f16x2.f32 %r1740, %r1739, %r1738; 2026-02-21T08:53:00.6992060Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6992170Z cvt.u64.u32 %rd977, %r1103; 2026-02-21T08:53:00.6992305Z cvt.u64.u32 %rd978, %r1104; 2026-02-21T08:53:00.6992415Z shl.b64 %rd979, %rd978, 32; 2026-02-21T08:53:00.6992531Z or.b64 %rd980, %rd977, %rd979; 2026-02-21T08:53:00.6992903Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6993029Z mov.b64 {%r1741, %r1742}, %rd980; 2026-02-21T08:53:00.6993158Z cvt.rn.f16x2.f32 %r1743, %r1742, %r1741; 2026-02-21T08:53:00.6993661Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6993809Z cvt.u64.u32 %rd981, %r1105; 2026-02-21T08:53:00.6993926Z cvt.u64.u32 %rd982, %r1106; 2026-02-21T08:53:00.6994040Z shl.b64 %rd983, %rd982, 32; 2026-02-21T08:53:00.6994165Z or.b64 %rd984, %rd981, %rd983; 2026-02-21T08:53:00.6994557Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6994748Z mov.b64 {%r1744, %r1745}, %rd984; 2026-02-21T08:53:00.6994893Z cvt.rn.f16x2.f32 %r1746, %r1745, %r1744; 2026-02-21T08:53:00.6995282Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6995394Z cvt.u64.u32 %rd985, %r1107; 2026-02-21T08:53:00.6995504Z cvt.u64.u32 %rd986, %r1108; 2026-02-21T08:53:00.6995629Z shl.b64 %rd987, %rd986, 32; 2026-02-21T08:53:00.6995741Z or.b64 %rd988, %rd985, %rd987; 2026-02-21T08:53:00.6996115Z .loc 1 59 27 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6996239Z mov.b64 {%r1747, %r1748}, %rd988; 2026-02-21T08:53:00.6996476Z cvt.rn.f16x2.f32 %r1749, %r1748, %r1747; 2026-02-21T08:53:00.6996863Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6996978Z cvt.u64.u32 %rd989, %r1109; 2026-02-21T08:53:00.6997102Z cvt.u64.u32 %rd990, %r1110; 2026-02-21T08:53:00.6997213Z shl.b64 %rd991, %rd990, 32; 2026-02-21T08:53:00.6997327Z or.b64 %rd992, %rd989, %rd991; 2026-02-21T08:53:00.6997715Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6997831Z mov.b64 {%r1750, %r1751}, %rd992; 2026-02-21T08:53:00.6997959Z cvt.rn.f16x2.f32 %r1752, %r1751, %r1750; 2026-02-21T08:53:00.6998344Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.6998458Z cvt.u64.u32 %rd993, %r1111; 2026-02-21T08:53:00.6998571Z cvt.u64.u32 %rd994, %r1112; 2026-02-21T08:53:00.6998681Z shl.b64 %rd995, %rd994, 32; 2026-02-21T08:53:00.6998804Z or.b64 %rd996, %rd993, %rd995; 2026-02-21T08:53:00.6999187Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.6999300Z mov.b64 {%r1753, %r1754}, %rd996; 2026-02-21T08:53:00.6999444Z cvt.rn.f16x2.f32 %r1755, %r1754, %r1753; 2026-02-21T08:53:00.6999851Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7000086Z cvt.u64.u32 %rd997, %r1113; 2026-02-21T08:53:00.7000208Z cvt.u64.u32 %rd998, %r1114; 2026-02-21T08:53:00.7000317Z shl.b64 %rd999, %rd998, 32; 2026-02-21T08:53:00.7000436Z or.b64 %rd1000, %rd997, %rd999; 2026-02-21T08:53:00.7000804Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7000931Z mov.b64 {%r1756, %r1757}, %rd1000; 2026-02-21T08:53:00.7001127Z cvt.rn.f16x2.f32 %r1758, %r1757, %r1756; 2026-02-21T08:53:00.7001498Z .loc 1 57 52 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7001620Z cvt.u64.u32 %rd1001, %r1115; 2026-02-21T08:53:00.7001730Z cvt.u64.u32 %rd1002, %r1116; 2026-02-21T08:53:00.7001840Z shl.b64 %rd1003, %rd1002, 32; 2026-02-21T08:53:00.7001957Z or.b64 %rd1004, %rd1001, %rd1003; 2026-02-21T08:53:00.7002313Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7002432Z mov.b64 {%r1759, %r1760}, %rd1004; 2026-02-21T08:53:00.7002559Z cvt.rn.f16x2.f32 %r1761, %r1760, %r1759; 2026-02-21T08:53:00.7002677Z mov.b64 {%r1762, %r1763}, %rd396; 2026-02-21T08:53:00.7002804Z cvt.rn.f16x2.f32 %r1764, %r1763, %r1762; 2026-02-21T08:53:00.7002916Z mov.b64 {%r1765, %r1766}, %rd400; 2026-02-21T08:53:00.7003135Z cvt.rn.f16x2.f32 %r1767, %r1766, %r1765; 2026-02-21T08:53:00.7003256Z mov.b64 {%r1768, %r1769}, %rd404; 2026-02-21T08:53:00.7003384Z cvt.rn.f16x2.f32 %r1770, %r1769, %r1768; 2026-02-21T08:53:00.7003507Z mov.b64 {%r1771, %r1772}, %rd408; 2026-02-21T08:53:00.7003635Z cvt.rn.f16x2.f32 %r1773, %r1772, %r1771; 2026-02-21T08:53:00.7003743Z mov.b64 {%r1774, %r1775}, %rd412; 2026-02-21T08:53:00.7003866Z cvt.rn.f16x2.f32 %r1776, %r1775, %r1774; 2026-02-21T08:53:00.7003985Z mov.b64 {%r1777, %r1778}, %rd416; 2026-02-21T08:53:00.7004109Z cvt.rn.f16x2.f32 %r1779, %r1778, %r1777; 2026-02-21T08:53:00.7004218Z mov.b64 {%r1780, %r1781}, %rd420; 2026-02-21T08:53:00.7004358Z cvt.rn.f16x2.f32 %r1782, %r1781, %r1780; 2026-02-21T08:53:00.7004466Z mov.b64 {%r1783, %r1784}, %rd424; 2026-02-21T08:53:00.7004591Z cvt.rn.f16x2.f32 %r1785, %r1784, %r1783; 2026-02-21T08:53:00.7004755Z mov.b64 {%r1786, %r1787}, %rd428; 2026-02-21T08:53:00.7004897Z cvt.rn.f16x2.f32 %r1788, %r1787, %r1786; 2026-02-21T08:53:00.7005012Z mov.b64 {%r1789, %r1790}, %rd432; 2026-02-21T08:53:00.7005142Z cvt.rn.f16x2.f32 %r1791, %r1790, %r1789; 2026-02-21T08:53:00.7005264Z mov.b64 {%r1792, %r1793}, %rd436; 2026-02-21T08:53:00.7005483Z cvt.rn.f16x2.f32 %r1794, %r1793, 
%r1792; 2026-02-21T08:53:00.7005600Z mov.b64 {%r1795, %r1796}, %rd440; 2026-02-21T08:53:00.7005741Z cvt.rn.f16x2.f32 %r1797, %r1796, %r1795; 2026-02-21T08:53:00.7005867Z mov.b64 {%r1798, %r1799}, %rd444; 2026-02-21T08:53:00.7006008Z cvt.rn.f16x2.f32 %r1800, %r1799, %r1798; 2026-02-21T08:53:00.7006117Z mov.b64 {%r1801, %r1802}, %rd448; 2026-02-21T08:53:00.7006251Z cvt.rn.f16x2.f32 %r1803, %r1802, %r1801; 2026-02-21T08:53:00.7006363Z mov.b64 {%r1804, %r1805}, %rd452; 2026-02-21T08:53:00.7006485Z cvt.rn.f16x2.f32 %r1806, %r1805, %r1804; 2026-02-21T08:53:00.7006604Z mov.b64 {%r1807, %r1808}, %rd456; 2026-02-21T08:53:00.7006737Z cvt.rn.f16x2.f32 %r1809, %r1808, %r1807; 2026-02-21T08:53:00.7007138Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7007254Z cvt.u64.u32 %rd1005, %r1152; 2026-02-21T08:53:00.7007378Z cvt.u64.u32 %rd1006, %r1153; 2026-02-21T08:53:00.7007497Z shl.b64 %rd1007, %rd1006, 32; 2026-02-21T08:53:00.7007605Z or.b64 %rd1008, %rd1005, %rd1007; 2026-02-21T08:53:00.7007970Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7008087Z mov.b64 {%r1810, %r1811}, %rd1008; 2026-02-21T08:53:00.7008217Z cvt.rn.f16x2.f32 %r1812, %r1811, %r1810; 2026-02-21T08:53:00.7008600Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7008837Z cvt.u64.u32 %rd1009, %r1154; 2026-02-21T08:53:00.7008954Z cvt.u64.u32 %rd1010, %r1155; 2026-02-21T08:53:00.7009071Z shl.b64 %rd1011, %rd1010, 32; 2026-02-21T08:53:00.7009219Z or.b64 %rd1012, %rd1009, %rd1011; 2026-02-21T08:53:00.7009686Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7009820Z mov.b64 {%r1813, %r1814}, %rd1012; 2026-02-21T08:53:00.7010178Z cvt.rn.f16x2.f32 %r1815, %r1814, %r1813; 2026-02-21T08:53:00.7010568Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7010682Z cvt.u64.u32 
%rd1013, %r1156; 2026-02-21T08:53:00.7010816Z cvt.u64.u32 %rd1014, %r1157; 2026-02-21T08:53:00.7010930Z shl.b64 %rd1015, %rd1014, 32; 2026-02-21T08:53:00.7011042Z or.b64 %rd1016, %rd1013, %rd1015; 2026-02-21T08:53:00.7011424Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7011556Z mov.b64 {%r1816, %r1817}, %rd1016; 2026-02-21T08:53:00.7011690Z cvt.rn.f16x2.f32 %r1818, %r1817, %r1816; 2026-02-21T08:53:00.7012089Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7012220Z cvt.u64.u32 %rd1017, %r1158; 2026-02-21T08:53:00.7012434Z cvt.u64.u32 %rd1018, %r1159; 2026-02-21T08:53:00.7012557Z shl.b64 %rd1019, %rd1018, 32; 2026-02-21T08:53:00.7012686Z or.b64 %rd1020, %rd1017, %rd1019; 2026-02-21T08:53:00.7013079Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7013199Z mov.b64 {%r1819, %r1820}, %rd1020; 2026-02-21T08:53:00.7013340Z cvt.rn.f16x2.f32 %r1821, %r1820, %r1819; 2026-02-21T08:53:00.7013705Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7013814Z cvt.u64.u32 %rd1021, %r1160; 2026-02-21T08:53:00.7013921Z cvt.u64.u32 %rd1022, %r1161; 2026-02-21T08:53:00.7014042Z shl.b64 %rd1023, %rd1022, 32; 2026-02-21T08:53:00.7014155Z or.b64 %rd1024, %rd1021, %rd1023; 2026-02-21T08:53:00.7014526Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7014654Z mov.b64 {%r1822, %r1823}, %rd1024; 2026-02-21T08:53:00.7014878Z cvt.rn.f16x2.f32 %r1824, %r1823, %r1822; 2026-02-21T08:53:00.7015268Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7015491Z cvt.u64.u32 %rd1025, %r1162; 2026-02-21T08:53:00.7015606Z cvt.u64.u32 %rd1026, %r1163; 2026-02-21T08:53:00.7015721Z shl.b64 %rd1027, %rd1026, 32; 2026-02-21T08:53:00.7015835Z or.b64 %rd1028, %rd1025, %rd1027; 2026-02-21T08:53:00.7016211Z .loc 1 59 27 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7016326Z mov.b64 {%r1825, %r1826}, %rd1028; 2026-02-21T08:53:00.7016458Z cvt.rn.f16x2.f32 %r1827, %r1826, %r1825; 2026-02-21T08:53:00.7016838Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7016953Z cvt.u64.u32 %rd1029, %r1164; 2026-02-21T08:53:00.7017068Z cvt.u64.u32 %rd1030, %r1165; 2026-02-21T08:53:00.7017193Z shl.b64 %rd1031, %rd1030, 32; 2026-02-21T08:53:00.7017310Z or.b64 %rd1032, %rd1029, %rd1031; 2026-02-21T08:53:00.7017706Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7017832Z mov.b64 {%r1828, %r1829}, %rd1032; 2026-02-21T08:53:00.7017975Z cvt.rn.f16x2.f32 %r1830, %r1829, %r1828; 2026-02-21T08:53:00.7018358Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7018472Z cvt.u64.u32 %rd1033, %r1166; 2026-02-21T08:53:00.7018595Z cvt.u64.u32 %rd1034, %r1167; 2026-02-21T08:53:00.7018711Z shl.b64 %rd1035, %rd1034, 32; 2026-02-21T08:53:00.7018948Z or.b64 %rd1036, %rd1033, %rd1035; 2026-02-21T08:53:00.7019333Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7019446Z mov.b64 {%r1831, %r1832}, %rd1036; 2026-02-21T08:53:00.7019572Z cvt.rn.f16x2.f32 %r1833, %r1832, %r1831; 2026-02-21T08:53:00.7019932Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7020129Z cvt.u64.u32 %rd1037, %r1169; 2026-02-21T08:53:00.7020258Z cvt.u64.u32 %rd1038, %r1170; 2026-02-21T08:53:00.7020373Z shl.b64 %rd1039, %rd1038, 32; 2026-02-21T08:53:00.7020498Z or.b64 %rd1040, %rd1037, %rd1039; 2026-02-21T08:53:00.7020878Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7020999Z mov.b64 {%r1834, %r1835}, %rd1040; 2026-02-21T08:53:00.7021139Z cvt.rn.f16x2.f32 %r1836, %r1835, %r1834; 2026-02-21T08:53:00.7021512Z .loc 1 57 52 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7021630Z cvt.u64.u32 %rd1041, %r1171; 2026-02-21T08:53:00.7021741Z cvt.u64.u32 %rd1042, %r1172; 2026-02-21T08:53:00.7021865Z shl.b64 %rd1043, %rd1042, 32; 2026-02-21T08:53:00.7021977Z or.b64 %rd1044, %rd1041, %rd1043; 2026-02-21T08:53:00.7022469Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7022609Z mov.b64 {%r1837, %r1838}, %rd1044; 2026-02-21T08:53:00.7022746Z cvt.rn.f16x2.f32 %r1839, %r1838, %r1837; 2026-02-21T08:53:00.7023128Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7023254Z cvt.u64.u32 %rd1045, %r1173; 2026-02-21T08:53:00.7023374Z cvt.u64.u32 %rd1046, %r1174; 2026-02-21T08:53:00.7023491Z shl.b64 %rd1047, %rd1046, 32; 2026-02-21T08:53:00.7023608Z or.b64 %rd1048, %rd1045, %rd1047; 2026-02-21T08:53:00.7024003Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7024121Z mov.b64 {%r1840, %r1841}, %rd1048; 2026-02-21T08:53:00.7024253Z cvt.rn.f16x2.f32 %r1842, %r1841, %r1840; 2026-02-21T08:53:00.7024637Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7024849Z cvt.u64.u32 %rd1049, %r1175; 2026-02-21T08:53:00.7024962Z cvt.u64.u32 %rd1050, %r1176; 2026-02-21T08:53:00.7025084Z shl.b64 %rd1051, %rd1050, 32; 2026-02-21T08:53:00.7025307Z or.b64 %rd1052, %rd1049, %rd1051; 2026-02-21T08:53:00.7025655Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7025766Z mov.b64 {%r1843, %r1844}, %rd1052; 2026-02-21T08:53:00.7025899Z cvt.rn.f16x2.f32 %r1845, %r1844, %r1843; 2026-02-21T08:53:00.7026251Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7026359Z cvt.u64.u32 %rd1053, %r1177; 2026-02-21T08:53:00.7026479Z cvt.u64.u32 %rd1054, %r1178; 2026-02-21T08:53:00.7026587Z shl.b64 %rd1055, %rd1054, 32; 
2026-02-21T08:53:00.7026698Z or.b64 %rd1056, %rd1053, %rd1055; 2026-02-21T08:53:00.7027075Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7027189Z mov.b64 {%r1846, %r1847}, %rd1056; 2026-02-21T08:53:00.7027317Z cvt.rn.f16x2.f32 %r1848, %r1847, %r1846; 2026-02-21T08:53:00.7027682Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7027808Z cvt.u64.u32 %rd1057, %r1179; 2026-02-21T08:53:00.7027937Z cvt.u64.u32 %rd1058, %r1180; 2026-02-21T08:53:00.7028063Z shl.b64 %rd1059, %rd1058, 32; 2026-02-21T08:53:00.7028261Z or.b64 %rd1060, %rd1057, %rd1059; 2026-02-21T08:53:00.7028705Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7028939Z mov.b64 {%r1849, %r1850}, %rd1060; 2026-02-21T08:53:00.7029076Z cvt.rn.f16x2.f32 %r1851, %r1850, %r1849; 2026-02-21T08:53:00.7029457Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7029573Z cvt.u64.u32 %rd1061, %r1181; 2026-02-21T08:53:00.7029687Z cvt.u64.u32 %rd1062, %r1182; 2026-02-21T08:53:00.7029816Z shl.b64 %rd1063, %rd1062, 32; 2026-02-21T08:53:00.7030002Z or.b64 %rd1064, %rd1061, %rd1063; 2026-02-21T08:53:00.7030389Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7030525Z mov.b64 {%r1852, %r1853}, %rd1064; 2026-02-21T08:53:00.7030661Z cvt.rn.f16x2.f32 %r1854, %r1853, %r1852; 2026-02-21T08:53:00.7031045Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7031167Z cvt.u64.u32 %rd1065, %r1183; 2026-02-21T08:53:00.7031278Z cvt.u64.u32 %rd1066, %r1184; 2026-02-21T08:53:00.7031391Z shl.b64 %rd1067, %rd1066, 32; 2026-02-21T08:53:00.7031504Z or.b64 %rd1068, %rd1065, %rd1067; 2026-02-21T08:53:00.7031872Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7031984Z mov.b64 {%r1855, %r1856}, %rd1068; 
2026-02-21T08:53:00.7032211Z cvt.rn.f16x2.f32 %r1857, %r1856, %r1855; 2026-02-21T08:53:00.7032350Z mov.b64 {%r1858, %r1859}, %rd460; 2026-02-21T08:53:00.7032488Z cvt.rn.f16x2.f32 %r1860, %r1859, %r1858; 2026-02-21T08:53:00.7032603Z mov.b64 {%r1861, %r1862}, %rd464; 2026-02-21T08:53:00.7032746Z cvt.rn.f16x2.f32 %r1863, %r1862, %r1861; 2026-02-21T08:53:00.7032858Z mov.b64 {%r1864, %r1865}, %rd468; 2026-02-21T08:53:00.7032989Z cvt.rn.f16x2.f32 %r1866, %r1865, %r1864; 2026-02-21T08:53:00.7033098Z mov.b64 {%r1867, %r1868}, %rd472; 2026-02-21T08:53:00.7033235Z cvt.rn.f16x2.f32 %r1869, %r1868, %r1867; 2026-02-21T08:53:00.7033345Z mov.b64 {%r1870, %r1871}, %rd476; 2026-02-21T08:53:00.7033474Z cvt.rn.f16x2.f32 %r1872, %r1871, %r1870; 2026-02-21T08:53:00.7033595Z mov.b64 {%r1873, %r1874}, %rd480; 2026-02-21T08:53:00.7033723Z cvt.rn.f16x2.f32 %r1875, %r1874, %r1873; 2026-02-21T08:53:00.7033832Z mov.b64 {%r1876, %r1877}, %rd484; 2026-02-21T08:53:00.7033967Z cvt.rn.f16x2.f32 %r1878, %r1877, %r1876; 2026-02-21T08:53:00.7034078Z mov.b64 {%r1879, %r1880}, %rd488; 2026-02-21T08:53:00.7034204Z cvt.rn.f16x2.f32 %r1881, %r1880, %r1879; 2026-02-21T08:53:00.7034315Z mov.b64 {%r1882, %r1883}, %rd492; 2026-02-21T08:53:00.7034536Z cvt.rn.f16x2.f32 %r1884, %r1883, %r1882; 2026-02-21T08:53:00.7034650Z mov.b64 {%r1885, %r1886}, %rd496; 2026-02-21T08:53:00.7034857Z cvt.rn.f16x2.f32 %r1887, %r1886, %r1885; 2026-02-21T08:53:00.7034984Z mov.b64 {%r1888, %r1889}, %rd500; 2026-02-21T08:53:00.7035119Z cvt.rn.f16x2.f32 %r1890, %r1889, %r1888; 2026-02-21T08:53:00.7035236Z mov.b64 {%r1891, %r1892}, %rd504; 2026-02-21T08:53:00.7035371Z cvt.rn.f16x2.f32 %r1893, %r1892, %r1891; 2026-02-21T08:53:00.7035496Z mov.b64 {%r1894, %r1895}, %rd508; 2026-02-21T08:53:00.7035632Z cvt.rn.f16x2.f32 %r1896, %r1895, %r1894; 2026-02-21T08:53:00.7035750Z mov.b64 {%r1897, %r1898}, %rd512; 2026-02-21T08:53:00.7035893Z cvt.rn.f16x2.f32 %r1899, %r1898, %r1897; 2026-02-21T08:53:00.7036010Z mov.b64 {%r1900, %r1901}, %rd516; 
2026-02-21T08:53:00.7036144Z cvt.rn.f16x2.f32 %r1902, %r1901, %r1900; 2026-02-21T08:53:00.7036263Z mov.b64 {%r1903, %r1904}, %rd520; 2026-02-21T08:53:00.7036392Z cvt.rn.f16x2.f32 %r1905, %r1904, %r1903; 2026-02-21T08:53:00.7036796Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7036913Z cvt.u64.u32 %rd1069, %r1220; 2026-02-21T08:53:00.7037037Z cvt.u64.u32 %rd1070, %r1221; 2026-02-21T08:53:00.7037151Z shl.b64 %rd1071, %rd1070, 32; 2026-02-21T08:53:00.7037264Z or.b64 %rd1072, %rd1069, %rd1071; 2026-02-21T08:53:00.7037643Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7037869Z mov.b64 {%r1906, %r1907}, %rd1072; 2026-02-21T08:53:00.7037992Z cvt.rn.f16x2.f32 %r1908, %r1907, %r1906; 2026-02-21T08:53:00.7038375Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7038492Z cvt.u64.u32 %rd1073, %r1222; 2026-02-21T08:53:00.7038608Z cvt.u64.u32 %rd1074, %r1223; 2026-02-21T08:53:00.7038721Z shl.b64 %rd1075, %rd1074, 32; 2026-02-21T08:53:00.7038915Z or.b64 %rd1076, %rd1073, %rd1075; 2026-02-21T08:53:00.7039316Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7039434Z mov.b64 {%r1909, %r1910}, %rd1076; 2026-02-21T08:53:00.7039574Z cvt.rn.f16x2.f32 %r1911, %r1910, %r1909; 2026-02-21T08:53:00.7039943Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7040057Z cvt.u64.u32 %rd1077, %r1224; 2026-02-21T08:53:00.7040180Z cvt.u64.u32 %rd1078, %r1225; 2026-02-21T08:53:00.7040299Z shl.b64 %rd1079, %rd1078, 32; 2026-02-21T08:53:00.7040415Z or.b64 %rd1080, %rd1077, %rd1079; 2026-02-21T08:53:00.7040784Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7040906Z mov.b64 {%r1912, %r1913}, %rd1080; 2026-02-21T08:53:00.7041131Z cvt.rn.f16x2.f32 %r1914, %r1913, %r1912; 2026-02-21T08:53:00.7041540Z .loc 1 57 
52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7041675Z cvt.u64.u32 %rd1081, %r1226; 2026-02-21T08:53:00.7041799Z cvt.u64.u32 %rd1082, %r1227; 2026-02-21T08:53:00.7041916Z shl.b64 %rd1083, %rd1082, 32; 2026-02-21T08:53:00.7042039Z or.b64 %rd1084, %rd1081, %rd1083; 2026-02-21T08:53:00.7042425Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7042541Z mov.b64 {%r1915, %r1916}, %rd1084; 2026-02-21T08:53:00.7042678Z cvt.rn.f16x2.f32 %r1917, %r1916, %r1915; 2026-02-21T08:53:00.7043067Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7043181Z cvt.u64.u32 %rd1085, %r1228; 2026-02-21T08:53:00.7043292Z cvt.u64.u32 %rd1086, %r1229; 2026-02-21T08:53:00.7043415Z shl.b64 %rd1087, %rd1086, 32; 2026-02-21T08:53:00.7043528Z or.b64 %rd1088, %rd1085, %rd1087; 2026-02-21T08:53:00.7043887Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7044130Z mov.b64 {%r1918, %r1919}, %rd1088; 2026-02-21T08:53:00.7044262Z cvt.rn.f16x2.f32 %r1920, %r1919, %r1918; 2026-02-21T08:53:00.7044643Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7044830Z cvt.u64.u32 %rd1089, %r1230; 2026-02-21T08:53:00.7044956Z cvt.u64.u32 %rd1090, %r1231; 2026-02-21T08:53:00.7045072Z shl.b64 %rd1091, %rd1090, 32; 2026-02-21T08:53:00.7045195Z or.b64 %rd1092, %rd1089, %rd1091; 2026-02-21T08:53:00.7045579Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7045692Z mov.b64 {%r1921, %r1922}, %rd1092; 2026-02-21T08:53:00.7045822Z cvt.rn.f16x2.f32 %r1923, %r1922, %r1921; 2026-02-21T08:53:00.7046204Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7046318Z cvt.u64.u32 %rd1093, %r1232; 2026-02-21T08:53:00.7046432Z cvt.u64.u32 %rd1094, %r1233; 2026-02-21T08:53:00.7046545Z shl.b64 %rd1095, %rd1094, 
32; 2026-02-21T08:53:00.7046666Z or.b64 %rd1096, %rd1093, %rd1095; 2026-02-21T08:53:00.7047044Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7047158Z mov.b64 {%r1924, %r1925}, %rd1096; 2026-02-21T08:53:00.7047303Z cvt.rn.f16x2.f32 %r1926, %r1925, %r1924; 2026-02-21T08:53:00.7047696Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7047938Z cvt.u64.u32 %rd1097, %r1234; 2026-02-21T08:53:00.7048071Z cvt.u64.u32 %rd1098, %r1235; 2026-02-21T08:53:00.7048186Z shl.b64 %rd1099, %rd1098, 32; 2026-02-21T08:53:00.7048299Z or.b64 %rd1100, %rd1097, %rd1099; 2026-02-21T08:53:00.7048683Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7048896Z mov.b64 {%r1927, %r1928}, %rd1100; 2026-02-21T08:53:00.7049039Z cvt.rn.f16x2.f32 %r1929, %r1928, %r1927; 2026-02-21T08:53:00.7049400Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7049515Z cvt.u64.u32 %rd1101, %r1237; 2026-02-21T08:53:00.7049620Z cvt.u64.u32 %rd1102, %r1238; 2026-02-21T08:53:00.7049726Z shl.b64 %rd1103, %rd1102, 32; 2026-02-21T08:53:00.7049839Z or.b64 %rd1104, %rd1101, %rd1103; 2026-02-21T08:53:00.7050192Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7050301Z mov.b64 {%r1930, %r1931}, %rd1104; 2026-02-21T08:53:00.7050426Z cvt.rn.f16x2.f32 %r1932, %r1931, %r1930; 2026-02-21T08:53:00.7050794Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7050986Z cvt.u64.u32 %rd1105, %r1239; 2026-02-21T08:53:00.7051108Z cvt.u64.u32 %rd1106, %r1240; 2026-02-21T08:53:00.7051233Z shl.b64 %rd1107, %rd1106, 32; 2026-02-21T08:53:00.7051343Z or.b64 %rd1108, %rd1105, %rd1107; 2026-02-21T08:53:00.7051704Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7051820Z mov.b64 {%r1933, %r1934}, %rd1108; 
2026-02-21T08:53:00.7051945Z cvt.rn.f16x2.f32 %r1935, %r1934, %r1933; 2026-02-21T08:53:00.7052302Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7052412Z cvt.u64.u32 %rd1109, %r1241; 2026-02-21T08:53:00.7052531Z cvt.u64.u32 %rd1110, %r1242; 2026-02-21T08:53:00.7052638Z shl.b64 %rd1111, %rd1110, 32; 2026-02-21T08:53:00.7052745Z or.b64 %rd1112, %rd1109, %rd1111; 2026-02-21T08:53:00.7053115Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7053226Z mov.b64 {%r1936, %r1937}, %rd1112; 2026-02-21T08:53:00.7053358Z cvt.rn.f16x2.f32 %r1938, %r1937, %r1936; 2026-02-21T08:53:00.7053874Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7053995Z cvt.u64.u32 %rd1113, %r1243; 2026-02-21T08:53:00.7054105Z cvt.u64.u32 %rd1114, %r1244; 2026-02-21T08:53:00.7054216Z shl.b64 %rd1115, %rd1114, 32; 2026-02-21T08:53:00.7054333Z or.b64 %rd1116, %rd1113, %rd1115; 2026-02-21T08:53:00.7054814Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7054941Z mov.b64 {%r1939, %r1940}, %rd1116; 2026-02-21T08:53:00.7055084Z cvt.rn.f16x2.f32 %r1941, %r1940, %r1939; 2026-02-21T08:53:00.7055458Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7055569Z cvt.u64.u32 %rd1117, %r1245; 2026-02-21T08:53:00.7055688Z cvt.u64.u32 %rd1118, %r1246; 2026-02-21T08:53:00.7055801Z shl.b64 %rd1119, %rd1118, 32; 2026-02-21T08:53:00.7055917Z or.b64 %rd1120, %rd1117, %rd1119; 2026-02-21T08:53:00.7056287Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7056414Z mov.b64 {%r1942, %r1943}, %rd1120; 2026-02-21T08:53:00.7056549Z cvt.rn.f16x2.f32 %r1944, %r1943, %r1942; 2026-02-21T08:53:00.7056932Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7057058Z cvt.u64.u32 %rd1121, %r1247; 
2026-02-21T08:53:00.7057169Z cvt.u64.u32 %rd1122, %r1248; 2026-02-21T08:53:00.7057395Z shl.b64 %rd1123, %rd1122, 32; 2026-02-21T08:53:00.7057518Z or.b64 %rd1124, %rd1121, %rd1123; 2026-02-21T08:53:00.7057885Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7057996Z mov.b64 {%r1945, %r1946}, %rd1124; 2026-02-21T08:53:00.7058127Z cvt.rn.f16x2.f32 %r1947, %r1946, %r1945; 2026-02-21T08:53:00.7058605Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7058731Z cvt.u64.u32 %rd1125, %r1249; 2026-02-21T08:53:00.7058845Z cvt.u64.u32 %rd1126, %r1250; 2026-02-21T08:53:00.7058968Z shl.b64 %rd1127, %rd1126, 32; 2026-02-21T08:53:00.7059083Z or.b64 %rd1128, %rd1125, %rd1127; 2026-02-21T08:53:00.7059474Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7059601Z mov.b64 {%r1948, %r1949}, %rd1128; 2026-02-21T08:53:00.7059739Z cvt.rn.f16x2.f32 %r1950, %r1949, %r1948; 2026-02-21T08:53:00.7060136Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7060256Z cvt.u64.u32 %rd1129, %r1251; 2026-02-21T08:53:00.7060368Z cvt.u64.u32 %rd1130, %r1252; 2026-02-21T08:53:00.7060481Z shl.b64 %rd1131, %rd1130, 32; 2026-02-21T08:53:00.7060686Z or.b64 %rd1132, %rd1129, %rd1131; 2026-02-21T08:53:00.7061096Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7061216Z mov.b64 {%r1951, %r1952}, %rd1132; 2026-02-21T08:53:00.7061343Z cvt.rn.f16x2.f32 %r1953, %r1952, %r1951; 2026-02-21T08:53:00.7061460Z mov.b64 {%r1954, %r1955}, %rd524; 2026-02-21T08:53:00.7061585Z cvt.rn.f16x2.f32 %r1956, %r1955, %r1954; 2026-02-21T08:53:00.7061690Z mov.b64 {%r1957, %r1958}, %rd528; 2026-02-21T08:53:00.7061812Z cvt.rn.f16x2.f32 %r1959, %r1958, %r1957; 2026-02-21T08:53:00.7061933Z mov.b64 {%r1960, %r1961}, %rd532; 2026-02-21T08:53:00.7062060Z cvt.rn.f16x2.f32 %r1962, %r1961, %r1960; 
2026-02-21T08:53:00.7062171Z mov.b64 {%r1963, %r1964}, %rd536; 2026-02-21T08:53:00.7062307Z cvt.rn.f16x2.f32 %r1965, %r1964, %r1963; 2026-02-21T08:53:00.7062417Z mov.b64 {%r1966, %r1967}, %rd540; 2026-02-21T08:53:00.7062545Z cvt.rn.f16x2.f32 %r1968, %r1967, %r1966; 2026-02-21T08:53:00.7062667Z mov.b64 {%r1969, %r1970}, %rd544; 2026-02-21T08:53:00.7062798Z cvt.rn.f16x2.f32 %r1971, %r1970, %r1969; 2026-02-21T08:53:00.7062910Z mov.b64 {%r1972, %r1973}, %rd548; 2026-02-21T08:53:00.7063143Z cvt.rn.f16x2.f32 %r1974, %r1973, %r1972; 2026-02-21T08:53:00.7063264Z mov.b64 {%r1975, %r1976}, %rd552; 2026-02-21T08:53:00.7063392Z cvt.rn.f16x2.f32 %r1977, %r1976, %r1975; 2026-02-21T08:53:00.7063500Z mov.b64 {%r1978, %r1979}, %rd556; 2026-02-21T08:53:00.7063636Z cvt.rn.f16x2.f32 %r1980, %r1979, %r1978; 2026-02-21T08:53:00.7063746Z mov.b64 {%r1981, %r1982}, %rd560; 2026-02-21T08:53:00.7063872Z cvt.rn.f16x2.f32 %r1983, %r1982, %r1981; 2026-02-21T08:53:00.7063984Z mov.b64 {%r1984, %r1985}, %rd564; 2026-02-21T08:53:00.7064119Z cvt.rn.f16x2.f32 %r1986, %r1985, %r1984; 2026-02-21T08:53:00.7064227Z mov.b64 {%r1987, %r1988}, %rd568; 2026-02-21T08:53:00.7064352Z cvt.rn.f16x2.f32 %r1989, %r1988, %r1987; 2026-02-21T08:53:00.7064471Z mov.b64 {%r1990, %r1991}, %rd572; 2026-02-21T08:53:00.7064601Z cvt.rn.f16x2.f32 %r1992, %r1991, %r1990; 2026-02-21T08:53:00.7064776Z mov.b64 {%r1993, %r1994}, %rd576; 2026-02-21T08:53:00.7064935Z cvt.rn.f16x2.f32 %r1995, %r1994, %r1993; 2026-02-21T08:53:00.7065051Z mov.b64 {%r1996, %r1997}, %rd580; 2026-02-21T08:53:00.7065180Z cvt.rn.f16x2.f32 %r1998, %r1997, %r1996; 2026-02-21T08:53:00.7065294Z mov.b64 {%r1999, %r2000}, %rd584; 2026-02-21T08:53:00.7065435Z cvt.rn.f16x2.f32 %r2001, %r2000, %r1999; 2026-02-21T08:53:00.7065848Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7065963Z cvt.u64.u32 %rd1133, %r1288; 2026-02-21T08:53:00.7066090Z cvt.u64.u32 %rd1134, %r1289; 2026-02-21T08:53:00.7066322Z shl.b64 %rd1135, 
%rd1134, 32; 2026-02-21T08:53:00.7066440Z or.b64 %rd1136, %rd1133, %rd1135; 2026-02-21T08:53:00.7066848Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7066961Z mov.b64 {%r2002, %r2003}, %rd1136; 2026-02-21T08:53:00.7067093Z cvt.rn.f16x2.f32 %r2004, %r2003, %r2002; 2026-02-21T08:53:00.7067532Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7067665Z cvt.u64.u32 %rd1137, %r1290; 2026-02-21T08:53:00.7067774Z cvt.u64.u32 %rd1138, %r1291; 2026-02-21T08:53:00.7067890Z shl.b64 %rd1139, %rd1138, 32; 2026-02-21T08:53:00.7068008Z or.b64 %rd1140, %rd1137, %rd1139; 2026-02-21T08:53:00.7068397Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7068508Z mov.b64 {%r2005, %r2006}, %rd1140; 2026-02-21T08:53:00.7068653Z cvt.rn.f16x2.f32 %r2007, %r2006, %r2005; 2026-02-21T08:53:00.7069045Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7069156Z cvt.u64.u32 %rd1141, %r1292; 2026-02-21T08:53:00.7069267Z cvt.u64.u32 %rd1142, %r1293; 2026-02-21T08:53:00.7069388Z shl.b64 %rd1143, %rd1142, 32; 2026-02-21T08:53:00.7069580Z or.b64 %rd1144, %rd1141, %rd1143; 2026-02-21T08:53:00.7069968Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7070097Z mov.b64 {%r2008, %r2009}, %rd1144; 2026-02-21T08:53:00.7070227Z cvt.rn.f16x2.f32 %r2010, %r2009, %r2008; 2026-02-21T08:53:00.7070602Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7070724Z cvt.u64.u32 %rd1145, %r1294; 2026-02-21T08:53:00.7070839Z cvt.u64.u32 %rd1146, %r1295; 2026-02-21T08:53:00.7070952Z shl.b64 %rd1147, %rd1146, 32; 2026-02-21T08:53:00.7071073Z or.b64 %rd1148, %rd1145, %rd1147; 2026-02-21T08:53:00.7071462Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7071583Z mov.b64 {%r2011, %r2012}, 
%rd1148; 2026-02-21T08:53:00.7071719Z cvt.rn.f16x2.f32 %r2013, %r2012, %r2011; 2026-02-21T08:53:00.7072117Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7072234Z cvt.u64.u32 %rd1149, %r1296; 2026-02-21T08:53:00.7072460Z cvt.u64.u32 %rd1150, %r1297; 2026-02-21T08:53:00.7072584Z shl.b64 %rd1151, %rd1150, 32; 2026-02-21T08:53:00.7072697Z or.b64 %rd1152, %rd1149, %rd1151; 2026-02-21T08:53:00.7073076Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7073187Z mov.b64 {%r2014, %r2015}, %rd1152; 2026-02-21T08:53:00.7073320Z cvt.rn.f16x2.f32 %r2016, %r2015, %r2014; 2026-02-21T08:53:00.7073670Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7073779Z cvt.u64.u32 %rd1153, %r1298; 2026-02-21T08:53:00.7073894Z cvt.u64.u32 %rd1154, %r1299; 2026-02-21T08:53:00.7074002Z shl.b64 %rd1155, %rd1154, 32; 2026-02-21T08:53:00.7074110Z or.b64 %rd1156, %rd1153, %rd1155; 2026-02-21T08:53:00.7074492Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7074605Z mov.b64 {%r2017, %r2018}, %rd1156; 2026-02-21T08:53:00.7074801Z cvt.rn.f16x2.f32 %r2019, %r2018, %r2017; 2026-02-21T08:53:00.7075171Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7075293Z cvt.u64.u32 %rd1157, %r1300; 2026-02-21T08:53:00.7075404Z cvt.u64.u32 %rd1158, %r1301; 2026-02-21T08:53:00.7075513Z shl.b64 %rd1159, %rd1158, 32; 2026-02-21T08:53:00.7075626Z or.b64 %rd1160, %rd1157, %rd1159; 2026-02-21T08:53:00.7075988Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7076217Z mov.b64 {%r2020, %r2021}, %rd1160; 2026-02-21T08:53:00.7076354Z cvt.rn.f16x2.f32 %r2022, %r2021, %r2020; 2026-02-21T08:53:00.7076724Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7076835Z cvt.u64.u32 %rd1161, 
%r1302; 2026-02-21T08:53:00.7076948Z cvt.u64.u32 %rd1162, %r1303; 2026-02-21T08:53:00.7077149Z shl.b64 %rd1163, %rd1162, 32; 2026-02-21T08:53:00.7077272Z or.b64 %rd1164, %rd1161, %rd1163; 2026-02-21T08:53:00.7077662Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7077793Z mov.b64 {%r2023, %r2024}, %rd1164; 2026-02-21T08:53:00.7077925Z cvt.rn.f16x2.f32 %r2025, %r2024, %r2023; 2026-02-21T08:53:00.7078283Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7078404Z cvt.u64.u32 %rd1165, %r1305; 2026-02-21T08:53:00.7078522Z cvt.u64.u32 %rd1166, %r1306; 2026-02-21T08:53:00.7078634Z shl.b64 %rd1167, %rd1166, 32; 2026-02-21T08:53:00.7078748Z or.b64 %rd1168, %rd1165, %rd1167; 2026-02-21T08:53:00.7079127Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7079324Z mov.b64 {%r2026, %r2027}, %rd1168; 2026-02-21T08:53:00.7079460Z cvt.rn.f16x2.f32 %r2028, %r2027, %r2026; 2026-02-21T08:53:00.7079838Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7079947Z cvt.u64.u32 %rd1169, %r1307; 2026-02-21T08:53:00.7080058Z cvt.u64.u32 %rd1170, %r1308; 2026-02-21T08:53:00.7080181Z shl.b64 %rd1171, %rd1170, 32; 2026-02-21T08:53:00.7080294Z or.b64 %rd1172, %rd1169, %rd1171; 2026-02-21T08:53:00.7080673Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7080793Z mov.b64 {%r2029, %r2030}, %rd1172; 2026-02-21T08:53:00.7080934Z cvt.rn.f16x2.f32 %r2031, %r2030, %r2029; 2026-02-21T08:53:00.7081312Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7081423Z cvt.u64.u32 %rd1173, %r1309; 2026-02-21T08:53:00.7081543Z cvt.u64.u32 %rd1174, %r1310; 2026-02-21T08:53:00.7081659Z shl.b64 %rd1175, %rd1174, 32; 2026-02-21T08:53:00.7081773Z or.b64 %rd1176, %rd1173, %rd1175; 2026-02-21T08:53:00.7082269Z .loc 1 59 27 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7082385Z mov.b64 {%r2032, %r2033}, %rd1176; 2026-02-21T08:53:00.7082519Z cvt.rn.f16x2.f32 %r2034, %r2033, %r2032; 2026-02-21T08:53:00.7082907Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7083026Z cvt.u64.u32 %rd1177, %r1311; 2026-02-21T08:53:00.7083156Z cvt.u64.u32 %rd1178, %r1312; 2026-02-21T08:53:00.7083273Z shl.b64 %rd1179, %rd1178, 32; 2026-02-21T08:53:00.7083401Z or.b64 %rd1180, %rd1177, %rd1179; 2026-02-21T08:53:00.7083806Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7083923Z mov.b64 {%r2035, %r2036}, %rd1180; 2026-02-21T08:53:00.7084073Z cvt.rn.f16x2.f32 %r2037, %r2036, %r2035; 2026-02-21T08:53:00.7084184Z mov.b64 {%r2038, %r2039}, %rd588; 2026-02-21T08:53:00.7084316Z cvt.rn.f16x2.f32 %r2040, %r2039, %r2038; 2026-02-21T08:53:00.7084819Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7084945Z cvt.u64.u32 %rd1181, %r1315; 2026-02-21T08:53:00.7085059Z cvt.u64.u32 %rd1182, %r1316; 2026-02-21T08:53:00.7085172Z shl.b64 %rd1183, %rd1182, 32; 2026-02-21T08:53:00.7085298Z or.b64 %rd1184, %rd1181, %rd1183; 2026-02-21T08:53:00.7085654Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7085877Z mov.b64 {%r2041, %r2042}, %rd1184; 2026-02-21T08:53:00.7086018Z cvt.rn.f16x2.f32 %r2043, %r2042, %r2041; 2026-02-21T08:53:00.7086397Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7086510Z cvt.u64.u32 %rd1185, %r1317; 2026-02-21T08:53:00.7086637Z cvt.u64.u32 %rd1186, %r1318; 2026-02-21T08:53:00.7086751Z shl.b64 %rd1187, %rd1186, 32; 2026-02-21T08:53:00.7086934Z or.b64 %rd1188, %rd1185, %rd1187; 2026-02-21T08:53:00.7087331Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7087455Z mov.b64 {%r2044, 
%r2045}, %rd1188; 2026-02-21T08:53:00.7087582Z cvt.rn.f16x2.f32 %r2046, %r2045, %r2044; 2026-02-21T08:53:00.7087951Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7088075Z cvt.u64.u32 %rd1189, %r1319; 2026-02-21T08:53:00.7088187Z cvt.u64.u32 %rd1190, %r1320; 2026-02-21T08:53:00.7088302Z shl.b64 %rd1191, %rd1190, 32; 2026-02-21T08:53:00.7088415Z or.b64 %rd1192, %rd1189, %rd1191; 2026-02-21T08:53:00.7088798Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7088911Z mov.b64 {%r2047, %r2048}, %rd1192; 2026-02-21T08:53:00.7089129Z cvt.rn.f16x2.f32 %r2049, %r2048, %r2047; 2026-02-21T08:53:00.7089270Z mov.b64 {%r2050, %r2051}, %rd592; 2026-02-21T08:53:00.7089408Z cvt.rn.f16x2.f32 %r2052, %r2051, %r2050; 2026-02-21T08:53:00.7089526Z mov.b64 {%r2053, %r2054}, %rd596; 2026-02-21T08:53:00.7089666Z cvt.rn.f16x2.f32 %r2055, %r2054, %r2053; 2026-02-21T08:53:00.7089779Z mov.b64 {%r2056, %r2057}, %rd600; 2026-02-21T08:53:00.7089908Z cvt.rn.f16x2.f32 %r2058, %r2057, %r2056; 2026-02-21T08:53:00.7090020Z mov.b64 {%r2059, %r2060}, %rd604; 2026-02-21T08:53:00.7090159Z cvt.rn.f16x2.f32 %r2061, %r2060, %r2059; 2026-02-21T08:53:00.7090271Z mov.b64 {%r2062, %r2063}, %rd608; 2026-02-21T08:53:00.7090408Z cvt.rn.f16x2.f32 %r2064, %r2063, %r2062; 2026-02-21T08:53:00.7090530Z mov.b64 {%r2065, %r2066}, %rd612; 2026-02-21T08:53:00.7090660Z cvt.rn.f16x2.f32 %r2067, %r2066, %r2065; 2026-02-21T08:53:00.7090768Z mov.b64 {%r2068, %r2069}, %rd616; 2026-02-21T08:53:00.7090905Z cvt.rn.f16x2.f32 %r2070, %r2069, %r2068; 2026-02-21T08:53:00.7091021Z mov.b64 {%r2071, %r2072}, %rd620; 2026-02-21T08:53:00.7091150Z cvt.rn.f16x2.f32 %r2073, %r2072, %r2071; 2026-02-21T08:53:00.7091364Z mov.b64 {%r2074, %r2075}, %rd624; 2026-02-21T08:53:00.7091496Z cvt.rn.f16x2.f32 %r2076, %r2075, %r2074; 2026-02-21T08:53:00.7091605Z mov.b64 {%r2077, %r2078}, %rd628; 2026-02-21T08:53:00.7091733Z cvt.rn.f16x2.f32 %r2079, %r2078, 
%r2077; 2026-02-21T08:53:00.7091849Z mov.b64 {%r2080, %r2081}, %rd632; 2026-02-21T08:53:00.7091977Z cvt.rn.f16x2.f32 %r2082, %r2081, %r2080; 2026-02-21T08:53:00.7092085Z mov.b64 {%r2083, %r2084}, %rd636; 2026-02-21T08:53:00.7092215Z cvt.rn.f16x2.f32 %r2085, %r2084, %r2083; 2026-02-21T08:53:00.7092335Z mov.b64 {%r2086, %r2087}, %rd640; 2026-02-21T08:53:00.7092463Z cvt.rn.f16x2.f32 %r2088, %r2087, %r2086; 2026-02-21T08:53:00.7092574Z mov.b64 {%r2089, %r2090}, %rd644; 2026-02-21T08:53:00.7092713Z cvt.rn.f16x2.f32 %r2091, %r2090, %r2089; 2026-02-21T08:53:00.7092824Z mov.b64 {%r2092, %r2093}, %rd648; 2026-02-21T08:53:00.7092954Z cvt.rn.f16x2.f32 %r2094, %r2093, %r2092; 2026-02-21T08:53:00.7093073Z mov.b64 {%r2095, %r2096}, %rd652; 2026-02-21T08:53:00.7093206Z cvt.rn.f16x2.f32 %r2097, %r2096, %r2095; 2026-02-21T08:53:00.7093592Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7093702Z cvt.u64.u32 %rd1193, %r1356; 2026-02-21T08:53:00.7093820Z cvt.u64.u32 %rd1194, %r1357; 2026-02-21T08:53:00.7093929Z shl.b64 %rd1195, %rd1194, 32; 2026-02-21T08:53:00.7094038Z or.b64 %rd1196, %rd1193, %rd1195; 2026-02-21T08:53:00.7094422Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7094771Z mov.b64 {%r2098, %r2099}, %rd1196; 2026-02-21T08:53:00.7094908Z cvt.rn.f16x2.f32 %r2100, %r2099, %r2098; 2026-02-21T08:53:00.7095308Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7095430Z cvt.u64.u32 %rd1197, %r1358; 2026-02-21T08:53:00.7095546Z cvt.u64.u32 %rd1198, %r1359; 2026-02-21T08:53:00.7095741Z shl.b64 %rd1199, %rd1198, 32; 2026-02-21T08:53:00.7095881Z or.b64 %rd1200, %rd1197, %rd1199; 2026-02-21T08:53:00.7096266Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7096380Z mov.b64 {%r2101, %r2102}, %rd1200; 2026-02-21T08:53:00.7096518Z cvt.rn.f16x2.f32 %r2103, %r2102, %r2101; 
2026-02-21T08:53:00.7096892Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7097005Z cvt.u64.u32 %rd1201, %r1360; 2026-02-21T08:53:00.7097128Z cvt.u64.u32 %rd1202, %r1361; 2026-02-21T08:53:00.7097237Z shl.b64 %rd1203, %rd1202, 32; 2026-02-21T08:53:00.7097342Z or.b64 %rd1204, %rd1201, %rd1203; 2026-02-21T08:53:00.7097684Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7097891Z mov.b64 {%r2104, %r2105}, %rd1204; 2026-02-21T08:53:00.7098024Z cvt.rn.f16x2.f32 %r2106, %r2105, %r2104; 2026-02-21T08:53:00.7098395Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7098518Z cvt.u64.u32 %rd1205, %r1362; 2026-02-21T08:53:00.7098628Z cvt.u64.u32 %rd1206, %r1363; 2026-02-21T08:53:00.7098737Z shl.b64 %rd1207, %rd1206, 32; 2026-02-21T08:53:00.7098858Z or.b64 %rd1208, %rd1205, %rd1207; 2026-02-21T08:53:00.7099223Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7099332Z mov.b64 {%r2107, %r2108}, %rd1208; 2026-02-21T08:53:00.7099459Z cvt.rn.f16x2.f32 %r2109, %r2108, %r2107; 2026-02-21T08:53:00.7099820Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7099931Z cvt.u64.u32 %rd1209, %r1364; 2026-02-21T08:53:00.7100038Z cvt.u64.u32 %rd1210, %r1365; 2026-02-21T08:53:00.7100160Z shl.b64 %rd1211, %rd1210, 32; 2026-02-21T08:53:00.7100268Z or.b64 %rd1212, %rd1209, %rd1211; 2026-02-21T08:53:00.7100631Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7100866Z mov.b64 {%r2110, %r2111}, %rd1212; 2026-02-21T08:53:00.7100997Z cvt.rn.f16x2.f32 %r2112, %r2111, %r2110; 2026-02-21T08:53:00.7101386Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7101504Z cvt.u64.u32 %rd1213, %r1366; 2026-02-21T08:53:00.7101631Z cvt.u64.u32 %rd1214, %r1367; 
2026-02-21T08:53:00.7101744Z shl.b64 %rd1215, %rd1214, 32; 2026-02-21T08:53:00.7101856Z or.b64 %rd1216, %rd1213, %rd1215; 2026-02-21T08:53:00.7102226Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7102340Z mov.b64 {%r2113, %r2114}, %rd1216; 2026-02-21T08:53:00.7102475Z cvt.rn.f16x2.f32 %r2115, %r2114, %r2113; 2026-02-21T08:53:00.7102866Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7102977Z cvt.u64.u32 %rd1217, %r1368; 2026-02-21T08:53:00.7103088Z cvt.u64.u32 %rd1218, %r1369; 2026-02-21T08:53:00.7103194Z shl.b64 %rd1219, %rd1218, 32; 2026-02-21T08:53:00.7103310Z or.b64 %rd1220, %rd1217, %rd1219; 2026-02-21T08:53:00.7103672Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7103782Z mov.b64 {%r2116, %r2117}, %rd1220; 2026-02-21T08:53:00.7103918Z cvt.rn.f16x2.f32 %r2118, %r2117, %r2116; 2026-02-21T08:53:00.7104432Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7104548Z cvt.u64.u32 %rd1221, %r1370; 2026-02-21T08:53:00.7104750Z cvt.u64.u32 %rd1222, %r1371; 2026-02-21T08:53:00.7104875Z shl.b64 %rd1223, %rd1222, 32; 2026-02-21T08:53:00.7104988Z or.b64 %rd1224, %rd1221, %rd1223; 2026-02-21T08:53:00.7105483Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7105631Z mov.b64 {%r2119, %r2120}, %rd1224; 2026-02-21T08:53:00.7105797Z cvt.rn.f16x2.f32 %r2121, %r2120, %r2119; 2026-02-21T08:53:00.7106263Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7106406Z cvt.u64.u32 %rd1225, %r1373; 2026-02-21T08:53:00.7106515Z cvt.u64.u32 %rd1226, %r1374; 2026-02-21T08:53:00.7106626Z shl.b64 %rd1227, %rd1226, 32; 2026-02-21T08:53:00.7106743Z or.b64 %rd1228, %rd1225, %rd1227; 2026-02-21T08:53:00.7107134Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 
2026-02-21T08:53:00.7107247Z mov.b64 {%r2122, %r2123}, %rd1228; 2026-02-21T08:53:00.7107382Z cvt.rn.f16x2.f32 %r2124, %r2123, %r2122; 2026-02-21T08:53:00.7107882Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7108006Z cvt.u64.u32 %rd1229, %r1375; 2026-02-21T08:53:00.7108119Z cvt.u64.u32 %rd1230, %r1376; 2026-02-21T08:53:00.7108245Z shl.b64 %rd1231, %rd1230, 32; 2026-02-21T08:53:00.7108357Z or.b64 %rd1232, %rd1229, %rd1231; 2026-02-21T08:53:00.7108741Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7108860Z mov.b64 {%r2125, %r2126}, %rd1232; 2026-02-21T08:53:00.7108987Z cvt.rn.f16x2.f32 %r2127, %r2126, %r2125; 2026-02-21T08:53:00.7109340Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7109461Z cvt.u64.u32 %rd1233, %r1377; 2026-02-21T08:53:00.7109568Z cvt.u64.u32 %rd1234, %r1378; 2026-02-21T08:53:00.7109679Z shl.b64 %rd1235, %rd1234, 32; 2026-02-21T08:53:00.7109786Z or.b64 %rd1236, %rd1233, %rd1235; 2026-02-21T08:53:00.7110160Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7110273Z mov.b64 {%r2128, %r2129}, %rd1236; 2026-02-21T08:53:00.7110404Z cvt.rn.f16x2.f32 %r2130, %r2129, %r2128; 2026-02-21T08:53:00.7110905Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7111020Z cvt.u64.u32 %rd1237, %r1379; 2026-02-21T08:53:00.7111133Z cvt.u64.u32 %rd1238, %r1380; 2026-02-21T08:53:00.7111254Z shl.b64 %rd1239, %rd1238, 32; 2026-02-21T08:53:00.7111368Z or.b64 %rd1240, %rd1237, %rd1239; 2026-02-21T08:53:00.7111743Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7111859Z mov.b64 {%r2131, %r2132}, %rd1240; 2026-02-21T08:53:00.7111996Z cvt.rn.f16x2.f32 %r2133, %r2132, %r2131; 2026-02-21T08:53:00.7112107Z mov.b64 {%r2134, %r2135}, %rd656; 2026-02-21T08:53:00.7112237Z 
cvt.rn.f16x2.f32 %r2136, %r2135, %r2134; 2026-02-21T08:53:00.7112357Z mov.b64 {%r2137, %r2138}, %rd660; 2026-02-21T08:53:00.7112494Z cvt.rn.f16x2.f32 %r2139, %r2138, %r2137; 2026-02-21T08:53:00.7112608Z mov.b64 {%r2140, %r2141}, %rd664; 2026-02-21T08:53:00.7112740Z cvt.rn.f16x2.f32 %r2142, %r2141, %r2140; 2026-02-21T08:53:00.7112868Z mov.b64 {%r2143, %r2144}, %rd668; 2026-02-21T08:53:00.7113004Z cvt.rn.f16x2.f32 %r2145, %r2144, %r2143; 2026-02-21T08:53:00.7113120Z mov.b64 {%r2146, %r2147}, %rd672; 2026-02-21T08:53:00.7113265Z cvt.rn.f16x2.f32 %r2148, %r2147, %r2146; 2026-02-21T08:53:00.7113378Z mov.b64 {%r2149, %r2150}, %rd676; 2026-02-21T08:53:00.7113509Z cvt.rn.f16x2.f32 %r2151, %r2150, %r2149; 2026-02-21T08:53:00.7113630Z mov.b64 {%r2152, %r2153}, %rd680; 2026-02-21T08:53:00.7113874Z cvt.rn.f16x2.f32 %r2154, %r2153, %r2152; 2026-02-21T08:53:00.7113988Z mov.b64 {%r2155, %r2156}, %rd684; 2026-02-21T08:53:00.7114122Z cvt.rn.f16x2.f32 %r2157, %r2156, %r2155; 2026-02-21T08:53:00.7114242Z mov.b64 {%r2158, %r2159}, %rd688; 2026-02-21T08:53:00.7114370Z cvt.rn.f16x2.f32 %r2160, %r2159, %r2158; 2026-02-21T08:53:00.7114484Z mov.b64 {%r2161, %r2162}, %rd692; 2026-02-21T08:53:00.7114723Z cvt.rn.f16x2.f32 %r2163, %r2162, %r2161; 2026-02-21T08:53:00.7114854Z mov.b64 {%r2164, %r2165}, %rd696; 2026-02-21T08:53:00.7114981Z cvt.rn.f16x2.f32 %r2166, %r2165, %r2164; 2026-02-21T08:53:00.7115091Z mov.b64 {%r2167, %r2168}, %rd700; 2026-02-21T08:53:00.7115228Z cvt.rn.f16x2.f32 %r2169, %r2168, %r2167; 2026-02-21T08:53:00.7115341Z mov.b64 {%r2170, %r2171}, %rd704; 2026-02-21T08:53:00.7115468Z cvt.rn.f16x2.f32 %r2172, %r2171, %r2170; 2026-02-21T08:53:00.7115582Z mov.b64 {%r2173, %r2174}, %rd708; 2026-02-21T08:53:00.7115711Z cvt.rn.f16x2.f32 %r2175, %r2174, %r2173; 2026-02-21T08:53:00.7115826Z mov.b64 {%r2176, %r2177}, %rd712; 2026-02-21T08:53:00.7115963Z cvt.rn.f16x2.f32 %r2178, %r2177, %r2176; 2026-02-21T08:53:00.7116074Z mov.b64 {%r2179, %r2180}, %rd716; 2026-02-21T08:53:00.7116204Z 
cvt.rn.f16x2.f32 %r2181, %r2180, %r2179; 2026-02-21T08:53:00.7116313Z mov.b64 {%r2182, %r2183}, %rd720; 2026-02-21T08:53:00.7116532Z cvt.rn.f16x2.f32 %r2184, %r2183, %r2182; 2026-02-21T08:53:00.7116658Z mov.b64 {%r2185, %r2186}, %rd724; 2026-02-21T08:53:00.7116787Z cvt.rn.f16x2.f32 %r2187, %r2186, %r2185; 2026-02-21T08:53:00.7116907Z mov.b64 {%r2188, %r2189}, %rd728; 2026-02-21T08:53:00.7117035Z cvt.rn.f16x2.f32 %r2190, %r2189, %r2188; 2026-02-21T08:53:00.7117144Z mov.b64 {%r2191, %r2192}, %rd732; 2026-02-21T08:53:00.7117271Z cvt.rn.f16x2.f32 %r2193, %r2192, %r2191; 2026-02-21T08:53:00.7117654Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7117771Z cvt.u64.u32 %rd1241, %r1424; 2026-02-21T08:53:00.7117889Z cvt.u64.u32 %rd1242, %r1425; 2026-02-21T08:53:00.7118015Z shl.b64 %rd1243, %rd1242, 32; 2026-02-21T08:53:00.7118125Z or.b64 %rd1244, %rd1241, %rd1243; 2026-02-21T08:53:00.7118507Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7118631Z mov.b64 {%r2194, %r2195}, %rd1244; 2026-02-21T08:53:00.7118763Z cvt.rn.f16x2.f32 %r2196, %r2195, %r2194; 2026-02-21T08:53:00.7119161Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7119395Z cvt.u64.u32 %rd1245, %r1426; 2026-02-21T08:53:00.7119511Z cvt.u64.u32 %rd1246, %r1427; 2026-02-21T08:53:00.7119627Z shl.b64 %rd1247, %rd1246, 32; 2026-02-21T08:53:00.7119742Z or.b64 %rd1248, %rd1245, %rd1247; 2026-02-21T08:53:00.7120114Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7120227Z mov.b64 {%r2197, %r2198}, %rd1248; 2026-02-21T08:53:00.7120353Z cvt.rn.f16x2.f32 %r2199, %r2198, %r2197; 2026-02-21T08:53:00.7120709Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7120815Z cvt.u64.u32 %rd1249, %r1428; 2026-02-21T08:53:00.7120918Z cvt.u64.u32 %rd1250, %r1429; 2026-02-21T08:53:00.7121033Z 
shl.b64 %rd1251, %rd1250, 32; 2026-02-21T08:53:00.7121156Z or.b64 %rd1252, %rd1249, %rd1251; 2026-02-21T08:53:00.7121587Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7121719Z mov.b64 {%r2200, %r2201}, %rd1252; 2026-02-21T08:53:00.7121902Z cvt.rn.f16x2.f32 %r2202, %r2201, %r2200; 2026-02-21T08:53:00.7122341Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7122469Z cvt.u64.u32 %rd1253, %r1430; 2026-02-21T08:53:00.7122610Z cvt.u64.u32 %rd1254, %r1431; 2026-02-21T08:53:00.7122741Z shl.b64 %rd1255, %rd1254, 32; 2026-02-21T08:53:00.7122997Z or.b64 %rd1256, %rd1253, %rd1255; 2026-02-21T08:53:00.7123398Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7123506Z mov.b64 {%r2203, %r2204}, %rd1256; 2026-02-21T08:53:00.7123629Z cvt.rn.f16x2.f32 %r2205, %r2204, %r2203; 2026-02-21T08:53:00.7124057Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7124188Z cvt.u64.u32 %rd1257, %r1432; 2026-02-21T08:53:00.7124298Z cvt.u64.u32 %rd1258, %r1433; 2026-02-21T08:53:00.7124408Z shl.b64 %rd1259, %rd1258, 32; 2026-02-21T08:53:00.7124527Z or.b64 %rd1260, %rd1257, %rd1259; 2026-02-21T08:53:00.7124955Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7125075Z mov.b64 {%r2206, %r2207}, %rd1260; 2026-02-21T08:53:00.7125220Z cvt.rn.f16x2.f32 %r2208, %r2207, %r2206; 2026-02-21T08:53:00.7125593Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7125701Z cvt.u64.u32 %rd1261, %r1434; 2026-02-21T08:53:00.7125810Z cvt.u64.u32 %rd1262, %r1435; 2026-02-21T08:53:00.7125929Z shl.b64 %rd1263, %rd1262, 32; 2026-02-21T08:53:00.7126044Z or.b64 %rd1264, %rd1261, %rd1263; 2026-02-21T08:53:00.7126522Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7126655Z mov.b64 
{%r2209, %r2210}, %rd1264; 2026-02-21T08:53:00.7126780Z cvt.rn.f16x2.f32 %r2211, %r2210, %r2209; 2026-02-21T08:53:00.7127135Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7127252Z cvt.u64.u32 %rd1265, %r1436; 2026-02-21T08:53:00.7127362Z cvt.u64.u32 %rd1266, %r1437; 2026-02-21T08:53:00.7127468Z shl.b64 %rd1267, %rd1266, 32; 2026-02-21T08:53:00.7127576Z or.b64 %rd1268, %rd1265, %rd1267; 2026-02-21T08:53:00.7127966Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7128078Z mov.b64 {%r2212, %r2213}, %rd1268; 2026-02-21T08:53:00.7128206Z cvt.rn.f16x2.f32 %r2214, %r2213, %r2212; 2026-02-21T08:53:00.7128594Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7128706Z cvt.u64.u32 %rd1269, %r1438; 2026-02-21T08:53:00.7128820Z cvt.u64.u32 %rd1270, %r1439; 2026-02-21T08:53:00.7129078Z shl.b64 %rd1271, %rd1270, 32; 2026-02-21T08:53:00.7129194Z or.b64 %rd1272, %rd1269, %rd1271; 2026-02-21T08:53:00.7129563Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7129675Z mov.b64 {%r2215, %r2216}, %rd1272; 2026-02-21T08:53:00.7129812Z cvt.rn.f16x2.f32 %r2217, %r2216, %r2215; 2026-02-21T08:53:00.7130189Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7130305Z cvt.u64.u32 %rd1273, %r1441; 2026-02-21T08:53:00.7130430Z cvt.u64.u32 %rd1274, %r1442; 2026-02-21T08:53:00.7130546Z shl.b64 %rd1275, %rd1274, 32; 2026-02-21T08:53:00.7130662Z or.b64 %rd1276, %rd1273, %rd1275; 2026-02-21T08:53:00.7131070Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7131192Z mov.b64 {%r2218, %r2219}, %rd1276; 2026-02-21T08:53:00.7131326Z cvt.rn.f16x2.f32 %r2220, %r2219, %r2218; 2026-02-21T08:53:00.7131700Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7131823Z 
cvt.u64.u32 %rd1277, %r1443; 2026-02-21T08:53:00.7131938Z cvt.u64.u32 %rd1278, %r1444; 2026-02-21T08:53:00.7132050Z shl.b64 %rd1279, %rd1278, 32; 2026-02-21T08:53:00.7132174Z or.b64 %rd1280, %rd1277, %rd1279; 2026-02-21T08:53:00.7132546Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7132770Z mov.b64 {%r2221, %r2222}, %rd1280; 2026-02-21T08:53:00.7132903Z cvt.rn.f16x2.f32 %r2223, %r2222, %r2221; 2026-02-21T08:53:00.7133264Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7133375Z cvt.u64.u32 %rd1281, %r1445; 2026-02-21T08:53:00.7133488Z cvt.u64.u32 %rd1282, %r1446; 2026-02-21T08:53:00.7133606Z shl.b64 %rd1283, %rd1282, 32; 2026-02-21T08:53:00.7133789Z or.b64 %rd1284, %rd1281, %rd1283; 2026-02-21T08:53:00.7134195Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7134319Z mov.b64 {%r2224, %r2225}, %rd1284; 2026-02-21T08:53:00.7134452Z cvt.rn.f16x2.f32 %r2226, %r2225, %r2224; 2026-02-21T08:53:00.7134922Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7135045Z cvt.u64.u32 %rd1285, %r1447; 2026-02-21T08:53:00.7135156Z cvt.u64.u32 %rd1286, %r1448; 2026-02-21T08:53:00.7135273Z shl.b64 %rd1287, %rd1286, 32; 2026-02-21T08:53:00.7135384Z or.b64 %rd1288, %rd1285, %rd1287; 2026-02-21T08:53:00.7135760Z .loc 1 59 27 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:59:27 2026-02-21T08:53:00.7135875Z mov.b64 {%r2227, %r2228}, %rd1288; 2026-02-21T08:53:00.7136093Z cvt.rn.f16x2.f32 %r2229, %r2228, %r2227; 2026-02-21T08:53:00.7136229Z mov.b64 {%r2230, %r2231}, %rd736; 2026-02-21T08:53:00.7136358Z cvt.rn.f16x2.f32 %r2232, %r2231, %r2230; 2026-02-21T08:53:00.7136470Z mov.b64 {%r2233, %r2234}, %rd740; 2026-02-21T08:53:00.7136613Z cvt.rn.f16x2.f32 %r2235, %r2234, %r2233; 2026-02-21T08:53:00.7136727Z mov.b64 {%r2236, %r2237}, %rd744; 2026-02-21T08:53:00.7136859Z cvt.rn.f16x2.f32 
%r2238, %r2237, %r2236; 2026-02-21T08:53:00.7136973Z mov.b64 {%r2239, %r2240}, %rd748; 2026-02-21T08:53:00.7137109Z cvt.rn.f16x2.f32 %r2241, %r2240, %r2239; 2026-02-21T08:53:00.7137219Z mov.b64 {%r2242, %r2243}, %rd752; 2026-02-21T08:53:00.7137352Z cvt.rn.f16x2.f32 %r2244, %r2243, %r2242; 2026-02-21T08:53:00.7137471Z mov.b64 {%r2245, %r2246}, %rd756; 2026-02-21T08:53:00.7137601Z cvt.rn.f16x2.f32 %r2247, %r2246, %r2245; 2026-02-21T08:53:00.7137712Z mov.b64 {%r2248, %r2249}, %rd760; 2026-02-21T08:53:00.7137844Z cvt.rn.f16x2.f32 %r2250, %r2249, %r2248; 2026-02-21T08:53:00.7137966Z mov.b64 {%r2251, %r2252}, %rd764; 2026-02-21T08:53:00.7138097Z cvt.rn.f16x2.f32 %r2253, %r2252, %r2251; 2026-02-21T08:53:00.7138309Z mov.b64 {%r2254, %r2255}, %rd768; 2026-02-21T08:53:00.7138447Z cvt.rn.f16x2.f32 %r2256, %r2255, %r2254; 2026-02-21T08:53:00.7138554Z mov.b64 {%r2257, %r2258}, %rd772; 2026-02-21T08:53:00.7138676Z cvt.rn.f16x2.f32 %r2259, %r2258, %r2257; 2026-02-21T08:53:00.7138795Z mov.b64 {%r2260, %r2261}, %rd776; 2026-02-21T08:53:00.7138918Z cvt.rn.f16x2.f32 %r2262, %r2261, %r2260; 2026-02-21T08:53:00.7139026Z mov.b64 {%r2263, %r2264}, %rd780; 2026-02-21T08:53:00.7139153Z cvt.rn.f16x2.f32 %r2265, %r2264, %r2263; 2026-02-21T08:53:00.7139273Z mov.b64 {%r2266, %r2267}, %rd784; 2026-02-21T08:53:00.7139405Z cvt.rn.f16x2.f32 %r2268, %r2267, %r2266; 2026-02-21T08:53:00.7139515Z mov.b64 {%r2269, %r2270}, %rd788; 2026-02-21T08:53:00.7139651Z cvt.rn.f16x2.f32 %r2271, %r2270, %r2269; 2026-02-21T08:53:00.7139761Z mov.b64 {%r2272, %r2273}, %rd792; 2026-02-21T08:53:00.7139890Z cvt.rn.f16x2.f32 %r2274, %r2273, %r2272; 2026-02-21T08:53:00.7140002Z mov.b64 {%r2275, %r2276}, %rd796; 2026-02-21T08:53:00.7140146Z cvt.rn.f16x2.f32 %r2277, %r2276, %r2275; 2026-02-21T08:53:00.7140258Z mov.b64 {%r2278, %r2279}, %rd800; 2026-02-21T08:53:00.7140385Z cvt.rn.f16x2.f32 %r2280, %r2279, %r2278; 2026-02-21T08:53:00.7140504Z mov.b64 {%r2281, %r2282}, %rd804; 2026-02-21T08:53:00.7140634Z cvt.rn.f16x2.f32 
%r2283, %r2282, %r2281; 2026-02-21T08:53:00.7140745Z mov.b64 {%r2284, %r2285}, %rd808; 2026-02-21T08:53:00.7140880Z cvt.rn.f16x2.f32 %r2286, %r2285, %r2284; 2026-02-21T08:53:00.7140990Z mov.b64 {%r2287, %r2288}, %rd812; 2026-02-21T08:53:00.7141232Z cvt.rn.f16x2.f32 %r2289, %r2288, %r2287; 2026-02-21T08:53:00.7141628Z .loc 1 60 45 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:60:45 2026-02-21T08:53:00.7141856Z st.shared.v4.b32 [%r1515], {%r1524, %r1527, %r1530, %r1533}; 2026-02-21T08:53:00.7142106Z st.shared.v4.b32 [%r1515+32768], {%r1620, %r1623, %r1626, %r1629}; 2026-02-21T08:53:00.7142422Z st.shared.v4.b32 [%r1515+65536], {%r1716, %r1719, %r1722, %r1725}; 2026-02-21T08:53:00.7142683Z st.shared.v4.b32 [%r1515+98304], {%r1812, %r1815, %r1818, %r1821}; 2026-02-21T08:53:00.7142926Z st.shared.v4.b32 [%r1515+16384], {%r1908, %r1911, %r1914, %r1917}; 2026-02-21T08:53:00.7143167Z st.shared.v4.b32 [%r1515+49152], {%r2004, %r2007, %r2010, %r2013}; 2026-02-21T08:53:00.7143407Z st.shared.v4.b32 [%r1515+81920], {%r2100, %r2103, %r2106, %r2109}; 2026-02-21T08:53:00.7143646Z st.shared.v4.b32 [%r1515+114688], {%r2196, %r2199, %r2202, %r2205}; 2026-02-21T08:53:00.7143859Z st.shared.v4.b32 [%r1514], {%r1536, %r1539, %r1542, %r1545}; 2026-02-21T08:53:00.7144095Z st.shared.v4.b32 [%r1514+32768], {%r1632, %r1635, %r1638, %r1641}; 2026-02-21T08:53:00.7144309Z st.shared.v4.b32 [%r1514+65536], {%r1728, %r1731, %r1734, %r1737}; 2026-02-21T08:53:00.7144517Z st.shared.v4.b32 [%r1514+98304], {%r1824, %r1827, %r1830, %r1833}; 2026-02-21T08:53:00.7144897Z st.shared.v4.b32 [%r1514+16384], {%r1920, %r1923, %r1926, %r1929}; 2026-02-21T08:53:00.7145133Z st.shared.v4.b32 [%r1514+49152], {%r2016, %r2019, %r2022, %r2025}; 2026-02-21T08:53:00.7145345Z st.shared.v4.b32 [%r1514+81920], {%r2112, %r2115, %r2118, %r2121}; 2026-02-21T08:53:00.7145568Z st.shared.v4.b32 [%r1514+114688], {%r2208, %r2211, %r2214, %r2217}; 2026-02-21T08:53:00.7145774Z st.shared.v4.b32 [%r1512], {%r1548, %r1551, 
%r1554, %r1557}; 2026-02-21T08:53:00.7145991Z st.shared.v4.b32 [%r1512+32768], {%r1644, %r1647, %r1650, %r1653}; 2026-02-21T08:53:00.7146208Z st.shared.v4.b32 [%r1512+65536], {%r1740, %r1743, %r1746, %r1749}; 2026-02-21T08:53:00.7146437Z st.shared.v4.b32 [%r1512+98304], {%r1836, %r1839, %r1842, %r1845}; 2026-02-21T08:53:00.7146648Z st.shared.v4.b32 [%r1512+16384], {%r1932, %r1935, %r1938, %r1941}; 2026-02-21T08:53:00.7146861Z st.shared.v4.b32 [%r1512+49152], {%r2028, %r2031, %r2034, %r2037}; 2026-02-21T08:53:00.7147086Z st.shared.v4.b32 [%r1512+81920], {%r2124, %r2127, %r2130, %r2133}; 2026-02-21T08:53:00.7147306Z st.shared.v4.b32 [%r1512+114688], {%r2220, %r2223, %r2226, %r2229}; 2026-02-21T08:53:00.7147607Z st.shared.v4.b32 [%r1510], {%r1560, %r1563, %r1566, %r1569}; 2026-02-21T08:53:00.7147829Z st.shared.v4.b32 [%r1510+32768], {%r1656, %r1659, %r1662, %r1665}; 2026-02-21T08:53:00.7148059Z st.shared.v4.b32 [%r1510+65536], {%r1752, %r1755, %r1758, %r1761}; 2026-02-21T08:53:00.7148284Z st.shared.v4.b32 [%r1510+98304], {%r1848, %r1851, %r1854, %r1857}; 2026-02-21T08:53:00.7148511Z st.shared.v4.b32 [%r1510+16384], {%r1944, %r1947, %r1950, %r1953}; 2026-02-21T08:53:00.7148747Z st.shared.v4.b32 [%r1510+49152], {%r2040, %r2043, %r2046, %r2049}; 2026-02-21T08:53:00.7148976Z st.shared.v4.b32 [%r1510+81920], {%r2136, %r2139, %r2142, %r2145}; 2026-02-21T08:53:00.7149196Z st.shared.v4.b32 [%r1510+114688], {%r2232, %r2235, %r2238, %r2241}; 2026-02-21T08:53:00.7149405Z st.shared.v4.b32 [%r1508], {%r1572, %r1575, %r1578, %r1581}; 2026-02-21T08:53:00.7149637Z st.shared.v4.b32 [%r1508+32768], {%r1668, %r1671, %r1674, %r1677}; 2026-02-21T08:53:00.7149864Z st.shared.v4.b32 [%r1508+65536], {%r1764, %r1767, %r1770, %r1773}; 2026-02-21T08:53:00.7150098Z st.shared.v4.b32 [%r1508+98304], {%r1860, %r1863, %r1866, %r1869}; 2026-02-21T08:53:00.7150313Z st.shared.v4.b32 [%r1508+16384], {%r1956, %r1959, %r1962, %r1965}; 2026-02-21T08:53:00.7150525Z st.shared.v4.b32 [%r1508+49152], {%r2052, 
%r2055, %r2058, %r2061}; 2026-02-21T08:53:00.7150745Z st.shared.v4.b32 [%r1508+81920], {%r2148, %r2151, %r2154, %r2157}; 2026-02-21T08:53:00.7150977Z st.shared.v4.b32 [%r1508+114688], {%r2244, %r2247, %r2250, %r2253}; 2026-02-21T08:53:00.7151301Z st.shared.v4.b32 [%r1506], {%r1584, %r1587, %r1590, %r1593}; 2026-02-21T08:53:00.7151528Z st.shared.v4.b32 [%r1506+32768], {%r1680, %r1683, %r1686, %r1689}; 2026-02-21T08:53:00.7151763Z st.shared.v4.b32 [%r1506+65536], {%r1776, %r1779, %r1782, %r1785}; 2026-02-21T08:53:00.7151993Z st.shared.v4.b32 [%r1506+98304], {%r1872, %r1875, %r1878, %r1881}; 2026-02-21T08:53:00.7152290Z st.shared.v4.b32 [%r1506+16384], {%r1968, %r1971, %r1974, %r1977}; 2026-02-21T08:53:00.7152534Z st.shared.v4.b32 [%r1506+49152], {%r2064, %r2067, %r2070, %r2073}; 2026-02-21T08:53:00.7152758Z st.shared.v4.b32 [%r1506+81920], {%r2160, %r2163, %r2166, %r2169}; 2026-02-21T08:53:00.7152981Z st.shared.v4.b32 [%r1506+114688], {%r2256, %r2259, %r2262, %r2265}; 2026-02-21T08:53:00.7153188Z st.shared.v4.b32 [%r1504], {%r1596, %r1599, %r1602, %r1605}; 2026-02-21T08:53:00.7153412Z st.shared.v4.b32 [%r1504+32768], {%r1692, %r1695, %r1698, %r1701}; 2026-02-21T08:53:00.7153634Z st.shared.v4.b32 [%r1504+65536], {%r1788, %r1791, %r1794, %r1797}; 2026-02-21T08:53:00.7153864Z st.shared.v4.b32 [%r1504+98304], {%r1884, %r1887, %r1890, %r1893}; 2026-02-21T08:53:00.7154103Z st.shared.v4.b32 [%r1504+16384], {%r1980, %r1983, %r1986, %r1989}; 2026-02-21T08:53:00.7154337Z st.shared.v4.b32 [%r1504+49152], {%r2076, %r2079, %r2082, %r2085}; 2026-02-21T08:53:00.7154652Z st.shared.v4.b32 [%r1504+81920], {%r2172, %r2175, %r2178, %r2181}; 2026-02-21T08:53:00.7154988Z st.shared.v4.b32 [%r1504+114688], {%r2268, %r2271, %r2274, %r2277}; 2026-02-21T08:53:00.7155203Z st.shared.v4.b32 [%r1502], {%r1608, %r1611, %r1614, %r1617}; 2026-02-21T08:53:00.7155437Z st.shared.v4.b32 [%r1502+32768], {%r1704, %r1707, %r1710, %r1713}; 2026-02-21T08:53:00.7155683Z st.shared.v4.b32 [%r1502+65536], 
{%r1800, %r1803, %r1806, %r1809}; 2026-02-21T08:53:00.7155906Z st.shared.v4.b32 [%r1502+98304], {%r1896, %r1899, %r1902, %r1905}; 2026-02-21T08:53:00.7156129Z st.shared.v4.b32 [%r1502+16384], {%r1992, %r1995, %r1998, %r2001}; 2026-02-21T08:53:00.7156362Z st.shared.v4.b32 [%r1502+49152], {%r2088, %r2091, %r2094, %r2097}; 2026-02-21T08:53:00.7156580Z st.shared.v4.b32 [%r1502+81920], {%r2184, %r2187, %r2190, %r2193}; 2026-02-21T08:53:00.7156806Z st.shared.v4.b32 [%r1502+114688], {%r2280, %r2283, %r2286, %r2289}; 2026-02-21T08:53:00.7156921Z // begin inline asm 2026-02-21T08:53:00.7157086Z fence.proxy.async.shared::cta; 2026-02-21T08:53:00.7157194Z // end inline asm 2026-02-21T08:53:00.7157308Z bar.sync 0, 128; 2026-02-21T08:53:00.7157556Z elect.sync %r2290|%p128, -1; 2026-02-21T08:53:00.7157680Z and.pred %p126, %p127, %p128; 2026-02-21T08:53:00.7157794Z shl.b32 %r2291, %r1520, 15; 2026-02-21T08:53:00.7157916Z add.s32 %r1494, %r370, %r2291; 2026-02-21T08:53:00.7158028Z shl.b32 %r2292, %r1520, 6; 2026-02-21T08:53:00.7158139Z or.b32 %r1492, %r2292, %r1518; 2026-02-21T08:53:00.7158246Z // begin inline asm 2026-02-21T08:53:00.7158700Z @%p126 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd264, {%r1492, %r1493}], [%r1494]; 2026-02-21T08:53:00.7158808Z // end inline asm 2026-02-21T08:53:00.7158937Z cp.async.bulk.commit_group; 2026-02-21T08:53:00.7159086Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:53:00.7159191Z bar.sync 0, 128; 2026-02-21T08:53:00.7159348Z $L__BB0_16: // %._crit_edge 2026-02-21T08:53:00.7159743Z .loc 1 32 4 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:32:4 2026-02-21T08:53:00.7159852Z bar.sync 0, 128; 2026-02-21T08:53:00.7159972Z // begin inline asm 2026-02-21T08:53:00.7160251Z @%p48 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2293, 512; 2026-02-21T08:53:00.7160369Z // end inline asm 2026-02-21T08:53:00.7160584Z st.shared.v2.b32 [global_smem+131112], {50529027, 50529027}; 2026-02-21T08:53:00.7160692Z barrier.sync 1; 
2026-02-21T08:53:00.7160861Z $L__BB0_17: // %common.ret 2026-02-21T08:53:00.7161237Z .loc 1 0 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:0 2026-02-21T08:53:00.7161459Z ret; 2026-02-21T08:53:00.7161658Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:53:00.7161837Z ld.param.b64 %rd65, [_helion_matmul_param_1]; 2026-02-21T08:53:00.7162178Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19 2026-02-21T08:53:00.7162293Z and.b32 %r49, %r1, 15; 2026-02-21T08:53:00.7162510Z mad.wide.u32 %rd1, %r49, 16, %rd65; 2026-02-21T08:53:00.7162641Z bfe.u32 %r50, %r1, 4, 3; 2026-02-21T08:53:00.7163029Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7163150Z or.b32 %r4, %r50, 248; 2026-02-21T08:53:00.7163261Z or.b32 %r5, %r50, 240; 2026-02-21T08:53:00.7163368Z or.b32 %r6, %r50, 232; 2026-02-21T08:53:00.7163474Z or.b32 %r7, %r50, 224; 2026-02-21T08:53:00.7163589Z or.b32 %r8, %r50, 216; 2026-02-21T08:53:00.7163695Z or.b32 %r9, %r50, 208; 2026-02-21T08:53:00.7163809Z or.b32 %r10, %r50, 200; 2026-02-21T08:53:00.7163925Z or.b32 %r11, %r50, 192; 2026-02-21T08:53:00.7164029Z or.b32 %r12, %r50, 184; 2026-02-21T08:53:00.7164135Z or.b32 %r13, %r50, 176; 2026-02-21T08:53:00.7164239Z or.b32 %r14, %r50, 168; 2026-02-21T08:53:00.7164352Z or.b32 %r15, %r50, 160; 2026-02-21T08:53:00.7164456Z or.b32 %r16, %r50, 152; 2026-02-21T08:53:00.7164749Z or.b32 %r17, %r50, 144; 2026-02-21T08:53:00.7164872Z or.b32 %r18, %r50, 136; 2026-02-21T08:53:00.7164982Z or.b32 %r19, %r50, 128; 2026-02-21T08:53:00.7165095Z or.b32 %r20, %r50, 120; 2026-02-21T08:53:00.7165210Z or.b32 %r21, %r50, 112; 2026-02-21T08:53:00.7165316Z or.b32 %r22, %r50, 104; 2026-02-21T08:53:00.7165424Z or.b32 %r23, %r50, 96; 2026-02-21T08:53:00.7165529Z or.b32 %r24, %r50, 88; 2026-02-21T08:53:00.7165647Z or.b32 %r25, %r50, 80; 2026-02-21T08:53:00.7165756Z or.b32 %r26, %r50, 72; 2026-02-21T08:53:00.7165863Z or.b32 %r27, %r50, 64; 
2026-02-21T08:53:00.7165982Z or.b32 %r28, %r50, 56; 2026-02-21T08:53:00.7166097Z or.b32 %r29, %r50, 48; 2026-02-21T08:53:00.7166206Z or.b32 %r30, %r50, 40; 2026-02-21T08:53:00.7166314Z or.b32 %r31, %r50, 32; 2026-02-21T08:53:00.7166433Z or.b32 %r32, %r50, 24; 2026-02-21T08:53:00.7166538Z or.b32 %r33, %r50, 16; 2026-02-21T08:53:00.7166645Z or.b32 %r34, %r50, 8; 2026-02-21T08:53:00.7166760Z bra.uni $L__BB0_2; 2026-02-21T08:53:00.7166984Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7167364Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7167573Z barrier.sync 1; 2026-02-21T08:53:00.7167685Z barrier.sync 1; 2026-02-21T08:53:00.7167835Z $L__BB0_2: // %.preheader 2026-02-21T08:53:00.7168005Z // =>This Loop Header: Depth=1 2026-02-21T08:53:00.7168178Z // Child Loop BB0_11 Depth 2 2026-02-21T08:53:00.7168347Z // Child Loop BB0_6 Depth 2 2026-02-21T08:53:00.7168689Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19 2026-02-21T08:53:00.7168853Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:53:00.7168955Z barrier.sync 1; 2026-02-21T08:53:00.7169067Z mov.b32 %r52, global_smem; 2026-02-21T08:53:00.7169177Z add.s32 %r53, %r52, %r3; 2026-02-21T08:53:00.7169308Z ld.shared.b8 %r51, [%r53+131108]; 2026-02-21T08:53:00.7169424Z setp.gt.u32 %p3, %r51, 3; 2026-02-21T08:53:00.7169530Z @%p3 bra $L__BB0_4; 2026-02-21T08:53:00.7169683Z // %bb.3: // %.preheader 2026-02-21T08:53:00.7169861Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7169970Z $L_brx_0: .branchtargets 2026-02-21T08:53:00.7170075Z $L__BB0_5, 2026-02-21T08:53:00.7170168Z $L__BB0_10, 2026-02-21T08:53:00.7170259Z $L__BB0_13, 2026-02-21T08:53:00.7170349Z $L__BB0_17; 2026-02-21T08:53:00.7170588Z brx.idx %r51, $L_brx_0; 2026-02-21T08:53:00.7170789Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7171154Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7171273Z 
add.s32 %r79, %r52, 65536; 2026-02-21T08:53:00.7171420Z ld.shared.b32 %r334, [global_smem+65536]; 2026-02-21T08:53:00.7171563Z ld.shared.b32 %r80, [global_smem+65548]; 2026-02-21T08:53:00.7171743Z barrier.sync 1; 2026-02-21T08:53:00.7172151Z .loc 1 45 45 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:45:45 2026-02-21T08:53:00.7172276Z add.s32 %r81, %r1, -128; 2026-02-21T08:53:00.7172383Z shr.u32 %r36, %r81, 5; 2026-02-21T08:53:00.7172504Z and.b32 %r82, %r1, 112; 2026-02-21T08:53:00.7172873Z .loc 1 45 32 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:45:32 2026-02-21T08:53:00.7172987Z add.s32 %r84, %r80, %r50; 2026-02-21T08:53:00.7173378Z .loc 1 56 80 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:56:80 2026-02-21T08:53:00.7173484Z shl.b32 %r85, %r84, 11; 2026-02-21T08:53:00.7173586Z shl.b32 %r86, %r1, 4; 2026-02-21T08:53:00.7173693Z and.b32 %r87, %r86, 112; 2026-02-21T08:53:00.7173802Z shl.b32 %r88, %r82, 3; 2026-02-21T08:53:00.7173998Z shl.b32 %r89, %r1, 12; 2026-02-21T08:53:00.7174123Z and.b32 %r90, %r89, 32768; 2026-02-21T08:53:00.7174244Z or.b32 %r91, %r87, %r88; 2026-02-21T08:53:00.7174357Z xor.b32 %r92, %r91, %r82; 2026-02-21T08:53:00.7174467Z add.s32 %r93, %r79, %r90; 2026-02-21T08:53:00.7174576Z add.s32 %r37, %r93, %r92; 2026-02-21T08:53:00.7174763Z bfe.u32 %r94, %r52, 4, 14; 2026-02-21T08:53:00.7174880Z cvt.u64.u32 %rd73, %r94; 2026-02-21T08:53:00.7175021Z or.b64 %rd193, %rd73, 4611686293439512576; 2026-02-21T08:53:00.7175144Z bfe.u32 %r95, %r79, 4, 14; 2026-02-21T08:53:00.7175261Z cvt.u64.u32 %rd74, %r95; 2026-02-21T08:53:00.7175392Z or.b64 %rd194, %rd74, 4611686293439512576; 2026-02-21T08:53:00.7175503Z add.s32 %r96, %r52, 32; 2026-02-21T08:53:00.7175621Z bfe.u32 %r97, %r96, 4, 14; 2026-02-21T08:53:00.7175729Z cvt.u64.u32 %rd75, %r97; 2026-02-21T08:53:00.7175857Z or.b64 %rd195, %rd75, 4611686293439512576; 2026-02-21T08:53:00.7175974Z add.s32 %r98, %r52, 65568; 2026-02-21T08:53:00.7176082Z bfe.u32 %r99, %r98, 4, 14; 
2026-02-21T08:53:00.7176189Z cvt.u64.u32 %rd76, %r99; 2026-02-21T08:53:00.7176329Z or.b64 %rd196, %rd76, 4611686293439512576; 2026-02-21T08:53:00.7176535Z add.s32 %r100, %r52, 64; 2026-02-21T08:53:00.7176649Z bfe.u32 %r101, %r100, 4, 14; 2026-02-21T08:53:00.7176760Z cvt.u64.u32 %rd77, %r101; 2026-02-21T08:53:00.7176897Z or.b64 %rd197, %rd77, 4611686293439512576; 2026-02-21T08:53:00.7177007Z add.s32 %r102, %r52, 65600; 2026-02-21T08:53:00.7177117Z bfe.u32 %r103, %r102, 4, 14; 2026-02-21T08:53:00.7177233Z cvt.u64.u32 %rd78, %r103; 2026-02-21T08:53:00.7177362Z or.b64 %rd198, %rd78, 4611686293439512576; 2026-02-21T08:53:00.7177475Z add.s32 %r104, %r52, 96; 2026-02-21T08:53:00.7177585Z bfe.u32 %r105, %r104, 4, 14; 2026-02-21T08:53:00.7177708Z cvt.u64.u32 %rd79, %r105; 2026-02-21T08:53:00.7177839Z or.b64 %rd199, %rd79, 4611686293439512576; 2026-02-21T08:53:00.7177952Z add.s32 %r106, %r52, 65632; 2026-02-21T08:53:00.7178071Z bfe.u32 %r107, %r106, 4, 14; 2026-02-21T08:53:00.7178188Z cvt.u64.u32 %rd80, %r107; 2026-02-21T08:53:00.7178320Z or.b64 %rd200, %rd80, 4611686293439512576; 2026-02-21T08:53:00.7178427Z add.s32 %r108, %r52, 32768; 2026-02-21T08:53:00.7178544Z bfe.u32 %r109, %r108, 4, 14; 2026-02-21T08:53:00.7178653Z cvt.u64.u32 %rd81, %r109; 2026-02-21T08:53:00.7178779Z or.b64 %rd201, %rd81, 4611686293439512576; 2026-02-21T08:53:00.7178898Z add.s32 %r110, %r52, 98304; 2026-02-21T08:53:00.7179009Z bfe.u32 %r111, %r110, 4, 14; 2026-02-21T08:53:00.7179118Z cvt.u64.u32 %rd82, %r111; 2026-02-21T08:53:00.7179250Z or.b64 %rd202, %rd82, 4611686293439512576; 2026-02-21T08:53:00.7179358Z add.s32 %r112, %r52, 32800; 2026-02-21T08:53:00.7179572Z bfe.u32 %r113, %r112, 4, 14; 2026-02-21T08:53:00.7179679Z cvt.u64.u32 %rd83, %r113; 2026-02-21T08:53:00.7179813Z or.b64 %rd203, %rd83, 4611686293439512576; 2026-02-21T08:53:00.7179919Z add.s32 %r114, %r52, 98336; 2026-02-21T08:53:00.7180024Z bfe.u32 %r115, %r114, 4, 14; 2026-02-21T08:53:00.7180145Z cvt.u64.u32 %rd84, %r115; 
2026-02-21T08:53:00.7180269Z or.b64 %rd204, %rd84, 4611686293439512576; 2026-02-21T08:53:00.7180448Z add.s32 %r116, %r52, 32832; 2026-02-21T08:53:00.7180573Z bfe.u32 %r117, %r116, 4, 14; 2026-02-21T08:53:00.7180694Z cvt.u64.u32 %rd85, %r117; 2026-02-21T08:53:00.7180824Z or.b64 %rd205, %rd85, 4611686293439512576; 2026-02-21T08:53:00.7180933Z add.s32 %r118, %r52, 98368; 2026-02-21T08:53:00.7181055Z bfe.u32 %r119, %r118, 4, 14; 2026-02-21T08:53:00.7181166Z cvt.u64.u32 %rd86, %r119; 2026-02-21T08:53:00.7181295Z or.b64 %rd206, %rd86, 4611686293439512576; 2026-02-21T08:53:00.7181407Z add.s32 %r120, %r52, 32864; 2026-02-21T08:53:00.7181530Z bfe.u32 %r121, %r120, 4, 14; 2026-02-21T08:53:00.7181639Z cvt.u64.u32 %rd87, %r121; 2026-02-21T08:53:00.7181766Z or.b64 %rd207, %rd87, 4611686293439512576; 2026-02-21T08:53:00.7181882Z add.s32 %r122, %r52, 98400; 2026-02-21T08:53:00.7181989Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T08:53:00.7182099Z cvt.u64.u32 %rd88, %r123; 2026-02-21T08:53:00.7182319Z or.b64 %rd208, %rd88, 4611686293439512576; 2026-02-21T08:53:00.7182442Z add.s32 %r124, %r52, 16384; 2026-02-21T08:53:00.7182554Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T08:53:00.7182663Z cvt.u64.u32 %rd89, %r125; 2026-02-21T08:53:00.7182801Z or.b64 %rd209, %rd89, 4611686293439512576; 2026-02-21T08:53:00.7182911Z add.s32 %r126, %r52, 16416; 2026-02-21T08:53:00.7183020Z bfe.u32 %r127, %r126, 4, 14; 2026-02-21T08:53:00.7183137Z cvt.u64.u32 %rd90, %r127; 2026-02-21T08:53:00.7183266Z or.b64 %rd211, %rd90, 4611686293439512576; 2026-02-21T08:53:00.7183376Z add.s32 %r128, %r52, 16448; 2026-02-21T08:53:00.7183488Z bfe.u32 %r129, %r128, 4, 14; 2026-02-21T08:53:00.7183611Z cvt.u64.u32 %rd91, %r129; 2026-02-21T08:53:00.7183745Z or.b64 %rd213, %rd91, 4611686293439512576; 2026-02-21T08:53:00.7183854Z add.s32 %r130, %r52, 16480; 2026-02-21T08:53:00.7183971Z bfe.u32 %r131, %r130, 4, 14; 2026-02-21T08:53:00.7184081Z cvt.u64.u32 %rd92, %r131; 2026-02-21T08:53:00.7184211Z or.b64 %rd215, %rd92, 
4611686293439512576; 2026-02-21T08:53:00.7184321Z add.s32 %r132, %r52, 49152; 2026-02-21T08:53:00.7184439Z bfe.u32 %r133, %r132, 4, 14; 2026-02-21T08:53:00.7184634Z cvt.u64.u32 %rd93, %r133; 2026-02-21T08:53:00.7184833Z or.b64 %rd217, %rd93, 4611686293439512576; 2026-02-21T08:53:00.7184957Z add.s32 %r134, %r52, 49184; 2026-02-21T08:53:00.7185068Z bfe.u32 %r135, %r134, 4, 14; 2026-02-21T08:53:00.7185179Z cvt.u64.u32 %rd94, %r135; 2026-02-21T08:53:00.7185313Z or.b64 %rd219, %rd94, 4611686293439512576; 2026-02-21T08:53:00.7185418Z add.s32 %r136, %r52, 49216; 2026-02-21T08:53:00.7185520Z bfe.u32 %r137, %r136, 4, 14; 2026-02-21T08:53:00.7185628Z cvt.u64.u32 %rd95, %r137; 2026-02-21T08:53:00.7185761Z or.b64 %rd221, %rd95, 4611686293439512576; 2026-02-21T08:53:00.7185871Z add.s32 %r138, %r52, 49248; 2026-02-21T08:53:00.7185978Z bfe.u32 %r139, %r138, 4, 14; 2026-02-21T08:53:00.7186094Z cvt.u64.u32 %rd96, %r139; 2026-02-21T08:53:00.7186226Z or.b64 %rd223, %rd96, 4611686293439512576; 2026-02-21T08:53:00.7186621Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7186735Z add.s32 %r140, %r4, %r80; 2026-02-21T08:53:00.7186854Z shl.b32 %r141, %r140, 11; 2026-02-21T08:53:00.7186984Z mad.wide.s32 %rd26, %r141, 2, %rd1; 2026-02-21T08:53:00.7187089Z add.s32 %r142, %r5, %r80; 2026-02-21T08:53:00.7187204Z shl.b32 %r143, %r142, 11; 2026-02-21T08:53:00.7187330Z mad.wide.s32 %rd27, %r143, 2, %rd1; 2026-02-21T08:53:00.7187437Z add.s32 %r144, %r6, %r80; 2026-02-21T08:53:00.7187552Z shl.b32 %r145, %r144, 11; 2026-02-21T08:53:00.7187676Z mad.wide.s32 %rd28, %r145, 2, %rd1; 2026-02-21T08:53:00.7187898Z add.s32 %r146, %r7, %r80; 2026-02-21T08:53:00.7188009Z shl.b32 %r147, %r146, 11; 2026-02-21T08:53:00.7188140Z mad.wide.s32 %rd29, %r147, 2, %rd1; 2026-02-21T08:53:00.7188249Z add.s32 %r148, %r8, %r80; 2026-02-21T08:53:00.7188356Z shl.b32 %r149, %r148, 11; 2026-02-21T08:53:00.7188491Z mad.wide.s32 %rd30, %r149, 2, %rd1; 2026-02-21T08:53:00.7188601Z 
add.s32 %r150, %r9, %r80; 2026-02-21T08:53:00.7188796Z shl.b32 %r151, %r150, 11; 2026-02-21T08:53:00.7188931Z mad.wide.s32 %rd31, %r151, 2, %rd1; 2026-02-21T08:53:00.7189054Z add.s32 %r152, %r10, %r80; 2026-02-21T08:53:00.7189164Z shl.b32 %r153, %r152, 11; 2026-02-21T08:53:00.7189287Z mad.wide.s32 %rd32, %r153, 2, %rd1; 2026-02-21T08:53:00.7189410Z add.s32 %r154, %r11, %r80; 2026-02-21T08:53:00.7189523Z shl.b32 %r155, %r154, 11; 2026-02-21T08:53:00.7189649Z mad.wide.s32 %rd33, %r155, 2, %rd1; 2026-02-21T08:53:00.7189762Z add.s32 %r156, %r12, %r80; 2026-02-21T08:53:00.7189885Z shl.b32 %r157, %r156, 11; 2026-02-21T08:53:00.7190006Z mad.wide.s32 %rd34, %r157, 2, %rd1; 2026-02-21T08:53:00.7190117Z add.s32 %r158, %r13, %r80; 2026-02-21T08:53:00.7190231Z shl.b32 %r159, %r158, 11; 2026-02-21T08:53:00.7190352Z mad.wide.s32 %rd35, %r159, 2, %rd1; 2026-02-21T08:53:00.7190463Z add.s32 %r160, %r14, %r80; 2026-02-21T08:53:00.7190655Z shl.b32 %r161, %r160, 11; 2026-02-21T08:53:00.7190795Z mad.wide.s32 %rd36, %r161, 2, %rd1; 2026-02-21T08:53:00.7190904Z add.s32 %r162, %r15, %r80; 2026-02-21T08:53:00.7191010Z shl.b32 %r163, %r162, 11; 2026-02-21T08:53:00.7191135Z mad.wide.s32 %rd37, %r163, 2, %rd1; 2026-02-21T08:53:00.7191239Z add.s32 %r164, %r16, %r80; 2026-02-21T08:53:00.7191341Z shl.b32 %r165, %r164, 11; 2026-02-21T08:53:00.7191460Z mad.wide.s32 %rd38, %r165, 2, %rd1; 2026-02-21T08:53:00.7191564Z add.s32 %r166, %r17, %r80; 2026-02-21T08:53:00.7191662Z shl.b32 %r167, %r166, 11; 2026-02-21T08:53:00.7191775Z mad.wide.s32 %rd39, %r167, 2, %rd1; 2026-02-21T08:53:00.7191894Z add.s32 %r168, %r18, %r80; 2026-02-21T08:53:00.7191995Z shl.b32 %r169, %r168, 11; 2026-02-21T08:53:00.7192109Z mad.wide.s32 %rd40, %r169, 2, %rd1; 2026-02-21T08:53:00.7192221Z add.s32 %r170, %r19, %r80; 2026-02-21T08:53:00.7192324Z shl.b32 %r171, %r170, 11; 2026-02-21T08:53:00.7192441Z mad.wide.s32 %rd41, %r171, 2, %rd1; 2026-02-21T08:53:00.7192548Z add.s32 %r172, %r20, %r80; 2026-02-21T08:53:00.7192662Z shl.b32 
%r173, %r172, 11; 2026-02-21T08:53:00.7192783Z mad.wide.s32 %rd42, %r173, 2, %rd1; 2026-02-21T08:53:00.7193006Z add.s32 %r174, %r21, %r80; 2026-02-21T08:53:00.7193119Z shl.b32 %r175, %r174, 11; 2026-02-21T08:53:00.7193235Z mad.wide.s32 %rd43, %r175, 2, %rd1; 2026-02-21T08:53:00.7193339Z add.s32 %r176, %r22, %r80; 2026-02-21T08:53:00.7193442Z shl.b32 %r177, %r176, 11; 2026-02-21T08:53:00.7193566Z mad.wide.s32 %rd44, %r177, 2, %rd1; 2026-02-21T08:53:00.7193673Z add.s32 %r178, %r23, %r80; 2026-02-21T08:53:00.7193777Z shl.b32 %r179, %r178, 11; 2026-02-21T08:53:00.7193906Z mad.wide.s32 %rd45, %r179, 2, %rd1; 2026-02-21T08:53:00.7194012Z add.s32 %r180, %r24, %r80; 2026-02-21T08:53:00.7194112Z shl.b32 %r181, %r180, 11; 2026-02-21T08:53:00.7194237Z mad.wide.s32 %rd46, %r181, 2, %rd1; 2026-02-21T08:53:00.7194343Z add.s32 %r182, %r25, %r80; 2026-02-21T08:53:00.7194446Z shl.b32 %r183, %r182, 11; 2026-02-21T08:53:00.7194566Z mad.wide.s32 %rd47, %r183, 2, %rd1; 2026-02-21T08:53:00.7194742Z add.s32 %r184, %r26, %r80; 2026-02-21T08:53:00.7194853Z shl.b32 %r185, %r184, 11; 2026-02-21T08:53:00.7194977Z mad.wide.s32 %rd48, %r185, 2, %rd1; 2026-02-21T08:53:00.7195090Z add.s32 %r186, %r27, %r80; 2026-02-21T08:53:00.7195197Z shl.b32 %r187, %r186, 11; 2026-02-21T08:53:00.7195315Z mad.wide.s32 %rd49, %r187, 2, %rd1; 2026-02-21T08:53:00.7195424Z add.s32 %r188, %r28, %r80; 2026-02-21T08:53:00.7195542Z shl.b32 %r189, %r188, 11; 2026-02-21T08:53:00.7195665Z mad.wide.s32 %rd50, %r189, 2, %rd1; 2026-02-21T08:53:00.7195776Z add.s32 %r190, %r29, %r80; 2026-02-21T08:53:00.7195999Z shl.b32 %r191, %r190, 11; 2026-02-21T08:53:00.7196120Z mad.wide.s32 %rd51, %r191, 2, %rd1; 2026-02-21T08:53:00.7196226Z add.s32 %r192, %r30, %r80; 2026-02-21T08:53:00.7196335Z shl.b32 %r193, %r192, 11; 2026-02-21T08:53:00.7196468Z mad.wide.s32 %rd52, %r193, 2, %rd1; 2026-02-21T08:53:00.7196580Z add.s32 %r194, %r31, %r80; 2026-02-21T08:53:00.7196697Z shl.b32 %r195, %r194, 11; 2026-02-21T08:53:00.7196924Z mad.wide.s32 
%rd53, %r195, 2, %rd1; 2026-02-21T08:53:00.7197046Z add.s32 %r196, %r32, %r80; 2026-02-21T08:53:00.7197155Z shl.b32 %r197, %r196, 11; 2026-02-21T08:53:00.7197273Z mad.wide.s32 %rd54, %r197, 2, %rd1; 2026-02-21T08:53:00.7197395Z add.s32 %r198, %r33, %r80; 2026-02-21T08:53:00.7197499Z shl.b32 %r199, %r198, 11; 2026-02-21T08:53:00.7197613Z mad.wide.s32 %rd55, %r199, 2, %rd1; 2026-02-21T08:53:00.7197729Z add.s32 %r200, %r34, %r80; 2026-02-21T08:53:00.7197832Z shl.b32 %r201, %r200, 11; 2026-02-21T08:53:00.7197946Z mad.wide.s32 %rd56, %r201, 2, %rd1; 2026-02-21T08:53:00.7198082Z mad.wide.s32 %rd57, %r85, 2, %rd1; 2026-02-21T08:53:00.7198195Z mov.pred %p130, 0; 2026-02-21T08:53:00.7198300Z mov.b32 %r2294, 0; 2026-02-21T08:53:00.7198410Z mov.b64 %rd1290, -128; 2026-02-21T08:53:00.7198524Z mov.b64 %rd1289, 0; 2026-02-21T08:53:00.7198631Z bra.uni $L__BB0_6; 2026-02-21T08:53:00.7198926Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:53:00.7199345Z .loc 1 55 31 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:55:31 2026-02-21T08:53:00.7199463Z xor.b32 %r2294, %r2294, 1; 2026-02-21T08:53:00.7199839Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7199960Z add.s64 %rd1290, %rd1290, 128; 2026-02-21T08:53:00.7200085Z add.s64 %rd1289, %rd1289, 256; 2026-02-21T08:53:00.7200209Z setp.lt.u64 %p47, %rd1290, 1920; 2026-02-21T08:53:00.7200328Z mov.pred %p130, -1; 2026-02-21T08:53:00.7200449Z @%p47 bra $L__BB0_6; 2026-02-21T08:53:00.7200563Z bra.uni $L__BB0_9; 2026-02-21T08:53:00.7200766Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:53:00.7200979Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:00.7201381Z .loc 1 56 34 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:56:34 2026-02-21T08:53:00.7201509Z add.s64 %rd98, %rd57, %rd1289; 2026-02-21T08:53:00.7201639Z add.s64 %rd101, %rd56, %rd1289; 2026-02-21T08:53:00.7201877Z add.s64 %rd104, %rd55, %rd1289; 2026-02-21T08:53:00.7201995Z add.s64 
%rd107, %rd54, %rd1289; 2026-02-21T08:53:00.7202108Z add.s64 %rd110, %rd53, %rd1289; 2026-02-21T08:53:00.7202231Z add.s64 %rd113, %rd52, %rd1289; 2026-02-21T08:53:00.7202346Z add.s64 %rd116, %rd51, %rd1289; 2026-02-21T08:53:00.7202462Z add.s64 %rd119, %rd50, %rd1289; 2026-02-21T08:53:00.7202587Z add.s64 %rd122, %rd49, %rd1289; 2026-02-21T08:53:00.7202703Z add.s64 %rd125, %rd48, %rd1289; 2026-02-21T08:53:00.7202819Z add.s64 %rd128, %rd47, %rd1289; 2026-02-21T08:53:00.7202932Z add.s64 %rd131, %rd46, %rd1289; 2026-02-21T08:53:00.7203052Z add.s64 %rd134, %rd45, %rd1289; 2026-02-21T08:53:00.7203164Z add.s64 %rd137, %rd44, %rd1289; 2026-02-21T08:53:00.7203272Z add.s64 %rd140, %rd43, %rd1289; 2026-02-21T08:53:00.7203393Z add.s64 %rd143, %rd42, %rd1289; 2026-02-21T08:53:00.7203502Z add.s64 %rd146, %rd41, %rd1289; 2026-02-21T08:53:00.7203614Z add.s64 %rd149, %rd40, %rd1289; 2026-02-21T08:53:00.7203731Z add.s64 %rd152, %rd39, %rd1289; 2026-02-21T08:53:00.7203850Z add.s64 %rd155, %rd38, %rd1289; 2026-02-21T08:53:00.7203966Z add.s64 %rd158, %rd37, %rd1289; 2026-02-21T08:53:00.7204081Z add.s64 %rd161, %rd36, %rd1289; 2026-02-21T08:53:00.7204204Z add.s64 %rd164, %rd35, %rd1289; 2026-02-21T08:53:00.7204319Z add.s64 %rd167, %rd34, %rd1289; 2026-02-21T08:53:00.7204433Z add.s64 %rd170, %rd33, %rd1289; 2026-02-21T08:53:00.7204546Z add.s64 %rd173, %rd32, %rd1289; 2026-02-21T08:53:00.7204807Z add.s64 %rd176, %rd31, %rd1289; 2026-02-21T08:53:00.7204923Z add.s64 %rd179, %rd30, %rd1289; 2026-02-21T08:53:00.7205036Z add.s64 %rd182, %rd29, %rd1289; 2026-02-21T08:53:00.7205158Z add.s64 %rd185, %rd28, %rd1289; 2026-02-21T08:53:00.7205271Z add.s64 %rd188, %rd27, %rd1289; 2026-02-21T08:53:00.7205662Z .loc 1 56 87 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:56:87 2026-02-21T08:53:00.7205854Z add.s64 %rd191, %rd26, %rd1289; 2026-02-21T08:53:00.7205974Z // begin inline asm 2026-02-21T08:53:00.7206078Z mov.u64 %rd97, 0x0; 2026-02-21T08:53:00.7206308Z 
createpolicy.fractional.L2::evict_last.b64 %rd97, 1.0; 2026-02-21T08:53:00.7206419Z // end inline asm 2026-02-21T08:53:00.7206527Z // begin inline asm 2026-02-21T08:53:00.7206634Z mov.u32 %r202, 0x0; 2026-02-21T08:53:00.7206745Z mov.u32 %r203, 0x0; 2026-02-21T08:53:00.7206849Z mov.u32 %r204, 0x0; 2026-02-21T08:53:00.7206953Z mov.u32 %r205, 0x0; 2026-02-21T08:53:00.7207386Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r202, %r203, %r204, %r205 }, [ %rd98 + 0 ], %rd97; 2026-02-21T08:53:00.7207501Z // end inline asm 2026-02-21T08:53:00.7207609Z // begin inline asm 2026-02-21T08:53:00.7207714Z mov.u64 %rd100, 0x0; 2026-02-21T08:53:00.7207958Z createpolicy.fractional.L2::evict_last.b64 %rd100, 1.0; 2026-02-21T08:53:00.7208151Z // end inline asm 2026-02-21T08:53:00.7208270Z // begin inline asm 2026-02-21T08:53:00.7208389Z mov.u32 %r206, 0x0; 2026-02-21T08:53:00.7208494Z mov.u32 %r207, 0x0; 2026-02-21T08:53:00.7208600Z mov.u32 %r208, 0x0; 2026-02-21T08:53:00.7208702Z mov.u32 %r209, 0x0; 2026-02-21T08:53:00.7209099Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r206, %r207, %r208, %r209 }, [ %rd101 + 0 ], %rd100; 2026-02-21T08:53:00.7209198Z // end inline asm 2026-02-21T08:53:00.7209299Z // begin inline asm 2026-02-21T08:53:00.7209414Z mov.u64 %rd103, 0x0; 2026-02-21T08:53:00.7209638Z createpolicy.fractional.L2::evict_last.b64 %rd103, 1.0; 2026-02-21T08:53:00.7209742Z // end inline asm 2026-02-21T08:53:00.7209855Z // begin inline asm 2026-02-21T08:53:00.7209958Z mov.u32 %r210, 0x0; 2026-02-21T08:53:00.7210060Z mov.u32 %r211, 0x0; 2026-02-21T08:53:00.7210162Z mov.u32 %r212, 0x0; 2026-02-21T08:53:00.7210271Z mov.u32 %r213, 0x0; 2026-02-21T08:53:00.7210686Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r210, %r211, %r212, %r213 }, [ %rd104 + 0 ], %rd103; 2026-02-21T08:53:00.7210789Z // end inline asm 2026-02-21T08:53:00.7210919Z // begin inline asm 2026-02-21T08:53:00.7211130Z mov.u64 %rd106, 0x0; 2026-02-21T08:53:00.7211359Z 
createpolicy.fractional.L2::evict_last.b64 %rd106, 1.0; 2026-02-21T08:53:00.7211466Z // end inline asm 2026-02-21T08:53:00.7211586Z // begin inline asm 2026-02-21T08:53:00.7211685Z mov.u32 %r214, 0x0; 2026-02-21T08:53:00.7211787Z mov.u32 %r215, 0x0; 2026-02-21T08:53:00.7211898Z mov.u32 %r216, 0x0; 2026-02-21T08:53:00.7212001Z mov.u32 %r217, 0x0; 2026-02-21T08:53:00.7212400Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r214, %r215, %r216, %r217 }, [ %rd107 + 0 ], %rd106; 2026-02-21T08:53:00.7212514Z // end inline asm 2026-02-21T08:53:00.7212622Z // begin inline asm 2026-02-21T08:53:00.7212729Z mov.u64 %rd109, 0x0; 2026-02-21T08:53:00.7212960Z createpolicy.fractional.L2::evict_last.b64 %rd109, 1.0; 2026-02-21T08:53:00.7213078Z // end inline asm 2026-02-21T08:53:00.7213193Z // begin inline asm 2026-02-21T08:53:00.7213298Z mov.u32 %r218, 0x0; 2026-02-21T08:53:00.7213416Z mov.u32 %r219, 0x0; 2026-02-21T08:53:00.7213519Z mov.u32 %r220, 0x0; 2026-02-21T08:53:00.7213623Z mov.u32 %r221, 0x0; 2026-02-21T08:53:00.7214028Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r218, %r219, %r220, %r221 }, [ %rd110 + 0 ], %rd109; 2026-02-21T08:53:00.7214141Z // end inline asm 2026-02-21T08:53:00.7214243Z // begin inline asm 2026-02-21T08:53:00.7214344Z mov.u64 %rd112, 0x0; 2026-02-21T08:53:00.7214565Z createpolicy.fractional.L2::evict_last.b64 %rd112, 1.0; 2026-02-21T08:53:00.7214664Z // end inline asm 2026-02-21T08:53:00.7214949Z // begin inline asm 2026-02-21T08:53:00.7215058Z mov.u32 %r222, 0x0; 2026-02-21T08:53:00.7215154Z mov.u32 %r223, 0x0; 2026-02-21T08:53:00.7215248Z mov.u32 %r224, 0x0; 2026-02-21T08:53:00.7215344Z mov.u32 %r225, 0x0; 2026-02-21T08:53:00.7215722Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r222, %r223, %r224, %r225 }, [ %rd113 + 0 ], %rd112; 2026-02-21T08:53:00.7215827Z // end inline asm 2026-02-21T08:53:00.7215929Z // begin inline asm 2026-02-21T08:53:00.7216110Z mov.u64 %rd115, 0x0; 2026-02-21T08:53:00.7216345Z 
createpolicy.fractional.L2::evict_last.b64 %rd115, 1.0; 2026-02-21T08:53:00.7216451Z // end inline asm 2026-02-21T08:53:00.7216555Z // begin inline asm 2026-02-21T08:53:00.7216665Z mov.u32 %r226, 0x0; 2026-02-21T08:53:00.7216765Z mov.u32 %r227, 0x0; 2026-02-21T08:53:00.7216864Z mov.u32 %r228, 0x0; 2026-02-21T08:53:00.7216971Z mov.u32 %r229, 0x0; 2026-02-21T08:53:00.7217347Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r226, %r227, %r228, %r229 }, [ %rd116 + 0 ], %rd115; 2026-02-21T08:53:00.7217452Z // end inline asm 2026-02-21T08:53:00.7217565Z // begin inline asm 2026-02-21T08:53:00.7217667Z mov.u64 %rd118, 0x0; 2026-02-21T08:53:00.7217879Z createpolicy.fractional.L2::evict_last.b64 %rd118, 1.0; 2026-02-21T08:53:00.7217977Z // end inline asm 2026-02-21T08:53:00.7218090Z // begin inline asm 2026-02-21T08:53:00.7218276Z mov.u32 %r230, 0x0; 2026-02-21T08:53:00.7218389Z mov.u32 %r231, 0x0; 2026-02-21T08:53:00.7218499Z mov.u32 %r232, 0x0; 2026-02-21T08:53:00.7218603Z mov.u32 %r233, 0x0; 2026-02-21T08:53:00.7219013Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r230, %r231, %r232, %r233 }, [ %rd119 + 0 ], %rd118; 2026-02-21T08:53:00.7219118Z // end inline asm 2026-02-21T08:53:00.7219239Z // begin inline asm 2026-02-21T08:53:00.7219343Z mov.u64 %rd121, 0x0; 2026-02-21T08:53:00.7219559Z createpolicy.fractional.L2::evict_last.b64 %rd121, 1.0; 2026-02-21T08:53:00.7219666Z // end inline asm 2026-02-21T08:53:00.7219770Z // begin inline asm 2026-02-21T08:53:00.7219878Z mov.u32 %r234, 0x0; 2026-02-21T08:53:00.7219991Z mov.u32 %r235, 0x0; 2026-02-21T08:53:00.7220095Z mov.u32 %r236, 0x0; 2026-02-21T08:53:00.7220196Z mov.u32 %r237, 0x0; 2026-02-21T08:53:00.7220586Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r234, %r235, %r236, %r237 }, [ %rd122 + 0 ], %rd121; 2026-02-21T08:53:00.7220693Z // end inline asm 2026-02-21T08:53:00.7220798Z // begin inline asm 2026-02-21T08:53:00.7220900Z mov.u64 %rd124, 0x0; 2026-02-21T08:53:00.7221131Z 
createpolicy.fractional.L2::evict_last.b64 %rd124, 1.0; 2026-02-21T08:53:00.7221348Z // end inline asm 2026-02-21T08:53:00.7221456Z // begin inline asm 2026-02-21T08:53:00.7221561Z mov.u32 %r238, 0x0; 2026-02-21T08:53:00.7221672Z mov.u32 %r239, 0x0; 2026-02-21T08:53:00.7221774Z mov.u32 %r240, 0x0; 2026-02-21T08:53:00.7221875Z mov.u32 %r241, 0x0; 2026-02-21T08:53:00.7222290Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r238, %r239, %r240, %r241 }, [ %rd125 + 0 ], %rd124; 2026-02-21T08:53:00.7222391Z // end inline asm 2026-02-21T08:53:00.7222500Z // begin inline asm 2026-02-21T08:53:00.7222611Z mov.u64 %rd127, 0x0; 2026-02-21T08:53:00.7222834Z createpolicy.fractional.L2::evict_last.b64 %rd127, 1.0; 2026-02-21T08:53:00.7222935Z // end inline asm 2026-02-21T08:53:00.7223039Z // begin inline asm 2026-02-21T08:53:00.7223150Z mov.u32 %r242, 0x0; 2026-02-21T08:53:00.7223254Z mov.u32 %r243, 0x0; 2026-02-21T08:53:00.7223356Z mov.u32 %r244, 0x0; 2026-02-21T08:53:00.7223467Z mov.u32 %r245, 0x0; 2026-02-21T08:53:00.7223860Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r242, %r243, %r244, %r245 }, [ %rd128 + 0 ], %rd127; 2026-02-21T08:53:00.7223965Z // end inline asm 2026-02-21T08:53:00.7224070Z // begin inline asm 2026-02-21T08:53:00.7224185Z mov.u64 %rd130, 0x0; 2026-02-21T08:53:00.7224418Z createpolicy.fractional.L2::evict_last.b64 %rd130, 1.0; 2026-02-21T08:53:00.7224521Z // end inline asm 2026-02-21T08:53:00.7224637Z // begin inline asm 2026-02-21T08:53:00.7224823Z mov.u32 %r246, 0x0; 2026-02-21T08:53:00.7225038Z mov.u32 %r247, 0x0; 2026-02-21T08:53:00.7225161Z mov.u32 %r248, 0x0; 2026-02-21T08:53:00.7225266Z mov.u32 %r249, 0x0; 2026-02-21T08:53:00.7225663Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r246, %r247, %r248, %r249 }, [ %rd131 + 0 ], %rd130; 2026-02-21T08:53:00.7225768Z // end inline asm 2026-02-21T08:53:00.7225884Z // begin inline asm 2026-02-21T08:53:00.7225996Z mov.u64 %rd133, 0x0; 2026-02-21T08:53:00.7226294Z 
createpolicy.fractional.L2::evict_last.b64 %rd133, 1.0; 2026-02-21T08:53:00.7226414Z // end inline asm 2026-02-21T08:53:00.7226520Z // begin inline asm 2026-02-21T08:53:00.7226622Z mov.u32 %r250, 0x0; 2026-02-21T08:53:00.7226719Z mov.u32 %r251, 0x0; 2026-02-21T08:53:00.7226826Z mov.u32 %r252, 0x0; 2026-02-21T08:53:00.7226928Z mov.u32 %r253, 0x0; 2026-02-21T08:53:00.7227315Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r250, %r251, %r252, %r253 }, [ %rd134 + 0 ], %rd133; 2026-02-21T08:53:00.7227426Z // end inline asm 2026-02-21T08:53:00.7227532Z // begin inline asm 2026-02-21T08:53:00.7227641Z mov.u64 %rd136, 0x0; 2026-02-21T08:53:00.7227874Z createpolicy.fractional.L2::evict_last.b64 %rd136, 1.0; 2026-02-21T08:53:00.7227976Z // end inline asm 2026-02-21T08:53:00.7228081Z // begin inline asm 2026-02-21T08:53:00.7228182Z mov.u32 %r254, 0x0; 2026-02-21T08:53:00.7228293Z mov.u32 %r255, 0x0; 2026-02-21T08:53:00.7228481Z mov.u32 %r256, 0x0; 2026-02-21T08:53:00.7228595Z mov.u32 %r257, 0x0; 2026-02-21T08:53:00.7229003Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r254, %r255, %r256, %r257 }, [ %rd137 + 0 ], %rd136; 2026-02-21T08:53:00.7229108Z // end inline asm 2026-02-21T08:53:00.7229212Z // begin inline asm 2026-02-21T08:53:00.7229315Z mov.u64 %rd139, 0x0; 2026-02-21T08:53:00.7229544Z createpolicy.fractional.L2::evict_last.b64 %rd139, 1.0; 2026-02-21T08:53:00.7229645Z // end inline asm 2026-02-21T08:53:00.7229749Z // begin inline asm 2026-02-21T08:53:00.7229860Z mov.u32 %r258, 0x0; 2026-02-21T08:53:00.7229964Z mov.u32 %r259, 0x0; 2026-02-21T08:53:00.7230070Z mov.u32 %r260, 0x0; 2026-02-21T08:53:00.7230181Z mov.u32 %r261, 0x0; 2026-02-21T08:53:00.7230603Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r258, %r259, %r260, %r261 }, [ %rd140 + 0 ], %rd139; 2026-02-21T08:53:00.7230713Z // end inline asm 2026-02-21T08:53:00.7230820Z // begin inline asm 2026-02-21T08:53:00.7230935Z mov.u64 %rd142, 0x0; 2026-02-21T08:53:00.7231162Z 
createpolicy.fractional.L2::evict_last.b64 %rd142, 1.0; 2026-02-21T08:53:00.7231267Z // end inline asm 2026-02-21T08:53:00.7231498Z // begin inline asm 2026-02-21T08:53:00.7231606Z mov.u32 %r262, 0x0; 2026-02-21T08:53:00.7231726Z mov.u32 %r263, 0x0; 2026-02-21T08:53:00.7231830Z mov.u32 %r264, 0x0; 2026-02-21T08:53:00.7231942Z mov.u32 %r265, 0x0; 2026-02-21T08:53:00.7232324Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r262, %r263, %r264, %r265 }, [ %rd143 + 0 ], %rd142; 2026-02-21T08:53:00.7232423Z // end inline asm 2026-02-21T08:53:00.7232535Z // begin inline asm 2026-02-21T08:53:00.7232640Z mov.u64 %rd145, 0x0; 2026-02-21T08:53:00.7232863Z createpolicy.fractional.L2::evict_last.b64 %rd145, 1.0; 2026-02-21T08:53:00.7232970Z // end inline asm 2026-02-21T08:53:00.7233076Z // begin inline asm 2026-02-21T08:53:00.7233180Z mov.u32 %r266, 0x0; 2026-02-21T08:53:00.7233283Z mov.u32 %r267, 0x0; 2026-02-21T08:53:00.7233393Z mov.u32 %r268, 0x0; 2026-02-21T08:53:00.7233499Z mov.u32 %r269, 0x0; 2026-02-21T08:53:00.7233903Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r266, %r267, %r268, %r269 }, [ %rd146 + 0 ], %rd145; 2026-02-21T08:53:00.7234017Z // end inline asm 2026-02-21T08:53:00.7234124Z // begin inline asm 2026-02-21T08:53:00.7234230Z mov.u64 %rd148, 0x0; 2026-02-21T08:53:00.7234457Z createpolicy.fractional.L2::evict_last.b64 %rd148, 1.0; 2026-02-21T08:53:00.7234559Z // end inline asm 2026-02-21T08:53:00.7234662Z // begin inline asm 2026-02-21T08:53:00.7234854Z mov.u32 %r270, 0x0; 2026-02-21T08:53:00.7234966Z mov.u32 %r271, 0x0; 2026-02-21T08:53:00.7235067Z mov.u32 %r272, 0x0; 2026-02-21T08:53:00.7235316Z mov.u32 %r273, 0x0; 2026-02-21T08:53:00.7235729Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r270, %r271, %r272, %r273 }, [ %rd149 + 0 ], %rd148; 2026-02-21T08:53:00.7235834Z // end inline asm 2026-02-21T08:53:00.7235937Z // begin inline asm 2026-02-21T08:53:00.7236044Z mov.u64 %rd151, 0x0; 2026-02-21T08:53:00.7236285Z 
createpolicy.fractional.L2::evict_last.b64 %rd151, 1.0; 2026-02-21T08:53:00.7236392Z // end inline asm 2026-02-21T08:53:00.7236586Z // begin inline asm 2026-02-21T08:53:00.7236711Z mov.u32 %r274, 0x0; 2026-02-21T08:53:00.7236820Z mov.u32 %r275, 0x0; 2026-02-21T08:53:00.7236925Z mov.u32 %r276, 0x0; 2026-02-21T08:53:00.7237028Z mov.u32 %r277, 0x0; 2026-02-21T08:53:00.7237432Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r274, %r275, %r276, %r277 }, [ %rd152 + 0 ], %rd151; 2026-02-21T08:53:00.7237539Z // end inline asm 2026-02-21T08:53:00.7237644Z // begin inline asm 2026-02-21T08:53:00.7237758Z mov.u64 %rd154, 0x0; 2026-02-21T08:53:00.7237975Z createpolicy.fractional.L2::evict_last.b64 %rd154, 1.0; 2026-02-21T08:53:00.7238077Z // end inline asm 2026-02-21T08:53:00.7238186Z // begin inline asm 2026-02-21T08:53:00.7238285Z mov.u32 %r278, 0x0; 2026-02-21T08:53:00.7238381Z mov.u32 %r279, 0x0; 2026-02-21T08:53:00.7238476Z mov.u32 %r280, 0x0; 2026-02-21T08:53:00.7238582Z mov.u32 %r281, 0x0; 2026-02-21T08:53:00.7239051Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r278, %r279, %r280, %r281 }, [ %rd155 + 0 ], %rd154; 2026-02-21T08:53:00.7239166Z // end inline asm 2026-02-21T08:53:00.7239281Z // begin inline asm 2026-02-21T08:53:00.7239383Z mov.u64 %rd157, 0x0; 2026-02-21T08:53:00.7239601Z createpolicy.fractional.L2::evict_last.b64 %rd157, 1.0; 2026-02-21T08:53:00.7239707Z // end inline asm 2026-02-21T08:53:00.7239811Z // begin inline asm 2026-02-21T08:53:00.7239909Z mov.u32 %r282, 0x0; 2026-02-21T08:53:00.7240008Z mov.u32 %r283, 0x0; 2026-02-21T08:53:00.7240116Z mov.u32 %r284, 0x0; 2026-02-21T08:53:00.7240214Z mov.u32 %r285, 0x0; 2026-02-21T08:53:00.7240594Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r282, %r283, %r284, %r285 }, [ %rd158 + 0 ], %rd157; 2026-02-21T08:53:00.7240701Z // end inline asm 2026-02-21T08:53:00.7240803Z // begin inline asm 2026-02-21T08:53:00.7240904Z mov.u64 %rd160, 0x0; 2026-02-21T08:53:00.7241120Z 
createpolicy.fractional.L2::evict_last.b64 %rd160, 1.0; 2026-02-21T08:53:00.7241233Z // end inline asm 2026-02-21T08:53:00.7241334Z // begin inline asm 2026-02-21T08:53:00.7241436Z mov.u32 %r286, 0x0; 2026-02-21T08:53:00.7241652Z mov.u32 %r287, 0x0; 2026-02-21T08:53:00.7241754Z mov.u32 %r288, 0x0; 2026-02-21T08:53:00.7241854Z mov.u32 %r289, 0x0; 2026-02-21T08:53:00.7242268Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r286, %r287, %r288, %r289 }, [ %rd161 + 0 ], %rd160; 2026-02-21T08:53:00.7242379Z // end inline asm 2026-02-21T08:53:00.7242491Z // begin inline asm 2026-02-21T08:53:00.7242601Z mov.u64 %rd163, 0x0; 2026-02-21T08:53:00.7242825Z createpolicy.fractional.L2::evict_last.b64 %rd163, 1.0; 2026-02-21T08:53:00.7242927Z // end inline asm 2026-02-21T08:53:00.7243030Z // begin inline asm 2026-02-21T08:53:00.7243144Z mov.u32 %r290, 0x0; 2026-02-21T08:53:00.7243249Z mov.u32 %r291, 0x0; 2026-02-21T08:53:00.7243353Z mov.u32 %r292, 0x0; 2026-02-21T08:53:00.7243457Z mov.u32 %r293, 0x0; 2026-02-21T08:53:00.7243847Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r290, %r291, %r292, %r293 }, [ %rd164 + 0 ], %rd163; 2026-02-21T08:53:00.7243948Z // end inline asm 2026-02-21T08:53:00.7244053Z // begin inline asm 2026-02-21T08:53:00.7244165Z mov.u64 %rd166, 0x0; 2026-02-21T08:53:00.7244385Z createpolicy.fractional.L2::evict_last.b64 %rd166, 1.0; 2026-02-21T08:53:00.7244485Z // end inline asm 2026-02-21T08:53:00.7244597Z // begin inline asm 2026-02-21T08:53:00.7244787Z mov.u32 %r294, 0x0; 2026-02-21T08:53:00.7244898Z mov.u32 %r295, 0x0; 2026-02-21T08:53:00.7245002Z mov.u32 %r296, 0x0; 2026-02-21T08:53:00.7245115Z mov.u32 %r297, 0x0; 2026-02-21T08:53:00.7245526Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r294, %r295, %r296, %r297 }, [ %rd167 + 0 ], %rd166; 2026-02-21T08:53:00.7245736Z // end inline asm 2026-02-21T08:53:00.7245852Z // begin inline asm 2026-02-21T08:53:00.7245960Z mov.u64 %rd169, 0x0; 2026-02-21T08:53:00.7246185Z 
createpolicy.fractional.L2::evict_last.b64 %rd169, 1.0; 2026-02-21T08:53:00.7246287Z // end inline asm 2026-02-21T08:53:00.7246407Z // begin inline asm 2026-02-21T08:53:00.7246511Z mov.u32 %r298, 0x0; 2026-02-21T08:53:00.7246680Z mov.u32 %r299, 0x0; 2026-02-21T08:53:00.7246803Z mov.u32 %r300, 0x0; 2026-02-21T08:53:00.7246906Z mov.u32 %r301, 0x0; 2026-02-21T08:53:00.7247298Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r298, %r299, %r300, %r301 }, [ %rd170 + 0 ], %rd169; 2026-02-21T08:53:00.7247410Z // end inline asm 2026-02-21T08:53:00.7247514Z // begin inline asm 2026-02-21T08:53:00.7247620Z mov.u64 %rd172, 0x0; 2026-02-21T08:53:00.7247847Z createpolicy.fractional.L2::evict_last.b64 %rd172, 1.0; 2026-02-21T08:53:00.7247958Z // end inline asm 2026-02-21T08:53:00.7248074Z // begin inline asm 2026-02-21T08:53:00.7248180Z mov.u32 %r302, 0x0; 2026-02-21T08:53:00.7248292Z mov.u32 %r303, 0x0; 2026-02-21T08:53:00.7248398Z mov.u32 %r304, 0x0; 2026-02-21T08:53:00.7248499Z mov.u32 %r305, 0x0; 2026-02-21T08:53:00.7248886Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r302, %r303, %r304, %r305 }, [ %rd173 + 0 ], %rd172; 2026-02-21T08:53:00.7249085Z // end inline asm 2026-02-21T08:53:00.7249204Z // begin inline asm 2026-02-21T08:53:00.7249314Z mov.u64 %rd175, 0x0; 2026-02-21T08:53:00.7249548Z createpolicy.fractional.L2::evict_last.b64 %rd175, 1.0; 2026-02-21T08:53:00.7249649Z // end inline asm 2026-02-21T08:53:00.7249751Z // begin inline asm 2026-02-21T08:53:00.7249858Z mov.u32 %r306, 0x0; 2026-02-21T08:53:00.7249956Z mov.u32 %r307, 0x0; 2026-02-21T08:53:00.7250052Z mov.u32 %r308, 0x0; 2026-02-21T08:53:00.7250151Z mov.u32 %r309, 0x0; 2026-02-21T08:53:00.7250544Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r306, %r307, %r308, %r309 }, [ %rd176 + 0 ], %rd175; 2026-02-21T08:53:00.7250650Z // end inline asm 2026-02-21T08:53:00.7250756Z // begin inline asm 2026-02-21T08:53:00.7250869Z mov.u64 %rd178, 0x0; 2026-02-21T08:53:00.7251092Z 
createpolicy.fractional.L2::evict_last.b64 %rd178, 1.0; 2026-02-21T08:53:00.7251195Z // end inline asm 2026-02-21T08:53:00.7251302Z // begin inline asm 2026-02-21T08:53:00.7251416Z mov.u32 %r310, 0x0; 2026-02-21T08:53:00.7251518Z mov.u32 %r311, 0x0; 2026-02-21T08:53:00.7251624Z mov.u32 %r312, 0x0; 2026-02-21T08:53:00.7251839Z mov.u32 %r313, 0x0; 2026-02-21T08:53:00.7252239Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r310, %r311, %r312, %r313 }, [ %rd179 + 0 ], %rd178; 2026-02-21T08:53:00.7252342Z // end inline asm 2026-02-21T08:53:00.7252456Z // begin inline asm 2026-02-21T08:53:00.7252563Z mov.u64 %rd181, 0x0; 2026-02-21T08:53:00.7252784Z createpolicy.fractional.L2::evict_last.b64 %rd181, 1.0; 2026-02-21T08:53:00.7252884Z // end inline asm 2026-02-21T08:53:00.7252998Z // begin inline asm 2026-02-21T08:53:00.7253106Z mov.u32 %r314, 0x0; 2026-02-21T08:53:00.7253209Z mov.u32 %r315, 0x0; 2026-02-21T08:53:00.7253319Z mov.u32 %r316, 0x0; 2026-02-21T08:53:00.7253423Z mov.u32 %r317, 0x0; 2026-02-21T08:53:00.7253851Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r314, %r315, %r316, %r317 }, [ %rd182 + 0 ], %rd181; 2026-02-21T08:53:00.7253971Z // end inline asm 2026-02-21T08:53:00.7254083Z // begin inline asm 2026-02-21T08:53:00.7254188Z mov.u64 %rd184, 0x0; 2026-02-21T08:53:00.7254416Z createpolicy.fractional.L2::evict_last.b64 %rd184, 1.0; 2026-02-21T08:53:00.7254531Z // end inline asm 2026-02-21T08:53:00.7254640Z // begin inline asm 2026-02-21T08:53:00.7254827Z mov.u32 %r318, 0x0; 2026-02-21T08:53:00.7254945Z mov.u32 %r319, 0x0; 2026-02-21T08:53:00.7255050Z mov.u32 %r320, 0x0; 2026-02-21T08:53:00.7255153Z mov.u32 %r321, 0x0; 2026-02-21T08:53:00.7255549Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r318, %r319, %r320, %r321 }, [ %rd185 + 0 ], %rd184; 2026-02-21T08:53:00.7255659Z // end inline asm 2026-02-21T08:53:00.7255876Z // begin inline asm 2026-02-21T08:53:00.7255985Z mov.u64 %rd187, 0x0; 2026-02-21T08:53:00.7256214Z 
createpolicy.fractional.L2::evict_last.b64 %rd187, 1.0; 2026-02-21T08:53:00.7256316Z // end inline asm 2026-02-21T08:53:00.7256423Z // begin inline asm 2026-02-21T08:53:00.7256525Z mov.u32 %r322, 0x0; 2026-02-21T08:53:00.7256642Z mov.u32 %r323, 0x0; 2026-02-21T08:53:00.7256746Z mov.u32 %r324, 0x0; 2026-02-21T08:53:00.7256919Z mov.u32 %r325, 0x0; 2026-02-21T08:53:00.7257348Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r322, %r323, %r324, %r325 }, [ %rd188 + 0 ], %rd187; 2026-02-21T08:53:00.7257454Z // end inline asm 2026-02-21T08:53:00.7257562Z // begin inline asm 2026-02-21T08:53:00.7257677Z mov.u64 %rd190, 0x0; 2026-02-21T08:53:00.7257902Z createpolicy.fractional.L2::evict_last.b64 %rd190, 1.0; 2026-02-21T08:53:00.7258009Z // end inline asm 2026-02-21T08:53:00.7258118Z // begin inline asm 2026-02-21T08:53:00.7258233Z mov.u32 %r326, 0x0; 2026-02-21T08:53:00.7258340Z mov.u32 %r327, 0x0; 2026-02-21T08:53:00.7258444Z mov.u32 %r328, 0x0; 2026-02-21T08:53:00.7258555Z mov.u32 %r329, 0x0; 2026-02-21T08:53:00.7258956Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r326, %r327, %r328, %r329 }, [ %rd191 + 0 ], %rd190; 2026-02-21T08:53:00.7259063Z // end inline asm 2026-02-21T08:53:00.7259182Z bar.sync 2, 128; 2026-02-21T08:53:00.7259464Z st.shared.v4.b32 [%r37], {%r202, %r203, %r204, %r205}; 2026-02-21T08:53:00.7259707Z st.shared.v4.b32 [%r37+1024], {%r206, %r207, %r208, %r209}; 2026-02-21T08:53:00.7259930Z st.shared.v4.b32 [%r37+2048], {%r210, %r211, %r212, %r213}; 2026-02-21T08:53:00.7260140Z st.shared.v4.b32 [%r37+3072], {%r214, %r215, %r216, %r217}; 2026-02-21T08:53:00.7260344Z st.shared.v4.b32 [%r37+4096], {%r218, %r219, %r220, %r221}; 2026-02-21T08:53:00.7260544Z st.shared.v4.b32 [%r37+5120], {%r222, %r223, %r224, %r225}; 2026-02-21T08:53:00.7260748Z st.shared.v4.b32 [%r37+6144], {%r226, %r227, %r228, %r229}; 2026-02-21T08:53:00.7260944Z st.shared.v4.b32 [%r37+7168], {%r230, %r231, %r232, %r233}; 2026-02-21T08:53:00.7261131Z st.shared.v4.b32 [%r37+8192], 
{%r234, %r235, %r236, %r237}; 2026-02-21T08:53:00.7261322Z st.shared.v4.b32 [%r37+9216], {%r238, %r239, %r240, %r241}; 2026-02-21T08:53:00.7261513Z st.shared.v4.b32 [%r37+10240], {%r242, %r243, %r244, %r245}; 2026-02-21T08:53:00.7261709Z st.shared.v4.b32 [%r37+11264], {%r246, %r247, %r248, %r249}; 2026-02-21T08:53:00.7261900Z st.shared.v4.b32 [%r37+12288], {%r250, %r251, %r252, %r253}; 2026-02-21T08:53:00.7262216Z st.shared.v4.b32 [%r37+13312], {%r254, %r255, %r256, %r257}; 2026-02-21T08:53:00.7262414Z st.shared.v4.b32 [%r37+14336], {%r258, %r259, %r260, %r261}; 2026-02-21T08:53:00.7262611Z st.shared.v4.b32 [%r37+15360], {%r262, %r263, %r264, %r265}; 2026-02-21T08:53:00.7262815Z st.shared.v4.b32 [%r37+16384], {%r266, %r267, %r268, %r269}; 2026-02-21T08:53:00.7263010Z st.shared.v4.b32 [%r37+17408], {%r270, %r271, %r272, %r273}; 2026-02-21T08:53:00.7263205Z st.shared.v4.b32 [%r37+18432], {%r274, %r275, %r276, %r277}; 2026-02-21T08:53:00.7263411Z st.shared.v4.b32 [%r37+19456], {%r278, %r279, %r280, %r281}; 2026-02-21T08:53:00.7263603Z st.shared.v4.b32 [%r37+20480], {%r282, %r283, %r284, %r285}; 2026-02-21T08:53:00.7263794Z st.shared.v4.b32 [%r37+21504], {%r286, %r287, %r288, %r289}; 2026-02-21T08:53:00.7263998Z st.shared.v4.b32 [%r37+22528], {%r290, %r291, %r292, %r293}; 2026-02-21T08:53:00.7264200Z st.shared.v4.b32 [%r37+23552], {%r294, %r295, %r296, %r297}; 2026-02-21T08:53:00.7264397Z st.shared.v4.b32 [%r37+24576], {%r298, %r299, %r300, %r301}; 2026-02-21T08:53:00.7264592Z st.shared.v4.b32 [%r37+25600], {%r302, %r303, %r304, %r305}; 2026-02-21T08:53:00.7264868Z st.shared.v4.b32 [%r37+26624], {%r306, %r307, %r308, %r309}; 2026-02-21T08:53:00.7265074Z st.shared.v4.b32 [%r37+27648], {%r310, %r311, %r312, %r313}; 2026-02-21T08:53:00.7265282Z st.shared.v4.b32 [%r37+28672], {%r314, %r315, %r316, %r317}; 2026-02-21T08:53:00.7265499Z st.shared.v4.b32 [%r37+29696], {%r318, %r319, %r320, %r321}; 2026-02-21T08:53:00.7265808Z st.shared.v4.b32 [%r37+30720], {%r322, %r323, 
%r324, %r325}; 2026-02-21T08:53:00.7266003Z st.shared.v4.b32 [%r37+31744], {%r326, %r327, %r328, %r329}; 2026-02-21T08:53:00.7266412Z .loc 1 55 31 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:55:31 2026-02-21T08:53:00.7266537Z add.s32 %r330, %r52, 131088; 2026-02-21T08:53:00.7266647Z // begin inline asm 2026-02-21T08:53:00.7266817Z 2026-02-21T08:53:00.7266928Z { 2026-02-21T08:53:00.7267038Z .reg .pred complete; 2026-02-21T08:53:00.7267133Z waitLoop: 2026-02-21T08:53:00.7267404Z mbarrier.try_wait.parity.shared.b64 complete, [%r330], %r2294; 2026-02-21T08:53:00.7267529Z @!complete bra.uni waitLoop; 2026-02-21T08:53:00.7267618Z } 2026-02-21T08:53:00.7267627Z 2026-02-21T08:53:00.7267741Z // end inline asm 2026-02-21T08:53:00.7268121Z .loc 1 57 52 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:57:52 2026-02-21T08:53:00.7268234Z // begin inline asm 2026-02-21T08:53:00.7268380Z fence.proxy.async.shared::cta; 2026-02-21T08:53:00.7268494Z // end inline asm 2026-02-21T08:53:00.7268599Z bar.sync 2, 128; 2026-02-21T08:53:00.7268746Z shfl.sync.idx.b32 %r333, %r36, 0, 31, -1; 2026-02-21T08:53:00.7268874Z setp.ne.b32 %p10, %r333, 0; 2026-02-21T08:53:00.7269066Z @%p10 bra $L__BB0_8; 2026-02-21T08:53:00.7269280Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T08:53:00.7269409Z setp.eq.b64 %p45, %rd1289, 3840; 2026-02-21T08:53:00.7269547Z elect.sync %r366|%p12, -1; 2026-02-21T08:53:00.7269658Z mov.b32 %r335, 138412048; 2026-02-21T08:53:00.7269766Z // begin inline asm 2026-02-21T08:53:00.7270101Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd193, %rd194, %r335, %p130; 2026-02-21T08:53:00.7270203Z // end inline asm 2026-02-21T08:53:00.7270320Z mov.pred %p13, -1; 2026-02-21T08:53:00.7270430Z // begin inline asm 2026-02-21T08:53:00.7270762Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd195, %rd196, %r335, %p13; 2026-02-21T08:53:00.7270874Z // end inline asm 2026-02-21T08:53:00.7270982Z // begin inline asm 
2026-02-21T08:53:00.7271322Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd197, %rd198, %r335, %p13; 2026-02-21T08:53:00.7271427Z // end inline asm 2026-02-21T08:53:00.7271537Z // begin inline asm 2026-02-21T08:53:00.7271859Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd199, %rd200, %r335, %p13; 2026-02-21T08:53:00.7272117Z // end inline asm 2026-02-21T08:53:00.7272223Z // begin inline asm 2026-02-21T08:53:00.7272541Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd201, %rd202, %r335, %p13; 2026-02-21T08:53:00.7272640Z // end inline asm 2026-02-21T08:53:00.7272742Z // begin inline asm 2026-02-21T08:53:00.7273046Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd203, %rd204, %r335, %p13; 2026-02-21T08:53:00.7273163Z // end inline asm 2026-02-21T08:53:00.7273268Z // begin inline asm 2026-02-21T08:53:00.7273583Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd205, %rd206, %r335, %p13; 2026-02-21T08:53:00.7273695Z // end inline asm 2026-02-21T08:53:00.7273803Z // begin inline asm 2026-02-21T08:53:00.7274113Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 0 ], %rd207, %rd208, %r335, %p13; 2026-02-21T08:53:00.7274225Z // end inline asm 2026-02-21T08:53:00.7274333Z // begin inline asm 2026-02-21T08:53:00.7274659Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd209, %rd194, %r335, %p130; 2026-02-21T08:53:00.7274851Z // end inline asm 2026-02-21T08:53:00.7274961Z // begin inline asm 2026-02-21T08:53:00.7275286Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd211, %rd196, %r335, %p13; 2026-02-21T08:53:00.7275386Z // end inline asm 2026-02-21T08:53:00.7275499Z // begin inline asm 2026-02-21T08:53:00.7275813Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd213, %rd198, %r335, %p13; 2026-02-21T08:53:00.7275915Z // end inline asm 2026-02-21T08:53:00.7276149Z // begin inline asm 2026-02-21T08:53:00.7276480Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd215, %rd200, 
%r335, %p13; 2026-02-21T08:53:00.7276590Z // end inline asm 2026-02-21T08:53:00.7276710Z // begin inline asm 2026-02-21T08:53:00.7277048Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd217, %rd202, %r335, %p13; 2026-02-21T08:53:00.7277155Z // end inline asm 2026-02-21T08:53:00.7277262Z // begin inline asm 2026-02-21T08:53:00.7277676Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd219, %rd204, %r335, %p13; 2026-02-21T08:53:00.7277794Z // end inline asm 2026-02-21T08:53:00.7277901Z // begin inline asm 2026-02-21T08:53:00.7278215Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd221, %rd206, %r335, %p13; 2026-02-21T08:53:00.7278313Z // end inline asm 2026-02-21T08:53:00.7278414Z // begin inline asm 2026-02-21T08:53:00.7278726Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r334 + 256 ], %rd223, %rd208, %r335, %p13; 2026-02-21T08:53:00.7278831Z // end inline asm 2026-02-21T08:53:00.7278943Z add.s32 %r368, %r52, 131072; 2026-02-21T08:53:00.7279066Z cvt.u64.u32 %rd225, %r368; 2026-02-21T08:53:00.7279172Z // begin inline asm 2026-02-21T08:53:00.7279454Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd225]; 2026-02-21T08:53:00.7279559Z // end inline asm 2026-02-21T08:53:00.7279774Z and.pred %p44, %p45, %p12; 2026-02-21T08:53:00.7279898Z add.s32 %r369, %r52, 131104; 2026-02-21T08:53:00.7280015Z cvt.u64.u32 %rd226, %r369; 2026-02-21T08:53:00.7280138Z // begin inline asm 2026-02-21T08:53:00.7280425Z @%p44 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd226]; 2026-02-21T08:53:00.7280528Z // end inline asm 2026-02-21T08:53:00.7280631Z bra.uni $L__BB0_8; 2026-02-21T08:53:00.7280844Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7281226Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7281376Z ld.shared.b32 %r69, [global_smem+65544]; 2026-02-21T08:53:00.7281492Z barrier.sync 1; 2026-02-21T08:53:00.7281872Z .loc 1 21 67 // 
cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:21:67 2026-02-21T08:53:00.7281985Z mov.u32 %r56, %ctaid.x; 2026-02-21T08:53:00.7282104Z mov.u32 %r57, %ctaid.y; 2026-02-21T08:53:00.7282217Z mov.u32 %r58, %ctaid.z; 2026-02-21T08:53:00.7282331Z mov.u32 %r59, %nctaid.x; 2026-02-21T08:53:00.7282448Z mov.u32 %r60, %nctaid.y; 2026-02-21T08:53:00.7282702Z mad.lo.s32 %r61, %r58, %r60, %r57; 2026-02-21T08:53:00.7282824Z mad.lo.s32 %r62, %r61, %r59, %r56; 2026-02-21T08:53:00.7282936Z shl.b32 %r63, %r62, 8; 2026-02-21T08:53:00.7283056Z cvt.s64.s32 %rd68, %r63; 2026-02-21T08:53:00.7283172Z add.s64 %rd69, %rd67, %rd68; 2026-02-21T08:53:00.7283297Z cvta.global.u64 %rd70, %rd69; 2026-02-21T08:53:00.7283409Z add.s32 %r41, %r1, -256; 2026-02-21T08:53:00.7283524Z shr.u32 %r42, %r41, 5; 2026-02-21T08:53:00.7283629Z mov.b32 %r2296, 0; 2026-02-21T08:53:00.7283738Z mov.b32 %r2295, -128; 2026-02-21T08:53:00.7283950Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T08:53:00.7284142Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:00.7284497Z .loc 1 0 67 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:0:67 2026-02-21T08:53:00.7284619Z setp.lt.u32 %p6, %r41, 64; 2026-02-21T08:53:00.7284809Z setp.eq.b32 %p4, %r41, 0; 2026-02-21T08:53:00.7285180Z .loc 1 55 31 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:55:31 2026-02-21T08:53:00.7285300Z add.s32 %r64, %r52, 131072; 2026-02-21T08:53:00.7285406Z // begin inline asm 2026-02-21T08:53:00.7285495Z 2026-02-21T08:53:00.7285588Z { 2026-02-21T08:53:00.7285706Z .reg .pred complete; 2026-02-21T08:53:00.7285804Z waitLoop: 2026-02-21T08:53:00.7286052Z mbarrier.try_wait.parity.shared.b64 complete, [%r64], %r2296; 2026-02-21T08:53:00.7286297Z @!complete bra.uni waitLoop; 2026-02-21T08:53:00.7286384Z } 2026-02-21T08:53:00.7286392Z 2026-02-21T08:53:00.7286493Z // end inline asm 2026-02-21T08:53:00.7286592Z bar.sync 3, 64; 2026-02-21T08:53:00.7286706Z add.s32 %r66, %r52, 131088; 2026-02-21T08:53:00.7286811Z // begin 
inline asm 2026-02-21T08:53:00.7287039Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r66], 65536; 2026-02-21T08:53:00.7287149Z // end inline asm 2026-02-21T08:53:00.7287320Z bar.sync 3, 64; 2026-02-21T08:53:00.7287476Z shfl.sync.idx.b32 %r72, %r42, 0, 31, -1; 2026-02-21T08:53:00.7287594Z elect.sync %r73|%p7, -1; 2026-02-21T08:53:00.7287719Z and.pred %p5, %p6, %p7; 2026-02-21T08:53:00.7287823Z and.b32 %r74, %r72, 1; 2026-02-21T08:53:00.7287932Z shl.b32 %r75, %r74, 15; 2026-02-21T08:53:00.7288051Z add.s32 %r67, %r52, %r75; 2026-02-21T08:53:00.7288157Z shl.b32 %r76, %r74, 6; 2026-02-21T08:53:00.7288555Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7288672Z add.s32 %r2295, %r2295, 128; 2026-02-21T08:53:00.7289034Z .loc 1 55 31 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:55:31 2026-02-21T08:53:00.7289147Z add.s32 %r68, %r2295, %r76; 2026-02-21T08:53:00.7289255Z // begin inline asm 2026-02-21T08:53:00.7289928Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r67], [%rd70, {%r68, %r69}], [%r66]; 2026-02-21T08:53:00.7290047Z // end inline asm 2026-02-21T08:53:00.7290161Z xor.b32 %r2296, %r2296, 1; 2026-02-21T08:53:00.7290546Z .loc 1 50 57 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:50:57 2026-02-21T08:53:00.7290672Z setp.lt.u32 %p8, %r2295, 1920; 2026-02-21T08:53:00.7290784Z @%p8 bra $L__BB0_11; 2026-02-21T08:53:00.7290993Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7291105Z barrier.sync 1; 2026-02-21T08:53:00.7291215Z bra.uni $L__BB0_2; 2026-02-21T08:53:00.7291432Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7291548Z barrier.sync 1; 2026-02-21T08:53:00.7291656Z bra.uni $L__BB0_2; 2026-02-21T08:53:00.7291858Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:00.7292232Z .loc 1 19 0 // cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py:19 2026-02-21T08:53:00.7292340Z barrier.sync 1; 
2026-02-21T08:53:00.7292451Z barrier.sync 1; 2026-02-21T08:53:00.7292667Z bra.uni $L__BB0_2; 2026-02-21T08:53:00.7292781Z $L__tmp1: 2026-02-21T08:53:00.7292887Z $L__func_end0: 2026-02-21T08:53:00.7293047Z // -- End function 2026-02-21T08:53:00.7293157Z } 2026-02-21T08:53:00.7293641Z .file 1 "/tmp/torchinductor_root/g7/cg74246b4afmed4hhchren3355r4y24efiut4bnl6m4us2z23p76.py" 2026-02-21T08:53:00.7293764Z .section .debug_abbrev 2026-02-21T08:53:00.7293874Z { 2026-02-21T08:53:00.7294055Z .b8 1 // Abbreviation Code 2026-02-21T08:53:00.7294239Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:53:00.7294400Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:53:00.7294564Z .b8 37 // DW_AT_producer 2026-02-21T08:53:00.7294797Z .b8 8 // DW_FORM_string 2026-02-21T08:53:00.7294959Z .b8 19 // DW_AT_language 2026-02-21T08:53:00.7295120Z .b8 5 // DW_FORM_data2 2026-02-21T08:53:00.7295266Z .b8 3 // DW_AT_name 2026-02-21T08:53:00.7295413Z .b8 8 // DW_FORM_string 2026-02-21T08:53:00.7295571Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:53:00.7295713Z .b8 6 // DW_FORM_data4 2026-02-21T08:53:00.7295857Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:53:00.7296109Z .b8 8 // DW_FORM_string 2026-02-21T08:53:00.7296257Z .b8 0 // EOM(1) 2026-02-21T08:53:00.7296389Z .b8 0 // EOM(2) 2026-02-21T08:53:00.7296520Z .b8 0 // EOM(3) 2026-02-21T08:53:00.7296624Z } 2026-02-21T08:53:00.7296742Z .section .debug_info 2026-02-21T08:53:00.7296836Z { 2026-02-21T08:53:00.7297089Z .b32 104 // Length of Unit 2026-02-21T08:53:00.7297276Z .b8 2 // DWARF version number 2026-02-21T08:53:00.7297373Z .b8 0 2026-02-21T08:53:00.7297622Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:53:00.7297808Z .b8 8 // Address Size (in bytes) 2026-02-21T08:53:00.7298015Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:53:00.7298173Z .b8 116 // DW_AT_producer 2026-02-21T08:53:00.7298290Z .b8 114 2026-02-21T08:53:00.7298387Z .b8 105 2026-02-21T08:53:00.7298483Z .b8 116 2026-02-21T08:53:00.7298577Z .b8 111 2026-02-21T08:53:00.7298680Z .b8 110 2026-02-21T08:53:00.7298773Z .b8 0 2026-02-21T08:53:00.7298920Z .b8 2 // DW_AT_language 2026-02-21T08:53:00.7299024Z .b8 0 2026-02-21T08:53:00.7299250Z .b8 99 // DW_AT_name 2026-02-21T08:53:00.7299355Z .b8 103 2026-02-21T08:53:00.7299456Z .b8 55 2026-02-21T08:53:00.7299564Z .b8 52 2026-02-21T08:53:00.7299659Z .b8 50 2026-02-21T08:53:00.7299754Z .b8 52 2026-02-21T08:53:00.7299859Z .b8 54 2026-02-21T08:53:00.7299952Z .b8 98 2026-02-21T08:53:00.7300046Z .b8 52 2026-02-21T08:53:00.7300138Z .b8 97 2026-02-21T08:53:00.7300239Z .b8 102 2026-02-21T08:53:00.7300335Z .b8 109 2026-02-21T08:53:00.7300427Z .b8 101 2026-02-21T08:53:00.7300530Z .b8 100 2026-02-21T08:53:00.7300624Z .b8 52 2026-02-21T08:53:00.7300718Z .b8 104 2026-02-21T08:53:00.7300810Z .b8 104 2026-02-21T08:53:00.7300914Z .b8 99 2026-02-21T08:53:00.7301006Z .b8 104 2026-02-21T08:53:00.7301097Z .b8 114 2026-02-21T08:53:00.7301188Z .b8 101 2026-02-21T08:53:00.7301287Z .b8 110 2026-02-21T08:53:00.7301376Z .b8 51 2026-02-21T08:53:00.7301462Z .b8 51 2026-02-21T08:53:00.7301558Z .b8 53 2026-02-21T08:53:00.7301645Z .b8 53 2026-02-21T08:53:00.7301739Z .b8 114 2026-02-21T08:53:00.7301831Z .b8 52 2026-02-21T08:53:00.7301931Z .b8 121 2026-02-21T08:53:00.7302021Z .b8 50 2026-02-21T08:53:00.7302115Z .b8 52 2026-02-21T08:53:00.7302331Z .b8 101 2026-02-21T08:53:00.7302427Z .b8 102 2026-02-21T08:53:00.7302523Z .b8 105 2026-02-21T08:53:00.7302621Z .b8 117 2026-02-21T08:53:00.7302726Z .b8 116 2026-02-21T08:53:00.7302821Z .b8 52 2026-02-21T08:53:00.7302916Z .b8 98 2026-02-21T08:53:00.7303018Z .b8 110 2026-02-21T08:53:00.7303116Z .b8 108 
2026-02-21T08:53:00.7303210Z .b8 54 2026-02-21T08:53:00.7303306Z .b8 109 2026-02-21T08:53:00.7303408Z .b8 52 2026-02-21T08:53:00.7303505Z .b8 117 2026-02-21T08:53:00.7303598Z .b8 115 2026-02-21T08:53:00.7303695Z .b8 50 2026-02-21T08:53:00.7303797Z .b8 122 2026-02-21T08:53:00.7303892Z .b8 50 2026-02-21T08:53:00.7303983Z .b8 51 2026-02-21T08:53:00.7304085Z .b8 112 2026-02-21T08:53:00.7304179Z .b8 55 2026-02-21T08:53:00.7304273Z .b8 54 2026-02-21T08:53:00.7304365Z .b8 46 2026-02-21T08:53:00.7304469Z .b8 112 2026-02-21T08:53:00.7304563Z .b8 121 2026-02-21T08:53:00.7304661Z .b8 0 2026-02-21T08:53:00.7304948Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:53:00.7305110Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:53:00.7305208Z .b8 116 2026-02-21T08:53:00.7305302Z .b8 109 2026-02-21T08:53:00.7305408Z .b8 112 2026-02-21T08:53:00.7305504Z .b8 47 2026-02-21T08:53:00.7305601Z .b8 116 2026-02-21T08:53:00.7305707Z .b8 111 2026-02-21T08:53:00.7305804Z .b8 114 2026-02-21T08:53:00.7305895Z .b8 99 2026-02-21T08:53:00.7305987Z .b8 104 2026-02-21T08:53:00.7306087Z .b8 105 2026-02-21T08:53:00.7306179Z .b8 110 2026-02-21T08:53:00.7306268Z .b8 100 2026-02-21T08:53:00.7306470Z .b8 117 2026-02-21T08:53:00.7306573Z .b8 99 2026-02-21T08:53:00.7306664Z .b8 116 2026-02-21T08:53:00.7306754Z .b8 111 2026-02-21T08:53:00.7306849Z .b8 114 2026-02-21T08:53:00.7306938Z .b8 95 2026-02-21T08:53:00.7307026Z .b8 114 2026-02-21T08:53:00.7307115Z .b8 111 2026-02-21T08:53:00.7307211Z .b8 111 2026-02-21T08:53:00.7307296Z .b8 116 2026-02-21T08:53:00.7307385Z .b8 47 2026-02-21T08:53:00.7307480Z .b8 103 2026-02-21T08:53:00.7307568Z .b8 55 2026-02-21T08:53:00.7307731Z .b8 0 2026-02-21T08:53:00.7307832Z } 2026-02-21T08:53:00.7307970Z .section .debug_macinfo { } 2026-02-21T08:53:00.7307977Z 2026-02-21T08:53:00.7308135Z ================================================================ 2026-02-21T08:53:00.7308346Z please share the reproducer above with Triton project. 
2026-02-21T08:53:02.1310697Z 2026-02-21T08:53:02.1311946Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 56/56 14.5 configs/s 2026-02-21T08:53:03.7201177Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 622.4 2026-02-21T08:53:03.7202047Z configs/s 2026-02-21T08:53:03.8298010Z [289s] Generation 11 complete: 2026-02-21T08:53:03.8298568Z error=17 2026-02-21T08:53:03.8298886Z ok=41 2026-02-21T08:53:03.8299188Z min=0.1095 2026-02-21T08:53:03.8299496Z mid=0.3011 2026-02-21T08:53:03.8300304Z max=19.7991 2026-02-21T08:53:03.8300662Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:53:03.8301203Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:53:03.8301735Z 'l2_groupings': [64], 2026-02-21T08:53:03.8302143Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:53:03.8302608Z 'loop_orders': [[0, 1]], 2026-02-21T08:53:03.8302994Z 'num_stages': 3, 2026-02-21T08:53:03.8303323Z 'num_warps': 8, 2026-02-21T08:53:03.8303630Z 'pid_type': 'flat', 2026-02-21T08:53:03.8303990Z 'range_flattens': [None, None], 2026-02-21T08:53:03.8304424Z 'range_multi_buffers': [None, None], 2026-02-21T08:53:03.8305284Z 'range_num_stages': [0, 0], 2026-02-21T08:53:03.8305675Z 'range_unroll_factors': [0, 0], 2026-02-21T08:53:03.8306073Z 'range_warp_specializes': [None, True]} 2026-02-21T08:53:03.8355658Z [289s] Fitting surrogate: 897 points, 897 targets 2026-02-21T08:53:05.1192182Z [290s] Generation 12 starting: 53 neighbors, 3 active search path(s) 2026-02-21T08:53:17.8952597Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55/55 2.9 configs/s 2026-02-21T08:53:20.5107918Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 55/55 21.5 configs/s 2026-02-21T08:53:22.2653378Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 561.0 2026-02-21T08:53:22.2653959Z configs/s 2026-02-21T08:53:22.3599652Z [307s] Generation 12 complete: 2026-02-21T08:53:22.3600210Z error=14 2026-02-21T08:53:22.3600570Z ok=43 
2026-02-21T08:53:22.3600910Z min=0.1095 2026-02-21T08:53:22.3601252Z mid=0.2131 2026-02-21T08:53:22.3601599Z max=16.6349 2026-02-21T08:53:22.3602005Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:53:22.3602671Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:53:22.3603238Z 'l2_groupings': [64], 2026-02-21T08:53:22.3603553Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:53:22.3603910Z 'loop_orders': [[0, 1]], 2026-02-21T08:53:22.3604200Z 'num_stages': 3, 2026-02-21T08:53:22.3604456Z 'num_warps': 8, 2026-02-21T08:53:22.3604885Z 'pid_type': 'flat', 2026-02-21T08:53:22.3605926Z 'range_flattens': [None, None], 2026-02-21T08:53:22.3606389Z 'range_multi_buffers': [None, None], 2026-02-21T08:53:22.3606726Z 'range_num_stages': [0, 0], 2026-02-21T08:53:22.3607018Z 'range_unroll_factors': [0, 0], 2026-02-21T08:53:22.3607321Z 'range_warp_specializes': [None, True]} 2026-02-21T08:53:22.3636013Z [307s] Fitting surrogate: 954 points, 954 targets 2026-02-21T08:53:23.3362459Z [308s] Generation 13 starting: 55 neighbors, 3 active search path(s) 2026-02-21T08:53:34.9825920Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57/57 1.4 configs/s 2026-02-21T08:53:37.4968067Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 57/57 22.0 configs/s 2026-02-21T08:53:39.4881952Z Generation 13: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 495.1 2026-02-21T08:53:39.4882352Z configs/s 2026-02-21T08:53:39.6099574Z [325s] Generation 13 complete: 2026-02-21T08:53:39.6099837Z error=20 2026-02-21T08:53:39.6099996Z ok=39 2026-02-21T08:53:39.6100163Z min=0.1076 2026-02-21T08:53:39.6100303Z mid=0.2171 2026-02-21T08:53:39.6100432Z max=19.8528 2026-02-21T08:53:39.6100588Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:53:39.6100817Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:53:39.6101040Z 'l2_groupings': [64], 2026-02-21T08:53:39.6101226Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:53:39.6101426Z 'loop_orders': 
[[0, 1]], 2026-02-21T08:53:39.6101599Z 'num_stages': 3, 2026-02-21T08:53:39.6101747Z 'num_warps': 8, 2026-02-21T08:53:39.6101907Z 'pid_type': 'flat', 2026-02-21T08:53:39.6102067Z 'range_flattens': [None, None], 2026-02-21T08:53:39.6102259Z 'range_multi_buffers': [None, None], 2026-02-21T08:53:39.6102448Z 'range_num_stages': [0, 0], 2026-02-21T08:53:39.6102627Z 'range_unroll_factors': [0, 0], 2026-02-21T08:53:39.6103263Z 'range_warp_specializes': [None, True]} 2026-02-21T08:53:39.6157454Z [325s] Fitting surrogate: 1013 points, 1013 targets 2026-02-21T08:53:40.8769355Z [326s] Generation 14 starting: 56 neighbors, 3 active search path(s) 2026-02-21T08:54:12.9663188Z [358s] Timeout after 30s compiling Config(block_sizes=[512, 256, 32], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=None, num_sm_multiplier=8, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, False]) 2026-02-21T08:54:12.9683045Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 0.3 configs/s 2026-02-21T08:54:13.6472259Z 2026-02-21T08:54:13.6472420Z 2026-02-21T08:54:13.6472939Z ================================================================ 2026-02-21T08:54:13.6473223Z Internal Triton PTX codegen error 2026-02-21T08:54:13.6473482Z `ptxas` stderr: 2026-02-21T08:54:13.6473960Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 140 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:54:13.6474863Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:13.6475060Z 2026-02-21T08:54:13.6475491Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5jmbcnfn.ptx -o /tmp/tmp5jmbcnfn.ptx.o 2026-02-21T08:54:13.6475948Z 2026-02-21T08:54:13.6475951Z 2026-02-21T08:54:13.6476019Z // 2026-02-21T08:54:13.6476175Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:13.6476360Z // 2026-02-21T08:54:13.6476428Z 2026-02-21T08:54:13.6476489Z .version 8.7 2026-02-21T08:54:13.6476656Z .target sm_100a 2026-02-21T08:54:13.6476793Z .address_size 64 2026-02-21T08:54:13.6476886Z 2026-02-21T08:54:13.6477018Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:13.6477282Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:13.6477577Z // @_helion_matmul 2026-02-21T08:54:13.6477797Z .visible .entry _helion_matmul( 2026-02-21T08:54:13.6478014Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:13.6478284Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:13.6478550Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:13.6478800Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:13.6479064Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:13.6479353Z ) 2026-02-21T08:54:13.6479483Z .reqntid 128 2026-02-21T08:54:13.6479617Z .maxnreg 32 2026-02-21T08:54:13.6479760Z { 2026-02-21T08:54:13.6479899Z .reg .pred %p<103>; 2026-02-21T08:54:13.6480056Z .reg .b32 %r<2535>; 2026-02-21T08:54:13.6480196Z .reg .b64 %rd<1132>; 2026-02-21T08:54:13.6480479Z .loc 1 19 0 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:19:0 2026-02-21T08:54:13.6480784Z $L__func_begin0: 2026-02-21T08:54:13.6481028Z .loc 1 19 0 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:19:0 2026-02-21T08:54:13.6481261Z 
2026-02-21T08:54:13.6481312Z // %bb.0: 2026-02-21T08:54:13.6481462Z ld.param.b64 %rd51, [_helion_matmul_param_1]; 2026-02-21T08:54:13.6481684Z ld.param.b64 %rd50, [_helion_matmul_param_0]; 2026-02-21T08:54:13.6481861Z $L__tmp0: 2026-02-21T08:54:13.6482098Z .loc 1 19 0 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:19 2026-02-21T08:54:13.6482382Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:13.6482529Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T08:54:13.6482711Z ld.param.b64 %rd53, [_helion_matmul_param_2]; 2026-02-21T08:54:13.6482897Z mov.b32 %r403, global_smem; 2026-02-21T08:54:13.6483056Z // begin inline asm 2026-02-21T08:54:13.6483378Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r403], 256; 2026-02-21T08:54:13.6483631Z // end inline asm 2026-02-21T08:54:13.6483805Z ld.param.b64 %rd310, [_helion_matmul_param_3]; 2026-02-21T08:54:13.6483992Z bar.sync 0; 2026-02-21T08:54:13.6484137Z ld.shared.b32 %r2408, [global_smem]; 2026-02-21T08:54:13.6484305Z bar.sync 0; 2026-02-21T08:54:13.6484442Z // begin inline asm 2026-02-21T08:54:13.6484641Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:13.6484915Z // end inline asm 2026-02-21T08:54:13.6492642Z [359s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:54:13.6493928Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 128], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=6, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, False]), static_shapes=True) 2026-02-21T08:54:13.6495161Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:13.6495543Z `ptxas` stderr: 2026-02-21T08:54:13.6495959Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 140 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:13.6496447Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:13.6496596Z 2026-02-21T08:54:13.6496991Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5jmbcnfn.ptx -o /tmp/tmp5jmbcnfn.ptx.o 2026-02-21T08:54:13.6497473Z 2026-02-21T08:54:13.6497614Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:54:13.6497999Z .loc 1 21 73 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:21:73 2026-02-21T08:54:13.6498321Z mov.u32 %r2500, %ctaid.x; 2026-02-21T08:54:13.6498494Z mov.u32 %r1166, %ctaid.y; 2026-02-21T08:54:13.6498695Z mov.u32 %r1167, %ctaid.z; 2026-02-21T08:54:13.6498863Z mov.u32 %r1168, %nctaid.x; 2026-02-21T08:54:13.6499025Z mov.u32 %r1169, %nctaid.y; 2026-02-21T08:54:13.6499204Z mad.lo.s32 %r1170, %r1167, %r1169, %r1166; 2026-02-21T08:54:13.6499402Z mad.lo.s32 %r1171, %r1170, %r1168, %r2500; 2026-02-21T08:54:13.6499591Z shl.b32 %r1172, %r1171, 7; 2026-02-21T08:54:13.6499761Z cvt.s64.s32 %rd311, %r1172; 2026-02-21T08:54:13.6499928Z add.s64 %rd67, %rd310, %rd311; 2026-02-21T08:54:13.6500104Z shl.b32 %r1173, %r1, 2; 2026-02-21T08:54:13.6500310Z add.s32 %r404, %r403, %r1173; 2026-02-21T08:54:13.6500478Z mov.b32 %r2534, 0; 2026-02-21T08:54:13.6500621Z // begin inline asm 2026-02-21T08:54:13.6500786Z @%p3 st.shared.b32 [ %r404 + 0 ], %r2534; 2026-02-21T08:54:13.6500968Z // end inline asm 2026-02-21T08:54:13.6501123Z bar.warp.sync -1; 2026-02-21T08:54:13.6501274Z setp.eq.b32 %p99, %r1, 0; 2026-02-21T08:54:13.6501441Z cvt.u64.u32 %rd52, %r403; 2026-02-21T08:54:13.6501602Z // begin inline asm 2026-02-21T08:54:13.6501875Z @%p99 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd52 + 0 ], %rd53; 2026-02-21T08:54:13.6502168Z // end inline asm 2026-02-21T08:54:13.6502304Z // begin inline asm 2026-02-21T08:54:13.6502536Z @%p99 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x1; 2026-02-21T08:54:13.6502788Z // end inline asm 2026-02-21T08:54:13.6502926Z mov.b32 %r406, 64; 2026-02-21T08:54:13.6503060Z // begin inline asm 2026-02-21T08:54:13.6503304Z @%p99 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x0, %r406; 2026-02-21T08:54:13.6503582Z // end inline asm 2026-02-21T08:54:13.6503716Z mov.b32 %r407, 128; 2026-02-21T08:54:13.6503863Z // begin inline asm 2026-02-21T08:54:13.6504096Z @%p99 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x1, %r407; 2026-02-21T08:54:13.6504374Z // end inline asm 2026-02-21T08:54:13.6504548Z mov.b32 %r408, 12288; 2026-02-21T08:54:13.6504738Z // begin inline asm 2026-02-21T08:54:13.6505008Z @%p99 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x0, %r408; 2026-02-21T08:54:13.6505304Z // end inline asm 2026-02-21T08:54:13.6505453Z mov.b32 %r409, 2048; 2026-02-21T08:54:13.6505604Z // begin inline asm 2026-02-21T08:54:13.6505861Z @%p99 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x1, %r409; 2026-02-21T08:54:13.6506151Z // end inline asm 2026-02-21T08:54:13.6506305Z mov.b64 %rd60, 24576; 2026-02-21T08:54:13.6506456Z // begin inline asm 2026-02-21T08:54:13.6506736Z @%p99 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd52 + 0 ], 0x0, %rd60; 2026-02-21T08:54:13.6507055Z // end inline asm 2026-02-21T08:54:13.6507189Z mov.b32 %r410, 1; 2026-02-21T08:54:13.6507332Z // begin inline asm 2026-02-21T08:54:13.6507586Z @%p99 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x0, %r410; 2026-02-21T08:54:13.6507884Z // end inline asm 2026-02-21T08:54:13.6508018Z // begin inline asm 2026-02-21T08:54:13.6508283Z @%p99 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x1, %r410; 2026-02-21T08:54:13.6508600Z // end inline asm 2026-02-21T08:54:13.6508730Z // begin inline asm 2026-02-21T08:54:13.6508962Z @%p99 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x6; 2026-02-21T08:54:13.6509222Z // end inline asm 2026-02-21T08:54:13.6509357Z // begin inline asm 2026-02-21T08:54:13.6509602Z @%p99 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x0; 2026-02-21T08:54:13.6509884Z // end inline asm 2026-02-21T08:54:13.6510013Z // begin inline asm 2026-02-21T08:54:13.6510247Z @%p99 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x3; 
2026-02-21T08:54:13.6510510Z // end inline asm 2026-02-21T08:54:13.6510642Z // begin inline asm 2026-02-21T08:54:13.6510873Z @%p99 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd52 + 0 ], 0x0; 2026-02-21T08:54:13.6511123Z // end inline asm 2026-02-21T08:54:13.6511290Z // begin inline asm 2026-02-21T08:54:13.6511633Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd67 + 0 ], [ %rd52 + 0 ], 0x80; 2026-02-21T08:54:13.6512014Z // end inline asm 2026-02-21T08:54:13.6512152Z // begin inline asm 2026-02-21T08:54:13.6512353Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd67 + 0 ], 0x80; 2026-02-21T08:54:13.6512604Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T08:54:13.6512790Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:13.6512998Z // end inline asm 2026-02-21T08:54:13.6513126Z bar.sync 0; 2026-02-21T08:54:13.6513269Z cvta.global.u64 %rd619, %rd67; 2026-02-21T08:54:13.6513545Z .loc 1 42 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:42:45 2026-02-21T08:54:13.6513834Z shr.u32 %r1174, %r1, 5; 2026-02-21T08:54:13.6513991Z and.b32 %r1175, %r1, 112; 2026-02-21T08:54:13.6514138Z shl.b32 %r1176, %r1, 3; 2026-02-21T08:54:13.6514290Z and.b32 %r4, %r1176, 120; 2026-02-21T08:54:13.6514555Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6514880Z max.u32 %r1177, %r2500, 767; 2026-02-21T08:54:13.6515034Z shl.b32 %r1178, %r1177, 4; 2026-02-21T08:54:13.6515197Z add.s32 %r5, %r1178, -12272; 2026-02-21T08:54:13.6515351Z sub.s32 %r6, 12288, %r1178; 2026-02-21T08:54:13.6515622Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6515932Z shfl.sync.idx.b32 %r7, %r1174, 0, 31, -1; 2026-02-21T08:54:13.6516108Z and.b32 %r8, %r7, 3; 2026-02-21T08:54:13.6516257Z shl.b32 %r1179, %r8, 21; 2026-02-21T08:54:13.6516408Z add.s32 %r412, %r1179, %r2408; 2026-02-21T08:54:13.6516576Z mov.pred %p51, -1; 
2026-02-21T08:54:13.6516714Z // begin inline asm 2026-02-21T08:54:13.6517155Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 0], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6517584Z // end inline asm 2026-02-21T08:54:13.6517717Z // begin inline asm 2026-02-21T08:54:13.6518111Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 16], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6518515Z // end inline asm 2026-02-21T08:54:13.6518653Z // begin inline asm 2026-02-21T08:54:13.6519023Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 32], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6519439Z // end inline asm 2026-02-21T08:54:13.6519574Z // begin inline asm 2026-02-21T08:54:13.6519940Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 48], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6520349Z // end inline asm 2026-02-21T08:54:13.6520537Z // begin inline asm 2026-02-21T08:54:13.6520913Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 64], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6521337Z // end inline asm 2026-02-21T08:54:13.6521468Z // begin inline asm 2026-02-21T08:54:13.6521840Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 80], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6522265Z // end inline asm 2026-02-21T08:54:13.6522403Z // begin inline asm 2026-02-21T08:54:13.6522772Z @%p51 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 96], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6523189Z // end inline asm 2026-02-21T08:54:13.6523358Z // begin inline asm 2026-02-21T08:54:13.6523736Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 112], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6524154Z // end inline asm 2026-02-21T08:54:13.6524283Z // begin inline asm 2026-02-21T08:54:13.6524709Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 128], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6525149Z // end inline asm 2026-02-21T08:54:13.6525280Z // begin inline asm 2026-02-21T08:54:13.6525661Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 144], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6526068Z // end inline asm 2026-02-21T08:54:13.6526204Z // begin inline asm 2026-02-21T08:54:13.6526569Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 160], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6526979Z // end inline asm 2026-02-21T08:54:13.6527113Z // begin inline asm 2026-02-21T08:54:13.6527477Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 176], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6527883Z // end inline asm 2026-02-21T08:54:13.6528014Z // begin inline asm 2026-02-21T08:54:13.6528386Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 192], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, 
%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6528793Z // end inline asm 2026-02-21T08:54:13.6528920Z // begin inline asm 2026-02-21T08:54:13.6529337Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 208], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6529748Z // end inline asm 2026-02-21T08:54:13.6529890Z // begin inline asm 2026-02-21T08:54:13.6530281Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 224], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6530714Z // end inline asm 2026-02-21T08:54:13.6530859Z // begin inline asm 2026-02-21T08:54:13.6531253Z @%p51 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r412 + 240], {%r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534, %r2534}; 2026-02-21T08:54:13.6531692Z // end inline asm 2026-02-21T08:54:13.6531831Z // begin inline asm 2026-02-21T08:54:13.6531996Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:13.6532169Z // end inline asm 2026-02-21T08:54:13.6532318Z bar.sync 0; 2026-02-21T08:54:13.6532611Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6532961Z add.s32 %r2533, %r403, 655360; 2026-02-21T08:54:13.6533132Z // begin inline asm 2026-02-21T08:54:13.6533299Z @%p99 mbarrier.init.shared::cta.b64 [%r2533], 1; 2026-02-21T08:54:13.6533499Z // end inline asm 2026-02-21T08:54:13.6533630Z bar.sync 0; 2026-02-21T08:54:13.6533774Z add.s32 %r685, %r403, 655368; 2026-02-21T08:54:13.6533929Z // begin inline asm 2026-02-21T08:54:13.6534099Z @%p99 mbarrier.init.shared::cta.b64 [%r685], 1; 2026-02-21T08:54:13.6534292Z // end inline asm 2026-02-21T08:54:13.6534438Z setp.lt.s32 %p41, %r6, 1; 2026-02-21T08:54:13.6534605Z setp.gt.s32 %p42, %r6, 0; 
2026-02-21T08:54:13.6534939Z .loc 1 36 35 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:36:35 2026-02-21T08:54:13.6535283Z mul.hi.u32 %r1180, %r2500, 715827883; 2026-02-21T08:54:13.6535462Z shr.u32 %r1181, %r1180, 7; 2026-02-21T08:54:13.6535787Z .loc 1 37 33 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:37:33 2026-02-21T08:54:13.6536102Z shl.b32 %r1182, %r1181, 4; 2026-02-21T08:54:13.6536377Z .loc 1 38 39 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:38:39 2026-02-21T08:54:13.6536675Z sub.s32 %r1183, 16, %r1182; 2026-02-21T08:54:13.6536934Z .loc 1 39 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:45 2026-02-21T08:54:13.6537223Z mul.lo.s32 %r1184, %r1181, 768; 2026-02-21T08:54:13.6537417Z sub.s32 %r1185, %r2500, %r1184; 2026-02-21T08:54:13.6537680Z .loc 1 40 51 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:40:51 2026-02-21T08:54:13.6537963Z div.s32 %r1186, %r1185, %r1183; 2026-02-21T08:54:13.6538226Z .loc 1 39 64 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:64 2026-02-21T08:54:13.6538509Z mul.lo.s32 %r1187, %r1186, %r1183; 2026-02-21T08:54:13.6538677Z sub.s32 %r1188, %r1185, %r1187; 2026-02-21T08:54:13.6538940Z .loc 1 39 30 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:30 2026-02-21T08:54:13.6539218Z add.s32 %r1189, %r1188, %r1182; 2026-02-21T08:54:13.6539478Z .loc 1 42 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:42:45 2026-02-21T08:54:13.6539755Z bfe.u32 %r107, %r1, 4, 3; 2026-02-21T08:54:13.6539911Z or.b32 %r108, %r107, 8; 2026-02-21T08:54:13.6540060Z or.b32 %r110, %r107, 24; 2026-02-21T08:54:13.6540208Z or.b32 %r111, %r107, 32; 2026-02-21T08:54:13.6540358Z or.b32 %r114, %r107, 56; 2026-02-21T08:54:13.6540499Z or.b32 %r115, %r107, 64; 2026-02-21T08:54:13.6540644Z or.b32 %r116, %r107, 72; 2026-02-21T08:54:13.6540886Z .loc 1 44 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:44:45 2026-02-21T08:54:13.6541186Z or.b32 %r17, %r107, 208; 
2026-02-21T08:54:13.6541330Z or.b32 %r18, %r107, 216; 2026-02-21T08:54:13.6541479Z or.b32 %r19, %r107, 224; 2026-02-21T08:54:13.6541626Z or.b32 %r20, %r107, 232; 2026-02-21T08:54:13.6541764Z or.b32 %r21, %r107, 240; 2026-02-21T08:54:13.6541907Z or.b32 %r22, %r107, 248; 2026-02-21T08:54:13.6542151Z .loc 1 41 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:41:27 2026-02-21T08:54:13.6542432Z shl.b32 %r2410, %r1186, 8; 2026-02-21T08:54:13.6542578Z or.b32 %r73, %r107, 112; 2026-02-21T08:54:13.6542727Z or.b32 %r35, %r107, 48; 2026-02-21T08:54:13.6542868Z or.b32 %r34, %r107, 40; 2026-02-21T08:54:13.6543020Z or.b32 %r31, %r107, 16; 2026-02-21T08:54:13.6543157Z or.b32 %r25, %r107, 96; 2026-02-21T08:54:13.6543306Z or.b32 %r24, %r107, 88; 2026-02-21T08:54:13.6543455Z shl.b32 %r2414, %r1189, 7; 2026-02-21T08:54:13.6543710Z .loc 1 42 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:42:32 2026-02-21T08:54:13.6543993Z or.b32 %r2483, %r2414, %r107; 2026-02-21T08:54:13.6544147Z or.b32 %r2484, %r2414, %r108; 2026-02-21T08:54:13.6544307Z or.b32 %r2485, %r2414, %r31; 2026-02-21T08:54:13.6544489Z or.b32 %r2486, %r2414, %r110; 2026-02-21T08:54:13.6544645Z or.b32 %r2487, %r2414, %r111; 2026-02-21T08:54:13.6544817Z or.b32 %r2488, %r2414, %r34; 2026-02-21T08:54:13.6544976Z or.b32 %r2489, %r2414, %r35; 2026-02-21T08:54:13.6545132Z or.b32 %r2490, %r2414, %r114; 2026-02-21T08:54:13.6545278Z or.b32 %r2491, %r2414, %r115; 2026-02-21T08:54:13.6545434Z or.b32 %r2492, %r2414, %r116; 2026-02-21T08:54:13.6545579Z or.b32 %r75, %r107, 128; 2026-02-21T08:54:13.6545729Z or.b32 %r76, %r107, 136; 2026-02-21T08:54:13.6545872Z or.b32 %r77, %r107, 144; 2026-02-21T08:54:13.6546020Z or.b32 %r78, %r107, 152; 2026-02-21T08:54:13.6546162Z or.b32 %r79, %r107, 160; 2026-02-21T08:54:13.6546309Z or.b32 %r80, %r107, 168; 2026-02-21T08:54:13.6546449Z or.b32 %r81, %r107, 176; 2026-02-21T08:54:13.6546597Z or.b32 %r82, %r107, 184; 2026-02-21T08:54:13.6546741Z or.b32 %r83, %r107, 192; 
2026-02-21T08:54:13.6546909Z or.b32 %r84, %r107, 200; 2026-02-21T08:54:13.6547058Z or.b32 %r101, %r107, 80; 2026-02-21T08:54:13.6547200Z or.b32 %r104, %r107, 104; 2026-02-21T08:54:13.6547350Z or.b32 %r106, %r107, 120; 2026-02-21T08:54:13.6547494Z or.b32 %r2506, %r2414, %r106; 2026-02-21T08:54:13.6547650Z or.b32 %r2505, %r2414, %r73; 2026-02-21T08:54:13.6547797Z or.b32 %r2504, %r2414, %r104; 2026-02-21T08:54:13.6547951Z or.b32 %r2503, %r25, %r2414; 2026-02-21T08:54:13.6548104Z or.b32 %r2502, %r24, %r2414; 2026-02-21T08:54:13.6548248Z or.b32 %r2501, %r2414, %r101; 2026-02-21T08:54:13.6548427Z or.b32 %r2532, %r2410, %r84; 2026-02-21T08:54:13.6548571Z or.b32 %r2531, %r2410, %r83; 2026-02-21T08:54:13.6548723Z or.b32 %r2530, %r2410, %r82; 2026-02-21T08:54:13.6548865Z or.b32 %r2529, %r2410, %r81; 2026-02-21T08:54:13.6549015Z or.b32 %r2528, %r2410, %r80; 2026-02-21T08:54:13.6549159Z or.b32 %r2527, %r2410, %r79; 2026-02-21T08:54:13.6549313Z or.b32 %r2526, %r2410, %r78; 2026-02-21T08:54:13.6549458Z or.b32 %r2525, %r2410, %r77; 2026-02-21T08:54:13.6549611Z or.b32 %r2524, %r2410, %r76; 2026-02-21T08:54:13.6549764Z or.b32 %r2523, %r2410, %r75; 2026-02-21T08:54:13.6549910Z or.b32 %r2522, %r2410, %r106; 2026-02-21T08:54:13.6550065Z or.b32 %r2521, %r73, %r2410; 2026-02-21T08:54:13.6550212Z or.b32 %r2520, %r2410, %r104; 2026-02-21T08:54:13.6550367Z or.b32 %r2519, %r25, %r2410; 2026-02-21T08:54:13.6550512Z or.b32 %r2518, %r24, %r2410; 2026-02-21T08:54:13.6550666Z or.b32 %r2517, %r2410, %r101; 2026-02-21T08:54:13.6550815Z or.b32 %r2516, %r2410, %r116; 2026-02-21T08:54:13.6550972Z or.b32 %r2515, %r2410, %r115; 2026-02-21T08:54:13.6551140Z or.b32 %r2514, %r2410, %r114; 2026-02-21T08:54:13.6551287Z or.b32 %r2513, %r35, %r2410; 2026-02-21T08:54:13.6551439Z or.b32 %r2512, %r34, %r2410; 2026-02-21T08:54:13.6551582Z or.b32 %r2511, %r2410, %r111; 2026-02-21T08:54:13.6551736Z or.b32 %r2510, %r2410, %r110; 2026-02-21T08:54:13.6551908Z or.b32 %r2509, %r31, %r2410; 2026-02-21T08:54:13.6552065Z 
or.b32 %r2508, %r2410, %r108; 2026-02-21T08:54:13.6552215Z or.b32 %r2507, %r2410, %r107; 2026-02-21T08:54:13.6552479Z .loc 1 44 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:44:32 2026-02-21T08:54:13.6552754Z or.b32 %r2494, %r2410, %r17; 2026-02-21T08:54:13.6552915Z or.b32 %r2495, %r2410, %r18; 2026-02-21T08:54:13.6553078Z or.b32 %r2496, %r2410, %r19; 2026-02-21T08:54:13.6553229Z or.b32 %r2497, %r2410, %r20; 2026-02-21T08:54:13.6553391Z or.b32 %r2498, %r2410, %r21; 2026-02-21T08:54:13.6553544Z or.b32 %r2499, %r2410, %r22; 2026-02-21T08:54:13.6553816Z .loc 1 54 53 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:53 2026-02-21T08:54:13.6554115Z shl.b32 %r1190, %r2483, 11; 2026-02-21T08:54:13.6554285Z shl.b32 %r1191, %r2484, 11; 2026-02-21T08:54:13.6554440Z shl.b32 %r1192, %r2485, 11; 2026-02-21T08:54:13.6554601Z shl.b32 %r1193, %r2486, 11; 2026-02-21T08:54:13.6554790Z shl.b32 %r1194, %r2487, 11; 2026-02-21T08:54:13.6554946Z shl.b32 %r1195, %r2488, 11; 2026-02-21T08:54:13.6555141Z shl.b32 %r1196, %r2489, 11; 2026-02-21T08:54:13.6555294Z shl.b32 %r1197, %r2490, 11; 2026-02-21T08:54:13.6555456Z shl.b32 %r1198, %r2491, 11; 2026-02-21T08:54:13.6555609Z shl.b32 %r1199, %r2492, 11; 2026-02-21T08:54:13.6555774Z shl.b32 %r1200, %r2501, 11; 2026-02-21T08:54:13.6555931Z shl.b32 %r1201, %r2502, 11; 2026-02-21T08:54:13.6556094Z shl.b32 %r1202, %r2503, 11; 2026-02-21T08:54:13.6556257Z shl.b32 %r1203, %r2504, 11; 2026-02-21T08:54:13.6556409Z shl.b32 %r1204, %r2505, 11; 2026-02-21T08:54:13.6556571Z shl.b32 %r1205, %r2506, 11; 2026-02-21T08:54:13.6556840Z .loc 1 54 60 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:60 2026-02-21T08:54:13.6557132Z or.b32 %r1206, %r1190, %r4; 2026-02-21T08:54:13.6557286Z or.b32 %r1207, %r1191, %r4; 2026-02-21T08:54:13.6557446Z or.b32 %r1208, %r1192, %r4; 2026-02-21T08:54:13.6557599Z or.b32 %r1209, %r1193, %r4; 2026-02-21T08:54:13.6557790Z or.b32 %r1210, %r1194, %r4; 2026-02-21T08:54:13.6557956Z or.b32 %r1211, 
%r1195, %r4; 2026-02-21T08:54:13.6558112Z or.b32 %r1212, %r1196, %r4; 2026-02-21T08:54:13.6558272Z or.b32 %r1213, %r1197, %r4; 2026-02-21T08:54:13.6558427Z or.b32 %r1214, %r1198, %r4; 2026-02-21T08:54:13.6558594Z or.b32 %r1215, %r1199, %r4; 2026-02-21T08:54:13.6558749Z or.b32 %r1216, %r1200, %r4; 2026-02-21T08:54:13.6558913Z or.b32 %r1217, %r1201, %r4; 2026-02-21T08:54:13.6559069Z or.b32 %r1218, %r1202, %r4; 2026-02-21T08:54:13.6559234Z or.b32 %r1219, %r1203, %r4; 2026-02-21T08:54:13.6559430Z or.b32 %r1220, %r1204, %r4; 2026-02-21T08:54:13.6559593Z or.b32 %r1221, %r1205, %r4; 2026-02-21T08:54:13.6559865Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6560162Z mad.wide.s32 %rd70, %r1206, 2, %rd50; 2026-02-21T08:54:13.6560363Z mad.wide.s32 %rd71, %r1207, 2, %rd50; 2026-02-21T08:54:13.6560535Z mad.wide.s32 %rd72, %r1208, 2, %rd50; 2026-02-21T08:54:13.6560714Z mad.wide.s32 %rd73, %r1209, 2, %rd50; 2026-02-21T08:54:13.6560879Z mad.wide.s32 %rd74, %r1210, 2, %rd50; 2026-02-21T08:54:13.6561050Z mad.wide.s32 %rd75, %r1211, 2, %rd50; 2026-02-21T08:54:13.6561213Z mad.wide.s32 %rd76, %r1212, 2, %rd50; 2026-02-21T08:54:13.6561391Z mad.wide.s32 %rd77, %r1213, 2, %rd50; 2026-02-21T08:54:13.6561560Z mad.wide.s32 %rd78, %r1214, 2, %rd50; 2026-02-21T08:54:13.6561723Z mad.wide.s32 %rd79, %r1215, 2, %rd50; 2026-02-21T08:54:13.6561893Z mad.wide.s32 %rd80, %r1216, 2, %rd50; 2026-02-21T08:54:13.6562059Z mad.wide.s32 %rd81, %r1217, 2, %rd50; 2026-02-21T08:54:13.6562228Z mad.wide.s32 %rd82, %r1218, 2, %rd50; 2026-02-21T08:54:13.6562390Z mad.wide.s32 %rd83, %r1219, 2, %rd50; 2026-02-21T08:54:13.6562562Z mad.wide.s32 %rd84, %r1220, 2, %rd50; 2026-02-21T08:54:13.6562726Z mad.wide.s32 %rd85, %r1221, 2, %rd50; 2026-02-21T08:54:13.6563050Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6563348Z shl.b32 %r1222, %r1, 4; 2026-02-21T08:54:13.6563495Z and.b32 %r172, %r1222, 112; 
2026-02-21T08:54:13.6563655Z shl.b32 %r1223, %r1175, 3; 2026-02-21T08:54:13.6563804Z and.b32 %r1224, %r1, 8; 2026-02-21T08:54:13.6563952Z shl.b32 %r1225, %r1224, 11; 2026-02-21T08:54:13.6564100Z or.b32 %r1226, %r172, %r1223; 2026-02-21T08:54:13.6564264Z xor.b32 %r1227, %r1226, %r1175; 2026-02-21T08:54:13.6564420Z or.b32 %r173, %r1227, %r1225; 2026-02-21T08:54:13.6564583Z add.s32 %r1228, %r403, %r173; 2026-02-21T08:54:13.6564780Z add.s32 %r686, %r1228, 393216; 2026-02-21T08:54:13.6564937Z selp.b32 %r687, 16, 0, %p42; 2026-02-21T08:54:13.6565095Z // begin inline asm 2026-02-21T08:54:13.6565298Z cp.async.cg.shared.global [ %r686 + 0 ], [ %rd70 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6565533Z // end inline asm 2026-02-21T08:54:13.6565669Z add.s32 %r688, %r1228, 394240; 2026-02-21T08:54:13.6565831Z // begin inline asm 2026-02-21T08:54:13.6566026Z cp.async.cg.shared.global [ %r688 + 0 ], [ %rd71 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6566282Z // end inline asm 2026-02-21T08:54:13.6566425Z add.s32 %r690, %r1228, 395264; 2026-02-21T08:54:13.6566575Z // begin inline asm 2026-02-21T08:54:13.6566771Z cp.async.cg.shared.global [ %r690 + 0 ], [ %rd72 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6566992Z // end inline asm 2026-02-21T08:54:13.6567133Z add.s32 %r692, %r1228, 396288; 2026-02-21T08:54:13.6567283Z // begin inline asm 2026-02-21T08:54:13.6567481Z cp.async.cg.shared.global [ %r692 + 0 ], [ %rd73 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6567699Z // end inline asm 2026-02-21T08:54:13.6567839Z add.s32 %r694, %r1228, 397312; 2026-02-21T08:54:13.6567991Z // begin inline asm 2026-02-21T08:54:13.6568189Z cp.async.cg.shared.global [ %r694 + 0 ], [ %rd74 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6568413Z // end inline asm 2026-02-21T08:54:13.6568549Z add.s32 %r696, %r1228, 398336; 2026-02-21T08:54:13.6568717Z // begin inline asm 2026-02-21T08:54:13.6568939Z cp.async.cg.shared.global [ %r696 + 0 ], [ %rd75 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6569171Z // end inline asm 
2026-02-21T08:54:13.6569304Z add.s32 %r698, %r1228, 399360; 2026-02-21T08:54:13.6569465Z // begin inline asm 2026-02-21T08:54:13.6569654Z cp.async.cg.shared.global [ %r698 + 0 ], [ %rd76 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6569874Z // end inline asm 2026-02-21T08:54:13.6570012Z add.s32 %r700, %r1228, 400384; 2026-02-21T08:54:13.6570162Z // begin inline asm 2026-02-21T08:54:13.6570356Z cp.async.cg.shared.global [ %r700 + 0 ], [ %rd77 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6570600Z // end inline asm 2026-02-21T08:54:13.6570742Z add.s32 %r702, %r1228, 401408; 2026-02-21T08:54:13.6570893Z // begin inline asm 2026-02-21T08:54:13.6571087Z cp.async.cg.shared.global [ %r702 + 0 ], [ %rd78 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6571301Z // end inline asm 2026-02-21T08:54:13.6571442Z add.s32 %r704, %r1228, 402432; 2026-02-21T08:54:13.6571599Z // begin inline asm 2026-02-21T08:54:13.6571786Z cp.async.cg.shared.global [ %r704 + 0 ], [ %rd79 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6572010Z // end inline asm 2026-02-21T08:54:13.6572142Z add.s32 %r706, %r1228, 403456; 2026-02-21T08:54:13.6572300Z // begin inline asm 2026-02-21T08:54:13.6572489Z cp.async.cg.shared.global [ %r706 + 0 ], [ %rd80 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6572708Z // end inline asm 2026-02-21T08:54:13.6572840Z add.s32 %r708, %r1228, 404480; 2026-02-21T08:54:13.6572998Z // begin inline asm 2026-02-21T08:54:13.6573184Z cp.async.cg.shared.global [ %r708 + 0 ], [ %rd81 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6573410Z // end inline asm 2026-02-21T08:54:13.6573546Z add.s32 %r710, %r1228, 405504; 2026-02-21T08:54:13.6573697Z // begin inline asm 2026-02-21T08:54:13.6573887Z cp.async.cg.shared.global [ %r710 + 0 ], [ %rd82 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6574106Z // end inline asm 2026-02-21T08:54:13.6574270Z add.s32 %r712, %r1228, 406528; 2026-02-21T08:54:13.6574423Z // begin inline asm 2026-02-21T08:54:13.6574620Z cp.async.cg.shared.global [ %r712 + 0 ], [ %rd83 + 0 ], 0x10, %r687; 
2026-02-21T08:54:13.6574864Z // end inline asm 2026-02-21T08:54:13.6575002Z add.s32 %r714, %r1228, 407552; 2026-02-21T08:54:13.6575159Z // begin inline asm 2026-02-21T08:54:13.6575347Z cp.async.cg.shared.global [ %r714 + 0 ], [ %rd84 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6575570Z // end inline asm 2026-02-21T08:54:13.6575700Z add.s32 %r716, %r1228, 408576; 2026-02-21T08:54:13.6575855Z // begin inline asm 2026-02-21T08:54:13.6576043Z cp.async.cg.shared.global [ %r716 + 0 ], [ %rd85 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6576264Z // end inline asm 2026-02-21T08:54:13.6576402Z cp.async.commit_group; 2026-02-21T08:54:13.6576669Z .loc 1 55 80 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:80 2026-02-21T08:54:13.6576956Z shl.b32 %r1229, %r2507, 11; 2026-02-21T08:54:13.6577111Z shl.b32 %r1230, %r2508, 11; 2026-02-21T08:54:13.6577271Z shl.b32 %r1231, %r2509, 11; 2026-02-21T08:54:13.6577421Z shl.b32 %r1232, %r2510, 11; 2026-02-21T08:54:13.6577611Z shl.b32 %r1233, %r2511, 11; 2026-02-21T08:54:13.6577760Z shl.b32 %r1234, %r2512, 11; 2026-02-21T08:54:13.6577918Z shl.b32 %r1235, %r2513, 11; 2026-02-21T08:54:13.6578066Z shl.b32 %r1236, %r2514, 11; 2026-02-21T08:54:13.6578222Z shl.b32 %r1237, %r2515, 11; 2026-02-21T08:54:13.6578374Z shl.b32 %r1238, %r2516, 11; 2026-02-21T08:54:13.6578521Z shl.b32 %r1239, %r2517, 11; 2026-02-21T08:54:13.6578673Z shl.b32 %r1240, %r2518, 11; 2026-02-21T08:54:13.6578817Z shl.b32 %r1241, %r2519, 11; 2026-02-21T08:54:13.6578969Z shl.b32 %r1242, %r2520, 11; 2026-02-21T08:54:13.6579112Z shl.b32 %r1243, %r2521, 11; 2026-02-21T08:54:13.6579267Z shl.b32 %r1244, %r2522, 11; 2026-02-21T08:54:13.6579414Z shl.b32 %r1245, %r2523, 11; 2026-02-21T08:54:13.6579568Z shl.b32 %r1246, %r2524, 11; 2026-02-21T08:54:13.6579713Z shl.b32 %r1247, %r2525, 11; 2026-02-21T08:54:13.6579864Z shl.b32 %r1248, %r2526, 11; 2026-02-21T08:54:13.6580048Z shl.b32 %r1249, %r2527, 11; 2026-02-21T08:54:13.6580195Z shl.b32 %r1250, %r2528, 11; 2026-02-21T08:54:13.6580349Z 
shl.b32 %r1251, %r2529, 11; 2026-02-21T08:54:13.6580491Z shl.b32 %r1252, %r2530, 11; 2026-02-21T08:54:13.6580644Z shl.b32 %r1253, %r2531, 11; 2026-02-21T08:54:13.6580789Z shl.b32 %r1254, %r2532, 11; 2026-02-21T08:54:13.6580942Z shl.b32 %r1255, %r2494, 11; 2026-02-21T08:54:13.6581087Z shl.b32 %r1256, %r2495, 11; 2026-02-21T08:54:13.6581239Z shl.b32 %r1257, %r2496, 11; 2026-02-21T08:54:13.6581383Z shl.b32 %r1258, %r2497, 11; 2026-02-21T08:54:13.6581565Z shl.b32 %r1259, %r2498, 11; 2026-02-21T08:54:13.6581719Z shl.b32 %r1260, %r2499, 11; 2026-02-21T08:54:13.6581975Z .loc 1 55 59 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:59 2026-02-21T08:54:13.6582259Z or.b32 %r1261, %r1229, %r4; 2026-02-21T08:54:13.6582405Z or.b32 %r1262, %r1230, %r4; 2026-02-21T08:54:13.6582556Z or.b32 %r1263, %r1231, %r4; 2026-02-21T08:54:13.6582699Z or.b32 %r1264, %r1232, %r4; 2026-02-21T08:54:13.6582852Z or.b32 %r1265, %r1233, %r4; 2026-02-21T08:54:13.6582997Z or.b32 %r1266, %r1234, %r4; 2026-02-21T08:54:13.6583148Z or.b32 %r1267, %r1235, %r4; 2026-02-21T08:54:13.6583298Z or.b32 %r1268, %r1236, %r4; 2026-02-21T08:54:13.6583441Z or.b32 %r1269, %r1237, %r4; 2026-02-21T08:54:13.6583602Z or.b32 %r1270, %r1238, %r4; 2026-02-21T08:54:13.6583746Z or.b32 %r1271, %r1239, %r4; 2026-02-21T08:54:13.6583898Z or.b32 %r1272, %r1240, %r4; 2026-02-21T08:54:13.6584042Z or.b32 %r1273, %r1241, %r4; 2026-02-21T08:54:13.6584195Z or.b32 %r1274, %r1242, %r4; 2026-02-21T08:54:13.6584339Z or.b32 %r1275, %r1243, %r4; 2026-02-21T08:54:13.6584490Z or.b32 %r1276, %r1244, %r4; 2026-02-21T08:54:13.6584640Z or.b32 %r1277, %r1245, %r4; 2026-02-21T08:54:13.6584824Z or.b32 %r1278, %r1246, %r4; 2026-02-21T08:54:13.6584976Z or.b32 %r1279, %r1247, %r4; 2026-02-21T08:54:13.6585149Z or.b32 %r1280, %r1248, %r4; 2026-02-21T08:54:13.6585308Z or.b32 %r1281, %r1249, %r4; 2026-02-21T08:54:13.6585453Z or.b32 %r1282, %r1250, %r4; 2026-02-21T08:54:13.6585607Z or.b32 %r1283, %r1251, %r4; 2026-02-21T08:54:13.6585754Z or.b32 
%r1284, %r1252, %r4; 2026-02-21T08:54:13.6585922Z or.b32 %r1285, %r1253, %r4; 2026-02-21T08:54:13.6586068Z or.b32 %r1286, %r1254, %r4; 2026-02-21T08:54:13.6586220Z or.b32 %r1287, %r1255, %r4; 2026-02-21T08:54:13.6586372Z or.b32 %r1288, %r1256, %r4; 2026-02-21T08:54:13.6586516Z or.b32 %r1289, %r1257, %r4; 2026-02-21T08:54:13.6586668Z or.b32 %r1290, %r1258, %r4; 2026-02-21T08:54:13.6586812Z or.b32 %r1291, %r1259, %r4; 2026-02-21T08:54:13.6586963Z or.b32 %r1292, %r1260, %r4; 2026-02-21T08:54:13.6587215Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6587513Z mad.wide.s32 %rd86, %r1261, 2, %rd51; 2026-02-21T08:54:13.6587687Z mad.wide.s32 %rd87, %r1262, 2, %rd51; 2026-02-21T08:54:13.6587865Z mad.wide.s32 %rd88, %r1263, 2, %rd51; 2026-02-21T08:54:13.6588039Z mad.wide.s32 %rd89, %r1264, 2, %rd51; 2026-02-21T08:54:13.6588206Z mad.wide.s32 %rd90, %r1265, 2, %rd51; 2026-02-21T08:54:13.6588415Z mad.wide.s32 %rd91, %r1266, 2, %rd51; 2026-02-21T08:54:13.6588581Z mad.wide.s32 %rd92, %r1267, 2, %rd51; 2026-02-21T08:54:13.6588753Z mad.wide.s32 %rd93, %r1268, 2, %rd51; 2026-02-21T08:54:13.6588917Z mad.wide.s32 %rd94, %r1269, 2, %rd51; 2026-02-21T08:54:13.6589088Z mad.wide.s32 %rd95, %r1270, 2, %rd51; 2026-02-21T08:54:13.6589254Z mad.wide.s32 %rd96, %r1271, 2, %rd51; 2026-02-21T08:54:13.6589427Z mad.wide.s32 %rd97, %r1272, 2, %rd51; 2026-02-21T08:54:13.6589600Z mad.wide.s32 %rd98, %r1273, 2, %rd51; 2026-02-21T08:54:13.6589765Z mad.wide.s32 %rd99, %r1274, 2, %rd51; 2026-02-21T08:54:13.6589945Z mad.wide.s32 %rd100, %r1275, 2, %rd51; 2026-02-21T08:54:13.6590123Z mad.wide.s32 %rd101, %r1276, 2, %rd51; 2026-02-21T08:54:13.6590303Z mad.wide.s32 %rd102, %r1277, 2, %rd51; 2026-02-21T08:54:13.6590474Z mad.wide.s32 %rd103, %r1278, 2, %rd51; 2026-02-21T08:54:13.6590680Z mad.wide.s32 %rd104, %r1279, 2, %rd51; 2026-02-21T08:54:13.6590851Z mad.wide.s32 %rd105, %r1280, 2, %rd51; 2026-02-21T08:54:13.6591023Z mad.wide.s32 %rd106, %r1281, 2, %rd51; 
2026-02-21T08:54:13.6591198Z mad.wide.s32 %rd107, %r1282, 2, %rd51; 2026-02-21T08:54:13.6591363Z mad.wide.s32 %rd108, %r1283, 2, %rd51; 2026-02-21T08:54:13.6591537Z mad.wide.s32 %rd109, %r1284, 2, %rd51; 2026-02-21T08:54:13.6591706Z mad.wide.s32 %rd110, %r1285, 2, %rd51; 2026-02-21T08:54:13.6591879Z mad.wide.s32 %rd111, %r1286, 2, %rd51; 2026-02-21T08:54:13.6592050Z mad.wide.s32 %rd112, %r1287, 2, %rd51; 2026-02-21T08:54:13.6592256Z mad.wide.s32 %rd113, %r1288, 2, %rd51; 2026-02-21T08:54:13.6592426Z mad.wide.s32 %rd114, %r1289, 2, %rd51; 2026-02-21T08:54:13.6592603Z mad.wide.s32 %rd115, %r1290, 2, %rd51; 2026-02-21T08:54:13.6592779Z mad.wide.s32 %rd116, %r1291, 2, %rd51; 2026-02-21T08:54:13.6592949Z mad.wide.s32 %rd117, %r1292, 2, %rd51; 2026-02-21T08:54:13.6593229Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6593510Z shl.b32 %r1293, %r1224, 12; 2026-02-21T08:54:13.6593673Z or.b32 %r174, %r1227, %r1293; 2026-02-21T08:54:13.6593830Z add.s32 %r718, %r403, %r174; 2026-02-21T08:54:13.6593990Z // begin inline asm 2026-02-21T08:54:13.6594181Z cp.async.cg.shared.global [ %r718 + 0 ], [ %rd86 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6594406Z // end inline asm 2026-02-21T08:54:13.6594550Z add.s32 %r720, %r718, 1024; 2026-02-21T08:54:13.6594732Z // begin inline asm 2026-02-21T08:54:13.6594936Z cp.async.cg.shared.global [ %r720 + 0 ], [ %rd87 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6595149Z // end inline asm 2026-02-21T08:54:13.6595286Z add.s32 %r722, %r718, 2048; 2026-02-21T08:54:13.6595432Z // begin inline asm 2026-02-21T08:54:13.6595627Z cp.async.cg.shared.global [ %r722 + 0 ], [ %rd88 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6595880Z // end inline asm 2026-02-21T08:54:13.6596031Z add.s32 %r724, %r718, 3072; 2026-02-21T08:54:13.6596201Z // begin inline asm 2026-02-21T08:54:13.6596402Z cp.async.cg.shared.global [ %r724 + 0 ], [ %rd89 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6596635Z // end inline asm 
2026-02-21T08:54:13.6596771Z add.s32 %r726, %r718, 4096; 2026-02-21T08:54:13.6596933Z // begin inline asm 2026-02-21T08:54:13.6597130Z cp.async.cg.shared.global [ %r726 + 0 ], [ %rd90 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6597360Z // end inline asm 2026-02-21T08:54:13.6597496Z add.s32 %r728, %r718, 5120; 2026-02-21T08:54:13.6597657Z // begin inline asm 2026-02-21T08:54:13.6597855Z cp.async.cg.shared.global [ %r728 + 0 ], [ %rd91 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6598084Z // end inline asm 2026-02-21T08:54:13.6598228Z add.s32 %r730, %r718, 6144; 2026-02-21T08:54:13.6598383Z // begin inline asm 2026-02-21T08:54:13.6598583Z cp.async.cg.shared.global [ %r730 + 0 ], [ %rd92 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6598809Z // end inline asm 2026-02-21T08:54:13.6598954Z add.s32 %r732, %r718, 7168; 2026-02-21T08:54:13.6599108Z // begin inline asm 2026-02-21T08:54:13.6599341Z cp.async.cg.shared.global [ %r732 + 0 ], [ %rd93 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6599561Z // end inline asm 2026-02-21T08:54:13.6599703Z add.s32 %r734, %r718, 8192; 2026-02-21T08:54:13.6599862Z // begin inline asm 2026-02-21T08:54:13.6600055Z cp.async.cg.shared.global [ %r734 + 0 ], [ %rd94 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6600282Z // end inline asm 2026-02-21T08:54:13.6600418Z add.s32 %r736, %r718, 9216; 2026-02-21T08:54:13.6600577Z // begin inline asm 2026-02-21T08:54:13.6600768Z cp.async.cg.shared.global [ %r736 + 0 ], [ %rd95 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6600998Z // end inline asm 2026-02-21T08:54:13.6601136Z add.s32 %r738, %r718, 10240; 2026-02-21T08:54:13.6601300Z // begin inline asm 2026-02-21T08:54:13.6601500Z cp.async.cg.shared.global [ %r738 + 0 ], [ %rd96 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6601720Z // end inline asm 2026-02-21T08:54:13.6601868Z add.s32 %r740, %r718, 11264; 2026-02-21T08:54:13.6602052Z // begin inline asm 2026-02-21T08:54:13.6602261Z cp.async.cg.shared.global [ %r740 + 0 ], [ %rd97 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6602484Z // end 
inline asm 2026-02-21T08:54:13.6602631Z add.s32 %r742, %r718, 12288; 2026-02-21T08:54:13.6602783Z // begin inline asm 2026-02-21T08:54:13.6602987Z cp.async.cg.shared.global [ %r742 + 0 ], [ %rd98 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6603211Z // end inline asm 2026-02-21T08:54:13.6603367Z add.s32 %r744, %r718, 13312; 2026-02-21T08:54:13.6603528Z // begin inline asm 2026-02-21T08:54:13.6603768Z cp.async.cg.shared.global [ %r744 + 0 ], [ %rd99 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6603993Z // end inline asm 2026-02-21T08:54:13.6604127Z add.s32 %r746, %r718, 14336; 2026-02-21T08:54:13.6604280Z // begin inline asm 2026-02-21T08:54:13.6604469Z cp.async.cg.shared.global [ %r746 + 0 ], [ %rd100 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6604719Z // end inline asm 2026-02-21T08:54:13.6604859Z add.s32 %r748, %r718, 15360; 2026-02-21T08:54:13.6605024Z // begin inline asm 2026-02-21T08:54:13.6605220Z cp.async.cg.shared.global [ %r748 + 0 ], [ %rd101 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6605431Z // end inline asm 2026-02-21T08:54:13.6605567Z add.s32 %r750, %r718, 16384; 2026-02-21T08:54:13.6605712Z // begin inline asm 2026-02-21T08:54:13.6605906Z cp.async.cg.shared.global [ %r750 + 0 ], [ %rd102 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6606118Z // end inline asm 2026-02-21T08:54:13.6606258Z add.s32 %r752, %r718, 17408; 2026-02-21T08:54:13.6606404Z // begin inline asm 2026-02-21T08:54:13.6606602Z cp.async.cg.shared.global [ %r752 + 0 ], [ %rd103 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6606827Z // end inline asm 2026-02-21T08:54:13.6606959Z add.s32 %r754, %r718, 18432; 2026-02-21T08:54:13.6607115Z // begin inline asm 2026-02-21T08:54:13.6607302Z cp.async.cg.shared.global [ %r754 + 0 ], [ %rd104 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6607558Z // end inline asm 2026-02-21T08:54:13.6607690Z add.s32 %r756, %r718, 19456; 2026-02-21T08:54:13.6607846Z // begin inline asm 2026-02-21T08:54:13.6608035Z cp.async.cg.shared.global [ %r756 + 0 ], [ %rd105 + 0 ], 0x10, %r687; 
2026-02-21T08:54:13.6608256Z // end inline asm 2026-02-21T08:54:13.6608385Z add.s32 %r758, %r718, 20480; 2026-02-21T08:54:13.6608538Z // begin inline asm 2026-02-21T08:54:13.6608731Z cp.async.cg.shared.global [ %r758 + 0 ], [ %rd106 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6608941Z // end inline asm 2026-02-21T08:54:13.6609077Z add.s32 %r760, %r718, 21504; 2026-02-21T08:54:13.6609223Z // begin inline asm 2026-02-21T08:54:13.6609416Z cp.async.cg.shared.global [ %r760 + 0 ], [ %rd107 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6609634Z // end inline asm 2026-02-21T08:54:13.6609770Z add.s32 %r762, %r718, 22528; 2026-02-21T08:54:13.6609915Z // begin inline asm 2026-02-21T08:54:13.6610110Z cp.async.cg.shared.global [ %r762 + 0 ], [ %rd108 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6610328Z // end inline asm 2026-02-21T08:54:13.6610459Z add.s32 %r764, %r718, 23552; 2026-02-21T08:54:13.6610614Z // begin inline asm 2026-02-21T08:54:13.6610832Z cp.async.cg.shared.global [ %r764 + 0 ], [ %rd109 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6611055Z // end inline asm 2026-02-21T08:54:13.6611187Z add.s32 %r766, %r718, 24576; 2026-02-21T08:54:13.6611343Z // begin inline asm 2026-02-21T08:54:13.6611535Z cp.async.cg.shared.global [ %r766 + 0 ], [ %rd110 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6611757Z // end inline asm 2026-02-21T08:54:13.6611895Z add.s32 %r768, %r718, 25600; 2026-02-21T08:54:13.6612043Z // begin inline asm 2026-02-21T08:54:13.6612241Z cp.async.cg.shared.global [ %r768 + 0 ], [ %rd111 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6612460Z // end inline asm 2026-02-21T08:54:13.6612603Z add.s32 %r770, %r718, 26624; 2026-02-21T08:54:13.6612753Z // begin inline asm 2026-02-21T08:54:13.6612955Z cp.async.cg.shared.global [ %r770 + 0 ], [ %rd112 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6613170Z // end inline asm 2026-02-21T08:54:13.6613311Z add.s32 %r772, %r718, 27648; 2026-02-21T08:54:13.6613493Z // begin inline asm 2026-02-21T08:54:13.6613683Z cp.async.cg.shared.global [ %r772 + 0 ], [ 
%rd113 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6613906Z // end inline asm 2026-02-21T08:54:13.6614035Z add.s32 %r774, %r718, 28672; 2026-02-21T08:54:13.6614188Z // begin inline asm 2026-02-21T08:54:13.6614373Z cp.async.cg.shared.global [ %r774 + 0 ], [ %rd114 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6614595Z // end inline asm 2026-02-21T08:54:13.6614753Z add.s32 %r776, %r718, 29696; 2026-02-21T08:54:13.6614907Z // begin inline asm 2026-02-21T08:54:13.6615124Z cp.async.cg.shared.global [ %r776 + 0 ], [ %rd115 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6615351Z // end inline asm 2026-02-21T08:54:13.6615489Z add.s32 %r778, %r718, 30720; 2026-02-21T08:54:13.6615634Z // begin inline asm 2026-02-21T08:54:13.6615830Z cp.async.cg.shared.global [ %r778 + 0 ], [ %rd116 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6616047Z // end inline asm 2026-02-21T08:54:13.6616185Z add.s32 %r780, %r718, 31744; 2026-02-21T08:54:13.6616333Z // begin inline asm 2026-02-21T08:54:13.6616531Z cp.async.cg.shared.global [ %r780 + 0 ], [ %rd117 + 0 ], 0x10, %r687; 2026-02-21T08:54:13.6616748Z // end inline asm 2026-02-21T08:54:13.6616891Z cp.async.commit_group; 2026-02-21T08:54:13.6617166Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6617459Z setp.gt.s32 %p43, %r6, 1; 2026-02-21T08:54:13.6617723Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6618004Z cvt.s64.s32 %rd312, %r1190; 2026-02-21T08:54:13.6618166Z cvt.u64.u32 %rd313, %r4; 2026-02-21T08:54:13.6618315Z or.b64 %rd314, %rd312, %rd313; 2026-02-21T08:54:13.6618480Z shl.b64 %rd315, %rd314, 1; 2026-02-21T08:54:13.6618631Z add.s64 %rd2, %rd50, %rd315; 2026-02-21T08:54:13.6618817Z add.s64 %rd118, %rd2, 256; 2026-02-21T08:54:13.6618976Z cvt.s64.s32 %rd316, %r1191; 2026-02-21T08:54:13.6619126Z or.b64 %rd317, %rd316, %rd313; 2026-02-21T08:54:13.6619287Z shl.b64 %rd318, %rd317, 1; 2026-02-21T08:54:13.6619436Z add.s64 %rd3, %rd50, %rd318; 
2026-02-21T08:54:13.6619590Z add.s64 %rd119, %rd3, 256; 2026-02-21T08:54:13.6619736Z cvt.s64.s32 %rd319, %r1192; 2026-02-21T08:54:13.6619891Z or.b64 %rd320, %rd319, %rd313; 2026-02-21T08:54:13.6620043Z shl.b64 %rd321, %rd320, 1; 2026-02-21T08:54:13.6620197Z add.s64 %rd4, %rd50, %rd321; 2026-02-21T08:54:13.6620353Z add.s64 %rd120, %rd4, 256; 2026-02-21T08:54:13.6620498Z cvt.s64.s32 %rd322, %r1193; 2026-02-21T08:54:13.6620654Z or.b64 %rd323, %rd322, %rd313; 2026-02-21T08:54:13.6620804Z shl.b64 %rd324, %rd323, 1; 2026-02-21T08:54:13.6620957Z add.s64 %rd5, %rd50, %rd324; 2026-02-21T08:54:13.6621103Z add.s64 %rd121, %rd5, 256; 2026-02-21T08:54:13.6621259Z cvt.s64.s32 %rd325, %r1194; 2026-02-21T08:54:13.6621410Z or.b64 %rd326, %rd325, %rd313; 2026-02-21T08:54:13.6621573Z shl.b64 %rd327, %rd326, 1; 2026-02-21T08:54:13.6621730Z add.s64 %rd6, %rd50, %rd327; 2026-02-21T08:54:13.6621920Z add.s64 %rd122, %rd6, 256; 2026-02-21T08:54:13.6622073Z cvt.s64.s32 %rd328, %r1195; 2026-02-21T08:54:13.6622220Z or.b64 %rd329, %rd328, %rd313; 2026-02-21T08:54:13.6622379Z shl.b64 %rd330, %rd329, 1; 2026-02-21T08:54:13.6622525Z add.s64 %rd7, %rd50, %rd330; 2026-02-21T08:54:13.6622680Z add.s64 %rd123, %rd7, 256; 2026-02-21T08:54:13.6622826Z cvt.s64.s32 %rd331, %r1196; 2026-02-21T08:54:13.6622981Z or.b64 %rd332, %rd331, %rd313; 2026-02-21T08:54:13.6623132Z shl.b64 %rd333, %rd332, 1; 2026-02-21T08:54:13.6623287Z add.s64 %rd8, %rd50, %rd333; 2026-02-21T08:54:13.6623443Z add.s64 %rd124, %rd8, 256; 2026-02-21T08:54:13.6623596Z cvt.s64.s32 %rd334, %r1197; 2026-02-21T08:54:13.6623750Z or.b64 %rd335, %rd334, %rd313; 2026-02-21T08:54:13.6623904Z shl.b64 %rd336, %rd335, 1; 2026-02-21T08:54:13.6624058Z add.s64 %rd9, %rd50, %rd336; 2026-02-21T08:54:13.6624206Z add.s64 %rd125, %rd9, 256; 2026-02-21T08:54:13.6624361Z cvt.s64.s32 %rd337, %r1198; 2026-02-21T08:54:13.6624536Z or.b64 %rd338, %rd337, %rd313; 2026-02-21T08:54:13.6624729Z shl.b64 %rd339, %rd338, 1; 2026-02-21T08:54:13.6624882Z add.s64 %rd10, 
%rd50, %rd339; 2026-02-21T08:54:13.6625045Z add.s64 %rd126, %rd10, 256; 2026-02-21T08:54:13.6625201Z cvt.s64.s32 %rd340, %r1199; 2026-02-21T08:54:13.6625348Z or.b64 %rd341, %rd340, %rd313; 2026-02-21T08:54:13.6625508Z shl.b64 %rd342, %rd341, 1; 2026-02-21T08:54:13.6625655Z add.s64 %rd11, %rd50, %rd342; 2026-02-21T08:54:13.6625812Z add.s64 %rd127, %rd11, 256; 2026-02-21T08:54:13.6626006Z cvt.s64.s32 %rd343, %r1200; 2026-02-21T08:54:13.6626157Z or.b64 %rd344, %rd343, %rd313; 2026-02-21T08:54:13.6626309Z shl.b64 %rd345, %rd344, 1; 2026-02-21T08:54:13.6626462Z add.s64 %rd12, %rd50, %rd345; 2026-02-21T08:54:13.6626611Z add.s64 %rd128, %rd12, 256; 2026-02-21T08:54:13.6626764Z cvt.s64.s32 %rd346, %r1201; 2026-02-21T08:54:13.6626920Z or.b64 %rd347, %rd346, %rd313; 2026-02-21T08:54:13.6627070Z shl.b64 %rd348, %rd347, 1; 2026-02-21T08:54:13.6627228Z add.s64 %rd13, %rd50, %rd348; 2026-02-21T08:54:13.6627380Z add.s64 %rd129, %rd13, 256; 2026-02-21T08:54:13.6627536Z cvt.s64.s32 %rd349, %r1202; 2026-02-21T08:54:13.6627685Z or.b64 %rd350, %rd349, %rd313; 2026-02-21T08:54:13.6627843Z shl.b64 %rd351, %rd350, 1; 2026-02-21T08:54:13.6627993Z add.s64 %rd14, %rd50, %rd351; 2026-02-21T08:54:13.6628150Z add.s64 %rd130, %rd14, 256; 2026-02-21T08:54:13.6628295Z cvt.s64.s32 %rd352, %r1203; 2026-02-21T08:54:13.6628451Z or.b64 %rd353, %rd352, %rd313; 2026-02-21T08:54:13.6628612Z shl.b64 %rd354, %rd353, 1; 2026-02-21T08:54:13.6628760Z add.s64 %rd15, %rd50, %rd354; 2026-02-21T08:54:13.6628917Z add.s64 %rd131, %rd15, 256; 2026-02-21T08:54:13.6629065Z cvt.s64.s32 %rd355, %r1204; 2026-02-21T08:54:13.6629224Z or.b64 %rd356, %rd355, %rd313; 2026-02-21T08:54:13.6629380Z shl.b64 %rd357, %rd356, 1; 2026-02-21T08:54:13.6629580Z add.s64 %rd16, %rd50, %rd357; 2026-02-21T08:54:13.6629732Z add.s64 %rd132, %rd16, 256; 2026-02-21T08:54:13.6629887Z cvt.s64.s32 %rd358, %r1205; 2026-02-21T08:54:13.6630044Z or.b64 %rd359, %rd358, %rd313; 2026-02-21T08:54:13.6630196Z shl.b64 %rd360, %rd359, 1; 
2026-02-21T08:54:13.6630351Z add.s64 %rd17, %rd50, %rd360; 2026-02-21T08:54:13.6630501Z add.s64 %rd133, %rd17, 256; 2026-02-21T08:54:13.6630769Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6631056Z bar.sync 0; 2026-02-21T08:54:13.6631194Z add.s32 %r782, %r1228, 425984; 2026-02-21T08:54:13.6631350Z selp.b32 %r783, 16, 0, %p43; 2026-02-21T08:54:13.6631510Z // begin inline asm 2026-02-21T08:54:13.6631708Z cp.async.cg.shared.global [ %r782 + 0 ], [ %rd118 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6631937Z // end inline asm 2026-02-21T08:54:13.6632080Z add.s32 %r784, %r1228, 427008; 2026-02-21T08:54:13.6632229Z // begin inline asm 2026-02-21T08:54:13.6632430Z cp.async.cg.shared.global [ %r784 + 0 ], [ %rd119 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6632649Z // end inline asm 2026-02-21T08:54:13.6632822Z add.s32 %r786, %r1228, 428032; 2026-02-21T08:54:13.6632971Z // begin inline asm 2026-02-21T08:54:13.6633168Z cp.async.cg.shared.global [ %r786 + 0 ], [ %rd120 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6633385Z // end inline asm 2026-02-21T08:54:13.6633525Z add.s32 %r788, %r1228, 429056; 2026-02-21T08:54:13.6633682Z // begin inline asm 2026-02-21T08:54:13.6633869Z cp.async.cg.shared.global [ %r788 + 0 ], [ %rd121 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6634092Z // end inline asm 2026-02-21T08:54:13.6634223Z add.s32 %r790, %r1228, 430080; 2026-02-21T08:54:13.6634380Z // begin inline asm 2026-02-21T08:54:13.6634568Z cp.async.cg.shared.global [ %r790 + 0 ], [ %rd122 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6634818Z // end inline asm 2026-02-21T08:54:13.6634949Z add.s32 %r792, %r1228, 431104; 2026-02-21T08:54:13.6635104Z // begin inline asm 2026-02-21T08:54:13.6635296Z cp.async.cg.shared.global [ %r792 + 0 ], [ %rd123 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6635538Z // end inline asm 2026-02-21T08:54:13.6635681Z add.s32 %r794, %r1228, 432128; 2026-02-21T08:54:13.6635832Z // begin inline asm 2026-02-21T08:54:13.6636029Z 
cp.async.cg.shared.global [ %r794 + 0 ], [ %rd124 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6636246Z // end inline asm 2026-02-21T08:54:13.6636384Z add.s32 %r796, %r1228, 433152; 2026-02-21T08:54:13.6636533Z // begin inline asm 2026-02-21T08:54:13.6636727Z cp.async.cg.shared.global [ %r796 + 0 ], [ %rd125 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6636948Z // end inline asm 2026-02-21T08:54:13.6637110Z add.s32 %r798, %r1228, 434176; 2026-02-21T08:54:13.6637270Z // begin inline asm 2026-02-21T08:54:13.6637458Z cp.async.cg.shared.global [ %r798 + 0 ], [ %rd126 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6637684Z // end inline asm 2026-02-21T08:54:13.6637815Z add.s32 %r800, %r1228, 435200; 2026-02-21T08:54:13.6637977Z // begin inline asm 2026-02-21T08:54:13.6638167Z cp.async.cg.shared.global [ %r800 + 0 ], [ %rd127 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6638403Z // end inline asm 2026-02-21T08:54:13.6638539Z add.s32 %r802, %r1228, 436224; 2026-02-21T08:54:13.6638704Z // begin inline asm 2026-02-21T08:54:13.6638900Z cp.async.cg.shared.global [ %r802 + 0 ], [ %rd128 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6639121Z // end inline asm 2026-02-21T08:54:13.6639275Z add.s32 %r804, %r1228, 437248; 2026-02-21T08:54:13.6639432Z // begin inline asm 2026-02-21T08:54:13.6639633Z cp.async.cg.shared.global [ %r804 + 0 ], [ %rd129 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6639861Z // end inline asm 2026-02-21T08:54:13.6640007Z add.s32 %r806, %r1228, 438272; 2026-02-21T08:54:13.6640164Z // begin inline asm 2026-02-21T08:54:13.6640369Z cp.async.cg.shared.global [ %r806 + 0 ], [ %rd130 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6640602Z // end inline asm 2026-02-21T08:54:13.6640740Z add.s32 %r808, %r1228, 439296; 2026-02-21T08:54:13.6640935Z // begin inline asm 2026-02-21T08:54:13.6641136Z cp.async.cg.shared.global [ %r808 + 0 ], [ %rd131 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6641373Z // end inline asm 2026-02-21T08:54:13.6641510Z add.s32 %r810, %r1228, 440320; 2026-02-21T08:54:13.6641678Z // begin 
inline asm 2026-02-21T08:54:13.6641873Z cp.async.cg.shared.global [ %r810 + 0 ], [ %rd132 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6642104Z // end inline asm 2026-02-21T08:54:13.6642248Z add.s32 %r812, %r1228, 441344; 2026-02-21T08:54:13.6642405Z // begin inline asm 2026-02-21T08:54:13.6642611Z cp.async.cg.shared.global [ %r812 + 0 ], [ %rd133 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6642840Z // end inline asm 2026-02-21T08:54:13.6642989Z cp.async.commit_group; 2026-02-21T08:54:13.6643258Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6643565Z cvt.s64.s32 %rd361, %r1229; 2026-02-21T08:54:13.6643729Z or.b64 %rd362, %rd361, %rd313; 2026-02-21T08:54:13.6643902Z shl.b64 %rd363, %rd362, 1; 2026-02-21T08:54:13.6644068Z add.s64 %rd18, %rd51, %rd363; 2026-02-21T08:54:13.6644232Z add.s64 %rd134, %rd18, 256; 2026-02-21T08:54:13.6644426Z cvt.s64.s32 %rd364, %r1230; 2026-02-21T08:54:13.6644581Z or.b64 %rd365, %rd364, %rd313; 2026-02-21T08:54:13.6644778Z shl.b64 %rd366, %rd365, 1; 2026-02-21T08:54:13.6644934Z add.s64 %rd19, %rd51, %rd366; 2026-02-21T08:54:13.6645099Z add.s64 %rd135, %rd19, 256; 2026-02-21T08:54:13.6645255Z cvt.s64.s32 %rd367, %r1231; 2026-02-21T08:54:13.6645416Z or.b64 %rd368, %rd367, %rd313; 2026-02-21T08:54:13.6645576Z shl.b64 %rd369, %rd368, 1; 2026-02-21T08:54:13.6645738Z add.s64 %rd20, %rd51, %rd369; 2026-02-21T08:54:13.6645906Z add.s64 %rd136, %rd20, 256; 2026-02-21T08:54:13.6646060Z cvt.s64.s32 %rd370, %r1232; 2026-02-21T08:54:13.6646227Z or.b64 %rd371, %rd370, %rd313; 2026-02-21T08:54:13.6646387Z shl.b64 %rd372, %rd371, 1; 2026-02-21T08:54:13.6646553Z add.s64 %rd21, %rd51, %rd372; 2026-02-21T08:54:13.6646713Z add.s64 %rd137, %rd21, 256; 2026-02-21T08:54:13.6646879Z cvt.s64.s32 %rd373, %r1233; 2026-02-21T08:54:13.6647090Z or.b64 %rd374, %rd373, %rd313; 2026-02-21T08:54:13.6647258Z shl.b64 %rd375, %rd374, 1; 2026-02-21T08:54:13.6647410Z add.s64 %rd22, %rd51, %rd375; 2026-02-21T08:54:13.6647574Z add.s64 
%rd138, %rd22, 256; 2026-02-21T08:54:13.6647729Z cvt.s64.s32 %rd376, %r1234; 2026-02-21T08:54:13.6647879Z or.b64 %rd377, %rd376, %rd313; 2026-02-21T08:54:13.6648038Z shl.b64 %rd378, %rd377, 1; 2026-02-21T08:54:13.6648186Z add.s64 %rd23, %rd51, %rd378; 2026-02-21T08:54:13.6648351Z add.s64 %rd139, %rd23, 256; 2026-02-21T08:54:13.6648504Z cvt.s64.s32 %rd379, %r1235; 2026-02-21T08:54:13.6648695Z or.b64 %rd380, %rd379, %rd313; 2026-02-21T08:54:13.6648848Z shl.b64 %rd381, %rd380, 1; 2026-02-21T08:54:13.6649003Z add.s64 %rd24, %rd51, %rd381; 2026-02-21T08:54:13.6649160Z add.s64 %rd140, %rd24, 256; 2026-02-21T08:54:13.6649308Z cvt.s64.s32 %rd382, %r1236; 2026-02-21T08:54:13.6649464Z or.b64 %rd383, %rd382, %rd313; 2026-02-21T08:54:13.6649617Z shl.b64 %rd384, %rd383, 1; 2026-02-21T08:54:13.6649773Z add.s64 %rd25, %rd51, %rd384; 2026-02-21T08:54:13.6649924Z add.s64 %rd141, %rd25, 256; 2026-02-21T08:54:13.6650079Z cvt.s64.s32 %rd385, %r1237; 2026-02-21T08:54:13.6650228Z or.b64 %rd386, %rd385, %rd313; 2026-02-21T08:54:13.6650389Z shl.b64 %rd387, %rd386, 1; 2026-02-21T08:54:13.6650536Z add.s64 %rd26, %rd51, %rd387; 2026-02-21T08:54:13.6650693Z add.s64 %rd142, %rd26, 256; 2026-02-21T08:54:13.6650847Z cvt.s64.s32 %rd388, %r1238; 2026-02-21T08:54:13.6650995Z or.b64 %rd389, %rd388, %rd313; 2026-02-21T08:54:13.6651154Z shl.b64 %rd390, %rd389, 1; 2026-02-21T08:54:13.6651301Z add.s64 %rd27, %rd51, %rd390; 2026-02-21T08:54:13.6651457Z add.s64 %rd143, %rd27, 256; 2026-02-21T08:54:13.6651604Z cvt.s64.s32 %rd391, %r1239; 2026-02-21T08:54:13.6651757Z or.b64 %rd392, %rd391, %rd313; 2026-02-21T08:54:13.6651910Z shl.b64 %rd393, %rd392, 1; 2026-02-21T08:54:13.6652097Z add.s64 %rd28, %rd51, %rd393; 2026-02-21T08:54:13.6652248Z add.s64 %rd144, %rd28, 256; 2026-02-21T08:54:13.6652403Z cvt.s64.s32 %rd394, %r1240; 2026-02-21T08:54:13.6652559Z or.b64 %rd395, %rd394, %rd313; 2026-02-21T08:54:13.6652711Z shl.b64 %rd396, %rd395, 1; 2026-02-21T08:54:13.6652865Z add.s64 %rd29, %rd51, %rd396; 
2026-02-21T08:54:13.6653013Z add.s64 %rd145, %rd29, 256; 2026-02-21T08:54:13.6653166Z cvt.s64.s32 %rd397, %r1241; 2026-02-21T08:54:13.6653314Z or.b64 %rd398, %rd397, %rd313; 2026-02-21T08:54:13.6653472Z shl.b64 %rd399, %rd398, 1; 2026-02-21T08:54:13.6653621Z add.s64 %rd30, %rd51, %rd399; 2026-02-21T08:54:13.6653775Z add.s64 %rd146, %rd30, 256; 2026-02-21T08:54:13.6653927Z cvt.s64.s32 %rd400, %r1242; 2026-02-21T08:54:13.6654075Z or.b64 %rd401, %rd400, %rd313; 2026-02-21T08:54:13.6654233Z shl.b64 %rd402, %rd401, 1; 2026-02-21T08:54:13.6654381Z add.s64 %rd31, %rd51, %rd402; 2026-02-21T08:54:13.6654538Z add.s64 %rd147, %rd31, 256; 2026-02-21T08:54:13.6654713Z cvt.s64.s32 %rd403, %r1243; 2026-02-21T08:54:13.6654875Z or.b64 %rd404, %rd403, %rd313; 2026-02-21T08:54:13.6655032Z shl.b64 %rd405, %rd404, 1; 2026-02-21T08:54:13.6655222Z add.s64 %rd32, %rd51, %rd405; 2026-02-21T08:54:13.6655382Z add.s64 %rd148, %rd32, 256; 2026-02-21T08:54:13.6655536Z cvt.s64.s32 %rd406, %r1244; 2026-02-21T08:54:13.6655697Z or.b64 %rd407, %rd406, %rd313; 2026-02-21T08:54:13.6655857Z shl.b64 %rd408, %rd407, 1; 2026-02-21T08:54:13.6656023Z add.s64 %rd33, %rd51, %rd408; 2026-02-21T08:54:13.6656174Z add.s64 %rd149, %rd33, 256; 2026-02-21T08:54:13.6656329Z cvt.s64.s32 %rd409, %r1245; 2026-02-21T08:54:13.6656477Z or.b64 %rd410, %rd409, %rd313; 2026-02-21T08:54:13.6656639Z shl.b64 %rd411, %rd410, 1; 2026-02-21T08:54:13.6656786Z add.s64 %rd34, %rd51, %rd411; 2026-02-21T08:54:13.6656945Z add.s64 %rd150, %rd34, 256; 2026-02-21T08:54:13.6657093Z cvt.s64.s32 %rd412, %r1246; 2026-02-21T08:54:13.6657250Z or.b64 %rd413, %rd412, %rd313; 2026-02-21T08:54:13.6657412Z shl.b64 %rd414, %rd413, 1; 2026-02-21T08:54:13.6657561Z add.s64 %rd35, %rd51, %rd414; 2026-02-21T08:54:13.6657746Z add.s64 %rd151, %rd35, 256; 2026-02-21T08:54:13.6657897Z cvt.s64.s32 %rd415, %r1247; 2026-02-21T08:54:13.6658055Z or.b64 %rd416, %rd415, %rd313; 2026-02-21T08:54:13.6658208Z shl.b64 %rd417, %rd416, 1; 2026-02-21T08:54:13.6658363Z 
add.s64 %rd36, %rd51, %rd417; 2026-02-21T08:54:13.6658511Z add.s64 %rd152, %rd36, 256; 2026-02-21T08:54:13.6658665Z cvt.s64.s32 %rd418, %r1248; 2026-02-21T08:54:13.6658819Z or.b64 %rd419, %rd418, %rd313; 2026-02-21T08:54:13.6658971Z shl.b64 %rd420, %rd419, 1; 2026-02-21T08:54:13.6659125Z add.s64 %rd37, %rd51, %rd420; 2026-02-21T08:54:13.6659304Z add.s64 %rd153, %rd37, 256; 2026-02-21T08:54:13.6659456Z cvt.s64.s32 %rd421, %r1249; 2026-02-21T08:54:13.6659601Z or.b64 %rd422, %rd421, %rd313; 2026-02-21T08:54:13.6659760Z shl.b64 %rd423, %rd422, 1; 2026-02-21T08:54:13.6659906Z add.s64 %rd38, %rd51, %rd423; 2026-02-21T08:54:13.6660061Z add.s64 %rd154, %rd38, 256; 2026-02-21T08:54:13.6660207Z cvt.s64.s32 %rd424, %r1250; 2026-02-21T08:54:13.6660361Z or.b64 %rd425, %rd424, %rd313; 2026-02-21T08:54:13.6660520Z shl.b64 %rd426, %rd425, 1; 2026-02-21T08:54:13.6660668Z add.s64 %rd39, %rd51, %rd426; 2026-02-21T08:54:13.6660824Z add.s64 %rd155, %rd39, 256; 2026-02-21T08:54:13.6660973Z cvt.s64.s32 %rd427, %r1251; 2026-02-21T08:54:13.6661130Z or.b64 %rd428, %rd427, %rd313; 2026-02-21T08:54:13.6661282Z shl.b64 %rd429, %rd428, 1; 2026-02-21T08:54:13.6661438Z add.s64 %rd40, %rd51, %rd429; 2026-02-21T08:54:13.6661585Z add.s64 %rd156, %rd40, 256; 2026-02-21T08:54:13.6661741Z cvt.s64.s32 %rd430, %r1252; 2026-02-21T08:54:13.6661890Z or.b64 %rd431, %rd430, %rd313; 2026-02-21T08:54:13.6662051Z shl.b64 %rd432, %rd431, 1; 2026-02-21T08:54:13.6662207Z add.s64 %rd41, %rd51, %rd432; 2026-02-21T08:54:13.6662358Z add.s64 %rd157, %rd41, 256; 2026-02-21T08:54:13.6662512Z cvt.s64.s32 %rd433, %r1253; 2026-02-21T08:54:13.6662661Z or.b64 %rd434, %rd433, %rd313; 2026-02-21T08:54:13.6662849Z shl.b64 %rd435, %rd434, 1; 2026-02-21T08:54:13.6663001Z add.s64 %rd42, %rd51, %rd435; 2026-02-21T08:54:13.6663165Z add.s64 %rd158, %rd42, 256; 2026-02-21T08:54:13.6663314Z cvt.s64.s32 %rd436, %r1254; 2026-02-21T08:54:13.6663471Z or.b64 %rd437, %rd436, %rd313; 2026-02-21T08:54:13.6663631Z shl.b64 %rd438, %rd437, 1; 
2026-02-21T08:54:13.6663777Z add.s64 %rd43, %rd51, %rd438; 2026-02-21T08:54:13.6663932Z add.s64 %rd159, %rd43, 256; 2026-02-21T08:54:13.6664080Z cvt.s64.s32 %rd439, %r1255; 2026-02-21T08:54:13.6664234Z or.b64 %rd440, %rd439, %rd313; 2026-02-21T08:54:13.6664383Z shl.b64 %rd441, %rd440, 1; 2026-02-21T08:54:13.6664538Z add.s64 %rd44, %rd51, %rd441; 2026-02-21T08:54:13.6664720Z add.s64 %rd160, %rd44, 256; 2026-02-21T08:54:13.6664874Z cvt.s64.s32 %rd442, %r1256; 2026-02-21T08:54:13.6665023Z or.b64 %rd443, %rd442, %rd313; 2026-02-21T08:54:13.6665182Z shl.b64 %rd444, %rd443, 1; 2026-02-21T08:54:13.6665338Z add.s64 %rd45, %rd51, %rd444; 2026-02-21T08:54:13.6665488Z add.s64 %rd161, %rd45, 256; 2026-02-21T08:54:13.6665645Z cvt.s64.s32 %rd445, %r1257; 2026-02-21T08:54:13.6665793Z or.b64 %rd446, %rd445, %rd313; 2026-02-21T08:54:13.6665981Z shl.b64 %rd447, %rd446, 1; 2026-02-21T08:54:13.6666129Z add.s64 %rd46, %rd51, %rd447; 2026-02-21T08:54:13.6666284Z add.s64 %rd162, %rd46, 256; 2026-02-21T08:54:13.6666430Z cvt.s64.s32 %rd448, %r1258; 2026-02-21T08:54:13.6666586Z or.b64 %rd449, %rd448, %rd313; 2026-02-21T08:54:13.6666738Z shl.b64 %rd450, %rd449, 1; 2026-02-21T08:54:13.6666892Z add.s64 %rd47, %rd51, %rd450; 2026-02-21T08:54:13.6667046Z add.s64 %rd163, %rd47, 256; 2026-02-21T08:54:13.6667192Z cvt.s64.s32 %rd451, %r1259; 2026-02-21T08:54:13.6667345Z or.b64 %rd452, %rd451, %rd313; 2026-02-21T08:54:13.6667498Z shl.b64 %rd453, %rd452, 1; 2026-02-21T08:54:13.6667653Z add.s64 %rd48, %rd51, %rd453; 2026-02-21T08:54:13.6667802Z add.s64 %rd164, %rd48, 256; 2026-02-21T08:54:13.6667956Z cvt.s64.s32 %rd454, %r1260; 2026-02-21T08:54:13.6668104Z or.b64 %rd455, %rd454, %rd313; 2026-02-21T08:54:13.6668261Z shl.b64 %rd456, %rd455, 1; 2026-02-21T08:54:13.6668441Z add.s64 %rd49, %rd51, %rd456; 2026-02-21T08:54:13.6668595Z add.s64 %rd165, %rd49, 256; 2026-02-21T08:54:13.6668863Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6669153Z add.s32 
%r814, %r718, 65536; 2026-02-21T08:54:13.6669312Z // begin inline asm 2026-02-21T08:54:13.6669505Z cp.async.cg.shared.global [ %r814 + 0 ], [ %rd134 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6669731Z // end inline asm 2026-02-21T08:54:13.6669864Z add.s32 %r816, %r718, 66560; 2026-02-21T08:54:13.6670048Z // begin inline asm 2026-02-21T08:54:13.6670246Z cp.async.cg.shared.global [ %r816 + 0 ], [ %rd135 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6670466Z // end inline asm 2026-02-21T08:54:13.6670605Z add.s32 %r818, %r718, 67584; 2026-02-21T08:54:13.6670752Z // begin inline asm 2026-02-21T08:54:13.6670951Z cp.async.cg.shared.global [ %r818 + 0 ], [ %rd136 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6671171Z // end inline asm 2026-02-21T08:54:13.6671315Z add.s32 %r820, %r718, 68608; 2026-02-21T08:54:13.6671468Z // begin inline asm 2026-02-21T08:54:13.6671664Z cp.async.cg.shared.global [ %r820 + 0 ], [ %rd137 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6671881Z // end inline asm 2026-02-21T08:54:13.6672018Z add.s32 %r822, %r718, 69632; 2026-02-21T08:54:13.6672173Z // begin inline asm 2026-02-21T08:54:13.6672358Z cp.async.cg.shared.global [ %r822 + 0 ], [ %rd138 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6672583Z // end inline asm 2026-02-21T08:54:13.6672714Z add.s32 %r824, %r718, 70656; 2026-02-21T08:54:13.6672870Z // begin inline asm 2026-02-21T08:54:13.6673055Z cp.async.cg.shared.global [ %r824 + 0 ], [ %rd139 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6673275Z // end inline asm 2026-02-21T08:54:13.6673406Z add.s32 %r826, %r718, 71680; 2026-02-21T08:54:13.6673561Z // begin inline asm 2026-02-21T08:54:13.6673781Z cp.async.cg.shared.global [ %r826 + 0 ], [ %rd140 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6673999Z // end inline asm 2026-02-21T08:54:13.6674137Z add.s32 %r828, %r718, 72704; 2026-02-21T08:54:13.6674283Z // begin inline asm 2026-02-21T08:54:13.6674479Z cp.async.cg.shared.global [ %r828 + 0 ], [ %rd141 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6674721Z // end inline asm 
2026-02-21T08:54:13.6674862Z add.s32 %r830, %r718, 73728; 2026-02-21T08:54:13.6675009Z // begin inline asm 2026-02-21T08:54:13.6675204Z cp.async.cg.shared.global [ %r830 + 0 ], [ %rd142 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6675425Z // end inline asm 2026-02-21T08:54:13.6675482Z add.s32 %r832, %r718, 74752; 2026-02-21T08:54:13.6675538Z // begin inline asm 2026-02-21T08:54:13.6675647Z cp.async.cg.shared.global [ %r832 + 0 ], [ %rd143 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6675707Z // end inline asm 2026-02-21T08:54:13.6675762Z add.s32 %r834, %r718, 75776; 2026-02-21T08:54:13.6675815Z // begin inline asm 2026-02-21T08:54:13.6675931Z cp.async.cg.shared.global [ %r834 + 0 ], [ %rd144 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6675985Z // end inline asm 2026-02-21T08:54:13.6676041Z add.s32 %r836, %r718, 76800; 2026-02-21T08:54:13.6676123Z // begin inline asm 2026-02-21T08:54:13.6676241Z cp.async.cg.shared.global [ %r836 + 0 ], [ %rd145 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6676293Z // end inline asm 2026-02-21T08:54:13.6676348Z add.s32 %r838, %r718, 77824; 2026-02-21T08:54:13.6676408Z // begin inline asm 2026-02-21T08:54:13.6676517Z cp.async.cg.shared.global [ %r838 + 0 ], [ %rd146 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6676569Z // end inline asm 2026-02-21T08:54:13.6676631Z add.s32 %r840, %r718, 78848; 2026-02-21T08:54:13.6676686Z // begin inline asm 2026-02-21T08:54:13.6676794Z cp.async.cg.shared.global [ %r840 + 0 ], [ %rd147 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6676847Z // end inline asm 2026-02-21T08:54:13.6676910Z add.s32 %r842, %r718, 79872; 2026-02-21T08:54:13.6676963Z // begin inline asm 2026-02-21T08:54:13.6677071Z cp.async.cg.shared.global [ %r842 + 0 ], [ %rd148 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6677132Z // end inline asm 2026-02-21T08:54:13.6677211Z add.s32 %r844, %r718, 80896; 2026-02-21T08:54:13.6677267Z // begin inline asm 2026-02-21T08:54:13.6677377Z cp.async.cg.shared.global [ %r844 + 0 ], [ %rd149 + 0 ], 0x10, %r783; 
2026-02-21T08:54:13.6677439Z // end inline asm 2026-02-21T08:54:13.6677493Z add.s32 %r846, %r718, 81920; 2026-02-21T08:54:13.6677546Z // begin inline asm 2026-02-21T08:54:13.6677661Z cp.async.cg.shared.global [ %r846 + 0 ], [ %rd150 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6677713Z // end inline asm 2026-02-21T08:54:13.6677767Z add.s32 %r848, %r718, 82944; 2026-02-21T08:54:13.6677849Z // begin inline asm 2026-02-21T08:54:13.6677964Z cp.async.cg.shared.global [ %r848 + 0 ], [ %rd151 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6678018Z // end inline asm 2026-02-21T08:54:13.6678074Z add.s32 %r850, %r718, 83968; 2026-02-21T08:54:13.6678135Z // begin inline asm 2026-02-21T08:54:13.6678244Z cp.async.cg.shared.global [ %r850 + 0 ], [ %rd152 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6678297Z // end inline asm 2026-02-21T08:54:13.6678360Z add.s32 %r852, %r718, 84992; 2026-02-21T08:54:13.6678415Z // begin inline asm 2026-02-21T08:54:13.6678523Z cp.async.cg.shared.global [ %r852 + 0 ], [ %rd153 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6678575Z // end inline asm 2026-02-21T08:54:13.6678638Z add.s32 %r854, %r718, 86016; 2026-02-21T08:54:13.6678691Z // begin inline asm 2026-02-21T08:54:13.6678798Z cp.async.cg.shared.global [ %r854 + 0 ], [ %rd154 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6678856Z // end inline asm 2026-02-21T08:54:13.6678915Z add.s32 %r856, %r718, 87040; 2026-02-21T08:54:13.6678970Z // begin inline asm 2026-02-21T08:54:13.6679077Z cp.async.cg.shared.global [ %r856 + 0 ], [ %rd155 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6679139Z // end inline asm 2026-02-21T08:54:13.6679195Z add.s32 %r858, %r718, 88064; 2026-02-21T08:54:13.6679249Z // begin inline asm 2026-02-21T08:54:13.6679391Z cp.async.cg.shared.global [ %r858 + 0 ], [ %rd156 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6679448Z // end inline asm 2026-02-21T08:54:13.6679506Z add.s32 %r860, %r718, 89088; 2026-02-21T08:54:13.6679568Z // begin inline asm 2026-02-21T08:54:13.6679673Z cp.async.cg.shared.global [ %r860 + 0 ], [ 
%rd157 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6679725Z // end inline asm 2026-02-21T08:54:13.6679780Z add.s32 %r862, %r718, 90112; 2026-02-21T08:54:13.6679840Z // begin inline asm 2026-02-21T08:54:13.6679943Z cp.async.cg.shared.global [ %r862 + 0 ], [ %rd158 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6679995Z // end inline asm 2026-02-21T08:54:13.6680058Z add.s32 %r864, %r718, 91136; 2026-02-21T08:54:13.6680112Z // begin inline asm 2026-02-21T08:54:13.6680216Z cp.async.cg.shared.global [ %r864 + 0 ], [ %rd159 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6680268Z // end inline asm 2026-02-21T08:54:13.6680328Z add.s32 %r866, %r718, 92160; 2026-02-21T08:54:13.6680381Z // begin inline asm 2026-02-21T08:54:13.6680487Z cp.async.cg.shared.global [ %r866 + 0 ], [ %rd160 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6680550Z // end inline asm 2026-02-21T08:54:13.6680649Z add.s32 %r868, %r718, 93184; 2026-02-21T08:54:13.6680702Z // begin inline asm 2026-02-21T08:54:13.6680811Z cp.async.cg.shared.global [ %r868 + 0 ], [ %rd161 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6680873Z // end inline asm 2026-02-21T08:54:13.6680931Z add.s32 %r870, %r718, 94208; 2026-02-21T08:54:13.6680987Z // begin inline asm 2026-02-21T08:54:13.6681105Z cp.async.cg.shared.global [ %r870 + 0 ], [ %rd162 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6681159Z // end inline asm 2026-02-21T08:54:13.6681219Z add.s32 %r872, %r718, 95232; 2026-02-21T08:54:13.6681283Z // begin inline asm 2026-02-21T08:54:13.6681393Z cp.async.cg.shared.global [ %r872 + 0 ], [ %rd163 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6681449Z // end inline asm 2026-02-21T08:54:13.6681505Z add.s32 %r874, %r718, 96256; 2026-02-21T08:54:13.6681569Z // begin inline asm 2026-02-21T08:54:13.6681680Z cp.async.cg.shared.global [ %r874 + 0 ], [ %rd164 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6681759Z // end inline asm 2026-02-21T08:54:13.6681826Z add.s32 %r876, %r718, 97280; 2026-02-21T08:54:13.6681879Z // begin inline asm 2026-02-21T08:54:13.6681987Z 
cp.async.cg.shared.global [ %r876 + 0 ], [ %rd165 + 0 ], 0x10, %r783; 2026-02-21T08:54:13.6682040Z // end inline asm 2026-02-21T08:54:13.6682112Z cp.async.commit_group; 2026-02-21T08:54:13.6682302Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6682367Z setp.gt.s32 %p44, %r6, 2; 2026-02-21T08:54:13.6682575Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6682639Z add.s64 %rd166, %rd2, 512; 2026-02-21T08:54:13.6682699Z add.s64 %rd167, %rd3, 512; 2026-02-21T08:54:13.6682766Z add.s64 %rd168, %rd4, 512; 2026-02-21T08:54:13.6682824Z add.s64 %rd169, %rd5, 512; 2026-02-21T08:54:13.6682883Z add.s64 %rd170, %rd6, 512; 2026-02-21T08:54:13.6682939Z add.s64 %rd171, %rd7, 512; 2026-02-21T08:54:13.6683003Z add.s64 %rd172, %rd8, 512; 2026-02-21T08:54:13.6683062Z add.s64 %rd173, %rd9, 512; 2026-02-21T08:54:13.6683123Z add.s64 %rd174, %rd10, 512; 2026-02-21T08:54:13.6683188Z add.s64 %rd175, %rd11, 512; 2026-02-21T08:54:13.6683245Z add.s64 %rd176, %rd12, 512; 2026-02-21T08:54:13.6683303Z add.s64 %rd177, %rd13, 512; 2026-02-21T08:54:13.6683359Z add.s64 %rd178, %rd14, 512; 2026-02-21T08:54:13.6683424Z add.s64 %rd179, %rd15, 512; 2026-02-21T08:54:13.6683480Z add.s64 %rd180, %rd16, 512; 2026-02-21T08:54:13.6683535Z add.s64 %rd181, %rd17, 512; 2026-02-21T08:54:13.6683720Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6683775Z bar.sync 0; 2026-02-21T08:54:13.6683837Z add.s32 %r878, %r1228, 458752; 2026-02-21T08:54:13.6683905Z selp.b32 %r879, 16, 0, %p44; 2026-02-21T08:54:13.6683983Z // begin inline asm 2026-02-21T08:54:13.6684099Z cp.async.cg.shared.global [ %r878 + 0 ], [ %rd166 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6684156Z // end inline asm 2026-02-21T08:54:13.6684224Z add.s32 %r880, %r1228, 459776; 2026-02-21T08:54:13.6684281Z // begin inline asm 2026-02-21T08:54:13.6684394Z cp.async.cg.shared.global [ %r880 + 0 ], [ %rd167 + 0 ], 
0x10, %r879; 2026-02-21T08:54:13.6684455Z // end inline asm 2026-02-21T08:54:13.6684514Z add.s32 %r882, %r1228, 460800; 2026-02-21T08:54:13.6684570Z // begin inline asm 2026-02-21T08:54:13.6684716Z cp.async.cg.shared.global [ %r882 + 0 ], [ %rd168 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6684781Z // end inline asm 2026-02-21T08:54:13.6684841Z add.s32 %r884, %r1228, 461824; 2026-02-21T08:54:13.6684896Z // begin inline asm 2026-02-21T08:54:13.6685015Z cp.async.cg.shared.global [ %r884 + 0 ], [ %rd169 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6685071Z // end inline asm 2026-02-21T08:54:13.6685132Z add.s32 %r886, %r1228, 462848; 2026-02-21T08:54:13.6685190Z // begin inline asm 2026-02-21T08:54:13.6685312Z cp.async.cg.shared.global [ %r886 + 0 ], [ %rd170 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6685420Z // end inline asm 2026-02-21T08:54:13.6685478Z add.s32 %r888, %r1228, 463872; 2026-02-21T08:54:13.6685541Z // begin inline asm 2026-02-21T08:54:13.6685654Z cp.async.cg.shared.global [ %r888 + 0 ], [ %rd171 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6685709Z // end inline asm 2026-02-21T08:54:13.6685776Z add.s32 %r890, %r1228, 464896; 2026-02-21T08:54:13.6685832Z // begin inline asm 2026-02-21T08:54:13.6685945Z cp.async.cg.shared.global [ %r890 + 0 ], [ %rd172 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6686000Z // end inline asm 2026-02-21T08:54:13.6686065Z add.s32 %r892, %r1228, 465920; 2026-02-21T08:54:13.6686122Z // begin inline asm 2026-02-21T08:54:13.6686236Z cp.async.cg.shared.global [ %r892 + 0 ], [ %rd173 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6686298Z // end inline asm 2026-02-21T08:54:13.6686356Z add.s32 %r894, %r1228, 466944; 2026-02-21T08:54:13.6686414Z // begin inline asm 2026-02-21T08:54:13.6686557Z cp.async.cg.shared.global [ %r894 + 0 ], [ %rd174 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6686628Z // end inline asm 2026-02-21T08:54:13.6686687Z add.s32 %r896, %r1228, 467968; 2026-02-21T08:54:13.6686754Z // begin inline asm 2026-02-21T08:54:13.6686875Z 
cp.async.cg.shared.global [ %r896 + 0 ], [ %rd175 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6686930Z // end inline asm 2026-02-21T08:54:13.6686987Z add.s32 %r898, %r1228, 468992; 2026-02-21T08:54:13.6687050Z // begin inline asm 2026-02-21T08:54:13.6687164Z cp.async.cg.shared.global [ %r898 + 0 ], [ %rd176 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6687249Z // end inline asm 2026-02-21T08:54:13.6687308Z add.s32 %r900, %r1228, 470016; 2026-02-21T08:54:13.6687373Z // begin inline asm 2026-02-21T08:54:13.6687483Z cp.async.cg.shared.global [ %r900 + 0 ], [ %rd177 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6687538Z // end inline asm 2026-02-21T08:54:13.6687608Z add.s32 %r902, %r1228, 471040; 2026-02-21T08:54:13.6687663Z // begin inline asm 2026-02-21T08:54:13.6687774Z cp.async.cg.shared.global [ %r902 + 0 ], [ %rd178 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6687830Z // end inline asm 2026-02-21T08:54:13.6687895Z add.s32 %r904, %r1228, 472064; 2026-02-21T08:54:13.6687949Z // begin inline asm 2026-02-21T08:54:13.6688061Z cp.async.cg.shared.global [ %r904 + 0 ], [ %rd179 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6688124Z // end inline asm 2026-02-21T08:54:13.6688181Z add.s32 %r906, %r1228, 473088; 2026-02-21T08:54:13.6688236Z // begin inline asm 2026-02-21T08:54:13.6688346Z cp.async.cg.shared.global [ %r906 + 0 ], [ %rd180 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6688409Z // end inline asm 2026-02-21T08:54:13.6688467Z add.s32 %r908, %r1228, 474112; 2026-02-21T08:54:13.6688523Z // begin inline asm 2026-02-21T08:54:13.6688641Z cp.async.cg.shared.global [ %r908 + 0 ], [ %rd181 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6688696Z // end inline asm 2026-02-21T08:54:13.6688784Z cp.async.commit_group; 2026-02-21T08:54:13.6688972Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6689034Z add.s64 %rd182, %rd18, 512; 2026-02-21T08:54:13.6689092Z add.s64 %rd183, %rd19, 512; 2026-02-21T08:54:13.6689148Z add.s64 %rd184, %rd20, 512; 
2026-02-21T08:54:13.6689214Z add.s64 %rd185, %rd21, 512; 2026-02-21T08:54:13.6689271Z add.s64 %rd186, %rd22, 512; 2026-02-21T08:54:13.6689327Z add.s64 %rd187, %rd23, 512; 2026-02-21T08:54:13.6689390Z add.s64 %rd188, %rd24, 512; 2026-02-21T08:54:13.6689446Z add.s64 %rd189, %rd25, 512; 2026-02-21T08:54:13.6689502Z add.s64 %rd190, %rd26, 512; 2026-02-21T08:54:13.6689561Z add.s64 %rd191, %rd27, 512; 2026-02-21T08:54:13.6689625Z add.s64 %rd192, %rd28, 512; 2026-02-21T08:54:13.6689683Z add.s64 %rd193, %rd29, 512; 2026-02-21T08:54:13.6689739Z add.s64 %rd194, %rd30, 512; 2026-02-21T08:54:13.6689806Z add.s64 %rd195, %rd31, 512; 2026-02-21T08:54:13.6689864Z add.s64 %rd196, %rd32, 512; 2026-02-21T08:54:13.6689921Z add.s64 %rd197, %rd33, 512; 2026-02-21T08:54:13.6689986Z add.s64 %rd198, %rd34, 512; 2026-02-21T08:54:13.6690077Z add.s64 %rd199, %rd35, 512; 2026-02-21T08:54:13.6690131Z add.s64 %rd200, %rd36, 512; 2026-02-21T08:54:13.6690184Z add.s64 %rd201, %rd37, 512; 2026-02-21T08:54:13.6690245Z add.s64 %rd202, %rd38, 512; 2026-02-21T08:54:13.6690298Z add.s64 %rd203, %rd39, 512; 2026-02-21T08:54:13.6690352Z add.s64 %rd204, %rd40, 512; 2026-02-21T08:54:13.6690412Z add.s64 %rd205, %rd41, 512; 2026-02-21T08:54:13.6690465Z add.s64 %rd206, %rd42, 512; 2026-02-21T08:54:13.6690520Z add.s64 %rd207, %rd43, 512; 2026-02-21T08:54:13.6690574Z add.s64 %rd208, %rd44, 512; 2026-02-21T08:54:13.6690636Z add.s64 %rd209, %rd45, 512; 2026-02-21T08:54:13.6690690Z add.s64 %rd210, %rd46, 512; 2026-02-21T08:54:13.6690745Z add.s64 %rd211, %rd47, 512; 2026-02-21T08:54:13.6690804Z add.s64 %rd212, %rd48, 512; 2026-02-21T08:54:13.6690858Z add.s64 %rd213, %rd49, 512; 2026-02-21T08:54:13.6691053Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6691113Z add.s32 %r910, %r718, 131072; 2026-02-21T08:54:13.6691176Z // begin inline asm 2026-02-21T08:54:13.6691282Z cp.async.cg.shared.global [ %r910 + 0 ], [ %rd182 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6691334Z // 
end inline asm 2026-02-21T08:54:13.6691399Z add.s32 %r912, %r718, 132096; 2026-02-21T08:54:13.6691453Z // begin inline asm 2026-02-21T08:54:13.6691560Z cp.async.cg.shared.global [ %r912 + 0 ], [ %rd183 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6691619Z // end inline asm 2026-02-21T08:54:13.6691699Z add.s32 %r914, %r718, 133120; 2026-02-21T08:54:13.6691752Z // begin inline asm 2026-02-21T08:54:13.6691860Z cp.async.cg.shared.global [ %r914 + 0 ], [ %rd184 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6691921Z // end inline asm 2026-02-21T08:54:13.6691975Z add.s32 %r916, %r718, 134144; 2026-02-21T08:54:13.6692028Z // begin inline asm 2026-02-21T08:54:13.6692144Z cp.async.cg.shared.global [ %r916 + 0 ], [ %rd185 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6692198Z // end inline asm 2026-02-21T08:54:13.6692257Z add.s32 %r918, %r718, 135168; 2026-02-21T08:54:13.6692310Z // begin inline asm 2026-02-21T08:54:13.6692424Z cp.async.cg.shared.global [ %r918 + 0 ], [ %rd186 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6692476Z // end inline asm 2026-02-21T08:54:13.6692531Z add.s32 %r920, %r718, 136192; 2026-02-21T08:54:13.6692594Z // begin inline asm 2026-02-21T08:54:13.6692700Z cp.async.cg.shared.global [ %r920 + 0 ], [ %rd187 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6692751Z // end inline asm 2026-02-21T08:54:13.6692816Z add.s32 %r922, %r718, 137216; 2026-02-21T08:54:13.6692869Z // begin inline asm 2026-02-21T08:54:13.6692976Z cp.async.cg.shared.global [ %r922 + 0 ], [ %rd188 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6693027Z // end inline asm 2026-02-21T08:54:13.6693091Z add.s32 %r924, %r718, 138240; 2026-02-21T08:54:13.6693146Z // begin inline asm 2026-02-21T08:54:13.6693275Z cp.async.cg.shared.global [ %r924 + 0 ], [ %rd189 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6693340Z // end inline asm 2026-02-21T08:54:13.6693399Z add.s32 %r926, %r718, 139264; 2026-02-21T08:54:13.6693454Z // begin inline asm 2026-02-21T08:54:13.6693562Z cp.async.cg.shared.global [ %r926 + 0 ], [ %rd190 + 0 ], 0x10, %r879; 
2026-02-21T08:54:13.6693627Z // end inline asm 2026-02-21T08:54:13.6693683Z add.s32 %r928, %r718, 140288; 2026-02-21T08:54:13.6693738Z // begin inline asm 2026-02-21T08:54:13.6693851Z cp.async.cg.shared.global [ %r928 + 0 ], [ %rd191 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6693905Z // end inline asm 2026-02-21T08:54:13.6693961Z add.s32 %r930, %r718, 141312; 2026-02-21T08:54:13.6694015Z // begin inline asm 2026-02-21T08:54:13.6694130Z cp.async.cg.shared.global [ %r930 + 0 ], [ %rd192 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6694181Z // end inline asm 2026-02-21T08:54:13.6694234Z add.s32 %r932, %r718, 142336; 2026-02-21T08:54:13.6694296Z // begin inline asm 2026-02-21T08:54:13.6694402Z cp.async.cg.shared.global [ %r932 + 0 ], [ %rd193 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6694455Z // end inline asm 2026-02-21T08:54:13.6694538Z add.s32 %r934, %r718, 143360; 2026-02-21T08:54:13.6694591Z // begin inline asm 2026-02-21T08:54:13.6694725Z cp.async.cg.shared.global [ %r934 + 0 ], [ %rd194 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6694779Z // end inline asm 2026-02-21T08:54:13.6694841Z add.s32 %r936, %r718, 144384; 2026-02-21T08:54:13.6694894Z // begin inline asm 2026-02-21T08:54:13.6695003Z cp.async.cg.shared.global [ %r936 + 0 ], [ %rd195 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6695062Z // end inline asm 2026-02-21T08:54:13.6695119Z add.s32 %r938, %r718, 145408; 2026-02-21T08:54:13.6695172Z // begin inline asm 2026-02-21T08:54:13.6695281Z cp.async.cg.shared.global [ %r938 + 0 ], [ %rd196 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6695340Z // end inline asm 2026-02-21T08:54:13.6695396Z add.s32 %r940, %r718, 146432; 2026-02-21T08:54:13.6695452Z // begin inline asm 2026-02-21T08:54:13.6695605Z cp.async.cg.shared.global [ %r940 + 0 ], [ %rd197 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6695660Z // end inline asm 2026-02-21T08:54:13.6695720Z add.s32 %r942, %r718, 147456; 2026-02-21T08:54:13.6695780Z // begin inline asm 2026-02-21T08:54:13.6695888Z cp.async.cg.shared.global [ %r942 + 0 
], [ %rd198 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6695942Z // end inline asm 2026-02-21T08:54:13.6695997Z add.s32 %r944, %r718, 148480; 2026-02-21T08:54:13.6696060Z // begin inline asm 2026-02-21T08:54:13.6696167Z cp.async.cg.shared.global [ %r944 + 0 ], [ %rd199 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6696251Z // end inline asm 2026-02-21T08:54:13.6696314Z add.s32 %r946, %r718, 149504; 2026-02-21T08:54:13.6696369Z // begin inline asm 2026-02-21T08:54:13.6696479Z cp.async.cg.shared.global [ %r946 + 0 ], [ %rd200 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6696532Z // end inline asm 2026-02-21T08:54:13.6696594Z add.s32 %r948, %r718, 150528; 2026-02-21T08:54:13.6696649Z // begin inline asm 2026-02-21T08:54:13.6696759Z cp.async.cg.shared.global [ %r948 + 0 ], [ %rd201 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6696822Z // end inline asm 2026-02-21T08:54:13.6696877Z add.s32 %r950, %r718, 151552; 2026-02-21T08:54:13.6696931Z // begin inline asm 2026-02-21T08:54:13.6697038Z cp.async.cg.shared.global [ %r950 + 0 ], [ %rd202 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6697100Z // end inline asm 2026-02-21T08:54:13.6697155Z add.s32 %r952, %r718, 152576; 2026-02-21T08:54:13.6697209Z // begin inline asm 2026-02-21T08:54:13.6697322Z cp.async.cg.shared.global [ %r952 + 0 ], [ %rd203 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6697376Z // end inline asm 2026-02-21T08:54:13.6697430Z add.s32 %r954, %r718, 153600; 2026-02-21T08:54:13.6697488Z // begin inline asm 2026-02-21T08:54:13.6697596Z cp.async.cg.shared.global [ %r954 + 0 ], [ %rd204 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6697648Z // end inline asm 2026-02-21T08:54:13.6697702Z add.s32 %r956, %r718, 154624; 2026-02-21T08:54:13.6697805Z // begin inline asm 2026-02-21T08:54:13.6697917Z cp.async.cg.shared.global [ %r956 + 0 ], [ %rd205 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6697970Z // end inline asm 2026-02-21T08:54:13.6698031Z add.s32 %r958, %r718, 155648; 2026-02-21T08:54:13.6698084Z // begin inline asm 2026-02-21T08:54:13.6698193Z 
cp.async.cg.shared.global [ %r958 + 0 ], [ %rd206 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6698245Z // end inline asm 2026-02-21T08:54:13.6698305Z add.s32 %r960, %r718, 156672; 2026-02-21T08:54:13.6698358Z // begin inline asm 2026-02-21T08:54:13.6698466Z cp.async.cg.shared.global [ %r960 + 0 ], [ %rd207 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6698526Z // end inline asm 2026-02-21T08:54:13.6698580Z add.s32 %r962, %r718, 157696; 2026-02-21T08:54:13.6698634Z // begin inline asm 2026-02-21T08:54:13.6698747Z cp.async.cg.shared.global [ %r962 + 0 ], [ %rd208 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6698800Z // end inline asm 2026-02-21T08:54:13.6698857Z add.s32 %r964, %r718, 158720; 2026-02-21T08:54:13.6698911Z // begin inline asm 2026-02-21T08:54:13.6699028Z cp.async.cg.shared.global [ %r964 + 0 ], [ %rd209 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6699112Z // end inline asm 2026-02-21T08:54:13.6699168Z add.s32 %r966, %r718, 159744; 2026-02-21T08:54:13.6699231Z // begin inline asm 2026-02-21T08:54:13.6699339Z cp.async.cg.shared.global [ %r966 + 0 ], [ %rd210 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6699392Z // end inline asm 2026-02-21T08:54:13.6699448Z add.s32 %r968, %r718, 160768; 2026-02-21T08:54:13.6699511Z // begin inline asm 2026-02-21T08:54:13.6699614Z cp.async.cg.shared.global [ %r968 + 0 ], [ %rd211 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6699668Z // end inline asm 2026-02-21T08:54:13.6699731Z add.s32 %r970, %r718, 161792; 2026-02-21T08:54:13.6699786Z // begin inline asm 2026-02-21T08:54:13.6699892Z cp.async.cg.shared.global [ %r970 + 0 ], [ %rd212 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6699945Z // end inline asm 2026-02-21T08:54:13.6700008Z add.s32 %r972, %r718, 162816; 2026-02-21T08:54:13.6700062Z // begin inline asm 2026-02-21T08:54:13.6700195Z cp.async.cg.shared.global [ %r972 + 0 ], [ %rd213 + 0 ], 0x10, %r879; 2026-02-21T08:54:13.6700257Z // end inline asm 2026-02-21T08:54:13.6700318Z cp.async.commit_group; 2026-02-21T08:54:13.6700492Z .loc 1 30 108 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6700562Z setp.gt.s32 %p45, %r6, 3; 2026-02-21T08:54:13.6700729Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6700789Z add.s64 %rd214, %rd2, 768; 2026-02-21T08:54:13.6700872Z add.s64 %rd215, %rd3, 768; 2026-02-21T08:54:13.6700938Z add.s64 %rd216, %rd4, 768; 2026-02-21T08:54:13.6700995Z add.s64 %rd217, %rd5, 768; 2026-02-21T08:54:13.6701051Z add.s64 %rd218, %rd6, 768; 2026-02-21T08:54:13.6701114Z add.s64 %rd219, %rd7, 768; 2026-02-21T08:54:13.6701168Z add.s64 %rd220, %rd8, 768; 2026-02-21T08:54:13.6701223Z add.s64 %rd221, %rd9, 768; 2026-02-21T08:54:13.6701279Z add.s64 %rd222, %rd10, 768; 2026-02-21T08:54:13.6701344Z add.s64 %rd223, %rd11, 768; 2026-02-21T08:54:13.6701401Z add.s64 %rd224, %rd12, 768; 2026-02-21T08:54:13.6701456Z add.s64 %rd225, %rd13, 768; 2026-02-21T08:54:13.6701519Z add.s64 %rd226, %rd14, 768; 2026-02-21T08:54:13.6701573Z add.s64 %rd227, %rd15, 768; 2026-02-21T08:54:13.6701627Z add.s64 %rd228, %rd16, 768; 2026-02-21T08:54:13.6701686Z add.s64 %rd229, %rd17, 768; 2026-02-21T08:54:13.6701853Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6701908Z bar.sync 0; 2026-02-21T08:54:13.6701965Z add.s32 %r974, %r1228, 491520; 2026-02-21T08:54:13.6702033Z selp.b32 %r975, 16, 0, %p45; 2026-02-21T08:54:13.6702087Z // begin inline asm 2026-02-21T08:54:13.6702194Z cp.async.cg.shared.global [ %r974 + 0 ], [ %rd214 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6702251Z // end inline asm 2026-02-21T08:54:13.6702334Z add.s32 %r976, %r1228, 492544; 2026-02-21T08:54:13.6702390Z // begin inline asm 2026-02-21T08:54:13.6702499Z cp.async.cg.shared.global [ %r976 + 0 ], [ %rd215 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6702561Z // end inline asm 2026-02-21T08:54:13.6702616Z add.s32 %r978, %r1228, 493568; 2026-02-21T08:54:13.6702670Z // begin inline asm 2026-02-21T08:54:13.6702784Z 
cp.async.cg.shared.global [ %r978 + 0 ], [ %rd216 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6702837Z // end inline asm 2026-02-21T08:54:13.6702892Z add.s32 %r980, %r1228, 494592; 2026-02-21T08:54:13.6702951Z // begin inline asm 2026-02-21T08:54:13.6703059Z cp.async.cg.shared.global [ %r980 + 0 ], [ %rd217 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6703113Z // end inline asm 2026-02-21T08:54:13.6703168Z add.s32 %r982, %r1228, 495616; 2026-02-21T08:54:13.6703229Z // begin inline asm 2026-02-21T08:54:13.6703335Z cp.async.cg.shared.global [ %r982 + 0 ], [ %rd218 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6703387Z // end inline asm 2026-02-21T08:54:13.6703451Z add.s32 %r984, %r1228, 496640; 2026-02-21T08:54:13.6703505Z // begin inline asm 2026-02-21T08:54:13.6703612Z cp.async.cg.shared.global [ %r984 + 0 ], [ %rd219 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6703692Z // end inline asm 2026-02-21T08:54:13.6703756Z add.s32 %r986, %r1228, 497664; 2026-02-21T08:54:13.6703808Z // begin inline asm 2026-02-21T08:54:13.6703915Z cp.async.cg.shared.global [ %r986 + 0 ], [ %rd220 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6703975Z // end inline asm 2026-02-21T08:54:13.6704030Z add.s32 %r988, %r1228, 498688; 2026-02-21T08:54:13.6704083Z // begin inline asm 2026-02-21T08:54:13.6704189Z cp.async.cg.shared.global [ %r988 + 0 ], [ %rd221 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6704248Z // end inline asm 2026-02-21T08:54:13.6704303Z add.s32 %r990, %r1228, 499712; 2026-02-21T08:54:13.6704358Z // begin inline asm 2026-02-21T08:54:13.6704471Z cp.async.cg.shared.global [ %r990 + 0 ], [ %rd222 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6704524Z // end inline asm 2026-02-21T08:54:13.6704581Z add.s32 %r992, %r1228, 500736; 2026-02-21T08:54:13.6704666Z // begin inline asm 2026-02-21T08:54:13.6704802Z cp.async.cg.shared.global [ %r992 + 0 ], [ %rd223 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6704858Z // end inline asm 2026-02-21T08:54:13.6704913Z add.s32 %r994, %r1228, 501760; 2026-02-21T08:54:13.6704974Z // begin 
inline asm 2026-02-21T08:54:13.6705078Z cp.async.cg.shared.global [ %r994 + 0 ], [ %rd224 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6705130Z // end inline asm 2026-02-21T08:54:13.6705193Z add.s32 %r996, %r1228, 502784; 2026-02-21T08:54:13.6705246Z // begin inline asm 2026-02-21T08:54:13.6705381Z cp.async.cg.shared.global [ %r996 + 0 ], [ %rd225 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6705433Z // end inline asm 2026-02-21T08:54:13.6705495Z add.s32 %r998, %r1228, 503808; 2026-02-21T08:54:13.6705547Z // begin inline asm 2026-02-21T08:54:13.6705655Z cp.async.cg.shared.global [ %r998 + 0 ], [ %rd226 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6705716Z // end inline asm 2026-02-21T08:54:13.6705776Z add.s32 %r1000, %r1228, 504832; 2026-02-21T08:54:13.6705833Z // begin inline asm 2026-02-21T08:54:13.6705958Z cp.async.cg.shared.global [ %r1000 + 0 ], [ %rd227 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6706011Z // end inline asm 2026-02-21T08:54:13.6706072Z add.s32 %r1002, %r1228, 505856; 2026-02-21T08:54:13.6706125Z // begin inline asm 2026-02-21T08:54:13.6706247Z cp.async.cg.shared.global [ %r1002 + 0 ], [ %rd228 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6706300Z // end inline asm 2026-02-21T08:54:13.6706359Z add.s32 %r1004, %r1228, 506880; 2026-02-21T08:54:13.6706420Z // begin inline asm 2026-02-21T08:54:13.6706532Z cp.async.cg.shared.global [ %r1004 + 0 ], [ %rd229 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6706585Z // end inline asm 2026-02-21T08:54:13.6706645Z cp.async.commit_group; 2026-02-21T08:54:13.6706825Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6706907Z add.s64 %rd230, %rd18, 768; 2026-02-21T08:54:13.6706967Z add.s64 %rd231, %rd19, 768; 2026-02-21T08:54:13.6707032Z add.s64 %rd232, %rd20, 768; 2026-02-21T08:54:13.6707089Z add.s64 %rd233, %rd21, 768; 2026-02-21T08:54:13.6707144Z add.s64 %rd234, %rd22, 768; 2026-02-21T08:54:13.6707207Z add.s64 %rd235, %rd23, 768; 2026-02-21T08:54:13.6707262Z add.s64 %rd236, %rd24, 768; 
2026-02-21T08:54:13.6707317Z add.s64 %rd237, %rd25, 768; 2026-02-21T08:54:13.6707372Z add.s64 %rd238, %rd26, 768; 2026-02-21T08:54:13.6707436Z add.s64 %rd239, %rd27, 768; 2026-02-21T08:54:13.6707490Z add.s64 %rd240, %rd28, 768; 2026-02-21T08:54:13.6707543Z add.s64 %rd241, %rd29, 768; 2026-02-21T08:54:13.6707608Z add.s64 %rd242, %rd30, 768; 2026-02-21T08:54:13.6707664Z add.s64 %rd243, %rd31, 768; 2026-02-21T08:54:13.6707721Z add.s64 %rd244, %rd32, 768; 2026-02-21T08:54:13.6707777Z add.s64 %rd245, %rd33, 768; 2026-02-21T08:54:13.6707841Z add.s64 %rd246, %rd34, 768; 2026-02-21T08:54:13.6707900Z add.s64 %rd247, %rd35, 768; 2026-02-21T08:54:13.6707957Z add.s64 %rd248, %rd36, 768; 2026-02-21T08:54:13.6708020Z add.s64 %rd249, %rd37, 768; 2026-02-21T08:54:13.6708105Z add.s64 %rd250, %rd38, 768; 2026-02-21T08:54:13.6708160Z add.s64 %rd251, %rd39, 768; 2026-02-21T08:54:13.6708216Z add.s64 %rd252, %rd40, 768; 2026-02-21T08:54:13.6708278Z add.s64 %rd253, %rd41, 768; 2026-02-21T08:54:13.6708333Z add.s64 %rd254, %rd42, 768; 2026-02-21T08:54:13.6708390Z add.s64 %rd255, %rd43, 768; 2026-02-21T08:54:13.6708451Z add.s64 %rd256, %rd44, 768; 2026-02-21T08:54:13.6708504Z add.s64 %rd257, %rd45, 768; 2026-02-21T08:54:13.6708559Z add.s64 %rd258, %rd46, 768; 2026-02-21T08:54:13.6708622Z add.s64 %rd259, %rd47, 768; 2026-02-21T08:54:13.6708677Z add.s64 %rd260, %rd48, 768; 2026-02-21T08:54:13.6708731Z add.s64 %rd261, %rd49, 768; 2026-02-21T08:54:13.6708902Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6708968Z add.s32 %r1006, %r718, 196608; 2026-02-21T08:54:13.6709023Z // begin inline asm 2026-02-21T08:54:13.6709159Z cp.async.cg.shared.global [ %r1006 + 0 ], [ %rd230 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6709223Z // end inline asm 2026-02-21T08:54:13.6709281Z add.s32 %r1008, %r718, 197632; 2026-02-21T08:54:13.6709335Z // begin inline asm 2026-02-21T08:54:13.6709445Z cp.async.cg.shared.global [ %r1008 + 0 ], [ %rd231 + 0 ], 0x10, %r975; 
2026-02-21T08:54:13.6709506Z // end inline asm 2026-02-21T08:54:13.6709562Z add.s32 %r1010, %r718, 198656; 2026-02-21T08:54:13.6709617Z // begin inline asm 2026-02-21T08:54:13.6709737Z cp.async.cg.shared.global [ %r1010 + 0 ], [ %rd232 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6709814Z // end inline asm 2026-02-21T08:54:13.6709870Z add.s32 %r1012, %r718, 199680; 2026-02-21T08:54:13.6709923Z // begin inline asm 2026-02-21T08:54:13.6710038Z cp.async.cg.shared.global [ %r1012 + 0 ], [ %rd233 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6710090Z // end inline asm 2026-02-21T08:54:13.6710148Z add.s32 %r1014, %r718, 200704; 2026-02-21T08:54:13.6710209Z // begin inline asm 2026-02-21T08:54:13.6710317Z cp.async.cg.shared.global [ %r1014 + 0 ], [ %rd234 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6710372Z // end inline asm 2026-02-21T08:54:13.6710435Z add.s32 %r1016, %r718, 201728; 2026-02-21T08:54:13.6710488Z // begin inline asm 2026-02-21T08:54:13.6710599Z cp.async.cg.shared.global [ %r1016 + 0 ], [ %rd235 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6710652Z // end inline asm 2026-02-21T08:54:13.6710716Z add.s32 %r1018, %r718, 202752; 2026-02-21T08:54:13.6710769Z // begin inline asm 2026-02-21T08:54:13.6710877Z cp.async.cg.shared.global [ %r1018 + 0 ], [ %rd236 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6710936Z // end inline asm 2026-02-21T08:54:13.6710992Z add.s32 %r1020, %r718, 203776; 2026-02-21T08:54:13.6711045Z // begin inline asm 2026-02-21T08:54:13.6711152Z cp.async.cg.shared.global [ %r1020 + 0 ], [ %rd237 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6711210Z // end inline asm 2026-02-21T08:54:13.6711288Z add.s32 %r1022, %r718, 204800; 2026-02-21T08:54:13.6711345Z // begin inline asm 2026-02-21T08:54:13.6711463Z cp.async.cg.shared.global [ %r1022 + 0 ], [ %rd238 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6711516Z // end inline asm 2026-02-21T08:54:13.6711573Z add.s32 %r1024, %r718, 205824; 2026-02-21T08:54:13.6711631Z // begin inline asm 2026-02-21T08:54:13.6711740Z 
cp.async.cg.shared.global [ %r1024 + 0 ], [ %rd239 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6711791Z // end inline asm 2026-02-21T08:54:13.6711846Z add.s32 %r1026, %r718, 206848; 2026-02-21T08:54:13.6711906Z // begin inline asm 2026-02-21T08:54:13.6712014Z cp.async.cg.shared.global [ %r1026 + 0 ], [ %rd240 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6712067Z // end inline asm 2026-02-21T08:54:13.6712127Z add.s32 %r1028, %r718, 207872; 2026-02-21T08:54:13.6712180Z // begin inline asm 2026-02-21T08:54:13.6712289Z cp.async.cg.shared.global [ %r1028 + 0 ], [ %rd241 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6712341Z // end inline asm 2026-02-21T08:54:13.6712406Z add.s32 %r1030, %r718, 208896; 2026-02-21T08:54:13.6712460Z // begin inline asm 2026-02-21T08:54:13.6712566Z cp.async.cg.shared.global [ %r1030 + 0 ], [ %rd242 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6712653Z // end inline asm 2026-02-21T08:54:13.6712708Z add.s32 %r1032, %r718, 209920; 2026-02-21T08:54:13.6712761Z // begin inline asm 2026-02-21T08:54:13.6712876Z cp.async.cg.shared.global [ %r1032 + 0 ], [ %rd243 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6712930Z // end inline asm 2026-02-21T08:54:13.6712986Z add.s32 %r1034, %r718, 210944; 2026-02-21T08:54:13.6713039Z // begin inline asm 2026-02-21T08:54:13.6713157Z cp.async.cg.shared.global [ %r1034 + 0 ], [ %rd244 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6713209Z // end inline asm 2026-02-21T08:54:13.6713264Z add.s32 %r1036, %r718, 211968; 2026-02-21T08:54:13.6713325Z // begin inline asm 2026-02-21T08:54:13.6713434Z cp.async.cg.shared.global [ %r1036 + 0 ], [ %rd245 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6713488Z // end inline asm 2026-02-21T08:54:13.6713544Z add.s32 %r1038, %r718, 212992; 2026-02-21T08:54:13.6713627Z // begin inline asm 2026-02-21T08:54:13.6713738Z cp.async.cg.shared.global [ %r1038 + 0 ], [ %rd246 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6713791Z // end inline asm 2026-02-21T08:54:13.6713853Z add.s32 %r1040, %r718, 214016; 2026-02-21T08:54:13.6713907Z 
// begin inline asm 2026-02-21T08:54:13.6714016Z cp.async.cg.shared.global [ %r1040 + 0 ], [ %rd247 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6714068Z // end inline asm 2026-02-21T08:54:13.6714132Z add.s32 %r1042, %r718, 215040; 2026-02-21T08:54:13.6714185Z // begin inline asm 2026-02-21T08:54:13.6714333Z cp.async.cg.shared.global [ %r1042 + 0 ], [ %rd248 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6714395Z // end inline asm 2026-02-21T08:54:13.6714451Z add.s32 %r1044, %r718, 216064; 2026-02-21T08:54:13.6714505Z // begin inline asm 2026-02-21T08:54:13.6714625Z cp.async.cg.shared.global [ %r1044 + 0 ], [ %rd249 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6714709Z // end inline asm 2026-02-21T08:54:13.6714768Z add.s32 %r1046, %r718, 217088; 2026-02-21T08:54:13.6714824Z // begin inline asm 2026-02-21T08:54:13.6714955Z cp.async.cg.shared.global [ %r1046 + 0 ], [ %rd250 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6729143Z // end inline asm 2026-02-21T08:54:13.6729291Z add.s32 %r1048, %r718, 218112; 2026-02-21T08:54:13.6729360Z // begin inline asm 2026-02-21T08:54:13.6729517Z cp.async.cg.shared.global [ %r1048 + 0 ], [ %rd251 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6729581Z // end inline asm 2026-02-21T08:54:13.6729649Z add.s32 %r1050, %r718, 219136; 2026-02-21T08:54:13.6729707Z // begin inline asm 2026-02-21T08:54:13.6729845Z cp.async.cg.shared.global [ %r1050 + 0 ], [ %rd252 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6729902Z // end inline asm 2026-02-21T08:54:13.6729962Z add.s32 %r1052, %r718, 220160; 2026-02-21T08:54:13.6730027Z // begin inline asm 2026-02-21T08:54:13.6730239Z cp.async.cg.shared.global [ %r1052 + 0 ], [ %rd253 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6730301Z // end inline asm 2026-02-21T08:54:13.6730373Z add.s32 %r1054, %r718, 221184; 2026-02-21T08:54:13.6730435Z // begin inline asm 2026-02-21T08:54:13.6730554Z cp.async.cg.shared.global [ %r1054 + 0 ], [ %rd254 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6730609Z // end inline asm 2026-02-21T08:54:13.6730676Z add.s32 
%r1056, %r718, 222208; 2026-02-21T08:54:13.6730732Z // begin inline asm 2026-02-21T08:54:13.6730846Z cp.async.cg.shared.global [ %r1056 + 0 ], [ %rd255 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6730909Z // end inline asm 2026-02-21T08:54:13.6730968Z add.s32 %r1058, %r718, 223232; 2026-02-21T08:54:13.6731028Z // begin inline asm 2026-02-21T08:54:13.6731142Z cp.async.cg.shared.global [ %r1058 + 0 ], [ %rd256 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6731206Z // end inline asm 2026-02-21T08:54:13.6731264Z add.s32 %r1060, %r718, 224256; 2026-02-21T08:54:13.6731321Z // begin inline asm 2026-02-21T08:54:13.6731443Z cp.async.cg.shared.global [ %r1060 + 0 ], [ %rd257 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6731499Z // end inline asm 2026-02-21T08:54:13.6731560Z add.s32 %r1062, %r718, 225280; 2026-02-21T08:54:13.6731660Z // begin inline asm 2026-02-21T08:54:13.6731781Z cp.async.cg.shared.global [ %r1062 + 0 ], [ %rd258 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6731836Z // end inline asm 2026-02-21T08:54:13.6731894Z add.s32 %r1064, %r718, 226304; 2026-02-21T08:54:13.6731959Z // begin inline asm 2026-02-21T08:54:13.6732072Z cp.async.cg.shared.global [ %r1064 + 0 ], [ %rd259 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6732126Z // end inline asm 2026-02-21T08:54:13.6732189Z add.s32 %r1066, %r718, 227328; 2026-02-21T08:54:13.6732248Z // begin inline asm 2026-02-21T08:54:13.6732361Z cp.async.cg.shared.global [ %r1066 + 0 ], [ %rd260 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6732417Z // end inline asm 2026-02-21T08:54:13.6732482Z add.s32 %r1068, %r718, 228352; 2026-02-21T08:54:13.6732538Z // begin inline asm 2026-02-21T08:54:13.6732653Z cp.async.cg.shared.global [ %r1068 + 0 ], [ %rd261 + 0 ], 0x10, %r975; 2026-02-21T08:54:13.6732714Z // end inline asm 2026-02-21T08:54:13.6732816Z cp.async.commit_group; 2026-02-21T08:54:13.6733026Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6733092Z setp.gt.s32 %p46, %r6, 4; 2026-02-21T08:54:13.6733284Z 
.loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6733349Z add.s64 %rd262, %rd2, 1024; 2026-02-21T08:54:13.6733407Z add.s64 %rd263, %rd3, 1024; 2026-02-21T08:54:13.6733471Z add.s64 %rd264, %rd4, 1024; 2026-02-21T08:54:13.6733581Z add.s64 %rd265, %rd5, 1024; 2026-02-21T08:54:13.6733647Z add.s64 %rd266, %rd6, 1024; 2026-02-21T08:54:13.6733708Z add.s64 %rd267, %rd7, 1024; 2026-02-21T08:54:13.6733763Z add.s64 %rd268, %rd8, 1024; 2026-02-21T08:54:13.6733816Z add.s64 %rd269, %rd9, 1024; 2026-02-21T08:54:13.6733876Z add.s64 %rd270, %rd10, 1024; 2026-02-21T08:54:13.6733940Z add.s64 %rd271, %rd11, 1024; 2026-02-21T08:54:13.6733995Z add.s64 %rd272, %rd12, 1024; 2026-02-21T08:54:13.6734051Z add.s64 %rd273, %rd13, 1024; 2026-02-21T08:54:13.6734112Z add.s64 %rd274, %rd14, 1024; 2026-02-21T08:54:13.6734167Z add.s64 %rd275, %rd15, 1024; 2026-02-21T08:54:13.6734222Z add.s64 %rd276, %rd16, 1024; 2026-02-21T08:54:13.6734276Z add.s64 %rd277, %rd17, 1024; 2026-02-21T08:54:13.6734450Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6734504Z bar.sync 0; 2026-02-21T08:54:13.6734563Z add.s32 %r1070, %r1228, 524288; 2026-02-21T08:54:13.6734634Z selp.b32 %r1071, 16, 0, %p46; 2026-02-21T08:54:13.6734720Z // begin inline asm 2026-02-21T08:54:13.6734845Z cp.async.cg.shared.global [ %r1070 + 0 ], [ %rd262 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6734903Z // end inline asm 2026-02-21T08:54:13.6734961Z add.s32 %r1072, %r1228, 525312; 2026-02-21T08:54:13.6735015Z // begin inline asm 2026-02-21T08:54:13.6735160Z cp.async.cg.shared.global [ %r1072 + 0 ], [ %rd263 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6735224Z // end inline asm 2026-02-21T08:54:13.6735283Z add.s32 %r1074, %r1228, 526336; 2026-02-21T08:54:13.6735338Z // begin inline asm 2026-02-21T08:54:13.6735457Z cp.async.cg.shared.global [ %r1074 + 0 ], [ %rd264 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6735510Z // end inline asm 
2026-02-21T08:54:13.6735567Z add.s32 %r1076, %r1228, 527360; 2026-02-21T08:54:13.6735622Z // begin inline asm 2026-02-21T08:54:13.6735742Z cp.async.cg.shared.global [ %r1076 + 0 ], [ %rd265 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6735794Z // end inline asm 2026-02-21T08:54:13.6735853Z add.s32 %r1078, %r1228, 528384; 2026-02-21T08:54:13.6735915Z // begin inline asm 2026-02-21T08:54:13.6736026Z cp.async.cg.shared.global [ %r1078 + 0 ], [ %rd266 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6736078Z // end inline asm 2026-02-21T08:54:13.6736143Z add.s32 %r1080, %r1228, 529408; 2026-02-21T08:54:13.6736199Z // begin inline asm 2026-02-21T08:54:13.6736310Z cp.async.cg.shared.global [ %r1080 + 0 ], [ %rd267 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6736390Z // end inline asm 2026-02-21T08:54:13.6736453Z add.s32 %r1082, %r1228, 530432; 2026-02-21T08:54:13.6736508Z // begin inline asm 2026-02-21T08:54:13.6736616Z cp.async.cg.shared.global [ %r1082 + 0 ], [ %rd268 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6736676Z // end inline asm 2026-02-21T08:54:13.6736732Z add.s32 %r1084, %r1228, 531456; 2026-02-21T08:54:13.6736786Z // begin inline asm 2026-02-21T08:54:13.6736898Z cp.async.cg.shared.global [ %r1084 + 0 ], [ %rd269 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6736963Z // end inline asm 2026-02-21T08:54:13.6737020Z add.s32 %r1086, %r1228, 532480; 2026-02-21T08:54:13.6737073Z // begin inline asm 2026-02-21T08:54:13.6737190Z cp.async.cg.shared.global [ %r1086 + 0 ], [ %rd270 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6737241Z // end inline asm 2026-02-21T08:54:13.6737296Z add.s32 %r1088, %r1228, 533504; 2026-02-21T08:54:13.6737350Z // begin inline asm 2026-02-21T08:54:13.6737501Z cp.async.cg.shared.global [ %r1088 + 0 ], [ %rd271 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6737557Z // end inline asm 2026-02-21T08:54:13.6737612Z add.s32 %r1090, %r1228, 534528; 2026-02-21T08:54:13.6737673Z // begin inline asm 2026-02-21T08:54:13.6737782Z cp.async.cg.shared.global [ %r1090 + 0 ], [ %rd272 + 0 
], 0x10, %r1071; 2026-02-21T08:54:13.6737835Z // end inline asm 2026-02-21T08:54:13.6737897Z add.s32 %r1092, %r1228, 535552; 2026-02-21T08:54:13.6737950Z // begin inline asm 2026-02-21T08:54:13.6738061Z cp.async.cg.shared.global [ %r1092 + 0 ], [ %rd273 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6738141Z // end inline asm 2026-02-21T08:54:13.6738203Z add.s32 %r1094, %r1228, 536576; 2026-02-21T08:54:13.6738257Z // begin inline asm 2026-02-21T08:54:13.6738368Z cp.async.cg.shared.global [ %r1094 + 0 ], [ %rd274 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6738427Z // end inline asm 2026-02-21T08:54:13.6738484Z add.s32 %r1096, %r1228, 537600; 2026-02-21T08:54:13.6738538Z // begin inline asm 2026-02-21T08:54:13.6738650Z cp.async.cg.shared.global [ %r1096 + 0 ], [ %rd275 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6738710Z // end inline asm 2026-02-21T08:54:13.6738766Z add.s32 %r1098, %r1228, 538624; 2026-02-21T08:54:13.6738820Z // begin inline asm 2026-02-21T08:54:13.6738937Z cp.async.cg.shared.global [ %r1098 + 0 ], [ %rd276 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6738990Z // end inline asm 2026-02-21T08:54:13.6739045Z add.s32 %r1100, %r1228, 539648; 2026-02-21T08:54:13.6739105Z // begin inline asm 2026-02-21T08:54:13.6739217Z cp.async.cg.shared.global [ %r1100 + 0 ], [ %rd277 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6739272Z // end inline asm 2026-02-21T08:54:13.6739335Z cp.async.commit_group; 2026-02-21T08:54:13.6739512Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6739571Z add.s64 %rd278, %rd18, 1024; 2026-02-21T08:54:13.6739671Z add.s64 %rd279, %rd19, 1024; 2026-02-21T08:54:13.6739737Z add.s64 %rd280, %rd20, 1024; 2026-02-21T08:54:13.6739794Z add.s64 %rd281, %rd21, 1024; 2026-02-21T08:54:13.6739850Z add.s64 %rd282, %rd22, 1024; 2026-02-21T08:54:13.6739904Z add.s64 %rd283, %rd23, 1024; 2026-02-21T08:54:13.6739966Z add.s64 %rd284, %rd24, 1024; 2026-02-21T08:54:13.6740020Z add.s64 %rd285, %rd25, 1024; 
2026-02-21T08:54:13.6740075Z add.s64 %rd286, %rd26, 1024; 2026-02-21T08:54:13.6740135Z add.s64 %rd287, %rd27, 1024; 2026-02-21T08:54:13.6740188Z add.s64 %rd288, %rd28, 1024; 2026-02-21T08:54:13.6740242Z add.s64 %rd289, %rd29, 1024; 2026-02-21T08:54:13.6740302Z add.s64 %rd290, %rd30, 1024; 2026-02-21T08:54:13.6740356Z add.s64 %rd291, %rd31, 1024; 2026-02-21T08:54:13.6740409Z add.s64 %rd292, %rd32, 1024; 2026-02-21T08:54:13.6740462Z add.s64 %rd293, %rd33, 1024; 2026-02-21T08:54:13.6740523Z add.s64 %rd294, %rd34, 1024; 2026-02-21T08:54:13.6740576Z add.s64 %rd295, %rd35, 1024; 2026-02-21T08:54:13.6740629Z add.s64 %rd296, %rd36, 1024; 2026-02-21T08:54:13.6740690Z add.s64 %rd297, %rd37, 1024; 2026-02-21T08:54:13.6740746Z add.s64 %rd298, %rd38, 1024; 2026-02-21T08:54:13.6740827Z add.s64 %rd299, %rd39, 1024; 2026-02-21T08:54:13.6740881Z add.s64 %rd300, %rd40, 1024; 2026-02-21T08:54:13.6740941Z add.s64 %rd301, %rd41, 1024; 2026-02-21T08:54:13.6740993Z add.s64 %rd302, %rd42, 1024; 2026-02-21T08:54:13.6741046Z add.s64 %rd303, %rd43, 1024; 2026-02-21T08:54:13.6741105Z add.s64 %rd304, %rd44, 1024; 2026-02-21T08:54:13.6741160Z add.s64 %rd305, %rd45, 1024; 2026-02-21T08:54:13.6741213Z add.s64 %rd306, %rd46, 1024; 2026-02-21T08:54:13.6741268Z add.s64 %rd307, %rd47, 1024; 2026-02-21T08:54:13.6741331Z add.s64 %rd308, %rd48, 1024; 2026-02-21T08:54:13.6741384Z add.s64 %rd309, %rd49, 1024; 2026-02-21T08:54:13.6741551Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6741615Z add.s32 %r1102, %r718, 262144; 2026-02-21T08:54:13.6741670Z // begin inline asm 2026-02-21T08:54:13.6741809Z cp.async.cg.shared.global [ %r1102 + 0 ], [ %rd278 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6741872Z // end inline asm 2026-02-21T08:54:13.6741929Z add.s32 %r1104, %r718, 263168; 2026-02-21T08:54:13.6741982Z // begin inline asm 2026-02-21T08:54:13.6742092Z cp.async.cg.shared.global [ %r1104 + 0 ], [ %rd279 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6742151Z // 
end inline asm 2026-02-21T08:54:13.6742207Z add.s32 %r1106, %r718, 264192; 2026-02-21T08:54:13.6742259Z // begin inline asm 2026-02-21T08:54:13.6742375Z cp.async.cg.shared.global [ %r1106 + 0 ], [ %rd280 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6742455Z // end inline asm 2026-02-21T08:54:13.6742510Z add.s32 %r1108, %r718, 265216; 2026-02-21T08:54:13.6742564Z // begin inline asm 2026-02-21T08:54:13.6742682Z cp.async.cg.shared.global [ %r1108 + 0 ], [ %rd281 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6742735Z // end inline asm 2026-02-21T08:54:13.6742791Z add.s32 %r1110, %r718, 266240; 2026-02-21T08:54:13.6742854Z // begin inline asm 2026-02-21T08:54:13.6742965Z cp.async.cg.shared.global [ %r1110 + 0 ], [ %rd282 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6743018Z // end inline asm 2026-02-21T08:54:13.6743080Z add.s32 %r1112, %r718, 267264; 2026-02-21T08:54:13.6743132Z // begin inline asm 2026-02-21T08:54:13.6743242Z cp.async.cg.shared.global [ %r1112 + 0 ], [ %rd283 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6743292Z // end inline asm 2026-02-21T08:54:13.6743357Z add.s32 %r1114, %r718, 268288; 2026-02-21T08:54:13.6743411Z // begin inline asm 2026-02-21T08:54:13.6743522Z cp.async.cg.shared.global [ %r1114 + 0 ], [ %rd284 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6743584Z // end inline asm 2026-02-21T08:54:13.6743641Z add.s32 %r1116, %r718, 269312; 2026-02-21T08:54:13.6743695Z // begin inline asm 2026-02-21T08:54:13.6743805Z cp.async.cg.shared.global [ %r1116 + 0 ], [ %rd285 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6743870Z // end inline asm 2026-02-21T08:54:13.6743948Z add.s32 %r1118, %r718, 270336; 2026-02-21T08:54:13.6744002Z // begin inline asm 2026-02-21T08:54:13.6744121Z cp.async.cg.shared.global [ %r1118 + 0 ], [ %rd286 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6744175Z // end inline asm 2026-02-21T08:54:13.6744229Z add.s32 %r1120, %r718, 271360; 2026-02-21T08:54:13.6744282Z // begin inline asm 2026-02-21T08:54:13.6744398Z cp.async.cg.shared.global [ %r1120 + 0 ], [ 
%rd287 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6744451Z // end inline asm 2026-02-21T08:54:13.6744507Z add.s32 %r1122, %r718, 272384; 2026-02-21T08:54:13.6744566Z // begin inline asm 2026-02-21T08:54:13.6744731Z cp.async.cg.shared.global [ %r1122 + 0 ], [ %rd288 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6744787Z // end inline asm 2026-02-21T08:54:13.6744850Z add.s32 %r1124, %r718, 273408; 2026-02-21T08:54:13.6744902Z // begin inline asm 2026-02-21T08:54:13.6745013Z cp.async.cg.shared.global [ %r1124 + 0 ], [ %rd289 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6745064Z // end inline asm 2026-02-21T08:54:13.6745129Z add.s32 %r1126, %r718, 274432; 2026-02-21T08:54:13.6745182Z // begin inline asm 2026-02-21T08:54:13.6745292Z cp.async.cg.shared.global [ %r1126 + 0 ], [ %rd290 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6745384Z // end inline asm 2026-02-21T08:54:13.6745439Z add.s32 %r1128, %r718, 275456; 2026-02-21T08:54:13.6745492Z // begin inline asm 2026-02-21T08:54:13.6745600Z cp.async.cg.shared.global [ %r1128 + 0 ], [ %rd291 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6745661Z // end inline asm 2026-02-21T08:54:13.6745717Z add.s32 %r1130, %r718, 276480; 2026-02-21T08:54:13.6745770Z // begin inline asm 2026-02-21T08:54:13.6745887Z cp.async.cg.shared.global [ %r1130 + 0 ], [ %rd292 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6745941Z // end inline asm 2026-02-21T08:54:13.6745998Z add.s32 %r1132, %r718, 277504; 2026-02-21T08:54:13.6746062Z // begin inline asm 2026-02-21T08:54:13.6746172Z cp.async.cg.shared.global [ %r1132 + 0 ], [ %rd293 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6746226Z // end inline asm 2026-02-21T08:54:13.6746281Z add.s32 %r1134, %r718, 278528; 2026-02-21T08:54:13.6746369Z // begin inline asm 2026-02-21T08:54:13.6746485Z cp.async.cg.shared.global [ %r1134 + 0 ], [ %rd294 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6746536Z // end inline asm 2026-02-21T08:54:13.6746599Z add.s32 %r1136, %r718, 279552; 2026-02-21T08:54:13.6746652Z // begin inline asm 
2026-02-21T08:54:13.6746761Z cp.async.cg.shared.global [ %r1136 + 0 ], [ %rd295 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6746813Z // end inline asm 2026-02-21T08:54:13.6746874Z add.s32 %r1138, %r718, 280576; 2026-02-21T08:54:13.6746927Z // begin inline asm 2026-02-21T08:54:13.6747063Z cp.async.cg.shared.global [ %r1138 + 0 ], [ %rd296 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6747122Z // end inline asm 2026-02-21T08:54:13.6747177Z add.s32 %r1140, %r718, 281600; 2026-02-21T08:54:13.6747230Z // begin inline asm 2026-02-21T08:54:13.6747340Z cp.async.cg.shared.global [ %r1140 + 0 ], [ %rd297 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6747400Z // end inline asm 2026-02-21T08:54:13.6747456Z add.s32 %r1142, %r718, 282624; 2026-02-21T08:54:13.6747511Z // begin inline asm 2026-02-21T08:54:13.6747631Z cp.async.cg.shared.global [ %r1142 + 0 ], [ %rd298 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6747683Z // end inline asm 2026-02-21T08:54:13.6747736Z add.s32 %r1144, %r718, 283648; 2026-02-21T08:54:13.6747796Z // begin inline asm 2026-02-21T08:54:13.6747906Z cp.async.cg.shared.global [ %r1144 + 0 ], [ %rd299 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6747957Z // end inline asm 2026-02-21T08:54:13.6748011Z add.s32 %r1146, %r718, 284672; 2026-02-21T08:54:13.6748072Z // begin inline asm 2026-02-21T08:54:13.6748182Z cp.async.cg.shared.global [ %r1146 + 0 ], [ %rd300 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6748234Z // end inline asm 2026-02-21T08:54:13.6748293Z add.s32 %r1148, %r718, 285696; 2026-02-21T08:54:13.6748346Z // begin inline asm 2026-02-21T08:54:13.6748482Z cp.async.cg.shared.global [ %r1148 + 0 ], [ %rd301 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6748535Z // end inline asm 2026-02-21T08:54:13.6748598Z add.s32 %r1150, %r718, 286720; 2026-02-21T08:54:13.6748652Z // begin inline asm 2026-02-21T08:54:13.6748760Z cp.async.cg.shared.global [ %r1150 + 0 ], [ %rd302 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6748818Z // end inline asm 2026-02-21T08:54:13.6748873Z add.s32 %r1152, %r718, 
287744; 2026-02-21T08:54:13.6748926Z // begin inline asm 2026-02-21T08:54:13.6749040Z cp.async.cg.shared.global [ %r1152 + 0 ], [ %rd303 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6749092Z // end inline asm 2026-02-21T08:54:13.6749146Z add.s32 %r1154, %r718, 288768; 2026-02-21T08:54:13.6749202Z // begin inline asm 2026-02-21T08:54:13.6749316Z cp.async.cg.shared.global [ %r1154 + 0 ], [ %rd304 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6749368Z // end inline asm 2026-02-21T08:54:13.6749422Z add.s32 %r1156, %r718, 289792; 2026-02-21T08:54:13.6749482Z // begin inline asm 2026-02-21T08:54:13.6749592Z cp.async.cg.shared.global [ %r1156 + 0 ], [ %rd305 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6749644Z // end inline asm 2026-02-21T08:54:13.6749701Z add.s32 %r1158, %r718, 290816; 2026-02-21T08:54:13.6749785Z // begin inline asm 2026-02-21T08:54:13.6749895Z cp.async.cg.shared.global [ %r1158 + 0 ], [ %rd306 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6749947Z // end inline asm 2026-02-21T08:54:13.6750008Z add.s32 %r1160, %r718, 291840; 2026-02-21T08:54:13.6750061Z // begin inline asm 2026-02-21T08:54:13.6750171Z cp.async.cg.shared.global [ %r1160 + 0 ], [ %rd307 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6750222Z // end inline asm 2026-02-21T08:54:13.6750284Z add.s32 %r1162, %r718, 292864; 2026-02-21T08:54:13.6750341Z // begin inline asm 2026-02-21T08:54:13.6750452Z cp.async.cg.shared.global [ %r1162 + 0 ], [ %rd308 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6750513Z // end inline asm 2026-02-21T08:54:13.6750568Z add.s32 %r1164, %r718, 293888; 2026-02-21T08:54:13.6750621Z // begin inline asm 2026-02-21T08:54:13.6750739Z cp.async.cg.shared.global [ %r1164 + 0 ], [ %rd309 + 0 ], 0x10, %r1071; 2026-02-21T08:54:13.6750793Z // end inline asm 2026-02-21T08:54:13.6750876Z cp.async.commit_group; 2026-02-21T08:54:13.6751049Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6751122Z cp.async.wait_group 8; 2026-02-21T08:54:13.6751178Z bar.sync 0; 
2026-02-21T08:54:13.6751345Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6751413Z setp.ne.b32 %p47, %r7, 0; 2026-02-21T08:54:13.6751477Z or.pred %p48, %p41, %p47; 2026-02-21T08:54:13.6751555Z @%p48 bra $L__BB0_2; 2026-02-21T08:54:13.6751606Z // %bb.1: 2026-02-21T08:54:13.6751682Z elect.sync %r1310|%p50, -1; 2026-02-21T08:54:13.6751738Z add.s32 %r1312, %r403, 393216; 2026-02-21T08:54:13.6751793Z bfe.u32 %r1313, %r1312, 4, 14; 2026-02-21T08:54:13.6751856Z cvt.u64.u32 %rd474, %r1313; 2026-02-21T08:54:13.6751930Z or.b64 %rd457, %rd474, 4611686293372403712; 2026-02-21T08:54:13.6751988Z bfe.u32 %r1314, %r403, 4, 14; 2026-02-21T08:54:13.6752052Z cvt.u64.u32 %rd475, %r1314; 2026-02-21T08:54:13.6752122Z or.b64 %rd458, %rd475, 4611686293439512576; 2026-02-21T08:54:13.6752178Z mov.b32 %r1295, 138412048; 2026-02-21T08:54:13.6752233Z mov.pred %p49, 0; 2026-02-21T08:54:13.6752296Z // begin inline asm 2026-02-21T08:54:13.6752453Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd457, %rd458, %r1295, %p49; 2026-02-21T08:54:13.6752507Z // end inline asm 2026-02-21T08:54:13.6752567Z add.s32 %r1315, %r403, 393248; 2026-02-21T08:54:13.6752623Z bfe.u32 %r1316, %r1315, 4, 14; 2026-02-21T08:54:13.6752680Z cvt.u64.u32 %rd476, %r1316; 2026-02-21T08:54:13.6752747Z or.b64 %rd459, %rd476, 4611686293372403712; 2026-02-21T08:54:13.6752808Z add.s32 %r1317, %r403, 32; 2026-02-21T08:54:13.6752863Z bfe.u32 %r1318, %r1317, 4, 14; 2026-02-21T08:54:13.6752917Z cvt.u64.u32 %rd477, %r1318; 2026-02-21T08:54:13.6753013Z or.b64 %rd460, %rd477, 4611686293439512576; 2026-02-21T08:54:13.6753069Z // begin inline asm 2026-02-21T08:54:13.6753212Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd459, %rd460, %r1295, %p51; 2026-02-21T08:54:13.6753273Z // end inline asm 2026-02-21T08:54:13.6753329Z add.s32 %r1319, %r403, 393280; 2026-02-21T08:54:13.6753383Z bfe.u32 %r1320, %r1319, 4, 14; 2026-02-21T08:54:13.6753437Z cvt.u64.u32 %rd478, 
%r1320; 2026-02-21T08:54:13.6753507Z or.b64 %rd461, %rd478, 4611686293372403712; 2026-02-21T08:54:13.6753562Z add.s32 %r1321, %r403, 64; 2026-02-21T08:54:13.6753617Z bfe.u32 %r1322, %r1321, 4, 14; 2026-02-21T08:54:13.6753678Z cvt.u64.u32 %rd479, %r1322; 2026-02-21T08:54:13.6753746Z or.b64 %rd462, %rd479, 4611686293439512576; 2026-02-21T08:54:13.6753800Z // begin inline asm 2026-02-21T08:54:13.6753943Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd461, %rd462, %r1295, %p51; 2026-02-21T08:54:13.6753997Z // end inline asm 2026-02-21T08:54:13.6754055Z add.s32 %r1323, %r403, 393312; 2026-02-21T08:54:13.6754111Z bfe.u32 %r1324, %r1323, 4, 14; 2026-02-21T08:54:13.6754176Z cvt.u64.u32 %rd480, %r1324; 2026-02-21T08:54:13.6754265Z or.b64 %rd463, %rd480, 4611686293372403712; 2026-02-21T08:54:13.6754320Z add.s32 %r1325, %r403, 96; 2026-02-21T08:54:13.6754379Z bfe.u32 %r1326, %r1325, 4, 14; 2026-02-21T08:54:13.6754436Z cvt.u64.u32 %rd481, %r1326; 2026-02-21T08:54:13.6754500Z or.b64 %rd464, %rd481, 4611686293439512576; 2026-02-21T08:54:13.6754554Z // begin inline asm 2026-02-21T08:54:13.6754731Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd463, %rd464, %r1295, %p51; 2026-02-21T08:54:13.6754785Z // end inline asm 2026-02-21T08:54:13.6754839Z add.s32 %r1327, %r403, 409600; 2026-02-21T08:54:13.6754901Z bfe.u32 %r1328, %r1327, 4, 14; 2026-02-21T08:54:13.6754956Z cvt.u64.u32 %rd482, %r1328; 2026-02-21T08:54:13.6755018Z or.b64 %rd465, %rd482, 4611686293372403712; 2026-02-21T08:54:13.6755081Z add.s32 %r1329, %r403, 32768; 2026-02-21T08:54:13.6755137Z bfe.u32 %r1330, %r1329, 4, 14; 2026-02-21T08:54:13.6755194Z cvt.u64.u32 %rd483, %r1330; 2026-02-21T08:54:13.6755285Z or.b64 %rd466, %rd483, 4611686293439512576; 2026-02-21T08:54:13.6755349Z // begin inline asm 2026-02-21T08:54:13.6755483Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd465, %rd466, %r1295, %p51; 2026-02-21T08:54:13.6755537Z // end inline asm 2026-02-21T08:54:13.6755599Z add.s32 
%r1331, %r403, 409632; 2026-02-21T08:54:13.6755654Z bfe.u32 %r1332, %r1331, 4, 14; 2026-02-21T08:54:13.6755710Z cvt.u64.u32 %rd484, %r1332; 2026-02-21T08:54:13.6755773Z or.b64 %rd467, %rd484, 4611686293372403712; 2026-02-21T08:54:13.6755874Z add.s32 %r1333, %r403, 32800; 2026-02-21T08:54:13.6755928Z bfe.u32 %r1334, %r1333, 4, 14; 2026-02-21T08:54:13.6755983Z cvt.u64.u32 %rd485, %r1334; 2026-02-21T08:54:13.6756053Z or.b64 %rd468, %rd485, 4611686293439512576; 2026-02-21T08:54:13.6756107Z // begin inline asm 2026-02-21T08:54:13.6756240Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd467, %rd468, %r1295, %p51; 2026-02-21T08:54:13.6756298Z // end inline asm 2026-02-21T08:54:13.6756354Z add.s32 %r1335, %r403, 409664; 2026-02-21T08:54:13.6756409Z bfe.u32 %r1336, %r1335, 4, 14; 2026-02-21T08:54:13.6756464Z cvt.u64.u32 %rd486, %r1336; 2026-02-21T08:54:13.6756534Z or.b64 %rd469, %rd486, 4611686293372403712; 2026-02-21T08:54:13.6756589Z add.s32 %r1337, %r403, 32832; 2026-02-21T08:54:13.6756643Z bfe.u32 %r1338, %r1337, 4, 14; 2026-02-21T08:54:13.6756704Z cvt.u64.u32 %rd487, %r1338; 2026-02-21T08:54:13.6756767Z or.b64 %rd470, %rd487, 4611686293439512576; 2026-02-21T08:54:13.6756821Z // begin inline asm 2026-02-21T08:54:13.6756962Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd469, %rd470, %r1295, %p51; 2026-02-21T08:54:13.6757014Z // end inline asm 2026-02-21T08:54:13.6757069Z add.s32 %r1339, %r403, 409696; 2026-02-21T08:54:13.6757122Z bfe.u32 %r1340, %r1339, 4, 14; 2026-02-21T08:54:13.6757183Z cvt.u64.u32 %rd488, %r1340; 2026-02-21T08:54:13.6757271Z or.b64 %rd471, %rd488, 4611686293372403712; 2026-02-21T08:54:13.6757330Z add.s32 %r1341, %r403, 32864; 2026-02-21T08:54:13.6757390Z bfe.u32 %r1342, %r1341, 4, 14; 2026-02-21T08:54:13.6757447Z cvt.u64.u32 %rd489, %r1342; 2026-02-21T08:54:13.6757512Z or.b64 %rd472, %rd489, 4611686293439512576; 2026-02-21T08:54:13.6757566Z // begin inline asm 2026-02-21T08:54:13.6757708Z @%p50 
tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd471, %rd472, %r1295, %p51; 2026-02-21T08:54:13.6757762Z // end inline asm 2026-02-21T08:54:13.6757818Z add.s32 %r1343, %r403, 655360; 2026-02-21T08:54:13.6757888Z cvt.u64.u32 %rd473, %r1343; 2026-02-21T08:54:13.6757947Z // begin inline asm 2026-02-21T08:54:13.6758072Z @%p50 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd473]; 2026-02-21T08:54:13.6758132Z // end inline asm 2026-02-21T08:54:13.6758185Z $L__BB0_2: 2026-02-21T08:54:13.6758367Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6758430Z setp.gt.s32 %p66, %r6, 5; 2026-02-21T08:54:13.6758607Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6758692Z add.s64 %rd490, %rd2, 1280; 2026-02-21T08:54:13.6758747Z add.s64 %rd491, %rd3, 1280; 2026-02-21T08:54:13.6758810Z add.s64 %rd492, %rd4, 1280; 2026-02-21T08:54:13.6758866Z add.s64 %rd493, %rd5, 1280; 2026-02-21T08:54:13.6758921Z add.s64 %rd494, %rd6, 1280; 2026-02-21T08:54:13.6758983Z add.s64 %rd495, %rd7, 1280; 2026-02-21T08:54:13.6759036Z add.s64 %rd496, %rd8, 1280; 2026-02-21T08:54:13.6759090Z add.s64 %rd497, %rd9, 1280; 2026-02-21T08:54:13.6759149Z add.s64 %rd498, %rd10, 1280; 2026-02-21T08:54:13.6759213Z add.s64 %rd499, %rd11, 1280; 2026-02-21T08:54:13.6759270Z add.s64 %rd500, %rd12, 1280; 2026-02-21T08:54:13.6759324Z add.s64 %rd501, %rd13, 1280; 2026-02-21T08:54:13.6759387Z add.s64 %rd502, %rd14, 1280; 2026-02-21T08:54:13.6759441Z add.s64 %rd503, %rd15, 1280; 2026-02-21T08:54:13.6759497Z add.s64 %rd504, %rd16, 1280; 2026-02-21T08:54:13.6759550Z add.s64 %rd505, %rd17, 1280; 2026-02-21T08:54:13.6759753Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6759808Z bar.sync 0; 2026-02-21T08:54:13.6759870Z add.s32 %r1344, %r1228, 557056; 2026-02-21T08:54:13.6759938Z selp.b32 %r1345, 16, 0, %p66; 2026-02-21T08:54:13.6759993Z // begin inline asm 
2026-02-21T08:54:13.6760110Z cp.async.cg.shared.global [ %r1344 + 0 ], [ %rd490 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6760171Z // end inline asm 2026-02-21T08:54:13.6760231Z add.s32 %r1346, %r1228, 558080; 2026-02-21T08:54:13.6760306Z // begin inline asm 2026-02-21T08:54:13.6760422Z cp.async.cg.shared.global [ %r1346 + 0 ], [ %rd491 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6760482Z // end inline asm 2026-02-21T08:54:13.6760538Z add.s32 %r1348, %r1228, 559104; 2026-02-21T08:54:13.6760591Z // begin inline asm 2026-02-21T08:54:13.6760709Z cp.async.cg.shared.global [ %r1348 + 0 ], [ %rd492 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6760761Z // end inline asm 2026-02-21T08:54:13.6760818Z add.s32 %r1350, %r1228, 560128; 2026-02-21T08:54:13.6760874Z // begin inline asm 2026-02-21T08:54:13.6760990Z cp.async.cg.shared.global [ %r1350 + 0 ], [ %rd493 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6761043Z // end inline asm 2026-02-21T08:54:13.6763171Z add.s32 %r1352, %r1228, 561152; 2026-02-21T08:54:13.6763229Z // begin inline asm 2026-02-21T08:54:13.6763356Z cp.async.cg.shared.global [ %r1352 + 0 ], [ %rd494 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6763412Z // end inline asm 2026-02-21T08:54:13.6763470Z add.s32 %r1354, %r1228, 562176; 2026-02-21T08:54:13.6763534Z // begin inline asm 2026-02-21T08:54:13.6763648Z cp.async.cg.shared.global [ %r1354 + 0 ], [ %rd495 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6763701Z // end inline asm 2026-02-21T08:54:13.6763757Z add.s32 %r1356, %r1228, 563200; 2026-02-21T08:54:13.6763820Z // begin inline asm 2026-02-21T08:54:13.6763962Z cp.async.cg.shared.global [ %r1356 + 0 ], [ %rd496 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6764020Z // end inline asm 2026-02-21T08:54:13.6764083Z add.s32 %r1358, %r1228, 564224; 2026-02-21T08:54:13.6764137Z // begin inline asm 2026-02-21T08:54:13.6764247Z cp.async.cg.shared.global [ %r1358 + 0 ], [ %rd497 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6764298Z // end inline asm 2026-02-21T08:54:13.6764398Z add.s32 %r1360, 
%r1228, 565248; 2026-02-21T08:54:13.6764453Z // begin inline asm 2026-02-21T08:54:13.6764572Z cp.async.cg.shared.global [ %r1360 + 0 ], [ %rd498 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6764625Z // end inline asm 2026-02-21T08:54:13.6764730Z add.s32 %r1362, %r1228, 566272; 2026-02-21T08:54:13.6764791Z // begin inline asm 2026-02-21T08:54:13.6764902Z cp.async.cg.shared.global [ %r1362 + 0 ], [ %rd499 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6764954Z // end inline asm 2026-02-21T08:54:13.6765011Z add.s32 %r1364, %r1228, 567296; 2026-02-21T08:54:13.6765075Z // begin inline asm 2026-02-21T08:54:13.6765187Z cp.async.cg.shared.global [ %r1364 + 0 ], [ %rd500 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6765243Z // end inline asm 2026-02-21T08:54:13.6765341Z add.s32 %r1366, %r1228, 568320; 2026-02-21T08:54:13.6765395Z // begin inline asm 2026-02-21T08:54:13.6765506Z cp.async.cg.shared.global [ %r1366 + 0 ], [ %rd501 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6765569Z // end inline asm 2026-02-21T08:54:13.6765624Z add.s32 %r1368, %r1228, 569344; 2026-02-21T08:54:13.6765678Z // begin inline asm 2026-02-21T08:54:13.6765788Z cp.async.cg.shared.global [ %r1368 + 0 ], [ %rd502 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6765848Z // end inline asm 2026-02-21T08:54:13.6765904Z add.s32 %r1370, %r1228, 570368; 2026-02-21T08:54:13.6765958Z // begin inline asm 2026-02-21T08:54:13.6766074Z cp.async.cg.shared.global [ %r1370 + 0 ], [ %rd503 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6766126Z // end inline asm 2026-02-21T08:54:13.6766182Z add.s32 %r1372, %r1228, 571392; 2026-02-21T08:54:13.6766238Z // begin inline asm 2026-02-21T08:54:13.6766384Z cp.async.cg.shared.global [ %r1372 + 0 ], [ %rd504 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6766441Z // end inline asm 2026-02-21T08:54:13.6766499Z add.s32 %r1374, %r1228, 572416; 2026-02-21T08:54:13.6766562Z // begin inline asm 2026-02-21T08:54:13.6766670Z cp.async.cg.shared.global [ %r1374 + 0 ], [ %rd505 + 0 ], 0x10, %r1345; 
2026-02-21T08:54:13.6766725Z // end inline asm 2026-02-21T08:54:13.6766793Z cp.async.commit_group; 2026-02-21T08:54:13.6766971Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6767031Z add.s64 %rd506, %rd18, 1280; 2026-02-21T08:54:13.6767089Z add.s64 %rd507, %rd19, 1280; 2026-02-21T08:54:13.6767156Z add.s64 %rd508, %rd20, 1280; 2026-02-21T08:54:13.6767212Z add.s64 %rd509, %rd21, 1280; 2026-02-21T08:54:13.6767269Z add.s64 %rd510, %rd22, 1280; 2026-02-21T08:54:13.6767333Z add.s64 %rd511, %rd23, 1280; 2026-02-21T08:54:13.6767391Z add.s64 %rd512, %rd24, 1280; 2026-02-21T08:54:13.6767449Z add.s64 %rd513, %rd25, 1280; 2026-02-21T08:54:13.6767508Z add.s64 %rd514, %rd26, 1280; 2026-02-21T08:54:13.6767575Z add.s64 %rd515, %rd27, 1280; 2026-02-21T08:54:13.6767630Z add.s64 %rd516, %rd28, 1280; 2026-02-21T08:54:13.6767685Z add.s64 %rd517, %rd29, 1280; 2026-02-21T08:54:13.6767750Z add.s64 %rd518, %rd30, 1280; 2026-02-21T08:54:13.6767881Z add.s64 %rd519, %rd31, 1280; 2026-02-21T08:54:13.6767935Z add.s64 %rd520, %rd32, 1280; 2026-02-21T08:54:13.6767989Z add.s64 %rd521, %rd33, 1280; 2026-02-21T08:54:13.6768050Z add.s64 %rd522, %rd34, 1280; 2026-02-21T08:54:13.6768103Z add.s64 %rd523, %rd35, 1280; 2026-02-21T08:54:13.6768158Z add.s64 %rd524, %rd36, 1280; 2026-02-21T08:54:13.6768220Z add.s64 %rd525, %rd37, 1280; 2026-02-21T08:54:13.6768274Z add.s64 %rd526, %rd38, 1280; 2026-02-21T08:54:13.6768328Z add.s64 %rd527, %rd39, 1280; 2026-02-21T08:54:13.6768390Z add.s64 %rd528, %rd40, 1280; 2026-02-21T08:54:13.6768470Z add.s64 %rd529, %rd41, 1280; 2026-02-21T08:54:13.6768526Z add.s64 %rd530, %rd42, 1280; 2026-02-21T08:54:13.6768585Z add.s64 %rd531, %rd43, 1280; 2026-02-21T08:54:13.6768652Z add.s64 %rd532, %rd44, 1280; 2026-02-21T08:54:13.6768708Z add.s64 %rd533, %rd45, 1280; 2026-02-21T08:54:13.6768764Z add.s64 %rd534, %rd46, 1280; 2026-02-21T08:54:13.6768826Z add.s64 %rd535, %rd47, 1280; 2026-02-21T08:54:13.6768884Z add.s64 %rd536, %rd48, 
1280; 2026-02-21T08:54:13.6768941Z add.s64 %rd537, %rd49, 1280; 2026-02-21T08:54:13.6769124Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6769194Z add.s32 %r1376, %r718, 327680; 2026-02-21T08:54:13.6769251Z // begin inline asm 2026-02-21T08:54:13.6769367Z cp.async.cg.shared.global [ %r1376 + 0 ], [ %rd506 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6769432Z // end inline asm 2026-02-21T08:54:13.6769493Z add.s32 %r1378, %r718, 328704; 2026-02-21T08:54:13.6769549Z // begin inline asm 2026-02-21T08:54:13.6769673Z cp.async.cg.shared.global [ %r1378 + 0 ], [ %rd507 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6769732Z // end inline asm 2026-02-21T08:54:13.6769815Z add.s32 %r1380, %r718, 329728; 2026-02-21T08:54:13.6769872Z // begin inline asm 2026-02-21T08:54:13.6769996Z cp.async.cg.shared.global [ %r1380 + 0 ], [ %rd508 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6770051Z // end inline asm 2026-02-21T08:54:13.6770111Z add.s32 %r1382, %r718, 330752; 2026-02-21T08:54:13.6770173Z // begin inline asm 2026-02-21T08:54:13.6770288Z cp.async.cg.shared.global [ %r1382 + 0 ], [ %rd509 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6770342Z // end inline asm 2026-02-21T08:54:13.6770399Z add.s32 %r1384, %r718, 331776; 2026-02-21T08:54:13.6770464Z // begin inline asm 2026-02-21T08:54:13.6770578Z cp.async.cg.shared.global [ %r1384 + 0 ], [ %rd510 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6770632Z // end inline asm 2026-02-21T08:54:13.6770696Z add.s32 %r1386, %r718, 332800; 2026-02-21T08:54:13.6770752Z // begin inline asm 2026-02-21T08:54:13.6770890Z cp.async.cg.shared.global [ %r1386 + 0 ], [ %rd511 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6770953Z // end inline asm 2026-02-21T08:54:13.6771012Z add.s32 %r1388, %r718, 333824; 2026-02-21T08:54:13.6771068Z // begin inline asm 2026-02-21T08:54:13.6771182Z cp.async.cg.shared.global [ %r1388 + 0 ], [ %rd512 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6771245Z // end inline asm 2026-02-21T08:54:13.6771302Z 
add.s32 %r1390, %r718, 334848; 2026-02-21T08:54:13.6771357Z // begin inline asm 2026-02-21T08:54:13.6771477Z cp.async.cg.shared.global [ %r1390 + 0 ], [ %rd513 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6771532Z // end inline asm 2026-02-21T08:54:13.6771589Z add.s32 %r1392, %r718, 335872; 2026-02-21T08:54:13.6771643Z // begin inline asm 2026-02-21T08:54:13.6771763Z cp.async.cg.shared.global [ %r1392 + 0 ], [ %rd514 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6771816Z // end inline asm 2026-02-21T08:54:13.6771873Z add.s32 %r1394, %r718, 336896; 2026-02-21T08:54:13.6771934Z // begin inline asm 2026-02-21T08:54:13.6772051Z cp.async.cg.shared.global [ %r1394 + 0 ], [ %rd515 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6772106Z // end inline asm 2026-02-21T08:54:13.6772165Z add.s32 %r1396, %r718, 337920; 2026-02-21T08:54:13.6772226Z // begin inline asm 2026-02-21T08:54:13.6772339Z cp.async.cg.shared.global [ %r1396 + 0 ], [ %rd516 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6772426Z // end inline asm 2026-02-21T08:54:13.6772492Z add.s32 %r1398, %r718, 338944; 2026-02-21T08:54:13.6772549Z // begin inline asm 2026-02-21T08:54:13.6772663Z cp.async.cg.shared.global [ %r1398 + 0 ], [ %rd517 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6772723Z // end inline asm 2026-02-21T08:54:13.6772782Z add.s32 %r1400, %r718, 339968; 2026-02-21T08:54:13.6772837Z // begin inline asm 2026-02-21T08:54:13.6772951Z cp.async.cg.shared.global [ %r1400 + 0 ], [ %rd518 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6773013Z // end inline asm 2026-02-21T08:54:13.6773091Z add.s32 %r1402, %r718, 340992; 2026-02-21T08:54:13.6773149Z // begin inline asm 2026-02-21T08:54:13.6773271Z cp.async.cg.shared.global [ %r1402 + 0 ], [ %rd519 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6773328Z // end inline asm 2026-02-21T08:54:13.6773386Z add.s32 %r1404, %r718, 342016; 2026-02-21T08:54:13.6773443Z // begin inline asm 2026-02-21T08:54:13.6773567Z cp.async.cg.shared.global [ %r1404 + 0 ], [ %rd520 + 0 ], 0x10, %r1345; 
2026-02-21T08:54:13.6773624Z // end inline asm 2026-02-21T08:54:13.6773682Z add.s32 %r1406, %r718, 343040; 2026-02-21T08:54:13.6773746Z // begin inline asm 2026-02-21T08:54:13.6773862Z cp.async.cg.shared.global [ %r1406 + 0 ], [ %rd521 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6773916Z // end inline asm 2026-02-21T08:54:13.6773975Z add.s32 %r1408, %r718, 344064; 2026-02-21T08:54:13.6774038Z // begin inline asm 2026-02-21T08:54:13.6774153Z cp.async.cg.shared.global [ %r1408 + 0 ], [ %rd522 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6774207Z // end inline asm 2026-02-21T08:54:13.6774275Z add.s32 %r1410, %r718, 345088; 2026-02-21T08:54:13.6774331Z // begin inline asm 2026-02-21T08:54:13.6774450Z cp.async.cg.shared.global [ %r1410 + 0 ], [ %rd523 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6774554Z // end inline asm 2026-02-21T08:54:13.6774614Z add.s32 %r1412, %r718, 346112; 2026-02-21T08:54:13.6774706Z // begin inline asm 2026-02-21T08:54:13.6774822Z cp.async.cg.shared.global [ %r1412 + 0 ], [ %rd524 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6774891Z // end inline asm 2026-02-21T08:54:13.6774950Z add.s32 %r1414, %r718, 347136; 2026-02-21T08:54:13.6775009Z // begin inline asm 2026-02-21T08:54:13.6775136Z cp.async.cg.shared.global [ %r1414 + 0 ], [ %rd525 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6775195Z // end inline asm 2026-02-21T08:54:13.6775255Z add.s32 %r1416, %r718, 348160; 2026-02-21T08:54:13.6775313Z // begin inline asm 2026-02-21T08:54:13.6775435Z cp.async.cg.shared.global [ %r1416 + 0 ], [ %rd526 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6775489Z // end inline asm 2026-02-21T08:54:13.6775549Z add.s32 %r1418, %r718, 349184; 2026-02-21T08:54:13.6775611Z // begin inline asm 2026-02-21T08:54:13.6775753Z cp.async.cg.shared.global [ %r1418 + 0 ], [ %rd527 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6775813Z // end inline asm 2026-02-21T08:54:13.6775883Z add.s32 %r1420, %r718, 350208; 2026-02-21T08:54:13.6775941Z // begin inline asm 2026-02-21T08:54:13.6776061Z 
cp.async.cg.shared.global [ %r1420 + 0 ], [ %rd528 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6776117Z // end inline asm 2026-02-21T08:54:13.6776183Z add.s32 %r1422, %r718, 351232; 2026-02-21T08:54:13.6776240Z // begin inline asm 2026-02-21T08:54:13.6776355Z cp.async.cg.shared.global [ %r1422 + 0 ], [ %rd529 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6776419Z // end inline asm 2026-02-21T08:54:13.6776478Z add.s32 %r1424, %r718, 352256; 2026-02-21T08:54:13.6776535Z // begin inline asm 2026-02-21T08:54:13.6776649Z cp.async.cg.shared.global [ %r1424 + 0 ], [ %rd530 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6776714Z // end inline asm 2026-02-21T08:54:13.6776773Z add.s32 %r1426, %r718, 353280; 2026-02-21T08:54:13.6776839Z // begin inline asm 2026-02-21T08:54:13.6776960Z cp.async.cg.shared.global [ %r1426 + 0 ], [ %rd531 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6777013Z // end inline asm 2026-02-21T08:54:13.6777069Z add.s32 %r1428, %r718, 354304; 2026-02-21T08:54:13.6777123Z // begin inline asm 2026-02-21T08:54:13.6777270Z cp.async.cg.shared.global [ %r1428 + 0 ], [ %rd532 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6777323Z // end inline asm 2026-02-21T08:54:13.6777381Z add.s32 %r1430, %r718, 355328; 2026-02-21T08:54:13.6777442Z // begin inline asm 2026-02-21T08:54:13.6777551Z cp.async.cg.shared.global [ %r1430 + 0 ], [ %rd533 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6777603Z // end inline asm 2026-02-21T08:54:13.6777666Z add.s32 %r1432, %r718, 356352; 2026-02-21T08:54:13.6777719Z // begin inline asm 2026-02-21T08:54:13.6777828Z cp.async.cg.shared.global [ %r1432 + 0 ], [ %rd534 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6777907Z // end inline asm 2026-02-21T08:54:13.6777972Z add.s32 %r1434, %r718, 357376; 2026-02-21T08:54:13.6778028Z // begin inline asm 2026-02-21T08:54:13.6778138Z cp.async.cg.shared.global [ %r1434 + 0 ], [ %rd535 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6778194Z // end inline asm 2026-02-21T08:54:13.6778250Z add.s32 %r1436, %r718, 358400; 
2026-02-21T08:54:13.6778305Z // begin inline asm 2026-02-21T08:54:13.6778413Z cp.async.cg.shared.global [ %r1436 + 0 ], [ %rd536 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6778471Z // end inline asm 2026-02-21T08:54:13.6778527Z add.s32 %r1438, %r718, 359424; 2026-02-21T08:54:13.6778579Z // begin inline asm 2026-02-21T08:54:13.6778695Z cp.async.cg.shared.global [ %r1438 + 0 ], [ %rd537 + 0 ], 0x10, %r1345; 2026-02-21T08:54:13.6778747Z // end inline asm 2026-02-21T08:54:13.6778807Z cp.async.commit_group; 2026-02-21T08:54:13.6778993Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6779053Z sub.s32 %r175, 15, %r5; 2026-02-21T08:54:13.6779113Z setp.lt.s32 %p67, %r175, 1; 2026-02-21T08:54:13.6779171Z @%p67 bra $L__BB0_11; 2026-02-21T08:54:13.6779282Z // %bb.3: // %.lr.ph 2026-02-21T08:54:13.6779452Z .loc 1 0 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:0:108 2026-02-21T08:54:13.6779511Z sub.s32 %r176, 10, %r5; 2026-02-21T08:54:13.6779574Z shl.b32 %r1453, %r1, 7; 2026-02-21T08:54:13.6779632Z and.b32 %r1454, %r1453, 16256; 2026-02-21T08:54:13.6779689Z or.b32 %r1455, %r1454, %r172; 2026-02-21T08:54:13.6779744Z add.s32 %r1457, %r403, 589824; 2026-02-21T08:54:13.6779807Z add.s32 %r177, %r1457, %r1455; 2026-02-21T08:54:13.6779864Z xor.b32 %r1458, %r1455, 16; 2026-02-21T08:54:13.6779919Z add.s32 %r178, %r1457, %r1458; 2026-02-21T08:54:13.6779982Z xor.b32 %r1459, %r1455, 32; 2026-02-21T08:54:13.6780037Z add.s32 %r179, %r1457, %r1459; 2026-02-21T08:54:13.6780094Z xor.b32 %r1460, %r1455, 48; 2026-02-21T08:54:13.6780157Z add.s32 %r180, %r1457, %r1460; 2026-02-21T08:54:13.6780233Z xor.b32 %r1461, %r1455, 64; 2026-02-21T08:54:13.6780291Z add.s32 %r181, %r1457, %r1461; 2026-02-21T08:54:13.6780344Z xor.b32 %r1462, %r1455, 80; 2026-02-21T08:54:13.6780405Z add.s32 %r182, %r1457, %r1462; 2026-02-21T08:54:13.6780458Z xor.b32 %r1463, %r1455, 96; 2026-02-21T08:54:13.6780514Z add.s32 %r183, %r1457, %r1463; 
2026-02-21T08:54:13.6780576Z xor.b32 %r1464, %r1455, 112; 2026-02-21T08:54:13.6780631Z add.s32 %r184, %r1457, %r1464; 2026-02-21T08:54:13.6780686Z shl.b32 %r1465, %r8, 14; 2026-02-21T08:54:13.6780741Z add.s32 %r2018, %r1457, %r1465; 2026-02-21T08:54:13.6780803Z shl.b32 %r186, %r8, 6; 2026-02-21T08:54:13.6780858Z add.s32 %r2424, %r403, 655360; 2026-02-21T08:54:13.6780920Z mov.pred %p102, -1; 2026-02-21T08:54:13.6780979Z mov.b32 %r2426, 5; 2026-02-21T08:54:13.6781033Z mov.b32 %r2423, 0; 2026-02-21T08:54:13.6781087Z mov.b32 %r2422, 640; 2026-02-21T08:54:13.6781141Z mov.b32 %r2421, 1; 2026-02-21T08:54:13.6781201Z mov.b32 %r2420, 2; 2026-02-21T08:54:13.6781254Z mov.b32 %r2419, 3; 2026-02-21T08:54:13.6781305Z mov.b32 %r2418, 4; 2026-02-21T08:54:13.6781369Z mov.b32 %r2411, %r2410; 2026-02-21T08:54:13.6781426Z mov.b32 %r2412, %r2410; 2026-02-21T08:54:13.6781481Z mov.b32 %r2413, %r2410; 2026-02-21T08:54:13.6781535Z mov.b32 %r2415, %r2414; 2026-02-21T08:54:13.6781623Z mov.b32 %r2416, %r2414; 2026-02-21T08:54:13.6781679Z mov.b32 %r2417, %r2414; 2026-02-21T08:54:13.6781735Z mov.b32 %r2425, %r2423; 2026-02-21T08:54:13.6781799Z mov.b32 %r2427, %r2421; 2026-02-21T08:54:13.6781853Z mov.b32 %r2428, %r2423; 2026-02-21T08:54:13.6781907Z mov.b32 %r2435, %r2410; 2026-02-21T08:54:13.6781961Z mov.b32 %r2446, %r2414; 2026-02-21T08:54:13.6782021Z mov.b32 %r2448, %r2426; 2026-02-21T08:54:13.6782074Z mov.b32 %r2449, %r2423; 2026-02-21T08:54:13.6782127Z mov.b32 %r2482, %r2446; 2026-02-21T08:54:13.6782188Z mov.b32 %r2493, %r2435; 2026-02-21T08:54:13.6782280Z bra.uni $L__BB0_4; 2026-02-21T08:54:13.6782384Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6782456Z selp.b32 %r2427, 0, %r1635, %p92; 2026-02-21T08:54:13.6782516Z selp.b32 %r1636, 1, 0, %p92; 2026-02-21T08:54:13.6782574Z xor.b32 %r2428, %r2534, %r1636; 2026-02-21T08:54:13.6782744Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6782811Z add.s32 %r2449, %r2449, 1; 
2026-02-21T08:54:13.6782875Z setp.lt.s32 %p97, %r2449, %r175; 2026-02-21T08:54:13.6782929Z mov.b32 %r2410, %r2435; 2026-02-21T08:54:13.6782991Z mov.b32 %r2413, %r221; 2026-02-21T08:54:13.6783045Z mov.b32 %r2414, %r2446; 2026-02-21T08:54:13.6783098Z mov.b32 %r2417, %r225; 2026-02-21T08:54:13.6783150Z mov.b32 %r2418, %r2448; 2026-02-21T08:54:13.6783210Z mov.b32 %r2421, %r229; 2026-02-21T08:54:13.6783263Z mov.b32 %r2423, %r2534; 2026-02-21T08:54:13.6783315Z mov.b32 %r2424, %r2533; 2026-02-21T08:54:13.6783376Z mov.b32 %r2435, %r2493; 2026-02-21T08:54:13.6783429Z mov.b32 %r2446, %r2482; 2026-02-21T08:54:13.6783482Z mov.b32 %r2448, %r291; 2026-02-21T08:54:13.6783569Z @%p97 bra $L__BB0_4; 2026-02-21T08:54:13.6783630Z bra.uni $L__BB0_11; 2026-02-21T08:54:13.6783732Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:54:13.6783903Z .loc 1 0 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:0:108 2026-02-21T08:54:13.6783967Z mov.b32 %r2534, %r2428; 2026-02-21T08:54:13.6784022Z mov.b32 %r229, %r2420; 2026-02-21T08:54:13.6784076Z mov.b32 %r2420, %r2419; 2026-02-21T08:54:13.6784135Z mov.b32 %r2419, %r2418; 2026-02-21T08:54:13.6784188Z mov.b32 %r225, %r2416; 2026-02-21T08:54:13.6784242Z mov.b32 %r2416, %r2415; 2026-02-21T08:54:13.6784294Z mov.b32 %r2415, %r2414; 2026-02-21T08:54:13.6784356Z mov.b32 %r221, %r2412; 2026-02-21T08:54:13.6784410Z mov.b32 %r2412, %r2411; 2026-02-21T08:54:13.6784464Z mov.b32 %r2411, %r2410; 2026-02-21T08:54:13.6784666Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6784760Z add.s32 %r1466, %r2448, 1; 2026-02-21T08:54:13.6784820Z setp.eq.b32 %p69, %r2448, 15; 2026-02-21T08:54:13.6784882Z selp.b32 %r291, 0, %r1466, %p69; 2026-02-21T08:54:13.6784949Z setp.ne.b32 %p70, %r291, 0; 2026-02-21T08:54:13.6785005Z @%p70 bra $L__BB0_6; 2026-02-21T08:54:13.6785099Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6785163Z add.s32 %r2500, %r2500, 1; 
2026-02-21T08:54:13.6785331Z .loc 1 36 35 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:36:35 2026-02-21T08:54:13.6785395Z mul.hi.s32 %r1467, %r2500, 715827883; 2026-02-21T08:54:13.6785457Z shr.u32 %r1468, %r1467, 31; 2026-02-21T08:54:13.6785512Z shr.s32 %r1469, %r1467, 7; 2026-02-21T08:54:13.6785568Z add.s32 %r1470, %r1469, %r1468; 2026-02-21T08:54:13.6785734Z .loc 1 37 33 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:37:33 2026-02-21T08:54:13.6785795Z shl.b32 %r1471, %r1470, 4; 2026-02-21T08:54:13.6785954Z .loc 1 38 39 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:38:39 2026-02-21T08:54:13.6786009Z sub.s32 %r1472, 16, %r1471; 2026-02-21T08:54:13.6786177Z .loc 1 38 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:38:52 2026-02-21T08:54:13.6786263Z min.s32 %r1473, %r1472, 16; 2026-02-21T08:54:13.6786432Z .loc 1 39 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:45 2026-02-21T08:54:13.6786498Z mul.lo.s32 %r1474, %r1470, 768; 2026-02-21T08:54:13.6786554Z sub.s32 %r1475, %r2500, %r1474; 2026-02-21T08:54:13.6786720Z .loc 1 40 51 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:40:51 2026-02-21T08:54:13.6786784Z div.s32 %r1476, %r1475, %r1473; 2026-02-21T08:54:13.6786976Z .loc 1 39 64 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:64 2026-02-21T08:54:13.6787040Z mul.lo.s32 %r1477, %r1476, %r1473; 2026-02-21T08:54:13.6787097Z sub.s32 %r1478, %r1475, %r1477; 2026-02-21T08:54:13.6787268Z .loc 1 39 30 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:39:30 2026-02-21T08:54:13.6787325Z add.s32 %r1479, %r1478, %r1471; 2026-02-21T08:54:13.6787486Z .loc 1 41 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:41:27 2026-02-21T08:54:13.6787549Z shl.b32 %r2493, %r1476, 8; 2026-02-21T08:54:13.6787604Z shl.b32 %r2482, %r1479, 7; 2026-02-21T08:54:13.6787767Z .loc 1 42 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:42:32 2026-02-21T08:54:13.6787830Z or.b32 
%r2483, %r2482, %r107; 2026-02-21T08:54:13.6787886Z or.b32 %r2484, %r2482, %r108; 2026-02-21T08:54:13.6787942Z or.b32 %r2485, %r2482, %r31; 2026-02-21T08:54:13.6787997Z or.b32 %r2486, %r2482, %r110; 2026-02-21T08:54:13.6788059Z or.b32 %r2487, %r2482, %r111; 2026-02-21T08:54:13.6788115Z or.b32 %r2488, %r2482, %r34; 2026-02-21T08:54:13.6788204Z or.b32 %r2489, %r2482, %r35; 2026-02-21T08:54:13.6788265Z or.b32 %r2490, %r2482, %r114; 2026-02-21T08:54:13.6788319Z or.b32 %r2491, %r2482, %r115; 2026-02-21T08:54:13.6788373Z or.b32 %r2492, %r2482, %r116; 2026-02-21T08:54:13.6788428Z or.b32 %r2510, %r2493, %r110; 2026-02-21T08:54:13.6788492Z or.b32 %r2509, %r2493, %r31; 2026-02-21T08:54:13.6788546Z or.b32 %r2508, %r2493, %r108; 2026-02-21T08:54:13.6788599Z or.b32 %r2507, %r2493, %r107; 2026-02-21T08:54:13.6788661Z or.b32 %r2511, %r2493, %r111; 2026-02-21T08:54:13.6788716Z or.b32 %r2512, %r2493, %r34; 2026-02-21T08:54:13.6788769Z or.b32 %r2513, %r2493, %r35; 2026-02-21T08:54:13.6788829Z or.b32 %r2506, %r2482, %r106; 2026-02-21T08:54:13.6788890Z or.b32 %r2505, %r2482, %r73; 2026-02-21T08:54:13.6788952Z or.b32 %r2504, %r2482, %r104; 2026-02-21T08:54:13.6789013Z or.b32 %r2503, %r2482, %r25; 2026-02-21T08:54:13.6789067Z or.b32 %r2502, %r2482, %r24; 2026-02-21T08:54:13.6789167Z or.b32 %r2501, %r2482, %r101; 2026-02-21T08:54:13.6789232Z or.b32 %r2514, %r2493, %r114; 2026-02-21T08:54:13.6789286Z or.b32 %r2515, %r2493, %r115; 2026-02-21T08:54:13.6789341Z or.b32 %r2516, %r2493, %r116; 2026-02-21T08:54:13.6789402Z or.b32 %r2517, %r2493, %r101; 2026-02-21T08:54:13.6789458Z or.b32 %r2518, %r2493, %r24; 2026-02-21T08:54:13.6789513Z or.b32 %r2519, %r2493, %r25; 2026-02-21T08:54:13.6789566Z or.b32 %r2520, %r2493, %r104; 2026-02-21T08:54:13.6789628Z or.b32 %r2521, %r2493, %r73; 2026-02-21T08:54:13.6789683Z or.b32 %r2522, %r2493, %r106; 2026-02-21T08:54:13.6789736Z or.b32 %r2523, %r2493, %r75; 2026-02-21T08:54:13.6789798Z or.b32 %r2524, %r2493, %r76; 2026-02-21T08:54:13.6789853Z or.b32 %r2525, 
%r2493, %r77; 2026-02-21T08:54:13.6789906Z or.b32 %r2526, %r2493, %r78; 2026-02-21T08:54:13.6789967Z or.b32 %r2527, %r2493, %r79; 2026-02-21T08:54:13.6790024Z or.b32 %r2528, %r2493, %r80; 2026-02-21T08:54:13.6790078Z or.b32 %r2529, %r2493, %r81; 2026-02-21T08:54:13.6790133Z or.b32 %r2530, %r2493, %r82; 2026-02-21T08:54:13.6790195Z or.b32 %r2531, %r2493, %r83; 2026-02-21T08:54:13.6790249Z or.b32 %r2532, %r2493, %r84; 2026-02-21T08:54:13.6790415Z .loc 1 44 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:44:32 2026-02-21T08:54:13.6790501Z or.b32 %r2494, %r2493, %r17; 2026-02-21T08:54:13.6790554Z or.b32 %r2495, %r2493, %r18; 2026-02-21T08:54:13.6790608Z or.b32 %r2496, %r2493, %r19; 2026-02-21T08:54:13.6790663Z or.b32 %r2497, %r2493, %r20; 2026-02-21T08:54:13.6790723Z or.b32 %r2498, %r2493, %r21; 2026-02-21T08:54:13.6790777Z or.b32 %r2499, %r2493, %r22; 2026-02-21T08:54:13.6790868Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6791044Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6791133Z add.s32 %r1480, %r2425, 1; 2026-02-21T08:54:13.6791194Z setp.gt.s32 %p72, %r1480, 5; 2026-02-21T08:54:13.6791264Z selp.b32 %r2425, 0, %r1480, %p72; 2026-02-21T08:54:13.6791431Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6791492Z cp.async.wait_group 8; 2026-02-21T08:54:13.6791544Z bar.sync 0; 2026-02-21T08:54:13.6791723Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6791779Z shl.b32 %r1481, %r2427, 3; 2026-02-21T08:54:13.6791837Z add.s32 %r1483, %r403, %r1481; 2026-02-21T08:54:13.6791901Z add.s32 %r2533, %r1483, 655360; 2026-02-21T08:54:13.6792066Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6792121Z @%p47 bra $L__BB0_8; 2026-02-21T08:54:13.6792216Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6792381Z .loc 1 55 87 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6792438Z shl.b32 %r1500, %r2425, 16; 2026-02-21T08:54:13.6792520Z add.s32 %r1502, %r403, %r1500; 2026-02-21T08:54:13.6792688Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6792743Z shl.b32 %r1503, %r2425, 15; 2026-02-21T08:54:13.6792800Z add.s32 %r1504, %r403, %r1503; 2026-02-21T08:54:13.6792865Z add.s32 %r1505, %r1504, 393216; 2026-02-21T08:54:13.6793021Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6793083Z elect.sync %r1506|%p74, -1; 2026-02-21T08:54:13.6793145Z bfe.u32 %r1507, %r1505, 4, 14; 2026-02-21T08:54:13.6793200Z cvt.u64.u32 %rd555, %r1507; 2026-02-21T08:54:13.6793272Z or.b64 %rd538, %rd555, 4611686293372403712; 2026-02-21T08:54:13.6793329Z bfe.u32 %r1508, %r1502, 4, 14; 2026-02-21T08:54:13.6793392Z cvt.u64.u32 %rd556, %r1508; 2026-02-21T08:54:13.6793460Z or.b64 %rd539, %rd556, 4611686293439512576; 2026-02-21T08:54:13.6793534Z mov.b32 %r1485, 138412048; 2026-02-21T08:54:13.6793598Z // begin inline asm 2026-02-21T08:54:13.6793754Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd538, %rd539, %r1485, %p102; 2026-02-21T08:54:13.6793807Z // end inline asm 2026-02-21T08:54:13.6793870Z add.s32 %r1509, %r1504, 393248; 2026-02-21T08:54:13.6793928Z bfe.u32 %r1510, %r1509, 4, 14; 2026-02-21T08:54:13.6793983Z cvt.u64.u32 %rd557, %r1510; 2026-02-21T08:54:13.6794051Z or.b64 %rd540, %rd557, 4611686293372403712; 2026-02-21T08:54:13.6794115Z add.s32 %r1511, %r1502, 32; 2026-02-21T08:54:13.6794172Z bfe.u32 %r1512, %r1511, 4, 14; 2026-02-21T08:54:13.6794227Z cvt.u64.u32 %rd558, %r1512; 2026-02-21T08:54:13.6794299Z or.b64 %rd541, %rd558, 4611686293439512576; 2026-02-21T08:54:13.6794357Z mov.pred %p75, -1; 2026-02-21T08:54:13.6794411Z // begin inline asm 2026-02-21T08:54:13.6794556Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd540, %rd541, %r1485, %p75; 
2026-02-21T08:54:13.6794616Z // end inline asm 2026-02-21T08:54:13.6794704Z add.s32 %r1513, %r1504, 393280; 2026-02-21T08:54:13.6794762Z bfe.u32 %r1514, %r1513, 4, 14; 2026-02-21T08:54:13.6794824Z cvt.u64.u32 %rd559, %r1514; 2026-02-21T08:54:13.6794890Z or.b64 %rd542, %rd559, 4611686293372403712; 2026-02-21T08:54:13.6794975Z add.s32 %r1515, %r1502, 64; 2026-02-21T08:54:13.6795038Z bfe.u32 %r1516, %r1515, 4, 14; 2026-02-21T08:54:13.6795092Z cvt.u64.u32 %rd560, %r1516; 2026-02-21T08:54:13.6795155Z or.b64 %rd543, %rd560, 4611686293439512576; 2026-02-21T08:54:13.6795208Z // begin inline asm 2026-02-21T08:54:13.6795348Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd542, %rd543, %r1485, %p75; 2026-02-21T08:54:13.6795402Z // end inline asm 2026-02-21T08:54:13.6795458Z add.s32 %r1517, %r1504, 393312; 2026-02-21T08:54:13.6795520Z bfe.u32 %r1518, %r1517, 4, 14; 2026-02-21T08:54:13.6795575Z cvt.u64.u32 %rd561, %r1518; 2026-02-21T08:54:13.6795665Z or.b64 %rd544, %rd561, 4611686293372403712; 2026-02-21T08:54:13.6795733Z add.s32 %r1519, %r1502, 96; 2026-02-21T08:54:13.6795793Z bfe.u32 %r1520, %r1519, 4, 14; 2026-02-21T08:54:13.6795850Z cvt.u64.u32 %rd562, %r1520; 2026-02-21T08:54:13.6795916Z or.b64 %rd545, %rd562, 4611686293439512576; 2026-02-21T08:54:13.6795982Z // begin inline asm 2026-02-21T08:54:13.6796118Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd544, %rd545, %r1485, %p75; 2026-02-21T08:54:13.6796173Z // end inline asm 2026-02-21T08:54:13.6796237Z add.s32 %r1521, %r1504, 409600; 2026-02-21T08:54:13.6796293Z bfe.u32 %r1522, %r1521, 4, 14; 2026-02-21T08:54:13.6796350Z cvt.u64.u32 %rd563, %r1522; 2026-02-21T08:54:13.6796415Z or.b64 %rd546, %rd563, 4611686293372403712; 2026-02-21T08:54:13.6796478Z add.s32 %r1523, %r1502, 32768; 2026-02-21T08:54:13.6796532Z bfe.u32 %r1524, %r1523, 4, 14; 2026-02-21T08:54:13.6796589Z cvt.u64.u32 %rd564, %r1524; 2026-02-21T08:54:13.6796663Z or.b64 %rd547, %rd564, 4611686293439512576; 2026-02-21T08:54:13.6796717Z // 
begin inline asm 2026-02-21T08:54:13.6796848Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd546, %rd547, %r1485, %p75; 2026-02-21T08:54:13.6796936Z // end inline asm 2026-02-21T08:54:13.6796994Z add.s32 %r1525, %r1504, 409632; 2026-02-21T08:54:13.6797048Z bfe.u32 %r1526, %r1525, 4, 14; 2026-02-21T08:54:13.6797105Z cvt.u64.u32 %rd565, %r1526; 2026-02-21T08:54:13.6797177Z or.b64 %rd548, %rd565, 4611686293372403712; 2026-02-21T08:54:13.6797233Z add.s32 %r1527, %r1502, 32800; 2026-02-21T08:54:13.6797287Z bfe.u32 %r1528, %r1527, 4, 14; 2026-02-21T08:54:13.6797349Z cvt.u64.u32 %rd566, %r1528; 2026-02-21T08:54:13.6797413Z or.b64 %rd549, %rd566, 4611686293439512576; 2026-02-21T08:54:13.6797467Z // begin inline asm 2026-02-21T08:54:13.6797603Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd548, %rd549, %r1485, %p75; 2026-02-21T08:54:13.6797656Z // end inline asm 2026-02-21T08:54:13.6797713Z add.s32 %r1529, %r1504, 409664; 2026-02-21T08:54:13.6797768Z bfe.u32 %r1530, %r1529, 4, 14; 2026-02-21T08:54:13.6797856Z cvt.u64.u32 %rd567, %r1530; 2026-02-21T08:54:13.6797924Z or.b64 %rd550, %rd567, 4611686293372403712; 2026-02-21T08:54:13.6797980Z add.s32 %r1531, %r1502, 32832; 2026-02-21T08:54:13.6798041Z bfe.u32 %r1532, %r1531, 4, 14; 2026-02-21T08:54:13.6798098Z cvt.u64.u32 %rd568, %r1532; 2026-02-21T08:54:13.6798163Z or.b64 %rd551, %rd568, 4611686293439512576; 2026-02-21T08:54:13.6798218Z // begin inline asm 2026-02-21T08:54:13.6798355Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd550, %rd551, %r1485, %p75; 2026-02-21T08:54:13.6798409Z // end inline asm 2026-02-21T08:54:13.6798465Z add.s32 %r1533, %r1504, 409696; 2026-02-21T08:54:13.6798530Z bfe.u32 %r1534, %r1533, 4, 14; 2026-02-21T08:54:13.6798586Z cvt.u64.u32 %rd569, %r1534; 2026-02-21T08:54:13.6798651Z or.b64 %rd552, %rd569, 4611686293372403712; 2026-02-21T08:54:13.6798713Z add.s32 %r1535, %r1502, 32864; 2026-02-21T08:54:13.6798769Z bfe.u32 %r1536, %r1535, 4, 14; 
2026-02-21T08:54:13.6798826Z cvt.u64.u32 %rd570, %r1536; 2026-02-21T08:54:13.6798892Z or.b64 %rd553, %rd570, 4611686293439512576; 2026-02-21T08:54:13.6798956Z // begin inline asm 2026-02-21T08:54:13.6799084Z @%p74 tcgen05.mma.cta_group::1.kind::f16 [ %r2408 + 0 ], %rd552, %rd553, %r1485, %p75; 2026-02-21T08:54:13.6799135Z // end inline asm 2026-02-21T08:54:13.6799223Z cvt.u64.u32 %rd554, %r2533; 2026-02-21T08:54:13.6799276Z // begin inline asm 2026-02-21T08:54:13.6799399Z @%p74 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd554]; 2026-02-21T08:54:13.6799459Z // end inline asm 2026-02-21T08:54:13.6799552Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6799727Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6799787Z setp.eq.b32 %p90, %r291, 0; 2026-02-21T08:54:13.6799856Z setp.lt.s32 %p91, %r2449, %r176; 2026-02-21T08:54:13.6800047Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6800104Z // begin inline asm 2026-02-21T08:54:13.6800163Z 2026-02-21T08:54:13.6800212Z { 2026-02-21T08:54:13.6800272Z .reg .pred complete; 2026-02-21T08:54:13.6800326Z waitLoop: 2026-02-21T08:54:13.6800452Z mbarrier.try_wait.parity.shared.b64 complete, [%r2424], %r2423; 2026-02-21T08:54:13.6800516Z @!complete bra.uni waitLoop; 2026-02-21T08:54:13.6800563Z } 2026-02-21T08:54:13.6800570Z 2026-02-21T08:54:13.6800631Z // end inline asm 2026-02-21T08:54:13.6800687Z add.s32 %r1635, %r2427, 1; 2026-02-21T08:54:13.6800747Z setp.gt.s32 %p92, %r1635, 1; 2026-02-21T08:54:13.6800924Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6800979Z add.s32 %r1637, %r2422, 128; 2026-02-21T08:54:13.6801034Z add.s32 %r1638, %r2426, 1; 2026-02-21T08:54:13.6801091Z setp.gt.s32 %p93, %r1638, 5; 2026-02-21T08:54:13.6801161Z selp.b32 %r2426, 0, %r1638, %p93; 2026-02-21T08:54:13.6801224Z selp.b32 %r2422, 0, %r1637, %p90; 
2026-02-21T08:54:13.6801417Z .loc 1 50 35 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:50:35 2026-02-21T08:54:13.6801482Z add.s32 %r1639, %r2422, %r4; 2026-02-21T08:54:13.6801648Z .loc 1 54 53 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:53 2026-02-21T08:54:13.6801705Z shl.b32 %r1640, %r2483, 11; 2026-02-21T08:54:13.6801766Z shl.b32 %r1641, %r2484, 11; 2026-02-21T08:54:13.6801821Z shl.b32 %r1642, %r2485, 11; 2026-02-21T08:54:13.6801874Z shl.b32 %r1643, %r2486, 11; 2026-02-21T08:54:13.6801928Z shl.b32 %r1644, %r2487, 11; 2026-02-21T08:54:13.6801989Z shl.b32 %r1645, %r2488, 11; 2026-02-21T08:54:13.6802044Z shl.b32 %r1646, %r2489, 11; 2026-02-21T08:54:13.6802096Z shl.b32 %r1647, %r2490, 11; 2026-02-21T08:54:13.6802156Z shl.b32 %r1648, %r2491, 11; 2026-02-21T08:54:13.6802210Z shl.b32 %r1649, %r2492, 11; 2026-02-21T08:54:13.6802266Z shl.b32 %r1650, %r2501, 11; 2026-02-21T08:54:13.6802340Z shl.b32 %r1651, %r2502, 11; 2026-02-21T08:54:13.6802404Z shl.b32 %r1652, %r2503, 11; 2026-02-21T08:54:13.6802458Z shl.b32 %r1653, %r2504, 11; 2026-02-21T08:54:13.6802513Z shl.b32 %r1654, %r2505, 11; 2026-02-21T08:54:13.6802576Z shl.b32 %r1655, %r2506, 11; 2026-02-21T08:54:13.6802740Z .loc 1 54 60 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:60 2026-02-21T08:54:13.6802799Z add.s32 %r1656, %r1640, %r1639; 2026-02-21T08:54:13.6802864Z add.s32 %r1657, %r1641, %r1639; 2026-02-21T08:54:13.6802920Z add.s32 %r1658, %r1642, %r1639; 2026-02-21T08:54:13.6802976Z add.s32 %r1659, %r1643, %r1639; 2026-02-21T08:54:13.6803032Z add.s32 %r1660, %r1644, %r1639; 2026-02-21T08:54:13.6803096Z add.s32 %r1661, %r1645, %r1639; 2026-02-21T08:54:13.6803154Z add.s32 %r1662, %r1646, %r1639; 2026-02-21T08:54:13.6803211Z add.s32 %r1663, %r1647, %r1639; 2026-02-21T08:54:13.6803275Z add.s32 %r1664, %r1648, %r1639; 2026-02-21T08:54:13.6803331Z add.s32 %r1665, %r1649, %r1639; 2026-02-21T08:54:13.6803389Z add.s32 %r1666, %r1650, %r1639; 2026-02-21T08:54:13.6803446Z add.s32 
%r1667, %r1651, %r1639; 2026-02-21T08:54:13.6803510Z add.s32 %r1668, %r1652, %r1639; 2026-02-21T08:54:13.6803565Z add.s32 %r1669, %r1653, %r1639; 2026-02-21T08:54:13.6803620Z add.s32 %r1670, %r1654, %r1639; 2026-02-21T08:54:13.6803708Z add.s32 %r1671, %r1655, %r1639; 2026-02-21T08:54:13.6803877Z .loc 1 54 32 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:32 2026-02-21T08:54:13.6803945Z mad.wide.s32 %rd571, %r1656, 2, %rd50; 2026-02-21T08:54:13.6804017Z mad.wide.s32 %rd572, %r1657, 2, %rd50; 2026-02-21T08:54:13.6804079Z mad.wide.s32 %rd573, %r1658, 2, %rd50; 2026-02-21T08:54:13.6804139Z mad.wide.s32 %rd574, %r1659, 2, %rd50; 2026-02-21T08:54:13.6804199Z mad.wide.s32 %rd575, %r1660, 2, %rd50; 2026-02-21T08:54:13.6804289Z mad.wide.s32 %rd576, %r1661, 2, %rd50; 2026-02-21T08:54:13.6804351Z mad.wide.s32 %rd577, %r1662, 2, %rd50; 2026-02-21T08:54:13.6804412Z mad.wide.s32 %rd578, %r1663, 2, %rd50; 2026-02-21T08:54:13.6804480Z mad.wide.s32 %rd579, %r1664, 2, %rd50; 2026-02-21T08:54:13.6804540Z mad.wide.s32 %rd580, %r1665, 2, %rd50; 2026-02-21T08:54:13.6804601Z mad.wide.s32 %rd581, %r1666, 2, %rd50; 2026-02-21T08:54:13.6804659Z mad.wide.s32 %rd582, %r1667, 2, %rd50; 2026-02-21T08:54:13.6804762Z mad.wide.s32 %rd583, %r1668, 2, %rd50; 2026-02-21T08:54:13.6804822Z mad.wide.s32 %rd584, %r1669, 2, %rd50; 2026-02-21T08:54:13.6804881Z mad.wide.s32 %rd585, %r1670, 2, %rd50; 2026-02-21T08:54:13.6804947Z mad.wide.s32 %rd586, %r1671, 2, %rd50; 2026-02-21T08:54:13.6805113Z .loc 1 54 85 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:54:85 2026-02-21T08:54:13.6805170Z shl.b32 %r1672, %r2426, 15; 2026-02-21T08:54:13.6805233Z add.s32 %r1674, %r403, %r1672; 2026-02-21T08:54:13.6805291Z add.s32 %r1675, %r1674, %r173; 2026-02-21T08:54:13.6805345Z bar.sync 0; 2026-02-21T08:54:13.6805403Z add.s32 %r1539, %r1675, 393216; 2026-02-21T08:54:13.6805471Z selp.b32 %r1540, 16, 0, %p91; 2026-02-21T08:54:13.6805556Z // begin inline asm 2026-02-21T08:54:13.6805679Z 
cp.async.cg.shared.global [ %r1539 + 0 ], [ %rd571 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6805743Z // end inline asm 2026-02-21T08:54:13.6805803Z add.s32 %r1541, %r1675, 394240; 2026-02-21T08:54:13.6805858Z // begin inline asm 2026-02-21T08:54:13.6805983Z cp.async.cg.shared.global [ %r1541 + 0 ], [ %rd572 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6806037Z // end inline asm 2026-02-21T08:54:13.6806093Z add.s32 %r1543, %r1675, 395264; 2026-02-21T08:54:13.6806147Z // begin inline asm 2026-02-21T08:54:13.6806268Z cp.async.cg.shared.global [ %r1543 + 0 ], [ %rd573 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6806321Z // end inline asm 2026-02-21T08:54:13.6806377Z add.s32 %r1545, %r1675, 396288; 2026-02-21T08:54:13.6806438Z // begin inline asm 2026-02-21T08:54:13.6806550Z cp.async.cg.shared.global [ %r1545 + 0 ], [ %rd574 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6806630Z // end inline asm 2026-02-21T08:54:13.6806688Z add.s32 %r1547, %r1675, 397312; 2026-02-21T08:54:13.6806748Z // begin inline asm 2026-02-21T08:54:13.6806859Z cp.async.cg.shared.global [ %r1547 + 0 ], [ %rd575 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6806910Z // end inline asm 2026-02-21T08:54:13.6806972Z add.s32 %r1549, %r1675, 398336; 2026-02-21T08:54:13.6807025Z // begin inline asm 2026-02-21T08:54:13.6807134Z cp.async.cg.shared.global [ %r1549 + 0 ], [ %rd576 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6807186Z // end inline asm 2026-02-21T08:54:13.6807246Z add.s32 %r1551, %r1675, 399360; 2026-02-21T08:54:13.6807298Z // begin inline asm 2026-02-21T08:54:13.6807406Z cp.async.cg.shared.global [ %r1551 + 0 ], [ %rd577 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6807465Z // end inline asm 2026-02-21T08:54:13.6807520Z add.s32 %r1553, %r1675, 400384; 2026-02-21T08:54:13.6807574Z // begin inline asm 2026-02-21T08:54:13.6807691Z cp.async.cg.shared.global [ %r1553 + 0 ], [ %rd578 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6807743Z // end inline asm 2026-02-21T08:54:13.6807799Z add.s32 %r1555, %r1675, 401408; 
2026-02-21T08:54:13.6807853Z // begin inline asm 2026-02-21T08:54:13.6807968Z cp.async.cg.shared.global [ %r1555 + 0 ], [ %rd579 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6808057Z // end inline asm 2026-02-21T08:54:13.6808113Z add.s32 %r1557, %r1675, 402432; 2026-02-21T08:54:13.6808173Z // begin inline asm 2026-02-21T08:54:13.6808280Z cp.async.cg.shared.global [ %r1557 + 0 ], [ %rd580 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6808331Z // end inline asm 2026-02-21T08:54:13.6808387Z add.s32 %r1559, %r1675, 403456; 2026-02-21T08:54:13.6808448Z // begin inline asm 2026-02-21T08:54:13.6808556Z cp.async.cg.shared.global [ %r1559 + 0 ], [ %rd581 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6808608Z // end inline asm 2026-02-21T08:54:13.6808672Z add.s32 %r1561, %r1675, 404480; 2026-02-21T08:54:13.6808753Z // begin inline asm 2026-02-21T08:54:13.6808863Z cp.async.cg.shared.global [ %r1561 + 0 ], [ %rd582 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6808926Z // end inline asm 2026-02-21T08:54:13.6808982Z add.s32 %r1563, %r1675, 405504; 2026-02-21T08:54:13.6809035Z // begin inline asm 2026-02-21T08:54:13.6809146Z cp.async.cg.shared.global [ %r1563 + 0 ], [ %rd583 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6809206Z // end inline asm 2026-02-21T08:54:13.6809261Z add.s32 %r1565, %r1675, 406528; 2026-02-21T08:54:13.6809315Z // begin inline asm 2026-02-21T08:54:13.6809432Z cp.async.cg.shared.global [ %r1565 + 0 ], [ %rd584 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6809484Z // end inline asm 2026-02-21T08:54:13.6809539Z add.s32 %r1567, %r1675, 407552; 2026-02-21T08:54:13.6809592Z // begin inline asm 2026-02-21T08:54:13.6809708Z cp.async.cg.shared.global [ %r1567 + 0 ], [ %rd585 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6809761Z // end inline asm 2026-02-21T08:54:13.6809817Z add.s32 %r1569, %r1675, 408576; 2026-02-21T08:54:13.6809880Z // begin inline asm 2026-02-21T08:54:13.6809990Z cp.async.cg.shared.global [ %r1569 + 0 ], [ %rd586 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6810068Z // end inline 
asm 2026-02-21T08:54:13.6810137Z cp.async.commit_group; 2026-02-21T08:54:13.6810301Z .loc 1 55 80 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:80 2026-02-21T08:54:13.6810361Z shl.b32 %r1676, %r2507, 11; 2026-02-21T08:54:13.6810418Z shl.b32 %r1677, %r2508, 11; 2026-02-21T08:54:13.6810482Z shl.b32 %r1678, %r2509, 11; 2026-02-21T08:54:13.6810537Z shl.b32 %r1679, %r2510, 11; 2026-02-21T08:54:13.6810592Z shl.b32 %r1680, %r2511, 11; 2026-02-21T08:54:13.6810653Z shl.b32 %r1681, %r2512, 11; 2026-02-21T08:54:13.6810707Z shl.b32 %r1682, %r2513, 11; 2026-02-21T08:54:13.6810762Z shl.b32 %r1683, %r2514, 11; 2026-02-21T08:54:13.6810816Z shl.b32 %r1684, %r2515, 11; 2026-02-21T08:54:13.6810876Z shl.b32 %r1685, %r2516, 11; 2026-02-21T08:54:13.6810930Z shl.b32 %r1686, %r2517, 11; 2026-02-21T08:54:13.6810984Z shl.b32 %r1687, %r2518, 11; 2026-02-21T08:54:13.6811066Z shl.b32 %r1688, %r2519, 11; 2026-02-21T08:54:13.6811124Z shl.b32 %r1689, %r2520, 11; 2026-02-21T08:54:13.6811177Z shl.b32 %r1690, %r2521, 11; 2026-02-21T08:54:13.6811230Z shl.b32 %r1691, %r2522, 11; 2026-02-21T08:54:13.6811291Z shl.b32 %r1692, %r2523, 11; 2026-02-21T08:54:13.6811348Z shl.b32 %r1693, %r2524, 11; 2026-02-21T08:54:13.6811401Z shl.b32 %r1694, %r2525, 11; 2026-02-21T08:54:13.6811462Z shl.b32 %r1695, %r2526, 11; 2026-02-21T08:54:13.6811514Z shl.b32 %r1696, %r2527, 11; 2026-02-21T08:54:13.6811569Z shl.b32 %r1697, %r2528, 11; 2026-02-21T08:54:13.6811629Z shl.b32 %r1698, %r2529, 11; 2026-02-21T08:54:13.6811683Z shl.b32 %r1699, %r2530, 11; 2026-02-21T08:54:13.6811735Z shl.b32 %r1700, %r2531, 11; 2026-02-21T08:54:13.6811787Z shl.b32 %r1701, %r2532, 11; 2026-02-21T08:54:13.6811850Z shl.b32 %r1702, %r2494, 11; 2026-02-21T08:54:13.6811907Z shl.b32 %r1703, %r2495, 11; 2026-02-21T08:54:13.6811963Z shl.b32 %r1704, %r2496, 11; 2026-02-21T08:54:13.6812028Z shl.b32 %r1705, %r2497, 11; 2026-02-21T08:54:13.6812086Z shl.b32 %r1706, %r2498, 11; 2026-02-21T08:54:13.6812142Z shl.b32 %r1707, %r2499, 11; 
2026-02-21T08:54:13.6812314Z .loc 1 55 59 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:59 2026-02-21T08:54:13.6812403Z add.s32 %r1708, %r1676, %r1639; 2026-02-21T08:54:13.6812461Z add.s32 %r1709, %r1677, %r1639; 2026-02-21T08:54:13.6812519Z add.s32 %r1710, %r1678, %r1639; 2026-02-21T08:54:13.6812585Z add.s32 %r1711, %r1679, %r1639; 2026-02-21T08:54:13.6812643Z add.s32 %r1712, %r1680, %r1639; 2026-02-21T08:54:13.6812699Z add.s32 %r1713, %r1681, %r1639; 2026-02-21T08:54:13.6812766Z add.s32 %r1714, %r1682, %r1639; 2026-02-21T08:54:13.6812824Z add.s32 %r1715, %r1683, %r1639; 2026-02-21T08:54:13.6812882Z add.s32 %r1716, %r1684, %r1639; 2026-02-21T08:54:13.6812939Z add.s32 %r1717, %r1685, %r1639; 2026-02-21T08:54:13.6813027Z add.s32 %r1718, %r1686, %r1639; 2026-02-21T08:54:13.6813089Z add.s32 %r1719, %r1687, %r1639; 2026-02-21T08:54:13.6813147Z add.s32 %r1720, %r1688, %r1639; 2026-02-21T08:54:13.6813213Z add.s32 %r1721, %r1689, %r1639; 2026-02-21T08:54:13.6813271Z add.s32 %r1722, %r1690, %r1639; 2026-02-21T08:54:13.6813328Z add.s32 %r1723, %r1691, %r1639; 2026-02-21T08:54:13.6813385Z add.s32 %r1724, %r1692, %r1639; 2026-02-21T08:54:13.6813453Z add.s32 %r1725, %r1693, %r1639; 2026-02-21T08:54:13.6813511Z add.s32 %r1726, %r1694, %r1639; 2026-02-21T08:54:13.6813567Z add.s32 %r1727, %r1695, %r1639; 2026-02-21T08:54:13.6813630Z add.s32 %r1728, %r1696, %r1639; 2026-02-21T08:54:13.6813686Z add.s32 %r1729, %r1697, %r1639; 2026-02-21T08:54:13.6813743Z add.s32 %r1730, %r1698, %r1639; 2026-02-21T08:54:13.6813799Z add.s32 %r1731, %r1699, %r1639; 2026-02-21T08:54:13.6813861Z add.s32 %r1732, %r1700, %r1639; 2026-02-21T08:54:13.6813917Z add.s32 %r1733, %r1701, %r1639; 2026-02-21T08:54:13.6813976Z add.s32 %r1734, %r1702, %r1639; 2026-02-21T08:54:13.6814041Z add.s32 %r1735, %r1703, %r1639; 2026-02-21T08:54:13.6814099Z add.s32 %r1736, %r1704, %r1639; 2026-02-21T08:54:13.6814180Z add.s32 %r1737, %r1705, %r1639; 2026-02-21T08:54:13.6814243Z add.s32 %r1738, %r1706, %r1639; 
2026-02-21T08:54:13.6814299Z add.s32 %r1739, %r1707, %r1639; 2026-02-21T08:54:13.6814469Z .loc 1 55 34 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:34 2026-02-21T08:54:13.6814539Z mad.wide.s32 %rd587, %r1708, 2, %rd51; 2026-02-21T08:54:13.6814613Z mad.wide.s32 %rd588, %r1709, 2, %rd51; 2026-02-21T08:54:13.6814703Z mad.wide.s32 %rd589, %r1710, 2, %rd51; 2026-02-21T08:54:13.6814769Z mad.wide.s32 %rd590, %r1711, 2, %rd51; 2026-02-21T08:54:13.6814838Z mad.wide.s32 %rd591, %r1712, 2, %rd51; 2026-02-21T08:54:13.6814900Z mad.wide.s32 %rd592, %r1713, 2, %rd51; 2026-02-21T08:54:13.6814964Z mad.wide.s32 %rd593, %r1714, 2, %rd51; 2026-02-21T08:54:13.6815035Z mad.wide.s32 %rd594, %r1715, 2, %rd51; 2026-02-21T08:54:13.6815098Z mad.wide.s32 %rd595, %r1716, 2, %rd51; 2026-02-21T08:54:13.6815186Z mad.wide.s32 %rd596, %r1717, 2, %rd51; 2026-02-21T08:54:13.6815251Z mad.wide.s32 %rd597, %r1718, 2, %rd51; 2026-02-21T08:54:13.6815321Z mad.wide.s32 %rd598, %r1719, 2, %rd51; 2026-02-21T08:54:13.6815382Z mad.wide.s32 %rd599, %r1720, 2, %rd51; 2026-02-21T08:54:13.6815444Z mad.wide.s32 %rd600, %r1721, 2, %rd51; 2026-02-21T08:54:13.6815514Z mad.wide.s32 %rd601, %r1722, 2, %rd51; 2026-02-21T08:54:13.6815577Z mad.wide.s32 %rd602, %r1723, 2, %rd51; 2026-02-21T08:54:13.6815639Z mad.wide.s32 %rd603, %r1724, 2, %rd51; 2026-02-21T08:54:13.6815701Z mad.wide.s32 %rd604, %r1725, 2, %rd51; 2026-02-21T08:54:13.6815770Z mad.wide.s32 %rd605, %r1726, 2, %rd51; 2026-02-21T08:54:13.6815833Z mad.wide.s32 %rd606, %r1727, 2, %rd51; 2026-02-21T08:54:13.6815894Z mad.wide.s32 %rd607, %r1728, 2, %rd51; 2026-02-21T08:54:13.6815963Z mad.wide.s32 %rd608, %r1729, 2, %rd51; 2026-02-21T08:54:13.6816027Z mad.wide.s32 %rd609, %r1730, 2, %rd51; 2026-02-21T08:54:13.6816092Z mad.wide.s32 %rd610, %r1731, 2, %rd51; 2026-02-21T08:54:13.6816163Z mad.wide.s32 %rd611, %r1732, 2, %rd51; 2026-02-21T08:54:13.6816227Z mad.wide.s32 %rd612, %r1733, 2, %rd51; 2026-02-21T08:54:13.6816290Z mad.wide.s32 %rd613, %r1734, 2, 
%rd51; 2026-02-21T08:54:13.6816353Z mad.wide.s32 %rd614, %r1735, 2, %rd51; 2026-02-21T08:54:13.6816452Z mad.wide.s32 %rd615, %r1736, 2, %rd51; 2026-02-21T08:54:13.6816517Z mad.wide.s32 %rd616, %r1737, 2, %rd51; 2026-02-21T08:54:13.6816580Z mad.wide.s32 %rd617, %r1738, 2, %rd51; 2026-02-21T08:54:13.6816650Z mad.wide.s32 %rd618, %r1739, 2, %rd51; 2026-02-21T08:54:13.6816828Z .loc 1 55 87 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:55:87 2026-02-21T08:54:13.6816892Z shl.b32 %r1740, %r2426, 16; 2026-02-21T08:54:13.6816953Z add.s32 %r1741, %r403, %r1740; 2026-02-21T08:54:13.6817020Z add.s32 %r1571, %r1741, %r174; 2026-02-21T08:54:13.6817100Z // begin inline asm 2026-02-21T08:54:13.6817221Z cp.async.cg.shared.global [ %r1571 + 0 ], [ %rd587 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6817287Z // end inline asm 2026-02-21T08:54:13.6817349Z add.s32 %r1573, %r1571, 1024; 2026-02-21T08:54:13.6817405Z // begin inline asm 2026-02-21T08:54:13.6817530Z cp.async.cg.shared.global [ %r1573 + 0 ], [ %rd588 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6817588Z // end inline asm 2026-02-21T08:54:13.6817647Z add.s32 %r1575, %r1571, 2048; 2026-02-21T08:54:13.6817703Z // begin inline asm 2026-02-21T08:54:13.6817828Z cp.async.cg.shared.global [ %r1575 + 0 ], [ %rd589 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6817884Z // end inline asm 2026-02-21T08:54:13.6817944Z add.s32 %r1577, %r1571, 3072; 2026-02-21T08:54:13.6818008Z // begin inline asm 2026-02-21T08:54:13.6818123Z cp.async.cg.shared.global [ %r1577 + 0 ], [ %rd590 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6818179Z // end inline asm 2026-02-21T08:54:13.6818238Z add.s32 %r1579, %r1571, 4096; 2026-02-21T08:54:13.6818302Z // begin inline asm 2026-02-21T08:54:13.6818421Z cp.async.cg.shared.global [ %r1579 + 0 ], [ %rd591 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6818504Z // end inline asm 2026-02-21T08:54:13.6818571Z add.s32 %r1581, %r1571, 5120; 2026-02-21T08:54:13.6818626Z // begin inline asm 2026-02-21T08:54:13.6818743Z 
cp.async.cg.shared.global [ %r1581 + 0 ], [ %rd592 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6818805Z // end inline asm 2026-02-21T08:54:13.6818862Z add.s32 %r1583, %r1571, 6144; 2026-02-21T08:54:13.6818918Z // begin inline asm 2026-02-21T08:54:13.6819033Z cp.async.cg.shared.global [ %r1583 + 0 ], [ %rd593 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6819096Z // end inline asm 2026-02-21T08:54:13.6819154Z add.s32 %r1585, %r1571, 7168; 2026-02-21T08:54:13.6819210Z // begin inline asm 2026-02-21T08:54:13.6819333Z cp.async.cg.shared.global [ %r1585 + 0 ], [ %rd594 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6819387Z // end inline asm 2026-02-21T08:54:13.6819445Z add.s32 %r1587, %r1571, 8192; 2026-02-21T08:54:13.6819503Z // begin inline asm 2026-02-21T08:54:13.6819651Z cp.async.cg.shared.global [ %r1587 + 0 ], [ %rd595 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6819710Z // end inline asm 2026-02-21T08:54:13.6819768Z add.s32 %r1589, %r1571, 9216; 2026-02-21T08:54:13.6819830Z // begin inline asm 2026-02-21T08:54:13.6819946Z cp.async.cg.shared.global [ %r1589 + 0 ], [ %rd596 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6820002Z // end inline asm 2026-02-21T08:54:13.6820061Z add.s32 %r1591, %r1571, 10240; 2026-02-21T08:54:13.6820124Z // begin inline asm 2026-02-21T08:54:13.6820240Z cp.async.cg.shared.global [ %r1591 + 0 ], [ %rd597 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6820294Z // end inline asm 2026-02-21T08:54:13.6820362Z add.s32 %r1593, %r1571, 11264; 2026-02-21T08:54:13.6820418Z // begin inline asm 2026-02-21T08:54:13.6820533Z cp.async.cg.shared.global [ %r1593 + 0 ], [ %rd598 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6820594Z // end inline asm 2026-02-21T08:54:13.6820655Z add.s32 %r1595, %r1571, 12288; 2026-02-21T08:54:13.6820710Z // begin inline asm 2026-02-21T08:54:13.6820833Z cp.async.cg.shared.global [ %r1595 + 0 ], [ %rd599 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6820893Z // end inline asm 2026-02-21T08:54:13.6820948Z add.s32 %r1597, %r1571, 13312; 
2026-02-21T08:54:13.6821000Z // begin inline asm 2026-02-21T08:54:13.6821115Z cp.async.cg.shared.global [ %r1597 + 0 ], [ %rd600 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6821194Z // end inline asm 2026-02-21T08:54:13.6821250Z add.s32 %r1599, %r1571, 14336; 2026-02-21T08:54:13.6821303Z // begin inline asm 2026-02-21T08:54:13.6821420Z cp.async.cg.shared.global [ %r1599 + 0 ], [ %rd601 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6821473Z // end inline asm 2026-02-21T08:54:13.6821528Z add.s32 %r1601, %r1571, 15360; 2026-02-21T08:54:13.6821588Z // begin inline asm 2026-02-21T08:54:13.6821698Z cp.async.cg.shared.global [ %r1601 + 0 ], [ %rd602 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6821751Z // end inline asm 2026-02-21T08:54:13.6821841Z add.s32 %r1603, %r1571, 16384; 2026-02-21T08:54:13.6821896Z // begin inline asm 2026-02-21T08:54:13.6822008Z cp.async.cg.shared.global [ %r1603 + 0 ], [ %rd603 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6822062Z // end inline asm 2026-02-21T08:54:13.6822125Z add.s32 %r1605, %r1571, 17408; 2026-02-21T08:54:13.6822178Z // begin inline asm 2026-02-21T08:54:13.6822288Z cp.async.cg.shared.global [ %r1605 + 0 ], [ %rd604 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6822348Z // end inline asm 2026-02-21T08:54:13.6822403Z add.s32 %r1607, %r1571, 18432; 2026-02-21T08:54:13.6822455Z // begin inline asm 2026-02-21T08:54:13.6822565Z cp.async.cg.shared.global [ %r1607 + 0 ], [ %rd605 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6822624Z // end inline asm 2026-02-21T08:54:13.6822679Z add.s32 %r1609, %r1571, 19456; 2026-02-21T08:54:13.6822733Z // begin inline asm 2026-02-21T08:54:13.6822849Z cp.async.cg.shared.global [ %r1609 + 0 ], [ %rd606 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6822923Z // end inline asm 2026-02-21T08:54:13.6822979Z add.s32 %r1611, %r1571, 20480; 2026-02-21T08:54:13.6823035Z // begin inline asm 2026-02-21T08:54:13.6823179Z cp.async.cg.shared.global [ %r1611 + 0 ], [ %rd607 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6823233Z // end inline asm 
2026-02-21T08:54:13.6823290Z add.s32 %r1613, %r1571, 21504; 2026-02-21T08:54:13.6823352Z // begin inline asm 2026-02-21T08:54:13.6823464Z cp.async.cg.shared.global [ %r1613 + 0 ], [ %rd608 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6823517Z // end inline asm 2026-02-21T08:54:13.6823579Z add.s32 %r1615, %r1571, 22528; 2026-02-21T08:54:13.6823634Z // begin inline asm 2026-02-21T08:54:13.6823743Z cp.async.cg.shared.global [ %r1615 + 0 ], [ %rd609 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6823797Z // end inline asm 2026-02-21T08:54:13.6823863Z add.s32 %r1617, %r1571, 23552; 2026-02-21T08:54:13.6823918Z // begin inline asm 2026-02-21T08:54:13.6824031Z cp.async.cg.shared.global [ %r1617 + 0 ], [ %rd610 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6824094Z // end inline asm 2026-02-21T08:54:13.6824152Z add.s32 %r1619, %r1571, 24576; 2026-02-21T08:54:13.6824238Z // begin inline asm 2026-02-21T08:54:13.6824352Z cp.async.cg.shared.global [ %r1619 + 0 ], [ %rd611 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6824412Z // end inline asm 2026-02-21T08:54:13.6824467Z add.s32 %r1621, %r1571, 25600; 2026-02-21T08:54:13.6824521Z // begin inline asm 2026-02-21T08:54:13.6824641Z cp.async.cg.shared.global [ %r1621 + 0 ], [ %rd612 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6824753Z // end inline asm 2026-02-21T08:54:13.6824813Z add.s32 %r1623, %r1571, 26624; 2026-02-21T08:54:13.6824883Z // begin inline asm 2026-02-21T08:54:13.6824998Z cp.async.cg.shared.global [ %r1623 + 0 ], [ %rd613 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6825053Z // end inline asm 2026-02-21T08:54:13.6825113Z add.s32 %r1625, %r1571, 27648; 2026-02-21T08:54:13.6825180Z // begin inline asm 2026-02-21T08:54:13.6825292Z cp.async.cg.shared.global [ %r1625 + 0 ], [ %rd614 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6825348Z // end inline asm 2026-02-21T08:54:13.6825414Z add.s32 %r1627, %r1571, 28672; 2026-02-21T08:54:13.6825470Z // begin inline asm 2026-02-21T08:54:13.6825580Z cp.async.cg.shared.global [ %r1627 + 0 ], [ %rd615 + 0 ], 
0x10, %r1540; 2026-02-21T08:54:13.6825633Z // end inline asm 2026-02-21T08:54:13.6825695Z add.s32 %r1629, %r1571, 29696; 2026-02-21T08:54:13.6825782Z // begin inline asm 2026-02-21T08:54:13.6825892Z cp.async.cg.shared.global [ %r1629 + 0 ], [ %rd616 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6825953Z // end inline asm 2026-02-21T08:54:13.6826006Z add.s32 %r1631, %r1571, 30720; 2026-02-21T08:54:13.6826060Z // begin inline asm 2026-02-21T08:54:13.6826167Z cp.async.cg.shared.global [ %r1631 + 0 ], [ %rd617 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6826229Z // end inline asm 2026-02-21T08:54:13.6826283Z add.s32 %r1633, %r1571, 31744; 2026-02-21T08:54:13.6826337Z // begin inline asm 2026-02-21T08:54:13.6826480Z cp.async.cg.shared.global [ %r1633 + 0 ], [ %rd618 + 0 ], 0x10, %r1540; 2026-02-21T08:54:13.6826536Z // end inline asm 2026-02-21T08:54:13.6826598Z cp.async.commit_group; 2026-02-21T08:54:13.6826784Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6826843Z setp.ne.b32 %p102, %r2421, 15; 2026-02-21T08:54:13.6826900Z @%p102 bra $L__BB0_10; 2026-02-21T08:54:13.6826994Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:13.6827174Z .loc 1 0 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:0:108 2026-02-21T08:54:13.6827233Z setp.lt.u32 %p95, %r1, 128; 2026-02-21T08:54:13.6827395Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6827457Z // begin inline asm 2026-02-21T08:54:13.6827508Z 2026-02-21T08:54:13.6827556Z { 2026-02-21T08:54:13.6827622Z .reg .pred complete; 2026-02-21T08:54:13.6827676Z waitLoop: 2026-02-21T08:54:13.6827797Z mbarrier.try_wait.parity.shared.b64 complete, [%r2533], %r2534; 2026-02-21T08:54:13.6827861Z @!complete bra.uni waitLoop; 2026-02-21T08:54:13.6827943Z } 2026-02-21T08:54:13.6827949Z 2026-02-21T08:54:13.6828003Z // end inline asm 2026-02-21T08:54:13.6828056Z // begin inline asm 2026-02-21T08:54:13.6828374Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1744, %r1745, %r1746, %r1747, %r1748, %r1749, %r1750, %r1751, %r1752, %r1753, %r1754, %r1755, %r1756, %r1757, %r1758, %r1759}, [%r412 + 0]; 2026-02-21T08:54:13.6828429Z // end inline asm 2026-02-21T08:54:13.6828482Z // begin inline asm 2026-02-21T08:54:13.6828782Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1761, %r1762, %r1763, %r1764, %r1765, %r1766, %r1767, %r1768, %r1769, %r1770, %r1771, %r1772, %r1773, %r1774, %r1775, %r1776}, [%r412 + 16]; 2026-02-21T08:54:13.6828834Z // end inline asm 2026-02-21T08:54:13.6828888Z // begin inline asm 2026-02-21T08:54:13.6829211Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1778, %r1779, %r1780, %r1781, %r1782, %r1783, %r1784, %r1785, %r1786, %r1787, %r1788, %r1789, %r1790, %r1791, %r1792, %r1793}, [%r412 + 32]; 2026-02-21T08:54:13.6829273Z // end inline asm 2026-02-21T08:54:13.6829328Z // begin inline asm 2026-02-21T08:54:13.6829625Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1795, %r1796, %r1797, %r1798, %r1799, %r1800, %r1801, %r1802, %r1803, %r1804, %r1805, %r1806, %r1807, %r1808, %r1809, %r1810}, [%r412 + 48]; 2026-02-21T08:54:13.6829686Z // end inline asm 2026-02-21T08:54:13.6829738Z // begin inline asm 2026-02-21T08:54:13.6830029Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1812, %r1813, %r1814, %r1815, %r1816, %r1817, %r1818, %r1819, %r1820, %r1821, %r1822, %r1823, %r1824, %r1825, %r1826, %r1827}, [%r412 + 64]; 2026-02-21T08:54:13.6830088Z // end inline asm 2026-02-21T08:54:13.6830141Z // begin inline asm 2026-02-21T08:54:13.6830442Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1829, %r1830, %r1831, %r1832, %r1833, %r1834, %r1835, %r1836, %r1837, %r1838, %r1839, %r1840, %r1841, %r1842, %r1843, %r1844}, [%r412 + 80]; 2026-02-21T08:54:13.6830502Z // end inline asm 2026-02-21T08:54:13.6830556Z // begin inline asm 2026-02-21T08:54:13.6830851Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1846, %r1847, %r1848, %r1849, %r1850, %r1851, %r1852, %r1853, %r1854, %r1855, %r1856, %r1857, %r1858, %r1859, 
%r1860, %r1861}, [%r412 + 96]; 2026-02-21T08:54:13.6830909Z // end inline asm 2026-02-21T08:54:13.6830983Z // begin inline asm 2026-02-21T08:54:13.6831268Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1863, %r1864, %r1865, %r1866, %r1867, %r1868, %r1869, %r1870, %r1871, %r1872, %r1873, %r1874, %r1875, %r1876, %r1877, %r1878}, [%r412 + 112]; 2026-02-21T08:54:13.6831320Z // end inline asm 2026-02-21T08:54:13.6831380Z // begin inline asm 2026-02-21T08:54:13.6831665Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1880, %r1881, %r1882, %r1883, %r1884, %r1885, %r1886, %r1887, %r1888, %r1889, %r1890, %r1891, %r1892, %r1893, %r1894, %r1895}, [%r412 + 128]; 2026-02-21T08:54:13.6831717Z // end inline asm 2026-02-21T08:54:13.6831798Z // begin inline asm 2026-02-21T08:54:13.6832085Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1897, %r1898, %r1899, %r1900, %r1901, %r1902, %r1903, %r1904, %r1905, %r1906, %r1907, %r1908, %r1909, %r1910, %r1911, %r1912}, [%r412 + 144]; 2026-02-21T08:54:13.6832139Z // end inline asm 2026-02-21T08:54:13.6832202Z // begin inline asm 2026-02-21T08:54:13.6832499Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1914, %r1915, %r1916, %r1917, %r1918, %r1919, %r1920, %r1921, %r1922, %r1923, %r1924, %r1925, %r1926, %r1927, %r1928, %r1929}, [%r412 + 160]; 2026-02-21T08:54:13.6832555Z // end inline asm 2026-02-21T08:54:13.6832615Z // begin inline asm 2026-02-21T08:54:13.6832903Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1931, %r1932, %r1933, %r1934, %r1935, %r1936, %r1937, %r1938, %r1939, %r1940, %r1941, %r1942, %r1943, %r1944, %r1945, %r1946}, [%r412 + 176]; 2026-02-21T08:54:13.6832956Z // end inline asm 2026-02-21T08:54:13.6833016Z // begin inline asm 2026-02-21T08:54:13.6833317Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1948, %r1949, %r1950, %r1951, %r1952, %r1953, %r1954, %r1955, %r1956, %r1957, %r1958, %r1959, %r1960, %r1961, %r1962, %r1963}, [%r412 + 192]; 2026-02-21T08:54:13.6833391Z // end inline asm 2026-02-21T08:54:13.6833445Z // begin inline asm 
2026-02-21T08:54:13.6833754Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1965, %r1966, %r1967, %r1968, %r1969, %r1970, %r1971, %r1972, %r1973, %r1974, %r1975, %r1976, %r1977, %r1978, %r1979, %r1980}, [%r412 + 208]; 2026-02-21T08:54:13.6833811Z // end inline asm 2026-02-21T08:54:13.6833864Z // begin inline asm 2026-02-21T08:54:13.6834169Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1982, %r1983, %r1984, %r1985, %r1986, %r1987, %r1988, %r1989, %r1990, %r1991, %r1992, %r1993, %r1994, %r1995, %r1996, %r1997}, [%r412 + 224]; 2026-02-21T08:54:13.6834223Z // end inline asm 2026-02-21T08:54:13.6834276Z // begin inline asm 2026-02-21T08:54:13.6834583Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1999, %r2000, %r2001, %r2002, %r2003, %r2004, %r2005, %r2006, %r2007, %r2008, %r2009, %r2010, %r2011, %r2012, %r2013, %r2014}, [%r412 + 240]; 2026-02-21T08:54:13.6834657Z // end inline asm 2026-02-21T08:54:13.6834761Z // begin inline asm 2026-02-21T08:54:13.6834843Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:13.6834896Z // end inline asm 2026-02-21T08:54:13.6834954Z cvt.u64.u32 %rd620, %r1744; 2026-02-21T08:54:13.6835010Z cvt.u64.u32 %rd621, %r1745; 2026-02-21T08:54:13.6835076Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:54:13.6835134Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:54:13.6835299Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6835367Z mov.b64 {%r2019, %r2020}, %rd623; 2026-02-21T08:54:13.6835438Z cvt.rn.f16x2.f32 %r2021, %r2020, %r2019; 2026-02-21T08:54:13.6835602Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6835664Z cvt.u64.u32 %rd624, %r1746; 2026-02-21T08:54:13.6835718Z cvt.u64.u32 %rd625, %r1747; 2026-02-21T08:54:13.6835774Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:54:13.6835832Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:54:13.6836002Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6836062Z mov.b64 {%r2022, 
%r2023}, %rd627; 2026-02-21T08:54:13.6836129Z cvt.rn.f16x2.f32 %r2024, %r2023, %r2022; 2026-02-21T08:54:13.6836331Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6836388Z cvt.u64.u32 %rd628, %r1748; 2026-02-21T08:54:13.6836443Z cvt.u64.u32 %rd629, %r1749; 2026-02-21T08:54:13.6836506Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:54:13.6836562Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:54:13.6836726Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6836784Z mov.b64 {%r2025, %r2026}, %rd631; 2026-02-21T08:54:13.6836885Z cvt.rn.f16x2.f32 %r2027, %r2026, %r2025; 2026-02-21T08:54:13.6837055Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6837113Z cvt.u64.u32 %rd632, %r1750; 2026-02-21T08:54:13.6837176Z cvt.u64.u32 %rd633, %r1751; 2026-02-21T08:54:13.6837232Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:54:13.6837289Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:54:13.6837463Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6837521Z mov.b64 {%r2028, %r2029}, %rd635; 2026-02-21T08:54:13.6837587Z cvt.rn.f16x2.f32 %r2030, %r2029, %r2028; 2026-02-21T08:54:13.6837750Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6837813Z cvt.u64.u32 %rd636, %r1752; 2026-02-21T08:54:13.6837868Z cvt.u64.u32 %rd637, %r1753; 2026-02-21T08:54:13.6837924Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:54:13.6837988Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:54:13.6838155Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6838243Z mov.b64 {%r2031, %r2032}, %rd639; 2026-02-21T08:54:13.6838314Z cvt.rn.f16x2.f32 %r2033, %r2032, %r2031; 2026-02-21T08:54:13.6838479Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6838537Z cvt.u64.u32 %rd640, %r1754; 
2026-02-21T08:54:13.6838591Z cvt.u64.u32 %rd641, %r1755; 2026-02-21T08:54:13.6838654Z shl.b64 %rd642, %rd641, 32; 2026-02-21T08:54:13.6838711Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T08:54:13.6838872Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6838936Z mov.b64 {%r2034, %r2035}, %rd643; 2026-02-21T08:54:13.6839001Z cvt.rn.f16x2.f32 %r2036, %r2035, %r2034; 2026-02-21T08:54:13.6839167Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6839231Z cvt.u64.u32 %rd644, %r1756; 2026-02-21T08:54:13.6839313Z cvt.u64.u32 %rd645, %r1757; 2026-02-21T08:54:13.6839369Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:54:13.6839425Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T08:54:13.6839597Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6839655Z mov.b64 {%r2037, %r2038}, %rd647; 2026-02-21T08:54:13.6839718Z cvt.rn.f16x2.f32 %r2039, %r2038, %r2037; 2026-02-21T08:54:13.6839886Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6839941Z cvt.u64.u32 %rd648, %r1758; 2026-02-21T08:54:13.6839995Z cvt.u64.u32 %rd649, %r1759; 2026-02-21T08:54:13.6840055Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:54:13.6840110Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:54:13.6840272Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6840329Z mov.b64 {%r2040, %r2041}, %rd651; 2026-02-21T08:54:13.6840402Z cvt.rn.f16x2.f32 %r2042, %r2041, %r2040; 2026-02-21T08:54:13.6840568Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6840622Z cvt.u64.u32 %rd652, %r1761; 2026-02-21T08:54:13.6840682Z cvt.u64.u32 %rd653, %r1762; 2026-02-21T08:54:13.6840759Z shl.b64 %rd654, %rd653, 32; 2026-02-21T08:54:13.6840816Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T08:54:13.6840981Z .loc 1 58 27 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6841038Z mov.b64 {%r2043, %r2044}, %rd655; 2026-02-21T08:54:13.6841101Z cvt.rn.f16x2.f32 %r2045, %r2044, %r2043; 2026-02-21T08:54:13.6841261Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6841321Z cvt.u64.u32 %rd656, %r1763; 2026-02-21T08:54:13.6841396Z cvt.u64.u32 %rd657, %r1764; 2026-02-21T08:54:13.6841452Z shl.b64 %rd658, %rd657, 32; 2026-02-21T08:54:13.6841516Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T08:54:13.6841676Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6841733Z mov.b64 {%r2046, %r2047}, %rd659; 2026-02-21T08:54:13.6841801Z cvt.rn.f16x2.f32 %r2048, %r2047, %r2046; 2026-02-21T08:54:13.6841958Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6842013Z cvt.u64.u32 %rd660, %r1765; 2026-02-21T08:54:13.6842066Z cvt.u64.u32 %rd661, %r1766; 2026-02-21T08:54:13.6842131Z shl.b64 %rd662, %rd661, 32; 2026-02-21T08:54:13.6842186Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T08:54:13.6842343Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6842407Z mov.b64 {%r2049, %r2050}, %rd663; 2026-02-21T08:54:13.6842470Z cvt.rn.f16x2.f32 %r2051, %r2050, %r2049; 2026-02-21T08:54:13.6842627Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6842722Z cvt.u64.u32 %rd664, %r1767; 2026-02-21T08:54:13.6842777Z cvt.u64.u32 %rd665, %r1768; 2026-02-21T08:54:13.6842831Z shl.b64 %rd666, %rd665, 32; 2026-02-21T08:54:13.6842886Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T08:54:13.6843060Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6843120Z mov.b64 {%r2052, %r2053}, %rd667; 2026-02-21T08:54:13.6843185Z cvt.rn.f16x2.f32 %r2054, %r2053, %r2052; 2026-02-21T08:54:13.6843360Z .loc 1 56 52 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6843416Z cvt.u64.u32 %rd668, %r1769; 2026-02-21T08:54:13.6843473Z cvt.u64.u32 %rd669, %r1770; 2026-02-21T08:54:13.6843535Z shl.b64 %rd670, %rd669, 32; 2026-02-21T08:54:13.6843592Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T08:54:13.6843780Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6843840Z mov.b64 {%r2055, %r2056}, %rd671; 2026-02-21T08:54:13.6843913Z cvt.rn.f16x2.f32 %r2057, %r2056, %r2055; 2026-02-21T08:54:13.6844074Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6844130Z cvt.u64.u32 %rd672, %r1771; 2026-02-21T08:54:13.6844194Z cvt.u64.u32 %rd673, %r1772; 2026-02-21T08:54:13.6844249Z shl.b64 %rd674, %rd673, 32; 2026-02-21T08:54:13.6844305Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T08:54:13.6844471Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6844530Z mov.b64 {%r2058, %r2059}, %rd675; 2026-02-21T08:54:13.6844592Z cvt.rn.f16x2.f32 %r2060, %r2059, %r2058; 2026-02-21T08:54:13.6844803Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6844867Z cvt.u64.u32 %rd676, %r1773; 2026-02-21T08:54:13.6844922Z cvt.u64.u32 %rd677, %r1774; 2026-02-21T08:54:13.6844977Z shl.b64 %rd678, %rd677, 32; 2026-02-21T08:54:13.6845042Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T08:54:13.6845200Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6845288Z mov.b64 {%r2061, %r2062}, %rd679; 2026-02-21T08:54:13.6845358Z cvt.rn.f16x2.f32 %r2063, %r2062, %r2061; 2026-02-21T08:54:13.6845521Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6845578Z cvt.u64.u32 %rd680, %r1775; 2026-02-21T08:54:13.6845632Z cvt.u64.u32 %rd681, %r1776; 2026-02-21T08:54:13.6845695Z shl.b64 %rd682, %rd681, 32; 2026-02-21T08:54:13.6845752Z 
or.b64 %rd683, %rd680, %rd682; 2026-02-21T08:54:13.6845946Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6846015Z mov.b64 {%r2064, %r2065}, %rd683; 2026-02-21T08:54:13.6846082Z cvt.rn.f16x2.f32 %r2066, %r2065, %r2064; 2026-02-21T08:54:13.6846248Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6846313Z cvt.u64.u32 %rd684, %r1778; 2026-02-21T08:54:13.6846369Z cvt.u64.u32 %rd685, %r1779; 2026-02-21T08:54:13.6846428Z shl.b64 %rd686, %rd685, 32; 2026-02-21T08:54:13.6846486Z or.b64 %rd687, %rd684, %rd686; 2026-02-21T08:54:13.6846656Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6846715Z mov.b64 {%r2067, %r2068}, %rd687; 2026-02-21T08:54:13.6846778Z cvt.rn.f16x2.f32 %r2069, %r2068, %r2067; 2026-02-21T08:54:13.6846950Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6847006Z cvt.u64.u32 %rd688, %r1780; 2026-02-21T08:54:13.6847063Z cvt.u64.u32 %rd689, %r1781; 2026-02-21T08:54:13.6847126Z shl.b64 %rd690, %rd689, 32; 2026-02-21T08:54:13.6847184Z or.b64 %rd691, %rd688, %rd690; 2026-02-21T08:54:13.6847372Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6847430Z mov.b64 {%r2070, %r2071}, %rd691; 2026-02-21T08:54:13.6847500Z cvt.rn.f16x2.f32 %r2072, %r2071, %r2070; 2026-02-21T08:54:13.6847661Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6847716Z cvt.u64.u32 %rd692, %r1782; 2026-02-21T08:54:13.6847777Z cvt.u64.u32 %rd693, %r1783; 2026-02-21T08:54:13.6847832Z shl.b64 %rd694, %rd693, 32; 2026-02-21T08:54:13.6847888Z or.b64 %rd695, %rd692, %rd694; 2026-02-21T08:54:13.6848054Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6848111Z mov.b64 {%r2073, %r2074}, %rd695; 2026-02-21T08:54:13.6848175Z cvt.rn.f16x2.f32 %r2075, %r2074, 
%r2073; 2026-02-21T08:54:13.6848358Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6848423Z cvt.u64.u32 %rd696, %r1784; 2026-02-21T08:54:13.6848477Z cvt.u64.u32 %rd697, %r1785; 2026-02-21T08:54:13.6848530Z shl.b64 %rd698, %rd697, 32; 2026-02-21T08:54:13.6848593Z or.b64 %rd699, %rd696, %rd698; 2026-02-21T08:54:13.6848752Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6848808Z mov.b64 {%r2076, %r2077}, %rd699; 2026-02-21T08:54:13.6848878Z cvt.rn.f16x2.f32 %r2078, %r2077, %r2076; 2026-02-21T08:54:13.6849038Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6849092Z cvt.u64.u32 %rd700, %r1786; 2026-02-21T08:54:13.6849146Z cvt.u64.u32 %rd701, %r1787; 2026-02-21T08:54:13.6849208Z shl.b64 %rd702, %rd701, 32; 2026-02-21T08:54:13.6849264Z or.b64 %rd703, %rd700, %rd702; 2026-02-21T08:54:13.6849426Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6849492Z mov.b64 {%r2079, %r2080}, %rd703; 2026-02-21T08:54:13.6849555Z cvt.rn.f16x2.f32 %r2081, %r2080, %r2079; 2026-02-21T08:54:13.6849712Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6849798Z cvt.u64.u32 %rd704, %r1788; 2026-02-21T08:54:13.6849853Z cvt.u64.u32 %rd705, %r1789; 2026-02-21T08:54:13.6849908Z shl.b64 %rd706, %rd705, 32; 2026-02-21T08:54:13.6849964Z or.b64 %rd707, %rd704, %rd706; 2026-02-21T08:54:13.6850138Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6850194Z mov.b64 {%r2082, %r2083}, %rd707; 2026-02-21T08:54:13.6850256Z cvt.rn.f16x2.f32 %r2084, %r2083, %r2082; 2026-02-21T08:54:13.6850448Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6850504Z cvt.u64.u32 %rd708, %r1790; 2026-02-21T08:54:13.6850560Z cvt.u64.u32 %rd709, %r1791; 2026-02-21T08:54:13.6850624Z shl.b64 
%rd710, %rd709, 32; 2026-02-21T08:54:13.6850681Z or.b64 %rd711, %rd708, %rd710; 2026-02-21T08:54:13.6850843Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6850909Z mov.b64 {%r2085, %r2086}, %rd711; 2026-02-21T08:54:13.6850973Z cvt.rn.f16x2.f32 %r2087, %r2086, %r2085; 2026-02-21T08:54:13.6852435Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6852502Z cvt.u64.u32 %rd712, %r1792; 2026-02-21T08:54:13.6852560Z cvt.u64.u32 %rd713, %r1793; 2026-02-21T08:54:13.6852625Z shl.b64 %rd714, %rd713, 32; 2026-02-21T08:54:13.6852685Z or.b64 %rd715, %rd712, %rd714; 2026-02-21T08:54:13.6852857Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6852924Z mov.b64 {%r2088, %r2089}, %rd715; 2026-02-21T08:54:13.6852992Z cvt.rn.f16x2.f32 %r2090, %r2089, %r2088; 2026-02-21T08:54:13.6853188Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6853253Z cvt.u64.u32 %rd716, %r1795; 2026-02-21T08:54:13.6853315Z cvt.u64.u32 %rd717, %r1796; 2026-02-21T08:54:13.6853375Z shl.b64 %rd718, %rd717, 32; 2026-02-21T08:54:13.6853434Z or.b64 %rd719, %rd716, %rd718; 2026-02-21T08:54:13.6853614Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6853695Z mov.b64 {%r2091, %r2092}, %rd719; 2026-02-21T08:54:13.6853760Z cvt.rn.f16x2.f32 %r2093, %r2092, %r2091; 2026-02-21T08:54:13.6853923Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6853988Z cvt.u64.u32 %rd720, %r1797; 2026-02-21T08:54:13.6854042Z cvt.u64.u32 %rd721, %r1798; 2026-02-21T08:54:13.6854097Z shl.b64 %rd722, %rd721, 32; 2026-02-21T08:54:13.6854185Z or.b64 %rd723, %rd720, %rd722; 2026-02-21T08:54:13.6854363Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6854424Z mov.b64 {%r2094, %r2095}, %rd723; 
2026-02-21T08:54:13.6854498Z cvt.rn.f16x2.f32 %r2096, %r2095, %r2094; 2026-02-21T08:54:13.6854722Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6854782Z cvt.u64.u32 %rd724, %r1799; 2026-02-21T08:54:13.6854840Z cvt.u64.u32 %rd725, %r1800; 2026-02-21T08:54:13.6854908Z shl.b64 %rd726, %rd725, 32; 2026-02-21T08:54:13.6854968Z or.b64 %rd727, %rd724, %rd726; 2026-02-21T08:54:13.6855143Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6855213Z mov.b64 {%r2097, %r2098}, %rd727; 2026-02-21T08:54:13.6855279Z cvt.rn.f16x2.f32 %r2099, %r2098, %r2097; 2026-02-21T08:54:13.6855451Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6855520Z cvt.u64.u32 %rd728, %r1801; 2026-02-21T08:54:13.6855578Z cvt.u64.u32 %rd729, %r1802; 2026-02-21T08:54:13.6855636Z shl.b64 %rd730, %rd729, 32; 2026-02-21T08:54:13.6855695Z or.b64 %rd731, %rd728, %rd730; 2026-02-21T08:54:13.6855908Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6855969Z mov.b64 {%r2100, %r2101}, %rd731; 2026-02-21T08:54:13.6856038Z cvt.rn.f16x2.f32 %r2102, %r2101, %r2100; 2026-02-21T08:54:13.6856217Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6856276Z cvt.u64.u32 %rd732, %r1803; 2026-02-21T08:54:13.6856335Z cvt.u64.u32 %rd733, %r1804; 2026-02-21T08:54:13.6856401Z shl.b64 %rd734, %rd733, 32; 2026-02-21T08:54:13.6856460Z or.b64 %rd735, %rd732, %rd734; 2026-02-21T08:54:13.6856636Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6856701Z mov.b64 {%r2103, %r2104}, %rd735; 2026-02-21T08:54:13.6856777Z cvt.rn.f16x2.f32 %r2105, %r2104, %r2103; 2026-02-21T08:54:13.6856950Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6857009Z cvt.u64.u32 %rd736, %r1805; 2026-02-21T08:54:13.6857073Z 
cvt.u64.u32 %rd737, %r1806; 2026-02-21T08:54:13.6857130Z shl.b64 %rd738, %rd737, 32; 2026-02-21T08:54:13.6857190Z or.b64 %rd739, %rd736, %rd738; 2026-02-21T08:54:13.6857427Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6857489Z mov.b64 {%r2106, %r2107}, %rd739; 2026-02-21T08:54:13.6857554Z cvt.rn.f16x2.f32 %r2108, %r2107, %r2106; 2026-02-21T08:54:13.6857734Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6857792Z cvt.u64.u32 %rd740, %r1807; 2026-02-21T08:54:13.6857851Z cvt.u64.u32 %rd741, %r1808; 2026-02-21T08:54:13.6857936Z shl.b64 %rd742, %rd741, 32; 2026-02-21T08:54:13.6858003Z or.b64 %rd743, %rd740, %rd742; 2026-02-21T08:54:13.6858178Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6858240Z mov.b64 {%r2109, %r2110}, %rd743; 2026-02-21T08:54:13.6858312Z cvt.rn.f16x2.f32 %r2111, %r2110, %r2109; 2026-02-21T08:54:13.6858481Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6858540Z cvt.u64.u32 %rd744, %r1809; 2026-02-21T08:54:13.6858606Z cvt.u64.u32 %rd745, %r1810; 2026-02-21T08:54:13.6858664Z shl.b64 %rd746, %rd745, 32; 2026-02-21T08:54:13.6858723Z or.b64 %rd747, %rd744, %rd746; 2026-02-21T08:54:13.6858891Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6858958Z mov.b64 {%r2112, %r2113}, %rd747; 2026-02-21T08:54:13.6859055Z cvt.rn.f16x2.f32 %r2114, %r2113, %r2112; 2026-02-21T08:54:13.6859230Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6859296Z cvt.u64.u32 %rd748, %r1812; 2026-02-21T08:54:13.6859354Z cvt.u64.u32 %rd749, %r1813; 2026-02-21T08:54:13.6859415Z shl.b64 %rd750, %rd749, 32; 2026-02-21T08:54:13.6859473Z or.b64 %rd751, %rd748, %rd750; 2026-02-21T08:54:13.6859651Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 
2026-02-21T08:54:13.6859712Z mov.b64 {%r2115, %r2116}, %rd751; 2026-02-21T08:54:13.6859780Z cvt.rn.f16x2.f32 %r2117, %r2116, %r2115; 2026-02-21T08:54:13.6859961Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6860020Z cvt.u64.u32 %rd752, %r1814; 2026-02-21T08:54:13.6860078Z cvt.u64.u32 %rd753, %r1815; 2026-02-21T08:54:13.6860142Z shl.b64 %rd754, %rd753, 32; 2026-02-21T08:54:13.6860203Z or.b64 %rd755, %rd752, %rd754; 2026-02-21T08:54:13.6860377Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6860446Z mov.b64 {%r2118, %r2119}, %rd755; 2026-02-21T08:54:13.6860514Z cvt.rn.f16x2.f32 %r2120, %r2119, %r2118; 2026-02-21T08:54:13.6860733Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6860791Z cvt.u64.u32 %rd756, %r1816; 2026-02-21T08:54:13.6860857Z cvt.u64.u32 %rd757, %r1817; 2026-02-21T08:54:13.6860915Z shl.b64 %rd758, %rd757, 32; 2026-02-21T08:54:13.6860974Z or.b64 %rd759, %rd756, %rd758; 2026-02-21T08:54:13.6861152Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6861211Z mov.b64 {%r2121, %r2122}, %rd759; 2026-02-21T08:54:13.6861277Z cvt.rn.f16x2.f32 %r2123, %r2122, %r2121; 2026-02-21T08:54:13.6861455Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6861514Z cvt.u64.u32 %rd760, %r1818; 2026-02-21T08:54:13.6861571Z cvt.u64.u32 %rd761, %r1819; 2026-02-21T08:54:13.6861628Z shl.b64 %rd762, %rd761, 32; 2026-02-21T08:54:13.6861697Z or.b64 %rd763, %rd760, %rd762; 2026-02-21T08:54:13.6861871Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6861932Z mov.b64 {%r2124, %r2125}, %rd763; 2026-02-21T08:54:13.6862007Z cvt.rn.f16x2.f32 %r2126, %r2125, %r2124; 2026-02-21T08:54:13.6862208Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 
2026-02-21T08:54:13.6862270Z cvt.u64.u32 %rd764, %r1820; 2026-02-21T08:54:13.6862335Z cvt.u64.u32 %rd765, %r1821; 2026-02-21T08:54:13.6862392Z shl.b64 %rd766, %rd765, 32; 2026-02-21T08:54:13.6862450Z or.b64 %rd767, %rd764, %rd766; 2026-02-21T08:54:13.6862621Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6862712Z mov.b64 {%r2127, %r2128}, %rd767; 2026-02-21T08:54:13.6862789Z cvt.rn.f16x2.f32 %r2129, %r2128, %r2127; 2026-02-21T08:54:13.6862951Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6863018Z cvt.u64.u32 %rd768, %r1822; 2026-02-21T08:54:13.6863072Z cvt.u64.u32 %rd769, %r1823; 2026-02-21T08:54:13.6863126Z shl.b64 %rd770, %rd769, 32; 2026-02-21T08:54:13.6863190Z or.b64 %rd771, %rd768, %rd770; 2026-02-21T08:54:13.6863354Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6863412Z mov.b64 {%r2130, %r2131}, %rd771; 2026-02-21T08:54:13.6863474Z cvt.rn.f16x2.f32 %r2132, %r2131, %r2130; 2026-02-21T08:54:13.6863645Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6863699Z cvt.u64.u32 %rd772, %r1824; 2026-02-21T08:54:13.6863776Z cvt.u64.u32 %rd773, %r1825; 2026-02-21T08:54:13.6863843Z shl.b64 %rd774, %rd773, 32; 2026-02-21T08:54:13.6863901Z or.b64 %rd775, %rd772, %rd774; 2026-02-21T08:54:13.6864062Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6864130Z mov.b64 {%r2133, %r2134}, %rd775; 2026-02-21T08:54:13.6864193Z cvt.rn.f16x2.f32 %r2135, %r2134, %r2133; 2026-02-21T08:54:13.6864351Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6864408Z cvt.u64.u32 %rd776, %r1826; 2026-02-21T08:54:13.6864472Z cvt.u64.u32 %rd777, %r1827; 2026-02-21T08:54:13.6864527Z shl.b64 %rd778, %rd777, 32; 2026-02-21T08:54:13.6864582Z or.b64 %rd779, %rd776, %rd778; 2026-02-21T08:54:13.6864774Z .loc 
1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6864832Z mov.b64 {%r2136, %r2137}, %rd779; 2026-02-21T08:54:13.6864897Z cvt.rn.f16x2.f32 %r2138, %r2137, %r2136; 2026-02-21T08:54:13.6865068Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6865124Z cvt.u64.u32 %rd780, %r1829; 2026-02-21T08:54:13.6865178Z cvt.u64.u32 %rd781, %r1830; 2026-02-21T08:54:13.6865264Z shl.b64 %rd782, %rd781, 32; 2026-02-21T08:54:13.6865329Z or.b64 %rd783, %rd780, %rd782; 2026-02-21T08:54:13.6865493Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6865551Z mov.b64 {%r2139, %r2140}, %rd783; 2026-02-21T08:54:13.6865622Z cvt.rn.f16x2.f32 %r2141, %r2140, %r2139; 2026-02-21T08:54:13.6865784Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6865839Z cvt.u64.u32 %rd784, %r1831; 2026-02-21T08:54:13.6865901Z cvt.u64.u32 %rd785, %r1832; 2026-02-21T08:54:13.6865955Z shl.b64 %rd786, %rd785, 32; 2026-02-21T08:54:13.6866012Z or.b64 %rd787, %rd784, %rd786; 2026-02-21T08:54:13.6866174Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6866241Z mov.b64 {%r2142, %r2143}, %rd787; 2026-02-21T08:54:13.6866304Z cvt.rn.f16x2.f32 %r2144, %r2143, %r2142; 2026-02-21T08:54:13.6866468Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6866533Z cvt.u64.u32 %rd788, %r1833; 2026-02-21T08:54:13.6866587Z cvt.u64.u32 %rd789, %r1834; 2026-02-21T08:54:13.6866664Z shl.b64 %rd790, %rd789, 32; 2026-02-21T08:54:13.6866733Z or.b64 %rd791, %rd788, %rd790; 2026-02-21T08:54:13.6866901Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6866959Z mov.b64 {%r2145, %r2146}, %rd791; 2026-02-21T08:54:13.6867025Z cvt.rn.f16x2.f32 %r2147, %r2146, %r2145; 2026-02-21T08:54:13.6867202Z .loc 1 56 52 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6867285Z cvt.u64.u32 %rd792, %r1835; 2026-02-21T08:54:13.6867340Z cvt.u64.u32 %rd793, %r1836; 2026-02-21T08:54:13.6867401Z shl.b64 %rd794, %rd793, 32; 2026-02-21T08:54:13.6867458Z or.b64 %rd795, %rd792, %rd794; 2026-02-21T08:54:13.6867622Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6867684Z mov.b64 {%r2148, %r2149}, %rd795; 2026-02-21T08:54:13.6867746Z cvt.rn.f16x2.f32 %r2150, %r2149, %r2148; 2026-02-21T08:54:13.6867912Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6867966Z cvt.u64.u32 %rd796, %r1837; 2026-02-21T08:54:13.6868027Z cvt.u64.u32 %rd797, %r1838; 2026-02-21T08:54:13.6868082Z shl.b64 %rd798, %rd797, 32; 2026-02-21T08:54:13.6868137Z or.b64 %rd799, %rd796, %rd798; 2026-02-21T08:54:13.6868329Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6868389Z mov.b64 {%r2151, %r2152}, %rd799; 2026-02-21T08:54:13.6868453Z cvt.rn.f16x2.f32 %r2153, %r2152, %r2151; 2026-02-21T08:54:13.6868621Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6868679Z cvt.u64.u32 %rd800, %r1839; 2026-02-21T08:54:13.6868733Z cvt.u64.u32 %rd801, %r1840; 2026-02-21T08:54:13.6868788Z shl.b64 %rd802, %rd801, 32; 2026-02-21T08:54:13.6868852Z or.b64 %rd803, %rd800, %rd802; 2026-02-21T08:54:13.6869019Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6869077Z mov.b64 {%r2154, %r2155}, %rd803; 2026-02-21T08:54:13.6869147Z cvt.rn.f16x2.f32 %r2156, %r2155, %r2154; 2026-02-21T08:54:13.6869310Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6869365Z cvt.u64.u32 %rd804, %r1841; 2026-02-21T08:54:13.6869428Z cvt.u64.u32 %rd805, %r1842; 2026-02-21T08:54:13.6869485Z shl.b64 %rd806, %rd805, 32; 2026-02-21T08:54:13.6869543Z 
or.b64 %rd807, %rd804, %rd806; 2026-02-21T08:54:13.6869707Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6869794Z mov.b64 {%r2157, %r2158}, %rd807; 2026-02-21T08:54:13.6869858Z cvt.rn.f16x2.f32 %r2159, %r2158, %r2157; 2026-02-21T08:54:13.6870022Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6870087Z cvt.u64.u32 %rd808, %r1843; 2026-02-21T08:54:13.6870141Z cvt.u64.u32 %rd809, %r1844; 2026-02-21T08:54:13.6870197Z shl.b64 %rd810, %rd809, 32; 2026-02-21T08:54:13.6870263Z or.b64 %rd811, %rd808, %rd810; 2026-02-21T08:54:13.6870424Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6870482Z mov.b64 {%r2160, %r2161}, %rd811; 2026-02-21T08:54:13.6870547Z cvt.rn.f16x2.f32 %r2162, %r2161, %r2160; 2026-02-21T08:54:13.6870721Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6870776Z cvt.u64.u32 %rd812, %r1846; 2026-02-21T08:54:13.6870832Z cvt.u64.u32 %rd813, %r1847; 2026-02-21T08:54:13.6870903Z shl.b64 %rd814, %rd813, 32; 2026-02-21T08:54:13.6870960Z or.b64 %rd815, %rd812, %rd814; 2026-02-21T08:54:13.6871123Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6871209Z mov.b64 {%r2163, %r2164}, %rd815; 2026-02-21T08:54:13.6871275Z cvt.rn.f16x2.f32 %r2165, %r2164, %r2163; 2026-02-21T08:54:13.6871438Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6871496Z cvt.u64.u32 %rd816, %r1848; 2026-02-21T08:54:13.6871559Z cvt.u64.u32 %rd817, %r1849; 2026-02-21T08:54:13.6871613Z shl.b64 %rd818, %rd817, 32; 2026-02-21T08:54:13.6871670Z or.b64 %rd819, %rd816, %rd818; 2026-02-21T08:54:13.6871860Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6871917Z mov.b64 {%r2166, %r2167}, %rd819; 2026-02-21T08:54:13.6871979Z cvt.rn.f16x2.f32 %r2168, %r2167, 
%r2166; 2026-02-21T08:54:13.6872151Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6872206Z cvt.u64.u32 %rd820, %r1850; 2026-02-21T08:54:13.6872261Z cvt.u64.u32 %rd821, %r1851; 2026-02-21T08:54:13.6872317Z shl.b64 %rd822, %rd821, 32; 2026-02-21T08:54:13.6872379Z or.b64 %rd823, %rd820, %rd822; 2026-02-21T08:54:13.6872546Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6872603Z mov.b64 {%r2169, %r2170}, %rd823; 2026-02-21T08:54:13.6872674Z cvt.rn.f16x2.f32 %r2171, %r2170, %r2169; 2026-02-21T08:54:13.6872857Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6872915Z cvt.u64.u32 %rd824, %r1852; 2026-02-21T08:54:13.6872977Z cvt.u64.u32 %rd825, %r1853; 2026-02-21T08:54:13.6873031Z shl.b64 %rd826, %rd825, 32; 2026-02-21T08:54:13.6873086Z or.b64 %rd827, %rd824, %rd826; 2026-02-21T08:54:13.6873253Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6873317Z mov.b64 {%r2172, %r2173}, %rd827; 2026-02-21T08:54:13.6873379Z cvt.rn.f16x2.f32 %r2174, %r2173, %r2172; 2026-02-21T08:54:13.6873545Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6873607Z cvt.u64.u32 %rd828, %r1854; 2026-02-21T08:54:13.6873661Z cvt.u64.u32 %rd829, %r1855; 2026-02-21T08:54:13.6873716Z shl.b64 %rd830, %rd829, 32; 2026-02-21T08:54:13.6873780Z or.b64 %rd831, %rd828, %rd830; 2026-02-21T08:54:13.6873946Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6874003Z mov.b64 {%r2175, %r2176}, %rd831; 2026-02-21T08:54:13.6874066Z cvt.rn.f16x2.f32 %r2177, %r2176, %r2175; 2026-02-21T08:54:13.6874236Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6874315Z cvt.u64.u32 %rd832, %r1856; 2026-02-21T08:54:13.6874369Z cvt.u64.u32 %rd833, %r1857; 2026-02-21T08:54:13.6874430Z shl.b64 
%rd834, %rd833, 32; 2026-02-21T08:54:13.6874486Z or.b64 %rd835, %rd832, %rd834; 2026-02-21T08:54:13.6874650Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6874778Z mov.b64 {%r2178, %r2179}, %rd835; 2026-02-21T08:54:13.6874843Z cvt.rn.f16x2.f32 %r2180, %r2179, %r2178; 2026-02-21T08:54:13.6875006Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6875067Z cvt.u64.u32 %rd836, %r1858; 2026-02-21T08:54:13.6875122Z cvt.u64.u32 %rd837, %r1859; 2026-02-21T08:54:13.6875176Z shl.b64 %rd838, %rd837, 32; 2026-02-21T08:54:13.6875235Z or.b64 %rd839, %rd836, %rd838; 2026-02-21T08:54:13.6875406Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6875464Z mov.b64 {%r2181, %r2182}, %rd839; 2026-02-21T08:54:13.6875526Z cvt.rn.f16x2.f32 %r2183, %r2182, %r2181; 2026-02-21T08:54:13.6875693Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6875773Z cvt.u64.u32 %rd840, %r1860; 2026-02-21T08:54:13.6875829Z cvt.u64.u32 %rd841, %r1861; 2026-02-21T08:54:13.6875884Z shl.b64 %rd842, %rd841, 32; 2026-02-21T08:54:13.6875946Z or.b64 %rd843, %rd840, %rd842; 2026-02-21T08:54:13.6876108Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6876165Z mov.b64 {%r2184, %r2185}, %rd843; 2026-02-21T08:54:13.6876234Z cvt.rn.f16x2.f32 %r2186, %r2185, %r2184; 2026-02-21T08:54:13.6876426Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6876482Z cvt.u64.u32 %rd844, %r1863; 2026-02-21T08:54:13.6876544Z cvt.u64.u32 %rd845, %r1864; 2026-02-21T08:54:13.6876601Z shl.b64 %rd846, %rd845, 32; 2026-02-21T08:54:13.6876657Z or.b64 %rd847, %rd844, %rd846; 2026-02-21T08:54:13.6876820Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6876885Z mov.b64 {%r2187, %r2188}, %rd847; 
2026-02-21T08:54:13.6876947Z cvt.rn.f16x2.f32 %r2189, %r2188, %r2187; 2026-02-21T08:54:13.6877108Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6877170Z cvt.u64.u32 %rd848, %r1865; 2026-02-21T08:54:13.6877225Z cvt.u64.u32 %rd849, %r1866; 2026-02-21T08:54:13.6877280Z shl.b64 %rd850, %rd849, 32; 2026-02-21T08:54:13.6877379Z or.b64 %rd851, %rd848, %rd850; 2026-02-21T08:54:13.6877545Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6877603Z mov.b64 {%r2190, %r2191}, %rd851; 2026-02-21T08:54:13.6877673Z cvt.rn.f16x2.f32 %r2192, %r2191, %r2190; 2026-02-21T08:54:13.6877839Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6877895Z cvt.u64.u32 %rd852, %r1867; 2026-02-21T08:54:13.6877949Z cvt.u64.u32 %rd853, %r1868; 2026-02-21T08:54:13.6878015Z shl.b64 %rd854, %rd853, 32; 2026-02-21T08:54:13.6878075Z or.b64 %rd855, %rd852, %rd854; 2026-02-21T08:54:13.6878243Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6878307Z mov.b64 {%r2193, %r2194}, %rd855; 2026-02-21T08:54:13.6878372Z cvt.rn.f16x2.f32 %r2195, %r2194, %r2193; 2026-02-21T08:54:13.6878538Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6878603Z cvt.u64.u32 %rd856, %r1869; 2026-02-21T08:54:13.6878658Z cvt.u64.u32 %rd857, %r1870; 2026-02-21T08:54:13.6878713Z shl.b64 %rd858, %rd857, 32; 2026-02-21T08:54:13.6878770Z or.b64 %rd859, %rd856, %rd858; 2026-02-21T08:54:13.6878972Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6879030Z mov.b64 {%r2196, %r2197}, %rd859; 2026-02-21T08:54:13.6879094Z cvt.rn.f16x2.f32 %r2198, %r2197, %r2196; 2026-02-21T08:54:13.6879265Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6879322Z cvt.u64.u32 %rd860, %r1871; 2026-02-21T08:54:13.6879377Z 
cvt.u64.u32 %rd861, %r1872; 2026-02-21T08:54:13.6879440Z shl.b64 %rd862, %rd861, 32; 2026-02-21T08:54:13.6879497Z or.b64 %rd863, %rd860, %rd862; 2026-02-21T08:54:13.6879663Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6879723Z mov.b64 {%r2199, %r2200}, %rd863; 2026-02-21T08:54:13.6879795Z cvt.rn.f16x2.f32 %r2201, %r2200, %r2199; 2026-02-21T08:54:13.6879960Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6880018Z cvt.u64.u32 %rd864, %r1873; 2026-02-21T08:54:13.6880081Z cvt.u64.u32 %rd865, %r1874; 2026-02-21T08:54:13.6880135Z shl.b64 %rd866, %rd865, 32; 2026-02-21T08:54:13.6880192Z or.b64 %rd867, %rd864, %rd866; 2026-02-21T08:54:13.6880384Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6880443Z mov.b64 {%r2202, %r2203}, %rd867; 2026-02-21T08:54:13.6880505Z cvt.rn.f16x2.f32 %r2204, %r2203, %r2202; 2026-02-21T08:54:13.6880669Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6880731Z cvt.u64.u32 %rd868, %r1875; 2026-02-21T08:54:13.6880788Z cvt.u64.u32 %rd869, %r1876; 2026-02-21T08:54:13.6880866Z shl.b64 %rd870, %rd869, 32; 2026-02-21T08:54:13.6880930Z or.b64 %rd871, %rd868, %rd870; 2026-02-21T08:54:13.6881097Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6881155Z mov.b64 {%r2205, %r2206}, %rd871; 2026-02-21T08:54:13.6881225Z cvt.rn.f16x2.f32 %r2207, %r2206, %r2205; 2026-02-21T08:54:13.6881388Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6881444Z cvt.u64.u32 %rd872, %r1877; 2026-02-21T08:54:13.6881498Z cvt.u64.u32 %rd873, %r1878; 2026-02-21T08:54:13.6881560Z shl.b64 %rd874, %rd873, 32; 2026-02-21T08:54:13.6881619Z or.b64 %rd875, %rd872, %rd874; 2026-02-21T08:54:13.6881783Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 
2026-02-21T08:54:13.6881847Z mov.b64 {%r2208, %r2209}, %rd875; 2026-02-21T08:54:13.6881933Z cvt.rn.f16x2.f32 %r2210, %r2209, %r2208; 2026-02-21T08:54:13.6882095Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6882157Z cvt.u64.u32 %rd876, %r1880; 2026-02-21T08:54:13.6882212Z cvt.u64.u32 %rd877, %r1881; 2026-02-21T08:54:13.6882270Z shl.b64 %rd878, %rd877, 32; 2026-02-21T08:54:13.6882327Z or.b64 %rd879, %rd876, %rd878; 2026-02-21T08:54:13.6882499Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6882558Z mov.b64 {%r2211, %r2212}, %rd879; 2026-02-21T08:54:13.6882622Z cvt.rn.f16x2.f32 %r2213, %r2212, %r2211; 2026-02-21T08:54:13.6882788Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6882843Z cvt.u64.u32 %rd880, %r1882; 2026-02-21T08:54:13.6882897Z cvt.u64.u32 %rd881, %r1883; 2026-02-21T08:54:13.6882959Z shl.b64 %rd882, %rd881, 32; 2026-02-21T08:54:13.6883018Z or.b64 %rd883, %rd880, %rd882; 2026-02-21T08:54:13.6883178Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6883235Z mov.b64 {%r2214, %r2215}, %rd883; 2026-02-21T08:54:13.6883307Z cvt.rn.f16x2.f32 %r2216, %r2215, %r2214; 2026-02-21T08:54:13.6883496Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6883551Z cvt.u64.u32 %rd884, %r1884; 2026-02-21T08:54:13.6883614Z cvt.u64.u32 %rd885, %r1885; 2026-02-21T08:54:13.6883670Z shl.b64 %rd886, %rd885, 32; 2026-02-21T08:54:13.6883727Z or.b64 %rd887, %rd884, %rd886; 2026-02-21T08:54:13.6883896Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6883954Z mov.b64 {%r2217, %r2218}, %rd887; 2026-02-21T08:54:13.6884017Z cvt.rn.f16x2.f32 %r2219, %r2218, %r2217; 2026-02-21T08:54:13.6884182Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 
2026-02-21T08:54:13.6884247Z cvt.u64.u32 %rd888, %r1886; 2026-02-21T08:54:13.6884301Z cvt.u64.u32 %rd889, %r1887; 2026-02-21T08:54:13.6884355Z shl.b64 %rd890, %rd889, 32; 2026-02-21T08:54:13.6884418Z or.b64 %rd891, %rd888, %rd890; 2026-02-21T08:54:13.6884582Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6884638Z mov.b64 {%r2220, %r2221}, %rd891; 2026-02-21T08:54:13.6884740Z cvt.rn.f16x2.f32 %r2222, %r2221, %r2220; 2026-02-21T08:54:13.6884930Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6884987Z cvt.u64.u32 %rd892, %r1888; 2026-02-21T08:54:13.6885041Z cvt.u64.u32 %rd893, %r1889; 2026-02-21T08:54:13.6885102Z shl.b64 %rd894, %rd893, 32; 2026-02-21T08:54:13.6885159Z or.b64 %rd895, %rd892, %rd894; 2026-02-21T08:54:13.6885323Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6885416Z mov.b64 {%r2223, %r2224}, %rd895; 2026-02-21T08:54:13.6885481Z cvt.rn.f16x2.f32 %r2225, %r2224, %r2223; 2026-02-21T08:54:13.6885643Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6885708Z cvt.u64.u32 %rd896, %r1890; 2026-02-21T08:54:13.6885761Z cvt.u64.u32 %rd897, %r1891; 2026-02-21T08:54:13.6885816Z shl.b64 %rd898, %rd897, 32; 2026-02-21T08:54:13.6885871Z or.b64 %rd899, %rd896, %rd898; 2026-02-21T08:54:13.6886046Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6886105Z mov.b64 {%r2226, %r2227}, %rd899; 2026-02-21T08:54:13.6886169Z cvt.rn.f16x2.f32 %r2228, %r2227, %r2226; 2026-02-21T08:54:13.6886337Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6886392Z cvt.u64.u32 %rd900, %r1892; 2026-02-21T08:54:13.6886480Z cvt.u64.u32 %rd901, %r1893; 2026-02-21T08:54:13.6886545Z shl.b64 %rd902, %rd901, 32; 2026-02-21T08:54:13.6886602Z or.b64 %rd903, %rd900, %rd902; 2026-02-21T08:54:13.6886764Z .loc 
1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6886822Z mov.b64 {%r2229, %r2230}, %rd903; 2026-02-21T08:54:13.6886894Z cvt.rn.f16x2.f32 %r2231, %r2230, %r2229; 2026-02-21T08:54:13.6887055Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6887111Z cvt.u64.u32 %rd904, %r1894; 2026-02-21T08:54:13.6887173Z cvt.u64.u32 %rd905, %r1895; 2026-02-21T08:54:13.6887228Z shl.b64 %rd906, %rd905, 32; 2026-02-21T08:54:13.6887284Z or.b64 %rd907, %rd904, %rd906; 2026-02-21T08:54:13.6887454Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6887511Z mov.b64 {%r2232, %r2233}, %rd907; 2026-02-21T08:54:13.6887574Z cvt.rn.f16x2.f32 %r2234, %r2233, %r2232; 2026-02-21T08:54:13.6887738Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6887801Z cvt.u64.u32 %rd908, %r1897; 2026-02-21T08:54:13.6887855Z cvt.u64.u32 %rd909, %r1898; 2026-02-21T08:54:13.6887943Z shl.b64 %rd910, %rd909, 32; 2026-02-21T08:54:13.6888011Z or.b64 %rd911, %rd908, %rd910; 2026-02-21T08:54:13.6888174Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6888235Z mov.b64 {%r2235, %r2236}, %rd911; 2026-02-21T08:54:13.6888311Z cvt.rn.f16x2.f32 %r2237, %r2236, %r2235; 2026-02-21T08:54:13.6888476Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6888532Z cvt.u64.u32 %rd912, %r1899; 2026-02-21T08:54:13.6888595Z cvt.u64.u32 %rd913, %r1900; 2026-02-21T08:54:13.6888659Z shl.b64 %rd914, %rd913, 32; 2026-02-21T08:54:13.6888717Z or.b64 %rd915, %rd912, %rd914; 2026-02-21T08:54:13.6888881Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6888945Z mov.b64 {%r2238, %r2239}, %rd915; 2026-02-21T08:54:13.6889008Z cvt.rn.f16x2.f32 %r2240, %r2239, %r2238; 2026-02-21T08:54:13.6889171Z .loc 1 56 52 // 
ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6889233Z cvt.u64.u32 %rd916, %r1901; 2026-02-21T08:54:13.6889289Z cvt.u64.u32 %rd917, %r1902; 2026-02-21T08:54:13.6889364Z shl.b64 %rd918, %rd917, 32; 2026-02-21T08:54:13.6889423Z or.b64 %rd919, %rd916, %rd918; 2026-02-21T08:54:13.6889594Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6889650Z mov.b64 {%r2241, %r2242}, %rd919; 2026-02-21T08:54:13.6889714Z cvt.rn.f16x2.f32 %r2243, %r2242, %r2241; 2026-02-21T08:54:13.6889884Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6889965Z cvt.u64.u32 %rd920, %r1903; 2026-02-21T08:54:13.6890020Z cvt.u64.u32 %rd921, %r1904; 2026-02-21T08:54:13.6890081Z shl.b64 %rd922, %rd921, 32; 2026-02-21T08:54:13.6890137Z or.b64 %rd923, %rd920, %rd922; 2026-02-21T08:54:13.6890300Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6890356Z mov.b64 {%r2244, %r2245}, %rd923; 2026-02-21T08:54:13.6890427Z cvt.rn.f16x2.f32 %r2246, %r2245, %r2244; 2026-02-21T08:54:13.6890588Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6890644Z cvt.u64.u32 %rd924, %r1905; 2026-02-21T08:54:13.6890705Z cvt.u64.u32 %rd925, %r1906; 2026-02-21T08:54:13.6890759Z shl.b64 %rd926, %rd925, 32; 2026-02-21T08:54:13.6890815Z or.b64 %rd927, %rd924, %rd926; 2026-02-21T08:54:13.6891003Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6891062Z mov.b64 {%r2247, %r2248}, %rd927; 2026-02-21T08:54:13.6891126Z cvt.rn.f16x2.f32 %r2249, %r2248, %r2247; 2026-02-21T08:54:13.6891295Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6891352Z cvt.u64.u32 %rd928, %r1907; 2026-02-21T08:54:13.6891407Z cvt.u64.u32 %rd929, %r1908; 2026-02-21T08:54:13.6891462Z shl.b64 %rd930, %rd929, 32; 2026-02-21T08:54:13.6891529Z 
or.b64 %rd931, %rd928, %rd930; 2026-02-21T08:54:13.6891687Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6891745Z mov.b64 {%r2250, %r2251}, %rd931; 2026-02-21T08:54:13.6891814Z cvt.rn.f16x2.f32 %r2252, %r2251, %r2250; 2026-02-21T08:54:13.6891971Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6892026Z cvt.u64.u32 %rd932, %r1909; 2026-02-21T08:54:13.6892088Z cvt.u64.u32 %rd933, %r1910; 2026-02-21T08:54:13.6892144Z shl.b64 %rd934, %rd933, 32; 2026-02-21T08:54:13.6892202Z or.b64 %rd935, %rd932, %rd934; 2026-02-21T08:54:13.6892360Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6892445Z mov.b64 {%r2253, %r2254}, %rd935; 2026-02-21T08:54:13.6892509Z cvt.rn.f16x2.f32 %r2255, %r2254, %r2253; 2026-02-21T08:54:13.6892675Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6892737Z cvt.u64.u32 %rd936, %r1911; 2026-02-21T08:54:13.6892791Z cvt.u64.u32 %rd937, %r1912; 2026-02-21T08:54:13.6892846Z shl.b64 %rd938, %rd937, 32; 2026-02-21T08:54:13.6892904Z or.b64 %rd939, %rd936, %rd938; 2026-02-21T08:54:13.6893073Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6893129Z mov.b64 {%r2256, %r2257}, %rd939; 2026-02-21T08:54:13.6893192Z cvt.rn.f16x2.f32 %r2258, %r2257, %r2256; 2026-02-21T08:54:13.6893363Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6893417Z cvt.u64.u32 %rd940, %r1914; 2026-02-21T08:54:13.6893471Z cvt.u64.u32 %rd941, %r1915; 2026-02-21T08:54:13.6893534Z shl.b64 %rd942, %rd941, 32; 2026-02-21T08:54:13.6893590Z or.b64 %rd943, %rd940, %rd942; 2026-02-21T08:54:13.6893753Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6893836Z mov.b64 {%r2259, %r2260}, %rd943; 2026-02-21T08:54:13.6893900Z cvt.rn.f16x2.f32 %r2261, %r2260, 
%r2259; 2026-02-21T08:54:13.6894062Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6894117Z cvt.u64.u32 %rd944, %r1916; 2026-02-21T08:54:13.6894178Z cvt.u64.u32 %rd945, %r1917; 2026-02-21T08:54:13.6894232Z shl.b64 %rd946, %rd945, 32; 2026-02-21T08:54:13.6894289Z or.b64 %rd947, %rd944, %rd946; 2026-02-21T08:54:13.6894485Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6894541Z mov.b64 {%r2262, %r2263}, %rd947; 2026-02-21T08:54:13.6894603Z cvt.rn.f16x2.f32 %r2264, %r2263, %r2262; 2026-02-21T08:54:13.6894800Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6894855Z cvt.u64.u32 %rd948, %r1918; 2026-02-21T08:54:13.6894910Z cvt.u64.u32 %rd949, %r1919; 2026-02-21T08:54:13.6894967Z shl.b64 %rd950, %rd949, 32; 2026-02-21T08:54:13.6895031Z or.b64 %rd951, %rd948, %rd950; 2026-02-21T08:54:13.6895197Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6895255Z mov.b64 {%r2265, %r2266}, %rd951; 2026-02-21T08:54:13.6895326Z cvt.rn.f16x2.f32 %r2267, %r2266, %r2265; 2026-02-21T08:54:13.6895516Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6895574Z cvt.u64.u32 %rd952, %r1920; 2026-02-21T08:54:13.6895635Z cvt.u64.u32 %rd953, %r1921; 2026-02-21T08:54:13.6895690Z shl.b64 %rd954, %rd953, 32; 2026-02-21T08:54:13.6895746Z or.b64 %rd955, %rd952, %rd954; 2026-02-21T08:54:13.6895906Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6895972Z mov.b64 {%r2268, %r2269}, %rd955; 2026-02-21T08:54:13.6896035Z cvt.rn.f16x2.f32 %r2270, %r2269, %r2268; 2026-02-21T08:54:13.6896195Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6896260Z cvt.u64.u32 %rd956, %r1922; 2026-02-21T08:54:13.6896315Z cvt.u64.u32 %rd957, %r1923; 2026-02-21T08:54:13.6896369Z shl.b64 
%rd958, %rd957, 32; 2026-02-21T08:54:13.6896434Z or.b64 %rd959, %rd956, %rd958; 2026-02-21T08:54:13.6896599Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6896658Z mov.b64 {%r2271, %r2272}, %rd959; 2026-02-21T08:54:13.6896722Z cvt.rn.f16x2.f32 %r2273, %r2272, %r2271; 2026-02-21T08:54:13.6896889Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6896973Z cvt.u64.u32 %rd960, %r1924; 2026-02-21T08:54:13.6897028Z cvt.u64.u32 %rd961, %r1925; 2026-02-21T08:54:13.6897093Z shl.b64 %rd962, %rd961, 32; 2026-02-21T08:54:13.6897154Z or.b64 %rd963, %rd960, %rd962; 2026-02-21T08:54:13.6897319Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6897383Z mov.b64 {%r2274, %r2275}, %rd963; 2026-02-21T08:54:13.6897445Z cvt.rn.f16x2.f32 %r2276, %r2275, %r2274; 2026-02-21T08:54:13.6897609Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6897665Z cvt.u64.u32 %rd964, %r1926; 2026-02-21T08:54:13.6897729Z cvt.u64.u32 %rd965, %r1927; 2026-02-21T08:54:13.6897785Z shl.b64 %rd966, %rd965, 32; 2026-02-21T08:54:13.6897842Z or.b64 %rd967, %rd964, %rd966; 2026-02-21T08:54:13.6898009Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6898068Z mov.b64 {%r2277, %r2278}, %rd967; 2026-02-21T08:54:13.6898130Z cvt.rn.f16x2.f32 %r2279, %r2278, %r2277; 2026-02-21T08:54:13.6898304Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6898384Z cvt.u64.u32 %rd968, %r1928; 2026-02-21T08:54:13.6898441Z cvt.u64.u32 %rd969, %r1929; 2026-02-21T08:54:13.6898496Z shl.b64 %rd970, %rd969, 32; 2026-02-21T08:54:13.6898560Z or.b64 %rd971, %rd968, %rd970; 2026-02-21T08:54:13.6898725Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6898784Z mov.b64 {%r2280, %r2281}, %rd971; 
2026-02-21T08:54:13.6898856Z cvt.rn.f16x2.f32 %r2282, %r2281, %r2280; 2026-02-21T08:54:13.6899063Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6899121Z cvt.u64.u32 %rd972, %r1931; 2026-02-21T08:54:13.6899186Z cvt.u64.u32 %rd973, %r1932; 2026-02-21T08:54:13.6899246Z shl.b64 %rd974, %rd973, 32; 2026-02-21T08:54:13.6899306Z or.b64 %rd975, %rd972, %rd974; 2026-02-21T08:54:13.6899491Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6899559Z mov.b64 {%r2283, %r2284}, %rd975; 2026-02-21T08:54:13.6899626Z cvt.rn.f16x2.f32 %r2285, %r2284, %r2283; 2026-02-21T08:54:13.6899810Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6899875Z cvt.u64.u32 %rd976, %r1933; 2026-02-21T08:54:13.6899932Z cvt.u64.u32 %rd977, %r1934; 2026-02-21T08:54:13.6899989Z shl.b64 %rd978, %rd977, 32; 2026-02-21T08:54:13.6900078Z or.b64 %rd979, %rd976, %rd978; 2026-02-21T08:54:13.6900265Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6900324Z mov.b64 {%r2286, %r2287}, %rd979; 2026-02-21T08:54:13.6900390Z cvt.rn.f16x2.f32 %r2288, %r2287, %r2286; 2026-02-21T08:54:13.6900582Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6900639Z cvt.u64.u32 %rd980, %r1935; 2026-02-21T08:54:13.6900696Z cvt.u64.u32 %rd981, %r1936; 2026-02-21T08:54:13.6900761Z shl.b64 %rd982, %rd981, 32; 2026-02-21T08:54:13.6900821Z or.b64 %rd983, %rd980, %rd982; 2026-02-21T08:54:13.6901005Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6901069Z mov.b64 {%r2289, %r2290}, %rd983; 2026-02-21T08:54:13.6901135Z cvt.rn.f16x2.f32 %r2291, %r2290, %r2289; 2026-02-21T08:54:13.6901318Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6901376Z cvt.u64.u32 %rd984, %r1937; 2026-02-21T08:54:13.6901441Z 
cvt.u64.u32 %rd985, %r1938; 2026-02-21T08:54:13.6901497Z shl.b64 %rd986, %rd985, 32; 2026-02-21T08:54:13.6901556Z or.b64 %rd987, %rd984, %rd986; 2026-02-21T08:54:13.6901761Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6901820Z mov.b64 {%r2292, %r2293}, %rd987; 2026-02-21T08:54:13.6901886Z cvt.rn.f16x2.f32 %r2294, %r2293, %r2292; 2026-02-21T08:54:13.6902065Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6902123Z cvt.u64.u32 %rd988, %r1939; 2026-02-21T08:54:13.6902178Z cvt.u64.u32 %rd989, %r1940; 2026-02-21T08:54:13.6902236Z shl.b64 %rd990, %rd989, 32; 2026-02-21T08:54:13.6902302Z or.b64 %rd991, %rd988, %rd990; 2026-02-21T08:54:13.6902473Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6902533Z mov.b64 {%r2295, %r2296}, %rd991; 2026-02-21T08:54:13.6902605Z cvt.rn.f16x2.f32 %r2297, %r2296, %r2295; 2026-02-21T08:54:13.6902773Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6902831Z cvt.u64.u32 %rd992, %r1941; 2026-02-21T08:54:13.6902894Z cvt.u64.u32 %rd993, %r1942; 2026-02-21T08:54:13.6902950Z shl.b64 %rd994, %rd993, 32; 2026-02-21T08:54:13.6903008Z or.b64 %rd995, %rd992, %rd994; 2026-02-21T08:54:13.6903201Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6903270Z mov.b64 {%r2298, %r2299}, %rd995; 2026-02-21T08:54:13.6903336Z cvt.rn.f16x2.f32 %r2300, %r2299, %r2298; 2026-02-21T08:54:13.6903500Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6903565Z cvt.u64.u32 %rd996, %r1943; 2026-02-21T08:54:13.6903622Z cvt.u64.u32 %rd997, %r1944; 2026-02-21T08:54:13.6903708Z shl.b64 %rd998, %rd997, 32; 2026-02-21T08:54:13.6903774Z or.b64 %rd999, %rd996, %rd998; 2026-02-21T08:54:13.6903946Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 
2026-02-21T08:54:13.6904007Z mov.b64 {%r2301, %r2302}, %rd999; 2026-02-21T08:54:13.6904076Z cvt.rn.f16x2.f32 %r2303, %r2302, %r2301; 2026-02-21T08:54:13.6904253Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6904314Z cvt.u64.u32 %rd1000, %r1945; 2026-02-21T08:54:13.6904376Z cvt.u64.u32 %rd1001, %r1946; 2026-02-21T08:54:13.6904446Z shl.b64 %rd1002, %rd1001, 32; 2026-02-21T08:54:13.6904506Z or.b64 %rd1003, %rd1000, %rd1002; 2026-02-21T08:54:13.6904706Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6904777Z mov.b64 {%r2304, %r2305}, %rd1003; 2026-02-21T08:54:13.6904870Z cvt.rn.f16x2.f32 %r2306, %r2305, %r2304; 2026-02-21T08:54:13.6905044Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6905105Z cvt.u64.u32 %rd1004, %r1948; 2026-02-21T08:54:13.6905171Z cvt.u64.u32 %rd1005, %r1949; 2026-02-21T08:54:13.6905236Z shl.b64 %rd1006, %rd1005, 32; 2026-02-21T08:54:13.6905296Z or.b64 %rd1007, %rd1004, %rd1006; 2026-02-21T08:54:13.6905476Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6905537Z mov.b64 {%r2307, %r2308}, %rd1007; 2026-02-21T08:54:13.6905606Z cvt.rn.f16x2.f32 %r2309, %r2308, %r2307; 2026-02-21T08:54:13.6905786Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6905846Z cvt.u64.u32 %rd1008, %r1950; 2026-02-21T08:54:13.6905907Z cvt.u64.u32 %rd1009, %r1951; 2026-02-21T08:54:13.6905969Z shl.b64 %rd1010, %rd1009, 32; 2026-02-21T08:54:13.6906039Z or.b64 %rd1011, %rd1008, %rd1010; 2026-02-21T08:54:13.6906212Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6906276Z mov.b64 {%r2310, %r2311}, %rd1011; 2026-02-21T08:54:13.6906352Z cvt.rn.f16x2.f32 %r2312, %r2311, %r2310; 2026-02-21T08:54:13.6906552Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 
2026-02-21T08:54:13.6906610Z cvt.u64.u32 %rd1012, %r1952; 2026-02-21T08:54:13.6906676Z cvt.u64.u32 %rd1013, %r1953; 2026-02-21T08:54:13.6906736Z shl.b64 %rd1014, %rd1013, 32; 2026-02-21T08:54:13.6906796Z or.b64 %rd1015, %rd1012, %rd1014; 2026-02-21T08:54:13.6906965Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6907032Z mov.b64 {%r2313, %r2314}, %rd1015; 2026-02-21T08:54:13.6907099Z cvt.rn.f16x2.f32 %r2315, %r2314, %r2313; 2026-02-21T08:54:13.6907271Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6907338Z cvt.u64.u32 %rd1016, %r1954; 2026-02-21T08:54:13.6907395Z cvt.u64.u32 %rd1017, %r1955; 2026-02-21T08:54:13.6907453Z shl.b64 %rd1018, %rd1017, 32; 2026-02-21T08:54:13.6907517Z or.b64 %rd1019, %rd1016, %rd1018; 2026-02-21T08:54:13.6907690Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6907750Z mov.b64 {%r2316, %r2317}, %rd1019; 2026-02-21T08:54:13.6907842Z cvt.rn.f16x2.f32 %r2318, %r2317, %r2316; 2026-02-21T08:54:13.6908030Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6908085Z cvt.u64.u32 %rd1020, %r1956; 2026-02-21T08:54:13.6908140Z cvt.u64.u32 %rd1021, %r1957; 2026-02-21T08:54:13.6908203Z shl.b64 %rd1022, %rd1021, 32; 2026-02-21T08:54:13.6908259Z or.b64 %rd1023, %rd1020, %rd1022; 2026-02-21T08:54:13.6908425Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6908516Z mov.b64 {%r2319, %r2320}, %rd1023; 2026-02-21T08:54:13.6908579Z cvt.rn.f16x2.f32 %r2321, %r2320, %r2319; 2026-02-21T08:54:13.6908736Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6908802Z cvt.u64.u32 %rd1024, %r1958; 2026-02-21T08:54:13.6908859Z cvt.u64.u32 %rd1025, %r1959; 2026-02-21T08:54:13.6908915Z shl.b64 %rd1026, %rd1025, 32; 2026-02-21T08:54:13.6908972Z or.b64 %rd1027, %rd1024, %rd1026; 
2026-02-21T08:54:13.6909139Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6909196Z mov.b64 {%r2322, %r2323}, %rd1027; 2026-02-21T08:54:13.6909259Z cvt.rn.f16x2.f32 %r2324, %r2323, %r2322; 2026-02-21T08:54:13.6909425Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6909504Z cvt.u64.u32 %rd1028, %r1960; 2026-02-21T08:54:13.6909563Z cvt.u64.u32 %rd1029, %r1961; 2026-02-21T08:54:13.6909628Z shl.b64 %rd1030, %rd1029, 32; 2026-02-21T08:54:13.6909685Z or.b64 %rd1031, %rd1028, %rd1030; 2026-02-21T08:54:13.6909841Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6909898Z mov.b64 {%r2325, %r2326}, %rd1031; 2026-02-21T08:54:13.6909968Z cvt.rn.f16x2.f32 %r2327, %r2326, %r2325; 2026-02-21T08:54:13.6910128Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6910184Z cvt.u64.u32 %rd1032, %r1962; 2026-02-21T08:54:13.6910246Z cvt.u64.u32 %rd1033, %r1963; 2026-02-21T08:54:13.6910302Z shl.b64 %rd1034, %rd1033, 32; 2026-02-21T08:54:13.6910357Z or.b64 %rd1035, %rd1032, %rd1034; 2026-02-21T08:54:13.6910519Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6910577Z mov.b64 {%r2328, %r2329}, %rd1035; 2026-02-21T08:54:13.6910641Z cvt.rn.f16x2.f32 %r2330, %r2329, %r2328; 2026-02-21T08:54:13.6910796Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6910858Z cvt.u64.u32 %rd1036, %r1965; 2026-02-21T08:54:13.6910938Z cvt.u64.u32 %rd1037, %r1966; 2026-02-21T08:54:13.6910992Z shl.b64 %rd1038, %rd1037, 32; 2026-02-21T08:54:13.6911055Z or.b64 %rd1039, %rd1036, %rd1038; 2026-02-21T08:54:13.6911219Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6911276Z mov.b64 {%r2331, %r2332}, %rd1039; 2026-02-21T08:54:13.6911346Z cvt.rn.f16x2.f32 %r2333, %r2332, %r2331; 
2026-02-21T08:54:13.6911504Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6911560Z cvt.u64.u32 %rd1040, %r1967; 2026-02-21T08:54:13.6911615Z cvt.u64.u32 %rd1041, %r1968; 2026-02-21T08:54:13.6911678Z shl.b64 %rd1042, %rd1041, 32; 2026-02-21T08:54:13.6911736Z or.b64 %rd1043, %rd1040, %rd1042; 2026-02-21T08:54:13.6911897Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6911961Z mov.b64 {%r2334, %r2335}, %rd1043; 2026-02-21T08:54:13.6912027Z cvt.rn.f16x2.f32 %r2336, %r2335, %r2334; 2026-02-21T08:54:13.6912187Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6912283Z cvt.u64.u32 %rd1044, %r1969; 2026-02-21T08:54:13.6912340Z cvt.u64.u32 %rd1045, %r1970; 2026-02-21T08:54:13.6912395Z shl.b64 %rd1046, %rd1045, 32; 2026-02-21T08:54:13.6912452Z or.b64 %rd1047, %rd1044, %rd1046; 2026-02-21T08:54:13.6912622Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6912680Z mov.b64 {%r2337, %r2338}, %rd1047; 2026-02-21T08:54:13.6912745Z cvt.rn.f16x2.f32 %r2339, %r2338, %r2337; 2026-02-21T08:54:13.6912914Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6912997Z cvt.u64.u32 %rd1048, %r1971; 2026-02-21T08:54:13.6913053Z cvt.u64.u32 %rd1049, %r1972; 2026-02-21T08:54:13.6913115Z shl.b64 %rd1050, %rd1049, 32; 2026-02-21T08:54:13.6913175Z or.b64 %rd1051, %rd1048, %rd1050; 2026-02-21T08:54:13.6913339Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6913398Z mov.b64 {%r2340, %r2341}, %rd1051; 2026-02-21T08:54:13.6913470Z cvt.rn.f16x2.f32 %r2342, %r2341, %r2340; 2026-02-21T08:54:13.6913631Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6913688Z cvt.u64.u32 %rd1052, %r1973; 2026-02-21T08:54:13.6913752Z cvt.u64.u32 %rd1053, %r1974; 
2026-02-21T08:54:13.6913809Z shl.b64 %rd1054, %rd1053, 32; 2026-02-21T08:54:13.6913889Z or.b64 %rd1055, %rd1052, %rd1054; 2026-02-21T08:54:13.6914061Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6914119Z mov.b64 {%r2343, %r2344}, %rd1055; 2026-02-21T08:54:13.6914186Z cvt.rn.f16x2.f32 %r2345, %r2344, %r2343; 2026-02-21T08:54:13.6914351Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6914418Z cvt.u64.u32 %rd1056, %r1975; 2026-02-21T08:54:13.6914476Z cvt.u64.u32 %rd1057, %r1976; 2026-02-21T08:54:13.6914536Z shl.b64 %rd1058, %rd1057, 32; 2026-02-21T08:54:13.6914603Z or.b64 %rd1059, %rd1056, %rd1058; 2026-02-21T08:54:13.6914824Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6914884Z mov.b64 {%r2346, %r2347}, %rd1059; 2026-02-21T08:54:13.6914959Z cvt.rn.f16x2.f32 %r2348, %r2347, %r2346; 2026-02-21T08:54:13.6915127Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6915187Z cvt.u64.u32 %rd1060, %r1977; 2026-02-21T08:54:13.6915242Z cvt.u64.u32 %rd1061, %r1978; 2026-02-21T08:54:13.6915310Z shl.b64 %rd1062, %rd1061, 32; 2026-02-21T08:54:13.6915370Z or.b64 %rd1063, %rd1060, %rd1062; 2026-02-21T08:54:13.6915566Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6915643Z mov.b64 {%r2349, %r2350}, %rd1063; 2026-02-21T08:54:13.6915711Z cvt.rn.f16x2.f32 %r2351, %r2350, %r2349; 2026-02-21T08:54:13.6915880Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6915951Z cvt.u64.u32 %rd1064, %r1979; 2026-02-21T08:54:13.6916009Z cvt.u64.u32 %rd1065, %r1980; 2026-02-21T08:54:13.6916070Z shl.b64 %rd1066, %rd1065, 32; 2026-02-21T08:54:13.6916130Z or.b64 %rd1067, %rd1064, %rd1066; 2026-02-21T08:54:13.6916309Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 
2026-02-21T08:54:13.6916371Z mov.b64 {%r2352, %r2353}, %rd1067; 2026-02-21T08:54:13.6916436Z cvt.rn.f16x2.f32 %r2354, %r2353, %r2352; 2026-02-21T08:54:13.6916609Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6916669Z cvt.u64.u32 %rd1068, %r1982; 2026-02-21T08:54:13.6916724Z cvt.u64.u32 %rd1069, %r1983; 2026-02-21T08:54:13.6916789Z shl.b64 %rd1070, %rd1069, 32; 2026-02-21T08:54:13.6916847Z or.b64 %rd1071, %rd1068, %rd1070; 2026-02-21T08:54:13.6917037Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6917098Z mov.b64 {%r2355, %r2356}, %rd1071; 2026-02-21T08:54:13.6917168Z cvt.rn.f16x2.f32 %r2357, %r2356, %r2355; 2026-02-21T08:54:13.6917330Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6917386Z cvt.u64.u32 %rd1072, %r1984; 2026-02-21T08:54:13.6917451Z cvt.u64.u32 %rd1073, %r1985; 2026-02-21T08:54:13.6917535Z shl.b64 %rd1074, %rd1073, 32; 2026-02-21T08:54:13.6917592Z or.b64 %rd1075, %rd1072, %rd1074; 2026-02-21T08:54:13.6917762Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6917823Z mov.b64 {%r2358, %r2359}, %rd1075; 2026-02-21T08:54:13.6917887Z cvt.rn.f16x2.f32 %r2360, %r2359, %r2358; 2026-02-21T08:54:13.6918051Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6918115Z cvt.u64.u32 %rd1076, %r1986; 2026-02-21T08:54:13.6918170Z cvt.u64.u32 %rd1077, %r1987; 2026-02-21T08:54:13.6918227Z shl.b64 %rd1078, %rd1077, 32; 2026-02-21T08:54:13.6918293Z or.b64 %rd1079, %rd1076, %rd1078; 2026-02-21T08:54:13.6918455Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6918512Z mov.b64 {%r2361, %r2362}, %rd1079; 2026-02-21T08:54:13.6918607Z cvt.rn.f16x2.f32 %r2363, %r2362, %r2361; 2026-02-21T08:54:13.6918770Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 
2026-02-21T08:54:13.6918828Z cvt.u64.u32 %rd1080, %r1988; 2026-02-21T08:54:13.6918885Z cvt.u64.u32 %rd1081, %r1989; 2026-02-21T08:54:13.6918948Z shl.b64 %rd1082, %rd1081, 32; 2026-02-21T08:54:13.6919006Z or.b64 %rd1083, %rd1080, %rd1082; 2026-02-21T08:54:13.6919169Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6919233Z mov.b64 {%r2364, %r2365}, %rd1083; 2026-02-21T08:54:13.6919296Z cvt.rn.f16x2.f32 %r2366, %r2365, %r2364; 2026-02-21T08:54:13.6919455Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6919519Z cvt.u64.u32 %rd1084, %r1990; 2026-02-21T08:54:13.6919574Z cvt.u64.u32 %rd1085, %r1991; 2026-02-21T08:54:13.6919631Z shl.b64 %rd1086, %rd1085, 32; 2026-02-21T08:54:13.6919687Z or.b64 %rd1087, %rd1084, %rd1086; 2026-02-21T08:54:13.6919854Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6919910Z mov.b64 {%r2367, %r2368}, %rd1087; 2026-02-21T08:54:13.6919994Z cvt.rn.f16x2.f32 %r2369, %r2368, %r2367; 2026-02-21T08:54:13.6920162Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6920218Z cvt.u64.u32 %rd1088, %r1992; 2026-02-21T08:54:13.6920275Z cvt.u64.u32 %rd1089, %r1993; 2026-02-21T08:54:13.6920337Z shl.b64 %rd1090, %rd1089, 32; 2026-02-21T08:54:13.6920393Z or.b64 %rd1091, %rd1088, %rd1090; 2026-02-21T08:54:13.6920558Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6920620Z mov.b64 {%r2370, %r2371}, %rd1091; 2026-02-21T08:54:13.6920683Z cvt.rn.f16x2.f32 %r2372, %r2371, %r2370; 2026-02-21T08:54:13.6920846Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6920903Z cvt.u64.u32 %rd1092, %r1994; 2026-02-21T08:54:13.6920966Z cvt.u64.u32 %rd1093, %r1995; 2026-02-21T08:54:13.6921022Z shl.b64 %rd1094, %rd1093, 32; 2026-02-21T08:54:13.6921079Z or.b64 %rd1095, %rd1092, %rd1094; 
2026-02-21T08:54:13.6921249Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6921305Z mov.b64 {%r2373, %r2374}, %rd1095; 2026-02-21T08:54:13.6921390Z cvt.rn.f16x2.f32 %r2375, %r2374, %r2373; 2026-02-21T08:54:13.6921560Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6921616Z cvt.u64.u32 %rd1096, %r1996; 2026-02-21T08:54:13.6921672Z cvt.u64.u32 %rd1097, %r1997; 2026-02-21T08:54:13.6921728Z shl.b64 %rd1098, %rd1097, 32; 2026-02-21T08:54:13.6921792Z or.b64 %rd1099, %rd1096, %rd1098; 2026-02-21T08:54:13.6921951Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6922032Z mov.b64 {%r2376, %r2377}, %rd1099; 2026-02-21T08:54:13.6922101Z cvt.rn.f16x2.f32 %r2378, %r2377, %r2376; 2026-02-21T08:54:13.6922264Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6922322Z cvt.u64.u32 %rd1100, %r1999; 2026-02-21T08:54:13.6922385Z cvt.u64.u32 %rd1101, %r2000; 2026-02-21T08:54:13.6922442Z shl.b64 %rd1102, %rd1101, 32; 2026-02-21T08:54:13.6922499Z or.b64 %rd1103, %rd1100, %rd1102; 2026-02-21T08:54:13.6922663Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6922728Z mov.b64 {%r2379, %r2380}, %rd1103; 2026-02-21T08:54:13.6922792Z cvt.rn.f16x2.f32 %r2381, %r2380, %r2379; 2026-02-21T08:54:13.6922954Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6923044Z cvt.u64.u32 %rd1104, %r2001; 2026-02-21T08:54:13.6923104Z cvt.u64.u32 %rd1105, %r2002; 2026-02-21T08:54:13.6923161Z shl.b64 %rd1106, %rd1105, 32; 2026-02-21T08:54:13.6923228Z or.b64 %rd1107, %rd1104, %rd1106; 2026-02-21T08:54:13.6923391Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6923451Z mov.b64 {%r2382, %r2383}, %rd1107; 2026-02-21T08:54:13.6923515Z cvt.rn.f16x2.f32 %r2384, %r2383, %r2382; 
2026-02-21T08:54:13.6923688Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6923745Z cvt.u64.u32 %rd1108, %r2003; 2026-02-21T08:54:13.6923809Z cvt.u64.u32 %rd1109, %r2004; 2026-02-21T08:54:13.6923873Z shl.b64 %rd1110, %rd1109, 32; 2026-02-21T08:54:13.6923929Z or.b64 %rd1111, %rd1108, %rd1110; 2026-02-21T08:54:13.6924091Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6924158Z mov.b64 {%r2385, %r2386}, %rd1111; 2026-02-21T08:54:13.6924225Z cvt.rn.f16x2.f32 %r2387, %r2386, %r2385; 2026-02-21T08:54:13.6924386Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6924464Z cvt.u64.u32 %rd1112, %r2005; 2026-02-21T08:54:13.6924528Z cvt.u64.u32 %rd1113, %r2006; 2026-02-21T08:54:13.6924583Z shl.b64 %rd1114, %rd1113, 32; 2026-02-21T08:54:13.6924639Z or.b64 %rd1115, %rd1112, %rd1114; 2026-02-21T08:54:13.6924836Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6924894Z mov.b64 {%r2388, %r2389}, %rd1115; 2026-02-21T08:54:13.6924959Z cvt.rn.f16x2.f32 %r2390, %r2389, %r2388; 2026-02-21T08:54:13.6925128Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6925184Z cvt.u64.u32 %rd1116, %r2007; 2026-02-21T08:54:13.6925240Z cvt.u64.u32 %rd1117, %r2008; 2026-02-21T08:54:13.6925297Z shl.b64 %rd1118, %rd1117, 32; 2026-02-21T08:54:13.6925363Z or.b64 %rd1119, %rd1116, %rd1118; 2026-02-21T08:54:13.6925524Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6925581Z mov.b64 {%r2391, %r2392}, %rd1119; 2026-02-21T08:54:13.6925651Z cvt.rn.f16x2.f32 %r2393, %r2392, %r2391; 2026-02-21T08:54:13.6925811Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6925896Z cvt.u64.u32 %rd1120, %r2009; 2026-02-21T08:54:13.6925964Z cvt.u64.u32 %rd1121, %r2010; 
2026-02-21T08:54:13.6926023Z shl.b64 %rd1122, %rd1121, 32; 2026-02-21T08:54:13.6926084Z or.b64 %rd1123, %rd1120, %rd1122; 2026-02-21T08:54:13.6926244Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6926312Z mov.b64 {%r2394, %r2395}, %rd1123; 2026-02-21T08:54:13.6926380Z cvt.rn.f16x2.f32 %r2396, %r2395, %r2394; 2026-02-21T08:54:13.6926570Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6926635Z cvt.u64.u32 %rd1124, %r2011; 2026-02-21T08:54:13.6926690Z cvt.u64.u32 %rd1125, %r2012; 2026-02-21T08:54:13.6926749Z shl.b64 %rd1126, %rd1125, 32; 2026-02-21T08:54:13.6926813Z or.b64 %rd1127, %rd1124, %rd1126; 2026-02-21T08:54:13.6926976Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6927034Z mov.b64 {%r2397, %r2398}, %rd1127; 2026-02-21T08:54:13.6927098Z cvt.rn.f16x2.f32 %r2399, %r2398, %r2397; 2026-02-21T08:54:13.6927267Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6927323Z cvt.u64.u32 %rd1128, %r2013; 2026-02-21T08:54:13.6927377Z cvt.u64.u32 %rd1129, %r2014; 2026-02-21T08:54:13.6927439Z shl.b64 %rd1130, %rd1129, 32; 2026-02-21T08:54:13.6927521Z or.b64 %rd1131, %rd1128, %rd1130; 2026-02-21T08:54:13.6927685Z .loc 1 58 27 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:58:27 2026-02-21T08:54:13.6927750Z mov.b64 {%r2400, %r2401}, %rd1131; 2026-02-21T08:54:13.6927813Z cvt.rn.f16x2.f32 %r2402, %r2401, %r2400; 2026-02-21T08:54:13.6927978Z .loc 1 59 45 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:59:45 2026-02-21T08:54:13.6928047Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:13.6928107Z bar.sync 0; 2026-02-21T08:54:13.6928210Z st.shared.v4.b32 [%r177], {%r2021, %r2024, %r2027, %r2030}; 2026-02-21T08:54:13.6928317Z st.shared.v4.b32 [%r177+16384], {%r2117, %r2120, %r2123, %r2126}; 2026-02-21T08:54:13.6928428Z st.shared.v4.b32 [%r177+32768], {%r2213, 
%r2216, %r2219, %r2222}; 2026-02-21T08:54:13.6928523Z st.shared.v4.b32 [%r177+49152], {%r2309, %r2312, %r2315, %r2318}; 2026-02-21T08:54:13.6928616Z st.shared.v4.b32 [%r178], {%r2033, %r2036, %r2039, %r2042}; 2026-02-21T08:54:13.6928718Z st.shared.v4.b32 [%r178+16384], {%r2129, %r2132, %r2135, %r2138}; 2026-02-21T08:54:13.6928812Z st.shared.v4.b32 [%r178+32768], {%r2225, %r2228, %r2231, %r2234}; 2026-02-21T08:54:13.6928903Z st.shared.v4.b32 [%r178+49152], {%r2321, %r2324, %r2327, %r2330}; 2026-02-21T08:54:13.6929046Z st.shared.v4.b32 [%r179], {%r2045, %r2048, %r2051, %r2054}; 2026-02-21T08:54:13.6929141Z st.shared.v4.b32 [%r179+16384], {%r2141, %r2144, %r2147, %r2150}; 2026-02-21T08:54:13.6929236Z st.shared.v4.b32 [%r179+32768], {%r2237, %r2240, %r2243, %r2246}; 2026-02-21T08:54:13.6929330Z st.shared.v4.b32 [%r179+49152], {%r2333, %r2336, %r2339, %r2342}; 2026-02-21T08:54:13.6929426Z st.shared.v4.b32 [%r180], {%r2057, %r2060, %r2063, %r2066}; 2026-02-21T08:54:13.6929518Z st.shared.v4.b32 [%r180+16384], {%r2153, %r2156, %r2159, %r2162}; 2026-02-21T08:54:13.6929611Z st.shared.v4.b32 [%r180+32768], {%r2249, %r2252, %r2255, %r2258}; 2026-02-21T08:54:13.6929714Z st.shared.v4.b32 [%r180+49152], {%r2345, %r2348, %r2351, %r2354}; 2026-02-21T08:54:13.6929805Z st.shared.v4.b32 [%r181], {%r2069, %r2072, %r2075, %r2078}; 2026-02-21T08:54:13.6929899Z st.shared.v4.b32 [%r181+16384], {%r2165, %r2168, %r2171, %r2174}; 2026-02-21T08:54:13.6929997Z st.shared.v4.b32 [%r181+32768], {%r2261, %r2264, %r2267, %r2270}; 2026-02-21T08:54:13.6930092Z st.shared.v4.b32 [%r181+49152], {%r2357, %r2360, %r2363, %r2366}; 2026-02-21T08:54:13.6930180Z st.shared.v4.b32 [%r182], {%r2081, %r2084, %r2087, %r2090}; 2026-02-21T08:54:13.6930302Z st.shared.v4.b32 [%r182+16384], {%r2177, %r2180, %r2183, %r2186}; 2026-02-21T08:54:13.6930396Z st.shared.v4.b32 [%r182+32768], {%r2273, %r2276, %r2279, %r2282}; 2026-02-21T08:54:13.6930489Z st.shared.v4.b32 [%r182+49152], {%r2369, %r2372, %r2375, %r2378}; 
2026-02-21T08:54:13.6930576Z st.shared.v4.b32 [%r183], {%r2093, %r2096, %r2099, %r2102}; 2026-02-21T08:54:13.6930676Z st.shared.v4.b32 [%r183+16384], {%r2189, %r2192, %r2195, %r2198}; 2026-02-21T08:54:13.6930770Z st.shared.v4.b32 [%r183+32768], {%r2285, %r2288, %r2291, %r2294}; 2026-02-21T08:54:13.6930895Z st.shared.v4.b32 [%r183+49152], {%r2381, %r2384, %r2387, %r2390}; 2026-02-21T08:54:13.6930992Z st.shared.v4.b32 [%r184], {%r2105, %r2108, %r2111, %r2114}; 2026-02-21T08:54:13.6931085Z st.shared.v4.b32 [%r184+16384], {%r2201, %r2204, %r2207, %r2210}; 2026-02-21T08:54:13.6931179Z st.shared.v4.b32 [%r184+32768], {%r2297, %r2300, %r2303, %r2306}; 2026-02-21T08:54:13.6931279Z st.shared.v4.b32 [%r184+49152], {%r2393, %r2396, %r2399, %r2402}; 2026-02-21T08:54:13.6931339Z // begin inline asm 2026-02-21T08:54:13.6931414Z fence.proxy.async.shared::cta; 2026-02-21T08:54:13.6931468Z // end inline asm 2026-02-21T08:54:13.6931529Z bar.sync 0; 2026-02-21T08:54:13.6931594Z elect.sync %r2403|%p96, -1; 2026-02-21T08:54:13.6931658Z and.pred %p94, %p95, %p96; 2026-02-21T08:54:13.6931725Z add.s32 %r2016, %r2413, %r186; 2026-02-21T08:54:13.6931780Z // begin inline asm 2026-02-21T08:54:13.6931986Z @%p94 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd619, {%r2016, %r2417}], [%r2018]; 2026-02-21T08:54:13.6932054Z // end inline asm 2026-02-21T08:54:13.6932122Z cp.async.bulk.commit_group; 2026-02-21T08:54:13.6932183Z bra.uni $L__BB0_10; 2026-02-21T08:54:13.6932271Z $L__BB0_11: // %._crit_edge 2026-02-21T08:54:13.6932460Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6932535Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:13.6932592Z bar.sync 0; 2026-02-21T08:54:13.6932664Z @%p41 bra $L__BB0_13; 2026-02-21T08:54:13.6932720Z // %bb.12: 2026-02-21T08:54:13.6932889Z .loc 1 56 52 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:56:52 2026-02-21T08:54:13.6932952Z // begin inline asm 2026-02-21T08:54:13.6933005Z 
2026-02-21T08:54:13.6933056Z { 2026-02-21T08:54:13.6933118Z .reg .pred complete; 2026-02-21T08:54:13.6933181Z waitLoop: 2026-02-21T08:54:13.6933306Z mbarrier.try_wait.parity.shared.b64 complete, [%r2533], %r2534; 2026-02-21T08:54:13.6933372Z @!complete bra.uni waitLoop; 2026-02-21T08:54:13.6933429Z } 2026-02-21T08:54:13.6933435Z 2026-02-21T08:54:13.6933491Z // end inline asm 2026-02-21T08:54:13.6933546Z $L__BB0_13: 2026-02-21T08:54:13.6933717Z .loc 1 30 108 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:108 2026-02-21T08:54:13.6933809Z cp.async.wait_group 0; 2026-02-21T08:54:13.6933862Z bar.sync 0; 2026-02-21T08:54:13.6933922Z add.s32 %r2406, %r403, 655360; 2026-02-21T08:54:13.6933981Z // begin inline asm 2026-02-21T08:54:13.6934065Z @%p99 mbarrier.inval.shared::cta.b64 [%r2406]; 2026-02-21T08:54:13.6934116Z // end inline asm 2026-02-21T08:54:13.6934166Z bar.sync 0; 2026-02-21T08:54:13.6934225Z // begin inline asm 2026-02-21T08:54:13.6934307Z @%p99 mbarrier.inval.shared::cta.b64 [%r685]; 2026-02-21T08:54:13.6934359Z // end inline asm 2026-02-21T08:54:13.6934527Z .loc 1 30 4 // ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py:30:4 2026-02-21T08:54:13.6934579Z bar.sync 0; 2026-02-21T08:54:13.6934631Z // begin inline asm 2026-02-21T08:54:13.6934775Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2408, 256; 2026-02-21T08:54:13.6934827Z // end inline asm 2026-02-21T08:54:13.6934876Z ret; 2026-02-21T08:54:13.6934929Z $L__tmp1: 2026-02-21T08:54:13.6934986Z $L__func_end0: 2026-02-21T08:54:13.6935065Z // -- End function 2026-02-21T08:54:13.6935115Z } 2026-02-21T08:54:13.6935351Z .file 1 "/tmp/torchinductor_root/kz/ckz5dgwiqpy4ul3vq2mzjnqgteck52owptaicdpswozntdibci37.py" 2026-02-21T08:54:13.6935412Z .section .debug_abbrev 2026-02-21T08:54:13.6935460Z { 2026-02-21T08:54:13.6935548Z .b8 1 // Abbreviation Code 2026-02-21T08:54:13.6935638Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:13.6935713Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:13.6935793Z 
.b8 37 // DW_AT_producer 2026-02-21T08:54:13.6935899Z .b8 8 // DW_FORM_string 2026-02-21T08:54:13.6935970Z .b8 19 // DW_AT_language 2026-02-21T08:54:13.6936045Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:13.6936125Z .b8 3 // DW_AT_name 2026-02-21T08:54:13.6936195Z .b8 8 // DW_FORM_string 2026-02-21T08:54:13.6936273Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:13.6936350Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:13.6936424Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:13.6936492Z .b8 8 // DW_FORM_string 2026-02-21T08:54:13.6936558Z .b8 0 // EOM(1) 2026-02-21T08:54:13.6936627Z .b8 0 // EOM(2) 2026-02-21T08:54:13.6936718Z .b8 0 // EOM(3) 2026-02-21T08:54:13.6936769Z } 2026-02-21T08:54:13.6936832Z .section .debug_info 2026-02-21T08:54:13.6936877Z { 2026-02-21T08:54:13.6936958Z .b32 104 // Length of Unit 2026-02-21T08:54:13.6937039Z .b8 2 // DWARF version number 2026-02-21T08:54:13.6937096Z .b8 0 2026-02-21T08:54:13.6937208Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:54:13.6937292Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:13.6937391Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:13.6937468Z .b8 116 // DW_AT_producer 2026-02-21T08:54:13.6937519Z .b8 114 2026-02-21T08:54:13.6937575Z .b8 105 2026-02-21T08:54:13.6937622Z .b8 116 2026-02-21T08:54:13.6937670Z .b8 111 2026-02-21T08:54:13.6937717Z .b8 110 2026-02-21T08:54:13.6937770Z .b8 0 2026-02-21T08:54:13.6937842Z .b8 2 // DW_AT_language 2026-02-21T08:54:13.6937889Z .b8 0 2026-02-21T08:54:13.6937968Z .b8 99 // DW_AT_name 2026-02-21T08:54:13.6938017Z .b8 107 2026-02-21T08:54:13.6938065Z .b8 122 2026-02-21T08:54:13.6938110Z .b8 53 2026-02-21T08:54:13.6938197Z .b8 100 2026-02-21T08:54:13.6938247Z .b8 103 2026-02-21T08:54:13.6938295Z .b8 119 2026-02-21T08:54:13.6938348Z .b8 105 2026-02-21T08:54:13.6938395Z .b8 113 2026-02-21T08:54:13.6938443Z .b8 112 2026-02-21T08:54:13.6938489Z .b8 121 2026-02-21T08:54:13.6938545Z .b8 52 2026-02-21T08:54:13.6938592Z .b8 117 2026-02-21T08:54:13.6938639Z 
.b8 108 2026-02-21T08:54:13.6938685Z .b8 51 2026-02-21T08:54:13.6938740Z .b8 118 2026-02-21T08:54:13.6938788Z .b8 113 2026-02-21T08:54:13.6938836Z .b8 50 2026-02-21T08:54:13.6938888Z .b8 109 2026-02-21T08:54:13.6938936Z .b8 122 2026-02-21T08:54:13.6938982Z .b8 106 2026-02-21T08:54:13.6939029Z .b8 110 2026-02-21T08:54:13.6939083Z .b8 113 2026-02-21T08:54:13.6939130Z .b8 103 2026-02-21T08:54:13.6939179Z .b8 116 2026-02-21T08:54:13.6939237Z .b8 101 2026-02-21T08:54:13.6939286Z .b8 99 2026-02-21T08:54:13.6939333Z .b8 107 2026-02-21T08:54:13.6939379Z .b8 53 2026-02-21T08:54:13.6939437Z .b8 50 2026-02-21T08:54:13.6939485Z .b8 111 2026-02-21T08:54:13.6939532Z .b8 119 2026-02-21T08:54:13.6939583Z .b8 112 2026-02-21T08:54:13.6939637Z .b8 116 2026-02-21T08:54:13.6939684Z .b8 97 2026-02-21T08:54:13.6939732Z .b8 105 2026-02-21T08:54:13.6939784Z .b8 99 2026-02-21T08:54:13.6939831Z .b8 100 2026-02-21T08:54:13.6939880Z .b8 112 2026-02-21T08:54:13.6939952Z .b8 115 2026-02-21T08:54:13.6940013Z .b8 119 2026-02-21T08:54:13.6940061Z .b8 111 2026-02-21T08:54:13.6940109Z .b8 122 2026-02-21T08:54:13.6940163Z .b8 110 2026-02-21T08:54:13.6940210Z .b8 116 2026-02-21T08:54:13.6940258Z .b8 100 2026-02-21T08:54:13.6940304Z .b8 105 2026-02-21T08:54:13.6940359Z .b8 98 2026-02-21T08:54:13.6940405Z .b8 99 2026-02-21T08:54:13.6940453Z .b8 105 2026-02-21T08:54:13.6940505Z .b8 51 2026-02-21T08:54:13.6940549Z .b8 55 2026-02-21T08:54:13.6940598Z .b8 46 2026-02-21T08:54:13.6940669Z .b8 112 2026-02-21T08:54:13.6940725Z .b8 121 2026-02-21T08:54:13.6940774Z .b8 0 2026-02-21T08:54:13.6940861Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:13.6940937Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:13.6940988Z .b8 116 2026-02-21T08:54:13.6941037Z .b8 109 2026-02-21T08:54:13.6941087Z .b8 112 2026-02-21T08:54:13.6941144Z .b8 47 2026-02-21T08:54:13.6941191Z .b8 116 2026-02-21T08:54:13.6941240Z .b8 111 2026-02-21T08:54:13.6941288Z .b8 114 2026-02-21T08:54:13.6941345Z .b8 99 2026-02-21T08:54:13.6941392Z .b8 104 
2026-02-21T08:54:13.6941440Z .b8 105 2026-02-21T08:54:13.6941493Z .b8 110 2026-02-21T08:54:13.6941541Z .b8 100 2026-02-21T08:54:13.6941589Z .b8 117 2026-02-21T08:54:13.6941637Z .b8 99 2026-02-21T08:54:13.6941694Z .b8 116 2026-02-21T08:54:13.6941745Z .b8 111 2026-02-21T08:54:13.6941794Z .b8 114 2026-02-21T08:54:13.6941851Z .b8 95 2026-02-21T08:54:13.6941901Z .b8 114 2026-02-21T08:54:13.6941988Z .b8 111 2026-02-21T08:54:13.6942042Z .b8 111 2026-02-21T08:54:13.6942101Z .b8 116 2026-02-21T08:54:13.6942151Z .b8 47 2026-02-21T08:54:13.6942201Z .b8 107 2026-02-21T08:54:13.6942252Z .b8 122 2026-02-21T08:54:13.6942309Z .b8 0 2026-02-21T08:54:13.6942358Z } 2026-02-21T08:54:13.6942426Z .section .debug_macinfo { } 2026-02-21T08:54:13.6942431Z 2026-02-21T08:54:13.6942516Z ================================================================ 2026-02-21T08:54:13.6942633Z please share the reproducer above with Triton project. 2026-02-21T08:54:14.7776571Z [360s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:54:14.7776867Z 2026-02-21T08:54:14.7778117Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=1, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:54:14.7779353Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:14.7779777Z 2026-02-21T08:54:14.7780045Z `ptxas` stderr: 2026-02-21T08:54:14.7780467Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 299 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:54:14.7780937Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:14.7781081Z 2026-02-21T08:54:14.7781499Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpqegh6e40.ptx -o /tmp/tmpqegh6e40.ptx.o 2026-02-21T08:54:14.7781930Z 2026-02-21T08:54:14.7782060Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:54:14.7782255Z 2026-02-21T08:54:14.7782343Z ================================================================ 2026-02-21T08:54:14.7782549Z Internal Triton PTX codegen error 2026-02-21T08:54:14.7782729Z `ptxas` stderr: 2026-02-21T08:54:14.7783139Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 299 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:14.7783610Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:14.7783756Z 2026-02-21T08:54:14.7784184Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpqegh6e40.ptx -o /tmp/tmpqegh6e40.ptx.o 2026-02-21T08:54:14.7784604Z 2026-02-21T08:54:14.7784607Z 2026-02-21T08:54:14.7784941Z // 2026-02-21T08:54:14.7785089Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:14.7785266Z // 2026-02-21T08:54:14.7785331Z 2026-02-21T08:54:14.7785386Z .version 8.7 2026-02-21T08:54:14.7785529Z .target sm_100a 2026-02-21T08:54:14.7785763Z .address_size 64 2026-02-21T08:54:14.7785856Z 2026-02-21T08:54:14.7785978Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:14.7786224Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:14.7786447Z // @_helion_matmul 2026-02-21T08:54:14.7786658Z .visible .entry _helion_matmul( 2026-02-21T08:54:14.7786875Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:14.7787140Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T08:54:14.7787383Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:14.7787630Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:14.7787866Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:14.7788069Z ) 2026-02-21T08:54:14.7788190Z .reqntid 384 2026-02-21T08:54:14.7788319Z .maxnreg 32 2026-02-21T08:54:14.7788446Z { 2026-02-21T08:54:14.7788629Z .reg .pred %p<124>; 2026-02-21T08:54:14.7788788Z .reg .b16 %rs<6>; 2026-02-21T08:54:14.7788928Z .reg .b32 %r<679>; 2026-02-21T08:54:14.7789081Z .reg .b64 %rd<396>; 2026-02-21T08:54:14.7789349Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19:0 2026-02-21T08:54:14.7789662Z $L__func_begin0: 2026-02-21T08:54:14.7789912Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19:0 2026-02-21T08:54:14.7790155Z 2026-02-21T08:54:14.7790211Z // %bb.0: 2026-02-21T08:54:14.7790378Z ld.param.b64 %rd23, [_helion_matmul_param_3]; 2026-02-21T08:54:14.7790575Z $L__tmp0: 2026-02-21T08:54:14.7790823Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19 2026-02-21T08:54:14.7791119Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:14.7791275Z shr.u32 %r2, %r1, 5; 2026-02-21T08:54:14.7791435Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:54:14.7791635Z setp.lt.u32 %p1, %r3, 8; 2026-02-21T08:54:14.7791801Z @%p1 bra $L__BB0_12; 2026-02-21T08:54:14.7791949Z bra.uni $L__BB0_1; 2026-02-21T08:54:14.7792094Z $L__BB0_12: 2026-02-21T08:54:14.7792335Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0:0 2026-02-21T08:54:14.7792734Z ld.param.b64 %rd22, [_helion_matmul_param_2]; 2026-02-21T08:54:14.7792950Z ld.param.b64 %rd21, [_helion_matmul_param_1]; 2026-02-21T08:54:14.7793170Z ld.param.b64 %rd20, [_helion_matmul_param_0]; 2026-02-21T08:54:14.7793473Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19 2026-02-21T08:54:14.7793797Z 
setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T08:54:14.7794002Z setp.lt.u32 %p48, %r1, 32; 2026-02-21T08:54:14.7794172Z mov.b32 %r130, global_smem; 2026-02-21T08:54:14.7794344Z // begin inline asm 2026-02-21T08:54:14.7794600Z @%p48 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r130], 256; 2026-02-21T08:54:14.7794922Z // end inline asm 2026-02-21T08:54:14.7795062Z bar.sync 0, 256; 2026-02-21T08:54:14.7795222Z ld.shared.b32 %r674, [global_smem]; 2026-02-21T08:54:14.7795395Z bar.sync 0, 256; 2026-02-21T08:54:14.7795539Z // begin inline asm 2026-02-21T08:54:14.7795753Z @%p48 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:14.7795988Z // end inline asm 2026-02-21T08:54:14.7796253Z .loc 1 21 67 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:21:67 2026-02-21T08:54:14.7796551Z mov.u32 %r18, %ctaid.x; 2026-02-21T08:54:14.7796768Z mov.u32 %r155, %ctaid.y; 2026-02-21T08:54:14.7796925Z mov.u32 %r156, %ctaid.z; 2026-02-21T08:54:14.7797094Z mov.u32 %r157, %nctaid.x; 2026-02-21T08:54:14.7797254Z mov.u32 %r158, %nctaid.y; 2026-02-21T08:54:14.7797424Z mad.lo.s32 %r159, %r156, %r158, %r155; 2026-02-21T08:54:14.7797618Z mad.lo.s32 %r160, %r159, %r157, %r18; 2026-02-21T08:54:14.7797797Z mul.lo.s32 %r161, %r160, 384; 2026-02-21T08:54:14.7797973Z cvt.s64.s32 %rd136, %r161; 2026-02-21T08:54:14.7798138Z add.s64 %rd97, %rd23, %rd136; 2026-02-21T08:54:14.7798342Z shl.b32 %r162, %r1, 2; 2026-02-21T08:54:14.7798496Z add.s32 %r131, %r130, %r162; 2026-02-21T08:54:14.7798650Z mov.b32 %r166, 0; 2026-02-21T08:54:14.7798781Z // begin inline asm 2026-02-21T08:54:14.7798942Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:14.7799111Z // end inline asm 2026-02-21T08:54:14.7799255Z bar.warp.sync -1; 2026-02-21T08:54:14.7799404Z setp.eq.b32 %p113, %r1, 0; 2026-02-21T08:54:14.7799559Z cvt.u64.u32 %rd82, %r130; 2026-02-21T08:54:14.7799713Z // begin inline asm 2026-02-21T08:54:14.7799965Z @%p113 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd20; 2026-02-21T08:54:14.7800253Z // end inline asm 2026-02-21T08:54:14.7800384Z // begin inline asm 2026-02-21T08:54:14.7800614Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:14.7800869Z // end inline asm 2026-02-21T08:54:14.7801038Z mov.b32 %r133, 64; 2026-02-21T08:54:14.7801183Z // begin inline asm 2026-02-21T08:54:14.7801414Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:14.7801692Z // end inline asm 2026-02-21T08:54:14.7801821Z mov.b32 %r134, 128; 2026-02-21T08:54:14.7801965Z // begin inline asm 2026-02-21T08:54:14.7802197Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r134; 2026-02-21T08:54:14.7802471Z // end inline asm 2026-02-21T08:54:14.7802600Z mov.b32 %r135, 2048; 2026-02-21T08:54:14.7802745Z // begin inline asm 2026-02-21T08:54:14.7802993Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:14.7803264Z // end inline asm 2026-02-21T08:54:14.7803401Z // begin inline asm 2026-02-21T08:54:14.7803640Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:14.7803914Z // end inline asm 2026-02-21T08:54:14.7804044Z mov.b64 %rd90, 4096; 2026-02-21T08:54:14.7804189Z // begin inline asm 2026-02-21T08:54:14.7804443Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:14.7804762Z // end inline asm 2026-02-21T08:54:14.7804903Z mov.b32 %r137, 1; 2026-02-21T08:54:14.7805079Z // begin inline asm 2026-02-21T08:54:14.7805342Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:14.7805633Z // end inline asm 2026-02-21T08:54:14.7805777Z // begin inline asm 2026-02-21T08:54:14.7806038Z @%p113 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:14.7806321Z // end inline asm 2026-02-21T08:54:14.7806500Z // begin inline asm 2026-02-21T08:54:14.7806732Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:14.7807001Z // end inline asm 2026-02-21T08:54:14.7807136Z // begin inline asm 2026-02-21T08:54:14.7807391Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7807677Z // end inline asm 2026-02-21T08:54:14.7807819Z // begin inline asm 2026-02-21T08:54:14.7808068Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:14.7808338Z // end inline asm 2026-02-21T08:54:14.7808480Z // begin inline asm 2026-02-21T08:54:14.7808709Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7808975Z // end inline asm 2026-02-21T08:54:14.7809136Z // begin inline asm 2026-02-21T08:54:14.7809482Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd97 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:14.7809864Z // end inline asm 2026-02-21T08:54:14.7809998Z // begin inline asm 2026-02-21T08:54:14.7810209Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd97 + 0 ], 0x80; 2026-02-21T08:54:14.7810453Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:14.7810649Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:14.7810855Z // end inline asm 2026-02-21T08:54:14.7810993Z bar.sync 0, 256; 2026-02-21T08:54:14.7811239Z .loc 1 22 68 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:22:68 2026-02-21T08:54:14.7811535Z add.s32 %r163, %r161, 128; 2026-02-21T08:54:14.7811696Z cvt.s64.s32 %rd137, %r163; 2026-02-21T08:54:14.7811851Z add.s64 %rd115, %rd23, %rd137; 2026-02-21T08:54:14.7812012Z bar.sync 0, 256; 2026-02-21T08:54:14.7812141Z // begin inline asm 
2026-02-21T08:54:14.7812299Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:14.7812471Z // end inline asm 2026-02-21T08:54:14.7812612Z bar.warp.sync -1; 2026-02-21T08:54:14.7812747Z // begin inline asm 2026-02-21T08:54:14.7813005Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd21; 2026-02-21T08:54:14.7813306Z // end inline asm 2026-02-21T08:54:14.7813436Z // begin inline asm 2026-02-21T08:54:14.7813695Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:14.7813967Z // end inline asm 2026-02-21T08:54:14.7814107Z // begin inline asm 2026-02-21T08:54:14.7814340Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:14.7814625Z // end inline asm 2026-02-21T08:54:14.7814794Z mov.b32 %r142, 256; 2026-02-21T08:54:14.7814942Z // begin inline asm 2026-02-21T08:54:14.7815186Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r142; 2026-02-21T08:54:14.7815466Z // end inline asm 2026-02-21T08:54:14.7815609Z // begin inline asm 2026-02-21T08:54:14.7815868Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:14.7816170Z // end inline asm 2026-02-21T08:54:14.7816305Z mov.b32 %r144, 12288; 2026-02-21T08:54:14.7816463Z // begin inline asm 2026-02-21T08:54:14.7816729Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r144; 2026-02-21T08:54:14.7817018Z // end inline asm 2026-02-21T08:54:14.7817162Z // begin inline asm 2026-02-21T08:54:14.7817418Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:14.7817713Z // end inline asm 2026-02-21T08:54:14.7817871Z // begin inline asm 2026-02-21T08:54:14.7818126Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:14.7818409Z // end inline asm 
2026-02-21T08:54:14.7818538Z // begin inline asm 2026-02-21T08:54:14.7818787Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:14.7819058Z // end inline asm 2026-02-21T08:54:14.7819195Z // begin inline asm 2026-02-21T08:54:14.7819416Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:14.7819681Z // end inline asm 2026-02-21T08:54:14.7819807Z // begin inline asm 2026-02-21T08:54:14.7820056Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7820339Z // end inline asm 2026-02-21T08:54:14.7820466Z // begin inline asm 2026-02-21T08:54:14.7820702Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:14.7820961Z // end inline asm 2026-02-21T08:54:14.7821095Z // begin inline asm 2026-02-21T08:54:14.7821314Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7821597Z // end inline asm 2026-02-21T08:54:14.7821739Z // begin inline asm 2026-02-21T08:54:14.7822090Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd115 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:14.7822473Z // end inline asm 2026-02-21T08:54:14.7822607Z // begin inline asm 2026-02-21T08:54:14.7822826Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd115 + 0 ], 0x80; 2026-02-21T08:54:14.7823084Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:14.7823310Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:14.7823489Z // end inline asm 2026-02-21T08:54:14.7823617Z bar.sync 0, 256; 2026-02-21T08:54:14.7823873Z .loc 1 24 73 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:24:73 2026-02-21T08:54:14.7824162Z add.s32 %r164, %r161, 256; 2026-02-21T08:54:14.7824323Z cvt.s64.s32 %rd138, %r164; 2026-02-21T08:54:14.7824476Z add.s64 %rd19, %rd23, %rd138; 2026-02-21T08:54:14.7824637Z 
bar.sync 0, 256; 2026-02-21T08:54:14.7824800Z // begin inline asm 2026-02-21T08:54:14.7824959Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:14.7825128Z // end inline asm 2026-02-21T08:54:14.7825267Z bar.warp.sync -1; 2026-02-21T08:54:14.7825408Z // begin inline asm 2026-02-21T08:54:14.7825652Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd22; 2026-02-21T08:54:14.7825943Z // end inline asm 2026-02-21T08:54:14.7826103Z // begin inline asm 2026-02-21T08:54:14.7826332Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:14.7826575Z // end inline asm 2026-02-21T08:54:14.7826713Z // begin inline asm 2026-02-21T08:54:14.7826949Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:14.7827215Z // end inline asm 2026-02-21T08:54:14.7827356Z // begin inline asm 2026-02-21T08:54:14.7827588Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r134; 2026-02-21T08:54:14.7827865Z // end inline asm 2026-02-21T08:54:14.7827994Z // begin inline asm 2026-02-21T08:54:14.7828239Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r144; 2026-02-21T08:54:14.7828513Z // end inline asm 2026-02-21T08:54:14.7828643Z // begin inline asm 2026-02-21T08:54:14.7828885Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:14.7829153Z // end inline asm 2026-02-21T08:54:14.7829291Z mov.b64 %rd126, 24576; 2026-02-21T08:54:14.7829436Z // begin inline asm 2026-02-21T08:54:14.7829690Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd126; 2026-02-21T08:54:14.7829973Z // end inline asm 2026-02-21T08:54:14.7830138Z // begin inline asm 2026-02-21T08:54:14.7830395Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:14.7830671Z // end inline asm 
2026-02-21T08:54:14.7830810Z // begin inline asm 2026-02-21T08:54:14.7831057Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:14.7831350Z // end inline asm 2026-02-21T08:54:14.7831476Z // begin inline asm 2026-02-21T08:54:14.7831710Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:14.7831973Z // end inline asm 2026-02-21T08:54:14.7832099Z // begin inline asm 2026-02-21T08:54:14.7832350Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7832624Z // end inline asm 2026-02-21T08:54:14.7832758Z // begin inline asm 2026-02-21T08:54:14.7832988Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:14.7833255Z // end inline asm 2026-02-21T08:54:14.7833382Z // begin inline asm 2026-02-21T08:54:14.7833609Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:14.7833892Z // end inline asm 2026-02-21T08:54:14.7834029Z // begin inline asm 2026-02-21T08:54:14.7834382Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:14.7834801Z // end inline asm 2026-02-21T08:54:14.7834946Z // begin inline asm 2026-02-21T08:54:14.7835160Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:54:14.7835422Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:14.7835620Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:14.7835832Z // end inline asm 2026-02-21T08:54:14.7835973Z bar.sync 0, 256; 2026-02-21T08:54:14.7836229Z .loc 1 33 75 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:33:75 2026-02-21T08:54:14.7836543Z setp.gt.u32 %p104, %r18, 767; 2026-02-21T08:54:14.7836713Z @%p104 bra $L__BB0_14; 2026-02-21T08:54:14.7836892Z // %bb.13: // %.lr.ph 2026-02-21T08:54:14.7837213Z .loc 1 24 73 // 
cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:24:73 2026-02-21T08:54:14.7837520Z cvta.global.u64 %rd139, %rd19; 2026-02-21T08:54:14.7837705Z setp.lt.u32 %p121, %r1, 128; 2026-02-21T08:54:14.7837872Z shl.b32 %r449, %r1, 4; 2026-02-21T08:54:14.7838034Z shl.b32 %r450, %r1, 8; 2026-02-21T08:54:14.7838189Z or.b32 %r451, %r449, %r450; 2026-02-21T08:54:14.7838364Z and.b32 %r452, %r451, 32880; 2026-02-21T08:54:14.7838574Z shl.b32 %r453, %r1, 7; 2026-02-21T08:54:14.7838733Z and.b32 %r454, %r453, 16256; 2026-02-21T08:54:14.7838892Z or.b32 %r455, %r452, %r454; 2026-02-21T08:54:14.7839062Z xor.b32 %r456, %r455, 112; 2026-02-21T08:54:14.7839228Z add.s32 %r458, %r130, %r456; 2026-02-21T08:54:14.7839384Z xor.b32 %r459, %r455, 96; 2026-02-21T08:54:14.7839546Z add.s32 %r460, %r130, %r459; 2026-02-21T08:54:14.7839700Z xor.b32 %r461, %r455, 80; 2026-02-21T08:54:14.7839860Z add.s32 %r462, %r130, %r461; 2026-02-21T08:54:14.7840013Z xor.b32 %r463, %r455, 64; 2026-02-21T08:54:14.7840173Z add.s32 %r464, %r130, %r463; 2026-02-21T08:54:14.7840327Z xor.b32 %r465, %r455, 48; 2026-02-21T08:54:14.7840487Z add.s32 %r466, %r130, %r465; 2026-02-21T08:54:14.7840642Z xor.b32 %r467, %r455, 32; 2026-02-21T08:54:14.7840801Z add.s32 %r468, %r130, %r467; 2026-02-21T08:54:14.7840958Z xor.b32 %r469, %r455, 16; 2026-02-21T08:54:14.7841107Z add.s32 %r470, %r130, %r469; 2026-02-21T08:54:14.7841270Z add.s32 %r471, %r130, %r455; 2026-02-21T08:54:14.7841546Z .loc 1 42 64 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:42:64 2026-02-21T08:54:14.7841854Z cvt.u16.u32 %rs1, %r18; 2026-02-21T08:54:14.7842123Z .loc 1 43 51 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:43:51 2026-02-21T08:54:14.7842469Z mul.hi.u16 %rs2, %rs1, -21845; 2026-02-21T08:54:14.7842633Z shr.u16 %rs3, %rs2, 5; 2026-02-21T08:54:14.7842882Z .loc 1 42 64 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:42:64 2026-02-21T08:54:14.7843173Z mul.lo.s16 %rs4, %rs3, 48; 2026-02-21T08:54:14.7843324Z sub.s16 
%rs5, %rs1, %rs4; 2026-02-21T08:54:14.7843588Z .loc 1 44 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:44:27 2026-02-21T08:54:14.7843873Z mul.wide.u16 %r472, %rs5, 256; 2026-02-21T08:54:14.7844149Z .loc 1 45 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:45:27 2026-02-21T08:54:14.7844447Z mul.wide.u16 %r447, %rs3, 128; 2026-02-21T08:54:14.7844747Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7845054Z shfl.sync.idx.b32 %r473, %r2, 0, 31, -1; 2026-02-21T08:54:14.7845234Z and.b32 %r474, %r473, 3; 2026-02-21T08:54:14.7845387Z shl.b32 %r475, %r474, 21; 2026-02-21T08:54:14.7845532Z add.s32 %r476, %r475, %r674; 2026-02-21T08:54:14.7845688Z shl.b32 %r477, %r473, 5; 2026-02-21T08:54:14.7845832Z and.b32 %r478, %r477, 128; 2026-02-21T08:54:14.7846010Z add.s32 %r165, %r476, %r478; 2026-02-21T08:54:14.7846172Z mov.pred %p105, -1; 2026-02-21T08:54:14.7846312Z // begin inline asm 2026-02-21T08:54:14.7846683Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 0], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7847079Z // end inline asm 2026-02-21T08:54:14.7847221Z // begin inline asm 2026-02-21T08:54:14.7847575Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 16], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7847992Z // end inline asm 2026-02-21T08:54:14.7848131Z // begin inline asm 2026-02-21T08:54:14.7848482Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 32], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7848877Z // end inline asm 2026-02-21T08:54:14.7849012Z // begin inline asm 2026-02-21T08:54:14.7849369Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 48], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, 
%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7849773Z // end inline asm 2026-02-21T08:54:14.7849913Z // begin inline asm 2026-02-21T08:54:14.7850295Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 64], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7850684Z // end inline asm 2026-02-21T08:54:14.7850819Z // begin inline asm 2026-02-21T08:54:14.7851170Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 80], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7851560Z // end inline asm 2026-02-21T08:54:14.7851690Z // begin inline asm 2026-02-21T08:54:14.7852045Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 96], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7852442Z // end inline asm 2026-02-21T08:54:14.7852572Z // begin inline asm 2026-02-21T08:54:14.7852931Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 112], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:14.7853310Z // end inline asm 2026-02-21T08:54:14.7853447Z // begin inline asm 2026-02-21T08:54:14.7853598Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:14.7853766Z // end inline asm 2026-02-21T08:54:14.7853902Z bar.sync 0, 256; 2026-02-21T08:54:14.7854143Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.7854463Z add.s32 %r301, %r130, 98416; 2026-02-21T08:54:14.7854613Z // begin inline asm 2026-02-21T08:54:14.7854813Z @%p113 mbarrier.init.shared::cta.b64 [%r301], 1; 2026-02-21T08:54:14.7855000Z // end inline asm 2026-02-21T08:54:14.7855141Z add.s32 %r302, %r130, 98432; 2026-02-21T08:54:14.7855289Z // begin inline asm 2026-02-21T08:54:14.7855455Z @%p113 
mbarrier.init.shared::cta.b64 [%r302], 1; 2026-02-21T08:54:14.7855645Z // end inline asm 2026-02-21T08:54:14.7855883Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.7856168Z bar.sync 0, 256; 2026-02-21T08:54:14.7856295Z // begin inline asm 2026-02-21T08:54:14.7856469Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r301]; 2026-02-21T08:54:14.7856659Z // end inline asm 2026-02-21T08:54:14.7856913Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.7857197Z bar.sync 0, 256; 2026-02-21T08:54:14.7857327Z add.s32 %r304, %r130, 98448; 2026-02-21T08:54:14.7857484Z // begin inline asm 2026-02-21T08:54:14.7857641Z @%p113 mbarrier.init.shared::cta.b64 [%r304], 1; 2026-02-21T08:54:14.7857830Z // end inline asm 2026-02-21T08:54:14.7858002Z st.shared.b32 [global_smem+98456], 33554689; 2026-02-21T08:54:14.7858212Z st.shared.b32 [global_smem+98304], %r674; 2026-02-21T08:54:14.7858426Z st.shared.v2.b32 [global_smem+98312], {%r447, %r472}; 2026-02-21T08:54:14.7858629Z barrier.sync 1; 2026-02-21T08:54:14.7858790Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T08:54:14.7858967Z barrier.sync 1; 2026-02-21T08:54:14.7859107Z barrier.sync 1; 2026-02-21T08:54:14.7859260Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T08:54:14.7859587Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7859868Z bar.sync 0, 256; 2026-02-21T08:54:14.7860008Z // begin inline asm 2026-02-21T08:54:14.7860139Z 2026-02-21T08:54:14.7860266Z { 2026-02-21T08:54:14.7860388Z .reg .pred complete; 2026-02-21T08:54:14.7860539Z waitLoop: 2026-02-21T08:54:14.7860727Z mbarrier.try_wait.parity.shared.b64 complete, [%r304], %r166; 2026-02-21T08:54:14.7860958Z @!complete bra.uni waitLoop; 2026-02-21T08:54:14.7861117Z } 2026-02-21T08:54:14.7861179Z 2026-02-21T08:54:14.7861233Z // end inline asm 2026-02-21T08:54:14.7861480Z .loc 1 50 57 // 
cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.7861758Z bar.sync 0, 256; 2026-02-21T08:54:14.7861894Z // begin inline asm 2026-02-21T08:54:14.7862057Z @%p113 mbarrier.inval.shared::cta.b64 [%r304]; 2026-02-21T08:54:14.7862249Z // end inline asm 2026-02-21T08:54:14.7862413Z // begin inline asm 2026-02-21T08:54:14.7862578Z @%p113 mbarrier.inval.shared::cta.b64 [%r302]; 2026-02-21T08:54:14.7862771Z // end inline asm 2026-02-21T08:54:14.7862899Z // begin inline asm 2026-02-21T08:54:14.7863063Z @%p113 mbarrier.inval.shared::cta.b64 [%r301]; 2026-02-21T08:54:14.7863242Z // end inline asm 2026-02-21T08:54:14.7863485Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7863760Z // begin inline asm 2026-02-21T08:54:14.7864111Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r310, %r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325}, [%r165 + 0]; 2026-02-21T08:54:14.7864488Z // end inline asm 2026-02-21T08:54:14.7864617Z // begin inline asm 2026-02-21T08:54:14.7865003Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342}, [%r165 + 16]; 2026-02-21T08:54:14.7865368Z // end inline asm 2026-02-21T08:54:14.7865506Z // begin inline asm 2026-02-21T08:54:14.7865848Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359}, [%r165 + 32]; 2026-02-21T08:54:14.7866256Z // end inline asm 2026-02-21T08:54:14.7866391Z // begin inline asm 2026-02-21T08:54:14.7866723Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r361, %r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376}, [%r165 + 48]; 2026-02-21T08:54:14.7867088Z // end inline asm 2026-02-21T08:54:14.7867217Z // begin inline asm 2026-02-21T08:54:14.7867557Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r378, %r379, %r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393}, [%r165 + 64]; 2026-02-21T08:54:14.7867924Z // end inline asm 2026-02-21T08:54:14.7868053Z // begin inline asm 2026-02-21T08:54:14.7868390Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410}, [%r165 + 80]; 2026-02-21T08:54:14.7868766Z // end inline asm 2026-02-21T08:54:14.7868902Z // begin inline asm 2026-02-21T08:54:14.7869233Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427}, [%r165 + 96]; 2026-02-21T08:54:14.7869603Z // end inline asm 2026-02-21T08:54:14.7869739Z // begin inline asm 2026-02-21T08:54:14.7870098Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444}, [%r165 + 112]; 2026-02-21T08:54:14.7870472Z // end inline asm 2026-02-21T08:54:14.7870602Z // begin inline asm 2026-02-21T08:54:14.7870755Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:14.7870912Z // end inline asm 2026-02-21T08:54:14.7871055Z cvt.u64.u32 %rd140, %r310; 2026-02-21T08:54:14.7871212Z cvt.u64.u32 %rd141, %r311; 2026-02-21T08:54:14.7871409Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:54:14.7871579Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:54:14.7871858Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7872157Z mov.b64 {%r479, %r480}, %rd143; 2026-02-21T08:54:14.7872329Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T08:54:14.7872626Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7872911Z cvt.u64.u32 %rd144, %r312; 2026-02-21T08:54:14.7873070Z cvt.u64.u32 %rd145, %r313; 2026-02-21T08:54:14.7873229Z shl.b64 %rd146, %rd145, 32; 
2026-02-21T08:54:14.7873383Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:54:14.7873657Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7873939Z mov.b64 {%r482, %r483}, %rd147; 2026-02-21T08:54:14.7874153Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T08:54:14.7874433Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7874742Z cvt.u64.u32 %rd148, %r314; 2026-02-21T08:54:14.7874892Z cvt.u64.u32 %rd149, %r315; 2026-02-21T08:54:14.7875054Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:54:14.7875215Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:54:14.7875480Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7875773Z mov.b64 {%r485, %r486}, %rd151; 2026-02-21T08:54:14.7875940Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T08:54:14.7876230Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7876511Z cvt.u64.u32 %rd152, %r316; 2026-02-21T08:54:14.7876670Z cvt.u64.u32 %rd153, %r317; 2026-02-21T08:54:14.7876824Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:54:14.7876978Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:54:14.7877248Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7877530Z mov.b64 {%r488, %r489}, %rd155; 2026-02-21T08:54:14.7877701Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T08:54:14.7878012Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7878312Z cvt.u64.u32 %rd156, %r318; 2026-02-21T08:54:14.7878473Z cvt.u64.u32 %rd157, %r319; 2026-02-21T08:54:14.7878629Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:54:14.7878792Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:54:14.7879067Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7879369Z mov.b64 {%r491, %r492}, %rd159; 2026-02-21T08:54:14.7879534Z cvt.rn.f16x2.f32 %r493, 
%r492, %r491; 2026-02-21T08:54:14.7879832Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7880126Z cvt.u64.u32 %rd160, %r320; 2026-02-21T08:54:14.7880288Z cvt.u64.u32 %rd161, %r321; 2026-02-21T08:54:14.7880447Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:54:14.7880603Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:54:14.7880878Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7881176Z mov.b64 {%r494, %r495}, %rd163; 2026-02-21T08:54:14.7881352Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T08:54:14.7881669Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7881973Z cvt.u64.u32 %rd164, %r322; 2026-02-21T08:54:14.7882136Z cvt.u64.u32 %rd165, %r323; 2026-02-21T08:54:14.7882293Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:54:14.7882460Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:54:14.7882733Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7883040Z mov.b64 {%r497, %r498}, %rd167; 2026-02-21T08:54:14.7883264Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T08:54:14.7883553Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7883854Z cvt.u64.u32 %rd168, %r324; 2026-02-21T08:54:14.7884016Z cvt.u64.u32 %rd169, %r325; 2026-02-21T08:54:14.7884174Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:54:14.7884330Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:54:14.7884607Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7884930Z mov.b64 {%r500, %r501}, %rd171; 2026-02-21T08:54:14.7885103Z cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T08:54:14.7885391Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7885681Z cvt.u64.u32 %rd172, %r327; 2026-02-21T08:54:14.7885838Z cvt.u64.u32 %rd173, %r328; 2026-02-21T08:54:14.7886013Z shl.b64 %rd174, %rd173, 
32; 2026-02-21T08:54:14.7886175Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:54:14.7886435Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7886732Z mov.b64 {%r503, %r504}, %rd175; 2026-02-21T08:54:14.7886893Z cvt.rn.f16x2.f32 %r505, %r504, %r503; 2026-02-21T08:54:14.7887178Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7887459Z cvt.u64.u32 %rd176, %r329; 2026-02-21T08:54:14.7887614Z cvt.u64.u32 %rd177, %r330; 2026-02-21T08:54:14.7887769Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:54:14.7887921Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:54:14.7888191Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7888477Z mov.b64 {%r506, %r507}, %rd179; 2026-02-21T08:54:14.7888644Z cvt.rn.f16x2.f32 %r508, %r507, %r506; 2026-02-21T08:54:14.7888917Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7889201Z cvt.u64.u32 %rd180, %r331; 2026-02-21T08:54:14.7889355Z cvt.u64.u32 %rd181, %r332; 2026-02-21T08:54:14.7889502Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:54:14.7889691Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:54:14.7889946Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7890228Z mov.b64 {%r509, %r510}, %rd183; 2026-02-21T08:54:14.7890386Z cvt.rn.f16x2.f32 %r511, %r510, %r509; 2026-02-21T08:54:14.7890661Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7890936Z cvt.u64.u32 %rd184, %r333; 2026-02-21T08:54:14.7891091Z cvt.u64.u32 %rd185, %r334; 2026-02-21T08:54:14.7891243Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:54:14.7891392Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:54:14.7891656Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7891935Z mov.b64 {%r512, %r513}, %rd187; 2026-02-21T08:54:14.7892100Z cvt.rn.f16x2.f32 %r514, 
%r513, %r512; 2026-02-21T08:54:14.7892363Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7892646Z cvt.u64.u32 %rd188, %r335; 2026-02-21T08:54:14.7892804Z cvt.u64.u32 %rd189, %r336; 2026-02-21T08:54:14.7892950Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:54:14.7893133Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:54:14.7893395Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7893680Z mov.b64 {%r515, %r516}, %rd191; 2026-02-21T08:54:14.7893841Z cvt.rn.f16x2.f32 %r517, %r516, %r515; 2026-02-21T08:54:14.7894128Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7894420Z cvt.u64.u32 %rd192, %r337; 2026-02-21T08:54:14.7894607Z cvt.u64.u32 %rd193, %r338; 2026-02-21T08:54:14.7894791Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:54:14.7894941Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:54:14.7895212Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7895500Z mov.b64 {%r518, %r519}, %rd195; 2026-02-21T08:54:14.7895668Z cvt.rn.f16x2.f32 %r520, %r519, %r518; 2026-02-21T08:54:14.7895943Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7896234Z cvt.u64.u32 %rd196, %r339; 2026-02-21T08:54:14.7896388Z cvt.u64.u32 %rd197, %r340; 2026-02-21T08:54:14.7896538Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:54:14.7896697Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:54:14.7896957Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7897244Z mov.b64 {%r521, %r522}, %rd199; 2026-02-21T08:54:14.7897433Z cvt.rn.f16x2.f32 %r523, %r522, %r521; 2026-02-21T08:54:14.7897712Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7897986Z cvt.u64.u32 %rd200, %r341; 2026-02-21T08:54:14.7898144Z cvt.u64.u32 %rd201, %r342; 2026-02-21T08:54:14.7898296Z shl.b64 %rd202, %rd201, 
32; 2026-02-21T08:54:14.7898445Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:54:14.7898711Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7898991Z mov.b64 {%r524, %r525}, %rd203; 2026-02-21T08:54:14.7899156Z cvt.rn.f16x2.f32 %r526, %r525, %r524; 2026-02-21T08:54:14.7899428Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7899719Z cvt.u64.u32 %rd204, %r344; 2026-02-21T08:54:14.7899871Z cvt.u64.u32 %rd205, %r345; 2026-02-21T08:54:14.7900019Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:54:14.7900178Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:54:14.7900444Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7900735Z mov.b64 {%r527, %r528}, %rd207; 2026-02-21T08:54:14.7900890Z cvt.rn.f16x2.f32 %r529, %r528, %r527; 2026-02-21T08:54:14.7901197Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7901477Z cvt.u64.u32 %rd208, %r346; 2026-02-21T08:54:14.7901633Z cvt.u64.u32 %rd209, %r347; 2026-02-21T08:54:14.7901788Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:54:14.7901939Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:54:14.7902207Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7902487Z mov.b64 {%r530, %r531}, %rd211; 2026-02-21T08:54:14.7902654Z cvt.rn.f16x2.f32 %r532, %r531, %r530; 2026-02-21T08:54:14.7902925Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7903215Z cvt.u64.u32 %rd212, %r348; 2026-02-21T08:54:14.7903371Z cvt.u64.u32 %rd213, %r349; 2026-02-21T08:54:14.7903517Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:54:14.7903677Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:54:14.7903938Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7904235Z mov.b64 {%r533, %r534}, %rd215; 2026-02-21T08:54:14.7904425Z cvt.rn.f16x2.f32 %r535, 
%r534, %r533; 2026-02-21T08:54:14.7904736Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7905018Z cvt.u64.u32 %rd216, %r350; 2026-02-21T08:54:14.7905180Z cvt.u64.u32 %rd217, %r351; 2026-02-21T08:54:14.7905336Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:54:14.7905487Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:54:14.7905754Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7906065Z mov.b64 {%r536, %r537}, %rd219; 2026-02-21T08:54:14.7906232Z cvt.rn.f16x2.f32 %r538, %r537, %r536; 2026-02-21T08:54:14.7906505Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7906790Z cvt.u64.u32 %rd220, %r352; 2026-02-21T08:54:14.7906950Z cvt.u64.u32 %rd221, %r353; 2026-02-21T08:54:14.7907100Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:54:14.7907259Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:54:14.7907521Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7907800Z mov.b64 {%r539, %r540}, %rd223; 2026-02-21T08:54:14.7907961Z cvt.rn.f16x2.f32 %r541, %r540, %r539; 2026-02-21T08:54:14.7908238Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7908516Z cvt.u64.u32 %rd224, %r354; 2026-02-21T08:54:14.7908708Z cvt.u64.u32 %rd225, %r355; 2026-02-21T08:54:14.7908866Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:54:14.7909016Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:54:14.7909286Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7909560Z mov.b64 {%r542, %r543}, %rd227; 2026-02-21T08:54:14.7909728Z cvt.rn.f16x2.f32 %r544, %r543, %r542; 2026-02-21T08:54:14.7909992Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7910280Z cvt.u64.u32 %rd228, %r356; 2026-02-21T08:54:14.7910434Z cvt.u64.u32 %rd229, %r357; 2026-02-21T08:54:14.7910581Z shl.b64 %rd230, %rd229, 
32; 2026-02-21T08:54:14.7910736Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:54:14.7910992Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7911275Z mov.b64 {%r545, %r546}, %rd231; 2026-02-21T08:54:14.7911435Z cvt.rn.f16x2.f32 %r547, %r546, %r545; 2026-02-21T08:54:14.7911712Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7911983Z cvt.u64.u32 %rd232, %r358; 2026-02-21T08:54:14.7912136Z cvt.u64.u32 %rd233, %r359; 2026-02-21T08:54:14.7912321Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:54:14.7912472Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:54:14.7912745Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7913029Z mov.b64 {%r548, %r549}, %rd235; 2026-02-21T08:54:14.7913195Z cvt.rn.f16x2.f32 %r550, %r549, %r548; 2026-02-21T08:54:14.7913476Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7913762Z cvt.u64.u32 %rd236, %r361; 2026-02-21T08:54:14.7913918Z cvt.u64.u32 %rd237, %r362; 2026-02-21T08:54:14.7914066Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:54:14.7914224Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:54:14.7914492Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7914813Z mov.b64 {%r551, %r552}, %rd239; 2026-02-21T08:54:14.7914973Z cvt.rn.f16x2.f32 %r553, %r552, %r551; 2026-02-21T08:54:14.7915257Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7915537Z cvt.u64.u32 %rd240, %r363; 2026-02-21T08:54:14.7915695Z cvt.u64.u32 %rd241, %r364; 2026-02-21T08:54:14.7915876Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:54:14.7916034Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:54:14.7916304Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7916591Z mov.b64 {%r554, %r555}, %rd243; 2026-02-21T08:54:14.7916755Z cvt.rn.f16x2.f32 %r556, 
%r555, %r554; 2026-02-21T08:54:14.7917023Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7917306Z cvt.u64.u32 %rd244, %r365; 2026-02-21T08:54:14.7917509Z cvt.u64.u32 %rd245, %r366; 2026-02-21T08:54:14.7917658Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:54:14.7917817Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:54:14.7918081Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7918379Z mov.b64 {%r557, %r558}, %rd247; 2026-02-21T08:54:14.7918539Z cvt.rn.f16x2.f32 %r559, %r558, %r557; 2026-02-21T08:54:14.7918816Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7919100Z cvt.u64.u32 %rd248, %r367; 2026-02-21T08:54:14.7919254Z cvt.u64.u32 %rd249, %r368; 2026-02-21T08:54:14.7919409Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:54:14.7919559Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:54:14.7919830Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7920140Z mov.b64 {%r560, %r561}, %rd251; 2026-02-21T08:54:14.7920313Z cvt.rn.f16x2.f32 %r562, %r561, %r560; 2026-02-21T08:54:14.7920629Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7920937Z cvt.u64.u32 %rd252, %r369; 2026-02-21T08:54:14.7921099Z cvt.u64.u32 %rd253, %r370; 2026-02-21T08:54:14.7921255Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:54:14.7921416Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:54:14.7921703Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7922017Z mov.b64 {%r563, %r564}, %rd255; 2026-02-21T08:54:14.7922183Z cvt.rn.f16x2.f32 %r565, %r564, %r563; 2026-02-21T08:54:14.7922503Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7922807Z cvt.u64.u32 %rd256, %r371; 2026-02-21T08:54:14.7922968Z cvt.u64.u32 %rd257, %r372; 2026-02-21T08:54:14.7923128Z shl.b64 %rd258, %rd257, 
32; 2026-02-21T08:54:14.7923287Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:54:14.7923565Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7923880Z mov.b64 {%r566, %r567}, %rd259; 2026-02-21T08:54:14.7924087Z cvt.rn.f16x2.f32 %r568, %r567, %r566; 2026-02-21T08:54:14.7924378Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7924728Z cvt.u64.u32 %rd260, %r373; 2026-02-21T08:54:14.7924892Z cvt.u64.u32 %rd261, %r374; 2026-02-21T08:54:14.7925047Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:54:14.7925214Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:54:14.7925486Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7925801Z mov.b64 {%r569, %r570}, %rd263; 2026-02-21T08:54:14.7925969Z cvt.rn.f16x2.f32 %r571, %r570, %r569; 2026-02-21T08:54:14.7926259Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7926548Z cvt.u64.u32 %rd264, %r375; 2026-02-21T08:54:14.7926713Z cvt.u64.u32 %rd265, %r376; 2026-02-21T08:54:14.7926876Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:54:14.7927040Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:54:14.7927320Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7927637Z mov.b64 {%r572, %r573}, %rd267; 2026-02-21T08:54:14.7927860Z cvt.rn.f16x2.f32 %r574, %r573, %r572; 2026-02-21T08:54:14.7928146Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7928463Z cvt.u64.u32 %rd268, %r378; 2026-02-21T08:54:14.7928623Z cvt.u64.u32 %rd269, %r379; 2026-02-21T08:54:14.7928776Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:54:14.7928941Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:54:14.7929210Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7929554Z mov.b64 {%r575, %r576}, %rd271; 2026-02-21T08:54:14.7929723Z cvt.rn.f16x2.f32 %r577, 
%r576, %r575; 2026-02-21T08:54:14.7930011Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7930312Z cvt.u64.u32 %rd272, %r380; 2026-02-21T08:54:14.7930474Z cvt.u64.u32 %rd273, %r381; 2026-02-21T08:54:14.7930634Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:54:14.7930791Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:54:14.7931071Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7931346Z mov.b64 {%r578, %r579}, %rd275; 2026-02-21T08:54:14.7931511Z cvt.rn.f16x2.f32 %r580, %r579, %r578; 2026-02-21T08:54:14.7931775Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7932069Z cvt.u64.u32 %rd276, %r382; 2026-02-21T08:54:14.7932252Z cvt.u64.u32 %rd277, %r383; 2026-02-21T08:54:14.7932408Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:54:14.7932570Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:54:14.7932833Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7933129Z mov.b64 {%r581, %r582}, %rd279; 2026-02-21T08:54:14.7933294Z cvt.rn.f16x2.f32 %r583, %r582, %r581; 2026-02-21T08:54:14.7933571Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7933851Z cvt.u64.u32 %rd280, %r384; 2026-02-21T08:54:14.7934010Z cvt.u64.u32 %rd281, %r385; 2026-02-21T08:54:14.7934168Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:54:14.7934323Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:54:14.7934592Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7934911Z mov.b64 {%r584, %r585}, %rd283; 2026-02-21T08:54:14.7935078Z cvt.rn.f16x2.f32 %r586, %r585, %r584; 2026-02-21T08:54:14.7935355Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7935639Z cvt.u64.u32 %rd284, %r386; 2026-02-21T08:54:14.7935795Z cvt.u64.u32 %rd285, %r387; 2026-02-21T08:54:14.7935970Z shl.b64 %rd286, %rd285, 
32; 2026-02-21T08:54:14.7936130Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:54:14.7936385Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7936670Z mov.b64 {%r587, %r588}, %rd287; 2026-02-21T08:54:14.7936830Z cvt.rn.f16x2.f32 %r589, %r588, %r587; 2026-02-21T08:54:14.7937105Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7937389Z cvt.u64.u32 %rd288, %r388; 2026-02-21T08:54:14.7937545Z cvt.u64.u32 %rd289, %r389; 2026-02-21T08:54:14.7937699Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:54:14.7937854Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:54:14.7938121Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7938398Z mov.b64 {%r590, %r591}, %rd291; 2026-02-21T08:54:14.7938569Z cvt.rn.f16x2.f32 %r592, %r591, %r590; 2026-02-21T08:54:14.7938851Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7939132Z cvt.u64.u32 %rd292, %r390; 2026-02-21T08:54:14.7939286Z cvt.u64.u32 %rd293, %r391; 2026-02-21T08:54:14.7939456Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:54:14.7939616Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:54:14.7939870Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7940150Z mov.b64 {%r593, %r594}, %rd295; 2026-02-21T08:54:14.7940310Z cvt.rn.f16x2.f32 %r595, %r594, %r593; 2026-02-21T08:54:14.7940582Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7940894Z cvt.u64.u32 %rd296, %r392; 2026-02-21T08:54:14.7941057Z cvt.u64.u32 %rd297, %r393; 2026-02-21T08:54:14.7941215Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:54:14.7941370Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:54:14.7941645Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7941938Z mov.b64 {%r596, %r597}, %rd299; 2026-02-21T08:54:14.7942110Z cvt.rn.f16x2.f32 %r598, 
%r597, %r596; 2026-02-21T08:54:14.7942387Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7942678Z cvt.u64.u32 %rd300, %r395; 2026-02-21T08:54:14.7942838Z cvt.u64.u32 %rd301, %r396; 2026-02-21T08:54:14.7942992Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:54:14.7943157Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:54:14.7943448Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7943735Z mov.b64 {%r599, %r600}, %rd303; 2026-02-21T08:54:14.7943897Z cvt.rn.f16x2.f32 %r601, %r600, %r599; 2026-02-21T08:54:14.7944174Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7944460Z cvt.u64.u32 %rd304, %r397; 2026-02-21T08:54:14.7944612Z cvt.u64.u32 %rd305, %r398; 2026-02-21T08:54:14.7944795Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:54:14.7944947Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:54:14.7945219Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7945496Z mov.b64 {%r602, %r603}, %rd307; 2026-02-21T08:54:14.7945663Z cvt.rn.f16x2.f32 %r604, %r603, %r602; 2026-02-21T08:54:14.7945937Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7946226Z cvt.u64.u32 %rd308, %r399; 2026-02-21T08:54:14.7946381Z cvt.u64.u32 %rd309, %r400; 2026-02-21T08:54:14.7946530Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:54:14.7946692Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:54:14.7946961Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7947246Z mov.b64 {%r605, %r606}, %rd311; 2026-02-21T08:54:14.7947435Z cvt.rn.f16x2.f32 %r607, %r606, %r605; 2026-02-21T08:54:14.7947705Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7947979Z cvt.u64.u32 %rd312, %r401; 2026-02-21T08:54:14.7948133Z cvt.u64.u32 %rd313, %r402; 2026-02-21T08:54:14.7948286Z shl.b64 %rd314, %rd313, 
32; 2026-02-21T08:54:14.7948435Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:54:14.7948697Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7948979Z mov.b64 {%r608, %r609}, %rd315; 2026-02-21T08:54:14.7949147Z cvt.rn.f16x2.f32 %r610, %r609, %r608; 2026-02-21T08:54:14.7949417Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7949701Z cvt.u64.u32 %rd316, %r403; 2026-02-21T08:54:14.7949856Z cvt.u64.u32 %rd317, %r404; 2026-02-21T08:54:14.7950004Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:54:14.7950162Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:54:14.7950415Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7950729Z mov.b64 {%r611, %r612}, %rd319; 2026-02-21T08:54:14.7950893Z cvt.rn.f16x2.f32 %r613, %r612, %r611; 2026-02-21T08:54:14.7951173Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7951454Z cvt.u64.u32 %rd320, %r405; 2026-02-21T08:54:14.7951611Z cvt.u64.u32 %rd321, %r406; 2026-02-21T08:54:14.7951770Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:54:14.7951924Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:54:14.7952194Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7952507Z mov.b64 {%r614, %r615}, %rd323; 2026-02-21T08:54:14.7952673Z cvt.rn.f16x2.f32 %r616, %r615, %r614; 2026-02-21T08:54:14.7952939Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7953223Z cvt.u64.u32 %rd324, %r407; 2026-02-21T08:54:14.7953376Z cvt.u64.u32 %rd325, %r408; 2026-02-21T08:54:14.7953522Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:54:14.7953683Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:54:14.7953940Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7954233Z mov.b64 {%r617, %r618}, %rd327; 2026-02-21T08:54:14.7954391Z cvt.rn.f16x2.f32 %r619, 
%r618, %r617; 2026-02-21T08:54:14.7954719Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7955029Z cvt.u64.u32 %rd328, %r409; 2026-02-21T08:54:14.7955188Z cvt.u64.u32 %rd329, %r410; 2026-02-21T08:54:14.7955340Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:54:14.7955491Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:54:14.7955757Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7956034Z mov.b64 {%r620, %r621}, %rd331; 2026-02-21T08:54:14.7956199Z cvt.rn.f16x2.f32 %r622, %r621, %r620; 2026-02-21T08:54:14.7956471Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7956755Z cvt.u64.u32 %rd332, %r412; 2026-02-21T08:54:14.7956909Z cvt.u64.u32 %rd333, %r413; 2026-02-21T08:54:14.7957055Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:54:14.7957211Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:54:14.7957470Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7957751Z mov.b64 {%r623, %r624}, %rd335; 2026-02-21T08:54:14.7957912Z cvt.rn.f16x2.f32 %r625, %r624, %r623; 2026-02-21T08:54:14.7958185Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7958461Z cvt.u64.u32 %rd336, %r414; 2026-02-21T08:54:14.7958641Z cvt.u64.u32 %rd337, %r415; 2026-02-21T08:54:14.7958797Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:54:14.7958946Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:54:14.7959215Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7959492Z mov.b64 {%r626, %r627}, %rd339; 2026-02-21T08:54:14.7959659Z cvt.rn.f16x2.f32 %r628, %r627, %r626; 2026-02-21T08:54:14.7959930Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7960218Z cvt.u64.u32 %rd340, %r416; 2026-02-21T08:54:14.7960377Z cvt.u64.u32 %rd341, %r417; 2026-02-21T08:54:14.7960532Z shl.b64 %rd342, %rd341, 
32; 2026-02-21T08:54:14.7960692Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:54:14.7960958Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7961249Z mov.b64 {%r629, %r630}, %rd343; 2026-02-21T08:54:14.7961409Z cvt.rn.f16x2.f32 %r631, %r630, %r629; 2026-02-21T08:54:14.7961683Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7961973Z cvt.u64.u32 %rd344, %r418; 2026-02-21T08:54:14.7962160Z cvt.u64.u32 %rd345, %r419; 2026-02-21T08:54:14.7962323Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:54:14.7962481Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:54:14.7962775Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7963092Z mov.b64 {%r632, %r633}, %rd347; 2026-02-21T08:54:14.7963264Z cvt.rn.f16x2.f32 %r634, %r633, %r632; 2026-02-21T08:54:14.7963578Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7963930Z cvt.u64.u32 %rd348, %r420; 2026-02-21T08:54:14.7964091Z cvt.u64.u32 %rd349, %r421; 2026-02-21T08:54:14.7964245Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:54:14.7964412Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:54:14.7964731Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7965060Z mov.b64 {%r635, %r636}, %rd351; 2026-02-21T08:54:14.7965229Z cvt.rn.f16x2.f32 %r637, %r636, %r635; 2026-02-21T08:54:14.7965526Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7965840Z cvt.u64.u32 %rd352, %r422; 2026-02-21T08:54:14.7966003Z cvt.u64.u32 %rd353, %r423; 2026-02-21T08:54:14.7966163Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:54:14.7966320Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:54:14.7966632Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7966933Z mov.b64 {%r638, %r639}, %rd355; 2026-02-21T08:54:14.7967103Z cvt.rn.f16x2.f32 %r640, 
%r639, %r638; 2026-02-21T08:54:14.7967382Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7967684Z cvt.u64.u32 %rd356, %r424; 2026-02-21T08:54:14.7967843Z cvt.u64.u32 %rd357, %r425; 2026-02-21T08:54:14.7967996Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:54:14.7968161Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:54:14.7968429Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7968724Z mov.b64 {%r641, %r642}, %rd359; 2026-02-21T08:54:14.7968890Z cvt.rn.f16x2.f32 %r643, %r642, %r641; 2026-02-21T08:54:14.7969179Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7969484Z cvt.u64.u32 %rd360, %r426; 2026-02-21T08:54:14.7969649Z cvt.u64.u32 %rd361, %r427; 2026-02-21T08:54:14.7969812Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:54:14.7969971Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:54:14.7970248Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7970609Z mov.b64 {%r644, %r645}, %rd363; 2026-02-21T08:54:14.7970783Z cvt.rn.f16x2.f32 %r646, %r645, %r644; 2026-02-21T08:54:14.7971065Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7971397Z cvt.u64.u32 %rd364, %r429; 2026-02-21T08:54:14.7971561Z cvt.u64.u32 %rd365, %r430; 2026-02-21T08:54:14.7971719Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:54:14.7971889Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:54:14.7972164Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7972477Z mov.b64 {%r647, %r648}, %rd367; 2026-02-21T08:54:14.7972650Z cvt.rn.f16x2.f32 %r649, %r648, %r647; 2026-02-21T08:54:14.7972954Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7973276Z cvt.u64.u32 %rd368, %r431; 2026-02-21T08:54:14.7973451Z cvt.u64.u32 %rd369, %r432; 2026-02-21T08:54:14.7973606Z shl.b64 %rd370, %rd369, 
32; 2026-02-21T08:54:14.7973758Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:54:14.7974021Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7974323Z mov.b64 {%r650, %r651}, %rd371; 2026-02-21T08:54:14.7974493Z cvt.rn.f16x2.f32 %r652, %r651, %r650; 2026-02-21T08:54:14.7974786Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7975079Z cvt.u64.u32 %rd372, %r433; 2026-02-21T08:54:14.7975237Z cvt.u64.u32 %rd373, %r434; 2026-02-21T08:54:14.7975386Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:54:14.7975546Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:54:14.7975806Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7976125Z mov.b64 {%r653, %r654}, %rd375; 2026-02-21T08:54:14.7976284Z cvt.rn.f16x2.f32 %r655, %r654, %r653; 2026-02-21T08:54:14.7976564Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7976850Z cvt.u64.u32 %rd376, %r435; 2026-02-21T08:54:14.7977004Z cvt.u64.u32 %rd377, %r436; 2026-02-21T08:54:14.7977157Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:54:14.7977306Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:54:14.7977574Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7977848Z mov.b64 {%r656, %r657}, %rd379; 2026-02-21T08:54:14.7978014Z cvt.rn.f16x2.f32 %r658, %r657, %r656; 2026-02-21T08:54:14.7978310Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7978596Z cvt.u64.u32 %rd380, %r437; 2026-02-21T08:54:14.7978747Z cvt.u64.u32 %rd381, %r438; 2026-02-21T08:54:14.7978892Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:54:14.7979050Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:54:14.7979306Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7979590Z mov.b64 {%r659, %r660}, %rd383; 2026-02-21T08:54:14.7979748Z cvt.rn.f16x2.f32 %r661, 
%r660, %r659; 2026-02-21T08:54:14.7980021Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7980292Z cvt.u64.u32 %rd384, %r439; 2026-02-21T08:54:14.7980446Z cvt.u64.u32 %rd385, %r440; 2026-02-21T08:54:14.7980598Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:54:14.7980747Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:54:14.7981011Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7981288Z mov.b64 {%r662, %r663}, %rd387; 2026-02-21T08:54:14.7981453Z cvt.rn.f16x2.f32 %r664, %r663, %r662; 2026-02-21T08:54:14.7981721Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7982004Z cvt.u64.u32 %rd388, %r441; 2026-02-21T08:54:14.7982191Z cvt.u64.u32 %rd389, %r442; 2026-02-21T08:54:14.7982339Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:54:14.7982502Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:54:14.7982768Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7983059Z mov.b64 {%r665, %r666}, %rd391; 2026-02-21T08:54:14.7983231Z cvt.rn.f16x2.f32 %r667, %r666, %r665; 2026-02-21T08:54:14.7983510Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.7983797Z cvt.u64.u32 %rd392, %r443; 2026-02-21T08:54:14.7983950Z cvt.u64.u32 %rd393, %r444; 2026-02-21T08:54:14.7984106Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:54:14.7984257Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:54:14.7984525Z .loc 1 58 27 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:58:27 2026-02-21T08:54:14.7984836Z mov.b64 {%r668, %r669}, %rd395; 2026-02-21T08:54:14.7985001Z cvt.rn.f16x2.f32 %r670, %r669, %r668; 2026-02-21T08:54:14.7985275Z .loc 1 59 45 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:59:45 2026-02-21T08:54:14.7985633Z st.shared.v4.b32 [%r471], {%r481, %r484, %r487, %r490}; 2026-02-21T08:54:14.7985886Z st.shared.v4.b32 [%r471+16384], {%r577, %r580, 
%r583, %r586}; 2026-02-21T08:54:14.7986124Z st.shared.v4.b32 [%r470], {%r493, %r496, %r499, %r502}; 2026-02-21T08:54:14.7986363Z st.shared.v4.b32 [%r470+16384], {%r589, %r592, %r595, %r598}; 2026-02-21T08:54:14.7986596Z st.shared.v4.b32 [%r468], {%r505, %r508, %r511, %r514}; 2026-02-21T08:54:14.7986830Z st.shared.v4.b32 [%r468+16384], {%r601, %r604, %r607, %r610}; 2026-02-21T08:54:14.7987087Z st.shared.v4.b32 [%r466], {%r517, %r520, %r523, %r526}; 2026-02-21T08:54:14.7987320Z st.shared.v4.b32 [%r466+16384], {%r613, %r616, %r619, %r622}; 2026-02-21T08:54:14.7987554Z st.shared.v4.b32 [%r464], {%r529, %r532, %r535, %r538}; 2026-02-21T08:54:14.7987780Z st.shared.v4.b32 [%r464+16384], {%r625, %r628, %r631, %r634}; 2026-02-21T08:54:14.7988012Z st.shared.v4.b32 [%r462], {%r541, %r544, %r547, %r550}; 2026-02-21T08:54:14.7988237Z st.shared.v4.b32 [%r462+16384], {%r637, %r640, %r643, %r646}; 2026-02-21T08:54:14.7988468Z st.shared.v4.b32 [%r460], {%r553, %r556, %r559, %r562}; 2026-02-21T08:54:14.7988694Z st.shared.v4.b32 [%r460+16384], {%r649, %r652, %r655, %r658}; 2026-02-21T08:54:14.7988926Z st.shared.v4.b32 [%r458], {%r565, %r568, %r571, %r574}; 2026-02-21T08:54:14.7989156Z st.shared.v4.b32 [%r458+16384], {%r661, %r664, %r667, %r670}; 2026-02-21T08:54:14.7989355Z // begin inline asm 2026-02-21T08:54:14.7989522Z fence.proxy.async.shared::cta; 2026-02-21T08:54:14.7989706Z // end inline asm 2026-02-21T08:54:14.7989854Z bar.sync 0, 256; 2026-02-21T08:54:14.7990001Z elect.sync %r671|%p122, -1; 2026-02-21T08:54:14.7990175Z and.pred %p120, %p121, %p122; 2026-02-21T08:54:14.7990333Z shl.b32 %r672, %r474, 14; 2026-02-21T08:54:14.7990501Z add.s32 %r448, %r130, %r672; 2026-02-21T08:54:14.7990665Z shl.b32 %r673, %r474, 6; 2026-02-21T08:54:14.7990817Z or.b32 %r446, %r673, %r472; 2026-02-21T08:54:14.7990975Z // begin inline asm 2026-02-21T08:54:14.7991242Z @%p120 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd139, {%r446, %r447}], [%r448]; 2026-02-21T08:54:14.7991546Z // end 
inline asm 2026-02-21T08:54:14.7991693Z cp.async.bulk.commit_group; 2026-02-21T08:54:14.7991877Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:14.7992048Z bar.sync 0, 256; 2026-02-21T08:54:14.7992217Z $L__BB0_14: // %._crit_edge 2026-02-21T08:54:14.7992531Z .loc 1 33 4 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:33:4 2026-02-21T08:54:14.7992817Z bar.sync 0, 256; 2026-02-21T08:54:14.7992963Z // begin inline asm 2026-02-21T08:54:14.7993165Z @%p48 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r674, 256; 2026-02-21T08:54:14.7993396Z // end inline asm 2026-02-21T08:54:14.7993580Z st.shared.b32 [global_smem+98456], 50529027; 2026-02-21T08:54:14.7993775Z barrier.sync 1; 2026-02-21T08:54:14.7993933Z $L__BB0_15: // %common.ret 2026-02-21T08:54:14.7994237Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.7994512Z ret; 2026-02-21T08:54:14.7994700Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:54:14.7994912Z mov.b32 %r20, global_smem; 2026-02-21T08:54:14.7995069Z add.s32 %r21, %r20, %r3; 2026-02-21T08:54:14.7995222Z add.s32 %r74, %r20, 65536; 2026-02-21T08:54:14.7995375Z bfe.u32 %r75, %r74, 4, 14; 2026-02-21T08:54:14.7995532Z cvt.u64.u32 %rd48, %r75; 2026-02-21T08:54:14.7995694Z or.b64 %rd30, %rd48, 4611686293372403712; 2026-02-21T08:54:14.7995876Z bfe.u32 %r76, %r20, 4, 14; 2026-02-21T08:54:14.7996030Z cvt.u64.u32 %rd49, %r76; 2026-02-21T08:54:14.7996185Z or.b64 %rd31, %rd49, 4611686293439512576; 2026-02-21T08:54:14.7996364Z add.s32 %r77, %r20, 65568; 2026-02-21T08:54:14.7996507Z bfe.u32 %r78, %r77, 4, 14; 2026-02-21T08:54:14.7996663Z add.s32 %r79, %r20, 32; 2026-02-21T08:54:14.7996806Z bfe.u32 %r80, %r79, 4, 14; 2026-02-21T08:54:14.7996958Z bra.uni $L__BB0_2; 2026-02-21T08:54:14.7997162Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.7997501Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.7997822Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T08:54:14.7998001Z barrier.sync 1; 2026-02-21T08:54:14.7998142Z barrier.sync 1; 2026-02-21T08:54:14.7998294Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:14.7998504Z $L__BB0_2: // %.preheader 2026-02-21T08:54:14.7998747Z // =>This Loop Header: Depth=1 2026-02-21T08:54:14.7998976Z // Child Loop BB0_9 Depth 2 2026-02-21T08:54:14.7999193Z // Child Loop BB0_6 Depth 2 2026-02-21T08:54:14.7999499Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19 2026-02-21T08:54:14.7999804Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:54:14.8000052Z barrier.sync 1; 2026-02-21T08:54:14.8000203Z ld.shared.b8 %r19, [%r21+98448]; 2026-02-21T08:54:14.8000370Z setp.gt.u32 %p2, %r19, 3; 2026-02-21T08:54:14.8000528Z @%p2 bra $L__BB0_4; 2026-02-21T08:54:14.8000690Z // %bb.3: // %.preheader 2026-02-21T08:54:14.8000908Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8001115Z $L_brx_0: .branchtargets 2026-02-21T08:54:14.8001191Z $L__BB0_5, 2026-02-21T08:54:14.8001249Z $L__BB0_8, 2026-02-21T08:54:14.8001301Z $L__BB0_11, 2026-02-21T08:54:14.8001362Z $L__BB0_15; 2026-02-21T08:54:14.8001422Z brx.idx %r19, $L_brx_0; 2026-02-21T08:54:14.8001500Z $L__BB0_5: // %.peel.next 2026-02-21T08:54:14.8001596Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8001767Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.8001842Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:14.8001925Z ld.shared.b32 %r55, [global_smem+98304]; 2026-02-21T08:54:14.8001980Z barrier.sync 1; 2026-02-21T08:54:14.8002038Z cvt.u64.u32 %rd50, %r78; 2026-02-21T08:54:14.8002105Z or.b64 %rd66, %rd50, 4611686293372403712; 2026-02-21T08:54:14.8002170Z cvt.u64.u32 %rd51, %r80; 2026-02-21T08:54:14.8002235Z or.b64 %rd67, %rd51, 4611686293439512576; 2026-02-21T08:54:14.8002294Z add.s32 %r81, %r20, 65600; 2026-02-21T08:54:14.8002359Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T08:54:14.8002415Z cvt.u64.u32 %rd52, %r82; 2026-02-21T08:54:14.8002478Z 
or.b64 %rd68, %rd52, 4611686293372403712; 2026-02-21T08:54:14.8002533Z add.s32 %r83, %r20, 64; 2026-02-21T08:54:14.8002629Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T08:54:14.8002689Z cvt.u64.u32 %rd53, %r84; 2026-02-21T08:54:14.8002753Z or.b64 %rd69, %rd53, 4611686293439512576; 2026-02-21T08:54:14.8002818Z add.s32 %r85, %r20, 65632; 2026-02-21T08:54:14.8002876Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T08:54:14.8002932Z cvt.u64.u32 %rd54, %r86; 2026-02-21T08:54:14.8002997Z or.b64 %rd70, %rd54, 4611686293372403712; 2026-02-21T08:54:14.8003061Z add.s32 %r87, %r20, 96; 2026-02-21T08:54:14.8003115Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T08:54:14.8003171Z cvt.u64.u32 %rd55, %r88; 2026-02-21T08:54:14.8003239Z or.b64 %rd71, %rd55, 4611686293439512576; 2026-02-21T08:54:14.8003294Z add.s32 %r89, %r20, 81920; 2026-02-21T08:54:14.8003350Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T08:54:14.8003407Z cvt.u64.u32 %rd56, %r90; 2026-02-21T08:54:14.8003478Z or.b64 %rd72, %rd56, 4611686293372403712; 2026-02-21T08:54:14.8003533Z add.s32 %r91, %r20, 32768; 2026-02-21T08:54:14.8003588Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T08:54:14.8003652Z cvt.u64.u32 %rd57, %r92; 2026-02-21T08:54:14.8003715Z or.b64 %rd73, %rd57, 4611686293439512576; 2026-02-21T08:54:14.8003769Z add.s32 %r93, %r20, 81952; 2026-02-21T08:54:14.8003831Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T08:54:14.8003907Z cvt.u64.u32 %rd58, %r94; 2026-02-21T08:54:14.8003972Z or.b64 %rd74, %rd58, 4611686293372403712; 2026-02-21T08:54:14.8004025Z add.s32 %r95, %r20, 32800; 2026-02-21T08:54:14.8004088Z bfe.u32 %r96, %r95, 4, 14; 2026-02-21T08:54:14.8004145Z cvt.u64.u32 %rd59, %r96; 2026-02-21T08:54:14.8004205Z or.b64 %rd75, %rd59, 4611686293439512576; 2026-02-21T08:54:14.8004267Z add.s32 %r97, %r20, 81984; 2026-02-21T08:54:14.8004322Z bfe.u32 %r98, %r97, 4, 14; 2026-02-21T08:54:14.8004381Z cvt.u64.u32 %rd60, %r98; 2026-02-21T08:54:14.8004472Z or.b64 %rd76, %rd60, 4611686293372403712; 2026-02-21T08:54:14.8004538Z add.s32 %r99, %r20, 32832; 
2026-02-21T08:54:14.8004601Z bfe.u32 %r100, %r99, 4, 14; 2026-02-21T08:54:14.8004662Z cvt.u64.u32 %rd61, %r100; 2026-02-21T08:54:14.8004758Z or.b64 %rd77, %rd61, 4611686293439512576; 2026-02-21T08:54:14.8004820Z add.s32 %r101, %r20, 82016; 2026-02-21T08:54:14.8004882Z bfe.u32 %r102, %r101, 4, 14; 2026-02-21T08:54:14.8004940Z cvt.u64.u32 %rd62, %r102; 2026-02-21T08:54:14.8005013Z or.b64 %rd78, %rd62, 4611686293372403712; 2026-02-21T08:54:14.8005073Z add.s32 %r103, %r20, 32864; 2026-02-21T08:54:14.8005134Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T08:54:14.8005202Z cvt.u64.u32 %rd63, %r104; 2026-02-21T08:54:14.8005268Z or.b64 %rd79, %rd63, 4611686293439512576; 2026-02-21T08:54:14.8005440Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.8005555Z bar.warp.sync -1; 2026-02-21T08:54:14.8005616Z add.s32 %r53, %r20, 98432; 2026-02-21T08:54:14.8005671Z mov.b32 %r675, 0; 2026-02-21T08:54:14.8005729Z // begin inline asm 2026-02-21T08:54:14.8005790Z 2026-02-21T08:54:14.8005841Z { 2026-02-21T08:54:14.8005906Z .reg .pred complete; 2026-02-21T08:54:14.8005972Z waitLoop: 2026-02-21T08:54:14.8006121Z mbarrier.try_wait.parity.shared.b64 complete, [%r53], %r675; 2026-02-21T08:54:14.8006204Z @!complete bra.uni waitLoop; 2026-02-21T08:54:14.8006270Z } 2026-02-21T08:54:14.8006274Z 2026-02-21T08:54:14.8006339Z // end inline asm 2026-02-21T08:54:14.8006549Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.8006625Z elect.sync %r105|%p11, -1; 2026-02-21T08:54:14.8006702Z mov.b32 %r56, 138412048; 2026-02-21T08:54:14.8006774Z mov.pred %p10, 0; 2026-02-21T08:54:14.8006837Z // begin inline asm 2026-02-21T08:54:14.8007009Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd30, %rd31, %r56, %p10; 2026-02-21T08:54:14.8007068Z // end inline asm 2026-02-21T08:54:14.8007128Z mov.pred %p12, -1; 2026-02-21T08:54:14.8007197Z // begin inline asm 2026-02-21T08:54:14.8007355Z @%p11 
tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd66, %rd67, %r56, %p12; 2026-02-21T08:54:14.8007453Z // end inline asm 2026-02-21T08:54:14.8007527Z // begin inline asm 2026-02-21T08:54:14.8007703Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd68, %rd69, %r56, %p12; 2026-02-21T08:54:14.8007770Z // end inline asm 2026-02-21T08:54:14.8007835Z // begin inline asm 2026-02-21T08:54:14.8008009Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd70, %rd71, %r56, %p12; 2026-02-21T08:54:14.8008077Z // end inline asm 2026-02-21T08:54:14.8008142Z // begin inline asm 2026-02-21T08:54:14.8008299Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd72, %rd73, %r56, %p12; 2026-02-21T08:54:14.8008373Z // end inline asm 2026-02-21T08:54:14.8008438Z // begin inline asm 2026-02-21T08:54:14.8008584Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd74, %rd75, %r56, %p12; 2026-02-21T08:54:14.8008659Z // end inline asm 2026-02-21T08:54:14.8008726Z // begin inline asm 2026-02-21T08:54:14.8008864Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd76, %rd77, %r56, %p12; 2026-02-21T08:54:14.8008926Z // end inline asm 2026-02-21T08:54:14.8008993Z // begin inline asm 2026-02-21T08:54:14.8009138Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd78, %rd79, %r56, %p12; 2026-02-21T08:54:14.8009244Z // end inline asm 2026-02-21T08:54:14.8009315Z add.s32 %r106, %r20, 98416; 2026-02-21T08:54:14.8009375Z cvt.u64.u32 %rd46, %r106; 2026-02-21T08:54:14.8009463Z // begin inline asm 2026-02-21T08:54:14.8009611Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:14.8009677Z // end inline asm 2026-02-21T08:54:14.8009748Z add.s32 %r107, %r20, 98448; 2026-02-21T08:54:14.8009825Z cvt.u64.u32 %rd47, %r107; 2026-02-21T08:54:14.8009895Z // begin inline asm 2026-02-21T08:54:14.8010033Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:14.8010145Z // end inline asm 2026-02-21T08:54:14.8010218Z 
mov.b32 %r676, 1; 2026-02-21T08:54:14.8010320Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:14.8010419Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:14.8010623Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.8010690Z bar.warp.sync -1; 2026-02-21T08:54:14.8010760Z // begin inline asm 2026-02-21T08:54:14.8010830Z 2026-02-21T08:54:14.8010894Z { 2026-02-21T08:54:14.8010969Z .reg .pred complete; 2026-02-21T08:54:14.8011048Z waitLoop: 2026-02-21T08:54:14.8011198Z mbarrier.try_wait.parity.shared.b64 complete, [%r53], %r676; 2026-02-21T08:54:14.8011264Z @!complete bra.uni waitLoop; 2026-02-21T08:54:14.8011323Z } 2026-02-21T08:54:14.8011329Z 2026-02-21T08:54:14.8011433Z // end inline asm 2026-02-21T08:54:14.8011657Z .loc 1 56 52 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:56:52 2026-02-21T08:54:14.8011725Z setp.eq.b32 %p46, %r675, 1792; 2026-02-21T08:54:14.8011803Z elect.sync %r127|%p29, -1; 2026-02-21T08:54:14.8011882Z // begin inline asm 2026-02-21T08:54:14.8012026Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd30, %rd31, %r56, %p12; 2026-02-21T08:54:14.8012093Z // end inline asm 2026-02-21T08:54:14.8012167Z // begin inline asm 2026-02-21T08:54:14.8012314Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd66, %rd67, %r56, %p12; 2026-02-21T08:54:14.8012382Z // end inline asm 2026-02-21T08:54:14.8012456Z // begin inline asm 2026-02-21T08:54:14.8012601Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd68, %rd69, %r56, %p12; 2026-02-21T08:54:14.8012663Z // end inline asm 2026-02-21T08:54:14.8012739Z // begin inline asm 2026-02-21T08:54:14.8012891Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd70, %rd71, %r56, %p12; 2026-02-21T08:54:14.8012949Z // end inline asm 2026-02-21T08:54:14.8013034Z // begin inline asm 2026-02-21T08:54:14.8013187Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd72, %rd73, %r56, %p12; 2026-02-21T08:54:14.8013253Z // end 
inline asm 2026-02-21T08:54:14.8013365Z // begin inline asm 2026-02-21T08:54:14.8013515Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd74, %rd75, %r56, %p12; 2026-02-21T08:54:14.8013581Z // end inline asm 2026-02-21T08:54:14.8013662Z // begin inline asm 2026-02-21T08:54:14.8013810Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd76, %rd77, %r56, %p12; 2026-02-21T08:54:14.8013877Z // end inline asm 2026-02-21T08:54:14.8013943Z // begin inline asm 2026-02-21T08:54:14.8014086Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd78, %rd79, %r56, %p12; 2026-02-21T08:54:14.8014160Z // end inline asm 2026-02-21T08:54:14.8014213Z // begin inline asm 2026-02-21T08:54:14.8014331Z @%p29 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:14.8014393Z // end inline asm 2026-02-21T08:54:14.8014456Z and.pred %p45, %p46, %p29; 2026-02-21T08:54:14.8014509Z // begin inline asm 2026-02-21T08:54:14.8014633Z @%p45 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:14.8014714Z // end inline asm 2026-02-21T08:54:14.8014877Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.8014934Z xor.b32 %r676, %r676, 1; 2026-02-21T08:54:14.8015139Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.8015197Z add.s32 %r675, %r675, 128; 2026-02-21T08:54:14.8015260Z setp.lt.u32 %p47, %r675, 1920; 2026-02-21T08:54:14.8015325Z @%p47 bra $L__BB0_6; 2026-02-21T08:54:14.8015403Z // %bb.7: // %.loopexit 2026-02-21T08:54:14.8015492Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8015554Z barrier.sync 1; 2026-02-21T08:54:14.8015659Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:14.8015714Z bra.uni $L__BB0_2; 2026-02-21T08:54:14.8015806Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8015982Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.8016058Z 
setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:14.8016151Z ld.shared.v2.b32 {%r38, %r42}, [global_smem+98312]; 2026-02-21T08:54:14.8016213Z barrier.sync 1; 2026-02-21T08:54:14.8016377Z .loc 1 21 67 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:21:67 2026-02-21T08:54:14.8016432Z mov.u32 %r24, %ctaid.x; 2026-02-21T08:54:14.8016494Z mov.u32 %r25, %ctaid.y; 2026-02-21T08:54:14.8016548Z mov.u32 %r26, %ctaid.z; 2026-02-21T08:54:14.8016604Z mov.u32 %r27, %nctaid.x; 2026-02-21T08:54:14.8016658Z mov.u32 %r28, %nctaid.y; 2026-02-21T08:54:14.8016753Z mad.lo.s32 %r29, %r26, %r28, %r25; 2026-02-21T08:54:14.8016818Z mad.lo.s32 %r30, %r29, %r27, %r24; 2026-02-21T08:54:14.8016876Z mul.lo.s32 %r31, %r30, 384; 2026-02-21T08:54:14.8017051Z .loc 1 22 68 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:22:68 2026-02-21T08:54:14.8017108Z add.s32 %r32, %r31, 128; 2026-02-21T08:54:14.8017163Z cvt.s64.s32 %rd24, %r32; 2026-02-21T08:54:14.8017222Z add.s64 %rd25, %rd23, %rd24; 2026-02-21T08:54:14.8017293Z cvta.global.u64 %rd29, %rd25; 2026-02-21T08:54:14.8017463Z .loc 1 21 67 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:21:67 2026-02-21T08:54:14.8017521Z cvt.s64.s32 %rd26, %r31; 2026-02-21T08:54:14.8017587Z add.s64 %rd27, %rd23, %rd26; 2026-02-21T08:54:14.8017649Z cvta.global.u64 %rd28, %rd27; 2026-02-21T08:54:14.8017706Z add.s32 %r11, %r1, -256; 2026-02-21T08:54:14.8017769Z shr.u32 %r12, %r11, 5; 2026-02-21T08:54:14.8017823Z mov.b32 %r678, 0; 2026-02-21T08:54:14.8017879Z mov.b32 %r677, -128; 2026-02-21T08:54:14.8017975Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:14.8018072Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:14.8018239Z .loc 1 0 67 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0:67 2026-02-21T08:54:14.8018326Z setp.lt.u32 %p6, %r11, 64; 2026-02-21T08:54:14.8018395Z setp.eq.b32 %p3, %r11, 0; 2026-02-21T08:54:14.8018452Z add.s32 %r33, %r20, 98416; 2026-02-21T08:54:14.8018508Z // begin inline asm 
2026-02-21T08:54:14.8018555Z 2026-02-21T08:54:14.8018612Z { 2026-02-21T08:54:14.8018670Z .reg .pred complete; 2026-02-21T08:54:14.8018721Z waitLoop: 2026-02-21T08:54:14.8018843Z mbarrier.try_wait.parity.shared.b64 complete, [%r33], %r678; 2026-02-21T08:54:14.8018905Z @!complete bra.uni waitLoop; 2026-02-21T08:54:14.8018953Z } 2026-02-21T08:54:14.8018957Z 2026-02-21T08:54:14.8019019Z // end inline asm 2026-02-21T08:54:14.8019076Z bar.sync 3, 64; 2026-02-21T08:54:14.8019136Z add.s32 %r35, %r20, 98432; 2026-02-21T08:54:14.8019194Z // begin inline asm 2026-02-21T08:54:14.8019310Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r35], 98304; 2026-02-21T08:54:14.8019364Z // end inline asm 2026-02-21T08:54:14.8019426Z bar.sync 3, 64; 2026-02-21T08:54:14.8019506Z shfl.sync.idx.b32 %r45, %r12, 0, 31, -1; 2026-02-21T08:54:14.8019568Z elect.sync %r46|%p7, -1; 2026-02-21T08:54:14.8019628Z and.pred %p4, %p6, %p7; 2026-02-21T08:54:14.8019704Z and.b32 %r47, %r45, 1; 2026-02-21T08:54:14.8019770Z shl.b32 %r48, %r47, 14; 2026-02-21T08:54:14.8019826Z add.s32 %r49, %r20, %r48; 2026-02-21T08:54:14.8019882Z add.s32 %r36, %r49, 65536; 2026-02-21T08:54:14.8019945Z shl.b32 %r50, %r47, 6; 2026-02-21T08:54:14.8020115Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.8020172Z add.s32 %r677, %r677, 128; 2026-02-21T08:54:14.8020337Z .loc 1 0 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:0 2026-02-21T08:54:14.8020425Z add.s32 %r41, %r677, %r50; 2026-02-21T08:54:14.8020479Z // begin inline asm 2026-02-21T08:54:14.8020723Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r36], [%rd28, {%r41, %r38}], [%r35]; 2026-02-21T08:54:14.8020787Z // end inline asm 2026-02-21T08:54:14.8020840Z bar.sync 3, 64; 2026-02-21T08:54:14.8020901Z elect.sync %r51|%p8, -1; 2026-02-21T08:54:14.8020969Z and.pred %p5, %p6, %p8; 2026-02-21T08:54:14.8021025Z shl.b32 %r52, %r47, 15; 2026-02-21T08:54:14.8021081Z add.s32 %r40, %r20, 
%r52; 2026-02-21T08:54:14.8021135Z // begin inline asm 2026-02-21T08:54:14.8021371Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r40], [%rd29, {%r41, %r42}], [%r35]; 2026-02-21T08:54:14.8021426Z // end inline asm 2026-02-21T08:54:14.8021481Z xor.b32 %r678, %r678, 1; 2026-02-21T08:54:14.8021680Z .loc 1 50 57 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:50:57 2026-02-21T08:54:14.8021746Z setp.lt.u32 %p9, %r677, 1920; 2026-02-21T08:54:14.8021803Z @%p9 bra $L__BB0_9; 2026-02-21T08:54:14.8021902Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8021959Z barrier.sync 1; 2026-02-21T08:54:14.8022036Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:14.8022090Z bra.uni $L__BB0_2; 2026-02-21T08:54:14.8022192Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:14.8022357Z .loc 1 19 0 // cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py:19 2026-02-21T08:54:14.8022412Z barrier.sync 1; 2026-02-21T08:54:14.8022474Z barrier.sync 1; 2026-02-21T08:54:14.8022529Z bra.uni $L__BB0_2; 2026-02-21T08:54:14.8022581Z $L__tmp1: 2026-02-21T08:54:14.8022639Z $L__func_end0: 2026-02-21T08:54:14.8022717Z // -- End function 2026-02-21T08:54:14.8022767Z } 2026-02-21T08:54:14.8022971Z .file 1 "/tmp/torchinductor_root/st/cstsvkjlseqwkfwuvg5uqcnewa7dmi4coposvlxl6j6yy5p7wjbp.py" 2026-02-21T08:54:14.8023041Z .section .debug_abbrev 2026-02-21T08:54:14.8023088Z { 2026-02-21T08:54:14.8023173Z .b8 1 // Abbreviation Code 2026-02-21T08:54:14.8023289Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:14.8023366Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:14.8023443Z .b8 37 // DW_AT_producer 2026-02-21T08:54:14.8023522Z .b8 8 // DW_FORM_string 2026-02-21T08:54:14.8023593Z .b8 19 // DW_AT_language 2026-02-21T08:54:14.8023666Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:14.8023737Z .b8 3 // DW_AT_name 2026-02-21T08:54:14.8023816Z .b8 8 // DW_FORM_string 2026-02-21T08:54:14.8023891Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:14.8023964Z 
.b8 6 // DW_FORM_data4 2026-02-21T08:54:14.8024040Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:14.8024112Z .b8 8 // DW_FORM_string 2026-02-21T08:54:14.8024181Z .b8 0 // EOM(1) 2026-02-21T08:54:14.8024254Z .b8 0 // EOM(2) 2026-02-21T08:54:14.8024337Z .b8 0 // EOM(3) 2026-02-21T08:54:14.8024389Z } 2026-02-21T08:54:14.8024446Z .section .debug_info 2026-02-21T08:54:14.8024501Z { 2026-02-21T08:54:14.8024581Z .b32 104 // Length of Unit 2026-02-21T08:54:14.8024664Z .b8 2 // DWARF version number 2026-02-21T08:54:14.8024748Z .b8 0 2026-02-21T08:54:14.8024857Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:54:14.8024944Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:14.8025079Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:14.8025163Z .b8 116 // DW_AT_producer 2026-02-21T08:54:14.8025216Z .b8 114 2026-02-21T08:54:14.8025268Z .b8 105 2026-02-21T08:54:14.8025323Z .b8 116 2026-02-21T08:54:14.8025372Z .b8 111 2026-02-21T08:54:14.8025419Z .b8 110 2026-02-21T08:54:14.8025469Z .b8 0 2026-02-21T08:54:14.8025548Z .b8 2 // DW_AT_language 2026-02-21T08:54:14.8025597Z .b8 0 2026-02-21T08:54:14.8025668Z .b8 99 // DW_AT_name 2026-02-21T08:54:14.8025725Z .b8 115 2026-02-21T08:54:14.8025774Z .b8 116 2026-02-21T08:54:14.8025823Z .b8 115 2026-02-21T08:54:14.8025872Z .b8 118 2026-02-21T08:54:14.8025930Z .b8 107 2026-02-21T08:54:14.8025979Z .b8 106 2026-02-21T08:54:14.8026029Z .b8 108 2026-02-21T08:54:14.8026086Z .b8 115 2026-02-21T08:54:14.8026136Z .b8 101 2026-02-21T08:54:14.8026212Z .b8 113 2026-02-21T08:54:14.8026265Z .b8 119 2026-02-21T08:54:14.8026324Z .b8 107 2026-02-21T08:54:14.8026373Z .b8 102 2026-02-21T08:54:14.8026422Z .b8 119 2026-02-21T08:54:14.8026479Z .b8 117 2026-02-21T08:54:14.8026530Z .b8 118 2026-02-21T08:54:14.8026580Z .b8 103 2026-02-21T08:54:14.8026632Z .b8 53 2026-02-21T08:54:14.8026694Z .b8 117 2026-02-21T08:54:14.8026746Z .b8 113 2026-02-21T08:54:14.8026795Z .b8 99 2026-02-21T08:54:14.8026852Z .b8 110 2026-02-21T08:54:14.8026901Z 
.b8 101 2026-02-21T08:54:14.8026950Z .b8 119 2026-02-21T08:54:14.8027000Z .b8 97 2026-02-21T08:54:14.8027056Z .b8 55 2026-02-21T08:54:14.8027103Z .b8 100 2026-02-21T08:54:14.8027151Z .b8 109 2026-02-21T08:54:14.8027199Z .b8 105 2026-02-21T08:54:14.8027255Z .b8 52 2026-02-21T08:54:14.8027302Z .b8 99 2026-02-21T08:54:14.8027351Z .b8 111 2026-02-21T08:54:14.8027408Z .b8 112 2026-02-21T08:54:14.8027456Z .b8 111 2026-02-21T08:54:14.8027505Z .b8 115 2026-02-21T08:54:14.8027553Z .b8 118 2026-02-21T08:54:14.8027612Z .b8 108 2026-02-21T08:54:14.8027661Z .b8 120 2026-02-21T08:54:14.8027713Z .b8 108 2026-02-21T08:54:14.8027766Z .b8 54 2026-02-21T08:54:14.8027817Z .b8 106 2026-02-21T08:54:14.8027865Z .b8 54 2026-02-21T08:54:14.8027913Z .b8 121 2026-02-21T08:54:14.8027970Z .b8 121 2026-02-21T08:54:14.8028046Z .b8 53 2026-02-21T08:54:14.8028096Z .b8 112 2026-02-21T08:54:14.8028143Z .b8 55 2026-02-21T08:54:14.8028201Z .b8 119 2026-02-21T08:54:14.8028249Z .b8 106 2026-02-21T08:54:14.8028298Z .b8 98 2026-02-21T08:54:14.8028354Z .b8 112 2026-02-21T08:54:14.8028403Z .b8 46 2026-02-21T08:54:14.8028452Z .b8 112 2026-02-21T08:54:14.8028500Z .b8 121 2026-02-21T08:54:14.8028555Z .b8 0 2026-02-21T08:54:14.8028642Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:14.8028714Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:14.8028769Z .b8 116 2026-02-21T08:54:14.8028817Z .b8 109 2026-02-21T08:54:14.8028865Z .b8 112 2026-02-21T08:54:14.8028913Z .b8 47 2026-02-21T08:54:14.8028969Z .b8 116 2026-02-21T08:54:14.8029019Z .b8 111 2026-02-21T08:54:14.8029068Z .b8 114 2026-02-21T08:54:14.8029122Z .b8 99 2026-02-21T08:54:14.8029169Z .b8 104 2026-02-21T08:54:14.8029216Z .b8 105 2026-02-21T08:54:14.8029263Z .b8 110 2026-02-21T08:54:14.8029318Z .b8 100 2026-02-21T08:54:14.8029365Z .b8 117 2026-02-21T08:54:14.8029413Z .b8 99 2026-02-21T08:54:14.8029467Z .b8 116 2026-02-21T08:54:14.8029516Z .b8 111 2026-02-21T08:54:14.8029564Z .b8 114 2026-02-21T08:54:14.8029610Z .b8 95 2026-02-21T08:54:14.8029664Z .b8 114 
2026-02-21T08:54:14.8029712Z .b8 111 2026-02-21T08:54:14.8029789Z .b8 111 2026-02-21T08:54:14.8029838Z .b8 116 2026-02-21T08:54:14.8029892Z .b8 47 2026-02-21T08:54:14.8029940Z .b8 115 2026-02-21T08:54:14.8029986Z .b8 116 2026-02-21T08:54:14.8030040Z .b8 0 2026-02-21T08:54:14.8030086Z } 2026-02-21T08:54:14.8030148Z .section .debug_macinfo { } 2026-02-21T08:54:14.8030152Z 2026-02-21T08:54:14.8030226Z ================================================================ 2026-02-21T08:54:14.8030334Z please share the reproducer above with Triton project. 2026-02-21T08:54:16.3230080Z 2026-02-21T08:54:16.3232647Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 58/58 16.6 configs/s 2026-02-21T08:54:18.0856764Z Generation 14: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 556.4 2026-02-21T08:54:18.0860819Z configs/s 2026-02-21T08:54:18.1789879Z [363s] Generation 14 complete: 2026-02-21T08:54:18.1790154Z error=19 2026-02-21T08:54:18.1790311Z timeout=1 2026-02-21T08:54:18.1790482Z ok=40 2026-02-21T08:54:18.1790634Z min=0.1075 2026-02-21T08:54:18.1790786Z mid=0.1946 2026-02-21T08:54:18.1791284Z max=35.0648 2026-02-21T08:54:18.1791474Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:54:18.1791712Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:54:18.1791915Z 'l2_groupings': [64], 2026-02-21T08:54:18.1792095Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:54:18.1792281Z 'loop_orders': [[0, 1]], 2026-02-21T08:54:18.1792728Z 'num_stages': 3, 2026-02-21T08:54:18.1792893Z 'num_warps': 8, 2026-02-21T08:54:18.1793030Z 'pid_type': 'flat', 2026-02-21T08:54:18.1793192Z 'range_flattens': [None, None], 2026-02-21T08:54:18.1793369Z 'range_multi_buffers': [None, None], 2026-02-21T08:54:18.1793561Z 'range_num_stages': [0, 0], 2026-02-21T08:54:18.1793723Z 'range_unroll_factors': [0, 0], 2026-02-21T08:54:18.1793905Z 'range_warp_specializes': [None, True]} 2026-02-21T08:54:18.1827026Z [363s] Fitting surrogate: 1073 points, 1073 targets 
2026-02-21T08:54:18.9513502Z [364s] Generation 15 starting: 41 neighbors, 2 active search path(s) 2026-02-21T08:54:27.0844376Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42/42 1.3 configs/s 2026-02-21T08:54:28.9593208Z [374s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:54:28.9593507Z 2026-02-21T08:54:28.9594877Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=1, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:54:28.9596409Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:28.9596653Z `ptxas` stderr: 2026-02-21T08:54:28.9597089Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 305 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:28.9597563Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:28.9597709Z 2026-02-21T08:54:28.9598118Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpg8rw53f_.ptx -o /tmp/tmpg8rw53f_.ptx.o 2026-02-21T08:54:28.9598611Z 2026-02-21T08:54:28.9598739Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:54:28.9598940Z 2026-02-21T08:54:28.9599495Z ================================================================ 2026-02-21T08:54:28.9601063Z Internal Triton PTX codegen error 2026-02-21T08:54:28.9601345Z `ptxas` stderr: 2026-02-21T08:54:28.9607823Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 305 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:28.9611821Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:28.9616195Z 2026-02-21T08:54:28.9618995Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpg8rw53f_.ptx -o /tmp/tmpg8rw53f_.ptx.o 2026-02-21T08:54:28.9619470Z 2026-02-21T08:54:28.9619474Z 2026-02-21T08:54:28.9619538Z // 2026-02-21T08:54:28.9619695Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:28.9620133Z // 2026-02-21T08:54:28.9620199Z 2026-02-21T08:54:28.9620256Z .version 8.7 2026-02-21T08:54:28.9620396Z .target sm_100a 2026-02-21T08:54:28.9620530Z .address_size 64 2026-02-21T08:54:28.9620620Z 2026-02-21T08:54:28.9620746Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:28.9621002Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:28.9621207Z // @_helion_matmul 2026-02-21T08:54:28.9621410Z .visible .entry _helion_matmul( 2026-02-21T08:54:28.9621621Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:28.9621873Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:28.9622120Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:28.9622374Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:28.9622713Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:28.9622924Z ) 2026-02-21T08:54:28.9623060Z .reqntid 256 2026-02-21T08:54:28.9623190Z .maxnreg 32 2026-02-21T08:54:28.9623321Z { 
2026-02-21T08:54:28.9623454Z .reg .pred %p<132>; 2026-02-21T08:54:28.9623613Z .reg .b16 %rs<7>; 2026-02-21T08:54:28.9623753Z .reg .b32 %r<1144>; 2026-02-21T08:54:28.9623937Z .reg .b64 %rd<652>; 2026-02-21T08:54:28.9624194Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19:0 2026-02-21T08:54:28.9624486Z $L__func_begin0: 2026-02-21T08:54:28.9624806Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19:0 2026-02-21T08:54:28.9625024Z 2026-02-21T08:54:28.9625076Z // %bb.0: 2026-02-21T08:54:28.9625238Z ld.param.b64 %rd23, [_helion_matmul_param_3]; 2026-02-21T08:54:28.9625422Z $L__tmp0: 2026-02-21T08:54:28.9625652Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19 2026-02-21T08:54:28.9625929Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:28.9626360Z shr.u32 %r2, %r1, 5; 2026-02-21T08:54:28.9626523Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:54:28.9626705Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:54:28.9626863Z @%p1 bra $L__BB0_12; 2026-02-21T08:54:28.9627001Z bra.uni $L__BB0_1; 2026-02-21T08:54:28.9627224Z $L__BB0_12: 2026-02-21T08:54:28.9627448Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0:0 2026-02-21T08:54:28.9627758Z ld.param.b64 %rd22, [_helion_matmul_param_2]; 2026-02-21T08:54:28.9627976Z ld.param.b64 %rd21, [_helion_matmul_param_1]; 2026-02-21T08:54:28.9628178Z ld.param.b64 %rd20, [_helion_matmul_param_0]; 2026-02-21T08:54:28.9628466Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19 2026-02-21T08:54:28.9628758Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:28.9628951Z setp.lt.u32 %p48, %r1, 32; 2026-02-21T08:54:28.9629112Z mov.b32 %r130, global_smem; 2026-02-21T08:54:28.9629275Z // begin inline asm 2026-02-21T08:54:28.9629518Z @%p48 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r130], 256; 2026-02-21T08:54:28.9629764Z // end inline asm 2026-02-21T08:54:28.9629900Z bar.sync 0, 128; 2026-02-21T08:54:28.9630044Z ld.shared.b32 
%r1139, [global_smem]; 2026-02-21T08:54:28.9630219Z bar.sync 0, 128; 2026-02-21T08:54:28.9630349Z // begin inline asm 2026-02-21T08:54:28.9630553Z @%p48 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:28.9630776Z // end inline asm 2026-02-21T08:54:28.9631055Z .loc 1 21 67 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:21:67 2026-02-21T08:54:28.9631338Z mov.u32 %r18, %ctaid.x; 2026-02-21T08:54:28.9631498Z mov.u32 %r155, %ctaid.y; 2026-02-21T08:54:28.9631656Z mov.u32 %r156, %ctaid.z; 2026-02-21T08:54:28.9631806Z mov.u32 %r157, %nctaid.x; 2026-02-21T08:54:28.9631968Z mov.u32 %r158, %nctaid.y; 2026-02-21T08:54:28.9632129Z mad.lo.s32 %r159, %r156, %r158, %r155; 2026-02-21T08:54:28.9632325Z mad.lo.s32 %r160, %r159, %r157, %r18; 2026-02-21T08:54:28.9632539Z mul.lo.s32 %r161, %r160, 384; 2026-02-21T08:54:28.9632716Z cvt.s64.s32 %rd136, %r161; 2026-02-21T08:54:28.9632876Z add.s64 %rd97, %rd23, %rd136; 2026-02-21T08:54:28.9633037Z shl.b32 %r162, %r1, 2; 2026-02-21T08:54:28.9633197Z add.s32 %r131, %r130, %r162; 2026-02-21T08:54:28.9633346Z mov.b32 %r166, 0; 2026-02-21T08:54:28.9633483Z // begin inline asm 2026-02-21T08:54:28.9633631Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:28.9633811Z // end inline asm 2026-02-21T08:54:28.9633946Z bar.warp.sync -1; 2026-02-21T08:54:28.9634096Z setp.eq.b32 %p121, %r1, 0; 2026-02-21T08:54:28.9634248Z cvt.u64.u32 %rd82, %r130; 2026-02-21T08:54:28.9634403Z // begin inline asm 2026-02-21T08:54:28.9634651Z @%p121 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd20; 2026-02-21T08:54:28.9634987Z // end inline asm 2026-02-21T08:54:28.9635125Z // begin inline asm 2026-02-21T08:54:28.9635383Z @%p121 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:28.9635650Z // end inline asm 2026-02-21T08:54:28.9635785Z mov.b32 %r133, 64; 2026-02-21T08:54:28.9635930Z // begin inline asm 2026-02-21T08:54:28.9636167Z @%p121 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:28.9636450Z // end inline asm 2026-02-21T08:54:28.9636584Z mov.b32 %r134, 128; 2026-02-21T08:54:28.9636732Z // begin inline asm 2026-02-21T08:54:28.9636970Z @%p121 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r134; 2026-02-21T08:54:28.9637236Z // end inline asm 2026-02-21T08:54:28.9637378Z mov.b32 %r135, 2048; 2026-02-21T08:54:28.9637518Z // begin inline asm 2026-02-21T08:54:28.9637766Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:28.9638040Z // end inline asm 2026-02-21T08:54:28.9638176Z // begin inline asm 2026-02-21T08:54:28.9638421Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:28.9638693Z // end inline asm 2026-02-21T08:54:28.9638833Z mov.b64 %rd90, 4096; 2026-02-21T08:54:28.9638972Z // begin inline asm 2026-02-21T08:54:28.9639231Z @%p121 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:28.9639542Z // end inline asm 2026-02-21T08:54:28.9639680Z mov.b32 %r137, 1; 2026-02-21T08:54:28.9639815Z // begin inline asm 2026-02-21T08:54:28.9640074Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:28.9640365Z // end inline asm 2026-02-21T08:54:28.9640496Z // begin inline asm 2026-02-21T08:54:28.9640758Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:28.9641036Z // end inline asm 2026-02-21T08:54:28.9641176Z // begin inline asm 2026-02-21T08:54:28.9641409Z @%p121 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:28.9641690Z // end inline asm 2026-02-21T08:54:28.9641831Z // begin inline asm 2026-02-21T08:54:28.9642078Z @%p121 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 
2026-02-21T08:54:28.9642374Z // end inline asm 2026-02-21T08:54:28.9642511Z // begin inline asm 2026-02-21T08:54:28.9642759Z @%p121 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:28.9643025Z // end inline asm 2026-02-21T08:54:28.9643194Z // begin inline asm 2026-02-21T08:54:28.9643428Z @%p121 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:28.9643681Z // end inline asm 2026-02-21T08:54:28.9643818Z // begin inline asm 2026-02-21T08:54:28.9644154Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd97 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:28.9644526Z // end inline asm 2026-02-21T08:54:28.9644655Z // begin inline asm 2026-02-21T08:54:28.9644894Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd97 + 0 ], 0x80; 2026-02-21T08:54:28.9645179Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:28.9645366Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:28.9645547Z // end inline asm 2026-02-21T08:54:28.9645676Z bar.sync 0, 128; 2026-02-21T08:54:28.9645919Z .loc 1 22 68 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:22:68 2026-02-21T08:54:28.9646205Z add.s32 %r163, %r161, 128; 2026-02-21T08:54:28.9646370Z cvt.s64.s32 %rd137, %r163; 2026-02-21T08:54:28.9646528Z add.s64 %rd115, %rd23, %rd137; 2026-02-21T08:54:28.9646691Z bar.sync 0, 128; 2026-02-21T08:54:28.9646820Z // begin inline asm 2026-02-21T08:54:28.9646975Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:28.9647150Z // end inline asm 2026-02-21T08:54:28.9647281Z bar.warp.sync -1; 2026-02-21T08:54:28.9647423Z // begin inline asm 2026-02-21T08:54:28.9647701Z @%p121 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd21; 2026-02-21T08:54:28.9647982Z // end inline asm 2026-02-21T08:54:28.9648107Z // begin inline asm 2026-02-21T08:54:28.9648327Z @%p121 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 
2026-02-21T08:54:28.9648578Z // end inline asm 2026-02-21T08:54:28.9648717Z // begin inline asm 2026-02-21T08:54:28.9648963Z @%p121 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:28.9649238Z // end inline asm 2026-02-21T08:54:28.9649380Z mov.b32 %r142, 256; 2026-02-21T08:54:28.9649522Z // begin inline asm 2026-02-21T08:54:28.9649764Z @%p121 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r142; 2026-02-21T08:54:28.9650034Z // end inline asm 2026-02-21T08:54:28.9650175Z // begin inline asm 2026-02-21T08:54:28.9650427Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:28.9650704Z // end inline asm 2026-02-21T08:54:28.9650846Z mov.b32 %r144, 12288; 2026-02-21T08:54:28.9650994Z // begin inline asm 2026-02-21T08:54:28.9651243Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r144; 2026-02-21T08:54:28.9651520Z // end inline asm 2026-02-21T08:54:28.9651699Z // begin inline asm 2026-02-21T08:54:28.9651971Z @%p121 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:28.9652268Z // end inline asm 2026-02-21T08:54:28.9652416Z // begin inline asm 2026-02-21T08:54:28.9652685Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:28.9653003Z // end inline asm 2026-02-21T08:54:28.9653144Z // begin inline asm 2026-02-21T08:54:28.9653430Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:28.9653734Z // end inline asm 2026-02-21T08:54:28.9653881Z // begin inline asm 2026-02-21T08:54:28.9654135Z @%p121 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:28.9654408Z // end inline asm 2026-02-21T08:54:28.9654554Z // begin inline asm 2026-02-21T08:54:28.9654840Z @%p121 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:28.9655135Z // end inline asm 2026-02-21T08:54:28.9655266Z // begin inline asm 2026-02-21T08:54:28.9655513Z @%p121 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:28.9655794Z // end inline asm 2026-02-21T08:54:28.9655955Z // begin inline asm 2026-02-21T08:54:28.9656201Z @%p121 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:28.9656467Z // end inline asm 2026-02-21T08:54:28.9656607Z // begin inline asm 2026-02-21T08:54:28.9656962Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd115 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:28.9657358Z // end inline asm 2026-02-21T08:54:28.9657501Z // begin inline asm 2026-02-21T08:54:28.9657747Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd115 + 0 ], 0x80; 2026-02-21T08:54:28.9658019Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:28.9658212Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:28.9658401Z // end inline asm 2026-02-21T08:54:28.9658544Z bar.sync 0, 128; 2026-02-21T08:54:28.9658787Z .loc 1 24 73 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:24:73 2026-02-21T08:54:28.9659073Z add.s32 %r164, %r161, 256; 2026-02-21T08:54:28.9659236Z cvt.s64.s32 %rd138, %r164; 2026-02-21T08:54:28.9659396Z add.s64 %rd19, %rd23, %rd138; 2026-02-21T08:54:28.9659546Z bar.sync 0, 128; 2026-02-21T08:54:28.9659681Z // begin inline asm 2026-02-21T08:54:28.9659828Z @%p48 st.shared.b32 [ %r131 + 0 ], %r166; 2026-02-21T08:54:28.9660006Z // end inline asm 2026-02-21T08:54:28.9660136Z bar.warp.sync -1; 2026-02-21T08:54:28.9660276Z // begin inline asm 2026-02-21T08:54:28.9660563Z @%p121 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd22; 2026-02-21T08:54:28.9660851Z // end inline asm 2026-02-21T08:54:28.9660985Z // begin inline asm 
2026-02-21T08:54:28.9661201Z @%p121 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:28.9661457Z // end inline asm 2026-02-21T08:54:28.9661584Z // begin inline asm 2026-02-21T08:54:28.9661817Z @%p121 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:28.9662077Z // end inline asm 2026-02-21T08:54:28.9662212Z // begin inline asm 2026-02-21T08:54:28.9662437Z @%p121 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r134; 2026-02-21T08:54:28.9662705Z // end inline asm 2026-02-21T08:54:28.9662840Z // begin inline asm 2026-02-21T08:54:28.9663081Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r144; 2026-02-21T08:54:28.9663358Z // end inline asm 2026-02-21T08:54:28.9663486Z // begin inline asm 2026-02-21T08:54:28.9663733Z @%p121 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:28.9663999Z // end inline asm 2026-02-21T08:54:28.9664152Z mov.b64 %rd126, 24576; 2026-02-21T08:54:28.9664305Z // begin inline asm 2026-02-21T08:54:28.9664578Z @%p121 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd126; 2026-02-21T08:54:28.9664894Z // end inline asm 2026-02-21T08:54:28.9665025Z // begin inline asm 2026-02-21T08:54:28.9665290Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r137; 2026-02-21T08:54:28.9665583Z // end inline asm 2026-02-21T08:54:28.9665720Z // begin inline asm 2026-02-21T08:54:28.9665981Z @%p121 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r137; 2026-02-21T08:54:28.9666279Z // end inline asm 2026-02-21T08:54:28.9666420Z // begin inline asm 2026-02-21T08:54:28.9666653Z @%p121 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:28.9666935Z // end inline asm 2026-02-21T08:54:28.9667068Z // begin inline asm 2026-02-21T08:54:28.9667332Z 
@%p121 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:28.9667623Z // end inline asm 2026-02-21T08:54:28.9667765Z // begin inline asm 2026-02-21T08:54:28.9668010Z @%p121 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:28.9668279Z // end inline asm 2026-02-21T08:54:28.9668445Z // begin inline asm 2026-02-21T08:54:28.9668677Z @%p121 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:28.9668944Z // end inline asm 2026-02-21T08:54:28.9669078Z // begin inline asm 2026-02-21T08:54:28.9669431Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:28.9669823Z // end inline asm 2026-02-21T08:54:28.9669958Z // begin inline asm 2026-02-21T08:54:28.9670203Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:54:28.9670450Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:28.9670646Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:28.9670821Z // end inline asm 2026-02-21T08:54:28.9670960Z bar.sync 0, 128; 2026-02-21T08:54:28.9671210Z .loc 1 33 75 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:33:75 2026-02-21T08:54:28.9671500Z setp.gt.u32 %p104, %r18, 767; 2026-02-21T08:54:28.9671675Z @%p104 bra $L__BB0_14; 2026-02-21T08:54:28.9671840Z // %bb.13: // %.lr.ph 2026-02-21T08:54:28.9672135Z .loc 1 24 73 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:24:73 2026-02-21T08:54:28.9672426Z cvta.global.u64 %rd139, %rd19; 2026-02-21T08:54:28.9672602Z setp.lt.u32 %p129, %r1, 128; 2026-02-21T08:54:28.9672761Z shl.b32 %r721, %r1, 7; 2026-02-21T08:54:28.9672943Z and.b32 %r722, %r721, 16256; 2026-02-21T08:54:28.9673108Z shl.b32 %r723, %r1, 4; 2026-02-21T08:54:28.9673262Z and.b32 %r724, %r723, 112; 2026-02-21T08:54:28.9673419Z or.b32 %r725, %r722, %r724; 2026-02-21T08:54:28.9673573Z xor.b32 %r726, 
%r725, 112; 2026-02-21T08:54:28.9673729Z add.s32 %r728, %r130, %r726; 2026-02-21T08:54:28.9673881Z xor.b32 %r729, %r725, 96; 2026-02-21T08:54:28.9674035Z add.s32 %r730, %r130, %r729; 2026-02-21T08:54:28.9674186Z xor.b32 %r731, %r725, 80; 2026-02-21T08:54:28.9674345Z add.s32 %r732, %r130, %r731; 2026-02-21T08:54:28.9674497Z xor.b32 %r733, %r725, 64; 2026-02-21T08:54:28.9674654Z add.s32 %r734, %r130, %r733; 2026-02-21T08:54:28.9674868Z xor.b32 %r735, %r725, 48; 2026-02-21T08:54:28.9675013Z add.s32 %r736, %r130, %r735; 2026-02-21T08:54:28.9675166Z xor.b32 %r737, %r725, 32; 2026-02-21T08:54:28.9675308Z add.s32 %r738, %r130, %r737; 2026-02-21T08:54:28.9675460Z xor.b32 %r739, %r725, 16; 2026-02-21T08:54:28.9675604Z add.s32 %r740, %r130, %r739; 2026-02-21T08:54:28.9675759Z add.s32 %r741, %r130, %r725; 2026-02-21T08:54:28.9676006Z .loc 1 40 33 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:40:33 2026-02-21T08:54:28.9676291Z shr.u32 %r742, %r18, 4; 2026-02-21T08:54:28.9676442Z and.b32 %r743, %r742, 32; 2026-02-21T08:54:28.9676714Z .loc 1 41 39 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:41:39 2026-02-21T08:54:28.9676995Z xor.b32 %r744, %r743, 48; 2026-02-21T08:54:28.9677236Z .loc 1 41 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:41:52 2026-02-21T08:54:28.9677510Z min.u32 %r745, %r744, 32; 2026-02-21T08:54:28.9677747Z .loc 1 42 64 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:42:64 2026-02-21T08:54:28.9678024Z cvt.u16.u32 %rs1, %r18; 2026-02-21T08:54:28.9678181Z and.b16 %rs2, %rs1, 511; 2026-02-21T08:54:28.9678328Z cvt.u16.u32 %rs3, %r745; 2026-02-21T08:54:28.9678579Z .loc 1 43 51 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:43:51 2026-02-21T08:54:28.9678845Z div.u16 %rs4, %rs2, %rs3; 2026-02-21T08:54:28.9679095Z .loc 1 42 64 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:42:64 2026-02-21T08:54:28.9679368Z mul.lo.s16 %rs5, %rs4, %rs3; 2026-02-21T08:54:28.9679526Z sub.s16 %rs6, %rs2, %rs5; 
2026-02-21T08:54:28.9679670Z cvt.u32.u16 %r746, %rs6; 2026-02-21T08:54:28.9679943Z .loc 1 42 30 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:42:30 2026-02-21T08:54:28.9680221Z add.s32 %r747, %r743, %r746; 2026-02-21T08:54:28.9680467Z .loc 1 44 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:44:27 2026-02-21T08:54:28.9680745Z shl.b32 %r748, %r747, 8; 2026-02-21T08:54:28.9680981Z .loc 1 45 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:45:27 2026-02-21T08:54:28.9681258Z mul.wide.u16 %r719, %rs4, 128; 2026-02-21T08:54:28.9681515Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9681837Z shfl.sync.idx.b32 %r749, %r2, 0, 31, -1; 2026-02-21T08:54:28.9682019Z and.b32 %r750, %r749, 3; 2026-02-21T08:54:28.9682162Z shl.b32 %r751, %r750, 21; 2026-02-21T08:54:28.9682317Z add.s32 %r165, %r751, %r1139; 2026-02-21T08:54:28.9682472Z mov.pred %p105, -1; 2026-02-21T08:54:28.9682622Z // begin inline asm 2026-02-21T08:54:28.9682983Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 0], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9683379Z // end inline asm 2026-02-21T08:54:28.9683521Z // begin inline asm 2026-02-21T08:54:28.9683883Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 16], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9684268Z // end inline asm 2026-02-21T08:54:28.9684436Z // begin inline asm 2026-02-21T08:54:28.9684835Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 32], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9685222Z // end inline asm 2026-02-21T08:54:28.9685364Z // begin inline asm 2026-02-21T08:54:28.9685721Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 48], {%r166, %r166, %r166, 
%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9686096Z // end inline asm 2026-02-21T08:54:28.9686239Z // begin inline asm 2026-02-21T08:54:28.9686580Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 64], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9686972Z // end inline asm 2026-02-21T08:54:28.9687108Z // begin inline asm 2026-02-21T08:54:28.9687463Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 80], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9687847Z // end inline asm 2026-02-21T08:54:28.9687975Z // begin inline asm 2026-02-21T08:54:28.9688318Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 96], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9688736Z // end inline asm 2026-02-21T08:54:28.9688874Z // begin inline asm 2026-02-21T08:54:28.9689241Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 112], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9689629Z // end inline asm 2026-02-21T08:54:28.9689767Z // begin inline asm 2026-02-21T08:54:28.9690124Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 128], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9690519Z // end inline asm 2026-02-21T08:54:28.9690650Z // begin inline asm 2026-02-21T08:54:28.9691008Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 144], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9691397Z // end inline asm 2026-02-21T08:54:28.9691526Z // begin inline asm 
2026-02-21T08:54:28.9691915Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 160], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9692308Z // end inline asm 2026-02-21T08:54:28.9692443Z // begin inline asm 2026-02-21T08:54:28.9692801Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 176], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9693202Z // end inline asm 2026-02-21T08:54:28.9693347Z // begin inline asm 2026-02-21T08:54:28.9693707Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 192], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9694151Z // end inline asm 2026-02-21T08:54:28.9694285Z // begin inline asm 2026-02-21T08:54:28.9694655Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 208], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9695092Z // end inline asm 2026-02-21T08:54:28.9695229Z // begin inline asm 2026-02-21T08:54:28.9695594Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 224], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9696000Z // end inline asm 2026-02-21T08:54:28.9696143Z // begin inline asm 2026-02-21T08:54:28.9696539Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r165 + 240], {%r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166, %r166}; 2026-02-21T08:54:28.9696942Z // end inline asm 2026-02-21T08:54:28.9697085Z // begin inline asm 2026-02-21T08:54:28.9697239Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:28.9697410Z // end inline asm 2026-02-21T08:54:28.9697545Z bar.sync 0, 128; 2026-02-21T08:54:28.9697802Z .loc 1 50 79 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9698099Z add.s32 %r437, %r130, 98416; 2026-02-21T08:54:28.9698269Z // begin inline asm 2026-02-21T08:54:28.9698439Z @%p121 mbarrier.init.shared::cta.b64 [%r437], 1; 2026-02-21T08:54:28.9698644Z // end inline asm 2026-02-21T08:54:28.9698795Z add.s32 %r438, %r130, 98432; 2026-02-21T08:54:28.9698952Z // begin inline asm 2026-02-21T08:54:28.9699131Z @%p121 mbarrier.init.shared::cta.b64 [%r438], 1; 2026-02-21T08:54:28.9699325Z // end inline asm 2026-02-21T08:54:28.9699584Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9699867Z bar.sync 0, 128; 2026-02-21T08:54:28.9700013Z // begin inline asm 2026-02-21T08:54:28.9700190Z @%p121 mbarrier.arrive.shared::cta.b64 _, [%r437]; 2026-02-21T08:54:28.9700398Z // end inline asm 2026-02-21T08:54:28.9700677Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9700942Z bar.sync 0, 128; 2026-02-21T08:54:28.9701083Z add.s32 %r440, %r130, 98448; 2026-02-21T08:54:28.9701232Z // begin inline asm 2026-02-21T08:54:28.9701396Z @%p121 mbarrier.init.shared::cta.b64 [%r440], 1; 2026-02-21T08:54:28.9701576Z // end inline asm 2026-02-21T08:54:28.9701733Z st.shared.b32 [global_smem+98456], 33554689; 2026-02-21T08:54:28.9701930Z st.shared.b32 [global_smem+98304], %r1139; 2026-02-21T08:54:28.9702148Z st.shared.v2.b32 [global_smem+98312], {%r719, %r748}; 2026-02-21T08:54:28.9702353Z barrier.sync 1; 2026-02-21T08:54:28.9702507Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:28.9702691Z barrier.sync 1; 2026-02-21T08:54:28.9702820Z barrier.sync 1; 2026-02-21T08:54:28.9702973Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:28.9703252Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9703527Z bar.sync 0, 128; 2026-02-21T08:54:28.9703657Z // begin inline asm 2026-02-21T08:54:28.9703792Z 2026-02-21T08:54:28.9703907Z { 
2026-02-21T08:54:28.9704027Z .reg .pred complete; 2026-02-21T08:54:28.9704202Z waitLoop: 2026-02-21T08:54:28.9704386Z mbarrier.try_wait.parity.shared.b64 complete, [%r440], %r166; 2026-02-21T08:54:28.9704625Z @!complete bra.uni waitLoop; 2026-02-21T08:54:28.9704800Z } 2026-02-21T08:54:28.9704870Z 2026-02-21T08:54:28.9704922Z // end inline asm 2026-02-21T08:54:28.9705153Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9705433Z bar.sync 0, 128; 2026-02-21T08:54:28.9705569Z // begin inline asm 2026-02-21T08:54:28.9705781Z @%p121 mbarrier.inval.shared::cta.b64 [%r440]; 2026-02-21T08:54:28.9705972Z // end inline asm 2026-02-21T08:54:28.9706101Z // begin inline asm 2026-02-21T08:54:28.9706265Z @%p121 mbarrier.inval.shared::cta.b64 [%r438]; 2026-02-21T08:54:28.9706445Z // end inline asm 2026-02-21T08:54:28.9706578Z // begin inline asm 2026-02-21T08:54:28.9706730Z @%p121 mbarrier.inval.shared::cta.b64 [%r437]; 2026-02-21T08:54:28.9706916Z // end inline asm 2026-02-21T08:54:28.9707153Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9707428Z // begin inline asm 2026-02-21T08:54:28.9707780Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461}, [%r165 + 0]; 2026-02-21T08:54:28.9708154Z // end inline asm 2026-02-21T08:54:28.9708294Z // begin inline asm 2026-02-21T08:54:28.9708655Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478}, [%r165 + 16]; 2026-02-21T08:54:28.9709044Z // end inline asm 2026-02-21T08:54:28.9709180Z // begin inline asm 2026-02-21T08:54:28.9709526Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495}, [%r165 + 32]; 2026-02-21T08:54:28.9709914Z // end inline asm 
2026-02-21T08:54:28.9710055Z // begin inline asm 2026-02-21T08:54:28.9710400Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512}, [%r165 + 48]; 2026-02-21T08:54:28.9710770Z // end inline asm 2026-02-21T08:54:28.9710908Z // begin inline asm 2026-02-21T08:54:28.9711251Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529}, [%r165 + 64]; 2026-02-21T08:54:28.9711639Z // end inline asm 2026-02-21T08:54:28.9711776Z // begin inline asm 2026-02-21T08:54:28.9712108Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546}, [%r165 + 80]; 2026-02-21T08:54:28.9712526Z // end inline asm 2026-02-21T08:54:28.9712655Z // begin inline asm 2026-02-21T08:54:28.9712997Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563}, [%r165 + 96]; 2026-02-21T08:54:28.9713373Z // end inline asm 2026-02-21T08:54:28.9713500Z // begin inline asm 2026-02-21T08:54:28.9713852Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580}, [%r165 + 112]; 2026-02-21T08:54:28.9714228Z // end inline asm 2026-02-21T08:54:28.9714364Z // begin inline asm 2026-02-21T08:54:28.9714731Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597}, [%r165 + 128]; 2026-02-21T08:54:28.9715105Z // end inline asm 2026-02-21T08:54:28.9715240Z // begin inline asm 2026-02-21T08:54:28.9715570Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614}, [%r165 + 144]; 
2026-02-21T08:54:28.9715934Z // end inline asm 2026-02-21T08:54:28.9716086Z // begin inline asm 2026-02-21T08:54:28.9716433Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631}, [%r165 + 160]; 2026-02-21T08:54:28.9716810Z // end inline asm 2026-02-21T08:54:28.9716938Z // begin inline asm 2026-02-21T08:54:28.9717285Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648}, [%r165 + 176]; 2026-02-21T08:54:28.9717684Z // end inline asm 2026-02-21T08:54:28.9717818Z // begin inline asm 2026-02-21T08:54:28.9718155Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665}, [%r165 + 192]; 2026-02-21T08:54:28.9718539Z // end inline asm 2026-02-21T08:54:28.9718673Z // begin inline asm 2026-02-21T08:54:28.9719004Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682}, [%r165 + 208]; 2026-02-21T08:54:28.9719382Z // end inline asm 2026-02-21T08:54:28.9719511Z // begin inline asm 2026-02-21T08:54:28.9719850Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699}, [%r165 + 224]; 2026-02-21T08:54:28.9720238Z // end inline asm 2026-02-21T08:54:28.9720367Z // begin inline asm 2026-02-21T08:54:28.9720744Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716}, [%r165 + 240]; 2026-02-21T08:54:28.9721118Z // end inline asm 2026-02-21T08:54:28.9721253Z // begin inline asm 2026-02-21T08:54:28.9721397Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:28.9721560Z // end inline asm 2026-02-21T08:54:28.9721694Z 
cvt.u64.u32 %rd140, %r446; 2026-02-21T08:54:28.9721857Z cvt.u64.u32 %rd141, %r447; 2026-02-21T08:54:28.9722019Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:54:28.9722178Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:54:28.9722454Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9722729Z mov.b64 {%r752, %r753}, %rd143; 2026-02-21T08:54:28.9722911Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T08:54:28.9723183Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9723462Z cvt.u64.u32 %rd144, %r448; 2026-02-21T08:54:28.9723616Z cvt.u64.u32 %rd145, %r449; 2026-02-21T08:54:28.9723775Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:54:28.9723935Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:54:28.9724190Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9724500Z mov.b64 {%r755, %r756}, %rd147; 2026-02-21T08:54:28.9724667Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T08:54:28.9724979Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9725253Z cvt.u64.u32 %rd148, %r450; 2026-02-21T08:54:28.9725410Z cvt.u64.u32 %rd149, %r451; 2026-02-21T08:54:28.9725564Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:54:28.9725716Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:54:28.9725974Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9726252Z mov.b64 {%r758, %r759}, %rd151; 2026-02-21T08:54:28.9726425Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T08:54:28.9726692Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9726971Z cvt.u64.u32 %rd152, %r452; 2026-02-21T08:54:28.9727119Z cvt.u64.u32 %rd153, %r453; 2026-02-21T08:54:28.9727273Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:54:28.9727430Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:54:28.9727705Z .loc 1 58 27 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9727984Z mov.b64 {%r761, %r762}, %rd155; 2026-02-21T08:54:28.9728145Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T08:54:28.9728414Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9728682Z cvt.u64.u32 %rd156, %r454; 2026-02-21T08:54:28.9728838Z cvt.u64.u32 %rd157, %r455; 2026-02-21T08:54:28.9728995Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:54:28.9729171Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:54:28.9729424Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9729687Z mov.b64 {%r764, %r765}, %rd159; 2026-02-21T08:54:28.9729853Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T08:54:28.9730110Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9730380Z cvt.u64.u32 %rd160, %r456; 2026-02-21T08:54:28.9730527Z cvt.u64.u32 %rd161, %r457; 2026-02-21T08:54:28.9730680Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:54:28.9730836Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:54:28.9731081Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9731352Z mov.b64 {%r767, %r768}, %rd163; 2026-02-21T08:54:28.9731512Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T08:54:28.9731801Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9732078Z cvt.u64.u32 %rd164, %r458; 2026-02-21T08:54:28.9732233Z cvt.u64.u32 %rd165, %r459; 2026-02-21T08:54:28.9732387Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:54:28.9732540Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:54:28.9732799Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9733079Z mov.b64 {%r770, %r771}, %rd167; 2026-02-21T08:54:28.9733249Z cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T08:54:28.9733515Z .loc 1 56 52 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9733797Z cvt.u64.u32 %rd168, %r460; 2026-02-21T08:54:28.9733951Z cvt.u64.u32 %rd169, %r461; 2026-02-21T08:54:28.9734109Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:54:28.9734267Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:54:28.9734520Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9734820Z mov.b64 {%r773, %r774}, %rd171; 2026-02-21T08:54:28.9734980Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T08:54:28.9735256Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9735581Z cvt.u64.u32 %rd172, %r463; 2026-02-21T08:54:28.9735743Z cvt.u64.u32 %rd173, %r464; 2026-02-21T08:54:28.9735906Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:54:28.9736068Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:54:28.9736344Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9736633Z mov.b64 {%r776, %r777}, %rd175; 2026-02-21T08:54:28.9736812Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T08:54:28.9737092Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9737385Z cvt.u64.u32 %rd176, %r465; 2026-02-21T08:54:28.9737540Z cvt.u64.u32 %rd177, %r466; 2026-02-21T08:54:28.9737703Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:54:28.9737868Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:54:28.9738138Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9738439Z mov.b64 {%r779, %r780}, %rd179; 2026-02-21T08:54:28.9738608Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T08:54:28.9738917Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9739198Z cvt.u64.u32 %rd180, %r467; 2026-02-21T08:54:28.9739359Z cvt.u64.u32 %rd181, %r468; 2026-02-21T08:54:28.9739518Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:54:28.9739676Z or.b64 %rd183, %rd180, 
%rd182; 2026-02-21T08:54:28.9739945Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9740238Z mov.b64 {%r782, %r783}, %rd183; 2026-02-21T08:54:28.9740414Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T08:54:28.9740722Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9741014Z cvt.u64.u32 %rd184, %r469; 2026-02-21T08:54:28.9741173Z cvt.u64.u32 %rd185, %r470; 2026-02-21T08:54:28.9741343Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:54:28.9741512Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:54:28.9741779Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9742079Z mov.b64 {%r785, %r786}, %rd187; 2026-02-21T08:54:28.9742253Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T08:54:28.9742539Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9742829Z cvt.u64.u32 %rd188, %r471; 2026-02-21T08:54:28.9742995Z cvt.u64.u32 %rd189, %r472; 2026-02-21T08:54:28.9743162Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:54:28.9743364Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:54:28.9743622Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9743887Z mov.b64 {%r788, %r789}, %rd191; 2026-02-21T08:54:28.9744053Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T08:54:28.9744317Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9744599Z cvt.u64.u32 %rd192, %r473; 2026-02-21T08:54:28.9744779Z cvt.u64.u32 %rd193, %r474; 2026-02-21T08:54:28.9744942Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:54:28.9745109Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:54:28.9745372Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9745652Z mov.b64 {%r791, %r792}, %rd195; 2026-02-21T08:54:28.9745813Z cvt.rn.f16x2.f32 %r793, %r792, %r791; 2026-02-21T08:54:28.9746086Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9746356Z cvt.u64.u32 %rd196, %r475; 2026-02-21T08:54:28.9746510Z cvt.u64.u32 %rd197, %r476; 2026-02-21T08:54:28.9746662Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:54:28.9746814Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:54:28.9747101Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9747371Z mov.b64 {%r794, %r795}, %rd199; 2026-02-21T08:54:28.9747544Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T08:54:28.9747810Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9748095Z cvt.u64.u32 %rd200, %r477; 2026-02-21T08:54:28.9748247Z cvt.u64.u32 %rd201, %r478; 2026-02-21T08:54:28.9748407Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:54:28.9748571Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:54:28.9748826Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9749108Z mov.b64 {%r797, %r798}, %rd203; 2026-02-21T08:54:28.9749275Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T08:54:28.9749547Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9749820Z cvt.u64.u32 %rd204, %r480; 2026-02-21T08:54:28.9749980Z cvt.u64.u32 %rd205, %r481; 2026-02-21T08:54:28.9750139Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:54:28.9750295Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:54:28.9750608Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9750878Z mov.b64 {%r800, %r801}, %rd207; 2026-02-21T08:54:28.9751042Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T08:54:28.9751302Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9751585Z cvt.u64.u32 %rd208, %r482; 2026-02-21T08:54:28.9751729Z cvt.u64.u32 %rd209, %r483; 2026-02-21T08:54:28.9751916Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:54:28.9752074Z or.b64 %rd211, %rd208, 
%rd210; 2026-02-21T08:54:28.9752324Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9752603Z mov.b64 {%r803, %r804}, %rd211; 2026-02-21T08:54:28.9752763Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T08:54:28.9753037Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9753314Z cvt.u64.u32 %rd212, %r484; 2026-02-21T08:54:28.9753471Z cvt.u64.u32 %rd213, %r485; 2026-02-21T08:54:28.9753624Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:54:28.9753774Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:54:28.9754032Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9754300Z mov.b64 {%r806, %r807}, %rd215; 2026-02-21T08:54:28.9754498Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T08:54:28.9754791Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9755079Z cvt.u64.u32 %rd216, %r486; 2026-02-21T08:54:28.9755229Z cvt.u64.u32 %rd217, %r487; 2026-02-21T08:54:28.9755386Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:54:28.9755550Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:54:28.9755802Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9756092Z mov.b64 {%r809, %r810}, %rd219; 2026-02-21T08:54:28.9756254Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T08:54:28.9756521Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9756787Z cvt.u64.u32 %rd220, %r488; 2026-02-21T08:54:28.9756942Z cvt.u64.u32 %rd221, %r489; 2026-02-21T08:54:28.9757095Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:54:28.9757243Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:54:28.9757502Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9757770Z mov.b64 {%r812, %r813}, %rd223; 2026-02-21T08:54:28.9757938Z cvt.rn.f16x2.f32 %r814, %r813, %r812; 2026-02-21T08:54:28.9758201Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9758511Z cvt.u64.u32 %rd224, %r490; 2026-02-21T08:54:28.9758658Z cvt.u64.u32 %rd225, %r491; 2026-02-21T08:54:28.9758815Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:54:28.9758972Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:54:28.9759227Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9759503Z mov.b64 {%r815, %r816}, %rd227; 2026-02-21T08:54:28.9759664Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T08:54:28.9759942Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9760220Z cvt.u64.u32 %rd228, %r492; 2026-02-21T08:54:28.9760378Z cvt.u64.u32 %rd229, %r493; 2026-02-21T08:54:28.9760532Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:54:28.9760682Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:54:28.9760940Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9761211Z mov.b64 {%r818, %r819}, %rd231; 2026-02-21T08:54:28.9761377Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T08:54:28.9761664Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9761946Z cvt.u64.u32 %rd232, %r494; 2026-02-21T08:54:28.9762091Z cvt.u64.u32 %rd233, %r495; 2026-02-21T08:54:28.9762245Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:54:28.9762403Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:54:28.9762655Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9762934Z mov.b64 {%r821, %r822}, %rd235; 2026-02-21T08:54:28.9763120Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T08:54:28.9763391Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9763654Z cvt.u64.u32 %rd236, %r497; 2026-02-21T08:54:28.9763810Z cvt.u64.u32 %rd237, %r498; 2026-02-21T08:54:28.9763963Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:54:28.9764111Z or.b64 %rd239, %rd236, 
%rd238; 2026-02-21T08:54:28.9764368Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9764635Z mov.b64 {%r824, %r825}, %rd239; 2026-02-21T08:54:28.9764834Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T08:54:28.9765093Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9765377Z cvt.u64.u32 %rd240, %r499; 2026-02-21T08:54:28.9765523Z cvt.u64.u32 %rd241, %r500; 2026-02-21T08:54:28.9765703Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:54:28.9765871Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:54:28.9766130Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9766423Z mov.b64 {%r827, %r828}, %rd243; 2026-02-21T08:54:28.9766594Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T08:54:28.9766877Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9767159Z cvt.u64.u32 %rd244, %r501; 2026-02-21T08:54:28.9767329Z cvt.u64.u32 %rd245, %r502; 2026-02-21T08:54:28.9767488Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:54:28.9767642Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:54:28.9767904Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9768186Z mov.b64 {%r830, %r831}, %rd247; 2026-02-21T08:54:28.9768356Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T08:54:28.9768621Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9768908Z cvt.u64.u32 %rd248, %r503; 2026-02-21T08:54:28.9769061Z cvt.u64.u32 %rd249, %r504; 2026-02-21T08:54:28.9769220Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:54:28.9769381Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:54:28.9769664Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9769947Z mov.b64 {%r833, %r834}, %rd251; 2026-02-21T08:54:28.9770109Z cvt.rn.f16x2.f32 %r835, %r834, %r833; 2026-02-21T08:54:28.9770380Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9770645Z cvt.u64.u32 %rd252, %r505; 2026-02-21T08:54:28.9770797Z cvt.u64.u32 %rd253, %r506; 2026-02-21T08:54:28.9770950Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:54:28.9771100Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:54:28.9771358Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9771626Z mov.b64 {%r836, %r837}, %rd255; 2026-02-21T08:54:28.9771791Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T08:54:28.9772051Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9772327Z cvt.u64.u32 %rd256, %r507; 2026-02-21T08:54:28.9772474Z cvt.u64.u32 %rd257, %r508; 2026-02-21T08:54:28.9772626Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:54:28.9772806Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:54:28.9773058Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9773333Z mov.b64 {%r839, %r840}, %rd259; 2026-02-21T08:54:28.9773491Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T08:54:28.9773758Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9774023Z cvt.u64.u32 %rd260, %r509; 2026-02-21T08:54:28.9774177Z cvt.u64.u32 %rd261, %r510; 2026-02-21T08:54:28.9774356Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:54:28.9774505Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:54:28.9774793Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9775074Z mov.b64 {%r842, %r843}, %rd263; 2026-02-21T08:54:28.9775243Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T08:54:28.9775504Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9775778Z cvt.u64.u32 %rd264, %r511; 2026-02-21T08:54:28.9775926Z cvt.u64.u32 %rd265, %r512; 2026-02-21T08:54:28.9776081Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:54:28.9776238Z or.b64 %rd267, %rd264, 
%rd266; 2026-02-21T08:54:28.9776488Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9776763Z mov.b64 {%r845, %r846}, %rd267; 2026-02-21T08:54:28.9776953Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T08:54:28.9777235Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9777512Z cvt.u64.u32 %rd268, %r514; 2026-02-21T08:54:28.9777670Z cvt.u64.u32 %rd269, %r515; 2026-02-21T08:54:28.9777825Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:54:28.9777974Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:54:28.9778231Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9778500Z mov.b64 {%r848, %r849}, %rd271; 2026-02-21T08:54:28.9778668Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T08:54:28.9778929Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9779205Z cvt.u64.u32 %rd272, %r516; 2026-02-21T08:54:28.9779351Z cvt.u64.u32 %rd273, %r517; 2026-02-21T08:54:28.9779506Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:54:28.9779663Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:54:28.9779922Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9780213Z mov.b64 {%r851, %r852}, %rd275; 2026-02-21T08:54:28.9780381Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T08:54:28.9780696Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9780978Z cvt.u64.u32 %rd276, %r518; 2026-02-21T08:54:28.9781140Z cvt.u64.u32 %rd277, %r519; 2026-02-21T08:54:28.9781303Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:54:28.9781459Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:54:28.9781730Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9782012Z mov.b64 {%r854, %r855}, %rd279; 2026-02-21T08:54:28.9782187Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T08:54:28.9782461Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9782750Z cvt.u64.u32 %rd280, %r520; 2026-02-21T08:54:28.9782905Z cvt.u64.u32 %rd281, %r521; 2026-02-21T08:54:28.9783065Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:54:28.9783229Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:54:28.9783497Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9783786Z mov.b64 {%r857, %r858}, %rd283; 2026-02-21T08:54:28.9783953Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T08:54:28.9784263Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9784544Z cvt.u64.u32 %rd284, %r522; 2026-02-21T08:54:28.9784729Z cvt.u64.u32 %rd285, %r523; 2026-02-21T08:54:28.9784893Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:54:28.9785050Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:54:28.9785318Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9785597Z mov.b64 {%r860, %r861}, %rd287; 2026-02-21T08:54:28.9785804Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T08:54:28.9786084Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9786381Z cvt.u64.u32 %rd288, %r524; 2026-02-21T08:54:28.9786542Z cvt.u64.u32 %rd289, %r525; 2026-02-21T08:54:28.9786709Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:54:28.9786880Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:54:28.9787153Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9787448Z mov.b64 {%r863, %r864}, %rd291; 2026-02-21T08:54:28.9787626Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T08:54:28.9787898Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9788172Z cvt.u64.u32 %rd292, %r526; 2026-02-21T08:54:28.9788336Z cvt.u64.u32 %rd293, %r527; 2026-02-21T08:54:28.9788528Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:54:28.9788683Z or.b64 %rd295, %rd292, 
%rd294; 2026-02-21T08:54:28.9788941Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9789214Z mov.b64 {%r866, %r867}, %rd295; 2026-02-21T08:54:28.9789382Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T08:54:28.9789641Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9789919Z cvt.u64.u32 %rd296, %r528; 2026-02-21T08:54:28.9790073Z cvt.u64.u32 %rd297, %r529; 2026-02-21T08:54:28.9790223Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:54:28.9790379Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:54:28.9790628Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9790902Z mov.b64 {%r869, %r870}, %rd299; 2026-02-21T08:54:28.9791061Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T08:54:28.9791331Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9791597Z cvt.u64.u32 %rd300, %r531; 2026-02-21T08:54:28.9791751Z cvt.u64.u32 %rd301, %r532; 2026-02-21T08:54:28.9791905Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:54:28.9792102Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:54:28.9792357Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9792626Z mov.b64 {%r872, %r873}, %rd303; 2026-02-21T08:54:28.9792794Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T08:54:28.9793056Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9793330Z cvt.u64.u32 %rd304, %r533; 2026-02-21T08:54:28.9793484Z cvt.u64.u32 %rd305, %r534; 2026-02-21T08:54:28.9793631Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:54:28.9793788Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:54:28.9794036Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9794323Z mov.b64 {%r875, %r876}, %rd307; 2026-02-21T08:54:28.9794484Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T08:54:28.9794781Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9795047Z cvt.u64.u32 %rd308, %r535; 2026-02-21T08:54:28.9795201Z cvt.u64.u32 %rd309, %r536; 2026-02-21T08:54:28.9795354Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:54:28.9795528Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:54:28.9795785Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9796053Z mov.b64 {%r878, %r879}, %rd311; 2026-02-21T08:54:28.9796218Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T08:54:28.9796476Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9796756Z cvt.u64.u32 %rd312, %r537; 2026-02-21T08:54:28.9796952Z cvt.u64.u32 %rd313, %r538; 2026-02-21T08:54:28.9797098Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:54:28.9797254Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:54:28.9797501Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9797776Z mov.b64 {%r881, %r882}, %rd315; 2026-02-21T08:54:28.9797936Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T08:54:28.9798203Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9798468Z cvt.u64.u32 %rd316, %r539; 2026-02-21T08:54:28.9798623Z cvt.u64.u32 %rd317, %r540; 2026-02-21T08:54:28.9798779Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:54:28.9798932Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:54:28.9799198Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9799489Z mov.b64 {%r884, %r885}, %rd319; 2026-02-21T08:54:28.9799658Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T08:54:28.9799916Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9800186Z cvt.u64.u32 %rd320, %r541; 2026-02-21T08:54:28.9800343Z cvt.u64.u32 %rd321, %r542; 2026-02-21T08:54:28.9800492Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:54:28.9800648Z or.b64 %rd323, %rd320, 
%rd322; 2026-02-21T08:54:28.9800895Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9801172Z mov.b64 {%r887, %r888}, %rd323; 2026-02-21T08:54:28.9801331Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T08:54:28.9801598Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9801868Z cvt.u64.u32 %rd324, %r543; 2026-02-21T08:54:28.9802024Z cvt.u64.u32 %rd325, %r544; 2026-02-21T08:54:28.9802176Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:54:28.9802327Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:54:28.9802582Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9802847Z mov.b64 {%r890, %r891}, %rd327; 2026-02-21T08:54:28.9803012Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T08:54:28.9803301Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9803571Z cvt.u64.u32 %rd328, %r545; 2026-02-21T08:54:28.9803726Z cvt.u64.u32 %rd329, %r546; 2026-02-21T08:54:28.9803875Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:54:28.9804031Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:54:28.9804278Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9804553Z mov.b64 {%r893, %r894}, %rd331; 2026-02-21T08:54:28.9804740Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T08:54:28.9805015Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9805287Z cvt.u64.u32 %rd332, %r548; 2026-02-21T08:54:28.9805443Z cvt.u64.u32 %rd333, %r549; 2026-02-21T08:54:28.9805597Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:54:28.9805745Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:54:28.9806001Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9806270Z mov.b64 {%r896, %r897}, %rd335; 2026-02-21T08:54:28.9806460Z cvt.rn.f16x2.f32 %r898, %r897, %r896; 2026-02-21T08:54:28.9806722Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9806995Z cvt.u64.u32 %rd336, %r550; 2026-02-21T08:54:28.9807152Z cvt.u64.u32 %rd337, %r551; 2026-02-21T08:54:28.9807300Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:54:28.9807458Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:54:28.9807707Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9808010Z mov.b64 {%r899, %r900}, %rd339; 2026-02-21T08:54:28.9808171Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T08:54:28.9808438Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9808705Z cvt.u64.u32 %rd340, %r552; 2026-02-21T08:54:28.9808862Z cvt.u64.u32 %rd341, %r553; 2026-02-21T08:54:28.9809019Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:54:28.9809172Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:54:28.9809437Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9809706Z mov.b64 {%r902, %r903}, %rd343; 2026-02-21T08:54:28.9809875Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T08:54:28.9810134Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9810409Z cvt.u64.u32 %rd344, %r554; 2026-02-21T08:54:28.9810588Z cvt.u64.u32 %rd345, %r555; 2026-02-21T08:54:28.9810738Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:54:28.9810896Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:54:28.9811143Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9811417Z mov.b64 {%r905, %r906}, %rd347; 2026-02-21T08:54:28.9811575Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T08:54:28.9811841Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9812115Z cvt.u64.u32 %rd348, %r556; 2026-02-21T08:54:28.9812268Z cvt.u64.u32 %rd349, %r557; 2026-02-21T08:54:28.9812421Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:54:28.9812571Z or.b64 %rd351, %rd348, 
%rd350; 2026-02-21T08:54:28.9812822Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9813085Z mov.b64 {%r908, %r909}, %rd351; 2026-02-21T08:54:28.9813253Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T08:54:28.9813511Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9813784Z cvt.u64.u32 %rd352, %r558; 2026-02-21T08:54:28.9813937Z cvt.u64.u32 %rd353, %r559; 2026-02-21T08:54:28.9814111Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:54:28.9814267Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:54:28.9814514Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9814818Z mov.b64 {%r911, %r912}, %rd355; 2026-02-21T08:54:28.9814978Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T08:54:28.9815247Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9815518Z cvt.u64.u32 %rd356, %r560; 2026-02-21T08:54:28.9815672Z cvt.u64.u32 %rd357, %r561; 2026-02-21T08:54:28.9815825Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:54:28.9815974Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:54:28.9816228Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9816498Z mov.b64 {%r914, %r915}, %rd359; 2026-02-21T08:54:28.9816662Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T08:54:28.9816921Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9817203Z cvt.u64.u32 %rd360, %r562; 2026-02-21T08:54:28.9817358Z cvt.u64.u32 %rd361, %r563; 2026-02-21T08:54:28.9817528Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:54:28.9817686Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:54:28.9817935Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9818210Z mov.b64 {%r917, %r918}, %rd363; 2026-02-21T08:54:28.9818368Z cvt.rn.f16x2.f32 %r919, %r918, %r917; 2026-02-21T08:54:28.9818635Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9818899Z cvt.u64.u32 %rd364, %r565; 2026-02-21T08:54:28.9819083Z cvt.u64.u32 %rd365, %r566; 2026-02-21T08:54:28.9819238Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:54:28.9819388Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:54:28.9819643Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9819913Z mov.b64 {%r920, %r921}, %rd367; 2026-02-21T08:54:28.9820085Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T08:54:28.9820349Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9820620Z cvt.u64.u32 %rd368, %r567; 2026-02-21T08:54:28.9820775Z cvt.u64.u32 %rd369, %r568; 2026-02-21T08:54:28.9820922Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:54:28.9821077Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:54:28.9821324Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9821633Z mov.b64 {%r923, %r924}, %rd371; 2026-02-21T08:54:28.9821798Z cvt.rn.f16x2.f32 %r925, %r924, %r923; 2026-02-21T08:54:28.9822062Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9822332Z cvt.u64.u32 %rd372, %r569; 2026-02-21T08:54:28.9822488Z cvt.u64.u32 %rd373, %r570; 2026-02-21T08:54:28.9822643Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:54:28.9822791Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:54:28.9823047Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9823317Z mov.b64 {%r926, %r927}, %rd375; 2026-02-21T08:54:28.9823489Z cvt.rn.f16x2.f32 %r928, %r927, %r926; 2026-02-21T08:54:28.9823764Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9824047Z cvt.u64.u32 %rd376, %r571; 2026-02-21T08:54:28.9824210Z cvt.u64.u32 %rd377, %r572; 2026-02-21T08:54:28.9824366Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:54:28.9824530Z or.b64 %rd379, %rd376, 
%rd378; 2026-02-21T08:54:28.9824821Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9825120Z mov.b64 {%r929, %r930}, %rd379; 2026-02-21T08:54:28.9825316Z cvt.rn.f16x2.f32 %r931, %r930, %r929; 2026-02-21T08:54:28.9825593Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9825868Z cvt.u64.u32 %rd380, %r573; 2026-02-21T08:54:28.9826028Z cvt.u64.u32 %rd381, %r574; 2026-02-21T08:54:28.9826186Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:54:28.9826340Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:54:28.9826605Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9826883Z mov.b64 {%r932, %r933}, %rd383; 2026-02-21T08:54:28.9827054Z cvt.rn.f16x2.f32 %r934, %r933, %r932; 2026-02-21T08:54:28.9827323Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9827608Z cvt.u64.u32 %rd384, %r575; 2026-02-21T08:54:28.9827770Z cvt.u64.u32 %rd385, %r576; 2026-02-21T08:54:28.9827924Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:54:28.9828091Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:54:28.9828347Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9828639Z mov.b64 {%r935, %r936}, %rd387; 2026-02-21T08:54:28.9828830Z cvt.rn.f16x2.f32 %r937, %r936, %r935; 2026-02-21T08:54:28.9829113Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9829397Z cvt.u64.u32 %rd388, %r577; 2026-02-21T08:54:28.9829561Z cvt.u64.u32 %rd389, %r578; 2026-02-21T08:54:28.9829722Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:54:28.9829880Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:54:28.9830155Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9830471Z mov.b64 {%r938, %r939}, %rd391; 2026-02-21T08:54:28.9830644Z cvt.rn.f16x2.f32 %r940, %r939, %r938; 2026-02-21T08:54:28.9830918Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9831211Z cvt.u64.u32 %rd392, %r579; 2026-02-21T08:54:28.9831387Z cvt.u64.u32 %rd393, %r580; 2026-02-21T08:54:28.9831535Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:54:28.9831694Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:54:28.9831943Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9832221Z mov.b64 {%r941, %r942}, %rd395; 2026-02-21T08:54:28.9832382Z cvt.rn.f16x2.f32 %r943, %r942, %r941; 2026-02-21T08:54:28.9832649Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9832954Z cvt.u64.u32 %rd396, %r582; 2026-02-21T08:54:28.9833112Z cvt.u64.u32 %rd397, %r583; 2026-02-21T08:54:28.9833268Z shl.b64 %rd398, %rd397, 32; 2026-02-21T08:54:28.9833418Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T08:54:28.9833672Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9833947Z mov.b64 {%r944, %r945}, %rd399; 2026-02-21T08:54:28.9834112Z cvt.rn.f16x2.f32 %r946, %r945, %r944; 2026-02-21T08:54:28.9834368Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9834638Z cvt.u64.u32 %rd400, %r584; 2026-02-21T08:54:28.9834825Z cvt.u64.u32 %rd401, %r585; 2026-02-21T08:54:28.9834974Z shl.b64 %rd402, %rd401, 32; 2026-02-21T08:54:28.9835131Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T08:54:28.9835383Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9835666Z mov.b64 {%r947, %r948}, %rd403; 2026-02-21T08:54:28.9835826Z cvt.rn.f16x2.f32 %r949, %r948, %r947; 2026-02-21T08:54:28.9836103Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9836375Z cvt.u64.u32 %rd404, %r586; 2026-02-21T08:54:28.9836530Z cvt.u64.u32 %rd405, %r587; 2026-02-21T08:54:28.9836716Z shl.b64 %rd406, %rd405, 32; 2026-02-21T08:54:28.9836865Z or.b64 %rd407, %rd404, 
%rd406; 2026-02-21T08:54:28.9837116Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9837383Z mov.b64 {%r950, %r951}, %rd407; 2026-02-21T08:54:28.9837550Z cvt.rn.f16x2.f32 %r952, %r951, %r950; 2026-02-21T08:54:28.9837805Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9838076Z cvt.u64.u32 %rd408, %r588; 2026-02-21T08:54:28.9838231Z cvt.u64.u32 %rd409, %r589; 2026-02-21T08:54:28.9838379Z shl.b64 %rd410, %rd409, 32; 2026-02-21T08:54:28.9838536Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T08:54:28.9838784Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9839059Z mov.b64 {%r953, %r954}, %rd411; 2026-02-21T08:54:28.9839217Z cvt.rn.f16x2.f32 %r955, %r954, %r953; 2026-02-21T08:54:28.9839487Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9839750Z cvt.u64.u32 %rd412, %r590; 2026-02-21T08:54:28.9839931Z cvt.u64.u32 %rd413, %r591; 2026-02-21T08:54:28.9840087Z shl.b64 %rd414, %rd413, 32; 2026-02-21T08:54:28.9840237Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T08:54:28.9840495Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9840761Z mov.b64 {%r956, %r957}, %rd415; 2026-02-21T08:54:28.9840926Z cvt.rn.f16x2.f32 %r958, %r957, %r956; 2026-02-21T08:54:28.9841185Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9841485Z cvt.u64.u32 %rd416, %r592; 2026-02-21T08:54:28.9841646Z cvt.u64.u32 %rd417, %r593; 2026-02-21T08:54:28.9841793Z shl.b64 %rd418, %rd417, 32; 2026-02-21T08:54:28.9842061Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T08:54:28.9842378Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9842465Z mov.b64 {%r959, %r960}, %rd419; 2026-02-21T08:54:28.9842590Z cvt.rn.f16x2.f32 %r961, %r960, %r959; 2026-02-21T08:54:28.9842892Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9842989Z cvt.u64.u32 %rd420, %r594; 2026-02-21T08:54:28.9843076Z cvt.u64.u32 %rd421, %r595; 2026-02-21T08:54:28.9843197Z shl.b64 %rd422, %rd421, 32; 2026-02-21T08:54:28.9843282Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T08:54:28.9843500Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9843571Z mov.b64 {%r962, %r963}, %rd423; 2026-02-21T08:54:28.9843731Z cvt.rn.f16x2.f32 %r964, %r963, %r962; 2026-02-21T08:54:28.9843909Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9843992Z cvt.u64.u32 %rd424, %r596; 2026-02-21T08:54:28.9844106Z cvt.u64.u32 %rd425, %r597; 2026-02-21T08:54:28.9844199Z shl.b64 %rd426, %rd425, 32; 2026-02-21T08:54:28.9844269Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T08:54:28.9844519Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9844602Z mov.b64 {%r965, %r966}, %rd427; 2026-02-21T08:54:28.9844740Z cvt.rn.f16x2.f32 %r967, %r966, %r965; 2026-02-21T08:54:28.9844961Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9845046Z cvt.u64.u32 %rd428, %r599; 2026-02-21T08:54:28.9845115Z cvt.u64.u32 %rd429, %r600; 2026-02-21T08:54:28.9845214Z shl.b64 %rd430, %rd429, 32; 2026-02-21T08:54:28.9845343Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T08:54:28.9845528Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9845647Z mov.b64 {%r968, %r969}, %rd431; 2026-02-21T08:54:28.9845765Z cvt.rn.f16x2.f32 %r970, %r969, %r968; 2026-02-21T08:54:28.9845932Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9846034Z cvt.u64.u32 %rd432, %r601; 2026-02-21T08:54:28.9846157Z cvt.u64.u32 %rd433, %r602; 2026-02-21T08:54:28.9846238Z shl.b64 %rd434, %rd433, 32; 2026-02-21T08:54:28.9846328Z or.b64 %rd435, %rd432, 
%rd434; 2026-02-21T08:54:28.9846511Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9846612Z mov.b64 {%r971, %r972}, %rd435; 2026-02-21T08:54:28.9846716Z cvt.rn.f16x2.f32 %r973, %r972, %r971; 2026-02-21T08:54:28.9846907Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9847028Z cvt.u64.u32 %rd436, %r603; 2026-02-21T08:54:28.9847108Z cvt.u64.u32 %rd437, %r604; 2026-02-21T08:54:28.9847189Z shl.b64 %rd438, %rd437, 32; 2026-02-21T08:54:28.9847288Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T08:54:28.9847490Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9847620Z mov.b64 {%r974, %r975}, %rd439; 2026-02-21T08:54:28.9847747Z cvt.rn.f16x2.f32 %r976, %r975, %r974; 2026-02-21T08:54:28.9847924Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9848007Z cvt.u64.u32 %rd440, %r605; 2026-02-21T08:54:28.9848087Z cvt.u64.u32 %rd441, %r606; 2026-02-21T08:54:28.9848206Z shl.b64 %rd442, %rd441, 32; 2026-02-21T08:54:28.9848299Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T08:54:28.9848492Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9848653Z mov.b64 {%r977, %r978}, %rd443; 2026-02-21T08:54:28.9848740Z cvt.rn.f16x2.f32 %r979, %r978, %r977; 2026-02-21T08:54:28.9848921Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9849037Z cvt.u64.u32 %rd444, %r607; 2026-02-21T08:54:28.9849140Z cvt.u64.u32 %rd445, %r608; 2026-02-21T08:54:28.9849223Z shl.b64 %rd446, %rd445, 32; 2026-02-21T08:54:28.9849306Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T08:54:28.9849514Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9849597Z mov.b64 {%r980, %r981}, %rd447; 2026-02-21T08:54:28.9849669Z cvt.rn.f16x2.f32 %r982, %r981, %r980; 2026-02-21T08:54:28.9849915Z .loc 1 56 52 
// c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9850028Z cvt.u64.u32 %rd448, %r609; 2026-02-21T08:54:28.9850110Z cvt.u64.u32 %rd449, %r610; 2026-02-21T08:54:28.9850225Z shl.b64 %rd450, %rd449, 32; 2026-02-21T08:54:28.9850311Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T08:54:28.9850482Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9850632Z mov.b64 {%r983, %r984}, %rd451; 2026-02-21T08:54:28.9850719Z cvt.rn.f16x2.f32 %r985, %r984, %r983; 2026-02-21T08:54:28.9850904Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9850985Z cvt.u64.u32 %rd452, %r611; 2026-02-21T08:54:28.9851097Z cvt.u64.u32 %rd453, %r612; 2026-02-21T08:54:28.9851175Z shl.b64 %rd454, %rd453, 32; 2026-02-21T08:54:28.9851272Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T08:54:28.9851499Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9851583Z mov.b64 {%r986, %r987}, %rd455; 2026-02-21T08:54:28.9851669Z cvt.rn.f16x2.f32 %r988, %r987, %r986; 2026-02-21T08:54:28.9851889Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9851959Z cvt.u64.u32 %rd456, %r613; 2026-02-21T08:54:28.9852078Z cvt.u64.u32 %rd457, %r614; 2026-02-21T08:54:28.9852173Z shl.b64 %rd458, %rd457, 32; 2026-02-21T08:54:28.9852286Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T08:54:28.9852468Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9852559Z mov.b64 {%r989, %r990}, %rd459; 2026-02-21T08:54:28.9852663Z cvt.rn.f16x2.f32 %r991, %r990, %r989; 2026-02-21T08:54:28.9852853Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9852951Z cvt.u64.u32 %rd460, %r616; 2026-02-21T08:54:28.9853061Z cvt.u64.u32 %rd461, %r617; 2026-02-21T08:54:28.9853145Z shl.b64 %rd462, %rd461, 32; 2026-02-21T08:54:28.9853235Z or.b64 %rd463, %rd460, 
%rd462; 2026-02-21T08:54:28.9853433Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9853530Z mov.b64 {%r992, %r993}, %rd463; 2026-02-21T08:54:28.9853629Z cvt.rn.f16x2.f32 %r994, %r993, %r992; 2026-02-21T08:54:28.9853806Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9853930Z cvt.u64.u32 %rd464, %r618; 2026-02-21T08:54:28.9854036Z cvt.u64.u32 %rd465, %r619; 2026-02-21T08:54:28.9854119Z shl.b64 %rd466, %rd465, 32; 2026-02-21T08:54:28.9854235Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T08:54:28.9854428Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9854511Z mov.b64 {%r995, %r996}, %rd467; 2026-02-21T08:54:28.9854637Z cvt.rn.f16x2.f32 %r997, %r996, %r995; 2026-02-21T08:54:28.9854847Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9854957Z cvt.u64.u32 %rd468, %r620; 2026-02-21T08:54:28.9855071Z cvt.u64.u32 %rd469, %r621; 2026-02-21T08:54:28.9855167Z shl.b64 %rd470, %rd469, 32; 2026-02-21T08:54:28.9855256Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T08:54:28.9855439Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9855556Z mov.b64 {%r998, %r999}, %rd471; 2026-02-21T08:54:28.9855651Z cvt.rn.f16x2.f32 %r1000, %r999, %r998; 2026-02-21T08:54:28.9855816Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9855953Z cvt.u64.u32 %rd472, %r622; 2026-02-21T08:54:28.9856042Z cvt.u64.u32 %rd473, %r623; 2026-02-21T08:54:28.9856126Z shl.b64 %rd474, %rd473, 32; 2026-02-21T08:54:28.9856242Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T08:54:28.9856445Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9856523Z mov.b64 {%r1001, %r1002}, %rd475; 2026-02-21T08:54:28.9856633Z cvt.rn.f16x2.f32 %r1003, %r1002, %r1001; 2026-02-21T08:54:28.9856865Z .loc 1 
56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9856949Z cvt.u64.u32 %rd476, %r624; 2026-02-21T08:54:28.9857029Z cvt.u64.u32 %rd477, %r625; 2026-02-21T08:54:28.9857145Z shl.b64 %rd478, %rd477, 32; 2026-02-21T08:54:28.9857213Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T08:54:28.9857410Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9857551Z mov.b64 {%r1004, %r1005}, %rd479; 2026-02-21T08:54:28.9857645Z cvt.rn.f16x2.f32 %r1006, %r1005, %r1004; 2026-02-21T08:54:28.9857826Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9857941Z cvt.u64.u32 %rd480, %r626; 2026-02-21T08:54:28.9858008Z cvt.u64.u32 %rd481, %r627; 2026-02-21T08:54:28.9858112Z shl.b64 %rd482, %rd481, 32; 2026-02-21T08:54:28.9858204Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T08:54:28.9858409Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9858517Z mov.b64 {%r1007, %r1008}, %rd483; 2026-02-21T08:54:28.9858609Z cvt.rn.f16x2.f32 %r1009, %r1008, %r1007; 2026-02-21T08:54:28.9858818Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9858913Z cvt.u64.u32 %rd484, %r628; 2026-02-21T08:54:28.9859006Z cvt.u64.u32 %rd485, %r629; 2026-02-21T08:54:28.9859118Z shl.b64 %rd486, %rd485, 32; 2026-02-21T08:54:28.9859200Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T08:54:28.9859383Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9859474Z mov.b64 {%r1010, %r1011}, %rd487; 2026-02-21T08:54:28.9859606Z cvt.rn.f16x2.f32 %r1012, %r1011, %r1010; 2026-02-21T08:54:28.9859799Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9859879Z cvt.u64.u32 %rd488, %r630; 2026-02-21T08:54:28.9859989Z cvt.u64.u32 %rd489, %r631; 2026-02-21T08:54:28.9860073Z shl.b64 %rd490, %rd489, 32; 2026-02-21T08:54:28.9860162Z 
or.b64 %rd491, %rd488, %rd490; 2026-02-21T08:54:28.9860405Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9860500Z mov.b64 {%r1013, %r1014}, %rd491; 2026-02-21T08:54:28.9860591Z cvt.rn.f16x2.f32 %r1015, %r1014, %r1013; 2026-02-21T08:54:28.9860802Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9860895Z cvt.u64.u32 %rd492, %r633; 2026-02-21T08:54:28.9860977Z cvt.u64.u32 %rd493, %r634; 2026-02-21T08:54:28.9861043Z shl.b64 %rd494, %rd493, 32; 2026-02-21T08:54:28.9861185Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T08:54:28.9861384Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9861465Z mov.b64 {%r1016, %r1017}, %rd495; 2026-02-21T08:54:28.9861600Z cvt.rn.f16x2.f32 %r1018, %r1017, %r1016; 2026-02-21T08:54:28.9861779Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9861846Z cvt.u64.u32 %rd496, %r635; 2026-02-21T08:54:28.9861983Z cvt.u64.u32 %rd497, %r636; 2026-02-21T08:54:28.9862068Z shl.b64 %rd498, %rd497, 32; 2026-02-21T08:54:28.9862157Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T08:54:28.9862342Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9862458Z mov.b64 {%r1019, %r1020}, %rd499; 2026-02-21T08:54:28.9862533Z cvt.rn.f16x2.f32 %r1021, %r1020, %r1019; 2026-02-21T08:54:28.9862755Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9862893Z cvt.u64.u32 %rd500, %r637; 2026-02-21T08:54:28.9862973Z cvt.u64.u32 %rd501, %r638; 2026-02-21T08:54:28.9863056Z shl.b64 %rd502, %rd501, 32; 2026-02-21T08:54:28.9863169Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T08:54:28.9863337Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9863435Z mov.b64 {%r1022, %r1023}, %rd503; 2026-02-21T08:54:28.9863578Z cvt.rn.f16x2.f32 %r1024, %r1023, %r1022; 
2026-02-21T08:54:28.9863755Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9863835Z cvt.u64.u32 %rd504, %r639; 2026-02-21T08:54:28.9863915Z cvt.u64.u32 %rd505, %r640; 2026-02-21T08:54:28.9864015Z shl.b64 %rd506, %rd505, 32; 2026-02-21T08:54:28.9864112Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T08:54:28.9864313Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9864426Z mov.b64 {%r1025, %r1026}, %rd507; 2026-02-21T08:54:28.9864518Z cvt.rn.f16x2.f32 %r1027, %r1026, %r1025; 2026-02-21T08:54:28.9864731Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9864859Z cvt.u64.u32 %rd508, %r641; 2026-02-21T08:54:28.9864955Z cvt.u64.u32 %rd509, %r642; 2026-02-21T08:54:28.9865061Z shl.b64 %rd510, %rd509, 32; 2026-02-21T08:54:28.9865143Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T08:54:28.9865358Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9865444Z mov.b64 {%r1028, %r1029}, %rd511; 2026-02-21T08:54:28.9865536Z cvt.rn.f16x2.f32 %r1030, %r1029, %r1028; 2026-02-21T08:54:28.9865803Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9865901Z cvt.u64.u32 %rd512, %r643; 2026-02-21T08:54:28.9865982Z cvt.u64.u32 %rd513, %r644; 2026-02-21T08:54:28.9866097Z shl.b64 %rd514, %rd513, 32; 2026-02-21T08:54:28.9866180Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T08:54:28.9866384Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9866614Z mov.b64 {%r1031, %r1032}, %rd515; 2026-02-21T08:54:28.9866723Z cvt.rn.f16x2.f32 %r1033, %r1032, %r1031; 2026-02-21T08:54:28.9866942Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9867031Z cvt.u64.u32 %rd516, %r645; 2026-02-21T08:54:28.9867152Z cvt.u64.u32 %rd517, %r646; 2026-02-21T08:54:28.9867247Z shl.b64 %rd518, %rd517, 
32; 2026-02-21T08:54:28.9867319Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T08:54:28.9867575Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9867662Z mov.b64 {%r1034, %r1035}, %rd519; 2026-02-21T08:54:28.9867755Z cvt.rn.f16x2.f32 %r1036, %r1035, %r1034; 2026-02-21T08:54:28.9868011Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9868094Z cvt.u64.u32 %rd520, %r647; 2026-02-21T08:54:28.9868164Z cvt.u64.u32 %rd521, %r648; 2026-02-21T08:54:28.9868314Z shl.b64 %rd522, %rd521, 32; 2026-02-21T08:54:28.9868398Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T08:54:28.9868589Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9868679Z mov.b64 {%r1037, %r1038}, %rd523; 2026-02-21T08:54:28.9868815Z cvt.rn.f16x2.f32 %r1039, %r1038, %r1037; 2026-02-21T08:54:28.9868991Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9869094Z cvt.u64.u32 %rd524, %r650; 2026-02-21T08:54:28.9869224Z cvt.u64.u32 %rd525, %r651; 2026-02-21T08:54:28.9869308Z shl.b64 %rd526, %rd525, 32; 2026-02-21T08:54:28.9869427Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T08:54:28.9869651Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9869725Z mov.b64 {%r1040, %r1041}, %rd527; 2026-02-21T08:54:28.9869835Z cvt.rn.f16x2.f32 %r1042, %r1041, %r1040; 2026-02-21T08:54:28.9870067Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9870158Z cvt.u64.u32 %rd528, %r652; 2026-02-21T08:54:28.9870243Z cvt.u64.u32 %rd529, %r653; 2026-02-21T08:54:28.9870330Z shl.b64 %rd530, %rd529, 32; 2026-02-21T08:54:28.9870434Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T08:54:28.9870640Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9870739Z mov.b64 {%r1043, %r1044}, %rd531; 2026-02-21T08:54:28.9870872Z 
cvt.rn.f16x2.f32 %r1045, %r1044, %r1043; 2026-02-21T08:54:28.9871062Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9871151Z cvt.u64.u32 %rd532, %r654; 2026-02-21T08:54:28.9871253Z cvt.u64.u32 %rd533, %r655; 2026-02-21T08:54:28.9871354Z shl.b64 %rd534, %rd533, 32; 2026-02-21T08:54:28.9871450Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T08:54:28.9871666Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9871785Z mov.b64 {%r1046, %r1047}, %rd535; 2026-02-21T08:54:28.9871883Z cvt.rn.f16x2.f32 %r1048, %r1047, %r1046; 2026-02-21T08:54:28.9872071Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9872193Z cvt.u64.u32 %rd536, %r656; 2026-02-21T08:54:28.9872298Z cvt.u64.u32 %rd537, %r657; 2026-02-21T08:54:28.9872384Z shl.b64 %rd538, %rd537, 32; 2026-02-21T08:54:28.9872499Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T08:54:28.9872692Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9872778Z mov.b64 {%r1049, %r1050}, %rd539; 2026-02-21T08:54:28.9872905Z cvt.rn.f16x2.f32 %r1051, %r1050, %r1049; 2026-02-21T08:54:28.9873121Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9873208Z cvt.u64.u32 %rd540, %r658; 2026-02-21T08:54:28.9873295Z cvt.u64.u32 %rd541, %r659; 2026-02-21T08:54:28.9873414Z shl.b64 %rd542, %rd541, 32; 2026-02-21T08:54:28.9873497Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T08:54:28.9873702Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9873855Z mov.b64 {%r1052, %r1053}, %rd543; 2026-02-21T08:54:28.9873946Z cvt.rn.f16x2.f32 %r1054, %r1053, %r1052; 2026-02-21T08:54:28.9874125Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9874237Z cvt.u64.u32 %rd544, %r660; 2026-02-21T08:54:28.9874322Z cvt.u64.u32 %rd545, %r661; 
2026-02-21T08:54:28.9874419Z shl.b64 %rd546, %rd545, 32; 2026-02-21T08:54:28.9874521Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T08:54:28.9874760Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9874846Z mov.b64 {%r1055, %r1056}, %rd547; 2026-02-21T08:54:28.9874934Z cvt.rn.f16x2.f32 %r1057, %r1056, %r1055; 2026-02-21T08:54:28.9875156Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9875225Z cvt.u64.u32 %rd548, %r662; 2026-02-21T08:54:28.9875320Z cvt.u64.u32 %rd549, %r663; 2026-02-21T08:54:28.9875445Z shl.b64 %rd550, %rd549, 32; 2026-02-21T08:54:28.9875525Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T08:54:28.9875704Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9875826Z mov.b64 {%r1058, %r1059}, %rd551; 2026-02-21T08:54:28.9875928Z cvt.rn.f16x2.f32 %r1060, %r1059, %r1058; 2026-02-21T08:54:28.9876119Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9876212Z cvt.u64.u32 %rd552, %r664; 2026-02-21T08:54:28.9876324Z cvt.u64.u32 %rd553, %r665; 2026-02-21T08:54:28.9876408Z shl.b64 %rd554, %rd553, 32; 2026-02-21T08:54:28.9876496Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T08:54:28.9876697Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9876800Z mov.b64 {%r1061, %r1062}, %rd555; 2026-02-21T08:54:28.9876903Z cvt.rn.f16x2.f32 %r1063, %r1062, %r1061; 2026-02-21T08:54:28.9877112Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9877205Z cvt.u64.u32 %rd556, %r667; 2026-02-21T08:54:28.9877286Z cvt.u64.u32 %rd557, %r668; 2026-02-21T08:54:28.9877368Z shl.b64 %rd558, %rd557, 32; 2026-02-21T08:54:28.9877486Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T08:54:28.9877683Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9877766Z mov.b64 {%r1064, 
%r1065}, %rd559; 2026-02-21T08:54:28.9877893Z cvt.rn.f16x2.f32 %r1066, %r1065, %r1064; 2026-02-21T08:54:28.9878101Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9878183Z cvt.u64.u32 %rd560, %r669; 2026-02-21T08:54:28.9878298Z cvt.u64.u32 %rd561, %r670; 2026-02-21T08:54:28.9878393Z shl.b64 %rd562, %rd561, 32; 2026-02-21T08:54:28.9878476Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T08:54:28.9878693Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9878776Z mov.b64 {%r1067, %r1068}, %rd563; 2026-02-21T08:54:28.9878865Z cvt.rn.f16x2.f32 %r1069, %r1068, %r1067; 2026-02-21T08:54:28.9879037Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9879183Z cvt.u64.u32 %rd564, %r671; 2026-02-21T08:54:28.9879274Z cvt.u64.u32 %rd565, %r672; 2026-02-21T08:54:28.9879356Z shl.b64 %rd566, %rd565, 32; 2026-02-21T08:54:28.9879475Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T08:54:28.9879657Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9879725Z mov.b64 {%r1070, %r1071}, %rd567; 2026-02-21T08:54:28.9879872Z cvt.rn.f16x2.f32 %r1072, %r1071, %r1070; 2026-02-21T08:54:28.9880090Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9880174Z cvt.u64.u32 %rd568, %r673; 2026-02-21T08:54:28.9880290Z cvt.u64.u32 %rd569, %r674; 2026-02-21T08:54:28.9880371Z shl.b64 %rd570, %rd569, 32; 2026-02-21T08:54:28.9880440Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T08:54:28.9880637Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9880801Z mov.b64 {%r1073, %r1074}, %rd571; 2026-02-21T08:54:28.9880888Z cvt.rn.f16x2.f32 %r1075, %r1074, %r1073; 2026-02-21T08:54:28.9881067Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9881181Z cvt.u64.u32 %rd572, %r675; 
2026-02-21T08:54:28.9881247Z cvt.u64.u32 %rd573, %r676; 2026-02-21T08:54:28.9881366Z shl.b64 %rd574, %rd573, 32; 2026-02-21T08:54:28.9881495Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T08:54:28.9881676Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9881758Z mov.b64 {%r1076, %r1077}, %rd575; 2026-02-21T08:54:28.9881850Z cvt.rn.f16x2.f32 %r1078, %r1077, %r1076; 2026-02-21T08:54:28.9882049Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9882161Z cvt.u64.u32 %rd576, %r677; 2026-02-21T08:54:28.9882277Z cvt.u64.u32 %rd577, %r678; 2026-02-21T08:54:28.9882391Z shl.b64 %rd578, %rd577, 32; 2026-02-21T08:54:28.9882475Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T08:54:28.9882660Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9882769Z mov.b64 {%r1079, %r1080}, %rd579; 2026-02-21T08:54:28.9882875Z cvt.rn.f16x2.f32 %r1081, %r1080, %r1079; 2026-02-21T08:54:28.9883070Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9883182Z cvt.u64.u32 %rd580, %r679; 2026-02-21T08:54:28.9883263Z cvt.u64.u32 %rd581, %r680; 2026-02-21T08:54:28.9883344Z shl.b64 %rd582, %rd581, 32; 2026-02-21T08:54:28.9883434Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T08:54:28.9883655Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9883752Z mov.b64 {%r1082, %r1083}, %rd583; 2026-02-21T08:54:28.9883840Z cvt.rn.f16x2.f32 %r1084, %r1083, %r1082; 2026-02-21T08:54:28.9884053Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9884143Z cvt.u64.u32 %rd584, %r681; 2026-02-21T08:54:28.9884223Z cvt.u64.u32 %rd585, %r682; 2026-02-21T08:54:28.9884364Z shl.b64 %rd586, %rd585, 32; 2026-02-21T08:54:28.9884458Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T08:54:28.9884640Z .loc 1 58 27 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9884749Z mov.b64 {%r1085, %r1086}, %rd587; 2026-02-21T08:54:28.9884883Z cvt.rn.f16x2.f32 %r1087, %r1086, %r1085; 2026-02-21T08:54:28.9885063Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9885130Z cvt.u64.u32 %rd588, %r684; 2026-02-21T08:54:28.9885277Z cvt.u64.u32 %rd589, %r685; 2026-02-21T08:54:28.9885358Z shl.b64 %rd590, %rd589, 32; 2026-02-21T08:54:28.9885441Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T08:54:28.9885659Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9885743Z mov.b64 {%r1088, %r1089}, %rd591; 2026-02-21T08:54:28.9885818Z cvt.rn.f16x2.f32 %r1090, %r1089, %r1088; 2026-02-21T08:54:28.9886063Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9886146Z cvt.u64.u32 %rd592, %r686; 2026-02-21T08:54:28.9886238Z cvt.u64.u32 %rd593, %r687; 2026-02-21T08:54:28.9886346Z shl.b64 %rd594, %rd593, 32; 2026-02-21T08:54:28.9886457Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T08:54:28.9886623Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9886721Z mov.b64 {%r1091, %r1092}, %rd595; 2026-02-21T08:54:28.9886861Z cvt.rn.f16x2.f32 %r1093, %r1092, %r1091; 2026-02-21T08:54:28.9887044Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9887151Z cvt.u64.u32 %rd596, %r688; 2026-02-21T08:54:28.9887263Z cvt.u64.u32 %rd597, %r689; 2026-02-21T08:54:28.9887330Z shl.b64 %rd598, %rd597, 32; 2026-02-21T08:54:28.9887425Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T08:54:28.9887657Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9887745Z mov.b64 {%r1094, %r1095}, %rd599; 2026-02-21T08:54:28.9887832Z cvt.rn.f16x2.f32 %r1096, %r1095, %r1094; 2026-02-21T08:54:28.9888013Z .loc 1 56 52 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9888113Z cvt.u64.u32 %rd600, %r690; 2026-02-21T08:54:28.9888209Z cvt.u64.u32 %rd601, %r691; 2026-02-21T08:54:28.9888313Z shl.b64 %rd602, %rd601, 32; 2026-02-21T08:54:28.9888428Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T08:54:28.9888611Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9888724Z mov.b64 {%r1097, %r1098}, %rd603; 2026-02-21T08:54:28.9888832Z cvt.rn.f16x2.f32 %r1099, %r1098, %r1097; 2026-02-21T08:54:28.9889038Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9889133Z cvt.u64.u32 %rd604, %r692; 2026-02-21T08:54:28.9889215Z cvt.u64.u32 %rd605, %r693; 2026-02-21T08:54:28.9889326Z shl.b64 %rd606, %rd605, 32; 2026-02-21T08:54:28.9889405Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T08:54:28.9889585Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9889707Z mov.b64 {%r1100, %r1101}, %rd607; 2026-02-21T08:54:28.9889809Z cvt.rn.f16x2.f32 %r1102, %r1101, %r1100; 2026-02-21T08:54:28.9889989Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9890100Z cvt.u64.u32 %rd608, %r694; 2026-02-21T08:54:28.9890183Z cvt.u64.u32 %rd609, %r695; 2026-02-21T08:54:28.9890265Z shl.b64 %rd610, %rd609, 32; 2026-02-21T08:54:28.9890395Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T08:54:28.9890595Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9890678Z mov.b64 {%r1103, %r1104}, %rd611; 2026-02-21T08:54:28.9890794Z cvt.rn.f16x2.f32 %r1105, %r1104, %r1103; 2026-02-21T08:54:28.9891006Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9891098Z cvt.u64.u32 %rd612, %r696; 2026-02-21T08:54:28.9911331Z cvt.u64.u32 %rd613, %r697; 2026-02-21T08:54:28.9911516Z shl.b64 %rd614, %rd613, 32; 2026-02-21T08:54:28.9911610Z or.b64 
%rd615, %rd612, %rd614; 2026-02-21T08:54:28.9911864Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9911962Z mov.b64 {%r1106, %r1107}, %rd615; 2026-02-21T08:54:28.9912059Z cvt.rn.f16x2.f32 %r1108, %r1107, %r1106; 2026-02-21T08:54:28.9912259Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9912340Z cvt.u64.u32 %rd616, %r698; 2026-02-21T08:54:28.9912404Z cvt.u64.u32 %rd617, %r699; 2026-02-21T08:54:28.9912469Z shl.b64 %rd618, %rd617, 32; 2026-02-21T08:54:28.9912544Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T08:54:28.9912735Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9912956Z mov.b64 {%r1109, %r1110}, %rd619; 2026-02-21T08:54:28.9913053Z cvt.rn.f16x2.f32 %r1111, %r1110, %r1109; 2026-02-21T08:54:28.9913240Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9913305Z cvt.u64.u32 %rd620, %r701; 2026-02-21T08:54:28.9913368Z cvt.u64.u32 %rd621, %r702; 2026-02-21T08:54:28.9913440Z shl.b64 %rd622, %rd621, 32; 2026-02-21T08:54:28.9913506Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T08:54:28.9913679Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9913814Z mov.b64 {%r1112, %r1113}, %rd623; 2026-02-21T08:54:28.9913887Z cvt.rn.f16x2.f32 %r1114, %r1113, %r1112; 2026-02-21T08:54:28.9914061Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9914133Z cvt.u64.u32 %rd624, %r703; 2026-02-21T08:54:28.9914207Z cvt.u64.u32 %rd625, %r704; 2026-02-21T08:54:28.9914277Z shl.b64 %rd626, %rd625, 32; 2026-02-21T08:54:28.9914342Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T08:54:28.9914521Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9914589Z mov.b64 {%r1115, %r1116}, %rd627; 2026-02-21T08:54:28.9914663Z cvt.rn.f16x2.f32 %r1117, %r1116, %r1115; 
2026-02-21T08:54:28.9914875Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9914977Z cvt.u64.u32 %rd628, %r705; 2026-02-21T08:54:28.9915046Z cvt.u64.u32 %rd629, %r706; 2026-02-21T08:54:28.9915110Z shl.b64 %rd630, %rd629, 32; 2026-02-21T08:54:28.9915185Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T08:54:28.9915355Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9915425Z mov.b64 {%r1118, %r1119}, %rd631; 2026-02-21T08:54:28.9915509Z cvt.rn.f16x2.f32 %r1120, %r1119, %r1118; 2026-02-21T08:54:28.9915677Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9915742Z cvt.u64.u32 %rd632, %r707; 2026-02-21T08:54:28.9915814Z cvt.u64.u32 %rd633, %r708; 2026-02-21T08:54:28.9915875Z shl.b64 %rd634, %rd633, 32; 2026-02-21T08:54:28.9915940Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T08:54:28.9916105Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9916180Z mov.b64 {%r1121, %r1122}, %rd635; 2026-02-21T08:54:28.9916251Z cvt.rn.f16x2.f32 %r1123, %r1122, %r1121; 2026-02-21T08:54:28.9916421Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9916499Z cvt.u64.u32 %rd636, %r709; 2026-02-21T08:54:28.9916597Z cvt.u64.u32 %rd637, %r710; 2026-02-21T08:54:28.9916658Z shl.b64 %rd638, %rd637, 32; 2026-02-21T08:54:28.9916727Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T08:54:28.9916895Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9916960Z mov.b64 {%r1124, %r1125}, %rd639; 2026-02-21T08:54:28.9917029Z cvt.rn.f16x2.f32 %r1126, %r1125, %r1124; 2026-02-21T08:54:28.9917200Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9917261Z cvt.u64.u32 %rd640, %r711; 2026-02-21T08:54:28.9917321Z cvt.u64.u32 %rd641, %r712; 2026-02-21T08:54:28.9917392Z shl.b64 %rd642, %rd641, 
32; 2026-02-21T08:54:28.9917462Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T08:54:28.9917628Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9917702Z mov.b64 {%r1127, %r1128}, %rd643; 2026-02-21T08:54:28.9917773Z cvt.rn.f16x2.f32 %r1129, %r1128, %r1127; 2026-02-21T08:54:28.9917941Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9918029Z cvt.u64.u32 %rd644, %r713; 2026-02-21T08:54:28.9918101Z cvt.u64.u32 %rd645, %r714; 2026-02-21T08:54:28.9918163Z shl.b64 %rd646, %rd645, 32; 2026-02-21T08:54:28.9918227Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T08:54:28.9918409Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9918471Z mov.b64 {%r1130, %r1131}, %rd647; 2026-02-21T08:54:28.9918539Z cvt.rn.f16x2.f32 %r1132, %r1131, %r1130; 2026-02-21T08:54:28.9918707Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9918802Z cvt.u64.u32 %rd648, %r715; 2026-02-21T08:54:28.9918860Z cvt.u64.u32 %rd649, %r716; 2026-02-21T08:54:28.9918919Z shl.b64 %rd650, %rd649, 32; 2026-02-21T08:54:28.9918989Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T08:54:28.9919146Z .loc 1 58 27 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:58:27 2026-02-21T08:54:28.9919206Z mov.b64 {%r1133, %r1134}, %rd651; 2026-02-21T08:54:28.9919281Z cvt.rn.f16x2.f32 %r1135, %r1134, %r1133; 2026-02-21T08:54:28.9919441Z .loc 1 59 45 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:59:45 2026-02-21T08:54:28.9919539Z st.shared.v4.b32 [%r741], {%r754, %r757, %r760, %r763}; 2026-02-21T08:54:28.9919651Z st.shared.v4.b32 [%r741+16384], {%r850, %r853, %r856, %r859}; 2026-02-21T08:54:28.9919775Z st.shared.v4.b32 [%r741+32768], {%r946, %r949, %r952, %r955}; 2026-02-21T08:54:28.9919886Z st.shared.v4.b32 [%r741+49152], {%r1042, %r1045, %r1048, %r1051}; 2026-02-21T08:54:28.9919987Z st.shared.v4.b32 [%r740], {%r766, %r769, 
%r772, %r775}; 2026-02-21T08:54:28.9920081Z st.shared.v4.b32 [%r740+16384], {%r862, %r865, %r868, %r871}; 2026-02-21T08:54:28.9920176Z st.shared.v4.b32 [%r740+32768], {%r958, %r961, %r964, %r967}; 2026-02-21T08:54:28.9920280Z st.shared.v4.b32 [%r740+49152], {%r1054, %r1057, %r1060, %r1063}; 2026-02-21T08:54:28.9920377Z st.shared.v4.b32 [%r738], {%r778, %r781, %r784, %r787}; 2026-02-21T08:54:28.9920469Z st.shared.v4.b32 [%r738+16384], {%r874, %r877, %r880, %r883}; 2026-02-21T08:54:28.9920560Z st.shared.v4.b32 [%r738+32768], {%r970, %r973, %r976, %r979}; 2026-02-21T08:54:28.9920667Z st.shared.v4.b32 [%r738+49152], {%r1066, %r1069, %r1072, %r1075}; 2026-02-21T08:54:28.9920752Z st.shared.v4.b32 [%r736], {%r790, %r793, %r796, %r799}; 2026-02-21T08:54:28.9920843Z st.shared.v4.b32 [%r736+16384], {%r886, %r889, %r892, %r895}; 2026-02-21T08:54:28.9920944Z st.shared.v4.b32 [%r736+32768], {%r982, %r985, %r988, %r991}; 2026-02-21T08:54:28.9921046Z st.shared.v4.b32 [%r736+49152], {%r1078, %r1081, %r1084, %r1087}; 2026-02-21T08:54:28.9921133Z st.shared.v4.b32 [%r734], {%r802, %r805, %r808, %r811}; 2026-02-21T08:54:28.9921254Z st.shared.v4.b32 [%r734+16384], {%r898, %r901, %r904, %r907}; 2026-02-21T08:54:28.9921364Z st.shared.v4.b32 [%r734+32768], {%r994, %r997, %r1000, %r1003}; 2026-02-21T08:54:28.9921463Z st.shared.v4.b32 [%r734+49152], {%r1090, %r1093, %r1096, %r1099}; 2026-02-21T08:54:28.9921549Z st.shared.v4.b32 [%r732], {%r814, %r817, %r820, %r823}; 2026-02-21T08:54:28.9921649Z st.shared.v4.b32 [%r732+16384], {%r910, %r913, %r916, %r919}; 2026-02-21T08:54:28.9921747Z st.shared.v4.b32 [%r732+32768], {%r1006, %r1009, %r1012, %r1015}; 2026-02-21T08:54:28.9921844Z st.shared.v4.b32 [%r732+49152], {%r1102, %r1105, %r1108, %r1111}; 2026-02-21T08:54:28.9921937Z st.shared.v4.b32 [%r730], {%r826, %r829, %r832, %r835}; 2026-02-21T08:54:28.9922029Z st.shared.v4.b32 [%r730+16384], {%r922, %r925, %r928, %r931}; 2026-02-21T08:54:28.9922127Z st.shared.v4.b32 [%r730+32768], {%r1018, %r1021, 
%r1024, %r1027}; 2026-02-21T08:54:28.9922229Z st.shared.v4.b32 [%r730+49152], {%r1114, %r1117, %r1120, %r1123}; 2026-02-21T08:54:28.9942926Z st.shared.v4.b32 [%r728], {%r838, %r841, %r844, %r847}; 2026-02-21T08:54:28.9943065Z st.shared.v4.b32 [%r728+16384], {%r934, %r937, %r940, %r943}; 2026-02-21T08:54:28.9943198Z st.shared.v4.b32 [%r728+32768], {%r1030, %r1033, %r1036, %r1039}; 2026-02-21T08:54:28.9943307Z st.shared.v4.b32 [%r728+49152], {%r1126, %r1129, %r1132, %r1135}; 2026-02-21T08:54:28.9943368Z // begin inline asm 2026-02-21T08:54:28.9943453Z fence.proxy.async.shared::cta; 2026-02-21T08:54:28.9943517Z // end inline asm 2026-02-21T08:54:28.9943572Z bar.sync 0, 128; 2026-02-21T08:54:28.9943640Z elect.sync %r1136|%p130, -1; 2026-02-21T08:54:28.9943708Z and.pred %p128, %p129, %p130; 2026-02-21T08:54:28.9943771Z shl.b32 %r1137, %r750, 14; 2026-02-21T08:54:28.9943856Z add.s32 %r720, %r130, %r1137; 2026-02-21T08:54:28.9943914Z shl.b32 %r1138, %r750, 6; 2026-02-21T08:54:28.9943980Z or.b32 %r718, %r1138, %r748; 2026-02-21T08:54:28.9944036Z // begin inline asm 2026-02-21T08:54:28.9944227Z @%p128 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd139, {%r718, %r719}], [%r720]; 2026-02-21T08:54:28.9944289Z // end inline asm 2026-02-21T08:54:28.9944356Z cp.async.bulk.commit_group; 2026-02-21T08:54:28.9944427Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:28.9944484Z bar.sync 0, 128; 2026-02-21T08:54:28.9944578Z $L__BB0_14: // %._crit_edge 2026-02-21T08:54:28.9944784Z .loc 1 33 4 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:33:4 2026-02-21T08:54:28.9944842Z bar.sync 0, 128; 2026-02-21T08:54:28.9944907Z // begin inline asm 2026-02-21T08:54:28.9945029Z @%p48 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1139, 256; 2026-02-21T08:54:28.9945112Z // end inline asm 2026-02-21T08:54:28.9945203Z st.shared.b32 [global_smem+98456], 50529027; 2026-02-21T08:54:28.9945259Z barrier.sync 1; 2026-02-21T08:54:28.9945341Z $L__BB0_15: // %common.ret 
2026-02-21T08:54:28.9945504Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9945563Z ret; 2026-02-21T08:54:28.9945660Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:54:28.9945721Z mov.b32 %r20, global_smem; 2026-02-21T08:54:28.9945787Z add.s32 %r21, %r20, %r3; 2026-02-21T08:54:28.9945846Z add.s32 %r74, %r20, 65536; 2026-02-21T08:54:28.9945907Z bfe.u32 %r75, %r74, 4, 14; 2026-02-21T08:54:28.9945970Z cvt.u64.u32 %rd48, %r75; 2026-02-21T08:54:28.9946038Z or.b64 %rd30, %rd48, 4611686293372403712; 2026-02-21T08:54:28.9946093Z bfe.u32 %r76, %r20, 4, 14; 2026-02-21T08:54:28.9946149Z cvt.u64.u32 %rd49, %r76; 2026-02-21T08:54:28.9946226Z or.b64 %rd31, %rd49, 4611686293439512576; 2026-02-21T08:54:28.9946282Z add.s32 %r77, %r20, 65568; 2026-02-21T08:54:28.9946337Z bfe.u32 %r78, %r77, 4, 14; 2026-02-21T08:54:28.9946401Z add.s32 %r79, %r20, 32; 2026-02-21T08:54:28.9946455Z bfe.u32 %r80, %r79, 4, 14; 2026-02-21T08:54:28.9946510Z bra.uni $L__BB0_2; 2026-02-21T08:54:28.9946636Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9946809Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9946890Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:28.9946945Z barrier.sync 1; 2026-02-21T08:54:28.9947009Z barrier.sync 1; 2026-02-21T08:54:28.9947086Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:28.9947168Z $L__BB0_2: // %.preheader 2026-02-21T08:54:28.9947266Z // =>This Loop Header: Depth=1 2026-02-21T08:54:28.9947356Z // Child Loop BB0_9 Depth 2 2026-02-21T08:54:28.9947442Z // Child Loop BB0_6 Depth 2 2026-02-21T08:54:28.9947604Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19 2026-02-21T08:54:28.9947686Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:54:28.9947743Z barrier.sync 1; 2026-02-21T08:54:28.9947809Z ld.shared.b8 %r19, [%r21+98452]; 2026-02-21T08:54:28.9947881Z setp.gt.u32 %p2, %r19, 3; 2026-02-21T08:54:28.9947937Z @%p2 bra 
$L__BB0_4; 2026-02-21T08:54:28.9948041Z // %bb.3: // %.preheader 2026-02-21T08:54:28.9948140Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9948203Z $L_brx_0: .branchtargets 2026-02-21T08:54:28.9948258Z $L__BB0_5, 2026-02-21T08:54:28.9948309Z $L__BB0_8, 2026-02-21T08:54:28.9948370Z $L__BB0_11, 2026-02-21T08:54:28.9948420Z $L__BB0_15; 2026-02-21T08:54:28.9948478Z brx.idx %r19, $L_brx_0; 2026-02-21T08:54:28.9948562Z $L__BB0_5: // %.peel.next 2026-02-21T08:54:28.9948675Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9948836Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9948912Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:28.9948990Z ld.shared.b32 %r55, [global_smem+98304]; 2026-02-21T08:54:28.9949044Z barrier.sync 1; 2026-02-21T08:54:28.9949101Z cvt.u64.u32 %rd50, %r78; 2026-02-21T08:54:28.9949175Z or.b64 %rd66, %rd50, 4611686293372403712; 2026-02-21T08:54:28.9949232Z cvt.u64.u32 %rd51, %r80; 2026-02-21T08:54:28.9949296Z or.b64 %rd67, %rd51, 4611686293439512576; 2026-02-21T08:54:28.9949358Z add.s32 %r81, %r20, 65600; 2026-02-21T08:54:28.9949414Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T08:54:28.9949469Z cvt.u64.u32 %rd52, %r82; 2026-02-21T08:54:28.9949530Z or.b64 %rd68, %rd52, 4611686293372403712; 2026-02-21T08:54:28.9949632Z add.s32 %r83, %r20, 64; 2026-02-21T08:54:28.9949691Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T08:54:28.9949745Z cvt.u64.u32 %rd53, %r84; 2026-02-21T08:54:28.9949814Z or.b64 %rd69, %rd53, 4611686293439512576; 2026-02-21T08:54:28.9949875Z add.s32 %r85, %r20, 65632; 2026-02-21T08:54:28.9949930Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T08:54:28.9949991Z cvt.u64.u32 %rd54, %r86; 2026-02-21T08:54:28.9950061Z or.b64 %rd70, %rd54, 4611686293372403712; 2026-02-21T08:54:28.9950118Z add.s32 %r87, %r20, 96; 2026-02-21T08:54:28.9950178Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T08:54:28.9950244Z cvt.u64.u32 %rd55, %r88; 2026-02-21T08:54:28.9950306Z or.b64 %rd71, %rd55, 4611686293439512576; 
2026-02-21T08:54:28.9950362Z add.s32 %r89, %r20, 81920; 2026-02-21T08:54:28.9950418Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T08:54:28.9950483Z cvt.u64.u32 %rd56, %r90; 2026-02-21T08:54:28.9950546Z or.b64 %rd72, %rd56, 4611686293372403712; 2026-02-21T08:54:28.9950600Z add.s32 %r91, %r20, 32768; 2026-02-21T08:54:28.9950663Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T08:54:28.9950719Z cvt.u64.u32 %rd57, %r92; 2026-02-21T08:54:28.9950779Z or.b64 %rd73, %rd57, 4611686293439512576; 2026-02-21T08:54:28.9950839Z add.s32 %r93, %r20, 81952; 2026-02-21T08:54:28.9950894Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T08:54:28.9971554Z cvt.u64.u32 %rd58, %r94; 2026-02-21T08:54:28.9971654Z or.b64 %rd74, %rd58, 4611686293372403712; 2026-02-21T08:54:28.9971709Z add.s32 %r95, %r20, 32800; 2026-02-21T08:54:28.9971758Z bfe.u32 %r96, %r95, 4, 14; 2026-02-21T08:54:28.9971811Z cvt.u64.u32 %rd59, %r96; 2026-02-21T08:54:28.9971872Z or.b64 %rd75, %rd59, 4611686293439512576; 2026-02-21T08:54:28.9971923Z add.s32 %r97, %r20, 81984; 2026-02-21T08:54:28.9971972Z bfe.u32 %r98, %r97, 4, 14; 2026-02-21T08:54:28.9972024Z cvt.u64.u32 %rd60, %r98; 2026-02-21T08:54:28.9972085Z or.b64 %rd76, %rd60, 4611686293372403712; 2026-02-21T08:54:28.9972136Z add.s32 %r99, %r20, 32832; 2026-02-21T08:54:28.9972191Z bfe.u32 %r100, %r99, 4, 14; 2026-02-21T08:54:28.9972249Z cvt.u64.u32 %rd61, %r100; 2026-02-21T08:54:28.9972313Z or.b64 %rd77, %rd61, 4611686293439512576; 2026-02-21T08:54:28.9972368Z add.s32 %r101, %r20, 82016; 2026-02-21T08:54:28.9972423Z bfe.u32 %r102, %r101, 4, 14; 2026-02-21T08:54:28.9972481Z cvt.u64.u32 %rd62, %r102; 2026-02-21T08:54:28.9972540Z or.b64 %rd78, %rd62, 4611686293372403712; 2026-02-21T08:54:28.9972592Z add.s32 %r103, %r20, 32864; 2026-02-21T08:54:28.9972651Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T08:54:28.9972703Z cvt.u64.u32 %rd63, %r104; 2026-02-21T08:54:28.9972789Z or.b64 %rd79, %rd63, 4611686293439512576; 2026-02-21T08:54:28.9972952Z .loc 1 0 0 // 
c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9973010Z bar.warp.sync -1; 2026-02-21T08:54:28.9973061Z add.s32 %r53, %r20, 98432; 2026-02-21T08:54:28.9973110Z mov.b32 %r1140, 0; 2026-02-21T08:54:28.9973167Z // begin inline asm 2026-02-21T08:54:28.9973215Z 2026-02-21T08:54:28.9973262Z { 2026-02-21T08:54:28.9973322Z .reg .pred complete; 2026-02-21T08:54:28.9973400Z waitLoop: 2026-02-21T08:54:28.9973518Z mbarrier.try_wait.parity.shared.b64 complete, [%r53], %r1140; 2026-02-21T08:54:28.9973578Z @!complete bra.uni waitLoop; 2026-02-21T08:54:28.9973626Z } 2026-02-21T08:54:28.9973632Z 2026-02-21T08:54:28.9973686Z // end inline asm 2026-02-21T08:54:28.9973842Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9973904Z elect.sync %r105|%p11, -1; 2026-02-21T08:54:28.9973957Z mov.b32 %r56, 138412048; 2026-02-21T08:54:28.9974010Z mov.pred %p10, 0; 2026-02-21T08:54:28.9974061Z // begin inline asm 2026-02-21T08:54:28.9974201Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd30, %rd31, %r56, %p10; 2026-02-21T08:54:28.9974251Z // end inline asm 2026-02-21T08:54:28.9974304Z mov.pred %p12, -1; 2026-02-21T08:54:28.9974358Z // begin inline asm 2026-02-21T08:54:28.9974516Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd66, %rd67, %r56, %p12; 2026-02-21T08:54:28.9974570Z // end inline asm 2026-02-21T08:54:28.9974623Z // begin inline asm 2026-02-21T08:54:28.9975066Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd68, %rd69, %r56, %p12; 2026-02-21T08:54:28.9975117Z // end inline asm 2026-02-21T08:54:28.9975169Z // begin inline asm 2026-02-21T08:54:28.9975292Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd70, %rd71, %r56, %p12; 2026-02-21T08:54:28.9975341Z // end inline asm 2026-02-21T08:54:28.9975391Z // begin inline asm 2026-02-21T08:54:28.9975514Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd72, %rd73, %r56, %p12; 2026-02-21T08:54:28.9975563Z // end 
inline asm 2026-02-21T08:54:28.9975613Z // begin inline asm 2026-02-21T08:54:28.9975733Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd74, %rd75, %r56, %p12; 2026-02-21T08:54:28.9975782Z // end inline asm 2026-02-21T08:54:28.9975832Z // begin inline asm 2026-02-21T08:54:28.9975952Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd76, %rd77, %r56, %p12; 2026-02-21T08:54:28.9976002Z // end inline asm 2026-02-21T08:54:28.9976051Z // begin inline asm 2026-02-21T08:54:28.9976166Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd78, %rd79, %r56, %p12; 2026-02-21T08:54:28.9976220Z // end inline asm 2026-02-21T08:54:28.9976305Z add.s32 %r106, %r20, 98416; 2026-02-21T08:54:28.9976359Z cvt.u64.u32 %rd46, %r106; 2026-02-21T08:54:28.9976412Z // begin inline asm 2026-02-21T08:54:28.9976536Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:28.9976584Z // end inline asm 2026-02-21T08:54:28.9976641Z add.s32 %r107, %r20, 98448; 2026-02-21T08:54:28.9976697Z cvt.u64.u32 %rd47, %r107; 2026-02-21T08:54:28.9976747Z // begin inline asm 2026-02-21T08:54:28.9976862Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:28.9976914Z // end inline asm 2026-02-21T08:54:28.9976964Z mov.b32 %r1141, 1; 2026-02-21T08:54:28.9977058Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:28.9977151Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:28.9977305Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9977363Z bar.warp.sync -1; 2026-02-21T08:54:28.9977415Z // begin inline asm 2026-02-21T08:54:28.9977459Z 2026-02-21T08:54:28.9977502Z { 2026-02-21T08:54:28.9977558Z .reg .pred complete; 2026-02-21T08:54:28.9977612Z waitLoop: 2026-02-21T08:54:28.9977756Z mbarrier.try_wait.parity.shared.b64 complete, [%r53], %r1141; 2026-02-21T08:54:28.9977816Z @!complete bra.uni waitLoop; 2026-02-21T08:54:28.9977865Z } 2026-02-21T08:54:28.9977868Z 
2026-02-21T08:54:28.9977918Z // end inline asm 2026-02-21T08:54:28.9978076Z .loc 1 56 52 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:56:52 2026-02-21T08:54:28.9978137Z setp.eq.b32 %p46, %r1140, 1792; 2026-02-21T08:54:28.9978200Z elect.sync %r127|%p29, -1; 2026-02-21T08:54:28.9978278Z // begin inline asm 2026-02-21T08:54:28.9978400Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd30, %rd31, %r56, %p12; 2026-02-21T08:54:28.9978453Z // end inline asm 2026-02-21T08:54:28.9978502Z // begin inline asm 2026-02-21T08:54:28.9978621Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd66, %rd67, %r56, %p12; 2026-02-21T08:54:28.9978675Z // end inline asm 2026-02-21T08:54:28.9978724Z // begin inline asm 2026-02-21T08:54:28.9978840Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd68, %rd69, %r56, %p12; 2026-02-21T08:54:28.9978889Z // end inline asm 2026-02-21T08:54:28.9978942Z // begin inline asm 2026-02-21T08:54:28.9979058Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd70, %rd71, %r56, %p12; 2026-02-21T08:54:28.9979107Z // end inline asm 2026-02-21T08:54:28.9979161Z // begin inline asm 2026-02-21T08:54:28.9979305Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd72, %rd73, %r56, %p12; 2026-02-21T08:54:28.9979356Z // end inline asm 2026-02-21T08:54:28.9979410Z // begin inline asm 2026-02-21T08:54:28.9979538Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd74, %rd75, %r56, %p12; 2026-02-21T08:54:28.9979590Z // end inline asm 2026-02-21T08:54:28.9979644Z // begin inline asm 2026-02-21T08:54:28.9979770Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd76, %rd77, %r56, %p12; 2026-02-21T08:54:28.9979821Z // end inline asm 2026-02-21T08:54:28.9979875Z // begin inline asm 2026-02-21T08:54:28.9980003Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r55 + 0 ], %rd78, %rd79, %r56, %p12; 2026-02-21T08:54:28.9980055Z // end inline asm 2026-02-21T08:54:28.9980108Z // begin inline asm 2026-02-21T08:54:28.9980234Z 
@%p29 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:28.9980287Z // end inline asm 2026-02-21T08:54:28.9980349Z and.pred %p45, %p46, %p29; 2026-02-21T08:54:28.9980402Z // begin inline asm 2026-02-21T08:54:28.9980527Z @%p45 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:28.9980582Z // end inline asm 2026-02-21T08:54:28.9980735Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9980802Z xor.b32 %r1141, %r1141, 1; 2026-02-21T08:54:28.9980994Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9981055Z add.s32 %r1140, %r1140, 128; 2026-02-21T08:54:28.9981130Z setp.lt.u32 %p47, %r1140, 1920; 2026-02-21T08:54:28.9981190Z @%p47 bra $L__BB0_6; 2026-02-21T08:54:28.9981273Z // %bb.7: // %.loopexit 2026-02-21T08:54:28.9981361Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9981425Z barrier.sync 1; 2026-02-21T08:54:28.9981503Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:28.9981559Z bra.uni $L__BB0_2; 2026-02-21T08:54:28.9981661Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9981823Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9981899Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:28.9982000Z ld.shared.v2.b32 {%r38, %r42}, [global_smem+98312]; 2026-02-21T08:54:28.9982058Z barrier.sync 1; 2026-02-21T08:54:28.9982221Z .loc 1 21 67 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:21:67 2026-02-21T08:54:28.9982281Z mov.u32 %r24, %ctaid.x; 2026-02-21T08:54:28.9982368Z mov.u32 %r25, %ctaid.y; 2026-02-21T08:54:28.9982426Z mov.u32 %r26, %ctaid.z; 2026-02-21T08:54:28.9982484Z mov.u32 %r27, %nctaid.x; 2026-02-21T08:54:28.9982548Z mov.u32 %r28, %nctaid.y; 2026-02-21T08:54:28.9982611Z mad.lo.s32 %r29, %r26, %r28, %r25; 2026-02-21T08:54:28.9982672Z mad.lo.s32 %r30, %r29, %r27, %r24; 2026-02-21T08:54:28.9982730Z mul.lo.s32 %r31, %r30, 
384; 2026-02-21T08:54:28.9982902Z .loc 1 22 68 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:22:68 2026-02-21T08:54:28.9982981Z add.s32 %r32, %r31, 128; 2026-02-21T08:54:28.9983038Z cvt.s64.s32 %rd24, %r32; 2026-02-21T08:54:28.9983104Z add.s64 %rd25, %rd23, %rd24; 2026-02-21T08:54:28.9983170Z cvta.global.u64 %rd29, %rd25; 2026-02-21T08:54:28.9983335Z .loc 1 21 67 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:21:67 2026-02-21T08:54:28.9983399Z cvt.s64.s32 %rd26, %r31; 2026-02-21T08:54:28.9983457Z add.s64 %rd27, %rd23, %rd26; 2026-02-21T08:54:28.9983522Z cvta.global.u64 %rd28, %rd27; 2026-02-21T08:54:28.9983579Z add.s32 %r11, %r1, -128; 2026-02-21T08:54:28.9983645Z shr.u32 %r12, %r11, 5; 2026-02-21T08:54:28.9983700Z mov.b32 %r1143, 0; 2026-02-21T08:54:28.9983756Z mov.b32 %r1142, -128; 2026-02-21T08:54:28.9983859Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:28.9983975Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:28.9984140Z .loc 1 0 67 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0:67 2026-02-21T08:54:28.9984212Z setp.lt.u32 %p6, %r11, 64; 2026-02-21T08:54:28.9984273Z setp.eq.b32 %p3, %r11, 0; 2026-02-21T08:54:28.9984331Z add.s32 %r33, %r20, 98416; 2026-02-21T08:54:28.9984387Z // begin inline asm 2026-02-21T08:54:28.9984444Z 2026-02-21T08:54:28.9984493Z { 2026-02-21T08:54:28.9984552Z .reg .pred complete; 2026-02-21T08:54:28.9984613Z waitLoop: 2026-02-21T08:54:28.9984762Z mbarrier.try_wait.parity.shared.b64 complete, [%r33], %r1143; 2026-02-21T08:54:28.9984828Z @!complete bra.uni waitLoop; 2026-02-21T08:54:28.9984876Z } 2026-02-21T08:54:28.9984880Z 2026-02-21T08:54:28.9984942Z // end inline asm 2026-02-21T08:54:28.9984996Z bar.sync 3, 64; 2026-02-21T08:54:28.9985053Z add.s32 %r35, %r20, 98432; 2026-02-21T08:54:28.9985114Z // begin inline asm 2026-02-21T08:54:28.9985225Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r35], 98304; 2026-02-21T08:54:28.9985281Z // end inline asm 2026-02-21T08:54:28.9985336Z 
bar.sync 3, 64; 2026-02-21T08:54:28.9985416Z shfl.sync.idx.b32 %r45, %r12, 0, 31, -1; 2026-02-21T08:54:28.9985477Z elect.sync %r46|%p7, -1; 2026-02-21T08:54:28.9985537Z and.pred %p4, %p6, %p7; 2026-02-21T08:54:28.9985634Z and.b32 %r47, %r45, 1; 2026-02-21T08:54:28.9985693Z shl.b32 %r48, %r47, 14; 2026-02-21T08:54:28.9985753Z add.s32 %r49, %r20, %r48; 2026-02-21T08:54:28.9985812Z add.s32 %r36, %r49, 65536; 2026-02-21T08:54:28.9985877Z shl.b32 %r50, %r47, 6; 2026-02-21T08:54:28.9986081Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9986147Z add.s32 %r1142, %r1142, 128; 2026-02-21T08:54:28.9986308Z .loc 1 0 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:0 2026-02-21T08:54:28.9986368Z add.s32 %r41, %r1142, %r50; 2026-02-21T08:54:28.9986424Z // begin inline asm 2026-02-21T08:54:28.9986671Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r36], [%rd28, {%r41, %r38}], [%r35]; 2026-02-21T08:54:28.9986729Z // end inline asm 2026-02-21T08:54:28.9986784Z bar.sync 3, 64; 2026-02-21T08:54:28.9986846Z elect.sync %r51|%p8, -1; 2026-02-21T08:54:28.9986915Z and.pred %p5, %p6, %p8; 2026-02-21T08:54:28.9986974Z shl.b32 %r52, %r47, 15; 2026-02-21T08:54:28.9987032Z add.s32 %r40, %r20, %r52; 2026-02-21T08:54:28.9987098Z // begin inline asm 2026-02-21T08:54:28.9987361Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r40], [%rd29, {%r41, %r42}], [%r35]; 2026-02-21T08:54:28.9987416Z // end inline asm 2026-02-21T08:54:28.9987477Z xor.b32 %r1143, %r1143, 1; 2026-02-21T08:54:28.9987633Z .loc 1 50 79 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:50:79 2026-02-21T08:54:28.9987697Z setp.lt.u32 %p9, %r1142, 1920; 2026-02-21T08:54:28.9987754Z @%p9 bra $L__BB0_9; 2026-02-21T08:54:28.9987856Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9987957Z barrier.sync 1; 2026-02-21T08:54:28.9988030Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T08:54:28.9988093Z bra.uni $L__BB0_2; 2026-02-21T08:54:28.9988184Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:28.9988336Z .loc 1 19 0 // c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py:19 2026-02-21T08:54:28.9988398Z barrier.sync 1; 2026-02-21T08:54:28.9988451Z barrier.sync 1; 2026-02-21T08:54:28.9988507Z bra.uni $L__BB0_2; 2026-02-21T08:54:28.9988561Z $L__tmp1: 2026-02-21T08:54:28.9988623Z $L__func_end0: 2026-02-21T08:54:28.9988703Z // -- End function 2026-02-21T08:54:28.9988752Z } 2026-02-21T08:54:28.9988952Z .file 1 "/tmp/torchinductor_root/67/c675o5gey3vrf3f7tvytqc5f6fdq32nk3ovqnownp43grfkwc3gs.py" 2026-02-21T08:54:28.9989013Z .section .debug_abbrev 2026-02-21T08:54:28.9989063Z { 2026-02-21T08:54:28.9989176Z .b8 1 // Abbreviation Code 2026-02-21T08:54:28.9989274Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:28.9989353Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:28.9989433Z .b8 37 // DW_AT_producer 2026-02-21T08:54:28.9989516Z .b8 8 // DW_FORM_string 2026-02-21T08:54:28.9989588Z .b8 19 // DW_AT_language 2026-02-21T08:54:28.9989663Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:28.9989741Z .b8 3 // DW_AT_name 2026-02-21T08:54:28.9989812Z .b8 8 // DW_FORM_string 2026-02-21T08:54:28.9989889Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:28.9989969Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:28.9990044Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:28.9990129Z .b8 8 // DW_FORM_string 2026-02-21T08:54:28.9990200Z .b8 0 // EOM(1) 2026-02-21T08:54:28.9990267Z .b8 0 // EOM(2) 2026-02-21T08:54:28.9990338Z .b8 0 // EOM(3) 2026-02-21T08:54:28.9990414Z } 2026-02-21T08:54:28.9990473Z .section .debug_info 2026-02-21T08:54:28.9990519Z { 2026-02-21T08:54:28.9990605Z .b32 104 // Length of Unit 2026-02-21T08:54:28.9990689Z .b8 2 // DWARF version number 2026-02-21T08:54:28.9990737Z .b8 0 2026-02-21T08:54:28.9990854Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:54:28.9990941Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:28.9991037Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:28.9991123Z .b8 116 // DW_AT_producer 2026-02-21T08:54:28.9991176Z .b8 114 2026-02-21T08:54:28.9991226Z .b8 105 2026-02-21T08:54:28.9991275Z .b8 116 2026-02-21T08:54:28.9991332Z .b8 111 2026-02-21T08:54:28.9991379Z .b8 110 2026-02-21T08:54:28.9991429Z .b8 0 2026-02-21T08:54:28.9991507Z .b8 2 // DW_AT_language 2026-02-21T08:54:28.9991556Z .b8 0 2026-02-21T08:54:28.9991628Z .b8 99 // DW_AT_name 2026-02-21T08:54:28.9991676Z .b8 54 2026-02-21T08:54:28.9991734Z .b8 55 2026-02-21T08:54:28.9991803Z .b8 53 2026-02-21T08:54:28.9991854Z .b8 111 2026-02-21T08:54:28.9991910Z .b8 53 2026-02-21T08:54:28.9991958Z .b8 103 2026-02-21T08:54:28.9992006Z .b8 101 2026-02-21T08:54:28.9992055Z .b8 121 2026-02-21T08:54:28.9992111Z .b8 51 2026-02-21T08:54:28.9992160Z .b8 118 2026-02-21T08:54:28.9992209Z .b8 114 2026-02-21T08:54:28.9992257Z .b8 102 2026-02-21T08:54:28.9992312Z .b8 51 2026-02-21T08:54:28.9992361Z .b8 102 2026-02-21T08:54:28.9992410Z .b8 55 2026-02-21T08:54:28.9992467Z .b8 116 2026-02-21T08:54:28.9992538Z .b8 118 2026-02-21T08:54:28.9992587Z .b8 121 2026-02-21T08:54:28.9992637Z .b8 116 2026-02-21T08:54:28.9992692Z .b8 113 2026-02-21T08:54:28.9992740Z .b8 99 2026-02-21T08:54:28.9992787Z .b8 53 2026-02-21T08:54:28.9992841Z .b8 102 2026-02-21T08:54:28.9992890Z .b8 54 2026-02-21T08:54:28.9992938Z .b8 102 2026-02-21T08:54:28.9992985Z .b8 100 2026-02-21T08:54:28.9993039Z .b8 113 2026-02-21T08:54:28.9993086Z .b8 51 2026-02-21T08:54:28.9993132Z .b8 50 2026-02-21T08:54:28.9993186Z .b8 110 2026-02-21T08:54:28.9993235Z .b8 107 2026-02-21T08:54:28.9993282Z .b8 51 2026-02-21T08:54:28.9993329Z .b8 111 2026-02-21T08:54:28.9993384Z .b8 118 2026-02-21T08:54:28.9993430Z .b8 113 2026-02-21T08:54:28.9993477Z .b8 110 2026-02-21T08:54:28.9993523Z .b8 111 2026-02-21T08:54:28.9993581Z .b8 119 2026-02-21T08:54:28.9993629Z .b8 110 
2026-02-21T08:54:28.9993677Z .b8 112 2026-02-21T08:54:28.9993730Z .b8 52 2026-02-21T08:54:28.9993776Z .b8 51 2026-02-21T08:54:28.9993823Z .b8 103 2026-02-21T08:54:28.9993891Z .b8 114 2026-02-21T08:54:28.9993951Z .b8 102 2026-02-21T08:54:28.9994000Z .b8 107 2026-02-21T08:54:28.9994047Z .b8 119 2026-02-21T08:54:28.9994099Z .b8 99 2026-02-21T08:54:28.9994147Z .b8 51 2026-02-21T08:54:28.9994194Z .b8 103 2026-02-21T08:54:28.9994241Z .b8 115 2026-02-21T08:54:28.9994296Z .b8 46 2026-02-21T08:54:28.9994344Z .b8 112 2026-02-21T08:54:28.9994391Z .b8 121 2026-02-21T08:54:28.9994439Z .b8 0 2026-02-21T08:54:28.9994531Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:28.9994603Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:28.9994651Z .b8 116 2026-02-21T08:54:28.9994734Z .b8 109 2026-02-21T08:54:28.9994782Z .b8 112 2026-02-21T08:54:28.9994830Z .b8 47 2026-02-21T08:54:28.9994877Z .b8 116 2026-02-21T08:54:28.9994933Z .b8 111 2026-02-21T08:54:28.9994981Z .b8 114 2026-02-21T08:54:28.9995029Z .b8 99 2026-02-21T08:54:28.9995083Z .b8 104 2026-02-21T08:54:28.9995132Z .b8 105 2026-02-21T08:54:28.9995180Z .b8 110 2026-02-21T08:54:28.9995229Z .b8 100 2026-02-21T08:54:28.9995287Z .b8 117 2026-02-21T08:54:28.9995334Z .b8 99 2026-02-21T08:54:28.9995383Z .b8 116 2026-02-21T08:54:28.9995441Z .b8 111 2026-02-21T08:54:28.9995489Z .b8 114 2026-02-21T08:54:28.9995539Z .b8 95 2026-02-21T08:54:28.9995590Z .b8 114 2026-02-21T08:54:28.9995683Z .b8 111 2026-02-21T08:54:28.9995734Z .b8 111 2026-02-21T08:54:28.9995785Z .b8 116 2026-02-21T08:54:28.9995839Z .b8 47 2026-02-21T08:54:28.9995887Z .b8 54 2026-02-21T08:54:28.9995935Z .b8 55 2026-02-21T08:54:28.9995982Z .b8 0 2026-02-21T08:54:28.9996039Z } 2026-02-21T08:54:28.9996101Z .section .debug_macinfo { } 2026-02-21T08:54:28.9996105Z 2026-02-21T08:54:28.9996181Z ================================================================ 2026-02-21T08:54:28.9996288Z please share the reproducer above with Triton project. 
2026-02-21T08:54:29.7091401Z 2026-02-21T08:54:29.7095968Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 42/42 15.1 configs/s 2026-02-21T08:54:30.2238701Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1776.3 2026-02-21T08:54:30.2239996Z configs/s 2026-02-21T08:54:30.2839424Z [375s] Generation 15 complete: 2026-02-21T08:54:30.2839767Z error=10 2026-02-21T08:54:30.2839949Z ok=34 2026-02-21T08:54:30.2840125Z min=0.1054 2026-02-21T08:54:30.2840255Z mid=0.2643 2026-02-21T08:54:30.2840385Z max=26.7244 2026-02-21T08:54:30.2840545Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:54:30.2841125Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:54:30.2841343Z 'l2_groupings': [64], 2026-02-21T08:54:30.2841522Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:54:30.2841707Z 'loop_orders': [[0, 1]], 2026-02-21T08:54:30.2841868Z 'num_stages': 3, 2026-02-21T08:54:30.2842005Z 'num_warps': 8, 2026-02-21T08:54:30.2842145Z 'pid_type': 'flat', 2026-02-21T08:54:30.2842302Z 'range_flattens': [None, None], 2026-02-21T08:54:30.2842476Z 'range_multi_buffers': [None, None], 2026-02-21T08:54:30.2842670Z 'range_num_stages': [0, 0], 2026-02-21T08:54:30.2842950Z 'range_unroll_factors': [0, 0], 2026-02-21T08:54:30.2843131Z 'range_warp_specializes': [None, True]} 2026-02-21T08:54:30.2873624Z [375s] Fitting surrogate: 1117 points, 1117 targets 2026-02-21T08:54:30.9530064Z [376s] Generation 16 starting: 34 neighbors, 2 active search path(s) 2026-02-21T08:54:38.8141957Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36/36 0.9 configs/s 2026-02-21T08:54:39.2795556Z 2026-02-21T08:54:39.2797728Z 2026-02-21T08:54:39.2797991Z ================================================================ 2026-02-21T08:54:39.2798260Z Internal Triton PTX codegen error 2026-02-21T08:54:39.2798512Z `ptxas` stderr: 2026-02-21T08:54:39.2798959Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 308 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T08:54:39.2803006Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:39.2807462Z 2026-02-21T08:54:39.2809209Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpwbg3kqwo.ptx -o /tmp/tmpwbg3kqwo.ptx.o 2026-02-21T08:54:39.2809696Z 2026-02-21T08:54:39.2809700Z 2026-02-21T08:54:39.2809775Z // 2026-02-21T08:54:39.2809922Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:39.2810100Z // 2026-02-21T08:54:39.2810167Z 2026-02-21T08:54:39.2810228Z .version 8.7 2026-02-21T08:54:39.2810364Z .target sm_100a 2026-02-21T08:54:39.2810509Z .address_size 64 2026-02-21T08:54:39.2810591Z 2026-02-21T08:54:39.2810717Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:39.2810968Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:39.2811172Z // @_helion_matmul 2026-02-21T08:54:39.2811377Z .visible .entry _helion_matmul( 2026-02-21T08:54:39.2811594Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:39.2811846Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:39.2812096Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:39.2812334Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:39.2812590Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:39.2813061Z ) 2026-02-21T08:54:39.2813200Z .reqntid 128 2026-02-21T08:54:39.2813333Z .maxnreg 32 2026-02-21T08:54:39.2813467Z { 2026-02-21T08:54:39.2813607Z .reg .pred %p<87>; 2026-02-21T08:54:39.2813760Z .reg .b16 %rs<4>; 2026-02-21T08:54:39.2813912Z .reg .b32 %r<2335>; 2026-02-21T08:54:39.2814057Z .reg .b64 %rd<1292>; 2026-02-21T08:54:39.2814217Z $L__func_begin0: 2026-02-21T08:54:39.2814902Z 2026-02-21T08:54:39.2814960Z // %bb.0: 2026-02-21T08:54:39.2815211Z .loc 1 19 0 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:19 2026-02-21T08:54:39.2815506Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:39.2815657Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:54:39.2815851Z ld.param.b64 %rd72, [_helion_matmul_param_2]; 2026-02-21T08:54:39.2816048Z mov.b32 %r48, global_smem; 2026-02-21T08:54:39.2816202Z // begin inline asm 2026-02-21T08:54:39.2816450Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r48], 512; 2026-02-21T08:54:39.2816690Z // end inline asm 2026-02-21T08:54:39.2816853Z ld.param.b64 %rd89, [_helion_matmul_param_3]; 2026-02-21T08:54:39.2817033Z bar.sync 0; 2026-02-21T08:54:39.2817178Z ld.shared.b32 %r2328, [global_smem]; 2026-02-21T08:54:39.2817413Z bar.sync 0; 2026-02-21T08:54:39.2817548Z // begin inline asm 2026-02-21T08:54:39.2817755Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:39.2817992Z // end inline asm 2026-02-21T08:54:39.2818252Z .loc 1 21 73 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:21:73 2026-02-21T08:54:39.2818539Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:54:39.2818705Z mov.u32 %r57, %ctaid.y; 2026-02-21T08:54:39.2818852Z mov.u32 %r58, %ctaid.z; 2026-02-21T08:54:39.2819093Z mov.u32 %r59, %nctaid.x; 2026-02-21T08:54:39.2819237Z mov.u32 %r60, %nctaid.y; 2026-02-21T08:54:39.2819394Z mad.lo.s32 %r61, %r58, %r60, %r57; 2026-02-21T08:54:39.2819571Z mad.lo.s32 %r62, %r61, %r59, %r3; 2026-02-21T08:54:39.2819735Z shl.b32 %r63, %r62, 7; 2026-02-21T08:54:39.2819886Z cvt.s64.s32 %rd90, %r63; 2026-02-21T08:54:39.2820066Z add.s64 %rd86, %rd89, %rd90; 2026-02-21T08:54:39.2820224Z shl.b32 %r64, %r1, 2; 2026-02-21T08:54:39.2820386Z add.s32 %r49, %r48, %r64; 2026-02-21T08:54:39.2820534Z mov.b32 %r66, 0; 2026-02-21T08:54:39.2820669Z // begin inline asm 2026-02-21T08:54:39.2820815Z @%p1 st.shared.b32 [ %r49 + 0 ], %r66; 2026-02-21T08:54:39.2820989Z // end inline asm 2026-02-21T08:54:39.2821121Z bar.warp.sync -1; 2026-02-21T08:54:39.2821270Z setp.eq.b32 %p81, %r1, 0; 
2026-02-21T08:54:39.2821426Z cvt.u64.u32 %rd71, %r48; 2026-02-21T08:54:39.2821570Z // begin inline asm 2026-02-21T08:54:39.2821863Z @%p81 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd71 + 0 ], %rd72; 2026-02-21T08:54:39.2822148Z // end inline asm 2026-02-21T08:54:39.2822285Z // begin inline asm 2026-02-21T08:54:39.2822503Z @%p81 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x1; 2026-02-21T08:54:39.2822766Z // end inline asm 2026-02-21T08:54:39.2822893Z mov.b32 %r51, 64; 2026-02-21T08:54:39.2823032Z // begin inline asm 2026-02-21T08:54:39.2823274Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x0, %r51; 2026-02-21T08:54:39.2823536Z // end inline asm 2026-02-21T08:54:39.2823672Z mov.b32 %r52, 256; 2026-02-21T08:54:39.2823804Z // begin inline asm 2026-02-21T08:54:39.2824036Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x1, %r52; 2026-02-21T08:54:39.2824296Z // end inline asm 2026-02-21T08:54:39.2824431Z mov.b32 %r53, 12288; 2026-02-21T08:54:39.2824564Z // begin inline asm 2026-02-21T08:54:39.2824849Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x0, %r53; 2026-02-21T08:54:39.2825121Z // end inline asm 2026-02-21T08:54:39.2825249Z mov.b32 %r54, 2048; 2026-02-21T08:54:39.2825394Z // begin inline asm 2026-02-21T08:54:39.2825623Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x1, %r54; 2026-02-21T08:54:39.2825931Z // end inline asm 2026-02-21T08:54:39.2826063Z mov.b64 %rd79, 24576; 2026-02-21T08:54:39.2826215Z // begin inline asm 2026-02-21T08:54:39.2826474Z @%p81 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd71 + 0 ], 0x0, %rd79; 2026-02-21T08:54:39.2826752Z // end inline asm 2026-02-21T08:54:39.2826886Z mov.b32 %r55, 1; 2026-02-21T08:54:39.2827012Z // begin inline asm 2026-02-21T08:54:39.2827266Z @%p81 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x0, %r55; 
2026-02-21T08:54:39.2827549Z // end inline asm 2026-02-21T08:54:39.2827683Z // begin inline asm 2026-02-21T08:54:39.2827926Z @%p81 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x1, %r55; 2026-02-21T08:54:39.2828210Z // end inline asm 2026-02-21T08:54:39.2828346Z // begin inline asm 2026-02-21T08:54:39.2828570Z @%p81 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x6; 2026-02-21T08:54:39.2828836Z // end inline asm 2026-02-21T08:54:39.2828966Z // begin inline asm 2026-02-21T08:54:39.2829217Z @%p81 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x0; 2026-02-21T08:54:39.2829500Z // end inline asm 2026-02-21T08:54:39.2829669Z // begin inline asm 2026-02-21T08:54:39.2829907Z @%p81 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x3; 2026-02-21T08:54:39.2830161Z // end inline asm 2026-02-21T08:54:39.2830296Z // begin inline asm 2026-02-21T08:54:39.2830512Z @%p81 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd71 + 0 ], 0x0; 2026-02-21T08:54:39.2830766Z // end inline asm 2026-02-21T08:54:39.2830893Z // begin inline asm 2026-02-21T08:54:39.2831230Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd86 + 0 ], [ %rd71 + 0 ], 0x80; 2026-02-21T08:54:39.2831634Z // end inline asm 2026-02-21T08:54:39.2831760Z // begin inline asm 2026-02-21T08:54:39.2831968Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd86 + 0 ], 0x80; 2026-02-21T08:54:39.2832209Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:54:39.2832399Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:39.2832569Z // end inline asm 2026-02-21T08:54:39.2832704Z bar.sync 0; 2026-02-21T08:54:39.2832841Z cvta.global.u64 %rd250, %rd86; 2026-02-21T08:54:39.2833121Z .loc 1 30 74 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:30:74 2026-02-21T08:54:39.2833420Z setp.gt.u32 %p21, %r3, 383; 2026-02-21T08:54:39.2833578Z @%p21 bra $L__BB0_8; 
2026-02-21T08:54:39.2833739Z // %bb.1: // %.lr.ph 2026-02-21T08:54:39.2834054Z .loc 1 0 74 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:0:74 2026-02-21T08:54:39.2834361Z ld.param.b64 %rd70, [_helion_matmul_param_1]; 2026-02-21T08:54:39.2834565Z ld.param.b64 %rd69, [_helion_matmul_param_0]; 2026-02-21T08:54:39.2834900Z .loc 1 50 48 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:50:48 2026-02-21T08:54:39.2835187Z shl.b32 %r771, %r1, 3; 2026-02-21T08:54:39.2835335Z and.b32 %r772, %r771, 24; 2026-02-21T08:54:39.2835602Z .loc 1 42 45 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:42:45 2026-02-21T08:54:39.2835886Z shr.u32 %r773, %r1, 2; 2026-02-21T08:54:39.2836039Z or.b32 %r4, %r773, 224; 2026-02-21T08:54:39.2836186Z bfe.u32 %r5, %r1, 2, 5; 2026-02-21T08:54:39.2836335Z or.b32 %r774, %r5, 192; 2026-02-21T08:54:39.2836475Z or.b32 %r775, %r5, 160; 2026-02-21T08:54:39.2836619Z or.b32 %r776, %r5, 128; 2026-02-21T08:54:39.2836757Z or.b32 %r777, %r5, 96; 2026-02-21T08:54:39.2836906Z or.b32 %r778, %r5, 64; 2026-02-21T08:54:39.2837052Z or.b32 %r779, %r5, 32; 2026-02-21T08:54:39.2837190Z shr.u32 %r780, %r1, 5; 2026-02-21T08:54:39.2837339Z and.b32 %r781, %r1, 127; 2026-02-21T08:54:39.2837484Z shl.b32 %r782, %r781, 4; 2026-02-21T08:54:39.2837633Z shl.b32 %r783, %r1, 1; 2026-02-21T08:54:39.2837804Z and.b32 %r784, %r783, 48; 2026-02-21T08:54:39.2837961Z xor.b32 %r6, %r782, %r784; 2026-02-21T08:54:39.2838112Z add.s32 %r786, %r48, %r6; 2026-02-21T08:54:39.2838271Z add.s32 %r874, %r786, 212992; 2026-02-21T08:54:39.2838426Z add.s32 %r876, %r786, 215040; 2026-02-21T08:54:39.2838585Z add.s32 %r878, %r786, 217088; 2026-02-21T08:54:39.2838742Z add.s32 %r880, %r786, 219136; 2026-02-21T08:54:39.2838890Z add.s32 %r882, %r786, 221184; 2026-02-21T08:54:39.2839049Z add.s32 %r884, %r786, 223232; 2026-02-21T08:54:39.2839195Z add.s32 %r886, %r786, 225280; 2026-02-21T08:54:39.2839352Z add.s32 %r888, %r786, 227328; 2026-02-21T08:54:39.2839498Z add.s32 %r890, %r786, 
311296; 2026-02-21T08:54:39.2839653Z add.s32 %r892, %r786, 313344; 2026-02-21T08:54:39.2839802Z add.s32 %r894, %r786, 315392; 2026-02-21T08:54:39.2839955Z add.s32 %r896, %r786, 317440; 2026-02-21T08:54:39.2840107Z add.s32 %r898, %r786, 319488; 2026-02-21T08:54:39.2840252Z add.s32 %r900, %r786, 321536; 2026-02-21T08:54:39.2840407Z add.s32 %r902, %r786, 323584; 2026-02-21T08:54:39.2840550Z add.s32 %r904, %r786, 325632; 2026-02-21T08:54:39.2840705Z shl.b32 %r787, %r781, 7; 2026-02-21T08:54:39.2840849Z shl.b32 %r788, %r1, 4; 2026-02-21T08:54:39.2841026Z and.b32 %r789, %r788, 112; 2026-02-21T08:54:39.2841183Z or.b32 %r790, %r787, %r789; 2026-02-21T08:54:39.2841344Z xor.b32 %r791, %r790, 16; 2026-02-21T08:54:39.2841494Z xor.b32 %r792, %r790, 32; 2026-02-21T08:54:39.2841652Z xor.b32 %r793, %r790, 48; 2026-02-21T08:54:39.2841805Z xor.b32 %r794, %r790, 64; 2026-02-21T08:54:39.2841950Z xor.b32 %r795, %r790, 80; 2026-02-21T08:54:39.2842104Z xor.b32 %r796, %r790, 96; 2026-02-21T08:54:39.2842254Z xor.b32 %r797, %r790, 112; 2026-02-21T08:54:39.2842417Z add.s32 %r755, %r786, 294912; 2026-02-21T08:54:39.2842606Z add.s32 %r769, %r786, 309248; 2026-02-21T08:54:39.2842761Z add.s32 %r767, %r786, 307200; 2026-02-21T08:54:39.2842908Z add.s32 %r765, %r786, 305152; 2026-02-21T08:54:39.2843060Z add.s32 %r763, %r786, 303104; 2026-02-21T08:54:39.2843206Z add.s32 %r761, %r786, 301056; 2026-02-21T08:54:39.2843355Z add.s32 %r759, %r786, 299008; 2026-02-21T08:54:39.2843508Z add.s32 %r757, %r786, 296960; 2026-02-21T08:54:39.2843652Z add.s32 %r739, %r786, 196608; 2026-02-21T08:54:39.2843802Z add.s32 %r753, %r786, 210944; 2026-02-21T08:54:39.2843946Z add.s32 %r751, %r786, 208896; 2026-02-21T08:54:39.2844097Z add.s32 %r749, %r786, 206848; 2026-02-21T08:54:39.2844241Z add.s32 %r747, %r786, 204800; 2026-02-21T08:54:39.2844392Z add.s32 %r745, %r786, 202752; 2026-02-21T08:54:39.2844534Z add.s32 %r743, %r786, 200704; 2026-02-21T08:54:39.2844723Z add.s32 %r741, %r786, 198656; 2026-02-21T08:54:39.2844916Z 
add.s32 %r723, %r786, 278528; 2026-02-21T08:54:39.2845064Z add.s32 %r737, %r786, 292864; 2026-02-21T08:54:39.2845217Z add.s32 %r735, %r786, 290816; 2026-02-21T08:54:39.2845363Z add.s32 %r733, %r786, 288768; 2026-02-21T08:54:39.2845515Z add.s32 %r731, %r786, 286720; 2026-02-21T08:54:39.2845661Z add.s32 %r729, %r786, 284672; 2026-02-21T08:54:39.2845814Z add.s32 %r727, %r786, 282624; 2026-02-21T08:54:39.2845959Z add.s32 %r725, %r786, 280576; 2026-02-21T08:54:39.2846115Z add.s32 %r707, %r786, 180224; 2026-02-21T08:54:39.2846262Z add.s32 %r721, %r786, 194560; 2026-02-21T08:54:39.2846417Z add.s32 %r719, %r786, 192512; 2026-02-21T08:54:39.2846572Z add.s32 %r717, %r786, 190464; 2026-02-21T08:54:39.2846719Z add.s32 %r715, %r786, 188416; 2026-02-21T08:54:39.2846874Z add.s32 %r713, %r786, 186368; 2026-02-21T08:54:39.2847021Z add.s32 %r711, %r786, 184320; 2026-02-21T08:54:39.2847174Z add.s32 %r709, %r786, 182272; 2026-02-21T08:54:39.2847319Z add.s32 %r691, %r786, 262144; 2026-02-21T08:54:39.2847472Z add.s32 %r705, %r786, 276480; 2026-02-21T08:54:39.2847619Z add.s32 %r703, %r786, 274432; 2026-02-21T08:54:39.2847769Z add.s32 %r701, %r786, 272384; 2026-02-21T08:54:39.2847920Z add.s32 %r699, %r786, 270336; 2026-02-21T08:54:39.2848064Z add.s32 %r697, %r786, 268288; 2026-02-21T08:54:39.2848249Z add.s32 %r695, %r786, 266240; 2026-02-21T08:54:39.2848395Z add.s32 %r693, %r786, 264192; 2026-02-21T08:54:39.2848547Z add.s32 %r675, %r786, 163840; 2026-02-21T08:54:39.2848691Z add.s32 %r689, %r786, 178176; 2026-02-21T08:54:39.2848844Z add.s32 %r687, %r786, 176128; 2026-02-21T08:54:39.2848989Z add.s32 %r685, %r786, 174080; 2026-02-21T08:54:39.2849143Z add.s32 %r683, %r786, 172032; 2026-02-21T08:54:39.2849288Z add.s32 %r681, %r786, 169984; 2026-02-21T08:54:39.2849440Z add.s32 %r679, %r786, 167936; 2026-02-21T08:54:39.2849595Z add.s32 %r677, %r786, 165888; 2026-02-21T08:54:39.2849740Z add.s32 %r659, %r786, 245760; 2026-02-21T08:54:39.2849895Z add.s32 %r673, %r786, 260096; 
2026-02-21T08:54:39.2850042Z add.s32 %r671, %r786, 258048; 2026-02-21T08:54:39.2850198Z add.s32 %r669, %r786, 256000; 2026-02-21T08:54:39.2850345Z add.s32 %r667, %r786, 253952; 2026-02-21T08:54:39.2850505Z add.s32 %r665, %r786, 251904; 2026-02-21T08:54:39.2850657Z add.s32 %r663, %r786, 249856; 2026-02-21T08:54:39.2850817Z add.s32 %r661, %r786, 247808; 2026-02-21T08:54:39.2850969Z add.s32 %r643, %r786, 147456; 2026-02-21T08:54:39.2851131Z add.s32 %r657, %r786, 161792; 2026-02-21T08:54:39.2851289Z add.s32 %r655, %r786, 159744; 2026-02-21T08:54:39.2851480Z add.s32 %r653, %r786, 157696; 2026-02-21T08:54:39.2851640Z add.s32 %r651, %r786, 155648; 2026-02-21T08:54:39.2851791Z add.s32 %r649, %r786, 153600; 2026-02-21T08:54:39.2851949Z add.s32 %r647, %r786, 151552; 2026-02-21T08:54:39.2852101Z add.s32 %r645, %r786, 149504; 2026-02-21T08:54:39.2852260Z add.s32 %r627, %r786, 229376; 2026-02-21T08:54:39.2852411Z add.s32 %r641, %r786, 243712; 2026-02-21T08:54:39.2852569Z add.s32 %r639, %r786, 241664; 2026-02-21T08:54:39.2852730Z add.s32 %r637, %r786, 239616; 2026-02-21T08:54:39.2852938Z add.s32 %r635, %r786, 237568; 2026-02-21T08:54:39.2853098Z add.s32 %r633, %r786, 235520; 2026-02-21T08:54:39.2853253Z add.s32 %r631, %r786, 233472; 2026-02-21T08:54:39.2853414Z add.s32 %r629, %r786, 231424; 2026-02-21T08:54:39.2853567Z add.s32 %r611, %r786, 131072; 2026-02-21T08:54:39.2853727Z add.s32 %r625, %r786, 145408; 2026-02-21T08:54:39.2853881Z add.s32 %r623, %r786, 143360; 2026-02-21T08:54:39.2854040Z add.s32 %r621, %r786, 141312; 2026-02-21T08:54:39.2854195Z add.s32 %r619, %r786, 139264; 2026-02-21T08:54:39.2854355Z add.s32 %r617, %r786, 137216; 2026-02-21T08:54:39.2854522Z add.s32 %r615, %r786, 135168; 2026-02-21T08:54:39.2854714Z add.s32 %r613, %r786, 133120; 2026-02-21T08:54:39.2855008Z .loc 1 37 33 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:37:33 2026-02-21T08:54:39.2855306Z shr.u32 %r798, %r3, 3; 2026-02-21T08:54:39.2855495Z and.b32 %r799, %r798, 48; 
2026-02-21T08:54:39.2855763Z .loc 1 39 64 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:39:64 2026-02-21T08:54:39.2856063Z and.b32 %r31, %r3, 15; 2026-02-21T08:54:39.2856328Z .loc 1 39 30 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:39:30 2026-02-21T08:54:39.2856620Z or.b32 %r800, %r799, %r31; 2026-02-21T08:54:39.2856895Z .loc 1 41 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:41:27 2026-02-21T08:54:39.2857193Z shl.b32 %r32, %r800, 8; 2026-02-21T08:54:39.2857463Z .loc 1 42 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:42:32 2026-02-21T08:54:39.2857758Z or.b32 %r801, %r32, %r5; 2026-02-21T08:54:39.2857920Z or.b32 %r802, %r32, %r779; 2026-02-21T08:54:39.2858084Z or.b32 %r803, %r32, %r778; 2026-02-21T08:54:39.2858236Z or.b32 %r804, %r32, %r777; 2026-02-21T08:54:39.2858397Z or.b32 %r805, %r32, %r776; 2026-02-21T08:54:39.2858551Z or.b32 %r806, %r32, %r775; 2026-02-21T08:54:39.2858724Z or.b32 %r807, %r32, %r774; 2026-02-21T08:54:39.2858868Z or.b32 %r808, %r32, %r4; 2026-02-21T08:54:39.2859124Z .loc 1 43 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:43:27 2026-02-21T08:54:39.2859427Z shl.b32 %r809, %r3, 4; 2026-02-21T08:54:39.2859582Z and.b32 %r1554, %r809, 1792; 2026-02-21T08:54:39.2859848Z .loc 1 44 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:44:32 2026-02-21T08:54:39.2860132Z or.b32 %r810, %r1554, %r5; 2026-02-21T08:54:39.2860290Z or.b32 %r811, %r1554, %r779; 2026-02-21T08:54:39.2860440Z or.b32 %r812, %r1554, %r778; 2026-02-21T08:54:39.2860591Z or.b32 %r813, %r1554, %r777; 2026-02-21T08:54:39.2860734Z or.b32 %r814, %r1554, %r776; 2026-02-21T08:54:39.2860886Z or.b32 %r815, %r1554, %r775; 2026-02-21T08:54:39.2861027Z or.b32 %r816, %r1554, %r774; 2026-02-21T08:54:39.2861176Z or.b32 %r817, %r1554, %r4; 2026-02-21T08:54:39.2861431Z .loc 1 54 53 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:53 2026-02-21T08:54:39.2861717Z shl.b32 %r818, %r810, 11; 
2026-02-21T08:54:39.2861867Z shl.b32 %r819, %r811, 11; 2026-02-21T08:54:39.2862010Z shl.b32 %r820, %r812, 11; 2026-02-21T08:54:39.2862161Z shl.b32 %r821, %r813, 11; 2026-02-21T08:54:39.2862302Z shl.b32 %r822, %r814, 11; 2026-02-21T08:54:39.2862447Z shl.b32 %r823, %r815, 11; 2026-02-21T08:54:39.2862589Z shl.b32 %r824, %r816, 11; 2026-02-21T08:54:39.2862767Z shl.b32 %r825, %r817, 11; 2026-02-21T08:54:39.2863017Z .loc 1 55 80 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:80 2026-02-21T08:54:39.2863298Z shl.b32 %r826, %r801, 11; 2026-02-21T08:54:39.2863445Z shl.b32 %r827, %r802, 11; 2026-02-21T08:54:39.2863585Z shl.b32 %r828, %r803, 11; 2026-02-21T08:54:39.2863736Z shl.b32 %r829, %r804, 11; 2026-02-21T08:54:39.2863876Z shl.b32 %r830, %r805, 11; 2026-02-21T08:54:39.2864025Z shl.b32 %r831, %r806, 11; 2026-02-21T08:54:39.2864165Z shl.b32 %r832, %r807, 11; 2026-02-21T08:54:39.2864346Z shl.b32 %r833, %r808, 11; 2026-02-21T08:54:39.2864596Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.2864954Z shfl.sync.idx.b32 %r34, %r780, 0, 31, -1; 2026-02-21T08:54:39.2865144Z and.b32 %r35, %r34, 3; 2026-02-21T08:54:39.2865290Z shl.b32 %r834, %r35, 21; 2026-02-21T08:54:39.2865447Z add.s32 %r1552, %r834, %r2328; 2026-02-21T08:54:39.2865606Z mov.pred %p59, -1; 2026-02-21T08:54:39.2865755Z // begin inline asm 2026-02-21T08:54:39.2866090Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 0], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2866460Z // end inline asm 2026-02-21T08:54:39.2866591Z // begin inline asm 2026-02-21T08:54:39.2866946Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 16], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2867312Z // end inline asm 2026-02-21T08:54:39.2867444Z // begin inline asm 2026-02-21T08:54:39.2867768Z @%p59 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 32], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2868119Z // end inline asm 2026-02-21T08:54:39.2868255Z // begin inline asm 2026-02-21T08:54:39.2868573Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 48], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2868920Z // end inline asm 2026-02-21T08:54:39.2869058Z // begin inline asm 2026-02-21T08:54:39.2869368Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 64], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2869723Z // end inline asm 2026-02-21T08:54:39.2869851Z // begin inline asm 2026-02-21T08:54:39.2870173Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 80], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2870527Z // end inline asm 2026-02-21T08:54:39.2870656Z // begin inline asm 2026-02-21T08:54:39.2870999Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 96], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2871343Z // end inline asm 2026-02-21T08:54:39.2871478Z // begin inline asm 2026-02-21T08:54:39.2871789Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 112], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2872151Z // end inline asm 2026-02-21T08:54:39.2872283Z // begin inline asm 2026-02-21T08:54:39.2872595Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 128], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2872948Z // end inline asm 2026-02-21T08:54:39.2873076Z // begin inline asm 2026-02-21T08:54:39.2873390Z @%p59 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 144], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2873733Z // end inline asm 2026-02-21T08:54:39.2873865Z // begin inline asm 2026-02-21T08:54:39.2874204Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 160], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2874552Z // end inline asm 2026-02-21T08:54:39.2874721Z // begin inline asm 2026-02-21T08:54:39.2875043Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 176], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2875401Z // end inline asm 2026-02-21T08:54:39.2875529Z // begin inline asm 2026-02-21T08:54:39.2875849Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 192], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2876237Z // end inline asm 2026-02-21T08:54:39.2876371Z // begin inline asm 2026-02-21T08:54:39.2876697Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 208], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2877057Z // end inline asm 2026-02-21T08:54:39.2877199Z // begin inline asm 2026-02-21T08:54:39.2877522Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 224], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2877887Z // end inline asm 2026-02-21T08:54:39.2878036Z // begin inline asm 2026-02-21T08:54:39.2878356Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 240], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2878749Z // end inline asm 2026-02-21T08:54:39.2878879Z // begin inline asm 2026-02-21T08:54:39.2879196Z @%p59 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 256], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2879548Z // end inline asm 2026-02-21T08:54:39.2879677Z // begin inline asm 2026-02-21T08:54:39.2879996Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 272], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2880351Z // end inline asm 2026-02-21T08:54:39.2880485Z // begin inline asm 2026-02-21T08:54:39.2880795Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 288], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2881162Z // end inline asm 2026-02-21T08:54:39.2881300Z // begin inline asm 2026-02-21T08:54:39.2881611Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 304], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2881981Z // end inline asm 2026-02-21T08:54:39.2882107Z // begin inline asm 2026-02-21T08:54:39.2882424Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 320], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2882809Z // end inline asm 2026-02-21T08:54:39.2882946Z // begin inline asm 2026-02-21T08:54:39.2883269Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 336], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2883620Z // end inline asm 2026-02-21T08:54:39.2883756Z // begin inline asm 2026-02-21T08:54:39.2884068Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 352], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2884426Z // end inline asm 2026-02-21T08:54:39.2884552Z // begin inline asm 2026-02-21T08:54:39.2884917Z @%p59 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 368], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2885270Z // end inline asm 2026-02-21T08:54:39.2885400Z // begin inline asm 2026-02-21T08:54:39.2885719Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 384], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2886059Z // end inline asm 2026-02-21T08:54:39.2886216Z // begin inline asm 2026-02-21T08:54:39.2886526Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 400], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2886875Z // end inline asm 2026-02-21T08:54:39.2887006Z // begin inline asm 2026-02-21T08:54:39.2887315Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 416], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2887699Z // end inline asm 2026-02-21T08:54:39.2887826Z // begin inline asm 2026-02-21T08:54:39.2888147Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 432], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2888496Z // end inline asm 2026-02-21T08:54:39.2888629Z // begin inline asm 2026-02-21T08:54:39.2888949Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 448], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2889291Z // end inline asm 2026-02-21T08:54:39.2889428Z // begin inline asm 2026-02-21T08:54:39.2889743Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 464], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2890096Z // end inline asm 2026-02-21T08:54:39.2890225Z // begin inline asm 2026-02-21T08:54:39.2890568Z @%p59 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 480], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2890937Z // end inline asm 2026-02-21T08:54:39.2891068Z // begin inline asm 2026-02-21T08:54:39.2891404Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1552 + 496], {%r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66, %r66}; 2026-02-21T08:54:39.2891755Z // end inline asm 2026-02-21T08:54:39.2891896Z // begin inline asm 2026-02-21T08:54:39.2892044Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:39.2892213Z // end inline asm 2026-02-21T08:54:39.2892346Z bar.sync 0; 2026-02-21T08:54:39.2892592Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.2892893Z add.s32 %r2330, %r48, 327680; 2026-02-21T08:54:39.2893051Z // begin inline asm 2026-02-21T08:54:39.2893227Z @%p81 mbarrier.init.shared::cta.b64 [%r2330], 1; 2026-02-21T08:54:39.2893424Z // end inline asm 2026-02-21T08:54:39.2893568Z bar.sync 0; 2026-02-21T08:54:39.2893705Z add.s32 %r610, %r48, 327688; 2026-02-21T08:54:39.2893870Z // begin inline asm 2026-02-21T08:54:39.2894044Z @%p81 mbarrier.init.shared::cta.b64 [%r610], 1; 2026-02-21T08:54:39.2894268Z // end inline asm 2026-02-21T08:54:39.2894537Z .loc 1 54 60 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:60 2026-02-21T08:54:39.2894867Z or.b32 %r835, %r818, %r772; 2026-02-21T08:54:39.2895036Z or.b32 %r836, %r819, %r772; 2026-02-21T08:54:39.2895193Z or.b32 %r837, %r820, %r772; 2026-02-21T08:54:39.2895355Z or.b32 %r838, %r821, %r772; 2026-02-21T08:54:39.2895508Z or.b32 %r839, %r822, %r772; 2026-02-21T08:54:39.2895666Z or.b32 %r840, %r823, %r772; 2026-02-21T08:54:39.2895825Z or.b32 %r841, %r824, %r772; 2026-02-21T08:54:39.2895976Z or.b32 %r842, %r825, %r772; 2026-02-21T08:54:39.2896254Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2896562Z 
mad.wide.u32 %rd91, %r835, 2, %rd69; 2026-02-21T08:54:39.2896752Z mad.wide.u32 %rd92, %r836, 2, %rd69; 2026-02-21T08:54:39.2896930Z mad.wide.u32 %rd93, %r837, 2, %rd69; 2026-02-21T08:54:39.2897110Z mad.wide.u32 %rd94, %r838, 2, %rd69; 2026-02-21T08:54:39.2897286Z mad.wide.u32 %rd95, %r839, 2, %rd69; 2026-02-21T08:54:39.2897462Z mad.wide.u32 %rd96, %r840, 2, %rd69; 2026-02-21T08:54:39.2897640Z mad.wide.u32 %rd97, %r841, 2, %rd69; 2026-02-21T08:54:39.2897854Z mad.wide.u32 %rd98, %r842, 2, %rd69; 2026-02-21T08:54:39.2898031Z mov.b32 %r875, 16; 2026-02-21T08:54:39.2898291Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2898586Z // begin inline asm 2026-02-21T08:54:39.2898789Z cp.async.cg.shared.global [ %r611 + 0 ], [ %rd91 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2899028Z // end inline asm 2026-02-21T08:54:39.2899163Z // begin inline asm 2026-02-21T08:54:39.2899371Z cp.async.cg.shared.global [ %r613 + 0 ], [ %rd92 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2899631Z // end inline asm 2026-02-21T08:54:39.2899767Z // begin inline asm 2026-02-21T08:54:39.2899971Z cp.async.cg.shared.global [ %r615 + 0 ], [ %rd93 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2900196Z // end inline asm 2026-02-21T08:54:39.2900337Z // begin inline asm 2026-02-21T08:54:39.2900532Z cp.async.cg.shared.global [ %r617 + 0 ], [ %rd94 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2900772Z // end inline asm 2026-02-21T08:54:39.2900904Z // begin inline asm 2026-02-21T08:54:39.2901099Z cp.async.cg.shared.global [ %r619 + 0 ], [ %rd95 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2901322Z // end inline asm 2026-02-21T08:54:39.2901457Z // begin inline asm 2026-02-21T08:54:39.2901649Z cp.async.cg.shared.global [ %r621 + 0 ], [ %rd96 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2901865Z // end inline asm 2026-02-21T08:54:39.2902002Z // begin inline asm 2026-02-21T08:54:39.2902214Z cp.async.cg.shared.global [ %r623 + 0 ], [ %rd97 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2902435Z // 
end inline asm 2026-02-21T08:54:39.2902564Z // begin inline asm 2026-02-21T08:54:39.2902757Z cp.async.cg.shared.global [ %r625 + 0 ], [ %rd98 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2902974Z // end inline asm 2026-02-21T08:54:39.2903111Z cp.async.commit_group; 2026-02-21T08:54:39.2903371Z .loc 1 55 59 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:59 2026-02-21T08:54:39.2903648Z or.b32 %r843, %r826, %r772; 2026-02-21T08:54:39.2903806Z or.b32 %r844, %r827, %r772; 2026-02-21T08:54:39.2903957Z or.b32 %r845, %r828, %r772; 2026-02-21T08:54:39.2904111Z or.b32 %r846, %r829, %r772; 2026-02-21T08:54:39.2904256Z or.b32 %r847, %r830, %r772; 2026-02-21T08:54:39.2904407Z or.b32 %r848, %r831, %r772; 2026-02-21T08:54:39.2904556Z or.b32 %r849, %r832, %r772; 2026-02-21T08:54:39.2904734Z or.b32 %r850, %r833, %r772; 2026-02-21T08:54:39.2904999Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2905282Z mad.wide.u32 %rd99, %r843, 2, %rd70; 2026-02-21T08:54:39.2905462Z mad.wide.u32 %rd100, %r844, 2, %rd70; 2026-02-21T08:54:39.2905637Z mad.wide.u32 %rd101, %r845, 2, %rd70; 2026-02-21T08:54:39.2905845Z mad.wide.u32 %rd102, %r846, 2, %rd70; 2026-02-21T08:54:39.2906015Z mad.wide.u32 %rd103, %r847, 2, %rd70; 2026-02-21T08:54:39.2906186Z mad.wide.u32 %rd104, %r848, 2, %rd70; 2026-02-21T08:54:39.2906356Z mad.wide.u32 %rd105, %r849, 2, %rd70; 2026-02-21T08:54:39.2906520Z mad.wide.u32 %rd106, %r850, 2, %rd70; 2026-02-21T08:54:39.2906797Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2907080Z // begin inline asm 2026-02-21T08:54:39.2907276Z cp.async.cg.shared.global [ %r627 + 0 ], [ %rd99 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2907489Z // end inline asm 2026-02-21T08:54:39.2907624Z // begin inline asm 2026-02-21T08:54:39.2907814Z cp.async.cg.shared.global [ %r629 + 0 ], [ %rd100 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2908036Z // end inline asm 2026-02-21T08:54:39.2908169Z // begin 
inline asm 2026-02-21T08:54:39.2908355Z cp.async.cg.shared.global [ %r631 + 0 ], [ %rd101 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2908579Z // end inline asm 2026-02-21T08:54:39.2908706Z // begin inline asm 2026-02-21T08:54:39.2908896Z cp.async.cg.shared.global [ %r633 + 0 ], [ %rd102 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2909108Z // end inline asm 2026-02-21T08:54:39.2909267Z // begin inline asm 2026-02-21T08:54:39.2909458Z cp.async.cg.shared.global [ %r635 + 0 ], [ %rd103 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2909683Z // end inline asm 2026-02-21T08:54:39.2909823Z // begin inline asm 2026-02-21T08:54:39.2910014Z cp.async.cg.shared.global [ %r637 + 0 ], [ %rd104 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2910242Z // end inline asm 2026-02-21T08:54:39.2910375Z // begin inline asm 2026-02-21T08:54:39.2910575Z cp.async.cg.shared.global [ %r639 + 0 ], [ %rd105 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2910822Z // end inline asm 2026-02-21T08:54:39.2910964Z // begin inline asm 2026-02-21T08:54:39.2911149Z cp.async.cg.shared.global [ %r641 + 0 ], [ %rd106 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2911372Z // end inline asm 2026-02-21T08:54:39.2911513Z cp.async.commit_group; 2026-02-21T08:54:39.2911771Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2912062Z add.s64 %rd107, %rd91, 64; 2026-02-21T08:54:39.2912219Z add.s64 %rd108, %rd92, 64; 2026-02-21T08:54:39.2912377Z add.s64 %rd109, %rd93, 64; 2026-02-21T08:54:39.2912525Z add.s64 %rd110, %rd94, 64; 2026-02-21T08:54:39.2912680Z add.s64 %rd111, %rd95, 64; 2026-02-21T08:54:39.2912825Z add.s64 %rd112, %rd96, 64; 2026-02-21T08:54:39.2912978Z add.s64 %rd113, %rd97, 64; 2026-02-21T08:54:39.2913131Z add.s64 %rd114, %rd98, 64; 2026-02-21T08:54:39.2913413Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2913706Z bar.sync 0; 2026-02-21T08:54:39.2913833Z // begin inline asm 2026-02-21T08:54:39.2914030Z cp.async.cg.shared.global [ %r643 
+ 0 ], [ %rd107 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2914247Z // end inline asm 2026-02-21T08:54:39.2914387Z // begin inline asm 2026-02-21T08:54:39.2914575Z cp.async.cg.shared.global [ %r645 + 0 ], [ %rd108 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2914828Z // end inline asm 2026-02-21T08:54:39.2914962Z // begin inline asm 2026-02-21T08:54:39.2915150Z cp.async.cg.shared.global [ %r647 + 0 ], [ %rd109 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2915370Z // end inline asm 2026-02-21T08:54:39.2915497Z // begin inline asm 2026-02-21T08:54:39.2915688Z cp.async.cg.shared.global [ %r649 + 0 ], [ %rd110 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2915900Z // end inline asm 2026-02-21T08:54:39.2916031Z // begin inline asm 2026-02-21T08:54:39.2916215Z cp.async.cg.shared.global [ %r651 + 0 ], [ %rd111 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2916437Z // end inline asm 2026-02-21T08:54:39.2916565Z // begin inline asm 2026-02-21T08:54:39.2916755Z cp.async.cg.shared.global [ %r653 + 0 ], [ %rd112 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2916976Z // end inline asm 2026-02-21T08:54:39.2917103Z // begin inline asm 2026-02-21T08:54:39.2917327Z cp.async.cg.shared.global [ %r655 + 0 ], [ %rd113 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2917541Z // end inline asm 2026-02-21T08:54:39.2917676Z // begin inline asm 2026-02-21T08:54:39.2917860Z cp.async.cg.shared.global [ %r657 + 0 ], [ %rd114 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2918078Z // end inline asm 2026-02-21T08:54:39.2918210Z cp.async.commit_group; 2026-02-21T08:54:39.2918475Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2918765Z add.s64 %rd115, %rd99, 64; 2026-02-21T08:54:39.2918917Z add.s64 %rd116, %rd100, 64; 2026-02-21T08:54:39.2919075Z add.s64 %rd117, %rd101, 64; 2026-02-21T08:54:39.2919225Z add.s64 %rd118, %rd102, 64; 2026-02-21T08:54:39.2919379Z add.s64 %rd119, %rd103, 64; 2026-02-21T08:54:39.2919524Z add.s64 %rd120, %rd104, 64; 2026-02-21T08:54:39.2919678Z add.s64 %rd121, 
%rd105, 64; 2026-02-21T08:54:39.2919823Z add.s64 %rd122, %rd106, 64; 2026-02-21T08:54:39.2920087Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2920382Z // begin inline asm 2026-02-21T08:54:39.2920614Z cp.async.cg.shared.global [ %r659 + 0 ], [ %rd115 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2920846Z // end inline asm 2026-02-21T08:54:39.2920976Z // begin inline asm 2026-02-21T08:54:39.2921170Z cp.async.cg.shared.global [ %r661 + 0 ], [ %rd116 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2921386Z // end inline asm 2026-02-21T08:54:39.2921522Z // begin inline asm 2026-02-21T08:54:39.2921708Z cp.async.cg.shared.global [ %r663 + 0 ], [ %rd117 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2921930Z // end inline asm 2026-02-21T08:54:39.2922065Z // begin inline asm 2026-02-21T08:54:39.2922284Z cp.async.cg.shared.global [ %r665 + 0 ], [ %rd118 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2922504Z // end inline asm 2026-02-21T08:54:39.2922632Z // begin inline asm 2026-02-21T08:54:39.2922824Z cp.async.cg.shared.global [ %r667 + 0 ], [ %rd119 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2923038Z // end inline asm 2026-02-21T08:54:39.2923172Z // begin inline asm 2026-02-21T08:54:39.2923358Z cp.async.cg.shared.global [ %r669 + 0 ], [ %rd120 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2923584Z // end inline asm 2026-02-21T08:54:39.2923719Z // begin inline asm 2026-02-21T08:54:39.2923903Z cp.async.cg.shared.global [ %r671 + 0 ], [ %rd121 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2924123Z // end inline asm 2026-02-21T08:54:39.2924249Z // begin inline asm 2026-02-21T08:54:39.2924441Z cp.async.cg.shared.global [ %r673 + 0 ], [ %rd122 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2924653Z // end inline asm 2026-02-21T08:54:39.2924828Z cp.async.commit_group; 2026-02-21T08:54:39.2925109Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2925400Z add.s64 %rd123, %rd91, 128; 2026-02-21T08:54:39.2925558Z add.s64 %rd124, 
%rd92, 128; 2026-02-21T08:54:39.2925707Z add.s64 %rd125, %rd93, 128; 2026-02-21T08:54:39.2925865Z add.s64 %rd126, %rd94, 128; 2026-02-21T08:54:39.2926008Z add.s64 %rd127, %rd95, 128; 2026-02-21T08:54:39.2926157Z add.s64 %rd128, %rd96, 128; 2026-02-21T08:54:39.2926301Z add.s64 %rd129, %rd97, 128; 2026-02-21T08:54:39.2926455Z add.s64 %rd130, %rd98, 128; 2026-02-21T08:54:39.2926710Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2927000Z bar.sync 0; 2026-02-21T08:54:39.2927132Z // begin inline asm 2026-02-21T08:54:39.2927321Z cp.async.cg.shared.global [ %r675 + 0 ], [ %rd123 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2927547Z // end inline asm 2026-02-21T08:54:39.2927676Z // begin inline asm 2026-02-21T08:54:39.2927873Z cp.async.cg.shared.global [ %r677 + 0 ], [ %rd124 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2928089Z // end inline asm 2026-02-21T08:54:39.2928224Z // begin inline asm 2026-02-21T08:54:39.2928412Z cp.async.cg.shared.global [ %r679 + 0 ], [ %rd125 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2928666Z // end inline asm 2026-02-21T08:54:39.2928804Z // begin inline asm 2026-02-21T08:54:39.2928995Z cp.async.cg.shared.global [ %r681 + 0 ], [ %rd126 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2929220Z // end inline asm 2026-02-21T08:54:39.2929356Z // begin inline asm 2026-02-21T08:54:39.2929561Z cp.async.cg.shared.global [ %r683 + 0 ], [ %rd127 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2929783Z // end inline asm 2026-02-21T08:54:39.2929926Z // begin inline asm 2026-02-21T08:54:39.2930116Z cp.async.cg.shared.global [ %r685 + 0 ], [ %rd128 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2930343Z // end inline asm 2026-02-21T08:54:39.2930482Z // begin inline asm 2026-02-21T08:54:39.2930673Z cp.async.cg.shared.global [ %r687 + 0 ], [ %rd129 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2930898Z // end inline asm 2026-02-21T08:54:39.2931029Z // begin inline asm 2026-02-21T08:54:39.2931228Z cp.async.cg.shared.global [ %r689 + 0 ], [ %rd130 + 0 
], 0x10, %r875; 2026-02-21T08:54:39.2931444Z // end inline asm 2026-02-21T08:54:39.2931589Z cp.async.commit_group; 2026-02-21T08:54:39.2931843Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2932165Z add.s64 %rd131, %rd99, 128; 2026-02-21T08:54:39.2932329Z add.s64 %rd132, %rd100, 128; 2026-02-21T08:54:39.2932486Z add.s64 %rd133, %rd101, 128; 2026-02-21T08:54:39.2932647Z add.s64 %rd134, %rd102, 128; 2026-02-21T08:54:39.2932798Z add.s64 %rd135, %rd103, 128; 2026-02-21T08:54:39.2932956Z add.s64 %rd136, %rd104, 128; 2026-02-21T08:54:39.2933104Z add.s64 %rd137, %rd105, 128; 2026-02-21T08:54:39.2933257Z add.s64 %rd138, %rd106, 128; 2026-02-21T08:54:39.2933510Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2933829Z // begin inline asm 2026-02-21T08:54:39.2934024Z cp.async.cg.shared.global [ %r691 + 0 ], [ %rd131 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2934241Z // end inline asm 2026-02-21T08:54:39.2934380Z // begin inline asm 2026-02-21T08:54:39.2934567Z cp.async.cg.shared.global [ %r693 + 0 ], [ %rd132 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2934833Z // end inline asm 2026-02-21T08:54:39.2934961Z // begin inline asm 2026-02-21T08:54:39.2935157Z cp.async.cg.shared.global [ %r695 + 0 ], [ %rd133 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2935381Z // end inline asm 2026-02-21T08:54:39.2935522Z // begin inline asm 2026-02-21T08:54:39.2935719Z cp.async.cg.shared.global [ %r697 + 0 ], [ %rd134 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2935946Z // end inline asm 2026-02-21T08:54:39.2936086Z // begin inline asm 2026-02-21T08:54:39.2936279Z cp.async.cg.shared.global [ %r699 + 0 ], [ %rd135 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2936541Z // end inline asm 2026-02-21T08:54:39.2936678Z // begin inline asm 2026-02-21T08:54:39.2936878Z cp.async.cg.shared.global [ %r701 + 0 ], [ %rd136 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2937102Z // end inline asm 2026-02-21T08:54:39.2937244Z // begin 
inline asm 2026-02-21T08:54:39.2937436Z cp.async.cg.shared.global [ %r703 + 0 ], [ %rd137 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2937667Z // end inline asm 2026-02-21T08:54:39.2937808Z // begin inline asm 2026-02-21T08:54:39.2938001Z cp.async.cg.shared.global [ %r705 + 0 ], [ %rd138 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2938238Z // end inline asm 2026-02-21T08:54:39.2938377Z cp.async.commit_group; 2026-02-21T08:54:39.2938647Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2938940Z add.s64 %rd139, %rd91, 192; 2026-02-21T08:54:39.2939111Z add.s64 %rd140, %rd92, 192; 2026-02-21T08:54:39.2939269Z add.s64 %rd141, %rd93, 192; 2026-02-21T08:54:39.2939444Z add.s64 %rd142, %rd94, 192; 2026-02-21T08:54:39.2939608Z add.s64 %rd143, %rd95, 192; 2026-02-21T08:54:39.2939760Z add.s64 %rd144, %rd96, 192; 2026-02-21T08:54:39.2939922Z add.s64 %rd145, %rd97, 192; 2026-02-21T08:54:39.2940074Z add.s64 %rd146, %rd98, 192; 2026-02-21T08:54:39.2940396Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2940690Z bar.sync 0; 2026-02-21T08:54:39.2940832Z // begin inline asm 2026-02-21T08:54:39.2941030Z cp.async.cg.shared.global [ %r707 + 0 ], [ %rd139 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2941265Z // end inline asm 2026-02-21T08:54:39.2941408Z // begin inline asm 2026-02-21T08:54:39.2941602Z cp.async.cg.shared.global [ %r709 + 0 ], [ %rd140 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2941836Z // end inline asm 2026-02-21T08:54:39.2941970Z // begin inline asm 2026-02-21T08:54:39.2942171Z cp.async.cg.shared.global [ %r711 + 0 ], [ %rd141 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2942398Z // end inline asm 2026-02-21T08:54:39.2942540Z // begin inline asm 2026-02-21T08:54:39.2942733Z cp.async.cg.shared.global [ %r713 + 0 ], [ %rd142 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2942968Z // end inline asm 2026-02-21T08:54:39.2943110Z // begin inline asm 2026-02-21T08:54:39.2943316Z cp.async.cg.shared.global 
[ %r715 + 0 ], [ %rd143 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2943537Z // end inline asm 2026-02-21T08:54:39.2943665Z // begin inline asm 2026-02-21T08:54:39.2943878Z cp.async.cg.shared.global [ %r717 + 0 ], [ %rd144 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2944099Z // end inline asm 2026-02-21T08:54:39.2944233Z // begin inline asm 2026-02-21T08:54:39.2944416Z cp.async.cg.shared.global [ %r719 + 0 ], [ %rd145 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2944637Z // end inline asm 2026-02-21T08:54:39.2944804Z // begin inline asm 2026-02-21T08:54:39.2944990Z cp.async.cg.shared.global [ %r721 + 0 ], [ %rd146 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2945210Z // end inline asm 2026-02-21T08:54:39.2945346Z cp.async.commit_group; 2026-02-21T08:54:39.2945634Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2945926Z add.s64 %rd147, %rd99, 192; 2026-02-21T08:54:39.2946087Z add.s64 %rd148, %rd100, 192; 2026-02-21T08:54:39.2946242Z add.s64 %rd149, %rd101, 192; 2026-02-21T08:54:39.2946401Z add.s64 %rd150, %rd102, 192; 2026-02-21T08:54:39.2946559Z add.s64 %rd151, %rd103, 192; 2026-02-21T08:54:39.2946707Z add.s64 %rd152, %rd104, 192; 2026-02-21T08:54:39.2946862Z add.s64 %rd153, %rd105, 192; 2026-02-21T08:54:39.2947012Z add.s64 %rd154, %rd106, 192; 2026-02-21T08:54:39.2947277Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2947555Z // begin inline asm 2026-02-21T08:54:39.2947752Z cp.async.cg.shared.global [ %r723 + 0 ], [ %rd147 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2947974Z // end inline asm 2026-02-21T08:54:39.2948140Z // begin inline asm 2026-02-21T08:54:39.2948341Z cp.async.cg.shared.global [ %r725 + 0 ], [ %rd148 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2948563Z // end inline asm 2026-02-21T08:54:39.2948702Z // begin inline asm 2026-02-21T08:54:39.2948893Z cp.async.cg.shared.global [ %r727 + 0 ], [ %rd149 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2949120Z // end inline asm 
2026-02-21T08:54:39.2949246Z // begin inline asm 2026-02-21T08:54:39.2949438Z cp.async.cg.shared.global [ %r729 + 0 ], [ %rd150 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2949650Z // end inline asm 2026-02-21T08:54:39.2949784Z // begin inline asm 2026-02-21T08:54:39.2949974Z cp.async.cg.shared.global [ %r731 + 0 ], [ %rd151 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2950187Z // end inline asm 2026-02-21T08:54:39.2950319Z // begin inline asm 2026-02-21T08:54:39.2950504Z cp.async.cg.shared.global [ %r733 + 0 ], [ %rd152 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2950724Z // end inline asm 2026-02-21T08:54:39.2950852Z // begin inline asm 2026-02-21T08:54:39.2951045Z cp.async.cg.shared.global [ %r735 + 0 ], [ %rd153 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2951260Z // end inline asm 2026-02-21T08:54:39.2951394Z // begin inline asm 2026-02-21T08:54:39.2951584Z cp.async.cg.shared.global [ %r737 + 0 ], [ %rd154 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2951824Z // end inline asm 2026-02-21T08:54:39.2951966Z cp.async.commit_group; 2026-02-21T08:54:39.2952216Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2952504Z add.s64 %rd155, %rd91, 256; 2026-02-21T08:54:39.2952655Z add.s64 %rd156, %rd92, 256; 2026-02-21T08:54:39.2952812Z add.s64 %rd157, %rd93, 256; 2026-02-21T08:54:39.2952957Z add.s64 %rd158, %rd94, 256; 2026-02-21T08:54:39.2953107Z add.s64 %rd159, %rd95, 256; 2026-02-21T08:54:39.2953258Z add.s64 %rd160, %rd96, 256; 2026-02-21T08:54:39.2953402Z add.s64 %rd161, %rd97, 256; 2026-02-21T08:54:39.2953551Z add.s64 %rd162, %rd98, 256; 2026-02-21T08:54:39.2953803Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2954085Z bar.sync 0; 2026-02-21T08:54:39.2954209Z // begin inline asm 2026-02-21T08:54:39.2954404Z cp.async.cg.shared.global [ %r739 + 0 ], [ %rd155 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2954617Z // end inline asm 2026-02-21T08:54:39.2954792Z // begin inline asm 
2026-02-21T08:54:39.2954979Z cp.async.cg.shared.global [ %r741 + 0 ], [ %rd156 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2955200Z // end inline asm 2026-02-21T08:54:39.2955359Z // begin inline asm 2026-02-21T08:54:39.2955548Z cp.async.cg.shared.global [ %r743 + 0 ], [ %rd157 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2955767Z // end inline asm 2026-02-21T08:54:39.2955896Z // begin inline asm 2026-02-21T08:54:39.2956086Z cp.async.cg.shared.global [ %r745 + 0 ], [ %rd158 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2956303Z // end inline asm 2026-02-21T08:54:39.2956440Z // begin inline asm 2026-02-21T08:54:39.2956624Z cp.async.cg.shared.global [ %r747 + 0 ], [ %rd159 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2956870Z // end inline asm 2026-02-21T08:54:39.2957006Z // begin inline asm 2026-02-21T08:54:39.2957191Z cp.async.cg.shared.global [ %r749 + 0 ], [ %rd160 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2957414Z // end inline asm 2026-02-21T08:54:39.2957546Z // begin inline asm 2026-02-21T08:54:39.2957741Z cp.async.cg.shared.global [ %r751 + 0 ], [ %rd161 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2957956Z // end inline asm 2026-02-21T08:54:39.2958094Z // begin inline asm 2026-02-21T08:54:39.2958280Z cp.async.cg.shared.global [ %r753 + 0 ], [ %rd162 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2958497Z // end inline asm 2026-02-21T08:54:39.2958637Z cp.async.commit_group; 2026-02-21T08:54:39.2958886Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2959171Z add.s64 %rd163, %rd99, 256; 2026-02-21T08:54:39.2959321Z add.s64 %rd164, %rd100, 256; 2026-02-21T08:54:39.2959507Z add.s64 %rd165, %rd101, 256; 2026-02-21T08:54:39.2959660Z add.s64 %rd166, %rd102, 256; 2026-02-21T08:54:39.2959816Z add.s64 %rd167, %rd103, 256; 2026-02-21T08:54:39.2959963Z add.s64 %rd168, %rd104, 256; 2026-02-21T08:54:39.2960115Z add.s64 %rd169, %rd105, 256; 2026-02-21T08:54:39.2960270Z add.s64 %rd170, %rd106, 256; 2026-02-21T08:54:39.2960519Z .loc 1 55 87 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2960798Z // begin inline asm 2026-02-21T08:54:39.2960987Z cp.async.cg.shared.global [ %r755 + 0 ], [ %rd163 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2961209Z // end inline asm 2026-02-21T08:54:39.2961338Z // begin inline asm 2026-02-21T08:54:39.2961531Z cp.async.cg.shared.global [ %r757 + 0 ], [ %rd164 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2961746Z // end inline asm 2026-02-21T08:54:39.2961883Z // begin inline asm 2026-02-21T08:54:39.2962072Z cp.async.cg.shared.global [ %r759 + 0 ], [ %rd165 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2962286Z // end inline asm 2026-02-21T08:54:39.2962421Z // begin inline asm 2026-02-21T08:54:39.2962604Z cp.async.cg.shared.global [ %r761 + 0 ], [ %rd166 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2962822Z // end inline asm 2026-02-21T08:54:39.2962949Z // begin inline asm 2026-02-21T08:54:39.2963169Z cp.async.cg.shared.global [ %r763 + 0 ], [ %rd167 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2963381Z // end inline asm 2026-02-21T08:54:39.2963514Z // begin inline asm 2026-02-21T08:54:39.2963703Z cp.async.cg.shared.global [ %r765 + 0 ], [ %rd168 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2963915Z // end inline asm 2026-02-21T08:54:39.2964049Z // begin inline asm 2026-02-21T08:54:39.2964232Z cp.async.cg.shared.global [ %r767 + 0 ], [ %rd169 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2964452Z // end inline asm 2026-02-21T08:54:39.2964577Z // begin inline asm 2026-02-21T08:54:39.2964794Z cp.async.cg.shared.global [ %r769 + 0 ], [ %rd170 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2965012Z // end inline asm 2026-02-21T08:54:39.2965153Z cp.async.commit_group; 2026-02-21T08:54:39.2965413Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2965699Z cp.async.wait_group 8; 2026-02-21T08:54:39.2965854Z bar.sync 0; 2026-02-21T08:54:39.2966085Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 
2026-02-21T08:54:39.2966374Z setp.ne.b32 %p56, %r34, 0; 2026-02-21T08:54:39.2966527Z @%p56 bra $L__BB0_3; 2026-02-21T08:54:39.2966699Z // %bb.2: 2026-02-21T08:54:39.2966930Z .loc 1 0 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:0:52 2026-02-21T08:54:39.2967219Z add.s32 %r860, %r48, 131072; 2026-02-21T08:54:39.2967383Z add.s32 %r861, %r48, 139296; 2026-02-21T08:54:39.2967538Z bfe.u32 %r862, %r861, 4, 14; 2026-02-21T08:54:39.2967700Z cvt.u64.u32 %rd180, %r862; 2026-02-21T08:54:39.2967871Z or.b64 %rd177, %rd180, -9223371899348713472; 2026-02-21T08:54:39.2968057Z add.s32 %r863, %r48, 139264; 2026-02-21T08:54:39.2968251Z bfe.u32 %r864, %r863, 4, 14; 2026-02-21T08:54:39.2968405Z cvt.u64.u32 %rd181, %r864; 2026-02-21T08:54:39.2968567Z or.b64 %rd175, %rd181, -9223371899348713472; 2026-02-21T08:54:39.2968745Z add.s32 %r865, %r48, 229376; 2026-02-21T08:54:39.2968899Z add.s32 %r866, %r48, 229408; 2026-02-21T08:54:39.2969043Z bfe.u32 %r867, %r866, 4, 14; 2026-02-21T08:54:39.2969197Z cvt.u64.u32 %rd182, %r867; 2026-02-21T08:54:39.2969354Z or.b64 %rd174, %rd182, -9223371899348713472; 2026-02-21T08:54:39.2969533Z add.s32 %r868, %r48, 131104; 2026-02-21T08:54:39.2969679Z bfe.u32 %r869, %r868, 4, 14; 2026-02-21T08:54:39.2969833Z cvt.u64.u32 %rd183, %r869; 2026-02-21T08:54:39.2969994Z or.b64 %rd173, %rd183, -9223371899348713472; 2026-02-21T08:54:39.2970172Z bfe.u32 %r870, %r865, 4, 14; 2026-02-21T08:54:39.2970323Z cvt.u64.u32 %rd184, %r870; 2026-02-21T08:54:39.2970480Z or.b64 %rd172, %rd184, -9223371899348713472; 2026-02-21T08:54:39.2970679Z bfe.u32 %r871, %r860, 4, 14; 2026-02-21T08:54:39.2970830Z cvt.u64.u32 %rd185, %r871; 2026-02-21T08:54:39.2970996Z or.b64 %rd171, %rd185, -9223371899348713472; 2026-02-21T08:54:39.2971272Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.2971564Z elect.sync %r872|%p58, -1; 2026-02-21T08:54:39.2971718Z mov.b32 %r852, 138412048; 2026-02-21T08:54:39.2971872Z mov.pred %p57, 0; 
2026-02-21T08:54:39.2972008Z // begin inline asm 2026-02-21T08:54:39.2972242Z @%p58 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 0 ], %rd171, %rd172, %r852, %p57; 2026-02-21T08:54:39.2972497Z // end inline asm 2026-02-21T08:54:39.2972627Z // begin inline asm 2026-02-21T08:54:39.2972845Z @%p58 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 0 ], %rd173, %rd174, %r852, %p59; 2026-02-21T08:54:39.2973094Z // end inline asm 2026-02-21T08:54:39.2973231Z // begin inline asm 2026-02-21T08:54:39.2973445Z @%p58 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 256 ], %rd175, %rd172, %r852, %p57; 2026-02-21T08:54:39.2973699Z // end inline asm 2026-02-21T08:54:39.2973838Z // begin inline asm 2026-02-21T08:54:39.2974053Z @%p58 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 256 ], %rd177, %rd174, %r852, %p59; 2026-02-21T08:54:39.2974312Z // end inline asm 2026-02-21T08:54:39.2974476Z add.s32 %r873, %r48, 327680; 2026-02-21T08:54:39.2974637Z cvt.u64.u32 %rd179, %r873; 2026-02-21T08:54:39.2974822Z // begin inline asm 2026-02-21T08:54:39.2975028Z @%p58 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd179]; 2026-02-21T08:54:39.2975260Z // end inline asm 2026-02-21T08:54:39.2975389Z $L__BB0_3: 2026-02-21T08:54:39.2975631Z .loc 1 0 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:0:52 2026-02-21T08:54:39.2975915Z add.s32 %r23, %r48, %r790; 2026-02-21T08:54:39.2976072Z add.s32 %r24, %r48, %r791; 2026-02-21T08:54:39.2976220Z add.s32 %r25, %r48, %r792; 2026-02-21T08:54:39.2976373Z add.s32 %r26, %r48, %r793; 2026-02-21T08:54:39.2976521Z add.s32 %r27, %r48, %r794; 2026-02-21T08:54:39.2976677Z add.s32 %r28, %r48, %r795; 2026-02-21T08:54:39.2976823Z add.s32 %r29, %r48, %r796; 2026-02-21T08:54:39.2976982Z add.s32 %r30, %r48, %r797; 2026-02-21T08:54:39.2977250Z .loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.2977534Z add.s64 %rd186, %rd91, 320; 2026-02-21T08:54:39.2977696Z add.s64 %rd187, %rd92, 320; 2026-02-21T08:54:39.2977843Z 
add.s64 %rd188, %rd93, 320; 2026-02-21T08:54:39.2978023Z add.s64 %rd189, %rd94, 320; 2026-02-21T08:54:39.2978184Z add.s64 %rd190, %rd95, 320; 2026-02-21T08:54:39.2978346Z add.s64 %rd191, %rd96, 320; 2026-02-21T08:54:39.2978498Z add.s64 %rd192, %rd97, 320; 2026-02-21T08:54:39.2978660Z add.s64 %rd193, %rd98, 320; 2026-02-21T08:54:39.2978932Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.2979216Z bar.sync 0; 2026-02-21T08:54:39.2979358Z // begin inline asm 2026-02-21T08:54:39.2979600Z cp.async.cg.shared.global [ %r874 + 0 ], [ %rd186 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2979836Z // end inline asm 2026-02-21T08:54:39.2979971Z // begin inline asm 2026-02-21T08:54:39.2980180Z cp.async.cg.shared.global [ %r876 + 0 ], [ %rd187 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2980406Z // end inline asm 2026-02-21T08:54:39.2980549Z // begin inline asm 2026-02-21T08:54:39.2980755Z cp.async.cg.shared.global [ %r878 + 0 ], [ %rd188 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2980977Z // end inline asm 2026-02-21T08:54:39.2981120Z // begin inline asm 2026-02-21T08:54:39.2981314Z cp.async.cg.shared.global [ %r880 + 0 ], [ %rd189 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2981543Z // end inline asm 2026-02-21T08:54:39.2981676Z // begin inline asm 2026-02-21T08:54:39.2981875Z cp.async.cg.shared.global [ %r882 + 0 ], [ %rd190 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2982095Z // end inline asm 2026-02-21T08:54:39.2982235Z // begin inline asm 2026-02-21T08:54:39.2982473Z cp.async.cg.shared.global [ %r884 + 0 ], [ %rd191 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2982698Z // end inline asm 2026-02-21T08:54:39.2982839Z // begin inline asm 2026-02-21T08:54:39.2983034Z cp.async.cg.shared.global [ %r886 + 0 ], [ %rd192 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2983267Z // end inline asm 2026-02-21T08:54:39.2983400Z // begin inline asm 2026-02-21T08:54:39.2983600Z cp.async.cg.shared.global [ %r888 + 0 ], [ %rd193 + 0 ], 0x10, %r875; 
2026-02-21T08:54:39.2983819Z // end inline asm 2026-02-21T08:54:39.2983964Z cp.async.commit_group; 2026-02-21T08:54:39.2984233Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.2984526Z add.s64 %rd194, %rd99, 320; 2026-02-21T08:54:39.2984736Z add.s64 %rd195, %rd100, 320; 2026-02-21T08:54:39.2984901Z add.s64 %rd196, %rd101, 320; 2026-02-21T08:54:39.2985067Z add.s64 %rd197, %rd102, 320; 2026-02-21T08:54:39.2985222Z add.s64 %rd198, %rd103, 320; 2026-02-21T08:54:39.2985385Z add.s64 %rd199, %rd104, 320; 2026-02-21T08:54:39.2985548Z add.s64 %rd200, %rd105, 320; 2026-02-21T08:54:39.2985703Z add.s64 %rd201, %rd106, 320; 2026-02-21T08:54:39.2985969Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.2986272Z // begin inline asm 2026-02-21T08:54:39.2986476Z cp.async.cg.shared.global [ %r890 + 0 ], [ %rd194 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2986702Z // end inline asm 2026-02-21T08:54:39.2986847Z // begin inline asm 2026-02-21T08:54:39.2987041Z cp.async.cg.shared.global [ %r892 + 0 ], [ %rd195 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2987264Z // end inline asm 2026-02-21T08:54:39.2987393Z // begin inline asm 2026-02-21T08:54:39.2987587Z cp.async.cg.shared.global [ %r894 + 0 ], [ %rd196 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2987807Z // end inline asm 2026-02-21T08:54:39.2987934Z // begin inline asm 2026-02-21T08:54:39.2988127Z cp.async.cg.shared.global [ %r896 + 0 ], [ %rd197 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2988342Z // end inline asm 2026-02-21T08:54:39.2988477Z // begin inline asm 2026-02-21T08:54:39.2988666Z cp.async.cg.shared.global [ %r898 + 0 ], [ %rd198 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2988888Z // end inline asm 2026-02-21T08:54:39.2989018Z // begin inline asm 2026-02-21T08:54:39.2989209Z cp.async.cg.shared.global [ %r900 + 0 ], [ %rd199 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2989429Z // end inline asm 2026-02-21T08:54:39.2989559Z // begin inline asm 
2026-02-21T08:54:39.2989777Z cp.async.cg.shared.global [ %r902 + 0 ], [ %rd200 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2989991Z // end inline asm 2026-02-21T08:54:39.2990126Z // begin inline asm 2026-02-21T08:54:39.2990309Z cp.async.cg.shared.global [ %r904 + 0 ], [ %rd201 + 0 ], 0x10, %r875; 2026-02-21T08:54:39.2990533Z // end inline asm 2026-02-21T08:54:39.2990666Z cp.async.commit_group; 2026-02-21T08:54:39.2990925Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.2991255Z and.b32 %r910, %r1, 3; 2026-02-21T08:54:39.2991410Z mul.wide.u32 %rd18, %r910, 16; 2026-02-21T08:54:39.2991577Z cvt.u16.u32 %rs1, %r3; 2026-02-21T08:54:39.2991721Z shr.u16 %rs2, %rs1, 7; 2026-02-21T08:54:39.2991867Z and.b16 %rs3, %rs2, 3; 2026-02-21T08:54:39.2992015Z mul.wide.u16 %r911, %rs3, 4096; 2026-02-21T08:54:39.2992183Z shl.b32 %r912, %r31, 8; 2026-02-21T08:54:39.2992327Z or.b32 %r913, %r911, %r912; 2026-02-21T08:54:39.2992482Z or.b32 %r914, %r913, %r4; 2026-02-21T08:54:39.2992643Z mad.wide.u32 %rd203, %r914, 4096, %rd70; 2026-02-21T08:54:39.2992828Z add.s64 %rd1291, %rd203, 384; 2026-02-21T08:54:39.2992990Z or.b32 %r915, %r913, %r5; 2026-02-21T08:54:39.2993135Z or.b32 %r916, %r915, 192; 2026-02-21T08:54:39.2993298Z mad.wide.u32 %rd204, %r916, 4096, %rd70; 2026-02-21T08:54:39.2993472Z add.s64 %rd1290, %rd204, 384; 2026-02-21T08:54:39.2993632Z or.b32 %r917, %r915, 160; 2026-02-21T08:54:39.2993810Z mad.wide.u32 %rd205, %r917, 4096, %rd70; 2026-02-21T08:54:39.2993993Z add.s64 %rd1289, %rd205, 384; 2026-02-21T08:54:39.2994143Z or.b32 %r918, %r915, 128; 2026-02-21T08:54:39.2994301Z mad.wide.u32 %rd206, %r918, 4096, %rd70; 2026-02-21T08:54:39.2994478Z add.s64 %rd1288, %rd206, 384; 2026-02-21T08:54:39.2994631Z or.b32 %r919, %r915, 96; 2026-02-21T08:54:39.2994823Z mad.wide.u32 %rd207, %r919, 4096, %rd70; 2026-02-21T08:54:39.2994994Z add.s64 %rd1287, %rd207, 384; 2026-02-21T08:54:39.2995156Z or.b32 %r920, %r915, 64; 2026-02-21T08:54:39.2995314Z 
mad.wide.u32 %rd208, %r920, 4096, %rd70; 2026-02-21T08:54:39.2995496Z add.s64 %rd1286, %rd208, 384; 2026-02-21T08:54:39.2995645Z or.b32 %r921, %r915, 32; 2026-02-21T08:54:39.2995805Z mad.wide.u32 %rd209, %r921, 4096, %rd70; 2026-02-21T08:54:39.2995973Z add.s64 %rd1285, %rd209, 384; 2026-02-21T08:54:39.2996137Z mad.wide.u32 %rd210, %r915, 4096, %rd70; 2026-02-21T08:54:39.2996311Z add.s64 %rd1284, %rd210, 384; 2026-02-21T08:54:39.2996463Z shl.b32 %r922, %r3, 15; 2026-02-21T08:54:39.2996620Z and.b32 %r923, %r922, 3670016; 2026-02-21T08:54:39.2996776Z shl.b32 %r924, %r4, 11; 2026-02-21T08:54:39.2996927Z or.b32 %r925, %r923, %r924; 2026-02-21T08:54:39.2997086Z mad.wide.u32 %rd211, %r925, 2, %rd69; 2026-02-21T08:54:39.2997263Z add.s64 %rd1283, %rd211, 384; 2026-02-21T08:54:39.2997442Z add.s32 %r926, %r1554, %r5; 2026-02-21T08:54:39.2997599Z add.s32 %r927, %r926, 192; 2026-02-21T08:54:39.2997763Z mad.wide.u32 %rd212, %r927, 4096, %rd69; 2026-02-21T08:54:39.2997932Z add.s64 %rd1282, %rd212, 384; 2026-02-21T08:54:39.2998091Z add.s32 %r928, %r926, 160; 2026-02-21T08:54:39.2998250Z mad.wide.u32 %rd213, %r928, 4096, %rd69; 2026-02-21T08:54:39.2998425Z add.s64 %rd1281, %rd213, 384; 2026-02-21T08:54:39.2998575Z add.s32 %r929, %r926, 128; 2026-02-21T08:54:39.2998739Z mad.wide.u32 %rd214, %r929, 4096, %rd69; 2026-02-21T08:54:39.2998907Z add.s64 %rd1280, %rd214, 384; 2026-02-21T08:54:39.2999064Z add.s32 %r930, %r926, 96; 2026-02-21T08:54:39.2999220Z mad.wide.u32 %rd215, %r930, 4096, %rd69; 2026-02-21T08:54:39.2999397Z add.s64 %rd1279, %rd215, 384; 2026-02-21T08:54:39.2999553Z add.s32 %r931, %r926, 64; 2026-02-21T08:54:39.2999706Z mad.wide.u32 %rd216, %r931, 4096, %rd69; 2026-02-21T08:54:39.2999880Z add.s64 %rd1278, %rd216, 384; 2026-02-21T08:54:39.3000029Z add.s32 %r932, %r926, 32; 2026-02-21T08:54:39.3000186Z mad.wide.u32 %rd217, %r932, 4096, %rd69; 2026-02-21T08:54:39.3000351Z add.s64 %rd1277, %rd217, 384; 2026-02-21T08:54:39.3000506Z shl.b32 %r933, %r5, 11; 
2026-02-21T08:54:39.3000674Z or.b32 %r934, %r923, %r933; 2026-02-21T08:54:39.3000839Z mad.wide.u32 %rd218, %r934, 2, %rd69; 2026-02-21T08:54:39.3001008Z add.s64 %rd1276, %rd218, 384; 2026-02-21T08:54:39.3001153Z mov.b32 %r2333, 1; 2026-02-21T08:54:39.3001290Z mov.b32 %r2332, 5; 2026-02-21T08:54:39.3001421Z mov.b32 %r2329, 0; 2026-02-21T08:54:39.3001561Z mov.b64 %rd1275, -32; 2026-02-21T08:54:39.3001699Z mov.b32 %r2331, %r2329; 2026-02-21T08:54:39.3001849Z mov.b32 %r2334, %r2329; 2026-02-21T08:54:39.3002017Z bra.uni $L__BB0_4; 2026-02-21T08:54:39.3002202Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:39.3002524Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.3002819Z add.s64 %rd1275, %rd1275, 32; 2026-02-21T08:54:39.3002991Z setp.lt.u64 %p77, %rd1275, 1856; 2026-02-21T08:54:39.3003269Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3003559Z // begin inline asm 2026-02-21T08:54:39.3003692Z 2026-02-21T08:54:39.3003813Z { 2026-02-21T08:54:39.3003933Z .reg .pred complete; 2026-02-21T08:54:39.3004087Z waitLoop: 2026-02-21T08:54:39.3004272Z mbarrier.try_wait.parity.shared.b64 complete, [%r2330], %r2329; 2026-02-21T08:54:39.3004513Z @!complete bra.uni waitLoop; 2026-02-21T08:54:39.3004666Z } 2026-02-21T08:54:39.3004757Z 2026-02-21T08:54:39.3004839Z // end inline asm 2026-02-21T08:54:39.3004984Z add.s32 %r998, %r2333, 1; 2026-02-21T08:54:39.3005133Z setp.gt.s32 %p78, %r998, 1; 2026-02-21T08:54:39.3005300Z selp.b32 %r2333, 0, %r998, %p78; 2026-02-21T08:54:39.3005463Z selp.b32 %r999, 1, 0, %p78; 2026-02-21T08:54:39.3005622Z xor.b32 %r46, %r2334, %r999; 2026-02-21T08:54:39.3005880Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.3006162Z add.s32 %r1000, %r2332, 1; 2026-02-21T08:54:39.3006323Z setp.gt.s32 %p79, %r1000, 5; 2026-02-21T08:54:39.3006483Z selp.b32 %r2332, 0, %r1000, %p79; 2026-02-21T08:54:39.3006759Z 
.loc 1 54 32 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:32 2026-02-21T08:54:39.3007041Z add.s64 %rd234, %rd1276, %rd18; 2026-02-21T08:54:39.3007213Z add.s64 %rd235, %rd1277, %rd18; 2026-02-21T08:54:39.3007371Z add.s64 %rd236, %rd1278, %rd18; 2026-02-21T08:54:39.3007532Z add.s64 %rd237, %rd1279, %rd18; 2026-02-21T08:54:39.3007685Z add.s64 %rd238, %rd1280, %rd18; 2026-02-21T08:54:39.3007846Z add.s64 %rd239, %rd1281, %rd18; 2026-02-21T08:54:39.3007903Z add.s64 %rd240, %rd1282, %rd18; 2026-02-21T08:54:39.3008073Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.3008157Z add.s64 %rd241, %rd1283, %rd18; 2026-02-21T08:54:39.3008214Z shl.b32 %r1001, %r2332, 14; 2026-02-21T08:54:39.3008273Z bar.sync 0; 2026-02-21T08:54:39.3008330Z add.s32 %r1003, %r48, %r1001; 2026-02-21T08:54:39.3008388Z add.s32 %r1004, %r1003, %r6; 2026-02-21T08:54:39.3008448Z add.s32 %r966, %r1004, 131072; 2026-02-21T08:54:39.3008514Z selp.b32 %r967, 16, 0, %p77; 2026-02-21T08:54:39.3008568Z // begin inline asm 2026-02-21T08:54:39.3008682Z cp.async.cg.shared.global [ %r966 + 0 ], [ %rd234 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3008740Z // end inline asm 2026-02-21T08:54:39.3008795Z add.s32 %r968, %r1004, 133120; 2026-02-21T08:54:39.3008848Z // begin inline asm 2026-02-21T08:54:39.3008964Z cp.async.cg.shared.global [ %r968 + 0 ], [ %rd235 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3009024Z // end inline asm 2026-02-21T08:54:39.3009079Z add.s32 %r970, %r1004, 135168; 2026-02-21T08:54:39.3009132Z // begin inline asm 2026-02-21T08:54:39.3009251Z cp.async.cg.shared.global [ %r970 + 0 ], [ %rd236 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3009304Z // end inline asm 2026-02-21T08:54:39.3009359Z add.s32 %r972, %r1004, 137216; 2026-02-21T08:54:39.3009412Z // begin inline asm 2026-02-21T08:54:39.3009558Z cp.async.cg.shared.global [ %r972 + 0 ], [ %rd237 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3009615Z // end inline asm 2026-02-21T08:54:39.3009671Z 
add.s32 %r974, %r1004, 139264; 2026-02-21T08:54:39.3009733Z // begin inline asm 2026-02-21T08:54:39.3009841Z cp.async.cg.shared.global [ %r974 + 0 ], [ %rd238 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3009893Z // end inline asm 2026-02-21T08:54:39.3009954Z add.s32 %r976, %r1004, 141312; 2026-02-21T08:54:39.3010009Z // begin inline asm 2026-02-21T08:54:39.3010147Z cp.async.cg.shared.global [ %r976 + 0 ], [ %rd239 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3010199Z // end inline asm 2026-02-21T08:54:39.3010261Z add.s32 %r978, %r1004, 143360; 2026-02-21T08:54:39.3010314Z // begin inline asm 2026-02-21T08:54:39.3010422Z cp.async.cg.shared.global [ %r978 + 0 ], [ %rd240 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3010480Z // end inline asm 2026-02-21T08:54:39.3010535Z add.s32 %r980, %r1004, 145408; 2026-02-21T08:54:39.3010588Z // begin inline asm 2026-02-21T08:54:39.3010697Z cp.async.cg.shared.global [ %r980 + 0 ], [ %rd241 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3010757Z // end inline asm 2026-02-21T08:54:39.3010816Z cp.async.commit_group; 2026-02-21T08:54:39.3010985Z .loc 1 55 34 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:34 2026-02-21T08:54:39.3011051Z add.s64 %rd242, %rd1284, %rd18; 2026-02-21T08:54:39.3011107Z add.s64 %rd243, %rd1285, %rd18; 2026-02-21T08:54:39.3011187Z add.s64 %rd244, %rd1286, %rd18; 2026-02-21T08:54:39.3011254Z add.s64 %rd245, %rd1287, %rd18; 2026-02-21T08:54:39.3011310Z add.s64 %rd246, %rd1288, %rd18; 2026-02-21T08:54:39.3011366Z add.s64 %rd247, %rd1289, %rd18; 2026-02-21T08:54:39.3011421Z add.s64 %rd248, %rd1290, %rd18; 2026-02-21T08:54:39.3011597Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.3011653Z add.s64 %rd249, %rd1291, %rd18; 2026-02-21T08:54:39.3011709Z add.s32 %r982, %r1004, 229376; 2026-02-21T08:54:39.3011773Z // begin inline asm 2026-02-21T08:54:39.3011883Z cp.async.cg.shared.global [ %r982 + 0 ], [ %rd242 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3011938Z // end inline asm 
2026-02-21T08:54:39.3011995Z add.s32 %r984, %r1004, 231424; 2026-02-21T08:54:39.3012065Z // begin inline asm 2026-02-21T08:54:39.3012174Z cp.async.cg.shared.global [ %r984 + 0 ], [ %rd243 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3012227Z // end inline asm 2026-02-21T08:54:39.3012294Z add.s32 %r986, %r1004, 233472; 2026-02-21T08:54:39.3012349Z // begin inline asm 2026-02-21T08:54:39.3012458Z cp.async.cg.shared.global [ %r986 + 0 ], [ %rd244 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3012517Z // end inline asm 2026-02-21T08:54:39.3012571Z add.s32 %r988, %r1004, 235520; 2026-02-21T08:54:39.3012650Z // begin inline asm 2026-02-21T08:54:39.3012759Z cp.async.cg.shared.global [ %r988 + 0 ], [ %rd245 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3012818Z // end inline asm 2026-02-21T08:54:39.3012874Z add.s32 %r990, %r1004, 237568; 2026-02-21T08:54:39.3012926Z // begin inline asm 2026-02-21T08:54:39.3013040Z cp.async.cg.shared.global [ %r990 + 0 ], [ %rd246 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3013092Z // end inline asm 2026-02-21T08:54:39.3013148Z add.s32 %r992, %r1004, 239616; 2026-02-21T08:54:39.3013201Z // begin inline asm 2026-02-21T08:54:39.3013315Z cp.async.cg.shared.global [ %r992 + 0 ], [ %rd247 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3013367Z // end inline asm 2026-02-21T08:54:39.3013422Z add.s32 %r994, %r1004, 241664; 2026-02-21T08:54:39.3013485Z // begin inline asm 2026-02-21T08:54:39.3013593Z cp.async.cg.shared.global [ %r994 + 0 ], [ %rd248 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3013645Z // end inline asm 2026-02-21T08:54:39.3013706Z add.s32 %r996, %r1004, 243712; 2026-02-21T08:54:39.3013760Z // begin inline asm 2026-02-21T08:54:39.3013869Z cp.async.cg.shared.global [ %r996 + 0 ], [ %rd249 + 0 ], 0x10, %r967; 2026-02-21T08:54:39.3013921Z // end inline asm 2026-02-21T08:54:39.3014010Z cp.async.commit_group; 2026-02-21T08:54:39.3014181Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.3014241Z add.s64 %rd1291, %rd1291, 
64; 2026-02-21T08:54:39.3014305Z add.s64 %rd1290, %rd1290, 64; 2026-02-21T08:54:39.3014361Z add.s64 %rd1289, %rd1289, 64; 2026-02-21T08:54:39.3014417Z add.s64 %rd1288, %rd1288, 64; 2026-02-21T08:54:39.3014474Z add.s64 %rd1287, %rd1287, 64; 2026-02-21T08:54:39.3014536Z add.s64 %rd1286, %rd1286, 64; 2026-02-21T08:54:39.3014625Z add.s64 %rd1285, %rd1285, 64; 2026-02-21T08:54:39.3014716Z add.s64 %rd1284, %rd1284, 64; 2026-02-21T08:54:39.3014781Z add.s64 %rd1283, %rd1283, 64; 2026-02-21T08:54:39.3014836Z add.s64 %rd1282, %rd1282, 64; 2026-02-21T08:54:39.3014893Z add.s64 %rd1281, %rd1281, 64; 2026-02-21T08:54:39.3014956Z add.s64 %rd1280, %rd1280, 64; 2026-02-21T08:54:39.3015011Z add.s64 %rd1279, %rd1279, 64; 2026-02-21T08:54:39.3015065Z add.s64 %rd1278, %rd1278, 64; 2026-02-21T08:54:39.3015121Z add.s64 %rd1277, %rd1277, 64; 2026-02-21T08:54:39.3015182Z add.s64 %rd1276, %rd1276, 64; 2026-02-21T08:54:39.3015245Z setp.lt.u64 %p80, %rd1275, 1984; 2026-02-21T08:54:39.3015303Z mov.b32 %r2329, %r2334; 2026-02-21T08:54:39.3015366Z mov.b32 %r2330, %r1005; 2026-02-21T08:54:39.3015419Z mov.b32 %r2334, %r46; 2026-02-21T08:54:39.3015475Z @%p80 bra $L__BB0_4; 2026-02-21T08:54:39.3015529Z bra.uni $L__BB0_7; 2026-02-21T08:54:39.3015665Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:54:39.3015725Z add.s32 %r936, %r2331, 1; 2026-02-21T08:54:39.3015784Z setp.gt.s32 %p67, %r936, 5; 2026-02-21T08:54:39.3015850Z selp.b32 %r2331, 0, %r936, %p67; 2026-02-21T08:54:39.3016022Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.3016083Z cp.async.wait_group 8; 2026-02-21T08:54:39.3016134Z bar.sync 0; 2026-02-21T08:54:39.3016311Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.3016367Z shl.b32 %r937, %r2333, 3; 2026-02-21T08:54:39.3016422Z add.s32 %r939, %r48, %r937; 2026-02-21T08:54:39.3016485Z add.s32 %r1005, %r939, 327680; 2026-02-21T08:54:39.3016651Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3016706Z @%p56 bra $L__BB0_6; 2026-02-21T08:54:39.3016808Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:39.3016972Z .loc 1 55 87 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:55:87 2026-02-21T08:54:39.3017029Z shl.b32 %r948, %r2331, 14; 2026-02-21T08:54:39.3017091Z add.s32 %r950, %r48, %r948; 2026-02-21T08:54:39.3017173Z add.s32 %r951, %r950, 229376; 2026-02-21T08:54:39.3017339Z .loc 1 54 85 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:54:85 2026-02-21T08:54:39.3017395Z add.s32 %r952, %r950, 131072; 2026-02-21T08:54:39.3017568Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3017632Z elect.sync %r953|%p69, -1; 2026-02-21T08:54:39.3017691Z bfe.u32 %r954, %r952, 4, 14; 2026-02-21T08:54:39.3017756Z cvt.u64.u32 %rd228, %r954; 2026-02-21T08:54:39.3017828Z or.b64 %rd219, %rd228, -9223371899348713472; 2026-02-21T08:54:39.3017885Z bfe.u32 %r955, %r951, 4, 14; 2026-02-21T08:54:39.3017943Z cvt.u64.u32 %rd229, %r955; 2026-02-21T08:54:39.3018022Z or.b64 %rd220, %rd229, -9223371899348713472; 2026-02-21T08:54:39.3018077Z mov.b32 %r941, 138412048; 2026-02-21T08:54:39.3018134Z mov.pred %p68, -1; 2026-02-21T08:54:39.3018197Z // begin inline asm 2026-02-21T08:54:39.3018345Z @%p69 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 0 ], %rd219, %rd220, %r941, %p68; 2026-02-21T08:54:39.3018398Z // end inline asm 2026-02-21T08:54:39.3018461Z add.s32 %r956, %r950, 131104; 2026-02-21T08:54:39.3018517Z bfe.u32 %r957, %r956, 4, 14; 2026-02-21T08:54:39.3018599Z cvt.u64.u32 %rd230, %r957; 2026-02-21T08:54:39.3018669Z or.b64 %rd221, %rd230, -9223371899348713472; 2026-02-21T08:54:39.3018734Z add.s32 %r958, %r950, 229408; 2026-02-21T08:54:39.3018788Z bfe.u32 %r959, %r958, 4, 14; 2026-02-21T08:54:39.3018844Z cvt.u64.u32 %rd231, %r959; 2026-02-21T08:54:39.3018917Z or.b64 %rd222, %rd231, -9223371899348713472; 
2026-02-21T08:54:39.3018971Z // begin inline asm 2026-02-21T08:54:39.3019111Z @%p69 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 0 ], %rd221, %rd222, %r941, %p68; 2026-02-21T08:54:39.3019200Z // end inline asm 2026-02-21T08:54:39.3019257Z add.s32 %r960, %r950, 139264; 2026-02-21T08:54:39.3019312Z bfe.u32 %r961, %r960, 4, 14; 2026-02-21T08:54:39.3019369Z cvt.u64.u32 %rd232, %r961; 2026-02-21T08:54:39.3019448Z or.b64 %rd223, %rd232, -9223371899348713472; 2026-02-21T08:54:39.3019502Z // begin inline asm 2026-02-21T08:54:39.3019641Z @%p69 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 256 ], %rd223, %rd220, %r941, %p68; 2026-02-21T08:54:39.3019702Z // end inline asm 2026-02-21T08:54:39.3019757Z add.s32 %r962, %r950, 139296; 2026-02-21T08:54:39.3019812Z bfe.u32 %r963, %r962, 4, 14; 2026-02-21T08:54:39.3019866Z cvt.u64.u32 %rd233, %r963; 2026-02-21T08:54:39.3019938Z or.b64 %rd225, %rd233, -9223371899348713472; 2026-02-21T08:54:39.3019994Z // begin inline asm 2026-02-21T08:54:39.3020127Z @%p69 tcgen05.mma.cta_group::1.kind::f16 [ %r2328 + 256 ], %rd225, %rd222, %r941, %p68; 2026-02-21T08:54:39.3020212Z // end inline asm 2026-02-21T08:54:39.3020272Z cvt.u64.u32 %rd227, %r1005; 2026-02-21T08:54:39.3020325Z // begin inline asm 2026-02-21T08:54:39.3020452Z @%p69 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd227]; 2026-02-21T08:54:39.3020504Z // end inline asm 2026-02-21T08:54:39.3020559Z bra.uni $L__BB0_6; 2026-02-21T08:54:39.3020649Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T08:54:39.3020827Z .loc 1 0 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:0:52 2026-02-21T08:54:39.3020886Z setp.lt.u32 %p84, %r1, 128; 2026-02-21T08:54:39.3020938Z mov.b32 %r1006, 1; 2026-02-21T08:54:39.3021109Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3021163Z // begin inline asm 2026-02-21T08:54:39.3021212Z 2026-02-21T08:54:39.3021269Z { 2026-02-21T08:54:39.3021329Z .reg .pred complete; 
2026-02-21T08:54:39.3021383Z waitLoop: 2026-02-21T08:54:39.3021501Z mbarrier.try_wait.parity.shared.b64 complete, [%r1005], %r1006; 2026-02-21T08:54:39.3021573Z @!complete bra.uni waitLoop; 2026-02-21T08:54:39.3021623Z } 2026-02-21T08:54:39.3021627Z 2026-02-21T08:54:39.3021681Z // end inline asm 2026-02-21T08:54:39.3021856Z .loc 1 49 90 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:49:90 2026-02-21T08:54:39.3021946Z cp.async.wait_group 0; 2026-02-21T08:54:39.3022001Z bar.sync 0; 2026-02-21T08:54:39.3022062Z add.s32 %r1007, %r48, 327680; 2026-02-21T08:54:39.3022128Z // begin inline asm 2026-02-21T08:54:39.3022216Z @%p81 mbarrier.inval.shared::cta.b64 [%r1007]; 2026-02-21T08:54:39.3022270Z // end inline asm 2026-02-21T08:54:39.3022334Z bar.sync 0; 2026-02-21T08:54:39.3022390Z // begin inline asm 2026-02-21T08:54:39.3022471Z @%p81 mbarrier.inval.shared::cta.b64 [%r610]; 2026-02-21T08:54:39.3022533Z // end inline asm 2026-02-21T08:54:39.3022706Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3022762Z // begin inline asm 2026-02-21T08:54:39.3023078Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1009, %r1010, %r1011, %r1012, %r1013, %r1014, %r1015, %r1016, %r1017, %r1018, %r1019, %r1020, %r1021, %r1022, %r1023, %r1024}, [%r1552 + 0]; 2026-02-21T08:54:39.3023143Z // end inline asm 2026-02-21T08:54:39.3023198Z // begin inline asm 2026-02-21T08:54:39.3023535Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1026, %r1027, %r1028, %r1029, %r1030, %r1031, %r1032, %r1033, %r1034, %r1035, %r1036, %r1037, %r1038, %r1039, %r1040, %r1041}, [%r1552 + 16]; 2026-02-21T08:54:39.3023599Z // end inline asm 2026-02-21T08:54:39.3023655Z // begin inline asm 2026-02-21T08:54:39.3023958Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1043, %r1044, %r1045, %r1046, %r1047, %r1048, %r1049, %r1050, %r1051, %r1052, %r1053, %r1054, %r1055, %r1056, %r1057, %r1058}, [%r1552 + 32]; 2026-02-21T08:54:39.3024020Z // end inline asm 
2026-02-21T08:54:39.3024080Z cvt.u64.u32 %rd251, %r1043; 2026-02-21T08:54:39.3024160Z cvt.u64.u32 %rd252, %r1044; 2026-02-21T08:54:39.3024226Z shl.b64 %rd253, %rd252, 32; 2026-02-21T08:54:39.3024287Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T08:54:39.3024345Z cvt.u64.u32 %rd255, %r1045; 2026-02-21T08:54:39.3024402Z cvt.u64.u32 %rd256, %r1046; 2026-02-21T08:54:39.3024469Z shl.b64 %rd257, %rd256, 32; 2026-02-21T08:54:39.3024532Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T08:54:39.3024588Z cvt.u64.u32 %rd259, %r1047; 2026-02-21T08:54:39.3024651Z cvt.u64.u32 %rd260, %r1048; 2026-02-21T08:54:39.3024743Z shl.b64 %rd261, %rd260, 32; 2026-02-21T08:54:39.3024805Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T08:54:39.3024862Z cvt.u64.u32 %rd263, %r1049; 2026-02-21T08:54:39.3024925Z cvt.u64.u32 %rd264, %r1050; 2026-02-21T08:54:39.3024982Z shl.b64 %rd265, %rd264, 32; 2026-02-21T08:54:39.3025041Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T08:54:39.3025105Z cvt.u64.u32 %rd267, %r1051; 2026-02-21T08:54:39.3025186Z cvt.u64.u32 %rd268, %r1052; 2026-02-21T08:54:39.3025245Z shl.b64 %rd269, %rd268, 32; 2026-02-21T08:54:39.3025305Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T08:54:39.3025368Z cvt.u64.u32 %rd271, %r1053; 2026-02-21T08:54:39.3025424Z cvt.u64.u32 %rd272, %r1054; 2026-02-21T08:54:39.3025480Z shl.b64 %rd273, %rd272, 32; 2026-02-21T08:54:39.3025546Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T08:54:39.3025603Z cvt.u64.u32 %rd275, %r1055; 2026-02-21T08:54:39.3025661Z cvt.u64.u32 %rd276, %r1056; 2026-02-21T08:54:39.3025724Z shl.b64 %rd277, %rd276, 32; 2026-02-21T08:54:39.3025783Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T08:54:39.3025841Z cvt.u64.u32 %rd279, %r1057; 2026-02-21T08:54:39.3025898Z cvt.u64.u32 %rd280, %r1058; 2026-02-21T08:54:39.3025964Z shl.b64 %rd281, %rd280, 32; 2026-02-21T08:54:39.3026024Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T08:54:39.3026080Z // begin inline asm 2026-02-21T08:54:39.3026396Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1060, %r1061, %r1062, %r1063, 
%r1064, %r1065, %r1066, %r1067, %r1068, %r1069, %r1070, %r1071, %r1072, %r1073, %r1074, %r1075}, [%r1552 + 48]; 2026-02-21T08:54:39.3026453Z // end inline asm 2026-02-21T08:54:39.3026511Z cvt.u64.u32 %rd283, %r1060; 2026-02-21T08:54:39.3026569Z cvt.u64.u32 %rd284, %r1061; 2026-02-21T08:54:39.3026635Z shl.b64 %rd285, %rd284, 32; 2026-02-21T08:54:39.3026724Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T08:54:39.3026782Z cvt.u64.u32 %rd287, %r1062; 2026-02-21T08:54:39.3026847Z cvt.u64.u32 %rd288, %r1063; 2026-02-21T08:54:39.3026904Z shl.b64 %rd289, %rd288, 32; 2026-02-21T08:54:39.3026963Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T08:54:39.3027030Z cvt.u64.u32 %rd291, %r1064; 2026-02-21T08:54:39.3027088Z cvt.u64.u32 %rd292, %r1065; 2026-02-21T08:54:39.3027148Z shl.b64 %rd293, %rd292, 32; 2026-02-21T08:54:39.3027206Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T08:54:39.3027273Z cvt.u64.u32 %rd295, %r1066; 2026-02-21T08:54:39.3027333Z cvt.u64.u32 %rd296, %r1067; 2026-02-21T08:54:39.3027393Z shl.b64 %rd297, %rd296, 32; 2026-02-21T08:54:39.3027458Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T08:54:39.3027517Z cvt.u64.u32 %rd299, %r1068; 2026-02-21T08:54:39.3027574Z cvt.u64.u32 %rd300, %r1069; 2026-02-21T08:54:39.3027630Z shl.b64 %rd301, %rd300, 32; 2026-02-21T08:54:39.3027696Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T08:54:39.3027754Z cvt.u64.u32 %rd303, %r1070; 2026-02-21T08:54:39.3027810Z cvt.u64.u32 %rd304, %r1071; 2026-02-21T08:54:39.3027874Z shl.b64 %rd305, %rd304, 32; 2026-02-21T08:54:39.3027932Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T08:54:39.3028020Z cvt.u64.u32 %rd307, %r1072; 2026-02-21T08:54:39.3028078Z cvt.u64.u32 %rd308, %r1073; 2026-02-21T08:54:39.3028143Z shl.b64 %rd309, %rd308, 32; 2026-02-21T08:54:39.3028201Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T08:54:39.3028259Z cvt.u64.u32 %rd311, %r1074; 2026-02-21T08:54:39.3028324Z cvt.u64.u32 %rd312, %r1075; 2026-02-21T08:54:39.3028380Z shl.b64 %rd313, %rd312, 32; 2026-02-21T08:54:39.3028438Z or.b64 %rd314, %rd311, 
%rd313; 2026-02-21T08:54:39.3028495Z // begin inline asm 2026-02-21T08:54:39.3028828Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1077, %r1078, %r1079, %r1080, %r1081, %r1082, %r1083, %r1084, %r1085, %r1086, %r1087, %r1088, %r1089, %r1090, %r1091, %r1092}, [%r1552 + 64]; 2026-02-21T08:54:39.3028886Z // end inline asm 2026-02-21T08:54:39.3028942Z // begin inline asm 2026-02-21T08:54:39.3029244Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1094, %r1095, %r1096, %r1097, %r1098, %r1099, %r1100, %r1101, %r1102, %r1103, %r1104, %r1105, %r1106, %r1107, %r1108, %r1109}, [%r1552 + 80]; 2026-02-21T08:54:39.3029299Z // end inline asm 2026-02-21T08:54:39.3029355Z // begin inline asm 2026-02-21T08:54:39.3029671Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1111, %r1112, %r1113, %r1114, %r1115, %r1116, %r1117, %r1118, %r1119, %r1120, %r1121, %r1122, %r1123, %r1124, %r1125, %r1126}, [%r1552 + 96]; 2026-02-21T08:54:39.3029725Z // end inline asm 2026-02-21T08:54:39.3029780Z cvt.u64.u32 %rd315, %r1111; 2026-02-21T08:54:39.3029863Z cvt.u64.u32 %rd316, %r1112; 2026-02-21T08:54:39.3029921Z shl.b64 %rd317, %rd316, 32; 2026-02-21T08:54:39.3029978Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T08:54:39.3030031Z cvt.u64.u32 %rd319, %r1113; 2026-02-21T08:54:39.3030095Z cvt.u64.u32 %rd320, %r1114; 2026-02-21T08:54:39.3030151Z shl.b64 %rd321, %rd320, 32; 2026-02-21T08:54:39.3030206Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T08:54:39.3030268Z cvt.u64.u32 %rd323, %r1115; 2026-02-21T08:54:39.3030323Z cvt.u64.u32 %rd324, %r1116; 2026-02-21T08:54:39.3030378Z shl.b64 %rd325, %rd324, 32; 2026-02-21T08:54:39.3030433Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T08:54:39.3030494Z cvt.u64.u32 %rd327, %r1117; 2026-02-21T08:54:39.3030548Z cvt.u64.u32 %rd328, %r1118; 2026-02-21T08:54:39.3030604Z shl.b64 %rd329, %rd328, 32; 2026-02-21T08:54:39.3030668Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T08:54:39.3030723Z cvt.u64.u32 %rd331, %r1119; 2026-02-21T08:54:39.3030776Z cvt.u64.u32 %rd332, %r1120; 
2026-02-21T08:54:39.3030838Z shl.b64 %rd333, %rd332, 32; 2026-02-21T08:54:39.3030895Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T08:54:39.3030950Z cvt.u64.u32 %rd335, %r1121; 2026-02-21T08:54:39.3031004Z cvt.u64.u32 %rd336, %r1122; 2026-02-21T08:54:39.3031065Z shl.b64 %rd337, %rd336, 32; 2026-02-21T08:54:39.3031119Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T08:54:39.3031194Z cvt.u64.u32 %rd339, %r1123; 2026-02-21T08:54:39.3031255Z cvt.u64.u32 %rd340, %r1124; 2026-02-21T08:54:39.3031310Z shl.b64 %rd341, %rd340, 32; 2026-02-21T08:54:39.3031365Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T08:54:39.3031419Z cvt.u64.u32 %rd343, %r1125; 2026-02-21T08:54:39.3031481Z cvt.u64.u32 %rd344, %r1126; 2026-02-21T08:54:39.3031535Z shl.b64 %rd345, %rd344, 32; 2026-02-21T08:54:39.3031589Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T08:54:39.3031649Z // begin inline asm 2026-02-21T08:54:39.3031945Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1128, %r1129, %r1130, %r1131, %r1132, %r1133, %r1134, %r1135, %r1136, %r1137, %r1138, %r1139, %r1140, %r1141, %r1142, %r1143}, [%r1552 + 112]; 2026-02-21T08:54:39.3031998Z // end inline asm 2026-02-21T08:54:39.3032059Z cvt.u64.u32 %rd347, %r1128; 2026-02-21T08:54:39.3032113Z cvt.u64.u32 %rd348, %r1129; 2026-02-21T08:54:39.3032167Z shl.b64 %rd349, %rd348, 32; 2026-02-21T08:54:39.3032223Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T08:54:39.3032284Z cvt.u64.u32 %rd351, %r1130; 2026-02-21T08:54:39.3032337Z cvt.u64.u32 %rd352, %r1131; 2026-02-21T08:54:39.3032391Z shl.b64 %rd353, %rd352, 32; 2026-02-21T08:54:39.3032482Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T08:54:39.3032537Z cvt.u64.u32 %rd355, %r1132; 2026-02-21T08:54:39.3032591Z cvt.u64.u32 %rd356, %r1133; 2026-02-21T08:54:39.3032644Z shl.b64 %rd357, %rd356, 32; 2026-02-21T08:54:39.3032705Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T08:54:39.3032760Z cvt.u64.u32 %rd359, %r1134; 2026-02-21T08:54:39.3032814Z cvt.u64.u32 %rd360, %r1135; 2026-02-21T08:54:39.3032878Z shl.b64 %rd361, %rd360, 32; 
2026-02-21T08:54:39.3032934Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T08:54:39.3033011Z cvt.u64.u32 %rd363, %r1136; 2026-02-21T08:54:39.3033066Z cvt.u64.u32 %rd364, %r1137; 2026-02-21T08:54:39.3033127Z shl.b64 %rd365, %rd364, 32; 2026-02-21T08:54:39.3033182Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T08:54:39.3033236Z cvt.u64.u32 %rd367, %r1138; 2026-02-21T08:54:39.3033301Z cvt.u64.u32 %rd368, %r1139; 2026-02-21T08:54:39.3033355Z shl.b64 %rd369, %rd368, 32; 2026-02-21T08:54:39.3033410Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T08:54:39.3033475Z cvt.u64.u32 %rd371, %r1140; 2026-02-21T08:54:39.3033529Z cvt.u64.u32 %rd372, %r1141; 2026-02-21T08:54:39.3033584Z shl.b64 %rd373, %rd372, 32; 2026-02-21T08:54:39.3033641Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T08:54:39.3033704Z cvt.u64.u32 %rd375, %r1142; 2026-02-21T08:54:39.3033758Z cvt.u64.u32 %rd376, %r1143; 2026-02-21T08:54:39.3033813Z shl.b64 %rd377, %rd376, 32; 2026-02-21T08:54:39.3033881Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T08:54:39.3033936Z // begin inline asm 2026-02-21T08:54:39.3034258Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1145, %r1146, %r1147, %r1148, %r1149, %r1150, %r1151, %r1152, %r1153, %r1154, %r1155, %r1156, %r1157, %r1158, %r1159, %r1160}, [%r1552 + 128]; 2026-02-21T08:54:39.3034315Z // end inline asm 2026-02-21T08:54:39.3034380Z // begin inline asm 2026-02-21T08:54:39.3034718Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1162, %r1163, %r1164, %r1165, %r1166, %r1167, %r1168, %r1169, %r1170, %r1171, %r1172, %r1173, %r1174, %r1175, %r1176, %r1177}, [%r1552 + 144]; 2026-02-21T08:54:39.3034776Z // end inline asm 2026-02-21T08:54:39.3034839Z // begin inline asm 2026-02-21T08:54:39.3035129Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1179, %r1180, %r1181, %r1182, %r1183, %r1184, %r1185, %r1186, %r1187, %r1188, %r1189, %r1190, %r1191, %r1192, %r1193, %r1194}, [%r1552 + 160]; 2026-02-21T08:54:39.3035182Z // end inline asm 2026-02-21T08:54:39.3035244Z cvt.u64.u32 %rd379, %r1179; 2026-02-21T08:54:39.3035299Z 
cvt.u64.u32 %rd380, %r1180; 2026-02-21T08:54:39.3035354Z shl.b64 %rd381, %rd380, 32; 2026-02-21T08:54:39.3035419Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T08:54:39.3035474Z cvt.u64.u32 %rd383, %r1181; 2026-02-21T08:54:39.3035529Z cvt.u64.u32 %rd384, %r1182; 2026-02-21T08:54:39.3035583Z shl.b64 %rd385, %rd384, 32; 2026-02-21T08:54:39.3035681Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T08:54:39.3035737Z cvt.u64.u32 %rd387, %r1183; 2026-02-21T08:54:39.3035795Z cvt.u64.u32 %rd388, %r1184; 2026-02-21T08:54:39.3035861Z shl.b64 %rd389, %rd388, 32; 2026-02-21T08:54:39.3035917Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T08:54:39.3035973Z cvt.u64.u32 %rd391, %r1185; 2026-02-21T08:54:39.3036027Z cvt.u64.u32 %rd392, %r1186; 2026-02-21T08:54:39.3036091Z shl.b64 %rd393, %rd392, 32; 2026-02-21T08:54:39.3036147Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T08:54:39.3036202Z cvt.u64.u32 %rd395, %r1187; 2026-02-21T08:54:39.3036264Z cvt.u64.u32 %rd396, %r1188; 2026-02-21T08:54:39.3036320Z shl.b64 %rd397, %rd396, 32; 2026-02-21T08:54:39.3036378Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T08:54:39.3036434Z cvt.u64.u32 %rd399, %r1189; 2026-02-21T08:54:39.3036498Z cvt.u64.u32 %rd400, %r1190; 2026-02-21T08:54:39.3036555Z shl.b64 %rd401, %rd400, 32; 2026-02-21T08:54:39.3036611Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T08:54:39.3036677Z cvt.u64.u32 %rd403, %r1191; 2026-02-21T08:54:39.3036731Z cvt.u64.u32 %rd404, %r1192; 2026-02-21T08:54:39.3036785Z shl.b64 %rd405, %rd404, 32; 2026-02-21T08:54:39.3036850Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T08:54:39.3036928Z cvt.u64.u32 %rd407, %r1193; 2026-02-21T08:54:39.3036986Z cvt.u64.u32 %rd408, %r1194; 2026-02-21T08:54:39.3037041Z shl.b64 %rd409, %rd408, 32; 2026-02-21T08:54:39.3037105Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T08:54:39.3037157Z // begin inline asm 2026-02-21T08:54:39.3037451Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1196, %r1197, %r1198, %r1199, %r1200, %r1201, %r1202, %r1203, %r1204, %r1205, %r1206, %r1207, %r1208, %r1209, %r1210, 
%r1211}, [%r1552 + 176]; 2026-02-21T08:54:39.3037512Z // end inline asm 2026-02-21T08:54:39.3037591Z cvt.u64.u32 %rd411, %r1196; 2026-02-21T08:54:39.3037646Z cvt.u64.u32 %rd412, %r1197; 2026-02-21T08:54:39.3037701Z shl.b64 %rd413, %rd412, 32; 2026-02-21T08:54:39.3037764Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T08:54:39.3037820Z cvt.u64.u32 %rd415, %r1198; 2026-02-21T08:54:39.3037874Z cvt.u64.u32 %rd416, %r1199; 2026-02-21T08:54:39.3037937Z shl.b64 %rd417, %rd416, 32; 2026-02-21T08:54:39.3037992Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T08:54:39.3038048Z cvt.u64.u32 %rd419, %r1200; 2026-02-21T08:54:39.3038109Z cvt.u64.u32 %rd420, %r1201; 2026-02-21T08:54:39.3038162Z shl.b64 %rd421, %rd420, 32; 2026-02-21T08:54:39.3038218Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T08:54:39.3038271Z cvt.u64.u32 %rd423, %r1202; 2026-02-21T08:54:39.3038332Z cvt.u64.u32 %rd424, %r1203; 2026-02-21T08:54:39.3038387Z shl.b64 %rd425, %rd424, 32; 2026-02-21T08:54:39.3038442Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T08:54:39.3038530Z cvt.u64.u32 %rd427, %r1204; 2026-02-21T08:54:39.3038587Z cvt.u64.u32 %rd428, %r1205; 2026-02-21T08:54:39.3038643Z shl.b64 %rd429, %rd428, 32; 2026-02-21T08:54:39.3038698Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T08:54:39.3038758Z cvt.u64.u32 %rd431, %r1206; 2026-02-21T08:54:39.3038815Z cvt.u64.u32 %rd432, %r1207; 2026-02-21T08:54:39.3038870Z shl.b64 %rd433, %rd432, 32; 2026-02-21T08:54:39.3038932Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T08:54:39.3038987Z cvt.u64.u32 %rd435, %r1208; 2026-02-21T08:54:39.3039045Z cvt.u64.u32 %rd436, %r1209; 2026-02-21T08:54:39.3039098Z shl.b64 %rd437, %rd436, 32; 2026-02-21T08:54:39.3039161Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T08:54:39.3039215Z cvt.u64.u32 %rd439, %r1210; 2026-02-21T08:54:39.3039270Z cvt.u64.u32 %rd440, %r1211; 2026-02-21T08:54:39.3039332Z shl.b64 %rd441, %rd440, 32; 2026-02-21T08:54:39.3039388Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T08:54:39.3039441Z // begin inline asm 2026-02-21T08:54:39.3039750Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1213, %r1214, %r1215, %r1216, %r1217, %r1218, %r1219, %r1220, %r1221, %r1222, %r1223, %r1224, %r1225, %r1226, %r1227, %r1228}, [%r1552 + 192]; 2026-02-21T08:54:39.3039804Z // end inline asm 2026-02-21T08:54:39.3039858Z // begin inline asm 2026-02-21T08:54:39.3040172Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1230, %r1231, %r1232, %r1233, %r1234, %r1235, %r1236, %r1237, %r1238, %r1239, %r1240, %r1241, %r1242, %r1243, %r1244, %r1245}, [%r1552 + 208]; 2026-02-21T08:54:39.3040231Z // end inline asm 2026-02-21T08:54:39.3040286Z // begin inline asm 2026-02-21T08:54:39.3040586Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1247, %r1248, %r1249, %r1250, %r1251, %r1252, %r1253, %r1254, %r1255, %r1256, %r1257, %r1258, %r1259, %r1260, %r1261, %r1262}, [%r1552 + 224]; 2026-02-21T08:54:39.3040646Z // end inline asm 2026-02-21T08:54:39.3040700Z cvt.u64.u32 %rd443, %r1247; 2026-02-21T08:54:39.3040754Z cvt.u64.u32 %rd444, %r1248; 2026-02-21T08:54:39.3040815Z shl.b64 %rd445, %rd444, 32; 2026-02-21T08:54:39.3040871Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T08:54:39.3040926Z cvt.u64.u32 %rd447, %r1249; 2026-02-21T08:54:39.3040982Z cvt.u64.u32 %rd448, %r1250; 2026-02-21T08:54:39.3041044Z shl.b64 %rd449, %rd448, 32; 2026-02-21T08:54:39.3041100Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T08:54:39.3041156Z cvt.u64.u32 %rd451, %r1251; 2026-02-21T08:54:39.3041219Z cvt.u64.u32 %rd452, %r1252; 2026-02-21T08:54:39.3041273Z shl.b64 %rd453, %rd452, 32; 2026-02-21T08:54:39.3041329Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T08:54:39.3041406Z cvt.u64.u32 %rd455, %r1253; 2026-02-21T08:54:39.3041470Z cvt.u64.u32 %rd456, %r1254; 2026-02-21T08:54:39.3041526Z shl.b64 %rd457, %rd456, 32; 2026-02-21T08:54:39.3041584Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T08:54:39.3041649Z cvt.u64.u32 %rd459, %r1255; 2026-02-21T08:54:39.3041710Z cvt.u64.u32 %rd460, %r1256; 2026-02-21T08:54:39.3041765Z shl.b64 %rd461, %rd460, 32; 2026-02-21T08:54:39.3041828Z or.b64 %rd462, %rd459, 
%rd461; 2026-02-21T08:54:39.3041884Z cvt.u64.u32 %rd463, %r1257; 2026-02-21T08:54:39.3041961Z cvt.u64.u32 %rd464, %r1258; 2026-02-21T08:54:39.3042016Z shl.b64 %rd465, %rd464, 32; 2026-02-21T08:54:39.3042080Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T08:54:39.3042135Z cvt.u64.u32 %rd467, %r1259; 2026-02-21T08:54:39.3042192Z cvt.u64.u32 %rd468, %r1260; 2026-02-21T08:54:39.3042254Z shl.b64 %rd469, %rd468, 32; 2026-02-21T08:54:39.3042310Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T08:54:39.3042365Z cvt.u64.u32 %rd471, %r1261; 2026-02-21T08:54:39.3042421Z cvt.u64.u32 %rd472, %r1262; 2026-02-21T08:54:39.3042482Z shl.b64 %rd473, %rd472, 32; 2026-02-21T08:54:39.3042539Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T08:54:39.3042592Z // begin inline asm 2026-02-21T08:54:39.3042898Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1264, %r1265, %r1266, %r1267, %r1268, %r1269, %r1270, %r1271, %r1272, %r1273, %r1274, %r1275, %r1276, %r1277, %r1278, %r1279}, [%r1552 + 240]; 2026-02-21T08:54:39.3042952Z // end inline asm 2026-02-21T08:54:39.3043029Z cvt.u64.u32 %rd475, %r1264; 2026-02-21T08:54:39.3043094Z cvt.u64.u32 %rd476, %r1265; 2026-02-21T08:54:39.3043148Z shl.b64 %rd477, %rd476, 32; 2026-02-21T08:54:39.3043203Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T08:54:39.3043259Z cvt.u64.u32 %rd479, %r1266; 2026-02-21T08:54:39.3043322Z cvt.u64.u32 %rd480, %r1267; 2026-02-21T08:54:39.3043377Z shl.b64 %rd481, %rd480, 32; 2026-02-21T08:54:39.3043435Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T08:54:39.3043495Z cvt.u64.u32 %rd483, %r1268; 2026-02-21T08:54:39.3043551Z cvt.u64.u32 %rd484, %r1269; 2026-02-21T08:54:39.3043606Z shl.b64 %rd485, %rd484, 32; 2026-02-21T08:54:39.3043662Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T08:54:39.3043724Z cvt.u64.u32 %rd487, %r1270; 2026-02-21T08:54:39.3043780Z cvt.u64.u32 %rd488, %r1271; 2026-02-21T08:54:39.3043834Z shl.b64 %rd489, %rd488, 32; 2026-02-21T08:54:39.3043896Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T08:54:39.3043951Z cvt.u64.u32 %rd491, %r1272; 
2026-02-21T08:54:39.3044008Z cvt.u64.u32 %rd492, %r1273; 2026-02-21T08:54:39.3044066Z shl.b64 %rd493, %rd492, 32; 2026-02-21T08:54:39.3044129Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T08:54:39.3044184Z cvt.u64.u32 %rd495, %r1274; 2026-02-21T08:54:39.3044237Z cvt.u64.u32 %rd496, %r1275; 2026-02-21T08:54:39.3044322Z shl.b64 %rd497, %rd496, 32; 2026-02-21T08:54:39.3044378Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T08:54:39.3044433Z cvt.u64.u32 %rd499, %r1276; 2026-02-21T08:54:39.3044494Z cvt.u64.u32 %rd500, %r1277; 2026-02-21T08:54:39.3044550Z shl.b64 %rd501, %rd500, 32; 2026-02-21T08:54:39.3044607Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T08:54:39.3044661Z cvt.u64.u32 %rd503, %r1278; 2026-02-21T08:54:39.3044764Z cvt.u64.u32 %rd504, %r1279; 2026-02-21T08:54:39.3044818Z shl.b64 %rd505, %rd504, 32; 2026-02-21T08:54:39.3044875Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T08:54:39.3044936Z // begin inline asm 2026-02-21T08:54:39.3045222Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1281, %r1282, %r1283, %r1284, %r1285, %r1286, %r1287, %r1288, %r1289, %r1290, %r1291, %r1292, %r1293, %r1294, %r1295, %r1296}, [%r1552 + 256]; 2026-02-21T08:54:39.3045277Z // end inline asm 2026-02-21T08:54:39.3045331Z // begin inline asm 2026-02-21T08:54:39.3045624Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1298, %r1299, %r1300, %r1301, %r1302, %r1303, %r1304, %r1305, %r1306, %r1307, %r1308, %r1309, %r1310, %r1311, %r1312, %r1313}, [%r1552 + 272]; 2026-02-21T08:54:39.3045678Z // end inline asm 2026-02-21T08:54:39.3045730Z // begin inline asm 2026-02-21T08:54:39.3046053Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1315, %r1316, %r1317, %r1318, %r1319, %r1320, %r1321, %r1322, %r1323, %r1324, %r1325, %r1326, %r1327, %r1328, %r1329, %r1330}, [%r1552 + 288]; 2026-02-21T08:54:39.3046108Z // end inline asm 2026-02-21T08:54:39.3046164Z cvt.u64.u32 %rd507, %r1315; 2026-02-21T08:54:39.3046225Z cvt.u64.u32 %rd508, %r1316; 2026-02-21T08:54:39.3046279Z shl.b64 %rd509, %rd508, 32; 2026-02-21T08:54:39.3046338Z 
or.b64 %rd510, %rd507, %rd509; 2026-02-21T08:54:39.3046429Z cvt.u64.u32 %rd511, %r1317; 2026-02-21T08:54:39.3046484Z cvt.u64.u32 %rd512, %r1318; 2026-02-21T08:54:39.3046538Z shl.b64 %rd513, %rd512, 32; 2026-02-21T08:54:39.3046592Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T08:54:39.3046655Z cvt.u64.u32 %rd515, %r1319; 2026-02-21T08:54:39.3046711Z cvt.u64.u32 %rd516, %r1320; 2026-02-21T08:54:39.3046764Z shl.b64 %rd517, %rd516, 32; 2026-02-21T08:54:39.3046827Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T08:54:39.3046883Z cvt.u64.u32 %rd519, %r1321; 2026-02-21T08:54:39.3046938Z cvt.u64.u32 %rd520, %r1322; 2026-02-21T08:54:39.3046991Z shl.b64 %rd521, %rd520, 32; 2026-02-21T08:54:39.3047055Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T08:54:39.3047109Z cvt.u64.u32 %rd523, %r1323; 2026-02-21T08:54:39.3047164Z cvt.u64.u32 %rd524, %r1324; 2026-02-21T08:54:39.3047227Z shl.b64 %rd525, %rd524, 32; 2026-02-21T08:54:39.3047281Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T08:54:39.3047369Z cvt.u64.u32 %rd527, %r1325; 2026-02-21T08:54:39.3047428Z cvt.u64.u32 %rd528, %r1326; 2026-02-21T08:54:39.3047491Z shl.b64 %rd529, %rd528, 32; 2026-02-21T08:54:39.3047547Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T08:54:39.3047601Z cvt.u64.u32 %rd531, %r1327; 2026-02-21T08:54:39.3047663Z cvt.u64.u32 %rd532, %r1328; 2026-02-21T08:54:39.3047719Z shl.b64 %rd533, %rd532, 32; 2026-02-21T08:54:39.3047774Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T08:54:39.3047828Z cvt.u64.u32 %rd535, %r1329; 2026-02-21T08:54:39.3047890Z cvt.u64.u32 %rd536, %r1330; 2026-02-21T08:54:39.3047946Z shl.b64 %rd537, %rd536, 32; 2026-02-21T08:54:39.3048002Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T08:54:39.3048063Z // begin inline asm 2026-02-21T08:54:39.3048349Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1332, %r1333, %r1334, %r1335, %r1336, %r1337, %r1338, %r1339, %r1340, %r1341, %r1342, %r1343, %r1344, %r1345, %r1346, %r1347}, [%r1552 + 304]; 2026-02-21T08:54:39.3048403Z // end inline asm 2026-02-21T08:54:39.3048469Z cvt.u64.u32 
%rd539, %r1332; 2026-02-21T08:54:39.3048527Z cvt.u64.u32 %rd540, %r1333; 2026-02-21T08:54:39.3048585Z shl.b64 %rd541, %rd540, 32; 2026-02-21T08:54:39.3048642Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T08:54:39.3048709Z cvt.u64.u32 %rd543, %r1334; 2026-02-21T08:54:39.3048766Z cvt.u64.u32 %rd544, %r1335; 2026-02-21T08:54:39.3048861Z shl.b64 %rd545, %rd544, 32; 2026-02-21T08:54:39.3048924Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T08:54:39.3048979Z cvt.u64.u32 %rd547, %r1336; 2026-02-21T08:54:39.3049033Z cvt.u64.u32 %rd548, %r1337; 2026-02-21T08:54:39.3049096Z shl.b64 %rd549, %rd548, 32; 2026-02-21T08:54:39.3049151Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T08:54:39.3049204Z cvt.u64.u32 %rd551, %r1338; 2026-02-21T08:54:39.3049258Z cvt.u64.u32 %rd552, %r1339; 2026-02-21T08:54:39.3049320Z shl.b64 %rd553, %rd552, 32; 2026-02-21T08:54:39.3049374Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T08:54:39.3049428Z cvt.u64.u32 %rd555, %r1340; 2026-02-21T08:54:39.3049490Z cvt.u64.u32 %rd556, %r1341; 2026-02-21T08:54:39.3049545Z shl.b64 %rd557, %rd556, 32; 2026-02-21T08:54:39.3049600Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T08:54:39.3049654Z cvt.u64.u32 %rd559, %r1342; 2026-02-21T08:54:39.3049716Z cvt.u64.u32 %rd560, %r1343; 2026-02-21T08:54:39.3049772Z shl.b64 %rd561, %rd560, 32; 2026-02-21T08:54:39.3049828Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T08:54:39.3049889Z cvt.u64.u32 %rd563, %r1344; 2026-02-21T08:54:39.3049944Z cvt.u64.u32 %rd564, %r1345; 2026-02-21T08:54:39.3049998Z shl.b64 %rd565, %rd564, 32; 2026-02-21T08:54:39.3050077Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T08:54:39.3050140Z cvt.u64.u32 %rd567, %r1346; 2026-02-21T08:54:39.3050195Z cvt.u64.u32 %rd568, %r1347; 2026-02-21T08:54:39.3050250Z shl.b64 %rd569, %rd568, 32; 2026-02-21T08:54:39.3050312Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T08:54:39.3050366Z // begin inline asm 2026-02-21T08:54:39.3050655Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1349, %r1350, %r1351, %r1352, %r1353, %r1354, %r1355, %r1356, %r1357, 
%r1358, %r1359, %r1360, %r1361, %r1362, %r1363, %r1364}, [%r1552 + 320]; 2026-02-21T08:54:39.3050736Z // end inline asm 2026-02-21T08:54:39.3050790Z // begin inline asm 2026-02-21T08:54:39.3051089Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1366, %r1367, %r1368, %r1369, %r1370, %r1371, %r1372, %r1373, %r1374, %r1375, %r1376, %r1377, %r1378, %r1379, %r1380, %r1381}, [%r1552 + 336]; 2026-02-21T08:54:39.3051144Z // end inline asm 2026-02-21T08:54:39.3051206Z cvt.u64.u32 %rd571, %r1374; 2026-02-21T08:54:39.3051262Z cvt.u64.u32 %rd572, %r1375; 2026-02-21T08:54:39.3051315Z shl.b64 %rd573, %rd572, 32; 2026-02-21T08:54:39.3051379Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T08:54:39.3051432Z // begin inline asm 2026-02-21T08:54:39.3051729Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1383, %r1384, %r1385, %r1386, %r1387, %r1388, %r1389, %r1390, %r1391, %r1392, %r1393, %r1394, %r1395, %r1396, %r1397, %r1398}, [%r1552 + 352]; 2026-02-21T08:54:39.3051789Z // end inline asm 2026-02-21T08:54:39.3051862Z cvt.u64.u32 %rd575, %r1383; 2026-02-21T08:54:39.3051921Z cvt.u64.u32 %rd576, %r1384; 2026-02-21T08:54:39.3051975Z shl.b64 %rd577, %rd576, 32; 2026-02-21T08:54:39.3052038Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T08:54:39.3052093Z cvt.u64.u32 %rd579, %r1385; 2026-02-21T08:54:39.3052149Z cvt.u64.u32 %rd580, %r1386; 2026-02-21T08:54:39.3052210Z shl.b64 %rd581, %rd580, 32; 2026-02-21T08:54:39.3052266Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T08:54:39.3052320Z cvt.u64.u32 %rd583, %r1387; 2026-02-21T08:54:39.3052374Z cvt.u64.u32 %rd584, %r1388; 2026-02-21T08:54:39.3052435Z shl.b64 %rd585, %rd584, 32; 2026-02-21T08:54:39.3052490Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T08:54:39.3052543Z cvt.u64.u32 %rd587, %r1389; 2026-02-21T08:54:39.3052604Z cvt.u64.u32 %rd588, %r1390; 2026-02-21T08:54:39.3052658Z shl.b64 %rd589, %rd588, 32; 2026-02-21T08:54:39.3052714Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T08:54:39.3052774Z cvt.u64.u32 %rd591, %r1391; 2026-02-21T08:54:39.3052829Z cvt.u64.u32 
%rd592, %r1392; 2026-02-21T08:54:39.3052884Z shl.b64 %rd593, %rd592, 32; 2026-02-21T08:54:39.3052939Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T08:54:39.3053000Z cvt.u64.u32 %rd595, %r1393; 2026-02-21T08:54:39.3053054Z cvt.u64.u32 %rd596, %r1394; 2026-02-21T08:54:39.3053133Z shl.b64 %rd597, %rd596, 32; 2026-02-21T08:54:39.3053194Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T08:54:39.3053248Z cvt.u64.u32 %rd599, %r1395; 2026-02-21T08:54:39.3053301Z cvt.u64.u32 %rd600, %r1396; 2026-02-21T08:54:39.3053356Z shl.b64 %rd601, %rd600, 32; 2026-02-21T08:54:39.3053418Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T08:54:39.3053472Z cvt.u64.u32 %rd603, %r1397; 2026-02-21T08:54:39.3053525Z cvt.u64.u32 %rd604, %r1398; 2026-02-21T08:54:39.3053584Z shl.b64 %rd605, %rd604, 32; 2026-02-21T08:54:39.3053639Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T08:54:39.3053693Z // begin inline asm 2026-02-21T08:54:39.3054006Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1400, %r1401, %r1402, %r1403, %r1404, %r1405, %r1406, %r1407, %r1408, %r1409, %r1410, %r1411, %r1412, %r1413, %r1414, %r1415}, [%r1552 + 368]; 2026-02-21T08:54:39.3054060Z // end inline asm 2026-02-21T08:54:39.3054114Z cvt.u64.u32 %rd607, %r1400; 2026-02-21T08:54:39.3054168Z cvt.u64.u32 %rd608, %r1401; 2026-02-21T08:54:39.3054232Z shl.b64 %rd609, %rd608, 32; 2026-02-21T08:54:39.3054287Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T08:54:39.3054342Z cvt.u64.u32 %rd611, %r1402; 2026-02-21T08:54:39.3054401Z cvt.u64.u32 %rd612, %r1403; 2026-02-21T08:54:39.3054478Z shl.b64 %rd613, %rd612, 32; 2026-02-21T08:54:39.3054536Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T08:54:39.3054590Z cvt.u64.u32 %rd615, %r1404; 2026-02-21T08:54:39.3054653Z cvt.u64.u32 %rd616, %r1405; 2026-02-21T08:54:39.3054744Z shl.b64 %rd617, %rd616, 32; 2026-02-21T08:54:39.3054804Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T08:54:39.3054870Z cvt.u64.u32 %rd619, %r1406; 2026-02-21T08:54:39.3054926Z cvt.u64.u32 %rd620, %r1407; 2026-02-21T08:54:39.3054985Z shl.b64 %rd621, %rd620, 32; 
2026-02-21T08:54:39.3055069Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T08:54:39.3055131Z cvt.u64.u32 %rd623, %r1408; 2026-02-21T08:54:39.3055185Z cvt.u64.u32 %rd624, %r1409; 2026-02-21T08:54:39.3055240Z shl.b64 %rd625, %rd624, 32; 2026-02-21T08:54:39.3055307Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T08:54:39.3055364Z cvt.u64.u32 %rd627, %r1410; 2026-02-21T08:54:39.3055420Z cvt.u64.u32 %rd628, %r1411; 2026-02-21T08:54:39.3055485Z shl.b64 %rd629, %rd628, 32; 2026-02-21T08:54:39.3055545Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T08:54:39.3055600Z cvt.u64.u32 %rd631, %r1412; 2026-02-21T08:54:39.3055655Z cvt.u64.u32 %rd632, %r1413; 2026-02-21T08:54:39.3055725Z shl.b64 %rd633, %rd632, 32; 2026-02-21T08:54:39.3055783Z or.b64 %rd634, %rd631, %rd633; 2026-02-21T08:54:39.3055842Z cvt.u64.u32 %rd635, %r1414; 2026-02-21T08:54:39.3055911Z cvt.u64.u32 %rd636, %r1415; 2026-02-21T08:54:39.3055971Z shl.b64 %rd637, %rd636, 32; 2026-02-21T08:54:39.3056059Z or.b64 %rd638, %rd635, %rd637; 2026-02-21T08:54:39.3056117Z // begin inline asm 2026-02-21T08:54:39.3056424Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1417, %r1418, %r1419, %r1420, %r1421, %r1422, %r1423, %r1424, %r1425, %r1426, %r1427, %r1428, %r1429, %r1430, %r1431, %r1432}, [%r1552 + 384]; 2026-02-21T08:54:39.3056479Z // end inline asm 2026-02-21T08:54:39.3056533Z // begin inline asm 2026-02-21T08:54:39.3056830Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1434, %r1435, %r1436, %r1437, %r1438, %r1439, %r1440, %r1441, %r1442, %r1443, %r1444, %r1445, %r1446, %r1447, %r1448, %r1449}, [%r1552 + 400]; 2026-02-21T08:54:39.3056883Z // end inline asm 2026-02-21T08:54:39.3056937Z cvt.u64.u32 %rd639, %r1442; 2026-02-21T08:54:39.3057000Z cvt.u64.u32 %rd640, %r1443; 2026-02-21T08:54:39.3057054Z shl.b64 %rd641, %rd640, 32; 2026-02-21T08:54:39.3057111Z or.b64 %rd642, %rd639, %rd641; 2026-02-21T08:54:39.3057166Z cvt.u64.u32 %rd643, %r1444; 2026-02-21T08:54:39.3057228Z cvt.u64.u32 %rd644, %r1445; 2026-02-21T08:54:39.3057283Z shl.b64 %rd645, 
%rd644, 32; 2026-02-21T08:54:39.3057340Z or.b64 %rd646, %rd643, %rd645; 2026-02-21T08:54:39.3057402Z cvt.u64.u32 %rd647, %r1446; 2026-02-21T08:54:39.3057457Z cvt.u64.u32 %rd648, %r1447; 2026-02-21T08:54:39.3057511Z shl.b64 %rd649, %rd648, 32; 2026-02-21T08:54:39.3057601Z or.b64 %rd650, %rd647, %rd649; 2026-02-21T08:54:39.3057664Z cvt.u64.u32 %rd651, %r1448; 2026-02-21T08:54:39.3057720Z cvt.u64.u32 %rd652, %r1449; 2026-02-21T08:54:39.3057774Z shl.b64 %rd653, %rd652, 32; 2026-02-21T08:54:39.3057838Z or.b64 %rd654, %rd651, %rd653; 2026-02-21T08:54:39.3057892Z // begin inline asm 2026-02-21T08:54:39.3058182Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1451, %r1452, %r1453, %r1454, %r1455, %r1456, %r1457, %r1458, %r1459, %r1460, %r1461, %r1462, %r1463, %r1464, %r1465, %r1466}, [%r1552 + 416]; 2026-02-21T08:54:39.3058241Z // end inline asm 2026-02-21T08:54:39.3058296Z cvt.u64.u32 %rd655, %r1451; 2026-02-21T08:54:39.3058350Z cvt.u64.u32 %rd656, %r1452; 2026-02-21T08:54:39.3058406Z shl.b64 %rd657, %rd656, 32; 2026-02-21T08:54:39.3058470Z or.b64 %rd658, %rd655, %rd657; 2026-02-21T08:54:39.3058524Z cvt.u64.u32 %rd659, %r1453; 2026-02-21T08:54:39.3058578Z cvt.u64.u32 %rd660, %r1454; 2026-02-21T08:54:39.3058638Z shl.b64 %rd661, %rd660, 32; 2026-02-21T08:54:39.3058695Z or.b64 %rd662, %rd659, %rd661; 2026-02-21T08:54:39.3058750Z cvt.u64.u32 %rd663, %r1455; 2026-02-21T08:54:39.3058803Z cvt.u64.u32 %rd664, %r1456; 2026-02-21T08:54:39.3058864Z shl.b64 %rd665, %rd664, 32; 2026-02-21T08:54:39.3058944Z or.b64 %rd666, %rd663, %rd665; 2026-02-21T08:54:39.3059000Z cvt.u64.u32 %rd667, %r1457; 2026-02-21T08:54:39.3059063Z cvt.u64.u32 %rd668, %r1458; 2026-02-21T08:54:39.3059118Z shl.b64 %rd669, %rd668, 32; 2026-02-21T08:54:39.3059173Z or.b64 %rd670, %rd667, %rd669; 2026-02-21T08:54:39.3059234Z cvt.u64.u32 %rd671, %r1459; 2026-02-21T08:54:39.3059287Z cvt.u64.u32 %rd672, %r1460; 2026-02-21T08:54:39.3059340Z shl.b64 %rd673, %rd672, 32; 2026-02-21T08:54:39.3059396Z or.b64 %rd674, %rd671, %rd673; 
2026-02-21T08:54:39.3059481Z cvt.u64.u32 %rd675, %r1461; 2026-02-21T08:54:39.3059535Z cvt.u64.u32 %rd676, %r1462; 2026-02-21T08:54:39.3059590Z shl.b64 %rd677, %rd676, 32; 2026-02-21T08:54:39.3059652Z or.b64 %rd678, %rd675, %rd677; 2026-02-21T08:54:39.3059709Z cvt.u64.u32 %rd679, %r1463; 2026-02-21T08:54:39.3059763Z cvt.u64.u32 %rd680, %r1464; 2026-02-21T08:54:39.3059819Z shl.b64 %rd681, %rd680, 32; 2026-02-21T08:54:39.3059882Z or.b64 %rd682, %rd679, %rd681; 2026-02-21T08:54:39.3059939Z cvt.u64.u32 %rd683, %r1465; 2026-02-21T08:54:39.3059995Z cvt.u64.u32 %rd684, %r1466; 2026-02-21T08:54:39.3060056Z shl.b64 %rd685, %rd684, 32; 2026-02-21T08:54:39.3060112Z or.b64 %rd686, %rd683, %rd685; 2026-02-21T08:54:39.3060167Z // begin inline asm 2026-02-21T08:54:39.3060470Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1468, %r1469, %r1470, %r1471, %r1472, %r1473, %r1474, %r1475, %r1476, %r1477, %r1478, %r1479, %r1480, %r1481, %r1482, %r1483}, [%r1552 + 432]; 2026-02-21T08:54:39.3060543Z // end inline asm 2026-02-21T08:54:39.3060600Z cvt.u64.u32 %rd687, %r1468; 2026-02-21T08:54:39.3060654Z cvt.u64.u32 %rd688, %r1469; 2026-02-21T08:54:39.3060716Z shl.b64 %rd689, %rd688, 32; 2026-02-21T08:54:39.3060771Z or.b64 %rd690, %rd687, %rd689; 2026-02-21T08:54:39.3060827Z cvt.u64.u32 %rd691, %r1470; 2026-02-21T08:54:39.3060889Z cvt.u64.u32 %rd692, %r1471; 2026-02-21T08:54:39.3060943Z shl.b64 %rd693, %rd692, 32; 2026-02-21T08:54:39.3060998Z or.b64 %rd694, %rd691, %rd693; 2026-02-21T08:54:39.3061054Z cvt.u64.u32 %rd695, %r1472; 2026-02-21T08:54:39.3061115Z cvt.u64.u32 %rd696, %r1473; 2026-02-21T08:54:39.3061169Z shl.b64 %rd697, %rd696, 32; 2026-02-21T08:54:39.3061225Z or.b64 %rd698, %rd695, %rd697; 2026-02-21T08:54:39.3061287Z cvt.u64.u32 %rd699, %r1474; 2026-02-21T08:54:39.3061342Z cvt.u64.u32 %rd700, %r1475; 2026-02-21T08:54:39.3061398Z shl.b64 %rd701, %rd700, 32; 2026-02-21T08:54:39.3061454Z or.b64 %rd702, %rd699, %rd701; 2026-02-21T08:54:39.3061517Z cvt.u64.u32 %rd703, %r1476; 
2026-02-21T08:54:39.3061572Z cvt.u64.u32 %rd704, %r1477; 2026-02-21T08:54:39.3061626Z shl.b64 %rd705, %rd704, 32; 2026-02-21T08:54:39.3061689Z or.b64 %rd706, %rd703, %rd705; 2026-02-21T08:54:39.3061745Z cvt.u64.u32 %rd707, %r1478; 2026-02-21T08:54:39.3061821Z cvt.u64.u32 %rd708, %r1479; 2026-02-21T08:54:39.3061884Z shl.b64 %rd709, %rd708, 32; 2026-02-21T08:54:39.3061940Z or.b64 %rd710, %rd707, %rd709; 2026-02-21T08:54:39.3061994Z cvt.u64.u32 %rd711, %r1480; 2026-02-21T08:54:39.3062051Z cvt.u64.u32 %rd712, %r1481; 2026-02-21T08:54:39.3062114Z shl.b64 %rd713, %rd712, 32; 2026-02-21T08:54:39.3062171Z or.b64 %rd714, %rd711, %rd713; 2026-02-21T08:54:39.3062229Z cvt.u64.u32 %rd715, %r1482; 2026-02-21T08:54:39.3062292Z cvt.u64.u32 %rd716, %r1483; 2026-02-21T08:54:39.3062350Z shl.b64 %rd717, %rd716, 32; 2026-02-21T08:54:39.3062407Z or.b64 %rd718, %rd715, %rd717; 2026-02-21T08:54:39.3062459Z // begin inline asm 2026-02-21T08:54:39.3062759Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1485, %r1486, %r1487, %r1488, %r1489, %r1490, %r1491, %r1492, %r1493, %r1494, %r1495, %r1496, %r1497, %r1498, %r1499, %r1500}, [%r1552 + 448]; 2026-02-21T08:54:39.3062813Z // end inline asm 2026-02-21T08:54:39.3062867Z // begin inline asm 2026-02-21T08:54:39.3063161Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1502, %r1503, %r1504, %r1505, %r1506, %r1507, %r1508, %r1509, %r1510, %r1511, %r1512, %r1513, %r1514, %r1515, %r1516, %r1517}, [%r1552 + 464]; 2026-02-21T08:54:39.3063213Z // end inline asm 2026-02-21T08:54:39.3063288Z cvt.u64.u32 %rd719, %r1510; 2026-02-21T08:54:39.3063351Z cvt.u64.u32 %rd720, %r1511; 2026-02-21T08:54:39.3063404Z shl.b64 %rd721, %rd720, 32; 2026-02-21T08:54:39.3063459Z or.b64 %rd722, %rd719, %rd721; 2026-02-21T08:54:39.3063517Z cvt.u64.u32 %rd723, %r1512; 2026-02-21T08:54:39.3063581Z cvt.u64.u32 %rd724, %r1513; 2026-02-21T08:54:39.3063638Z shl.b64 %rd725, %rd724, 32; 2026-02-21T08:54:39.3063696Z or.b64 %rd726, %rd723, %rd725; 2026-02-21T08:54:39.3063762Z cvt.u64.u32 %rd727, 
%r1514; 2026-02-21T08:54:39.3063842Z cvt.u64.u32 %rd728, %r1515; 2026-02-21T08:54:39.3063899Z shl.b64 %rd729, %rd728, 32; 2026-02-21T08:54:39.3063957Z or.b64 %rd730, %rd727, %rd729; 2026-02-21T08:54:39.3064022Z cvt.u64.u32 %rd731, %r1516; 2026-02-21T08:54:39.3064082Z cvt.u64.u32 %rd732, %r1517; 2026-02-21T08:54:39.3064139Z shl.b64 %rd733, %rd732, 32; 2026-02-21T08:54:39.3064205Z or.b64 %rd734, %rd731, %rd733; 2026-02-21T08:54:39.3064261Z // begin inline asm 2026-02-21T08:54:39.3064570Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1519, %r1520, %r1521, %r1522, %r1523, %r1524, %r1525, %r1526, %r1527, %r1528, %r1529, %r1530, %r1531, %r1532, %r1533, %r1534}, [%r1552 + 480]; 2026-02-21T08:54:39.3064633Z // end inline asm 2026-02-21T08:54:39.3064724Z cvt.u64.u32 %rd735, %r1519; 2026-02-21T08:54:39.3064784Z cvt.u64.u32 %rd736, %r1520; 2026-02-21T08:54:39.3064842Z shl.b64 %rd737, %rd736, 32; 2026-02-21T08:54:39.3064906Z or.b64 %rd738, %rd735, %rd737; 2026-02-21T08:54:39.3065007Z cvt.u64.u32 %rd739, %r1521; 2026-02-21T08:54:39.3065069Z cvt.u64.u32 %rd740, %r1522; 2026-02-21T08:54:39.3065137Z shl.b64 %rd741, %rd740, 32; 2026-02-21T08:54:39.3065195Z or.b64 %rd742, %rd739, %rd741; 2026-02-21T08:54:39.3065252Z cvt.u64.u32 %rd743, %r1523; 2026-02-21T08:54:39.3065320Z cvt.u64.u32 %rd744, %r1524; 2026-02-21T08:54:39.3065378Z shl.b64 %rd745, %rd744, 32; 2026-02-21T08:54:39.3065436Z or.b64 %rd746, %rd743, %rd745; 2026-02-21T08:54:39.3065494Z cvt.u64.u32 %rd747, %r1525; 2026-02-21T08:54:39.3065561Z cvt.u64.u32 %rd748, %r1526; 2026-02-21T08:54:39.3065619Z shl.b64 %rd749, %rd748, 32; 2026-02-21T08:54:39.3065678Z or.b64 %rd750, %rd747, %rd749; 2026-02-21T08:54:39.3065746Z cvt.u64.u32 %rd751, %r1527; 2026-02-21T08:54:39.3065804Z cvt.u64.u32 %rd752, %r1528; 2026-02-21T08:54:39.3065861Z shl.b64 %rd753, %rd752, 32; 2026-02-21T08:54:39.3065920Z or.b64 %rd754, %rd751, %rd753; 2026-02-21T08:54:39.3065987Z cvt.u64.u32 %rd755, %r1529; 2026-02-21T08:54:39.3066046Z cvt.u64.u32 %rd756, %r1530; 
2026-02-21T08:54:39.3066105Z shl.b64 %rd757, %rd756, 32; 2026-02-21T08:54:39.3066169Z or.b64 %rd758, %rd755, %rd757; 2026-02-21T08:54:39.3066225Z cvt.u64.u32 %rd759, %r1531; 2026-02-21T08:54:39.3066281Z cvt.u64.u32 %rd760, %r1532; 2026-02-21T08:54:39.3066381Z shl.b64 %rd761, %rd760, 32; 2026-02-21T08:54:39.3066447Z or.b64 %rd762, %rd759, %rd761; 2026-02-21T08:54:39.3066505Z cvt.u64.u32 %rd763, %r1533; 2026-02-21T08:54:39.3066562Z cvt.u64.u32 %rd764, %r1534; 2026-02-21T08:54:39.3066627Z shl.b64 %rd765, %rd764, 32; 2026-02-21T08:54:39.3066683Z or.b64 %rd766, %rd763, %rd765; 2026-02-21T08:54:39.3066738Z // begin inline asm 2026-02-21T08:54:39.3067062Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1536, %r1537, %r1538, %r1539, %r1540, %r1541, %r1542, %r1543, %r1544, %r1545, %r1546, %r1547, %r1548, %r1549, %r1550, %r1551}, [%r1552 + 496]; 2026-02-21T08:54:39.3067116Z // end inline asm 2026-02-21T08:54:39.3067173Z cvt.u64.u32 %rd767, %r1536; 2026-02-21T08:54:39.3067230Z cvt.u64.u32 %rd768, %r1537; 2026-02-21T08:54:39.3067296Z shl.b64 %rd769, %rd768, 32; 2026-02-21T08:54:39.3067352Z or.b64 %rd770, %rd767, %rd769; 2026-02-21T08:54:39.3067409Z cvt.u64.u32 %rd771, %r1538; 2026-02-21T08:54:39.3067471Z cvt.u64.u32 %rd772, %r1539; 2026-02-21T08:54:39.3067528Z shl.b64 %rd773, %rd772, 32; 2026-02-21T08:54:39.3067586Z or.b64 %rd774, %rd771, %rd773; 2026-02-21T08:54:39.3067643Z cvt.u64.u32 %rd775, %r1540; 2026-02-21T08:54:39.3067706Z cvt.u64.u32 %rd776, %r1541; 2026-02-21T08:54:39.3067792Z shl.b64 %rd777, %rd776, 32; 2026-02-21T08:54:39.3067854Z or.b64 %rd778, %rd775, %rd777; 2026-02-21T08:54:39.3067918Z cvt.u64.u32 %rd779, %r1542; 2026-02-21T08:54:39.3067976Z cvt.u64.u32 %rd780, %r1543; 2026-02-21T08:54:39.3068033Z shl.b64 %rd781, %rd780, 32; 2026-02-21T08:54:39.3068091Z or.b64 %rd782, %rd779, %rd781; 2026-02-21T08:54:39.3068153Z cvt.u64.u32 %rd783, %r1544; 2026-02-21T08:54:39.3068209Z cvt.u64.u32 %rd784, %r1545; 2026-02-21T08:54:39.3068267Z shl.b64 %rd785, %rd784, 32; 
2026-02-21T08:54:39.3068363Z or.b64 %rd786, %rd783, %rd785; 2026-02-21T08:54:39.3068421Z cvt.u64.u32 %rd787, %r1546; 2026-02-21T08:54:39.3068479Z cvt.u64.u32 %rd788, %r1547; 2026-02-21T08:54:39.3068542Z shl.b64 %rd789, %rd788, 32; 2026-02-21T08:54:39.3068603Z or.b64 %rd790, %rd787, %rd789; 2026-02-21T08:54:39.3068662Z cvt.u64.u32 %rd791, %r1548; 2026-02-21T08:54:39.3068718Z cvt.u64.u32 %rd792, %r1549; 2026-02-21T08:54:39.3068783Z shl.b64 %rd793, %rd792, 32; 2026-02-21T08:54:39.3068844Z or.b64 %rd794, %rd791, %rd793; 2026-02-21T08:54:39.3068903Z cvt.u64.u32 %rd795, %r1550; 2026-02-21T08:54:39.3068969Z cvt.u64.u32 %rd796, %r1551; 2026-02-21T08:54:39.3069029Z shl.b64 %rd797, %rd796, 32; 2026-02-21T08:54:39.3069087Z or.b64 %rd798, %rd795, %rd797; 2026-02-21T08:54:39.3069143Z // begin inline asm 2026-02-21T08:54:39.3069225Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:39.3069282Z // end inline asm 2026-02-21T08:54:39.3069366Z cvt.u64.u32 %rd799, %r1009; 2026-02-21T08:54:39.3069439Z cvt.u64.u32 %rd800, %r1010; 2026-02-21T08:54:39.3069498Z shl.b64 %rd801, %rd800, 32; 2026-02-21T08:54:39.3069556Z or.b64 %rd802, %rd799, %rd801; 2026-02-21T08:54:39.3069741Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3069816Z mov.b64 {%r1557, %r1558}, %rd802; 2026-02-21T08:54:39.3069889Z cvt.rn.f16x2.f32 %r1559, %r1558, %r1557; 2026-02-21T08:54:39.3070063Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3070127Z cvt.u64.u32 %rd803, %r1011; 2026-02-21T08:54:39.3070184Z cvt.u64.u32 %rd804, %r1012; 2026-02-21T08:54:39.3070239Z shl.b64 %rd805, %rd804, 32; 2026-02-21T08:54:39.3070306Z or.b64 %rd806, %rd803, %rd805; 2026-02-21T08:54:39.3070481Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3070545Z mov.b64 {%r1560, %r1561}, %rd806; 2026-02-21T08:54:39.3070619Z cvt.rn.f16x2.f32 %r1562, %r1561, %r1560; 2026-02-21T08:54:39.3070799Z .loc 1 56 52 
// cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3070856Z cvt.u64.u32 %rd807, %r1013; 2026-02-21T08:54:39.3070943Z cvt.u64.u32 %rd808, %r1014; 2026-02-21T08:54:39.3071008Z shl.b64 %rd809, %rd808, 32; 2026-02-21T08:54:39.3071067Z or.b64 %rd810, %rd807, %rd809; 2026-02-21T08:54:39.3071242Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3071310Z mov.b64 {%r1563, %r1564}, %rd810; 2026-02-21T08:54:39.3071379Z cvt.rn.f16x2.f32 %r1565, %r1564, %r1563; 2026-02-21T08:54:39.3071552Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3071618Z cvt.u64.u32 %rd811, %r1015; 2026-02-21T08:54:39.3071674Z cvt.u64.u32 %rd812, %r1016; 2026-02-21T08:54:39.3071732Z shl.b64 %rd813, %rd812, 32; 2026-02-21T08:54:39.3071792Z or.b64 %rd814, %rd811, %rd813; 2026-02-21T08:54:39.3071976Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3072036Z mov.b64 {%r1566, %r1567}, %rd814; 2026-02-21T08:54:39.3072115Z cvt.rn.f16x2.f32 %r1568, %r1567, %r1566; 2026-02-21T08:54:39.3072285Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3072341Z cvt.u64.u32 %rd815, %r1017; 2026-02-21T08:54:39.3072417Z cvt.u64.u32 %rd816, %r1018; 2026-02-21T08:54:39.3072481Z shl.b64 %rd817, %rd816, 32; 2026-02-21T08:54:39.3072537Z or.b64 %rd818, %rd815, %rd817; 2026-02-21T08:54:39.3072704Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3072761Z mov.b64 {%r1569, %r1570}, %rd818; 2026-02-21T08:54:39.3072833Z cvt.rn.f16x2.f32 %r1571, %r1570, %r1569; 2026-02-21T08:54:39.3073000Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3073078Z cvt.u64.u32 %rd819, %r1019; 2026-02-21T08:54:39.3073141Z cvt.u64.u32 %rd820, %r1020; 2026-02-21T08:54:39.3073196Z shl.b64 %rd821, %rd820, 32; 2026-02-21T08:54:39.3073251Z 
or.b64 %rd822, %rd819, %rd821; 2026-02-21T08:54:39.3073424Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3073482Z mov.b64 {%r1572, %r1573}, %rd822; 2026-02-21T08:54:39.3073547Z cvt.rn.f16x2.f32 %r1574, %r1573, %r1572; 2026-02-21T08:54:39.3073710Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3073772Z cvt.u64.u32 %rd823, %r1021; 2026-02-21T08:54:39.3073826Z cvt.u64.u32 %rd824, %r1022; 2026-02-21T08:54:39.3073881Z shl.b64 %rd825, %rd824, 32; 2026-02-21T08:54:39.3073941Z or.b64 %rd826, %rd823, %rd825; 2026-02-21T08:54:39.3074128Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3074187Z mov.b64 {%r1575, %r1576}, %rd826; 2026-02-21T08:54:39.3074257Z cvt.rn.f16x2.f32 %r1577, %r1576, %r1575; 2026-02-21T08:54:39.3074424Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3074480Z cvt.u64.u32 %rd827, %r1023; 2026-02-21T08:54:39.3074535Z cvt.u64.u32 %rd828, %r1024; 2026-02-21T08:54:39.3074595Z shl.b64 %rd829, %rd828, 32; 2026-02-21T08:54:39.3074652Z or.b64 %rd830, %rd827, %rd829; 2026-02-21T08:54:39.3074866Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3074930Z mov.b64 {%r1578, %r1579}, %rd830; 2026-02-21T08:54:39.3074994Z cvt.rn.f16x2.f32 %r1580, %r1579, %r1578; 2026-02-21T08:54:39.3075154Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3075217Z cvt.u64.u32 %rd831, %r1026; 2026-02-21T08:54:39.3075271Z cvt.u64.u32 %rd832, %r1027; 2026-02-21T08:54:39.3075327Z shl.b64 %rd833, %rd832, 32; 2026-02-21T08:54:39.3075382Z or.b64 %rd834, %rd831, %rd833; 2026-02-21T08:54:39.3075550Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3075639Z mov.b64 {%r1581, %r1582}, %rd834; 2026-02-21T08:54:39.3075703Z cvt.rn.f16x2.f32 %r1583, %r1582, 
%r1581; 2026-02-21T08:54:39.3075876Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3075932Z cvt.u64.u32 %rd835, %r1028; 2026-02-21T08:54:39.3075987Z cvt.u64.u32 %rd836, %r1029; 2026-02-21T08:54:39.3076050Z shl.b64 %rd837, %rd836, 32; 2026-02-21T08:54:39.3076106Z or.b64 %rd838, %rd835, %rd837; 2026-02-21T08:54:39.3076272Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3076330Z mov.b64 {%r1584, %r1585}, %rd838; 2026-02-21T08:54:39.3076404Z cvt.rn.f16x2.f32 %r1586, %r1585, %r1584; 2026-02-21T08:54:39.3076568Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3076624Z cvt.u64.u32 %rd839, %r1030; 2026-02-21T08:54:39.3076690Z cvt.u64.u32 %rd840, %r1031; 2026-02-21T08:54:39.3076744Z shl.b64 %rd841, %rd840, 32; 2026-02-21T08:54:39.3076802Z or.b64 %rd842, %rd839, %rd841; 2026-02-21T08:54:39.3076997Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3077057Z mov.b64 {%r1587, %r1588}, %rd842; 2026-02-21T08:54:39.3077120Z cvt.rn.f16x2.f32 %r1589, %r1588, %r1587; 2026-02-21T08:54:39.3077286Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3077351Z cvt.u64.u32 %rd843, %r1032; 2026-02-21T08:54:39.3077405Z cvt.u64.u32 %rd844, %r1033; 2026-02-21T08:54:39.3077459Z shl.b64 %rd845, %rd844, 32; 2026-02-21T08:54:39.3077523Z or.b64 %rd846, %rd843, %rd845; 2026-02-21T08:54:39.3077714Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3077772Z mov.b64 {%r1590, %r1591}, %rd846; 2026-02-21T08:54:39.3077849Z cvt.rn.f16x2.f32 %r1592, %r1591, %r1590; 2026-02-21T08:54:39.3078014Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3078070Z cvt.u64.u32 %rd847, %r1034; 2026-02-21T08:54:39.3078127Z cvt.u64.u32 %rd848, %r1035; 2026-02-21T08:54:39.3078194Z shl.b64 
%rd849, %rd848, 32; 2026-02-21T08:54:39.3078254Z or.b64 %rd850, %rd847, %rd849; 2026-02-21T08:54:39.3078421Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3078487Z mov.b64 {%r1593, %r1594}, %rd850; 2026-02-21T08:54:39.3078551Z cvt.rn.f16x2.f32 %r1595, %r1594, %r1593; 2026-02-21T08:54:39.3078755Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3078823Z cvt.u64.u32 %rd851, %r1036; 2026-02-21T08:54:39.3078879Z cvt.u64.u32 %rd852, %r1037; 2026-02-21T08:54:39.3078933Z shl.b64 %rd853, %rd852, 32; 2026-02-21T08:54:39.3078989Z or.b64 %rd854, %rd851, %rd853; 2026-02-21T08:54:39.3079165Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3079222Z mov.b64 {%r1596, %r1597}, %rd854; 2026-02-21T08:54:39.3079286Z cvt.rn.f16x2.f32 %r1598, %r1597, %r1596; 2026-02-21T08:54:39.3079460Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3079515Z cvt.u64.u32 %rd855, %r1038; 2026-02-21T08:54:39.3079570Z cvt.u64.u32 %rd856, %r1039; 2026-02-21T08:54:39.3079631Z shl.b64 %rd857, %rd856, 32; 2026-02-21T08:54:39.3079687Z or.b64 %rd858, %rd855, %rd857; 2026-02-21T08:54:39.3079854Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3079912Z mov.b64 {%r1599, %r1600}, %rd858; 2026-02-21T08:54:39.3079985Z cvt.rn.f16x2.f32 %r1601, %r1600, %r1599; 2026-02-21T08:54:39.3080151Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3080229Z cvt.u64.u32 %rd859, %r1040; 2026-02-21T08:54:39.3080291Z cvt.u64.u32 %rd860, %r1041; 2026-02-21T08:54:39.3080346Z shl.b64 %rd861, %rd860, 32; 2026-02-21T08:54:39.3080404Z or.b64 %rd862, %rd859, %rd861; 2026-02-21T08:54:39.3080576Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3080634Z mov.b64 {%r1602, %r1603}, %rd862; 
2026-02-21T08:54:39.3080696Z cvt.rn.f16x2.f32 %r1604, %r1603, %r1602; 2026-02-21T08:54:39.3080753Z mov.b64 {%r1605, %r1606}, %rd254; 2026-02-21T08:54:39.3080825Z cvt.rn.f16x2.f32 %r1607, %r1606, %r1605; 2026-02-21T08:54:39.3080882Z mov.b64 {%r1608, %r1609}, %rd258; 2026-02-21T08:54:39.3080946Z cvt.rn.f16x2.f32 %r1610, %r1609, %r1608; 2026-02-21T08:54:39.3081008Z mov.b64 {%r1611, %r1612}, %rd262; 2026-02-21T08:54:39.3081070Z cvt.rn.f16x2.f32 %r1613, %r1612, %r1611; 2026-02-21T08:54:39.3081125Z mov.b64 {%r1614, %r1615}, %rd266; 2026-02-21T08:54:39.3081195Z cvt.rn.f16x2.f32 %r1616, %r1615, %r1614; 2026-02-21T08:54:39.3081251Z mov.b64 {%r1617, %r1618}, %rd270; 2026-02-21T08:54:39.3081313Z cvt.rn.f16x2.f32 %r1619, %r1618, %r1617; 2026-02-21T08:54:39.3081393Z mov.b64 {%r1620, %r1621}, %rd274; 2026-02-21T08:54:39.3081462Z cvt.rn.f16x2.f32 %r1622, %r1621, %r1620; 2026-02-21T08:54:39.3081518Z mov.b64 {%r1623, %r1624}, %rd278; 2026-02-21T08:54:39.3081580Z cvt.rn.f16x2.f32 %r1625, %r1624, %r1623; 2026-02-21T08:54:39.3081643Z mov.b64 {%r1626, %r1627}, %rd282; 2026-02-21T08:54:39.3081705Z cvt.rn.f16x2.f32 %r1628, %r1627, %r1626; 2026-02-21T08:54:39.3081759Z mov.b64 {%r1629, %r1630}, %rd286; 2026-02-21T08:54:39.3081821Z cvt.rn.f16x2.f32 %r1631, %r1630, %r1629; 2026-02-21T08:54:39.3081923Z mov.b64 {%r1632, %r1633}, %rd290; 2026-02-21T08:54:39.3081986Z cvt.rn.f16x2.f32 %r1634, %r1633, %r1632; 2026-02-21T08:54:39.3082041Z mov.b64 {%r1635, %r1636}, %rd294; 2026-02-21T08:54:39.3082110Z cvt.rn.f16x2.f32 %r1637, %r1636, %r1635; 2026-02-21T08:54:39.3082167Z mov.b64 {%r1638, %r1639}, %rd298; 2026-02-21T08:54:39.3082229Z cvt.rn.f16x2.f32 %r1640, %r1639, %r1638; 2026-02-21T08:54:39.3082292Z mov.b64 {%r1641, %r1642}, %rd302; 2026-02-21T08:54:39.3082355Z cvt.rn.f16x2.f32 %r1643, %r1642, %r1641; 2026-02-21T08:54:39.3082410Z mov.b64 {%r1644, %r1645}, %rd306; 2026-02-21T08:54:39.3082472Z cvt.rn.f16x2.f32 %r1646, %r1645, %r1644; 2026-02-21T08:54:39.3082534Z mov.b64 {%r1647, %r1648}, %rd310; 
2026-02-21T08:54:39.3082596Z cvt.rn.f16x2.f32 %r1649, %r1648, %r1647; 2026-02-21T08:54:39.3082650Z mov.b64 {%r1650, %r1651}, %rd314; 2026-02-21T08:54:39.3082718Z cvt.rn.f16x2.f32 %r1652, %r1651, %r1650; 2026-02-21T08:54:39.3082904Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3082963Z cvt.u64.u32 %rd863, %r1077; 2026-02-21T08:54:39.3083018Z cvt.u64.u32 %rd864, %r1078; 2026-02-21T08:54:39.3083079Z shl.b64 %rd865, %rd864, 32; 2026-02-21T08:54:39.3083136Z or.b64 %rd866, %rd863, %rd865; 2026-02-21T08:54:39.3083298Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3083361Z mov.b64 {%r1653, %r1654}, %rd866; 2026-02-21T08:54:39.3083423Z cvt.rn.f16x2.f32 %r1655, %r1654, %r1653; 2026-02-21T08:54:39.3083584Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3083646Z cvt.u64.u32 %rd867, %r1079; 2026-02-21T08:54:39.3083700Z cvt.u64.u32 %rd868, %r1080; 2026-02-21T08:54:39.3083753Z shl.b64 %rd869, %rd868, 32; 2026-02-21T08:54:39.3083809Z or.b64 %rd870, %rd867, %rd869; 2026-02-21T08:54:39.3083977Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3084035Z mov.b64 {%r1656, %r1657}, %rd870; 2026-02-21T08:54:39.3084097Z cvt.rn.f16x2.f32 %r1658, %r1657, %r1656; 2026-02-21T08:54:39.3084268Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3084345Z cvt.u64.u32 %rd871, %r1081; 2026-02-21T08:54:39.3084399Z cvt.u64.u32 %rd872, %r1082; 2026-02-21T08:54:39.3084459Z shl.b64 %rd873, %rd872, 32; 2026-02-21T08:54:39.3084516Z or.b64 %rd874, %rd871, %rd873; 2026-02-21T08:54:39.3084722Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3084780Z mov.b64 {%r1659, %r1660}, %rd874; 2026-02-21T08:54:39.3084850Z cvt.rn.f16x2.f32 %r1661, %r1660, %r1659; 2026-02-21T08:54:39.3085009Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3085065Z cvt.u64.u32 %rd875, %r1083; 2026-02-21T08:54:39.3085126Z cvt.u64.u32 %rd876, %r1084; 2026-02-21T08:54:39.3085181Z shl.b64 %rd877, %rd876, 32; 2026-02-21T08:54:39.3085237Z or.b64 %rd878, %rd875, %rd877; 2026-02-21T08:54:39.3085406Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3085465Z mov.b64 {%r1662, %r1663}, %rd878; 2026-02-21T08:54:39.3085528Z cvt.rn.f16x2.f32 %r1664, %r1663, %r1662; 2026-02-21T08:54:39.3085721Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3085780Z cvt.u64.u32 %rd879, %r1085; 2026-02-21T08:54:39.3085837Z cvt.u64.u32 %rd880, %r1086; 2026-02-21T08:54:39.3085893Z shl.b64 %rd881, %rd880, 32; 2026-02-21T08:54:39.3085958Z or.b64 %rd882, %rd879, %rd881; 2026-02-21T08:54:39.3086126Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3086185Z mov.b64 {%r1665, %r1666}, %rd882; 2026-02-21T08:54:39.3086291Z cvt.rn.f16x2.f32 %r1667, %r1666, %r1665; 2026-02-21T08:54:39.3086457Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3086516Z cvt.u64.u32 %rd883, %r1087; 2026-02-21T08:54:39.3086587Z cvt.u64.u32 %rd884, %r1088; 2026-02-21T08:54:39.3086643Z shl.b64 %rd885, %rd884, 32; 2026-02-21T08:54:39.3086698Z or.b64 %rd886, %rd883, %rd885; 2026-02-21T08:54:39.3086860Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3086926Z mov.b64 {%r1668, %r1669}, %rd886; 2026-02-21T08:54:39.3086988Z cvt.rn.f16x2.f32 %r1670, %r1669, %r1668; 2026-02-21T08:54:39.3087147Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3087210Z cvt.u64.u32 %rd887, %r1089; 2026-02-21T08:54:39.3087265Z cvt.u64.u32 %rd888, %r1090; 2026-02-21T08:54:39.3087351Z shl.b64 %rd889, %rd888, 32; 2026-02-21T08:54:39.3087410Z 
or.b64 %rd890, %rd887, %rd889; 2026-02-21T08:54:39.3087581Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3087637Z mov.b64 {%r1671, %r1672}, %rd890; 2026-02-21T08:54:39.3087702Z cvt.rn.f16x2.f32 %r1673, %r1672, %r1671; 2026-02-21T08:54:39.3087872Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3087927Z cvt.u64.u32 %rd891, %r1091; 2026-02-21T08:54:39.3087984Z cvt.u64.u32 %rd892, %r1092; 2026-02-21T08:54:39.3088045Z shl.b64 %rd893, %rd892, 32; 2026-02-21T08:54:39.3088101Z or.b64 %rd894, %rd891, %rd893; 2026-02-21T08:54:39.3088265Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3088328Z mov.b64 {%r1674, %r1675}, %rd894; 2026-02-21T08:54:39.3088391Z cvt.rn.f16x2.f32 %r1676, %r1675, %r1674; 2026-02-21T08:54:39.3088559Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3088615Z cvt.u64.u32 %rd895, %r1094; 2026-02-21T08:54:39.3088678Z cvt.u64.u32 %rd896, %r1095; 2026-02-21T08:54:39.3088733Z shl.b64 %rd897, %rd896, 32; 2026-02-21T08:54:39.3088816Z or.b64 %rd898, %rd895, %rd897; 2026-02-21T08:54:39.3088986Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3089043Z mov.b64 {%r1677, %r1678}, %rd898; 2026-02-21T08:54:39.3089108Z cvt.rn.f16x2.f32 %r1679, %r1678, %r1677; 2026-02-21T08:54:39.3089281Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3089338Z cvt.u64.u32 %rd899, %r1096; 2026-02-21T08:54:39.3089392Z cvt.u64.u32 %rd900, %r1097; 2026-02-21T08:54:39.3089447Z shl.b64 %rd901, %rd900, 32; 2026-02-21T08:54:39.3089512Z or.b64 %rd902, %rd899, %rd901; 2026-02-21T08:54:39.3089678Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3089736Z mov.b64 {%r1680, %r1681}, %rd902; 2026-02-21T08:54:39.3089807Z cvt.rn.f16x2.f32 %r1682, %r1681, 
%r1680; 2026-02-21T08:54:39.3089969Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3090027Z cvt.u64.u32 %rd903, %r1098; 2026-02-21T08:54:39.3090087Z cvt.u64.u32 %rd904, %r1099; 2026-02-21T08:54:39.3090142Z shl.b64 %rd905, %rd904, 32; 2026-02-21T08:54:39.3090221Z or.b64 %rd906, %rd903, %rd905; 2026-02-21T08:54:39.3090392Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3090458Z mov.b64 {%r1683, %r1684}, %rd906; 2026-02-21T08:54:39.3090523Z cvt.rn.f16x2.f32 %r1685, %r1684, %r1683; 2026-02-21T08:54:39.3090690Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3090756Z cvt.u64.u32 %rd907, %r1100; 2026-02-21T08:54:39.3090833Z cvt.u64.u32 %rd908, %r1101; 2026-02-21T08:54:39.3090888Z shl.b64 %rd909, %rd908, 32; 2026-02-21T08:54:39.3090950Z or.b64 %rd910, %rd907, %rd909; 2026-02-21T08:54:39.3091111Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3091169Z mov.b64 {%r1686, %r1687}, %rd910; 2026-02-21T08:54:39.3091232Z cvt.rn.f16x2.f32 %r1688, %r1687, %r1686; 2026-02-21T08:54:39.3091401Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3091457Z cvt.u64.u32 %rd911, %r1102; 2026-02-21T08:54:39.3091510Z cvt.u64.u32 %rd912, %r1103; 2026-02-21T08:54:39.3091571Z shl.b64 %rd913, %rd912, 32; 2026-02-21T08:54:39.3091627Z or.b64 %rd914, %rd911, %rd913; 2026-02-21T08:54:39.3091788Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3091873Z mov.b64 {%r1689, %r1690}, %rd914; 2026-02-21T08:54:39.3091939Z cvt.rn.f16x2.f32 %r1691, %r1690, %r1689; 2026-02-21T08:54:39.3092102Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3092156Z cvt.u64.u32 %rd915, %r1104; 2026-02-21T08:54:39.3092219Z cvt.u64.u32 %rd916, %r1105; 2026-02-21T08:54:39.3092272Z shl.b64 
%rd917, %rd916, 32; 2026-02-21T08:54:39.3092329Z or.b64 %rd918, %rd915, %rd917; 2026-02-21T08:54:39.3092500Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3092557Z mov.b64 {%r1692, %r1693}, %rd918; 2026-02-21T08:54:39.3092620Z cvt.rn.f16x2.f32 %r1694, %r1693, %r1692; 2026-02-21T08:54:39.3092792Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3092847Z cvt.u64.u32 %rd919, %r1106; 2026-02-21T08:54:39.3092902Z cvt.u64.u32 %rd920, %r1107; 2026-02-21T08:54:39.3092958Z shl.b64 %rd921, %rd920, 32; 2026-02-21T08:54:39.3093023Z or.b64 %rd922, %rd919, %rd921; 2026-02-21T08:54:39.3093188Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3093244Z mov.b64 {%r1695, %r1696}, %rd922; 2026-02-21T08:54:39.3093334Z cvt.rn.f16x2.f32 %r1697, %r1696, %r1695; 2026-02-21T08:54:39.3093492Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3093546Z cvt.u64.u32 %rd923, %r1108; 2026-02-21T08:54:39.3093609Z cvt.u64.u32 %rd924, %r1109; 2026-02-21T08:54:39.3093664Z shl.b64 %rd925, %rd924, 32; 2026-02-21T08:54:39.3093721Z or.b64 %rd926, %rd923, %rd925; 2026-02-21T08:54:39.3093878Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3093943Z mov.b64 {%r1698, %r1699}, %rd926; 2026-02-21T08:54:39.3094006Z cvt.rn.f16x2.f32 %r1700, %r1699, %r1698; 2026-02-21T08:54:39.3094064Z mov.b64 {%r1701, %r1702}, %rd318; 2026-02-21T08:54:39.3094137Z cvt.rn.f16x2.f32 %r1703, %r1702, %r1701; 2026-02-21T08:54:39.3094194Z mov.b64 {%r1704, %r1705}, %rd322; 2026-02-21T08:54:39.3094257Z cvt.rn.f16x2.f32 %r1706, %r1705, %r1704; 2026-02-21T08:54:39.3094322Z mov.b64 {%r1707, %r1708}, %rd326; 2026-02-21T08:54:39.3094387Z cvt.rn.f16x2.f32 %r1709, %r1708, %r1707; 2026-02-21T08:54:39.3094443Z mov.b64 {%r1710, %r1711}, %rd330; 2026-02-21T08:54:39.3094507Z cvt.rn.f16x2.f32 %r1712, %r1711, 
%r1710; 2026-02-21T08:54:39.3094595Z mov.b64 {%r1713, %r1714}, %rd334; 2026-02-21T08:54:39.3094662Z cvt.rn.f16x2.f32 %r1715, %r1714, %r1713; 2026-02-21T08:54:39.3094748Z mov.b64 {%r1716, %r1717}, %rd338; 2026-02-21T08:54:39.3094820Z cvt.rn.f16x2.f32 %r1718, %r1717, %r1716; 2026-02-21T08:54:39.3094876Z mov.b64 {%r1719, %r1720}, %rd342; 2026-02-21T08:54:39.3094938Z cvt.rn.f16x2.f32 %r1721, %r1720, %r1719; 2026-02-21T08:54:39.3094993Z mov.b64 {%r1722, %r1723}, %rd346; 2026-02-21T08:54:39.3095064Z cvt.rn.f16x2.f32 %r1724, %r1723, %r1722; 2026-02-21T08:54:39.3095146Z mov.b64 {%r1725, %r1726}, %rd350; 2026-02-21T08:54:39.3095207Z cvt.rn.f16x2.f32 %r1727, %r1726, %r1725; 2026-02-21T08:54:39.3095270Z mov.b64 {%r1728, %r1729}, %rd354; 2026-02-21T08:54:39.3095331Z cvt.rn.f16x2.f32 %r1730, %r1729, %r1728; 2026-02-21T08:54:39.3095387Z mov.b64 {%r1731, %r1732}, %rd358; 2026-02-21T08:54:39.3095455Z cvt.rn.f16x2.f32 %r1733, %r1732, %r1731; 2026-02-21T08:54:39.3095510Z mov.b64 {%r1734, %r1735}, %rd362; 2026-02-21T08:54:39.3095573Z cvt.rn.f16x2.f32 %r1736, %r1735, %r1734; 2026-02-21T08:54:39.3095627Z mov.b64 {%r1737, %r1738}, %rd366; 2026-02-21T08:54:39.3095696Z cvt.rn.f16x2.f32 %r1739, %r1738, %r1737; 2026-02-21T08:54:39.3095751Z mov.b64 {%r1740, %r1741}, %rd370; 2026-02-21T08:54:39.3095812Z cvt.rn.f16x2.f32 %r1742, %r1741, %r1740; 2026-02-21T08:54:39.3095872Z mov.b64 {%r1743, %r1744}, %rd374; 2026-02-21T08:54:39.3095934Z cvt.rn.f16x2.f32 %r1745, %r1744, %r1743; 2026-02-21T08:54:39.3096012Z mov.b64 {%r1746, %r1747}, %rd378; 2026-02-21T08:54:39.3096077Z cvt.rn.f16x2.f32 %r1748, %r1747, %r1746; 2026-02-21T08:54:39.3096248Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3096304Z cvt.u64.u32 %rd927, %r1145; 2026-02-21T08:54:39.3096361Z cvt.u64.u32 %rd928, %r1146; 2026-02-21T08:54:39.3096426Z shl.b64 %rd929, %rd928, 32; 2026-02-21T08:54:39.3096483Z or.b64 %rd930, %rd927, %rd929; 2026-02-21T08:54:39.3096644Z .loc 1 58 27 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3096707Z mov.b64 {%r1749, %r1750}, %rd930; 2026-02-21T08:54:39.3096769Z cvt.rn.f16x2.f32 %r1751, %r1750, %r1749; 2026-02-21T08:54:39.3096931Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3096986Z cvt.u64.u32 %rd931, %r1147; 2026-02-21T08:54:39.3097049Z cvt.u64.u32 %rd932, %r1148; 2026-02-21T08:54:39.3097105Z shl.b64 %rd933, %rd932, 32; 2026-02-21T08:54:39.3097162Z or.b64 %rd934, %rd931, %rd933; 2026-02-21T08:54:39.3097333Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3097389Z mov.b64 {%r1752, %r1753}, %rd934; 2026-02-21T08:54:39.3097478Z cvt.rn.f16x2.f32 %r1754, %r1753, %r1752; 2026-02-21T08:54:39.3097645Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3097699Z cvt.u64.u32 %rd935, %r1149; 2026-02-21T08:54:39.3097755Z cvt.u64.u32 %rd936, %r1150; 2026-02-21T08:54:39.3097811Z shl.b64 %rd937, %rd936, 32; 2026-02-21T08:54:39.3097875Z or.b64 %rd938, %rd935, %rd937; 2026-02-21T08:54:39.3098035Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3098092Z mov.b64 {%r1755, %r1756}, %rd938; 2026-02-21T08:54:39.3098161Z cvt.rn.f16x2.f32 %r1757, %r1756, %r1755; 2026-02-21T08:54:39.3098322Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3098379Z cvt.u64.u32 %rd939, %r1151; 2026-02-21T08:54:39.3098439Z cvt.u64.u32 %rd940, %r1152; 2026-02-21T08:54:39.3098494Z shl.b64 %rd941, %rd940, 32; 2026-02-21T08:54:39.3098552Z or.b64 %rd942, %rd939, %rd941; 2026-02-21T08:54:39.3098710Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3098774Z mov.b64 {%r1758, %r1759}, %rd942; 2026-02-21T08:54:39.3098959Z cvt.rn.f16x2.f32 %r1760, %r1759, %r1758; 2026-02-21T08:54:39.3099126Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3099189Z cvt.u64.u32 %rd943, %r1153; 2026-02-21T08:54:39.3099245Z cvt.u64.u32 %rd944, %r1154; 2026-02-21T08:54:39.3099300Z shl.b64 %rd945, %rd944, 32; 2026-02-21T08:54:39.3099365Z or.b64 %rd946, %rd943, %rd945; 2026-02-21T08:54:39.3099533Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3099613Z mov.b64 {%r1761, %r1762}, %rd946; 2026-02-21T08:54:39.3099676Z cvt.rn.f16x2.f32 %r1763, %r1762, %r1761; 2026-02-21T08:54:39.3099849Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3099905Z cvt.u64.u32 %rd947, %r1155; 2026-02-21T08:54:39.3099961Z cvt.u64.u32 %rd948, %r1156; 2026-02-21T08:54:39.3100026Z shl.b64 %rd949, %rd948, 32; 2026-02-21T08:54:39.3100083Z or.b64 %rd950, %rd947, %rd949; 2026-02-21T08:54:39.3100242Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3100308Z mov.b64 {%r1764, %r1765}, %rd950; 2026-02-21T08:54:39.3100371Z cvt.rn.f16x2.f32 %r1766, %r1765, %r1764; 2026-02-21T08:54:39.3100529Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3100611Z cvt.u64.u32 %rd951, %r1157; 2026-02-21T08:54:39.3100671Z cvt.u64.u32 %rd952, %r1158; 2026-02-21T08:54:39.3100725Z shl.b64 %rd953, %rd952, 32; 2026-02-21T08:54:39.3100780Z or.b64 %rd954, %rd951, %rd953; 2026-02-21T08:54:39.3100950Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3101007Z mov.b64 {%r1767, %r1768}, %rd954; 2026-02-21T08:54:39.3101071Z cvt.rn.f16x2.f32 %r1769, %r1768, %r1767; 2026-02-21T08:54:39.3101239Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3101295Z cvt.u64.u32 %rd955, %r1159; 2026-02-21T08:54:39.3101350Z cvt.u64.u32 %rd956, %r1160; 2026-02-21T08:54:39.3101405Z shl.b64 %rd957, %rd956, 32; 2026-02-21T08:54:39.3101468Z 
or.b64 %rd958, %rd955, %rd957; 2026-02-21T08:54:39.3101633Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3101691Z mov.b64 {%r1770, %r1771}, %rd958; 2026-02-21T08:54:39.3101766Z cvt.rn.f16x2.f32 %r1772, %r1771, %r1770; 2026-02-21T08:54:39.3101929Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3101983Z cvt.u64.u32 %rd959, %r1162; 2026-02-21T08:54:39.3102067Z cvt.u64.u32 %rd960, %r1163; 2026-02-21T08:54:39.3102121Z shl.b64 %rd961, %rd960, 32; 2026-02-21T08:54:39.3102178Z or.b64 %rd962, %rd959, %rd961; 2026-02-21T08:54:39.3102337Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3102403Z mov.b64 {%r1773, %r1774}, %rd962; 2026-02-21T08:54:39.3102469Z cvt.rn.f16x2.f32 %r1775, %r1774, %r1773; 2026-02-21T08:54:39.3102633Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3102699Z cvt.u64.u32 %rd963, %r1164; 2026-02-21T08:54:39.3102756Z cvt.u64.u32 %rd964, %r1165; 2026-02-21T08:54:39.3102820Z shl.b64 %rd965, %rd964, 32; 2026-02-21T08:54:39.3102886Z or.b64 %rd966, %rd963, %rd965; 2026-02-21T08:54:39.3103047Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3103104Z mov.b64 {%r1776, %r1777}, %rd966; 2026-02-21T08:54:39.3103176Z cvt.rn.f16x2.f32 %r1778, %r1777, %r1776; 2026-02-21T08:54:39.3103333Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3103388Z cvt.u64.u32 %rd967, %r1166; 2026-02-21T08:54:39.3103464Z cvt.u64.u32 %rd968, %r1167; 2026-02-21T08:54:39.3103528Z shl.b64 %rd969, %rd968, 32; 2026-02-21T08:54:39.3103584Z or.b64 %rd970, %rd967, %rd969; 2026-02-21T08:54:39.3103739Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3103806Z mov.b64 {%r1779, %r1780}, %rd970; 2026-02-21T08:54:39.3103869Z cvt.rn.f16x2.f32 %r1781, %r1780, 
%r1779; 2026-02-21T08:54:39.3104024Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3104106Z cvt.u64.u32 %rd971, %r1168; 2026-02-21T08:54:39.3104162Z cvt.u64.u32 %rd972, %r1169; 2026-02-21T08:54:39.3104216Z shl.b64 %rd973, %rd972, 32; 2026-02-21T08:54:39.3104273Z or.b64 %rd974, %rd971, %rd973; 2026-02-21T08:54:39.3104443Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3104499Z mov.b64 {%r1782, %r1783}, %rd974; 2026-02-21T08:54:39.3104562Z cvt.rn.f16x2.f32 %r1784, %r1783, %r1782; 2026-02-21T08:54:39.3104758Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3104814Z cvt.u64.u32 %rd975, %r1170; 2026-02-21T08:54:39.3104870Z cvt.u64.u32 %rd976, %r1171; 2026-02-21T08:54:39.3104931Z shl.b64 %rd977, %rd976, 32; 2026-02-21T08:54:39.3104987Z or.b64 %rd978, %rd975, %rd977; 2026-02-21T08:54:39.3105177Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3105236Z mov.b64 {%r1785, %r1786}, %rd978; 2026-02-21T08:54:39.3105305Z cvt.rn.f16x2.f32 %r1787, %r1786, %r1785; 2026-02-21T08:54:39.3105468Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3105524Z cvt.u64.u32 %rd979, %r1172; 2026-02-21T08:54:39.3105587Z cvt.u64.u32 %rd980, %r1173; 2026-02-21T08:54:39.3105642Z shl.b64 %rd981, %rd980, 32; 2026-02-21T08:54:39.3105700Z or.b64 %rd982, %rd979, %rd981; 2026-02-21T08:54:39.3105869Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3105928Z mov.b64 {%r1788, %r1789}, %rd982; 2026-02-21T08:54:39.3105994Z cvt.rn.f16x2.f32 %r1790, %r1789, %r1788; 2026-02-21T08:54:39.3106162Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3106230Z cvt.u64.u32 %rd983, %r1174; 2026-02-21T08:54:39.3106289Z cvt.u64.u32 %rd984, %r1175; 2026-02-21T08:54:39.3106347Z shl.b64 
%rd985, %rd984, 32; 2026-02-21T08:54:39.3106414Z or.b64 %rd986, %rd983, %rd985; 2026-02-21T08:54:39.3106584Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3106674Z mov.b64 {%r1791, %r1792}, %rd986; 2026-02-21T08:54:39.3106747Z cvt.rn.f16x2.f32 %r1793, %r1792, %r1791; 2026-02-21T08:54:39.3106921Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3106978Z cvt.u64.u32 %rd987, %r1176; 2026-02-21T08:54:39.3107034Z cvt.u64.u32 %rd988, %r1177; 2026-02-21T08:54:39.3107098Z shl.b64 %rd989, %rd988, 32; 2026-02-21T08:54:39.3107156Z or.b64 %rd990, %rd987, %rd989; 2026-02-21T08:54:39.3107328Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3107394Z mov.b64 {%r1794, %r1795}, %rd990; 2026-02-21T08:54:39.3107461Z cvt.rn.f16x2.f32 %r1796, %r1795, %r1794; 2026-02-21T08:54:39.3107519Z mov.b64 {%r1797, %r1798}, %rd382; 2026-02-21T08:54:39.3107590Z cvt.rn.f16x2.f32 %r1799, %r1798, %r1797; 2026-02-21T08:54:39.3107648Z mov.b64 {%r1800, %r1801}, %rd386; 2026-02-21T08:54:39.3107713Z cvt.rn.f16x2.f32 %r1802, %r1801, %r1800; 2026-02-21T08:54:39.3107770Z mov.b64 {%r1803, %r1804}, %rd390; 2026-02-21T08:54:39.3107841Z cvt.rn.f16x2.f32 %r1805, %r1804, %r1803; 2026-02-21T08:54:39.3107925Z mov.b64 {%r1806, %r1807}, %rd394; 2026-02-21T08:54:39.3107991Z cvt.rn.f16x2.f32 %r1808, %r1807, %r1806; 2026-02-21T08:54:39.3108056Z mov.b64 {%r1809, %r1810}, %rd398; 2026-02-21T08:54:39.3108119Z cvt.rn.f16x2.f32 %r1811, %r1810, %r1809; 2026-02-21T08:54:39.3108175Z mov.b64 {%r1812, %r1813}, %rd402; 2026-02-21T08:54:39.3108239Z cvt.rn.f16x2.f32 %r1814, %r1813, %r1812; 2026-02-21T08:54:39.3108303Z mov.b64 {%r1815, %r1816}, %rd406; 2026-02-21T08:54:39.3108367Z cvt.rn.f16x2.f32 %r1817, %r1816, %r1815; 2026-02-21T08:54:39.3108453Z mov.b64 {%r1818, %r1819}, %rd410; 2026-02-21T08:54:39.3108526Z cvt.rn.f16x2.f32 %r1820, %r1819, %r1818; 2026-02-21T08:54:39.3108583Z mov.b64 
{%r1821, %r1822}, %rd414; 2026-02-21T08:54:39.3108646Z cvt.rn.f16x2.f32 %r1823, %r1822, %r1821; 2026-02-21T08:54:39.3108713Z mov.b64 {%r1824, %r1825}, %rd418; 2026-02-21T08:54:39.3108778Z cvt.rn.f16x2.f32 %r1826, %r1825, %r1824; 2026-02-21T08:54:39.3108835Z mov.b64 {%r1827, %r1828}, %rd422; 2026-02-21T08:54:39.3108902Z cvt.rn.f16x2.f32 %r1829, %r1828, %r1827; 2026-02-21T08:54:39.3108968Z mov.b64 {%r1830, %r1831}, %rd426; 2026-02-21T08:54:39.3109032Z cvt.rn.f16x2.f32 %r1832, %r1831, %r1830; 2026-02-21T08:54:39.3109089Z mov.b64 {%r1833, %r1834}, %rd430; 2026-02-21T08:54:39.3109159Z cvt.rn.f16x2.f32 %r1835, %r1834, %r1833; 2026-02-21T08:54:39.3109216Z mov.b64 {%r1836, %r1837}, %rd434; 2026-02-21T08:54:39.3109282Z cvt.rn.f16x2.f32 %r1838, %r1837, %r1836; 2026-02-21T08:54:39.3109363Z mov.b64 {%r1839, %r1840}, %rd438; 2026-02-21T08:54:39.3109437Z cvt.rn.f16x2.f32 %r1841, %r1840, %r1839; 2026-02-21T08:54:39.3109495Z mov.b64 {%r1842, %r1843}, %rd442; 2026-02-21T08:54:39.3109560Z cvt.rn.f16x2.f32 %r1844, %r1843, %r1842; 2026-02-21T08:54:39.3109733Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3109793Z cvt.u64.u32 %rd991, %r1213; 2026-02-21T08:54:39.3109852Z cvt.u64.u32 %rd992, %r1214; 2026-02-21T08:54:39.3109917Z shl.b64 %rd993, %rd992, 32; 2026-02-21T08:54:39.3109976Z or.b64 %rd994, %rd991, %rd993; 2026-02-21T08:54:39.3110144Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3110205Z mov.b64 {%r1845, %r1846}, %rd994; 2026-02-21T08:54:39.3110277Z cvt.rn.f16x2.f32 %r1847, %r1846, %r1845; 2026-02-21T08:54:39.3110444Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3110506Z cvt.u64.u32 %rd995, %r1215; 2026-02-21T08:54:39.3110574Z cvt.u64.u32 %rd996, %r1216; 2026-02-21T08:54:39.3110633Z shl.b64 %rd997, %rd996, 32; 2026-02-21T08:54:39.3110693Z or.b64 %rd998, %rd995, %rd997; 2026-02-21T08:54:39.3110870Z .loc 1 58 27 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3110952Z mov.b64 {%r1848, %r1849}, %rd998; 2026-02-21T08:54:39.3111017Z cvt.rn.f16x2.f32 %r1850, %r1849, %r1848; 2026-02-21T08:54:39.3111193Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3111258Z cvt.u64.u32 %rd999, %r1217; 2026-02-21T08:54:39.3111319Z cvt.u64.u32 %rd1000, %r1218; 2026-02-21T08:54:39.3111382Z shl.b64 %rd1001, %rd1000, 32; 2026-02-21T08:54:39.3111455Z or.b64 %rd1002, %rd999, %rd1001; 2026-02-21T08:54:39.3111629Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3111692Z mov.b64 {%r1851, %r1852}, %rd1002; 2026-02-21T08:54:39.3111767Z cvt.rn.f16x2.f32 %r1853, %r1852, %r1851; 2026-02-21T08:54:39.3111938Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3111999Z cvt.u64.u32 %rd1003, %r1219; 2026-02-21T08:54:39.3112061Z cvt.u64.u32 %rd1004, %r1220; 2026-02-21T08:54:39.3112132Z shl.b64 %rd1005, %rd1004, 32; 2026-02-21T08:54:39.3112191Z or.b64 %rd1006, %rd1003, %rd1005; 2026-02-21T08:54:39.3112386Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3112456Z mov.b64 {%r1854, %r1855}, %rd1006; 2026-02-21T08:54:39.3112524Z cvt.rn.f16x2.f32 %r1856, %r1855, %r1854; 2026-02-21T08:54:39.3112692Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3112759Z cvt.u64.u32 %rd1007, %r1221; 2026-02-21T08:54:39.3112818Z cvt.u64.u32 %rd1008, %r1222; 2026-02-21T08:54:39.3112878Z shl.b64 %rd1009, %rd1008, 32; 2026-02-21T08:54:39.3112959Z or.b64 %rd1010, %rd1007, %rd1009; 2026-02-21T08:54:39.3113137Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3113200Z mov.b64 {%r1857, %r1858}, %rd1010; 2026-02-21T08:54:39.3113265Z cvt.rn.f16x2.f32 %r1859, %r1858, %r1857; 2026-02-21T08:54:39.3113444Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3113505Z cvt.u64.u32 %rd1011, %r1223; 2026-02-21T08:54:39.3113565Z cvt.u64.u32 %rd1012, %r1224; 2026-02-21T08:54:39.3113631Z shl.b64 %rd1013, %rd1012, 32; 2026-02-21T08:54:39.3113690Z or.b64 %rd1014, %rd1011, %rd1013; 2026-02-21T08:54:39.3113865Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3113924Z mov.b64 {%r1860, %r1861}, %rd1014; 2026-02-21T08:54:39.3114016Z cvt.rn.f16x2.f32 %r1862, %r1861, %r1860; 2026-02-21T08:54:39.3114178Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3114234Z cvt.u64.u32 %rd1015, %r1225; 2026-02-21T08:54:39.3114296Z cvt.u64.u32 %rd1016, %r1226; 2026-02-21T08:54:39.3114353Z shl.b64 %rd1017, %rd1016, 32; 2026-02-21T08:54:39.3114408Z or.b64 %rd1018, %rd1015, %rd1017; 2026-02-21T08:54:39.3114574Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3114632Z mov.b64 {%r1863, %r1864}, %rd1018; 2026-02-21T08:54:39.3114718Z cvt.rn.f16x2.f32 %r1865, %r1864, %r1863; 2026-02-21T08:54:39.3114889Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3114946Z cvt.u64.u32 %rd1019, %r1227; 2026-02-21T08:54:39.3115002Z cvt.u64.u32 %rd1020, %r1228; 2026-02-21T08:54:39.3115058Z shl.b64 %rd1021, %rd1020, 32; 2026-02-21T08:54:39.3115122Z or.b64 %rd1022, %rd1019, %rd1021; 2026-02-21T08:54:39.3115288Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3115345Z mov.b64 {%r1866, %r1867}, %rd1022; 2026-02-21T08:54:39.3115416Z cvt.rn.f16x2.f32 %r1868, %r1867, %r1866; 2026-02-21T08:54:39.3115615Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3115672Z cvt.u64.u32 %rd1023, %r1230; 2026-02-21T08:54:39.3115732Z cvt.u64.u32 %rd1024, %r1231; 2026-02-21T08:54:39.3115790Z shl.b64 %rd1025, %rd1024, 32; 
2026-02-21T08:54:39.3115846Z or.b64 %rd1026, %rd1023, %rd1025; 2026-02-21T08:54:39.3116000Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3116065Z mov.b64 {%r1869, %r1870}, %rd1026; 2026-02-21T08:54:39.3116127Z cvt.rn.f16x2.f32 %r1871, %r1870, %r1869; 2026-02-21T08:54:39.3116290Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3116354Z cvt.u64.u32 %rd1027, %r1232; 2026-02-21T08:54:39.3116409Z cvt.u64.u32 %rd1028, %r1233; 2026-02-21T08:54:39.3116467Z shl.b64 %rd1029, %rd1028, 32; 2026-02-21T08:54:39.3116529Z or.b64 %rd1030, %rd1027, %rd1029; 2026-02-21T08:54:39.3116689Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3116746Z mov.b64 {%r1872, %r1873}, %rd1030; 2026-02-21T08:54:39.3116833Z cvt.rn.f16x2.f32 %r1874, %r1873, %r1872; 2026-02-21T08:54:39.3117008Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3117065Z cvt.u64.u32 %rd1031, %r1234; 2026-02-21T08:54:39.3117121Z cvt.u64.u32 %rd1032, %r1235; 2026-02-21T08:54:39.3117185Z shl.b64 %rd1033, %rd1032, 32; 2026-02-21T08:54:39.3117242Z or.b64 %rd1034, %rd1031, %rd1033; 2026-02-21T08:54:39.3117405Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3117493Z mov.b64 {%r1875, %r1876}, %rd1034; 2026-02-21T08:54:39.3117556Z cvt.rn.f16x2.f32 %r1877, %r1876, %r1875; 2026-02-21T08:54:39.3117719Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3117776Z cvt.u64.u32 %rd1035, %r1236; 2026-02-21T08:54:39.3117838Z cvt.u64.u32 %rd1036, %r1237; 2026-02-21T08:54:39.3117894Z shl.b64 %rd1037, %rd1036, 32; 2026-02-21T08:54:39.3117952Z or.b64 %rd1038, %rd1035, %rd1037; 2026-02-21T08:54:39.3118121Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3118177Z mov.b64 {%r1878, %r1879}, %rd1038; 
2026-02-21T08:54:39.3118240Z cvt.rn.f16x2.f32 %r1880, %r1879, %r1878; 2026-02-21T08:54:39.3118407Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3118490Z cvt.u64.u32 %rd1039, %r1238; 2026-02-21T08:54:39.3118547Z cvt.u64.u32 %rd1040, %r1239; 2026-02-21T08:54:39.3118605Z shl.b64 %rd1041, %rd1040, 32; 2026-02-21T08:54:39.3118667Z or.b64 %rd1042, %rd1039, %rd1041; 2026-02-21T08:54:39.3118829Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3118887Z mov.b64 {%r1881, %r1882}, %rd1042; 2026-02-21T08:54:39.3118959Z cvt.rn.f16x2.f32 %r1883, %r1882, %r1881; 2026-02-21T08:54:39.3119120Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3119176Z cvt.u64.u32 %rd1043, %r1240; 2026-02-21T08:54:39.3119240Z cvt.u64.u32 %rd1044, %r1241; 2026-02-21T08:54:39.3119298Z shl.b64 %rd1045, %rd1044, 32; 2026-02-21T08:54:39.3119356Z or.b64 %rd1046, %rd1043, %rd1045; 2026-02-21T08:54:39.3119520Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3119591Z mov.b64 {%r1884, %r1885}, %rd1046; 2026-02-21T08:54:39.3119659Z cvt.rn.f16x2.f32 %r1886, %r1885, %r1884; 2026-02-21T08:54:39.3119818Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3119884Z cvt.u64.u32 %rd1047, %r1242; 2026-02-21T08:54:39.3119960Z cvt.u64.u32 %rd1048, %r1243; 2026-02-21T08:54:39.3120018Z shl.b64 %rd1049, %rd1048, 32; 2026-02-21T08:54:39.3120081Z or.b64 %rd1050, %rd1047, %rd1049; 2026-02-21T08:54:39.3120243Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3120299Z mov.b64 {%r1887, %r1888}, %rd1050; 2026-02-21T08:54:39.3120362Z cvt.rn.f16x2.f32 %r1889, %r1888, %r1887; 2026-02-21T08:54:39.3120528Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3120585Z cvt.u64.u32 %rd1051, %r1244; 
2026-02-21T08:54:39.3120640Z cvt.u64.u32 %rd1052, %r1245; 2026-02-21T08:54:39.3120703Z shl.b64 %rd1053, %rd1052, 32; 2026-02-21T08:54:39.3120761Z or.b64 %rd1054, %rd1051, %rd1053; 2026-02-21T08:54:39.3120920Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3120985Z mov.b64 {%r1890, %r1891}, %rd1054; 2026-02-21T08:54:39.3121049Z cvt.rn.f16x2.f32 %r1892, %r1891, %r1890; 2026-02-21T08:54:39.3121105Z mov.b64 {%r1893, %r1894}, %rd446; 2026-02-21T08:54:39.3121167Z cvt.rn.f16x2.f32 %r1895, %r1894, %r1893; 2026-02-21T08:54:39.3121250Z mov.b64 {%r1896, %r1897}, %rd450; 2026-02-21T08:54:39.3121313Z cvt.rn.f16x2.f32 %r1898, %r1897, %r1896; 2026-02-21T08:54:39.3121369Z mov.b64 {%r1899, %r1900}, %rd454; 2026-02-21T08:54:39.3121440Z cvt.rn.f16x2.f32 %r1901, %r1900, %r1899; 2026-02-21T08:54:39.3121495Z mov.b64 {%r1902, %r1903}, %rd458; 2026-02-21T08:54:39.3121558Z cvt.rn.f16x2.f32 %r1904, %r1903, %r1902; 2026-02-21T08:54:39.3121621Z mov.b64 {%r1905, %r1906}, %rd462; 2026-02-21T08:54:39.3121685Z cvt.rn.f16x2.f32 %r1907, %r1906, %r1905; 2026-02-21T08:54:39.3121764Z mov.b64 {%r1908, %r1909}, %rd466; 2026-02-21T08:54:39.3121827Z cvt.rn.f16x2.f32 %r1910, %r1909, %r1908; 2026-02-21T08:54:39.3121889Z mov.b64 {%r1911, %r1912}, %rd470; 2026-02-21T08:54:39.3121952Z cvt.rn.f16x2.f32 %r1913, %r1912, %r1911; 2026-02-21T08:54:39.3122009Z mov.b64 {%r1914, %r1915}, %rd474; 2026-02-21T08:54:39.3122077Z cvt.rn.f16x2.f32 %r1916, %r1915, %r1914; 2026-02-21T08:54:39.3122133Z mov.b64 {%r1917, %r1918}, %rd478; 2026-02-21T08:54:39.3122196Z cvt.rn.f16x2.f32 %r1919, %r1918, %r1917; 2026-02-21T08:54:39.3122251Z mov.b64 {%r1920, %r1921}, %rd482; 2026-02-21T08:54:39.3122322Z cvt.rn.f16x2.f32 %r1922, %r1921, %r1920; 2026-02-21T08:54:39.3122376Z mov.b64 {%r1923, %r1924}, %rd486; 2026-02-21T08:54:39.3122438Z cvt.rn.f16x2.f32 %r1925, %r1924, %r1923; 2026-02-21T08:54:39.3122502Z mov.b64 {%r1926, %r1927}, %rd490; 2026-02-21T08:54:39.3122564Z cvt.rn.f16x2.f32 
%r1928, %r1927, %r1926; 2026-02-21T08:54:39.3122639Z mov.b64 {%r1929, %r1930}, %rd494; 2026-02-21T08:54:39.3122713Z cvt.rn.f16x2.f32 %r1931, %r1930, %r1929; 2026-02-21T08:54:39.3122769Z mov.b64 {%r1932, %r1933}, %rd498; 2026-02-21T08:54:39.3122830Z cvt.rn.f16x2.f32 %r1934, %r1933, %r1932; 2026-02-21T08:54:39.3122884Z mov.b64 {%r1935, %r1936}, %rd502; 2026-02-21T08:54:39.3122955Z cvt.rn.f16x2.f32 %r1937, %r1936, %r1935; 2026-02-21T08:54:39.3123010Z mov.b64 {%r1938, %r1939}, %rd506; 2026-02-21T08:54:39.3123072Z cvt.rn.f16x2.f32 %r1940, %r1939, %r1938; 2026-02-21T08:54:39.3123244Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3123299Z cvt.u64.u32 %rd1055, %r1281; 2026-02-21T08:54:39.3123355Z cvt.u64.u32 %rd1056, %r1282; 2026-02-21T08:54:39.3123411Z shl.b64 %rd1057, %rd1056, 32; 2026-02-21T08:54:39.3123474Z or.b64 %rd1058, %rd1055, %rd1057; 2026-02-21T08:54:39.3123637Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3123695Z mov.b64 {%r1941, %r1942}, %rd1058; 2026-02-21T08:54:39.3123765Z cvt.rn.f16x2.f32 %r1943, %r1942, %r1941; 2026-02-21T08:54:39.3123926Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3123982Z cvt.u64.u32 %rd1059, %r1283; 2026-02-21T08:54:39.3124066Z cvt.u64.u32 %rd1060, %r1284; 2026-02-21T08:54:39.3124122Z shl.b64 %rd1061, %rd1060, 32; 2026-02-21T08:54:39.3124177Z or.b64 %rd1062, %rd1059, %rd1061; 2026-02-21T08:54:39.3124340Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3124405Z mov.b64 {%r1944, %r1945}, %rd1062; 2026-02-21T08:54:39.3124467Z cvt.rn.f16x2.f32 %r1946, %r1945, %r1944; 2026-02-21T08:54:39.3124627Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3124712Z cvt.u64.u32 %rd1063, %r1285; 2026-02-21T08:54:39.3124772Z cvt.u64.u32 %rd1064, %r1286; 2026-02-21T08:54:39.3124828Z shl.b64 %rd1065, 
%rd1064, 32; 2026-02-21T08:54:39.3124892Z or.b64 %rd1066, %rd1063, %rd1065; 2026-02-21T08:54:39.3125055Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3125113Z mov.b64 {%r1947, %r1948}, %rd1066; 2026-02-21T08:54:39.3125177Z cvt.rn.f16x2.f32 %r1949, %r1948, %r1947; 2026-02-21T08:54:39.3125348Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3125431Z cvt.u64.u32 %rd1067, %r1287; 2026-02-21T08:54:39.3125488Z cvt.u64.u32 %rd1068, %r1288; 2026-02-21T08:54:39.3125552Z shl.b64 %rd1069, %rd1068, 32; 2026-02-21T08:54:39.3125609Z or.b64 %rd1070, %rd1067, %rd1069; 2026-02-21T08:54:39.3125769Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3125832Z mov.b64 {%r1950, %r1951}, %rd1070; 2026-02-21T08:54:39.3125897Z cvt.rn.f16x2.f32 %r1952, %r1951, %r1950; 2026-02-21T08:54:39.3126087Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3126149Z cvt.u64.u32 %rd1071, %r1289; 2026-02-21T08:54:39.3126207Z cvt.u64.u32 %rd1072, %r1290; 2026-02-21T08:54:39.3126267Z shl.b64 %rd1073, %rd1072, 32; 2026-02-21T08:54:39.3126324Z or.b64 %rd1074, %rd1071, %rd1073; 2026-02-21T08:54:39.3126496Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3126555Z mov.b64 {%r1953, %r1954}, %rd1074; 2026-02-21T08:54:39.3126868Z [384s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:54:39.3127902Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=8, num_stages=6, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:54:39.3128029Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:39.3128098Z `ptxas` stderr: 2026-02-21T08:54:39.3128423Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 308 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:39.3128516Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:39.3128521Z 2026-02-21T08:54:39.3128925Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpwbg3kqwo.ptx -o /tmp/tmpwbg3kqwo.ptx.o 2026-02-21T08:54:39.3128930Z 2026-02-21T08:54:39.3129054Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:54:39.3129127Z cvt.rn.f16x2.f32 %r1955, %r1954, %r1953; 2026-02-21T08:54:39.3129309Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3129368Z cvt.u64.u32 %rd1075, %r1291; 2026-02-21T08:54:39.3129425Z cvt.u64.u32 %rd1076, %r1292; 2026-02-21T08:54:39.3129508Z shl.b64 %rd1077, %rd1076, 32; 2026-02-21T08:54:39.3129573Z or.b64 %rd1078, %rd1075, %rd1077; 2026-02-21T08:54:39.3129744Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3129802Z mov.b64 {%r1956, %r1957}, %rd1078; 2026-02-21T08:54:39.3129874Z cvt.rn.f16x2.f32 %r1958, %r1957, %r1956; 2026-02-21T08:54:39.3130039Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3130095Z cvt.u64.u32 %rd1079, %r1293; 2026-02-21T08:54:39.3130164Z cvt.u64.u32 %rd1080, %r1294; 2026-02-21T08:54:39.3130225Z shl.b64 %rd1081, %rd1080, 32; 2026-02-21T08:54:39.3130285Z or.b64 %rd1082, %rd1079, %rd1081; 2026-02-21T08:54:39.3130451Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3130518Z mov.b64 {%r1959, %r1960}, %rd1082; 2026-02-21T08:54:39.3130582Z cvt.rn.f16x2.f32 %r1961, %r1960, %r1959; 2026-02-21T08:54:39.3130745Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3130810Z cvt.u64.u32 %rd1083, %r1295; 2026-02-21T08:54:39.3130890Z cvt.u64.u32 %rd1084, %r1296; 2026-02-21T08:54:39.3130948Z shl.b64 %rd1085, %rd1084, 32; 2026-02-21T08:54:39.3131011Z or.b64 %rd1086, %rd1083, %rd1085; 2026-02-21T08:54:39.3131175Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3131232Z mov.b64 {%r1962, %r1963}, %rd1086; 2026-02-21T08:54:39.3131295Z cvt.rn.f16x2.f32 %r1964, %r1963, %r1962; 2026-02-21T08:54:39.3131466Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3131561Z cvt.u64.u32 %rd1087, %r1298; 
2026-02-21T08:54:39.3131616Z cvt.u64.u32 %rd1088, %r1299; 2026-02-21T08:54:39.3131680Z shl.b64 %rd1089, %rd1088, 32; 2026-02-21T08:54:39.3131736Z or.b64 %rd1090, %rd1087, %rd1089; 2026-02-21T08:54:39.3131900Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3131963Z mov.b64 {%r1965, %r1966}, %rd1090; 2026-02-21T08:54:39.3132027Z cvt.rn.f16x2.f32 %r1967, %r1966, %r1965; 2026-02-21T08:54:39.3132187Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3132242Z cvt.u64.u32 %rd1091, %r1300; 2026-02-21T08:54:39.3132305Z cvt.u64.u32 %rd1092, %r1301; 2026-02-21T08:54:39.3132362Z shl.b64 %rd1093, %rd1092, 32; 2026-02-21T08:54:39.3132418Z or.b64 %rd1094, %rd1091, %rd1093; 2026-02-21T08:54:39.3132627Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3132686Z mov.b64 {%r1968, %r1969}, %rd1094; 2026-02-21T08:54:39.3132748Z cvt.rn.f16x2.f32 %r1970, %r1969, %r1968; 2026-02-21T08:54:39.3132912Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3132969Z cvt.u64.u32 %rd1095, %r1302; 2026-02-21T08:54:39.3133025Z cvt.u64.u32 %rd1096, %r1303; 2026-02-21T08:54:39.3133080Z shl.b64 %rd1097, %rd1096, 32; 2026-02-21T08:54:39.3133144Z or.b64 %rd1098, %rd1095, %rd1097; 2026-02-21T08:54:39.3133307Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3133364Z mov.b64 {%r1971, %r1972}, %rd1098; 2026-02-21T08:54:39.3133432Z cvt.rn.f16x2.f32 %r1973, %r1972, %r1971; 2026-02-21T08:54:39.3133597Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3133654Z cvt.u64.u32 %rd1099, %r1304; 2026-02-21T08:54:39.3133716Z cvt.u64.u32 %rd1100, %r1305; 2026-02-21T08:54:39.3133773Z shl.b64 %rd1101, %rd1100, 32; 2026-02-21T08:54:39.3133828Z or.b64 %rd1102, %rd1099, %rd1101; 2026-02-21T08:54:39.3133989Z .loc 1 58 27 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3134076Z mov.b64 {%r1974, %r1975}, %rd1102; 2026-02-21T08:54:39.3134137Z cvt.rn.f16x2.f32 %r1976, %r1975, %r1974; 2026-02-21T08:54:39.3134301Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3134364Z cvt.u64.u32 %rd1103, %r1306; 2026-02-21T08:54:39.3134418Z cvt.u64.u32 %rd1104, %r1307; 2026-02-21T08:54:39.3134473Z shl.b64 %rd1105, %rd1104, 32; 2026-02-21T08:54:39.3134536Z or.b64 %rd1106, %rd1103, %rd1105; 2026-02-21T08:54:39.3134733Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3134794Z mov.b64 {%r1977, %r1978}, %rd1106; 2026-02-21T08:54:39.3134859Z cvt.rn.f16x2.f32 %r1979, %r1978, %r1977; 2026-02-21T08:54:39.3135031Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3135086Z cvt.u64.u32 %rd1107, %r1308; 2026-02-21T08:54:39.3135143Z cvt.u64.u32 %rd1108, %r1309; 2026-02-21T08:54:39.3135208Z shl.b64 %rd1109, %rd1108, 32; 2026-02-21T08:54:39.3135264Z or.b64 %rd1110, %rd1107, %rd1109; 2026-02-21T08:54:39.3135454Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3135519Z mov.b64 {%r1980, %r1981}, %rd1110; 2026-02-21T08:54:39.3135582Z cvt.rn.f16x2.f32 %r1982, %r1981, %r1980; 2026-02-21T08:54:39.3135740Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3135803Z cvt.u64.u32 %rd1111, %r1310; 2026-02-21T08:54:39.3135858Z cvt.u64.u32 %rd1112, %r1311; 2026-02-21T08:54:39.3135915Z shl.b64 %rd1113, %rd1112, 32; 2026-02-21T08:54:39.3135999Z or.b64 %rd1114, %rd1111, %rd1113; 2026-02-21T08:54:39.3136172Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3136229Z mov.b64 {%r1983, %r1984}, %rd1114; 2026-02-21T08:54:39.3136294Z cvt.rn.f16x2.f32 %r1985, %r1984, %r1983; 2026-02-21T08:54:39.3136466Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3136524Z cvt.u64.u32 %rd1115, %r1312; 2026-02-21T08:54:39.3136581Z cvt.u64.u32 %rd1116, %r1313; 2026-02-21T08:54:39.3136636Z shl.b64 %rd1117, %rd1116, 32; 2026-02-21T08:54:39.3136700Z or.b64 %rd1118, %rd1115, %rd1117; 2026-02-21T08:54:39.3136862Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3136918Z mov.b64 {%r1986, %r1987}, %rd1118; 2026-02-21T08:54:39.3136989Z cvt.rn.f16x2.f32 %r1988, %r1987, %r1986; 2026-02-21T08:54:39.3137072Z mov.b64 {%r1989, %r1990}, %rd510; 2026-02-21T08:54:39.3137138Z cvt.rn.f16x2.f32 %r1991, %r1990, %r1989; 2026-02-21T08:54:39.3137200Z mov.b64 {%r1992, %r1993}, %rd514; 2026-02-21T08:54:39.3137263Z cvt.rn.f16x2.f32 %r1994, %r1993, %r1992; 2026-02-21T08:54:39.3137321Z mov.b64 {%r1995, %r1996}, %rd518; 2026-02-21T08:54:39.3137385Z cvt.rn.f16x2.f32 %r1997, %r1996, %r1995; 2026-02-21T08:54:39.3137449Z mov.b64 {%r1998, %r1999}, %rd522; 2026-02-21T08:54:39.3137513Z cvt.rn.f16x2.f32 %r2000, %r1999, %r1998; 2026-02-21T08:54:39.3137570Z mov.b64 {%r2001, %r2002}, %rd526; 2026-02-21T08:54:39.3137649Z cvt.rn.f16x2.f32 %r2003, %r2002, %r2001; 2026-02-21T08:54:39.3137704Z mov.b64 {%r2004, %r2005}, %rd530; 2026-02-21T08:54:39.3137766Z cvt.rn.f16x2.f32 %r2006, %r2005, %r2004; 2026-02-21T08:54:39.3137828Z mov.b64 {%r2007, %r2008}, %rd534; 2026-02-21T08:54:39.3137889Z cvt.rn.f16x2.f32 %r2009, %r2008, %r2007; 2026-02-21T08:54:39.3137943Z mov.b64 {%r2010, %r2011}, %rd538; 2026-02-21T08:54:39.3138008Z cvt.rn.f16x2.f32 %r2012, %r2011, %r2010; 2026-02-21T08:54:39.3138073Z mov.b64 {%r2013, %r2014}, %rd542; 2026-02-21T08:54:39.3138134Z cvt.rn.f16x2.f32 %r2015, %r2014, %r2013; 2026-02-21T08:54:39.3138190Z mov.b64 {%r2016, %r2017}, %rd546; 2026-02-21T08:54:39.3138286Z cvt.rn.f16x2.f32 %r2018, %r2017, %r2016; 2026-02-21T08:54:39.3138341Z mov.b64 {%r2019, %r2020}, %rd550; 2026-02-21T08:54:39.3138403Z cvt.rn.f16x2.f32 %r2021, %r2020, 
%r2019; 2026-02-21T08:54:39.3138457Z mov.b64 {%r2022, %r2023}, %rd554; 2026-02-21T08:54:39.3138527Z cvt.rn.f16x2.f32 %r2024, %r2023, %r2022; 2026-02-21T08:54:39.3138583Z mov.b64 {%r2025, %r2026}, %rd558; 2026-02-21T08:54:39.3138646Z cvt.rn.f16x2.f32 %r2027, %r2026, %r2025; 2026-02-21T08:54:39.3138708Z mov.b64 {%r2028, %r2029}, %rd562; 2026-02-21T08:54:39.3138770Z cvt.rn.f16x2.f32 %r2030, %r2029, %r2028; 2026-02-21T08:54:39.3138825Z mov.b64 {%r2031, %r2032}, %rd566; 2026-02-21T08:54:39.3138894Z cvt.rn.f16x2.f32 %r2033, %r2032, %r2031; 2026-02-21T08:54:39.3138950Z mov.b64 {%r2034, %r2035}, %rd570; 2026-02-21T08:54:39.3139013Z cvt.rn.f16x2.f32 %r2036, %r2035, %r2034; 2026-02-21T08:54:39.3139175Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3139240Z cvt.u64.u32 %rd1119, %r1349; 2026-02-21T08:54:39.3139297Z cvt.u64.u32 %rd1120, %r1350; 2026-02-21T08:54:39.3139352Z shl.b64 %rd1121, %rd1120, 32; 2026-02-21T08:54:39.3139415Z or.b64 %rd1122, %rd1119, %rd1121; 2026-02-21T08:54:39.3139599Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3139658Z mov.b64 {%r2037, %r2038}, %rd1122; 2026-02-21T08:54:39.3139729Z cvt.rn.f16x2.f32 %r2039, %r2038, %r2037; 2026-02-21T08:54:39.3139889Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3139946Z cvt.u64.u32 %rd1123, %r1351; 2026-02-21T08:54:39.3140004Z cvt.u64.u32 %rd1124, %r1352; 2026-02-21T08:54:39.3140096Z shl.b64 %rd1125, %rd1124, 32; 2026-02-21T08:54:39.3140152Z or.b64 %rd1126, %rd1123, %rd1125; 2026-02-21T08:54:39.3140317Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3140385Z mov.b64 {%r2040, %r2041}, %rd1126; 2026-02-21T08:54:39.3140447Z cvt.rn.f16x2.f32 %r2042, %r2041, %r2040; 2026-02-21T08:54:39.3140609Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3140674Z cvt.u64.u32 
%rd1127, %r1353; 2026-02-21T08:54:39.3140730Z cvt.u64.u32 %rd1128, %r1354; 2026-02-21T08:54:39.3140787Z shl.b64 %rd1129, %rd1128, 32; 2026-02-21T08:54:39.3140843Z or.b64 %rd1130, %rd1127, %rd1129; 2026-02-21T08:54:39.3141009Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3141066Z mov.b64 {%r2043, %r2044}, %rd1130; 2026-02-21T08:54:39.3141151Z cvt.rn.f16x2.f32 %r2045, %r2044, %r2043; 2026-02-21T08:54:39.3141316Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3141371Z cvt.u64.u32 %rd1131, %r1355; 2026-02-21T08:54:39.3141425Z cvt.u64.u32 %rd1132, %r1356; 2026-02-21T08:54:39.3141490Z shl.b64 %rd1133, %rd1132, 32; 2026-02-21T08:54:39.3141544Z or.b64 %rd1134, %rd1131, %rd1133; 2026-02-21T08:54:39.3141708Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3141765Z mov.b64 {%r2046, %r2047}, %rd1134; 2026-02-21T08:54:39.3141834Z cvt.rn.f16x2.f32 %r2048, %r2047, %r2046; 2026-02-21T08:54:39.3141993Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3142049Z cvt.u64.u32 %rd1135, %r1357; 2026-02-21T08:54:39.3142111Z cvt.u64.u32 %rd1136, %r1358; 2026-02-21T08:54:39.3142166Z shl.b64 %rd1137, %rd1136, 32; 2026-02-21T08:54:39.3142222Z or.b64 %rd1138, %rd1135, %rd1137; 2026-02-21T08:54:39.3142393Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3142449Z mov.b64 {%r2049, %r2050}, %rd1138; 2026-02-21T08:54:39.3142513Z cvt.rn.f16x2.f32 %r2051, %r2050, %r2049; 2026-02-21T08:54:39.3142702Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3142767Z cvt.u64.u32 %rd1139, %r1359; 2026-02-21T08:54:39.3142824Z cvt.u64.u32 %rd1140, %r1360; 2026-02-21T08:54:39.3142880Z shl.b64 %rd1141, %rd1140, 32; 2026-02-21T08:54:39.3142943Z or.b64 %rd1142, %rd1139, %rd1141; 2026-02-21T08:54:39.3143108Z .loc 1 58 27 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3143165Z mov.b64 {%r2052, %r2053}, %rd1142; 2026-02-21T08:54:39.3143234Z cvt.rn.f16x2.f32 %r2054, %r2053, %r2052; 2026-02-21T08:54:39.3143404Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3143461Z cvt.u64.u32 %rd1143, %r1361; 2026-02-21T08:54:39.3143516Z cvt.u64.u32 %rd1144, %r1362; 2026-02-21T08:54:39.3143579Z shl.b64 %rd1145, %rd1144, 32; 2026-02-21T08:54:39.3143634Z or.b64 %rd1146, %rd1143, %rd1145; 2026-02-21T08:54:39.3143800Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3143863Z mov.b64 {%r2055, %r2056}, %rd1146; 2026-02-21T08:54:39.3143957Z cvt.rn.f16x2.f32 %r2057, %r2056, %r2055; 2026-02-21T08:54:39.3144125Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3144187Z cvt.u64.u32 %rd1147, %r1363; 2026-02-21T08:54:39.3144242Z cvt.u64.u32 %rd1148, %r1364; 2026-02-21T08:54:39.3144298Z shl.b64 %rd1149, %rd1148, 32; 2026-02-21T08:54:39.3144354Z or.b64 %rd1150, %rd1147, %rd1149; 2026-02-21T08:54:39.3144523Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3144603Z mov.b64 {%r2058, %r2059}, %rd1150; 2026-02-21T08:54:39.3144701Z cvt.rn.f16x2.f32 %r2060, %r2059, %r2058; 2026-02-21T08:54:39.3144875Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3144935Z cvt.u64.u32 %rd1151, %r1366; 2026-02-21T08:54:39.3144991Z cvt.u64.u32 %rd1152, %r1367; 2026-02-21T08:54:39.3145054Z shl.b64 %rd1153, %rd1152, 32; 2026-02-21T08:54:39.3145111Z or.b64 %rd1154, %rd1151, %rd1153; 2026-02-21T08:54:39.3145272Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3145331Z mov.b64 {%r2061, %r2062}, %rd1154; 2026-02-21T08:54:39.3145405Z cvt.rn.f16x2.f32 %r2063, %r2062, %r2061; 2026-02-21T08:54:39.3145569Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3145654Z cvt.u64.u32 %rd1155, %r1368; 2026-02-21T08:54:39.3145724Z cvt.u64.u32 %rd1156, %r1369; 2026-02-21T08:54:39.3145784Z shl.b64 %rd1157, %rd1156, 32; 2026-02-21T08:54:39.3145842Z or.b64 %rd1158, %rd1155, %rd1157; 2026-02-21T08:54:39.3146013Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3146075Z mov.b64 {%r2064, %r2065}, %rd1158; 2026-02-21T08:54:39.3146142Z cvt.rn.f16x2.f32 %r2066, %r2065, %r2064; 2026-02-21T08:54:39.3146306Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3146378Z cvt.u64.u32 %rd1159, %r1370; 2026-02-21T08:54:39.3146439Z cvt.u64.u32 %rd1160, %r1371; 2026-02-21T08:54:39.3146497Z shl.b64 %rd1161, %rd1160, 32; 2026-02-21T08:54:39.3146560Z or.b64 %rd1162, %rd1159, %rd1161; 2026-02-21T08:54:39.3146720Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3146779Z mov.b64 {%r2067, %r2068}, %rd1162; 2026-02-21T08:54:39.3146851Z cvt.rn.f16x2.f32 %r2069, %r2068, %r2067; 2026-02-21T08:54:39.3147011Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3147066Z cvt.u64.u32 %rd1163, %r1372; 2026-02-21T08:54:39.3147160Z cvt.u64.u32 %rd1164, %r1373; 2026-02-21T08:54:39.3147216Z shl.b64 %rd1165, %rd1164, 32; 2026-02-21T08:54:39.3147273Z or.b64 %rd1166, %rd1163, %rd1165; 2026-02-21T08:54:39.3147434Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3147499Z mov.b64 {%r2070, %r2071}, %rd1166; 2026-02-21T08:54:39.3147563Z cvt.rn.f16x2.f32 %r2072, %r2071, %r2070; 2026-02-21T08:54:39.3147619Z mov.b64 {%r2073, %r2074}, %rd574; 2026-02-21T08:54:39.3147688Z cvt.rn.f16x2.f32 %r2075, %r2074, %r2073; 2026-02-21T08:54:39.3147849Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3147920Z cvt.u64.u32 
%rd1167, %r1376; 2026-02-21T08:54:39.3147982Z cvt.u64.u32 %rd1168, %r1377; 2026-02-21T08:54:39.3148049Z shl.b64 %rd1169, %rd1168, 32; 2026-02-21T08:54:39.3148107Z or.b64 %rd1170, %rd1167, %rd1169; 2026-02-21T08:54:39.3148277Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3148347Z mov.b64 {%r2076, %r2077}, %rd1170; 2026-02-21T08:54:39.3148412Z cvt.rn.f16x2.f32 %r2078, %r2077, %r2076; 2026-02-21T08:54:39.3148606Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3148677Z cvt.u64.u32 %rd1171, %r1378; 2026-02-21T08:54:39.3148736Z cvt.u64.u32 %rd1172, %r1379; 2026-02-21T08:54:39.3148794Z shl.b64 %rd1173, %rd1172, 32; 2026-02-21T08:54:39.3148853Z or.b64 %rd1174, %rd1171, %rd1173; 2026-02-21T08:54:39.3149034Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3149095Z mov.b64 {%r2079, %r2080}, %rd1174; 2026-02-21T08:54:39.3149204Z cvt.rn.f16x2.f32 %r2081, %r2080, %r2079; 2026-02-21T08:54:39.3149384Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3149444Z cvt.u64.u32 %rd1175, %r1380; 2026-02-21T08:54:39.3149502Z cvt.u64.u32 %rd1176, %r1381; 2026-02-21T08:54:39.3149568Z shl.b64 %rd1177, %rd1176, 32; 2026-02-21T08:54:39.3149626Z or.b64 %rd1178, %rd1175, %rd1177; 2026-02-21T08:54:39.3149796Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3149861Z mov.b64 {%r2082, %r2083}, %rd1178; 2026-02-21T08:54:39.3149928Z cvt.rn.f16x2.f32 %r2084, %r2083, %r2082; 2026-02-21T08:54:39.3149985Z mov.b64 {%r2085, %r2086}, %rd578; 2026-02-21T08:54:39.3150051Z cvt.rn.f16x2.f32 %r2087, %r2086, %r2085; 2026-02-21T08:54:39.3150116Z mov.b64 {%r2088, %r2089}, %rd582; 2026-02-21T08:54:39.3150201Z cvt.rn.f16x2.f32 %r2090, %r2089, %r2088; 2026-02-21T08:54:39.3150261Z mov.b64 {%r2091, %r2092}, %rd586; 2026-02-21T08:54:39.3150333Z cvt.rn.f16x2.f32 %r2093, 
%r2092, %r2091; 2026-02-21T08:54:39.3150391Z mov.b64 {%r2094, %r2095}, %rd590; 2026-02-21T08:54:39.3150454Z cvt.rn.f16x2.f32 %r2096, %r2095, %r2094; 2026-02-21T08:54:39.3150513Z mov.b64 {%r2097, %r2098}, %rd594; 2026-02-21T08:54:39.3150585Z cvt.rn.f16x2.f32 %r2099, %r2098, %r2097; 2026-02-21T08:54:39.3150641Z mov.b64 {%r2100, %r2101}, %rd598; 2026-02-21T08:54:39.3150706Z cvt.rn.f16x2.f32 %r2102, %r2101, %r2100; 2026-02-21T08:54:39.3150770Z mov.b64 {%r2103, %r2104}, %rd602; 2026-02-21T08:54:39.3150834Z cvt.rn.f16x2.f32 %r2105, %r2104, %r2103; 2026-02-21T08:54:39.3150890Z mov.b64 {%r2106, %r2107}, %rd606; 2026-02-21T08:54:39.3150960Z cvt.rn.f16x2.f32 %r2108, %r2107, %r2106; 2026-02-21T08:54:39.3151018Z mov.b64 {%r2109, %r2110}, %rd610; 2026-02-21T08:54:39.3151082Z cvt.rn.f16x2.f32 %r2111, %r2110, %r2109; 2026-02-21T08:54:39.3151140Z mov.b64 {%r2112, %r2113}, %rd614; 2026-02-21T08:54:39.3151214Z cvt.rn.f16x2.f32 %r2114, %r2113, %r2112; 2026-02-21T08:54:39.3151270Z mov.b64 {%r2115, %r2116}, %rd618; 2026-02-21T08:54:39.3151334Z cvt.rn.f16x2.f32 %r2117, %r2116, %r2115; 2026-02-21T08:54:39.3151396Z mov.b64 {%r2118, %r2119}, %rd622; 2026-02-21T08:54:39.3151485Z cvt.rn.f16x2.f32 %r2120, %r2119, %r2118; 2026-02-21T08:54:39.3151543Z mov.b64 {%r2121, %r2122}, %rd626; 2026-02-21T08:54:39.3151607Z cvt.rn.f16x2.f32 %r2123, %r2122, %r2121; 2026-02-21T08:54:39.3151673Z mov.b64 {%r2124, %r2125}, %rd630; 2026-02-21T08:54:39.3151737Z cvt.rn.f16x2.f32 %r2126, %r2125, %r2124; 2026-02-21T08:54:39.3151795Z mov.b64 {%r2127, %r2128}, %rd634; 2026-02-21T08:54:39.3151866Z cvt.rn.f16x2.f32 %r2129, %r2128, %r2127; 2026-02-21T08:54:39.3151923Z mov.b64 {%r2130, %r2131}, %rd638; 2026-02-21T08:54:39.3151988Z cvt.rn.f16x2.f32 %r2132, %r2131, %r2130; 2026-02-21T08:54:39.3152167Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3152227Z cvt.u64.u32 %rd1179, %r1417; 2026-02-21T08:54:39.3152285Z cvt.u64.u32 %rd1180, %r1418; 2026-02-21T08:54:39.3152342Z 
shl.b64 %rd1181, %rd1180, 32; 2026-02-21T08:54:39.3152409Z or.b64 %rd1182, %rd1179, %rd1181; 2026-02-21T08:54:39.3152583Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3152642Z mov.b64 {%r2133, %r2134}, %rd1182; 2026-02-21T08:54:39.3152717Z cvt.rn.f16x2.f32 %r2135, %r2134, %r2133; 2026-02-21T08:54:39.3152905Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3152967Z cvt.u64.u32 %rd1183, %r1419; 2026-02-21T08:54:39.3153033Z cvt.u64.u32 %rd1184, %r1420; 2026-02-21T08:54:39.3153092Z shl.b64 %rd1185, %rd1184, 32; 2026-02-21T08:54:39.3153150Z or.b64 %rd1186, %rd1183, %rd1185; 2026-02-21T08:54:39.3153321Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3153416Z mov.b64 {%r2136, %r2137}, %rd1186; 2026-02-21T08:54:39.3153482Z cvt.rn.f16x2.f32 %r2138, %r2137, %r2136; 2026-02-21T08:54:39.3153653Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3153724Z cvt.u64.u32 %rd1187, %r1421; 2026-02-21T08:54:39.3153783Z cvt.u64.u32 %rd1188, %r1422; 2026-02-21T08:54:39.3153846Z shl.b64 %rd1189, %rd1188, 32; 2026-02-21T08:54:39.3153918Z or.b64 %rd1190, %rd1187, %rd1189; 2026-02-21T08:54:39.3154091Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3154151Z mov.b64 {%r2139, %r2140}, %rd1190; 2026-02-21T08:54:39.3154217Z cvt.rn.f16x2.f32 %r2141, %r2140, %r2139; 2026-02-21T08:54:39.3154393Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3154453Z cvt.u64.u32 %rd1191, %r1423; 2026-02-21T08:54:39.3154534Z cvt.u64.u32 %rd1192, %r1424; 2026-02-21T08:54:39.3154605Z shl.b64 %rd1193, %rd1192, 32; 2026-02-21T08:54:39.3154665Z or.b64 %rd1194, %rd1191, %rd1193; 2026-02-21T08:54:39.3154872Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3154939Z mov.b64 
{%r2142, %r2143}, %rd1194; 2026-02-21T08:54:39.3155005Z cvt.rn.f16x2.f32 %r2144, %r2143, %r2142; 2026-02-21T08:54:39.3155176Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3155235Z cvt.u64.u32 %rd1195, %r1425; 2026-02-21T08:54:39.3155301Z cvt.u64.u32 %rd1196, %r1426; 2026-02-21T08:54:39.3155361Z shl.b64 %rd1197, %rd1196, 32; 2026-02-21T08:54:39.3155419Z or.b64 %rd1198, %rd1195, %rd1197; 2026-02-21T08:54:39.3155599Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3155658Z mov.b64 {%r2145, %r2146}, %rd1198; 2026-02-21T08:54:39.3155726Z cvt.rn.f16x2.f32 %r2147, %r2146, %r2145; 2026-02-21T08:54:39.3155905Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3155963Z cvt.u64.u32 %rd1199, %r1427; 2026-02-21T08:54:39.3156059Z cvt.u64.u32 %rd1200, %r1428; 2026-02-21T08:54:39.3156116Z shl.b64 %rd1201, %rd1200, 32; 2026-02-21T08:54:39.3156179Z or.b64 %rd1202, %rd1199, %rd1201; 2026-02-21T08:54:39.3156347Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3156404Z mov.b64 {%r2148, %r2149}, %rd1202; 2026-02-21T08:54:39.3156477Z cvt.rn.f16x2.f32 %r2150, %r2149, %r2148; 2026-02-21T08:54:39.3156637Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3156694Z cvt.u64.u32 %rd1203, %r1429; 2026-02-21T08:54:39.3156757Z cvt.u64.u32 %rd1204, %r1430; 2026-02-21T08:54:39.3156815Z shl.b64 %rd1205, %rd1204, 32; 2026-02-21T08:54:39.3156872Z or.b64 %rd1206, %rd1203, %rd1205; 2026-02-21T08:54:39.3157034Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3157099Z mov.b64 {%r2151, %r2152}, %rd1206; 2026-02-21T08:54:39.3157162Z cvt.rn.f16x2.f32 %r2153, %r2152, %r2151; 2026-02-21T08:54:39.3157324Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3157389Z 
cvt.u64.u32 %rd1207, %r1431; 2026-02-21T08:54:39.3157470Z cvt.u64.u32 %rd1208, %r1432; 2026-02-21T08:54:39.3157528Z shl.b64 %rd1209, %rd1208, 32; 2026-02-21T08:54:39.3157592Z or.b64 %rd1210, %rd1207, %rd1209; 2026-02-21T08:54:39.3157755Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3157811Z mov.b64 {%r2154, %r2155}, %rd1210; 2026-02-21T08:54:39.3157872Z cvt.rn.f16x2.f32 %r2156, %r2155, %r2154; 2026-02-21T08:54:39.3158045Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3158128Z cvt.u64.u32 %rd1211, %r1434; 2026-02-21T08:54:39.3158184Z cvt.u64.u32 %rd1212, %r1435; 2026-02-21T08:54:39.3158247Z shl.b64 %rd1213, %rd1212, 32; 2026-02-21T08:54:39.3158303Z or.b64 %rd1214, %rd1211, %rd1213; 2026-02-21T08:54:39.3158466Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3158529Z mov.b64 {%r2157, %r2158}, %rd1214; 2026-02-21T08:54:39.3158593Z cvt.rn.f16x2.f32 %r2159, %r2158, %r2157; 2026-02-21T08:54:39.3158753Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3158816Z cvt.u64.u32 %rd1215, %r1436; 2026-02-21T08:54:39.3158871Z cvt.u64.u32 %rd1216, %r1437; 2026-02-21T08:54:39.3158926Z shl.b64 %rd1217, %rd1216, 32; 2026-02-21T08:54:39.3158981Z or.b64 %rd1218, %rd1215, %rd1217; 2026-02-21T08:54:39.3159177Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3159238Z mov.b64 {%r2160, %r2161}, %rd1218; 2026-02-21T08:54:39.3159300Z cvt.rn.f16x2.f32 %r2162, %r2161, %r2160; 2026-02-21T08:54:39.3159466Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3159524Z cvt.u64.u32 %rd1219, %r1438; 2026-02-21T08:54:39.3159580Z cvt.u64.u32 %rd1220, %r1439; 2026-02-21T08:54:39.3159635Z shl.b64 %rd1221, %rd1220, 32; 2026-02-21T08:54:39.3159700Z or.b64 %rd1222, %rd1219, %rd1221; 2026-02-21T08:54:39.3159861Z 
.loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3159917Z mov.b64 {%r2163, %r2164}, %rd1222; 2026-02-21T08:54:39.3159987Z cvt.rn.f16x2.f32 %r2165, %r2164, %r2163; 2026-02-21T08:54:39.3160146Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3160203Z cvt.u64.u32 %rd1223, %r1440; 2026-02-21T08:54:39.3160267Z cvt.u64.u32 %rd1224, %r1441; 2026-02-21T08:54:39.3160324Z shl.b64 %rd1225, %rd1224, 32; 2026-02-21T08:54:39.3160379Z or.b64 %rd1226, %rd1223, %rd1225; 2026-02-21T08:54:39.3160547Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3160626Z mov.b64 {%r2166, %r2167}, %rd1226; 2026-02-21T08:54:39.3160687Z cvt.rn.f16x2.f32 %r2168, %r2167, %r2166; 2026-02-21T08:54:39.3160743Z mov.b64 {%r2169, %r2170}, %rd642; 2026-02-21T08:54:39.3160813Z cvt.rn.f16x2.f32 %r2171, %r2170, %r2169; 2026-02-21T08:54:39.3160868Z mov.b64 {%r2172, %r2173}, %rd646; 2026-02-21T08:54:39.3160930Z cvt.rn.f16x2.f32 %r2174, %r2173, %r2172; 2026-02-21T08:54:39.3160993Z mov.b64 {%r2175, %r2176}, %rd650; 2026-02-21T08:54:39.3161055Z cvt.rn.f16x2.f32 %r2177, %r2176, %r2175; 2026-02-21T08:54:39.3161110Z mov.b64 {%r2178, %r2179}, %rd654; 2026-02-21T08:54:39.3161172Z cvt.rn.f16x2.f32 %r2180, %r2179, %r2178; 2026-02-21T08:54:39.3161236Z mov.b64 {%r2181, %r2182}, %rd658; 2026-02-21T08:54:39.3161298Z cvt.rn.f16x2.f32 %r2183, %r2182, %r2181; 2026-02-21T08:54:39.3161353Z mov.b64 {%r2184, %r2185}, %rd662; 2026-02-21T08:54:39.3161424Z cvt.rn.f16x2.f32 %r2186, %r2185, %r2184; 2026-02-21T08:54:39.3161481Z mov.b64 {%r2187, %r2188}, %rd666; 2026-02-21T08:54:39.3161543Z cvt.rn.f16x2.f32 %r2189, %r2188, %r2187; 2026-02-21T08:54:39.3161606Z mov.b64 {%r2190, %r2191}, %rd670; 2026-02-21T08:54:39.3161689Z cvt.rn.f16x2.f32 %r2192, %r2191, %r2190; 2026-02-21T08:54:39.3161747Z mov.b64 {%r2193, %r2194}, %rd674; 2026-02-21T08:54:39.3161810Z cvt.rn.f16x2.f32 %r2195, %r2194, %r2193; 
2026-02-21T08:54:39.3161875Z mov.b64 {%r2196, %r2197}, %rd678; 2026-02-21T08:54:39.3161938Z cvt.rn.f16x2.f32 %r2198, %r2197, %r2196; 2026-02-21T08:54:39.3161994Z mov.b64 {%r2199, %r2200}, %rd682; 2026-02-21T08:54:39.3162065Z cvt.rn.f16x2.f32 %r2201, %r2200, %r2199; 2026-02-21T08:54:39.3162124Z mov.b64 {%r2202, %r2203}, %rd686; 2026-02-21T08:54:39.3162208Z cvt.rn.f16x2.f32 %r2204, %r2203, %r2202; 2026-02-21T08:54:39.3162264Z mov.b64 {%r2205, %r2206}, %rd690; 2026-02-21T08:54:39.3162333Z cvt.rn.f16x2.f32 %r2207, %r2206, %r2205; 2026-02-21T08:54:39.3162388Z mov.b64 {%r2208, %r2209}, %rd694; 2026-02-21T08:54:39.3162452Z cvt.rn.f16x2.f32 %r2210, %r2209, %r2208; 2026-02-21T08:54:39.3162516Z mov.b64 {%r2211, %r2212}, %rd698; 2026-02-21T08:54:39.3162577Z cvt.rn.f16x2.f32 %r2213, %r2212, %r2211; 2026-02-21T08:54:39.3162632Z mov.b64 {%r2214, %r2215}, %rd702; 2026-02-21T08:54:39.3162701Z cvt.rn.f16x2.f32 %r2216, %r2215, %r2214; 2026-02-21T08:54:39.3162755Z mov.b64 {%r2217, %r2218}, %rd706; 2026-02-21T08:54:39.3162816Z cvt.rn.f16x2.f32 %r2219, %r2218, %r2217; 2026-02-21T08:54:39.3162871Z mov.b64 {%r2220, %r2221}, %rd710; 2026-02-21T08:54:39.3162942Z cvt.rn.f16x2.f32 %r2222, %r2221, %r2220; 2026-02-21T08:54:39.3162996Z mov.b64 {%r2223, %r2224}, %rd714; 2026-02-21T08:54:39.3163085Z cvt.rn.f16x2.f32 %r2225, %r2224, %r2223; 2026-02-21T08:54:39.3163150Z mov.b64 {%r2226, %r2227}, %rd718; 2026-02-21T08:54:39.3163211Z cvt.rn.f16x2.f32 %r2228, %r2227, %r2226; 2026-02-21T08:54:39.3163374Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3163438Z cvt.u64.u32 %rd1227, %r1485; 2026-02-21T08:54:39.3163494Z cvt.u64.u32 %rd1228, %r1486; 2026-02-21T08:54:39.3163549Z shl.b64 %rd1229, %rd1228, 32; 2026-02-21T08:54:39.3163605Z or.b64 %rd1230, %rd1227, %rd1229; 2026-02-21T08:54:39.3163778Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3163834Z mov.b64 {%r2229, %r2230}, %rd1230; 
2026-02-21T08:54:39.3163896Z cvt.rn.f16x2.f32 %r2231, %r2230, %r2229; 2026-02-21T08:54:39.3164064Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3164120Z cvt.u64.u32 %rd1231, %r1487; 2026-02-21T08:54:39.3164176Z cvt.u64.u32 %rd1232, %r1488; 2026-02-21T08:54:39.3164233Z shl.b64 %rd1233, %rd1232, 32; 2026-02-21T08:54:39.3164296Z or.b64 %rd1234, %rd1231, %rd1233; 2026-02-21T08:54:39.3164456Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3164537Z mov.b64 {%r2232, %r2233}, %rd1234; 2026-02-21T08:54:39.3164605Z cvt.rn.f16x2.f32 %r2234, %r2233, %r2232; 2026-02-21T08:54:39.3164818Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3164876Z cvt.u64.u32 %rd1235, %r1489; 2026-02-21T08:54:39.3164939Z cvt.u64.u32 %rd1236, %r1490; 2026-02-21T08:54:39.3164996Z shl.b64 %rd1237, %rd1236, 32; 2026-02-21T08:54:39.3165052Z or.b64 %rd1238, %rd1235, %rd1237; 2026-02-21T08:54:39.3165220Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3165278Z mov.b64 {%r2235, %r2236}, %rd1238; 2026-02-21T08:54:39.3165341Z cvt.rn.f16x2.f32 %r2237, %r2236, %r2235; 2026-02-21T08:54:39.3165504Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3165568Z cvt.u64.u32 %rd1239, %r1491; 2026-02-21T08:54:39.3165625Z cvt.u64.u32 %rd1240, %r1492; 2026-02-21T08:54:39.3165680Z shl.b64 %rd1241, %rd1240, 32; 2026-02-21T08:54:39.3165743Z or.b64 %rd1242, %rd1239, %rd1241; 2026-02-21T08:54:39.3165947Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3166006Z mov.b64 {%r2238, %r2239}, %rd1242; 2026-02-21T08:54:39.3166079Z cvt.rn.f16x2.f32 %r2240, %r2239, %r2238; 2026-02-21T08:54:39.3166242Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3166299Z cvt.u64.u32 %rd1243, %r1493; 
2026-02-21T08:54:39.3166356Z cvt.u64.u32 %rd1244, %r1494; 2026-02-21T08:54:39.3166422Z shl.b64 %rd1245, %rd1244, 32; 2026-02-21T08:54:39.3166478Z or.b64 %rd1246, %rd1243, %rd1245; 2026-02-21T08:54:39.3166670Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3166734Z mov.b64 {%r2241, %r2242}, %rd1246; 2026-02-21T08:54:39.3166800Z cvt.rn.f16x2.f32 %r2243, %r2242, %r2241; 2026-02-21T08:54:39.3166961Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3167022Z cvt.u64.u32 %rd1247, %r1495; 2026-02-21T08:54:39.3167078Z cvt.u64.u32 %rd1248, %r1496; 2026-02-21T08:54:39.3167136Z shl.b64 %rd1249, %rd1248, 32; 2026-02-21T08:54:39.3167192Z or.b64 %rd1250, %rd1247, %rd1249; 2026-02-21T08:54:39.3167362Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3167418Z mov.b64 {%r2244, %r2245}, %rd1250; 2026-02-21T08:54:39.3167483Z cvt.rn.f16x2.f32 %r2246, %r2245, %r2244; 2026-02-21T08:54:39.3167676Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3167735Z cvt.u64.u32 %rd1251, %r1497; 2026-02-21T08:54:39.3167791Z cvt.u64.u32 %rd1252, %r1498; 2026-02-21T08:54:39.3167853Z shl.b64 %rd1253, %rd1252, 32; 2026-02-21T08:54:39.3167911Z or.b64 %rd1254, %rd1251, %rd1253; 2026-02-21T08:54:39.3168071Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3168127Z mov.b64 {%r2247, %r2248}, %rd1254; 2026-02-21T08:54:39.3168199Z cvt.rn.f16x2.f32 %r2249, %r2248, %r2247; 2026-02-21T08:54:39.3168361Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3168416Z cvt.u64.u32 %rd1255, %r1499; 2026-02-21T08:54:39.3168480Z cvt.u64.u32 %rd1256, %r1500; 2026-02-21T08:54:39.3168536Z shl.b64 %rd1257, %rd1256, 32; 2026-02-21T08:54:39.3168592Z or.b64 %rd1258, %rd1255, %rd1257; 2026-02-21T08:54:39.3168761Z .loc 1 58 27 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3168819Z mov.b64 {%r2250, %r2251}, %rd1258; 2026-02-21T08:54:39.3168881Z cvt.rn.f16x2.f32 %r2252, %r2251, %r2250; 2026-02-21T08:54:39.3169044Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3169133Z cvt.u64.u32 %rd1259, %r1502; 2026-02-21T08:54:39.3169189Z cvt.u64.u32 %rd1260, %r1503; 2026-02-21T08:54:39.3169247Z shl.b64 %rd1261, %rd1260, 32; 2026-02-21T08:54:39.3169315Z or.b64 %rd1262, %rd1259, %rd1261; 2026-02-21T08:54:39.3169481Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3169538Z mov.b64 {%r2253, %r2254}, %rd1262; 2026-02-21T08:54:39.3169607Z cvt.rn.f16x2.f32 %r2255, %r2254, %r2253; 2026-02-21T08:54:39.3169770Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3169826Z cvt.u64.u32 %rd1263, %r1504; 2026-02-21T08:54:39.3169883Z cvt.u64.u32 %rd1264, %r1505; 2026-02-21T08:54:39.3169948Z shl.b64 %rd1265, %rd1264, 32; 2026-02-21T08:54:39.3170006Z or.b64 %rd1266, %rd1263, %rd1265; 2026-02-21T08:54:39.3170169Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3170238Z mov.b64 {%r2256, %r2257}, %rd1266; 2026-02-21T08:54:39.3170303Z cvt.rn.f16x2.f32 %r2258, %r2257, %r2256; 2026-02-21T08:54:39.3170501Z .loc 1 56 52 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3170565Z cvt.u64.u32 %rd1267, %r1506; 2026-02-21T08:54:39.3170622Z cvt.u64.u32 %rd1268, %r1507; 2026-02-21T08:54:39.3170678Z shl.b64 %rd1269, %rd1268, 32; 2026-02-21T08:54:39.3170735Z or.b64 %rd1270, %rd1267, %rd1269; 2026-02-21T08:54:39.3170904Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3170962Z mov.b64 {%r2259, %r2260}, %rd1270; 2026-02-21T08:54:39.3171047Z cvt.rn.f16x2.f32 %r2261, %r2260, %r2259; 2026-02-21T08:54:39.3171218Z .loc 1 56 52 // 
cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:56:52 2026-02-21T08:54:39.3171276Z cvt.u64.u32 %rd1271, %r1508; 2026-02-21T08:54:39.3171331Z cvt.u64.u32 %rd1272, %r1509; 2026-02-21T08:54:39.3171395Z shl.b64 %rd1273, %rd1272, 32; 2026-02-21T08:54:39.3171451Z or.b64 %rd1274, %rd1271, %rd1273; 2026-02-21T08:54:39.3171612Z .loc 1 58 27 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:58:27 2026-02-21T08:54:39.3171669Z mov.b64 {%r2262, %r2263}, %rd1274; 2026-02-21T08:54:39.3171739Z cvt.rn.f16x2.f32 %r2264, %r2263, %r2262; 2026-02-21T08:54:39.3171794Z mov.b64 {%r2265, %r2266}, %rd722; 2026-02-21T08:54:39.3171855Z cvt.rn.f16x2.f32 %r2267, %r2266, %r2265; 2026-02-21T08:54:39.3171917Z mov.b64 {%r2268, %r2269}, %rd726; 2026-02-21T08:54:39.3172003Z cvt.rn.f16x2.f32 %r2270, %r2269, %r2268; 2026-02-21T08:54:39.3172061Z mov.b64 {%r2271, %r2272}, %rd730; 2026-02-21T08:54:39.3172131Z cvt.rn.f16x2.f32 %r2273, %r2272, %r2271; 2026-02-21T08:54:39.3172186Z mov.b64 {%r2274, %r2275}, %rd734; 2026-02-21T08:54:39.3172249Z cvt.rn.f16x2.f32 %r2276, %r2275, %r2274; 2026-02-21T08:54:39.3172306Z mov.b64 {%r2277, %r2278}, %rd738; 2026-02-21T08:54:39.3172375Z cvt.rn.f16x2.f32 %r2279, %r2278, %r2277; 2026-02-21T08:54:39.3172430Z mov.b64 {%r2280, %r2281}, %rd742; 2026-02-21T08:54:39.3172494Z cvt.rn.f16x2.f32 %r2282, %r2281, %r2280; 2026-02-21T08:54:39.3172556Z mov.b64 {%r2283, %r2284}, %rd746; 2026-02-21T08:54:39.3172618Z cvt.rn.f16x2.f32 %r2285, %r2284, %r2283; 2026-02-21T08:54:39.3172672Z mov.b64 {%r2286, %r2287}, %rd750; 2026-02-21T08:54:39.3172733Z cvt.rn.f16x2.f32 %r2288, %r2287, %r2286; 2026-02-21T08:54:39.3172796Z mov.b64 {%r2289, %r2290}, %rd754; 2026-02-21T08:54:39.3172858Z cvt.rn.f16x2.f32 %r2291, %r2290, %r2289; 2026-02-21T08:54:39.3172913Z mov.b64 {%r2292, %r2293}, %rd758; 2026-02-21T08:54:39.3172982Z cvt.rn.f16x2.f32 %r2294, %r2293, %r2292; 2026-02-21T08:54:39.3173038Z mov.b64 {%r2295, %r2296}, %rd762; 2026-02-21T08:54:39.3173099Z cvt.rn.f16x2.f32 %r2297, %r2296, 
%r2295; 2026-02-21T08:54:39.3173162Z mov.b64 {%r2298, %r2299}, %rd766; 2026-02-21T08:54:39.3173249Z cvt.rn.f16x2.f32 %r2300, %r2299, %r2298; 2026-02-21T08:54:39.3173305Z mov.b64 {%r2301, %r2302}, %rd770; 2026-02-21T08:54:39.3173366Z cvt.rn.f16x2.f32 %r2303, %r2302, %r2301; 2026-02-21T08:54:39.3173429Z mov.b64 {%r2304, %r2305}, %rd774; 2026-02-21T08:54:39.3173491Z cvt.rn.f16x2.f32 %r2306, %r2305, %r2304; 2026-02-21T08:54:39.3173545Z mov.b64 {%r2307, %r2308}, %rd778; 2026-02-21T08:54:39.3173614Z cvt.rn.f16x2.f32 %r2309, %r2308, %r2307; 2026-02-21T08:54:39.3173668Z mov.b64 {%r2310, %r2311}, %rd782; 2026-02-21T08:54:39.3173729Z cvt.rn.f16x2.f32 %r2312, %r2311, %r2310; 2026-02-21T08:54:39.3173783Z mov.b64 {%r2313, %r2314}, %rd786; 2026-02-21T08:54:39.3173852Z cvt.rn.f16x2.f32 %r2315, %r2314, %r2313; 2026-02-21T08:54:39.3173907Z mov.b64 {%r2316, %r2317}, %rd790; 2026-02-21T08:54:39.3173968Z cvt.rn.f16x2.f32 %r2318, %r2317, %r2316; 2026-02-21T08:54:39.3174032Z mov.b64 {%r2319, %r2320}, %rd794; 2026-02-21T08:54:39.3174093Z cvt.rn.f16x2.f32 %r2321, %r2320, %r2319; 2026-02-21T08:54:39.3174148Z mov.b64 {%r2322, %r2323}, %rd798; 2026-02-21T08:54:39.3174215Z cvt.rn.f16x2.f32 %r2324, %r2323, %r2322; 2026-02-21T08:54:39.3174402Z .loc 1 59 45 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:59:45 2026-02-21T08:54:39.3174475Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:39.3174527Z bar.sync 0; 2026-02-21T08:54:39.3174633Z st.shared.v4.b32 [%r23], {%r1559, %r1562, %r1565, %r1568}; 2026-02-21T08:54:39.3174767Z st.shared.v4.b32 [%r23+32768], {%r1655, %r1658, %r1661, %r1664}; 2026-02-21T08:54:39.3174866Z st.shared.v4.b32 [%r23+65536], {%r1751, %r1754, %r1757, %r1760}; 2026-02-21T08:54:39.3174970Z st.shared.v4.b32 [%r23+98304], {%r1847, %r1850, %r1853, %r1856}; 2026-02-21T08:54:39.3175089Z st.shared.v4.b32 [%r23+16384], {%r1943, %r1946, %r1949, %r1952}; 2026-02-21T08:54:39.3175182Z st.shared.v4.b32 [%r23+49152], {%r2039, %r2042, %r2045, %r2048}; 
2026-02-21T08:54:39.3175278Z st.shared.v4.b32 [%r23+81920], {%r2135, %r2138, %r2141, %r2144}; 2026-02-21T08:54:39.3175382Z st.shared.v4.b32 [%r23+114688], {%r2231, %r2234, %r2237, %r2240}; 2026-02-21T08:54:39.3175473Z st.shared.v4.b32 [%r24], {%r1571, %r1574, %r1577, %r1580}; 2026-02-21T08:54:39.3175575Z st.shared.v4.b32 [%r24+32768], {%r1667, %r1670, %r1673, %r1676}; 2026-02-21T08:54:39.3175668Z st.shared.v4.b32 [%r24+65536], {%r1763, %r1766, %r1769, %r1772}; 2026-02-21T08:54:39.3175758Z st.shared.v4.b32 [%r24+98304], {%r1859, %r1862, %r1865, %r1868}; 2026-02-21T08:54:39.3175850Z st.shared.v4.b32 [%r24+16384], {%r1955, %r1958, %r1961, %r1964}; 2026-02-21T08:54:39.3175948Z st.shared.v4.b32 [%r24+49152], {%r2051, %r2054, %r2057, %r2060}; 2026-02-21T08:54:39.3176066Z st.shared.v4.b32 [%r24+81920], {%r2147, %r2150, %r2153, %r2156}; 2026-02-21T08:54:39.3176169Z st.shared.v4.b32 [%r24+114688], {%r2243, %r2246, %r2249, %r2252}; 2026-02-21T08:54:39.3176268Z st.shared.v4.b32 [%r25], {%r1583, %r1586, %r1589, %r1592}; 2026-02-21T08:54:39.3176360Z st.shared.v4.b32 [%r25+32768], {%r1679, %r1682, %r1685, %r1688}; 2026-02-21T08:54:39.3176455Z st.shared.v4.b32 [%r25+65536], {%r1775, %r1778, %r1781, %r1784}; 2026-02-21T08:54:39.3176554Z st.shared.v4.b32 [%r25+98304], {%r1871, %r1874, %r1877, %r1880}; 2026-02-21T08:54:39.3176647Z st.shared.v4.b32 [%r25+16384], {%r1967, %r1970, %r1973, %r1976}; 2026-02-21T08:54:39.3176741Z st.shared.v4.b32 [%r25+49152], {%r2063, %r2066, %r2069, %r2072}; 2026-02-21T08:54:39.3176839Z st.shared.v4.b32 [%r25+81920], {%r2159, %r2162, %r2165, %r2168}; 2026-02-21T08:54:39.3176938Z st.shared.v4.b32 [%r25+114688], {%r2255, %r2258, %r2261, %r2264}; 2026-02-21T08:54:39.3177027Z st.shared.v4.b32 [%r26], {%r1595, %r1598, %r1601, %r1604}; 2026-02-21T08:54:39.3177121Z st.shared.v4.b32 [%r26+32768], {%r1691, %r1694, %r1697, %r1700}; 2026-02-21T08:54:39.3177224Z st.shared.v4.b32 [%r26+65536], {%r1787, %r1790, %r1793, %r1796}; 2026-02-21T08:54:39.3177317Z 
st.shared.v4.b32 [%r26+98304], {%r1883, %r1886, %r1889, %r1892}; 2026-02-21T08:54:39.3177432Z st.shared.v4.b32 [%r26+16384], {%r1979, %r1982, %r1985, %r1988}; 2026-02-21T08:54:39.3177531Z st.shared.v4.b32 [%r26+49152], {%r2075, %r2078, %r2081, %r2084}; 2026-02-21T08:54:39.3177626Z st.shared.v4.b32 [%r26+81920], {%r2171, %r2174, %r2177, %r2180}; 2026-02-21T08:54:39.3177722Z st.shared.v4.b32 [%r26+114688], {%r2267, %r2270, %r2273, %r2276}; 2026-02-21T08:54:39.3177816Z st.shared.v4.b32 [%r27], {%r1607, %r1610, %r1613, %r1616}; 2026-02-21T08:54:39.3177910Z st.shared.v4.b32 [%r27+32768], {%r1703, %r1706, %r1709, %r1712}; 2026-02-21T08:54:39.3178002Z st.shared.v4.b32 [%r27+65536], {%r1799, %r1802, %r1805, %r1808}; 2026-02-21T08:54:39.3178103Z st.shared.v4.b32 [%r27+98304], {%r1895, %r1898, %r1901, %r1904}; 2026-02-21T08:54:39.3178197Z st.shared.v4.b32 [%r27+16384], {%r1991, %r1994, %r1997, %r2000}; 2026-02-21T08:54:39.3178288Z st.shared.v4.b32 [%r27+49152], {%r2087, %r2090, %r2093, %r2096}; 2026-02-21T08:54:39.3178380Z st.shared.v4.b32 [%r27+81920], {%r2183, %r2186, %r2189, %r2192}; 2026-02-21T08:54:39.3178487Z st.shared.v4.b32 [%r27+114688], {%r2279, %r2282, %r2285, %r2288}; 2026-02-21T08:54:39.3178573Z st.shared.v4.b32 [%r28], {%r1619, %r1622, %r1625, %r1628}; 2026-02-21T08:54:39.3178692Z st.shared.v4.b32 [%r28+32768], {%r1715, %r1718, %r1721, %r1724}; 2026-02-21T08:54:39.3178794Z st.shared.v4.b32 [%r28+65536], {%r1811, %r1814, %r1817, %r1820}; 2026-02-21T08:54:39.3178886Z st.shared.v4.b32 [%r28+98304], {%r1907, %r1910, %r1913, %r1916}; 2026-02-21T08:54:39.3178978Z st.shared.v4.b32 [%r28+16384], {%r2003, %r2006, %r2009, %r2012}; 2026-02-21T08:54:39.3179076Z st.shared.v4.b32 [%r28+49152], {%r2099, %r2102, %r2105, %r2108}; 2026-02-21T08:54:39.3179168Z st.shared.v4.b32 [%r28+81920], {%r2195, %r2198, %r2201, %r2204}; 2026-02-21T08:54:39.3179288Z st.shared.v4.b32 [%r28+114688], {%r2291, %r2294, %r2297, %r2300}; 2026-02-21T08:54:39.3179382Z st.shared.v4.b32 [%r29], {%r1631, 
%r1634, %r1637, %r1640}; 2026-02-21T08:54:39.3179476Z st.shared.v4.b32 [%r29+32768], {%r1727, %r1730, %r1733, %r1736}; 2026-02-21T08:54:39.3179571Z st.shared.v4.b32 [%r29+65536], {%r1823, %r1826, %r1829, %r1832}; 2026-02-21T08:54:39.3179663Z st.shared.v4.b32 [%r29+98304], {%r1919, %r1922, %r1925, %r1928}; 2026-02-21T08:54:39.3179765Z st.shared.v4.b32 [%r29+16384], {%r2015, %r2018, %r2021, %r2024}; 2026-02-21T08:54:39.3179860Z st.shared.v4.b32 [%r29+49152], {%r2111, %r2114, %r2117, %r2120}; 2026-02-21T08:54:39.3179953Z st.shared.v4.b32 [%r29+81920], {%r2207, %r2210, %r2213, %r2216}; 2026-02-21T08:54:39.3180056Z st.shared.v4.b32 [%r29+114688], {%r2303, %r2306, %r2309, %r2312}; 2026-02-21T08:54:39.3180142Z st.shared.v4.b32 [%r30], {%r1643, %r1646, %r1649, %r1652}; 2026-02-21T08:54:39.3180256Z st.shared.v4.b32 [%r30+32768], {%r1739, %r1742, %r1745, %r1748}; 2026-02-21T08:54:39.3180359Z st.shared.v4.b32 [%r30+65536], {%r1835, %r1838, %r1841, %r1844}; 2026-02-21T08:54:39.3180451Z st.shared.v4.b32 [%r30+98304], {%r1931, %r1934, %r1937, %r1940}; 2026-02-21T08:54:39.3180546Z st.shared.v4.b32 [%r30+16384], {%r2027, %r2030, %r2033, %r2036}; 2026-02-21T08:54:39.3180645Z st.shared.v4.b32 [%r30+49152], {%r2123, %r2126, %r2129, %r2132}; 2026-02-21T08:54:39.3180740Z st.shared.v4.b32 [%r30+81920], {%r2219, %r2222, %r2225, %r2228}; 2026-02-21T08:54:39.3180834Z st.shared.v4.b32 [%r30+114688], {%r2315, %r2318, %r2321, %r2324}; 2026-02-21T08:54:39.3180894Z // begin inline asm 2026-02-21T08:54:39.3180974Z fence.proxy.async.shared::cta; 2026-02-21T08:54:39.3181029Z // end inline asm 2026-02-21T08:54:39.3181081Z bar.sync 0; 2026-02-21T08:54:39.3181155Z elect.sync %r2325|%p85, -1; 2026-02-21T08:54:39.3181220Z and.pred %p83, %p84, %p85; 2026-02-21T08:54:39.3181280Z shl.b32 %r2326, %r35, 15; 2026-02-21T08:54:39.3181340Z add.s32 %r1555, %r48, %r2326; 2026-02-21T08:54:39.3181407Z shl.b32 %r2327, %r35, 6; 2026-02-21T08:54:39.3181465Z or.b32 %r1553, %r2327, %r32; 2026-02-21T08:54:39.3181519Z // 
begin inline asm 2026-02-21T08:54:39.3181714Z @%p83 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd250, {%r1553, %r1554}], [%r1555]; 2026-02-21T08:54:39.3181805Z // end inline asm 2026-02-21T08:54:39.3181871Z cp.async.bulk.commit_group; 2026-02-21T08:54:39.3181959Z $L__BB0_8: // %._crit_edge 2026-02-21T08:54:39.3182134Z .loc 1 30 74 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:30:74 2026-02-21T08:54:39.3182205Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:39.3182258Z bar.sync 0; 2026-02-21T08:54:39.3182434Z .loc 1 30 4 // cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py:30:4 2026-02-21T08:54:39.3182486Z bar.sync 0; 2026-02-21T08:54:39.3182540Z // begin inline asm 2026-02-21T08:54:39.3182664Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2328, 512; 2026-02-21T08:54:39.3182719Z // end inline asm 2026-02-21T08:54:39.3182769Z ret; 2026-02-21T08:54:39.3182828Z $L__tmp0: 2026-02-21T08:54:39.3182881Z $L__func_end0: 2026-02-21T08:54:39.3182959Z // -- End function 2026-02-21T08:54:39.3183010Z } 2026-02-21T08:54:39.3183221Z .file 1 "/tmp/torchinductor_root/dd/cddm5hzr2aifyndlyj4xpida4r4loy65dpy5tqhnrknatxqgpdgc.py" 2026-02-21T08:54:39.3183282Z .section .debug_abbrev 2026-02-21T08:54:39.3183360Z { 2026-02-21T08:54:39.3183450Z .b8 1 // Abbreviation Code 2026-02-21T08:54:39.3183533Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:39.3183611Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:39.3183687Z .b8 37 // DW_AT_producer 2026-02-21T08:54:39.3183766Z .b8 8 // DW_FORM_string 2026-02-21T08:54:39.3183838Z .b8 19 // DW_AT_language 2026-02-21T08:54:39.3183934Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:39.3184013Z .b8 3 // DW_AT_name 2026-02-21T08:54:39.3184084Z .b8 8 // DW_FORM_string 2026-02-21T08:54:39.3184161Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:39.3184240Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:39.3184313Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:39.3184383Z .b8 8 // DW_FORM_string 2026-02-21T08:54:39.3184457Z .b8 0 // EOM(1) 
2026-02-21T08:54:39.3184524Z .b8 0 // EOM(2) 2026-02-21T08:54:39.3184586Z .b8 0 // EOM(3) 2026-02-21T08:54:39.3184634Z } 2026-02-21T08:54:39.3184731Z .section .debug_info 2026-02-21T08:54:39.3184807Z { 2026-02-21T08:54:39.3184891Z .b32 104 // Length of Unit 2026-02-21T08:54:39.3184980Z .b8 2 // DWARF version number 2026-02-21T08:54:39.3185031Z .b8 0 2026-02-21T08:54:39.3185158Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:54:39.3185242Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:39.3185345Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:39.3185422Z .b8 116 // DW_AT_producer 2026-02-21T08:54:39.3185475Z .b8 114 2026-02-21T08:54:39.3185533Z .b8 105 2026-02-21T08:54:39.3185582Z .b8 116 2026-02-21T08:54:39.3185630Z .b8 111 2026-02-21T08:54:39.3185678Z .b8 110 2026-02-21T08:54:39.3185734Z .b8 0 2026-02-21T08:54:39.3185805Z .b8 2 // DW_AT_language 2026-02-21T08:54:39.3185853Z .b8 0 2026-02-21T08:54:39.3185934Z .b8 99 // DW_AT_name 2026-02-21T08:54:39.3185985Z .b8 100 2026-02-21T08:54:39.3186035Z .b8 100 2026-02-21T08:54:39.3186086Z .b8 109 2026-02-21T08:54:39.3186147Z .b8 53 2026-02-21T08:54:39.3186198Z .b8 104 2026-02-21T08:54:39.3186249Z .b8 122 2026-02-21T08:54:39.3186308Z .b8 114 2026-02-21T08:54:39.3186395Z .b8 50 2026-02-21T08:54:39.3186446Z .b8 97 2026-02-21T08:54:39.3186495Z .b8 105 2026-02-21T08:54:39.3186552Z .b8 102 2026-02-21T08:54:39.3186602Z .b8 121 2026-02-21T08:54:39.3186651Z .b8 110 2026-02-21T08:54:39.3186706Z .b8 100 2026-02-21T08:54:39.3186757Z .b8 108 2026-02-21T08:54:39.3186806Z .b8 121 2026-02-21T08:54:39.3186856Z .b8 106 2026-02-21T08:54:39.3186913Z .b8 52 2026-02-21T08:54:39.3186962Z .b8 120 2026-02-21T08:54:39.3187010Z .b8 112 2026-02-21T08:54:39.3187065Z .b8 105 2026-02-21T08:54:39.3187114Z .b8 100 2026-02-21T08:54:39.3187162Z .b8 97 2026-02-21T08:54:39.3187211Z .b8 52 2026-02-21T08:54:39.3187268Z .b8 114 2026-02-21T08:54:39.3187316Z .b8 52 2026-02-21T08:54:39.3187363Z .b8 108 
2026-02-21T08:54:39.3187413Z .b8 111 2026-02-21T08:54:39.3187472Z .b8 121 2026-02-21T08:54:39.3187520Z .b8 54 2026-02-21T08:54:39.3187569Z .b8 53 2026-02-21T08:54:39.3187625Z .b8 100 2026-02-21T08:54:39.3187673Z .b8 112 2026-02-21T08:54:39.3187723Z .b8 121 2026-02-21T08:54:39.3187773Z .b8 53 2026-02-21T08:54:39.3187831Z .b8 116 2026-02-21T08:54:39.3187881Z .b8 113 2026-02-21T08:54:39.3187931Z .b8 104 2026-02-21T08:54:39.3187988Z .b8 110 2026-02-21T08:54:39.3188037Z .b8 114 2026-02-21T08:54:39.3188086Z .b8 107 2026-02-21T08:54:39.3188135Z .b8 110 2026-02-21T08:54:39.3188219Z .b8 97 2026-02-21T08:54:39.3188270Z .b8 116 2026-02-21T08:54:39.3188319Z .b8 120 2026-02-21T08:54:39.3188368Z .b8 113 2026-02-21T08:54:39.3188424Z .b8 103 2026-02-21T08:54:39.3188472Z .b8 112 2026-02-21T08:54:39.3188522Z .b8 100 2026-02-21T08:54:39.3188578Z .b8 103 2026-02-21T08:54:39.3188626Z .b8 99 2026-02-21T08:54:39.3188675Z .b8 46 2026-02-21T08:54:39.3188723Z .b8 112 2026-02-21T08:54:39.3188780Z .b8 121 2026-02-21T08:54:39.3188828Z .b8 0 2026-02-21T08:54:39.3188917Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:39.3189025Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:39.3189073Z .b8 116 2026-02-21T08:54:39.3189121Z .b8 109 2026-02-21T08:54:39.3189169Z .b8 112 2026-02-21T08:54:39.3189225Z .b8 47 2026-02-21T08:54:39.3189273Z .b8 116 2026-02-21T08:54:39.3189320Z .b8 111 2026-02-21T08:54:39.3189373Z .b8 114 2026-02-21T08:54:39.3189421Z .b8 99 2026-02-21T08:54:39.3189468Z .b8 104 2026-02-21T08:54:39.3189515Z .b8 105 2026-02-21T08:54:39.3189571Z .b8 110 2026-02-21T08:54:39.3189619Z .b8 100 2026-02-21T08:54:39.3189667Z .b8 117 2026-02-21T08:54:39.3189720Z .b8 99 2026-02-21T08:54:39.3189770Z .b8 116 2026-02-21T08:54:39.3189817Z .b8 111 2026-02-21T08:54:39.3189866Z .b8 114 2026-02-21T08:54:39.3189922Z .b8 95 2026-02-21T08:54:39.3189971Z .b8 114 2026-02-21T08:54:39.3190020Z .b8 111 2026-02-21T08:54:39.3190067Z .b8 111 2026-02-21T08:54:39.3190123Z .b8 116 2026-02-21T08:54:39.3190170Z .b8 47 
2026-02-21T08:54:39.3190241Z .b8 100 2026-02-21T08:54:39.3190297Z .b8 100 2026-02-21T08:54:39.3190345Z .b8 0 2026-02-21T08:54:39.3190392Z } 2026-02-21T08:54:39.3190456Z .section .debug_macinfo { } 2026-02-21T08:54:39.3190460Z 2026-02-21T08:54:39.3190542Z ================================================================ 2026-02-21T08:54:39.3190642Z please share the reproducer above with Triton project. 2026-02-21T08:54:40.1037156Z [385s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:54:40.1037451Z 2026-02-21T08:54:40.1038537Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:54:40.1040057Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:40.1040300Z `ptxas` stderr: 2026-02-21T08:54:40.1040738Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 143 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:40.1041450Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:40.1041607Z 2026-02-21T08:54:40.1042032Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxf_yzxv4.ptx -o /tmp/tmpxf_yzxv4.ptx.o 2026-02-21T08:54:40.1042497Z 2026-02-21T08:54:40.1042632Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:54:40.1042826Z 2026-02-21T08:54:40.1042830Z 2026-02-21T08:54:40.1042928Z ================================================================ 2026-02-21T08:54:40.1043150Z Internal Triton PTX codegen error 2026-02-21T08:54:40.1043336Z `ptxas` stderr: 2026-02-21T08:54:40.1043759Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 143 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:40.1044254Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:40.1044404Z 2026-02-21T08:54:40.1044906Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxf_yzxv4.ptx -o /tmp/tmpxf_yzxv4.ptx.o 2026-02-21T08:54:40.1045376Z 2026-02-21T08:54:40.1045380Z 2026-02-21T08:54:40.1045435Z // 2026-02-21T08:54:40.1045586Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:40.1045760Z // 2026-02-21T08:54:40.1045837Z 2026-02-21T08:54:40.1045893Z .version 8.7 2026-02-21T08:54:40.1046029Z .target sm_100a 2026-02-21T08:54:40.1046173Z .address_size 64 2026-02-21T08:54:40.1046258Z 2026-02-21T08:54:40.1046407Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:40.1046732Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:40.1046953Z // @_helion_matmul 2026-02-21T08:54:40.1047152Z .visible .entry _helion_matmul( 2026-02-21T08:54:40.1047374Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:40.1047641Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:40.1047896Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:40.1048156Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:40.1048403Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:40.1048619Z ) 2026-02-21T08:54:40.1048738Z .reqntid 256 2026-02-21T08:54:40.1048876Z .maxnreg 32 2026-02-21T08:54:40.1049000Z 
{ 2026-02-21T08:54:40.1049135Z .reg .pred %p<88>; 2026-02-21T08:54:40.1049284Z .reg .b32 %r<819>; 2026-02-21T08:54:40.1049441Z .reg .b64 %rd<305>; 2026-02-21T08:54:40.1049627Z $L__func_begin0: 2026-02-21T08:54:40.1049711Z 2026-02-21T08:54:40.1049763Z // %bb.0: 2026-02-21T08:54:40.1050003Z .loc 1 19 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:19 2026-02-21T08:54:40.1050277Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:40.1050428Z shr.u32 %r2, %r1, 5; 2026-02-21T08:54:40.1050581Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:54:40.1050771Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T08:54:40.1050922Z @%p3 bra $L__BB0_17; 2026-02-21T08:54:40.1051066Z bra.uni $L__BB0_1; 2026-02-21T08:54:40.1051207Z $L__BB0_17: 2026-02-21T08:54:40.1051443Z .loc 1 0 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0:0 2026-02-21T08:54:40.1051752Z ld.param.b64 %rd17, [_helion_matmul_param_3]; 2026-02-21T08:54:40.1051960Z ld.param.b64 %rd16, [_helion_matmul_param_2]; 2026-02-21T08:54:40.1052248Z .loc 1 19 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:19 2026-02-21T08:54:40.1052544Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:40.1052737Z setp.lt.u32 %p42, %r1, 32; 2026-02-21T08:54:40.1052895Z mov.b32 %r453, global_smem; 2026-02-21T08:54:40.1053053Z // begin inline asm 2026-02-21T08:54:40.1053286Z @%p42 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r453], 128; 2026-02-21T08:54:40.1053560Z // end inline asm 2026-02-21T08:54:40.1053699Z bar.sync 0, 128; 2026-02-21T08:54:40.1053840Z ld.shared.b32 %r766, [global_smem]; 2026-02-21T08:54:40.1054011Z bar.sync 0, 128; 2026-02-21T08:54:40.1054144Z // begin inline asm 2026-02-21T08:54:40.1054349Z @%p42 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:40.1054573Z // end inline asm 2026-02-21T08:54:40.1054861Z .loc 1 21 73 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:21:73 2026-02-21T08:54:40.1055164Z mov.u32 %r537, %ctaid.x; 2026-02-21T08:54:40.1055314Z 
mov.u32 %r538, %ctaid.y; 2026-02-21T08:54:40.1055465Z mov.u32 %r539, %ctaid.z; 2026-02-21T08:54:40.1055613Z mov.u32 %r540, %nctaid.x; 2026-02-21T08:54:40.1055775Z mov.u32 %r541, %nctaid.y; 2026-02-21T08:54:40.1055937Z mad.lo.s32 %r542, %r539, %r541, %r538; 2026-02-21T08:54:40.1056128Z mad.lo.s32 %r543, %r542, %r540, %r537; 2026-02-21T08:54:40.1056301Z shl.b32 %r544, %r543, 7; 2026-02-21T08:54:40.1056472Z cvt.s64.s32 %rd175, %r544; 2026-02-21T08:54:40.1056649Z add.s64 %rd172, %rd17, %rd175; 2026-02-21T08:54:40.1056814Z shl.b32 %r545, %r1, 2; 2026-02-21T08:54:40.1056978Z add.s32 %r454, %r453, %r545; 2026-02-21T08:54:40.1057161Z mov.b32 %r816, 0; 2026-02-21T08:54:40.1057308Z // begin inline asm 2026-02-21T08:54:40.1057459Z @%p42 st.shared.b32 [ %r454 + 0 ], %r816; 2026-02-21T08:54:40.1057641Z // end inline asm 2026-02-21T08:54:40.1057780Z bar.warp.sync -1; 2026-02-21T08:54:40.1057933Z setp.eq.b32 %p82, %r1, 0; 2026-02-21T08:54:40.1058089Z cvt.u64.u32 %rd157, %r453; 2026-02-21T08:54:40.1058249Z // begin inline asm 2026-02-21T08:54:40.1058509Z @%p82 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd157 + 0 ], %rd16; 2026-02-21T08:54:40.1058817Z // end inline asm 2026-02-21T08:54:40.1058954Z // begin inline asm 2026-02-21T08:54:40.1059173Z @%p82 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x1; 2026-02-21T08:54:40.1059427Z // end inline asm 2026-02-21T08:54:40.1059556Z mov.b32 %r456, 64; 2026-02-21T08:54:40.1059695Z // begin inline asm 2026-02-21T08:54:40.1059934Z @%p82 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x0, %r456; 2026-02-21T08:54:40.1060195Z // end inline asm 2026-02-21T08:54:40.1060331Z mov.b32 %r457, 128; 2026-02-21T08:54:40.1060466Z // begin inline asm 2026-02-21T08:54:40.1060700Z @%p82 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x1, %r457; 2026-02-21T08:54:40.1060955Z // end inline asm 2026-02-21T08:54:40.1061093Z mov.b32 %r458, 12288; 2026-02-21T08:54:40.1061233Z // 
begin inline asm 2026-02-21T08:54:40.1061503Z @%p82 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x0, %r458; 2026-02-21T08:54:40.1061787Z // end inline asm 2026-02-21T08:54:40.1061915Z mov.b32 %r459, 2048; 2026-02-21T08:54:40.1062056Z // begin inline asm 2026-02-21T08:54:40.1062292Z @%p82 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x1, %r459; 2026-02-21T08:54:40.1062572Z // end inline asm 2026-02-21T08:54:40.1062703Z mov.b64 %rd165, 24576; 2026-02-21T08:54:40.1062853Z // begin inline asm 2026-02-21T08:54:40.1063101Z @%p82 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd157 + 0 ], 0x0, %rd165; 2026-02-21T08:54:40.1063384Z // end inline asm 2026-02-21T08:54:40.1063518Z mov.b32 %r460, 1; 2026-02-21T08:54:40.1063649Z // begin inline asm 2026-02-21T08:54:40.1063907Z @%p82 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x0, %r460; 2026-02-21T08:54:40.1064184Z // end inline asm 2026-02-21T08:54:40.1064319Z // begin inline asm 2026-02-21T08:54:40.1064569Z @%p82 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x1, %r460; 2026-02-21T08:54:40.1064890Z // end inline asm 2026-02-21T08:54:40.1065026Z // begin inline asm 2026-02-21T08:54:40.1065256Z @%p82 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x6; 2026-02-21T08:54:40.1065563Z // end inline asm 2026-02-21T08:54:40.1065697Z // begin inline asm 2026-02-21T08:54:40.1065952Z @%p82 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x0; 2026-02-21T08:54:40.1066234Z // end inline asm 2026-02-21T08:54:40.1066381Z // begin inline asm 2026-02-21T08:54:40.1066634Z @%p82 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x3; 2026-02-21T08:54:40.1066895Z // end inline asm 2026-02-21T08:54:40.1067031Z // begin inline asm 2026-02-21T08:54:40.1067253Z @%p82 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd157 + 0 ], 0x0; 
2026-02-21T08:54:40.1067516Z // end inline asm 2026-02-21T08:54:40.1067650Z // begin inline asm 2026-02-21T08:54:40.1068002Z @%p42 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd172 + 0 ], [ %rd157 + 0 ], 0x80; 2026-02-21T08:54:40.1068379Z // end inline asm 2026-02-21T08:54:40.1068510Z // begin inline asm 2026-02-21T08:54:40.1068724Z @%p42 fence.proxy.tensormap::generic.acquire.gpu [ %rd172 + 0 ], 0x80; 2026-02-21T08:54:40.1068973Z @%p42 cp.async.bulk.commit_group ; 2026-02-21T08:54:40.1069167Z @%p42 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:40.1069340Z // end inline asm 2026-02-21T08:54:40.1069504Z bar.sync 0, 128; 2026-02-21T08:54:40.1069646Z cvta.global.u64 %rd176, %rd172; 2026-02-21T08:54:40.1069929Z .loc 1 28 35 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:28:35 2026-02-21T08:54:40.1070218Z shl.b32 %r102, %r537, 1; 2026-02-21T08:54:40.1070473Z .loc 1 29 37 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:29:37 2026-02-21T08:54:40.1070757Z add.s32 %r546, %r102, 2; 2026-02-21T08:54:40.1071010Z .loc 1 29 49 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:29:49 2026-02-21T08:54:40.1071321Z min.s32 %r547, %r546, 3072; 2026-02-21T08:54:40.1071584Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1071876Z sub.s32 %r548, %r547, %r102; 2026-02-21T08:54:40.1072036Z shl.b32 %r805, %r548, 6; 2026-02-21T08:54:40.1072289Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1072585Z shfl.sync.idx.b32 %r549, %r2, 0, 31, -1; 2026-02-21T08:54:40.1072760Z shl.b32 %r550, %r549, 21; 2026-02-21T08:54:40.1072918Z and.b32 %r551, %r550, 6291456; 2026-02-21T08:54:40.1073076Z add.s32 %r462, %r551, %r766; 2026-02-21T08:54:40.1073233Z mov.pred %p62, -1; 2026-02-21T08:54:40.1073369Z // begin inline asm 2026-02-21T08:54:40.1073754Z @%p62 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r462 + 0], {%r816, %r816, 
%r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816}; 2026-02-21T08:54:40.1074146Z // end inline asm 2026-02-21T08:54:40.1074276Z // begin inline asm 2026-02-21T08:54:40.1074628Z @%p62 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r462 + 16], {%r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816}; 2026-02-21T08:54:40.1075048Z // end inline asm 2026-02-21T08:54:40.1075184Z // begin inline asm 2026-02-21T08:54:40.1075526Z @%p62 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r462 + 32], {%r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816}; 2026-02-21T08:54:40.1075912Z // end inline asm 2026-02-21T08:54:40.1076047Z // begin inline asm 2026-02-21T08:54:40.1076380Z @%p62 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r462 + 48], {%r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816, %r816}; 2026-02-21T08:54:40.1076761Z // end inline asm 2026-02-21T08:54:40.1076890Z // begin inline asm 2026-02-21T08:54:40.1077044Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:40.1077202Z // end inline asm 2026-02-21T08:54:40.1077339Z bar.sync 0, 128; 2026-02-21T08:54:40.1077598Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1077941Z add.s32 %r530, %r453, 77824; 2026-02-21T08:54:40.1078113Z // begin inline asm 2026-02-21T08:54:40.1078279Z @%p82 mbarrier.init.shared::cta.b64 [%r530], 1; 2026-02-21T08:54:40.1078475Z // end inline asm 2026-02-21T08:54:40.1078607Z bar.sync 0, 128; 2026-02-21T08:54:40.1078751Z add.s32 %r531, %r453, 77832; 2026-02-21T08:54:40.1078907Z // begin inline asm 2026-02-21T08:54:40.1079085Z @%p82 mbarrier.init.shared::cta.b64 [%r531], 1; 2026-02-21T08:54:40.1079285Z // end inline asm 2026-02-21T08:54:40.1079422Z add.s32 %r532, %r453, 77840; 2026-02-21T08:54:40.1079584Z // begin inline asm 2026-02-21T08:54:40.1079747Z 
@%p82 mbarrier.init.shared::cta.b64 [%r532], 1; 2026-02-21T08:54:40.1079939Z // end inline asm 2026-02-21T08:54:40.1080071Z bar.sync 0, 128; 2026-02-21T08:54:40.1080214Z add.s32 %r533, %r453, 77848; 2026-02-21T08:54:40.1080366Z // begin inline asm 2026-02-21T08:54:40.1080533Z @%p82 mbarrier.init.shared::cta.b64 [%r533], 1; 2026-02-21T08:54:40.1080724Z // end inline asm 2026-02-21T08:54:40.1080973Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1081293Z bar.sync 0, 128; 2026-02-21T08:54:40.1081423Z // begin inline asm 2026-02-21T08:54:40.1081589Z @%p82 mbarrier.arrive.shared::cta.b64 _, [%r532]; 2026-02-21T08:54:40.1081780Z // end inline asm 2026-02-21T08:54:40.1081912Z bar.sync 0, 128; 2026-02-21T08:54:40.1082036Z // begin inline asm 2026-02-21T08:54:40.1082197Z @%p82 mbarrier.arrive.shared::cta.b64 _, [%r533]; 2026-02-21T08:54:40.1082384Z // end inline asm 2026-02-21T08:54:40.1082629Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1082965Z st.shared.b32 [global_smem+77856], 33619968; 2026-02-21T08:54:40.1083163Z st.shared.b32 [global_smem], %r766; 2026-02-21T08:54:40.1083351Z barrier.sync 1; 2026-02-21T08:54:40.1083495Z barrier.sync 1; 2026-02-21T08:54:40.1083651Z setp.lt.s32 %p72, %r805, 1; 2026-02-21T08:54:40.1083817Z mov.b32 %r818, %r816; 2026-02-21T08:54:40.1083980Z @%p72 bra $L__BB0_24; 2026-02-21T08:54:40.1084154Z // %bb.18: // %.lr.ph7 2026-02-21T08:54:40.1084360Z add.s32 %r814, %r102, -1; 2026-02-21T08:54:40.1084527Z shl.b32 %r554, %r1, 7; 2026-02-21T08:54:40.1084712Z and.b32 %r555, %r554, 16256; 2026-02-21T08:54:40.1084881Z shl.b32 %r556, %r1, 4; 2026-02-21T08:54:40.1085032Z and.b32 %r557, %r556, 112; 2026-02-21T08:54:40.1085198Z or.b32 %r558, %r555, %r557; 2026-02-21T08:54:40.1085358Z add.s32 %r560, %r453, 61440; 2026-02-21T08:54:40.1085522Z add.s32 %r106, %r560, %r558; 2026-02-21T08:54:40.1085706Z xor.b32 %r561, %r558, 16; 2026-02-21T08:54:40.1085872Z 
add.s32 %r107, %r560, %r561; 2026-02-21T08:54:40.1086037Z xor.b32 %r562, %r558, 32; 2026-02-21T08:54:40.1086193Z add.s32 %r108, %r560, %r562; 2026-02-21T08:54:40.1086358Z xor.b32 %r563, %r558, 48; 2026-02-21T08:54:40.1086512Z add.s32 %r109, %r560, %r563; 2026-02-21T08:54:40.1086686Z xor.b32 %r564, %r558, 64; 2026-02-21T08:54:40.1086837Z add.s32 %r110, %r560, %r564; 2026-02-21T08:54:40.1086995Z xor.b32 %r565, %r558, 80; 2026-02-21T08:54:40.1087146Z add.s32 %r111, %r560, %r565; 2026-02-21T08:54:40.1087303Z xor.b32 %r566, %r558, 96; 2026-02-21T08:54:40.1087450Z add.s32 %r112, %r560, %r566; 2026-02-21T08:54:40.1087611Z xor.b32 %r567, %r558, 112; 2026-02-21T08:54:40.1087770Z add.s32 %r113, %r560, %r567; 2026-02-21T08:54:40.1087922Z mov.b32 %r811, -1; 2026-02-21T08:54:40.1088070Z mov.b32 %r818, %r816; 2026-02-21T08:54:40.1088212Z mov.b32 %r813, %r816; 2026-02-21T08:54:40.1088362Z mov.b32 %r812, %r816; 2026-02-21T08:54:40.1088505Z bra.uni $L__BB0_19; 2026-02-21T08:54:40.1088708Z $L__BB0_22: // in Loop: Header=BB0_19 Depth=1 2026-02-21T08:54:40.1088926Z shl.b32 %r644, %r816, 3; 2026-02-21T08:54:40.1089088Z add.s32 %r646, %r453, %r644; 2026-02-21T08:54:40.1089269Z add.s32 %r570, %r646, 77824; 2026-02-21T08:54:40.1089550Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1089850Z shl.b32 %r647, %r816, 6; 2026-02-21T08:54:40.1090001Z bar.sync 0, 128; 2026-02-21T08:54:40.1090146Z // begin inline asm 2026-02-21T08:54:40.1090281Z 2026-02-21T08:54:40.1090403Z { 2026-02-21T08:54:40.1090526Z .reg .pred complete; 2026-02-21T08:54:40.1090679Z waitLoop: 2026-02-21T08:54:40.1090869Z mbarrier.try_wait.parity.shared.b64 complete, [%r570], %r818; 2026-02-21T08:54:40.1091115Z @!complete bra.uni waitLoop; 2026-02-21T08:54:40.1091275Z } 2026-02-21T08:54:40.1091343Z 2026-02-21T08:54:40.1091397Z // end inline asm 2026-02-21T08:54:40.1091543Z add.s32 %r588, %r462, %r647; 2026-02-21T08:54:40.1091704Z // begin inline asm 2026-02-21T08:54:40.1092051Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587}, [%r588 + 0]; 2026-02-21T08:54:40.1092413Z // end inline asm 2026-02-21T08:54:40.1092551Z // begin inline asm 2026-02-21T08:54:40.1092913Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604}, [%r588 + 16]; 2026-02-21T08:54:40.1093299Z // end inline asm 2026-02-21T08:54:40.1093436Z // begin inline asm 2026-02-21T08:54:40.1093769Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621}, [%r588 + 32]; 2026-02-21T08:54:40.1094144Z // end inline asm 2026-02-21T08:54:40.1094271Z // begin inline asm 2026-02-21T08:54:40.1094609Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638}, [%r588 + 48]; 2026-02-21T08:54:40.1095048Z // end inline asm 2026-02-21T08:54:40.1095178Z // begin inline asm 2026-02-21T08:54:40.1095333Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:40.1095492Z // end inline asm 2026-02-21T08:54:40.1095634Z cvt.u64.u32 %rd177, %r572; 2026-02-21T08:54:40.1095788Z cvt.u64.u32 %rd178, %r573; 2026-02-21T08:54:40.1095949Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:54:40.1096107Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:54:40.1096391Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1096694Z mov.b64 {%r648, %r649}, %rd180; 2026-02-21T08:54:40.1096870Z cvt.rn.f16x2.f32 %r650, %r649, %r648; 2026-02-21T08:54:40.1097167Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1097489Z cvt.u64.u32 %rd181, %r574; 2026-02-21T08:54:40.1097651Z cvt.u64.u32 %rd182, %r575; 2026-02-21T08:54:40.1097801Z shl.b64 %rd183, %rd182, 32; 
2026-02-21T08:54:40.1097960Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:54:40.1098229Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1098523Z mov.b64 {%r651, %r652}, %rd184; 2026-02-21T08:54:40.1098695Z cvt.rn.f16x2.f32 %r653, %r652, %r651; 2026-02-21T08:54:40.1098974Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1099262Z cvt.u64.u32 %rd185, %r576; 2026-02-21T08:54:40.1099412Z cvt.u64.u32 %rd186, %r577; 2026-02-21T08:54:40.1099566Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:54:40.1099715Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:54:40.1099991Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1100282Z mov.b64 {%r654, %r655}, %rd188; 2026-02-21T08:54:40.1100447Z cvt.rn.f16x2.f32 %r656, %r655, %r654; 2026-02-21T08:54:40.1100728Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1101010Z cvt.u64.u32 %rd189, %r578; 2026-02-21T08:54:40.1101194Z cvt.u64.u32 %rd190, %r579; 2026-02-21T08:54:40.1101341Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:54:40.1101501Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:54:40.1101760Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1102041Z mov.b64 {%r657, %r658}, %rd192; 2026-02-21T08:54:40.1102210Z cvt.rn.f16x2.f32 %r659, %r658, %r657; 2026-02-21T08:54:40.1102479Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1102760Z cvt.u64.u32 %rd193, %r580; 2026-02-21T08:54:40.1102908Z cvt.u64.u32 %rd194, %r581; 2026-02-21T08:54:40.1103062Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:54:40.1103212Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:54:40.1103478Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1103762Z mov.b64 {%r660, %r661}, %rd196; 2026-02-21T08:54:40.1103926Z cvt.rn.f16x2.f32 %r662, 
%r661, %r660; 2026-02-21T08:54:40.1104199Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1104483Z cvt.u64.u32 %rd197, %r582; 2026-02-21T08:54:40.1104661Z cvt.u64.u32 %rd198, %r583; 2026-02-21T08:54:40.1104846Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:54:40.1105004Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:54:40.1105259Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1105551Z mov.b64 {%r663, %r664}, %rd200; 2026-02-21T08:54:40.1105718Z cvt.rn.f16x2.f32 %r665, %r664, %r663; 2026-02-21T08:54:40.1105989Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1106298Z cvt.u64.u32 %rd201, %r584; 2026-02-21T08:54:40.1106454Z cvt.u64.u32 %rd202, %r585; 2026-02-21T08:54:40.1106615Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:54:40.1106775Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:54:40.1107048Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1107349Z mov.b64 {%r666, %r667}, %rd204; 2026-02-21T08:54:40.1107517Z cvt.rn.f16x2.f32 %r668, %r667, %r666; 2026-02-21T08:54:40.1107802Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1108087Z cvt.u64.u32 %rd205, %r586; 2026-02-21T08:54:40.1108248Z cvt.u64.u32 %rd206, %r587; 2026-02-21T08:54:40.1108399Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:54:40.1108562Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:54:40.1108850Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1109131Z mov.b64 {%r669, %r670}, %rd208; 2026-02-21T08:54:40.1109299Z cvt.rn.f16x2.f32 %r671, %r670, %r669; 2026-02-21T08:54:40.1109565Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1109849Z cvt.u64.u32 %rd209, %r589; 2026-02-21T08:54:40.1109997Z cvt.u64.u32 %rd210, %r590; 2026-02-21T08:54:40.1110152Z shl.b64 %rd211, %rd210, 
32; 2026-02-21T08:54:40.1110304Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:54:40.1110565Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1110844Z mov.b64 {%r672, %r673}, %rd212; 2026-02-21T08:54:40.1111003Z cvt.rn.f16x2.f32 %r674, %r673, %r672; 2026-02-21T08:54:40.1111278Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1111550Z cvt.u64.u32 %rd213, %r591; 2026-02-21T08:54:40.1111706Z cvt.u64.u32 %rd214, %r592; 2026-02-21T08:54:40.1111855Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:54:40.1112014Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:54:40.1112267Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1112580Z mov.b64 {%r675, %r676}, %rd216; 2026-02-21T08:54:40.1112746Z cvt.rn.f16x2.f32 %r677, %r676, %r675; 2026-02-21T08:54:40.1113025Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1113310Z cvt.u64.u32 %rd217, %r593; 2026-02-21T08:54:40.1113459Z cvt.u64.u32 %rd218, %r594; 2026-02-21T08:54:40.1113612Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:54:40.1113762Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:54:40.1114031Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1114316Z mov.b64 {%r678, %r679}, %rd220; 2026-02-21T08:54:40.1114475Z cvt.rn.f16x2.f32 %r680, %r679, %r678; 2026-02-21T08:54:40.1114788Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1115063Z cvt.u64.u32 %rd221, %r595; 2026-02-21T08:54:40.1115219Z cvt.u64.u32 %rd222, %r596; 2026-02-21T08:54:40.1115368Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:54:40.1115526Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:54:40.1115783Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1116102Z mov.b64 {%r681, %r682}, %rd224; 2026-02-21T08:54:40.1116269Z cvt.rn.f16x2.f32 %r683, 
%r682, %r681; 2026-02-21T08:54:40.1116541Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1116824Z cvt.u64.u32 %rd225, %r597; 2026-02-21T08:54:40.1116973Z cvt.u64.u32 %rd226, %r598; 2026-02-21T08:54:40.1117126Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:54:40.1117278Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:54:40.1117574Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1117862Z mov.b64 {%r684, %r685}, %rd228; 2026-02-21T08:54:40.1118021Z cvt.rn.f16x2.f32 %r686, %r685, %r684; 2026-02-21T08:54:40.1118298Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1118575Z cvt.u64.u32 %rd229, %r599; 2026-02-21T08:54:40.1118734Z cvt.u64.u32 %rd230, %r600; 2026-02-21T08:54:40.1118881Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:54:40.1119040Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:54:40.1119295Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1119586Z mov.b64 {%r687, %r688}, %rd232; 2026-02-21T08:54:40.1119750Z cvt.rn.f16x2.f32 %r689, %r688, %r687; 2026-02-21T08:54:40.1120069Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1120355Z cvt.u64.u32 %rd233, %r601; 2026-02-21T08:54:40.1120501Z cvt.u64.u32 %rd234, %r602; 2026-02-21T08:54:40.1120654Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:54:40.1120802Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:54:40.1121066Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1121350Z mov.b64 {%r690, %r691}, %rd236; 2026-02-21T08:54:40.1121509Z cvt.rn.f16x2.f32 %r692, %r691, %r690; 2026-02-21T08:54:40.1121783Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1122059Z cvt.u64.u32 %rd237, %r603; 2026-02-21T08:54:40.1122215Z cvt.u64.u32 %rd238, %r604; 2026-02-21T08:54:40.1122363Z shl.b64 %rd239, %rd238, 
32; 2026-02-21T08:54:40.1122524Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:54:40.1122781Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1123067Z mov.b64 {%r693, %r694}, %rd240; 2026-02-21T08:54:40.1123236Z cvt.rn.f16x2.f32 %r695, %r694, %r693; 2026-02-21T08:54:40.1123507Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1123820Z cvt.u64.u32 %rd241, %r606; 2026-02-21T08:54:40.1123967Z cvt.u64.u32 %rd242, %r607; 2026-02-21T08:54:40.1124117Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:54:40.1124266Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:54:40.1124530Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1124842Z mov.b64 {%r696, %r697}, %rd244; 2026-02-21T08:54:40.1125001Z cvt.rn.f16x2.f32 %r698, %r697, %r696; 2026-02-21T08:54:40.1125281Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1125556Z cvt.u64.u32 %rd245, %r608; 2026-02-21T08:54:40.1125717Z cvt.u64.u32 %rd246, %r609; 2026-02-21T08:54:40.1125870Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:54:40.1126035Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:54:40.1126308Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1126607Z mov.b64 {%r699, %r700}, %rd248; 2026-02-21T08:54:40.1126782Z cvt.rn.f16x2.f32 %r701, %r700, %r699; 2026-02-21T08:54:40.1127067Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1127399Z cvt.u64.u32 %rd249, %r610; 2026-02-21T08:54:40.1127556Z cvt.u64.u32 %rd250, %r611; 2026-02-21T08:54:40.1127716Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:54:40.1127872Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:54:40.1128151Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1128448Z mov.b64 {%r702, %r703}, %rd252; 2026-02-21T08:54:40.1128616Z cvt.rn.f16x2.f32 %r704, 
%r703, %r702; 2026-02-21T08:54:40.1128908Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1129238Z cvt.u64.u32 %rd253, %r612; 2026-02-21T08:54:40.1129404Z cvt.u64.u32 %rd254, %r613; 2026-02-21T08:54:40.1129560Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:54:40.1129738Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:54:40.1130012Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1130320Z mov.b64 {%r705, %r706}, %rd256; 2026-02-21T08:54:40.1130495Z cvt.rn.f16x2.f32 %r707, %r706, %r705; 2026-02-21T08:54:40.1130780Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1131079Z cvt.u64.u32 %rd257, %r614; 2026-02-21T08:54:40.1131236Z cvt.u64.u32 %rd258, %r615; 2026-02-21T08:54:40.1131396Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:54:40.1131552Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:54:40.1131870Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1132176Z mov.b64 {%r708, %r709}, %rd260; 2026-02-21T08:54:40.1132346Z cvt.rn.f16x2.f32 %r710, %r709, %r708; 2026-02-21T08:54:40.1132642Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1132936Z cvt.u64.u32 %rd261, %r616; 2026-02-21T08:54:40.1133100Z cvt.u64.u32 %rd262, %r617; 2026-02-21T08:54:40.1133254Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:54:40.1133429Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:54:40.1133695Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1133990Z mov.b64 {%r711, %r712}, %rd264; 2026-02-21T08:54:40.1134158Z cvt.rn.f16x2.f32 %r713, %r712, %r711; 2026-02-21T08:54:40.1134433Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1134752Z cvt.u64.u32 %rd265, %r618; 2026-02-21T08:54:40.1134904Z cvt.u64.u32 %rd266, %r619; 2026-02-21T08:54:40.1135058Z shl.b64 %rd267, %rd266, 
32; 2026-02-21T08:54:40.1135208Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:54:40.1135476Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1135791Z mov.b64 {%r714, %r715}, %rd268; 2026-02-21T08:54:40.1135950Z cvt.rn.f16x2.f32 %r716, %r715, %r714; 2026-02-21T08:54:40.1136223Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1136494Z cvt.u64.u32 %rd269, %r620; 2026-02-21T08:54:40.1136647Z cvt.u64.u32 %rd270, %r621; 2026-02-21T08:54:40.1136792Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:54:40.1136946Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:54:40.1137201Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1137484Z mov.b64 {%r717, %r718}, %rd272; 2026-02-21T08:54:40.1137648Z cvt.rn.f16x2.f32 %r719, %r718, %r717; 2026-02-21T08:54:40.1137920Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1138205Z cvt.u64.u32 %rd273, %r623; 2026-02-21T08:54:40.1138351Z cvt.u64.u32 %rd274, %r624; 2026-02-21T08:54:40.1138503Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:54:40.1138650Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:54:40.1138949Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1139237Z mov.b64 {%r720, %r721}, %rd276; 2026-02-21T08:54:40.1139398Z cvt.rn.f16x2.f32 %r722, %r721, %r720; 2026-02-21T08:54:40.1139675Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1139953Z cvt.u64.u32 %rd277, %r625; 2026-02-21T08:54:40.1140109Z cvt.u64.u32 %rd278, %r626; 2026-02-21T08:54:40.1140258Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:54:40.1140419Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:54:40.1140708Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1140993Z mov.b64 {%r723, %r724}, %rd280; 2026-02-21T08:54:40.1141159Z cvt.rn.f16x2.f32 %r725, 
%r724, %r723; 2026-02-21T08:54:40.1141430Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1141711Z cvt.u64.u32 %rd281, %r627; 2026-02-21T08:54:40.1141858Z cvt.u64.u32 %rd282, %r628; 2026-02-21T08:54:40.1142012Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:54:40.1142160Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:54:40.1142425Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1142708Z mov.b64 {%r726, %r727}, %rd284; 2026-02-21T08:54:40.1142867Z cvt.rn.f16x2.f32 %r728, %r727, %r726; 2026-02-21T08:54:40.1143172Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1143453Z cvt.u64.u32 %rd285, %r629; 2026-02-21T08:54:40.1143609Z cvt.u64.u32 %rd286, %r630; 2026-02-21T08:54:40.1143753Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:54:40.1143910Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:54:40.1144170Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1144463Z mov.b64 {%r729, %r730}, %rd288; 2026-02-21T08:54:40.1144629Z cvt.rn.f16x2.f32 %r731, %r730, %r729; 2026-02-21T08:54:40.1144937Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1145224Z cvt.u64.u32 %rd289, %r631; 2026-02-21T08:54:40.1145371Z cvt.u64.u32 %rd290, %r632; 2026-02-21T08:54:40.1145526Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:54:40.1145677Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:54:40.1145949Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1146238Z mov.b64 {%r732, %r733}, %rd292; 2026-02-21T08:54:40.1146398Z cvt.rn.f16x2.f32 %r734, %r733, %r732; 2026-02-21T08:54:40.1146675Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1146987Z cvt.u64.u32 %rd293, %r633; 2026-02-21T08:54:40.1147144Z cvt.u64.u32 %rd294, %r634; 2026-02-21T08:54:40.1147289Z shl.b64 %rd295, %rd294, 
32; 2026-02-21T08:54:40.1147445Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:54:40.1147699Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1147980Z mov.b64 {%r735, %r736}, %rd296; 2026-02-21T08:54:40.1148145Z cvt.rn.f16x2.f32 %r737, %r736, %r735; 2026-02-21T08:54:40.1148408Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1148690Z cvt.u64.u32 %rd297, %r635; 2026-02-21T08:54:40.1148838Z cvt.u64.u32 %rd298, %r636; 2026-02-21T08:54:40.1148993Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:54:40.1149141Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:54:40.1149406Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1149692Z mov.b64 {%r738, %r739}, %rd300; 2026-02-21T08:54:40.1149849Z cvt.rn.f16x2.f32 %r740, %r739, %r738; 2026-02-21T08:54:40.1150121Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1150424Z cvt.u64.u32 %rd301, %r637; 2026-02-21T08:54:40.1150583Z cvt.u64.u32 %rd302, %r638; 2026-02-21T08:54:40.1150730Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:54:40.1150887Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:54:40.1151151Z .loc 1 58 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:58:27 2026-02-21T08:54:40.1151441Z mov.b64 {%r741, %r742}, %rd304; 2026-02-21T08:54:40.1151614Z cvt.rn.f16x2.f32 %r743, %r742, %r741; 2026-02-21T08:54:40.1151908Z .loc 1 59 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:59:45 2026-02-21T08:54:40.1152228Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:40.1152399Z bar.sync 0, 128; 2026-02-21T08:54:40.1152583Z st.shared.v4.b32 [%r106], {%r650, %r653, %r656, %r659}; 2026-02-21T08:54:40.1152817Z st.shared.v4.b32 [%r107], {%r662, %r665, %r668, %r671}; 2026-02-21T08:54:40.1153049Z st.shared.v4.b32 [%r108], {%r674, %r677, %r680, %r683}; 2026-02-21T08:54:40.1153281Z st.shared.v4.b32 [%r109], {%r686, %r689, %r692, %r695}; 
2026-02-21T08:54:40.1153503Z st.shared.v4.b32 [%r110], {%r698, %r701, %r704, %r707}; 2026-02-21T08:54:40.1153729Z st.shared.v4.b32 [%r111], {%r710, %r713, %r716, %r719}; 2026-02-21T08:54:40.1153950Z st.shared.v4.b32 [%r112], {%r722, %r725, %r728, %r731}; 2026-02-21T08:54:40.1154174Z st.shared.v4.b32 [%r113], {%r734, %r737, %r740, %r743}; 2026-02-21T08:54:40.1154363Z // begin inline asm 2026-02-21T08:54:40.1154557Z fence.proxy.async.shared::cta; 2026-02-21T08:54:40.1154774Z // end inline asm 2026-02-21T08:54:40.1154968Z bar.sync 0, 128; 2026-02-21T08:54:40.1155171Z elect.sync %r744|%p79, -1; 2026-02-21T08:54:40.1155331Z and.pred %p76, %p42, %p79; 2026-02-21T08:54:40.1155484Z // begin inline asm 2026-02-21T08:54:40.1155749Z @%p76 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd176, {%r812, %r813}], [%r560]; 2026-02-21T08:54:40.1156044Z // end inline asm 2026-02-21T08:54:40.1156183Z cp.async.bulk.commit_group; 2026-02-21T08:54:40.1156461Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1156755Z add.s32 %r643, %r646, 77840; 2026-02-21T08:54:40.1157014Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1157291Z bar.sync 0, 128; 2026-02-21T08:54:40.1157448Z // begin inline asm 2026-02-21T08:54:40.1157629Z @%p82 mbarrier.arrive.shared::cta.b64 _, [%r643]; 2026-02-21T08:54:40.1157819Z // end inline asm 2026-02-21T08:54:40.1157956Z add.s32 %r745, %r816, 1; 2026-02-21T08:54:40.1158104Z setp.eq.b32 %p80, %r745, 2; 2026-02-21T08:54:40.1158267Z selp.b32 %r816, 0, %r745, %p80; 2026-02-21T08:54:40.1158430Z selp.b32 %r815, 1, 0, %p80; 2026-02-21T08:54:40.1158637Z $L__BB0_23: // %.thread15 2026-02-21T08:54:40.1158859Z // in Loop: Header=BB0_19 Depth=1 2026-02-21T08:54:40.1159173Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1159466Z xor.b32 %r818, %r818, %r815; 2026-02-21T08:54:40.1159731Z .loc 1 30 107 // 
chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1160034Z add.s32 %r805, %r805, -1; 2026-02-21T08:54:40.1160194Z setp.ne.b32 %p81, %r805, 0; 2026-02-21T08:54:40.1160348Z @%p81 bra $L__BB0_19; 2026-02-21T08:54:40.1160500Z bra.uni $L__BB0_24; 2026-02-21T08:54:40.1160693Z $L__BB0_19: // =>This Inner Loop Header: Depth=1 2026-02-21T08:54:40.1160916Z add.s32 %r569, %r811, 1; 2026-02-21T08:54:40.1161066Z setp.eq.b32 %p73, %r811, 63; 2026-02-21T08:54:40.1161228Z selp.b32 %r811, 0, %r569, %p73; 2026-02-21T08:54:40.1161389Z setp.eq.b32 %p74, %r811, 63; 2026-02-21T08:54:40.1161548Z @%p74 bra $L__BB0_22; 2026-02-21T08:54:40.1161725Z // %bb.20: // in Loop: Header=BB0_19 Depth=1 2026-02-21T08:54:40.1162074Z .loc 1 0 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0:107 2026-02-21T08:54:40.1162358Z mov.b32 %r815, 0; 2026-02-21T08:54:40.1162600Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1162887Z setp.ne.b32 %p75, %r811, 0; 2026-02-21T08:54:40.1163037Z @%p75 bra $L__BB0_23; 2026-02-21T08:54:40.1163197Z // %bb.21: // %.thread 2026-02-21T08:54:40.1163406Z // in Loop: Header=BB0_19 Depth=1 2026-02-21T08:54:40.1163669Z add.s32 %r814, %r814, 1; 2026-02-21T08:54:40.1163929Z .loc 1 36 35 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:36:35 2026-02-21T08:54:40.1164211Z shr.s32 %r747, %r814, 31; 2026-02-21T08:54:40.1164364Z shr.u32 %r748, %r747, 24; 2026-02-21T08:54:40.1164509Z add.s32 %r749, %r814, %r748; 2026-02-21T08:54:40.1164666Z shr.s32 %r750, %r749, 8; 2026-02-21T08:54:40.1164954Z .loc 1 37 33 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:37:33 2026-02-21T08:54:40.1165244Z shl.b32 %r751, %r750, 4; 2026-02-21T08:54:40.1165503Z .loc 1 38 39 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:39 2026-02-21T08:54:40.1165785Z sub.s32 %r752, 192, %r751; 2026-02-21T08:54:40.1166079Z .loc 1 38 52 // 
chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:52 2026-02-21T08:54:40.1166354Z min.s32 %r753, %r752, 16; 2026-02-21T08:54:40.1166614Z .loc 1 39 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:45 2026-02-21T08:54:40.1166892Z and.b32 %r754, %r749, -256; 2026-02-21T08:54:40.1167047Z sub.s32 %r755, %r814, %r754; 2026-02-21T08:54:40.1167310Z .loc 1 40 51 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:40:51 2026-02-21T08:54:40.1167587Z div.s32 %r756, %r755, %r753; 2026-02-21T08:54:40.1167848Z .loc 1 39 64 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:64 2026-02-21T08:54:40.1168128Z mul.lo.s32 %r757, %r756, %r753; 2026-02-21T08:54:40.1168289Z sub.s32 %r758, %r755, %r757; 2026-02-21T08:54:40.1168548Z .loc 1 39 30 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:30 2026-02-21T08:54:40.1168838Z add.s32 %r759, %r758, %r751; 2026-02-21T08:54:40.1169100Z .loc 1 41 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:41:27 2026-02-21T08:54:40.1169376Z shl.b32 %r812, %r759, 6; 2026-02-21T08:54:40.1169635Z .loc 1 43 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:43:27 2026-02-21T08:54:40.1169909Z shl.b32 %r813, %r756, 7; 2026-02-21T08:54:40.1170200Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1170483Z bra.uni $L__BB0_23; 2026-02-21T08:54:40.1170667Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:54:40.1170980Z .loc 1 0 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0:107 2026-02-21T08:54:40.1171288Z ld.param.b64 %rd15, [_helion_matmul_param_1]; 2026-02-21T08:54:40.1171508Z ld.param.b64 %rd14, [_helion_matmul_param_0]; 2026-02-21T08:54:40.1171695Z mov.b32 %r137, global_smem; 2026-02-21T08:54:40.1171855Z add.s32 %r138, %r137, %r3; 2026-02-21T08:54:40.1172006Z mov.u32 %r260, %ctaid.x; 2026-02-21T08:54:40.1172161Z shl.b32 %r5, %r260, 1; 2026-02-21T08:54:40.1172303Z bra.uni $L__BB0_2; 2026-02-21T08:54:40.1172487Z 
$L__BB0_16: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1172828Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1173124Z barrier.sync 1; 2026-02-21T08:54:40.1173268Z barrier.sync 1; 2026-02-21T08:54:40.1173426Z $L__BB0_2: // %.preheader 2026-02-21T08:54:40.1173677Z // =>This Loop Header: Depth=1 2026-02-21T08:54:40.1173907Z // Child Loop BB0_9 Depth 2 2026-02-21T08:54:40.1174227Z .loc 1 19 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:19 2026-02-21T08:54:40.1174544Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:54:40.1174760Z barrier.sync 1; 2026-02-21T08:54:40.1174916Z ld.shared.b8 %r136, [%r138+77852]; 2026-02-21T08:54:40.1175124Z setp.gt.u32 %p4, %r136, 3; 2026-02-21T08:54:40.1175290Z @%p4 bra $L__BB0_4; 2026-02-21T08:54:40.1175457Z // %bb.3: // %.preheader 2026-02-21T08:54:40.1175685Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1175895Z $L_brx_0: .branchtargets 2026-02-21T08:54:40.1176055Z $L__BB0_5, 2026-02-21T08:54:40.1176190Z $L__BB0_15, 2026-02-21T08:54:40.1176316Z $L__BB0_16, 2026-02-21T08:54:40.1176443Z $L__BB0_25; 2026-02-21T08:54:40.1176573Z brx.idx %r136, $L_brx_0; 2026-02-21T08:54:40.1176774Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1177104Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1177428Z ld.shared.b32 %r310, [global_smem]; 2026-02-21T08:54:40.1177601Z barrier.sync 1; 2026-02-21T08:54:40.1177890Z .loc 1 29 37 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:29:37 2026-02-21T08:54:40.1178196Z add.s32 %r261, %r5, 2; 2026-02-21T08:54:40.1178457Z .loc 1 29 49 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:29:49 2026-02-21T08:54:40.1178750Z min.s32 %r262, %r261, 3072; 2026-02-21T08:54:40.1179020Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1179320Z sub.s32 %r263, %r262, %r5; 2026-02-21T08:54:40.1179480Z 
shl.b32 %r6, %r263, 6; 2026-02-21T08:54:40.1179747Z .loc 1 42 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:42:45 2026-02-21T08:54:40.1180039Z add.s32 %r264, %r1, -128; 2026-02-21T08:54:40.1180196Z shr.u32 %r265, %r264, 5; 2026-02-21T08:54:40.1180357Z bfe.u32 %r7, %r1, 2, 4; 2026-02-21T08:54:40.1180505Z or.b32 %r8, %r7, 16; 2026-02-21T08:54:40.1180656Z or.b32 %r9, %r7, 32; 2026-02-21T08:54:40.1180801Z or.b32 %r10, %r7, 48; 2026-02-21T08:54:40.1181055Z .loc 1 44 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:44:45 2026-02-21T08:54:40.1181326Z or.b32 %r11, %r7, 64; 2026-02-21T08:54:40.1181471Z or.b32 %r12, %r7, 80; 2026-02-21T08:54:40.1181610Z or.b32 %r13, %r7, 96; 2026-02-21T08:54:40.1181774Z or.b32 %r14, %r7, 112; 2026-02-21T08:54:40.1182026Z .loc 1 50 48 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:50:48 2026-02-21T08:54:40.1182302Z shl.b32 %r266, %r1, 3; 2026-02-21T08:54:40.1182452Z and.b32 %r15, %r266, 24; 2026-02-21T08:54:40.1182710Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1183003Z setp.lt.s32 %p5, %r6, 1; 2026-02-21T08:54:40.1183151Z setp.gt.s32 %p6, %r6, 0; 2026-02-21T08:54:40.1183403Z .loc 1 36 35 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:36:35 2026-02-21T08:54:40.1183688Z bfe.s32 %r267, %r260, 30, 1; 2026-02-21T08:54:40.1183840Z shr.u32 %r268, %r267, 24; 2026-02-21T08:54:40.1183995Z add.s32 %r269, %r5, %r268; 2026-02-21T08:54:40.1184143Z shr.s32 %r270, %r269, 8; 2026-02-21T08:54:40.1184393Z .loc 1 37 33 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:37:33 2026-02-21T08:54:40.1184702Z shl.b32 %r271, %r270, 4; 2026-02-21T08:54:40.1184958Z .loc 1 38 39 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:39 2026-02-21T08:54:40.1185236Z sub.s32 %r272, 192, %r271; 2026-02-21T08:54:40.1185515Z .loc 1 38 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:52 2026-02-21T08:54:40.1185808Z min.s32 %r273, %r272, 
16; 2026-02-21T08:54:40.1186063Z .loc 1 39 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:45 2026-02-21T08:54:40.1186359Z and.b32 %r274, %r269, -256; 2026-02-21T08:54:40.1186513Z sub.s32 %r275, %r5, %r274; 2026-02-21T08:54:40.1186772Z .loc 1 40 51 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:40:51 2026-02-21T08:54:40.1187083Z div.s32 %r276, %r275, %r273; 2026-02-21T08:54:40.1187341Z .loc 1 39 64 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:64 2026-02-21T08:54:40.1187627Z mul.lo.s32 %r277, %r276, %r273; 2026-02-21T08:54:40.1187785Z sub.s32 %r278, %r275, %r277; 2026-02-21T08:54:40.1188049Z .loc 1 39 30 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:30 2026-02-21T08:54:40.1188331Z add.s32 %r279, %r278, %r271; 2026-02-21T08:54:40.1188594Z .loc 1 41 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:41:27 2026-02-21T08:54:40.1188883Z shl.b32 %r280, %r279, 6; 2026-02-21T08:54:40.1189133Z .loc 1 42 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:42:32 2026-02-21T08:54:40.1189412Z or.b32 %r801, %r280, %r7; 2026-02-21T08:54:40.1189555Z or.b32 %r802, %r280, %r8; 2026-02-21T08:54:40.1189728Z or.b32 %r803, %r280, %r9; 2026-02-21T08:54:40.1189874Z or.b32 %r804, %r280, %r10; 2026-02-21T08:54:40.1190133Z .loc 1 43 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:43:27 2026-02-21T08:54:40.1190413Z shl.b32 %r281, %r276, 7; 2026-02-21T08:54:40.1190674Z .loc 1 44 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:44:32 2026-02-21T08:54:40.1190947Z or.b32 %r793, %r281, %r7; 2026-02-21T08:54:40.1191088Z or.b32 %r794, %r281, %r8; 2026-02-21T08:54:40.1191239Z or.b32 %r795, %r281, %r9; 2026-02-21T08:54:40.1191379Z or.b32 %r796, %r281, %r10; 2026-02-21T08:54:40.1191531Z or.b32 %r797, %r281, %r11; 2026-02-21T08:54:40.1191673Z or.b32 %r798, %r281, %r12; 2026-02-21T08:54:40.1191823Z or.b32 %r799, %r281, %r13; 2026-02-21T08:54:40.1191966Z or.b32 %r800, %r281, %r14; 
2026-02-21T08:54:40.1192223Z .loc 1 54 53 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:53 2026-02-21T08:54:40.1192509Z shl.b32 %r282, %r793, 11; 2026-02-21T08:54:40.1192654Z shl.b32 %r283, %r794, 11; 2026-02-21T08:54:40.1192808Z shl.b32 %r284, %r795, 11; 2026-02-21T08:54:40.1192950Z shl.b32 %r285, %r796, 11; 2026-02-21T08:54:40.1193100Z shl.b32 %r286, %r797, 11; 2026-02-21T08:54:40.1193268Z shl.b32 %r287, %r798, 11; 2026-02-21T08:54:40.1193415Z shl.b32 %r288, %r799, 11; 2026-02-21T08:54:40.1193556Z shl.b32 %r289, %r800, 11; 2026-02-21T08:54:40.1193811Z .loc 1 54 60 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:60 2026-02-21T08:54:40.1194097Z or.b32 %r290, %r282, %r15; 2026-02-21T08:54:40.1194245Z or.b32 %r291, %r283, %r15; 2026-02-21T08:54:40.1194398Z or.b32 %r292, %r284, %r15; 2026-02-21T08:54:40.1194540Z or.b32 %r293, %r285, %r15; 2026-02-21T08:54:40.1194718Z or.b32 %r294, %r286, %r15; 2026-02-21T08:54:40.1194861Z or.b32 %r295, %r287, %r15; 2026-02-21T08:54:40.1195008Z or.b32 %r296, %r288, %r15; 2026-02-21T08:54:40.1195153Z or.b32 %r297, %r289, %r15; 2026-02-21T08:54:40.1195410Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1195703Z mad.wide.s32 %rd18, %r290, 2, %rd14; 2026-02-21T08:54:40.1195874Z mad.wide.s32 %rd19, %r291, 2, %rd14; 2026-02-21T08:54:40.1196049Z mad.wide.s32 %rd20, %r292, 2, %rd14; 2026-02-21T08:54:40.1196212Z mad.wide.s32 %rd21, %r293, 2, %rd14; 2026-02-21T08:54:40.1196381Z mad.wide.s32 %rd22, %r294, 2, %rd14; 2026-02-21T08:54:40.1196568Z mad.wide.s32 %rd23, %r295, 2, %rd14; 2026-02-21T08:54:40.1196740Z mad.wide.s32 %rd24, %r296, 2, %rd14; 2026-02-21T08:54:40.1196901Z mad.wide.s32 %rd25, %r297, 2, %rd14; 2026-02-21T08:54:40.1197175Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1197461Z shl.b32 %r298, %r1, 4; 2026-02-21T08:54:40.1197604Z and.b32 %r299, %r298, 1008; 2026-02-21T08:54:40.1197763Z shl.b32 %r300, 
%r1, 1; 2026-02-21T08:54:40.1197905Z and.b32 %r301, %r300, 48; 2026-02-21T08:54:40.1198083Z xor.b32 %r28, %r299, %r301; 2026-02-21T08:54:40.1198227Z add.s32 %r326, %r137, %r28; 2026-02-21T08:54:40.1198384Z selp.b32 %r140, 16, 0, %p6; 2026-02-21T08:54:40.1198531Z // begin inline asm 2026-02-21T08:54:40.1198740Z cp.async.cg.shared.global [ %r326 + 0 ], [ %rd18 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1198969Z // end inline asm 2026-02-21T08:54:40.1199102Z add.s32 %r328, %r326, 1024; 2026-02-21T08:54:40.1199251Z // begin inline asm 2026-02-21T08:54:40.1199442Z cp.async.cg.shared.global [ %r328 + 0 ], [ %rd19 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1199666Z // end inline asm 2026-02-21T08:54:40.1199794Z add.s32 %r330, %r326, 2048; 2026-02-21T08:54:40.1199944Z // begin inline asm 2026-02-21T08:54:40.1200131Z cp.async.cg.shared.global [ %r330 + 0 ], [ %rd20 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1200356Z // end inline asm 2026-02-21T08:54:40.1200485Z add.s32 %r332, %r326, 3072; 2026-02-21T08:54:40.1200663Z // begin inline asm 2026-02-21T08:54:40.1200858Z cp.async.cg.shared.global [ %r332 + 0 ], [ %rd21 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1201071Z // end inline asm 2026-02-21T08:54:40.1201206Z add.s32 %r334, %r326, 4096; 2026-02-21T08:54:40.1201350Z // begin inline asm 2026-02-21T08:54:40.1201543Z cp.async.cg.shared.global [ %r334 + 0 ], [ %rd22 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1201753Z // end inline asm 2026-02-21T08:54:40.1201889Z add.s32 %r336, %r326, 5120; 2026-02-21T08:54:40.1202031Z // begin inline asm 2026-02-21T08:54:40.1202225Z cp.async.cg.shared.global [ %r336 + 0 ], [ %rd23 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1202446Z // end inline asm 2026-02-21T08:54:40.1202575Z add.s32 %r338, %r326, 6144; 2026-02-21T08:54:40.1202725Z // begin inline asm 2026-02-21T08:54:40.1202907Z cp.async.cg.shared.global [ %r338 + 0 ], [ %rd24 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1203127Z // end inline asm 2026-02-21T08:54:40.1203254Z add.s32 %r340, %r326, 7168; 
2026-02-21T08:54:40.1203407Z // begin inline asm 2026-02-21T08:54:40.1203588Z cp.async.cg.shared.global [ %r340 + 0 ], [ %rd25 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1203807Z // end inline asm 2026-02-21T08:54:40.1203945Z cp.async.commit_group; 2026-02-21T08:54:40.1204201Z .loc 1 55 80 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:80 2026-02-21T08:54:40.1204521Z shl.b32 %r302, %r801, 11; 2026-02-21T08:54:40.1204698Z shl.b32 %r303, %r802, 11; 2026-02-21T08:54:40.1204851Z shl.b32 %r304, %r803, 11; 2026-02-21T08:54:40.1204994Z shl.b32 %r305, %r804, 11; 2026-02-21T08:54:40.1205248Z .loc 1 55 59 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:59 2026-02-21T08:54:40.1205526Z or.b32 %r306, %r302, %r15; 2026-02-21T08:54:40.1205679Z or.b32 %r307, %r303, %r15; 2026-02-21T08:54:40.1205831Z or.b32 %r308, %r304, %r15; 2026-02-21T08:54:40.1205974Z or.b32 %r309, %r305, %r15; 2026-02-21T08:54:40.1206231Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1206514Z mad.wide.s32 %rd26, %r306, 2, %rd15; 2026-02-21T08:54:40.1206691Z mad.wide.s32 %rd27, %r307, 2, %rd15; 2026-02-21T08:54:40.1206857Z mad.wide.s32 %rd28, %r308, 2, %rd15; 2026-02-21T08:54:40.1207031Z mad.wide.s32 %rd29, %r309, 2, %rd15; 2026-02-21T08:54:40.1207298Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1207582Z add.s32 %r155, %r326, 40960; 2026-02-21T08:54:40.1207786Z // begin inline asm 2026-02-21T08:54:40.1207979Z cp.async.cg.shared.global [ %r155 + 0 ], [ %rd26 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1208198Z // end inline asm 2026-02-21T08:54:40.1208330Z add.s32 %r157, %r326, 41984; 2026-02-21T08:54:40.1208483Z // begin inline asm 2026-02-21T08:54:40.1208665Z cp.async.cg.shared.global [ %r157 + 0 ], [ %rd27 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1208883Z // end inline asm 2026-02-21T08:54:40.1209012Z add.s32 %r159, %r326, 43008; 2026-02-21T08:54:40.1209170Z // begin inline asm 
2026-02-21T08:54:40.1209390Z cp.async.cg.shared.global [ %r159 + 0 ], [ %rd28 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1209604Z // end inline asm 2026-02-21T08:54:40.1209748Z add.s32 %r161, %r326, 44032; 2026-02-21T08:54:40.1209898Z // begin inline asm 2026-02-21T08:54:40.1210093Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd29 + 0 ], 0x10, %r140; 2026-02-21T08:54:40.1210314Z // end inline asm 2026-02-21T08:54:40.1210460Z cp.async.commit_group; 2026-02-21T08:54:40.1210725Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1211022Z setp.gt.s32 %p7, %r6, 1; 2026-02-21T08:54:40.1211282Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1211567Z cvt.s64.s32 %rd78, %r282; 2026-02-21T08:54:40.1211727Z cvt.u64.u32 %rd79, %r15; 2026-02-21T08:54:40.1211879Z or.b64 %rd80, %rd78, %rd79; 2026-02-21T08:54:40.1212066Z shl.b64 %rd81, %rd80, 1; 2026-02-21T08:54:40.1212216Z add.s64 %rd1, %rd14, %rd81; 2026-02-21T08:54:40.1212369Z add.s64 %rd30, %rd1, 64; 2026-02-21T08:54:40.1212512Z cvt.s64.s32 %rd82, %r283; 2026-02-21T08:54:40.1212665Z or.b64 %rd83, %rd82, %rd79; 2026-02-21T08:54:40.1212816Z shl.b64 %rd84, %rd83, 1; 2026-02-21T08:54:40.1212961Z add.s64 %rd2, %rd14, %rd84; 2026-02-21T08:54:40.1213111Z add.s64 %rd31, %rd2, 64; 2026-02-21T08:54:40.1213252Z cvt.s64.s32 %rd85, %r284; 2026-02-21T08:54:40.1213404Z or.b64 %rd86, %rd85, %rd79; 2026-02-21T08:54:40.1213549Z shl.b64 %rd87, %rd86, 1; 2026-02-21T08:54:40.1213706Z add.s64 %rd3, %rd14, %rd87; 2026-02-21T08:54:40.1213851Z add.s64 %rd32, %rd3, 64; 2026-02-21T08:54:40.1214000Z cvt.s64.s32 %rd88, %r285; 2026-02-21T08:54:40.1214141Z or.b64 %rd89, %rd88, %rd79; 2026-02-21T08:54:40.1214291Z shl.b64 %rd90, %rd89, 1; 2026-02-21T08:54:40.1214440Z add.s64 %rd4, %rd14, %rd90; 2026-02-21T08:54:40.1214586Z add.s64 %rd33, %rd4, 64; 2026-02-21T08:54:40.1214762Z cvt.s64.s32 %rd91, %r286; 2026-02-21T08:54:40.1214909Z or.b64 %rd92, %rd91, %rd79; 
2026-02-21T08:54:40.1215062Z shl.b64 %rd93, %rd92, 1; 2026-02-21T08:54:40.1215204Z add.s64 %rd5, %rd14, %rd93; 2026-02-21T08:54:40.1215357Z add.s64 %rd34, %rd5, 64; 2026-02-21T08:54:40.1215524Z cvt.s64.s32 %rd94, %r287; 2026-02-21T08:54:40.1215676Z or.b64 %rd95, %rd94, %rd79; 2026-02-21T08:54:40.1215820Z shl.b64 %rd96, %rd95, 1; 2026-02-21T08:54:40.1215969Z add.s64 %rd6, %rd14, %rd96; 2026-02-21T08:54:40.1216122Z add.s64 %rd35, %rd6, 64; 2026-02-21T08:54:40.1216263Z cvt.s64.s32 %rd97, %r288; 2026-02-21T08:54:40.1216413Z or.b64 %rd98, %rd97, %rd79; 2026-02-21T08:54:40.1216555Z shl.b64 %rd99, %rd98, 1; 2026-02-21T08:54:40.1216700Z add.s64 %rd7, %rd14, %rd99; 2026-02-21T08:54:40.1216843Z add.s64 %rd36, %rd7, 64; 2026-02-21T08:54:40.1216990Z cvt.s64.s32 %rd100, %r289; 2026-02-21T08:54:40.1217145Z or.b64 %rd101, %rd100, %rd79; 2026-02-21T08:54:40.1217311Z shl.b64 %rd102, %rd101, 1; 2026-02-21T08:54:40.1217475Z add.s64 %rd8, %rd14, %rd102; 2026-02-21T08:54:40.1217634Z add.s64 %rd37, %rd8, 64; 2026-02-21T08:54:40.1217902Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1218189Z bar.sync 2, 64; 2026-02-21T08:54:40.1218337Z add.s32 %r163, %r326, 8192; 2026-02-21T08:54:40.1218493Z selp.b32 %r164, 16, 0, %p7; 2026-02-21T08:54:40.1218652Z // begin inline asm 2026-02-21T08:54:40.1218884Z cp.async.cg.shared.global [ %r163 + 0 ], [ %rd30 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1219119Z // end inline asm 2026-02-21T08:54:40.1219264Z add.s32 %r165, %r326, 9216; 2026-02-21T08:54:40.1219416Z // begin inline asm 2026-02-21T08:54:40.1219621Z cp.async.cg.shared.global [ %r165 + 0 ], [ %rd31 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1219846Z // end inline asm 2026-02-21T08:54:40.1219992Z add.s32 %r167, %r326, 10240; 2026-02-21T08:54:40.1220147Z // begin inline asm 2026-02-21T08:54:40.1220351Z cp.async.cg.shared.global [ %r167 + 0 ], [ %rd32 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1220612Z // end inline asm 2026-02-21T08:54:40.1220756Z 
add.s32 %r169, %r326, 11264; 2026-02-21T08:54:40.1220909Z // begin inline asm 2026-02-21T08:54:40.1221108Z cp.async.cg.shared.global [ %r169 + 0 ], [ %rd33 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1221336Z // end inline asm 2026-02-21T08:54:40.1221470Z add.s32 %r171, %r326, 12288; 2026-02-21T08:54:40.1221629Z // begin inline asm 2026-02-21T08:54:40.1221822Z cp.async.cg.shared.global [ %r171 + 0 ], [ %rd34 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1222051Z // end inline asm 2026-02-21T08:54:40.1222187Z add.s32 %r173, %r326, 13312; 2026-02-21T08:54:40.1222349Z // begin inline asm 2026-02-21T08:54:40.1222540Z cp.async.cg.shared.global [ %r173 + 0 ], [ %rd35 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1222769Z // end inline asm 2026-02-21T08:54:40.1222911Z add.s32 %r175, %r326, 14336; 2026-02-21T08:54:40.1223065Z // begin inline asm 2026-02-21T08:54:40.1223305Z cp.async.cg.shared.global [ %r175 + 0 ], [ %rd36 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1223532Z // end inline asm 2026-02-21T08:54:40.1223674Z add.s32 %r177, %r326, 15360; 2026-02-21T08:54:40.1223827Z // begin inline asm 2026-02-21T08:54:40.1224026Z cp.async.cg.shared.global [ %r177 + 0 ], [ %rd37 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1224247Z // end inline asm 2026-02-21T08:54:40.1224394Z cp.async.commit_group; 2026-02-21T08:54:40.1224695Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1225002Z cvt.s64.s32 %rd103, %r302; 2026-02-21T08:54:40.1225178Z or.b64 %rd104, %rd103, %rd79; 2026-02-21T08:54:40.1225332Z shl.b64 %rd105, %rd104, 1; 2026-02-21T08:54:40.1225485Z add.s64 %rd9, %rd15, %rd105; 2026-02-21T08:54:40.1225633Z add.s64 %rd38, %rd9, 64; 2026-02-21T08:54:40.1225785Z cvt.s64.s32 %rd106, %r303; 2026-02-21T08:54:40.1225932Z or.b64 %rd107, %rd106, %rd79; 2026-02-21T08:54:40.1226087Z shl.b64 %rd108, %rd107, 1; 2026-02-21T08:54:40.1226234Z add.s64 %rd10, %rd15, %rd108; 2026-02-21T08:54:40.1226392Z add.s64 %rd39, %rd10, 64; 2026-02-21T08:54:40.1226543Z cvt.s64.s32 
%rd109, %r304; 2026-02-21T08:54:40.1226691Z or.b64 %rd110, %rd109, %rd79; 2026-02-21T08:54:40.1226844Z shl.b64 %rd111, %rd110, 1; 2026-02-21T08:54:40.1227019Z add.s64 %rd11, %rd15, %rd111; 2026-02-21T08:54:40.1227175Z add.s64 %rd40, %rd11, 64; 2026-02-21T08:54:40.1227320Z cvt.s64.s32 %rd112, %r305; 2026-02-21T08:54:40.1227474Z or.b64 %rd113, %rd112, %rd79; 2026-02-21T08:54:40.1227622Z shl.b64 %rd114, %rd113, 1; 2026-02-21T08:54:40.1227773Z add.s64 %rd12, %rd15, %rd114; 2026-02-21T08:54:40.1227927Z add.s64 %rd41, %rd12, 64; 2026-02-21T08:54:40.1228185Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1228476Z add.s32 %r179, %r326, 45056; 2026-02-21T08:54:40.1228625Z // begin inline asm 2026-02-21T08:54:40.1228825Z cp.async.cg.shared.global [ %r179 + 0 ], [ %rd38 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1229037Z // end inline asm 2026-02-21T08:54:40.1229182Z add.s32 %r181, %r326, 46080; 2026-02-21T08:54:40.1229328Z // begin inline asm 2026-02-21T08:54:40.1229521Z cp.async.cg.shared.global [ %r181 + 0 ], [ %rd39 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1229741Z // end inline asm 2026-02-21T08:54:40.1229869Z add.s32 %r183, %r326, 47104; 2026-02-21T08:54:40.1230020Z // begin inline asm 2026-02-21T08:54:40.1230205Z cp.async.cg.shared.global [ %r183 + 0 ], [ %rd40 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1230454Z // end inline asm 2026-02-21T08:54:40.1230585Z add.s32 %r185, %r326, 48128; 2026-02-21T08:54:40.1230737Z // begin inline asm 2026-02-21T08:54:40.1230919Z cp.async.cg.shared.global [ %r185 + 0 ], [ %rd41 + 0 ], 0x10, %r164; 2026-02-21T08:54:40.1231140Z // end inline asm 2026-02-21T08:54:40.1231273Z cp.async.commit_group; 2026-02-21T08:54:40.1231542Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1231834Z setp.gt.s32 %p8, %r6, 2; 2026-02-21T08:54:40.1232120Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1232413Z 
add.s64 %rd42, %rd1, 128; 2026-02-21T08:54:40.1232570Z add.s64 %rd43, %rd2, 128; 2026-02-21T08:54:40.1232735Z add.s64 %rd44, %rd3, 128; 2026-02-21T08:54:40.1232882Z add.s64 %rd45, %rd4, 128; 2026-02-21T08:54:40.1233039Z add.s64 %rd46, %rd5, 128; 2026-02-21T08:54:40.1233195Z add.s64 %rd47, %rd6, 128; 2026-02-21T08:54:40.1233345Z add.s64 %rd48, %rd7, 128; 2026-02-21T08:54:40.1233497Z add.s64 %rd49, %rd8, 128; 2026-02-21T08:54:40.1233753Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1234045Z bar.sync 2, 64; 2026-02-21T08:54:40.1234181Z add.s32 %r187, %r326, 16384; 2026-02-21T08:54:40.1234344Z selp.b32 %r188, 16, 0, %p8; 2026-02-21T08:54:40.1234498Z // begin inline asm 2026-02-21T08:54:40.1234758Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd42 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1234983Z // end inline asm 2026-02-21T08:54:40.1235114Z add.s32 %r189, %r326, 17408; 2026-02-21T08:54:40.1235266Z // begin inline asm 2026-02-21T08:54:40.1235453Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd43 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1235673Z // end inline asm 2026-02-21T08:54:40.1235801Z add.s32 %r191, %r326, 18432; 2026-02-21T08:54:40.1235953Z // begin inline asm 2026-02-21T08:54:40.1236139Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd44 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1236363Z // end inline asm 2026-02-21T08:54:40.1236493Z add.s32 %r193, %r326, 19456; 2026-02-21T08:54:40.1236643Z // begin inline asm 2026-02-21T08:54:40.1236832Z cp.async.cg.shared.global [ %r193 + 0 ], [ %rd45 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1237043Z // end inline asm 2026-02-21T08:54:40.1237182Z add.s32 %r195, %r326, 20480; 2026-02-21T08:54:40.1237326Z // begin inline asm 2026-02-21T08:54:40.1237518Z cp.async.cg.shared.global [ %r195 + 0 ], [ %rd46 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1237733Z // end inline asm 2026-02-21T08:54:40.1237867Z add.s32 %r197, %r326, 21504; 2026-02-21T08:54:40.1238012Z // begin inline asm 
2026-02-21T08:54:40.1238203Z cp.async.cg.shared.global [ %r197 + 0 ], [ %rd47 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1238453Z // end inline asm 2026-02-21T08:54:40.1238584Z add.s32 %r199, %r326, 22528; 2026-02-21T08:54:40.1238736Z // begin inline asm 2026-02-21T08:54:40.1238921Z cp.async.cg.shared.global [ %r199 + 0 ], [ %rd48 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1239142Z // end inline asm 2026-02-21T08:54:40.1239271Z add.s32 %r201, %r326, 23552; 2026-02-21T08:54:40.1239423Z // begin inline asm 2026-02-21T08:54:40.1239608Z cp.async.cg.shared.global [ %r201 + 0 ], [ %rd49 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1239828Z // end inline asm 2026-02-21T08:54:40.1239967Z cp.async.commit_group; 2026-02-21T08:54:40.1240226Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1240518Z add.s64 %rd50, %rd9, 128; 2026-02-21T08:54:40.1240666Z add.s64 %rd51, %rd10, 128; 2026-02-21T08:54:40.1240820Z add.s64 %rd52, %rd11, 128; 2026-02-21T08:54:40.1240966Z add.s64 %rd53, %rd12, 128; 2026-02-21T08:54:40.1241230Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1241513Z add.s32 %r203, %r326, 49152; 2026-02-21T08:54:40.1241667Z // begin inline asm 2026-02-21T08:54:40.1241881Z cp.async.cg.shared.global [ %r203 + 0 ], [ %rd50 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1242096Z // end inline asm 2026-02-21T08:54:40.1242236Z add.s32 %r205, %r326, 50176; 2026-02-21T08:54:40.1242382Z // begin inline asm 2026-02-21T08:54:40.1242572Z cp.async.cg.shared.global [ %r205 + 0 ], [ %rd51 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1242783Z // end inline asm 2026-02-21T08:54:40.1242917Z add.s32 %r207, %r326, 51200; 2026-02-21T08:54:40.1243062Z // begin inline asm 2026-02-21T08:54:40.1243255Z cp.async.cg.shared.global [ %r207 + 0 ], [ %rd52 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1243499Z // end inline asm 2026-02-21T08:54:40.1243629Z add.s32 %r209, %r326, 52224; 2026-02-21T08:54:40.1243778Z // begin inline 
asm 2026-02-21T08:54:40.1243959Z cp.async.cg.shared.global [ %r209 + 0 ], [ %rd53 + 0 ], 0x10, %r188; 2026-02-21T08:54:40.1244175Z // end inline asm 2026-02-21T08:54:40.1244306Z cp.async.commit_group; 2026-02-21T08:54:40.1244571Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1244894Z setp.gt.s32 %p9, %r6, 3; 2026-02-21T08:54:40.1245162Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1245452Z add.s64 %rd54, %rd1, 192; 2026-02-21T08:54:40.1245600Z add.s64 %rd55, %rd2, 192; 2026-02-21T08:54:40.1245750Z add.s64 %rd56, %rd3, 192; 2026-02-21T08:54:40.1245893Z add.s64 %rd57, %rd4, 192; 2026-02-21T08:54:40.1246067Z add.s64 %rd58, %rd5, 192; 2026-02-21T08:54:40.1246215Z add.s64 %rd59, %rd6, 192; 2026-02-21T08:54:40.1246363Z add.s64 %rd60, %rd7, 192; 2026-02-21T08:54:40.1246505Z add.s64 %rd61, %rd8, 192; 2026-02-21T08:54:40.1246758Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1247044Z bar.sync 2, 64; 2026-02-21T08:54:40.1247178Z add.s32 %r211, %r326, 24576; 2026-02-21T08:54:40.1247237Z selp.b32 %r212, 16, 0, %p9; 2026-02-21T08:54:40.1247299Z // begin inline asm 2026-02-21T08:54:40.1247406Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd54 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1247458Z // end inline asm 2026-02-21T08:54:40.1247519Z add.s32 %r213, %r326, 25600; 2026-02-21T08:54:40.1247572Z // begin inline asm 2026-02-21T08:54:40.1247676Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd55 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1247729Z // end inline asm 2026-02-21T08:54:40.1247792Z add.s32 %r215, %r326, 26624; 2026-02-21T08:54:40.1247847Z // begin inline asm 2026-02-21T08:54:40.1247953Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd56 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1248011Z // end inline asm 2026-02-21T08:54:40.1248068Z add.s32 %r217, %r326, 27648; 2026-02-21T08:54:40.1248123Z // begin inline asm 
2026-02-21T08:54:40.1248271Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd57 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1248330Z // end inline asm 2026-02-21T08:54:40.1248387Z add.s32 %r219, %r326, 28672; 2026-02-21T08:54:40.1248441Z // begin inline asm 2026-02-21T08:54:40.1248553Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd58 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1248604Z // end inline asm 2026-02-21T08:54:40.1248659Z add.s32 %r221, %r326, 29696; 2026-02-21T08:54:40.1248718Z // begin inline asm 2026-02-21T08:54:40.1248821Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd59 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1248872Z // end inline asm 2026-02-21T08:54:40.1248929Z add.s32 %r223, %r326, 30720; 2026-02-21T08:54:40.1248989Z // begin inline asm 2026-02-21T08:54:40.1249094Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd60 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1249145Z // end inline asm 2026-02-21T08:54:40.1249213Z add.s32 %r225, %r326, 31744; 2026-02-21T08:54:40.1249271Z // begin inline asm 2026-02-21T08:54:40.1249375Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd61 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1249428Z // end inline asm 2026-02-21T08:54:40.1249495Z cp.async.commit_group; 2026-02-21T08:54:40.1249690Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1249750Z add.s64 %rd62, %rd9, 192; 2026-02-21T08:54:40.1249817Z add.s64 %rd63, %rd10, 192; 2026-02-21T08:54:40.1249872Z add.s64 %rd64, %rd11, 192; 2026-02-21T08:54:40.1249927Z add.s64 %rd65, %rd12, 192; 2026-02-21T08:54:40.1250102Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1250157Z add.s32 %r227, %r326, 53248; 2026-02-21T08:54:40.1250237Z // begin inline asm 2026-02-21T08:54:40.1250342Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd62 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1250401Z // end inline asm 2026-02-21T08:54:40.1250455Z add.s32 %r229, %r326, 54272; 2026-02-21T08:54:40.1250511Z // begin inline 
asm 2026-02-21T08:54:40.1250623Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd63 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1250675Z // end inline asm 2026-02-21T08:54:40.1250730Z add.s32 %r231, %r326, 55296; 2026-02-21T08:54:40.1250786Z // begin inline asm 2026-02-21T08:54:40.1250899Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd64 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1250950Z // end inline asm 2026-02-21T08:54:40.1251004Z add.s32 %r233, %r326, 56320; 2026-02-21T08:54:40.1251065Z // begin inline asm 2026-02-21T08:54:40.1251169Z cp.async.cg.shared.global [ %r233 + 0 ], [ %rd65 + 0 ], 0x10, %r212; 2026-02-21T08:54:40.1251221Z // end inline asm 2026-02-21T08:54:40.1251310Z cp.async.commit_group; 2026-02-21T08:54:40.1251486Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1251544Z setp.gt.s32 %p10, %r6, 4; 2026-02-21T08:54:40.1251712Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1251777Z add.s64 %rd66, %rd1, 256; 2026-02-21T08:54:40.1251832Z add.s64 %rd67, %rd2, 256; 2026-02-21T08:54:40.1251886Z add.s64 %rd68, %rd3, 256; 2026-02-21T08:54:40.1251948Z add.s64 %rd69, %rd4, 256; 2026-02-21T08:54:40.1252002Z add.s64 %rd70, %rd5, 256; 2026-02-21T08:54:40.1252056Z add.s64 %rd71, %rd6, 256; 2026-02-21T08:54:40.1252108Z add.s64 %rd72, %rd7, 256; 2026-02-21T08:54:40.1252168Z add.s64 %rd73, %rd8, 256; 2026-02-21T08:54:40.1252333Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1252388Z bar.sync 2, 64; 2026-02-21T08:54:40.1252451Z add.s32 %r235, %r326, 32768; 2026-02-21T08:54:40.1252509Z selp.b32 %r236, 16, 0, %p10; 2026-02-21T08:54:40.1252562Z // begin inline asm 2026-02-21T08:54:40.1252673Z cp.async.cg.shared.global [ %r235 + 0 ], [ %rd66 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1252724Z // end inline asm 2026-02-21T08:54:40.1252802Z add.s32 %r237, %r326, 33792; 2026-02-21T08:54:40.1252855Z // begin inline asm 
2026-02-21T08:54:40.1252968Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd67 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1253019Z // end inline asm 2026-02-21T08:54:40.1253074Z add.s32 %r239, %r326, 34816; 2026-02-21T08:54:40.1253134Z // begin inline asm 2026-02-21T08:54:40.1253238Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd68 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1253290Z // end inline asm 2026-02-21T08:54:40.1253345Z add.s32 %r241, %r326, 35840; 2026-02-21T08:54:40.1253405Z // begin inline asm 2026-02-21T08:54:40.1253509Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd69 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1253561Z // end inline asm 2026-02-21T08:54:40.1253622Z add.s32 %r243, %r326, 36864; 2026-02-21T08:54:40.1253675Z // begin inline asm 2026-02-21T08:54:40.1253779Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd70 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1253839Z // end inline asm 2026-02-21T08:54:40.1253895Z add.s32 %r245, %r326, 37888; 2026-02-21T08:54:40.1253947Z // begin inline asm 2026-02-21T08:54:40.1254052Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd71 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1254113Z // end inline asm 2026-02-21T08:54:40.1254187Z add.s32 %r247, %r326, 38912; 2026-02-21T08:54:40.1254242Z // begin inline asm 2026-02-21T08:54:40.1254353Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd72 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1254404Z // end inline asm 2026-02-21T08:54:40.1254457Z add.s32 %r249, %r326, 39936; 2026-02-21T08:54:40.1254510Z // begin inline asm 2026-02-21T08:54:40.1254620Z cp.async.cg.shared.global [ %r249 + 0 ], [ %rd73 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1254707Z // end inline asm 2026-02-21T08:54:40.1254808Z cp.async.commit_group; 2026-02-21T08:54:40.1254985Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1255041Z add.s64 %rd74, %rd9, 256; 2026-02-21T08:54:40.1255100Z add.s64 %rd75, %rd10, 256; 2026-02-21T08:54:40.1255164Z add.s64 %rd76, %rd11, 256; 
2026-02-21T08:54:40.1255220Z add.s64 %rd77, %rd12, 256; 2026-02-21T08:54:40.1255389Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1255445Z add.s32 %r251, %r326, 57344; 2026-02-21T08:54:40.1255506Z // begin inline asm 2026-02-21T08:54:40.1255611Z cp.async.cg.shared.global [ %r251 + 0 ], [ %rd74 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1255665Z // end inline asm 2026-02-21T08:54:40.1255728Z add.s32 %r253, %r326, 58368; 2026-02-21T08:54:40.1255782Z // begin inline asm 2026-02-21T08:54:40.1255912Z cp.async.cg.shared.global [ %r253 + 0 ], [ %rd75 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1255969Z // end inline asm 2026-02-21T08:54:40.1256035Z add.s32 %r255, %r326, 59392; 2026-02-21T08:54:40.1256090Z // begin inline asm 2026-02-21T08:54:40.1256194Z cp.async.cg.shared.global [ %r255 + 0 ], [ %rd76 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1256257Z // end inline asm 2026-02-21T08:54:40.1256314Z add.s32 %r257, %r326, 60416; 2026-02-21T08:54:40.1256369Z // begin inline asm 2026-02-21T08:54:40.1256482Z cp.async.cg.shared.global [ %r257 + 0 ], [ %rd77 + 0 ], 0x10, %r236; 2026-02-21T08:54:40.1256536Z // end inline asm 2026-02-21T08:54:40.1256594Z cp.async.commit_group; 2026-02-21T08:54:40.1256759Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1256826Z cp.async.wait_group 8; 2026-02-21T08:54:40.1256880Z bar.sync 2, 64; 2026-02-21T08:54:40.1257044Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1257128Z shfl.sync.idx.b32 %r41, %r265, 0, 31, -1; 2026-02-21T08:54:40.1257188Z setp.ne.b32 %p11, %r41, 0; 2026-02-21T08:54:40.1257248Z or.pred %p12, %p5, %p11; 2026-02-21T08:54:40.1257304Z @%p12 bra $L__BB0_7; 2026-02-21T08:54:40.1257409Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1257494Z elect.sync %r314|%p14, -1; 2026-02-21T08:54:40.1257553Z bfe.u32 %r316, %r137, 4, 14; 2026-02-21T08:54:40.1257617Z cvt.u64.u32 %rd120, 
%r316; 2026-02-21T08:54:40.1257689Z or.b64 %rd115, %rd120, -9223371899382267904; 2026-02-21T08:54:40.1257745Z add.s32 %r317, %r137, 40960; 2026-02-21T08:54:40.1257806Z bfe.u32 %r318, %r317, 4, 14; 2026-02-21T08:54:40.1257860Z cvt.u64.u32 %rd121, %r318; 2026-02-21T08:54:40.1257930Z or.b64 %rd116, %rd121, -9223371899399045120; 2026-02-21T08:54:40.1257985Z mov.b32 %r311, 135266320; 2026-02-21T08:54:40.1258048Z mov.pred %p13, 0; 2026-02-21T08:54:40.1258101Z // begin inline asm 2026-02-21T08:54:40.1258247Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r310 + 0 ], %rd115, %rd116, %r311, %p13; 2026-02-21T08:54:40.1258310Z // end inline asm 2026-02-21T08:54:40.1258364Z add.s32 %r319, %r137, 32; 2026-02-21T08:54:40.1258418Z bfe.u32 %r320, %r319, 4, 14; 2026-02-21T08:54:40.1258473Z cvt.u64.u32 %rd122, %r320; 2026-02-21T08:54:40.1258548Z or.b64 %rd117, %rd122, -9223371899382267904; 2026-02-21T08:54:40.1258601Z add.s32 %r321, %r137, 40992; 2026-02-21T08:54:40.1258655Z bfe.u32 %r322, %r321, 4, 14; 2026-02-21T08:54:40.1258742Z cvt.u64.u32 %rd123, %r322; 2026-02-21T08:54:40.1258811Z or.b64 %rd118, %rd123, -9223371899399045120; 2026-02-21T08:54:40.1258866Z mov.pred %p15, -1; 2026-02-21T08:54:40.1258925Z // begin inline asm 2026-02-21T08:54:40.1259066Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r310 + 0 ], %rd117, %rd118, %r311, %p15; 2026-02-21T08:54:40.1259119Z // end inline asm 2026-02-21T08:54:40.1259173Z add.s32 %r323, %r137, 77824; 2026-02-21T08:54:40.1259234Z cvt.u64.u32 %rd119, %r323; 2026-02-21T08:54:40.1259288Z // begin inline asm 2026-02-21T08:54:40.1259428Z @%p13 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd119]; 2026-02-21T08:54:40.1259487Z // end inline asm 2026-02-21T08:54:40.1259582Z $L__BB0_7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1259758Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1259822Z setp.gt.s32 %p19, %r6, 5; 2026-02-21T08:54:40.1259990Z .loc 1 56 52 // 
chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1260046Z add.s32 %r324, %r137, 77840; 2026-02-21T08:54:40.1260098Z mov.b32 %r774, 0; 2026-02-21T08:54:40.1260160Z mov.pred %p87, 0; 2026-02-21T08:54:40.1260215Z // begin inline asm 2026-02-21T08:54:40.1260263Z 2026-02-21T08:54:40.1260317Z { 2026-02-21T08:54:40.1260379Z @!%p87 bra.uni skipWait; 2026-02-21T08:54:40.1260437Z .reg .pred complete; 2026-02-21T08:54:40.1260511Z waitLoop: 2026-02-21T08:54:40.1260639Z mbarrier.try_wait.parity.shared.b64 complete, [%r324], %r774; 2026-02-21T08:54:40.1260705Z @!complete bra.uni waitLoop; 2026-02-21T08:54:40.1260760Z skipWait: 2026-02-21T08:54:40.1260816Z } 2026-02-21T08:54:40.1260821Z 2026-02-21T08:54:40.1260878Z // end inline asm 2026-02-21T08:54:40.1261044Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1261108Z add.s64 %rd124, %rd1, 320; 2026-02-21T08:54:40.1261167Z add.s64 %rd125, %rd2, 320; 2026-02-21T08:54:40.1261225Z add.s64 %rd126, %rd3, 320; 2026-02-21T08:54:40.1261282Z add.s64 %rd127, %rd4, 320; 2026-02-21T08:54:40.1261345Z add.s64 %rd128, %rd5, 320; 2026-02-21T08:54:40.1261401Z add.s64 %rd129, %rd6, 320; 2026-02-21T08:54:40.1261457Z add.s64 %rd130, %rd7, 320; 2026-02-21T08:54:40.1261522Z add.s64 %rd131, %rd8, 320; 2026-02-21T08:54:40.1261689Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1261746Z bar.sync 2, 64; 2026-02-21T08:54:40.1261808Z selp.b32 %r327, 16, 0, %p19; 2026-02-21T08:54:40.1261871Z // begin inline asm 2026-02-21T08:54:40.1261990Z cp.async.cg.shared.global [ %r326 + 0 ], [ %rd124 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1262067Z // end inline asm 2026-02-21T08:54:40.1262129Z // begin inline asm 2026-02-21T08:54:40.1262245Z cp.async.cg.shared.global [ %r328 + 0 ], [ %rd125 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1262302Z // end inline asm 2026-02-21T08:54:40.1262359Z // begin inline asm 2026-02-21T08:54:40.1262480Z 
cp.async.cg.shared.global [ %r330 + 0 ], [ %rd126 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1262537Z // end inline asm 2026-02-21T08:54:40.1262593Z // begin inline asm 2026-02-21T08:54:40.1262711Z cp.async.cg.shared.global [ %r332 + 0 ], [ %rd127 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1262764Z // end inline asm 2026-02-21T08:54:40.1262820Z // begin inline asm 2026-02-21T08:54:40.1262940Z cp.async.cg.shared.global [ %r334 + 0 ], [ %rd128 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1262996Z // end inline asm 2026-02-21T08:54:40.1263051Z // begin inline asm 2026-02-21T08:54:40.1263161Z cp.async.cg.shared.global [ %r336 + 0 ], [ %rd129 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1263222Z // end inline asm 2026-02-21T08:54:40.1263278Z // begin inline asm 2026-02-21T08:54:40.1263387Z cp.async.cg.shared.global [ %r338 + 0 ], [ %rd130 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1263449Z // end inline asm 2026-02-21T08:54:40.1263503Z // begin inline asm 2026-02-21T08:54:40.1263635Z cp.async.cg.shared.global [ %r340 + 0 ], [ %rd131 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1263691Z // end inline asm 2026-02-21T08:54:40.1263760Z cp.async.commit_group; 2026-02-21T08:54:40.1263931Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1263989Z add.s64 %rd132, %rd9, 320; 2026-02-21T08:54:40.1264056Z add.s64 %rd133, %rd10, 320; 2026-02-21T08:54:40.1264116Z add.s64 %rd134, %rd11, 320; 2026-02-21T08:54:40.1264196Z add.s64 %rd135, %rd12, 320; 2026-02-21T08:54:40.1264381Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1264438Z // begin inline asm 2026-02-21T08:54:40.1264551Z cp.async.cg.shared.global [ %r155 + 0 ], [ %rd132 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1264608Z // end inline asm 2026-02-21T08:54:40.1264694Z // begin inline asm 2026-02-21T08:54:40.1264809Z cp.async.cg.shared.global [ %r157 + 0 ], [ %rd133 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1264863Z // end inline asm 
2026-02-21T08:54:40.1264925Z // begin inline asm 2026-02-21T08:54:40.1265038Z cp.async.cg.shared.global [ %r159 + 0 ], [ %rd134 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1265091Z // end inline asm 2026-02-21T08:54:40.1265153Z // begin inline asm 2026-02-21T08:54:40.1265264Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd135 + 0 ], 0x10, %r327; 2026-02-21T08:54:40.1265318Z // end inline asm 2026-02-21T08:54:40.1265421Z cp.async.commit_group; 2026-02-21T08:54:40.1265617Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1265677Z @%p5 bra $L__BB0_14; 2026-02-21T08:54:40.1265754Z // %bb.8: // %.lr.ph 2026-02-21T08:54:40.1265853Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1266032Z .loc 1 0 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0:107 2026-02-21T08:54:40.1266095Z add.s32 %r42, %r6, -6; 2026-02-21T08:54:40.1266161Z add.s32 %r43, %r6, -1; 2026-02-21T08:54:40.1266217Z mov.b32 %r790, 5; 2026-02-21T08:54:40.1266272Z mov.b32 %r773, 1; 2026-02-21T08:54:40.1266326Z mov.b32 %r772, 2; 2026-02-21T08:54:40.1266387Z mov.b32 %r771, 3; 2026-02-21T08:54:40.1266440Z mov.b32 %r770, 4; 2026-02-21T08:54:40.1266497Z mov.b32 %r769, 160; 2026-02-21T08:54:40.1266559Z mov.b32 %r787, %r774; 2026-02-21T08:54:40.1266616Z mov.b32 %r788, %r774; 2026-02-21T08:54:40.1266673Z mov.b32 %r792, %r5; 2026-02-21T08:54:40.1266729Z mov.b32 %r791, %r774; 2026-02-21T08:54:40.1266795Z bra.uni $L__BB0_9; 2026-02-21T08:54:40.1266894Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=2 2026-02-21T08:54:40.1267103Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1267173Z setp.eq.b32 %p87, %r773, 63; 2026-02-21T08:54:40.1267233Z setp.eq.b32 %p35, %r773, 63; 2026-02-21T08:54:40.1267294Z setp.eq.b32 %p36, %r67, 0; 2026-02-21T08:54:40.1267360Z setp.lt.s32 %p37, %r791, %r43; 2026-02-21T08:54:40.1267430Z setp.lt.s32 %p38, %r791, %r42; 2026-02-21T08:54:40.1267675Z .loc 1 56 52 // 
chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1267734Z add.s32 %r421, %r788, 1; 2026-02-21T08:54:40.1267802Z setp.eq.b32 %p39, %r421, 2; 2026-02-21T08:54:40.1267867Z selp.b32 %r422, 0, %r421, %p39; 2026-02-21T08:54:40.1267933Z selp.b32 %r788, %r422, %r788, %p35; 2026-02-21T08:54:40.1268008Z and.pred %p40, %p35, %p39; 2026-02-21T08:54:40.1268067Z selp.b32 %r423, 1, 0, %p40; 2026-02-21T08:54:40.1268124Z xor.b32 %r787, %r787, %r423; 2026-02-21T08:54:40.1268313Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1268377Z shl.b32 %r424, %r788, 3; 2026-02-21T08:54:40.1268432Z add.s32 %r426, %r137, %r424; 2026-02-21T08:54:40.1268487Z add.s32 %r395, %r426, 77840; 2026-02-21T08:54:40.1268578Z and.pred %p34, %p37, %p35; 2026-02-21T08:54:40.1268743Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1268797Z // begin inline asm 2026-02-21T08:54:40.1268852Z 2026-02-21T08:54:40.1268899Z { 2026-02-21T08:54:40.1268959Z @!%p34 bra.uni skipWait; 2026-02-21T08:54:40.1269017Z .reg .pred complete; 2026-02-21T08:54:40.1269079Z waitLoop: 2026-02-21T08:54:40.1269194Z mbarrier.try_wait.parity.shared.b64 complete, [%r395], %r787; 2026-02-21T08:54:40.1269283Z @!complete bra.uni waitLoop; 2026-02-21T08:54:40.1269340Z skipWait: 2026-02-21T08:54:40.1269388Z } 2026-02-21T08:54:40.1269393Z 2026-02-21T08:54:40.1269446Z // end inline asm 2026-02-21T08:54:40.1269620Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1269682Z add.s32 %r427, %r769, 32; 2026-02-21T08:54:40.1269743Z selp.b32 %r769, 0, %r427, %p36; 2026-02-21T08:54:40.1269909Z .loc 1 50 35 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:50:35 2026-02-21T08:54:40.1269974Z add.s32 %r428, %r769, %r15; 2026-02-21T08:54:40.1270138Z .loc 1 54 53 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:53 2026-02-21T08:54:40.1270193Z shl.b32 %r429, 
%r793, 11; 2026-02-21T08:54:40.1270254Z shl.b32 %r430, %r794, 11; 2026-02-21T08:54:40.1270307Z shl.b32 %r431, %r795, 11; 2026-02-21T08:54:40.1270383Z shl.b32 %r432, %r796, 11; 2026-02-21T08:54:40.1270439Z shl.b32 %r433, %r797, 11; 2026-02-21T08:54:40.1270503Z shl.b32 %r434, %r798, 11; 2026-02-21T08:54:40.1270559Z shl.b32 %r435, %r799, 11; 2026-02-21T08:54:40.1270614Z shl.b32 %r436, %r800, 11; 2026-02-21T08:54:40.1270790Z .loc 1 54 60 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:60 2026-02-21T08:54:40.1270848Z add.s32 %r437, %r429, %r428; 2026-02-21T08:54:40.1270904Z add.s32 %r438, %r430, %r428; 2026-02-21T08:54:40.1270967Z add.s32 %r439, %r431, %r428; 2026-02-21T08:54:40.1271022Z add.s32 %r440, %r432, %r428; 2026-02-21T08:54:40.1271076Z add.s32 %r441, %r433, %r428; 2026-02-21T08:54:40.1271128Z add.s32 %r442, %r434, %r428; 2026-02-21T08:54:40.1271189Z add.s32 %r443, %r435, %r428; 2026-02-21T08:54:40.1271243Z add.s32 %r444, %r436, %r428; 2026-02-21T08:54:40.1271407Z .loc 1 54 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:32 2026-02-21T08:54:40.1271480Z mad.wide.s32 %rd145, %r437, 2, %rd14; 2026-02-21T08:54:40.1271544Z mad.wide.s32 %rd146, %r438, 2, %rd14; 2026-02-21T08:54:40.1271606Z mad.wide.s32 %rd147, %r439, 2, %rd14; 2026-02-21T08:54:40.1271664Z mad.wide.s32 %rd148, %r440, 2, %rd14; 2026-02-21T08:54:40.1271751Z mad.wide.s32 %rd149, %r441, 2, %rd14; 2026-02-21T08:54:40.1271809Z mad.wide.s32 %rd150, %r442, 2, %rd14; 2026-02-21T08:54:40.1271867Z mad.wide.s32 %rd151, %r443, 2, %rd14; 2026-02-21T08:54:40.1271934Z mad.wide.s32 %rd152, %r444, 2, %rd14; 2026-02-21T08:54:40.1272101Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1272153Z bar.sync 2, 64; 2026-02-21T08:54:40.1272219Z add.s32 %r397, %r95, %r28; 2026-02-21T08:54:40.1272276Z selp.b32 %r398, 16, 0, %p38; 2026-02-21T08:54:40.1272329Z // begin inline asm 2026-02-21T08:54:40.1272441Z cp.async.cg.shared.global [ %r397 + 0 ], [ 
%rd145 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1272502Z // end inline asm 2026-02-21T08:54:40.1272558Z add.s32 %r399, %r397, 1024; 2026-02-21T08:54:40.1272612Z // begin inline asm 2026-02-21T08:54:40.1272727Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd146 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1272779Z // end inline asm 2026-02-21T08:54:40.1272836Z add.s32 %r401, %r397, 2048; 2026-02-21T08:54:40.1272889Z // begin inline asm 2026-02-21T08:54:40.1273006Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd147 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1273057Z // end inline asm 2026-02-21T08:54:40.1273132Z add.s32 %r403, %r397, 3072; 2026-02-21T08:54:40.1273196Z // begin inline asm 2026-02-21T08:54:40.1273305Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd148 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1273357Z // end inline asm 2026-02-21T08:54:40.1273420Z add.s32 %r405, %r397, 4096; 2026-02-21T08:54:40.1273474Z // begin inline asm 2026-02-21T08:54:40.1273581Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd149 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1273636Z // end inline asm 2026-02-21T08:54:40.1273726Z add.s32 %r407, %r397, 5120; 2026-02-21T08:54:40.1273780Z // begin inline asm 2026-02-21T08:54:40.1273887Z cp.async.cg.shared.global [ %r407 + 0 ], [ %rd150 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1273945Z // end inline asm 2026-02-21T08:54:40.1273999Z add.s32 %r409, %r397, 6144; 2026-02-21T08:54:40.1274052Z // begin inline asm 2026-02-21T08:54:40.1274158Z cp.async.cg.shared.global [ %r409 + 0 ], [ %rd151 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1274217Z // end inline asm 2026-02-21T08:54:40.1274272Z add.s32 %r411, %r397, 7168; 2026-02-21T08:54:40.1274325Z // begin inline asm 2026-02-21T08:54:40.1274438Z cp.async.cg.shared.global [ %r411 + 0 ], [ %rd152 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1274489Z // end inline asm 2026-02-21T08:54:40.1274548Z cp.async.commit_group; 2026-02-21T08:54:40.1274745Z .loc 1 55 80 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:80 
2026-02-21T08:54:40.1274821Z shl.b32 %r445, %r801, 11; 2026-02-21T08:54:40.1274880Z shl.b32 %r446, %r802, 11; 2026-02-21T08:54:40.1274933Z shl.b32 %r447, %r803, 11; 2026-02-21T08:54:40.1274994Z shl.b32 %r448, %r804, 11; 2026-02-21T08:54:40.1275156Z .loc 1 55 59 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:59 2026-02-21T08:54:40.1275214Z add.s32 %r449, %r445, %r428; 2026-02-21T08:54:40.1275274Z add.s32 %r450, %r446, %r428; 2026-02-21T08:54:40.1275327Z add.s32 %r451, %r447, %r428; 2026-02-21T08:54:40.1275382Z add.s32 %r452, %r448, %r428; 2026-02-21T08:54:40.1275549Z .loc 1 55 34 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:34 2026-02-21T08:54:40.1275615Z mad.wide.s32 %rd153, %r449, 2, %rd15; 2026-02-21T08:54:40.1275674Z mad.wide.s32 %rd154, %r450, 2, %rd15; 2026-02-21T08:54:40.1275734Z mad.wide.s32 %rd155, %r451, 2, %rd15; 2026-02-21T08:54:40.1275802Z mad.wide.s32 %rd156, %r452, 2, %rd15; 2026-02-21T08:54:40.1275970Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1276029Z add.s32 %r413, %r96, %r28; 2026-02-21T08:54:40.1276089Z // begin inline asm 2026-02-21T08:54:40.1276198Z cp.async.cg.shared.global [ %r413 + 0 ], [ %rd153 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1276277Z // end inline asm 2026-02-21T08:54:40.1276334Z add.s32 %r415, %r413, 1024; 2026-02-21T08:54:40.1276395Z // begin inline asm 2026-02-21T08:54:40.1276500Z cp.async.cg.shared.global [ %r415 + 0 ], [ %rd154 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1276554Z // end inline asm 2026-02-21T08:54:40.1276617Z add.s32 %r417, %r413, 2048; 2026-02-21T08:54:40.1276671Z // begin inline asm 2026-02-21T08:54:40.1276776Z cp.async.cg.shared.global [ %r417 + 0 ], [ %rd155 + 0 ], 0x10, %r398; 2026-02-21T08:54:40.1276835Z // end inline asm 2026-02-21T08:54:40.1276889Z add.s32 %r419, %r413, 3072; 2026-02-21T08:54:40.1276944Z // begin inline asm 2026-02-21T08:54:40.1277051Z cp.async.cg.shared.global [ %r419 + 0 ], [ %rd156 + 0 ], 0x10, 
%r398; 2026-02-21T08:54:40.1277111Z // end inline asm 2026-02-21T08:54:40.1277170Z cp.async.commit_group; 2026-02-21T08:54:40.1277339Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1277405Z add.s32 %r791, %r791, 1; 2026-02-21T08:54:40.1277466Z setp.ne.b32 %p41, %r6, %r791; 2026-02-21T08:54:40.1277520Z mov.b32 %r770, %r790; 2026-02-21T08:54:40.1277575Z mov.b32 %r773, %r47; 2026-02-21T08:54:40.1277638Z mov.b32 %r790, %r67; 2026-02-21T08:54:40.1277720Z @%p41 bra $L__BB0_9; 2026-02-21T08:54:40.1277779Z bra.uni $L__BB0_14; 2026-02-21T08:54:40.1277880Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:40.1277974Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:40.1278144Z .loc 1 0 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0:107 2026-02-21T08:54:40.1278207Z mov.b32 %r47, %r772; 2026-02-21T08:54:40.1278260Z mov.b32 %r772, %r771; 2026-02-21T08:54:40.1278339Z mov.b32 %r771, %r770; 2026-02-21T08:54:40.1278512Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1278575Z add.s32 %r358, %r790, 1; 2026-02-21T08:54:40.1278636Z setp.eq.b32 %p22, %r790, 63; 2026-02-21T08:54:40.1278697Z selp.b32 %r67, 0, %r358, %p22; 2026-02-21T08:54:40.1278761Z setp.ne.b32 %p23, %r67, 0; 2026-02-21T08:54:40.1278817Z @%p23 bra $L__BB0_11; 2026-02-21T08:54:40.1278908Z // %bb.10: // in Loop: Header=BB0_9 Depth=2 2026-02-21T08:54:40.1278965Z add.s32 %r792, %r792, 1; 2026-02-21T08:54:40.1279137Z .loc 1 36 35 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:36:35 2026-02-21T08:54:40.1279192Z shr.s32 %r359, %r792, 31; 2026-02-21T08:54:40.1279245Z shr.u32 %r360, %r359, 24; 2026-02-21T08:54:40.1279307Z add.s32 %r361, %r792, %r360; 2026-02-21T08:54:40.1279384Z shr.s32 %r362, %r361, 8; 2026-02-21T08:54:40.1279550Z .loc 1 37 33 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:37:33 2026-02-21T08:54:40.1279610Z shl.b32 %r363, %r362, 4; 
2026-02-21T08:54:40.1279770Z .loc 1 38 39 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:39 2026-02-21T08:54:40.1279827Z sub.s32 %r364, 192, %r363; 2026-02-21T08:54:40.1280002Z .loc 1 38 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:38:52 2026-02-21T08:54:40.1280058Z min.s32 %r365, %r364, 16; 2026-02-21T08:54:40.1280219Z .loc 1 39 45 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:45 2026-02-21T08:54:40.1280275Z and.b32 %r366, %r361, -256; 2026-02-21T08:54:40.1280338Z sub.s32 %r367, %r792, %r366; 2026-02-21T08:54:40.1280500Z .loc 1 40 51 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:40:51 2026-02-21T08:54:40.1280557Z div.s32 %r368, %r367, %r365; 2026-02-21T08:54:40.1280729Z .loc 1 39 64 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:64 2026-02-21T08:54:40.1280788Z mul.lo.s32 %r369, %r368, %r365; 2026-02-21T08:54:40.1280842Z sub.s32 %r370, %r367, %r369; 2026-02-21T08:54:40.1281043Z .loc 1 39 30 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:39:30 2026-02-21T08:54:40.1281099Z add.s32 %r371, %r370, %r363; 2026-02-21T08:54:40.1281272Z .loc 1 41 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:41:27 2026-02-21T08:54:40.1281327Z shl.b32 %r372, %r371, 6; 2026-02-21T08:54:40.1281499Z .loc 1 42 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:42:32 2026-02-21T08:54:40.1281555Z or.b32 %r801, %r372, %r7; 2026-02-21T08:54:40.1281608Z or.b32 %r802, %r372, %r8; 2026-02-21T08:54:40.1281671Z or.b32 %r803, %r372, %r9; 2026-02-21T08:54:40.1281727Z or.b32 %r804, %r372, %r10; 2026-02-21T08:54:40.1281893Z .loc 1 43 27 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:43:27 2026-02-21T08:54:40.1281957Z shl.b32 %r373, %r368, 7; 2026-02-21T08:54:40.1282122Z .loc 1 44 32 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:44:32 2026-02-21T08:54:40.1282177Z or.b32 %r793, %r373, %r7; 2026-02-21T08:54:40.1282230Z or.b32 %r794, %r373, %r8; 2026-02-21T08:54:40.1282289Z 
or.b32 %r795, %r373, %r9; 2026-02-21T08:54:40.1282344Z or.b32 %r796, %r373, %r10; 2026-02-21T08:54:40.1282419Z or.b32 %r797, %r373, %r11; 2026-02-21T08:54:40.1282482Z or.b32 %r798, %r373, %r12; 2026-02-21T08:54:40.1282535Z or.b32 %r799, %r373, %r13; 2026-02-21T08:54:40.1282589Z or.b32 %r800, %r373, %r14; 2026-02-21T08:54:40.1282689Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=2 2026-02-21T08:54:40.1282865Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1282928Z setp.ge.s32 %p24, %r791, %r43; 2026-02-21T08:54:40.1283015Z add.s32 %r374, %r774, 1; 2026-02-21T08:54:40.1283083Z setp.gt.s32 %p26, %r374, 4; 2026-02-21T08:54:40.1283142Z selp.b32 %r774, 0, %r374, %p26; 2026-02-21T08:54:40.1283305Z .loc 1 54 85 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:54:85 2026-02-21T08:54:40.1283372Z cp.async.wait_group 8; 2026-02-21T08:54:40.1283425Z bar.sync 2, 64; 2026-02-21T08:54:40.1283479Z shl.b32 %r375, %r774, 13; 2026-02-21T08:54:40.1283540Z add.s32 %r95, %r137, %r375; 2026-02-21T08:54:40.1283703Z .loc 1 55 87 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:55:87 2026-02-21T08:54:40.1283756Z shl.b32 %r377, %r774, 12; 2026-02-21T08:54:40.1283810Z add.s32 %r378, %r137, %r377; 2026-02-21T08:54:40.1283873Z add.s32 %r96, %r378, 40960; 2026-02-21T08:54:40.1284036Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1284115Z or.pred %p27, %p11, %p24; 2026-02-21T08:54:40.1284180Z @%p27 bra $L__BB0_13; 2026-02-21T08:54:40.1284269Z // %bb.12: // in Loop: Header=BB0_9 Depth=2 2026-02-21T08:54:40.1284436Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1284503Z setp.eq.b32 %p33, %r773, 63; 2026-02-21T08:54:40.1284559Z shl.b32 %r383, %r788, 3; 2026-02-21T08:54:40.1284614Z add.s32 %r385, %r137, %r383; 2026-02-21T08:54:40.1284704Z add.s32 %r386, %r385, 77824; 2026-02-21T08:54:40.1284879Z .loc 1 56 52 // 
chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1284935Z shl.b32 %r387, %r788, 6; 2026-02-21T08:54:40.1284990Z add.s32 %r379, %r387, %r310; 2026-02-21T08:54:40.1285061Z not.pred %p28, %p87; 2026-02-21T08:54:40.1285122Z elect.sync %r388|%p29, -1; 2026-02-21T08:54:40.1285179Z bfe.u32 %r389, %r95, 4, 14; 2026-02-21T08:54:40.1285237Z cvt.u64.u32 %rd141, %r389; 2026-02-21T08:54:40.1285318Z or.b64 %rd136, %rd141, -9223371899382267904; 2026-02-21T08:54:40.1285375Z bfe.u32 %r390, %r96, 4, 14; 2026-02-21T08:54:40.1285431Z cvt.u64.u32 %rd142, %r390; 2026-02-21T08:54:40.1285507Z or.b64 %rd137, %rd142, -9223371899399045120; 2026-02-21T08:54:40.1285599Z mov.b32 %r380, 135266320; 2026-02-21T08:54:40.1285655Z // begin inline asm 2026-02-21T08:54:40.1285803Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r379 + 0 ], %rd136, %rd137, %r380, %p28; 2026-02-21T08:54:40.1285861Z // end inline asm 2026-02-21T08:54:40.1285915Z add.s32 %r391, %r95, 32; 2026-02-21T08:54:40.1285971Z bfe.u32 %r392, %r391, 4, 14; 2026-02-21T08:54:40.1286035Z cvt.u64.u32 %rd143, %r392; 2026-02-21T08:54:40.1286104Z or.b64 %rd138, %rd143, -9223371899382267904; 2026-02-21T08:54:40.1286159Z add.s32 %r393, %r96, 32; 2026-02-21T08:54:40.1286222Z bfe.u32 %r394, %r393, 4, 14; 2026-02-21T08:54:40.1286278Z cvt.u64.u32 %rd144, %r394; 2026-02-21T08:54:40.1286345Z or.b64 %rd139, %rd144, -9223371899399045120; 2026-02-21T08:54:40.1286403Z mov.pred %p30, -1; 2026-02-21T08:54:40.1286465Z // begin inline asm 2026-02-21T08:54:40.1286600Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r379 + 0 ], %rd138, %rd139, %r380, %p30; 2026-02-21T08:54:40.1286654Z // end inline asm 2026-02-21T08:54:40.1286725Z and.pred %p32, %p33, %p29; 2026-02-21T08:54:40.1286782Z cvt.u64.u32 %rd140, %r386; 2026-02-21T08:54:40.1286836Z // begin inline asm 2026-02-21T08:54:40.1286988Z @%p32 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd140]; 2026-02-21T08:54:40.1287043Z // end inline asm 2026-02-21T08:54:40.1287099Z 
bra.uni $L__BB0_13; 2026-02-21T08:54:40.1287192Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1287374Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1287429Z barrier.sync 1; 2026-02-21T08:54:40.1287481Z barrier.sync 1; 2026-02-21T08:54:40.1287542Z bra.uni $L__BB0_2; 2026-02-21T08:54:40.1287648Z $L__BB0_14: // %._crit_edge 2026-02-21T08:54:40.1287733Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1287799Z cp.async.wait_group 0; 2026-02-21T08:54:40.1287852Z bar.sync 2, 64; 2026-02-21T08:54:40.1287906Z barrier.sync 1; 2026-02-21T08:54:40.1287958Z bra.uni $L__BB0_2; 2026-02-21T08:54:40.1288056Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:40.1288221Z .loc 1 19 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:19 2026-02-21T08:54:40.1288275Z barrier.sync 1; 2026-02-21T08:54:40.1288335Z barrier.sync 1; 2026-02-21T08:54:40.1288388Z bra.uni $L__BB0_2; 2026-02-21T08:54:40.1288468Z $L__BB0_24: // %._crit_edge8 2026-02-21T08:54:40.1288639Z .loc 1 30 107 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1288740Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:40.1288797Z bar.sync 0, 128; 2026-02-21T08:54:40.1288851Z barrier.sync 1; 2026-02-21T08:54:40.1288913Z shl.b32 %r767, %r816, 3; 2026-02-21T08:54:40.1288970Z add.s32 %r760, %r532, %r767; 2026-02-21T08:54:40.1289138Z .loc 1 56 52 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:56:52 2026-02-21T08:54:40.1289197Z // begin inline asm 2026-02-21T08:54:40.1289244Z 2026-02-21T08:54:40.1289291Z { 2026-02-21T08:54:40.1289351Z .reg .pred complete; 2026-02-21T08:54:40.1289412Z waitLoop: 2026-02-21T08:54:40.1289527Z mbarrier.try_wait.parity.shared.b64 complete, [%r760], %r818; 2026-02-21T08:54:40.1289588Z @!complete bra.uni waitLoop; 2026-02-21T08:54:40.1289640Z } 2026-02-21T08:54:40.1289644Z 2026-02-21T08:54:40.1289697Z // end inline asm 2026-02-21T08:54:40.1289865Z .loc 1 30 107 
// chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:107 2026-02-21T08:54:40.1289919Z bar.sync 0, 128; 2026-02-21T08:54:40.1289980Z // begin inline asm 2026-02-21T08:54:40.1290060Z @%p82 mbarrier.inval.shared::cta.b64 [%r532]; 2026-02-21T08:54:40.1290112Z // end inline asm 2026-02-21T08:54:40.1290172Z bar.sync 0, 128; 2026-02-21T08:54:40.1290224Z // begin inline asm 2026-02-21T08:54:40.1290325Z @%p82 mbarrier.inval.shared::cta.b64 [%r533]; 2026-02-21T08:54:40.1290382Z // end inline asm 2026-02-21T08:54:40.1290435Z // begin inline asm 2026-02-21T08:54:40.1290509Z @%p82 mbarrier.inval.shared::cta.b64 [%r530]; 2026-02-21T08:54:40.1290561Z // end inline asm 2026-02-21T08:54:40.1290619Z bar.sync 0, 128; 2026-02-21T08:54:40.1290671Z // begin inline asm 2026-02-21T08:54:40.1290743Z @%p82 mbarrier.inval.shared::cta.b64 [%r531]; 2026-02-21T08:54:40.1290801Z // end inline asm 2026-02-21T08:54:40.1290965Z .loc 1 30 4 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:30:4 2026-02-21T08:54:40.1291017Z bar.sync 0, 128; 2026-02-21T08:54:40.1291070Z // begin inline asm 2026-02-21T08:54:40.1291191Z @%p42 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r766, 128; 2026-02-21T08:54:40.1291245Z // end inline asm 2026-02-21T08:54:40.1291320Z st.shared.b32 [global_smem+77856], 50529027; 2026-02-21T08:54:40.1291380Z barrier.sync 1; 2026-02-21T08:54:40.1291461Z $L__BB0_25: // %common.ret 2026-02-21T08:54:40.1291621Z .loc 1 0 0 // chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py:0 2026-02-21T08:54:40.1291676Z ret; 2026-02-21T08:54:40.1291749Z $L__tmp0: 2026-02-21T08:54:40.1291804Z $L__func_end0: 2026-02-21T08:54:40.1291883Z // -- End function 2026-02-21T08:54:40.1291937Z } 2026-02-21T08:54:40.1292136Z .file 1 "/tmp/torchinductor_root/hz/chzsf5dwca7atvb4ixejrcdo24ubblxbwncw4rcasrfccgowjhhg.py" 2026-02-21T08:54:40.1292195Z .section .debug_abbrev 2026-02-21T08:54:40.1292249Z { 2026-02-21T08:54:40.1292333Z .b8 1 // Abbreviation Code 2026-02-21T08:54:40.1292417Z .b8 17 // 
DW_TAG_compile_unit 2026-02-21T08:54:40.1292522Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:40.1292596Z .b8 37 // DW_AT_producer 2026-02-21T08:54:40.1292668Z .b8 8 // DW_FORM_string 2026-02-21T08:54:40.1292740Z .b8 19 // DW_AT_language 2026-02-21T08:54:40.1292820Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:40.1292892Z .b8 3 // DW_AT_name 2026-02-21T08:54:40.1292961Z .b8 8 // DW_FORM_string 2026-02-21T08:54:40.1293044Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:40.1293115Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:40.1293186Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:40.1293282Z .b8 8 // DW_FORM_string 2026-02-21T08:54:40.1293353Z .b8 0 // EOM(1) 2026-02-21T08:54:40.1293419Z .b8 0 // EOM(2) 2026-02-21T08:54:40.1293483Z .b8 0 // EOM(3) 2026-02-21T08:54:40.1293539Z } 2026-02-21T08:54:40.1293597Z .section .debug_info 2026-02-21T08:54:40.1293646Z { 2026-02-21T08:54:40.1293731Z .b32 104 // Length of Unit 2026-02-21T08:54:40.1293812Z .b8 2 // DWARF version number 2026-02-21T08:54:40.1293863Z .b8 0 2026-02-21T08:54:40.1293973Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:54:40.1294065Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:40.1294160Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:40.1294236Z .b8 116 // DW_AT_producer 2026-02-21T08:54:40.1294295Z .b8 114 2026-02-21T08:54:40.1294347Z .b8 105 2026-02-21T08:54:40.1294398Z .b8 116 2026-02-21T08:54:40.1294452Z .b8 111 2026-02-21T08:54:40.1294501Z .b8 110 2026-02-21T08:54:40.1294549Z .b8 0 2026-02-21T08:54:40.1294619Z .b8 2 // DW_AT_language 2026-02-21T08:54:40.1294750Z .b8 0 2026-02-21T08:54:40.1294821Z .b8 99 // DW_AT_name 2026-02-21T08:54:40.1294870Z .b8 104 2026-02-21T08:54:40.1294924Z .b8 122 2026-02-21T08:54:40.1294970Z .b8 115 2026-02-21T08:54:40.1295017Z .b8 102 2026-02-21T08:54:40.1295066Z .b8 53 2026-02-21T08:54:40.1295121Z .b8 100 2026-02-21T08:54:40.1295168Z .b8 119 2026-02-21T08:54:40.1295217Z .b8 99 2026-02-21T08:54:40.1295273Z .b8 97 2026-02-21T08:54:40.1295321Z .b8 55 2026-02-21T08:54:40.1295368Z .b8 97 2026-02-21T08:54:40.1295415Z .b8 116 2026-02-21T08:54:40.1295472Z .b8 118 2026-02-21T08:54:40.1295521Z .b8 98 2026-02-21T08:54:40.1295568Z .b8 52 2026-02-21T08:54:40.1295619Z .b8 105 2026-02-21T08:54:40.1295676Z .b8 120 2026-02-21T08:54:40.1295727Z .b8 101 2026-02-21T08:54:40.1295776Z .b8 106 2026-02-21T08:54:40.1295835Z .b8 114 2026-02-21T08:54:40.1295884Z .b8 99 2026-02-21T08:54:40.1295934Z .b8 100 2026-02-21T08:54:40.1295984Z .b8 111 2026-02-21T08:54:40.1296039Z .b8 50 2026-02-21T08:54:40.1296087Z .b8 52 2026-02-21T08:54:40.1296136Z .b8 117 2026-02-21T08:54:40.1296188Z .b8 98 2026-02-21T08:54:40.1296235Z .b8 98 2026-02-21T08:54:40.1296282Z .b8 108 2026-02-21T08:54:40.1296328Z .b8 120 2026-02-21T08:54:40.1296384Z .b8 98 2026-02-21T08:54:40.1296430Z .b8 119 2026-02-21T08:54:40.1296505Z .b8 110 2026-02-21T08:54:40.1296554Z .b8 99 2026-02-21T08:54:40.1296610Z .b8 119 2026-02-21T08:54:40.1296657Z .b8 52 2026-02-21T08:54:40.1296705Z .b8 114 2026-02-21T08:54:40.1296757Z .b8 99 2026-02-21T08:54:40.1296803Z .b8 97 
2026-02-21T08:54:40.1296849Z .b8 115 2026-02-21T08:54:40.1296895Z .b8 114 2026-02-21T08:54:40.1296951Z .b8 102 2026-02-21T08:54:40.1296997Z .b8 99 2026-02-21T08:54:40.1297044Z .b8 99 2026-02-21T08:54:40.1297097Z .b8 103 2026-02-21T08:54:40.1297145Z .b8 111 2026-02-21T08:54:40.1297222Z .b8 119 2026-02-21T08:54:40.1297269Z .b8 106 2026-02-21T08:54:40.1297324Z .b8 104 2026-02-21T08:54:40.1297372Z .b8 104 2026-02-21T08:54:40.1297419Z .b8 103 2026-02-21T08:54:40.1297472Z .b8 46 2026-02-21T08:54:40.1297520Z .b8 112 2026-02-21T08:54:40.1297571Z .b8 121 2026-02-21T08:54:40.1297618Z .b8 0 2026-02-21T08:54:40.1297712Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:40.1297784Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:40.1297833Z .b8 116 2026-02-21T08:54:40.1297887Z .b8 109 2026-02-21T08:54:40.1297934Z .b8 112 2026-02-21T08:54:40.1297982Z .b8 47 2026-02-21T08:54:40.1298028Z .b8 116 2026-02-21T08:54:40.1298081Z .b8 111 2026-02-21T08:54:40.1298129Z .b8 114 2026-02-21T08:54:40.1298175Z .b8 99 2026-02-21T08:54:40.1298222Z .b8 104 2026-02-21T08:54:40.1298275Z .b8 105 2026-02-21T08:54:40.1298323Z .b8 110 2026-02-21T08:54:40.1298370Z .b8 100 2026-02-21T08:54:40.1298425Z .b8 117 2026-02-21T08:54:40.1298497Z .b8 99 2026-02-21T08:54:40.1298547Z .b8 116 2026-02-21T08:54:40.1298595Z .b8 111 2026-02-21T08:54:40.1298650Z .b8 114 2026-02-21T08:54:40.1298697Z .b8 95 2026-02-21T08:54:40.1298744Z .b8 114 2026-02-21T08:54:40.1298798Z .b8 111 2026-02-21T08:54:40.1298845Z .b8 111 2026-02-21T08:54:40.1298895Z .b8 116 2026-02-21T08:54:40.1298941Z .b8 47 2026-02-21T08:54:40.1298996Z .b8 104 2026-02-21T08:54:40.1299044Z .b8 122 2026-02-21T08:54:40.1299091Z .b8 0 2026-02-21T08:54:40.1299137Z } 2026-02-21T08:54:40.1299206Z .section .debug_macinfo { } 2026-02-21T08:54:40.1299210Z 2026-02-21T08:54:40.1299284Z ================================================================ 2026-02-21T08:54:40.1299382Z please share the reproducer above with Triton project. 
2026-02-21T08:54:41.2850563Z 2026-02-21T08:54:41.2855649Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 36/36 13.9 configs/s 2026-02-21T08:54:41.5945827Z Generation 16: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2816.2 2026-02-21T08:54:41.5950358Z configs/s 2026-02-21T08:54:41.6472242Z [387s] Generation 16 complete: 2026-02-21T08:54:41.6472494Z error=8 2026-02-21T08:54:41.6472644Z ok=29 2026-02-21T08:54:41.6472771Z min=0.1033 2026-02-21T08:54:41.6482342Z mid=0.2356 2026-02-21T08:54:41.6482711Z max=27.9235 2026-02-21T08:54:41.6482877Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:54:41.6483176Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:54:41.6483388Z 'l2_groupings': [64], 2026-02-21T08:54:41.6483589Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:54:41.6483782Z 'loop_orders': [[0, 1]], 2026-02-21T08:54:41.6483945Z 'num_stages': 3, 2026-02-21T08:54:41.6484085Z 'num_warps': 8, 2026-02-21T08:54:41.6484232Z 'pid_type': 'flat', 2026-02-21T08:54:41.6484382Z 'range_flattens': [None, None], 2026-02-21T08:54:41.6484566Z 'range_multi_buffers': [None, None], 2026-02-21T08:54:41.6484802Z 'range_num_stages': [0, 0], 2026-02-21T08:54:41.6484977Z 'range_unroll_factors': [0, 0], 2026-02-21T08:54:41.6485159Z 'range_warp_specializes': [None, True]} 2026-02-21T08:54:41.6507143Z [387s] Fitting surrogate: 1154 points, 1154 targets 2026-02-21T08:54:42.1116012Z [387s] Generation 17 starting: 20 neighbors, 1 active search path(s) 2026-02-21T08:54:47.9540171Z Generation 17: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21/21 1.2 configs/s 2026-02-21T08:54:49.0276218Z Generation 17: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 21/21 20.6 configs/s 2026-02-21T08:54:49.0280756Z [394s] Generation 17 complete: 2026-02-21T08:54:49.0280952Z error=5 2026-02-21T08:54:49.0281079Z ok=17 2026-02-21T08:54:49.0281211Z min=0.1033 2026-02-21T08:54:49.0281341Z mid=0.3113 2026-02-21T08:54:49.0281477Z max=13.9727 2026-02-21T08:54:49.0281615Z 
best={'block_sizes': [256, 256, 64], 2026-02-21T08:54:49.0281831Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:54:49.0282069Z 'l2_groupings': [64], 2026-02-21T08:54:49.0282250Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:54:49.0282461Z 'loop_orders': [[0, 1]], 2026-02-21T08:54:49.0283741Z 'num_stages': 3, 2026-02-21T08:54:49.0283926Z 'num_warps': 8, 2026-02-21T08:54:49.0284092Z 'pid_type': 'flat', 2026-02-21T08:54:49.0289804Z 'range_flattens': [None, None], 2026-02-21T08:54:49.0291264Z 'range_multi_buffers': [None, None], 2026-02-21T08:54:49.0291500Z 'range_num_stages': [0, 0], 2026-02-21T08:54:49.0291694Z 'range_unroll_factors': [0, 0], 2026-02-21T08:54:49.0291896Z 'range_warp_specializes': [None, True]} 2026-02-21T08:54:49.0317515Z [394s] Fitting surrogate: 1176 points, 1176 targets 2026-02-21T08:54:49.5330953Z [395s] Generation 18 starting: 19 neighbors, 1 active search path(s) 2026-02-21T08:54:52.4104118Z Generation 18: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20/20 3.9 configs/s 2026-02-21T08:54:52.9626907Z [398s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:54:52.9627215Z 2026-02-21T08:54:52.9632396Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T08:54:52.9633560Z 2026-02-21T08:54:52.9633872Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:52.9634113Z `ptxas` stderr: 2026-02-21T08:54:52.9634538Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 286 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:52.9635094Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:52.9635245Z 2026-02-21T08:54:52.9635640Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpfs9k0bjz.ptx -o /tmp/tmpfs9k0bjz.ptx.o 2026-02-21T08:54:52.9636109Z 2026-02-21T08:54:52.9636240Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:54:52.9636507Z ================================================================ 2026-02-21T08:54:52.9636834Z Internal Triton PTX codegen error 2026-02-21T08:54:52.9637035Z `ptxas` stderr: 2026-02-21T08:54:52.9637446Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 286 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:54:52.9637913Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:52.9638054Z 2026-02-21T08:54:52.9638417Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpfs9k0bjz.ptx -o /tmp/tmpfs9k0bjz.ptx.o 2026-02-21T08:54:52.9638847Z 2026-02-21T08:54:52.9638853Z 2026-02-21T08:54:52.9638910Z // 2026-02-21T08:54:52.9639063Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:52.9639230Z // 2026-02-21T08:54:52.9639298Z 2026-02-21T08:54:52.9639363Z .version 8.7 2026-02-21T08:54:52.9639497Z .target sm_100a 2026-02-21T08:54:52.9639639Z .address_size 64 2026-02-21T08:54:52.9639722Z 2026-02-21T08:54:52.9639841Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:52.9640094Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:52.9640358Z // @_helion_matmul 2026-02-21T08:54:52.9640557Z .visible .entry _helion_matmul( 2026-02-21T08:54:52.9640780Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:52.9641032Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:52.9641288Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:52.9641531Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:52.9641787Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:52.9642070Z ) 2026-02-21T08:54:52.9642190Z .reqntid 256 2026-02-21T08:54:52.9642325Z .maxnreg 32 2026-02-21T08:54:52.9642447Z { 2026-02-21T08:54:52.9642586Z .reg .pred %p<124>; 2026-02-21T08:54:52.9642738Z .reg .b32 %r<675>; 2026-02-21T08:54:52.9642890Z .reg .b64 %rd<396>; 2026-02-21T08:54:52.9643145Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19:0 2026-02-21T08:54:52.9643439Z $L__func_begin0: 2026-02-21T08:54:52.9643686Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19:0 2026-02-21T08:54:52.9643922Z 
2026-02-21T08:54:52.9643974Z // %bb.0: 2026-02-21T08:54:52.9644135Z ld.param.b64 %rd23, [_helion_matmul_param_3]; 2026-02-21T08:54:52.9644320Z $L__tmp0: 2026-02-21T08:54:52.9644553Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19 2026-02-21T08:54:52.9644914Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:52.9645075Z shr.u32 %r2, %r1, 5; 2026-02-21T08:54:52.9645231Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T08:54:52.9645426Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T08:54:52.9645585Z @%p1 bra $L__BB0_12; 2026-02-21T08:54:52.9645736Z bra.uni $L__BB0_1; 2026-02-21T08:54:52.9645885Z $L__BB0_12: 2026-02-21T08:54:52.9646117Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0:0 2026-02-21T08:54:52.9646432Z ld.param.b64 %rd22, [_helion_matmul_param_2]; 2026-02-21T08:54:52.9646644Z ld.param.b64 %rd21, [_helion_matmul_param_1]; 2026-02-21T08:54:52.9646856Z ld.param.b64 %rd20, [_helion_matmul_param_0]; 2026-02-21T08:54:52.9647179Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19 2026-02-21T08:54:52.9647506Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:52.9647704Z setp.lt.u32 %p48, %r1, 32; 2026-02-21T08:54:52.9647868Z mov.b32 %r128, global_smem; 2026-02-21T08:54:52.9648036Z // begin inline asm 2026-02-21T08:54:52.9648283Z @%p48 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r128], 128; 2026-02-21T08:54:52.9648543Z // end inline asm 2026-02-21T08:54:52.9648693Z bar.sync 0, 128; 2026-02-21T08:54:52.9648848Z ld.shared.b32 %r670, [global_smem]; 2026-02-21T08:54:52.9649055Z bar.sync 0, 128; 2026-02-21T08:54:52.9649202Z // begin inline asm 2026-02-21T08:54:52.9649412Z @%p48 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:52.9649638Z // end inline asm 2026-02-21T08:54:52.9649925Z .loc 1 21 67 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:21:67 2026-02-21T08:54:52.9650213Z mov.u32 %r18, %ctaid.x; 2026-02-21T08:54:52.9650374Z mov.u32 %r153, %ctaid.y; 
2026-02-21T08:54:52.9650531Z mov.u32 %r154, %ctaid.z; 2026-02-21T08:54:52.9650698Z mov.u32 %r155, %nctaid.x; 2026-02-21T08:54:52.9650856Z mov.u32 %r156, %nctaid.y; 2026-02-21T08:54:52.9651033Z mad.lo.s32 %r157, %r154, %r156, %r153; 2026-02-21T08:54:52.9651227Z mad.lo.s32 %r158, %r157, %r155, %r18; 2026-02-21T08:54:52.9651414Z mul.lo.s32 %r159, %r158, 384; 2026-02-21T08:54:52.9651596Z cvt.s64.s32 %rd136, %r159; 2026-02-21T08:54:52.9651766Z add.s64 %rd97, %rd23, %rd136; 2026-02-21T08:54:52.9651938Z shl.b32 %r160, %r1, 2; 2026-02-21T08:54:52.9652097Z add.s32 %r129, %r128, %r160; 2026-02-21T08:54:52.9652266Z mov.b32 %r164, 0; 2026-02-21T08:54:52.9652407Z // begin inline asm 2026-02-21T08:54:52.9652573Z @%p48 st.shared.b32 [ %r129 + 0 ], %r164; 2026-02-21T08:54:52.9652788Z // end inline asm 2026-02-21T08:54:52.9652945Z bar.warp.sync -1; 2026-02-21T08:54:52.9653108Z setp.eq.b32 %p113, %r1, 0; 2026-02-21T08:54:52.9653273Z cvt.u64.u32 %rd82, %r128; 2026-02-21T08:54:52.9653437Z // begin inline asm 2026-02-21T08:54:52.9653701Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd20; 2026-02-21T08:54:52.9654013Z // end inline asm 2026-02-21T08:54:52.9654154Z // begin inline asm 2026-02-21T08:54:52.9654398Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:52.9654723Z // end inline asm 2026-02-21T08:54:52.9654874Z mov.b32 %r131, 64; 2026-02-21T08:54:52.9655027Z // begin inline asm 2026-02-21T08:54:52.9655278Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r131; 2026-02-21T08:54:52.9655570Z // end inline asm 2026-02-21T08:54:52.9655710Z mov.b32 %r132, 128; 2026-02-21T08:54:52.9655867Z // begin inline asm 2026-02-21T08:54:52.9656113Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r132; 2026-02-21T08:54:52.9656406Z // end inline asm 2026-02-21T08:54:52.9656552Z mov.b32 %r133, 2048; 2026-02-21T08:54:52.9656697Z // begin inline asm 
2026-02-21T08:54:52.9656956Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:52.9657240Z // end inline asm 2026-02-21T08:54:52.9657390Z // begin inline asm 2026-02-21T08:54:52.9657675Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r133; 2026-02-21T08:54:52.9657964Z // end inline asm 2026-02-21T08:54:52.9658111Z mov.b64 %rd90, 4096; 2026-02-21T08:54:52.9658253Z // begin inline asm 2026-02-21T08:54:52.9658522Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:52.9658798Z // end inline asm 2026-02-21T08:54:52.9658934Z mov.b32 %r135, 1; 2026-02-21T08:54:52.9659064Z // begin inline asm 2026-02-21T08:54:52.9659320Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:52.9659606Z // end inline asm 2026-02-21T08:54:52.9659735Z // begin inline asm 2026-02-21T08:54:52.9659990Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:52.9660269Z // end inline asm 2026-02-21T08:54:52.9660405Z // begin inline asm 2026-02-21T08:54:52.9660631Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:52.9660901Z // end inline asm 2026-02-21T08:54:52.9661037Z // begin inline asm 2026-02-21T08:54:52.9661279Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9661592Z // end inline asm 2026-02-21T08:54:52.9661721Z // begin inline asm 2026-02-21T08:54:52.9661960Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:52.9662218Z // end inline asm 2026-02-21T08:54:52.9662356Z // begin inline asm 2026-02-21T08:54:52.9662583Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9662841Z // end inline asm 
2026-02-21T08:54:52.9662976Z // begin inline asm 2026-02-21T08:54:52.9663319Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd97 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:52.9663689Z // end inline asm 2026-02-21T08:54:52.9663848Z // begin inline asm 2026-02-21T08:54:52.9664055Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd97 + 0 ], 0x80; 2026-02-21T08:54:52.9664312Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:52.9664498Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:52.9664714Z // end inline asm 2026-02-21T08:54:52.9664847Z bar.sync 0, 128; 2026-02-21T08:54:52.9665100Z .loc 1 22 68 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:22:68 2026-02-21T08:54:52.9665388Z add.s32 %r161, %r159, 128; 2026-02-21T08:54:52.9665576Z cvt.s64.s32 %rd137, %r161; 2026-02-21T08:54:52.9665743Z add.s64 %rd115, %rd23, %rd137; 2026-02-21T08:54:52.9665897Z bar.sync 0, 128; 2026-02-21T08:54:52.9666034Z // begin inline asm 2026-02-21T08:54:52.9666182Z @%p48 st.shared.b32 [ %r129 + 0 ], %r164; 2026-02-21T08:54:52.9666361Z // end inline asm 2026-02-21T08:54:52.9666496Z bar.warp.sync -1; 2026-02-21T08:54:52.9666637Z // begin inline asm 2026-02-21T08:54:52.9666886Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd21; 2026-02-21T08:54:52.9667191Z // end inline asm 2026-02-21T08:54:52.9667330Z // begin inline asm 2026-02-21T08:54:52.9667549Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:52.9667805Z // end inline asm 2026-02-21T08:54:52.9667933Z // begin inline asm 2026-02-21T08:54:52.9668169Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r131; 2026-02-21T08:54:52.9668442Z // end inline asm 2026-02-21T08:54:52.9668571Z // begin inline asm 2026-02-21T08:54:52.9668804Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r132; 2026-02-21T08:54:52.9669061Z // end 
inline asm 2026-02-21T08:54:52.9669197Z // begin inline asm 2026-02-21T08:54:52.9669434Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r133; 2026-02-21T08:54:52.9669716Z // end inline asm 2026-02-21T08:54:52.9669847Z mov.b32 %r142, 12288; 2026-02-21T08:54:52.9670021Z // begin inline asm 2026-02-21T08:54:52.9670267Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r142; 2026-02-21T08:54:52.9670539Z // end inline asm 2026-02-21T08:54:52.9670672Z // begin inline asm 2026-02-21T08:54:52.9670917Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T08:54:52.9671202Z // end inline asm 2026-02-21T08:54:52.9671328Z // begin inline asm 2026-02-21T08:54:52.9671584Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:52.9671877Z // end inline asm 2026-02-21T08:54:52.9672006Z // begin inline asm 2026-02-21T08:54:52.9672257Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:52.9672546Z // end inline asm 2026-02-21T08:54:52.9672680Z // begin inline asm 2026-02-21T08:54:52.9672905Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:52.9673172Z // end inline asm 2026-02-21T08:54:52.9673300Z // begin inline asm 2026-02-21T08:54:52.9673553Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9673839Z // end inline asm 2026-02-21T08:54:52.9673998Z // begin inline asm 2026-02-21T08:54:52.9674234Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:52.9674496Z // end inline asm 2026-02-21T08:54:52.9674629Z // begin inline asm 2026-02-21T08:54:52.9674883Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9675152Z // end inline asm 
2026-02-21T08:54:52.9675288Z // begin inline asm 2026-02-21T08:54:52.9675627Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd115 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:52.9676015Z // end inline asm 2026-02-21T08:54:52.9676143Z // begin inline asm 2026-02-21T08:54:52.9676359Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd115 + 0 ], 0x80; 2026-02-21T08:54:52.9676617Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:52.9676810Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:52.9676987Z // end inline asm 2026-02-21T08:54:52.9677116Z bar.sync 0, 128; 2026-02-21T08:54:52.9677364Z .loc 1 24 73 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:24:73 2026-02-21T08:54:52.9677659Z add.s32 %r162, %r159, 256; 2026-02-21T08:54:52.9677870Z cvt.s64.s32 %rd138, %r162; 2026-02-21T08:54:52.9678028Z add.s64 %rd19, %rd23, %rd138; 2026-02-21T08:54:52.9678189Z bar.sync 0, 128; 2026-02-21T08:54:52.9678319Z // begin inline asm 2026-02-21T08:54:52.9678475Z @%p48 st.shared.b32 [ %r129 + 0 ], %r164; 2026-02-21T08:54:52.9678651Z // end inline asm 2026-02-21T08:54:52.9678783Z bar.warp.sync -1; 2026-02-21T08:54:52.9678922Z // begin inline asm 2026-02-21T08:54:52.9679166Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd22; 2026-02-21T08:54:52.9679485Z // end inline asm 2026-02-21T08:54:52.9679613Z // begin inline asm 2026-02-21T08:54:52.9679834Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T08:54:52.9680081Z // end inline asm 2026-02-21T08:54:52.9680216Z // begin inline asm 2026-02-21T08:54:52.9680450Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r131; 2026-02-21T08:54:52.9680710Z // end inline asm 2026-02-21T08:54:52.9680846Z // begin inline asm 2026-02-21T08:54:52.9681072Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r132; 2026-02-21T08:54:52.9681341Z // end 
inline asm 2026-02-21T08:54:52.9681468Z // begin inline asm 2026-02-21T08:54:52.9681712Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r142; 2026-02-21T08:54:52.9681988Z // end inline asm 2026-02-21T08:54:52.9682115Z // begin inline asm 2026-02-21T08:54:52.9682382Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r133; 2026-02-21T08:54:52.9682655Z // end inline asm 2026-02-21T08:54:52.9682794Z mov.b64 %rd126, 24576; 2026-02-21T08:54:52.9682937Z // begin inline asm 2026-02-21T08:54:52.9683191Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd126; 2026-02-21T08:54:52.9683470Z // end inline asm 2026-02-21T08:54:52.9683607Z // begin inline asm 2026-02-21T08:54:52.9683864Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r135; 2026-02-21T08:54:52.9684147Z // end inline asm 2026-02-21T08:54:52.9684285Z // begin inline asm 2026-02-21T08:54:52.9684532Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r135; 2026-02-21T08:54:52.9684850Z // end inline asm 2026-02-21T08:54:52.9684978Z // begin inline asm 2026-02-21T08:54:52.9685213Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T08:54:52.9685479Z // end inline asm 2026-02-21T08:54:52.9685608Z // begin inline asm 2026-02-21T08:54:52.9685859Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9686131Z // end inline asm 2026-02-21T08:54:52.9686316Z // begin inline asm 2026-02-21T08:54:52.9686544Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T08:54:52.9686818Z // end inline asm 2026-02-21T08:54:52.9686945Z // begin inline asm 2026-02-21T08:54:52.9687173Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T08:54:52.9687434Z // end inline asm 
2026-02-21T08:54:52.9687560Z // begin inline asm 2026-02-21T08:54:52.9687900Z @%p48 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T08:54:52.9688262Z // end inline asm 2026-02-21T08:54:52.9688403Z // begin inline asm 2026-02-21T08:54:52.9688607Z @%p48 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T08:54:52.9688864Z @%p48 cp.async.bulk.commit_group ; 2026-02-21T08:54:52.9689054Z @%p48 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:52.9689223Z // end inline asm 2026-02-21T08:54:52.9689359Z bar.sync 0, 128; 2026-02-21T08:54:52.9689596Z .loc 1 33 75 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:33:75 2026-02-21T08:54:52.9689885Z setp.gt.u32 %p104, %r18, 1535; 2026-02-21T08:54:52.9690047Z @%p104 bra $L__BB0_14; 2026-02-21T08:54:52.9690243Z // %bb.13: // %.lr.ph 2026-02-21T08:54:52.9690534Z .loc 1 24 73 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:24:73 2026-02-21T08:54:52.9690825Z cvta.global.u64 %rd139, %rd19; 2026-02-21T08:54:52.9690998Z setp.lt.u32 %p121, %r1, 64; 2026-02-21T08:54:52.9691151Z shl.b32 %r447, %r1, 7; 2026-02-21T08:54:52.9691306Z and.b32 %r448, %r447, 16256; 2026-02-21T08:54:52.9691456Z shl.b32 %r449, %r1, 4; 2026-02-21T08:54:52.9691636Z and.b32 %r450, %r449, 112; 2026-02-21T08:54:52.9691785Z or.b32 %r451, %r448, %r450; 2026-02-21T08:54:52.9691944Z xor.b32 %r452, %r451, 112; 2026-02-21T08:54:52.9692091Z add.s32 %r454, %r128, %r452; 2026-02-21T08:54:52.9692250Z xor.b32 %r455, %r451, 96; 2026-02-21T08:54:52.9692406Z add.s32 %r456, %r128, %r455; 2026-02-21T08:54:52.9692554Z xor.b32 %r457, %r451, 80; 2026-02-21T08:54:52.9692707Z add.s32 %r458, %r128, %r457; 2026-02-21T08:54:52.9692855Z xor.b32 %r459, %r451, 64; 2026-02-21T08:54:52.9693017Z add.s32 %r460, %r128, %r459; 2026-02-21T08:54:52.9693168Z xor.b32 %r461, %r451, 48; 2026-02-21T08:54:52.9693326Z add.s32 %r462, %r128, %r461; 2026-02-21T08:54:52.9693478Z xor.b32 
%r463, %r451, 32; 2026-02-21T08:54:52.9693635Z add.s32 %r464, %r128, %r463; 2026-02-21T08:54:52.9693795Z xor.b32 %r465, %r451, 16; 2026-02-21T08:54:52.9693949Z add.s32 %r466, %r128, %r465; 2026-02-21T08:54:52.9694115Z add.s32 %r467, %r128, %r451; 2026-02-21T08:54:52.9694408Z .loc 1 44 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:44:27 2026-02-21T08:54:52.9694744Z shl.b32 %r468, %r18, 7; 2026-02-21T08:54:52.9694897Z and.b32 %r445, %r468, 1920; 2026-02-21T08:54:52.9695166Z .loc 1 45 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:45:27 2026-02-21T08:54:52.9695448Z shl.b32 %r469, %r18, 3; 2026-02-21T08:54:52.9695603Z and.b32 %r470, %r469, 16256; 2026-02-21T08:54:52.9695875Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9696170Z shfl.sync.idx.b32 %r471, %r2, 0, 31, -1; 2026-02-21T08:54:52.9696362Z shl.b32 %r472, %r471, 21; 2026-02-21T08:54:52.9696517Z and.b32 %r473, %r472, 6291456; 2026-02-21T08:54:52.9696685Z add.s32 %r163, %r473, %r670; 2026-02-21T08:54:52.9696841Z mov.pred %p105, -1; 2026-02-21T08:54:52.9696998Z // begin inline asm 2026-02-21T08:54:52.9697376Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 0], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9697788Z // end inline asm 2026-02-21T08:54:52.9697935Z // begin inline asm 2026-02-21T08:54:52.9698298Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 16], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9698738Z // end inline asm 2026-02-21T08:54:52.9698878Z // begin inline asm 2026-02-21T08:54:52.9699247Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 32], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9699650Z // end inline asm 
2026-02-21T08:54:52.9699786Z // begin inline asm 2026-02-21T08:54:52.9700155Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 48], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9700554Z // end inline asm 2026-02-21T08:54:52.9700697Z // begin inline asm 2026-02-21T08:54:52.9701056Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 64], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9701458Z // end inline asm 2026-02-21T08:54:52.9701593Z // begin inline asm 2026-02-21T08:54:52.9701962Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 80], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9702343Z // end inline asm 2026-02-21T08:54:52.9702471Z // begin inline asm 2026-02-21T08:54:52.9702818Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 96], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9703217Z // end inline asm 2026-02-21T08:54:52.9703349Z // begin inline asm 2026-02-21T08:54:52.9703727Z @%p105 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r163 + 112], {%r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164, %r164}; 2026-02-21T08:54:52.9704120Z // end inline asm 2026-02-21T08:54:52.9704256Z // begin inline asm 2026-02-21T08:54:52.9704402Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:52.9704566Z // end inline asm 2026-02-21T08:54:52.9704733Z bar.sync 0, 128; 2026-02-21T08:54:52.9704987Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9705279Z add.s32 %r299, %r128, 65648; 2026-02-21T08:54:52.9705429Z // begin inline asm 2026-02-21T08:54:52.9705600Z @%p113 mbarrier.init.shared::cta.b64 [%r299], 1; 
2026-02-21T08:54:52.9705789Z // end inline asm 2026-02-21T08:54:52.9705928Z add.s32 %r300, %r128, 65664; 2026-02-21T08:54:52.9706077Z // begin inline asm 2026-02-21T08:54:52.9706271Z @%p113 mbarrier.init.shared::cta.b64 [%r300], 1; 2026-02-21T08:54:52.9706457Z // end inline asm 2026-02-21T08:54:52.9706699Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9706982Z bar.sync 0, 128; 2026-02-21T08:54:52.9707111Z // begin inline asm 2026-02-21T08:54:52.9707288Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r299]; 2026-02-21T08:54:52.9707476Z // end inline asm 2026-02-21T08:54:52.9707718Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9707993Z bar.sync 0, 128; 2026-02-21T08:54:52.9708130Z add.s32 %r302, %r128, 65680; 2026-02-21T08:54:52.9708278Z // begin inline asm 2026-02-21T08:54:52.9708442Z @%p113 mbarrier.init.shared::cta.b64 [%r302], 1; 2026-02-21T08:54:52.9708626Z // end inline asm 2026-02-21T08:54:52.9708777Z st.shared.b32 [global_smem+65688], 33554689; 2026-02-21T08:54:52.9708981Z st.shared.b32 [global_smem+65536], %r670; 2026-02-21T08:54:52.9709196Z st.shared.v2.b32 [global_smem+65544], {%r445, %r470}; 2026-02-21T08:54:52.9709401Z barrier.sync 1; 2026-02-21T08:54:52.9709552Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:52.9709738Z barrier.sync 1; 2026-02-21T08:54:52.9709868Z barrier.sync 1; 2026-02-21T08:54:52.9710022Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T08:54:52.9710338Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9710614Z bar.sync 0, 128; 2026-02-21T08:54:52.9710751Z // begin inline asm 2026-02-21T08:54:52.9710879Z 2026-02-21T08:54:52.9710995Z { 2026-02-21T08:54:52.9711113Z .reg .pred complete; 2026-02-21T08:54:52.9711259Z waitLoop: 2026-02-21T08:54:52.9711440Z mbarrier.try_wait.parity.shared.b64 complete, [%r302], %r164; 2026-02-21T08:54:52.9711676Z @!complete bra.uni waitLoop; 
2026-02-21T08:54:52.9711824Z } 2026-02-21T08:54:52.9711892Z 2026-02-21T08:54:52.9711944Z // end inline asm 2026-02-21T08:54:52.9712183Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9712457Z bar.sync 0, 128; 2026-02-21T08:54:52.9712593Z // begin inline asm 2026-02-21T08:54:52.9712756Z @%p113 mbarrier.inval.shared::cta.b64 [%r302]; 2026-02-21T08:54:52.9712948Z // end inline asm 2026-02-21T08:54:52.9713077Z // begin inline asm 2026-02-21T08:54:52.9713242Z @%p113 mbarrier.inval.shared::cta.b64 [%r300]; 2026-02-21T08:54:52.9713429Z // end inline asm 2026-02-21T08:54:52.9713560Z // begin inline asm 2026-02-21T08:54:52.9713752Z @%p113 mbarrier.inval.shared::cta.b64 [%r299]; 2026-02-21T08:54:52.9713932Z // end inline asm 2026-02-21T08:54:52.9714177Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9714452Z // begin inline asm 2026-02-21T08:54:52.9714851Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323}, [%r163 + 0]; 2026-02-21T08:54:52.9715236Z // end inline asm 2026-02-21T08:54:52.9715398Z // begin inline asm 2026-02-21T08:54:52.9715765Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340}, [%r163 + 16]; 2026-02-21T08:54:52.9716158Z // end inline asm 2026-02-21T08:54:52.9716304Z // begin inline asm 2026-02-21T08:54:52.9716650Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357}, [%r163 + 32]; 2026-02-21T08:54:52.9717046Z // end inline asm 2026-02-21T08:54:52.9717194Z // begin inline asm 2026-02-21T08:54:52.9717550Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374}, [%r163 + 
48]; 2026-02-21T08:54:52.9717956Z // end inline asm 2026-02-21T08:54:52.9718097Z // begin inline asm 2026-02-21T08:54:52.9718475Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r376, %r377, %r378, %r379, %r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391}, [%r163 + 64]; 2026-02-21T08:54:52.9718859Z // end inline asm 2026-02-21T08:54:52.9718997Z // begin inline asm 2026-02-21T08:54:52.9719337Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408}, [%r163 + 80]; 2026-02-21T08:54:52.9719709Z // end inline asm 2026-02-21T08:54:52.9719846Z // begin inline asm 2026-02-21T08:54:52.9720182Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425}, [%r163 + 96]; 2026-02-21T08:54:52.9720562Z // end inline asm 2026-02-21T08:54:52.9720691Z // begin inline asm 2026-02-21T08:54:52.9721037Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442}, [%r163 + 112]; 2026-02-21T08:54:52.9721414Z // end inline asm 2026-02-21T08:54:52.9721546Z // begin inline asm 2026-02-21T08:54:52.9721700Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:52.9721859Z // end inline asm 2026-02-21T08:54:52.9722002Z cvt.u64.u32 %rd140, %r308; 2026-02-21T08:54:52.9722206Z cvt.u64.u32 %rd141, %r309; 2026-02-21T08:54:52.9722367Z shl.b64 %rd142, %rd141, 32; 2026-02-21T08:54:52.9722523Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T08:54:52.9722793Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9723080Z mov.b64 {%r474, %r475}, %rd143; 2026-02-21T08:54:52.9723248Z cvt.rn.f16x2.f32 %r476, %r475, %r474; 2026-02-21T08:54:52.9723525Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9723800Z cvt.u64.u32 %rd144, 
%r310; 2026-02-21T08:54:52.9723956Z cvt.u64.u32 %rd145, %r311; 2026-02-21T08:54:52.9724103Z shl.b64 %rd146, %rd145, 32; 2026-02-21T08:54:52.9724263Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T08:54:52.9724529Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9724834Z mov.b64 {%r477, %r478}, %rd147; 2026-02-21T08:54:52.9725007Z cvt.rn.f16x2.f32 %r479, %r478, %r477; 2026-02-21T08:54:52.9725279Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9725560Z cvt.u64.u32 %rd148, %r312; 2026-02-21T08:54:52.9725736Z cvt.u64.u32 %rd149, %r313; 2026-02-21T08:54:52.9725893Z shl.b64 %rd150, %rd149, 32; 2026-02-21T08:54:52.9726043Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T08:54:52.9726306Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9726584Z mov.b64 {%r480, %r481}, %rd151; 2026-02-21T08:54:52.9726746Z cvt.rn.f16x2.f32 %r482, %r481, %r480; 2026-02-21T08:54:52.9727022Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9727323Z cvt.u64.u32 %rd152, %r314; 2026-02-21T08:54:52.9727478Z cvt.u64.u32 %rd153, %r315; 2026-02-21T08:54:52.9727624Z shl.b64 %rd154, %rd153, 32; 2026-02-21T08:54:52.9727784Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T08:54:52.9728048Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9728322Z mov.b64 {%r483, %r484}, %rd155; 2026-02-21T08:54:52.9728495Z cvt.rn.f16x2.f32 %r485, %r484, %r483; 2026-02-21T08:54:52.9728764Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9729043Z cvt.u64.u32 %rd156, %r316; 2026-02-21T08:54:52.9729191Z cvt.u64.u32 %rd157, %r317; 2026-02-21T08:54:52.9729346Z shl.b64 %rd158, %rd157, 32; 2026-02-21T08:54:52.9729496Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T08:54:52.9729787Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9730072Z mov.b64 {%r486, %r487}, %rd159; 2026-02-21T08:54:52.9730233Z cvt.rn.f16x2.f32 %r488, %r487, %r486; 2026-02-21T08:54:52.9730505Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9730782Z cvt.u64.u32 %rd160, %r318; 2026-02-21T08:54:52.9730938Z cvt.u64.u32 %rd161, %r319; 2026-02-21T08:54:52.9731082Z shl.b64 %rd162, %rd161, 32; 2026-02-21T08:54:52.9731239Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T08:54:52.9731503Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9731777Z mov.b64 {%r489, %r490}, %rd163; 2026-02-21T08:54:52.9731947Z cvt.rn.f16x2.f32 %r491, %r490, %r489; 2026-02-21T08:54:52.9732218Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9732497Z cvt.u64.u32 %rd164, %r320; 2026-02-21T08:54:52.9732646Z cvt.u64.u32 %rd165, %r321; 2026-02-21T08:54:52.9732800Z shl.b64 %rd166, %rd165, 32; 2026-02-21T08:54:52.9732951Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T08:54:52.9733216Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9733528Z mov.b64 {%r492, %r493}, %rd167; 2026-02-21T08:54:52.9733695Z cvt.rn.f16x2.f32 %r494, %r493, %r492; 2026-02-21T08:54:52.9733975Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9734249Z cvt.u64.u32 %rd168, %r322; 2026-02-21T08:54:52.9734406Z cvt.u64.u32 %rd169, %r323; 2026-02-21T08:54:52.9734557Z shl.b64 %rd170, %rd169, 32; 2026-02-21T08:54:52.9734750Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T08:54:52.9735008Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9735278Z mov.b64 {%r495, %r496}, %rd171; 2026-02-21T08:54:52.9735448Z cvt.rn.f16x2.f32 %r497, %r496, %r495; 2026-02-21T08:54:52.9735711Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9735990Z cvt.u64.u32 %rd172, 
%r325; 2026-02-21T08:54:52.9736136Z cvt.u64.u32 %rd173, %r326; 2026-02-21T08:54:52.9736295Z shl.b64 %rd174, %rd173, 32; 2026-02-21T08:54:52.9736443Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T08:54:52.9736699Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9737007Z mov.b64 {%r498, %r499}, %rd175; 2026-02-21T08:54:52.9737176Z cvt.rn.f16x2.f32 %r500, %r499, %r498; 2026-02-21T08:54:52.9737461Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9737745Z cvt.u64.u32 %rd176, %r327; 2026-02-21T08:54:52.9737906Z cvt.u64.u32 %rd177, %r328; 2026-02-21T08:54:52.9738060Z shl.b64 %rd178, %rd177, 32; 2026-02-21T08:54:52.9738230Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T08:54:52.9738501Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9738815Z mov.b64 {%r501, %r502}, %rd179; 2026-02-21T08:54:52.9738994Z cvt.rn.f16x2.f32 %r503, %r502, %r501; 2026-02-21T08:54:52.9739271Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9739572Z cvt.u64.u32 %rd180, %r329; 2026-02-21T08:54:52.9739730Z cvt.u64.u32 %rd181, %r330; 2026-02-21T08:54:52.9739895Z shl.b64 %rd182, %rd181, 32; 2026-02-21T08:54:52.9740053Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T08:54:52.9740322Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9740624Z mov.b64 {%r504, %r505}, %rd183; 2026-02-21T08:54:52.9740789Z cvt.rn.f16x2.f32 %r506, %r505, %r504; 2026-02-21T08:54:52.9741115Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9741402Z cvt.u64.u32 %rd184, %r331; 2026-02-21T08:54:52.9741565Z cvt.u64.u32 %rd185, %r332; 2026-02-21T08:54:52.9741718Z shl.b64 %rd186, %rd185, 32; 2026-02-21T08:54:52.9741881Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T08:54:52.9742155Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9742446Z mov.b64 {%r507, %r508}, %rd187; 2026-02-21T08:54:52.9742621Z cvt.rn.f16x2.f32 %r509, %r508, %r507; 2026-02-21T08:54:52.9742900Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9743200Z cvt.u64.u32 %rd188, %r333; 2026-02-21T08:54:52.9743354Z cvt.u64.u32 %rd189, %r334; 2026-02-21T08:54:52.9743518Z shl.b64 %rd190, %rd189, 32; 2026-02-21T08:54:52.9743677Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T08:54:52.9743953Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9744250Z mov.b64 {%r510, %r511}, %rd191; 2026-02-21T08:54:52.9744413Z cvt.rn.f16x2.f32 %r512, %r511, %r510; 2026-02-21T08:54:52.9744713Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9744991Z cvt.u64.u32 %rd192, %r335; 2026-02-21T08:54:52.9745181Z cvt.u64.u32 %rd193, %r336; 2026-02-21T08:54:52.9745327Z shl.b64 %rd194, %rd193, 32; 2026-02-21T08:54:52.9745480Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T08:54:52.9745740Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9746009Z mov.b64 {%r513, %r514}, %rd195; 2026-02-21T08:54:52.9746172Z cvt.rn.f16x2.f32 %r515, %r514, %r513; 2026-02-21T08:54:52.9746435Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9746711Z cvt.u64.u32 %rd196, %r337; 2026-02-21T08:54:52.9746857Z cvt.u64.u32 %rd197, %r338; 2026-02-21T08:54:52.9747013Z shl.b64 %rd198, %rd197, 32; 2026-02-21T08:54:52.9747164Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T08:54:52.9747421Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9747703Z mov.b64 {%r516, %r517}, %rd199; 2026-02-21T08:54:52.9747863Z cvt.rn.f16x2.f32 %r518, %r517, %r516; 2026-02-21T08:54:52.9748132Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9748432Z cvt.u64.u32 %rd200, 
%r339; 2026-02-21T08:54:52.9748589Z cvt.u64.u32 %rd201, %r340; 2026-02-21T08:54:52.9748734Z shl.b64 %rd202, %rd201, 32; 2026-02-21T08:54:52.9748895Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T08:54:52.9749159Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9749433Z mov.b64 {%r519, %r520}, %rd203; 2026-02-21T08:54:52.9749600Z cvt.rn.f16x2.f32 %r521, %r520, %r519; 2026-02-21T08:54:52.9749863Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9750175Z cvt.u64.u32 %rd204, %r342; 2026-02-21T08:54:52.9750323Z cvt.u64.u32 %rd205, %r343; 2026-02-21T08:54:52.9750481Z shl.b64 %rd206, %rd205, 32; 2026-02-21T08:54:52.9750644Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T08:54:52.9750906Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9751195Z mov.b64 {%r522, %r523}, %rd207; 2026-02-21T08:54:52.9751356Z cvt.rn.f16x2.f32 %r524, %r523, %r522; 2026-02-21T08:54:52.9751628Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9751900Z cvt.u64.u32 %rd208, %r344; 2026-02-21T08:54:52.9752054Z cvt.u64.u32 %rd209, %r345; 2026-02-21T08:54:52.9752200Z shl.b64 %rd210, %rd209, 32; 2026-02-21T08:54:52.9752357Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T08:54:52.9752651Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9752936Z mov.b64 {%r525, %r526}, %rd211; 2026-02-21T08:54:52.9753102Z cvt.rn.f16x2.f32 %r527, %r526, %r525; 2026-02-21T08:54:52.9753367Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9753652Z cvt.u64.u32 %rd212, %r346; 2026-02-21T08:54:52.9753798Z cvt.u64.u32 %rd213, %r347; 2026-02-21T08:54:52.9753951Z shl.b64 %rd214, %rd213, 32; 2026-02-21T08:54:52.9754101Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T08:54:52.9754360Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9754640Z mov.b64 {%r528, %r529}, %rd215; 2026-02-21T08:54:52.9754838Z cvt.rn.f16x2.f32 %r530, %r529, %r528; 2026-02-21T08:54:52.9755113Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9755388Z cvt.u64.u32 %rd216, %r348; 2026-02-21T08:54:52.9755547Z cvt.u64.u32 %rd217, %r349; 2026-02-21T08:54:52.9755693Z shl.b64 %rd218, %rd217, 32; 2026-02-21T08:54:52.9755851Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T08:54:52.9756111Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9756430Z mov.b64 {%r531, %r532}, %rd219; 2026-02-21T08:54:52.9756595Z cvt.rn.f16x2.f32 %r533, %r532, %r531; 2026-02-21T08:54:52.9756855Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9757129Z cvt.u64.u32 %rd220, %r350; 2026-02-21T08:54:52.9757274Z cvt.u64.u32 %rd221, %r351; 2026-02-21T08:54:52.9757426Z shl.b64 %rd222, %rd221, 32; 2026-02-21T08:54:52.9757576Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T08:54:52.9757834Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9758110Z mov.b64 {%r534, %r535}, %rd223; 2026-02-21T08:54:52.9758271Z cvt.rn.f16x2.f32 %r536, %r535, %r534; 2026-02-21T08:54:52.9758540Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9758816Z cvt.u64.u32 %rd224, %r352; 2026-02-21T08:54:52.9758971Z cvt.u64.u32 %rd225, %r353; 2026-02-21T08:54:52.9759117Z shl.b64 %rd226, %rd225, 32; 2026-02-21T08:54:52.9759274Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T08:54:52.9759562Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9759845Z mov.b64 {%r537, %r538}, %rd227; 2026-02-21T08:54:52.9760013Z cvt.rn.f16x2.f32 %r539, %r538, %r537; 2026-02-21T08:54:52.9760278Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9760564Z cvt.u64.u32 %rd228, 
%r354; 2026-02-21T08:54:52.9760711Z cvt.u64.u32 %rd229, %r355; 2026-02-21T08:54:52.9760869Z shl.b64 %rd230, %rd229, 32; 2026-02-21T08:54:52.9761024Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T08:54:52.9761316Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9761600Z mov.b64 {%r540, %r541}, %rd231; 2026-02-21T08:54:52.9761760Z cvt.rn.f16x2.f32 %r542, %r541, %r540; 2026-02-21T08:54:52.9762031Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9762312Z cvt.u64.u32 %rd232, %r356; 2026-02-21T08:54:52.9762469Z cvt.u64.u32 %rd233, %r357; 2026-02-21T08:54:52.9762615Z shl.b64 %rd234, %rd233, 32; 2026-02-21T08:54:52.9762773Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T08:54:52.9763034Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9763314Z mov.b64 {%r543, %r544}, %rd235; 2026-02-21T08:54:52.9763480Z cvt.rn.f16x2.f32 %r545, %r544, %r543; 2026-02-21T08:54:52.9763791Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9764077Z cvt.u64.u32 %rd236, %r359; 2026-02-21T08:54:52.9764224Z cvt.u64.u32 %rd237, %r360; 2026-02-21T08:54:52.9764377Z shl.b64 %rd238, %rd237, 32; 2026-02-21T08:54:52.9764527Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T08:54:52.9764823Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9765113Z mov.b64 {%r546, %r547}, %rd239; 2026-02-21T08:54:52.9765275Z cvt.rn.f16x2.f32 %r548, %r547, %r546; 2026-02-21T08:54:52.9765557Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9765837Z cvt.u64.u32 %rd240, %r361; 2026-02-21T08:54:52.9765994Z cvt.u64.u32 %rd241, %r362; 2026-02-21T08:54:52.9766141Z shl.b64 %rd242, %rd241, 32; 2026-02-21T08:54:52.9766299Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T08:54:52.9766566Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9766845Z mov.b64 {%r549, %r550}, %rd243; 2026-02-21T08:54:52.9767012Z cvt.rn.f16x2.f32 %r551, %r550, %r549; 2026-02-21T08:54:52.9767279Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9767590Z cvt.u64.u32 %rd244, %r363; 2026-02-21T08:54:52.9767737Z cvt.u64.u32 %rd245, %r364; 2026-02-21T08:54:52.9767889Z shl.b64 %rd246, %rd245, 32; 2026-02-21T08:54:52.9768037Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T08:54:52.9768296Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9768576Z mov.b64 {%r552, %r553}, %rd247; 2026-02-21T08:54:52.9768734Z cvt.rn.f16x2.f32 %r554, %r553, %r552; 2026-02-21T08:54:52.9769003Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9769271Z cvt.u64.u32 %rd248, %r365; 2026-02-21T08:54:52.9769426Z cvt.u64.u32 %rd249, %r366; 2026-02-21T08:54:52.9769573Z shl.b64 %rd250, %rd249, 32; 2026-02-21T08:54:52.9769728Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T08:54:52.9769987Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9770260Z mov.b64 {%r555, %r556}, %rd251; 2026-02-21T08:54:52.9770426Z cvt.rn.f16x2.f32 %r557, %r556, %r555; 2026-02-21T08:54:52.9770697Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9771003Z cvt.u64.u32 %rd252, %r367; 2026-02-21T08:54:52.9771155Z cvt.u64.u32 %rd253, %r368; 2026-02-21T08:54:52.9771310Z shl.b64 %rd254, %rd253, 32; 2026-02-21T08:54:52.9771458Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T08:54:52.9771724Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9772017Z mov.b64 {%r558, %r559}, %rd255; 2026-02-21T08:54:52.9772183Z cvt.rn.f16x2.f32 %r560, %r559, %r558; 2026-02-21T08:54:52.9772456Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9772758Z cvt.u64.u32 %rd256, 
%r369; 2026-02-21T08:54:52.9772911Z cvt.u64.u32 %rd257, %r370; 2026-02-21T08:54:52.9773057Z shl.b64 %rd258, %rd257, 32; 2026-02-21T08:54:52.9773215Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T08:54:52.9773478Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9773762Z mov.b64 {%r561, %r562}, %rd259; 2026-02-21T08:54:52.9773928Z cvt.rn.f16x2.f32 %r563, %r562, %r561; 2026-02-21T08:54:52.9774189Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9774464Z cvt.u64.u32 %rd260, %r371; 2026-02-21T08:54:52.9774612Z cvt.u64.u32 %rd261, %r372; 2026-02-21T08:54:52.9774797Z shl.b64 %rd262, %rd261, 32; 2026-02-21T08:54:52.9774949Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T08:54:52.9775244Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9775524Z mov.b64 {%r564, %r565}, %rd263; 2026-02-21T08:54:52.9775683Z cvt.rn.f16x2.f32 %r566, %r565, %r564; 2026-02-21T08:54:52.9775953Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9776224Z cvt.u64.u32 %rd264, %r373; 2026-02-21T08:54:52.9776380Z cvt.u64.u32 %rd265, %r374; 2026-02-21T08:54:52.9776528Z shl.b64 %rd266, %rd265, 32; 2026-02-21T08:54:52.9776685Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T08:54:52.9776944Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9777216Z mov.b64 {%r567, %r568}, %rd267; 2026-02-21T08:54:52.9777381Z cvt.rn.f16x2.f32 %r569, %r568, %r567; 2026-02-21T08:54:52.9777640Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9777917Z cvt.u64.u32 %rd268, %r376; 2026-02-21T08:54:52.9778066Z cvt.u64.u32 %rd269, %r377; 2026-02-21T08:54:52.9778218Z shl.b64 %rd270, %rd269, 32; 2026-02-21T08:54:52.9778366Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T08:54:52.9778622Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9778930Z mov.b64 {%r570, %r571}, %rd271; 2026-02-21T08:54:52.9779088Z cvt.rn.f16x2.f32 %r572, %r571, %r570; 2026-02-21T08:54:52.9779358Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9779631Z cvt.u64.u32 %rd272, %r378; 2026-02-21T08:54:52.9779784Z cvt.u64.u32 %rd273, %r379; 2026-02-21T08:54:52.9779929Z shl.b64 %rd274, %rd273, 32; 2026-02-21T08:54:52.9780087Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T08:54:52.9780349Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9780634Z mov.b64 {%r573, %r574}, %rd275; 2026-02-21T08:54:52.9780810Z cvt.rn.f16x2.f32 %r575, %r574, %r573; 2026-02-21T08:54:52.9781088Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9781383Z cvt.u64.u32 %rd276, %r380; 2026-02-21T08:54:52.9781538Z cvt.u64.u32 %rd277, %r381; 2026-02-21T08:54:52.9781700Z shl.b64 %rd278, %rd277, 32; 2026-02-21T08:54:52.9781856Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T08:54:52.9782162Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9782466Z mov.b64 {%r576, %r577}, %rd279; 2026-02-21T08:54:52.9782636Z cvt.rn.f16x2.f32 %r578, %r577, %r576; 2026-02-21T08:54:52.9782929Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9783225Z cvt.u64.u32 %rd280, %r382; 2026-02-21T08:54:52.9783387Z cvt.u64.u32 %rd281, %r383; 2026-02-21T08:54:52.9783542Z shl.b64 %rd282, %rd281, 32; 2026-02-21T08:54:52.9783736Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T08:54:52.9784011Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9784297Z mov.b64 {%r579, %r580}, %rd283; 2026-02-21T08:54:52.9784474Z cvt.rn.f16x2.f32 %r581, %r580, %r579; 2026-02-21T08:54:52.9784785Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9785086Z cvt.u64.u32 %rd284, 
%r384; 2026-02-21T08:54:52.9785242Z cvt.u64.u32 %rd285, %r385; 2026-02-21T08:54:52.9785405Z shl.b64 %rd286, %rd285, 32; 2026-02-21T08:54:52.9785562Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T08:54:52.9785840Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9786134Z mov.b64 {%r582, %r583}, %rd287; 2026-02-21T08:54:52.9786301Z cvt.rn.f16x2.f32 %r584, %r583, %r582; 2026-02-21T08:54:52.9786618Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9786904Z cvt.u64.u32 %rd288, %r386; 2026-02-21T08:54:52.9787070Z cvt.u64.u32 %rd289, %r387; 2026-02-21T08:54:52.9787227Z shl.b64 %rd290, %rd289, 32; 2026-02-21T08:54:52.9787394Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T08:54:52.9787666Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9787950Z mov.b64 {%r585, %r586}, %rd291; 2026-02-21T08:54:52.9788128Z cvt.rn.f16x2.f32 %r587, %r586, %r585; 2026-02-21T08:54:52.9788409Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9788683Z cvt.u64.u32 %rd292, %r388; 2026-02-21T08:54:52.9788828Z cvt.u64.u32 %rd293, %r389; 2026-02-21T08:54:52.9788980Z shl.b64 %rd294, %rd293, 32; 2026-02-21T08:54:52.9789127Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T08:54:52.9789386Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9789670Z mov.b64 {%r588, %r589}, %rd295; 2026-02-21T08:54:52.9789827Z cvt.rn.f16x2.f32 %r590, %r589, %r588; 2026-02-21T08:54:52.9790102Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9790408Z cvt.u64.u32 %rd296, %r390; 2026-02-21T08:54:52.9790562Z cvt.u64.u32 %rd297, %r391; 2026-02-21T08:54:52.9790707Z shl.b64 %rd298, %rd297, 32; 2026-02-21T08:54:52.9790862Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T08:54:52.9791119Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9791397Z mov.b64 {%r591, %r592}, %rd299; 2026-02-21T08:54:52.9791562Z cvt.rn.f16x2.f32 %r593, %r592, %r591; 2026-02-21T08:54:52.9791823Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9792098Z cvt.u64.u32 %rd300, %r393; 2026-02-21T08:54:52.9792245Z cvt.u64.u32 %rd301, %r394; 2026-02-21T08:54:52.9792401Z shl.b64 %rd302, %rd301, 32; 2026-02-21T08:54:52.9792552Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T08:54:52.9792816Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9793096Z mov.b64 {%r594, %r595}, %rd303; 2026-02-21T08:54:52.9793256Z cvt.rn.f16x2.f32 %r596, %r595, %r594; 2026-02-21T08:54:52.9793557Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9793834Z cvt.u64.u32 %rd304, %r395; 2026-02-21T08:54:52.9793994Z cvt.u64.u32 %rd305, %r396; 2026-02-21T08:54:52.9794149Z shl.b64 %rd306, %rd305, 32; 2026-02-21T08:54:52.9794308Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T08:54:52.9794568Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9794889Z mov.b64 {%r597, %r598}, %rd307; 2026-02-21T08:54:52.9795057Z cvt.rn.f16x2.f32 %r599, %r598, %r597; 2026-02-21T08:54:52.9795350Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9795739Z cvt.u64.u32 %rd308, %r397; 2026-02-21T08:54:52.9795886Z cvt.u64.u32 %rd309, %r398; 2026-02-21T08:54:52.9796041Z shl.b64 %rd310, %rd309, 32; 2026-02-21T08:54:52.9796198Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T08:54:52.9796449Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9796740Z mov.b64 {%r600, %r601}, %rd311; 2026-02-21T08:54:52.9796900Z cvt.rn.f16x2.f32 %r602, %r601, %r600; 2026-02-21T08:54:52.9797173Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9797450Z cvt.u64.u32 %rd312, 
%r399; 2026-02-21T08:54:52.9797610Z cvt.u64.u32 %rd313, %r400; 2026-02-21T08:54:52.9797759Z shl.b64 %rd314, %rd313, 32; 2026-02-21T08:54:52.9797947Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T08:54:52.9798212Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9798484Z mov.b64 {%r603, %r604}, %rd315; 2026-02-21T08:54:52.9798649Z cvt.rn.f16x2.f32 %r605, %r604, %r603; 2026-02-21T08:54:52.9798911Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9799187Z cvt.u64.u32 %rd316, %r401; 2026-02-21T08:54:52.9799334Z cvt.u64.u32 %rd317, %r402; 2026-02-21T08:54:52.9799489Z shl.b64 %rd318, %rd317, 32; 2026-02-21T08:54:52.9799643Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T08:54:52.9799894Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9800170Z mov.b64 {%r606, %r607}, %rd319; 2026-02-21T08:54:52.9800328Z cvt.rn.f16x2.f32 %r608, %r607, %r606; 2026-02-21T08:54:52.9800595Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9800866Z cvt.u64.u32 %rd320, %r403; 2026-02-21T08:54:52.9801020Z cvt.u64.u32 %rd321, %r404; 2026-02-21T08:54:52.9801164Z shl.b64 %rd322, %rd321, 32; 2026-02-21T08:54:52.9801320Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T08:54:52.9801606Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9801877Z mov.b64 {%r609, %r610}, %rd323; 2026-02-21T08:54:52.9802042Z cvt.rn.f16x2.f32 %r611, %r610, %r609; 2026-02-21T08:54:52.9802304Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9802580Z cvt.u64.u32 %rd324, %r405; 2026-02-21T08:54:52.9802727Z cvt.u64.u32 %rd325, %r406; 2026-02-21T08:54:52.9802884Z shl.b64 %rd326, %rd325, 32; 2026-02-21T08:54:52.9803040Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T08:54:52.9803293Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9803572Z mov.b64 {%r612, %r613}, %rd327; 2026-02-21T08:54:52.9803735Z cvt.rn.f16x2.f32 %r614, %r613, %r612; 2026-02-21T08:54:52.9804005Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9804277Z cvt.u64.u32 %rd328, %r407; 2026-02-21T08:54:52.9804435Z cvt.u64.u32 %rd329, %r408; 2026-02-21T08:54:52.9804586Z shl.b64 %rd330, %rd329, 32; 2026-02-21T08:54:52.9804781Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T08:54:52.9805077Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9805357Z mov.b64 {%r615, %r616}, %rd331; 2026-02-21T08:54:52.9805521Z cvt.rn.f16x2.f32 %r617, %r616, %r615; 2026-02-21T08:54:52.9805790Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9806072Z cvt.u64.u32 %rd332, %r410; 2026-02-21T08:54:52.9806220Z cvt.u64.u32 %rd333, %r411; 2026-02-21T08:54:52.9806376Z shl.b64 %rd334, %rd333, 32; 2026-02-21T08:54:52.9806583Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T08:54:52.9806840Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9807135Z mov.b64 {%r618, %r619}, %rd335; 2026-02-21T08:54:52.9807302Z cvt.rn.f16x2.f32 %r620, %r619, %r618; 2026-02-21T08:54:52.9807579Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9807861Z cvt.u64.u32 %rd336, %r412; 2026-02-21T08:54:52.9808022Z cvt.u64.u32 %rd337, %r413; 2026-02-21T08:54:52.9808172Z shl.b64 %rd338, %rd337, 32; 2026-02-21T08:54:52.9808334Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T08:54:52.9808606Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9808887Z mov.b64 {%r621, %r622}, %rd339; 2026-02-21T08:54:52.9809062Z cvt.rn.f16x2.f32 %r623, %r622, %r621; 2026-02-21T08:54:52.9809363Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9809652Z cvt.u64.u32 %rd340, 
%r414; 2026-02-21T08:54:52.9809799Z cvt.u64.u32 %rd341, %r415; 2026-02-21T08:54:52.9809952Z shl.b64 %rd342, %rd341, 32; 2026-02-21T08:54:52.9810109Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T08:54:52.9810363Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9810649Z mov.b64 {%r624, %r625}, %rd343; 2026-02-21T08:54:52.9810807Z cvt.rn.f16x2.f32 %r626, %r625, %r624; 2026-02-21T08:54:52.9811080Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9811353Z cvt.u64.u32 %rd344, %r416; 2026-02-21T08:54:52.9811506Z cvt.u64.u32 %rd345, %r417; 2026-02-21T08:54:52.9811651Z shl.b64 %rd346, %rd345, 32; 2026-02-21T08:54:52.9811807Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T08:54:52.9812070Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9812345Z mov.b64 {%r627, %r628}, %rd347; 2026-02-21T08:54:52.9812511Z cvt.rn.f16x2.f32 %r629, %r628, %r627; 2026-02-21T08:54:52.9812780Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9813085Z cvt.u64.u32 %rd348, %r418; 2026-02-21T08:54:52.9813233Z cvt.u64.u32 %rd349, %r419; 2026-02-21T08:54:52.9813386Z shl.b64 %rd350, %rd349, 32; 2026-02-21T08:54:52.9813544Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T08:54:52.9813808Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9814093Z mov.b64 {%r630, %r631}, %rd351; 2026-02-21T08:54:52.9814252Z cvt.rn.f16x2.f32 %r632, %r631, %r630; 2026-02-21T08:54:52.9814525Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9814830Z cvt.u64.u32 %rd352, %r420; 2026-02-21T08:54:52.9814991Z cvt.u64.u32 %rd353, %r421; 2026-02-21T08:54:52.9815140Z shl.b64 %rd354, %rd353, 32; 2026-02-21T08:54:52.9815302Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T08:54:52.9815574Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9815861Z mov.b64 {%r633, %r634}, %rd355; 2026-02-21T08:54:52.9816029Z cvt.rn.f16x2.f32 %r635, %r634, %r633; 2026-02-21T08:54:52.9816334Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9816622Z cvt.u64.u32 %rd356, %r422; 2026-02-21T08:54:52.9816769Z cvt.u64.u32 %rd357, %r423; 2026-02-21T08:54:52.9816922Z shl.b64 %rd358, %rd357, 32; 2026-02-21T08:54:52.9817078Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T08:54:52.9817334Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9817618Z mov.b64 {%r636, %r637}, %rd359; 2026-02-21T08:54:52.9817808Z cvt.rn.f16x2.f32 %r638, %r637, %r636; 2026-02-21T08:54:52.9818077Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9818354Z cvt.u64.u32 %rd360, %r424; 2026-02-21T08:54:52.9818511Z cvt.u64.u32 %rd361, %r425; 2026-02-21T08:54:52.9818655Z shl.b64 %rd362, %rd361, 32; 2026-02-21T08:54:52.9818811Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T08:54:52.9819071Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9819339Z mov.b64 {%r639, %r640}, %rd363; 2026-02-21T08:54:52.9819503Z cvt.rn.f16x2.f32 %r641, %r640, %r639; 2026-02-21T08:54:52.9819760Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9820041Z cvt.u64.u32 %rd364, %r427; 2026-02-21T08:54:52.9820187Z cvt.u64.u32 %rd365, %r428; 2026-02-21T08:54:52.9820339Z shl.b64 %rd366, %rd365, 32; 2026-02-21T08:54:52.9820520Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T08:54:52.9820781Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9821069Z mov.b64 {%r642, %r643}, %rd367; 2026-02-21T08:54:52.9821233Z cvt.rn.f16x2.f32 %r644, %r643, %r642; 2026-02-21T08:54:52.9821519Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9821800Z cvt.u64.u32 %rd368, 
%r429; 2026-02-21T08:54:52.9821962Z cvt.u64.u32 %rd369, %r430; 2026-02-21T08:54:52.9822114Z shl.b64 %rd370, %rd369, 32; 2026-02-21T08:54:52.9822275Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T08:54:52.9822537Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9822822Z mov.b64 {%r645, %r646}, %rd371; 2026-02-21T08:54:52.9822991Z cvt.rn.f16x2.f32 %r647, %r646, %r645; 2026-02-21T08:54:52.9823259Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9823539Z cvt.u64.u32 %rd372, %r431; 2026-02-21T08:54:52.9823690Z cvt.u64.u32 %rd373, %r432; 2026-02-21T08:54:52.9823848Z shl.b64 %rd374, %rd373, 32; 2026-02-21T08:54:52.9824008Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T08:54:52.9824291Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9824588Z mov.b64 {%r648, %r649}, %rd375; 2026-02-21T08:54:52.9824786Z cvt.rn.f16x2.f32 %r650, %r649, %r648; 2026-02-21T08:54:52.9825076Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9825362Z cvt.u64.u32 %rd376, %r433; 2026-02-21T08:54:52.9825525Z cvt.u64.u32 %rd377, %r434; 2026-02-21T08:54:52.9825680Z shl.b64 %rd378, %rd377, 32; 2026-02-21T08:54:52.9825847Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T08:54:52.9826131Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9826422Z mov.b64 {%r651, %r652}, %rd379; 2026-02-21T08:54:52.9826603Z cvt.rn.f16x2.f32 %r653, %r652, %r651; 2026-02-21T08:54:52.9826887Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9827190Z cvt.u64.u32 %rd380, %r435; 2026-02-21T08:54:52.9827344Z cvt.u64.u32 %rd381, %r436; 2026-02-21T08:54:52.9827507Z shl.b64 %rd382, %rd381, 32; 2026-02-21T08:54:52.9827671Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T08:54:52.9827968Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 
2026-02-21T08:54:52.9828269Z mov.b64 {%r654, %r655}, %rd383; 2026-02-21T08:54:52.9828436Z cvt.rn.f16x2.f32 %r656, %r655, %r654; 2026-02-21T08:54:52.9828722Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9829007Z cvt.u64.u32 %rd384, %r437; 2026-02-21T08:54:52.9829169Z cvt.u64.u32 %rd385, %r438; 2026-02-21T08:54:52.9829351Z shl.b64 %rd386, %rd385, 32; 2026-02-21T08:54:52.9829518Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T08:54:52.9829793Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9830084Z mov.b64 {%r657, %r658}, %rd387; 2026-02-21T08:54:52.9830257Z cvt.rn.f16x2.f32 %r659, %r658, %r657; 2026-02-21T08:54:52.9830533Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9830826Z cvt.u64.u32 %rd388, %r439; 2026-02-21T08:54:52.9830980Z cvt.u64.u32 %rd389, %r440; 2026-02-21T08:54:52.9831138Z shl.b64 %rd390, %rd389, 32; 2026-02-21T08:54:52.9831302Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T08:54:52.9831572Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9831862Z mov.b64 {%r660, %r661}, %rd391; 2026-02-21T08:54:52.9832068Z cvt.rn.f16x2.f32 %r662, %r661, %r660; 2026-02-21T08:54:52.9832341Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9832618Z cvt.u64.u32 %rd392, %r441; 2026-02-21T08:54:52.9832770Z cvt.u64.u32 %rd393, %r442; 2026-02-21T08:54:52.9832917Z shl.b64 %rd394, %rd393, 32; 2026-02-21T08:54:52.9833072Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T08:54:52.9833328Z .loc 1 58 27 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:58:27 2026-02-21T08:54:52.9833609Z mov.b64 {%r663, %r664}, %rd395; 2026-02-21T08:54:52.9833774Z cvt.rn.f16x2.f32 %r665, %r664, %r663; 2026-02-21T08:54:52.9834039Z .loc 1 59 45 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:59:45 2026-02-21T08:54:52.9834360Z st.shared.v4.b32 
[%r467], {%r476, %r479, %r482, %r485}; 2026-02-21T08:54:52.9834599Z st.shared.v4.b32 [%r467+16384], {%r572, %r575, %r578, %r581}; 2026-02-21T08:54:52.9834887Z st.shared.v4.b32 [%r466], {%r488, %r491, %r494, %r497}; 2026-02-21T08:54:52.9835123Z st.shared.v4.b32 [%r466+16384], {%r584, %r587, %r590, %r593}; 2026-02-21T08:54:52.9835355Z st.shared.v4.b32 [%r464], {%r500, %r503, %r506, %r509}; 2026-02-21T08:54:52.9835587Z st.shared.v4.b32 [%r464+16384], {%r596, %r599, %r602, %r605}; 2026-02-21T08:54:52.9835846Z st.shared.v4.b32 [%r462], {%r512, %r515, %r518, %r521}; 2026-02-21T08:54:52.9836081Z st.shared.v4.b32 [%r462+16384], {%r608, %r611, %r614, %r617}; 2026-02-21T08:54:52.9836311Z st.shared.v4.b32 [%r460], {%r524, %r527, %r530, %r533}; 2026-02-21T08:54:52.9836540Z st.shared.v4.b32 [%r460+16384], {%r620, %r623, %r626, %r629}; 2026-02-21T08:54:52.9836773Z st.shared.v4.b32 [%r458], {%r536, %r539, %r542, %r545}; 2026-02-21T08:54:52.9836994Z st.shared.v4.b32 [%r458+16384], {%r632, %r635, %r638, %r641}; 2026-02-21T08:54:52.9837228Z st.shared.v4.b32 [%r456], {%r548, %r551, %r554, %r557}; 2026-02-21T08:54:52.9837452Z st.shared.v4.b32 [%r456+16384], {%r644, %r647, %r650, %r653}; 2026-02-21T08:54:52.9837688Z st.shared.v4.b32 [%r454], {%r560, %r563, %r566, %r569}; 2026-02-21T08:54:52.9837917Z st.shared.v4.b32 [%r454+16384], {%r656, %r659, %r662, %r665}; 2026-02-21T08:54:52.9838144Z // begin inline asm 2026-02-21T08:54:52.9838309Z fence.proxy.async.shared::cta; 2026-02-21T08:54:52.9838467Z // end inline asm 2026-02-21T08:54:52.9838608Z bar.sync 0, 128; 2026-02-21T08:54:52.9838748Z elect.sync %r666|%p122, -1; 2026-02-21T08:54:52.9838916Z and.pred %p120, %p121, %p122; 2026-02-21T08:54:52.9839095Z and.b32 %r667, %r471, 1; 2026-02-21T08:54:52.9839254Z shl.b32 %r668, %r667, 14; 2026-02-21T08:54:52.9839407Z add.s32 %r446, %r128, %r668; 2026-02-21T08:54:52.9839567Z shl.b32 %r669, %r667, 6; 2026-02-21T08:54:52.9839712Z or.b32 %r444, %r669, %r470; 2026-02-21T08:54:52.9839869Z // begin 
inline asm 2026-02-21T08:54:52.9840133Z @%p120 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd139, {%r444, %r445}], [%r446]; 2026-02-21T08:54:52.9840417Z // end inline asm 2026-02-21T08:54:52.9840568Z cp.async.bulk.commit_group; 2026-02-21T08:54:52.9840765Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:52.9840937Z bar.sync 0, 128; 2026-02-21T08:54:52.9841094Z $L__BB0_14: // %._crit_edge 2026-02-21T08:54:52.9841388Z .loc 1 33 4 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:33:4 2026-02-21T08:54:52.9841670Z bar.sync 0, 128; 2026-02-21T08:54:52.9841801Z // begin inline asm 2026-02-21T08:54:52.9842004Z @%p48 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r670, 128; 2026-02-21T08:54:52.9842218Z // end inline asm 2026-02-21T08:54:52.9842377Z st.shared.b32 [global_smem+65688], 50529027; 2026-02-21T08:54:52.9842556Z barrier.sync 1; 2026-02-21T08:54:52.9842717Z $L__BB0_15: // %common.ret 2026-02-21T08:54:52.9842999Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9843277Z ret; 2026-02-21T08:54:52.9843468Z $L__BB0_1: // %.preheader.preheader 2026-02-21T08:54:52.9843675Z mov.b32 %r20, global_smem; 2026-02-21T08:54:52.9843834Z add.s32 %r21, %r20, %r3; 2026-02-21T08:54:52.9843985Z bfe.u32 %r72, %r20, 4, 14; 2026-02-21T08:54:52.9844142Z cvt.u64.u32 %rd48, %r72; 2026-02-21T08:54:52.9844300Z or.b64 %rd30, %rd48, 4611686293372403712; 2026-02-21T08:54:52.9844480Z add.s32 %r73, %r20, 32768; 2026-02-21T08:54:52.9844625Z bfe.u32 %r74, %r73, 4, 14; 2026-02-21T08:54:52.9844809Z cvt.u64.u32 %rd49, %r74; 2026-02-21T08:54:52.9844974Z or.b64 %rd31, %rd49, 4611686293372403712; 2026-02-21T08:54:52.9845146Z add.s32 %r75, %r20, 32; 2026-02-21T08:54:52.9845298Z bfe.u32 %r76, %r75, 4, 14; 2026-02-21T08:54:52.9845443Z add.s32 %r77, %r20, 32800; 2026-02-21T08:54:52.9845596Z bfe.u32 %r78, %r77, 4, 14; 2026-02-21T08:54:52.9845740Z bra.uni $L__BB0_2; 2026-02-21T08:54:52.9845930Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T08:54:52.9846256Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9846567Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9846756Z barrier.sync 1; 2026-02-21T08:54:52.9846893Z barrier.sync 1; 2026-02-21T08:54:52.9847086Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9847287Z $L__BB0_2: // %.preheader 2026-02-21T08:54:52.9847512Z // =>This Loop Header: Depth=1 2026-02-21T08:54:52.9847733Z // Child Loop BB0_9 Depth 2 2026-02-21T08:54:52.9847959Z // Child Loop BB0_6 Depth 2 2026-02-21T08:54:52.9848251Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19 2026-02-21T08:54:52.9848549Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T08:54:52.9848728Z barrier.sync 1; 2026-02-21T08:54:52.9848870Z ld.shared.b8 %r19, [%r21+65684]; 2026-02-21T08:54:52.9849048Z setp.gt.u32 %p2, %r19, 3; 2026-02-21T08:54:52.9849202Z @%p2 bra $L__BB0_4; 2026-02-21T08:54:52.9849370Z // %bb.3: // %.preheader 2026-02-21T08:54:52.9849583Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9849791Z $L_brx_0: .branchtargets 2026-02-21T08:54:52.9849936Z $L__BB0_5, 2026-02-21T08:54:52.9850066Z $L__BB0_8, 2026-02-21T08:54:52.9850192Z $L__BB0_11, 2026-02-21T08:54:52.9850312Z $L__BB0_15; 2026-02-21T08:54:52.9850493Z brx.idx %r19, $L_brx_0; 2026-02-21T08:54:52.9850659Z $L__BB0_5: // %.peel.next 2026-02-21T08:54:52.9850872Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9851167Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9851461Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9851652Z ld.shared.b32 %r53, [global_smem+65536]; 2026-02-21T08:54:52.9851861Z barrier.sync 1; 2026-02-21T08:54:52.9852001Z cvt.u64.u32 %rd50, %r76; 2026-02-21T08:54:52.9852155Z or.b64 %rd66, %rd50, 4611686293372403712; 2026-02-21T08:54:52.9852334Z cvt.u64.u32 %rd51, %r78; 2026-02-21T08:54:52.9852488Z or.b64 %rd67, %rd51, 4611686293372403712; 
2026-02-21T08:54:52.9852666Z add.s32 %r79, %r20, 64; 2026-02-21T08:54:52.9852812Z bfe.u32 %r80, %r79, 4, 14; 2026-02-21T08:54:52.9852968Z cvt.u64.u32 %rd52, %r80; 2026-02-21T08:54:52.9853120Z or.b64 %rd68, %rd52, 4611686293372403712; 2026-02-21T08:54:52.9853294Z add.s32 %r81, %r20, 32832; 2026-02-21T08:54:52.9853447Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T08:54:52.9853595Z cvt.u64.u32 %rd53, %r82; 2026-02-21T08:54:52.9853751Z or.b64 %rd69, %rd53, 4611686293372403712; 2026-02-21T08:54:52.9853917Z add.s32 %r83, %r20, 96; 2026-02-21T08:54:52.9854066Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T08:54:52.9854214Z cvt.u64.u32 %rd54, %r84; 2026-02-21T08:54:52.9854402Z or.b64 %rd70, %rd54, 4611686293372403712; 2026-02-21T08:54:52.9854570Z add.s32 %r85, %r20, 32864; 2026-02-21T08:54:52.9854744Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T08:54:52.9854890Z cvt.u64.u32 %rd55, %r86; 2026-02-21T08:54:52.9855049Z or.b64 %rd71, %rd55, 4611686293372403712; 2026-02-21T08:54:52.9855225Z add.s32 %r87, %r20, 16384; 2026-02-21T08:54:52.9855369Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T08:54:52.9855524Z cvt.u64.u32 %rd56, %r88; 2026-02-21T08:54:52.9855587Z or.b64 %rd72, %rd56, 4611686293372403712; 2026-02-21T08:54:52.9855644Z add.s32 %r89, %r20, 49152; 2026-02-21T08:54:52.9855708Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T08:54:52.9855766Z cvt.u64.u32 %rd57, %r90; 2026-02-21T08:54:52.9855829Z or.b64 %rd73, %rd57, 4611686293372403712; 2026-02-21T08:54:52.9855884Z add.s32 %r91, %r20, 16416; 2026-02-21T08:54:52.9855949Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T08:54:52.9856006Z cvt.u64.u32 %rd58, %r92; 2026-02-21T08:54:52.9856071Z or.b64 %rd74, %rd58, 4611686293372403712; 2026-02-21T08:54:52.9856138Z add.s32 %r93, %r20, 49184; 2026-02-21T08:54:52.9856194Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T08:54:52.9856251Z cvt.u64.u32 %rd59, %r94; 2026-02-21T08:54:52.9856314Z or.b64 %rd75, %rd59, 4611686293372403712; 2026-02-21T08:54:52.9856376Z add.s32 %r95, %r20, 16448; 2026-02-21T08:54:52.9856461Z bfe.u32 %r96, %r95, 4, 
14; 2026-02-21T08:54:52.9856517Z cvt.u64.u32 %rd60, %r96; 2026-02-21T08:54:52.9856586Z or.b64 %rd76, %rd60, 4611686293372403712; 2026-02-21T08:54:52.9856640Z add.s32 %r97, %r20, 49216; 2026-02-21T08:54:52.9856696Z bfe.u32 %r98, %r97, 4, 14; 2026-02-21T08:54:52.9856752Z cvt.u64.u32 %rd61, %r98; 2026-02-21T08:54:52.9856822Z or.b64 %rd77, %rd61, 4611686293372403712; 2026-02-21T08:54:52.9856876Z add.s32 %r99, %r20, 16480; 2026-02-21T08:54:52.9856934Z bfe.u32 %r100, %r99, 4, 14; 2026-02-21T08:54:52.9856999Z cvt.u64.u32 %rd62, %r100; 2026-02-21T08:54:52.9857060Z or.b64 %rd78, %rd62, 4611686293372403712; 2026-02-21T08:54:52.9857117Z add.s32 %r101, %r20, 49248; 2026-02-21T08:54:52.9857186Z bfe.u32 %r102, %r101, 4, 14; 2026-02-21T08:54:52.9857244Z cvt.u64.u32 %rd63, %r102; 2026-02-21T08:54:52.9857305Z or.b64 %rd79, %rd63, 4611686293372403712; 2026-02-21T08:54:52.9857468Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9857538Z bar.warp.sync -1; 2026-02-21T08:54:52.9857593Z add.s32 %r51, %r20, 65664; 2026-02-21T08:54:52.9857646Z mov.b32 %r671, 0; 2026-02-21T08:54:52.9857710Z // begin inline asm 2026-02-21T08:54:52.9857759Z 2026-02-21T08:54:52.9857831Z { 2026-02-21T08:54:52.9857891Z .reg .pred complete; 2026-02-21T08:54:52.9857952Z waitLoop: 2026-02-21T08:54:52.9858067Z mbarrier.try_wait.parity.shared.b64 complete, [%r51], %r671; 2026-02-21T08:54:52.9858131Z @!complete bra.uni waitLoop; 2026-02-21T08:54:52.9858188Z } 2026-02-21T08:54:52.9858192Z 2026-02-21T08:54:52.9858246Z // end inline asm 2026-02-21T08:54:52.9858413Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9858509Z elect.sync %r103|%p11, -1; 2026-02-21T08:54:52.9858565Z mov.b32 %r54, 136314896; 2026-02-21T08:54:52.9858626Z mov.pred %p10, 0; 2026-02-21T08:54:52.9858681Z // begin inline asm 2026-02-21T08:54:52.9858827Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd30, %rd31, %r54, %p10; 
2026-02-21T08:54:52.9858883Z // end inline asm 2026-02-21T08:54:52.9858939Z mov.pred %p12, -1; 2026-02-21T08:54:52.9859000Z // begin inline asm 2026-02-21T08:54:52.9859130Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd66, %rd67, %r54, %p12; 2026-02-21T08:54:52.9859183Z // end inline asm 2026-02-21T08:54:52.9859235Z // begin inline asm 2026-02-21T08:54:52.9859370Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd68, %rd69, %r54, %p12; 2026-02-21T08:54:52.9859422Z // end inline asm 2026-02-21T08:54:52.9859475Z // begin inline asm 2026-02-21T08:54:52.9859626Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd70, %rd71, %r54, %p12; 2026-02-21T08:54:52.9859681Z // end inline asm 2026-02-21T08:54:52.9859735Z // begin inline asm 2026-02-21T08:54:52.9859863Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd72, %rd73, %r54, %p12; 2026-02-21T08:54:52.9859915Z // end inline asm 2026-02-21T08:54:52.9859971Z // begin inline asm 2026-02-21T08:54:52.9860095Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd74, %rd75, %r54, %p12; 2026-02-21T08:54:52.9860148Z // end inline asm 2026-02-21T08:54:52.9860202Z // begin inline asm 2026-02-21T08:54:52.9860322Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd76, %rd77, %r54, %p12; 2026-02-21T08:54:52.9860383Z // end inline asm 2026-02-21T08:54:52.9860437Z // begin inline asm 2026-02-21T08:54:52.9860557Z @%p11 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd78, %rd79, %r54, %p12; 2026-02-21T08:54:52.9860617Z // end inline asm 2026-02-21T08:54:52.9860673Z add.s32 %r104, %r20, 65648; 2026-02-21T08:54:52.9860730Z cvt.u64.u32 %rd46, %r104; 2026-02-21T08:54:52.9860790Z // begin inline asm 2026-02-21T08:54:52.9860911Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:52.9860964Z // end inline asm 2026-02-21T08:54:52.9861020Z add.s32 %r105, %r20, 65680; 2026-02-21T08:54:52.9861085Z cvt.u64.u32 %rd47, %r105; 2026-02-21T08:54:52.9861168Z // begin inline 
asm 2026-02-21T08:54:52.9861286Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:52.9861346Z // end inline asm 2026-02-21T08:54:52.9861399Z mov.b32 %r672, 1; 2026-02-21T08:54:52.9861495Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:52.9861592Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:52.9861748Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9861808Z bar.warp.sync -1; 2026-02-21T08:54:52.9861862Z // begin inline asm 2026-02-21T08:54:52.9861917Z 2026-02-21T08:54:52.9861968Z { 2026-02-21T08:54:52.9862031Z .reg .pred complete; 2026-02-21T08:54:52.9862093Z waitLoop: 2026-02-21T08:54:52.9862207Z mbarrier.try_wait.parity.shared.b64 complete, [%r51], %r672; 2026-02-21T08:54:52.9862269Z @!complete bra.uni waitLoop; 2026-02-21T08:54:52.9862319Z } 2026-02-21T08:54:52.9862323Z 2026-02-21T08:54:52.9862385Z // end inline asm 2026-02-21T08:54:52.9862552Z .loc 1 56 52 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:56:52 2026-02-21T08:54:52.9862641Z setp.eq.b32 %p46, %r671, 1792; 2026-02-21T08:54:52.9862713Z elect.sync %r125|%p29, -1; 2026-02-21T08:54:52.9862766Z // begin inline asm 2026-02-21T08:54:52.9862891Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd30, %rd31, %r54, %p12; 2026-02-21T08:54:52.9862951Z // end inline asm 2026-02-21T08:54:52.9863006Z // begin inline asm 2026-02-21T08:54:52.9863127Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd66, %rd67, %r54, %p12; 2026-02-21T08:54:52.9863181Z // end inline asm 2026-02-21T08:54:52.9863273Z // begin inline asm 2026-02-21T08:54:52.9863396Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd68, %rd69, %r54, %p12; 2026-02-21T08:54:52.9863451Z // end inline asm 2026-02-21T08:54:52.9863516Z // begin inline asm 2026-02-21T08:54:52.9863639Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd70, %rd71, %r54, %p12; 2026-02-21T08:54:52.9863692Z // end inline asm 
2026-02-21T08:54:52.9863754Z // begin inline asm 2026-02-21T08:54:52.9863876Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd72, %rd73, %r54, %p12; 2026-02-21T08:54:52.9863930Z // end inline asm 2026-02-21T08:54:52.9863984Z // begin inline asm 2026-02-21T08:54:52.9864110Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd74, %rd75, %r54, %p12; 2026-02-21T08:54:52.9864163Z // end inline asm 2026-02-21T08:54:52.9864218Z // begin inline asm 2026-02-21T08:54:52.9864347Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd76, %rd77, %r54, %p12; 2026-02-21T08:54:52.9864422Z // end inline asm 2026-02-21T08:54:52.9864478Z // begin inline asm 2026-02-21T08:54:52.9864605Z @%p29 tcgen05.mma.cta_group::1.kind::f16 [ %r53 + 0 ], %rd78, %rd79, %r54, %p12; 2026-02-21T08:54:52.9864658Z // end inline asm 2026-02-21T08:54:52.9864751Z // begin inline asm 2026-02-21T08:54:52.9864881Z @%p29 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T08:54:52.9864934Z // end inline asm 2026-02-21T08:54:52.9864997Z and.pred %p45, %p46, %p29; 2026-02-21T08:54:52.9865052Z // begin inline asm 2026-02-21T08:54:52.9865174Z @%p45 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T08:54:52.9865226Z // end inline asm 2026-02-21T08:54:52.9865383Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9865450Z xor.b32 %r672, %r672, 1; 2026-02-21T08:54:52.9865615Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9865673Z add.s32 %r671, %r671, 128; 2026-02-21T08:54:52.9865746Z setp.lt.u32 %p47, %r671, 1920; 2026-02-21T08:54:52.9865801Z @%p47 bra $L__BB0_6; 2026-02-21T08:54:52.9865880Z // %bb.7: // %.loopexit 2026-02-21T08:54:52.9865968Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9866062Z barrier.sync 1; 2026-02-21T08:54:52.9866139Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9866194Z bra.uni $L__BB0_2; 2026-02-21T08:54:52.9866297Z 
$L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9866463Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9866539Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9866639Z ld.shared.v2.b32 {%r38, %r42}, [global_smem+65544]; 2026-02-21T08:54:52.9866693Z barrier.sync 1; 2026-02-21T08:54:52.9866858Z .loc 1 21 67 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:21:67 2026-02-21T08:54:52.9866918Z mov.u32 %r24, %ctaid.x; 2026-02-21T08:54:52.9866982Z mov.u32 %r25, %ctaid.y; 2026-02-21T08:54:52.9867039Z mov.u32 %r26, %ctaid.z; 2026-02-21T08:54:52.9867098Z mov.u32 %r27, %nctaid.x; 2026-02-21T08:54:52.9867163Z mov.u32 %r28, %nctaid.y; 2026-02-21T08:54:52.9867226Z mad.lo.s32 %r29, %r26, %r28, %r25; 2026-02-21T08:54:52.9867286Z mad.lo.s32 %r30, %r29, %r27, %r24; 2026-02-21T08:54:52.9867342Z mul.lo.s32 %r31, %r30, 384; 2026-02-21T08:54:52.9867538Z .loc 1 22 68 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:22:68 2026-02-21T08:54:52.9867597Z add.s32 %r32, %r31, 128; 2026-02-21T08:54:52.9867653Z cvt.s64.s32 %rd24, %r32; 2026-02-21T08:54:52.9867720Z add.s64 %rd25, %rd23, %rd24; 2026-02-21T08:54:52.9867781Z cvta.global.u64 %rd29, %rd25; 2026-02-21T08:54:52.9867940Z .loc 1 21 67 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:21:67 2026-02-21T08:54:52.9868004Z cvt.s64.s32 %rd26, %r31; 2026-02-21T08:54:52.9868086Z add.s64 %rd27, %rd23, %rd26; 2026-02-21T08:54:52.9868146Z cvta.global.u64 %rd28, %rd27; 2026-02-21T08:54:52.9868202Z add.s32 %r11, %r1, -128; 2026-02-21T08:54:52.9868264Z shr.u32 %r12, %r11, 5; 2026-02-21T08:54:52.9868317Z mov.b32 %r674, 0; 2026-02-21T08:54:52.9868372Z mov.b32 %r673, -128; 2026-02-21T08:54:52.9868477Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:54:52.9868574Z // => This Inner Loop Header: Depth=2 2026-02-21T08:54:52.9868743Z .loc 1 0 67 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0:67 2026-02-21T08:54:52.9868812Z setp.lt.u32 
%p6, %r11, 64; 2026-02-21T08:54:52.9868875Z setp.eq.b32 %p3, %r11, 0; 2026-02-21T08:54:52.9868934Z add.s32 %r33, %r20, 65648; 2026-02-21T08:54:52.9868990Z // begin inline asm 2026-02-21T08:54:52.9869047Z 2026-02-21T08:54:52.9869097Z { 2026-02-21T08:54:52.9869181Z .reg .pred complete; 2026-02-21T08:54:52.9869240Z waitLoop: 2026-02-21T08:54:52.9869363Z mbarrier.try_wait.parity.shared.b64 complete, [%r33], %r674; 2026-02-21T08:54:52.9869428Z @!complete bra.uni waitLoop; 2026-02-21T08:54:52.9869478Z } 2026-02-21T08:54:52.9869482Z 2026-02-21T08:54:52.9869546Z // end inline asm 2026-02-21T08:54:52.9869602Z bar.sync 3, 64; 2026-02-21T08:54:52.9869661Z add.s32 %r35, %r20, 65664; 2026-02-21T08:54:52.9869723Z // begin inline asm 2026-02-21T08:54:52.9869835Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r35], 65536; 2026-02-21T08:54:52.9869891Z // end inline asm 2026-02-21T08:54:52.9869947Z bar.sync 3, 64; 2026-02-21T08:54:52.9870029Z shfl.sync.idx.b32 %r45, %r12, 0, 31, -1; 2026-02-21T08:54:52.9870092Z elect.sync %r46|%p7, -1; 2026-02-21T08:54:52.9870157Z and.pred %p4, %p6, %p7; 2026-02-21T08:54:52.9870224Z and.b32 %r47, %r45, 1; 2026-02-21T08:54:52.9870284Z shl.b32 %r48, %r47, 14; 2026-02-21T08:54:52.9870342Z add.s32 %r36, %r20, %r48; 2026-02-21T08:54:52.9870400Z shl.b32 %r49, %r47, 6; 2026-02-21T08:54:52.9870583Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9870642Z add.s32 %r673, %r673, 128; 2026-02-21T08:54:52.9870806Z .loc 1 0 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:0 2026-02-21T08:54:52.9870912Z add.s32 %r41, %r673, %r49; 2026-02-21T08:54:52.9870970Z // begin inline asm 2026-02-21T08:54:52.9871228Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r36], [%rd28, {%r41, %r38}], [%r35]; 2026-02-21T08:54:52.9871295Z // end inline asm 2026-02-21T08:54:52.9871352Z bar.sync 3, 64; 2026-02-21T08:54:52.9871416Z elect.sync %r50|%p8, -1; 2026-02-21T08:54:52.9871481Z 
and.pred %p5, %p6, %p8; 2026-02-21T08:54:52.9871560Z add.s32 %r40, %r36, 32768; 2026-02-21T08:54:52.9871617Z // begin inline asm 2026-02-21T08:54:52.9871856Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r40], [%rd29, {%r41, %r42}], [%r35]; 2026-02-21T08:54:52.9871924Z // end inline asm 2026-02-21T08:54:52.9871982Z xor.b32 %r674, %r674, 1; 2026-02-21T08:54:52.9872149Z .loc 1 50 79 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:50:79 2026-02-21T08:54:52.9872225Z setp.lt.u32 %p9, %r673, 1920; 2026-02-21T08:54:52.9872282Z @%p9 bra $L__BB0_9; 2026-02-21T08:54:52.9872380Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9872473Z barrier.sync 1; 2026-02-21T08:54:52.9872554Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T08:54:52.9872612Z bra.uni $L__BB0_2; 2026-02-21T08:54:52.9872710Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:54:52.9872882Z .loc 1 19 0 // ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py:19 2026-02-21T08:54:52.9872940Z barrier.sync 1; 2026-02-21T08:54:52.9872996Z barrier.sync 1; 2026-02-21T08:54:52.9873061Z bra.uni $L__BB0_2; 2026-02-21T08:54:52.9873137Z $L__tmp1: 2026-02-21T08:54:52.9873194Z $L__func_end0: 2026-02-21T08:54:52.9873276Z // -- End function 2026-02-21T08:54:52.9873334Z } 2026-02-21T08:54:52.9873569Z .file 1 "/tmp/torchinductor_root/kz/ckz27pqosy4fnh2vul3v7tace5rvtvq7ac7wgofb6lgbn6cq7x2a.py" 2026-02-21T08:54:52.9873632Z .section .debug_abbrev 2026-02-21T08:54:52.9873690Z { 2026-02-21T08:54:52.9873778Z .b8 1 // Abbreviation Code 2026-02-21T08:54:52.9873870Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:52.9873959Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:52.9874039Z .b8 37 // DW_AT_producer 2026-02-21T08:54:52.9874115Z .b8 8 // DW_FORM_string 2026-02-21T08:54:52.9874191Z .b8 19 // DW_AT_language 2026-02-21T08:54:52.9874300Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:52.9874378Z .b8 3 // DW_AT_name 2026-02-21T08:54:52.9874455Z .b8 8 // DW_FORM_string 
2026-02-21T08:54:52.9874543Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:52.9874620Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:52.9874722Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:52.9874806Z .b8 8 // DW_FORM_string 2026-02-21T08:54:52.9874879Z .b8 0 // EOM(1) 2026-02-21T08:54:52.9874948Z .b8 0 // EOM(2) 2026-02-21T08:54:52.9875014Z .b8 0 // EOM(3) 2026-02-21T08:54:52.9875074Z } 2026-02-21T08:54:52.9875133Z .section .debug_info 2026-02-21T08:54:52.9875183Z { 2026-02-21T08:54:52.9875272Z .b32 104 // Length of Unit 2026-02-21T08:54:52.9875361Z .b8 2 // DWARF version number 2026-02-21T08:54:52.9875416Z .b8 0 2026-02-21T08:54:52.9875532Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:54:52.9875628Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:52.9875755Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:52.9875835Z .b8 116 // DW_AT_producer 2026-02-21T08:54:52.9875898Z .b8 114 2026-02-21T08:54:52.9875951Z .b8 105 2026-02-21T08:54:52.9876002Z .b8 116 2026-02-21T08:54:52.9876059Z .b8 111 2026-02-21T08:54:52.9876110Z .b8 110 2026-02-21T08:54:52.9876161Z .b8 0 2026-02-21T08:54:52.9876236Z .b8 2 // DW_AT_language 2026-02-21T08:54:52.9876295Z .b8 0 2026-02-21T08:54:52.9876370Z .b8 99 // DW_AT_name 2026-02-21T08:54:52.9876422Z .b8 107 2026-02-21T08:54:52.9876480Z .b8 122 2026-02-21T08:54:52.9876534Z .b8 50 2026-02-21T08:54:52.9876587Z .b8 55 2026-02-21T08:54:52.9876639Z .b8 112 2026-02-21T08:54:52.9876698Z .b8 113 2026-02-21T08:54:52.9876748Z .b8 111 2026-02-21T08:54:52.9876799Z .b8 115 2026-02-21T08:54:52.9876856Z .b8 121 2026-02-21T08:54:52.9876906Z .b8 52 2026-02-21T08:54:52.9876958Z .b8 102 2026-02-21T08:54:52.9877008Z .b8 110 2026-02-21T08:54:52.9877066Z .b8 104 2026-02-21T08:54:52.9877117Z .b8 50 2026-02-21T08:54:52.9877167Z .b8 118 2026-02-21T08:54:52.9877217Z .b8 117 2026-02-21T08:54:52.9877274Z .b8 108 2026-02-21T08:54:52.9877362Z .b8 51 2026-02-21T08:54:52.9877414Z .b8 118 2026-02-21T08:54:52.9877473Z .b8 55 
2026-02-21T08:54:52.9877524Z .b8 116 2026-02-21T08:54:52.9877576Z .b8 97 2026-02-21T08:54:52.9877628Z .b8 99 2026-02-21T08:54:52.9877687Z .b8 101 2026-02-21T08:54:52.9877739Z .b8 53 2026-02-21T08:54:52.9877790Z .b8 114 2026-02-21T08:54:52.9877847Z .b8 118 2026-02-21T08:54:52.9877899Z .b8 116 2026-02-21T08:54:52.9877949Z .b8 118 2026-02-21T08:54:52.9877999Z .b8 113 2026-02-21T08:54:52.9878060Z .b8 55 2026-02-21T08:54:52.9878142Z .b8 97 2026-02-21T08:54:52.9878194Z .b8 99 2026-02-21T08:54:52.9878246Z .b8 55 2026-02-21T08:54:52.9878308Z .b8 119 2026-02-21T08:54:52.9878371Z .b8 103 2026-02-21T08:54:52.9878421Z .b8 111 2026-02-21T08:54:52.9878479Z .b8 102 2026-02-21T08:54:52.9878529Z .b8 98 2026-02-21T08:54:52.9878580Z .b8 54 2026-02-21T08:54:52.9878631Z .b8 108 2026-02-21T08:54:52.9878695Z .b8 103 2026-02-21T08:54:52.9878746Z .b8 98 2026-02-21T08:54:52.9878798Z .b8 110 2026-02-21T08:54:52.9878854Z .b8 54 2026-02-21T08:54:52.9878903Z .b8 99 2026-02-21T08:54:52.9878951Z .b8 113 2026-02-21T08:54:52.9878998Z .b8 55 2026-02-21T08:54:52.9879054Z .b8 120 2026-02-21T08:54:52.9879102Z .b8 50 2026-02-21T08:54:52.9879150Z .b8 97 2026-02-21T08:54:52.9879204Z .b8 46 2026-02-21T08:54:52.9879252Z .b8 112 2026-02-21T08:54:52.9879299Z .b8 121 2026-02-21T08:54:52.9879347Z .b8 0 2026-02-21T08:54:52.9879443Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:52.9879541Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:52.9879593Z .b8 116 2026-02-21T08:54:52.9879649Z .b8 109 2026-02-21T08:54:52.9879700Z .b8 112 2026-02-21T08:54:52.9879749Z .b8 47 2026-02-21T08:54:52.9879798Z .b8 116 2026-02-21T08:54:52.9879855Z .b8 111 2026-02-21T08:54:52.9879904Z .b8 114 2026-02-21T08:54:52.9879953Z .b8 99 2026-02-21T08:54:52.9880002Z .b8 104 2026-02-21T08:54:52.9880057Z .b8 105 2026-02-21T08:54:52.9880104Z .b8 110 2026-02-21T08:54:52.9880152Z .b8 100 2026-02-21T08:54:52.9880206Z .b8 117 2026-02-21T08:54:52.9880257Z .b8 99 2026-02-21T08:54:52.9880306Z .b8 116 2026-02-21T08:54:52.9880352Z .b8 111 
2026-02-21T08:54:52.9880408Z .b8 114 2026-02-21T08:54:52.9880456Z .b8 95 2026-02-21T08:54:52.9880504Z .b8 114 2026-02-21T08:54:52.9880559Z .b8 111 2026-02-21T08:54:52.9880607Z .b8 111 2026-02-21T08:54:52.9880655Z .b8 116 2026-02-21T08:54:52.9880702Z .b8 47 2026-02-21T08:54:52.9880758Z .b8 107 2026-02-21T08:54:52.9880805Z .b8 122 2026-02-21T08:54:52.9880853Z .b8 0 2026-02-21T08:54:52.9880901Z } 2026-02-21T08:54:52.9880971Z .section .debug_macinfo { } 2026-02-21T08:54:52.9880976Z 2026-02-21T08:54:52.9881051Z ================================================================ 2026-02-21T08:54:52.9881151Z please share the reproducer above with Triton project. 2026-02-21T08:54:53.4084040Z 2026-02-21T08:54:53.4088714Z 2026-02-21T08:54:53.4090907Z 2026-02-21T08:54:53.4091125Z ================================================================ 2026-02-21T08:54:53.4091438Z Internal Triton PTX codegen error 2026-02-21T08:54:53.4096526Z `ptxas` stderr: 2026-02-21T08:54:53.4098398Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 354 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T08:54:53.4098961Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:53.4103532Z 2026-02-21T08:54:53.4106065Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4kshdm69.ptx -o /tmp/tmp4kshdm69.ptx.o 2026-02-21T08:54:53.4106574Z 2026-02-21T08:54:53.4106578Z 2026-02-21T08:54:53.4106644Z // 2026-02-21T08:54:53.4106798Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:54:53.4106988Z // 2026-02-21T08:54:53.4107058Z 2026-02-21T08:54:53.4107123Z .version 8.7 2026-02-21T08:54:53.4107271Z .target sm_100a 2026-02-21T08:54:53.4107410Z .address_size 64 2026-02-21T08:54:53.4107507Z 2026-02-21T08:54:53.4107637Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T08:54:53.4108117Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:54:53.4108341Z // @_helion_matmul 2026-02-21T08:54:53.4108550Z .visible .entry _helion_matmul( 2026-02-21T08:54:53.4108769Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T08:54:53.4109033Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T08:54:53.4109282Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T08:54:53.4109560Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T08:54:53.4109897Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T08:54:53.4110112Z ) 2026-02-21T08:54:53.4110252Z .reqntid 128 2026-02-21T08:54:53.4110390Z .maxnreg 32 2026-02-21T08:54:53.4110533Z { 2026-02-21T08:54:53.4110668Z .reg .pred %p<141>; 2026-02-21T08:54:53.4110845Z .reg .b16 %rs<7>; 2026-02-21T08:54:53.4110998Z .reg .b32 %r<1138>; 2026-02-21T08:54:53.4111161Z .reg .b64 %rd<645>; 2026-02-21T08:54:53.4111722Z .loc 1 19 0 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:19:0 2026-02-21T08:54:53.4112019Z $L__func_begin0: 2026-02-21T08:54:53.4112274Z .loc 1 19 0 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:19:0 
2026-02-21T08:54:53.4112508Z 2026-02-21T08:54:53.4112562Z // %bb.0: 2026-02-21T08:54:53.4112726Z ld.param.b64 %rd21, [_helion_matmul_param_0]; 2026-02-21T08:54:53.4112924Z $L__tmp0: 2026-02-21T08:54:53.4113231Z .loc 1 19 0 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:19 2026-02-21T08:54:53.4113518Z mov.u32 %r1, %tid.x; 2026-02-21T08:54:53.4113691Z ld.param.b64 %rd39, [_helion_matmul_param_1]; 2026-02-21T08:54:53.4113898Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T08:54:53.4114080Z ld.param.b64 %rd57, [_helion_matmul_param_2]; 2026-02-21T08:54:53.4114280Z mov.b32 %r330, global_smem; 2026-02-21T08:54:53.4114440Z // begin inline asm 2026-02-21T08:54:53.4114862Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r330], 256; 2026-02-21T08:54:53.4115116Z // end inline asm 2026-02-21T08:54:53.4115272Z ld.param.b64 %rd74, [_helion_matmul_param_3]; 2026-02-21T08:54:53.4115460Z bar.sync 0; 2026-02-21T08:54:53.4115600Z ld.shared.b32 %r1136, [global_smem]; 2026-02-21T08:54:53.4115776Z bar.sync 0; 2026-02-21T08:54:53.4115901Z // begin inline asm 2026-02-21T08:54:53.4116112Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T08:54:53.4116340Z // end inline asm 2026-02-21T08:54:53.4116590Z .loc 1 21 67 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:21:67 2026-02-21T08:54:53.4116880Z mov.u32 %r3, %ctaid.x; 2026-02-21T08:54:53.4117029Z mov.u32 %r48, %ctaid.y; 2026-02-21T08:54:53.4117220Z mov.u32 %r49, %ctaid.z; 2026-02-21T08:54:53.4117369Z mov.u32 %r50, %nctaid.x; 2026-02-21T08:54:53.4117529Z mov.u32 %r51, %nctaid.y; 2026-02-21T08:54:53.4117684Z mad.lo.s32 %r52, %r49, %r51, %r48; 2026-02-21T08:54:53.4117866Z mad.lo.s32 %r53, %r52, %r50, %r3; 2026-02-21T08:54:53.4118035Z mul.lo.s32 %r54, %r53, 384; 2026-02-21T08:54:53.4118203Z cvt.s64.s32 %rd75, %r54; 2026-02-21T08:54:53.4118359Z add.s64 %rd35, %rd74, %rd75; 2026-02-21T08:54:53.4118528Z shl.b32 %r55, %r1, 2; 2026-02-21T08:54:53.4118687Z add.s32 %r24, %r330, %r55; 
2026-02-21T08:54:53.4118839Z mov.b32 %r33, 0; 2026-02-21T08:54:53.4118982Z // begin inline asm 2026-02-21T08:54:53.4119136Z @%p1 st.shared.b32 [ %r24 + 0 ], %r33; 2026-02-21T08:54:53.4119315Z // end inline asm 2026-02-21T08:54:53.4119456Z bar.warp.sync -1; 2026-02-21T08:54:53.4119610Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T08:54:53.4119763Z cvt.u64.u32 %rd20, %r330; 2026-02-21T08:54:53.4119927Z // begin inline asm 2026-02-21T08:54:53.4120181Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd21; 2026-02-21T08:54:53.4120462Z // end inline asm 2026-02-21T08:54:53.4120606Z // begin inline asm 2026-02-21T08:54:53.4120862Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T08:54:53.4121123Z // end inline asm 2026-02-21T08:54:53.4121256Z mov.b32 %r26, 64; 2026-02-21T08:54:53.4121396Z // begin inline asm 2026-02-21T08:54:53.4121623Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r26; 2026-02-21T08:54:53.4121887Z // end inline asm 2026-02-21T08:54:53.4122022Z mov.b32 %r27, 128; 2026-02-21T08:54:53.4122156Z // begin inline asm 2026-02-21T08:54:53.4122384Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r27; 2026-02-21T08:54:53.4122664Z // end inline asm 2026-02-21T08:54:53.4122799Z mov.b32 %r28, 2048; 2026-02-21T08:54:53.4122935Z // begin inline asm 2026-02-21T08:54:53.4123166Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r28; 2026-02-21T08:54:53.4123426Z // end inline asm 2026-02-21T08:54:53.4123560Z // begin inline asm 2026-02-21T08:54:53.4123794Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r28; 2026-02-21T08:54:53.4124051Z // end inline asm 2026-02-21T08:54:53.4124187Z mov.b64 %rd28, 4096; 2026-02-21T08:54:53.4124324Z // begin inline asm 2026-02-21T08:54:53.4124575Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd20 + 0 ], 0x0, %rd28; 
2026-02-21T08:54:53.4124913Z // end inline asm 2026-02-21T08:54:53.4125050Z mov.b32 %r30, 1; 2026-02-21T08:54:53.4125179Z // begin inline asm 2026-02-21T08:54:53.4125467Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r30; 2026-02-21T08:54:53.4125745Z // end inline asm 2026-02-21T08:54:53.4125889Z // begin inline asm 2026-02-21T08:54:53.4126134Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r30; 2026-02-21T08:54:53.4126419Z // end inline asm 2026-02-21T08:54:53.4126556Z // begin inline asm 2026-02-21T08:54:53.4126778Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T08:54:53.4127036Z // end inline asm 2026-02-21T08:54:53.4127161Z // begin inline asm 2026-02-21T08:54:53.4127402Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4127673Z // end inline asm 2026-02-21T08:54:53.4127809Z // begin inline asm 2026-02-21T08:54:53.4128042Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x3; 2026-02-21T08:54:53.4128296Z // end inline asm 2026-02-21T08:54:53.4128434Z // begin inline asm 2026-02-21T08:54:53.4128650Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4128904Z // end inline asm 2026-02-21T08:54:53.4129031Z // begin inline asm 2026-02-21T08:54:53.4129369Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd35 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T08:54:53.4129782Z // end inline asm 2026-02-21T08:54:53.4129910Z // begin inline asm 2026-02-21T08:54:53.4130117Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd35 + 0 ], 0x80; 2026-02-21T08:54:53.4130360Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:54:53.4130550Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:53.4130720Z // end inline asm 2026-02-21T08:54:53.4130854Z bar.sync 0; 
2026-02-21T08:54:53.4130988Z cvta.global.u64 %rd113, %rd35; 2026-02-21T08:54:53.4131270Z .loc 1 22 68 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:22:68 2026-02-21T08:54:53.4131567Z add.s32 %r56, %r54, 128; 2026-02-21T08:54:53.4131716Z cvt.s64.s32 %rd76, %r56; 2026-02-21T08:54:53.4131874Z add.s64 %rd53, %rd74, %rd76; 2026-02-21T08:54:53.4132024Z bar.sync 0; 2026-02-21T08:54:53.4132156Z // begin inline asm 2026-02-21T08:54:53.4132299Z @%p1 st.shared.b32 [ %r24 + 0 ], %r33; 2026-02-21T08:54:53.4132471Z // end inline asm 2026-02-21T08:54:53.4132601Z bar.warp.sync -1; 2026-02-21T08:54:53.4132744Z // begin inline asm 2026-02-21T08:54:53.4133002Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd39; 2026-02-21T08:54:53.4133274Z // end inline asm 2026-02-21T08:54:53.4133410Z // begin inline asm 2026-02-21T08:54:53.4133621Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T08:54:53.4133868Z // end inline asm 2026-02-21T08:54:53.4133995Z // begin inline asm 2026-02-21T08:54:53.4134227Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r26; 2026-02-21T08:54:53.4134483Z // end inline asm 2026-02-21T08:54:53.4134623Z mov.b32 %r35, 256; 2026-02-21T08:54:53.4134829Z // begin inline asm 2026-02-21T08:54:53.4135062Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r35; 2026-02-21T08:54:53.4135319Z // end inline asm 2026-02-21T08:54:53.4135449Z // begin inline asm 2026-02-21T08:54:53.4135684Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r28; 2026-02-21T08:54:53.4135941Z // end inline asm 2026-02-21T08:54:53.4136078Z mov.b32 %r37, 12288; 2026-02-21T08:54:53.4136217Z // begin inline asm 2026-02-21T08:54:53.4136459Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r37; 2026-02-21T08:54:53.4136726Z // end inline asm 2026-02-21T08:54:53.4136855Z // begin inline asm 
2026-02-21T08:54:53.4137112Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd20 + 0 ], 0x0, %rd28; 2026-02-21T08:54:53.4137384Z // end inline asm 2026-02-21T08:54:53.4137522Z // begin inline asm 2026-02-21T08:54:53.4137789Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r30; 2026-02-21T08:54:53.4138073Z // end inline asm 2026-02-21T08:54:53.4138208Z // begin inline asm 2026-02-21T08:54:53.4138446Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r30; 2026-02-21T08:54:53.4138725Z // end inline asm 2026-02-21T08:54:53.4138851Z // begin inline asm 2026-02-21T08:54:53.4139080Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T08:54:53.4139334Z // end inline asm 2026-02-21T08:54:53.4139472Z // begin inline asm 2026-02-21T08:54:53.4139712Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4139991Z // end inline asm 2026-02-21T08:54:53.4140125Z // begin inline asm 2026-02-21T08:54:53.4140349Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x3; 2026-02-21T08:54:53.4140611Z // end inline asm 2026-02-21T08:54:53.4140738Z // begin inline asm 2026-02-21T08:54:53.4140961Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4141205Z // end inline asm 2026-02-21T08:54:53.4141339Z // begin inline asm 2026-02-21T08:54:53.4141672Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd53 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T08:54:53.4142056Z // end inline asm 2026-02-21T08:54:53.4142192Z // begin inline asm 2026-02-21T08:54:53.4142391Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd53 + 0 ], 0x80; 2026-02-21T08:54:53.4142638Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:54:53.4142818Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:53.4142994Z // end 
inline asm 2026-02-21T08:54:53.4143123Z bar.sync 0; 2026-02-21T08:54:53.4143265Z cvta.global.u64 %rd114, %rd53; 2026-02-21T08:54:53.4143542Z .loc 1 24 73 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:24:73 2026-02-21T08:54:53.4143824Z add.s32 %r57, %r54, 256; 2026-02-21T08:54:53.4143981Z cvt.s64.s32 %rd77, %r57; 2026-02-21T08:54:53.4144131Z add.s64 %rd71, %rd74, %rd77; 2026-02-21T08:54:53.4144288Z bar.sync 0; 2026-02-21T08:54:53.4144413Z // begin inline asm 2026-02-21T08:54:53.4144567Z @%p1 st.shared.b32 [ %r24 + 0 ], %r33; 2026-02-21T08:54:53.4144775Z // end inline asm 2026-02-21T08:54:53.4144920Z bar.warp.sync -1; 2026-02-21T08:54:53.4145068Z // begin inline asm 2026-02-21T08:54:53.4145324Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd57; 2026-02-21T08:54:53.4145604Z // end inline asm 2026-02-21T08:54:53.4145733Z // begin inline asm 2026-02-21T08:54:53.4145952Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T08:54:53.4146198Z // end inline asm 2026-02-21T08:54:53.4146333Z // begin inline asm 2026-02-21T08:54:53.4146558Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r26; 2026-02-21T08:54:53.4146821Z // end inline asm 2026-02-21T08:54:53.4146986Z // begin inline asm 2026-02-21T08:54:53.4147205Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r27; 2026-02-21T08:54:53.4147459Z // end inline asm 2026-02-21T08:54:53.4147586Z // begin inline asm 2026-02-21T08:54:53.4147823Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r37; 2026-02-21T08:54:53.4148080Z // end inline asm 2026-02-21T08:54:53.4148215Z // begin inline asm 2026-02-21T08:54:53.4148452Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r28; 2026-02-21T08:54:53.4148708Z // end inline asm 2026-02-21T08:54:53.4148850Z mov.b64 %rd64, 24576; 2026-02-21T08:54:53.4148997Z // begin inline asm 
2026-02-21T08:54:53.4149258Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd20 + 0 ], 0x0, %rd64; 2026-02-21T08:54:53.4149540Z // end inline asm 2026-02-21T08:54:53.4149679Z // begin inline asm 2026-02-21T08:54:53.4149986Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r30; 2026-02-21T08:54:53.4150270Z // end inline asm 2026-02-21T08:54:53.4150412Z // begin inline asm 2026-02-21T08:54:53.4150657Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r30; 2026-02-21T08:54:53.4150950Z // end inline asm 2026-02-21T08:54:53.4151083Z // begin inline asm 2026-02-21T08:54:53.4151320Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T08:54:53.4151580Z // end inline asm 2026-02-21T08:54:53.4151721Z // begin inline asm 2026-02-21T08:54:53.4151974Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4152265Z // end inline asm 2026-02-21T08:54:53.4152410Z // begin inline asm 2026-02-21T08:54:53.4152646Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x3; 2026-02-21T08:54:53.4152922Z // end inline asm 2026-02-21T08:54:53.4153055Z // begin inline asm 2026-02-21T08:54:53.4153287Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T08:54:53.4153551Z // end inline asm 2026-02-21T08:54:53.4153685Z // begin inline asm 2026-02-21T08:54:53.4154038Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd71 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T08:54:53.4154442Z // end inline asm 2026-02-21T08:54:53.4154582Z // begin inline asm 2026-02-21T08:54:53.4154821Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd71 + 0 ], 0x80; 2026-02-21T08:54:53.4155081Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T08:54:53.4155278Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:54:53.4155454Z // end 
inline asm 2026-02-21T08:54:53.4155599Z bar.sync 0; 2026-02-21T08:54:53.4155744Z cvta.global.u64 %rd132, %rd71; 2026-02-21T08:54:53.4156047Z .loc 1 33 74 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:33:74 2026-02-21T08:54:53.4156359Z setp.gt.u32 %p57, %r3, 767; 2026-02-21T08:54:53.4156537Z @%p57 bra $L__BB0_8; 2026-02-21T08:54:53.4156702Z // %bb.1: // %.lr.ph 2026-02-21T08:54:53.4157019Z .loc 1 0 74 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:0:74 2026-02-21T08:54:53.4157314Z setp.lt.u32 %p83, %r1, 64; 2026-02-21T08:54:53.4157471Z add.s32 %r349, %r330, 65536; 2026-02-21T08:54:53.4157635Z bfe.u32 %r350, %r349, 4, 14; 2026-02-21T08:54:53.4157786Z cvt.u64.u32 %rd80, %r350; 2026-02-21T08:54:53.4157979Z or.b64 %rd96, %rd80, 4611686293372403712; 2026-02-21T08:54:53.4158155Z bfe.u32 %r351, %r330, 4, 14; 2026-02-21T08:54:53.4158311Z cvt.u64.u32 %rd81, %r351; 2026-02-21T08:54:53.4158470Z or.b64 %rd97, %rd81, 4611686293439512576; 2026-02-21T08:54:53.4158647Z add.s32 %r352, %r330, 65568; 2026-02-21T08:54:53.4158802Z bfe.u32 %r353, %r352, 4, 14; 2026-02-21T08:54:53.4158950Z cvt.u64.u32 %rd82, %r353; 2026-02-21T08:54:53.4159116Z or.b64 %rd98, %rd82, 4611686293372403712; 2026-02-21T08:54:53.4159312Z add.s32 %r354, %r330, 32; 2026-02-21T08:54:53.4159462Z bfe.u32 %r355, %r354, 4, 14; 2026-02-21T08:54:53.4159609Z cvt.u64.u32 %rd83, %r355; 2026-02-21T08:54:53.4159767Z or.b64 %rd99, %rd83, 4611686293439512576; 2026-02-21T08:54:53.4159934Z add.s32 %r356, %r330, 65600; 2026-02-21T08:54:53.4160087Z bfe.u32 %r357, %r356, 4, 14; 2026-02-21T08:54:53.4160231Z cvt.u64.u32 %rd84, %r357; 2026-02-21T08:54:53.4160394Z or.b64 %rd100, %rd84, 4611686293372403712; 2026-02-21T08:54:53.4160571Z add.s32 %r358, %r330, 64; 2026-02-21T08:54:53.4160714Z bfe.u32 %r359, %r358, 4, 14; 2026-02-21T08:54:53.4160867Z cvt.u64.u32 %rd85, %r359; 2026-02-21T08:54:53.4161018Z or.b64 %rd101, %rd85, 4611686293439512576; 2026-02-21T08:54:53.4161193Z add.s32 %r360, %r330, 65632; 
2026-02-21T08:54:53.4161337Z bfe.u32 %r361, %r360, 4, 14; 2026-02-21T08:54:53.4161490Z cvt.u64.u32 %rd86, %r361; 2026-02-21T08:54:53.4161640Z or.b64 %rd102, %rd86, 4611686293372403712; 2026-02-21T08:54:53.4161843Z add.s32 %r362, %r330, 96; 2026-02-21T08:54:53.4161995Z bfe.u32 %r363, %r362, 4, 14; 2026-02-21T08:54:53.4162141Z cvt.u64.u32 %rd87, %r363; 2026-02-21T08:54:53.4162298Z or.b64 %rd103, %rd87, 4611686293439512576; 2026-02-21T08:54:53.4162463Z add.s32 %r364, %r330, 81920; 2026-02-21T08:54:53.4162618Z bfe.u32 %r365, %r364, 4, 14; 2026-02-21T08:54:53.4162761Z cvt.u64.u32 %rd88, %r365; 2026-02-21T08:54:53.4162918Z or.b64 %rd104, %rd88, 4611686293372403712; 2026-02-21T08:54:53.4163082Z add.s32 %r366, %r330, 32768; 2026-02-21T08:54:53.4163234Z bfe.u32 %r367, %r366, 4, 14; 2026-02-21T08:54:53.4163378Z cvt.u64.u32 %rd89, %r367; 2026-02-21T08:54:53.4163536Z or.b64 %rd105, %rd89, 4611686293439512576; 2026-02-21T08:54:53.4163706Z add.s32 %r368, %r330, 81952; 2026-02-21T08:54:53.4163850Z bfe.u32 %r369, %r368, 4, 14; 2026-02-21T08:54:53.4164004Z cvt.u64.u32 %rd90, %r369; 2026-02-21T08:54:53.4164156Z or.b64 %rd106, %rd90, 4611686293372403712; 2026-02-21T08:54:53.4164332Z add.s32 %r370, %r330, 32800; 2026-02-21T08:54:53.4164488Z bfe.u32 %r371, %r370, 4, 14; 2026-02-21T08:54:53.4164644Z cvt.u64.u32 %rd91, %r371; 2026-02-21T08:54:53.4164828Z or.b64 %rd107, %rd91, 4611686293439512576; 2026-02-21T08:54:53.4165004Z add.s32 %r372, %r330, 81984; 2026-02-21T08:54:53.4165185Z bfe.u32 %r373, %r372, 4, 14; 2026-02-21T08:54:53.4165331Z cvt.u64.u32 %rd92, %r373; 2026-02-21T08:54:53.4165490Z or.b64 %rd108, %rd92, 4611686293372403712; 2026-02-21T08:54:53.4165655Z add.s32 %r374, %r330, 32832; 2026-02-21T08:54:53.4165811Z bfe.u32 %r375, %r374, 4, 14; 2026-02-21T08:54:53.4165957Z cvt.u64.u32 %rd93, %r375; 2026-02-21T08:54:53.4166116Z or.b64 %rd109, %rd93, 4611686293439512576; 2026-02-21T08:54:53.4166281Z add.s32 %r376, %r330, 82016; 2026-02-21T08:54:53.4166436Z bfe.u32 %r377, %r376, 4, 14; 
2026-02-21T08:54:53.4166583Z cvt.u64.u32 %rd94, %r377; 2026-02-21T08:54:53.4166742Z or.b64 %rd110, %rd94, 4611686293372403712; 2026-02-21T08:54:53.4166916Z add.s32 %r378, %r330, 32864; 2026-02-21T08:54:53.4167062Z bfe.u32 %r379, %r378, 4, 14; 2026-02-21T08:54:53.4167217Z cvt.u64.u32 %rd95, %r379; 2026-02-21T08:54:53.4167368Z or.b64 %rd111, %rd95, 4611686293439512576; 2026-02-21T08:54:53.4167541Z shl.b32 %r380, %r1, 7; 2026-02-21T08:54:53.4167684Z and.b32 %r381, %r380, 16256; 2026-02-21T08:54:53.4167835Z shl.b32 %r382, %r1, 4; 2026-02-21T08:54:53.4167982Z and.b32 %r383, %r382, 112; 2026-02-21T08:54:53.4168138Z or.b32 %r384, %r381, %r383; 2026-02-21T08:54:53.4168291Z xor.b32 %r385, %r384, 16; 2026-02-21T08:54:53.4168457Z xor.b32 %r386, %r384, 32; 2026-02-21T08:54:53.4168608Z xor.b32 %r387, %r384, 48; 2026-02-21T08:54:53.4168750Z xor.b32 %r388, %r384, 64; 2026-02-21T08:54:53.4168895Z xor.b32 %r389, %r384, 80; 2026-02-21T08:54:53.4169033Z xor.b32 %r390, %r384, 96; 2026-02-21T08:54:53.4169181Z xor.b32 %r391, %r384, 112; 2026-02-21T08:54:53.4169327Z shr.u32 %r392, %r1, 5; 2026-02-21T08:54:53.4169587Z .loc 1 40 33 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:40:33 2026-02-21T08:54:53.4169902Z shr.u32 %r393, %r3, 4; 2026-02-21T08:54:53.4170041Z and.b32 %r394, %r393, 32; 2026-02-21T08:54:53.4170297Z .loc 1 41 39 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:41:39 2026-02-21T08:54:53.4170573Z xor.b32 %r395, %r394, 48; 2026-02-21T08:54:53.4170830Z .loc 1 41 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:41:52 2026-02-21T08:54:53.4171109Z min.u32 %r396, %r395, 32; 2026-02-21T08:54:53.4171365Z .loc 1 42 64 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:42:64 2026-02-21T08:54:53.4171643Z cvt.u16.u32 %rs1, %r3; 2026-02-21T08:54:53.4171799Z and.b16 %rs2, %rs1, 511; 2026-02-21T08:54:53.4171953Z cvt.u16.u32 %rs3, %r396; 2026-02-21T08:54:53.4172203Z .loc 1 43 51 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:43:51 
2026-02-21T08:54:53.4172511Z div.u16 %rs4, %rs2, %rs3; 2026-02-21T08:54:53.4172761Z .loc 1 42 64 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:42:64 2026-02-21T08:54:53.4173046Z mul.lo.s16 %rs5, %rs4, %rs3; 2026-02-21T08:54:53.4173198Z sub.s16 %rs6, %rs2, %rs5; 2026-02-21T08:54:53.4173352Z cvt.u32.u16 %r397, %rs6; 2026-02-21T08:54:53.4173612Z .loc 1 42 30 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:42:30 2026-02-21T08:54:53.4173896Z add.s32 %r398, %r394, %r397; 2026-02-21T08:54:53.4174164Z .loc 1 44 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:44:27 2026-02-21T08:54:53.4174438Z shl.b32 %r442, %r398, 8; 2026-02-21T08:54:53.4174734Z .loc 1 45 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:45:27 2026-02-21T08:54:53.4175014Z mul.wide.u16 %r746, %rs4, 128; 2026-02-21T08:54:53.4175284Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4175576Z shfl.sync.idx.b32 %r14, %r392, 0, 31, -1; 2026-02-21T08:54:53.4175751Z and.b32 %r15, %r14, 3; 2026-02-21T08:54:53.4175900Z shl.b32 %r399, %r15, 21; 2026-02-21T08:54:53.4176046Z add.s32 %r744, %r399, %r1136; 2026-02-21T08:54:53.4176208Z mov.pred %p89, -1; 2026-02-21T08:54:53.4176378Z // begin inline asm 2026-02-21T08:54:53.4176726Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 0], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4177088Z // end inline asm 2026-02-21T08:54:53.4177230Z // begin inline asm 2026-02-21T08:54:53.4177572Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 16], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4177934Z // end inline asm 2026-02-21T08:54:53.4178072Z // begin inline asm 2026-02-21T08:54:53.4178402Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 32], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, 
%r33, %r33}; 2026-02-21T08:54:53.4178776Z // end inline asm 2026-02-21T08:54:53.4178908Z // begin inline asm 2026-02-21T08:54:53.4179239Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 48], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4179603Z // end inline asm 2026-02-21T08:54:53.4179732Z // begin inline asm 2026-02-21T08:54:53.4180079Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 64], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4180433Z // end inline asm 2026-02-21T08:54:53.4180567Z // begin inline asm 2026-02-21T08:54:53.4180877Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 80], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4181239Z // end inline asm 2026-02-21T08:54:53.4181371Z // begin inline asm 2026-02-21T08:54:53.4181681Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 96], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4182056Z // end inline asm 2026-02-21T08:54:53.4182184Z // begin inline asm 2026-02-21T08:54:53.4182506Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 112], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4182855Z // end inline asm 2026-02-21T08:54:53.4182984Z // begin inline asm 2026-02-21T08:54:53.4183304Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 128], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4183660Z // end inline asm 2026-02-21T08:54:53.4183793Z // begin inline asm 2026-02-21T08:54:53.4184128Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 144], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 
2026-02-21T08:54:53.4184482Z // end inline asm 2026-02-21T08:54:53.4184617Z // begin inline asm 2026-02-21T08:54:53.4184974Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 160], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4185333Z // end inline asm 2026-02-21T08:54:53.4185459Z // begin inline asm 2026-02-21T08:54:53.4185784Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 176], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4186130Z // end inline asm 2026-02-21T08:54:53.4186270Z // begin inline asm 2026-02-21T08:54:53.4186594Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 192], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4186961Z // end inline asm 2026-02-21T08:54:53.4187102Z // begin inline asm 2026-02-21T08:54:53.4187423Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 208], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4187801Z // end inline asm 2026-02-21T08:54:53.4187936Z // begin inline asm 2026-02-21T08:54:53.4188254Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 224], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4188635Z // end inline asm 2026-02-21T08:54:53.4188763Z // begin inline asm 2026-02-21T08:54:53.4189083Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r744 + 240], {%r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33, %r33}; 2026-02-21T08:54:53.4189445Z // end inline asm 2026-02-21T08:54:53.4189581Z // begin inline asm 2026-02-21T08:54:53.4189729Z tcgen05.wait::st.sync.aligned; 2026-02-21T08:54:53.4189892Z // end inline asm 2026-02-21T08:54:53.4190024Z bar.sync 0; 2026-02-21T08:54:53.4190154Z and.b32 %r400, %r14, 1; 
2026-02-21T08:54:53.4190309Z shl.b32 %r401, %r400, 14; 2026-02-21T08:54:53.4190459Z add.s32 %r431, %r349, %r401; 2026-02-21T08:54:53.4190618Z shl.b32 %r333, %r400, 6; 2026-02-21T08:54:53.4190761Z shl.b32 %r402, %r400, 15; 2026-02-21T08:54:53.4190914Z add.s32 %r440, %r330, %r402; 2026-02-21T08:54:53.4191069Z setp.ne.b32 %p84, %r14, 0; 2026-02-21T08:54:53.4191339Z .loc 1 54 31 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:54:31 2026-02-21T08:54:53.4191612Z bar.sync 0; 2026-02-21T08:54:53.4191770Z // begin inline asm 2026-02-21T08:54:53.4191949Z @%p4 mbarrier.init.shared::cta.b64 [%r330], 1; 2026-02-21T08:54:53.4192138Z // end inline asm 2026-02-21T08:54:53.4192277Z bar.sync 0; 2026-02-21T08:54:53.4192404Z // begin inline asm 2026-02-21T08:54:53.4192602Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r330], 32768; 2026-02-21T08:54:53.4192818Z // end inline asm 2026-02-21T08:54:53.4192956Z bar.sync 0; 2026-02-21T08:54:53.4193093Z elect.sync %r403|%p85, -1; 2026-02-21T08:54:53.4193267Z and.pred %p76, %p83, %p85; 2026-02-21T08:54:53.4193477Z // begin inline asm 2026-02-21T08:54:53.4193814Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r431], [%rd113, {%r333, %r746}], [%r330]; 2026-02-21T08:54:53.4194199Z // end inline asm 2026-02-21T08:54:53.4194330Z bar.sync 0; 2026-02-21T08:54:53.4194464Z // begin inline asm 2026-02-21T08:54:53.4194600Z 2026-02-21T08:54:53.4194751Z { 2026-02-21T08:54:53.4194879Z .reg .pred complete; 2026-02-21T08:54:53.4195041Z waitLoop: 2026-02-21T08:54:53.4195233Z mbarrier.try_wait.parity.shared.b64 complete, [%r330], %r33; 2026-02-21T08:54:53.4195480Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4195640Z } 2026-02-21T08:54:53.4195707Z 2026-02-21T08:54:53.4195763Z // end inline asm 2026-02-21T08:54:53.4195905Z bar.sync 0; 2026-02-21T08:54:53.4196032Z // begin inline asm 2026-02-21T08:54:53.4196204Z @%p4 mbarrier.inval.shared::cta.b64 [%r330]; 2026-02-21T08:54:53.4196422Z // end inline asm 
2026-02-21T08:54:53.4196685Z .loc 1 55 44 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:55:44 2026-02-21T08:54:53.4196981Z add.s32 %r424, %r330, 98304; 2026-02-21T08:54:53.4197145Z // begin inline asm 2026-02-21T08:54:53.4197318Z @%p4 mbarrier.init.shared::cta.b64 [%r424], 1; 2026-02-21T08:54:53.4197505Z // end inline asm 2026-02-21T08:54:53.4197645Z bar.sync 0; 2026-02-21T08:54:53.4197771Z // begin inline asm 2026-02-21T08:54:53.4197968Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r424], 65536; 2026-02-21T08:54:53.4198180Z // end inline asm 2026-02-21T08:54:53.4198321Z // begin inline asm 2026-02-21T08:54:53.4198472Z fence.proxy.async.shared::cta; 2026-02-21T08:54:53.4198646Z // end inline asm 2026-02-21T08:54:53.4198782Z bar.sync 0; 2026-02-21T08:54:53.4198917Z elect.sync %r404|%p86, -1; 2026-02-21T08:54:53.4199088Z and.pred %p80, %p83, %p86; 2026-02-21T08:54:53.4199253Z // begin inline asm 2026-02-21T08:54:53.4199585Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r440], [%rd114, {%r333, %r442}], [%r424]; 2026-02-21T08:54:53.4199930Z // end inline asm 2026-02-21T08:54:53.4200064Z bar.sync 0; 2026-02-21T08:54:53.4200186Z // begin inline asm 2026-02-21T08:54:53.4200319Z 2026-02-21T08:54:53.4200457Z { 2026-02-21T08:54:53.4200580Z .reg .pred complete; 2026-02-21T08:54:53.4200728Z waitLoop: 2026-02-21T08:54:53.4200906Z mbarrier.try_wait.parity.shared.b64 complete, [%r424], %r33; 2026-02-21T08:54:53.4201141Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4201288Z } 2026-02-21T08:54:53.4201349Z 2026-02-21T08:54:53.4201408Z // end inline asm 2026-02-21T08:54:53.4201533Z bar.sync 0; 2026-02-21T08:54:53.4201658Z // begin inline asm 2026-02-21T08:54:53.4201810Z @%p4 mbarrier.inval.shared::cta.b64 [%r424]; 2026-02-21T08:54:53.4201993Z // end inline asm 2026-02-21T08:54:53.4202238Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4202516Z bar.sync 0; 
2026-02-21T08:54:53.4202644Z // begin inline asm 2026-02-21T08:54:53.4202801Z @%p4 mbarrier.init.shared::cta.b64 [%r424], 1; 2026-02-21T08:54:53.4202985Z // end inline asm 2026-02-21T08:54:53.4203115Z @%p84 bra $L__BB0_3; 2026-02-21T08:54:53.4203257Z // %bb.2: 2026-02-21T08:54:53.4203384Z elect.sync %r421|%p88, -1; 2026-02-21T08:54:53.4203542Z mov.b32 %r406, 138412048; 2026-02-21T08:54:53.4203688Z mov.pred %p87, 0; 2026-02-21T08:54:53.4203829Z // begin inline asm 2026-02-21T08:54:53.4204078Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd96, %rd97, %r406, %p87; 2026-02-21T08:54:53.4204324Z // end inline asm 2026-02-21T08:54:53.4204459Z // begin inline asm 2026-02-21T08:54:53.4204667Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd98, %rd99, %r406, %p89; 2026-02-21T08:54:53.4204945Z // end inline asm 2026-02-21T08:54:53.4205075Z // begin inline asm 2026-02-21T08:54:53.4205299Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd100, %rd101, %r406, %p89; 2026-02-21T08:54:53.4205590Z // end inline asm 2026-02-21T08:54:53.4205720Z // begin inline asm 2026-02-21T08:54:53.4205935Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd102, %rd103, %r406, %p89; 2026-02-21T08:54:53.4206174Z // end inline asm 2026-02-21T08:54:53.4206309Z // begin inline asm 2026-02-21T08:54:53.4206512Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd104, %rd105, %r406, %p89; 2026-02-21T08:54:53.4206759Z // end inline asm 2026-02-21T08:54:53.4206888Z // begin inline asm 2026-02-21T08:54:53.4207098Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd106, %rd107, %r406, %p89; 2026-02-21T08:54:53.4207151Z // end inline asm 2026-02-21T08:54:53.4207213Z // begin inline asm 2026-02-21T08:54:53.4207337Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd108, %rd109, %r406, %p89; 2026-02-21T08:54:53.4207389Z // end inline asm 2026-02-21T08:54:53.4207449Z // begin inline asm 2026-02-21T08:54:53.4207603Z @%p88 
tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd110, %rd111, %r406, %p89; 2026-02-21T08:54:53.4207659Z // end inline asm 2026-02-21T08:54:53.4207719Z cvt.u64.u32 %rd112, %r424; 2026-02-21T08:54:53.4207780Z // begin inline asm 2026-02-21T08:54:53.4207903Z @%p88 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd112]; 2026-02-21T08:54:53.4207957Z // end inline asm 2026-02-21T08:54:53.4208046Z $L__BB0_3: // %.peel.next 2026-02-21T08:54:53.4208213Z .loc 1 0 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:0:52 2026-02-21T08:54:53.4208273Z add.s32 %r4, %r330, %r384; 2026-02-21T08:54:53.4208337Z add.s32 %r5, %r330, %r385; 2026-02-21T08:54:53.4208392Z add.s32 %r6, %r330, %r386; 2026-02-21T08:54:53.4208448Z add.s32 %r7, %r330, %r387; 2026-02-21T08:54:53.4208502Z add.s32 %r8, %r330, %r388; 2026-02-21T08:54:53.4208564Z add.s32 %r9, %r330, %r389; 2026-02-21T08:54:53.4208620Z add.s32 %r10, %r330, %r390; 2026-02-21T08:54:53.4208679Z add.s32 %r11, %r330, %r391; 2026-02-21T08:54:53.4208849Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4208902Z bar.sync 0; 2026-02-21T08:54:53.4208954Z mov.b32 %r425, 0; 2026-02-21T08:54:53.4209009Z // begin inline asm 2026-02-21T08:54:53.4209092Z 2026-02-21T08:54:53.4209141Z { 2026-02-21T08:54:53.4209199Z .reg .pred complete; 2026-02-21T08:54:53.4209259Z waitLoop: 2026-02-21T08:54:53.4209375Z mbarrier.try_wait.parity.shared.b64 complete, [%r424], %r425; 2026-02-21T08:54:53.4209438Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4209485Z } 2026-02-21T08:54:53.4209495Z 2026-02-21T08:54:53.4209548Z // end inline asm 2026-02-21T08:54:53.4209599Z bar.sync 0; 2026-02-21T08:54:53.4209652Z // begin inline asm 2026-02-21T08:54:53.4209738Z @%p4 mbarrier.inval.shared::cta.b64 [%r424]; 2026-02-21T08:54:53.4209790Z // end inline asm 2026-02-21T08:54:53.4209959Z .loc 1 50 57 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:50:57 2026-02-21T08:54:53.4210023Z or.b32 %r20, 
%r333, 128; 2026-02-21T08:54:53.4210079Z mov.b32 %r1137, %r425; 2026-02-21T08:54:53.4210132Z bra.uni $L__BB0_4; 2026-02-21T08:54:53.4210229Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:53.4210397Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4210448Z bar.sync 0; 2026-02-21T08:54:53.4210501Z mov.b32 %r470, 0; 2026-02-21T08:54:53.4210597Z // begin inline asm 2026-02-21T08:54:53.4210646Z 2026-02-21T08:54:53.4210694Z { 2026-02-21T08:54:53.4210757Z .reg .pred complete; 2026-02-21T08:54:53.4210809Z waitLoop: 2026-02-21T08:54:53.4210920Z mbarrier.try_wait.parity.shared.b64 complete, [%r424], %r470; 2026-02-21T08:54:53.4210981Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4211035Z } 2026-02-21T08:54:53.4211038Z 2026-02-21T08:54:53.4211090Z // end inline asm 2026-02-21T08:54:53.4211142Z bar.sync 0; 2026-02-21T08:54:53.4211201Z // begin inline asm 2026-02-21T08:54:53.4211305Z @%p4 mbarrier.inval.shared::cta.b64 [%r424]; 2026-02-21T08:54:53.4211358Z // end inline asm 2026-02-21T08:54:53.4211520Z .loc 1 50 57 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:50:57 2026-02-21T08:54:53.4211587Z add.s32 %r1137, %r1137, 128; 2026-02-21T08:54:53.4211651Z setp.lt.u32 %p136, %r1137, 1920; 2026-02-21T08:54:53.4211708Z @%p136 bra $L__BB0_4; 2026-02-21T08:54:53.4211767Z bra.uni $L__BB0_7; 2026-02-21T08:54:53.4211866Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T08:54:53.4212025Z .loc 1 54 31 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:54:31 2026-02-21T08:54:53.4212083Z bar.sync 0; 2026-02-21T08:54:53.4212135Z // begin inline asm 2026-02-21T08:54:53.4212216Z @%p4 mbarrier.init.shared::cta.b64 [%r330], 1; 2026-02-21T08:54:53.4212268Z // end inline asm 2026-02-21T08:54:53.4212366Z bar.sync 0; 2026-02-21T08:54:53.4212423Z // begin inline asm 2026-02-21T08:54:53.4212529Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r330], 32768; 2026-02-21T08:54:53.4212587Z // end inline asm 
2026-02-21T08:54:53.4212639Z bar.sync 0; 2026-02-21T08:54:53.4212700Z elect.sync %r448|%p116, -1; 2026-02-21T08:54:53.4212764Z and.pred %p107, %p83, %p116; 2026-02-21T08:54:53.4212831Z add.s32 %r432, %r20, %r1137; 2026-02-21T08:54:53.4212885Z // begin inline asm 2026-02-21T08:54:53.4213137Z @%p107 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r431], [%rd113, {%r432, %r746}], [%r330]; 2026-02-21T08:54:53.4213201Z // end inline asm 2026-02-21T08:54:53.4213255Z bar.sync 0; 2026-02-21T08:54:53.4213310Z // begin inline asm 2026-02-21T08:54:53.4213370Z 2026-02-21T08:54:53.4213418Z { 2026-02-21T08:54:53.4213475Z .reg .pred complete; 2026-02-21T08:54:53.4213527Z waitLoop: 2026-02-21T08:54:53.4213645Z mbarrier.try_wait.parity.shared.b64 complete, [%r330], %r425; 2026-02-21T08:54:53.4213708Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4213758Z } 2026-02-21T08:54:53.4213761Z 2026-02-21T08:54:53.4213819Z // end inline asm 2026-02-21T08:54:53.4213871Z bar.sync 0; 2026-02-21T08:54:53.4213923Z // begin inline asm 2026-02-21T08:54:53.4213998Z @%p4 mbarrier.inval.shared::cta.b64 [%r330]; 2026-02-21T08:54:53.4214081Z // end inline asm 2026-02-21T08:54:53.4214244Z .loc 1 55 44 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:55:44 2026-02-21T08:54:53.4214299Z // begin inline asm 2026-02-21T08:54:53.4214384Z @%p4 mbarrier.init.shared::cta.b64 [%r424], 1; 2026-02-21T08:54:53.4214436Z // end inline asm 2026-02-21T08:54:53.4214487Z bar.sync 0; 2026-02-21T08:54:53.4214546Z // begin inline asm 2026-02-21T08:54:53.4214649Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r424], 65536; 2026-02-21T08:54:53.4214736Z // end inline asm 2026-02-21T08:54:53.4214790Z // begin inline asm 2026-02-21T08:54:53.4214870Z fence.proxy.async.shared::cta; 2026-02-21T08:54:53.4214923Z // end inline asm 2026-02-21T08:54:53.4214976Z bar.sync 0; 2026-02-21T08:54:53.4215045Z elect.sync %r449|%p117, -1; 2026-02-21T08:54:53.4215107Z and.pred %p111, %p83, %p117; 
2026-02-21T08:54:53.4215160Z // begin inline asm 2026-02-21T08:54:53.4215404Z @%p111 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r440], [%rd114, {%r432, %r442}], [%r424]; 2026-02-21T08:54:53.4215463Z // end inline asm 2026-02-21T08:54:53.4215514Z bar.sync 0; 2026-02-21T08:54:53.4215590Z // begin inline asm 2026-02-21T08:54:53.4215649Z 2026-02-21T08:54:53.4215698Z { 2026-02-21T08:54:53.4215756Z .reg .pred complete; 2026-02-21T08:54:53.4215807Z waitLoop: 2026-02-21T08:54:53.4215927Z mbarrier.try_wait.parity.shared.b64 complete, [%r424], %r425; 2026-02-21T08:54:53.4215988Z @!complete bra.uni waitLoop; 2026-02-21T08:54:53.4216036Z } 2026-02-21T08:54:53.4216040Z 2026-02-21T08:54:53.4216102Z // end inline asm 2026-02-21T08:54:53.4216155Z bar.sync 0; 2026-02-21T08:54:53.4216209Z // begin inline asm 2026-02-21T08:54:53.4216322Z @%p4 mbarrier.inval.shared::cta.b64 [%r424]; 2026-02-21T08:54:53.4216375Z // end inline asm 2026-02-21T08:54:53.4216540Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4216593Z bar.sync 0; 2026-02-21T08:54:53.4216653Z // begin inline asm 2026-02-21T08:54:53.4216730Z @%p4 mbarrier.init.shared::cta.b64 [%r424], 1; 2026-02-21T08:54:53.4216781Z // end inline asm 2026-02-21T08:54:53.4216843Z @%p84 bra $L__BB0_6; 2026-02-21T08:54:53.4216939Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T08:54:53.4216999Z elect.sync %r466|%p119, -1; 2026-02-21T08:54:53.4217062Z mov.b32 %r451, 138412048; 2026-02-21T08:54:53.4217119Z mov.pred %p118, -1; 2026-02-21T08:54:53.4217171Z // begin inline asm 2026-02-21T08:54:53.4217309Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd96, %rd97, %r451, %p118; 2026-02-21T08:54:53.4217392Z // end inline asm 2026-02-21T08:54:53.4217447Z // begin inline asm 2026-02-21T08:54:53.4217583Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd98, %rd99, %r451, %p118; 2026-02-21T08:54:53.4217642Z // end inline asm 
2026-02-21T08:54:53.4217695Z // begin inline asm 2026-02-21T08:54:53.4217831Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd100, %rd101, %r451, %p118; 2026-02-21T08:54:53.4217890Z // end inline asm 2026-02-21T08:54:53.4217942Z // begin inline asm 2026-02-21T08:54:53.4218080Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd102, %rd103, %r451, %p118; 2026-02-21T08:54:53.4218131Z // end inline asm 2026-02-21T08:54:53.4218191Z // begin inline asm 2026-02-21T08:54:53.4218325Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd104, %rd105, %r451, %p118; 2026-02-21T08:54:53.4218376Z // end inline asm 2026-02-21T08:54:53.4218437Z // begin inline asm 2026-02-21T08:54:53.4218570Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd106, %rd107, %r451, %p118; 2026-02-21T08:54:53.4218622Z // end inline asm 2026-02-21T08:54:53.4218680Z // begin inline asm 2026-02-21T08:54:53.4218811Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd108, %rd109, %r451, %p118; 2026-02-21T08:54:53.4218864Z // end inline asm 2026-02-21T08:54:53.4218945Z // begin inline asm 2026-02-21T08:54:53.4219083Z @%p119 tcgen05.mma.cta_group::1.kind::f16 [ %r1136 + 0 ], %rd110, %rd111, %r451, %p118; 2026-02-21T08:54:53.4219135Z // end inline asm 2026-02-21T08:54:53.4219196Z cvt.u64.u32 %rd131, %r424; 2026-02-21T08:54:53.4219256Z // begin inline asm 2026-02-21T08:54:53.4219379Z @%p119 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd131]; 2026-02-21T08:54:53.4219431Z // end inline asm 2026-02-21T08:54:53.4219491Z bra.uni $L__BB0_6; 2026-02-21T08:54:53.4219569Z $L__BB0_7: // %.loopexit 2026-02-21T08:54:53.4219742Z .loc 1 0 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:0:52 2026-02-21T08:54:53.4219812Z setp.lt.u32 %p138, %r1, 128; 2026-02-21T08:54:53.4219980Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4220033Z // begin inline asm 2026-02-21T08:54:53.4220303Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r473, %r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488}, [%r744 + 0]; 2026-02-21T08:54:53.4220365Z // end inline asm 2026-02-21T08:54:53.4220453Z // begin inline asm 2026-02-21T08:54:53.4220727Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505}, [%r744 + 16]; 2026-02-21T08:54:53.4220789Z // end inline asm 2026-02-21T08:54:53.4220842Z // begin inline asm 2026-02-21T08:54:53.4221117Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r507, %r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522}, [%r744 + 32]; 2026-02-21T08:54:53.4221206Z // end inline asm 2026-02-21T08:54:53.4221261Z // begin inline asm 2026-02-21T08:54:53.4221516Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539}, [%r744 + 48]; 2026-02-21T08:54:53.4221579Z // end inline asm 2026-02-21T08:54:53.4221636Z // begin inline asm 2026-02-21T08:54:53.4221891Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556}, [%r744 + 64]; 2026-02-21T08:54:53.4221944Z // end inline asm 2026-02-21T08:54:53.4222006Z // begin inline asm 2026-02-21T08:54:53.4222257Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573}, [%r744 + 80]; 2026-02-21T08:54:53.4222309Z // end inline asm 2026-02-21T08:54:53.4222371Z // begin inline asm 2026-02-21T08:54:53.4222647Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590}, [%r744 + 96]; 2026-02-21T08:54:53.4222703Z // end inline asm 2026-02-21T08:54:53.4222772Z // 
begin inline asm 2026-02-21T08:54:53.4223042Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607}, [%r744 + 112]; 2026-02-21T08:54:53.4223097Z // end inline asm 2026-02-21T08:54:53.4223162Z // begin inline asm 2026-02-21T08:54:53.4223414Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624}, [%r744 + 128]; 2026-02-21T08:54:53.4223466Z // end inline asm 2026-02-21T08:54:53.4223518Z // begin inline asm 2026-02-21T08:54:53.4223779Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641}, [%r744 + 144]; 2026-02-21T08:54:53.4223832Z // end inline asm 2026-02-21T08:54:53.4223885Z // begin inline asm 2026-02-21T08:54:53.4224156Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658}, [%r744 + 160]; 2026-02-21T08:54:53.4224235Z // end inline asm 2026-02-21T08:54:53.4224288Z // begin inline asm 2026-02-21T08:54:53.4224553Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675}, [%r744 + 176]; 2026-02-21T08:54:53.4224607Z // end inline asm 2026-02-21T08:54:53.4224660Z // begin inline asm 2026-02-21T08:54:53.4224960Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692}, [%r744 + 192]; 2026-02-21T08:54:53.4225016Z // end inline asm 2026-02-21T08:54:53.4225070Z // begin inline asm 2026-02-21T08:54:53.4225332Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709}, [%r744 + 208]; 
2026-02-21T08:54:53.4225395Z // end inline asm 2026-02-21T08:54:53.4225448Z // begin inline asm 2026-02-21T08:54:53.4225718Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726}, [%r744 + 224]; 2026-02-21T08:54:53.4225779Z // end inline asm 2026-02-21T08:54:53.4225861Z // begin inline asm 2026-02-21T08:54:53.4226120Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743}, [%r744 + 240]; 2026-02-21T08:54:53.4226181Z // end inline asm 2026-02-21T08:54:53.4226234Z // begin inline asm 2026-02-21T08:54:53.4226302Z tcgen05.wait::ld.sync.aligned; 2026-02-21T08:54:53.4226361Z // end inline asm 2026-02-21T08:54:53.4226423Z cvt.u64.u32 %rd133, %r473; 2026-02-21T08:54:53.4226506Z cvt.u64.u32 %rd134, %r474; 2026-02-21T08:54:53.4226567Z shl.b64 %rd135, %rd134, 32; 2026-02-21T08:54:53.4226639Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T08:54:53.4226812Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4226875Z mov.b64 {%r748, %r749}, %rd136; 2026-02-21T08:54:53.4226949Z cvt.rn.f16x2.f32 %r750, %r749, %r748; 2026-02-21T08:54:53.4227116Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4227174Z cvt.u64.u32 %rd137, %r475; 2026-02-21T08:54:53.4227236Z cvt.u64.u32 %rd138, %r476; 2026-02-21T08:54:53.4227292Z shl.b64 %rd139, %rd138, 32; 2026-02-21T08:54:53.4227350Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T08:54:53.4227516Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4227608Z mov.b64 {%r751, %r752}, %rd140; 2026-02-21T08:54:53.4227673Z cvt.rn.f16x2.f32 %r753, %r752, %r751; 2026-02-21T08:54:53.4227832Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4227895Z cvt.u64.u32 %rd141, %r477; 
2026-02-21T08:54:53.4227951Z cvt.u64.u32 %rd142, %r478; 2026-02-21T08:54:53.4228005Z shl.b64 %rd143, %rd142, 32; 2026-02-21T08:54:53.4228061Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T08:54:53.4228226Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4228283Z mov.b64 {%r754, %r755}, %rd144; 2026-02-21T08:54:53.4228344Z cvt.rn.f16x2.f32 %r756, %r755, %r754; 2026-02-21T08:54:53.4228506Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4228562Z cvt.u64.u32 %rd145, %r479; 2026-02-21T08:54:53.4228615Z cvt.u64.u32 %rd146, %r480; 2026-02-21T08:54:53.4228677Z shl.b64 %rd147, %rd146, 32; 2026-02-21T08:54:53.4228734Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T08:54:53.4228890Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4228954Z mov.b64 {%r757, %r758}, %rd148; 2026-02-21T08:54:53.4229015Z cvt.rn.f16x2.f32 %r759, %r758, %r757; 2026-02-21T08:54:53.4229198Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4229255Z cvt.u64.u32 %rd149, %r481; 2026-02-21T08:54:53.4229319Z cvt.u64.u32 %rd150, %r482; 2026-02-21T08:54:53.4229374Z shl.b64 %rd151, %rd150, 32; 2026-02-21T08:54:53.4229429Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T08:54:53.4229598Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4229654Z mov.b64 {%r760, %r761}, %rd152; 2026-02-21T08:54:53.4229714Z cvt.rn.f16x2.f32 %r762, %r761, %r760; 2026-02-21T08:54:53.4229884Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4229942Z cvt.u64.u32 %rd153, %r483; 2026-02-21T08:54:53.4229998Z cvt.u64.u32 %rd154, %r484; 2026-02-21T08:54:53.4230053Z shl.b64 %rd155, %rd154, 32; 2026-02-21T08:54:53.4230115Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T08:54:53.4230279Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4230336Z mov.b64 {%r763, %r764}, %rd156; 2026-02-21T08:54:53.4230428Z cvt.rn.f16x2.f32 %r765, %r764, %r763; 2026-02-21T08:54:53.4230591Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4230648Z cvt.u64.u32 %rd157, %r485; 2026-02-21T08:54:53.4230704Z cvt.u64.u32 %rd158, %r486; 2026-02-21T08:54:53.4230768Z shl.b64 %rd159, %rd158, 32; 2026-02-21T08:54:53.4230826Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T08:54:53.4230987Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4231076Z mov.b64 {%r766, %r767}, %rd160; 2026-02-21T08:54:53.4231138Z cvt.rn.f16x2.f32 %r768, %r767, %r766; 2026-02-21T08:54:53.4231303Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4231370Z cvt.u64.u32 %rd161, %r487; 2026-02-21T08:54:53.4231427Z cvt.u64.u32 %rd162, %r488; 2026-02-21T08:54:53.4231484Z shl.b64 %rd163, %rd162, 32; 2026-02-21T08:54:53.4231541Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T08:54:53.4231717Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4231775Z mov.b64 {%r769, %r770}, %rd164; 2026-02-21T08:54:53.4231836Z cvt.rn.f16x2.f32 %r771, %r770, %r769; 2026-02-21T08:54:53.4232010Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4232065Z cvt.u64.u32 %rd165, %r490; 2026-02-21T08:54:53.4232141Z cvt.u64.u32 %rd166, %r491; 2026-02-21T08:54:53.4232207Z shl.b64 %rd167, %rd166, 32; 2026-02-21T08:54:53.4232265Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T08:54:53.4232427Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4232485Z mov.b64 {%r772, %r773}, %rd168; 2026-02-21T08:54:53.4232553Z cvt.rn.f16x2.f32 %r774, %r773, %r772; 2026-02-21T08:54:53.4232714Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4232770Z cvt.u64.u32 %rd169, 
%r492; 2026-02-21T08:54:53.4232834Z cvt.u64.u32 %rd170, %r493; 2026-02-21T08:54:53.4232890Z shl.b64 %rd171, %rd170, 32; 2026-02-21T08:54:53.4232947Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T08:54:53.4233120Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4233177Z mov.b64 {%r775, %r776}, %rd172; 2026-02-21T08:54:53.4233239Z cvt.rn.f16x2.f32 %r777, %r776, %r775; 2026-02-21T08:54:53.4233402Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4233463Z cvt.u64.u32 %rd173, %r494; 2026-02-21T08:54:53.4233518Z cvt.u64.u32 %rd174, %r495; 2026-02-21T08:54:53.4233596Z shl.b64 %rd175, %rd174, 32; 2026-02-21T08:54:53.4233660Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T08:54:53.4233823Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4233880Z mov.b64 {%r778, %r779}, %rd176; 2026-02-21T08:54:53.4233947Z cvt.rn.f16x2.f32 %r780, %r779, %r778; 2026-02-21T08:54:53.4234111Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4234166Z cvt.u64.u32 %rd177, %r496; 2026-02-21T08:54:53.4234220Z cvt.u64.u32 %rd178, %r497; 2026-02-21T08:54:53.4234283Z shl.b64 %rd179, %rd178, 32; 2026-02-21T08:54:53.4234340Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T08:54:53.4234500Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4234566Z mov.b64 {%r781, %r782}, %rd180; 2026-02-21T08:54:53.4234626Z cvt.rn.f16x2.f32 %r783, %r782, %r781; 2026-02-21T08:54:53.4234825Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4234889Z cvt.u64.u32 %rd181, %r498; 2026-02-21T08:54:53.4234944Z cvt.u64.u32 %rd182, %r499; 2026-02-21T08:54:53.4235025Z shl.b64 %rd183, %rd182, 32; 2026-02-21T08:54:53.4235083Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T08:54:53.4235252Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4235309Z mov.b64 {%r784, %r785}, %rd184; 2026-02-21T08:54:53.4235368Z cvt.rn.f16x2.f32 %r786, %r785, %r784; 2026-02-21T08:54:53.4235533Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4235615Z cvt.u64.u32 %rd185, %r500; 2026-02-21T08:54:53.4235670Z cvt.u64.u32 %rd186, %r501; 2026-02-21T08:54:53.4235732Z shl.b64 %rd187, %rd186, 32; 2026-02-21T08:54:53.4235787Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T08:54:53.4235947Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4236006Z mov.b64 {%r787, %r788}, %rd188; 2026-02-21T08:54:53.4236072Z cvt.rn.f16x2.f32 %r789, %r788, %r787; 2026-02-21T08:54:53.4236235Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4236291Z cvt.u64.u32 %rd189, %r502; 2026-02-21T08:54:53.4236353Z cvt.u64.u32 %rd190, %r503; 2026-02-21T08:54:53.4236408Z shl.b64 %rd191, %rd190, 32; 2026-02-21T08:54:53.4236463Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T08:54:53.4236630Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4236711Z mov.b64 {%r790, %r791}, %rd192; 2026-02-21T08:54:53.4236775Z cvt.rn.f16x2.f32 %r792, %r791, %r790; 2026-02-21T08:54:53.4236941Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4237005Z cvt.u64.u32 %rd193, %r504; 2026-02-21T08:54:53.4237064Z cvt.u64.u32 %rd194, %r505; 2026-02-21T08:54:53.4237122Z shl.b64 %rd195, %rd194, 32; 2026-02-21T08:54:53.4237188Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T08:54:53.4237356Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4237416Z mov.b64 {%r793, %r794}, %rd196; 2026-02-21T08:54:53.4237485Z cvt.rn.f16x2.f32 %r795, %r794, %r793; 2026-02-21T08:54:53.4237654Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4237710Z cvt.u64.u32 %rd197, 
%r507; 2026-02-21T08:54:53.4237768Z cvt.u64.u32 %rd198, %r508; 2026-02-21T08:54:53.4237833Z shl.b64 %rd199, %rd198, 32; 2026-02-21T08:54:53.4237893Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T08:54:53.4238060Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4238126Z mov.b64 {%r796, %r797}, %rd200; 2026-02-21T08:54:53.4238227Z cvt.rn.f16x2.f32 %r798, %r797, %r796; 2026-02-21T08:54:53.4238397Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4238463Z cvt.u64.u32 %rd201, %r509; 2026-02-21T08:54:53.4238522Z cvt.u64.u32 %rd202, %r510; 2026-02-21T08:54:53.4238580Z shl.b64 %rd203, %rd202, 32; 2026-02-21T08:54:53.4238639Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T08:54:53.4238816Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4238876Z mov.b64 {%r799, %r800}, %rd204; 2026-02-21T08:54:53.4238938Z cvt.rn.f16x2.f32 %r801, %r800, %r799; 2026-02-21T08:54:53.4239114Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4239175Z cvt.u64.u32 %rd205, %r511; 2026-02-21T08:54:53.4239235Z cvt.u64.u32 %rd206, %r512; 2026-02-21T08:54:53.4239301Z shl.b64 %rd207, %rd206, 32; 2026-02-21T08:54:53.4239361Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T08:54:53.4239528Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4239588Z mov.b64 {%r802, %r803}, %rd208; 2026-02-21T08:54:53.4239679Z cvt.rn.f16x2.f32 %r804, %r803, %r802; 2026-02-21T08:54:53.4239851Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4239910Z cvt.u64.u32 %rd209, %r513; 2026-02-21T08:54:53.4239977Z cvt.u64.u32 %rd210, %r514; 2026-02-21T08:54:53.4240038Z shl.b64 %rd211, %rd210, 32; 2026-02-21T08:54:53.4240099Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T08:54:53.4240281Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4240374Z mov.b64 {%r805, %r806}, %rd212; 2026-02-21T08:54:53.4240437Z cvt.rn.f16x2.f32 %r807, %r806, %r805; 2026-02-21T08:54:53.4240603Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4240672Z cvt.u64.u32 %rd213, %r515; 2026-02-21T08:54:53.4240729Z cvt.u64.u32 %rd214, %r516; 2026-02-21T08:54:53.4240787Z shl.b64 %rd215, %rd214, 32; 2026-02-21T08:54:53.4240854Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T08:54:53.4241017Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4241075Z mov.b64 {%r808, %r809}, %rd216; 2026-02-21T08:54:53.4241146Z cvt.rn.f16x2.f32 %r810, %r809, %r808; 2026-02-21T08:54:53.4241305Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4241392Z cvt.u64.u32 %rd217, %r517; 2026-02-21T08:54:53.4241453Z cvt.u64.u32 %rd218, %r518; 2026-02-21T08:54:53.4241518Z shl.b64 %rd219, %rd218, 32; 2026-02-21T08:54:53.4241576Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T08:54:53.4241744Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4241813Z mov.b64 {%r811, %r812}, %rd220; 2026-02-21T08:54:53.4241875Z cvt.rn.f16x2.f32 %r813, %r812, %r811; 2026-02-21T08:54:53.4242043Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4242109Z cvt.u64.u32 %rd221, %r519; 2026-02-21T08:54:53.4242166Z cvt.u64.u32 %rd222, %r520; 2026-02-21T08:54:53.4242224Z shl.b64 %rd223, %rd222, 32; 2026-02-21T08:54:53.4242282Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T08:54:53.4242457Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4242517Z mov.b64 {%r814, %r815}, %rd224; 2026-02-21T08:54:53.4242581Z cvt.rn.f16x2.f32 %r816, %r815, %r814; 2026-02-21T08:54:53.4242757Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4242815Z cvt.u64.u32 %rd225, 
%r521; 2026-02-21T08:54:53.4242872Z cvt.u64.u32 %rd226, %r522; 2026-02-21T08:54:53.4242963Z shl.b64 %rd227, %rd226, 32; 2026-02-21T08:54:53.4243022Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T08:54:53.4243190Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4243249Z mov.b64 {%r817, %r818}, %rd228; 2026-02-21T08:54:53.4243318Z cvt.rn.f16x2.f32 %r819, %r818, %r817; 2026-02-21T08:54:53.4243484Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4243542Z cvt.u64.u32 %rd229, %r524; 2026-02-21T08:54:53.4243608Z cvt.u64.u32 %rd230, %r525; 2026-02-21T08:54:53.4243667Z shl.b64 %rd231, %rd230, 32; 2026-02-21T08:54:53.4243726Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T08:54:53.4243904Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4243963Z mov.b64 {%r820, %r821}, %rd232; 2026-02-21T08:54:53.4244024Z cvt.rn.f16x2.f32 %r822, %r821, %r820; 2026-02-21T08:54:53.4244193Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4244259Z cvt.u64.u32 %rd233, %r526; 2026-02-21T08:54:53.4244338Z cvt.u64.u32 %rd234, %r527; 2026-02-21T08:54:53.4244398Z shl.b64 %rd235, %rd234, 32; 2026-02-21T08:54:53.4244465Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T08:54:53.4244636Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4244726Z mov.b64 {%r823, %r824}, %rd236; 2026-02-21T08:54:53.4244797Z cvt.rn.f16x2.f32 %r825, %r824, %r823; 2026-02-21T08:54:53.4244966Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4245050Z cvt.u64.u32 %rd237, %r528; 2026-02-21T08:54:53.4245108Z cvt.u64.u32 %rd238, %r529; 2026-02-21T08:54:53.4245172Z shl.b64 %rd239, %rd238, 32; 2026-02-21T08:54:53.4245230Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T08:54:53.4245400Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4245471Z mov.b64 {%r826, %r827}, %rd240; 2026-02-21T08:54:53.4245530Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T08:54:53.4245686Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4245749Z cvt.u64.u32 %rd241, %r530; 2026-02-21T08:54:53.4245804Z cvt.u64.u32 %rd242, %r531; 2026-02-21T08:54:53.4245859Z shl.b64 %rd243, %rd242, 32; 2026-02-21T08:54:53.4245915Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T08:54:53.4246108Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4246167Z mov.b64 {%r829, %r830}, %rd244; 2026-02-21T08:54:53.4246227Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T08:54:53.4246393Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4246450Z cvt.u64.u32 %rd245, %r532; 2026-02-21T08:54:53.4246505Z cvt.u64.u32 %rd246, %r533; 2026-02-21T08:54:53.4246567Z shl.b64 %rd247, %rd246, 32; 2026-02-21T08:54:53.4246622Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T08:54:53.4246786Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4246842Z mov.b64 {%r832, %r833}, %rd248; 2026-02-21T08:54:53.4246910Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T08:54:53.4247074Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4247131Z cvt.u64.u32 %rd249, %r534; 2026-02-21T08:54:53.4247193Z cvt.u64.u32 %rd250, %r535; 2026-02-21T08:54:53.4247250Z shl.b64 %rd251, %rd250, 32; 2026-02-21T08:54:53.4247304Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T08:54:53.4247476Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4247561Z mov.b64 {%r835, %r836}, %rd252; 2026-02-21T08:54:53.4247620Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T08:54:53.4247779Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4247843Z cvt.u64.u32 %rd253, 
%r536; 2026-02-21T08:54:53.4247898Z cvt.u64.u32 %rd254, %r537; 2026-02-21T08:54:53.4247955Z shl.b64 %rd255, %rd254, 32; 2026-02-21T08:54:53.4248019Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T08:54:53.4248176Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4248234Z mov.b64 {%r838, %r839}, %rd256; 2026-02-21T08:54:53.4248302Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T08:54:53.4248460Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4248515Z cvt.u64.u32 %rd257, %r538; 2026-02-21T08:54:53.4248570Z cvt.u64.u32 %rd258, %r539; 2026-02-21T08:54:53.4248636Z shl.b64 %rd259, %rd258, 32; 2026-02-21T08:54:53.4248693Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T08:54:53.4248854Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4248944Z mov.b64 {%r841, %r842}, %rd260; 2026-02-21T08:54:53.4249008Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T08:54:53.4249171Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4249236Z cvt.u64.u32 %rd261, %r541; 2026-02-21T08:54:53.4249290Z cvt.u64.u32 %rd262, %r542; 2026-02-21T08:54:53.4249345Z shl.b64 %rd263, %rd262, 32; 2026-02-21T08:54:53.4249400Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T08:54:53.4249567Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4249644Z mov.b64 {%r844, %r845}, %rd264; 2026-02-21T08:54:53.4249704Z cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T08:54:53.4249871Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4249928Z cvt.u64.u32 %rd265, %r543; 2026-02-21T08:54:53.4249983Z cvt.u64.u32 %rd266, %r544; 2026-02-21T08:54:53.4250048Z shl.b64 %rd267, %rd266, 32; 2026-02-21T08:54:53.4250103Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T08:54:53.4250260Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4250316Z mov.b64 {%r847, %r848}, %rd268; 2026-02-21T08:54:53.4250382Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T08:54:53.4250541Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4250616Z cvt.u64.u32 %rd269, %r545; 2026-02-21T08:54:53.4250680Z cvt.u64.u32 %rd270, %r546; 2026-02-21T08:54:53.4250735Z shl.b64 %rd271, %rd270, 32; 2026-02-21T08:54:53.4250790Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T08:54:53.4250958Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4251015Z mov.b64 {%r850, %r851}, %rd272; 2026-02-21T08:54:53.4251075Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T08:54:53.4251236Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4251298Z cvt.u64.u32 %rd273, %r547; 2026-02-21T08:54:53.4251354Z cvt.u64.u32 %rd274, %r548; 2026-02-21T08:54:53.4251409Z shl.b64 %rd275, %rd274, 32; 2026-02-21T08:54:53.4251475Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T08:54:53.4251632Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4251690Z mov.b64 {%r853, %r854}, %rd276; 2026-02-21T08:54:53.4251759Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T08:54:53.4251918Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4251974Z cvt.u64.u32 %rd277, %r549; 2026-02-21T08:54:53.4252056Z cvt.u64.u32 %rd278, %r550; 2026-02-21T08:54:53.4252119Z shl.b64 %rd279, %rd278, 32; 2026-02-21T08:54:53.4252174Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T08:54:53.4252337Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4252401Z mov.b64 {%r856, %r857}, %rd280; 2026-02-21T08:54:53.4252461Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T08:54:53.4252618Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4252680Z cvt.u64.u32 %rd281, 
%r551; 2026-02-21T08:54:53.4252735Z cvt.u64.u32 %rd282, %r552; 2026-02-21T08:54:53.4252792Z shl.b64 %rd283, %rd282, 32; 2026-02-21T08:54:53.4252849Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T08:54:53.4253013Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4253068Z mov.b64 {%r859, %r860}, %rd284; 2026-02-21T08:54:53.4253130Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T08:54:53.4253297Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4253352Z cvt.u64.u32 %rd285, %r553; 2026-02-21T08:54:53.4253428Z cvt.u64.u32 %rd286, %r554; 2026-02-21T08:54:53.4253492Z shl.b64 %rd287, %rd286, 32; 2026-02-21T08:54:53.4253550Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T08:54:53.4253709Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4253764Z mov.b64 {%r862, %r863}, %rd288; 2026-02-21T08:54:53.4253833Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T08:54:53.4253994Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4254074Z cvt.u64.u32 %rd289, %r555; 2026-02-21T08:54:53.4254136Z cvt.u64.u32 %rd290, %r556; 2026-02-21T08:54:53.4254190Z shl.b64 %rd291, %rd290, 32; 2026-02-21T08:54:53.4254247Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T08:54:53.4254411Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4254468Z mov.b64 {%r865, %r866}, %rd292; 2026-02-21T08:54:53.4254527Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T08:54:53.4254719Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4254782Z cvt.u64.u32 %rd293, %r558; 2026-02-21T08:54:53.4254836Z cvt.u64.u32 %rd294, %r559; 2026-02-21T08:54:53.4254890Z shl.b64 %rd295, %rd294, 32; 2026-02-21T08:54:53.4254953Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T08:54:53.4255151Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4255210Z mov.b64 {%r868, %r869}, %rd296; 2026-02-21T08:54:53.4255275Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T08:54:53.4255437Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4255495Z cvt.u64.u32 %rd297, %r560; 2026-02-21T08:54:53.4255549Z cvt.u64.u32 %rd298, %r561; 2026-02-21T08:54:53.4255611Z shl.b64 %rd299, %rd298, 32; 2026-02-21T08:54:53.4255668Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T08:54:53.4255828Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4255893Z mov.b64 {%r871, %r872}, %rd300; 2026-02-21T08:54:53.4255953Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T08:54:53.4256110Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4256173Z cvt.u64.u32 %rd301, %r562; 2026-02-21T08:54:53.4256230Z cvt.u64.u32 %rd302, %r563; 2026-02-21T08:54:53.4256288Z shl.b64 %rd303, %rd302, 32; 2026-02-21T08:54:53.4256344Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T08:54:53.4256511Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4256595Z mov.b64 {%r874, %r875}, %rd304; 2026-02-21T08:54:53.4256655Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T08:54:53.4256825Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4256881Z cvt.u64.u32 %rd305, %r564; 2026-02-21T08:54:53.4256936Z cvt.u64.u32 %rd306, %r565; 2026-02-21T08:54:53.4256999Z shl.b64 %rd307, %rd306, 32; 2026-02-21T08:54:53.4257056Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T08:54:53.4257221Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4257278Z mov.b64 {%r877, %r878}, %rd308; 2026-02-21T08:54:53.4257351Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T08:54:53.4257514Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4257573Z cvt.u64.u32 %rd309, 
%r566; 2026-02-21T08:54:53.4257640Z cvt.u64.u32 %rd310, %r567; 2026-02-21T08:54:53.4257697Z shl.b64 %rd311, %rd310, 32; 2026-02-21T08:54:53.4257753Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T08:54:53.4257921Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4258005Z mov.b64 {%r880, %r881}, %rd312; 2026-02-21T08:54:53.4258067Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T08:54:53.4258225Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4258288Z cvt.u64.u32 %rd313, %r568; 2026-02-21T08:54:53.4258342Z cvt.u64.u32 %rd314, %r569; 2026-02-21T08:54:53.4258398Z shl.b64 %rd315, %rd314, 32; 2026-02-21T08:54:53.4258463Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T08:54:53.4258648Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4258705Z mov.b64 {%r883, %r884}, %rd316; 2026-02-21T08:54:53.4258772Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T08:54:53.4258929Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4258984Z cvt.u64.u32 %rd317, %r570; 2026-02-21T08:54:53.4259039Z cvt.u64.u32 %rd318, %r571; 2026-02-21T08:54:53.4259102Z shl.b64 %rd319, %rd318, 32; 2026-02-21T08:54:53.4259158Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T08:54:53.4259317Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4259380Z mov.b64 {%r886, %r887}, %rd320; 2026-02-21T08:54:53.4259440Z cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T08:54:53.4259624Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4259690Z cvt.u64.u32 %rd321, %r572; 2026-02-21T08:54:53.4259745Z cvt.u64.u32 %rd322, %r573; 2026-02-21T08:54:53.4259800Z shl.b64 %rd323, %rd322, 32; 2026-02-21T08:54:53.4259856Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T08:54:53.4260029Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4260085Z mov.b64 {%r889, %r890}, %rd324; 2026-02-21T08:54:53.4260145Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T08:54:53.4260309Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4260365Z cvt.u64.u32 %rd325, %r575; 2026-02-21T08:54:53.4260420Z cvt.u64.u32 %rd326, %r576; 2026-02-21T08:54:53.4260483Z shl.b64 %rd327, %rd326, 32; 2026-02-21T08:54:53.4260540Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T08:54:53.4260696Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4260753Z mov.b64 {%r892, %r893}, %rd328; 2026-02-21T08:54:53.4260821Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T08:54:53.4260983Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4261060Z cvt.u64.u32 %rd329, %r577; 2026-02-21T08:54:53.4261124Z cvt.u64.u32 %rd330, %r578; 2026-02-21T08:54:53.4261178Z shl.b64 %rd331, %rd330, 32; 2026-02-21T08:54:53.4261233Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T08:54:53.4261405Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4261461Z mov.b64 {%r895, %r896}, %rd332; 2026-02-21T08:54:53.4261520Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T08:54:53.4261677Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4261740Z cvt.u64.u32 %rd333, %r579; 2026-02-21T08:54:53.4261796Z cvt.u64.u32 %rd334, %r580; 2026-02-21T08:54:53.4261851Z shl.b64 %rd335, %rd334, 32; 2026-02-21T08:54:53.4261913Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T08:54:53.4262071Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4262126Z mov.b64 {%r898, %r899}, %rd336; 2026-02-21T08:54:53.4262194Z cvt.rn.f16x2.f32 %r900, %r899, %r898; 2026-02-21T08:54:53.4262355Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4262431Z cvt.u64.u32 %rd337, 
%r581; 2026-02-21T08:54:53.4262487Z cvt.u64.u32 %rd338, %r582; 2026-02-21T08:54:53.4262549Z shl.b64 %rd339, %rd338, 32; 2026-02-21T08:54:53.4262603Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T08:54:53.4262761Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4262823Z mov.b64 {%r901, %r902}, %rd340; 2026-02-21T08:54:53.4262882Z cvt.rn.f16x2.f32 %r903, %r902, %r901; 2026-02-21T08:54:53.4263038Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4263121Z cvt.u64.u32 %rd341, %r583; 2026-02-21T08:54:53.4263176Z cvt.u64.u32 %rd342, %r584; 2026-02-21T08:54:53.4263230Z shl.b64 %rd343, %rd342, 32; 2026-02-21T08:54:53.4263286Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T08:54:53.4263451Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4263509Z mov.b64 {%r904, %r905}, %rd344; 2026-02-21T08:54:53.4263569Z cvt.rn.f16x2.f32 %r906, %r905, %r904; 2026-02-21T08:54:53.4263732Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4263788Z cvt.u64.u32 %rd345, %r585; 2026-02-21T08:54:53.4263843Z cvt.u64.u32 %rd346, %r586; 2026-02-21T08:54:53.4263905Z shl.b64 %rd347, %rd346, 32; 2026-02-21T08:54:53.4263961Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T08:54:53.4264138Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4264196Z mov.b64 {%r907, %r908}, %rd348; 2026-02-21T08:54:53.4264263Z cvt.rn.f16x2.f32 %r909, %r908, %r907; 2026-02-21T08:54:53.4264426Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4264483Z cvt.u64.u32 %rd349, %r587; 2026-02-21T08:54:53.4264545Z cvt.u64.u32 %rd350, %r588; 2026-02-21T08:54:53.4264600Z shl.b64 %rd351, %rd350, 32; 2026-02-21T08:54:53.4264657Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T08:54:53.4264850Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4264908Z mov.b64 {%r910, %r911}, %rd352; 2026-02-21T08:54:53.4264967Z cvt.rn.f16x2.f32 %r912, %r911, %r910; 2026-02-21T08:54:53.4265120Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4265186Z cvt.u64.u32 %rd353, %r589; 2026-02-21T08:54:53.4265242Z cvt.u64.u32 %rd354, %r590; 2026-02-21T08:54:53.4265298Z shl.b64 %rd355, %rd354, 32; 2026-02-21T08:54:53.4265361Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T08:54:53.4265518Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4265601Z mov.b64 {%r913, %r914}, %rd356; 2026-02-21T08:54:53.4265670Z cvt.rn.f16x2.f32 %r915, %r914, %r913; 2026-02-21T08:54:53.4265834Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4265894Z cvt.u64.u32 %rd357, %r592; 2026-02-21T08:54:53.4265949Z cvt.u64.u32 %rd358, %r593; 2026-02-21T08:54:53.4266015Z shl.b64 %rd359, %rd358, 32; 2026-02-21T08:54:53.4266073Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T08:54:53.4266239Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4266307Z mov.b64 {%r916, %r917}, %rd360; 2026-02-21T08:54:53.4266371Z cvt.rn.f16x2.f32 %r918, %r917, %r916; 2026-02-21T08:54:53.4266534Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4266605Z cvt.u64.u32 %rd361, %r594; 2026-02-21T08:54:53.4266667Z cvt.u64.u32 %rd362, %r595; 2026-02-21T08:54:53.4266726Z shl.b64 %rd363, %rd362, 32; 2026-02-21T08:54:53.4266786Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T08:54:53.4266978Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4267035Z mov.b64 {%r919, %r920}, %rd364; 2026-02-21T08:54:53.4267094Z cvt.rn.f16x2.f32 %r921, %r920, %r919; 2026-02-21T08:54:53.4267260Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4267315Z cvt.u64.u32 %rd365, 
%r596; 2026-02-21T08:54:53.4267370Z cvt.u64.u32 %rd366, %r597; 2026-02-21T08:54:53.4267425Z shl.b64 %rd367, %rd366, 32; 2026-02-21T08:54:53.4267489Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T08:54:53.4267669Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4267727Z mov.b64 {%r922, %r923}, %rd368; 2026-02-21T08:54:53.4267794Z cvt.rn.f16x2.f32 %r924, %r923, %r922; 2026-02-21T08:54:53.4267954Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4268011Z cvt.u64.u32 %rd369, %r598; 2026-02-21T08:54:53.4268074Z cvt.u64.u32 %rd370, %r599; 2026-02-21T08:54:53.4268131Z shl.b64 %rd371, %rd370, 32; 2026-02-21T08:54:53.4268186Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T08:54:53.4268348Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4268412Z mov.b64 {%r925, %r926}, %rd372; 2026-02-21T08:54:53.4268472Z cvt.rn.f16x2.f32 %r927, %r926, %r925; 2026-02-21T08:54:53.4268655Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4268720Z cvt.u64.u32 %rd373, %r600; 2026-02-21T08:54:53.4268776Z cvt.u64.u32 %rd374, %r601; 2026-02-21T08:54:53.4268831Z shl.b64 %rd375, %rd374, 32; 2026-02-21T08:54:53.4268895Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T08:54:53.4269060Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4269117Z mov.b64 {%r928, %r929}, %rd376; 2026-02-21T08:54:53.4269179Z cvt.rn.f16x2.f32 %r930, %r929, %r928; 2026-02-21T08:54:53.4269349Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4269406Z cvt.u64.u32 %rd377, %r602; 2026-02-21T08:54:53.4269461Z cvt.u64.u32 %rd378, %r603; 2026-02-21T08:54:53.4269525Z shl.b64 %rd379, %rd378, 32; 2026-02-21T08:54:53.4269582Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T08:54:53.4269745Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4269811Z mov.b64 {%r931, %r932}, %rd380; 2026-02-21T08:54:53.4269871Z cvt.rn.f16x2.f32 %r933, %r932, %r931; 2026-02-21T08:54:53.4270033Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4270117Z cvt.u64.u32 %rd381, %r604; 2026-02-21T08:54:53.4270171Z cvt.u64.u32 %rd382, %r605; 2026-02-21T08:54:53.4270227Z shl.b64 %rd383, %rd382, 32; 2026-02-21T08:54:53.4270282Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T08:54:53.4270448Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4270505Z mov.b64 {%r934, %r935}, %rd384; 2026-02-21T08:54:53.4270563Z cvt.rn.f16x2.f32 %r936, %r935, %r934; 2026-02-21T08:54:53.4270731Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4270786Z cvt.u64.u32 %rd385, %r606; 2026-02-21T08:54:53.4270842Z cvt.u64.u32 %rd386, %r607; 2026-02-21T08:54:53.4270898Z shl.b64 %rd387, %rd386, 32; 2026-02-21T08:54:53.4270960Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T08:54:53.4271122Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4271180Z mov.b64 {%r937, %r938}, %rd388; 2026-02-21T08:54:53.4271245Z cvt.rn.f16x2.f32 %r939, %r938, %r937; 2026-02-21T08:54:53.4271407Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4271484Z cvt.u64.u32 %rd389, %r609; 2026-02-21T08:54:53.4271546Z cvt.u64.u32 %rd390, %r610; 2026-02-21T08:54:53.4271601Z shl.b64 %rd391, %rd390, 32; 2026-02-21T08:54:53.4271656Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T08:54:53.4271809Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4271872Z mov.b64 {%r940, %r941}, %rd392; 2026-02-21T08:54:53.4271931Z cvt.rn.f16x2.f32 %r942, %r941, %r940; 2026-02-21T08:54:53.4272123Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4272186Z cvt.u64.u32 %rd393, 
%r611; 2026-02-21T08:54:53.4272239Z cvt.u64.u32 %rd394, %r612; 2026-02-21T08:54:53.4272297Z shl.b64 %rd395, %rd394, 32; 2026-02-21T08:54:53.4272359Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T08:54:53.4272519Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4272577Z mov.b64 {%r943, %r944}, %rd396; 2026-02-21T08:54:53.4272637Z cvt.rn.f16x2.f32 %r945, %r944, %r943; 2026-02-21T08:54:53.4272804Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4272860Z cvt.u64.u32 %rd397, %r613; 2026-02-21T08:54:53.4272914Z cvt.u64.u32 %rd398, %r614; 2026-02-21T08:54:53.4272976Z shl.b64 %rd399, %rd398, 32; 2026-02-21T08:54:53.4273053Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T08:54:53.4273218Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4273284Z mov.b64 {%r946, %r947}, %rd400; 2026-02-21T08:54:53.4273346Z cvt.rn.f16x2.f32 %r948, %r947, %r946; 2026-02-21T08:54:53.4273510Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4273567Z cvt.u64.u32 %rd401, %r615; 2026-02-21T08:54:53.4273633Z cvt.u64.u32 %rd402, %r616; 2026-02-21T08:54:53.4273693Z shl.b64 %rd403, %rd402, 32; 2026-02-21T08:54:53.4273751Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T08:54:53.4273919Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4273978Z mov.b64 {%r949, %r950}, %rd404; 2026-02-21T08:54:53.4274040Z cvt.rn.f16x2.f32 %r951, %r950, %r949; 2026-02-21T08:54:53.4274214Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4274275Z cvt.u64.u32 %rd405, %r617; 2026-02-21T08:54:53.4274334Z cvt.u64.u32 %rd406, %r618; 2026-02-21T08:54:53.4274393Z shl.b64 %rd407, %rd406, 32; 2026-02-21T08:54:53.4274462Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T08:54:53.4274626Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4274738Z mov.b64 {%r952, %r953}, %rd408; 2026-02-21T08:54:53.4274816Z cvt.rn.f16x2.f32 %r954, %r953, %r952; 2026-02-21T08:54:53.4274978Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4275034Z cvt.u64.u32 %rd409, %r619; 2026-02-21T08:54:53.4275096Z cvt.u64.u32 %rd410, %r620; 2026-02-21T08:54:53.4275151Z shl.b64 %rd411, %rd410, 32; 2026-02-21T08:54:53.4275207Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T08:54:53.4275368Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4275433Z mov.b64 {%r955, %r956}, %rd412; 2026-02-21T08:54:53.4275495Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T08:54:53.4275652Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4275714Z cvt.u64.u32 %rd413, %r621; 2026-02-21T08:54:53.4275770Z cvt.u64.u32 %rd414, %r622; 2026-02-21T08:54:53.4275825Z shl.b64 %rd415, %rd414, 32; 2026-02-21T08:54:53.4275887Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T08:54:53.4276074Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4276132Z mov.b64 {%r958, %r959}, %rd416; 2026-02-21T08:54:53.4276191Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T08:54:53.4276359Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4276416Z cvt.u64.u32 %rd417, %r623; 2026-02-21T08:54:53.4276471Z cvt.u64.u32 %rd418, %r624; 2026-02-21T08:54:53.4276535Z shl.b64 %rd419, %rd418, 32; 2026-02-21T08:54:53.4276615Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T08:54:53.4276774Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4276837Z mov.b64 {%r961, %r962}, %rd420; 2026-02-21T08:54:53.4276897Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T08:54:53.4277058Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4277115Z cvt.u64.u32 %rd421, 
%r626; 2026-02-21T08:54:53.4277178Z cvt.u64.u32 %rd422, %r627; 2026-02-21T08:54:53.4277234Z shl.b64 %rd423, %rd422, 32; 2026-02-21T08:54:53.4277290Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T08:54:53.4277455Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4277512Z mov.b64 {%r964, %r965}, %rd424; 2026-02-21T08:54:53.4277571Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T08:54:53.4277758Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4277817Z cvt.u64.u32 %rd425, %r628; 2026-02-21T08:54:53.4277872Z cvt.u64.u32 %rd426, %r629; 2026-02-21T08:54:53.4277927Z shl.b64 %rd427, %rd426, 32; 2026-02-21T08:54:53.4277990Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T08:54:53.4278149Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4278204Z mov.b64 {%r967, %r968}, %rd428; 2026-02-21T08:54:53.4278272Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T08:54:53.4278429Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4278483Z cvt.u64.u32 %rd429, %r630; 2026-02-21T08:54:53.4278544Z cvt.u64.u32 %rd430, %r631; 2026-02-21T08:54:53.4278598Z shl.b64 %rd431, %rd430, 32; 2026-02-21T08:54:53.4278653Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T08:54:53.4278809Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4278872Z mov.b64 {%r970, %r971}, %rd432; 2026-02-21T08:54:53.4278930Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T08:54:53.4279084Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4279174Z cvt.u64.u32 %rd433, %r632; 2026-02-21T08:54:53.4279230Z cvt.u64.u32 %rd434, %r633; 2026-02-21T08:54:53.4279287Z shl.b64 %rd435, %rd434, 32; 2026-02-21T08:54:53.4279351Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T08:54:53.4279516Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4279574Z mov.b64 {%r973, %r974}, %rd436; 2026-02-21T08:54:53.4279635Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T08:54:53.4279806Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4279865Z cvt.u64.u32 %rd437, %r634; 2026-02-21T08:54:53.4279923Z cvt.u64.u32 %rd438, %r635; 2026-02-21T08:54:53.4279987Z shl.b64 %rd439, %rd438, 32; 2026-02-21T08:54:53.4280045Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T08:54:53.4280210Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4280278Z mov.b64 {%r976, %r977}, %rd440; 2026-02-21T08:54:53.4280339Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T08:54:53.4280528Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4280588Z cvt.u64.u32 %rd441, %r636; 2026-02-21T08:54:53.4280652Z cvt.u64.u32 %rd442, %r637; 2026-02-21T08:54:53.4280710Z shl.b64 %rd443, %rd442, 32; 2026-02-21T08:54:53.4280767Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T08:54:53.4280945Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4281003Z mov.b64 {%r979, %r980}, %rd444; 2026-02-21T08:54:53.4281066Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T08:54:53.4281260Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4281318Z cvt.u64.u32 %rd445, %r638; 2026-02-21T08:54:53.4281376Z cvt.u64.u32 %rd446, %r639; 2026-02-21T08:54:53.4281436Z shl.b64 %rd447, %rd446, 32; 2026-02-21T08:54:53.4281502Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T08:54:53.4281671Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4281730Z mov.b64 {%r982, %r983}, %rd448; 2026-02-21T08:54:53.4281798Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T08:54:53.4281968Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4282027Z cvt.u64.u32 %rd449, 
%r640; 2026-02-21T08:54:53.4282091Z cvt.u64.u32 %rd450, %r641; 2026-02-21T08:54:53.4282149Z shl.b64 %rd451, %rd450, 32; 2026-02-21T08:54:53.4282228Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T08:54:53.4282400Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4282466Z mov.b64 {%r985, %r986}, %rd452; 2026-02-21T08:54:53.4282529Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T08:54:53.4282699Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4282765Z cvt.u64.u32 %rd453, %r643; 2026-02-21T08:54:53.4282826Z cvt.u64.u32 %rd454, %r644; 2026-02-21T08:54:53.4282884Z shl.b64 %rd455, %rd454, 32; 2026-02-21T08:54:53.4282951Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T08:54:53.4283124Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4283185Z mov.b64 {%r988, %r989}, %rd456; 2026-02-21T08:54:53.4283250Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T08:54:53.4283432Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4283492Z cvt.u64.u32 %rd457, %r645; 2026-02-21T08:54:53.4283552Z cvt.u64.u32 %rd458, %r646; 2026-02-21T08:54:53.4283619Z shl.b64 %rd459, %rd458, 32; 2026-02-21T08:54:53.4283678Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T08:54:53.4283867Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4283933Z mov.b64 {%r991, %r992}, %rd460; 2026-02-21T08:54:53.4283996Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T08:54:53.4284166Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4284225Z cvt.u64.u32 %rd461, %r647; 2026-02-21T08:54:53.4284291Z cvt.u64.u32 %rd462, %r648; 2026-02-21T08:54:53.4284349Z shl.b64 %rd463, %rd462, 32; 2026-02-21T08:54:53.4284407Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T08:54:53.4284584Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 
2026-02-21T08:54:53.4284645Z mov.b64 {%r994, %r995}, %rd464; 2026-02-21T08:54:53.4284743Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T08:54:53.4284921Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4284981Z cvt.u64.u32 %rd465, %r649; 2026-02-21T08:54:53.4285038Z cvt.u64.u32 %rd466, %r650; 2026-02-21T08:54:53.4285097Z shl.b64 %rd467, %rd466, 32; 2026-02-21T08:54:53.4285164Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T08:54:53.4285358Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4285420Z mov.b64 {%r997, %r998}, %rd468; 2026-02-21T08:54:53.4285491Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T08:54:53.4285658Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4285717Z cvt.u64.u32 %rd469, %r651; 2026-02-21T08:54:53.4285785Z cvt.u64.u32 %rd470, %r652; 2026-02-21T08:54:53.4285871Z shl.b64 %rd471, %rd470, 32; 2026-02-21T08:54:53.4285929Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T08:54:53.4286097Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4286173Z mov.b64 {%r1000, %r1001}, %rd472; 2026-02-21T08:54:53.4286245Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T08:54:53.4286416Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4286484Z cvt.u64.u32 %rd473, %r653; 2026-02-21T08:54:53.4286544Z cvt.u64.u32 %rd474, %r654; 2026-02-21T08:54:53.4286603Z shl.b64 %rd475, %rd474, 32; 2026-02-21T08:54:53.4286669Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T08:54:53.4286840Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4286903Z mov.b64 {%r1003, %r1004}, %rd476; 2026-02-21T08:54:53.4286999Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T08:54:53.4287174Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4287233Z cvt.u64.u32 
%rd477, %r655; 2026-02-21T08:54:53.4287291Z cvt.u64.u32 %rd478, %r656; 2026-02-21T08:54:53.4287361Z shl.b64 %rd479, %rd478, 32; 2026-02-21T08:54:53.4287420Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T08:54:53.4287585Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4287652Z mov.b64 {%r1006, %r1007}, %rd480; 2026-02-21T08:54:53.4287717Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T08:54:53.4287872Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4287926Z cvt.u64.u32 %rd481, %r657; 2026-02-21T08:54:53.4287986Z cvt.u64.u32 %rd482, %r658; 2026-02-21T08:54:53.4288041Z shl.b64 %rd483, %rd482, 32; 2026-02-21T08:54:53.4288097Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T08:54:53.4288256Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4288316Z mov.b64 {%r1009, %r1010}, %rd484; 2026-02-21T08:54:53.4288380Z cvt.rn.f16x2.f32 %r1011, %r1010, %r1009; 2026-02-21T08:54:53.4288569Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4288625Z cvt.u64.u32 %rd485, %r660; 2026-02-21T08:54:53.4288679Z cvt.u64.u32 %rd486, %r661; 2026-02-21T08:54:53.4288734Z shl.b64 %rd487, %rd486, 32; 2026-02-21T08:54:53.4288798Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T08:54:53.4288957Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4289013Z mov.b64 {%r1012, %r1013}, %rd488; 2026-02-21T08:54:53.4289084Z cvt.rn.f16x2.f32 %r1014, %r1013, %r1012; 2026-02-21T08:54:53.4289244Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4289302Z cvt.u64.u32 %rd489, %r662; 2026-02-21T08:54:53.4289363Z cvt.u64.u32 %rd490, %r663; 2026-02-21T08:54:53.4289418Z shl.b64 %rd491, %rd490, 32; 2026-02-21T08:54:53.4289474Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T08:54:53.4289634Z .loc 1 58 27 // 
cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4289698Z mov.b64 {%r1015, %r1016}, %rd492; 2026-02-21T08:54:53.4289762Z cvt.rn.f16x2.f32 %r1017, %r1016, %r1015; 2026-02-21T08:54:53.4289956Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4290020Z cvt.u64.u32 %rd493, %r664; 2026-02-21T08:54:53.4290074Z cvt.u64.u32 %rd494, %r665; 2026-02-21T08:54:53.4290129Z shl.b64 %rd495, %rd494, 32; 2026-02-21T08:54:53.4290193Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T08:54:53.4290351Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4290410Z mov.b64 {%r1018, %r1019}, %rd496; 2026-02-21T08:54:53.4290505Z cvt.rn.f16x2.f32 %r1020, %r1019, %r1018; 2026-02-21T08:54:53.4290674Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4290732Z cvt.u64.u32 %rd497, %r666; 2026-02-21T08:54:53.4290786Z cvt.u64.u32 %rd498, %r667; 2026-02-21T08:54:53.4290850Z shl.b64 %rd499, %rd498, 32; 2026-02-21T08:54:53.4290905Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T08:54:53.4291067Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4291132Z mov.b64 {%r1021, %r1022}, %rd500; 2026-02-21T08:54:53.4291195Z cvt.rn.f16x2.f32 %r1023, %r1022, %r1021; 2026-02-21T08:54:53.4291359Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4291415Z cvt.u64.u32 %rd501, %r668; 2026-02-21T08:54:53.4291476Z cvt.u64.u32 %rd502, %r669; 2026-02-21T08:54:53.4291557Z shl.b64 %rd503, %rd502, 32; 2026-02-21T08:54:53.4291616Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T08:54:53.4291787Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4291845Z mov.b64 {%r1024, %r1025}, %rd504; 2026-02-21T08:54:53.4291912Z cvt.rn.f16x2.f32 %r1026, %r1025, %r1024; 2026-02-21T08:54:53.4292096Z .loc 1 56 52 // 
cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4292155Z cvt.u64.u32 %rd505, %r670; 2026-02-21T08:54:53.4292212Z cvt.u64.u32 %rd506, %r671; 2026-02-21T08:54:53.4292269Z shl.b64 %rd507, %rd506, 32; 2026-02-21T08:54:53.4292336Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T08:54:53.4292497Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4292553Z mov.b64 {%r1027, %r1028}, %rd508; 2026-02-21T08:54:53.4292628Z cvt.rn.f16x2.f32 %r1029, %r1028, %r1027; 2026-02-21T08:54:53.4292788Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4292845Z cvt.u64.u32 %rd509, %r672; 2026-02-21T08:54:53.4292906Z cvt.u64.u32 %rd510, %r673; 2026-02-21T08:54:53.4292963Z shl.b64 %rd511, %rd510, 32; 2026-02-21T08:54:53.4293045Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T08:54:53.4293204Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4293270Z mov.b64 {%r1030, %r1031}, %rd512; 2026-02-21T08:54:53.4293335Z cvt.rn.f16x2.f32 %r1032, %r1031, %r1030; 2026-02-21T08:54:53.4293499Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4293560Z cvt.u64.u32 %rd513, %r674; 2026-02-21T08:54:53.4293615Z cvt.u64.u32 %rd514, %r675; 2026-02-21T08:54:53.4293670Z shl.b64 %rd515, %rd514, 32; 2026-02-21T08:54:53.4293733Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T08:54:53.4293897Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4293954Z mov.b64 {%r1033, %r1034}, %rd516; 2026-02-21T08:54:53.4294017Z cvt.rn.f16x2.f32 %r1035, %r1034, %r1033; 2026-02-21T08:54:53.4294184Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4294240Z cvt.u64.u32 %rd517, %r677; 2026-02-21T08:54:53.4294294Z cvt.u64.u32 %rd518, %r678; 2026-02-21T08:54:53.4294376Z shl.b64 %rd519, %rd518, 32; 2026-02-21T08:54:53.4294435Z or.b64 
%rd520, %rd517, %rd519; 2026-02-21T08:54:53.4294599Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4294662Z mov.b64 {%r1036, %r1037}, %rd520; 2026-02-21T08:54:53.4294759Z cvt.rn.f16x2.f32 %r1038, %r1037, %r1036; 2026-02-21T08:54:53.4294916Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4294973Z cvt.u64.u32 %rd521, %r679; 2026-02-21T08:54:53.4295065Z cvt.u64.u32 %rd522, %r680; 2026-02-21T08:54:53.4295121Z shl.b64 %rd523, %rd522, 32; 2026-02-21T08:54:53.4295177Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T08:54:53.4295345Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4295403Z mov.b64 {%r1039, %r1040}, %rd524; 2026-02-21T08:54:53.4295466Z cvt.rn.f16x2.f32 %r1041, %r1040, %r1039; 2026-02-21T08:54:53.4295635Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4295692Z cvt.u64.u32 %rd525, %r681; 2026-02-21T08:54:53.4295746Z cvt.u64.u32 %rd526, %r682; 2026-02-21T08:54:53.4295802Z shl.b64 %rd527, %rd526, 32; 2026-02-21T08:54:53.4295866Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T08:54:53.4296027Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4296111Z mov.b64 {%r1042, %r1043}, %rd528; 2026-02-21T08:54:53.4296188Z cvt.rn.f16x2.f32 %r1044, %r1043, %r1042; 2026-02-21T08:54:53.4296344Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4296401Z cvt.u64.u32 %rd529, %r683; 2026-02-21T08:54:53.4296467Z cvt.u64.u32 %rd530, %r684; 2026-02-21T08:54:53.4296524Z shl.b64 %rd531, %rd530, 32; 2026-02-21T08:54:53.4296580Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T08:54:53.4296738Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4296802Z mov.b64 {%r1045, %r1046}, %rd532; 2026-02-21T08:54:53.4296865Z cvt.rn.f16x2.f32 %r1047, %r1046, %r1045; 
2026-02-21T08:54:53.4297021Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4297084Z cvt.u64.u32 %rd533, %r685; 2026-02-21T08:54:53.4297139Z cvt.u64.u32 %rd534, %r686; 2026-02-21T08:54:53.4297195Z shl.b64 %rd535, %rd534, 32; 2026-02-21T08:54:53.4297257Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T08:54:53.4297413Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4297469Z mov.b64 {%r1048, %r1049}, %rd536; 2026-02-21T08:54:53.4297558Z cvt.rn.f16x2.f32 %r1050, %r1049, %r1048; 2026-02-21T08:54:53.4297723Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4297779Z cvt.u64.u32 %rd537, %r687; 2026-02-21T08:54:53.4297833Z cvt.u64.u32 %rd538, %r688; 2026-02-21T08:54:53.4297895Z shl.b64 %rd539, %rd538, 32; 2026-02-21T08:54:53.4297950Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T08:54:53.4298110Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4298173Z mov.b64 {%r1051, %r1052}, %rd540; 2026-02-21T08:54:53.4298236Z cvt.rn.f16x2.f32 %r1053, %r1052, %r1051; 2026-02-21T08:54:53.4298395Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4298451Z cvt.u64.u32 %rd541, %r689; 2026-02-21T08:54:53.4298511Z cvt.u64.u32 %rd542, %r690; 2026-02-21T08:54:53.4298567Z shl.b64 %rd543, %rd542, 32; 2026-02-21T08:54:53.4298625Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T08:54:53.4298791Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4298871Z mov.b64 {%r1054, %r1055}, %rd544; 2026-02-21T08:54:53.4298936Z cvt.rn.f16x2.f32 %r1056, %r1055, %r1054; 2026-02-21T08:54:53.4299103Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4299159Z cvt.u64.u32 %rd545, %r691; 2026-02-21T08:54:53.4299212Z cvt.u64.u32 %rd546, %r692; 2026-02-21T08:54:53.4299268Z shl.b64 %rd547, %rd546, 
32; 2026-02-21T08:54:53.4299330Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T08:54:53.4299489Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4299569Z mov.b64 {%r1057, %r1058}, %rd548; 2026-02-21T08:54:53.4299641Z cvt.rn.f16x2.f32 %r1059, %r1058, %r1057; 2026-02-21T08:54:53.4299805Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4299862Z cvt.u64.u32 %rd549, %r694; 2026-02-21T08:54:53.4299924Z cvt.u64.u32 %rd550, %r695; 2026-02-21T08:54:53.4299979Z shl.b64 %rd551, %rd550, 32; 2026-02-21T08:54:53.4300036Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T08:54:53.4300195Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4300259Z mov.b64 {%r1060, %r1061}, %rd552; 2026-02-21T08:54:53.4300323Z cvt.rn.f16x2.f32 %r1062, %r1061, %r1060; 2026-02-21T08:54:53.4300478Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4300564Z cvt.u64.u32 %rd553, %r696; 2026-02-21T08:54:53.4300623Z cvt.u64.u32 %rd554, %r697; 2026-02-21T08:54:53.4300682Z shl.b64 %rd555, %rd554, 32; 2026-02-21T08:54:53.4300748Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T08:54:53.4300914Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4300974Z mov.b64 {%r1063, %r1064}, %rd556; 2026-02-21T08:54:53.4301037Z cvt.rn.f16x2.f32 %r1065, %r1064, %r1063; 2026-02-21T08:54:53.4301204Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4301259Z cvt.u64.u32 %rd557, %r698; 2026-02-21T08:54:53.4301314Z cvt.u64.u32 %rd558, %r699; 2026-02-21T08:54:53.4301376Z shl.b64 %rd559, %rd558, 32; 2026-02-21T08:54:53.4301431Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T08:54:53.4301593Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4301656Z mov.b64 {%r1066, %r1067}, %rd560; 2026-02-21T08:54:53.4301720Z 
cvt.rn.f16x2.f32 %r1068, %r1067, %r1066; 2026-02-21T08:54:53.4301877Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4301931Z cvt.u64.u32 %rd561, %r700; 2026-02-21T08:54:53.4302012Z cvt.u64.u32 %rd562, %r701; 2026-02-21T08:54:53.4302067Z shl.b64 %rd563, %rd562, 32; 2026-02-21T08:54:53.4302123Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T08:54:53.4302290Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4302347Z mov.b64 {%r1069, %r1070}, %rd564; 2026-02-21T08:54:53.4302409Z cvt.rn.f16x2.f32 %r1071, %r1070, %r1069; 2026-02-21T08:54:53.4302572Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4302627Z cvt.u64.u32 %rd565, %r702; 2026-02-21T08:54:53.4302681Z cvt.u64.u32 %rd566, %r703; 2026-02-21T08:54:53.4302738Z shl.b64 %rd567, %rd566, 32; 2026-02-21T08:54:53.4302803Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T08:54:53.4302961Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4303017Z mov.b64 {%r1072, %r1073}, %rd568; 2026-02-21T08:54:53.4303090Z cvt.rn.f16x2.f32 %r1074, %r1073, %r1072; 2026-02-21T08:54:53.4303246Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4303302Z cvt.u64.u32 %rd569, %r704; 2026-02-21T08:54:53.4303385Z cvt.u64.u32 %rd570, %r705; 2026-02-21T08:54:53.4303442Z shl.b64 %rd571, %rd570, 32; 2026-02-21T08:54:53.4303498Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T08:54:53.4303661Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4303723Z mov.b64 {%r1075, %r1076}, %rd572; 2026-02-21T08:54:53.4303787Z cvt.rn.f16x2.f32 %r1077, %r1076, %r1075; 2026-02-21T08:54:53.4303951Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4304055Z cvt.u64.u32 %rd573, %r706; 2026-02-21T08:54:53.4304110Z cvt.u64.u32 %rd574, %r707; 
2026-02-21T08:54:53.4304166Z shl.b64 %rd575, %rd574, 32; 2026-02-21T08:54:53.4304231Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T08:54:53.4304387Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4304444Z mov.b64 {%r1078, %r1079}, %rd576; 2026-02-21T08:54:53.4304509Z cvt.rn.f16x2.f32 %r1080, %r1079, %r1078; 2026-02-21T08:54:53.4304704Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4304761Z cvt.u64.u32 %rd577, %r708; 2026-02-21T08:54:53.4304816Z cvt.u64.u32 %rd578, %r709; 2026-02-21T08:54:53.4304879Z shl.b64 %rd579, %rd578, 32; 2026-02-21T08:54:53.4304935Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T08:54:53.4305125Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4305191Z mov.b64 {%r1081, %r1082}, %rd580; 2026-02-21T08:54:53.4305254Z cvt.rn.f16x2.f32 %r1083, %r1082, %r1081; 2026-02-21T08:54:53.4305407Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4305463Z cvt.u64.u32 %rd581, %r711; 2026-02-21T08:54:53.4305524Z cvt.u64.u32 %rd582, %r712; 2026-02-21T08:54:53.4305579Z shl.b64 %rd583, %rd582, 32; 2026-02-21T08:54:53.4305634Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T08:54:53.4305795Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4305852Z mov.b64 {%r1084, %r1085}, %rd584; 2026-02-21T08:54:53.4305914Z cvt.rn.f16x2.f32 %r1086, %r1085, %r1084; 2026-02-21T08:54:53.4306074Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4306130Z cvt.u64.u32 %rd585, %r713; 2026-02-21T08:54:53.4306186Z cvt.u64.u32 %rd586, %r714; 2026-02-21T08:54:53.4306240Z shl.b64 %rd587, %rd586, 32; 2026-02-21T08:54:53.4306304Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T08:54:53.4306457Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4306559Z mov.b64 {%r1087, 
%r1088}, %rd588; 2026-02-21T08:54:53.4306629Z cvt.rn.f16x2.f32 %r1089, %r1088, %r1087; 2026-02-21T08:54:53.4306785Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4306840Z cvt.u64.u32 %rd589, %r715; 2026-02-21T08:54:53.4306900Z cvt.u64.u32 %rd590, %r716; 2026-02-21T08:54:53.4306956Z shl.b64 %rd591, %rd590, 32; 2026-02-21T08:54:53.4307012Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T08:54:53.4307165Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4307230Z mov.b64 {%r1090, %r1091}, %rd592; 2026-02-21T08:54:53.4307294Z cvt.rn.f16x2.f32 %r1092, %r1091, %r1090; 2026-02-21T08:54:53.4307448Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4307512Z cvt.u64.u32 %rd593, %r717; 2026-02-21T08:54:53.4307568Z cvt.u64.u32 %rd594, %r718; 2026-02-21T08:54:53.4307624Z shl.b64 %rd595, %rd594, 32; 2026-02-21T08:54:53.4307687Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T08:54:53.4307876Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4307935Z mov.b64 {%r1093, %r1094}, %rd596; 2026-02-21T08:54:53.4307997Z cvt.rn.f16x2.f32 %r1095, %r1094, %r1093; 2026-02-21T08:54:53.4308159Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4308214Z cvt.u64.u32 %rd597, %r719; 2026-02-21T08:54:53.4308268Z cvt.u64.u32 %rd598, %r720; 2026-02-21T08:54:53.4308335Z shl.b64 %rd599, %rd598, 32; 2026-02-21T08:54:53.4308392Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T08:54:53.4308580Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4308644Z mov.b64 {%r1096, %r1097}, %rd600; 2026-02-21T08:54:53.4308710Z cvt.rn.f16x2.f32 %r1098, %r1097, %r1096; 2026-02-21T08:54:53.4308870Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4308925Z cvt.u64.u32 %rd601, %r721; 
2026-02-21T08:54:53.4308990Z cvt.u64.u32 %rd602, %r722; 2026-02-21T08:54:53.4309048Z shl.b64 %rd603, %rd602, 32; 2026-02-21T08:54:53.4309105Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T08:54:53.4309275Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4309336Z mov.b64 {%r1099, %r1100}, %rd604; 2026-02-21T08:54:53.4309408Z cvt.rn.f16x2.f32 %r1101, %r1100, %r1099; 2026-02-21T08:54:53.4309602Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4309660Z cvt.u64.u32 %rd605, %r723; 2026-02-21T08:54:53.4309716Z cvt.u64.u32 %rd606, %r724; 2026-02-21T08:54:53.4309773Z shl.b64 %rd607, %rd606, 32; 2026-02-21T08:54:53.4309841Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T08:54:53.4310003Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4310058Z mov.b64 {%r1102, %r1103}, %rd608; 2026-02-21T08:54:53.4310130Z cvt.rn.f16x2.f32 %r1104, %r1103, %r1102; 2026-02-21T08:54:53.4310291Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4310346Z cvt.u64.u32 %rd609, %r725; 2026-02-21T08:54:53.4310407Z cvt.u64.u32 %rd610, %r726; 2026-02-21T08:54:53.4310462Z shl.b64 %rd611, %rd610, 32; 2026-02-21T08:54:53.4310518Z or.b64 %rd612, %rd609, %rd611; 2026-02-21T08:54:53.4310678Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4310744Z mov.b64 {%r1105, %r1106}, %rd612; 2026-02-21T08:54:53.4310808Z cvt.rn.f16x2.f32 %r1107, %r1106, %r1105; 2026-02-21T08:54:53.4310966Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4311053Z cvt.u64.u32 %rd613, %r728; 2026-02-21T08:54:53.4311108Z cvt.u64.u32 %rd614, %r729; 2026-02-21T08:54:53.4311164Z shl.b64 %rd615, %rd614, 32; 2026-02-21T08:54:53.4311229Z or.b64 %rd616, %rd613, %rd615; 2026-02-21T08:54:53.4311392Z .loc 1 58 27 // 
cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4311449Z mov.b64 {%r1108, %r1109}, %rd616; 2026-02-21T08:54:53.4311513Z cvt.rn.f16x2.f32 %r1110, %r1109, %r1108; 2026-02-21T08:54:53.4311684Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4311740Z cvt.u64.u32 %rd617, %r730; 2026-02-21T08:54:53.4311797Z cvt.u64.u32 %rd618, %r731; 2026-02-21T08:54:53.4311861Z shl.b64 %rd619, %rd618, 32; 2026-02-21T08:54:53.4311916Z or.b64 %rd620, %rd617, %rd619; 2026-02-21T08:54:53.4312080Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4312145Z mov.b64 {%r1111, %r1112}, %rd620; 2026-02-21T08:54:53.4312209Z cvt.rn.f16x2.f32 %r1113, %r1112, %r1111; 2026-02-21T08:54:53.4312402Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4312459Z cvt.u64.u32 %rd621, %r732; 2026-02-21T08:54:53.4312521Z cvt.u64.u32 %rd622, %r733; 2026-02-21T08:54:53.4312577Z shl.b64 %rd623, %rd622, 32; 2026-02-21T08:54:53.4312633Z or.b64 %rd624, %rd621, %rd623; 2026-02-21T08:54:53.4312800Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4312856Z mov.b64 {%r1114, %r1115}, %rd624; 2026-02-21T08:54:53.4312921Z cvt.rn.f16x2.f32 %r1116, %r1115, %r1114; 2026-02-21T08:54:53.4313112Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4313168Z cvt.u64.u32 %rd625, %r734; 2026-02-21T08:54:53.4313226Z cvt.u64.u32 %rd626, %r735; 2026-02-21T08:54:53.4313282Z shl.b64 %rd627, %rd626, 32; 2026-02-21T08:54:53.4313345Z or.b64 %rd628, %rd625, %rd627; 2026-02-21T08:54:53.4313507Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4313565Z mov.b64 {%r1117, %r1118}, %rd628; 2026-02-21T08:54:53.4313634Z cvt.rn.f16x2.f32 %r1119, %r1118, %r1117; 2026-02-21T08:54:53.4313796Z .loc 1 56 52 // 
cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4313850Z cvt.u64.u32 %rd629, %r736; 2026-02-21T08:54:53.4313912Z cvt.u64.u32 %rd630, %r737; 2026-02-21T08:54:53.4313969Z shl.b64 %rd631, %rd630, 32; 2026-02-21T08:54:53.4314048Z or.b64 %rd632, %rd629, %rd631; 2026-02-21T08:54:53.4314213Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4314279Z mov.b64 {%r1120, %r1121}, %rd632; 2026-02-21T08:54:53.4314343Z cvt.rn.f16x2.f32 %r1122, %r1121, %r1120; 2026-02-21T08:54:53.4314508Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4314570Z cvt.u64.u32 %rd633, %r738; 2026-02-21T08:54:53.4314626Z cvt.u64.u32 %rd634, %r739; 2026-02-21T08:54:53.4314712Z shl.b64 %rd635, %rd634, 32; 2026-02-21T08:54:53.4314776Z or.b64 %rd636, %rd633, %rd635; 2026-02-21T08:54:53.4314937Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4314992Z mov.b64 {%r1123, %r1124}, %rd636; 2026-02-21T08:54:53.4315055Z cvt.rn.f16x2.f32 %r1125, %r1124, %r1123; 2026-02-21T08:54:53.4315229Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4315285Z cvt.u64.u32 %rd637, %r740; 2026-02-21T08:54:53.4315341Z cvt.u64.u32 %rd638, %r741; 2026-02-21T08:54:53.4315403Z shl.b64 %rd639, %rd638, 32; 2026-02-21T08:54:53.4315460Z or.b64 %rd640, %rd637, %rd639; 2026-02-21T08:54:53.4315650Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4315713Z mov.b64 {%r1126, %r1127}, %rd640; 2026-02-21T08:54:53.4315777Z cvt.rn.f16x2.f32 %r1128, %r1127, %r1126; 2026-02-21T08:54:53.4315942Z .loc 1 56 52 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:56:52 2026-02-21T08:54:53.4315996Z cvt.u64.u32 %rd641, %r742; 2026-02-21T08:54:53.4316058Z cvt.u64.u32 %rd642, %r743; 2026-02-21T08:54:53.4316113Z shl.b64 %rd643, %rd642, 32; 2026-02-21T08:54:53.4316168Z or.b64 
%rd644, %rd641, %rd643; 2026-02-21T08:54:53.4316336Z .loc 1 58 27 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:58:27 2026-02-21T08:54:53.4316395Z mov.b64 {%r1129, %r1130}, %rd644; 2026-02-21T08:54:53.4316459Z cvt.rn.f16x2.f32 %r1131, %r1130, %r1129; 2026-02-21T08:54:53.4316627Z .loc 1 59 45 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:59:45 2026-02-21T08:54:53.4316684Z bar.sync 0; 2026-02-21T08:54:53.4316776Z st.shared.v4.b32 [%r4], {%r750, %r753, %r756, %r759}; 2026-02-21T08:54:53.4316876Z st.shared.v4.b32 [%r4+16384], {%r846, %r849, %r852, %r855}; 2026-02-21T08:54:53.4316999Z st.shared.v4.b32 [%r4+32768], {%r942, %r945, %r948, %r951}; 2026-02-21T08:54:53.4317103Z st.shared.v4.b32 [%r4+49152], {%r1038, %r1041, %r1044, %r1047}; 2026-02-21T08:54:53.4317190Z st.shared.v4.b32 [%r5], {%r762, %r765, %r768, %r771}; 2026-02-21T08:54:53.4317290Z st.shared.v4.b32 [%r5+16384], {%r858, %r861, %r864, %r867}; 2026-02-21T08:54:53.4317379Z st.shared.v4.b32 [%r5+32768], {%r954, %r957, %r960, %r963}; 2026-02-21T08:54:53.4317478Z st.shared.v4.b32 [%r5+49152], {%r1050, %r1053, %r1056, %r1059}; 2026-02-21T08:54:53.4317593Z st.shared.v4.b32 [%r6], {%r774, %r777, %r780, %r783}; 2026-02-21T08:54:53.4317683Z st.shared.v4.b32 [%r6+16384], {%r870, %r873, %r876, %r879}; 2026-02-21T08:54:53.4317769Z st.shared.v4.b32 [%r6+32768], {%r966, %r969, %r972, %r975}; 2026-02-21T08:54:53.4317877Z st.shared.v4.b32 [%r6+49152], {%r1062, %r1065, %r1068, %r1071}; 2026-02-21T08:54:53.4317961Z st.shared.v4.b32 [%r7], {%r786, %r789, %r792, %r795}; 2026-02-21T08:54:53.4318049Z st.shared.v4.b32 [%r7+16384], {%r882, %r885, %r888, %r891}; 2026-02-21T08:54:53.4318139Z st.shared.v4.b32 [%r7+32768], {%r978, %r981, %r984, %r987}; 2026-02-21T08:54:53.4318244Z st.shared.v4.b32 [%r7+49152], {%r1074, %r1077, %r1080, %r1083}; 2026-02-21T08:54:53.4318324Z st.shared.v4.b32 [%r8], {%r798, %r801, %r804, %r807}; 2026-02-21T08:54:53.4318412Z st.shared.v4.b32 [%r8+16384], {%r894, %r897, %r900, %r903}; 
2026-02-21T08:54:53.4318530Z st.shared.v4.b32 [%r8+32768], {%r990, %r993, %r996, %r999}; 2026-02-21T08:54:53.4318625Z st.shared.v4.b32 [%r8+49152], {%r1086, %r1089, %r1092, %r1095}; 2026-02-21T08:54:53.4318706Z st.shared.v4.b32 [%r9], {%r810, %r813, %r816, %r819}; 2026-02-21T08:54:53.4318799Z st.shared.v4.b32 [%r9+16384], {%r906, %r909, %r912, %r915}; 2026-02-21T08:54:53.4318891Z st.shared.v4.b32 [%r9+32768], {%r1002, %r1005, %r1008, %r1011}; 2026-02-21T08:54:53.4318984Z st.shared.v4.b32 [%r9+49152], {%r1098, %r1101, %r1104, %r1107}; 2026-02-21T08:54:53.4319079Z st.shared.v4.b32 [%r10], {%r822, %r825, %r828, %r831}; 2026-02-21T08:54:53.4319175Z st.shared.v4.b32 [%r10+16384], {%r918, %r921, %r924, %r927}; 2026-02-21T08:54:53.4319277Z st.shared.v4.b32 [%r10+32768], {%r1014, %r1017, %r1020, %r1023}; 2026-02-21T08:54:53.4319375Z st.shared.v4.b32 [%r10+49152], {%r1110, %r1113, %r1116, %r1119}; 2026-02-21T08:54:53.4319467Z st.shared.v4.b32 [%r11], {%r834, %r837, %r840, %r843}; 2026-02-21T08:54:53.4319559Z st.shared.v4.b32 [%r11+16384], {%r930, %r933, %r936, %r939}; 2026-02-21T08:54:53.4319656Z st.shared.v4.b32 [%r11+32768], {%r1026, %r1029, %r1032, %r1035}; 2026-02-21T08:54:53.4319759Z st.shared.v4.b32 [%r11+49152], {%r1122, %r1125, %r1128, %r1131}; 2026-02-21T08:54:53.4319815Z // begin inline asm 2026-02-21T08:54:53.4319910Z fence.proxy.async.shared::cta; 2026-02-21T08:54:53.4319970Z // end inline asm 2026-02-21T08:54:53.4320025Z bar.sync 0; 2026-02-21T08:54:53.4320090Z elect.sync %r1132|%p139, -1; 2026-02-21T08:54:53.4320151Z and.pred %p137, %p138, %p139; 2026-02-21T08:54:53.4320218Z shl.b32 %r1133, %r15, 14; 2026-02-21T08:54:53.4320274Z add.s32 %r747, %r330, %r1133; 2026-02-21T08:54:53.4320331Z shl.b32 %r1135, %r15, 6; 2026-02-21T08:54:53.4320394Z or.b32 %r745, %r1135, %r442; 2026-02-21T08:54:53.4320448Z // begin inline asm 2026-02-21T08:54:53.4320628Z @%p137 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd132, {%r745, %r746}], [%r747]; 
2026-02-21T08:54:53.4320681Z // end inline asm 2026-02-21T08:54:53.4320753Z cp.async.bulk.commit_group; 2026-02-21T08:54:53.4320825Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:54:53.4320877Z bar.sync 0; 2026-02-21T08:54:53.4320963Z $L__BB0_8: // %._crit_edge 2026-02-21T08:54:53.4321128Z .loc 1 33 4 // cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py:33:4 2026-02-21T08:54:53.4321180Z bar.sync 0; 2026-02-21T08:54:53.4321242Z // begin inline asm 2026-02-21T08:54:53.4321356Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1136, 256; 2026-02-21T08:54:53.4321430Z // end inline asm 2026-02-21T08:54:53.4321482Z ret; 2026-02-21T08:54:53.4321542Z $L__tmp1: 2026-02-21T08:54:53.4321596Z $L__func_end0: 2026-02-21T08:54:53.4321675Z // -- End function 2026-02-21T08:54:53.4321732Z } 2026-02-21T08:54:53.4321938Z .file 1 "/tmp/torchinductor_root/au/cauin7grvuvef3aur2clbl3fpnc32whkm7ro24d4cqs6uogvz2ht.py" 2026-02-21T08:54:53.4321998Z .section .debug_abbrev 2026-02-21T08:54:53.4322047Z { 2026-02-21T08:54:53.4322138Z .b8 1 // Abbreviation Code 2026-02-21T08:54:53.4322242Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:54:53.4322318Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:54:53.4322404Z .b8 37 // DW_AT_producer 2026-02-21T08:54:53.4322474Z .b8 8 // DW_FORM_string 2026-02-21T08:54:53.4322545Z .b8 19 // DW_AT_language 2026-02-21T08:54:53.4322625Z .b8 5 // DW_FORM_data2 2026-02-21T08:54:53.4322696Z .b8 3 // DW_AT_name 2026-02-21T08:54:53.4322765Z .b8 8 // DW_FORM_string 2026-02-21T08:54:53.4322839Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:54:53.4322917Z .b8 6 // DW_FORM_data4 2026-02-21T08:54:53.4323022Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:54:53.4323098Z .b8 8 // DW_FORM_string 2026-02-21T08:54:53.4323175Z .b8 0 // EOM(1) 2026-02-21T08:54:53.4323244Z .b8 0 // EOM(2) 2026-02-21T08:54:53.4323312Z .b8 0 // EOM(3) 2026-02-21T08:54:53.4323368Z } 2026-02-21T08:54:53.4323429Z .section .debug_info 2026-02-21T08:54:53.4323478Z { 2026-02-21T08:54:53.4323560Z .b32 104 // Length of Unit 
2026-02-21T08:54:53.4323652Z .b8 2 // DWARF version number 2026-02-21T08:54:53.4323704Z .b8 0 2026-02-21T08:54:53.4323819Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:54:53.4323914Z .b8 8 // Address Size (in bytes) 2026-02-21T08:54:53.4324011Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T08:54:53.4324092Z .b8 116 // DW_AT_producer 2026-02-21T08:54:53.4324155Z .b8 114 2026-02-21T08:54:53.4324209Z .b8 105 2026-02-21T08:54:53.4324260Z .b8 116 2026-02-21T08:54:53.4324310Z .b8 111 2026-02-21T08:54:53.4324369Z .b8 110 2026-02-21T08:54:53.4324420Z .b8 0 2026-02-21T08:54:53.4324519Z .b8 2 // DW_AT_language 2026-02-21T08:54:53.4324578Z .b8 0 2026-02-21T08:54:53.4324655Z .b8 99 // DW_AT_name 2026-02-21T08:54:53.4324737Z .b8 97 2026-02-21T08:54:53.4324790Z .b8 117 2026-02-21T08:54:53.4324852Z .b8 105 2026-02-21T08:54:53.4324904Z .b8 110 2026-02-21T08:54:53.4324955Z .b8 55 2026-02-21T08:54:53.4325014Z .b8 103 2026-02-21T08:54:53.4325064Z .b8 114 2026-02-21T08:54:53.4325115Z .b8 118 2026-02-21T08:54:53.4325167Z .b8 117 2026-02-21T08:54:53.4325226Z .b8 118 2026-02-21T08:54:53.4325277Z .b8 101 2026-02-21T08:54:53.4325330Z .b8 102 2026-02-21T08:54:53.4325382Z .b8 51 2026-02-21T08:54:53.4325445Z .b8 97 2026-02-21T08:54:53.4325499Z .b8 117 2026-02-21T08:54:53.4325551Z .b8 114 2026-02-21T08:54:53.4325614Z .b8 50 2026-02-21T08:54:53.4325667Z .b8 99 2026-02-21T08:54:53.4325718Z .b8 108 2026-02-21T08:54:53.4325770Z .b8 98 2026-02-21T08:54:53.4325828Z .b8 108 2026-02-21T08:54:53.4325878Z .b8 51 2026-02-21T08:54:53.4325931Z .b8 102 2026-02-21T08:54:53.4325987Z .b8 112 2026-02-21T08:54:53.4326038Z .b8 110 2026-02-21T08:54:53.4326088Z .b8 99 2026-02-21T08:54:53.4326139Z .b8 51 2026-02-21T08:54:53.4326195Z .b8 50 2026-02-21T08:54:53.4326246Z .b8 119 2026-02-21T08:54:53.4326324Z .b8 104 2026-02-21T08:54:53.4326375Z .b8 107 2026-02-21T08:54:53.4326435Z .b8 109 2026-02-21T08:54:53.4326486Z .b8 55 2026-02-21T08:54:53.4326537Z .b8 114 
2026-02-21T08:54:53.4326595Z .b8 111 2026-02-21T08:54:53.4326645Z .b8 50 2026-02-21T08:54:53.4326696Z .b8 52 2026-02-21T08:54:53.4326746Z .b8 100 2026-02-21T08:54:53.4326804Z .b8 52 2026-02-21T08:54:53.4326854Z .b8 99 2026-02-21T08:54:53.4326905Z .b8 113 2026-02-21T08:54:53.4326961Z .b8 115 2026-02-21T08:54:53.4327012Z .b8 54 2026-02-21T08:54:53.4327063Z .b8 117 2026-02-21T08:54:53.4327144Z .b8 111 2026-02-21T08:54:53.4327202Z .b8 103 2026-02-21T08:54:53.4327252Z .b8 118 2026-02-21T08:54:53.4327302Z .b8 122 2026-02-21T08:54:53.4327359Z .b8 50 2026-02-21T08:54:53.4327410Z .b8 104 2026-02-21T08:54:53.4327463Z .b8 116 2026-02-21T08:54:53.4327513Z .b8 46 2026-02-21T08:54:53.4327574Z .b8 112 2026-02-21T08:54:53.4327624Z .b8 121 2026-02-21T08:54:53.4327674Z .b8 0 2026-02-21T08:54:53.4327768Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:54:53.4327852Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:54:53.4327902Z .b8 116 2026-02-21T08:54:53.4327952Z .b8 109 2026-02-21T08:54:53.4328010Z .b8 112 2026-02-21T08:54:53.4328061Z .b8 47 2026-02-21T08:54:53.4328109Z .b8 116 2026-02-21T08:54:53.4328159Z .b8 111 2026-02-21T08:54:53.4328217Z .b8 114 2026-02-21T08:54:53.4328268Z .b8 99 2026-02-21T08:54:53.4328317Z .b8 104 2026-02-21T08:54:53.4328373Z .b8 105 2026-02-21T08:54:53.4328423Z .b8 110 2026-02-21T08:54:53.4328503Z .b8 100 2026-02-21T08:54:53.4328555Z .b8 117 2026-02-21T08:54:53.4328612Z .b8 99 2026-02-21T08:54:53.4328663Z .b8 116 2026-02-21T08:54:53.4328714Z .b8 111 2026-02-21T08:54:53.4328770Z .b8 114 2026-02-21T08:54:53.4328820Z .b8 95 2026-02-21T08:54:53.4328870Z .b8 114 2026-02-21T08:54:53.4328921Z .b8 111 2026-02-21T08:54:53.4328978Z .b8 111 2026-02-21T08:54:53.4329026Z .b8 116 2026-02-21T08:54:53.4329077Z .b8 47 2026-02-21T08:54:53.4329126Z .b8 97 2026-02-21T08:54:53.4329184Z .b8 117 2026-02-21T08:54:53.4329233Z .b8 0 2026-02-21T08:54:53.4329283Z } 2026-02-21T08:54:53.4329356Z .section .debug_macinfo { } 2026-02-21T08:54:53.4329361Z 2026-02-21T08:54:53.4329438Z 
================================================================ 2026-02-21T08:54:53.4329542Z please share the reproducer above with Triton project. 2026-02-21T08:54:53.4329745Z [398s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:54:53.4330887Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T08:54:53.4331035Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:54:53.4331099Z `ptxas` stderr: 2026-02-21T08:54:53.4331420Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 354 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T08:54:53.4331506Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:54:53.4331510Z 2026-02-21T08:54:53.4331919Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4kshdm69.ptx -o /tmp/tmp4kshdm69.ptx.o 2026-02-21T08:54:53.4331924Z 2026-02-21T08:54:53.4332047Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:54:53.6990379Z 2026-02-21T08:54:53.6992792Z Generation 18: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 20/20 16.0 configs/s 2026-02-21T08:54:53.6997594Z [399s] Generation 18 complete: 2026-02-21T08:54:53.6999052Z error=4 2026-02-21T08:54:53.6999203Z ok=17 2026-02-21T08:54:53.6999326Z min=0.1033 2026-02-21T08:54:53.6999624Z mid=0.2723 2026-02-21T08:54:53.6999753Z max=9.2724 2026-02-21T08:54:53.6999897Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:54:53.7000114Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:54:53.7000322Z 'l2_groupings': [64], 2026-02-21T08:54:53.7000493Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:54:53.7000677Z 'loop_orders': [[0, 1]], 2026-02-21T08:54:53.7000836Z 'num_stages': 3, 2026-02-21T08:54:53.7000975Z 'num_warps': 8, 2026-02-21T08:54:53.7001184Z 'pid_type': 'flat', 2026-02-21T08:54:53.7001339Z 'range_flattens': [None, None], 2026-02-21T08:54:53.7001525Z 'range_multi_buffers': [None, None], 2026-02-21T08:54:53.7001707Z 'range_num_stages': [0, 0], 2026-02-21T08:54:53.7001882Z 'range_unroll_factors': [0, 0], 2026-02-21T08:54:53.7002068Z 'range_warp_specializes': [None, True]} 2026-02-21T08:54:53.7032651Z [399s] Fitting surrogate: 1197 points, 1197 targets 2026-02-21T08:54:54.1609621Z [399s] Generation 19 starting: 14 neighbors, 1 active search path(s) 2026-02-21T08:55:24.7563618Z [430s] Timeout after 30s compiling Config(block_sizes=[256, 512, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=1, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T08:55:25.4221829Z [430s] Timeout after 30s compiling Config(block_sizes=[512, 512, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], 
load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=1, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T08:55:25.4239619Z Generation 19: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 0.1 configs/s 2026-02-21T08:55:25.9078008Z Generation 19: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 34.1 configs/s 2026-02-21T08:55:25.9085253Z [431s] Generation 19 complete: 2026-02-21T08:55:25.9089724Z error=6 2026-02-21T08:55:25.9093700Z timeout=2 2026-02-21T08:55:25.9097577Z ok=8 2026-02-21T08:55:25.9101069Z min=0.1033 2026-02-21T08:55:25.9105787Z mid=0.2479 2026-02-21T08:55:25.9109519Z max=9.5549 2026-02-21T08:55:25.9109719Z best={'block_sizes': [256, 256, 64], 2026-02-21T08:55:25.9109969Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:55:25.9110183Z 'l2_groupings': [64], 2026-02-21T08:55:25.9110361Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:55:25.9110551Z 'loop_orders': [[0, 1]], 2026-02-21T08:55:25.9110710Z 'num_stages': 3, 2026-02-21T08:55:25.9111123Z 'num_warps': 8, 2026-02-21T08:55:25.9111267Z 'pid_type': 'flat', 2026-02-21T08:55:25.9111416Z 'range_flattens': [None, None], 2026-02-21T08:55:25.9111597Z 'range_multi_buffers': [None, None], 2026-02-21T08:55:25.9111789Z 'range_num_stages': [0, 0], 2026-02-21T08:55:25.9111950Z 'range_unroll_factors': [0, 0], 2026-02-21T08:55:25.9112132Z 'range_warp_specializes': [None, True]} 2026-02-21T08:55:25.9118032Z [431s] Fitting surrogate: 1213 points, 1213 targets 2026-02-21T08:55:26.1412322Z [431s] Autotuning complete in 431.6s after searching 1170 configs. 
2026-02-21T08:55:26.1414442Z One can hardcode the best config and skip autotuning with: 2026-02-21T08:55:26.1415729Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T08:55:26.1416658Z 2026-02-21T08:55:26.1417162Z [431s] Code of selected kernel: /tmp/torchinductor_root/vx/cvxoytryd37xk3sde4pyxvhj6iyn73amy5ctpbreoii2vv5yv3t3.py 2026-02-21T08:55:55.4011431Z WARNING:tritonbench.utils.triton_op:Completed input ID 11: 2026-02-21T08:55:55.4016367Z (M, N, K) 2026-02-21T08:55:55.4018418Z ------------------- 2026-02-21T08:55:55.4018611Z (2048, 12288, 2048) 2026-02-21T08:55:55.4018777Z 2026-02-21T08:55:55.4024015Z 100%|██████████| 8/8 [45:44<00:00, 406.08s/it] 2026-02-21T08:55:55.4025576Z 100%|██████████| 8/8 [45:44<00:00, 343.06s/it] 2026-02-21T08:55:55.4034456Z INFO:tritonbench.utils.run_utils:[tritonbench] Output result csv to /tmp/tmpz_strk_h.csv 2026-02-21T08:55:56.6839967Z (M, N, K) triton_tutorial_matmul-speedup triton_tutorial_matmul-accuracy pt2_triton_matmul-speedup pt2_triton_matmul-accuracy helion_matmul_tritonbench-speedup helion_matmul_tritonbench-accuracy 2026-02-21T08:55:56.6841455Z ------------------- -------------------------------- --------------------------------- --------------------------- ---------------------------- ----------------------------------- ------------------------------------ 2026-02-21T08:55:56.6842074Z (4096, 1024, 1024) 0.765015 1 0.757183 1 0.728212 1 2026-02-21T08:55:56.6842941Z (4096, 2048, 2048) 0.74895 1 0.733842 1 0.941937 1 2026-02-21T08:55:56.6843511Z (2048, 4096, 2048) 0.748613 1 0.720456 1 0.870379 1 2026-02-21T08:55:56.6844072Z (1024, 8192, 1024) 0.684598 1 
0.69229 1 0.85712 1 2026-02-21T08:55:56.6844604Z (8192, 2048, 2048) 0.753624 1 0.727209 1 0.893734 1 2026-02-21T08:55:56.6845591Z (12288, 1024, 1024) 0.685068 1 0.679251 1 0.72815 1 2026-02-21T08:55:56.6846164Z (1024, 12288, 1024) 0.673184 1 0.666727 1 0.666409 1 2026-02-21T08:55:56.6850109Z (2048, 12288, 2048) 0.671619 1 0.669046 1 0.858887 1 2026-02-21T08:55:56.6850732Z average 0.716334 1 0.70575 1 0.818103 1 2026-02-21T08:58:03.6131215Z Applying custom args for gemm: {'num_inputs': 8, 'non_square': '', 'rep': '3000'} 2026-02-21T08:58:03.8845123Z Running gemm benchmark with Helion implementation... 2026-02-21T08:58:03.8849536Z 2026-02-21T08:58:03.9376045Z Equally-spaced-k mode: Selected 8 equally spaced inputs (total available: 12) 2026-02-21T08:58:03.9376644Z WARNING:tritonbench.utils.triton_op:Input IDs to run: [0, 2, 3, 5, 6, 8, 9, 11] 2026-02-21T08:58:03.9380266Z 2026-02-21T08:58:03.9389736Z 0%| | 0/8 [00:00; 2026-02-21T09:00:05.5022046Z .reg .b16 %rs<3>; 2026-02-21T09:00:05.5022180Z .reg .b32 %r<387>; 2026-02-21T09:00:05.5022317Z .reg .b64 %rd<172>; 2026-02-21T09:00:05.5022453Z $L__func_begin0: 2026-02-21T09:00:05.5022539Z 2026-02-21T09:00:05.5022589Z // %bb.0: 2026-02-21T09:00:05.5022836Z .loc 1 14 0 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:14 2026-02-21T09:00:05.5023125Z mov.u32 %r1, %tid.x; 2026-02-21T09:00:05.5023287Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:00:05.5023451Z mov.b32 %r40, global_smem; 2026-02-21T09:00:05.5023615Z // begin inline asm 2026-02-21T09:00:05.5023862Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r40], 32; 2026-02-21T09:00:05.5024119Z // end inline asm 2026-02-21T09:00:05.5024256Z bar.sync 0; 2026-02-21T09:00:05.5024411Z ld.shared.b32 %r380, [global_smem]; 2026-02-21T09:00:05.5024594Z bar.sync 0; 2026-02-21T09:00:05.5024825Z // begin inline asm 2026-02-21T09:00:05.5025047Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:00:05.5025280Z // end inline asm 
2026-02-21T09:00:05.5025556Z .loc 1 20 46 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:20:46 2026-02-21T09:00:05.5025876Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:00:05.5026167Z .loc 1 20 131 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:20:131 2026-02-21T09:00:05.5026564Z setp.gt.u32 %p3, %r3, 1023; 2026-02-21T09:00:05.5026734Z @%p3 bra $L__BB0_8; 2026-02-21T09:00:05.5026910Z // %bb.1: // %.lr.ph 2026-02-21T09:00:05.5027227Z .loc 1 0 131 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:0:131 2026-02-21T09:00:05.5027583Z ld.param.b64 %rd14, [_helion_matmul_param_1]; 2026-02-21T09:00:05.5027811Z ld.param.b64 %rd13, [_helion_matmul_param_0]; 2026-02-21T09:00:05.5028140Z .loc 1 40 48 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:40:48 2026-02-21T09:00:05.5028444Z shl.b32 %r113, %r1, 2; 2026-02-21T09:00:05.5028621Z and.b32 %r4, %r113, 28; 2026-02-21T09:00:05.5028792Z shl.b32 %r114, %r1, 3; 2026-02-21T09:00:05.5030090Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=3, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:00:05.5031287Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:00:05.5031522Z `ptxas` stderr: 2026-02-21T09:00:05.5031914Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 143 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:00:05.5032383Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:00:05.5032528Z 2026-02-21T09:00:05.5032944Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp8c_eepkz.ptx -o /tmp/tmp8c_eepkz.ptx.o 2026-02-21T09:00:05.5033375Z 2026-02-21T09:00:05.5033499Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:00:05.5033741Z and.b32 %r115, %r114, 24; 2026-02-21T09:00:05.5034004Z .loc 1 34 45 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:34:45 2026-02-21T09:00:05.5034299Z bfe.u32 %r116, %r1, 1, 6; 2026-02-21T09:00:05.5034456Z shr.u32 %r117, %r1, 2; 2026-02-21T09:00:05.5034601Z bfe.u32 %r5, %r1, 2, 5; 2026-02-21T09:00:05.5034904Z .loc 1 32 45 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:32:45 2026-02-21T09:00:05.5035191Z and.b32 %r118, %r1, 1; 2026-02-21T09:00:05.5035343Z shl.b32 %r119, %r118, 3; 2026-02-21T09:00:05.5035488Z bfe.u32 %r6, %r1, 3, 4; 2026-02-21T09:00:05.5035637Z shr.u32 %r120, %r1, 5; 2026-02-21T09:00:05.5035780Z setp.eq.b32 %p33, %r1, 0; 2026-02-21T09:00:05.5035935Z and.b32 %r121, %r1, 127; 2026-02-21T09:00:05.5036083Z shl.b32 %r122, %r121, 4; 2026-02-21T09:00:05.5036223Z shl.b32 %r123, %r1, 1; 2026-02-21T09:00:05.5036368Z and.b32 %r124, %r123, 48; 2026-02-21T09:00:05.5036519Z xor.b32 %r7, %r122, %r124; 2026-02-21T09:00:05.5036678Z add.s32 %r77, %r40, %r7; 2026-02-21T09:00:05.5036821Z add.s32 %r79, %r77, 2048; 2026-02-21T09:00:05.5036970Z add.s32 %r81, %r77, 4096; 2026-02-21T09:00:05.5037110Z add.s32 %r83, %r77, 6144; 2026-02-21T09:00:05.5037259Z add.s32 %r85, %r77, 8192; 2026-02-21T09:00:05.5037402Z add.s32 %r87, %r77, 10240; 2026-02-21T09:00:05.5037557Z add.s32 %r89, %r77, 12288; 2026-02-21T09:00:05.5037708Z add.s32 %r91, %r77, 14336; 2026-02-21T09:00:05.5037854Z shl.b32 %r126, %r121, 3; 2026-02-21T09:00:05.5038005Z and.b32 %r127, %r1, 48; 2026-02-21T09:00:05.5038150Z xor.b32 
%r128, %r126, %r127; 2026-02-21T09:00:05.5038310Z add.s32 %r129, %r40, %r128; 2026-02-21T09:00:05.5038463Z add.s32 %r93, %r129, 49152; 2026-02-21T09:00:05.5038620Z add.s32 %r95, %r77, 16384; 2026-02-21T09:00:05.5038763Z add.s32 %r97, %r77, 18432; 2026-02-21T09:00:05.5038914Z add.s32 %r99, %r77, 20480; 2026-02-21T09:00:05.5039068Z add.s32 %r101, %r77, 22528; 2026-02-21T09:00:05.5039250Z add.s32 %r103, %r77, 24576; 2026-02-21T09:00:05.5039406Z add.s32 %r105, %r77, 26624; 2026-02-21T09:00:05.5039552Z add.s32 %r107, %r77, 28672; 2026-02-21T09:00:05.5039705Z add.s32 %r109, %r77, 30720; 2026-02-21T09:00:05.5039854Z add.s32 %r111, %r129, 50176; 2026-02-21T09:00:05.5040010Z or.b32 %r9, %r115, 64; 2026-02-21T09:00:05.5040151Z add.s32 %r181, %r77, 32768; 2026-02-21T09:00:05.5040299Z add.s32 %r183, %r77, 34816; 2026-02-21T09:00:05.5040441Z add.s32 %r185, %r77, 36864; 2026-02-21T09:00:05.5040589Z add.s32 %r187, %r77, 38912; 2026-02-21T09:00:05.5040737Z add.s32 %r189, %r77, 40960; 2026-02-21T09:00:05.5040878Z add.s32 %r191, %r77, 43008; 2026-02-21T09:00:05.5041026Z add.s32 %r193, %r77, 45056; 2026-02-21T09:00:05.5041199Z add.s32 %r195, %r77, 47104; 2026-02-21T09:00:05.5041353Z add.s32 %r197, %r129, 51200; 2026-02-21T09:00:05.5041501Z shl.b32 %r130, %r1, 4; 2026-02-21T09:00:05.5041644Z and.b32 %r131, %r130, 1968; 2026-02-21T09:00:05.5041793Z bfe.s32 %r132, %r1, 2, 1; 2026-02-21T09:00:05.5041942Z and.b32 %r133, %r132, 2112; 2026-02-21T09:00:05.5042086Z or.b32 %r134, %r133, %r131; 2026-02-21T09:00:05.5042238Z xor.b32 %r135, %r134, 64; 2026-02-21T09:00:05.5042422Z and.b32 %r136, %r114, 944; 2026-02-21T09:00:05.5042571Z shl.b32 %r137, %r118, 6; 2026-02-21T09:00:05.5042723Z bfe.s32 %r138, %r1, 3, 1; 2026-02-21T09:00:05.5042866Z and.b32 %r139, %r138, 2112; 2026-02-21T09:00:05.5043018Z or.b32 %r140, %r136, %r137; 2026-02-21T09:00:05.5043163Z xor.b32 %r141, %r140, %r139; 2026-02-21T09:00:05.5043435Z .loc 1 20 131 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:20:131 
2026-02-21T09:00:05.5043756Z cvt.u64.u32 %rd34, %r115; 2026-02-21T09:00:05.5044020Z .loc 1 31 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:31:27 2026-02-21T09:00:05.5044299Z shl.b32 %r142, %r3, 4; 2026-02-21T09:00:05.5044438Z and.b32 %r143, %r142, 1008; 2026-02-21T09:00:05.5044723Z .loc 1 32 32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:32:32 2026-02-21T09:00:05.5045014Z or.b32 %r144, %r143, %r6; 2026-02-21T09:00:05.5045278Z .loc 1 33 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:33:27 2026-02-21T09:00:05.5045556Z shl.b32 %r145, %r3, 2; 2026-02-21T09:00:05.5045706Z and.b32 %r146, %r145, 3840; 2026-02-21T09:00:05.5045970Z .loc 1 34 32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:34:32 2026-02-21T09:00:05.5046247Z or.b32 %r147, %r146, %r5; 2026-02-21T09:00:05.5046399Z or.b32 %r148, %r117, %r146; 2026-02-21T09:00:05.5046546Z or.b32 %r23, %r146, %r116; 2026-02-21T09:00:05.5046811Z .loc 1 44 53 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:53 2026-02-21T09:00:05.5047094Z shl.b32 %r149, %r147, 10; 2026-02-21T09:00:05.5047245Z shl.b32 %r150, %r148, 10; 2026-02-21T09:00:05.5047393Z or.b32 %r151, %r150, 229376; 2026-02-21T09:00:05.5047664Z .loc 1 45 80 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:80 2026-02-21T09:00:05.5047948Z shl.b32 %r152, %r144, 10; 2026-02-21T09:00:05.5048204Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5048509Z shfl.sync.idx.b32 %r27, %r120, 0, 31, -1; 2026-02-21T09:00:05.5048691Z shl.b32 %r153, %r27, 21; 2026-02-21T09:00:05.5048852Z and.b32 %r154, %r153, 6291456; 2026-02-21T09:00:05.5049012Z add.s32 %r306, %r154, %r380; 2026-02-21T09:00:05.5049177Z mov.pred %p11, -1; 2026-02-21T09:00:05.5049323Z mov.b32 %r381, 0; 2026-02-21T09:00:05.5049458Z // begin inline asm 2026-02-21T09:00:05.5049838Z @%p11 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r306 + 0], {%r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, 
%r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381}; 2026-02-21T09:00:05.5050222Z // end inline asm 2026-02-21T09:00:05.5050362Z // begin inline asm 2026-02-21T09:00:05.5050743Z @%p11 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r306 + 16], {%r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381, %r381}; 2026-02-21T09:00:05.5051141Z // end inline asm 2026-02-21T09:00:05.5051280Z // begin inline asm 2026-02-21T09:00:05.5051433Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:00:05.5051597Z // end inline asm 2026-02-21T09:00:05.5051725Z bar.sync 0; 2026-02-21T09:00:05.5051974Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5052254Z add.s32 %r382, %r40, 52224; 2026-02-21T09:00:05.5052408Z // begin inline asm 2026-02-21T09:00:05.5052571Z @%p33 mbarrier.init.shared::cta.b64 [%r382], 1; 2026-02-21T09:00:05.5052790Z // end inline asm 2026-02-21T09:00:05.5052917Z bar.sync 0; 2026-02-21T09:00:05.5053054Z add.s32 %r76, %r40, 52232; 2026-02-21T09:00:05.5053206Z // begin inline asm 2026-02-21T09:00:05.5053365Z @%p33 mbarrier.init.shared::cta.b64 [%r76], 1; 2026-02-21T09:00:05.5053555Z // end inline asm 2026-02-21T09:00:05.5053798Z .loc 1 44 60 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:60 2026-02-21T09:00:05.5054088Z or.b32 %r155, %r149, %r115; 2026-02-21T09:00:05.5054267Z or.b32 %r156, %r151, %r115; 2026-02-21T09:00:05.5054533Z .loc 1 44 32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:32 2026-02-21T09:00:05.5054864Z mad.wide.u32 %rd16, %r155, 2, %rd13; 2026-02-21T09:00:05.5055033Z cvt.u64.u32 %rd1, %r149; 2026-02-21T09:00:05.5055190Z or.b64 %rd35, %rd1, %rd34; 2026-02-21T09:00:05.5055338Z shl.b64 %rd36, %rd35, 1; 2026-02-21T09:00:05.5055491Z add.s64 %rd2, %rd13, %rd36; 2026-02-21T09:00:05.5055696Z add.s64 %rd17, %rd2, 65536; 2026-02-21T09:00:05.5055855Z add.s64 %rd18, %rd2, 131072; 2026-02-21T09:00:05.5056006Z add.s64 %rd19, %rd2, 196608; 
2026-02-21T09:00:05.5056164Z add.s64 %rd20, %rd2, 262144; 2026-02-21T09:00:05.5056311Z add.s64 %rd21, %rd2, 327680; 2026-02-21T09:00:05.5056466Z add.s64 %rd22, %rd2, 393216; 2026-02-21T09:00:05.5056628Z mad.wide.u32 %rd23, %r156, 2, %rd13; 2026-02-21T09:00:05.5056793Z mov.b32 %r182, 16; 2026-02-21T09:00:05.5057043Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5057322Z // begin inline asm 2026-02-21T09:00:05.5057527Z cp.async.cg.shared.global [ %r77 + 0 ], [ %rd16 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5057744Z // end inline asm 2026-02-21T09:00:05.5057884Z // begin inline asm 2026-02-21T09:00:05.5058070Z cp.async.cg.shared.global [ %r79 + 0 ], [ %rd17 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5058291Z // end inline asm 2026-02-21T09:00:05.5058428Z // begin inline asm 2026-02-21T09:00:05.5058612Z cp.async.cg.shared.global [ %r81 + 0 ], [ %rd18 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5058833Z // end inline asm 2026-02-21T09:00:05.5058965Z // begin inline asm 2026-02-21T09:00:05.5059177Z cp.async.cg.shared.global [ %r83 + 0 ], [ %rd19 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5059388Z // end inline asm 2026-02-21T09:00:05.5059522Z // begin inline asm 2026-02-21T09:00:05.5059703Z cp.async.cg.shared.global [ %r85 + 0 ], [ %rd20 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5059922Z // end inline asm 2026-02-21T09:00:05.5060057Z // begin inline asm 2026-02-21T09:00:05.5060237Z cp.async.cg.shared.global [ %r87 + 0 ], [ %rd21 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5060456Z // end inline asm 2026-02-21T09:00:05.5060583Z // begin inline asm 2026-02-21T09:00:05.5060775Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd22 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5060982Z // end inline asm 2026-02-21T09:00:05.5061117Z // begin inline asm 2026-02-21T09:00:05.5061300Z cp.async.cg.shared.global [ %r91 + 0 ], [ %rd23 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5061516Z // end inline asm 2026-02-21T09:00:05.5061661Z cp.async.commit_group; 
2026-02-21T09:00:05.5061912Z .loc 1 45 59 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:59 2026-02-21T09:00:05.5062237Z or.b32 %r157, %r152, %r4; 2026-02-21T09:00:05.5062497Z .loc 1 45 34 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:34 2026-02-21T09:00:05.5062794Z mad.wide.u32 %rd24, %r157, 2, %rd14; 2026-02-21T09:00:05.5062957Z mov.b32 %r94, 8; 2026-02-21T09:00:05.5063200Z .loc 1 45 87 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:87 2026-02-21T09:00:05.5063488Z // begin inline asm 2026-02-21T09:00:05.5063678Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd24 + 0 ], 0x8, %r94; 2026-02-21T09:00:05.5063897Z // end inline asm 2026-02-21T09:00:05.5064033Z cp.async.commit_group; 2026-02-21T09:00:05.5064301Z .loc 1 44 32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:32 2026-02-21T09:00:05.5064621Z add.s64 %rd25, %rd2, 64; 2026-02-21T09:00:05.5064816Z or.b32 %r158, %r155, 32; 2026-02-21T09:00:05.5064968Z mad.wide.u32 %rd37, %r158, 2, %rd13; 2026-02-21T09:00:05.5065141Z add.s64 %rd26, %rd37, 65536; 2026-02-21T09:00:05.5065304Z add.s64 %rd27, %rd37, 131072; 2026-02-21T09:00:05.5065462Z add.s64 %rd28, %rd37, 196608; 2026-02-21T09:00:05.5065622Z add.s64 %rd29, %rd37, 262144; 2026-02-21T09:00:05.5065801Z add.s64 %rd30, %rd37, 327680; 2026-02-21T09:00:05.5065959Z add.s64 %rd31, %rd37, 393216; 2026-02-21T09:00:05.5066109Z cvt.u64.u32 %rd4, %r151; 2026-02-21T09:00:05.5066265Z or.b64 %rd38, %rd4, %rd34; 2026-02-21T09:00:05.5066414Z shl.b64 %rd39, %rd38, 1; 2026-02-21T09:00:05.5066567Z add.s64 %rd5, %rd13, %rd39; 2026-02-21T09:00:05.5066716Z add.s64 %rd32, %rd5, 64; 2026-02-21T09:00:05.5067003Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5067304Z bar.sync 0; 2026-02-21T09:00:05.5067437Z // begin inline asm 2026-02-21T09:00:05.5067646Z cp.async.cg.shared.global [ %r95 + 0 ], [ %rd25 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5067871Z // end inline asm 
2026-02-21T09:00:05.5068017Z // begin inline asm 2026-02-21T09:00:05.5068213Z cp.async.cg.shared.global [ %r97 + 0 ], [ %rd26 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5068448Z // end inline asm 2026-02-21T09:00:05.5068587Z // begin inline asm 2026-02-21T09:00:05.5068796Z cp.async.cg.shared.global [ %r99 + 0 ], [ %rd27 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5069030Z // end inline asm 2026-02-21T09:00:05.5069164Z // begin inline asm 2026-02-21T09:00:05.5069372Z cp.async.cg.shared.global [ %r101 + 0 ], [ %rd28 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5069596Z // end inline asm 2026-02-21T09:00:05.5069735Z // begin inline asm 2026-02-21T09:00:05.5069933Z cp.async.cg.shared.global [ %r103 + 0 ], [ %rd29 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5070167Z // end inline asm 2026-02-21T09:00:05.5070304Z // begin inline asm 2026-02-21T09:00:05.5070507Z cp.async.cg.shared.global [ %r105 + 0 ], [ %rd30 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5070739Z // end inline asm 2026-02-21T09:00:05.5070872Z // begin inline asm 2026-02-21T09:00:05.5071075Z cp.async.cg.shared.global [ %r107 + 0 ], [ %rd31 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5071298Z // end inline asm 2026-02-21T09:00:05.5071439Z // begin inline asm 2026-02-21T09:00:05.5071634Z cp.async.cg.shared.global [ %r109 + 0 ], [ %rd32 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5071865Z // end inline asm 2026-02-21T09:00:05.5072006Z cp.async.commit_group; 2026-02-21T09:00:05.5072290Z .loc 1 45 34 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:34 2026-02-21T09:00:05.5072601Z add.s64 %rd33, %rd24, 64; 2026-02-21T09:00:05.5072878Z .loc 1 45 87 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:87 2026-02-21T09:00:05.5073185Z // begin inline asm 2026-02-21T09:00:05.5073386Z cp.async.ca.shared.global [ %r111 + 0 ], [ %rd33 + 0 ], 0x8, %r94; 2026-02-21T09:00:05.5073615Z // end inline asm 2026-02-21T09:00:05.5073756Z cp.async.commit_group; 2026-02-21T09:00:05.5074036Z .loc 1 44 85 // 
cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5074358Z cp.async.wait_group 2; 2026-02-21T09:00:05.5074528Z bar.sync 0; 2026-02-21T09:00:05.5074793Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5075080Z setp.ne.b32 %p8, %r27, 0; 2026-02-21T09:00:05.5075238Z @%p8 bra $L__BB0_3; 2026-02-21T09:00:05.5075376Z // %bb.2: 2026-02-21T09:00:05.5075619Z .loc 1 0 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:0:52 2026-02-21T09:00:05.5075908Z add.s32 %r168, %r40, 8224; 2026-02-21T09:00:05.5076072Z bfe.u32 %r169, %r168, 4, 14; 2026-02-21T09:00:05.5076229Z cvt.u64.u32 %rd49, %r169; 2026-02-21T09:00:05.5076427Z or.b64 %rd46, %rd49, -9223371899348713472; 2026-02-21T09:00:05.5076608Z add.s32 %r170, %r40, 8192; 2026-02-21T09:00:05.5076761Z bfe.u32 %r171, %r170, 4, 14; 2026-02-21T09:00:05.5076917Z cvt.u64.u32 %rd50, %r171; 2026-02-21T09:00:05.5077074Z or.b64 %rd44, %rd50, -9223371899348713472; 2026-02-21T09:00:05.5077255Z add.s32 %r172, %r40, 49152; 2026-02-21T09:00:05.5077403Z add.s32 %r173, %r40, 49184; 2026-02-21T09:00:05.5077558Z bfe.u32 %r174, %r173, 4, 14; 2026-02-21T09:00:05.5077731Z cvt.u64.u32 %rd51, %r174; 2026-02-21T09:00:05.5077896Z or.b64 %rd43, %rd51, -9223371899411628032; 2026-02-21T09:00:05.5078072Z add.s32 %r175, %r40, 32; 2026-02-21T09:00:05.5078217Z bfe.u32 %r176, %r175, 4, 14; 2026-02-21T09:00:05.5078378Z cvt.u64.u32 %rd52, %r176; 2026-02-21T09:00:05.5078534Z or.b64 %rd42, %rd52, -9223371899348713472; 2026-02-21T09:00:05.5078717Z bfe.u32 %r177, %r172, 4, 14; 2026-02-21T09:00:05.5078864Z cvt.u64.u32 %rd53, %r177; 2026-02-21T09:00:05.5079052Z or.b64 %rd41, %rd53, -9223371899411628032; 2026-02-21T09:00:05.5079225Z bfe.u32 %r178, %r40, 4, 14; 2026-02-21T09:00:05.5079379Z cvt.u64.u32 %rd54, %r178; 2026-02-21T09:00:05.5079539Z or.b64 %rd40, %rd54, -9223371899348713472; 2026-02-21T09:00:05.5079818Z .loc 1 46 52 // 
cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5080109Z elect.sync %r179|%p10, -1; 2026-02-21T09:00:05.5080259Z mov.b32 %r160, 134479888; 2026-02-21T09:00:05.5080413Z mov.pred %p9, 0; 2026-02-21T09:00:05.5080548Z // begin inline asm 2026-02-21T09:00:05.5080776Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 0 ], %rd40, %rd41, %r160, %p9; 2026-02-21T09:00:05.5081021Z // end inline asm 2026-02-21T09:00:05.5081159Z // begin inline asm 2026-02-21T09:00:05.5081380Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 0 ], %rd42, %rd43, %r160, %p11; 2026-02-21T09:00:05.5081621Z // end inline asm 2026-02-21T09:00:05.5081759Z // begin inline asm 2026-02-21T09:00:05.5081968Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 16 ], %rd44, %rd41, %r160, %p9; 2026-02-21T09:00:05.5082213Z // end inline asm 2026-02-21T09:00:05.5082342Z // begin inline asm 2026-02-21T09:00:05.5082559Z @%p10 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 16 ], %rd46, %rd43, %r160, %p11; 2026-02-21T09:00:05.5082807Z // end inline asm 2026-02-21T09:00:05.5082939Z add.s32 %r180, %r40, 52224; 2026-02-21T09:00:05.5083094Z cvt.u64.u32 %rd48, %r180; 2026-02-21T09:00:05.5083237Z // begin inline asm 2026-02-21T09:00:05.5083445Z @%p10 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd48]; 2026-02-21T09:00:05.5083664Z // end inline asm 2026-02-21T09:00:05.5083798Z $L__BB0_3: 2026-02-21T09:00:05.5084031Z .loc 1 0 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:0:52 2026-02-21T09:00:05.5084341Z ld.param.b64 %rd15, [_helion_matmul_param_2]; 2026-02-21T09:00:05.5084535Z add.s32 %r19, %r40, %r134; 2026-02-21T09:00:05.5084714Z add.s32 %r20, %r40, %r135; 2026-02-21T09:00:05.5084874Z add.s32 %r21, %r40, %r141; 2026-02-21T09:00:05.5085016Z or.b32 %r22, %r143, %r119; 2026-02-21T09:00:05.5085167Z or.b32 %r24, %r23, 64; 2026-02-21T09:00:05.5085311Z or.b32 %r25, %r23, 128; 2026-02-21T09:00:05.5085461Z or.b32 %r26, %r23, 192; 2026-02-21T09:00:05.5085761Z .loc 1 44 
32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:32 2026-02-21T09:00:05.5086046Z add.s64 %rd55, %rd2, 128; 2026-02-21T09:00:05.5086196Z cvt.u64.u32 %rd66, %r9; 2026-02-21T09:00:05.5086350Z add.s64 %rd67, %rd1, %rd66; 2026-02-21T09:00:05.5086506Z shl.b64 %rd68, %rd67, 1; 2026-02-21T09:00:05.5086654Z add.s64 %rd69, %rd13, %rd68; 2026-02-21T09:00:05.5086816Z add.s64 %rd56, %rd69, 65536; 2026-02-21T09:00:05.5086967Z add.s64 %rd57, %rd69, 131072; 2026-02-21T09:00:05.5087131Z add.s64 %rd58, %rd69, 196608; 2026-02-21T09:00:05.5087281Z add.s64 %rd59, %rd69, 262144; 2026-02-21T09:00:05.5087445Z add.s64 %rd60, %rd69, 327680; 2026-02-21T09:00:05.5087597Z add.s64 %rd61, %rd69, 393216; 2026-02-21T09:00:05.5087789Z add.s64 %rd62, %rd5, 128; 2026-02-21T09:00:05.5088057Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5088337Z bar.sync 0; 2026-02-21T09:00:05.5088472Z // begin inline asm 2026-02-21T09:00:05.5088669Z cp.async.cg.shared.global [ %r181 + 0 ], [ %rd55 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5088893Z // end inline asm 2026-02-21T09:00:05.5089021Z // begin inline asm 2026-02-21T09:00:05.5089251Z cp.async.cg.shared.global [ %r183 + 0 ], [ %rd56 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5089472Z // end inline asm 2026-02-21T09:00:05.5089614Z // begin inline asm 2026-02-21T09:00:05.5089811Z cp.async.cg.shared.global [ %r185 + 0 ], [ %rd57 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5090025Z // end inline asm 2026-02-21T09:00:05.5090160Z // begin inline asm 2026-02-21T09:00:05.5090380Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd58 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5090610Z // end inline asm 2026-02-21T09:00:05.5090738Z // begin inline asm 2026-02-21T09:00:05.5090930Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd59 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5091144Z // end inline asm 2026-02-21T09:00:05.5091282Z // begin inline asm 2026-02-21T09:00:05.5091469Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd60 + 
0 ], 0x10, %r182; 2026-02-21T09:00:05.5091680Z // end inline asm 2026-02-21T09:00:05.5091817Z // begin inline asm 2026-02-21T09:00:05.5092001Z cp.async.cg.shared.global [ %r193 + 0 ], [ %rd61 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5092222Z // end inline asm 2026-02-21T09:00:05.5092349Z // begin inline asm 2026-02-21T09:00:05.5092543Z cp.async.cg.shared.global [ %r195 + 0 ], [ %rd62 + 0 ], 0x10, %r182; 2026-02-21T09:00:05.5092763Z // end inline asm 2026-02-21T09:00:05.5092907Z cp.async.commit_group; 2026-02-21T09:00:05.5093170Z .loc 1 45 34 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:34 2026-02-21T09:00:05.5093457Z add.s64 %rd63, %rd24, 128; 2026-02-21T09:00:05.5093720Z .loc 1 45 87 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:87 2026-02-21T09:00:05.5094000Z // begin inline asm 2026-02-21T09:00:05.5094192Z cp.async.ca.shared.global [ %r197 + 0 ], [ %rd63 + 0 ], 0x8, %r94; 2026-02-21T09:00:05.5094403Z // end inline asm 2026-02-21T09:00:05.5094545Z cp.async.commit_group; 2026-02-21T09:00:05.5094836Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5095126Z and.b32 %r203, %r1, 3; 2026-02-21T09:00:05.5095287Z mul.wide.u32 %rd70, %r203, 16; 2026-02-21T09:00:05.5095445Z shl.b64 %rd71, %rd4, 1; 2026-02-21T09:00:05.5095601Z add.s64 %rd72, %rd70, %rd71; 2026-02-21T09:00:05.5095753Z add.s64 %rd73, %rd72, %rd13; 2026-02-21T09:00:05.5095912Z add.s64 %rd6, %rd73, 192; 2026-02-21T09:00:05.5096062Z cvt.u16.u32 %rs1, %r3; 2026-02-21T09:00:05.5096219Z and.b16 %rs2, %rs1, 63; 2026-02-21T09:00:05.5096373Z mul.wide.u16 %r204, %rs2, 16384; 2026-02-21T09:00:05.5096544Z shl.b32 %r205, %r6, 10; 2026-02-21T09:00:05.5096688Z or.b32 %r206, %r204, %r205; 2026-02-21T09:00:05.5096849Z or.b32 %r207, %r206, %r4; 2026-02-21T09:00:05.5097014Z mad.wide.u32 %rd74, %r207, 2, %rd14; 2026-02-21T09:00:05.5097211Z add.s64 %rd7, %rd74, 192; 2026-02-21T09:00:05.5097370Z shl.b32 %r208, %r3, 12; 2026-02-21T09:00:05.5097532Z 
and.b32 %r209, %r208, 3932160; 2026-02-21T09:00:05.5097695Z shl.b32 %r210, %r5, 10; 2026-02-21T09:00:05.5097839Z or.b32 %r211, %r209, %r210; 2026-02-21T09:00:05.5097998Z mul.wide.u32 %rd75, %r211, 2; 2026-02-21T09:00:05.5098152Z or.b64 %rd76, %rd70, %rd75; 2026-02-21T09:00:05.5098307Z add.s64 %rd8, %rd13, %rd76; 2026-02-21T09:00:05.5098455Z mov.b32 %r385, 1; 2026-02-21T09:00:05.5098586Z mov.b32 %r384, 2; 2026-02-21T09:00:05.5098723Z mov.b64 %rd171, 0; 2026-02-21T09:00:05.5098860Z mov.b64 %rd170, -32; 2026-02-21T09:00:05.5099011Z mov.b32 %r383, %r381; 2026-02-21T09:00:05.5099181Z mov.b32 %r386, %r381; 2026-02-21T09:00:05.5099327Z bra.uni $L__BB0_4; 2026-02-21T09:00:05.5099506Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:00:05.5099834Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5100123Z add.s64 %rd170, %rd170, 32; 2026-02-21T09:00:05.5100282Z setp.lt.u64 %p29, %rd170, 928; 2026-02-21T09:00:05.5100597Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5100874Z // begin inline asm 2026-02-21T09:00:05.5101012Z 2026-02-21T09:00:05.5101123Z { 2026-02-21T09:00:05.5101247Z .reg .pred complete; 2026-02-21T09:00:05.5101386Z waitLoop: 2026-02-21T09:00:05.5101577Z mbarrier.try_wait.parity.shared.b64 complete, [%r382], %r381; 2026-02-21T09:00:05.5101802Z @!complete bra.uni waitLoop; 2026-02-21T09:00:05.5101955Z } 2026-02-21T09:00:05.5102017Z 2026-02-21T09:00:05.5102106Z // end inline asm 2026-02-21T09:00:05.5102245Z add.s32 %r262, %r385, 1; 2026-02-21T09:00:05.5102402Z setp.gt.s32 %p30, %r262, 1; 2026-02-21T09:00:05.5102558Z selp.b32 %r385, 0, %r262, %p30; 2026-02-21T09:00:05.5102725Z selp.b32 %r263, 1, 0, %p30; 2026-02-21T09:00:05.5102875Z xor.b32 %r38, %r386, %r263; 2026-02-21T09:00:05.5103137Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5103418Z add.s32 %r264, %r384, 1; 2026-02-21T09:00:05.5103573Z 
setp.gt.s32 %p31, %r264, 2; 2026-02-21T09:00:05.5103734Z selp.b32 %r384, 0, %r264, %p31; 2026-02-21T09:00:05.5104006Z .loc 1 44 32 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:32 2026-02-21T09:00:05.5104307Z add.s64 %rd101, %rd8, %rd171; 2026-02-21T09:00:05.5104461Z add.s64 %rd92, %rd101, 192; 2026-02-21T09:00:05.5104618Z add.s64 %rd93, %rd101, 65728; 2026-02-21T09:00:05.5104810Z add.s64 %rd94, %rd101, 131264; 2026-02-21T09:00:05.5104974Z add.s64 %rd95, %rd101, 196800; 2026-02-21T09:00:05.5105130Z add.s64 %rd96, %rd101, 262336; 2026-02-21T09:00:05.5105289Z add.s64 %rd97, %rd101, 327872; 2026-02-21T09:00:05.5105446Z add.s64 %rd98, %rd101, 393408; 2026-02-21T09:00:05.5105714Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5106021Z add.s64 %rd99, %rd6, %rd171; 2026-02-21T09:00:05.5106173Z shl.b32 %r265, %r384, 14; 2026-02-21T09:00:05.5106335Z add.s32 %r267, %r40, %r265; 2026-02-21T09:00:05.5106484Z bar.sync 0; 2026-02-21T09:00:05.5106631Z add.s32 %r244, %r267, %r7; 2026-02-21T09:00:05.5106785Z selp.b32 %r245, 16, 0, %p29; 2026-02-21T09:00:05.5106942Z // begin inline asm 2026-02-21T09:00:05.5107142Z cp.async.cg.shared.global [ %r244 + 0 ], [ %rd92 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5107363Z // end inline asm 2026-02-21T09:00:05.5107503Z add.s32 %r246, %r244, 2048; 2026-02-21T09:00:05.5107651Z // begin inline asm 2026-02-21T09:00:05.5107852Z cp.async.cg.shared.global [ %r246 + 0 ], [ %rd93 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5108070Z // end inline asm 2026-02-21T09:00:05.5108208Z add.s32 %r248, %r244, 4096; 2026-02-21T09:00:05.5108353Z // begin inline asm 2026-02-21T09:00:05.5108547Z cp.async.cg.shared.global [ %r248 + 0 ], [ %rd94 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5108802Z // end inline asm 2026-02-21T09:00:05.5108933Z add.s32 %r250, %r244, 6144; 2026-02-21T09:00:05.5109086Z // begin inline asm 2026-02-21T09:00:05.5109271Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd95 + 0 ], 0x10, 
%r245; 2026-02-21T09:00:05.5109493Z // end inline asm 2026-02-21T09:00:05.5109623Z add.s32 %r252, %r244, 8192; 2026-02-21T09:00:05.5109775Z // begin inline asm 2026-02-21T09:00:05.5109963Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd96 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5110182Z // end inline asm 2026-02-21T09:00:05.5110314Z add.s32 %r254, %r244, 10240; 2026-02-21T09:00:05.5110469Z // begin inline asm 2026-02-21T09:00:05.5110660Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd97 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5110920Z // end inline asm 2026-02-21T09:00:05.5111058Z add.s32 %r256, %r244, 12288; 2026-02-21T09:00:05.5111202Z // begin inline asm 2026-02-21T09:00:05.5111392Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd98 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5111604Z // end inline asm 2026-02-21T09:00:05.5111740Z add.s32 %r258, %r244, 14336; 2026-02-21T09:00:05.5111884Z // begin inline asm 2026-02-21T09:00:05.5112104Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd99 + 0 ], 0x10, %r245; 2026-02-21T09:00:05.5112331Z // end inline asm 2026-02-21T09:00:05.5112471Z cp.async.commit_group; 2026-02-21T09:00:05.5112737Z .loc 1 45 87 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:87 2026-02-21T09:00:05.5113034Z add.s64 %rd100, %rd7, %rd171; 2026-02-21T09:00:05.5113199Z shl.b32 %r268, %r384, 10; 2026-02-21T09:00:05.5113382Z add.s32 %r260, %r93, %r268; 2026-02-21T09:00:05.5113542Z selp.b32 %r261, 8, 0, %p29; 2026-02-21T09:00:05.5113691Z // begin inline asm 2026-02-21T09:00:05.5113884Z cp.async.ca.shared.global [ %r260 + 0 ], [ %rd100 + 0 ], 0x8, %r261; 2026-02-21T09:00:05.5114103Z // end inline asm 2026-02-21T09:00:05.5114239Z cp.async.commit_group; 2026-02-21T09:00:05.5114500Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5114806Z add.s64 %rd171, %rd171, 64; 2026-02-21T09:00:05.5114975Z setp.lt.u64 %p32, %rd170, 960; 2026-02-21T09:00:05.5115135Z mov.b32 %r381, %r386; 2026-02-21T09:00:05.5115288Z 
mov.b32 %r382, %r269; 2026-02-21T09:00:05.5115429Z mov.b32 %r386, %r38; 2026-02-21T09:00:05.5115584Z @%p32 bra $L__BB0_4; 2026-02-21T09:00:05.5115731Z bra.uni $L__BB0_7; 2026-02-21T09:00:05.5115923Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:00:05.5116149Z add.s32 %r213, %r383, 1; 2026-02-21T09:00:05.5116312Z setp.gt.s32 %p19, %r213, 2; 2026-02-21T09:00:05.5116491Z selp.b32 %r383, 0, %r213, %p19; 2026-02-21T09:00:05.5116774Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5117081Z cp.async.wait_group 2; 2026-02-21T09:00:05.5117227Z bar.sync 0; 2026-02-21T09:00:05.5117477Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5117768Z shl.b32 %r214, %r385, 3; 2026-02-21T09:00:05.5117916Z add.s32 %r216, %r40, %r214; 2026-02-21T09:00:05.5118075Z add.s32 %r269, %r216, 52224; 2026-02-21T09:00:05.5118336Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5118625Z @%p8 bra $L__BB0_6; 2026-02-21T09:00:05.5118807Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:00:05.5119133Z .loc 1 45 87 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:45:87 2026-02-21T09:00:05.5119428Z shl.b32 %r225, %r383, 10; 2026-02-21T09:00:05.5119578Z add.s32 %r227, %r40, %r225; 2026-02-21T09:00:05.5119737Z add.s32 %r228, %r227, 49152; 2026-02-21T09:00:05.5119998Z .loc 1 44 85 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:44:85 2026-02-21T09:00:05.5120320Z shl.b32 %r229, %r383, 14; 2026-02-21T09:00:05.5120467Z add.s32 %r230, %r40, %r229; 2026-02-21T09:00:05.5120731Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5121019Z elect.sync %r231|%p21, -1; 2026-02-21T09:00:05.5121183Z bfe.u32 %r232, %r230, 4, 14; 2026-02-21T09:00:05.5121339Z cvt.u64.u32 %rd86, %r232; 2026-02-21T09:00:05.5121499Z or.b64 %rd77, %rd86, -9223371899348713472; 2026-02-21T09:00:05.5121682Z 
bfe.u32 %r233, %r228, 4, 14; 2026-02-21T09:00:05.5121829Z cvt.u64.u32 %rd87, %r233; 2026-02-21T09:00:05.5121993Z or.b64 %rd78, %rd87, -9223371899411628032; 2026-02-21T09:00:05.5122166Z mov.b32 %r218, 134479888; 2026-02-21T09:00:05.5122351Z mov.pred %p20, -1; 2026-02-21T09:00:05.5122489Z // begin inline asm 2026-02-21T09:00:05.5122716Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 0 ], %rd77, %rd78, %r218, %p20; 2026-02-21T09:00:05.5122965Z // end inline asm 2026-02-21T09:00:05.5123100Z add.s32 %r234, %r230, 32; 2026-02-21T09:00:05.5123254Z bfe.u32 %r235, %r234, 4, 14; 2026-02-21T09:00:05.5123401Z cvt.u64.u32 %rd88, %r235; 2026-02-21T09:00:05.5123593Z or.b64 %rd79, %rd88, -9223371899348713472; 2026-02-21T09:00:05.5123768Z add.s32 %r236, %r227, 49184; 2026-02-21T09:00:05.5123926Z bfe.u32 %r237, %r236, 4, 14; 2026-02-21T09:00:05.5124073Z cvt.u64.u32 %rd89, %r237; 2026-02-21T09:00:05.5124239Z or.b64 %rd80, %rd89, -9223371899411628032; 2026-02-21T09:00:05.5124416Z // begin inline asm 2026-02-21T09:00:05.5124629Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 0 ], %rd79, %rd80, %r218, %p20; 2026-02-21T09:00:05.5124918Z // end inline asm 2026-02-21T09:00:05.5125080Z add.s32 %r238, %r230, 8192; 2026-02-21T09:00:05.5125239Z bfe.u32 %r239, %r238, 4, 14; 2026-02-21T09:00:05.5125387Z cvt.u64.u32 %rd90, %r239; 2026-02-21T09:00:05.5125552Z or.b64 %rd81, %rd90, -9223371899348713472; 2026-02-21T09:00:05.5125726Z // begin inline asm 2026-02-21T09:00:05.5125948Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 16 ], %rd81, %rd78, %r218, %p20; 2026-02-21T09:00:05.5126197Z // end inline asm 2026-02-21T09:00:05.5126339Z add.s32 %r240, %r230, 8224; 2026-02-21T09:00:05.5126505Z bfe.u32 %r241, %r240, 4, 14; 2026-02-21T09:00:05.5126661Z cvt.u64.u32 %rd91, %r241; 2026-02-21T09:00:05.5126823Z or.b64 %rd83, %rd91, -9223371899348713472; 2026-02-21T09:00:05.5126989Z // begin inline asm 2026-02-21T09:00:05.5127204Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r380 + 16 ], %rd83, %rd80, 
%r218, %p20; 2026-02-21T09:00:05.5127443Z // end inline asm 2026-02-21T09:00:05.5127582Z cvt.u64.u32 %rd85, %r269; 2026-02-21T09:00:05.5127734Z // begin inline asm 2026-02-21T09:00:05.5127932Z @%p21 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd85]; 2026-02-21T09:00:05.5128165Z // end inline asm 2026-02-21T09:00:05.5128295Z bra.uni $L__BB0_6; 2026-02-21T09:00:05.5128470Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:00:05.5128779Z .loc 1 0 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:0:52 2026-02-21T09:00:05.5129068Z mov.b32 %r270, 1; 2026-02-21T09:00:05.5129308Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5129586Z // begin inline asm 2026-02-21T09:00:05.5129724Z 2026-02-21T09:00:05.5129834Z { 2026-02-21T09:00:05.5129959Z .reg .pred complete; 2026-02-21T09:00:05.5130098Z waitLoop: 2026-02-21T09:00:05.5130288Z mbarrier.try_wait.parity.shared.b64 complete, [%r269], %r270; 2026-02-21T09:00:05.5130513Z @!complete bra.uni waitLoop; 2026-02-21T09:00:05.5130665Z } 2026-02-21T09:00:05.5130726Z 2026-02-21T09:00:05.5130779Z // end inline asm 2026-02-21T09:00:05.5131026Z .loc 1 39 90 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:39:90 2026-02-21T09:00:05.5131322Z cp.async.wait_group 0; 2026-02-21T09:00:05.5131465Z bar.sync 0; 2026-02-21T09:00:05.5131632Z add.s32 %r271, %r40, 52224; 2026-02-21T09:00:05.5131778Z // begin inline asm 2026-02-21T09:00:05.5131941Z @%p33 mbarrier.inval.shared::cta.b64 [%r271]; 2026-02-21T09:00:05.5132122Z // end inline asm 2026-02-21T09:00:05.5132255Z bar.sync 0; 2026-02-21T09:00:05.5132380Z // begin inline asm 2026-02-21T09:00:05.5132543Z @%p33 mbarrier.inval.shared::cta.b64 [%r76]; 2026-02-21T09:00:05.5132727Z // end inline asm 2026-02-21T09:00:05.5132964Z .loc 1 49 45 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:49:45 2026-02-21T09:00:05.5133247Z shl.b32 %r324, %r23, 10; 2026-02-21T09:00:05.5133393Z shl.b32 %r325, %r24, 10; 
2026-02-21T09:00:05.5133545Z shl.b32 %r326, %r25, 10; 2026-02-21T09:00:05.5133686Z shl.b32 %r327, %r26, 10; 2026-02-21T09:00:05.5133970Z .loc 1 49 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:49:52 2026-02-21T09:00:05.5134247Z or.b32 %r328, %r324, %r22; 2026-02-21T09:00:05.5134406Z or.b32 %r329, %r325, %r22; 2026-02-21T09:00:05.5134565Z or.b32 %r330, %r326, %r22; 2026-02-21T09:00:05.5134750Z or.b32 %r331, %r327, %r22; 2026-02-21T09:00:05.5135015Z .loc 1 49 24 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:49:24 2026-02-21T09:00:05.5135340Z mad.wide.u32 %rd102, %r328, 2, %rd15; 2026-02-21T09:00:05.5135530Z mad.wide.u32 %rd103, %r329, 2, %rd15; 2026-02-21T09:00:05.5135701Z mad.wide.u32 %rd104, %r330, 2, %rd15; 2026-02-21T09:00:05.5135877Z mad.wide.u32 %rd105, %r331, 2, %rd15; 2026-02-21T09:00:05.5136152Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5136454Z // begin inline asm 2026-02-21T09:00:05.5136856Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288}, [%r306 + 0]; 2026-02-21T09:00:05.5137228Z // end inline asm 2026-02-21T09:00:05.5137368Z // begin inline asm 2026-02-21T09:00:05.5137711Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302, %r303, %r304, %r305}, [%r306 + 16]; 2026-02-21T09:00:05.5138104Z // end inline asm 2026-02-21T09:00:05.5138243Z // begin inline asm 2026-02-21T09:00:05.5138390Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:00:05.5138553Z // end inline asm 2026-02-21T09:00:05.5138688Z cvt.u64.u32 %rd106, %r273; 2026-02-21T09:00:05.5138849Z cvt.u64.u32 %rd107, %r274; 2026-02-21T09:00:05.5138997Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:00:05.5139159Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:00:05.5139430Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 
2026-02-21T09:00:05.5139725Z mov.b64 {%r332, %r333}, %rd109; 2026-02-21T09:00:05.5139891Z cvt.rn.f16x2.f32 %r334, %r333, %r332; 2026-02-21T09:00:05.5140173Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5140457Z cvt.u64.u32 %rd110, %r275; 2026-02-21T09:00:05.5140606Z cvt.u64.u32 %rd111, %r276; 2026-02-21T09:00:05.5140760Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:00:05.5140912Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:00:05.5141178Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5141461Z mov.b64 {%r335, %r336}, %rd113; 2026-02-21T09:00:05.5141630Z cvt.rn.f16x2.f32 %r337, %r336, %r335; 2026-02-21T09:00:05.5141904Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5142178Z cvt.u64.u32 %rd114, %r277; 2026-02-21T09:00:05.5142333Z cvt.u64.u32 %rd115, %r278; 2026-02-21T09:00:05.5142481Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:00:05.5142636Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:00:05.5142892Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5143215Z mov.b64 {%r338, %r339}, %rd117; 2026-02-21T09:00:05.5143376Z cvt.rn.f16x2.f32 %r340, %r339, %r338; 2026-02-21T09:00:05.5143657Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5143948Z cvt.u64.u32 %rd118, %r279; 2026-02-21T09:00:05.5144099Z cvt.u64.u32 %rd119, %r280; 2026-02-21T09:00:05.5144253Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:00:05.5144405Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:00:05.5144701Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5144989Z mov.b64 {%r341, %r342}, %rd121; 2026-02-21T09:00:05.5145160Z cvt.rn.f16x2.f32 %r343, %r342, %r341; 2026-02-21T09:00:05.5145446Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5145775Z cvt.u64.u32 %rd122, 
%r281; 2026-02-21T09:00:05.5145936Z cvt.u64.u32 %rd123, %r282; 2026-02-21T09:00:05.5146088Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:00:05.5146257Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:00:05.5146517Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5146829Z mov.b64 {%r344, %r345}, %rd125; 2026-02-21T09:00:05.5147007Z cvt.rn.f16x2.f32 %r346, %r345, %r344; 2026-02-21T09:00:05.5147300Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5147606Z cvt.u64.u32 %rd126, %r283; 2026-02-21T09:00:05.5147761Z cvt.u64.u32 %rd127, %r284; 2026-02-21T09:00:05.5147919Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:00:05.5148076Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:00:05.5148375Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5148652Z mov.b64 {%r347, %r348}, %rd129; 2026-02-21T09:00:05.5148820Z cvt.rn.f16x2.f32 %r349, %r348, %r347; 2026-02-21T09:00:05.5149095Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5149370Z cvt.u64.u32 %rd130, %r285; 2026-02-21T09:00:05.5149525Z cvt.u64.u32 %rd131, %r286; 2026-02-21T09:00:05.5149671Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:00:05.5149828Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:00:05.5150085Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5150366Z mov.b64 {%r350, %r351}, %rd133; 2026-02-21T09:00:05.5150539Z cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T09:00:05.5150824Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5151120Z cvt.u64.u32 %rd134, %r287; 2026-02-21T09:00:05.5151276Z cvt.u64.u32 %rd135, %r288; 2026-02-21T09:00:05.5151435Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:00:05.5151590Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:00:05.5151867Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 
2026-02-21T09:00:05.5152157Z mov.b64 {%r353, %r354}, %rd137; 2026-02-21T09:00:05.5152332Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T09:00:05.5152619Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5152905Z cvt.u64.u32 %rd138, %r290; 2026-02-21T09:00:05.5153067Z cvt.u64.u32 %rd139, %r291; 2026-02-21T09:00:05.5153219Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:00:05.5153381Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:00:05.5153649Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5153945Z mov.b64 {%r356, %r357}, %rd141; 2026-02-21T09:00:05.5154118Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T09:00:05.5154398Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5154718Z cvt.u64.u32 %rd142, %r292; 2026-02-21T09:00:05.5154913Z cvt.u64.u32 %rd143, %r293; 2026-02-21T09:00:05.5155076Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:00:05.5155233Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:00:05.5155515Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5155800Z mov.b64 {%r359, %r360}, %rd145; 2026-02-21T09:00:05.5155975Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T09:00:05.5156257Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5156548Z cvt.u64.u32 %rd146, %r294; 2026-02-21T09:00:05.5156711Z cvt.u64.u32 %rd147, %r295; 2026-02-21T09:00:05.5156865Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:00:05.5157069Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:00:05.5157346Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5157662Z mov.b64 {%r362, %r363}, %rd149; 2026-02-21T09:00:05.5157845Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T09:00:05.5158132Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5158443Z cvt.u64.u32 %rd150, 
%r296; 2026-02-21T09:00:05.5158628Z cvt.u64.u32 %rd151, %r297; 2026-02-21T09:00:05.5158798Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:00:05.5158947Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:00:05.5159207Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5159483Z mov.b64 {%r365, %r366}, %rd153; 2026-02-21T09:00:05.5159649Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T09:00:05.5159952Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5160227Z cvt.u64.u32 %rd154, %r298; 2026-02-21T09:00:05.5160380Z cvt.u64.u32 %rd155, %r299; 2026-02-21T09:00:05.5160527Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:00:05.5160683Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:00:05.5160939Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5161229Z mov.b64 {%r368, %r369}, %rd157; 2026-02-21T09:00:05.5161396Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T09:00:05.5161665Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5161948Z cvt.u64.u32 %rd158, %r300; 2026-02-21T09:00:05.5162095Z cvt.u64.u32 %rd159, %r301; 2026-02-21T09:00:05.5162249Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:00:05.5162399Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:00:05.5162661Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5162936Z mov.b64 {%r371, %r372}, %rd161; 2026-02-21T09:00:05.5163102Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T09:00:05.5163374Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5163649Z cvt.u64.u32 %rd162, %r302; 2026-02-21T09:00:05.5163800Z cvt.u64.u32 %rd163, %r303; 2026-02-21T09:00:05.5163945Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:00:05.5164104Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:00:05.5164357Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 
2026-02-21T09:00:05.5164643Z mov.b64 {%r374, %r375}, %rd165; 2026-02-21T09:00:05.5164849Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T09:00:05.5165123Z .loc 1 46 52 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:46:52 2026-02-21T09:00:05.5165408Z cvt.u64.u32 %rd166, %r304; 2026-02-21T09:00:05.5165555Z cvt.u64.u32 %rd167, %r305; 2026-02-21T09:00:05.5165710Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:00:05.5165860Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:00:05.5166130Z .loc 1 48 27 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:48:27 2026-02-21T09:00:05.5166448Z mov.b64 {%r377, %r378}, %rd169; 2026-02-21T09:00:05.5166615Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T09:00:05.5166893Z .loc 1 49 82 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:49:82 2026-02-21T09:00:05.5167202Z st.shared.v4.b32 [%r19], {%r334, %r337, %r340, %r343}; 2026-02-21T09:00:05.5167437Z st.shared.v4.b32 [%r20], {%r346, %r349, %r352, %r355}; 2026-02-21T09:00:05.5167623Z bar.sync 0; 2026-02-21T09:00:05.5167790Z ld.shared.v4.b32 {%r307, %r308, %r309, %r310}, [%r21]; 2026-02-21T09:00:05.5168013Z ld.shared.v4.b32 {%r311, %r312, %r313, %r314}, [%r21+1024]; 2026-02-21T09:00:05.5168224Z bar.sync 0; 2026-02-21T09:00:05.5168391Z st.shared.v4.b32 [%r19], {%r358, %r361, %r364, %r367}; 2026-02-21T09:00:05.5168638Z st.shared.v4.b32 [%r20], {%r370, %r373, %r376, %r379}; 2026-02-21T09:00:05.5168832Z bar.sync 0; 2026-02-21T09:00:05.5168982Z ld.shared.v4.b32 {%r315, %r316, %r317, %r318}, [%r21]; 2026-02-21T09:00:05.5169214Z ld.shared.v4.b32 {%r319, %r320, %r321, %r322}, [%r21+1024]; 2026-02-21T09:00:05.5169414Z // begin inline asm 2026-02-21T09:00:05.5169604Z st.global.v4.b32 [ %rd102 + 0 ], { %r307, %r308, %r309, %r310 }; 2026-02-21T09:00:05.5169834Z // end inline asm 2026-02-21T09:00:05.5169977Z // begin inline asm 2026-02-21T09:00:05.5170159Z st.global.v4.b32 [ %rd103 + 0 ], { %r311, %r312, %r313, %r314 }; 2026-02-21T09:00:05.5170359Z // end inline asm 
2026-02-21T09:00:05.5170496Z // begin inline asm 2026-02-21T09:00:05.5170668Z st.global.v4.b32 [ %rd104 + 0 ], { %r315, %r316, %r317, %r318 }; 2026-02-21T09:00:05.5170877Z // end inline asm 2026-02-21T09:00:05.5171006Z // begin inline asm 2026-02-21T09:00:05.5171212Z st.global.v4.b32 [ %rd105 + 0 ], { %r319, %r320, %r321, %r322 }; 2026-02-21T09:00:05.5171411Z // end inline asm 2026-02-21T09:00:05.5171572Z $L__BB0_8: // %._crit_edge 2026-02-21T09:00:05.5171871Z .loc 1 20 4 // cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py:20:4 2026-02-21T09:00:05.5172148Z bar.sync 0; 2026-02-21T09:00:05.5172280Z // begin inline asm 2026-02-21T09:00:05.5172473Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r380, 32; 2026-02-21T09:00:05.5172692Z // end inline asm 2026-02-21T09:00:05.5172817Z ret; 2026-02-21T09:00:05.5172942Z $L__tmp0: 2026-02-21T09:00:05.5173061Z $L__func_end0: 2026-02-21T09:00:05.5173220Z // -- End function 2026-02-21T09:00:05.5173394Z } 2026-02-21T09:00:05.5173661Z .file 1 "/tmp/torchinductor_root/gz/cgzujitzihfw3xpwlvwvqaig3dmlvj7xh7au32jyp46gemgatdrm.py" 2026-02-21T09:00:05.5173977Z .section .debug_abbrev 2026-02-21T09:00:05.5174113Z { 2026-02-21T09:00:05.5174266Z .b8 1 // Abbreviation Code 2026-02-21T09:00:05.5174480Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:00:05.5174722Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:00:05.5174931Z .b8 37 // DW_AT_producer 2026-02-21T09:00:05.5175138Z .b8 8 // DW_FORM_string 2026-02-21T09:00:05.5175343Z .b8 19 // DW_AT_language 2026-02-21T09:00:05.5175542Z .b8 5 // DW_FORM_data2 2026-02-21T09:00:05.5175744Z .b8 3 // DW_AT_name 2026-02-21T09:00:05.5175934Z .b8 8 // DW_FORM_string 2026-02-21T09:00:05.5176141Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:00:05.5176339Z .b8 6 // DW_FORM_data4 2026-02-21T09:00:05.5176544Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:00:05.5176747Z .b8 8 // DW_FORM_string 2026-02-21T09:00:05.5176941Z .b8 0 // EOM(1) 2026-02-21T09:00:05.5177135Z .b8 0 // EOM(2) 2026-02-21T09:00:05.5177317Z .b8 0 // 
EOM(3) 2026-02-21T09:00:05.5177519Z } 2026-02-21T09:00:05.5177639Z .section .debug_info 2026-02-21T09:00:05.5177786Z { 2026-02-21T09:00:05.5177931Z .b32 104 // Length of Unit 2026-02-21T09:00:05.5178162Z .b8 2 // DWARF version number 2026-02-21T09:00:05.5178356Z .b8 0 2026-02-21T09:00:05.5178534Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:00:05.5178785Z .b8 8 // Address Size (in bytes) 2026-02-21T09:00:05.5179015Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:00:05.5179250Z .b8 116 // DW_AT_producer 2026-02-21T09:00:05.5179455Z .b8 114 2026-02-21T09:00:05.5179577Z .b8 105 2026-02-21T09:00:05.5179688Z .b8 116 2026-02-21T09:00:05.5179805Z .b8 111 2026-02-21T09:00:05.5179914Z .b8 110 2026-02-21T09:00:05.5180032Z .b8 0 2026-02-21T09:00:05.5180174Z .b8 2 // DW_AT_language 2026-02-21T09:00:05.5180348Z .b8 0 2026-02-21T09:00:05.5180488Z .b8 99 // DW_AT_name 2026-02-21T09:00:05.5180658Z .b8 103 2026-02-21T09:00:05.5180776Z .b8 122 2026-02-21T09:00:05.5180908Z .b8 117 2026-02-21T09:00:05.5181026Z .b8 106 2026-02-21T09:00:05.5181135Z .b8 105 2026-02-21T09:00:05.5181250Z .b8 116 2026-02-21T09:00:05.5181358Z .b8 122 2026-02-21T09:00:05.5181471Z .b8 105 2026-02-21T09:00:05.5181578Z .b8 104 2026-02-21T09:00:05.5181692Z .b8 102 2026-02-21T09:00:05.5181804Z .b8 119 2026-02-21T09:00:05.5181910Z .b8 51 2026-02-21T09:00:05.5182028Z .b8 120 2026-02-21T09:00:05.5182135Z .b8 112 2026-02-21T09:00:05.5182246Z .b8 119 2026-02-21T09:00:05.5182382Z .b8 108 2026-02-21T09:00:05.5182499Z .b8 118 2026-02-21T09:00:05.5182605Z .b8 119 2026-02-21T09:00:05.5182718Z .b8 118 2026-02-21T09:00:05.5182824Z .b8 113 2026-02-21T09:00:05.5182940Z .b8 97 2026-02-21T09:00:05.5183048Z .b8 105 2026-02-21T09:00:05.5183164Z .b8 103 2026-02-21T09:00:05.5183274Z .b8 51 2026-02-21T09:00:05.5183393Z .b8 100 2026-02-21T09:00:05.5183501Z .b8 109 2026-02-21T09:00:05.5183617Z .b8 108 2026-02-21T09:00:05.5183733Z .b8 118 2026-02-21T09:00:05.5183844Z .b8 106 2026-02-21T09:00:05.5183962Z 
.b8 55 2026-02-21T09:00:05.5184071Z .b8 120 2026-02-21T09:00:05.5184189Z .b8 104 2026-02-21T09:00:05.5184298Z .b8 55 2026-02-21T09:00:05.5184416Z .b8 97 2026-02-21T09:00:05.5184527Z .b8 117 2026-02-21T09:00:05.5184646Z .b8 51 2026-02-21T09:00:05.5184783Z .b8 50 2026-02-21T09:00:05.5184900Z .b8 106 2026-02-21T09:00:05.5185008Z .b8 121 2026-02-21T09:00:05.5185121Z .b8 112 2026-02-21T09:00:05.5185228Z .b8 52 2026-02-21T09:00:05.5185343Z .b8 54 2026-02-21T09:00:05.5185459Z .b8 103 2026-02-21T09:00:05.5185568Z .b8 101 2026-02-21T09:00:05.5185685Z .b8 109 2026-02-21T09:00:05.5185795Z .b8 103 2026-02-21T09:00:05.5185913Z .b8 97 2026-02-21T09:00:05.5186022Z .b8 116 2026-02-21T09:00:05.5186139Z .b8 100 2026-02-21T09:00:05.5186250Z .b8 114 2026-02-21T09:00:05.5186368Z .b8 109 2026-02-21T09:00:05.5186478Z .b8 46 2026-02-21T09:00:05.5186594Z .b8 112 2026-02-21T09:00:05.5186701Z .b8 121 2026-02-21T09:00:05.5186816Z .b8 0 2026-02-21T09:00:05.5186965Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:00:05.5187186Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:00:05.5187369Z .b8 116 2026-02-21T09:00:05.5187477Z .b8 109 2026-02-21T09:00:05.5187591Z .b8 112 2026-02-21T09:00:05.5187697Z .b8 47 2026-02-21T09:00:05.5187811Z .b8 116 2026-02-21T09:00:05.5187916Z .b8 111 2026-02-21T09:00:05.5188030Z .b8 114 2026-02-21T09:00:05.5188136Z .b8 99 2026-02-21T09:00:05.5188249Z .b8 104 2026-02-21T09:00:05.5188356Z .b8 105 2026-02-21T09:00:05.5188470Z .b8 110 2026-02-21T09:00:05.5188576Z .b8 100 2026-02-21T09:00:05.5188692Z .b8 117 2026-02-21T09:00:05.5188798Z .b8 99 2026-02-21T09:00:05.5188912Z .b8 116 2026-02-21T09:00:05.5189024Z .b8 111 2026-02-21T09:00:05.5189132Z .b8 114 2026-02-21T09:00:05.5189246Z .b8 95 2026-02-21T09:00:05.5189354Z .b8 114 2026-02-21T09:00:05.5189533Z .b8 111 2026-02-21T09:00:05.5189641Z .b8 111 2026-02-21T09:00:05.5189758Z .b8 116 2026-02-21T09:00:05.5189865Z .b8 47 2026-02-21T09:00:05.5189982Z .b8 103 2026-02-21T09:00:05.5190090Z .b8 122 2026-02-21T09:00:05.5190209Z .b8 0 
2026-02-21T09:00:05.5190323Z } 2026-02-21T09:00:05.5190460Z .section .debug_macinfo { } 2026-02-21T09:00:05.5190565Z 2026-02-21T09:00:05.5190654Z ================================================================ 2026-02-21T09:00:05.5190889Z please share the reproducer above with Triton project. 2026-02-21T09:00:07.8906118Z 2026-02-21T09:00:07.8906876Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 18.3 configs/s 2026-02-21T09:00:08.3987931Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1930.9 2026-02-21T09:00:08.3993410Z configs/s 2026-02-21T09:00:08.4461301Z [19s] Generation 1 complete: 2026-02-21T09:00:08.4465883Z error=6 2026-02-21T09:00:08.4470598Z ok=86 2026-02-21T09:00:08.4474053Z min=0.0348 2026-02-21T09:00:08.4477946Z mid=0.0942 2026-02-21T09:00:08.4481962Z max=1.1188 2026-02-21T09:00:08.4483946Z best={'block_sizes': [128, 64, 32], 2026-02-21T09:00:08.4484489Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:00:08.4484827Z 'l2_groupings': [16], 2026-02-21T09:00:08.4484994Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:00:08.4485187Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:08.4485345Z 'maxnreg': 128, 2026-02-21T09:00:08.4485494Z 'num_sm_multiplier': 8, 2026-02-21T09:00:08.4485647Z 'num_stages': 7, 2026-02-21T09:00:08.4485789Z 'num_warps': 4, 2026-02-21T09:00:08.4485940Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:00:08.4486234Z 'range_flattens': [None, True], 2026-02-21T09:00:08.4486424Z 'range_multi_buffers': [False, False], 2026-02-21T09:00:08.4486615Z 'range_num_stages': [0, 0], 2026-02-21T09:00:08.4486786Z 'range_unroll_factors': [0, 0], 2026-02-21T09:00:08.4486959Z 'range_warp_specializes': [True, None]} 2026-02-21T09:00:08.4487181Z [19s] Fitting surrogate: 192 points, 192 targets 2026-02-21T09:00:09.6066270Z [20s] Generation 2 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:00:12.7681319Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 
40.6 configs/s 2026-02-21T09:00:17.7659052Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 18.0 configs/s 2026-02-21T09:00:18.4629296Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2083.9 2026-02-21T09:00:18.4629830Z configs/s 2026-02-21T09:00:18.5107398Z [29s] Generation 2 complete: 2026-02-21T09:00:18.5107696Z error=4 2026-02-21T09:00:18.5107834Z ok=87 2026-02-21T09:00:18.5107971Z min=0.0225 2026-02-21T09:00:18.5108102Z mid=0.0655 2026-02-21T09:00:18.5108227Z max=2.9174 2026-02-21T09:00:18.5108357Z best={'block_sizes': [64, 256, 32], 2026-02-21T09:00:18.5108588Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:00:18.5108809Z 'l2_groupings': [8], 2026-02-21T09:00:18.5108973Z 'load_eviction_policies': ['', ''], 2026-02-21T09:00:18.5109149Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:18.5109302Z 'num_stages': 4, 2026-02-21T09:00:18.5109447Z 'num_warps': 4, 2026-02-21T09:00:18.5109591Z 'pid_type': 'flat', 2026-02-21T09:00:18.5109743Z 'range_flattens': [None, False], 2026-02-21T09:00:18.5109928Z 'range_multi_buffers': [None, None], 2026-02-21T09:00:18.5110113Z 'range_num_stages': [0, 0], 2026-02-21T09:00:18.5110275Z 'range_unroll_factors': [0, 0], 2026-02-21T09:00:18.5110458Z 'range_warp_specializes': [None, False]} 2026-02-21T09:00:18.5125774Z [29s] Fitting surrogate: 283 points, 283 targets 2026-02-21T09:00:19.6586194Z [30s] Generation 3 starting: 81 neighbors, 5 active search path(s) 2026-02-21T09:00:28.9032519Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82/82 1.1 configs/s 2026-02-21T09:00:33.6089580Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 82/82 17.6 configs/s 2026-02-21T09:00:34.5897426Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1028.5 2026-02-21T09:00:34.5898037Z configs/s 2026-02-21T09:00:34.6724430Z [45s] Generation 3 complete: 2026-02-21T09:00:34.6728573Z error=2 2026-02-21T09:00:34.6731932Z ok=84 2026-02-21T09:00:34.6734784Z min=0.0185 
2026-02-21T09:00:34.6738817Z mid=0.0389 2026-02-21T09:00:34.6742802Z max=2.8882 2026-02-21T09:00:34.6746113Z best={'block_sizes': [256, 128, 64], 2026-02-21T09:00:34.6749682Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:00:34.6750000Z 'l2_groupings': [64], 2026-02-21T09:00:34.6750202Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:00:34.6750627Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:34.6750799Z 'num_stages': 3, 2026-02-21T09:00:34.6750944Z 'num_warps': 4, 2026-02-21T09:00:34.6751092Z 'pid_type': 'flat', 2026-02-21T09:00:34.6751243Z 'range_flattens': [None, True], 2026-02-21T09:00:34.6751431Z 'range_multi_buffers': [None, False], 2026-02-21T09:00:34.6751608Z 'range_num_stages': [0, 0], 2026-02-21T09:00:34.6751774Z 'range_unroll_factors': [0, 0], 2026-02-21T09:00:34.6751949Z 'range_warp_specializes': [None, None]} 2026-02-21T09:00:34.6752156Z [45s] Fitting surrogate: 369 points, 369 targets 2026-02-21T09:00:35.8728325Z [46s] Generation 4 starting: 79 neighbors, 5 active search path(s) 2026-02-21T09:01:07.4932623Z [78s] Timeout after 30s compiling Config(block_sizes=[1024, 128, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=3, num_warps=1, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:01:07.4953951Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 0.5 configs/s 2026-02-21T09:01:10.7754630Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 24.9 configs/s 2026-02-21T09:01:12.0850583Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 772.5 2026-02-21T09:01:12.0851452Z configs/s 2026-02-21T09:01:12.1916227Z [83s] Generation 4 complete: 2026-02-21T09:01:12.1920149Z error=25 2026-02-21T09:01:12.1924040Z timeout=1 2026-02-21T09:01:12.1927399Z ok=58 
2026-02-21T09:01:12.1931308Z min=0.0164 2026-02-21T09:01:12.1935160Z mid=0.0307 2026-02-21T09:01:12.1939091Z max=2.0091 2026-02-21T09:01:12.1941078Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:01:12.1941322Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:01:12.1941564Z 'l2_groupings': [8], 2026-02-21T09:01:12.1941735Z 'load_eviction_policies': ['', ''], 2026-02-21T09:01:12.1941975Z 'loop_orders': [[1, 0]], 2026-02-21T09:01:12.1942173Z 'num_stages': 4, 2026-02-21T09:01:12.1942368Z 'num_warps': 4, 2026-02-21T09:01:12.1942828Z 'pid_type': 'flat', 2026-02-21T09:01:12.1948552Z 'range_flattens': [None, False], 2026-02-21T09:01:12.1949480Z 'range_multi_buffers': [None, None], 2026-02-21T09:01:12.1949690Z 'range_num_stages': [0, 0], 2026-02-21T09:01:12.1949906Z 'range_unroll_factors': [0, 0], 2026-02-21T09:01:12.1950105Z 'range_warp_specializes': [None, False]} 2026-02-21T09:01:12.1950324Z [83s] Fitting surrogate: 453 points, 453 targets 2026-02-21T09:01:13.4253658Z [84s] Generation 5 starting: 68 neighbors, 5 active search path(s) 2026-02-21T09:01:24.0096214Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68/68 1.4 configs/s 2026-02-21T09:01:27.0257795Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 68/68 22.7 configs/s 2026-02-21T09:01:27.9779430Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1059.2 2026-02-21T09:01:27.9783014Z configs/s 2026-02-21T09:01:28.0582704Z [98s] Generation 5 complete: 2026-02-21T09:01:28.0582995Z error=19 2026-02-21T09:01:28.0583164Z ok=54 2026-02-21T09:01:28.0583296Z min=0.0184 2026-02-21T09:01:28.0583421Z mid=0.0286 2026-02-21T09:01:28.0583547Z max=3.7796 2026-02-21T09:01:28.0583737Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:01:28.0584325Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:01:28.0584549Z 'l2_groupings': [8], 2026-02-21T09:01:28.0585051Z 'load_eviction_policies': ['', ''], 2026-02-21T09:01:28.0585235Z 'loop_orders': [[1, 
0]], 2026-02-21T09:01:28.0585397Z 'num_stages': 4, 2026-02-21T09:01:28.0585543Z 'num_warps': 4, 2026-02-21T09:01:28.0585684Z 'pid_type': 'flat', 2026-02-21T09:01:28.0585842Z 'range_flattens': [None, False], 2026-02-21T09:01:28.0586030Z 'range_multi_buffers': [None, None], 2026-02-21T09:01:28.0586319Z 'range_num_stages': [0, 0], 2026-02-21T09:01:28.0586487Z 'range_unroll_factors': [0, 0], 2026-02-21T09:01:28.0586680Z 'range_warp_specializes': [None, False]} 2026-02-21T09:01:28.0602801Z [98s] Fitting surrogate: 526 points, 526 targets 2026-02-21T09:01:29.1103399Z [100s] Generation 6 starting: 59 neighbors, 4 active search path(s) 2026-02-21T09:01:39.8929941Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60/60 1.4 configs/s 2026-02-21T09:01:41.9382674Z [112s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:01:41.9384205Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:01:41.9385646Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:01:41.9385904Z `ptxas` stderr: 2026-02-21T09:01:41.9386353Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 219 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:01:41.9386850Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:01:41.9387007Z 2026-02-21T09:01:41.9387433Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpx3nqq0wu.ptx -o /tmp/tmpx3nqq0wu.ptx.o 2026-02-21T09:01:41.9387883Z 2026-02-21T09:01:41.9388015Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:01:41.9388327Z 2026-02-21T09:01:41.9388331Z 2026-02-21T09:01:41.9388417Z ================================================================ 2026-02-21T09:01:41.9388634Z Internal Triton PTX codegen error 2026-02-21T09:01:41.9388807Z `ptxas` stderr: 2026-02-21T09:01:41.9389222Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 219 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:01:41.9389835Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:01:41.9389984Z 2026-02-21T09:01:41.9390391Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpx3nqq0wu.ptx -o /tmp/tmpx3nqq0wu.ptx.o 2026-02-21T09:01:41.9390860Z 2026-02-21T09:01:41.9390864Z 2026-02-21T09:01:41.9390921Z // 2026-02-21T09:01:41.9391070Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:01:41.9391243Z // 2026-02-21T09:01:41.9391311Z 2026-02-21T09:01:41.9391377Z .version 8.7 2026-02-21T09:01:41.9391511Z .target sm_100a 2026-02-21T09:01:41.9391651Z .address_size 64 2026-02-21T09:01:41.9391738Z 2026-02-21T09:01:41.9391863Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:01:41.9392128Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:01:41.9392350Z // @_helion_matmul 2026-02-21T09:01:41.9392555Z .visible .entry _helion_matmul( 2026-02-21T09:01:41.9392782Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:01:41.9393042Z .param .u64 .ptr 
.global .align 1 _helion_matmul_param_1, 2026-02-21T09:01:41.9393394Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:01:41.9393634Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:01:41.9393884Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:01:41.9394083Z ) 2026-02-21T09:01:41.9394210Z .reqntid 256 2026-02-21T09:01:41.9394346Z .maxnreg 32 2026-02-21T09:01:41.9394469Z { 2026-02-21T09:01:41.9394601Z .reg .pred %p<59>; 2026-02-21T09:01:41.9394807Z .reg .b32 %r<529>; 2026-02-21T09:01:41.9394948Z .reg .b64 %rd<280>; 2026-02-21T09:01:41.9395171Z $L__func_begin0: 2026-02-21T09:01:41.9395255Z 2026-02-21T09:01:41.9395308Z // %bb.0: 2026-02-21T09:01:41.9395540Z .loc 1 19 0 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:19 2026-02-21T09:01:41.9395830Z mov.u32 %r1, %tid.x; 2026-02-21T09:01:41.9395981Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:01:41.9396165Z ld.param.b64 %rd40, [_helion_matmul_param_2]; 2026-02-21T09:01:41.9396364Z mov.b32 %r38, global_smem; 2026-02-21T09:01:41.9396518Z // begin inline asm 2026-02-21T09:01:41.9396765Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r38], 128; 2026-02-21T09:01:41.9397009Z // end inline asm 2026-02-21T09:01:41.9397174Z ld.param.b64 %rd57, [_helion_matmul_param_3]; 2026-02-21T09:01:41.9397353Z bar.sync 0; 2026-02-21T09:01:41.9397500Z ld.shared.b32 %r522, [global_smem]; 2026-02-21T09:01:41.9397663Z bar.sync 0; 2026-02-21T09:01:41.9397832Z // begin inline asm 2026-02-21T09:01:41.9398033Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:01:41.9398252Z // end inline asm 2026-02-21T09:01:41.9398495Z .loc 1 21 71 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:21:71 2026-02-21T09:01:41.9398774Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:01:41.9398923Z mov.u32 %r47, %ctaid.y; 2026-02-21T09:01:41.9399068Z mov.u32 %r48, %ctaid.z; 2026-02-21T09:01:41.9399218Z mov.u32 %r49, %nctaid.x; 
2026-02-21T09:01:41.9399364Z mov.u32 %r50, %nctaid.y; 2026-02-21T09:01:41.9399523Z mad.lo.s32 %r51, %r48, %r50, %r47; 2026-02-21T09:01:41.9399697Z mad.lo.s32 %r52, %r51, %r49, %r3; 2026-02-21T09:01:41.9399855Z shl.b32 %r53, %r52, 7; 2026-02-21T09:01:41.9400006Z cvt.s64.s32 %rd58, %r53; 2026-02-21T09:01:41.9400152Z add.s64 %rd54, %rd57, %rd58; 2026-02-21T09:01:41.9400305Z shl.b32 %r54, %r1, 2; 2026-02-21T09:01:41.9400447Z add.s32 %r39, %r38, %r54; 2026-02-21T09:01:41.9400594Z mov.b32 %r56, 0; 2026-02-21T09:01:41.9400722Z // begin inline asm 2026-02-21T09:01:41.9400874Z @%p1 st.shared.b32 [ %r39 + 0 ], %r56; 2026-02-21T09:01:41.9401037Z // end inline asm 2026-02-21T09:01:41.9401196Z bar.warp.sync -1; 2026-02-21T09:01:41.9401342Z setp.eq.b32 %p53, %r1, 0; 2026-02-21T09:01:41.9401532Z cvt.u64.u32 %rd39, %r38; 2026-02-21T09:01:41.9401681Z // begin inline asm 2026-02-21T09:01:41.9401928Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd39 + 0 ], %rd40; 2026-02-21T09:01:41.9402204Z // end inline asm 2026-02-21T09:01:41.9402335Z // begin inline asm 2026-02-21T09:01:41.9402560Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x1; 2026-02-21T09:01:41.9402802Z // end inline asm 2026-02-21T09:01:41.9402940Z mov.b32 %r41, 64; 2026-02-21T09:01:41.9403079Z // begin inline asm 2026-02-21T09:01:41.9403307Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x0, %r41; 2026-02-21T09:01:41.9403577Z // end inline asm 2026-02-21T09:01:41.9403705Z mov.b32 %r42, 128; 2026-02-21T09:01:41.9403844Z // begin inline asm 2026-02-21T09:01:41.9404068Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x1, %r42; 2026-02-21T09:01:41.9404325Z // end inline asm 2026-02-21T09:01:41.9404460Z mov.b32 %r43, 1024; 2026-02-21T09:01:41.9404597Z // begin inline asm 2026-02-21T09:01:41.9404883Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x0, %r43; 2026-02-21T09:01:41.9405153Z // end 
inline asm 2026-02-21T09:01:41.9405343Z mov.b32 %r44, 4096; 2026-02-21T09:01:41.9413300Z // begin inline asm 2026-02-21T09:01:41.9413620Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x1, %r44; 2026-02-21T09:01:41.9413913Z // end inline asm 2026-02-21T09:01:41.9414054Z mov.b64 %rd47, 2048; 2026-02-21T09:01:41.9414210Z // begin inline asm 2026-02-21T09:01:41.9414468Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd39 + 0 ], 0x0, %rd47; 2026-02-21T09:01:41.9414878Z // end inline asm 2026-02-21T09:01:41.9415982Z mov.b32 %r45, 1; 2026-02-21T09:01:41.9416126Z // begin inline asm 2026-02-21T09:01:41.9416378Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x0, %r45; 2026-02-21T09:01:41.9416667Z // end inline asm 2026-02-21T09:01:41.9416813Z // begin inline asm 2026-02-21T09:01:41.9417061Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x1, %r45; 2026-02-21T09:01:41.9417345Z // end inline asm 2026-02-21T09:01:41.9417478Z // begin inline asm 2026-02-21T09:01:41.9417716Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x6; 2026-02-21T09:01:41.9417974Z // end inline asm 2026-02-21T09:01:41.9418110Z // begin inline asm 2026-02-21T09:01:41.9418358Z @%p53 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x0; 2026-02-21T09:01:41.9418632Z // end inline asm 2026-02-21T09:01:41.9418769Z // begin inline asm 2026-02-21T09:01:41.9419051Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x3; 2026-02-21T09:01:41.9419323Z // end inline asm 2026-02-21T09:01:41.9419453Z // begin inline asm 2026-02-21T09:01:41.9419682Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd39 + 0 ], 0x0; 2026-02-21T09:01:41.9419941Z // end inline asm 2026-02-21T09:01:41.9420070Z // begin inline asm 2026-02-21T09:01:41.9420420Z @%p1 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd54 + 0 ], [ %rd39 + 0 ], 0x80; 2026-02-21T09:01:41.9420785Z // end inline asm 2026-02-21T09:01:41.9420926Z // begin inline asm 2026-02-21T09:01:41.9421130Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd54 + 0 ], 0x80; 2026-02-21T09:01:41.9421385Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:01:41.9421579Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:01:41.9421749Z // end inline asm 2026-02-21T09:01:41.9421885Z bar.sync 0; 2026-02-21T09:01:41.9422022Z cvta.global.u64 %rd142, %rd54; 2026-02-21T09:01:41.9422310Z .loc 1 27 75 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:27:75 2026-02-21T09:01:41.9422597Z setp.gt.u32 %p21, %r3, 255; 2026-02-21T09:01:41.9422763Z @%p21 bra $L__BB0_8; 2026-02-21T09:01:41.9422918Z // %bb.1: // %.lr.ph 2026-02-21T09:01:41.9423265Z .loc 1 0 75 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:0:75 2026-02-21T09:01:41.9423586Z ld.param.b64 %rd38, [_helion_matmul_param_1]; 2026-02-21T09:01:41.9423797Z ld.param.b64 %rd37, [_helion_matmul_param_0]; 2026-02-21T09:01:41.9424096Z .loc 1 47 48 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:47:48 2026-02-21T09:01:41.9424384Z and.b32 %r173, %r1, 7; 2026-02-21T09:01:41.9424541Z shl.b32 %r174, %r173, 3; 2026-02-21T09:01:41.9424843Z .loc 1 39 45 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:39:45 2026-02-21T09:01:41.9425129Z shr.u32 %r175, %r1, 3; 2026-02-21T09:01:41.9425282Z or.b32 %r4, %r175, 96; 2026-02-21T09:01:41.9425426Z bfe.u32 %r5, %r1, 3, 5; 2026-02-21T09:01:41.9425579Z or.b32 %r176, %r5, 64; 2026-02-21T09:01:41.9425719Z or.b32 %r177, %r5, 32; 2026-02-21T09:01:41.9425867Z shr.u32 %r178, %r1, 5; 2026-02-21T09:01:41.9426009Z and.b32 %r179, %r1, 255; 2026-02-21T09:01:41.9426167Z shl.b32 %r180, %r179, 4; 2026-02-21T09:01:41.9426310Z shl.b32 %r181, %r1, 1; 2026-02-21T09:01:41.9426464Z and.b32 %r182, %r181, 112; 2026-02-21T09:01:41.9426655Z 
xor.b32 %r6, %r180, %r182; 2026-02-21T09:01:41.9426817Z add.s32 %r125, %r38, %r6; 2026-02-21T09:01:41.9426970Z add.s32 %r127, %r125, 4096; 2026-02-21T09:01:41.9427133Z add.s32 %r129, %r125, 8192; 2026-02-21T09:01:41.9427293Z add.s32 %r131, %r125, 12288; 2026-02-21T09:01:41.9427466Z add.s32 %r133, %r125, 65536; 2026-02-21T09:01:41.9427630Z add.s32 %r135, %r125, 69632; 2026-02-21T09:01:41.9427783Z add.s32 %r137, %r125, 73728; 2026-02-21T09:01:41.9427949Z add.s32 %r139, %r125, 77824; 2026-02-21T09:01:41.9428134Z add.s32 %r141, %r125, 16384; 2026-02-21T09:01:41.9428295Z add.s32 %r143, %r125, 20480; 2026-02-21T09:01:41.9428447Z add.s32 %r145, %r125, 24576; 2026-02-21T09:01:41.9428606Z add.s32 %r147, %r125, 28672; 2026-02-21T09:01:41.9428764Z add.s32 %r149, %r125, 81920; 2026-02-21T09:01:41.9428917Z add.s32 %r151, %r125, 86016; 2026-02-21T09:01:41.9429077Z add.s32 %r153, %r125, 90112; 2026-02-21T09:01:41.9429227Z add.s32 %r155, %r125, 94208; 2026-02-21T09:01:41.9429388Z add.s32 %r157, %r125, 32768; 2026-02-21T09:01:41.9429541Z add.s32 %r159, %r125, 36864; 2026-02-21T09:01:41.9429703Z add.s32 %r161, %r125, 40960; 2026-02-21T09:01:41.9429856Z add.s32 %r163, %r125, 45056; 2026-02-21T09:01:41.9430015Z add.s32 %r165, %r125, 98304; 2026-02-21T09:01:41.9430170Z add.s32 %r167, %r125, 102400; 2026-02-21T09:01:41.9430342Z add.s32 %r169, %r125, 106496; 2026-02-21T09:01:41.9430503Z add.s32 %r171, %r125, 110592; 2026-02-21T09:01:41.9430687Z add.s32 %r252, %r125, 49152; 2026-02-21T09:01:41.9430854Z add.s32 %r254, %r125, 53248; 2026-02-21T09:01:41.9431006Z add.s32 %r256, %r125, 57344; 2026-02-21T09:01:41.9431163Z add.s32 %r258, %r125, 61440; 2026-02-21T09:01:41.9431316Z add.s32 %r260, %r125, 114688; 2026-02-21T09:01:41.9431476Z add.s32 %r262, %r125, 118784; 2026-02-21T09:01:41.9431629Z add.s32 %r264, %r125, 122880; 2026-02-21T09:01:41.9431792Z add.s32 %r266, %r125, 126976; 2026-02-21T09:01:41.9431955Z shl.b32 %r184, %r179, 7; 2026-02-21T09:01:41.9432107Z shl.b32 %r185, %r173, 4; 
2026-02-21T09:01:41.9432269Z or.b32 %r186, %r184, %r185; 2026-02-21T09:01:41.9432425Z add.s32 %r187, %r38, 131072; 2026-02-21T09:01:41.9432590Z xor.b32 %r188, %r186, 16; 2026-02-21T09:01:41.9432745Z xor.b32 %r189, %r186, 32; 2026-02-21T09:01:41.9432903Z xor.b32 %r190, %r186, 48; 2026-02-21T09:01:41.9433052Z xor.b32 %r191, %r186, 64; 2026-02-21T09:01:41.9433210Z xor.b32 %r192, %r186, 80; 2026-02-21T09:01:41.9433357Z xor.b32 %r193, %r186, 96; 2026-02-21T09:01:41.9433517Z xor.b32 %r194, %r186, 112; 2026-02-21T09:01:41.9433790Z .loc 1 38 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:38:27 2026-02-21T09:01:41.9434077Z shl.b32 %r195, %r3, 7; 2026-02-21T09:01:41.9434233Z and.b32 %r23, %r195, 896; 2026-02-21T09:01:41.9434528Z .loc 1 39 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:39:32 2026-02-21T09:01:41.9434883Z or.b32 %r196, %r23, %r5; 2026-02-21T09:01:41.9435038Z or.b32 %r197, %r23, %r177; 2026-02-21T09:01:41.9435204Z or.b32 %r198, %r23, %r176; 2026-02-21T09:01:41.9435356Z or.b32 %r199, %r23, %r4; 2026-02-21T09:01:41.9435633Z .loc 1 40 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:40:27 2026-02-21T09:01:41.9435936Z shl.b32 %r200, %r3, 4; 2026-02-21T09:01:41.9436089Z and.b32 %r418, %r200, 3968; 2026-02-21T09:01:41.9436370Z .loc 1 41 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:41:32 2026-02-21T09:01:41.9436659Z or.b32 %r201, %r418, %r5; 2026-02-21T09:01:41.9436830Z or.b32 %r202, %r418, %r177; 2026-02-21T09:01:41.9436978Z or.b32 %r203, %r418, %r176; 2026-02-21T09:01:41.9437132Z or.b32 %r204, %r418, %r4; 2026-02-21T09:01:41.9437390Z .loc 1 51 53 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:53 2026-02-21T09:01:41.9437672Z shl.b32 %r205, %r201, 10; 2026-02-21T09:01:41.9437822Z shl.b32 %r206, %r202, 10; 2026-02-21T09:01:41.9437964Z shl.b32 %r207, %r203, 10; 2026-02-21T09:01:41.9438144Z shl.b32 %r208, %r204, 10; 2026-02-21T09:01:41.9438386Z .loc 1 52 80 // 
cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:80 2026-02-21T09:01:41.9438671Z shl.b32 %r209, %r196, 10; 2026-02-21T09:01:41.9438813Z shl.b32 %r210, %r197, 10; 2026-02-21T09:01:41.9438964Z shl.b32 %r211, %r198, 10; 2026-02-21T09:01:41.9439113Z shl.b32 %r212, %r199, 10; 2026-02-21T09:01:41.9439364Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9439705Z shfl.sync.idx.b32 %r25, %r178, 0, 31, -1; 2026-02-21T09:01:41.9439886Z shl.b32 %r213, %r25, 21; 2026-02-21T09:01:41.9440046Z and.b32 %r214, %r213, 6291456; 2026-02-21T09:01:41.9440200Z add.s32 %r215, %r214, %r522; 2026-02-21T09:01:41.9440362Z shl.b32 %r216, %r25, 4; 2026-02-21T09:01:41.9440504Z and.b32 %r217, %r216, 64; 2026-02-21T09:01:41.9440658Z add.s32 %r416, %r215, %r217; 2026-02-21T09:01:41.9440820Z mov.pred %p31, -1; 2026-02-21T09:01:41.9440962Z // begin inline asm 2026-02-21T09:01:41.9441308Z @%p31 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r416 + 0], {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T09:01:41.9441665Z // end inline asm 2026-02-21T09:01:41.9441806Z // begin inline asm 2026-02-21T09:01:41.9442153Z @%p31 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r416 + 16], {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T09:01:41.9442521Z // end inline asm 2026-02-21T09:01:41.9442660Z // begin inline asm 2026-02-21T09:01:41.9442987Z @%p31 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r416 + 32], {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T09:01:41.9443364Z // end inline asm 2026-02-21T09:01:41.9443496Z // begin inline asm 2026-02-21T09:01:41.9443829Z @%p31 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r416 + 48], {%r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56, %r56}; 2026-02-21T09:01:41.9444180Z // end inline asm 
2026-02-21T09:01:41.9444318Z // begin inline asm 2026-02-21T09:01:41.9444478Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:01:41.9444639Z // end inline asm 2026-02-21T09:01:41.9444837Z bar.sync 0; 2026-02-21T09:01:41.9445073Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9445368Z add.s32 %r524, %r38, 163840; 2026-02-21T09:01:41.9445517Z // begin inline asm 2026-02-21T09:01:41.9445688Z @%p53 mbarrier.init.shared::cta.b64 [%r524], 1; 2026-02-21T09:01:41.9445871Z // end inline asm 2026-02-21T09:01:41.9446008Z bar.sync 0; 2026-02-21T09:01:41.9446137Z add.s32 %r124, %r38, 163848; 2026-02-21T09:01:41.9446331Z // begin inline asm 2026-02-21T09:01:41.9446498Z @%p53 mbarrier.init.shared::cta.b64 [%r124], 1; 2026-02-21T09:01:41.9446681Z // end inline asm 2026-02-21T09:01:41.9446926Z .loc 1 51 60 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:60 2026-02-21T09:01:41.9447205Z or.b32 %r218, %r205, %r174; 2026-02-21T09:01:41.9447354Z or.b32 %r219, %r206, %r174; 2026-02-21T09:01:41.9447506Z or.b32 %r220, %r207, %r174; 2026-02-21T09:01:41.9447650Z or.b32 %r221, %r208, %r174; 2026-02-21T09:01:41.9447900Z .loc 1 51 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:32 2026-02-21T09:01:41.9448184Z mad.wide.u32 %rd59, %r218, 2, %rd37; 2026-02-21T09:01:41.9448372Z mad.wide.u32 %rd60, %r219, 2, %rd37; 2026-02-21T09:01:41.9448543Z mad.wide.u32 %rd61, %r220, 2, %rd37; 2026-02-21T09:01:41.9448716Z mad.wide.u32 %rd62, %r221, 2, %rd37; 2026-02-21T09:01:41.9448879Z mov.b32 %r253, 16; 2026-02-21T09:01:41.9449117Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9449392Z // begin inline asm 2026-02-21T09:01:41.9449618Z cp.async.cg.shared.global [ %r125 + 0 ], [ %rd59 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9449845Z // end inline asm 2026-02-21T09:01:41.9449975Z // begin inline asm 2026-02-21T09:01:41.9450167Z cp.async.cg.shared.global [ %r127 + 0 ], [ %rd60 + 0 ], 
0x10, %r253; 2026-02-21T09:01:41.9450385Z // end inline asm 2026-02-21T09:01:41.9450515Z // begin inline asm 2026-02-21T09:01:41.9450703Z cp.async.cg.shared.global [ %r129 + 0 ], [ %rd61 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9450913Z // end inline asm 2026-02-21T09:01:41.9451048Z // begin inline asm 2026-02-21T09:01:41.9451271Z cp.async.cg.shared.global [ %r131 + 0 ], [ %rd62 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9451487Z // end inline asm 2026-02-21T09:01:41.9451620Z cp.async.commit_group; 2026-02-21T09:01:41.9451878Z .loc 1 52 59 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:59 2026-02-21T09:01:41.9452155Z or.b32 %r222, %r209, %r174; 2026-02-21T09:01:41.9452303Z or.b32 %r223, %r210, %r174; 2026-02-21T09:01:41.9452456Z or.b32 %r224, %r211, %r174; 2026-02-21T09:01:41.9452599Z or.b32 %r225, %r212, %r174; 2026-02-21T09:01:41.9452851Z .loc 1 52 34 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:34 2026-02-21T09:01:41.9453126Z mad.wide.u32 %rd63, %r222, 2, %rd38; 2026-02-21T09:01:41.9453302Z mad.wide.u32 %rd64, %r223, 2, %rd38; 2026-02-21T09:01:41.9453467Z mad.wide.u32 %rd65, %r224, 2, %rd38; 2026-02-21T09:01:41.9453637Z mad.wide.u32 %rd66, %r225, 2, %rd38; 2026-02-21T09:01:41.9453940Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9454210Z // begin inline asm 2026-02-21T09:01:41.9454403Z cp.async.cg.shared.global [ %r133 + 0 ], [ %rd63 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9454624Z // end inline asm 2026-02-21T09:01:41.9454810Z // begin inline asm 2026-02-21T09:01:41.9455006Z cp.async.cg.shared.global [ %r135 + 0 ], [ %rd64 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9455215Z // end inline asm 2026-02-21T09:01:41.9455352Z // begin inline asm 2026-02-21T09:01:41.9455540Z cp.async.cg.shared.global [ %r137 + 0 ], [ %rd65 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9455756Z // end inline asm 2026-02-21T09:01:41.9455886Z // begin inline asm 2026-02-21T09:01:41.9456076Z cp.async.cg.shared.global 
[ %r139 + 0 ], [ %rd66 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9456293Z // end inline asm 2026-02-21T09:01:41.9456429Z cp.async.commit_group; 2026-02-21T09:01:41.9456691Z .loc 1 51 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:32 2026-02-21T09:01:41.9456967Z add.s64 %rd67, %rd59, 128; 2026-02-21T09:01:41.9457127Z add.s64 %rd68, %rd60, 128; 2026-02-21T09:01:41.9457275Z add.s64 %rd69, %rd61, 128; 2026-02-21T09:01:41.9457428Z add.s64 %rd70, %rd62, 128; 2026-02-21T09:01:41.9457723Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9458004Z bar.sync 0; 2026-02-21T09:01:41.9458137Z // begin inline asm 2026-02-21T09:01:41.9458322Z cp.async.cg.shared.global [ %r141 + 0 ], [ %rd67 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9458537Z // end inline asm 2026-02-21T09:01:41.9458663Z // begin inline asm 2026-02-21T09:01:41.9458851Z cp.async.cg.shared.global [ %r143 + 0 ], [ %rd68 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9459058Z // end inline asm 2026-02-21T09:01:41.9459190Z // begin inline asm 2026-02-21T09:01:41.9459371Z cp.async.cg.shared.global [ %r145 + 0 ], [ %rd69 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9459587Z // end inline asm 2026-02-21T09:01:41.9459715Z // begin inline asm 2026-02-21T09:01:41.9459903Z cp.async.cg.shared.global [ %r147 + 0 ], [ %rd70 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9460121Z // end inline asm 2026-02-21T09:01:41.9460253Z cp.async.commit_group; 2026-02-21T09:01:41.9460509Z .loc 1 52 34 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:34 2026-02-21T09:01:41.9460781Z add.s64 %rd71, %rd63, 128; 2026-02-21T09:01:41.9460939Z add.s64 %rd72, %rd64, 128; 2026-02-21T09:01:41.9461113Z add.s64 %rd73, %rd65, 128; 2026-02-21T09:01:41.9461267Z add.s64 %rd74, %rd66, 128; 2026-02-21T09:01:41.9461520Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9461786Z // begin inline asm 2026-02-21T09:01:41.9461976Z cp.async.cg.shared.global [ %r149 
+ 0 ], [ %rd71 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9462187Z // end inline asm 2026-02-21T09:01:41.9462321Z // begin inline asm 2026-02-21T09:01:41.9462504Z cp.async.cg.shared.global [ %r151 + 0 ], [ %rd72 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9462753Z // end inline asm 2026-02-21T09:01:41.9462884Z // begin inline asm 2026-02-21T09:01:41.9463079Z cp.async.cg.shared.global [ %r153 + 0 ], [ %rd73 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9463297Z // end inline asm 2026-02-21T09:01:41.9463424Z // begin inline asm 2026-02-21T09:01:41.9463611Z cp.async.cg.shared.global [ %r155 + 0 ], [ %rd74 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9463818Z // end inline asm 2026-02-21T09:01:41.9463960Z cp.async.commit_group; 2026-02-21T09:01:41.9464204Z .loc 1 51 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:32 2026-02-21T09:01:41.9464486Z add.s64 %rd75, %rd59, 256; 2026-02-21T09:01:41.9464633Z add.s64 %rd76, %rd60, 256; 2026-02-21T09:01:41.9464845Z add.s64 %rd77, %rd61, 256; 2026-02-21T09:01:41.9464998Z add.s64 %rd78, %rd62, 256; 2026-02-21T09:01:41.9465277Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9465630Z bar.sync 0; 2026-02-21T09:01:41.9465797Z // begin inline asm 2026-02-21T09:01:41.9466005Z cp.async.cg.shared.global [ %r157 + 0 ], [ %rd75 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9466219Z // end inline asm 2026-02-21T09:01:41.9466356Z // begin inline asm 2026-02-21T09:01:41.9466539Z cp.async.cg.shared.global [ %r159 + 0 ], [ %rd76 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9466755Z // end inline asm 2026-02-21T09:01:41.9466888Z // begin inline asm 2026-02-21T09:01:41.9467070Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd77 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9467287Z // end inline asm 2026-02-21T09:01:41.9467413Z // begin inline asm 2026-02-21T09:01:41.9467602Z cp.async.cg.shared.global [ %r163 + 0 ], [ %rd78 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9467816Z // end inline asm 2026-02-21T09:01:41.9467954Z 
cp.async.commit_group; 2026-02-21T09:01:41.9468200Z .loc 1 52 34 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:34 2026-02-21T09:01:41.9468482Z add.s64 %rd79, %rd63, 256; 2026-02-21T09:01:41.9468634Z add.s64 %rd80, %rd64, 256; 2026-02-21T09:01:41.9468778Z add.s64 %rd81, %rd65, 256; 2026-02-21T09:01:41.9468929Z add.s64 %rd82, %rd66, 256; 2026-02-21T09:01:41.9469206Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9469490Z // begin inline asm 2026-02-21T09:01:41.9469674Z cp.async.cg.shared.global [ %r165 + 0 ], [ %rd79 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9469897Z // end inline asm 2026-02-21T09:01:41.9470025Z // begin inline asm 2026-02-21T09:01:41.9470214Z cp.async.cg.shared.global [ %r167 + 0 ], [ %rd80 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9470433Z // end inline asm 2026-02-21T09:01:41.9470561Z // begin inline asm 2026-02-21T09:01:41.9470772Z cp.async.cg.shared.global [ %r169 + 0 ], [ %rd81 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9470999Z // end inline asm 2026-02-21T09:01:41.9471142Z // begin inline asm 2026-02-21T09:01:41.9471336Z cp.async.cg.shared.global [ %r171 + 0 ], [ %rd82 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9471564Z // end inline asm 2026-02-21T09:01:41.9471704Z cp.async.commit_group; 2026-02-21T09:01:41.9471971Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9472283Z cp.async.wait_group 4; 2026-02-21T09:01:41.9472435Z bar.sync 0; 2026-02-21T09:01:41.9472713Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9473001Z setp.ne.b32 %p28, %r25, 0; 2026-02-21T09:01:41.9473164Z @%p28 bra $L__BB0_3; 2026-02-21T09:01:41.9473306Z // %bb.2: 2026-02-21T09:01:41.9473547Z .loc 1 0 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:0:52 2026-02-21T09:01:41.9473832Z add.s32 %r235, %r38, 65536; 2026-02-21T09:01:41.9473998Z add.s32 %r236, %r38, 65632; 2026-02-21T09:01:41.9474168Z bfe.u32 
%r237, %r236, 4, 14; 2026-02-21T09:01:41.9474356Z cvt.u64.u32 %rd92, %r237; 2026-02-21T09:01:41.9474527Z or.b64 %rd90, %rd92, 4611686293372403712; 2026-02-21T09:01:41.9474751Z add.s32 %r238, %r38, 96; 2026-02-21T09:01:41.9474915Z bfe.u32 %r239, %r238, 4, 14; 2026-02-21T09:01:41.9475074Z cvt.u64.u32 %rd93, %r239; 2026-02-21T09:01:41.9475245Z or.b64 %rd89, %rd93, 4611686293372403712; 2026-02-21T09:01:41.9475426Z add.s32 %r240, %r38, 65600; 2026-02-21T09:01:41.9475589Z bfe.u32 %r241, %r240, 4, 14; 2026-02-21T09:01:41.9475744Z cvt.u64.u32 %rd94, %r241; 2026-02-21T09:01:41.9475911Z or.b64 %rd88, %rd94, 4611686293372403712; 2026-02-21T09:01:41.9476091Z add.s32 %r242, %r38, 64; 2026-02-21T09:01:41.9476243Z bfe.u32 %r243, %r242, 4, 14; 2026-02-21T09:01:41.9476406Z cvt.u64.u32 %rd95, %r243; 2026-02-21T09:01:41.9476566Z or.b64 %rd87, %rd95, 4611686293372403712; 2026-02-21T09:01:41.9476748Z add.s32 %r244, %r38, 65568; 2026-02-21T09:01:41.9476903Z bfe.u32 %r245, %r244, 4, 14; 2026-02-21T09:01:41.9477091Z cvt.u64.u32 %rd96, %r245; 2026-02-21T09:01:41.9477250Z or.b64 %rd86, %rd96, 4611686293372403712; 2026-02-21T09:01:41.9477430Z add.s32 %r246, %r38, 32; 2026-02-21T09:01:41.9477584Z bfe.u32 %r247, %r246, 4, 14; 2026-02-21T09:01:41.9477735Z cvt.u64.u32 %rd97, %r247; 2026-02-21T09:01:41.9477899Z or.b64 %rd85, %rd97, 4611686293372403712; 2026-02-21T09:01:41.9478074Z bfe.u32 %r248, %r235, 4, 14; 2026-02-21T09:01:41.9478242Z cvt.u64.u32 %rd98, %r248; 2026-02-21T09:01:41.9478392Z or.b64 %rd84, %rd98, 4611686293372403712; 2026-02-21T09:01:41.9478564Z bfe.u32 %r249, %r38, 4, 14; 2026-02-21T09:01:41.9478708Z cvt.u64.u32 %rd99, %r249; 2026-02-21T09:01:41.9478864Z or.b64 %rd83, %rd99, 4611686293372403712; 2026-02-21T09:01:41.9479143Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9479425Z elect.sync %r250|%p30, -1; 2026-02-21T09:01:41.9479583Z mov.b32 %r227, 136314896; 2026-02-21T09:01:41.9479726Z mov.pred %p29, 0; 
2026-02-21T09:01:41.9479868Z // begin inline asm 2026-02-21T09:01:41.9480088Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd83, %rd84, %r227, %p29; 2026-02-21T09:01:41.9480345Z // end inline asm 2026-02-21T09:01:41.9480476Z // begin inline asm 2026-02-21T09:01:41.9480723Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd85, %rd86, %r227, %p31; 2026-02-21T09:01:41.9480973Z // end inline asm 2026-02-21T09:01:41.9481101Z // begin inline asm 2026-02-21T09:01:41.9481314Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd87, %rd88, %r227, %p31; 2026-02-21T09:01:41.9481544Z // end inline asm 2026-02-21T09:01:41.9481677Z // begin inline asm 2026-02-21T09:01:41.9481875Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd89, %rd90, %r227, %p31; 2026-02-21T09:01:41.9482114Z // end inline asm 2026-02-21T09:01:41.9482251Z add.s32 %r251, %r38, 163840; 2026-02-21T09:01:41.9482400Z cvt.u64.u32 %rd91, %r251; 2026-02-21T09:01:41.9482553Z // begin inline asm 2026-02-21T09:01:41.9482752Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd91]; 2026-02-21T09:01:41.9482980Z // end inline asm 2026-02-21T09:01:41.9483106Z $L__BB0_3: 2026-02-21T09:01:41.9483335Z .loc 1 0 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:0:52 2026-02-21T09:01:41.9483611Z add.s32 %r15, %r187, %r186; 2026-02-21T09:01:41.9483768Z add.s32 %r16, %r187, %r188; 2026-02-21T09:01:41.9483923Z add.s32 %r17, %r187, %r189; 2026-02-21T09:01:41.9484110Z add.s32 %r18, %r187, %r190; 2026-02-21T09:01:41.9484264Z add.s32 %r19, %r187, %r191; 2026-02-21T09:01:41.9484408Z add.s32 %r20, %r187, %r192; 2026-02-21T09:01:41.9484558Z add.s32 %r21, %r187, %r193; 2026-02-21T09:01:41.9484749Z add.s32 %r22, %r187, %r194; 2026-02-21T09:01:41.9485012Z .loc 1 51 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:32 2026-02-21T09:01:41.9485283Z add.s64 %rd100, %rd59, 384; 2026-02-21T09:01:41.9485441Z add.s64 %rd101, %rd60, 384; 2026-02-21T09:01:41.9485629Z add.s64 
%rd102, %rd61, 384; 2026-02-21T09:01:41.9485775Z add.s64 %rd103, %rd62, 384; 2026-02-21T09:01:41.9486027Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9486298Z bar.sync 0; 2026-02-21T09:01:41.9486430Z // begin inline asm 2026-02-21T09:01:41.9486625Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd100 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9486845Z // end inline asm 2026-02-21T09:01:41.9486974Z // begin inline asm 2026-02-21T09:01:41.9487171Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd101 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9487395Z // end inline asm 2026-02-21T09:01:41.9487523Z // begin inline asm 2026-02-21T09:01:41.9487717Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd102 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9487928Z // end inline asm 2026-02-21T09:01:41.9488062Z // begin inline asm 2026-02-21T09:01:41.9488271Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd103 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9488490Z // end inline asm 2026-02-21T09:01:41.9488623Z cp.async.commit_group; 2026-02-21T09:01:41.9488877Z .loc 1 52 34 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:34 2026-02-21T09:01:41.9489161Z add.s64 %rd104, %rd63, 384; 2026-02-21T09:01:41.9489308Z add.s64 %rd105, %rd64, 384; 2026-02-21T09:01:41.9489460Z add.s64 %rd106, %rd65, 384; 2026-02-21T09:01:41.9489605Z add.s64 %rd107, %rd66, 384; 2026-02-21T09:01:41.9489860Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9490127Z // begin inline asm 2026-02-21T09:01:41.9490320Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd104 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9490534Z // end inline asm 2026-02-21T09:01:41.9490673Z // begin inline asm 2026-02-21T09:01:41.9490864Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd105 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9491078Z // end inline asm 2026-02-21T09:01:41.9491212Z // begin inline asm 2026-02-21T09:01:41.9491397Z cp.async.cg.shared.global [ 
%r264 + 0 ], [ %rd106 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9491613Z // end inline asm 2026-02-21T09:01:41.9491740Z // begin inline asm 2026-02-21T09:01:41.9491925Z cp.async.cg.shared.global [ %r266 + 0 ], [ %rd107 + 0 ], 0x10, %r253; 2026-02-21T09:01:41.9492179Z // end inline asm 2026-02-21T09:01:41.9492326Z cp.async.commit_group; 2026-02-21T09:01:41.9492582Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9492856Z mul.wide.u32 %rd10, %r173, 16; 2026-02-21T09:01:41.9493025Z and.b32 %r273, %r3, 7; 2026-02-21T09:01:41.9493171Z shl.b32 %r274, %r273, 17; 2026-02-21T09:01:41.9493329Z shl.b32 %r275, %r4, 10; 2026-02-21T09:01:41.9493472Z or.b32 %r276, %r274, %r275; 2026-02-21T09:01:41.9493635Z mad.wide.u32 %rd109, %r276, 2, %rd38; 2026-02-21T09:01:41.9493806Z add.s64 %rd279, %rd109, 512; 2026-02-21T09:01:41.9493966Z add.s32 %r277, %r23, %r5; 2026-02-21T09:01:41.9494111Z add.s32 %r278, %r277, 64; 2026-02-21T09:01:41.9494273Z mad.wide.u32 %rd110, %r278, 2048, %rd38; 2026-02-21T09:01:41.9494450Z add.s64 %rd278, %rd110, 512; 2026-02-21T09:01:41.9494597Z add.s32 %r279, %r277, 32; 2026-02-21T09:01:41.9494801Z mad.wide.u32 %rd111, %r279, 2048, %rd38; 2026-02-21T09:01:41.9494968Z add.s64 %rd277, %rd111, 512; 2026-02-21T09:01:41.9495122Z shl.b32 %r280, %r5, 10; 2026-02-21T09:01:41.9495263Z or.b32 %r281, %r274, %r280; 2026-02-21T09:01:41.9495453Z mad.wide.u32 %rd112, %r281, 2, %rd38; 2026-02-21T09:01:41.9495623Z add.s64 %rd276, %rd112, 512; 2026-02-21T09:01:41.9495777Z shl.b32 %r282, %r3, 14; 2026-02-21T09:01:41.9495929Z and.b32 %r283, %r282, 4063232; 2026-02-21T09:01:41.9496079Z or.b32 %r284, %r283, %r275; 2026-02-21T09:01:41.9496238Z mad.wide.u32 %rd113, %r284, 2, %rd37; 2026-02-21T09:01:41.9496399Z add.s64 %rd275, %rd113, 512; 2026-02-21T09:01:41.9496552Z add.s32 %r285, %r418, %r5; 2026-02-21T09:01:41.9496699Z add.s32 %r286, %r285, 64; 2026-02-21T09:01:41.9496885Z mad.wide.u32 %rd114, %r286, 2048, %rd37; 
2026-02-21T09:01:41.9497054Z add.s64 %rd274, %rd114, 512; 2026-02-21T09:01:41.9497206Z add.s32 %r287, %r285, 32; 2026-02-21T09:01:41.9497356Z mad.wide.u32 %rd115, %r287, 2048, %rd37; 2026-02-21T09:01:41.9497527Z add.s64 %rd273, %rd115, 512; 2026-02-21T09:01:41.9497679Z or.b32 %r288, %r283, %r280; 2026-02-21T09:01:41.9497829Z mad.wide.u32 %rd116, %r288, 2, %rd37; 2026-02-21T09:01:41.9498000Z add.s64 %rd272, %rd116, 512; 2026-02-21T09:01:41.9498140Z mov.b32 %r527, 1; 2026-02-21T09:01:41.9498275Z mov.b32 %r526, 3; 2026-02-21T09:01:41.9498404Z mov.b32 %r523, 0; 2026-02-21T09:01:41.9498543Z mov.b64 %rd271, -64; 2026-02-21T09:01:41.9498681Z mov.b32 %r525, %r523; 2026-02-21T09:01:41.9498826Z mov.b32 %r528, %r523; 2026-02-21T09:01:41.9498963Z bra.uni $L__BB0_4; 2026-02-21T09:01:41.9499145Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:01:41.9499519Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9499796Z add.s64 %rd271, %rd271, 64; 2026-02-21T09:01:41.9499960Z setp.lt.u64 %p49, %rd271, 768; 2026-02-21T09:01:41.9500218Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9500497Z // begin inline asm 2026-02-21T09:01:41.9500629Z 2026-02-21T09:01:41.9500746Z { 2026-02-21T09:01:41.9500875Z .reg .pred complete; 2026-02-21T09:01:41.9501019Z waitLoop: 2026-02-21T09:01:41.9501212Z mbarrier.try_wait.parity.shared.b64 complete, [%r524], %r523; 2026-02-21T09:01:41.9501440Z @!complete bra.uni waitLoop; 2026-02-21T09:01:41.9501597Z } 2026-02-21T09:01:41.9501662Z 2026-02-21T09:01:41.9501716Z // end inline asm 2026-02-21T09:01:41.9501855Z add.s32 %r339, %r527, 1; 2026-02-21T09:01:41.9502004Z setp.gt.s32 %p50, %r339, 1; 2026-02-21T09:01:41.9502167Z selp.b32 %r527, 0, %r339, %p50; 2026-02-21T09:01:41.9502327Z selp.b32 %r340, 1, 0, %p50; 2026-02-21T09:01:41.9502483Z xor.b32 %r36, %r528, %r340; 2026-02-21T09:01:41.9502737Z .loc 1 46 80 // 
cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9503018Z add.s32 %r341, %r526, 1; 2026-02-21T09:01:41.9503198Z setp.gt.s32 %p51, %r341, 3; 2026-02-21T09:01:41.9503348Z selp.b32 %r526, 0, %r341, %p51; 2026-02-21T09:01:41.9503611Z .loc 1 51 32 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:32 2026-02-21T09:01:41.9503883Z add.s64 %rd134, %rd272, %rd10; 2026-02-21T09:01:41.9504045Z add.s64 %rd135, %rd273, %rd10; 2026-02-21T09:01:41.9504202Z add.s64 %rd136, %rd274, %rd10; 2026-02-21T09:01:41.9504456Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9504791Z add.s64 %rd137, %rd275, %rd10; 2026-02-21T09:01:41.9504943Z shl.b32 %r342, %r526, 14; 2026-02-21T09:01:41.9505095Z add.s32 %r344, %r38, %r342; 2026-02-21T09:01:41.9505238Z bar.sync 0; 2026-02-21T09:01:41.9505372Z add.s32 %r323, %r344, %r6; 2026-02-21T09:01:41.9505519Z selp.b32 %r324, 16, 0, %p49; 2026-02-21T09:01:41.9505674Z // begin inline asm 2026-02-21T09:01:41.9505870Z cp.async.cg.shared.global [ %r323 + 0 ], [ %rd134 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9506098Z // end inline asm 2026-02-21T09:01:41.9506236Z add.s32 %r325, %r323, 4096; 2026-02-21T09:01:41.9506382Z // begin inline asm 2026-02-21T09:01:41.9506612Z cp.async.cg.shared.global [ %r325 + 0 ], [ %rd135 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9506827Z // end inline asm 2026-02-21T09:01:41.9506963Z add.s32 %r327, %r323, 8192; 2026-02-21T09:01:41.9507108Z // begin inline asm 2026-02-21T09:01:41.9507302Z cp.async.cg.shared.global [ %r327 + 0 ], [ %rd136 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9507514Z // end inline asm 2026-02-21T09:01:41.9507652Z add.s32 %r329, %r323, 12288; 2026-02-21T09:01:41.9507804Z // begin inline asm 2026-02-21T09:01:41.9507990Z cp.async.cg.shared.global [ %r329 + 0 ], [ %rd137 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9508237Z // end inline asm 2026-02-21T09:01:41.9508371Z cp.async.commit_group; 2026-02-21T09:01:41.9508624Z .loc 1 52 
34 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:34 2026-02-21T09:01:41.9508898Z add.s64 %rd138, %rd276, %rd10; 2026-02-21T09:01:41.9509058Z add.s64 %rd139, %rd277, %rd10; 2026-02-21T09:01:41.9509207Z add.s64 %rd140, %rd278, %rd10; 2026-02-21T09:01:41.9509461Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9509736Z add.s64 %rd141, %rd279, %rd10; 2026-02-21T09:01:41.9509882Z add.s32 %r331, %r323, 65536; 2026-02-21T09:01:41.9510035Z // begin inline asm 2026-02-21T09:01:41.9510218Z cp.async.cg.shared.global [ %r331 + 0 ], [ %rd138 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9510431Z // end inline asm 2026-02-21T09:01:41.9510557Z add.s32 %r333, %r323, 69632; 2026-02-21T09:01:41.9510729Z // begin inline asm 2026-02-21T09:01:41.9510917Z cp.async.cg.shared.global [ %r333 + 0 ], [ %rd139 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9511131Z // end inline asm 2026-02-21T09:01:41.9511266Z add.s32 %r335, %r323, 73728; 2026-02-21T09:01:41.9511412Z // begin inline asm 2026-02-21T09:01:41.9511605Z cp.async.cg.shared.global [ %r335 + 0 ], [ %rd140 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9511812Z // end inline asm 2026-02-21T09:01:41.9511945Z add.s32 %r337, %r323, 77824; 2026-02-21T09:01:41.9512088Z // begin inline asm 2026-02-21T09:01:41.9512283Z cp.async.cg.shared.global [ %r337 + 0 ], [ %rd141 + 0 ], 0x10, %r324; 2026-02-21T09:01:41.9512494Z // end inline asm 2026-02-21T09:01:41.9512637Z cp.async.commit_group; 2026-02-21T09:01:41.9512892Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9513159Z add.s64 %rd279, %rd279, 128; 2026-02-21T09:01:41.9513320Z add.s64 %rd278, %rd278, 128; 2026-02-21T09:01:41.9513472Z add.s64 %rd277, %rd277, 128; 2026-02-21T09:01:41.9513631Z add.s64 %rd276, %rd276, 128; 2026-02-21T09:01:41.9513781Z add.s64 %rd275, %rd275, 128; 2026-02-21T09:01:41.9513937Z add.s64 %rd274, %rd274, 128; 2026-02-21T09:01:41.9514088Z add.s64 %rd273, %rd273, 128; 
2026-02-21T09:01:41.9514272Z add.s64 %rd272, %rd272, 128; 2026-02-21T09:01:41.9514432Z setp.lt.u64 %p52, %rd271, 896; 2026-02-21T09:01:41.9514589Z mov.b32 %r523, %r528; 2026-02-21T09:01:41.9514807Z mov.b32 %r524, %r345; 2026-02-21T09:01:41.9515016Z mov.b32 %r528, %r36; 2026-02-21T09:01:41.9515164Z @%p52 bra $L__BB0_4; 2026-02-21T09:01:41.9515307Z bra.uni $L__BB0_7; 2026-02-21T09:01:41.9515492Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:01:41.9515710Z add.s32 %r290, %r525, 1; 2026-02-21T09:01:41.9515871Z setp.gt.s32 %p39, %r290, 3; 2026-02-21T09:01:41.9516033Z selp.b32 %r525, 0, %r290, %p39; 2026-02-21T09:01:41.9516313Z .loc 1 51 85 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:51:85 2026-02-21T09:01:41.9516620Z cp.async.wait_group 4; 2026-02-21T09:01:41.9516772Z bar.sync 0; 2026-02-21T09:01:41.9517023Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9517316Z shl.b32 %r291, %r527, 3; 2026-02-21T09:01:41.9517475Z add.s32 %r293, %r38, %r291; 2026-02-21T09:01:41.9517633Z add.s32 %r345, %r293, 163840; 2026-02-21T09:01:41.9517935Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9518236Z @%p28 bra $L__BB0_6; 2026-02-21T09:01:41.9518419Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:01:41.9518746Z .loc 1 52 87 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:52:87 2026-02-21T09:01:41.9519129Z shl.b32 %r302, %r525, 14; 2026-02-21T09:01:41.9519288Z add.s32 %r304, %r38, %r302; 2026-02-21T09:01:41.9519441Z add.s32 %r305, %r304, 65536; 2026-02-21T09:01:41.9519735Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9520043Z elect.sync %r306|%p41, -1; 2026-02-21T09:01:41.9520205Z bfe.u32 %r307, %r304, 4, 14; 2026-02-21T09:01:41.9520372Z cvt.u64.u32 %rd126, %r307; 2026-02-21T09:01:41.9520540Z or.b64 %rd117, %rd126, 4611686293372403712; 2026-02-21T09:01:41.9520731Z bfe.u32 %r308, 
%r305, 4, 14; 2026-02-21T09:01:41.9520883Z cvt.u64.u32 %rd127, %r308; 2026-02-21T09:01:41.9521056Z or.b64 %rd118, %rd127, 4611686293372403712; 2026-02-21T09:01:41.9521236Z mov.b32 %r295, 136314896; 2026-02-21T09:01:41.9521399Z mov.pred %p40, -1; 2026-02-21T09:01:41.9521539Z // begin inline asm 2026-02-21T09:01:41.9521757Z @%p41 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd117, %rd118, %r295, %p40; 2026-02-21T09:01:41.9522010Z // end inline asm 2026-02-21T09:01:41.9522138Z add.s32 %r309, %r304, 32; 2026-02-21T09:01:41.9522310Z bfe.u32 %r310, %r309, 4, 14; 2026-02-21T09:01:41.9522458Z cvt.u64.u32 %rd128, %r310; 2026-02-21T09:01:41.9522618Z or.b64 %rd119, %rd128, 4611686293372403712; 2026-02-21T09:01:41.9522788Z add.s32 %r311, %r304, 65568; 2026-02-21T09:01:41.9522939Z bfe.u32 %r312, %r311, 4, 14; 2026-02-21T09:01:41.9523093Z cvt.u64.u32 %rd129, %r312; 2026-02-21T09:01:41.9523244Z or.b64 %rd120, %rd129, 4611686293372403712; 2026-02-21T09:01:41.9523417Z // begin inline asm 2026-02-21T09:01:41.9523629Z @%p41 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd119, %rd120, %r295, %p40; 2026-02-21T09:01:41.9523884Z // end inline asm 2026-02-21T09:01:41.9524014Z add.s32 %r313, %r304, 64; 2026-02-21T09:01:41.9524164Z bfe.u32 %r314, %r313, 4, 14; 2026-02-21T09:01:41.9524311Z cvt.u64.u32 %rd130, %r314; 2026-02-21T09:01:41.9524474Z or.b64 %rd121, %rd130, 4611686293372403712; 2026-02-21T09:01:41.9524647Z add.s32 %r315, %r304, 65600; 2026-02-21T09:01:41.9524849Z bfe.u32 %r316, %r315, 4, 14; 2026-02-21T09:01:41.9525004Z cvt.u64.u32 %rd131, %r316; 2026-02-21T09:01:41.9525158Z or.b64 %rd122, %rd131, 4611686293372403712; 2026-02-21T09:01:41.9525335Z // begin inline asm 2026-02-21T09:01:41.9525542Z @%p41 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd121, %rd122, %r295, %p40; 2026-02-21T09:01:41.9525822Z // end inline asm 2026-02-21T09:01:41.9525953Z add.s32 %r317, %r304, 96; 2026-02-21T09:01:41.9526105Z bfe.u32 %r318, %r317, 4, 14; 2026-02-21T09:01:41.9526260Z 
cvt.u64.u32 %rd132, %r318; 2026-02-21T09:01:41.9526419Z or.b64 %rd123, %rd132, 4611686293372403712; 2026-02-21T09:01:41.9526594Z add.s32 %r319, %r304, 65632; 2026-02-21T09:01:41.9526739Z bfe.u32 %r320, %r319, 4, 14; 2026-02-21T09:01:41.9526891Z cvt.u64.u32 %rd133, %r320; 2026-02-21T09:01:41.9527044Z or.b64 %rd124, %rd133, 4611686293372403712; 2026-02-21T09:01:41.9527216Z // begin inline asm 2026-02-21T09:01:41.9527426Z @%p41 tcgen05.mma.cta_group::1.kind::f16 [ %r522 + 0 ], %rd123, %rd124, %r295, %p40; 2026-02-21T09:01:41.9527673Z // end inline asm 2026-02-21T09:01:41.9527816Z cvt.u64.u32 %rd125, %r345; 2026-02-21T09:01:41.9527959Z // begin inline asm 2026-02-21T09:01:41.9528165Z @%p41 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd125]; 2026-02-21T09:01:41.9528392Z // end inline asm 2026-02-21T09:01:41.9528528Z bra.uni $L__BB0_6; 2026-02-21T09:01:41.9528706Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:01:41.9529020Z .loc 1 0 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:0:52 2026-02-21T09:01:41.9529324Z setp.lt.u32 %p56, %r1, 64; 2026-02-21T09:01:41.9529482Z mov.b32 %r346, 1; 2026-02-21T09:01:41.9529726Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9529996Z // begin inline asm 2026-02-21T09:01:41.9530129Z 2026-02-21T09:01:41.9530237Z { 2026-02-21T09:01:41.9530360Z .reg .pred complete; 2026-02-21T09:01:41.9530499Z waitLoop: 2026-02-21T09:01:41.9530686Z mbarrier.try_wait.parity.shared.b64 complete, [%r345], %r346; 2026-02-21T09:01:41.9530942Z @!complete bra.uni waitLoop; 2026-02-21T09:01:41.9531095Z } 2026-02-21T09:01:41.9531157Z 2026-02-21T09:01:41.9531217Z // end inline asm 2026-02-21T09:01:41.9531449Z .loc 1 46 80 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:46:80 2026-02-21T09:01:41.9531733Z cp.async.wait_group 0; 2026-02-21T09:01:41.9531877Z bar.sync 0; 2026-02-21T09:01:41.9532008Z add.s32 %r347, %r38, 163840; 2026-02-21T09:01:41.9532152Z // begin inline asm 
2026-02-21T09:01:41.9532318Z @%p53 mbarrier.inval.shared::cta.b64 [%r347]; 2026-02-21T09:01:41.9532497Z // end inline asm 2026-02-21T09:01:41.9532632Z bar.sync 0; 2026-02-21T09:01:41.9532753Z // begin inline asm 2026-02-21T09:01:41.9532914Z @%p53 mbarrier.inval.shared::cta.b64 [%r124]; 2026-02-21T09:01:41.9533095Z // end inline asm 2026-02-21T09:01:41.9533348Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9533639Z // begin inline asm 2026-02-21T09:01:41.9534001Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364}, [%r416 + 0]; 2026-02-21T09:01:41.9534401Z // end inline asm 2026-02-21T09:01:41.9534534Z // begin inline asm 2026-02-21T09:01:41.9534920Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381}, [%r416 + 16]; 2026-02-21T09:01:41.9535295Z // end inline asm 2026-02-21T09:01:41.9535424Z // begin inline asm 2026-02-21T09:01:41.9535761Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398}, [%r416 + 32]; 2026-02-21T09:01:41.9536123Z // end inline asm 2026-02-21T09:01:41.9536259Z // begin inline asm 2026-02-21T09:01:41.9536598Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415}, [%r416 + 48]; 2026-02-21T09:01:41.9536963Z // end inline asm 2026-02-21T09:01:41.9537100Z // begin inline asm 2026-02-21T09:01:41.9537247Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:01:41.9537438Z // end inline asm 2026-02-21T09:01:41.9537570Z cvt.u64.u32 %rd143, %r349; 2026-02-21T09:01:41.9537728Z cvt.u64.u32 %rd144, %r350; 2026-02-21T09:01:41.9537875Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:01:41.9538036Z or.b64 %rd146, %rd143, %rd145; 
2026-02-21T09:01:41.9538306Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9538584Z mov.b64 {%r421, %r422}, %rd146; 2026-02-21T09:01:41.9538752Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:01:41.9539026Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9539304Z cvt.u64.u32 %rd147, %r351; 2026-02-21T09:01:41.9539454Z cvt.u64.u32 %rd148, %r352; 2026-02-21T09:01:41.9539613Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:01:41.9539766Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:01:41.9540029Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9540312Z mov.b64 {%r424, %r425}, %rd150; 2026-02-21T09:01:41.9540475Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:01:41.9540783Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9541065Z cvt.u64.u32 %rd151, %r353; 2026-02-21T09:01:41.9541219Z cvt.u64.u32 %rd152, %r354; 2026-02-21T09:01:41.9541362Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:01:41.9541519Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:01:41.9541776Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9542050Z mov.b64 {%r427, %r428}, %rd154; 2026-02-21T09:01:41.9542218Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:01:41.9542533Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9542813Z cvt.u64.u32 %rd155, %r355; 2026-02-21T09:01:41.9542958Z cvt.u64.u32 %rd156, %r356; 2026-02-21T09:01:41.9543110Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:01:41.9543257Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:01:41.9543521Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9543804Z mov.b64 {%r430, %r431}, %rd158; 2026-02-21T09:01:41.9543962Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:01:41.9544236Z .loc 1 53 52 // 
cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9544507Z cvt.u64.u32 %rd159, %r357; 2026-02-21T09:01:41.9544663Z cvt.u64.u32 %rd160, %r358; 2026-02-21T09:01:41.9544844Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:01:41.9545030Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:01:41.9545291Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9545569Z mov.b64 {%r433, %r434}, %rd162; 2026-02-21T09:01:41.9545735Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:01:41.9545997Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9546283Z cvt.u64.u32 %rd163, %r359; 2026-02-21T09:01:41.9546430Z cvt.u64.u32 %rd164, %r360; 2026-02-21T09:01:41.9546581Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:01:41.9546730Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:01:41.9546986Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9547259Z mov.b64 {%r436, %r437}, %rd166; 2026-02-21T09:01:41.9547417Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:01:41.9547686Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9547954Z cvt.u64.u32 %rd167, %r361; 2026-02-21T09:01:41.9548106Z cvt.u64.u32 %rd168, %r362; 2026-02-21T09:01:41.9548253Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:01:41.9548407Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:01:41.9548695Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9548980Z mov.b64 {%r439, %r440}, %rd170; 2026-02-21T09:01:41.9549151Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:01:41.9549413Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9549691Z cvt.u64.u32 %rd171, %r363; 2026-02-21T09:01:41.9549842Z cvt.u64.u32 %rd172, %r364; 2026-02-21T09:01:41.9550003Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:01:41.9550158Z or.b64 %rd174, %rd171, 
%rd173; 2026-02-21T09:01:41.9550419Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9550697Z mov.b64 {%r442, %r443}, %rd174; 2026-02-21T09:01:41.9550860Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:01:41.9551133Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9551406Z cvt.u64.u32 %rd175, %r366; 2026-02-21T09:01:41.9551565Z cvt.u64.u32 %rd176, %r367; 2026-02-21T09:01:41.9551715Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:01:41.9551873Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:01:41.9552160Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9552434Z mov.b64 {%r445, %r446}, %rd178; 2026-02-21T09:01:41.9552596Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:01:41.9552852Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9553133Z cvt.u64.u32 %rd179, %r368; 2026-02-21T09:01:41.9553278Z cvt.u64.u32 %rd180, %r369; 2026-02-21T09:01:41.9553431Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:01:41.9553607Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:01:41.9553862Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9554136Z mov.b64 {%r448, %r449}, %rd182; 2026-02-21T09:01:41.9554294Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:01:41.9554562Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9554867Z cvt.u64.u32 %rd183, %r370; 2026-02-21T09:01:41.9555020Z cvt.u64.u32 %rd184, %r371; 2026-02-21T09:01:41.9555162Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:01:41.9555317Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:01:41.9555571Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9555840Z mov.b64 {%r451, %r452}, %rd186; 2026-02-21T09:01:41.9556003Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:01:41.9556295Z .loc 1 53 52 
// cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9556579Z cvt.u64.u32 %rd187, %r372; 2026-02-21T09:01:41.9556724Z cvt.u64.u32 %rd188, %r373; 2026-02-21T09:01:41.9556876Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:01:41.9557038Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:01:41.9557320Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9557614Z mov.b64 {%r454, %r455}, %rd190; 2026-02-21T09:01:41.9557783Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:01:41.9558067Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9558362Z cvt.u64.u32 %rd191, %r374; 2026-02-21T09:01:41.9558523Z cvt.u64.u32 %rd192, %r375; 2026-02-21T09:01:41.9558676Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:01:41.9558843Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:01:41.9559121Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9559409Z mov.b64 {%r457, %r458}, %rd194; 2026-02-21T09:01:41.9559584Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:01:41.9559860Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9560197Z cvt.u64.u32 %rd195, %r376; 2026-02-21T09:01:41.9560349Z cvt.u64.u32 %rd196, %r377; 2026-02-21T09:01:41.9560510Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:01:41.9560667Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:01:41.9560944Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9561233Z mov.b64 {%r460, %r461}, %rd198; 2026-02-21T09:01:41.9561399Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:01:41.9561685Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9561973Z cvt.u64.u32 %rd199, %r378; 2026-02-21T09:01:41.9562139Z cvt.u64.u32 %rd200, %r379; 2026-02-21T09:01:41.9562291Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:01:41.9562453Z or.b64 %rd202, %rd199, 
%rd201; 2026-02-21T09:01:41.9562725Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9563013Z mov.b64 {%r463, %r464}, %rd202; 2026-02-21T09:01:41.9563183Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:01:41.9563484Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9563785Z cvt.u64.u32 %rd203, %r380; 2026-02-21T09:01:41.9563937Z cvt.u64.u32 %rd204, %r381; 2026-02-21T09:01:41.9564092Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:01:41.9564247Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:01:41.9564519Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9564861Z mov.b64 {%r466, %r467}, %rd206; 2026-02-21T09:01:41.9565054Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T09:01:41.9565337Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9565630Z cvt.u64.u32 %rd207, %r383; 2026-02-21T09:01:41.9565792Z cvt.u64.u32 %rd208, %r384; 2026-02-21T09:01:41.9565942Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:01:41.9566103Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:01:41.9566374Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9566646Z mov.b64 {%r469, %r470}, %rd210; 2026-02-21T09:01:41.9566812Z cvt.rn.f16x2.f32 %r471, %r470, %r469; 2026-02-21T09:01:41.9567080Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9567355Z cvt.u64.u32 %rd211, %r385; 2026-02-21T09:01:41.9567502Z cvt.u64.u32 %rd212, %r386; 2026-02-21T09:01:41.9567679Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:01:41.9567830Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:01:41.9568092Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9568377Z mov.b64 {%r472, %r473}, %rd214; 2026-02-21T09:01:41.9568534Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T09:01:41.9568805Z .loc 1 53 52 
// cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9569071Z cvt.u64.u32 %rd215, %r387; 2026-02-21T09:01:41.9569222Z cvt.u64.u32 %rd216, %r388; 2026-02-21T09:01:41.9569367Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:01:41.9569522Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:01:41.9569784Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9570068Z mov.b64 {%r475, %r476}, %rd218; 2026-02-21T09:01:41.9570236Z cvt.rn.f16x2.f32 %r477, %r476, %r475; 2026-02-21T09:01:41.9570501Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9570780Z cvt.u64.u32 %rd219, %r389; 2026-02-21T09:01:41.9570923Z cvt.u64.u32 %rd220, %r390; 2026-02-21T09:01:41.9571073Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:01:41.9571221Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:01:41.9571510Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9571779Z mov.b64 {%r478, %r479}, %rd222; 2026-02-21T09:01:41.9571945Z cvt.rn.f16x2.f32 %r480, %r479, %r478; 2026-02-21T09:01:41.9572209Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9572475Z cvt.u64.u32 %rd223, %r391; 2026-02-21T09:01:41.9572627Z cvt.u64.u32 %rd224, %r392; 2026-02-21T09:01:41.9572771Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:01:41.9572929Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:01:41.9573190Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9573460Z mov.b64 {%r481, %r482}, %rd226; 2026-02-21T09:01:41.9573625Z cvt.rn.f16x2.f32 %r483, %r482, %r481; 2026-02-21T09:01:41.9573892Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9574166Z cvt.u64.u32 %rd227, %r393; 2026-02-21T09:01:41.9574313Z cvt.u64.u32 %rd228, %r394; 2026-02-21T09:01:41.9574466Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:01:41.9574615Z or.b64 %rd230, %rd227, 
%rd229; 2026-02-21T09:01:41.9574943Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9575218Z mov.b64 {%r484, %r485}, %rd230; 2026-02-21T09:01:41.9575376Z cvt.rn.f16x2.f32 %r486, %r485, %r484; 2026-02-21T09:01:41.9575646Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9575916Z cvt.u64.u32 %rd231, %r395; 2026-02-21T09:01:41.9576069Z cvt.u64.u32 %rd232, %r396; 2026-02-21T09:01:41.9576243Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:01:41.9576397Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:01:41.9576649Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9576920Z mov.b64 {%r487, %r488}, %rd234; 2026-02-21T09:01:41.9577082Z cvt.rn.f16x2.f32 %r489, %r488, %r487; 2026-02-21T09:01:41.9577343Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9577615Z cvt.u64.u32 %rd235, %r397; 2026-02-21T09:01:41.9577759Z cvt.u64.u32 %rd236, %r398; 2026-02-21T09:01:41.9577910Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:01:41.9578059Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:01:41.9578311Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9578582Z mov.b64 {%r490, %r491}, %rd238; 2026-02-21T09:01:41.9578762Z cvt.rn.f16x2.f32 %r492, %r491, %r490; 2026-02-21T09:01:41.9579034Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9579303Z cvt.u64.u32 %rd239, %r400; 2026-02-21T09:01:41.9579456Z cvt.u64.u32 %rd240, %r401; 2026-02-21T09:01:41.9579602Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:01:41.9579757Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:01:41.9580014Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9580285Z mov.b64 {%r493, %r494}, %rd242; 2026-02-21T09:01:41.9580449Z cvt.rn.f16x2.f32 %r495, %r494, %r493; 2026-02-21T09:01:41.9580707Z .loc 1 53 52 
// cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9580980Z cvt.u64.u32 %rd243, %r402; 2026-02-21T09:01:41.9581126Z cvt.u64.u32 %rd244, %r403; 2026-02-21T09:01:41.9581277Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:01:41.9581425Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:01:41.9581680Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9581955Z mov.b64 {%r496, %r497}, %rd246; 2026-02-21T09:01:41.9582111Z cvt.rn.f16x2.f32 %r498, %r497, %r496; 2026-02-21T09:01:41.9582406Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9582674Z cvt.u64.u32 %rd247, %r404; 2026-02-21T09:01:41.9582824Z cvt.u64.u32 %rd248, %r405; 2026-02-21T09:01:41.9582969Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:01:41.9583122Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:01:41.9583376Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9583644Z mov.b64 {%r499, %r500}, %rd250; 2026-02-21T09:01:41.9583807Z cvt.rn.f16x2.f32 %r501, %r500, %r499; 2026-02-21T09:01:41.9584067Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9584342Z cvt.u64.u32 %rd251, %r406; 2026-02-21T09:01:41.9584486Z cvt.u64.u32 %rd252, %r407; 2026-02-21T09:01:41.9584640Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:01:41.9584825Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:01:41.9585082Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9585360Z mov.b64 {%r502, %r503}, %rd254; 2026-02-21T09:01:41.9585517Z cvt.rn.f16x2.f32 %r504, %r503, %r502; 2026-02-21T09:01:41.9585833Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9586105Z cvt.u64.u32 %rd255, %r408; 2026-02-21T09:01:41.9586259Z cvt.u64.u32 %rd256, %r409; 2026-02-21T09:01:41.9586402Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:01:41.9586555Z or.b64 %rd258, %rd255, 
%rd257; 2026-02-21T09:01:41.9586813Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9587084Z mov.b64 {%r505, %r506}, %rd258; 2026-02-21T09:01:41.9587284Z cvt.rn.f16x2.f32 %r507, %r506, %r505; 2026-02-21T09:01:41.9587548Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9587839Z cvt.u64.u32 %rd259, %r410; 2026-02-21T09:01:41.9587993Z cvt.u64.u32 %rd260, %r411; 2026-02-21T09:01:41.9588147Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:01:41.9588299Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:01:41.9588561Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9588844Z mov.b64 {%r508, %r509}, %rd262; 2026-02-21T09:01:41.9589006Z cvt.rn.f16x2.f32 %r510, %r509, %r508; 2026-02-21T09:01:41.9589283Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9589559Z cvt.u64.u32 %rd263, %r412; 2026-02-21T09:01:41.9589716Z cvt.u64.u32 %rd264, %r413; 2026-02-21T09:01:41.9589894Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:01:41.9590055Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:01:41.9590312Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9590579Z mov.b64 {%r511, %r512}, %rd266; 2026-02-21T09:01:41.9590744Z cvt.rn.f16x2.f32 %r513, %r512, %r511; 2026-02-21T09:01:41.9591004Z .loc 1 53 52 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:53:52 2026-02-21T09:01:41.9591282Z cvt.u64.u32 %rd267, %r414; 2026-02-21T09:01:41.9591426Z cvt.u64.u32 %rd268, %r415; 2026-02-21T09:01:41.9591577Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:01:41.9591723Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:01:41.9591976Z .loc 1 55 27 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:55:27 2026-02-21T09:01:41.9592252Z mov.b64 {%r514, %r515}, %rd270; 2026-02-21T09:01:41.9592407Z cvt.rn.f16x2.f32 %r516, %r515, %r514; 2026-02-21T09:01:41.9592675Z .loc 1 56 45 
// cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:56:45 2026-02-21T09:01:41.9592958Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:01:41.9593132Z bar.sync 0; 2026-02-21T09:01:41.9593291Z st.shared.v4.b32 [%r15], {%r423, %r426, %r429, %r432}; 2026-02-21T09:01:41.9593549Z st.shared.v4.b32 [%r16], {%r435, %r438, %r441, %r444}; 2026-02-21T09:01:41.9593769Z st.shared.v4.b32 [%r17], {%r447, %r450, %r453, %r456}; 2026-02-21T09:01:41.9593982Z st.shared.v4.b32 [%r18], {%r459, %r462, %r465, %r468}; 2026-02-21T09:01:41.9594200Z st.shared.v4.b32 [%r19], {%r471, %r474, %r477, %r480}; 2026-02-21T09:01:41.9594407Z st.shared.v4.b32 [%r20], {%r483, %r486, %r489, %r492}; 2026-02-21T09:01:41.9594617Z st.shared.v4.b32 [%r21], {%r495, %r498, %r501, %r504}; 2026-02-21T09:01:41.9594863Z st.shared.v4.b32 [%r22], {%r507, %r510, %r513, %r516}; 2026-02-21T09:01:41.9595058Z // begin inline asm 2026-02-21T09:01:41.9595220Z fence.proxy.async.shared::cta; 2026-02-21T09:01:41.9595381Z // end inline asm 2026-02-21T09:01:41.9595518Z bar.sync 0; 2026-02-21T09:01:41.9595650Z elect.sync %r517|%p57, -1; 2026-02-21T09:01:41.9595812Z and.pred %p55, %p56, %p57; 2026-02-21T09:01:41.9595965Z and.b32 %r518, %r25, 1; 2026-02-21T09:01:41.9596116Z shl.b32 %r519, %r518, 14; 2026-02-21T09:01:41.9596268Z add.s32 %r520, %r38, %r519; 2026-02-21T09:01:41.9596422Z add.s32 %r419, %r520, 131072; 2026-02-21T09:01:41.9596570Z shl.b32 %r521, %r518, 6; 2026-02-21T09:01:41.9596723Z or.b32 %r417, %r521, %r23; 2026-02-21T09:01:41.9596901Z // begin inline asm 2026-02-21T09:01:41.9597166Z @%p55 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd142, {%r417, %r418}], [%r419]; 2026-02-21T09:01:41.9597466Z // end inline asm 2026-02-21T09:01:41.9597605Z cp.async.bulk.commit_group; 2026-02-21T09:01:41.9597785Z $L__BB0_8: // %._crit_edge 2026-02-21T09:01:41.9598072Z .loc 1 27 75 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:27:75 2026-02-21T09:01:41.9598399Z cp.async.bulk.wait_group.read 0; 
2026-02-21T09:01:41.9598561Z bar.sync 0; 2026-02-21T09:01:41.9598798Z .loc 1 27 4 // cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py:27:4 2026-02-21T09:01:41.9599072Z bar.sync 0; 2026-02-21T09:01:41.9599198Z // begin inline asm 2026-02-21T09:01:41.9599396Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r522, 128; 2026-02-21T09:01:41.9599609Z // end inline asm 2026-02-21T09:01:41.9599740Z ret; 2026-02-21T09:01:41.9599860Z $L__tmp0: 2026-02-21T09:01:41.9599986Z $L__func_end0: 2026-02-21T09:01:41.9600137Z // -- End function 2026-02-21T09:01:41.9600318Z } 2026-02-21T09:01:41.9600513Z .file 1 "/tmp/torchinductor_root/oh/cohe27foogan72pk22p7cg3sdncqct7xoozrsyijompc5xe3ee2x.py" 2026-02-21T09:01:41.9600579Z .section .debug_abbrev 2026-02-21T09:01:41.9600629Z { 2026-02-21T09:01:41.9600713Z .b8 1 // Abbreviation Code 2026-02-21T09:01:41.9600820Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:01:41.9600906Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:01:41.9600982Z .b8 37 // DW_AT_producer 2026-02-21T09:01:41.9601058Z .b8 8 // DW_FORM_string 2026-02-21T09:01:41.9601139Z .b8 19 // DW_AT_language 2026-02-21T09:01:41.9601215Z .b8 5 // DW_FORM_data2 2026-02-21T09:01:41.9601290Z .b8 3 // DW_AT_name 2026-02-21T09:01:41.9601370Z .b8 8 // DW_FORM_string 2026-02-21T09:01:41.9601448Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:01:41.9601522Z .b8 6 // DW_FORM_data4 2026-02-21T09:01:41.9601604Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:01:41.9601677Z .b8 8 // DW_FORM_string 2026-02-21T09:01:41.9601749Z .b8 0 // EOM(1) 2026-02-21T09:01:41.9601816Z .b8 0 // EOM(2) 2026-02-21T09:01:41.9601890Z .b8 0 // EOM(3) 2026-02-21T09:01:41.9601941Z } 2026-02-21T09:01:41.9602029Z .section .debug_info 2026-02-21T09:01:41.9602084Z { 2026-02-21T09:01:41.9602165Z .b32 104 // Length of Unit 2026-02-21T09:01:41.9602249Z .b8 2 // DWARF version number 2026-02-21T09:01:41.9602304Z .b8 0 2026-02-21T09:01:41.9602425Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:01:41.9602512Z .b8 8 // Address Size (in bytes) 2026-02-21T09:01:41.9602611Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:01:41.9602698Z .b8 116 // DW_AT_producer 2026-02-21T09:01:41.9602752Z .b8 114 2026-02-21T09:01:41.9602807Z .b8 105 2026-02-21T09:01:41.9602865Z .b8 116 2026-02-21T09:01:41.9602917Z .b8 111 2026-02-21T09:01:41.9602967Z .b8 110 2026-02-21T09:01:41.9603018Z .b8 0 2026-02-21T09:01:41.9603099Z .b8 2 // DW_AT_language 2026-02-21T09:01:41.9603152Z .b8 0 2026-02-21T09:01:41.9603226Z .b8 99 // DW_AT_name 2026-02-21T09:01:41.9603282Z .b8 111 2026-02-21T09:01:41.9603331Z .b8 104 2026-02-21T09:01:41.9603382Z .b8 101 2026-02-21T09:01:41.9603433Z .b8 50 2026-02-21T09:01:41.9603518Z .b8 55 2026-02-21T09:01:41.9603571Z .b8 102 2026-02-21T09:01:41.9603621Z .b8 111 2026-02-21T09:01:41.9603669Z .b8 111 2026-02-21T09:01:41.9603724Z .b8 103 2026-02-21T09:01:41.9603774Z .b8 97 2026-02-21T09:01:41.9603825Z .b8 110 2026-02-21T09:01:41.9603880Z .b8 55 2026-02-21T09:01:41.9603929Z .b8 50 2026-02-21T09:01:41.9603979Z .b8 112 2026-02-21T09:01:41.9604030Z .b8 107 2026-02-21T09:01:41.9604088Z .b8 50 2026-02-21T09:01:41.9604138Z .b8 50 2026-02-21T09:01:41.9604190Z .b8 112 2026-02-21T09:01:41.9604274Z .b8 55 2026-02-21T09:01:41.9604323Z .b8 99 2026-02-21T09:01:41.9604373Z .b8 103 2026-02-21T09:01:41.9604424Z .b8 51 2026-02-21T09:01:41.9604482Z .b8 115 2026-02-21T09:01:41.9604532Z .b8 100 2026-02-21T09:01:41.9604582Z .b8 110 2026-02-21T09:01:41.9604641Z .b8 99 2026-02-21T09:01:41.9604748Z .b8 113 2026-02-21T09:01:41.9604799Z .b8 99 2026-02-21T09:01:41.9604848Z .b8 116 2026-02-21T09:01:41.9604903Z .b8 55 2026-02-21T09:01:41.9604953Z .b8 120 2026-02-21T09:01:41.9605001Z .b8 111 2026-02-21T09:01:41.9605053Z .b8 111 2026-02-21T09:01:41.9605109Z .b8 122 2026-02-21T09:01:41.9605160Z .b8 114 2026-02-21T09:01:41.9605208Z .b8 115 2026-02-21T09:01:41.9605262Z .b8 121 2026-02-21T09:01:41.9605312Z .b8 105 2026-02-21T09:01:41.9605361Z .b8 106 
2026-02-21T09:01:41.9605411Z .b8 111 2026-02-21T09:01:41.9605468Z .b8 109 2026-02-21T09:01:41.9605516Z .b8 112 2026-02-21T09:01:41.9605566Z .b8 99 2026-02-21T09:01:41.9605622Z .b8 53 2026-02-21T09:01:41.9605671Z .b8 120 2026-02-21T09:01:41.9605747Z .b8 101 2026-02-21T09:01:41.9605802Z .b8 51 2026-02-21T09:01:41.9605862Z .b8 101 2026-02-21T09:01:41.9605915Z .b8 101 2026-02-21T09:01:41.9605967Z .b8 50 2026-02-21T09:01:41.9606018Z .b8 120 2026-02-21T09:01:41.9606076Z .b8 46 2026-02-21T09:01:41.9606129Z .b8 112 2026-02-21T09:01:41.9606183Z .b8 121 2026-02-21T09:01:41.9606240Z .b8 0 2026-02-21T09:01:41.9606334Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:01:41.9606413Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:01:41.9606466Z .b8 116 2026-02-21T09:01:41.9606527Z .b8 109 2026-02-21T09:01:41.9606579Z .b8 112 2026-02-21T09:01:41.9606633Z .b8 47 2026-02-21T09:01:41.9606694Z .b8 116 2026-02-21T09:01:41.9606748Z .b8 111 2026-02-21T09:01:41.9606801Z .b8 114 2026-02-21T09:01:41.9606853Z .b8 99 2026-02-21T09:01:41.9606912Z .b8 104 2026-02-21T09:01:41.9606964Z .b8 105 2026-02-21T09:01:41.9607016Z .b8 110 2026-02-21T09:01:41.9607074Z .b8 100 2026-02-21T09:01:41.9607126Z .b8 117 2026-02-21T09:01:41.9607180Z .b8 99 2026-02-21T09:01:41.9607235Z .b8 116 2026-02-21T09:01:41.9607295Z .b8 111 2026-02-21T09:01:41.9607348Z .b8 114 2026-02-21T09:01:41.9607401Z .b8 95 2026-02-21T09:01:41.9607460Z .b8 114 2026-02-21T09:01:41.9607513Z .b8 111 2026-02-21T09:01:41.9607564Z .b8 111 2026-02-21T09:01:41.9607640Z .b8 116 2026-02-21T09:01:41.9607698Z .b8 47 2026-02-21T09:01:41.9607748Z .b8 111 2026-02-21T09:01:41.9607796Z .b8 104 2026-02-21T09:01:41.9607845Z .b8 0 2026-02-21T09:01:41.9607901Z } 2026-02-21T09:01:41.9607973Z .section .debug_macinfo { } 2026-02-21T09:01:41.9607979Z 2026-02-21T09:01:41.9608056Z ================================================================ 2026-02-21T09:01:41.9608165Z please share the reproducer above with Triton project. 
2026-02-21T09:01:43.3543694Z 2026-02-21T09:01:43.3544557Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 60/60 17.6 configs/s 2026-02-21T09:01:44.6976932Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 755.3 2026-02-21T09:01:44.6977513Z configs/s 2026-02-21T09:01:44.8097566Z [115s] Generation 6 complete: 2026-02-21T09:01:44.8097895Z error=7 2026-02-21T09:01:44.8098086Z ok=56 2026-02-21T09:01:44.8098275Z min=0.0164 2026-02-21T09:01:44.8098486Z mid=0.0266 2026-02-21T09:01:44.8098666Z max=4.4810 2026-02-21T09:01:44.8098865Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:01:44.8099189Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:01:44.8099495Z 'l2_groupings': [8], 2026-02-21T09:01:44.8100017Z 'load_eviction_policies': ['', ''], 2026-02-21T09:01:44.8100297Z 'loop_orders': [[1, 0]], 2026-02-21T09:01:44.8100539Z 'num_stages': 4, 2026-02-21T09:01:44.8100744Z 'num_warps': 4, 2026-02-21T09:01:44.8100961Z 'pid_type': 'flat', 2026-02-21T09:01:44.8101198Z 'range_flattens': [None, False], 2026-02-21T09:01:44.8101471Z 'range_multi_buffers': [None, None], 2026-02-21T09:01:44.8101761Z 'range_num_stages': [0, 0], 2026-02-21T09:01:44.8102018Z 'range_unroll_factors': [0, 0], 2026-02-21T09:01:44.8102425Z 'range_warp_specializes': [None, False]} 2026-02-21T09:01:44.8116793Z [115s] Fitting surrogate: 589 points, 589 targets 2026-02-21T09:01:45.8323709Z [116s] Generation 7 starting: 66 neighbors, 4 active search path(s) 2026-02-21T09:01:59.0153678Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67/67 2.4 configs/s 2026-02-21T09:02:02.0654404Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 67/67 22.4 configs/s 2026-02-21T09:02:03.7221648Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 613.1 2026-02-21T09:02:03.7222054Z configs/s 2026-02-21T09:02:03.8571240Z [134s] Generation 7 complete: 2026-02-21T09:02:03.8571595Z error=21 2026-02-21T09:02:03.8571742Z ok=49 2026-02-21T09:02:03.8571879Z 
min=0.0164 2026-02-21T09:02:03.8572049Z mid=0.0225 2026-02-21T09:02:03.8572188Z max=3.7990 2026-02-21T09:02:03.8572658Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:02:03.8572905Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:02:03.8573118Z 'l2_groupings': [8], 2026-02-21T09:02:03.8573287Z 'load_eviction_policies': ['', ''], 2026-02-21T09:02:03.8573474Z 'loop_orders': [[1, 0]], 2026-02-21T09:02:03.8573626Z 'num_stages': 4, 2026-02-21T09:02:03.8573782Z 'num_warps': 4, 2026-02-21T09:02:03.8573917Z 'pid_type': 'flat', 2026-02-21T09:02:03.8574075Z 'range_flattens': [None, False], 2026-02-21T09:02:03.8574250Z 'range_multi_buffers': [None, None], 2026-02-21T09:02:03.8574447Z 'range_num_stages': [0, 0], 2026-02-21T09:02:03.8574612Z 'range_unroll_factors': [0, 0], 2026-02-21T09:02:03.8575161Z 'range_warp_specializes': [None, False]} 2026-02-21T09:02:03.8594614Z [134s] Fitting surrogate: 659 points, 659 targets 2026-02-21T09:02:04.6911607Z [135s] Generation 8 starting: 48 neighbors, 3 active search path(s) 2026-02-21T09:02:15.3983540Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49/49 2.4 configs/s 2026-02-21T09:02:17.3132044Z 2026-02-21T09:02:17.3133893Z 2026-02-21T09:02:17.3134230Z ================================================================ 2026-02-21T09:02:17.3134514Z Internal Triton PTX codegen error 2026-02-21T09:02:17.3135157Z `ptxas` stderr: 2026-02-21T09:02:17.3135609Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 247 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:02:17.3136360Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:02:17.3136506Z 2026-02-21T09:02:17.3136914Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmphh7dslzw.ptx -o /tmp/tmphh7dslzw.ptx.o 2026-02-21T09:02:17.3137370Z 2026-02-21T09:02:17.3137374Z 2026-02-21T09:02:17.3137430Z // 2026-02-21T09:02:17.3137570Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:02:17.3137747Z // 2026-02-21T09:02:17.3137813Z 2026-02-21T09:02:17.3137877Z .version 8.7 2026-02-21T09:02:17.3138012Z .target sm_100a 2026-02-21T09:02:17.3138151Z .address_size 64 2026-02-21T09:02:17.3138234Z 2026-02-21T09:02:17.3138358Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:02:17.3138615Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:02:17.3138830Z // @_helion_matmul 2026-02-21T09:02:17.3139042Z .visible .entry _helion_matmul( 2026-02-21T09:02:17.3139266Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:02:17.3139593Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:02:17.3139847Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:02:17.3140084Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:02:17.3140330Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:02:17.3140523Z ) 2026-02-21T09:02:17.3140646Z .reqntid 128 2026-02-21T09:02:17.3141015Z .maxnreg 32 2026-02-21T09:02:17.3141138Z { 2026-02-21T09:02:17.3141272Z .reg .pred %p<129>; 2026-02-21T09:02:17.3141514Z .reg .b16 %rs<4>; 2026-02-21T09:02:17.3141667Z .reg .b32 %r<1248>; 2026-02-21T09:02:17.3141808Z .reg .b64 %rd<653>; 2026-02-21T09:02:17.3141959Z $L__func_begin0: 2026-02-21T09:02:17.3142043Z 2026-02-21T09:02:17.3142100Z // %bb.0: 2026-02-21T09:02:17.3142354Z .loc 1 19 0 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:19 2026-02-21T09:02:17.3142652Z mov.u32 %r1, 
%tid.x; 2026-02-21T09:02:17.3142822Z ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T09:02:17.3143024Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:02:17.3143199Z ld.param.b64 %rd30, [_helion_matmul_param_2]; 2026-02-21T09:02:17.3143398Z mov.b32 %r40, global_smem; 2026-02-21T09:02:17.3143551Z // begin inline asm 2026-02-21T09:02:17.3143798Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r40], 256; 2026-02-21T09:02:17.3144038Z // end inline asm 2026-02-21T09:02:17.3144267Z ld.param.b64 %rd47, [_helion_matmul_param_3]; 2026-02-21T09:02:17.3144457Z bar.sync 0; 2026-02-21T09:02:17.3144597Z ld.shared.b32 %r1240, [global_smem]; 2026-02-21T09:02:17.3144823Z bar.sync 0; 2026-02-21T09:02:17.3144947Z // begin inline asm 2026-02-21T09:02:17.3145154Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:02:17.3145373Z // end inline asm 2026-02-21T09:02:17.3145630Z .loc 1 21 67 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:21:67 2026-02-21T09:02:17.3145931Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:02:17.3146093Z mov.u32 %r57, %ctaid.y; 2026-02-21T09:02:17.3146246Z mov.u32 %r58, %ctaid.z; 2026-02-21T09:02:17.3146393Z mov.u32 %r59, %nctaid.x; 2026-02-21T09:02:17.3146547Z mov.u32 %r60, %nctaid.y; 2026-02-21T09:02:17.3146699Z mad.lo.s32 %r61, %r58, %r60, %r57; 2026-02-21T09:02:17.3146874Z mad.lo.s32 %r62, %r61, %r59, %r3; 2026-02-21T09:02:17.3147033Z shl.b32 %r63, %r62, 8; 2026-02-21T09:02:17.3147184Z cvt.s64.s32 %rd48, %r63; 2026-02-21T09:02:17.3147336Z add.s64 %rd26, %rd47, %rd48; 2026-02-21T09:02:17.3147498Z shl.b32 %r64, %r1, 2; 2026-02-21T09:02:17.3147641Z add.s32 %r41, %r40, %r64; 2026-02-21T09:02:17.3147792Z mov.b32 %r50, 0; 2026-02-21T09:02:17.3147931Z // begin inline asm 2026-02-21T09:02:17.3148127Z @%p1 st.shared.b32 [ %r41 + 0 ], %r50; 2026-02-21T09:02:17.3148304Z // end inline asm 2026-02-21T09:02:17.3148440Z bar.warp.sync -1; 2026-02-21T09:02:17.3148593Z setp.eq.b32 %p119, %r1, 0; 2026-02-21T09:02:17.3148747Z 
cvt.u64.u32 %rd11, %r40; 2026-02-21T09:02:17.3148901Z // begin inline asm 2026-02-21T09:02:17.3149152Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:02:17.3149441Z // end inline asm 2026-02-21T09:02:17.3149578Z // begin inline asm 2026-02-21T09:02:17.3149824Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:02:17.3150101Z // end inline asm 2026-02-21T09:02:17.3150243Z mov.b32 %r43, 64; 2026-02-21T09:02:17.3150394Z // begin inline asm 2026-02-21T09:02:17.3150633Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r43; 2026-02-21T09:02:17.3150914Z // end inline asm 2026-02-21T09:02:17.3151045Z mov.b32 %r44, 256; 2026-02-21T09:02:17.3151192Z // begin inline asm 2026-02-21T09:02:17.3151434Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r44; 2026-02-21T09:02:17.3151699Z // end inline asm 2026-02-21T09:02:17.3151845Z mov.b32 %r45, 1024; 2026-02-21T09:02:17.3152021Z // begin inline asm 2026-02-21T09:02:17.3152278Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r45; 2026-02-21T09:02:17.3152562Z // end inline asm 2026-02-21T09:02:17.3152701Z // begin inline asm 2026-02-21T09:02:17.3152975Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r45; 2026-02-21T09:02:17.3153249Z // end inline asm 2026-02-21T09:02:17.3153392Z mov.b64 %rd19, 2048; 2026-02-21T09:02:17.3153537Z // begin inline asm 2026-02-21T09:02:17.3153832Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:02:17.3154132Z // end inline asm 2026-02-21T09:02:17.3154275Z mov.b32 %r47, 1; 2026-02-21T09:02:17.3154413Z // begin inline asm 2026-02-21T09:02:17.3154706Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r47; 2026-02-21T09:02:17.3155011Z // end inline asm 2026-02-21T09:02:17.3155144Z // 
begin inline asm 2026-02-21T09:02:17.3155411Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r47; 2026-02-21T09:02:17.3155706Z // end inline asm 2026-02-21T09:02:17.3155846Z // begin inline asm 2026-02-21T09:02:17.3156092Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:02:17.3156358Z // end inline asm 2026-02-21T09:02:17.3156499Z // begin inline asm 2026-02-21T09:02:17.3156786Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:02:17.3157091Z // end inline asm 2026-02-21T09:02:17.3157223Z // begin inline asm 2026-02-21T09:02:17.3157470Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x3; 2026-02-21T09:02:17.3157789Z // end inline asm 2026-02-21T09:02:17.3157924Z // begin inline asm 2026-02-21T09:02:17.3158150Z @%p119 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:02:17.3158401Z // end inline asm 2026-02-21T09:02:17.3158536Z // begin inline asm 2026-02-21T09:02:17.3158867Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:02:17.3159244Z // end inline asm 2026-02-21T09:02:17.3159372Z // begin inline asm 2026-02-21T09:02:17.3159581Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T09:02:17.3159830Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:02:17.3160019Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:02:17.3160198Z // end inline asm 2026-02-21T09:02:17.3160326Z bar.sync 0; 2026-02-21T09:02:17.3160467Z cvta.global.u64 %rd93, %rd26; 2026-02-21T09:02:17.3160731Z .loc 1 23 71 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:23:71 2026-02-21T09:02:17.3161059Z add.s64 %rd44, %rd26, 128; 2026-02-21T09:02:17.3161207Z bar.sync 0; 2026-02-21T09:02:17.3161340Z // begin inline asm 2026-02-21T09:02:17.3161490Z @%p1 
st.shared.b32 [ %r41 + 0 ], %r50; 2026-02-21T09:02:17.3161655Z // end inline asm 2026-02-21T09:02:17.3161797Z bar.warp.sync -1; 2026-02-21T09:02:17.3161932Z // begin inline asm 2026-02-21T09:02:17.3162184Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd30; 2026-02-21T09:02:17.3162467Z // end inline asm 2026-02-21T09:02:17.3162605Z // begin inline asm 2026-02-21T09:02:17.3162823Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:02:17.3163089Z // end inline asm 2026-02-21T09:02:17.3163231Z // begin inline asm 2026-02-21T09:02:17.3163456Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r43; 2026-02-21T09:02:17.3163731Z // end inline asm 2026-02-21T09:02:17.3163861Z // begin inline asm 2026-02-21T09:02:17.3164093Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r43; 2026-02-21T09:02:17.3164347Z // end inline asm 2026-02-21T09:02:17.3164482Z // begin inline asm 2026-02-21T09:02:17.3164789Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r45; 2026-02-21T09:02:17.3165057Z // end inline asm 2026-02-21T09:02:17.3165194Z mov.b32 %r54, 4096; 2026-02-21T09:02:17.3165329Z // begin inline asm 2026-02-21T09:02:17.3165566Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r54; 2026-02-21T09:02:17.3165827Z // end inline asm 2026-02-21T09:02:17.3165965Z // begin inline asm 2026-02-21T09:02:17.3166213Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:02:17.3166530Z // end inline asm 2026-02-21T09:02:17.3166666Z // begin inline asm 2026-02-21T09:02:17.3166911Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r47; 2026-02-21T09:02:17.3167196Z // end inline asm 2026-02-21T09:02:17.3167325Z // begin inline asm 2026-02-21T09:02:17.3167577Z @%p119 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r47; 2026-02-21T09:02:17.3167847Z // end inline asm 2026-02-21T09:02:17.3167982Z // begin inline asm 2026-02-21T09:02:17.3168213Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:02:17.3168469Z // end inline asm 2026-02-21T09:02:17.3168605Z // begin inline asm 2026-02-21T09:02:17.3168850Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:02:17.3169130Z // end inline asm 2026-02-21T09:02:17.3169283Z // begin inline asm 2026-02-21T09:02:17.3169518Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x3; 2026-02-21T09:02:17.3169785Z // end inline asm 2026-02-21T09:02:17.3169912Z // begin inline asm 2026-02-21T09:02:17.3170139Z @%p119 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:02:17.3170389Z // end inline asm 2026-02-21T09:02:17.3170524Z // begin inline asm 2026-02-21T09:02:17.3170855Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd44 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:02:17.3171221Z // end inline asm 2026-02-21T09:02:17.3171354Z // begin inline asm 2026-02-21T09:02:17.3171552Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd44 + 0 ], 0x80; 2026-02-21T09:02:17.3171795Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:02:17.3171975Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:02:17.3172150Z // end inline asm 2026-02-21T09:02:17.3172277Z bar.sync 0; 2026-02-21T09:02:17.3172420Z cvta.global.u64 %rd137, %rd44; 2026-02-21T09:02:17.3172684Z .loc 1 32 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:32:52 2026-02-21T09:02:17.3172984Z setp.gt.u32 %p39, %r3, 127; 2026-02-21T09:02:17.3173184Z @%p39 bra $L__BB0_8; 2026-02-21T09:02:17.3173341Z // %bb.1: // %.lr.ph 2026-02-21T09:02:17.3173638Z .loc 1 0 52 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:0:52 2026-02-21T09:02:17.3173948Z ld.param.b64 %rd10, [_helion_matmul_param_0]; 2026-02-21T09:02:17.3174248Z .loc 1 44 45 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:44:45 2026-02-21T09:02:17.3174535Z and.b32 %r370, %r1, 7; 2026-02-21T09:02:17.3174720Z shl.b32 %r371, %r370, 3; 2026-02-21T09:02:17.3174878Z bfe.u32 %r4, %r1, 3, 4; 2026-02-21T09:02:17.3175021Z shr.u32 %r372, %r1, 5; 2026-02-21T09:02:17.3175170Z shl.b32 %r373, %r1, 4; 2026-02-21T09:02:17.3175313Z and.b32 %r374, %r373, 2032; 2026-02-21T09:02:17.3175469Z shl.b32 %r375, %r1, 1; 2026-02-21T09:02:17.3175612Z and.b32 %r376, %r375, 112; 2026-02-21T09:02:17.3175772Z xor.b32 %r5, %r374, %r376; 2026-02-21T09:02:17.3175922Z setp.lt.u32 %p65, %r1, 64; 2026-02-21T09:02:17.3176081Z or.b32 %r6, %r371, 128; 2026-02-21T09:02:17.3176227Z add.s32 %r378, %r40, %r5; 2026-02-21T09:02:17.3176390Z add.s32 %r455, %r378, 278528; 2026-02-21T09:02:17.3176553Z add.s32 %r457, %r378, 280576; 2026-02-21T09:02:17.3176733Z add.s32 %r459, %r378, 282624; 2026-02-21T09:02:17.3176892Z add.s32 %r461, %r378, 284672; 2026-02-21T09:02:17.3177037Z shl.b32 %r379, %r1, 7; 2026-02-21T09:02:17.3177189Z and.b32 %r380, %r379, 1920; 2026-02-21T09:02:17.3177336Z shl.b32 %r381, %r1, 6; 2026-02-21T09:02:17.3177483Z and.b32 %r382, %r381, 6144; 2026-02-21T09:02:17.3177630Z shl.b32 %r383, %r370, 4; 2026-02-21T09:02:17.3177781Z shl.b32 %r384, %r1, 10; 2026-02-21T09:02:17.3177927Z and.b32 %r385, %r384, 16384; 2026-02-21T09:02:17.3178085Z or.b32 %r386, %r382, %r383; 2026-02-21T09:02:17.3178291Z or.b32 %r387, %r386, %r385; 2026-02-21T09:02:17.3178435Z or.b32 %r388, %r387, %r380; 2026-02-21T09:02:17.3178589Z add.s32 %r389, %r40, 196608; 2026-02-21T09:02:17.3178737Z xor.b32 %r390, %r388, 16; 2026-02-21T09:02:17.3178890Z xor.b32 %r391, %r388, 32; 2026-02-21T09:02:17.3179032Z xor.b32 %r392, %r388, 48; 2026-02-21T09:02:17.3179178Z xor.b32 %r393, %r388, 64; 2026-02-21T09:02:17.3179316Z xor.b32 
%r394, %r388, 80; 2026-02-21T09:02:17.3179463Z xor.b32 %r395, %r388, 96; 2026-02-21T09:02:17.3179604Z xor.b32 %r396, %r388, 112; 2026-02-21T09:02:17.3179759Z add.s32 %r355, %r378, 270336; 2026-02-21T09:02:17.3179914Z add.s32 %r361, %r378, 276480; 2026-02-21T09:02:17.3180062Z add.s32 %r359, %r378, 274432; 2026-02-21T09:02:17.3180218Z add.s32 %r357, %r378, 272384; 2026-02-21T09:02:17.3180365Z add.s32 %r342, %r378, 262144; 2026-02-21T09:02:17.3180518Z add.s32 %r348, %r378, 268288; 2026-02-21T09:02:17.3180689Z add.s32 %r346, %r378, 266240; 2026-02-21T09:02:17.3180846Z add.s32 %r344, %r378, 264192; 2026-02-21T09:02:17.3181103Z .loc 1 39 33 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:39:33 2026-02-21T09:02:17.3181386Z shr.u32 %r397, %r3, 1; 2026-02-21T09:02:17.3181536Z and.b32 %r398, %r397, 62; 2026-02-21T09:02:17.3181788Z .loc 1 41 64 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:41:64 2026-02-21T09:02:17.3182073Z and.b32 %r19, %r3, 1; 2026-02-21T09:02:17.3182328Z .loc 1 41 30 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:41:30 2026-02-21T09:02:17.3182616Z or.b32 %r399, %r398, %r19; 2026-02-21T09:02:17.3182870Z .loc 1 43 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:43:27 2026-02-21T09:02:17.3183162Z shl.b32 %r842, %r399, 6; 2026-02-21T09:02:17.3183421Z .loc 1 44 32 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:44:32 2026-02-21T09:02:17.3183697Z or.b32 %r400, %r842, %r4; 2026-02-21T09:02:17.3183952Z .loc 1 45 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:45:27 2026-02-21T09:02:17.3184235Z shl.b32 %r401, %r3, 8; 2026-02-21T09:02:17.3184511Z .loc 1 55 53 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:53 2026-02-21T09:02:17.3184812Z shl.b32 %r402, %r400, 10; 2026-02-21T09:02:17.3185078Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3185389Z shfl.sync.idx.b32 %r22, %r372, 0, 31, -1; 2026-02-21T09:02:17.3185567Z shl.b32 
%r403, %r22, 21; 2026-02-21T09:02:17.3185722Z and.b32 %r404, %r403, 6291456; 2026-02-21T09:02:17.3185875Z add.s32 %r840, %r404, %r1240; 2026-02-21T09:02:17.3186036Z mov.pred %p71, -1; 2026-02-21T09:02:17.3186175Z // begin inline asm 2026-02-21T09:02:17.3186540Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 0], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3186926Z // end inline asm 2026-02-21T09:02:17.3187059Z // begin inline asm 2026-02-21T09:02:17.3187400Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 16], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3187767Z // end inline asm 2026-02-21T09:02:17.3187903Z // begin inline asm 2026-02-21T09:02:17.3188251Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 32], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3188619Z // end inline asm 2026-02-21T09:02:17.3188746Z // begin inline asm 2026-02-21T09:02:17.3189070Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 48], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3189431Z // end inline asm 2026-02-21T09:02:17.3189559Z // begin inline asm 2026-02-21T09:02:17.3189918Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 64], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3190272Z // end inline asm 2026-02-21T09:02:17.3190407Z // begin inline asm 2026-02-21T09:02:17.3190734Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 80], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3191088Z // end inline asm 2026-02-21T09:02:17.3191222Z // begin inline asm 2026-02-21T09:02:17.3191540Z 
@%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 96], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3191897Z // end inline asm 2026-02-21T09:02:17.3192025Z // begin inline asm 2026-02-21T09:02:17.3192387Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 112], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3192771Z // end inline asm 2026-02-21T09:02:17.3192901Z // begin inline asm 2026-02-21T09:02:17.3193280Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048576], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3193686Z // end inline asm 2026-02-21T09:02:17.3193829Z // begin inline asm 2026-02-21T09:02:17.3194185Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048592], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3194581Z // end inline asm 2026-02-21T09:02:17.3194765Z // begin inline asm 2026-02-21T09:02:17.3195129Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048608], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3195525Z // end inline asm 2026-02-21T09:02:17.3195661Z // begin inline asm 2026-02-21T09:02:17.3196025Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048624], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3196458Z // end inline asm 2026-02-21T09:02:17.3196606Z // begin inline asm 2026-02-21T09:02:17.3196971Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048640], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3197364Z // end inline asm 2026-02-21T09:02:17.3197514Z 
// begin inline asm 2026-02-21T09:02:17.3197860Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048656], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3198253Z // end inline asm 2026-02-21T09:02:17.3198396Z // begin inline asm 2026-02-21T09:02:17.3198746Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048672], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3199135Z // end inline asm 2026-02-21T09:02:17.3199268Z // begin inline asm 2026-02-21T09:02:17.3199622Z @%p71 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r840 + 1048688], 128, {%r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50, %r50}; 2026-02-21T09:02:17.3200010Z // end inline asm 2026-02-21T09:02:17.3200151Z // begin inline asm 2026-02-21T09:02:17.3200338Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:02:17.3200506Z // end inline asm 2026-02-21T09:02:17.3200647Z bar.sync 0; 2026-02-21T09:02:17.3200899Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3201203Z add.s32 %r1242, %r40, 286752; 2026-02-21T09:02:17.3201355Z // begin inline asm 2026-02-21T09:02:17.3201523Z @%p119 mbarrier.init.shared::cta.b64 [%r1242], 1; 2026-02-21T09:02:17.3201712Z // end inline asm 2026-02-21T09:02:17.3201875Z bar.sync 0; 2026-02-21T09:02:17.3202011Z add.s32 %r338, %r40, 286760; 2026-02-21T09:02:17.3202160Z // begin inline asm 2026-02-21T09:02:17.3202328Z @%p119 mbarrier.init.shared::cta.b64 [%r338], 1; 2026-02-21T09:02:17.3202510Z // end inline asm 2026-02-21T09:02:17.3202652Z add.s32 %r339, %r40, 286720; 2026-02-21T09:02:17.3202799Z // begin inline asm 2026-02-21T09:02:17.3202961Z @%p119 mbarrier.init.shared::cta.b64 [%r339], 1; 2026-02-21T09:02:17.3203138Z // end inline asm 2026-02-21T09:02:17.3203271Z bar.sync 0; 2026-02-21T09:02:17.3203399Z add.s32 %r340, %r40, 286728; 
2026-02-21T09:02:17.3203553Z // begin inline asm 2026-02-21T09:02:17.3203715Z @%p119 mbarrier.init.shared::cta.b64 [%r340], 1; 2026-02-21T09:02:17.3203891Z // end inline asm 2026-02-21T09:02:17.3204023Z bar.sync 0; 2026-02-21T09:02:17.3204150Z add.s32 %r463, %r40, 286736; 2026-02-21T09:02:17.3204303Z // begin inline asm 2026-02-21T09:02:17.3204484Z @%p119 mbarrier.init.shared::cta.b64 [%r463], 1; 2026-02-21T09:02:17.3204700Z // end inline asm 2026-02-21T09:02:17.3204946Z .loc 1 55 60 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:60 2026-02-21T09:02:17.3205232Z or.b32 %r405, %r402, %r371; 2026-02-21T09:02:17.3205499Z .loc 1 55 32 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:32 2026-02-21T09:02:17.3205786Z mad.wide.u32 %rd49, %r405, 2, %rd10; 2026-02-21T09:02:17.3205967Z cvt.u64.u32 %rd3, %r402; 2026-02-21T09:02:17.3206119Z add.s64 %rd50, %rd49, 32768; 2026-02-21T09:02:17.3206279Z add.s64 %rd51, %rd49, 65536; 2026-02-21T09:02:17.3206430Z add.s64 %rd52, %rd49, 98304; 2026-02-21T09:02:17.3206587Z mov.b32 %r456, 16; 2026-02-21T09:02:17.3206826Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3207121Z // begin inline asm 2026-02-21T09:02:17.3207336Z cp.async.cg.shared.global [ %r342 + 0 ], [ %rd49 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3207567Z // end inline asm 2026-02-21T09:02:17.3207711Z // begin inline asm 2026-02-21T09:02:17.3207902Z cp.async.cg.shared.global [ %r344 + 0 ], [ %rd50 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3208123Z // end inline asm 2026-02-21T09:02:17.3208252Z // begin inline asm 2026-02-21T09:02:17.3208483Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd51 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3208698Z // end inline asm 2026-02-21T09:02:17.3208837Z // begin inline asm 2026-02-21T09:02:17.3209031Z cp.async.cg.shared.global [ %r348 + 0 ], [ %rd52 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3209244Z // end inline asm 2026-02-21T09:02:17.3209388Z cp.async.commit_group; 
2026-02-21T09:02:17.3209643Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3209922Z bar.sync 0; 2026-02-21T09:02:17.3210046Z // begin inline asm 2026-02-21T09:02:17.3210242Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r339], 65536; 2026-02-21T09:02:17.3210460Z // end inline asm 2026-02-21T09:02:17.3210702Z .loc 1 56 44 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:56:44 2026-02-21T09:02:17.3210981Z // begin inline asm 2026-02-21T09:02:17.3211129Z fence.proxy.async.shared::cta; 2026-02-21T09:02:17.3211298Z // end inline asm 2026-02-21T09:02:17.3211423Z bar.sync 0; 2026-02-21T09:02:17.3211562Z elect.sync %r406|%p66, -1; 2026-02-21T09:02:17.3211723Z and.pred %p62, %p65, %p66; 2026-02-21T09:02:17.3211882Z and.b32 %r407, %r22, 1; 2026-02-21T09:02:17.3212061Z shl.b32 %r24, %r407, 14; 2026-02-21T09:02:17.3212214Z shl.b32 %r408, %r407, 15; 2026-02-21T09:02:17.3212367Z add.s32 %r351, %r40, %r408; 2026-02-21T09:02:17.3212517Z and.b32 %r409, %r3, 2; 2026-02-21T09:02:17.3212667Z or.b32 %r410, %r407, %r409; 2026-02-21T09:02:17.3212812Z shl.b32 %r466, %r410, 8; 2026-02-21T09:02:17.3212961Z // begin inline asm 2026-02-21T09:02:17.3213278Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r351], [%rd93, {%r50, %r466}], [%r339]; 2026-02-21T09:02:17.3213672Z // end inline asm 2026-02-21T09:02:17.3213913Z .loc 1 55 32 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:32 2026-02-21T09:02:17.3214198Z add.s64 %rd54, %rd49, 128; 2026-02-21T09:02:17.3214359Z or.b32 %r411, %r405, 64; 2026-02-21T09:02:17.3214514Z mad.wide.u32 %rd59, %r411, 2, %rd10; 2026-02-21T09:02:17.3214738Z add.s64 %rd55, %rd59, 32768; 2026-02-21T09:02:17.3214895Z add.s64 %rd56, %rd59, 65536; 2026-02-21T09:02:17.3215051Z add.s64 %rd57, %rd59, 98304; 2026-02-21T09:02:17.3215308Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3215591Z // begin inline asm 
2026-02-21T09:02:17.3215795Z cp.async.cg.shared.global [ %r355 + 0 ], [ %rd54 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3216018Z // end inline asm 2026-02-21T09:02:17.3216158Z // begin inline asm 2026-02-21T09:02:17.3216380Z cp.async.cg.shared.global [ %r357 + 0 ], [ %rd55 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3216613Z // end inline asm 2026-02-21T09:02:17.3216753Z // begin inline asm 2026-02-21T09:02:17.3216961Z cp.async.cg.shared.global [ %r359 + 0 ], [ %rd56 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3217181Z // end inline asm 2026-02-21T09:02:17.3217322Z // begin inline asm 2026-02-21T09:02:17.3217513Z cp.async.cg.shared.global [ %r361 + 0 ], [ %rd57 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3217737Z // end inline asm 2026-02-21T09:02:17.3217886Z cp.async.commit_group; 2026-02-21T09:02:17.3218145Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3218435Z bar.sync 0; 2026-02-21T09:02:17.3218562Z // begin inline asm 2026-02-21T09:02:17.3218763Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r340], 65536; 2026-02-21T09:02:17.3218983Z // end inline asm 2026-02-21T09:02:17.3219238Z .loc 1 56 44 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:56:44 2026-02-21T09:02:17.3219524Z bar.sync 0; 2026-02-21T09:02:17.3219664Z elect.sync %r412|%p67, -1; 2026-02-21T09:02:17.3219835Z and.pred %p64, %p65, %p67; 2026-02-21T09:02:17.3219995Z add.s32 %r364, %r351, 65536; 2026-02-21T09:02:17.3220157Z // begin inline asm 2026-02-21T09:02:17.3220505Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r364], [%rd93, {%r43, %r466}], [%r340]; 2026-02-21T09:02:17.3220867Z // end inline asm 2026-02-21T09:02:17.3221097Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3221387Z cp.async.wait_group 1; 2026-02-21T09:02:17.3221536Z bar.sync 0; 2026-02-21T09:02:17.3221763Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 
2026-02-21T09:02:17.3222037Z // begin inline asm 2026-02-21T09:02:17.3222165Z 2026-02-21T09:02:17.3222281Z { 2026-02-21T09:02:17.3222398Z .reg .pred complete; 2026-02-21T09:02:17.3222546Z waitLoop: 2026-02-21T09:02:17.3222730Z mbarrier.try_wait.parity.shared.b64 complete, [%r339], %r50; 2026-02-21T09:02:17.3222961Z @!complete bra.uni waitLoop; 2026-02-21T09:02:17.3223114Z } 2026-02-21T09:02:17.3223176Z 2026-02-21T09:02:17.3223230Z // end inline asm 2026-02-21T09:02:17.3223468Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3223747Z setp.ne.b32 %p68, %r22, 0; 2026-02-21T09:02:17.3223905Z @%p68 bra $L__BB0_3; 2026-02-21T09:02:17.3224094Z // %bb.2: 2026-02-21T09:02:17.3224333Z .loc 1 0 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:0:52 2026-02-21T09:02:17.3224610Z add.s32 %r430, %r40, 32864; 2026-02-21T09:02:17.3224805Z bfe.u32 %r431, %r430, 4, 14; 2026-02-21T09:02:17.3224964Z cvt.u64.u32 %rd77, %r431; 2026-02-21T09:02:17.3225126Z or.b64 %rd75, %rd77, 4611686293573730304; 2026-02-21T09:02:17.3225310Z add.s32 %r432, %r40, 32832; 2026-02-21T09:02:17.3225460Z bfe.u32 %r433, %r432, 4, 14; 2026-02-21T09:02:17.3225653Z cvt.u64.u32 %rd78, %r433; 2026-02-21T09:02:17.3225810Z or.b64 %rd73, %rd78, 4611686293573730304; 2026-02-21T09:02:17.3225991Z add.s32 %r434, %r40, 32800; 2026-02-21T09:02:17.3226138Z bfe.u32 %r435, %r434, 4, 14; 2026-02-21T09:02:17.3226302Z cvt.u64.u32 %rd79, %r435; 2026-02-21T09:02:17.3226472Z or.b64 %rd71, %rd79, 4611686293573730304; 2026-02-21T09:02:17.3226644Z add.s32 %r436, %r40, 32768; 2026-02-21T09:02:17.3226803Z bfe.u32 %r437, %r436, 4, 14; 2026-02-21T09:02:17.3226951Z cvt.u64.u32 %rd80, %r437; 2026-02-21T09:02:17.3227113Z or.b64 %rd69, %rd80, 4611686293573730304; 2026-02-21T09:02:17.3227278Z add.s32 %r438, %r40, 96; 2026-02-21T09:02:17.3227431Z bfe.u32 %r439, %r438, 4, 14; 2026-02-21T09:02:17.3227577Z cvt.u64.u32 %rd81, %r439; 2026-02-21T09:02:17.3227736Z or.b64 %rd67, %rd81, 
4611686293573730304; 2026-02-21T09:02:17.3227902Z add.s32 %r440, %r40, 262144; 2026-02-21T09:02:17.3228090Z add.s32 %r441, %r40, 262240; 2026-02-21T09:02:17.3228246Z bfe.u32 %r442, %r441, 4, 14; 2026-02-21T09:02:17.3228392Z cvt.u64.u32 %rd82, %r442; 2026-02-21T09:02:17.3228550Z or.b64 %rd66, %rd82, 4611686293338849280; 2026-02-21T09:02:17.3228717Z add.s32 %r443, %r40, 64; 2026-02-21T09:02:17.3228872Z bfe.u32 %r444, %r443, 4, 14; 2026-02-21T09:02:17.3229020Z cvt.u64.u32 %rd83, %r444; 2026-02-21T09:02:17.3229177Z or.b64 %rd65, %rd83, 4611686293573730304; 2026-02-21T09:02:17.3229340Z add.s32 %r445, %r40, 262208; 2026-02-21T09:02:17.3229494Z bfe.u32 %r446, %r445, 4, 14; 2026-02-21T09:02:17.3229648Z cvt.u64.u32 %rd84, %r446; 2026-02-21T09:02:17.3229800Z or.b64 %rd64, %rd84, 4611686293338849280; 2026-02-21T09:02:17.3229971Z add.s32 %r447, %r40, 32; 2026-02-21T09:02:17.3230114Z bfe.u32 %r448, %r447, 4, 14; 2026-02-21T09:02:17.3230266Z cvt.u64.u32 %rd85, %r448; 2026-02-21T09:02:17.3230414Z or.b64 %rd63, %rd85, 4611686293573730304; 2026-02-21T09:02:17.3230583Z add.s32 %r449, %r40, 262176; 2026-02-21T09:02:17.3230727Z bfe.u32 %r450, %r449, 4, 14; 2026-02-21T09:02:17.3230879Z cvt.u64.u32 %rd86, %r450; 2026-02-21T09:02:17.3231028Z or.b64 %rd62, %rd86, 4611686293338849280; 2026-02-21T09:02:17.3231198Z bfe.u32 %r451, %r40, 4, 14; 2026-02-21T09:02:17.3231351Z cvt.u64.u32 %rd87, %r451; 2026-02-21T09:02:17.3231532Z or.b64 %rd61, %rd87, 4611686293573730304; 2026-02-21T09:02:17.3231705Z bfe.u32 %r452, %r440, 4, 14; 2026-02-21T09:02:17.3231852Z cvt.u64.u32 %rd88, %r452; 2026-02-21T09:02:17.3232012Z or.b64 %rd60, %rd88, 4611686293338849280; 2026-02-21T09:02:17.3232289Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3232593Z elect.sync %r453|%p70, -1; 2026-02-21T09:02:17.3232746Z mov.b32 %r414, 71303184; 2026-02-21T09:02:17.3232899Z mov.pred %p69, 0; 2026-02-21T09:02:17.3233043Z // begin inline asm 2026-02-21T09:02:17.3233268Z @%p70 
tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd60, %rd61, %r414, %p69; 2026-02-21T09:02:17.3233526Z // end inline asm 2026-02-21T09:02:17.3233657Z // begin inline asm 2026-02-21T09:02:17.3233877Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd62, %rd63, %r414, %p71; 2026-02-21T09:02:17.3234125Z // end inline asm 2026-02-21T09:02:17.3234266Z // begin inline asm 2026-02-21T09:02:17.3234483Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd64, %rd65, %r414, %p71; 2026-02-21T09:02:17.3234759Z // end inline asm 2026-02-21T09:02:17.3234901Z // begin inline asm 2026-02-21T09:02:17.3235141Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd66, %rd67, %r414, %p71; 2026-02-21T09:02:17.3235401Z // end inline asm 2026-02-21T09:02:17.3235533Z // begin inline asm 2026-02-21T09:02:17.3235766Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd60, %rd69, %r414, %p69; 2026-02-21T09:02:17.3236025Z // end inline asm 2026-02-21T09:02:17.3236177Z // begin inline asm 2026-02-21T09:02:17.3236409Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd62, %rd71, %r414, %p71; 2026-02-21T09:02:17.3236685Z // end inline asm 2026-02-21T09:02:17.3236822Z // begin inline asm 2026-02-21T09:02:17.3237035Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd64, %rd73, %r414, %p71; 2026-02-21T09:02:17.3237309Z // end inline asm 2026-02-21T09:02:17.3237445Z // begin inline asm 2026-02-21T09:02:17.3237675Z @%p70 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd66, %rd75, %r414, %p71; 2026-02-21T09:02:17.3237940Z // end inline asm 2026-02-21T09:02:17.3238081Z add.s32 %r454, %r40, 286752; 2026-02-21T09:02:17.3238247Z cvt.u64.u32 %rd76, %r454; 2026-02-21T09:02:17.3238399Z // begin inline asm 2026-02-21T09:02:17.3238614Z @%p70 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd76]; 2026-02-21T09:02:17.3238847Z // end inline asm 2026-02-21T09:02:17.3238987Z $L__BB0_3: 2026-02-21T09:02:17.3239229Z .loc 1 0 52 
// c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:0:52 2026-02-21T09:02:17.3239564Z add.s32 %r11, %r389, %r388; 2026-02-21T09:02:17.3239740Z add.s32 %r12, %r389, %r390; 2026-02-21T09:02:17.3239902Z add.s32 %r13, %r389, %r391; 2026-02-21T09:02:17.3240069Z add.s32 %r14, %r389, %r392; 2026-02-21T09:02:17.3240226Z add.s32 %r15, %r389, %r393; 2026-02-21T09:02:17.3240392Z add.s32 %r16, %r389, %r394; 2026-02-21T09:02:17.3240548Z add.s32 %r17, %r389, %r395; 2026-02-21T09:02:17.3240711Z add.s32 %r18, %r389, %r396; 2026-02-21T09:02:17.3240869Z and.b32 %r21, %r401, 512; 2026-02-21T09:02:17.3241147Z .loc 1 55 32 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:32 2026-02-21T09:02:17.3241452Z add.s64 %rd89, %rd49, 256; 2026-02-21T09:02:17.3241616Z cvt.u64.u32 %rd95, %r6; 2026-02-21T09:02:17.3241784Z add.s64 %rd96, %rd3, %rd95; 2026-02-21T09:02:17.3241943Z shl.b64 %rd97, %rd96, 1; 2026-02-21T09:02:17.3242110Z add.s64 %rd98, %rd10, %rd97; 2026-02-21T09:02:17.3242272Z add.s64 %rd90, %rd98, 32768; 2026-02-21T09:02:17.3242440Z add.s64 %rd91, %rd98, 65536; 2026-02-21T09:02:17.3242600Z add.s64 %rd92, %rd98, 98304; 2026-02-21T09:02:17.3242879Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3243179Z bar.sync 0; 2026-02-21T09:02:17.3243343Z // begin inline asm 2026-02-21T09:02:17.3243555Z cp.async.cg.shared.global [ %r455 + 0 ], [ %rd89 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3243784Z // end inline asm 2026-02-21T09:02:17.3243929Z // begin inline asm 2026-02-21T09:02:17.3244131Z cp.async.cg.shared.global [ %r457 + 0 ], [ %rd90 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3244364Z // end inline asm 2026-02-21T09:02:17.3244499Z // begin inline asm 2026-02-21T09:02:17.3244733Z cp.async.cg.shared.global [ %r459 + 0 ], [ %rd91 + 0 ], 0x10, %r456; 2026-02-21T09:02:17.3244976Z // end inline asm 2026-02-21T09:02:17.3245115Z // begin inline asm 2026-02-21T09:02:17.3245321Z cp.async.cg.shared.global [ %r461 + 0 ], [ %rd92 + 0 ], 
0x10, %r456; 2026-02-21T09:02:17.3245550Z // end inline asm 2026-02-21T09:02:17.3245700Z cp.async.commit_group; 2026-02-21T09:02:17.3245964Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3246259Z // begin inline asm 2026-02-21T09:02:17.3246464Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r463], 65536; 2026-02-21T09:02:17.3246684Z // end inline asm 2026-02-21T09:02:17.3246932Z .loc 1 56 44 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:56:44 2026-02-21T09:02:17.3247233Z bar.sync 0; 2026-02-21T09:02:17.3247379Z elect.sync %r473|%p89, -1; 2026-02-21T09:02:17.3247538Z and.pred %p87, %p65, %p89; 2026-02-21T09:02:17.3247699Z shl.b32 %r474, %r24, 1; 2026-02-21T09:02:17.3247843Z add.s32 %r475, %r40, %r474; 2026-02-21T09:02:17.3248003Z add.s32 %r464, %r475, 131072; 2026-02-21T09:02:17.3248154Z mov.b32 %r465, 128; 2026-02-21T09:02:17.3248297Z // begin inline asm 2026-02-21T09:02:17.3248622Z @%p87 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r464], [%rd93, {%r465, %r466}], [%r463]; 2026-02-21T09:02:17.3248999Z // end inline asm 2026-02-21T09:02:17.3249242Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3249517Z cvt.u16.u32 %rs1, %r3; 2026-02-21T09:02:17.3249670Z shr.u16 %rs2, %rs1, 2; 2026-02-21T09:02:17.3249815Z and.b16 %rs3, %rs2, 31; 2026-02-21T09:02:17.3249975Z mul.wide.u16 %r476, %rs3, 128; 2026-02-21T09:02:17.3250132Z shl.b32 %r477, %r19, 6; 2026-02-21T09:02:17.3250284Z or.b32 %r478, %r476, %r477; 2026-02-21T09:02:17.3250439Z or.b32 %r479, %r478, %r4; 2026-02-21T09:02:17.3250593Z mul.wide.u32 %rd99, %r479, 2048; 2026-02-21T09:02:17.3250769Z mul.wide.u32 %rd100, %r370, 16; 2026-02-21T09:02:17.3250928Z or.b64 %rd101, %rd99, %rd100; 2026-02-21T09:02:17.3251088Z add.s64 %rd102, %rd101, %rd10; 2026-02-21T09:02:17.3251241Z add.s64 %rd651, %rd102, 98688; 2026-02-21T09:02:17.3251430Z mov.b32 %r1246, 1; 
2026-02-21T09:02:17.3251563Z mov.b32 %r1245, 2; 2026-02-21T09:02:17.3251700Z mov.b32 %r1241, 0; 2026-02-21T09:02:17.3251828Z mov.b64 %rd652, 0; 2026-02-21T09:02:17.3251967Z mov.b32 %r1243, %r1241; 2026-02-21T09:02:17.3252115Z mov.b32 %r1244, %r1241; 2026-02-21T09:02:17.3252256Z mov.b32 %r1247, %r1241; 2026-02-21T09:02:17.3252401Z bra.uni $L__BB0_4; 2026-02-21T09:02:17.3252578Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:02:17.3252895Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3253172Z setp.lt.u64 %p111, %rd652, 832; 2026-02-21T09:02:17.3253437Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3253712Z // begin inline asm 2026-02-21T09:02:17.3253841Z 2026-02-21T09:02:17.3253956Z { 2026-02-21T09:02:17.3254071Z .reg .pred complete; 2026-02-21T09:02:17.3254221Z waitLoop: 2026-02-21T09:02:17.3254407Z mbarrier.try_wait.parity.shared.b64 complete, [%r1242], %r1241; 2026-02-21T09:02:17.3254650Z @!complete bra.uni waitLoop; 2026-02-21T09:02:17.3254829Z } 2026-02-21T09:02:17.3254899Z 2026-02-21T09:02:17.3254954Z // end inline asm 2026-02-21T09:02:17.3255122Z add.s32 %r550, %r1246, 1; 2026-02-21T09:02:17.3255280Z setp.gt.s32 %p114, %r550, 1; 2026-02-21T09:02:17.3255445Z selp.b32 %r1246, 0, %r550, %p114; 2026-02-21T09:02:17.3255610Z selp.b32 %r551, 1, 0, %p114; 2026-02-21T09:02:17.3255770Z xor.b32 %r38, %r1247, %r551; 2026-02-21T09:02:17.3256030Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3256315Z add.s32 %r552, %r1245, 1; 2026-02-21T09:02:17.3256468Z setp.gt.s32 %p115, %r552, 2; 2026-02-21T09:02:17.3256635Z selp.b32 %r1245, 0, %r552, %p115; 2026-02-21T09:02:17.3256906Z .loc 1 55 32 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:32 2026-02-21T09:02:17.3257191Z add.s64 %rd132, %rd651, -98304; 2026-02-21T09:02:17.3257363Z add.s64 %rd133, %rd651, -65536; 2026-02-21T09:02:17.3257520Z add.s64 
%rd134, %rd651, -32768; 2026-02-21T09:02:17.3257781Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3258059Z shl.b32 %r553, %r1245, 13; 2026-02-21T09:02:17.3258220Z add.s32 %r555, %r40, %r553; 2026-02-21T09:02:17.3258370Z add.s32 %r556, %r555, %r5; 2026-02-21T09:02:17.3258524Z bar.sync 0; 2026-02-21T09:02:17.3258684Z add.s32 %r537, %r556, 262144; 2026-02-21T09:02:17.3258849Z selp.b32 %r538, 16, 0, %p111; 2026-02-21T09:02:17.3259009Z // begin inline asm 2026-02-21T09:02:17.3259207Z cp.async.cg.shared.global [ %r537 + 0 ], [ %rd132 + 0 ], 0x10, %r538; 2026-02-21T09:02:17.3259437Z // end inline asm 2026-02-21T09:02:17.3259569Z add.s32 %r539, %r556, 264192; 2026-02-21T09:02:17.3259724Z // begin inline asm 2026-02-21T09:02:17.3259917Z cp.async.cg.shared.global [ %r539 + 0 ], [ %rd133 + 0 ], 0x10, %r538; 2026-02-21T09:02:17.3260168Z // end inline asm 2026-02-21T09:02:17.3260299Z add.s32 %r541, %r556, 266240; 2026-02-21T09:02:17.3260455Z // begin inline asm 2026-02-21T09:02:17.3260648Z cp.async.cg.shared.global [ %r541 + 0 ], [ %rd134 + 0 ], 0x10, %r538; 2026-02-21T09:02:17.3260862Z // end inline asm 2026-02-21T09:02:17.3261001Z add.s32 %r543, %r556, 268288; 2026-02-21T09:02:17.3261148Z // begin inline asm 2026-02-21T09:02:17.3261342Z cp.async.cg.shared.global [ %r543 + 0 ], [ %rd651 + 0 ], 0x10, %r538; 2026-02-21T09:02:17.3261557Z // end inline asm 2026-02-21T09:02:17.3261698Z cp.async.commit_group; 2026-02-21T09:02:17.3261951Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3262233Z shl.b32 %r557, %r1245, 3; 2026-02-21T09:02:17.3262390Z add.s32 %r558, %r40, %r557; 2026-02-21T09:02:17.3262542Z add.s32 %r549, %r558, 286720; 2026-02-21T09:02:17.3262731Z and.pred %p109, %p119, %p111; 2026-02-21T09:02:17.3262885Z // begin inline asm 2026-02-21T09:02:17.3263081Z @%p109 mbarrier.arrive.expect_tx.shared.b64 _, [%r549], 65536; 2026-02-21T09:02:17.3263295Z // end inline asm 
2026-02-21T09:02:17.3263542Z .loc 1 56 44 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:56:44 2026-02-21T09:02:17.3263825Z shl.b32 %r559, %r1245, 16; 2026-02-21T09:02:17.3263984Z bar.sync 0; 2026-02-21T09:02:17.3264131Z elect.sync %r560|%p116, -1; 2026-02-21T09:02:17.3264297Z and.pred %p117, %p111, %p116; 2026-02-21T09:02:17.3264466Z and.pred %p110, %p65, %p117; 2026-02-21T09:02:17.3264623Z add.s32 %r546, %r351, %r559; 2026-02-21T09:02:17.3264814Z cvt.u32.u64 %r561, %rd652; 2026-02-21T09:02:17.3264964Z add.s32 %r547, %r561, 192; 2026-02-21T09:02:17.3265118Z // begin inline asm 2026-02-21T09:02:17.3265448Z @%p110 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r546], [%rd93, {%r547, %r466}], [%r549]; 2026-02-21T09:02:17.3265823Z // end inline asm 2026-02-21T09:02:17.3266075Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3266367Z add.s64 %rd651, %rd651, 128; 2026-02-21T09:02:17.3266532Z setp.lt.u64 %p118, %rd652, 896; 2026-02-21T09:02:17.3266743Z add.s64 %rd652, %rd652, 64; 2026-02-21T09:02:17.3266902Z mov.b32 %r1241, %r1247; 2026-02-21T09:02:17.3267044Z mov.b32 %r1242, %r562; 2026-02-21T09:02:17.3267196Z mov.b32 %r1247, %r38; 2026-02-21T09:02:17.3267336Z @%p118 bra $L__BB0_4; 2026-02-21T09:02:17.3267489Z bra.uni $L__BB0_7; 2026-02-21T09:02:17.3267678Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:02:17.3267887Z add.s32 %r483, %r1244, 1; 2026-02-21T09:02:17.3268046Z setp.gt.s32 %p91, %r483, 2; 2026-02-21T09:02:17.3268205Z selp.b32 %r1244, 0, %r483, %p91; 2026-02-21T09:02:17.3268376Z selp.b32 %r484, 1, 0, %p91; 2026-02-21T09:02:17.3268529Z xor.b32 %r1243, %r1243, %r484; 2026-02-21T09:02:17.3268809Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3269092Z cp.async.wait_group 1; 2026-02-21T09:02:17.3269244Z bar.sync 0; 2026-02-21T09:02:17.3269480Z .loc 1 50 80 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3269754Z shl.b32 %r485, %r1244, 3; 2026-02-21T09:02:17.3269909Z add.s32 %r487, %r40, %r485; 2026-02-21T09:02:17.3270059Z add.s32 %r481, %r487, 286720; 2026-02-21T09:02:17.3270243Z // begin inline asm 2026-02-21T09:02:17.3270373Z 2026-02-21T09:02:17.3270488Z { 2026-02-21T09:02:17.3270602Z .reg .pred complete; 2026-02-21T09:02:17.3270747Z waitLoop: 2026-02-21T09:02:17.3270930Z mbarrier.try_wait.parity.shared.b64 complete, [%r481], %r1243; 2026-02-21T09:02:17.3271165Z @!complete bra.uni waitLoop; 2026-02-21T09:02:17.3271318Z } 2026-02-21T09:02:17.3271379Z 2026-02-21T09:02:17.3271433Z // end inline asm 2026-02-21T09:02:17.3271578Z shl.b32 %r488, %r1246, 3; 2026-02-21T09:02:17.3271724Z add.s32 %r489, %r40, %r488; 2026-02-21T09:02:17.3271916Z add.s32 %r562, %r489, 286752; 2026-02-21T09:02:17.3272170Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3272452Z @%p68 bra $L__BB0_6; 2026-02-21T09:02:17.3272635Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:02:17.3272951Z .loc 1 56 44 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:56:44 2026-02-21T09:02:17.3273236Z shl.b32 %r506, %r1244, 16; 2026-02-21T09:02:17.3273384Z add.s32 %r508, %r40, %r506; 2026-02-21T09:02:17.3273645Z .loc 1 55 85 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:55:85 2026-02-21T09:02:17.3273921Z shl.b32 %r509, %r1244, 13; 2026-02-21T09:02:17.3274086Z add.s32 %r510, %r40, %r509; 2026-02-21T09:02:17.3274235Z add.s32 %r511, %r510, 262144; 2026-02-21T09:02:17.3274525Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3274848Z elect.sync %r512|%p93, -1; 2026-02-21T09:02:17.3275005Z bfe.u32 %r513, %r511, 4, 14; 2026-02-21T09:02:17.3275167Z cvt.u64.u32 %rd120, %r513; 2026-02-21T09:02:17.3275333Z or.b64 %rd103, %rd120, 4611686293338849280; 2026-02-21T09:02:17.3275518Z bfe.u32 %r514, %r508, 4, 14; 
2026-02-21T09:02:17.3275670Z cvt.u64.u32 %rd121, %r514; 2026-02-21T09:02:17.3275838Z or.b64 %rd104, %rd121, 4611686293573730304; 2026-02-21T09:02:17.3276015Z mov.b32 %r491, 71303184; 2026-02-21T09:02:17.3276172Z mov.pred %p92, -1; 2026-02-21T09:02:17.3276319Z // begin inline asm 2026-02-21T09:02:17.3276545Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd103, %rd104, %r491, %p92; 2026-02-21T09:02:17.3276806Z // end inline asm 2026-02-21T09:02:17.3276943Z add.s32 %r515, %r510, 262176; 2026-02-21T09:02:17.3277101Z bfe.u32 %r516, %r515, 4, 14; 2026-02-21T09:02:17.3277254Z cvt.u64.u32 %rd122, %r516; 2026-02-21T09:02:17.3277422Z or.b64 %rd105, %rd122, 4611686293338849280; 2026-02-21T09:02:17.3277598Z add.s32 %r517, %r508, 32; 2026-02-21T09:02:17.3277754Z bfe.u32 %r518, %r517, 4, 14; 2026-02-21T09:02:17.3277911Z cvt.u64.u32 %rd123, %r518; 2026-02-21T09:02:17.3278069Z or.b64 %rd106, %rd123, 4611686293573730304; 2026-02-21T09:02:17.3278282Z // begin inline asm 2026-02-21T09:02:17.3278495Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd105, %rd106, %r491, %p92; 2026-02-21T09:02:17.3278747Z // end inline asm 2026-02-21T09:02:17.3278883Z add.s32 %r519, %r510, 262208; 2026-02-21T09:02:17.3279041Z bfe.u32 %r520, %r519, 4, 14; 2026-02-21T09:02:17.3279188Z cvt.u64.u32 %rd124, %r520; 2026-02-21T09:02:17.3279349Z or.b64 %rd107, %rd124, 4611686293338849280; 2026-02-21T09:02:17.3279525Z add.s32 %r521, %r508, 64; 2026-02-21T09:02:17.3279670Z bfe.u32 %r522, %r521, 4, 14; 2026-02-21T09:02:17.3279824Z cvt.u64.u32 %rd125, %r522; 2026-02-21T09:02:17.3279979Z or.b64 %rd108, %rd125, 4611686293573730304; 2026-02-21T09:02:17.3280154Z // begin inline asm 2026-02-21T09:02:17.3280365Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd107, %rd108, %r491, %p92; 2026-02-21T09:02:17.3280630Z // end inline asm 2026-02-21T09:02:17.3280761Z add.s32 %r523, %r510, 262240; 2026-02-21T09:02:17.3280919Z bfe.u32 %r524, %r523, 4, 14; 2026-02-21T09:02:17.3281084Z cvt.u64.u32 
%rd126, %r524; 2026-02-21T09:02:17.3281248Z or.b64 %rd109, %rd126, 4611686293338849280; 2026-02-21T09:02:17.3281433Z add.s32 %r525, %r508, 96; 2026-02-21T09:02:17.3281619Z bfe.u32 %r526, %r525, 4, 14; 2026-02-21T09:02:17.3281784Z cvt.u64.u32 %rd127, %r526; 2026-02-21T09:02:17.3281945Z or.b64 %rd110, %rd127, 4611686293573730304; 2026-02-21T09:02:17.3282127Z // begin inline asm 2026-02-21T09:02:17.3282345Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 0 ], %rd109, %rd110, %r491, %p92; 2026-02-21T09:02:17.3282606Z // end inline asm 2026-02-21T09:02:17.3282755Z add.s32 %r527, %r508, 32768; 2026-02-21T09:02:17.3282915Z bfe.u32 %r528, %r527, 4, 14; 2026-02-21T09:02:17.3283116Z cvt.u64.u32 %rd128, %r528; 2026-02-21T09:02:17.3283279Z or.b64 %rd112, %rd128, 4611686293573730304; 2026-02-21T09:02:17.3283459Z // begin inline asm 2026-02-21T09:02:17.3283691Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd103, %rd112, %r491, %p92; 2026-02-21T09:02:17.3283964Z // end inline asm 2026-02-21T09:02:17.3284102Z add.s32 %r529, %r508, 32800; 2026-02-21T09:02:17.3284264Z bfe.u32 %r530, %r529, 4, 14; 2026-02-21T09:02:17.3284427Z cvt.u64.u32 %rd129, %r530; 2026-02-21T09:02:17.3284592Z or.b64 %rd114, %rd129, 4611686293573730304; 2026-02-21T09:02:17.3284802Z // begin inline asm 2026-02-21T09:02:17.3285037Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd105, %rd114, %r491, %p92; 2026-02-21T09:02:17.3285311Z // end inline asm 2026-02-21T09:02:17.3285449Z add.s32 %r531, %r508, 32832; 2026-02-21T09:02:17.3285610Z bfe.u32 %r532, %r531, 4, 14; 2026-02-21T09:02:17.3285794Z cvt.u64.u32 %rd130, %r532; 2026-02-21T09:02:17.3285971Z or.b64 %rd116, %rd130, 4611686293573730304; 2026-02-21T09:02:17.3286156Z // begin inline asm 2026-02-21T09:02:17.3286388Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd107, %rd116, %r491, %p92; 2026-02-21T09:02:17.3286663Z // end inline asm 2026-02-21T09:02:17.3286800Z add.s32 %r533, %r508, 32864; 
2026-02-21T09:02:17.3286962Z bfe.u32 %r534, %r533, 4, 14; 2026-02-21T09:02:17.3287116Z cvt.u64.u32 %rd131, %r534; 2026-02-21T09:02:17.3287288Z or.b64 %rd118, %rd131, 4611686293573730304; 2026-02-21T09:02:17.3287463Z // begin inline asm 2026-02-21T09:02:17.3287700Z @%p93 tcgen05.mma.cta_group::1.kind::f16 [ %r1240 + 1048576 ], %rd109, %rd118, %r491, %p92; 2026-02-21T09:02:17.3287968Z // end inline asm 2026-02-21T09:02:17.3288104Z cvt.u64.u32 %rd119, %r562; 2026-02-21T09:02:17.3288263Z // begin inline asm 2026-02-21T09:02:17.3288473Z @%p93 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd119]; 2026-02-21T09:02:17.3288714Z // end inline asm 2026-02-21T09:02:17.3288848Z bra.uni $L__BB0_6; 2026-02-21T09:02:17.3289033Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:02:17.3289374Z .loc 1 0 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:0:52 2026-02-21T09:02:17.3289694Z setp.lt.u32 %p126, %r1, 128; 2026-02-21T09:02:17.3289848Z mov.b32 %r563, 1; 2026-02-21T09:02:17.3290088Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3290371Z // begin inline asm 2026-02-21T09:02:17.3290498Z 2026-02-21T09:02:17.3290613Z { 2026-02-21T09:02:17.3290732Z .reg .pred complete; 2026-02-21T09:02:17.3290878Z waitLoop: 2026-02-21T09:02:17.3291059Z mbarrier.try_wait.parity.shared.b64 complete, [%r562], %r563; 2026-02-21T09:02:17.3291300Z @!complete bra.uni waitLoop; 2026-02-21T09:02:17.3291456Z } 2026-02-21T09:02:17.3291518Z 2026-02-21T09:02:17.3291571Z // end inline asm 2026-02-21T09:02:17.3291815Z .loc 1 50 80 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:50:80 2026-02-21T09:02:17.3292104Z cp.async.wait_group 0; 2026-02-21T09:02:17.3292261Z bar.sync 0; 2026-02-21T09:02:17.3292388Z // begin inline asm 2026-02-21T09:02:17.3292564Z @%p119 mbarrier.inval.shared::cta.b64 [%r339]; 2026-02-21T09:02:17.3292751Z // end inline asm 2026-02-21T09:02:17.3292885Z bar.sync 0; 2026-02-21T09:02:17.3293014Z // begin inline 
asm 2026-02-21T09:02:17.3293202Z @%p119 mbarrier.inval.shared::cta.b64 [%r340]; 2026-02-21T09:02:17.3293393Z // end inline asm 2026-02-21T09:02:17.3293519Z bar.sync 0; 2026-02-21T09:02:17.3293646Z // begin inline asm 2026-02-21T09:02:17.3293802Z @%p119 mbarrier.inval.shared::cta.b64 [%r463]; 2026-02-21T09:02:17.3293986Z // end inline asm 2026-02-21T09:02:17.3294116Z add.s32 %r567, %r40, 286752; 2026-02-21T09:02:17.3294272Z // begin inline asm 2026-02-21T09:02:17.3294432Z @%p119 mbarrier.inval.shared::cta.b64 [%r567]; 2026-02-21T09:02:17.3294612Z // end inline asm 2026-02-21T09:02:17.3294814Z bar.sync 0; 2026-02-21T09:02:17.3294936Z // begin inline asm 2026-02-21T09:02:17.3295097Z @%p119 mbarrier.inval.shared::cta.b64 [%r338]; 2026-02-21T09:02:17.3295273Z // end inline asm 2026-02-21T09:02:17.3295512Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3295782Z // begin inline asm 2026-02-21T09:02:17.3296160Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584}, [%r840 + 0], 128; 2026-02-21T09:02:17.3296546Z // end inline asm 2026-02-21T09:02:17.3296675Z // begin inline asm 2026-02-21T09:02:17.3297042Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601}, [%r840 + 16], 128; 2026-02-21T09:02:17.3297441Z // end inline asm 2026-02-21T09:02:17.3297609Z // begin inline asm 2026-02-21T09:02:17.3297962Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618}, [%r840 + 32], 128; 2026-02-21T09:02:17.3298365Z // end inline asm 2026-02-21T09:02:17.3298503Z // begin inline asm 2026-02-21T09:02:17.3298854Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, 
%r633, %r634, %r635}, [%r840 + 48], 128; 2026-02-21T09:02:17.3299252Z // end inline asm 2026-02-21T09:02:17.3299379Z // begin inline asm 2026-02-21T09:02:17.3299739Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652}, [%r840 + 64], 128; 2026-02-21T09:02:17.3300135Z // end inline asm 2026-02-21T09:02:17.3300261Z // begin inline asm 2026-02-21T09:02:17.3300614Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669}, [%r840 + 80], 128; 2026-02-21T09:02:17.3301006Z // end inline asm 2026-02-21T09:02:17.3301139Z // begin inline asm 2026-02-21T09:02:17.3301480Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686}, [%r840 + 96], 128; 2026-02-21T09:02:17.3301902Z // end inline asm 2026-02-21T09:02:17.3302037Z // begin inline asm 2026-02-21T09:02:17.3302382Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703}, [%r840 + 112], 128; 2026-02-21T09:02:17.3302786Z // end inline asm 2026-02-21T09:02:17.3302915Z // begin inline asm 2026-02-21T09:02:17.3303268Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720}, [%r840 + 1048576], 128; 2026-02-21T09:02:17.3303674Z // end inline asm 2026-02-21T09:02:17.3303805Z // begin inline asm 2026-02-21T09:02:17.3304160Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737}, [%r840 + 1048592], 128; 2026-02-21T09:02:17.3304549Z // end inline asm 2026-02-21T09:02:17.3304721Z // begin inline asm 2026-02-21T09:02:17.3305112Z 
tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754}, [%r840 + 1048608], 128; 2026-02-21T09:02:17.3305523Z // end inline asm 2026-02-21T09:02:17.3305660Z // begin inline asm 2026-02-21T09:02:17.3306005Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771}, [%r840 + 1048624], 128; 2026-02-21T09:02:17.3306392Z // end inline asm 2026-02-21T09:02:17.3306519Z // begin inline asm 2026-02-21T09:02:17.3306872Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788}, [%r840 + 1048640], 128; 2026-02-21T09:02:17.3307293Z // end inline asm 2026-02-21T09:02:17.3307430Z // begin inline asm 2026-02-21T09:02:17.3307783Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805}, [%r840 + 1048656], 128; 2026-02-21T09:02:17.3308163Z // end inline asm 2026-02-21T09:02:17.3308305Z // begin inline asm 2026-02-21T09:02:17.3308677Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822}, [%r840 + 1048672], 128; 2026-02-21T09:02:17.3309091Z // end inline asm 2026-02-21T09:02:17.3309220Z // begin inline asm 2026-02-21T09:02:17.3309650Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r824, %r825, %r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839}, [%r840 + 1048688], 128; 2026-02-21T09:02:17.3310065Z // end inline asm 2026-02-21T09:02:17.3310197Z // begin inline asm 2026-02-21T09:02:17.3310352Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:02:17.3310515Z // end inline asm 2026-02-21T09:02:17.3310662Z cvt.u64.u32 %rd139, %r569; 2026-02-21T09:02:17.3310819Z cvt.u64.u32 
%rd140, %r570; 2026-02-21T09:02:17.3310981Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:02:17.3311144Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:02:17.3311432Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3311745Z mov.b64 {%r848, %r849}, %rd142; 2026-02-21T09:02:17.3311915Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:02:17.3312200Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3312505Z cvt.u64.u32 %rd143, %r571; 2026-02-21T09:02:17.3312663Z cvt.u64.u32 %rd144, %r572; 2026-02-21T09:02:17.3312815Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:02:17.3312978Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:02:17.3313245Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3313566Z mov.b64 {%r851, %r852}, %rd146; 2026-02-21T09:02:17.3313743Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:02:17.3314028Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3314312Z cvt.u64.u32 %rd147, %r573; 2026-02-21T09:02:17.3314460Z cvt.u64.u32 %rd148, %r574; 2026-02-21T09:02:17.3314614Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:02:17.3314802Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:02:17.3315067Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3315350Z mov.b64 {%r854, %r855}, %rd150; 2026-02-21T09:02:17.3315515Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:02:17.3315792Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3316065Z cvt.u64.u32 %rd151, %r575; 2026-02-21T09:02:17.3316222Z cvt.u64.u32 %rd152, %r576; 2026-02-21T09:02:17.3316372Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:02:17.3316534Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:02:17.3316807Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3317123Z mov.b64 {%r857, 
%r858}, %rd154; 2026-02-21T09:02:17.3317308Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:02:17.3317581Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3317859Z cvt.u64.u32 %rd155, %r577; 2026-02-21T09:02:17.3318005Z cvt.u64.u32 %rd156, %r578; 2026-02-21T09:02:17.3318162Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:02:17.3318318Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:02:17.3318571Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3318884Z mov.b64 {%r860, %r861}, %rd158; 2026-02-21T09:02:17.3319043Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:02:17.3319317Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3319592Z cvt.u64.u32 %rd159, %r579; 2026-02-21T09:02:17.3319748Z cvt.u64.u32 %rd160, %r580; 2026-02-21T09:02:17.3319896Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:02:17.3320055Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:02:17.3320314Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3320590Z mov.b64 {%r863, %r864}, %rd162; 2026-02-21T09:02:17.3320756Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:02:17.3321025Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3321335Z cvt.u64.u32 %rd163, %r581; 2026-02-21T09:02:17.3321486Z cvt.u64.u32 %rd164, %r582; 2026-02-21T09:02:17.3321639Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:02:17.3321795Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:02:17.3322046Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3322330Z mov.b64 {%r866, %r867}, %rd166; 2026-02-21T09:02:17.3322487Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:02:17.3322758Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3323029Z cvt.u64.u32 %rd167, %r583; 2026-02-21T09:02:17.3323181Z cvt.u64.u32 
%rd168, %r584; 2026-02-21T09:02:17.3323327Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:02:17.3323484Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:02:17.3323740Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3324010Z mov.b64 {%r869, %r870}, %rd170; 2026-02-21T09:02:17.3324176Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:02:17.3324438Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3324744Z cvt.u64.u32 %rd171, %r586; 2026-02-21T09:02:17.3324921Z cvt.u64.u32 %rd172, %r587; 2026-02-21T09:02:17.3325076Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:02:17.3325232Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:02:17.3325486Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3325776Z mov.b64 {%r872, %r873}, %rd174; 2026-02-21T09:02:17.3325936Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:02:17.3326214Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3326495Z cvt.u64.u32 %rd175, %r588; 2026-02-21T09:02:17.3326649Z cvt.u64.u32 %rd176, %r589; 2026-02-21T09:02:17.3326796Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:02:17.3326955Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:02:17.3327221Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3327494Z mov.b64 {%r875, %r876}, %rd178; 2026-02-21T09:02:17.3327666Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:02:17.3327931Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3328215Z cvt.u64.u32 %rd179, %r590; 2026-02-21T09:02:17.3328389Z cvt.u64.u32 %rd180, %r591; 2026-02-21T09:02:17.3328547Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:02:17.3328702Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:02:17.3328953Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3329237Z mov.b64 {%r878, 
%r879}, %rd182; 2026-02-21T09:02:17.3329395Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:02:17.3329666Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3329965Z cvt.u64.u32 %rd183, %r592; 2026-02-21T09:02:17.3330119Z cvt.u64.u32 %rd184, %r593; 2026-02-21T09:02:17.3330265Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:02:17.3330424Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:02:17.3330687Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3330967Z mov.b64 {%r881, %r882}, %rd186; 2026-02-21T09:02:17.3331134Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:02:17.3331405Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3331699Z cvt.u64.u32 %rd187, %r594; 2026-02-21T09:02:17.3331852Z cvt.u64.u32 %rd188, %r595; 2026-02-21T09:02:17.3332013Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:02:17.3332177Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:02:17.3332471Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3332771Z mov.b64 {%r884, %r885}, %rd190; 2026-02-21T09:02:17.3332937Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:02:17.3333223Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3333519Z cvt.u64.u32 %rd191, %r596; 2026-02-21T09:02:17.3333680Z cvt.u64.u32 %rd192, %r597; 2026-02-21T09:02:17.3333833Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:02:17.3333997Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:02:17.3334273Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3334567Z mov.b64 {%r887, %r888}, %rd194; 2026-02-21T09:02:17.3334769Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:02:17.3335053Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3335355Z cvt.u64.u32 %rd195, %r598; 2026-02-21T09:02:17.3335512Z cvt.u64.u32 
%rd196, %r599; 2026-02-21T09:02:17.3335674Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:02:17.3335838Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:02:17.3336112Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3336442Z mov.b64 {%r890, %r891}, %rd198; 2026-02-21T09:02:17.3336608Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:02:17.3336898Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3337186Z cvt.u64.u32 %rd199, %r600; 2026-02-21T09:02:17.3337349Z cvt.u64.u32 %rd200, %r601; 2026-02-21T09:02:17.3337502Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:02:17.3337663Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:02:17.3337940Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3338225Z mov.b64 {%r893, %r894}, %rd202; 2026-02-21T09:02:17.3338399Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:02:17.3338683Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3338980Z cvt.u64.u32 %rd203, %r603; 2026-02-21T09:02:17.3339139Z cvt.u64.u32 %rd204, %r604; 2026-02-21T09:02:17.3339308Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:02:17.3339474Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:02:17.3339774Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3340076Z mov.b64 {%r896, %r897}, %rd206; 2026-02-21T09:02:17.3340242Z cvt.rn.f16x2.f32 %r898, %r897, %r896; 2026-02-21T09:02:17.3340538Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3340814Z cvt.u64.u32 %rd207, %r605; 2026-02-21T09:02:17.3340968Z cvt.u64.u32 %rd208, %r606; 2026-02-21T09:02:17.3341113Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:02:17.3341273Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:02:17.3341587Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3341862Z mov.b64 {%r899, 
%r900}, %rd210; 2026-02-21T09:02:17.3342029Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T09:02:17.3342303Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3342589Z cvt.u64.u32 %rd211, %r607; 2026-02-21T09:02:17.3342735Z cvt.u64.u32 %rd212, %r608; 2026-02-21T09:02:17.3342891Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:02:17.3343049Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:02:17.3343309Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3343598Z mov.b64 {%r902, %r903}, %rd214; 2026-02-21T09:02:17.3343757Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T09:02:17.3344064Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3344347Z cvt.u64.u32 %rd215, %r609; 2026-02-21T09:02:17.3344499Z cvt.u64.u32 %rd216, %r610; 2026-02-21T09:02:17.3344643Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:02:17.3344832Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:02:17.3345093Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3345362Z mov.b64 {%r905, %r906}, %rd218; 2026-02-21T09:02:17.3345527Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T09:02:17.3345791Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3346066Z cvt.u64.u32 %rd219, %r611; 2026-02-21T09:02:17.3346211Z cvt.u64.u32 %rd220, %r612; 2026-02-21T09:02:17.3346365Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:02:17.3346521Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:02:17.3346774Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3347051Z mov.b64 {%r908, %r909}, %rd222; 2026-02-21T09:02:17.3347210Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T09:02:17.3347479Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3347783Z cvt.u64.u32 %rd223, %r613; 2026-02-21T09:02:17.3347937Z cvt.u64.u32 
%rd224, %r614; 2026-02-21T09:02:17.3348083Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:02:17.3348240Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:02:17.3348498Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3348555Z mov.b64 {%r911, %r912}, %rd226; 2026-02-21T09:02:17.3348616Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T09:02:17.3348771Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3348835Z cvt.u64.u32 %rd227, %r615; 2026-02-21T09:02:17.3348892Z cvt.u64.u32 %rd228, %r616; 2026-02-21T09:02:17.3348949Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:02:17.3349015Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:02:17.3349170Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3349228Z mov.b64 {%r914, %r915}, %rd230; 2026-02-21T09:02:17.3349299Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T09:02:17.3349454Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3349537Z cvt.u64.u32 %rd231, %r617; 2026-02-21T09:02:17.3349597Z cvt.u64.u32 %rd232, %r618; 2026-02-21T09:02:17.3349666Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:02:17.3349724Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:02:17.3349880Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3349946Z mov.b64 {%r917, %r918}, %rd234; 2026-02-21T09:02:17.3350007Z cvt.rn.f16x2.f32 %r919, %r918, %r917; 2026-02-21T09:02:17.3350160Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3350270Z cvt.u64.u32 %rd235, %r620; 2026-02-21T09:02:17.3350327Z cvt.u64.u32 %rd236, %r621; 2026-02-21T09:02:17.3350383Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:02:17.3350442Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:02:17.3350608Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3350666Z mov.b64 {%r920, 
%r921}, %rd238; 2026-02-21T09:02:17.3350726Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T09:02:17.3350889Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3350944Z cvt.u64.u32 %rd239, %r622; 2026-02-21T09:02:17.3350999Z cvt.u64.u32 %rd240, %r623; 2026-02-21T09:02:17.3351062Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:02:17.3351118Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:02:17.3351300Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3351359Z mov.b64 {%r923, %r924}, %rd242; 2026-02-21T09:02:17.3351427Z cvt.rn.f16x2.f32 %r925, %r924, %r923; 2026-02-21T09:02:17.3351582Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3351640Z cvt.u64.u32 %rd243, %r624; 2026-02-21T09:02:17.3351704Z cvt.u64.u32 %rd244, %r625; 2026-02-21T09:02:17.3351761Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:02:17.3351819Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:02:17.3351980Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3352037Z mov.b64 {%r926, %r927}, %rd246; 2026-02-21T09:02:17.3352097Z cvt.rn.f16x2.f32 %r928, %r927, %r926; 2026-02-21T09:02:17.3352250Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3352316Z cvt.u64.u32 %rd247, %r626; 2026-02-21T09:02:17.3352373Z cvt.u64.u32 %rd248, %r627; 2026-02-21T09:02:17.3352429Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:02:17.3352493Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:02:17.3352649Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3352729Z mov.b64 {%r929, %r930}, %rd250; 2026-02-21T09:02:17.3352797Z cvt.rn.f16x2.f32 %r931, %r930, %r929; 2026-02-21T09:02:17.3352956Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3353013Z cvt.u64.u32 %rd251, %r628; 2026-02-21T09:02:17.3353068Z cvt.u64.u32 
%rd252, %r629; 2026-02-21T09:02:17.3353131Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:02:17.3353187Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:02:17.3353343Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3353409Z mov.b64 {%r932, %r933}, %rd254; 2026-02-21T09:02:17.3353470Z cvt.rn.f16x2.f32 %r934, %r933, %r932; 2026-02-21T09:02:17.3353625Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3353687Z cvt.u64.u32 %rd255, %r630; 2026-02-21T09:02:17.3353744Z cvt.u64.u32 %rd256, %r631; 2026-02-21T09:02:17.3353799Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:02:17.3353853Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:02:17.3354037Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3354095Z mov.b64 {%r935, %r936}, %rd258; 2026-02-21T09:02:17.3354154Z cvt.rn.f16x2.f32 %r937, %r936, %r935; 2026-02-21T09:02:17.3354319Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3354375Z cvt.u64.u32 %rd259, %r632; 2026-02-21T09:02:17.3354430Z cvt.u64.u32 %rd260, %r633; 2026-02-21T09:02:17.3354492Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:02:17.3354548Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:02:17.3354760Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3354819Z mov.b64 {%r938, %r939}, %rd262; 2026-02-21T09:02:17.3354888Z cvt.rn.f16x2.f32 %r940, %r939, %r938; 2026-02-21T09:02:17.3355045Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3355101Z cvt.u64.u32 %rd263, %r634; 2026-02-21T09:02:17.3355164Z cvt.u64.u32 %rd264, %r635; 2026-02-21T09:02:17.3355220Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:02:17.3355276Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:02:17.3355443Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3355500Z mov.b64 {%r941, 
%r942}, %rd266; 2026-02-21T09:02:17.3355558Z cvt.rn.f16x2.f32 %r943, %r942, %r941; 2026-02-21T09:02:17.3355749Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3355814Z cvt.u64.u32 %rd267, %r637; 2026-02-21T09:02:17.3355868Z cvt.u64.u32 %rd268, %r638; 2026-02-21T09:02:17.3355924Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:02:17.3355990Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:02:17.3356148Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3356205Z mov.b64 {%r944, %r945}, %rd270; 2026-02-21T09:02:17.3356272Z cvt.rn.f16x2.f32 %r946, %r945, %r944; 2026-02-21T09:02:17.3356427Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3356483Z cvt.u64.u32 %rd271, %r639; 2026-02-21T09:02:17.3356539Z cvt.u64.u32 %rd272, %r640; 2026-02-21T09:02:17.3356602Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:02:17.3356658Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:02:17.3356814Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3356880Z mov.b64 {%r947, %r948}, %rd274; 2026-02-21T09:02:17.3356941Z cvt.rn.f16x2.f32 %r949, %r948, %r947; 2026-02-21T09:02:17.3357097Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3357188Z cvt.u64.u32 %rd275, %r641; 2026-02-21T09:02:17.3357245Z cvt.u64.u32 %rd276, %r642; 2026-02-21T09:02:17.3357301Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:02:17.3357358Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:02:17.3357529Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3357585Z mov.b64 {%r950, %r951}, %rd278; 2026-02-21T09:02:17.3357644Z cvt.rn.f16x2.f32 %r952, %r951, %r950; 2026-02-21T09:02:17.3357817Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3357874Z cvt.u64.u32 %rd279, %r643; 2026-02-21T09:02:17.3357933Z cvt.u64.u32 
%rd280, %r644; 2026-02-21T09:02:17.3358001Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:02:17.3358059Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:02:17.3358234Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3358293Z mov.b64 {%r953, %r954}, %rd282; 2026-02-21T09:02:17.3358360Z cvt.rn.f16x2.f32 %r955, %r954, %r953; 2026-02-21T09:02:17.3358551Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3358608Z cvt.u64.u32 %rd283, %r645; 2026-02-21T09:02:17.3358673Z cvt.u64.u32 %rd284, %r646; 2026-02-21T09:02:17.3358728Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:02:17.3358784Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:02:17.3358956Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3359012Z mov.b64 {%r956, %r957}, %rd286; 2026-02-21T09:02:17.3359071Z cvt.rn.f16x2.f32 %r958, %r957, %r956; 2026-02-21T09:02:17.3359260Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3359323Z cvt.u64.u32 %rd287, %r647; 2026-02-21T09:02:17.3359377Z cvt.u64.u32 %rd288, %r648; 2026-02-21T09:02:17.3359435Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:02:17.3359498Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:02:17.3359662Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3359719Z mov.b64 {%r959, %r960}, %rd290; 2026-02-21T09:02:17.3359786Z cvt.rn.f16x2.f32 %r961, %r960, %r959; 2026-02-21T09:02:17.3359945Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3360002Z cvt.u64.u32 %rd291, %r649; 2026-02-21T09:02:17.3360056Z cvt.u64.u32 %rd292, %r650; 2026-02-21T09:02:17.3360119Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:02:17.3360196Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:02:17.3360361Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3360426Z mov.b64 {%r962, 
%r963}, %rd294; 2026-02-21T09:02:17.3360485Z cvt.rn.f16x2.f32 %r964, %r963, %r962; 2026-02-21T09:02:17.3360646Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3360708Z cvt.u64.u32 %rd295, %r651; 2026-02-21T09:02:17.3360764Z cvt.u64.u32 %rd296, %r652; 2026-02-21T09:02:17.3360820Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:02:17.3360875Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:02:17.3361041Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3361097Z mov.b64 {%r965, %r966}, %rd298; 2026-02-21T09:02:17.3361156Z cvt.rn.f16x2.f32 %r967, %r966, %r965; 2026-02-21T09:02:17.3361324Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3361381Z cvt.u64.u32 %rd299, %r654; 2026-02-21T09:02:17.3361436Z cvt.u64.u32 %rd300, %r655; 2026-02-21T09:02:17.3361497Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:02:17.3361552Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:02:17.3361714Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3361795Z mov.b64 {%r968, %r969}, %rd302; 2026-02-21T09:02:17.3361864Z cvt.rn.f16x2.f32 %r970, %r969, %r968; 2026-02-21T09:02:17.3362028Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3362085Z cvt.u64.u32 %rd303, %r656; 2026-02-21T09:02:17.3362147Z cvt.u64.u32 %rd304, %r657; 2026-02-21T09:02:17.3362202Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:02:17.3362258Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:02:17.3362426Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3362482Z mov.b64 {%r971, %r972}, %rd306; 2026-02-21T09:02:17.3362543Z cvt.rn.f16x2.f32 %r973, %r972, %r971; 2026-02-21T09:02:17.3362701Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3362764Z cvt.u64.u32 %rd307, %r658; 2026-02-21T09:02:17.3362818Z cvt.u64.u32 
%rd308, %r659; 2026-02-21T09:02:17.3362873Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:02:17.3362935Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:02:17.3363119Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3363176Z mov.b64 {%r974, %r975}, %rd310; 2026-02-21T09:02:17.3363241Z cvt.rn.f16x2.f32 %r976, %r975, %r974; 2026-02-21T09:02:17.3363404Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3363460Z cvt.u64.u32 %rd311, %r660; 2026-02-21T09:02:17.3363513Z cvt.u64.u32 %rd312, %r661; 2026-02-21T09:02:17.3363576Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:02:17.3363656Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:02:17.3363813Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3363876Z mov.b64 {%r977, %r978}, %rd314; 2026-02-21T09:02:17.3363936Z cvt.rn.f16x2.f32 %r979, %r978, %r977; 2026-02-21T09:02:17.3364094Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3364158Z cvt.u64.u32 %rd315, %r662; 2026-02-21T09:02:17.3364212Z cvt.u64.u32 %rd316, %r663; 2026-02-21T09:02:17.3364267Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:02:17.3364322Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:02:17.3364491Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3364546Z mov.b64 {%r980, %r981}, %rd318; 2026-02-21T09:02:17.3364605Z cvt.rn.f16x2.f32 %r982, %r981, %r980; 2026-02-21T09:02:17.3364833Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3364893Z cvt.u64.u32 %rd319, %r664; 2026-02-21T09:02:17.3364947Z cvt.u64.u32 %rd320, %r665; 2026-02-21T09:02:17.3365012Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:02:17.3365069Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:02:17.3365229Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3365285Z mov.b64 {%r983, 
%r984}, %rd322; 2026-02-21T09:02:17.3365357Z cvt.rn.f16x2.f32 %r985, %r984, %r983; 2026-02-21T09:02:17.3365518Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3365574Z cvt.u64.u32 %rd323, %r666; 2026-02-21T09:02:17.3365639Z cvt.u64.u32 %rd324, %r667; 2026-02-21T09:02:17.3365696Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:02:17.3365751Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:02:17.3365921Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3365979Z mov.b64 {%r986, %r987}, %rd326; 2026-02-21T09:02:17.3366040Z cvt.rn.f16x2.f32 %r988, %r987, %r986; 2026-02-21T09:02:17.3366201Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3366299Z cvt.u64.u32 %rd327, %r668; 2026-02-21T09:02:17.3366358Z cvt.u64.u32 %rd328, %r669; 2026-02-21T09:02:17.3366415Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:02:17.3366485Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:02:17.3366649Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3366707Z mov.b64 {%r989, %r990}, %rd330; 2026-02-21T09:02:17.3366781Z cvt.rn.f16x2.f32 %r991, %r990, %r989; 2026-02-21T09:02:17.3366944Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3367002Z cvt.u64.u32 %rd331, %r671; 2026-02-21T09:02:17.3367059Z cvt.u64.u32 %rd332, %r672; 2026-02-21T09:02:17.3367123Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:02:17.3367179Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:02:17.3367344Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3367411Z mov.b64 {%r992, %r993}, %rd334; 2026-02-21T09:02:17.3367470Z cvt.rn.f16x2.f32 %r994, %r993, %r992; 2026-02-21T09:02:17.3367675Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3367741Z cvt.u64.u32 %rd335, %r673; 2026-02-21T09:02:17.3367795Z cvt.u64.u32 
%rd336, %r674; 2026-02-21T09:02:17.3367851Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:02:17.3367906Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:02:17.3368076Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3368133Z mov.b64 {%r995, %r996}, %rd338; 2026-02-21T09:02:17.3368194Z cvt.rn.f16x2.f32 %r997, %r996, %r995; 2026-02-21T09:02:17.3368390Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3368446Z cvt.u64.u32 %rd339, %r675; 2026-02-21T09:02:17.3368503Z cvt.u64.u32 %rd340, %r676; 2026-02-21T09:02:17.3368566Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:02:17.3368622Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:02:17.3368782Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3368838Z mov.b64 {%r998, %r999}, %rd342; 2026-02-21T09:02:17.3368913Z cvt.rn.f16x2.f32 %r1000, %r999, %r998; 2026-02-21T09:02:17.3369071Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3369126Z cvt.u64.u32 %rd343, %r677; 2026-02-21T09:02:17.3369190Z cvt.u64.u32 %rd344, %r678; 2026-02-21T09:02:17.3369247Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:02:17.3369326Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:02:17.3369498Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3369560Z mov.b64 {%r1001, %r1002}, %rd346; 2026-02-21T09:02:17.3369630Z cvt.rn.f16x2.f32 %r1003, %r1002, %r1001; 2026-02-21T09:02:17.3369794Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3369858Z cvt.u64.u32 %rd347, %r679; 2026-02-21T09:02:17.3369914Z cvt.u64.u32 %rd348, %r680; 2026-02-21T09:02:17.3369970Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:02:17.3370034Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:02:17.3370196Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3370257Z mov.b64 
{%r1004, %r1005}, %rd350; 2026-02-21T09:02:17.3370332Z cvt.rn.f16x2.f32 %r1006, %r1005, %r1004; 2026-02-21T09:02:17.3370494Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3370551Z cvt.u64.u32 %rd351, %r681; 2026-02-21T09:02:17.3370605Z cvt.u64.u32 %rd352, %r682; 2026-02-21T09:02:17.3370666Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:02:17.3370722Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:02:17.3370910Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3370974Z mov.b64 {%r1007, %r1008}, %rd354; 2026-02-21T09:02:17.3371042Z cvt.rn.f16x2.f32 %r1009, %r1008, %r1007; 2026-02-21T09:02:17.3371204Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3371265Z cvt.u64.u32 %rd355, %r683; 2026-02-21T09:02:17.3371320Z cvt.u64.u32 %rd356, %r684; 2026-02-21T09:02:17.3371375Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:02:17.3371430Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:02:17.3371602Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3371661Z mov.b64 {%r1010, %r1011}, %rd358; 2026-02-21T09:02:17.3371725Z cvt.rn.f16x2.f32 %r1012, %r1011, %r1010; 2026-02-21T09:02:17.3371891Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3371947Z cvt.u64.u32 %rd359, %r685; 2026-02-21T09:02:17.3372001Z cvt.u64.u32 %rd360, %r686; 2026-02-21T09:02:17.3372062Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:02:17.3372145Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:02:17.3372308Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3372366Z mov.b64 {%r1013, %r1014}, %rd362; 2026-02-21T09:02:17.3372437Z cvt.rn.f16x2.f32 %r1015, %r1014, %r1013; 2026-02-21T09:02:17.3372599Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3372655Z cvt.u64.u32 %rd363, %r688; 
2026-02-21T09:02:17.3372720Z cvt.u64.u32 %rd364, %r689; 2026-02-21T09:02:17.3372799Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:02:17.3372855Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:02:17.3373026Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3373084Z mov.b64 {%r1016, %r1017}, %rd366; 2026-02-21T09:02:17.3373148Z cvt.rn.f16x2.f32 %r1018, %r1017, %r1016; 2026-02-21T09:02:17.3373308Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3373371Z cvt.u64.u32 %rd367, %r690; 2026-02-21T09:02:17.3373426Z cvt.u64.u32 %rd368, %r691; 2026-02-21T09:02:17.3373481Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:02:17.3373543Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:02:17.3373704Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3373761Z mov.b64 {%r1019, %r1020}, %rd370; 2026-02-21T09:02:17.3373853Z cvt.rn.f16x2.f32 %r1021, %r1020, %r1019; 2026-02-21T09:02:17.3374016Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3374073Z cvt.u64.u32 %rd371, %r692; 2026-02-21T09:02:17.3374128Z cvt.u64.u32 %rd372, %r693; 2026-02-21T09:02:17.3374193Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:02:17.3374249Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:02:17.3374411Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3374477Z mov.b64 {%r1022, %r1023}, %rd374; 2026-02-21T09:02:17.3374543Z cvt.rn.f16x2.f32 %r1024, %r1023, %r1022; 2026-02-21T09:02:17.3374749Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3374816Z cvt.u64.u32 %rd375, %r694; 2026-02-21T09:02:17.3374874Z cvt.u64.u32 %rd376, %r695; 2026-02-21T09:02:17.3374932Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:02:17.3374992Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:02:17.3375172Z .loc 1 59 27 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3375233Z mov.b64 {%r1025, %r1026}, %rd378; 2026-02-21T09:02:17.3375300Z cvt.rn.f16x2.f32 %r1027, %r1026, %r1025; 2026-02-21T09:02:17.3375517Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3375578Z cvt.u64.u32 %rd379, %r696; 2026-02-21T09:02:17.3375637Z cvt.u64.u32 %rd380, %r697; 2026-02-21T09:02:17.3375706Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:02:17.3375765Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:02:17.3375932Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3375991Z mov.b64 {%r1028, %r1029}, %rd382; 2026-02-21T09:02:17.3376073Z cvt.rn.f16x2.f32 %r1030, %r1029, %r1028; 2026-02-21T09:02:17.3376246Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3376310Z cvt.u64.u32 %rd383, %r698; 2026-02-21T09:02:17.3376386Z cvt.u64.u32 %rd384, %r699; 2026-02-21T09:02:17.3376449Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:02:17.3376509Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:02:17.3376685Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3376746Z mov.b64 {%r1031, %r1032}, %rd386; 2026-02-21T09:02:17.3376842Z cvt.rn.f16x2.f32 %r1033, %r1032, %r1031; 2026-02-21T09:02:17.3377012Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3377081Z cvt.u64.u32 %rd387, %r700; 2026-02-21T09:02:17.3377141Z cvt.u64.u32 %rd388, %r701; 2026-02-21T09:02:17.3377198Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:02:17.3377267Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:02:17.3377436Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3377525Z mov.b64 {%r1034, %r1035}, %rd390; 2026-02-21T09:02:17.3377599Z cvt.rn.f16x2.f32 %r1036, %r1035, %r1034; 2026-02-21T09:02:17.3377766Z .loc 1 57 52 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3377827Z cvt.u64.u32 %rd391, %r702; 2026-02-21T09:02:17.3377886Z cvt.u64.u32 %rd392, %r703; 2026-02-21T09:02:17.3377953Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:02:17.3378012Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:02:17.3378182Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3378250Z mov.b64 {%r1037, %r1038}, %rd394; 2026-02-21T09:02:17.3378318Z cvt.rn.f16x2.f32 %r1039, %r1038, %r1037; 2026-02-21T09:02:17.3378489Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3378554Z cvt.u64.u32 %rd395, %r705; 2026-02-21T09:02:17.3378648Z cvt.u64.u32 %rd396, %r706; 2026-02-21T09:02:17.3378712Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:02:17.3378769Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:02:17.3378943Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3379005Z mov.b64 {%r1040, %r1041}, %rd398; 2026-02-21T09:02:17.3379072Z cvt.rn.f16x2.f32 %r1042, %r1041, %r1040; 2026-02-21T09:02:17.3379249Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3379308Z cvt.u64.u32 %rd399, %r707; 2026-02-21T09:02:17.3379366Z cvt.u64.u32 %rd400, %r708; 2026-02-21T09:02:17.3379432Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:02:17.3379490Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:02:17.3379655Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3379714Z mov.b64 {%r1043, %r1044}, %rd402; 2026-02-21T09:02:17.3379788Z cvt.rn.f16x2.f32 %r1045, %r1044, %r1043; 2026-02-21T09:02:17.3379957Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3380014Z cvt.u64.u32 %rd403, %r709; 2026-02-21T09:02:17.3380079Z cvt.u64.u32 %rd404, %r710; 2026-02-21T09:02:17.3380162Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:02:17.3380220Z or.b64 
%rd406, %rd403, %rd405; 2026-02-21T09:02:17.3380393Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3380453Z mov.b64 {%r1046, %r1047}, %rd406; 2026-02-21T09:02:17.3380519Z cvt.rn.f16x2.f32 %r1048, %r1047, %r1046; 2026-02-21T09:02:17.3380686Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3380752Z cvt.u64.u32 %rd407, %r711; 2026-02-21T09:02:17.3380811Z cvt.u64.u32 %rd408, %r712; 2026-02-21T09:02:17.3380869Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:02:17.3380938Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:02:17.3381110Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3381169Z mov.b64 {%r1049, %r1050}, %rd410; 2026-02-21T09:02:17.3381242Z cvt.rn.f16x2.f32 %r1051, %r1050, %r1049; 2026-02-21T09:02:17.3381409Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3381467Z cvt.u64.u32 %rd411, %r713; 2026-02-21T09:02:17.3381523Z cvt.u64.u32 %rd412, %r714; 2026-02-21T09:02:17.3381613Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:02:17.3381673Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:02:17.3381843Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3381910Z mov.b64 {%r1052, %r1053}, %rd414; 2026-02-21T09:02:17.3381977Z cvt.rn.f16x2.f32 %r1054, %r1053, %r1052; 2026-02-21T09:02:17.3382147Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3382238Z cvt.u64.u32 %rd415, %r715; 2026-02-21T09:02:17.3382295Z cvt.u64.u32 %rd416, %r716; 2026-02-21T09:02:17.3382353Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:02:17.3382411Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:02:17.3382588Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3382647Z mov.b64 {%r1055, %r1056}, %rd418; 2026-02-21T09:02:17.3382712Z cvt.rn.f16x2.f32 %r1057, %r1056, %r1055; 
2026-02-21T09:02:17.3382892Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3382949Z cvt.u64.u32 %rd419, %r717; 2026-02-21T09:02:17.3383006Z cvt.u64.u32 %rd420, %r718; 2026-02-21T09:02:17.3383069Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:02:17.3383128Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:02:17.3383336Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3383400Z mov.b64 {%r1058, %r1059}, %rd422; 2026-02-21T09:02:17.3383485Z cvt.rn.f16x2.f32 %r1060, %r1059, %r1058; 2026-02-21T09:02:17.3383639Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3383695Z cvt.u64.u32 %rd423, %r719; 2026-02-21T09:02:17.3383759Z cvt.u64.u32 %rd424, %r720; 2026-02-21T09:02:17.3383814Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:02:17.3383868Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:02:17.3384032Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3384091Z mov.b64 {%r1061, %r1062}, %rd426; 2026-02-21T09:02:17.3384156Z cvt.rn.f16x2.f32 %r1063, %r1062, %r1061; 2026-02-21T09:02:17.3384313Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3384378Z cvt.u64.u32 %rd427, %r722; 2026-02-21T09:02:17.3384436Z cvt.u64.u32 %rd428, %r723; 2026-02-21T09:02:17.3384495Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:02:17.3384564Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:02:17.3385277Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3385385Z mov.b64 {%r1064, %r1065}, %rd430; 2026-02-21T09:02:17.3385465Z cvt.rn.f16x2.f32 %r1066, %r1065, %r1064; 2026-02-21T09:02:17.3385633Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3385695Z cvt.u64.u32 %rd431, %r724; 2026-02-21T09:02:17.3385755Z cvt.u64.u32 %rd432, %r725; 2026-02-21T09:02:17.3385826Z shl.b64 %rd433, %rd432, 
32; 2026-02-21T09:02:17.3385886Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:02:17.3386049Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3386121Z mov.b64 {%r1067, %r1068}, %rd434; 2026-02-21T09:02:17.3386189Z cvt.rn.f16x2.f32 %r1069, %r1068, %r1067; 2026-02-21T09:02:17.3386358Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3386429Z cvt.u64.u32 %rd435, %r726; 2026-02-21T09:02:17.3386486Z cvt.u64.u32 %rd436, %r727; 2026-02-21T09:02:17.3386543Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:02:17.3386600Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:02:17.3386771Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3386864Z mov.b64 {%r1070, %r1071}, %rd438; 2026-02-21T09:02:17.3386929Z cvt.rn.f16x2.f32 %r1072, %r1071, %r1070; 2026-02-21T09:02:17.3387094Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3387149Z cvt.u64.u32 %rd439, %r728; 2026-02-21T09:02:17.3387206Z cvt.u64.u32 %rd440, %r729; 2026-02-21T09:02:17.3387267Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:02:17.3387326Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:02:17.3387487Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3387577Z mov.b64 {%r1073, %r1074}, %rd442; 2026-02-21T09:02:17.3387650Z cvt.rn.f16x2.f32 %r1075, %r1074, %r1073; 2026-02-21T09:02:17.3387812Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3387867Z cvt.u64.u32 %rd443, %r730; 2026-02-21T09:02:17.3387929Z cvt.u64.u32 %rd444, %r731; 2026-02-21T09:02:17.3387985Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:02:17.3388042Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:02:17.3388211Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3388269Z mov.b64 {%r1076, %r1077}, %rd446; 2026-02-21T09:02:17.3388332Z 
cvt.rn.f16x2.f32 %r1078, %r1077, %r1076; 2026-02-21T09:02:17.3388528Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3388593Z cvt.u64.u32 %rd447, %r732; 2026-02-21T09:02:17.3388649Z cvt.u64.u32 %rd448, %r733; 2026-02-21T09:02:17.3388705Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:02:17.3388767Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:02:17.3388926Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3388985Z mov.b64 {%r1079, %r1080}, %rd450; 2026-02-21T09:02:17.3389057Z cvt.rn.f16x2.f32 %r1081, %r1080, %r1079; 2026-02-21T09:02:17.3389216Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3389271Z cvt.u64.u32 %rd451, %r734; 2026-02-21T09:02:17.3389325Z cvt.u64.u32 %rd452, %r735; 2026-02-21T09:02:17.3389389Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:02:17.3389445Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T09:02:17.3389603Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3389669Z mov.b64 {%r1082, %r1083}, %rd454; 2026-02-21T09:02:17.3389736Z cvt.rn.f16x2.f32 %r1084, %r1083, %r1082; 2026-02-21T09:02:17.3389892Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3389982Z cvt.u64.u32 %rd455, %r736; 2026-02-21T09:02:17.3390036Z cvt.u64.u32 %rd456, %r737; 2026-02-21T09:02:17.3390092Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:02:17.3390147Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:02:17.3390314Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3390370Z mov.b64 {%r1085, %r1086}, %rd458; 2026-02-21T09:02:17.3390434Z cvt.rn.f16x2.f32 %r1087, %r1086, %r1085; 2026-02-21T09:02:17.3390601Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3390656Z cvt.u64.u32 %rd459, %r739; 2026-02-21T09:02:17.3390710Z cvt.u64.u32 %rd460, %r740; 
2026-02-21T09:02:17.3390772Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:02:17.3390829Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:02:17.3390994Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3391051Z mov.b64 {%r1088, %r1089}, %rd462; 2026-02-21T09:02:17.3391123Z cvt.rn.f16x2.f32 %r1090, %r1089, %r1088; 2026-02-21T09:02:17.3391287Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3391367Z cvt.u64.u32 %rd463, %r741; 2026-02-21T09:02:17.3391431Z cvt.u64.u32 %rd464, %r742; 2026-02-21T09:02:17.3391485Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:02:17.3391541Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:02:17.3391708Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3391765Z mov.b64 {%r1091, %r1092}, %rd466; 2026-02-21T09:02:17.3391830Z cvt.rn.f16x2.f32 %r1093, %r1092, %r1091; 2026-02-21T09:02:17.3391995Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3392085Z cvt.u64.u32 %rd467, %r743; 2026-02-21T09:02:17.3392141Z cvt.u64.u32 %rd468, %r744; 2026-02-21T09:02:17.3392196Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:02:17.3392261Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:02:17.3392422Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3392481Z mov.b64 {%r1094, %r1095}, %rd470; 2026-02-21T09:02:17.3392553Z cvt.rn.f16x2.f32 %r1096, %r1095, %r1094; 2026-02-21T09:02:17.3392714Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3392771Z cvt.u64.u32 %rd471, %r745; 2026-02-21T09:02:17.3392826Z cvt.u64.u32 %rd472, %r746; 2026-02-21T09:02:17.3392890Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:02:17.3392947Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:02:17.3393130Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3393199Z mov.b64 {%r1097, 
%r1098}, %rd474; 2026-02-21T09:02:17.3393263Z cvt.rn.f16x2.f32 %r1099, %r1098, %r1097; 2026-02-21T09:02:17.3393430Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3393498Z cvt.u64.u32 %rd475, %r747; 2026-02-21T09:02:17.3393556Z cvt.u64.u32 %rd476, %r748; 2026-02-21T09:02:17.3393615Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:02:17.3393918Z [148s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:02:17.3394968Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 512, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=3, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:02:17.3395107Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:02:17.3395195Z `ptxas` stderr: 2026-02-21T09:02:17.3395538Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 247 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:02:17.3395638Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:02:17.3395642Z 2026-02-21T09:02:17.3396044Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmphh7dslzw.ptx -o /tmp/tmphh7dslzw.ptx.o 2026-02-21T09:02:17.3396049Z 2026-02-21T09:02:17.3396176Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:02:17.3396245Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:02:17.3396414Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3396476Z mov.b64 {%r1100, %r1101}, %rd478; 2026-02-21T09:02:17.3396553Z cvt.rn.f16x2.f32 %r1102, %r1101, %r1100; 2026-02-21T09:02:17.3396714Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3396774Z cvt.u64.u32 %rd479, %r749; 2026-02-21T09:02:17.3396832Z cvt.u64.u32 %rd480, %r750; 2026-02-21T09:02:17.3396927Z shl.b64 %rd481, %rd480, 32; 2026-02-21T09:02:17.3396987Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T09:02:17.3397150Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3397216Z mov.b64 {%r1103, %r1104}, %rd482; 2026-02-21T09:02:17.3397282Z cvt.rn.f16x2.f32 %r1105, %r1104, %r1103; 2026-02-21T09:02:17.3397448Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3397512Z cvt.u64.u32 %rd483, %r751; 2026-02-21T09:02:17.3397594Z cvt.u64.u32 %rd484, %r752; 2026-02-21T09:02:17.3397651Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:02:17.3397708Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:02:17.3397880Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3397940Z mov.b64 {%r1106, %r1107}, %rd486; 2026-02-21T09:02:17.3398005Z cvt.rn.f16x2.f32 %r1108, %r1107, %r1106; 2026-02-21T09:02:17.3398178Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3398235Z cvt.u64.u32 %rd487, %r753; 2026-02-21T09:02:17.3398291Z cvt.u64.u32 %rd488, %r754; 2026-02-21T09:02:17.3398356Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:02:17.3398413Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:02:17.3398578Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3398659Z mov.b64 {%r1109, %r1110}, %rd490; 2026-02-21T09:02:17.3398734Z 
cvt.rn.f16x2.f32 %r1111, %r1110, %r1109; 2026-02-21T09:02:17.3398894Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3398951Z cvt.u64.u32 %rd491, %r756; 2026-02-21T09:02:17.3399013Z cvt.u64.u32 %rd492, %r757; 2026-02-21T09:02:17.3399068Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:02:17.3399124Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:02:17.3399297Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3399353Z mov.b64 {%r1112, %r1113}, %rd494; 2026-02-21T09:02:17.3399416Z cvt.rn.f16x2.f32 %r1114, %r1113, %r1112; 2026-02-21T09:02:17.3399579Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3399643Z cvt.u64.u32 %rd495, %r758; 2026-02-21T09:02:17.3399698Z cvt.u64.u32 %rd496, %r759; 2026-02-21T09:02:17.3399755Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:02:17.3399820Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:02:17.3399982Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3400039Z mov.b64 {%r1115, %r1116}, %rd498; 2026-02-21T09:02:17.3400132Z cvt.rn.f16x2.f32 %r1117, %r1116, %r1115; 2026-02-21T09:02:17.3400289Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3400346Z cvt.u64.u32 %rd499, %r760; 2026-02-21T09:02:17.3400400Z cvt.u64.u32 %rd500, %r761; 2026-02-21T09:02:17.3400463Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:02:17.3400518Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:02:17.3400680Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3400744Z mov.b64 {%r1118, %r1119}, %rd502; 2026-02-21T09:02:17.3400807Z cvt.rn.f16x2.f32 %r1120, %r1119, %r1118; 2026-02-21T09:02:17.3400972Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3401036Z cvt.u64.u32 %rd503, %r762; 2026-02-21T09:02:17.3401089Z cvt.u64.u32 %rd504, %r763; 
2026-02-21T09:02:17.3401146Z shl.b64 %rd505, %rd504, 32; 2026-02-21T09:02:17.3401203Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:02:17.3401368Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3401448Z mov.b64 {%r1121, %r1122}, %rd506; 2026-02-21T09:02:17.3401514Z cvt.rn.f16x2.f32 %r1123, %r1122, %r1121; 2026-02-21T09:02:17.3401681Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3401737Z cvt.u64.u32 %rd507, %r764; 2026-02-21T09:02:17.3401791Z cvt.u64.u32 %rd508, %r765; 2026-02-21T09:02:17.3401853Z shl.b64 %rd509, %rd508, 32; 2026-02-21T09:02:17.3401909Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:02:17.3402066Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3402144Z mov.b64 {%r1124, %r1125}, %rd510; 2026-02-21T09:02:17.3402214Z cvt.rn.f16x2.f32 %r1126, %r1125, %r1124; 2026-02-21T09:02:17.3402369Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3402426Z cvt.u64.u32 %rd511, %r766; 2026-02-21T09:02:17.3402487Z cvt.u64.u32 %rd512, %r767; 2026-02-21T09:02:17.3402545Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:02:17.3402602Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:02:17.3402763Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3402820Z mov.b64 {%r1127, %r1128}, %rd514; 2026-02-21T09:02:17.3402883Z cvt.rn.f16x2.f32 %r1129, %r1128, %r1127; 2026-02-21T09:02:17.3403039Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3403135Z cvt.u64.u32 %rd515, %r768; 2026-02-21T09:02:17.3403193Z cvt.u64.u32 %rd516, %r769; 2026-02-21T09:02:17.3403250Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:02:17.3403314Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T09:02:17.3403469Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3403528Z mov.b64 {%r1130, 
%r1131}, %rd518; 2026-02-21T09:02:17.3403601Z cvt.rn.f16x2.f32 %r1132, %r1131, %r1130; 2026-02-21T09:02:17.3403758Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3403815Z cvt.u64.u32 %rd519, %r770; 2026-02-21T09:02:17.3403871Z cvt.u64.u32 %rd520, %r771; 2026-02-21T09:02:17.3403940Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:02:17.3403999Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:02:17.3404154Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3404221Z mov.b64 {%r1133, %r1134}, %rd522; 2026-02-21T09:02:17.3404287Z cvt.rn.f16x2.f32 %r1135, %r1134, %r1133; 2026-02-21T09:02:17.3404440Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3404502Z cvt.u64.u32 %rd523, %r773; 2026-02-21T09:02:17.3404580Z cvt.u64.u32 %rd524, %r774; 2026-02-21T09:02:17.3404637Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:02:17.3404739Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:02:17.3404911Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3404970Z mov.b64 {%r1136, %r1137}, %rd526; 2026-02-21T09:02:17.3405034Z cvt.rn.f16x2.f32 %r1138, %r1137, %r1136; 2026-02-21T09:02:17.3405201Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3405255Z cvt.u64.u32 %rd527, %r775; 2026-02-21T09:02:17.3405311Z cvt.u64.u32 %rd528, %r776; 2026-02-21T09:02:17.3405374Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:02:17.3405432Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:02:17.3405594Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3405651Z mov.b64 {%r1139, %r1140}, %rd530; 2026-02-21T09:02:17.3405725Z cvt.rn.f16x2.f32 %r1141, %r1140, %r1139; 2026-02-21T09:02:17.3405886Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3405969Z cvt.u64.u32 %rd531, %r777; 
2026-02-21T09:02:17.3406035Z cvt.u64.u32 %rd532, %r778; 2026-02-21T09:02:17.3406093Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:02:17.3406151Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:02:17.3406323Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3406382Z mov.b64 {%r1142, %r1143}, %rd534; 2026-02-21T09:02:17.3406449Z cvt.rn.f16x2.f32 %r1144, %r1143, %r1142; 2026-02-21T09:02:17.3406606Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3406705Z cvt.u64.u32 %rd535, %r779; 2026-02-21T09:02:17.3406761Z cvt.u64.u32 %rd536, %r780; 2026-02-21T09:02:17.3406817Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:02:17.3406884Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:02:17.3407046Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3407104Z mov.b64 {%r1145, %r1146}, %rd538; 2026-02-21T09:02:17.3407177Z cvt.rn.f16x2.f32 %r1147, %r1146, %r1145; 2026-02-21T09:02:17.3407340Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3407395Z cvt.u64.u32 %rd539, %r781; 2026-02-21T09:02:17.3407450Z cvt.u64.u32 %rd540, %r782; 2026-02-21T09:02:17.3407516Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:02:17.3407572Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:02:17.3407760Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3407830Z mov.b64 {%r1148, %r1149}, %rd542; 2026-02-21T09:02:17.3407894Z cvt.rn.f16x2.f32 %r1150, %r1149, %r1148; 2026-02-21T09:02:17.3408051Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3408117Z cvt.u64.u32 %rd543, %r783; 2026-02-21T09:02:17.3408173Z cvt.u64.u32 %rd544, %r784; 2026-02-21T09:02:17.3408227Z shl.b64 %rd545, %rd544, 32; 2026-02-21T09:02:17.3408284Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:02:17.3408453Z .loc 1 59 27 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3408513Z mov.b64 {%r1151, %r1152}, %rd546; 2026-02-21T09:02:17.3408576Z cvt.rn.f16x2.f32 %r1153, %r1152, %r1151; 2026-02-21T09:02:17.3408751Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3408808Z cvt.u64.u32 %rd547, %r785; 2026-02-21T09:02:17.3408865Z cvt.u64.u32 %rd548, %r786; 2026-02-21T09:02:17.3408928Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:02:17.3408985Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:02:17.3409145Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3409231Z mov.b64 {%r1154, %r1155}, %rd550; 2026-02-21T09:02:17.3409301Z cvt.rn.f16x2.f32 %r1156, %r1155, %r1154; 2026-02-21T09:02:17.3409463Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3409518Z cvt.u64.u32 %rd551, %r787; 2026-02-21T09:02:17.3409579Z cvt.u64.u32 %rd552, %r788; 2026-02-21T09:02:17.3409634Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:02:17.3409689Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:02:17.3409851Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3409910Z mov.b64 {%r1157, %r1158}, %rd554; 2026-02-21T09:02:17.3409975Z cvt.rn.f16x2.f32 %r1159, %r1158, %r1157; 2026-02-21T09:02:17.3410128Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3410190Z cvt.u64.u32 %rd555, %r790; 2026-02-21T09:02:17.3410247Z cvt.u64.u32 %rd556, %r791; 2026-02-21T09:02:17.3410303Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:02:17.3410365Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T09:02:17.3410545Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3410603Z mov.b64 {%r1160, %r1161}, %rd558; 2026-02-21T09:02:17.3410674Z cvt.rn.f16x2.f32 %r1162, %r1161, %r1160; 2026-02-21T09:02:17.3410835Z .loc 1 57 52 // 
c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3410891Z cvt.u64.u32 %rd559, %r792; 2026-02-21T09:02:17.3410947Z cvt.u64.u32 %rd560, %r793; 2026-02-21T09:02:17.3411012Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:02:17.3411094Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:02:17.3411251Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3411316Z mov.b64 {%r1163, %r1164}, %rd562; 2026-02-21T09:02:17.3411381Z cvt.rn.f16x2.f32 %r1165, %r1164, %r1163; 2026-02-21T09:02:17.3411538Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3411602Z cvt.u64.u32 %rd563, %r794; 2026-02-21T09:02:17.3411659Z cvt.u64.u32 %rd564, %r795; 2026-02-21T09:02:17.3411716Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:02:17.3411772Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:02:17.3411939Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3411995Z mov.b64 {%r1166, %r1167}, %rd566; 2026-02-21T09:02:17.3412059Z cvt.rn.f16x2.f32 %r1168, %r1167, %r1166; 2026-02-21T09:02:17.3412247Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3412309Z cvt.u64.u32 %rd567, %r796; 2026-02-21T09:02:17.3412364Z cvt.u64.u32 %rd568, %r797; 2026-02-21T09:02:17.3412428Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:02:17.3412488Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:02:17.3412656Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3412714Z mov.b64 {%r1169, %r1170}, %rd570; 2026-02-21T09:02:17.3412787Z cvt.rn.f16x2.f32 %r1171, %r1170, %r1169; 2026-02-21T09:02:17.3412948Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3413003Z cvt.u64.u32 %rd571, %r798; 2026-02-21T09:02:17.3413066Z cvt.u64.u32 %rd572, %r799; 2026-02-21T09:02:17.3413121Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:02:17.3413177Z or.b64 
%rd574, %rd571, %rd573; 2026-02-21T09:02:17.3413345Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3413404Z mov.b64 {%r1172, %r1173}, %rd574; 2026-02-21T09:02:17.3413467Z cvt.rn.f16x2.f32 %r1174, %r1173, %r1172; 2026-02-21T09:02:17.3413627Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3413724Z cvt.u64.u32 %rd575, %r800; 2026-02-21T09:02:17.3413778Z cvt.u64.u32 %rd576, %r801; 2026-02-21T09:02:17.3413833Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:02:17.3413898Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T09:02:17.3414056Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3414113Z mov.b64 {%r1175, %r1176}, %rd578; 2026-02-21T09:02:17.3414183Z cvt.rn.f16x2.f32 %r1177, %r1176, %r1175; 2026-02-21T09:02:17.3414345Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3414401Z cvt.u64.u32 %rd579, %r802; 2026-02-21T09:02:17.3414458Z cvt.u64.u32 %rd580, %r803; 2026-02-21T09:02:17.3414522Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:02:17.3414578Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T09:02:17.3414776Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3414844Z mov.b64 {%r1178, %r1179}, %rd582; 2026-02-21T09:02:17.3414907Z cvt.rn.f16x2.f32 %r1180, %r1179, %r1178; 2026-02-21T09:02:17.3415094Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3415158Z cvt.u64.u32 %rd583, %r804; 2026-02-21T09:02:17.3415214Z cvt.u64.u32 %rd584, %r805; 2026-02-21T09:02:17.3415269Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:02:17.3415325Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:02:17.3415496Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3415555Z mov.b64 {%r1181, %r1182}, %rd586; 2026-02-21T09:02:17.3415619Z cvt.rn.f16x2.f32 %r1183, %r1182, %r1181; 
2026-02-21T09:02:17.3415815Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3415872Z cvt.u64.u32 %rd587, %r807; 2026-02-21T09:02:17.3415930Z cvt.u64.u32 %rd588, %r808; 2026-02-21T09:02:17.3415996Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:02:17.3416054Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:02:17.3416221Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3416280Z mov.b64 {%r1184, %r1185}, %rd590; 2026-02-21T09:02:17.3416354Z cvt.rn.f16x2.f32 %r1186, %r1185, %r1184; 2026-02-21T09:02:17.3416512Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3416566Z cvt.u64.u32 %rd591, %r809; 2026-02-21T09:02:17.3416630Z cvt.u64.u32 %rd592, %r810; 2026-02-21T09:02:17.3416713Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:02:17.3416773Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:02:17.3416943Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3416999Z mov.b64 {%r1187, %r1188}, %rd594; 2026-02-21T09:02:17.3417065Z cvt.rn.f16x2.f32 %r1189, %r1188, %r1187; 2026-02-21T09:02:17.3417221Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3417285Z cvt.u64.u32 %rd595, %r811; 2026-02-21T09:02:17.3417341Z cvt.u64.u32 %rd596, %r812; 2026-02-21T09:02:17.3417397Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:02:17.3417460Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:02:17.3417621Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3417678Z mov.b64 {%r1190, %r1191}, %rd598; 2026-02-21T09:02:17.3417746Z cvt.rn.f16x2.f32 %r1192, %r1191, %r1190; 2026-02-21T09:02:17.3417914Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3417971Z cvt.u64.u32 %rd599, %r813; 2026-02-21T09:02:17.3418026Z cvt.u64.u32 %rd600, %r814; 2026-02-21T09:02:17.3418092Z shl.b64 %rd601, %rd600, 
32; 2026-02-21T09:02:17.3418152Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:02:17.3418353Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3418419Z mov.b64 {%r1193, %r1194}, %rd602; 2026-02-21T09:02:17.3418487Z cvt.rn.f16x2.f32 %r1195, %r1194, %r1193; 2026-02-21T09:02:17.3418654Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3418719Z cvt.u64.u32 %rd603, %r815; 2026-02-21T09:02:17.3418777Z cvt.u64.u32 %rd604, %r816; 2026-02-21T09:02:17.3418834Z shl.b64 %rd605, %rd604, 32; 2026-02-21T09:02:17.3418892Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T09:02:17.3419068Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3419129Z mov.b64 {%r1196, %r1197}, %rd606; 2026-02-21T09:02:17.3419195Z cvt.rn.f16x2.f32 %r1198, %r1197, %r1196; 2026-02-21T09:02:17.3419364Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3419424Z cvt.u64.u32 %rd607, %r817; 2026-02-21T09:02:17.3419481Z cvt.u64.u32 %rd608, %r818; 2026-02-21T09:02:17.3419545Z shl.b64 %rd609, %rd608, 32; 2026-02-21T09:02:17.3419627Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T09:02:17.3419795Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3419854Z mov.b64 {%r1199, %r1200}, %rd610; 2026-02-21T09:02:17.3419927Z cvt.rn.f16x2.f32 %r1201, %r1200, %r1199; 2026-02-21T09:02:17.3420093Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3420153Z cvt.u64.u32 %rd611, %r819; 2026-02-21T09:02:17.3420218Z cvt.u64.u32 %rd612, %r820; 2026-02-21T09:02:17.3420323Z shl.b64 %rd613, %rd612, 32; 2026-02-21T09:02:17.3420381Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T09:02:17.3420561Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3420622Z mov.b64 {%r1202, %r1203}, %rd614; 2026-02-21T09:02:17.3420689Z 
cvt.rn.f16x2.f32 %r1204, %r1203, %r1202; 2026-02-21T09:02:17.3420861Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3420926Z cvt.u64.u32 %rd615, %r821; 2026-02-21T09:02:17.3420984Z cvt.u64.u32 %rd616, %r822; 2026-02-21T09:02:17.3421043Z shl.b64 %rd617, %rd616, 32; 2026-02-21T09:02:17.3421115Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T09:02:17.3421286Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3421345Z mov.b64 {%r1205, %r1206}, %rd618; 2026-02-21T09:02:17.3421449Z cvt.rn.f16x2.f32 %r1207, %r1206, %r1205; 2026-02-21T09:02:17.3421620Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3421681Z cvt.u64.u32 %rd619, %r824; 2026-02-21T09:02:17.3421741Z cvt.u64.u32 %rd620, %r825; 2026-02-21T09:02:17.3421810Z shl.b64 %rd621, %rd620, 32; 2026-02-21T09:02:17.3421870Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T09:02:17.3422038Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3422108Z mov.b64 {%r1208, %r1209}, %rd622; 2026-02-21T09:02:17.3422173Z cvt.rn.f16x2.f32 %r1210, %r1209, %r1208; 2026-02-21T09:02:17.3422340Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3422406Z cvt.u64.u32 %rd623, %r826; 2026-02-21T09:02:17.3422464Z cvt.u64.u32 %rd624, %r827; 2026-02-21T09:02:17.3422522Z shl.b64 %rd625, %rd624, 32; 2026-02-21T09:02:17.3422581Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T09:02:17.3422756Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3422815Z mov.b64 {%r1211, %r1212}, %rd626; 2026-02-21T09:02:17.3422880Z cvt.rn.f16x2.f32 %r1213, %r1212, %r1211; 2026-02-21T09:02:17.3423083Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3423142Z cvt.u64.u32 %rd627, %r828; 2026-02-21T09:02:17.3423200Z cvt.u64.u32 %rd628, %r829; 
2026-02-21T09:02:17.3423267Z shl.b64 %rd629, %rd628, 32; 2026-02-21T09:02:17.3423325Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T09:02:17.3423494Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3423555Z mov.b64 {%r1214, %r1215}, %rd630; 2026-02-21T09:02:17.3423627Z cvt.rn.f16x2.f32 %r1216, %r1215, %r1214; 2026-02-21T09:02:17.3423794Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3423853Z cvt.u64.u32 %rd631, %r830; 2026-02-21T09:02:17.3423919Z cvt.u64.u32 %rd632, %r831; 2026-02-21T09:02:17.3423978Z shl.b64 %rd633, %rd632, 32; 2026-02-21T09:02:17.3424036Z or.b64 %rd634, %rd631, %rd633; 2026-02-21T09:02:17.3424208Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3424267Z mov.b64 {%r1217, %r1218}, %rd634; 2026-02-21T09:02:17.3424360Z cvt.rn.f16x2.f32 %r1219, %r1218, %r1217; 2026-02-21T09:02:17.3424530Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3424597Z cvt.u64.u32 %rd635, %r832; 2026-02-21T09:02:17.3424654Z cvt.u64.u32 %rd636, %r833; 2026-02-21T09:02:17.3424744Z shl.b64 %rd637, %rd636, 32; 2026-02-21T09:02:17.3424813Z or.b64 %rd638, %rd635, %rd637; 2026-02-21T09:02:17.3424982Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3425075Z mov.b64 {%r1220, %r1221}, %rd638; 2026-02-21T09:02:17.3425148Z cvt.rn.f16x2.f32 %r1222, %r1221, %r1220; 2026-02-21T09:02:17.3425318Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3425378Z cvt.u64.u32 %rd639, %r834; 2026-02-21T09:02:17.3425436Z cvt.u64.u32 %rd640, %r835; 2026-02-21T09:02:17.3425502Z shl.b64 %rd641, %rd640, 32; 2026-02-21T09:02:17.3425563Z or.b64 %rd642, %rd639, %rd641; 2026-02-21T09:02:17.3425733Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3425799Z mov.b64 {%r1223, 
%r1224}, %rd642; 2026-02-21T09:02:17.3425863Z cvt.rn.f16x2.f32 %r1225, %r1224, %r1223; 2026-02-21T09:02:17.3426029Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3426093Z cvt.u64.u32 %rd643, %r836; 2026-02-21T09:02:17.3426173Z cvt.u64.u32 %rd644, %r837; 2026-02-21T09:02:17.3426235Z shl.b64 %rd645, %rd644, 32; 2026-02-21T09:02:17.3426294Z or.b64 %rd646, %rd643, %rd645; 2026-02-21T09:02:17.3426468Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3426529Z mov.b64 {%r1226, %r1227}, %rd646; 2026-02-21T09:02:17.3426596Z cvt.rn.f16x2.f32 %r1228, %r1227, %r1226; 2026-02-21T09:02:17.3426773Z .loc 1 57 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:57:52 2026-02-21T09:02:17.3426832Z cvt.u64.u32 %rd647, %r838; 2026-02-21T09:02:17.3426888Z cvt.u64.u32 %rd648, %r839; 2026-02-21T09:02:17.3426952Z shl.b64 %rd649, %rd648, 32; 2026-02-21T09:02:17.3427011Z or.b64 %rd650, %rd647, %rd649; 2026-02-21T09:02:17.3427176Z .loc 1 59 27 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:59:27 2026-02-21T09:02:17.3427242Z mov.b64 {%r1229, %r1230}, %rd650; 2026-02-21T09:02:17.3427313Z cvt.rn.f16x2.f32 %r1231, %r1230, %r1229; 2026-02-21T09:02:17.3427477Z .loc 1 60 45 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:60:45 2026-02-21T09:02:17.3427547Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:02:17.3427608Z bar.sync 0; 2026-02-21T09:02:17.3427730Z st.shared.v4.b32 [%r11], {%r850, %r853, %r856, %r859}; 2026-02-21T09:02:17.3427830Z st.shared.v4.b32 [%r11+8192], {%r946, %r949, %r952, %r955}; 2026-02-21T09:02:17.3427944Z st.shared.v4.b32 [%r11+32768], {%r1042, %r1045, %r1048, %r1051}; 2026-02-21T09:02:17.3428045Z st.shared.v4.b32 [%r11+40960], {%r1138, %r1141, %r1144, %r1147}; 2026-02-21T09:02:17.3428131Z st.shared.v4.b32 [%r12], {%r862, %r865, %r868, %r871}; 2026-02-21T09:02:17.3428222Z st.shared.v4.b32 [%r12+8192], {%r958, %r961, %r964, %r967}; 
2026-02-21T09:02:17.3428325Z st.shared.v4.b32 [%r12+32768], {%r1054, %r1057, %r1060, %r1063}; 2026-02-21T09:02:17.3428421Z st.shared.v4.b32 [%r12+40960], {%r1150, %r1153, %r1156, %r1159}; 2026-02-21T09:02:17.3428505Z st.shared.v4.b32 [%r13], {%r874, %r877, %r880, %r883}; 2026-02-21T09:02:17.3428606Z st.shared.v4.b32 [%r13+8192], {%r970, %r973, %r976, %r979}; 2026-02-21T09:02:17.3428700Z st.shared.v4.b32 [%r13+32768], {%r1066, %r1069, %r1072, %r1075}; 2026-02-21T09:02:17.3428797Z st.shared.v4.b32 [%r13+40960], {%r1162, %r1165, %r1168, %r1171}; 2026-02-21T09:02:17.3428887Z st.shared.v4.b32 [%r14], {%r886, %r889, %r892, %r895}; 2026-02-21T09:02:17.3428979Z st.shared.v4.b32 [%r14+8192], {%r982, %r985, %r988, %r991}; 2026-02-21T09:02:17.3429102Z st.shared.v4.b32 [%r14+32768], {%r1078, %r1081, %r1084, %r1087}; 2026-02-21T09:02:17.3429205Z st.shared.v4.b32 [%r14+40960], {%r1174, %r1177, %r1180, %r1183}; 2026-02-21T09:02:17.3429290Z st.shared.v4.b32 [%r15], {%r898, %r901, %r904, %r907}; 2026-02-21T09:02:17.3429385Z st.shared.v4.b32 [%r15+8192], {%r994, %r997, %r1000, %r1003}; 2026-02-21T09:02:17.3429477Z st.shared.v4.b32 [%r15+32768], {%r1090, %r1093, %r1096, %r1099}; 2026-02-21T09:02:17.3429582Z st.shared.v4.b32 [%r15+40960], {%r1186, %r1189, %r1192, %r1195}; 2026-02-21T09:02:17.3429686Z st.shared.v4.b32 [%r16], {%r910, %r913, %r916, %r919}; 2026-02-21T09:02:17.3429784Z st.shared.v4.b32 [%r16+8192], {%r1006, %r1009, %r1012, %r1015}; 2026-02-21T09:02:17.3429888Z st.shared.v4.b32 [%r16+32768], {%r1102, %r1105, %r1108, %r1111}; 2026-02-21T09:02:17.3429980Z st.shared.v4.b32 [%r16+40960], {%r1198, %r1201, %r1204, %r1207}; 2026-02-21T09:02:17.3430061Z st.shared.v4.b32 [%r17], {%r922, %r925, %r928, %r931}; 2026-02-21T09:02:17.3430168Z st.shared.v4.b32 [%r17+8192], {%r1018, %r1021, %r1024, %r1027}; 2026-02-21T09:02:17.3430262Z st.shared.v4.b32 [%r17+32768], {%r1114, %r1117, %r1120, %r1123}; 2026-02-21T09:02:17.3430357Z st.shared.v4.b32 [%r17+40960], {%r1210, %r1213, %r1216, 
%r1219}; 2026-02-21T09:02:17.3430450Z st.shared.v4.b32 [%r18], {%r934, %r937, %r940, %r943}; 2026-02-21T09:02:17.3430544Z st.shared.v4.b32 [%r18+8192], {%r1030, %r1033, %r1036, %r1039}; 2026-02-21T09:02:17.3430662Z st.shared.v4.b32 [%r18+32768], {%r1126, %r1129, %r1132, %r1135}; 2026-02-21T09:02:17.3430759Z st.shared.v4.b32 [%r18+40960], {%r1222, %r1225, %r1228, %r1231}; 2026-02-21T09:02:17.3430834Z // begin inline asm 2026-02-21T09:02:17.3430908Z fence.proxy.async.shared::cta; 2026-02-21T09:02:17.3430965Z // end inline asm 2026-02-21T09:02:17.3431028Z bar.sync 0; 2026-02-21T09:02:17.3431094Z elect.sync %r1232|%p127, -1; 2026-02-21T09:02:17.3431157Z and.pred %p124, %p126, %p127; 2026-02-21T09:02:17.3431215Z and.b32 %r1233, %r22, 7; 2026-02-21T09:02:17.3431278Z shl.b32 %r1234, %r1233, 13; 2026-02-21T09:02:17.3431335Z add.s32 %r843, %r389, %r1234; 2026-02-21T09:02:17.3431389Z shl.b32 %r1236, %r1233, 6; 2026-02-21T09:02:17.3431452Z or.b32 %r841, %r1236, %r21; 2026-02-21T09:02:17.3431505Z // begin inline asm 2026-02-21T09:02:17.3431687Z @%p124 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd137, {%r841, %r842}], [%r843]; 2026-02-21T09:02:17.3431747Z // end inline asm 2026-02-21T09:02:17.3431804Z xor.b32 %r1237, %r1233, 4; 2026-02-21T09:02:17.3431862Z shl.b32 %r1238, %r1237, 13; 2026-02-21T09:02:17.3431919Z add.s32 %r846, %r389, %r1238; 2026-02-21T09:02:17.3431981Z shl.b32 %r1239, %r1237, 6; 2026-02-21T09:02:17.3432036Z or.b32 %r844, %r1239, %r21; 2026-02-21T09:02:17.3432111Z // begin inline asm 2026-02-21T09:02:17.3432299Z @%p124 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd137, {%r844, %r842}], [%r846]; 2026-02-21T09:02:17.3432352Z // end inline asm 2026-02-21T09:02:17.3432415Z cp.async.bulk.commit_group; 2026-02-21T09:02:17.3432505Z $L__BB0_8: // %._crit_edge 2026-02-21T09:02:17.3432675Z .loc 1 32 52 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:32:52 2026-02-21T09:02:17.3432744Z cp.async.bulk.wait_group.read 0; 
2026-02-21T09:02:17.3432796Z bar.sync 0; 2026-02-21T09:02:17.3432968Z .loc 1 32 4 // c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py:32:4 2026-02-21T09:02:17.3433021Z bar.sync 0; 2026-02-21T09:02:17.3433076Z // begin inline asm 2026-02-21T09:02:17.3433195Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1240, 256; 2026-02-21T09:02:17.3433249Z // end inline asm 2026-02-21T09:02:17.3433299Z ret; 2026-02-21T09:02:17.3433353Z $L__tmp0: 2026-02-21T09:02:17.3433414Z $L__func_end0: 2026-02-21T09:02:17.3433493Z // -- End function 2026-02-21T09:02:17.3433543Z } 2026-02-21T09:02:17.3433772Z .file 1 "/tmp/torchinductor_root/4i/c4iudw7me6ypclwmrm4k6jmv2z2iod7el4mwegfekho3gqrilynf.py" 2026-02-21T09:02:17.3433834Z .section .debug_abbrev 2026-02-21T09:02:17.3433883Z { 2026-02-21T09:02:17.3433976Z .b8 1 // Abbreviation Code 2026-02-21T09:02:17.3434060Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:02:17.3434137Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:02:17.3434213Z .b8 37 // DW_AT_producer 2026-02-21T09:02:17.3434320Z .b8 8 // DW_FORM_string 2026-02-21T09:02:17.3434390Z .b8 19 // DW_AT_language 2026-02-21T09:02:17.3434463Z .b8 5 // DW_FORM_data2 2026-02-21T09:02:17.3434542Z .b8 3 // DW_AT_name 2026-02-21T09:02:17.3434611Z .b8 8 // DW_FORM_string 2026-02-21T09:02:17.3434718Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:02:17.3434800Z .b8 6 // DW_FORM_data4 2026-02-21T09:02:17.3434872Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:02:17.3434941Z .b8 8 // DW_FORM_string 2026-02-21T09:02:17.3435008Z .b8 0 // EOM(1) 2026-02-21T09:02:17.3435082Z .b8 0 // EOM(2) 2026-02-21T09:02:17.3435181Z .b8 0 // EOM(3) 2026-02-21T09:02:17.3435231Z } 2026-02-21T09:02:17.3435297Z .section .debug_info 2026-02-21T09:02:17.3435347Z { 2026-02-21T09:02:17.3435427Z .b32 104 // Length of Unit 2026-02-21T09:02:17.3435515Z .b8 2 // DWARF version number 2026-02-21T09:02:17.3435568Z .b8 0 2026-02-21T09:02:17.3435682Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:02:17.3435765Z .b8 8 // Address Size (in bytes) 2026-02-21T09:02:17.3435870Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:02:17.3435948Z .b8 116 // DW_AT_producer 2026-02-21T09:02:17.3436002Z .b8 114 2026-02-21T09:02:17.3436062Z .b8 105 2026-02-21T09:02:17.3436113Z .b8 116 2026-02-21T09:02:17.3436162Z .b8 111 2026-02-21T09:02:17.3436210Z .b8 110 2026-02-21T09:02:17.3436267Z .b8 0 2026-02-21T09:02:17.3436340Z .b8 2 // DW_AT_language 2026-02-21T09:02:17.3436392Z .b8 0 2026-02-21T09:02:17.3436474Z .b8 99 // DW_AT_name 2026-02-21T09:02:17.3436525Z .b8 52 2026-02-21T09:02:17.3436575Z .b8 105 2026-02-21T09:02:17.3436624Z .b8 117 2026-02-21T09:02:17.3436682Z .b8 100 2026-02-21T09:02:17.3436760Z .b8 119 2026-02-21T09:02:17.3436811Z .b8 55 2026-02-21T09:02:17.3436867Z .b8 109 2026-02-21T09:02:17.3436917Z .b8 101 2026-02-21T09:02:17.3436966Z .b8 54 2026-02-21T09:02:17.3437015Z .b8 121 2026-02-21T09:02:17.3437072Z .b8 112 2026-02-21T09:02:17.3437123Z .b8 99 2026-02-21T09:02:17.3437172Z .b8 108 2026-02-21T09:02:17.3437227Z .b8 119 2026-02-21T09:02:17.3437275Z .b8 109 2026-02-21T09:02:17.3437324Z .b8 114 2026-02-21T09:02:17.3437372Z .b8 109 2026-02-21T09:02:17.3437430Z .b8 52 2026-02-21T09:02:17.3437480Z .b8 107 2026-02-21T09:02:17.3437531Z .b8 54 2026-02-21T09:02:17.3437580Z .b8 106 2026-02-21T09:02:17.3437640Z .b8 109 2026-02-21T09:02:17.3437689Z .b8 118 2026-02-21T09:02:17.3437740Z .b8 50 2026-02-21T09:02:17.3437799Z .b8 122 2026-02-21T09:02:17.3437850Z .b8 50 2026-02-21T09:02:17.3437898Z .b8 105 2026-02-21T09:02:17.3437946Z .b8 111 2026-02-21T09:02:17.3438002Z .b8 100 2026-02-21T09:02:17.3438051Z .b8 55 2026-02-21T09:02:17.3438098Z .b8 101 2026-02-21T09:02:17.3438154Z .b8 108 2026-02-21T09:02:17.3438203Z .b8 52 2026-02-21T09:02:17.3438251Z .b8 109 2026-02-21T09:02:17.3438299Z .b8 119 2026-02-21T09:02:17.3438356Z .b8 101 2026-02-21T09:02:17.3438405Z .b8 103 2026-02-21T09:02:17.3438453Z .b8 102 2026-02-21T09:02:17.3438509Z .b8 101 
2026-02-21T09:02:17.3438600Z .b8 107 2026-02-21T09:02:17.3438650Z .b8 104 2026-02-21T09:02:17.3438698Z .b8 111 2026-02-21T09:02:17.3438753Z .b8 51 2026-02-21T09:02:17.3438800Z .b8 103 2026-02-21T09:02:17.3438847Z .b8 113 2026-02-21T09:02:17.3438893Z .b8 114 2026-02-21T09:02:17.3438949Z .b8 105 2026-02-21T09:02:17.3439000Z .b8 108 2026-02-21T09:02:17.3439048Z .b8 121 2026-02-21T09:02:17.3439103Z .b8 110 2026-02-21T09:02:17.3439150Z .b8 102 2026-02-21T09:02:17.3439198Z .b8 46 2026-02-21T09:02:17.3439247Z .b8 112 2026-02-21T09:02:17.3439334Z .b8 121 2026-02-21T09:02:17.3439383Z .b8 0 2026-02-21T09:02:17.3439474Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:02:17.3439552Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:02:17.3439603Z .b8 116 2026-02-21T09:02:17.3439650Z .b8 109 2026-02-21T09:02:17.3439698Z .b8 112 2026-02-21T09:02:17.3439755Z .b8 47 2026-02-21T09:02:17.3439803Z .b8 116 2026-02-21T09:02:17.3439851Z .b8 111 2026-02-21T09:02:17.3439906Z .b8 114 2026-02-21T09:02:17.3439956Z .b8 99 2026-02-21T09:02:17.3440004Z .b8 104 2026-02-21T09:02:17.3440052Z .b8 105 2026-02-21T09:02:17.3440108Z .b8 110 2026-02-21T09:02:17.3440156Z .b8 100 2026-02-21T09:02:17.3440204Z .b8 117 2026-02-21T09:02:17.3440251Z .b8 99 2026-02-21T09:02:17.3440307Z .b8 116 2026-02-21T09:02:17.3440354Z .b8 111 2026-02-21T09:02:17.3440402Z .b8 114 2026-02-21T09:02:17.3440456Z .b8 95 2026-02-21T09:02:17.3440504Z .b8 114 2026-02-21T09:02:17.3440553Z .b8 111 2026-02-21T09:02:17.3440620Z .b8 111 2026-02-21T09:02:17.3440679Z .b8 116 2026-02-21T09:02:17.3440728Z .b8 47 2026-02-21T09:02:17.3440776Z .b8 52 2026-02-21T09:02:17.3440829Z .b8 105 2026-02-21T09:02:17.3440878Z .b8 0 2026-02-21T09:02:17.3440926Z } 2026-02-21T09:02:17.3440988Z .section .debug_macinfo { } 2026-02-21T09:02:17.3440992Z 2026-02-21T09:02:17.3441074Z ================================================================ 2026-02-21T09:02:17.3441173Z please share the reproducer above with Triton project. 
2026-02-21T09:02:17.8089420Z 2026-02-21T09:02:17.8093217Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 49/49 20.8 configs/s 2026-02-21T09:02:18.8961092Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 930.5 2026-02-21T09:02:18.8965001Z configs/s 2026-02-21T09:02:18.9971330Z [149s] Generation 8 complete: 2026-02-21T09:02:18.9971697Z error=14 2026-02-21T09:02:18.9971881Z ok=38 2026-02-21T09:02:18.9972100Z min=0.0164 2026-02-21T09:02:18.9972314Z mid=0.0246 2026-02-21T09:02:18.9972475Z max=4.1923 2026-02-21T09:02:18.9972645Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:02:18.9972921Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:02:18.9973168Z 'l2_groupings': [8], 2026-02-21T09:02:18.9973648Z 'load_eviction_policies': ['', ''], 2026-02-21T09:02:18.9973861Z 'loop_orders': [[1, 0]], 2026-02-21T09:02:18.9974054Z 'num_stages': 4, 2026-02-21T09:02:18.9974223Z 'num_warps': 4, 2026-02-21T09:02:18.9974397Z 'pid_type': 'flat', 2026-02-21T09:02:18.9974608Z 'range_flattens': [None, False], 2026-02-21T09:02:18.9975200Z 'range_multi_buffers': [None, None], 2026-02-21T09:02:18.9975470Z 'range_num_stages': [0, 0], 2026-02-21T09:02:18.9975714Z 'range_unroll_factors': [0, 0], 2026-02-21T09:02:18.9975974Z 'range_warp_specializes': [None, False]} 2026-02-21T09:02:18.9997016Z [149s] Fitting surrogate: 711 points, 711 targets 2026-02-21T09:02:19.8091647Z [150s] Generation 9 starting: 45 neighbors, 3 active search path(s) 2026-02-21T09:02:24.4000154Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46/46 4.3 configs/s 2026-02-21T09:02:26.8266017Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 46/46 19.4 configs/s 2026-02-21T09:02:28.1272866Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 927.3 2026-02-21T09:02:28.1277070Z configs/s 2026-02-21T09:02:28.2219076Z [159s] Generation 9 complete: 2026-02-21T09:02:28.2219616Z error=6 2026-02-21T09:02:28.2219806Z ok=43 2026-02-21T09:02:28.2219959Z 
min=0.0164 2026-02-21T09:02:28.2220103Z mid=0.0286 2026-02-21T09:02:28.2220233Z max=2.2478 2026-02-21T09:02:28.2220367Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:02:28.2220583Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:02:28.2220790Z 'l2_groupings': [8], 2026-02-21T09:02:28.2220954Z 'load_eviction_policies': ['', ''], 2026-02-21T09:02:28.2221235Z 'loop_orders': [[1, 0]], 2026-02-21T09:02:28.2221402Z 'num_stages': 4, 2026-02-21T09:02:28.2221550Z 'num_warps': 4, 2026-02-21T09:02:28.2221688Z 'pid_type': 'flat', 2026-02-21T09:02:28.2221851Z 'range_flattens': [None, False], 2026-02-21T09:02:28.2222026Z 'range_multi_buffers': [None, None], 2026-02-21T09:02:28.2222215Z 'range_num_stages': [0, 0], 2026-02-21T09:02:28.2222378Z 'range_unroll_factors': [0, 0], 2026-02-21T09:02:28.2222563Z 'range_warp_specializes': [None, False]} 2026-02-21T09:02:28.2239959Z [159s] Fitting surrogate: 760 points, 760 targets 2026-02-21T09:02:28.6862955Z [159s] Generation 10 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:02:29.5293061Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 41.1 configs/s 2026-02-21T09:02:30.0277126Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 35.4 configs/s 2026-02-21T09:02:30.2620976Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 4147.3 2026-02-21T09:02:30.2622214Z configs/s 2026-02-21T09:02:30.2932150Z [161s] Generation 10 complete: 2026-02-21T09:02:30.2932424Z error=8 2026-02-21T09:02:30.2932569Z ok=10 2026-02-21T09:02:30.2932700Z min=0.0164 2026-02-21T09:02:30.2932857Z mid=0.0266 2026-02-21T09:02:30.2933018Z max=0.0962 2026-02-21T09:02:30.2933163Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:02:30.2933398Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:02:30.2933597Z 'l2_groupings': [8], 2026-02-21T09:02:30.2933762Z 'load_eviction_policies': ['', ''], 2026-02-21T09:02:30.2933938Z 'loop_orders': [[1, 0]], 2026-02-21T09:02:30.2934090Z 
'num_stages': 4, 2026-02-21T09:02:30.2934233Z 'num_warps': 4, 2026-02-21T09:02:30.2934367Z 'pid_type': 'flat', 2026-02-21T09:02:30.2934526Z 'range_flattens': [None, False], 2026-02-21T09:02:30.2934893Z 'range_multi_buffers': [None, None], 2026-02-21T09:02:30.2935097Z 'range_num_stages': [0, 0], 2026-02-21T09:02:30.2935269Z 'range_unroll_factors': [0, 0], 2026-02-21T09:02:30.2935465Z 'range_warp_specializes': [None, False]} 2026-02-21T09:02:30.2954598Z [161s] Fitting surrogate: 778 points, 778 targets 2026-02-21T09:02:30.5722777Z [161s] Autotuning complete in 161.5s after searching 744 configs. 2026-02-21T09:02:30.5723125Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:02:30.5724361Z @helion.kernel(config=helion.Config(block_sizes=[128, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[1, 0]], num_stages=4, num_warps=4, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:02:30.5725480Z 2026-02-21T09:02:30.5725736Z [161s] Code of selected kernel: /tmp/torchinductor_root/2k/c2k5v2gooldskcivjj6tuvj4cut3zftszhkud26r6432tddcs2r5.py 2026-02-21T09:02:30.5827674Z from __future__ import annotations 2026-02-21T09:02:30.5827895Z 2026-02-21T09:02:30.5828236Z import torch 2026-02-21T09:02:30.5828395Z import triton 2026-02-21T09:02:30.5828548Z import triton.language as tl 2026-02-21T09:02:30.5828840Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:02:30.5829079Z 2026-02-21T09:02:30.5829153Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:02:30.5829331Z _BLOCK_SIZE_0 = tl.constexpr(128) 2026-02-21T09:02:30.5829496Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:02:30.5829610Z 2026-02-21T09:02:30.5829666Z @triton.jit 2026-02-21T09:02:30.5829878Z def _helion_matmul(x, y, out): 2026-02-21T09:02:30.5830110Z 
# src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:02:30.5830355Z num_pid_m = tl.cdiv(1024, _BLOCK_SIZE_1) 2026-02-21T09:02:30.5830545Z num_pid_n = tl.cdiv(4096, _BLOCK_SIZE_0) 2026-02-21T09:02:30.5830741Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:02:30.5830922Z num_pid_in_group = 8 * num_pid_n 2026-02-21T09:02:30.5831189Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:02:30.5831384Z first_pid_m = group_id * 8 2026-02-21T09:02:30.5831581Z group_size_m = min(num_pid_m - first_pid_m, 8) 2026-02-21T09:02:30.5831836Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:02:30.5832120Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:02:30.5832343Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:02:30.5832564Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:02:30.5832804Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:02:30.5833017Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:02:30.5833312Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:02:30.5833605Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:02:30.5833844Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:02:30.5834116Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:02:30.5834459Z for offset_2 in tl.range(0, 1024, _BLOCK_SIZE_2, warp_specialize=False, flatten=False): 2026-02-21T09:02:30.5834872Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:02:30.5835099Z acc_copy = acc 2026-02-21T09:02:30.5835257Z acc_copy_0 = acc_copy 2026-02-21T09:02:30.5835491Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:02:30.5835823Z load = tl.load(x + (indices_0[:, None] * 1024 + indices_2[None, :] * 1), None) 2026-02-21T09:02:30.5836145Z load_1 = tl.load(y + (indices_2[:, 
None] * 1 + indices_1[None, :] * 1024), None) 2026-02-21T09:02:30.5836585Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:02:30.5837006Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:02:30.5837250Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:02:30.5837498Z tl.store(out + (indices_0[:, None] * 1024 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:02:30.5837686Z 2026-02-21T09:02:30.5837972Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:02:30.5838405Z """ 2026-02-21T09:02:30.5838630Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:02:30.5838889Z Args: 2026-02-21T09:02:30.5839043Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:02:30.5839250Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:02:30.5839543Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:02:30.5839869Z after the matmul. Defaults to identity (no change). 2026-02-21T09:02:30.5840070Z Returns: 2026-02-21T09:02:30.5840232Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:02:30.5840458Z """ 2026-02-21T09:02:30.5840603Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:02:30.5840782Z m, k = x.size() 2026-02-21T09:02:30.5840944Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:02:30.5841129Z k2, n = y.size() 2026-02-21T09:02:30.5841333Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:02:30.5841582Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:02:30.5841816Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:02:30.5842110Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:02:30.5842397Z # src[matmul.py:62]: ) 2026-02-21T09:02:30.5842657Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:02:30.5842976Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:02:30.5843198Z _BLOCK_SIZE_1 = 256 2026-02-21T09:02:30.5843392Z _BLOCK_SIZE_0 = 128 2026-02-21T09:02:30.5843579Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:02:30.5843855Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:02:30.5844122Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:02:30.5844331Z # src[matmul.py:63-67]: ... 
2026-02-21T09:02:30.5844721Z _launcher(_helion_matmul, (triton.cdiv(1024, _BLOCK_SIZE_1) * triton.cdiv(4096, _BLOCK_SIZE_0),), x, y, out, num_warps=4, num_stages=4) 2026-02-21T09:02:30.5845109Z # src[matmul.py:68]: return out 2026-02-21T09:02:30.5845281Z return out 2026-02-21T09:02:50.7010976Z WARNING:tritonbench.utils.triton_op:Completed input ID 0: 2026-02-21T09:02:50.7015282Z (M, N, K) 2026-02-21T09:02:50.7019733Z ------------------ 2026-02-21T09:02:50.7023520Z (4096, 1024, 1024) 2026-02-21T09:02:50.7027376Z 2026-02-21T09:02:50.7028138Z 12%|█▎ | 1/8 [04:46<33:27, 286.76s/it]WARNING:tritonbench.utils.triton_op:Running input ID 2: 2026-02-21T09:02:50.7028455Z (M, N, K) 2026-02-21T09:02:50.7032302Z ------------------ 2026-02-21T09:02:50.7032517Z (4096, 2048, 2048) 2026-02-21T09:02:50.7038073Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:03:38.7302312Z INFO:tritonbench.utils.triton_op:Took 0.02ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:04:17.5742764Z INFO:tritonbench.utils.triton_op:Took 88.78ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:04:59.4253241Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:04:59.4254492Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:04:59.4255023Z 'dtype': 'torch.float16', 2026-02-21T09:04:59.4257572Z 'shape': (4096, 2048), 2026-02-21T09:04:59.4257797Z 'stride': (2048, 1)}, 2026-02-21T09:04:59.4257982Z { 'device': 'cuda:0', 2026-02-21T09:04:59.4258160Z 'dtype': 'torch.float16', 2026-02-21T09:04:59.4258359Z 'shape': (2048, 2048), 2026-02-21T09:04:59.4258534Z 'stride': (1, 2048)}, 2026-02-21T09:04:59.4258698Z None), 2026-02-21T09:04:59.4258833Z 'kwargs': {}} 2026-02-21T09:04:59.4295531Z INFO:tritonbench.utils.triton_op:Took 4.65ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:04:59.5193680Z [0s] Autotune random seed: 2137757931 2026-02-21T09:04:59.6449392Z [0s] Starting LFBOPatternSearch with 
initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:05:03.7472378Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 19.4 configs/s 2026-02-21T09:05:18.9918115Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 6.5 configs/s 2026-02-21T09:05:18.9929584Z [19s] Adaptive compile timeout: 30s (90% percentile=2.0s, bounds=[30.0s, 30s]) 2026-02-21T09:05:19.2785539Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 3121.1 configs/s 2026-02-21T09:05:19.3264938Z [19s] Initial random population of 100, 5 starting points: 2026-02-21T09:05:19.3269691Z error=14 2026-02-21T09:05:19.3274070Z ok=86 2026-02-21T09:05:19.3275870Z min=0.1085 2026-02-21T09:05:19.3276025Z mid=2.5579 2026-02-21T09:05:19.3276161Z max=531.9147 2026-02-21T09:05:19.3276312Z best={'block_sizes': [64, 256, 16], 2026-02-21T09:05:19.3276572Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:05:19.3276786Z 'l2_groupings': [1], 2026-02-21T09:05:19.3276962Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:05:19.3277376Z 'loop_orders': [[1, 0]], 2026-02-21T09:05:19.3277546Z 'num_stages': 7, 2026-02-21T09:05:19.3277694Z 'num_warps': 8, 2026-02-21T09:05:19.3277834Z 'pid_type': 'flat', 2026-02-21T09:05:19.3278003Z 'range_flattens': [None, None], 2026-02-21T09:05:19.3278185Z 'range_multi_buffers': [None, False], 2026-02-21T09:05:19.3278381Z 'range_num_stages': [0, 0], 2026-02-21T09:05:19.3278547Z 'range_unroll_factors': [0, 0], 2026-02-21T09:05:19.3278836Z 'range_warp_specializes': [None, None]} 2026-02-21T09:05:19.3283558Z [19s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:05:20.6075219Z [20s] Generation 1 starting: 88 neighbors, 5 active search path(s) 2026-02-21T09:05:24.8661989Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93/93 5.8 configs/s 2026-02-21T09:05:29.1753354Z [29s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:05:29.1753628Z 2026-02-21T09:05:29.1758627Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:29.1759663Z 2026-02-21T09:05:29.1759871Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:29.1760147Z ================================================================ 2026-02-21T09:05:29.1761690Z `ptxas` stderr: 2026-02-21T09:05:29.1762161Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:29.1762932Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.1763082Z 2026-02-21T09:05:29.1763491Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpmriufajt.ptx -o /tmp/tmpmriufajt.ptx.o 2026-02-21T09:05:29.1763921Z 2026-02-21T09:05:29.1764049Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:29.1764317Z Internal Triton PTX codegen error 2026-02-21T09:05:29.1764491Z `ptxas` stderr: 2026-02-21T09:05:29.1765022Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:29.1765522Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.1765669Z 2026-02-21T09:05:29.1766056Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpmriufajt.ptx -o /tmp/tmpmriufajt.ptx.o 2026-02-21T09:05:29.1766519Z 2026-02-21T09:05:29.1766523Z 2026-02-21T09:05:29.1766650Z // 2026-02-21T09:05:29.1766801Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:29.1766988Z // 2026-02-21T09:05:29.1767056Z 2026-02-21T09:05:29.1767125Z .version 8.7 2026-02-21T09:05:29.1767259Z .target sm_100a 2026-02-21T09:05:29.1767397Z .address_size 64 2026-02-21T09:05:29.1767477Z 2026-02-21T09:05:29.1767597Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:29.1767854Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:29.1768060Z // @_helion_matmul 2026-02-21T09:05:29.1768344Z .visible .entry _helion_matmul( 2026-02-21T09:05:29.1768557Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:29.1768805Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:29.1769051Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:29.1769283Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:29.1769529Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:29.1769723Z ) 2026-02-21T09:05:29.1769847Z .reqntid 384 2026-02-21T09:05:29.1769972Z .maxnreg 32 2026-02-21T09:05:29.1770098Z { 2026-02-21T09:05:29.1770218Z .reg .pred %p<122>; 2026-02-21T09:05:29.1770368Z .reg .b32 %r<339>; 2026-02-21T09:05:29.1770508Z .reg .b64 %rd<113>; 2026-02-21T09:05:29.1770909Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19:0 2026-02-21T09:05:29.1771259Z $L__func_begin0: 2026-02-21T09:05:29.1771503Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19:0 2026-02-21T09:05:29.1771731Z 
2026-02-21T09:05:29.1771782Z // %bb.0: 2026-02-21T09:05:29.1771927Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:05:29.1772117Z $L__tmp0: 2026-02-21T09:05:29.1772345Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19 2026-02-21T09:05:29.1772615Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:29.1772764Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:29.1772918Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:29.1773105Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:05:29.1773256Z @%p3 bra $L__BB0_16; 2026-02-21T09:05:29.1773404Z bra.uni $L__BB0_1; 2026-02-21T09:05:29.1773535Z $L__BB0_16: 2026-02-21T09:05:29.1773769Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0:0 2026-02-21T09:05:29.1774069Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:05:29.1774276Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:05:29.1774483Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:05:29.1774803Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19 2026-02-21T09:05:29.1775146Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.1775335Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:05:29.1775510Z mov.b32 %r146, global_smem; 2026-02-21T09:05:29.1775678Z // begin inline asm 2026-02-21T09:05:29.1775931Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r146], 32; 2026-02-21T09:05:29.1776183Z // end inline asm 2026-02-21T09:05:29.1776315Z bar.sync 0, 256; 2026-02-21T09:05:29.1776472Z ld.shared.b32 %r310, [global_smem]; 2026-02-21T09:05:29.1776648Z bar.sync 0, 256; 2026-02-21T09:05:29.1776799Z // begin inline asm 2026-02-21T09:05:29.1777001Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:29.1777232Z // end inline asm 2026-02-21T09:05:29.1777479Z .loc 1 21 67 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:21:67 2026-02-21T09:05:29.1777761Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:05:29.1777918Z mov.u32 %r204, %ctaid.y; 
2026-02-21T09:05:29.1778066Z mov.u32 %r205, %ctaid.z; 2026-02-21T09:05:29.1778220Z mov.u32 %r206, %nctaid.x; 2026-02-21T09:05:29.1778370Z mov.u32 %r207, %nctaid.y; 2026-02-21T09:05:29.1778532Z mad.lo.s32 %r208, %r205, %r207, %r204; 2026-02-21T09:05:29.1778749Z mad.lo.s32 %r209, %r208, %r206, %r41; 2026-02-21T09:05:29.1778927Z mul.lo.s32 %r210, %r209, 384; 2026-02-21T09:05:29.1779084Z cvt.s64.s32 %rd77, %r210; 2026-02-21T09:05:29.1779245Z add.s64 %rd38, %rd7, %rd77; 2026-02-21T09:05:29.1779406Z shl.b32 %r211, %r1, 2; 2026-02-21T09:05:29.1779554Z add.s32 %r147, %r146, %r211; 2026-02-21T09:05:29.1779709Z mov.b32 %r338, 0; 2026-02-21T09:05:29.1779842Z // begin inline asm 2026-02-21T09:05:29.1780001Z @%p29 st.shared.b32 [ %r147 + 0 ], %r338; 2026-02-21T09:05:29.1780200Z // end inline asm 2026-02-21T09:05:29.1780339Z bar.warp.sync -1; 2026-02-21T09:05:29.1780479Z setp.eq.b32 %p110, %r1, 0; 2026-02-21T09:05:29.1780639Z cvt.u64.u32 %rd23, %r146; 2026-02-21T09:05:29.1780783Z // begin inline asm 2026-02-21T09:05:29.1781035Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd4; 2026-02-21T09:05:29.1781313Z // end inline asm 2026-02-21T09:05:29.1781441Z // begin inline asm 2026-02-21T09:05:29.1781664Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1781909Z // end inline asm 2026-02-21T09:05:29.1782044Z mov.b32 %r149, 16; 2026-02-21T09:05:29.1782175Z // begin inline asm 2026-02-21T09:05:29.1782411Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.1782678Z // end inline asm 2026-02-21T09:05:29.1782806Z mov.b32 %r150, 256; 2026-02-21T09:05:29.1782950Z // begin inline asm 2026-02-21T09:05:29.1783204Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r150; 2026-02-21T09:05:29.1783479Z // end inline asm 2026-02-21T09:05:29.1783609Z mov.b32 %r151, 2048; 2026-02-21T09:05:29.1783753Z // begin inline asm 
2026-02-21T09:05:29.1783993Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.1784276Z // end inline asm 2026-02-21T09:05:29.1784413Z mov.b32 %r152, 4096; 2026-02-21T09:05:29.1784551Z // begin inline asm 2026-02-21T09:05:29.1784837Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r152; 2026-02-21T09:05:29.1785111Z // end inline asm 2026-02-21T09:05:29.1785251Z mov.b64 %rd31, 4096; 2026-02-21T09:05:29.1785389Z // begin inline asm 2026-02-21T09:05:29.1785646Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.1785930Z // end inline asm 2026-02-21T09:05:29.1786058Z mov.b32 %r153, 1; 2026-02-21T09:05:29.1786197Z // begin inline asm 2026-02-21T09:05:29.1786449Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.1786745Z // end inline asm 2026-02-21T09:05:29.1786878Z // begin inline asm 2026-02-21T09:05:29.1787175Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.1787451Z // end inline asm 2026-02-21T09:05:29.1787589Z // begin inline asm 2026-02-21T09:05:29.1787823Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.1788075Z // end inline asm 2026-02-21T09:05:29.1788214Z // begin inline asm 2026-02-21T09:05:29.1788455Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.1788734Z // end inline asm 2026-02-21T09:05:29.1788862Z // begin inline asm 2026-02-21T09:05:29.1789101Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1789368Z // end inline asm 2026-02-21T09:05:29.1789499Z // begin inline asm 2026-02-21T09:05:29.1789729Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 
2026-02-21T09:05:29.1789980Z // end inline asm 2026-02-21T09:05:29.1790117Z // begin inline asm 2026-02-21T09:05:29.1790451Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd38 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.1790814Z // end inline asm 2026-02-21T09:05:29.1790981Z // begin inline asm 2026-02-21T09:05:29.1791184Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd38 + 0 ], 0x80; 2026-02-21T09:05:29.1791431Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.1791613Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.1791790Z // end inline asm 2026-02-21T09:05:29.1791916Z bar.sync 0, 256; 2026-02-21T09:05:29.1792159Z .loc 1 22 67 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:22:67 2026-02-21T09:05:29.1792473Z add.s32 %r212, %r210, 128; 2026-02-21T09:05:29.1792637Z cvt.s64.s32 %rd78, %r212; 2026-02-21T09:05:29.1792794Z add.s64 %rd56, %rd7, %rd78; 2026-02-21T09:05:29.1792942Z bar.sync 0, 256; 2026-02-21T09:05:29.1793077Z // begin inline asm 2026-02-21T09:05:29.1793226Z @%p29 st.shared.b32 [ %r147 + 0 ], %r338; 2026-02-21T09:05:29.1793405Z // end inline asm 2026-02-21T09:05:29.1793536Z bar.warp.sync -1; 2026-02-21T09:05:29.1793679Z // begin inline asm 2026-02-21T09:05:29.1793921Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd5; 2026-02-21T09:05:29.1794223Z // end inline asm 2026-02-21T09:05:29.1794364Z // begin inline asm 2026-02-21T09:05:29.1794591Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1796083Z // end inline asm 2026-02-21T09:05:29.1796223Z // begin inline asm 2026-02-21T09:05:29.1796525Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.1796816Z // end inline asm 2026-02-21T09:05:29.1796966Z // begin inline asm 2026-02-21T09:05:29.1797208Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, 
%r149; 2026-02-21T09:05:29.1797497Z // end inline asm 2026-02-21T09:05:29.1797642Z // begin inline asm 2026-02-21T09:05:29.1797903Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.1798203Z // end inline asm 2026-02-21T09:05:29.1798344Z // begin inline asm 2026-02-21T09:05:29.1798610Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r151; 2026-02-21T09:05:29.1798891Z // end inline asm 2026-02-21T09:05:29.1799034Z // begin inline asm 2026-02-21T09:05:29.1799299Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.1799591Z // end inline asm 2026-02-21T09:05:29.1799732Z // begin inline asm 2026-02-21T09:05:29.1799995Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.1800294Z // end inline asm 2026-02-21T09:05:29.1800429Z // begin inline asm 2026-02-21T09:05:29.1800697Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.1801038Z // end inline asm 2026-02-21T09:05:29.1801174Z // begin inline asm 2026-02-21T09:05:29.1801422Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.1801696Z // end inline asm 2026-02-21T09:05:29.1801838Z // begin inline asm 2026-02-21T09:05:29.1802099Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.1802393Z // end inline asm 2026-02-21T09:05:29.1802519Z // begin inline asm 2026-02-21T09:05:29.1802755Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1803025Z // end inline asm 2026-02-21T09:05:29.1803155Z // begin inline asm 2026-02-21T09:05:29.1803386Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.1803639Z // end inline asm 
2026-02-21T09:05:29.1803774Z // begin inline asm 2026-02-21T09:05:29.1804105Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd56 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.1804486Z // end inline asm 2026-02-21T09:05:29.1804619Z // begin inline asm 2026-02-21T09:05:29.1804889Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd56 + 0 ], 0x80; 2026-02-21T09:05:29.1805142Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.1805327Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.1805509Z // end inline asm 2026-02-21T09:05:29.1805637Z bar.sync 0, 256; 2026-02-21T09:05:29.1805885Z .loc 1 24 71 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:24:71 2026-02-21T09:05:29.1806173Z add.s32 %r213, %r210, 256; 2026-02-21T09:05:29.1806331Z cvt.s64.s32 %rd79, %r213; 2026-02-21T09:05:29.1806549Z add.s64 %rd74, %rd7, %rd79; 2026-02-21T09:05:29.1806698Z bar.sync 0, 256; 2026-02-21T09:05:29.1806834Z // begin inline asm 2026-02-21T09:05:29.1806981Z @%p29 st.shared.b32 [ %r147 + 0 ], %r338; 2026-02-21T09:05:29.1807161Z // end inline asm 2026-02-21T09:05:29.1807294Z bar.warp.sync -1; 2026-02-21T09:05:29.1807439Z // begin inline asm 2026-02-21T09:05:29.1807687Z @%p110 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd6; 2026-02-21T09:05:29.1807995Z // end inline asm 2026-02-21T09:05:29.1808140Z // begin inline asm 2026-02-21T09:05:29.1808358Z @%p110 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1808611Z // end inline asm 2026-02-21T09:05:29.1808739Z // begin inline asm 2026-02-21T09:05:29.1808972Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.1809236Z // end inline asm 2026-02-21T09:05:29.1809411Z // begin inline asm 2026-02-21T09:05:29.1809654Z @%p110 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r150; 2026-02-21T09:05:29.1809912Z // end inline 
asm 2026-02-21T09:05:29.1810047Z // begin inline asm 2026-02-21T09:05:29.1810283Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.1810565Z // end inline asm 2026-02-21T09:05:29.1810692Z // begin inline asm 2026-02-21T09:05:29.1810936Z @%p110 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r152; 2026-02-21T09:05:29.1811207Z // end inline asm 2026-02-21T09:05:29.1811337Z // begin inline asm 2026-02-21T09:05:29.1811593Z @%p110 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.1811875Z // end inline asm 2026-02-21T09:05:29.1812013Z // begin inline asm 2026-02-21T09:05:29.1812266Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.1812563Z // end inline asm 2026-02-21T09:05:29.1812693Z // begin inline asm 2026-02-21T09:05:29.1812947Z @%p110 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.1813228Z // end inline asm 2026-02-21T09:05:29.1813389Z // begin inline asm 2026-02-21T09:05:29.1813617Z @%p110 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.1813867Z // end inline asm 2026-02-21T09:05:29.1814001Z // begin inline asm 2026-02-21T09:05:29.1814244Z @%p110 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.1814521Z // end inline asm 2026-02-21T09:05:29.1814653Z // begin inline asm 2026-02-21T09:05:29.1814916Z @%p110 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.1815184Z // end inline asm 2026-02-21T09:05:29.1815313Z // begin inline asm 2026-02-21T09:05:29.1815541Z @%p110 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.1815797Z // end inline asm 2026-02-21T09:05:29.1815931Z // begin inline asm 
2026-02-21T09:05:29.1816270Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd74 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.1816648Z // end inline asm 2026-02-21T09:05:29.1816784Z // begin inline asm 2026-02-21T09:05:29.1816985Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd74 + 0 ], 0x80; 2026-02-21T09:05:29.1817274Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.1817461Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.1817645Z // end inline asm 2026-02-21T09:05:29.1817775Z bar.sync 0, 256; 2026-02-21T09:05:29.1817925Z cvta.global.u64 %rd80, %rd74; 2026-02-21T09:05:29.1818209Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1818502Z max.u32 %r214, %r41, 2047; 2026-02-21T09:05:29.1818676Z shl.b32 %r215, %r214, 7; 2026-02-21T09:05:29.1818864Z sub.s32 %r42, 262144, %r215; 2026-02-21T09:05:29.1819130Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1819422Z shfl.sync.idx.b32 %r216, %r2, 0, 31, -1; 2026-02-21T09:05:29.1819612Z shl.b32 %r217, %r216, 21; 2026-02-21T09:05:29.1819773Z and.b32 %r218, %r217, 6291456; 2026-02-21T09:05:29.1819938Z add.s32 %r219, %r218, %r310; 2026-02-21T09:05:29.1820102Z shl.b32 %r220, %r216, 2; 2026-02-21T09:05:29.1820254Z and.b32 %r221, %r220, 16; 2026-02-21T09:05:29.1820417Z add.s32 %r252, %r219, %r221; 2026-02-21T09:05:29.1820572Z mov.pred %p85, -1; 2026-02-21T09:05:29.1820722Z // begin inline asm 2026-02-21T09:05:29.1821085Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r252 + 0], {%r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338, %r338}; 2026-02-21T09:05:29.1821490Z // end inline asm 2026-02-21T09:05:29.1821665Z // begin inline asm 2026-02-21T09:05:29.1821819Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:29.1821988Z // end inline asm 2026-02-21T09:05:29.1822122Z bar.sync 0, 256; 
2026-02-21T09:05:29.1822375Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1822660Z add.s32 %r188, %r146, 43008; 2026-02-21T09:05:29.1822825Z // begin inline asm 2026-02-21T09:05:29.1822990Z @%p110 mbarrier.init.shared::cta.b64 [%r188], 1; 2026-02-21T09:05:29.1823185Z // end inline asm 2026-02-21T09:05:29.1823325Z bar.sync 0, 256; 2026-02-21T09:05:29.1823463Z add.s32 %r189, %r146, 43016; 2026-02-21T09:05:29.1823620Z // begin inline asm 2026-02-21T09:05:29.1823782Z @%p110 mbarrier.init.shared::cta.b64 [%r189], 1; 2026-02-21T09:05:29.1823978Z // end inline asm 2026-02-21T09:05:29.1824109Z bar.sync 0, 256; 2026-02-21T09:05:29.1824249Z add.s32 %r190, %r146, 43024; 2026-02-21T09:05:29.1824400Z // begin inline asm 2026-02-21T09:05:29.1824569Z @%p110 mbarrier.init.shared::cta.b64 [%r190], 1; 2026-02-21T09:05:29.1824784Z // end inline asm 2026-02-21T09:05:29.1824917Z bar.sync 0, 256; 2026-02-21T09:05:29.1825054Z add.s32 %r191, %r146, 43032; 2026-02-21T09:05:29.1825200Z // begin inline asm 2026-02-21T09:05:29.1825360Z @%p110 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T09:05:29.1825569Z // end inline asm 2026-02-21T09:05:29.1825703Z add.s32 %r192, %r146, 43040; 2026-02-21T09:05:29.1825848Z // begin inline asm 2026-02-21T09:05:29.1826011Z @%p110 mbarrier.init.shared::cta.b64 [%r192], 1; 2026-02-21T09:05:29.1826187Z // end inline asm 2026-02-21T09:05:29.1826319Z bar.sync 0, 256; 2026-02-21T09:05:29.1826454Z add.s32 %r193, %r146, 43048; 2026-02-21T09:05:29.1826600Z // begin inline asm 2026-02-21T09:05:29.1826762Z @%p110 mbarrier.init.shared::cta.b64 [%r193], 1; 2026-02-21T09:05:29.1826940Z // end inline asm 2026-02-21T09:05:29.1827073Z bar.sync 0, 256; 2026-02-21T09:05:29.1827204Z add.s32 %r194, %r146, 43056; 2026-02-21T09:05:29.1827360Z // begin inline asm 2026-02-21T09:05:29.1827518Z @%p110 mbarrier.init.shared::cta.b64 [%r194], 1; 2026-02-21T09:05:29.1827703Z // end inline asm 2026-02-21T09:05:29.1827829Z 
bar.sync 0, 256; 2026-02-21T09:05:29.1827966Z add.s32 %r195, %r146, 43064; 2026-02-21T09:05:29.1828122Z // begin inline asm 2026-02-21T09:05:29.1828274Z @%p110 mbarrier.init.shared::cta.b64 [%r195], 1; 2026-02-21T09:05:29.1828457Z // end inline asm 2026-02-21T09:05:29.1828721Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1829004Z bar.sync 0, 256; 2026-02-21T09:05:29.1829131Z // begin inline asm 2026-02-21T09:05:29.1829305Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r188]; 2026-02-21T09:05:29.1829496Z // end inline asm 2026-02-21T09:05:29.1829630Z bar.sync 0, 256; 2026-02-21T09:05:29.1829764Z // begin inline asm 2026-02-21T09:05:29.1829928Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r189]; 2026-02-21T09:05:29.1830122Z // end inline asm 2026-02-21T09:05:29.1830251Z bar.sync 0, 256; 2026-02-21T09:05:29.1830419Z // begin inline asm 2026-02-21T09:05:29.1830580Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r190]; 2026-02-21T09:05:29.1830771Z // end inline asm 2026-02-21T09:05:29.1830897Z bar.sync 0, 256; 2026-02-21T09:05:29.1831034Z // begin inline asm 2026-02-21T09:05:29.1831199Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r191]; 2026-02-21T09:05:29.1831380Z // end inline asm 2026-02-21T09:05:29.1831628Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1831901Z bar.sync 0, 256; 2026-02-21T09:05:29.1832040Z add.s32 %r200, %r146, 43072; 2026-02-21T09:05:29.1832185Z // begin inline asm 2026-02-21T09:05:29.1832349Z @%p110 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T09:05:29.1832526Z // end inline asm 2026-02-21T09:05:29.1832662Z add.s32 %r298, %r146, 43088; 2026-02-21T09:05:29.1832811Z // begin inline asm 2026-02-21T09:05:29.1833000Z @%p110 mbarrier.init.shared::cta.b64 [%r298], 1; 2026-02-21T09:05:29.1833190Z // end inline asm 2026-02-21T09:05:29.1833421Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1833704Z 
bar.sync 0, 256; 2026-02-21T09:05:29.1833833Z // begin inline asm 2026-02-21T09:05:29.1833998Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r298]; 2026-02-21T09:05:29.1834180Z // end inline asm 2026-02-21T09:05:29.1834422Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1834752Z st.shared.b32 [global_smem+43096], 33554689; 2026-02-21T09:05:29.1834952Z st.shared.b32 [global_smem+32768], %r310; 2026-02-21T09:05:29.1835134Z barrier.sync 1; 2026-02-21T09:05:29.1835289Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.1835473Z barrier.sync 1; 2026-02-21T09:05:29.1835611Z setp.lt.s32 %p101, %r42, 1; 2026-02-21T09:05:29.1835776Z @%p101 bra $L__BB0_23; 2026-02-21T09:05:29.1835942Z // %bb.17: // %.lr.ph12 2026-02-21T09:05:29.1836140Z add.s32 %r335, %r41, -1; 2026-02-21T09:05:29.1836297Z shl.b32 %r224, %r1, 5; 2026-02-21T09:05:29.1836447Z and.b32 %r225, %r224, 8032; 2026-02-21T09:05:29.1836615Z bfe.s32 %r226, %r1, 2, 1; 2026-02-21T09:05:29.1836793Z and.b32 %r227, %r226, 144; 2026-02-21T09:05:29.1836951Z or.b32 %r228, %r227, %r225; 2026-02-21T09:05:29.1837099Z add.s32 %r230, %r146, 32768; 2026-02-21T09:05:29.1837259Z add.s32 %r45, %r230, %r228; 2026-02-21T09:05:29.1837415Z xor.b32 %r231, %r228, 16; 2026-02-21T09:05:29.1837576Z add.s32 %r46, %r230, %r231; 2026-02-21T09:05:29.1837727Z mov.b32 %r332, -1; 2026-02-21T09:05:29.1837875Z mov.b32 %r336, %r338; 2026-02-21T09:05:29.1838027Z mov.b32 %r337, %r338; 2026-02-21T09:05:29.1838172Z mov.b32 %r333, %r338; 2026-02-21T09:05:29.1838326Z bra.uni $L__BB0_18; 2026-02-21T09:05:29.1838520Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.1838862Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1839154Z bar.sync 0, 256; 2026-02-21T09:05:29.1839299Z // begin inline asm 2026-02-21T09:05:29.1839435Z 2026-02-21T09:05:29.1839557Z { 2026-02-21T09:05:29.1839690Z .reg .pred complete; 2026-02-21T09:05:29.1839837Z waitLoop: 
2026-02-21T09:05:29.1840039Z mbarrier.try_wait.parity.shared.b64 complete, [%r200], %r338; 2026-02-21T09:05:29.1840278Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.1840437Z } 2026-02-21T09:05:29.1840543Z 2026-02-21T09:05:29.1840603Z // end inline asm 2026-02-21T09:05:29.1840749Z // begin inline asm 2026-02-21T09:05:29.1841112Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r236, %r237, %r238, %r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251}, [%r252 + 0]; 2026-02-21T09:05:29.1841507Z // end inline asm 2026-02-21T09:05:29.1841651Z // begin inline asm 2026-02-21T09:05:29.1841804Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:29.1841976Z // end inline asm 2026-02-21T09:05:29.1842149Z bar.sync 0, 256; 2026-02-21T09:05:29.1842292Z // begin inline asm 2026-02-21T09:05:29.1842464Z @%p110 mbarrier.arrive.shared::cta.b64 _, [%r298]; 2026-02-21T09:05:29.1842666Z // end inline asm 2026-02-21T09:05:29.1842807Z cvt.u64.u32 %rd81, %r236; 2026-02-21T09:05:29.1842970Z cvt.u64.u32 %rd82, %r237; 2026-02-21T09:05:29.1843124Z shl.b64 %rd83, %rd82, 32; 2026-02-21T09:05:29.1843282Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T09:05:29.1843558Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1843849Z mov.b64 {%r259, %r260}, %rd84; 2026-02-21T09:05:29.1844029Z cvt.rn.f16x2.f32 %r261, %r260, %r259; 2026-02-21T09:05:29.1844315Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1844599Z cvt.u64.u32 %rd85, %r238; 2026-02-21T09:05:29.1844795Z cvt.u64.u32 %rd86, %r239; 2026-02-21T09:05:29.1844979Z shl.b64 %rd87, %rd86, 32; 2026-02-21T09:05:29.1845137Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T09:05:29.1845388Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1845669Z mov.b64 {%r262, %r263}, %rd88; 2026-02-21T09:05:29.1845837Z cvt.rn.f16x2.f32 %r264, %r263, %r262; 2026-02-21T09:05:29.1846126Z .loc 1 56 52 // 
c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1846395Z cvt.u64.u32 %rd89, %r240; 2026-02-21T09:05:29.1846550Z cvt.u64.u32 %rd90, %r241; 2026-02-21T09:05:29.1846694Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:05:29.1846846Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:05:29.1847099Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1847367Z mov.b64 {%r265, %r266}, %rd92; 2026-02-21T09:05:29.1847532Z cvt.rn.f16x2.f32 %r267, %r266, %r265; 2026-02-21T09:05:29.1847795Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1848071Z cvt.u64.u32 %rd93, %r242; 2026-02-21T09:05:29.1848213Z cvt.u64.u32 %rd94, %r243; 2026-02-21T09:05:29.1848361Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:05:29.1848553Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:05:29.1848807Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1849097Z mov.b64 {%r268, %r269}, %rd96; 2026-02-21T09:05:29.1849263Z cvt.rn.f16x2.f32 %r270, %r269, %r268; 2026-02-21T09:05:29.1849541Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1849817Z cvt.u64.u32 %rd97, %r244; 2026-02-21T09:05:29.1849977Z cvt.u64.u32 %rd98, %r245; 2026-02-21T09:05:29.1850125Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:05:29.1850287Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:05:29.1850556Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1850835Z mov.b64 {%r271, %r272}, %rd100; 2026-02-21T09:05:29.1851015Z cvt.rn.f16x2.f32 %r273, %r272, %r271; 2026-02-21T09:05:29.1851284Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1851574Z cvt.u64.u32 %rd101, %r246; 2026-02-21T09:05:29.1851733Z cvt.u64.u32 %rd102, %r247; 2026-02-21T09:05:29.1851894Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:05:29.1852101Z or.b64 %rd104, %rd101, %rd103; 
2026-02-21T09:05:29.1852357Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1852640Z mov.b64 {%r274, %r275}, %rd104; 2026-02-21T09:05:29.1852803Z cvt.rn.f16x2.f32 %r276, %r275, %r274; 2026-02-21T09:05:29.1853072Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1853347Z cvt.u64.u32 %rd105, %r248; 2026-02-21T09:05:29.1853507Z cvt.u64.u32 %rd106, %r249; 2026-02-21T09:05:29.1853687Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:05:29.1853852Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:05:29.1854116Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1854402Z mov.b64 {%r277, %r278}, %rd108; 2026-02-21T09:05:29.1854577Z cvt.rn.f16x2.f32 %r279, %r278, %r277; 2026-02-21T09:05:29.1854876Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1855161Z cvt.u64.u32 %rd109, %r250; 2026-02-21T09:05:29.1855313Z cvt.u64.u32 %rd110, %r251; 2026-02-21T09:05:29.1855467Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:05:29.1855626Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:05:29.1855885Z .loc 1 58 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:58:27 2026-02-21T09:05:29.1856167Z mov.b64 {%r280, %r281}, %rd112; 2026-02-21T09:05:29.1856366Z cvt.rn.f16x2.f32 %r282, %r281, %r280; 2026-02-21T09:05:29.1856646Z .loc 1 59 45 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:59:45 2026-02-21T09:05:29.1856943Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.1857130Z bar.sync 0, 256; 2026-02-21T09:05:29.1857310Z st.shared.v4.b32 [%r45], {%r261, %r264, %r267, %r270}; 2026-02-21T09:05:29.1857551Z st.shared.v4.b32 [%r46], {%r273, %r276, %r279, %r282}; 2026-02-21T09:05:29.1857754Z // begin inline asm 2026-02-21T09:05:29.1857914Z fence.proxy.async.shared::cta; 2026-02-21T09:05:29.1858088Z // end inline asm 2026-02-21T09:05:29.1858223Z bar.sync 0, 256; 2026-02-21T09:05:29.1858372Z 
elect.sync %r283|%p108, -1; 2026-02-21T09:05:29.1858536Z and.pred %p106, %p29, %p108; 2026-02-21T09:05:29.1858703Z // begin inline asm 2026-02-21T09:05:29.1858964Z @%p106 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd80, {%r337, %r336}], [%r230]; 2026-02-21T09:05:29.1859258Z // end inline asm 2026-02-21T09:05:29.1859414Z cp.async.bulk.commit_group; 2026-02-21T09:05:29.1859574Z mov.b32 %r334, 1; 2026-02-21T09:05:29.1859765Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.1860082Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1860401Z xor.b32 %r338, %r334, %r338; 2026-02-21T09:05:29.1860662Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1860952Z add.s32 %r333, %r333, 1; 2026-02-21T09:05:29.1861113Z setp.lt.s32 %p109, %r333, %r42; 2026-02-21T09:05:29.1861276Z @%p109 bra $L__BB0_18; 2026-02-21T09:05:29.1861428Z bra.uni $L__BB0_23; 2026-02-21T09:05:29.1861613Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:29.1861831Z add.s32 %r233, %r332, 1; 2026-02-21T09:05:29.1861981Z setp.eq.b32 %p102, %r332, 127; 2026-02-21T09:05:29.1862152Z selp.b32 %r332, 0, %r233, %p102; 2026-02-21T09:05:29.1862318Z setp.eq.b32 %p103, %r332, 127; 2026-02-21T09:05:29.1862483Z @%p103 bra $L__BB0_21; 2026-02-21T09:05:29.1862673Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.1862982Z .loc 1 0 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0:106 2026-02-21T09:05:29.1863266Z mov.b32 %r334, 0; 2026-02-21T09:05:29.1863502Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1863820Z setp.ne.b32 %p104, %r332, 0; 2026-02-21T09:05:29.1863975Z @%p104 bra $L__BB0_22; 2026-02-21T09:05:29.1864138Z // %bb.20: // %.thread 2026-02-21T09:05:29.1864345Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.1864547Z add.s32 %r335, %r335, 1; 2026-02-21T09:05:29.1864834Z .loc 1 39 35 
// c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:39:35 2026-02-21T09:05:29.1865141Z shr.s32 %r285, %r335, 31; 2026-02-21T09:05:29.1865295Z shr.u32 %r286, %r285, 26; 2026-02-21T09:05:29.1865443Z add.s32 %r287, %r335, %r286; 2026-02-21T09:05:29.1865601Z shr.s32 %r288, %r287, 6; 2026-02-21T09:05:29.1865850Z .loc 1 40 33 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:40:33 2026-02-21T09:05:29.1866138Z shl.b32 %r289, %r288, 2; 2026-02-21T09:05:29.1866389Z .loc 1 41 39 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:41:39 2026-02-21T09:05:29.1866664Z sub.s32 %r290, 128, %r289; 2026-02-21T09:05:29.1866926Z .loc 1 41 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:41:52 2026-02-21T09:05:29.1867197Z min.s32 %r291, %r290, 4; 2026-02-21T09:05:29.1867459Z .loc 1 42 45 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:45 2026-02-21T09:05:29.1867738Z and.b32 %r292, %r287, -64; 2026-02-21T09:05:29.1867936Z sub.s32 %r293, %r335, %r292; 2026-02-21T09:05:29.1868196Z .loc 1 43 51 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:43:51 2026-02-21T09:05:29.1868467Z div.s32 %r294, %r293, %r291; 2026-02-21T09:05:29.1868720Z .loc 1 42 64 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:64 2026-02-21T09:05:29.1869000Z mul.lo.s32 %r295, %r294, %r291; 2026-02-21T09:05:29.1869167Z sub.s32 %r296, %r293, %r295; 2026-02-21T09:05:29.1869419Z .loc 1 42 30 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:30 2026-02-21T09:05:29.1869695Z add.s32 %r297, %r296, %r289; 2026-02-21T09:05:29.1869946Z .loc 1 44 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:44:27 2026-02-21T09:05:29.1870213Z shl.b32 %r337, %r297, 4; 2026-02-21T09:05:29.1870463Z .loc 1 45 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:45:27 2026-02-21T09:05:29.1870731Z shl.b32 %r336, %r294, 8; 2026-02-21T09:05:29.1870992Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 
2026-02-21T09:05:29.1871275Z bra.uni $L__BB0_22; 2026-02-21T09:05:29.1871460Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:05:29.1871805Z .loc 1 0 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0:106 2026-02-21T09:05:29.1872088Z mov.b32 %r65, global_smem; 2026-02-21T09:05:29.1872247Z add.s32 %r66, %r65, %r3; 2026-02-21T09:05:29.1872393Z mov.u32 %r114, %ctaid.x; 2026-02-21T09:05:29.1872547Z max.u32 %r115, %r114, 2047; 2026-02-21T09:05:29.1872698Z shl.b32 %r116, %r115, 7; 2026-02-21T09:05:29.1872850Z sub.s32 %r5, 262144, %r116; 2026-02-21T09:05:29.1873004Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:05:29.1873159Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.1873343Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1873660Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1873969Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1874146Z barrier.sync 1; 2026-02-21T09:05:29.1874286Z barrier.sync 1; 2026-02-21T09:05:29.1874438Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1874644Z $L__BB0_2: // %.preheader 2026-02-21T09:05:29.1874895Z // =>This Loop Header: Depth=1 2026-02-21T09:05:29.1875167Z // Child Loop BB0_11 Depth 2 2026-02-21T09:05:29.1875400Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:29.1875697Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19 2026-02-21T09:05:29.1875999Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:29.1876179Z barrier.sync 1; 2026-02-21T09:05:29.1876332Z ld.shared.b8 %r64, [%r66+43088]; 2026-02-21T09:05:29.1876505Z setp.gt.u32 %p4, %r64, 3; 2026-02-21T09:05:29.1876698Z @%p4 bra $L__BB0_4; 2026-02-21T09:05:29.1876869Z // %bb.3: // %.preheader 2026-02-21T09:05:29.1877081Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1877292Z $L_brx_0: .branchtargets 2026-02-21T09:05:29.1877437Z $L__BB0_5, 2026-02-21T09:05:29.1877567Z $L__BB0_9, 2026-02-21T09:05:29.1877687Z $L__BB0_15, 2026-02-21T09:05:29.1877815Z $L__BB0_24; 
2026-02-21T09:05:29.1877948Z brx.idx %r64, $L_brx_0; 2026-02-21T09:05:29.1878143Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1878471Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1878779Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1878980Z ld.shared.b32 %r121, [global_smem+32768]; 2026-02-21T09:05:29.1879156Z barrier.sync 1; 2026-02-21T09:05:29.1879321Z @%p17 bra $L__BB0_8; 2026-02-21T09:05:29.1879478Z // %bb.6: // %.lr.ph9 2026-02-21T09:05:29.1879692Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1880001Z .loc 1 0 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0:106 2026-02-21T09:05:29.1880300Z mov.b32 %r315, -1; 2026-02-21T09:05:29.1880454Z mov.pred %p121, 0; 2026-02-21T09:05:29.1880597Z mov.b32 %r312, 0; 2026-02-21T09:05:29.1880745Z mov.b32 %r313, %r312; 2026-02-21T09:05:29.1880896Z mov.b32 %r314, %r312; 2026-02-21T09:05:29.1881050Z mov.b32 %r316, %r312; 2026-02-21T09:05:29.1881235Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.1881488Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.1881809Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1882109Z add.s32 %r127, %r315, 1; 2026-02-21T09:05:29.1882279Z setp.eq.b32 %p26, %r315, 127; 2026-02-21T09:05:29.1882451Z selp.b32 %r315, 0, %r127, %p26; 2026-02-21T09:05:29.1882624Z shl.b32 %r128, %r314, 3; 2026-02-21T09:05:29.1882776Z add.s32 %r130, %r65, %r128; 2026-02-21T09:05:29.1882973Z add.s32 %r131, %r130, 43008; 2026-02-21T09:05:29.1883130Z add.s32 %r119, %r130, 43040; 2026-02-21T09:05:29.1883398Z .loc 1 54 31 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:54:31 2026-02-21T09:05:29.1883696Z shl.b32 %r132, %r314, 13; 2026-02-21T09:05:29.1883850Z add.s32 %r133, %r65, %r132; 2026-02-21T09:05:29.1884113Z .loc 1 55 44 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:55:44 2026-02-21T09:05:29.1884397Z shl.b32 %r134, 
%r314, 9; 2026-02-21T09:05:29.1884555Z add.s32 %r135, %r65, %r134; 2026-02-21T09:05:29.1884737Z add.s32 %r136, %r135, 40960; 2026-02-21T09:05:29.1885007Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1885287Z bar.warp.sync -1; 2026-02-21T09:05:29.1885439Z // begin inline asm 2026-02-21T09:05:29.1885583Z 2026-02-21T09:05:29.1885694Z { 2026-02-21T09:05:29.1885823Z .reg .pred complete; 2026-02-21T09:05:29.1885967Z waitLoop: 2026-02-21T09:05:29.1886165Z mbarrier.try_wait.parity.shared.b64 complete, [%r119], %r313; 2026-02-21T09:05:29.1886405Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.1886569Z } 2026-02-21T09:05:29.1886634Z 2026-02-21T09:05:29.1886733Z // end inline asm 2026-02-21T09:05:29.1886994Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1887294Z setp.eq.b32 %p25, %r315, 127; 2026-02-21T09:05:29.1887564Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1887844Z elect.sync %r137|%p20, -1; 2026-02-21T09:05:29.1888004Z bfe.u32 %r138, %r133, 4, 14; 2026-02-21T09:05:29.1888162Z cvt.u64.u32 %rd20, %r138; 2026-02-21T09:05:29.1888354Z or.b64 %rd14, %rd20, -4611685949674356736; 2026-02-21T09:05:29.1888540Z bfe.u32 %r139, %r136, 4, 14; 2026-02-21T09:05:29.1888691Z cvt.u64.u32 %rd21, %r139; 2026-02-21T09:05:29.1888876Z or.b64 %rd15, %rd21, -4611685949705814016; 2026-02-21T09:05:29.1889061Z mov.b32 %r122, 134479888; 2026-02-21T09:05:29.1889205Z // begin inline asm 2026-02-21T09:05:29.1889431Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r121 + 0 ], %rd14, %rd15, %r122, %p121; 2026-02-21T09:05:29.1889678Z // end inline asm 2026-02-21T09:05:29.1889820Z add.s32 %r140, %r133, 4096; 2026-02-21T09:05:29.1889969Z bfe.u32 %r141, %r140, 4, 14; 2026-02-21T09:05:29.1890122Z cvt.u64.u32 %rd22, %r141; 2026-02-21T09:05:29.1890276Z or.b64 %rd16, %rd22, -4611685949674356736; 2026-02-21T09:05:29.1890452Z // begin inline asm 
2026-02-21T09:05:29.1890673Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r121 + 16 ], %rd16, %rd15, %r122, %p121; 2026-02-21T09:05:29.1890943Z // end inline asm 2026-02-21T09:05:29.1891088Z cvt.u64.u32 %rd18, %r131; 2026-02-21T09:05:29.1891234Z // begin inline asm 2026-02-21T09:05:29.1891443Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd18]; 2026-02-21T09:05:29.1891672Z // end inline asm 2026-02-21T09:05:29.1891820Z and.pred %p24, %p25, %p20; 2026-02-21T09:05:29.1891972Z add.s32 %r142, %r65, 43072; 2026-02-21T09:05:29.1892129Z cvt.u64.u32 %rd19, %r142; 2026-02-21T09:05:29.1892281Z // begin inline asm 2026-02-21T09:05:29.1892477Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd19]; 2026-02-21T09:05:29.1892702Z // end inline asm 2026-02-21T09:05:29.1892934Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1893213Z setp.ne.b32 %p121, %r315, 127; 2026-02-21T09:05:29.1893469Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1893747Z selp.b32 %r143, 1, 0, %p25; 2026-02-21T09:05:29.1893899Z xor.b32 %r312, %r312, %r143; 2026-02-21T09:05:29.1894054Z add.s32 %r125, %r65, 43088; 2026-02-21T09:05:29.1894204Z // begin inline asm 2026-02-21T09:05:29.1894330Z 2026-02-21T09:05:29.1894443Z { 2026-02-21T09:05:29.1894561Z @!%p25 bra.uni skipWait; 2026-02-21T09:05:29.1894793Z .reg .pred complete; 2026-02-21T09:05:29.1894931Z waitLoop: 2026-02-21T09:05:29.1895118Z mbarrier.try_wait.parity.shared.b64 complete, [%r125], %r312; 2026-02-21T09:05:29.1895343Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.1895502Z skipWait: 2026-02-21T09:05:29.1895617Z } 2026-02-21T09:05:29.1895688Z 2026-02-21T09:05:29.1895742Z // end inline asm 2026-02-21T09:05:29.1895979Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1896247Z add.s32 %r144, %r314, 1; 2026-02-21T09:05:29.1896405Z setp.eq.b32 %p27, %r144, 4; 2026-02-21T09:05:29.1896564Z 
selp.b32 %r314, 0, %r144, %p27; 2026-02-21T09:05:29.1896735Z selp.b32 %r145, 1, 0, %p27; 2026-02-21T09:05:29.1896886Z xor.b32 %r313, %r313, %r145; 2026-02-21T09:05:29.1897161Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1897451Z add.s32 %r316, %r316, 1; 2026-02-21T09:05:29.1897605Z setp.lt.s32 %p28, %r316, %r5; 2026-02-21T09:05:29.1897775Z @%p28 bra $L__BB0_7; 2026-02-21T09:05:29.1897943Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:05:29.1898205Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1898402Z barrier.sync 1; 2026-02-21T09:05:29.1898563Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1898736Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.1898916Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1899230Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1899527Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1899709Z barrier.sync 1; 2026-02-21T09:05:29.1899972Z .loc 1 21 67 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:21:67 2026-02-21T09:05:29.1900251Z mov.u32 %r67, %ctaid.y; 2026-02-21T09:05:29.1900399Z mov.u32 %r68, %ctaid.z; 2026-02-21T09:05:29.1900553Z mov.u32 %r69, %nctaid.x; 2026-02-21T09:05:29.1900703Z mov.u32 %r70, %nctaid.y; 2026-02-21T09:05:29.1900859Z mad.lo.s32 %r71, %r68, %r70, %r67; 2026-02-21T09:05:29.1901034Z mad.lo.s32 %r72, %r71, %r69, %r114; 2026-02-21T09:05:29.1901201Z mul.lo.s32 %r73, %r72, 384; 2026-02-21T09:05:29.1901457Z .loc 1 22 67 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:22:67 2026-02-21T09:05:29.1901731Z add.s32 %r74, %r73, 128; 2026-02-21T09:05:29.1901885Z cvt.s64.s32 %rd8, %r74; 2026-02-21T09:05:29.1902033Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:05:29.1902193Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:05:29.1902473Z .loc 1 21 67 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:21:67 2026-02-21T09:05:29.1902756Z cvt.s64.s32 %rd10, %r73; 
2026-02-21T09:05:29.1902912Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:05:29.1903066Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:05:29.1903333Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1903608Z @%p17 bra $L__BB0_14; 2026-02-21T09:05:29.1903773Z // %bb.10: // %.lr.ph 2026-02-21T09:05:29.1903984Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1904187Z add.s32 %r327, %r114, -1; 2026-02-21T09:05:29.1904340Z add.s32 %r19, %r1, -256; 2026-02-21T09:05:29.1904484Z mov.b32 %r323, -1; 2026-02-21T09:05:29.1904622Z mov.b32 %r317, 0; 2026-02-21T09:05:29.1904784Z mov.b32 %r318, %r317; 2026-02-21T09:05:29.1904932Z mov.b32 %r326, %r317; 2026-02-21T09:05:29.1905070Z mov.b32 %r325, %r317; 2026-02-21T09:05:29.1905214Z mov.b32 %r321, %r317; 2026-02-21T09:05:29.1905351Z mov.b32 %r324, %r317; 2026-02-21T09:05:29.1905497Z bra.uni $L__BB0_11; 2026-02-21T09:05:29.1905679Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.1906005Z .loc 1 0 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0:106 2026-02-21T09:05:29.1906324Z selp.b32 %r97, 0, %r321, %p8; 2026-02-21T09:05:29.1906486Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:05:29.1906651Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:05:29.1906906Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1907196Z shl.b32 %r104, %r318, 3; 2026-02-21T09:05:29.1907341Z add.s32 %r106, %r65, %r104; 2026-02-21T09:05:29.1907501Z add.s32 %r93, %r106, 43008; 2026-02-21T09:05:29.1907753Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1908021Z // begin inline asm 2026-02-21T09:05:29.1908161Z 2026-02-21T09:05:29.1908271Z { 2026-02-21T09:05:29.1908397Z .reg .pred complete; 2026-02-21T09:05:29.1908535Z waitLoop: 2026-02-21T09:05:29.1908718Z mbarrier.try_wait.parity.shared.b64 complete, [%r93], %r317; 2026-02-21T09:05:29.1908938Z @!complete bra.uni waitLoop; 
2026-02-21T09:05:29.1909094Z } 2026-02-21T09:05:29.1909155Z 2026-02-21T09:05:29.1909210Z // end inline asm 2026-02-21T09:05:29.1909455Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1909776Z add.s32 %r99, %r106, 43040; 2026-02-21T09:05:29.1910022Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1910299Z bar.sync 3, 64; 2026-02-21T09:05:29.1910430Z // begin inline asm 2026-02-21T09:05:29.1910619Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r99], 8704; 2026-02-21T09:05:29.1910828Z // end inline asm 2026-02-21T09:05:29.1911070Z .loc 1 54 31 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:54:31 2026-02-21T09:05:29.1911373Z shl.b32 %r107, %r318, 13; 2026-02-21T09:05:29.1911523Z add.s32 %r96, %r65, %r107; 2026-02-21T09:05:29.1911774Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1912031Z bar.sync 3, 64; 2026-02-21T09:05:29.1912173Z elect.sync %r108|%p13, -1; 2026-02-21T09:05:29.1912329Z and.pred %p10, %p12, %p13; 2026-02-21T09:05:29.1912487Z // begin inline asm 2026-02-21T09:05:29.1912809Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r96], [%rd12, {%r97, %r326}], [%r99]; 2026-02-21T09:05:29.1913165Z // end inline asm 2026-02-21T09:05:29.1913406Z .loc 1 55 44 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:55:44 2026-02-21T09:05:29.1913676Z shl.b32 %r109, %r318, 9; 2026-02-21T09:05:29.1913830Z add.s32 %r110, %r65, %r109; 2026-02-21T09:05:29.1913980Z add.s32 %r100, %r110, 40960; 2026-02-21T09:05:29.1914253Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1914510Z bar.sync 3, 64; 2026-02-21T09:05:29.1914653Z elect.sync %r111|%p14, -1; 2026-02-21T09:05:29.1914844Z and.pred %p11, %p12, %p14; 2026-02-21T09:05:29.1914995Z // begin inline asm 2026-02-21T09:05:29.1915315Z @%p11 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r100], [%rd13, {%r97, %r325}], [%r99]; 2026-02-21T09:05:29.1915653Z // end inline asm 2026-02-21T09:05:29.1915902Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1916180Z add.s32 %r321, %r97, 16; 2026-02-21T09:05:29.1916429Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 2026-02-21T09:05:29.1916694Z add.s32 %r112, %r318, 1; 2026-02-21T09:05:29.1916840Z setp.eq.b32 %p15, %r112, 4; 2026-02-21T09:05:29.1917006Z selp.b32 %r318, 0, %r112, %p15; 2026-02-21T09:05:29.1917169Z selp.b32 %r113, 1, 0, %p15; 2026-02-21T09:05:29.1917328Z xor.b32 %r317, %r317, %r113; 2026-02-21T09:05:29.1917584Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1917904Z add.s32 %r324, %r324, 1; 2026-02-21T09:05:29.1918054Z setp.lt.s32 %p16, %r324, %r5; 2026-02-21T09:05:29.1918222Z @%p16 bra $L__BB0_11; 2026-02-21T09:05:29.1918378Z bra.uni $L__BB0_14; 2026-02-21T09:05:29.1918566Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.1918814Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.1919023Z add.s32 %r79, %r323, 1; 2026-02-21T09:05:29.1919180Z setp.eq.b32 %p6, %r323, 127; 2026-02-21T09:05:29.1919341Z selp.b32 %r323, 0, %r79, %p6; 2026-02-21T09:05:29.1919506Z setp.ne.b32 %p7, %r323, 0; 2026-02-21T09:05:29.1919659Z setp.eq.b32 %p8, %r323, 0; 2026-02-21T09:05:29.1919814Z @%p7 bra $L__BB0_13; 2026-02-21T09:05:29.1920003Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.1920207Z add.s32 %r327, %r327, 1; 2026-02-21T09:05:29.1920463Z .loc 1 39 35 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:39:35 2026-02-21T09:05:29.1920745Z shr.s32 %r80, %r327, 31; 2026-02-21T09:05:29.1920897Z shr.u32 %r81, %r80, 26; 2026-02-21T09:05:29.1921041Z add.s32 %r82, %r327, %r81; 2026-02-21T09:05:29.1921199Z shr.s32 %r83, %r82, 6; 2026-02-21T09:05:29.1921472Z .loc 1 40 33 
// c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:40:33 2026-02-21T09:05:29.1921760Z shl.b32 %r84, %r83, 2; 2026-02-21T09:05:29.1922011Z .loc 1 41 39 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:41:39 2026-02-21T09:05:29.1922277Z sub.s32 %r85, 128, %r84; 2026-02-21T09:05:29.1922532Z .loc 1 41 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:41:52 2026-02-21T09:05:29.1922840Z min.s32 %r86, %r85, 4; 2026-02-21T09:05:29.1923094Z .loc 1 42 45 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:45 2026-02-21T09:05:29.1923362Z and.b32 %r87, %r82, -64; 2026-02-21T09:05:29.1923520Z sub.s32 %r88, %r327, %r87; 2026-02-21T09:05:29.1923790Z .loc 1 43 51 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:43:51 2026-02-21T09:05:29.1924072Z div.s32 %r89, %r88, %r86; 2026-02-21T09:05:29.1924341Z .loc 1 42 64 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:64 2026-02-21T09:05:29.1924628Z mul.lo.s32 %r90, %r89, %r86; 2026-02-21T09:05:29.1924828Z sub.s32 %r91, %r88, %r90; 2026-02-21T09:05:29.1925084Z .loc 1 42 30 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:42:30 2026-02-21T09:05:29.1925375Z add.s32 %r92, %r91, %r84; 2026-02-21T09:05:29.1925688Z .loc 1 44 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:44:27 2026-02-21T09:05:29.1925971Z shl.b32 %r325, %r92, 4; 2026-02-21T09:05:29.1926232Z .loc 1 45 27 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:45:27 2026-02-21T09:05:29.1926512Z shl.b32 %r326, %r89, 8; 2026-02-21T09:05:29.1926670Z bra.uni $L__BB0_13; 2026-02-21T09:05:29.1926840Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:29.1927073Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1927395Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1927684Z barrier.sync 1; 2026-02-21T09:05:29.1927853Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.1928037Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.1928230Z $L__BB0_4: // in 
Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.1928545Z .loc 1 19 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:19 2026-02-21T09:05:29.1928833Z barrier.sync 1; 2026-02-21T09:05:29.1928972Z barrier.sync 1; 2026-02-21T09:05:29.1929117Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.1929292Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:05:29.1929632Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1929950Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.1930137Z bar.sync 0, 256; 2026-02-21T09:05:29.1930282Z barrier.sync 1; 2026-02-21T09:05:29.1930442Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.1930739Z .loc 1 56 52 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:56:52 2026-02-21T09:05:29.1931038Z // begin inline asm 2026-02-21T09:05:29.1931176Z 2026-02-21T09:05:29.1931296Z { 2026-02-21T09:05:29.1931426Z .reg .pred complete; 2026-02-21T09:05:29.1931571Z waitLoop: 2026-02-21T09:05:29.1931755Z mbarrier.try_wait.parity.shared.b64 complete, [%r298], %r338; 2026-02-21T09:05:29.1931992Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.1932138Z } 2026-02-21T09:05:29.1932208Z 2026-02-21T09:05:29.1932261Z // end inline asm 2026-02-21T09:05:29.1932503Z .loc 1 33 106 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:106 2026-02-21T09:05:29.1932783Z bar.sync 0, 256; 2026-02-21T09:05:29.1932920Z // begin inline asm 2026-02-21T09:05:29.1933086Z @%p110 mbarrier.inval.shared::cta.b64 [%r298]; 2026-02-21T09:05:29.1933312Z // end inline asm 2026-02-21T09:05:29.1933444Z // begin inline asm 2026-02-21T09:05:29.1933612Z @%p110 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T09:05:29.1933791Z // end inline asm 2026-02-21T09:05:29.1933924Z // begin inline asm 2026-02-21T09:05:29.1934078Z @%p110 mbarrier.inval.shared::cta.b64 [%r192]; 2026-02-21T09:05:29.1934259Z // end inline asm 2026-02-21T09:05:29.1934390Z bar.sync 0, 256; 2026-02-21T09:05:29.1934517Z // begin inline asm 
2026-02-21T09:05:29.1934703Z @%p110 mbarrier.inval.shared::cta.b64 [%r193]; 2026-02-21T09:05:29.1934915Z // end inline asm 2026-02-21T09:05:29.1935053Z bar.sync 0, 256; 2026-02-21T09:05:29.1935187Z // begin inline asm 2026-02-21T09:05:29.1935351Z @%p110 mbarrier.inval.shared::cta.b64 [%r194]; 2026-02-21T09:05:29.1935532Z // end inline asm 2026-02-21T09:05:29.1935672Z bar.sync 0, 256; 2026-02-21T09:05:29.1935807Z // begin inline asm 2026-02-21T09:05:29.1935972Z @%p110 mbarrier.inval.shared::cta.b64 [%r195]; 2026-02-21T09:05:29.1936160Z // end inline asm 2026-02-21T09:05:29.1936295Z // begin inline asm 2026-02-21T09:05:29.1936466Z @%p110 mbarrier.inval.shared::cta.b64 [%r188]; 2026-02-21T09:05:29.1936649Z // end inline asm 2026-02-21T09:05:29.1936790Z bar.sync 0, 256; 2026-02-21T09:05:29.1936924Z // begin inline asm 2026-02-21T09:05:29.1937093Z @%p110 mbarrier.inval.shared::cta.b64 [%r189]; 2026-02-21T09:05:29.1937273Z // end inline asm 2026-02-21T09:05:29.1937414Z bar.sync 0, 256; 2026-02-21T09:05:29.1937605Z // begin inline asm 2026-02-21T09:05:29.1937762Z @%p110 mbarrier.inval.shared::cta.b64 [%r190]; 2026-02-21T09:05:29.1937947Z // end inline asm 2026-02-21T09:05:29.1938078Z bar.sync 0, 256; 2026-02-21T09:05:29.1938218Z // begin inline asm 2026-02-21T09:05:29.1938376Z @%p110 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T09:05:29.1938565Z // end inline asm 2026-02-21T09:05:29.1938799Z .loc 1 33 4 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:33:4 2026-02-21T09:05:29.1939090Z bar.sync 0, 256; 2026-02-21T09:05:29.1939225Z // begin inline asm 2026-02-21T09:05:29.1939416Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r310, 32; 2026-02-21T09:05:29.1939649Z // end inline asm 2026-02-21T09:05:29.1939805Z st.shared.b32 [global_smem+43096], 50529027; 2026-02-21T09:05:29.1939996Z barrier.sync 1; 2026-02-21T09:05:29.1940150Z $L__BB0_24: // %common.ret 2026-02-21T09:05:29.1940443Z .loc 1 0 0 // c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py:0 
2026-02-21T09:05:29.1940706Z ret; 2026-02-21T09:05:29.1940831Z $L__tmp1: 2026-02-21T09:05:29.1940957Z $L__func_end0: 2026-02-21T09:05:29.1941107Z // -- End function 2026-02-21T09:05:29.1941293Z } 2026-02-21T09:05:29.1941575Z .file 1 "/tmp/torchinductor_root/4m/c4m6s5f2houvjw5t3rpf356d3gpwo6xcscyk3tkbnlmyvudgdw6y.py" 2026-02-21T09:05:29.1941894Z .section .debug_abbrev 2026-02-21T09:05:29.1942033Z { 2026-02-21T09:05:29.1942186Z .b8 1 // Abbreviation Code 2026-02-21T09:05:29.1942403Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:29.1942620Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:29.1942828Z .b8 37 // DW_AT_producer 2026-02-21T09:05:29.1943025Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.1943226Z .b8 19 // DW_AT_language 2026-02-21T09:05:29.1943422Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:29.1943625Z .b8 3 // DW_AT_name 2026-02-21T09:05:29.1943812Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.1944014Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:29.1944216Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:29.1944409Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:29.1944648Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.1944868Z .b8 0 // EOM(1) 2026-02-21T09:05:29.1945057Z .b8 0 // EOM(2) 2026-02-21T09:05:29.1945234Z .b8 0 // EOM(3) 2026-02-21T09:05:29.1945404Z } 2026-02-21T09:05:29.1945524Z .section .debug_info 2026-02-21T09:05:29.1945666Z { 2026-02-21T09:05:29.1945814Z .b32 104 // Length of Unit 2026-02-21T09:05:29.1946062Z .b8 2 // DWARF version number 2026-02-21T09:05:29.1946256Z .b8 0 2026-02-21T09:05:29.1946432Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:05:29.1946686Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:29.1946913Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:29.1947150Z .b8 116 // DW_AT_producer 2026-02-21T09:05:29.1947337Z .b8 114 2026-02-21T09:05:29.1947454Z .b8 105 2026-02-21T09:05:29.1947573Z .b8 116 2026-02-21T09:05:29.1947684Z .b8 111 2026-02-21T09:05:29.1947804Z .b8 110 2026-02-21T09:05:29.1947916Z .b8 0 2026-02-21T09:05:29.1948060Z .b8 2 // DW_AT_language 2026-02-21T09:05:29.1948232Z .b8 0 2026-02-21T09:05:29.1948372Z .b8 99 // DW_AT_name 2026-02-21T09:05:29.1948541Z .b8 52 2026-02-21T09:05:29.1948686Z .b8 109 2026-02-21T09:05:29.1948799Z .b8 54 2026-02-21T09:05:29.1948918Z .b8 115 2026-02-21T09:05:29.1949025Z .b8 53 2026-02-21T09:05:29.1949140Z .b8 102 2026-02-21T09:05:29.1949257Z .b8 50 2026-02-21T09:05:29.1949366Z .b8 104 2026-02-21T09:05:29.1949481Z .b8 111 2026-02-21T09:05:29.1949590Z .b8 117 2026-02-21T09:05:29.1949707Z .b8 118 2026-02-21T09:05:29.1949813Z .b8 106 2026-02-21T09:05:29.1949931Z .b8 119 2026-02-21T09:05:29.1950038Z .b8 53 2026-02-21T09:05:29.1950154Z .b8 116 2026-02-21T09:05:29.1950260Z .b8 51 2026-02-21T09:05:29.1950377Z .b8 114 2026-02-21T09:05:29.1950484Z .b8 112 2026-02-21T09:05:29.1950598Z .b8 102 2026-02-21T09:05:29.1950705Z .b8 51 2026-02-21T09:05:29.1950820Z .b8 53 2026-02-21T09:05:29.1950933Z .b8 54 2026-02-21T09:05:29.1951039Z .b8 100 2026-02-21T09:05:29.1951151Z .b8 51 2026-02-21T09:05:29.1951258Z .b8 103 2026-02-21T09:05:29.1951372Z .b8 112 2026-02-21T09:05:29.1951478Z .b8 119 2026-02-21T09:05:29.1951591Z .b8 111 2026-02-21T09:05:29.1951698Z .b8 54 2026-02-21T09:05:29.1951812Z .b8 120 2026-02-21T09:05:29.1951922Z .b8 99 2026-02-21T09:05:29.1952035Z .b8 115 2026-02-21T09:05:29.1952142Z .b8 99 2026-02-21T09:05:29.1952258Z .b8 121 2026-02-21T09:05:29.1952362Z .b8 107 2026-02-21T09:05:29.1952477Z .b8 51 2026-02-21T09:05:29.1952584Z .b8 116 2026-02-21T09:05:29.1952734Z .b8 107 2026-02-21T09:05:29.1952848Z .b8 98 
2026-02-21T09:05:29.1952955Z .b8 110 2026-02-21T09:05:29.1953070Z .b8 108 2026-02-21T09:05:29.1953179Z .b8 109 2026-02-21T09:05:29.1953296Z .b8 121 2026-02-21T09:05:29.1953404Z .b8 118 2026-02-21T09:05:29.1953524Z .b8 117 2026-02-21T09:05:29.1953634Z .b8 100 2026-02-21T09:05:29.1953751Z .b8 103 2026-02-21T09:05:29.1953860Z .b8 100 2026-02-21T09:05:29.1953983Z .b8 119 2026-02-21T09:05:29.1954098Z .b8 54 2026-02-21T09:05:29.1954225Z .b8 121 2026-02-21T09:05:29.1954332Z .b8 46 2026-02-21T09:05:29.1954447Z .b8 112 2026-02-21T09:05:29.1954560Z .b8 121 2026-02-21T09:05:29.1954667Z .b8 0 2026-02-21T09:05:29.1954853Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:29.1955072Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:29.1955259Z .b8 116 2026-02-21T09:05:29.1955367Z .b8 109 2026-02-21T09:05:29.1955479Z .b8 112 2026-02-21T09:05:29.1955588Z .b8 47 2026-02-21T09:05:29.1955706Z .b8 116 2026-02-21T09:05:29.1955815Z .b8 111 2026-02-21T09:05:29.1955930Z .b8 114 2026-02-21T09:05:29.1956038Z .b8 99 2026-02-21T09:05:29.1956158Z .b8 104 2026-02-21T09:05:29.1956274Z .b8 105 2026-02-21T09:05:29.1956384Z .b8 110 2026-02-21T09:05:29.1956497Z .b8 100 2026-02-21T09:05:29.1956645Z .b8 117 2026-02-21T09:05:29.1956763Z .b8 99 2026-02-21T09:05:29.1956869Z .b8 116 2026-02-21T09:05:29.1956984Z .b8 111 2026-02-21T09:05:29.1957090Z .b8 114 2026-02-21T09:05:29.1957203Z .b8 95 2026-02-21T09:05:29.1957310Z .b8 114 2026-02-21T09:05:29.1957423Z .b8 111 2026-02-21T09:05:29.1957529Z .b8 111 2026-02-21T09:05:29.1957644Z .b8 116 2026-02-21T09:05:29.1957751Z .b8 47 2026-02-21T09:05:29.1957866Z .b8 52 2026-02-21T09:05:29.1957975Z .b8 109 2026-02-21T09:05:29.1958093Z .b8 0 2026-02-21T09:05:29.1958245Z } 2026-02-21T09:05:29.1958371Z .section .debug_macinfo { } 2026-02-21T09:05:29.1958474Z 2026-02-21T09:05:29.1958558Z ================================================================ 2026-02-21T09:05:29.1958792Z please share the reproducer above with Triton project. 
2026-02-21T09:05:29.5180703Z 2026-02-21T09:05:29.5182447Z [29s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:29.5183737Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:29.5185329Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:29.5185590Z `ptxas` stderr: 2026-02-21T09:05:29.5186011Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 257 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:29.5186512Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.5186669Z 2026-02-21T09:05:29.5187062Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp50xnbf0e.ptx -o /tmp/tmp50xnbf0e.ptx.o 2026-02-21T09:05:29.5187522Z 2026-02-21T09:05:29.5187648Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:29.5187831Z 2026-02-21T09:05:29.5187834Z 2026-02-21T09:05:29.5187925Z ================================================================ 2026-02-21T09:05:29.5188132Z Internal Triton PTX codegen error 2026-02-21T09:05:29.5188307Z `ptxas` stderr: 2026-02-21T09:05:29.5188705Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 257 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:29.5189193Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.5189338Z 2026-02-21T09:05:29.5189760Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp50xnbf0e.ptx -o /tmp/tmp50xnbf0e.ptx.o 2026-02-21T09:05:29.5190181Z 2026-02-21T09:05:29.5190186Z 2026-02-21T09:05:29.5190240Z // 2026-02-21T09:05:29.5190393Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:29.5190562Z // 2026-02-21T09:05:29.5190641Z 2026-02-21T09:05:29.5190696Z .version 8.7 2026-02-21T09:05:29.5190838Z .target sm_100a 2026-02-21T09:05:29.5190970Z .address_size 64 2026-02-21T09:05:29.5191053Z 2026-02-21T09:05:29.5191180Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:29.5191431Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:29.5191647Z // @_helion_matmul 2026-02-21T09:05:29.5191842Z .visible .entry _helion_matmul( 2026-02-21T09:05:29.5192069Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:29.5192343Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:29.5192596Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:29.5192854Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:29.5193149Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:29.5193369Z ) 2026-02-21T09:05:29.5193491Z .reqntid 256 2026-02-21T09:05:29.5193634Z .maxnreg 32 2026-02-21T09:05:29.5193758Z { 2026-02-21T09:05:29.5193893Z .reg .pred %p<123>; 2026-02-21T09:05:29.5194047Z .reg .b32 %r<394>; 2026-02-21T09:05:29.5194200Z .reg .b64 %rd<145>; 2026-02-21T09:05:29.5194486Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19:0 2026-02-21T09:05:29.5194881Z $L__func_begin0: 2026-02-21T09:05:29.5195151Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19:0 2026-02-21T09:05:29.5195398Z 
2026-02-21T09:05:29.5195454Z // %bb.0: 2026-02-21T09:05:29.5195628Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:05:29.5195826Z $L__tmp0: 2026-02-21T09:05:29.5196078Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19 2026-02-21T09:05:29.5196382Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:29.5196536Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:29.5196711Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:29.5196906Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:05:29.5197080Z @%p3 bra $L__BB0_16; 2026-02-21T09:05:29.5197228Z bra.uni $L__BB0_1; 2026-02-21T09:05:29.5197547Z $L__BB0_16: 2026-02-21T09:05:29.5197797Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0:0 2026-02-21T09:05:29.5198173Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:05:29.5198415Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:05:29.5198632Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:05:29.5198948Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19 2026-02-21T09:05:29.5199267Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:29.5199473Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:05:29.5199643Z mov.b32 %r146, global_smem; 2026-02-21T09:05:29.5199817Z // begin inline asm 2026-02-21T09:05:29.5200072Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r146], 32; 2026-02-21T09:05:29.5200336Z // end inline asm 2026-02-21T09:05:29.5200483Z bar.sync 0, 128; 2026-02-21T09:05:29.5200630Z ld.shared.b32 %r365, [global_smem]; 2026-02-21T09:05:29.5200809Z bar.sync 0, 128; 2026-02-21T09:05:29.5200943Z // begin inline asm 2026-02-21T09:05:29.5201154Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:29.5201382Z // end inline asm 2026-02-21T09:05:29.5201640Z .loc 1 21 67 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:21:67 2026-02-21T09:05:29.5201935Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:05:29.5202086Z mov.u32 %r221, %ctaid.y; 
2026-02-21T09:05:29.5202279Z mov.u32 %r222, %ctaid.z; 2026-02-21T09:05:29.5202429Z mov.u32 %r223, %nctaid.x; 2026-02-21T09:05:29.5202591Z mov.u32 %r224, %nctaid.y; 2026-02-21T09:05:29.5202749Z mad.lo.s32 %r225, %r222, %r224, %r221; 2026-02-21T09:05:29.5202937Z mad.lo.s32 %r226, %r225, %r223, %r41; 2026-02-21T09:05:29.5203106Z mul.lo.s32 %r227, %r226, 384; 2026-02-21T09:05:29.5203272Z cvt.s64.s32 %rd77, %r227; 2026-02-21T09:05:29.5203424Z add.s64 %rd38, %rd7, %rd77; 2026-02-21T09:05:29.5203591Z shl.b32 %r228, %r1, 2; 2026-02-21T09:05:29.5203747Z add.s32 %r147, %r146, %r228; 2026-02-21T09:05:29.5203896Z mov.b32 %r393, 0; 2026-02-21T09:05:29.5204040Z // begin inline asm 2026-02-21T09:05:29.5204194Z @%p29 st.shared.b32 [ %r147 + 0 ], %r393; 2026-02-21T09:05:29.5204376Z // end inline asm 2026-02-21T09:05:29.5204511Z bar.warp.sync -1; 2026-02-21T09:05:29.5204664Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:05:29.5204854Z cvt.u64.u32 %rd23, %r146; 2026-02-21T09:05:29.5205012Z // begin inline asm 2026-02-21T09:05:29.5205274Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd4; 2026-02-21T09:05:29.5205554Z // end inline asm 2026-02-21T09:05:29.5205694Z // begin inline asm 2026-02-21T09:05:29.5205949Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5206213Z // end inline asm 2026-02-21T09:05:29.5206345Z mov.b32 %r149, 16; 2026-02-21T09:05:29.5206490Z // begin inline asm 2026-02-21T09:05:29.5206723Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.5206998Z // end inline asm 2026-02-21T09:05:29.5207138Z mov.b32 %r150, 256; 2026-02-21T09:05:29.5207281Z // begin inline asm 2026-02-21T09:05:29.5207557Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r150; 2026-02-21T09:05:29.5207818Z // end inline asm 2026-02-21T09:05:29.5207961Z mov.b32 %r151, 2048; 2026-02-21T09:05:29.5208104Z // begin inline asm 
2026-02-21T09:05:29.5208358Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.5208633Z // end inline asm 2026-02-21T09:05:29.5208775Z mov.b32 %r152, 4096; 2026-02-21T09:05:29.5208925Z // begin inline asm 2026-02-21T09:05:29.5209165Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r152; 2026-02-21T09:05:29.5209443Z // end inline asm 2026-02-21T09:05:29.5209576Z mov.b64 %rd31, 4096; 2026-02-21T09:05:29.5209731Z // begin inline asm 2026-02-21T09:05:29.5209982Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.5210267Z // end inline asm 2026-02-21T09:05:29.5210435Z mov.b32 %r153, 1; 2026-02-21T09:05:29.5210574Z // begin inline asm 2026-02-21T09:05:29.5210837Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.5211121Z // end inline asm 2026-02-21T09:05:29.5211264Z // begin inline asm 2026-02-21T09:05:29.5211517Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.5211807Z // end inline asm 2026-02-21T09:05:29.5211939Z // begin inline asm 2026-02-21T09:05:29.5212179Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.5212454Z // end inline asm 2026-02-21T09:05:29.5212586Z // begin inline asm 2026-02-21T09:05:29.5212841Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.5213127Z // end inline asm 2026-02-21T09:05:29.5213266Z // begin inline asm 2026-02-21T09:05:29.5213502Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5213784Z // end inline asm 2026-02-21T09:05:29.5213923Z // begin inline asm 2026-02-21T09:05:29.5214144Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 
2026-02-21T09:05:29.5214415Z // end inline asm 2026-02-21T09:05:29.5214574Z // begin inline asm 2026-02-21T09:05:29.5214962Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd38 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.5215337Z // end inline asm 2026-02-21T09:05:29.5215478Z // begin inline asm 2026-02-21T09:05:29.5215693Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd38 + 0 ], 0x80; 2026-02-21T09:05:29.5215945Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.5216145Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.5216321Z // end inline asm 2026-02-21T09:05:29.5216460Z bar.sync 0, 128; 2026-02-21T09:05:29.5216712Z .loc 1 22 67 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:22:67 2026-02-21T09:05:29.5217022Z add.s32 %r229, %r227, 128; 2026-02-21T09:05:29.5217179Z cvt.s64.s32 %rd78, %r229; 2026-02-21T09:05:29.5217346Z add.s64 %rd56, %rd7, %rd78; 2026-02-21T09:05:29.5217510Z bar.sync 0, 128; 2026-02-21T09:05:29.5217647Z // begin inline asm 2026-02-21T09:05:29.5217808Z @%p29 st.shared.b32 [ %r147 + 0 ], %r393; 2026-02-21T09:05:29.5217985Z // end inline asm 2026-02-21T09:05:29.5218131Z bar.warp.sync -1; 2026-02-21T09:05:29.5218268Z // begin inline asm 2026-02-21T09:05:29.5218554Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd5; 2026-02-21T09:05:29.5218830Z // end inline asm 2026-02-21T09:05:29.5218976Z // begin inline asm 2026-02-21T09:05:29.5219191Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5219438Z // end inline asm 2026-02-21T09:05:29.5219566Z // begin inline asm 2026-02-21T09:05:29.5219801Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.5220092Z // end inline asm 2026-02-21T09:05:29.5220232Z // begin inline asm 2026-02-21T09:05:29.5220466Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, 
%r149; 2026-02-21T09:05:29.5220737Z // end inline asm 2026-02-21T09:05:29.5220875Z // begin inline asm 2026-02-21T09:05:29.5221112Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.5221390Z // end inline asm 2026-02-21T09:05:29.5221519Z // begin inline asm 2026-02-21T09:05:29.5221763Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r151; 2026-02-21T09:05:29.5222031Z // end inline asm 2026-02-21T09:05:29.5222166Z // begin inline asm 2026-02-21T09:05:29.5222418Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.5222695Z // end inline asm 2026-02-21T09:05:29.5222859Z // begin inline asm 2026-02-21T09:05:29.5223111Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.5223393Z // end inline asm 2026-02-21T09:05:29.5223520Z // begin inline asm 2026-02-21T09:05:29.5223771Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.5224053Z // end inline asm 2026-02-21T09:05:29.5224182Z // begin inline asm 2026-02-21T09:05:29.5224413Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.5224666Z // end inline asm 2026-02-21T09:05:29.5224844Z // begin inline asm 2026-02-21T09:05:29.5225084Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.5225366Z // end inline asm 2026-02-21T09:05:29.5225496Z // begin inline asm 2026-02-21T09:05:29.5225729Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5225996Z // end inline asm 2026-02-21T09:05:29.5226126Z // begin inline asm 2026-02-21T09:05:29.5226358Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.5226615Z // end inline asm 
2026-02-21T09:05:29.5226743Z // begin inline asm 2026-02-21T09:05:29.5227132Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd56 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.5227500Z // end inline asm 2026-02-21T09:05:29.5227645Z // begin inline asm 2026-02-21T09:05:29.5227850Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd56 + 0 ], 0x80; 2026-02-21T09:05:29.5228100Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.5228291Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.5228460Z // end inline asm 2026-02-21T09:05:29.5228594Z bar.sync 0, 128; 2026-02-21T09:05:29.5228833Z .loc 1 24 71 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:24:71 2026-02-21T09:05:29.5229123Z add.s32 %r230, %r227, 256; 2026-02-21T09:05:29.5229280Z cvt.s64.s32 %rd79, %r230; 2026-02-21T09:05:29.5229443Z add.s64 %rd74, %rd7, %rd79; 2026-02-21T09:05:29.5229593Z bar.sync 0, 128; 2026-02-21T09:05:29.5229730Z // begin inline asm 2026-02-21T09:05:29.5229883Z @%p29 st.shared.b32 [ %r147 + 0 ], %r393; 2026-02-21T09:05:29.5230053Z // end inline asm 2026-02-21T09:05:29.5230192Z bar.warp.sync -1; 2026-02-21T09:05:29.5230326Z // begin inline asm 2026-02-21T09:05:29.5230614Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd6; 2026-02-21T09:05:29.5230884Z // end inline asm 2026-02-21T09:05:29.5231017Z // begin inline asm 2026-02-21T09:05:29.5231233Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5231483Z // end inline asm 2026-02-21T09:05:29.5231614Z // begin inline asm 2026-02-21T09:05:29.5231841Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r149; 2026-02-21T09:05:29.5232114Z // end inline asm 2026-02-21T09:05:29.5232267Z // begin inline asm 2026-02-21T09:05:29.5232497Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r150; 2026-02-21T09:05:29.5232749Z // end inline 
asm 2026-02-21T09:05:29.5232885Z // begin inline asm 2026-02-21T09:05:29.5233127Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r151; 2026-02-21T09:05:29.5233393Z // end inline asm 2026-02-21T09:05:29.5233527Z // begin inline asm 2026-02-21T09:05:29.5233758Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r152; 2026-02-21T09:05:29.5234029Z // end inline asm 2026-02-21T09:05:29.5234154Z // begin inline asm 2026-02-21T09:05:29.5234418Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:05:29.5234743Z // end inline asm 2026-02-21T09:05:29.5234881Z // begin inline asm 2026-02-21T09:05:29.5235181Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r153; 2026-02-21T09:05:29.5235489Z // end inline asm 2026-02-21T09:05:29.5235635Z // begin inline asm 2026-02-21T09:05:29.5235898Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r153; 2026-02-21T09:05:29.5236208Z // end inline asm 2026-02-21T09:05:29.5236344Z // begin inline asm 2026-02-21T09:05:29.5236597Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:05:29.5236879Z // end inline asm 2026-02-21T09:05:29.5237018Z // begin inline asm 2026-02-21T09:05:29.5237285Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.5237576Z // end inline asm 2026-02-21T09:05:29.5237719Z // begin inline asm 2026-02-21T09:05:29.5237961Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:05:29.5238242Z // end inline asm 2026-02-21T09:05:29.5238382Z // begin inline asm 2026-02-21T09:05:29.5238617Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:05:29.5238888Z // end inline asm 2026-02-21T09:05:29.5239022Z // begin inline asm 
2026-02-21T09:05:29.5239373Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd74 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:05:29.5239784Z // end inline asm 2026-02-21T09:05:29.5239931Z // begin inline asm 2026-02-21T09:05:29.5240157Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd74 + 0 ], 0x80; 2026-02-21T09:05:29.5240419Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.5240626Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.5240810Z // end inline asm 2026-02-21T09:05:29.5240956Z bar.sync 0, 128; 2026-02-21T09:05:29.5241107Z cvta.global.u64 %rd80, %rd74; 2026-02-21T09:05:29.5241411Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5241726Z max.u32 %r231, %r41, 2047; 2026-02-21T09:05:29.5241901Z shl.b32 %r232, %r231, 7; 2026-02-21T09:05:29.5242084Z sub.s32 %r42, 262144, %r232; 2026-02-21T09:05:29.5242351Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5242656Z shfl.sync.idx.b32 %r233, %r2, 0, 31, -1; 2026-02-21T09:05:29.5242837Z shl.b32 %r234, %r233, 21; 2026-02-21T09:05:29.5242999Z and.b32 %r235, %r234, 6291456; 2026-02-21T09:05:29.5243160Z add.s32 %r171, %r235, %r365; 2026-02-21T09:05:29.5243349Z mov.pred %p85, -1; 2026-02-21T09:05:29.5243488Z // begin inline asm 2026-02-21T09:05:29.5243852Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r171 + 0], {%r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393}; 2026-02-21T09:05:29.5244241Z // end inline asm 2026-02-21T09:05:29.5244368Z // begin inline asm 2026-02-21T09:05:29.5244760Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r171 + 16], {%r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393, %r393}; 2026-02-21T09:05:29.5245182Z // end inline asm 2026-02-21T09:05:29.5245324Z // begin inline asm 2026-02-21T09:05:29.5245475Z 
tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:29.5245645Z // end inline asm 2026-02-21T09:05:29.5245787Z bar.sync 0, 128; 2026-02-21T09:05:29.5246044Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5246345Z add.s32 %r205, %r146, 43008; 2026-02-21T09:05:29.5246501Z // begin inline asm 2026-02-21T09:05:29.5246678Z @%p111 mbarrier.init.shared::cta.b64 [%r205], 1; 2026-02-21T09:05:29.5246870Z // end inline asm 2026-02-21T09:05:29.5247010Z bar.sync 0, 128; 2026-02-21T09:05:29.5247148Z add.s32 %r206, %r146, 43016; 2026-02-21T09:05:29.5247308Z // begin inline asm 2026-02-21T09:05:29.5247483Z @%p111 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T09:05:29.5247674Z // end inline asm 2026-02-21T09:05:29.5247847Z bar.sync 0, 128; 2026-02-21T09:05:29.5247983Z add.s32 %r207, %r146, 43024; 2026-02-21T09:05:29.5248137Z // begin inline asm 2026-02-21T09:05:29.5248293Z @%p111 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T09:05:29.5248479Z // end inline asm 2026-02-21T09:05:29.5248603Z bar.sync 0, 128; 2026-02-21T09:05:29.5248741Z add.s32 %r208, %r146, 43032; 2026-02-21T09:05:29.5248887Z // begin inline asm 2026-02-21T09:05:29.5249049Z @%p111 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T09:05:29.5249231Z // end inline asm 2026-02-21T09:05:29.5249362Z add.s32 %r209, %r146, 43040; 2026-02-21T09:05:29.5249516Z // begin inline asm 2026-02-21T09:05:29.5249669Z @%p111 mbarrier.init.shared::cta.b64 [%r209], 1; 2026-02-21T09:05:29.5249852Z // end inline asm 2026-02-21T09:05:29.5249976Z bar.sync 0, 128; 2026-02-21T09:05:29.5250114Z add.s32 %r210, %r146, 43048; 2026-02-21T09:05:29.5250260Z // begin inline asm 2026-02-21T09:05:29.5250420Z @%p111 mbarrier.init.shared::cta.b64 [%r210], 1; 2026-02-21T09:05:29.5250606Z // end inline asm 2026-02-21T09:05:29.5250733Z bar.sync 0, 128; 2026-02-21T09:05:29.5250870Z add.s32 %r211, %r146, 43056; 2026-02-21T09:05:29.5251017Z // begin inline asm 2026-02-21T09:05:29.5251176Z @%p111 
mbarrier.init.shared::cta.b64 [%r211], 1; 2026-02-21T09:05:29.5251388Z // end inline asm 2026-02-21T09:05:29.5251520Z bar.sync 0, 128; 2026-02-21T09:05:29.5251649Z add.s32 %r212, %r146, 43064; 2026-02-21T09:05:29.5251800Z // begin inline asm 2026-02-21T09:05:29.5251954Z @%p111 mbarrier.init.shared::cta.b64 [%r212], 1; 2026-02-21T09:05:29.5252135Z // end inline asm 2026-02-21T09:05:29.5252369Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5252641Z bar.sync 0, 128; 2026-02-21T09:05:29.5252773Z // begin inline asm 2026-02-21T09:05:29.5252936Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r205]; 2026-02-21T09:05:29.5253131Z // end inline asm 2026-02-21T09:05:29.5253256Z bar.sync 0, 128; 2026-02-21T09:05:29.5253392Z // begin inline asm 2026-02-21T09:05:29.5253555Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r206]; 2026-02-21T09:05:29.5253745Z // end inline asm 2026-02-21T09:05:29.5253876Z bar.sync 0, 128; 2026-02-21T09:05:29.5254003Z // begin inline asm 2026-02-21T09:05:29.5254169Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r207]; 2026-02-21T09:05:29.5254352Z // end inline asm 2026-02-21T09:05:29.5254484Z bar.sync 0, 128; 2026-02-21T09:05:29.5254610Z // begin inline asm 2026-02-21T09:05:29.5254844Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r208]; 2026-02-21T09:05:29.5255031Z // end inline asm 2026-02-21T09:05:29.5255287Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5255577Z bar.sync 0, 128; 2026-02-21T09:05:29.5255712Z add.s32 %r217, %r146, 43072; 2026-02-21T09:05:29.5255871Z // begin inline asm 2026-02-21T09:05:29.5256032Z @%p111 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:05:29.5256227Z // end inline asm 2026-02-21T09:05:29.5256388Z add.s32 %r353, %r146, 43088; 2026-02-21T09:05:29.5256542Z // begin inline asm 2026-02-21T09:05:29.5256696Z @%p111 mbarrier.init.shared::cta.b64 [%r353], 1; 2026-02-21T09:05:29.5256880Z // end inline asm 
2026-02-21T09:05:29.5257124Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5257399Z bar.sync 0, 128; 2026-02-21T09:05:29.5257533Z // begin inline asm 2026-02-21T09:05:29.5257693Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r353]; 2026-02-21T09:05:29.5257883Z // end inline asm 2026-02-21T09:05:29.5258122Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5258430Z st.shared.b32 [global_smem+43096], 33554689; 2026-02-21T09:05:29.5258632Z st.shared.b32 [global_smem+32768], %r365; 2026-02-21T09:05:29.5258805Z barrier.sync 1; 2026-02-21T09:05:29.5258966Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:29.5259168Z barrier.sync 1; 2026-02-21T09:05:29.5259318Z setp.lt.s32 %p102, %r42, 1; 2026-02-21T09:05:29.5259475Z @%p102 bra $L__BB0_23; 2026-02-21T09:05:29.5259645Z // %bb.17: // %.lr.ph12 2026-02-21T09:05:29.5259830Z add.s32 %r390, %r41, -1; 2026-02-21T09:05:29.5259988Z shl.b32 %r238, %r1, 5; 2026-02-21T09:05:29.5260132Z and.b32 %r239, %r238, 3936; 2026-02-21T09:05:29.5260294Z bfe.s32 %r240, %r1, 2, 1; 2026-02-21T09:05:29.5260449Z and.b32 %r241, %r240, 144; 2026-02-21T09:05:29.5260598Z or.b32 %r242, %r241, %r239; 2026-02-21T09:05:29.5260752Z add.s32 %r244, %r146, 32768; 2026-02-21T09:05:29.5260897Z add.s32 %r45, %r244, %r242; 2026-02-21T09:05:29.5261049Z xor.b32 %r245, %r242, 16; 2026-02-21T09:05:29.5261191Z add.s32 %r46, %r244, %r245; 2026-02-21T09:05:29.5261342Z mov.b32 %r387, -1; 2026-02-21T09:05:29.5261475Z mov.b32 %r391, %r393; 2026-02-21T09:05:29.5261621Z mov.b32 %r392, %r393; 2026-02-21T09:05:29.5261757Z mov.b32 %r388, %r393; 2026-02-21T09:05:29.5261903Z bra.uni $L__BB0_18; 2026-02-21T09:05:29.5262091Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.5262410Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5262740Z bar.sync 0, 128; 2026-02-21T09:05:29.5262872Z // begin inline asm 
2026-02-21T09:05:29.5263008Z 2026-02-21T09:05:29.5263119Z { 2026-02-21T09:05:29.5263245Z .reg .pred complete; 2026-02-21T09:05:29.5263386Z waitLoop: 2026-02-21T09:05:29.5263579Z mbarrier.try_wait.parity.shared.b64 complete, [%r217], %r393; 2026-02-21T09:05:29.5263813Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.5263961Z } 2026-02-21T09:05:29.5264023Z 2026-02-21T09:05:29.5264085Z // end inline asm 2026-02-21T09:05:29.5264217Z // begin inline asm 2026-02-21T09:05:29.5264586Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r250, %r251, %r252, %r253, %r254, %r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265}, [%r171 + 0]; 2026-02-21T09:05:29.5264995Z // end inline asm 2026-02-21T09:05:29.5265145Z // begin inline asm 2026-02-21T09:05:29.5265481Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282}, [%r171 + 16]; 2026-02-21T09:05:29.5265874Z // end inline asm 2026-02-21T09:05:29.5266012Z // begin inline asm 2026-02-21T09:05:29.5266156Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:29.5266328Z // end inline asm 2026-02-21T09:05:29.5266493Z bar.sync 0, 128; 2026-02-21T09:05:29.5266630Z // begin inline asm 2026-02-21T09:05:29.5266794Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r353]; 2026-02-21T09:05:29.5266987Z // end inline asm 2026-02-21T09:05:29.5267117Z cvt.u64.u32 %rd81, %r250; 2026-02-21T09:05:29.5267275Z cvt.u64.u32 %rd82, %r251; 2026-02-21T09:05:29.5267428Z shl.b64 %rd83, %rd82, 32; 2026-02-21T09:05:29.5267574Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T09:05:29.5267841Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5268164Z mov.b64 {%r290, %r291}, %rd84; 2026-02-21T09:05:29.5268335Z cvt.rn.f16x2.f32 %r292, %r291, %r290; 2026-02-21T09:05:29.5268607Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5268893Z cvt.u64.u32 %rd85, %r252; 
2026-02-21T09:05:29.5269045Z cvt.u64.u32 %rd86, %r253; 2026-02-21T09:05:29.5269187Z shl.b64 %rd87, %rd86, 32; 2026-02-21T09:05:29.5269341Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T09:05:29.5269594Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5269885Z mov.b64 {%r293, %r294}, %rd88; 2026-02-21T09:05:29.5270044Z cvt.rn.f16x2.f32 %r295, %r294, %r293; 2026-02-21T09:05:29.5270320Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5270597Z cvt.u64.u32 %rd89, %r254; 2026-02-21T09:05:29.5270781Z cvt.u64.u32 %rd90, %r255; 2026-02-21T09:05:29.5270934Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:05:29.5271080Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:05:29.5271343Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5271622Z mov.b64 {%r296, %r297}, %rd92; 2026-02-21T09:05:29.5271787Z cvt.rn.f16x2.f32 %r298, %r297, %r296; 2026-02-21T09:05:29.5272059Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5272342Z cvt.u64.u32 %rd93, %r256; 2026-02-21T09:05:29.5272490Z cvt.u64.u32 %rd94, %r257; 2026-02-21T09:05:29.5272634Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:05:29.5272784Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:05:29.5273041Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5273323Z mov.b64 {%r299, %r300}, %rd96; 2026-02-21T09:05:29.5273480Z cvt.rn.f16x2.f32 %r301, %r300, %r299; 2026-02-21T09:05:29.5273757Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5274032Z cvt.u64.u32 %rd97, %r258; 2026-02-21T09:05:29.5274182Z cvt.u64.u32 %rd98, %r259; 2026-02-21T09:05:29.5274384Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:05:29.5274530Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:05:29.5274831Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5275118Z 
mov.b64 {%r302, %r303}, %rd100; 2026-02-21T09:05:29.5275294Z cvt.rn.f16x2.f32 %r304, %r303, %r302; 2026-02-21T09:05:29.5275568Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5275860Z cvt.u64.u32 %rd101, %r260; 2026-02-21T09:05:29.5276032Z cvt.u64.u32 %rd102, %r261; 2026-02-21T09:05:29.5276188Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:05:29.5276350Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:05:29.5276607Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5276895Z mov.b64 {%r305, %r306}, %rd104; 2026-02-21T09:05:29.5277057Z cvt.rn.f16x2.f32 %r307, %r306, %r305; 2026-02-21T09:05:29.5277334Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5277648Z cvt.u64.u32 %rd105, %r262; 2026-02-21T09:05:29.5277805Z cvt.u64.u32 %rd106, %r263; 2026-02-21T09:05:29.5277989Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:05:29.5278155Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:05:29.5278432Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5278722Z mov.b64 {%r308, %r309}, %rd108; 2026-02-21T09:05:29.5278897Z cvt.rn.f16x2.f32 %r310, %r309, %r308; 2026-02-21T09:05:29.5279179Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5279485Z cvt.u64.u32 %rd109, %r264; 2026-02-21T09:05:29.5279681Z cvt.u64.u32 %rd110, %r265; 2026-02-21T09:05:29.5279835Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:05:29.5280001Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:05:29.5280273Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5280576Z mov.b64 {%r311, %r312}, %rd112; 2026-02-21T09:05:29.5280743Z cvt.rn.f16x2.f32 %r313, %r312, %r311; 2026-02-21T09:05:29.5281036Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5281331Z cvt.u64.u32 %rd113, %r267; 
2026-02-21T09:05:29.5281493Z cvt.u64.u32 %rd114, %r268; 2026-02-21T09:05:29.5281655Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:05:29.5281811Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:05:29.5282089Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5282417Z mov.b64 {%r314, %r315}, %rd116; 2026-02-21T09:05:29.5282598Z cvt.rn.f16x2.f32 %r316, %r315, %r314; 2026-02-21T09:05:29.5282889Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5283192Z cvt.u64.u32 %rd117, %r269; 2026-02-21T09:05:29.5283361Z cvt.u64.u32 %rd118, %r270; 2026-02-21T09:05:29.5283520Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:05:29.5283689Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:05:29.5283967Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5284275Z mov.b64 {%r317, %r318}, %rd120; 2026-02-21T09:05:29.5284447Z cvt.rn.f16x2.f32 %r319, %r318, %r317; 2026-02-21T09:05:29.5284769Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5285067Z cvt.u64.u32 %rd121, %r271; 2026-02-21T09:05:29.5285228Z cvt.u64.u32 %rd122, %r272; 2026-02-21T09:05:29.5285390Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:05:29.5285548Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:05:29.5285832Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5286111Z mov.b64 {%r320, %r321}, %rd124; 2026-02-21T09:05:29.5286315Z cvt.rn.f16x2.f32 %r322, %r321, %r320; 2026-02-21T09:05:29.5286584Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5286870Z cvt.u64.u32 %rd125, %r273; 2026-02-21T09:05:29.5287033Z cvt.u64.u32 %rd126, %r274; 2026-02-21T09:05:29.5287183Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:05:29.5287346Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:05:29.5287603Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 
2026-02-21T09:05:29.5287886Z mov.b64 {%r323, %r324}, %rd128; 2026-02-21T09:05:29.5288050Z cvt.rn.f16x2.f32 %r325, %r324, %r323; 2026-02-21T09:05:29.5288320Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5288598Z cvt.u64.u32 %rd129, %r275; 2026-02-21T09:05:29.5288752Z cvt.u64.u32 %rd130, %r276; 2026-02-21T09:05:29.5288904Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:05:29.5289060Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:05:29.5289324Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5289606Z mov.b64 {%r326, %r327}, %rd132; 2026-02-21T09:05:29.5289799Z cvt.rn.f16x2.f32 %r328, %r327, %r326; 2026-02-21T09:05:29.5290062Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5290348Z cvt.u64.u32 %rd133, %r277; 2026-02-21T09:05:29.5290498Z cvt.u64.u32 %rd134, %r278; 2026-02-21T09:05:29.5290639Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:05:29.5290788Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:05:29.5291041Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5291350Z mov.b64 {%r329, %r330}, %rd136; 2026-02-21T09:05:29.5291503Z cvt.rn.f16x2.f32 %r331, %r330, %r329; 2026-02-21T09:05:29.5291776Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5292059Z cvt.u64.u32 %rd137, %r279; 2026-02-21T09:05:29.5292206Z cvt.u64.u32 %rd138, %r280; 2026-02-21T09:05:29.5292355Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:05:29.5292504Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:05:29.5292765Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5293050Z mov.b64 {%r332, %r333}, %rd140; 2026-02-21T09:05:29.5293209Z cvt.rn.f16x2.f32 %r334, %r333, %r332; 2026-02-21T09:05:29.5293479Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5293789Z cvt.u64.u32 %rd141, 
%r281; 2026-02-21T09:05:29.5293938Z cvt.u64.u32 %rd142, %r282; 2026-02-21T09:05:29.5294079Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:05:29.5294229Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:05:29.5294474Z .loc 1 58 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:58:27 2026-02-21T09:05:29.5294789Z mov.b64 {%r335, %r336}, %rd144; 2026-02-21T09:05:29.5294944Z cvt.rn.f16x2.f32 %r337, %r336, %r335; 2026-02-21T09:05:29.5295211Z .loc 1 59 45 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:59:45 2026-02-21T09:05:29.5295502Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.5295669Z bar.sync 0, 128; 2026-02-21T09:05:29.5295841Z st.shared.v4.b32 [%r45], {%r292, %r295, %r298, %r301}; 2026-02-21T09:05:29.5296069Z st.shared.v4.b32 [%r45+4096], {%r316, %r319, %r322, %r325}; 2026-02-21T09:05:29.5296300Z st.shared.v4.b32 [%r46], {%r304, %r307, %r310, %r313}; 2026-02-21T09:05:29.5296388Z st.shared.v4.b32 [%r46+4096], {%r328, %r331, %r334, %r337}; 2026-02-21T09:05:29.5296445Z // begin inline asm 2026-02-21T09:05:29.5296519Z fence.proxy.async.shared::cta; 2026-02-21T09:05:29.5296571Z // end inline asm 2026-02-21T09:05:29.5296620Z bar.sync 0, 128; 2026-02-21T09:05:29.5296681Z elect.sync %r338|%p109, -1; 2026-02-21T09:05:29.5296779Z and.pred %p107, %p29, %p109; 2026-02-21T09:05:29.5296832Z // begin inline asm 2026-02-21T09:05:29.5297008Z @%p107 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd80, {%r392, %r391}], [%r244]; 2026-02-21T09:05:29.5297065Z // end inline asm 2026-02-21T09:05:29.5297127Z cp.async.bulk.commit_group; 2026-02-21T09:05:29.5297178Z mov.b32 %r389, 1; 2026-02-21T09:05:29.5297278Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.5297446Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5297502Z xor.b32 %r393, %r389, %r393; 2026-02-21T09:05:29.5297677Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5297737Z 
add.s32 %r388, %r388, 1; 2026-02-21T09:05:29.5297797Z setp.lt.s32 %p110, %r388, %r42; 2026-02-21T09:05:29.5297852Z @%p110 bra $L__BB0_18; 2026-02-21T09:05:29.5297910Z bra.uni $L__BB0_23; 2026-02-21T09:05:29.5298008Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:29.5298060Z add.s32 %r247, %r387, 1; 2026-02-21T09:05:29.5298121Z setp.eq.b32 %p103, %r387, 127; 2026-02-21T09:05:29.5298215Z selp.b32 %r387, 0, %r247, %p103; 2026-02-21T09:05:29.5298274Z setp.eq.b32 %p104, %r387, 127; 2026-02-21T09:05:29.5298327Z @%p104 bra $L__BB0_21; 2026-02-21T09:05:29.5298423Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.5298593Z .loc 1 0 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0:106 2026-02-21T09:05:29.5298643Z mov.b32 %r389, 0; 2026-02-21T09:05:29.5298816Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5298898Z setp.ne.b32 %p105, %r387, 0; 2026-02-21T09:05:29.5298951Z @%p105 bra $L__BB0_22; 2026-02-21T09:05:29.5299024Z // %bb.20: // %.thread 2026-02-21T09:05:29.5299109Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.5299162Z add.s32 %r390, %r390, 1; 2026-02-21T09:05:29.5299328Z .loc 1 39 35 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:39:35 2026-02-21T09:05:29.5299384Z shr.s32 %r340, %r390, 31; 2026-02-21T09:05:29.5299436Z shr.u32 %r341, %r340, 26; 2026-02-21T09:05:29.5299488Z add.s32 %r342, %r390, %r341; 2026-02-21T09:05:29.5299544Z shr.s32 %r343, %r342, 6; 2026-02-21T09:05:29.5299703Z .loc 1 40 33 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:40:33 2026-02-21T09:05:29.5299755Z shl.b32 %r344, %r343, 2; 2026-02-21T09:05:29.5299944Z .loc 1 41 39 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:41:39 2026-02-21T09:05:29.5300000Z sub.s32 %r345, 128, %r344; 2026-02-21T09:05:29.5300161Z .loc 1 41 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:41:52 2026-02-21T09:05:29.5300215Z min.s32 %r346, %r345, 4; 
2026-02-21T09:05:29.5300377Z .loc 1 42 45 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:45 2026-02-21T09:05:29.5300431Z and.b32 %r347, %r342, -64; 2026-02-21T09:05:29.5300484Z sub.s32 %r348, %r390, %r347; 2026-02-21T09:05:29.5300652Z .loc 1 43 51 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:43:51 2026-02-21T09:05:29.5300704Z div.s32 %r349, %r348, %r346; 2026-02-21T09:05:29.5300864Z .loc 1 42 64 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:64 2026-02-21T09:05:29.5300923Z mul.lo.s32 %r350, %r349, %r346; 2026-02-21T09:05:29.5300975Z sub.s32 %r351, %r348, %r350; 2026-02-21T09:05:29.5301141Z .loc 1 42 30 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:30 2026-02-21T09:05:29.5301195Z add.s32 %r352, %r351, %r344; 2026-02-21T09:05:29.5301356Z .loc 1 44 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:44:27 2026-02-21T09:05:29.5301429Z shl.b32 %r392, %r352, 4; 2026-02-21T09:05:29.5301594Z .loc 1 45 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:45:27 2026-02-21T09:05:29.5301650Z shl.b32 %r391, %r349, 8; 2026-02-21T09:05:29.5301817Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5301869Z bra.uni $L__BB0_22; 2026-02-21T09:05:29.5301963Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:05:29.5302127Z .loc 1 0 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0:106 2026-02-21T09:05:29.5302185Z mov.b32 %r65, global_smem; 2026-02-21T09:05:29.5302243Z add.s32 %r66, %r65, %r3; 2026-02-21T09:05:29.5302293Z mov.u32 %r114, %ctaid.x; 2026-02-21T09:05:29.5302346Z max.u32 %r115, %r114, 2047; 2026-02-21T09:05:29.5302396Z shl.b32 %r116, %r115, 7; 2026-02-21T09:05:29.5302456Z sub.s32 %r5, 262144, %r116; 2026-02-21T09:05:29.5302512Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:05:29.5302562Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.5302684Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5302852Z .loc 1 33 106 // 
cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5302926Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5302980Z barrier.sync 1; 2026-02-21T09:05:29.5303031Z barrier.sync 1; 2026-02-21T09:05:29.5303103Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5303177Z $L__BB0_2: // %.preheader 2026-02-21T09:05:29.5303267Z // =>This Loop Header: Depth=1 2026-02-21T09:05:29.5303372Z // Child Loop BB0_11 Depth 2 2026-02-21T09:05:29.5303450Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:29.5303616Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19 2026-02-21T09:05:29.5303687Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:29.5303739Z barrier.sync 1; 2026-02-21T09:05:29.5303804Z ld.shared.b8 %r64, [%r66+43092]; 2026-02-21T09:05:29.5303861Z setp.gt.u32 %p4, %r64, 3; 2026-02-21T09:05:29.5303912Z @%p4 bra $L__BB0_4; 2026-02-21T09:05:29.5303986Z // %bb.3: // %.preheader 2026-02-21T09:05:29.5304071Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5304128Z $L_brx_0: .branchtargets 2026-02-21T09:05:29.5304177Z $L__BB0_5, 2026-02-21T09:05:29.5304228Z $L__BB0_9, 2026-02-21T09:05:29.5304322Z $L__BB0_15, 2026-02-21T09:05:29.5304372Z $L__BB0_24; 2026-02-21T09:05:29.5304427Z brx.idx %r64, $L_brx_0; 2026-02-21T09:05:29.5304523Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5304737Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5304814Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5304893Z ld.shared.b32 %r121, [global_smem+32768]; 2026-02-21T09:05:29.5304948Z barrier.sync 1; 2026-02-21T09:05:29.5305004Z @%p17 bra $L__BB0_8; 2026-02-21T09:05:29.5305080Z // %bb.6: // %.lr.ph9 2026-02-21T09:05:29.5305163Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5305330Z .loc 1 0 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0:106 2026-02-21T09:05:29.5305383Z mov.b32 %r370, -1; 2026-02-21T09:05:29.5305448Z mov.pred 
%p122, 0; 2026-02-21T09:05:29.5305501Z mov.b32 %r367, 0; 2026-02-21T09:05:29.5305556Z mov.b32 %r368, %r367; 2026-02-21T09:05:29.5305614Z mov.b32 %r369, %r367; 2026-02-21T09:05:29.5305667Z mov.b32 %r371, %r367; 2026-02-21T09:05:29.5305759Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.5305884Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.5306069Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5306129Z add.s32 %r127, %r370, 1; 2026-02-21T09:05:29.5306194Z setp.eq.b32 %p26, %r370, 127; 2026-02-21T09:05:29.5306264Z selp.b32 %r370, 0, %r127, %p26; 2026-02-21T09:05:29.5306318Z shl.b32 %r128, %r369, 3; 2026-02-21T09:05:29.5306375Z add.s32 %r130, %r65, %r128; 2026-02-21T09:05:29.5306439Z add.s32 %r131, %r130, 43008; 2026-02-21T09:05:29.5306494Z add.s32 %r119, %r130, 43040; 2026-02-21T09:05:29.5306661Z .loc 1 54 31 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:54:31 2026-02-21T09:05:29.5306717Z shl.b32 %r132, %r369, 13; 2026-02-21T09:05:29.5306780Z add.s32 %r133, %r65, %r132; 2026-02-21T09:05:29.5306941Z .loc 1 55 44 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:55:44 2026-02-21T09:05:29.5306997Z shl.b32 %r134, %r369, 9; 2026-02-21T09:05:29.5307059Z add.s32 %r135, %r65, %r134; 2026-02-21T09:05:29.5307115Z add.s32 %r136, %r135, 40960; 2026-02-21T09:05:29.5307306Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5307373Z bar.warp.sync -1; 2026-02-21T09:05:29.5307428Z // begin inline asm 2026-02-21T09:05:29.5307477Z 2026-02-21T09:05:29.5307525Z { 2026-02-21T09:05:29.5307590Z .reg .pred complete; 2026-02-21T09:05:29.5307644Z waitLoop: 2026-02-21T09:05:29.5307763Z mbarrier.try_wait.parity.shared.b64 complete, [%r119], %r368; 2026-02-21T09:05:29.5307833Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.5307880Z } 2026-02-21T09:05:29.5307910Z 2026-02-21T09:05:29.5307964Z // end inline asm 2026-02-21T09:05:29.5308128Z .loc 1 33 106 // 
cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5308197Z setp.eq.b32 %p25, %r370, 127; 2026-02-21T09:05:29.5308356Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5308418Z elect.sync %r137|%p20, -1; 2026-02-21T09:05:29.5308487Z bfe.u32 %r138, %r133, 4, 14; 2026-02-21T09:05:29.5308545Z cvt.u64.u32 %rd20, %r138; 2026-02-21T09:05:29.5308618Z or.b64 %rd14, %rd20, -4611685949674356736; 2026-02-21T09:05:29.5308681Z bfe.u32 %r139, %r136, 4, 14; 2026-02-21T09:05:29.5308739Z cvt.u64.u32 %rd21, %r139; 2026-02-21T09:05:29.5308807Z or.b64 %rd15, %rd21, -4611685949705814016; 2026-02-21T09:05:29.5308861Z mov.b32 %r122, 134479888; 2026-02-21T09:05:29.5308923Z // begin inline asm 2026-02-21T09:05:29.5309089Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r121 + 0 ], %rd14, %rd15, %r122, %p122; 2026-02-21T09:05:29.5309146Z // end inline asm 2026-02-21T09:05:29.5309209Z add.s32 %r140, %r133, 4096; 2026-02-21T09:05:29.5309265Z bfe.u32 %r141, %r140, 4, 14; 2026-02-21T09:05:29.5309320Z cvt.u64.u32 %rd22, %r141; 2026-02-21T09:05:29.5309394Z or.b64 %rd16, %rd22, -4611685949674356736; 2026-02-21T09:05:29.5309449Z // begin inline asm 2026-02-21T09:05:29.5309584Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r121 + 16 ], %rd16, %rd15, %r122, %p122; 2026-02-21T09:05:29.5309637Z // end inline asm 2026-02-21T09:05:29.5309699Z cvt.u64.u32 %rd18, %r131; 2026-02-21T09:05:29.5309753Z // begin inline asm 2026-02-21T09:05:29.5309871Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd18]; 2026-02-21T09:05:29.5309931Z // end inline asm 2026-02-21T09:05:29.5309992Z and.pred %p24, %p25, %p20; 2026-02-21T09:05:29.5310046Z add.s32 %r142, %r65, 43072; 2026-02-21T09:05:29.5310108Z cvt.u64.u32 %rd19, %r142; 2026-02-21T09:05:29.5310161Z // begin inline asm 2026-02-21T09:05:29.5310280Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd19]; 2026-02-21T09:05:29.5310332Z // end inline asm 
2026-02-21T09:05:29.5310496Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5310583Z setp.ne.b32 %p122, %r370, 127; 2026-02-21T09:05:29.5310749Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5310814Z selp.b32 %r143, 1, 0, %p25; 2026-02-21T09:05:29.5310870Z xor.b32 %r367, %r367, %r143; 2026-02-21T09:05:29.5310923Z add.s32 %r125, %r65, 43088; 2026-02-21T09:05:29.5310982Z // begin inline asm 2026-02-21T09:05:29.5311031Z 2026-02-21T09:05:29.5311079Z { 2026-02-21T09:05:29.5311139Z @!%p25 bra.uni skipWait; 2026-02-21T09:05:29.5311202Z .reg .pred complete; 2026-02-21T09:05:29.5311254Z waitLoop: 2026-02-21T09:05:29.5311370Z mbarrier.try_wait.parity.shared.b64 complete, [%r125], %r367; 2026-02-21T09:05:29.5311438Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.5311490Z skipWait: 2026-02-21T09:05:29.5311538Z } 2026-02-21T09:05:29.5311541Z 2026-02-21T09:05:29.5311593Z // end inline asm 2026-02-21T09:05:29.5311758Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5311814Z add.s32 %r144, %r369, 1; 2026-02-21T09:05:29.5311872Z setp.eq.b32 %p27, %r144, 4; 2026-02-21T09:05:29.5311941Z selp.b32 %r369, 0, %r144, %p27; 2026-02-21T09:05:29.5312019Z selp.b32 %r145, 1, 0, %p27; 2026-02-21T09:05:29.5312078Z xor.b32 %r368, %r368, %r145; 2026-02-21T09:05:29.5312251Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5312314Z add.s32 %r371, %r371, 1; 2026-02-21T09:05:29.5312375Z setp.lt.s32 %p28, %r371, %r5; 2026-02-21T09:05:29.5312430Z @%p28 bra $L__BB0_7; 2026-02-21T09:05:29.5312522Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:05:29.5312630Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5312684Z barrier.sync 1; 2026-02-21T09:05:29.5312768Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5312821Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.5312914Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:05:29.5313085Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5313170Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5313226Z barrier.sync 1; 2026-02-21T09:05:29.5313393Z .loc 1 21 67 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:21:67 2026-02-21T09:05:29.5313462Z mov.u32 %r67, %ctaid.y; 2026-02-21T09:05:29.5313518Z mov.u32 %r68, %ctaid.z; 2026-02-21T09:05:29.5313575Z mov.u32 %r69, %nctaid.x; 2026-02-21T09:05:29.5313642Z mov.u32 %r70, %nctaid.y; 2026-02-21T09:05:29.5313727Z mad.lo.s32 %r71, %r68, %r70, %r67; 2026-02-21T09:05:29.5313792Z mad.lo.s32 %r72, %r71, %r69, %r114; 2026-02-21T09:05:29.5313849Z mul.lo.s32 %r73, %r72, 384; 2026-02-21T09:05:29.5314025Z .loc 1 22 67 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:22:67 2026-02-21T09:05:29.5314081Z add.s32 %r74, %r73, 128; 2026-02-21T09:05:29.5314138Z cvt.s64.s32 %rd8, %r74; 2026-02-21T09:05:29.5314203Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:05:29.5314263Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:05:29.5314429Z .loc 1 21 67 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:21:67 2026-02-21T09:05:29.5314492Z cvt.s64.s32 %rd10, %r73; 2026-02-21T09:05:29.5314548Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:05:29.5314609Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:05:29.5314810Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5314876Z @%p17 bra $L__BB0_14; 2026-02-21T09:05:29.5314950Z // %bb.10: // %.lr.ph 2026-02-21T09:05:29.5315034Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5315099Z add.s32 %r382, %r114, -1; 2026-02-21T09:05:29.5315154Z add.s32 %r19, %r1, -128; 2026-02-21T09:05:29.5315237Z mov.b32 %r378, -1; 2026-02-21T09:05:29.5315291Z mov.b32 %r372, 0; 2026-02-21T09:05:29.5315352Z mov.b32 %r373, %r372; 2026-02-21T09:05:29.5315406Z mov.b32 %r381, %r372; 2026-02-21T09:05:29.5315459Z mov.b32 %r380, %r372; 2026-02-21T09:05:29.5315520Z mov.b32 
%r376, %r372; 2026-02-21T09:05:29.5315573Z mov.b32 %r379, %r372; 2026-02-21T09:05:29.5315626Z bra.uni $L__BB0_11; 2026-02-21T09:05:29.5315729Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.5315893Z .loc 1 0 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0:106 2026-02-21T09:05:29.5315954Z selp.b32 %r97, 0, %r376, %p8; 2026-02-21T09:05:29.5316013Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:05:29.5316082Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:05:29.5316244Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5316300Z shl.b32 %r104, %r373, 3; 2026-02-21T09:05:29.5316364Z add.s32 %r106, %r65, %r104; 2026-02-21T09:05:29.5316419Z add.s32 %r93, %r106, 43008; 2026-02-21T09:05:29.5316575Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5316669Z // begin inline asm 2026-02-21T09:05:29.5316719Z 2026-02-21T09:05:29.5316767Z { 2026-02-21T09:05:29.5316825Z .reg .pred complete; 2026-02-21T09:05:29.5316885Z waitLoop: 2026-02-21T09:05:29.5316999Z mbarrier.try_wait.parity.shared.b64 complete, [%r93], %r372; 2026-02-21T09:05:29.5317059Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.5317115Z } 2026-02-21T09:05:29.5317118Z 2026-02-21T09:05:29.5317171Z // end inline asm 2026-02-21T09:05:29.5317342Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5317424Z add.s32 %r99, %r106, 43040; 2026-02-21T09:05:29.5317593Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5317649Z bar.sync 3, 64; 2026-02-21T09:05:29.5317703Z // begin inline asm 2026-02-21T09:05:29.5317815Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r99], 8704; 2026-02-21T09:05:29.5317869Z // end inline asm 2026-02-21T09:05:29.5318033Z .loc 1 54 31 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:54:31 2026-02-21T09:05:29.5318097Z shl.b32 %r107, %r373, 13; 2026-02-21T09:05:29.5318154Z add.s32 %r96, 
%r65, %r107; 2026-02-21T09:05:29.5318313Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5318368Z bar.sync 3, 64; 2026-02-21T09:05:29.5318434Z elect.sync %r108|%p13, -1; 2026-02-21T09:05:29.5318519Z and.pred %p10, %p12, %p13; 2026-02-21T09:05:29.5318575Z // begin inline asm 2026-02-21T09:05:29.5318822Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r96], [%rd12, {%r97, %r381}], [%r99]; 2026-02-21T09:05:29.5318877Z // end inline asm 2026-02-21T09:05:29.5319045Z .loc 1 55 44 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:55:44 2026-02-21T09:05:29.5319106Z shl.b32 %r109, %r373, 9; 2026-02-21T09:05:29.5319161Z add.s32 %r110, %r65, %r109; 2026-02-21T09:05:29.5319218Z add.s32 %r100, %r110, 40960; 2026-02-21T09:05:29.5319372Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5319432Z bar.sync 3, 64; 2026-02-21T09:05:29.5319491Z elect.sync %r111|%p14, -1; 2026-02-21T09:05:29.5319550Z and.pred %p11, %p12, %p14; 2026-02-21T09:05:29.5319609Z // begin inline asm 2026-02-21T09:05:29.5319845Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r100], [%rd13, {%r97, %r380}], [%r99]; 2026-02-21T09:05:29.5319900Z // end inline asm 2026-02-21T09:05:29.5320074Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5320130Z add.s32 %r376, %r97, 16; 2026-02-21T09:05:29.5320314Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5320375Z add.s32 %r112, %r373, 1; 2026-02-21T09:05:29.5320434Z setp.eq.b32 %p15, %r112, 4; 2026-02-21T09:05:29.5320498Z selp.b32 %r373, 0, %r112, %p15; 2026-02-21T09:05:29.5320555Z selp.b32 %r113, 1, 0, %p15; 2026-02-21T09:05:29.5320620Z xor.b32 %r372, %r372, %r113; 2026-02-21T09:05:29.5320793Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5320848Z add.s32 
%r379, %r379, 1; 2026-02-21T09:05:29.5320915Z setp.lt.s32 %p16, %r379, %r5; 2026-02-21T09:05:29.5320971Z @%p16 bra $L__BB0_11; 2026-02-21T09:05:29.5321026Z bra.uni $L__BB0_14; 2026-02-21T09:05:29.5321125Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.5321221Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.5321278Z add.s32 %r79, %r378, 1; 2026-02-21T09:05:29.5321338Z setp.eq.b32 %p6, %r378, 127; 2026-02-21T09:05:29.5321404Z selp.b32 %r378, 0, %r79, %p6; 2026-02-21T09:05:29.5321462Z setp.ne.b32 %p7, %r378, 0; 2026-02-21T09:05:29.5321520Z setp.eq.b32 %p8, %r378, 0; 2026-02-21T09:05:29.5321605Z @%p7 bra $L__BB0_13; 2026-02-21T09:05:29.5321701Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.5321758Z add.s32 %r382, %r382, 1; 2026-02-21T09:05:29.5321946Z .loc 1 39 35 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:39:35 2026-02-21T09:05:29.5322016Z shr.s32 %r80, %r382, 31; 2026-02-21T09:05:29.5322075Z shr.u32 %r81, %r80, 26; 2026-02-21T09:05:29.5322135Z add.s32 %r82, %r382, %r81; 2026-02-21T09:05:29.5322239Z shr.s32 %r83, %r82, 6; 2026-02-21T09:05:29.5322415Z .loc 1 40 33 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:40:33 2026-02-21T09:05:29.5322473Z shl.b32 %r84, %r83, 2; 2026-02-21T09:05:29.5322657Z .loc 1 41 39 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:41:39 2026-02-21T09:05:29.5322714Z sub.s32 %r85, 128, %r84; 2026-02-21T09:05:29.5322887Z .loc 1 41 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:41:52 2026-02-21T09:05:29.5322944Z min.s32 %r86, %r85, 4; 2026-02-21T09:05:29.5323126Z .loc 1 42 45 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:45 2026-02-21T09:05:29.5323184Z and.b32 %r87, %r82, -64; 2026-02-21T09:05:29.5323241Z sub.s32 %r88, %r382, %r87; 2026-02-21T09:05:29.5323442Z .loc 1 43 51 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:43:51 2026-02-21T09:05:29.5323503Z div.s32 %r89, %r88, %r86; 2026-02-21T09:05:29.5323678Z .loc 1 42 64 // 
cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:64 2026-02-21T09:05:29.5323745Z mul.lo.s32 %r90, %r89, %r86; 2026-02-21T09:05:29.5323803Z sub.s32 %r91, %r88, %r90; 2026-02-21T09:05:29.5323975Z .loc 1 42 30 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:42:30 2026-02-21T09:05:29.5324033Z add.s32 %r92, %r91, %r84; 2026-02-21T09:05:29.5324219Z .loc 1 44 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:44:27 2026-02-21T09:05:29.5324277Z shl.b32 %r380, %r92, 4; 2026-02-21T09:05:29.5324451Z .loc 1 45 27 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:45:27 2026-02-21T09:05:29.5324516Z shl.b32 %r381, %r89, 8; 2026-02-21T09:05:29.5324572Z bra.uni $L__BB0_13; 2026-02-21T09:05:29.5324657Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:29.5324782Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5324963Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5325020Z barrier.sync 1; 2026-02-21T09:05:29.5325146Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.5325204Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.5325302Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.5325474Z .loc 1 19 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:19 2026-02-21T09:05:29.5325540Z barrier.sync 1; 2026-02-21T09:05:29.5325596Z barrier.sync 1; 2026-02-21T09:05:29.5325652Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.5325744Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:05:29.5325922Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5325996Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.5326053Z bar.sync 0, 128; 2026-02-21T09:05:29.5326117Z barrier.sync 1; 2026-02-21T09:05:29.5326194Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:29.5326369Z .loc 1 56 52 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:56:52 2026-02-21T09:05:29.5326437Z // begin inline asm 2026-02-21T09:05:29.5326489Z 
2026-02-21T09:05:29.5326539Z { 2026-02-21T09:05:29.5326603Z .reg .pred complete; 2026-02-21T09:05:29.5326658Z waitLoop: 2026-02-21T09:05:29.5326807Z mbarrier.try_wait.parity.shared.b64 complete, [%r353], %r393; 2026-02-21T09:05:29.5326875Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.5326932Z } 2026-02-21T09:05:29.5326936Z 2026-02-21T09:05:29.5326991Z // end inline asm 2026-02-21T09:05:29.5327165Z .loc 1 33 106 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:106 2026-02-21T09:05:29.5327228Z bar.sync 0, 128; 2026-02-21T09:05:29.5327284Z // begin inline asm 2026-02-21T09:05:29.5327374Z @%p111 mbarrier.inval.shared::cta.b64 [%r353]; 2026-02-21T09:05:29.5327502Z // end inline asm 2026-02-21T09:05:29.5327558Z // begin inline asm 2026-02-21T09:05:29.5327644Z @%p111 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:05:29.5327698Z // end inline asm 2026-02-21T09:05:29.5327761Z // begin inline asm 2026-02-21T09:05:29.5327840Z @%p111 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T09:05:29.5327894Z // end inline asm 2026-02-21T09:05:29.5327954Z bar.sync 0, 128; 2026-02-21T09:05:29.5328011Z // begin inline asm 2026-02-21T09:05:29.5328087Z @%p111 mbarrier.inval.shared::cta.b64 [%r210]; 2026-02-21T09:05:29.5328141Z // end inline asm 2026-02-21T09:05:29.5328203Z bar.sync 0, 128; 2026-02-21T09:05:29.5328258Z // begin inline asm 2026-02-21T09:05:29.5328333Z @%p111 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T09:05:29.5328394Z // end inline asm 2026-02-21T09:05:29.5328448Z bar.sync 0, 128; 2026-02-21T09:05:29.5328502Z // begin inline asm 2026-02-21T09:05:29.5328604Z @%p111 mbarrier.inval.shared::cta.b64 [%r212]; 2026-02-21T09:05:29.5328670Z // end inline asm 2026-02-21T09:05:29.5328724Z // begin inline asm 2026-02-21T09:05:29.5328801Z @%p111 mbarrier.inval.shared::cta.b64 [%r205]; 2026-02-21T09:05:29.5328861Z // end inline asm 2026-02-21T09:05:29.5328917Z bar.sync 0, 128; 2026-02-21T09:05:29.5328972Z // begin inline asm 2026-02-21T09:05:29.5329057Z 
@%p111 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T09:05:29.5329111Z // end inline asm 2026-02-21T09:05:29.5329168Z bar.sync 0, 128; 2026-02-21T09:05:29.5329223Z // begin inline asm 2026-02-21T09:05:29.5329309Z @%p111 mbarrier.inval.shared::cta.b64 [%r207]; 2026-02-21T09:05:29.5329364Z // end inline asm 2026-02-21T09:05:29.5329419Z bar.sync 0, 128; 2026-02-21T09:05:29.5329480Z // begin inline asm 2026-02-21T09:05:29.5329558Z @%p111 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T09:05:29.5329618Z // end inline asm 2026-02-21T09:05:29.5329782Z .loc 1 33 4 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:33:4 2026-02-21T09:05:29.5329843Z bar.sync 0, 128; 2026-02-21T09:05:29.5329896Z // begin inline asm 2026-02-21T09:05:29.5330008Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r365, 32; 2026-02-21T09:05:29.5330071Z // end inline asm 2026-02-21T09:05:29.5330173Z st.shared.b32 [global_smem+43096], 50529027; 2026-02-21T09:05:29.5330229Z barrier.sync 1; 2026-02-21T09:05:29.5330316Z $L__BB0_24: // %common.ret 2026-02-21T09:05:29.5330473Z .loc 1 0 0 // cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py:0 2026-02-21T09:05:29.5330526Z ret; 2026-02-21T09:05:29.5330580Z $L__tmp1: 2026-02-21T09:05:29.5330641Z $L__func_end0: 2026-02-21T09:05:29.5330719Z // -- End function 2026-02-21T09:05:29.5330769Z } 2026-02-21T09:05:29.5330976Z .file 1 "/tmp/torchinductor_root/vf/cvfyxkesifkge7dsv656obwqlfubmlblmxcnqyrbiic5fbx3gvhl.py" 2026-02-21T09:05:29.5331038Z .section .debug_abbrev 2026-02-21T09:05:29.5331087Z { 2026-02-21T09:05:29.5331172Z .b8 1 // Abbreviation Code 2026-02-21T09:05:29.5331261Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:29.5331338Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:29.5331413Z .b8 37 // DW_AT_producer 2026-02-21T09:05:29.5331491Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.5331586Z .b8 19 // DW_AT_language 2026-02-21T09:05:29.5331662Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:29.5331741Z .b8 3 // DW_AT_name 
2026-02-21T09:05:29.5331811Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.5331885Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:29.5331965Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:29.5332039Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:29.5332128Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.5332196Z .b8 0 // EOM(1) 2026-02-21T09:05:29.5332270Z .b8 0 // EOM(2) 2026-02-21T09:05:29.5332334Z .b8 0 // EOM(3) 2026-02-21T09:05:29.5332381Z } 2026-02-21T09:05:29.5332444Z .section .debug_info 2026-02-21T09:05:29.5332492Z { 2026-02-21T09:05:29.5332571Z .b32 104 // Length of Unit 2026-02-21T09:05:29.5332651Z .b8 2 // DWARF version number 2026-02-21T09:05:29.5332709Z .b8 0 2026-02-21T09:05:29.5332819Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:29.5332902Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:29.5333026Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:29.5333104Z .b8 116 // DW_AT_producer 2026-02-21T09:05:29.5333156Z .b8 114 2026-02-21T09:05:29.5333213Z .b8 105 2026-02-21T09:05:29.5333263Z .b8 116 2026-02-21T09:05:29.5333312Z .b8 111 2026-02-21T09:05:29.5333359Z .b8 110 2026-02-21T09:05:29.5333416Z .b8 0 2026-02-21T09:05:29.5333488Z .b8 2 // DW_AT_language 2026-02-21T09:05:29.5333535Z .b8 0 2026-02-21T09:05:29.5333613Z .b8 99 // DW_AT_name 2026-02-21T09:05:29.5333664Z .b8 118 2026-02-21T09:05:29.5333713Z .b8 102 2026-02-21T09:05:29.5333760Z .b8 121 2026-02-21T09:05:29.5333816Z .b8 120 2026-02-21T09:05:29.5333865Z .b8 107 2026-02-21T09:05:29.5333912Z .b8 101 2026-02-21T09:05:29.5333966Z .b8 115 2026-02-21T09:05:29.5334014Z .b8 105 2026-02-21T09:05:29.5334060Z .b8 102 2026-02-21T09:05:29.5334107Z .b8 107 2026-02-21T09:05:29.5334161Z .b8 103 2026-02-21T09:05:29.5334210Z .b8 101 2026-02-21T09:05:29.5334258Z .b8 55 2026-02-21T09:05:29.5334307Z .b8 100 2026-02-21T09:05:29.5334362Z .b8 115 2026-02-21T09:05:29.5334410Z .b8 118 2026-02-21T09:05:29.5334458Z .b8 54 2026-02-21T09:05:29.5334511Z .b8 53 2026-02-21T09:05:29.5334560Z .b8 54 
2026-02-21T09:05:29.5334608Z .b8 111 2026-02-21T09:05:29.5334655Z .b8 98 2026-02-21T09:05:29.5334763Z .b8 119 2026-02-21T09:05:29.5334810Z .b8 113 2026-02-21T09:05:29.5334857Z .b8 108 2026-02-21T09:05:29.5334912Z .b8 102 2026-02-21T09:05:29.5334959Z .b8 117 2026-02-21T09:05:29.5335006Z .b8 98 2026-02-21T09:05:29.5335054Z .b8 109 2026-02-21T09:05:29.5335112Z .b8 108 2026-02-21T09:05:29.5335161Z .b8 98 2026-02-21T09:05:29.5335210Z .b8 108 2026-02-21T09:05:29.5335266Z .b8 109 2026-02-21T09:05:29.5335315Z .b8 120 2026-02-21T09:05:29.5335363Z .b8 99 2026-02-21T09:05:29.5335411Z .b8 110 2026-02-21T09:05:29.5335466Z .b8 113 2026-02-21T09:05:29.5335515Z .b8 121 2026-02-21T09:05:29.5335563Z .b8 114 2026-02-21T09:05:29.5335613Z .b8 98 2026-02-21T09:05:29.5335671Z .b8 105 2026-02-21T09:05:29.5335720Z .b8 105 2026-02-21T09:05:29.5335770Z .b8 99 2026-02-21T09:05:29.5335828Z .b8 53 2026-02-21T09:05:29.5335877Z .b8 102 2026-02-21T09:05:29.5335926Z .b8 98 2026-02-21T09:05:29.5335976Z .b8 120 2026-02-21T09:05:29.5336034Z .b8 51 2026-02-21T09:05:29.5336082Z .b8 103 2026-02-21T09:05:29.5336131Z .b8 118 2026-02-21T09:05:29.5336189Z .b8 104 2026-02-21T09:05:29.5336238Z .b8 108 2026-02-21T09:05:29.5336287Z .b8 46 2026-02-21T09:05:29.5336335Z .b8 112 2026-02-21T09:05:29.5336396Z .b8 121 2026-02-21T09:05:29.5336446Z .b8 0 2026-02-21T09:05:29.5336570Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:29.5336657Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:29.5336705Z .b8 116 2026-02-21T09:05:29.5336755Z .b8 109 2026-02-21T09:05:29.5336803Z .b8 112 2026-02-21T09:05:29.5336860Z .b8 47 2026-02-21T09:05:29.5336908Z .b8 116 2026-02-21T09:05:29.5336956Z .b8 111 2026-02-21T09:05:29.5337005Z .b8 114 2026-02-21T09:05:29.5337060Z .b8 99 2026-02-21T09:05:29.5337109Z .b8 104 2026-02-21T09:05:29.5337159Z .b8 105 2026-02-21T09:05:29.5337242Z .b8 110 2026-02-21T09:05:29.5337292Z .b8 100 2026-02-21T09:05:29.5337341Z .b8 117 2026-02-21T09:05:29.5337389Z .b8 99 2026-02-21T09:05:29.5337445Z .b8 116 
2026-02-21T09:05:29.5337495Z .b8 111 2026-02-21T09:05:29.5337544Z .b8 114 2026-02-21T09:05:29.5337600Z .b8 95 2026-02-21T09:05:29.5337649Z .b8 114 2026-02-21T09:05:29.5337699Z .b8 111 2026-02-21T09:05:29.5337747Z .b8 111 2026-02-21T09:05:29.5337806Z .b8 116 2026-02-21T09:05:29.5337857Z .b8 47 2026-02-21T09:05:29.5337906Z .b8 118 2026-02-21T09:05:29.5337964Z .b8 102 2026-02-21T09:05:29.5338014Z .b8 0 2026-02-21T09:05:29.5338064Z } 2026-02-21T09:05:29.5338127Z .section .debug_macinfo { } 2026-02-21T09:05:29.5338131Z 2026-02-21T09:05:29.5338215Z ================================================================ 2026-02-21T09:05:29.5338316Z please share the reproducer above with Triton project. 2026-02-21T09:05:29.7294416Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:29.7294435Z 2026-02-21T09:05:29.7295655Z Config: @helion.kernel(config=helion.Config(block_sizes=[512, 16, 16], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:29.7295811Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:29.7295881Z `ptxas` stderr: 2026-02-21T09:05:29.7296213Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 200 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:29.7296307Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.7296311Z 2026-02-21T09:05:29.7296722Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpydmx6q0y.ptx -o /tmp/tmpydmx6q0y.ptx.o 2026-02-21T09:05:29.7296728Z 2026-02-21T09:05:29.7296854Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:29.7296974Z 2026-02-21T09:05:29.7297063Z 2026-02-21T09:05:29.7297168Z ================================================================ 2026-02-21T09:05:29.7297242Z Internal Triton PTX codegen error 2026-02-21T09:05:29.7297308Z `ptxas` stderr: 2026-02-21T09:05:29.7297636Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 200 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:29.7297777Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.7297781Z 2026-02-21T09:05:29.7298173Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpydmx6q0y.ptx -o /tmp/tmpydmx6q0y.ptx.o 2026-02-21T09:05:29.7298187Z 2026-02-21T09:05:29.7298190Z 2026-02-21T09:05:29.7298246Z // 2026-02-21T09:05:29.7298320Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:29.7298369Z // 2026-02-21T09:05:29.7298379Z 2026-02-21T09:05:29.7298434Z .version 8.7 2026-02-21T09:05:29.7298489Z .target sm_100a 2026-02-21T09:05:29.7298543Z .address_size 64 2026-02-21T09:05:29.7298547Z 2026-02-21T09:05:29.7298674Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:29.7298815Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:29.7298898Z // @_helion_matmul 2026-02-21T09:05:29.7298972Z .visible .entry _helion_matmul( 2026-02-21T09:05:29.7299077Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:29.7299170Z .param .u64 .ptr 
.global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:29.7299267Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:29.7299362Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:29.7299499Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:29.7299550Z ) 2026-02-21T09:05:29.7299611Z .reqntid 640 2026-02-21T09:05:29.7299664Z .maxnreg 32 2026-02-21T09:05:29.7299715Z { 2026-02-21T09:05:29.7299784Z .reg .pred %p<144>; 2026-02-21T09:05:29.7299840Z .reg .b32 %r<585>; 2026-02-21T09:05:29.7299894Z .reg .b64 %rd<179>; 2026-02-21T09:05:29.7300073Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19:0 2026-02-21T09:05:29.7300137Z $L__func_begin0: 2026-02-21T09:05:29.7300303Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19:0 2026-02-21T09:05:29.7300307Z 2026-02-21T09:05:29.7300360Z // %bb.0: 2026-02-21T09:05:29.7300455Z ld.param.b64 %rd10, [_helion_matmul_param_3]; 2026-02-21T09:05:29.7300509Z $L__tmp0: 2026-02-21T09:05:29.7300703Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19 2026-02-21T09:05:29.7300773Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:29.7300829Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:29.7300901Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:29.7300964Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:05:29.7301030Z @%p3 bra $L__BB0_22; 2026-02-21T09:05:29.7301086Z bra.uni $L__BB0_1; 2026-02-21T09:05:29.7301140Z $L__BB0_22: 2026-02-21T09:05:29.7301309Z .loc 1 0 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:0 2026-02-21T09:05:29.7301391Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T09:05:29.7301468Z ld.param.b64 %rd8, [_helion_matmul_param_1]; 2026-02-21T09:05:29.7301631Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19 2026-02-21T09:05:29.7301709Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:29.7301773Z setp.lt.u32 %p62, %r1, 32; 2026-02-21T09:05:29.7301836Z 
mov.b32 %r318, global_smem; 2026-02-21T09:05:29.7301906Z // begin inline asm 2026-02-21T09:05:29.7302059Z @%p62 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r318], 128; 2026-02-21T09:05:29.7302115Z // end inline asm 2026-02-21T09:05:29.7302184Z bar.sync 0, 256; 2026-02-21T09:05:29.7302279Z ld.shared.b32 %r539, [global_smem]; 2026-02-21T09:05:29.7302338Z bar.sync 0, 256; 2026-02-21T09:05:29.7302395Z // begin inline asm 2026-02-21T09:05:29.7302525Z @%p62 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:29.7302582Z // end inline asm 2026-02-21T09:05:29.7302752Z .loc 1 21 67 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:21:67 2026-02-21T09:05:29.7302821Z mov.u32 %r81, %ctaid.x; 2026-02-21T09:05:29.7302879Z mov.u32 %r388, %ctaid.y; 2026-02-21T09:05:29.7302935Z mov.u32 %r389, %ctaid.z; 2026-02-21T09:05:29.7303001Z mov.u32 %r390, %nctaid.x; 2026-02-21T09:05:29.7303059Z mov.u32 %r391, %nctaid.y; 2026-02-21T09:05:29.7303127Z mad.lo.s32 %r392, %r389, %r391, %r388; 2026-02-21T09:05:29.7303195Z mad.lo.s32 %r393, %r392, %r390, %r81; 2026-02-21T09:05:29.7303259Z shl.b32 %r394, %r393, 8; 2026-02-21T09:05:29.7303319Z cvt.s64.s32 %rd113, %r394; 2026-02-21T09:05:29.7303379Z add.s64 %rd92, %rd10, %rd113; 2026-02-21T09:05:29.7303445Z shl.b32 %r395, %r1, 2; 2026-02-21T09:05:29.7303503Z add.s32 %r319, %r318, %r395; 2026-02-21T09:05:29.7303557Z mov.b32 %r582, 0; 2026-02-21T09:05:29.7303613Z // begin inline asm 2026-02-21T09:05:29.7303715Z @%p62 st.shared.b32 [ %r319 + 0 ], %r582; 2026-02-21T09:05:29.7303771Z // end inline asm 2026-02-21T09:05:29.7303832Z bar.warp.sync -1; 2026-02-21T09:05:29.7303899Z setp.eq.b32 %p130, %r1, 0; 2026-02-21T09:05:29.7303958Z cvt.u64.u32 %rd77, %r318; 2026-02-21T09:05:29.7304013Z // begin inline asm 2026-02-21T09:05:29.7304195Z @%p130 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd77 + 0 ], %rd8; 2026-02-21T09:05:29.7304252Z // end inline asm 2026-02-21T09:05:29.7304321Z // begin inline 
asm 2026-02-21T09:05:29.7304494Z @%p130 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1; 2026-02-21T09:05:29.7304576Z // end inline asm 2026-02-21T09:05:29.7304636Z mov.b32 %r321, 16; 2026-02-21T09:05:29.7304731Z // begin inline asm 2026-02-21T09:05:29.7304907Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r321; 2026-02-21T09:05:29.7304964Z // end inline asm 2026-02-21T09:05:29.7305027Z // begin inline asm 2026-02-21T09:05:29.7305193Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r321; 2026-02-21T09:05:29.7305259Z // end inline asm 2026-02-21T09:05:29.7305318Z mov.b32 %r323, 2048; 2026-02-21T09:05:29.7305375Z // begin inline asm 2026-02-21T09:05:29.7305551Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r323; 2026-02-21T09:05:29.7305606Z // end inline asm 2026-02-21T09:05:29.7305659Z // begin inline asm 2026-02-21T09:05:29.7305855Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r323; 2026-02-21T09:05:29.7305911Z // end inline asm 2026-02-21T09:05:29.7305966Z mov.b64 %rd85, 4096; 2026-02-21T09:05:29.7306027Z // begin inline asm 2026-02-21T09:05:29.7306195Z @%p130 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd77 + 0 ], 0x0, %rd85; 2026-02-21T09:05:29.7306249Z // end inline asm 2026-02-21T09:05:29.7306310Z mov.b32 %r325, 1; 2026-02-21T09:05:29.7306564Z // begin inline asm 2026-02-21T09:05:29.7306734Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r325; 2026-02-21T09:05:29.7306789Z // end inline asm 2026-02-21T09:05:29.7306850Z // begin inline asm 2026-02-21T09:05:29.7307018Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r325; 2026-02-21T09:05:29.7307071Z // end inline asm 2026-02-21T09:05:29.7307135Z // begin inline asm 2026-02-21T09:05:29.7307286Z @%p130 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x6; 2026-02-21T09:05:29.7307340Z // end inline asm 2026-02-21T09:05:29.7307404Z // begin inline asm 2026-02-21T09:05:29.7307569Z @%p130 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0; 2026-02-21T09:05:29.7307621Z // end inline asm 2026-02-21T09:05:29.7307708Z // begin inline asm 2026-02-21T09:05:29.7307868Z @%p130 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1; 2026-02-21T09:05:29.7307921Z // end inline asm 2026-02-21T09:05:29.7307976Z // begin inline asm 2026-02-21T09:05:29.7308131Z @%p130 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0; 2026-02-21T09:05:29.7308185Z // end inline asm 2026-02-21T09:05:29.7308239Z // begin inline asm 2026-02-21T09:05:29.7308512Z @%p62 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd92 + 0 ], [ %rd77 + 0 ], 0x80; 2026-02-21T09:05:29.7308568Z // end inline asm 2026-02-21T09:05:29.7308624Z // begin inline asm 2026-02-21T09:05:29.7308760Z @%p62 fence.proxy.tensormap::generic.acquire.gpu [ %rd92 + 0 ], 0x80; 2026-02-21T09:05:29.7308834Z @%p62 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.7308911Z @%p62 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.7308965Z // end inline asm 2026-02-21T09:05:29.7309029Z bar.sync 0, 256; 2026-02-21T09:05:29.7309201Z .loc 1 23 71 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:23:71 2026-02-21T09:05:29.7309262Z add.s64 %rd110, %rd92, 128; 2026-02-21T09:05:29.7309356Z bar.sync 0, 256; 2026-02-21T09:05:29.7309411Z // begin inline asm 2026-02-21T09:05:29.7309480Z @%p62 st.shared.b32 [ %r319 + 0 ], %r582; 2026-02-21T09:05:29.7309539Z // end inline asm 2026-02-21T09:05:29.7309598Z bar.warp.sync -1; 2026-02-21T09:05:29.7309652Z // begin inline asm 2026-02-21T09:05:29.7309813Z @%p130 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd77 + 0 ], %rd9; 2026-02-21T09:05:29.7309874Z 
// end inline asm 2026-02-21T09:05:29.7309929Z // begin inline asm 2026-02-21T09:05:29.7310098Z @%p130 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1; 2026-02-21T09:05:29.7310158Z // end inline asm 2026-02-21T09:05:29.7310212Z // begin inline asm 2026-02-21T09:05:29.7310363Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r321; 2026-02-21T09:05:29.7310424Z // end inline asm 2026-02-21T09:05:29.7310479Z mov.b32 %r330, 256; 2026-02-21T09:05:29.7310534Z // begin inline asm 2026-02-21T09:05:29.7310681Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r330; 2026-02-21T09:05:29.7310743Z // end inline asm 2026-02-21T09:05:29.7310796Z // begin inline asm 2026-02-21T09:05:29.7310953Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r323; 2026-02-21T09:05:29.7311013Z // end inline asm 2026-02-21T09:05:29.7311069Z mov.b32 %r332, 4096; 2026-02-21T09:05:29.7311122Z // begin inline asm 2026-02-21T09:05:29.7311311Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r332; 2026-02-21T09:05:29.7311367Z // end inline asm 2026-02-21T09:05:29.7311421Z // begin inline asm 2026-02-21T09:05:29.7311586Z @%p130 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd77 + 0 ], 0x0, %rd85; 2026-02-21T09:05:29.7311647Z // end inline asm 2026-02-21T09:05:29.7311701Z // begin inline asm 2026-02-21T09:05:29.7311868Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0, %r325; 2026-02-21T09:05:29.7311930Z // end inline asm 2026-02-21T09:05:29.7311984Z // begin inline asm 2026-02-21T09:05:29.7312149Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1, %r325; 2026-02-21T09:05:29.7312208Z // end inline asm 2026-02-21T09:05:29.7312262Z // begin inline asm 2026-02-21T09:05:29.7312408Z @%p130 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x6; 
2026-02-21T09:05:29.7312467Z // end inline asm 2026-02-21T09:05:29.7312522Z // begin inline asm 2026-02-21T09:05:29.7312696Z @%p130 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0; 2026-02-21T09:05:29.7312749Z // end inline asm 2026-02-21T09:05:29.7312809Z // begin inline asm 2026-02-21T09:05:29.7312958Z @%p130 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x1; 2026-02-21T09:05:29.7313036Z // end inline asm 2026-02-21T09:05:29.7313096Z // begin inline asm 2026-02-21T09:05:29.7313241Z @%p130 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd77 + 0 ], 0x0; 2026-02-21T09:05:29.7313294Z // end inline asm 2026-02-21T09:05:29.7313375Z // begin inline asm 2026-02-21T09:05:29.7313670Z @%p62 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd110 + 0 ], [ %rd77 + 0 ], 0x80; 2026-02-21T09:05:29.7313726Z // end inline asm 2026-02-21T09:05:29.7313781Z // begin inline asm 2026-02-21T09:05:29.7313922Z @%p62 fence.proxy.tensormap::generic.acquire.gpu [ %rd110 + 0 ], 0x80; 2026-02-21T09:05:29.7313996Z @%p62 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.7314071Z @%p62 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.7314134Z // end inline asm 2026-02-21T09:05:29.7314189Z bar.sync 0, 256; 2026-02-21T09:05:29.7314258Z cvta.global.u64 %rd114, %rd110; 2026-02-21T09:05:29.7314442Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7314503Z max.u32 %r396, %r81, 1023; 2026-02-21T09:05:29.7314562Z shl.b32 %r397, %r396, 7; 2026-02-21T09:05:29.7314647Z sub.s32 %r82, 131072, %r397; 2026-02-21T09:05:29.7314868Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7314943Z shfl.sync.idx.b32 %r83, %r2, 0, 31, -1; 2026-02-21T09:05:29.7315003Z shl.b32 %r398, %r83, 21; 2026-02-21T09:05:29.7315074Z and.b32 %r399, %r398, 6291456; 2026-02-21T09:05:29.7315134Z add.s32 %r400, %r399, %r539; 
2026-02-21T09:05:29.7315196Z shl.b32 %r401, %r83, 2; 2026-02-21T09:05:29.7315262Z and.b32 %r402, %r401, 16; 2026-02-21T09:05:29.7315352Z add.s32 %r335, %r400, %r402; 2026-02-21T09:05:29.7315417Z mov.pred %p100, -1; 2026-02-21T09:05:29.7315474Z // begin inline asm 2026-02-21T09:05:29.7315784Z @%p100 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r335 + 0], {%r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582}; 2026-02-21T09:05:29.7315843Z // end inline asm 2026-02-21T09:05:29.7315901Z // begin inline asm 2026-02-21T09:05:29.7316210Z @%p100 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r335 + 32], {%r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582, %r582}; 2026-02-21T09:05:29.7316266Z // end inline asm 2026-02-21T09:05:29.7316323Z // begin inline asm 2026-02-21T09:05:29.7316402Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:29.7316458Z // end inline asm 2026-02-21T09:05:29.7316515Z bar.sync 0, 256; 2026-02-21T09:05:29.7316721Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7316792Z add.s32 %r369, %r318, 67584; 2026-02-21T09:05:29.7316850Z // begin inline asm 2026-02-21T09:05:29.7316939Z @%p130 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:05:29.7317002Z // end inline asm 2026-02-21T09:05:29.7317059Z bar.sync 0, 256; 2026-02-21T09:05:29.7317119Z add.s32 %r370, %r318, 67592; 2026-02-21T09:05:29.7317175Z // begin inline asm 2026-02-21T09:05:29.7317274Z @%p130 mbarrier.init.shared::cta.b64 [%r370], 1; 2026-02-21T09:05:29.7317331Z // end inline asm 2026-02-21T09:05:29.7317388Z bar.sync 0, 256; 2026-02-21T09:05:29.7317459Z add.s32 %r371, %r318, 67600; 2026-02-21T09:05:29.7317516Z // begin inline asm 2026-02-21T09:05:29.7317604Z @%p130 mbarrier.init.shared::cta.b64 [%r371], 1; 2026-02-21T09:05:29.7317667Z // end inline asm 2026-02-21T09:05:29.7317722Z bar.sync 0, 256; 2026-02-21T09:05:29.7317783Z add.s32 %r372, 
%r318, 67608; 2026-02-21T09:05:29.7317840Z // begin inline asm 2026-02-21T09:05:29.7317934Z @%p130 mbarrier.init.shared::cta.b64 [%r372], 1; 2026-02-21T09:05:29.7317989Z // end inline asm 2026-02-21T09:05:29.7318047Z add.s32 %r373, %r318, 67616; 2026-02-21T09:05:29.7318110Z // begin inline asm 2026-02-21T09:05:29.7318245Z @%p130 mbarrier.init.shared::cta.b64 [%r373], 1; 2026-02-21T09:05:29.7318301Z // end inline asm 2026-02-21T09:05:29.7318355Z bar.sync 0, 256; 2026-02-21T09:05:29.7318421Z add.s32 %r374, %r318, 67624; 2026-02-21T09:05:29.7318478Z // begin inline asm 2026-02-21T09:05:29.7318559Z @%p130 mbarrier.init.shared::cta.b64 [%r374], 1; 2026-02-21T09:05:29.7318620Z // end inline asm 2026-02-21T09:05:29.7318676Z bar.sync 0, 256; 2026-02-21T09:05:29.7318737Z add.s32 %r375, %r318, 67632; 2026-02-21T09:05:29.7318793Z // begin inline asm 2026-02-21T09:05:29.7318882Z @%p130 mbarrier.init.shared::cta.b64 [%r375], 1; 2026-02-21T09:05:29.7318937Z // end inline asm 2026-02-21T09:05:29.7318992Z bar.sync 0, 256; 2026-02-21T09:05:29.7319060Z add.s32 %r376, %r318, 67640; 2026-02-21T09:05:29.7319118Z // begin inline asm 2026-02-21T09:05:29.7319198Z @%p130 mbarrier.init.shared::cta.b64 [%r376], 1; 2026-02-21T09:05:29.7319259Z // end inline asm 2026-02-21T09:05:29.7319431Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7319489Z bar.sync 0, 256; 2026-02-21T09:05:29.7319544Z // begin inline asm 2026-02-21T09:05:29.7319647Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r369]; 2026-02-21T09:05:29.7319728Z // end inline asm 2026-02-21T09:05:29.7319787Z bar.sync 0, 256; 2026-02-21T09:05:29.7319853Z // begin inline asm 2026-02-21T09:05:29.7319943Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r370]; 2026-02-21T09:05:29.7320001Z // end inline asm 2026-02-21T09:05:29.7320056Z bar.sync 0, 256; 2026-02-21T09:05:29.7320120Z // begin inline asm 2026-02-21T09:05:29.7320204Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r371]; 
2026-02-21T09:05:29.7320260Z // end inline asm 2026-02-21T09:05:29.7320363Z bar.sync 0, 256; 2026-02-21T09:05:29.7320420Z // begin inline asm 2026-02-21T09:05:29.7320504Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r372]; 2026-02-21T09:05:29.7320567Z // end inline asm 2026-02-21T09:05:29.7320733Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7320800Z bar.sync 0, 256; 2026-02-21T09:05:29.7320857Z add.s32 %r381, %r318, 67648; 2026-02-21T09:05:29.7320919Z // begin inline asm 2026-02-21T09:05:29.7320997Z @%p130 mbarrier.init.shared::cta.b64 [%r381], 1; 2026-02-21T09:05:29.7321048Z // end inline asm 2026-02-21T09:05:29.7321108Z bar.sync 0, 256; 2026-02-21T09:05:29.7321163Z add.s32 %r382, %r318, 67656; 2026-02-21T09:05:29.7321217Z // begin inline asm 2026-02-21T09:05:29.7321299Z @%p130 mbarrier.init.shared::cta.b64 [%r382], 1; 2026-02-21T09:05:29.7321352Z // end inline asm 2026-02-21T09:05:29.7321408Z add.s32 %r383, %r318, 67664; 2026-02-21T09:05:29.7321485Z // begin inline asm 2026-02-21T09:05:29.7321573Z @%p130 mbarrier.init.shared::cta.b64 [%r383], 1; 2026-02-21T09:05:29.7321626Z // end inline asm 2026-02-21T09:05:29.7321680Z bar.sync 0, 256; 2026-02-21T09:05:29.7321745Z add.s32 %r384, %r318, 67672; 2026-02-21T09:05:29.7321801Z // begin inline asm 2026-02-21T09:05:29.7321879Z @%p130 mbarrier.init.shared::cta.b64 [%r384], 1; 2026-02-21T09:05:29.7321931Z // end inline asm 2026-02-21T09:05:29.7322099Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7322152Z bar.sync 0, 256; 2026-02-21T09:05:29.7322206Z // begin inline asm 2026-02-21T09:05:29.7322293Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r383]; 2026-02-21T09:05:29.7322345Z // end inline asm 2026-02-21T09:05:29.7322397Z bar.sync 0, 256; 2026-02-21T09:05:29.7322457Z // begin inline asm 2026-02-21T09:05:29.7322540Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r384]; 2026-02-21T09:05:29.7322595Z // end inline asm 
2026-02-21T09:05:29.7322755Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7322844Z st.shared.v2.b32 [global_smem+67680], {0, 0}; 2026-02-21T09:05:29.7322920Z st.shared.b32 [global_smem+67688], 33685761; 2026-02-21T09:05:29.7323013Z st.shared.b32 [global_smem], %r539; 2026-02-21T09:05:29.7323075Z barrier.sync 1; 2026-02-21T09:05:29.7323130Z barrier.sync 1; 2026-02-21T09:05:29.7323193Z setp.lt.s32 %p120, %r82, 1; 2026-02-21T09:05:29.7323251Z mov.b32 %r584, %r582; 2026-02-21T09:05:29.7323316Z @%p120 bra $L__BB0_29; 2026-02-21T09:05:29.7323393Z // %bb.23: // %.lr.ph17 2026-02-21T09:05:29.7323451Z add.s32 %r580, %r81, -1; 2026-02-21T09:05:29.7323514Z shl.b32 %r405, %r1, 5; 2026-02-21T09:05:29.7323572Z and.b32 %r406, %r405, 8032; 2026-02-21T09:05:29.7323630Z bfe.s32 %r407, %r1, 2, 1; 2026-02-21T09:05:29.7323686Z and.b32 %r408, %r407, 144; 2026-02-21T09:05:29.7323751Z or.b32 %r409, %r408, %r406; 2026-02-21T09:05:29.7323809Z add.s32 %r411, %r318, 49152; 2026-02-21T09:05:29.7323866Z add.s32 %r86, %r411, %r409; 2026-02-21T09:05:29.7323929Z xor.b32 %r412, %r409, 16; 2026-02-21T09:05:29.7323984Z add.s32 %r87, %r411, %r412; 2026-02-21T09:05:29.7324041Z and.b32 %r413, %r83, 1; 2026-02-21T09:05:29.7324107Z shl.b32 %r414, %r413, 13; 2026-02-21T09:05:29.7324165Z add.s32 %r455, %r411, %r414; 2026-02-21T09:05:29.7324222Z shl.b32 %r89, %r413, 8; 2026-02-21T09:05:29.7324278Z mov.b32 %r576, -1; 2026-02-21T09:05:29.7324367Z mov.b32 %r584, %r582; 2026-02-21T09:05:29.7324439Z mov.b32 %r579, %r582; 2026-02-21T09:05:29.7324493Z mov.b32 %r578, %r582; 2026-02-21T09:05:29.7324555Z mov.b32 %r577, %r582; 2026-02-21T09:05:29.7324611Z bra.uni $L__BB0_24; 2026-02-21T09:05:29.7324749Z $L__BB0_27: // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:05:29.7324915Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7324984Z setp.lt.u32 %p126, %r1, 64; 2026-02-21T09:05:29.7325178Z .loc 1 32 74 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7325242Z shl.b32 %r457, %r582, 3; 2026-02-21T09:05:29.7325315Z add.s32 %r459, %r318, %r457; 2026-02-21T09:05:29.7325376Z add.s32 %r417, %r459, 67648; 2026-02-21T09:05:29.7325538Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7325604Z shl.b32 %r460, %r582, 6; 2026-02-21T09:05:29.7325658Z bar.sync 0, 256; 2026-02-21T09:05:29.7325713Z // begin inline asm 2026-02-21T09:05:29.7325763Z 2026-02-21T09:05:29.7325821Z { 2026-02-21T09:05:29.7325881Z .reg .pred complete; 2026-02-21T09:05:29.7325936Z waitLoop: 2026-02-21T09:05:29.7326065Z mbarrier.try_wait.parity.shared.b64 complete, [%r417], %r584; 2026-02-21T09:05:29.7326129Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7326178Z } 2026-02-21T09:05:29.7326182Z 2026-02-21T09:05:29.7326262Z // end inline asm 2026-02-21T09:05:29.7326328Z add.s32 %r435, %r335, %r460; 2026-02-21T09:05:29.7326383Z // begin inline asm 2026-02-21T09:05:29.7326650Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434}, [%r435 + 0]; 2026-02-21T09:05:29.7326714Z // end inline asm 2026-02-21T09:05:29.7326768Z // begin inline asm 2026-02-21T09:05:29.7327036Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451}, [%r435 + 32]; 2026-02-21T09:05:29.7327098Z // end inline asm 2026-02-21T09:05:29.7327152Z // begin inline asm 2026-02-21T09:05:29.7327219Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:29.7327272Z // end inline asm 2026-02-21T09:05:29.7327338Z cvt.u64.u32 %rd115, %r419; 2026-02-21T09:05:29.7327396Z cvt.u64.u32 %rd116, %r420; 2026-02-21T09:05:29.7327457Z shl.b64 %rd117, %rd116, 32; 2026-02-21T09:05:29.7327523Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T09:05:29.7327685Z .loc 1 59 27 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7327746Z mov.b64 {%r461, %r462}, %rd118; 2026-02-21T09:05:29.7327854Z cvt.rn.f16x2.f32 %r463, %r462, %r461; 2026-02-21T09:05:29.7328014Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7328073Z cvt.u64.u32 %rd119, %r421; 2026-02-21T09:05:29.7328131Z cvt.u64.u32 %rd120, %r422; 2026-02-21T09:05:29.7328194Z shl.b64 %rd121, %rd120, 32; 2026-02-21T09:05:29.7328254Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T09:05:29.7328414Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7328482Z mov.b64 {%r464, %r465}, %rd122; 2026-02-21T09:05:29.7328548Z cvt.rn.f16x2.f32 %r466, %r465, %r464; 2026-02-21T09:05:29.7328708Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7328772Z cvt.u64.u32 %rd123, %r423; 2026-02-21T09:05:29.7328828Z cvt.u64.u32 %rd124, %r424; 2026-02-21T09:05:29.7328883Z shl.b64 %rd125, %rd124, 32; 2026-02-21T09:05:29.7328941Z or.b64 %rd126, %rd123, %rd125; 2026-02-21T09:05:29.7329108Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7329167Z mov.b64 {%r467, %r468}, %rd126; 2026-02-21T09:05:29.7329256Z cvt.rn.f16x2.f32 %r469, %r468, %r467; 2026-02-21T09:05:29.7329420Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7329477Z cvt.u64.u32 %rd127, %r425; 2026-02-21T09:05:29.7329534Z cvt.u64.u32 %rd128, %r426; 2026-02-21T09:05:29.7329595Z shl.b64 %rd129, %rd128, 32; 2026-02-21T09:05:29.7329652Z or.b64 %rd130, %rd127, %rd129; 2026-02-21T09:05:29.7329809Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7329891Z mov.b64 {%r470, %r471}, %rd130; 2026-02-21T09:05:29.7329960Z cvt.rn.f16x2.f32 %r472, %r471, %r470; 2026-02-21T09:05:29.7330117Z .loc 1 57 52 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7330177Z cvt.u64.u32 %rd131, %r427; 2026-02-21T09:05:29.7330239Z cvt.u64.u32 %rd132, %r428; 2026-02-21T09:05:29.7330295Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:05:29.7330354Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:05:29.7330517Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7330575Z mov.b64 {%r473, %r474}, %rd134; 2026-02-21T09:05:29.7330637Z cvt.rn.f16x2.f32 %r475, %r474, %r473; 2026-02-21T09:05:29.7330793Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7330856Z cvt.u64.u32 %rd135, %r429; 2026-02-21T09:05:29.7330936Z cvt.u64.u32 %rd136, %r430; 2026-02-21T09:05:29.7330995Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:05:29.7331059Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:05:29.7331210Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7331268Z mov.b64 {%r476, %r477}, %rd138; 2026-02-21T09:05:29.7331337Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T09:05:29.7331490Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7331547Z cvt.u64.u32 %rd139, %r431; 2026-02-21T09:05:29.7331602Z cvt.u64.u32 %rd140, %r432; 2026-02-21T09:05:29.7331666Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:05:29.7331723Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:05:29.7331875Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7331938Z mov.b64 {%r479, %r480}, %rd142; 2026-02-21T09:05:29.7332000Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T09:05:29.7332154Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7332217Z cvt.u64.u32 %rd143, %r433; 2026-02-21T09:05:29.7332273Z cvt.u64.u32 %rd144, %r434; 2026-02-21T09:05:29.7332354Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:05:29.7332412Z or.b64 %rd146, %rd143, 
%rd145; 2026-02-21T09:05:29.7332585Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7332646Z mov.b64 {%r482, %r483}, %rd146; 2026-02-21T09:05:29.7332708Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T09:05:29.7332877Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7332936Z cvt.u64.u32 %rd147, %r436; 2026-02-21T09:05:29.7332994Z cvt.u64.u32 %rd148, %r437; 2026-02-21T09:05:29.7333059Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:05:29.7333118Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:05:29.7333277Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7333333Z mov.b64 {%r485, %r486}, %rd150; 2026-02-21T09:05:29.7333403Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T09:05:29.7333565Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7333622Z cvt.u64.u32 %rd151, %r438; 2026-02-21T09:05:29.7333683Z cvt.u64.u32 %rd152, %r439; 2026-02-21T09:05:29.7333760Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:05:29.7333819Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:05:29.7333983Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7334041Z mov.b64 {%r488, %r489}, %rd154; 2026-02-21T09:05:29.7334101Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T09:05:29.7334258Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7334347Z cvt.u64.u32 %rd155, %r440; 2026-02-21T09:05:29.7334403Z cvt.u64.u32 %rd156, %r441; 2026-02-21T09:05:29.7334459Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:05:29.7334525Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:05:29.7334724Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7334785Z mov.b64 {%r491, %r492}, %rd158; 2026-02-21T09:05:29.7334856Z cvt.rn.f16x2.f32 %r493, %r492, %r491; 2026-02-21T09:05:29.7335017Z .loc 1 57 52 
// cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7335073Z cvt.u64.u32 %rd159, %r442; 2026-02-21T09:05:29.7335129Z cvt.u64.u32 %rd160, %r443; 2026-02-21T09:05:29.7335193Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:05:29.7335251Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:05:29.7335411Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7335513Z mov.b64 {%r494, %r495}, %rd162; 2026-02-21T09:05:29.7335578Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T09:05:29.7335738Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7335807Z cvt.u64.u32 %rd163, %r444; 2026-02-21T09:05:29.7335864Z cvt.u64.u32 %rd164, %r445; 2026-02-21T09:05:29.7335921Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:05:29.7335978Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:05:29.7336145Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7336204Z mov.b64 {%r497, %r498}, %rd166; 2026-02-21T09:05:29.7336266Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T09:05:29.7336433Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7336492Z cvt.u64.u32 %rd167, %r446; 2026-02-21T09:05:29.7336550Z cvt.u64.u32 %rd168, %r447; 2026-02-21T09:05:29.7336617Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:05:29.7336676Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:05:29.7336835Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7336893Z mov.b64 {%r500, %r501}, %rd170; 2026-02-21T09:05:29.7336994Z cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T09:05:29.7337153Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7337212Z cvt.u64.u32 %rd171, %r448; 2026-02-21T09:05:29.7337277Z cvt.u64.u32 %rd172, %r449; 2026-02-21T09:05:29.7337334Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:05:29.7337391Z or.b64 %rd174, %rd171, 
%rd173; 2026-02-21T09:05:29.7337556Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7337615Z mov.b64 {%r503, %r504}, %rd174; 2026-02-21T09:05:29.7337675Z cvt.rn.f16x2.f32 %r505, %r504, %r503; 2026-02-21T09:05:29.7337835Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7337900Z cvt.u64.u32 %rd175, %r450; 2026-02-21T09:05:29.7337956Z cvt.u64.u32 %rd176, %r451; 2026-02-21T09:05:29.7338013Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:05:29.7338079Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:05:29.7338236Z .loc 1 59 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:59:27 2026-02-21T09:05:29.7338293Z mov.b64 {%r506, %r507}, %rd178; 2026-02-21T09:05:29.7338403Z cvt.rn.f16x2.f32 %r508, %r507, %r506; 2026-02-21T09:05:29.7338561Z .loc 1 60 45 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:60:45 2026-02-21T09:05:29.7338631Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.7338683Z bar.sync 0, 256; 2026-02-21T09:05:29.7338784Z st.shared.v4.b32 [%r86], {%r463, %r466, %r469, %r472}; 2026-02-21T09:05:29.7338886Z st.shared.v4.b32 [%r86+8192], {%r487, %r490, %r493, %r496}; 2026-02-21T09:05:29.7338975Z st.shared.v4.b32 [%r87], {%r475, %r478, %r481, %r484}; 2026-02-21T09:05:29.7339102Z st.shared.v4.b32 [%r87+8192], {%r499, %r502, %r505, %r508}; 2026-02-21T09:05:29.7339160Z // begin inline asm 2026-02-21T09:05:29.7339232Z fence.proxy.async.shared::cta; 2026-02-21T09:05:29.7339293Z // end inline asm 2026-02-21T09:05:29.7339346Z bar.sync 0, 256; 2026-02-21T09:05:29.7339408Z elect.sync %r509|%p127, -1; 2026-02-21T09:05:29.7339471Z and.pred %p124, %p126, %p127; 2026-02-21T09:05:29.7339537Z add.s32 %r454, %r579, %r89; 2026-02-21T09:05:29.7339593Z // begin inline asm 2026-02-21T09:05:29.7339775Z @%p124 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd114, {%r578, %r454}], [%r455]; 2026-02-21T09:05:29.7339836Z // end inline asm 2026-02-21T09:05:29.7339901Z 
cp.async.bulk.commit_group; 2026-02-21T09:05:29.7340063Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7340143Z add.s32 %r456, %r459, 67664; 2026-02-21T09:05:29.7340312Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7340366Z bar.sync 0, 256; 2026-02-21T09:05:29.7340420Z // begin inline asm 2026-02-21T09:05:29.7340516Z @%p130 mbarrier.arrive.shared::cta.b64 _, [%r456]; 2026-02-21T09:05:29.7340572Z // end inline asm 2026-02-21T09:05:29.7340629Z add.s32 %r510, %r582, 1; 2026-02-21T09:05:29.7340696Z setp.eq.b32 %p128, %r510, 2; 2026-02-21T09:05:29.7340760Z selp.b32 %r582, 0, %r510, %p128; 2026-02-21T09:05:29.7340818Z selp.b32 %r581, 1, 0, %p128; 2026-02-21T09:05:29.7340896Z $L__BB0_28: // %.thread24 2026-02-21T09:05:29.7340998Z // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:05:29.7341158Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7341216Z xor.b32 %r584, %r584, %r581; 2026-02-21T09:05:29.7341386Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7341447Z add.s32 %r577, %r577, 1; 2026-02-21T09:05:29.7341510Z setp.lt.s32 %p129, %r577, %r82; 2026-02-21T09:05:29.7341578Z @%p129 bra $L__BB0_24; 2026-02-21T09:05:29.7341656Z bra.uni $L__BB0_29; 2026-02-21T09:05:29.7341758Z $L__BB0_24: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:29.7341815Z add.s32 %r416, %r576, 1; 2026-02-21T09:05:29.7341887Z setp.eq.b32 %p121, %r576, 127; 2026-02-21T09:05:29.7341948Z selp.b32 %r576, 0, %r416, %p121; 2026-02-21T09:05:29.7342008Z setp.eq.b32 %p122, %r576, 127; 2026-02-21T09:05:29.7342073Z @%p122 bra $L__BB0_27; 2026-02-21T09:05:29.7342167Z // %bb.25: // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:05:29.7342325Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7342385Z mov.b32 %r581, 0; 2026-02-21T09:05:29.7342544Z .loc 1 32 74 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7342605Z setp.ne.b32 %p123, %r576, 0; 2026-02-21T09:05:29.7342662Z @%p123 bra $L__BB0_28; 2026-02-21T09:05:29.7342742Z // %bb.26: // %.thread 2026-02-21T09:05:29.7342828Z // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:05:29.7342884Z add.s32 %r580, %r580, 1; 2026-02-21T09:05:29.7343070Z .loc 1 38 35 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:38:35 2026-02-21T09:05:29.7343129Z shr.s32 %r512, %r580, 31; 2026-02-21T09:05:29.7343186Z shr.u32 %r513, %r512, 27; 2026-02-21T09:05:29.7343249Z add.s32 %r514, %r580, %r513; 2026-02-21T09:05:29.7343305Z shr.s32 %r515, %r514, 5; 2026-02-21T09:05:29.7343464Z .loc 1 39 33 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:39:33 2026-02-21T09:05:29.7343521Z shl.b32 %r516, %r515, 2; 2026-02-21T09:05:29.7343718Z .loc 1 40 39 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:39 2026-02-21T09:05:29.7343776Z sub.s32 %r517, 128, %r516; 2026-02-21T09:05:29.7343931Z .loc 1 40 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:52 2026-02-21T09:05:29.7343997Z min.s32 %r518, %r517, 4; 2026-02-21T09:05:29.7344153Z .loc 1 41 45 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:45 2026-02-21T09:05:29.7344213Z and.b32 %r519, %r514, -32; 2026-02-21T09:05:29.7344277Z sub.s32 %r520, %r580, %r519; 2026-02-21T09:05:29.7344433Z .loc 1 42 51 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:42:51 2026-02-21T09:05:29.7344489Z div.s32 %r521, %r520, %r518; 2026-02-21T09:05:29.7344660Z .loc 1 41 64 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:64 2026-02-21T09:05:29.7344788Z mul.lo.s32 %r522, %r521, %r518; 2026-02-21T09:05:29.7344848Z sub.s32 %r523, %r520, %r522; 2026-02-21T09:05:29.7345008Z .loc 1 41 30 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:30 2026-02-21T09:05:29.7345074Z add.s32 %r524, %r523, %r516; 2026-02-21T09:05:29.7345235Z .loc 1 43 27 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:43:27 2026-02-21T09:05:29.7345292Z shl.b32 %r578, %r524, 4; 2026-02-21T09:05:29.7345459Z .loc 1 44 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:44:27 2026-02-21T09:05:29.7345515Z shl.b32 %r579, %r521, 9; 2026-02-21T09:05:29.7345671Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7345734Z bra.uni $L__BB0_28; 2026-02-21T09:05:29.7345829Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:05:29.7345990Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7346082Z ld.param.b64 %rd7, [_helion_matmul_param_0]; 2026-02-21T09:05:29.7346141Z mov.b32 %r113, global_smem; 2026-02-21T09:05:29.7346198Z add.s32 %r114, %r113, %r3; 2026-02-21T09:05:29.7346253Z mov.u32 %r5, %ctaid.x; 2026-02-21T09:05:29.7346346Z max.u32 %r180, %r5, 1023; 2026-02-21T09:05:29.7346401Z shl.b32 %r181, %r180, 7; 2026-02-21T09:05:29.7346455Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.7346559Z $L__BB0_21: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7346716Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7346772Z barrier.sync 1; 2026-02-21T09:05:29.7346829Z barrier.sync 1; 2026-02-21T09:05:29.7346918Z $L__BB0_2: // %.preheader 2026-02-21T09:05:29.7347007Z // =>This Loop Header: Depth=1 2026-02-21T09:05:29.7347093Z // Child Loop BB0_17 Depth 2 2026-02-21T09:05:29.7347185Z // Child Loop BB0_9 Depth 2 2026-02-21T09:05:29.7347336Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19 2026-02-21T09:05:29.7347414Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:29.7347477Z barrier.sync 1; 2026-02-21T09:05:29.7347543Z ld.shared.b8 %r112, [%r114+67672]; 2026-02-21T09:05:29.7347603Z setp.gt.u32 %p4, %r112, 3; 2026-02-21T09:05:29.7347660Z @%p4 bra $L__BB0_4; 2026-02-21T09:05:29.7347778Z // %bb.3: // %.preheader 2026-02-21T09:05:29.7347864Z // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:05:29.7347925Z $L_brx_0: .branchtargets 2026-02-21T09:05:29.7347986Z $L__BB0_5, 2026-02-21T09:05:29.7348038Z $L__BB0_15, 2026-02-21T09:05:29.7348089Z $L__BB0_21, 2026-02-21T09:05:29.7348139Z $L__BB0_30; 2026-02-21T09:05:29.7348204Z brx.idx %r112, $L_brx_0; 2026-02-21T09:05:29.7348298Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7348489Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7348564Z ld.shared.b32 %r205, [global_smem]; 2026-02-21T09:05:29.7348621Z barrier.sync 1; 2026-02-21T09:05:29.7348680Z sub.s32 %r7, 131072, %r181; 2026-02-21T09:05:29.7348846Z .loc 1 45 45 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:45:45 2026-02-21T09:05:29.7348906Z add.s32 %r182, %r1, -256; 2026-02-21T09:05:29.7348961Z shr.u32 %r183, %r182, 5; 2026-02-21T09:05:29.7349016Z shr.u32 %r184, %r1, 1; 2026-02-21T09:05:29.7349080Z bfe.u32 %r8, %r1, 1, 7; 2026-02-21T09:05:29.7349135Z or.b32 %r9, %r8, 128; 2026-02-21T09:05:29.7349189Z or.b32 %r10, %r8, 256; 2026-02-21T09:05:29.7349253Z or.b32 %r11, %r184, 384; 2026-02-21T09:05:29.7349434Z .loc 1 51 48 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:51:48 2026-02-21T09:05:29.7349494Z shl.b32 %r185, %r1, 3; 2026-02-21T09:05:29.7349559Z and.b32 %r12, %r185, 8; 2026-02-21T09:05:29.7349717Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7349777Z setp.lt.s32 %p16, %r7, 1; 2026-02-21T09:05:29.7349837Z setp.gt.s32 %p15, %r7, 0; 2026-02-21T09:05:29.7350004Z .loc 1 39 33 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:39:33 2026-02-21T09:05:29.7350063Z shr.u32 %r186, %r5, 3; 2026-02-21T09:05:29.7350126Z and.b32 %r187, %r186, 268435452; 2026-02-21T09:05:29.7350296Z .loc 1 40 39 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:39 2026-02-21T09:05:29.7350356Z sub.s32 %r188, 128, %r187; 2026-02-21T09:05:29.7350517Z .loc 1 40 52 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:52 2026-02-21T09:05:29.7350582Z min.s32 %r189, %r188, 4; 2026-02-21T09:05:29.7350739Z .loc 1 41 45 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:45 2026-02-21T09:05:29.7350798Z and.b32 %r190, %r5, 31; 2026-02-21T09:05:29.7350956Z .loc 1 42 51 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:42:51 2026-02-21T09:05:29.7351047Z div.s32 %r191, %r190, %r189; 2026-02-21T09:05:29.7351207Z .loc 1 44 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:44:27 2026-02-21T09:05:29.7351263Z shl.b32 %r192, %r191, 9; 2026-02-21T09:05:29.7351428Z .loc 1 45 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:45:32 2026-02-21T09:05:29.7351484Z or.b32 %r557, %r192, %r8; 2026-02-21T09:05:29.7351541Z or.b32 %r558, %r192, %r9; 2026-02-21T09:05:29.7351605Z or.b32 %r559, %r192, %r10; 2026-02-21T09:05:29.7351661Z or.b32 %r560, %r192, %r11; 2026-02-21T09:05:29.7351818Z .loc 1 55 53 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:53 2026-02-21T09:05:29.7351875Z shl.b32 %r193, %r557, 11; 2026-02-21T09:05:29.7351935Z shl.b32 %r194, %r558, 11; 2026-02-21T09:05:29.7351989Z shl.b32 %r195, %r559, 11; 2026-02-21T09:05:29.7352044Z shl.b32 %r196, %r560, 11; 2026-02-21T09:05:29.7352209Z .loc 1 55 60 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:60 2026-02-21T09:05:29.7352266Z or.b32 %r197, %r193, %r12; 2026-02-21T09:05:29.7352322Z or.b32 %r198, %r194, %r12; 2026-02-21T09:05:29.7352382Z or.b32 %r199, %r195, %r12; 2026-02-21T09:05:29.7352458Z or.b32 %r200, %r196, %r12; 2026-02-21T09:05:29.7352619Z .loc 1 55 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:32 2026-02-21T09:05:29.7352683Z mad.wide.s32 %rd14, %r197, 2, %rd7; 2026-02-21T09:05:29.7352756Z mad.wide.s32 %rd15, %r198, 2, %rd7; 2026-02-21T09:05:29.7352817Z mad.wide.s32 %rd16, %r199, 2, %rd7; 2026-02-21T09:05:29.7352877Z mad.wide.s32 %rd17, %r200, 2, %rd7; 2026-02-21T09:05:29.7353046Z .loc 1 55 85 
// cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7353135Z shl.b32 %r201, %r1, 4; 2026-02-21T09:05:29.7353193Z and.b32 %r202, %r201, 3952; 2026-02-21T09:05:29.7353257Z bfe.s32 %r203, %r1, 3, 1; 2026-02-21T09:05:29.7353314Z and.b32 %r204, %r203, 144; 2026-02-21T09:05:29.7353370Z xor.b32 %r17, %r204, %r202; 2026-02-21T09:05:29.7353424Z add.s32 %r228, %r113, %r17; 2026-02-21T09:05:29.7353491Z selp.b32 %r154, 16, 0, %p15; 2026-02-21T09:05:29.7353547Z // begin inline asm 2026-02-21T09:05:29.7353669Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd14 + 0 ], 0x10, %r154; 2026-02-21T09:05:29.7353730Z // end inline asm 2026-02-21T09:05:29.7353785Z add.s32 %r230, %r228, 4096; 2026-02-21T09:05:29.7353839Z // begin inline asm 2026-02-21T09:05:29.7353955Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd15 + 0 ], 0x10, %r154; 2026-02-21T09:05:29.7354017Z // end inline asm 2026-02-21T09:05:29.7354097Z add.s32 %r232, %r228, 8192; 2026-02-21T09:05:29.7354152Z // begin inline asm 2026-02-21T09:05:29.7354271Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd16 + 0 ], 0x10, %r154; 2026-02-21T09:05:29.7354324Z // end inline asm 2026-02-21T09:05:29.7354380Z add.s32 %r234, %r228, 12288; 2026-02-21T09:05:29.7354443Z // begin inline asm 2026-02-21T09:05:29.7354552Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd17 + 0 ], 0x10, %r154; 2026-02-21T09:05:29.7354604Z // end inline asm 2026-02-21T09:05:29.7354664Z cp.async.commit_group; 2026-02-21T09:05:29.7354892Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7354952Z setp.gt.s32 %p17, %r7, 1; 2026-02-21T09:05:29.7355112Z .loc 1 55 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:32 2026-02-21T09:05:29.7355176Z cvt.s64.s32 %rd26, %r193; 2026-02-21T09:05:29.7355234Z cvt.u64.u32 %rd27, %r12; 2026-02-21T09:05:29.7355289Z or.b64 %rd28, %rd26, %rd27; 2026-02-21T09:05:29.7355347Z shl.b64 %rd29, %rd28, 1; 2026-02-21T09:05:29.7355414Z add.s64 %rd1, %rd7, 
%rd29; 2026-02-21T09:05:29.7355469Z add.s64 %rd18, %rd1, 32; 2026-02-21T09:05:29.7355525Z cvt.s64.s32 %rd30, %r194; 2026-02-21T09:05:29.7355589Z or.b64 %rd31, %rd30, %rd27; 2026-02-21T09:05:29.7355675Z shl.b64 %rd32, %rd31, 1; 2026-02-21T09:05:29.7355734Z add.s64 %rd2, %rd7, %rd32; 2026-02-21T09:05:29.7355795Z add.s64 %rd19, %rd2, 32; 2026-02-21T09:05:29.7355851Z cvt.s64.s32 %rd33, %r195; 2026-02-21T09:05:29.7355910Z or.b64 %rd34, %rd33, %rd27; 2026-02-21T09:05:29.7355966Z shl.b64 %rd35, %rd34, 1; 2026-02-21T09:05:29.7356033Z add.s64 %rd3, %rd7, %rd35; 2026-02-21T09:05:29.7356090Z add.s64 %rd20, %rd3, 32; 2026-02-21T09:05:29.7356147Z cvt.s64.s32 %rd36, %r196; 2026-02-21T09:05:29.7356212Z or.b64 %rd37, %rd36, %rd27; 2026-02-21T09:05:29.7356268Z shl.b64 %rd38, %rd37, 1; 2026-02-21T09:05:29.7356324Z add.s64 %rd4, %rd7, %rd38; 2026-02-21T09:05:29.7356380Z add.s64 %rd21, %rd4, 32; 2026-02-21T09:05:29.7356547Z .loc 1 55 85 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7356603Z bar.sync 2, 256; 2026-02-21T09:05:29.7356659Z add.s32 %r161, %r228, 16384; 2026-02-21T09:05:29.7356726Z selp.b32 %r162, 16, 0, %p17; 2026-02-21T09:05:29.7356784Z // begin inline asm 2026-02-21T09:05:29.7356893Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd18 + 0 ], 0x10, %r162; 2026-02-21T09:05:29.7356945Z // end inline asm 2026-02-21T09:05:29.7357009Z add.s32 %r163, %r228, 20480; 2026-02-21T09:05:29.7357093Z // begin inline asm 2026-02-21T09:05:29.7357204Z cp.async.cg.shared.global [ %r163 + 0 ], [ %rd19 + 0 ], 0x10, %r162; 2026-02-21T09:05:29.7357266Z // end inline asm 2026-02-21T09:05:29.7357323Z add.s32 %r165, %r228, 24576; 2026-02-21T09:05:29.7357377Z // begin inline asm 2026-02-21T09:05:29.7357494Z cp.async.cg.shared.global [ %r165 + 0 ], [ %rd20 + 0 ], 0x10, %r162; 2026-02-21T09:05:29.7357548Z // end inline asm 2026-02-21T09:05:29.7357606Z add.s32 %r167, %r228, 28672; 2026-02-21T09:05:29.7357694Z // begin inline asm 2026-02-21T09:05:29.7357820Z 
cp.async.cg.shared.global [ %r167 + 0 ], [ %rd21 + 0 ], 0x10, %r162; 2026-02-21T09:05:29.7357875Z // end inline asm 2026-02-21T09:05:29.7357937Z cp.async.commit_group; 2026-02-21T09:05:29.7358109Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7358172Z setp.gt.s32 %p18, %r7, 2; 2026-02-21T09:05:29.7358364Z .loc 1 55 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:32 2026-02-21T09:05:29.7358424Z add.s64 %rd22, %rd1, 64; 2026-02-21T09:05:29.7358490Z add.s64 %rd23, %rd2, 64; 2026-02-21T09:05:29.7358549Z add.s64 %rd24, %rd3, 64; 2026-02-21T09:05:29.7358608Z add.s64 %rd25, %rd4, 64; 2026-02-21T09:05:29.7358783Z .loc 1 55 85 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7358839Z bar.sync 2, 256; 2026-02-21T09:05:29.7358927Z add.s32 %r169, %r228, 32768; 2026-02-21T09:05:29.7359000Z selp.b32 %r170, 16, 0, %p18; 2026-02-21T09:05:29.7359058Z // begin inline asm 2026-02-21T09:05:29.7359172Z cp.async.cg.shared.global [ %r169 + 0 ], [ %rd22 + 0 ], 0x10, %r170; 2026-02-21T09:05:29.7359227Z // end inline asm 2026-02-21T09:05:29.7359297Z add.s32 %r171, %r228, 36864; 2026-02-21T09:05:29.7359355Z // begin inline asm 2026-02-21T09:05:29.7359469Z cp.async.cg.shared.global [ %r171 + 0 ], [ %rd23 + 0 ], 0x10, %r170; 2026-02-21T09:05:29.7359534Z // end inline asm 2026-02-21T09:05:29.7359593Z add.s32 %r173, %r228, 40960; 2026-02-21T09:05:29.7359651Z // begin inline asm 2026-02-21T09:05:29.7359761Z cp.async.cg.shared.global [ %r173 + 0 ], [ %rd24 + 0 ], 0x10, %r170; 2026-02-21T09:05:29.7359825Z // end inline asm 2026-02-21T09:05:29.7359883Z add.s32 %r175, %r228, 45056; 2026-02-21T09:05:29.7359941Z // begin inline asm 2026-02-21T09:05:29.7360062Z cp.async.cg.shared.global [ %r175 + 0 ], [ %rd25 + 0 ], 0x10, %r170; 2026-02-21T09:05:29.7360120Z // end inline asm 2026-02-21T09:05:29.7360188Z cp.async.commit_group; 2026-02-21T09:05:29.7360258Z cp.async.wait_group 2; 
2026-02-21T09:05:29.7360316Z bar.sync 2, 256; 2026-02-21T09:05:29.7360486Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7360568Z add.s32 %r177, %r113, 67616; 2026-02-21T09:05:29.7360631Z mov.b32 %r178, 0; 2026-02-21T09:05:29.7360689Z // begin inline asm 2026-02-21T09:05:29.7360742Z 2026-02-21T09:05:29.7360802Z { 2026-02-21T09:05:29.7360868Z @!%p15 bra.uni skipWait; 2026-02-21T09:05:29.7360929Z .reg .pred complete; 2026-02-21T09:05:29.7360986Z waitLoop: 2026-02-21T09:05:29.7361121Z mbarrier.try_wait.parity.shared.b64 complete, [%r177], %r178; 2026-02-21T09:05:29.7361187Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7361242Z skipWait: 2026-02-21T09:05:29.7361301Z } 2026-02-21T09:05:29.7361304Z 2026-02-21T09:05:29.7361361Z // end inline asm 2026-02-21T09:05:29.7361531Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7361617Z shfl.sync.idx.b32 %r22, %r183, 0, 31, -1; 2026-02-21T09:05:29.7361679Z setp.ne.b32 %p19, %r22, 0; 2026-02-21T09:05:29.7361745Z or.pred %p20, %p16, %p19; 2026-02-21T09:05:29.7361805Z @%p20 bra $L__BB0_7; 2026-02-21T09:05:29.7361911Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7361976Z elect.sync %r213|%p22, -1; 2026-02-21T09:05:29.7362061Z bfe.u32 %r215, %r113, 4, 14; 2026-02-21T09:05:29.7362131Z cvt.u64.u32 %rd49, %r215; 2026-02-21T09:05:29.7362205Z or.b64 %rd39, %rd49, -4611685949640802304; 2026-02-21T09:05:29.7362264Z add.s32 %r216, %r113, 65536; 2026-02-21T09:05:29.7362322Z bfe.u32 %r217, %r216, 4, 14; 2026-02-21T09:05:29.7362388Z cvt.u64.u32 %rd50, %r217; 2026-02-21T09:05:29.7362461Z or.b64 %rd40, %rd50, -4611685949705814016; 2026-02-21T09:05:29.7362519Z mov.b32 %r206, 134479888; 2026-02-21T09:05:29.7362585Z mov.pred %p21, 0; 2026-02-21T09:05:29.7362666Z // begin inline asm 2026-02-21T09:05:29.7362818Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r205 + 0 ], %rd39, %rd40, %r206, %p21; 2026-02-21T09:05:29.7362881Z // end 
inline asm 2026-02-21T09:05:29.7362940Z add.s32 %r218, %r113, 4096; 2026-02-21T09:05:29.7362999Z bfe.u32 %r219, %r218, 4, 14; 2026-02-21T09:05:29.7363058Z cvt.u64.u32 %rd51, %r219; 2026-02-21T09:05:29.7363133Z or.b64 %rd41, %rd51, -4611685949640802304; 2026-02-21T09:05:29.7363190Z // begin inline asm 2026-02-21T09:05:29.7363334Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r205 + 16 ], %rd41, %rd40, %r206, %p21; 2026-02-21T09:05:29.7363399Z // end inline asm 2026-02-21T09:05:29.7363460Z add.s32 %r220, %r113, 8192; 2026-02-21T09:05:29.7363519Z bfe.u32 %r221, %r220, 4, 14; 2026-02-21T09:05:29.7363580Z cvt.u64.u32 %rd52, %r221; 2026-02-21T09:05:29.7363658Z or.b64 %rd43, %rd52, -4611685949640802304; 2026-02-21T09:05:29.7363715Z // begin inline asm 2026-02-21T09:05:29.7363875Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r205 + 32 ], %rd43, %rd40, %r206, %p21; 2026-02-21T09:05:29.7363944Z // end inline asm 2026-02-21T09:05:29.7364006Z add.s32 %r222, %r113, 12288; 2026-02-21T09:05:29.7364066Z bfe.u32 %r223, %r222, 4, 14; 2026-02-21T09:05:29.7364135Z cvt.u64.u32 %rd53, %r223; 2026-02-21T09:05:29.7364204Z or.b64 %rd45, %rd53, -4611685949640802304; 2026-02-21T09:05:29.7364264Z // begin inline asm 2026-02-21T09:05:29.7364406Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r205 + 48 ], %rd45, %rd40, %r206, %p21; 2026-02-21T09:05:29.7364470Z // end inline asm 2026-02-21T09:05:29.7364528Z add.s32 %r224, %r113, 67584; 2026-02-21T09:05:29.7364588Z cvt.u64.u32 %rd47, %r224; 2026-02-21T09:05:29.7364651Z // begin inline asm 2026-02-21T09:05:29.7364814Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd47]; 2026-02-21T09:05:29.7364872Z // end inline asm 2026-02-21T09:05:29.7364937Z add.s32 %r225, %r113, 67648; 2026-02-21T09:05:29.7364996Z cvt.u64.u32 %rd48, %r225; 2026-02-21T09:05:29.7365056Z // begin inline asm 2026-02-21T09:05:29.7365178Z @%p21 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd48]; 2026-02-21T09:05:29.7365244Z // end inline asm 
2026-02-21T09:05:29.7365344Z $L__BB0_7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7365549Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7365623Z setp.gt.s32 %p32, %r7, 3; 2026-02-21T09:05:29.7365798Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7365858Z add.s32 %r226, %r113, 67664; 2026-02-21T09:05:29.7365928Z mov.pred %p143, 0; 2026-02-21T09:05:29.7365985Z // begin inline asm 2026-02-21T09:05:29.7366039Z 2026-02-21T09:05:29.7366092Z { 2026-02-21T09:05:29.7366165Z @!%p143 bra.uni skipWait; 2026-02-21T09:05:29.7366226Z .reg .pred complete; 2026-02-21T09:05:29.7366282Z waitLoop: 2026-02-21T09:05:29.7366416Z mbarrier.try_wait.parity.shared.b64 complete, [%r226], %r178; 2026-02-21T09:05:29.7366483Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7366538Z skipWait: 2026-02-21T09:05:29.7366589Z } 2026-02-21T09:05:29.7366593Z 2026-02-21T09:05:29.7366656Z // end inline asm 2026-02-21T09:05:29.7366829Z .loc 1 55 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:32 2026-02-21T09:05:29.7366889Z add.s64 %rd54, %rd1, 96; 2026-02-21T09:05:29.7366962Z add.s64 %rd55, %rd2, 96; 2026-02-21T09:05:29.7367044Z add.s64 %rd56, %rd3, 96; 2026-02-21T09:05:29.7367099Z add.s64 %rd57, %rd4, 96; 2026-02-21T09:05:29.7367257Z .loc 1 55 85 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7367319Z bar.sync 2, 256; 2026-02-21T09:05:29.7367376Z selp.b32 %r229, 16, 0, %p32; 2026-02-21T09:05:29.7367431Z // begin inline asm 2026-02-21T09:05:29.7367549Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd54 + 0 ], 0x10, %r229; 2026-02-21T09:05:29.7367603Z // end inline asm 2026-02-21T09:05:29.7367687Z // begin inline asm 2026-02-21T09:05:29.7367801Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd55 + 0 ], 0x10, %r229; 2026-02-21T09:05:29.7367855Z // end inline asm 2026-02-21T09:05:29.7367908Z // begin inline asm 2026-02-21T09:05:29.7368017Z 
cp.async.cg.shared.global [ %r232 + 0 ], [ %rd56 + 0 ], 0x10, %r229; 2026-02-21T09:05:29.7368079Z // end inline asm 2026-02-21T09:05:29.7368134Z // begin inline asm 2026-02-21T09:05:29.7368240Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd57 + 0 ], 0x10, %r229; 2026-02-21T09:05:29.7368301Z // end inline asm 2026-02-21T09:05:29.7368362Z cp.async.commit_group; 2026-02-21T09:05:29.7368522Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7368585Z @%p16 bra $L__BB0_14; 2026-02-21T09:05:29.7368663Z // %bb.8: // %.lr.ph14 2026-02-21T09:05:29.7368781Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7368942Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7369010Z add.s32 %r6, %r181, -130944; 2026-02-21T09:05:29.7369069Z sub.s32 %r23, 124, %r6; 2026-02-21T09:05:29.7369129Z sub.s32 %r24, 127, %r6; 2026-02-21T09:05:29.7369190Z mov.b32 %r555, 3; 2026-02-21T09:05:29.7369242Z mov.b32 %r545, 0; 2026-02-21T09:05:29.7369296Z mov.b32 %r544, 1; 2026-02-21T09:05:29.7369348Z mov.b32 %r543, 2; 2026-02-21T09:05:29.7369408Z mov.b32 %r542, 48; 2026-02-21T09:05:29.7369464Z mov.b32 %r546, %r545; 2026-02-21T09:05:29.7369518Z mov.b32 %r547, %r545; 2026-02-21T09:05:29.7369578Z mov.b32 %r548, %r545; 2026-02-21T09:05:29.7369632Z mov.b32 %r549, %r545; 2026-02-21T09:05:29.7369685Z mov.b32 %r561, %r5; 2026-02-21T09:05:29.7369738Z mov.b32 %r556, %r545; 2026-02-21T09:05:29.7369800Z bra.uni $L__BB0_9; 2026-02-21T09:05:29.7369894Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:05:29.7370053Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7370123Z setp.eq.b32 %p143, %r544, 127; 2026-02-21T09:05:29.7370185Z setp.eq.b32 %p55, %r544, 127; 2026-02-21T09:05:29.7370269Z setp.eq.b32 %p56, %r40, 0; 2026-02-21T09:05:29.7370337Z setp.lt.s32 %p58, %r556, %r23; 2026-02-21T09:05:29.7370498Z .loc 1 57 52 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7370555Z add.s32 %r302, %r547, 1; 2026-02-21T09:05:29.7370613Z setp.eq.b32 %p59, %r302, 2; 2026-02-21T09:05:29.7370682Z selp.b32 %r303, 0, %r302, %p59; 2026-02-21T09:05:29.7370746Z selp.b32 %r547, %r303, %r547, %p55; 2026-02-21T09:05:29.7370808Z and.pred %p60, %p55, %p59; 2026-02-21T09:05:29.7370872Z selp.b32 %r304, 1, 0, %p60; 2026-02-21T09:05:29.7370928Z xor.b32 %r546, %r546, %r304; 2026-02-21T09:05:29.7371088Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7371154Z shl.b32 %r305, %r547, 3; 2026-02-21T09:05:29.7371210Z add.s32 %r307, %r113, %r305; 2026-02-21T09:05:29.7371263Z add.s32 %r292, %r307, 67664; 2026-02-21T09:05:29.7371324Z and.pred %p54, %p37, %p55; 2026-02-21T09:05:29.7371490Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7371544Z // begin inline asm 2026-02-21T09:05:29.7371592Z 2026-02-21T09:05:29.7371681Z { 2026-02-21T09:05:29.7371743Z @!%p54 bra.uni skipWait; 2026-02-21T09:05:29.7371800Z .reg .pred complete; 2026-02-21T09:05:29.7371852Z waitLoop: 2026-02-21T09:05:29.7371979Z mbarrier.try_wait.parity.shared.b64 complete, [%r292], %r546; 2026-02-21T09:05:29.7372040Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7372090Z skipWait: 2026-02-21T09:05:29.7372146Z } 2026-02-21T09:05:29.7372150Z 2026-02-21T09:05:29.7372203Z // end inline asm 2026-02-21T09:05:29.7372363Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7372448Z add.s32 %r308, %r542, 16; 2026-02-21T09:05:29.7372508Z selp.b32 %r542, 0, %r308, %p56; 2026-02-21T09:05:29.7372665Z .loc 1 51 35 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:51:35 2026-02-21T09:05:29.7372722Z add.s32 %r309, %r542, %r12; 2026-02-21T09:05:29.7372887Z .loc 1 55 53 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:53 2026-02-21T09:05:29.7372944Z shl.b32 %r310, 
%r557, 11; 2026-02-21T09:05:29.7372999Z shl.b32 %r311, %r558, 11; 2026-02-21T09:05:29.7373062Z shl.b32 %r312, %r559, 11; 2026-02-21T09:05:29.7373116Z shl.b32 %r313, %r560, 11; 2026-02-21T09:05:29.7373276Z .loc 1 55 60 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:60 2026-02-21T09:05:29.7373340Z add.s32 %r314, %r310, %r309; 2026-02-21T09:05:29.7373418Z add.s32 %r315, %r311, %r309; 2026-02-21T09:05:29.7373478Z add.s32 %r316, %r312, %r309; 2026-02-21T09:05:29.7373538Z add.s32 %r317, %r313, %r309; 2026-02-21T09:05:29.7373706Z .loc 1 55 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:32 2026-02-21T09:05:29.7373773Z mad.wide.s32 %rd73, %r314, 2, %rd7; 2026-02-21T09:05:29.7373840Z mad.wide.s32 %rd74, %r315, 2, %rd7; 2026-02-21T09:05:29.7373911Z mad.wide.s32 %rd75, %r316, 2, %rd7; 2026-02-21T09:05:29.7373973Z mad.wide.s32 %rd76, %r317, 2, %rd7; 2026-02-21T09:05:29.7374130Z .loc 1 55 85 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7374192Z bar.sync 2, 256; 2026-02-21T09:05:29.7374250Z add.s32 %r294, %r54, %r17; 2026-02-21T09:05:29.7374307Z selp.b32 %r295, 16, 0, %p58; 2026-02-21T09:05:29.7374361Z // begin inline asm 2026-02-21T09:05:29.7374480Z cp.async.cg.shared.global [ %r294 + 0 ], [ %rd73 + 0 ], 0x10, %r295; 2026-02-21T09:05:29.7374533Z // end inline asm 2026-02-21T09:05:29.7374590Z add.s32 %r296, %r294, 4096; 2026-02-21T09:05:29.7374653Z // begin inline asm 2026-02-21T09:05:29.7374820Z cp.async.cg.shared.global [ %r296 + 0 ], [ %rd74 + 0 ], 0x10, %r295; 2026-02-21T09:05:29.7374877Z // end inline asm 2026-02-21T09:05:29.7374933Z add.s32 %r298, %r294, 8192; 2026-02-21T09:05:29.7375029Z // begin inline asm 2026-02-21T09:05:29.7375137Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd75 + 0 ], 0x10, %r295; 2026-02-21T09:05:29.7375191Z // end inline asm 2026-02-21T09:05:29.7375255Z add.s32 %r300, %r294, 12288; 2026-02-21T09:05:29.7375311Z // begin inline asm 2026-02-21T09:05:29.7375418Z 
cp.async.cg.shared.global [ %r300 + 0 ], [ %rd76 + 0 ], 0x10, %r295; 2026-02-21T09:05:29.7375471Z // end inline asm 2026-02-21T09:05:29.7375541Z cp.async.commit_group; 2026-02-21T09:05:29.7375704Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7375761Z add.s32 %r556, %r556, 1; 2026-02-21T09:05:29.7375835Z setp.ne.b32 %p61, %r7, %r556; 2026-02-21T09:05:29.7375891Z mov.b32 %r543, %r555; 2026-02-21T09:05:29.7375946Z mov.b32 %r544, %r26; 2026-02-21T09:05:29.7376008Z mov.b32 %r555, %r40; 2026-02-21T09:05:29.7376065Z @%p61 bra $L__BB0_9; 2026-02-21T09:05:29.7376122Z bra.uni $L__BB0_14; 2026-02-21T09:05:29.7376219Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.7376321Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.7376508Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7376563Z mov.b32 %r26, %r543; 2026-02-21T09:05:29.7376729Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7376786Z add.s32 %r242, %r555, 1; 2026-02-21T09:05:29.7376848Z setp.eq.b32 %p35, %r555, 127; 2026-02-21T09:05:29.7376917Z selp.b32 %r40, 0, %r242, %p35; 2026-02-21T09:05:29.7376979Z setp.ne.b32 %p36, %r40, 0; 2026-02-21T09:05:29.7377062Z @%p36 bra $L__BB0_11; 2026-02-21T09:05:29.7377155Z // %bb.10: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:05:29.7377219Z add.s32 %r561, %r561, 1; 2026-02-21T09:05:29.7377379Z .loc 1 38 35 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:38:35 2026-02-21T09:05:29.7377435Z shr.s32 %r243, %r561, 31; 2026-02-21T09:05:29.7377498Z shr.u32 %r244, %r243, 27; 2026-02-21T09:05:29.7377554Z add.s32 %r245, %r561, %r244; 2026-02-21T09:05:29.7377609Z shr.s32 %r246, %r245, 5; 2026-02-21T09:05:29.7377663Z neg.s32 %r247, %r246; 2026-02-21T09:05:29.7377829Z .loc 1 39 33 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:39:33 2026-02-21T09:05:29.7377885Z shl.b32 %r248, %r247, 2; 
2026-02-21T09:05:29.7378044Z .loc 1 40 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:52 2026-02-21T09:05:29.7378134Z min.s32 %r249, %r248, -124; 2026-02-21T09:05:29.7378194Z add.s32 %r250, %r249, 128; 2026-02-21T09:05:29.7378352Z .loc 1 41 45 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:45 2026-02-21T09:05:29.7378416Z and.b32 %r251, %r245, -32; 2026-02-21T09:05:29.7378473Z sub.s32 %r252, %r561, %r251; 2026-02-21T09:05:29.7378630Z .loc 1 42 51 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:42:51 2026-02-21T09:05:29.7378692Z div.s32 %r253, %r252, %r250; 2026-02-21T09:05:29.7378849Z .loc 1 44 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:44:27 2026-02-21T09:05:29.7378904Z shl.b32 %r254, %r253, 9; 2026-02-21T09:05:29.7379059Z .loc 1 45 32 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:45:32 2026-02-21T09:05:29.7379123Z or.b32 %r557, %r254, %r8; 2026-02-21T09:05:29.7379177Z or.b32 %r558, %r254, %r9; 2026-02-21T09:05:29.7379232Z or.b32 %r559, %r254, %r10; 2026-02-21T09:05:29.7379296Z or.b32 %r560, %r254, %r11; 2026-02-21T09:05:29.7379390Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:05:29.7379548Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7379639Z setp.ge.s32 %p38, %r556, %r24; 2026-02-21T09:05:29.7379698Z setp.lt.s32 %p37, %r556, %r24; 2026-02-21T09:05:29.7379858Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7379913Z add.s32 %r257, %r549, 1; 2026-02-21T09:05:29.7379980Z setp.eq.b32 %p40, %r257, 4; 2026-02-21T09:05:29.7380042Z selp.b32 %r549, 0, %r257, %p40; 2026-02-21T09:05:29.7380099Z selp.b32 %r258, 1, 0, %p40; 2026-02-21T09:05:29.7380162Z xor.b32 %r548, %r548, %r258; 2026-02-21T09:05:29.7380321Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7380378Z add.s32 %r259, %r545, 1; 2026-02-21T09:05:29.7380444Z setp.gt.s32 %p41, 
%r259, 2; 2026-02-21T09:05:29.7380504Z selp.b32 %r545, 0, %r259, %p41; 2026-02-21T09:05:29.7380663Z .loc 1 55 85 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:55:85 2026-02-21T09:05:29.7380724Z cp.async.wait_group 2; 2026-02-21T09:05:29.7380789Z bar.sync 2, 256; 2026-02-21T09:05:29.7380844Z shl.b32 %r260, %r545, 14; 2026-02-21T09:05:29.7380899Z add.s32 %r54, %r113, %r260; 2026-02-21T09:05:29.7381088Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7381146Z shl.b32 %r262, %r549, 3; 2026-02-21T09:05:29.7381202Z add.s32 %r263, %r113, %r262; 2026-02-21T09:05:29.7381258Z add.s32 %r255, %r263, 67616; 2026-02-21T09:05:29.7381420Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7381476Z // begin inline asm 2026-02-21T09:05:29.7381524Z 2026-02-21T09:05:29.7381581Z { 2026-02-21T09:05:29.7381644Z @!%p37 bra.uni skipWait; 2026-02-21T09:05:29.7381723Z .reg .pred complete; 2026-02-21T09:05:29.7381786Z waitLoop: 2026-02-21T09:05:29.7381906Z mbarrier.try_wait.parity.shared.b64 complete, [%r255], %r548; 2026-02-21T09:05:29.7381970Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7382024Z skipWait: 2026-02-21T09:05:29.7382085Z } 2026-02-21T09:05:29.7382089Z 2026-02-21T09:05:29.7382142Z // end inline asm 2026-02-21T09:05:29.7382303Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7382371Z or.pred %p42, %p19, %p38; 2026-02-21T09:05:29.7382428Z @%p42 bra $L__BB0_13; 2026-02-21T09:05:29.7382519Z // %bb.12: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:05:29.7382685Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7382746Z setp.eq.b32 %p53, %r544, 127; 2026-02-21T09:05:29.7382823Z shl.b32 %r272, %r547, 3; 2026-02-21T09:05:29.7382883Z add.s32 %r274, %r113, %r272; 2026-02-21T09:05:29.7382948Z add.s32 %r275, %r274, 67648; 2026-02-21T09:05:29.7383106Z .loc 1 57 52 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7383163Z shl.b32 %r276, %r547, 6; 2026-02-21T09:05:29.7383224Z add.s32 %r264, %r276, %r205; 2026-02-21T09:05:29.7383380Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7383436Z shl.b32 %r277, %r549, 9; 2026-02-21T09:05:29.7383498Z add.s32 %r278, %r113, %r277; 2026-02-21T09:05:29.7383553Z add.s32 %r279, %r278, 65536; 2026-02-21T09:05:29.7383711Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7383765Z add.s32 %r282, %r263, 67584; 2026-02-21T09:05:29.7383932Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7383993Z not.pred %p43, %p143; 2026-02-21T09:05:29.7384055Z elect.sync %r283|%p44, -1; 2026-02-21T09:05:29.7384122Z bfe.u32 %r284, %r54, 4, 14; 2026-02-21T09:05:29.7384180Z cvt.u64.u32 %rd68, %r284; 2026-02-21T09:05:29.7384250Z or.b64 %rd58, %rd68, -4611685949640802304; 2026-02-21T09:05:29.7384346Z bfe.u32 %r285, %r279, 4, 14; 2026-02-21T09:05:29.7384410Z cvt.u64.u32 %rd69, %r285; 2026-02-21T09:05:29.7384480Z or.b64 %rd59, %rd69, -4611685949705814016; 2026-02-21T09:05:29.7384534Z mov.b32 %r265, 134479888; 2026-02-21T09:05:29.7384599Z // begin inline asm 2026-02-21T09:05:29.7384779Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r264 + 0 ], %rd58, %rd59, %r265, %p43; 2026-02-21T09:05:29.7384835Z // end inline asm 2026-02-21T09:05:29.7384898Z add.s32 %r286, %r54, 4096; 2026-02-21T09:05:29.7384955Z bfe.u32 %r287, %r286, 4, 14; 2026-02-21T09:05:29.7385012Z cvt.u64.u32 %rd70, %r287; 2026-02-21T09:05:29.7385080Z or.b64 %rd60, %rd70, -4611685949640802304; 2026-02-21T09:05:29.7385146Z // begin inline asm 2026-02-21T09:05:29.7385282Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r264 + 16 ], %rd60, %rd59, %r265, %p43; 2026-02-21T09:05:29.7385338Z // end inline asm 2026-02-21T09:05:29.7385404Z add.s32 %r288, %r54, 8192; 2026-02-21T09:05:29.7385459Z 
bfe.u32 %r289, %r288, 4, 14; 2026-02-21T09:05:29.7385517Z cvt.u64.u32 %rd71, %r289; 2026-02-21T09:05:29.7385590Z or.b64 %rd62, %rd71, -4611685949640802304; 2026-02-21T09:05:29.7385646Z // begin inline asm 2026-02-21T09:05:29.7385804Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r264 + 32 ], %rd62, %rd59, %r265, %p43; 2026-02-21T09:05:29.7385861Z // end inline asm 2026-02-21T09:05:29.7385929Z add.s32 %r290, %r54, 12288; 2026-02-21T09:05:29.7385987Z bfe.u32 %r291, %r290, 4, 14; 2026-02-21T09:05:29.7386044Z cvt.u64.u32 %rd72, %r291; 2026-02-21T09:05:29.7386119Z or.b64 %rd64, %rd72, -4611685949640802304; 2026-02-21T09:05:29.7386175Z // begin inline asm 2026-02-21T09:05:29.7386305Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r264 + 48 ], %rd64, %rd59, %r265, %p43; 2026-02-21T09:05:29.7386394Z // end inline asm 2026-02-21T09:05:29.7386457Z cvt.u64.u32 %rd66, %r282; 2026-02-21T09:05:29.7386511Z // begin inline asm 2026-02-21T09:05:29.7386628Z @%p44 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T09:05:29.7386690Z // end inline asm 2026-02-21T09:05:29.7386751Z and.pred %p52, %p53, %p44; 2026-02-21T09:05:29.7386807Z cvt.u64.u32 %rd67, %r275; 2026-02-21T09:05:29.7386866Z // begin inline asm 2026-02-21T09:05:29.7386981Z @%p52 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:05:29.7387034Z // end inline asm 2026-02-21T09:05:29.7387090Z bra.uni $L__BB0_13; 2026-02-21T09:05:29.7387191Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7387352Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7387407Z barrier.sync 1; 2026-02-21T09:05:29.7387602Z .loc 1 21 67 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:21:67 2026-02-21T09:05:29.7387664Z mov.u32 %r59, %ctaid.x; 2026-02-21T09:05:29.7387722Z mov.u32 %r115, %ctaid.y; 2026-02-21T09:05:29.7387785Z mov.u32 %r116, %ctaid.z; 2026-02-21T09:05:29.7387842Z mov.u32 %r117, %nctaid.x; 2026-02-21T09:05:29.7387898Z 
mov.u32 %r118, %nctaid.y; 2026-02-21T09:05:29.7387962Z mad.lo.s32 %r119, %r116, %r118, %r115; 2026-02-21T09:05:29.7388033Z mad.lo.s32 %r120, %r119, %r117, %r59; 2026-02-21T09:05:29.7388089Z shl.b32 %r121, %r120, 8; 2026-02-21T09:05:29.7388146Z cvt.s64.s32 %rd11, %r121; 2026-02-21T09:05:29.7388209Z add.s64 %rd12, %rd10, %rd11; 2026-02-21T09:05:29.7388272Z cvta.global.u64 %rd13, %rd12; 2026-02-21T09:05:29.7388433Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7388490Z max.u32 %r122, %r59, 1023; 2026-02-21T09:05:29.7388553Z shl.b32 %r123, %r122, 7; 2026-02-21T09:05:29.7388610Z sub.s32 %r60, 131072, %r123; 2026-02-21T09:05:29.7388669Z setp.lt.s32 %p5, %r60, 1; 2026-02-21T09:05:29.7388735Z @%p5 bra $L__BB0_20; 2026-02-21T09:05:29.7388808Z // %bb.16: // %.lr.ph 2026-02-21T09:05:29.7388895Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7389011Z add.s32 %r570, %r59, -1; 2026-02-21T09:05:29.7389069Z add.s32 %r62, %r1, -512; 2026-02-21T09:05:29.7389123Z mov.b32 %r567, -1; 2026-02-21T09:05:29.7389176Z mov.b32 %r562, 0; 2026-02-21T09:05:29.7389241Z mov.b32 %r563, %r562; 2026-02-21T09:05:29.7389295Z mov.b32 %r569, %r562; 2026-02-21T09:05:29.7389349Z mov.b32 %r565, %r562; 2026-02-21T09:05:29.7389411Z mov.b32 %r568, %r562; 2026-02-21T09:05:29.7389468Z bra.uni $L__BB0_17; 2026-02-21T09:05:29.7389563Z $L__BB0_19: // in Loop: Header=BB0_17 Depth=2 2026-02-21T09:05:29.7389725Z .loc 1 0 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0:74 2026-02-21T09:05:29.7389805Z selp.b32 %r142, 0, %r565, %p8; 2026-02-21T09:05:29.7389867Z setp.lt.u32 %p11, %r62, 32; 2026-02-21T09:05:29.7389926Z setp.eq.b32 %p9, %r62, 0; 2026-02-21T09:05:29.7390094Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7390151Z shl.b32 %r145, %r563, 3; 2026-02-21T09:05:29.7390207Z add.s32 %r147, %r113, %r145; 2026-02-21T09:05:29.7390269Z add.s32 %r138, %r147, 67584; 2026-02-21T09:05:29.7390446Z .loc 
1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7390503Z // begin inline asm 2026-02-21T09:05:29.7390553Z 2026-02-21T09:05:29.7390610Z { 2026-02-21T09:05:29.7390668Z .reg .pred complete; 2026-02-21T09:05:29.7390720Z waitLoop: 2026-02-21T09:05:29.7390846Z mbarrier.try_wait.parity.shared.b64 complete, [%r138], %r562; 2026-02-21T09:05:29.7390908Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7390956Z } 2026-02-21T09:05:29.7390961Z 2026-02-21T09:05:29.7391045Z // end inline asm 2026-02-21T09:05:29.7391200Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7391256Z add.s32 %r144, %r147, 67616; 2026-02-21T09:05:29.7391410Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7391473Z bar.sync 3, 64; 2026-02-21T09:05:29.7391530Z // begin inline asm 2026-02-21T09:05:29.7391635Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r144], 512; 2026-02-21T09:05:29.7391698Z // end inline asm 2026-02-21T09:05:29.7391754Z shl.b32 %r148, %r563, 9; 2026-02-21T09:05:29.7391809Z add.s32 %r149, %r113, %r148; 2026-02-21T09:05:29.7391865Z add.s32 %r141, %r149, 65536; 2026-02-21T09:05:29.7391929Z bar.sync 3, 64; 2026-02-21T09:05:29.7391991Z elect.sync %r150|%p12, -1; 2026-02-21T09:05:29.7392053Z and.pred %p10, %p11, %p12; 2026-02-21T09:05:29.7392116Z // begin inline asm 2026-02-21T09:05:29.7392385Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r141], [%rd13, {%r142, %r569}], [%r144]; 2026-02-21T09:05:29.7392443Z // end inline asm 2026-02-21T09:05:29.7392609Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7392668Z add.s32 %r565, %r142, 16; 2026-02-21T09:05:29.7392823Z .loc 1 56 44 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:56:44 2026-02-21T09:05:29.7392890Z add.s32 %r151, %r563, 1; 2026-02-21T09:05:29.7392949Z setp.eq.b32 %p13, %r151, 4; 
2026-02-21T09:05:29.7393012Z selp.b32 %r563, 0, %r151, %p13; 2026-02-21T09:05:29.7393069Z selp.b32 %r152, 1, 0, %p13; 2026-02-21T09:05:29.7393135Z xor.b32 %r562, %r562, %r152; 2026-02-21T09:05:29.7393292Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7393348Z add.s32 %r568, %r568, 1; 2026-02-21T09:05:29.7393417Z setp.lt.s32 %p14, %r568, %r60; 2026-02-21T09:05:29.7393475Z @%p14 bra $L__BB0_17; 2026-02-21T09:05:29.7393530Z bra.uni $L__BB0_20; 2026-02-21T09:05:29.7393627Z $L__BB0_17: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.7393726Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.7393804Z add.s32 %r126, %r567, 1; 2026-02-21T09:05:29.7393863Z setp.eq.b32 %p6, %r567, 127; 2026-02-21T09:05:29.7393931Z selp.b32 %r567, 0, %r126, %p6; 2026-02-21T09:05:29.7393992Z setp.ne.b32 %p7, %r567, 0; 2026-02-21T09:05:29.7394050Z setp.eq.b32 %p8, %r567, 0; 2026-02-21T09:05:29.7394113Z @%p7 bra $L__BB0_19; 2026-02-21T09:05:29.7394206Z // %bb.18: // in Loop: Header=BB0_17 Depth=2 2026-02-21T09:05:29.7394262Z add.s32 %r570, %r570, 1; 2026-02-21T09:05:29.7394420Z .loc 1 38 35 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:38:35 2026-02-21T09:05:29.7394485Z shr.s32 %r127, %r570, 31; 2026-02-21T09:05:29.7394541Z shr.u32 %r128, %r127, 27; 2026-02-21T09:05:29.7394597Z add.s32 %r129, %r570, %r128; 2026-02-21T09:05:29.7394658Z shr.s32 %r130, %r129, 5; 2026-02-21T09:05:29.7394852Z .loc 1 39 33 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:39:33 2026-02-21T09:05:29.7394910Z shl.b32 %r131, %r130, 2; 2026-02-21T09:05:29.7395073Z .loc 1 40 39 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:39 2026-02-21T09:05:29.7395158Z sub.s32 %r132, 128, %r131; 2026-02-21T09:05:29.7395315Z .loc 1 40 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:40:52 2026-02-21T09:05:29.7395370Z min.s32 %r133, %r132, 4; 2026-02-21T09:05:29.7395535Z .loc 1 41 45 // 
cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:45 2026-02-21T09:05:29.7395591Z and.b32 %r134, %r129, -32; 2026-02-21T09:05:29.7395647Z sub.s32 %r135, %r570, %r134; 2026-02-21T09:05:29.7395815Z .loc 1 41 64 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:64 2026-02-21T09:05:29.7395908Z rem.s32 %r136, %r135, %r133; 2026-02-21T09:05:29.7396063Z .loc 1 41 30 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:41:30 2026-02-21T09:05:29.7396128Z add.s32 %r137, %r136, %r131; 2026-02-21T09:05:29.7396285Z .loc 1 43 27 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:43:27 2026-02-21T09:05:29.7396344Z shl.b32 %r569, %r137, 4; 2026-02-21T09:05:29.7396400Z bra.uni $L__BB0_19; 2026-02-21T09:05:29.7396489Z $L__BB0_20: // %._crit_edge 2026-02-21T09:05:29.7396575Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7396733Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7396797Z barrier.sync 1; 2026-02-21T09:05:29.7396878Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.7396963Z $L__BB0_14: // %._crit_edge15 2026-02-21T09:05:29.7397055Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7397118Z cp.async.wait_group 0; 2026-02-21T09:05:29.7397172Z bar.sync 2, 256; 2026-02-21T09:05:29.7397228Z barrier.sync 1; 2026-02-21T09:05:29.7397291Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.7397382Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.7397538Z .loc 1 19 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:19 2026-02-21T09:05:29.7397602Z barrier.sync 1; 2026-02-21T09:05:29.7397657Z barrier.sync 1; 2026-02-21T09:05:29.7397714Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.7397804Z $L__BB0_29: // %._crit_edge18 2026-02-21T09:05:29.7397966Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7398039Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.7398096Z bar.sync 0, 256; 2026-02-21T09:05:29.7398162Z barrier.sync 1; 
2026-02-21T09:05:29.7398220Z shl.b32 %r540, %r582, 3; 2026-02-21T09:05:29.7398277Z add.s32 %r525, %r383, %r540; 2026-02-21T09:05:29.7398442Z .loc 1 57 52 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:57:52 2026-02-21T09:05:29.7398532Z // begin inline asm 2026-02-21T09:05:29.7398582Z 2026-02-21T09:05:29.7398631Z { 2026-02-21T09:05:29.7398697Z .reg .pred complete; 2026-02-21T09:05:29.7398751Z waitLoop: 2026-02-21T09:05:29.7398869Z mbarrier.try_wait.parity.shared.b64 complete, [%r525], %r584; 2026-02-21T09:05:29.7398939Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.7398988Z } 2026-02-21T09:05:29.7398991Z 2026-02-21T09:05:29.7399046Z // end inline asm 2026-02-21T09:05:29.7399210Z .loc 1 32 74 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:74 2026-02-21T09:05:29.7399264Z bar.sync 0, 256; 2026-02-21T09:05:29.7399321Z // begin inline asm 2026-02-21T09:05:29.7399408Z @%p130 mbarrier.inval.shared::cta.b64 [%r383]; 2026-02-21T09:05:29.7399471Z // end inline asm 2026-02-21T09:05:29.7399525Z bar.sync 0, 256; 2026-02-21T09:05:29.7399580Z // begin inline asm 2026-02-21T09:05:29.7399673Z @%p130 mbarrier.inval.shared::cta.b64 [%r384]; 2026-02-21T09:05:29.7399726Z // end inline asm 2026-02-21T09:05:29.7399781Z // begin inline asm 2026-02-21T09:05:29.7399867Z @%p130 mbarrier.inval.shared::cta.b64 [%r381]; 2026-02-21T09:05:29.7399946Z // end inline asm 2026-02-21T09:05:29.7400001Z bar.sync 0, 256; 2026-02-21T09:05:29.7400056Z // begin inline asm 2026-02-21T09:05:29.7400141Z @%p130 mbarrier.inval.shared::cta.b64 [%r382]; 2026-02-21T09:05:29.7400194Z // end inline asm 2026-02-21T09:05:29.7400247Z // begin inline asm 2026-02-21T09:05:29.7400330Z @%p130 mbarrier.inval.shared::cta.b64 [%r373]; 2026-02-21T09:05:29.7400383Z // end inline asm 2026-02-21T09:05:29.7400436Z bar.sync 0, 256; 2026-02-21T09:05:29.7400492Z // begin inline asm 2026-02-21T09:05:29.7400599Z @%p130 mbarrier.inval.shared::cta.b64 [%r374]; 2026-02-21T09:05:29.7400651Z // end inline asm 
2026-02-21T09:05:29.7400705Z bar.sync 0, 256; 2026-02-21T09:05:29.7400767Z // begin inline asm 2026-02-21T09:05:29.7400844Z @%p130 mbarrier.inval.shared::cta.b64 [%r375]; 2026-02-21T09:05:29.7400894Z // end inline asm 2026-02-21T09:05:29.7400947Z bar.sync 0, 256; 2026-02-21T09:05:29.7401008Z // begin inline asm 2026-02-21T09:05:29.7401086Z @%p130 mbarrier.inval.shared::cta.b64 [%r376]; 2026-02-21T09:05:29.7401140Z // end inline asm 2026-02-21T09:05:29.7401202Z // begin inline asm 2026-02-21T09:05:29.7401277Z @%p130 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:05:29.7401330Z // end inline asm 2026-02-21T09:05:29.7401391Z bar.sync 0, 256; 2026-02-21T09:05:29.7401445Z // begin inline asm 2026-02-21T09:05:29.7401520Z @%p130 mbarrier.inval.shared::cta.b64 [%r370]; 2026-02-21T09:05:29.7401573Z // end inline asm 2026-02-21T09:05:29.7401653Z bar.sync 0, 256; 2026-02-21T09:05:29.7401709Z // begin inline asm 2026-02-21T09:05:29.7401785Z @%p130 mbarrier.inval.shared::cta.b64 [%r371]; 2026-02-21T09:05:29.7401842Z // end inline asm 2026-02-21T09:05:29.7401896Z bar.sync 0, 256; 2026-02-21T09:05:29.7401949Z // begin inline asm 2026-02-21T09:05:29.7402026Z @%p130 mbarrier.inval.shared::cta.b64 [%r372]; 2026-02-21T09:05:29.7402086Z // end inline asm 2026-02-21T09:05:29.7402247Z .loc 1 32 4 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:32:4 2026-02-21T09:05:29.7402301Z bar.sync 0, 256; 2026-02-21T09:05:29.7402376Z // begin inline asm 2026-02-21T09:05:29.7402497Z @%p62 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r539, 128; 2026-02-21T09:05:29.7402553Z // end inline asm 2026-02-21T09:05:29.7402668Z st.shared.v2.b32 [global_smem+67680], {50529027, 50529027}; 2026-02-21T09:05:29.7402749Z st.shared.b32 [global_smem+67688], 50529027; 2026-02-21T09:05:29.7402806Z barrier.sync 1; 2026-02-21T09:05:29.7402891Z $L__BB0_30: // %common.ret 2026-02-21T09:05:29.7403070Z .loc 1 0 0 // cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py:0 2026-02-21T09:05:29.7403124Z 
ret; 2026-02-21T09:05:29.7403181Z $L__tmp1: 2026-02-21T09:05:29.7403268Z $L__func_end0: 2026-02-21T09:05:29.7403352Z // -- End function 2026-02-21T09:05:29.7403403Z } 2026-02-21T09:05:29.7403615Z .file 1 "/tmp/torchinductor_root/ly/cly7eij5xsqcv43c4ambhu2d53jraqojf44hcvtqix252jlpho3m.py" 2026-02-21T09:05:29.7403686Z .section .debug_abbrev 2026-02-21T09:05:29.7403738Z { 2026-02-21T09:05:29.7403829Z .b8 1 // Abbreviation Code 2026-02-21T09:05:29.7403927Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:29.7404010Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:29.7404089Z .b8 37 // DW_AT_producer 2026-02-21T09:05:29.7404174Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.7404252Z .b8 19 // DW_AT_language 2026-02-21T09:05:29.7404331Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:29.7404408Z .b8 3 // DW_AT_name 2026-02-21T09:05:29.7404491Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.7404570Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:29.7404704Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:29.7404791Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:29.7404865Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.7404938Z .b8 0 // EOM(1) 2026-02-21T09:05:29.7405016Z .b8 0 // EOM(2) 2026-02-21T09:05:29.7405084Z .b8 0 // EOM(3) 2026-02-21T09:05:29.7405139Z } 2026-02-21T09:05:29.7405202Z .section .debug_info 2026-02-21T09:05:29.7405294Z { 2026-02-21T09:05:29.7405378Z .b32 104 // Length of Unit 2026-02-21T09:05:29.7405469Z .b8 2 // DWARF version number 2026-02-21T09:05:29.7405537Z .b8 0 2026-02-21T09:05:29.7405665Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:05:29.7405759Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:29.7405878Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:29.7405964Z .b8 116 // DW_AT_producer 2026-02-21T09:05:29.7406023Z .b8 114 2026-02-21T09:05:29.7406081Z .b8 105 2026-02-21T09:05:29.7406151Z .b8 116 2026-02-21T09:05:29.7406207Z .b8 111 2026-02-21T09:05:29.7406261Z .b8 110 2026-02-21T09:05:29.7406319Z .b8 0 2026-02-21T09:05:29.7406395Z .b8 2 // DW_AT_language 2026-02-21T09:05:29.7406491Z .b8 0 2026-02-21T09:05:29.7406574Z .b8 99 // DW_AT_name 2026-02-21T09:05:29.7406635Z .b8 108 2026-02-21T09:05:29.7406688Z .b8 121 2026-02-21T09:05:29.7406740Z .b8 55 2026-02-21T09:05:29.7406799Z .b8 101 2026-02-21T09:05:29.7406850Z .b8 105 2026-02-21T09:05:29.7406904Z .b8 106 2026-02-21T09:05:29.7406956Z .b8 53 2026-02-21T09:05:29.7407017Z .b8 120 2026-02-21T09:05:29.7407068Z .b8 115 2026-02-21T09:05:29.7407119Z .b8 113 2026-02-21T09:05:29.7407177Z .b8 99 2026-02-21T09:05:29.7407230Z .b8 118 2026-02-21T09:05:29.7407283Z .b8 52 2026-02-21T09:05:29.7407334Z .b8 51 2026-02-21T09:05:29.7407392Z .b8 99 2026-02-21T09:05:29.7407443Z .b8 52 2026-02-21T09:05:29.7407493Z .b8 97 2026-02-21T09:05:29.7407544Z .b8 109 2026-02-21T09:05:29.7407604Z .b8 98 2026-02-21T09:05:29.7407656Z .b8 104 2026-02-21T09:05:29.7407708Z .b8 117 2026-02-21T09:05:29.7407764Z .b8 50 2026-02-21T09:05:29.7407816Z .b8 100 2026-02-21T09:05:29.7407866Z .b8 53 2026-02-21T09:05:29.7407918Z .b8 51 2026-02-21T09:05:29.7407979Z .b8 106 2026-02-21T09:05:29.7408032Z .b8 114 2026-02-21T09:05:29.7408083Z .b8 97 2026-02-21T09:05:29.7408141Z .b8 113 2026-02-21T09:05:29.7408193Z .b8 111 2026-02-21T09:05:29.7408245Z .b8 106 2026-02-21T09:05:29.7408295Z .b8 102 2026-02-21T09:05:29.7408355Z .b8 52 2026-02-21T09:05:29.7408443Z .b8 52 2026-02-21T09:05:29.7408495Z .b8 104 2026-02-21T09:05:29.7408552Z .b8 99 2026-02-21T09:05:29.7408603Z .b8 118 2026-02-21T09:05:29.7408655Z .b8 116 2026-02-21T09:05:29.7408705Z .b8 113 
2026-02-21T09:05:29.7408763Z .b8 105 2026-02-21T09:05:29.7408815Z .b8 120 2026-02-21T09:05:29.7408865Z .b8 50 2026-02-21T09:05:29.7408915Z .b8 53 2026-02-21T09:05:29.7408973Z .b8 50 2026-02-21T09:05:29.7409023Z .b8 106 2026-02-21T09:05:29.7409074Z .b8 108 2026-02-21T09:05:29.7409131Z .b8 112 2026-02-21T09:05:29.7409182Z .b8 104 2026-02-21T09:05:29.7409232Z .b8 111 2026-02-21T09:05:29.7409281Z .b8 51 2026-02-21T09:05:29.7409340Z .b8 109 2026-02-21T09:05:29.7409391Z .b8 46 2026-02-21T09:05:29.7409441Z .b8 112 2026-02-21T09:05:29.7409500Z .b8 121 2026-02-21T09:05:29.7409551Z .b8 0 2026-02-21T09:05:29.7409644Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:29.7409723Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:29.7409783Z .b8 116 2026-02-21T09:05:29.7409835Z .b8 109 2026-02-21T09:05:29.7409885Z .b8 112 2026-02-21T09:05:29.7409944Z .b8 47 2026-02-21T09:05:29.7409996Z .b8 116 2026-02-21T09:05:29.7410046Z .b8 111 2026-02-21T09:05:29.7410107Z .b8 114 2026-02-21T09:05:29.7410163Z .b8 99 2026-02-21T09:05:29.7410240Z .b8 104 2026-02-21T09:05:29.7410291Z .b8 105 2026-02-21T09:05:29.7410340Z .b8 110 2026-02-21T09:05:29.7410394Z .b8 100 2026-02-21T09:05:29.7410443Z .b8 117 2026-02-21T09:05:29.7410492Z .b8 99 2026-02-21T09:05:29.7410548Z .b8 116 2026-02-21T09:05:29.7410596Z .b8 111 2026-02-21T09:05:29.7410646Z .b8 114 2026-02-21T09:05:29.7410694Z .b8 95 2026-02-21T09:05:29.7410754Z .b8 114 2026-02-21T09:05:29.7410803Z .b8 111 2026-02-21T09:05:29.7410852Z .b8 111 2026-02-21T09:05:29.7410910Z .b8 116 2026-02-21T09:05:29.7410984Z .b8 47 2026-02-21T09:05:29.7411034Z .b8 108 2026-02-21T09:05:29.7411084Z .b8 121 2026-02-21T09:05:29.7411146Z .b8 0 2026-02-21T09:05:29.7411198Z } 2026-02-21T09:05:29.7411263Z .section .debug_macinfo { } 2026-02-21T09:05:29.7411268Z 2026-02-21T09:05:29.7411355Z ================================================================ 2026-02-21T09:05:29.7411456Z please share the reproducer above with Triton project. 
2026-02-21T09:05:29.9548364Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:29.9548632Z 2026-02-21T09:05:29.9549606Z 2026-02-21T09:05:29.9549612Z 2026-02-21T09:05:29.9549867Z ================================================================ 2026-02-21T09:05:29.9550123Z Internal Triton PTX codegen error 2026-02-21T09:05:29.9550299Z `ptxas` stderr: 2026-02-21T09:05:29.9550997Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:29.9552437Z Config: @helion.kernel(config=helion.Config(block_sizes=[512, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:29.9553718Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:29.9553982Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.9554205Z `ptxas` stderr: 2026-02-21T09:05:29.9558514Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 260 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:29.9558942Z 2026-02-21T09:05:29.9560283Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:29.9560465Z 2026-02-21T09:05:29.9560888Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyuz8prup.ptx -o /tmp/tmpyuz8prup.ptx.o 2026-02-21T09:05:29.9561540Z 2026-02-21T09:05:29.9561671Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:29.9562212Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyuz8prup.ptx -o /tmp/tmpyuz8prup.ptx.o 2026-02-21T09:05:29.9562639Z 2026-02-21T09:05:29.9562642Z 2026-02-21T09:05:29.9562696Z // 2026-02-21T09:05:29.9562839Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:29.9563005Z // 2026-02-21T09:05:29.9563075Z 2026-02-21T09:05:29.9563131Z .version 8.7 2026-02-21T09:05:29.9563261Z .target sm_100a 2026-02-21T09:05:29.9563396Z .address_size 64 2026-02-21T09:05:29.9563480Z 2026-02-21T09:05:29.9563606Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:29.9563848Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:29.9564051Z // @_helion_matmul 2026-02-21T09:05:29.9564244Z .visible .entry _helion_matmul( 2026-02-21T09:05:29.9564460Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:29.9564896Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:29.9565215Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:29.9565482Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:29.9565742Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:29.9565974Z ) 2026-02-21T09:05:29.9566113Z .reqntid 384 2026-02-21T09:05:29.9566248Z .maxnreg 32 2026-02-21T09:05:29.9566391Z { 2026-02-21T09:05:29.9566525Z .reg .pred %p<136>; 2026-02-21T09:05:29.9566684Z .reg 
.b32 %r<442>; 2026-02-21T09:05:29.9566829Z .reg .b64 %rd<151>; 2026-02-21T09:05:29.9567172Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19:0 2026-02-21T09:05:29.9567485Z $L__func_begin0: 2026-02-21T09:05:29.9567731Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19:0 2026-02-21T09:05:29.9567976Z 2026-02-21T09:05:29.9568030Z // %bb.0: 2026-02-21T09:05:29.9568185Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:05:29.9568389Z $L__tmp0: 2026-02-21T09:05:29.9568623Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19 2026-02-21T09:05:29.9568919Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:29.9569075Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:29.9569235Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:29.9569434Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:05:29.9569593Z @%p3 bra $L__BB0_16; 2026-02-21T09:05:29.9569743Z bra.uni $L__BB0_1; 2026-02-21T09:05:29.9569936Z $L__BB0_16: 2026-02-21T09:05:29.9570186Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:0 2026-02-21T09:05:29.9570703Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:05:29.9570930Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:05:29.9571149Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:05:29.9571438Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19 2026-02-21T09:05:29.9571751Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.9571947Z setp.lt.u32 %p36, %r1, 32; 2026-02-21T09:05:29.9572122Z mov.b32 %r177, global_smem; 2026-02-21T09:05:29.9572283Z // begin inline asm 2026-02-21T09:05:29.9572532Z @%p36 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r177], 128; 2026-02-21T09:05:29.9572801Z // end inline asm 2026-02-21T09:05:29.9572938Z bar.sync 0, 256; 2026-02-21T09:05:29.9573091Z ld.shared.b32 %r408, [global_smem]; 2026-02-21T09:05:29.9573265Z bar.sync 0, 256; 2026-02-21T09:05:29.9573411Z // begin inline asm 
2026-02-21T09:05:29.9573627Z @%p36 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:29.9573853Z // end inline asm 2026-02-21T09:05:29.9574094Z .loc 1 21 67 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:21:67 2026-02-21T09:05:29.9574420Z mov.u32 %r44, %ctaid.x; 2026-02-21T09:05:29.9574576Z mov.u32 %r255, %ctaid.y; 2026-02-21T09:05:29.9574767Z mov.u32 %r256, %ctaid.z; 2026-02-21T09:05:29.9574923Z mov.u32 %r257, %nctaid.x; 2026-02-21T09:05:29.9575078Z mov.u32 %r258, %nctaid.y; 2026-02-21T09:05:29.9575243Z mad.lo.s32 %r259, %r256, %r258, %r255; 2026-02-21T09:05:29.9575422Z mad.lo.s32 %r260, %r259, %r257, %r44; 2026-02-21T09:05:29.9575607Z mul.lo.s32 %r261, %r260, 384; 2026-02-21T09:05:29.9575768Z cvt.s64.s32 %rd83, %r261; 2026-02-21T09:05:29.9575937Z add.s64 %rd44, %rd7, %rd83; 2026-02-21T09:05:29.9576107Z shl.b32 %r262, %r1, 2; 2026-02-21T09:05:29.9576265Z add.s32 %r178, %r177, %r262; 2026-02-21T09:05:29.9576435Z mov.b32 %r439, 0; 2026-02-21T09:05:29.9576574Z // begin inline asm 2026-02-21T09:05:29.9576735Z @%p36 st.shared.b32 [ %r178 + 0 ], %r439; 2026-02-21T09:05:29.9576904Z // end inline asm 2026-02-21T09:05:29.9577044Z bar.warp.sync -1; 2026-02-21T09:05:29.9577188Z setp.eq.b32 %p122, %r1, 0; 2026-02-21T09:05:29.9577349Z cvt.u64.u32 %rd29, %r177; 2026-02-21T09:05:29.9577496Z // begin inline asm 2026-02-21T09:05:29.9577784Z @%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd29 + 0 ], %rd4; 2026-02-21T09:05:29.9578069Z // end inline asm 2026-02-21T09:05:29.9578198Z // begin inline asm 2026-02-21T09:05:29.9578427Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1; 2026-02-21T09:05:29.9578674Z // end inline asm 2026-02-21T09:05:29.9578808Z mov.b32 %r180, 16; 2026-02-21T09:05:29.9578940Z // begin inline asm 2026-02-21T09:05:29.9579178Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r180; 2026-02-21T09:05:29.9579475Z // end inline asm 
2026-02-21T09:05:29.9579612Z mov.b32 %r181, 256; 2026-02-21T09:05:29.9579758Z // begin inline asm 2026-02-21T09:05:29.9579986Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r181; 2026-02-21T09:05:29.9580255Z // end inline asm 2026-02-21T09:05:29.9580384Z mov.b32 %r182, 2048; 2026-02-21T09:05:29.9580524Z // begin inline asm 2026-02-21T09:05:29.9580760Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r182; 2026-02-21T09:05:29.9581035Z // end inline asm 2026-02-21T09:05:29.9581167Z mov.b32 %r183, 4096; 2026-02-21T09:05:29.9581302Z // begin inline asm 2026-02-21T09:05:29.9581544Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r183; 2026-02-21T09:05:29.9581814Z // end inline asm 2026-02-21T09:05:29.9581947Z mov.b64 %rd37, 4096; 2026-02-21T09:05:29.9582079Z // begin inline asm 2026-02-21T09:05:29.9582355Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd29 + 0 ], 0x0, %rd37; 2026-02-21T09:05:29.9582634Z // end inline asm 2026-02-21T09:05:29.9582768Z mov.b32 %r184, 1; 2026-02-21T09:05:29.9582907Z // begin inline asm 2026-02-21T09:05:29.9583155Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r184; 2026-02-21T09:05:29.9583444Z // end inline asm 2026-02-21T09:05:29.9583572Z // begin inline asm 2026-02-21T09:05:29.9583828Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r184; 2026-02-21T09:05:29.9584106Z // end inline asm 2026-02-21T09:05:29.9584243Z // begin inline asm 2026-02-21T09:05:29.9584479Z @%p122 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x6; 2026-02-21T09:05:29.9584779Z // end inline asm 2026-02-21T09:05:29.9584918Z // begin inline asm 2026-02-21T09:05:29.9585164Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9585452Z // end inline asm 2026-02-21T09:05:29.9585586Z // 
begin inline asm 2026-02-21T09:05:29.9585831Z @%p122 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1; 2026-02-21T09:05:29.9586093Z // end inline asm 2026-02-21T09:05:29.9586228Z // begin inline asm 2026-02-21T09:05:29.9586497Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9586755Z // end inline asm 2026-02-21T09:05:29.9586889Z // begin inline asm 2026-02-21T09:05:29.9587235Z @%p36 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd44 + 0 ], [ %rd29 + 0 ], 0x80; 2026-02-21T09:05:29.9587617Z // end inline asm 2026-02-21T09:05:29.9587746Z // begin inline asm 2026-02-21T09:05:29.9587958Z @%p36 fence.proxy.tensormap::generic.acquire.gpu [ %rd44 + 0 ], 0x80; 2026-02-21T09:05:29.9588206Z @%p36 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.9588394Z @%p36 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.9588572Z // end inline asm 2026-02-21T09:05:29.9588702Z bar.sync 0, 256; 2026-02-21T09:05:29.9588956Z .loc 1 22 67 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:22:67 2026-02-21T09:05:29.9589246Z add.s32 %r263, %r261, 128; 2026-02-21T09:05:29.9589409Z cvt.s64.s32 %rd84, %r263; 2026-02-21T09:05:29.9589560Z add.s64 %rd62, %rd7, %rd84; 2026-02-21T09:05:29.9589717Z bar.sync 0, 256; 2026-02-21T09:05:29.9589852Z // begin inline asm 2026-02-21T09:05:29.9590001Z @%p36 st.shared.b32 [ %r178 + 0 ], %r439; 2026-02-21T09:05:29.9590199Z // end inline asm 2026-02-21T09:05:29.9590333Z bar.warp.sync -1; 2026-02-21T09:05:29.9590473Z // begin inline asm 2026-02-21T09:05:29.9590710Z @%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd29 + 0 ], %rd5; 2026-02-21T09:05:29.9590987Z // end inline asm 2026-02-21T09:05:29.9591116Z // begin inline asm 2026-02-21T09:05:29.9591336Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1; 2026-02-21T09:05:29.9591590Z // end inline asm 2026-02-21T09:05:29.9591742Z // 
begin inline asm 2026-02-21T09:05:29.9591975Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r180; 2026-02-21T09:05:29.9592233Z // end inline asm 2026-02-21T09:05:29.9592367Z // begin inline asm 2026-02-21T09:05:29.9592593Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r180; 2026-02-21T09:05:29.9592855Z // end inline asm 2026-02-21T09:05:29.9592988Z // begin inline asm 2026-02-21T09:05:29.9593221Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r182; 2026-02-21T09:05:29.9593494Z // end inline asm 2026-02-21T09:05:29.9593620Z // begin inline asm 2026-02-21T09:05:29.9593861Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r182; 2026-02-21T09:05:29.9594124Z // end inline asm 2026-02-21T09:05:29.9594259Z // begin inline asm 2026-02-21T09:05:29.9594533Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd29 + 0 ], 0x0, %rd37; 2026-02-21T09:05:29.9594849Z // end inline asm 2026-02-21T09:05:29.9594984Z // begin inline asm 2026-02-21T09:05:29.9595233Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r184; 2026-02-21T09:05:29.9595517Z // end inline asm 2026-02-21T09:05:29.9595647Z // begin inline asm 2026-02-21T09:05:29.9595905Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r184; 2026-02-21T09:05:29.9596182Z // end inline asm 2026-02-21T09:05:29.9596327Z // begin inline asm 2026-02-21T09:05:29.9596558Z @%p122 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x6; 2026-02-21T09:05:29.9596810Z // end inline asm 2026-02-21T09:05:29.9596947Z // begin inline asm 2026-02-21T09:05:29.9597189Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9597467Z // end inline asm 2026-02-21T09:05:29.9597596Z // begin inline asm 2026-02-21T09:05:29.9597834Z @%p122 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1; 2026-02-21T09:05:29.9598101Z // end inline asm 2026-02-21T09:05:29.9598228Z // begin inline asm 2026-02-21T09:05:29.9598457Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9598769Z // end inline asm 2026-02-21T09:05:29.9598905Z // begin inline asm 2026-02-21T09:05:29.9599248Z @%p36 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd62 + 0 ], [ %rd29 + 0 ], 0x80; 2026-02-21T09:05:29.9599618Z // end inline asm 2026-02-21T09:05:29.9599753Z // begin inline asm 2026-02-21T09:05:29.9599956Z @%p36 fence.proxy.tensormap::generic.acquire.gpu [ %rd62 + 0 ], 0x80; 2026-02-21T09:05:29.9600204Z @%p36 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.9600387Z @%p36 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.9600566Z // end inline asm 2026-02-21T09:05:29.9600694Z bar.sync 0, 256; 2026-02-21T09:05:29.9600946Z .loc 1 24 71 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:24:71 2026-02-21T09:05:29.9601231Z add.s32 %r264, %r261, 256; 2026-02-21T09:05:29.9601394Z cvt.s64.s32 %rd85, %r264; 2026-02-21T09:05:29.9601555Z add.s64 %rd80, %rd7, %rd85; 2026-02-21T09:05:29.9601706Z bar.sync 0, 256; 2026-02-21T09:05:29.9601843Z // begin inline asm 2026-02-21T09:05:29.9601988Z @%p36 st.shared.b32 [ %r178 + 0 ], %r439; 2026-02-21T09:05:29.9602164Z // end inline asm 2026-02-21T09:05:29.9602322Z bar.warp.sync -1; 2026-02-21T09:05:29.9602464Z // begin inline asm 2026-02-21T09:05:29.9602702Z @%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd29 + 0 ], %rd6; 2026-02-21T09:05:29.9602974Z // end inline asm 2026-02-21T09:05:29.9603107Z // begin inline asm 2026-02-21T09:05:29.9603323Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1; 2026-02-21T09:05:29.9603577Z // end inline asm 2026-02-21T09:05:29.9603705Z // begin inline asm 2026-02-21T09:05:29.9603940Z @%p122 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r180; 2026-02-21T09:05:29.9604222Z // end inline asm 2026-02-21T09:05:29.9604358Z // begin inline asm 2026-02-21T09:05:29.9604580Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r181; 2026-02-21T09:05:29.9604874Z // end inline asm 2026-02-21T09:05:29.9605009Z // begin inline asm 2026-02-21T09:05:29.9605244Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r182; 2026-02-21T09:05:29.9605524Z // end inline asm 2026-02-21T09:05:29.9605651Z // begin inline asm 2026-02-21T09:05:29.9605890Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r183; 2026-02-21T09:05:29.9606160Z // end inline asm 2026-02-21T09:05:29.9606300Z // begin inline asm 2026-02-21T09:05:29.9606572Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd29 + 0 ], 0x0, %rd37; 2026-02-21T09:05:29.9606892Z // end inline asm 2026-02-21T09:05:29.9607031Z // begin inline asm 2026-02-21T09:05:29.9607289Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0, %r184; 2026-02-21T09:05:29.9607584Z // end inline asm 2026-02-21T09:05:29.9607719Z // begin inline asm 2026-02-21T09:05:29.9607988Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x1, %r184; 2026-02-21T09:05:29.9608282Z // end inline asm 2026-02-21T09:05:29.9608417Z // begin inline asm 2026-02-21T09:05:29.9608661Z @%p122 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x6; 2026-02-21T09:05:29.9608928Z // end inline asm 2026-02-21T09:05:29.9609068Z // begin inline asm 2026-02-21T09:05:29.9609317Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9609610Z // end inline asm 2026-02-21T09:05:29.9609744Z // begin inline asm 2026-02-21T09:05:29.9609989Z @%p122 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd29 + 
0 ], 0x1; 2026-02-21T09:05:29.9610269Z // end inline asm 2026-02-21T09:05:29.9610402Z // begin inline asm 2026-02-21T09:05:29.9610639Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd29 + 0 ], 0x0; 2026-02-21T09:05:29.9610902Z // end inline asm 2026-02-21T09:05:29.9611072Z // begin inline asm 2026-02-21T09:05:29.9611425Z @%p36 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd80 + 0 ], [ %rd29 + 0 ], 0x80; 2026-02-21T09:05:29.9611823Z // end inline asm 2026-02-21T09:05:29.9611964Z // begin inline asm 2026-02-21T09:05:29.9612172Z @%p36 fence.proxy.tensormap::generic.acquire.gpu [ %rd80 + 0 ], 0x80; 2026-02-21T09:05:29.9612430Z @%p36 cp.async.bulk.commit_group ; 2026-02-21T09:05:29.9612620Z @%p36 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:29.9612802Z // end inline asm 2026-02-21T09:05:29.9612933Z bar.sync 0, 256; 2026-02-21T09:05:29.9613082Z cvta.global.u64 %rd86, %rd80; 2026-02-21T09:05:29.9613370Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9613668Z max.u32 %r265, %r44, 1023; 2026-02-21T09:05:29.9613835Z shl.b32 %r266, %r265, 7; 2026-02-21T09:05:29.9613989Z sub.s32 %r45, 131072, %r266; 2026-02-21T09:05:29.9614273Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9614583Z shfl.sync.idx.b32 %r46, %r2, 0, 31, -1; 2026-02-21T09:05:29.9614803Z shl.b32 %r267, %r46, 21; 2026-02-21T09:05:29.9614991Z and.b32 %r268, %r267, 6291456; 2026-02-21T09:05:29.9615157Z add.s32 %r269, %r268, %r408; 2026-02-21T09:05:29.9615316Z shl.b32 %r270, %r46, 2; 2026-02-21T09:05:29.9615461Z and.b32 %r271, %r270, 16; 2026-02-21T09:05:29.9615616Z add.s32 %r202, %r269, %r271; 2026-02-21T09:05:29.9615767Z mov.pred %p92, -1; 2026-02-21T09:05:29.9615912Z // begin inline asm 2026-02-21T09:05:29.9616272Z @%p92 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r202 + 0], {%r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, 
%r439, %r439, %r439, %r439, %r439, %r439}; 2026-02-21T09:05:29.9616701Z // end inline asm 2026-02-21T09:05:29.9616838Z // begin inline asm 2026-02-21T09:05:29.9617214Z @%p92 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r202 + 32], {%r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439, %r439}; 2026-02-21T09:05:29.9617627Z // end inline asm 2026-02-21T09:05:29.9617765Z // begin inline asm 2026-02-21T09:05:29.9617930Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:29.9618092Z // end inline asm 2026-02-21T09:05:29.9618232Z bar.sync 0, 256; 2026-02-21T09:05:29.9618476Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9618773Z add.s32 %r236, %r177, 83968; 2026-02-21T09:05:29.9618934Z // begin inline asm 2026-02-21T09:05:29.9619099Z @%p122 mbarrier.init.shared::cta.b64 [%r236], 1; 2026-02-21T09:05:29.9619327Z // end inline asm 2026-02-21T09:05:29.9619456Z bar.sync 0, 256; 2026-02-21T09:05:29.9619596Z add.s32 %r237, %r177, 83976; 2026-02-21T09:05:29.9619743Z // begin inline asm 2026-02-21T09:05:29.9619905Z @%p122 mbarrier.init.shared::cta.b64 [%r237], 1; 2026-02-21T09:05:29.9620087Z // end inline asm 2026-02-21T09:05:29.9620220Z bar.sync 0, 256; 2026-02-21T09:05:29.9620349Z add.s32 %r238, %r177, 83984; 2026-02-21T09:05:29.9620504Z // begin inline asm 2026-02-21T09:05:29.9620667Z @%p122 mbarrier.init.shared::cta.b64 [%r238], 1; 2026-02-21T09:05:29.9620849Z // end inline asm 2026-02-21T09:05:29.9620982Z bar.sync 0, 256; 2026-02-21T09:05:29.9621112Z add.s32 %r239, %r177, 83992; 2026-02-21T09:05:29.9621270Z // begin inline asm 2026-02-21T09:05:29.9621427Z @%p122 mbarrier.init.shared::cta.b64 [%r239], 1; 2026-02-21T09:05:29.9621614Z // end inline asm 2026-02-21T09:05:29.9621741Z add.s32 %r240, %r177, 84000; 2026-02-21T09:05:29.9621894Z // begin inline asm 2026-02-21T09:05:29.9622057Z @%p122 mbarrier.init.shared::cta.b64 [%r240], 1; 2026-02-21T09:05:29.9622232Z // end inline asm 
2026-02-21T09:05:29.9622365Z bar.sync 0, 256; 2026-02-21T09:05:29.9622494Z add.s32 %r241, %r177, 84008; 2026-02-21T09:05:29.9622645Z // begin inline asm 2026-02-21T09:05:29.9622797Z @%p122 mbarrier.init.shared::cta.b64 [%r241], 1; 2026-02-21T09:05:29.9623006Z // end inline asm 2026-02-21T09:05:29.9623131Z bar.sync 0, 256; 2026-02-21T09:05:29.9623268Z add.s32 %r242, %r177, 84016; 2026-02-21T09:05:29.9623415Z // begin inline asm 2026-02-21T09:05:29.9623578Z @%p122 mbarrier.init.shared::cta.b64 [%r242], 1; 2026-02-21T09:05:29.9623762Z // end inline asm 2026-02-21T09:05:29.9623886Z bar.sync 0, 256; 2026-02-21T09:05:29.9624026Z add.s32 %r243, %r177, 84024; 2026-02-21T09:05:29.9624173Z // begin inline asm 2026-02-21T09:05:29.9624333Z @%p122 mbarrier.init.shared::cta.b64 [%r243], 1; 2026-02-21T09:05:29.9624509Z // end inline asm 2026-02-21T09:05:29.9624784Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9625067Z bar.sync 0, 256; 2026-02-21T09:05:29.9625204Z // begin inline asm 2026-02-21T09:05:29.9625376Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r236]; 2026-02-21T09:05:29.9625566Z // end inline asm 2026-02-21T09:05:29.9625703Z bar.sync 0, 256; 2026-02-21T09:05:29.9625836Z // begin inline asm 2026-02-21T09:05:29.9626011Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r237]; 2026-02-21T09:05:29.9626205Z // end inline asm 2026-02-21T09:05:29.9626339Z bar.sync 0, 256; 2026-02-21T09:05:29.9626489Z // begin inline asm 2026-02-21T09:05:29.9626659Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r238]; 2026-02-21T09:05:29.9626848Z // end inline asm 2026-02-21T09:05:29.9626974Z bar.sync 0, 256; 2026-02-21T09:05:29.9627109Z // begin inline asm 2026-02-21T09:05:29.9627268Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r239]; 2026-02-21T09:05:29.9627455Z // end inline asm 2026-02-21T09:05:29.9627696Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9628008Z bar.sync 0, 256; 
2026-02-21T09:05:29.9628140Z add.s32 %r248, %r177, 84032; 2026-02-21T09:05:29.9628295Z // begin inline asm 2026-02-21T09:05:29.9628458Z @%p122 mbarrier.init.shared::cta.b64 [%r248], 1; 2026-02-21T09:05:29.9628640Z // end inline asm 2026-02-21T09:05:29.9628777Z bar.sync 0, 256; 2026-02-21T09:05:29.9628906Z add.s32 %r249, %r177, 84040; 2026-02-21T09:05:29.9629061Z // begin inline asm 2026-02-21T09:05:29.9629218Z @%p122 mbarrier.init.shared::cta.b64 [%r249], 1; 2026-02-21T09:05:29.9629403Z // end inline asm 2026-02-21T09:05:29.9629534Z add.s32 %r250, %r177, 84048; 2026-02-21T09:05:29.9629690Z // begin inline asm 2026-02-21T09:05:29.9629844Z @%p122 mbarrier.init.shared::cta.b64 [%r250], 1; 2026-02-21T09:05:29.9630031Z // end inline asm 2026-02-21T09:05:29.9630163Z bar.sync 0, 256; 2026-02-21T09:05:29.9630293Z add.s32 %r251, %r177, 84056; 2026-02-21T09:05:29.9630446Z // begin inline asm 2026-02-21T09:05:29.9630624Z @%p122 mbarrier.init.shared::cta.b64 [%r251], 1; 2026-02-21T09:05:29.9630809Z // end inline asm 2026-02-21T09:05:29.9631047Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9631328Z bar.sync 0, 256; 2026-02-21T09:05:29.9631456Z // begin inline asm 2026-02-21T09:05:29.9631621Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r250]; 2026-02-21T09:05:29.9631809Z // end inline asm 2026-02-21T09:05:29.9631934Z bar.sync 0, 256; 2026-02-21T09:05:29.9632068Z // begin inline asm 2026-02-21T09:05:29.9632228Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r251]; 2026-02-21T09:05:29.9632418Z // end inline asm 2026-02-21T09:05:29.9632656Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9632966Z st.shared.b32 [global_smem+84064], 33554689; 2026-02-21T09:05:29.9633163Z st.shared.b32 [global_smem+65536], %r408; 2026-02-21T09:05:29.9633346Z barrier.sync 1; 2026-02-21T09:05:29.9633506Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.9633684Z barrier.sync 1; 
2026-02-21T09:05:29.9633830Z setp.lt.s32 %p112, %r45, 1; 2026-02-21T09:05:29.9633984Z mov.b32 %r441, %r439; 2026-02-21T09:05:29.9634136Z @%p112 bra $L__BB0_23; 2026-02-21T09:05:29.9634328Z // %bb.17: // %.lr.ph16 2026-02-21T09:05:29.9634523Z add.s32 %r437, %r44, -1; 2026-02-21T09:05:29.9634706Z shl.b32 %r274, %r1, 5; 2026-02-21T09:05:29.9634870Z and.b32 %r275, %r274, 8032; 2026-02-21T09:05:29.9635037Z bfe.s32 %r276, %r1, 2, 1; 2026-02-21T09:05:29.9635191Z and.b32 %r277, %r276, 144; 2026-02-21T09:05:29.9635350Z or.b32 %r278, %r277, %r275; 2026-02-21T09:05:29.9635503Z add.s32 %r280, %r177, 65536; 2026-02-21T09:05:29.9635662Z add.s32 %r49, %r280, %r278; 2026-02-21T09:05:29.9635812Z xor.b32 %r281, %r278, 16; 2026-02-21T09:05:29.9635968Z add.s32 %r50, %r280, %r281; 2026-02-21T09:05:29.9636119Z and.b32 %r282, %r46, 1; 2026-02-21T09:05:29.9636276Z shl.b32 %r283, %r282, 13; 2026-02-21T09:05:29.9636428Z add.s32 %r324, %r280, %r283; 2026-02-21T09:05:29.9636587Z shl.b32 %r52, %r282, 8; 2026-02-21T09:05:29.9636739Z mov.b32 %r433, -1; 2026-02-21T09:05:29.9636878Z mov.b32 %r441, %r439; 2026-02-21T09:05:29.9637029Z mov.b32 %r436, %r439; 2026-02-21T09:05:29.9637171Z mov.b32 %r435, %r439; 2026-02-21T09:05:29.9637320Z mov.b32 %r434, %r439; 2026-02-21T09:05:29.9637459Z bra.uni $L__BB0_18; 2026-02-21T09:05:29.9637690Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.9638042Z .loc 1 0 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:74 2026-02-21T09:05:29.9638377Z setp.lt.u32 %p118, %r1, 64; 2026-02-21T09:05:29.9638649Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9638929Z shl.b32 %r326, %r439, 3; 2026-02-21T09:05:29.9639085Z add.s32 %r328, %r177, %r326; 2026-02-21T09:05:29.9639233Z add.s32 %r286, %r328, 84032; 2026-02-21T09:05:29.9639521Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9639793Z shl.b32 %r329, %r439, 6; 2026-02-21T09:05:29.9639944Z 
bar.sync 0, 256; 2026-02-21T09:05:29.9640076Z // begin inline asm 2026-02-21T09:05:29.9640210Z 2026-02-21T09:05:29.9640325Z { 2026-02-21T09:05:29.9640440Z .reg .pred complete; 2026-02-21T09:05:29.9640584Z waitLoop: 2026-02-21T09:05:29.9640764Z mbarrier.try_wait.parity.shared.b64 complete, [%r286], %r441; 2026-02-21T09:05:29.9640993Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.9641137Z } 2026-02-21T09:05:29.9641205Z 2026-02-21T09:05:29.9641258Z // end inline asm 2026-02-21T09:05:29.9641390Z add.s32 %r304, %r202, %r329; 2026-02-21T09:05:29.9641545Z // begin inline asm 2026-02-21T09:05:29.9641936Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302, %r303}, [%r304 + 0]; 2026-02-21T09:05:29.9642333Z // end inline asm 2026-02-21T09:05:29.9642470Z // begin inline asm 2026-02-21T09:05:29.9642818Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r305, %r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320}, [%r304 + 32]; 2026-02-21T09:05:29.9643195Z // end inline asm 2026-02-21T09:05:29.9643324Z // begin inline asm 2026-02-21T09:05:29.9643478Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:29.9643635Z // end inline asm 2026-02-21T09:05:29.9643778Z cvt.u64.u32 %rd87, %r288; 2026-02-21T09:05:29.9643939Z cvt.u64.u32 %rd88, %r289; 2026-02-21T09:05:29.9644090Z shl.b64 %rd89, %rd88, 32; 2026-02-21T09:05:29.9644260Z or.b64 %rd90, %rd87, %rd89; 2026-02-21T09:05:29.9644521Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9644853Z mov.b64 {%r330, %r331}, %rd90; 2026-02-21T09:05:29.9645020Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T09:05:29.9645309Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9645602Z cvt.u64.u32 %rd91, %r290; 2026-02-21T09:05:29.9645749Z cvt.u64.u32 %rd92, %r291; 2026-02-21T09:05:29.9645902Z shl.b64 %rd93, %rd92, 32; 
2026-02-21T09:05:29.9646080Z or.b64 %rd94, %rd91, %rd93; 2026-02-21T09:05:29.9646338Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9646616Z mov.b64 {%r333, %r334}, %rd94; 2026-02-21T09:05:29.9646790Z cvt.rn.f16x2.f32 %r335, %r334, %r333; 2026-02-21T09:05:29.9647062Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9647350Z cvt.u64.u32 %rd95, %r292; 2026-02-21T09:05:29.9647501Z cvt.u64.u32 %rd96, %r293; 2026-02-21T09:05:29.9647644Z shl.b64 %rd97, %rd96, 32; 2026-02-21T09:05:29.9647796Z or.b64 %rd98, %rd95, %rd97; 2026-02-21T09:05:29.9648049Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9648332Z mov.b64 {%r336, %r337}, %rd98; 2026-02-21T09:05:29.9648493Z cvt.rn.f16x2.f32 %r338, %r337, %r336; 2026-02-21T09:05:29.9648769Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9649046Z cvt.u64.u32 %rd99, %r294; 2026-02-21T09:05:29.9649192Z cvt.u64.u32 %rd100, %r295; 2026-02-21T09:05:29.9649348Z shl.b64 %rd101, %rd100, 32; 2026-02-21T09:05:29.9649530Z or.b64 %rd102, %rd99, %rd101; 2026-02-21T09:05:29.9649798Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9650074Z mov.b64 {%r339, %r340}, %rd102; 2026-02-21T09:05:29.9650243Z cvt.rn.f16x2.f32 %r341, %r340, %r339; 2026-02-21T09:05:29.9650507Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9650790Z cvt.u64.u32 %rd103, %r296; 2026-02-21T09:05:29.9650975Z cvt.u64.u32 %rd104, %r297; 2026-02-21T09:05:29.9651122Z shl.b64 %rd105, %rd104, 32; 2026-02-21T09:05:29.9651279Z or.b64 %rd106, %rd103, %rd105; 2026-02-21T09:05:29.9651532Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9651817Z mov.b64 {%r342, %r343}, %rd106; 2026-02-21T09:05:29.9651976Z cvt.rn.f16x2.f32 %r344, %r343, %r342; 
2026-02-21T09:05:29.9652252Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9652530Z cvt.u64.u32 %rd107, %r298; 2026-02-21T09:05:29.9652678Z cvt.u64.u32 %rd108, %r299; 2026-02-21T09:05:29.9652832Z shl.b64 %rd109, %rd108, 32; 2026-02-21T09:05:29.9652980Z or.b64 %rd110, %rd107, %rd109; 2026-02-21T09:05:29.9653241Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9653515Z mov.b64 {%r345, %r346}, %rd110; 2026-02-21T09:05:29.9653714Z cvt.rn.f16x2.f32 %r347, %r346, %r345; 2026-02-21T09:05:29.9653986Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9654270Z cvt.u64.u32 %rd111, %r300; 2026-02-21T09:05:29.9654425Z cvt.u64.u32 %rd112, %r301; 2026-02-21T09:05:29.9654575Z shl.b64 %rd113, %rd112, 32; 2026-02-21T09:05:29.9654773Z or.b64 %rd114, %rd111, %rd113; 2026-02-21T09:05:29.9655041Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9655340Z mov.b64 {%r348, %r349}, %rd114; 2026-02-21T09:05:29.9655502Z cvt.rn.f16x2.f32 %r350, %r349, %r348; 2026-02-21T09:05:29.9655787Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9656075Z cvt.u64.u32 %rd115, %r302; 2026-02-21T09:05:29.9656223Z cvt.u64.u32 %rd116, %r303; 2026-02-21T09:05:29.9656379Z shl.b64 %rd117, %rd116, 32; 2026-02-21T09:05:29.9656533Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T09:05:29.9656832Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9657135Z mov.b64 {%r351, %r352}, %rd118; 2026-02-21T09:05:29.9657309Z cvt.rn.f16x2.f32 %r353, %r352, %r351; 2026-02-21T09:05:29.9657624Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9657921Z cvt.u64.u32 %rd119, %r305; 2026-02-21T09:05:29.9658082Z cvt.u64.u32 %rd120, %r306; 2026-02-21T09:05:29.9658236Z shl.b64 %rd121, %rd120, 32; 
2026-02-21T09:05:29.9658399Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T09:05:29.9658670Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9658969Z mov.b64 {%r354, %r355}, %rd122; 2026-02-21T09:05:29.9659134Z cvt.rn.f16x2.f32 %r356, %r355, %r354; 2026-02-21T09:05:29.9659426Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9659726Z cvt.u64.u32 %rd123, %r307; 2026-02-21T09:05:29.9659881Z cvt.u64.u32 %rd124, %r308; 2026-02-21T09:05:29.9660043Z shl.b64 %rd125, %rd124, 32; 2026-02-21T09:05:29.9660200Z or.b64 %rd126, %rd123, %rd125; 2026-02-21T09:05:29.9660479Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9660772Z mov.b64 {%r357, %r358}, %rd126; 2026-02-21T09:05:29.9660945Z cvt.rn.f16x2.f32 %r359, %r358, %r357; 2026-02-21T09:05:29.9661145Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9661206Z cvt.u64.u32 %rd127, %r309; 2026-02-21T09:05:29.9661270Z cvt.u64.u32 %rd128, %r310; 2026-02-21T09:05:29.9661327Z shl.b64 %rd129, %rd128, 32; 2026-02-21T09:05:29.9661386Z or.b64 %rd130, %rd127, %rd129; 2026-02-21T09:05:29.9661569Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9661657Z mov.b64 {%r360, %r361}, %rd130; 2026-02-21T09:05:29.9661722Z cvt.rn.f16x2.f32 %r362, %r361, %r360; 2026-02-21T09:05:29.9661902Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9661962Z cvt.u64.u32 %rd131, %r311; 2026-02-21T09:05:29.9662021Z cvt.u64.u32 %rd132, %r312; 2026-02-21T09:05:29.9662079Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:05:29.9662146Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:05:29.9662320Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9662380Z mov.b64 {%r363, %r364}, %rd134; 2026-02-21T09:05:29.9662451Z cvt.rn.f16x2.f32 %r365, 
%r364, %r363; 2026-02-21T09:05:29.9662622Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9662681Z cvt.u64.u32 %rd135, %r313; 2026-02-21T09:05:29.9662773Z cvt.u64.u32 %rd136, %r314; 2026-02-21T09:05:29.9662835Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:05:29.9662895Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:05:29.9663070Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9663139Z mov.b64 {%r366, %r367}, %rd138; 2026-02-21T09:05:29.9663203Z cvt.rn.f16x2.f32 %r368, %r367, %r366; 2026-02-21T09:05:29.9663372Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9663443Z cvt.u64.u32 %rd139, %r315; 2026-02-21T09:05:29.9663503Z cvt.u64.u32 %rd140, %r316; 2026-02-21T09:05:29.9663563Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:05:29.9663628Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:05:29.9663800Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9663860Z mov.b64 {%r369, %r370}, %rd142; 2026-02-21T09:05:29.9663927Z cvt.rn.f16x2.f32 %r371, %r370, %r369; 2026-02-21T09:05:29.9664104Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9664164Z cvt.u64.u32 %rd143, %r317; 2026-02-21T09:05:29.9664223Z cvt.u64.u32 %rd144, %r318; 2026-02-21T09:05:29.9664291Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:05:29.9664377Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:05:29.9664549Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9664618Z mov.b64 {%r372, %r373}, %rd146; 2026-02-21T09:05:29.9664713Z cvt.rn.f16x2.f32 %r374, %r373, %r372; 2026-02-21T09:05:29.9664894Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9664954Z cvt.u64.u32 %rd147, %r319; 2026-02-21T09:05:29.9665019Z cvt.u64.u32 %rd148, %r320; 2026-02-21T09:05:29.9665077Z shl.b64 %rd149, %rd148, 
32; 2026-02-21T09:05:29.9665134Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:05:29.9665306Z .loc 1 58 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:58:27 2026-02-21T09:05:29.9665366Z mov.b64 {%r375, %r376}, %rd150; 2026-02-21T09:05:29.9665428Z cvt.rn.f16x2.f32 %r377, %r376, %r375; 2026-02-21T09:05:29.9665601Z .loc 1 59 45 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:59:45 2026-02-21T09:05:29.9665675Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.9665730Z bar.sync 0, 256; 2026-02-21T09:05:29.9665851Z st.shared.v4.b32 [%r49], {%r332, %r335, %r338, %r341}; 2026-02-21T09:05:29.9665957Z st.shared.v4.b32 [%r49+8192], {%r356, %r359, %r362, %r365}; 2026-02-21T09:05:29.9666045Z st.shared.v4.b32 [%r50], {%r344, %r347, %r350, %r353}; 2026-02-21T09:05:29.9666137Z st.shared.v4.b32 [%r50+8192], {%r368, %r371, %r374, %r377}; 2026-02-21T09:05:29.9666200Z // begin inline asm 2026-02-21T09:05:29.9666270Z fence.proxy.async.shared::cta; 2026-02-21T09:05:29.9666324Z // end inline asm 2026-02-21T09:05:29.9666388Z bar.sync 0, 256; 2026-02-21T09:05:29.9666475Z elect.sync %r378|%p119, -1; 2026-02-21T09:05:29.9666538Z and.pred %p116, %p118, %p119; 2026-02-21T09:05:29.9666595Z add.s32 %r323, %r436, %r52; 2026-02-21T09:05:29.9666659Z // begin inline asm 2026-02-21T09:05:29.9666844Z @%p116 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd86, {%r435, %r323}], [%r324]; 2026-02-21T09:05:29.9666898Z // end inline asm 2026-02-21T09:05:29.9666971Z cp.async.bulk.commit_group; 2026-02-21T09:05:29.9667134Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9667192Z add.s32 %r325, %r328, 84048; 2026-02-21T09:05:29.9667363Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9667417Z bar.sync 0, 256; 2026-02-21T09:05:29.9667471Z // begin inline asm 2026-02-21T09:05:29.9667562Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r325]; 2026-02-21T09:05:29.9667654Z // end 
inline asm 2026-02-21T09:05:29.9667716Z add.s32 %r379, %r439, 1; 2026-02-21T09:05:29.9667777Z setp.eq.b32 %p120, %r379, 2; 2026-02-21T09:05:29.9667844Z selp.b32 %r439, 0, %r379, %p120; 2026-02-21T09:05:29.9667903Z selp.b32 %r438, 1, 0, %p120; 2026-02-21T09:05:29.9667984Z $L__BB0_22: // %.thread23 2026-02-21T09:05:29.9668074Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.9668247Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9668303Z xor.b32 %r441, %r441, %r438; 2026-02-21T09:05:29.9668467Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9668532Z add.s32 %r434, %r434, 1; 2026-02-21T09:05:29.9668594Z setp.lt.s32 %p121, %r434, %r45; 2026-02-21T09:05:29.9668652Z @%p121 bra $L__BB0_18; 2026-02-21T09:05:29.9668715Z bra.uni $L__BB0_23; 2026-02-21T09:05:29.9668817Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:29.9668874Z add.s32 %r285, %r433, 1; 2026-02-21T09:05:29.9668935Z setp.eq.b32 %p113, %r433, 127; 2026-02-21T09:05:29.9669003Z selp.b32 %r433, 0, %r285, %p113; 2026-02-21T09:05:29.9669091Z setp.eq.b32 %p114, %r433, 127; 2026-02-21T09:05:29.9669147Z @%p114 bra $L__BB0_21; 2026-02-21T09:05:29.9669251Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.9669423Z .loc 1 0 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:74 2026-02-21T09:05:29.9669477Z mov.b32 %r438, 0; 2026-02-21T09:05:29.9669651Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9669709Z setp.ne.b32 %p115, %r433, 0; 2026-02-21T09:05:29.9669766Z @%p115 bra $L__BB0_22; 2026-02-21T09:05:29.9669838Z // %bb.20: // %.thread 2026-02-21T09:05:29.9669931Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:29.9669988Z add.s32 %r437, %r437, 1; 2026-02-21T09:05:29.9670153Z .loc 1 39 35 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:39:35 2026-02-21T09:05:29.9670215Z shr.s32 %r381, %r437, 31; 
2026-02-21T09:05:29.9670272Z shr.u32 %r382, %r381, 27; 2026-02-21T09:05:29.9670327Z add.s32 %r383, %r437, %r382; 2026-02-21T09:05:29.9670387Z shr.s32 %r384, %r383, 5; 2026-02-21T09:05:29.9670573Z .loc 1 40 33 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:40:33 2026-02-21T09:05:29.9670629Z shl.b32 %r385, %r384, 2; 2026-02-21T09:05:29.9670791Z .loc 1 41 39 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:41:39 2026-02-21T09:05:29.9670854Z sub.s32 %r386, 128, %r385; 2026-02-21T09:05:29.9671022Z .loc 1 41 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:41:52 2026-02-21T09:05:29.9671078Z min.s32 %r387, %r386, 4; 2026-02-21T09:05:29.9671285Z .loc 1 42 45 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:45 2026-02-21T09:05:29.9671342Z and.b32 %r388, %r383, -32; 2026-02-21T09:05:29.9671397Z sub.s32 %r389, %r437, %r388; 2026-02-21T09:05:29.9671569Z .loc 1 43 51 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:43:51 2026-02-21T09:05:29.9671625Z div.s32 %r390, %r389, %r387; 2026-02-21T09:05:29.9671793Z .loc 1 42 64 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:64 2026-02-21T09:05:29.9671860Z mul.lo.s32 %r391, %r390, %r387; 2026-02-21T09:05:29.9671915Z sub.s32 %r392, %r389, %r391; 2026-02-21T09:05:29.9672078Z .loc 1 42 30 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:30 2026-02-21T09:05:29.9672133Z add.s32 %r393, %r392, %r385; 2026-02-21T09:05:29.9672325Z .loc 1 44 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:44:27 2026-02-21T09:05:29.9672382Z shl.b32 %r435, %r393, 4; 2026-02-21T09:05:29.9672545Z .loc 1 45 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:45:27 2026-02-21T09:05:29.9672608Z shl.b32 %r436, %r390, 9; 2026-02-21T09:05:29.9672774Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9672828Z bra.uni $L__BB0_22; 2026-02-21T09:05:29.9672927Z $L__BB0_1: // %.preheader.preheader 
2026-02-21T09:05:29.9673093Z .loc 1 0 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:74 2026-02-21T09:05:29.9673152Z mov.b32 %r76, global_smem; 2026-02-21T09:05:29.9673216Z add.s32 %r77, %r76, %r3; 2026-02-21T09:05:29.9673272Z mov.u32 %r130, %ctaid.x; 2026-02-21T09:05:29.9673327Z max.u32 %r131, %r130, 1023; 2026-02-21T09:05:29.9673381Z shl.b32 %r132, %r131, 7; 2026-02-21T09:05:29.9673444Z sub.s32 %r5, 131072, %r132; 2026-02-21T09:05:29.9673504Z setp.lt.s32 %p18, %r5, 1; 2026-02-21T09:05:29.9673560Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.9673662Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9673826Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9673925Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9673984Z barrier.sync 1; 2026-02-21T09:05:29.9674049Z barrier.sync 1; 2026-02-21T09:05:29.9674130Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9674209Z $L__BB0_2: // %.preheader 2026-02-21T09:05:29.9674308Z // =>This Loop Header: Depth=1 2026-02-21T09:05:29.9674393Z // Child Loop BB0_11 Depth 2 2026-02-21T09:05:29.9674474Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:29.9674643Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19 2026-02-21T09:05:29.9674749Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:29.9674806Z barrier.sync 1; 2026-02-21T09:05:29.9674870Z ld.shared.b8 %r75, [%r77+84056]; 2026-02-21T09:05:29.9674937Z setp.gt.u32 %p4, %r75, 3; 2026-02-21T09:05:29.9674994Z @%p4 bra $L__BB0_4; 2026-02-21T09:05:29.9675069Z // %bb.3: // %.preheader 2026-02-21T09:05:29.9675162Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9675249Z $L_brx_0: .branchtargets 2026-02-21T09:05:29.9675303Z $L__BB0_5, 2026-02-21T09:05:29.9675361Z $L__BB0_9, 2026-02-21T09:05:29.9675411Z $L__BB0_15, 2026-02-21T09:05:29.9675462Z $L__BB0_24; 2026-02-21T09:05:29.9675519Z brx.idx %r75, $L_brx_0; 2026-02-21T09:05:29.9675617Z $L__BB0_5: // in Loop: Header=BB0_2 
Depth=1 2026-02-21T09:05:29.9675780Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9675886Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9675965Z ld.shared.b32 %r4, [global_smem+65536]; 2026-02-21T09:05:29.9676020Z barrier.sync 1; 2026-02-21T09:05:29.9676075Z @%p18 bra $L__BB0_8; 2026-02-21T09:05:29.9676153Z // %bb.6: // %.lr.ph13 2026-02-21T09:05:29.9676240Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9676409Z .loc 1 0 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:74 2026-02-21T09:05:29.9676465Z mov.b32 %r415, -1; 2026-02-21T09:05:29.9676531Z mov.pred %p135, 0; 2026-02-21T09:05:29.9676584Z mov.b32 %r411, 0; 2026-02-21T09:05:29.9676639Z mov.b32 %r412, %r411; 2026-02-21T09:05:29.9676699Z mov.b32 %r413, %r411; 2026-02-21T09:05:29.9676753Z mov.b32 %r414, %r411; 2026-02-21T09:05:29.9676806Z mov.b32 %r416, %r411; 2026-02-21T09:05:29.9676924Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.9677023Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.9677190Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9677245Z add.s32 %r147, %r415, 1; 2026-02-21T09:05:29.9677315Z setp.eq.b32 %p31, %r415, 127; 2026-02-21T09:05:29.9677377Z selp.b32 %r415, 0, %r147, %p31; 2026-02-21T09:05:29.9677431Z shl.b32 %r148, %r414, 3; 2026-02-21T09:05:29.9677496Z add.s32 %r150, %r76, %r148; 2026-02-21T09:05:29.9677552Z add.s32 %r151, %r150, 83968; 2026-02-21T09:05:29.9677608Z add.s32 %r135, %r150, 84000; 2026-02-21T09:05:29.9677774Z .loc 1 54 31 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:54:31 2026-02-21T09:05:29.9677835Z shl.b32 %r152, %r414, 14; 2026-02-21T09:05:29.9677891Z add.s32 %r153, %r76, %r152; 2026-02-21T09:05:29.9678061Z .loc 1 55 44 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:55:44 2026-02-21T09:05:29.9678124Z shl.b32 %r154, %r414, 9; 2026-02-21T09:05:29.9678181Z add.s32 %r155, %r76, %r154; 
2026-02-21T09:05:29.9678234Z add.s32 %r156, %r155, 81920; 2026-02-21T09:05:29.9678400Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9678485Z bar.warp.sync -1; 2026-02-21T09:05:29.9678539Z // begin inline asm 2026-02-21T09:05:29.9678587Z 2026-02-21T09:05:29.9678643Z { 2026-02-21T09:05:29.9678702Z .reg .pred complete; 2026-02-21T09:05:29.9678756Z waitLoop: 2026-02-21T09:05:29.9678879Z mbarrier.try_wait.parity.shared.b64 complete, [%r135], %r413; 2026-02-21T09:05:29.9678939Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.9678988Z } 2026-02-21T09:05:29.9678991Z 2026-02-21T09:05:29.9679044Z // end inline asm 2026-02-21T09:05:29.9679212Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9679268Z shl.b32 %r157, %r412, 6; 2026-02-21T09:05:29.9679325Z add.s32 %r137, %r157, %r4; 2026-02-21T09:05:29.9679489Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9679544Z shl.b32 %r158, %r412, 3; 2026-02-21T09:05:29.9679599Z add.s32 %r159, %r76, %r158; 2026-02-21T09:05:29.9679663Z add.s32 %r160, %r159, 84032; 2026-02-21T09:05:29.9679724Z setp.eq.b32 %p30, %r415, 127; 2026-02-21T09:05:29.9679902Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9679966Z elect.sync %r161|%p21, -1; 2026-02-21T09:05:29.9680032Z bfe.u32 %r162, %r153, 4, 14; 2026-02-21T09:05:29.9680089Z cvt.u64.u32 %rd24, %r162; 2026-02-21T09:05:29.9680160Z or.b64 %rd14, %rd24, -4611685949640802304; 2026-02-21T09:05:29.9680222Z bfe.u32 %r163, %r156, 4, 14; 2026-02-21T09:05:29.9680278Z cvt.u64.u32 %rd25, %r163; 2026-02-21T09:05:29.9680347Z or.b64 %rd15, %rd25, -4611685949705814016; 2026-02-21T09:05:29.9680402Z mov.b32 %r138, 134479888; 2026-02-21T09:05:29.9680483Z // begin inline asm 2026-02-21T09:05:29.9680624Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r137 + 0 ], %rd14, %rd15, %r138, %p135; 2026-02-21T09:05:29.9680678Z // end 
inline asm 2026-02-21T09:05:29.9680742Z add.s32 %r164, %r153, 4096; 2026-02-21T09:05:29.9680798Z bfe.u32 %r165, %r164, 4, 14; 2026-02-21T09:05:29.9680853Z cvt.u64.u32 %rd26, %r165; 2026-02-21T09:05:29.9680924Z or.b64 %rd16, %rd26, -4611685949640802304; 2026-02-21T09:05:29.9680978Z // begin inline asm 2026-02-21T09:05:29.9681117Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r137 + 16 ], %rd16, %rd15, %r138, %p135; 2026-02-21T09:05:29.9681169Z // end inline asm 2026-02-21T09:05:29.9681231Z add.s32 %r166, %r153, 8192; 2026-02-21T09:05:29.9681286Z bfe.u32 %r167, %r166, 4, 14; 2026-02-21T09:05:29.9681341Z cvt.u64.u32 %rd27, %r167; 2026-02-21T09:05:29.9681411Z or.b64 %rd18, %rd27, -4611685949640802304; 2026-02-21T09:05:29.9681465Z // begin inline asm 2026-02-21T09:05:29.9681619Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r137 + 32 ], %rd18, %rd15, %r138, %p135; 2026-02-21T09:05:29.9681686Z // end inline asm 2026-02-21T09:05:29.9681741Z add.s32 %r168, %r153, 12288; 2026-02-21T09:05:29.9681798Z bfe.u32 %r169, %r168, 4, 14; 2026-02-21T09:05:29.9681857Z cvt.u64.u32 %rd28, %r169; 2026-02-21T09:05:29.9681938Z or.b64 %rd20, %rd28, -4611685949640802304; 2026-02-21T09:05:29.9681994Z // begin inline asm 2026-02-21T09:05:29.9682124Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r137 + 48 ], %rd20, %rd15, %r138, %p135; 2026-02-21T09:05:29.9682184Z // end inline asm 2026-02-21T09:05:29.9682239Z cvt.u64.u32 %rd22, %r151; 2026-02-21T09:05:29.9682293Z // begin inline asm 2026-02-21T09:05:29.9682420Z @%p21 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:05:29.9682473Z // end inline asm 2026-02-21T09:05:29.9682534Z and.pred %p29, %p30, %p21; 2026-02-21T09:05:29.9682589Z cvt.u64.u32 %rd23, %r160; 2026-02-21T09:05:29.9682651Z // begin inline asm 2026-02-21T09:05:29.9682770Z @%p29 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:05:29.9682824Z // end inline asm 2026-02-21T09:05:29.9682986Z .loc 1 0 0 // 
cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9683069Z setp.ne.b32 %p135, %r415, 127; 2026-02-21T09:05:29.9683241Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9683304Z add.s32 %r170, %r412, 1; 2026-02-21T09:05:29.9683364Z setp.eq.b32 %p32, %r170, 2; 2026-02-21T09:05:29.9683425Z selp.b32 %r171, 0, %r170, %p32; 2026-02-21T09:05:29.9683488Z selp.b32 %r412, %r171, %r412, %p30; 2026-02-21T09:05:29.9683556Z and.pred %p33, %p30, %p32; 2026-02-21T09:05:29.9683613Z selp.b32 %r172, 1, 0, %p33; 2026-02-21T09:05:29.9683667Z xor.b32 %r411, %r411, %r172; 2026-02-21T09:05:29.9683841Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9683897Z shl.b32 %r173, %r412, 3; 2026-02-21T09:05:29.9683953Z add.s32 %r174, %r76, %r173; 2026-02-21T09:05:29.9684008Z add.s32 %r145, %r174, 84048; 2026-02-21T09:05:29.9684178Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9684235Z // begin inline asm 2026-02-21T09:05:29.9684283Z 2026-02-21T09:05:29.9684338Z { 2026-02-21T09:05:29.9684399Z @!%p30 bra.uni skipWait; 2026-02-21T09:05:29.9684456Z .reg .pred complete; 2026-02-21T09:05:29.9684531Z waitLoop: 2026-02-21T09:05:29.9684656Z mbarrier.try_wait.parity.shared.b64 complete, [%r145], %r411; 2026-02-21T09:05:29.9684751Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.9684805Z skipWait: 2026-02-21T09:05:29.9684861Z } 2026-02-21T09:05:29.9684865Z 2026-02-21T09:05:29.9684917Z // end inline asm 2026-02-21T09:05:29.9685073Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9685136Z add.s32 %r175, %r414, 1; 2026-02-21T09:05:29.9685221Z setp.eq.b32 %p34, %r175, 4; 2026-02-21T09:05:29.9685281Z selp.b32 %r414, 0, %r175, %p34; 2026-02-21T09:05:29.9685337Z selp.b32 %r176, 1, 0, %p34; 2026-02-21T09:05:29.9685400Z xor.b32 %r413, %r413, %r176; 2026-02-21T09:05:29.9685567Z .loc 1 33 74 // 
cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9685623Z add.s32 %r416, %r416, 1; 2026-02-21T09:05:29.9685692Z setp.lt.s32 %p35, %r416, %r5; 2026-02-21T09:05:29.9685750Z @%p35 bra $L__BB0_7; 2026-02-21T09:05:29.9685833Z $L__BB0_8: // %._crit_edge14 2026-02-21T09:05:29.9685928Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9685984Z barrier.sync 1; 2026-02-21T09:05:29.9686061Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9686116Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.9686251Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9686416Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9686492Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9686554Z barrier.sync 1; 2026-02-21T09:05:29.9686720Z .loc 1 21 67 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:21:67 2026-02-21T09:05:29.9686777Z mov.u32 %r78, %ctaid.y; 2026-02-21T09:05:29.9686841Z mov.u32 %r79, %ctaid.z; 2026-02-21T09:05:29.9686898Z mov.u32 %r80, %nctaid.x; 2026-02-21T09:05:29.9686955Z mov.u32 %r81, %nctaid.y; 2026-02-21T09:05:29.9687018Z mad.lo.s32 %r82, %r79, %r81, %r78; 2026-02-21T09:05:29.9687088Z mad.lo.s32 %r83, %r82, %r80, %r130; 2026-02-21T09:05:29.9687146Z mul.lo.s32 %r84, %r83, 384; 2026-02-21T09:05:29.9687312Z .loc 1 22 67 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:22:67 2026-02-21T09:05:29.9687374Z add.s32 %r85, %r84, 128; 2026-02-21T09:05:29.9687431Z cvt.s64.s32 %rd8, %r85; 2026-02-21T09:05:29.9687489Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:05:29.9687551Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:05:29.9687723Z .loc 1 21 67 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:21:67 2026-02-21T09:05:29.9687807Z cvt.s64.s32 %rd10, %r84; 2026-02-21T09:05:29.9687864Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:05:29.9687931Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:05:29.9688096Z .loc 1 33 74 // 
cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9688152Z @%p18 bra $L__BB0_14; 2026-02-21T09:05:29.9688232Z // %bb.10: // %.lr.ph 2026-02-21T09:05:29.9688316Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9688373Z add.s32 %r427, %r130, -1; 2026-02-21T09:05:29.9688428Z add.s32 %r21, %r1, -256; 2026-02-21T09:05:29.9688491Z shr.u32 %r22, %r21, 5; 2026-02-21T09:05:29.9688545Z mov.b32 %r423, -1; 2026-02-21T09:05:29.9688599Z mov.b32 %r417, 0; 2026-02-21T09:05:29.9688660Z mov.b32 %r418, %r417; 2026-02-21T09:05:29.9688713Z mov.b32 %r426, %r417; 2026-02-21T09:05:29.9688766Z mov.b32 %r425, %r417; 2026-02-21T09:05:29.9688819Z mov.b32 %r421, %r417; 2026-02-21T09:05:29.9688881Z mov.b32 %r424, %r417; 2026-02-21T09:05:29.9688936Z bra.uni $L__BB0_11; 2026-02-21T09:05:29.9689032Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.9689261Z .loc 1 0 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0:74 2026-02-21T09:05:29.9689326Z selp.b32 %r108, 0, %r421, %p8; 2026-02-21T09:05:29.9689387Z setp.lt.u32 %p12, %r21, 32; 2026-02-21T09:05:29.9689456Z setp.lt.u32 %p13, %r21, 64; 2026-02-21T09:05:29.9689516Z setp.eq.b32 %p9, %r21, 0; 2026-02-21T09:05:29.9689679Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9689738Z shl.b32 %r115, %r418, 3; 2026-02-21T09:05:29.9689826Z add.s32 %r117, %r76, %r115; 2026-02-21T09:05:29.9689882Z add.s32 %r104, %r117, 83968; 2026-02-21T09:05:29.9690041Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9690107Z // begin inline asm 2026-02-21T09:05:29.9690157Z 2026-02-21T09:05:29.9690205Z { 2026-02-21T09:05:29.9690262Z .reg .pred complete; 2026-02-21T09:05:29.9690325Z waitLoop: 2026-02-21T09:05:29.9690443Z mbarrier.try_wait.parity.shared.b64 complete, [%r104], %r417; 2026-02-21T09:05:29.9690508Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.9690571Z } 2026-02-21T09:05:29.9690574Z 
2026-02-21T09:05:29.9690631Z // end inline asm 2026-02-21T09:05:29.9690797Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9690867Z add.s32 %r110, %r117, 84000; 2026-02-21T09:05:29.9691046Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9691103Z bar.sync 3, 64; 2026-02-21T09:05:29.9691159Z // begin inline asm 2026-02-21T09:05:29.9691273Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r110], 16896; 2026-02-21T09:05:29.9691327Z // end inline asm 2026-02-21T09:05:29.9691489Z .loc 1 54 31 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:54:31 2026-02-21T09:05:29.9691554Z shl.b32 %r118, %r418, 14; 2026-02-21T09:05:29.9691609Z add.s32 %r119, %r76, %r118; 2026-02-21T09:05:29.9691759Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9691820Z bar.sync 3, 64; 2026-02-21T09:05:29.9691894Z shfl.sync.idx.b32 %r120, %r22, 0, 31, -1; 2026-02-21T09:05:29.9691955Z elect.sync %r121|%p14, -1; 2026-02-21T09:05:29.9692016Z and.pred %p10, %p13, %p14; 2026-02-21T09:05:29.9692080Z and.b32 %r122, %r120, 1; 2026-02-21T09:05:29.9692136Z shl.b32 %r123, %r122, 13; 2026-02-21T09:05:29.9692193Z add.s32 %r107, %r119, %r123; 2026-02-21T09:05:29.9692257Z shl.b32 %r124, %r122, 8; 2026-02-21T09:05:29.9692312Z or.b32 %r109, %r124, %r426; 2026-02-21T09:05:29.9692366Z // begin inline asm 2026-02-21T09:05:29.9692615Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r107], [%rd12, {%r108, %r109}], [%r110]; 2026-02-21T09:05:29.9692690Z // end inline asm 2026-02-21T09:05:29.9692849Z .loc 1 55 44 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:55:44 2026-02-21T09:05:29.9692905Z shl.b32 %r125, %r418, 9; 2026-02-21T09:05:29.9692967Z add.s32 %r126, %r76, %r125; 2026-02-21T09:05:29.9693023Z add.s32 %r111, %r126, 81920; 2026-02-21T09:05:29.9693174Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 
2026-02-21T09:05:29.9693233Z bar.sync 3, 64; 2026-02-21T09:05:29.9693292Z elect.sync %r127|%p15, -1; 2026-02-21T09:05:29.9693351Z and.pred %p11, %p12, %p15; 2026-02-21T09:05:29.9693410Z // begin inline asm 2026-02-21T09:05:29.9693648Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r111], [%rd13, {%r108, %r425}], [%r110]; 2026-02-21T09:05:29.9693703Z // end inline asm 2026-02-21T09:05:29.9693862Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9693924Z add.s32 %r421, %r108, 16; 2026-02-21T09:05:29.9694074Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9694150Z add.s32 %r128, %r418, 1; 2026-02-21T09:05:29.9694217Z setp.eq.b32 %p16, %r128, 4; 2026-02-21T09:05:29.9694277Z selp.b32 %r418, 0, %r128, %p16; 2026-02-21T09:05:29.9694334Z selp.b32 %r129, 1, 0, %p16; 2026-02-21T09:05:29.9694396Z xor.b32 %r417, %r417, %r129; 2026-02-21T09:05:29.9694564Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9694619Z add.s32 %r424, %r424, 1; 2026-02-21T09:05:29.9694710Z setp.lt.s32 %p17, %r424, %r5; 2026-02-21T09:05:29.9694812Z @%p17 bra $L__BB0_11; 2026-02-21T09:05:29.9694866Z bra.uni $L__BB0_14; 2026-02-21T09:05:29.9694963Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:29.9695063Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:29.9695122Z add.s32 %r90, %r423, 1; 2026-02-21T09:05:29.9695183Z setp.eq.b32 %p6, %r423, 127; 2026-02-21T09:05:29.9695248Z selp.b32 %r423, 0, %r90, %p6; 2026-02-21T09:05:29.9695309Z setp.ne.b32 %p7, %r423, 0; 2026-02-21T09:05:29.9695367Z setp.eq.b32 %p8, %r423, 0; 2026-02-21T09:05:29.9695423Z @%p7 bra $L__BB0_13; 2026-02-21T09:05:29.9695524Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:29.9695579Z add.s32 %r427, %r427, 1; 2026-02-21T09:05:29.9695747Z .loc 1 39 35 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:39:35 
2026-02-21T09:05:29.9695833Z shr.s32 %r91, %r427, 31; 2026-02-21T09:05:29.9695891Z shr.u32 %r92, %r91, 27; 2026-02-21T09:05:29.9695947Z add.s32 %r93, %r427, %r92; 2026-02-21T09:05:29.9696002Z shr.s32 %r94, %r93, 5; 2026-02-21T09:05:29.9696171Z .loc 1 40 33 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:40:33 2026-02-21T09:05:29.9696230Z shl.b32 %r95, %r94, 2; 2026-02-21T09:05:29.9696393Z .loc 1 41 39 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:41:39 2026-02-21T09:05:29.9696459Z sub.s32 %r96, 128, %r95; 2026-02-21T09:05:29.9696621Z .loc 1 41 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:41:52 2026-02-21T09:05:29.9696676Z min.s32 %r97, %r96, 4; 2026-02-21T09:05:29.9696844Z .loc 1 42 45 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:45 2026-02-21T09:05:29.9696902Z and.b32 %r98, %r93, -32; 2026-02-21T09:05:29.9696958Z sub.s32 %r99, %r427, %r98; 2026-02-21T09:05:29.9697129Z .loc 1 43 51 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:43:51 2026-02-21T09:05:29.9697188Z div.s32 %r100, %r99, %r97; 2026-02-21T09:05:29.9697349Z .loc 1 42 64 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:64 2026-02-21T09:05:29.9697443Z mul.lo.s32 %r101, %r100, %r97; 2026-02-21T09:05:29.9697508Z sub.s32 %r102, %r99, %r101; 2026-02-21T09:05:29.9697677Z .loc 1 42 30 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:42:30 2026-02-21T09:05:29.9697734Z add.s32 %r103, %r102, %r95; 2026-02-21T09:05:29.9697902Z .loc 1 44 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:44:27 2026-02-21T09:05:29.9697957Z shl.b32 %r425, %r103, 4; 2026-02-21T09:05:29.9698122Z .loc 1 45 27 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:45:27 2026-02-21T09:05:29.9698185Z shl.b32 %r426, %r100, 9; 2026-02-21T09:05:29.9698242Z bra.uni $L__BB0_13; 2026-02-21T09:05:29.9698325Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:29.9698410Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9698585Z .loc 1 33 74 // 
cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9698644Z barrier.sync 1; 2026-02-21T09:05:29.9698721Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:29.9698783Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.9698909Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:29.9699073Z .loc 1 19 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:19 2026-02-21T09:05:29.9699134Z barrier.sync 1; 2026-02-21T09:05:29.9699188Z barrier.sync 1; 2026-02-21T09:05:29.9699242Z bra.uni $L__BB0_2; 2026-02-21T09:05:29.9699323Z $L__BB0_23: // %._crit_edge17 2026-02-21T09:05:29.9699496Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9699594Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:29.9699649Z bar.sync 0, 256; 2026-02-21T09:05:29.9699710Z barrier.sync 1; 2026-02-21T09:05:29.9699786Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:05:29.9699843Z shl.b32 %r409, %r439, 3; 2026-02-21T09:05:29.9699906Z add.s32 %r394, %r250, %r409; 2026-02-21T09:05:29.9700070Z .loc 1 56 52 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:56:52 2026-02-21T09:05:29.9700128Z // begin inline asm 2026-02-21T09:05:29.9700179Z 2026-02-21T09:05:29.9700238Z { 2026-02-21T09:05:29.9700299Z .reg .pred complete; 2026-02-21T09:05:29.9700353Z waitLoop: 2026-02-21T09:05:29.9700481Z mbarrier.try_wait.parity.shared.b64 complete, [%r394], %r441; 2026-02-21T09:05:29.9700547Z @!complete bra.uni waitLoop; 2026-02-21T09:05:29.9700597Z } 2026-02-21T09:05:29.9700600Z 2026-02-21T09:05:29.9700656Z // end inline asm 2026-02-21T09:05:29.9700860Z .loc 1 33 74 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:74 2026-02-21T09:05:29.9700921Z bar.sync 0, 256; 2026-02-21T09:05:29.9700978Z // begin inline asm 2026-02-21T09:05:29.9701075Z @%p122 mbarrier.inval.shared::cta.b64 [%r250]; 2026-02-21T09:05:29.9701132Z // end inline asm 2026-02-21T09:05:29.9701187Z bar.sync 0, 256; 2026-02-21T09:05:29.9701249Z // begin 
inline asm 2026-02-21T09:05:29.9701335Z @%p122 mbarrier.inval.shared::cta.b64 [%r251]; 2026-02-21T09:05:29.9701391Z // end inline asm 2026-02-21T09:05:29.9701448Z // begin inline asm 2026-02-21T09:05:29.9701538Z @%p122 mbarrier.inval.shared::cta.b64 [%r248]; 2026-02-21T09:05:29.9701593Z // end inline asm 2026-02-21T09:05:29.9701646Z bar.sync 0, 256; 2026-02-21T09:05:29.9701708Z // begin inline asm 2026-02-21T09:05:29.9701785Z @%p122 mbarrier.inval.shared::cta.b64 [%r249]; 2026-02-21T09:05:29.9701839Z // end inline asm 2026-02-21T09:05:29.9701894Z // begin inline asm 2026-02-21T09:05:29.9701981Z @%p122 mbarrier.inval.shared::cta.b64 [%r240]; 2026-02-21T09:05:29.9702037Z // end inline asm 2026-02-21T09:05:29.9702091Z bar.sync 0, 256; 2026-02-21T09:05:29.9702154Z // begin inline asm 2026-02-21T09:05:29.9702232Z @%p122 mbarrier.inval.shared::cta.b64 [%r241]; 2026-02-21T09:05:29.9702306Z // end inline asm 2026-02-21T09:05:29.9702367Z bar.sync 0, 256; 2026-02-21T09:05:29.9702422Z // begin inline asm 2026-02-21T09:05:29.9702499Z @%p122 mbarrier.inval.shared::cta.b64 [%r242]; 2026-02-21T09:05:29.9702553Z // end inline asm 2026-02-21T09:05:29.9702615Z bar.sync 0, 256; 2026-02-21T09:05:29.9702671Z // begin inline asm 2026-02-21T09:05:29.9702747Z @%p122 mbarrier.inval.shared::cta.b64 [%r243]; 2026-02-21T09:05:29.9702808Z // end inline asm 2026-02-21T09:05:29.9702862Z // begin inline asm 2026-02-21T09:05:29.9702939Z @%p122 mbarrier.inval.shared::cta.b64 [%r236]; 2026-02-21T09:05:29.9702991Z // end inline asm 2026-02-21T09:05:29.9703053Z bar.sync 0, 256; 2026-02-21T09:05:29.9703109Z // begin inline asm 2026-02-21T09:05:29.9703187Z @%p122 mbarrier.inval.shared::cta.b64 [%r237]; 2026-02-21T09:05:29.9703248Z // end inline asm 2026-02-21T09:05:29.9703301Z bar.sync 0, 256; 2026-02-21T09:05:29.9703356Z // begin inline asm 2026-02-21T09:05:29.9703437Z @%p122 mbarrier.inval.shared::cta.b64 [%r238]; 2026-02-21T09:05:29.9703494Z // end inline asm 2026-02-21T09:05:29.9703547Z bar.sync 0, 
256; 2026-02-21T09:05:29.9703603Z // begin inline asm 2026-02-21T09:05:29.9703706Z @%p122 mbarrier.inval.shared::cta.b64 [%r239]; 2026-02-21T09:05:29.9703764Z // end inline asm 2026-02-21T09:05:29.9703929Z .loc 1 33 4 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:33:4 2026-02-21T09:05:29.9703990Z bar.sync 0, 256; 2026-02-21T09:05:29.9704046Z // begin inline asm 2026-02-21T09:05:29.9704164Z @%p36 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r408, 128; 2026-02-21T09:05:29.9704218Z // end inline asm 2026-02-21T09:05:29.9704307Z st.shared.b32 [global_smem+84064], 50529027; 2026-02-21T09:05:29.9704389Z barrier.sync 1; 2026-02-21T09:05:29.9704470Z $L__BB0_24: // %common.ret 2026-02-21T09:05:29.9704640Z .loc 1 0 0 // cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py:0 2026-02-21T09:05:29.9704726Z ret; 2026-02-21T09:05:29.9704784Z $L__tmp1: 2026-02-21T09:05:29.9704847Z $L__func_end0: 2026-02-21T09:05:29.9704928Z // -- End function 2026-02-21T09:05:29.9704980Z } 2026-02-21T09:05:29.9705192Z .file 1 "/tmp/torchinductor_root/xj/cxjcxargxdmp74x2q7tfyneeuvesg4prmgvmf3bgvxvm5u6wssbm.py" 2026-02-21T09:05:29.9705264Z .section .debug_abbrev 2026-02-21T09:05:29.9705316Z { 2026-02-21T09:05:29.9705405Z .b8 1 // Abbreviation Code 2026-02-21T09:05:29.9705498Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:29.9705579Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:29.9705684Z .b8 37 // DW_AT_producer 2026-02-21T09:05:29.9705771Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.9705848Z .b8 19 // DW_AT_language 2026-02-21T09:05:29.9705926Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:29.9706005Z .b8 3 // DW_AT_name 2026-02-21T09:05:29.9706091Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.9706173Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:29.9706252Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:29.9706339Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:29.9706415Z .b8 8 // DW_FORM_string 2026-02-21T09:05:29.9706489Z .b8 0 // EOM(1) 2026-02-21T09:05:29.9706573Z .b8 0 // EOM(2) 
2026-02-21T09:05:29.9706646Z .b8 0 // EOM(3) 2026-02-21T09:05:29.9706700Z } 2026-02-21T09:05:29.9706760Z .section .debug_info 2026-02-21T09:05:29.9706819Z { 2026-02-21T09:05:29.9706902Z .b32 104 // Length of Unit 2026-02-21T09:05:29.9706989Z .b8 2 // DWARF version number 2026-02-21T09:05:29.9707099Z .b8 0 2026-02-21T09:05:29.9707220Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:29.9707314Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:29.9707417Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:29.9707507Z .b8 116 // DW_AT_producer 2026-02-21T09:05:29.9707564Z .b8 114 2026-02-21T09:05:29.9707621Z .b8 105 2026-02-21T09:05:29.9707682Z .b8 116 2026-02-21T09:05:29.9707737Z .b8 111 2026-02-21T09:05:29.9707792Z .b8 110 2026-02-21T09:05:29.9707845Z .b8 0 2026-02-21T09:05:29.9707939Z .b8 2 // DW_AT_language 2026-02-21T09:05:29.9707993Z .b8 0 2026-02-21T09:05:29.9708068Z .b8 99 // DW_AT_name 2026-02-21T09:05:29.9708128Z .b8 120 2026-02-21T09:05:29.9708179Z .b8 106 2026-02-21T09:05:29.9708232Z .b8 99 2026-02-21T09:05:29.9708286Z .b8 120 2026-02-21T09:05:29.9708346Z .b8 97 2026-02-21T09:05:29.9708397Z .b8 114 2026-02-21T09:05:29.9708449Z .b8 103 2026-02-21T09:05:29.9708509Z .b8 120 2026-02-21T09:05:29.9708560Z .b8 100 2026-02-21T09:05:29.9708612Z .b8 109 2026-02-21T09:05:29.9708691Z .b8 112 2026-02-21T09:05:29.9708749Z .b8 55 2026-02-21T09:05:29.9708798Z .b8 52 2026-02-21T09:05:29.9708847Z .b8 120 2026-02-21T09:05:29.9708903Z .b8 50 2026-02-21T09:05:29.9708952Z .b8 113 2026-02-21T09:05:29.9709002Z .b8 55 2026-02-21T09:05:29.9709050Z .b8 116 2026-02-21T09:05:29.9709106Z .b8 102 2026-02-21T09:05:29.9709154Z .b8 121 2026-02-21T09:05:29.9709201Z .b8 110 2026-02-21T09:05:29.9709256Z .b8 101 2026-02-21T09:05:29.9709304Z .b8 101 2026-02-21T09:05:29.9709354Z .b8 117 2026-02-21T09:05:29.9709432Z .b8 118 2026-02-21T09:05:29.9709487Z .b8 101 2026-02-21T09:05:29.9709536Z .b8 115 2026-02-21T09:05:29.9709583Z .b8 103 2026-02-21T09:05:29.9709631Z 
.b8 52 2026-02-21T09:05:29.9709686Z .b8 112 2026-02-21T09:05:29.9709734Z .b8 114 2026-02-21T09:05:29.9709785Z .b8 109 2026-02-21T09:05:29.9709838Z .b8 103 2026-02-21T09:05:29.9709886Z .b8 118 2026-02-21T09:05:29.9709933Z .b8 109 2026-02-21T09:05:29.9709981Z .b8 102 2026-02-21T09:05:29.9710036Z .b8 51 2026-02-21T09:05:29.9710085Z .b8 98 2026-02-21T09:05:29.9710133Z .b8 103 2026-02-21T09:05:29.9710186Z .b8 118 2026-02-21T09:05:29.9710235Z .b8 120 2026-02-21T09:05:29.9710284Z .b8 118 2026-02-21T09:05:29.9710335Z .b8 109 2026-02-21T09:05:29.9710389Z .b8 53 2026-02-21T09:05:29.9710437Z .b8 117 2026-02-21T09:05:29.9710485Z .b8 54 2026-02-21T09:05:29.9710532Z .b8 119 2026-02-21T09:05:29.9710587Z .b8 115 2026-02-21T09:05:29.9710636Z .b8 115 2026-02-21T09:05:29.9710683Z .b8 98 2026-02-21T09:05:29.9710761Z .b8 109 2026-02-21T09:05:29.9710810Z .b8 46 2026-02-21T09:05:29.9710861Z .b8 112 2026-02-21T09:05:29.9710909Z .b8 121 2026-02-21T09:05:29.9710965Z .b8 0 2026-02-21T09:05:29.9711052Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:29.9711124Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:29.9711182Z .b8 116 2026-02-21T09:05:29.9711231Z .b8 109 2026-02-21T09:05:29.9711279Z .b8 112 2026-02-21T09:05:29.9711327Z .b8 47 2026-02-21T09:05:29.9711383Z .b8 116 2026-02-21T09:05:29.9711431Z .b8 111 2026-02-21T09:05:29.9711483Z .b8 114 2026-02-21T09:05:29.9711539Z .b8 99 2026-02-21T09:05:29.9711588Z .b8 104 2026-02-21T09:05:29.9711635Z .b8 105 2026-02-21T09:05:29.9711682Z .b8 110 2026-02-21T09:05:29.9711738Z .b8 100 2026-02-21T09:05:29.9711789Z .b8 117 2026-02-21T09:05:29.9711837Z .b8 99 2026-02-21T09:05:29.9711893Z .b8 116 2026-02-21T09:05:29.9711943Z .b8 111 2026-02-21T09:05:29.9711994Z .b8 114 2026-02-21T09:05:29.9712051Z .b8 95 2026-02-21T09:05:29.9712107Z .b8 114 2026-02-21T09:05:29.9712157Z .b8 111 2026-02-21T09:05:29.9712206Z .b8 111 2026-02-21T09:05:29.9712256Z .b8 116 2026-02-21T09:05:29.9712312Z .b8 47 2026-02-21T09:05:29.9712360Z .b8 120 2026-02-21T09:05:29.9712407Z .b8 106 
2026-02-21T09:05:29.9712461Z .b8 0 2026-02-21T09:05:29.9712509Z } 2026-02-21T09:05:29.9712597Z .section .debug_macinfo { } 2026-02-21T09:05:29.9712601Z 2026-02-21T09:05:29.9712683Z ================================================================ 2026-02-21T09:05:29.9712785Z please share the reproducer above with Triton project. 2026-02-21T09:05:30.1023197Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:30.1023210Z 2026-02-21T09:05:30.1023214Z 2026-02-21T09:05:30.1023217Z 2026-02-21T09:05:30.1023316Z ================================================================ 2026-02-21T09:05:30.1023394Z Internal Triton PTX codegen error 2026-02-21T09:05:30.1023453Z `ptxas` stderr: 2026-02-21T09:05:30.1023824Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 271 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.1023924Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.1023929Z 2026-02-21T09:05:30.1024336Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp24t7em5r.ptx -o /tmp/tmp24t7em5r.ptx.o 2026-02-21T09:05:30.1024343Z 2026-02-21T09:05:30.1024347Z 2026-02-21T09:05:30.1024636Z // 2026-02-21T09:05:30.1024783Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:30.1024834Z // 2026-02-21T09:05:30.1024838Z 2026-02-21T09:05:30.1024904Z .version 8.7 2026-02-21T09:05:30.1024961Z .target sm_100a 2026-02-21T09:05:30.1025016Z .address_size 64 2026-02-21T09:05:30.1025019Z 2026-02-21T09:05:30.1025152Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:30.1025232Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:30.1025312Z // @_helion_matmul 2026-02-21T09:05:30.1025450Z .visible .entry _helion_matmul( 2026-02-21T09:05:30.1025562Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 
2026-02-21T09:05:30.1025657Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:30.1025753Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:30.1025856Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:30.1025955Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:30.1026007Z ) 2026-02-21T09:05:30.1026070Z .reqntid 256 2026-02-21T09:05:30.1026124Z .maxnreg 32 2026-02-21T09:05:30.1026175Z { 2026-02-21T09:05:30.1026235Z .reg .pred %p<147>; 2026-02-21T09:05:30.1026301Z .reg .b32 %r<332>; 2026-02-21T09:05:30.1026356Z .reg .b64 %rd<163>; 2026-02-21T09:05:30.1026539Z .loc 1 19 0 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:19:0 2026-02-21T09:05:30.1026640Z $L__func_begin0: 2026-02-21T09:05:30.1026812Z .loc 1 19 0 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:19:0 2026-02-21T09:05:30.1026816Z 2026-02-21T09:05:30.1026871Z // %bb.0: 2026-02-21T09:05:30.1026969Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:05:30.1027026Z $L__tmp0: 2026-02-21T09:05:30.1027193Z .loc 1 19 0 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:19 2026-02-21T09:05:30.1027253Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:30.1027354Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:05:30.1027420Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:05:30.1027503Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:05:30.1027571Z mov.b32 %r25, global_smem; 2026-02-21T09:05:30.1027627Z // begin inline asm 2026-02-21T09:05:30.1027771Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r25], 32; 2026-02-21T09:05:30.1027833Z // end inline asm 2026-02-21T09:05:30.1027912Z ld.param.b64 %rd58, [_helion_matmul_param_3]; 2026-02-21T09:05:30.1027968Z bar.sync 0; 2026-02-21T09:05:30.1028037Z ld.shared.b32 %r323, [global_smem]; 2026-02-21T09:05:30.1028098Z bar.sync 0; 2026-02-21T09:05:30.1028153Z // begin inline asm 2026-02-21T09:05:30.1028272Z @%p1 
tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:30.1028371Z // end inline asm 2026-02-21T09:05:30.1028544Z .loc 1 21 67 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:21:67 2026-02-21T09:05:30.1028605Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:05:30.1028663Z mov.u32 %r50, %ctaid.y; 2026-02-21T09:05:30.1028728Z mov.u32 %r51, %ctaid.z; 2026-02-21T09:05:30.1028786Z mov.u32 %r52, %nctaid.x; 2026-02-21T09:05:30.1028841Z mov.u32 %r53, %nctaid.y; 2026-02-21T09:05:30.1028914Z mad.lo.s32 %r54, %r51, %r53, %r50; 2026-02-21T09:05:30.1028974Z mad.lo.s32 %r55, %r54, %r52, %r3; 2026-02-21T09:05:30.1029034Z mul.lo.s32 %r56, %r55, 384; 2026-02-21T09:05:30.1029099Z cvt.s64.s32 %rd59, %r56; 2026-02-21T09:05:30.1029161Z add.s64 %rd19, %rd58, %rd59; 2026-02-21T09:05:30.1029218Z shl.b32 %r57, %r1, 2; 2026-02-21T09:05:30.1029275Z add.s32 %r26, %r25, %r57; 2026-02-21T09:05:30.1029337Z mov.b32 %r35, 0; 2026-02-21T09:05:30.1029394Z // begin inline asm 2026-02-21T09:05:30.1029467Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:05:30.1029530Z // end inline asm 2026-02-21T09:05:30.1029592Z bar.warp.sync -1; 2026-02-21T09:05:30.1029654Z setp.eq.b32 %p137, %r1, 0; 2026-02-21T09:05:30.1029746Z cvt.u64.u32 %rd4, %r25; 2026-02-21T09:05:30.1029813Z // begin inline asm 2026-02-21T09:05:30.1029982Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:05:30.1030035Z // end inline asm 2026-02-21T09:05:30.1030096Z // begin inline asm 2026-02-21T09:05:30.1030238Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.1030292Z // end inline asm 2026-02-21T09:05:30.1030346Z mov.b32 %r28, 64; 2026-02-21T09:05:30.1030433Z // begin inline asm 2026-02-21T09:05:30.1030585Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r28; 2026-02-21T09:05:30.1030638Z // end inline asm 2026-02-21T09:05:30.1030698Z mov.b32 %r29, 256; 2026-02-21T09:05:30.1030753Z // 
begin inline asm 2026-02-21T09:05:30.1030897Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r29; 2026-02-21T09:05:30.1030955Z // end inline asm 2026-02-21T09:05:30.1031010Z mov.b32 %r30, 2048; 2026-02-21T09:05:30.1032178Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:30.1032309Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:30.1032366Z `ptxas` stderr: 2026-02-21T09:05:30.1032710Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 271 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.1032812Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.1032816Z 2026-02-21T09:05:30.1033214Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp24t7em5r.ptx -o /tmp/tmp24t7em5r.ptx.o 2026-02-21T09:05:30.1033218Z 2026-02-21T09:05:30.1033351Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
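Note on the ptxas failure reported above: the kernel was compiled with `maxnreg=32` (see the `Config:` entry and the `.maxnreg 32` directive in the PTX), and ptxas asks for a register target of 34 or higher. The sketch below is a hypothetical log-triage helper, not part of Helion or Triton. It only assumes the C7602 message keeps the wording seen in this log, and pulls the current and requested register counts out of it.

```python
import re

# Regex for the ptxas C7602 diagnostic, matching the wording seen in this log.
# Assumption: other ptxas versions may phrase the message differently.
C7602 = re.compile(
    r"\(C7602\) Insufficient registers \((?P<have>\d+)\) to compile instruction "
    r"at line (?P<line>\d+) in function (?P<func>\w+)\. "
    r"Try to compile with register target of (?P<need>\d+) or higher\."
)


def parse_c7602(stderr: str):
    """Return the register budget details from a C7602 message, or None."""
    m = C7602.search(stderr)
    if m is None:
        return None
    return {
        "function": m["func"],
        "ptx_line": int(m["line"]),
        "have": int(m["have"]),   # register limit the kernel was compiled with
        "need": int(m["need"]),   # minimum limit ptxas suggests
    }


# Message copied from the log entry above.
sample = (
    "ptxas fatal : (C7602) Insufficient registers (32) to compile instruction "
    "at line 271 in function _helion_matmul. "
    "Try to compile with register target of 34 or higher."
)
info = parse_c7602(sample)
print(info)
```

For this log the helper reports a limit of 32 against a suggested minimum of 34 for `_helion_matmul`, which is consistent with the autotuner skipping this `maxnreg=32` configuration. The PTX dump resumes on the next line.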
2026-02-21T09:05:30.1033416Z // begin inline asm 2026-02-21T09:05:30.1033585Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T09:05:30.1033643Z // end inline asm 2026-02-21T09:05:30.1033709Z mov.b32 %r31, 4096; 2026-02-21T09:05:30.1033764Z // begin inline asm 2026-02-21T09:05:30.1033925Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r31; 2026-02-21T09:05:30.1033987Z // end inline asm 2026-02-21T09:05:30.1034067Z mov.b64 %rd12, 4096; 2026-02-21T09:05:30.1034122Z // begin inline asm 2026-02-21T09:05:30.1034304Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:05:30.1034360Z // end inline asm 2026-02-21T09:05:30.1034415Z mov.b32 %r32, 1; 2026-02-21T09:05:30.1034471Z // begin inline asm 2026-02-21T09:05:30.1034651Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:05:30.1034781Z // end inline asm 2026-02-21T09:05:30.1034840Z // begin inline asm 2026-02-21T09:05:30.1035021Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:05:30.1035078Z // end inline asm 2026-02-21T09:05:30.1035138Z // begin inline asm 2026-02-21T09:05:30.1035303Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.1035359Z // end inline asm 2026-02-21T09:05:30.1035416Z // begin inline asm 2026-02-21T09:05:30.1035594Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.1035658Z // end inline asm 2026-02-21T09:05:30.1035715Z // begin inline asm 2026-02-21T09:05:30.1035899Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:05:30.1035965Z // end inline asm 2026-02-21T09:05:30.1036023Z // begin inline asm 2026-02-21T09:05:30.1036174Z @%p137 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 
0 ], 0x0; 2026-02-21T09:05:30.1036238Z // end inline asm 2026-02-21T09:05:30.1036294Z // begin inline asm 2026-02-21T09:05:30.1036576Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.1036666Z // end inline asm 2026-02-21T09:05:30.1036728Z // begin inline asm 2026-02-21T09:05:30.1036857Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:05:30.1036930Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.1037013Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.1037068Z // end inline asm 2026-02-21T09:05:30.1037121Z bar.sync 0; 2026-02-21T09:05:30.1037195Z cvta.global.u64 %rd97, %rd19; 2026-02-21T09:05:30.1037370Z .loc 1 22 67 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:22:67 2026-02-21T09:05:30.1037431Z add.s32 %r58, %r56, 128; 2026-02-21T09:05:30.1037491Z cvt.s64.s32 %rd60, %r58; 2026-02-21T09:05:30.1037560Z add.s64 %rd37, %rd58, %rd60; 2026-02-21T09:05:30.1037613Z bar.sync 0; 2026-02-21T09:05:30.1037669Z // begin inline asm 2026-02-21T09:05:30.1037744Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:05:30.1037826Z // end inline asm 2026-02-21T09:05:30.1037889Z bar.warp.sync -1; 2026-02-21T09:05:30.1037945Z // begin inline asm 2026-02-21T09:05:30.1038116Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:05:30.1038172Z // end inline asm 2026-02-21T09:05:30.1038228Z // begin inline asm 2026-02-21T09:05:30.1038379Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.1038433Z // end inline asm 2026-02-21T09:05:30.1038488Z // begin inline asm 2026-02-21T09:05:30.1038649Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r28; 2026-02-21T09:05:30.1038703Z // end inline asm 2026-02-21T09:05:30.1038758Z mov.b32 %r37, 16; 2026-02-21T09:05:30.1038814Z // begin inline asm 
2026-02-21T09:05:30.1038974Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r37; 2026-02-21T09:05:30.1039030Z // end inline asm 2026-02-21T09:05:30.1039084Z // begin inline asm 2026-02-21T09:05:30.1039252Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T09:05:30.1039308Z // end inline asm 2026-02-21T09:05:30.1039363Z // begin inline asm 2026-02-21T09:05:30.1039532Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r30; 2026-02-21T09:05:30.1039616Z // end inline asm 2026-02-21T09:05:30.1039673Z // begin inline asm 2026-02-21T09:05:30.1039853Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:05:30.1039913Z // end inline asm 2026-02-21T09:05:30.1039969Z // begin inline asm 2026-02-21T09:05:30.1040175Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:05:30.1040248Z // end inline asm 2026-02-21T09:05:30.1040300Z // begin inline asm 2026-02-21T09:05:30.1040461Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:05:30.1040523Z // end inline asm 2026-02-21T09:05:30.1040576Z // begin inline asm 2026-02-21T09:05:30.1040721Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.1040780Z // end inline asm 2026-02-21T09:05:30.1040832Z // begin inline asm 2026-02-21T09:05:30.1040996Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.1041049Z // end inline asm 2026-02-21T09:05:30.1041107Z // begin inline asm 2026-02-21T09:05:30.1041284Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:05:30.1041338Z // end inline asm 2026-02-21T09:05:30.1041398Z // begin inline asm 2026-02-21T09:05:30.1041538Z @%p137 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.1041591Z // end inline asm 2026-02-21T09:05:30.1041651Z // begin inline asm 2026-02-21T09:05:30.1041900Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.1041977Z // end inline asm 2026-02-21T09:05:30.1042038Z // begin inline asm 2026-02-21T09:05:30.1042157Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:05:30.1042226Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.1042298Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.1042360Z // end inline asm 2026-02-21T09:05:30.1042413Z bar.sync 0; 2026-02-21T09:05:30.1042478Z cvta.global.u64 %rd98, %rd37; 2026-02-21T09:05:30.1042651Z .loc 1 24 71 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:24:71 2026-02-21T09:05:30.1042709Z add.s32 %r59, %r56, 256; 2026-02-21T09:05:30.1042767Z cvt.s64.s32 %rd61, %r59; 2026-02-21T09:05:30.1042832Z add.s64 %rd55, %rd58, %rd61; 2026-02-21T09:05:30.1042884Z bar.sync 0; 2026-02-21T09:05:30.1042938Z // begin inline asm 2026-02-21T09:05:30.1043001Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:05:30.1043100Z // end inline asm 2026-02-21T09:05:30.1043159Z bar.warp.sync -1; 2026-02-21T09:05:30.1043213Z // begin inline asm 2026-02-21T09:05:30.1043378Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:05:30.1043431Z // end inline asm 2026-02-21T09:05:30.1043485Z // begin inline asm 2026-02-21T09:05:30.1043622Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.1043681Z // end inline asm 2026-02-21T09:05:30.1043734Z // begin inline asm 2026-02-21T09:05:30.1043880Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r37; 2026-02-21T09:05:30.1043940Z // end inline asm 2026-02-21T09:05:30.1043994Z // begin inline asm 
2026-02-21T09:05:30.1044139Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r29; 2026-02-21T09:05:30.1044196Z // end inline asm 2026-02-21T09:05:30.1044249Z // begin inline asm 2026-02-21T09:05:30.1044408Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T09:05:30.1044461Z // end inline asm 2026-02-21T09:05:30.1044520Z // begin inline asm 2026-02-21T09:05:30.1044702Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r31; 2026-02-21T09:05:30.1044782Z // end inline asm 2026-02-21T09:05:30.1044843Z // begin inline asm 2026-02-21T09:05:30.1045008Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:05:30.1045059Z // end inline asm 2026-02-21T09:05:30.1045118Z // begin inline asm 2026-02-21T09:05:30.1045279Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:05:30.1045331Z // end inline asm 2026-02-21T09:05:30.1045388Z // begin inline asm 2026-02-21T09:05:30.1045550Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:05:30.1045601Z // end inline asm 2026-02-21T09:05:30.1045652Z // begin inline asm 2026-02-21T09:05:30.1045803Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.1045856Z // end inline asm 2026-02-21T09:05:30.1045908Z // begin inline asm 2026-02-21T09:05:30.1046076Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.1046130Z // end inline asm 2026-02-21T09:05:30.1046181Z // begin inline asm 2026-02-21T09:05:30.1046334Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.1046419Z // end inline asm 2026-02-21T09:05:30.1046474Z // begin inline asm 2026-02-21T09:05:30.1046616Z @%p137 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.1046677Z // end inline asm 2026-02-21T09:05:30.1046730Z // begin inline asm 2026-02-21T09:05:30.1046980Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.1047043Z // end inline asm 2026-02-21T09:05:30.1047123Z // begin inline asm 2026-02-21T09:05:30.1047246Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:05:30.1047321Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.1047390Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.1047445Z // end inline asm 2026-02-21T09:05:30.1047504Z bar.sync 0; 2026-02-21T09:05:30.1047569Z cvta.global.u64 %rd130, %rd55; 2026-02-21T09:05:30.1047744Z .loc 1 33 74 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:33:74 2026-02-21T09:05:30.1047805Z setp.gt.u32 %p57, %r3, 2047; 2026-02-21T09:05:30.1047870Z @%p57 bra $L__BB0_8; 2026-02-21T09:05:30.1047945Z // %bb.1: // %.lr.ph 2026-02-21T09:05:30.1048109Z .loc 1 0 74 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:0:74 2026-02-21T09:05:30.1048172Z shl.b32 %r112, %r1, 5; 2026-02-21T09:05:30.1048231Z and.b32 %r113, %r112, 8032; 2026-02-21T09:05:30.1048311Z bfe.s32 %r114, %r1, 2, 1; 2026-02-21T09:05:30.1048373Z and.b32 %r115, %r114, 144; 2026-02-21T09:05:30.1048440Z or.b32 %r116, %r115, %r113; 2026-02-21T09:05:30.1048497Z add.s32 %r117, %r25, 131072; 2026-02-21T09:05:30.1048554Z xor.b32 %r118, %r116, 16; 2026-02-21T09:05:30.1048620Z shr.u32 %r119, %r1, 5; 2026-02-21T09:05:30.1048789Z .loc 1 44 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:44:27 2026-02-21T09:05:30.1048846Z shl.b32 %r120, %r3, 4; 2026-02-21T09:05:30.1048911Z and.b32 %r121, %r120, 48; 2026-02-21T09:05:30.1048966Z and.b32 %r122, %r3, 1984; 2026-02-21T09:05:30.1049021Z or.b32 %r294, %r121, %r122; 2026-02-21T09:05:30.1049183Z .loc 1 45 27 // 
cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:45:27 2026-02-21T09:05:30.1049246Z shl.b32 %r123, %r3, 6; 2026-02-21T09:05:30.1049302Z and.b32 %r295, %r123, 3840; 2026-02-21T09:05:30.1049465Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1049549Z shfl.sync.idx.b32 %r8, %r119, 0, 31, -1; 2026-02-21T09:05:30.1049606Z shl.b32 %r124, %r8, 21; 2026-02-21T09:05:30.1049665Z and.b32 %r125, %r124, 6291456; 2026-02-21T09:05:30.1049727Z add.s32 %r126, %r125, %r323; 2026-02-21T09:05:30.1049806Z shl.b32 %r127, %r8, 2; 2026-02-21T09:05:30.1049860Z and.b32 %r128, %r127, 16; 2026-02-21T09:05:30.1049916Z add.s32 %r293, %r126, %r128; 2026-02-21T09:05:30.1049984Z mov.pred %p84, -1; 2026-02-21T09:05:30.1050040Z // begin inline asm 2026-02-21T09:05:30.1050295Z @%p84 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r293 + 0], {%r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35}; 2026-02-21T09:05:30.1050355Z // end inline asm 2026-02-21T09:05:30.1050408Z // begin inline asm 2026-02-21T09:05:30.1050474Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:30.1050526Z // end inline asm 2026-02-21T09:05:30.1050584Z bar.sync 0; 2026-02-21T09:05:30.1050751Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1050808Z add.s32 %r325, %r25, 147488; 2026-02-21T09:05:30.1050870Z // begin inline asm 2026-02-21T09:05:30.1050955Z @%p137 mbarrier.init.shared::cta.b64 [%r325], 1; 2026-02-21T09:05:30.1051008Z // end inline asm 2026-02-21T09:05:30.1051067Z bar.sync 0; 2026-02-21T09:05:30.1051124Z add.s32 %r78, %r25, 147496; 2026-02-21T09:05:30.1051177Z // begin inline asm 2026-02-21T09:05:30.1051278Z @%p137 mbarrier.init.shared::cta.b64 [%r78], 1; 2026-02-21T09:05:30.1051340Z // end inline asm 2026-02-21T09:05:30.1051397Z add.s32 %r79, %r25, 147456; 2026-02-21T09:05:30.1051452Z // begin inline asm 2026-02-21T09:05:30.1051539Z @%p137 mbarrier.init.shared::cta.b64 [%r79], 
1; 2026-02-21T09:05:30.1051593Z // end inline asm 2026-02-21T09:05:30.1051644Z bar.sync 0; 2026-02-21T09:05:30.1051699Z add.s32 %r80, %r25, 147464; 2026-02-21T09:05:30.1051760Z // begin inline asm 2026-02-21T09:05:30.1051839Z @%p137 mbarrier.init.shared::cta.b64 [%r80], 1; 2026-02-21T09:05:30.1051912Z // end inline asm 2026-02-21T09:05:30.1051974Z bar.sync 0; 2026-02-21T09:05:30.1052029Z add.s32 %r81, %r25, 147472; 2026-02-21T09:05:30.1052081Z // begin inline asm 2026-02-21T09:05:30.1052158Z @%p137 mbarrier.init.shared::cta.b64 [%r81], 1; 2026-02-21T09:05:30.1052217Z // end inline asm 2026-02-21T09:05:30.1052267Z bar.sync 0; 2026-02-21T09:05:30.1052322Z add.s32 %r177, %r25, 147480; 2026-02-21T09:05:30.1052383Z // begin inline asm 2026-02-21T09:05:30.1052464Z @%p137 mbarrier.init.shared::cta.b64 [%r177], 1; 2026-02-21T09:05:30.1052515Z // end inline asm 2026-02-21T09:05:30.1052573Z bar.sync 0; 2026-02-21T09:05:30.1052627Z // begin inline asm 2026-02-21T09:05:30.1052737Z @%p137 mbarrier.arrive.expect_tx.shared.b64 _, [%r79], 34816; 2026-02-21T09:05:30.1052789Z // end inline asm 2026-02-21T09:05:30.1052976Z .loc 1 54 31 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1053031Z // begin inline asm 2026-02-21T09:05:30.1053101Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.1053158Z // end inline asm 2026-02-21T09:05:30.1053209Z bar.sync 0; 2026-02-21T09:05:30.1053272Z elect.sync %r129|%p75, -1; 2026-02-21T09:05:30.1053334Z and.pred %p66, %p1, %p75; 2026-02-21T09:05:30.1053395Z // begin inline asm 2026-02-21T09:05:30.1053638Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r25], [%rd97, {%r35, %r295}], [%r79]; 2026-02-21T09:05:30.1053692Z // end inline asm 2026-02-21T09:05:30.1053862Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1053914Z bar.sync 0; 2026-02-21T09:05:30.1053976Z elect.sync %r130|%p76, -1; 2026-02-21T09:05:30.1054043Z and.pred 
%p67, %p1, %p76; 2026-02-21T09:05:30.1054099Z add.s32 %r88, %r25, 139264; 2026-02-21T09:05:30.1054153Z // begin inline asm 2026-02-21T09:05:30.1054391Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r88], [%rd98, {%r35, %r294}], [%r79]; 2026-02-21T09:05:30.1054446Z // end inline asm 2026-02-21T09:05:30.1054613Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1054713Z bar.sync 0; 2026-02-21T09:05:30.1054777Z // begin inline asm 2026-02-21T09:05:30.1054886Z @%p137 mbarrier.arrive.expect_tx.shared.b64 _, [%r80], 34816; 2026-02-21T09:05:30.1054938Z // end inline asm 2026-02-21T09:05:30.1055113Z .loc 1 54 31 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1055165Z bar.sync 0; 2026-02-21T09:05:30.1055225Z elect.sync %r131|%p77, -1; 2026-02-21T09:05:30.1055284Z and.pred %p69, %p1, %p77; 2026-02-21T09:05:30.1055346Z add.s32 %r93, %r25, 32768; 2026-02-21T09:05:30.1055399Z // begin inline asm 2026-02-21T09:05:30.1055635Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r93], [%rd97, {%r28, %r295}], [%r80]; 2026-02-21T09:05:30.1055698Z // end inline asm 2026-02-21T09:05:30.1055865Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1055916Z bar.sync 0; 2026-02-21T09:05:30.1055985Z elect.sync %r132|%p78, -1; 2026-02-21T09:05:30.1056044Z and.pred %p70, %p1, %p78; 2026-02-21T09:05:30.1056101Z add.s32 %r97, %r25, 141312; 2026-02-21T09:05:30.1056155Z // begin inline asm 2026-02-21T09:05:30.1056422Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r97], [%rd98, {%r28, %r294}], [%r80]; 2026-02-21T09:05:30.1056478Z // end inline asm 2026-02-21T09:05:30.1056641Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1056704Z bar.sync 0; 2026-02-21T09:05:30.1056760Z // begin inline asm 
2026-02-21T09:05:30.1056869Z @%p137 mbarrier.arrive.expect_tx.shared.b64 _, [%r81], 34816; 2026-02-21T09:05:30.1056947Z // end inline asm 2026-02-21T09:05:30.1057135Z .loc 1 54 31 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1057187Z bar.sync 0; 2026-02-21T09:05:30.1057255Z elect.sync %r133|%p79, -1; 2026-02-21T09:05:30.1057315Z and.pred %p72, %p1, %p79; 2026-02-21T09:05:30.1057372Z add.s32 %r102, %r25, 65536; 2026-02-21T09:05:30.1057425Z mov.b32 %r103, 128; 2026-02-21T09:05:30.1057485Z // begin inline asm 2026-02-21T09:05:30.1057719Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r102], [%rd97, {%r103, %r295}], [%r81]; 2026-02-21T09:05:30.1057772Z // end inline asm 2026-02-21T09:05:30.1057941Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1057993Z bar.sync 0; 2026-02-21T09:05:30.1058052Z elect.sync %r134|%p80, -1; 2026-02-21T09:05:30.1058119Z and.pred %p73, %p1, %p80; 2026-02-21T09:05:30.1058202Z add.s32 %r106, %r25, 143360; 2026-02-21T09:05:30.1058260Z // begin inline asm 2026-02-21T09:05:30.1058491Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r106], [%rd98, {%r103, %r294}], [%r81]; 2026-02-21T09:05:30.1058552Z // end inline asm 2026-02-21T09:05:30.1058718Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1058770Z bar.sync 0; 2026-02-21T09:05:30.1058832Z // begin inline asm 2026-02-21T09:05:30.1058881Z 2026-02-21T09:05:30.1058930Z { 2026-02-21T09:05:30.1058989Z .reg .pred complete; 2026-02-21T09:05:30.1059048Z waitLoop: 2026-02-21T09:05:30.1059163Z mbarrier.try_wait.parity.shared.b64 complete, [%r79], %r35; 2026-02-21T09:05:30.1059225Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.1059282Z } 2026-02-21T09:05:30.1059286Z 2026-02-21T09:05:30.1059338Z // end inline asm 2026-02-21T09:05:30.1059501Z .loc 1 56 52 // 
cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1059568Z setp.ne.b32 %p81, %r8, 0; 2026-02-21T09:05:30.1059623Z @%p81 bra $L__BB0_3; 2026-02-21T09:05:30.1059674Z // %bb.2: 2026-02-21T09:05:30.1059837Z .loc 1 0 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:0:52 2026-02-21T09:05:30.1059927Z add.s32 %r152, %r25, 16480; 2026-02-21T09:05:30.1059988Z bfe.u32 %r153, %r152, 4, 14; 2026-02-21T09:05:30.1060046Z cvt.u64.u32 %rd85, %r153; 2026-02-21T09:05:30.1060123Z or.b64 %rd82, %rd85, 4611686293439512576; 2026-02-21T09:05:30.1060180Z add.s32 %r154, %r25, 16448; 2026-02-21T09:05:30.1060237Z bfe.u32 %r155, %r154, 4, 14; 2026-02-21T09:05:30.1060295Z cvt.u64.u32 %rd86, %r155; 2026-02-21T09:05:30.1060369Z or.b64 %rd80, %rd86, 4611686293439512576; 2026-02-21T09:05:30.1060423Z add.s32 %r156, %r25, 16416; 2026-02-21T09:05:30.1060479Z bfe.u32 %r157, %r156, 4, 14; 2026-02-21T09:05:30.1060542Z cvt.u64.u32 %rd87, %r157; 2026-02-21T09:05:30.1060608Z or.b64 %rd78, %rd87, 4611686293439512576; 2026-02-21T09:05:30.1060664Z add.s32 %r158, %r25, 16384; 2026-02-21T09:05:30.1060727Z bfe.u32 %r159, %r158, 4, 14; 2026-02-21T09:05:30.1060783Z cvt.u64.u32 %rd88, %r159; 2026-02-21T09:05:30.1060845Z or.b64 %rd76, %rd88, 4611686293439512576; 2026-02-21T09:05:30.1060901Z add.s32 %r161, %r25, 139360; 2026-02-21T09:05:30.1060963Z bfe.u32 %r162, %r161, 4, 14; 2026-02-21T09:05:30.1061018Z cvt.u64.u32 %rd89, %r162; 2026-02-21T09:05:30.1061081Z or.b64 %rd75, %rd89, 4611686293313683456; 2026-02-21T09:05:30.1061166Z add.s32 %r163, %r25, 96; 2026-02-21T09:05:30.1061222Z bfe.u32 %r164, %r163, 4, 14; 2026-02-21T09:05:30.1061277Z cvt.u64.u32 %rd90, %r164; 2026-02-21T09:05:30.1061339Z or.b64 %rd74, %rd90, 4611686293439512576; 2026-02-21T09:05:30.1061401Z add.s32 %r165, %r25, 139328; 2026-02-21T09:05:30.1061455Z bfe.u32 %r166, %r165, 4, 14; 2026-02-21T09:05:30.1061510Z cvt.u64.u32 %rd91, %r166; 2026-02-21T09:05:30.1061578Z or.b64 %rd73, %rd91, 4611686293313683456; 
2026-02-21T09:05:30.1061635Z add.s32 %r167, %r25, 64; 2026-02-21T09:05:30.1061720Z bfe.u32 %r168, %r167, 4, 14; 2026-02-21T09:05:30.1061775Z cvt.u64.u32 %rd92, %r168; 2026-02-21T09:05:30.1061843Z or.b64 %rd72, %rd92, 4611686293439512576; 2026-02-21T09:05:30.1061898Z add.s32 %r169, %r25, 139296; 2026-02-21T09:05:30.1061953Z bfe.u32 %r170, %r169, 4, 14; 2026-02-21T09:05:30.1062015Z cvt.u64.u32 %rd93, %r170; 2026-02-21T09:05:30.1062075Z or.b64 %rd71, %rd93, 4611686293313683456; 2026-02-21T09:05:30.1062130Z add.s32 %r171, %r25, 32; 2026-02-21T09:05:30.1062192Z bfe.u32 %r172, %r171, 4, 14; 2026-02-21T09:05:30.1062245Z cvt.u64.u32 %rd94, %r172; 2026-02-21T09:05:30.1062307Z or.b64 %rd70, %rd94, 4611686293439512576; 2026-02-21T09:05:30.1062362Z bfe.u32 %r173, %r88, 4, 14; 2026-02-21T09:05:30.1062423Z cvt.u64.u32 %rd95, %r173; 2026-02-21T09:05:30.1062485Z or.b64 %rd69, %rd95, 4611686293313683456; 2026-02-21T09:05:30.1062541Z bfe.u32 %r174, %r25, 4, 14; 2026-02-21T09:05:30.1062624Z cvt.u64.u32 %rd96, %r174; 2026-02-21T09:05:30.1062688Z or.b64 %rd68, %rd96, 4611686293439512576; 2026-02-21T09:05:30.1062853Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1062913Z elect.sync %r175|%p83, -1; 2026-02-21T09:05:30.1062976Z mov.b32 %r136, 134479888; 2026-02-21T09:05:30.1063033Z mov.pred %p82, 0; 2026-02-21T09:05:30.1063088Z // begin inline asm 2026-02-21T09:05:30.1063238Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd68, %rd69, %r136, %p82; 2026-02-21T09:05:30.1063292Z // end inline asm 2026-02-21T09:05:30.1063347Z // begin inline asm 2026-02-21T09:05:30.1063486Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd70, %rd71, %r136, %p84; 2026-02-21T09:05:30.1063539Z // end inline asm 2026-02-21T09:05:30.1063592Z // begin inline asm 2026-02-21T09:05:30.1063719Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd72, %rd73, %r136, %p84; 2026-02-21T09:05:30.1063780Z // end inline asm 
2026-02-21T09:05:30.1063835Z // begin inline asm 2026-02-21T09:05:30.1063964Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd74, %rd75, %r136, %p84; 2026-02-21T09:05:30.1064029Z // end inline asm 2026-02-21T09:05:30.1064083Z // begin inline asm 2026-02-21T09:05:30.1064216Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd76, %rd69, %r136, %p82; 2026-02-21T09:05:30.1064298Z // end inline asm 2026-02-21T09:05:30.1064352Z // begin inline asm 2026-02-21T09:05:30.1064484Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd78, %rd71, %r136, %p84; 2026-02-21T09:05:30.1064545Z // end inline asm 2026-02-21T09:05:30.1064598Z // begin inline asm 2026-02-21T09:05:30.1064751Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd80, %rd73, %r136, %p84; 2026-02-21T09:05:30.1064806Z // end inline asm 2026-02-21T09:05:30.1064869Z // begin inline asm 2026-02-21T09:05:30.1064992Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd82, %rd75, %r136, %p84; 2026-02-21T09:05:30.1065047Z // end inline asm 2026-02-21T09:05:30.1065113Z add.s32 %r176, %r25, 147488; 2026-02-21T09:05:30.1065170Z cvt.u64.u32 %rd84, %r176; 2026-02-21T09:05:30.1065224Z // begin inline asm 2026-02-21T09:05:30.1065349Z @%p83 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd84]; 2026-02-21T09:05:30.1065402Z // end inline asm 2026-02-21T09:05:30.1065454Z $L__BB0_3: 2026-02-21T09:05:30.1065616Z .loc 1 0 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:0:52 2026-02-21T09:05:30.1065715Z add.s32 %r4, %r117, %r116; 2026-02-21T09:05:30.1065773Z add.s32 %r5, %r117, %r118; 2026-02-21T09:05:30.1065940Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1066001Z bar.sync 0; 2026-02-21T09:05:30.1066055Z // begin inline asm 2026-02-21T09:05:30.1066167Z @%p137 mbarrier.arrive.expect_tx.shared.b64 _, [%r177], 34816; 2026-02-21T09:05:30.1066221Z // end inline asm 2026-02-21T09:05:30.1066396Z .loc 1 54 31 // 
cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1066474Z bar.sync 0; 2026-02-21T09:05:30.1066539Z elect.sync %r191|%p103, -1; 2026-02-21T09:05:30.1066608Z and.pred %p100, %p1, %p103; 2026-02-21T09:05:30.1066666Z add.s32 %r178, %r25, 98304; 2026-02-21T09:05:30.1066720Z mov.b32 %r179, 192; 2026-02-21T09:05:30.1066780Z // begin inline asm 2026-02-21T09:05:30.1067018Z @%p100 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd97, {%r179, %r295}], [%r177]; 2026-02-21T09:05:30.1067072Z // end inline asm 2026-02-21T09:05:30.1067239Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1067291Z bar.sync 0; 2026-02-21T09:05:30.1067351Z elect.sync %r192|%p104, -1; 2026-02-21T09:05:30.1067410Z and.pred %p101, %p1, %p104; 2026-02-21T09:05:30.1067473Z add.s32 %r182, %r25, 145408; 2026-02-21T09:05:30.1067552Z // begin inline asm 2026-02-21T09:05:30.1067796Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r182], [%rd98, {%r179, %r294}], [%r177]; 2026-02-21T09:05:30.1067861Z // end inline asm 2026-02-21T09:05:30.1067913Z mov.b32 %r329, 1; 2026-02-21T09:05:30.1067967Z mov.b32 %r328, 3; 2026-02-21T09:05:30.1068020Z mov.b32 %r324, 0; 2026-02-21T09:05:30.1068081Z mov.b32 %r326, %r324; 2026-02-21T09:05:30.1068137Z mov.b32 %r327, %r324; 2026-02-21T09:05:30.1068190Z mov.b32 %r330, %r324; 2026-02-21T09:05:30.1068252Z mov.b32 %r331, %r324; 2026-02-21T09:05:30.1068306Z bra.uni $L__BB0_4; 2026-02-21T09:05:30.1068409Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.1068582Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1068647Z setp.lt.u32 %p127, %r331, 1792; 2026-02-21T09:05:30.1068814Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1068870Z // begin inline asm 2026-02-21T09:05:30.1068925Z 2026-02-21T09:05:30.1068972Z { 
2026-02-21T09:05:30.1069032Z .reg .pred complete; 2026-02-21T09:05:30.1069089Z waitLoop: 2026-02-21T09:05:30.1069206Z mbarrier.try_wait.parity.shared.b64 complete, [%r325], %r324; 2026-02-21T09:05:30.1069294Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.1069343Z } 2026-02-21T09:05:30.1069346Z 2026-02-21T09:05:30.1069406Z // end inline asm 2026-02-21T09:05:30.1069461Z add.s32 %r258, %r329, 1; 2026-02-21T09:05:30.1069522Z setp.gt.s32 %p130, %r258, 1; 2026-02-21T09:05:30.1069591Z selp.b32 %r329, 0, %r258, %p130; 2026-02-21T09:05:30.1069648Z selp.b32 %r259, 1, 0, %p130; 2026-02-21T09:05:30.1069703Z xor.b32 %r22, %r330, %r259; 2026-02-21T09:05:30.1069879Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1069935Z add.s32 %r260, %r328, 1; 2026-02-21T09:05:30.1069995Z setp.gt.s32 %p131, %r260, 3; 2026-02-21T09:05:30.1070058Z selp.b32 %r328, 0, %r260, %p131; 2026-02-21T09:05:30.1070120Z shl.b32 %r261, %r328, 3; 2026-02-21T09:05:30.1070177Z add.s32 %r263, %r25, %r261; 2026-02-21T09:05:30.1070235Z add.s32 %r253, %r263, 147456; 2026-02-21T09:05:30.1070292Z bar.sync 0; 2026-02-21T09:05:30.1070356Z and.pred %p124, %p137, %p127; 2026-02-21T09:05:30.1070412Z // begin inline asm 2026-02-21T09:05:30.1070523Z @%p124 mbarrier.arrive.expect_tx.shared.b64 _, [%r253], 34816; 2026-02-21T09:05:30.1070586Z // end inline asm 2026-02-21T09:05:30.1070774Z .loc 1 54 31 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1070833Z shl.b32 %r264, %r328, 15; 2026-02-21T09:05:30.1070896Z add.s32 %r250, %r25, %r264; 2026-02-21T09:05:30.1070947Z bar.sync 0; 2026-02-21T09:05:30.1071007Z elect.sync %r265|%p132, -1; 2026-02-21T09:05:30.1071076Z and.pred %p133, %p127, %p132; 2026-02-21T09:05:30.1071134Z and.pred %p125, %p1, %p133; 2026-02-21T09:05:30.1071192Z add.s32 %r251, %r331, 256; 2026-02-21T09:05:30.1071268Z // begin inline asm 2026-02-21T09:05:30.1071515Z @%p125 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r250], [%rd97, {%r251, %r295}], [%r253]; 2026-02-21T09:05:30.1071570Z // end inline asm 2026-02-21T09:05:30.1071737Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1071803Z shl.b32 %r266, %r328, 11; 2026-02-21T09:05:30.1071860Z add.s32 %r267, %r25, %r266; 2026-02-21T09:05:30.1071918Z add.s32 %r254, %r267, 139264; 2026-02-21T09:05:30.1071979Z bar.sync 0; 2026-02-21T09:05:30.1072040Z elect.sync %r268|%p134, -1; 2026-02-21T09:05:30.1072101Z and.pred %p135, %p127, %p134; 2026-02-21T09:05:30.1072161Z and.pred %p126, %p1, %p135; 2026-02-21T09:05:30.1072229Z // begin inline asm 2026-02-21T09:05:30.1072478Z @%p126 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r254], [%rd98, {%r251, %r294}], [%r253]; 2026-02-21T09:05:30.1072536Z // end inline asm 2026-02-21T09:05:30.1072703Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1072766Z setp.lt.u32 %p136, %r331, 1920; 2026-02-21T09:05:30.1072822Z add.s32 %r331, %r331, 64; 2026-02-21T09:05:30.1072883Z mov.b32 %r324, %r330; 2026-02-21T09:05:30.1072937Z mov.b32 %r325, %r269; 2026-02-21T09:05:30.1072991Z mov.b32 %r330, %r22; 2026-02-21T09:05:30.1073046Z @%p136 bra $L__BB0_4; 2026-02-21T09:05:30.1073109Z bra.uni $L__BB0_7; 2026-02-21T09:05:30.1073209Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:30.1073370Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1073433Z add.s32 %r195, %r327, 1; 2026-02-21T09:05:30.1073490Z setp.gt.s32 %p106, %r195, 3; 2026-02-21T09:05:30.1073550Z selp.b32 %r327, 0, %r195, %p106; 2026-02-21T09:05:30.1073609Z selp.b32 %r196, 1, 0, %p106; 2026-02-21T09:05:30.1073673Z xor.b32 %r326, %r326, %r196; 2026-02-21T09:05:30.1073728Z shl.b32 %r197, %r327, 3; 2026-02-21T09:05:30.1073783Z add.s32 %r199, %r25, %r197; 2026-02-21T09:05:30.1073847Z 
add.s32 %r193, %r199, 147456; 2026-02-21T09:05:30.1073922Z bar.sync 0; 2026-02-21T09:05:30.1073976Z // begin inline asm 2026-02-21T09:05:30.1074024Z 2026-02-21T09:05:30.1074081Z { 2026-02-21T09:05:30.1074139Z .reg .pred complete; 2026-02-21T09:05:30.1074190Z waitLoop: 2026-02-21T09:05:30.1074315Z mbarrier.try_wait.parity.shared.b64 complete, [%r193], %r326; 2026-02-21T09:05:30.1074377Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.1074427Z } 2026-02-21T09:05:30.1074430Z 2026-02-21T09:05:30.1074488Z // end inline asm 2026-02-21T09:05:30.1074543Z shl.b32 %r200, %r329, 3; 2026-02-21T09:05:30.1074597Z add.s32 %r201, %r25, %r200; 2026-02-21T09:05:30.1074653Z add.s32 %r269, %r201, 147488; 2026-02-21T09:05:30.1074853Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1074912Z @%p81 bra $L__BB0_6; 2026-02-21T09:05:30.1075008Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.1075172Z .loc 1 54 31 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:54:31 2026-02-21T09:05:30.1075229Z shl.b32 %r218, %r327, 15; 2026-02-21T09:05:30.1075285Z add.s32 %r220, %r25, %r218; 2026-02-21T09:05:30.1075481Z .loc 1 55 44 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:55:44 2026-02-21T09:05:30.1075538Z shl.b32 %r221, %r327, 11; 2026-02-21T09:05:30.1075595Z add.s32 %r222, %r25, %r221; 2026-02-21T09:05:30.1075652Z add.s32 %r223, %r222, 139264; 2026-02-21T09:05:30.1075825Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1075885Z elect.sync %r224|%p108, -1; 2026-02-21T09:05:30.1075941Z bfe.u32 %r225, %r220, 4, 14; 2026-02-21T09:05:30.1076008Z cvt.u64.u32 %rd116, %r225; 2026-02-21T09:05:30.1076106Z or.b64 %rd99, %rd116, 4611686293439512576; 2026-02-21T09:05:30.1076163Z bfe.u32 %r226, %r223, 4, 14; 2026-02-21T09:05:30.1076220Z cvt.u64.u32 %rd117, %r226; 2026-02-21T09:05:30.1076300Z or.b64 %rd100, %rd117, 4611686293313683456; 2026-02-21T09:05:30.1076357Z mov.b32 
%r203, 134479888; 2026-02-21T09:05:30.1076414Z mov.pred %p107, -1; 2026-02-21T09:05:30.1076476Z // begin inline asm 2026-02-21T09:05:30.1076619Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd99, %rd100, %r203, %p107; 2026-02-21T09:05:30.1076673Z // end inline asm 2026-02-21T09:05:30.1076734Z add.s32 %r227, %r220, 32; 2026-02-21T09:05:30.1076789Z bfe.u32 %r228, %r227, 4, 14; 2026-02-21T09:05:30.1076846Z cvt.u64.u32 %rd118, %r228; 2026-02-21T09:05:30.1076914Z or.b64 %rd101, %rd118, 4611686293439512576; 2026-02-21T09:05:30.1076978Z add.s32 %r229, %r222, 139296; 2026-02-21T09:05:30.1077032Z bfe.u32 %r230, %r229, 4, 14; 2026-02-21T09:05:30.1077115Z cvt.u64.u32 %rd119, %r230; 2026-02-21T09:05:30.1077195Z or.b64 %rd102, %rd119, 4611686293313683456; 2026-02-21T09:05:30.1077253Z // begin inline asm 2026-02-21T09:05:30.1077401Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd101, %rd102, %r203, %p107; 2026-02-21T09:05:30.1077464Z // end inline asm 2026-02-21T09:05:30.1077521Z add.s32 %r231, %r220, 64; 2026-02-21T09:05:30.1077577Z bfe.u32 %r232, %r231, 4, 14; 2026-02-21T09:05:30.1077635Z cvt.u64.u32 %rd120, %r232; 2026-02-21T09:05:30.1077713Z or.b64 %rd103, %rd120, 4611686293439512576; 2026-02-21T09:05:30.1077774Z add.s32 %r233, %r222, 139328; 2026-02-21T09:05:30.1077832Z bfe.u32 %r234, %r233, 4, 14; 2026-02-21T09:05:30.1077897Z cvt.u64.u32 %rd121, %r234; 2026-02-21T09:05:30.1077965Z or.b64 %rd104, %rd121, 4611686293313683456; 2026-02-21T09:05:30.1078023Z // begin inline asm 2026-02-21T09:05:30.1078165Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd103, %rd104, %r203, %p107; 2026-02-21T09:05:30.1078229Z // end inline asm 2026-02-21T09:05:30.1078288Z add.s32 %r235, %r220, 96; 2026-02-21T09:05:30.1078347Z bfe.u32 %r236, %r235, 4, 14; 2026-02-21T09:05:30.1078413Z cvt.u64.u32 %rd122, %r236; 2026-02-21T09:05:30.1078479Z or.b64 %rd105, %rd122, 4611686293439512576; 2026-02-21T09:05:30.1078566Z add.s32 %r237, %r222, 139360; 
2026-02-21T09:05:30.1078633Z bfe.u32 %r238, %r237, 4, 14; 2026-02-21T09:05:30.1078690Z cvt.u64.u32 %rd123, %r238; 2026-02-21T09:05:30.1078759Z or.b64 %rd106, %rd123, 4611686293313683456; 2026-02-21T09:05:30.1078817Z // begin inline asm 2026-02-21T09:05:30.1078965Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 0 ], %rd105, %rd106, %r203, %p107; 2026-02-21T09:05:30.1079020Z // end inline asm 2026-02-21T09:05:30.1079078Z add.s32 %r239, %r220, 16384; 2026-02-21T09:05:30.1079142Z bfe.u32 %r240, %r239, 4, 14; 2026-02-21T09:05:30.1079202Z cvt.u64.u32 %rd124, %r240; 2026-02-21T09:05:30.1079269Z or.b64 %rd107, %rd124, 4611686293439512576; 2026-02-21T09:05:30.1079329Z // begin inline asm 2026-02-21T09:05:30.1079485Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd107, %rd100, %r203, %p107; 2026-02-21T09:05:30.1079542Z // end inline asm 2026-02-21T09:05:30.1079601Z add.s32 %r241, %r220, 16416; 2026-02-21T09:05:30.1079672Z bfe.u32 %r242, %r241, 4, 14; 2026-02-21T09:05:30.1079732Z cvt.u64.u32 %rd125, %r242; 2026-02-21T09:05:30.1079801Z or.b64 %rd109, %rd125, 4611686293439512576; 2026-02-21T09:05:30.1079865Z // begin inline asm 2026-02-21T09:05:30.1080039Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd109, %rd102, %r203, %p107; 2026-02-21T09:05:30.1080096Z // end inline asm 2026-02-21T09:05:30.1080153Z add.s32 %r243, %r220, 16448; 2026-02-21T09:05:30.1080217Z bfe.u32 %r244, %r243, 4, 14; 2026-02-21T09:05:30.1080275Z cvt.u64.u32 %rd126, %r244; 2026-02-21T09:05:30.1080342Z or.b64 %rd111, %rd126, 4611686293439512576; 2026-02-21T09:05:30.1080406Z // begin inline asm 2026-02-21T09:05:30.1080547Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd111, %rd104, %r203, %p107; 2026-02-21T09:05:30.1080627Z // end inline asm 2026-02-21T09:05:30.1080692Z add.s32 %r245, %r220, 16480; 2026-02-21T09:05:30.1080750Z bfe.u32 %r246, %r245, 4, 14; 2026-02-21T09:05:30.1080806Z cvt.u64.u32 %rd127, %r246; 2026-02-21T09:05:30.1080876Z or.b64 %rd113, %rd127, 
4611686293439512576; 2026-02-21T09:05:30.1080940Z // begin inline asm 2026-02-21T09:05:30.1081080Z @%p108 tcgen05.mma.cta_group::1.kind::f16 [ %r323 + 16 ], %rd113, %rd106, %r203, %p107; 2026-02-21T09:05:30.1081135Z // end inline asm 2026-02-21T09:05:30.1081201Z cvt.u64.u32 %rd115, %r269; 2026-02-21T09:05:30.1081257Z // begin inline asm 2026-02-21T09:05:30.1081387Z @%p108 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd115]; 2026-02-21T09:05:30.1081449Z // end inline asm 2026-02-21T09:05:30.1081504Z bra.uni $L__BB0_6; 2026-02-21T09:05:30.1081602Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:05:30.1081802Z .loc 1 0 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:0:52 2026-02-21T09:05:30.1081869Z mov.b32 %r270, 1; 2026-02-21T09:05:30.1082040Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1082100Z // begin inline asm 2026-02-21T09:05:30.1082157Z 2026-02-21T09:05:30.1082208Z { 2026-02-21T09:05:30.1082270Z .reg .pred complete; 2026-02-21T09:05:30.1082323Z waitLoop: 2026-02-21T09:05:30.1082454Z mbarrier.try_wait.parity.shared.b64 complete, [%r269], %r270; 2026-02-21T09:05:30.1082519Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.1082569Z } 2026-02-21T09:05:30.1082573Z 2026-02-21T09:05:30.1082636Z // end inline asm 2026-02-21T09:05:30.1082805Z .loc 1 50 42 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:50:42 2026-02-21T09:05:30.1082860Z bar.sync 0; 2026-02-21T09:05:30.1082923Z // begin inline asm 2026-02-21T09:05:30.1083011Z @%p137 mbarrier.inval.shared::cta.b64 [%r79]; 2026-02-21T09:05:30.1083067Z // end inline asm 2026-02-21T09:05:30.1083120Z bar.sync 0; 2026-02-21T09:05:30.1083183Z // begin inline asm 2026-02-21T09:05:30.1083267Z @%p137 mbarrier.inval.shared::cta.b64 [%r80]; 2026-02-21T09:05:30.1083321Z // end inline asm 2026-02-21T09:05:30.1083404Z bar.sync 0; 2026-02-21T09:05:30.1083462Z // begin inline asm 2026-02-21T09:05:30.1083541Z @%p137 mbarrier.inval.shared::cta.b64 
[%r81]; 2026-02-21T09:05:30.1083594Z // end inline asm 2026-02-21T09:05:30.1083655Z bar.sync 0; 2026-02-21T09:05:30.1083713Z // begin inline asm 2026-02-21T09:05:30.1083797Z @%p137 mbarrier.inval.shared::cta.b64 [%r177]; 2026-02-21T09:05:30.1083861Z // end inline asm 2026-02-21T09:05:30.1083921Z add.s32 %r275, %r25, 147488; 2026-02-21T09:05:30.1083976Z // begin inline asm 2026-02-21T09:05:30.1084065Z @%p137 mbarrier.inval.shared::cta.b64 [%r275]; 2026-02-21T09:05:30.1084120Z // end inline asm 2026-02-21T09:05:30.1084174Z bar.sync 0; 2026-02-21T09:05:30.1084241Z // begin inline asm 2026-02-21T09:05:30.1084323Z @%p137 mbarrier.inval.shared::cta.b64 [%r78]; 2026-02-21T09:05:30.1084374Z // end inline asm 2026-02-21T09:05:30.1084538Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1084600Z // begin inline asm 2026-02-21T09:05:30.1084899Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292}, [%r293 + 0]; 2026-02-21T09:05:30.1084994Z // end inline asm 2026-02-21T09:05:30.1085055Z // begin inline asm 2026-02-21T09:05:30.1085124Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:30.1085176Z // end inline asm 2026-02-21T09:05:30.1085233Z cvt.u64.u32 %rd131, %r277; 2026-02-21T09:05:30.1085297Z cvt.u64.u32 %rd132, %r278; 2026-02-21T09:05:30.1085354Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:05:30.1085416Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:05:30.1085594Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1085686Z mov.b64 {%r298, %r299}, %rd134; 2026-02-21T09:05:30.1085753Z cvt.rn.f16x2.f32 %r300, %r299, %r298; 2026-02-21T09:05:30.1085916Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1085984Z cvt.u64.u32 %rd135, %r279; 2026-02-21T09:05:30.1086040Z cvt.u64.u32 %rd136, %r280; 2026-02-21T09:05:30.1086097Z shl.b64 %rd137, %rd136, 
32; 2026-02-21T09:05:30.1086167Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:05:30.1086332Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1086390Z mov.b64 {%r301, %r302}, %rd138; 2026-02-21T09:05:30.1086462Z cvt.rn.f16x2.f32 %r303, %r302, %r301; 2026-02-21T09:05:30.1086630Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1086718Z cvt.u64.u32 %rd139, %r281; 2026-02-21T09:05:30.1086778Z cvt.u64.u32 %rd140, %r282; 2026-02-21T09:05:30.1086845Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:05:30.1086903Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:05:30.1087066Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1087134Z mov.b64 {%r304, %r305}, %rd142; 2026-02-21T09:05:30.1087196Z cvt.rn.f16x2.f32 %r306, %r305, %r304; 2026-02-21T09:05:30.1087359Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1087423Z cvt.u64.u32 %rd143, %r283; 2026-02-21T09:05:30.1087480Z cvt.u64.u32 %rd144, %r284; 2026-02-21T09:05:30.1087538Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:05:30.1087597Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:05:30.1087766Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1087832Z mov.b64 {%r307, %r308}, %rd146; 2026-02-21T09:05:30.1087896Z cvt.rn.f16x2.f32 %r309, %r308, %r307; 2026-02-21T09:05:30.1088065Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1088122Z cvt.u64.u32 %rd147, %r285; 2026-02-21T09:05:30.1088179Z cvt.u64.u32 %rd148, %r286; 2026-02-21T09:05:30.1088274Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:05:30.1088331Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:05:30.1088492Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1088549Z mov.b64 {%r310, %r311}, %rd150; 2026-02-21T09:05:30.1088618Z cvt.rn.f16x2.f32 %r312, 
%r311, %r310; 2026-02-21T09:05:30.1088780Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1088835Z cvt.u64.u32 %rd151, %r287; 2026-02-21T09:05:30.1088897Z cvt.u64.u32 %rd152, %r288; 2026-02-21T09:05:30.1088953Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:05:30.1089010Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:05:30.1089177Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1089233Z mov.b64 {%r313, %r314}, %rd154; 2026-02-21T09:05:30.1089294Z cvt.rn.f16x2.f32 %r315, %r314, %r313; 2026-02-21T09:05:30.1089453Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1089516Z cvt.u64.u32 %rd155, %r289; 2026-02-21T09:05:30.1089570Z cvt.u64.u32 %rd156, %r290; 2026-02-21T09:05:30.1089645Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:05:30.1089711Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:05:30.1089877Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1089933Z mov.b64 {%r316, %r317}, %rd158; 2026-02-21T09:05:30.1090001Z cvt.rn.f16x2.f32 %r318, %r317, %r316; 2026-02-21T09:05:30.1090169Z .loc 1 56 52 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:56:52 2026-02-21T09:05:30.1090249Z cvt.u64.u32 %rd159, %r291; 2026-02-21T09:05:30.1090305Z cvt.u64.u32 %rd160, %r292; 2026-02-21T09:05:30.1090368Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:05:30.1090424Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:05:30.1090588Z .loc 1 58 27 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:58:27 2026-02-21T09:05:30.1090653Z mov.b64 {%r319, %r320}, %rd162; 2026-02-21T09:05:30.1090714Z cvt.rn.f16x2.f32 %r321, %r320, %r319; 2026-02-21T09:05:30.1090878Z .loc 1 59 45 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:59:45 2026-02-21T09:05:30.1090954Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.1091006Z bar.sync 0; 2026-02-21T09:05:30.1091099Z st.shared.v4.b32 [%r4], {%r300, 
%r303, %r306, %r309}; 2026-02-21T09:05:30.1091188Z st.shared.v4.b32 [%r5], {%r312, %r315, %r318, %r321}; 2026-02-21T09:05:30.1091252Z // begin inline asm 2026-02-21T09:05:30.1091342Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.1091399Z // end inline asm 2026-02-21T09:05:30.1091459Z bar.sync 0; 2026-02-21T09:05:30.1091520Z elect.sync %r322|%p145, -1; 2026-02-21T09:05:30.1091580Z and.pred %p143, %p1, %p145; 2026-02-21T09:05:30.1091640Z // begin inline asm 2026-02-21T09:05:30.1091822Z @%p143 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd130, {%r294, %r295}], [%r117]; 2026-02-21T09:05:30.1091876Z // end inline asm 2026-02-21T09:05:30.1091939Z cp.async.bulk.commit_group; 2026-02-21T09:05:30.1092025Z $L__BB0_8: // %._crit_edge 2026-02-21T09:05:30.1092184Z .loc 1 33 74 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:33:74 2026-02-21T09:05:30.1092252Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.1092310Z bar.sync 0; 2026-02-21T09:05:30.1092471Z .loc 1 33 4 // cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py:33:4 2026-02-21T09:05:30.1092521Z bar.sync 0; 2026-02-21T09:05:30.1092583Z // begin inline asm 2026-02-21T09:05:30.1092694Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r323, 32; 2026-02-21T09:05:30.1092747Z // end inline asm 2026-02-21T09:05:30.1092796Z ret; 2026-02-21T09:05:30.1092855Z $L__tmp1: 2026-02-21T09:05:30.1092931Z $L__func_end0: 2026-02-21T09:05:30.1093013Z // -- End function 2026-02-21T09:05:30.1093068Z } 2026-02-21T09:05:30.1093267Z .file 1 "/tmp/torchinductor_root/ps/cpsxydaqm2xakdiiy375ewh2dxfl2w5hhaspi5r2whmns5udtpdz.py" 2026-02-21T09:05:30.1093328Z .section .debug_abbrev 2026-02-21T09:05:30.1093377Z { 2026-02-21T09:05:30.1093470Z .b8 1 // Abbreviation Code 2026-02-21T09:05:30.1093553Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:30.1093630Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.1093714Z .b8 37 // DW_AT_producer 2026-02-21T09:05:30.1093787Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.1093860Z 
.b8 19 // DW_AT_language 2026-02-21T09:05:30.1093941Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:30.1094013Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.1094085Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.1094160Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:30.1094263Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:30.1094335Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:30.1094405Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.1094481Z .b8 0 // EOM(1) 2026-02-21T09:05:30.1094548Z .b8 0 // EOM(2) 2026-02-21T09:05:30.1094612Z .b8 0 // EOM(3) 2026-02-21T09:05:30.1094703Z } 2026-02-21T09:05:30.1094765Z .section .debug_info 2026-02-21T09:05:30.1094840Z { 2026-02-21T09:05:30.1094920Z .b32 104 // Length of Unit 2026-02-21T09:05:30.1095013Z .b8 2 // DWARF version number 2026-02-21T09:05:30.1095062Z .b8 0 2026-02-21T09:05:30.1095179Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:30.1095273Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:30.1095371Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:30.1095450Z .b8 116 // DW_AT_producer 2026-02-21T09:05:30.1095511Z .b8 114 2026-02-21T09:05:30.1095563Z .b8 105 2026-02-21T09:05:30.1095614Z .b8 116 2026-02-21T09:05:30.1095664Z .b8 111 2026-02-21T09:05:30.1095725Z .b8 110 2026-02-21T09:05:30.1095776Z .b8 0 2026-02-21T09:05:30.1095851Z .b8 2 // DW_AT_language 2026-02-21T09:05:30.1095907Z .b8 0 2026-02-21T09:05:30.1096015Z .b8 99 // DW_AT_name 2026-02-21T09:05:30.1096070Z .b8 112 2026-02-21T09:05:30.1096123Z .b8 115 2026-02-21T09:05:30.1096189Z .b8 120 2026-02-21T09:05:30.1096239Z .b8 121 2026-02-21T09:05:30.1096287Z .b8 100 2026-02-21T09:05:30.1096348Z .b8 97 2026-02-21T09:05:30.1096395Z .b8 113 2026-02-21T09:05:30.1096444Z .b8 109 2026-02-21T09:05:30.1096494Z .b8 50 2026-02-21T09:05:30.1096551Z .b8 120 2026-02-21T09:05:30.1096600Z .b8 97 2026-02-21T09:05:30.1096648Z .b8 107 2026-02-21T09:05:30.1096698Z .b8 100 2026-02-21T09:05:30.1096754Z .b8 105 2026-02-21T09:05:30.1096802Z .b8 105 
2026-02-21T09:05:30.1096849Z .b8 121 2026-02-21T09:05:30.1096904Z .b8 51 2026-02-21T09:05:30.1096952Z .b8 55 2026-02-21T09:05:30.1097000Z .b8 53 2026-02-21T09:05:30.1097049Z .b8 101 2026-02-21T09:05:30.1097106Z .b8 119 2026-02-21T09:05:30.1097156Z .b8 104 2026-02-21T09:05:30.1097204Z .b8 50 2026-02-21T09:05:30.1097258Z .b8 100 2026-02-21T09:05:30.1097307Z .b8 120 2026-02-21T09:05:30.1097357Z .b8 102 2026-02-21T09:05:30.1097406Z .b8 108 2026-02-21T09:05:30.1097464Z .b8 50 2026-02-21T09:05:30.1097513Z .b8 119 2026-02-21T09:05:30.1097561Z .b8 53 2026-02-21T09:05:30.1097609Z .b8 104 2026-02-21T09:05:30.1097666Z .b8 104 2026-02-21T09:05:30.1097715Z .b8 97 2026-02-21T09:05:30.1097801Z .b8 115 2026-02-21T09:05:30.1097858Z .b8 112 2026-02-21T09:05:30.1097906Z .b8 105 2026-02-21T09:05:30.1097955Z .b8 53 2026-02-21T09:05:30.1098002Z .b8 114 2026-02-21T09:05:30.1098057Z .b8 50 2026-02-21T09:05:30.1098106Z .b8 119 2026-02-21T09:05:30.1098155Z .b8 104 2026-02-21T09:05:30.1098210Z .b8 109 2026-02-21T09:05:30.1098258Z .b8 110 2026-02-21T09:05:30.1098306Z .b8 115 2026-02-21T09:05:30.1098353Z .b8 53 2026-02-21T09:05:30.1098408Z .b8 117 2026-02-21T09:05:30.1098456Z .b8 100 2026-02-21T09:05:30.1098503Z .b8 116 2026-02-21T09:05:30.1098559Z .b8 112 2026-02-21T09:05:30.1098607Z .b8 100 2026-02-21T09:05:30.1098655Z .b8 122 2026-02-21T09:05:30.1098701Z .b8 46 2026-02-21T09:05:30.1098756Z .b8 112 2026-02-21T09:05:30.1098805Z .b8 121 2026-02-21T09:05:30.1098872Z .b8 0 2026-02-21T09:05:30.1098959Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:30.1099038Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:30.1099087Z .b8 116 2026-02-21T09:05:30.1099137Z .b8 109 2026-02-21T09:05:30.1099193Z .b8 112 2026-02-21T09:05:30.1099242Z .b8 47 2026-02-21T09:05:30.1099290Z .b8 116 2026-02-21T09:05:30.1099338Z .b8 111 2026-02-21T09:05:30.1099394Z .b8 114 2026-02-21T09:05:30.1099441Z .b8 99 2026-02-21T09:05:30.1099515Z .b8 104 2026-02-21T09:05:30.1099573Z .b8 105 2026-02-21T09:05:30.1099621Z .b8 110 
2026-02-21T09:05:30.1099668Z .b8 100 2026-02-21T09:05:30.1099716Z .b8 117 2026-02-21T09:05:30.1099771Z .b8 99 2026-02-21T09:05:30.1099819Z .b8 116 2026-02-21T09:05:30.1099867Z .b8 111 2026-02-21T09:05:30.1099921Z .b8 114 2026-02-21T09:05:30.1099970Z .b8 95 2026-02-21T09:05:30.1100017Z .b8 114 2026-02-21T09:05:30.1100065Z .b8 111 2026-02-21T09:05:30.1100119Z .b8 111 2026-02-21T09:05:30.1100169Z .b8 116 2026-02-21T09:05:30.1100241Z .b8 47 2026-02-21T09:05:30.1100289Z .b8 112 2026-02-21T09:05:30.1100345Z .b8 115 2026-02-21T09:05:30.1100394Z .b8 0 2026-02-21T09:05:30.1100442Z } 2026-02-21T09:05:30.1100512Z .section .debug_macinfo { } 2026-02-21T09:05:30.1100516Z 2026-02-21T09:05:30.1100593Z ================================================================ 2026-02-21T09:05:30.1100692Z please share the reproducer above with Triton project. 2026-02-21T09:05:30.2947974Z 2026-02-21T09:05:30.2952643Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:30.2952669Z 2026-02-21T09:05:30.2955638Z Config: @helion.kernel(config=helion.Config(block_sizes=[1024, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:30.2955809Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:30.2955883Z `ptxas` stderr: 2026-02-21T09:05:30.2956220Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 257 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:30.2956315Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.2956325Z 2026-02-21T09:05:30.2956743Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy7mmumdv.ptx -o /tmp/tmpy7mmumdv.ptx.o 2026-02-21T09:05:30.2956749Z 2026-02-21T09:05:30.2956879Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:30.2956886Z 2026-02-21T09:05:30.2956978Z ================================================================ 2026-02-21T09:05:30.2960737Z Internal Triton PTX codegen error 2026-02-21T09:05:30.2962350Z `ptxas` stderr: 2026-02-21T09:05:30.2962785Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 257 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.2963115Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.2967304Z 2026-02-21T09:05:30.2971036Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy7mmumdv.ptx -o /tmp/tmpy7mmumdv.ptx.o 2026-02-21T09:05:30.2974938Z 2026-02-21T09:05:30.2978811Z 2026-02-21T09:05:30.2982058Z // 2026-02-21T09:05:30.2986149Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:30.2987752Z // 2026-02-21T09:05:30.2987809Z 2026-02-21T09:05:30.2988058Z .version 8.7 2026-02-21T09:05:30.2988124Z .target sm_100a 2026-02-21T09:05:30.2988226Z .address_size 64 2026-02-21T09:05:30.2988333Z 2026-02-21T09:05:30.2988513Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:30.2988605Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:30.2988695Z // @_helion_matmul 2026-02-21T09:05:30.2988795Z .visible .entry _helion_matmul( 2026-02-21T09:05:30.2992733Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:30.2997310Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T09:05:30.3002044Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:30.3005056Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:30.3008928Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:30.3012051Z ) 2026-02-21T09:05:30.3016631Z .reqntid 256 2026-02-21T09:05:30.3018458Z .maxnreg 32 2026-02-21T09:05:30.3018572Z { 2026-02-21T09:05:30.3018652Z .reg .pred %p<151>; 2026-02-21T09:05:30.3018714Z .reg .b32 %r<809>; 2026-02-21T09:05:30.3018796Z .reg .b64 %rd<356>; 2026-02-21T09:05:30.3019289Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19:0 2026-02-21T09:05:30.3019353Z $L__func_begin0: 2026-02-21T09:05:30.3019549Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19:0 2026-02-21T09:05:30.3019560Z 2026-02-21T09:05:30.3019623Z // %bb.0: 2026-02-21T09:05:30.3019715Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:05:30.3019782Z $L__tmp0: 2026-02-21T09:05:30.3019954Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19 2026-02-21T09:05:30.3020014Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:30.3020075Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:30.3020151Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:30.3020217Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:05:30.3020274Z @%p3 bra $L__BB0_16; 2026-02-21T09:05:30.3020335Z bra.uni $L__BB0_1; 2026-02-21T09:05:30.3020432Z $L__BB0_16: 2026-02-21T09:05:30.3020609Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:0 2026-02-21T09:05:30.3020702Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:05:30.3020778Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:05:30.3020854Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:05:30.3021017Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19 2026-02-21T09:05:30.3021107Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.3021171Z 
setp.lt.u32 %p45, %r1, 32; 2026-02-21T09:05:30.3021233Z mov.b32 %r200, global_smem; 2026-02-21T09:05:30.3021297Z // begin inline asm 2026-02-21T09:05:30.3021447Z @%p45 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r200], 256; 2026-02-21T09:05:30.3021502Z // end inline asm 2026-02-21T09:05:30.3021563Z bar.sync 0, 128; 2026-02-21T09:05:30.3021632Z ld.shared.b32 %r775, [global_smem]; 2026-02-21T09:05:30.3021688Z bar.sync 0, 128; 2026-02-21T09:05:30.3021744Z // begin inline asm 2026-02-21T09:05:30.3021876Z @%p45 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:30.3021929Z // end inline asm 2026-02-21T09:05:30.3022105Z .loc 1 21 67 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:21:67 2026-02-21T09:05:30.3022213Z mov.u32 %r44, %ctaid.x; 2026-02-21T09:05:30.3022272Z mov.u32 %r380, %ctaid.y; 2026-02-21T09:05:30.3022327Z mov.u32 %r381, %ctaid.z; 2026-02-21T09:05:30.3022394Z mov.u32 %r382, %nctaid.x; 2026-02-21T09:05:30.3022452Z mov.u32 %r383, %nctaid.y; 2026-02-21T09:05:30.3022519Z mad.lo.s32 %r384, %r381, %r383, %r380; 2026-02-21T09:05:30.3022586Z mad.lo.s32 %r385, %r384, %r382, %r44; 2026-02-21T09:05:30.3022657Z mul.lo.s32 %r386, %r385, 384; 2026-02-21T09:05:30.3022717Z cvt.s64.s32 %rd96, %r386; 2026-02-21T09:05:30.3022778Z add.s64 %rd57, %rd7, %rd96; 2026-02-21T09:05:30.3022846Z shl.b32 %r387, %r1, 2; 2026-02-21T09:05:30.3022910Z add.s32 %r201, %r200, %r387; 2026-02-21T09:05:30.3022970Z mov.b32 %r806, 0; 2026-02-21T09:05:30.3023027Z // begin inline asm 2026-02-21T09:05:30.3023112Z @%p45 st.shared.b32 [ %r201 + 0 ], %r806; 2026-02-21T09:05:30.3023170Z // end inline asm 2026-02-21T09:05:30.3023236Z bar.warp.sync -1; 2026-02-21T09:05:30.3023315Z setp.eq.b32 %p137, %r1, 0; 2026-02-21T09:05:30.3023376Z cvt.u64.u32 %rd42, %r200; 2026-02-21T09:05:30.3023434Z // begin inline asm 2026-02-21T09:05:30.3023635Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd42 + 0 ], %rd4; 2026-02-21T09:05:30.3023701Z // end 
inline asm 2026-02-21T09:05:30.3023755Z // begin inline asm 2026-02-21T09:05:30.3023897Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3023960Z // end inline asm 2026-02-21T09:05:30.3024014Z mov.b32 %r203, 16; 2026-02-21T09:05:30.3024067Z // begin inline asm 2026-02-21T09:05:30.3024230Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r203; 2026-02-21T09:05:30.3024307Z // end inline asm 2026-02-21T09:05:30.3024363Z mov.b32 %r204, 256; 2026-02-21T09:05:30.3024415Z // begin inline asm 2026-02-21T09:05:30.3024572Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r204; 2026-02-21T09:05:30.3024629Z // end inline asm 2026-02-21T09:05:30.3024746Z mov.b32 %r205, 2048; 2026-02-21T09:05:30.3024812Z // begin inline asm 2026-02-21T09:05:30.3024981Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r205; 2026-02-21T09:05:30.3025034Z // end inline asm 2026-02-21T09:05:30.3025096Z mov.b32 %r206, 4096; 2026-02-21T09:05:30.3025149Z // begin inline asm 2026-02-21T09:05:30.3025306Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r206; 2026-02-21T09:05:30.3025359Z // end inline asm 2026-02-21T09:05:30.3025421Z mov.b64 %rd50, 4096; 2026-02-21T09:05:30.3025475Z // begin inline asm 2026-02-21T09:05:30.3025679Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd42 + 0 ], 0x0, %rd50; 2026-02-21T09:05:30.3025745Z // end inline asm 2026-02-21T09:05:30.3025800Z mov.b32 %r207, 1; 2026-02-21T09:05:30.3025854Z // begin inline asm 2026-02-21T09:05:30.3026035Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r207; 2026-02-21T09:05:30.3026090Z // end inline asm 2026-02-21T09:05:30.3026145Z // begin inline asm 2026-02-21T09:05:30.3026309Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r207; 2026-02-21T09:05:30.3026370Z 
// end inline asm 2026-02-21T09:05:30.3026423Z // begin inline asm 2026-02-21T09:05:30.3026795Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x6; 2026-02-21T09:05:30.3026856Z // end inline asm 2026-02-21T09:05:30.3026909Z // begin inline asm 2026-02-21T09:05:30.3027071Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3027132Z // end inline asm 2026-02-21T09:05:30.3027185Z // begin inline asm 2026-02-21T09:05:30.3027333Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3027391Z // end inline asm 2026-02-21T09:05:30.3027445Z // begin inline asm 2026-02-21T09:05:30.3027627Z @%p137 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3027680Z // end inline asm 2026-02-21T09:05:30.3027739Z // begin inline asm 2026-02-21T09:05:30.3028000Z @%p45 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd57 + 0 ], [ %rd42 + 0 ], 0x80; 2026-02-21T09:05:30.3028053Z // end inline asm 2026-02-21T09:05:30.3028113Z // begin inline asm 2026-02-21T09:05:30.3028242Z @%p45 fence.proxy.tensormap::generic.acquire.gpu [ %rd57 + 0 ], 0x80; 2026-02-21T09:05:30.3028313Z @%p45 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.3028394Z @%p45 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.3028446Z // end inline asm 2026-02-21T09:05:30.3028499Z bar.sync 0, 128; 2026-02-21T09:05:30.3028675Z .loc 1 22 67 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:22:67 2026-02-21T09:05:30.3028732Z add.s32 %r388, %r386, 128; 2026-02-21T09:05:30.3028790Z cvt.s64.s32 %rd97, %r388; 2026-02-21T09:05:30.3028850Z add.s64 %rd75, %rd7, %rd97; 2026-02-21T09:05:30.3028910Z bar.sync 0, 128; 2026-02-21T09:05:30.3028964Z // begin inline asm 2026-02-21T09:05:30.3029033Z @%p45 st.shared.b32 [ %r201 + 0 ], %r806; 2026-02-21T09:05:30.3029115Z // end inline asm 
2026-02-21T09:05:30.3029175Z bar.warp.sync -1; 2026-02-21T09:05:30.3029229Z // begin inline asm 2026-02-21T09:05:30.3029386Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd42 + 0 ], %rd5; 2026-02-21T09:05:30.3029446Z // end inline asm 2026-02-21T09:05:30.3029499Z // begin inline asm 2026-02-21T09:05:30.3029634Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3029695Z // end inline asm 2026-02-21T09:05:30.3029775Z // begin inline asm 2026-02-21T09:05:30.3029923Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r203; 2026-02-21T09:05:30.3029982Z // end inline asm 2026-02-21T09:05:30.3030035Z // begin inline asm 2026-02-21T09:05:30.3030185Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r203; 2026-02-21T09:05:30.3030237Z // end inline asm 2026-02-21T09:05:30.3030299Z // begin inline asm 2026-02-21T09:05:30.3030457Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r205; 2026-02-21T09:05:30.3030510Z // end inline asm 2026-02-21T09:05:30.3030571Z // begin inline asm 2026-02-21T09:05:30.3030730Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r205; 2026-02-21T09:05:30.3030783Z // end inline asm 2026-02-21T09:05:30.3030843Z // begin inline asm 2026-02-21T09:05:30.3031031Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd42 + 0 ], 0x0, %rd50; 2026-02-21T09:05:30.3031089Z // end inline asm 2026-02-21T09:05:30.3031151Z // begin inline asm 2026-02-21T09:05:30.3031320Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r207; 2026-02-21T09:05:30.3031389Z // end inline asm 2026-02-21T09:05:30.3031442Z // begin inline asm 2026-02-21T09:05:30.3031618Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r207; 2026-02-21T09:05:30.3031671Z // end inline asm 
2026-02-21T09:05:30.3031726Z // begin inline asm 2026-02-21T09:05:30.3031879Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x6; 2026-02-21T09:05:30.3031932Z // end inline asm 2026-02-21T09:05:30.3031985Z // begin inline asm 2026-02-21T09:05:30.3032157Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3032210Z // end inline asm 2026-02-21T09:05:30.3032262Z // begin inline asm 2026-02-21T09:05:30.3032414Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3032474Z // end inline asm 2026-02-21T09:05:30.3032527Z // begin inline asm 2026-02-21T09:05:30.3032668Z @%p137 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3032750Z // end inline asm 2026-02-21T09:05:30.3032804Z // begin inline asm 2026-02-21T09:05:30.3033063Z @%p45 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd75 + 0 ], [ %rd42 + 0 ], 0x80; 2026-02-21T09:05:30.3033124Z // end inline asm 2026-02-21T09:05:30.3033178Z // begin inline asm 2026-02-21T09:05:30.3033302Z @%p45 fence.proxy.tensormap::generic.acquire.gpu [ %rd75 + 0 ], 0x80; 2026-02-21T09:05:30.3033378Z @%p45 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.3033450Z @%p45 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.3033502Z // end inline asm 2026-02-21T09:05:30.3033555Z bar.sync 0, 128; 2026-02-21T09:05:30.3033729Z .loc 1 24 71 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:24:71 2026-02-21T09:05:30.3033789Z add.s32 %r389, %r386, 256; 2026-02-21T09:05:30.3033846Z cvt.s64.s32 %rd98, %r389; 2026-02-21T09:05:30.3033911Z add.s64 %rd93, %rd7, %rd98; 2026-02-21T09:05:30.3033965Z bar.sync 0, 128; 2026-02-21T09:05:30.3034019Z // begin inline asm 2026-02-21T09:05:30.3034086Z @%p45 st.shared.b32 [ %r201 + 0 ], %r806; 2026-02-21T09:05:30.3034144Z // end inline asm 2026-02-21T09:05:30.3034245Z bar.warp.sync 
-1; 2026-02-21T09:05:30.3034301Z // begin inline asm 2026-02-21T09:05:30.3034468Z @%p137 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd42 + 0 ], %rd6; 2026-02-21T09:05:30.3034521Z // end inline asm 2026-02-21T09:05:30.3034575Z // begin inline asm 2026-02-21T09:05:30.3034748Z @%p137 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3034804Z // end inline asm 2026-02-21T09:05:30.3034857Z // begin inline asm 2026-02-21T09:05:30.3035006Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r203; 2026-02-21T09:05:30.3035106Z // end inline asm 2026-02-21T09:05:30.3035159Z // begin inline asm 2026-02-21T09:05:30.3035307Z @%p137 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r204; 2026-02-21T09:05:30.3035368Z // end inline asm 2026-02-21T09:05:30.3035421Z // begin inline asm 2026-02-21T09:05:30.3035582Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r205; 2026-02-21T09:05:30.3035646Z // end inline asm 2026-02-21T09:05:30.3035700Z // begin inline asm 2026-02-21T09:05:30.3035857Z @%p137 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r206; 2026-02-21T09:05:30.3035917Z // end inline asm 2026-02-21T09:05:30.3035970Z // begin inline asm 2026-02-21T09:05:30.3036138Z @%p137 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd42 + 0 ], 0x0, %rd50; 2026-02-21T09:05:30.3036216Z // end inline asm 2026-02-21T09:05:30.3036281Z // begin inline asm 2026-02-21T09:05:30.3036450Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r207; 2026-02-21T09:05:30.3036502Z // end inline asm 2026-02-21T09:05:30.3036560Z // begin inline asm 2026-02-21T09:05:30.3036728Z @%p137 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r207; 2026-02-21T09:05:30.3036780Z // end inline asm 2026-02-21T09:05:30.3036839Z // begin inline asm 
2026-02-21T09:05:30.3036990Z @%p137 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x6; 2026-02-21T09:05:30.3037042Z // end inline asm 2026-02-21T09:05:30.3037094Z // begin inline asm 2026-02-21T09:05:30.3037265Z @%p137 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3037317Z // end inline asm 2026-02-21T09:05:30.3037369Z // begin inline asm 2026-02-21T09:05:30.3037527Z @%p137 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:05:30.3037581Z // end inline asm 2026-02-21T09:05:30.3037634Z // begin inline asm 2026-02-21T09:05:30.3037782Z @%p137 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:05:30.3037834Z // end inline asm 2026-02-21T09:05:30.3037924Z // begin inline asm 2026-02-21T09:05:30.3038190Z @%p45 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd93 + 0 ], [ %rd42 + 0 ], 0x80; 2026-02-21T09:05:30.3038242Z // end inline asm 2026-02-21T09:05:30.3038296Z // begin inline asm 2026-02-21T09:05:30.3038417Z @%p45 fence.proxy.tensormap::generic.acquire.gpu [ %rd93 + 0 ], 0x80; 2026-02-21T09:05:30.3038492Z @%p45 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.3038562Z @%p45 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.3038614Z // end inline asm 2026-02-21T09:05:30.3038674Z bar.sync 0, 128; 2026-02-21T09:05:30.3038737Z cvta.global.u64 %rd99, %rd93; 2026-02-21T09:05:30.3038908Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3038976Z max.u32 %r390, %r44, 511; 2026-02-21T09:05:30.3039035Z shl.b32 %r391, %r390, 7; 2026-02-21T09:05:30.3039093Z sub.s32 %r45, 65536, %r391; 2026-02-21T09:05:30.3039273Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3039355Z shfl.sync.idx.b32 %r392, %r2, 0, 31, -1; 2026-02-21T09:05:30.3039416Z and.b32 %r46, %r392, 3; 
2026-02-21T09:05:30.3039499Z shl.b32 %r393, %r46, 21; 2026-02-21T09:05:30.3039572Z add.s32 %r225, %r393, %r775; 2026-02-21T09:05:30.3039637Z mov.pred %p101, -1; 2026-02-21T09:05:30.3039694Z // begin inline asm 2026-02-21T09:05:30.3040003Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 0], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3040069Z // end inline asm 2026-02-21T09:05:30.3040128Z // begin inline asm 2026-02-21T09:05:30.3040437Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 16], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3040505Z // end inline asm 2026-02-21T09:05:30.3040563Z // begin inline asm 2026-02-21T09:05:30.3040842Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 32], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3040908Z // end inline asm 2026-02-21T09:05:30.3040964Z // begin inline asm 2026-02-21T09:05:30.3041242Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 48], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3041303Z // end inline asm 2026-02-21T09:05:30.3041359Z // begin inline asm 2026-02-21T09:05:30.3041655Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 64], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3041721Z // end inline asm 2026-02-21T09:05:30.3041777Z // begin inline asm 2026-02-21T09:05:30.3042068Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 80], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3042125Z // end inline asm 2026-02-21T09:05:30.3042187Z // begin inline 
asm 2026-02-21T09:05:30.3042484Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 96], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3042540Z // end inline asm 2026-02-21T09:05:30.3042604Z // begin inline asm 2026-02-21T09:05:30.3042903Z @%p101 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r225 + 112], {%r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806, %r806}; 2026-02-21T09:05:30.3042960Z // end inline asm 2026-02-21T09:05:30.3043025Z // begin inline asm 2026-02-21T09:05:30.3043096Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:30.3043150Z // end inline asm 2026-02-21T09:05:30.3043204Z bar.sync 0, 128; 2026-02-21T09:05:30.3043385Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3043471Z add.s32 %r361, %r200, 165888; 2026-02-21T09:05:30.3043527Z // begin inline asm 2026-02-21T09:05:30.3043626Z @%p137 mbarrier.init.shared::cta.b64 [%r361], 1; 2026-02-21T09:05:30.3043682Z // end inline asm 2026-02-21T09:05:30.3043737Z bar.sync 0, 128; 2026-02-21T09:05:30.3043804Z add.s32 %r362, %r200, 165896; 2026-02-21T09:05:30.3043860Z // begin inline asm 2026-02-21T09:05:30.3043946Z @%p137 mbarrier.init.shared::cta.b64 [%r362], 1; 2026-02-21T09:05:30.3044001Z // end inline asm 2026-02-21T09:05:30.3044066Z bar.sync 0, 128; 2026-02-21T09:05:30.3044124Z add.s32 %r363, %r200, 165904; 2026-02-21T09:05:30.3044183Z // begin inline asm 2026-02-21T09:05:30.3044275Z @%p137 mbarrier.init.shared::cta.b64 [%r363], 1; 2026-02-21T09:05:30.3044330Z // end inline asm 2026-02-21T09:05:30.3044385Z bar.sync 0, 128; 2026-02-21T09:05:30.3044443Z add.s32 %r364, %r200, 165912; 2026-02-21T09:05:30.3044509Z // begin inline asm 2026-02-21T09:05:30.3044589Z @%p137 mbarrier.init.shared::cta.b64 [%r364], 1; 2026-02-21T09:05:30.3044644Z // end inline asm 2026-02-21T09:05:30.3044739Z add.s32 %r365, %r200, 165920; 
2026-02-21T09:05:30.3044828Z // begin inline asm 2026-02-21T09:05:30.3044913Z @%p137 mbarrier.init.shared::cta.b64 [%r365], 1; 2026-02-21T09:05:30.3044975Z // end inline asm 2026-02-21T09:05:30.3045031Z bar.sync 0, 128; 2026-02-21T09:05:30.3045089Z add.s32 %r366, %r200, 165928; 2026-02-21T09:05:30.3045144Z // begin inline asm 2026-02-21T09:05:30.3045234Z @%p137 mbarrier.init.shared::cta.b64 [%r366], 1; 2026-02-21T09:05:30.3045288Z // end inline asm 2026-02-21T09:05:30.3045342Z bar.sync 0, 128; 2026-02-21T09:05:30.3045408Z add.s32 %r367, %r200, 165936; 2026-02-21T09:05:30.3045496Z // begin inline asm 2026-02-21T09:05:30.3045576Z @%p137 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:05:30.3045630Z // end inline asm 2026-02-21T09:05:30.3045694Z bar.sync 0, 128; 2026-02-21T09:05:30.3045754Z add.s32 %r368, %r200, 165944; 2026-02-21T09:05:30.3045811Z // begin inline asm 2026-02-21T09:05:30.3045899Z @%p137 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:05:30.3045955Z // end inline asm 2026-02-21T09:05:30.3046129Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3046193Z bar.sync 0, 128; 2026-02-21T09:05:30.3046248Z // begin inline asm 2026-02-21T09:05:30.3046341Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r361]; 2026-02-21T09:05:30.3046397Z // end inline asm 2026-02-21T09:05:30.3046461Z bar.sync 0, 128; 2026-02-21T09:05:30.3046517Z // begin inline asm 2026-02-21T09:05:30.3046632Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r362]; 2026-02-21T09:05:30.3046700Z // end inline asm 2026-02-21T09:05:30.3046755Z bar.sync 0, 128; 2026-02-21T09:05:30.3046810Z // begin inline asm 2026-02-21T09:05:30.3046895Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r363]; 2026-02-21T09:05:30.3046959Z // end inline asm 2026-02-21T09:05:30.3047025Z bar.sync 0, 128; 2026-02-21T09:05:30.3047080Z // begin inline asm 2026-02-21T09:05:30.3047167Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r364]; 2026-02-21T09:05:30.3047220Z // end inline asm 
2026-02-21T09:05:30.3047387Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3047449Z bar.sync 0, 128; 2026-02-21T09:05:30.3047504Z add.s32 %r373, %r200, 165952; 2026-02-21T09:05:30.3047556Z // begin inline asm 2026-02-21T09:05:30.3047635Z @%p137 mbarrier.init.shared::cta.b64 [%r373], 1; 2026-02-21T09:05:30.3047695Z // end inline asm 2026-02-21T09:05:30.3047748Z bar.sync 0, 128; 2026-02-21T09:05:30.3047805Z add.s32 %r374, %r200, 165960; 2026-02-21T09:05:30.3047868Z // begin inline asm 2026-02-21T09:05:30.3047945Z @%p137 mbarrier.init.shared::cta.b64 [%r374], 1; 2026-02-21T09:05:30.3047997Z // end inline asm 2026-02-21T09:05:30.3048051Z add.s32 %r375, %r200, 165968; 2026-02-21T09:05:30.3048141Z // begin inline asm 2026-02-21T09:05:30.3048219Z @%p137 mbarrier.init.shared::cta.b64 [%r375], 1; 2026-02-21T09:05:30.3048273Z // end inline asm 2026-02-21T09:05:30.3048335Z bar.sync 0, 128; 2026-02-21T09:05:30.3048394Z add.s32 %r376, %r200, 165976; 2026-02-21T09:05:30.3048450Z // begin inline asm 2026-02-21T09:05:30.3048537Z @%p137 mbarrier.init.shared::cta.b64 [%r376], 1; 2026-02-21T09:05:30.3048592Z // end inline asm 2026-02-21T09:05:30.3048765Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3048817Z bar.sync 0, 128; 2026-02-21T09:05:30.3048879Z // begin inline asm 2026-02-21T09:05:30.3048963Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r375]; 2026-02-21T09:05:30.3049017Z // end inline asm 2026-02-21T09:05:30.3049075Z bar.sync 0, 128; 2026-02-21T09:05:30.3049128Z // begin inline asm 2026-02-21T09:05:30.3049209Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r376]; 2026-02-21T09:05:30.3049261Z // end inline asm 2026-02-21T09:05:30.3049436Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3049517Z st.shared.b32 [global_smem+165984], 33554689; 2026-02-21T09:05:30.3049620Z st.shared.b32 [global_smem+131072], %r775; 
2026-02-21T09:05:30.3049684Z barrier.sync 1; 2026-02-21T09:05:30.3049761Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.3049817Z barrier.sync 1; 2026-02-21T09:05:30.3049886Z setp.lt.s32 %p127, %r45, 1; 2026-02-21T09:05:30.3049941Z mov.b32 %r808, %r806; 2026-02-21T09:05:30.3049997Z @%p127 bra $L__BB0_23; 2026-02-21T09:05:30.3050075Z // %bb.17: // %.lr.ph17 2026-02-21T09:05:30.3050141Z add.s32 %r804, %r44, -1; 2026-02-21T09:05:30.3050222Z shl.b32 %r396, %r1, 5; 2026-02-21T09:05:30.3050279Z and.b32 %r397, %r396, 3936; 2026-02-21T09:05:30.3050344Z bfe.s32 %r398, %r1, 2, 1; 2026-02-21T09:05:30.3050401Z and.b32 %r399, %r398, 144; 2026-02-21T09:05:30.3050456Z or.b32 %r400, %r399, %r397; 2026-02-21T09:05:30.3050515Z add.s32 %r402, %r200, 131072; 2026-02-21T09:05:30.3050578Z add.s32 %r49, %r402, %r400; 2026-02-21T09:05:30.3050633Z xor.b32 %r403, %r400, 16; 2026-02-21T09:05:30.3050687Z add.s32 %r50, %r402, %r403; 2026-02-21T09:05:30.3050753Z shl.b32 %r404, %r46, 13; 2026-02-21T09:05:30.3050811Z add.s32 %r547, %r402, %r404; 2026-02-21T09:05:30.3050866Z shl.b32 %r52, %r46, 8; 2026-02-21T09:05:30.3050919Z mov.b32 %r800, -1; 2026-02-21T09:05:30.3050979Z mov.b32 %r808, %r806; 2026-02-21T09:05:30.3051033Z mov.b32 %r803, %r806; 2026-02-21T09:05:30.3051088Z mov.b32 %r802, %r806; 2026-02-21T09:05:30.3051149Z mov.b32 %r801, %r806; 2026-02-21T09:05:30.3051206Z bra.uni $L__BB0_18; 2026-02-21T09:05:30.3051330Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.3051508Z .loc 1 0 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:74 2026-02-21T09:05:30.3051571Z setp.lt.u32 %p133, %r1, 128; 2026-02-21T09:05:30.3051741Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3051798Z shl.b32 %r549, %r806, 3; 2026-02-21T09:05:30.3051863Z add.s32 %r551, %r200, %r549; 2026-02-21T09:05:30.3051920Z add.s32 %r407, %r551, 165952; 2026-02-21T09:05:30.3052086Z .loc 1 56 52 // 
c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3052151Z shl.b32 %r552, %r806, 7; 2026-02-21T09:05:30.3052203Z bar.sync 0, 128; 2026-02-21T09:05:30.3052256Z // begin inline asm 2026-02-21T09:05:30.3052310Z 2026-02-21T09:05:30.3052361Z { 2026-02-21T09:05:30.3052419Z .reg .pred complete; 2026-02-21T09:05:30.3052474Z waitLoop: 2026-02-21T09:05:30.3052602Z mbarrier.try_wait.parity.shared.b64 complete, [%r407], %r808; 2026-02-21T09:05:30.3052664Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.3052713Z } 2026-02-21T09:05:30.3052718Z 2026-02-21T09:05:30.3052777Z // end inline asm 2026-02-21T09:05:30.3052867Z add.s32 %r425, %r225, %r552; 2026-02-21T09:05:30.3052920Z // begin inline asm 2026-02-21T09:05:30.3053190Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424}, [%r425 + 0]; 2026-02-21T09:05:30.3053250Z // end inline asm 2026-02-21T09:05:30.3053304Z // begin inline asm 2026-02-21T09:05:30.3053563Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441}, [%r425 + 16]; 2026-02-21T09:05:30.3053623Z // end inline asm 2026-02-21T09:05:30.3053689Z // begin inline asm 2026-02-21T09:05:30.3053949Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458}, [%r425 + 32]; 2026-02-21T09:05:30.3054010Z // end inline asm 2026-02-21T09:05:30.3054062Z // begin inline asm 2026-02-21T09:05:30.3054316Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475}, [%r425 + 48]; 2026-02-21T09:05:30.3054376Z // end inline asm 2026-02-21T09:05:30.3054429Z // begin inline asm 2026-02-21T09:05:30.3054744Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r477, %r478, %r479, %r480, %r481, 
%r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492}, [%r425 + 64]; 2026-02-21T09:05:30.3054799Z // end inline asm 2026-02-21T09:05:30.3054858Z // begin inline asm 2026-02-21T09:05:30.3055120Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509}, [%r425 + 80]; 2026-02-21T09:05:30.3055200Z // end inline asm 2026-02-21T09:05:30.3055262Z // begin inline asm 2026-02-21T09:05:30.3055533Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526}, [%r425 + 96]; 2026-02-21T09:05:30.3055588Z // end inline asm 2026-02-21T09:05:30.3055648Z // begin inline asm 2026-02-21T09:05:30.3055914Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543}, [%r425 + 112]; 2026-02-21T09:05:30.3055966Z // end inline asm 2026-02-21T09:05:30.3056028Z // begin inline asm 2026-02-21T09:05:30.3056092Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:30.3056144Z // end inline asm 2026-02-21T09:05:30.3056202Z cvt.u64.u32 %rd100, %r409; 2026-02-21T09:05:30.3056269Z cvt.u64.u32 %rd101, %r410; 2026-02-21T09:05:30.3056326Z shl.b64 %rd102, %rd101, 32; 2026-02-21T09:05:30.3056414Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T09:05:30.3056592Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3056655Z mov.b64 {%r553, %r554}, %rd103; 2026-02-21T09:05:30.3056721Z cvt.rn.f16x2.f32 %r555, %r554, %r553; 2026-02-21T09:05:30.3056903Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3056963Z cvt.u64.u32 %rd104, %r411; 2026-02-21T09:05:30.3057019Z cvt.u64.u32 %rd105, %r412; 2026-02-21T09:05:30.3057079Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:05:30.3057148Z or.b64 %rd107, %rd104, %rd106; 
2026-02-21T09:05:30.3057312Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3057375Z mov.b64 {%r556, %r557}, %rd107; 2026-02-21T09:05:30.3057447Z cvt.rn.f16x2.f32 %r558, %r557, %r556; 2026-02-21T09:05:30.3057615Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3057674Z cvt.u64.u32 %rd108, %r413; 2026-02-21T09:05:30.3057736Z cvt.u64.u32 %rd109, %r414; 2026-02-21T09:05:30.3057792Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:05:30.3057849Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:05:30.3058042Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3058109Z mov.b64 {%r559, %r560}, %rd111; 2026-02-21T09:05:30.3058173Z cvt.rn.f16x2.f32 %r561, %r560, %r559; 2026-02-21T09:05:30.3058338Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3058401Z cvt.u64.u32 %rd112, %r415; 2026-02-21T09:05:30.3058456Z cvt.u64.u32 %rd113, %r416; 2026-02-21T09:05:30.3058512Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:05:30.3058571Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:05:30.3058743Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3058802Z mov.b64 {%r562, %r563}, %rd115; 2026-02-21T09:05:30.3058865Z cvt.rn.f16x2.f32 %r564, %r563, %r562; 2026-02-21T09:05:30.3059032Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3059089Z cvt.u64.u32 %rd116, %r417; 2026-02-21T09:05:30.3059145Z cvt.u64.u32 %rd117, %r418; 2026-02-21T09:05:30.3059209Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:05:30.3059267Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:05:30.3059454Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3059511Z mov.b64 {%r565, %r566}, %rd119; 2026-02-21T09:05:30.3059580Z cvt.rn.f16x2.f32 %r567, %r566, %r565; 2026-02-21T09:05:30.3059744Z .loc 1 56 52 // 
c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3059800Z cvt.u64.u32 %rd120, %r419; 2026-02-21T09:05:30.3059863Z cvt.u64.u32 %rd121, %r420; 2026-02-21T09:05:30.3059921Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:05:30.3059999Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:05:30.3060170Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3060227Z mov.b64 {%r568, %r569}, %rd123; 2026-02-21T09:05:30.3060290Z cvt.rn.f16x2.f32 %r570, %r569, %r568; 2026-02-21T09:05:30.3060457Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3060522Z cvt.u64.u32 %rd124, %r421; 2026-02-21T09:05:30.3060577Z cvt.u64.u32 %rd125, %r422; 2026-02-21T09:05:30.3060631Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:05:30.3060695Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:05:30.3060859Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3060915Z mov.b64 {%r571, %r572}, %rd127; 2026-02-21T09:05:30.3060983Z cvt.rn.f16x2.f32 %r573, %r572, %r571; 2026-02-21T09:05:30.3061167Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3061226Z cvt.u64.u32 %rd128, %r423; 2026-02-21T09:05:30.3061281Z cvt.u64.u32 %rd129, %r424; 2026-02-21T09:05:30.3061344Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:05:30.3061401Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:05:30.3061564Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3061630Z mov.b64 {%r574, %r575}, %rd131; 2026-02-21T09:05:30.3061690Z cvt.rn.f16x2.f32 %r576, %r575, %r574; 2026-02-21T09:05:30.3061854Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3061915Z cvt.u64.u32 %rd132, %r426; 2026-02-21T09:05:30.3061969Z cvt.u64.u32 %rd133, %r427; 2026-02-21T09:05:30.3062025Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:05:30.3062081Z or.b64 %rd135, %rd132, 
%rd134; 2026-02-21T09:05:30.3062250Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3062308Z mov.b64 {%r577, %r578}, %rd135; 2026-02-21T09:05:30.3062367Z cvt.rn.f16x2.f32 %r579, %r578, %r577; 2026-02-21T09:05:30.3062543Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3062621Z cvt.u64.u32 %rd136, %r428; 2026-02-21T09:05:30.3062676Z cvt.u64.u32 %rd137, %r429; 2026-02-21T09:05:30.3062738Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:05:30.3062794Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:05:30.3062960Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3063015Z mov.b64 {%r580, %r581}, %rd139; 2026-02-21T09:05:30.3063082Z cvt.rn.f16x2.f32 %r582, %r581, %r580; 2026-02-21T09:05:30.3063247Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3063304Z cvt.u64.u32 %rd140, %r430; 2026-02-21T09:05:30.3063368Z cvt.u64.u32 %rd141, %r431; 2026-02-21T09:05:30.3063423Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:05:30.3063479Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:05:30.3063649Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3063707Z mov.b64 {%r583, %r584}, %rd143; 2026-02-21T09:05:30.3063767Z cvt.rn.f16x2.f32 %r585, %r584, %r583; 2026-02-21T09:05:30.3063947Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3064012Z cvt.u64.u32 %rd144, %r432; 2026-02-21T09:05:30.3064066Z cvt.u64.u32 %rd145, %r433; 2026-02-21T09:05:30.3064122Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:05:30.3064187Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:05:30.3064343Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3064400Z mov.b64 {%r586, %r587}, %rd147; 2026-02-21T09:05:30.3064501Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T09:05:30.3064658Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3064743Z cvt.u64.u32 %rd148, %r434; 2026-02-21T09:05:30.3064802Z cvt.u64.u32 %rd149, %r435; 2026-02-21T09:05:30.3064868Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:05:30.3064926Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:05:30.3065084Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3065148Z mov.b64 {%r589, %r590}, %rd151; 2026-02-21T09:05:30.3065209Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T09:05:30.3065365Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3065430Z cvt.u64.u32 %rd152, %r436; 2026-02-21T09:05:30.3065485Z cvt.u64.u32 %rd153, %r437; 2026-02-21T09:05:30.3065542Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:05:30.3065636Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:05:30.3065814Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3065873Z mov.b64 {%r592, %r593}, %rd155; 2026-02-21T09:05:30.3065935Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T09:05:30.3066120Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3066182Z cvt.u64.u32 %rd156, %r438; 2026-02-21T09:05:30.3066241Z cvt.u64.u32 %rd157, %r439; 2026-02-21T09:05:30.3066306Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:05:30.3066362Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:05:30.3066523Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3066580Z mov.b64 {%r595, %r596}, %rd159; 2026-02-21T09:05:30.3066648Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T09:05:30.3066810Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3066868Z cvt.u64.u32 %rd160, %r440; 2026-02-21T09:05:30.3066930Z cvt.u64.u32 %rd161, %r441; 2026-02-21T09:05:30.3066985Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:05:30.3067042Z or.b64 %rd163, %rd160, 
%rd162; 2026-02-21T09:05:30.3067235Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3067292Z mov.b64 {%r598, %r599}, %rd163; 2026-02-21T09:05:30.3067352Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T09:05:30.3067514Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3067578Z cvt.u64.u32 %rd164, %r443; 2026-02-21T09:05:30.3067631Z cvt.u64.u32 %rd165, %r444; 2026-02-21T09:05:30.3067686Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:05:30.3067751Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:05:30.3067913Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3067971Z mov.b64 {%r601, %r602}, %rd167; 2026-02-21T09:05:30.3068038Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T09:05:30.3068200Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3068258Z cvt.u64.u32 %rd168, %r445; 2026-02-21T09:05:30.3068313Z cvt.u64.u32 %rd169, %r446; 2026-02-21T09:05:30.3068375Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:05:30.3068431Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:05:30.3068619Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3068686Z mov.b64 {%r604, %r605}, %rd171; 2026-02-21T09:05:30.3068746Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 2026-02-21T09:05:30.3068906Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3068969Z cvt.u64.u32 %rd172, %r447; 2026-02-21T09:05:30.3069026Z cvt.u64.u32 %rd173, %r448; 2026-02-21T09:05:30.3069109Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:05:30.3069165Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:05:30.3069338Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3069396Z mov.b64 {%r607, %r608}, %rd175; 2026-02-21T09:05:30.3069456Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T09:05:30.3069625Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3069682Z cvt.u64.u32 %rd176, %r449; 2026-02-21T09:05:30.3069738Z cvt.u64.u32 %rd177, %r450; 2026-02-21T09:05:30.3069799Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:05:30.3069854Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:05:30.3070015Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3070072Z mov.b64 {%r610, %r611}, %rd179; 2026-02-21T09:05:30.3070171Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T09:05:30.3070334Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3070390Z cvt.u64.u32 %rd180, %r451; 2026-02-21T09:05:30.3070453Z cvt.u64.u32 %rd181, %r452; 2026-02-21T09:05:30.3070511Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:05:30.3070565Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:05:30.3070734Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3070792Z mov.b64 {%r613, %r614}, %rd183; 2026-02-21T09:05:30.3070852Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T09:05:30.3071011Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3071073Z cvt.u64.u32 %rd184, %r453; 2026-02-21T09:05:30.3071126Z cvt.u64.u32 %rd185, %r454; 2026-02-21T09:05:30.3071181Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:05:30.3071247Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:05:30.3071409Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3071467Z mov.b64 {%r616, %r617}, %rd187; 2026-02-21T09:05:30.3071532Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T09:05:30.3071713Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3071769Z cvt.u64.u32 %rd188, %r455; 2026-02-21T09:05:30.3071824Z cvt.u64.u32 %rd189, %r456; 2026-02-21T09:05:30.3071888Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:05:30.3071944Z or.b64 %rd191, %rd188, 
%rd190; 2026-02-21T09:05:30.3072105Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3072169Z mov.b64 {%r619, %r620}, %rd191; 2026-02-21T09:05:30.3072228Z cvt.rn.f16x2.f32 %r621, %r620, %r619; 2026-02-21T09:05:30.3072390Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3072453Z cvt.u64.u32 %rd192, %r457; 2026-02-21T09:05:30.3072510Z cvt.u64.u32 %rd193, %r458; 2026-02-21T09:05:30.3072565Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:05:30.3072621Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:05:30.3072788Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3072846Z mov.b64 {%r622, %r623}, %rd195; 2026-02-21T09:05:30.3072906Z cvt.rn.f16x2.f32 %r624, %r623, %r622; 2026-02-21T09:05:30.3073097Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3073155Z cvt.u64.u32 %rd196, %r460; 2026-02-21T09:05:30.3073211Z cvt.u64.u32 %rd197, %r461; 2026-02-21T09:05:30.3073274Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:05:30.3073331Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:05:30.3073494Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3073551Z mov.b64 {%r625, %r626}, %rd199; 2026-02-21T09:05:30.3073643Z cvt.rn.f16x2.f32 %r627, %r626, %r625; 2026-02-21T09:05:30.3073806Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3073863Z cvt.u64.u32 %rd200, %r462; 2026-02-21T09:05:30.3073927Z cvt.u64.u32 %rd201, %r463; 2026-02-21T09:05:30.3073984Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:05:30.3074041Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:05:30.3074215Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3074273Z mov.b64 {%r628, %r629}, %rd203; 2026-02-21T09:05:30.3074335Z cvt.rn.f16x2.f32 %r630, %r629, %r628; 2026-02-21T09:05:30.3074498Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3074567Z cvt.u64.u32 %rd204, %r464; 2026-02-21T09:05:30.3074623Z cvt.u64.u32 %rd205, %r465; 2026-02-21T09:05:30.3074729Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:05:30.3074801Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:05:30.3074963Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3075020Z mov.b64 {%r631, %r632}, %rd207; 2026-02-21T09:05:30.3075089Z cvt.rn.f16x2.f32 %r633, %r632, %r631; 2026-02-21T09:05:30.3075252Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3075308Z cvt.u64.u32 %rd208, %r466; 2026-02-21T09:05:30.3075363Z cvt.u64.u32 %rd209, %r467; 2026-02-21T09:05:30.3075428Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:05:30.3075484Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:05:30.3075646Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3075710Z mov.b64 {%r634, %r635}, %rd211; 2026-02-21T09:05:30.3075769Z cvt.rn.f16x2.f32 %r636, %r635, %r634; 2026-02-21T09:05:30.3075932Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3075998Z cvt.u64.u32 %rd212, %r468; 2026-02-21T09:05:30.3076053Z cvt.u64.u32 %rd213, %r469; 2026-02-21T09:05:30.3076108Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:05:30.3076197Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:05:30.3076372Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3076429Z mov.b64 {%r637, %r638}, %rd215; 2026-02-21T09:05:30.3076490Z cvt.rn.f16x2.f32 %r639, %r638, %r637; 2026-02-21T09:05:30.3076661Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3076718Z cvt.u64.u32 %rd216, %r470; 2026-02-21T09:05:30.3076773Z cvt.u64.u32 %rd217, %r471; 2026-02-21T09:05:30.3076837Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:05:30.3076892Z or.b64 %rd219, %rd216, 
%rd218; 2026-02-21T09:05:30.3077059Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3077118Z mov.b64 {%r640, %r641}, %rd219; 2026-02-21T09:05:30.3077186Z cvt.rn.f16x2.f32 %r642, %r641, %r640; 2026-02-21T09:05:30.3077356Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3077412Z cvt.u64.u32 %rd220, %r472; 2026-02-21T09:05:30.3077476Z cvt.u64.u32 %rd221, %r473; 2026-02-21T09:05:30.3077532Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:05:30.3077610Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:05:30.3077783Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3077840Z mov.b64 {%r643, %r644}, %rd223; 2026-02-21T09:05:30.3077899Z cvt.rn.f16x2.f32 %r645, %r644, %r643; 2026-02-21T09:05:30.3078067Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3078132Z cvt.u64.u32 %rd224, %r474; 2026-02-21T09:05:30.3078212Z cvt.u64.u32 %rd225, %r475; 2026-02-21T09:05:30.3078267Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:05:30.3078329Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:05:30.3078493Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3078551Z mov.b64 {%r646, %r647}, %rd227; 2026-02-21T09:05:30.3078617Z cvt.rn.f16x2.f32 %r648, %r647, %r646; 2026-02-21T09:05:30.3078781Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3078836Z cvt.u64.u32 %rd228, %r477; 2026-02-21T09:05:30.3078891Z cvt.u64.u32 %rd229, %r478; 2026-02-21T09:05:30.3078955Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:05:30.3079010Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:05:30.3079176Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3079239Z mov.b64 {%r649, %r650}, %rd231; 2026-02-21T09:05:30.3079318Z cvt.rn.f16x2.f32 %r651, %r650, %r649; 2026-02-21T09:05:30.3079484Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3079547Z cvt.u64.u32 %rd232, %r479; 2026-02-21T09:05:30.3079602Z cvt.u64.u32 %rd233, %r480; 2026-02-21T09:05:30.3079661Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:05:30.3079718Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:05:30.3079890Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3079946Z mov.b64 {%r652, %r653}, %rd235; 2026-02-21T09:05:30.3080007Z cvt.rn.f16x2.f32 %r654, %r653, %r652; 2026-02-21T09:05:30.3080174Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3080230Z cvt.u64.u32 %rd236, %r481; 2026-02-21T09:05:30.3080286Z cvt.u64.u32 %rd237, %r482; 2026-02-21T09:05:30.3080348Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:05:30.3080405Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:05:30.3080571Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3080627Z mov.b64 {%r655, %r656}, %rd239; 2026-02-21T09:05:30.3080695Z cvt.rn.f16x2.f32 %r657, %r656, %r655; 2026-02-21T09:05:30.3080881Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3080937Z cvt.u64.u32 %rd240, %r483; 2026-02-21T09:05:30.3081001Z cvt.u64.u32 %rd241, %r484; 2026-02-21T09:05:30.3081057Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:05:30.3081113Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:05:30.3081289Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3081346Z mov.b64 {%r658, %r659}, %rd243; 2026-02-21T09:05:30.3081407Z cvt.rn.f16x2.f32 %r660, %r659, %r658; 2026-02-21T09:05:30.3081571Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3081636Z cvt.u64.u32 %rd244, %r485; 2026-02-21T09:05:30.3081691Z cvt.u64.u32 %rd245, %r486; 2026-02-21T09:05:30.3081747Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:05:30.3081811Z or.b64 %rd247, %rd244, 
%rd246; 2026-02-21T09:05:30.3081977Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3082033Z mov.b64 {%r661, %r662}, %rd247; 2026-02-21T09:05:30.3082101Z cvt.rn.f16x2.f32 %r663, %r662, %r661; 2026-02-21T09:05:30.3082283Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3082342Z cvt.u64.u32 %rd248, %r487; 2026-02-21T09:05:30.3082398Z cvt.u64.u32 %rd249, %r488; 2026-02-21T09:05:30.3082463Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:05:30.3082520Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T09:05:30.3082686Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3082775Z mov.b64 {%r664, %r665}, %rd251; 2026-02-21T09:05:30.3082837Z cvt.rn.f16x2.f32 %r666, %r665, %r664; 2026-02-21T09:05:30.3083003Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3083075Z cvt.u64.u32 %rd252, %r489; 2026-02-21T09:05:30.3083133Z cvt.u64.u32 %rd253, %r490; 2026-02-21T09:05:30.3083191Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:05:30.3083249Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:05:30.3083437Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3083497Z mov.b64 {%r667, %r668}, %rd255; 2026-02-21T09:05:30.3083561Z cvt.rn.f16x2.f32 %r669, %r668, %r667; 2026-02-21T09:05:30.3083744Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3083803Z cvt.u64.u32 %rd256, %r491; 2026-02-21T09:05:30.3083880Z cvt.u64.u32 %rd257, %r492; 2026-02-21T09:05:30.3083949Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:05:30.3084007Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T09:05:30.3084179Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3084240Z mov.b64 {%r670, %r671}, %rd259; 2026-02-21T09:05:30.3084309Z cvt.rn.f16x2.f32 %r672, %r671, %r670; 2026-02-21T09:05:30.3084480Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3084539Z cvt.u64.u32 %rd260, %r494; 2026-02-21T09:05:30.3084604Z cvt.u64.u32 %rd261, %r495; 2026-02-21T09:05:30.3084662Z shl.b64 %rd262, %rd261, 32; 2026-02-21T09:05:30.3084755Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T09:05:30.3084930Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3084989Z mov.b64 {%r673, %r674}, %rd263; 2026-02-21T09:05:30.3085054Z cvt.rn.f16x2.f32 %r675, %r674, %r673; 2026-02-21T09:05:30.3085220Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3085288Z cvt.u64.u32 %rd264, %r496; 2026-02-21T09:05:30.3085346Z cvt.u64.u32 %rd265, %r497; 2026-02-21T09:05:30.3085404Z shl.b64 %rd266, %rd265, 32; 2026-02-21T09:05:30.3085502Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T09:05:30.3085676Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3085736Z mov.b64 {%r676, %r677}, %rd267; 2026-02-21T09:05:30.3085807Z cvt.rn.f16x2.f32 %r678, %r677, %r676; 2026-02-21T09:05:30.3085981Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3086039Z cvt.u64.u32 %rd268, %r498; 2026-02-21T09:05:30.3086096Z cvt.u64.u32 %rd269, %r499; 2026-02-21T09:05:30.3086161Z shl.b64 %rd270, %rd269, 32; 2026-02-21T09:05:30.3086218Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T09:05:30.3086395Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3086463Z mov.b64 {%r679, %r680}, %rd271; 2026-02-21T09:05:30.3086527Z cvt.rn.f16x2.f32 %r681, %r680, %r679; 2026-02-21T09:05:30.3086699Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3086768Z cvt.u64.u32 %rd272, %r500; 2026-02-21T09:05:30.3086827Z cvt.u64.u32 %rd273, %r501; 2026-02-21T09:05:30.3086919Z shl.b64 %rd274, %rd273, 32; 2026-02-21T09:05:30.3086980Z or.b64 %rd275, %rd272, 
%rd274; 2026-02-21T09:05:30.3087157Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3087215Z mov.b64 {%r682, %r683}, %rd275; 2026-02-21T09:05:30.3087278Z cvt.rn.f16x2.f32 %r684, %r683, %r682; 2026-02-21T09:05:30.3087454Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3087513Z cvt.u64.u32 %rd276, %r502; 2026-02-21T09:05:30.3087608Z cvt.u64.u32 %rd277, %r503; 2026-02-21T09:05:30.3087674Z shl.b64 %rd278, %rd277, 32; 2026-02-21T09:05:30.3087733Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T09:05:30.3087909Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3087969Z mov.b64 {%r685, %r686}, %rd279; 2026-02-21T09:05:30.3088037Z cvt.rn.f16x2.f32 %r687, %r686, %r685; 2026-02-21T09:05:30.3088211Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3088270Z cvt.u64.u32 %rd280, %r504; 2026-02-21T09:05:30.3088335Z cvt.u64.u32 %rd281, %r505; 2026-02-21T09:05:30.3088393Z shl.b64 %rd282, %rd281, 32; 2026-02-21T09:05:30.3088451Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T09:05:30.3088632Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3088716Z mov.b64 {%r688, %r689}, %rd283; 2026-02-21T09:05:30.3088782Z cvt.rn.f16x2.f32 %r690, %r689, %r688; 2026-02-21T09:05:30.3088953Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3089017Z cvt.u64.u32 %rd284, %r506; 2026-02-21T09:05:30.3089077Z cvt.u64.u32 %rd285, %r507; 2026-02-21T09:05:30.3089135Z shl.b64 %rd286, %rd285, 32; 2026-02-21T09:05:30.3089200Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T09:05:30.3089372Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3089431Z mov.b64 {%r691, %r692}, %rd287; 2026-02-21T09:05:30.3089500Z cvt.rn.f16x2.f32 %r693, %r692, %r691; 2026-02-21T09:05:30.3089673Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3089731Z cvt.u64.u32 %rd288, %r508; 2026-02-21T09:05:30.3089788Z cvt.u64.u32 %rd289, %r509; 2026-02-21T09:05:30.3089856Z shl.b64 %rd290, %rd289, 32; 2026-02-21T09:05:30.3089915Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T09:05:30.3090086Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3090151Z mov.b64 {%r694, %r695}, %rd291; 2026-02-21T09:05:30.3090238Z cvt.rn.f16x2.f32 %r696, %r695, %r694; 2026-02-21T09:05:30.3090409Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3090474Z cvt.u64.u32 %rd292, %r511; 2026-02-21T09:05:30.3090533Z cvt.u64.u32 %rd293, %r512; 2026-02-21T09:05:30.3090592Z shl.b64 %rd294, %rd293, 32; 2026-02-21T09:05:30.3090650Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T09:05:30.3090825Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3090884Z mov.b64 {%r697, %r698}, %rd295; 2026-02-21T09:05:30.3090947Z cvt.rn.f16x2.f32 %r699, %r698, %r697; 2026-02-21T09:05:30.3091138Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3091194Z cvt.u64.u32 %rd296, %r513; 2026-02-21T09:05:30.3091249Z cvt.u64.u32 %rd297, %r514; 2026-02-21T09:05:30.3091313Z shl.b64 %rd298, %rd297, 32; 2026-02-21T09:05:30.3091370Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T09:05:30.3091532Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3091588Z mov.b64 {%r700, %r701}, %rd299; 2026-02-21T09:05:30.3091676Z cvt.rn.f16x2.f32 %r702, %r701, %r700; 2026-02-21T09:05:30.3091835Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3091892Z cvt.u64.u32 %rd300, %r515; 2026-02-21T09:05:30.3091956Z cvt.u64.u32 %rd301, %r516; 2026-02-21T09:05:30.3092014Z shl.b64 %rd302, %rd301, 32; 2026-02-21T09:05:30.3092070Z or.b64 %rd303, %rd300, 
%rd302; 2026-02-21T09:05:30.3092238Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3092327Z mov.b64 {%r703, %r704}, %rd303; 2026-02-21T09:05:30.3092391Z cvt.rn.f16x2.f32 %r705, %r704, %r703; 2026-02-21T09:05:30.3092559Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3092628Z cvt.u64.u32 %rd304, %r517; 2026-02-21T09:05:30.3092685Z cvt.u64.u32 %rd305, %r518; 2026-02-21T09:05:30.3092743Z shl.b64 %rd306, %rd305, 32; 2026-02-21T09:05:30.3092809Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T09:05:30.3092971Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3093031Z mov.b64 {%r706, %r707}, %rd307; 2026-02-21T09:05:30.3093094Z cvt.rn.f16x2.f32 %r708, %r707, %r706; 2026-02-21T09:05:30.3093267Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3093325Z cvt.u64.u32 %rd308, %r519; 2026-02-21T09:05:30.3093410Z cvt.u64.u32 %rd309, %r520; 2026-02-21T09:05:30.3093476Z shl.b64 %rd310, %rd309, 32; 2026-02-21T09:05:30.3093533Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T09:05:30.3093696Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3093759Z mov.b64 {%r709, %r710}, %rd311; 2026-02-21T09:05:30.3093818Z cvt.rn.f16x2.f32 %r711, %r710, %r709; 2026-02-21T09:05:30.3093984Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3094048Z cvt.u64.u32 %rd312, %r521; 2026-02-21T09:05:30.3094105Z cvt.u64.u32 %rd313, %r522; 2026-02-21T09:05:30.3094163Z shl.b64 %rd314, %rd313, 32; 2026-02-21T09:05:30.3094217Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T09:05:30.3094383Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3094438Z mov.b64 {%r712, %r713}, %rd315; 2026-02-21T09:05:30.3094499Z cvt.rn.f16x2.f32 %r714, %r713, %r712; 2026-02-21T09:05:30.3094695Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3094756Z cvt.u64.u32 %rd316, %r523; 2026-02-21T09:05:30.3094812Z cvt.u64.u32 %rd317, %r524; 2026-02-21T09:05:30.3094911Z shl.b64 %rd318, %rd317, 32; 2026-02-21T09:05:30.3094975Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T09:05:30.3095136Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3095194Z mov.b64 {%r715, %r716}, %rd319; 2026-02-21T09:05:30.3095265Z cvt.rn.f16x2.f32 %r717, %r716, %r715; 2026-02-21T09:05:30.3095424Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3095482Z cvt.u64.u32 %rd320, %r525; 2026-02-21T09:05:30.3095545Z cvt.u64.u32 %rd321, %r526; 2026-02-21T09:05:30.3095600Z shl.b64 %rd322, %rd321, 32; 2026-02-21T09:05:30.3095657Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T09:05:30.3095821Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3095883Z mov.b64 {%r718, %r719}, %rd323; 2026-02-21T09:05:30.3095943Z cvt.rn.f16x2.f32 %r720, %r719, %r718; 2026-02-21T09:05:30.3096106Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3096170Z cvt.u64.u32 %rd324, %r528; 2026-02-21T09:05:30.3096226Z cvt.u64.u32 %rd325, %r529; 2026-02-21T09:05:30.3096315Z shl.b64 %rd326, %rd325, 32; 2026-02-21T09:05:30.3096379Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T09:05:30.3096538Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3096595Z mov.b64 {%r721, %r722}, %rd327; 2026-02-21T09:05:30.3096655Z cvt.rn.f16x2.f32 %r723, %r722, %r721; 2026-02-21T09:05:30.3096825Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3096907Z cvt.u64.u32 %rd328, %r530; 2026-02-21T09:05:30.3096962Z cvt.u64.u32 %rd329, %r531; 2026-02-21T09:05:30.3097026Z shl.b64 %rd330, %rd329, 32; 2026-02-21T09:05:30.3097082Z or.b64 %rd331, %rd328, 
%rd330; 2026-02-21T09:05:30.3097245Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3097308Z mov.b64 {%r724, %r725}, %rd331; 2026-02-21T09:05:30.3097367Z cvt.rn.f16x2.f32 %r726, %r725, %r724; 2026-02-21T09:05:30.3097531Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3097594Z cvt.u64.u32 %rd332, %r532; 2026-02-21T09:05:30.3097649Z cvt.u64.u32 %rd333, %r533; 2026-02-21T09:05:30.3097706Z shl.b64 %rd334, %rd333, 32; 2026-02-21T09:05:30.3097764Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T09:05:30.3097961Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3098019Z mov.b64 {%r727, %r728}, %rd335; 2026-02-21T09:05:30.3098080Z cvt.rn.f16x2.f32 %r729, %r728, %r727; 2026-02-21T09:05:30.3098246Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3098305Z cvt.u64.u32 %rd336, %r534; 2026-02-21T09:05:30.3098358Z cvt.u64.u32 %rd337, %r535; 2026-02-21T09:05:30.3098414Z shl.b64 %rd338, %rd337, 32; 2026-02-21T09:05:30.3098478Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T09:05:30.3098646Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3098701Z mov.b64 {%r730, %r731}, %rd339; 2026-02-21T09:05:30.3098769Z cvt.rn.f16x2.f32 %r732, %r731, %r730; 2026-02-21T09:05:30.3098926Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3098982Z cvt.u64.u32 %rd340, %r536; 2026-02-21T09:05:30.3099043Z cvt.u64.u32 %rd341, %r537; 2026-02-21T09:05:30.3099099Z shl.b64 %rd342, %rd341, 32; 2026-02-21T09:05:30.3099157Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T09:05:30.3099319Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3099383Z mov.b64 {%r733, %r734}, %rd343; 2026-02-21T09:05:30.3099466Z cvt.rn.f16x2.f32 %r735, %r734, %r733; 2026-02-21T09:05:30.3099636Z .loc 1 56 52 
// c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3099701Z cvt.u64.u32 %rd344, %r538; 2026-02-21T09:05:30.3099756Z cvt.u64.u32 %rd345, %r539; 2026-02-21T09:05:30.3099812Z shl.b64 %rd346, %rd345, 32; 2026-02-21T09:05:30.3099876Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T09:05:30.3100042Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3100098Z mov.b64 {%r736, %r737}, %rd347; 2026-02-21T09:05:30.3100159Z cvt.rn.f16x2.f32 %r738, %r737, %r736; 2026-02-21T09:05:30.3100329Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3100387Z cvt.u64.u32 %rd348, %r540; 2026-02-21T09:05:30.3100443Z cvt.u64.u32 %rd349, %r541; 2026-02-21T09:05:30.3100508Z shl.b64 %rd350, %rd349, 32; 2026-02-21T09:05:30.3100566Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T09:05:30.3100732Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3100818Z mov.b64 {%r739, %r740}, %rd351; 2026-02-21T09:05:30.3100882Z cvt.rn.f16x2.f32 %r741, %r740, %r739; 2026-02-21T09:05:30.3101052Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3101110Z cvt.u64.u32 %rd352, %r542; 2026-02-21T09:05:30.3101174Z cvt.u64.u32 %rd353, %r543; 2026-02-21T09:05:30.3101230Z shl.b64 %rd354, %rd353, 32; 2026-02-21T09:05:30.3101285Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T09:05:30.3101458Z .loc 1 58 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:58:27 2026-02-21T09:05:30.3101541Z mov.b64 {%r742, %r743}, %rd355; 2026-02-21T09:05:30.3101602Z cvt.rn.f16x2.f32 %r744, %r743, %r742; 2026-02-21T09:05:30.3101768Z .loc 1 59 45 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:59:45 2026-02-21T09:05:30.3101839Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.3101894Z bar.sync 0, 128; 2026-02-21T09:05:30.3101988Z st.shared.v4.b32 [%r49], {%r555, %r558, %r561, %r564}; 2026-02-21T09:05:30.3102094Z 
st.shared.v4.b32 [%r49+4096], {%r579, %r582, %r585, %r588}; 2026-02-21T09:05:30.3102187Z st.shared.v4.b32 [%r49+8192], {%r603, %r606, %r609, %r612}; 2026-02-21T09:05:30.3102283Z st.shared.v4.b32 [%r49+12288], {%r627, %r630, %r633, %r636}; 2026-02-21T09:05:30.3102384Z st.shared.v4.b32 [%r49+16384], {%r651, %r654, %r657, %r660}; 2026-02-21T09:05:30.3102471Z st.shared.v4.b32 [%r49+20480], {%r675, %r678, %r681, %r684}; 2026-02-21T09:05:30.3102582Z st.shared.v4.b32 [%r49+24576], {%r699, %r702, %r705, %r708}; 2026-02-21T09:05:30.3102678Z st.shared.v4.b32 [%r49+28672], {%r723, %r726, %r729, %r732}; 2026-02-21T09:05:30.3102764Z st.shared.v4.b32 [%r50], {%r567, %r570, %r573, %r576}; 2026-02-21T09:05:30.3102854Z st.shared.v4.b32 [%r50+4096], {%r591, %r594, %r597, %r600}; 2026-02-21T09:05:30.3102948Z st.shared.v4.b32 [%r50+8192], {%r615, %r618, %r621, %r624}; 2026-02-21T09:05:30.3103035Z st.shared.v4.b32 [%r50+12288], {%r639, %r642, %r645, %r648}; 2026-02-21T09:05:30.3103122Z st.shared.v4.b32 [%r50+16384], {%r663, %r666, %r669, %r672}; 2026-02-21T09:05:30.3103209Z st.shared.v4.b32 [%r50+20480], {%r687, %r690, %r693, %r696}; 2026-02-21T09:05:30.3103303Z st.shared.v4.b32 [%r50+24576], {%r711, %r714, %r717, %r720}; 2026-02-21T09:05:30.3103388Z st.shared.v4.b32 [%r50+28672], {%r735, %r738, %r741, %r744}; 2026-02-21T09:05:30.3103445Z // begin inline asm 2026-02-21T09:05:30.3103526Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.3103582Z // end inline asm 2026-02-21T09:05:30.3103637Z bar.sync 0, 128; 2026-02-21T09:05:30.3103708Z elect.sync %r745|%p134, -1; 2026-02-21T09:05:30.3103770Z and.pred %p131, %p133, %p134; 2026-02-21T09:05:30.3103825Z add.s32 %r546, %r803, %r52; 2026-02-21T09:05:30.3103905Z // begin inline asm 2026-02-21T09:05:30.3104095Z @%p131 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd99, {%r802, %r546}], [%r547]; 2026-02-21T09:05:30.3104148Z // end inline asm 2026-02-21T09:05:30.3104214Z cp.async.bulk.commit_group; 2026-02-21T09:05:30.3104382Z .loc 1 
33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3104440Z add.s32 %r548, %r551, 165968; 2026-02-21T09:05:30.3104604Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3104663Z bar.sync 0, 128; 2026-02-21T09:05:30.3104751Z // begin inline asm 2026-02-21T09:05:30.3104843Z @%p137 mbarrier.arrive.shared::cta.b64 _, [%r548]; 2026-02-21T09:05:30.3104901Z // end inline asm 2026-02-21T09:05:30.3104968Z add.s32 %r746, %r806, 1; 2026-02-21T09:05:30.3105031Z setp.eq.b32 %p135, %r746, 2; 2026-02-21T09:05:30.3105097Z selp.b32 %r806, 0, %r746, %p135; 2026-02-21T09:05:30.3105164Z selp.b32 %r805, 1, 0, %p135; 2026-02-21T09:05:30.3105248Z $L__BB0_22: // %.thread24 2026-02-21T09:05:30.3105346Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.3105565Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3105623Z xor.b32 %r808, %r808, %r805; 2026-02-21T09:05:30.3105789Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3105845Z add.s32 %r801, %r801, 1; 2026-02-21T09:05:30.3105916Z setp.lt.s32 %p136, %r801, %r45; 2026-02-21T09:05:30.3105975Z @%p136 bra $L__BB0_18; 2026-02-21T09:05:30.3106033Z bra.uni $L__BB0_23; 2026-02-21T09:05:30.3106189Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:30.3106246Z add.s32 %r406, %r800, 1; 2026-02-21T09:05:30.3106306Z setp.eq.b32 %p128, %r800, 127; 2026-02-21T09:05:30.3106367Z selp.b32 %r800, 0, %r406, %p128; 2026-02-21T09:05:30.3106434Z setp.eq.b32 %p129, %r800, 127; 2026-02-21T09:05:30.3106491Z @%p129 bra $L__BB0_21; 2026-02-21T09:05:30.3106586Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.3106759Z .loc 1 0 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:74 2026-02-21T09:05:30.3106814Z mov.b32 %r805, 0; 2026-02-21T09:05:30.3106975Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 
2026-02-21T09:05:30.3107040Z setp.ne.b32 %p130, %r800, 0; 2026-02-21T09:05:30.3107095Z @%p130 bra $L__BB0_22; 2026-02-21T09:05:30.3107205Z // %bb.20: // %.thread 2026-02-21T09:05:30.3107295Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.3107356Z add.s32 %r804, %r804, 1; 2026-02-21T09:05:30.3107523Z .loc 1 39 35 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:39:35 2026-02-21T09:05:30.3107581Z shr.s32 %r748, %r804, 31; 2026-02-21T09:05:30.3107645Z shr.u32 %r749, %r748, 28; 2026-02-21T09:05:30.3107701Z add.s32 %r750, %r804, %r749; 2026-02-21T09:05:30.3107756Z shr.s32 %r751, %r750, 4; 2026-02-21T09:05:30.3107930Z .loc 1 40 33 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:40:33 2026-02-21T09:05:30.3107986Z shl.b32 %r752, %r751, 2; 2026-02-21T09:05:30.3108151Z .loc 1 41 39 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:41:39 2026-02-21T09:05:30.3108214Z sub.s32 %r753, 128, %r752; 2026-02-21T09:05:30.3108381Z .loc 1 41 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:41:52 2026-02-21T09:05:30.3108437Z min.s32 %r754, %r753, 4; 2026-02-21T09:05:30.3108600Z .loc 1 42 45 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:45 2026-02-21T09:05:30.3108666Z and.b32 %r755, %r750, -16; 2026-02-21T09:05:30.3108749Z sub.s32 %r756, %r804, %r755; 2026-02-21T09:05:30.3108919Z .loc 1 43 51 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:43:51 2026-02-21T09:05:30.3108984Z div.s32 %r757, %r756, %r754; 2026-02-21T09:05:30.3109150Z .loc 1 42 64 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:64 2026-02-21T09:05:30.3109209Z mul.lo.s32 %r758, %r757, %r754; 2026-02-21T09:05:30.3109273Z sub.s32 %r759, %r756, %r758; 2026-02-21T09:05:30.3109439Z .loc 1 42 30 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:30 2026-02-21T09:05:30.3109496Z add.s32 %r760, %r759, %r752; 2026-02-21T09:05:30.3109663Z .loc 1 44 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:44:27 
2026-02-21T09:05:30.3109729Z shl.b32 %r802, %r760, 4; 2026-02-21T09:05:30.3109892Z .loc 1 45 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:45:27 2026-02-21T09:05:30.3109952Z shl.b32 %r803, %r757, 10; 2026-02-21T09:05:30.3110130Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3110185Z bra.uni $L__BB0_22; 2026-02-21T09:05:30.3110302Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:05:30.3110474Z .loc 1 0 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:74 2026-02-21T09:05:30.3110533Z mov.b32 %r76, global_smem; 2026-02-21T09:05:30.3110589Z add.s32 %r77, %r76, %r3; 2026-02-21T09:05:30.3110650Z mov.u32 %r137, %ctaid.x; 2026-02-21T09:05:30.3110705Z max.u32 %r138, %r137, 511; 2026-02-21T09:05:30.3110758Z shl.b32 %r139, %r138, 7; 2026-02-21T09:05:30.3110814Z sub.s32 %r5, 65536, %r139; 2026-02-21T09:05:30.3110903Z setp.lt.s32 %p19, %r5, 1; 2026-02-21T09:05:30.3110956Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.3111051Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3111222Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3111301Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3111356Z barrier.sync 1; 2026-02-21T09:05:30.3111411Z barrier.sync 1; 2026-02-21T09:05:30.3111496Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3111576Z $L__BB0_2: // %.preheader 2026-02-21T09:05:30.3111663Z // =>This Loop Header: Depth=1 2026-02-21T09:05:30.3111755Z // Child Loop BB0_11 Depth 2 2026-02-21T09:05:30.3111838Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:30.3112020Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19 2026-02-21T09:05:30.3112106Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:30.3112161Z barrier.sync 1; 2026-02-21T09:05:30.3112225Z ld.shared.b8 %r75, [%r77+165980]; 2026-02-21T09:05:30.3112285Z setp.gt.u32 %p4, %r75, 3; 2026-02-21T09:05:30.3112348Z @%p4 bra $L__BB0_4; 
2026-02-21T09:05:30.3112426Z // %bb.3: // %.preheader 2026-02-21T09:05:30.3112512Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3112579Z $L_brx_0: .branchtargets 2026-02-21T09:05:30.3112630Z $L__BB0_5, 2026-02-21T09:05:30.3112681Z $L__BB0_9, 2026-02-21T09:05:30.3112740Z $L__BB0_15, 2026-02-21T09:05:30.3112789Z $L__BB0_24; 2026-02-21T09:05:30.3112847Z brx.idx %r75, $L_brx_0; 2026-02-21T09:05:30.3112940Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3113114Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3113188Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3113261Z ld.shared.b32 %r4, [global_smem+131072]; 2026-02-21T09:05:30.3113321Z barrier.sync 1; 2026-02-21T09:05:30.3113400Z @%p19 bra $L__BB0_8; 2026-02-21T09:05:30.3113475Z // %bb.6: // %.lr.ph14 2026-02-21T09:05:30.3113558Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3113726Z .loc 1 0 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:74 2026-02-21T09:05:30.3113781Z mov.b32 %r782, -1; 2026-02-21T09:05:30.3113838Z mov.pred %p150, 0; 2026-02-21T09:05:30.3113899Z mov.b32 %r778, 0; 2026-02-21T09:05:30.3113953Z mov.b32 %r779, %r778; 2026-02-21T09:05:30.3114007Z mov.b32 %r780, %r778; 2026-02-21T09:05:30.3114066Z mov.b32 %r781, %r778; 2026-02-21T09:05:30.3114118Z mov.b32 %r783, %r778; 2026-02-21T09:05:30.3114211Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:30.3114299Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:30.3114471Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3114527Z add.s32 %r162, %r782, 1; 2026-02-21T09:05:30.3114586Z setp.eq.b32 %p40, %r782, 127; 2026-02-21T09:05:30.3114653Z selp.b32 %r782, 0, %r162, %p40; 2026-02-21T09:05:30.3114762Z shl.b32 %r163, %r781, 3; 2026-02-21T09:05:30.3114822Z add.s32 %r165, %r76, %r163; 2026-02-21T09:05:30.3114885Z add.s32 %r166, %r165, 165888; 2026-02-21T09:05:30.3114940Z add.s32 %r142, %r165, 
165920; 2026-02-21T09:05:30.3115103Z .loc 1 54 31 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:54:31 2026-02-21T09:05:30.3115157Z shl.b32 %r167, %r781, 15; 2026-02-21T09:05:30.3115220Z add.s32 %r168, %r76, %r167; 2026-02-21T09:05:30.3115386Z .loc 1 55 44 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:55:44 2026-02-21T09:05:30.3115469Z shl.b32 %r169, %r781, 9; 2026-02-21T09:05:30.3115534Z add.s32 %r170, %r76, %r169; 2026-02-21T09:05:30.3115588Z add.s32 %r171, %r170, 163840; 2026-02-21T09:05:30.3115751Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3115816Z bar.warp.sync -1; 2026-02-21T09:05:30.3115871Z // begin inline asm 2026-02-21T09:05:30.3115920Z 2026-02-21T09:05:30.3115971Z { 2026-02-21T09:05:30.3116037Z .reg .pred complete; 2026-02-21T09:05:30.3116092Z waitLoop: 2026-02-21T09:05:30.3116211Z mbarrier.try_wait.parity.shared.b64 complete, [%r142], %r780; 2026-02-21T09:05:30.3116281Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.3116330Z } 2026-02-21T09:05:30.3116335Z 2026-02-21T09:05:30.3116390Z // end inline asm 2026-02-21T09:05:30.3116590Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3116660Z shl.b32 %r172, %r779, 7; 2026-02-21T09:05:30.3116717Z add.s32 %r144, %r172, %r4; 2026-02-21T09:05:30.3116879Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3116943Z shl.b32 %r173, %r779, 3; 2026-02-21T09:05:30.3116999Z add.s32 %r174, %r76, %r173; 2026-02-21T09:05:30.3117055Z add.s32 %r175, %r174, 165952; 2026-02-21T09:05:30.3117121Z setp.eq.b32 %p39, %r782, 127; 2026-02-21T09:05:30.3117286Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3117347Z elect.sync %r176|%p22, -1; 2026-02-21T09:05:30.3117407Z bfe.u32 %r177, %r168, 4, 14; 2026-02-21T09:05:30.3117471Z cvt.u64.u32 %rd33, %r177; 2026-02-21T09:05:30.3117544Z or.b64 %rd15, %rd33, 
-4611685949573693440; 2026-02-21T09:05:30.3117602Z bfe.u32 %r178, %r171, 4, 14; 2026-02-21T09:05:30.3117668Z cvt.u64.u32 %rd34, %r178; 2026-02-21T09:05:30.3117739Z or.b64 %rd16, %rd34, -4611685949705814016; 2026-02-21T09:05:30.3117797Z mov.b32 %r145, 134479888; 2026-02-21T09:05:30.3117853Z // begin inline asm 2026-02-21T09:05:30.3118007Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 0 ], %rd15, %rd16, %r145, %p150; 2026-02-21T09:05:30.3118088Z // end inline asm 2026-02-21T09:05:30.3118143Z add.s32 %r179, %r168, 4096; 2026-02-21T09:05:30.3118208Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T09:05:30.3118264Z cvt.u64.u32 %rd35, %r180; 2026-02-21T09:05:30.3118332Z or.b64 %rd17, %rd35, -4611685949573693440; 2026-02-21T09:05:30.3118392Z // begin inline asm 2026-02-21T09:05:30.3118529Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 16 ], %rd17, %rd16, %r145, %p150; 2026-02-21T09:05:30.3118582Z // end inline asm 2026-02-21T09:05:30.3118636Z add.s32 %r181, %r168, 8192; 2026-02-21T09:05:30.3118699Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T09:05:30.3118756Z cvt.u64.u32 %rd36, %r182; 2026-02-21T09:05:30.3118822Z or.b64 %rd19, %rd36, -4611685949573693440; 2026-02-21T09:05:30.3118886Z // begin inline asm 2026-02-21T09:05:30.3119019Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 32 ], %rd19, %rd16, %r145, %p150; 2026-02-21T09:05:30.3119071Z // end inline asm 2026-02-21T09:05:30.3119132Z add.s32 %r183, %r168, 12288; 2026-02-21T09:05:30.3119188Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T09:05:30.3119243Z cvt.u64.u32 %rd37, %r184; 2026-02-21T09:05:30.3119305Z or.b64 %rd21, %rd37, -4611685949573693440; 2026-02-21T09:05:30.3119367Z // begin inline asm 2026-02-21T09:05:30.3119518Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 48 ], %rd21, %rd16, %r145, %p150; 2026-02-21T09:05:30.3119573Z // end inline asm 2026-02-21T09:05:30.3119637Z add.s32 %r185, %r168, 16384; 2026-02-21T09:05:30.3119691Z bfe.u32 %r186, %r185, 4, 14; 2026-02-21T09:05:30.3119746Z cvt.u64.u32 %rd38, %r186; 
2026-02-21T09:05:30.3119809Z or.b64 %rd23, %rd38, -4611685949573693440; 2026-02-21T09:05:30.3119871Z // begin inline asm 2026-02-21T09:05:30.3120000Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 64 ], %rd23, %rd16, %r145, %p150; 2026-02-21T09:05:30.3120075Z // end inline asm 2026-02-21T09:05:30.3120137Z add.s32 %r187, %r168, 20480; 2026-02-21T09:05:30.3120191Z bfe.u32 %r188, %r187, 4, 14; 2026-02-21T09:05:30.3120246Z cvt.u64.u32 %rd39, %r188; 2026-02-21T09:05:30.3120318Z or.b64 %rd25, %rd39, -4611685949573693440; 2026-02-21T09:05:30.3120373Z // begin inline asm 2026-02-21T09:05:30.3120502Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 80 ], %rd25, %rd16, %r145, %p150; 2026-02-21T09:05:30.3120556Z // end inline asm 2026-02-21T09:05:30.3120618Z add.s32 %r189, %r168, 24576; 2026-02-21T09:05:30.3120672Z bfe.u32 %r190, %r189, 4, 14; 2026-02-21T09:05:30.3120728Z cvt.u64.u32 %rd40, %r190; 2026-02-21T09:05:30.3120801Z or.b64 %rd27, %rd40, -4611685949573693440; 2026-02-21T09:05:30.3120855Z // begin inline asm 2026-02-21T09:05:30.3120983Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 96 ], %rd27, %rd16, %r145, %p150; 2026-02-21T09:05:30.3121062Z // end inline asm 2026-02-21T09:05:30.3121119Z add.s32 %r191, %r168, 28672; 2026-02-21T09:05:30.3121174Z bfe.u32 %r192, %r191, 4, 14; 2026-02-21T09:05:30.3121229Z cvt.u64.u32 %rd41, %r192; 2026-02-21T09:05:30.3121300Z or.b64 %rd29, %rd41, -4611685949573693440; 2026-02-21T09:05:30.3121355Z // begin inline asm 2026-02-21T09:05:30.3121487Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r144 + 112 ], %rd29, %rd16, %r145, %p150; 2026-02-21T09:05:30.3121544Z // end inline asm 2026-02-21T09:05:30.3121598Z cvt.u64.u32 %rd31, %r166; 2026-02-21T09:05:30.3121654Z // begin inline asm 2026-02-21T09:05:30.3121784Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd31]; 2026-02-21T09:05:30.3121836Z // end inline asm 2026-02-21T09:05:30.3121898Z and.pred %p38, %p39, %p22; 2026-02-21T09:05:30.3121953Z cvt.u64.u32 %rd32, 
%r175; 2026-02-21T09:05:30.3122014Z // begin inline asm 2026-02-21T09:05:30.3122133Z @%p38 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd32]; 2026-02-21T09:05:30.3122185Z // end inline asm 2026-02-21T09:05:30.3122355Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3122416Z setp.ne.b32 %p150, %r782, 127; 2026-02-21T09:05:30.3122587Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3122680Z add.s32 %r193, %r779, 1; 2026-02-21T09:05:30.3122739Z setp.eq.b32 %p41, %r193, 2; 2026-02-21T09:05:30.3122798Z selp.b32 %r194, 0, %r193, %p41; 2026-02-21T09:05:30.3122863Z selp.b32 %r779, %r194, %r779, %p39; 2026-02-21T09:05:30.3122931Z and.pred %p42, %p39, %p41; 2026-02-21T09:05:30.3122987Z selp.b32 %r195, 1, 0, %p42; 2026-02-21T09:05:30.3123043Z xor.b32 %r778, %r778, %r195; 2026-02-21T09:05:30.3123217Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3123273Z shl.b32 %r196, %r779, 3; 2026-02-21T09:05:30.3123326Z add.s32 %r197, %r76, %r196; 2026-02-21T09:05:30.3123383Z add.s32 %r160, %r197, 165968; 2026-02-21T09:05:30.3123557Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3123610Z // begin inline asm 2026-02-21T09:05:30.3123658Z 2026-02-21T09:05:30.3123715Z { 2026-02-21T09:05:30.3123776Z @!%p39 bra.uni skipWait; 2026-02-21T09:05:30.3123834Z .reg .pred complete; 2026-02-21T09:05:30.3123885Z waitLoop: 2026-02-21T09:05:30.3124009Z mbarrier.try_wait.parity.shared.b64 complete, [%r160], %r778; 2026-02-21T09:05:30.3124092Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.3124146Z skipWait: 2026-02-21T09:05:30.3124203Z } 2026-02-21T09:05:30.3124207Z 2026-02-21T09:05:30.3124260Z // end inline asm 2026-02-21T09:05:30.3124416Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3124480Z add.s32 %r198, %r781, 1; 2026-02-21T09:05:30.3124540Z 
setp.eq.b32 %p43, %r198, 4; 2026-02-21T09:05:30.3124601Z selp.b32 %r781, 0, %r198, %p43; 2026-02-21T09:05:30.3124712Z selp.b32 %r199, 1, 0, %p43; 2026-02-21T09:05:30.3124780Z xor.b32 %r780, %r780, %r199; 2026-02-21T09:05:30.3124949Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3125006Z add.s32 %r783, %r783, 1; 2026-02-21T09:05:30.3125076Z setp.lt.s32 %p44, %r783, %r5; 2026-02-21T09:05:30.3125133Z @%p44 bra $L__BB0_7; 2026-02-21T09:05:30.3125217Z $L__BB0_8: // %._crit_edge15 2026-02-21T09:05:30.3125316Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3125373Z barrier.sync 1; 2026-02-21T09:05:30.3125451Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3125507Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.3125610Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3125803Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3125882Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3125944Z barrier.sync 1; 2026-02-21T09:05:30.3126109Z .loc 1 21 67 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:21:67 2026-02-21T09:05:30.3126168Z mov.u32 %r78, %ctaid.y; 2026-02-21T09:05:30.3126233Z mov.u32 %r79, %ctaid.z; 2026-02-21T09:05:30.3126294Z mov.u32 %r80, %nctaid.x; 2026-02-21T09:05:30.3126353Z mov.u32 %r81, %nctaid.y; 2026-02-21T09:05:30.3126418Z mad.lo.s32 %r82, %r79, %r81, %r78; 2026-02-21T09:05:30.3126495Z mad.lo.s32 %r83, %r82, %r80, %r137; 2026-02-21T09:05:30.3126553Z mul.lo.s32 %r84, %r83, 384; 2026-02-21T09:05:30.3126736Z .loc 1 22 67 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:22:67 2026-02-21T09:05:30.3126798Z add.s32 %r85, %r84, 128; 2026-02-21T09:05:30.3126854Z cvt.s64.s32 %rd8, %r85; 2026-02-21T09:05:30.3126910Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:05:30.3126973Z cvta.global.u64 %rd14, %rd9; 2026-02-21T09:05:30.3127145Z .loc 1 21 67 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:21:67 
2026-02-21T09:05:30.3127202Z cvt.s64.s32 %rd10, %r84; 2026-02-21T09:05:30.3127259Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:05:30.3127374Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:05:30.3127545Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3127605Z @%p19 bra $L__BB0_14; 2026-02-21T09:05:30.3127692Z // %bb.10: // %.lr.ph 2026-02-21T09:05:30.3127783Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3127842Z add.s32 %r794, %r137, -1; 2026-02-21T09:05:30.3127901Z add.s32 %r21, %r1, -128; 2026-02-21T09:05:30.3127966Z shr.u32 %r22, %r21, 5; 2026-02-21T09:05:30.3128022Z mov.b32 %r790, -1; 2026-02-21T09:05:30.3128076Z mov.b32 %r784, 0; 2026-02-21T09:05:30.3128141Z mov.b32 %r785, %r784; 2026-02-21T09:05:30.3128199Z mov.b32 %r793, %r784; 2026-02-21T09:05:30.3128257Z mov.b32 %r792, %r784; 2026-02-21T09:05:30.3128314Z mov.b32 %r788, %r784; 2026-02-21T09:05:30.3128376Z mov.b32 %r791, %r784; 2026-02-21T09:05:30.3128433Z bra.uni $L__BB0_11; 2026-02-21T09:05:30.3128535Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:30.3128714Z .loc 1 0 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0:74 2026-02-21T09:05:30.3128778Z selp.b32 %r108, 0, %r788, %p8; 2026-02-21T09:05:30.3128868Z setp.lt.u32 %p13, %r21, 32; 2026-02-21T09:05:30.3128937Z setp.lt.u32 %p14, %r21, 64; 2026-02-21T09:05:30.3128997Z setp.eq.b32 %p9, %r21, 0; 2026-02-21T09:05:30.3129173Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3129231Z shl.b32 %r119, %r785, 3; 2026-02-21T09:05:30.3129296Z add.s32 %r121, %r76, %r119; 2026-02-21T09:05:30.3129355Z add.s32 %r104, %r121, 165888; 2026-02-21T09:05:30.3129524Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3129616Z // begin inline asm 2026-02-21T09:05:30.3129667Z 2026-02-21T09:05:30.3129717Z { 2026-02-21T09:05:30.3129779Z .reg .pred complete; 2026-02-21T09:05:30.3129842Z waitLoop: 
2026-02-21T09:05:30.3129963Z mbarrier.try_wait.parity.shared.b64 complete, [%r104], %r784; 2026-02-21T09:05:30.3130027Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.3130085Z } 2026-02-21T09:05:30.3130088Z 2026-02-21T09:05:30.3130144Z // end inline asm 2026-02-21T09:05:30.3130317Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3130381Z add.s32 %r110, %r121, 165920; 2026-02-21T09:05:30.3130550Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3130605Z bar.sync 3, 64; 2026-02-21T09:05:30.3130661Z // begin inline asm 2026-02-21T09:05:30.3130804Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r110], 33280; 2026-02-21T09:05:30.3130865Z // end inline asm 2026-02-21T09:05:30.3131033Z .loc 1 54 31 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:54:31 2026-02-21T09:05:30.3131099Z shl.b32 %r122, %r785, 15; 2026-02-21T09:05:30.3131156Z add.s32 %r123, %r76, %r122; 2026-02-21T09:05:30.3131323Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3131386Z bar.sync 3, 64; 2026-02-21T09:05:30.3131462Z shfl.sync.idx.b32 %r124, %r22, 0, 31, -1; 2026-02-21T09:05:30.3131526Z elect.sync %r125|%p15, -1; 2026-02-21T09:05:30.3131591Z and.pred %p10, %p14, %p15; 2026-02-21T09:05:30.3131652Z and.b32 %r126, %r124, 3; 2026-02-21T09:05:30.3131710Z shl.b32 %r127, %r126, 13; 2026-02-21T09:05:30.3131767Z add.s32 %r107, %r123, %r127; 2026-02-21T09:05:30.3131832Z shl.b32 %r128, %r126, 8; 2026-02-21T09:05:30.3131897Z or.b32 %r109, %r128, %r793; 2026-02-21T09:05:30.3131975Z // begin inline asm 2026-02-21T09:05:30.3132307Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r107], [%rd12, {%r108, %r109}], [%r110]; 2026-02-21T09:05:30.3132366Z // end inline asm 2026-02-21T09:05:30.3132425Z xor.b32 %r129, %r126, 2; 2026-02-21T09:05:30.3132521Z shl.b32 %r130, %r129, 13; 2026-02-21T09:05:30.3132590Z add.s32 %r111, %r123, %r130; 
2026-02-21T09:05:30.3132647Z shl.b32 %r131, %r129, 8; 2026-02-21T09:05:30.3132704Z or.b32 %r113, %r131, %r793; 2026-02-21T09:05:30.3132771Z // begin inline asm 2026-02-21T09:05:30.3133023Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r111], [%rd12, {%r108, %r113}], [%r110]; 2026-02-21T09:05:30.3133079Z // end inline asm 2026-02-21T09:05:30.3133265Z .loc 1 55 44 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:55:44 2026-02-21T09:05:30.3133324Z shl.b32 %r132, %r785, 9; 2026-02-21T09:05:30.3133384Z add.s32 %r133, %r76, %r132; 2026-02-21T09:05:30.3133446Z add.s32 %r115, %r133, 163840; 2026-02-21T09:05:30.3133625Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3133682Z bar.sync 3, 64; 2026-02-21T09:05:30.3133748Z elect.sync %r134|%p16, -1; 2026-02-21T09:05:30.3133821Z and.pred %p12, %p13, %p16; 2026-02-21T09:05:30.3133878Z // begin inline asm 2026-02-21T09:05:30.3134152Z @%p12 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r115], [%rd14, {%r108, %r792}], [%r110]; 2026-02-21T09:05:30.3134217Z // end inline asm 2026-02-21T09:05:30.3134394Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3134451Z add.s32 %r788, %r108, 16; 2026-02-21T09:05:30.3134620Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3134723Z add.s32 %r135, %r785, 1; 2026-02-21T09:05:30.3134789Z setp.eq.b32 %p17, %r135, 4; 2026-02-21T09:05:30.3134884Z selp.b32 %r785, 0, %r135, %p17; 2026-02-21T09:05:30.3134953Z selp.b32 %r136, 1, 0, %p17; 2026-02-21T09:05:30.3135021Z xor.b32 %r784, %r784, %r136; 2026-02-21T09:05:30.3135185Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3135247Z add.s32 %r791, %r791, 1; 2026-02-21T09:05:30.3135308Z setp.lt.s32 %p18, %r791, %r5; 2026-02-21T09:05:30.3135364Z @%p18 bra $L__BB0_11; 2026-02-21T09:05:30.3135422Z 
bra.uni $L__BB0_14; 2026-02-21T09:05:30.3135529Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:30.3135622Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:30.3135678Z add.s32 %r90, %r790, 1; 2026-02-21T09:05:30.3135745Z setp.eq.b32 %p6, %r790, 127; 2026-02-21T09:05:30.3135803Z selp.b32 %r790, 0, %r90, %p6; 2026-02-21T09:05:30.3135862Z setp.ne.b32 %p7, %r790, 0; 2026-02-21T09:05:30.3135962Z setp.eq.b32 %p8, %r790, 0; 2026-02-21T09:05:30.3136021Z @%p7 bra $L__BB0_13; 2026-02-21T09:05:30.3136115Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:30.3136171Z add.s32 %r794, %r794, 1; 2026-02-21T09:05:30.3136348Z .loc 1 39 35 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:39:35 2026-02-21T09:05:30.3136402Z shr.s32 %r91, %r794, 31; 2026-02-21T09:05:30.3136457Z shr.u32 %r92, %r91, 28; 2026-02-21T09:05:30.3136518Z add.s32 %r93, %r794, %r92; 2026-02-21T09:05:30.3136573Z shr.s32 %r94, %r93, 4; 2026-02-21T09:05:30.3136738Z .loc 1 40 33 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:40:33 2026-02-21T09:05:30.3136797Z shl.b32 %r95, %r94, 2; 2026-02-21T09:05:30.3136960Z .loc 1 41 39 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:41:39 2026-02-21T09:05:30.3137014Z sub.s32 %r96, 128, %r95; 2026-02-21T09:05:30.3137177Z .loc 1 41 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:41:52 2026-02-21T09:05:30.3137241Z min.s32 %r97, %r96, 4; 2026-02-21T09:05:30.3137402Z .loc 1 42 45 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:45 2026-02-21T09:05:30.3137484Z and.b32 %r98, %r93, -16; 2026-02-21T09:05:30.3137547Z sub.s32 %r99, %r794, %r98; 2026-02-21T09:05:30.3137710Z .loc 1 43 51 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:43:51 2026-02-21T09:05:30.3137767Z div.s32 %r100, %r99, %r97; 2026-02-21T09:05:30.3137938Z .loc 1 42 64 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:64 2026-02-21T09:05:30.3137996Z mul.lo.s32 %r101, %r100, %r97; 2026-02-21T09:05:30.3138052Z 
sub.s32 %r102, %r99, %r101; 2026-02-21T09:05:30.3138215Z .loc 1 42 30 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:42:30 2026-02-21T09:05:30.3138278Z add.s32 %r103, %r102, %r95; 2026-02-21T09:05:30.3138441Z .loc 1 44 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:44:27 2026-02-21T09:05:30.3138496Z shl.b32 %r792, %r103, 4; 2026-02-21T09:05:30.3138663Z .loc 1 45 27 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:45:27 2026-02-21T09:05:30.3138719Z shl.b32 %r793, %r100, 10; 2026-02-21T09:05:30.3138772Z bra.uni $L__BB0_13; 2026-02-21T09:05:30.3138857Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:30.3138972Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3139137Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3139198Z barrier.sync 1; 2026-02-21T09:05:30.3139274Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.3139327Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.3139418Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.3139586Z .loc 1 19 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:19 2026-02-21T09:05:30.3139660Z barrier.sync 1; 2026-02-21T09:05:30.3139713Z barrier.sync 1; 2026-02-21T09:05:30.3139774Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.3139860Z $L__BB0_23: // %._crit_edge18 2026-02-21T09:05:30.3140016Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3140093Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.3140148Z bar.sync 0, 128; 2026-02-21T09:05:30.3140200Z barrier.sync 1; 2026-02-21T09:05:30.3140272Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.3140335Z shl.b32 %r776, %r806, 3; 2026-02-21T09:05:30.3140391Z add.s32 %r761, %r375, %r776; 2026-02-21T09:05:30.3140546Z .loc 1 56 52 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:56:52 2026-02-21T09:05:30.3140607Z // begin inline asm 2026-02-21T09:05:30.3140686Z 2026-02-21T09:05:30.3140736Z { 
2026-02-21T09:05:30.3140794Z .reg .pred complete; 2026-02-21T09:05:30.3140856Z waitLoop: 2026-02-21T09:05:30.3140975Z mbarrier.try_wait.parity.shared.b64 complete, [%r761], %r808; 2026-02-21T09:05:30.3141039Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.3141094Z } 2026-02-21T09:05:30.3141098Z 2026-02-21T09:05:30.3141151Z // end inline asm 2026-02-21T09:05:30.3141315Z .loc 1 33 74 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:74 2026-02-21T09:05:30.3141370Z bar.sync 0, 128; 2026-02-21T09:05:30.3141433Z // begin inline asm 2026-02-21T09:05:30.3141521Z @%p137 mbarrier.inval.shared::cta.b64 [%r375]; 2026-02-21T09:05:30.3141573Z // end inline asm 2026-02-21T09:05:30.3141636Z bar.sync 0, 128; 2026-02-21T09:05:30.3141688Z // begin inline asm 2026-02-21T09:05:30.3141771Z @%p137 mbarrier.inval.shared::cta.b64 [%r376]; 2026-02-21T09:05:30.3141833Z // end inline asm 2026-02-21T09:05:30.3141888Z // begin inline asm 2026-02-21T09:05:30.3141966Z @%p137 mbarrier.inval.shared::cta.b64 [%r373]; 2026-02-21T09:05:30.3142020Z // end inline asm 2026-02-21T09:05:30.3142081Z bar.sync 0, 128; 2026-02-21T09:05:30.3142135Z // begin inline asm 2026-02-21T09:05:30.3142208Z @%p137 mbarrier.inval.shared::cta.b64 [%r374]; 2026-02-21T09:05:30.3142285Z // end inline asm 2026-02-21T09:05:30.3142338Z // begin inline asm 2026-02-21T09:05:30.3142412Z @%p137 mbarrier.inval.shared::cta.b64 [%r365]; 2026-02-21T09:05:30.3142463Z // end inline asm 2026-02-21T09:05:30.3142522Z bar.sync 0, 128; 2026-02-21T09:05:30.3142574Z // begin inline asm 2026-02-21T09:05:30.3142647Z @%p137 mbarrier.inval.shared::cta.b64 [%r366]; 2026-02-21T09:05:30.3142705Z // end inline asm 2026-02-21T09:05:30.3142756Z bar.sync 0, 128; 2026-02-21T09:05:30.3142809Z // begin inline asm 2026-02-21T09:05:30.3142888Z @%p137 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:05:30.3142940Z // end inline asm 2026-02-21T09:05:30.3142992Z bar.sync 0, 128; 2026-02-21T09:05:30.3143045Z // begin inline asm 
2026-02-21T09:05:30.3143127Z @%p137 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:05:30.3143178Z // end inline asm 2026-02-21T09:05:30.3143231Z // begin inline asm 2026-02-21T09:05:30.3143309Z @%p137 mbarrier.inval.shared::cta.b64 [%r361]; 2026-02-21T09:05:30.3143361Z // end inline asm 2026-02-21T09:05:30.3143412Z bar.sync 0, 128; 2026-02-21T09:05:30.3143464Z // begin inline asm 2026-02-21T09:05:30.3143543Z @%p137 mbarrier.inval.shared::cta.b64 [%r362]; 2026-02-21T09:05:30.3143613Z // end inline asm 2026-02-21T09:05:30.3143666Z bar.sync 0, 128; 2026-02-21T09:05:30.3143723Z // begin inline asm 2026-02-21T09:05:30.3143795Z @%p137 mbarrier.inval.shared::cta.b64 [%r363]; 2026-02-21T09:05:30.3143847Z // end inline asm 2026-02-21T09:05:30.3143898Z bar.sync 0, 128; 2026-02-21T09:05:30.3143957Z // begin inline asm 2026-02-21T09:05:30.3144029Z @%p137 mbarrier.inval.shared::cta.b64 [%r364]; 2026-02-21T09:05:30.3144078Z // end inline asm 2026-02-21T09:05:30.3144249Z .loc 1 33 4 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:33:4 2026-02-21T09:05:30.3144325Z bar.sync 0, 128; 2026-02-21T09:05:30.3144378Z // begin inline asm 2026-02-21T09:05:30.3144497Z @%p45 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r775, 256; 2026-02-21T09:05:30.3144551Z // end inline asm 2026-02-21T09:05:30.3144627Z st.shared.b32 [global_smem+165984], 50529027; 2026-02-21T09:05:30.3144711Z barrier.sync 1; 2026-02-21T09:05:30.3144801Z $L__BB0_24: // %common.ret 2026-02-21T09:05:30.3144962Z .loc 1 0 0 // c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py:0 2026-02-21T09:05:30.3145013Z ret; 2026-02-21T09:05:30.3145073Z $L__tmp1: 2026-02-21T09:05:30.3145126Z $L__func_end0: 2026-02-21T09:05:30.3145205Z // -- End function 2026-02-21T09:05:30.3145258Z } 2026-02-21T09:05:30.3145488Z .file 1 "/tmp/torchinductor_root/4s/c4sm7maoxp6rwdjbeoqeyzpdt4ssfeeb3iosfrrxvm5vm6w3c3ca.py" 2026-02-21T09:05:30.3145550Z .section .debug_abbrev 2026-02-21T09:05:30.3145599Z { 2026-02-21T09:05:30.3145690Z .b8 
1 // Abbreviation Code 2026-02-21T09:05:30.3145774Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:30.3145852Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.3145936Z .b8 37 // DW_AT_producer 2026-02-21T09:05:30.3146009Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.3146082Z .b8 19 // DW_AT_language 2026-02-21T09:05:30.3146164Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:30.3146237Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.3146307Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.3146382Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:30.3146463Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:30.3146535Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:30.3146604Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.3146679Z .b8 0 // EOM(1) 2026-02-21T09:05:30.3146770Z .b8 0 // EOM(2) 2026-02-21T09:05:30.3146835Z .b8 0 // EOM(3) 2026-02-21T09:05:30.3146891Z } 2026-02-21T09:05:30.3146950Z .section .debug_info 2026-02-21T09:05:30.3146999Z { 2026-02-21T09:05:30.3147077Z .b32 104 // Length of Unit 2026-02-21T09:05:30.3147166Z .b8 2 // DWARF version number 2026-02-21T09:05:30.3147216Z .b8 0 2026-02-21T09:05:30.3147325Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:05:30.3147415Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:30.3147513Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:30.3147591Z .b8 116 // DW_AT_producer 2026-02-21T09:05:30.3147650Z .b8 114 2026-02-21T09:05:30.3147700Z .b8 105 2026-02-21T09:05:30.3147749Z .b8 116 2026-02-21T09:05:30.3147799Z .b8 111 2026-02-21T09:05:30.3147854Z .b8 110 2026-02-21T09:05:30.3147903Z .b8 0 2026-02-21T09:05:30.3147974Z .b8 2 // DW_AT_language 2026-02-21T09:05:30.3148022Z .b8 0 2026-02-21T09:05:30.3148135Z .b8 99 // DW_AT_name 2026-02-21T09:05:30.3148186Z .b8 52 2026-02-21T09:05:30.3148234Z .b8 115 2026-02-21T09:05:30.3148289Z .b8 109 2026-02-21T09:05:30.3148338Z .b8 55 2026-02-21T09:05:30.3148388Z .b8 109 2026-02-21T09:05:30.3148437Z .b8 97 2026-02-21T09:05:30.3148491Z .b8 111 2026-02-21T09:05:30.3148539Z .b8 120 2026-02-21T09:05:30.3148587Z .b8 112 2026-02-21T09:05:30.3148640Z .b8 54 2026-02-21T09:05:30.3148687Z .b8 114 2026-02-21T09:05:30.3148735Z .b8 119 2026-02-21T09:05:30.3148808Z .b8 100 2026-02-21T09:05:30.3148864Z .b8 106 2026-02-21T09:05:30.3148912Z .b8 98 2026-02-21T09:05:30.3148961Z .b8 101 2026-02-21T09:05:30.3149016Z .b8 111 2026-02-21T09:05:30.3149063Z .b8 113 2026-02-21T09:05:30.3149109Z .b8 101 2026-02-21T09:05:30.3149160Z .b8 121 2026-02-21T09:05:30.3149219Z .b8 122 2026-02-21T09:05:30.3149267Z .b8 112 2026-02-21T09:05:30.3149315Z .b8 100 2026-02-21T09:05:30.3149364Z .b8 116 2026-02-21T09:05:30.3149419Z .b8 52 2026-02-21T09:05:30.3149466Z .b8 115 2026-02-21T09:05:30.3149515Z .b8 115 2026-02-21T09:05:30.3149571Z .b8 102 2026-02-21T09:05:30.3149618Z .b8 101 2026-02-21T09:05:30.3149665Z .b8 101 2026-02-21T09:05:30.3149713Z .b8 98 2026-02-21T09:05:30.3149767Z .b8 51 2026-02-21T09:05:30.3149815Z .b8 105 2026-02-21T09:05:30.3149862Z .b8 111 2026-02-21T09:05:30.3149917Z .b8 115 2026-02-21T09:05:30.3149966Z .b8 102 2026-02-21T09:05:30.3150014Z .b8 114 2026-02-21T09:05:30.3150060Z .b8 114 2026-02-21T09:05:30.3150116Z .b8 120 
2026-02-21T09:05:30.3150183Z .b8 118 2026-02-21T09:05:30.3150231Z .b8 109 2026-02-21T09:05:30.3150286Z .b8 53 2026-02-21T09:05:30.3150334Z .b8 118 2026-02-21T09:05:30.3150381Z .b8 109 2026-02-21T09:05:30.3150430Z .b8 54 2026-02-21T09:05:30.3150482Z .b8 119 2026-02-21T09:05:30.3150530Z .b8 51 2026-02-21T09:05:30.3150578Z .b8 99 2026-02-21T09:05:30.3150625Z .b8 51 2026-02-21T09:05:30.3150679Z .b8 99 2026-02-21T09:05:30.3150725Z .b8 97 2026-02-21T09:05:30.3150772Z .b8 46 2026-02-21T09:05:30.3150825Z .b8 112 2026-02-21T09:05:30.3150871Z .b8 121 2026-02-21T09:05:30.3150920Z .b8 0 2026-02-21T09:05:30.3151007Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:30.3151086Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:30.3151134Z .b8 116 2026-02-21T09:05:30.3151183Z .b8 109 2026-02-21T09:05:30.3151236Z .b8 112 2026-02-21T09:05:30.3151283Z .b8 47 2026-02-21T09:05:30.3151332Z .b8 116 2026-02-21T09:05:30.3151379Z .b8 111 2026-02-21T09:05:30.3151434Z .b8 114 2026-02-21T09:05:30.3151483Z .b8 99 2026-02-21T09:05:30.3151532Z .b8 104 2026-02-21T09:05:30.3151586Z .b8 105 2026-02-21T09:05:30.3151633Z .b8 110 2026-02-21T09:05:30.3151682Z .b8 100 2026-02-21T09:05:30.3151730Z .b8 117 2026-02-21T09:05:30.3151784Z .b8 99 2026-02-21T09:05:30.3151832Z .b8 116 2026-02-21T09:05:30.3151902Z .b8 111 2026-02-21T09:05:30.3151951Z .b8 114 2026-02-21T09:05:30.3152005Z .b8 95 2026-02-21T09:05:30.3152052Z .b8 114 2026-02-21T09:05:30.3152100Z .b8 111 2026-02-21T09:05:30.3152154Z .b8 111 2026-02-21T09:05:30.3152200Z .b8 116 2026-02-21T09:05:30.3152249Z .b8 47 2026-02-21T09:05:30.3152296Z .b8 52 2026-02-21T09:05:30.3152351Z .b8 115 2026-02-21T09:05:30.3152399Z .b8 0 2026-02-21T09:05:30.3152446Z } 2026-02-21T09:05:30.3152514Z .section .debug_macinfo { } 2026-02-21T09:05:30.3152518Z 2026-02-21T09:05:30.3152594Z ================================================================ 2026-02-21T09:05:30.3152695Z please share the reproducer above with Triton project. 
2026-02-21T09:05:30.4013040Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:30.4013320Z 2026-02-21T09:05:30.4014572Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]), static_shapes=True) 2026-02-21T09:05:30.4016111Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:30.4016346Z `ptxas` stderr: 2026-02-21T09:05:30.4016766Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.4017229Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.4017374Z 2026-02-21T09:05:30.4017832Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp17m9kkrm.ptx -o /tmp/tmp17m9kkrm.ptx.o 2026-02-21T09:05:30.4018289Z 2026-02-21T09:05:30.4018416Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:30.4018595Z 2026-02-21T09:05:30.4018599Z 2026-02-21T09:05:30.4018686Z ================================================================ 2026-02-21T09:05:30.4018890Z Internal Triton PTX codegen error 2026-02-21T09:05:30.4019066Z `ptxas` stderr: 2026-02-21T09:05:30.4019461Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:05:30.4019950Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.4020100Z 2026-02-21T09:05:30.4020537Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp17m9kkrm.ptx -o /tmp/tmp17m9kkrm.ptx.o 2026-02-21T09:05:30.4020991Z 2026-02-21T09:05:30.4020994Z 2026-02-21T09:05:30.4021049Z // 2026-02-21T09:05:30.4021200Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:30.4021378Z // 2026-02-21T09:05:30.4021455Z 2026-02-21T09:05:30.4021512Z .version 8.7 2026-02-21T09:05:30.4021652Z .target sm_100a 2026-02-21T09:05:30.4021803Z .address_size 64 2026-02-21T09:05:30.4021890Z 2026-02-21T09:05:30.4022026Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:30.4022286Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:30.4022512Z // @_helion_matmul 2026-02-21T09:05:30.4022712Z .visible .entry _helion_matmul( 2026-02-21T09:05:30.4022930Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:30.4023181Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:30.4023438Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:30.4023681Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:30.4023935Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:30.4024142Z ) 2026-02-21T09:05:30.4024297Z .reqntid 256 2026-02-21T09:05:30.4024436Z .maxnreg 32 2026-02-21T09:05:30.4024558Z { 2026-02-21T09:05:30.4024732Z .reg .pred %p<148>; 2026-02-21T09:05:30.4024883Z .reg .b32 %r<529>; 2026-02-21T09:05:30.4025033Z .reg .b64 %rd<217>; 2026-02-21T09:05:30.4025303Z .loc 1 19 0 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:19:0 2026-02-21T09:05:30.4025602Z $L__func_begin0: 2026-02-21T09:05:30.4025861Z .loc 1 19 0 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:19:0 2026-02-21T09:05:30.4026093Z 
2026-02-21T09:05:30.4026149Z // %bb.0: 2026-02-21T09:05:30.4026315Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:05:30.4026507Z $L__tmp0: 2026-02-21T09:05:30.4026752Z .loc 1 19 0 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:19 2026-02-21T09:05:30.4027060Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:30.4027235Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:05:30.4027447Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:05:30.4027633Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:05:30.4027838Z mov.b32 %r56, global_smem; 2026-02-21T09:05:30.4028008Z // begin inline asm 2026-02-21T09:05:30.4028277Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r56], 128; 2026-02-21T09:05:30.4028521Z // end inline asm 2026-02-21T09:05:30.4028691Z ld.param.b64 %rd64, [_helion_matmul_param_3]; 2026-02-21T09:05:30.4028881Z bar.sync 0; 2026-02-21T09:05:30.4029028Z ld.shared.b32 %r503, [global_smem]; 2026-02-21T09:05:30.4029209Z bar.sync 0; 2026-02-21T09:05:30.4029340Z // begin inline asm 2026-02-21T09:05:30.4029550Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:30.4029811Z // end inline asm 2026-02-21T09:05:30.4030058Z .loc 1 21 67 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:21:67 2026-02-21T09:05:30.4030334Z mov.u32 %r526, %ctaid.x; 2026-02-21T09:05:30.4030488Z mov.u32 %r184, %ctaid.y; 2026-02-21T09:05:30.4030639Z mov.u32 %r185, %ctaid.z; 2026-02-21T09:05:30.4030783Z mov.u32 %r186, %nctaid.x; 2026-02-21T09:05:30.4030936Z mov.u32 %r187, %nctaid.y; 2026-02-21T09:05:30.4031090Z mad.lo.s32 %r188, %r185, %r187, %r184; 2026-02-21T09:05:30.4031277Z mad.lo.s32 %r189, %r188, %r186, %r526; 2026-02-21T09:05:30.4031448Z mul.lo.s32 %r190, %r189, 384; 2026-02-21T09:05:30.4031611Z cvt.s64.s32 %rd65, %r190; 2026-02-21T09:05:30.4031761Z add.s64 %rd19, %rd64, %rd65; 2026-02-21T09:05:30.4031922Z shl.b32 %r191, %r1, 2; 2026-02-21T09:05:30.4032072Z add.s32 %r57, %r56, %r191; 2026-02-21T09:05:30.4032216Z 
mov.b32 %r528, 0; 2026-02-21T09:05:30.4032353Z // begin inline asm 2026-02-21T09:05:30.4032526Z @%p3 st.shared.b32 [ %r57 + 0 ], %r528; 2026-02-21T09:05:30.4032701Z // end inline asm 2026-02-21T09:05:30.4032834Z bar.warp.sync -1; 2026-02-21T09:05:30.4032982Z setp.eq.b32 %p140, %r1, 0; 2026-02-21T09:05:30.4033135Z cvt.u64.u32 %rd4, %r56; 2026-02-21T09:05:30.4033287Z // begin inline asm 2026-02-21T09:05:30.4033527Z @%p140 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:05:30.4033806Z // end inline asm 2026-02-21T09:05:30.4033944Z // begin inline asm 2026-02-21T09:05:30.4034162Z @%p140 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.4034413Z // end inline asm 2026-02-21T09:05:30.4034538Z mov.b32 %r59, 16; 2026-02-21T09:05:30.4034702Z // begin inline asm 2026-02-21T09:05:30.4034937Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:05:30.4035210Z // end inline asm 2026-02-21T09:05:30.4035339Z mov.b32 %r60, 256; 2026-02-21T09:05:30.4035479Z // begin inline asm 2026-02-21T09:05:30.4035717Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:05:30.4035986Z // end inline asm 2026-02-21T09:05:30.4036128Z mov.b32 %r61, 2048; 2026-02-21T09:05:30.4036264Z // begin inline asm 2026-02-21T09:05:30.4036557Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:05:30.4036821Z // end inline asm 2026-02-21T09:05:30.4036958Z mov.b32 %r62, 4096; 2026-02-21T09:05:30.4037100Z // begin inline asm 2026-02-21T09:05:30.4037331Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r62; 2026-02-21T09:05:30.4037602Z // end inline asm 2026-02-21T09:05:30.4037733Z mov.b64 %rd12, 4096; 2026-02-21T09:05:30.4037879Z // begin inline asm 2026-02-21T09:05:30.4038127Z @%p140 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 
0x0, %rd12; 2026-02-21T09:05:30.4038422Z // end inline asm 2026-02-21T09:05:30.4038550Z mov.b32 %r63, 1; 2026-02-21T09:05:30.4038687Z // begin inline asm 2026-02-21T09:05:30.4038944Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:05:30.4039227Z // end inline asm 2026-02-21T09:05:30.4039363Z // begin inline asm 2026-02-21T09:05:30.4039608Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:05:30.4039893Z // end inline asm 2026-02-21T09:05:30.4040021Z // begin inline asm 2026-02-21T09:05:30.4040286Z @%p140 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.4040554Z // end inline asm 2026-02-21T09:05:30.4040685Z // begin inline asm 2026-02-21T09:05:30.4040939Z @%p140 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4041296Z // end inline asm 2026-02-21T09:05:30.4041460Z // begin inline asm 2026-02-21T09:05:30.4041691Z @%p140 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.4041991Z // end inline asm 2026-02-21T09:05:30.4042118Z // begin inline asm 2026-02-21T09:05:30.4042346Z @%p140 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4042603Z // end inline asm 2026-02-21T09:05:30.4042733Z // begin inline asm 2026-02-21T09:05:30.4043081Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.4043450Z // end inline asm 2026-02-21T09:05:30.4043587Z // begin inline asm 2026-02-21T09:05:30.4043789Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:05:30.4044042Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.4044231Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.4044402Z // end inline asm 2026-02-21T09:05:30.4044539Z bar.sync 0; 
2026-02-21T09:05:30.4044718Z cvta.global.u64 %rd58, %rd19; 2026-02-21T09:05:30.4045020Z .loc 1 22 67 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:22:67 2026-02-21T09:05:30.4045303Z add.s32 %r192, %r190, 128; 2026-02-21T09:05:30.4045463Z cvt.s64.s32 %rd66, %r192; 2026-02-21T09:05:30.4045617Z add.s64 %rd37, %rd64, %rd66; 2026-02-21T09:05:30.4045775Z bar.sync 0; 2026-02-21T09:05:30.4045912Z // begin inline asm 2026-02-21T09:05:30.4046054Z @%p3 st.shared.b32 [ %r57 + 0 ], %r528; 2026-02-21T09:05:30.4046227Z // end inline asm 2026-02-21T09:05:30.4046359Z bar.warp.sync -1; 2026-02-21T09:05:30.4046503Z // begin inline asm 2026-02-21T09:05:30.4046739Z @%p140 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:05:30.4047019Z // end inline asm 2026-02-21T09:05:30.4047153Z // begin inline asm 2026-02-21T09:05:30.4047379Z @%p140 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.4047627Z // end inline asm 2026-02-21T09:05:30.4047754Z // begin inline asm 2026-02-21T09:05:30.4047987Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:05:30.4048241Z // end inline asm 2026-02-21T09:05:30.4048374Z mov.b32 %r68, 64; 2026-02-21T09:05:30.4048504Z // begin inline asm 2026-02-21T09:05:30.4048732Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r68; 2026-02-21T09:05:30.4049017Z // end inline asm 2026-02-21T09:05:30.4049150Z // begin inline asm 2026-02-21T09:05:30.4049385Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:05:30.4049648Z // end inline asm 2026-02-21T09:05:30.4049780Z // begin inline asm 2026-02-21T09:05:30.4050009Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:05:30.4050281Z // end inline asm 2026-02-21T09:05:30.4050407Z // begin inline asm 2026-02-21T09:05:30.4050654Z @%p140 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:05:30.4050940Z // end inline asm 2026-02-21T09:05:30.4051069Z // begin inline asm 2026-02-21T09:05:30.4051324Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:05:30.4051600Z // end inline asm 2026-02-21T09:05:30.4051732Z // begin inline asm 2026-02-21T09:05:30.4051973Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:05:30.4052260Z // end inline asm 2026-02-21T09:05:30.4052393Z // begin inline asm 2026-02-21T09:05:30.4052643Z @%p140 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.4052905Z // end inline asm 2026-02-21T09:05:30.4053031Z // begin inline asm 2026-02-21T09:05:30.4053274Z @%p140 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4053552Z // end inline asm 2026-02-21T09:05:30.4053686Z // begin inline asm 2026-02-21T09:05:30.4053913Z @%p140 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.4054207Z // end inline asm 2026-02-21T09:05:30.4054347Z // begin inline asm 2026-02-21T09:05:30.4054574Z @%p140 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4054865Z // end inline asm 2026-02-21T09:05:30.4054998Z // begin inline asm 2026-02-21T09:05:30.4055341Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.4055700Z // end inline asm 2026-02-21T09:05:30.4055835Z // begin inline asm 2026-02-21T09:05:30.4056044Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:05:30.4056282Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.4056472Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.4056641Z // end inline asm 
2026-02-21T09:05:30.4056773Z bar.sync 0; 2026-02-21T09:05:30.4056907Z cvta.global.u64 %rd59, %rd37; 2026-02-21T09:05:30.4057214Z .loc 1 24 71 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:24:71 2026-02-21T09:05:30.4057497Z add.s32 %r193, %r190, 256; 2026-02-21T09:05:30.4057656Z cvt.s64.s32 %rd67, %r193; 2026-02-21T09:05:30.4057812Z add.s64 %rd55, %rd64, %rd67; 2026-02-21T09:05:30.4057958Z bar.sync 0; 2026-02-21T09:05:30.4058091Z // begin inline asm 2026-02-21T09:05:30.4058232Z @%p3 st.shared.b32 [ %r57 + 0 ], %r528; 2026-02-21T09:05:30.4058406Z // end inline asm 2026-02-21T09:05:30.4058534Z bar.warp.sync -1; 2026-02-21T09:05:30.4058671Z // begin inline asm 2026-02-21T09:05:30.4058905Z @%p140 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:05:30.4059181Z // end inline asm 2026-02-21T09:05:30.4059316Z // begin inline asm 2026-02-21T09:05:30.4059524Z @%p140 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:05:30.4059775Z // end inline asm 2026-02-21T09:05:30.4059899Z // begin inline asm 2026-02-21T09:05:30.4060128Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r68; 2026-02-21T09:05:30.4060382Z // end inline asm 2026-02-21T09:05:30.4060516Z // begin inline asm 2026-02-21T09:05:30.4060735Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:05:30.4061227Z // end inline asm 2026-02-21T09:05:30.4061363Z // begin inline asm 2026-02-21T09:05:30.4061594Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:05:30.4061872Z // end inline asm 2026-02-21T09:05:30.4062001Z // begin inline asm 2026-02-21T09:05:30.4062239Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r62; 2026-02-21T09:05:30.4062505Z // end inline asm 2026-02-21T09:05:30.4062639Z // begin inline asm 2026-02-21T09:05:30.4062888Z @%p140 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:05:30.4063163Z // end inline asm 2026-02-21T09:05:30.4063297Z // begin inline asm 2026-02-21T09:05:30.4063546Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:05:30.4063825Z // end inline asm 2026-02-21T09:05:30.4063957Z // begin inline asm 2026-02-21T09:05:30.4064221Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:05:30.4064512Z // end inline asm 2026-02-21T09:05:30.4064645Z // begin inline asm 2026-02-21T09:05:30.4064934Z @%p140 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:05:30.4065227Z // end inline asm 2026-02-21T09:05:30.4065375Z // begin inline asm 2026-02-21T09:05:30.4065626Z @%p140 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4065928Z // end inline asm 2026-02-21T09:05:30.4066063Z // begin inline asm 2026-02-21T09:05:30.4066310Z @%p140 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:05:30.4066598Z // end inline asm 2026-02-21T09:05:30.4066731Z // begin inline asm 2026-02-21T09:05:30.4067004Z @%p140 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:05:30.4067264Z // end inline asm 2026-02-21T09:05:30.4067402Z // begin inline asm 2026-02-21T09:05:30.4067752Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:05:30.4077104Z // end inline asm 2026-02-21T09:05:30.4077309Z // begin inline asm 2026-02-21T09:05:30.4077550Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:05:30.4077816Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.4078013Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.4078194Z // end inline asm 
2026-02-21T09:05:30.4078339Z bar.sync 0; 2026-02-21T09:05:30.4078476Z cvta.global.u64 %rd88, %rd55; 2026-02-21T09:05:30.4078871Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4079168Z max.u32 %r194, %r526, 511; 2026-02-21T09:05:30.4079318Z shl.b32 %r195, %r194, 7; 2026-02-21T09:05:30.4079466Z add.s32 %r4, %r195, -65408; 2026-02-21T09:05:30.4079625Z sub.s32 %r5, 65536, %r195; 2026-02-21T09:05:30.4079883Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4080166Z shr.u32 %r196, %r1, 5; 2026-02-21T09:05:30.4080331Z shfl.sync.idx.b32 %r6, %r196, 0, 31, -1; 2026-02-21T09:05:30.4080509Z shl.b32 %r197, %r6, 21; 2026-02-21T09:05:30.4080663Z and.b32 %r198, %r197, 6291456; 2026-02-21T09:05:30.4080823Z add.s32 %r199, %r198, %r503; 2026-02-21T09:05:30.4080984Z shl.b32 %r200, %r6, 4; 2026-02-21T09:05:30.4081128Z and.b32 %r201, %r200, 64; 2026-02-21T09:05:30.4081286Z add.s32 %r81, %r199, %r201; 2026-02-21T09:05:30.4081440Z mov.pred %p59, -1; 2026-02-21T09:05:30.4081590Z // begin inline asm 2026-02-21T09:05:30.4081964Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 0], {%r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528}; 2026-02-21T09:05:30.4082365Z // end inline asm 2026-02-21T09:05:30.4082508Z // begin inline asm 2026-02-21T09:05:30.4082862Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 16], {%r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528}; 2026-02-21T09:05:30.4083306Z // end inline asm 2026-02-21T09:05:30.4083439Z // begin inline asm 2026-02-21T09:05:30.4083805Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 32], {%r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528}; 2026-02-21T09:05:30.4084210Z // end inline asm 2026-02-21T09:05:30.4084343Z // begin inline asm 
2026-02-21T09:05:30.4084744Z @%p59 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 48], {%r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528, %r528}; 2026-02-21T09:05:30.4085142Z // end inline asm 2026-02-21T09:05:30.4085287Z // begin inline asm 2026-02-21T09:05:30.4085439Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:30.4085611Z // end inline asm 2026-02-21T09:05:30.4085752Z bar.sync 0; 2026-02-21T09:05:30.4086004Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4086310Z add.s32 %r527, %r56, 73760; 2026-02-21T09:05:30.4086464Z // begin inline asm 2026-02-21T09:05:30.4086677Z @%p140 mbarrier.init.shared::cta.b64 [%r527], 1; 2026-02-21T09:05:30.4086865Z // end inline asm 2026-02-21T09:05:30.4087002Z bar.sync 0; 2026-02-21T09:05:30.4087134Z add.s32 %r150, %r56, 73768; 2026-02-21T09:05:30.4087292Z // begin inline asm 2026-02-21T09:05:30.4087463Z @%p140 mbarrier.init.shared::cta.b64 [%r150], 1; 2026-02-21T09:05:30.4087648Z // end inline asm 2026-02-21T09:05:30.4087793Z add.s32 %r151, %r56, 73728; 2026-02-21T09:05:30.4087942Z // begin inline asm 2026-02-21T09:05:30.4088111Z @%p140 mbarrier.init.shared::cta.b64 [%r151], 1; 2026-02-21T09:05:30.4088328Z // end inline asm 2026-02-21T09:05:30.4088465Z bar.sync 0; 2026-02-21T09:05:30.4088592Z add.s32 %r152, %r56, 73736; 2026-02-21T09:05:30.4088748Z // begin inline asm 2026-02-21T09:05:30.4088915Z @%p140 mbarrier.init.shared::cta.b64 [%r152], 1; 2026-02-21T09:05:30.4089097Z // end inline asm 2026-02-21T09:05:30.4089238Z bar.sync 0; 2026-02-21T09:05:30.4089366Z add.s32 %r153, %r56, 73744; 2026-02-21T09:05:30.4089524Z // begin inline asm 2026-02-21T09:05:30.4089677Z @%p140 mbarrier.init.shared::cta.b64 [%r153], 1; 2026-02-21T09:05:30.4089866Z // end inline asm 2026-02-21T09:05:30.4089997Z bar.sync 0; 2026-02-21T09:05:30.4090133Z add.s32 %r229, %r56, 73752; 2026-02-21T09:05:30.4090278Z // begin inline asm 
2026-02-21T09:05:30.4090441Z @%p140 mbarrier.init.shared::cta.b64 [%r229], 1; 2026-02-21T09:05:30.4090628Z // end inline asm 2026-02-21T09:05:30.4090763Z setp.lt.s32 %p79, %r5, 1; 2026-02-21T09:05:30.4090962Z setp.gt.s32 %p78, %r5, 0; 2026-02-21T09:05:30.4091228Z .loc 1 40 33 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:40:33 2026-02-21T09:05:30.4091519Z shr.u32 %r202, %r526, 4; 2026-02-21T09:05:30.4091673Z and.b32 %r203, %r202, 134217724; 2026-02-21T09:05:30.4091951Z .loc 1 41 39 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:41:39 2026-02-21T09:05:30.4092233Z sub.s32 %r204, 32, %r203; 2026-02-21T09:05:30.4092481Z .loc 1 41 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:41:52 2026-02-21T09:05:30.4092760Z min.s32 %r205, %r204, 4; 2026-02-21T09:05:30.4093008Z .loc 1 42 45 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:45 2026-02-21T09:05:30.4093285Z and.b32 %r206, %r526, 63; 2026-02-21T09:05:30.4093533Z .loc 1 43 51 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:43:51 2026-02-21T09:05:30.4093825Z div.s32 %r207, %r206, %r205; 2026-02-21T09:05:30.4094088Z .loc 1 42 64 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:64 2026-02-21T09:05:30.4094366Z mul.lo.s32 %r208, %r207, %r205; 2026-02-21T09:05:30.4094538Z sub.s32 %r209, %r206, %r208; 2026-02-21T09:05:30.4094864Z .loc 1 42 30 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:30 2026-02-21T09:05:30.4095158Z add.s32 %r210, %r209, %r203; 2026-02-21T09:05:30.4095416Z .loc 1 44 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:44:27 2026-02-21T09:05:30.4095711Z shl.b32 %r505, %r210, 6; 2026-02-21T09:05:30.4095970Z .loc 1 45 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:45:27 2026-02-21T09:05:30.4096247Z shl.b32 %r507, %r207, 8; 2026-02-21T09:05:30.4096516Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4096804Z bar.sync 0; 2026-02-21T09:05:30.4096947Z 
and.pred %p69, %p140, %p78; 2026-02-21T09:05:30.4097104Z // begin inline asm 2026-02-21T09:05:30.4097305Z @%p69 mbarrier.arrive.expect_tx.shared.b64 _, [%r151], 10240; 2026-02-21T09:05:30.4097529Z // end inline asm 2026-02-21T09:05:30.4097779Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4098068Z // begin inline asm 2026-02-21T09:05:30.4098224Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.4098396Z // end inline asm 2026-02-21T09:05:30.4098557Z bar.sync 0; 2026-02-21T09:05:30.4098706Z elect.sync %r211|%p80, -1; 2026-02-21T09:05:30.4098867Z and.pred %p81, %p78, %p80; 2026-02-21T09:05:30.4099034Z and.pred %p70, %p3, %p81; 2026-02-21T09:05:30.4099183Z // begin inline asm 2026-02-21T09:05:30.4099514Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r56], [%rd58, {%r528, %r507}], [%r151]; 2026-02-21T09:05:30.4099884Z // end inline asm 2026-02-21T09:05:30.4100119Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4100431Z bar.sync 0; 2026-02-21T09:05:30.4100564Z elect.sync %r212|%p82, -1; 2026-02-21T09:05:30.4100732Z and.pred %p83, %p78, %p82; 2026-02-21T09:05:30.4100886Z and.pred %p71, %p3, %p83; 2026-02-21T09:05:30.4101048Z add.s32 %r160, %r56, 65536; 2026-02-21T09:05:30.4101197Z // begin inline asm 2026-02-21T09:05:30.4101523Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd59, {%r528, %r505}], [%r151]; 2026-02-21T09:05:30.4101888Z // end inline asm 2026-02-21T09:05:30.4102130Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4102414Z setp.gt.s32 %p84, %r5, 1; 2026-02-21T09:05:30.4102562Z bar.sync 0; 2026-02-21T09:05:30.4102701Z and.pred %p72, %p140, %p84; 2026-02-21T09:05:30.4102856Z // begin inline asm 2026-02-21T09:05:30.4103077Z @%p72 mbarrier.arrive.expect_tx.shared.b64 _, [%r152], 10240; 2026-02-21T09:05:30.4103300Z // end 
inline asm 2026-02-21T09:05:30.4103531Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4103820Z bar.sync 0; 2026-02-21T09:05:30.4103953Z elect.sync %r213|%p85, -1; 2026-02-21T09:05:30.4104118Z and.pred %p86, %p84, %p85; 2026-02-21T09:05:30.4104269Z and.pred %p73, %p3, %p86; 2026-02-21T09:05:30.4104427Z add.s32 %r165, %r56, 8192; 2026-02-21T09:05:30.4104573Z // begin inline asm 2026-02-21T09:05:30.4104953Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r165], [%rd58, {%r59, %r507}], [%r152]; 2026-02-21T09:05:30.4105303Z // end inline asm 2026-02-21T09:05:30.4105545Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4105829Z bar.sync 0; 2026-02-21T09:05:30.4105962Z elect.sync %r214|%p87, -1; 2026-02-21T09:05:30.4106126Z and.pred %p88, %p84, %p87; 2026-02-21T09:05:30.4106282Z and.pred %p74, %p3, %p88; 2026-02-21T09:05:30.4106443Z add.s32 %r169, %r56, 67584; 2026-02-21T09:05:30.4106593Z // begin inline asm 2026-02-21T09:05:30.4106917Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r169], [%rd59, {%r59, %r505}], [%r152]; 2026-02-21T09:05:30.4107326Z // end inline asm 2026-02-21T09:05:30.4107582Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4107888Z setp.gt.s32 %p89, %r5, 2; 2026-02-21T09:05:30.4108042Z bar.sync 0; 2026-02-21T09:05:30.4108187Z and.pred %p75, %p140, %p89; 2026-02-21T09:05:30.4108346Z // begin inline asm 2026-02-21T09:05:30.4108556Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r153], 10240; 2026-02-21T09:05:30.4108790Z // end inline asm 2026-02-21T09:05:30.4109038Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4109336Z bar.sync 0; 2026-02-21T09:05:30.4109476Z elect.sync %r215|%p90, -1; 2026-02-21T09:05:30.4109648Z and.pred %p91, %p89, %p90; 2026-02-21T09:05:30.4109810Z and.pred 
%p76, %p3, %p91; 2026-02-21T09:05:30.4109977Z add.s32 %r174, %r56, 16384; 2026-02-21T09:05:30.4110134Z mov.b32 %r175, 32; 2026-02-21T09:05:30.4110288Z // begin inline asm 2026-02-21T09:05:30.4110636Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r174], [%rd58, {%r175, %r507}], [%r153]; 2026-02-21T09:05:30.4111032Z // end inline asm 2026-02-21T09:05:30.4111289Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4111582Z bar.sync 0; 2026-02-21T09:05:30.4111726Z elect.sync %r216|%p92, -1; 2026-02-21T09:05:30.4111888Z and.pred %p93, %p89, %p92; 2026-02-21T09:05:30.4112057Z and.pred %p77, %p3, %p93; 2026-02-21T09:05:30.4112214Z add.s32 %r178, %r56, 69632; 2026-02-21T09:05:30.4112380Z // begin inline asm 2026-02-21T09:05:30.4112711Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd59, {%r175, %r505}], [%r153]; 2026-02-21T09:05:30.4113113Z // end inline asm 2026-02-21T09:05:30.4113367Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4113664Z bar.sync 0; 2026-02-21T09:05:30.4113791Z // begin inline asm 2026-02-21T09:05:30.4113935Z 2026-02-21T09:05:30.4114049Z { 2026-02-21T09:05:30.4114184Z @!%p78 bra.uni skipWait; 2026-02-21T09:05:30.4114347Z .reg .pred complete; 2026-02-21T09:05:30.4114502Z waitLoop: 2026-02-21T09:05:30.4114733Z mbarrier.try_wait.parity.shared.b64 complete, [%r151], %r528; 2026-02-21T09:05:30.4114982Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.4115138Z skipWait: 2026-02-21T09:05:30.4115254Z } 2026-02-21T09:05:30.4115318Z 2026-02-21T09:05:30.4115379Z // end inline asm 2026-02-21T09:05:30.4115646Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4115939Z setp.ne.b32 %p94, %r6, 0; 2026-02-21T09:05:30.4116093Z or.pred %p95, %p79, %p94; 2026-02-21T09:05:30.4116248Z @%p95 bra $L__BB0_2; 2026-02-21T09:05:30.4116383Z // %bb.1: 
2026-02-21T09:05:30.4116516Z elect.sync %r221|%p97, -1; 2026-02-21T09:05:30.4116672Z bfe.u32 %r223, %r56, 4, 14; 2026-02-21T09:05:30.4116826Z cvt.u64.u32 %rd73, %r223; 2026-02-21T09:05:30.4116994Z or.b64 %rd68, %rd73, -4611685949674356736; 2026-02-21T09:05:30.4117172Z bfe.u32 %r225, %r160, 4, 14; 2026-02-21T09:05:30.4117326Z cvt.u64.u32 %rd74, %r225; 2026-02-21T09:05:30.4117481Z or.b64 %rd69, %rd74, -4611685949699522560; 2026-02-21T09:05:30.4117659Z mov.b32 %r218, 135266320; 2026-02-21T09:05:30.4117801Z mov.pred %p96, 0; 2026-02-21T09:05:30.4117942Z // begin inline asm 2026-02-21T09:05:30.4118160Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r503 + 0 ], %rd68, %rd69, %r218, %p96; 2026-02-21T09:05:30.4118415Z // end inline asm 2026-02-21T09:05:30.4118554Z add.s32 %r226, %r56, 4096; 2026-02-21T09:05:30.4118704Z bfe.u32 %r227, %r226, 4, 14; 2026-02-21T09:05:30.4118862Z cvt.u64.u32 %rd75, %r227; 2026-02-21T09:05:30.4119018Z or.b64 %rd70, %rd75, -4611685949674356736; 2026-02-21T09:05:30.4119194Z // begin inline asm 2026-02-21T09:05:30.4119442Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r503 + 64 ], %rd70, %rd69, %r218, %p96; 2026-02-21T09:05:30.4119683Z // end inline asm 2026-02-21T09:05:30.4119824Z add.s32 %r228, %r56, 73760; 2026-02-21T09:05:30.4119972Z cvt.u64.u32 %rd72, %r228; 2026-02-21T09:05:30.4120122Z // begin inline asm 2026-02-21T09:05:30.4120322Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd72]; 2026-02-21T09:05:30.4120553Z // end inline asm 2026-02-21T09:05:30.4120687Z $L__BB0_2: 2026-02-21T09:05:30.4120919Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4121216Z setp.gt.s32 %p104, %r5, 3; 2026-02-21T09:05:30.4121365Z bar.sync 0; 2026-02-21T09:05:30.4121511Z and.pred %p101, %p140, %p104; 2026-02-21T09:05:30.4121666Z // begin inline asm 2026-02-21T09:05:30.4121860Z @%p101 mbarrier.arrive.expect_tx.shared.b64 _, [%r229], 10240; 2026-02-21T09:05:30.4122072Z // end inline asm 
2026-02-21T09:05:30.4122315Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4122589Z bar.sync 0; 2026-02-21T09:05:30.4122717Z elect.sync %r241|%p107, -1; 2026-02-21T09:05:30.4122882Z and.pred %p108, %p104, %p107; 2026-02-21T09:05:30.4123088Z and.pred %p102, %p3, %p108; 2026-02-21T09:05:30.4123249Z add.s32 %r230, %r56, 24576; 2026-02-21T09:05:30.4123392Z mov.b32 %r511, 48; 2026-02-21T09:05:30.4123528Z // begin inline asm 2026-02-21T09:05:30.4123848Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r230], [%rd58, {%r511, %r507}], [%r229]; 2026-02-21T09:05:30.4124208Z // end inline asm 2026-02-21T09:05:30.4124455Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4124786Z bar.sync 0; 2026-02-21T09:05:30.4124928Z elect.sync %r242|%p109, -1; 2026-02-21T09:05:30.4125090Z and.pred %p110, %p104, %p109; 2026-02-21T09:05:30.4125257Z and.pred %p103, %p3, %p110; 2026-02-21T09:05:30.4125411Z add.s32 %r234, %r56, 71680; 2026-02-21T09:05:30.4125567Z // begin inline asm 2026-02-21T09:05:30.4125883Z @%p103 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r234], [%rd59, {%r511, %r505}], [%r229]; 2026-02-21T09:05:30.4126234Z // end inline asm 2026-02-21T09:05:30.4126493Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4126780Z sub.s32 %r10, 127, %r4; 2026-02-21T09:05:30.4126940Z setp.lt.s32 %p111, %r10, 1; 2026-02-21T09:05:30.4127100Z @%p111 bra $L__BB0_11; 2026-02-21T09:05:30.4127276Z // %bb.3: // %.lr.ph 2026-02-21T09:05:30.4127609Z .loc 1 0 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:0:107 2026-02-21T09:05:30.4127899Z sub.s32 %r11, 124, %r4; 2026-02-21T09:05:30.4128052Z shl.b32 %r249, %r1, 7; 2026-02-21T09:05:30.4128198Z and.b32 %r250, %r249, 32640; 2026-02-21T09:05:30.4128355Z shl.b32 %r251, %r1, 4; 2026-02-21T09:05:30.4128500Z and.b32 %r252, %r251, 
112; 2026-02-21T09:05:30.4128656Z or.b32 %r253, %r250, %r252; 2026-02-21T09:05:30.4128801Z add.s32 %r255, %r56, 32768; 2026-02-21T09:05:30.4128955Z add.s32 %r12, %r255, %r253; 2026-02-21T09:05:30.4129100Z xor.b32 %r256, %r253, 16; 2026-02-21T09:05:30.4129253Z add.s32 %r13, %r255, %r256; 2026-02-21T09:05:30.4129397Z xor.b32 %r257, %r253, 32; 2026-02-21T09:05:30.4129545Z add.s32 %r14, %r255, %r257; 2026-02-21T09:05:30.4129696Z xor.b32 %r258, %r253, 48; 2026-02-21T09:05:30.4129837Z add.s32 %r15, %r255, %r258; 2026-02-21T09:05:30.4129987Z xor.b32 %r259, %r253, 64; 2026-02-21T09:05:30.4130130Z add.s32 %r16, %r255, %r259; 2026-02-21T09:05:30.4130283Z xor.b32 %r260, %r253, 80; 2026-02-21T09:05:30.4130426Z add.s32 %r17, %r255, %r260; 2026-02-21T09:05:30.4130486Z xor.b32 %r261, %r253, 96; 2026-02-21T09:05:30.4130542Z add.s32 %r18, %r255, %r261; 2026-02-21T09:05:30.4130598Z xor.b32 %r262, %r253, 112; 2026-02-21T09:05:30.4130685Z add.s32 %r19, %r255, %r262; 2026-02-21T09:05:30.4130746Z add.s32 %r513, %r56, 73760; 2026-02-21T09:05:30.4130803Z mov.pred %p147, -1; 2026-02-21T09:05:30.4130856Z mov.b32 %r516, 3; 2026-02-21T09:05:30.4130915Z mov.b32 %r512, 0; 2026-02-21T09:05:30.4130966Z mov.b32 %r510, 1; 2026-02-21T09:05:30.4131017Z mov.b32 %r509, 2; 2026-02-21T09:05:30.4131071Z mov.b32 %r506, %r505; 2026-02-21T09:05:30.4131132Z mov.b32 %r508, %r507; 2026-02-21T09:05:30.4131184Z mov.b32 %r514, %r512; 2026-02-21T09:05:30.4131237Z mov.b32 %r515, %r512; 2026-02-21T09:05:30.4131295Z mov.b32 %r517, %r510; 2026-02-21T09:05:30.4131348Z mov.b32 %r518, %r512; 2026-02-21T09:05:30.4131400Z mov.b32 %r519, %r507; 2026-02-21T09:05:30.4131454Z mov.b32 %r520, %r505; 2026-02-21T09:05:30.4131513Z mov.b32 %r522, %r516; 2026-02-21T09:05:30.4131564Z mov.b32 %r523, %r512; 2026-02-21T09:05:30.4131617Z mov.b32 %r524, %r520; 2026-02-21T09:05:30.4131676Z mov.b32 %r525, %r519; 2026-02-21T09:05:30.4131732Z bra.uni $L__BB0_4; 2026-02-21T09:05:30.4131836Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 
2026-02-21T09:05:30.4131908Z selp.b32 %r517, 0, %r312, %p129; 2026-02-21T09:05:30.4131968Z selp.b32 %r313, 1, 0, %p129; 2026-02-21T09:05:30.4132051Z xor.b32 %r518, %r528, %r313; 2026-02-21T09:05:30.4132224Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4132287Z add.s32 %r523, %r523, 1; 2026-02-21T09:05:30.4132349Z setp.lt.s32 %p138, %r523, %r10; 2026-02-21T09:05:30.4132402Z mov.b32 %r505, %r520; 2026-02-21T09:05:30.4132462Z mov.b32 %r506, %r20; 2026-02-21T09:05:30.4132515Z mov.b32 %r507, %r519; 2026-02-21T09:05:30.4132571Z mov.b32 %r508, %r22; 2026-02-21T09:05:30.4132648Z mov.b32 %r509, %r522; 2026-02-21T09:05:30.4132708Z mov.b32 %r510, %r24; 2026-02-21T09:05:30.4132760Z mov.b32 %r512, %r528; 2026-02-21T09:05:30.4132812Z mov.b32 %r513, %r527; 2026-02-21T09:05:30.4132871Z mov.b32 %r519, %r525; 2026-02-21T09:05:30.4132926Z mov.b32 %r520, %r524; 2026-02-21T09:05:30.4132978Z mov.b32 %r522, %r39; 2026-02-21T09:05:30.4133033Z @%p138 bra $L__BB0_4; 2026-02-21T09:05:30.4133094Z bra.uni $L__BB0_11; 2026-02-21T09:05:30.4133196Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:30.4133362Z .loc 1 0 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:0:107 2026-02-21T09:05:30.4133424Z mov.b32 %r528, %r518; 2026-02-21T09:05:30.4133477Z mov.b32 %r24, %r509; 2026-02-21T09:05:30.4133530Z mov.b32 %r22, %r507; 2026-02-21T09:05:30.4133589Z mov.b32 %r20, %r505; 2026-02-21T09:05:30.4133772Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4133832Z add.s32 %r263, %r522, 1; 2026-02-21T09:05:30.4133894Z setp.eq.b32 %p113, %r522, 127; 2026-02-21T09:05:30.4133961Z selp.b32 %r39, 0, %r263, %p113; 2026-02-21T09:05:30.4134019Z setp.ne.b32 %p114, %r39, 0; 2026-02-21T09:05:30.4134075Z @%p114 bra $L__BB0_6; 2026-02-21T09:05:30.4134179Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.4134234Z add.s32 %r526, %r526, 1; 2026-02-21T09:05:30.4134396Z .loc 1 
39 35 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:39:35 2026-02-21T09:05:30.4134459Z shr.s32 %r264, %r526, 31; 2026-02-21T09:05:30.4134512Z shr.u32 %r265, %r264, 26; 2026-02-21T09:05:30.4134569Z add.s32 %r266, %r526, %r265; 2026-02-21T09:05:30.4134623Z shr.s32 %r267, %r266, 6; 2026-02-21T09:05:30.4134826Z .loc 1 40 33 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:40:33 2026-02-21T09:05:30.4134884Z shl.b32 %r268, %r267, 2; 2026-02-21T09:05:30.4135044Z .loc 1 41 39 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:41:39 2026-02-21T09:05:30.4135106Z sub.s32 %r269, 32, %r268; 2026-02-21T09:05:30.4135261Z .loc 1 41 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:41:52 2026-02-21T09:05:30.4135341Z min.s32 %r270, %r269, 4; 2026-02-21T09:05:30.4135506Z .loc 1 42 45 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:45 2026-02-21T09:05:30.4135564Z and.b32 %r271, %r266, -64; 2026-02-21T09:05:30.4135620Z sub.s32 %r272, %r526, %r271; 2026-02-21T09:05:30.4135779Z .loc 1 43 51 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:43:51 2026-02-21T09:05:30.4135843Z div.s32 %r273, %r272, %r270; 2026-02-21T09:05:30.4136006Z .loc 1 42 64 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:64 2026-02-21T09:05:30.4136068Z mul.lo.s32 %r274, %r273, %r270; 2026-02-21T09:05:30.4136136Z sub.s32 %r275, %r272, %r274; 2026-02-21T09:05:30.4136296Z .loc 1 42 30 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:42:30 2026-02-21T09:05:30.4136352Z add.s32 %r276, %r275, %r268; 2026-02-21T09:05:30.4136526Z .loc 1 44 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:44:27 2026-02-21T09:05:30.4136582Z shl.b32 %r524, %r276, 6; 2026-02-21T09:05:30.4136772Z .loc 1 45 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:45:27 2026-02-21T09:05:30.4136837Z shl.b32 %r525, %r273, 8; 2026-02-21T09:05:30.4136931Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.4137097Z .loc 1 33 107 // 
cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4137153Z add.s32 %r279, %r515, 1; 2026-02-21T09:05:30.4137218Z setp.gt.s32 %p116, %r279, 3; 2026-02-21T09:05:30.4137280Z selp.b32 %r515, 0, %r279, %p116; 2026-02-21T09:05:30.4137364Z selp.b32 %r280, 1, 0, %p116; 2026-02-21T09:05:30.4137425Z xor.b32 %r514, %r514, %r280; 2026-02-21T09:05:30.4137480Z shl.b32 %r281, %r515, 3; 2026-02-21T09:05:30.4137535Z add.s32 %r283, %r56, %r281; 2026-02-21T09:05:30.4137592Z add.s32 %r277, %r283, 73728; 2026-02-21T09:05:30.4137652Z bar.sync 0; 2026-02-21T09:05:30.4137708Z // begin inline asm 2026-02-21T09:05:30.4137757Z 2026-02-21T09:05:30.4137812Z { 2026-02-21T09:05:30.4137871Z .reg .pred complete; 2026-02-21T09:05:30.4137923Z waitLoop: 2026-02-21T09:05:30.4138041Z mbarrier.try_wait.parity.shared.b64 complete, [%r277], %r514; 2026-02-21T09:05:30.4138110Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.4138158Z } 2026-02-21T09:05:30.4138163Z 2026-02-21T09:05:30.4138215Z // end inline asm 2026-02-21T09:05:30.4138277Z shl.b32 %r284, %r517, 3; 2026-02-21T09:05:30.4138331Z add.s32 %r285, %r56, %r284; 2026-02-21T09:05:30.4138409Z add.s32 %r527, %r285, 73760; 2026-02-21T09:05:30.4138581Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4138634Z @%p94 bra $L__BB0_8; 2026-02-21T09:05:30.4138724Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.4138888Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4138949Z shl.b32 %r290, %r515, 13; 2026-02-21T09:05:30.4139004Z add.s32 %r292, %r56, %r290; 2026-02-21T09:05:30.4139164Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4139227Z shl.b32 %r293, %r515, 11; 2026-02-21T09:05:30.4139280Z add.s32 %r294, %r56, %r293; 2026-02-21T09:05:30.4139334Z add.s32 %r295, %r294, 65536; 2026-02-21T09:05:30.4139499Z .loc 1 56 52 // 
cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4139560Z elect.sync %r296|%p118, -1; 2026-02-21T09:05:30.4139616Z bfe.u32 %r297, %r292, 4, 14; 2026-02-21T09:05:30.4139671Z cvt.u64.u32 %rd83, %r297; 2026-02-21T09:05:30.4139745Z or.b64 %rd78, %rd83, -4611685949674356736; 2026-02-21T09:05:30.4139801Z bfe.u32 %r298, %r295, 4, 14; 2026-02-21T09:05:30.4139879Z cvt.u64.u32 %rd84, %r298; 2026-02-21T09:05:30.4139952Z or.b64 %rd79, %rd84, -4611685949699522560; 2026-02-21T09:05:30.4140006Z mov.b32 %r287, 135266320; 2026-02-21T09:05:30.4140059Z // begin inline asm 2026-02-21T09:05:30.4140205Z @%p118 tcgen05.mma.cta_group::1.kind::f16 [ %r503 + 0 ], %rd78, %rd79, %r287, %p147; 2026-02-21T09:05:30.4140258Z // end inline asm 2026-02-21T09:05:30.4140312Z add.s32 %r299, %r292, 4096; 2026-02-21T09:05:30.4140364Z bfe.u32 %r300, %r299, 4, 14; 2026-02-21T09:05:30.4140424Z cvt.u64.u32 %rd85, %r300; 2026-02-21T09:05:30.4140489Z or.b64 %rd80, %rd85, -4611685949674356736; 2026-02-21T09:05:30.4140543Z // begin inline asm 2026-02-21T09:05:30.4140686Z @%p118 tcgen05.mma.cta_group::1.kind::f16 [ %r503 + 64 ], %rd80, %rd79, %r287, %p147; 2026-02-21T09:05:30.4140739Z // end inline asm 2026-02-21T09:05:30.4140794Z cvt.u64.u32 %rd82, %r527; 2026-02-21T09:05:30.4140846Z // begin inline asm 2026-02-21T09:05:30.4140974Z @%p118 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd82]; 2026-02-21T09:05:30.4141027Z // end inline asm 2026-02-21T09:05:30.4141119Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.4141316Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4141376Z setp.eq.b32 %p125, %r39, 0; 2026-02-21T09:05:30.4141437Z setp.lt.s32 %p126, %r523, %r11; 2026-02-21T09:05:30.4141604Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4141659Z // begin inline asm 2026-02-21T09:05:30.4141707Z 2026-02-21T09:05:30.4141755Z { 
2026-02-21T09:05:30.4141819Z .reg .pred complete; 2026-02-21T09:05:30.4141873Z waitLoop: 2026-02-21T09:05:30.4142007Z mbarrier.try_wait.parity.shared.b64 complete, [%r513], %r512; 2026-02-21T09:05:30.4142074Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.4142122Z } 2026-02-21T09:05:30.4142126Z 2026-02-21T09:05:30.4142177Z // end inline asm 2026-02-21T09:05:30.4142239Z add.s32 %r312, %r517, 1; 2026-02-21T09:05:30.4142299Z setp.gt.s32 %p129, %r312, 1; 2026-02-21T09:05:30.4142464Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4142520Z add.s32 %r314, %r511, 16; 2026-02-21T09:05:30.4142584Z add.s32 %r315, %r516, 1; 2026-02-21T09:05:30.4142641Z setp.gt.s32 %p130, %r315, 3; 2026-02-21T09:05:30.4142703Z selp.b32 %r516, 0, %r315, %p130; 2026-02-21T09:05:30.4142768Z selp.b32 %r511, 0, %r314, %p125; 2026-02-21T09:05:30.4142821Z shl.b32 %r316, %r516, 3; 2026-02-21T09:05:30.4142874Z add.s32 %r318, %r56, %r316; 2026-02-21T09:05:30.4142928Z add.s32 %r307, %r318, 73728; 2026-02-21T09:05:30.4143007Z bar.sync 0; 2026-02-21T09:05:30.4143073Z and.pred %p122, %p140, %p126; 2026-02-21T09:05:30.4143128Z // begin inline asm 2026-02-21T09:05:30.4143245Z @%p122 mbarrier.arrive.expect_tx.shared.b64 _, [%r307], 10240; 2026-02-21T09:05:30.4143298Z // end inline asm 2026-02-21T09:05:30.4143460Z .loc 1 54 31 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:54:31 2026-02-21T09:05:30.4143521Z shl.b32 %r319, %r516, 13; 2026-02-21T09:05:30.4143574Z add.s32 %r304, %r56, %r319; 2026-02-21T09:05:30.4143626Z bar.sync 0; 2026-02-21T09:05:30.4143685Z elect.sync %r320|%p131, -1; 2026-02-21T09:05:30.4143753Z and.pred %p132, %p126, %p131; 2026-02-21T09:05:30.4143810Z and.pred %p123, %p3, %p132; 2026-02-21T09:05:30.4143862Z // begin inline asm 2026-02-21T09:05:30.4144109Z @%p123 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r304], [%rd58, {%r511, %r525}], [%r307]; 2026-02-21T09:05:30.4144160Z // end inline asm 
2026-02-21T09:05:30.4144318Z .loc 1 55 44 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:55:44 2026-02-21T09:05:30.4144381Z shl.b32 %r321, %r516, 11; 2026-02-21T09:05:30.4144435Z add.s32 %r322, %r56, %r321; 2026-02-21T09:05:30.4144489Z add.s32 %r308, %r322, 65536; 2026-02-21T09:05:30.4144573Z bar.sync 0; 2026-02-21T09:05:30.4144640Z elect.sync %r323|%p133, -1; 2026-02-21T09:05:30.4144733Z and.pred %p134, %p126, %p133; 2026-02-21T09:05:30.4144792Z and.pred %p124, %p3, %p134; 2026-02-21T09:05:30.4144853Z // begin inline asm 2026-02-21T09:05:30.4145083Z @%p124 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r308], [%rd59, {%r511, %r524}], [%r307]; 2026-02-21T09:05:30.4145135Z // end inline asm 2026-02-21T09:05:30.4145306Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4145367Z setp.ne.b32 %p147, %r510, 127; 2026-02-21T09:05:30.4145426Z @%p147 bra $L__BB0_10; 2026-02-21T09:05:30.4145514Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:05:30.4145686Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4145741Z // begin inline asm 2026-02-21T09:05:30.4145790Z 2026-02-21T09:05:30.4145848Z { 2026-02-21T09:05:30.4145905Z .reg .pred complete; 2026-02-21T09:05:30.4145956Z waitLoop: 2026-02-21T09:05:30.4146067Z mbarrier.try_wait.parity.shared.b64 complete, [%r527], %r528; 2026-02-21T09:05:30.4146163Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.4146211Z } 2026-02-21T09:05:30.4146215Z 2026-02-21T09:05:30.4146268Z // end inline asm 2026-02-21T09:05:30.4146330Z // begin inline asm 2026-02-21T09:05:30.4146601Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341}, [%r81 + 0]; 2026-02-21T09:05:30.4146653Z // end inline asm 2026-02-21T09:05:30.4146714Z // begin inline asm 2026-02-21T09:05:30.4147006Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358}, [%r81 + 16]; 2026-02-21T09:05:30.4147059Z // end inline asm 2026-02-21T09:05:30.4147119Z // begin inline asm 2026-02-21T09:05:30.4147381Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r360, %r361, %r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375}, [%r81 + 32]; 2026-02-21T09:05:30.4147435Z // end inline asm 2026-02-21T09:05:30.4147487Z // begin inline asm 2026-02-21T09:05:30.4147827Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r377, %r378, %r379, %r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392}, [%r81 + 48]; 2026-02-21T09:05:30.4147877Z // end inline asm 2026-02-21T09:05:30.4147929Z // begin inline asm 2026-02-21T09:05:30.4148003Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:30.4148054Z // end inline asm 2026-02-21T09:05:30.4148144Z cvt.u64.u32 %rd89, %r326; 2026-02-21T09:05:30.4148210Z cvt.u64.u32 %rd90, %r327; 2026-02-21T09:05:30.4148267Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:05:30.4148322Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:05:30.4148487Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4148555Z mov.b64 {%r397, %r398}, %rd92; 2026-02-21T09:05:30.4148621Z cvt.rn.f16x2.f32 %r399, %r398, %r397; 2026-02-21T09:05:30.4148783Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4148846Z cvt.u64.u32 %rd93, %r328; 2026-02-21T09:05:30.4148899Z cvt.u64.u32 %rd94, %r329; 2026-02-21T09:05:30.4148954Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:05:30.4149016Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:05:30.4149172Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4149231Z mov.b64 {%r400, %r401}, %rd96; 2026-02-21T09:05:30.4149297Z cvt.rn.f16x2.f32 %r402, %r401, %r400; 2026-02-21T09:05:30.4149463Z 
.loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4149517Z cvt.u64.u32 %rd97, %r330; 2026-02-21T09:05:30.4149597Z cvt.u64.u32 %rd98, %r331; 2026-02-21T09:05:30.4149659Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:05:30.4149716Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:05:30.4149876Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4149943Z mov.b64 {%r403, %r404}, %rd100; 2026-02-21T09:05:30.4150005Z cvt.rn.f16x2.f32 %r405, %r404, %r403; 2026-02-21T09:05:30.4150161Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4150218Z cvt.u64.u32 %rd101, %r332; 2026-02-21T09:05:30.4150282Z cvt.u64.u32 %rd102, %r333; 2026-02-21T09:05:30.4150338Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:05:30.4150397Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:05:30.4150560Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4150617Z mov.b64 {%r406, %r407}, %rd104; 2026-02-21T09:05:30.4150677Z cvt.rn.f16x2.f32 %r408, %r407, %r406; 2026-02-21T09:05:30.4150848Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4150907Z cvt.u64.u32 %rd105, %r334; 2026-02-21T09:05:30.4150984Z cvt.u64.u32 %rd106, %r335; 2026-02-21T09:05:30.4151044Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:05:30.4151113Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:05:30.4151281Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4151341Z mov.b64 {%r409, %r410}, %rd108; 2026-02-21T09:05:30.4151414Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:05:30.4151582Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4151665Z cvt.u64.u32 %rd109, %r336; 2026-02-21T09:05:30.4151729Z cvt.u64.u32 %rd110, %r337; 2026-02-21T09:05:30.4151786Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:05:30.4151845Z or.b64 %rd112, 
%rd109, %rd111; 2026-02-21T09:05:30.4152014Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4152081Z mov.b64 {%r412, %r413}, %rd112; 2026-02-21T09:05:30.4152145Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:05:30.4152314Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4152381Z cvt.u64.u32 %rd113, %r338; 2026-02-21T09:05:30.4152438Z cvt.u64.u32 %rd114, %r339; 2026-02-21T09:05:30.4152494Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:05:30.4152559Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:05:30.4152748Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4152812Z mov.b64 {%r415, %r416}, %rd116; 2026-02-21T09:05:30.4152876Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:05:30.4153058Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4153117Z cvt.u64.u32 %rd117, %r340; 2026-02-21T09:05:30.4153174Z cvt.u64.u32 %rd118, %r341; 2026-02-21T09:05:30.4153237Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:05:30.4153295Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:05:30.4153464Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4153529Z mov.b64 {%r418, %r419}, %rd120; 2026-02-21T09:05:30.4153591Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:05:30.4153755Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4153813Z cvt.u64.u32 %rd121, %r343; 2026-02-21T09:05:30.4153878Z cvt.u64.u32 %rd122, %r344; 2026-02-21T09:05:30.4153937Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:05:30.4153997Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:05:30.4154168Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4154256Z mov.b64 {%r421, %r422}, %rd124; 2026-02-21T09:05:30.4154318Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:05:30.4154493Z .loc 1 
56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4154553Z cvt.u64.u32 %rd125, %r345; 2026-02-21T09:05:30.4154609Z cvt.u64.u32 %rd126, %r346; 2026-02-21T09:05:30.4154666Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:05:30.4154763Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:05:30.4154932Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4154990Z mov.b64 {%r424, %r425}, %rd128; 2026-02-21T09:05:30.4155059Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:05:30.4155229Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4155287Z cvt.u64.u32 %rd129, %r347; 2026-02-21T09:05:30.4155349Z cvt.u64.u32 %rd130, %r348; 2026-02-21T09:05:30.4155407Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:05:30.4155466Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:05:30.4155635Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4155732Z mov.b64 {%r427, %r428}, %rd132; 2026-02-21T09:05:30.4155796Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:05:30.4155965Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4156029Z cvt.u64.u32 %rd133, %r349; 2026-02-21T09:05:30.4156086Z cvt.u64.u32 %rd134, %r350; 2026-02-21T09:05:30.4156143Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:05:30.4156208Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:05:30.4156377Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4156459Z mov.b64 {%r430, %r431}, %rd136; 2026-02-21T09:05:30.4156521Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:05:30.4156699Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4156756Z cvt.u64.u32 %rd137, %r351; 2026-02-21T09:05:30.4156813Z cvt.u64.u32 %rd138, %r352; 2026-02-21T09:05:30.4156877Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:05:30.4156936Z or.b64 %rd140, 
%rd137, %rd139; 2026-02-21T09:05:30.4157099Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4157163Z mov.b64 {%r433, %r434}, %rd140; 2026-02-21T09:05:30.4157225Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:05:30.4157415Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4157477Z cvt.u64.u32 %rd141, %r353; 2026-02-21T09:05:30.4157542Z cvt.u64.u32 %rd142, %r354; 2026-02-21T09:05:30.4157599Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:05:30.4157657Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:05:30.4157827Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4157886Z mov.b64 {%r436, %r437}, %rd144; 2026-02-21T09:05:30.4157948Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:05:30.4158121Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4158179Z cvt.u64.u32 %rd145, %r355; 2026-02-21T09:05:30.4158234Z cvt.u64.u32 %rd146, %r356; 2026-02-21T09:05:30.4158291Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:05:30.4158356Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:05:30.4158526Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4158583Z mov.b64 {%r439, %r440}, %rd148; 2026-02-21T09:05:30.4158650Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:05:30.4158809Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4158862Z cvt.u64.u32 %rd149, %r357; 2026-02-21T09:05:30.4158949Z cvt.u64.u32 %rd150, %r358; 2026-02-21T09:05:30.4159003Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:05:30.4159057Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:05:30.4159216Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4159280Z mov.b64 {%r442, %r443}, %rd152; 2026-02-21T09:05:30.4159339Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:05:30.4159497Z .loc 1 
56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4159558Z cvt.u64.u32 %rd153, %r360; 2026-02-21T09:05:30.4159612Z cvt.u64.u32 %rd154, %r361; 2026-02-21T09:05:30.4159669Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:05:30.4159735Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:05:30.4159893Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4159947Z mov.b64 {%r445, %r446}, %rd156; 2026-02-21T09:05:30.4160009Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:05:30.4160178Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4160233Z cvt.u64.u32 %rd157, %r362; 2026-02-21T09:05:30.4160319Z cvt.u64.u32 %rd158, %r363; 2026-02-21T09:05:30.4160384Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:05:30.4160441Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:05:30.4160601Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4160662Z mov.b64 {%r448, %r449}, %rd160; 2026-02-21T09:05:30.4160722Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:05:30.4160882Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4160960Z cvt.u64.u32 %rd161, %r364; 2026-02-21T09:05:30.4161023Z cvt.u64.u32 %rd162, %r365; 2026-02-21T09:05:30.4161078Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:05:30.4161135Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:05:30.4161297Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4161351Z mov.b64 {%r451, %r452}, %rd164; 2026-02-21T09:05:30.4161411Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:05:30.4161572Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4161627Z cvt.u64.u32 %rd165, %r366; 2026-02-21T09:05:30.4161680Z cvt.u64.u32 %rd166, %r367; 2026-02-21T09:05:30.4161734Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:05:30.4161800Z or.b64 %rd168, 
%rd165, %rd167; 2026-02-21T09:05:30.4161991Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4162050Z mov.b64 {%r454, %r455}, %rd168; 2026-02-21T09:05:30.4162117Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:05:30.4162280Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4162337Z cvt.u64.u32 %rd169, %r368; 2026-02-21T09:05:30.4162397Z cvt.u64.u32 %rd170, %r369; 2026-02-21T09:05:30.4162451Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:05:30.4162507Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:05:30.4162665Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4162728Z mov.b64 {%r457, %r458}, %rd172; 2026-02-21T09:05:30.4162786Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:05:30.4162943Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4163005Z cvt.u64.u32 %rd173, %r370; 2026-02-21T09:05:30.4163060Z cvt.u64.u32 %rd174, %r371; 2026-02-21T09:05:30.4163116Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:05:30.4163178Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:05:30.4163332Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4163412Z mov.b64 {%r460, %r461}, %rd176; 2026-02-21T09:05:30.4163472Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:05:30.4163640Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4163695Z cvt.u64.u32 %rd177, %r372; 2026-02-21T09:05:30.4163750Z cvt.u64.u32 %rd178, %r373; 2026-02-21T09:05:30.4163811Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:05:30.4163865Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:05:30.4164020Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4164081Z mov.b64 {%r463, %r464}, %rd180; 2026-02-21T09:05:30.4164141Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:05:30.4164299Z .loc 1 
56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4164352Z cvt.u64.u32 %rd181, %r374; 2026-02-21T09:05:30.4164413Z cvt.u64.u32 %rd182, %r375; 2026-02-21T09:05:30.4164470Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:05:30.4164523Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:05:30.4164736Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4164793Z mov.b64 {%r466, %r467}, %rd184; 2026-02-21T09:05:30.4164852Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T09:05:30.4165014Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4165069Z cvt.u64.u32 %rd185, %r377; 2026-02-21T09:05:30.4165123Z cvt.u64.u32 %rd186, %r378; 2026-02-21T09:05:30.4165177Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:05:30.4165240Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:05:30.4165423Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4165478Z mov.b64 {%r469, %r470}, %rd188; 2026-02-21T09:05:30.4165544Z cvt.rn.f16x2.f32 %r471, %r470, %r469; 2026-02-21T09:05:30.4165705Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4165760Z cvt.u64.u32 %rd189, %r379; 2026-02-21T09:05:30.4165823Z cvt.u64.u32 %rd190, %r380; 2026-02-21T09:05:30.4165880Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:05:30.4165936Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:05:30.4166099Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4166163Z mov.b64 {%r472, %r473}, %rd192; 2026-02-21T09:05:30.4166222Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T09:05:30.4166412Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4166477Z cvt.u64.u32 %rd193, %r381; 2026-02-21T09:05:30.4166532Z cvt.u64.u32 %rd194, %r382; 2026-02-21T09:05:30.4166587Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:05:30.4166649Z or.b64 %rd196, 
%rd193, %rd195; 2026-02-21T09:05:30.4166810Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4166866Z mov.b64 {%r475, %r476}, %rd196; 2026-02-21T09:05:30.4166922Z cvt.rn.f16x2.f32 %r477, %r476, %r475; 2026-02-21T09:05:30.4167086Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4167142Z cvt.u64.u32 %rd197, %r383; 2026-02-21T09:05:30.4167196Z cvt.u64.u32 %rd198, %r384; 2026-02-21T09:05:30.4167258Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:05:30.4167313Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:05:30.4167472Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4167536Z mov.b64 {%r478, %r479}, %rd200; 2026-02-21T09:05:30.4167597Z cvt.rn.f16x2.f32 %r480, %r479, %r478; 2026-02-21T09:05:30.4167755Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4167838Z cvt.u64.u32 %rd201, %r385; 2026-02-21T09:05:30.4167899Z cvt.u64.u32 %rd202, %r386; 2026-02-21T09:05:30.4167955Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:05:30.4168010Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:05:30.4168175Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4168230Z mov.b64 {%r481, %r482}, %rd204; 2026-02-21T09:05:30.4168288Z cvt.rn.f16x2.f32 %r483, %r482, %r481; 2026-02-21T09:05:30.4168454Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4168509Z cvt.u64.u32 %rd205, %r387; 2026-02-21T09:05:30.4168566Z cvt.u64.u32 %rd206, %r388; 2026-02-21T09:05:30.4168621Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:05:30.4168685Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:05:30.4168844Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4168901Z mov.b64 {%r484, %r485}, %rd208; 2026-02-21T09:05:30.4168967Z cvt.rn.f16x2.f32 %r486, %r485, %r484; 2026-02-21T09:05:30.4169123Z .loc 1 
56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4169201Z cvt.u64.u32 %rd209, %r389; 2026-02-21T09:05:30.4169267Z cvt.u64.u32 %rd210, %r390; 2026-02-21T09:05:30.4169321Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:05:30.4169376Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:05:30.4169534Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4169598Z mov.b64 {%r487, %r488}, %rd212; 2026-02-21T09:05:30.4169658Z cvt.rn.f16x2.f32 %r489, %r488, %r487; 2026-02-21T09:05:30.4169817Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4169910Z cvt.u64.u32 %rd213, %r391; 2026-02-21T09:05:30.4169964Z cvt.u64.u32 %rd214, %r392; 2026-02-21T09:05:30.4170019Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:05:30.4170077Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:05:30.4170235Z .loc 1 58 27 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:58:27 2026-02-21T09:05:30.4170293Z mov.b64 {%r490, %r491}, %rd216; 2026-02-21T09:05:30.4170352Z cvt.rn.f16x2.f32 %r492, %r491, %r490; 2026-02-21T09:05:30.4170510Z .loc 1 59 45 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:59:45 2026-02-21T09:05:30.4170579Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.4170631Z bar.sync 0; 2026-02-21T09:05:30.4170729Z st.shared.v4.b32 [%r12], {%r399, %r402, %r405, %r408}; 2026-02-21T09:05:30.4170837Z st.shared.v4.b32 [%r13], {%r411, %r414, %r417, %r420}; 2026-02-21T09:05:30.4170926Z st.shared.v4.b32 [%r14], {%r423, %r426, %r429, %r432}; 2026-02-21T09:05:30.4171014Z st.shared.v4.b32 [%r15], {%r435, %r438, %r441, %r444}; 2026-02-21T09:05:30.4171095Z st.shared.v4.b32 [%r16], {%r447, %r450, %r453, %r456}; 2026-02-21T09:05:30.4171176Z st.shared.v4.b32 [%r17], {%r459, %r462, %r465, %r468}; 2026-02-21T09:05:30.4171255Z st.shared.v4.b32 [%r18], {%r471, %r474, %r477, %r480}; 2026-02-21T09:05:30.4171343Z st.shared.v4.b32 [%r19], {%r483, %r486, %r489, %r492}; 
2026-02-21T09:05:30.4171399Z // begin inline asm 2026-02-21T09:05:30.4171472Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.4171532Z // end inline asm 2026-02-21T09:05:30.4171582Z bar.sync 0; 2026-02-21T09:05:30.4171645Z elect.sync %r493|%p137, -1; 2026-02-21T09:05:30.4171706Z and.pred %p135, %p3, %p137; 2026-02-21T09:05:30.4171767Z // begin inline asm 2026-02-21T09:05:30.4171951Z @%p135 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd88, {%r506, %r508}], [%r255]; 2026-02-21T09:05:30.4172005Z // end inline asm 2026-02-21T09:05:30.4172078Z cp.async.bulk.commit_group; 2026-02-21T09:05:30.4172134Z bra.uni $L__BB0_10; 2026-02-21T09:05:30.4172215Z $L__BB0_11: // %._crit_edge 2026-02-21T09:05:30.4172390Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4172481Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.4172532Z bar.sync 0; 2026-02-21T09:05:30.4172587Z @%p79 bra $L__BB0_13; 2026-02-21T09:05:30.4172644Z // %bb.12: 2026-02-21T09:05:30.4172803Z .loc 1 56 52 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:56:52 2026-02-21T09:05:30.4172855Z // begin inline asm 2026-02-21T09:05:30.4172911Z 2026-02-21T09:05:30.4172959Z { 2026-02-21T09:05:30.4173015Z .reg .pred complete; 2026-02-21T09:05:30.4173064Z waitLoop: 2026-02-21T09:05:30.4173185Z mbarrier.try_wait.parity.shared.b64 complete, [%r527], %r528; 2026-02-21T09:05:30.4173246Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.4173293Z } 2026-02-21T09:05:30.4173296Z 2026-02-21T09:05:30.4173354Z // end inline asm 2026-02-21T09:05:30.4173405Z $L__BB0_13: 2026-02-21T09:05:30.4173569Z .loc 1 33 107 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:107 2026-02-21T09:05:30.4173629Z // begin inline asm 2026-02-21T09:05:30.4173712Z @%p140 mbarrier.inval.shared::cta.b64 [%r151]; 2026-02-21T09:05:30.4173762Z // end inline asm 2026-02-21T09:05:30.4173812Z bar.sync 0; 2026-02-21T09:05:30.4173893Z // begin inline asm 2026-02-21T09:05:30.4173976Z 
@%p140 mbarrier.inval.shared::cta.b64 [%r152]; 2026-02-21T09:05:30.4174026Z // end inline asm 2026-02-21T09:05:30.4174082Z bar.sync 0; 2026-02-21T09:05:30.4174135Z // begin inline asm 2026-02-21T09:05:30.4174211Z @%p140 mbarrier.inval.shared::cta.b64 [%r153]; 2026-02-21T09:05:30.4174262Z // end inline asm 2026-02-21T09:05:30.4174319Z bar.sync 0; 2026-02-21T09:05:30.4174371Z // begin inline asm 2026-02-21T09:05:30.4174447Z @%p140 mbarrier.inval.shared::cta.b64 [%r229]; 2026-02-21T09:05:30.4174526Z // end inline asm 2026-02-21T09:05:30.4174582Z add.s32 %r501, %r56, 73760; 2026-02-21T09:05:30.4174634Z // begin inline asm 2026-02-21T09:05:30.4174736Z @%p140 mbarrier.inval.shared::cta.b64 [%r501]; 2026-02-21T09:05:30.4174790Z // end inline asm 2026-02-21T09:05:30.4174839Z bar.sync 0; 2026-02-21T09:05:30.4174890Z // begin inline asm 2026-02-21T09:05:30.4174972Z @%p140 mbarrier.inval.shared::cta.b64 [%r150]; 2026-02-21T09:05:30.4175024Z // end inline asm 2026-02-21T09:05:30.4175184Z .loc 1 33 4 // cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py:33:4 2026-02-21T09:05:30.4175241Z bar.sync 0; 2026-02-21T09:05:30.4175294Z // begin inline asm 2026-02-21T09:05:30.4175404Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r503, 128; 2026-02-21T09:05:30.4175456Z // end inline asm 2026-02-21T09:05:30.4175513Z ret; 2026-02-21T09:05:30.4175592Z $L__tmp1: 2026-02-21T09:05:30.4175648Z $L__func_end0: 2026-02-21T09:05:30.4175735Z // -- End function 2026-02-21T09:05:30.4175784Z } 2026-02-21T09:05:30.4175980Z .file 1 "/tmp/torchinductor_root/mk/cmk3gtpy6utz725w4vdlkzzuydmk3besuasitvzv5u5y54k72g3e.py" 2026-02-21T09:05:30.4176048Z .section .debug_abbrev 2026-02-21T09:05:30.4176098Z { 2026-02-21T09:05:30.4176182Z .b8 1 // Abbreviation Code 2026-02-21T09:05:30.4176268Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:30.4176352Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.4176429Z .b8 37 // DW_AT_producer 2026-02-21T09:05:30.4176501Z .b8 8 // DW_FORM_string 
2026-02-21T09:05:30.4176579Z .b8 19 // DW_AT_language 2026-02-21T09:05:30.4176651Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:30.4176725Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.4176803Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.4176877Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:30.4176947Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:30.4177050Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:30.4177126Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.4177194Z .b8 0 // EOM(1) 2026-02-21T09:05:30.4177259Z .b8 0 // EOM(2) 2026-02-21T09:05:30.4177329Z .b8 0 // EOM(3) 2026-02-21T09:05:30.4177377Z } 2026-02-21T09:05:30.4177433Z .section .debug_info 2026-02-21T09:05:30.4177485Z { 2026-02-21T09:05:30.4177562Z .b32 104 // Length of Unit 2026-02-21T09:05:30.4177643Z .b8 2 // DWARF version number 2026-02-21T09:05:30.4177692Z .b8 0 2026-02-21T09:05:30.4177813Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:30.4177896Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:30.4177990Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:30.4178074Z .b8 116 // DW_AT_producer 2026-02-21T09:05:30.4178125Z .b8 114 2026-02-21T09:05:30.4178175Z .b8 105 2026-02-21T09:05:30.4178222Z .b8 116 2026-02-21T09:05:30.4178299Z .b8 111 2026-02-21T09:05:30.4178348Z .b8 110 2026-02-21T09:05:30.4178396Z .b8 0 2026-02-21T09:05:30.4178470Z .b8 2 // DW_AT_language 2026-02-21T09:05:30.4178519Z .b8 0 2026-02-21T09:05:30.4178588Z .b8 99 // DW_AT_name 2026-02-21T09:05:30.4178636Z .b8 109 2026-02-21T09:05:30.4178691Z .b8 107 2026-02-21T09:05:30.4178738Z .b8 51 2026-02-21T09:05:30.4178784Z .b8 103 2026-02-21T09:05:30.4178837Z .b8 116 2026-02-21T09:05:30.4178884Z .b8 112 2026-02-21T09:05:30.4178970Z .b8 121 2026-02-21T09:05:30.4179019Z .b8 54 2026-02-21T09:05:30.4179075Z .b8 117 2026-02-21T09:05:30.4179123Z .b8 116 2026-02-21T09:05:30.4179171Z .b8 122 2026-02-21T09:05:30.4179225Z .b8 55 2026-02-21T09:05:30.4179273Z .b8 50 2026-02-21T09:05:30.4179321Z .b8 53 2026-02-21T09:05:30.4179369Z .b8 
119 2026-02-21T09:05:30.4179424Z .b8 52 2026-02-21T09:05:30.4179471Z .b8 118 2026-02-21T09:05:30.4179518Z .b8 100 2026-02-21T09:05:30.4179565Z .b8 108 2026-02-21T09:05:30.4179622Z .b8 107 2026-02-21T09:05:30.4179668Z .b8 122 2026-02-21T09:05:30.4179715Z .b8 122 2026-02-21T09:05:30.4179769Z .b8 117 2026-02-21T09:05:30.4179817Z .b8 121 2026-02-21T09:05:30.4179863Z .b8 100 2026-02-21T09:05:30.4179909Z .b8 109 2026-02-21T09:05:30.4179964Z .b8 107 2026-02-21T09:05:30.4180011Z .b8 51 2026-02-21T09:05:30.4180058Z .b8 98 2026-02-21T09:05:30.4180111Z .b8 101 2026-02-21T09:05:30.4180156Z .b8 115 2026-02-21T09:05:30.4180202Z .b8 117 2026-02-21T09:05:30.4180269Z .b8 97 2026-02-21T09:05:30.4180327Z .b8 115 2026-02-21T09:05:30.4180376Z .b8 105 2026-02-21T09:05:30.4180424Z .b8 116 2026-02-21T09:05:30.4180477Z .b8 118 2026-02-21T09:05:30.4180525Z .b8 122 2026-02-21T09:05:30.4180572Z .b8 118 2026-02-21T09:05:30.4180619Z .b8 53 2026-02-21T09:05:30.4180675Z .b8 117 2026-02-21T09:05:30.4180723Z .b8 53 2026-02-21T09:05:30.4180770Z .b8 121 2026-02-21T09:05:30.4180817Z .b8 53 2026-02-21T09:05:30.4180868Z .b8 52 2026-02-21T09:05:30.4180914Z .b8 107 2026-02-21T09:05:30.4180960Z .b8 55 2026-02-21T09:05:30.4181013Z .b8 50 2026-02-21T09:05:30.4181060Z .b8 103 2026-02-21T09:05:30.4181107Z .b8 51 2026-02-21T09:05:30.4181153Z .b8 101 2026-02-21T09:05:30.4181207Z .b8 46 2026-02-21T09:05:30.4181255Z .b8 112 2026-02-21T09:05:30.4181302Z .b8 121 2026-02-21T09:05:30.4181353Z .b8 0 2026-02-21T09:05:30.4181439Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:30.4181511Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:30.4181559Z .b8 116 2026-02-21T09:05:30.4181614Z .b8 109 2026-02-21T09:05:30.4181663Z .b8 112 2026-02-21T09:05:30.4181710Z .b8 47 2026-02-21T09:05:30.4181763Z .b8 116 2026-02-21T09:05:30.4181810Z .b8 111 2026-02-21T09:05:30.4181857Z .b8 114 2026-02-21T09:05:30.4181904Z .b8 99 2026-02-21T09:05:30.4181982Z .b8 104 2026-02-21T09:05:30.4182030Z .b8 105 2026-02-21T09:05:30.4182076Z .b8 110 
2026-02-21T09:05:30.4182123Z .b8 100 2026-02-21T09:05:30.4182176Z .b8 117 2026-02-21T09:05:30.4182223Z .b8 99 2026-02-21T09:05:30.4182270Z .b8 116 2026-02-21T09:05:30.4182323Z .b8 111 2026-02-21T09:05:30.4182370Z .b8 114 2026-02-21T09:05:30.4182415Z .b8 95 2026-02-21T09:05:30.4182463Z .b8 114 2026-02-21T09:05:30.4182516Z .b8 111 2026-02-21T09:05:30.4182563Z .b8 111 2026-02-21T09:05:30.4182609Z .b8 116 2026-02-21T09:05:30.4182660Z .b8 47 2026-02-21T09:05:30.4182707Z .b8 109 2026-02-21T09:05:30.4182753Z .b8 107 2026-02-21T09:05:30.4182799Z .b8 0 2026-02-21T09:05:30.4182857Z } 2026-02-21T09:05:30.4182921Z .section .debug_macinfo { } 2026-02-21T09:05:30.4182926Z 2026-02-21T09:05:30.4183001Z ================================================================ 2026-02-21T09:05:30.4183107Z please share the reproducer above with Triton project. 2026-02-21T09:05:30.5575050Z [30s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:30.5576259Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=5, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:05:30.5576754Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:30.5576772Z 2026-02-21T09:05:30.5576777Z 2026-02-21T09:05:30.5576781Z 2026-02-21T09:05:30.5576923Z ================================================================ 2026-02-21T09:05:30.5577236Z Internal Triton PTX codegen error 2026-02-21T09:05:30.5577314Z `ptxas` stderr: 2026-02-21T09:05:30.5577756Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 262 in function 
_helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.5577878Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.5577884Z 2026-02-21T09:05:30.5578405Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbc94bn2e.ptx -o /tmp/tmpbc94bn2e.ptx.o 2026-02-21T09:05:30.5578410Z 2026-02-21T09:05:30.5578414Z 2026-02-21T09:05:30.5578489Z // 2026-02-21T09:05:30.5578578Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:30.5578639Z // 2026-02-21T09:05:30.5578643Z 2026-02-21T09:05:30.5578718Z .version 8.7 2026-02-21T09:05:30.5578786Z .target sm_100a 2026-02-21T09:05:30.5578915Z .address_size 64 2026-02-21T09:05:30.5578922Z 2026-02-21T09:05:30.5579048Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:05:30.5579138Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:30.5579218Z // @_helion_matmul 2026-02-21T09:05:30.5579289Z .visible .entry _helion_matmul( 2026-02-21T09:05:30.5579407Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:05:30.5579502Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:05:30.5579593Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:05:30.5579692Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:05:30.5579784Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:05:30.5579833Z ) 2026-02-21T09:05:30.5579886Z .reqntid 256 2026-02-21T09:05:30.5579948Z .maxnreg 32 2026-02-21T09:05:30.5579997Z { 2026-02-21T09:05:30.5580058Z .reg .pred %p<125>; 2026-02-21T09:05:30.5580122Z .reg .b32 %r<343>; 2026-02-21T09:05:30.5580178Z .reg .b64 %rd<110>; 2026-02-21T09:05:30.5580354Z .loc 1 19 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19:0 2026-02-21T09:05:30.5580407Z $L__func_begin0: 2026-02-21T09:05:30.5580624Z .loc 1 19 0 // 
c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19:0 2026-02-21T09:05:30.5580628Z 2026-02-21T09:05:30.5580679Z // %bb.0: 2026-02-21T09:05:30.5580766Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:05:30.5580824Z $L__tmp0: 2026-02-21T09:05:30.5580981Z .loc 1 19 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19 2026-02-21T09:05:30.5581036Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:30.5581096Z shr.u32 %r2, %r1, 5; 2026-02-21T09:05:30.5581166Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:05:30.5581229Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:05:30.5581288Z @%p3 bra $L__BB0_16; 2026-02-21T09:05:30.5581349Z bra.uni $L__BB0_1; 2026-02-21T09:05:30.5581561Z `ptxas` stderr: 2026-02-21T09:05:30.5581897Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 262 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:05:30.5581994Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.5581998Z 2026-02-21T09:05:30.5582419Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbc94bn2e.ptx -o /tmp/tmpbc94bn2e.ptx.o 2026-02-21T09:05:30.5582424Z 2026-02-21T09:05:30.5582553Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:05:30.5582612Z $L__BB0_16: 2026-02-21T09:05:30.5582780Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0:0 2026-02-21T09:05:30.5582861Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:05:30.5582945Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:05:30.5583040Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:05:30.5583197Z .loc 1 19 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19 2026-02-21T09:05:30.5583284Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.5583350Z setp.lt.u32 %p27, %r1, 32; 2026-02-21T09:05:30.5583412Z mov.b32 %r146, global_smem; 2026-02-21T09:05:30.5583468Z // begin inline asm 2026-02-21T09:05:30.5583624Z @%p27 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r146], 32; 2026-02-21T09:05:30.5583682Z // end inline asm 2026-02-21T09:05:30.5583737Z bar.sync 0, 128; 2026-02-21T09:05:30.5583812Z ld.shared.b32 %r314, [global_smem]; 2026-02-21T09:05:30.5583867Z bar.sync 0, 128; 2026-02-21T09:05:30.5583922Z // begin inline asm 2026-02-21T09:05:30.5584051Z @%p27 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:05:30.5584106Z // end inline asm 2026-02-21T09:05:30.5584299Z .loc 1 21 67 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:21:67 2026-02-21T09:05:30.5584363Z mov.u32 %r207, %ctaid.x; 2026-02-21T09:05:30.5584430Z mov.u32 %r208, %ctaid.y; 2026-02-21T09:05:30.5584488Z mov.u32 %r209, %ctaid.z; 2026-02-21T09:05:30.5584546Z mov.u32 %r210, %nctaid.x; 2026-02-21T09:05:30.5584612Z mov.u32 %r211, %nctaid.y; 2026-02-21T09:05:30.5584759Z mad.lo.s32 %r212, %r209, %r211, %r208; 2026-02-21T09:05:30.5584826Z mad.lo.s32 %r213, %r212, %r210, %r207; 2026-02-21T09:05:30.5584890Z mul.lo.s32 %r214, %r213, 384; 2026-02-21T09:05:30.5584964Z cvt.s64.s32 %rd74, %r214; 2026-02-21T09:05:30.5585024Z add.s64 %rd35, %rd7, %rd74; 2026-02-21T09:05:30.5585082Z shl.b32 %r215, %r1, 2; 2026-02-21T09:05:30.5585149Z add.s32 %r147, %r146, %r215; 
2026-02-21T09:05:30.5585201Z mov.b32 %r342, 0; 2026-02-21T09:05:30.5585255Z // begin inline asm 2026-02-21T09:05:30.5585333Z @%p27 st.shared.b32 [ %r147 + 0 ], %r342; 2026-02-21T09:05:30.5585387Z // end inline asm 2026-02-21T09:05:30.5585448Z bar.warp.sync -1; 2026-02-21T09:05:30.5585511Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:05:30.5585577Z cvt.u64.u32 %rd20, %r146; 2026-02-21T09:05:30.5585633Z // begin inline asm 2026-02-21T09:05:30.5585802Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd4; 2026-02-21T09:05:30.5585894Z // end inline asm 2026-02-21T09:05:30.5585947Z // begin inline asm 2026-02-21T09:05:30.5586091Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5586145Z // end inline asm 2026-02-21T09:05:30.5586201Z mov.b32 %r149, 16; 2026-02-21T09:05:30.5586252Z // begin inline asm 2026-02-21T09:05:30.5586404Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r149; 2026-02-21T09:05:30.5586460Z // end inline asm 2026-02-21T09:05:30.5586513Z mov.b32 %r150, 128; 2026-02-21T09:05:30.5586569Z // begin inline asm 2026-02-21T09:05:30.5586721Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r150; 2026-02-21T09:05:30.5586773Z // end inline asm 2026-02-21T09:05:30.5586824Z mov.b32 %r151, 2048; 2026-02-21T09:05:30.5586874Z // begin inline asm 2026-02-21T09:05:30.5587041Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r151; 2026-02-21T09:05:30.5587092Z // end inline asm 2026-02-21T09:05:30.5587143Z mov.b32 %r152, 4096; 2026-02-21T09:05:30.5587203Z // begin inline asm 2026-02-21T09:05:30.5587392Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r152; 2026-02-21T09:05:30.5587445Z // end inline asm 2026-02-21T09:05:30.5587500Z mov.b64 %rd28, 4096; 2026-02-21T09:05:30.5587549Z // begin inline asm 2026-02-21T09:05:30.5587718Z @%p111 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd20 + 0 ], 0x0, %rd28; 2026-02-21T09:05:30.5587767Z // end inline asm 2026-02-21T09:05:30.5587823Z mov.b32 %r153, 1; 2026-02-21T09:05:30.5587872Z // begin inline asm 2026-02-21T09:05:30.5588042Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r153; 2026-02-21T09:05:30.5588134Z // end inline asm 2026-02-21T09:05:30.5588184Z // begin inline asm 2026-02-21T09:05:30.5588354Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r153; 2026-02-21T09:05:30.5588410Z // end inline asm 2026-02-21T09:05:30.5588463Z // begin inline asm 2026-02-21T09:05:30.5588612Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T09:05:30.5588668Z // end inline asm 2026-02-21T09:05:30.5588719Z // begin inline asm 2026-02-21T09:05:30.5588881Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5588931Z // end inline asm 2026-02-21T09:05:30.5588987Z // begin inline asm 2026-02-21T09:05:30.5589137Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5589187Z // end inline asm 2026-02-21T09:05:30.5589271Z // begin inline asm 2026-02-21T09:05:30.5589416Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5589466Z // end inline asm 2026-02-21T09:05:30.5589521Z // begin inline asm 2026-02-21T09:05:30.5589777Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd35 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T09:05:30.5589828Z // end inline asm 2026-02-21T09:05:30.5589879Z // begin inline asm 2026-02-21T09:05:30.5590009Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd35 + 0 ], 0x80; 2026-02-21T09:05:30.5590078Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.5590146Z @%p27 cp.async.bulk.wait_group.read 0 ; 
2026-02-21T09:05:30.5590199Z // end inline asm 2026-02-21T09:05:30.5590249Z bar.sync 0, 128; 2026-02-21T09:05:30.5590411Z .loc 1 22 67 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:22:67 2026-02-21T09:05:30.5590471Z add.s32 %r216, %r214, 128; 2026-02-21T09:05:30.5590527Z cvt.s64.s32 %rd75, %r216; 2026-02-21T09:05:30.5590584Z add.s64 %rd53, %rd7, %rd75; 2026-02-21T09:05:30.5590635Z bar.sync 0, 128; 2026-02-21T09:05:30.5590694Z // begin inline asm 2026-02-21T09:05:30.5590762Z @%p27 st.shared.b32 [ %r147 + 0 ], %r342; 2026-02-21T09:05:30.5590835Z // end inline asm 2026-02-21T09:05:30.5590898Z bar.warp.sync -1; 2026-02-21T09:05:30.5590949Z // begin inline asm 2026-02-21T09:05:30.5591105Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd5; 2026-02-21T09:05:30.5591160Z // end inline asm 2026-02-21T09:05:30.5591210Z // begin inline asm 2026-02-21T09:05:30.5591347Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5591396Z // end inline asm 2026-02-21T09:05:30.5591450Z // begin inline asm 2026-02-21T09:05:30.5591599Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r149; 2026-02-21T09:05:30.5591646Z // end inline asm 2026-02-21T09:05:30.5591700Z // begin inline asm 2026-02-21T09:05:30.5591844Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r149; 2026-02-21T09:05:30.5591895Z // end inline asm 2026-02-21T09:05:30.5591950Z // begin inline asm 2026-02-21T09:05:30.5592103Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r151; 2026-02-21T09:05:30.5592157Z // end inline asm 2026-02-21T09:05:30.5592207Z // begin inline asm 2026-02-21T09:05:30.5592384Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r151; 2026-02-21T09:05:30.5592437Z // end inline asm 2026-02-21T09:05:30.5592490Z // begin inline asm 2026-02-21T09:05:30.5592662Z @%p111 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd20 + 0 ], 0x0, %rd28; 2026-02-21T09:05:30.5592712Z // end inline asm 2026-02-21T09:05:30.5592762Z // begin inline asm 2026-02-21T09:05:30.5592932Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r153; 2026-02-21T09:05:30.5592983Z // end inline asm 2026-02-21T09:05:30.5593056Z // begin inline asm 2026-02-21T09:05:30.5593223Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r153; 2026-02-21T09:05:30.5593272Z // end inline asm 2026-02-21T09:05:30.5593323Z // begin inline asm 2026-02-21T09:05:30.5593467Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T09:05:30.5593520Z // end inline asm 2026-02-21T09:05:30.5593569Z // begin inline asm 2026-02-21T09:05:30.5593730Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5593783Z // end inline asm 2026-02-21T09:05:30.5593832Z // begin inline asm 2026-02-21T09:05:30.5593975Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5594025Z // end inline asm 2026-02-21T09:05:30.5594074Z // begin inline asm 2026-02-21T09:05:30.5594244Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5594295Z // end inline asm 2026-02-21T09:05:30.5594348Z // begin inline asm 2026-02-21T09:05:30.5594598Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd53 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T09:05:30.5594649Z // end inline asm 2026-02-21T09:05:30.5594738Z // begin inline asm 2026-02-21T09:05:30.5594861Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd53 + 0 ], 0x80; 2026-02-21T09:05:30.5594929Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.5595001Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.5595050Z // end inline asm 
2026-02-21T09:05:30.5595100Z bar.sync 0, 128; 2026-02-21T09:05:30.5595263Z .loc 1 24 71 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:24:71 2026-02-21T09:05:30.5595320Z add.s32 %r217, %r214, 256; 2026-02-21T09:05:30.5595375Z cvt.s64.s32 %rd76, %r217; 2026-02-21T09:05:30.5595430Z add.s64 %rd71, %rd7, %rd76; 2026-02-21T09:05:30.5595486Z bar.sync 0, 128; 2026-02-21T09:05:30.5595536Z // begin inline asm 2026-02-21T09:05:30.5595601Z @%p27 st.shared.b32 [ %r147 + 0 ], %r342; 2026-02-21T09:05:30.5595654Z // end inline asm 2026-02-21T09:05:30.5595709Z bar.warp.sync -1; 2026-02-21T09:05:30.5595817Z // begin inline asm 2026-02-21T09:05:30.5595971Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd20 + 0 ], %rd6; 2026-02-21T09:05:30.5596025Z // end inline asm 2026-02-21T09:05:30.5596077Z // begin inline asm 2026-02-21T09:05:30.5596212Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5596265Z // end inline asm 2026-02-21T09:05:30.5596314Z // begin inline asm 2026-02-21T09:05:30.5596457Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r149; 2026-02-21T09:05:30.5596509Z // end inline asm 2026-02-21T09:05:30.5596558Z // begin inline asm 2026-02-21T09:05:30.5596704Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r150; 2026-02-21T09:05:30.5596753Z // end inline asm 2026-02-21T09:05:30.5596806Z // begin inline asm 2026-02-21T09:05:30.5596958Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r151; 2026-02-21T09:05:30.5597009Z // end inline asm 2026-02-21T09:05:30.5597061Z // begin inline asm 2026-02-21T09:05:30.5597211Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r152; 2026-02-21T09:05:30.5597287Z // end inline asm 2026-02-21T09:05:30.5597343Z // begin inline asm 2026-02-21T09:05:30.5597505Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd20 + 0 ], 0x0, %rd28; 2026-02-21T09:05:30.5597553Z // end inline asm 2026-02-21T09:05:30.5597602Z // begin inline asm 2026-02-21T09:05:30.5597770Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0, %r153; 2026-02-21T09:05:30.5597819Z // end inline asm 2026-02-21T09:05:30.5597869Z // begin inline asm 2026-02-21T09:05:30.5598063Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1, %r153; 2026-02-21T09:05:30.5598113Z // end inline asm 2026-02-21T09:05:30.5598163Z // begin inline asm 2026-02-21T09:05:30.5598309Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x6; 2026-02-21T09:05:30.5598362Z // end inline asm 2026-02-21T09:05:30.5598412Z // begin inline asm 2026-02-21T09:05:30.5598586Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5598637Z // end inline asm 2026-02-21T09:05:30.5598687Z // begin inline asm 2026-02-21T09:05:30.5598833Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x1; 2026-02-21T09:05:30.5598884Z // end inline asm 2026-02-21T09:05:30.5598934Z // begin inline asm 2026-02-21T09:05:30.5599073Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd20 + 0 ], 0x0; 2026-02-21T09:05:30.5599152Z // end inline asm 2026-02-21T09:05:30.5599207Z // begin inline asm 2026-02-21T09:05:30.5599458Z @%p27 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd71 + 0 ], [ %rd20 + 0 ], 0x80; 2026-02-21T09:05:30.5599512Z // end inline asm 2026-02-21T09:05:30.5599563Z // begin inline asm 2026-02-21T09:05:30.5599680Z @%p27 fence.proxy.tensormap::generic.acquire.gpu [ %rd71 + 0 ], 0x80; 2026-02-21T09:05:30.5599749Z @%p27 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.5599817Z @%p27 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.5599867Z // end inline asm 2026-02-21T09:05:30.5599916Z bar.sync 0, 128; 
2026-02-21T09:05:30.5599979Z cvta.global.u64 %rd77, %rd71; 2026-02-21T09:05:30.5600142Z .loc 1 31 35 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:31:35 2026-02-21T09:05:30.5600199Z shl.b32 %r41, %r207, 1; 2026-02-21T09:05:30.5600370Z .loc 1 32 37 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:32:37 2026-02-21T09:05:30.5600429Z add.s32 %r218, %r41, 2; 2026-02-21T09:05:30.5600587Z .loc 1 32 49 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:32:49 2026-02-21T09:05:30.5600645Z min.s32 %r219, %r218, 4096; 2026-02-21T09:05:30.5600830Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5600884Z sub.s32 %r220, %r219, %r41; 2026-02-21T09:05:30.5600937Z shl.b32 %r332, %r220, 7; 2026-02-21T09:05:30.5601105Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5601174Z shfl.sync.idx.b32 %r221, %r2, 0, 31, -1; 2026-02-21T09:05:30.5601226Z shl.b32 %r222, %r221, 21; 2026-02-21T09:05:30.5601289Z and.b32 %r223, %r222, 6291456; 2026-02-21T09:05:30.5601346Z add.s32 %r254, %r223, %r314; 2026-02-21T09:05:30.5601400Z mov.pred %p83, -1; 2026-02-21T09:05:30.5601458Z // begin inline asm 2026-02-21T09:05:30.5601746Z @%p83 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r254 + 0], {%r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342, %r342}; 2026-02-21T09:05:30.5601800Z // end inline asm 2026-02-21T09:05:30.5601850Z // begin inline asm 2026-02-21T09:05:30.5601925Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:05:30.5601978Z // end inline asm 2026-02-21T09:05:30.5602030Z bar.sync 0, 128; 2026-02-21T09:05:30.5602224Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5602283Z add.s32 %r188, %r146, 27136; 2026-02-21T09:05:30.5602336Z // begin inline asm 2026-02-21T09:05:30.5602429Z @%p111 mbarrier.init.shared::cta.b64 [%r188], 1; 2026-02-21T09:05:30.5602481Z // end inline 
asm 2026-02-21T09:05:30.5602535Z bar.sync 0, 128; 2026-02-21T09:05:30.5602592Z add.s32 %r189, %r146, 27144; 2026-02-21T09:05:30.5602653Z // begin inline asm 2026-02-21T09:05:30.5602735Z @%p111 mbarrier.init.shared::cta.b64 [%r189], 1; 2026-02-21T09:05:30.5602787Z // end inline asm 2026-02-21T09:05:30.5602868Z bar.sync 0, 128; 2026-02-21T09:05:30.5602923Z add.s32 %r190, %r146, 27152; 2026-02-21T09:05:30.5602976Z // begin inline asm 2026-02-21T09:05:30.5603054Z @%p111 mbarrier.init.shared::cta.b64 [%r190], 1; 2026-02-21T09:05:30.5603116Z // end inline asm 2026-02-21T09:05:30.5603164Z bar.sync 0, 128; 2026-02-21T09:05:30.5603214Z add.s32 %r191, %r146, 27160; 2026-02-21T09:05:30.5603269Z // begin inline asm 2026-02-21T09:05:30.5603343Z @%p111 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T09:05:30.5603391Z // end inline asm 2026-02-21T09:05:30.5603438Z bar.sync 0, 128; 2026-02-21T09:05:30.5603492Z add.s32 %r192, %r146, 27168; 2026-02-21T09:05:30.5603541Z // begin inline asm 2026-02-21T09:05:30.5603615Z @%p111 mbarrier.init.shared::cta.b64 [%r192], 1; 2026-02-21T09:05:30.5603674Z // end inline asm 2026-02-21T09:05:30.5603729Z add.s32 %r193, %r146, 27184; 2026-02-21T09:05:30.5603782Z // begin inline asm 2026-02-21T09:05:30.5603886Z @%p111 mbarrier.init.shared::cta.b64 [%r193], 1; 2026-02-21T09:05:30.5603941Z // end inline asm 2026-02-21T09:05:30.5603993Z bar.sync 0, 128; 2026-02-21T09:05:30.5604049Z add.s32 %r194, %r146, 27192; 2026-02-21T09:05:30.5604109Z // begin inline asm 2026-02-21T09:05:30.5604186Z @%p111 mbarrier.init.shared::cta.b64 [%r194], 1; 2026-02-21T09:05:30.5604238Z // end inline asm 2026-02-21T09:05:30.5604294Z bar.sync 0, 128; 2026-02-21T09:05:30.5604349Z add.s32 %r195, %r146, 27200; 2026-02-21T09:05:30.5604405Z // begin inline asm 2026-02-21T09:05:30.5604481Z @%p111 mbarrier.init.shared::cta.b64 [%r195], 1; 2026-02-21T09:05:30.5604539Z // end inline asm 2026-02-21T09:05:30.5604591Z bar.sync 0, 128; 2026-02-21T09:05:30.5604644Z add.s32 %r196, 
%r146, 27208; 2026-02-21T09:05:30.5604735Z // begin inline asm 2026-02-21T09:05:30.5604813Z @%p111 mbarrier.init.shared::cta.b64 [%r196], 1; 2026-02-21T09:05:30.5604864Z // end inline asm 2026-02-21T09:05:30.5604919Z bar.sync 0, 128; 2026-02-21T09:05:30.5604981Z add.s32 %r197, %r146, 27216; 2026-02-21T09:05:30.5605035Z // begin inline asm 2026-02-21T09:05:30.5605111Z @%p111 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T09:05:30.5605169Z // end inline asm 2026-02-21T09:05:30.5605326Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5605403Z bar.sync 0, 128; 2026-02-21T09:05:30.5605463Z // begin inline asm 2026-02-21T09:05:30.5605552Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r188]; 2026-02-21T09:05:30.5605605Z // end inline asm 2026-02-21T09:05:30.5605657Z bar.sync 0, 128; 2026-02-21T09:05:30.5605719Z // begin inline asm 2026-02-21T09:05:30.5605804Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r189]; 2026-02-21T09:05:30.5605858Z // end inline asm 2026-02-21T09:05:30.5605917Z bar.sync 0, 128; 2026-02-21T09:05:30.5605971Z // begin inline asm 2026-02-21T09:05:30.5606052Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r190]; 2026-02-21T09:05:30.5606105Z // end inline asm 2026-02-21T09:05:30.5606167Z bar.sync 0, 128; 2026-02-21T09:05:30.5606219Z // begin inline asm 2026-02-21T09:05:30.5606299Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r191]; 2026-02-21T09:05:30.5606357Z // end inline asm 2026-02-21T09:05:30.5606410Z bar.sync 0, 128; 2026-02-21T09:05:30.5606465Z // begin inline asm 2026-02-21T09:05:30.5606552Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r192]; 2026-02-21T09:05:30.5606604Z // end inline asm 2026-02-21T09:05:30.5606803Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5606859Z bar.sync 0, 128; 2026-02-21T09:05:30.5606946Z add.s32 %r203, %r146, 27232; 2026-02-21T09:05:30.5607000Z // begin inline asm 2026-02-21T09:05:30.5607079Z @%p111 
mbarrier.init.shared::cta.b64 [%r203], 1; 2026-02-21T09:05:30.5607138Z // end inline asm 2026-02-21T09:05:30.5607193Z add.s32 %r300, %r146, 27248; 2026-02-21T09:05:30.5607247Z // begin inline asm 2026-02-21T09:05:30.5607332Z @%p111 mbarrier.init.shared::cta.b64 [%r300], 1; 2026-02-21T09:05:30.5607414Z // end inline asm 2026-02-21T09:05:30.5607576Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5607628Z bar.sync 0, 128; 2026-02-21T09:05:30.5607692Z // begin inline asm 2026-02-21T09:05:30.5607774Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r300]; 2026-02-21T09:05:30.5607828Z // end inline asm 2026-02-21T09:05:30.5608014Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5608094Z st.shared.b32 [global_smem+27256], 33554689; 2026-02-21T09:05:30.5608168Z st.shared.b32 [global_smem+20480], %r314; 2026-02-21T09:05:30.5608230Z barrier.sync 1; 2026-02-21T09:05:30.5608306Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.5608360Z barrier.sync 1; 2026-02-21T09:05:30.5608418Z setp.lt.s32 %p102, %r332, 1; 2026-02-21T09:05:30.5608482Z @%p102 bra $L__BB0_23; 2026-02-21T09:05:30.5608587Z // %bb.17: // %.lr.ph10 2026-02-21T09:05:30.5608647Z add.s32 %r339, %r41, -1; 2026-02-21T09:05:30.5608709Z shl.b32 %r226, %r1, 5; 2026-02-21T09:05:30.5608766Z and.b32 %r227, %r226, 3936; 2026-02-21T09:05:30.5608824Z bfe.s32 %r228, %r1, 2, 1; 2026-02-21T09:05:30.5608881Z and.b32 %r229, %r228, 144; 2026-02-21T09:05:30.5608947Z or.b32 %r230, %r229, %r227; 2026-02-21T09:05:30.5609003Z add.s32 %r232, %r146, 20480; 2026-02-21T09:05:30.5609058Z add.s32 %r45, %r232, %r230; 2026-02-21T09:05:30.5609121Z xor.b32 %r233, %r230, 16; 2026-02-21T09:05:30.5609176Z add.s32 %r46, %r232, %r233; 2026-02-21T09:05:30.5609231Z mov.b32 %r337, -1; 2026-02-21T09:05:30.5609284Z mov.b32 %r340, %r342; 2026-02-21T09:05:30.5609345Z mov.b32 %r341, %r342; 2026-02-21T09:05:30.5609399Z bra.uni $L__BB0_18; 
2026-02-21T09:05:30.5609501Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.5609672Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5609728Z bar.sync 0, 128; 2026-02-21T09:05:30.5609783Z // begin inline asm 2026-02-21T09:05:30.5609833Z 2026-02-21T09:05:30.5609889Z { 2026-02-21T09:05:30.5609946Z .reg .pred complete; 2026-02-21T09:05:30.5610022Z waitLoop: 2026-02-21T09:05:30.5610148Z mbarrier.try_wait.parity.shared.b64 complete, [%r203], %r342; 2026-02-21T09:05:30.5610211Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.5610259Z } 2026-02-21T09:05:30.5610262Z 2026-02-21T09:05:30.5610323Z // end inline asm 2026-02-21T09:05:30.5610378Z // begin inline asm 2026-02-21T09:05:30.5610652Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r238, %r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253}, [%r254 + 0]; 2026-02-21T09:05:30.5610706Z // end inline asm 2026-02-21T09:05:30.5610767Z // begin inline asm 2026-02-21T09:05:30.5610835Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:05:30.5610888Z // end inline asm 2026-02-21T09:05:30.5610948Z bar.sync 0, 128; 2026-02-21T09:05:30.5611000Z // begin inline asm 2026-02-21T09:05:30.5611086Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r300]; 2026-02-21T09:05:30.5611146Z // end inline asm 2026-02-21T09:05:30.5611202Z cvt.u64.u32 %rd78, %r238; 2026-02-21T09:05:30.5611260Z cvt.u64.u32 %rd79, %r239; 2026-02-21T09:05:30.5611315Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:05:30.5611376Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:05:30.5611559Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5611622Z mov.b64 {%r261, %r262}, %rd81; 2026-02-21T09:05:30.5611694Z cvt.rn.f16x2.f32 %r263, %r262, %r261; 2026-02-21T09:05:30.5611856Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5611911Z cvt.u64.u32 %rd82, %r240; 2026-02-21T09:05:30.5611965Z 
cvt.u64.u32 %rd83, %r241; 2026-02-21T09:05:30.5612026Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:05:30.5612101Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:05:30.5612262Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5612328Z mov.b64 {%r264, %r265}, %rd85; 2026-02-21T09:05:30.5612395Z cvt.rn.f16x2.f32 %r266, %r265, %r264; 2026-02-21T09:05:30.5612554Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5612614Z cvt.u64.u32 %rd86, %r242; 2026-02-21T09:05:30.5612670Z cvt.u64.u32 %rd87, %r243; 2026-02-21T09:05:30.5612725Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:05:30.5612779Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:05:30.5612945Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5613003Z mov.b64 {%r267, %r268}, %rd89; 2026-02-21T09:05:30.5613065Z cvt.rn.f16x2.f32 %r269, %r268, %r267; 2026-02-21T09:05:30.5613265Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5613323Z cvt.u64.u32 %rd90, %r244; 2026-02-21T09:05:30.5613376Z cvt.u64.u32 %rd91, %r245; 2026-02-21T09:05:30.5613438Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:05:30.5613493Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:05:30.5613654Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5613711Z mov.b64 {%r270, %r271}, %rd93; 2026-02-21T09:05:30.5613781Z cvt.rn.f16x2.f32 %r272, %r271, %r270; 2026-02-21T09:05:30.5613939Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5613995Z cvt.u64.u32 %rd94, %r246; 2026-02-21T09:05:30.5614058Z cvt.u64.u32 %rd95, %r247; 2026-02-21T09:05:30.5614112Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:05:30.5614168Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:05:30.5614333Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5614391Z mov.b64 {%r273, %r274}, %rd97; 
2026-02-21T09:05:30.5614451Z cvt.rn.f16x2.f32 %r275, %r274, %r273; 2026-02-21T09:05:30.5614615Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5614728Z cvt.u64.u32 %rd98, %r248; 2026-02-21T09:05:30.5614787Z cvt.u64.u32 %rd99, %r249; 2026-02-21T09:05:30.5614846Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:05:30.5614912Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:05:30.5615072Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5615132Z mov.b64 {%r276, %r277}, %rd101; 2026-02-21T09:05:30.5615203Z cvt.rn.f16x2.f32 %r278, %r277, %r276; 2026-02-21T09:05:30.5615357Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5615417Z cvt.u64.u32 %rd102, %r250; 2026-02-21T09:05:30.5615475Z cvt.u64.u32 %rd103, %r251; 2026-02-21T09:05:30.5615546Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:05:30.5615605Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:05:30.5615762Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5615836Z mov.b64 {%r279, %r280}, %rd105; 2026-02-21T09:05:30.5615898Z cvt.rn.f16x2.f32 %r281, %r280, %r279; 2026-02-21T09:05:30.5616056Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5616148Z cvt.u64.u32 %rd106, %r252; 2026-02-21T09:05:30.5616208Z cvt.u64.u32 %rd107, %r253; 2026-02-21T09:05:30.5616268Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:05:30.5616328Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:05:30.5616508Z .loc 1 58 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:58:27 2026-02-21T09:05:30.5616569Z mov.b64 {%r282, %r283}, %rd109; 2026-02-21T09:05:30.5616633Z cvt.rn.f16x2.f32 %r284, %r283, %r282; 2026-02-21T09:05:30.5616827Z .loc 1 59 45 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:59:45 2026-02-21T09:05:30.5616897Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.5616950Z bar.sync 0, 128; 
2026-02-21T09:05:30.5617053Z st.shared.v4.b32 [%r45], {%r263, %r266, %r269, %r272}; 2026-02-21T09:05:30.5617139Z st.shared.v4.b32 [%r46], {%r275, %r278, %r281, %r284}; 2026-02-21T09:05:30.5617195Z // begin inline asm 2026-02-21T09:05:30.5617269Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.5617332Z // end inline asm 2026-02-21T09:05:30.5617388Z bar.sync 0, 128; 2026-02-21T09:05:30.5617452Z elect.sync %r285|%p109, -1; 2026-02-21T09:05:30.5617523Z and.pred %p107, %p27, %p109; 2026-02-21T09:05:30.5617580Z // begin inline asm 2026-02-21T09:05:30.5617767Z @%p107 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd77, {%r341, %r340}], [%r232]; 2026-02-21T09:05:30.5617832Z // end inline asm 2026-02-21T09:05:30.5617923Z cp.async.bulk.commit_group; 2026-02-21T09:05:30.5617982Z mov.b32 %r338, 1; 2026-02-21T09:05:30.5618088Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.5618264Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5618326Z xor.b32 %r342, %r338, %r342; 2026-02-21T09:05:30.5618493Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5618562Z add.s32 %r332, %r332, -1; 2026-02-21T09:05:30.5618625Z setp.ne.b32 %p110, %r332, 0; 2026-02-21T09:05:30.5618685Z @%p110 bra $L__BB0_18; 2026-02-21T09:05:30.5618743Z bra.uni $L__BB0_23; 2026-02-21T09:05:30.5618858Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:30.5618917Z add.s32 %r235, %r337, 1; 2026-02-21T09:05:30.5618979Z setp.eq.b32 %p103, %r337, 127; 2026-02-21T09:05:30.5619050Z selp.b32 %r337, 0, %r235, %p103; 2026-02-21T09:05:30.5619112Z setp.eq.b32 %p104, %r337, 127; 2026-02-21T09:05:30.5619171Z @%p104 bra $L__BB0_21; 2026-02-21T09:05:30.5619275Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.5619445Z .loc 1 0 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0:106 2026-02-21T09:05:30.5619527Z mov.b32 %r338, 0; 2026-02-21T09:05:30.5619696Z .loc 
1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5619765Z setp.ne.b32 %p105, %r337, 0; 2026-02-21T09:05:30.5619823Z @%p105 bra $L__BB0_22; 2026-02-21T09:05:30.5619898Z // %bb.20: // %.thread 2026-02-21T09:05:30.5619996Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:05:30.5620057Z add.s32 %r339, %r339, 1; 2026-02-21T09:05:30.5620218Z .loc 1 39 35 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:39:35 2026-02-21T09:05:30.5620284Z shr.s32 %r287, %r339, 31; 2026-02-21T09:05:30.5620345Z shr.u32 %r288, %r287, 25; 2026-02-21T09:05:30.5620404Z add.s32 %r289, %r339, %r288; 2026-02-21T09:05:30.5620461Z shr.s32 %r290, %r289, 7; 2026-02-21T09:05:30.5620630Z .loc 1 40 33 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:40:33 2026-02-21T09:05:30.5620690Z shl.b32 %r291, %r290, 2; 2026-02-21T09:05:30.5620852Z .loc 1 41 39 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:41:39 2026-02-21T09:05:30.5620941Z sub.s32 %r292, 128, %r291; 2026-02-21T09:05:30.5621106Z .loc 1 41 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:41:52 2026-02-21T09:05:30.5621164Z min.s32 %r293, %r292, 4; 2026-02-21T09:05:30.5621333Z .loc 1 42 45 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:45 2026-02-21T09:05:30.5621393Z and.b32 %r294, %r289, -128; 2026-02-21T09:05:30.5621452Z sub.s32 %r295, %r339, %r294; 2026-02-21T09:05:30.5621625Z .loc 1 43 51 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:43:51 2026-02-21T09:05:30.5621706Z div.s32 %r296, %r295, %r293; 2026-02-21T09:05:30.5621869Z .loc 1 42 64 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:64 2026-02-21T09:05:30.5621933Z mul.lo.s32 %r297, %r296, %r293; 2026-02-21T09:05:30.5621999Z sub.s32 %r298, %r295, %r297; 2026-02-21T09:05:30.5622164Z .loc 1 42 30 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:30 2026-02-21T09:05:30.5622221Z add.s32 %r299, %r298, %r291; 2026-02-21T09:05:30.5622393Z .loc 1 44 27 // 
c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:44:27 2026-02-21T09:05:30.5622451Z shl.b32 %r341, %r299, 4; 2026-02-21T09:05:30.5622615Z .loc 1 45 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:45:27 2026-02-21T09:05:30.5622678Z shl.b32 %r340, %r296, 7; 2026-02-21T09:05:30.5622871Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5622931Z bra.uni $L__BB0_22; 2026-02-21T09:05:30.5623033Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:05:30.5623206Z .loc 1 0 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0:106 2026-02-21T09:05:30.5623268Z mov.b32 %r65, global_smem; 2026-02-21T09:05:30.5623327Z add.s32 %r66, %r65, %r3; 2026-02-21T09:05:30.5623394Z mov.u32 %r116, %ctaid.x; 2026-02-21T09:05:30.5623451Z shl.b32 %r117, %r116, 1; 2026-02-21T09:05:30.5623507Z add.s32 %r118, %r117, 2; 2026-02-21T09:05:30.5623572Z min.s32 %r119, %r118, 4096; 2026-02-21T09:05:30.5623630Z sub.s32 %r120, %r119, %r117; 2026-02-21T09:05:30.5623688Z shl.b32 %r5, %r120, 7; 2026-02-21T09:05:30.5623749Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:05:30.5623814Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.5623916Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5624085Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5624172Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.5624230Z barrier.sync 1; 2026-02-21T09:05:30.5624310Z barrier.sync 1; 2026-02-21T09:05:30.5624396Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.5624479Z $L__BB0_2: // %.preheader 2026-02-21T09:05:30.5624573Z // =>This Loop Header: Depth=1 2026-02-21T09:05:30.5624661Z // Child Loop BB0_11 Depth 2 2026-02-21T09:05:30.5624786Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:30.5624949Z .loc 1 19 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19 2026-02-21T09:05:30.5625029Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:05:30.5625106Z barrier.sync 1; 
2026-02-21T09:05:30.5625172Z ld.shared.b8 %r64, [%r66+27252]; 2026-02-21T09:05:30.5625231Z setp.gt.u32 %p4, %r64, 3; 2026-02-21T09:05:30.5625294Z @%p4 bra $L__BB0_4; 2026-02-21T09:05:30.5625370Z // %bb.3: // %.preheader 2026-02-21T09:05:30.5625457Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5625517Z $L_brx_0: .branchtargets 2026-02-21T09:05:30.5625578Z $L__BB0_5, 2026-02-21T09:05:30.5625630Z $L__BB0_9, 2026-02-21T09:05:30.5625706Z $L__BB0_15, 2026-02-21T09:05:30.5625763Z $L__BB0_24; 2026-02-21T09:05:30.5625823Z brx.idx %r64, $L_brx_0; 2026-02-21T09:05:30.5625918Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5626089Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5626172Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.5626246Z ld.shared.b32 %r125, [global_smem+20480]; 2026-02-21T09:05:30.5626302Z barrier.sync 1; 2026-02-21T09:05:30.5626388Z @%p17 bra $L__BB0_8; 2026-02-21T09:05:30.5626462Z // %bb.6: // %.lr.ph7 2026-02-21T09:05:30.5626543Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5626713Z .loc 1 0 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0:106 2026-02-21T09:05:30.5626768Z mov.b32 %r320, -1; 2026-02-21T09:05:30.5626824Z mov.pred %p124, 0; 2026-02-21T09:05:30.5626877Z mov.b32 %r317, 0; 2026-02-21T09:05:30.5626940Z mov.b32 %r316, %r5; 2026-02-21T09:05:30.5626995Z mov.b32 %r318, %r317; 2026-02-21T09:05:30.5627050Z mov.b32 %r319, %r317; 2026-02-21T09:05:30.5627150Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:30.5627237Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:30.5627423Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5627490Z add.s32 %r129, %r320, 1; 2026-02-21T09:05:30.5627551Z setp.eq.b32 %p24, %r320, 127; 2026-02-21T09:05:30.5627612Z selp.b32 %r320, 0, %r129, %p24; 2026-02-21T09:05:30.5627667Z shl.b32 %r130, %r319, 3; 2026-02-21T09:05:30.5627732Z add.s32 %r132, 
%r65, %r130; 2026-02-21T09:05:30.5627789Z add.s32 %r133, %r132, 27136; 2026-02-21T09:05:30.5627843Z add.s32 %r123, %r132, 27184; 2026-02-21T09:05:30.5628014Z .loc 1 54 31 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:54:31 2026-02-21T09:05:30.5628070Z shl.b32 %r134, %r319, 12; 2026-02-21T09:05:30.5628127Z add.s32 %r135, %r65, %r134; 2026-02-21T09:05:30.5628282Z .loc 1 55 44 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:55:44 2026-02-21T09:05:30.5628343Z shl.b32 %r136, %r319, 9; 2026-02-21T09:05:30.5628397Z add.s32 %r137, %r65, %r136; 2026-02-21T09:05:30.5628452Z add.s32 %r138, %r137, 24576; 2026-02-21T09:05:30.5628619Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5628680Z bar.warp.sync -1; 2026-02-21T09:05:30.5628734Z // begin inline asm 2026-02-21T09:05:30.5628788Z 2026-02-21T09:05:30.5628837Z { 2026-02-21T09:05:30.5628926Z .reg .pred complete; 2026-02-21T09:05:30.5628977Z waitLoop: 2026-02-21T09:05:30.5629104Z mbarrier.try_wait.parity.shared.b64 complete, [%r123], %r318; 2026-02-21T09:05:30.5629165Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.5629213Z } 2026-02-21T09:05:30.5629217Z 2026-02-21T09:05:30.5629277Z // end inline asm 2026-02-21T09:05:30.5629442Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5629504Z setp.eq.b32 %p23, %r320, 127; 2026-02-21T09:05:30.5629669Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5629731Z elect.sync %r139|%p20, -1; 2026-02-21T09:05:30.5629788Z bfe.u32 %r140, %r135, 4, 14; 2026-02-21T09:05:30.5629845Z cvt.u64.u32 %rd18, %r140; 2026-02-21T09:05:30.5629923Z or.b64 %rd14, %rd18, -4611685949691133952; 2026-02-21T09:05:30.5629979Z bfe.u32 %r141, %r138, 4, 14; 2026-02-21T09:05:30.5630035Z cvt.u64.u32 %rd19, %r141; 2026-02-21T09:05:30.5630112Z or.b64 %rd15, %rd19, -4611685949705814016; 2026-02-21T09:05:30.5630166Z mov.b32 %r126, 134479888; 
2026-02-21T09:05:30.5630219Z // begin inline asm 2026-02-21T09:05:30.5630381Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r125 + 0 ], %rd14, %rd15, %r126, %p124; 2026-02-21T09:05:30.5630443Z // end inline asm 2026-02-21T09:05:30.5630499Z cvt.u64.u32 %rd16, %r133; 2026-02-21T09:05:30.5630552Z // begin inline asm 2026-02-21T09:05:30.5630681Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd16]; 2026-02-21T09:05:30.5630735Z // end inline asm 2026-02-21T09:05:30.5630795Z and.pred %p22, %p23, %p20; 2026-02-21T09:05:30.5630856Z add.s32 %r142, %r65, 27232; 2026-02-21T09:05:30.5630912Z cvt.u64.u32 %rd17, %r142; 2026-02-21T09:05:30.5631000Z // begin inline asm 2026-02-21T09:05:30.5631121Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd17]; 2026-02-21T09:05:30.5631182Z // end inline asm 2026-02-21T09:05:30.5631343Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5631404Z setp.ne.b32 %p124, %r320, 127; 2026-02-21T09:05:30.5631577Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5631635Z selp.b32 %r143, 1, 0, %p23; 2026-02-21T09:05:30.5631690Z xor.b32 %r317, %r317, %r143; 2026-02-21T09:05:30.5631752Z add.s32 %r127, %r65, 27248; 2026-02-21T09:05:30.5631806Z // begin inline asm 2026-02-21T09:05:30.5631853Z 2026-02-21T09:05:30.5631901Z { 2026-02-21T09:05:30.5631969Z @!%p23 bra.uni skipWait; 2026-02-21T09:05:30.5632026Z .reg .pred complete; 2026-02-21T09:05:30.5632078Z waitLoop: 2026-02-21T09:05:30.5632222Z mbarrier.try_wait.parity.shared.b64 complete, [%r127], %r317; 2026-02-21T09:05:30.5632287Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.5632341Z skipWait: 2026-02-21T09:05:30.5632390Z } 2026-02-21T09:05:30.5632394Z 2026-02-21T09:05:30.5632458Z // end inline asm 2026-02-21T09:05:30.5632619Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5632675Z add.s32 %r144, %r319, 1; 2026-02-21T09:05:30.5632746Z 
setp.eq.b32 %p25, %r144, 5; 2026-02-21T09:05:30.5632808Z selp.b32 %r319, 0, %r144, %p25; 2026-02-21T09:05:30.5632864Z selp.b32 %r145, 1, 0, %p25; 2026-02-21T09:05:30.5632927Z xor.b32 %r318, %r318, %r145; 2026-02-21T09:05:30.5633099Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5633155Z add.s32 %r316, %r316, -1; 2026-02-21T09:05:30.5633212Z setp.ne.b32 %p26, %r316, 0; 2026-02-21T09:05:30.5633276Z @%p26 bra $L__BB0_7; 2026-02-21T09:05:30.5633357Z $L__BB0_8: // %._crit_edge8 2026-02-21T09:05:30.5633442Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5633502Z barrier.sync 1; 2026-02-21T09:05:30.5633577Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.5633651Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.5633742Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5633924Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5633999Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:05:30.5634053Z barrier.sync 1; 2026-02-21T09:05:30.5634225Z .loc 1 21 67 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:21:67 2026-02-21T09:05:30.5634282Z mov.u32 %r68, %ctaid.y; 2026-02-21T09:05:30.5634337Z mov.u32 %r69, %ctaid.z; 2026-02-21T09:05:30.5634400Z mov.u32 %r70, %nctaid.x; 2026-02-21T09:05:30.5634458Z mov.u32 %r71, %nctaid.y; 2026-02-21T09:05:30.5634520Z mad.lo.s32 %r72, %r69, %r71, %r68; 2026-02-21T09:05:30.5634580Z mad.lo.s32 %r73, %r72, %r70, %r116; 2026-02-21T09:05:30.5634645Z mul.lo.s32 %r74, %r73, 384; 2026-02-21T09:05:30.5634839Z .loc 1 22 67 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:22:67 2026-02-21T09:05:30.5634897Z add.s32 %r75, %r74, 128; 2026-02-21T09:05:30.5634961Z cvt.s64.s32 %rd8, %r75; 2026-02-21T09:05:30.5635018Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:05:30.5635101Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:05:30.5635269Z .loc 1 21 67 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:21:67 
2026-02-21T09:05:30.5635326Z cvt.s64.s32 %rd10, %r74; 2026-02-21T09:05:30.5635385Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:05:30.5635446Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:05:30.5635624Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5635681Z shl.b32 %r321, %r120, 7; 2026-02-21T09:05:30.5635790Z setp.lt.s32 %p5, %r321, 1; 2026-02-21T09:05:30.5635853Z @%p5 bra $L__BB0_14; 2026-02-21T09:05:30.5635927Z // %bb.10: // %.lr.ph 2026-02-21T09:05:30.5636011Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5636076Z add.s32 %r331, %r117, -1; 2026-02-21T09:05:30.5636133Z add.s32 %r19, %r1, -128; 2026-02-21T09:05:30.5636187Z mov.b32 %r328, -1; 2026-02-21T09:05:30.5636237Z mov.b32 %r322, 0; 2026-02-21T09:05:30.5636300Z mov.b32 %r323, %r322; 2026-02-21T09:05:30.5636353Z mov.b32 %r330, %r322; 2026-02-21T09:05:30.5636407Z mov.b32 %r329, %r322; 2026-02-21T09:05:30.5636465Z mov.b32 %r326, %r322; 2026-02-21T09:05:30.5636518Z bra.uni $L__BB0_11; 2026-02-21T09:05:30.5636613Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:30.5636802Z .loc 1 0 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0:106 2026-02-21T09:05:30.5636876Z selp.b32 %r99, 0, %r326, %p8; 2026-02-21T09:05:30.5636936Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:05:30.5636995Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:05:30.5637168Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5637225Z shl.b32 %r106, %r323, 3; 2026-02-21T09:05:30.5637279Z add.s32 %r108, %r65, %r106; 2026-02-21T09:05:30.5637340Z add.s32 %r95, %r108, 27136; 2026-02-21T09:05:30.5637497Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5637550Z // begin inline asm 2026-02-21T09:05:30.5637598Z 2026-02-21T09:05:30.5637651Z { 2026-02-21T09:05:30.5637707Z .reg .pred complete; 2026-02-21T09:05:30.5637759Z waitLoop: 2026-02-21T09:05:30.5637880Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r95], %r322; 2026-02-21T09:05:30.5637940Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.5637987Z } 2026-02-21T09:05:30.5637992Z 2026-02-21T09:05:30.5638046Z // end inline asm 2026-02-21T09:05:30.5638222Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5638278Z add.s32 %r101, %r108, 27184; 2026-02-21T09:05:30.5638465Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5638526Z bar.sync 3, 64; 2026-02-21T09:05:30.5638580Z // begin inline asm 2026-02-21T09:05:30.5638689Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r101], 4608; 2026-02-21T09:05:30.5638749Z // end inline asm 2026-02-21T09:05:30.5638910Z .loc 1 54 31 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:54:31 2026-02-21T09:05:30.5638967Z shl.b32 %r109, %r323, 12; 2026-02-21T09:05:30.5639023Z add.s32 %r98, %r65, %r109; 2026-02-21T09:05:30.5639183Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5639238Z bar.sync 3, 64; 2026-02-21T09:05:30.5639301Z elect.sync %r110|%p13, -1; 2026-02-21T09:05:30.5639369Z and.pred %p10, %p12, %p13; 2026-02-21T09:05:30.5639422Z // begin inline asm 2026-02-21T09:05:30.5639670Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r98], [%rd12, {%r99, %r330}], [%r101]; 2026-02-21T09:05:30.5639730Z // end inline asm 2026-02-21T09:05:30.5639894Z .loc 1 55 44 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:55:44 2026-02-21T09:05:30.5639983Z shl.b32 %r111, %r323, 9; 2026-02-21T09:05:30.5640042Z add.s32 %r112, %r65, %r111; 2026-02-21T09:05:30.5640106Z add.s32 %r102, %r112, 24576; 2026-02-21T09:05:30.5640260Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5640314Z bar.sync 3, 64; 2026-02-21T09:05:30.5640381Z elect.sync %r113|%p14, -1; 2026-02-21T09:05:30.5640442Z and.pred %p11, %p12, %p14; 
2026-02-21T09:05:30.5640500Z // begin inline asm 2026-02-21T09:05:30.5640771Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r102], [%rd13, {%r99, %r329}], [%r101]; 2026-02-21T09:05:30.5640827Z // end inline asm 2026-02-21T09:05:30.5640996Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5641071Z add.s32 %r326, %r99, 16; 2026-02-21T09:05:30.5641221Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5641278Z add.s32 %r114, %r323, 1; 2026-02-21T09:05:30.5641338Z setp.eq.b32 %p15, %r114, 5; 2026-02-21T09:05:30.5641406Z selp.b32 %r323, 0, %r114, %p15; 2026-02-21T09:05:30.5641463Z selp.b32 %r115, 1, 0, %p15; 2026-02-21T09:05:30.5641519Z xor.b32 %r322, %r322, %r115; 2026-02-21T09:05:30.5641692Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5641749Z add.s32 %r321, %r321, -1; 2026-02-21T09:05:30.5641833Z setp.ne.b32 %p16, %r321, 0; 2026-02-21T09:05:30.5641894Z @%p16 bra $L__BB0_11; 2026-02-21T09:05:30.5641956Z bra.uni $L__BB0_14; 2026-02-21T09:05:30.5642054Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:30.5642146Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:30.5642210Z add.s32 %r81, %r328, 1; 2026-02-21T09:05:30.5642270Z setp.eq.b32 %p6, %r328, 127; 2026-02-21T09:05:30.5642330Z selp.b32 %r328, 0, %r81, %p6; 2026-02-21T09:05:30.5642396Z setp.ne.b32 %p7, %r328, 0; 2026-02-21T09:05:30.5642454Z setp.eq.b32 %p8, %r328, 0; 2026-02-21T09:05:30.5642509Z @%p7 bra $L__BB0_13; 2026-02-21T09:05:30.5642601Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:05:30.5642664Z add.s32 %r331, %r331, 1; 2026-02-21T09:05:30.5642826Z .loc 1 39 35 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:39:35 2026-02-21T09:05:30.5642885Z shr.s32 %r82, %r331, 31; 2026-02-21T09:05:30.5642952Z shr.u32 %r83, %r82, 25; 2026-02-21T09:05:30.5643005Z add.s32 %r84, %r331, %r83; 
2026-02-21T09:05:30.5643060Z shr.s32 %r85, %r84, 7; 2026-02-21T09:05:30.5643229Z .loc 1 40 33 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:40:33 2026-02-21T09:05:30.5643307Z shl.b32 %r86, %r85, 2; 2026-02-21T09:05:30.5643466Z .loc 1 41 39 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:41:39 2026-02-21T09:05:30.5643523Z sub.s32 %r87, 128, %r86; 2026-02-21T09:05:30.5643690Z .loc 1 41 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:41:52 2026-02-21T09:05:30.5643746Z min.s32 %r88, %r87, 4; 2026-02-21T09:05:30.5643905Z .loc 1 42 45 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:45 2026-02-21T09:05:30.5643970Z and.b32 %r89, %r84, -128; 2026-02-21T09:05:30.5644027Z sub.s32 %r90, %r331, %r89; 2026-02-21T09:05:30.5644186Z .loc 1 43 51 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:43:51 2026-02-21T09:05:30.5644251Z div.s32 %r91, %r90, %r88; 2026-02-21T09:05:30.5644410Z .loc 1 42 64 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:64 2026-02-21T09:05:30.5644469Z mul.lo.s32 %r92, %r91, %r88; 2026-02-21T09:05:30.5644526Z sub.s32 %r93, %r90, %r92; 2026-02-21T09:05:30.5644749Z .loc 1 42 30 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:42:30 2026-02-21T09:05:30.5644808Z add.s32 %r94, %r93, %r86; 2026-02-21T09:05:30.5644968Z .loc 1 44 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:44:27 2026-02-21T09:05:30.5645031Z shl.b32 %r329, %r94, 4; 2026-02-21T09:05:30.5645195Z .loc 1 45 27 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:45:27 2026-02-21T09:05:30.5645251Z shl.b32 %r330, %r91, 7; 2026-02-21T09:05:30.5645311Z bra.uni $L__BB0_13; 2026-02-21T09:05:30.5645417Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:30.5645502Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5645675Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5645733Z barrier.sync 1; 2026-02-21T09:05:30.5645809Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T09:05:30.5645863Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.5645964Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:30.5646118Z .loc 1 19 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:19 2026-02-21T09:05:30.5646173Z barrier.sync 1; 2026-02-21T09:05:30.5646234Z barrier.sync 1; 2026-02-21T09:05:30.5646288Z bra.uni $L__BB0_2; 2026-02-21T09:05:30.5646370Z $L__BB0_23: // %._crit_edge11 2026-02-21T09:05:30.5646573Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5646646Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.5646698Z bar.sync 0, 128; 2026-02-21T09:05:30.5646751Z barrier.sync 1; 2026-02-21T09:05:30.5646834Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:05:30.5646997Z .loc 1 56 52 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:56:52 2026-02-21T09:05:30.5647050Z // begin inline asm 2026-02-21T09:05:30.5647104Z 2026-02-21T09:05:30.5647150Z { 2026-02-21T09:05:30.5647210Z .reg .pred complete; 2026-02-21T09:05:30.5647263Z waitLoop: 2026-02-21T09:05:30.5647387Z mbarrier.try_wait.parity.shared.b64 complete, [%r300], %r342; 2026-02-21T09:05:30.5647449Z @!complete bra.uni waitLoop; 2026-02-21T09:05:30.5647496Z } 2026-02-21T09:05:30.5647499Z 2026-02-21T09:05:30.5647558Z // end inline asm 2026-02-21T09:05:30.5647723Z .loc 1 33 106 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:106 2026-02-21T09:05:30.5647776Z bar.sync 0, 128; 2026-02-21T09:05:30.5647840Z // begin inline asm 2026-02-21T09:05:30.5647927Z @%p111 mbarrier.inval.shared::cta.b64 [%r300]; 2026-02-21T09:05:30.5647979Z // end inline asm 2026-02-21T09:05:30.5648031Z // begin inline asm 2026-02-21T09:05:30.5648150Z @%p111 mbarrier.inval.shared::cta.b64 [%r203]; 2026-02-21T09:05:30.5648202Z // end inline asm 2026-02-21T09:05:30.5648256Z // begin inline asm 2026-02-21T09:05:30.5648340Z @%p111 mbarrier.inval.shared::cta.b64 [%r193]; 2026-02-21T09:05:30.5648393Z // end inline asm 
2026-02-21T09:05:30.5648446Z bar.sync 0, 128; 2026-02-21T09:05:30.5648499Z // begin inline asm 2026-02-21T09:05:30.5648582Z @%p111 mbarrier.inval.shared::cta.b64 [%r194]; 2026-02-21T09:05:30.5648632Z // end inline asm 2026-02-21T09:05:30.5648683Z bar.sync 0, 128; 2026-02-21T09:05:30.5648744Z // begin inline asm 2026-02-21T09:05:30.5648819Z @%p111 mbarrier.inval.shared::cta.b64 [%r195]; 2026-02-21T09:05:30.5648871Z // end inline asm 2026-02-21T09:05:30.5648932Z bar.sync 0, 128; 2026-02-21T09:05:30.5648987Z // begin inline asm 2026-02-21T09:05:30.5649062Z @%p111 mbarrier.inval.shared::cta.b64 [%r196]; 2026-02-21T09:05:30.5649114Z // end inline asm 2026-02-21T09:05:30.5649177Z bar.sync 0, 128; 2026-02-21T09:05:30.5649233Z // begin inline asm 2026-02-21T09:05:30.5649308Z @%p111 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T09:05:30.5649365Z // end inline asm 2026-02-21T09:05:30.5649417Z // begin inline asm 2026-02-21T09:05:30.5649525Z @%p111 mbarrier.inval.shared::cta.b64 [%r188]; 2026-02-21T09:05:30.5649578Z // end inline asm 2026-02-21T09:05:30.5649637Z bar.sync 0, 128; 2026-02-21T09:05:30.5649690Z // begin inline asm 2026-02-21T09:05:30.5649763Z @%p111 mbarrier.inval.shared::cta.b64 [%r189]; 2026-02-21T09:05:30.5649823Z // end inline asm 2026-02-21T09:05:30.5649875Z bar.sync 0, 128; 2026-02-21T09:05:30.5649927Z // begin inline asm 2026-02-21T09:05:30.5650000Z @%p111 mbarrier.inval.shared::cta.b64 [%r190]; 2026-02-21T09:05:30.5650059Z // end inline asm 2026-02-21T09:05:30.5650132Z bar.sync 0, 128; 2026-02-21T09:05:30.5650186Z // begin inline asm 2026-02-21T09:05:30.5650266Z @%p111 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T09:05:30.5650319Z // end inline asm 2026-02-21T09:05:30.5650374Z bar.sync 0, 128; 2026-02-21T09:05:30.5650436Z // begin inline asm 2026-02-21T09:05:30.5650511Z @%p111 mbarrier.inval.shared::cta.b64 [%r192]; 2026-02-21T09:05:30.5650564Z // end inline asm 2026-02-21T09:05:30.5650726Z .loc 1 33 4 // 
c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:33:4 2026-02-21T09:05:30.5650788Z bar.sync 0, 128; 2026-02-21T09:05:30.5650841Z // begin inline asm 2026-02-21T09:05:30.5650955Z @%p27 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r314, 32; 2026-02-21T09:05:30.5651017Z // end inline asm 2026-02-21T09:05:30.5651096Z st.shared.b32 [global_smem+27256], 50529027; 2026-02-21T09:05:30.5651150Z barrier.sync 1; 2026-02-21T09:05:30.5651257Z $L__BB0_24: // %common.ret 2026-02-21T09:05:30.5651416Z .loc 1 0 0 // c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py:0 2026-02-21T09:05:30.5651468Z ret; 2026-02-21T09:05:30.5651520Z $L__tmp1: 2026-02-21T09:05:30.5651582Z $L__func_end0: 2026-02-21T09:05:30.5651664Z // -- End function 2026-02-21T09:05:30.5651713Z } 2026-02-21T09:05:30.5651917Z .file 1 "/tmp/torchinductor_root/5q/c5qrrgct2id67bejili4xsp3l2mbhhe27pq26px4hxavyqzw6wds.py" 2026-02-21T09:05:30.5651978Z .section .debug_abbrev 2026-02-21T09:05:30.5652027Z { 2026-02-21T09:05:30.5652112Z .b8 1 // Abbreviation Code 2026-02-21T09:05:30.5652207Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:30.5652286Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.5652362Z .b8 37 // DW_AT_producer 2026-02-21T09:05:30.5652443Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.5652515Z .b8 19 // DW_AT_language 2026-02-21T09:05:30.5652587Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:30.5652666Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.5652757Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.5652833Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:30.5652905Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:30.5652987Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:30.5653056Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.5653124Z .b8 0 // EOM(1) 2026-02-21T09:05:30.5653197Z .b8 0 // EOM(2) 2026-02-21T09:05:30.5653261Z .b8 0 // EOM(3) 2026-02-21T09:05:30.5653310Z } 2026-02-21T09:05:30.5653374Z .section .debug_info 2026-02-21T09:05:30.5653423Z { 2026-02-21T09:05:30.5653501Z .b32 104 // Length of Unit 
2026-02-21T09:05:30.5653582Z .b8 2 // DWARF version number 2026-02-21T09:05:30.5653638Z .b8 0 2026-02-21T09:05:30.5653752Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:30.5653837Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:30.5653958Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:05:30.5654038Z .b8 116 // DW_AT_producer 2026-02-21T09:05:30.5654090Z .b8 114 2026-02-21T09:05:30.5654146Z .b8 105 2026-02-21T09:05:30.5654195Z .b8 116 2026-02-21T09:05:30.5654243Z .b8 111 2026-02-21T09:05:30.5654291Z .b8 110 2026-02-21T09:05:30.5654346Z .b8 0 2026-02-21T09:05:30.5654417Z .b8 2 // DW_AT_language 2026-02-21T09:05:30.5654466Z .b8 0 2026-02-21T09:05:30.5654546Z .b8 99 // DW_AT_name 2026-02-21T09:05:30.5654616Z .b8 53 2026-02-21T09:05:30.5654664Z .b8 113 2026-02-21T09:05:30.5654740Z .b8 114 2026-02-21T09:05:30.5654797Z .b8 114 2026-02-21T09:05:30.5654846Z .b8 103 2026-02-21T09:05:30.5654895Z .b8 99 2026-02-21T09:05:30.5654944Z .b8 116 2026-02-21T09:05:30.5654999Z .b8 50 2026-02-21T09:05:30.5655047Z .b8 105 2026-02-21T09:05:30.5655095Z .b8 100 2026-02-21T09:05:30.5655150Z .b8 54 2026-02-21T09:05:30.5655198Z .b8 55 2026-02-21T09:05:30.5655246Z .b8 98 2026-02-21T09:05:30.5655295Z .b8 101 2026-02-21T09:05:30.5655353Z .b8 106 2026-02-21T09:05:30.5655400Z .b8 105 2026-02-21T09:05:30.5655448Z .b8 108 2026-02-21T09:05:30.5655503Z .b8 105 2026-02-21T09:05:30.5655552Z .b8 52 2026-02-21T09:05:30.5655601Z .b8 120 2026-02-21T09:05:30.5655650Z .b8 115 2026-02-21T09:05:30.5655708Z .b8 112 2026-02-21T09:05:30.5655756Z .b8 51 2026-02-21T09:05:30.5655805Z .b8 108 2026-02-21T09:05:30.5655865Z .b8 50 2026-02-21T09:05:30.5655915Z .b8 109 2026-02-21T09:05:30.5655993Z .b8 98 2026-02-21T09:05:30.5656045Z .b8 104 2026-02-21T09:05:30.5656102Z .b8 104 2026-02-21T09:05:30.5656152Z .b8 101 2026-02-21T09:05:30.5656200Z .b8 50 2026-02-21T09:05:30.5656247Z .b8 55 2026-02-21T09:05:30.5656303Z .b8 112 2026-02-21T09:05:30.5656351Z .b8 113 
2026-02-21T09:05:30.5656400Z .b8 50 2026-02-21T09:05:30.5656455Z .b8 54 2026-02-21T09:05:30.5656503Z .b8 112 2026-02-21T09:05:30.5656552Z .b8 120 2026-02-21T09:05:30.5656599Z .b8 52 2026-02-21T09:05:30.5656655Z .b8 104 2026-02-21T09:05:30.5656704Z .b8 120 2026-02-21T09:05:30.5656752Z .b8 97 2026-02-21T09:05:30.5656807Z .b8 118 2026-02-21T09:05:30.5656855Z .b8 121 2026-02-21T09:05:30.5656904Z .b8 113 2026-02-21T09:05:30.5656952Z .b8 122 2026-02-21T09:05:30.5657007Z .b8 119 2026-02-21T09:05:30.5657054Z .b8 54 2026-02-21T09:05:30.5657102Z .b8 119 2026-02-21T09:05:30.5657149Z .b8 100 2026-02-21T09:05:30.5657204Z .b8 115 2026-02-21T09:05:30.5657253Z .b8 46 2026-02-21T09:05:30.5657301Z .b8 112 2026-02-21T09:05:30.5657355Z .b8 121 2026-02-21T09:05:30.5657404Z .b8 0 2026-02-21T09:05:30.5657496Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:30.5657569Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:30.5657625Z .b8 116 2026-02-21T09:05:30.5657674Z .b8 109 2026-02-21T09:05:30.5657756Z .b8 112 2026-02-21T09:05:30.5657812Z .b8 47 2026-02-21T09:05:30.5657860Z .b8 116 2026-02-21T09:05:30.5657908Z .b8 111 2026-02-21T09:05:30.5657956Z .b8 114 2026-02-21T09:05:30.5658011Z .b8 99 2026-02-21T09:05:30.5658059Z .b8 104 2026-02-21T09:05:30.5658109Z .b8 105 2026-02-21T09:05:30.5658163Z .b8 110 2026-02-21T09:05:30.5658211Z .b8 100 2026-02-21T09:05:30.5658259Z .b8 117 2026-02-21T09:05:30.5658306Z .b8 99 2026-02-21T09:05:30.5658362Z .b8 116 2026-02-21T09:05:30.5658410Z .b8 111 2026-02-21T09:05:30.5658457Z .b8 114 2026-02-21T09:05:30.5658510Z .b8 95 2026-02-21T09:05:30.5658558Z .b8 114 2026-02-21T09:05:30.5658607Z .b8 111 2026-02-21T09:05:30.5658653Z .b8 111 2026-02-21T09:05:30.5658706Z .b8 116 2026-02-21T09:05:30.5658754Z .b8 47 2026-02-21T09:05:30.5658802Z .b8 53 2026-02-21T09:05:30.5658849Z .b8 113 2026-02-21T09:05:30.5658903Z .b8 0 2026-02-21T09:05:30.5658951Z } 2026-02-21T09:05:30.5659014Z .section .debug_macinfo { } 2026-02-21T09:05:30.5659018Z 2026-02-21T09:05:30.5659099Z 
================================================================ 2026-02-21T09:05:30.5659200Z please share the reproducer above with Triton project. 2026-02-21T09:05:30.6159072Z 2026-02-21T09:05:30.6161302Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 93/93 16.3 configs/s 2026-02-21T09:05:30.7689909Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5941.0 2026-02-21T09:05:30.7691747Z configs/s 2026-02-21T09:05:30.7973140Z [31s] Generation 1 complete: 2026-02-21T09:05:30.7977176Z error=13 2026-02-21T09:05:30.7981591Z ok=81 2026-02-21T09:05:30.7985469Z min=0.0348 2026-02-21T09:05:30.7986930Z mid=0.3030 2026-02-21T09:05:30.7987005Z max=5.7586 2026-02-21T09:05:30.7987273Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:05:30.7987403Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:05:30.7987467Z 'l2_groupings': [1], 2026-02-21T09:05:30.7987548Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:05:30.7987616Z 'loop_orders': [[1, 0]], 2026-02-21T09:05:30.7987681Z 'num_stages': 7, 2026-02-21T09:05:30.7987737Z 'num_warps': 4, 2026-02-21T09:05:30.7987796Z 'pid_type': 'flat', 2026-02-21T09:05:30.7987872Z 'range_flattens': [None, None], 2026-02-21T09:05:30.7987942Z 'range_multi_buffers': [None, None], 2026-02-21T09:05:30.7988003Z 'range_num_stages': [0, 0], 2026-02-21T09:05:30.7988063Z 'range_unroll_factors': [0, 0], 2026-02-21T09:05:30.7988142Z 'range_warp_specializes': [None, None]} 2026-02-21T09:05:30.7993507Z [31s] Fitting surrogate: 194 points, 194 targets 2026-02-21T09:05:31.8312435Z [32s] Generation 2 starting: 81 neighbors, 5 active search path(s) 2026-02-21T09:06:03.5428483Z [63s] Timeout after 30s compiling Config(block_sizes=[1024, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_stages=7, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 
0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:06:03.5446034Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 0.4 configs/s 2026-02-21T09:06:07.6549719Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 20.7 configs/s 2026-02-21T09:06:08.3310348Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1458.4 2026-02-21T09:06:08.3314393Z configs/s 2026-02-21T09:06:08.3860321Z [68s] Generation 2 complete: 2026-02-21T09:06:08.3861632Z error=17 2026-02-21T09:06:08.3861810Z timeout=1 2026-02-21T09:06:08.3861934Z ok=68 2026-02-21T09:06:08.3862083Z min=0.0357 2026-02-21T09:06:08.3862216Z mid=0.1250 2026-02-21T09:06:08.3862346Z max=18.1863 2026-02-21T09:06:08.3862488Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:06:08.3862737Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:06:08.3863282Z 'l2_groupings': [2], 2026-02-21T09:06:08.3863457Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:06:08.3863658Z 'loop_orders': [[1, 0]], 2026-02-21T09:06:08.3863840Z 'num_stages': 7, 2026-02-21T09:06:08.3864008Z 'num_warps': 8, 2026-02-21T09:06:08.3864153Z 'pid_type': 'flat', 2026-02-21T09:06:08.3864312Z 'range_flattens': [None, None], 2026-02-21T09:06:08.3864489Z 'range_multi_buffers': [None, None], 2026-02-21T09:06:08.3864665Z 'range_num_stages': [0, 0], 2026-02-21T09:06:08.3864892Z 'range_unroll_factors': [0, 0], 2026-02-21T09:06:08.3865065Z 'range_warp_specializes': [None, None]} 2026-02-21T09:06:08.3882021Z [68s] Fitting surrogate: 280 points, 280 targets 2026-02-21T09:06:09.8244523Z [70s] Generation 3 starting: 83 neighbors, 5 active search path(s) 2026-02-21T09:06:19.7275273Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 2.3 configs/s 2026-02-21T09:06:21.9610586Z [82s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:06:21.9610885Z 2026-02-21T09:06:21.9612895Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:06:21.9614137Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:06:21.9614372Z `ptxas` stderr: 2026-02-21T09:06:21.9615039Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 228 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:06:21.9615618Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:21.9615781Z 2026-02-21T09:06:21.9616184Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpko_z9ccv.ptx -o /tmp/tmpko_z9ccv.ptx.o 2026-02-21T09:06:21.9616654Z 2026-02-21T09:06:21.9616783Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:06:21.9617165Z 2026-02-21T09:06:21.9617248Z ================================================================ 2026-02-21T09:06:21.9617459Z Internal Triton PTX codegen error 2026-02-21T09:06:21.9617630Z `ptxas` stderr: 2026-02-21T09:06:21.9618110Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 228 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:06:21.9618579Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:21.9618720Z 2026-02-21T09:06:21.9619098Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpko_z9ccv.ptx -o /tmp/tmpko_z9ccv.ptx.o 2026-02-21T09:06:21.9619549Z 2026-02-21T09:06:21.9619553Z 2026-02-21T09:06:21.9619609Z // 2026-02-21T09:06:21.9619754Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:06:21.9619919Z // 2026-02-21T09:06:21.9619994Z 2026-02-21T09:06:21.9620052Z .version 8.7 2026-02-21T09:06:21.9620184Z .target sm_100a 2026-02-21T09:06:21.9620325Z .address_size 64 2026-02-21T09:06:21.9620407Z 2026-02-21T09:06:21.9620527Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:06:21.9620790Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:06:21.9621011Z // @_helion_matmul 2026-02-21T09:06:21.9621202Z .visible .entry _helion_matmul( 2026-02-21T09:06:21.9621414Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:06:21.9621656Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:06:21.9621897Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:06:21.9622188Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:06:21.9622432Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:06:21.9622631Z ) 2026-02-21T09:06:21.9622743Z .reqntid 256 2026-02-21T09:06:21.9622875Z .maxnreg 32 2026-02-21T09:06:21.9622993Z { 2026-02-21T09:06:21.9623122Z .reg .pred %p<150>; 2026-02-21T09:06:21.9623264Z .reg .b32 %r<667>; 2026-02-21T09:06:21.9623409Z .reg .b64 %rd<260>; 2026-02-21T09:06:21.9623663Z .loc 1 19 0 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:19:0 2026-02-21T09:06:21.9623956Z $L__func_begin0: 2026-02-21T09:06:21.9624198Z .loc 1 19 0 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:19:0 2026-02-21T09:06:21.9624434Z 
2026-02-21T09:06:21.9624485Z // %bb.0: 2026-02-21T09:06:21.9624639Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:06:21.9624861Z $L__tmp0: 2026-02-21T09:06:21.9625097Z .loc 1 19 0 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:19 2026-02-21T09:06:21.9625377Z mov.u32 %r1, %tid.x; 2026-02-21T09:06:21.9625548Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:06:21.9625739Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:06:21.9625944Z mov.b32 %r41, global_smem; 2026-02-21T09:06:21.9626105Z // begin inline asm 2026-02-21T09:06:21.9626344Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r41], 128; 2026-02-21T09:06:21.9626587Z // end inline asm 2026-02-21T09:06:21.9626743Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T09:06:21.9626927Z bar.sync 0; 2026-02-21T09:06:21.9627065Z ld.shared.b32 %r658, [global_smem]; 2026-02-21T09:06:21.9627237Z bar.sync 0; 2026-02-21T09:06:21.9627365Z // begin inline asm 2026-02-21T09:06:21.9627608Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:06:21.9627831Z // end inline asm 2026-02-21T09:06:21.9628080Z .loc 1 21 67 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:21:67 2026-02-21T09:06:21.9628375Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:06:21.9628523Z mov.u32 %r58, %ctaid.y; 2026-02-21T09:06:21.9628677Z mov.u32 %r59, %ctaid.z; 2026-02-21T09:06:21.9628822Z mov.u32 %r60, %nctaid.x; 2026-02-21T09:06:21.9628977Z mov.u32 %r61, %nctaid.y; 2026-02-21T09:06:21.9629130Z mad.lo.s32 %r62, %r59, %r61, %r58; 2026-02-21T09:06:21.9629306Z mad.lo.s32 %r63, %r62, %r60, %r3; 2026-02-21T09:06:21.9629479Z shl.b32 %r64, %r63, 8; 2026-02-21T09:06:21.9629627Z cvt.s64.s32 %rd41, %r64; 2026-02-21T09:06:21.9629786Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T09:06:21.9629941Z shl.b32 %r65, %r1, 2; 2026-02-21T09:06:21.9630096Z add.s32 %r42, %r41, %r65; 2026-02-21T09:06:21.9630278Z mov.b32 %r51, 0; 2026-02-21T09:06:21.9630430Z // begin inline asm 2026-02-21T09:06:21.9630581Z @%p1 
st.shared.b32 [ %r42 + 0 ], %r51; 2026-02-21T09:06:21.9630759Z // end inline asm 2026-02-21T09:06:21.9630897Z bar.warp.sync -1; 2026-02-21T09:06:21.9631051Z setp.eq.b32 %p140, %r1, 0; 2026-02-21T09:06:21.9631214Z cvt.u64.u32 %rd4, %r41; 2026-02-21T09:06:21.9631368Z // begin inline asm 2026-02-21T09:06:21.9631625Z @%p140 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:06:21.9631914Z // end inline asm 2026-02-21T09:06:21.9632054Z // begin inline asm 2026-02-21T09:06:21.9632278Z @%p140 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:21.9632544Z // end inline asm 2026-02-21T09:06:21.9632676Z mov.b32 %r44, 64; 2026-02-21T09:06:21.9632818Z // begin inline asm 2026-02-21T09:06:21.9633061Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r44; 2026-02-21T09:06:21.9633335Z // end inline asm 2026-02-21T09:06:21.9633474Z mov.b32 %r45, 128; 2026-02-21T09:06:21.9633611Z // begin inline asm 2026-02-21T09:06:21.9633842Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r45; 2026-02-21T09:06:21.9634100Z // end inline asm 2026-02-21T09:06:21.9634269Z mov.b32 %r46, 2048; 2026-02-21T09:06:21.9634404Z // begin inline asm 2026-02-21T09:06:21.9634645Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r46; 2026-02-21T09:06:21.9634957Z // end inline asm 2026-02-21T09:06:21.9635085Z mov.b32 %r47, 4096; 2026-02-21T09:06:21.9635227Z // begin inline asm 2026-02-21T09:06:21.9635455Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r47; 2026-02-21T09:06:21.9635736Z // end inline asm 2026-02-21T09:06:21.9635865Z mov.b64 %rd12, 4096; 2026-02-21T09:06:21.9636007Z // begin inline asm 2026-02-21T09:06:21.9636256Z @%p140 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:06:21.9636540Z // end inline asm 2026-02-21T09:06:21.9636680Z mov.b32 %r48, 
1; 2026-02-21T09:06:21.9636807Z // begin inline asm 2026-02-21T09:06:21.9637060Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r48; 2026-02-21T09:06:21.9637348Z // end inline asm 2026-02-21T09:06:21.9637486Z // begin inline asm 2026-02-21T09:06:21.9637731Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r48; 2026-02-21T09:06:21.9638018Z // end inline asm 2026-02-21T09:06:21.9638184Z // begin inline asm 2026-02-21T09:06:21.9638411Z @%p140 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:06:21.9638669Z // end inline asm 2026-02-21T09:06:21.9638797Z // begin inline asm 2026-02-21T09:06:21.9639042Z @%p140 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:21.9639313Z // end inline asm 2026-02-21T09:06:21.9639451Z // begin inline asm 2026-02-21T09:06:21.9639686Z @%p140 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:06:21.9639979Z // end inline asm 2026-02-21T09:06:21.9640121Z // begin inline asm 2026-02-21T09:06:21.9640348Z @%p140 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:21.9640619Z // end inline asm 2026-02-21T09:06:21.9640755Z // begin inline asm 2026-02-21T09:06:21.9641095Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:06:21.9641460Z // end inline asm 2026-02-21T09:06:21.9641587Z // begin inline asm 2026-02-21T09:06:21.9641798Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:06:21.9642056Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:06:21.9642254Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:21.9642432Z // end inline asm 2026-02-21T09:06:21.9642572Z bar.sync 0; 2026-02-21T09:06:21.9642741Z cvta.global.u64 %rd87, %rd19; 2026-02-21T09:06:21.9643030Z .loc 1 22 67 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:22:67 2026-02-21T09:06:21.9643341Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:06:21.9643498Z bar.sync 0; 2026-02-21T09:06:21.9643637Z // begin inline asm 2026-02-21T09:06:21.9643787Z @%p1 st.shared.b32 [ %r42 + 0 ], %r51; 2026-02-21T09:06:21.9643971Z // end inline asm 2026-02-21T09:06:21.9644109Z bar.warp.sync -1; 2026-02-21T09:06:21.9644257Z // begin inline asm 2026-02-21T09:06:21.9644505Z @%p140 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:06:21.9644838Z // end inline asm 2026-02-21T09:06:21.9644974Z // begin inline asm 2026-02-21T09:06:21.9645206Z @%p140 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:21.9645473Z // end inline asm 2026-02-21T09:06:21.9645607Z // begin inline asm 2026-02-21T09:06:21.9645853Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r44; 2026-02-21T09:06:21.9646126Z // end inline asm 2026-02-21T09:06:21.9646267Z // begin inline asm 2026-02-21T09:06:21.9646500Z @%p140 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r45; 2026-02-21T09:06:21.9646775Z // end inline asm 2026-02-21T09:06:21.9646946Z // begin inline asm 2026-02-21T09:06:21.9647189Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r46; 2026-02-21T09:06:21.9647470Z // end inline asm 2026-02-21T09:06:21.9647603Z // begin inline asm 2026-02-21T09:06:21.9647848Z @%p140 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r46; 2026-02-21T09:06:21.9648117Z // end inline asm 2026-02-21T09:06:21.9648257Z // begin inline asm 2026-02-21T09:06:21.9648512Z @%p140 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:06:21.9648797Z // end inline asm 2026-02-21T09:06:21.9648938Z // begin inline asm 2026-02-21T09:06:21.9649192Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 
+ 0 ], 0x0, %r48; 2026-02-21T09:06:21.9649484Z // end inline asm 2026-02-21T09:06:21.9649631Z // begin inline asm 2026-02-21T09:06:21.9649887Z @%p140 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r48; 2026-02-21T09:06:21.9650157Z // end inline asm 2026-02-21T09:06:21.9650292Z // begin inline asm 2026-02-21T09:06:21.9650525Z @%p140 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:06:21.9650816Z // end inline asm 2026-02-21T09:06:21.9650960Z // begin inline asm 2026-02-21T09:06:21.9651227Z @%p140 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:21.9651509Z // end inline asm 2026-02-21T09:06:21.9651636Z // begin inline asm 2026-02-21T09:06:21.9651871Z @%p140 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:06:21.9652137Z // end inline asm 2026-02-21T09:06:21.9652267Z // begin inline asm 2026-02-21T09:06:21.9652562Z @%p140 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:21.9652812Z // end inline asm 2026-02-21T09:06:21.9652946Z // begin inline asm 2026-02-21T09:06:21.9653279Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:06:21.9653664Z // end inline asm 2026-02-21T09:06:21.9653802Z // begin inline asm 2026-02-21T09:06:21.9654005Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:06:21.9654258Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:06:21.9654442Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:21.9654620Z // end inline asm 2026-02-21T09:06:21.9654785Z bar.sync 0; 2026-02-21T09:06:21.9654928Z cvta.global.u64 %rd88, %rd37; 2026-02-21T09:06:21.9655199Z .loc 1 28 76 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:28:76 2026-02-21T09:06:21.9655525Z setp.gt.u32 %p39, %r3, 511; 2026-02-21T09:06:21.9655695Z @%p39 bra 
$L__BB0_8; 2026-02-21T09:06:21.9655851Z // %bb.1: // %.lr.ph 2026-02-21T09:06:21.9656152Z .loc 1 34 45 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:34:45 2026-02-21T09:06:21.9656431Z shl.b32 %r199, %r1, 3; 2026-02-21T09:06:21.9656588Z and.b32 %r200, %r199, 120; 2026-02-21T09:06:21.9656738Z shr.u32 %r201, %r1, 4; 2026-02-21T09:06:21.9657000Z .loc 1 35 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:35:27 2026-02-21T09:06:21.9657280Z shl.b32 %r202, %r3, 3; 2026-02-21T09:06:21.9657423Z and.b32 %r299, %r202, 3968; 2026-02-21T09:06:21.9657681Z .loc 1 34 45 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:34:45 2026-02-21T09:06:21.9657956Z or.b32 %r203, %r201, %r299; 2026-02-21T09:06:21.9658120Z bfe.u32 %r204, %r1, 4, 4; 2026-02-21T09:06:21.9658265Z shr.u32 %r205, %r1, 5; 2026-02-21T09:06:21.9658418Z setp.lt.u32 %p71, %r1, 64; 2026-02-21T09:06:21.9658568Z shl.b32 %r206, %r1, 10; 2026-02-21T09:06:21.9658719Z and.b32 %r207, %r206, 6144; 2026-02-21T09:06:21.9658866Z shl.b32 %r208, %r1, 4; 2026-02-21T09:06:21.9659012Z and.b32 %r209, %r208, 2032; 2026-02-21T09:06:21.9659203Z shr.u32 %r210, %r1, 1; 2026-02-21T09:06:21.9659345Z and.b32 %r211, %r210, 64; 2026-02-21T09:06:21.9659503Z xor.b32 %r212, %r209, %r211; 2026-02-21T09:06:21.9659651Z or.b32 %r213, %r212, %r207; 2026-02-21T09:06:21.9659806Z xor.b32 %r215, %r213, 32; 2026-02-21T09:06:21.9659950Z shl.b32 %r216, %r1, 6; 2026-02-21T09:06:21.9660097Z and.b32 %r217, %r216, 6144; 2026-02-21T09:06:21.9660241Z shl.b32 %r218, %r1, 5; 2026-02-21T09:06:21.9660390Z and.b32 %r219, %r218, 864; 2026-02-21T09:06:21.9660540Z and.b32 %r220, %r1, 224; 2026-02-21T09:06:21.9660697Z and.b32 %r222, %r65, 16; 2026-02-21T09:06:21.9660852Z or.b32 %r223, %r217, %r219; 2026-02-21T09:06:21.9661007Z xor.b32 %r224, %r223, %r220; 2026-02-21T09:06:21.9661166Z add.s32 %r225, %r41, %r222; 2026-02-21T09:06:21.9661316Z add.s32 %r477, %r225, %r224; 2026-02-21T09:06:21.9661576Z .loc 1 33 27 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:33:27 2026-02-21T09:06:21.9661851Z shl.b32 %r226, %r3, 7; 2026-02-21T09:06:21.9662000Z and.b32 %r303, %r226, 1920; 2026-02-21T09:06:21.9662250Z .loc 1 36 32 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:36:32 2026-02-21T09:06:21.9662564Z or.b32 %r11, %r299, %r204; 2026-02-21T09:06:21.9662825Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9663119Z shfl.sync.idx.b32 %r19, %r205, 0, 31, -1; 2026-02-21T09:06:21.9663305Z shl.b32 %r227, %r19, 21; 2026-02-21T09:06:21.9663454Z and.b32 %r228, %r227, 6291456; 2026-02-21T09:06:21.9663616Z add.s32 %r229, %r228, %r658; 2026-02-21T09:06:21.9663765Z shl.b32 %r230, %r19, 4; 2026-02-21T09:06:21.9663913Z and.b32 %r231, %r230, 64; 2026-02-21T09:06:21.9664090Z add.s32 %r472, %r229, %r231; 2026-02-21T09:06:21.9664239Z mov.pred %p87, -1; 2026-02-21T09:06:21.9664386Z // begin inline asm 2026-02-21T09:06:21.9664742Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r472 + 0], {%r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51}; 2026-02-21T09:06:21.9665110Z // end inline asm 2026-02-21T09:06:21.9665241Z // begin inline asm 2026-02-21T09:06:21.9665578Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r472 + 16], {%r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51}; 2026-02-21T09:06:21.9666028Z // end inline asm 2026-02-21T09:06:21.9666159Z // begin inline asm 2026-02-21T09:06:21.9666478Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r472 + 32], {%r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51}; 2026-02-21T09:06:21.9666828Z // end inline asm 2026-02-21T09:06:21.9666998Z // begin inline asm 2026-02-21T09:06:21.9667306Z @%p87 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r472 + 48], {%r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51, %r51}; 
2026-02-21T09:06:21.9667653Z // end inline asm 2026-02-21T09:06:21.9667781Z // begin inline asm 2026-02-21T09:06:21.9667938Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:06:21.9668101Z // end inline asm 2026-02-21T09:06:21.9668226Z bar.sync 0; 2026-02-21T09:06:21.9668480Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9668753Z add.s32 %r660, %r41, 458816; 2026-02-21T09:06:21.9668905Z // begin inline asm 2026-02-21T09:06:21.9669065Z @%p140 mbarrier.init.shared::cta.b64 [%r660], 1; 2026-02-21T09:06:21.9669254Z // end inline asm 2026-02-21T09:06:21.9669381Z bar.sync 0; 2026-02-21T09:06:21.9669515Z add.s32 %r135, %r41, 458824; 2026-02-21T09:06:21.9669666Z // begin inline asm 2026-02-21T09:06:21.9669825Z @%p140 mbarrier.init.shared::cta.b64 [%r135], 1; 2026-02-21T09:06:21.9670015Z // end inline asm 2026-02-21T09:06:21.9670143Z add.s32 %r136, %r41, 458752; 2026-02-21T09:06:21.9670292Z // begin inline asm 2026-02-21T09:06:21.9670444Z @%p140 mbarrier.init.shared::cta.b64 [%r136], 1; 2026-02-21T09:06:21.9670663Z // end inline asm 2026-02-21T09:06:21.9670790Z bar.sync 0; 2026-02-21T09:06:21.9670921Z add.s32 %r137, %r41, 458760; 2026-02-21T09:06:21.9671072Z // begin inline asm 2026-02-21T09:06:21.9671227Z @%p140 mbarrier.init.shared::cta.b64 [%r137], 1; 2026-02-21T09:06:21.9671414Z // end inline asm 2026-02-21T09:06:21.9671539Z bar.sync 0; 2026-02-21T09:06:21.9671672Z add.s32 %r138, %r41, 458768; 2026-02-21T09:06:21.9671822Z // begin inline asm 2026-02-21T09:06:21.9671985Z @%p140 mbarrier.init.shared::cta.b64 [%r138], 1; 2026-02-21T09:06:21.9672163Z // end inline asm 2026-02-21T09:06:21.9672293Z bar.sync 0; 2026-02-21T09:06:21.9672415Z add.s32 %r139, %r41, 458776; 2026-02-21T09:06:21.9672569Z // begin inline asm 2026-02-21T09:06:21.9672728Z @%p140 mbarrier.init.shared::cta.b64 [%r139], 1; 2026-02-21T09:06:21.9672906Z // end inline asm 2026-02-21T09:06:21.9673037Z bar.sync 0; 2026-02-21T09:06:21.9673160Z add.s32 %r140, %r41, 
458784; 2026-02-21T09:06:21.9673311Z // begin inline asm 2026-02-21T09:06:21.9673465Z @%p140 mbarrier.init.shared::cta.b64 [%r140], 1; 2026-02-21T09:06:21.9673648Z // end inline asm 2026-02-21T09:06:21.9673772Z bar.sync 0; 2026-02-21T09:06:21.9673902Z add.s32 %r141, %r41, 458792; 2026-02-21T09:06:21.9674070Z // begin inline asm 2026-02-21T09:06:21.9674232Z @%p140 mbarrier.init.shared::cta.b64 [%r141], 1; 2026-02-21T09:06:21.9674411Z // end inline asm 2026-02-21T09:06:21.9674533Z bar.sync 0; 2026-02-21T09:06:21.9674663Z add.s32 %r296, %r41, 458800; 2026-02-21T09:06:21.9674838Z // begin inline asm 2026-02-21T09:06:21.9674997Z @%p140 mbarrier.init.shared::cta.b64 [%r296], 1; 2026-02-21T09:06:21.9675173Z // end inline asm 2026-02-21T09:06:21.9675305Z bar.sync 0; 2026-02-21T09:06:21.9675426Z // begin inline asm 2026-02-21T09:06:21.9675659Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r136], 65536; 2026-02-21T09:06:21.9675878Z // end inline asm 2026-02-21T09:06:21.9676113Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9676397Z // begin inline asm 2026-02-21T09:06:21.9676543Z fence.proxy.async.shared::cta; 2026-02-21T09:06:21.9676706Z // end inline asm 2026-02-21T09:06:21.9676829Z bar.sync 0; 2026-02-21T09:06:21.9676970Z elect.sync %r232|%p72, -1; 2026-02-21T09:06:21.9677128Z and.pred %p54, %p71, %p72; 2026-02-21T09:06:21.9677285Z and.b32 %r233, %r19, 1; 2026-02-21T09:06:21.9677439Z shl.b32 %r21, %r233, 13; 2026-02-21T09:06:21.9677585Z shl.b32 %r234, %r233, 14; 2026-02-21T09:06:21.9677741Z add.s32 %r144, %r41, %r234; 2026-02-21T09:06:21.9677888Z shl.b32 %r145, %r233, 6; 2026-02-21T09:06:21.9678039Z // begin inline asm 2026-02-21T09:06:21.9678384Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd87, {%r145, %r299}], [%r136]; 2026-02-21T09:06:21.9678749Z // end inline asm 2026-02-21T09:06:21.9678989Z .loc 1 46 44 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9679269Z bar.sync 0; 2026-02-21T09:06:21.9679403Z elect.sync %r235|%p73, -1; 2026-02-21T09:06:21.9679557Z and.pred %p55, %p71, %p73; 2026-02-21T09:06:21.9679716Z add.s32 %r148, %r144, 229376; 2026-02-21T09:06:21.9679863Z // begin inline asm 2026-02-21T09:06:21.9680189Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd88, {%r145, %r303}], [%r136]; 2026-02-21T09:06:21.9680543Z // end inline asm 2026-02-21T09:06:21.9680799Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9681083Z bar.sync 0; 2026-02-21T09:06:21.9681209Z // begin inline asm 2026-02-21T09:06:21.9681408Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r137], 65536; 2026-02-21T09:06:21.9681627Z // end inline asm 2026-02-21T09:06:21.9681867Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9682138Z bar.sync 0; 2026-02-21T09:06:21.9682304Z elect.sync %r236|%p74, -1; 2026-02-21T09:06:21.9682457Z and.pred %p57, %p71, %p74; 2026-02-21T09:06:21.9682612Z add.s32 %r153, %r144, 32768; 2026-02-21T09:06:21.9682764Z or.b32 %r154, %r145, 128; 2026-02-21T09:06:21.9682907Z // begin inline asm 2026-02-21T09:06:21.9683225Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r153], [%rd87, {%r154, %r299}], [%r137]; 2026-02-21T09:06:21.9683572Z // end inline asm 2026-02-21T09:06:21.9683811Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9684083Z bar.sync 0; 2026-02-21T09:06:21.9684217Z elect.sync %r237|%p75, -1; 2026-02-21T09:06:21.9684369Z and.pred %p58, %p71, %p75; 2026-02-21T09:06:21.9684526Z add.s32 %r157, %r144, 262144; 2026-02-21T09:06:21.9684720Z // begin inline asm 2026-02-21T09:06:21.9685031Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r157], [%rd88, {%r154, %r303}], 
[%r137]; 2026-02-21T09:06:21.9685378Z // end inline asm 2026-02-21T09:06:21.9685626Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9685920Z bar.sync 0; 2026-02-21T09:06:21.9686075Z // begin inline asm 2026-02-21T09:06:21.9686277Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r138], 65536; 2026-02-21T09:06:21.9686504Z // end inline asm 2026-02-21T09:06:21.9686749Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9687037Z bar.sync 0; 2026-02-21T09:06:21.9687169Z elect.sync %r238|%p76, -1; 2026-02-21T09:06:21.9687334Z and.pred %p60, %p71, %p76; 2026-02-21T09:06:21.9687493Z add.s32 %r162, %r144, 65536; 2026-02-21T09:06:21.9687696Z or.b32 %r163, %r145, 256; 2026-02-21T09:06:21.9687847Z // begin inline asm 2026-02-21T09:06:21.9688185Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r162], [%rd87, {%r163, %r299}], [%r138]; 2026-02-21T09:06:21.9688547Z // end inline asm 2026-02-21T09:06:21.9688798Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9689090Z bar.sync 0; 2026-02-21T09:06:21.9689225Z elect.sync %r239|%p77, -1; 2026-02-21T09:06:21.9689390Z and.pred %p61, %p71, %p77; 2026-02-21T09:06:21.9689547Z add.s32 %r166, %r144, 294912; 2026-02-21T09:06:21.9689709Z // begin inline asm 2026-02-21T09:06:21.9690044Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r166], [%rd88, {%r163, %r303}], [%r138]; 2026-02-21T09:06:21.9690395Z // end inline asm 2026-02-21T09:06:21.9690679Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9690960Z bar.sync 0; 2026-02-21T09:06:21.9691106Z // begin inline asm 2026-02-21T09:06:21.9691299Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r139], 65536; 2026-02-21T09:06:21.9691529Z // end inline asm 2026-02-21T09:06:21.9691773Z .loc 1 45 31 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9692069Z bar.sync 0; 2026-02-21T09:06:21.9692211Z elect.sync %r240|%p78, -1; 2026-02-21T09:06:21.9692373Z and.pred %p63, %p71, %p78; 2026-02-21T09:06:21.9692537Z add.s32 %r171, %r144, 98304; 2026-02-21T09:06:21.9692690Z or.b32 %r172, %r145, 384; 2026-02-21T09:06:21.9692845Z // begin inline asm 2026-02-21T09:06:21.9693179Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r171], [%rd87, {%r172, %r299}], [%r139]; 2026-02-21T09:06:21.9693520Z // end inline asm 2026-02-21T09:06:21.9693762Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9694034Z bar.sync 0; 2026-02-21T09:06:21.9694171Z elect.sync %r241|%p79, -1; 2026-02-21T09:06:21.9694321Z and.pred %p64, %p71, %p79; 2026-02-21T09:06:21.9694479Z add.s32 %r175, %r144, 327680; 2026-02-21T09:06:21.9694735Z // begin inline asm 2026-02-21T09:06:21.9695059Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r175], [%rd88, {%r172, %r303}], [%r139]; 2026-02-21T09:06:21.9695404Z // end inline asm 2026-02-21T09:06:21.9695659Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9695948Z bar.sync 0; 2026-02-21T09:06:21.9696078Z // begin inline asm 2026-02-21T09:06:21.9696283Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r140], 65536; 2026-02-21T09:06:21.9696505Z // end inline asm 2026-02-21T09:06:21.9696758Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9697040Z bar.sync 0; 2026-02-21T09:06:21.9697186Z elect.sync %r242|%p80, -1; 2026-02-21T09:06:21.9697353Z and.pred %p66, %p71, %p80; 2026-02-21T09:06:21.9697511Z add.s32 %r180, %r144, 131072; 2026-02-21T09:06:21.9697672Z or.b32 %r181, %r145, 512; 2026-02-21T09:06:21.9697822Z // begin inline asm 2026-02-21T09:06:21.9698148Z @%p66 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r180], [%rd87, {%r181, %r299}], [%r140]; 2026-02-21T09:06:21.9698491Z // end inline asm 2026-02-21T09:06:21.9698795Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9699067Z bar.sync 0; 2026-02-21T09:06:21.9699202Z elect.sync %r243|%p81, -1; 2026-02-21T09:06:21.9699362Z and.pred %p67, %p71, %p81; 2026-02-21T09:06:21.9699512Z add.s32 %r184, %r144, 360448; 2026-02-21T09:06:21.9699669Z // begin inline asm 2026-02-21T09:06:21.9699980Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r184], [%rd88, {%r181, %r303}], [%r140]; 2026-02-21T09:06:21.9700368Z // end inline asm 2026-02-21T09:06:21.9700605Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9700884Z bar.sync 0; 2026-02-21T09:06:21.9701013Z // begin inline asm 2026-02-21T09:06:21.9701195Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r141], 65536; 2026-02-21T09:06:21.9701414Z // end inline asm 2026-02-21T09:06:21.9701650Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9701929Z bar.sync 0; 2026-02-21T09:06:21.9702059Z elect.sync %r244|%p82, -1; 2026-02-21T09:06:21.9702225Z and.pred %p69, %p71, %p82; 2026-02-21T09:06:21.9702377Z add.s32 %r189, %r144, 163840; 2026-02-21T09:06:21.9702533Z or.b32 %r190, %r145, 640; 2026-02-21T09:06:21.9702681Z // begin inline asm 2026-02-21T09:06:21.9703018Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r189], [%rd87, {%r190, %r299}], [%r141]; 2026-02-21T09:06:21.9703366Z // end inline asm 2026-02-21T09:06:21.9703603Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9703877Z bar.sync 0; 2026-02-21T09:06:21.9704004Z elect.sync %r245|%p83, -1; 2026-02-21T09:06:21.9704162Z and.pred %p70, %p71, %p83; 2026-02-21T09:06:21.9704309Z add.s32 %r193, 
%r144, 393216; 2026-02-21T09:06:21.9704462Z // begin inline asm 2026-02-21T09:06:21.9704806Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r193], [%rd88, {%r190, %r303}], [%r141]; 2026-02-21T09:06:21.9705141Z // end inline asm 2026-02-21T09:06:21.9705388Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9705657Z bar.sync 0; 2026-02-21T09:06:21.9705785Z // begin inline asm 2026-02-21T09:06:21.9705911Z 2026-02-21T09:06:21.9706026Z { 2026-02-21T09:06:21.9706144Z .reg .pred complete; 2026-02-21T09:06:21.9706290Z waitLoop: 2026-02-21T09:06:21.9706476Z mbarrier.try_wait.parity.shared.b64 complete, [%r136], %r51; 2026-02-21T09:06:21.9706701Z @!complete bra.uni waitLoop; 2026-02-21T09:06:21.9706854Z } 2026-02-21T09:06:21.9706914Z 2026-02-21T09:06:21.9707002Z // end inline asm 2026-02-21T09:06:21.9707240Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9707514Z setp.ne.b32 %p84, %r19, 0; 2026-02-21T09:06:21.9707670Z @%p84 bra $L__BB0_3; 2026-02-21T09:06:21.9707803Z // %bb.2: 2026-02-21T09:06:21.9708032Z .loc 1 0 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:0:52 2026-02-21T09:06:21.9708317Z add.s32 %r263, %r41, 229376; 2026-02-21T09:06:21.9708468Z add.s32 %r264, %r41, 245856; 2026-02-21T09:06:21.9708624Z bfe.u32 %r265, %r264, 4, 14; 2026-02-21T09:06:21.9708773Z cvt.u64.u32 %rd71, %r265; 2026-02-21T09:06:21.9708939Z or.b64 %rd69, %rd71, 4611686293372403712; 2026-02-21T09:06:21.9709112Z add.s32 %r266, %r41, 16480; 2026-02-21T09:06:21.9709266Z bfe.u32 %r267, %r266, 4, 14; 2026-02-21T09:06:21.9709410Z cvt.u64.u32 %rd72, %r267; 2026-02-21T09:06:21.9709572Z or.b64 %rd68, %rd72, 4611686293372403712; 2026-02-21T09:06:21.9709752Z add.s32 %r268, %r41, 245824; 2026-02-21T09:06:21.9709896Z bfe.u32 %r269, %r268, 4, 14; 2026-02-21T09:06:21.9710048Z cvt.u64.u32 %rd73, %r269; 2026-02-21T09:06:21.9710198Z or.b64 %rd67, %rd73, 
4611686293372403712; 2026-02-21T09:06:21.9710398Z add.s32 %r270, %r41, 16448; 2026-02-21T09:06:21.9710551Z bfe.u32 %r271, %r270, 4, 14; 2026-02-21T09:06:21.9710711Z cvt.u64.u32 %rd74, %r271; 2026-02-21T09:06:21.9710868Z or.b64 %rd66, %rd74, 4611686293372403712; 2026-02-21T09:06:21.9711046Z add.s32 %r272, %r41, 245792; 2026-02-21T09:06:21.9711203Z bfe.u32 %r273, %r272, 4, 14; 2026-02-21T09:06:21.9711355Z cvt.u64.u32 %rd75, %r273; 2026-02-21T09:06:21.9711518Z or.b64 %rd65, %rd75, 4611686293372403712; 2026-02-21T09:06:21.9711691Z add.s32 %r274, %r41, 16416; 2026-02-21T09:06:21.9711868Z bfe.u32 %r275, %r274, 4, 14; 2026-02-21T09:06:21.9712012Z cvt.u64.u32 %rd76, %r275; 2026-02-21T09:06:21.9712168Z or.b64 %rd64, %rd76, 4611686293372403712; 2026-02-21T09:06:21.9712329Z add.s32 %r276, %r41, 245760; 2026-02-21T09:06:21.9712482Z bfe.u32 %r277, %r276, 4, 14; 2026-02-21T09:06:21.9712624Z cvt.u64.u32 %rd77, %r277; 2026-02-21T09:06:21.9712778Z or.b64 %rd63, %rd77, 4611686293372403712; 2026-02-21T09:06:21.9712948Z add.s32 %r278, %r41, 16384; 2026-02-21T09:06:21.9713094Z bfe.u32 %r279, %r278, 4, 14; 2026-02-21T09:06:21.9713244Z cvt.u64.u32 %rd78, %r279; 2026-02-21T09:06:21.9713393Z or.b64 %rd62, %rd78, 4611686293372403712; 2026-02-21T09:06:21.9713560Z add.s32 %r280, %r41, 229472; 2026-02-21T09:06:21.9713702Z bfe.u32 %r281, %r280, 4, 14; 2026-02-21T09:06:21.9713853Z cvt.u64.u32 %rd79, %r281; 2026-02-21T09:06:21.9714000Z or.b64 %rd61, %rd79, 4611686293372403712; 2026-02-21T09:06:21.9714203Z add.s32 %r282, %r41, 96; 2026-02-21T09:06:21.9714354Z bfe.u32 %r283, %r282, 4, 14; 2026-02-21T09:06:21.9714499Z cvt.u64.u32 %rd80, %r283; 2026-02-21T09:06:21.9714654Z or.b64 %rd60, %rd80, 4611686293372403712; 2026-02-21T09:06:21.9714865Z add.s32 %r284, %r41, 229440; 2026-02-21T09:06:21.9715019Z bfe.u32 %r285, %r284, 4, 14; 2026-02-21T09:06:21.9715164Z cvt.u64.u32 %rd81, %r285; 2026-02-21T09:06:21.9715319Z or.b64 %rd59, %rd81, 4611686293372403712; 2026-02-21T09:06:21.9715483Z add.s32 %r286, %r41, 
64; 2026-02-21T09:06:21.9715636Z bfe.u32 %r287, %r286, 4, 14; 2026-02-21T09:06:21.9715786Z cvt.u64.u32 %rd82, %r287; 2026-02-21T09:06:21.9715933Z or.b64 %rd58, %rd82, 4611686293372403712; 2026-02-21T09:06:21.9716101Z add.s32 %r288, %r41, 229408; 2026-02-21T09:06:21.9716244Z bfe.u32 %r289, %r288, 4, 14; 2026-02-21T09:06:21.9716394Z cvt.u64.u32 %rd83, %r289; 2026-02-21T09:06:21.9716544Z or.b64 %rd57, %rd83, 4611686293372403712; 2026-02-21T09:06:21.9716716Z add.s32 %r290, %r41, 32; 2026-02-21T09:06:21.9716860Z bfe.u32 %r291, %r290, 4, 14; 2026-02-21T09:06:21.9717013Z cvt.u64.u32 %rd84, %r291; 2026-02-21T09:06:21.9717159Z or.b64 %rd56, %rd84, 4611686293372403712; 2026-02-21T09:06:21.9717331Z bfe.u32 %r292, %r263, 4, 14; 2026-02-21T09:06:21.9717484Z cvt.u64.u32 %rd85, %r292; 2026-02-21T09:06:21.9717672Z or.b64 %rd55, %rd85, 4611686293372403712; 2026-02-21T09:06:21.9717846Z bfe.u32 %r293, %r41, 4, 14; 2026-02-21T09:06:21.9717992Z cvt.u64.u32 %rd86, %r293; 2026-02-21T09:06:21.9718147Z or.b64 %rd54, %rd86, 4611686293372403712; 2026-02-21T09:06:21.9718415Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9718711Z elect.sync %r294|%p86, -1; 2026-02-21T09:06:21.9718859Z mov.b32 %r247, 136314896; 2026-02-21T09:06:21.9719009Z mov.pred %p85, 0; 2026-02-21T09:06:21.9719150Z // begin inline asm 2026-02-21T09:06:21.9719370Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd54, %rd55, %r247, %p85; 2026-02-21T09:06:21.9719622Z // end inline asm 2026-02-21T09:06:21.9719754Z // begin inline asm 2026-02-21T09:06:21.9719972Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd56, %rd57, %r247, %p87; 2026-02-21T09:06:21.9720208Z // end inline asm 2026-02-21T09:06:21.9720345Z // begin inline asm 2026-02-21T09:06:21.9720556Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd58, %rd59, %r247, %p87; 2026-02-21T09:06:21.9720797Z // end inline asm 2026-02-21T09:06:21.9720932Z // begin inline asm 
2026-02-21T09:06:21.9721160Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd60, %rd61, %r247, %p87; 2026-02-21T09:06:21.9721403Z // end inline asm 2026-02-21T09:06:21.9721531Z // begin inline asm 2026-02-21T09:06:21.9721740Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd62, %rd63, %r247, %p87; 2026-02-21T09:06:21.9721972Z // end inline asm 2026-02-21T09:06:21.9722106Z // begin inline asm 2026-02-21T09:06:21.9722317Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd64, %rd65, %r247, %p87; 2026-02-21T09:06:21.9722576Z // end inline asm 2026-02-21T09:06:21.9722710Z // begin inline asm 2026-02-21T09:06:21.9722910Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd66, %rd67, %r247, %p87; 2026-02-21T09:06:21.9723148Z // end inline asm 2026-02-21T09:06:21.9723277Z // begin inline asm 2026-02-21T09:06:21.9723485Z @%p86 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd68, %rd69, %r247, %p87; 2026-02-21T09:06:21.9723729Z // end inline asm 2026-02-21T09:06:21.9723859Z add.s32 %r295, %r41, 458816; 2026-02-21T09:06:21.9724016Z cvt.u64.u32 %rd70, %r295; 2026-02-21T09:06:21.9724159Z // begin inline asm 2026-02-21T09:06:21.9724365Z @%p86 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd70]; 2026-02-21T09:06:21.9724587Z // end inline asm 2026-02-21T09:06:21.9724752Z $L__BB0_3: 2026-02-21T09:06:21.9724986Z .loc 1 0 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:0:52 2026-02-21T09:06:21.9725328Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:06:21.9725521Z add.s32 %r4, %r41, %r213; 2026-02-21T09:06:21.9725667Z add.s32 %r5, %r41, %r215; 2026-02-21T09:06:21.9725820Z add.s32 %r482, %r477, 1024; 2026-02-21T09:06:21.9725969Z or.b32 %r9, %r303, %r200; 2026-02-21T09:06:21.9726120Z or.b32 %r12, %r11, 16; 2026-02-21T09:06:21.9726262Z or.b32 %r13, %r11, 32; 2026-02-21T09:06:21.9726412Z or.b32 %r14, %r203, 48; 2026-02-21T09:06:21.9726553Z or.b32 %r15, %r11, 64; 2026-02-21T09:06:21.9726698Z or.b32 %r16, %r11, 80; 
2026-02-21T09:06:21.9726836Z or.b32 %r17, %r11, 96; 2026-02-21T09:06:21.9726979Z or.b32 %r18, %r203, 112; 2026-02-21T09:06:21.9727236Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9727505Z bar.sync 0; 2026-02-21T09:06:21.9727637Z // begin inline asm 2026-02-21T09:06:21.9727823Z @%p140 mbarrier.arrive.expect_tx.shared.b64 _, [%r296], 65536; 2026-02-21T09:06:21.9728042Z // end inline asm 2026-02-21T09:06:21.9728278Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9728553Z bar.sync 0; 2026-02-21T09:06:21.9728683Z elect.sync %r310|%p106, -1; 2026-02-21T09:06:21.9728847Z and.pred %p103, %p71, %p106; 2026-02-21T09:06:21.9729044Z shl.b32 %r311, %r21, 1; 2026-02-21T09:06:21.9729194Z add.s32 %r312, %r41, %r311; 2026-02-21T09:06:21.9729349Z add.s32 %r297, %r312, 196608; 2026-02-21T09:06:21.9729499Z or.b32 %r298, %r145, 768; 2026-02-21T09:06:21.9729665Z // begin inline asm 2026-02-21T09:06:21.9730001Z @%p103 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r297], [%rd87, {%r298, %r299}], [%r296]; 2026-02-21T09:06:21.9730379Z // end inline asm 2026-02-21T09:06:21.9730634Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9730915Z bar.sync 0; 2026-02-21T09:06:21.9731060Z elect.sync %r313|%p107, -1; 2026-02-21T09:06:21.9731227Z and.pred %p104, %p71, %p107; 2026-02-21T09:06:21.9731397Z add.s32 %r301, %r312, 425984; 2026-02-21T09:06:21.9731552Z // begin inline asm 2026-02-21T09:06:21.9731885Z @%p104 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r301], [%rd88, {%r298, %r303}], [%r296]; 2026-02-21T09:06:21.9732251Z // end inline asm 2026-02-21T09:06:21.9732507Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9732799Z or.b32 %r25, %r145, 896; 2026-02-21T09:06:21.9732974Z mov.b32 %r664, 1; 2026-02-21T09:06:21.9733122Z mov.b32 
%r663, 6; 2026-02-21T09:06:21.9733262Z mov.b32 %r659, 0; 2026-02-21T09:06:21.9733407Z mov.b32 %r661, %r659; 2026-02-21T09:06:21.9733557Z mov.b32 %r662, %r659; 2026-02-21T09:06:21.9733713Z mov.b32 %r665, %r659; 2026-02-21T09:06:21.9733857Z mov.b32 %r666, %r659; 2026-02-21T09:06:21.9734007Z bra.uni $L__BB0_4; 2026-02-21T09:06:21.9734205Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:21.9734566Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9734902Z setp.lt.u32 %p130, %r666, 1152; 2026-02-21T09:06:21.9735179Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9735470Z // begin inline asm 2026-02-21T09:06:21.9735601Z 2026-02-21T09:06:21.9735722Z { 2026-02-21T09:06:21.9735844Z .reg .pred complete; 2026-02-21T09:06:21.9735996Z waitLoop: 2026-02-21T09:06:21.9736192Z mbarrier.try_wait.parity.shared.b64 complete, [%r660], %r659; 2026-02-21T09:06:21.9736428Z @!complete bra.uni waitLoop; 2026-02-21T09:06:21.9736590Z } 2026-02-21T09:06:21.9736655Z 2026-02-21T09:06:21.9736711Z // end inline asm 2026-02-21T09:06:21.9736859Z add.s32 %r385, %r664, 1; 2026-02-21T09:06:21.9737017Z setp.gt.s32 %p133, %r385, 1; 2026-02-21T09:06:21.9737198Z selp.b32 %r664, 0, %r385, %p133; 2026-02-21T09:06:21.9737398Z selp.b32 %r386, 1, 0, %p133; 2026-02-21T09:06:21.9737561Z xor.b32 %r38, %r665, %r386; 2026-02-21T09:06:21.9737820Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9738098Z add.s32 %r387, %r663, 1; 2026-02-21T09:06:21.9738255Z setp.gt.s32 %p134, %r387, 6; 2026-02-21T09:06:21.9738408Z selp.b32 %r663, 0, %r387, %p134; 2026-02-21T09:06:21.9738574Z shl.b32 %r388, %r663, 3; 2026-02-21T09:06:21.9738717Z add.s32 %r390, %r41, %r388; 2026-02-21T09:06:21.9738873Z add.s32 %r380, %r390, 458752; 2026-02-21T09:06:21.9739020Z bar.sync 0; 2026-02-21T09:06:21.9739161Z and.pred %p127, %p140, %p130; 2026-02-21T09:06:21.9739314Z // begin inline asm 
2026-02-21T09:06:21.9739510Z @%p127 mbarrier.arrive.expect_tx.shared.b64 _, [%r380], 65536; 2026-02-21T09:06:21.9739727Z // end inline asm 2026-02-21T09:06:21.9739963Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9740242Z bar.sync 0; 2026-02-21T09:06:21.9740375Z elect.sync %r391|%p135, -1; 2026-02-21T09:06:21.9740539Z and.pred %p136, %p130, %p135; 2026-02-21T09:06:21.9740694Z and.pred %p128, %p71, %p136; 2026-02-21T09:06:21.9740852Z shl.b32 %r392, %r663, 15; 2026-02-21T09:06:21.9741041Z add.s32 %r377, %r144, %r392; 2026-02-21T09:06:21.9741195Z add.s32 %r378, %r25, %r666; 2026-02-21T09:06:21.9741344Z // begin inline asm 2026-02-21T09:06:21.9741659Z @%p128 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r377], [%rd87, {%r378, %r299}], [%r380]; 2026-02-21T09:06:21.9742006Z // end inline asm 2026-02-21T09:06:21.9742242Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9742518Z bar.sync 0; 2026-02-21T09:06:21.9742645Z elect.sync %r393|%p137, -1; 2026-02-21T09:06:21.9742808Z and.pred %p138, %p130, %p137; 2026-02-21T09:06:21.9742969Z and.pred %p129, %p71, %p138; 2026-02-21T09:06:21.9743121Z add.s32 %r381, %r148, %r392; 2026-02-21T09:06:21.9743271Z // begin inline asm 2026-02-21T09:06:21.9743580Z @%p129 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r381], [%rd88, {%r378, %r303}], [%r380]; 2026-02-21T09:06:21.9743939Z // end inline asm 2026-02-21T09:06:21.9744177Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9744459Z add.s32 %r40, %r666, 128; 2026-02-21T09:06:21.9744616Z setp.lt.u32 %p139, %r666, 1792; 2026-02-21T09:06:21.9744852Z mov.b32 %r659, %r665; 2026-02-21T09:06:21.9744998Z mov.b32 %r660, %r394; 2026-02-21T09:06:21.9745135Z mov.b32 %r665, %r38; 2026-02-21T09:06:21.9745283Z mov.b32 %r666, %r40; 2026-02-21T09:06:21.9745420Z @%p139 bra $L__BB0_4; 
2026-02-21T09:06:21.9745567Z bra.uni $L__BB0_7; 2026-02-21T09:06:21.9745745Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:06:21.9745964Z add.s32 %r316, %r662, 1; 2026-02-21T09:06:21.9746114Z setp.gt.s32 %p109, %r316, 6; 2026-02-21T09:06:21.9746303Z selp.b32 %r662, 0, %r316, %p109; 2026-02-21T09:06:21.9746470Z selp.b32 %r317, 1, 0, %p109; 2026-02-21T09:06:21.9746621Z xor.b32 %r661, %r661, %r317; 2026-02-21T09:06:21.9746777Z shl.b32 %r318, %r662, 3; 2026-02-21T09:06:21.9746924Z add.s32 %r320, %r41, %r318; 2026-02-21T09:06:21.9747079Z add.s32 %r314, %r320, 458752; 2026-02-21T09:06:21.9747226Z bar.sync 0; 2026-02-21T09:06:21.9747355Z // begin inline asm 2026-02-21T09:06:21.9747482Z 2026-02-21T09:06:21.9747599Z { 2026-02-21T09:06:21.9747713Z .reg .pred complete; 2026-02-21T09:06:21.9747857Z waitLoop: 2026-02-21T09:06:21.9748040Z mbarrier.try_wait.parity.shared.b64 complete, [%r314], %r661; 2026-02-21T09:06:21.9748263Z @!complete bra.uni waitLoop; 2026-02-21T09:06:21.9748413Z } 2026-02-21T09:06:21.9748473Z 2026-02-21T09:06:21.9748525Z // end inline asm 2026-02-21T09:06:21.9748664Z shl.b32 %r321, %r664, 3; 2026-02-21T09:06:21.9748844Z add.s32 %r322, %r41, %r321; 2026-02-21T09:06:21.9749002Z add.s32 %r394, %r322, 458816; 2026-02-21T09:06:21.9749262Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9749546Z @%p84 bra $L__BB0_6; 2026-02-21T09:06:21.9749733Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:21.9750047Z .loc 1 45 31 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:45:31 2026-02-21T09:06:21.9750333Z shl.b32 %r339, %r662, 15; 2026-02-21T09:06:21.9750480Z add.s32 %r341, %r41, %r339; 2026-02-21T09:06:21.9750741Z .loc 1 46 44 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:46:44 2026-02-21T09:06:21.9751025Z add.s32 %r342, %r341, 229376; 2026-02-21T09:06:21.9751286Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9751581Z 
elect.sync %r343|%p111, -1; 2026-02-21T09:06:21.9751738Z bfe.u32 %r344, %r341, 4, 14; 2026-02-21T09:06:21.9751895Z cvt.u64.u32 %rd106, %r344; 2026-02-21T09:06:21.9752055Z or.b64 %rd89, %rd106, 4611686293372403712; 2026-02-21T09:06:21.9752234Z bfe.u32 %r345, %r342, 4, 14; 2026-02-21T09:06:21.9752380Z cvt.u64.u32 %rd107, %r345; 2026-02-21T09:06:21.9752579Z or.b64 %rd90, %rd107, 4611686293372403712; 2026-02-21T09:06:21.9752745Z mov.b32 %r324, 136314896; 2026-02-21T09:06:21.9752897Z mov.pred %p110, -1; 2026-02-21T09:06:21.9753041Z // begin inline asm 2026-02-21T09:06:21.9753257Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd89, %rd90, %r324, %p110; 2026-02-21T09:06:21.9753510Z // end inline asm 2026-02-21T09:06:21.9753641Z add.s32 %r346, %r341, 32; 2026-02-21T09:06:21.9753793Z bfe.u32 %r347, %r346, 4, 14; 2026-02-21T09:06:21.9753941Z cvt.u64.u32 %rd108, %r347; 2026-02-21T09:06:21.9754104Z or.b64 %rd91, %rd108, 4611686293372403712; 2026-02-21T09:06:21.9754272Z add.s32 %r348, %r341, 229408; 2026-02-21T09:06:21.9754427Z bfe.u32 %r349, %r348, 4, 14; 2026-02-21T09:06:21.9754575Z cvt.u64.u32 %rd109, %r349; 2026-02-21T09:06:21.9754764Z or.b64 %rd92, %rd109, 4611686293372403712; 2026-02-21T09:06:21.9754939Z // begin inline asm 2026-02-21T09:06:21.9755155Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd91, %rd92, %r324, %p110; 2026-02-21T09:06:21.9755410Z // end inline asm 2026-02-21T09:06:21.9755541Z add.s32 %r350, %r341, 64; 2026-02-21T09:06:21.9755695Z bfe.u32 %r351, %r350, 4, 14; 2026-02-21T09:06:21.9755843Z cvt.u64.u32 %rd110, %r351; 2026-02-21T09:06:21.9756035Z or.b64 %rd93, %rd110, 4611686293372403712; 2026-02-21T09:06:21.9756207Z add.s32 %r352, %r341, 229440; 2026-02-21T09:06:21.9756368Z bfe.u32 %r353, %r352, 4, 14; 2026-02-21T09:06:21.9756527Z cvt.u64.u32 %rd111, %r353; 2026-02-21T09:06:21.9756683Z or.b64 %rd94, %rd111, 4611686293372403712; 2026-02-21T09:06:21.9756859Z // begin inline asm 2026-02-21T09:06:21.9757073Z @%p111 
tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd93, %rd94, %r324, %p110; 2026-02-21T09:06:21.9757323Z // end inline asm 2026-02-21T09:06:21.9757486Z add.s32 %r354, %r341, 96; 2026-02-21T09:06:21.9757644Z bfe.u32 %r355, %r354, 4, 14; 2026-02-21T09:06:21.9757788Z cvt.u64.u32 %rd112, %r355; 2026-02-21T09:06:21.9757947Z or.b64 %rd95, %rd112, 4611686293372403712; 2026-02-21T09:06:21.9758123Z add.s32 %r356, %r341, 229472; 2026-02-21T09:06:21.9758269Z bfe.u32 %r357, %r356, 4, 14; 2026-02-21T09:06:21.9758424Z cvt.u64.u32 %rd113, %r357; 2026-02-21T09:06:21.9758576Z or.b64 %rd96, %rd113, 4611686293372403712; 2026-02-21T09:06:21.9758748Z // begin inline asm 2026-02-21T09:06:21.9758954Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd95, %rd96, %r324, %p110; 2026-02-21T09:06:21.9759201Z // end inline asm 2026-02-21T09:06:21.9759331Z add.s32 %r358, %r341, 16384; 2026-02-21T09:06:21.9759484Z bfe.u32 %r359, %r358, 4, 14; 2026-02-21T09:06:21.9759634Z cvt.u64.u32 %rd114, %r359; 2026-02-21T09:06:21.9759786Z or.b64 %rd97, %rd114, 4611686293372403712; 2026-02-21T09:06:21.9759984Z add.s32 %r360, %r341, 245760; 2026-02-21T09:06:21.9760134Z bfe.u32 %r361, %r360, 4, 14; 2026-02-21T09:06:21.9760286Z cvt.u64.u32 %rd115, %r361; 2026-02-21T09:06:21.9760438Z or.b64 %rd98, %rd115, 4611686293372403712; 2026-02-21T09:06:21.9760607Z // begin inline asm 2026-02-21T09:06:21.9760817Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd97, %rd98, %r324, %p110; 2026-02-21T09:06:21.9761059Z // end inline asm 2026-02-21T09:06:21.9761194Z add.s32 %r362, %r341, 16416; 2026-02-21T09:06:21.9761338Z bfe.u32 %r363, %r362, 4, 14; 2026-02-21T09:06:21.9761490Z cvt.u64.u32 %rd116, %r363; 2026-02-21T09:06:21.9761642Z or.b64 %rd99, %rd116, 4611686293372403712; 2026-02-21T09:06:21.9761821Z add.s32 %r364, %r341, 245792; 2026-02-21T09:06:21.9761970Z bfe.u32 %r365, %r364, 4, 14; 2026-02-21T09:06:21.9762124Z cvt.u64.u32 %rd117, %r365; 2026-02-21T09:06:21.9762282Z or.b64 %rd100, %rd117, 
4611686293372403712; 2026-02-21T09:06:21.9762456Z // begin inline asm 2026-02-21T09:06:21.9762677Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd99, %rd100, %r324, %p110; 2026-02-21T09:06:21.9762923Z // end inline asm 2026-02-21T09:06:21.9763062Z add.s32 %r366, %r341, 16448; 2026-02-21T09:06:21.9763206Z bfe.u32 %r367, %r366, 4, 14; 2026-02-21T09:06:21.9763389Z cvt.u64.u32 %rd118, %r367; 2026-02-21T09:06:21.9763545Z or.b64 %rd101, %rd118, 4611686293372403712; 2026-02-21T09:06:21.9763725Z add.s32 %r368, %r341, 245824; 2026-02-21T09:06:21.9763873Z bfe.u32 %r369, %r368, 4, 14; 2026-02-21T09:06:21.9764027Z cvt.u64.u32 %rd119, %r369; 2026-02-21T09:06:21.9764191Z or.b64 %rd102, %rd119, 4611686293372403712; 2026-02-21T09:06:21.9764362Z // begin inline asm 2026-02-21T09:06:21.9764583Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd101, %rd102, %r324, %p110; 2026-02-21T09:06:21.9764868Z // end inline asm 2026-02-21T09:06:21.9765005Z add.s32 %r370, %r341, 16480; 2026-02-21T09:06:21.9765149Z bfe.u32 %r371, %r370, 4, 14; 2026-02-21T09:06:21.9765302Z cvt.u64.u32 %rd120, %r371; 2026-02-21T09:06:21.9765456Z or.b64 %rd103, %rd120, 4611686293372403712; 2026-02-21T09:06:21.9765634Z add.s32 %r372, %r341, 245856; 2026-02-21T09:06:21.9765785Z bfe.u32 %r373, %r372, 4, 14; 2026-02-21T09:06:21.9765929Z cvt.u64.u32 %rd121, %r373; 2026-02-21T09:06:21.9766101Z or.b64 %rd104, %rd121, 4611686293372403712; 2026-02-21T09:06:21.9766265Z // begin inline asm 2026-02-21T09:06:21.9766485Z @%p111 tcgen05.mma.cta_group::1.kind::f16 [ %r658 + 0 ], %rd103, %rd104, %r324, %p110; 2026-02-21T09:06:21.9766760Z // end inline asm 2026-02-21T09:06:21.9766902Z cvt.u64.u32 %rd105, %r394; 2026-02-21T09:06:21.9767046Z // begin inline asm 2026-02-21T09:06:21.9767253Z @%p111 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd105]; 2026-02-21T09:06:21.9767488Z // end inline asm 2026-02-21T09:06:21.9767617Z bra.uni $L__BB0_6; 2026-02-21T09:06:21.9767794Z $L__BB0_7: // 
%._crit_edge.loopexit 2026-02-21T09:06:21.9768102Z .loc 1 0 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:0:52 2026-02-21T09:06:21.9768422Z mov.b32 %r395, 1; 2026-02-21T09:06:21.9768667Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9768951Z // begin inline asm 2026-02-21T09:06:21.9769087Z 2026-02-21T09:06:21.9769196Z { 2026-02-21T09:06:21.9769317Z .reg .pred complete; 2026-02-21T09:06:21.9769454Z waitLoop: 2026-02-21T09:06:21.9769640Z mbarrier.try_wait.parity.shared.b64 complete, [%r394], %r395; 2026-02-21T09:06:21.9769867Z @!complete bra.uni waitLoop; 2026-02-21T09:06:21.9770019Z } 2026-02-21T09:06:21.9770082Z 2026-02-21T09:06:21.9770134Z // end inline asm 2026-02-21T09:06:21.9770379Z .loc 1 41 57 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:41:57 2026-02-21T09:06:21.9770652Z bar.sync 0; 2026-02-21T09:06:21.9770781Z // begin inline asm 2026-02-21T09:06:21.9770983Z @%p140 mbarrier.inval.shared::cta.b64 [%r136]; 2026-02-21T09:06:21.9771168Z // end inline asm 2026-02-21T09:06:21.9771302Z bar.sync 0; 2026-02-21T09:06:21.9771421Z // begin inline asm 2026-02-21T09:06:21.9771587Z @%p140 mbarrier.inval.shared::cta.b64 [%r137]; 2026-02-21T09:06:21.9771763Z // end inline asm 2026-02-21T09:06:21.9771898Z bar.sync 0; 2026-02-21T09:06:21.9772017Z // begin inline asm 2026-02-21T09:06:21.9772179Z @%p140 mbarrier.inval.shared::cta.b64 [%r138]; 2026-02-21T09:06:21.9772363Z // end inline asm 2026-02-21T09:06:21.9772489Z bar.sync 0; 2026-02-21T09:06:21.9772617Z // begin inline asm 2026-02-21T09:06:21.9772774Z @%p140 mbarrier.inval.shared::cta.b64 [%r139]; 2026-02-21T09:06:21.9772957Z // end inline asm 2026-02-21T09:06:21.9773081Z bar.sync 0; 2026-02-21T09:06:21.9773225Z // begin inline asm 2026-02-21T09:06:21.9773386Z @%p140 mbarrier.inval.shared::cta.b64 [%r140]; 2026-02-21T09:06:21.9773578Z // end inline asm 2026-02-21T09:06:21.9773708Z bar.sync 0; 2026-02-21T09:06:21.9773842Z // begin inline asm 
2026-02-21T09:06:21.9774009Z @%p140 mbarrier.inval.shared::cta.b64 [%r141]; 2026-02-21T09:06:21.9774191Z // end inline asm 2026-02-21T09:06:21.9774328Z bar.sync 0; 2026-02-21T09:06:21.9774453Z // begin inline asm 2026-02-21T09:06:21.9774618Z @%p140 mbarrier.inval.shared::cta.b64 [%r296]; 2026-02-21T09:06:21.9774876Z // end inline asm 2026-02-21T09:06:21.9775020Z add.s32 %r403, %r41, 458816; 2026-02-21T09:06:21.9775174Z // begin inline asm 2026-02-21T09:06:21.9775341Z @%p140 mbarrier.inval.shared::cta.b64 [%r403]; 2026-02-21T09:06:21.9775531Z // end inline asm 2026-02-21T09:06:21.9775663Z bar.sync 0; 2026-02-21T09:06:21.9775796Z // begin inline asm 2026-02-21T09:06:21.9775956Z @%p140 mbarrier.inval.shared::cta.b64 [%r135]; 2026-02-21T09:06:21.9776146Z // end inline asm 2026-02-21T09:06:21.9776394Z .loc 1 50 45 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:50:45 2026-02-21T09:06:21.9776701Z shl.b32 %r546, %r11, 11; 2026-02-21T09:06:21.9776858Z shl.b32 %r547, %r12, 11; 2026-02-21T09:06:21.9777018Z shl.b32 %r548, %r13, 11; 2026-02-21T09:06:21.9777164Z shl.b32 %r549, %r14, 11; 2026-02-21T09:06:21.9777316Z shl.b32 %r550, %r15, 11; 2026-02-21T09:06:21.9777466Z shl.b32 %r551, %r16, 11; 2026-02-21T09:06:21.9777610Z shl.b32 %r552, %r17, 11; 2026-02-21T09:06:21.9777765Z shl.b32 %r553, %r18, 11; 2026-02-21T09:06:21.9778024Z .loc 1 50 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:50:52 2026-02-21T09:06:21.9778322Z or.b32 %r554, %r546, %r9; 2026-02-21T09:06:21.9778508Z or.b32 %r555, %r547, %r9; 2026-02-21T09:06:21.9778671Z or.b32 %r556, %r548, %r9; 2026-02-21T09:06:21.9778817Z or.b32 %r557, %r549, %r9; 2026-02-21T09:06:21.9778967Z or.b32 %r558, %r550, %r9; 2026-02-21T09:06:21.9779119Z or.b32 %r559, %r551, %r9; 2026-02-21T09:06:21.9779264Z or.b32 %r560, %r552, %r9; 2026-02-21T09:06:21.9779416Z or.b32 %r561, %r553, %r9; 2026-02-21T09:06:21.9779672Z .loc 1 50 24 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:50:24 
2026-02-21T09:06:21.9780018Z mad.wide.u32 %rd124, %r554, 2, %rd3; 2026-02-21T09:06:21.9780201Z mad.wide.u32 %rd125, %r555, 2, %rd3; 2026-02-21T09:06:21.9780387Z mad.wide.u32 %rd126, %r556, 2, %rd3; 2026-02-21T09:06:21.9780561Z mad.wide.u32 %rd127, %r557, 2, %rd3; 2026-02-21T09:06:21.9780739Z mad.wide.u32 %rd128, %r558, 2, %rd3; 2026-02-21T09:06:21.9780913Z mad.wide.u32 %rd129, %r559, 2, %rd3; 2026-02-21T09:06:21.9781083Z mad.wide.u32 %rd130, %r560, 2, %rd3; 2026-02-21T09:06:21.9781262Z mad.wide.u32 %rd131, %r561, 2, %rd3; 2026-02-21T09:06:21.9781540Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9781909Z // begin inline asm 2026-02-21T09:06:21.9782264Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420}, [%r472 + 0]; 2026-02-21T09:06:21.9782645Z // end inline asm 2026-02-21T09:06:21.9782824Z // begin inline asm 2026-02-21T09:06:21.9783166Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437}, [%r472 + 16]; 2026-02-21T09:06:21.9783553Z // end inline asm 2026-02-21T09:06:21.9783684Z // begin inline asm 2026-02-21T09:06:21.9784022Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454}, [%r472 + 32]; 2026-02-21T09:06:21.9784388Z // end inline asm 2026-02-21T09:06:21.9784521Z // begin inline asm 2026-02-21T09:06:21.9784898Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471}, [%r472 + 48]; 2026-02-21T09:06:21.9785261Z // end inline asm 2026-02-21T09:06:21.9785396Z // begin inline asm 2026-02-21T09:06:21.9785540Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:06:21.9785703Z // end inline asm 2026-02-21T09:06:21.9785835Z cvt.u64.u32 
%rd132, %r405; 2026-02-21T09:06:21.9785996Z cvt.u64.u32 %rd133, %r406; 2026-02-21T09:06:21.9786147Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:06:21.9786308Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:06:21.9786584Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9786905Z mov.b64 {%r562, %r563}, %rd135; 2026-02-21T09:06:21.9787081Z cvt.rn.f16x2.f32 %r564, %r563, %r562; 2026-02-21T09:06:21.9787358Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9787643Z cvt.u64.u32 %rd136, %r407; 2026-02-21T09:06:21.9787793Z cvt.u64.u32 %rd137, %r408; 2026-02-21T09:06:21.9787946Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:06:21.9788103Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:06:21.9788368Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9788662Z mov.b64 {%r565, %r566}, %rd139; 2026-02-21T09:06:21.9788825Z cvt.rn.f16x2.f32 %r567, %r566, %r565; 2026-02-21T09:06:21.9789106Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9789390Z cvt.u64.u32 %rd140, %r409; 2026-02-21T09:06:21.9789545Z cvt.u64.u32 %rd141, %r410; 2026-02-21T09:06:21.9789689Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:06:21.9789845Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:06:21.9790158Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9790438Z mov.b64 {%r568, %r569}, %rd143; 2026-02-21T09:06:21.9790603Z cvt.rn.f16x2.f32 %r570, %r569, %r568; 2026-02-21T09:06:21.9790869Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9791147Z cvt.u64.u32 %rd144, %r411; 2026-02-21T09:06:21.9791291Z cvt.u64.u32 %rd145, %r412; 2026-02-21T09:06:21.9791442Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:06:21.9791625Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:06:21.9791885Z .loc 1 49 27 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9792173Z mov.b64 {%r571, %r572}, %rd147; 2026-02-21T09:06:21.9792333Z cvt.rn.f16x2.f32 %r573, %r572, %r571; 2026-02-21T09:06:21.9792610Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9792897Z cvt.u64.u32 %rd148, %r413; 2026-02-21T09:06:21.9793053Z cvt.u64.u32 %rd149, %r414; 2026-02-21T09:06:21.9793195Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:06:21.9793353Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:06:21.9793616Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9793892Z mov.b64 {%r574, %r575}, %rd151; 2026-02-21T09:06:21.9794061Z cvt.rn.f16x2.f32 %r576, %r575, %r574; 2026-02-21T09:06:21.9794369Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9794660Z cvt.u64.u32 %rd152, %r415; 2026-02-21T09:06:21.9794840Z cvt.u64.u32 %rd153, %r416; 2026-02-21T09:06:21.9794995Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:06:21.9795157Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:06:21.9795421Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9795708Z mov.b64 {%r577, %r578}, %rd155; 2026-02-21T09:06:21.9795868Z cvt.rn.f16x2.f32 %r579, %r578, %r577; 2026-02-21T09:06:21.9796149Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9796432Z cvt.u64.u32 %rd156, %r417; 2026-02-21T09:06:21.9796589Z cvt.u64.u32 %rd157, %r418; 2026-02-21T09:06:21.9796733Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:06:21.9796889Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:06:21.9797154Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9797432Z mov.b64 {%r580, %r581}, %rd159; 2026-02-21T09:06:21.9797598Z cvt.rn.f16x2.f32 %r582, %r581, %r580; 2026-02-21T09:06:21.9797867Z .loc 1 47 52 // 
cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9798177Z cvt.u64.u32 %rd160, %r419; 2026-02-21T09:06:21.9798321Z cvt.u64.u32 %rd161, %r420; 2026-02-21T09:06:21.9798473Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:06:21.9798628Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:06:21.9798879Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9799154Z mov.b64 {%r583, %r584}, %rd163; 2026-02-21T09:06:21.9799310Z cvt.rn.f16x2.f32 %r585, %r584, %r583; 2026-02-21T09:06:21.9799576Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9799848Z cvt.u64.u32 %rd164, %r422; 2026-02-21T09:06:21.9800003Z cvt.u64.u32 %rd165, %r423; 2026-02-21T09:06:21.9800145Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:06:21.9800298Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:06:21.9800550Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9800820Z mov.b64 {%r586, %r587}, %rd167; 2026-02-21T09:06:21.9800983Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T09:06:21.9801269Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9801549Z cvt.u64.u32 %rd168, %r424; 2026-02-21T09:06:21.9801694Z cvt.u64.u32 %rd169, %r425; 2026-02-21T09:06:21.9801845Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:06:21.9802000Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:06:21.9802259Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9802539Z mov.b64 {%r589, %r590}, %rd171; 2026-02-21T09:06:21.9802696Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T09:06:21.9802998Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9803265Z cvt.u64.u32 %rd172, %r426; 2026-02-21T09:06:21.9803420Z cvt.u64.u32 %rd173, %r427; 2026-02-21T09:06:21.9803566Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:06:21.9803722Z or.b64 %rd175, %rd172, 
%rd174; 2026-02-21T09:06:21.9803981Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9804256Z mov.b64 {%r592, %r593}, %rd175; 2026-02-21T09:06:21.9804419Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T09:06:21.9804711Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9804997Z cvt.u64.u32 %rd176, %r428; 2026-02-21T09:06:21.9805142Z cvt.u64.u32 %rd177, %r429; 2026-02-21T09:06:21.9805294Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:06:21.9805474Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:06:21.9805734Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9806023Z mov.b64 {%r595, %r596}, %rd179; 2026-02-21T09:06:21.9806178Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T09:06:21.9806453Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9806725Z cvt.u64.u32 %rd180, %r430; 2026-02-21T09:06:21.9806887Z cvt.u64.u32 %rd181, %r431; 2026-02-21T09:06:21.9807038Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:06:21.9807186Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:06:21.9807445Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9807719Z mov.b64 {%r598, %r599}, %rd183; 2026-02-21T09:06:21.9807885Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T09:06:21.9808153Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9808442Z cvt.u64.u32 %rd184, %r432; 2026-02-21T09:06:21.9808588Z cvt.u64.u32 %rd185, %r433; 2026-02-21T09:06:21.9808740Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:06:21.9808895Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:06:21.9809183Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9809462Z mov.b64 {%r601, %r602}, %rd187; 2026-02-21T09:06:21.9809625Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T09:06:21.9809904Z .loc 1 47 52 
// cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9810187Z cvt.u64.u32 %rd188, %r434; 2026-02-21T09:06:21.9810343Z cvt.u64.u32 %rd189, %r435; 2026-02-21T09:06:21.9810498Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:06:21.9810651Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:06:21.9810912Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9811187Z mov.b64 {%r604, %r605}, %rd191; 2026-02-21T09:06:21.9811354Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 2026-02-21T09:06:21.9811621Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9811904Z cvt.u64.u32 %rd192, %r436; 2026-02-21T09:06:21.9812054Z cvt.u64.u32 %rd193, %r437; 2026-02-21T09:06:21.9812210Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:06:21.9812372Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:06:21.9812652Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9812937Z mov.b64 {%r607, %r608}, %rd195; 2026-02-21T09:06:21.9813095Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T09:06:21.9813365Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9813643Z cvt.u64.u32 %rd196, %r439; 2026-02-21T09:06:21.9813796Z cvt.u64.u32 %rd197, %r440; 2026-02-21T09:06:21.9813976Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:06:21.9814126Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:06:21.9814390Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9814666Z mov.b64 {%r610, %r611}, %rd199; 2026-02-21T09:06:21.9814865Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T09:06:21.9815145Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9815434Z cvt.u64.u32 %rd200, %r441; 2026-02-21T09:06:21.9815588Z cvt.u64.u32 %rd201, %r442; 2026-02-21T09:06:21.9815748Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:06:21.9815912Z or.b64 %rd203, %rd200, 
%rd202; 2026-02-21T09:06:21.9816169Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9816464Z mov.b64 {%r613, %r614}, %rd203; 2026-02-21T09:06:21.9816663Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T09:06:21.9816949Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9817236Z cvt.u64.u32 %rd204, %r443; 2026-02-21T09:06:21.9817400Z cvt.u64.u32 %rd205, %r444; 2026-02-21T09:06:21.9817566Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:06:21.9817727Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:06:21.9818004Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9818300Z mov.b64 {%r616, %r617}, %rd207; 2026-02-21T09:06:21.9818479Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T09:06:21.9818757Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9819056Z cvt.u64.u32 %rd208, %r445; 2026-02-21T09:06:21.9819212Z cvt.u64.u32 %rd209, %r446; 2026-02-21T09:06:21.9819285Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:06:21.9819349Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:06:21.9819520Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9819592Z mov.b64 {%r619, %r620}, %rd211; 2026-02-21T09:06:21.9819659Z cvt.rn.f16x2.f32 %r621, %r620, %r619; 2026-02-21T09:06:21.9819825Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9819924Z cvt.u64.u32 %rd212, %r447; 2026-02-21T09:06:21.9819982Z cvt.u64.u32 %rd213, %r448; 2026-02-21T09:06:21.9820042Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:06:21.9820101Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:06:21.9820277Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9820337Z mov.b64 {%r622, %r623}, %rd215; 2026-02-21T09:06:21.9820398Z cvt.rn.f16x2.f32 %r624, %r623, %r622; 2026-02-21T09:06:21.9820575Z .loc 1 47 52 
// cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9820634Z cvt.u64.u32 %rd216, %r449; 2026-02-21T09:06:21.9820692Z cvt.u64.u32 %rd217, %r450; 2026-02-21T09:06:21.9820756Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:06:21.9820815Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:06:21.9820980Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9821040Z mov.b64 {%r625, %r626}, %rd219; 2026-02-21T09:06:21.9821109Z cvt.rn.f16x2.f32 %r627, %r626, %r625; 2026-02-21T09:06:21.9821304Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9821366Z cvt.u64.u32 %rd220, %r451; 2026-02-21T09:06:21.9821429Z cvt.u64.u32 %rd221, %r452; 2026-02-21T09:06:21.9821486Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:06:21.9821544Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:06:21.9821719Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9821779Z mov.b64 {%r628, %r629}, %rd223; 2026-02-21T09:06:21.9821886Z cvt.rn.f16x2.f32 %r630, %r629, %r628; 2026-02-21T09:06:21.9822058Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9822123Z cvt.u64.u32 %rd224, %r453; 2026-02-21T09:06:21.9822181Z cvt.u64.u32 %rd225, %r454; 2026-02-21T09:06:21.9822238Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:06:21.9822304Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:06:21.9822473Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9822533Z mov.b64 {%r631, %r632}, %rd227; 2026-02-21T09:06:21.9822601Z cvt.rn.f16x2.f32 %r633, %r632, %r631; 2026-02-21T09:06:21.9822769Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9822826Z cvt.u64.u32 %rd228, %r456; 2026-02-21T09:06:21.9822884Z cvt.u64.u32 %rd229, %r457; 2026-02-21T09:06:21.9822969Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:06:21.9823030Z or.b64 %rd231, %rd228, 
%rd230; 2026-02-21T09:06:21.9823198Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9823266Z mov.b64 {%r634, %r635}, %rd231; 2026-02-21T09:06:21.9823332Z cvt.rn.f16x2.f32 %r636, %r635, %r634; 2026-02-21T09:06:21.9823502Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9823566Z cvt.u64.u32 %rd232, %r458; 2026-02-21T09:06:21.9823626Z cvt.u64.u32 %rd233, %r459; 2026-02-21T09:06:21.9823684Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:06:21.9823743Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:06:21.9823918Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9823975Z mov.b64 {%r637, %r638}, %rd235; 2026-02-21T09:06:21.9824038Z cvt.rn.f16x2.f32 %r639, %r638, %r637; 2026-02-21T09:06:21.9824225Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9824283Z cvt.u64.u32 %rd236, %r460; 2026-02-21T09:06:21.9824339Z cvt.u64.u32 %rd237, %r461; 2026-02-21T09:06:21.9824401Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:06:21.9824488Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:06:21.9824651Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9824742Z mov.b64 {%r640, %r641}, %rd239; 2026-02-21T09:06:21.9824813Z cvt.rn.f16x2.f32 %r642, %r641, %r640; 2026-02-21T09:06:21.9824978Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9825033Z cvt.u64.u32 %rd240, %r462; 2026-02-21T09:06:21.9825097Z cvt.u64.u32 %rd241, %r463; 2026-02-21T09:06:21.9825154Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:06:21.9825211Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:06:21.9825386Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9825444Z mov.b64 {%r643, %r644}, %rd243; 2026-02-21T09:06:21.9825505Z cvt.rn.f16x2.f32 %r645, %r644, %r643; 2026-02-21T09:06:21.9825668Z .loc 1 47 52 
// cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9825734Z cvt.u64.u32 %rd244, %r464; 2026-02-21T09:06:21.9825788Z cvt.u64.u32 %rd245, %r465; 2026-02-21T09:06:21.9825843Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:06:21.9825932Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T09:06:21.9826094Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9826151Z mov.b64 {%r646, %r647}, %rd247; 2026-02-21T09:06:21.9826217Z cvt.rn.f16x2.f32 %r648, %r647, %r646; 2026-02-21T09:06:21.9826379Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9826435Z cvt.u64.u32 %rd248, %r466; 2026-02-21T09:06:21.9826491Z cvt.u64.u32 %rd249, %r467; 2026-02-21T09:06:21.9826582Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:06:21.9826640Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T09:06:21.9826806Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9826871Z mov.b64 {%r649, %r650}, %rd251; 2026-02-21T09:06:21.9826933Z cvt.rn.f16x2.f32 %r651, %r650, %r649; 2026-02-21T09:06:21.9827096Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9827157Z cvt.u64.u32 %rd252, %r468; 2026-02-21T09:06:21.9827213Z cvt.u64.u32 %rd253, %r469; 2026-02-21T09:06:21.9827269Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:06:21.9827325Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:06:21.9827494Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9827550Z mov.b64 {%r652, %r653}, %rd255; 2026-02-21T09:06:21.9827639Z cvt.rn.f16x2.f32 %r654, %r653, %r652; 2026-02-21T09:06:21.9827809Z .loc 1 47 52 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:47:52 2026-02-21T09:06:21.9827865Z cvt.u64.u32 %rd256, %r470; 2026-02-21T09:06:21.9827920Z cvt.u64.u32 %rd257, %r471; 2026-02-21T09:06:21.9827984Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:06:21.9828039Z or.b64 %rd259, %rd256, 
%rd258; 2026-02-21T09:06:21.9828204Z .loc 1 49 27 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:49:27 2026-02-21T09:06:21.9828261Z mov.b64 {%r655, %r656}, %rd259; 2026-02-21T09:06:21.9828329Z cvt.rn.f16x2.f32 %r657, %r656, %r655; 2026-02-21T09:06:21.9828492Z .loc 1 50 82 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:50:82 2026-02-21T09:06:21.9828584Z st.shared.v4.b32 [%r4], {%r564, %r576, %r588, %r600}; 2026-02-21T09:06:21.9828678Z st.shared.v4.b32 [%r5], {%r612, %r624, %r636, %r648}; 2026-02-21T09:06:21.9828733Z bar.sync 0; 2026-02-21T09:06:21.9828787Z // begin inline asm 2026-02-21T09:06:21.9828942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r513, %r517, %r521, %r525}, [%r477]; 2026-02-21T09:06:21.9828997Z // end inline asm 2026-02-21T09:06:21.9829052Z // begin inline asm 2026-02-21T09:06:21.9829225Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r529, %r533, %r537, %r541}, [%r482]; 2026-02-21T09:06:21.9829286Z // end inline asm 2026-02-21T09:06:21.9829338Z bar.sync 0; 2026-02-21T09:06:21.9829422Z st.shared.v4.b32 [%r4], {%r567, %r579, %r591, %r603}; 2026-02-21T09:06:21.9829514Z st.shared.v4.b32 [%r5], {%r615, %r627, %r639, %r651}; 2026-02-21T09:06:21.9829566Z bar.sync 0; 2026-02-21T09:06:21.9829620Z // begin inline asm 2026-02-21T09:06:21.9829762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r514, %r518, %r522, %r526}, [%r477]; 2026-02-21T09:06:21.9829823Z // end inline asm 2026-02-21T09:06:21.9829875Z // begin inline asm 2026-02-21T09:06:21.9830013Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r530, %r534, %r538, %r542}, [%r482]; 2026-02-21T09:06:21.9830073Z // end inline asm 2026-02-21T09:06:21.9830124Z bar.sync 0; 2026-02-21T09:06:21.9830204Z st.shared.v4.b32 [%r4], {%r570, %r582, %r594, %r606}; 2026-02-21T09:06:21.9830290Z st.shared.v4.b32 [%r5], {%r618, %r630, %r642, %r654}; 2026-02-21T09:06:21.9830343Z bar.sync 0; 2026-02-21T09:06:21.9830396Z // begin inline asm 2026-02-21T09:06:21.9830533Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r515, 
%r519, %r523, %r527}, [%r477]; 2026-02-21T09:06:21.9830592Z // end inline asm 2026-02-21T09:06:21.9830665Z // begin inline asm 2026-02-21T09:06:21.9830802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r531, %r535, %r539, %r543}, [%r482]; 2026-02-21T09:06:21.9830859Z // end inline asm 2026-02-21T09:06:21.9830911Z bar.sync 0; 2026-02-21T09:06:21.9830990Z st.shared.v4.b32 [%r4], {%r573, %r585, %r597, %r609}; 2026-02-21T09:06:21.9831069Z st.shared.v4.b32 [%r5], {%r621, %r633, %r645, %r657}; 2026-02-21T09:06:21.9831128Z bar.sync 0; 2026-02-21T09:06:21.9831181Z // begin inline asm 2026-02-21T09:06:21.9831343Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r516, %r520, %r524, %r528}, [%r477]; 2026-02-21T09:06:21.9831400Z // end inline asm 2026-02-21T09:06:21.9831453Z // begin inline asm 2026-02-21T09:06:21.9831587Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r532, %r536, %r540, %r544}, [%r482]; 2026-02-21T09:06:21.9831646Z // end inline asm 2026-02-21T09:06:21.9831699Z // begin inline asm 2026-02-21T09:06:21.9831798Z st.global.v4.b32 [ %rd124 + 0 ], { %r513, %r514, %r515, %r516 }; 2026-02-21T09:06:21.9831852Z // end inline asm 2026-02-21T09:06:21.9831912Z // begin inline asm 2026-02-21T09:06:21.9832010Z st.global.v4.b32 [ %rd125 + 0 ], { %r517, %r518, %r519, %r520 }; 2026-02-21T09:06:21.9832062Z // end inline asm 2026-02-21T09:06:21.9832121Z // begin inline asm 2026-02-21T09:06:21.9832216Z st.global.v4.b32 [ %rd126 + 0 ], { %r521, %r522, %r523, %r524 }; 2026-02-21T09:06:21.9832268Z // end inline asm 2026-02-21T09:06:21.9832322Z // begin inline asm 2026-02-21T09:06:21.9832446Z st.global.v4.b32 [ %rd127 + 0 ], { %r525, %r526, %r527, %r528 }; 2026-02-21T09:06:21.9832501Z // end inline asm 2026-02-21T09:06:21.9832553Z // begin inline asm 2026-02-21T09:06:21.9832653Z st.global.v4.b32 [ %rd128 + 0 ], { %r529, %r530, %r531, %r532 }; 2026-02-21T09:06:21.9832707Z // end inline asm 2026-02-21T09:06:21.9832760Z // begin inline asm 2026-02-21T09:06:21.9832858Z st.global.v4.b32 [ %rd129 + 0 
], { %r533, %r534, %r535, %r536 }; 2026-02-21T09:06:21.9832910Z // end inline asm 2026-02-21T09:06:21.9832966Z // begin inline asm 2026-02-21T09:06:21.9833057Z st.global.v4.b32 [ %rd130 + 0 ], { %r537, %r538, %r539, %r540 }; 2026-02-21T09:06:21.9833118Z // end inline asm 2026-02-21T09:06:21.9833170Z // begin inline asm 2026-02-21T09:06:21.9833261Z st.global.v4.b32 [ %rd131 + 0 ], { %r541, %r542, %r543, %r544 }; 2026-02-21T09:06:21.9833320Z // end inline asm 2026-02-21T09:06:21.9833395Z $L__BB0_8: // %._crit_edge 2026-02-21T09:06:21.9833561Z .loc 1 28 4 // cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py:28:4 2026-02-21T09:06:21.9833614Z bar.sync 0; 2026-02-21T09:06:21.9833675Z // begin inline asm 2026-02-21T09:06:21.9833791Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r658, 128; 2026-02-21T09:06:21.9833864Z // end inline asm 2026-02-21T09:06:21.9833921Z ret; 2026-02-21T09:06:21.9833973Z $L__tmp1: 2026-02-21T09:06:21.9834026Z $L__func_end0: 2026-02-21T09:06:21.9834105Z // -- End function 2026-02-21T09:06:21.9834163Z } 2026-02-21T09:06:21.9834366Z .file 1 "/tmp/torchinductor_root/xl/cxlglwid4p36sjyv6jttktpatemz4qorehvtz7syjr4km5bx3dbx.py" 2026-02-21T09:06:21.9834426Z .section .debug_abbrev 2026-02-21T09:06:21.9834482Z { 2026-02-21T09:06:21.9834565Z .b8 1 // Abbreviation Code 2026-02-21T09:06:21.9834647Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:06:21.9834760Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:21.9834839Z .b8 37 // DW_AT_producer 2026-02-21T09:06:21.9834910Z .b8 8 // DW_FORM_string 2026-02-21T09:06:21.9834985Z .b8 19 // DW_AT_language 2026-02-21T09:06:21.9835065Z .b8 5 // DW_FORM_data2 2026-02-21T09:06:21.9835134Z .b8 3 // DW_AT_name 2026-02-21T09:06:21.9835204Z .b8 8 // DW_FORM_string 2026-02-21T09:06:21.9835311Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:06:21.9835385Z .b8 6 // DW_FORM_data4 2026-02-21T09:06:21.9835455Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:06:21.9835529Z .b8 8 // DW_FORM_string 2026-02-21T09:06:21.9835597Z .b8 0 // EOM(1) 
2026-02-21T09:06:21.9835664Z .b8 0 // EOM(2) 2026-02-21T09:06:21.9835762Z .b8 0 // EOM(3) 2026-02-21T09:06:21.9835813Z } 2026-02-21T09:06:21.9835872Z .section .debug_info 2026-02-21T09:06:21.9835920Z { 2026-02-21T09:06:21.9836005Z .b32 104 // Length of Unit 2026-02-21T09:06:21.9836087Z .b8 2 // DWARF version number 2026-02-21T09:06:21.9836138Z .b8 0 2026-02-21T09:06:21.9836256Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:06:21.9836341Z .b8 8 // Address Size (in bytes) 2026-02-21T09:06:21.9836434Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:06:21.9836511Z .b8 116 // DW_AT_producer 2026-02-21T09:06:21.9836570Z .b8 114 2026-02-21T09:06:21.9836621Z .b8 105 2026-02-21T09:06:21.9836669Z .b8 116 2026-02-21T09:06:21.9836725Z .b8 111 2026-02-21T09:06:21.9836773Z .b8 110 2026-02-21T09:06:21.9836850Z .b8 0 2026-02-21T09:06:21.9836923Z .b8 2 // DW_AT_language 2026-02-21T09:06:21.9836980Z .b8 0 2026-02-21T09:06:21.9837051Z .b8 99 // DW_AT_name 2026-02-21T09:06:21.9837099Z .b8 120 2026-02-21T09:06:21.9837156Z .b8 108 2026-02-21T09:06:21.9837204Z .b8 103 2026-02-21T09:06:21.9837251Z .b8 108 2026-02-21T09:06:21.9837298Z .b8 119 2026-02-21T09:06:21.9837353Z .b8 105 2026-02-21T09:06:21.9837401Z .b8 100 2026-02-21T09:06:21.9837449Z .b8 52 2026-02-21T09:06:21.9837503Z .b8 112 2026-02-21T09:06:21.9837551Z .b8 51 2026-02-21T09:06:21.9837598Z .b8 54 2026-02-21T09:06:21.9837647Z .b8 115 2026-02-21T09:06:21.9837702Z .b8 106 2026-02-21T09:06:21.9837748Z .b8 121 2026-02-21T09:06:21.9837795Z .b8 118 2026-02-21T09:06:21.9837848Z .b8 54 2026-02-21T09:06:21.9837895Z .b8 106 2026-02-21T09:06:21.9837942Z .b8 116 2026-02-21T09:06:21.9837990Z .b8 116 2026-02-21T09:06:21.9838046Z .b8 107 2026-02-21T09:06:21.9838093Z .b8 116 2026-02-21T09:06:21.9838143Z .b8 112 2026-02-21T09:06:21.9838191Z .b8 97 2026-02-21T09:06:21.9838244Z .b8 116 2026-02-21T09:06:21.9838292Z .b8 101 2026-02-21T09:06:21.9838339Z .b8 109 2026-02-21T09:06:21.9838394Z .b8 122 
2026-02-21T09:06:21.9838442Z .b8 52 2026-02-21T09:06:21.9838488Z .b8 113 2026-02-21T09:06:21.9838573Z .b8 111 2026-02-21T09:06:21.9838630Z .b8 114 2026-02-21T09:06:21.9838678Z .b8 101 2026-02-21T09:06:21.9838724Z .b8 104 2026-02-21T09:06:21.9838778Z .b8 118 2026-02-21T09:06:21.9838825Z .b8 116 2026-02-21T09:06:21.9838873Z .b8 122 2026-02-21T09:06:21.9838923Z .b8 55 2026-02-21T09:06:21.9838978Z .b8 115 2026-02-21T09:06:21.9839026Z .b8 121 2026-02-21T09:06:21.9839073Z .b8 106 2026-02-21T09:06:21.9839119Z .b8 114 2026-02-21T09:06:21.9839174Z .b8 52 2026-02-21T09:06:21.9839222Z .b8 107 2026-02-21T09:06:21.9839271Z .b8 109 2026-02-21T09:06:21.9839325Z .b8 53 2026-02-21T09:06:21.9839373Z .b8 98 2026-02-21T09:06:21.9839420Z .b8 120 2026-02-21T09:06:21.9839468Z .b8 51 2026-02-21T09:06:21.9839526Z .b8 100 2026-02-21T09:06:21.9839575Z .b8 98 2026-02-21T09:06:21.9839627Z .b8 120 2026-02-21T09:06:21.9839682Z .b8 46 2026-02-21T09:06:21.9839730Z .b8 112 2026-02-21T09:06:21.9839778Z .b8 121 2026-02-21T09:06:21.9839827Z .b8 0 2026-02-21T09:06:21.9839921Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:06:21.9839992Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:06:21.9840041Z .b8 116 2026-02-21T09:06:21.9840093Z .b8 109 2026-02-21T09:06:21.9840143Z .b8 112 2026-02-21T09:06:21.9840190Z .b8 47 2026-02-21T09:06:21.9840275Z .b8 116 2026-02-21T09:06:21.9840333Z .b8 111 2026-02-21T09:06:21.9840381Z .b8 114 2026-02-21T09:06:21.9840429Z .b8 99 2026-02-21T09:06:21.9840484Z .b8 104 2026-02-21T09:06:21.9840531Z .b8 105 2026-02-21T09:06:21.9840579Z .b8 110 2026-02-21T09:06:21.9840633Z .b8 100 2026-02-21T09:06:21.9840689Z .b8 117 2026-02-21T09:06:21.9840737Z .b8 99 2026-02-21T09:06:21.9840784Z .b8 116 2026-02-21T09:06:21.9840832Z .b8 111 2026-02-21T09:06:21.9840886Z .b8 114 2026-02-21T09:06:21.9840936Z .b8 95 2026-02-21T09:06:21.9841007Z .b8 114 2026-02-21T09:06:21.9841063Z .b8 111 2026-02-21T09:06:21.9841110Z .b8 111 2026-02-21T09:06:21.9841158Z .b8 116 2026-02-21T09:06:21.9841205Z .b8 47 
2026-02-21T09:06:21.9841261Z .b8 120 2026-02-21T09:06:21.9841307Z .b8 108 2026-02-21T09:06:21.9841356Z .b8 0 2026-02-21T09:06:21.9841409Z } 2026-02-21T09:06:21.9841472Z .section .debug_macinfo { } 2026-02-21T09:06:21.9841476Z 2026-02-21T09:06:21.9841550Z ================================================================ 2026-02-21T09:06:21.9841649Z please share the reproducer above with Triton project. 2026-02-21T09:06:24.2713376Z 2026-02-21T09:06:24.2713406Z 2026-02-21T09:06:24.2713410Z 2026-02-21T09:06:24.2713678Z ================================================================ 2026-02-21T09:06:24.2713979Z Internal Triton PTX codegen error 2026-02-21T09:06:24.2718744Z `ptxas` stderr: 2026-02-21T09:06:24.2720797Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 243 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:06:24.2721396Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:24.2726275Z 2026-02-21T09:06:24.2728080Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpnhrsqxt4.ptx -o /tmp/tmpnhrsqxt4.ptx.o 2026-02-21T09:06:24.2728559Z 2026-02-21T09:06:24.2728564Z 2026-02-21T09:06:24.2728621Z // 2026-02-21T09:06:24.2728775Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:06:24.2728960Z // 2026-02-21T09:06:24.2729028Z 2026-02-21T09:06:24.2729082Z .version 8.7 2026-02-21T09:06:24.2729225Z .target sm_100a 2026-02-21T09:06:24.2729357Z .address_size 64 2026-02-21T09:06:24.2729449Z 2026-02-21T09:06:24.2729572Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:06:24.2729824Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:06:24.2730039Z // @_helion_matmul 2026-02-21T09:06:24.2730238Z .visible .entry _helion_matmul( 2026-02-21T09:06:24.2730446Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:06:24.2730697Z .param .u64 .ptr .global 
.align 1 _helion_matmul_param_1, 2026-02-21T09:06:24.2730933Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:06:24.2731386Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:06:24.2731638Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:06:24.2731850Z ) 2026-02-21T09:06:24.2731981Z .reqntid 128 2026-02-21T09:06:24.2732122Z .maxnreg 32 2026-02-21T09:06:24.2732258Z { 2026-02-21T09:06:24.2732383Z .reg .pred %p<104>; 2026-02-21T09:06:24.2732548Z .reg .b32 %r<1051>; 2026-02-21T09:06:24.2732687Z .reg .b64 %rd<476>; 2026-02-21T09:06:24.2732832Z $L__func_begin0: 2026-02-21T09:06:24.2732913Z 2026-02-21T09:06:24.2732964Z // %bb.0: 2026-02-21T09:06:24.2733215Z .loc 1 19 0 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:19 2026-02-21T09:06:24.2733509Z mov.u32 %r1, %tid.x; 2026-02-21T09:06:24.2733689Z ld.param.b64 %rd19, [_helion_matmul_param_1]; 2026-02-21T09:06:24.2733890Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:06:24.2734050Z mov.b32 %r44, global_smem; 2026-02-21T09:06:24.2734212Z // begin inline asm 2026-02-21T09:06:24.2734441Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r44], 128; 2026-02-21T09:06:24.2734764Z // end inline asm 2026-02-21T09:06:24.2734931Z ld.param.b64 %rd36, [_helion_matmul_param_3]; 2026-02-21T09:06:24.2735174Z bar.sync 0; 2026-02-21T09:06:24.2735320Z ld.shared.b32 %r1043, [global_smem]; 2026-02-21T09:06:24.2735500Z bar.sync 0; 2026-02-21T09:06:24.2735640Z // begin inline asm 2026-02-21T09:06:24.2735844Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:06:24.2736080Z // end inline asm 2026-02-21T09:06:24.2736333Z .loc 1 21 67 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:21:67 2026-02-21T09:06:24.2736630Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:06:24.2736839Z mov.u32 %r53, %ctaid.y; 2026-02-21T09:06:24.2736996Z mov.u32 %r54, %ctaid.z; 2026-02-21T09:06:24.2737141Z mov.u32 %r55, %nctaid.x; 2026-02-21T09:06:24.2737294Z 
mov.u32 %r56, %nctaid.y; 2026-02-21T09:06:24.2737456Z mad.lo.s32 %r57, %r54, %r56, %r53; 2026-02-21T09:06:24.2737630Z mad.lo.s32 %r58, %r57, %r55, %r3; 2026-02-21T09:06:24.2737804Z shl.b32 %r59, %r58, 7; 2026-02-21T09:06:24.2738334Z [84s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:06:24.2739569Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:06:24.2740706Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:06:24.2740930Z `ptxas` stderr: 2026-02-21T09:06:24.2741336Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 243 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:06:24.2741795Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:24.2741942Z 2026-02-21T09:06:24.2742316Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpnhrsqxt4.ptx -o /tmp/tmpnhrsqxt4.ptx.o 2026-02-21T09:06:24.2742763Z 2026-02-21T09:06:24.2742887Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:06:24.2743123Z cvt.s64.s32 %rd37, %r59; 2026-02-21T09:06:24.2743283Z add.s64 %rd33, %rd36, %rd37; 2026-02-21T09:06:24.2743445Z shl.b32 %r60, %r1, 2; 2026-02-21T09:06:24.2743593Z add.s32 %r45, %r44, %r60; 2026-02-21T09:06:24.2743746Z mov.b32 %r62, 0; 2026-02-21T09:06:24.2743874Z // begin inline asm 2026-02-21T09:06:24.2744028Z @%p1 st.shared.b32 [ %r45 + 0 ], %r62; 2026-02-21T09:06:24.2744194Z // end inline asm 2026-02-21T09:06:24.2744369Z bar.warp.sync -1; 2026-02-21T09:06:24.2744517Z setp.eq.b32 %p97, %r1, 0; 2026-02-21T09:06:24.2744714Z cvt.u64.u32 %rd18, %r44; 2026-02-21T09:06:24.2744863Z // begin inline asm 2026-02-21T09:06:24.2745123Z @%p97 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd18 + 0 ], %rd19; 2026-02-21T09:06:24.2745404Z // end inline asm 2026-02-21T09:06:24.2745533Z // begin inline asm 2026-02-21T09:06:24.2745767Z @%p97 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1; 2026-02-21T09:06:24.2746023Z // end inline asm 2026-02-21T09:06:24.2746166Z mov.b32 %r47, 64; 2026-02-21T09:06:24.2746305Z // begin inline asm 2026-02-21T09:06:24.2746555Z @%p97 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r47; 2026-02-21T09:06:24.2746831Z // end inline asm 2026-02-21T09:06:24.2746979Z // begin inline asm 2026-02-21T09:06:24.2747217Z @%p97 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r47; 2026-02-21T09:06:24.2747479Z // end inline asm 2026-02-21T09:06:24.2747618Z mov.b32 %r49, 2048; 2026-02-21T09:06:24.2747754Z // begin inline asm 2026-02-21T09:06:24.2747993Z @%p97 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r49; 2026-02-21T09:06:24.2748291Z // end inline asm 2026-02-21T09:06:24.2748429Z // begin inline asm 2026-02-21T09:06:24.2748665Z @%p97 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r49; 2026-02-21T09:06:24.2748927Z // end inline asm 2026-02-21T09:06:24.2749065Z mov.b64 %rd26, 4096; 
2026-02-21T09:06:24.2749203Z // begin inline asm 2026-02-21T09:06:24.2749451Z @%p97 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd18 + 0 ], 0x0, %rd26; 2026-02-21T09:06:24.2749725Z // end inline asm 2026-02-21T09:06:24.2749888Z mov.b32 %r51, 1; 2026-02-21T09:06:24.2750013Z // begin inline asm 2026-02-21T09:06:24.2750263Z @%p97 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0, %r51; 2026-02-21T09:06:24.2750551Z // end inline asm 2026-02-21T09:06:24.2750679Z // begin inline asm 2026-02-21T09:06:24.2750927Z @%p97 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x1, %r51; 2026-02-21T09:06:24.2751201Z // end inline asm 2026-02-21T09:06:24.2751333Z // begin inline asm 2026-02-21T09:06:24.2751555Z @%p97 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x6; 2026-02-21T09:06:24.2751822Z // end inline asm 2026-02-21T09:06:24.2751955Z // begin inline asm 2026-02-21T09:06:24.2752194Z @%p97 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T09:06:24.2752476Z // end inline asm 2026-02-21T09:06:24.2752602Z // begin inline asm 2026-02-21T09:06:24.2752860Z @%p97 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x3; 2026-02-21T09:06:24.2753128Z // end inline asm 2026-02-21T09:06:24.2753264Z // begin inline asm 2026-02-21T09:06:24.2753486Z @%p97 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd18 + 0 ], 0x0; 2026-02-21T09:06:24.2753746Z // end inline asm 2026-02-21T09:06:24.2753879Z // begin inline asm 2026-02-21T09:06:24.2754217Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd33 + 0 ], [ %rd18 + 0 ], 0x80; 2026-02-21T09:06:24.2754599Z // end inline asm 2026-02-21T09:06:24.2754765Z // begin inline asm 2026-02-21T09:06:24.2754985Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd33 + 0 ], 0x80; 2026-02-21T09:06:24.2755237Z @%p1 cp.async.bulk.commit_group ; 
2026-02-21T09:06:24.2755437Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:24.2755621Z // end inline asm 2026-02-21T09:06:24.2755756Z bar.sync 0; 2026-02-21T09:06:24.2755908Z cvta.global.u64 %rd143, %rd33; 2026-02-21T09:06:24.2756193Z .loc 1 30 74 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:30:74 2026-02-21T09:06:24.2756512Z setp.gt.u32 %p21, %r3, 511; 2026-02-21T09:06:24.2756678Z @%p21 bra $L__BB0_8; 2026-02-21T09:06:24.2756851Z // %bb.1: // %.lr.ph 2026-02-21T09:06:24.2757178Z .loc 1 0 74 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:0:74 2026-02-21T09:06:24.2757495Z ld.param.b64 %rd16, [_helion_matmul_param_0]; 2026-02-21T09:06:24.2757801Z .loc 1 44 45 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:44:45 2026-02-21T09:06:24.2758085Z shr.u32 %r316, %r1, 3; 2026-02-21T09:06:24.2758244Z bfe.u32 %r4, %r1, 3, 4; 2026-02-21T09:06:24.2758496Z .loc 1 42 45 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:42:45 2026-02-21T09:06:24.2758782Z shl.b32 %r317, %r1, 3; 2026-02-21T09:06:24.2758932Z and.b32 %r318, %r317, 56; 2026-02-21T09:06:24.2759094Z shr.u32 %r319, %r1, 5; 2026-02-21T09:06:24.2759246Z shl.b32 %r320, %r1, 4; 2026-02-21T09:06:24.2759393Z and.b32 %r321, %r320, 2032; 2026-02-21T09:06:24.2759558Z shl.b32 %r322, %r1, 1; 2026-02-21T09:06:24.2759705Z and.b32 %r323, %r322, 112; 2026-02-21T09:06:24.2759872Z xor.b32 %r5, %r321, %r323; 2026-02-21T09:06:24.2760026Z or.b32 %r6, %r318, 192; 2026-02-21T09:06:24.2760184Z add.s32 %r203, %r44, %r5; 2026-02-21T09:06:24.2760339Z add.s32 %r403, %r203, 98304; 2026-02-21T09:06:24.2760550Z add.s32 %r405, %r203, 100352; 2026-02-21T09:06:24.2760707Z add.s32 %r407, %r203, 102400; 2026-02-21T09:06:24.2760865Z add.s32 %r409, %r203, 104448; 2026-02-21T09:06:24.2761009Z add.s32 %r411, %r203, 106496; 2026-02-21T09:06:24.2761160Z add.s32 %r413, %r203, 108544; 2026-02-21T09:06:24.2761307Z add.s32 %r415, %r203, 110592; 2026-02-21T09:06:24.2761457Z add.s32 %r417, %r203, 112640; 
2026-02-21T09:06:24.2761601Z add.s32 %r419, %r203, 114688; 2026-02-21T09:06:24.2761753Z add.s32 %r421, %r203, 116736; 2026-02-21T09:06:24.2761936Z add.s32 %r423, %r203, 118784; 2026-02-21T09:06:24.2762083Z add.s32 %r425, %r203, 120832; 2026-02-21T09:06:24.2762237Z add.s32 %r427, %r203, 122880; 2026-02-21T09:06:24.2762381Z add.s32 %r429, %r203, 124928; 2026-02-21T09:06:24.2762534Z add.s32 %r431, %r203, 126976; 2026-02-21T09:06:24.2762678Z add.s32 %r433, %r203, 129024; 2026-02-21T09:06:24.2762832Z and.b32 %r325, %r320, 1968; 2026-02-21T09:06:24.2762983Z bfe.s32 %r326, %r1, 2, 1; 2026-02-21T09:06:24.2763138Z and.b32 %r327, %r326, 2112; 2026-02-21T09:06:24.2763283Z or.b32 %r328, %r327, %r325; 2026-02-21T09:06:24.2763437Z xor.b32 %r329, %r328, 64; 2026-02-21T09:06:24.2763585Z shl.b32 %r330, %r1, 6; 2026-02-21T09:06:24.2763724Z and.b32 %r331, %r330, 2112; 2026-02-21T09:06:24.2763876Z shl.b32 %r332, %r1, 5; 2026-02-21T09:06:24.2764014Z and.b32 %r333, %r332, 768; 2026-02-21T09:06:24.2764168Z and.b32 %r334, %r317, 48; 2026-02-21T09:06:24.2764309Z and.b32 %r335, %r322, 192; 2026-02-21T09:06:24.2764487Z or.b32 %r336, %r331, %r333; 2026-02-21T09:06:24.2764636Z or.b32 %r337, %r334, %r335; 2026-02-21T09:06:24.2764828Z xor.b32 %r338, %r336, %r337; 2026-02-21T09:06:24.2764982Z add.s32 %r704, %r44, %r338; 2026-02-21T09:06:24.2765258Z .loc 1 30 74 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:30:74 2026-02-21T09:06:24.2765560Z cvt.u64.u32 %rd89, %r318; 2026-02-21T09:06:24.2765719Z add.s32 %r277, %r203, 65536; 2026-02-21T09:06:24.2765886Z add.s32 %r307, %r203, 96256; 2026-02-21T09:06:24.2766043Z add.s32 %r305, %r203, 94208; 2026-02-21T09:06:24.2766203Z add.s32 %r303, %r203, 92160; 2026-02-21T09:06:24.2766355Z add.s32 %r301, %r203, 90112; 2026-02-21T09:06:24.2766514Z add.s32 %r299, %r203, 88064; 2026-02-21T09:06:24.2766664Z add.s32 %r297, %r203, 86016; 2026-02-21T09:06:24.2766826Z add.s32 %r295, %r203, 83968; 2026-02-21T09:06:24.2766986Z add.s32 %r293, %r203, 81920; 
2026-02-21T09:06:24.2767140Z add.s32 %r291, %r203, 79872; 2026-02-21T09:06:24.2767301Z add.s32 %r289, %r203, 77824; 2026-02-21T09:06:24.2767454Z add.s32 %r287, %r203, 75776; 2026-02-21T09:06:24.2767614Z add.s32 %r285, %r203, 73728; 2026-02-21T09:06:24.2767765Z add.s32 %r283, %r203, 71680; 2026-02-21T09:06:24.2767926Z add.s32 %r281, %r203, 69632; 2026-02-21T09:06:24.2768116Z add.s32 %r279, %r203, 67584; 2026-02-21T09:06:24.2768281Z add.s32 %r240, %r203, 32768; 2026-02-21T09:06:24.2768448Z add.s32 %r270, %r203, 63488; 2026-02-21T09:06:24.2768606Z add.s32 %r268, %r203, 61440; 2026-02-21T09:06:24.2768770Z add.s32 %r266, %r203, 59392; 2026-02-21T09:06:24.2768922Z add.s32 %r264, %r203, 57344; 2026-02-21T09:06:24.2769080Z add.s32 %r262, %r203, 55296; 2026-02-21T09:06:24.2769230Z add.s32 %r260, %r203, 53248; 2026-02-21T09:06:24.2769390Z add.s32 %r258, %r203, 51200; 2026-02-21T09:06:24.2769542Z add.s32 %r256, %r203, 49152; 2026-02-21T09:06:24.2769706Z add.s32 %r254, %r203, 47104; 2026-02-21T09:06:24.2769858Z add.s32 %r252, %r203, 45056; 2026-02-21T09:06:24.2770019Z add.s32 %r250, %r203, 43008; 2026-02-21T09:06:24.2770182Z add.s32 %r248, %r203, 40960; 2026-02-21T09:06:24.2770333Z add.s32 %r246, %r203, 38912; 2026-02-21T09:06:24.2770493Z add.s32 %r244, %r203, 36864; 2026-02-21T09:06:24.2770644Z add.s32 %r242, %r203, 34816; 2026-02-21T09:06:24.2770808Z add.s32 %r233, %r203, 30720; 2026-02-21T09:06:24.2770959Z add.s32 %r231, %r203, 28672; 2026-02-21T09:06:24.2771119Z add.s32 %r229, %r203, 26624; 2026-02-21T09:06:24.2771270Z add.s32 %r227, %r203, 24576; 2026-02-21T09:06:24.2771475Z add.s32 %r225, %r203, 22528; 2026-02-21T09:06:24.2771638Z add.s32 %r223, %r203, 20480; 2026-02-21T09:06:24.2771791Z add.s32 %r221, %r203, 18432; 2026-02-21T09:06:24.2771952Z add.s32 %r219, %r203, 16384; 2026-02-21T09:06:24.2772103Z add.s32 %r217, %r203, 14336; 2026-02-21T09:06:24.2772263Z add.s32 %r215, %r203, 12288; 2026-02-21T09:06:24.2772414Z add.s32 %r213, %r203, 10240; 
2026-02-21T09:06:24.2772574Z add.s32 %r211, %r203, 8192; 2026-02-21T09:06:24.2772729Z add.s32 %r209, %r203, 6144; 2026-02-21T09:06:24.2772940Z add.s32 %r207, %r203, 4096; 2026-02-21T09:06:24.2773091Z add.s32 %r205, %r203, 2048; 2026-02-21T09:06:24.2773367Z .loc 1 37 33 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:37:33 2026-02-21T09:06:24.2773648Z shr.u32 %r339, %r3, 4; 2026-02-21T09:06:24.2773792Z and.b32 %r340, %r339, 28; 2026-02-21T09:06:24.2774047Z .loc 1 39 64 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:39:64 2026-02-21T09:06:24.2774320Z and.b32 %r341, %r3, 3; 2026-02-21T09:06:24.2774577Z .loc 1 39 30 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:39:30 2026-02-21T09:06:24.2774880Z or.b32 %r342, %r340, %r341; 2026-02-21T09:06:24.2775138Z .loc 1 41 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:41:27 2026-02-21T09:06:24.2775418Z shl.b32 %r438, %r342, 6; 2026-02-21T09:06:24.2775722Z .loc 1 43 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:43:27 2026-02-21T09:06:24.2776013Z shl.b32 %r343, %r3, 6; 2026-02-21T09:06:24.2776153Z and.b32 %r344, %r343, 3840; 2026-02-21T09:06:24.2776411Z .loc 1 44 32 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:44:32 2026-02-21T09:06:24.2776689Z or.b32 %r345, %r344, %r4; 2026-02-21T09:06:24.2776842Z or.b32 %r346, %r316, %r344; 2026-02-21T09:06:24.2777107Z .loc 1 54 53 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:53 2026-02-21T09:06:24.2777382Z shl.b32 %r347, %r345, 11; 2026-02-21T09:06:24.2777536Z shl.b32 %r348, %r346, 11; 2026-02-21T09:06:24.2777683Z or.b32 %r349, %r348, 229376; 2026-02-21T09:06:24.2777844Z or.b32 %r350, %r348, 491520; 2026-02-21T09:06:24.2778100Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2778424Z shfl.sync.idx.b32 %r29, %r319, 0, 31, -1; 2026-02-21T09:06:24.2778601Z shl.b32 %r351, %r29, 21; 2026-02-21T09:06:24.2778758Z and.b32 %r352, %r351, 6291456; 
2026-02-21T09:06:24.2778920Z add.s32 %r699, %r352, %r1043; 2026-02-21T09:06:24.2779074Z mov.pred %p49, -1; 2026-02-21T09:06:24.2779223Z // begin inline asm 2026-02-21T09:06:24.2779591Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 0], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2779958Z // end inline asm 2026-02-21T09:06:24.2780098Z // begin inline asm 2026-02-21T09:06:24.2780428Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 16], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2780790Z // end inline asm 2026-02-21T09:06:24.2780925Z // begin inline asm 2026-02-21T09:06:24.2781251Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 32], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2781612Z // end inline asm 2026-02-21T09:06:24.2781753Z // begin inline asm 2026-02-21T09:06:24.2782068Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 48], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2782428Z // end inline asm 2026-02-21T09:06:24.2782568Z // begin inline asm 2026-02-21T09:06:24.2782916Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 64], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2783286Z // end inline asm 2026-02-21T09:06:24.2783415Z // begin inline asm 2026-02-21T09:06:24.2783731Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 80], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2784077Z // end inline asm 2026-02-21T09:06:24.2784214Z // begin inline asm 2026-02-21T09:06:24.2784540Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 96], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, 
%r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2784948Z // end inline asm 2026-02-21T09:06:24.2785085Z // begin inline asm 2026-02-21T09:06:24.2785396Z @%p49 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r699 + 112], {%r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62, %r62}; 2026-02-21T09:06:24.2785760Z // end inline asm 2026-02-21T09:06:24.2785888Z // begin inline asm 2026-02-21T09:06:24.2786045Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:06:24.2786208Z // end inline asm 2026-02-21T09:06:24.2786335Z bar.sync 0; 2026-02-21T09:06:24.2786582Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2786868Z add.s32 %r1045, %r44, 163872; 2026-02-21T09:06:24.2787028Z // begin inline asm 2026-02-21T09:06:24.2787190Z @%p97 mbarrier.init.shared::cta.b64 [%r1045], 1; 2026-02-21T09:06:24.2787416Z // end inline asm 2026-02-21T09:06:24.2787549Z bar.sync 0; 2026-02-21T09:06:24.2787695Z add.s32 %r198, %r44, 163880; 2026-02-21T09:06:24.2787854Z // begin inline asm 2026-02-21T09:06:24.2788018Z @%p97 mbarrier.init.shared::cta.b64 [%r198], 1; 2026-02-21T09:06:24.2788212Z // end inline asm 2026-02-21T09:06:24.2788350Z add.s32 %r199, %r44, 163840; 2026-02-21T09:06:24.2788508Z // begin inline asm 2026-02-21T09:06:24.2788671Z @%p97 mbarrier.init.shared::cta.b64 [%r199], 1; 2026-02-21T09:06:24.2788864Z // end inline asm 2026-02-21T09:06:24.2788998Z bar.sync 0; 2026-02-21T09:06:24.2789143Z add.s32 %r200, %r44, 163848; 2026-02-21T09:06:24.2789296Z // begin inline asm 2026-02-21T09:06:24.2789468Z @%p97 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T09:06:24.2789661Z // end inline asm 2026-02-21T09:06:24.2789791Z bar.sync 0; 2026-02-21T09:06:24.2789930Z add.s32 %r201, %r44, 163856; 2026-02-21T09:06:24.2790079Z // begin inline asm 2026-02-21T09:06:24.2790245Z @%p97 mbarrier.init.shared::cta.b64 [%r201], 1; 2026-02-21T09:06:24.2790426Z // end inline asm 2026-02-21T09:06:24.2790563Z bar.sync 0; 
2026-02-21T09:06:24.2790692Z add.s32 %r435, %r44, 163864; 2026-02-21T09:06:24.2790849Z // begin inline asm 2026-02-21T09:06:24.2791013Z @%p97 mbarrier.init.shared::cta.b64 [%r435], 1; 2026-02-21T09:06:24.2791215Z // end inline asm 2026-02-21T09:06:24.2791466Z .loc 1 54 60 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:60 2026-02-21T09:06:24.2791748Z or.b32 %r353, %r347, %r318; 2026-02-21T09:06:24.2791909Z or.b32 %r354, %r349, %r318; 2026-02-21T09:06:24.2792054Z or.b32 %r355, %r350, %r318; 2026-02-21T09:06:24.2792317Z .loc 1 54 32 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:32 2026-02-21T09:06:24.2792613Z mad.wide.u32 %rd38, %r353, 2, %rd16; 2026-02-21T09:06:24.2792792Z cvt.u64.u32 %rd2, %r347; 2026-02-21T09:06:24.2792950Z or.b64 %rd90, %rd2, %rd89; 2026-02-21T09:06:24.2793099Z shl.b64 %rd91, %rd90, 1; 2026-02-21T09:06:24.2793255Z add.s64 %rd3, %rd16, %rd91; 2026-02-21T09:06:24.2793403Z add.s64 %rd39, %rd3, 65536; 2026-02-21T09:06:24.2793557Z add.s64 %rd40, %rd3, 131072; 2026-02-21T09:06:24.2793707Z add.s64 %rd41, %rd3, 196608; 2026-02-21T09:06:24.2793859Z add.s64 %rd42, %rd3, 262144; 2026-02-21T09:06:24.2800507Z add.s64 %rd43, %rd3, 327680; 2026-02-21T09:06:24.2800678Z add.s64 %rd44, %rd3, 393216; 2026-02-21T09:06:24.2800842Z mad.wide.u32 %rd45, %r354, 2, %rd16; 2026-02-21T09:06:24.2801024Z add.s64 %rd46, %rd3, 524288; 2026-02-21T09:06:24.2801281Z add.s64 %rd47, %rd3, 589824; 2026-02-21T09:06:24.2801441Z add.s64 %rd48, %rd3, 655360; 2026-02-21T09:06:24.2801588Z add.s64 %rd49, %rd3, 720896; 2026-02-21T09:06:24.2801745Z add.s64 %rd50, %rd3, 786432; 2026-02-21T09:06:24.2801902Z add.s64 %rd51, %rd3, 851968; 2026-02-21T09:06:24.2802051Z add.s64 %rd52, %rd3, 917504; 2026-02-21T09:06:24.2802216Z mad.wide.u32 %rd53, %r355, 2, %rd16; 2026-02-21T09:06:24.2802379Z mov.b32 %r404, 16; 2026-02-21T09:06:24.2802662Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2802998Z // begin inline asm 
2026-02-21T09:06:24.2803210Z cp.async.cg.shared.global [ %r203 + 0 ], [ %rd38 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2803438Z // end inline asm 2026-02-21T09:06:24.2803607Z // begin inline asm 2026-02-21T09:06:24.2803807Z cp.async.cg.shared.global [ %r205 + 0 ], [ %rd39 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2804031Z // end inline asm 2026-02-21T09:06:24.2804166Z // begin inline asm 2026-02-21T09:06:24.2804363Z cp.async.cg.shared.global [ %r207 + 0 ], [ %rd40 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2804578Z // end inline asm 2026-02-21T09:06:24.2804786Z // begin inline asm 2026-02-21T09:06:24.2805005Z cp.async.cg.shared.global [ %r209 + 0 ], [ %rd41 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2805224Z // end inline asm 2026-02-21T09:06:24.2805369Z // begin inline asm 2026-02-21T09:06:24.2805597Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd42 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2805821Z // end inline asm 2026-02-21T09:06:24.2805948Z // begin inline asm 2026-02-21T09:06:24.2806135Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd43 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2806349Z // end inline asm 2026-02-21T09:06:24.2806476Z // begin inline asm 2026-02-21T09:06:24.2806669Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd44 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2806880Z // end inline asm 2026-02-21T09:06:24.2807016Z // begin inline asm 2026-02-21T09:06:24.2807200Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd45 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2807419Z // end inline asm 2026-02-21T09:06:24.2807546Z // begin inline asm 2026-02-21T09:06:24.2807737Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd46 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2807953Z // end inline asm 2026-02-21T09:06:24.2808078Z // begin inline asm 2026-02-21T09:06:24.2808267Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd47 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2808479Z // end inline asm 2026-02-21T09:06:24.2808630Z // begin inline asm 2026-02-21T09:06:24.2808820Z cp.async.cg.shared.global [ 
%r223 + 0 ], [ %rd48 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2809047Z // end inline asm 2026-02-21T09:06:24.2809179Z // begin inline asm 2026-02-21T09:06:24.2809378Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd49 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2809602Z // end inline asm 2026-02-21T09:06:24.2809733Z // begin inline asm 2026-02-21T09:06:24.2809932Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd50 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2810152Z // end inline asm 2026-02-21T09:06:24.2810290Z // begin inline asm 2026-02-21T09:06:24.2810480Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd51 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2810706Z // end inline asm 2026-02-21T09:06:24.2810838Z // begin inline asm 2026-02-21T09:06:24.2811034Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd52 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2811260Z // end inline asm 2026-02-21T09:06:24.2811394Z // begin inline asm 2026-02-21T09:06:24.2811594Z cp.async.cg.shared.global [ %r233 + 0 ], [ %rd53 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2811818Z // end inline asm 2026-02-21T09:06:24.2811968Z cp.async.commit_group; 2026-02-21T09:06:24.2812241Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2812642Z bar.sync 0; 2026-02-21T09:06:24.2812778Z // begin inline asm 2026-02-21T09:06:24.2813017Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r199], 8192; 2026-02-21T09:06:24.2813255Z // end inline asm 2026-02-21T09:06:24.2813508Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2813812Z bar.sync 0; 2026-02-21T09:06:24.2813959Z elect.sync %r356|%p43, -1; 2026-02-21T09:06:24.2814146Z and.pred %p37, %p1, %p43; 2026-02-21T09:06:24.2814314Z add.s32 %r236, %r44, 131072; 2026-02-21T09:06:24.2814485Z // begin inline asm 2026-02-21T09:06:24.2814873Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r236], [%rd143, {%r62, %r438}], [%r199]; 2026-02-21T09:06:24.2815313Z // end 
inline asm 2026-02-21T09:06:24.2815574Z .loc 1 54 32 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:32 2026-02-21T09:06:24.2815871Z add.s64 %rd55, %rd3, 128; 2026-02-21T09:06:24.2816044Z or.b32 %r357, %r353, 64; 2026-02-21T09:06:24.2816219Z mad.wide.u32 %rd92, %r357, 2, %rd16; 2026-02-21T09:06:24.2816420Z add.s64 %rd56, %rd92, 65536; 2026-02-21T09:06:24.2816593Z add.s64 %rd57, %rd92, 131072; 2026-02-21T09:06:24.2816774Z add.s64 %rd58, %rd92, 196608; 2026-02-21T09:06:24.2816940Z add.s64 %rd59, %rd92, 262144; 2026-02-21T09:06:24.2817104Z add.s64 %rd60, %rd92, 327680; 2026-02-21T09:06:24.2817265Z add.s64 %rd61, %rd92, 393216; 2026-02-21T09:06:24.2817423Z cvt.u64.u32 %rd4, %r349; 2026-02-21T09:06:24.2817586Z or.b64 %rd93, %rd4, %rd89; 2026-02-21T09:06:24.2817772Z shl.b64 %rd94, %rd93, 1; 2026-02-21T09:06:24.2817930Z add.s64 %rd5, %rd16, %rd94; 2026-02-21T09:06:24.2818081Z add.s64 %rd62, %rd5, 128; 2026-02-21T09:06:24.2818233Z add.s64 %rd63, %rd92, 524288; 2026-02-21T09:06:24.2818381Z add.s64 %rd64, %rd92, 589824; 2026-02-21T09:06:24.2818537Z add.s64 %rd65, %rd92, 655360; 2026-02-21T09:06:24.2818690Z add.s64 %rd66, %rd92, 720896; 2026-02-21T09:06:24.2818836Z add.s64 %rd67, %rd92, 786432; 2026-02-21T09:06:24.2818987Z add.s64 %rd68, %rd92, 851968; 2026-02-21T09:06:24.2819132Z add.s64 %rd69, %rd92, 917504; 2026-02-21T09:06:24.2819285Z cvt.u64.u32 %rd6, %r350; 2026-02-21T09:06:24.2819430Z or.b64 %rd95, %rd6, %rd89; 2026-02-21T09:06:24.2819585Z shl.b64 %rd96, %rd95, 1; 2026-02-21T09:06:24.2819730Z add.s64 %rd7, %rd16, %rd96; 2026-02-21T09:06:24.2819891Z add.s64 %rd70, %rd7, 128; 2026-02-21T09:06:24.2820146Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2820429Z // begin inline asm 2026-02-21T09:06:24.2820628Z cp.async.cg.shared.global [ %r240 + 0 ], [ %rd55 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2820845Z // end inline asm 2026-02-21T09:06:24.2820981Z // begin inline asm 2026-02-21T09:06:24.2821166Z 
cp.async.cg.shared.global [ %r242 + 0 ], [ %rd56 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2821386Z // end inline asm 2026-02-21T09:06:24.2821513Z // begin inline asm 2026-02-21T09:06:24.2821700Z cp.async.cg.shared.global [ %r244 + 0 ], [ %rd57 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2821910Z // end inline asm 2026-02-21T09:06:24.2822046Z // begin inline asm 2026-02-21T09:06:24.2822234Z cp.async.cg.shared.global [ %r246 + 0 ], [ %rd58 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2822444Z // end inline asm 2026-02-21T09:06:24.2822578Z // begin inline asm 2026-02-21T09:06:24.2822760Z cp.async.cg.shared.global [ %r248 + 0 ], [ %rd59 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2822977Z // end inline asm 2026-02-21T09:06:24.2823103Z // begin inline asm 2026-02-21T09:06:24.2823291Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd60 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2823509Z // end inline asm 2026-02-21T09:06:24.2823647Z // begin inline asm 2026-02-21T09:06:24.2823840Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd61 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2824051Z // end inline asm 2026-02-21T09:06:24.2824231Z // begin inline asm 2026-02-21T09:06:24.2824417Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd62 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2824634Z // end inline asm 2026-02-21T09:06:24.2824828Z // begin inline asm 2026-02-21T09:06:24.2825026Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd63 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2825240Z // end inline asm 2026-02-21T09:06:24.2825377Z // begin inline asm 2026-02-21T09:06:24.2825567Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd64 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2825778Z // end inline asm 2026-02-21T09:06:24.2825913Z // begin inline asm 2026-02-21T09:06:24.2826103Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd65 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2826353Z // end inline asm 2026-02-21T09:06:24.2826478Z // begin inline asm 2026-02-21T09:06:24.2826666Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd66 + 0 ], 
0x10, %r404; 2026-02-21T09:06:24.2826879Z // end inline asm 2026-02-21T09:06:24.2827013Z // begin inline asm 2026-02-21T09:06:24.2827201Z cp.async.cg.shared.global [ %r264 + 0 ], [ %rd67 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2827411Z // end inline asm 2026-02-21T09:06:24.2827544Z // begin inline asm 2026-02-21T09:06:24.2827726Z cp.async.cg.shared.global [ %r266 + 0 ], [ %rd68 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2827941Z // end inline asm 2026-02-21T09:06:24.2828067Z // begin inline asm 2026-02-21T09:06:24.2828254Z cp.async.cg.shared.global [ %r268 + 0 ], [ %rd69 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2828464Z // end inline asm 2026-02-21T09:06:24.2828597Z // begin inline asm 2026-02-21T09:06:24.2828776Z cp.async.cg.shared.global [ %r270 + 0 ], [ %rd70 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2829021Z // end inline asm 2026-02-21T09:06:24.2829166Z cp.async.commit_group; 2026-02-21T09:06:24.2829418Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2829706Z bar.sync 0; 2026-02-21T09:06:24.2829830Z // begin inline asm 2026-02-21T09:06:24.2830023Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r200], 8192; 2026-02-21T09:06:24.2830229Z // end inline asm 2026-02-21T09:06:24.2830471Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2830748Z bar.sync 0; 2026-02-21T09:06:24.2830884Z elect.sync %r358|%p44, -1; 2026-02-21T09:06:24.2831048Z and.pred %p39, %p1, %p44; 2026-02-21T09:06:24.2831199Z add.s32 %r273, %r44, 139264; 2026-02-21T09:06:24.2831357Z // begin inline asm 2026-02-21T09:06:24.2831682Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r273], [%rd143, {%r47, %r438}], [%r200]; 2026-02-21T09:06:24.2832042Z // end inline asm 2026-02-21T09:06:24.2832281Z .loc 1 54 32 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:32 2026-02-21T09:06:24.2832564Z add.s64 %rd72, %rd3, 256; 2026-02-21T09:06:24.2832718Z or.b32 %r359, 
%r353, 128; 2026-02-21T09:06:24.2832871Z mad.wide.u32 %rd97, %r359, 2, %rd16; 2026-02-21T09:06:24.2833048Z add.s64 %rd73, %rd97, 65536; 2026-02-21T09:06:24.2833203Z add.s64 %rd74, %rd97, 131072; 2026-02-21T09:06:24.2833365Z add.s64 %rd75, %rd97, 196608; 2026-02-21T09:06:24.2833516Z add.s64 %rd76, %rd97, 262144; 2026-02-21T09:06:24.2833676Z add.s64 %rd77, %rd97, 327680; 2026-02-21T09:06:24.2833822Z add.s64 %rd78, %rd97, 393216; 2026-02-21T09:06:24.2833977Z add.s64 %rd79, %rd5, 256; 2026-02-21T09:06:24.2834128Z add.s64 %rd80, %rd97, 524288; 2026-02-21T09:06:24.2834273Z add.s64 %rd81, %rd97, 589824; 2026-02-21T09:06:24.2834429Z add.s64 %rd82, %rd97, 655360; 2026-02-21T09:06:24.2834575Z add.s64 %rd83, %rd97, 720896; 2026-02-21T09:06:24.2834760Z add.s64 %rd84, %rd97, 786432; 2026-02-21T09:06:24.2834907Z add.s64 %rd85, %rd97, 851968; 2026-02-21T09:06:24.2835059Z add.s64 %rd86, %rd97, 917504; 2026-02-21T09:06:24.2835205Z add.s64 %rd87, %rd7, 256; 2026-02-21T09:06:24.2835472Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2835782Z // begin inline asm 2026-02-21T09:06:24.2835973Z cp.async.cg.shared.global [ %r277 + 0 ], [ %rd72 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2836219Z // end inline asm 2026-02-21T09:06:24.2836354Z // begin inline asm 2026-02-21T09:06:24.2836554Z cp.async.cg.shared.global [ %r279 + 0 ], [ %rd73 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2836773Z // end inline asm 2026-02-21T09:06:24.2836914Z // begin inline asm 2026-02-21T09:06:24.2837103Z cp.async.cg.shared.global [ %r281 + 0 ], [ %rd74 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2837327Z // end inline asm 2026-02-21T09:06:24.2837461Z // begin inline asm 2026-02-21T09:06:24.2837655Z cp.async.cg.shared.global [ %r283 + 0 ], [ %rd75 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2837905Z // end inline asm 2026-02-21T09:06:24.2838032Z // begin inline asm 2026-02-21T09:06:24.2838222Z cp.async.cg.shared.global [ %r285 + 0 ], [ %rd76 + 0 ], 0x10, %r404; 
2026-02-21T09:06:24.2838432Z // end inline asm 2026-02-21T09:06:24.2838567Z // begin inline asm 2026-02-21T09:06:24.2838750Z cp.async.cg.shared.global [ %r287 + 0 ], [ %rd77 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2838970Z // end inline asm 2026-02-21T09:06:24.2839097Z // begin inline asm 2026-02-21T09:06:24.2839289Z cp.async.cg.shared.global [ %r289 + 0 ], [ %rd78 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2839511Z // end inline asm 2026-02-21T09:06:24.2839639Z // begin inline asm 2026-02-21T09:06:24.2839828Z cp.async.cg.shared.global [ %r291 + 0 ], [ %rd79 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2840038Z // end inline asm 2026-02-21T09:06:24.2840170Z // begin inline asm 2026-02-21T09:06:24.2840378Z cp.async.cg.shared.global [ %r293 + 0 ], [ %rd80 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2840600Z // end inline asm 2026-02-21T09:06:24.2840729Z // begin inline asm 2026-02-21T09:06:24.2840914Z cp.async.cg.shared.global [ %r295 + 0 ], [ %rd81 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2841135Z // end inline asm 2026-02-21T09:06:24.2841263Z // begin inline asm 2026-02-21T09:06:24.2841453Z cp.async.cg.shared.global [ %r297 + 0 ], [ %rd82 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2841666Z // end inline asm 2026-02-21T09:06:24.2841799Z // begin inline asm 2026-02-21T09:06:24.2841980Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd83 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2842200Z // end inline asm 2026-02-21T09:06:24.2842330Z // begin inline asm 2026-02-21T09:06:24.2842530Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd84 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2842748Z // end inline asm 2026-02-21T09:06:24.2842878Z // begin inline asm 2026-02-21T09:06:24.2843066Z cp.async.cg.shared.global [ %r303 + 0 ], [ %rd85 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2843276Z // end inline asm 2026-02-21T09:06:24.2843412Z // begin inline asm 2026-02-21T09:06:24.2843592Z cp.async.cg.shared.global [ %r305 + 0 ], [ %rd86 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2843812Z // end inline asm 
2026-02-21T09:06:24.2843939Z // begin inline asm 2026-02-21T09:06:24.2844128Z cp.async.cg.shared.global [ %r307 + 0 ], [ %rd87 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2844344Z // end inline asm 2026-02-21T09:06:24.2844480Z cp.async.commit_group; 2026-02-21T09:06:24.2844769Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2845048Z bar.sync 0; 2026-02-21T09:06:24.2845179Z // begin inline asm 2026-02-21T09:06:24.2845364Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r201], 8192; 2026-02-21T09:06:24.2845581Z // end inline asm 2026-02-21T09:06:24.2845812Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2846096Z bar.sync 0; 2026-02-21T09:06:24.2846236Z elect.sync %r360|%p45, -1; 2026-02-21T09:06:24.2846396Z and.pred %p41, %p1, %p45; 2026-02-21T09:06:24.2846555Z add.s32 %r310, %r44, 147456; 2026-02-21T09:06:24.2846706Z mov.b32 %r311, 128; 2026-02-21T09:06:24.2846848Z // begin inline asm 2026-02-21T09:06:24.2847173Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r310], [%rd143, {%r311, %r438}], [%r201]; 2026-02-21T09:06:24.2847577Z // end inline asm 2026-02-21T09:06:24.2847838Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2848123Z cp.async.wait_group 2; 2026-02-21T09:06:24.2848274Z bar.sync 0; 2026-02-21T09:06:24.2848500Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2848787Z // begin inline asm 2026-02-21T09:06:24.2848914Z 2026-02-21T09:06:24.2849032Z { 2026-02-21T09:06:24.2849153Z .reg .pred complete; 2026-02-21T09:06:24.2849303Z waitLoop: 2026-02-21T09:06:24.2849516Z mbarrier.try_wait.parity.shared.b64 complete, [%r199], %r62; 2026-02-21T09:06:24.2849759Z @!complete bra.uni waitLoop; 2026-02-21T09:06:24.2849913Z } 2026-02-21T09:06:24.2849976Z 2026-02-21T09:06:24.2850031Z // end inline asm 2026-02-21T09:06:24.2850280Z .loc 1 56 52 
// c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2850568Z setp.ne.b32 %p46, %r29, 0; 2026-02-21T09:06:24.2850728Z @%p46 bra $L__BB0_3; 2026-02-21T09:06:24.2850869Z // %bb.2: 2026-02-21T09:06:24.2851122Z .loc 1 0 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:0:52 2026-02-21T09:06:24.2851416Z add.s32 %r378, %r44, 16480; 2026-02-21T09:06:24.2851590Z bfe.u32 %r379, %r378, 4, 14; 2026-02-21T09:06:24.2851757Z cvt.u64.u32 %rd115, %r379; 2026-02-21T09:06:24.2851930Z or.b64 %rd112, %rd115, 4611686293439512576; 2026-02-21T09:06:24.2852125Z add.s32 %r380, %r44, 16448; 2026-02-21T09:06:24.2852313Z bfe.u32 %r381, %r380, 4, 14; 2026-02-21T09:06:24.2852487Z cvt.u64.u32 %rd116, %r381; 2026-02-21T09:06:24.2852653Z or.b64 %rd110, %rd116, 4611686293439512576; 2026-02-21T09:06:24.2852844Z add.s32 %r382, %r44, 16416; 2026-02-21T09:06:24.2852995Z bfe.u32 %r383, %r382, 4, 14; 2026-02-21T09:06:24.2853161Z cvt.u64.u32 %rd117, %r383; 2026-02-21T09:06:24.2853329Z or.b64 %rd108, %rd117, 4611686293439512576; 2026-02-21T09:06:24.2853505Z add.s32 %r384, %r44, 16384; 2026-02-21T09:06:24.2853663Z bfe.u32 %r385, %r384, 4, 14; 2026-02-21T09:06:24.2853818Z cvt.u64.u32 %rd118, %r385; 2026-02-21T09:06:24.2853990Z or.b64 %rd106, %rd118, 4611686293439512576; 2026-02-21T09:06:24.2854169Z add.s32 %r387, %r44, 131168; 2026-02-21T09:06:24.2854330Z bfe.u32 %r388, %r387, 4, 14; 2026-02-21T09:06:24.2854483Z cvt.u64.u32 %rd119, %r388; 2026-02-21T09:06:24.2854654Z or.b64 %rd105, %rd119, 4611686293338849280; 2026-02-21T09:06:24.2854864Z add.s32 %r389, %r44, 96; 2026-02-21T09:06:24.2855027Z bfe.u32 %r390, %r389, 4, 14; 2026-02-21T09:06:24.2855189Z cvt.u64.u32 %rd120, %r390; 2026-02-21T09:06:24.2855351Z or.b64 %rd104, %rd120, 4611686293439512576; 2026-02-21T09:06:24.2855537Z add.s32 %r391, %r44, 131136; 2026-02-21T09:06:24.2855689Z bfe.u32 %r392, %r391, 4, 14; 2026-02-21T09:06:24.2855850Z cvt.u64.u32 %rd121, %r392; 2026-02-21T09:06:24.2856012Z or.b64 %rd103, 
%rd121, 4611686293338849280; 2026-02-21T09:06:24.2856197Z add.s32 %r393, %r44, 64; 2026-02-21T09:06:24.2856347Z bfe.u32 %r394, %r393, 4, 14; 2026-02-21T09:06:24.2856510Z cvt.u64.u32 %rd122, %r394; 2026-02-21T09:06:24.2856676Z or.b64 %rd102, %rd122, 4611686293439512576; 2026-02-21T09:06:24.2856852Z add.s32 %r395, %r44, 131104; 2026-02-21T09:06:24.2857013Z bfe.u32 %r396, %r395, 4, 14; 2026-02-21T09:06:24.2857166Z cvt.u64.u32 %rd123, %r396; 2026-02-21T09:06:24.2857336Z or.b64 %rd101, %rd123, 4611686293338849280; 2026-02-21T09:06:24.2857510Z add.s32 %r397, %r44, 32; 2026-02-21T09:06:24.2857668Z bfe.u32 %r398, %r397, 4, 14; 2026-02-21T09:06:24.2857821Z cvt.u64.u32 %rd124, %r398; 2026-02-21T09:06:24.2857989Z or.b64 %rd100, %rd124, 4611686293439512576; 2026-02-21T09:06:24.2858171Z bfe.u32 %r399, %r236, 4, 14; 2026-02-21T09:06:24.2858323Z cvt.u64.u32 %rd125, %r399; 2026-02-21T09:06:24.2858493Z or.b64 %rd99, %rd125, 4611686293338849280; 2026-02-21T09:06:24.2858725Z bfe.u32 %r400, %r44, 4, 14; 2026-02-21T09:06:24.2858881Z cvt.u64.u32 %rd126, %r400; 2026-02-21T09:06:24.2859036Z or.b64 %rd98, %rd126, 4611686293439512576; 2026-02-21T09:06:24.2859363Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2859652Z elect.sync %r401|%p48, -1; 2026-02-21T09:06:24.2859812Z mov.b32 %r362, 135266320; 2026-02-21T09:06:24.2859966Z mov.pred %p47, 0; 2026-02-21T09:06:24.2860103Z // begin inline asm 2026-02-21T09:06:24.2860338Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd98, %rd99, %r362, %p47; 2026-02-21T09:06:24.2860596Z // end inline asm 2026-02-21T09:06:24.2860738Z // begin inline asm 2026-02-21T09:06:24.2860984Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd100, %rd101, %r362, %p49; 2026-02-21T09:06:24.2861244Z // end inline asm 2026-02-21T09:06:24.2861379Z // begin inline asm 2026-02-21T09:06:24.2861598Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd102, %rd103, %r362, %p49; 
2026-02-21T09:06:24.2861850Z // end inline asm 2026-02-21T09:06:24.2861979Z // begin inline asm 2026-02-21T09:06:24.2862195Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd104, %rd105, %r362, %p49; 2026-02-21T09:06:24.2862442Z // end inline asm 2026-02-21T09:06:24.2862577Z // begin inline asm 2026-02-21T09:06:24.2862782Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd106, %rd99, %r362, %p47; 2026-02-21T09:06:24.2863031Z // end inline asm 2026-02-21T09:06:24.2863167Z // begin inline asm 2026-02-21T09:06:24.2863404Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd108, %rd101, %r362, %p49; 2026-02-21T09:06:24.2863671Z // end inline asm 2026-02-21T09:06:24.2863802Z // begin inline asm 2026-02-21T09:06:24.2864021Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd110, %rd103, %r362, %p49; 2026-02-21T09:06:24.2864273Z // end inline asm 2026-02-21T09:06:24.2864410Z // begin inline asm 2026-02-21T09:06:24.2864618Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd112, %rd105, %r362, %p49; 2026-02-21T09:06:24.2864912Z // end inline asm 2026-02-21T09:06:24.2865051Z add.s32 %r402, %r44, 163872; 2026-02-21T09:06:24.2865208Z cvt.u64.u32 %rd114, %r402; 2026-02-21T09:06:24.2865363Z // begin inline asm 2026-02-21T09:06:24.2865565Z @%p48 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd114]; 2026-02-21T09:06:24.2865795Z // end inline asm 2026-02-21T09:06:24.2865924Z $L__BB0_3: 2026-02-21T09:06:24.2866162Z .loc 1 0 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:0:52 2026-02-21T09:06:24.2866476Z ld.param.b64 %rd17, [_helion_matmul_param_2]; 2026-02-21T09:06:24.2866665Z add.s32 %r23, %r44, %r328; 2026-02-21T09:06:24.2866820Z add.s32 %r24, %r44, %r329; 2026-02-21T09:06:24.2866970Z add.s32 %r709, %r704, 1024; 2026-02-21T09:06:24.2867129Z or.b32 %r28, %r438, %r318; 2026-02-21T09:06:24.2867382Z .loc 1 54 32 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:32 2026-02-21T09:06:24.2867667Z 
add.s64 %rd127, %rd3, 384; 2026-02-21T09:06:24.2867817Z cvt.u64.u32 %rd145, %r6; 2026-02-21T09:06:24.2867976Z add.s64 %rd146, %rd2, %rd145; 2026-02-21T09:06:24.2868135Z shl.b64 %rd147, %rd146, 1; 2026-02-21T09:06:24.2868286Z add.s64 %rd148, %rd16, %rd147; 2026-02-21T09:06:24.2868450Z add.s64 %rd128, %rd148, 65536; 2026-02-21T09:06:24.2868609Z add.s64 %rd129, %rd148, 131072; 2026-02-21T09:06:24.2868775Z add.s64 %rd130, %rd148, 196608; 2026-02-21T09:06:24.2868931Z add.s64 %rd131, %rd148, 262144; 2026-02-21T09:06:24.2869092Z add.s64 %rd132, %rd148, 327680; 2026-02-21T09:06:24.2869247Z add.s64 %rd133, %rd148, 393216; 2026-02-21T09:06:24.2869407Z add.s64 %rd134, %rd5, 384; 2026-02-21T09:06:24.2869557Z add.s64 %rd135, %rd148, 524288; 2026-02-21T09:06:24.2869716Z add.s64 %rd136, %rd148, 589824; 2026-02-21T09:06:24.2869878Z add.s64 %rd137, %rd148, 655360; 2026-02-21T09:06:24.2870030Z add.s64 %rd138, %rd148, 720896; 2026-02-21T09:06:24.2870223Z add.s64 %rd139, %rd148, 786432; 2026-02-21T09:06:24.2870378Z add.s64 %rd140, %rd148, 851968; 2026-02-21T09:06:24.2870541Z add.s64 %rd141, %rd148, 917504; 2026-02-21T09:06:24.2870718Z add.s64 %rd142, %rd7, 384; 2026-02-21T09:06:24.2870985Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2871267Z bar.sync 0; 2026-02-21T09:06:24.2871400Z // begin inline asm 2026-02-21T09:06:24.2871604Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd127 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2871821Z // end inline asm 2026-02-21T09:06:24.2871959Z // begin inline asm 2026-02-21T09:06:24.2872154Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd128 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2872403Z // end inline asm 2026-02-21T09:06:24.2872531Z // begin inline asm 2026-02-21T09:06:24.2872725Z cp.async.cg.shared.global [ %r407 + 0 ], [ %rd129 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2872941Z // end inline asm 2026-02-21T09:06:24.2873076Z // begin inline asm 2026-02-21T09:06:24.2873269Z cp.async.cg.shared.global [ 
%r409 + 0 ], [ %rd130 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2873488Z // end inline asm 2026-02-21T09:06:24.2873623Z // begin inline asm 2026-02-21T09:06:24.2873808Z cp.async.cg.shared.global [ %r411 + 0 ], [ %rd131 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2874030Z // end inline asm 2026-02-21T09:06:24.2874156Z // begin inline asm 2026-02-21T09:06:24.2874345Z cp.async.cg.shared.global [ %r413 + 0 ], [ %rd132 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2874561Z // end inline asm 2026-02-21T09:06:24.2874725Z // begin inline asm 2026-02-21T09:06:24.2874947Z cp.async.cg.shared.global [ %r415 + 0 ], [ %rd133 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2875161Z // end inline asm 2026-02-21T09:06:24.2875294Z // begin inline asm 2026-02-21T09:06:24.2875476Z cp.async.cg.shared.global [ %r417 + 0 ], [ %rd134 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2875704Z // end inline asm 2026-02-21T09:06:24.2875832Z // begin inline asm 2026-02-21T09:06:24.2876021Z cp.async.cg.shared.global [ %r419 + 0 ], [ %rd135 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2876234Z // end inline asm 2026-02-21T09:06:24.2876369Z // begin inline asm 2026-02-21T09:06:24.2876550Z cp.async.cg.shared.global [ %r421 + 0 ], [ %rd136 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2876770Z // end inline asm 2026-02-21T09:06:24.2876902Z // begin inline asm 2026-02-21T09:06:24.2877086Z cp.async.cg.shared.global [ %r423 + 0 ], [ %rd137 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2877303Z // end inline asm 2026-02-21T09:06:24.2877429Z // begin inline asm 2026-02-21T09:06:24.2877622Z cp.async.cg.shared.global [ %r425 + 0 ], [ %rd138 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2877832Z // end inline asm 2026-02-21T09:06:24.2877965Z // begin inline asm 2026-02-21T09:06:24.2878149Z cp.async.cg.shared.global [ %r427 + 0 ], [ %rd139 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2878367Z // end inline asm 2026-02-21T09:06:24.2878501Z // begin inline asm 2026-02-21T09:06:24.2878685Z cp.async.cg.shared.global [ %r429 + 0 ], [ %rd140 + 0 ], 0x10, %r404; 
2026-02-21T09:06:24.2878903Z // end inline asm 2026-02-21T09:06:24.2879030Z // begin inline asm 2026-02-21T09:06:24.2879221Z cp.async.cg.shared.global [ %r431 + 0 ], [ %rd141 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2879431Z // end inline asm 2026-02-21T09:06:24.2879568Z // begin inline asm 2026-02-21T09:06:24.2879755Z cp.async.cg.shared.global [ %r433 + 0 ], [ %rd142 + 0 ], 0x10, %r404; 2026-02-21T09:06:24.2879986Z // end inline asm 2026-02-21T09:06:24.2880127Z cp.async.commit_group; 2026-02-21T09:06:24.2880390Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2880678Z // begin inline asm 2026-02-21T09:06:24.2880861Z @%p97 mbarrier.arrive.expect_tx.shared.b64 _, [%r435], 8192; 2026-02-21T09:06:24.2881077Z // end inline asm 2026-02-21T09:06:24.2881315Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2881643Z bar.sync 0; 2026-02-21T09:06:24.2881778Z elect.sync %r445|%p67, -1; 2026-02-21T09:06:24.2881944Z and.pred %p65, %p1, %p67; 2026-02-21T09:06:24.2882130Z add.s32 %r436, %r44, 155648; 2026-02-21T09:06:24.2882282Z mov.b32 %r437, 192; 2026-02-21T09:06:24.2882427Z // begin inline asm 2026-02-21T09:06:24.2882748Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r436], [%rd143, {%r437, %r438}], [%r435]; 2026-02-21T09:06:24.2883107Z // end inline asm 2026-02-21T09:06:24.2883344Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2883640Z shl.b64 %rd149, %rd6, 1; 2026-02-21T09:06:24.2883831Z add.s64 %rd8, %rd149, 512; 2026-02-21T09:06:24.2883981Z and.b32 %r446, %r1, 7; 2026-02-21T09:06:24.2884144Z mad.wide.u32 %rd474, %r446, 16, %rd16; 2026-02-21T09:06:24.2884311Z shl.b64 %rd150, %rd4, 1; 2026-02-21T09:06:24.2884466Z add.s64 %rd10, %rd150, 512; 2026-02-21T09:06:24.2884617Z shl.b32 %r447, %r3, 17; 2026-02-21T09:06:24.2884800Z and.b32 %r448, %r447, 7864320; 2026-02-21T09:06:24.2884956Z shl.b32 
%r449, %r4, 11; 2026-02-21T09:06:24.2885109Z or.b32 %r450, %r448, %r449; 2026-02-21T09:06:24.2885263Z mul.wide.u32 %rd11, %r450, 2; 2026-02-21T09:06:24.2885424Z mov.b32 %r1049, 1; 2026-02-21T09:06:24.2885564Z mov.b32 %r1048, 3; 2026-02-21T09:06:24.2885694Z mov.b32 %r1044, 0; 2026-02-21T09:06:24.2885830Z mov.b64 %rd475, 0; 2026-02-21T09:06:24.2885962Z mov.b32 %r1046, %r1044; 2026-02-21T09:06:24.2886113Z mov.b32 %r1047, %r1044; 2026-02-21T09:06:24.2886253Z mov.b32 %r1050, %r1044; 2026-02-21T09:06:24.2886431Z bra.uni $L__BB0_4; 2026-02-21T09:06:24.2886615Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:24.2886938Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2887231Z setp.lt.u64 %p89, %rd475, 1792; 2026-02-21T09:06:24.2887496Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2887779Z // begin inline asm 2026-02-21T09:06:24.2887907Z 2026-02-21T09:06:24.2888026Z { 2026-02-21T09:06:24.2888143Z .reg .pred complete; 2026-02-21T09:06:24.2888290Z waitLoop: 2026-02-21T09:06:24.2888477Z mbarrier.try_wait.parity.shared.b64 complete, [%r1045], %r1044; 2026-02-21T09:06:24.2888719Z @!complete bra.uni waitLoop; 2026-02-21T09:06:24.2888865Z } 2026-02-21T09:06:24.2888940Z 2026-02-21T09:06:24.2888996Z // end inline asm 2026-02-21T09:06:24.2889141Z add.s32 %r544, %r1049, 1; 2026-02-21T09:06:24.2889298Z setp.gt.s32 %p92, %r544, 1; 2026-02-21T09:06:24.2889468Z selp.b32 %r1049, 0, %r544, %p92; 2026-02-21T09:06:24.2889632Z selp.b32 %r545, 1, 0, %p92; 2026-02-21T09:06:24.2889792Z xor.b32 %r42, %r1050, %r545; 2026-02-21T09:06:24.2890053Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2890347Z add.s32 %r546, %r1048, 1; 2026-02-21T09:06:24.2890505Z setp.gt.s32 %p93, %r546, 3; 2026-02-21T09:06:24.2890661Z selp.b32 %r1048, 0, %r546, %p93; 2026-02-21T09:06:24.2890938Z .loc 1 54 32 // 
c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:32 2026-02-21T09:06:24.2891220Z add.s64 %rd197, %rd474, %rd11; 2026-02-21T09:06:24.2891385Z add.s64 %rd180, %rd197, 512; 2026-02-21T09:06:24.2891540Z add.s64 %rd181, %rd197, 66048; 2026-02-21T09:06:24.2891704Z add.s64 %rd182, %rd197, 131584; 2026-02-21T09:06:24.2891862Z add.s64 %rd183, %rd197, 197120; 2026-02-21T09:06:24.2892025Z add.s64 %rd184, %rd197, 262656; 2026-02-21T09:06:24.2892186Z add.s64 %rd185, %rd197, 328192; 2026-02-21T09:06:24.2892342Z add.s64 %rd186, %rd197, 393728; 2026-02-21T09:06:24.2892503Z add.s64 %rd187, %rd474, %rd10; 2026-02-21T09:06:24.2892656Z add.s64 %rd188, %rd197, 524800; 2026-02-21T09:06:24.2892815Z add.s64 %rd189, %rd197, 590336; 2026-02-21T09:06:24.2893008Z add.s64 %rd190, %rd197, 655872; 2026-02-21T09:06:24.2893173Z add.s64 %rd191, %rd197, 721408; 2026-02-21T09:06:24.2893330Z add.s64 %rd192, %rd197, 786944; 2026-02-21T09:06:24.2893498Z add.s64 %rd193, %rd197, 852480; 2026-02-21T09:06:24.2893682Z add.s64 %rd194, %rd197, 918016; 2026-02-21T09:06:24.2893986Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2894284Z add.s64 %rd195, %rd474, %rd8; 2026-02-21T09:06:24.2894438Z shl.b32 %r547, %r1048, 15; 2026-02-21T09:06:24.2894612Z add.s32 %r549, %r44, %r547; 2026-02-21T09:06:24.2894808Z bar.sync 0; 2026-02-21T09:06:24.2894957Z add.s32 %r507, %r549, %r5; 2026-02-21T09:06:24.2895123Z selp.b32 %r508, 16, 0, %p89; 2026-02-21T09:06:24.2895315Z // begin inline asm 2026-02-21T09:06:24.2895522Z cp.async.cg.shared.global [ %r507 + 0 ], [ %rd180 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2895762Z // end inline asm 2026-02-21T09:06:24.2895909Z add.s32 %r509, %r507, 2048; 2026-02-21T09:06:24.2896063Z // begin inline asm 2026-02-21T09:06:24.2896274Z cp.async.cg.shared.global [ %r509 + 0 ], [ %rd181 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2896501Z // end inline asm 2026-02-21T09:06:24.2896649Z add.s32 %r511, %r507, 4096; 2026-02-21T09:06:24.2896803Z 
// begin inline asm 2026-02-21T09:06:24.2897008Z cp.async.cg.shared.global [ %r511 + 0 ], [ %rd182 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2897233Z // end inline asm 2026-02-21T09:06:24.2897378Z add.s32 %r513, %r507, 6144; 2026-02-21T09:06:24.2897540Z // begin inline asm 2026-02-21T09:06:24.2897738Z cp.async.cg.shared.global [ %r513 + 0 ], [ %rd183 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2897973Z // end inline asm 2026-02-21T09:06:24.2898143Z add.s32 %r515, %r507, 8192; 2026-02-21T09:06:24.2898313Z // begin inline asm 2026-02-21T09:06:24.2898517Z cp.async.cg.shared.global [ %r515 + 0 ], [ %rd184 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2898758Z // end inline asm 2026-02-21T09:06:24.2898900Z add.s32 %r517, %r507, 10240; 2026-02-21T09:06:24.2899063Z // begin inline asm 2026-02-21T09:06:24.2899257Z cp.async.cg.shared.global [ %r517 + 0 ], [ %rd185 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2899488Z // end inline asm 2026-02-21T09:06:24.2899634Z add.s32 %r519, %r507, 12288; 2026-02-21T09:06:24.2899785Z // begin inline asm 2026-02-21T09:06:24.2899987Z cp.async.cg.shared.global [ %r519 + 0 ], [ %rd186 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2900216Z // end inline asm 2026-02-21T09:06:24.2900360Z add.s32 %r521, %r507, 14336; 2026-02-21T09:06:24.2900513Z // begin inline asm 2026-02-21T09:06:24.2900716Z cp.async.cg.shared.global [ %r521 + 0 ], [ %rd187 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2900943Z // end inline asm 2026-02-21T09:06:24.2901090Z add.s32 %r523, %r507, 16384; 2026-02-21T09:06:24.2901254Z // begin inline asm 2026-02-21T09:06:24.2901449Z cp.async.cg.shared.global [ %r523 + 0 ], [ %rd188 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2901680Z // end inline asm 2026-02-21T09:06:24.2901814Z add.s32 %r525, %r507, 18432; 2026-02-21T09:06:24.2901977Z // begin inline asm 2026-02-21T09:06:24.2902171Z cp.async.cg.shared.global [ %r525 + 0 ], [ %rd189 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2902413Z // end inline asm 2026-02-21T09:06:24.2902543Z add.s32 %r527, %r507, 20480; 
2026-02-21T09:06:24.2902698Z // begin inline asm 2026-02-21T09:06:24.2902888Z cp.async.cg.shared.global [ %r527 + 0 ], [ %rd190 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2903103Z // end inline asm 2026-02-21T09:06:24.2903238Z add.s32 %r529, %r507, 22528; 2026-02-21T09:06:24.2903384Z // begin inline asm 2026-02-21T09:06:24.2903579Z cp.async.cg.shared.global [ %r529 + 0 ], [ %rd191 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2903792Z // end inline asm 2026-02-21T09:06:24.2903929Z add.s32 %r531, %r507, 24576; 2026-02-21T09:06:24.2904071Z // begin inline asm 2026-02-21T09:06:24.2904264Z cp.async.cg.shared.global [ %r531 + 0 ], [ %rd192 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2904487Z // end inline asm 2026-02-21T09:06:24.2904617Z add.s32 %r533, %r507, 26624; 2026-02-21T09:06:24.2904847Z // begin inline asm 2026-02-21T09:06:24.2905035Z cp.async.cg.shared.global [ %r533 + 0 ], [ %rd193 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2905256Z // end inline asm 2026-02-21T09:06:24.2905416Z add.s32 %r535, %r507, 28672; 2026-02-21T09:06:24.2905572Z // begin inline asm 2026-02-21T09:06:24.2905760Z cp.async.cg.shared.global [ %r535 + 0 ], [ %rd194 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2905985Z // end inline asm 2026-02-21T09:06:24.2906117Z add.s32 %r537, %r507, 30720; 2026-02-21T09:06:24.2906272Z // begin inline asm 2026-02-21T09:06:24.2906466Z cp.async.cg.shared.global [ %r537 + 0 ], [ %rd195 + 0 ], 0x10, %r508; 2026-02-21T09:06:24.2906690Z // end inline asm 2026-02-21T09:06:24.2906866Z cp.async.commit_group; 2026-02-21T09:06:24.2907122Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2907414Z shl.b32 %r550, %r1048, 3; 2026-02-21T09:06:24.2907566Z add.s32 %r551, %r44, %r550; 2026-02-21T09:06:24.2907730Z add.s32 %r543, %r551, 163840; 2026-02-21T09:06:24.2907891Z and.pred %p87, %p97, %p89; 2026-02-21T09:06:24.2908057Z // begin inline asm 2026-02-21T09:06:24.2908249Z @%p87 mbarrier.arrive.expect_tx.shared.b64 _, [%r543], 8192; 
2026-02-21T09:06:24.2908459Z // end inline asm 2026-02-21T09:06:24.2908704Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2908987Z shl.b32 %r552, %r1048, 13; 2026-02-21T09:06:24.2909144Z add.s32 %r553, %r44, %r552; 2026-02-21T09:06:24.2909294Z add.s32 %r540, %r553, 131072; 2026-02-21T09:06:24.2909451Z bar.sync 0; 2026-02-21T09:06:24.2909614Z elect.sync %r554|%p94, -1; 2026-02-21T09:06:24.2909780Z and.pred %p95, %p89, %p94; 2026-02-21T09:06:24.2909939Z and.pred %p88, %p1, %p95; 2026-02-21T09:06:24.2910092Z cvt.u32.u64 %r555, %rd475; 2026-02-21T09:06:24.2910245Z add.s32 %r541, %r555, 256; 2026-02-21T09:06:24.2910388Z // begin inline asm 2026-02-21T09:06:24.2910737Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r540], [%rd143, {%r541, %r438}], [%r543]; 2026-02-21T09:06:24.2911094Z // end inline asm 2026-02-21T09:06:24.2911346Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2911642Z add.s64 %rd474, %rd474, 128; 2026-02-21T09:06:24.2911802Z setp.lt.u64 %p96, %rd475, 1920; 2026-02-21T09:06:24.2911971Z add.s64 %rd475, %rd475, 64; 2026-02-21T09:06:24.2912122Z mov.b32 %r1044, %r1050; 2026-02-21T09:06:24.2912274Z mov.b32 %r1045, %r556; 2026-02-21T09:06:24.2912416Z mov.b32 %r1050, %r42; 2026-02-21T09:06:24.2912565Z @%p96 bra $L__BB0_4; 2026-02-21T09:06:24.2912704Z bra.uni $L__BB0_7; 2026-02-21T09:06:24.2912893Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:06:24.2913104Z add.s32 %r453, %r1047, 1; 2026-02-21T09:06:24.2913263Z setp.gt.s32 %p69, %r453, 3; 2026-02-21T09:06:24.2913425Z selp.b32 %r1047, 0, %r453, %p69; 2026-02-21T09:06:24.2913590Z selp.b32 %r454, 1, 0, %p69; 2026-02-21T09:06:24.2913750Z xor.b32 %r1046, %r1046, %r454; 2026-02-21T09:06:24.2914014Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2914309Z cp.async.wait_group 2; 2026-02-21T09:06:24.2914452Z bar.sync 
0; 2026-02-21T09:06:24.2914724Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2915002Z shl.b32 %r455, %r1047, 3; 2026-02-21T09:06:24.2915153Z add.s32 %r457, %r44, %r455; 2026-02-21T09:06:24.2915309Z add.s32 %r451, %r457, 163840; 2026-02-21T09:06:24.2915460Z // begin inline asm 2026-02-21T09:06:24.2915596Z 2026-02-21T09:06:24.2915703Z { 2026-02-21T09:06:24.2915829Z .reg .pred complete; 2026-02-21T09:06:24.2915968Z waitLoop: 2026-02-21T09:06:24.2916161Z mbarrier.try_wait.parity.shared.b64 complete, [%r451], %r1046; 2026-02-21T09:06:24.2916391Z @!complete bra.uni waitLoop; 2026-02-21T09:06:24.2916584Z } 2026-02-21T09:06:24.2916647Z 2026-02-21T09:06:24.2916709Z // end inline asm 2026-02-21T09:06:24.2916844Z shl.b32 %r458, %r1049, 3; 2026-02-21T09:06:24.2917030Z add.s32 %r459, %r44, %r458; 2026-02-21T09:06:24.2917185Z add.s32 %r556, %r459, 163872; 2026-02-21T09:06:24.2917462Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2917741Z @%p46 bra $L__BB0_6; 2026-02-21T09:06:24.2917931Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:24.2918242Z .loc 1 55 44 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:55:44 2026-02-21T09:06:24.2918556Z shl.b32 %r476, %r1047, 13; 2026-02-21T09:06:24.2918713Z add.s32 %r478, %r44, %r476; 2026-02-21T09:06:24.2918863Z add.s32 %r479, %r478, 131072; 2026-02-21T09:06:24.2919127Z .loc 1 54 85 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:54:85 2026-02-21T09:06:24.2919402Z shl.b32 %r480, %r1047, 15; 2026-02-21T09:06:24.2919556Z add.s32 %r481, %r44, %r480; 2026-02-21T09:06:24.2919809Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2920100Z elect.sync %r482|%p71, -1; 2026-02-21T09:06:24.2920262Z bfe.u32 %r483, %r481, 4, 14; 2026-02-21T09:06:24.2920413Z cvt.u64.u32 %rd168, %r483; 2026-02-21T09:06:24.2920580Z or.b64 %rd151, %rd168, 4611686293439512576; 
2026-02-21T09:06:24.2920757Z bfe.u32 %r484, %r479, 4, 14; 2026-02-21T09:06:24.2920916Z cvt.u64.u32 %rd169, %r484; 2026-02-21T09:06:24.2921074Z or.b64 %rd152, %rd169, 4611686293338849280; 2026-02-21T09:06:24.2921280Z mov.b32 %r461, 135266320; 2026-02-21T09:06:24.2921434Z mov.pred %p70, -1; 2026-02-21T09:06:24.2921584Z // begin inline asm 2026-02-21T09:06:24.2921810Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd151, %rd152, %r461, %p70; 2026-02-21T09:06:24.2922071Z // end inline asm 2026-02-21T09:06:24.2922216Z add.s32 %r485, %r481, 32; 2026-02-21T09:06:24.2922366Z bfe.u32 %r486, %r485, 4, 14; 2026-02-21T09:06:24.2922525Z cvt.u64.u32 %rd170, %r486; 2026-02-21T09:06:24.2922688Z or.b64 %rd153, %rd170, 4611686293439512576; 2026-02-21T09:06:24.2922871Z add.s32 %r487, %r478, 131104; 2026-02-21T09:06:24.2923026Z bfe.u32 %r488, %r487, 4, 14; 2026-02-21T09:06:24.2923185Z cvt.u64.u32 %rd171, %r488; 2026-02-21T09:06:24.2923346Z or.b64 %rd154, %rd171, 4611686293338849280; 2026-02-21T09:06:24.2923525Z // begin inline asm 2026-02-21T09:06:24.2923748Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd153, %rd154, %r461, %p70; 2026-02-21T09:06:24.2924000Z // end inline asm 2026-02-21T09:06:24.2924143Z add.s32 %r489, %r481, 64; 2026-02-21T09:06:24.2924296Z bfe.u32 %r490, %r489, 4, 14; 2026-02-21T09:06:24.2924455Z cvt.u64.u32 %rd172, %r490; 2026-02-21T09:06:24.2924617Z or.b64 %rd155, %rd172, 4611686293439512576; 2026-02-21T09:06:24.2924827Z add.s32 %r491, %r478, 131136; 2026-02-21T09:06:24.2924979Z bfe.u32 %r492, %r491, 4, 14; 2026-02-21T09:06:24.2925134Z cvt.u64.u32 %rd173, %r492; 2026-02-21T09:06:24.2925297Z or.b64 %rd156, %rd173, 4611686293338849280; 2026-02-21T09:06:24.2925466Z // begin inline asm 2026-02-21T09:06:24.2925688Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd155, %rd156, %r461, %p70; 2026-02-21T09:06:24.2925929Z // end inline asm 2026-02-21T09:06:24.2926069Z add.s32 %r493, %r481, 96; 2026-02-21T09:06:24.2926214Z bfe.u32 %r494, 
%r493, 4, 14; 2026-02-21T09:06:24.2926372Z cvt.u64.u32 %rd174, %r494; 2026-02-21T09:06:24.2926532Z or.b64 %rd157, %rd174, 4611686293439512576; 2026-02-21T09:06:24.2926719Z add.s32 %r495, %r478, 131168; 2026-02-21T09:06:24.2926888Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T09:06:24.2927042Z cvt.u64.u32 %rd175, %r496; 2026-02-21T09:06:24.2927214Z or.b64 %rd158, %rd175, 4611686293338849280; 2026-02-21T09:06:24.2927382Z // begin inline asm 2026-02-21T09:06:24.2927601Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 0 ], %rd157, %rd158, %r461, %p70; 2026-02-21T09:06:24.2927877Z // end inline asm 2026-02-21T09:06:24.2928017Z add.s32 %r497, %r481, 16384; 2026-02-21T09:06:24.2928164Z bfe.u32 %r498, %r497, 4, 14; 2026-02-21T09:06:24.2928343Z cvt.u64.u32 %rd176, %r498; 2026-02-21T09:06:24.2928509Z or.b64 %rd159, %rd176, 4611686293439512576; 2026-02-21T09:06:24.2928679Z // begin inline asm 2026-02-21T09:06:24.2928902Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd159, %rd152, %r461, %p70; 2026-02-21T09:06:24.2929158Z // end inline asm 2026-02-21T09:06:24.2929298Z add.s32 %r499, %r481, 16416; 2026-02-21T09:06:24.2929445Z bfe.u32 %r500, %r499, 4, 14; 2026-02-21T09:06:24.2929603Z cvt.u64.u32 %rd177, %r500; 2026-02-21T09:06:24.2929788Z or.b64 %rd161, %rd177, 4611686293439512576; 2026-02-21T09:06:24.2929964Z // begin inline asm 2026-02-21T09:06:24.2930193Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd161, %rd154, %r461, %p70; 2026-02-21T09:06:24.2930451Z // end inline asm 2026-02-21T09:06:24.2930592Z add.s32 %r501, %r481, 16448; 2026-02-21T09:06:24.2930738Z bfe.u32 %r502, %r501, 4, 14; 2026-02-21T09:06:24.2930893Z cvt.u64.u32 %rd178, %r502; 2026-02-21T09:06:24.2931050Z or.b64 %rd163, %rd178, 4611686293439512576; 2026-02-21T09:06:24.2931227Z // begin inline asm 2026-02-21T09:06:24.2931441Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd163, %rd156, %r461, %p70; 2026-02-21T09:06:24.2931693Z // end inline asm 2026-02-21T09:06:24.2931834Z 
add.s32 %r503, %r481, 16480; 2026-02-21T09:06:24.2931981Z bfe.u32 %r504, %r503, 4, 14; 2026-02-21T09:06:24.2932138Z cvt.u64.u32 %rd179, %r504; 2026-02-21T09:06:24.2932228Z or.b64 %rd165, %rd179, 4611686293439512576; 2026-02-21T09:06:24.2932289Z // begin inline asm 2026-02-21T09:06:24.2932427Z @%p71 tcgen05.mma.cta_group::1.kind::f16 [ %r1043 + 64 ], %rd165, %rd158, %r461, %p70; 2026-02-21T09:06:24.2932480Z // end inline asm 2026-02-21T09:06:24.2932536Z cvt.u64.u32 %rd167, %r556; 2026-02-21T09:06:24.2932592Z // begin inline asm 2026-02-21T09:06:24.2932721Z @%p71 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd167]; 2026-02-21T09:06:24.2932773Z // end inline asm 2026-02-21T09:06:24.2932827Z bra.uni $L__BB0_6; 2026-02-21T09:06:24.2932929Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:06:24.2933098Z .loc 1 0 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:0:52 2026-02-21T09:06:24.2933155Z cvt.u32.u64 %r844, %rd6; 2026-02-21T09:06:24.2933219Z cvt.u32.u64 %r845, %rd4; 2026-02-21T09:06:24.2933273Z cvt.u32.u64 %r846, %rd2; 2026-02-21T09:06:24.2933327Z mov.b32 %r557, 1; 2026-02-21T09:06:24.2933494Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2933557Z // begin inline asm 2026-02-21T09:06:24.2933605Z 2026-02-21T09:06:24.2933652Z { 2026-02-21T09:06:24.2933720Z .reg .pred complete; 2026-02-21T09:06:24.2933774Z waitLoop: 2026-02-21T09:06:24.2933887Z mbarrier.try_wait.parity.shared.b64 complete, [%r556], %r557; 2026-02-21T09:06:24.2933956Z @!complete bra.uni waitLoop; 2026-02-21T09:06:24.2934005Z } 2026-02-21T09:06:24.2934009Z 2026-02-21T09:06:24.2934063Z // end inline asm 2026-02-21T09:06:24.2934229Z .loc 1 49 42 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:49:42 2026-02-21T09:06:24.2934300Z cp.async.wait_group 0; 2026-02-21T09:06:24.2934352Z bar.sync 0; 2026-02-21T09:06:24.2934406Z // begin inline asm 2026-02-21T09:06:24.2934496Z @%p97 mbarrier.inval.shared::cta.b64 [%r199]; 
2026-02-21T09:06:24.2934548Z // end inline asm 2026-02-21T09:06:24.2934600Z bar.sync 0; 2026-02-21T09:06:24.2934655Z // begin inline asm 2026-02-21T09:06:24.2934783Z @%p97 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T09:06:24.2934838Z // end inline asm 2026-02-21T09:06:24.2934893Z bar.sync 0; 2026-02-21T09:06:24.2934956Z // begin inline asm 2026-02-21T09:06:24.2935034Z @%p97 mbarrier.inval.shared::cta.b64 [%r201]; 2026-02-21T09:06:24.2935114Z // end inline asm 2026-02-21T09:06:24.2935176Z bar.sync 0; 2026-02-21T09:06:24.2935230Z // begin inline asm 2026-02-21T09:06:24.2935305Z @%p97 mbarrier.inval.shared::cta.b64 [%r435]; 2026-02-21T09:06:24.2935382Z // end inline asm 2026-02-21T09:06:24.2935451Z add.s32 %r562, %r44, 163872; 2026-02-21T09:06:24.2935505Z // begin inline asm 2026-02-21T09:06:24.2935580Z @%p97 mbarrier.inval.shared::cta.b64 [%r562]; 2026-02-21T09:06:24.2935640Z // end inline asm 2026-02-21T09:06:24.2935691Z bar.sync 0; 2026-02-21T09:06:24.2935744Z // begin inline asm 2026-02-21T09:06:24.2935817Z @%p97 mbarrier.inval.shared::cta.b64 [%r198]; 2026-02-21T09:06:24.2935880Z // end inline asm 2026-02-21T09:06:24.2936075Z .loc 1 59 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:59:52 2026-02-21T09:06:24.2936135Z or.b32 %r848, %r28, %r846; 2026-02-21T09:06:24.2936203Z or.b32 %r849, %r28, %r845; 2026-02-21T09:06:24.2936259Z or.b32 %r850, %r28, %r844; 2026-02-21T09:06:24.2936430Z .loc 1 59 24 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:59:24 2026-02-21T09:06:24.2936506Z mad.wide.u32 %rd198, %r848, 2, %rd17; 2026-02-21T09:06:24.2936567Z cvt.u64.u32 %rd214, %r28; 2026-02-21T09:06:24.2936628Z add.s64 %rd215, %rd214, %rd2; 2026-02-21T09:06:24.2936686Z shl.b64 %rd216, %rd215, 1; 2026-02-21T09:06:24.2936757Z add.s64 %rd217, %rd17, %rd216; 2026-02-21T09:06:24.2936818Z add.s64 %rd199, %rd217, 65536; 2026-02-21T09:06:24.2936881Z add.s64 %rd200, %rd217, 131072; 2026-02-21T09:06:24.2936949Z add.s64 %rd201, %rd217, 196608; 
2026-02-21T09:06:24.2937007Z add.s64 %rd202, %rd217, 262144; 2026-02-21T09:06:24.2937103Z add.s64 %rd203, %rd217, 327680; 2026-02-21T09:06:24.2937163Z add.s64 %rd204, %rd217, 393216; 2026-02-21T09:06:24.2937233Z mad.wide.u32 %rd205, %r849, 2, %rd17; 2026-02-21T09:06:24.2937290Z add.s64 %rd206, %rd217, 524288; 2026-02-21T09:06:24.2937345Z add.s64 %rd207, %rd217, 589824; 2026-02-21T09:06:24.2937412Z add.s64 %rd208, %rd217, 655360; 2026-02-21T09:06:24.2937467Z add.s64 %rd209, %rd217, 720896; 2026-02-21T09:06:24.2937522Z add.s64 %rd210, %rd217, 786432; 2026-02-21T09:06:24.2937585Z add.s64 %rd211, %rd217, 851968; 2026-02-21T09:06:24.2937640Z add.s64 %rd212, %rd217, 917504; 2026-02-21T09:06:24.2937701Z mad.wide.u32 %rd213, %r850, 2, %rd17; 2026-02-21T09:06:24.2937863Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2937927Z // begin inline asm 2026-02-21T09:06:24.2938229Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579}, [%r699 + 0]; 2026-02-21T09:06:24.2938288Z // end inline asm 2026-02-21T09:06:24.2938351Z // begin inline asm 2026-02-21T09:06:24.2938631Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596}, [%r699 + 16]; 2026-02-21T09:06:24.2938689Z // end inline asm 2026-02-21T09:06:24.2938751Z // begin inline asm 2026-02-21T09:06:24.2939034Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613}, [%r699 + 32]; 2026-02-21T09:06:24.2939090Z // end inline asm 2026-02-21T09:06:24.2939145Z // begin inline asm 2026-02-21T09:06:24.2939438Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630}, [%r699 + 48]; 2026-02-21T09:06:24.2939495Z 
// end inline asm 2026-02-21T09:06:24.2939554Z // begin inline asm 2026-02-21T09:06:24.2939839Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647}, [%r699 + 64]; 2026-02-21T09:06:24.2939894Z // end inline asm 2026-02-21T09:06:24.2939949Z // begin inline asm 2026-02-21T09:06:24.2940241Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664}, [%r699 + 80]; 2026-02-21T09:06:24.2940296Z // end inline asm 2026-02-21T09:06:24.2940379Z // begin inline asm 2026-02-21T09:06:24.2940673Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681}, [%r699 + 96]; 2026-02-21T09:06:24.2940728Z // end inline asm 2026-02-21T09:06:24.2940783Z // begin inline asm 2026-02-21T09:06:24.2941077Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698}, [%r699 + 112]; 2026-02-21T09:06:24.2941153Z // end inline asm 2026-02-21T09:06:24.2941207Z // begin inline asm 2026-02-21T09:06:24.2941278Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:06:24.2941341Z // end inline asm 2026-02-21T09:06:24.2941401Z cvt.u64.u32 %rd218, %r564; 2026-02-21T09:06:24.2941461Z cvt.u64.u32 %rd219, %r565; 2026-02-21T09:06:24.2941527Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:06:24.2941587Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:06:24.2941763Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2941825Z mov.b64 {%r851, %r852}, %rd221; 2026-02-21T09:06:24.2941896Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:06:24.2942064Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2942123Z cvt.u64.u32 %rd222, %r566; 2026-02-21T09:06:24.2942210Z 
cvt.u64.u32 %rd223, %r567; 2026-02-21T09:06:24.2942274Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:06:24.2942336Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:06:24.2942519Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2942581Z mov.b64 {%r854, %r855}, %rd225; 2026-02-21T09:06:24.2942648Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:06:24.2942819Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2942889Z cvt.u64.u32 %rd226, %r568; 2026-02-21T09:06:24.2942947Z cvt.u64.u32 %rd227, %r569; 2026-02-21T09:06:24.2943007Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:06:24.2943074Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:06:24.2943244Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2943305Z mov.b64 {%r857, %r858}, %rd229; 2026-02-21T09:06:24.2943374Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:06:24.2943547Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2943607Z cvt.u64.u32 %rd230, %r570; 2026-02-21T09:06:24.2943665Z cvt.u64.u32 %rd231, %r571; 2026-02-21T09:06:24.2943733Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:06:24.2943794Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:06:24.2943962Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2944033Z mov.b64 {%r860, %r861}, %rd233; 2026-02-21T09:06:24.2944098Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:06:24.2944271Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2944338Z cvt.u64.u32 %rd234, %r572; 2026-02-21T09:06:24.2944400Z cvt.u64.u32 %rd235, %r573; 2026-02-21T09:06:24.2944460Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:06:24.2944518Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:06:24.2944757Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2944819Z 
mov.b64 {%r863, %r864}, %rd237; 2026-02-21T09:06:24.2944882Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:06:24.2945057Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2945160Z cvt.u64.u32 %rd238, %r574; 2026-02-21T09:06:24.2945222Z cvt.u64.u32 %rd239, %r575; 2026-02-21T09:06:24.2945297Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:06:24.2945384Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:06:24.2945560Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2945623Z mov.b64 {%r866, %r867}, %rd241; 2026-02-21T09:06:24.2945698Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:06:24.2945872Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2945932Z cvt.u64.u32 %rd242, %r576; 2026-02-21T09:06:24.2946027Z cvt.u64.u32 %rd243, %r577; 2026-02-21T09:06:24.2946086Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:06:24.2946145Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:06:24.2946328Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2946390Z mov.b64 {%r869, %r870}, %rd245; 2026-02-21T09:06:24.2946453Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:06:24.2946632Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2946697Z cvt.u64.u32 %rd246, %r578; 2026-02-21T09:06:24.2946753Z cvt.u64.u32 %rd247, %r579; 2026-02-21T09:06:24.2946810Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:06:24.2946875Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:06:24.2947037Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2947123Z mov.b64 {%r872, %r873}, %rd249; 2026-02-21T09:06:24.2947193Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:06:24.2947357Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2947413Z cvt.u64.u32 %rd250, %r581; 
2026-02-21T09:06:24.2947468Z cvt.u64.u32 %rd251, %r582; 2026-02-21T09:06:24.2947533Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:06:24.2947589Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:06:24.2947754Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2947821Z mov.b64 {%r875, %r876}, %rd253; 2026-02-21T09:06:24.2947881Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:06:24.2948045Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2948108Z cvt.u64.u32 %rd254, %r583; 2026-02-21T09:06:24.2948164Z cvt.u64.u32 %rd255, %r584; 2026-02-21T09:06:24.2948221Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:06:24.2948278Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:06:24.2948452Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2948509Z mov.b64 {%r878, %r879}, %rd257; 2026-02-21T09:06:24.2948567Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:06:24.2948738Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2948794Z cvt.u64.u32 %rd258, %r585; 2026-02-21T09:06:24.2948850Z cvt.u64.u32 %rd259, %r586; 2026-02-21T09:06:24.2948911Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:06:24.2948967Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:06:24.2949129Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2949184Z mov.b64 {%r881, %r882}, %rd261; 2026-02-21T09:06:24.2949251Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:06:24.2949416Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2949472Z cvt.u64.u32 %rd262, %r587; 2026-02-21T09:06:24.2949533Z cvt.u64.u32 %rd263, %r588; 2026-02-21T09:06:24.2949588Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:06:24.2949643Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:06:24.2949832Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2949888Z mov.b64 {%r884, %r885}, %rd265; 2026-02-21T09:06:24.2949966Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:06:24.2950126Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2950189Z cvt.u64.u32 %rd266, %r589; 2026-02-21T09:06:24.2950244Z cvt.u64.u32 %rd267, %r590; 2026-02-21T09:06:24.2950298Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:06:24.2950360Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:06:24.2950524Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2950602Z mov.b64 {%r887, %r888}, %rd269; 2026-02-21T09:06:24.2950668Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:06:24.2950823Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2950880Z cvt.u64.u32 %rd270, %r591; 2026-02-21T09:06:24.2950934Z cvt.u64.u32 %rd271, %r592; 2026-02-21T09:06:24.2950998Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:06:24.2951054Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:06:24.2951210Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2951273Z mov.b64 {%r890, %r891}, %rd273; 2026-02-21T09:06:24.2951333Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:06:24.2951487Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2951550Z cvt.u64.u32 %rd274, %r593; 2026-02-21T09:06:24.2951627Z cvt.u64.u32 %rd275, %r594; 2026-02-21T09:06:24.2951685Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:06:24.2951740Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:06:24.2951910Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2951970Z mov.b64 {%r893, %r894}, %rd277; 2026-02-21T09:06:24.2952030Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:06:24.2952199Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2952255Z cvt.u64.u32 %rd278, 
%r595; 2026-02-21T09:06:24.2952310Z cvt.u64.u32 %rd279, %r596; 2026-02-21T09:06:24.2952373Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:06:24.2952429Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:06:24.2952590Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2952647Z mov.b64 {%r896, %r897}, %rd281; 2026-02-21T09:06:24.2952718Z cvt.rn.f16x2.f32 %r898, %r897, %r896; 2026-02-21T09:06:24.2952882Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2952940Z cvt.u64.u32 %rd282, %r598; 2026-02-21T09:06:24.2953005Z cvt.u64.u32 %rd283, %r599; 2026-02-21T09:06:24.2953062Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:06:24.2953128Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:06:24.2953298Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2953357Z mov.b64 {%r899, %r900}, %rd285; 2026-02-21T09:06:24.2953418Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T09:06:24.2953582Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2953645Z cvt.u64.u32 %rd286, %r600; 2026-02-21T09:06:24.2953700Z cvt.u64.u32 %rd287, %r601; 2026-02-21T09:06:24.2953756Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:06:24.2953821Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:06:24.2953986Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2954043Z mov.b64 {%r902, %r903}, %rd289; 2026-02-21T09:06:24.2954111Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T09:06:24.2954272Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2954350Z cvt.u64.u32 %rd290, %r602; 2026-02-21T09:06:24.2954404Z cvt.u64.u32 %rd291, %r603; 2026-02-21T09:06:24.2954486Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:06:24.2954544Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:06:24.2954741Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2954808Z mov.b64 {%r905, %r906}, %rd293; 2026-02-21T09:06:24.2954869Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T09:06:24.2955035Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2955139Z cvt.u64.u32 %rd294, %r604; 2026-02-21T09:06:24.2955194Z cvt.u64.u32 %rd295, %r605; 2026-02-21T09:06:24.2955250Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:06:24.2955307Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:06:24.2955480Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2955538Z mov.b64 {%r908, %r909}, %rd297; 2026-02-21T09:06:24.2955598Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T09:06:24.2955767Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2955823Z cvt.u64.u32 %rd298, %r606; 2026-02-21T09:06:24.2955878Z cvt.u64.u32 %rd299, %r607; 2026-02-21T09:06:24.2955942Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:06:24.2955998Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:06:24.2956168Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2956256Z mov.b64 {%r911, %r912}, %rd301; 2026-02-21T09:06:24.2956328Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T09:06:24.2956490Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2956546Z cvt.u64.u32 %rd302, %r608; 2026-02-21T09:06:24.2956610Z cvt.u64.u32 %rd303, %r609; 2026-02-21T09:06:24.2956665Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:06:24.2956722Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:06:24.2956895Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2956952Z mov.b64 {%r914, %r915}, %rd305; 2026-02-21T09:06:24.2957014Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T09:06:24.2957177Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2957243Z cvt.u64.u32 %rd306, 
%r610; 2026-02-21T09:06:24.2957298Z cvt.u64.u32 %rd307, %r611; 2026-02-21T09:06:24.2957355Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:06:24.2957421Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:06:24.2957581Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2957637Z mov.b64 {%r917, %r918}, %rd309; 2026-02-21T09:06:24.2957704Z cvt.rn.f16x2.f32 %r919, %r918, %r917; 2026-02-21T09:06:24.2957865Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2957921Z cvt.u64.u32 %rd310, %r612; 2026-02-21T09:06:24.2957977Z cvt.u64.u32 %rd311, %r613; 2026-02-21T09:06:24.2958040Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:06:24.2958096Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:06:24.2958257Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2958321Z mov.b64 {%r920, %r921}, %rd313; 2026-02-21T09:06:24.2958380Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T09:06:24.2958545Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2958608Z cvt.u64.u32 %rd314, %r615; 2026-02-21T09:06:24.2958664Z cvt.u64.u32 %rd315, %r616; 2026-02-21T09:06:24.2958719Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:06:24.2958773Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:06:24.2958964Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2959021Z mov.b64 {%r923, %r924}, %rd317; 2026-02-21T09:06:24.2959103Z cvt.rn.f16x2.f32 %r925, %r924, %r923; 2026-02-21T09:06:24.2959271Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2959327Z cvt.u64.u32 %rd318, %r617; 2026-02-21T09:06:24.2959382Z cvt.u64.u32 %rd319, %r618; 2026-02-21T09:06:24.2959443Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:06:24.2959498Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:06:24.2959664Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2959743Z mov.b64 {%r926, %r927}, %rd321; 2026-02-21T09:06:24.2959812Z cvt.rn.f16x2.f32 %r928, %r927, %r926; 2026-02-21T09:06:24.2959976Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2960033Z cvt.u64.u32 %rd322, %r619; 2026-02-21T09:06:24.2960095Z cvt.u64.u32 %rd323, %r620; 2026-02-21T09:06:24.2960151Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:06:24.2960208Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:06:24.2960376Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2960432Z mov.b64 {%r929, %r930}, %rd325; 2026-02-21T09:06:24.2960491Z cvt.rn.f16x2.f32 %r931, %r930, %r929; 2026-02-21T09:06:24.2960653Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2960737Z cvt.u64.u32 %rd326, %r621; 2026-02-21T09:06:24.2960795Z cvt.u64.u32 %rd327, %r622; 2026-02-21T09:06:24.2960849Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:06:24.2960911Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:06:24.2961071Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2961129Z mov.b64 {%r932, %r933}, %rd329; 2026-02-21T09:06:24.2961198Z cvt.rn.f16x2.f32 %r934, %r933, %r932; 2026-02-21T09:06:24.2961359Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2961417Z cvt.u64.u32 %rd330, %r623; 2026-02-21T09:06:24.2961474Z cvt.u64.u32 %rd331, %r624; 2026-02-21T09:06:24.2961538Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:06:24.2961595Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:06:24.2961754Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2961822Z mov.b64 {%r935, %r936}, %rd333; 2026-02-21T09:06:24.2961883Z cvt.rn.f16x2.f32 %r937, %r936, %r935; 2026-02-21T09:06:24.2962042Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2962106Z cvt.u64.u32 %rd334, 
%r625; 2026-02-21T09:06:24.2962161Z cvt.u64.u32 %rd335, %r626; 2026-02-21T09:06:24.2962217Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:06:24.2962272Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:06:24.2962441Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2962496Z mov.b64 {%r938, %r939}, %rd337; 2026-02-21T09:06:24.2962555Z cvt.rn.f16x2.f32 %r940, %r939, %r938; 2026-02-21T09:06:24.2962721Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2962776Z cvt.u64.u32 %rd338, %r627; 2026-02-21T09:06:24.2962831Z cvt.u64.u32 %rd339, %r628; 2026-02-21T09:06:24.2962894Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:06:24.2962951Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:06:24.2963113Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2963168Z mov.b64 {%r941, %r942}, %rd341; 2026-02-21T09:06:24.2963235Z cvt.rn.f16x2.f32 %r943, %r942, %r941; 2026-02-21T09:06:24.2963420Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2963474Z cvt.u64.u32 %rd342, %r629; 2026-02-21T09:06:24.2963557Z cvt.u64.u32 %rd343, %r630; 2026-02-21T09:06:24.2963613Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:06:24.2963668Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:06:24.2963836Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2963892Z mov.b64 {%r944, %r945}, %rd345; 2026-02-21T09:06:24.2963953Z cvt.rn.f16x2.f32 %r946, %r945, %r944; 2026-02-21T09:06:24.2964118Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2964221Z cvt.u64.u32 %rd346, %r632; 2026-02-21T09:06:24.2964277Z cvt.u64.u32 %rd347, %r633; 2026-02-21T09:06:24.2964333Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:06:24.2964397Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:06:24.2964560Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2964617Z mov.b64 {%r947, %r948}, %rd349; 2026-02-21T09:06:24.2964721Z cvt.rn.f16x2.f32 %r949, %r948, %r947; 2026-02-21T09:06:24.2964885Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2964939Z cvt.u64.u32 %rd350, %r634; 2026-02-21T09:06:24.2964995Z cvt.u64.u32 %rd351, %r635; 2026-02-21T09:06:24.2965058Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:06:24.2965114Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:06:24.2965311Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2965379Z mov.b64 {%r950, %r951}, %rd353; 2026-02-21T09:06:24.2965440Z cvt.rn.f16x2.f32 %r952, %r951, %r950; 2026-02-21T09:06:24.2965602Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2965667Z cvt.u64.u32 %rd354, %r636; 2026-02-21T09:06:24.2965722Z cvt.u64.u32 %rd355, %r637; 2026-02-21T09:06:24.2965777Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:06:24.2965833Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:06:24.2966004Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2966060Z mov.b64 {%r953, %r954}, %rd357; 2026-02-21T09:06:24.2966118Z cvt.rn.f16x2.f32 %r955, %r954, %r953; 2026-02-21T09:06:24.2966287Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2966343Z cvt.u64.u32 %rd358, %r638; 2026-02-21T09:06:24.2966399Z cvt.u64.u32 %rd359, %r639; 2026-02-21T09:06:24.2966463Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:06:24.2966519Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:06:24.2966681Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2966737Z mov.b64 {%r956, %r957}, %rd361; 2026-02-21T09:06:24.2966806Z cvt.rn.f16x2.f32 %r958, %r957, %r956; 2026-02-21T09:06:24.2966967Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2967024Z cvt.u64.u32 %rd362, 
%r640; 2026-02-21T09:06:24.2967085Z cvt.u64.u32 %rd363, %r641; 2026-02-21T09:06:24.2967141Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:06:24.2967197Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:06:24.2967356Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2967419Z mov.b64 {%r959, %r960}, %rd365; 2026-02-21T09:06:24.2967481Z cvt.rn.f16x2.f32 %r961, %r960, %r959; 2026-02-21T09:06:24.2967642Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2967704Z cvt.u64.u32 %rd366, %r642; 2026-02-21T09:06:24.2967759Z cvt.u64.u32 %rd367, %r643; 2026-02-21T09:06:24.2967815Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:06:24.2967904Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:06:24.2968067Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2968150Z mov.b64 {%r962, %r963}, %rd369; 2026-02-21T09:06:24.2968213Z cvt.rn.f16x2.f32 %r964, %r963, %r962; 2026-02-21T09:06:24.2968380Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2968437Z cvt.u64.u32 %rd370, %r644; 2026-02-21T09:06:24.2968490Z cvt.u64.u32 %rd371, %r645; 2026-02-21T09:06:24.2968554Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:06:24.2968609Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:06:24.2968766Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2968854Z mov.b64 {%r965, %r966}, %rd373; 2026-02-21T09:06:24.2968915Z cvt.rn.f16x2.f32 %r967, %r966, %r965; 2026-02-21T09:06:24.2969075Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2969138Z cvt.u64.u32 %rd374, %r646; 2026-02-21T09:06:24.2969193Z cvt.u64.u32 %rd375, %r647; 2026-02-21T09:06:24.2970369Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:06:24.2970433Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:06:24.2970603Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2970670Z mov.b64 {%r968, %r969}, %rd377; 2026-02-21T09:06:24.2970733Z cvt.rn.f16x2.f32 %r970, %r969, %r968; 2026-02-21T09:06:24.2970894Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2970986Z cvt.u64.u32 %rd378, %r649; 2026-02-21T09:06:24.2971046Z cvt.u64.u32 %rd379, %r650; 2026-02-21T09:06:24.2971104Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:06:24.2971161Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:06:24.2971336Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2971399Z mov.b64 {%r971, %r972}, %rd381; 2026-02-21T09:06:24.2971462Z cvt.rn.f16x2.f32 %r973, %r972, %r971; 2026-02-21T09:06:24.2971656Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2971712Z cvt.u64.u32 %rd382, %r651; 2026-02-21T09:06:24.2971768Z cvt.u64.u32 %rd383, %r652; 2026-02-21T09:06:24.2971823Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:06:24.2971886Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:06:24.2972049Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2972106Z mov.b64 {%r974, %r975}, %rd385; 2026-02-21T09:06:24.2972175Z cvt.rn.f16x2.f32 %r976, %r975, %r974; 2026-02-21T09:06:24.2972335Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2972391Z cvt.u64.u32 %rd386, %r653; 2026-02-21T09:06:24.2972455Z cvt.u64.u32 %rd387, %r654; 2026-02-21T09:06:24.2972510Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:06:24.2972566Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:06:24.2972727Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2972789Z mov.b64 {%r977, %r978}, %rd389; 2026-02-21T09:06:24.2972849Z cvt.rn.f16x2.f32 %r979, %r978, %r977; 2026-02-21T09:06:24.2973010Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2973074Z cvt.u64.u32 %rd390, 
%r655; 2026-02-21T09:06:24.2973129Z cvt.u64.u32 %rd391, %r656; 2026-02-21T09:06:24.2973185Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:06:24.2973248Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:06:24.2973409Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2973465Z mov.b64 {%r980, %r981}, %rd393; 2026-02-21T09:06:24.2973525Z cvt.rn.f16x2.f32 %r982, %r981, %r980; 2026-02-21T09:06:24.2973745Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2973801Z cvt.u64.u32 %rd394, %r657; 2026-02-21T09:06:24.2973858Z cvt.u64.u32 %rd395, %r658; 2026-02-21T09:06:24.2973923Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:06:24.2973978Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:06:24.2974136Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2974201Z mov.b64 {%r983, %r984}, %rd397; 2026-02-21T09:06:24.2974260Z cvt.rn.f16x2.f32 %r985, %r984, %r983; 2026-02-21T09:06:24.2974427Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2974504Z cvt.u64.u32 %rd398, %r659; 2026-02-21T09:06:24.2974568Z cvt.u64.u32 %rd399, %r660; 2026-02-21T09:06:24.2974624Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:06:24.2974719Z or.b64 %rd401, %rd398, %rd400; 2026-02-21T09:06:24.2974890Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2974946Z mov.b64 {%r986, %r987}, %rd401; 2026-02-21T09:06:24.2975060Z cvt.rn.f16x2.f32 %r988, %r987, %r986; 2026-02-21T09:06:24.2975233Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2975290Z cvt.u64.u32 %rd402, %r661; 2026-02-21T09:06:24.2975346Z cvt.u64.u32 %rd403, %r662; 2026-02-21T09:06:24.2975404Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:06:24.2975469Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:06:24.2975654Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 
2026-02-21T09:06:24.2975716Z mov.b64 {%r989, %r990}, %rd405; 2026-02-21T09:06:24.2975783Z cvt.rn.f16x2.f32 %r991, %r990, %r989; 2026-02-21T09:06:24.2975941Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2975998Z cvt.u64.u32 %rd406, %r663; 2026-02-21T09:06:24.2976060Z cvt.u64.u32 %rd407, %r664; 2026-02-21T09:06:24.2976117Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:06:24.2976174Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:06:24.2976333Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2976398Z mov.b64 {%r992, %r993}, %rd409; 2026-02-21T09:06:24.2976458Z cvt.rn.f16x2.f32 %r994, %r993, %r992; 2026-02-21T09:06:24.2976617Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2976682Z cvt.u64.u32 %rd410, %r666; 2026-02-21T09:06:24.2976738Z cvt.u64.u32 %rd411, %r667; 2026-02-21T09:06:24.2976796Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:06:24.2976860Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:06:24.2977026Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2977084Z mov.b64 {%r995, %r996}, %rd413; 2026-02-21T09:06:24.2977143Z cvt.rn.f16x2.f32 %r997, %r996, %r995; 2026-02-21T09:06:24.2977307Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2977362Z cvt.u64.u32 %rd414, %r668; 2026-02-21T09:06:24.2977417Z cvt.u64.u32 %rd415, %r669; 2026-02-21T09:06:24.2977479Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:06:24.2977536Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:06:24.2977693Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2977757Z mov.b64 {%r998, %r999}, %rd417; 2026-02-21T09:06:24.2977825Z cvt.rn.f16x2.f32 %r1000, %r999, %r998; 2026-02-21T09:06:24.2977984Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2978038Z cvt.u64.u32 %rd418, 
%r670; 2026-02-21T09:06:24.2978100Z cvt.u64.u32 %rd419, %r671; 2026-02-21T09:06:24.2978191Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:06:24.2978248Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:06:24.2978417Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2978479Z mov.b64 {%r1001, %r1002}, %rd421; 2026-02-21T09:06:24.2978548Z cvt.rn.f16x2.f32 %r1003, %r1002, %r1001; 2026-02-21T09:06:24.2978717Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2978772Z cvt.u64.u32 %rd422, %r672; 2026-02-21T09:06:24.2978828Z cvt.u64.u32 %rd423, %r673; 2026-02-21T09:06:24.2978883Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:06:24.2978947Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:06:24.2979130Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2979190Z mov.b64 {%r1004, %r1005}, %rd425; 2026-02-21T09:06:24.2979264Z cvt.rn.f16x2.f32 %r1006, %r1005, %r1004; 2026-02-21T09:06:24.2979430Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2979487Z cvt.u64.u32 %rd426, %r674; 2026-02-21T09:06:24.2979549Z cvt.u64.u32 %rd427, %r675; 2026-02-21T09:06:24.2979636Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:06:24.2979696Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:06:24.2979865Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2979932Z mov.b64 {%r1007, %r1008}, %rd429; 2026-02-21T09:06:24.2979998Z cvt.rn.f16x2.f32 %r1009, %r1008, %r1007; 2026-02-21T09:06:24.2980178Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2980248Z cvt.u64.u32 %rd430, %r676; 2026-02-21T09:06:24.2980305Z cvt.u64.u32 %rd431, %r677; 2026-02-21T09:06:24.2980360Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:06:24.2980423Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:06:24.2980588Z .loc 1 58 27 // 
c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2980645Z mov.b64 {%r1010, %r1011}, %rd433; 2026-02-21T09:06:24.2980709Z cvt.rn.f16x2.f32 %r1012, %r1011, %r1010; 2026-02-21T09:06:24.2980879Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2980935Z cvt.u64.u32 %rd434, %r678; 2026-02-21T09:06:24.2980991Z cvt.u64.u32 %rd435, %r679; 2026-02-21T09:06:24.2981055Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:06:24.2981112Z or.b64 %rd437, %rd434, %rd436; 2026-02-21T09:06:24.2981278Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2981344Z mov.b64 {%r1013, %r1014}, %rd437; 2026-02-21T09:06:24.2981408Z cvt.rn.f16x2.f32 %r1015, %r1014, %r1013; 2026-02-21T09:06:24.2981569Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2981627Z cvt.u64.u32 %rd438, %r680; 2026-02-21T09:06:24.2981691Z cvt.u64.u32 %rd439, %r681; 2026-02-21T09:06:24.2981747Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:06:24.2981803Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:06:24.2981976Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2982034Z mov.b64 {%r1016, %r1017}, %rd441; 2026-02-21T09:06:24.2982100Z cvt.rn.f16x2.f32 %r1018, %r1017, %r1016; 2026-02-21T09:06:24.2982271Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2982328Z cvt.u64.u32 %rd442, %r683; 2026-02-21T09:06:24.2982385Z cvt.u64.u32 %rd443, %r684; 2026-02-21T09:06:24.2982444Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:06:24.2982512Z or.b64 %rd445, %rd442, %rd444; 2026-02-21T09:06:24.2982682Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2982763Z mov.b64 {%r1019, %r1020}, %rd445; 2026-02-21T09:06:24.2982838Z cvt.rn.f16x2.f32 %r1021, %r1020, %r1019; 2026-02-21T09:06:24.2983009Z .loc 1 56 52 // 
c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2983068Z cvt.u64.u32 %rd446, %r685; 2026-02-21T09:06:24.2983133Z cvt.u64.u32 %rd447, %r686; 2026-02-21T09:06:24.2983192Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:06:24.2983251Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:06:24.2983420Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2983488Z mov.b64 {%r1022, %r1023}, %rd449; 2026-02-21T09:06:24.2983555Z cvt.rn.f16x2.f32 %r1024, %r1023, %r1022; 2026-02-21T09:06:24.2983747Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2983813Z cvt.u64.u32 %rd450, %r687; 2026-02-21T09:06:24.2983872Z cvt.u64.u32 %rd451, %r688; 2026-02-21T09:06:24.2983929Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:06:24.2983997Z or.b64 %rd453, %rd450, %rd452; 2026-02-21T09:06:24.2984166Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2984248Z mov.b64 {%r1025, %r1026}, %rd453; 2026-02-21T09:06:24.2984316Z cvt.rn.f16x2.f32 %r1027, %r1026, %r1025; 2026-02-21T09:06:24.2984495Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2984554Z cvt.u64.u32 %rd454, %r689; 2026-02-21T09:06:24.2984611Z cvt.u64.u32 %rd455, %r690; 2026-02-21T09:06:24.2984702Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:06:24.2984789Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:06:24.2984962Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2985028Z mov.b64 {%r1028, %r1029}, %rd457; 2026-02-21T09:06:24.2985094Z cvt.rn.f16x2.f32 %r1030, %r1029, %r1028; 2026-02-21T09:06:24.2985268Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2985325Z cvt.u64.u32 %rd458, %r691; 2026-02-21T09:06:24.2985390Z cvt.u64.u32 %rd459, %r692; 2026-02-21T09:06:24.2985451Z shl.b64 %rd460, %rd459, 32; 2026-02-21T09:06:24.2985510Z or.b64 
%rd461, %rd458, %rd460; 2026-02-21T09:06:24.2985681Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2985741Z mov.b64 {%r1031, %r1032}, %rd461; 2026-02-21T09:06:24.2985807Z cvt.rn.f16x2.f32 %r1033, %r1032, %r1031; 2026-02-21T09:06:24.2985985Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2986044Z cvt.u64.u32 %rd462, %r693; 2026-02-21T09:06:24.2986103Z cvt.u64.u32 %rd463, %r694; 2026-02-21T09:06:24.2986162Z shl.b64 %rd464, %rd463, 32; 2026-02-21T09:06:24.2986229Z or.b64 %rd465, %rd462, %rd464; 2026-02-21T09:06:24.2986399Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2986460Z mov.b64 {%r1034, %r1035}, %rd465; 2026-02-21T09:06:24.2986534Z cvt.rn.f16x2.f32 %r1036, %r1035, %r1034; 2026-02-21T09:06:24.2986704Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2986764Z cvt.u64.u32 %rd466, %r695; 2026-02-21T09:06:24.2986830Z cvt.u64.u32 %rd467, %r696; 2026-02-21T09:06:24.2986891Z shl.b64 %rd468, %rd467, 32; 2026-02-21T09:06:24.2986951Z or.b64 %rd469, %rd466, %rd468; 2026-02-21T09:06:24.2987122Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2987192Z mov.b64 {%r1037, %r1038}, %rd469; 2026-02-21T09:06:24.2987259Z cvt.rn.f16x2.f32 %r1039, %r1038, %r1037; 2026-02-21T09:06:24.2987427Z .loc 1 56 52 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:56:52 2026-02-21T09:06:24.2987524Z cvt.u64.u32 %rd470, %r697; 2026-02-21T09:06:24.2987581Z cvt.u64.u32 %rd471, %r698; 2026-02-21T09:06:24.2987639Z shl.b64 %rd472, %rd471, 32; 2026-02-21T09:06:24.2987705Z or.b64 %rd473, %rd470, %rd472; 2026-02-21T09:06:24.2987879Z .loc 1 58 27 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:58:27 2026-02-21T09:06:24.2987941Z mov.b64 {%r1040, %r1041}, %rd473; 2026-02-21T09:06:24.2988008Z cvt.rn.f16x2.f32 %r1042, %r1041, %r1040; 
2026-02-21T09:06:24.2988187Z .loc 1 59 82 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:59:82 2026-02-21T09:06:24.2988286Z st.shared.v4.b32 [%r23], {%r853, %r865, %r877, %r889}; 2026-02-21T09:06:24.2988382Z st.shared.v4.b32 [%r24], {%r901, %r913, %r925, %r937}; 2026-02-21T09:06:24.2988473Z bar.sync 0; 2026-02-21T09:06:24.2988535Z // begin inline asm 2026-02-21T09:06:24.2988697Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r780, %r784, %r788, %r792}, [%r704]; 2026-02-21T09:06:24.2988764Z // end inline asm 2026-02-21T09:06:24.2988826Z // begin inline asm 2026-02-21T09:06:24.2988977Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r796, %r800, %r804, %r808}, [%r709]; 2026-02-21T09:06:24.2989035Z // end inline asm 2026-02-21T09:06:24.2989136Z bar.sync 0; 2026-02-21T09:06:24.2989229Z st.shared.v4.b32 [%r23], {%r949, %r961, %r973, %r985}; 2026-02-21T09:06:24.2989328Z st.shared.v4.b32 [%r24], {%r997, %r1009, %r1021, %r1033}; 2026-02-21T09:06:24.2989390Z bar.sync 0; 2026-02-21T09:06:24.2989446Z // begin inline asm 2026-02-21T09:06:24.2989593Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r812, %r816, %r820, %r824}, [%r704]; 2026-02-21T09:06:24.2989650Z // end inline asm 2026-02-21T09:06:24.2989745Z // begin inline asm 2026-02-21T09:06:24.2989899Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r828, %r832, %r836, %r840}, [%r709]; 2026-02-21T09:06:24.2989957Z // end inline asm 2026-02-21T09:06:24.2990017Z bar.sync 0; 2026-02-21T09:06:24.2990104Z st.shared.v4.b32 [%r23], {%r856, %r868, %r880, %r892}; 2026-02-21T09:06:24.2990187Z st.shared.v4.b32 [%r24], {%r904, %r916, %r928, %r940}; 2026-02-21T09:06:24.2990245Z bar.sync 0; 2026-02-21T09:06:24.2990300Z // begin inline asm 2026-02-21T09:06:24.2990439Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r781, %r785, %r789, %r793}, [%r704]; 2026-02-21T09:06:24.2990492Z // end inline asm 2026-02-21T09:06:24.2990554Z // begin inline asm 2026-02-21T09:06:24.2990691Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r797, %r801, %r805, %r809}, 
[%r709]; 2026-02-21T09:06:24.2990744Z // end inline asm 2026-02-21T09:06:24.2990801Z bar.sync 0; 2026-02-21T09:06:24.2990883Z st.shared.v4.b32 [%r23], {%r952, %r964, %r976, %r988}; 2026-02-21T09:06:24.2990975Z st.shared.v4.b32 [%r24], {%r1000, %r1012, %r1024, %r1036}; 2026-02-21T09:06:24.2991028Z bar.sync 0; 2026-02-21T09:06:24.2991087Z // begin inline asm 2026-02-21T09:06:24.2991226Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r813, %r817, %r821, %r825}, [%r704]; 2026-02-21T09:06:24.2991278Z // end inline asm 2026-02-21T09:06:24.2991340Z // begin inline asm 2026-02-21T09:06:24.2991476Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r829, %r833, %r837, %r841}, [%r709]; 2026-02-21T09:06:24.2991528Z // end inline asm 2026-02-21T09:06:24.2991585Z bar.sync 0; 2026-02-21T09:06:24.2991670Z st.shared.v4.b32 [%r23], {%r859, %r871, %r883, %r895}; 2026-02-21T09:06:24.2991751Z st.shared.v4.b32 [%r24], {%r907, %r919, %r931, %r943}; 2026-02-21T09:06:24.2991801Z bar.sync 0; 2026-02-21T09:06:24.2991861Z // begin inline asm 2026-02-21T09:06:24.2991999Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r782, %r786, %r790, %r794}, [%r704]; 2026-02-21T09:06:24.2992051Z // end inline asm 2026-02-21T09:06:24.2992111Z // begin inline asm 2026-02-21T09:06:24.2992249Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r798, %r802, %r806, %r810}, [%r709]; 2026-02-21T09:06:24.2992302Z // end inline asm 2026-02-21T09:06:24.2992353Z bar.sync 0; 2026-02-21T09:06:24.2992440Z st.shared.v4.b32 [%r23], {%r955, %r967, %r979, %r991}; 2026-02-21T09:06:24.2992561Z st.shared.v4.b32 [%r24], {%r1003, %r1015, %r1027, %r1039}; 2026-02-21T09:06:24.2992614Z bar.sync 0; 2026-02-21T09:06:24.2992673Z // begin inline asm 2026-02-21T09:06:24.2992811Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r814, %r818, %r822, %r826}, [%r704]; 2026-02-21T09:06:24.2992863Z // end inline asm 2026-02-21T09:06:24.2992923Z // begin inline asm 2026-02-21T09:06:24.2993058Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r830, %r834, %r838, %r842}, [%r709]; 
2026-02-21T09:06:24.2993109Z // end inline asm 2026-02-21T09:06:24.2993160Z bar.sync 0; 2026-02-21T09:06:24.2993248Z st.shared.v4.b32 [%r23], {%r862, %r874, %r886, %r898}; 2026-02-21T09:06:24.2993331Z st.shared.v4.b32 [%r24], {%r910, %r922, %r934, %r946}; 2026-02-21T09:06:24.2993405Z bar.sync 0; 2026-02-21T09:06:24.2993464Z // begin inline asm 2026-02-21T09:06:24.2993601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r783, %r787, %r791, %r795}, [%r704]; 2026-02-21T09:06:24.2993653Z // end inline asm 2026-02-21T09:06:24.2993706Z // begin inline asm 2026-02-21T09:06:24.2993850Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r799, %r803, %r807, %r811}, [%r709]; 2026-02-21T09:06:24.2993902Z // end inline asm 2026-02-21T09:06:24.2993954Z bar.sync 0; 2026-02-21T09:06:24.2994068Z st.shared.v4.b32 [%r23], {%r958, %r970, %r982, %r994}; 2026-02-21T09:06:24.2994160Z st.shared.v4.b32 [%r24], {%r1006, %r1018, %r1030, %r1042}; 2026-02-21T09:06:24.2994211Z bar.sync 0; 2026-02-21T09:06:24.2994270Z // begin inline asm 2026-02-21T09:06:24.2994406Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r815, %r819, %r823, %r827}, [%r704]; 2026-02-21T09:06:24.2994459Z // end inline asm 2026-02-21T09:06:24.2994513Z // begin inline asm 2026-02-21T09:06:24.2994713Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r831, %r835, %r839, %r843}, [%r709]; 2026-02-21T09:06:24.2994770Z // end inline asm 2026-02-21T09:06:24.2994824Z // begin inline asm 2026-02-21T09:06:24.2994934Z st.global.v4.b32 [ %rd198 + 0 ], { %r780, %r781, %r782, %r783 }; 2026-02-21T09:06:24.2994989Z // end inline asm 2026-02-21T09:06:24.2995042Z // begin inline asm 2026-02-21T09:06:24.2995140Z st.global.v4.b32 [ %rd199 + 0 ], { %r784, %r785, %r786, %r787 }; 2026-02-21T09:06:24.2995201Z // end inline asm 2026-02-21T09:06:24.2995257Z // begin inline asm 2026-02-21T09:06:24.2995352Z st.global.v4.b32 [ %rd200 + 0 ], { %r788, %r789, %r790, %r791 }; 2026-02-21T09:06:24.2995412Z // end inline asm 2026-02-21T09:06:24.2995465Z // begin inline asm 
2026-02-21T09:06:24.2995555Z st.global.v4.b32 [ %rd201 + 0 ], { %r792, %r793, %r794, %r795 }; 2026-02-21T09:06:24.2995613Z // end inline asm 2026-02-21T09:06:24.2995667Z // begin inline asm 2026-02-21T09:06:24.2995759Z st.global.v4.b32 [ %rd202 + 0 ], { %r796, %r797, %r798, %r799 }; 2026-02-21T09:06:24.2995813Z // end inline asm 2026-02-21T09:06:24.2995873Z // begin inline asm 2026-02-21T09:06:24.2995963Z st.global.v4.b32 [ %rd203 + 0 ], { %r800, %r801, %r802, %r803 }; 2026-02-21T09:06:24.2996015Z // end inline asm 2026-02-21T09:06:24.2996076Z // begin inline asm 2026-02-21T09:06:24.2996169Z st.global.v4.b32 [ %rd204 + 0 ], { %r804, %r805, %r806, %r807 }; 2026-02-21T09:06:24.2996222Z // end inline asm 2026-02-21T09:06:24.2996276Z // begin inline asm 2026-02-21T09:06:24.2996376Z st.global.v4.b32 [ %rd205 + 0 ], { %r808, %r809, %r810, %r811 }; 2026-02-21T09:06:24.2996430Z // end inline asm 2026-02-21T09:06:24.2996485Z // begin inline asm 2026-02-21T09:06:24.2996586Z st.global.v4.b32 [ %rd206 + 0 ], { %r812, %r813, %r814, %r815 }; 2026-02-21T09:06:24.2996641Z // end inline asm 2026-02-21T09:06:24.2996696Z // begin inline asm 2026-02-21T09:06:24.2996787Z st.global.v4.b32 [ %rd207 + 0 ], { %r816, %r817, %r818, %r819 }; 2026-02-21T09:06:24.2996852Z // end inline asm 2026-02-21T09:06:24.2996907Z // begin inline asm 2026-02-21T09:06:24.2996998Z st.global.v4.b32 [ %rd208 + 0 ], { %r820, %r821, %r822, %r823 }; 2026-02-21T09:06:24.2997059Z // end inline asm 2026-02-21T09:06:24.2997113Z // begin inline asm 2026-02-21T09:06:24.2997203Z st.global.v4.b32 [ %rd209 + 0 ], { %r824, %r825, %r826, %r827 }; 2026-02-21T09:06:24.2997293Z // end inline asm 2026-02-21T09:06:24.2997345Z // begin inline asm 2026-02-21T09:06:24.2997435Z st.global.v4.b32 [ %rd210 + 0 ], { %r828, %r829, %r830, %r831 }; 2026-02-21T09:06:24.2997489Z // end inline asm 2026-02-21T09:06:24.2997552Z // begin inline asm 2026-02-21T09:06:24.2997641Z st.global.v4.b32 [ %rd211 + 0 ], { %r832, %r833, %r834, %r835 }; 
2026-02-21T09:06:24.2997693Z // end inline asm 2026-02-21T09:06:24.2997753Z // begin inline asm 2026-02-21T09:06:24.2997843Z st.global.v4.b32 [ %rd212 + 0 ], { %r836, %r837, %r838, %r839 }; 2026-02-21T09:06:24.2997894Z // end inline asm 2026-02-21T09:06:24.2997949Z // begin inline asm 2026-02-21T09:06:24.2998077Z st.global.v4.b32 [ %rd213 + 0 ], { %r840, %r841, %r842, %r843 }; 2026-02-21T09:06:24.2998129Z // end inline asm 2026-02-21T09:06:24.2998209Z $L__BB0_8: // %._crit_edge 2026-02-21T09:06:24.2998388Z .loc 1 30 4 // c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py:30:4 2026-02-21T09:06:24.2998442Z bar.sync 0; 2026-02-21T09:06:24.2998494Z // begin inline asm 2026-02-21T09:06:24.2998616Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1043, 128; 2026-02-21T09:06:24.2998694Z // end inline asm 2026-02-21T09:06:24.2998749Z ret; 2026-02-21T09:06:24.2998803Z $L__tmp0: 2026-02-21T09:06:24.2998865Z $L__func_end0: 2026-02-21T09:06:24.2998947Z // -- End function 2026-02-21T09:06:24.2998997Z } 2026-02-21T09:06:24.2999202Z .file 1 "/tmp/torchinductor_root/6b/c6bpaee653bw7p7whrlrtpeildslsqkn4mdpx24hkr7kdcobeqlm.py" 2026-02-21T09:06:24.2999263Z .section .debug_abbrev 2026-02-21T09:06:24.2999341Z { 2026-02-21T09:06:24.2999428Z .b8 1 // Abbreviation Code 2026-02-21T09:06:24.2999519Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:06:24.2999594Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:24.2999672Z .b8 37 // DW_AT_producer 2026-02-21T09:06:24.2999753Z .b8 8 // DW_FORM_string 2026-02-21T09:06:24.2999824Z .b8 19 // DW_AT_language 2026-02-21T09:06:24.2999897Z .b8 5 // DW_FORM_data2 2026-02-21T09:06:24.2999978Z .b8 3 // DW_AT_name 2026-02-21T09:06:24.3000049Z .b8 8 // DW_FORM_string 2026-02-21T09:06:24.3000122Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:06:24.3000199Z .b8 6 // DW_FORM_data4 2026-02-21T09:06:24.3000271Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:06:24.3000342Z .b8 8 // DW_FORM_string 2026-02-21T09:06:24.3000409Z .b8 0 // EOM(1) 2026-02-21T09:06:24.3000484Z .b8 0 // 
EOM(2) 2026-02-21T09:06:24.3000548Z .b8 0 // EOM(3) 2026-02-21T09:06:24.3000597Z } 2026-02-21T09:06:24.3000662Z .section .debug_info 2026-02-21T09:06:24.3000711Z { 2026-02-21T09:06:24.3000791Z .b32 104 // Length of Unit 2026-02-21T09:06:24.3000871Z .b8 2 // DWARF version number 2026-02-21T09:06:24.3000926Z .b8 0 2026-02-21T09:06:24.3001038Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:06:24.3001121Z .b8 8 // Address Size (in bytes) 2026-02-21T09:06:24.3001225Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:06:24.3001301Z .b8 116 // DW_AT_producer 2026-02-21T09:06:24.3001353Z .b8 114 2026-02-21T09:06:24.3001410Z .b8 105 2026-02-21T09:06:24.3001457Z .b8 116 2026-02-21T09:06:24.3001506Z .b8 111 2026-02-21T09:06:24.3001553Z .b8 110 2026-02-21T09:06:24.3001608Z .b8 0 2026-02-21T09:06:24.3001698Z .b8 2 // DW_AT_language 2026-02-21T09:06:24.3001746Z .b8 0 2026-02-21T09:06:24.3001824Z .b8 99 // DW_AT_name 2026-02-21T09:06:24.3001874Z .b8 54 2026-02-21T09:06:24.3001924Z .b8 98 2026-02-21T09:06:24.3001972Z .b8 112 2026-02-21T09:06:24.3002026Z .b8 97 2026-02-21T09:06:24.3002074Z .b8 101 2026-02-21T09:06:24.3002121Z .b8 101 2026-02-21T09:06:24.3002175Z .b8 54 2026-02-21T09:06:24.3002222Z .b8 53 2026-02-21T09:06:24.3002269Z .b8 51 2026-02-21T09:06:24.3002317Z .b8 98 2026-02-21T09:06:24.3002374Z .b8 119 2026-02-21T09:06:24.3002421Z .b8 55 2026-02-21T09:06:24.3002468Z .b8 112 2026-02-21T09:06:24.3002515Z .b8 55 2026-02-21T09:06:24.3002573Z .b8 119 2026-02-21T09:06:24.3002644Z .b8 104 2026-02-21T09:06:24.3002692Z .b8 114 2026-02-21T09:06:24.3002747Z .b8 108 2026-02-21T09:06:24.3002797Z .b8 114 2026-02-21T09:06:24.3002845Z .b8 116 2026-02-21T09:06:24.3002893Z .b8 112 2026-02-21T09:06:24.3002950Z .b8 101 2026-02-21T09:06:24.3003000Z .b8 105 2026-02-21T09:06:24.3003049Z .b8 108 2026-02-21T09:06:24.3003106Z .b8 100 2026-02-21T09:06:24.3003156Z .b8 115 2026-02-21T09:06:24.3003204Z .b8 108 2026-02-21T09:06:24.3003253Z .b8 115 
2026-02-21T09:06:24.3003311Z .b8 113 2026-02-21T09:06:24.3003379Z .b8 107 2026-02-21T09:06:24.3003429Z .b8 110 2026-02-21T09:06:24.3003481Z .b8 52 2026-02-21T09:06:24.3003537Z .b8 109 2026-02-21T09:06:24.3003585Z .b8 100 2026-02-21T09:06:24.3003633Z .b8 112 2026-02-21T09:06:24.3003688Z .b8 120 2026-02-21T09:06:24.3003736Z .b8 50 2026-02-21T09:06:24.3003784Z .b8 52 2026-02-21T09:06:24.3003831Z .b8 104 2026-02-21T09:06:24.3003887Z .b8 107 2026-02-21T09:06:24.3003936Z .b8 114 2026-02-21T09:06:24.3003983Z .b8 55 2026-02-21T09:06:24.3004059Z .b8 107 2026-02-21T09:06:24.3004111Z .b8 100 2026-02-21T09:06:24.3004160Z .b8 99 2026-02-21T09:06:24.3004210Z .b8 111 2026-02-21T09:06:24.3004264Z .b8 98 2026-02-21T09:06:24.3004313Z .b8 101 2026-02-21T09:06:24.3004362Z .b8 113 2026-02-21T09:06:24.3004417Z .b8 108 2026-02-21T09:06:24.3004467Z .b8 109 2026-02-21T09:06:24.3004516Z .b8 46 2026-02-21T09:06:24.3004566Z .b8 112 2026-02-21T09:06:24.3004622Z .b8 121 2026-02-21T09:06:24.3004707Z .b8 0 2026-02-21T09:06:24.3004799Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:06:24.3004880Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:06:24.3004931Z .b8 116 2026-02-21T09:06:24.3004979Z .b8 109 2026-02-21T09:06:24.3005027Z .b8 112 2026-02-21T09:06:24.3005082Z .b8 47 2026-02-21T09:06:24.3005129Z .b8 116 2026-02-21T09:06:24.3005177Z .b8 111 2026-02-21T09:06:24.3005225Z .b8 114 2026-02-21T09:06:24.3005283Z .b8 99 2026-02-21T09:06:24.3005331Z .b8 104 2026-02-21T09:06:24.3005380Z .b8 105 2026-02-21T09:06:24.3005436Z .b8 110 2026-02-21T09:06:24.3005486Z .b8 100 2026-02-21T09:06:24.3005535Z .b8 117 2026-02-21T09:06:24.3005583Z .b8 99 2026-02-21T09:06:24.3005640Z .b8 116 2026-02-21T09:06:24.3005687Z .b8 111 2026-02-21T09:06:24.3005734Z .b8 114 2026-02-21T09:06:24.3005789Z .b8 95 2026-02-21T09:06:24.3005839Z .b8 114 2026-02-21T09:06:24.3005887Z .b8 111 2026-02-21T09:06:24.3005934Z .b8 111 2026-02-21T09:06:24.3005992Z .b8 116 2026-02-21T09:06:24.3006040Z .b8 47 2026-02-21T09:06:24.3006088Z .b8 54 
2026-02-21T09:06:24.3006136Z .b8 98 2026-02-21T09:06:24.3006192Z .b8 0 2026-02-21T09:06:24.3006240Z } 2026-02-21T09:06:24.3006303Z .section .debug_macinfo { } 2026-02-21T09:06:24.3006307Z 2026-02-21T09:06:24.3006389Z ================================================================ 2026-02-21T09:06:24.3006487Z please share the reproducer above with Triton project. 2026-02-21T09:06:24.6480037Z 2026-02-21T09:06:24.6482603Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 17.8 configs/s 2026-02-21T09:06:25.1535785Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1931.9 2026-02-21T09:06:25.1540108Z configs/s 2026-02-21T09:06:25.2009067Z [85s] Generation 3 complete: 2026-02-21T09:06:25.2010535Z error=19 2026-02-21T09:06:25.3412946Z ok=70 2026-02-21T09:06:25.3413094Z min=0.0348 2026-02-21T09:06:25.3413229Z mid=0.0860 2026-02-21T09:06:25.3413368Z max=13.1666 2026-02-21T09:06:25.3413517Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:06:25.3413792Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:06:25.3414030Z 'l2_groupings': [2], 2026-02-21T09:06:25.3414221Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:06:25.3414426Z 'loop_orders': [[1, 0]], 2026-02-21T09:06:25.3414595Z 'num_stages': 7, 2026-02-21T09:06:25.3414926Z 'num_warps': 8, 2026-02-21T09:06:25.3415091Z 'pid_type': 'flat', 2026-02-21T09:06:25.3415273Z 'range_flattens': [None, None], 2026-02-21T09:06:25.3415470Z 'range_multi_buffers': [None, None], 2026-02-21T09:06:25.3415800Z 'range_num_stages': [0, 0], 2026-02-21T09:06:25.3415978Z 'range_unroll_factors': [0, 0], 2026-02-21T09:06:25.3416181Z 'range_warp_specializes': [None, None]} 2026-02-21T09:06:25.3416415Z [85s] Fitting surrogate: 369 points, 369 targets 2026-02-21T09:06:26.4544208Z [86s] Generation 4 starting: 84 neighbors, 5 active search path(s) 2026-02-21T09:06:37.5835585Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 2.3 configs/s 2026-02-21T09:06:41.6673615Z 
[102s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:06:41.6673966Z 2026-02-21T09:06:41.6676771Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:06:41.6677897Z 2026-02-21T09:06:41.6678003Z ================================================================ 2026-02-21T09:06:41.6678230Z Internal Triton PTX codegen error 2026-02-21T09:06:41.6678416Z `ptxas` stderr: 2026-02-21T09:06:41.6678834Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:06:41.6679325Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:41.6679475Z 2026-02-21T09:06:41.6679889Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpe4zruyji.ptx -o /tmp/tmpe4zruyji.ptx.o 2026-02-21T09:06:41.6680344Z 2026-02-21T09:06:41.6680347Z 2026-02-21T09:06:41.6680411Z // 2026-02-21T09:06:41.6680581Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:06:41.6680757Z // 2026-02-21T09:06:41.6680834Z 2026-02-21T09:06:41.6680889Z .version 8.7 2026-02-21T09:06:41.6681021Z .target sm_100a 2026-02-21T09:06:41.6681167Z .address_size 64 2026-02-21T09:06:41.6681249Z 2026-02-21T09:06:41.6681380Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:06:41.6681630Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:06:41.6681878Z // @_helion_matmul 2026-02-21T09:06:41.6682083Z .visible .entry _helion_matmul( 2026-02-21T09:06:41.6682312Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:06:41.6682570Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:06:41.6682835Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:06:41.6683099Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:06:41.6683356Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:06:41.6683573Z ) 2026-02-21T09:06:41.6683703Z .reqntid 256 2026-02-21T09:06:41.6683847Z .maxnreg 32 2026-02-21T09:06:41.6683974Z { 2026-02-21T09:06:41.6684116Z .reg .pred %p<127>; 2026-02-21T09:06:41.6684268Z .reg .b32 %r<955>; 2026-02-21T09:06:41.6684425Z .reg .b64 %rd<334>; 2026-02-21T09:06:41.6684903Z .loc 1 19 0 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:19:0 2026-02-21T09:06:41.6685224Z $L__func_begin0: 2026-02-21T09:06:41.6685500Z .loc 1 19 0 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:19:0 2026-02-21T09:06:41.6685747Z 
2026-02-21T09:06:41.6685806Z // %bb.0: 2026-02-21T09:06:41.6685978Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:06:41.6686173Z $L__tmp0: 2026-02-21T09:06:41.6686424Z .loc 1 19 0 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:19 2026-02-21T09:06:41.6686719Z mov.u32 %r1, %tid.x; 2026-02-21T09:06:41.6686915Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:06:41.6687239Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:06:41.6687409Z mov.b32 %r73, global_smem; 2026-02-21T09:06:41.6687580Z // begin inline asm 2026-02-21T09:06:41.6687849Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r73], 256; 2026-02-21T09:06:41.6688120Z // end inline asm 2026-02-21T09:06:41.6688285Z ld.param.b64 %rd46, [_helion_matmul_param_3]; 2026-02-21T09:06:41.6688487Z bar.sync 0; 2026-02-21T09:06:41.6688637Z ld.shared.b32 %r931, [global_smem]; 2026-02-21T09:06:41.6688860Z bar.sync 0; 2026-02-21T09:06:41.6689003Z // begin inline asm 2026-02-21T09:06:41.6689231Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:06:41.6689522Z // end inline asm 2026-02-21T09:06:41.6689774Z .loc 1 21 67 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:21:67 2026-02-21T09:06:41.6690079Z mov.u32 %r954, %ctaid.x; 2026-02-21T09:06:41.6690231Z mov.u32 %r261, %ctaid.y; 2026-02-21T09:06:41.6690421Z mov.u32 %r262, %ctaid.z; 2026-02-21T09:06:41.6690573Z mov.u32 %r263, %nctaid.x; 2026-02-21T09:06:41.6690733Z mov.u32 %r264, %nctaid.y; 2026-02-21T09:06:41.6690888Z mad.lo.s32 %r265, %r262, %r264, %r261; 2026-02-21T09:06:41.6691072Z mad.lo.s32 %r266, %r265, %r263, %r954; 2026-02-21T09:06:41.6691237Z shl.b32 %r267, %r266, 8; 2026-02-21T09:06:41.6691385Z cvt.s64.s32 %rd47, %r267; 2026-02-21T09:06:41.6691543Z add.s64 %rd19, %rd46, %rd47; 2026-02-21T09:06:41.6691697Z shl.b32 %r268, %r1, 2; 2026-02-21T09:06:41.6692145Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:06:41.6692371Z `ptxas` stderr: 
2026-02-21T09:06:41.6692771Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:06:41.6693228Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:41.6693371Z 2026-02-21T09:06:41.6693746Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpe4zruyji.ptx -o /tmp/tmpe4zruyji.ptx.o 2026-02-21T09:06:41.6694180Z 2026-02-21T09:06:41.6694302Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:06:41.6694538Z add.s32 %r74, %r73, %r268; 2026-02-21T09:06:41.6694740Z mov.b32 %r83, 0; 2026-02-21T09:06:41.6694873Z // begin inline asm 2026-02-21T09:06:41.6695026Z @%p4 st.shared.b32 [ %r74 + 0 ], %r83; 2026-02-21T09:06:41.6695197Z // end inline asm 2026-02-21T09:06:41.6695356Z bar.warp.sync -1; 2026-02-21T09:06:41.6695516Z setp.eq.b32 %p119, %r1, 0; 2026-02-21T09:06:41.6695676Z cvt.u64.u32 %rd4, %r73; 2026-02-21T09:06:41.6695838Z // begin inline asm 2026-02-21T09:06:41.6696099Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:06:41.6696383Z // end inline asm 2026-02-21T09:06:41.6696527Z // begin inline asm 2026-02-21T09:06:41.6696747Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:41.6697010Z // end inline asm 2026-02-21T09:06:41.6697137Z mov.b32 %r76, 16; 2026-02-21T09:06:41.6697274Z // begin inline asm 2026-02-21T09:06:41.6697506Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:06:41.6697818Z // end inline asm 2026-02-21T09:06:41.6697952Z mov.b32 %r77, 128; 2026-02-21T09:06:41.6698083Z // begin inline asm 2026-02-21T09:06:41.6698317Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r77; 2026-02-21T09:06:41.6698578Z // end inline asm 
2026-02-21T09:06:41.6698715Z mov.b32 %r78, 2048; 2026-02-21T09:06:41.6698851Z // begin inline asm 2026-02-21T09:06:41.6699092Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r78; 2026-02-21T09:06:41.6699363Z // end inline asm 2026-02-21T09:06:41.6699492Z mov.b32 %r79, 4096; 2026-02-21T09:06:41.6699635Z // begin inline asm 2026-02-21T09:06:41.6699866Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r79; 2026-02-21T09:06:41.6700165Z // end inline asm 2026-02-21T09:06:41.6700294Z mov.b64 %rd12, 4096; 2026-02-21T09:06:41.6700436Z // begin inline asm 2026-02-21T09:06:41.6700677Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:06:41.6700958Z // end inline asm 2026-02-21T09:06:41.6701088Z mov.b32 %r80, 1; 2026-02-21T09:06:41.6701212Z // begin inline asm 2026-02-21T09:06:41.6701487Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r80; 2026-02-21T09:06:41.6701764Z // end inline asm 2026-02-21T09:06:41.6701900Z // begin inline asm 2026-02-21T09:06:41.6702144Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r80; 2026-02-21T09:06:41.6702428Z // end inline asm 2026-02-21T09:06:41.6702561Z // begin inline asm 2026-02-21T09:06:41.6702814Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:06:41.6703089Z // end inline asm 2026-02-21T09:06:41.6703221Z // begin inline asm 2026-02-21T09:06:41.6703476Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:41.6703765Z // end inline asm 2026-02-21T09:06:41.6703906Z // begin inline asm 2026-02-21T09:06:41.6704141Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:41.6704417Z // end inline asm 2026-02-21T09:06:41.6704558Z // begin inline asm 2026-02-21T09:06:41.6704820Z @%p119 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:41.6705087Z // end inline asm 2026-02-21T09:06:41.6705218Z // begin inline asm 2026-02-21T09:06:41.6705566Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:06:41.6705928Z // end inline asm 2026-02-21T09:06:41.6706068Z // begin inline asm 2026-02-21T09:06:41.6706279Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:06:41.6706521Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:06:41.6706714Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:41.6706886Z // end inline asm 2026-02-21T09:06:41.6707022Z bar.sync 0; 2026-02-21T09:06:41.6707158Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:06:41.6707440Z .loc 1 22 67 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:22:67 2026-02-21T09:06:41.6707734Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:06:41.6707890Z bar.sync 0; 2026-02-21T09:06:41.6708022Z // begin inline asm 2026-02-21T09:06:41.6708167Z @%p4 st.shared.b32 [ %r74 + 0 ], %r83; 2026-02-21T09:06:41.6708338Z // end inline asm 2026-02-21T09:06:41.6708470Z bar.warp.sync -1; 2026-02-21T09:06:41.6708611Z // begin inline asm 2026-02-21T09:06:41.6708850Z @%p119 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:06:41.6709133Z // end inline asm 2026-02-21T09:06:41.6709262Z // begin inline asm 2026-02-21T09:06:41.6709485Z @%p119 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:41.6709736Z // end inline asm 2026-02-21T09:06:41.6709863Z // begin inline asm 2026-02-21T09:06:41.6710135Z @%p119 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:06:41.6710391Z // end inline asm 2026-02-21T09:06:41.6710525Z mov.b32 %r85, 256; 2026-02-21T09:06:41.6710657Z // begin inline asm 2026-02-21T09:06:41.6710888Z @%p119 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r85; 2026-02-21T09:06:41.6711148Z // end inline asm 2026-02-21T09:06:41.6711276Z // begin inline asm 2026-02-21T09:06:41.6711514Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r78; 2026-02-21T09:06:41.6711774Z // end inline asm 2026-02-21T09:06:41.6711908Z // begin inline asm 2026-02-21T09:06:41.6712136Z @%p119 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r78; 2026-02-21T09:06:41.6712445Z // end inline asm 2026-02-21T09:06:41.6712573Z // begin inline asm 2026-02-21T09:06:41.6712820Z @%p119 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:06:41.6713098Z // end inline asm 2026-02-21T09:06:41.6713226Z // begin inline asm 2026-02-21T09:06:41.6713473Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r80; 2026-02-21T09:06:41.6713781Z // end inline asm 2026-02-21T09:06:41.6713919Z // begin inline asm 2026-02-21T09:06:41.6714162Z @%p119 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r80; 2026-02-21T09:06:41.6714443Z // end inline asm 2026-02-21T09:06:41.6714577Z // begin inline asm 2026-02-21T09:06:41.6714833Z @%p119 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:06:41.6715098Z // end inline asm 2026-02-21T09:06:41.6715261Z // begin inline asm 2026-02-21T09:06:41.6715513Z @%p119 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:06:41.6715790Z // end inline asm 2026-02-21T09:06:41.6715931Z // begin inline asm 2026-02-21T09:06:41.6716168Z @%p119 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:06:41.6716428Z // end inline asm 2026-02-21T09:06:41.6716563Z // begin inline asm 2026-02-21T09:06:41.6716781Z @%p119 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 
2026-02-21T09:06:41.6717038Z // end inline asm 2026-02-21T09:06:41.6717165Z // begin inline asm 2026-02-21T09:06:41.6717497Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:06:41.6717864Z // end inline asm 2026-02-21T09:06:41.6717993Z // begin inline asm 2026-02-21T09:06:41.6718199Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:06:41.6718439Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:06:41.6718628Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:41.6718795Z // end inline asm 2026-02-21T09:06:41.6718930Z bar.sync 0; 2026-02-21T09:06:41.6719063Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:06:41.6719338Z .loc 1 40 45 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:40:45 2026-02-21T09:06:41.6719626Z shr.u32 %r269, %r1, 5; 2026-02-21T09:06:41.6719887Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6720190Z sub.s32 %r270, 256, %r954; 2026-02-21T09:06:41.6720349Z mul.hi.s32 %r271, %r270, -580400985; 2026-02-21T09:06:41.6720528Z add.s32 %r272, %r271, %r270; 2026-02-21T09:06:41.6720680Z shr.u32 %r273, %r272, 31; 2026-02-21T09:06:41.6720834Z shr.s32 %r274, %r272, 11; 2026-02-21T09:06:41.6720981Z add.s32 %r275, %r274, %r273; 2026-02-21T09:06:41.6721140Z mul.lo.s32 %r276, %r275, 2368; 2026-02-21T09:06:41.6721308Z setp.ne.b32 %p66, %r270, %r276; 2026-02-21T09:06:41.6721473Z setp.lt.u32 %p67, %r954, 257; 2026-02-21T09:06:41.6721639Z and.pred %p68, %p67, %p66; 2026-02-21T09:06:41.6721793Z selp.b32 %r277, 1, 0, %p68; 2026-02-21T09:06:41.6721952Z add.s32 %r23, %r275, %r277; 2026-02-21T09:06:41.6722257Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6722556Z shfl.sync.idx.b32 %r25, %r269, 0, 31, -1; 2026-02-21T09:06:41.6722735Z shl.b32 %r278, %r25, 21; 2026-02-21T09:06:41.6722890Z and.b32 %r279, %r278, 6291456; 
2026-02-21T09:06:41.6723048Z add.s32 %r280, %r279, %r931; 2026-02-21T09:06:41.6723199Z shl.b32 %r281, %r25, 5; 2026-02-21T09:06:41.6723351Z and.b32 %r282, %r281, 128; 2026-02-21T09:06:41.6723498Z add.s32 %r90, %r280, %r282; 2026-02-21T09:06:41.6723654Z mov.pred %p42, -1; 2026-02-21T09:06:41.6723791Z // begin inline asm 2026-02-21T09:06:41.6724134Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 0], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6724601Z // end inline asm 2026-02-21T09:06:41.6724777Z // begin inline asm 2026-02-21T09:06:41.6725106Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 16], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6725452Z // end inline asm 2026-02-21T09:06:41.6725592Z // begin inline asm 2026-02-21T09:06:41.6725941Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 32], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6726313Z // end inline asm 2026-02-21T09:06:41.6726447Z // begin inline asm 2026-02-21T09:06:41.6726772Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 48], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6727148Z // end inline asm 2026-02-21T09:06:41.6727315Z // begin inline asm 2026-02-21T09:06:41.6727651Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 64], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6728017Z // end inline asm 2026-02-21T09:06:41.6728157Z // begin inline asm 2026-02-21T09:06:41.6728476Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 80], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6728837Z // end inline asm 2026-02-21T09:06:41.6728978Z // begin 
inline asm 2026-02-21T09:06:41.6729306Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 96], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6729668Z // end inline asm 2026-02-21T09:06:41.6729801Z // begin inline asm 2026-02-21T09:06:41.6730133Z @%p42 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r90 + 112], {%r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83, %r83}; 2026-02-21T09:06:41.6730509Z // end inline asm 2026-02-21T09:06:41.6730652Z // begin inline asm 2026-02-21T09:06:41.6730814Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:06:41.6730976Z // end inline asm 2026-02-21T09:06:41.6731117Z bar.sync 0; 2026-02-21T09:06:41.6731378Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6731686Z add.s32 %r226, %r73, 65568; 2026-02-21T09:06:41.6731843Z // begin inline asm 2026-02-21T09:06:41.6732020Z @%p119 mbarrier.init.shared::cta.b64 [%r226], 1; 2026-02-21T09:06:41.6732213Z // end inline asm 2026-02-21T09:06:41.6732351Z bar.sync 0; 2026-02-21T09:06:41.6732488Z add.s32 %r227, %r73, 65576; 2026-02-21T09:06:41.6732639Z // begin inline asm 2026-02-21T09:06:41.6732812Z @%p119 mbarrier.init.shared::cta.b64 [%r227], 1; 2026-02-21T09:06:41.6733002Z // end inline asm 2026-02-21T09:06:41.6733144Z add.s32 %r228, %r73, 65536; 2026-02-21T09:06:41.6733296Z // begin inline asm 2026-02-21T09:06:41.6733466Z @%p119 mbarrier.init.shared::cta.b64 [%r228], 1; 2026-02-21T09:06:41.6733651Z // end inline asm 2026-02-21T09:06:41.6733788Z bar.sync 0; 2026-02-21T09:06:41.6733917Z add.s32 %r229, %r73, 65544; 2026-02-21T09:06:41.6734074Z // begin inline asm 2026-02-21T09:06:41.6734282Z @%p119 mbarrier.init.shared::cta.b64 [%r229], 1; 2026-02-21T09:06:41.6734467Z // end inline asm 2026-02-21T09:06:41.6734606Z bar.sync 0; 2026-02-21T09:06:41.6734761Z add.s32 %r230, %r73, 65552; 2026-02-21T09:06:41.6734922Z // begin inline asm 
2026-02-21T09:06:41.6735086Z @%p119 mbarrier.init.shared::cta.b64 [%r230], 1; 2026-02-21T09:06:41.6735284Z // end inline asm 2026-02-21T09:06:41.6735416Z bar.sync 0; 2026-02-21T09:06:41.6735556Z add.s32 %r306, %r73, 65560; 2026-02-21T09:06:41.6735716Z // begin inline asm 2026-02-21T09:06:41.6735881Z @%p119 mbarrier.init.shared::cta.b64 [%r306], 1; 2026-02-21T09:06:41.6736076Z // end inline asm 2026-02-21T09:06:41.6736220Z setp.lt.s32 %p69, %r23, 1; 2026-02-21T09:06:41.6736431Z setp.gt.s32 %p65, %r23, 0; 2026-02-21T09:06:41.6736725Z .loc 1 35 33 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:35:33 2026-02-21T09:06:41.6737025Z shr.u32 %r283, %r954, 5; 2026-02-21T09:06:41.6737176Z and.b32 %r284, %r283, 67108860; 2026-02-21T09:06:41.6737454Z .loc 1 36 39 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:36:39 2026-02-21T09:06:41.6737745Z sub.s32 %r285, 8, %r284; 2026-02-21T09:06:41.6738024Z .loc 1 36 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:36:52 2026-02-21T09:06:41.6738316Z min.s32 %r286, %r285, 4; 2026-02-21T09:06:41.6738565Z .loc 1 37 45 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:45 2026-02-21T09:06:41.6738844Z and.b32 %r287, %r954, 127; 2026-02-21T09:06:41.6739120Z .loc 1 38 51 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:38:51 2026-02-21T09:06:41.6739406Z div.s32 %r288, %r287, %r286; 2026-02-21T09:06:41.6739667Z .loc 1 37 64 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:64 2026-02-21T09:06:41.6739948Z mul.lo.s32 %r289, %r288, %r286; 2026-02-21T09:06:41.6740115Z sub.s32 %r290, %r287, %r289; 2026-02-21T09:06:41.6740371Z .loc 1 37 30 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:30 2026-02-21T09:06:41.6740657Z add.s32 %r291, %r290, %r284; 2026-02-21T09:06:41.6740911Z .loc 1 39 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:39:27 2026-02-21T09:06:41.6741196Z shl.b32 %r935, %r291, 8; 2026-02-21T09:06:41.6741449Z .loc 1 41 27 // 
c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:41:27 2026-02-21T09:06:41.6741723Z shl.b32 %r933, %r288, 7; 2026-02-21T09:06:41.6741985Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6742261Z bar.sync 0; 2026-02-21T09:06:41.6742404Z and.pred %p1, %p119, %p65; 2026-02-21T09:06:41.6742554Z // begin inline asm 2026-02-21T09:06:41.6742748Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r228], 12288; 2026-02-21T09:06:41.6742967Z // end inline asm 2026-02-21T09:06:41.6743206Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6743492Z bar.sync 0; 2026-02-21T09:06:41.6743626Z elect.sync %r292|%p70, -1; 2026-02-21T09:06:41.6743790Z and.pred %p71, %p65, %p70; 2026-02-21T09:06:41.6743942Z and.pred %p57, %p4, %p71; 2026-02-21T09:06:41.6744097Z add.s32 %r233, %r73, 32768; 2026-02-21T09:06:41.6744243Z // begin inline asm 2026-02-21T09:06:41.6744567Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r233], [%rd40, {%r83, %r933}], [%r228]; 2026-02-21T09:06:41.6744951Z // end inline asm 2026-02-21T09:06:41.6745196Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6745481Z bar.sync 0; 2026-02-21T09:06:41.6745613Z elect.sync %r293|%p72, -1; 2026-02-21T09:06:41.6745774Z and.pred %p73, %p65, %p72; 2026-02-21T09:06:41.6745928Z and.pred %p58, %p4, %p73; 2026-02-21T09:06:41.6746082Z // begin inline asm 2026-02-21T09:06:41.6746429Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r73], [%rd41, {%r83, %r935}], [%r228]; 2026-02-21T09:06:41.6746788Z // end inline asm 2026-02-21T09:06:41.6747042Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6747329Z bar.sync 0; 2026-02-21T09:06:41.6747463Z // begin inline asm 2026-02-21T09:06:41.6747644Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r229], 12288; 
2026-02-21T09:06:41.6747860Z // end inline asm 2026-02-21T09:06:41.6748100Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6748387Z bar.sync 0; 2026-02-21T09:06:41.6748563Z elect.sync %r294|%p74, -1; 2026-02-21T09:06:41.6748721Z and.pred %p75, %p65, %p74; 2026-02-21T09:06:41.6748881Z and.pred %p60, %p4, %p75; 2026-02-21T09:06:41.6749032Z add.s32 %r242, %r73, 36864; 2026-02-21T09:06:41.6749187Z // begin inline asm 2026-02-21T09:06:41.6749499Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd40, {%r76, %r933}], [%r229]; 2026-02-21T09:06:41.6749846Z // end inline asm 2026-02-21T09:06:41.6750108Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6750391Z bar.sync 0; 2026-02-21T09:06:41.6750528Z elect.sync %r295|%p76, -1; 2026-02-21T09:06:41.6750683Z and.pred %p77, %p65, %p76; 2026-02-21T09:06:41.6750843Z and.pred %p61, %p4, %p77; 2026-02-21T09:06:41.6750992Z add.s32 %r246, %r73, 8192; 2026-02-21T09:06:41.6751143Z // begin inline asm 2026-02-21T09:06:41.6751481Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd41, {%r76, %r935}], [%r229]; 2026-02-21T09:06:41.6751866Z // end inline asm 2026-02-21T09:06:41.6752144Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6752449Z bar.sync 0; 2026-02-21T09:06:41.6752586Z // begin inline asm 2026-02-21T09:06:41.6752778Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r230], 12288; 2026-02-21T09:06:41.6753018Z // end inline asm 2026-02-21T09:06:41.6753287Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6753601Z bar.sync 0; 2026-02-21T09:06:41.6753737Z elect.sync %r296|%p78, -1; 2026-02-21T09:06:41.6753906Z and.pred %p79, %p65, %p78; 2026-02-21T09:06:41.6754072Z and.pred %p63, %p4, %p79; 2026-02-21T09:06:41.6754228Z add.s32 %r251, %r73, 40960; 
2026-02-21T09:06:41.6754389Z mov.b32 %r252, 32; 2026-02-21T09:06:41.6754530Z // begin inline asm 2026-02-21T09:06:41.6754912Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r251], [%rd40, {%r252, %r933}], [%r230]; 2026-02-21T09:06:41.6755267Z // end inline asm 2026-02-21T09:06:41.6755516Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6755801Z bar.sync 0; 2026-02-21T09:06:41.6755931Z elect.sync %r297|%p80, -1; 2026-02-21T09:06:41.6756093Z and.pred %p81, %p65, %p80; 2026-02-21T09:06:41.6756244Z and.pred %p64, %p4, %p81; 2026-02-21T09:06:41.6756401Z add.s32 %r255, %r73, 16384; 2026-02-21T09:06:41.6756545Z // begin inline asm 2026-02-21T09:06:41.6756862Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r255], [%rd41, {%r252, %r935}], [%r230]; 2026-02-21T09:06:41.6757207Z // end inline asm 2026-02-21T09:06:41.6757463Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6757757Z bar.sync 0; 2026-02-21T09:06:41.6757885Z // begin inline asm 2026-02-21T09:06:41.6758021Z 2026-02-21T09:06:41.6758131Z { 2026-02-21T09:06:41.6758259Z @!%p65 bra.uni skipWait; 2026-02-21T09:06:41.6758412Z .reg .pred complete; 2026-02-21T09:06:41.6758560Z waitLoop: 2026-02-21T09:06:41.6758740Z mbarrier.try_wait.parity.shared.b64 complete, [%r228], %r83; 2026-02-21T09:06:41.6759016Z @!complete bra.uni waitLoop; 2026-02-21T09:06:41.6759168Z skipWait: 2026-02-21T09:06:41.6759293Z } 2026-02-21T09:06:41.6759358Z 2026-02-21T09:06:41.6759419Z // end inline asm 2026-02-21T09:06:41.6759664Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6759951Z setp.ne.b32 %p82, %r25, 0; 2026-02-21T09:06:41.6760104Z or.pred %p83, %p69, %p82; 2026-02-21T09:06:41.6760258Z @%p83 bra $L__BB0_2; 2026-02-21T09:06:41.6760392Z // %bb.1: 2026-02-21T09:06:41.6760527Z elect.sync %r300|%p85, -1; 
2026-02-21T09:06:41.6760686Z bfe.u32 %r303, %r233, 4, 14; 2026-02-21T09:06:41.6760876Z cvt.u64.u32 %rd51, %r303; 2026-02-21T09:06:41.6761045Z or.b64 %rd48, %rd51, -4611685949691133952; 2026-02-21T09:06:41.6761223Z bfe.u32 %r304, %r73, 4, 14; 2026-02-21T09:06:41.6761379Z cvt.u64.u32 %rd52, %r304; 2026-02-21T09:06:41.6761534Z or.b64 %rd49, %rd52, -4611685949674356736; 2026-02-21T09:06:41.6761715Z mov.b32 %r299, 138412048; 2026-02-21T09:06:41.6761859Z mov.pred %p84, 0; 2026-02-21T09:06:41.6762001Z // begin inline asm 2026-02-21T09:06:41.6762237Z @%p85 tcgen05.mma.cta_group::1.kind::f16 [ %r931 + 0 ], %rd48, %rd49, %r299, %p84; 2026-02-21T09:06:41.6762492Z // end inline asm 2026-02-21T09:06:41.6762631Z add.s32 %r305, %r73, 65568; 2026-02-21T09:06:41.6762777Z cvt.u64.u32 %rd50, %r305; 2026-02-21T09:06:41.6762926Z // begin inline asm 2026-02-21T09:06:41.6763127Z @%p85 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd50]; 2026-02-21T09:06:41.6763354Z // end inline asm 2026-02-21T09:06:41.6763479Z $L__BB0_2: 2026-02-21T09:06:41.6763747Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6764031Z bar.sync 0; 2026-02-21T09:06:41.6764162Z // begin inline asm 2026-02-21T09:06:41.6764342Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r306], 12288; 2026-02-21T09:06:41.6764555Z // end inline asm 2026-02-21T09:06:41.6764825Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6765106Z bar.sync 0; 2026-02-21T09:06:41.6765244Z elect.sync %r316|%p93, -1; 2026-02-21T09:06:41.6765399Z and.pred %p94, %p65, %p93; 2026-02-21T09:06:41.6765557Z and.pred %p88, %p4, %p94; 2026-02-21T09:06:41.6765705Z add.s32 %r307, %r73, 45056; 2026-02-21T09:06:41.6765856Z mov.b32 %r939, 48; 2026-02-21T09:06:41.6765988Z // begin inline asm 2026-02-21T09:06:41.6766304Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r307], [%rd40, {%r939, %r933}], [%r306]; 
2026-02-21T09:06:41.6766656Z // end inline asm 2026-02-21T09:06:41.6766893Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6767180Z bar.sync 0; 2026-02-21T09:06:41.6767312Z elect.sync %r317|%p95, -1; 2026-02-21T09:06:41.6767474Z and.pred %p96, %p65, %p95; 2026-02-21T09:06:41.6767628Z and.pred %p89, %p4, %p96; 2026-02-21T09:06:41.6767785Z add.s32 %r311, %r73, 24576; 2026-02-21T09:06:41.6767936Z // begin inline asm 2026-02-21T09:06:41.6768250Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r311], [%rd41, {%r939, %r935}], [%r306]; 2026-02-21T09:06:41.6768595Z // end inline asm 2026-02-21T09:06:41.6768842Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6769132Z @%p69 bra $L__BB0_12; 2026-02-21T09:06:41.6769292Z // %bb.3: // %.lr.ph 2026-02-21T09:06:41.6769593Z .loc 1 0 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:0:108 2026-02-21T09:06:41.6769880Z and.b32 %r4, %r1, 31; 2026-02-21T09:06:41.6770024Z bfe.u32 %r7, %r1, 5, 3; 2026-02-21T09:06:41.6770200Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:06:41.6770381Z shl.b32 %r5, %r4, 3; 2026-02-21T09:06:41.6770584Z and.b32 %r6, %r1, 224; 2026-02-21T09:06:41.6770726Z or.b32 %r8, %r7, 8; 2026-02-21T09:06:41.6770866Z or.b32 %r9, %r7, 16; 2026-02-21T09:06:41.6771001Z or.b32 %r10, %r7, 24; 2026-02-21T09:06:41.6771147Z or.b32 %r11, %r7, 32; 2026-02-21T09:06:41.6771282Z or.b32 %r12, %r7, 40; 2026-02-21T09:06:41.6771421Z or.b32 %r13, %r7, 48; 2026-02-21T09:06:41.6771561Z or.b32 %r14, %r7, 56; 2026-02-21T09:06:41.6771699Z or.b32 %r15, %r7, 64; 2026-02-21T09:06:41.6771847Z or.b32 %r16, %r7, 72; 2026-02-21T09:06:41.6771986Z or.b32 %r17, %r7, 80; 2026-02-21T09:06:41.6772133Z or.b32 %r18, %r7, 88; 2026-02-21T09:06:41.6772272Z or.b32 %r19, %r7, 96; 2026-02-21T09:06:41.6772421Z or.b32 %r20, %r7, 104; 2026-02-21T09:06:41.6772600Z or.b32 %r21, %r7, 112; 
2026-02-21T09:06:41.6772752Z or.b32 %r22, %r7, 120; 2026-02-21T09:06:41.6772895Z shl.b32 %r24, %r23, 7; 2026-02-21T09:06:41.6773051Z add.s32 %r29, %r24, -4; 2026-02-21T09:06:41.6773207Z shl.b32 %r324, %r1, 11; 2026-02-21T09:06:41.6773359Z and.b32 %r325, %r324, 14336; 2026-02-21T09:06:41.6773522Z shl.b32 %r326, %r1, 4; 2026-02-21T09:06:41.6773668Z and.b32 %r327, %r326, 2032; 2026-02-21T09:06:41.6773829Z shr.u32 %r328, %r1, 1; 2026-02-21T09:06:41.6773998Z and.b32 %r329, %r328, 64; 2026-02-21T09:06:41.6774163Z xor.b32 %r330, %r327, %r329; 2026-02-21T09:06:41.6774321Z or.b32 %r331, %r330, %r325; 2026-02-21T09:06:41.6774483Z add.s32 %r333, %r73, 49152; 2026-02-21T09:06:41.6774635Z add.s32 %r30, %r333, %r331; 2026-02-21T09:06:41.6774832Z xor.b32 %r334, %r331, 16; 2026-02-21T09:06:41.6774992Z add.s32 %r31, %r333, %r334; 2026-02-21T09:06:41.6775144Z xor.b32 %r335, %r331, 32; 2026-02-21T09:06:41.6775332Z add.s32 %r32, %r333, %r335; 2026-02-21T09:06:41.6775486Z xor.b32 %r336, %r331, 48; 2026-02-21T09:06:41.6775644Z add.s32 %r33, %r333, %r336; 2026-02-21T09:06:41.6775796Z shl.b32 %r337, %r6, 6; 2026-02-21T09:06:41.6775949Z shl.b32 %r338, %r4, 4; 2026-02-21T09:06:41.6776093Z shr.u32 %r339, %r6, 1; 2026-02-21T09:06:41.6776248Z or.b32 %r340, %r337, %r338; 2026-02-21T09:06:41.6776402Z xor.b32 %r341, %r340, %r339; 2026-02-21T09:06:41.6776565Z add.s32 %r542, %r333, %r341; 2026-02-21T09:06:41.6776730Z add.s32 %r547, %r542, 512; 2026-02-21T09:06:41.6776886Z add.s32 %r552, %r542, 1024; 2026-02-21T09:06:41.6777055Z add.s32 %r557, %r542, 1536; 2026-02-21T09:06:41.6777331Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6777635Z max.s32 %r342, %r24, 2; 2026-02-21T09:06:41.6777786Z add.s32 %r38, %r342, -1; 2026-02-21T09:06:41.6777950Z mov.pred %p126, -1; 2026-02-21T09:06:41.6778094Z mov.b32 %r944, 3; 2026-02-21T09:06:41.6778238Z mov.b32 %r940, 0; 2026-02-21T09:06:41.6778379Z mov.b32 %r938, 1; 2026-02-21T09:06:41.6778508Z mov.b32 %r937, 
2; 2026-02-21T09:06:41.6778651Z mov.b32 %r934, %r933; 2026-02-21T09:06:41.6778794Z mov.b32 %r936, %r935; 2026-02-21T09:06:41.6778945Z mov.b32 %r941, %r226; 2026-02-21T09:06:41.6779086Z mov.b32 %r942, %r940; 2026-02-21T09:06:41.6779234Z mov.b32 %r943, %r940; 2026-02-21T09:06:41.6779373Z mov.b32 %r945, %r938; 2026-02-21T09:06:41.6779519Z mov.b32 %r946, %r940; 2026-02-21T09:06:41.6779659Z mov.b32 %r947, %r933; 2026-02-21T09:06:41.6779806Z mov.b32 %r948, %r935; 2026-02-21T09:06:41.6779952Z mov.b32 %r950, %r944; 2026-02-21T09:06:41.6780092Z mov.b32 %r951, %r940; 2026-02-21T09:06:41.6780239Z mov.b32 %r952, %r948; 2026-02-21T09:06:41.6780387Z mov.b32 %r953, %r947; 2026-02-21T09:06:41.6780530Z bra.uni $L__BB0_4; 2026-02-21T09:06:41.6780711Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6781025Z .loc 1 0 0 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:0 2026-02-21T09:06:41.6781310Z selp.b32 %r945, 0, %r388, %p112; 2026-02-21T09:06:41.6781480Z selp.b32 %r389, 1, 0, %p112; 2026-02-21T09:06:41.6781636Z xor.b32 %r946, %r924, %r389; 2026-02-21T09:06:41.6781896Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6782220Z add.s32 %r951, %r951, 1; 2026-02-21T09:06:41.6782373Z setp.ne.b32 %p118, %r38, %r951; 2026-02-21T09:06:41.6782539Z mov.b32 %r933, %r947; 2026-02-21T09:06:41.6782677Z mov.b32 %r934, %r39; 2026-02-21T09:06:41.6782819Z mov.b32 %r935, %r948; 2026-02-21T09:06:41.6782951Z mov.b32 %r936, %r41; 2026-02-21T09:06:41.6783092Z mov.b32 %r937, %r950; 2026-02-21T09:06:41.6783224Z mov.b32 %r938, %r43; 2026-02-21T09:06:41.6783362Z mov.b32 %r940, %r924; 2026-02-21T09:06:41.6783502Z mov.b32 %r941, %r923; 2026-02-21T09:06:41.6783634Z mov.b32 %r947, %r953; 2026-02-21T09:06:41.6783778Z mov.b32 %r948, %r952; 2026-02-21T09:06:41.6783912Z mov.b32 %r950, %r58; 2026-02-21T09:06:41.6784085Z @%p118 bra $L__BB0_4; 2026-02-21T09:06:41.6784220Z bra.uni $L__BB0_11; 2026-02-21T09:06:41.6784407Z $L__BB0_4: // =>This 
Inner Loop Header: Depth=1 2026-02-21T09:06:41.6784763Z .loc 1 0 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:0:108 2026-02-21T09:06:41.6785055Z mov.b32 %r924, %r946; 2026-02-21T09:06:41.6785201Z mov.b32 %r43, %r937; 2026-02-21T09:06:41.6785339Z mov.b32 %r41, %r935; 2026-02-21T09:06:41.6785511Z mov.b32 %r39, %r933; 2026-02-21T09:06:41.6785767Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6786060Z add.s32 %r343, %r950, 1; 2026-02-21T09:06:41.6786219Z setp.eq.b32 %p98, %r950, 127; 2026-02-21T09:06:41.6786405Z selp.b32 %r58, 0, %r343, %p98; 2026-02-21T09:06:41.6786572Z setp.ne.b32 %p99, %r58, 0; 2026-02-21T09:06:41.6786785Z @%p99 bra $L__BB0_6; 2026-02-21T09:06:41.6786972Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6787177Z add.s32 %r954, %r954, 2368; 2026-02-21T09:06:41.6787441Z .loc 1 34 35 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:34:35 2026-02-21T09:06:41.6787720Z shr.s32 %r344, %r954, 31; 2026-02-21T09:06:41.6787873Z shr.u32 %r345, %r344, 25; 2026-02-21T09:06:41.6788019Z add.s32 %r346, %r954, %r345; 2026-02-21T09:06:41.6788174Z shr.s32 %r347, %r346, 7; 2026-02-21T09:06:41.6788424Z .loc 1 35 33 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:35:33 2026-02-21T09:06:41.6788710Z shl.b32 %r348, %r347, 2; 2026-02-21T09:06:41.6788964Z .loc 1 36 39 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:36:39 2026-02-21T09:06:41.6789239Z sub.s32 %r349, 8, %r348; 2026-02-21T09:06:41.6789492Z .loc 1 36 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:36:52 2026-02-21T09:06:41.6789779Z min.s32 %r350, %r349, 4; 2026-02-21T09:06:41.6790033Z .loc 1 37 45 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:45 2026-02-21T09:06:41.6790312Z and.b32 %r351, %r346, -128; 2026-02-21T09:06:41.6790470Z sub.s32 %r352, %r954, %r351; 2026-02-21T09:06:41.6790728Z .loc 1 38 51 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:38:51 
2026-02-21T09:06:41.6791008Z div.s32 %r353, %r352, %r350; 2026-02-21T09:06:41.6791269Z .loc 1 37 64 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:64 2026-02-21T09:06:41.6791552Z mul.lo.s32 %r354, %r353, %r350; 2026-02-21T09:06:41.6791714Z sub.s32 %r355, %r352, %r354; 2026-02-21T09:06:41.6791969Z .loc 1 37 30 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:37:30 2026-02-21T09:06:41.6792258Z add.s32 %r356, %r355, %r348; 2026-02-21T09:06:41.6792517Z .loc 1 39 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:39:27 2026-02-21T09:06:41.6792797Z shl.b32 %r952, %r356, 8; 2026-02-21T09:06:41.6793051Z .loc 1 41 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:41:27 2026-02-21T09:06:41.6793329Z shl.b32 %r953, %r353, 7; 2026-02-21T09:06:41.6793547Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6793870Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6794154Z add.s32 %r359, %r943, 1; 2026-02-21T09:06:41.6794307Z setp.gt.s32 %p101, %r359, 3; 2026-02-21T09:06:41.6794464Z selp.b32 %r943, 0, %r359, %p101; 2026-02-21T09:06:41.6794632Z selp.b32 %r360, 1, 0, %p101; 2026-02-21T09:06:41.6794819Z xor.b32 %r942, %r942, %r360; 2026-02-21T09:06:41.6794973Z shl.b32 %r361, %r943, 3; 2026-02-21T09:06:41.6795117Z add.s32 %r363, %r73, %r361; 2026-02-21T09:06:41.6795275Z add.s32 %r357, %r363, 65536; 2026-02-21T09:06:41.6795421Z bar.sync 0; 2026-02-21T09:06:41.6795590Z // begin inline asm 2026-02-21T09:06:41.6795729Z 2026-02-21T09:06:41.6795838Z { 2026-02-21T09:06:41.6795964Z .reg .pred complete; 2026-02-21T09:06:41.6796106Z waitLoop: 2026-02-21T09:06:41.6796304Z mbarrier.try_wait.parity.shared.b64 complete, [%r357], %r942; 2026-02-21T09:06:41.6796536Z @!complete bra.uni waitLoop; 2026-02-21T09:06:41.6796691Z } 2026-02-21T09:06:41.6796755Z 2026-02-21T09:06:41.6796809Z // end inline asm 2026-02-21T09:06:41.6796975Z shl.b32 %r364, %r945, 3; 2026-02-21T09:06:41.6797123Z add.s32 
%r365, %r73, %r364; 2026-02-21T09:06:41.6797276Z add.s32 %r923, %r365, 65568; 2026-02-21T09:06:41.6797540Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6797833Z @%p82 bra $L__BB0_8; 2026-02-21T09:06:41.6798014Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6798346Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6798639Z shl.b32 %r368, %r943, 12; 2026-02-21T09:06:41.6798783Z add.s32 %r370, %r73, %r368; 2026-02-21T09:06:41.6798936Z add.s32 %r371, %r370, 32768; 2026-02-21T09:06:41.6799193Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6799465Z shl.b32 %r372, %r943, 13; 2026-02-21T09:06:41.6799615Z add.s32 %r373, %r73, %r372; 2026-02-21T09:06:41.6799866Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6800155Z elect.sync %r374|%p103, -1; 2026-02-21T09:06:41.6800310Z bfe.u32 %r375, %r371, 4, 14; 2026-02-21T09:06:41.6800464Z cvt.u64.u32 %rd58, %r375; 2026-02-21T09:06:41.6800628Z or.b64 %rd55, %rd58, -4611685949691133952; 2026-02-21T09:06:41.6800811Z bfe.u32 %r376, %r373, 4, 14; 2026-02-21T09:06:41.6800966Z cvt.u64.u32 %rd59, %r376; 2026-02-21T09:06:41.6801125Z or.b64 %rd56, %rd59, -4611685949674356736; 2026-02-21T09:06:41.6801304Z mov.b32 %r367, 138412048; 2026-02-21T09:06:41.6801446Z // begin inline asm 2026-02-21T09:06:41.6801675Z @%p103 tcgen05.mma.cta_group::1.kind::f16 [ %r931 + 0 ], %rd55, %rd56, %r367, %p126; 2026-02-21T09:06:41.6801922Z // end inline asm 2026-02-21T09:06:41.6802064Z cvt.u64.u32 %rd57, %r923; 2026-02-21T09:06:41.6802207Z // begin inline asm 2026-02-21T09:06:41.6802413Z @%p103 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd57]; 2026-02-21T09:06:41.6802640Z // end inline asm 2026-02-21T09:06:41.6802806Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6803124Z .loc 1 28 108 // 
c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6803417Z setp.eq.b32 %p108, %r58, 0; 2026-02-21T09:06:41.6803582Z setp.lt.s32 %p109, %r951, %r29; 2026-02-21T09:06:41.6803846Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6804134Z // begin inline asm 2026-02-21T09:06:41.6804269Z 2026-02-21T09:06:41.6804374Z { 2026-02-21T09:06:41.6804495Z .reg .pred complete; 2026-02-21T09:06:41.6804632Z waitLoop: 2026-02-21T09:06:41.6804844Z mbarrier.try_wait.parity.shared.b64 complete, [%r941], %r940; 2026-02-21T09:06:41.6805103Z @!complete bra.uni waitLoop; 2026-02-21T09:06:41.6805253Z } 2026-02-21T09:06:41.6805314Z 2026-02-21T09:06:41.6805367Z // end inline asm 2026-02-21T09:06:41.6805506Z add.s32 %r388, %r945, 1; 2026-02-21T09:06:41.6805656Z setp.gt.s32 %p112, %r388, 1; 2026-02-21T09:06:41.6805930Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6806221Z add.s32 %r390, %r939, 16; 2026-02-21T09:06:41.6806371Z add.s32 %r391, %r944, 1; 2026-02-21T09:06:41.6806538Z setp.gt.s32 %p113, %r391, 3; 2026-02-21T09:06:41.6806701Z selp.b32 %r944, 0, %r391, %p113; 2026-02-21T09:06:41.6806882Z selp.b32 %r939, 0, %r390, %p108; 2026-02-21T09:06:41.6807082Z shl.b32 %r392, %r944, 3; 2026-02-21T09:06:41.6807235Z add.s32 %r394, %r73, %r392; 2026-02-21T09:06:41.6807383Z add.s32 %r383, %r394, 65536; 2026-02-21T09:06:41.6807535Z bar.sync 0; 2026-02-21T09:06:41.6807675Z and.pred %p105, %p119, %p109; 2026-02-21T09:06:41.6807835Z // begin inline asm 2026-02-21T09:06:41.6808036Z @%p105 mbarrier.arrive.expect_tx.shared.b64 _, [%r383], 12288; 2026-02-21T09:06:41.6808250Z // end inline asm 2026-02-21T09:06:41.6808532Z .loc 1 51 31 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:51:31 2026-02-21T09:06:41.6808811Z shl.b32 %r395, %r944, 12; 2026-02-21T09:06:41.6808966Z add.s32 %r396, %r73, %r395; 2026-02-21T09:06:41.6809114Z add.s32 %r380, %r396, 32768; 
2026-02-21T09:06:41.6809266Z bar.sync 0; 2026-02-21T09:06:41.6809405Z elect.sync %r397|%p114, -1; 2026-02-21T09:06:41.6809562Z and.pred %p115, %p109, %p114; 2026-02-21T09:06:41.6809773Z and.pred %p106, %p4, %p115; 2026-02-21T09:06:41.6809923Z // begin inline asm 2026-02-21T09:06:41.6810252Z @%p106 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r380], [%rd40, {%r939, %r953}], [%r383]; 2026-02-21T09:06:41.6810596Z // end inline asm 2026-02-21T09:06:41.6810841Z .loc 1 52 44 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:52:44 2026-02-21T09:06:41.6811123Z shl.b32 %r398, %r944, 13; 2026-02-21T09:06:41.6811269Z add.s32 %r384, %r73, %r398; 2026-02-21T09:06:41.6811417Z bar.sync 0; 2026-02-21T09:06:41.6811546Z elect.sync %r399|%p116, -1; 2026-02-21T09:06:41.6811714Z and.pred %p117, %p109, %p116; 2026-02-21T09:06:41.6811871Z and.pred %p107, %p4, %p117; 2026-02-21T09:06:41.6812027Z // begin inline asm 2026-02-21T09:06:41.6812344Z @%p107 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r384], [%rd41, {%r939, %r952}], [%r383]; 2026-02-21T09:06:41.6812697Z // end inline asm 2026-02-21T09:06:41.6812949Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6813239Z setp.ne.b32 %p126, %r938, 127; 2026-02-21T09:06:41.6813407Z @%p126 bra $L__BB0_10; 2026-02-21T09:06:41.6813587Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:06:41.6813913Z .loc 1 40 32 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:40:32 2026-02-21T09:06:41.6814206Z add.s32 %r682, %r936, %r5; 2026-02-21T09:06:41.6814487Z .loc 1 42 32 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:42:32 2026-02-21T09:06:41.6814827Z add.s32 %r683, %r934, %r7; 2026-02-21T09:06:41.6814985Z add.s32 %r684, %r8, %r934; 2026-02-21T09:06:41.6815147Z add.s32 %r685, %r9, %r934; 2026-02-21T09:06:41.6815300Z add.s32 %r686, %r10, %r934; 2026-02-21T09:06:41.6815465Z add.s32 %r687, %r11, %r934; 
2026-02-21T09:06:41.6815620Z add.s32 %r688, %r12, %r934; 2026-02-21T09:06:41.6815781Z add.s32 %r689, %r13, %r934; 2026-02-21T09:06:41.6815934Z add.s32 %r690, %r14, %r934; 2026-02-21T09:06:41.6816098Z add.s32 %r691, %r15, %r934; 2026-02-21T09:06:41.6816253Z add.s32 %r692, %r16, %r934; 2026-02-21T09:06:41.6816422Z add.s32 %r693, %r17, %r934; 2026-02-21T09:06:41.6816583Z add.s32 %r694, %r18, %r934; 2026-02-21T09:06:41.6816771Z add.s32 %r695, %r19, %r934; 2026-02-21T09:06:41.6816931Z add.s32 %r696, %r20, %r934; 2026-02-21T09:06:41.6817082Z add.s32 %r697, %r21, %r934; 2026-02-21T09:06:41.6817241Z add.s32 %r698, %r22, %r934; 2026-02-21T09:06:41.6817509Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6817806Z // begin inline asm 2026-02-21T09:06:41.6817942Z 2026-02-21T09:06:41.6818060Z { 2026-02-21T09:06:41.6818187Z .reg .pred complete; 2026-02-21T09:06:41.6818332Z waitLoop: 2026-02-21T09:06:41.6818527Z mbarrier.try_wait.parity.shared.b64 complete, [%r923], %r924; 2026-02-21T09:06:41.6818763Z @!complete bra.uni waitLoop; 2026-02-21T09:06:41.6818962Z } 2026-02-21T09:06:41.6819027Z 2026-02-21T09:06:41.6819082Z // end inline asm 2026-02-21T09:06:41.6819344Z .loc 1 56 45 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:56:45 2026-02-21T09:06:41.6819632Z shl.b32 %r699, %r683, 11; 2026-02-21T09:06:41.6819794Z shl.b32 %r700, %r684, 11; 2026-02-21T09:06:41.6819952Z shl.b32 %r701, %r685, 11; 2026-02-21T09:06:41.6820099Z shl.b32 %r702, %r686, 11; 2026-02-21T09:06:41.6820256Z shl.b32 %r703, %r687, 11; 2026-02-21T09:06:41.6820431Z shl.b32 %r704, %r688, 11; 2026-02-21T09:06:41.6820588Z shl.b32 %r705, %r689, 11; 2026-02-21T09:06:41.6820735Z shl.b32 %r706, %r690, 11; 2026-02-21T09:06:41.6820890Z shl.b32 %r707, %r691, 11; 2026-02-21T09:06:41.6821036Z shl.b32 %r708, %r692, 11; 2026-02-21T09:06:41.6821190Z shl.b32 %r709, %r693, 11; 2026-02-21T09:06:41.6821333Z shl.b32 %r710, %r694, 11; 2026-02-21T09:06:41.6821486Z shl.b32 %r711, 
%r695, 11; 2026-02-21T09:06:41.6821680Z shl.b32 %r712, %r696, 11; 2026-02-21T09:06:41.6821832Z shl.b32 %r713, %r697, 11; 2026-02-21T09:06:41.6821986Z shl.b32 %r714, %r698, 11; 2026-02-21T09:06:41.6822241Z .loc 1 56 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:56:52 2026-02-21T09:06:41.6822522Z add.s32 %r715, %r699, %r682; 2026-02-21T09:06:41.6822674Z add.s32 %r716, %r700, %r682; 2026-02-21T09:06:41.6822830Z add.s32 %r717, %r701, %r682; 2026-02-21T09:06:41.6822977Z add.s32 %r718, %r702, %r682; 2026-02-21T09:06:41.6823134Z add.s32 %r719, %r703, %r682; 2026-02-21T09:06:41.6823288Z add.s32 %r720, %r704, %r682; 2026-02-21T09:06:41.6823433Z add.s32 %r721, %r705, %r682; 2026-02-21T09:06:41.6823588Z add.s32 %r722, %r706, %r682; 2026-02-21T09:06:41.6823735Z add.s32 %r723, %r707, %r682; 2026-02-21T09:06:41.6823889Z add.s32 %r724, %r708, %r682; 2026-02-21T09:06:41.6824036Z add.s32 %r725, %r709, %r682; 2026-02-21T09:06:41.6824190Z add.s32 %r726, %r710, %r682; 2026-02-21T09:06:41.6824338Z add.s32 %r727, %r711, %r682; 2026-02-21T09:06:41.6824495Z add.s32 %r728, %r712, %r682; 2026-02-21T09:06:41.6824643Z add.s32 %r729, %r713, %r682; 2026-02-21T09:06:41.6824835Z add.s32 %r730, %r714, %r682; 2026-02-21T09:06:41.6825109Z .loc 1 56 24 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:56:24 2026-02-21T09:06:41.6825402Z mad.wide.s32 %rd62, %r715, 2, %rd3; 2026-02-21T09:06:41.6825586Z mad.wide.s32 %rd63, %r716, 2, %rd3; 2026-02-21T09:06:41.6825754Z mad.wide.s32 %rd64, %r717, 2, %rd3; 2026-02-21T09:06:41.6825930Z mad.wide.s32 %rd65, %r718, 2, %rd3; 2026-02-21T09:06:41.6826094Z mad.wide.s32 %rd66, %r719, 2, %rd3; 2026-02-21T09:06:41.6826264Z mad.wide.s32 %rd67, %r720, 2, %rd3; 2026-02-21T09:06:41.6826426Z mad.wide.s32 %rd68, %r721, 2, %rd3; 2026-02-21T09:06:41.6826594Z mad.wide.s32 %rd69, %r722, 2, %rd3; 2026-02-21T09:06:41.6826763Z mad.wide.s32 %rd70, %r723, 2, %rd3; 2026-02-21T09:06:41.6826926Z mad.wide.s32 %rd71, %r724, 2, %rd3; 2026-02-21T09:06:41.6827100Z 
mad.wide.s32 %rd72, %r725, 2, %rd3; 2026-02-21T09:06:41.6827266Z mad.wide.s32 %rd73, %r726, 2, %rd3; 2026-02-21T09:06:41.6827438Z mad.wide.s32 %rd74, %r727, 2, %rd3; 2026-02-21T09:06:41.6827602Z mad.wide.s32 %rd75, %r728, 2, %rd3; 2026-02-21T09:06:41.6827773Z mad.wide.s32 %rd76, %r729, 2, %rd3; 2026-02-21T09:06:41.6827974Z mad.wide.s32 %rd77, %r730, 2, %rd3; 2026-02-21T09:06:41.6828251Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6828546Z // begin inline asm 2026-02-21T09:06:41.6828899Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417}, [%r90 + 0]; 2026-02-21T09:06:41.6829280Z // end inline asm 2026-02-21T09:06:41.6829412Z // begin inline asm 2026-02-21T09:06:41.6829758Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434}, [%r90 + 16]; 2026-02-21T09:06:41.6830176Z // end inline asm 2026-02-21T09:06:41.6830313Z // begin inline asm 2026-02-21T09:06:41.6830656Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451}, [%r90 + 32]; 2026-02-21T09:06:41.6831040Z // end inline asm 2026-02-21T09:06:41.6831179Z // begin inline asm 2026-02-21T09:06:41.6831540Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468}, [%r90 + 48]; 2026-02-21T09:06:41.6831919Z // end inline asm 2026-02-21T09:06:41.6832055Z // begin inline asm 2026-02-21T09:06:41.6832379Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485}, [%r90 + 64]; 2026-02-21T09:06:41.6832746Z // end inline asm 2026-02-21T09:06:41.6832874Z // begin inline asm 2026-02-21T09:06:41.6833234Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502}, [%r90 + 80]; 2026-02-21T09:06:41.6833614Z // end inline asm 2026-02-21T09:06:41.6833750Z // begin inline asm 2026-02-21T09:06:41.6834095Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519}, [%r90 + 96]; 2026-02-21T09:06:41.6834475Z // end inline asm 2026-02-21T09:06:41.6834610Z // begin inline asm 2026-02-21T09:06:41.6834987Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536}, [%r90 + 112]; 2026-02-21T09:06:41.6835379Z // end inline asm 2026-02-21T09:06:41.6835508Z // begin inline asm 2026-02-21T09:06:41.6835664Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:06:41.6835829Z // end inline asm 2026-02-21T09:06:41.6835964Z cvt.u64.u32 %rd78, %r402; 2026-02-21T09:06:41.6836126Z cvt.u64.u32 %rd79, %r403; 2026-02-21T09:06:41.6836279Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:06:41.6836438Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:06:41.6836712Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6837022Z mov.b64 {%r731, %r732}, %rd81; 2026-02-21T09:06:41.6837191Z cvt.rn.f16x2.f32 %r733, %r732, %r731; 2026-02-21T09:06:41.6837489Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6837778Z cvt.u64.u32 %rd82, %r404; 2026-02-21T09:06:41.6837924Z cvt.u64.u32 %rd83, %r405; 2026-02-21T09:06:41.6838076Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:06:41.6838221Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:06:41.6838487Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6838772Z mov.b64 {%r734, %r735}, %rd85; 2026-02-21T09:06:41.6838949Z cvt.rn.f16x2.f32 %r736, %r735, %r734; 2026-02-21T09:06:41.6839238Z 
.loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6839519Z cvt.u64.u32 %rd86, %r406; 2026-02-21T09:06:41.6839673Z cvt.u64.u32 %rd87, %r407; 2026-02-21T09:06:41.6839850Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:06:41.6840004Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:06:41.6840262Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6840553Z mov.b64 {%r737, %r738}, %rd89; 2026-02-21T09:06:41.6840712Z cvt.rn.f16x2.f32 %r739, %r738, %r737; 2026-02-21T09:06:41.6840988Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6841275Z cvt.u64.u32 %rd90, %r408; 2026-02-21T09:06:41.6841418Z cvt.u64.u32 %rd91, %r409; 2026-02-21T09:06:41.6841568Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:06:41.6841710Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:06:41.6842006Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6842289Z mov.b64 {%r740, %r741}, %rd93; 2026-02-21T09:06:41.6842457Z cvt.rn.f16x2.f32 %r742, %r741, %r740; 2026-02-21T09:06:41.6842741Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6843021Z cvt.u64.u32 %rd94, %r410; 2026-02-21T09:06:41.6843176Z cvt.u64.u32 %rd95, %r411; 2026-02-21T09:06:41.6843346Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:06:41.6843498Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:06:41.6843747Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6844030Z mov.b64 {%r743, %r744}, %rd97; 2026-02-21T09:06:41.6844188Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:06:41.6844487Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6844811Z cvt.u64.u32 %rd98, %r412; 2026-02-21T09:06:41.6844956Z cvt.u64.u32 %rd99, %r413; 2026-02-21T09:06:41.6845110Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:06:41.6845265Z or.b64 %rd101, %rd98, %rd100; 
2026-02-21T09:06:41.6845534Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6845821Z mov.b64 {%r746, %r747}, %rd101; 2026-02-21T09:06:41.6845991Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:06:41.6846277Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6846565Z cvt.u64.u32 %rd102, %r414; 2026-02-21T09:06:41.6846727Z cvt.u64.u32 %rd103, %r415; 2026-02-21T09:06:41.6846878Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:06:41.6847043Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:06:41.6847311Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6847606Z mov.b64 {%r749, %r750}, %rd105; 2026-02-21T09:06:41.6847773Z cvt.rn.f16x2.f32 %r751, %r750, %r749; 2026-02-21T09:06:41.6848058Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6848346Z cvt.u64.u32 %rd106, %r416; 2026-02-21T09:06:41.6848497Z cvt.u64.u32 %rd107, %r417; 2026-02-21T09:06:41.6848651Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:06:41.6848801Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:06:41.6849073Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6849367Z mov.b64 {%r752, %r753}, %rd109; 2026-02-21T09:06:41.6849537Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:06:41.6849818Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6850111Z cvt.u64.u32 %rd110, %r419; 2026-02-21T09:06:41.6850269Z cvt.u64.u32 %rd111, %r420; 2026-02-21T09:06:41.6850418Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:06:41.6850578Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:06:41.6850840Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6851132Z mov.b64 {%r755, %r756}, %rd113; 2026-02-21T09:06:41.6851334Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:06:41.6851623Z .loc 1 53 52 // 
c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6851918Z cvt.u64.u32 %rd114, %r421; 2026-02-21T09:06:41.6852067Z cvt.u64.u32 %rd115, %r422; 2026-02-21T09:06:41.6852221Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:06:41.6852371Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:06:41.6852642Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6852921Z mov.b64 {%r758, %r759}, %rd117; 2026-02-21T09:06:41.6853088Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:06:41.6853370Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6853678Z cvt.u64.u32 %rd118, %r423; 2026-02-21T09:06:41.6853833Z cvt.u64.u32 %rd119, %r424; 2026-02-21T09:06:41.6853978Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:06:41.6854134Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:06:41.6854398Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6854714Z mov.b64 {%r761, %r762}, %rd121; 2026-02-21T09:06:41.6854922Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:06:41.6855200Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6855485Z cvt.u64.u32 %rd122, %r425; 2026-02-21T09:06:41.6855632Z cvt.u64.u32 %rd123, %r426; 2026-02-21T09:06:41.6855785Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:06:41.6855934Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:06:41.6856226Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6856519Z mov.b64 {%r764, %r765}, %rd125; 2026-02-21T09:06:41.6856688Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:06:41.6856976Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6857267Z cvt.u64.u32 %rd126, %r427; 2026-02-21T09:06:41.6857423Z cvt.u64.u32 %rd127, %r428; 2026-02-21T09:06:41.6857571Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:06:41.6857731Z or.b64 %rd129, %rd126, 
%rd128; 2026-02-21T09:06:41.6858006Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6858311Z mov.b64 {%r767, %r768}, %rd129; 2026-02-21T09:06:41.6858478Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:06:41.6858775Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6859083Z cvt.u64.u32 %rd130, %r429; 2026-02-21T09:06:41.6859240Z cvt.u64.u32 %rd131, %r430; 2026-02-21T09:06:41.6859402Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:06:41.6859559Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:06:41.6859839Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6860131Z mov.b64 {%r770, %r771}, %rd133; 2026-02-21T09:06:41.6860305Z cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T09:06:41.6860600Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6860891Z cvt.u64.u32 %rd134, %r431; 2026-02-21T09:06:41.6861054Z cvt.u64.u32 %rd135, %r432; 2026-02-21T09:06:41.6861207Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:06:41.6861371Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:06:41.6861644Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6861942Z mov.b64 {%r773, %r774}, %rd137; 2026-02-21T09:06:41.6862111Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:06:41.6862407Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6862720Z cvt.u64.u32 %rd138, %r433; 2026-02-21T09:06:41.6862874Z cvt.u64.u32 %rd139, %r434; 2026-02-21T09:06:41.6863076Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:06:41.6863235Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:06:41.6863518Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6863812Z mov.b64 {%r776, %r777}, %rd141; 2026-02-21T09:06:41.6863986Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:06:41.6864277Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6864567Z cvt.u64.u32 %rd142, %r436; 2026-02-21T09:06:41.6864762Z cvt.u64.u32 %rd143, %r437; 2026-02-21T09:06:41.6864916Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:06:41.6865081Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:06:41.6865402Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6865698Z mov.b64 {%r779, %r780}, %rd145; 2026-02-21T09:06:41.6865863Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:06:41.6866148Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6866440Z cvt.u64.u32 %rd146, %r438; 2026-02-21T09:06:41.6866585Z cvt.u64.u32 %rd147, %r439; 2026-02-21T09:06:41.6866770Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:06:41.6866923Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:06:41.6867189Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6867472Z mov.b64 {%r782, %r783}, %rd149; 2026-02-21T09:06:41.6867641Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T09:06:41.6867941Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6868226Z cvt.u64.u32 %rd150, %r440; 2026-02-21T09:06:41.6868384Z cvt.u64.u32 %rd151, %r441; 2026-02-21T09:06:41.6868532Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:06:41.6868689Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:06:41.6868946Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6869235Z mov.b64 {%r785, %r786}, %rd153; 2026-02-21T09:06:41.6869393Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T09:06:41.6869672Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6869956Z cvt.u64.u32 %rd154, %r442; 2026-02-21T09:06:41.6870107Z cvt.u64.u32 %rd155, %r443; 2026-02-21T09:06:41.6870264Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:06:41.6870415Z or.b64 %rd157, %rd154, 
%rd156; 2026-02-21T09:06:41.6870679Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6870953Z mov.b64 {%r788, %r789}, %rd157; 2026-02-21T09:06:41.6871119Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T09:06:41.6871393Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6871677Z cvt.u64.u32 %rd158, %r444; 2026-02-21T09:06:41.6871831Z cvt.u64.u32 %rd159, %r445; 2026-02-21T09:06:41.6871978Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:06:41.6872135Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:06:41.6872398Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6872682Z mov.b64 {%r791, %r792}, %rd161; 2026-02-21T09:06:41.6872842Z cvt.rn.f16x2.f32 %r793, %r792, %r791; 2026-02-21T09:06:41.6873122Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6873402Z cvt.u64.u32 %rd162, %r446; 2026-02-21T09:06:41.6873552Z cvt.u64.u32 %rd163, %r447; 2026-02-21T09:06:41.6873706Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:06:41.6873857Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:06:41.6874124Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6874402Z mov.b64 {%r794, %r795}, %rd165; 2026-02-21T09:06:41.6874602Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T09:06:41.6874915Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6875201Z cvt.u64.u32 %rd166, %r448; 2026-02-21T09:06:41.6875357Z cvt.u64.u32 %rd167, %r449; 2026-02-21T09:06:41.6875504Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:06:41.6875660Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:06:41.6875926Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6876213Z mov.b64 {%r797, %r798}, %rd169; 2026-02-21T09:06:41.6876372Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T09:06:41.6876684Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6876965Z cvt.u64.u32 %rd170, %r450; 2026-02-21T09:06:41.6877109Z cvt.u64.u32 %rd171, %r451; 2026-02-21T09:06:41.6877262Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:06:41.6877412Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:06:41.6877676Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6877975Z mov.b64 {%r800, %r801}, %rd173; 2026-02-21T09:06:41.6878143Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T09:06:41.6878417Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6878689Z cvt.u64.u32 %rd174, %r453; 2026-02-21T09:06:41.6878840Z cvt.u64.u32 %rd175, %r454; 2026-02-21T09:06:41.6878987Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:06:41.6879143Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:06:41.6879425Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6879722Z mov.b64 {%r803, %r804}, %rd177; 2026-02-21T09:06:41.6879883Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T09:06:41.6880163Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6880450Z cvt.u64.u32 %rd178, %r455; 2026-02-21T09:06:41.6880598Z cvt.u64.u32 %rd179, %r456; 2026-02-21T09:06:41.6880756Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:06:41.6880912Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:06:41.6881186Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6881471Z mov.b64 {%r806, %r807}, %rd181; 2026-02-21T09:06:41.6881638Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T09:06:41.6881915Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6882195Z cvt.u64.u32 %rd182, %r457; 2026-02-21T09:06:41.6882351Z cvt.u64.u32 %rd183, %r458; 2026-02-21T09:06:41.6882497Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:06:41.6882653Z or.b64 %rd185, %rd182, 
%rd184; 2026-02-21T09:06:41.6882912Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6883199Z mov.b64 {%r809, %r810}, %rd185; 2026-02-21T09:06:41.6883359Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T09:06:41.6883643Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6883930Z cvt.u64.u32 %rd186, %r459; 2026-02-21T09:06:41.6884079Z cvt.u64.u32 %rd187, %r460; 2026-02-21T09:06:41.6884234Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:06:41.6884382Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:06:41.6884653Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6884969Z mov.b64 {%r812, %r813}, %rd189; 2026-02-21T09:06:41.6885138Z cvt.rn.f16x2.f32 %r814, %r813, %r812; 2026-02-21T09:06:41.6885416Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6885697Z cvt.u64.u32 %rd190, %r461; 2026-02-21T09:06:41.6885877Z cvt.u64.u32 %rd191, %r462; 2026-02-21T09:06:41.6886021Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:06:41.6886177Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:06:41.6886434Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6886719Z mov.b64 {%r815, %r816}, %rd193; 2026-02-21T09:06:41.6886875Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T09:06:41.6887146Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6887426Z cvt.u64.u32 %rd194, %r463; 2026-02-21T09:06:41.6887573Z cvt.u64.u32 %rd195, %r464; 2026-02-21T09:06:41.6887727Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:06:41.6887944Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:06:41.6888202Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6888473Z mov.b64 {%r818, %r819}, %rd197; 2026-02-21T09:06:41.6888638Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T09:06:41.6888911Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6889188Z cvt.u64.u32 %rd198, %r465; 2026-02-21T09:06:41.6889372Z cvt.u64.u32 %rd199, %r466; 2026-02-21T09:06:41.6889520Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:06:41.6889677Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:06:41.6889932Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6890222Z mov.b64 {%r821, %r822}, %rd201; 2026-02-21T09:06:41.6890382Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T09:06:41.6890679Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6890973Z cvt.u64.u32 %rd202, %r467; 2026-02-21T09:06:41.6891122Z cvt.u64.u32 %rd203, %r468; 2026-02-21T09:06:41.6891277Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:06:41.6891428Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:06:41.6891696Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6891973Z mov.b64 {%r824, %r825}, %rd205; 2026-02-21T09:06:41.6892144Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T09:06:41.6892421Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6892697Z cvt.u64.u32 %rd206, %r470; 2026-02-21T09:06:41.6892849Z cvt.u64.u32 %rd207, %r471; 2026-02-21T09:06:41.6892907Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:06:41.6892963Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:06:41.6893133Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6893193Z mov.b64 {%r827, %r828}, %rd209; 2026-02-21T09:06:41.6893254Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T09:06:41.6893414Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6893480Z cvt.u64.u32 %rd210, %r472; 2026-02-21T09:06:41.6893534Z cvt.u64.u32 %rd211, %r473; 2026-02-21T09:06:41.6893591Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:06:41.6893656Z or.b64 %rd213, %rd210, 
%rd212; 2026-02-21T09:06:41.6893821Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6893879Z mov.b64 {%r830, %r831}, %rd213; 2026-02-21T09:06:41.6893947Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T09:06:41.6894106Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6894163Z cvt.u64.u32 %rd214, %r474; 2026-02-21T09:06:41.6894218Z cvt.u64.u32 %rd215, %r475; 2026-02-21T09:06:41.6894284Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:06:41.6894340Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:06:41.6894501Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6894587Z mov.b64 {%r833, %r834}, %rd217; 2026-02-21T09:06:41.6894648Z cvt.rn.f16x2.f32 %r835, %r834, %r833; 2026-02-21T09:06:41.6894838Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6894902Z cvt.u64.u32 %rd218, %r476; 2026-02-21T09:06:41.6894957Z cvt.u64.u32 %rd219, %r477; 2026-02-21T09:06:41.6895014Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:06:41.6895071Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:06:41.6895237Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6895293Z mov.b64 {%r836, %r837}, %rd221; 2026-02-21T09:06:41.6895355Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T09:06:41.6895569Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6895626Z cvt.u64.u32 %rd222, %r478; 2026-02-21T09:06:41.6895682Z cvt.u64.u32 %rd223, %r479; 2026-02-21T09:06:41.6895747Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:06:41.6895804Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:06:41.6895967Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6896057Z mov.b64 {%r839, %r840}, %rd225; 2026-02-21T09:06:41.6896131Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T09:06:41.6896296Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6896352Z cvt.u64.u32 %rd226, %r480; 2026-02-21T09:06:41.6896413Z cvt.u64.u32 %rd227, %r481; 2026-02-21T09:06:41.6896469Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:06:41.6896556Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:06:41.6896733Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6896792Z mov.b64 {%r842, %r843}, %rd229; 2026-02-21T09:06:41.6896854Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T09:06:41.6897020Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6897081Z cvt.u64.u32 %rd230, %r482; 2026-02-21T09:06:41.6897135Z cvt.u64.u32 %rd231, %r483; 2026-02-21T09:06:41.6897191Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:06:41.6897255Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:06:41.6897421Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6897479Z mov.b64 {%r845, %r846}, %rd233; 2026-02-21T09:06:41.6897547Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T09:06:41.6897713Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6897770Z cvt.u64.u32 %rd234, %r484; 2026-02-21T09:06:41.6897825Z cvt.u64.u32 %rd235, %r485; 2026-02-21T09:06:41.6897889Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:06:41.6897946Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:06:41.6898112Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6898177Z mov.b64 {%r848, %r849}, %rd237; 2026-02-21T09:06:41.6898238Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:06:41.6898403Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6898466Z cvt.u64.u32 %rd238, %r487; 2026-02-21T09:06:41.6898522Z cvt.u64.u32 %rd239, %r488; 2026-02-21T09:06:41.6898577Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:06:41.6898633Z or.b64 %rd241, %rd238, 
%rd240; 2026-02-21T09:06:41.6898805Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6898861Z mov.b64 {%r851, %r852}, %rd241; 2026-02-21T09:06:41.6898923Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:06:41.6899094Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6899150Z cvt.u64.u32 %rd242, %r489; 2026-02-21T09:06:41.6899234Z cvt.u64.u32 %rd243, %r490; 2026-02-21T09:06:41.6899297Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:06:41.6899355Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:06:41.6899523Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6899580Z mov.b64 {%r854, %r855}, %rd245; 2026-02-21T09:06:41.6899647Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:06:41.6899814Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6899869Z cvt.u64.u32 %rd246, %r491; 2026-02-21T09:06:41.6899931Z cvt.u64.u32 %rd247, %r492; 2026-02-21T09:06:41.6899988Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:06:41.6900068Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:06:41.6900246Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6900305Z mov.b64 {%r857, %r858}, %rd249; 2026-02-21T09:06:41.6900368Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:06:41.6900533Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6900622Z cvt.u64.u32 %rd250, %r493; 2026-02-21T09:06:41.6900681Z cvt.u64.u32 %rd251, %r494; 2026-02-21T09:06:41.6900740Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:06:41.6900812Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:06:41.6900986Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6901045Z mov.b64 {%r860, %r861}, %rd253; 2026-02-21T09:06:41.6901116Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:06:41.6901310Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6901371Z cvt.u64.u32 %rd254, %r495; 2026-02-21T09:06:41.6901430Z cvt.u64.u32 %rd255, %r496; 2026-02-21T09:06:41.6901497Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:06:41.6901557Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:06:41.6901723Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6901790Z mov.b64 {%r863, %r864}, %rd257; 2026-02-21T09:06:41.6901852Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:06:41.6902028Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6902093Z cvt.u64.u32 %rd258, %r497; 2026-02-21T09:06:41.6902150Z cvt.u64.u32 %rd259, %r498; 2026-02-21T09:06:41.6902209Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:06:41.6902266Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:06:41.6902446Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6902506Z mov.b64 {%r866, %r867}, %rd261; 2026-02-21T09:06:41.6902568Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:06:41.6902748Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6902809Z cvt.u64.u32 %rd262, %r499; 2026-02-21T09:06:41.6902867Z cvt.u64.u32 %rd263, %r500; 2026-02-21T09:06:41.6902931Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:06:41.6902990Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:06:41.6903162Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6903221Z mov.b64 {%r869, %r870}, %rd265; 2026-02-21T09:06:41.6903291Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:06:41.6903462Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6903522Z cvt.u64.u32 %rd266, %r501; 2026-02-21T09:06:41.6903589Z cvt.u64.u32 %rd267, %r502; 2026-02-21T09:06:41.6903648Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:06:41.6903705Z or.b64 %rd269, %rd266, 
%rd268; 2026-02-21T09:06:41.6903879Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6903963Z mov.b64 {%r872, %r873}, %rd269; 2026-02-21T09:06:41.6904028Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:06:41.6904198Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6904266Z cvt.u64.u32 %rd270, %r504; 2026-02-21T09:06:41.6904325Z cvt.u64.u32 %rd271, %r505; 2026-02-21T09:06:41.6904384Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:06:41.6904449Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:06:41.6904619Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6904710Z mov.b64 {%r875, %r876}, %rd273; 2026-02-21T09:06:41.6904807Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:06:41.6904978Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6905035Z cvt.u64.u32 %rd274, %r506; 2026-02-21T09:06:41.6905093Z cvt.u64.u32 %rd275, %r507; 2026-02-21T09:06:41.6905160Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:06:41.6905218Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:06:41.6905411Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6905481Z mov.b64 {%r878, %r879}, %rd277; 2026-02-21T09:06:41.6905545Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:06:41.6905721Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6905786Z cvt.u64.u32 %rd278, %r508; 2026-02-21T09:06:41.6905845Z cvt.u64.u32 %rd279, %r509; 2026-02-21T09:06:41.6905931Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:06:41.6905991Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:06:41.6906172Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6906232Z mov.b64 {%r881, %r882}, %rd281; 2026-02-21T09:06:41.6906294Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:06:41.6906473Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6906532Z cvt.u64.u32 %rd282, %r510; 2026-02-21T09:06:41.6906592Z cvt.u64.u32 %rd283, %r511; 2026-02-21T09:06:41.6906660Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:06:41.6906719Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:06:41.6906892Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6906952Z mov.b64 {%r884, %r885}, %rd285; 2026-02-21T09:06:41.6907024Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:06:41.6907200Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6907261Z cvt.u64.u32 %rd286, %r512; 2026-02-21T09:06:41.6907325Z cvt.u64.u32 %rd287, %r513; 2026-02-21T09:06:41.6907384Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:06:41.6907442Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:06:41.6907624Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6907684Z mov.b64 {%r887, %r888}, %rd289; 2026-02-21T09:06:41.6907748Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:06:41.6907924Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6907990Z cvt.u64.u32 %rd290, %r514; 2026-02-21T09:06:41.6908047Z cvt.u64.u32 %rd291, %r515; 2026-02-21T09:06:41.6908107Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:06:41.6908171Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:06:41.6908349Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6908411Z mov.b64 {%r890, %r891}, %rd293; 2026-02-21T09:06:41.6908481Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:06:41.6908656Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6908748Z cvt.u64.u32 %rd294, %r516; 2026-02-21T09:06:41.6908805Z cvt.u64.u32 %rd295, %r517; 2026-02-21T09:06:41.6908873Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:06:41.6908932Z or.b64 %rd297, %rd294, 
%rd296; 2026-02-21T09:06:41.6909119Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6909183Z mov.b64 {%r893, %r894}, %rd297; 2026-02-21T09:06:41.6909244Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:06:41.6909407Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6909472Z cvt.u64.u32 %rd298, %r518; 2026-02-21T09:06:41.6909529Z cvt.u64.u32 %rd299, %r519; 2026-02-21T09:06:41.6909610Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:06:41.6909666Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:06:41.6909838Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6909899Z mov.b64 {%r896, %r897}, %rd301; 2026-02-21T09:06:41.6909961Z cvt.rn.f16x2.f32 %r898, %r897, %r896; 2026-02-21T09:06:41.6910127Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6910206Z cvt.u64.u32 %rd302, %r521; 2026-02-21T09:06:41.6910263Z cvt.u64.u32 %rd303, %r522; 2026-02-21T09:06:41.6910327Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:06:41.6910384Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:06:41.6910550Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6910607Z mov.b64 {%r899, %r900}, %rd305; 2026-02-21T09:06:41.6910695Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T09:06:41.6910859Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6910915Z cvt.u64.u32 %rd306, %r523; 2026-02-21T09:06:41.6910979Z cvt.u64.u32 %rd307, %r524; 2026-02-21T09:06:41.6911034Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:06:41.6911091Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:06:41.6911258Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6911316Z mov.b64 {%r902, %r903}, %rd309; 2026-02-21T09:06:41.6911376Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T09:06:41.6911541Z .loc 1 53 52 
// c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6911603Z cvt.u64.u32 %rd310, %r525; 2026-02-21T09:06:41.6911660Z cvt.u64.u32 %rd311, %r526; 2026-02-21T09:06:41.6911717Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:06:41.6911781Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:06:41.6911941Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6911998Z mov.b64 {%r905, %r906}, %rd313; 2026-02-21T09:06:41.6912064Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T09:06:41.6912228Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6912284Z cvt.u64.u32 %rd314, %r527; 2026-02-21T09:06:41.6912338Z cvt.u64.u32 %rd315, %r528; 2026-02-21T09:06:41.6912404Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:06:41.6912460Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:06:41.6912623Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6912686Z mov.b64 {%r908, %r909}, %rd317; 2026-02-21T09:06:41.6912747Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T09:06:41.6912913Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6912979Z cvt.u64.u32 %rd318, %r529; 2026-02-21T09:06:41.6913034Z cvt.u64.u32 %rd319, %r530; 2026-02-21T09:06:41.6913090Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:06:41.6913146Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:06:41.6913326Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6913418Z mov.b64 {%r911, %r912}, %rd321; 2026-02-21T09:06:41.6913477Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T09:06:41.6913651Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6913708Z cvt.u64.u32 %rd322, %r531; 2026-02-21T09:06:41.6913764Z cvt.u64.u32 %rd323, %r532; 2026-02-21T09:06:41.6913819Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:06:41.6913883Z or.b64 %rd325, %rd322, 
%rd324; 2026-02-21T09:06:41.6914048Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6914105Z mov.b64 {%r914, %r915}, %rd325; 2026-02-21T09:06:41.6914196Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T09:06:41.6914355Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6914411Z cvt.u64.u32 %rd326, %r533; 2026-02-21T09:06:41.6914472Z cvt.u64.u32 %rd327, %r534; 2026-02-21T09:06:41.6914528Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:06:41.6914584Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:06:41.6914800Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6914867Z mov.b64 {%r917, %r918}, %rd329; 2026-02-21T09:06:41.6914927Z cvt.rn.f16x2.f32 %r919, %r918, %r917; 2026-02-21T09:06:41.6915094Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6915157Z cvt.u64.u32 %rd330, %r535; 2026-02-21T09:06:41.6915212Z cvt.u64.u32 %rd331, %r536; 2026-02-21T09:06:41.6915294Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:06:41.6915358Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:06:41.6915520Z .loc 1 55 27 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:55:27 2026-02-21T09:06:41.6915576Z mov.b64 {%r920, %r921}, %rd333; 2026-02-21T09:06:41.6915639Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T09:06:41.6915810Z .loc 1 56 82 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:56:82 2026-02-21T09:06:41.6915904Z st.shared.v4.b32 [%r30], {%r733, %r745, %r757, %r769}; 2026-02-21T09:06:41.6915993Z st.shared.v4.b32 [%r31], {%r781, %r793, %r805, %r817}; 2026-02-21T09:06:41.6916083Z st.shared.v4.b32 [%r32], {%r829, %r841, %r853, %r865}; 2026-02-21T09:06:41.6916165Z st.shared.v4.b32 [%r33], {%r877, %r889, %r901, %r913}; 2026-02-21T09:06:41.6916220Z bar.sync 0; 2026-02-21T09:06:41.6916283Z // begin inline asm 2026-02-21T09:06:41.6916432Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r618, 
%r622, %r626, %r630}, [%r542]; 2026-02-21T09:06:41.6916489Z // end inline asm 2026-02-21T09:06:41.6916544Z // begin inline asm 2026-02-21T09:06:41.6916696Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r634, %r638, %r642, %r646}, [%r547]; 2026-02-21T09:06:41.6916750Z // end inline asm 2026-02-21T09:06:41.6916805Z // begin inline asm 2026-02-21T09:06:41.6916953Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r650, %r654, %r658, %r662}, [%r552]; 2026-02-21T09:06:41.6917006Z // end inline asm 2026-02-21T09:06:41.6917060Z // begin inline asm 2026-02-21T09:06:41.6917206Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r666, %r670, %r674, %r678}, [%r557]; 2026-02-21T09:06:41.6917260Z // end inline asm 2026-02-21T09:06:41.6917312Z bar.sync 0; 2026-02-21T09:06:41.6917396Z st.shared.v4.b32 [%r30], {%r736, %r748, %r760, %r772}; 2026-02-21T09:06:41.6917485Z st.shared.v4.b32 [%r31], {%r784, %r796, %r808, %r820}; 2026-02-21T09:06:41.6917563Z st.shared.v4.b32 [%r32], {%r832, %r844, %r856, %r868}; 2026-02-21T09:06:41.6917644Z st.shared.v4.b32 [%r33], {%r880, %r892, %r904, %r916}; 2026-02-21T09:06:41.6917707Z bar.sync 0; 2026-02-21T09:06:41.6917762Z // begin inline asm 2026-02-21T09:06:41.6917901Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r619, %r623, %r627, %r631}, [%r542]; 2026-02-21T09:06:41.6917961Z // end inline asm 2026-02-21T09:06:41.6918046Z // begin inline asm 2026-02-21T09:06:41.6918185Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r635, %r639, %r643, %r647}, [%r547]; 2026-02-21T09:06:41.6918241Z // end inline asm 2026-02-21T09:06:41.6918305Z // begin inline asm 2026-02-21T09:06:41.6918445Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r651, %r655, %r659, %r663}, [%r552]; 2026-02-21T09:06:41.6918498Z // end inline asm 2026-02-21T09:06:41.6918561Z // begin inline asm 2026-02-21T09:06:41.6918694Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r667, %r671, %r675, %r679}, [%r557]; 2026-02-21T09:06:41.6918747Z // end inline asm 2026-02-21T09:06:41.6918798Z bar.sync 0; 
2026-02-21T09:06:41.6918889Z st.shared.v4.b32 [%r30], {%r739, %r751, %r763, %r775}; 2026-02-21T09:06:41.6918997Z st.shared.v4.b32 [%r31], {%r787, %r799, %r811, %r823}; 2026-02-21T09:06:41.6919079Z st.shared.v4.b32 [%r32], {%r835, %r847, %r859, %r871}; 2026-02-21T09:06:41.6919168Z st.shared.v4.b32 [%r33], {%r883, %r895, %r907, %r919}; 2026-02-21T09:06:41.6919223Z bar.sync 0; 2026-02-21T09:06:41.6919275Z // begin inline asm 2026-02-21T09:06:41.6919422Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r620, %r624, %r628, %r632}, [%r542]; 2026-02-21T09:06:41.6919474Z // end inline asm 2026-02-21T09:06:41.6919546Z // begin inline asm 2026-02-21T09:06:41.6919688Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r636, %r640, %r644, %r648}, [%r547]; 2026-02-21T09:06:41.6919747Z // end inline asm 2026-02-21T09:06:41.6919799Z // begin inline asm 2026-02-21T09:06:41.6919938Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r652, %r656, %r660, %r664}, [%r552]; 2026-02-21T09:06:41.6919998Z // end inline asm 2026-02-21T09:06:41.6920051Z // begin inline asm 2026-02-21T09:06:41.6920209Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r668, %r672, %r676, %r680}, [%r557]; 2026-02-21T09:06:41.6920263Z // end inline asm 2026-02-21T09:06:41.6920323Z bar.sync 0; 2026-02-21T09:06:41.6920404Z st.shared.v4.b32 [%r30], {%r742, %r754, %r766, %r778}; 2026-02-21T09:06:41.6920485Z st.shared.v4.b32 [%r31], {%r790, %r802, %r814, %r826}; 2026-02-21T09:06:41.6920572Z st.shared.v4.b32 [%r32], {%r838, %r850, %r862, %r874}; 2026-02-21T09:06:41.6920653Z st.shared.v4.b32 [%r33], {%r886, %r898, %r910, %r922}; 2026-02-21T09:06:41.6920705Z bar.sync 0; 2026-02-21T09:06:41.6920766Z // begin inline asm 2026-02-21T09:06:41.6920900Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r625, %r629, %r633}, [%r542]; 2026-02-21T09:06:41.6920952Z // end inline asm 2026-02-21T09:06:41.6921005Z // begin inline asm 2026-02-21T09:06:41.6921146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r641, %r645, %r649}, [%r547]; 
2026-02-21T09:06:41.6921199Z // end inline asm 2026-02-21T09:06:41.6921253Z // begin inline asm 2026-02-21T09:06:41.6921393Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r657, %r661, %r665}, [%r552]; 2026-02-21T09:06:41.6921446Z // end inline asm 2026-02-21T09:06:41.6921499Z // begin inline asm 2026-02-21T09:06:41.6921630Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r669, %r673, %r677, %r681}, [%r557]; 2026-02-21T09:06:41.6921693Z // end inline asm 2026-02-21T09:06:41.6921746Z // begin inline asm 2026-02-21T09:06:41.6921845Z st.global.v4.b32 [ %rd62 + 0 ], { %r618, %r619, %r620, %r621 }; 2026-02-21T09:06:41.6921907Z // end inline asm 2026-02-21T09:06:41.6921960Z // begin inline asm 2026-02-21T09:06:41.6922056Z st.global.v4.b32 [ %rd63 + 0 ], { %r622, %r623, %r624, %r625 }; 2026-02-21T09:06:41.6922115Z // end inline asm 2026-02-21T09:06:41.6922169Z // begin inline asm 2026-02-21T09:06:41.6922260Z st.global.v4.b32 [ %rd64 + 0 ], { %r626, %r627, %r628, %r629 }; 2026-02-21T09:06:41.6922311Z // end inline asm 2026-02-21T09:06:41.6922371Z // begin inline asm 2026-02-21T09:06:41.6922461Z st.global.v4.b32 [ %rd65 + 0 ], { %r630, %r631, %r632, %r633 }; 2026-02-21T09:06:41.6922515Z // end inline asm 2026-02-21T09:06:41.6922574Z // begin inline asm 2026-02-21T09:06:41.6922664Z st.global.v4.b32 [ %rd66 + 0 ], { %r634, %r635, %r636, %r637 }; 2026-02-21T09:06:41.6922715Z // end inline asm 2026-02-21T09:06:41.6922792Z // begin inline asm 2026-02-21T09:06:41.6922891Z st.global.v4.b32 [ %rd67 + 0 ], { %r638, %r639, %r640, %r641 }; 2026-02-21T09:06:41.6922941Z // end inline asm 2026-02-21T09:06:41.6922994Z // begin inline asm 2026-02-21T09:06:41.6923092Z st.global.v4.b32 [ %rd68 + 0 ], { %r642, %r643, %r644, %r645 }; 2026-02-21T09:06:41.6923143Z // end inline asm 2026-02-21T09:06:41.6923196Z // begin inline asm 2026-02-21T09:06:41.6923284Z st.global.v4.b32 [ %rd69 + 0 ], { %r646, %r647, %r648, %r649 }; 2026-02-21T09:06:41.6923344Z // end inline asm 2026-02-21T09:06:41.6923395Z 
// begin inline asm 2026-02-21T09:06:41.6923486Z st.global.v4.b32 [ %rd70 + 0 ], { %r650, %r651, %r652, %r653 }; 2026-02-21T09:06:41.6923568Z // end inline asm 2026-02-21T09:06:41.6923622Z // begin inline asm 2026-02-21T09:06:41.6923711Z st.global.v4.b32 [ %rd71 + 0 ], { %r654, %r655, %r656, %r657 }; 2026-02-21T09:06:41.6923769Z // end inline asm 2026-02-21T09:06:41.6923821Z // begin inline asm 2026-02-21T09:06:41.6923912Z st.global.v4.b32 [ %rd72 + 0 ], { %r658, %r659, %r660, %r661 }; 2026-02-21T09:06:41.6923964Z // end inline asm 2026-02-21T09:06:41.6924023Z // begin inline asm 2026-02-21T09:06:41.6924133Z st.global.v4.b32 [ %rd73 + 0 ], { %r662, %r663, %r664, %r665 }; 2026-02-21T09:06:41.6924187Z // end inline asm 2026-02-21T09:06:41.6924246Z // begin inline asm 2026-02-21T09:06:41.6924336Z st.global.v4.b32 [ %rd74 + 0 ], { %r666, %r667, %r668, %r669 }; 2026-02-21T09:06:41.6924390Z // end inline asm 2026-02-21T09:06:41.6924442Z // begin inline asm 2026-02-21T09:06:41.6924538Z st.global.v4.b32 [ %rd75 + 0 ], { %r670, %r671, %r672, %r673 }; 2026-02-21T09:06:41.6924591Z // end inline asm 2026-02-21T09:06:41.6924665Z // begin inline asm 2026-02-21T09:06:41.6924796Z st.global.v4.b32 [ %rd76 + 0 ], { %r674, %r675, %r676, %r677 }; 2026-02-21T09:06:41.6924849Z // end inline asm 2026-02-21T09:06:41.6924902Z // begin inline asm 2026-02-21T09:06:41.6924999Z st.global.v4.b32 [ %rd77 + 0 ], { %r678, %r679, %r680, %r681 }; 2026-02-21T09:06:41.6925053Z // end inline asm 2026-02-21T09:06:41.6925109Z bra.uni $L__BB0_10; 2026-02-21T09:06:41.6925190Z $L__BB0_11: // %._crit_edge 2026-02-21T09:06:41.6925369Z .loc 1 53 52 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:53:52 2026-02-21T09:06:41.6925422Z // begin inline asm 2026-02-21T09:06:41.6925471Z 2026-02-21T09:06:41.6925526Z { 2026-02-21T09:06:41.6925587Z .reg .pred complete; 2026-02-21T09:06:41.6925642Z waitLoop: 2026-02-21T09:06:41.6925760Z mbarrier.try_wait.parity.shared.b64 complete, [%r923], %r924; 
2026-02-21T09:06:41.6925834Z @!complete bra.uni waitLoop; 2026-02-21T09:06:41.6925886Z } 2026-02-21T09:06:41.6925891Z 2026-02-21T09:06:41.6925946Z // end inline asm 2026-02-21T09:06:41.6926047Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:06:41.6926227Z .loc 1 28 108 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:108 2026-02-21T09:06:41.6926286Z bar.sync 0; 2026-02-21T09:06:41.6926354Z // begin inline asm 2026-02-21T09:06:41.6926442Z @%p119 mbarrier.inval.shared::cta.b64 [%r228]; 2026-02-21T09:06:41.6926494Z // end inline asm 2026-02-21T09:06:41.6926547Z bar.sync 0; 2026-02-21T09:06:41.6926609Z // begin inline asm 2026-02-21T09:06:41.6926693Z @%p119 mbarrier.inval.shared::cta.b64 [%r229]; 2026-02-21T09:06:41.6926746Z // end inline asm 2026-02-21T09:06:41.6926805Z bar.sync 0; 2026-02-21T09:06:41.6926857Z // begin inline asm 2026-02-21T09:06:41.6926935Z @%p119 mbarrier.inval.shared::cta.b64 [%r230]; 2026-02-21T09:06:41.6926986Z // end inline asm 2026-02-21T09:06:41.6927045Z bar.sync 0; 2026-02-21T09:06:41.6927099Z // begin inline asm 2026-02-21T09:06:41.6927177Z @%p119 mbarrier.inval.shared::cta.b64 [%r306]; 2026-02-21T09:06:41.6927235Z // end inline asm 2026-02-21T09:06:41.6927288Z // begin inline asm 2026-02-21T09:06:41.6927363Z @%p119 mbarrier.inval.shared::cta.b64 [%r226]; 2026-02-21T09:06:41.6927455Z // end inline asm 2026-02-21T09:06:41.6927507Z bar.sync 0; 2026-02-21T09:06:41.6927560Z // begin inline asm 2026-02-21T09:06:41.6927636Z @%p119 mbarrier.inval.shared::cta.b64 [%r227]; 2026-02-21T09:06:41.6927698Z // end inline asm 2026-02-21T09:06:41.6927870Z .loc 1 28 4 // c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py:28:4 2026-02-21T09:06:41.6927921Z bar.sync 0; 2026-02-21T09:06:41.6927981Z // begin inline asm 2026-02-21T09:06:41.6928095Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r931, 256; 2026-02-21T09:06:41.6928148Z // end inline asm 2026-02-21T09:06:41.6928197Z ret; 2026-02-21T09:06:41.6928258Z $L__tmp1: 
2026-02-21T09:06:41.6928313Z $L__func_end0: 2026-02-21T09:06:41.6928423Z // -- End function 2026-02-21T09:06:41.6928480Z } 2026-02-21T09:06:41.6928683Z .file 1 "/tmp/torchinductor_root/2d/c2dyzkqvex3stbwtcjlej4zuhxgkajphaav3rko6nfdur3fzvlth.py" 2026-02-21T09:06:41.6928743Z .section .debug_abbrev 2026-02-21T09:06:41.6928794Z { 2026-02-21T09:06:41.6928886Z .b8 1 // Abbreviation Code 2026-02-21T09:06:41.6928968Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:06:41.6929070Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:41.6929155Z .b8 37 // DW_AT_producer 2026-02-21T09:06:41.6929227Z .b8 8 // DW_FORM_string 2026-02-21T09:06:41.6929299Z .b8 19 // DW_AT_language 2026-02-21T09:06:41.6929378Z .b8 5 // DW_FORM_data2 2026-02-21T09:06:41.6929489Z .b8 3 // DW_AT_name 2026-02-21T09:06:41.6929563Z .b8 8 // DW_FORM_string 2026-02-21T09:06:41.6929644Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:06:41.6929715Z .b8 6 // DW_FORM_data4 2026-02-21T09:06:41.6929787Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:06:41.6929856Z .b8 8 // DW_FORM_string 2026-02-21T09:06:41.6929930Z .b8 0 // EOM(1) 2026-02-21T09:06:41.6929996Z .b8 0 // EOM(2) 2026-02-21T09:06:41.6930058Z .b8 0 // EOM(3) 2026-02-21T09:06:41.6930113Z } 2026-02-21T09:06:41.6930169Z .section .debug_info 2026-02-21T09:06:41.6930217Z { 2026-02-21T09:06:41.6930293Z .b32 104 // Length of Unit 2026-02-21T09:06:41.6930382Z .b8 2 // DWARF version number 2026-02-21T09:06:41.6930432Z .b8 0 2026-02-21T09:06:41.6930544Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:06:41.6930640Z .b8 8 // Address Size (in bytes) 2026-02-21T09:06:41.6930736Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:06:41.6930814Z .b8 116 // DW_AT_producer 2026-02-21T09:06:41.6930872Z .b8 114 2026-02-21T09:06:41.6930923Z .b8 105 2026-02-21T09:06:41.6930974Z .b8 116 2026-02-21T09:06:41.6931022Z .b8 111 2026-02-21T09:06:41.6931079Z .b8 110 2026-02-21T09:06:41.6931128Z .b8 0 2026-02-21T09:06:41.6931198Z .b8 2 // DW_AT_language 2026-02-21T09:06:41.6931253Z .b8 0 2026-02-21T09:06:41.6931324Z .b8 99 // DW_AT_name 2026-02-21T09:06:41.6931375Z .b8 50 2026-02-21T09:06:41.6931423Z .b8 100 2026-02-21T09:06:41.6931479Z .b8 121 2026-02-21T09:06:41.6931527Z .b8 122 2026-02-21T09:06:41.6931576Z .b8 107 2026-02-21T09:06:41.6931631Z .b8 113 2026-02-21T09:06:41.6931681Z .b8 118 2026-02-21T09:06:41.6931731Z .b8 101 2026-02-21T09:06:41.6931780Z .b8 120 2026-02-21T09:06:41.6931838Z .b8 51 2026-02-21T09:06:41.6931887Z .b8 115 2026-02-21T09:06:41.6931936Z .b8 116 2026-02-21T09:06:41.6931993Z .b8 98 2026-02-21T09:06:41.6932042Z .b8 119 2026-02-21T09:06:41.6932114Z .b8 116 2026-02-21T09:06:41.6932162Z .b8 99 2026-02-21T09:06:41.6932219Z .b8 106 2026-02-21T09:06:41.6932268Z .b8 108 2026-02-21T09:06:41.6932316Z .b8 101 2026-02-21T09:06:41.6932364Z .b8 106 2026-02-21T09:06:41.6932422Z .b8 52 2026-02-21T09:06:41.6932474Z .b8 122 2026-02-21T09:06:41.6932524Z .b8 117 2026-02-21T09:06:41.6932582Z .b8 104 2026-02-21T09:06:41.6932632Z .b8 120 2026-02-21T09:06:41.6932693Z .b8 103 2026-02-21T09:06:41.6932742Z .b8 107 2026-02-21T09:06:41.6932800Z .b8 97 2026-02-21T09:06:41.6932848Z .b8 106 2026-02-21T09:06:41.6932897Z .b8 112 2026-02-21T09:06:41.6932951Z .b8 104 2026-02-21T09:06:41.6932999Z .b8 97 2026-02-21T09:06:41.6933046Z .b8 97 2026-02-21T09:06:41.6933094Z .b8 118 2026-02-21T09:06:41.6933175Z .b8 51 2026-02-21T09:06:41.6933224Z .b8 114 2026-02-21T09:06:41.6933272Z .b8 107 2026-02-21T09:06:41.6933320Z .b8 111 2026-02-21T09:06:41.6933376Z .b8 54 
2026-02-21T09:06:41.6933424Z .b8 110 2026-02-21T09:06:41.6933472Z .b8 102 2026-02-21T09:06:41.6933526Z .b8 100 2026-02-21T09:06:41.6933575Z .b8 117 2026-02-21T09:06:41.6933623Z .b8 114 2026-02-21T09:06:41.6933671Z .b8 51 2026-02-21T09:06:41.6933725Z .b8 102 2026-02-21T09:06:41.6933773Z .b8 122 2026-02-21T09:06:41.6933820Z .b8 118 2026-02-21T09:06:41.6933892Z .b8 108 2026-02-21T09:06:41.6933941Z .b8 116 2026-02-21T09:06:41.6933989Z .b8 104 2026-02-21T09:06:41.6934036Z .b8 46 2026-02-21T09:06:41.6934091Z .b8 112 2026-02-21T09:06:41.6934138Z .b8 121 2026-02-21T09:06:41.6934186Z .b8 0 2026-02-21T09:06:41.6934281Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:06:41.6934353Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:06:41.6934401Z .b8 116 2026-02-21T09:06:41.6934471Z .b8 109 2026-02-21T09:06:41.6934530Z .b8 112 2026-02-21T09:06:41.6934580Z .b8 47 2026-02-21T09:06:41.6934628Z .b8 116 2026-02-21T09:06:41.6934719Z .b8 111 2026-02-21T09:06:41.6934769Z .b8 114 2026-02-21T09:06:41.6934818Z .b8 99 2026-02-21T09:06:41.6934866Z .b8 104 2026-02-21T09:06:41.6934921Z .b8 105 2026-02-21T09:06:41.6934970Z .b8 110 2026-02-21T09:06:41.6935018Z .b8 100 2026-02-21T09:06:41.6935065Z .b8 117 2026-02-21T09:06:41.6935120Z .b8 99 2026-02-21T09:06:41.6935168Z .b8 116 2026-02-21T09:06:41.6935216Z .b8 111 2026-02-21T09:06:41.6935271Z .b8 114 2026-02-21T09:06:41.6935319Z .b8 95 2026-02-21T09:06:41.6935367Z .b8 114 2026-02-21T09:06:41.6935413Z .b8 111 2026-02-21T09:06:41.6935468Z .b8 111 2026-02-21T09:06:41.6935514Z .b8 116 2026-02-21T09:06:41.6935561Z .b8 47 2026-02-21T09:06:41.6935615Z .b8 50 2026-02-21T09:06:41.6935665Z .b8 100 2026-02-21T09:06:41.6935714Z .b8 0 2026-02-21T09:06:41.6935763Z } 2026-02-21T09:06:41.6935835Z .section .debug_macinfo { } 2026-02-21T09:06:41.6935839Z 2026-02-21T09:06:41.6935914Z ================================================================ 2026-02-21T09:06:41.6936016Z please share the reproducer above with Triton project. 
2026-02-21T09:06:41.7857461Z 2026-02-21T09:06:41.7859475Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 20.5 configs/s 2026-02-21T09:06:43.3415355Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 644.3 2026-02-21T09:06:43.3418471Z configs/s 2026-02-21T09:06:43.4450522Z [103s] Generation 4 complete: 2026-02-21T09:06:43.4455040Z error=28 2026-02-21T09:06:43.4456197Z ok=62 2026-02-21T09:06:43.4456358Z min=0.0368 2026-02-21T09:06:43.4456495Z mid=0.0573 2026-02-21T09:06:43.4456623Z max=26.0905 2026-02-21T09:06:43.4456763Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:06:43.4457005Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:06:43.4457227Z 'l2_groupings': [1], 2026-02-21T09:06:43.4457408Z 'load_eviction_policies': ['', ''], 2026-02-21T09:06:43.4457596Z 'loop_orders': [[1, 0]], 2026-02-21T09:06:43.4457750Z 'num_stages': 5, 2026-02-21T09:06:43.4457886Z 'num_warps': 8, 2026-02-21T09:06:43.4458033Z 'pid_type': 'flat', 2026-02-21T09:06:43.4458182Z 'range_flattens': [None, None], 2026-02-21T09:06:43.4458612Z 'range_multi_buffers': [None, False], 2026-02-21T09:06:43.4458803Z 'range_num_stages': [0, 0], 2026-02-21T09:06:43.4458970Z 'range_unroll_factors': [0, 0], 2026-02-21T09:06:43.4459157Z 'range_warp_specializes': [None, None]} 2026-02-21T09:06:43.4473556Z [103s] Fitting surrogate: 459 points, 459 targets 2026-02-21T09:06:44.5469406Z [104s] Generation 5 starting: 76 neighbors, 5 active search path(s) 2026-02-21T09:06:58.0906068Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76/76 1.9 configs/s 2026-02-21T09:07:03.0508563Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 76/76 15.6 configs/s 2026-02-21T09:07:06.6514118Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 311.7 2026-02-21T09:07:06.6523162Z configs/s 2026-02-21T09:07:06.8789667Z [127s] Generation 5 complete: 2026-02-21T09:07:06.8793911Z error=23 2026-02-21T09:07:06.8798313Z ok=58 
2026-02-21T09:07:06.8802802Z min=0.0368 2026-02-21T09:07:06.8806052Z mid=0.0480 2026-02-21T09:07:06.8809542Z max=12.9639 2026-02-21T09:07:06.8815057Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:07:06.8820388Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:07:06.8824813Z 'l2_groupings': [1], 2026-02-21T09:07:06.8828901Z 'load_eviction_policies': ['', ''], 2026-02-21T09:07:06.8834142Z 'loop_orders': [[1, 0]], 2026-02-21T09:07:06.8836262Z 'num_stages': 5, 2026-02-21T09:07:06.8836500Z 'num_warps': 8, 2026-02-21T09:07:06.8836693Z 'pid_type': 'flat', 2026-02-21T09:07:06.8836897Z 'range_flattens': [None, None], 2026-02-21T09:07:06.8837145Z 'range_multi_buffers': [None, False], 2026-02-21T09:07:06.8837693Z 'range_num_stages': [0, 0], 2026-02-21T09:07:06.8838290Z 'range_unroll_factors': [0, 0], 2026-02-21T09:07:06.8842856Z 'range_warp_specializes': [None, None]} 2026-02-21T09:07:06.8847470Z [127s] Fitting surrogate: 540 points, 540 targets 2026-02-21T09:07:08.3326413Z [128s] Generation 6 starting: 79 neighbors, 5 active search path(s) 2026-02-21T09:07:19.5503439Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79/79 2.6 configs/s 2026-02-21T09:07:23.0651774Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 79/79 22.9 configs/s 2026-02-21T09:07:26.6382391Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 283.7 2026-02-21T09:07:26.6386493Z configs/s 2026-02-21T09:07:26.8881704Z [147s] Generation 6 complete: 2026-02-21T09:07:26.8885271Z error=27 2026-02-21T09:07:26.8885522Z ok=57 2026-02-21T09:07:26.8889763Z min=0.0369 2026-02-21T09:07:26.8893074Z mid=0.0389 2026-02-21T09:07:26.8897128Z max=12.7773 2026-02-21T09:07:26.8898799Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:07:26.8899136Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:07:26.8899419Z 'l2_groupings': [1], 2026-02-21T09:07:26.8899615Z 'load_eviction_policies': ['', ''], 2026-02-21T09:07:26.8899835Z 'loop_orders': 
[[1, 0]], 2026-02-21T09:07:26.8900025Z 'num_stages': 5, 2026-02-21T09:07:26.8900197Z 'num_warps': 8, 2026-02-21T09:07:26.8900358Z 'pid_type': 'flat', 2026-02-21T09:07:26.8900547Z 'range_flattens': [None, None], 2026-02-21T09:07:26.8900771Z 'range_multi_buffers': [None, False], 2026-02-21T09:07:26.8900987Z 'range_num_stages': [0, 0], 2026-02-21T09:07:26.8901191Z 'range_unroll_factors': [0, 0], 2026-02-21T09:07:26.8901401Z 'range_warp_specializes': [None, None]} 2026-02-21T09:07:26.8917199Z [147s] Fitting surrogate: 624 points, 624 targets 2026-02-21T09:07:28.0846648Z [148s] Generation 7 starting: 57 neighbors, 4 active search path(s) 2026-02-21T09:07:41.6241892Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 2.0 configs/s 2026-02-21T09:07:44.5956995Z 2026-02-21T09:07:44.5960829Z 2026-02-21T09:07:44.5962684Z ================================================================ 2026-02-21T09:07:44.5963013Z Internal Triton PTX codegen error 2026-02-21T09:07:44.5963229Z `ptxas` stderr: 2026-02-21T09:07:44.5964173Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 197 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:07:44.5964787Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:07:44.5964955Z 2026-02-21T09:07:44.5965381Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2rm1n2gs.ptx -o /tmp/tmp2rm1n2gs.ptx.o 2026-02-21T09:07:44.5965858Z 2026-02-21T09:07:44.5965862Z 2026-02-21T09:07:44.5965940Z // 2026-02-21T09:07:44.5966093Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:07:44.5966284Z // 2026-02-21T09:07:44.5966465Z 2026-02-21T09:07:44.5966538Z .version 8.7 2026-02-21T09:07:44.5966687Z .target sm_100a 2026-02-21T09:07:44.5966825Z .address_size 64 2026-02-21T09:07:44.5966922Z 2026-02-21T09:07:44.5967050Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:07:44.5967326Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:07:44.5967545Z // @_helion_matmul 2026-02-21T09:07:44.5967763Z .visible .entry _helion_matmul( 2026-02-21T09:07:44.5968051Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:07:44.5968325Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:07:44.5968582Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:07:44.5968842Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:07:44.5969102Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:07:44.5969317Z ) 2026-02-21T09:07:44.5969449Z .reqntid 256 2026-02-21T09:07:44.5969583Z .maxnreg 32 2026-02-21T09:07:44.5969724Z { 2026-02-21T09:07:44.5969854Z .reg .pred %p<116>; 2026-02-21T09:07:44.5970022Z .reg .b32 %r<1643>; 2026-02-21T09:07:44.5970175Z .reg .b64 %rd<603>; 2026-02-21T09:07:44.5970471Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19:0 2026-02-21T09:07:44.5970789Z $L__func_begin0: 2026-02-21T09:07:44.5971055Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19:0 2026-02-21T09:07:44.5971304Z 
2026-02-21T09:07:44.5971368Z // %bb.0: 2026-02-21T09:07:44.5971526Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:07:44.5971731Z $L__tmp0: 2026-02-21T09:07:44.5971976Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19 2026-02-21T09:07:44.5972281Z mov.u32 %r1, %tid.x; 2026-02-21T09:07:44.5972430Z shr.u32 %r2, %r1, 5; 2026-02-21T09:07:44.5972598Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:07:44.5972802Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:07:44.5972965Z @%p3 bra $L__BB0_16; 2026-02-21T09:07:44.5973121Z bra.uni $L__BB0_1; 2026-02-21T09:07:44.5973266Z $L__BB0_16: 2026-02-21T09:07:44.5973849Z [164s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:07:44.5975190Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:07:44.5976429Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:07:44.5976681Z `ptxas` stderr: 2026-02-21T09:07:44.5977116Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 197 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:07:44.5977747Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:07:44.5977890Z 2026-02-21T09:07:44.5978270Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2rm1n2gs.ptx -o /tmp/tmp2rm1n2gs.ptx.o 2026-02-21T09:07:44.5978739Z 2026-02-21T09:07:44.5978865Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:07:44.5979224Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:0 2026-02-21T09:07:44.5979541Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:07:44.5979747Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:07:44.5980044Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19 2026-02-21T09:07:44.5980394Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:07:44.5980593Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:07:44.5980755Z mov.b32 %r196, global_smem; 2026-02-21T09:07:44.5980916Z // begin inline asm 2026-02-21T09:07:44.5981166Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r196], 256; 2026-02-21T09:07:44.5981415Z // end inline asm 2026-02-21T09:07:44.5981555Z bar.sync 0, 128; 2026-02-21T09:07:44.5981700Z ld.shared.b32 %r1614, [global_smem]; 2026-02-21T09:07:44.5981874Z bar.sync 0, 128; 2026-02-21T09:07:44.5982045Z // begin inline asm 2026-02-21T09:07:44.5982256Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:07:44.5982477Z // end inline asm 2026-02-21T09:07:44.5982729Z .loc 1 21 67 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:21:67 2026-02-21T09:07:44.5983028Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:07:44.5983178Z mov.u32 %r501, %ctaid.y; 2026-02-21T09:07:44.5983332Z mov.u32 %r502, %ctaid.z; 2026-02-21T09:07:44.5983481Z mov.u32 %r503, %nctaid.x; 2026-02-21T09:07:44.5983639Z mov.u32 %r504, %nctaid.y; 2026-02-21T09:07:44.5983801Z mad.lo.s32 %r505, %r502, %r504, %r501; 2026-02-21T09:07:44.5983998Z 
mad.lo.s32 %r506, %r505, %r503, %r41; 2026-02-21T09:07:44.5984177Z shl.b32 %r507, %r506, 8; 2026-02-21T09:07:44.5984343Z cvt.s64.s32 %rd58, %r507; 2026-02-21T09:07:44.5984504Z add.s64 %rd37, %rd6, %rd58; 2026-02-21T09:07:44.5984746Z shl.b32 %r508, %r1, 2; 2026-02-21T09:07:44.5984914Z add.s32 %r197, %r196, %r508; 2026-02-21T09:07:44.5985078Z mov.b32 %r1642, 0; 2026-02-21T09:07:44.5985229Z // begin inline asm 2026-02-21T09:07:44.5985391Z @%p29 st.shared.b32 [ %r197 + 0 ], %r1642; 2026-02-21T09:07:44.5985586Z // end inline asm 2026-02-21T09:07:44.5985731Z bar.warp.sync -1; 2026-02-21T09:07:44.5985891Z setp.eq.b32 %p104, %r1, 0; 2026-02-21T09:07:44.5986056Z cvt.u64.u32 %rd22, %r196; 2026-02-21T09:07:44.5986231Z // begin inline asm 2026-02-21T09:07:44.5986491Z @%p104 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd22 + 0 ], %rd3; 2026-02-21T09:07:44.5986767Z // end inline asm 2026-02-21T09:07:44.5986905Z // begin inline asm 2026-02-21T09:07:44.5987129Z @%p104 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1; 2026-02-21T09:07:44.5987380Z // end inline asm 2026-02-21T09:07:44.5987508Z mov.b32 %r199, 32; 2026-02-21T09:07:44.5987645Z // begin inline asm 2026-02-21T09:07:44.5987871Z @%p104 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r199; 2026-02-21T09:07:44.5988139Z // end inline asm 2026-02-21T09:07:44.5988273Z mov.b32 %r200, 128; 2026-02-21T09:07:44.5988408Z // begin inline asm 2026-02-21T09:07:44.5988643Z @%p104 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r200; 2026-02-21T09:07:44.5988900Z // end inline asm 2026-02-21T09:07:44.5989035Z mov.b32 %r201, 2048; 2026-02-21T09:07:44.5989179Z // begin inline asm 2026-02-21T09:07:44.5989426Z @%p104 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r201; 2026-02-21T09:07:44.5989751Z // end inline asm 2026-02-21T09:07:44.5989896Z mov.b32 %r202, 4096; 2026-02-21T09:07:44.5990046Z // begin inline asm 
2026-02-21T09:07:44.5990300Z @%p104 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r202; 2026-02-21T09:07:44.5990634Z // end inline asm 2026-02-21T09:07:44.5990773Z mov.b64 %rd30, 4096; 2026-02-21T09:07:44.5990926Z // begin inline asm 2026-02-21T09:07:44.5991193Z @%p104 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd22 + 0 ], 0x0, %rd30; 2026-02-21T09:07:44.5991508Z // end inline asm 2026-02-21T09:07:44.5991656Z mov.b32 %r203, 1; 2026-02-21T09:07:44.5991799Z // begin inline asm 2026-02-21T09:07:44.5992079Z @%p104 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r203; 2026-02-21T09:07:44.5992379Z // end inline asm 2026-02-21T09:07:44.5992530Z // begin inline asm 2026-02-21T09:07:44.5992827Z @%p104 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r203; 2026-02-21T09:07:44.5993144Z // end inline asm 2026-02-21T09:07:44.5993282Z // begin inline asm 2026-02-21T09:07:44.5993533Z @%p104 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x6; 2026-02-21T09:07:44.5993815Z // end inline asm 2026-02-21T09:07:44.5993953Z // begin inline asm 2026-02-21T09:07:44.5994220Z @%p104 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0; 2026-02-21T09:07:44.5994512Z // end inline asm 2026-02-21T09:07:44.5996572Z // begin inline asm 2026-02-21T09:07:44.5996830Z @%p104 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x2; 2026-02-21T09:07:44.5997118Z // end inline asm 2026-02-21T09:07:44.5997263Z // begin inline asm 2026-02-21T09:07:44.5997502Z @%p104 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0; 2026-02-21T09:07:44.5997788Z // end inline asm 2026-02-21T09:07:44.5997927Z // begin inline asm 2026-02-21T09:07:44.5998297Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd22 + 0 ], 0x80; 2026-02-21T09:07:44.5998706Z // end 
inline asm 2026-02-21T09:07:44.5998855Z // begin inline asm 2026-02-21T09:07:44.5999081Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:07:44.5999346Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:07:44.5999552Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:07:44.5999734Z // end inline asm 2026-02-21T09:07:44.5999879Z bar.sync 0, 128; 2026-02-21T09:07:44.6000138Z .loc 1 22 67 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:22:67 2026-02-21T09:07:44.6000461Z add.s64 %rd55, %rd37, 128; 2026-02-21T09:07:44.6000618Z bar.sync 0, 128; 2026-02-21T09:07:44.6000763Z // begin inline asm 2026-02-21T09:07:44.6000925Z @%p29 st.shared.b32 [ %r197 + 0 ], %r1642; 2026-02-21T09:07:44.6001108Z // end inline asm 2026-02-21T09:07:44.6001255Z bar.warp.sync -1; 2026-02-21T09:07:44.6001398Z // begin inline asm 2026-02-21T09:07:44.6001661Z @%p104 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd22 + 0 ], %rd4; 2026-02-21T09:07:44.6001952Z // end inline asm 2026-02-21T09:07:44.6002094Z // begin inline asm 2026-02-21T09:07:44.6002321Z @%p104 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1; 2026-02-21T09:07:44.6002589Z // end inline asm 2026-02-21T09:07:44.6002732Z // begin inline asm 2026-02-21T09:07:44.6002976Z @%p104 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r199; 2026-02-21T09:07:44.6003260Z // end inline asm 2026-02-21T09:07:44.6003398Z mov.b32 %r208, 256; 2026-02-21T09:07:44.6003547Z // begin inline asm 2026-02-21T09:07:44.6003787Z @%p104 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r208; 2026-02-21T09:07:44.6004072Z // end inline asm 2026-02-21T09:07:44.6004209Z // begin inline asm 2026-02-21T09:07:44.6004468Z @%p104 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r201; 2026-02-21T09:07:44.6004833Z // end inline asm 2026-02-21T09:07:44.6004972Z // begin inline asm 2026-02-21T09:07:44.6005238Z @%p104 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r201; 2026-02-21T09:07:44.6005528Z // end inline asm 2026-02-21T09:07:44.6005725Z // begin inline asm 2026-02-21T09:07:44.6005992Z @%p104 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd22 + 0 ], 0x0, %rd30; 2026-02-21T09:07:44.6006303Z // end inline asm 2026-02-21T09:07:44.6006456Z // begin inline asm 2026-02-21T09:07:44.6006731Z @%p104 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0, %r203; 2026-02-21T09:07:44.6007035Z // end inline asm 2026-02-21T09:07:44.6007167Z // begin inline asm 2026-02-21T09:07:44.6007427Z @%p104 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x1, %r203; 2026-02-21T09:07:44.6007714Z // end inline asm 2026-02-21T09:07:44.6007891Z // begin inline asm 2026-02-21T09:07:44.6008141Z @%p104 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x6; 2026-02-21T09:07:44.6008427Z // end inline asm 2026-02-21T09:07:44.6008574Z // begin inline asm 2026-02-21T09:07:44.6008831Z @%p104 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0; 2026-02-21T09:07:44.6009138Z // end inline asm 2026-02-21T09:07:44.6009275Z // begin inline asm 2026-02-21T09:07:44.6009528Z @%p104 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x2; 2026-02-21T09:07:44.6009844Z // end inline asm 2026-02-21T09:07:44.6009976Z // begin inline asm 2026-02-21T09:07:44.6010210Z @%p104 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd22 + 0 ], 0x0; 2026-02-21T09:07:44.6010478Z // end inline asm 2026-02-21T09:07:44.6010624Z // begin inline asm 2026-02-21T09:07:44.6010978Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd22 + 0 ], 0x80; 2026-02-21T09:07:44.6011389Z // end inline asm 2026-02-21T09:07:44.6011528Z // begin inline asm 2026-02-21T09:07:44.6011747Z @%p29 
fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:07:44.6012015Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:07:44.6012213Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:07:44.6012405Z // end inline asm 2026-02-21T09:07:44.6012544Z bar.sync 0, 128; 2026-02-21T09:07:44.6012822Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6013139Z max.u32 %r509, %r41, 255; 2026-02-21T09:07:44.6013308Z shl.b32 %r510, %r509, 6; 2026-02-21T09:07:44.6013477Z sub.s32 %r77, 16384, %r510; 2026-02-21T09:07:44.6013754Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6014081Z shfl.sync.idx.b32 %r511, %r2, 0, 31, -1; 2026-02-21T09:07:44.6014267Z shl.b32 %r512, %r511, 21; 2026-02-21T09:07:44.6014438Z and.b32 %r513, %r512, 6291456; 2026-02-21T09:07:44.6014610Z add.s32 %r213, %r513, %r1614; 2026-02-21T09:07:44.6014814Z mov.pred %p67, -1; 2026-02-21T09:07:44.6014964Z // begin inline asm 2026-02-21T09:07:44.6015391Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 0], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6015894Z // end inline asm 2026-02-21T09:07:44.6016044Z // begin inline asm 2026-02-21T09:07:44.6016488Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 16], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6016941Z // end inline asm 2026-02-21T09:07:44.6017089Z // begin inline asm 2026-02-21T09:07:44.6017492Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 32], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6017997Z // end inline asm 2026-02-21T09:07:44.6018167Z // begin inline asm 2026-02-21T09:07:44.6018567Z @%p67 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 48], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6019063Z // end inline asm 2026-02-21T09:07:44.6019207Z // begin inline asm 2026-02-21T09:07:44.6019621Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 64], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6020084Z // end inline asm 2026-02-21T09:07:44.6020224Z // begin inline asm 2026-02-21T09:07:44.6020638Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 80], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6021091Z // end inline asm 2026-02-21T09:07:44.6021275Z // begin inline asm 2026-02-21T09:07:44.6021673Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 96], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6022109Z // end inline asm 2026-02-21T09:07:44.6022260Z // begin inline asm 2026-02-21T09:07:44.6022683Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 112], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6023141Z // end inline asm 2026-02-21T09:07:44.6023279Z // begin inline asm 2026-02-21T09:07:44.6023685Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 128], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6024126Z // end inline asm 2026-02-21T09:07:44.6024265Z // begin inline asm 2026-02-21T09:07:44.6024715Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 144], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, 
%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6025175Z // end inline asm 2026-02-21T09:07:44.6025322Z // begin inline asm 2026-02-21T09:07:44.6025722Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 160], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6026161Z // end inline asm 2026-02-21T09:07:44.6026307Z // begin inline asm 2026-02-21T09:07:44.6026702Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 176], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6027140Z // end inline asm 2026-02-21T09:07:44.6027276Z // begin inline asm 2026-02-21T09:07:44.6027688Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 192], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6028130Z // end inline asm 2026-02-21T09:07:44.6028268Z // begin inline asm 2026-02-21T09:07:44.6028672Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 208], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6029110Z // end inline asm 2026-02-21T09:07:44.6029253Z // begin inline asm 2026-02-21T09:07:44.6029660Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 224], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6030100Z // end inline asm 2026-02-21T09:07:44.6030245Z // begin inline asm 2026-02-21T09:07:44.6030642Z @%p67 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r213 + 240], {%r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642, %r1642}; 2026-02-21T09:07:44.6031111Z // end 
inline asm 2026-02-21T09:07:44.6031248Z // begin inline asm 2026-02-21T09:07:44.6031412Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:07:44.6031578Z // end inline asm 2026-02-21T09:07:44.6031720Z bar.sync 0, 128; 2026-02-21T09:07:44.6032037Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6032354Z add.s32 %r485, %r196, 114688; 2026-02-21T09:07:44.6032525Z // begin inline asm 2026-02-21T09:07:44.6032701Z @%p104 mbarrier.init.shared::cta.b64 [%r485], 1; 2026-02-21T09:07:44.6032913Z // end inline asm 2026-02-21T09:07:44.6033051Z bar.sync 0, 128; 2026-02-21T09:07:44.6033204Z add.s32 %r486, %r196, 114696; 2026-02-21T09:07:44.6033364Z // begin inline asm 2026-02-21T09:07:44.6033543Z @%p104 mbarrier.init.shared::cta.b64 [%r486], 1; 2026-02-21T09:07:44.6033743Z // end inline asm 2026-02-21T09:07:44.6033878Z bar.sync 0, 128; 2026-02-21T09:07:44.6034053Z add.s32 %r487, %r196, 114704; 2026-02-21T09:07:44.6034217Z // begin inline asm 2026-02-21T09:07:44.6034391Z @%p104 mbarrier.init.shared::cta.b64 [%r487], 1; 2026-02-21T09:07:44.6034585Z // end inline asm 2026-02-21T09:07:44.6034760Z bar.sync 0, 128; 2026-02-21T09:07:44.6034902Z add.s32 %r488, %r196, 114712; 2026-02-21T09:07:44.6035072Z // begin inline asm 2026-02-21T09:07:44.6035239Z @%p104 mbarrier.init.shared::cta.b64 [%r488], 1; 2026-02-21T09:07:44.6035439Z // end inline asm 2026-02-21T09:07:44.6035617Z add.s32 %r489, %r196, 114720; 2026-02-21T09:07:44.6035781Z // begin inline asm 2026-02-21T09:07:44.6035962Z @%p104 mbarrier.init.shared::cta.b64 [%r489], 1; 2026-02-21T09:07:44.6036158Z // end inline asm 2026-02-21T09:07:44.6036305Z bar.sync 0, 128; 2026-02-21T09:07:44.6036451Z add.s32 %r490, %r196, 114728; 2026-02-21T09:07:44.6036623Z // begin inline asm 2026-02-21T09:07:44.6036797Z @%p104 mbarrier.init.shared::cta.b64 [%r490], 1; 2026-02-21T09:07:44.6037003Z // end inline asm 2026-02-21T09:07:44.6037149Z bar.sync 0, 128; 2026-02-21T09:07:44.6037294Z add.s32 %r491, 
%r196, 114736; 2026-02-21T09:07:44.6037464Z // begin inline asm 2026-02-21T09:07:44.6037633Z @%p104 mbarrier.init.shared::cta.b64 [%r491], 1; 2026-02-21T09:07:44.6037836Z // end inline asm 2026-02-21T09:07:44.6037977Z bar.sync 0, 128; 2026-02-21T09:07:44.6038127Z add.s32 %r492, %r196, 114744; 2026-02-21T09:07:44.6038290Z // begin inline asm 2026-02-21T09:07:44.6038478Z @%p104 mbarrier.init.shared::cta.b64 [%r492], 1; 2026-02-21T09:07:44.6038659Z // end inline asm 2026-02-21T09:07:44.6038905Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6039187Z bar.sync 0, 128; 2026-02-21T09:07:44.6039321Z // begin inline asm 2026-02-21T09:07:44.6039499Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r485]; 2026-02-21T09:07:44.6039696Z // end inline asm 2026-02-21T09:07:44.6039832Z bar.sync 0, 128; 2026-02-21T09:07:44.6039965Z // begin inline asm 2026-02-21T09:07:44.6040140Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r486]; 2026-02-21T09:07:44.6040341Z // end inline asm 2026-02-21T09:07:44.6040472Z bar.sync 0, 128; 2026-02-21T09:07:44.6040612Z // begin inline asm 2026-02-21T09:07:44.6040776Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r487]; 2026-02-21T09:07:44.6040973Z // end inline asm 2026-02-21T09:07:44.6041104Z bar.sync 0, 128; 2026-02-21T09:07:44.6041244Z // begin inline asm 2026-02-21T09:07:44.6041411Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r488]; 2026-02-21T09:07:44.6041616Z // end inline asm 2026-02-21T09:07:44.6041871Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6042177Z bar.sync 0, 128; 2026-02-21T09:07:44.6042323Z add.s32 %r497, %r196, 114752; 2026-02-21T09:07:44.6042479Z // begin inline asm 2026-02-21T09:07:44.6042650Z @%p104 mbarrier.init.shared::cta.b64 [%r497], 1; 2026-02-21T09:07:44.6042834Z // end inline asm 2026-02-21T09:07:44.6042983Z add.s32 %r1602, %r196, 114768; 2026-02-21T09:07:44.6043179Z // begin inline asm 2026-02-21T09:07:44.6043344Z @%p104 
mbarrier.init.shared::cta.b64 [%r1602], 1; 2026-02-21T09:07:44.6043527Z // end inline asm 2026-02-21T09:07:44.6043773Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6044097Z bar.sync 0, 128; 2026-02-21T09:07:44.6044225Z // begin inline asm 2026-02-21T09:07:44.6044396Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r1602]; 2026-02-21T09:07:44.6044595Z // end inline asm 2026-02-21T09:07:44.6044927Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6045277Z st.shared.b32 [global_smem+114776], 33554689; 2026-02-21T09:07:44.6045506Z st.shared.b32 [global_smem+98304], %r1614; 2026-02-21T09:07:44.6045703Z barrier.sync 1; 2026-02-21T09:07:44.6045871Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:07:44.6046128Z barrier.sync 1; 2026-02-21T09:07:44.6046287Z setp.lt.s32 %p98, %r77, 1; 2026-02-21T09:07:44.6046450Z @%p98 bra $L__BB0_23; 2026-02-21T09:07:44.6046613Z // %bb.17: // %.lr.ph12 2026-02-21T09:07:44.6046917Z .loc 1 0 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:106 2026-02-21T09:07:44.6047224Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:07:44.6047413Z shl.b32 %r42, %r1, 3; 2026-02-21T09:07:44.6047563Z and.b32 %r43, %r42, 248; 2026-02-21T09:07:44.6047748Z and.b32 %r44, %r1, 96; 2026-02-21T09:07:44.6047908Z bfe.u32 %r45, %r1, 5, 2; 2026-02-21T09:07:44.6048050Z or.b32 %r46, %r45, 4; 2026-02-21T09:07:44.6048193Z or.b32 %r47, %r45, 8; 2026-02-21T09:07:44.6048328Z or.b32 %r48, %r45, 12; 2026-02-21T09:07:44.6048476Z or.b32 %r49, %r45, 16; 2026-02-21T09:07:44.6048614Z or.b32 %r50, %r45, 20; 2026-02-21T09:07:44.6048756Z or.b32 %r51, %r45, 24; 2026-02-21T09:07:44.6048890Z or.b32 %r52, %r45, 28; 2026-02-21T09:07:44.6049032Z or.b32 %r53, %r45, 32; 2026-02-21T09:07:44.6049174Z or.b32 %r54, %r45, 36; 2026-02-21T09:07:44.6049309Z or.b32 %r55, %r45, 40; 2026-02-21T09:07:44.6049451Z or.b32 %r56, %r45, 44; 2026-02-21T09:07:44.6049587Z or.b32 %r57, 
%r45, 48; 2026-02-21T09:07:44.6049729Z or.b32 %r58, %r45, 52; 2026-02-21T09:07:44.6049866Z or.b32 %r59, %r45, 56; 2026-02-21T09:07:44.6050005Z or.b32 %r60, %r45, 60; 2026-02-21T09:07:44.6050140Z or.b32 %r61, %r45, 64; 2026-02-21T09:07:44.6050283Z or.b32 %r62, %r45, 68; 2026-02-21T09:07:44.6050421Z or.b32 %r63, %r45, 72; 2026-02-21T09:07:44.6050565Z or.b32 %r64, %r45, 76; 2026-02-21T09:07:44.6050702Z or.b32 %r65, %r45, 80; 2026-02-21T09:07:44.6050849Z or.b32 %r66, %r45, 84; 2026-02-21T09:07:44.6050996Z or.b32 %r67, %r45, 88; 2026-02-21T09:07:44.6051131Z or.b32 %r68, %r45, 92; 2026-02-21T09:07:44.6051274Z or.b32 %r69, %r45, 96; 2026-02-21T09:07:44.6051412Z or.b32 %r70, %r45, 100; 2026-02-21T09:07:44.6051562Z or.b32 %r71, %r45, 104; 2026-02-21T09:07:44.6051703Z or.b32 %r72, %r45, 108; 2026-02-21T09:07:44.6051851Z or.b32 %r73, %r45, 112; 2026-02-21T09:07:44.6051987Z or.b32 %r74, %r45, 116; 2026-02-21T09:07:44.6052131Z or.b32 %r75, %r45, 120; 2026-02-21T09:07:44.6052268Z or.b32 %r76, %r45, 124; 2026-02-21T09:07:44.6052534Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6052836Z add.s32 %r1639, %r41, -1; 2026-02-21T09:07:44.6052986Z and.b32 %r516, %r1, 7; 2026-02-21T09:07:44.6053135Z shl.b32 %r517, %r516, 11; 2026-02-21T09:07:44.6053281Z shl.b32 %r518, %r1, 4; 2026-02-21T09:07:44.6053433Z and.b32 %r519, %r518, 2032; 2026-02-21T09:07:44.6053585Z or.b32 %r520, %r517, %r519; 2026-02-21T09:07:44.6053745Z add.s32 %r522, %r196, 98304; 2026-02-21T09:07:44.6053897Z add.s32 %r80, %r522, %r520; 2026-02-21T09:07:44.6054052Z xor.b32 %r523, %r520, 16; 2026-02-21T09:07:44.6054202Z add.s32 %r81, %r522, %r523; 2026-02-21T09:07:44.6054351Z xor.b32 %r524, %r520, 32; 2026-02-21T09:07:44.6054503Z add.s32 %r82, %r522, %r524; 2026-02-21T09:07:44.6054709Z xor.b32 %r525, %r520, 48; 2026-02-21T09:07:44.6054860Z add.s32 %r83, %r522, %r525; 2026-02-21T09:07:44.6055005Z xor.b32 %r526, %r520, 64; 2026-02-21T09:07:44.6055153Z add.s32 %r84, %r522, 
%r526; 2026-02-21T09:07:44.6055331Z xor.b32 %r527, %r520, 80; 2026-02-21T09:07:44.6055484Z add.s32 %r85, %r522, %r527; 2026-02-21T09:07:44.6055638Z xor.b32 %r528, %r520, 96; 2026-02-21T09:07:44.6055797Z add.s32 %r86, %r522, %r528; 2026-02-21T09:07:44.6055965Z xor.b32 %r529, %r520, 112; 2026-02-21T09:07:44.6056121Z add.s32 %r87, %r522, %r529; 2026-02-21T09:07:44.6056284Z shl.b32 %r530, %r44, 6; 2026-02-21T09:07:44.6056436Z shl.b32 %r531, %r516, 4; 2026-02-21T09:07:44.6056603Z shr.u32 %r532, %r44, 1; 2026-02-21T09:07:44.6056754Z bfe.s32 %r533, %r1, 3, 1; 2026-02-21T09:07:44.6056915Z and.b32 %r534, %r533, 8256; 2026-02-21T09:07:44.6057071Z and.b32 %r535, %r42, 128; 2026-02-21T09:07:44.6057288Z or.b32 %r536, %r530, %r531; 2026-02-21T09:07:44.6057451Z or.b32 %r537, %r534, %r532; 2026-02-21T09:07:44.6057616Z xor.b32 %r538, %r537, %r536; 2026-02-21T09:07:44.6057784Z add.s32 %r539, %r522, %r535; 2026-02-21T09:07:44.6057945Z add.s32 %r821, %r539, %r538; 2026-02-21T09:07:44.6058113Z add.s32 %r826, %r821, 256; 2026-02-21T09:07:44.6058274Z add.s32 %r831, %r821, 512; 2026-02-21T09:07:44.6058439Z add.s32 %r836, %r821, 768; 2026-02-21T09:07:44.6058596Z add.s32 %r841, %r821, 1024; 2026-02-21T09:07:44.6058760Z add.s32 %r846, %r821, 1280; 2026-02-21T09:07:44.6058944Z add.s32 %r851, %r821, 1536; 2026-02-21T09:07:44.6059122Z add.s32 %r856, %r821, 1792; 2026-02-21T09:07:44.6059286Z mov.b32 %r1636, -1; 2026-02-21T09:07:44.6059435Z mov.b32 %r1640, %r1642; 2026-02-21T09:07:44.6059595Z mov.b32 %r1641, %r1642; 2026-02-21T09:07:44.6059745Z mov.b32 %r1637, %r1642; 2026-02-21T09:07:44.6059900Z bra.uni $L__BB0_18; 2026-02-21T09:07:44.6060103Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:07:44.6060454Z .loc 1 43 32 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:43:32 2026-02-21T09:07:44.6060762Z or.b32 %r1106, %r1641, %r43; 2026-02-21T09:07:44.6061044Z .loc 1 45 32 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:45:32 2026-02-21T09:07:44.6061351Z add.s32 
%r1107, %r1640, %r45; 2026-02-21T09:07:44.6061519Z add.s32 %r1108, %r1640, %r46; 2026-02-21T09:07:44.6061688Z add.s32 %r1109, %r1640, %r47; 2026-02-21T09:07:44.6061847Z add.s32 %r1110, %r1640, %r48; 2026-02-21T09:07:44.6062012Z add.s32 %r1111, %r1640, %r49; 2026-02-21T09:07:44.6062167Z add.s32 %r1112, %r1640, %r50; 2026-02-21T09:07:44.6062331Z add.s32 %r1113, %r1640, %r51; 2026-02-21T09:07:44.6062487Z add.s32 %r1114, %r1640, %r52; 2026-02-21T09:07:44.6062651Z add.s32 %r1115, %r1640, %r53; 2026-02-21T09:07:44.6062813Z add.s32 %r1116, %r1640, %r54; 2026-02-21T09:07:44.6062970Z add.s32 %r1117, %r1640, %r55; 2026-02-21T09:07:44.6063135Z add.s32 %r1118, %r1640, %r56; 2026-02-21T09:07:44.6063292Z add.s32 %r1119, %r1640, %r57; 2026-02-21T09:07:44.6063455Z add.s32 %r1120, %r1640, %r58; 2026-02-21T09:07:44.6063610Z add.s32 %r1121, %r1640, %r59; 2026-02-21T09:07:44.6063772Z add.s32 %r1122, %r1640, %r60; 2026-02-21T09:07:44.6063929Z add.s32 %r1123, %r1640, %r61; 2026-02-21T09:07:44.6064092Z add.s32 %r1124, %r1640, %r62; 2026-02-21T09:07:44.6064246Z add.s32 %r1125, %r1640, %r63; 2026-02-21T09:07:44.6064410Z add.s32 %r1126, %r1640, %r64; 2026-02-21T09:07:44.6064574Z add.s32 %r1127, %r1640, %r65; 2026-02-21T09:07:44.6064752Z add.s32 %r1128, %r1640, %r66; 2026-02-21T09:07:44.6064918Z add.s32 %r1129, %r1640, %r67; 2026-02-21T09:07:44.6065074Z add.s32 %r1130, %r1640, %r68; 2026-02-21T09:07:44.6065241Z add.s32 %r1131, %r1640, %r69; 2026-02-21T09:07:44.6065396Z add.s32 %r1132, %r1640, %r70; 2026-02-21T09:07:44.6065560Z add.s32 %r1133, %r1640, %r71; 2026-02-21T09:07:44.6065715Z add.s32 %r1134, %r1640, %r72; 2026-02-21T09:07:44.6065882Z add.s32 %r1135, %r1640, %r73; 2026-02-21T09:07:44.6066075Z add.s32 %r1136, %r1640, %r74; 2026-02-21T09:07:44.6066242Z add.s32 %r1137, %r1640, %r75; 2026-02-21T09:07:44.6066396Z add.s32 %r1138, %r1640, %r76; 2026-02-21T09:07:44.6066660Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6066991Z bar.sync 0, 
128; 2026-02-21T09:07:44.6067132Z // begin inline asm 2026-02-21T09:07:44.6067278Z 2026-02-21T09:07:44.6067394Z { 2026-02-21T09:07:44.6067533Z .reg .pred complete; 2026-02-21T09:07:44.6067684Z waitLoop: 2026-02-21T09:07:44.6067888Z mbarrier.try_wait.parity.shared.b64 complete, [%r497], %r1642; 2026-02-21T09:07:44.6068140Z @!complete bra.uni waitLoop; 2026-02-21T09:07:44.6068297Z } 2026-02-21T09:07:44.6068363Z 2026-02-21T09:07:44.6068428Z // end inline asm 2026-02-21T09:07:44.6068565Z // begin inline asm 2026-02-21T09:07:44.6068974Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559}, [%r213 + 0]; 2026-02-21T09:07:44.6069385Z // end inline asm 2026-02-21T09:07:44.6069531Z // begin inline asm 2026-02-21T09:07:44.6069891Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576}, [%r213 + 16]; 2026-02-21T09:07:44.6070304Z // end inline asm 2026-02-21T09:07:44.6070440Z // begin inline asm 2026-02-21T09:07:44.6070815Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593}, [%r213 + 32]; 2026-02-21T09:07:44.6071192Z // end inline asm 2026-02-21T09:07:44.6071323Z // begin inline asm 2026-02-21T09:07:44.6071665Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610}, [%r213 + 48]; 2026-02-21T09:07:44.6072051Z // end inline asm 2026-02-21T09:07:44.6072181Z // begin inline asm 2026-02-21T09:07:44.6072521Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627}, [%r213 + 64]; 2026-02-21T09:07:44.6072887Z // end inline asm 2026-02-21T09:07:44.6073025Z // begin inline asm 2026-02-21T09:07:44.6073362Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644}, [%r213 + 80]; 2026-02-21T09:07:44.6073746Z // end inline asm 2026-02-21T09:07:44.6073881Z // begin inline asm 2026-02-21T09:07:44.6074212Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661}, [%r213 + 96]; 2026-02-21T09:07:44.6074666Z // end inline asm 2026-02-21T09:07:44.6074829Z // begin inline asm 2026-02-21T09:07:44.6075207Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678}, [%r213 + 112]; 2026-02-21T09:07:44.6075601Z // end inline asm 2026-02-21T09:07:44.6075746Z // begin inline asm 2026-02-21T09:07:44.6076122Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695}, [%r213 + 128]; 2026-02-21T09:07:44.6076496Z // end inline asm 2026-02-21T09:07:44.6076631Z // begin inline asm 2026-02-21T09:07:44.6076973Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712}, [%r213 + 144]; 2026-02-21T09:07:44.6077358Z // end inline asm 2026-02-21T09:07:44.6077487Z // begin inline asm 2026-02-21T09:07:44.6077854Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729}, [%r213 + 160]; 2026-02-21T09:07:44.6078262Z // end inline asm 2026-02-21T09:07:44.6078428Z // begin inline asm 2026-02-21T09:07:44.6078804Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746}, [%r213 + 176]; 2026-02-21T09:07:44.6079198Z // end inline asm 2026-02-21T09:07:44.6079381Z 
// begin inline asm 2026-02-21T09:07:44.6079735Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763}, [%r213 + 192]; 2026-02-21T09:07:44.6080107Z // end inline asm 2026-02-21T09:07:44.6080248Z // begin inline asm 2026-02-21T09:07:44.6080604Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780}, [%r213 + 208]; 2026-02-21T09:07:44.6080981Z // end inline asm 2026-02-21T09:07:44.6081115Z // begin inline asm 2026-02-21T09:07:44.6081500Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797}, [%r213 + 224]; 2026-02-21T09:07:44.6081903Z // end inline asm 2026-02-21T09:07:44.6082039Z // begin inline asm 2026-02-21T09:07:44.6082399Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814}, [%r213 + 240]; 2026-02-21T09:07:44.6082770Z // end inline asm 2026-02-21T09:07:44.6082909Z // begin inline asm 2026-02-21T09:07:44.6083085Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:07:44.6083253Z // end inline asm 2026-02-21T09:07:44.6083388Z bar.sync 0, 128; 2026-02-21T09:07:44.6083520Z // begin inline asm 2026-02-21T09:07:44.6083695Z @%p104 mbarrier.arrive.shared::cta.b64 _, [%r1602]; 2026-02-21T09:07:44.6083891Z // end inline asm 2026-02-21T09:07:44.6084034Z cvt.u64.u32 %rd91, %r544; 2026-02-21T09:07:44.6084190Z cvt.u64.u32 %rd92, %r545; 2026-02-21T09:07:44.6084350Z shl.b64 %rd93, %rd92, 32; 2026-02-21T09:07:44.6084506Z or.b64 %rd94, %rd91, %rd93; 2026-02-21T09:07:44.6084827Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6085142Z mov.b64 {%r1140, %r1141}, %rd94; 2026-02-21T09:07:44.6085329Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 
2026-02-21T09:07:44.6085644Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6085948Z cvt.u64.u32 %rd95, %r546; 2026-02-21T09:07:44.6086113Z cvt.u64.u32 %rd96, %r547; 2026-02-21T09:07:44.6086268Z shl.b64 %rd97, %rd96, 32; 2026-02-21T09:07:44.6086440Z or.b64 %rd98, %rd95, %rd97; 2026-02-21T09:07:44.6086701Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6087024Z mov.b64 {%r1143, %r1144}, %rd98; 2026-02-21T09:07:44.6087214Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T09:07:44.6087517Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6087819Z cvt.u64.u32 %rd99, %r548; 2026-02-21T09:07:44.6087977Z cvt.u64.u32 %rd100, %r549; 2026-02-21T09:07:44.6088144Z shl.b64 %rd101, %rd100, 32; 2026-02-21T09:07:44.6088306Z or.b64 %rd102, %rd99, %rd101; 2026-02-21T09:07:44.6088596Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6088904Z mov.b64 {%r1146, %r1147}, %rd102; 2026-02-21T09:07:44.6089092Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T09:07:44.6089393Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6089695Z cvt.u64.u32 %rd103, %r550; 2026-02-21T09:07:44.6089861Z cvt.u64.u32 %rd104, %r551; 2026-02-21T09:07:44.6090015Z shl.b64 %rd105, %rd104, 32; 2026-02-21T09:07:44.6090186Z or.b64 %rd106, %rd103, %rd105; 2026-02-21T09:07:44.6090466Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6090814Z mov.b64 {%r1149, %r1150}, %rd106; 2026-02-21T09:07:44.6091005Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T09:07:44.6091305Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6091650Z cvt.u64.u32 %rd107, %r552; 2026-02-21T09:07:44.6091816Z cvt.u64.u32 %rd108, %r553; 2026-02-21T09:07:44.6091987Z shl.b64 %rd109, %rd108, 32; 
2026-02-21T09:07:44.6092157Z or.b64 %rd110, %rd107, %rd109; 2026-02-21T09:07:44.6092453Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6092769Z mov.b64 {%r1152, %r1153}, %rd110; 2026-02-21T09:07:44.6092960Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T09:07:44.6093280Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6093626Z cvt.u64.u32 %rd111, %r554; 2026-02-21T09:07:44.6093799Z cvt.u64.u32 %rd112, %r555; 2026-02-21T09:07:44.6093959Z shl.b64 %rd113, %rd112, 32; 2026-02-21T09:07:44.6094128Z or.b64 %rd114, %rd111, %rd113; 2026-02-21T09:07:44.6094406Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6094729Z mov.b64 {%r1155, %r1156}, %rd114; 2026-02-21T09:07:44.6094920Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T09:07:44.6095248Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6095567Z cvt.u64.u32 %rd115, %r556; 2026-02-21T09:07:44.6095714Z cvt.u64.u32 %rd116, %r557; 2026-02-21T09:07:44.6095869Z shl.b64 %rd117, %rd116, 32; 2026-02-21T09:07:44.6096019Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T09:07:44.6096285Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6096596Z mov.b64 {%r1158, %r1159}, %rd118; 2026-02-21T09:07:44.6096775Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T09:07:44.6097081Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6097375Z cvt.u64.u32 %rd119, %r558; 2026-02-21T09:07:44.6097542Z cvt.u64.u32 %rd120, %r559; 2026-02-21T09:07:44.6097701Z shl.b64 %rd121, %rd120, 32; 2026-02-21T09:07:44.6097870Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T09:07:44.6098151Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6098455Z mov.b64 {%r1161, %r1162}, %rd122; 2026-02-21T09:07:44.6098642Z 
cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T09:07:44.6098937Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6099240Z cvt.u64.u32 %rd123, %r561; 2026-02-21T09:07:44.6099399Z cvt.u64.u32 %rd124, %r562; 2026-02-21T09:07:44.6099562Z shl.b64 %rd125, %rd124, 32; 2026-02-21T09:07:44.6099723Z or.b64 %rd126, %rd123, %rd125; 2026-02-21T09:07:44.6100008Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6100321Z mov.b64 {%r1164, %r1165}, %rd126; 2026-02-21T09:07:44.6100502Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T09:07:44.6100808Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6101105Z cvt.u64.u32 %rd127, %r563; 2026-02-21T09:07:44.6101269Z cvt.u64.u32 %rd128, %r564; 2026-02-21T09:07:44.6101426Z shl.b64 %rd129, %rd128, 32; 2026-02-21T09:07:44.6101594Z or.b64 %rd130, %rd127, %rd129; 2026-02-21T09:07:44.6101874Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6102185Z mov.b64 {%r1167, %r1168}, %rd130; 2026-02-21T09:07:44.6102369Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T09:07:44.6102667Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6103012Z cvt.u64.u32 %rd131, %r565; 2026-02-21T09:07:44.6103169Z cvt.u64.u32 %rd132, %r566; 2026-02-21T09:07:44.6103337Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:07:44.6103497Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:07:44.6103828Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6104136Z mov.b64 {%r1170, %r1171}, %rd134; 2026-02-21T09:07:44.6104318Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T09:07:44.6104620Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6104947Z cvt.u64.u32 %rd135, %r567; 2026-02-21T09:07:44.6105133Z cvt.u64.u32 %rd136, %r568; 
2026-02-21T09:07:44.6105297Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:07:44.6105480Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:07:44.6105813Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6106141Z mov.b64 {%r1173, %r1174}, %rd138; 2026-02-21T09:07:44.6106338Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T09:07:44.6106609Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6106894Z cvt.u64.u32 %rd139, %r569; 2026-02-21T09:07:44.6107042Z cvt.u64.u32 %rd140, %r570; 2026-02-21T09:07:44.6107197Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:07:44.6107346Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:07:44.6107646Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6107937Z mov.b64 {%r1176, %r1177}, %rd142; 2026-02-21T09:07:44.6108108Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T09:07:44.6108391Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6108677Z cvt.u64.u32 %rd143, %r571; 2026-02-21T09:07:44.6108835Z cvt.u64.u32 %rd144, %r572; 2026-02-21T09:07:44.6108983Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:07:44.6109142Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:07:44.6109402Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6109692Z mov.b64 {%r1179, %r1180}, %rd146; 2026-02-21T09:07:44.6109869Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T09:07:44.6110144Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6110429Z cvt.u64.u32 %rd147, %r573; 2026-02-21T09:07:44.6110576Z cvt.u64.u32 %rd148, %r574; 2026-02-21T09:07:44.6110728Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:07:44.6110876Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:07:44.6111143Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6111437Z mov.b64 {%r1182, 
%r1183}, %rd150; 2026-02-21T09:07:44.6111605Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T09:07:44.6111886Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6112192Z cvt.u64.u32 %rd151, %r575; 2026-02-21T09:07:44.6112351Z cvt.u64.u32 %rd152, %r576; 2026-02-21T09:07:44.6112504Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:07:44.6112667Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:07:44.6112940Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6113239Z mov.b64 {%r1185, %r1186}, %rd154; 2026-02-21T09:07:44.6113423Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T09:07:44.6113710Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6114005Z cvt.u64.u32 %rd155, %r578; 2026-02-21T09:07:44.6114159Z cvt.u64.u32 %rd156, %r579; 2026-02-21T09:07:44.6114319Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:07:44.6114478Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:07:44.6114861Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6115175Z mov.b64 {%r1188, %r1189}, %rd158; 2026-02-21T09:07:44.6115354Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 2026-02-21T09:07:44.6115690Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6115990Z cvt.u64.u32 %rd159, %r580; 2026-02-21T09:07:44.6116158Z cvt.u64.u32 %rd160, %r581; 2026-02-21T09:07:44.6116327Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:07:44.6116499Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:07:44.6116774Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6117090Z mov.b64 {%r1191, %r1192}, %rd162; 2026-02-21T09:07:44.6117287Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T09:07:44.6117607Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6117915Z cvt.u64.u32 %rd163, %r582; 
2026-02-21T09:07:44.6118066Z cvt.u64.u32 %rd164, %r583; 2026-02-21T09:07:44.6118225Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:07:44.6118382Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:07:44.6118666Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6118965Z mov.b64 {%r1194, %r1195}, %rd166; 2026-02-21T09:07:44.6119139Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T09:07:44.6119461Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6119762Z cvt.u64.u32 %rd167, %r584; 2026-02-21T09:07:44.6119925Z cvt.u64.u32 %rd168, %r585; 2026-02-21T09:07:44.6120078Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:07:44.6120244Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:07:44.6120525Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6120825Z mov.b64 {%r1197, %r1198}, %rd170; 2026-02-21T09:07:44.6121009Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 2026-02-21T09:07:44.6121297Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6121594Z cvt.u64.u32 %rd171, %r586; 2026-02-21T09:07:44.6121750Z cvt.u64.u32 %rd172, %r587; 2026-02-21T09:07:44.6121910Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:07:44.6122065Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:07:44.6122340Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6122626Z mov.b64 {%r1200, %r1201}, %rd174; 2026-02-21T09:07:44.6122792Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T09:07:44.6123073Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6123353Z cvt.u64.u32 %rd175, %r588; 2026-02-21T09:07:44.6123506Z cvt.u64.u32 %rd176, %r589; 2026-02-21T09:07:44.6123654Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:07:44.6123810Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:07:44.6124070Z .loc 1 58 27 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6124354Z mov.b64 {%r1203, %r1204}, %rd178; 2026-02-21T09:07:44.6124527Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T09:07:44.6124820Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6125118Z cvt.u64.u32 %rd179, %r590; 2026-02-21T09:07:44.6125276Z cvt.u64.u32 %rd180, %r591; 2026-02-21T09:07:44.6125443Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:07:44.6125607Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:07:44.6125898Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6126209Z mov.b64 {%r1206, %r1207}, %rd182; 2026-02-21T09:07:44.6126397Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T09:07:44.6126721Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6127015Z cvt.u64.u32 %rd183, %r592; 2026-02-21T09:07:44.6127170Z cvt.u64.u32 %rd184, %r593; 2026-02-21T09:07:44.6127345Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:07:44.6127505Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:07:44.6127781Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6128064Z mov.b64 {%r1209, %r1210}, %rd186; 2026-02-21T09:07:44.6128245Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T09:07:44.6128516Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6128800Z cvt.u64.u32 %rd187, %r595; 2026-02-21T09:07:44.6128946Z cvt.u64.u32 %rd188, %r596; 2026-02-21T09:07:44.6129099Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:07:44.6129278Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:07:44.6129552Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6129842Z mov.b64 {%r1212, %r1213}, %rd190; 2026-02-21T09:07:44.6130007Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T09:07:44.6130288Z .loc 1 56 52 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6130567Z cvt.u64.u32 %rd191, %r597; 2026-02-21T09:07:44.6130721Z cvt.u64.u32 %rd192, %r598; 2026-02-21T09:07:44.6130897Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:07:44.6131058Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:07:44.6131330Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6131620Z mov.b64 {%r1215, %r1216}, %rd194; 2026-02-21T09:07:44.6131794Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T09:07:44.6132078Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6132370Z cvt.u64.u32 %rd195, %r599; 2026-02-21T09:07:44.6132519Z cvt.u64.u32 %rd196, %r600; 2026-02-21T09:07:44.6132674Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:07:44.6132824Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:07:44.6133094Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6133389Z mov.b64 {%r1218, %r1219}, %rd198; 2026-02-21T09:07:44.6133556Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T09:07:44.6133844Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6134120Z cvt.u64.u32 %rd199, %r601; 2026-02-21T09:07:44.6134272Z cvt.u64.u32 %rd200, %r602; 2026-02-21T09:07:44.6134419Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:07:44.6134576Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:07:44.6134876Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6135165Z mov.b64 {%r1221, %r1222}, %rd202; 2026-02-21T09:07:44.6135337Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T09:07:44.6135633Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6135950Z cvt.u64.u32 %rd203, %r603; 2026-02-21T09:07:44.6136108Z cvt.u64.u32 %rd204, %r604; 2026-02-21T09:07:44.6136273Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:07:44.6136435Z or.b64 
%rd206, %rd203, %rd205; 2026-02-21T09:07:44.6136732Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6137047Z mov.b64 {%r1224, %r1225}, %rd206; 2026-02-21T09:07:44.6137227Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T09:07:44.6137537Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6137847Z cvt.u64.u32 %rd207, %r605; 2026-02-21T09:07:44.6138016Z cvt.u64.u32 %rd208, %r606; 2026-02-21T09:07:44.6138213Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:07:44.6138382Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:07:44.6138672Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6139027Z mov.b64 {%r1227, %r1228}, %rd210; 2026-02-21T09:07:44.6139248Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T09:07:44.6139545Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6139854Z cvt.u64.u32 %rd211, %r607; 2026-02-21T09:07:44.6140013Z cvt.u64.u32 %rd212, %r608; 2026-02-21T09:07:44.6140179Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:07:44.6140338Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:07:44.6140622Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6140932Z mov.b64 {%r1230, %r1231}, %rd214; 2026-02-21T09:07:44.6141140Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T09:07:44.6141447Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6141751Z cvt.u64.u32 %rd215, %r609; 2026-02-21T09:07:44.6141914Z cvt.u64.u32 %rd216, %r610; 2026-02-21T09:07:44.6142072Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:07:44.6142243Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:07:44.6142533Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6142866Z mov.b64 {%r1233, %r1234}, %rd218; 2026-02-21T09:07:44.6143055Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 
2026-02-21T09:07:44.6143346Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6143651Z cvt.u64.u32 %rd219, %r612; 2026-02-21T09:07:44.6143810Z cvt.u64.u32 %rd220, %r613; 2026-02-21T09:07:44.6143974Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:07:44.6144138Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:07:44.6144426Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6144763Z mov.b64 {%r1236, %r1237}, %rd222; 2026-02-21T09:07:44.6144944Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T09:07:44.6145253Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6145559Z cvt.u64.u32 %rd223, %r614; 2026-02-21T09:07:44.6145724Z cvt.u64.u32 %rd224, %r615; 2026-02-21T09:07:44.6145885Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:07:44.6146053Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:07:44.6146341Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6146651Z mov.b64 {%r1239, %r1240}, %rd226; 2026-02-21T09:07:44.6146839Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T09:07:44.6147143Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6147449Z cvt.u64.u32 %rd227, %r616; 2026-02-21T09:07:44.6147606Z cvt.u64.u32 %rd228, %r617; 2026-02-21T09:07:44.6147771Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:07:44.6147936Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:07:44.6148220Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6148532Z mov.b64 {%r1242, %r1243}, %rd230; 2026-02-21T09:07:44.6148711Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 2026-02-21T09:07:44.6149023Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6149320Z cvt.u64.u32 %rd231, %r618; 2026-02-21T09:07:44.6149486Z cvt.u64.u32 %rd232, %r619; 2026-02-21T09:07:44.6149643Z shl.b64 %rd233, %rd232, 
32; 2026-02-21T09:07:44.6149812Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:07:44.6150108Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6150409Z mov.b64 {%r1245, %r1246}, %rd234; 2026-02-21T09:07:44.6150651Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T09:07:44.6150943Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6151258Z cvt.u64.u32 %rd235, %r620; 2026-02-21T09:07:44.6151448Z cvt.u64.u32 %rd236, %r621; 2026-02-21T09:07:44.6151627Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:07:44.6151810Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:07:44.6152088Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6152386Z mov.b64 {%r1248, %r1249}, %rd238; 2026-02-21T09:07:44.6152561Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T09:07:44.6152848Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6153126Z cvt.u64.u32 %rd239, %r622; 2026-02-21T09:07:44.6153317Z cvt.u64.u32 %rd240, %r623; 2026-02-21T09:07:44.6153468Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:07:44.6153626Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:07:44.6153896Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6154177Z mov.b64 {%r1251, %r1252}, %rd242; 2026-02-21T09:07:44.6154353Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T09:07:44.6154628Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6154991Z cvt.u64.u32 %rd243, %r624; 2026-02-21T09:07:44.6155155Z cvt.u64.u32 %rd244, %r625; 2026-02-21T09:07:44.6155322Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:07:44.6155490Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:07:44.6155766Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6156080Z mov.b64 {%r1254, %r1255}, %rd246; 2026-02-21T09:07:44.6156261Z 
cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T09:07:44.6156556Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6156832Z cvt.u64.u32 %rd247, %r626; 2026-02-21T09:07:44.6156987Z cvt.u64.u32 %rd248, %r627; 2026-02-21T09:07:44.6157133Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:07:44.6157288Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:07:44.6157555Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6157831Z mov.b64 {%r1257, %r1258}, %rd250; 2026-02-21T09:07:44.6158006Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T09:07:44.6158278Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6158556Z cvt.u64.u32 %rd251, %r629; 2026-02-21T09:07:44.6158702Z cvt.u64.u32 %rd252, %r630; 2026-02-21T09:07:44.6158854Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:07:44.6159008Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:07:44.6159269Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6159552Z mov.b64 {%r1260, %r1261}, %rd254; 2026-02-21T09:07:44.6159718Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T09:07:44.6159995Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6160271Z cvt.u64.u32 %rd255, %r631; 2026-02-21T09:07:44.6160425Z cvt.u64.u32 %rd256, %r632; 2026-02-21T09:07:44.6160573Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:07:44.6160731Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:07:44.6160997Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6161271Z mov.b64 {%r1263, %r1264}, %rd258; 2026-02-21T09:07:44.6161444Z cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T09:07:44.6161719Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6162002Z cvt.u64.u32 %rd259, %r633; 2026-02-21T09:07:44.6162197Z cvt.u64.u32 %rd260, %r634; 
2026-02-21T09:07:44.6162354Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:07:44.6162516Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:07:44.6162777Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6163098Z mov.b64 {%r1266, %r1267}, %rd262; 2026-02-21T09:07:44.6163267Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T09:07:44.6163562Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6163873Z cvt.u64.u32 %rd263, %r635; 2026-02-21T09:07:44.6164050Z cvt.u64.u32 %rd264, %r636; 2026-02-21T09:07:44.6164211Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:07:44.6164382Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:07:44.6164697Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6165029Z mov.b64 {%r1269, %r1270}, %rd266; 2026-02-21T09:07:44.6165217Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T09:07:44.6165521Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6165828Z cvt.u64.u32 %rd267, %r637; 2026-02-21T09:07:44.6165986Z cvt.u64.u32 %rd268, %r638; 2026-02-21T09:07:44.6166154Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:07:44.6166322Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:07:44.6166618Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6166907Z mov.b64 {%r1272, %r1273}, %rd270; 2026-02-21T09:07:44.6167080Z cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T09:07:44.6167364Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6167649Z cvt.u64.u32 %rd271, %r639; 2026-02-21T09:07:44.6167808Z cvt.u64.u32 %rd272, %r640; 2026-02-21T09:07:44.6167966Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:07:44.6168123Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:07:44.6168391Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6168675Z mov.b64 {%r1275, 
%r1276}, %rd274; 2026-02-21T09:07:44.6168852Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T09:07:44.6169136Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6169429Z cvt.u64.u32 %rd275, %r641; 2026-02-21T09:07:44.6169581Z cvt.u64.u32 %rd276, %r642; 2026-02-21T09:07:44.6169740Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:07:44.6169901Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:07:44.6170166Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6170454Z mov.b64 {%r1278, %r1279}, %rd278; 2026-02-21T09:07:44.6170627Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T09:07:44.6170910Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6171189Z cvt.u64.u32 %rd279, %r643; 2026-02-21T09:07:44.6171346Z cvt.u64.u32 %rd280, %r644; 2026-02-21T09:07:44.6171504Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:07:44.6171661Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:07:44.6171932Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6172211Z mov.b64 {%r1281, %r1282}, %rd282; 2026-02-21T09:07:44.6172390Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T09:07:44.6172666Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6172964Z cvt.u64.u32 %rd283, %r646; 2026-02-21T09:07:44.6173115Z cvt.u64.u32 %rd284, %r647; 2026-02-21T09:07:44.6173277Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:07:44.6173444Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:07:44.6173711Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6174036Z mov.b64 {%r1284, %r1285}, %rd286; 2026-02-21T09:07:44.6174204Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T09:07:44.6174490Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6174831Z cvt.u64.u32 %rd287, %r648; 
2026-02-21T09:07:44.6174987Z cvt.u64.u32 %rd288, %r649; 2026-02-21T09:07:44.6175164Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:07:44.6175317Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:07:44.6175583Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6175881Z mov.b64 {%r1287, %r1288}, %rd290; 2026-02-21T09:07:44.6176067Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T09:07:44.6176364Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6176701Z cvt.u64.u32 %rd291, %r650; 2026-02-21T09:07:44.6176866Z cvt.u64.u32 %rd292, %r651; 2026-02-21T09:07:44.6177031Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:07:44.6177202Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:07:44.6177484Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6177792Z mov.b64 {%r1290, %r1291}, %rd294; 2026-02-21T09:07:44.6177970Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 2026-02-21T09:07:44.6178303Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6178607Z cvt.u64.u32 %rd295, %r652; 2026-02-21T09:07:44.6178773Z cvt.u64.u32 %rd296, %r653; 2026-02-21T09:07:44.6178935Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:07:44.6179096Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:07:44.6179379Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6179675Z mov.b64 {%r1293, %r1294}, %rd298; 2026-02-21T09:07:44.6179861Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T09:07:44.6180149Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6180448Z cvt.u64.u32 %rd299, %r654; 2026-02-21T09:07:44.6180606Z cvt.u64.u32 %rd300, %r655; 2026-02-21T09:07:44.6180768Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:07:44.6180934Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:07:44.6181209Z .loc 1 58 27 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6181519Z mov.b64 {%r1296, %r1297}, %rd302; 2026-02-21T09:07:44.6181696Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T09:07:44.6181994Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6182296Z cvt.u64.u32 %rd303, %r656; 2026-02-21T09:07:44.6182461Z cvt.u64.u32 %rd304, %r657; 2026-02-21T09:07:44.6182626Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:07:44.6182786Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:07:44.6183069Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6183366Z mov.b64 {%r1299, %r1300}, %rd306; 2026-02-21T09:07:44.6183553Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T09:07:44.6183844Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6184152Z cvt.u64.u32 %rd307, %r658; 2026-02-21T09:07:44.6184309Z cvt.u64.u32 %rd308, %r659; 2026-02-21T09:07:44.6184474Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:07:44.6184641Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:07:44.6184958Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6185282Z mov.b64 {%r1302, %r1303}, %rd310; 2026-02-21T09:07:44.6185465Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T09:07:44.6185775Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6186122Z cvt.u64.u32 %rd311, %r660; 2026-02-21T09:07:44.6186286Z cvt.u64.u32 %rd312, %r661; 2026-02-21T09:07:44.6186452Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:07:44.6186612Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:07:44.6186930Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6187226Z mov.b64 {%r1305, %r1306}, %rd314; 2026-02-21T09:07:44.6187410Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T09:07:44.6187708Z .loc 1 56 52 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6188010Z cvt.u64.u32 %rd315, %r663; 2026-02-21T09:07:44.6188177Z cvt.u64.u32 %rd316, %r664; 2026-02-21T09:07:44.6188336Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:07:44.6188504Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:07:44.6188811Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6189128Z mov.b64 {%r1308, %r1309}, %rd318; 2026-02-21T09:07:44.6189308Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T09:07:44.6189612Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6189910Z cvt.u64.u32 %rd319, %r665; 2026-02-21T09:07:44.6190075Z cvt.u64.u32 %rd320, %r666; 2026-02-21T09:07:44.6190239Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:07:44.6190448Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:07:44.6190731Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6191032Z mov.b64 {%r1311, %r1312}, %rd322; 2026-02-21T09:07:44.6191216Z cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T09:07:44.6191516Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6191809Z cvt.u64.u32 %rd323, %r667; 2026-02-21T09:07:44.6191962Z cvt.u64.u32 %rd324, %r668; 2026-02-21T09:07:44.6192109Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:07:44.6192264Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:07:44.6192522Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6192805Z mov.b64 {%r1314, %r1315}, %rd326; 2026-02-21T09:07:44.6192969Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T09:07:44.6193253Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6193527Z cvt.u64.u32 %rd327, %r669; 2026-02-21T09:07:44.6193683Z cvt.u64.u32 %rd328, %r670; 2026-02-21T09:07:44.6193835Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:07:44.6193984Z or.b64 
%rd330, %rd327, %rd329; 2026-02-21T09:07:44.6194245Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6194587Z mov.b64 {%r1317, %r1318}, %rd330; 2026-02-21T09:07:44.6194803Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T09:07:44.6195105Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6195418Z cvt.u64.u32 %rd331, %r671; 2026-02-21T09:07:44.6195585Z cvt.u64.u32 %rd332, %r672; 2026-02-21T09:07:44.6195747Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:07:44.6195919Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:07:44.6196201Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6196504Z mov.b64 {%r1320, %r1321}, %rd334; 2026-02-21T09:07:44.6196672Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T09:07:44.6196972Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6197262Z cvt.u64.u32 %rd335, %r673; 2026-02-21T09:07:44.6197428Z cvt.u64.u32 %rd336, %r674; 2026-02-21T09:07:44.6197585Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:07:44.6197768Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:07:44.6198034Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6198325Z mov.b64 {%r1323, %r1324}, %rd338; 2026-02-21T09:07:44.6198498Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T09:07:44.6198799Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6199078Z cvt.u64.u32 %rd339, %r675; 2026-02-21T09:07:44.6199231Z cvt.u64.u32 %rd340, %r676; 2026-02-21T09:07:44.6199378Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:07:44.6199535Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:07:44.6199793Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6200083Z mov.b64 {%r1326, %r1327}, %rd342; 2026-02-21T09:07:44.6200148Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 
2026-02-21T09:07:44.6200335Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6200403Z cvt.u64.u32 %rd343, %r677; 2026-02-21T09:07:44.6200459Z cvt.u64.u32 %rd344, %r678; 2026-02-21T09:07:44.6200516Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:07:44.6200581Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:07:44.6200750Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6200808Z mov.b64 {%r1329, %r1330}, %rd346; 2026-02-21T09:07:44.6200901Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T09:07:44.6201082Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6201139Z cvt.u64.u32 %rd347, %r680; 2026-02-21T09:07:44.6201195Z cvt.u64.u32 %rd348, %r681; 2026-02-21T09:07:44.6201259Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:07:44.6201316Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:07:44.6201483Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6201549Z mov.b64 {%r1332, %r1333}, %rd350; 2026-02-21T09:07:44.6201614Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T09:07:44.6201783Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6201846Z cvt.u64.u32 %rd351, %r682; 2026-02-21T09:07:44.6201902Z cvt.u64.u32 %rd352, %r683; 2026-02-21T09:07:44.6201958Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:07:44.6202015Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:07:44.6202190Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6202247Z mov.b64 {%r1335, %r1336}, %rd354; 2026-02-21T09:07:44.6202310Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 2026-02-21T09:07:44.6202478Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6202535Z cvt.u64.u32 %rd355, %r684; 2026-02-21T09:07:44.6202592Z cvt.u64.u32 %rd356, %r685; 2026-02-21T09:07:44.6202648Z shl.b64 %rd357, %rd356, 
32; 2026-02-21T09:07:44.6202712Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:07:44.6202877Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6202935Z mov.b64 {%r1338, %r1339}, %rd358; 2026-02-21T09:07:44.6203007Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T09:07:44.6203171Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6203227Z cvt.u64.u32 %rd359, %r686; 2026-02-21T09:07:44.6203289Z cvt.u64.u32 %rd360, %r687; 2026-02-21T09:07:44.6203344Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:07:44.6203401Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:07:44.6203568Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6203633Z mov.b64 {%r1341, %r1342}, %rd362; 2026-02-21T09:07:44.6203696Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T09:07:44.6203890Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6203951Z cvt.u64.u32 %rd363, %r688; 2026-02-21T09:07:44.6204008Z cvt.u64.u32 %rd364, %r689; 2026-02-21T09:07:44.6204089Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:07:44.6204153Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:07:44.6204324Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6204381Z mov.b64 {%r1344, %r1345}, %rd366; 2026-02-21T09:07:44.6204444Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T09:07:44.6204618Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6204701Z cvt.u64.u32 %rd367, %r690; 2026-02-21T09:07:44.6204757Z cvt.u64.u32 %rd368, %r691; 2026-02-21T09:07:44.6204822Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:07:44.6204901Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:07:44.6205070Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6205135Z mov.b64 {%r1347, %r1348}, %rd370; 2026-02-21T09:07:44.6205201Z 
cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T09:07:44.6205369Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6205432Z cvt.u64.u32 %rd371, %r692; 2026-02-21T09:07:44.6205525Z cvt.u64.u32 %rd372, %r693; 2026-02-21T09:07:44.6205583Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:07:44.6205640Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:07:44.6205819Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6205880Z mov.b64 {%r1350, %r1351}, %rd374; 2026-02-21T09:07:44.6205949Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T09:07:44.6206136Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6206200Z cvt.u64.u32 %rd375, %r694; 2026-02-21T09:07:44.6206263Z cvt.u64.u32 %rd376, %r695; 2026-02-21T09:07:44.6206325Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:07:44.6206399Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:07:44.6206576Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6206638Z mov.b64 {%r1353, %r1354}, %rd378; 2026-02-21T09:07:44.6206719Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T09:07:44.6206895Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6206957Z cvt.u64.u32 %rd379, %r697; 2026-02-21T09:07:44.6207023Z cvt.u64.u32 %rd380, %r698; 2026-02-21T09:07:44.6207085Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:07:44.6207146Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:07:44.6207323Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6207394Z mov.b64 {%r1356, %r1357}, %rd382; 2026-02-21T09:07:44.6207461Z cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T09:07:44.6207635Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6207705Z cvt.u64.u32 %rd383, %r699; 2026-02-21T09:07:44.6207765Z cvt.u64.u32 %rd384, %r700; 
2026-02-21T09:07:44.6207826Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:07:44.6207894Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:07:44.6208070Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6208131Z mov.b64 {%r1359, %r1360}, %rd386; 2026-02-21T09:07:44.6208199Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T09:07:44.6208378Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6208439Z cvt.u64.u32 %rd387, %r701; 2026-02-21T09:07:44.6208502Z cvt.u64.u32 %rd388, %r702; 2026-02-21T09:07:44.6208601Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:07:44.6208663Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:07:44.6208843Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6208936Z mov.b64 {%r1362, %r1363}, %rd390; 2026-02-21T09:07:44.6209004Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T09:07:44.6209184Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6209250Z cvt.u64.u32 %rd391, %r703; 2026-02-21T09:07:44.6209311Z cvt.u64.u32 %rd392, %r704; 2026-02-21T09:07:44.6209370Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:07:44.6209430Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:07:44.6209615Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6209676Z mov.b64 {%r1365, %r1366}, %rd394; 2026-02-21T09:07:44.6209769Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T09:07:44.6209958Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6210018Z cvt.u64.u32 %rd395, %r705; 2026-02-21T09:07:44.6210078Z cvt.u64.u32 %rd396, %r706; 2026-02-21T09:07:44.6210141Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:07:44.6210209Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:07:44.6210387Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6210476Z mov.b64 {%r1368, 
%r1369}, %rd398; 2026-02-21T09:07:44.6210553Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T09:07:44.6210723Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6210785Z cvt.u64.u32 %rd399, %r707; 2026-02-21T09:07:44.6210850Z cvt.u64.u32 %rd400, %r708; 2026-02-21T09:07:44.6210909Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:07:44.6210972Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:07:44.6211144Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6211215Z mov.b64 {%r1371, %r1372}, %rd402; 2026-02-21T09:07:44.6211281Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T09:07:44.6211455Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6211520Z cvt.u64.u32 %rd403, %r709; 2026-02-21T09:07:44.6211578Z cvt.u64.u32 %rd404, %r710; 2026-02-21T09:07:44.6211639Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:07:44.6211705Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:07:44.6211875Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6211934Z mov.b64 {%r1374, %r1375}, %rd406; 2026-02-21T09:07:44.6212001Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T09:07:44.6212181Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6212242Z cvt.u64.u32 %rd407, %r711; 2026-02-21T09:07:44.6212299Z cvt.u64.u32 %rd408, %r712; 2026-02-21T09:07:44.6212367Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:07:44.6212427Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:07:44.6212603Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6212671Z mov.b64 {%r1377, %r1378}, %rd410; 2026-02-21T09:07:44.6212739Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T09:07:44.6212907Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6212974Z cvt.u64.u32 %rd411, %r714; 
2026-02-21T09:07:44.6213032Z cvt.u64.u32 %rd412, %r715; 2026-02-21T09:07:44.6213091Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:07:44.6213151Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:07:44.6213332Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6213420Z mov.b64 {%r1380, %r1381}, %rd414; 2026-02-21T09:07:44.6213487Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T09:07:44.6213668Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6213756Z cvt.u64.u32 %rd415, %r716; 2026-02-21T09:07:44.6213815Z cvt.u64.u32 %rd416, %r717; 2026-02-21T09:07:44.6213873Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:07:44.6213940Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:07:44.6214122Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6214183Z mov.b64 {%r1383, %r1384}, %rd418; 2026-02-21T09:07:44.6214259Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T09:07:44.6214434Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6214495Z cvt.u64.u32 %rd419, %r718; 2026-02-21T09:07:44.6214603Z cvt.u64.u32 %rd420, %r719; 2026-02-21T09:07:44.6214705Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:07:44.6214769Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:07:44.6214949Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6215019Z mov.b64 {%r1386, %r1387}, %rd422; 2026-02-21T09:07:44.6215087Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T09:07:44.6215272Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6215373Z cvt.u64.u32 %rd423, %r720; 2026-02-21T09:07:44.6215435Z cvt.u64.u32 %rd424, %r721; 2026-02-21T09:07:44.6215496Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:07:44.6215567Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:07:44.6215745Z .loc 1 58 27 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6215807Z mov.b64 {%r1389, %r1390}, %rd426; 2026-02-21T09:07:44.6215877Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T09:07:44.6216063Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6216125Z cvt.u64.u32 %rd427, %r722; 2026-02-21T09:07:44.6216185Z cvt.u64.u32 %rd428, %r723; 2026-02-21T09:07:44.6216255Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:07:44.6216317Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:07:44.6216492Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6216566Z mov.b64 {%r1392, %r1393}, %rd430; 2026-02-21T09:07:44.6216637Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T09:07:44.6216816Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6216889Z cvt.u64.u32 %rd431, %r724; 2026-02-21T09:07:44.6216956Z cvt.u64.u32 %rd432, %r725; 2026-02-21T09:07:44.6217020Z shl.b64 %rd433, %rd432, 32; 2026-02-21T09:07:44.6217086Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:07:44.6217295Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6217360Z mov.b64 {%r1395, %r1396}, %rd434; 2026-02-21T09:07:44.6217431Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T09:07:44.6217623Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6217687Z cvt.u64.u32 %rd435, %r726; 2026-02-21T09:07:44.6217750Z cvt.u64.u32 %rd436, %r727; 2026-02-21T09:07:44.6217815Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:07:44.6217888Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:07:44.6218085Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6218150Z mov.b64 {%r1398, %r1399}, %rd438; 2026-02-21T09:07:44.6218230Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T09:07:44.6218429Z .loc 1 56 52 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6218523Z cvt.u64.u32 %rd439, %r728; 2026-02-21T09:07:44.6218594Z cvt.u64.u32 %rd440, %r729; 2026-02-21T09:07:44.6218658Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:07:44.6218721Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:07:44.6218914Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6219017Z mov.b64 {%r1401, %r1402}, %rd442; 2026-02-21T09:07:44.6219089Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T09:07:44.6219296Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6219367Z cvt.u64.u32 %rd443, %r731; 2026-02-21T09:07:44.6219429Z cvt.u64.u32 %rd444, %r732; 2026-02-21T09:07:44.6219493Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:07:44.6219564Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:07:44.6219796Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6219864Z mov.b64 {%r1404, %r1405}, %rd446; 2026-02-21T09:07:44.6219938Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T09:07:44.6220142Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6220208Z cvt.u64.u32 %rd447, %r733; 2026-02-21T09:07:44.6220270Z cvt.u64.u32 %rd448, %r734; 2026-02-21T09:07:44.6220340Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:07:44.6220403Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:07:44.6220624Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6220696Z mov.b64 {%r1407, %r1408}, %rd450; 2026-02-21T09:07:44.6220768Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T09:07:44.6220948Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6221009Z cvt.u64.u32 %rd451, %r735; 2026-02-21T09:07:44.6221065Z cvt.u64.u32 %rd452, %r736; 2026-02-21T09:07:44.6221122Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:07:44.6221179Z or.b64 
%rd454, %rd451, %rd453; 2026-02-21T09:07:44.6221351Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6221406Z mov.b64 {%r1410, %r1411}, %rd454; 2026-02-21T09:07:44.6221471Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T09:07:44.6221648Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6221705Z cvt.u64.u32 %rd455, %r737; 2026-02-21T09:07:44.6221759Z cvt.u64.u32 %rd456, %r738; 2026-02-21T09:07:44.6221814Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:07:44.6221877Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:07:44.6222041Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6222098Z mov.b64 {%r1413, %r1414}, %rd458; 2026-02-21T09:07:44.6222170Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T09:07:44.6222335Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6222392Z cvt.u64.u32 %rd459, %r739; 2026-02-21T09:07:44.6222453Z cvt.u64.u32 %rd460, %r740; 2026-02-21T09:07:44.6222511Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:07:44.6222568Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:07:44.6222732Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6222796Z mov.b64 {%r1416, %r1417}, %rd462; 2026-02-21T09:07:44.6222861Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T09:07:44.6223026Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6223089Z cvt.u64.u32 %rd463, %r741; 2026-02-21T09:07:44.6223146Z cvt.u64.u32 %rd464, %r742; 2026-02-21T09:07:44.6223203Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:07:44.6223268Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:07:44.6223460Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6223518Z mov.b64 {%r1419, %r1420}, %rd466; 2026-02-21T09:07:44.6223580Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 
2026-02-21T09:07:44.6223752Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6223832Z cvt.u64.u32 %rd467, %r743; 2026-02-21T09:07:44.6223887Z cvt.u64.u32 %rd468, %r744; 2026-02-21T09:07:44.6223953Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:07:44.6224010Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:07:44.6224172Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6224236Z mov.b64 {%r1422, %r1423}, %rd470; 2026-02-21T09:07:44.6224300Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T09:07:44.6224482Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6224553Z cvt.u64.u32 %rd471, %r745; 2026-02-21T09:07:44.6224613Z cvt.u64.u32 %rd472, %r746; 2026-02-21T09:07:44.6224714Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:07:44.6224778Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:07:44.6224960Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6225024Z mov.b64 {%r1425, %r1426}, %rd474; 2026-02-21T09:07:44.6225095Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T09:07:44.6225306Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6225371Z cvt.u64.u32 %rd475, %r748; 2026-02-21T09:07:44.6225433Z cvt.u64.u32 %rd476, %r749; 2026-02-21T09:07:44.6225495Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:07:44.6225569Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:07:44.6225747Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6225808Z mov.b64 {%r1428, %r1429}, %rd478; 2026-02-21T09:07:44.6225886Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 2026-02-21T09:07:44.6226062Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6226117Z cvt.u64.u32 %rd479, %r750; 2026-02-21T09:07:44.6226189Z cvt.u64.u32 %rd480, %r751; 2026-02-21T09:07:44.6226248Z shl.b64 %rd481, %rd480, 
32; 2026-02-21T09:07:44.6226308Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T09:07:44.6226474Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6226546Z mov.b64 {%r1431, %r1432}, %rd482; 2026-02-21T09:07:44.6226612Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T09:07:44.6226775Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6226842Z cvt.u64.u32 %rd483, %r752; 2026-02-21T09:07:44.6226901Z cvt.u64.u32 %rd484, %r753; 2026-02-21T09:07:44.6226957Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:07:44.6227023Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:07:44.6227181Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6227239Z mov.b64 {%r1434, %r1435}, %rd486; 2026-02-21T09:07:44.6227302Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T09:07:44.6227468Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6227524Z cvt.u64.u32 %rd487, %r754; 2026-02-21T09:07:44.6227578Z cvt.u64.u32 %rd488, %r755; 2026-02-21T09:07:44.6227642Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:07:44.6227699Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:07:44.6227858Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6227920Z mov.b64 {%r1437, %r1438}, %rd490; 2026-02-21T09:07:44.6227986Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T09:07:44.6228184Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6228248Z cvt.u64.u32 %rd491, %r756; 2026-02-21T09:07:44.6228303Z cvt.u64.u32 %rd492, %r757; 2026-02-21T09:07:44.6228359Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:07:44.6228445Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:07:44.6228612Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6228671Z mov.b64 {%r1440, %r1441}, %rd494; 2026-02-21T09:07:44.6228735Z 
cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T09:07:44.6228908Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6228964Z cvt.u64.u32 %rd495, %r758; 2026-02-21T09:07:44.6229018Z cvt.u64.u32 %rd496, %r759; 2026-02-21T09:07:44.6229075Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:07:44.6229138Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:07:44.6229326Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6229387Z mov.b64 {%r1443, %r1444}, %rd498; 2026-02-21T09:07:44.6229459Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T09:07:44.6229624Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6229681Z cvt.u64.u32 %rd499, %r760; 2026-02-21T09:07:44.6229743Z cvt.u64.u32 %rd500, %r761; 2026-02-21T09:07:44.6229821Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:07:44.6229878Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:07:44.6230043Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6230106Z mov.b64 {%r1446, %r1447}, %rd502; 2026-02-21T09:07:44.6230168Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T09:07:44.6230334Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6230397Z cvt.u64.u32 %rd503, %r762; 2026-02-21T09:07:44.6230451Z cvt.u64.u32 %rd504, %r763; 2026-02-21T09:07:44.6230507Z shl.b64 %rd505, %rd504, 32; 2026-02-21T09:07:44.6230569Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:07:44.6230735Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6230792Z mov.b64 {%r1449, %r1450}, %rd506; 2026-02-21T09:07:44.6230855Z cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T09:07:44.6231030Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6231084Z cvt.u64.u32 %rd507, %r765; 2026-02-21T09:07:44.6231138Z cvt.u64.u32 %rd508, %r766; 
2026-02-21T09:07:44.6231199Z shl.b64 %rd509, %rd508, 32; 2026-02-21T09:07:44.6231255Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:07:44.6231423Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6231488Z mov.b64 {%r1452, %r1453}, %rd510; 2026-02-21T09:07:44.6231553Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T09:07:44.6231718Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6231779Z cvt.u64.u32 %rd511, %r767; 2026-02-21T09:07:44.6231835Z cvt.u64.u32 %rd512, %r768; 2026-02-21T09:07:44.6231892Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:07:44.6231948Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:07:44.6232122Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6232179Z mov.b64 {%r1455, %r1456}, %rd514; 2026-02-21T09:07:44.6232242Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T09:07:44.6232415Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6232470Z cvt.u64.u32 %rd515, %r769; 2026-02-21T09:07:44.6232525Z cvt.u64.u32 %rd516, %r770; 2026-02-21T09:07:44.6232582Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:07:44.6232674Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T09:07:44.6232846Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6232903Z mov.b64 {%r1458, %r1459}, %rd518; 2026-02-21T09:07:44.6233010Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T09:07:44.6233174Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6233231Z cvt.u64.u32 %rd519, %r771; 2026-02-21T09:07:44.6233294Z cvt.u64.u32 %rd520, %r772; 2026-02-21T09:07:44.6233350Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:07:44.6233405Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:07:44.6233570Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6233635Z mov.b64 {%r1461, 
%r1462}, %rd522; 2026-02-21T09:07:44.6233697Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T09:07:44.6233884Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6233953Z cvt.u64.u32 %rd523, %r773; 2026-02-21T09:07:44.6234009Z cvt.u64.u32 %rd524, %r774; 2026-02-21T09:07:44.6234065Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:07:44.6234135Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:07:44.6234302Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6234360Z mov.b64 {%r1464, %r1465}, %rd526; 2026-02-21T09:07:44.6234444Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T09:07:44.6234615Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6234698Z cvt.u64.u32 %rd527, %r775; 2026-02-21T09:07:44.6234758Z cvt.u64.u32 %rd528, %r776; 2026-02-21T09:07:44.6234823Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:07:44.6234880Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:07:44.6235046Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6235110Z mov.b64 {%r1467, %r1468}, %rd530; 2026-02-21T09:07:44.6235173Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T09:07:44.6235340Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6235408Z cvt.u64.u32 %rd531, %r777; 2026-02-21T09:07:44.6235468Z cvt.u64.u32 %rd532, %r778; 2026-02-21T09:07:44.6235528Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:07:44.6235589Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:07:44.6235770Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6235832Z mov.b64 {%r1470, %r1471}, %rd534; 2026-02-21T09:07:44.6235899Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T09:07:44.6236080Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6236142Z cvt.u64.u32 %rd535, %r779; 
2026-02-21T09:07:44.6236205Z cvt.u64.u32 %rd536, %r780; 2026-02-21T09:07:44.6236266Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:07:44.6236335Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:07:44.6236513Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6236577Z mov.b64 {%r1473, %r1474}, %rd538; 2026-02-21T09:07:44.6236655Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T09:07:44.6236828Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6236891Z cvt.u64.u32 %rd539, %r782; 2026-02-21T09:07:44.6236959Z cvt.u64.u32 %rd540, %r783; 2026-02-21T09:07:44.6237020Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:07:44.6237080Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:07:44.6237255Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6237325Z mov.b64 {%r1476, %r1477}, %rd542; 2026-02-21T09:07:44.6237425Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T09:07:44.6237603Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6237671Z cvt.u64.u32 %rd543, %r784; 2026-02-21T09:07:44.6237759Z cvt.u64.u32 %rd544, %r785; 2026-02-21T09:07:44.6237820Z shl.b64 %rd545, %rd544, 32; 2026-02-21T09:07:44.6237889Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:07:44.6238068Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6238129Z mov.b64 {%r1479, %r1480}, %rd546; 2026-02-21T09:07:44.6238197Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T09:07:44.6238383Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6238442Z cvt.u64.u32 %rd547, %r786; 2026-02-21T09:07:44.6238504Z cvt.u64.u32 %rd548, %r787; 2026-02-21T09:07:44.6238598Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:07:44.6238662Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:07:44.6238838Z .loc 1 58 27 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6238906Z mov.b64 {%r1482, %r1483}, %rd550; 2026-02-21T09:07:44.6238976Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T09:07:44.6239154Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6239220Z cvt.u64.u32 %rd551, %r788; 2026-02-21T09:07:44.6239310Z cvt.u64.u32 %rd552, %r789; 2026-02-21T09:07:44.6239372Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:07:44.6239433Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:07:44.6239615Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6239676Z mov.b64 {%r1485, %r1486}, %rd554; 2026-02-21T09:07:44.6239742Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T09:07:44.6239920Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6239981Z cvt.u64.u32 %rd555, %r790; 2026-02-21T09:07:44.6240039Z cvt.u64.u32 %rd556, %r791; 2026-02-21T09:07:44.6240098Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:07:44.6240167Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T09:07:44.6240340Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6240401Z mov.b64 {%r1488, %r1489}, %rd558; 2026-02-21T09:07:44.6240479Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T09:07:44.6240652Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6240710Z cvt.u64.u32 %rd559, %r792; 2026-02-21T09:07:44.6240775Z cvt.u64.u32 %rd560, %r793; 2026-02-21T09:07:44.6240834Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:07:44.6240894Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:07:44.6241068Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6241138Z mov.b64 {%r1491, %r1492}, %rd562; 2026-02-21T09:07:44.6241205Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T09:07:44.6241380Z .loc 1 56 52 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6241449Z cvt.u64.u32 %rd563, %r794; 2026-02-21T09:07:44.6241508Z cvt.u64.u32 %rd564, %r795; 2026-02-21T09:07:44.6241568Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:07:44.6241638Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:07:44.6241812Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6241871Z mov.b64 {%r1494, %r1495}, %rd566; 2026-02-21T09:07:44.6241938Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T09:07:44.6242119Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6242179Z cvt.u64.u32 %rd567, %r796; 2026-02-21T09:07:44.6242262Z cvt.u64.u32 %rd568, %r797; 2026-02-21T09:07:44.6242330Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:07:44.6242390Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:07:44.6242569Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6242660Z mov.b64 {%r1497, %r1498}, %rd570; 2026-02-21T09:07:44.6242730Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T09:07:44.6242908Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6242978Z cvt.u64.u32 %rd571, %r799; 2026-02-21T09:07:44.6243038Z cvt.u64.u32 %rd572, %r800; 2026-02-21T09:07:44.6243100Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:07:44.6243161Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T09:07:44.6243345Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6243431Z mov.b64 {%r1500, %r1501}, %rd574; 2026-02-21T09:07:44.6243503Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T09:07:44.6243690Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6243751Z cvt.u64.u32 %rd575, %r801; 2026-02-21T09:07:44.6243811Z cvt.u64.u32 %rd576, %r802; 2026-02-21T09:07:44.6243871Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:07:44.6243940Z or.b64 
%rd578, %rd575, %rd577; 2026-02-21T09:07:44.6244150Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6244212Z mov.b64 {%r1503, %r1504}, %rd578; 2026-02-21T09:07:44.6244288Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T09:07:44.6244472Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6244532Z cvt.u64.u32 %rd579, %r803; 2026-02-21T09:07:44.6244599Z cvt.u64.u32 %rd580, %r804; 2026-02-21T09:07:44.6244660Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:07:44.6244755Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T09:07:44.6244932Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6245001Z mov.b64 {%r1506, %r1507}, %rd582; 2026-02-21T09:07:44.6245069Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T09:07:44.6245246Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6245312Z cvt.u64.u32 %rd583, %r805; 2026-02-21T09:07:44.6245372Z cvt.u64.u32 %rd584, %r806; 2026-02-21T09:07:44.6245434Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:07:44.6245503Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:07:44.6245680Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6245743Z mov.b64 {%r1509, %r1510}, %rd586; 2026-02-21T09:07:44.6245811Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T09:07:44.6245992Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6246054Z cvt.u64.u32 %rd587, %r807; 2026-02-21T09:07:44.6246113Z cvt.u64.u32 %rd588, %r808; 2026-02-21T09:07:44.6246180Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:07:44.6246239Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:07:44.6246418Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6246487Z mov.b64 {%r1512, %r1513}, %rd590; 2026-02-21T09:07:44.6246555Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 
2026-02-21T09:07:44.6246730Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6246796Z cvt.u64.u32 %rd591, %r809; 2026-02-21T09:07:44.6246858Z cvt.u64.u32 %rd592, %r810; 2026-02-21T09:07:44.6246917Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:07:44.6246977Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:07:44.6247162Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6247257Z mov.b64 {%r1515, %r1516}, %rd594; 2026-02-21T09:07:44.6247326Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T09:07:44.6247512Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6247602Z cvt.u64.u32 %rd595, %r811; 2026-02-21T09:07:44.6247661Z cvt.u64.u32 %rd596, %r812; 2026-02-21T09:07:44.6247720Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:07:44.6247790Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:07:44.6247961Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6248022Z mov.b64 {%r1518, %r1519}, %rd598; 2026-02-21T09:07:44.6248096Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T09:07:44.6248266Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6248355Z cvt.u64.u32 %rd599, %r813; 2026-02-21T09:07:44.6248421Z cvt.u64.u32 %rd600, %r814; 2026-02-21T09:07:44.6248480Z shl.b64 %rd601, %rd600, 32; 2026-02-21T09:07:44.6248544Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:07:44.6248722Z .loc 1 58 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:58:27 2026-02-21T09:07:44.6248792Z mov.b64 {%r1521, %r1522}, %rd602; 2026-02-21T09:07:44.6248860Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T09:07:44.6249065Z .loc 1 59 45 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:59:45 2026-02-21T09:07:44.6249134Z shl.b32 %r1524, %r1107, 11; 2026-02-21T09:07:44.6249195Z shl.b32 %r1525, %r1108, 11; 2026-02-21T09:07:44.6249254Z shl.b32 %r1526, 
%r1109, 11; 2026-02-21T09:07:44.6249319Z shl.b32 %r1527, %r1110, 11; 2026-02-21T09:07:44.6249377Z shl.b32 %r1528, %r1111, 11; 2026-02-21T09:07:44.6249435Z shl.b32 %r1529, %r1112, 11; 2026-02-21T09:07:44.6249494Z shl.b32 %r1530, %r1113, 11; 2026-02-21T09:07:44.6249560Z shl.b32 %r1531, %r1114, 11; 2026-02-21T09:07:44.6249620Z shl.b32 %r1532, %r1115, 11; 2026-02-21T09:07:44.6249678Z shl.b32 %r1533, %r1116, 11; 2026-02-21T09:07:44.6249743Z shl.b32 %r1534, %r1117, 11; 2026-02-21T09:07:44.6249801Z shl.b32 %r1535, %r1118, 11; 2026-02-21T09:07:44.6249860Z shl.b32 %r1536, %r1119, 11; 2026-02-21T09:07:44.6249918Z shl.b32 %r1537, %r1120, 11; 2026-02-21T09:07:44.6249983Z shl.b32 %r1538, %r1121, 11; 2026-02-21T09:07:44.6250049Z shl.b32 %r1539, %r1122, 11; 2026-02-21T09:07:44.6250104Z shl.b32 %r1540, %r1123, 11; 2026-02-21T09:07:44.6250169Z shl.b32 %r1541, %r1124, 11; 2026-02-21T09:07:44.6250224Z shl.b32 %r1542, %r1125, 11; 2026-02-21T09:07:44.6250278Z shl.b32 %r1543, %r1126, 11; 2026-02-21T09:07:44.6250338Z shl.b32 %r1544, %r1127, 11; 2026-02-21T09:07:44.6250392Z shl.b32 %r1545, %r1128, 11; 2026-02-21T09:07:44.6250445Z shl.b32 %r1546, %r1129, 11; 2026-02-21T09:07:44.6250499Z shl.b32 %r1547, %r1130, 11; 2026-02-21T09:07:44.6250562Z shl.b32 %r1548, %r1131, 11; 2026-02-21T09:07:44.6250618Z shl.b32 %r1549, %r1132, 11; 2026-02-21T09:07:44.6250672Z shl.b32 %r1550, %r1133, 11; 2026-02-21T09:07:44.6250733Z shl.b32 %r1551, %r1134, 11; 2026-02-21T09:07:44.6250785Z shl.b32 %r1552, %r1135, 11; 2026-02-21T09:07:44.6250839Z shl.b32 %r1553, %r1136, 11; 2026-02-21T09:07:44.6250894Z shl.b32 %r1554, %r1137, 11; 2026-02-21T09:07:44.6250956Z shl.b32 %r1555, %r1138, 11; 2026-02-21T09:07:44.6251126Z .loc 1 59 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:59:52 2026-02-21T09:07:44.6251186Z add.s32 %r1556, %r1524, %r1106; 2026-02-21T09:07:44.6251253Z add.s32 %r1557, %r1525, %r1106; 2026-02-21T09:07:44.6251312Z add.s32 %r1558, %r1526, %r1106; 2026-02-21T09:07:44.6251371Z add.s32 %r1559, 
%r1527, %r1106; 2026-02-21T09:07:44.6251435Z add.s32 %r1560, %r1528, %r1106; 2026-02-21T09:07:44.6251492Z add.s32 %r1561, %r1529, %r1106; 2026-02-21T09:07:44.6251550Z add.s32 %r1562, %r1530, %r1106; 2026-02-21T09:07:44.6251609Z add.s32 %r1563, %r1531, %r1106; 2026-02-21T09:07:44.6251713Z add.s32 %r1564, %r1532, %r1106; 2026-02-21T09:07:44.6251771Z add.s32 %r1565, %r1533, %r1106; 2026-02-21T09:07:44.6251827Z add.s32 %r1566, %r1534, %r1106; 2026-02-21T09:07:44.6251888Z add.s32 %r1567, %r1535, %r1106; 2026-02-21T09:07:44.6251975Z add.s32 %r1568, %r1536, %r1106; 2026-02-21T09:07:44.6252030Z add.s32 %r1569, %r1537, %r1106; 2026-02-21T09:07:44.6252084Z add.s32 %r1570, %r1538, %r1106; 2026-02-21T09:07:44.6252147Z add.s32 %r1571, %r1539, %r1106; 2026-02-21T09:07:44.6252203Z add.s32 %r1572, %r1540, %r1106; 2026-02-21T09:07:44.6252259Z add.s32 %r1573, %r1541, %r1106; 2026-02-21T09:07:44.6252320Z add.s32 %r1574, %r1542, %r1106; 2026-02-21T09:07:44.6252375Z add.s32 %r1575, %r1543, %r1106; 2026-02-21T09:07:44.6252429Z add.s32 %r1576, %r1544, %r1106; 2026-02-21T09:07:44.6252484Z add.s32 %r1577, %r1545, %r1106; 2026-02-21T09:07:44.6252544Z add.s32 %r1578, %r1546, %r1106; 2026-02-21T09:07:44.6252623Z add.s32 %r1579, %r1547, %r1106; 2026-02-21T09:07:44.6252684Z add.s32 %r1580, %r1548, %r1106; 2026-02-21T09:07:44.6252745Z add.s32 %r1581, %r1549, %r1106; 2026-02-21T09:07:44.6252801Z add.s32 %r1582, %r1550, %r1106; 2026-02-21T09:07:44.6252854Z add.s32 %r1583, %r1551, %r1106; 2026-02-21T09:07:44.6252919Z add.s32 %r1584, %r1552, %r1106; 2026-02-21T09:07:44.6252974Z add.s32 %r1585, %r1553, %r1106; 2026-02-21T09:07:44.6253027Z add.s32 %r1586, %r1554, %r1106; 2026-02-21T09:07:44.6253081Z add.s32 %r1587, %r1555, %r1106; 2026-02-21T09:07:44.6253273Z .loc 1 59 24 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:59:24 2026-02-21T09:07:44.6253342Z mad.wide.s32 %rd59, %r1556, 2, %rd5; 2026-02-21T09:07:44.6253406Z mad.wide.s32 %rd60, %r1557, 2, %rd5; 2026-02-21T09:07:44.6253477Z 
mad.wide.s32 %rd61, %r1558, 2, %rd5; 2026-02-21T09:07:44.6253537Z mad.wide.s32 %rd62, %r1559, 2, %rd5; 2026-02-21T09:07:44.6253595Z mad.wide.s32 %rd63, %r1560, 2, %rd5; 2026-02-21T09:07:44.6253655Z mad.wide.s32 %rd64, %r1561, 2, %rd5; 2026-02-21T09:07:44.6253723Z mad.wide.s32 %rd65, %r1562, 2, %rd5; 2026-02-21T09:07:44.6253781Z mad.wide.s32 %rd66, %r1563, 2, %rd5; 2026-02-21T09:07:44.6253840Z mad.wide.s32 %rd67, %r1564, 2, %rd5; 2026-02-21T09:07:44.6253908Z mad.wide.s32 %rd68, %r1565, 2, %rd5; 2026-02-21T09:07:44.6253969Z mad.wide.s32 %rd69, %r1566, 2, %rd5; 2026-02-21T09:07:44.6254027Z mad.wide.s32 %rd70, %r1567, 2, %rd5; 2026-02-21T09:07:44.6254092Z mad.wide.s32 %rd71, %r1568, 2, %rd5; 2026-02-21T09:07:44.6254153Z mad.wide.s32 %rd72, %r1569, 2, %rd5; 2026-02-21T09:07:44.6254212Z mad.wide.s32 %rd73, %r1570, 2, %rd5; 2026-02-21T09:07:44.6254271Z mad.wide.s32 %rd74, %r1571, 2, %rd5; 2026-02-21T09:07:44.6254335Z mad.wide.s32 %rd75, %r1572, 2, %rd5; 2026-02-21T09:07:44.6254395Z mad.wide.s32 %rd76, %r1573, 2, %rd5; 2026-02-21T09:07:44.6254454Z mad.wide.s32 %rd77, %r1574, 2, %rd5; 2026-02-21T09:07:44.6254522Z mad.wide.s32 %rd78, %r1575, 2, %rd5; 2026-02-21T09:07:44.6254585Z mad.wide.s32 %rd79, %r1576, 2, %rd5; 2026-02-21T09:07:44.6254649Z mad.wide.s32 %rd80, %r1577, 2, %rd5; 2026-02-21T09:07:44.6254770Z mad.wide.s32 %rd81, %r1578, 2, %rd5; 2026-02-21T09:07:44.6254834Z mad.wide.s32 %rd82, %r1579, 2, %rd5; 2026-02-21T09:07:44.6254897Z mad.wide.s32 %rd83, %r1580, 2, %rd5; 2026-02-21T09:07:44.6254961Z mad.wide.s32 %rd84, %r1581, 2, %rd5; 2026-02-21T09:07:44.6255031Z mad.wide.s32 %rd85, %r1582, 2, %rd5; 2026-02-21T09:07:44.6255092Z mad.wide.s32 %rd86, %r1583, 2, %rd5; 2026-02-21T09:07:44.6255155Z mad.wide.s32 %rd87, %r1584, 2, %rd5; 2026-02-21T09:07:44.6255224Z mad.wide.s32 %rd88, %r1585, 2, %rd5; 2026-02-21T09:07:44.6255285Z mad.wide.s32 %rd89, %r1586, 2, %rd5; 2026-02-21T09:07:44.6255347Z mad.wide.s32 %rd90, %r1587, 2, %rd5; 2026-02-21T09:07:44.6255532Z .loc 1 59 82 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:59:82 2026-02-21T09:07:44.6255597Z bar.sync 0, 128; 2026-02-21T09:07:44.6255704Z st.shared.v4.b32 [%r80], {%r1142, %r1154, %r1166, %r1178}; 2026-02-21T09:07:44.6255847Z st.shared.v4.b32 [%r81], {%r1190, %r1202, %r1214, %r1226}; 2026-02-21T09:07:44.6255956Z st.shared.v4.b32 [%r82], {%r1238, %r1250, %r1262, %r1274}; 2026-02-21T09:07:44.6256052Z st.shared.v4.b32 [%r83], {%r1286, %r1298, %r1310, %r1322}; 2026-02-21T09:07:44.6256175Z st.shared.v4.b32 [%r84], {%r1334, %r1346, %r1358, %r1370}; 2026-02-21T09:07:44.6256274Z st.shared.v4.b32 [%r85], {%r1382, %r1394, %r1406, %r1418}; 2026-02-21T09:07:44.6256367Z st.shared.v4.b32 [%r86], {%r1430, %r1442, %r1454, %r1466}; 2026-02-21T09:07:44.6256457Z st.shared.v4.b32 [%r87], {%r1478, %r1490, %r1502, %r1514}; 2026-02-21T09:07:44.6256513Z bar.sync 0, 128; 2026-02-21T09:07:44.6256589Z // begin inline asm 2026-02-21T09:07:44.6256737Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r977, %r981, %r985, %r989}, [%r821]; 2026-02-21T09:07:44.6256792Z // end inline asm 2026-02-21T09:07:44.6256853Z // begin inline asm 2026-02-21T09:07:44.6257028Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r993, %r997, %r1001, %r1005}, [%r826]; 2026-02-21T09:07:44.6257084Z // end inline asm 2026-02-21T09:07:44.6257143Z // begin inline asm 2026-02-21T09:07:44.6257290Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1009, %r1013, %r1017, %r1021}, [%r831]; 2026-02-21T09:07:44.6257343Z // end inline asm 2026-02-21T09:07:44.6257396Z // begin inline asm 2026-02-21T09:07:44.6257548Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r836]; 2026-02-21T09:07:44.6257600Z // end inline asm 2026-02-21T09:07:44.6257680Z // begin inline asm 2026-02-21T09:07:44.6257836Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r841]; 2026-02-21T09:07:44.6257889Z // end inline asm 2026-02-21T09:07:44.6257943Z // begin inline asm 2026-02-21T09:07:44.6258093Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r846]; 2026-02-21T09:07:44.6258145Z // end inline asm 2026-02-21T09:07:44.6258198Z // begin inline asm 2026-02-21T09:07:44.6258340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r851]; 2026-02-21T09:07:44.6258403Z // end inline asm 2026-02-21T09:07:44.6258458Z // begin inline asm 2026-02-21T09:07:44.6258599Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r856]; 2026-02-21T09:07:44.6258661Z // end inline asm 2026-02-21T09:07:44.6258715Z bar.sync 0, 128; 2026-02-21T09:07:44.6258806Z st.shared.v4.b32 [%r80], {%r1145, %r1157, %r1169, %r1181}; 2026-02-21T09:07:44.6258896Z st.shared.v4.b32 [%r81], {%r1193, %r1205, %r1217, %r1229}; 2026-02-21T09:07:44.6258991Z st.shared.v4.b32 [%r82], {%r1241, %r1253, %r1265, %r1277}; 2026-02-21T09:07:44.6259079Z st.shared.v4.b32 [%r83], {%r1289, %r1301, %r1313, %r1325}; 2026-02-21T09:07:44.6259164Z st.shared.v4.b32 [%r84], {%r1337, %r1349, %r1361, %r1373}; 2026-02-21T09:07:44.6259261Z st.shared.v4.b32 [%r85], {%r1385, %r1397, %r1409, %r1421}; 2026-02-21T09:07:44.6259348Z st.shared.v4.b32 [%r86], {%r1433, %r1445, %r1457, %r1469}; 2026-02-21T09:07:44.6259435Z st.shared.v4.b32 [%r87], {%r1481, %r1493, %r1505, %r1517}; 2026-02-21T09:07:44.6259495Z bar.sync 0, 128; 2026-02-21T09:07:44.6259551Z // begin inline asm 2026-02-21T09:07:44.6259694Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r978, %r982, %r986, %r990}, [%r821]; 2026-02-21T09:07:44.6259749Z // end inline asm 2026-02-21T09:07:44.6259811Z // begin inline asm 2026-02-21T09:07:44.6259954Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r994, %r998, %r1002, %r1006}, [%r826]; 2026-02-21T09:07:44.6260007Z // end inline asm 2026-02-21T09:07:44.6260067Z // begin inline asm 2026-02-21T09:07:44.6260210Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1010, %r1014, %r1018, %r1022}, [%r831]; 2026-02-21T09:07:44.6260262Z // end inline asm 2026-02-21T09:07:44.6260321Z // 
begin inline asm 2026-02-21T09:07:44.6260463Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r836]; 2026-02-21T09:07:44.6260515Z // end inline asm 2026-02-21T09:07:44.6260569Z // begin inline asm 2026-02-21T09:07:44.6260745Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r841]; 2026-02-21T09:07:44.6260798Z // end inline asm 2026-02-21T09:07:44.6260850Z // begin inline asm 2026-02-21T09:07:44.6260998Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r846]; 2026-02-21T09:07:44.6261071Z // end inline asm 2026-02-21T09:07:44.6261124Z // begin inline asm 2026-02-21T09:07:44.6261267Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r851]; 2026-02-21T09:07:44.6261330Z // end inline asm 2026-02-21T09:07:44.6261384Z // begin inline asm 2026-02-21T09:07:44.6261529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r856]; 2026-02-21T09:07:44.6261588Z // end inline asm 2026-02-21T09:07:44.6261640Z bar.sync 0, 128; 2026-02-21T09:07:44.6261731Z st.shared.v4.b32 [%r80], {%r1148, %r1160, %r1172, %r1184}; 2026-02-21T09:07:44.6261847Z st.shared.v4.b32 [%r81], {%r1196, %r1208, %r1220, %r1232}; 2026-02-21T09:07:44.6261937Z st.shared.v4.b32 [%r82], {%r1244, %r1256, %r1268, %r1280}; 2026-02-21T09:07:44.6262023Z st.shared.v4.b32 [%r83], {%r1292, %r1304, %r1316, %r1328}; 2026-02-21T09:07:44.6262108Z st.shared.v4.b32 [%r84], {%r1340, %r1352, %r1364, %r1376}; 2026-02-21T09:07:44.6262227Z st.shared.v4.b32 [%r85], {%r1388, %r1400, %r1412, %r1424}; 2026-02-21T09:07:44.6262313Z st.shared.v4.b32 [%r86], {%r1436, %r1448, %r1460, %r1472}; 2026-02-21T09:07:44.6262421Z st.shared.v4.b32 [%r87], {%r1484, %r1496, %r1508, %r1520}; 2026-02-21T09:07:44.6262485Z bar.sync 0, 128; 2026-02-21T09:07:44.6262539Z // begin inline asm 2026-02-21T09:07:44.6262682Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r979, %r983, %r987, %r991}, [%r821]; 2026-02-21T09:07:44.6262741Z // 
end inline asm 2026-02-21T09:07:44.6262795Z // begin inline asm 2026-02-21T09:07:44.6262933Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r995, %r999, %r1003, %r1007}, [%r826]; 2026-02-21T09:07:44.6262987Z // end inline asm 2026-02-21T09:07:44.6263052Z // begin inline asm 2026-02-21T09:07:44.6263190Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1011, %r1015, %r1019, %r1023}, [%r831]; 2026-02-21T09:07:44.6263240Z // end inline asm 2026-02-21T09:07:44.6263299Z // begin inline asm 2026-02-21T09:07:44.6263441Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1031, %r1035, %r1039}, [%r836]; 2026-02-21T09:07:44.6263494Z // end inline asm 2026-02-21T09:07:44.6263555Z // begin inline asm 2026-02-21T09:07:44.6263697Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1043, %r1047, %r1051, %r1055}, [%r841]; 2026-02-21T09:07:44.6263749Z // end inline asm 2026-02-21T09:07:44.6263801Z // begin inline asm 2026-02-21T09:07:44.6263945Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r846]; 2026-02-21T09:07:44.6263996Z // end inline asm 2026-02-21T09:07:44.6264048Z // begin inline asm 2026-02-21T09:07:44.6264192Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, %r1083, %r1087}, [%r851]; 2026-02-21T09:07:44.6264245Z // end inline asm 2026-02-21T09:07:44.6264300Z // begin inline asm 2026-02-21T09:07:44.6264438Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r856]; 2026-02-21T09:07:44.6264496Z // end inline asm 2026-02-21T09:07:44.6264548Z bar.sync 0, 128; 2026-02-21T09:07:44.6264639Z st.shared.v4.b32 [%r80], {%r1151, %r1163, %r1175, %r1187}; 2026-02-21T09:07:44.6264819Z st.shared.v4.b32 [%r81], {%r1199, %r1211, %r1223, %r1235}; 2026-02-21T09:07:44.6264906Z st.shared.v4.b32 [%r82], {%r1247, %r1259, %r1271, %r1283}; 2026-02-21T09:07:44.6264993Z st.shared.v4.b32 [%r83], {%r1295, %r1307, %r1319, %r1331}; 2026-02-21T09:07:44.6265086Z st.shared.v4.b32 [%r84], {%r1343, %r1355, %r1367, %r1379}; 2026-02-21T09:07:44.6265169Z 
st.shared.v4.b32 [%r85], {%r1391, %r1403, %r1415, %r1427}; 2026-02-21T09:07:44.6265254Z st.shared.v4.b32 [%r86], {%r1439, %r1451, %r1463, %r1475}; 2026-02-21T09:07:44.6265340Z st.shared.v4.b32 [%r87], {%r1487, %r1499, %r1511, %r1523}; 2026-02-21T09:07:44.6265399Z bar.sync 0, 128; 2026-02-21T09:07:44.6265488Z // begin inline asm 2026-02-21T09:07:44.6265630Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r980, %r984, %r988, %r992}, [%r821]; 2026-02-21T09:07:44.6265690Z // end inline asm 2026-02-21T09:07:44.6265743Z // begin inline asm 2026-02-21T09:07:44.6265926Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r996, %r1000, %r1004, %r1008}, [%r826]; 2026-02-21T09:07:44.6265991Z // end inline asm 2026-02-21T09:07:44.6266048Z // begin inline asm 2026-02-21T09:07:44.6266205Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1012, %r1016, %r1020, %r1024}, [%r831]; 2026-02-21T09:07:44.6266262Z // end inline asm 2026-02-21T09:07:44.6266326Z // begin inline asm 2026-02-21T09:07:44.6266506Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1028, %r1032, %r1036, %r1040}, [%r836]; 2026-02-21T09:07:44.6266562Z // end inline asm 2026-02-21T09:07:44.6266626Z // begin inline asm 2026-02-21T09:07:44.6266810Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1044, %r1048, %r1052, %r1056}, [%r841]; 2026-02-21T09:07:44.6266870Z // end inline asm 2026-02-21T09:07:44.6266928Z // begin inline asm 2026-02-21T09:07:44.6267090Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1060, %r1064, %r1068, %r1072}, [%r846]; 2026-02-21T09:07:44.6267146Z // end inline asm 2026-02-21T09:07:44.6267206Z // begin inline asm 2026-02-21T09:07:44.6267368Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1076, %r1080, %r1084, %r1088}, [%r851]; 2026-02-21T09:07:44.6267424Z // end inline asm 2026-02-21T09:07:44.6267482Z // begin inline asm 2026-02-21T09:07:44.6267703Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1092, %r1096, %r1100, %r1104}, [%r856]; 2026-02-21T09:07:44.6267764Z // end inline asm 2026-02-21T09:07:44.6267822Z // begin inline asm 
2026-02-21T09:07:44.6267932Z st.global.v4.b32 [ %rd59 + 0 ], { %r977, %r978, %r979, %r980 }; 2026-02-21T09:07:44.6268006Z // end inline asm 2026-02-21T09:07:44.6268064Z // begin inline asm 2026-02-21T09:07:44.6268169Z st.global.v4.b32 [ %rd60 + 0 ], { %r981, %r982, %r983, %r984 }; 2026-02-21T09:07:44.6268233Z // end inline asm 2026-02-21T09:07:44.6268290Z // begin inline asm 2026-02-21T09:07:44.6268389Z st.global.v4.b32 [ %rd61 + 0 ], { %r985, %r986, %r987, %r988 }; 2026-02-21T09:07:44.6268446Z // end inline asm 2026-02-21T09:07:44.6268509Z // begin inline asm 2026-02-21T09:07:44.6268607Z st.global.v4.b32 [ %rd62 + 0 ], { %r989, %r990, %r991, %r992 }; 2026-02-21T09:07:44.6268662Z // end inline asm 2026-02-21T09:07:44.6268729Z // begin inline asm 2026-02-21T09:07:44.6268830Z st.global.v4.b32 [ %rd63 + 0 ], { %r993, %r994, %r995, %r996 }; 2026-02-21T09:07:44.6268890Z // end inline asm 2026-02-21T09:07:44.6268960Z // begin inline asm 2026-02-21T09:07:44.6269067Z st.global.v4.b32 [ %rd64 + 0 ], { %r997, %r998, %r999, %r1000 }; 2026-02-21T09:07:44.6269123Z // end inline asm 2026-02-21T09:07:44.6269180Z // begin inline asm 2026-02-21T09:07:44.6269298Z st.global.v4.b32 [ %rd65 + 0 ], { %r1001, %r1002, %r1003, %r1004 }; 2026-02-21T09:07:44.6269356Z // end inline asm 2026-02-21T09:07:44.6269416Z // begin inline asm 2026-02-21T09:07:44.6269533Z st.global.v4.b32 [ %rd66 + 0 ], { %r1005, %r1006, %r1007, %r1008 }; 2026-02-21T09:07:44.6269588Z // end inline asm 2026-02-21T09:07:44.6269646Z // begin inline asm 2026-02-21T09:07:44.6269751Z st.global.v4.b32 [ %rd67 + 0 ], { %r1009, %r1010, %r1011, %r1012 }; 2026-02-21T09:07:44.6269817Z // end inline asm 2026-02-21T09:07:44.6269875Z // begin inline asm 2026-02-21T09:07:44.6269977Z st.global.v4.b32 [ %rd68 + 0 ], { %r1013, %r1014, %r1015, %r1016 }; 2026-02-21T09:07:44.6270043Z // end inline asm 2026-02-21T09:07:44.6270100Z // begin inline asm 2026-02-21T09:07:44.6270209Z st.global.v4.b32 [ %rd69 + 0 ], { %r1017, %r1018, %r1019, 
%r1020 }; 2026-02-21T09:07:44.6270261Z // end inline asm 2026-02-21T09:07:44.6270323Z // begin inline asm 2026-02-21T09:07:44.6270433Z st.global.v4.b32 [ %rd70 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T09:07:44.6270485Z // end inline asm 2026-02-21T09:07:44.6270549Z // begin inline asm 2026-02-21T09:07:44.6270697Z st.global.v4.b32 [ %rd71 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T09:07:44.6270752Z // end inline asm 2026-02-21T09:07:44.6270816Z // begin inline asm 2026-02-21T09:07:44.6270915Z st.global.v4.b32 [ %rd72 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T09:07:44.6270996Z // end inline asm 2026-02-21T09:07:44.6271053Z // begin inline asm 2026-02-21T09:07:44.6271157Z st.global.v4.b32 [ %rd73 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T09:07:44.6271212Z // end inline asm 2026-02-21T09:07:44.6271269Z // begin inline asm 2026-02-21T09:07:44.6271374Z st.global.v4.b32 [ %rd74 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T09:07:44.6271430Z // end inline asm 2026-02-21T09:07:44.6271485Z // begin inline asm 2026-02-21T09:07:44.6271583Z st.global.v4.b32 [ %rd75 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T09:07:44.6271643Z // end inline asm 2026-02-21T09:07:44.6271698Z // begin inline asm 2026-02-21T09:07:44.6271821Z st.global.v4.b32 [ %rd76 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T09:07:44.6271885Z // end inline asm 2026-02-21T09:07:44.6271941Z // begin inline asm 2026-02-21T09:07:44.6272038Z st.global.v4.b32 [ %rd77 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T09:07:44.6272099Z // end inline asm 2026-02-21T09:07:44.6272157Z // begin inline asm 2026-02-21T09:07:44.6272257Z st.global.v4.b32 [ %rd78 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T09:07:44.6272310Z // end inline asm 2026-02-21T09:07:44.6272396Z // begin inline asm 2026-02-21T09:07:44.6272496Z st.global.v4.b32 [ %rd79 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T09:07:44.6272550Z // end inline asm 
2026-02-21T09:07:44.6272613Z // begin inline asm 2026-02-21T09:07:44.6272713Z st.global.v4.b32 [ %rd80 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T09:07:44.6272766Z // end inline asm 2026-02-21T09:07:44.6272823Z // begin inline asm 2026-02-21T09:07:44.6272930Z st.global.v4.b32 [ %rd81 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T09:07:44.6272985Z // end inline asm 2026-02-21T09:07:44.6273041Z // begin inline asm 2026-02-21T09:07:44.6273146Z st.global.v4.b32 [ %rd82 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T09:07:44.6273201Z // end inline asm 2026-02-21T09:07:44.6273256Z // begin inline asm 2026-02-21T09:07:44.6273357Z st.global.v4.b32 [ %rd83 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T09:07:44.6273418Z // end inline asm 2026-02-21T09:07:44.6273472Z // begin inline asm 2026-02-21T09:07:44.6273573Z st.global.v4.b32 [ %rd84 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T09:07:44.6273635Z // end inline asm 2026-02-21T09:07:44.6273690Z // begin inline asm 2026-02-21T09:07:44.6273787Z st.global.v4.b32 [ %rd85 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T09:07:44.6273848Z // end inline asm 2026-02-21T09:07:44.6273903Z // begin inline asm 2026-02-21T09:07:44.6274000Z st.global.v4.b32 [ %rd86 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T09:07:44.6274056Z // end inline asm 2026-02-21T09:07:44.6274121Z // begin inline asm 2026-02-21T09:07:44.6274218Z st.global.v4.b32 [ %rd87 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T09:07:44.6274273Z // end inline asm 2026-02-21T09:07:44.6274335Z // begin inline asm 2026-02-21T09:07:44.6274435Z st.global.v4.b32 [ %rd88 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:07:44.6274489Z // end inline asm 2026-02-21T09:07:44.6274546Z // begin inline asm 2026-02-21T09:07:44.6274654Z st.global.v4.b32 [ %rd89 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:07:44.6274733Z // end inline asm 2026-02-21T09:07:44.6274791Z // begin inline asm 
2026-02-21T09:07:44.6274896Z st.global.v4.b32 [ %rd90 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:07:44.6274951Z // end inline asm 2026-02-21T09:07:44.6275008Z mov.b32 %r1638, 1; 2026-02-21T09:07:44.6275126Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:07:44.6275318Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6275414Z xor.b32 %r1642, %r1638, %r1642; 2026-02-21T09:07:44.6275606Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6275683Z add.s32 %r1637, %r1637, 1; 2026-02-21T09:07:44.6275785Z setp.lt.s32 %p103, %r1637, %r77; 2026-02-21T09:07:44.6275860Z @%p103 bra $L__BB0_18; 2026-02-21T09:07:44.6275934Z bra.uni $L__BB0_23; 2026-02-21T09:07:44.6276045Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:07:44.6276109Z add.s32 %r541, %r1636, 1; 2026-02-21T09:07:44.6276187Z setp.eq.b32 %p99, %r1636, 63; 2026-02-21T09:07:44.6276256Z selp.b32 %r1636, 0, %r541, %p99; 2026-02-21T09:07:44.6276322Z setp.eq.b32 %p100, %r1636, 63; 2026-02-21T09:07:44.6276381Z @%p100 bra $L__BB0_21; 2026-02-21T09:07:44.6276488Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:07:44.6276694Z .loc 1 0 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:106 2026-02-21T09:07:44.6276754Z mov.b32 %r1638, 0; 2026-02-21T09:07:44.6276942Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6277010Z setp.ne.b32 %p101, %r1636, 0; 2026-02-21T09:07:44.6277069Z @%p101 bra $L__BB0_22; 2026-02-21T09:07:44.6277156Z // %bb.20: // %.thread 2026-02-21T09:07:44.6277275Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:07:44.6277339Z add.s32 %r1639, %r1639, 1; 2026-02-21T09:07:44.6277522Z .loc 1 37 35 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:37:35 2026-02-21T09:07:44.6277592Z shr.s32 %r1589, %r1639, 31; 2026-02-21T09:07:44.6277650Z shr.u32 %r1590, %r1589, 25; 2026-02-21T09:07:44.6277712Z 
add.s32 %r1591, %r1639, %r1590; 2026-02-21T09:07:44.6277780Z shr.s32 %r1592, %r1591, 7; 2026-02-21T09:07:44.6277955Z .loc 1 38 33 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:38:33 2026-02-21T09:07:44.6278014Z shl.b32 %r1593, %r1592, 2; 2026-02-21T09:07:44.6278199Z .loc 1 39 39 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:39:39 2026-02-21T09:07:44.6278260Z sub.s32 %r1594, 8, %r1593; 2026-02-21T09:07:44.6278434Z .loc 1 39 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:39:52 2026-02-21T09:07:44.6278494Z min.s32 %r1595, %r1594, 4; 2026-02-21T09:07:44.6278673Z .loc 1 40 45 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:45 2026-02-21T09:07:44.6278739Z and.b32 %r1596, %r1591, -128; 2026-02-21T09:07:44.6278800Z sub.s32 %r1597, %r1639, %r1596; 2026-02-21T09:07:44.6278981Z .loc 1 41 51 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:41:51 2026-02-21T09:07:44.6279044Z div.s32 %r1598, %r1597, %r1595; 2026-02-21T09:07:44.6279228Z .loc 1 40 64 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:64 2026-02-21T09:07:44.6279296Z mul.lo.s32 %r1599, %r1598, %r1595; 2026-02-21T09:07:44.6279353Z sub.s32 %r1600, %r1597, %r1599; 2026-02-21T09:07:44.6279522Z .loc 1 40 30 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:30 2026-02-21T09:07:44.6279588Z add.s32 %r1601, %r1600, %r1593; 2026-02-21T09:07:44.6279753Z .loc 1 42 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:42:27 2026-02-21T09:07:44.6279809Z shl.b32 %r1641, %r1601, 8; 2026-02-21T09:07:44.6279975Z .loc 1 44 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:44:27 2026-02-21T09:07:44.6280037Z shl.b32 %r1640, %r1598, 7; 2026-02-21T09:07:44.6280208Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6280265Z bra.uni $L__BB0_22; 2026-02-21T09:07:44.6280366Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:07:44.6280567Z .loc 1 0 106 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:106 2026-02-21T09:07:44.6280627Z mov.b32 %r114, global_smem; 2026-02-21T09:07:44.6280716Z add.s32 %r115, %r114, %r3; 2026-02-21T09:07:44.6280774Z mov.u32 %r162, %ctaid.x; 2026-02-21T09:07:44.6280829Z max.u32 %r163, %r162, 255; 2026-02-21T09:07:44.6280885Z shl.b32 %r164, %r163, 6; 2026-02-21T09:07:44.6280948Z sub.s32 %r5, 16384, %r164; 2026-02-21T09:07:44.6281008Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:07:44.6281060Z bra.uni $L__BB0_2; 2026-02-21T09:07:44.6281162Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6281332Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6281409Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6281492Z barrier.sync 1; 2026-02-21T09:07:44.6281551Z barrier.sync 1; 2026-02-21T09:07:44.6281626Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6281705Z $L__BB0_2: // %.preheader 2026-02-21T09:07:44.6281799Z // =>This Loop Header: Depth=1 2026-02-21T09:07:44.6281884Z // Child Loop BB0_11 Depth 2 2026-02-21T09:07:44.6281965Z // Child Loop BB0_7 Depth 2 2026-02-21T09:07:44.6282154Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19 2026-02-21T09:07:44.6282230Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:07:44.6282285Z barrier.sync 1; 2026-02-21T09:07:44.6282353Z ld.shared.b8 %r113, [%r115+114772]; 2026-02-21T09:07:44.6282414Z setp.gt.u32 %p4, %r113, 3; 2026-02-21T09:07:44.6282470Z @%p4 bra $L__BB0_4; 2026-02-21T09:07:44.6282548Z // %bb.3: // %.preheader 2026-02-21T09:07:44.6282642Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6282704Z $L_brx_0: .branchtargets 2026-02-21T09:07:44.6282756Z $L__BB0_5, 2026-02-21T09:07:44.6282814Z $L__BB0_9, 2026-02-21T09:07:44.6282865Z $L__BB0_15, 2026-02-21T09:07:44.6282915Z $L__BB0_24; 2026-02-21T09:07:44.6282974Z brx.idx %r113, $L_brx_0; 2026-02-21T09:07:44.6283074Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:07:44.6283250Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6283322Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6283406Z ld.shared.b32 %r169, [global_smem+98304]; 2026-02-21T09:07:44.6283462Z barrier.sync 1; 2026-02-21T09:07:44.6283518Z @%p17 bra $L__BB0_8; 2026-02-21T09:07:44.6283598Z // %bb.6: // %.lr.ph9 2026-02-21T09:07:44.6283681Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6283856Z .loc 1 0 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:106 2026-02-21T09:07:44.6283913Z mov.b32 %r1619, -1; 2026-02-21T09:07:44.6283980Z mov.pred %p115, 0; 2026-02-21T09:07:44.6284035Z mov.b32 %r1616, 0; 2026-02-21T09:07:44.6284093Z mov.b32 %r1617, %r1616; 2026-02-21T09:07:44.6284158Z mov.b32 %r1618, %r1616; 2026-02-21T09:07:44.6284215Z mov.b32 %r1620, %r1616; 2026-02-21T09:07:44.6284309Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:07:44.6284399Z // => This Inner Loop Header: Depth=2 2026-02-21T09:07:44.6284593Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6284656Z add.s32 %r175, %r1619, 1; 2026-02-21T09:07:44.6284778Z setp.eq.b32 %p26, %r1619, 63; 2026-02-21T09:07:44.6284853Z selp.b32 %r1619, 0, %r175, %p26; 2026-02-21T09:07:44.6284914Z shl.b32 %r176, %r1618, 3; 2026-02-21T09:07:44.6285015Z add.s32 %r178, %r114, %r176; 2026-02-21T09:07:44.6285082Z add.s32 %r179, %r178, 114688; 2026-02-21T09:07:44.6285142Z add.s32 %r167, %r178, 114720; 2026-02-21T09:07:44.6285324Z .loc 1 54 31 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:54:31 2026-02-21T09:07:44.6285410Z shl.b32 %r180, %r1618, 13; 2026-02-21T09:07:44.6285482Z add.s32 %r181, %r114, %r180; 2026-02-21T09:07:44.6285541Z add.s32 %r182, %r181, 65536; 2026-02-21T09:07:44.6285723Z .loc 1 55 44 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:55:44 2026-02-21T09:07:44.6285791Z shl.b32 %r183, %r1618, 14; 2026-02-21T09:07:44.6285850Z add.s32 
%r184, %r114, %r183; 2026-02-21T09:07:44.6286026Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6286094Z bar.warp.sync -1; 2026-02-21T09:07:44.6286148Z // begin inline asm 2026-02-21T09:07:44.6286199Z 2026-02-21T09:07:44.6286290Z { 2026-02-21T09:07:44.6286358Z .reg .pred complete; 2026-02-21T09:07:44.6286411Z waitLoop: 2026-02-21T09:07:44.6286533Z mbarrier.try_wait.parity.shared.b64 complete, [%r167], %r1617; 2026-02-21T09:07:44.6286603Z @!complete bra.uni waitLoop; 2026-02-21T09:07:44.6286653Z } 2026-02-21T09:07:44.6286658Z 2026-02-21T09:07:44.6286710Z // end inline asm 2026-02-21T09:07:44.6286882Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6286976Z setp.eq.b32 %p25, %r1619, 63; 2026-02-21T09:07:44.6287143Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6287203Z elect.sync %r185|%p20, -1; 2026-02-21T09:07:44.6287271Z bfe.u32 %r186, %r182, 4, 14; 2026-02-21T09:07:44.6287330Z cvt.u64.u32 %rd18, %r186; 2026-02-21T09:07:44.6287401Z or.b64 %rd12, %rd18, -9223371899382267904; 2026-02-21T09:07:44.6287467Z bfe.u32 %r187, %r184, 4, 14; 2026-02-21T09:07:44.6287525Z cvt.u64.u32 %rd19, %r187; 2026-02-21T09:07:44.6287596Z or.b64 %rd13, %rd19, -9223371899348713472; 2026-02-21T09:07:44.6287652Z mov.b32 %r170, 138412048; 2026-02-21T09:07:44.6287714Z // begin inline asm 2026-02-21T09:07:44.6287859Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r169 + 0 ], %rd12, %rd13, %r170, %p115; 2026-02-21T09:07:44.6287915Z // end inline asm 2026-02-21T09:07:44.6287979Z add.s32 %r188, %r181, 65568; 2026-02-21T09:07:44.6288034Z bfe.u32 %r189, %r188, 4, 14; 2026-02-21T09:07:44.6288092Z cvt.u64.u32 %rd20, %r189; 2026-02-21T09:07:44.6288163Z or.b64 %rd14, %rd20, -9223371899382267904; 2026-02-21T09:07:44.6288218Z add.s32 %r190, %r184, 32; 2026-02-21T09:07:44.6288272Z bfe.u32 %r191, %r190, 4, 14; 2026-02-21T09:07:44.6288326Z cvt.u64.u32 %rd21, %r191; 
2026-02-21T09:07:44.6288397Z or.b64 %rd15, %rd21, -9223371899348713472; 2026-02-21T09:07:44.6288455Z mov.pred %p21, -1; 2026-02-21T09:07:44.6288509Z // begin inline asm 2026-02-21T09:07:44.6288650Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r169 + 0 ], %rd14, %rd15, %r170, %p21; 2026-02-21T09:07:44.6288704Z // end inline asm 2026-02-21T09:07:44.6288759Z cvt.u64.u32 %rd16, %r179; 2026-02-21T09:07:44.6288812Z // begin inline asm 2026-02-21T09:07:44.6288939Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd16]; 2026-02-21T09:07:44.6288993Z // end inline asm 2026-02-21T09:07:44.6289054Z and.pred %p24, %p25, %p20; 2026-02-21T09:07:44.6289116Z add.s32 %r192, %r114, 114752; 2026-02-21T09:07:44.6289170Z cvt.u64.u32 %rd17, %r192; 2026-02-21T09:07:44.6289224Z // begin inline asm 2026-02-21T09:07:44.6289348Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd17]; 2026-02-21T09:07:44.6289401Z // end inline asm 2026-02-21T09:07:44.6289565Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6289628Z setp.ne.b32 %p115, %r1619, 63; 2026-02-21T09:07:44.6289806Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6289892Z selp.b32 %r193, 1, 0, %p25; 2026-02-21T09:07:44.6289950Z xor.b32 %r1616, %r1616, %r193; 2026-02-21T09:07:44.6290013Z add.s32 %r173, %r114, 114768; 2026-02-21T09:07:44.6290067Z // begin inline asm 2026-02-21T09:07:44.6290114Z 2026-02-21T09:07:44.6290190Z { 2026-02-21T09:07:44.6290251Z @!%p25 bra.uni skipWait; 2026-02-21T09:07:44.6290308Z .reg .pred complete; 2026-02-21T09:07:44.6290361Z waitLoop: 2026-02-21T09:07:44.6290486Z mbarrier.try_wait.parity.shared.b64 complete, [%r173], %r1616; 2026-02-21T09:07:44.6290548Z @!complete bra.uni waitLoop; 2026-02-21T09:07:44.6290601Z skipWait: 2026-02-21T09:07:44.6290739Z } 2026-02-21T09:07:44.6290742Z 2026-02-21T09:07:44.6290795Z // end inline asm 2026-02-21T09:07:44.6290964Z .loc 1 0 0 // 
cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6291028Z add.s32 %r194, %r1618, 1; 2026-02-21T09:07:44.6291107Z setp.eq.b32 %p27, %r194, 4; 2026-02-21T09:07:44.6291172Z selp.b32 %r1618, 0, %r194, %p27; 2026-02-21T09:07:44.6291232Z selp.b32 %r195, 1, 0, %p27; 2026-02-21T09:07:44.6291297Z xor.b32 %r1617, %r1617, %r195; 2026-02-21T09:07:44.6291474Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6291533Z add.s32 %r1620, %r1620, 1; 2026-02-21T09:07:44.6291603Z setp.lt.s32 %p28, %r1620, %r5; 2026-02-21T09:07:44.6291659Z @%p28 bra $L__BB0_7; 2026-02-21T09:07:44.6291766Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:07:44.6291854Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6291920Z barrier.sync 1; 2026-02-21T09:07:44.6291995Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6292050Z bra.uni $L__BB0_2; 2026-02-21T09:07:44.6292153Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6292331Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6292407Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6292473Z barrier.sync 1; 2026-02-21T09:07:44.6292640Z .loc 1 21 67 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:21:67 2026-02-21T09:07:44.6292702Z mov.u32 %r116, %ctaid.y; 2026-02-21T09:07:44.6292763Z mov.u32 %r117, %ctaid.z; 2026-02-21T09:07:44.6292830Z mov.u32 %r118, %nctaid.x; 2026-02-21T09:07:44.6292887Z mov.u32 %r119, %nctaid.y; 2026-02-21T09:07:44.6292953Z mad.lo.s32 %r120, %r117, %r119, %r116; 2026-02-21T09:07:44.6293024Z mad.lo.s32 %r121, %r120, %r118, %r162; 2026-02-21T09:07:44.6293079Z shl.b32 %r122, %r121, 8; 2026-02-21T09:07:44.6293244Z .loc 1 22 67 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:22:67 2026-02-21T09:07:44.6293308Z cvt.s64.s32 %rd7, %r122; 2026-02-21T09:07:44.6293364Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:07:44.6293421Z add.s64 %rd9, %rd8, 128; 
2026-02-21T09:07:44.6293484Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:07:44.6293654Z .loc 1 21 67 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:21:67 2026-02-21T09:07:44.6293714Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:07:44.6293882Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6293951Z @%p17 bra $L__BB0_14; 2026-02-21T09:07:44.6294024Z // %bb.10: // %.lr.ph 2026-02-21T09:07:44.6294109Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6294175Z add.s32 %r1631, %r162, -1; 2026-02-21T09:07:44.6294234Z add.s32 %r19, %r1, -128; 2026-02-21T09:07:44.6294291Z mov.b32 %r1627, -1; 2026-02-21T09:07:44.6294345Z mov.b32 %r1621, 0; 2026-02-21T09:07:44.6294408Z mov.b32 %r1622, %r1621; 2026-02-21T09:07:44.6294462Z mov.b32 %r1630, %r1621; 2026-02-21T09:07:44.6294518Z mov.b32 %r1629, %r1621; 2026-02-21T09:07:44.6294580Z mov.b32 %r1625, %r1621; 2026-02-21T09:07:44.6294659Z mov.b32 %r1628, %r1621; 2026-02-21T09:07:44.6294743Z bra.uni $L__BB0_11; 2026-02-21T09:07:44.6294838Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:07:44.6295034Z .loc 1 0 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0:106 2026-02-21T09:07:44.6295143Z selp.b32 %r145, 0, %r1625, %p8; 2026-02-21T09:07:44.6295209Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:07:44.6295281Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:07:44.6295468Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6295527Z shl.b32 %r152, %r1622, 3; 2026-02-21T09:07:44.6295596Z add.s32 %r154, %r114, %r152; 2026-02-21T09:07:44.6295658Z add.s32 %r141, %r154, 114688; 2026-02-21T09:07:44.6295830Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6295923Z // begin inline asm 2026-02-21T09:07:44.6295988Z 2026-02-21T09:07:44.6296040Z { 2026-02-21T09:07:44.6296104Z .reg .pred complete; 2026-02-21T09:07:44.6296168Z waitLoop: 2026-02-21T09:07:44.6296294Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r141], %r1621; 2026-02-21T09:07:44.6296362Z @!complete bra.uni waitLoop; 2026-02-21T09:07:44.6296415Z } 2026-02-21T09:07:44.6296419Z 2026-02-21T09:07:44.6296483Z // end inline asm 2026-02-21T09:07:44.6296696Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6296759Z add.s32 %r147, %r154, 114720; 2026-02-21T09:07:44.6296941Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6296998Z bar.sync 3, 64; 2026-02-21T09:07:44.6297056Z // begin inline asm 2026-02-21T09:07:44.6297182Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r147], 24576; 2026-02-21T09:07:44.6297238Z // end inline asm 2026-02-21T09:07:44.6297421Z .loc 1 54 31 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:54:31 2026-02-21T09:07:44.6297486Z shl.b32 %r155, %r1622, 13; 2026-02-21T09:07:44.6297552Z add.s32 %r156, %r114, %r155; 2026-02-21T09:07:44.6297609Z add.s32 %r144, %r156, 65536; 2026-02-21T09:07:44.6297783Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6297848Z bar.sync 3, 64; 2026-02-21T09:07:44.6297914Z elect.sync %r157|%p13, -1; 2026-02-21T09:07:44.6297980Z and.pred %p10, %p12, %p13; 2026-02-21T09:07:44.6298046Z // begin inline asm 2026-02-21T09:07:44.6298327Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd10, {%r145, %r1630}], [%r147]; 2026-02-21T09:07:44.6298384Z // end inline asm 2026-02-21T09:07:44.6298570Z .loc 1 55 44 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:55:44 2026-02-21T09:07:44.6298638Z shl.b32 %r158, %r1622, 14; 2026-02-21T09:07:44.6298697Z add.s32 %r148, %r114, %r158; 2026-02-21T09:07:44.6298870Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6298935Z bar.sync 3, 64; 2026-02-21T09:07:44.6298998Z elect.sync %r159|%p14, -1; 2026-02-21T09:07:44.6299063Z and.pred %p11, %p12, %p14; 
2026-02-21T09:07:44.6299127Z // begin inline asm 2026-02-21T09:07:44.6299388Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd11, {%r145, %r1629}], [%r147]; 2026-02-21T09:07:44.6299447Z // end inline asm 2026-02-21T09:07:44.6299641Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6299702Z add.s32 %r1625, %r145, 32; 2026-02-21T09:07:44.6299874Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6299935Z add.s32 %r160, %r1622, 1; 2026-02-21T09:07:44.6300009Z setp.eq.b32 %p15, %r160, 4; 2026-02-21T09:07:44.6300111Z selp.b32 %r1622, 0, %r160, %p15; 2026-02-21T09:07:44.6300173Z selp.b32 %r161, 1, 0, %p15; 2026-02-21T09:07:44.6300242Z xor.b32 %r1621, %r1621, %r161; 2026-02-21T09:07:44.6300433Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6300519Z add.s32 %r1628, %r1628, 1; 2026-02-21T09:07:44.6300595Z setp.lt.s32 %p16, %r1628, %r5; 2026-02-21T09:07:44.6300656Z @%p16 bra $L__BB0_11; 2026-02-21T09:07:44.6300715Z bra.uni $L__BB0_14; 2026-02-21T09:07:44.6300820Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:07:44.6300924Z // => This Inner Loop Header: Depth=2 2026-02-21T09:07:44.6300984Z add.s32 %r127, %r1627, 1; 2026-02-21T09:07:44.6301046Z setp.eq.b32 %p6, %r1627, 63; 2026-02-21T09:07:44.6301119Z selp.b32 %r1627, 0, %r127, %p6; 2026-02-21T09:07:44.6301184Z setp.ne.b32 %p7, %r1627, 0; 2026-02-21T09:07:44.6301269Z setp.eq.b32 %p8, %r1627, 0; 2026-02-21T09:07:44.6301332Z @%p7 bra $L__BB0_13; 2026-02-21T09:07:44.6301445Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:07:44.6301509Z add.s32 %r1631, %r1631, 1; 2026-02-21T09:07:44.6301704Z .loc 1 37 35 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:37:35 2026-02-21T09:07:44.6301774Z shr.s32 %r128, %r1631, 31; 2026-02-21T09:07:44.6301833Z shr.u32 %r129, %r128, 25; 2026-02-21T09:07:44.6301915Z add.s32 %r130, 
%r1631, %r129; 2026-02-21T09:07:44.6301984Z shr.s32 %r131, %r130, 7; 2026-02-21T09:07:44.6302161Z .loc 1 38 33 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:38:33 2026-02-21T09:07:44.6302222Z shl.b32 %r132, %r131, 2; 2026-02-21T09:07:44.6302399Z .loc 1 39 39 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:39:39 2026-02-21T09:07:44.6302466Z sub.s32 %r133, 8, %r132; 2026-02-21T09:07:44.6302646Z .loc 1 39 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:39:52 2026-02-21T09:07:44.6302705Z min.s32 %r134, %r133, 4; 2026-02-21T09:07:44.6302890Z .loc 1 40 45 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:45 2026-02-21T09:07:44.6302951Z and.b32 %r135, %r130, -128; 2026-02-21T09:07:44.6303012Z sub.s32 %r136, %r1631, %r135; 2026-02-21T09:07:44.6303197Z .loc 1 41 51 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:41:51 2026-02-21T09:07:44.6303258Z div.s32 %r137, %r136, %r134; 2026-02-21T09:07:44.6303433Z .loc 1 40 64 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:64 2026-02-21T09:07:44.6303503Z mul.lo.s32 %r138, %r137, %r134; 2026-02-21T09:07:44.6303563Z sub.s32 %r139, %r136, %r138; 2026-02-21T09:07:44.6303745Z .loc 1 40 30 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:40:30 2026-02-21T09:07:44.6303808Z add.s32 %r140, %r139, %r132; 2026-02-21T09:07:44.6303990Z .loc 1 42 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:42:27 2026-02-21T09:07:44.6304051Z shl.b32 %r1629, %r140, 8; 2026-02-21T09:07:44.6304227Z .loc 1 44 27 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:44:27 2026-02-21T09:07:44.6304297Z shl.b32 %r1630, %r137, 7; 2026-02-21T09:07:44.6304356Z bra.uni $L__BB0_13; 2026-02-21T09:07:44.6304440Z $L__BB0_14: // %._crit_edge 2026-02-21T09:07:44.6304538Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6304744Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6304803Z barrier.sync 1; 
2026-02-21T09:07:44.6304885Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:07:44.6304950Z bra.uni $L__BB0_2; 2026-02-21T09:07:44.6305049Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:44.6305254Z .loc 1 19 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:19 2026-02-21T09:07:44.6305321Z barrier.sync 1; 2026-02-21T09:07:44.6305378Z barrier.sync 1; 2026-02-21T09:07:44.6305436Z bra.uni $L__BB0_2; 2026-02-21T09:07:44.6305574Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:07:44.6305756Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6305816Z barrier.sync 1; 2026-02-21T09:07:44.6305898Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:07:44.6306082Z .loc 1 56 52 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:56:52 2026-02-21T09:07:44.6306139Z bar.sync 0, 128; 2026-02-21T09:07:44.6306196Z // begin inline asm 2026-02-21T09:07:44.6306256Z 2026-02-21T09:07:44.6306308Z { 2026-02-21T09:07:44.6306368Z .reg .pred complete; 2026-02-21T09:07:44.6306423Z waitLoop: 2026-02-21T09:07:44.6306587Z mbarrier.try_wait.parity.shared.b64 complete, [%r1602], %r1642; 2026-02-21T09:07:44.6306656Z @!complete bra.uni waitLoop; 2026-02-21T09:07:44.6306708Z } 2026-02-21T09:07:44.6306712Z 2026-02-21T09:07:44.6306775Z // end inline asm 2026-02-21T09:07:44.6306965Z .loc 1 31 106 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:106 2026-02-21T09:07:44.6307024Z bar.sync 0, 128; 2026-02-21T09:07:44.6307087Z // begin inline asm 2026-02-21T09:07:44.6307180Z @%p104 mbarrier.inval.shared::cta.b64 [%r1602]; 2026-02-21T09:07:44.6307262Z // end inline asm 2026-02-21T09:07:44.6307322Z // begin inline asm 2026-02-21T09:07:44.6307419Z @%p104 mbarrier.inval.shared::cta.b64 [%r497]; 2026-02-21T09:07:44.6307476Z // end inline asm 2026-02-21T09:07:44.6307530Z // begin inline asm 2026-02-21T09:07:44.6307625Z @%p104 mbarrier.inval.shared::cta.b64 [%r489]; 2026-02-21T09:07:44.6307681Z // end inline asm 
2026-02-21T09:07:44.6307736Z bar.sync 0, 128; 2026-02-21T09:07:44.6307800Z // begin inline asm 2026-02-21T09:07:44.6307882Z @%p104 mbarrier.inval.shared::cta.b64 [%r490]; 2026-02-21T09:07:44.6307937Z // end inline asm 2026-02-21T09:07:44.6308004Z bar.sync 0, 128; 2026-02-21T09:07:44.6308063Z // begin inline asm 2026-02-21T09:07:44.6308137Z @%p104 mbarrier.inval.shared::cta.b64 [%r491]; 2026-02-21T09:07:44.6308190Z // end inline asm 2026-02-21T09:07:44.6308251Z bar.sync 0, 128; 2026-02-21T09:07:44.6308303Z // begin inline asm 2026-02-21T09:07:44.6308376Z @%p104 mbarrier.inval.shared::cta.b64 [%r492]; 2026-02-21T09:07:44.6308428Z // end inline asm 2026-02-21T09:07:44.6308487Z // begin inline asm 2026-02-21T09:07:44.6308560Z @%p104 mbarrier.inval.shared::cta.b64 [%r485]; 2026-02-21T09:07:44.6308611Z // end inline asm 2026-02-21T09:07:44.6308672Z bar.sync 0, 128; 2026-02-21T09:07:44.6308723Z // begin inline asm 2026-02-21T09:07:44.6308797Z @%p104 mbarrier.inval.shared::cta.b64 [%r486]; 2026-02-21T09:07:44.6308856Z // end inline asm 2026-02-21T09:07:44.6308911Z bar.sync 0, 128; 2026-02-21T09:07:44.6308965Z // begin inline asm 2026-02-21T09:07:44.6309039Z @%p104 mbarrier.inval.shared::cta.b64 [%r487]; 2026-02-21T09:07:44.6309099Z // end inline asm 2026-02-21T09:07:44.6309150Z bar.sync 0, 128; 2026-02-21T09:07:44.6309203Z // begin inline asm 2026-02-21T09:07:44.6309285Z @%p104 mbarrier.inval.shared::cta.b64 [%r488]; 2026-02-21T09:07:44.6309337Z // end inline asm 2026-02-21T09:07:44.6309500Z .loc 1 31 4 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:31:4 2026-02-21T09:07:44.6309554Z bar.sync 0, 128; 2026-02-21T09:07:44.6309618Z // begin inline asm 2026-02-21T09:07:44.6309736Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1614, 256; 2026-02-21T09:07:44.6309790Z // end inline asm 2026-02-21T09:07:44.6309877Z st.shared.b32 [global_smem+114776], 50529027; 2026-02-21T09:07:44.6309933Z barrier.sync 1; 2026-02-21T09:07:44.6310011Z $L__BB0_24: // %common.ret 
2026-02-21T09:07:44.6310184Z .loc 1 0 0 // cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py:0 2026-02-21T09:07:44.6310260Z ret; 2026-02-21T09:07:44.6310313Z $L__tmp1: 2026-02-21T09:07:44.6310366Z $L__func_end0: 2026-02-21T09:07:44.6310454Z // -- End function 2026-02-21T09:07:44.6310524Z } 2026-02-21T09:07:44.6310739Z .file 1 "/tmp/torchinductor_root/ib/cibt3hbeueracsjwcp3sxgkmeu5myjcizfzejphii3yfgxcai3i6.py" 2026-02-21T09:07:44.6310808Z .section .debug_abbrev 2026-02-21T09:07:44.6310860Z { 2026-02-21T09:07:44.6310948Z .b8 1 // Abbreviation Code 2026-02-21T09:07:44.6311037Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:07:44.6311121Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:07:44.6311201Z .b8 37 // DW_AT_producer 2026-02-21T09:07:44.6311275Z .b8 8 // DW_FORM_string 2026-02-21T09:07:44.6311378Z .b8 19 // DW_AT_language 2026-02-21T09:07:44.6311455Z .b8 5 // DW_FORM_data2 2026-02-21T09:07:44.6311529Z .b8 3 // DW_AT_name 2026-02-21T09:07:44.6311609Z .b8 8 // DW_FORM_string 2026-02-21T09:07:44.6311685Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:07:44.6311756Z .b8 6 // DW_FORM_data4 2026-02-21T09:07:44.6311855Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:07:44.6311928Z .b8 8 // DW_FORM_string 2026-02-21T09:07:44.6311999Z .b8 0 // EOM(1) 2026-02-21T09:07:44.6312066Z .b8 0 // EOM(2) 2026-02-21T09:07:44.6312140Z .b8 0 // EOM(3) 2026-02-21T09:07:44.6312191Z } 2026-02-21T09:07:44.6312248Z .section .debug_info 2026-02-21T09:07:44.6312302Z { 2026-02-21T09:07:44.6312383Z .b32 104 // Length of Unit 2026-02-21T09:07:44.6312467Z .b8 2 // DWARF version number 2026-02-21T09:07:44.6312517Z .b8 0 2026-02-21T09:07:44.6312636Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:07:44.6312725Z .b8 8 // Address Size (in bytes) 2026-02-21T09:07:44.6312823Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:07:44.6312910Z .b8 116 // DW_AT_producer 2026-02-21T09:07:44.6312962Z .b8 114 2026-02-21T09:07:44.6313013Z .b8 105 2026-02-21T09:07:44.6313069Z .b8 116 2026-02-21T09:07:44.6313118Z .b8 111 2026-02-21T09:07:44.6313166Z .b8 110 2026-02-21T09:07:44.6313215Z .b8 0 2026-02-21T09:07:44.6313294Z .b8 2 // DW_AT_language 2026-02-21T09:07:44.6313342Z .b8 0 2026-02-21T09:07:44.6313417Z .b8 99 // DW_AT_name 2026-02-21T09:07:44.6313474Z .b8 105 2026-02-21T09:07:44.6313524Z .b8 98 2026-02-21T09:07:44.6313572Z .b8 116 2026-02-21T09:07:44.6313621Z .b8 51 2026-02-21T09:07:44.6313678Z .b8 104 2026-02-21T09:07:44.6313726Z .b8 98 2026-02-21T09:07:44.6313774Z .b8 101 2026-02-21T09:07:44.6313822Z .b8 117 2026-02-21T09:07:44.6313877Z .b8 101 2026-02-21T09:07:44.6313924Z .b8 114 2026-02-21T09:07:44.6313972Z .b8 97 2026-02-21T09:07:44.6314025Z .b8 99 2026-02-21T09:07:44.6314073Z .b8 115 2026-02-21T09:07:44.6314120Z .b8 106 2026-02-21T09:07:44.6314167Z .b8 119 2026-02-21T09:07:44.6314224Z .b8 99 2026-02-21T09:07:44.6314272Z .b8 112 2026-02-21T09:07:44.6314319Z .b8 51 2026-02-21T09:07:44.6314372Z .b8 115 2026-02-21T09:07:44.6314419Z .b8 120 2026-02-21T09:07:44.6314466Z .b8 103 2026-02-21T09:07:44.6314513Z .b8 107 2026-02-21T09:07:44.6314569Z .b8 109 2026-02-21T09:07:44.6314617Z .b8 101 2026-02-21T09:07:44.6314664Z .b8 117 2026-02-21T09:07:44.6314750Z .b8 53 2026-02-21T09:07:44.6314800Z .b8 109 2026-02-21T09:07:44.6314850Z .b8 121 2026-02-21T09:07:44.6314898Z .b8 106 2026-02-21T09:07:44.6314977Z .b8 99 2026-02-21T09:07:44.6315027Z .b8 105 2026-02-21T09:07:44.6315076Z .b8 122 2026-02-21T09:07:44.6315125Z .b8 102 2026-02-21T09:07:44.6315181Z .b8 122 2026-02-21T09:07:44.6315230Z .b8 101 2026-02-21T09:07:44.6315278Z .b8 106 2026-02-21T09:07:44.6315367Z .b8 112 2026-02-21T09:07:44.6315421Z .b8 104 2026-02-21T09:07:44.6315472Z .b8 105 
2026-02-21T09:07:44.6315523Z .b8 105 2026-02-21T09:07:44.6315584Z .b8 51 2026-02-21T09:07:44.6315638Z .b8 121 2026-02-21T09:07:44.6315690Z .b8 102 2026-02-21T09:07:44.6315750Z .b8 103 2026-02-21T09:07:44.6315803Z .b8 120 2026-02-21T09:07:44.6315858Z .b8 99 2026-02-21T09:07:44.6315910Z .b8 97 2026-02-21T09:07:44.6315970Z .b8 105 2026-02-21T09:07:44.6316021Z .b8 51 2026-02-21T09:07:44.6316072Z .b8 105 2026-02-21T09:07:44.6316124Z .b8 54 2026-02-21T09:07:44.6316185Z .b8 46 2026-02-21T09:07:44.6316238Z .b8 112 2026-02-21T09:07:44.6316290Z .b8 121 2026-02-21T09:07:44.6316352Z .b8 0 2026-02-21T09:07:44.6316476Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:07:44.6316560Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:07:44.6316612Z .b8 116 2026-02-21T09:07:44.6316671Z .b8 109 2026-02-21T09:07:44.6316722Z .b8 112 2026-02-21T09:07:44.6316775Z .b8 47 2026-02-21T09:07:44.6316834Z .b8 116 2026-02-21T09:07:44.6316887Z .b8 111 2026-02-21T09:07:44.6316939Z .b8 114 2026-02-21T09:07:44.6316991Z .b8 99 2026-02-21T09:07:44.6317053Z .b8 104 2026-02-21T09:07:44.6317105Z .b8 105 2026-02-21T09:07:44.6317156Z .b8 110 2026-02-21T09:07:44.6317239Z .b8 100 2026-02-21T09:07:44.6317292Z .b8 117 2026-02-21T09:07:44.6317344Z .b8 99 2026-02-21T09:07:44.6317395Z .b8 116 2026-02-21T09:07:44.6317454Z .b8 111 2026-02-21T09:07:44.6317505Z .b8 114 2026-02-21T09:07:44.6317556Z .b8 95 2026-02-21T09:07:44.6317614Z .b8 114 2026-02-21T09:07:44.6317666Z .b8 111 2026-02-21T09:07:44.6317718Z .b8 111 2026-02-21T09:07:44.6317770Z .b8 116 2026-02-21T09:07:44.6317830Z .b8 47 2026-02-21T09:07:44.6317884Z .b8 105 2026-02-21T09:07:44.6317935Z .b8 98 2026-02-21T09:07:44.6317987Z .b8 0 2026-02-21T09:07:44.6318044Z } 2026-02-21T09:07:44.6318112Z .section .debug_macinfo { } 2026-02-21T09:07:44.6318116Z 2026-02-21T09:07:44.6318198Z ================================================================ 2026-02-21T09:07:44.6318313Z please share the reproducer above with Triton project. 
2026-02-21T09:07:45.2437723Z 2026-02-21T09:07:45.2443251Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 16.2 configs/s 2026-02-21T09:07:47.4654418Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 533.3 2026-02-21T09:07:47.4655060Z configs/s 2026-02-21T09:07:47.5938401Z [167s] Generation 7 complete: 2026-02-21T09:07:47.5942857Z error=22 2026-02-21T09:07:47.5946075Z ok=40 2026-02-21T09:07:47.5950088Z min=0.0369 2026-02-21T09:07:47.5954169Z mid=0.0451 2026-02-21T09:07:47.5955505Z max=24.8105 2026-02-21T09:07:47.5955719Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:07:47.5955976Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:07:47.5956201Z 'l2_groupings': [1], 2026-02-21T09:07:47.5956361Z 'load_eviction_policies': ['', ''], 2026-02-21T09:07:47.5956543Z 'loop_orders': [[1, 0]], 2026-02-21T09:07:47.5956705Z 'num_stages': 5, 2026-02-21T09:07:47.5956848Z 'num_warps': 8, 2026-02-21T09:07:47.5956985Z 'pid_type': 'flat', 2026-02-21T09:07:47.5957147Z 'range_flattens': [None, None], 2026-02-21T09:07:47.5957327Z 'range_multi_buffers': [None, False], 2026-02-21T09:07:47.5957525Z 'range_num_stages': [0, 0], 2026-02-21T09:07:47.5957695Z 'range_unroll_factors': [0, 0], 2026-02-21T09:07:47.5957871Z 'range_warp_specializes': [None, None]} 2026-02-21T09:07:47.5963802Z [167s] Fitting surrogate: 686 points, 686 targets 2026-02-21T09:07:48.4358288Z [168s] Generation 8 starting: 44 neighbors, 3 active search path(s) 2026-02-21T09:08:06.3869564Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45/45 0.7 configs/s 2026-02-21T09:08:08.0504123Z 2026-02-21T09:08:08.0504140Z 2026-02-21T09:08:08.0504916Z ================================================================ 2026-02-21T09:08:08.0505447Z Internal Triton PTX codegen error 2026-02-21T09:08:08.0505807Z `ptxas` stderr: 2026-02-21T09:08:08.0506705Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 224 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T09:08:08.0508177Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:08.0508499Z 2026-02-21T09:08:08.0509353Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm6_py1ld.ptx -o /tmp/tmpm6_py1ld.ptx.o 2026-02-21T09:08:08.0510322Z 2026-02-21T09:08:08.0510329Z 2026-02-21T09:08:08.0510426Z // 2026-02-21T09:08:08.0510709Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:08:08.0511049Z // 2026-02-21T09:08:08.0511187Z 2026-02-21T09:08:08.0511434Z .version 8.7 2026-02-21T09:08:08.0511700Z .target sm_100a 2026-02-21T09:08:08.0511983Z .address_size 64 2026-02-21T09:08:08.0512150Z 2026-02-21T09:08:08.0512443Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:08:08.0512965Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:08:08.0513408Z // @_helion_matmul 2026-02-21T09:08:08.0513812Z .visible .entry _helion_matmul( 2026-02-21T09:08:08.0514251Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:08:08.0514988Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:08:08.0515510Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:08:08.0516017Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:08:08.0516516Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:08:08.0516932Z ) 2026-02-21T09:08:08.0517149Z .reqntid 256 2026-02-21T09:08:08.0518222Z [188s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:08:08.0520931Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:08:08.0523502Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:08:08.0523986Z `ptxas` stderr: 2026-02-21T09:08:08.0524973Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 224 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:08.0525956Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:08.0526266Z 2026-02-21T09:08:08.0527101Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm6_py1ld.ptx -o /tmp/tmpm6_py1ld.ptx.o 2026-02-21T09:08:08.0528060Z 2026-02-21T09:08:08.0528329Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:08:08.0528824Z .maxnreg 32 2026-02-21T09:08:08.0529064Z { 2026-02-21T09:08:08.0529294Z .reg .pred %p<139>; 2026-02-21T09:08:08.0529585Z .reg .b32 %r<1576>; 2026-02-21T09:08:08.0529858Z .reg .b64 %rd<630>; 2026-02-21T09:08:08.0530418Z .loc 1 19 0 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:19:0 2026-02-21T09:08:08.0531041Z $L__func_begin0: 2026-02-21T09:08:08.0531577Z .loc 1 19 0 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:19:0 2026-02-21T09:08:08.0532102Z 2026-02-21T09:08:08.0532214Z // %bb.0: 2026-02-21T09:08:08.0532525Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:08:08.0533153Z $L__tmp0: 2026-02-21T09:08:08.0533543Z .loc 1 19 0 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:19 2026-02-21T09:08:08.0534085Z mov.u32 %r1, %tid.x; 2026-02-21T09:08:08.0534426Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:08:08.0535023Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:08:08.0535365Z mov.b32 %r69, global_smem; 2026-02-21T09:08:08.0535684Z // begin inline asm 2026-02-21T09:08:08.0536203Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r69], 512; 2026-02-21T09:08:08.0536724Z // end inline asm 2026-02-21T09:08:08.0537070Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T09:08:08.0537469Z bar.sync 0; 2026-02-21T09:08:08.0537763Z ld.shared.b32 %r1567, [global_smem]; 2026-02-21T09:08:08.0538119Z bar.sync 0; 2026-02-21T09:08:08.0538392Z // begin inline asm 2026-02-21T09:08:08.0538838Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:08:08.0539423Z // end inline asm 2026-02-21T09:08:08.0539994Z .loc 1 21 67 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:21:67 2026-02-21T09:08:08.0540631Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:08:08.0540959Z mov.u32 %r86, %ctaid.y; 2026-02-21T09:08:08.0541272Z mov.u32 %r87, %ctaid.z; 2026-02-21T09:08:08.0541599Z mov.u32 %r88, %nctaid.x; 2026-02-21T09:08:08.0541915Z mov.u32 %r89, %nctaid.y; 
2026-02-21T09:08:08.0542248Z mad.lo.s32 %r90, %r87, %r89, %r86; 2026-02-21T09:08:08.0542701Z mad.lo.s32 %r91, %r90, %r88, %r3; 2026-02-21T09:08:08.0543048Z shl.b32 %r92, %r91, 8; 2026-02-21T09:08:08.0543369Z cvt.s64.s32 %rd41, %r92; 2026-02-21T09:08:08.0543693Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T09:08:08.0544031Z shl.b32 %r93, %r1, 2; 2026-02-21T09:08:08.0544331Z add.s32 %r70, %r69, %r93; 2026-02-21T09:08:08.0544649Z mov.b32 %r79, 0; 2026-02-21T09:08:08.0544999Z // begin inline asm 2026-02-21T09:08:08.0545340Z @%p1 st.shared.b32 [ %r70 + 0 ], %r79; 2026-02-21T09:08:08.0545707Z // end inline asm 2026-02-21T09:08:08.0546005Z bar.warp.sync -1; 2026-02-21T09:08:08.0546321Z setp.eq.b32 %p130, %r1, 0; 2026-02-21T09:08:08.0546657Z cvt.u64.u32 %rd4, %r69; 2026-02-21T09:08:08.0546970Z // begin inline asm 2026-02-21T09:08:08.0547517Z @%p130 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:08:08.0548150Z // end inline asm 2026-02-21T09:08:08.0548437Z // begin inline asm 2026-02-21T09:08:08.0548938Z @%p130 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:08:08.0549493Z // end inline asm 2026-02-21T09:08:08.0549774Z mov.b32 %r72, 32; 2026-02-21T09:08:08.0550060Z // begin inline asm 2026-02-21T09:08:08.0550576Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r72; 2026-02-21T09:08:08.0551174Z // end inline asm 2026-02-21T09:08:08.0551443Z mov.b32 %r73, 256; 2026-02-21T09:08:08.0551734Z // begin inline asm 2026-02-21T09:08:08.0552253Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:08:08.0552852Z // end inline asm 2026-02-21T09:08:08.0553126Z mov.b32 %r74, 2048; 2026-02-21T09:08:08.0553432Z // begin inline asm 2026-02-21T09:08:08.0553966Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r74; 2026-02-21T09:08:08.0554579Z // end inline asm 2026-02-21T09:08:08.0555002Z mov.b32 %r75, 
4096; 2026-02-21T09:08:08.0555300Z // begin inline asm 2026-02-21T09:08:08.0555846Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r75; 2026-02-21T09:08:08.0556454Z // end inline asm 2026-02-21T09:08:08.0556746Z mov.b64 %rd12, 4096; 2026-02-21T09:08:08.0557041Z // begin inline asm 2026-02-21T09:08:08.0557611Z @%p130 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:08:08.0558247Z // end inline asm 2026-02-21T09:08:08.0558517Z mov.b32 %r76, 1; 2026-02-21T09:08:08.0558798Z // begin inline asm 2026-02-21T09:08:08.0559369Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:08:08.0560131Z // end inline asm 2026-02-21T09:08:08.0560400Z // begin inline asm 2026-02-21T09:08:08.0560961Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r76; 2026-02-21T09:08:08.0561642Z // end inline asm 2026-02-21T09:08:08.0561912Z // begin inline asm 2026-02-21T09:08:08.0562442Z @%p130 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:08:08.0563017Z // end inline asm 2026-02-21T09:08:08.0563294Z // begin inline asm 2026-02-21T09:08:08.0563850Z @%p130 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:08.0564484Z // end inline asm 2026-02-21T09:08:08.0564857Z // begin inline asm 2026-02-21T09:08:08.0565402Z @%p130 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:08:08.0566012Z // end inline asm 2026-02-21T09:08:08.0566417Z // begin inline asm 2026-02-21T09:08:08.0566957Z @%p130 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:08.0567556Z // end inline asm 2026-02-21T09:08:08.0567840Z // begin inline asm 2026-02-21T09:08:08.0568645Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 
0x80; 2026-02-21T09:08:08.0569545Z // end inline asm 2026-02-21T09:08:08.0569833Z // begin inline asm 2026-02-21T09:08:08.0570376Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:08:08.0570956Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:08:08.0571372Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:08.0571761Z // end inline asm 2026-02-21T09:08:08.0572037Z bar.sync 0; 2026-02-21T09:08:08.0572338Z cvta.global.u64 %rd67, %rd19; 2026-02-21T09:08:08.0572989Z .loc 1 22 67 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:22:67 2026-02-21T09:08:08.0573672Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:08:08.0574016Z bar.sync 0; 2026-02-21T09:08:08.0574285Z // begin inline asm 2026-02-21T09:08:08.0574612Z @%p1 st.shared.b32 [ %r70 + 0 ], %r79; 2026-02-21T09:08:08.0575065Z // end inline asm 2026-02-21T09:08:08.0575368Z bar.warp.sync -1; 2026-02-21T09:08:08.0575663Z // begin inline asm 2026-02-21T09:08:08.0576225Z @%p130 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:08:08.0576872Z // end inline asm 2026-02-21T09:08:08.0577156Z // begin inline asm 2026-02-21T09:08:08.0577665Z @%p130 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:08:08.0578234Z // end inline asm 2026-02-21T09:08:08.0578523Z // begin inline asm 2026-02-21T09:08:08.0579046Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r72; 2026-02-21T09:08:08.0579672Z // end inline asm 2026-02-21T09:08:08.0579955Z // begin inline asm 2026-02-21T09:08:08.0580488Z @%p130 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:08:08.0581105Z // end inline asm 2026-02-21T09:08:08.0581384Z // begin inline asm 2026-02-21T09:08:08.0581936Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r74; 2026-02-21T09:08:08.0582562Z // end inline asm 2026-02-21T09:08:08.0582851Z // begin inline asm 
2026-02-21T09:08:08.0583388Z @%p130 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r74; 2026-02-21T09:08:08.0584023Z // end inline asm 2026-02-21T09:08:08.0584309Z // begin inline asm 2026-02-21T09:08:08.0584953Z @%p130 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:08:08.0585626Z // end inline asm 2026-02-21T09:08:08.0585906Z // begin inline asm 2026-02-21T09:08:08.0586486Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:08:08.0587152Z // end inline asm 2026-02-21T09:08:08.0587448Z // begin inline asm 2026-02-21T09:08:08.0588143Z @%p130 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r76; 2026-02-21T09:08:08.0588806Z // end inline asm 2026-02-21T09:08:08.0589095Z // begin inline asm 2026-02-21T09:08:08.0589621Z @%p130 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:08:08.0590316Z // end inline asm 2026-02-21T09:08:08.0590596Z // begin inline asm 2026-02-21T09:08:08.0591185Z @%p130 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:08.0591838Z // end inline asm 2026-02-21T09:08:08.0592128Z // begin inline asm 2026-02-21T09:08:08.0592665Z @%p130 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:08:08.0593283Z // end inline asm 2026-02-21T09:08:08.0593570Z // begin inline asm 2026-02-21T09:08:08.0594086Z @%p130 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:08.0594851Z // end inline asm 2026-02-21T09:08:08.0595140Z // begin inline asm 2026-02-21T09:08:08.0595961Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:08:08.0596853Z // end inline asm 2026-02-21T09:08:08.0597141Z // begin inline asm 2026-02-21T09:08:08.0597604Z @%p1 
fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:08:08.0598170Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:08:08.0598742Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:08.0599158Z // end inline asm 2026-02-21T09:08:08.0599453Z bar.sync 0; 2026-02-21T09:08:08.0599749Z cvta.global.u64 %rd68, %rd37; 2026-02-21T09:08:08.0600429Z .loc 1 31 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:31:52 2026-02-21T09:08:08.0601180Z setp.gt.u32 %p39, %r3, 127; 2026-02-21T09:08:08.0601537Z @%p39 bra $L__BB0_8; 2026-02-21T09:08:08.0601901Z // %bb.1: // %.lr.ph 2026-02-21T09:08:08.0602645Z .loc 1 37 45 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:37:45 2026-02-21T09:08:08.0603343Z and.b32 %r421, %r1, 31; 2026-02-21T09:08:08.0603675Z shl.b32 %r422, %r421, 3; 2026-02-21T09:08:08.0604014Z and.b32 %r423, %r1, 224; 2026-02-21T09:08:08.0604354Z bfe.u32 %r424, %r1, 5, 3; 2026-02-21T09:08:08.0604808Z shr.u32 %r425, %r1, 5; 2026-02-21T09:08:08.0605134Z shl.b32 %r426, %r1, 12; 2026-02-21T09:08:08.0605516Z and.b32 %r427, %r426, 28672; 2026-02-21T09:08:08.0605984Z shl.b32 %r428, %r1, 4; 2026-02-21T09:08:08.0606341Z and.b32 %r429, %r428, 4080; 2026-02-21T09:08:08.0606738Z or.b32 %r430, %r427, %r429; 2026-02-21T09:08:08.0607119Z xor.b32 %r431, %r430, 16; 2026-02-21T09:08:08.0607480Z xor.b32 %r432, %r430, 32; 2026-02-21T09:08:08.0607792Z xor.b32 %r433, %r430, 48; 2026-02-21T09:08:08.0608103Z xor.b32 %r434, %r430, 64; 2026-02-21T09:08:08.0608404Z xor.b32 %r435, %r430, 80; 2026-02-21T09:08:08.0608725Z xor.b32 %r436, %r430, 96; 2026-02-21T09:08:08.0609050Z xor.b32 %r437, %r430, 112; 2026-02-21T09:08:08.0609380Z shl.b32 %r438, %r423, 7; 2026-02-21T09:08:08.0609704Z shl.b32 %r439, %r421, 4; 2026-02-21T09:08:08.0610019Z shr.u32 %r440, %r423, 1; 2026-02-21T09:08:08.0610339Z or.b32 %r441, %r438, %r439; 2026-02-21T09:08:08.0610673Z xor.b32 %r442, %r441, %r440; 2026-02-21T09:08:08.0611017Z add.s32 %r836, %r69, %r442; 
2026-02-21T09:08:08.0611616Z .loc 1 36 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:36:27 2026-02-21T09:08:08.0612267Z shl.b32 %r443, %r3, 8; 2026-02-21T09:08:08.0612584Z and.b32 %r489, %r443, 1792; 2026-02-21T09:08:08.0613172Z .loc 1 38 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:38:27 2026-02-21T09:08:08.0613816Z shl.b32 %r444, %r3, 5; 2026-02-21T09:08:08.0614131Z and.b32 %r485, %r444, 3840; 2026-02-21T09:08:08.0614779Z .loc 1 39 32 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:39:32 2026-02-21T09:08:08.0615542Z or.b32 %r23, %r485, %r424; 2026-02-21T09:08:08.0616174Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0616888Z shfl.sync.idx.b32 %r52, %r425, 0, 31, -1; 2026-02-21T09:08:08.0617364Z shl.b32 %r445, %r52, 21; 2026-02-21T09:08:08.0617712Z and.b32 %r446, %r445, 6291456; 2026-02-21T09:08:08.0618070Z add.s32 %r447, %r446, %r1567; 2026-02-21T09:08:08.0618425Z shl.b32 %r448, %r52, 6; 2026-02-21T09:08:08.0618747Z and.b32 %r449, %r448, 256; 2026-02-21T09:08:08.0619089Z add.s32 %r831, %r447, %r449; 2026-02-21T09:08:08.0619423Z mov.pred %p93, -1; 2026-02-21T09:08:08.0619736Z // begin inline asm 2026-02-21T09:08:08.0620532Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 0], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0621377Z // end inline asm 2026-02-21T09:08:08.0621754Z // begin inline asm 2026-02-21T09:08:08.0622522Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 16], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0623359Z // end inline asm 2026-02-21T09:08:08.0623644Z // begin inline asm 2026-02-21T09:08:08.0624420Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 32], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0625356Z // end 
inline asm 2026-02-21T09:08:08.0625739Z // begin inline asm 2026-02-21T09:08:08.0626494Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 48], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0627335Z // end inline asm 2026-02-21T09:08:08.0627634Z // begin inline asm 2026-02-21T09:08:08.0628376Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 64], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0629225Z // end inline asm 2026-02-21T09:08:08.0629520Z // begin inline asm 2026-02-21T09:08:08.0630266Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 80], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0631055Z // end inline asm 2026-02-21T09:08:08.0631341Z // begin inline asm 2026-02-21T09:08:08.0632104Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 96], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0632941Z // end inline asm 2026-02-21T09:08:08.0633243Z // begin inline asm 2026-02-21T09:08:08.0634017Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 112], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0634978Z // end inline asm 2026-02-21T09:08:08.0635279Z // begin inline asm 2026-02-21T09:08:08.0636050Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 128], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0636924Z // end inline asm 2026-02-21T09:08:08.0637210Z // begin inline asm 2026-02-21T09:08:08.0637987Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 144], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0638853Z // end inline asm 
2026-02-21T09:08:08.0639140Z // begin inline asm 2026-02-21T09:08:08.0639916Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 160], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0640763Z // end inline asm 2026-02-21T09:08:08.0641067Z // begin inline asm 2026-02-21T09:08:08.0641833Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 176], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0642683Z // end inline asm 2026-02-21T09:08:08.0643110Z // begin inline asm 2026-02-21T09:08:08.0643889Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 192], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0644804Z // end inline asm 2026-02-21T09:08:08.0645161Z // begin inline asm 2026-02-21T09:08:08.0645954Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 208], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0646851Z // end inline asm 2026-02-21T09:08:08.0647140Z // begin inline asm 2026-02-21T09:08:08.0647936Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 224], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0648808Z // end inline asm 2026-02-21T09:08:08.0649106Z // begin inline asm 2026-02-21T09:08:08.0649971Z @%p93 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r831 + 240], {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:08:08.0650863Z // end inline asm 2026-02-21T09:08:08.0651161Z // begin inline asm 2026-02-21T09:08:08.0651495Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:08:08.0651856Z // end inline asm 2026-02-21T09:08:08.0652145Z bar.sync 0; 2026-02-21T09:08:08.0652733Z .loc 1 44 57 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0653434Z add.s32 %r1569, %r69, 196656; 2026-02-21T09:08:08.0653880Z // begin inline asm 2026-02-21T09:08:08.0654262Z @%p130 mbarrier.init.shared::cta.b64 [%r1569], 1; 2026-02-21T09:08:08.0654802Z // end inline asm 2026-02-21T09:08:08.0655103Z bar.sync 0; 2026-02-21T09:08:08.0655392Z add.s32 %r367, %r69, 196664; 2026-02-21T09:08:08.0655745Z // begin inline asm 2026-02-21T09:08:08.0656111Z @%p130 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:08:08.0656563Z // end inline asm 2026-02-21T09:08:08.0656861Z add.s32 %r368, %r69, 196608; 2026-02-21T09:08:08.0657219Z // begin inline asm 2026-02-21T09:08:08.0657587Z @%p130 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:08:08.0658033Z // end inline asm 2026-02-21T09:08:08.0658315Z bar.sync 0; 2026-02-21T09:08:08.0658604Z add.s32 %r369, %r69, 196616; 2026-02-21T09:08:08.0658960Z // begin inline asm 2026-02-21T09:08:08.0659318Z @%p130 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:08:08.0659763Z // end inline asm 2026-02-21T09:08:08.0660046Z bar.sync 0; 2026-02-21T09:08:08.0660341Z add.s32 %r370, %r69, 196624; 2026-02-21T09:08:08.0660679Z // begin inline asm 2026-02-21T09:08:08.0661046Z @%p130 mbarrier.init.shared::cta.b64 [%r370], 1; 2026-02-21T09:08:08.0661473Z // end inline asm 2026-02-21T09:08:08.0661759Z bar.sync 0; 2026-02-21T09:08:08.0662038Z add.s32 %r371, %r69, 196632; 2026-02-21T09:08:08.0662383Z // begin inline asm 2026-02-21T09:08:08.0662745Z @%p130 mbarrier.init.shared::cta.b64 [%r371], 1; 2026-02-21T09:08:08.0663182Z // end inline asm 2026-02-21T09:08:08.0663474Z bar.sync 0; 2026-02-21T09:08:08.0663752Z add.s32 %r372, %r69, 196640; 2026-02-21T09:08:08.0664098Z // begin inline asm 2026-02-21T09:08:08.0664457Z @%p130 mbarrier.init.shared::cta.b64 [%r372], 1; 2026-02-21T09:08:08.0665116Z // end inline asm 2026-02-21T09:08:08.0665500Z bar.sync 0; 2026-02-21T09:08:08.0665864Z add.s32 %r482, %r69, 196648; 
2026-02-21T09:08:08.0666309Z // begin inline asm 2026-02-21T09:08:08.0666719Z @%p130 mbarrier.init.shared::cta.b64 [%r482], 1; 2026-02-21T09:08:08.0667164Z // end inline asm 2026-02-21T09:08:08.0667442Z bar.sync 0; 2026-02-21T09:08:08.0667726Z // begin inline asm 2026-02-21T09:08:08.0668164Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r368], 32768; 2026-02-21T09:08:08.0668677Z // end inline asm 2026-02-21T09:08:08.0669240Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0669910Z // begin inline asm 2026-02-21T09:08:08.0670244Z fence.proxy.async.shared::cta; 2026-02-21T09:08:08.0670714Z // end inline asm 2026-02-21T09:08:08.0670996Z bar.sync 0; 2026-02-21T09:08:08.0671276Z elect.sync %r450|%p80, -1; 2026-02-21T09:08:08.0671643Z and.pred %p65, %p1, %p80; 2026-02-21T09:08:08.0671972Z // begin inline asm 2026-02-21T09:08:08.0672798Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r69], [%rd67, {%r79, %r485}], [%r368]; 2026-02-21T09:08:08.0673619Z // end inline asm 2026-02-21T09:08:08.0674192Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0674936Z bar.sync 0; 2026-02-21T09:08:08.0675223Z elect.sync %r451|%p81, -1; 2026-02-21T09:08:08.0675589Z and.pred %p66, %p1, %p81; 2026-02-21T09:08:08.0675925Z add.s32 %r379, %r69, 98304; 2026-02-21T09:08:08.0676263Z // begin inline asm 2026-02-21T09:08:08.0677086Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r379], [%rd68, {%r79, %r489}], [%r368]; 2026-02-21T09:08:08.0677934Z // end inline asm 2026-02-21T09:08:08.0678505Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0679155Z bar.sync 0; 2026-02-21T09:08:08.0679435Z // begin inline asm 2026-02-21T09:08:08.0679859Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r369], 32768; 2026-02-21T09:08:08.0680369Z // end inline asm 2026-02-21T09:08:08.0681000Z 
.loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0681667Z bar.sync 0; 2026-02-21T09:08:08.0681947Z elect.sync %r452|%p82, -1; 2026-02-21T09:08:08.0682303Z and.pred %p68, %p1, %p82; 2026-02-21T09:08:08.0682646Z add.s32 %r384, %r69, 16384; 2026-02-21T09:08:08.0682974Z // begin inline asm 2026-02-21T09:08:08.0683724Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r384], [%rd67, {%r72, %r485}], [%r369]; 2026-02-21T09:08:08.0684542Z // end inline asm 2026-02-21T09:08:08.0685224Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0685878Z bar.sync 0; 2026-02-21T09:08:08.0686168Z elect.sync %r453|%p83, -1; 2026-02-21T09:08:08.0686507Z and.pred %p69, %p1, %p83; 2026-02-21T09:08:08.0686853Z add.s32 %r388, %r69, 114688; 2026-02-21T09:08:08.0687185Z // begin inline asm 2026-02-21T09:08:08.0687923Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r388], [%rd68, {%r72, %r489}], [%r369]; 2026-02-21T09:08:08.0688749Z // end inline asm 2026-02-21T09:08:08.0689302Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0689961Z bar.sync 0; 2026-02-21T09:08:08.0690226Z // begin inline asm 2026-02-21T09:08:08.0690654Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r370], 32768; 2026-02-21T09:08:08.0691155Z // end inline asm 2026-02-21T09:08:08.0691702Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0692358Z bar.sync 0; 2026-02-21T09:08:08.0692636Z elect.sync %r454|%p84, -1; 2026-02-21T09:08:08.0692988Z and.pred %p71, %p1, %p84; 2026-02-21T09:08:08.0693317Z add.s32 %r393, %r69, 32768; 2026-02-21T09:08:08.0693652Z mov.b32 %r394, 64; 2026-02-21T09:08:08.0693938Z // begin inline asm 2026-02-21T09:08:08.0694771Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r393], [%rd67, {%r394, %r485}], 
[%r370]; 2026-02-21T09:08:08.0695614Z // end inline asm 2026-02-21T09:08:08.0696170Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0696831Z bar.sync 0; 2026-02-21T09:08:08.0697111Z elect.sync %r455|%p85, -1; 2026-02-21T09:08:08.0697459Z and.pred %p72, %p1, %p85; 2026-02-21T09:08:08.0697790Z add.s32 %r397, %r69, 131072; 2026-02-21T09:08:08.0698131Z // begin inline asm 2026-02-21T09:08:08.0698950Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r397], [%rd68, {%r394, %r489}], [%r370]; 2026-02-21T09:08:08.0700089Z // end inline asm 2026-02-21T09:08:08.0700685Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0701409Z bar.sync 0; 2026-02-21T09:08:08.0701680Z // begin inline asm 2026-02-21T09:08:08.0702097Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r371], 32768; 2026-02-21T09:08:08.0702602Z // end inline asm 2026-02-21T09:08:08.0703144Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0703797Z bar.sync 0; 2026-02-21T09:08:08.0704108Z elect.sync %r456|%p86, -1; 2026-02-21T09:08:08.0704455Z and.pred %p74, %p1, %p86; 2026-02-21T09:08:08.0704868Z add.s32 %r402, %r69, 49152; 2026-02-21T09:08:08.0705204Z mov.b32 %r403, 96; 2026-02-21T09:08:08.0705516Z // begin inline asm 2026-02-21T09:08:08.0706400Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r402], [%rd67, {%r403, %r485}], [%r371]; 2026-02-21T09:08:08.0707214Z // end inline asm 2026-02-21T09:08:08.0707779Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0708409Z bar.sync 0; 2026-02-21T09:08:08.0708705Z elect.sync %r457|%p87, -1; 2026-02-21T09:08:08.0709059Z and.pred %p75, %p1, %p87; 2026-02-21T09:08:08.0709401Z add.s32 %r406, %r69, 147456; 2026-02-21T09:08:08.0709808Z // begin inline asm 2026-02-21T09:08:08.0710549Z @%p75 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r406], [%rd68, {%r403, %r489}], [%r371]; 2026-02-21T09:08:08.0711374Z // end inline asm 2026-02-21T09:08:08.0711991Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0712679Z bar.sync 0; 2026-02-21T09:08:08.0712964Z // begin inline asm 2026-02-21T09:08:08.0713455Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r372], 32768; 2026-02-21T09:08:08.0713972Z // end inline asm 2026-02-21T09:08:08.0714588Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0715334Z bar.sync 0; 2026-02-21T09:08:08.0715628Z elect.sync %r458|%p88, -1; 2026-02-21T09:08:08.0715978Z and.pred %p77, %p1, %p88; 2026-02-21T09:08:08.0716325Z add.s32 %r411, %r69, 65536; 2026-02-21T09:08:08.0716661Z mov.b32 %r412, 128; 2026-02-21T09:08:08.0716960Z // begin inline asm 2026-02-21T09:08:08.0717698Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r411], [%rd67, {%r412, %r485}], [%r372]; 2026-02-21T09:08:08.0718479Z // end inline asm 2026-02-21T09:08:08.0719034Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0719674Z bar.sync 0; 2026-02-21T09:08:08.0719978Z elect.sync %r459|%p89, -1; 2026-02-21T09:08:08.0720350Z and.pred %p78, %p1, %p89; 2026-02-21T09:08:08.0720698Z add.s32 %r415, %r69, 163840; 2026-02-21T09:08:08.0721041Z // begin inline asm 2026-02-21T09:08:08.0721776Z @%p78 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r415], [%rd68, {%r412, %r489}], [%r372]; 2026-02-21T09:08:08.0722603Z // end inline asm 2026-02-21T09:08:08.0723166Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0723824Z bar.sync 0; 2026-02-21T09:08:08.0724108Z // begin inline asm 2026-02-21T09:08:08.0724401Z 2026-02-21T09:08:08.0724638Z { 2026-02-21T09:08:08.0725018Z .reg .pred complete; 
2026-02-21T09:08:08.0725395Z waitLoop: 2026-02-21T09:08:08.0725917Z mbarrier.try_wait.parity.shared.b64 complete, [%r368], %r79; 2026-02-21T09:08:08.0726565Z @!complete bra.uni waitLoop; 2026-02-21T09:08:08.0726970Z } 2026-02-21T09:08:08.0727120Z 2026-02-21T09:08:08.0727231Z // end inline asm 2026-02-21T09:08:08.0727799Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0728635Z setp.ne.b32 %p90, %r52, 0; 2026-02-21T09:08:08.0729004Z @%p90 bra $L__BB0_3; 2026-02-21T09:08:08.0729314Z // %bb.2: 2026-02-21T09:08:08.0729881Z .loc 1 0 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:0:52 2026-02-21T09:08:08.0730632Z add.s32 %r469, %r69, 8224; 2026-02-21T09:08:08.0731000Z bfe.u32 %r470, %r469, 4, 14; 2026-02-21T09:08:08.0731357Z cvt.u64.u32 %rd61, %r470; 2026-02-21T09:08:08.0731743Z or.b64 %rd58, %rd61, -9223371899348713472; 2026-02-21T09:08:08.0732157Z add.s32 %r471, %r69, 8192; 2026-02-21T09:08:08.0732499Z bfe.u32 %r472, %r471, 4, 14; 2026-02-21T09:08:08.0732853Z cvt.u64.u32 %rd62, %r472; 2026-02-21T09:08:08.0733230Z or.b64 %rd56, %rd62, -9223371899348713472; 2026-02-21T09:08:08.0733645Z add.s32 %r474, %r69, 98336; 2026-02-21T09:08:08.0733986Z bfe.u32 %r475, %r474, 4, 14; 2026-02-21T09:08:08.0734405Z cvt.u64.u32 %rd63, %r475; 2026-02-21T09:08:08.0734854Z or.b64 %rd55, %rd63, -9223371899348713472; 2026-02-21T09:08:08.0735241Z add.s32 %r476, %r69, 32; 2026-02-21T09:08:08.0735550Z bfe.u32 %r477, %r476, 4, 14; 2026-02-21T09:08:08.0735869Z cvt.u64.u32 %rd64, %r477; 2026-02-21T09:08:08.0736199Z or.b64 %rd54, %rd64, -9223371899348713472; 2026-02-21T09:08:08.0736610Z bfe.u32 %r478, %r379, 4, 14; 2026-02-21T09:08:08.0736960Z cvt.u64.u32 %rd65, %r478; 2026-02-21T09:08:08.0737316Z or.b64 %rd53, %rd65, -9223371899348713472; 2026-02-21T09:08:08.0737802Z bfe.u32 %r479, %r69, 4, 14; 2026-02-21T09:08:08.0738138Z cvt.u64.u32 %rd66, %r479; 2026-02-21T09:08:08.0738499Z or.b64 %rd52, %rd66, -9223371899348713472; 
2026-02-21T09:08:08.0739173Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0739870Z elect.sync %r480|%p92, -1; 2026-02-21T09:08:08.0740215Z mov.b32 %r461, 138412048; 2026-02-21T09:08:08.0740543Z mov.pred %p91, 0; 2026-02-21T09:08:08.0740852Z // begin inline asm 2026-02-21T09:08:08.0741362Z @%p92 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 0 ], %rd52, %rd53, %r461, %p91; 2026-02-21T09:08:08.0741957Z // end inline asm 2026-02-21T09:08:08.0742243Z // begin inline asm 2026-02-21T09:08:08.0742747Z @%p92 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 0 ], %rd54, %rd55, %r461, %p93; 2026-02-21T09:08:08.0743317Z // end inline asm 2026-02-21T09:08:08.0743607Z // begin inline asm 2026-02-21T09:08:08.0744108Z @%p92 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 256 ], %rd56, %rd53, %r461, %p91; 2026-02-21T09:08:08.0744744Z // end inline asm 2026-02-21T09:08:08.0745029Z // begin inline asm 2026-02-21T09:08:08.0745534Z @%p92 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 256 ], %rd58, %rd55, %r461, %p93; 2026-02-21T09:08:08.0746143Z // end inline asm 2026-02-21T09:08:08.0746441Z add.s32 %r481, %r69, 196656; 2026-02-21T09:08:08.0746800Z cvt.u64.u32 %rd60, %r481; 2026-02-21T09:08:08.0747136Z // begin inline asm 2026-02-21T09:08:08.0747623Z @%p92 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd60]; 2026-02-21T09:08:08.0748171Z // end inline asm 2026-02-21T09:08:08.0748452Z $L__BB0_3: 2026-02-21T09:08:08.0749013Z .loc 1 0 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:0:52 2026-02-21T09:08:08.0749758Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:08:08.0750204Z add.s32 %r4, %r69, %r430; 2026-02-21T09:08:08.0750538Z add.s32 %r5, %r69, %r431; 2026-02-21T09:08:08.0750882Z add.s32 %r6, %r69, %r432; 2026-02-21T09:08:08.0751205Z add.s32 %r7, %r69, %r433; 2026-02-21T09:08:08.0751546Z add.s32 %r8, %r69, %r434; 2026-02-21T09:08:08.0751877Z add.s32 %r9, %r69, %r435; 2026-02-21T09:08:08.0752203Z add.s32 
%r10, %r69, %r436; 2026-02-21T09:08:08.0752554Z add.s32 %r11, %r69, %r437; 2026-02-21T09:08:08.0752890Z add.s32 %r841, %r836, 512; 2026-02-21T09:08:08.0753238Z add.s32 %r846, %r836, 1024; 2026-02-21T09:08:08.0753585Z add.s32 %r851, %r836, 1536; 2026-02-21T09:08:08.0754064Z add.s32 %r856, %r836, 2048; 2026-02-21T09:08:08.0754399Z add.s32 %r861, %r836, 2560; 2026-02-21T09:08:08.0754826Z add.s32 %r866, %r836, 3072; 2026-02-21T09:08:08.0755164Z add.s32 %r871, %r836, 3584; 2026-02-21T09:08:08.0755509Z or.b32 %r21, %r489, %r422; 2026-02-21T09:08:08.0755941Z or.b32 %r24, %r23, 8; 2026-02-21T09:08:08.0756258Z or.b32 %r25, %r23, 16; 2026-02-21T09:08:08.0756584Z or.b32 %r26, %r23, 24; 2026-02-21T09:08:08.0756906Z or.b32 %r27, %r23, 32; 2026-02-21T09:08:08.0757230Z or.b32 %r28, %r23, 40; 2026-02-21T09:08:08.0757539Z or.b32 %r29, %r23, 48; 2026-02-21T09:08:08.0757863Z or.b32 %r30, %r23, 56; 2026-02-21T09:08:08.0758173Z or.b32 %r31, %r23, 64; 2026-02-21T09:08:08.0758492Z or.b32 %r32, %r23, 72; 2026-02-21T09:08:08.0758805Z or.b32 %r33, %r23, 80; 2026-02-21T09:08:08.0759126Z or.b32 %r34, %r23, 88; 2026-02-21T09:08:08.0759446Z or.b32 %r35, %r23, 96; 2026-02-21T09:08:08.0759761Z or.b32 %r36, %r23, 104; 2026-02-21T09:08:08.0760188Z or.b32 %r37, %r23, 112; 2026-02-21T09:08:08.0760514Z or.b32 %r38, %r23, 120; 2026-02-21T09:08:08.0760835Z or.b32 %r39, %r23, 128; 2026-02-21T09:08:08.0761150Z or.b32 %r40, %r23, 136; 2026-02-21T09:08:08.0761474Z or.b32 %r41, %r23, 144; 2026-02-21T09:08:08.0761779Z or.b32 %r42, %r23, 152; 2026-02-21T09:08:08.0762103Z or.b32 %r43, %r23, 160; 2026-02-21T09:08:08.0762414Z or.b32 %r44, %r23, 168; 2026-02-21T09:08:08.0762734Z or.b32 %r45, %r23, 176; 2026-02-21T09:08:08.0763053Z or.b32 %r46, %r23, 184; 2026-02-21T09:08:08.0763470Z or.b32 %r47, %r23, 192; 2026-02-21T09:08:08.0763795Z or.b32 %r48, %r23, 200; 2026-02-21T09:08:08.0764107Z or.b32 %r49, %r23, 208; 2026-02-21T09:08:08.0764430Z or.b32 %r50, %r23, 216; 2026-02-21T09:08:08.0764846Z or.b32 %r51, %r424, 
%r444; 2026-02-21T09:08:08.0765494Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0766173Z bar.sync 0; 2026-02-21T09:08:08.0766468Z // begin inline asm 2026-02-21T09:08:08.0766924Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r482], 32768; 2026-02-21T09:08:08.0767444Z // end inline asm 2026-02-21T09:08:08.0768028Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0768704Z bar.sync 0; 2026-02-21T09:08:08.0769011Z elect.sync %r496|%p104, -1; 2026-02-21T09:08:08.0769373Z and.pred %p101, %p1, %p104; 2026-02-21T09:08:08.0769731Z add.s32 %r483, %r69, 81920; 2026-02-21T09:08:08.0770068Z mov.b32 %r484, 160; 2026-02-21T09:08:08.0770384Z // begin inline asm 2026-02-21T09:08:08.0771181Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r483], [%rd67, {%r484, %r485}], [%r482]; 2026-02-21T09:08:08.0772056Z // end inline asm 2026-02-21T09:08:08.0772647Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0773338Z bar.sync 0; 2026-02-21T09:08:08.0773795Z elect.sync %r497|%p105, -1; 2026-02-21T09:08:08.0774283Z and.pred %p102, %p1, %p105; 2026-02-21T09:08:08.0774835Z add.s32 %r487, %r69, 180224; 2026-02-21T09:08:08.0775290Z // begin inline asm 2026-02-21T09:08:08.0776094Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r487], [%rd68, {%r484, %r489}], [%r482]; 2026-02-21T09:08:08.0776979Z // end inline asm 2026-02-21T09:08:08.0777270Z mov.b32 %r1573, 1; 2026-02-21T09:08:08.0777577Z mov.b32 %r1572, 5; 2026-02-21T09:08:08.0777869Z mov.b32 %r1568, 0; 2026-02-21T09:08:08.0778181Z mov.b32 %r1570, %r1568; 2026-02-21T09:08:08.0778507Z mov.b32 %r1571, %r1568; 2026-02-21T09:08:08.0778838Z mov.b32 %r1574, %r1568; 2026-02-21T09:08:08.0779159Z mov.b32 %r1575, %r1568; 2026-02-21T09:08:08.0779491Z bra.uni $L__BB0_4; 2026-02-21T09:08:08.0779911Z $L__BB0_6: // in Loop: 
Header=BB0_4 Depth=1 2026-02-21T09:08:08.0780697Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0781517Z setp.lt.u32 %p120, %r1575, 1856; 2026-02-21T09:08:08.0782168Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0782861Z // begin inline asm 2026-02-21T09:08:08.0783148Z 2026-02-21T09:08:08.0783463Z { 2026-02-21T09:08:08.0783719Z .reg .pred complete; 2026-02-21T09:08:08.0784035Z waitLoop: 2026-02-21T09:08:08.0784472Z mbarrier.try_wait.parity.shared.b64 complete, [%r1569], %r1568; 2026-02-21T09:08:08.0785145Z @!complete bra.uni waitLoop; 2026-02-21T09:08:08.0785495Z } 2026-02-21T09:08:08.0785634Z 2026-02-21T09:08:08.0785752Z // end inline asm 2026-02-21T09:08:08.0786059Z add.s32 %r541, %r1573, 1; 2026-02-21T09:08:08.0786402Z setp.gt.s32 %p123, %r541, 1; 2026-02-21T09:08:08.0786777Z selp.b32 %r1573, 0, %r541, %p123; 2026-02-21T09:08:08.0787154Z selp.b32 %r542, 1, 0, %p123; 2026-02-21T09:08:08.0787504Z xor.b32 %r66, %r1574, %r542; 2026-02-21T09:08:08.0788212Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0788903Z add.s32 %r543, %r1572, 1; 2026-02-21T09:08:08.0789253Z setp.gt.s32 %p124, %r543, 5; 2026-02-21T09:08:08.0789618Z selp.b32 %r1572, 0, %r543, %p124; 2026-02-21T09:08:08.0790001Z shl.b32 %r544, %r1572, 3; 2026-02-21T09:08:08.0790338Z add.s32 %r546, %r69, %r544; 2026-02-21T09:08:08.0790692Z add.s32 %r536, %r546, 196608; 2026-02-21T09:08:08.0791033Z bar.sync 0; 2026-02-21T09:08:08.0791432Z and.pred %p117, %p130, %p120; 2026-02-21T09:08:08.0814244Z // begin inline asm 2026-02-21T09:08:08.0814884Z @%p117 mbarrier.arrive.expect_tx.shared.b64 _, [%r536], 32768; 2026-02-21T09:08:08.0815442Z // end inline asm 2026-02-21T09:08:08.0816057Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0816795Z shl.b32 %r547, %r1572, 14; 2026-02-21T09:08:08.0817158Z add.s32 %r533, %r69, 
%r547; 2026-02-21T09:08:08.0817534Z bar.sync 0; 2026-02-21T09:08:08.0817863Z elect.sync %r548|%p125, -1; 2026-02-21T09:08:08.0818242Z and.pred %p126, %p120, %p125; 2026-02-21T09:08:08.0818626Z and.pred %p118, %p1, %p126; 2026-02-21T09:08:08.0818984Z add.s32 %r534, %r1575, 192; 2026-02-21T09:08:08.0819343Z // begin inline asm 2026-02-21T09:08:08.0820168Z @%p118 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r533], [%rd67, {%r534, %r485}], [%r536]; 2026-02-21T09:08:08.0821054Z // end inline asm 2026-02-21T09:08:08.0821659Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0822359Z add.s32 %r537, %r533, 98304; 2026-02-21T09:08:08.0822715Z bar.sync 0; 2026-02-21T09:08:08.0823016Z elect.sync %r549|%p127, -1; 2026-02-21T09:08:08.0823402Z and.pred %p128, %p120, %p127; 2026-02-21T09:08:08.0823774Z and.pred %p119, %p1, %p128; 2026-02-21T09:08:08.0824135Z // begin inline asm 2026-02-21T09:08:08.0824986Z @%p119 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r537], [%rd68, {%r534, %r489}], [%r536]; 2026-02-21T09:08:08.0825871Z // end inline asm 2026-02-21T09:08:08.0826463Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0827178Z setp.lt.u32 %p129, %r1575, 1984; 2026-02-21T09:08:08.0827573Z add.s32 %r1575, %r1575, 32; 2026-02-21T09:08:08.0827916Z mov.b32 %r1568, %r1574; 2026-02-21T09:08:08.0828259Z mov.b32 %r1569, %r550; 2026-02-21T09:08:08.0828593Z mov.b32 %r1574, %r66; 2026-02-21T09:08:08.0828925Z @%p129 bra $L__BB0_4; 2026-02-21T09:08:08.0829243Z bra.uni $L__BB0_7; 2026-02-21T09:08:08.0829674Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:08:08.0830470Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0831160Z add.s32 %r500, %r1571, 1; 2026-02-21T09:08:08.0831528Z setp.gt.s32 %p107, %r500, 5; 2026-02-21T09:08:08.0832113Z selp.b32 %r1571, 0, %r500, %p107; 
2026-02-21T09:08:08.0832512Z selp.b32 %r501, 1, 0, %p107; 2026-02-21T09:08:08.0832866Z xor.b32 %r1570, %r1570, %r501; 2026-02-21T09:08:08.0833244Z shl.b32 %r502, %r1571, 3; 2026-02-21T09:08:08.0833587Z add.s32 %r504, %r69, %r502; 2026-02-21T09:08:08.0834041Z add.s32 %r498, %r504, 196608; 2026-02-21T09:08:08.0834392Z bar.sync 0; 2026-02-21T09:08:08.0834754Z // begin inline asm 2026-02-21T09:08:08.0835065Z 2026-02-21T09:08:08.0835304Z { 2026-02-21T09:08:08.0835584Z .reg .pred complete; 2026-02-21T09:08:08.0835896Z waitLoop: 2026-02-21T09:08:08.0836336Z mbarrier.try_wait.parity.shared.b64 complete, [%r498], %r1570; 2026-02-21T09:08:08.0836883Z @!complete bra.uni waitLoop; 2026-02-21T09:08:08.0837229Z } 2026-02-21T09:08:08.0837373Z 2026-02-21T09:08:08.0837489Z // end inline asm 2026-02-21T09:08:08.0837801Z shl.b32 %r505, %r1573, 3; 2026-02-21T09:08:08.0838143Z add.s32 %r506, %r69, %r505; 2026-02-21T09:08:08.0838575Z add.s32 %r550, %r506, 196656; 2026-02-21T09:08:08.0839221Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0839909Z @%p90 bra $L__BB0_6; 2026-02-21T09:08:08.0840331Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:08:08.0841093Z .loc 1 48 31 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:48:31 2026-02-21T09:08:08.0841802Z shl.b32 %r515, %r1571, 14; 2026-02-21T09:08:08.0842252Z add.s32 %r517, %r69, %r515; 2026-02-21T09:08:08.0842888Z .loc 1 49 44 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:49:44 2026-02-21T09:08:08.0843590Z add.s32 %r518, %r517, 98304; 2026-02-21T09:08:08.0844208Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0845006Z elect.sync %r519|%p109, -1; 2026-02-21T09:08:08.0845368Z bfe.u32 %r520, %r517, 4, 14; 2026-02-21T09:08:08.0845732Z cvt.u64.u32 %rd78, %r520; 2026-02-21T09:08:08.0846106Z or.b64 %rd69, %rd78, -9223371899348713472; 2026-02-21T09:08:08.0846536Z bfe.u32 %r521, %r518, 4, 14; 
2026-02-21T09:08:08.0846893Z cvt.u64.u32 %rd79, %r521; 2026-02-21T09:08:08.0847305Z or.b64 %rd70, %rd79, -9223371899348713472; 2026-02-21T09:08:08.0847878Z mov.b32 %r508, 138412048; 2026-02-21T09:08:08.0848276Z mov.pred %p108, -1; 2026-02-21T09:08:08.0848610Z // begin inline asm 2026-02-21T09:08:08.0849150Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 0 ], %rd69, %rd70, %r508, %p108; 2026-02-21T09:08:08.0849773Z // end inline asm 2026-02-21T09:08:08.0850073Z add.s32 %r522, %r517, 32; 2026-02-21T09:08:08.0850412Z bfe.u32 %r523, %r522, 4, 14; 2026-02-21T09:08:08.0850764Z cvt.u64.u32 %rd80, %r523; 2026-02-21T09:08:08.0851124Z or.b64 %rd71, %rd80, -9223371899348713472; 2026-02-21T09:08:08.0851540Z add.s32 %r524, %r517, 98336; 2026-02-21T09:08:08.0851882Z bfe.u32 %r525, %r524, 4, 14; 2026-02-21T09:08:08.0852236Z cvt.u64.u32 %rd81, %r525; 2026-02-21T09:08:08.0852601Z or.b64 %rd72, %rd81, -9223371899348713472; 2026-02-21T09:08:08.0853013Z // begin inline asm 2026-02-21T09:08:08.0853524Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 0 ], %rd71, %rd72, %r508, %p108; 2026-02-21T09:08:08.0854139Z // end inline asm 2026-02-21T09:08:08.0854454Z add.s32 %r526, %r517, 8192; 2026-02-21T09:08:08.0854849Z bfe.u32 %r527, %r526, 4, 14; 2026-02-21T09:08:08.0855207Z cvt.u64.u32 %rd82, %r527; 2026-02-21T09:08:08.0855566Z or.b64 %rd73, %rd82, -9223371899348713472; 2026-02-21T09:08:08.0855980Z // begin inline asm 2026-02-21T09:08:08.0856503Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 256 ], %rd73, %rd70, %r508, %p108; 2026-02-21T09:08:08.0857130Z // end inline asm 2026-02-21T09:08:08.0857428Z add.s32 %r528, %r517, 8224; 2026-02-21T09:08:08.0857784Z bfe.u32 %r529, %r528, 4, 14; 2026-02-21T09:08:08.0858136Z cvt.u64.u32 %rd83, %r529; 2026-02-21T09:08:08.0858492Z or.b64 %rd75, %rd83, -9223371899348713472; 2026-02-21T09:08:08.0859022Z // begin inline asm 2026-02-21T09:08:08.0859545Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r1567 + 256 ], %rd75, %rd72, %r508, %p108; 
2026-02-21T09:08:08.0860163Z // end inline asm 2026-02-21T09:08:08.0860458Z cvt.u64.u32 %rd77, %r550; 2026-02-21T09:08:08.0860880Z // begin inline asm 2026-02-21T09:08:08.0861358Z @%p109 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd77]; 2026-02-21T09:08:08.0861915Z // end inline asm 2026-02-21T09:08:08.0862217Z bra.uni $L__BB0_6; 2026-02-21T09:08:08.0862607Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:08:08.0863366Z .loc 1 0 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:0:52 2026-02-21T09:08:08.0864040Z mov.b32 %r551, 1; 2026-02-21T09:08:08.0864620Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0865387Z // begin inline asm 2026-02-21T09:08:08.0865759Z 2026-02-21T09:08:08.0866001Z { 2026-02-21T09:08:08.0866278Z .reg .pred complete; 2026-02-21T09:08:08.0866629Z waitLoop: 2026-02-21T09:08:08.0867148Z mbarrier.try_wait.parity.shared.b64 complete, [%r550], %r551; 2026-02-21T09:08:08.0867790Z @!complete bra.uni waitLoop; 2026-02-21T09:08:08.0868133Z } 2026-02-21T09:08:08.0868291Z 2026-02-21T09:08:08.0868407Z // end inline asm 2026-02-21T09:08:08.0868986Z .loc 1 44 57 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:44:57 2026-02-21T09:08:08.0869683Z bar.sync 0; 2026-02-21T09:08:08.0870039Z // begin inline asm 2026-02-21T09:08:08.0870436Z @%p130 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:08:08.0870879Z // end inline asm 2026-02-21T09:08:08.0871167Z bar.sync 0; 2026-02-21T09:08:08.0871456Z // begin inline asm 2026-02-21T09:08:08.0871822Z @%p130 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:08:08.0872274Z // end inline asm 2026-02-21T09:08:08.0872560Z bar.sync 0; 2026-02-21T09:08:08.0872853Z // begin inline asm 2026-02-21T09:08:08.0873222Z @%p130 mbarrier.inval.shared::cta.b64 [%r370]; 2026-02-21T09:08:08.0873671Z // end inline asm 2026-02-21T09:08:08.0873967Z bar.sync 0; 2026-02-21T09:08:08.0874248Z // begin inline asm 2026-02-21T09:08:08.0874621Z @%p130 
mbarrier.inval.shared::cta.b64 [%r371]; 2026-02-21T09:08:08.0875156Z // end inline asm 2026-02-21T09:08:08.0875452Z bar.sync 0; 2026-02-21T09:08:08.0875729Z // begin inline asm 2026-02-21T09:08:08.0876097Z @%p130 mbarrier.inval.shared::cta.b64 [%r372]; 2026-02-21T09:08:08.0876526Z // end inline asm 2026-02-21T09:08:08.0876825Z bar.sync 0; 2026-02-21T09:08:08.0877096Z // begin inline asm 2026-02-21T09:08:08.0877465Z @%p130 mbarrier.inval.shared::cta.b64 [%r482]; 2026-02-21T09:08:08.0877907Z // end inline asm 2026-02-21T09:08:08.0878202Z add.s32 %r558, %r69, 196656; 2026-02-21T09:08:08.0878559Z // begin inline asm 2026-02-21T09:08:08.0878911Z @%p130 mbarrier.inval.shared::cta.b64 [%r558]; 2026-02-21T09:08:08.0879346Z // end inline asm 2026-02-21T09:08:08.0879633Z bar.sync 0; 2026-02-21T09:08:08.0879918Z // begin inline asm 2026-02-21T09:08:08.0880271Z @%p130 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:08:08.0880704Z // end inline asm 2026-02-21T09:08:08.0881296Z .loc 1 53 45 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:53:45 2026-02-21T09:08:08.0882008Z shl.b32 %r1121, %r23, 11; 2026-02-21T09:08:08.0882359Z shl.b32 %r1122, %r24, 11; 2026-02-21T09:08:08.0882692Z shl.b32 %r1123, %r25, 11; 2026-02-21T09:08:08.0883034Z shl.b32 %r1124, %r26, 11; 2026-02-21T09:08:08.0883356Z shl.b32 %r1125, %r27, 11; 2026-02-21T09:08:08.0883695Z shl.b32 %r1126, %r28, 11; 2026-02-21T09:08:08.0884018Z shl.b32 %r1127, %r29, 11; 2026-02-21T09:08:08.0884351Z shl.b32 %r1128, %r30, 11; 2026-02-21T09:08:08.0884732Z shl.b32 %r1129, %r31, 11; 2026-02-21T09:08:08.0885080Z shl.b32 %r1130, %r32, 11; 2026-02-21T09:08:08.0885417Z shl.b32 %r1131, %r33, 11; 2026-02-21T09:08:08.0885751Z shl.b32 %r1132, %r34, 11; 2026-02-21T09:08:08.0886193Z shl.b32 %r1133, %r35, 11; 2026-02-21T09:08:08.0886534Z shl.b32 %r1134, %r36, 11; 2026-02-21T09:08:08.0886859Z shl.b32 %r1135, %r37, 11; 2026-02-21T09:08:08.0887190Z shl.b32 %r1136, %r38, 11; 2026-02-21T09:08:08.0887512Z shl.b32 %r1137, %r39, 11; 
2026-02-21T09:08:08.0887938Z shl.b32 %r1138, %r40, 11; 2026-02-21T09:08:08.0888265Z shl.b32 %r1139, %r41, 11; 2026-02-21T09:08:08.0888602Z shl.b32 %r1140, %r42, 11; 2026-02-21T09:08:08.0888928Z shl.b32 %r1141, %r43, 11; 2026-02-21T09:08:08.0889262Z shl.b32 %r1142, %r44, 11; 2026-02-21T09:08:08.0889579Z shl.b32 %r1143, %r45, 11; 2026-02-21T09:08:08.0889913Z shl.b32 %r1144, %r46, 11; 2026-02-21T09:08:08.0890245Z shl.b32 %r1145, %r47, 11; 2026-02-21T09:08:08.0890569Z shl.b32 %r1146, %r48, 11; 2026-02-21T09:08:08.0890902Z shl.b32 %r1147, %r49, 11; 2026-02-21T09:08:08.0891226Z shl.b32 %r1148, %r50, 11; 2026-02-21T09:08:08.0891558Z shl.b32 %r1149, %r51, 11; 2026-02-21T09:08:08.0892253Z .loc 1 53 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:53:52 2026-02-21T09:08:08.0892968Z or.b32 %r1150, %r1121, %r21; 2026-02-21T09:08:08.0893314Z or.b32 %r1151, %r1122, %r21; 2026-02-21T09:08:08.0893669Z or.b32 %r1152, %r1123, %r21; 2026-02-21T09:08:08.0894011Z or.b32 %r1153, %r1124, %r21; 2026-02-21T09:08:08.0894358Z or.b32 %r1154, %r1125, %r21; 2026-02-21T09:08:08.0894777Z or.b32 %r1155, %r1126, %r21; 2026-02-21T09:08:08.0895115Z or.b32 %r1156, %r1127, %r21; 2026-02-21T09:08:08.0895534Z or.b32 %r1157, %r1128, %r21; 2026-02-21T09:08:08.0895874Z or.b32 %r1158, %r1129, %r21; 2026-02-21T09:08:08.0896221Z or.b32 %r1159, %r1130, %r21; 2026-02-21T09:08:08.0896553Z or.b32 %r1160, %r1131, %r21; 2026-02-21T09:08:08.0896898Z or.b32 %r1161, %r1132, %r21; 2026-02-21T09:08:08.0897230Z or.b32 %r1162, %r1133, %r21; 2026-02-21T09:08:08.0897570Z or.b32 %r1163, %r1134, %r21; 2026-02-21T09:08:08.0897915Z or.b32 %r1164, %r1135, %r21; 2026-02-21T09:08:08.0898251Z or.b32 %r1165, %r1136, %r21; 2026-02-21T09:08:08.0898593Z or.b32 %r1166, %r1137, %r21; 2026-02-21T09:08:08.0898948Z or.b32 %r1167, %r1138, %r21; 2026-02-21T09:08:08.0899298Z or.b32 %r1168, %r1139, %r21; 2026-02-21T09:08:08.0899638Z or.b32 %r1169, %r1140, %r21; 2026-02-21T09:08:08.0899972Z or.b32 %r1170, %r1141, %r21; 
2026-02-21T09:08:08.0900318Z or.b32 %r1171, %r1142, %r21; 2026-02-21T09:08:08.0900646Z or.b32 %r1172, %r1143, %r21; 2026-02-21T09:08:08.0900988Z or.b32 %r1173, %r1144, %r21; 2026-02-21T09:08:08.0901327Z or.b32 %r1174, %r1145, %r21; 2026-02-21T09:08:08.0901672Z or.b32 %r1175, %r1146, %r21; 2026-02-21T09:08:08.0902006Z or.b32 %r1176, %r1147, %r21; 2026-02-21T09:08:08.0902348Z or.b32 %r1177, %r1148, %r21; 2026-02-21T09:08:08.0902679Z or.b32 %r1178, %r1149, %r21; 2026-02-21T09:08:08.0903036Z or.b32 %r1179, %r1178, 458752; 2026-02-21T09:08:08.0903396Z or.b32 %r1180, %r1178, 475136; 2026-02-21T09:08:08.0903743Z or.b32 %r1181, %r1178, 491520; 2026-02-21T09:08:08.0904104Z or.b32 %r1182, %r1178, 507904; 2026-02-21T09:08:08.0904834Z .loc 1 53 24 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:53:24 2026-02-21T09:08:08.0905570Z mad.wide.u32 %rd86, %r1150, 2, %rd3; 2026-02-21T09:08:08.0905978Z mad.wide.u32 %rd87, %r1151, 2, %rd3; 2026-02-21T09:08:08.0906386Z mad.wide.u32 %rd88, %r1152, 2, %rd3; 2026-02-21T09:08:08.0906788Z mad.wide.u32 %rd89, %r1153, 2, %rd3; 2026-02-21T09:08:08.0907179Z mad.wide.u32 %rd90, %r1154, 2, %rd3; 2026-02-21T09:08:08.0907579Z mad.wide.u32 %rd91, %r1155, 2, %rd3; 2026-02-21T09:08:08.0907963Z mad.wide.u32 %rd92, %r1156, 2, %rd3; 2026-02-21T09:08:08.0908360Z mad.wide.u32 %rd93, %r1157, 2, %rd3; 2026-02-21T09:08:08.0908747Z mad.wide.u32 %rd94, %r1158, 2, %rd3; 2026-02-21T09:08:08.0909139Z mad.wide.u32 %rd95, %r1159, 2, %rd3; 2026-02-21T09:08:08.0909524Z mad.wide.u32 %rd96, %r1160, 2, %rd3; 2026-02-21T09:08:08.0909916Z mad.wide.u32 %rd97, %r1161, 2, %rd3; 2026-02-21T09:08:08.0910313Z mad.wide.u32 %rd98, %r1162, 2, %rd3; 2026-02-21T09:08:08.0910808Z mad.wide.u32 %rd99, %r1163, 2, %rd3; 2026-02-21T09:08:08.0911214Z mad.wide.u32 %rd100, %r1164, 2, %rd3; 2026-02-21T09:08:08.0911618Z mad.wide.u32 %rd101, %r1165, 2, %rd3; 2026-02-21T09:08:08.0912021Z mad.wide.u32 %rd102, %r1166, 2, %rd3; 2026-02-21T09:08:08.0912500Z mad.wide.u32 %rd103, %r1167, 2, 
%rd3; 2026-02-21T09:08:08.0912898Z mad.wide.u32 %rd104, %r1168, 2, %rd3; 2026-02-21T09:08:08.0913289Z mad.wide.u32 %rd105, %r1169, 2, %rd3; 2026-02-21T09:08:08.0913682Z mad.wide.u32 %rd106, %r1170, 2, %rd3; 2026-02-21T09:08:08.0914080Z mad.wide.u32 %rd107, %r1171, 2, %rd3; 2026-02-21T09:08:08.0914465Z mad.wide.u32 %rd108, %r1172, 2, %rd3; 2026-02-21T09:08:08.0914962Z mad.wide.u32 %rd109, %r1173, 2, %rd3; 2026-02-21T09:08:08.0915354Z mad.wide.u32 %rd110, %r1174, 2, %rd3; 2026-02-21T09:08:08.0915753Z mad.wide.u32 %rd111, %r1175, 2, %rd3; 2026-02-21T09:08:08.0916137Z mad.wide.u32 %rd112, %r1176, 2, %rd3; 2026-02-21T09:08:08.0916657Z mad.wide.u32 %rd113, %r1177, 2, %rd3; 2026-02-21T09:08:08.0917048Z mad.wide.u32 %rd114, %r1179, 2, %rd3; 2026-02-21T09:08:08.0917444Z mad.wide.u32 %rd115, %r1180, 2, %rd3; 2026-02-21T09:08:08.0917835Z mad.wide.u32 %rd116, %r1181, 2, %rd3; 2026-02-21T09:08:08.0918222Z mad.wide.u32 %rd117, %r1182, 2, %rd3; 2026-02-21T09:08:08.0918895Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0919578Z // begin inline asm 2026-02-21T09:08:08.0920538Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575}, [%r831 + 0]; 2026-02-21T09:08:08.0921499Z // end inline asm 2026-02-21T09:08:08.0921811Z // begin inline asm 2026-02-21T09:08:08.0922682Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592}, [%r831 + 16]; 2026-02-21T09:08:08.0923629Z // end inline asm 2026-02-21T09:08:08.0923934Z // begin inline asm 2026-02-21T09:08:08.0924850Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609}, [%r831 + 32]; 2026-02-21T09:08:08.0925805Z // end inline asm 2026-02-21T09:08:08.0926094Z // begin inline asm 
2026-02-21T09:08:08.0926944Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626}, [%r831 + 48]; 2026-02-21T09:08:08.0927886Z // end inline asm 2026-02-21T09:08:08.0928175Z // begin inline asm 2026-02-21T09:08:08.0929023Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643}, [%r831 + 64]; 2026-02-21T09:08:08.0929961Z // end inline asm 2026-02-21T09:08:08.0930262Z // begin inline asm 2026-02-21T09:08:08.0931100Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660}, [%r831 + 80]; 2026-02-21T09:08:08.0932045Z // end inline asm 2026-02-21T09:08:08.0932346Z // begin inline asm 2026-02-21T09:08:08.0933186Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677}, [%r831 + 96]; 2026-02-21T09:08:08.0934127Z // end inline asm 2026-02-21T09:08:08.0934415Z // begin inline asm 2026-02-21T09:08:08.0935392Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694}, [%r831 + 112]; 2026-02-21T09:08:08.0936347Z // end inline asm 2026-02-21T09:08:08.0936642Z // begin inline asm 2026-02-21T09:08:08.0937507Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711}, [%r831 + 128]; 2026-02-21T09:08:08.0938462Z // end inline asm 2026-02-21T09:08:08.0938871Z // begin inline asm 2026-02-21T09:08:08.0939712Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728}, [%r831 + 144]; 2026-02-21T09:08:08.0940662Z // end inline asm 
2026-02-21T09:08:08.0941038Z // begin inline asm 2026-02-21T09:08:08.0941900Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745}, [%r831 + 160]; 2026-02-21T09:08:08.0942861Z // end inline asm 2026-02-21T09:08:08.0943150Z // begin inline asm 2026-02-21T09:08:08.0944011Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762}, [%r831 + 176]; 2026-02-21T09:08:08.0945026Z // end inline asm 2026-02-21T09:08:08.0945331Z // begin inline asm 2026-02-21T09:08:08.0946295Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779}, [%r831 + 192]; 2026-02-21T09:08:08.0947242Z // end inline asm 2026-02-21T09:08:08.0947547Z // begin inline asm 2026-02-21T09:08:08.0948391Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796}, [%r831 + 208]; 2026-02-21T09:08:08.0949339Z // end inline asm 2026-02-21T09:08:08.0949628Z // begin inline asm 2026-02-21T09:08:08.0950542Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813}, [%r831 + 224]; 2026-02-21T09:08:08.0951488Z // end inline asm 2026-02-21T09:08:08.0951782Z // begin inline asm 2026-02-21T09:08:08.0952644Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829, %r830}, [%r831 + 240]; 2026-02-21T09:08:08.0953576Z // end inline asm 2026-02-21T09:08:08.0953882Z // begin inline asm 2026-02-21T09:08:08.0954219Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:08:08.0954587Z // end inline asm 2026-02-21T09:08:08.0954979Z cvt.u64.u32 %rd118, %r560; 2026-02-21T09:08:08.0955328Z 
cvt.u64.u32 %rd119, %r561; 2026-02-21T09:08:08.0955693Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:08:08.0956046Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:08:08.0956701Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0957406Z mov.b64 {%r1183, %r1184}, %rd121; 2026-02-21T09:08:08.0957820Z cvt.rn.f16x2.f32 %r1185, %r1184, %r1183; 2026-02-21T09:08:08.0958503Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0959195Z cvt.u64.u32 %rd122, %r562; 2026-02-21T09:08:08.0959549Z cvt.u64.u32 %rd123, %r563; 2026-02-21T09:08:08.0959894Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:08:08.0960253Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:08:08.0960888Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0961585Z mov.b64 {%r1186, %r1187}, %rd125; 2026-02-21T09:08:08.0961981Z cvt.rn.f16x2.f32 %r1188, %r1187, %r1186; 2026-02-21T09:08:08.0962665Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0963351Z cvt.u64.u32 %rd126, %r564; 2026-02-21T09:08:08.0963696Z cvt.u64.u32 %rd127, %r565; 2026-02-21T09:08:08.0964046Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:08:08.0964390Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:08:08.0965088Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0965781Z mov.b64 {%r1189, %r1190}, %rd129; 2026-02-21T09:08:08.0966187Z cvt.rn.f16x2.f32 %r1191, %r1190, %r1189; 2026-02-21T09:08:08.0966871Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0967677Z cvt.u64.u32 %rd130, %r566; 2026-02-21T09:08:08.0968029Z cvt.u64.u32 %rd131, %r567; 2026-02-21T09:08:08.0968370Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:08:08.0968734Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:08:08.0969442Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 
2026-02-21T09:08:08.0970138Z mov.b64 {%r1192, %r1193}, %rd133; 2026-02-21T09:08:08.0970535Z cvt.rn.f16x2.f32 %r1194, %r1193, %r1192; 2026-02-21T09:08:08.0971219Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0971910Z cvt.u64.u32 %rd134, %r568; 2026-02-21T09:08:08.0972251Z cvt.u64.u32 %rd135, %r569; 2026-02-21T09:08:08.0972607Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:08:08.0972957Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:08:08.0973684Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0974378Z mov.b64 {%r1195, %r1196}, %rd137; 2026-02-21T09:08:08.0974841Z cvt.rn.f16x2.f32 %r1197, %r1196, %r1195; 2026-02-21T09:08:08.0975525Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0976226Z cvt.u64.u32 %rd138, %r570; 2026-02-21T09:08:08.0976583Z cvt.u64.u32 %rd139, %r571; 2026-02-21T09:08:08.0976925Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:08:08.0977353Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:08:08.0977985Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0978675Z mov.b64 {%r1198, %r1199}, %rd141; 2026-02-21T09:08:08.0979056Z cvt.rn.f16x2.f32 %r1200, %r1199, %r1198; 2026-02-21T09:08:08.0979732Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0980429Z cvt.u64.u32 %rd142, %r572; 2026-02-21T09:08:08.0980764Z cvt.u64.u32 %rd143, %r573; 2026-02-21T09:08:08.0981113Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:08:08.0981457Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:08:08.0982087Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0982771Z mov.b64 {%r1201, %r1202}, %rd145; 2026-02-21T09:08:08.0983166Z cvt.rn.f16x2.f32 %r1203, %r1202, %r1201; 2026-02-21T09:08:08.0983838Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0984528Z 
cvt.u64.u32 %rd146, %r574; 2026-02-21T09:08:08.0984939Z cvt.u64.u32 %rd147, %r575; 2026-02-21T09:08:08.0985282Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:08:08.0985640Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:08:08.0986269Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0986964Z mov.b64 {%r1204, %r1205}, %rd149; 2026-02-21T09:08:08.0987355Z cvt.rn.f16x2.f32 %r1206, %r1205, %r1204; 2026-02-21T09:08:08.0988042Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0988735Z cvt.u64.u32 %rd150, %r577; 2026-02-21T09:08:08.0989079Z cvt.u64.u32 %rd151, %r578; 2026-02-21T09:08:08.0989433Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:08:08.0989780Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:08:08.0990423Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0991110Z mov.b64 {%r1207, %r1208}, %rd153; 2026-02-21T09:08:08.0991509Z cvt.rn.f16x2.f32 %r1209, %r1208, %r1207; 2026-02-21T09:08:08.0992183Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0992871Z cvt.u64.u32 %rd154, %r579; 2026-02-21T09:08:08.0993224Z cvt.u64.u32 %rd155, %r580; 2026-02-21T09:08:08.0993565Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:08:08.0993920Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:08:08.0994660Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0995449Z mov.b64 {%r1210, %r1211}, %rd157; 2026-02-21T09:08:08.0995837Z cvt.rn.f16x2.f32 %r1212, %r1211, %r1210; 2026-02-21T09:08:08.0996595Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.0997276Z cvt.u64.u32 %rd158, %r581; 2026-02-21T09:08:08.0997619Z cvt.u64.u32 %rd159, %r582; 2026-02-21T09:08:08.0997968Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:08:08.0998313Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:08:08.0998943Z .loc 1 52 27 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.0999623Z mov.b64 {%r1213, %r1214}, %rd161; 2026-02-21T09:08:08.1000020Z cvt.rn.f16x2.f32 %r1215, %r1214, %r1213; 2026-02-21T09:08:08.1000754Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1001454Z cvt.u64.u32 %rd162, %r583; 2026-02-21T09:08:08.1001805Z cvt.u64.u32 %rd163, %r584; 2026-02-21T09:08:08.1002146Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:08:08.1002499Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:08:08.1003128Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1003817Z mov.b64 {%r1216, %r1217}, %rd165; 2026-02-21T09:08:08.1004285Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T09:08:08.1005216Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1006068Z cvt.u64.u32 %rd166, %r585; 2026-02-21T09:08:08.1006489Z cvt.u64.u32 %rd167, %r586; 2026-02-21T09:08:08.1006858Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:08:08.1007197Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:08:08.1007839Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1008535Z mov.b64 {%r1219, %r1220}, %rd169; 2026-02-21T09:08:08.1008929Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T09:08:08.1009609Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1010291Z cvt.u64.u32 %rd170, %r587; 2026-02-21T09:08:08.1010643Z cvt.u64.u32 %rd171, %r588; 2026-02-21T09:08:08.1010988Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:08:08.1011330Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:08:08.1011935Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1012598Z mov.b64 {%r1222, %r1223}, %rd173; 2026-02-21T09:08:08.1012964Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T09:08:08.1013621Z .loc 1 50 52 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1014278Z cvt.u64.u32 %rd174, %r589; 2026-02-21T09:08:08.1014606Z cvt.u64.u32 %rd175, %r590; 2026-02-21T09:08:08.1015027Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:08:08.1015362Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:08:08.1015983Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1016652Z mov.b64 {%r1225, %r1226}, %rd177; 2026-02-21T09:08:08.1017038Z cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T09:08:08.1017699Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1018364Z cvt.u64.u32 %rd178, %r591; 2026-02-21T09:08:08.1018702Z cvt.u64.u32 %rd179, %r592; 2026-02-21T09:08:08.1019032Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:08:08.1019379Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:08:08.1019989Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1020663Z mov.b64 {%r1228, %r1229}, %rd181; 2026-02-21T09:08:08.1021039Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T09:08:08.1021798Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1022463Z cvt.u64.u32 %rd182, %r594; 2026-02-21T09:08:08.1022582Z cvt.u64.u32 %rd183, %r595; 2026-02-21T09:08:08.1022805Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:08:08.1022926Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:08:08.1023328Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1023455Z mov.b64 {%r1231, %r1232}, %rd185; 2026-02-21T09:08:08.1023596Z cvt.rn.f16x2.f32 %r1233, %r1232, %r1231; 2026-02-21T09:08:08.1023990Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1024109Z cvt.u64.u32 %rd186, %r596; 2026-02-21T09:08:08.1024225Z cvt.u64.u32 %rd187, %r597; 2026-02-21T09:08:08.1024353Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:08:08.1024535Z or.b64 
%rd189, %rd186, %rd188; 2026-02-21T09:08:08.1024997Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1025121Z mov.b64 {%r1234, %r1235}, %rd189; 2026-02-21T09:08:08.1025268Z cvt.rn.f16x2.f32 %r1236, %r1235, %r1234; 2026-02-21T09:08:08.1025651Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1025769Z cvt.u64.u32 %rd190, %r598; 2026-02-21T09:08:08.1025898Z cvt.u64.u32 %rd191, %r599; 2026-02-21T09:08:08.1026078Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:08:08.1026198Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:08:08.1026590Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1026712Z mov.b64 {%r1237, %r1238}, %rd193; 2026-02-21T09:08:08.1026849Z cvt.rn.f16x2.f32 %r1239, %r1238, %r1237; 2026-02-21T09:08:08.1027228Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1027358Z cvt.u64.u32 %rd194, %r600; 2026-02-21T09:08:08.1027474Z cvt.u64.u32 %rd195, %r601; 2026-02-21T09:08:08.1027591Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:08:08.1027721Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:08:08.1028106Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1028228Z mov.b64 {%r1240, %r1241}, %rd197; 2026-02-21T09:08:08.1028372Z cvt.rn.f16x2.f32 %r1242, %r1241, %r1240; 2026-02-21T09:08:08.1028755Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1028874Z cvt.u64.u32 %rd198, %r602; 2026-02-21T09:08:08.1028992Z cvt.u64.u32 %rd199, %r603; 2026-02-21T09:08:08.1029119Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:08:08.1029237Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:08:08.1029625Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1029760Z mov.b64 {%r1243, %r1244}, %rd201; 2026-02-21T09:08:08.1029899Z cvt.rn.f16x2.f32 %r1245, %r1244, %r1243; 
2026-02-21T09:08:08.1030279Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1030410Z cvt.u64.u32 %rd202, %r604; 2026-02-21T09:08:08.1030529Z cvt.u64.u32 %rd203, %r605; 2026-02-21T09:08:08.1030648Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:08:08.1030767Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:08:08.1031157Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1031275Z mov.b64 {%r1246, %r1247}, %rd205; 2026-02-21T09:08:08.1031410Z cvt.rn.f16x2.f32 %r1248, %r1247, %r1246; 2026-02-21T09:08:08.1031801Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1031919Z cvt.u64.u32 %rd206, %r606; 2026-02-21T09:08:08.1032037Z cvt.u64.u32 %rd207, %r607; 2026-02-21T09:08:08.1032248Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:08:08.1032369Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:08:08.1032753Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1032873Z mov.b64 {%r1249, %r1250}, %rd209; 2026-02-21T09:08:08.1033076Z cvt.rn.f16x2.f32 %r1251, %r1250, %r1249; 2026-02-21T09:08:08.1033457Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1033576Z cvt.u64.u32 %rd210, %r608; 2026-02-21T09:08:08.1033705Z cvt.u64.u32 %rd211, %r609; 2026-02-21T09:08:08.1033823Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:08:08.1033941Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:08:08.1034331Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1034450Z mov.b64 {%r1252, %r1253}, %rd213; 2026-02-21T09:08:08.1034647Z cvt.rn.f16x2.f32 %r1254, %r1253, %r1252; 2026-02-21T09:08:08.1035120Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1035253Z cvt.u64.u32 %rd214, %r611; 2026-02-21T09:08:08.1035372Z cvt.u64.u32 %rd215, %r612; 2026-02-21T09:08:08.1035489Z shl.b64 %rd216, %rd215, 
32; 2026-02-21T09:08:08.1035623Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:08:08.1036005Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1036185Z mov.b64 {%r1255, %r1256}, %rd217; 2026-02-21T09:08:08.1036333Z cvt.rn.f16x2.f32 %r1257, %r1256, %r1255; 2026-02-21T09:08:08.1036718Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1036838Z cvt.u64.u32 %rd218, %r613; 2026-02-21T09:08:08.1036956Z cvt.u64.u32 %rd219, %r614; 2026-02-21T09:08:08.1037084Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:08:08.1037204Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:08:08.1037586Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1037712Z mov.b64 {%r1258, %r1259}, %rd221; 2026-02-21T09:08:08.1037844Z cvt.rn.f16x2.f32 %r1260, %r1259, %r1258; 2026-02-21T09:08:08.1038239Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1038362Z cvt.u64.u32 %rd222, %r615; 2026-02-21T09:08:08.1038477Z cvt.u64.u32 %rd223, %r616; 2026-02-21T09:08:08.1038591Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:08:08.1038703Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:08:08.1039106Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1039221Z mov.b64 {%r1261, %r1262}, %rd225; 2026-02-21T09:08:08.1039353Z cvt.rn.f16x2.f32 %r1263, %r1262, %r1261; 2026-02-21T09:08:08.1039758Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1039882Z cvt.u64.u32 %rd226, %r617; 2026-02-21T09:08:08.1039999Z cvt.u64.u32 %rd227, %r618; 2026-02-21T09:08:08.1040124Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:08:08.1040242Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:08:08.1040623Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1040740Z mov.b64 {%r1264, %r1265}, %rd229; 2026-02-21T09:08:08.1040887Z 
cvt.rn.f16x2.f32 %r1266, %r1265, %r1264; 2026-02-21T09:08:08.1041268Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1041388Z cvt.u64.u32 %rd230, %r619; 2026-02-21T09:08:08.1041517Z cvt.u64.u32 %rd231, %r620; 2026-02-21T09:08:08.1041634Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:08:08.1041752Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:08:08.1042151Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1042347Z mov.b64 {%r1267, %r1268}, %rd233; 2026-02-21T09:08:08.1042486Z cvt.rn.f16x2.f32 %r1269, %r1268, %r1267; 2026-02-21T09:08:08.1042887Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1043092Z cvt.u64.u32 %rd234, %r621; 2026-02-21T09:08:08.1043226Z cvt.u64.u32 %rd235, %r622; 2026-02-21T09:08:08.1043361Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:08:08.1043506Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:08:08.1043922Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1044058Z mov.b64 {%r1270, %r1271}, %rd237; 2026-02-21T09:08:08.1044220Z cvt.rn.f16x2.f32 %r1272, %r1271, %r1270; 2026-02-21T09:08:08.1044622Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1044823Z cvt.u64.u32 %rd238, %r623; 2026-02-21T09:08:08.1045020Z cvt.u64.u32 %rd239, %r624; 2026-02-21T09:08:08.1045162Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:08:08.1045298Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:08:08.1045732Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1045879Z mov.b64 {%r1273, %r1274}, %rd241; 2026-02-21T09:08:08.1046024Z cvt.rn.f16x2.f32 %r1275, %r1274, %r1273; 2026-02-21T09:08:08.1046439Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1046655Z cvt.u64.u32 %rd242, %r625; 2026-02-21T09:08:08.1046787Z cvt.u64.u32 %rd243, %r626; 
2026-02-21T09:08:08.1046905Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:08:08.1047024Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:08:08.1047424Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1047543Z mov.b64 {%r1276, %r1277}, %rd245; 2026-02-21T09:08:08.1047682Z cvt.rn.f16x2.f32 %r1278, %r1277, %r1276; 2026-02-21T09:08:08.1048075Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1048193Z cvt.u64.u32 %rd246, %r628; 2026-02-21T09:08:08.1048311Z cvt.u64.u32 %rd247, %r629; 2026-02-21T09:08:08.1048438Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:08:08.1048559Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:08:08.1048941Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1049065Z mov.b64 {%r1279, %r1280}, %rd249; 2026-02-21T09:08:08.1049213Z cvt.rn.f16x2.f32 %r1281, %r1280, %r1279; 2026-02-21T09:08:08.1049594Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1049712Z cvt.u64.u32 %rd250, %r630; 2026-02-21T09:08:08.1049840Z cvt.u64.u32 %rd251, %r631; 2026-02-21T09:08:08.1049958Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:08:08.1050078Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:08:08.1050475Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1050596Z mov.b64 {%r1282, %r1283}, %rd253; 2026-02-21T09:08:08.1050731Z cvt.rn.f16x2.f32 %r1284, %r1283, %r1282; 2026-02-21T09:08:08.1051108Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1051236Z cvt.u64.u32 %rd254, %r632; 2026-02-21T09:08:08.1051351Z cvt.u64.u32 %rd255, %r633; 2026-02-21T09:08:08.1051472Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:08:08.1051598Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:08:08.1051978Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1052097Z mov.b64 {%r1285, 
%r1286}, %rd257; 2026-02-21T09:08:08.1052241Z cvt.rn.f16x2.f32 %r1287, %r1286, %r1285; 2026-02-21T09:08:08.1052621Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1052813Z cvt.u64.u32 %rd258, %r634; 2026-02-21T09:08:08.1052933Z cvt.u64.u32 %rd259, %r635; 2026-02-21T09:08:08.1053062Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:08:08.1053182Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:08:08.1053562Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1053756Z mov.b64 {%r1288, %r1289}, %rd261; 2026-02-21T09:08:08.1053893Z cvt.rn.f16x2.f32 %r1290, %r1289, %r1288; 2026-02-21T09:08:08.1054275Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1054401Z cvt.u64.u32 %rd262, %r636; 2026-02-21T09:08:08.1054517Z cvt.u64.u32 %rd263, %r637; 2026-02-21T09:08:08.1054633Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:08:08.1054806Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:08:08.1055199Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1055376Z mov.b64 {%r1291, %r1292}, %rd265; 2026-02-21T09:08:08.1055519Z cvt.rn.f16x2.f32 %r1293, %r1292, %r1291; 2026-02-21T09:08:08.1055922Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1056042Z cvt.u64.u32 %rd266, %r638; 2026-02-21T09:08:08.1056162Z cvt.u64.u32 %rd267, %r639; 2026-02-21T09:08:08.1056286Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:08:08.1056403Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:08:08.1056844Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1056967Z mov.b64 {%r1294, %r1295}, %rd269; 2026-02-21T09:08:08.1057115Z cvt.rn.f16x2.f32 %r1296, %r1295, %r1294; 2026-02-21T09:08:08.1057495Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1057614Z cvt.u64.u32 %rd270, %r640; 
2026-02-21T09:08:08.1057743Z cvt.u64.u32 %rd271, %r641; 2026-02-21T09:08:08.1057865Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:08:08.1057988Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:08:08.1058379Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1058501Z mov.b64 {%r1297, %r1298}, %rd273; 2026-02-21T09:08:08.1058640Z cvt.rn.f16x2.f32 %r1299, %r1298, %r1297; 2026-02-21T09:08:08.1059021Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1059151Z cvt.u64.u32 %rd274, %r642; 2026-02-21T09:08:08.1059268Z cvt.u64.u32 %rd275, %r643; 2026-02-21T09:08:08.1059387Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:08:08.1059515Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:08:08.1059894Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1060013Z mov.b64 {%r1300, %r1301}, %rd277; 2026-02-21T09:08:08.1060162Z cvt.rn.f16x2.f32 %r1302, %r1301, %r1300; 2026-02-21T09:08:08.1060539Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1060660Z cvt.u64.u32 %rd278, %r645; 2026-02-21T09:08:08.1060781Z cvt.u64.u32 %rd279, %r646; 2026-02-21T09:08:08.1060910Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:08:08.1061028Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:08:08.1061409Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1061541Z mov.b64 {%r1303, %r1304}, %rd281; 2026-02-21T09:08:08.1061678Z cvt.rn.f16x2.f32 %r1305, %r1304, %r1303; 2026-02-21T09:08:08.1062054Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1062182Z cvt.u64.u32 %rd282, %r647; 2026-02-21T09:08:08.1062301Z cvt.u64.u32 %rd283, %r648; 2026-02-21T09:08:08.1062419Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:08:08.1062538Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:08:08.1062930Z .loc 1 52 27 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1063169Z mov.b64 {%r1306, %r1307}, %rd285; 2026-02-21T09:08:08.1063309Z cvt.rn.f16x2.f32 %r1308, %r1307, %r1306; 2026-02-21T09:08:08.1063714Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1063889Z cvt.u64.u32 %rd286, %r649; 2026-02-21T09:08:08.1064012Z cvt.u64.u32 %rd287, %r650; 2026-02-21T09:08:08.1064144Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:08:08.1064271Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:08:08.1064665Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1064883Z mov.b64 {%r1309, %r1310}, %rd289; 2026-02-21T09:08:08.1065035Z cvt.rn.f16x2.f32 %r1311, %r1310, %r1309; 2026-02-21T09:08:08.1065433Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1065616Z cvt.u64.u32 %rd290, %r651; 2026-02-21T09:08:08.1065757Z cvt.u64.u32 %rd291, %r652; 2026-02-21T09:08:08.1065899Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:08:08.1066050Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:08:08.1066550Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1066726Z mov.b64 {%r1312, %r1313}, %rd293; 2026-02-21T09:08:08.1066896Z cvt.rn.f16x2.f32 %r1314, %r1313, %r1312; 2026-02-21T09:08:08.1067451Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1067611Z cvt.u64.u32 %rd294, %r653; 2026-02-21T09:08:08.1067760Z cvt.u64.u32 %rd295, %r654; 2026-02-21T09:08:08.1067910Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:08:08.1068076Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:08:08.1068474Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1068603Z mov.b64 {%r1315, %r1316}, %rd297; 2026-02-21T09:08:08.1068756Z cvt.rn.f16x2.f32 %r1317, %r1316, %r1315; 2026-02-21T09:08:08.1069153Z .loc 1 50 52 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1069274Z cvt.u64.u32 %rd298, %r655; 2026-02-21T09:08:08.1069399Z cvt.u64.u32 %rd299, %r656; 2026-02-21T09:08:08.1069534Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:08:08.1069660Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:08:08.1070060Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1070195Z mov.b64 {%r1318, %r1319}, %rd301; 2026-02-21T09:08:08.1070336Z cvt.rn.f16x2.f32 %r1320, %r1319, %r1318; 2026-02-21T09:08:08.1070729Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1070859Z cvt.u64.u32 %rd302, %r657; 2026-02-21T09:08:08.1070980Z cvt.u64.u32 %rd303, %r658; 2026-02-21T09:08:08.1071104Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:08:08.1071228Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:08:08.1071635Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1071757Z mov.b64 {%r1321, %r1322}, %rd305; 2026-02-21T09:08:08.1071901Z cvt.rn.f16x2.f32 %r1323, %r1322, %r1321; 2026-02-21T09:08:08.1072301Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1072423Z cvt.u64.u32 %rd306, %r659; 2026-02-21T09:08:08.1072547Z cvt.u64.u32 %rd307, %r660; 2026-02-21T09:08:08.1072677Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:08:08.1072802Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:08:08.1073199Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1073323Z mov.b64 {%r1324, %r1325}, %rd309; 2026-02-21T09:08:08.1073472Z cvt.rn.f16x2.f32 %r1326, %r1325, %r1324; 2026-02-21T09:08:08.1073868Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1074069Z cvt.u64.u32 %rd310, %r662; 2026-02-21T09:08:08.1074201Z cvt.u64.u32 %rd311, %r663; 2026-02-21T09:08:08.1074321Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:08:08.1074498Z or.b64 
%rd313, %rd310, %rd312; 2026-02-21T09:08:08.1074975Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1075100Z mov.b64 {%r1327, %r1328}, %rd313; 2026-02-21T09:08:08.1075242Z cvt.rn.f16x2.f32 %r1329, %r1328, %r1327; 2026-02-21T09:08:08.1075638Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1075769Z cvt.u64.u32 %rd314, %r664; 2026-02-21T09:08:08.1075890Z cvt.u64.u32 %rd315, %r665; 2026-02-21T09:08:08.1076010Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:08:08.1076142Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:08:08.1076600Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1076728Z mov.b64 {%r1330, %r1331}, %rd317; 2026-02-21T09:08:08.1076879Z cvt.rn.f16x2.f32 %r1332, %r1331, %r1330; 2026-02-21T09:08:08.1077269Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1077393Z cvt.u64.u32 %rd318, %r666; 2026-02-21T09:08:08.1077513Z cvt.u64.u32 %rd319, %r667; 2026-02-21T09:08:08.1077644Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:08:08.1077829Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:08:08.1078224Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1078360Z mov.b64 {%r1333, %r1334}, %rd321; 2026-02-21T09:08:08.1078501Z cvt.rn.f16x2.f32 %r1335, %r1334, %r1333; 2026-02-21T09:08:08.1078893Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1079025Z cvt.u64.u32 %rd322, %r668; 2026-02-21T09:08:08.1079149Z cvt.u64.u32 %rd323, %r669; 2026-02-21T09:08:08.1079271Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:08:08.1079393Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:08:08.1079792Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1079914Z mov.b64 {%r1336, %r1337}, %rd325; 2026-02-21T09:08:08.1080054Z cvt.rn.f16x2.f32 %r1338, %r1337, %r1336; 
2026-02-21T09:08:08.1080460Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1080583Z cvt.u64.u32 %rd326, %r670; 2026-02-21T09:08:08.1080704Z cvt.u64.u32 %rd327, %r671; 2026-02-21T09:08:08.1080832Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:08:08.1080955Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:08:08.1081350Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1081476Z mov.b64 {%r1339, %r1340}, %rd329; 2026-02-21T09:08:08.1081630Z cvt.rn.f16x2.f32 %r1341, %r1340, %r1339; 2026-02-21T09:08:08.1082021Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1082143Z cvt.u64.u32 %rd330, %r672; 2026-02-21T09:08:08.1082275Z cvt.u64.u32 %rd331, %r673; 2026-02-21T09:08:08.1082396Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:08:08.1082518Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:08:08.1082921Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1083048Z mov.b64 {%r1342, %r1343}, %rd333; 2026-02-21T09:08:08.1083188Z cvt.rn.f16x2.f32 %r1344, %r1343, %r1342; 2026-02-21T09:08:08.1083585Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1083717Z cvt.u64.u32 %rd334, %r674; 2026-02-21T09:08:08.1083838Z cvt.u64.u32 %rd335, %r675; 2026-02-21T09:08:08.1083964Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:08:08.1084182Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:08:08.1084578Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1084791Z mov.b64 {%r1345, %r1346}, %rd337; 2026-02-21T09:08:08.1085015Z cvt.rn.f16x2.f32 %r1347, %r1346, %r1345; 2026-02-21T09:08:08.1085413Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1085536Z cvt.u64.u32 %rd338, %r676; 2026-02-21T09:08:08.1085660Z cvt.u64.u32 %rd339, %r677; 2026-02-21T09:08:08.1085791Z shl.b64 %rd340, %rd339, 
32; 2026-02-21T09:08:08.1085914Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:08:08.1086310Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1086444Z mov.b64 {%r1348, %r1349}, %rd341; 2026-02-21T09:08:08.1086584Z cvt.rn.f16x2.f32 %r1350, %r1349, %r1348; 2026-02-21T09:08:08.1087033Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1087167Z cvt.u64.u32 %rd342, %r679; 2026-02-21T09:08:08.1087287Z cvt.u64.u32 %rd343, %r680; 2026-02-21T09:08:08.1087407Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:08:08.1087531Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:08:08.1087930Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1088053Z mov.b64 {%r1351, %r1352}, %rd345; 2026-02-21T09:08:08.1088247Z cvt.rn.f16x2.f32 %r1353, %r1352, %r1351; 2026-02-21T09:08:08.1088652Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1088777Z cvt.u64.u32 %rd346, %r681; 2026-02-21T09:08:08.1088902Z cvt.u64.u32 %rd347, %r682; 2026-02-21T09:08:08.1089032Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:08:08.1089155Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:08:08.1089564Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1089689Z mov.b64 {%r1354, %r1355}, %rd349; 2026-02-21T09:08:08.1089840Z cvt.rn.f16x2.f32 %r1356, %r1355, %r1354; 2026-02-21T09:08:08.1090229Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1090353Z cvt.u64.u32 %rd350, %r683; 2026-02-21T09:08:08.1090483Z cvt.u64.u32 %rd351, %r684; 2026-02-21T09:08:08.1090601Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:08:08.1090725Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:08:08.1091131Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1091254Z mov.b64 {%r1357, %r1358}, %rd353; 2026-02-21T09:08:08.1091392Z 
cvt.rn.f16x2.f32 %r1359, %r1358, %r1357; 2026-02-21T09:08:08.1091783Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1091918Z cvt.u64.u32 %rd354, %r685; 2026-02-21T09:08:08.1092044Z cvt.u64.u32 %rd355, %r686; 2026-02-21T09:08:08.1092167Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:08:08.1092301Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:08:08.1092692Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1092819Z mov.b64 {%r1360, %r1361}, %rd357; 2026-02-21T09:08:08.1092970Z cvt.rn.f16x2.f32 %r1362, %r1361, %r1360; 2026-02-21T09:08:08.1093365Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1093485Z cvt.u64.u32 %rd358, %r687; 2026-02-21T09:08:08.1093606Z cvt.u64.u32 %rd359, %r688; 2026-02-21T09:08:08.1093737Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:08:08.1093860Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:08:08.1094255Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1094392Z mov.b64 {%r1363, %r1364}, %rd361; 2026-02-21T09:08:08.1094609Z cvt.rn.f16x2.f32 %r1365, %r1364, %r1363; 2026-02-21T09:08:08.1094988Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1095124Z cvt.u64.u32 %rd362, %r689; 2026-02-21T09:08:08.1095245Z cvt.u64.u32 %rd363, %r690; 2026-02-21T09:08:08.1095433Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:08:08.1095557Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:08:08.1095964Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1096089Z mov.b64 {%r1366, %r1367}, %rd365; 2026-02-21T09:08:08.1096232Z cvt.rn.f16x2.f32 %r1368, %r1367, %r1366; 2026-02-21T09:08:08.1096641Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1096765Z cvt.u64.u32 %rd366, %r691; 2026-02-21T09:08:08.1096887Z cvt.u64.u32 %rd367, %r692; 
2026-02-21T09:08:08.1097082Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:08:08.1097209Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:08:08.1097611Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1097736Z mov.b64 {%r1369, %r1370}, %rd369; 2026-02-21T09:08:08.1097887Z cvt.rn.f16x2.f32 %r1371, %r1370, %r1369; 2026-02-21T09:08:08.1098290Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1098413Z cvt.u64.u32 %rd370, %r693; 2026-02-21T09:08:08.1098633Z cvt.u64.u32 %rd371, %r694; 2026-02-21T09:08:08.1098759Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:08:08.1098883Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:08:08.1099294Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1099419Z mov.b64 {%r1372, %r1373}, %rd373; 2026-02-21T09:08:08.1099555Z cvt.rn.f16x2.f32 %r1374, %r1373, %r1372; 2026-02-21T09:08:08.1099958Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1100094Z cvt.u64.u32 %rd374, %r696; 2026-02-21T09:08:08.1100214Z cvt.u64.u32 %rd375, %r697; 2026-02-21T09:08:08.1100334Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:08:08.1100466Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:08:08.1100864Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1100987Z mov.b64 {%r1375, %r1376}, %rd377; 2026-02-21T09:08:08.1101139Z cvt.rn.f16x2.f32 %r1377, %r1376, %r1375; 2026-02-21T09:08:08.1101539Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1101663Z cvt.u64.u32 %rd378, %r698; 2026-02-21T09:08:08.1101783Z cvt.u64.u32 %rd379, %r699; 2026-02-21T09:08:08.1101913Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:08:08.1102037Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:08:08.1102443Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1102578Z mov.b64 {%r1378, 
%r1379}, %rd381; 2026-02-21T09:08:08.1102718Z cvt.rn.f16x2.f32 %r1380, %r1379, %r1378; 2026-02-21T09:08:08.1103118Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1103253Z cvt.u64.u32 %rd382, %r700; 2026-02-21T09:08:08.1103372Z cvt.u64.u32 %rd383, %r701; 2026-02-21T09:08:08.1103489Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:08:08.1103612Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:08:08.1104021Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1104147Z mov.b64 {%r1381, %r1382}, %rd385; 2026-02-21T09:08:08.1104287Z cvt.rn.f16x2.f32 %r1383, %r1382, %r1381; 2026-02-21T09:08:08.1104770Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1104898Z cvt.u64.u32 %rd386, %r702; 2026-02-21T09:08:08.1105103Z cvt.u64.u32 %rd387, %r703; 2026-02-21T09:08:08.1105234Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:08:08.1105358Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:08:08.1105754Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1105934Z mov.b64 {%r1384, %r1385}, %rd389; 2026-02-21T09:08:08.1106085Z cvt.rn.f16x2.f32 %r1386, %r1385, %r1384; 2026-02-21T09:08:08.1106482Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1106604Z cvt.u64.u32 %rd390, %r704; 2026-02-21T09:08:08.1106735Z cvt.u64.u32 %rd391, %r705; 2026-02-21T09:08:08.1106857Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:08:08.1106979Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:08:08.1107382Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1107507Z mov.b64 {%r1387, %r1388}, %rd393; 2026-02-21T09:08:08.1107729Z cvt.rn.f16x2.f32 %r1389, %r1388, %r1387; 2026-02-21T09:08:08.1108132Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1108266Z cvt.u64.u32 %rd394, %r706; 
2026-02-21T09:08:08.1108387Z cvt.u64.u32 %rd395, %r707; 2026-02-21T09:08:08.1108512Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:08:08.1108647Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:08:08.1109098Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1109222Z mov.b64 {%r1390, %r1391}, %rd397; 2026-02-21T09:08:08.1109375Z cvt.rn.f16x2.f32 %r1392, %r1391, %r1390; 2026-02-21T09:08:08.1109774Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1109894Z cvt.u64.u32 %rd398, %r708; 2026-02-21T09:08:08.1110018Z cvt.u64.u32 %rd399, %r709; 2026-02-21T09:08:08.1110150Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:08:08.1110272Z or.b64 %rd401, %rd398, %rd400; 2026-02-21T09:08:08.1110670Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1110803Z mov.b64 {%r1393, %r1394}, %rd401; 2026-02-21T09:08:08.1110941Z cvt.rn.f16x2.f32 %r1395, %r1394, %r1393; 2026-02-21T09:08:08.1111338Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1111467Z cvt.u64.u32 %rd402, %r710; 2026-02-21T09:08:08.1111589Z cvt.u64.u32 %rd403, %r711; 2026-02-21T09:08:08.1111711Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:08:08.1111832Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:08:08.1112233Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1112355Z mov.b64 {%r1396, %r1397}, %rd405; 2026-02-21T09:08:08.1112495Z cvt.rn.f16x2.f32 %r1398, %r1397, %r1396; 2026-02-21T09:08:08.1112901Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1113026Z cvt.u64.u32 %rd406, %r713; 2026-02-21T09:08:08.1113145Z cvt.u64.u32 %rd407, %r714; 2026-02-21T09:08:08.1113274Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:08:08.1113395Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:08:08.1113795Z .loc 1 52 27 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1113919Z mov.b64 {%r1399, %r1400}, %rd409; 2026-02-21T09:08:08.1114064Z cvt.rn.f16x2.f32 %r1401, %r1400, %r1399; 2026-02-21T09:08:08.1114461Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1114581Z cvt.u64.u32 %rd410, %r715; 2026-02-21T09:08:08.1114812Z cvt.u64.u32 %rd411, %r716; 2026-02-21T09:08:08.1114936Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:08:08.1115059Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:08:08.1115465Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1115666Z mov.b64 {%r1402, %r1403}, %rd413; 2026-02-21T09:08:08.1115807Z cvt.rn.f16x2.f32 %r1404, %r1403, %r1402; 2026-02-21T09:08:08.1116202Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1116388Z cvt.u64.u32 %rd414, %r717; 2026-02-21T09:08:08.1116510Z cvt.u64.u32 %rd415, %r718; 2026-02-21T09:08:08.1116631Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:08:08.1116764Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:08:08.1117160Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1117281Z mov.b64 {%r1405, %r1406}, %rd417; 2026-02-21T09:08:08.1117427Z cvt.rn.f16x2.f32 %r1407, %r1406, %r1405; 2026-02-21T09:08:08.1117819Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1117937Z cvt.u64.u32 %rd418, %r719; 2026-02-21T09:08:08.1118120Z cvt.u64.u32 %rd419, %r720; 2026-02-21T09:08:08.1118255Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:08:08.1118377Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:08:08.1118778Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1118913Z mov.b64 {%r1408, %r1409}, %rd421; 2026-02-21T09:08:08.1119054Z cvt.rn.f16x2.f32 %r1410, %r1409, %r1408; 2026-02-21T09:08:08.1119502Z .loc 1 50 52 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1119632Z cvt.u64.u32 %rd422, %r721; 2026-02-21T09:08:08.1119753Z cvt.u64.u32 %rd423, %r722; 2026-02-21T09:08:08.1119873Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:08:08.1119996Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:08:08.1120394Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1120518Z mov.b64 {%r1411, %r1412}, %rd425; 2026-02-21T09:08:08.1120660Z cvt.rn.f16x2.f32 %r1413, %r1412, %r1411; 2026-02-21T09:08:08.1121064Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1121188Z cvt.u64.u32 %rd426, %r723; 2026-02-21T09:08:08.1121311Z cvt.u64.u32 %rd427, %r724; 2026-02-21T09:08:08.1121444Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:08:08.1121570Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:08:08.1121962Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1122086Z mov.b64 {%r1414, %r1415}, %rd429; 2026-02-21T09:08:08.1122236Z cvt.rn.f16x2.f32 %r1416, %r1415, %r1414; 2026-02-21T09:08:08.1122628Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1122747Z cvt.u64.u32 %rd430, %r725; 2026-02-21T09:08:08.1122876Z cvt.u64.u32 %rd431, %r726; 2026-02-21T09:08:08.1122999Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:08:08.1123125Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:08:08.1123531Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1123653Z mov.b64 {%r1417, %r1418}, %rd433; 2026-02-21T09:08:08.1123795Z cvt.rn.f16x2.f32 %r1419, %r1418, %r1417; 2026-02-21T09:08:08.1124191Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1124323Z cvt.u64.u32 %rd434, %r727; 2026-02-21T09:08:08.1124445Z cvt.u64.u32 %rd435, %r728; 2026-02-21T09:08:08.1124570Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:08:08.1124758Z or.b64 
%rd437, %rd434, %rd436; 2026-02-21T09:08:08.1125161Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1125283Z mov.b64 {%r1420, %r1421}, %rd437; 2026-02-21T09:08:08.1125432Z cvt.rn.f16x2.f32 %r1422, %r1421, %r1420; 2026-02-21T09:08:08.1125831Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1126027Z cvt.u64.u32 %rd438, %r730; 2026-02-21T09:08:08.1126149Z cvt.u64.u32 %rd439, %r731; 2026-02-21T09:08:08.1126280Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:08:08.1126401Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:08:08.1126857Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1126990Z mov.b64 {%r1423, %r1424}, %rd441; 2026-02-21T09:08:08.1127131Z cvt.rn.f16x2.f32 %r1425, %r1424, %r1423; 2026-02-21T09:08:08.1127524Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1127653Z cvt.u64.u32 %rd442, %r732; 2026-02-21T09:08:08.1127773Z cvt.u64.u32 %rd443, %r733; 2026-02-21T09:08:08.1127893Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:08:08.1128017Z or.b64 %rd445, %rd442, %rd444; 2026-02-21T09:08:08.1128481Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1128615Z mov.b64 {%r1426, %r1427}, %rd445; 2026-02-21T09:08:08.1128757Z cvt.rn.f16x2.f32 %r1428, %r1427, %r1426; 2026-02-21T09:08:08.1129163Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1129294Z cvt.u64.u32 %rd446, %r734; 2026-02-21T09:08:08.1129416Z cvt.u64.u32 %rd447, %r735; 2026-02-21T09:08:08.1129544Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:08:08.1129664Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:08:08.1130118Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1130243Z mov.b64 {%r1429, %r1430}, %rd449; 2026-02-21T09:08:08.1130391Z cvt.rn.f16x2.f32 %r1431, %r1430, %r1429; 
2026-02-21T09:08:08.1130784Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1130906Z cvt.u64.u32 %rd450, %r736; 2026-02-21T09:08:08.1131037Z cvt.u64.u32 %rd451, %r737; 2026-02-21T09:08:08.1131162Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:08:08.1131285Z or.b64 %rd453, %rd450, %rd452; 2026-02-21T09:08:08.1131683Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1131812Z mov.b64 {%r1432, %r1433}, %rd453; 2026-02-21T09:08:08.1131952Z cvt.rn.f16x2.f32 %r1434, %r1433, %r1432; 2026-02-21T09:08:08.1132345Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1132477Z cvt.u64.u32 %rd454, %r738; 2026-02-21T09:08:08.1132596Z cvt.u64.u32 %rd455, %r739; 2026-02-21T09:08:08.1132717Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:08:08.1132849Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:08:08.1133243Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1133368Z mov.b64 {%r1435, %r1436}, %rd457; 2026-02-21T09:08:08.1133518Z cvt.rn.f16x2.f32 %r1437, %r1436, %r1435; 2026-02-21T09:08:08.1133914Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1134035Z cvt.u64.u32 %rd458, %r740; 2026-02-21T09:08:08.1134155Z cvt.u64.u32 %rd459, %r741; 2026-02-21T09:08:08.1134291Z shl.b64 %rd460, %rd459, 32; 2026-02-21T09:08:08.1134412Z or.b64 %rd461, %rd458, %rd460; 2026-02-21T09:08:08.1134880Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1135019Z mov.b64 {%r1438, %r1439}, %rd461; 2026-02-21T09:08:08.1135163Z cvt.rn.f16x2.f32 %r1440, %r1439, %r1438; 2026-02-21T09:08:08.1135558Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1135689Z cvt.u64.u32 %rd462, %r742; 2026-02-21T09:08:08.1135810Z cvt.u64.u32 %rd463, %r743; 2026-02-21T09:08:08.1135930Z shl.b64 %rd464, %rd463, 
32; 2026-02-21T09:08:08.1136057Z or.b64 %rd465, %rd462, %rd464; 2026-02-21T09:08:08.1136538Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1136661Z mov.b64 {%r1441, %r1442}, %rd465; 2026-02-21T09:08:08.1136800Z cvt.rn.f16x2.f32 %r1443, %r1442, %r1441; 2026-02-21T09:08:08.1137277Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1137398Z cvt.u64.u32 %rd466, %r744; 2026-02-21T09:08:08.1137519Z cvt.u64.u32 %rd467, %r745; 2026-02-21T09:08:08.1137650Z shl.b64 %rd468, %rd467, 32; 2026-02-21T09:08:08.1137773Z or.b64 %rd469, %rd466, %rd468; 2026-02-21T09:08:08.1138172Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1138297Z mov.b64 {%r1444, %r1445}, %rd469; 2026-02-21T09:08:08.1138446Z cvt.rn.f16x2.f32 %r1446, %r1445, %r1444; 2026-02-21T09:08:08.1138933Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1139059Z cvt.u64.u32 %rd470, %r747; 2026-02-21T09:08:08.1139194Z cvt.u64.u32 %rd471, %r748; 2026-02-21T09:08:08.1139314Z shl.b64 %rd472, %rd471, 32; 2026-02-21T09:08:08.1139434Z or.b64 %rd473, %rd470, %rd472; 2026-02-21T09:08:08.1139839Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1139962Z mov.b64 {%r1447, %r1448}, %rd473; 2026-02-21T09:08:08.1140101Z cvt.rn.f16x2.f32 %r1449, %r1448, %r1447; 2026-02-21T09:08:08.1140560Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1140692Z cvt.u64.u32 %rd474, %r749; 2026-02-21T09:08:08.1140811Z cvt.u64.u32 %rd475, %r750; 2026-02-21T09:08:08.1140927Z shl.b64 %rd476, %rd475, 32; 2026-02-21T09:08:08.1141059Z or.b64 %rd477, %rd474, %rd476; 2026-02-21T09:08:08.1141465Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1141592Z mov.b64 {%r1450, %r1451}, %rd477; 2026-02-21T09:08:08.1141742Z 
cvt.rn.f16x2.f32 %r1452, %r1451, %r1450; 2026-02-21T09:08:08.1142142Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1142265Z cvt.u64.u32 %rd478, %r751; 2026-02-21T09:08:08.1142385Z cvt.u64.u32 %rd479, %r752; 2026-02-21T09:08:08.1142516Z shl.b64 %rd480, %rd479, 32; 2026-02-21T09:08:08.1142639Z or.b64 %rd481, %rd478, %rd480; 2026-02-21T09:08:08.1143042Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1143176Z mov.b64 {%r1453, %r1454}, %rd481; 2026-02-21T09:08:08.1143315Z cvt.rn.f16x2.f32 %r1455, %r1454, %r1453; 2026-02-21T09:08:08.1143720Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1143852Z cvt.u64.u32 %rd482, %r753; 2026-02-21T09:08:08.1143977Z cvt.u64.u32 %rd483, %r754; 2026-02-21T09:08:08.1144104Z shl.b64 %rd484, %rd483, 32; 2026-02-21T09:08:08.1144228Z or.b64 %rd485, %rd482, %rd484; 2026-02-21T09:08:08.1144785Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1144946Z mov.b64 {%r1456, %r1457}, %rd485; 2026-02-21T09:08:08.1145123Z cvt.rn.f16x2.f32 %r1458, %r1457, %r1456; 2026-02-21T09:08:08.1145647Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1145810Z cvt.u64.u32 %rd486, %r755; 2026-02-21T09:08:08.1145959Z cvt.u64.u32 %rd487, %r756; 2026-02-21T09:08:08.1146128Z shl.b64 %rd488, %rd487, 32; 2026-02-21T09:08:08.1146288Z or.b64 %rd489, %rd486, %rd488; 2026-02-21T09:08:08.1146753Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1146878Z mov.b64 {%r1459, %r1460}, %rd489; 2026-02-21T09:08:08.1147034Z cvt.rn.f16x2.f32 %r1461, %r1460, %r1459; 2026-02-21T09:08:08.1147515Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1147640Z cvt.u64.u32 %rd490, %r757; 2026-02-21T09:08:08.1147774Z cvt.u64.u32 %rd491, %r758; 
2026-02-21T09:08:08.1147902Z shl.b64 %rd492, %rd491, 32; 2026-02-21T09:08:08.1148094Z or.b64 %rd493, %rd490, %rd492; 2026-02-21T09:08:08.1148503Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1148631Z mov.b64 {%r1462, %r1463}, %rd493; 2026-02-21T09:08:08.1148772Z cvt.rn.f16x2.f32 %r1464, %r1463, %r1462; 2026-02-21T09:08:08.1149168Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1149300Z cvt.u64.u32 %rd494, %r759; 2026-02-21T09:08:08.1149420Z cvt.u64.u32 %rd495, %r760; 2026-02-21T09:08:08.1149543Z shl.b64 %rd496, %rd495, 32; 2026-02-21T09:08:08.1149741Z or.b64 %rd497, %rd494, %rd496; 2026-02-21T09:08:08.1150144Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1150277Z mov.b64 {%r1465, %r1466}, %rd497; 2026-02-21T09:08:08.1150434Z cvt.rn.f16x2.f32 %r1467, %r1466, %r1465; 2026-02-21T09:08:08.1150826Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1150952Z cvt.u64.u32 %rd498, %r761; 2026-02-21T09:08:08.1151069Z cvt.u64.u32 %rd499, %r762; 2026-02-21T09:08:08.1151266Z shl.b64 %rd500, %rd499, 32; 2026-02-21T09:08:08.1151392Z or.b64 %rd501, %rd498, %rd500; 2026-02-21T09:08:08.1151789Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1151924Z mov.b64 {%r1468, %r1469}, %rd501; 2026-02-21T09:08:08.1152062Z cvt.rn.f16x2.f32 %r1470, %r1469, %r1468; 2026-02-21T09:08:08.1152461Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1152594Z cvt.u64.u32 %rd502, %r764; 2026-02-21T09:08:08.1152715Z cvt.u64.u32 %rd503, %r765; 2026-02-21T09:08:08.1152837Z shl.b64 %rd504, %rd503, 32; 2026-02-21T09:08:08.1152953Z or.b64 %rd505, %rd502, %rd504; 2026-02-21T09:08:08.1153365Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1153490Z mov.b64 {%r1471, 
%r1472}, %rd505; 2026-02-21T09:08:08.1153630Z cvt.rn.f16x2.f32 %r1473, %r1472, %r1471; 2026-02-21T09:08:08.1154033Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1154155Z cvt.u64.u32 %rd506, %r766; 2026-02-21T09:08:08.1154278Z cvt.u64.u32 %rd507, %r767; 2026-02-21T09:08:08.1154411Z shl.b64 %rd508, %rd507, 32; 2026-02-21T09:08:08.1154537Z or.b64 %rd509, %rd506, %rd508; 2026-02-21T09:08:08.1155024Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1155150Z mov.b64 {%r1474, %r1475}, %rd509; 2026-02-21T09:08:08.1155306Z cvt.rn.f16x2.f32 %r1476, %r1475, %r1474; 2026-02-21T09:08:08.1155699Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1155822Z cvt.u64.u32 %rd510, %r768; 2026-02-21T09:08:08.1155955Z cvt.u64.u32 %rd511, %r769; 2026-02-21T09:08:08.1156077Z shl.b64 %rd512, %rd511, 32; 2026-02-21T09:08:08.1156202Z or.b64 %rd513, %rd510, %rd512; 2026-02-21T09:08:08.1156612Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1156736Z mov.b64 {%r1477, %r1478}, %rd513; 2026-02-21T09:08:08.1156871Z cvt.rn.f16x2.f32 %r1479, %r1478, %r1477; 2026-02-21T09:08:08.1157266Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1157399Z cvt.u64.u32 %rd514, %r770; 2026-02-21T09:08:08.1157525Z cvt.u64.u32 %rd515, %r771; 2026-02-21T09:08:08.1157649Z shl.b64 %rd516, %rd515, 32; 2026-02-21T09:08:08.1157889Z or.b64 %rd517, %rd514, %rd516; 2026-02-21T09:08:08.1158293Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1158416Z mov.b64 {%r1480, %r1481}, %rd517; 2026-02-21T09:08:08.1158621Z cvt.rn.f16x2.f32 %r1482, %r1481, %r1480; 2026-02-21T09:08:08.1159015Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1159139Z cvt.u64.u32 %rd518, %r772; 
2026-02-21T09:08:08.1159260Z cvt.u64.u32 %rd519, %r773; 2026-02-21T09:08:08.1159390Z shl.b64 %rd520, %rd519, 32; 2026-02-21T09:08:08.1159513Z or.b64 %rd521, %rd518, %rd520; 2026-02-21T09:08:08.1159903Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1160036Z mov.b64 {%r1483, %r1484}, %rd521; 2026-02-21T09:08:08.1160237Z cvt.rn.f16x2.f32 %r1485, %r1484, %r1483; 2026-02-21T09:08:08.1160641Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1160778Z cvt.u64.u32 %rd522, %r774; 2026-02-21T09:08:08.1160899Z cvt.u64.u32 %rd523, %r775; 2026-02-21T09:08:08.1161021Z shl.b64 %rd524, %rd523, 32; 2026-02-21T09:08:08.1161146Z or.b64 %rd525, %rd522, %rd524; 2026-02-21T09:08:08.1161553Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1161745Z mov.b64 {%r1486, %r1487}, %rd525; 2026-02-21T09:08:08.1161888Z cvt.rn.f16x2.f32 %r1488, %r1487, %r1486; 2026-02-21T09:08:08.1162295Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1162419Z cvt.u64.u32 %rd526, %r776; 2026-02-21T09:08:08.1162539Z cvt.u64.u32 %rd527, %r777; 2026-02-21T09:08:08.1162673Z shl.b64 %rd528, %rd527, 32; 2026-02-21T09:08:08.1162798Z or.b64 %rd529, %rd526, %rd528; 2026-02-21T09:08:08.1163197Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1163325Z mov.b64 {%r1489, %r1490}, %rd529; 2026-02-21T09:08:08.1163474Z cvt.rn.f16x2.f32 %r1491, %r1490, %r1489; 2026-02-21T09:08:08.1163868Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1163994Z cvt.u64.u32 %rd530, %r778; 2026-02-21T09:08:08.1164126Z cvt.u64.u32 %rd531, %r779; 2026-02-21T09:08:08.1164250Z shl.b64 %rd532, %rd531, 32; 2026-02-21T09:08:08.1164373Z or.b64 %rd533, %rd530, %rd532; 2026-02-21T09:08:08.1164846Z .loc 1 52 27 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1164973Z mov.b64 {%r1492, %r1493}, %rd533; 2026-02-21T09:08:08.1165112Z cvt.rn.f16x2.f32 %r1494, %r1493, %r1492; 2026-02-21T09:08:08.1165508Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1165641Z cvt.u64.u32 %rd534, %r781; 2026-02-21T09:08:08.1165765Z cvt.u64.u32 %rd535, %r782; 2026-02-21T09:08:08.1165890Z shl.b64 %rd536, %rd535, 32; 2026-02-21T09:08:08.1166024Z or.b64 %rd537, %rd534, %rd536; 2026-02-21T09:08:08.1166417Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1166547Z mov.b64 {%r1495, %r1496}, %rd537; 2026-02-21T09:08:08.1166697Z cvt.rn.f16x2.f32 %r1497, %r1496, %r1495; 2026-02-21T09:08:08.1167091Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1167214Z cvt.u64.u32 %rd538, %r783; 2026-02-21T09:08:08.1167336Z cvt.u64.u32 %rd539, %r784; 2026-02-21T09:08:08.1167467Z shl.b64 %rd540, %rd539, 32; 2026-02-21T09:08:08.1167590Z or.b64 %rd541, %rd538, %rd540; 2026-02-21T09:08:08.1167982Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1168120Z mov.b64 {%r1498, %r1499}, %rd541; 2026-02-21T09:08:08.1168341Z cvt.rn.f16x2.f32 %r1500, %r1499, %r1498; 2026-02-21T09:08:08.1168734Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1168869Z cvt.u64.u32 %rd542, %r785; 2026-02-21T09:08:08.1169050Z cvt.u64.u32 %rd543, %r786; 2026-02-21T09:08:08.1169171Z shl.b64 %rd544, %rd543, 32; 2026-02-21T09:08:08.1169295Z or.b64 %rd545, %rd542, %rd544; 2026-02-21T09:08:08.1169725Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1169848Z mov.b64 {%r1501, %r1502}, %rd545; 2026-02-21T09:08:08.1169992Z cvt.rn.f16x2.f32 %r1503, %r1502, %r1501; 2026-02-21T09:08:08.1170392Z .loc 1 50 52 // 
cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1170514Z cvt.u64.u32 %rd546, %r787; 2026-02-21T09:08:08.1170634Z cvt.u64.u32 %rd547, %r788; 2026-02-21T09:08:08.1170827Z shl.b64 %rd548, %rd547, 32; 2026-02-21T09:08:08.1170956Z or.b64 %rd549, %rd546, %rd548; 2026-02-21T09:08:08.1171348Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1171472Z mov.b64 {%r1504, %r1505}, %rd549; 2026-02-21T09:08:08.1171623Z cvt.rn.f16x2.f32 %r1506, %r1505, %r1504; 2026-02-21T09:08:08.1172019Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1172199Z cvt.u64.u32 %rd550, %r789; 2026-02-21T09:08:08.1172332Z cvt.u64.u32 %rd551, %r790; 2026-02-21T09:08:08.1172455Z shl.b64 %rd552, %rd551, 32; 2026-02-21T09:08:08.1172580Z or.b64 %rd553, %rd550, %rd552; 2026-02-21T09:08:08.1172988Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1173113Z mov.b64 {%r1507, %r1508}, %rd553; 2026-02-21T09:08:08.1173253Z cvt.rn.f16x2.f32 %r1509, %r1508, %r1507; 2026-02-21T09:08:08.1173649Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1173782Z cvt.u64.u32 %rd554, %r791; 2026-02-21T09:08:08.1173902Z cvt.u64.u32 %rd555, %r792; 2026-02-21T09:08:08.1174018Z shl.b64 %rd556, %rd555, 32; 2026-02-21T09:08:08.1174154Z or.b64 %rd557, %rd554, %rd556; 2026-02-21T09:08:08.1174552Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1174757Z mov.b64 {%r1510, %r1511}, %rd557; 2026-02-21T09:08:08.1174914Z cvt.rn.f16x2.f32 %r1512, %r1511, %r1510; 2026-02-21T09:08:08.1175311Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1175431Z cvt.u64.u32 %rd558, %r793; 2026-02-21T09:08:08.1175550Z cvt.u64.u32 %rd559, %r794; 2026-02-21T09:08:08.1175677Z shl.b64 %rd560, %rd559, 32; 2026-02-21T09:08:08.1175799Z or.b64 
%rd561, %rd558, %rd560; 2026-02-21T09:08:08.1176196Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1176331Z mov.b64 {%r1513, %r1514}, %rd561; 2026-02-21T09:08:08.1176471Z cvt.rn.f16x2.f32 %r1515, %r1514, %r1513; 2026-02-21T09:08:08.1176865Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1176999Z cvt.u64.u32 %rd562, %r795; 2026-02-21T09:08:08.1177118Z cvt.u64.u32 %rd563, %r796; 2026-02-21T09:08:08.1177241Z shl.b64 %rd564, %rd563, 32; 2026-02-21T09:08:08.1177367Z or.b64 %rd565, %rd562, %rd564; 2026-02-21T09:08:08.1177775Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1177897Z mov.b64 {%r1516, %r1517}, %rd565; 2026-02-21T09:08:08.1178041Z cvt.rn.f16x2.f32 %r1518, %r1517, %r1516; 2026-02-21T09:08:08.1178447Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1178573Z cvt.u64.u32 %rd566, %r798; 2026-02-21T09:08:08.1178767Z cvt.u64.u32 %rd567, %r799; 2026-02-21T09:08:08.1178897Z shl.b64 %rd568, %rd567, 32; 2026-02-21T09:08:08.1179017Z or.b64 %rd569, %rd566, %rd568; 2026-02-21T09:08:08.1179417Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1179636Z mov.b64 {%r1519, %r1520}, %rd569; 2026-02-21T09:08:08.1179785Z cvt.rn.f16x2.f32 %r1521, %r1520, %r1519; 2026-02-21T09:08:08.1180189Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1180312Z cvt.u64.u32 %rd570, %r800; 2026-02-21T09:08:08.1180444Z cvt.u64.u32 %rd571, %r801; 2026-02-21T09:08:08.1180566Z shl.b64 %rd572, %rd571, 32; 2026-02-21T09:08:08.1180685Z or.b64 %rd573, %rd570, %rd572; 2026-02-21T09:08:08.1181096Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1181280Z mov.b64 {%r1522, %r1523}, %rd573; 2026-02-21T09:08:08.1181426Z cvt.rn.f16x2.f32 %r1524, %r1523, %r1522; 
2026-02-21T09:08:08.1181818Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1181953Z cvt.u64.u32 %rd574, %r802; 2026-02-21T09:08:08.1182076Z cvt.u64.u32 %rd575, %r803; 2026-02-21T09:08:08.1182199Z shl.b64 %rd576, %rd575, 32; 2026-02-21T09:08:08.1182332Z or.b64 %rd577, %rd574, %rd576; 2026-02-21T09:08:08.1182788Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1182914Z mov.b64 {%r1525, %r1526}, %rd577; 2026-02-21T09:08:08.1183066Z cvt.rn.f16x2.f32 %r1527, %r1526, %r1525; 2026-02-21T09:08:08.1183468Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1183589Z cvt.u64.u32 %rd578, %r804; 2026-02-21T09:08:08.1183710Z cvt.u64.u32 %rd579, %r805; 2026-02-21T09:08:08.1183845Z shl.b64 %rd580, %rd579, 32; 2026-02-21T09:08:08.1183969Z or.b64 %rd581, %rd578, %rd580; 2026-02-21T09:08:08.1184367Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1184497Z mov.b64 {%r1528, %r1529}, %rd581; 2026-02-21T09:08:08.1184643Z cvt.rn.f16x2.f32 %r1530, %r1529, %r1528; 2026-02-21T09:08:08.1185109Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1185249Z cvt.u64.u32 %rd582, %r806; 2026-02-21T09:08:08.1185379Z cvt.u64.u32 %rd583, %r807; 2026-02-21T09:08:08.1185505Z shl.b64 %rd584, %rd583, 32; 2026-02-21T09:08:08.1185634Z or.b64 %rd585, %rd582, %rd584; 2026-02-21T09:08:08.1186049Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1186177Z mov.b64 {%r1531, %r1532}, %rd585; 2026-02-21T09:08:08.1186322Z cvt.rn.f16x2.f32 %r1533, %r1532, %r1531; 2026-02-21T09:08:08.1186737Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1186867Z cvt.u64.u32 %rd586, %r808; 2026-02-21T09:08:08.1186995Z cvt.u64.u32 %rd587, %r809; 2026-02-21T09:08:08.1187131Z shl.b64 %rd588, %rd587, 
32; 2026-02-21T09:08:08.1187258Z or.b64 %rd589, %rd586, %rd588; 2026-02-21T09:08:08.1187660Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1187791Z mov.b64 {%r1534, %r1535}, %rd589; 2026-02-21T09:08:08.1187950Z cvt.rn.f16x2.f32 %r1536, %r1535, %r1534; 2026-02-21T09:08:08.1188351Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1188478Z cvt.u64.u32 %rd590, %r810; 2026-02-21T09:08:08.1188612Z cvt.u64.u32 %rd591, %r811; 2026-02-21T09:08:08.1188739Z shl.b64 %rd592, %rd591, 32; 2026-02-21T09:08:08.1188867Z or.b64 %rd593, %rd590, %rd592; 2026-02-21T09:08:08.1189280Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1189491Z mov.b64 {%r1537, %r1538}, %rd593; 2026-02-21T09:08:08.1189631Z cvt.rn.f16x2.f32 %r1539, %r1538, %r1537; 2026-02-21T09:08:08.1190026Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1190218Z cvt.u64.u32 %rd594, %r812; 2026-02-21T09:08:08.1190341Z cvt.u64.u32 %rd595, %r813; 2026-02-21T09:08:08.1190464Z shl.b64 %rd596, %rd595, 32; 2026-02-21T09:08:08.1190609Z or.b64 %rd597, %rd594, %rd596; 2026-02-21T09:08:08.1191009Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1191130Z mov.b64 {%r1540, %r1541}, %rd597; 2026-02-21T09:08:08.1191279Z cvt.rn.f16x2.f32 %r1542, %r1541, %r1540; 2026-02-21T09:08:08.1191676Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1191854Z cvt.u64.u32 %rd598, %r815; 2026-02-21T09:08:08.1191982Z cvt.u64.u32 %rd599, %r816; 2026-02-21T09:08:08.1192112Z shl.b64 %rd600, %rd599, 32; 2026-02-21T09:08:08.1192237Z or.b64 %rd601, %rd598, %rd600; 2026-02-21T09:08:08.1192632Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1192770Z mov.b64 {%r1543, %r1544}, %rd601; 2026-02-21T09:08:08.1192911Z 
cvt.rn.f16x2.f32 %r1545, %r1544, %r1543; 2026-02-21T09:08:08.1193369Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1193503Z cvt.u64.u32 %rd602, %r817; 2026-02-21T09:08:08.1193627Z cvt.u64.u32 %rd603, %r818; 2026-02-21T09:08:08.1193749Z shl.b64 %rd604, %rd603, 32; 2026-02-21T09:08:08.1193870Z or.b64 %rd605, %rd602, %rd604; 2026-02-21T09:08:08.1194272Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1194398Z mov.b64 {%r1546, %r1547}, %rd605; 2026-02-21T09:08:08.1194540Z cvt.rn.f16x2.f32 %r1548, %r1547, %r1546; 2026-02-21T09:08:08.1195063Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1195188Z cvt.u64.u32 %rd606, %r819; 2026-02-21T09:08:08.1195313Z cvt.u64.u32 %rd607, %r820; 2026-02-21T09:08:08.1195444Z shl.b64 %rd608, %rd607, 32; 2026-02-21T09:08:08.1195566Z or.b64 %rd609, %rd606, %rd608; 2026-02-21T09:08:08.1195967Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1196089Z mov.b64 {%r1549, %r1550}, %rd609; 2026-02-21T09:08:08.1196239Z cvt.rn.f16x2.f32 %r1551, %r1550, %r1549; 2026-02-21T09:08:08.1196637Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1196758Z cvt.u64.u32 %rd610, %r821; 2026-02-21T09:08:08.1196890Z cvt.u64.u32 %rd611, %r822; 2026-02-21T09:08:08.1197011Z shl.b64 %rd612, %rd611, 32; 2026-02-21T09:08:08.1197138Z or.b64 %rd613, %rd610, %rd612; 2026-02-21T09:08:08.1197543Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1197665Z mov.b64 {%r1552, %r1553}, %rd613; 2026-02-21T09:08:08.1197806Z cvt.rn.f16x2.f32 %r1554, %r1553, %r1552; 2026-02-21T09:08:08.1198202Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1198334Z cvt.u64.u32 %rd614, %r823; 2026-02-21T09:08:08.1198458Z cvt.u64.u32 %rd615, %r824; 
2026-02-21T09:08:08.1198580Z shl.b64 %rd616, %rd615, 32; 2026-02-21T09:08:08.1198712Z or.b64 %rd617, %rd614, %rd616; 2026-02-21T09:08:08.1199115Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1199239Z mov.b64 {%r1555, %r1556}, %rd617; 2026-02-21T09:08:08.1199387Z cvt.rn.f16x2.f32 %r1557, %r1556, %r1555; 2026-02-21T09:08:08.1199792Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1199992Z cvt.u64.u32 %rd618, %r825; 2026-02-21T09:08:08.1200112Z cvt.u64.u32 %rd619, %r826; 2026-02-21T09:08:08.1200245Z shl.b64 %rd620, %rd619, 32; 2026-02-21T09:08:08.1200369Z or.b64 %rd621, %rd618, %rd620; 2026-02-21T09:08:08.1200817Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1200955Z mov.b64 {%r1558, %r1559}, %rd621; 2026-02-21T09:08:08.1201099Z cvt.rn.f16x2.f32 %r1560, %r1559, %r1558; 2026-02-21T09:08:08.1201494Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1201625Z cvt.u64.u32 %rd622, %r827; 2026-02-21T09:08:08.1201746Z cvt.u64.u32 %rd623, %r828; 2026-02-21T09:08:08.1201869Z shl.b64 %rd624, %rd623, 32; 2026-02-21T09:08:08.1201991Z or.b64 %rd625, %rd622, %rd624; 2026-02-21T09:08:08.1202456Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1202585Z mov.b64 {%r1561, %r1562}, %rd625; 2026-02-21T09:08:08.1202728Z cvt.rn.f16x2.f32 %r1563, %r1562, %r1561; 2026-02-21T09:08:08.1203134Z .loc 1 50 52 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:50:52 2026-02-21T09:08:08.1203262Z cvt.u64.u32 %rd626, %r829; 2026-02-21T09:08:08.1203384Z cvt.u64.u32 %rd627, %r830; 2026-02-21T09:08:08.1203517Z shl.b64 %rd628, %rd627, 32; 2026-02-21T09:08:08.1203696Z or.b64 %rd629, %rd626, %rd628; 2026-02-21T09:08:08.1204098Z .loc 1 52 27 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:52:27 2026-02-21T09:08:08.1204225Z mov.b64 {%r1564, 
%r1565}, %rd629; 2026-02-21T09:08:08.1204376Z cvt.rn.f16x2.f32 %r1566, %r1565, %r1564; 2026-02-21T09:08:08.1204868Z .loc 1 53 82 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:53:82 2026-02-21T09:08:08.1205094Z st.shared.v4.b32 [%r4], {%r1185, %r1197, %r1209, %r1221}; 2026-02-21T09:08:08.1205315Z st.shared.v4.b32 [%r5], {%r1233, %r1245, %r1257, %r1269}; 2026-02-21T09:08:08.1205515Z st.shared.v4.b32 [%r6], {%r1281, %r1293, %r1305, %r1317}; 2026-02-21T09:08:08.1205713Z st.shared.v4.b32 [%r7], {%r1329, %r1341, %r1353, %r1365}; 2026-02-21T09:08:08.1205923Z st.shared.v4.b32 [%r8], {%r1377, %r1389, %r1401, %r1413}; 2026-02-21T09:08:08.1206119Z st.shared.v4.b32 [%r9], {%r1425, %r1437, %r1449, %r1461}; 2026-02-21T09:08:08.1206332Z st.shared.v4.b32 [%r10], {%r1473, %r1485, %r1497, %r1509}; 2026-02-21T09:08:08.1206544Z st.shared.v4.b32 [%r11], {%r1521, %r1533, %r1545, %r1557}; 2026-02-21T09:08:08.1206662Z bar.sync 0; 2026-02-21T09:08:08.1206786Z // begin inline asm 2026-02-21T09:08:08.1207146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r992, %r996, %r1000, %r1004}, [%r836]; 2026-02-21T09:08:08.1207275Z // end inline asm 2026-02-21T09:08:08.1207398Z // begin inline asm 2026-02-21T09:08:08.1207770Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1008, %r1012, %r1016, %r1020}, [%r841]; 2026-02-21T09:08:08.1207896Z // end inline asm 2026-02-21T09:08:08.1208015Z // begin inline asm 2026-02-21T09:08:08.1208369Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1028, %r1032, %r1036}, [%r846]; 2026-02-21T09:08:08.1208482Z // end inline asm 2026-02-21T09:08:08.1208614Z // begin inline asm 2026-02-21T09:08:08.1208965Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1040, %r1044, %r1048, %r1052}, [%r851]; 2026-02-21T09:08:08.1209077Z // end inline asm 2026-02-21T09:08:08.1209208Z // begin inline asm 2026-02-21T09:08:08.1209555Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1056, %r1060, %r1064, %r1068}, [%r856]; 2026-02-21T09:08:08.1209668Z // end inline asm 
2026-02-21T09:08:08.1209795Z // begin inline asm 2026-02-21T09:08:08.1210151Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1072, %r1076, %r1080, %r1084}, [%r861]; 2026-02-21T09:08:08.1210263Z // end inline asm 2026-02-21T09:08:08.1210382Z // begin inline asm 2026-02-21T09:08:08.1210747Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1088, %r1092, %r1096, %r1100}, [%r866]; 2026-02-21T09:08:08.1210944Z // end inline asm 2026-02-21T09:08:08.1211064Z // begin inline asm 2026-02-21T09:08:08.1211429Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1104, %r1108, %r1112, %r1116}, [%r871]; 2026-02-21T09:08:08.1211609Z // end inline asm 2026-02-21T09:08:08.1211722Z bar.sync 0; 2026-02-21T09:08:08.1211931Z st.shared.v4.b32 [%r4], {%r1188, %r1200, %r1212, %r1224}; 2026-02-21T09:08:08.1212146Z st.shared.v4.b32 [%r5], {%r1236, %r1248, %r1260, %r1272}; 2026-02-21T09:08:08.1212346Z st.shared.v4.b32 [%r6], {%r1284, %r1296, %r1308, %r1320}; 2026-02-21T09:08:08.1212544Z st.shared.v4.b32 [%r7], {%r1332, %r1344, %r1356, %r1368}; 2026-02-21T09:08:08.1212748Z st.shared.v4.b32 [%r8], {%r1380, %r1392, %r1404, %r1416}; 2026-02-21T09:08:08.1212941Z st.shared.v4.b32 [%r9], {%r1428, %r1440, %r1452, %r1464}; 2026-02-21T09:08:08.1213212Z st.shared.v4.b32 [%r10], {%r1476, %r1488, %r1500, %r1512}; 2026-02-21T09:08:08.1213429Z st.shared.v4.b32 [%r11], {%r1524, %r1536, %r1548, %r1560}; 2026-02-21T09:08:08.1213546Z bar.sync 0; 2026-02-21T09:08:08.1213666Z // begin inline asm 2026-02-21T09:08:08.1214028Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r993, %r997, %r1001, %r1005}, [%r836]; 2026-02-21T09:08:08.1214145Z // end inline asm 2026-02-21T09:08:08.1214263Z // begin inline asm 2026-02-21T09:08:08.1214620Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1009, %r1013, %r1017, %r1021}, [%r841]; 2026-02-21T09:08:08.1214797Z // end inline asm 2026-02-21T09:08:08.1214991Z // begin inline asm 2026-02-21T09:08:08.1215349Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r846]; 
2026-02-21T09:08:08.1215472Z // end inline asm 2026-02-21T09:08:08.1215591Z // begin inline asm 2026-02-21T09:08:08.1215941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r851]; 2026-02-21T09:08:08.1216057Z // end inline asm 2026-02-21T09:08:08.1216188Z // begin inline asm 2026-02-21T09:08:08.1216546Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r856]; 2026-02-21T09:08:08.1216662Z // end inline asm 2026-02-21T09:08:08.1216790Z // begin inline asm 2026-02-21T09:08:08.1217141Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r861]; 2026-02-21T09:08:08.1217258Z // end inline asm 2026-02-21T09:08:08.1217382Z // begin inline asm 2026-02-21T09:08:08.1217735Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r866]; 2026-02-21T09:08:08.1217845Z // end inline asm 2026-02-21T09:08:08.1217963Z // begin inline asm 2026-02-21T09:08:08.1218326Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1105, %r1109, %r1113, %r1117}, [%r871]; 2026-02-21T09:08:08.1218439Z // end inline asm 2026-02-21T09:08:08.1218550Z bar.sync 0; 2026-02-21T09:08:08.1218761Z st.shared.v4.b32 [%r4], {%r1191, %r1203, %r1215, %r1227}; 2026-02-21T09:08:08.1218964Z st.shared.v4.b32 [%r5], {%r1239, %r1251, %r1263, %r1275}; 2026-02-21T09:08:08.1219160Z st.shared.v4.b32 [%r6], {%r1287, %r1299, %r1311, %r1323}; 2026-02-21T09:08:08.1219357Z st.shared.v4.b32 [%r7], {%r1335, %r1347, %r1359, %r1371}; 2026-02-21T09:08:08.1219559Z st.shared.v4.b32 [%r8], {%r1383, %r1395, %r1407, %r1419}; 2026-02-21T09:08:08.1219752Z st.shared.v4.b32 [%r9], {%r1431, %r1443, %r1455, %r1467}; 2026-02-21T09:08:08.1219959Z st.shared.v4.b32 [%r10], {%r1479, %r1491, %r1503, %r1515}; 2026-02-21T09:08:08.1220172Z st.shared.v4.b32 [%r11], {%r1527, %r1539, %r1551, %r1563}; 2026-02-21T09:08:08.1220287Z bar.sync 0; 2026-02-21T09:08:08.1220403Z // begin inline asm 2026-02-21T09:08:08.1220761Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r994, 
%r998, %r1002, %r1006}, [%r836]; 2026-02-21T09:08:08.1220873Z // end inline asm 2026-02-21T09:08:08.1220989Z // begin inline asm 2026-02-21T09:08:08.1221341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1010, %r1014, %r1018, %r1022}, [%r841]; 2026-02-21T09:08:08.1221460Z // end inline asm 2026-02-21T09:08:08.1221577Z // begin inline asm 2026-02-21T09:08:08.1222049Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r846]; 2026-02-21T09:08:08.1222173Z // end inline asm 2026-02-21T09:08:08.1222291Z // begin inline asm 2026-02-21T09:08:08.1222636Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r851]; 2026-02-21T09:08:08.1222822Z // end inline asm 2026-02-21T09:08:08.1222939Z // begin inline asm 2026-02-21T09:08:08.1223295Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r856]; 2026-02-21T09:08:08.1223407Z // end inline asm 2026-02-21T09:08:08.1223537Z // begin inline asm 2026-02-21T09:08:08.1223894Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r861]; 2026-02-21T09:08:08.1224007Z // end inline asm 2026-02-21T09:08:08.1224133Z // begin inline asm 2026-02-21T09:08:08.1224484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r866]; 2026-02-21T09:08:08.1224753Z // end inline asm 2026-02-21T09:08:08.1224879Z // begin inline asm 2026-02-21T09:08:08.1225244Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r871]; 2026-02-21T09:08:08.1225358Z // end inline asm 2026-02-21T09:08:08.1225469Z bar.sync 0; 2026-02-21T09:08:08.1225680Z st.shared.v4.b32 [%r4], {%r1194, %r1206, %r1218, %r1230}; 2026-02-21T09:08:08.1225884Z st.shared.v4.b32 [%r5], {%r1242, %r1254, %r1266, %r1278}; 2026-02-21T09:08:08.1226083Z st.shared.v4.b32 [%r6], {%r1290, %r1302, %r1314, %r1326}; 2026-02-21T09:08:08.1226366Z st.shared.v4.b32 [%r7], {%r1338, %r1350, %r1362, %r1374}; 2026-02-21T09:08:08.1226569Z st.shared.v4.b32 [%r8], {%r1386, 
%r1398, %r1410, %r1422}; 2026-02-21T09:08:08.1226768Z st.shared.v4.b32 [%r9], {%r1434, %r1446, %r1458, %r1470}; 2026-02-21T09:08:08.1226975Z st.shared.v4.b32 [%r10], {%r1482, %r1494, %r1506, %r1518}; 2026-02-21T09:08:08.1227194Z st.shared.v4.b32 [%r11], {%r1530, %r1542, %r1554, %r1566}; 2026-02-21T09:08:08.1227306Z bar.sync 0; 2026-02-21T09:08:08.1227422Z // begin inline asm 2026-02-21T09:08:08.1227788Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r995, %r999, %r1003, %r1007}, [%r836]; 2026-02-21T09:08:08.1227901Z // end inline asm 2026-02-21T09:08:08.1228018Z // begin inline asm 2026-02-21T09:08:08.1228381Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1011, %r1015, %r1019, %r1023}, [%r841]; 2026-02-21T09:08:08.1228493Z // end inline asm 2026-02-21T09:08:08.1228608Z // begin inline asm 2026-02-21T09:08:08.1228966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1031, %r1035, %r1039}, [%r846]; 2026-02-21T09:08:08.1229088Z // end inline asm 2026-02-21T09:08:08.1229201Z // begin inline asm 2026-02-21T09:08:08.1229557Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1043, %r1047, %r1051, %r1055}, [%r851]; 2026-02-21T09:08:08.1229678Z // end inline asm 2026-02-21T09:08:08.1229794Z // begin inline asm 2026-02-21T09:08:08.1230149Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r856]; 2026-02-21T09:08:08.1230273Z // end inline asm 2026-02-21T09:08:08.1230393Z // begin inline asm 2026-02-21T09:08:08.1230743Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, %r1083, %r1087}, [%r861]; 2026-02-21T09:08:08.1230855Z // end inline asm 2026-02-21T09:08:08.1230982Z // begin inline asm 2026-02-21T09:08:08.1231333Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r866]; 2026-02-21T09:08:08.1231446Z // end inline asm 2026-02-21T09:08:08.1231570Z // begin inline asm 2026-02-21T09:08:08.1231927Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1107, %r1111, %r1115, %r1119}, [%r871]; 2026-02-21T09:08:08.1232035Z // end inline asm 
2026-02-21T09:08:08.1232151Z // begin inline asm 2026-02-21T09:08:08.1232395Z st.global.v4.b32 [ %rd86 + 0 ], { %r992, %r993, %r994, %r995 }; 2026-02-21T09:08:08.1232507Z // end inline asm 2026-02-21T09:08:08.1232622Z // begin inline asm 2026-02-21T09:08:08.1232858Z st.global.v4.b32 [ %rd87 + 0 ], { %r996, %r997, %r998, %r999 }; 2026-02-21T09:08:08.1232970Z // end inline asm 2026-02-21T09:08:08.1233167Z // begin inline asm 2026-02-21T09:08:08.1233420Z st.global.v4.b32 [ %rd88 + 0 ], { %r1000, %r1001, %r1002, %r1003 }; 2026-02-21T09:08:08.1233532Z // end inline asm 2026-02-21T09:08:08.1233646Z // begin inline asm 2026-02-21T09:08:08.1233877Z st.global.v4.b32 [ %rd89 + 0 ], { %r1004, %r1005, %r1006, %r1007 }; 2026-02-21T09:08:08.1234060Z // end inline asm 2026-02-21T09:08:08.1234175Z // begin inline asm 2026-02-21T09:08:08.1234407Z st.global.v4.b32 [ %rd90 + 0 ], { %r1008, %r1009, %r1010, %r1011 }; 2026-02-21T09:08:08.1234527Z // end inline asm 2026-02-21T09:08:08.1234641Z // begin inline asm 2026-02-21T09:08:08.1234940Z st.global.v4.b32 [ %rd91 + 0 ], { %r1012, %r1013, %r1014, %r1015 }; 2026-02-21T09:08:08.1235055Z // end inline asm 2026-02-21T09:08:08.1235188Z // begin inline asm 2026-02-21T09:08:08.1235411Z st.global.v4.b32 [ %rd92 + 0 ], { %r1016, %r1017, %r1018, %r1019 }; 2026-02-21T09:08:08.1235525Z // end inline asm 2026-02-21T09:08:08.1235718Z // begin inline asm 2026-02-21T09:08:08.1235944Z st.global.v4.b32 [ %rd93 + 0 ], { %r1020, %r1021, %r1022, %r1023 }; 2026-02-21T09:08:08.1236056Z // end inline asm 2026-02-21T09:08:08.1236172Z // begin inline asm 2026-02-21T09:08:08.1236410Z st.global.v4.b32 [ %rd94 + 0 ], { %r1024, %r1025, %r1026, %r1027 }; 2026-02-21T09:08:08.1236527Z // end inline asm 2026-02-21T09:08:08.1236643Z // begin inline asm 2026-02-21T09:08:08.1236875Z st.global.v4.b32 [ %rd95 + 0 ], { %r1028, %r1029, %r1030, %r1031 }; 2026-02-21T09:08:08.1237045Z // end inline asm 2026-02-21T09:08:08.1237163Z // begin inline asm 2026-02-21T09:08:08.1237399Z 
st.global.v4.b32 [ %rd96 + 0 ], { %r1032, %r1033, %r1034, %r1035 }; 2026-02-21T09:08:08.1237512Z // end inline asm 2026-02-21T09:08:08.1237627Z // begin inline asm 2026-02-21T09:08:08.1237849Z st.global.v4.b32 [ %rd97 + 0 ], { %r1036, %r1037, %r1038, %r1039 }; 2026-02-21T09:08:08.1237969Z // end inline asm 2026-02-21T09:08:08.1238085Z // begin inline asm 2026-02-21T09:08:08.1238312Z st.global.v4.b32 [ %rd98 + 0 ], { %r1040, %r1041, %r1042, %r1043 }; 2026-02-21T09:08:08.1238437Z // end inline asm 2026-02-21T09:08:08.1238552Z // begin inline asm 2026-02-21T09:08:08.1238775Z st.global.v4.b32 [ %rd99 + 0 ], { %r1044, %r1045, %r1046, %r1047 }; 2026-02-21T09:08:08.1238889Z // end inline asm 2026-02-21T09:08:08.1239015Z // begin inline asm 2026-02-21T09:08:08.1239252Z st.global.v4.b32 [ %rd100 + 0 ], { %r1048, %r1049, %r1050, %r1051 }; 2026-02-21T09:08:08.1239362Z // end inline asm 2026-02-21T09:08:08.1239491Z // begin inline asm 2026-02-21T09:08:08.1239718Z st.global.v4.b32 [ %rd101 + 0 ], { %r1052, %r1053, %r1054, %r1055 }; 2026-02-21T09:08:08.1239834Z // end inline asm 2026-02-21T09:08:08.1239959Z // begin inline asm 2026-02-21T09:08:08.1240188Z st.global.v4.b32 [ %rd102 + 0 ], { %r1056, %r1057, %r1058, %r1059 }; 2026-02-21T09:08:08.1240301Z // end inline asm 2026-02-21T09:08:08.1240417Z // begin inline asm 2026-02-21T09:08:08.1240658Z st.global.v4.b32 [ %rd103 + 0 ], { %r1060, %r1061, %r1062, %r1063 }; 2026-02-21T09:08:08.1240775Z // end inline asm 2026-02-21T09:08:08.1240892Z // begin inline asm 2026-02-21T09:08:08.1241128Z st.global.v4.b32 [ %rd104 + 0 ], { %r1064, %r1065, %r1066, %r1067 }; 2026-02-21T09:08:08.1241241Z // end inline asm 2026-02-21T09:08:08.1241361Z // begin inline asm 2026-02-21T09:08:08.1241587Z st.global.v4.b32 [ %rd105 + 0 ], { %r1068, %r1069, %r1070, %r1071 }; 2026-02-21T09:08:08.1241710Z // end inline asm 2026-02-21T09:08:08.1241828Z // begin inline asm 2026-02-21T09:08:08.1242057Z st.global.v4.b32 [ %rd106 + 0 ], { %r1072, %r1073, %r1074, 
%r1075 }; 2026-02-21T09:08:08.1242180Z // end inline asm 2026-02-21T09:08:08.1242299Z // begin inline asm 2026-02-21T09:08:08.1242525Z st.global.v4.b32 [ %rd107 + 0 ], { %r1076, %r1077, %r1078, %r1079 }; 2026-02-21T09:08:08.1242649Z // end inline asm 2026-02-21T09:08:08.1242769Z // begin inline asm 2026-02-21T09:08:08.1243000Z st.global.v4.b32 [ %rd108 + 0 ], { %r1080, %r1081, %r1082, %r1083 }; 2026-02-21T09:08:08.1243194Z // end inline asm 2026-02-21T09:08:08.1243322Z // begin inline asm 2026-02-21T09:08:08.1243549Z st.global.v4.b32 [ %rd109 + 0 ], { %r1084, %r1085, %r1086, %r1087 }; 2026-02-21T09:08:08.1243661Z // end inline asm 2026-02-21T09:08:08.1243787Z // begin inline asm 2026-02-21T09:08:08.1244073Z st.global.v4.b32 [ %rd110 + 0 ], { %r1088, %r1089, %r1090, %r1091 }; 2026-02-21T09:08:08.1244185Z // end inline asm 2026-02-21T09:08:08.1244305Z // begin inline asm 2026-02-21T09:08:08.1244542Z st.global.v4.b32 [ %rd111 + 0 ], { %r1092, %r1093, %r1094, %r1095 }; 2026-02-21T09:08:08.1244655Z // end inline asm 2026-02-21T09:08:08.1244849Z // begin inline asm 2026-02-21T09:08:08.1245089Z st.global.v4.b32 [ %rd112 + 0 ], { %r1096, %r1097, %r1098, %r1099 }; 2026-02-21T09:08:08.1245202Z // end inline asm 2026-02-21T09:08:08.1245318Z // begin inline asm 2026-02-21T09:08:08.1245544Z st.global.v4.b32 [ %rd113 + 0 ], { %r1100, %r1101, %r1102, %r1103 }; 2026-02-21T09:08:08.1245727Z // end inline asm 2026-02-21T09:08:08.1245848Z // begin inline asm 2026-02-21T09:08:08.1246075Z st.global.v4.b32 [ %rd114 + 0 ], { %r1104, %r1105, %r1106, %r1107 }; 2026-02-21T09:08:08.1246219Z // end inline asm 2026-02-21T09:08:08.1246331Z // begin inline asm 2026-02-21T09:08:08.1246557Z st.global.v4.b32 [ %rd115 + 0 ], { %r1108, %r1109, %r1110, %r1111 }; 2026-02-21T09:08:08.1246685Z // end inline asm 2026-02-21T09:08:08.1246804Z // begin inline asm 2026-02-21T09:08:08.1247093Z st.global.v4.b32 [ %rd116 + 0 ], { %r1112, %r1113, %r1114, %r1115 }; 2026-02-21T09:08:08.1247208Z // end inline asm 
2026-02-21T09:08:08.1247333Z // begin inline asm 2026-02-21T09:08:08.1247559Z st.global.v4.b32 [ %rd117 + 0 ], { %r1116, %r1117, %r1118, %r1119 }; 2026-02-21T09:08:08.1247670Z // end inline asm 2026-02-21T09:08:08.1247856Z $L__BB0_8: // %._crit_edge 2026-02-21T09:08:08.1248272Z .loc 1 31 4 // cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py:31:4 2026-02-21T09:08:08.1248384Z bar.sync 0; 2026-02-21T09:08:08.1248516Z // begin inline asm 2026-02-21T09:08:08.1248798Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1567, 512; 2026-02-21T09:08:08.1248910Z // end inline asm 2026-02-21T09:08:08.1249014Z ret; 2026-02-21T09:08:08.1249134Z $L__tmp1: 2026-02-21T09:08:08.1249250Z $L__func_end0: 2026-02-21T09:08:08.1249432Z // -- End function 2026-02-21T09:08:08.1249548Z } 2026-02-21T09:08:08.1250049Z .file 1 "/tmp/torchinductor_root/vf/cvfxd6vg526flltdkvr6yiyfsatdh3mafxcacu6eptuhyi3bhkce.py" 2026-02-21T09:08:08.1250181Z .section .debug_abbrev 2026-02-21T09:08:08.1250286Z { 2026-02-21T09:08:08.1250482Z .b8 1 // Abbreviation Code 2026-02-21T09:08:08.1250669Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:08:08.1250841Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:08.1251019Z .b8 37 // DW_AT_producer 2026-02-21T09:08:08.1251181Z .b8 8 // DW_FORM_string 2026-02-21T09:08:08.1251340Z .b8 19 // DW_AT_language 2026-02-21T09:08:08.1251513Z .b8 5 // DW_FORM_data2 2026-02-21T09:08:08.1251674Z .b8 3 // DW_AT_name 2026-02-21T09:08:08.1251835Z .b8 8 // DW_FORM_string 2026-02-21T09:08:08.1252004Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:08:08.1252176Z .b8 6 // DW_FORM_data4 2026-02-21T09:08:08.1252337Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:08:08.1252493Z .b8 8 // DW_FORM_string 2026-02-21T09:08:08.1252648Z .b8 0 // EOM(1) 2026-02-21T09:08:08.1252792Z .b8 0 // EOM(2) 2026-02-21T09:08:08.1252934Z .b8 0 // EOM(3) 2026-02-21T09:08:08.1253044Z } 2026-02-21T09:08:08.1253249Z .section .debug_info 2026-02-21T09:08:08.1253348Z { 2026-02-21T09:08:08.1253524Z .b32 104 // Length of Unit 
2026-02-21T09:08:08.1253718Z .b8 2 // DWARF version number 2026-02-21T09:08:08.1253882Z .b8 0 2026-02-21T09:08:08.1254145Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:08:08.1254348Z .b8 8 // Address Size (in bytes) 2026-02-21T09:08:08.1254568Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:08:08.1254849Z .b8 116 // DW_AT_producer 2026-02-21T09:08:08.1254969Z .b8 114 2026-02-21T09:08:08.1255077Z .b8 105 2026-02-21T09:08:08.1255180Z .b8 116 2026-02-21T09:08:08.1255281Z .b8 111 2026-02-21T09:08:08.1255391Z .b8 110 2026-02-21T09:08:08.1255495Z .b8 0 2026-02-21T09:08:08.1255717Z .b8 2 // DW_AT_language 2026-02-21T09:08:08.1255837Z .b8 0 2026-02-21T09:08:08.1255998Z .b8 99 // DW_AT_name 2026-02-21T09:08:08.1256101Z .b8 118 2026-02-21T09:08:08.1256203Z .b8 102 2026-02-21T09:08:08.1256316Z .b8 120 2026-02-21T09:08:08.1256417Z .b8 100 2026-02-21T09:08:08.1256519Z .b8 54 2026-02-21T09:08:08.1256625Z .b8 118 2026-02-21T09:08:08.1256735Z .b8 103 2026-02-21T09:08:08.1256838Z .b8 53 2026-02-21T09:08:08.1256939Z .b8 50 2026-02-21T09:08:08.1257049Z .b8 54 2026-02-21T09:08:08.1257153Z .b8 102 2026-02-21T09:08:08.1257354Z .b8 108 2026-02-21T09:08:08.1257468Z .b8 108 2026-02-21T09:08:08.1257584Z .b8 116 2026-02-21T09:08:08.1257684Z .b8 100 2026-02-21T09:08:08.1257787Z .b8 107 2026-02-21T09:08:08.1257899Z .b8 118 2026-02-21T09:08:08.1258001Z .b8 114 2026-02-21T09:08:08.1258103Z .b8 54 2026-02-21T09:08:08.1258205Z .b8 121 2026-02-21T09:08:08.1258322Z .b8 105 2026-02-21T09:08:08.1258424Z .b8 121 2026-02-21T09:08:08.1258527Z .b8 102 2026-02-21T09:08:08.1258634Z .b8 115 2026-02-21T09:08:08.1258738Z .b8 97 2026-02-21T09:08:08.1258841Z .b8 116 2026-02-21T09:08:08.1258944Z .b8 100 2026-02-21T09:08:08.1259054Z .b8 104 2026-02-21T09:08:08.1259157Z .b8 51 2026-02-21T09:08:08.1259259Z .b8 109 2026-02-21T09:08:08.1259360Z .b8 97 2026-02-21T09:08:08.1259471Z .b8 102 2026-02-21T09:08:08.1259572Z .b8 120 2026-02-21T09:08:08.1259674Z .b8 99 
2026-02-21T09:08:08.1259786Z .b8 97 2026-02-21T09:08:08.1259886Z .b8 99 2026-02-21T09:08:08.1259986Z .b8 117 2026-02-21T09:08:08.1260085Z .b8 54 2026-02-21T09:08:08.1260197Z .b8 101 2026-02-21T09:08:08.1260301Z .b8 112 2026-02-21T09:08:08.1260404Z .b8 116 2026-02-21T09:08:08.1260514Z .b8 117 2026-02-21T09:08:08.1260616Z .b8 104 2026-02-21T09:08:08.1260716Z .b8 121 2026-02-21T09:08:08.1260818Z .b8 105 2026-02-21T09:08:08.1260929Z .b8 51 2026-02-21T09:08:08.1261032Z .b8 98 2026-02-21T09:08:08.1261135Z .b8 104 2026-02-21T09:08:08.1261237Z .b8 107 2026-02-21T09:08:08.1261349Z .b8 99 2026-02-21T09:08:08.1261448Z .b8 101 2026-02-21T09:08:08.1261550Z .b8 46 2026-02-21T09:08:08.1261667Z .b8 112 2026-02-21T09:08:08.1261774Z .b8 121 2026-02-21T09:08:08.1261876Z .b8 0 2026-02-21T09:08:08.1262085Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:08:08.1262255Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:08:08.1262359Z .b8 116 2026-02-21T09:08:08.1262466Z .b8 109 2026-02-21T09:08:08.1262579Z .b8 112 2026-02-21T09:08:08.1262681Z .b8 47 2026-02-21T09:08:08.1262784Z .b8 116 2026-02-21T09:08:08.1262884Z .b8 111 2026-02-21T09:08:08.1262996Z .b8 114 2026-02-21T09:08:08.1263103Z .b8 99 2026-02-21T09:08:08.1263204Z .b8 104 2026-02-21T09:08:08.1263311Z .b8 105 2026-02-21T09:08:08.1263414Z .b8 110 2026-02-21T09:08:08.1263513Z .b8 100 2026-02-21T09:08:08.1263612Z .b8 117 2026-02-21T09:08:08.1263725Z .b8 99 2026-02-21T09:08:08.1263826Z .b8 116 2026-02-21T09:08:08.1263928Z .b8 111 2026-02-21T09:08:08.1264037Z .b8 114 2026-02-21T09:08:08.1264137Z .b8 95 2026-02-21T09:08:08.1264238Z .b8 114 2026-02-21T09:08:08.1264338Z .b8 111 2026-02-21T09:08:08.1264453Z .b8 111 2026-02-21T09:08:08.1264660Z .b8 116 2026-02-21T09:08:08.1264858Z .b8 47 2026-02-21T09:08:08.1264962Z .b8 118 2026-02-21T09:08:08.1265075Z .b8 102 2026-02-21T09:08:08.1265176Z .b8 0 2026-02-21T09:08:08.1265278Z } 2026-02-21T09:08:08.1265430Z .section .debug_macinfo { } 2026-02-21T09:08:08.1265508Z 2026-02-21T09:08:08.1265674Z 
================================================================ 2026-02-21T09:08:08.1265910Z please share the reproducer above with Triton project. 2026-02-21T09:08:09.0097191Z 2026-02-21T09:08:09.0098370Z 2026-02-21T09:08:09.0098387Z 2026-02-21T09:08:09.0099020Z ================================================================ 2026-02-21T09:08:09.0099603Z Internal Triton PTX codegen error 2026-02-21T09:08:09.0100000Z `ptxas` stderr: 2026-02-21T09:08:09.0100999Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 232 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:09.0102515Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:09.0102897Z 2026-02-21T09:08:09.0103850Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp8_lxns1x.ptx -o /tmp/tmp8_lxns1x.ptx.o 2026-02-21T09:08:09.0105348Z 2026-02-21T09:08:09.0105356Z 2026-02-21T09:08:09.0105468Z // 2026-02-21T09:08:09.0105764Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:08:09.0106157Z // 2026-02-21T09:08:09.0106297Z 2026-02-21T09:08:09.0106430Z .version 8.7 2026-02-21T09:08:09.0106865Z .target sm_100a 2026-02-21T09:08:09.0107181Z .address_size 64 2026-02-21T09:08:09.0107370Z 2026-02-21T09:08:09.0107646Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:08:09.0108240Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:08:09.0108683Z // @_helion_matmul 2026-02-21T09:08:09.0109065Z .visible .entry _helion_matmul( 2026-02-21T09:08:09.0109413Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:08:09.0109832Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:08:09.0110239Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:08:09.0110812Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:08:09.0111413Z .param .u64 .ptr 
.global .align 1 _helion_matmul_param_4 2026-02-21T09:08:09.0111863Z ) 2026-02-21T09:08:09.0112121Z .reqntid 128 2026-02-21T09:08:09.0112400Z .maxnreg 32 2026-02-21T09:08:09.0112662Z { 2026-02-21T09:08:09.0112921Z .reg .pred %p<133>; 2026-02-21T09:08:09.0113245Z .reg .b32 %r<2852>; 2026-02-21T09:08:09.0113562Z .reg .b64 %rd<1156>; 2026-02-21T09:08:09.0114179Z .loc 1 19 0 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:19:0 2026-02-21T09:08:09.0114983Z $L__func_begin0: 2026-02-21T09:08:09.0115576Z .loc 1 19 0 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:19:0 2026-02-21T09:08:09.0116160Z 2026-02-21T09:08:09.0116273Z // %bb.0: 2026-02-21T09:08:09.0116605Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:08:09.0117043Z $L__tmp0: 2026-02-21T09:08:09.0117593Z .loc 1 19 0 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:19 2026-02-21T09:08:09.0118251Z mov.u32 %r1, %tid.x; 2026-02-21T09:08:09.0118650Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:08:09.0119116Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:08:09.0119482Z mov.b32 %r106, global_smem; 2026-02-21T09:08:09.0119839Z // begin inline asm 2026-02-21T09:08:09.0120395Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r106], 512; 2026-02-21T09:08:09.0120984Z // end inline asm 2026-02-21T09:08:09.0121350Z ld.param.b64 %rd40, [_helion_matmul_param_3]; 2026-02-21T09:08:09.0121791Z bar.sync 0; 2026-02-21T09:08:09.0122101Z ld.shared.b32 %r2843, [global_smem]; 2026-02-21T09:08:09.0122501Z bar.sync 0; 2026-02-21T09:08:09.0122791Z // begin inline asm 2026-02-21T09:08:09.0123503Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:08:09.0124026Z // end inline asm 2026-02-21T09:08:09.0124648Z .loc 1 21 67 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:21:67 2026-02-21T09:08:09.0125564Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:08:09.0125903Z mov.u32 %r123, %ctaid.y; 2026-02-21T09:08:09.0126246Z mov.u32 %r124, %ctaid.z; 
2026-02-21T09:08:09.0126581Z mov.u32 %r125, %nctaid.x; 2026-02-21T09:08:09.0126944Z mov.u32 %r126, %nctaid.y; 2026-02-21T09:08:09.0127314Z mad.lo.s32 %r127, %r124, %r126, %r123; 2026-02-21T09:08:09.0127742Z mad.lo.s32 %r128, %r127, %r125, %r3; 2026-02-21T09:08:09.0128118Z shl.b32 %r129, %r128, 8; 2026-02-21T09:08:09.0128455Z cvt.s64.s32 %rd41, %r129; 2026-02-21T09:08:09.0128801Z add.s64 %rd19, %rd40, %rd41; 2026-02-21T09:08:09.0129158Z shl.b32 %r130, %r1, 2; 2026-02-21T09:08:09.0129500Z add.s32 %r107, %r106, %r130; 2026-02-21T09:08:09.0129927Z mov.b32 %r116, 0; 2026-02-21T09:08:09.0130241Z // begin inline asm 2026-02-21T09:08:09.0130573Z @%p1 st.shared.b32 [ %r107 + 0 ], %r116; 2026-02-21T09:08:09.0130976Z // end inline asm 2026-02-21T09:08:09.0131281Z bar.warp.sync -1; 2026-02-21T09:08:09.0131616Z setp.eq.b32 %p126, %r1, 0; 2026-02-21T09:08:09.0131968Z cvt.u64.u32 %rd4, %r106; 2026-02-21T09:08:09.0132309Z // begin inline asm 2026-02-21T09:08:09.0132893Z @%p126 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:08:09.0133622Z // end inline asm 2026-02-21T09:08:09.0133933Z // begin inline asm 2026-02-21T09:08:09.0134490Z @%p126 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:08:09.0135198Z // end inline asm 2026-02-21T09:08:09.0135496Z mov.b32 %r109, 16; 2026-02-21T09:08:09.0135794Z // begin inline asm 2026-02-21T09:08:09.0137202Z [189s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:08:09.0140158Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:08:09.0143096Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:08:09.0143622Z `ptxas` stderr: 2026-02-21T09:08:09.0144604Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 232 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:09.0145829Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:09.0146188Z 2026-02-21T09:08:09.0147122Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp8_lxns1x.ptx -o /tmp/tmp8_lxns1x.ptx.o 2026-02-21T09:08:09.0148191Z 2026-02-21T09:08:09.0148487Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:08:09.0149281Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r109; 2026-02-21T09:08:09.0149935Z // end inline asm 2026-02-21T09:08:09.0150224Z mov.b32 %r110, 256; 2026-02-21T09:08:09.0150544Z // begin inline asm 2026-02-21T09:08:09.0151109Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r110; 2026-02-21T09:08:09.0151732Z // end inline asm 2026-02-21T09:08:09.0152036Z mov.b32 %r111, 2048; 2026-02-21T09:08:09.0152347Z // begin inline asm 2026-02-21T09:08:09.0152937Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r111; 2026-02-21T09:08:09.0153589Z // end inline asm 2026-02-21T09:08:09.0153884Z mov.b32 %r112, 4096; 2026-02-21T09:08:09.0154195Z // begin inline asm 2026-02-21T09:08:09.0155034Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r112; 2026-02-21T09:08:09.0155686Z // end inline asm 2026-02-21T09:08:09.0155978Z mov.b64 %rd12, 4096; 2026-02-21T09:08:09.0156301Z // begin inline asm 2026-02-21T09:08:09.0156885Z @%p126 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:08:09.0157660Z // end inline asm 2026-02-21T09:08:09.0157945Z mov.b32 %r113, 1; 2026-02-21T09:08:09.0158251Z // begin inline asm 2026-02-21T09:08:09.0158862Z @%p126 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r113; 2026-02-21T09:08:09.0159548Z // end inline asm 2026-02-21T09:08:09.0159850Z // begin inline asm 2026-02-21T09:08:09.0160444Z @%p126 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r113; 2026-02-21T09:08:09.0161128Z // end inline asm 2026-02-21T09:08:09.0161423Z // begin inline asm 2026-02-21T09:08:09.0162060Z @%p126 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:08:09.0162690Z // end inline asm 2026-02-21T09:08:09.0162991Z // begin inline asm 2026-02-21T09:08:09.0163589Z @%p126 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:09.0164258Z // end inline asm 2026-02-21T09:08:09.0164561Z // begin inline asm 2026-02-21T09:08:09.0165195Z @%p126 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:08:09.0165832Z // end inline asm 2026-02-21T09:08:09.0166199Z // begin inline asm 2026-02-21T09:08:09.0166754Z @%p126 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:09.0167369Z // end inline asm 2026-02-21T09:08:09.0167665Z // begin inline asm 2026-02-21T09:08:09.0168486Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:08:09.0169352Z // end inline asm 2026-02-21T09:08:09.0169651Z // begin inline asm 2026-02-21T09:08:09.0170128Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:08:09.0170723Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:08:09.0171167Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:09.0171576Z // end inline asm 2026-02-21T09:08:09.0171869Z bar.sync 0; 2026-02-21T09:08:09.0172160Z cvta.global.u64 %rd56, %rd19; 2026-02-21T09:08:09.0172822Z .loc 1 22 67 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:22:67 2026-02-21T09:08:09.0173493Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:08:09.0173847Z bar.sync 0; 2026-02-21T09:08:09.0174133Z // begin inline asm 2026-02-21T09:08:09.0174484Z @%p1 st.shared.b32 [ %r107 + 0 ], %r116; 2026-02-21T09:08:09.0174988Z // end inline asm 2026-02-21T09:08:09.0175301Z bar.warp.sync -1; 2026-02-21T09:08:09.0175611Z // begin inline asm 2026-02-21T09:08:09.0176167Z @%p126 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:08:09.0176814Z // end inline asm 2026-02-21T09:08:09.0177098Z // begin inline asm 2026-02-21T09:08:09.0177599Z @%p126 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 
0x1; 2026-02-21T09:08:09.0178176Z // end inline asm 2026-02-21T09:08:09.0178467Z // begin inline asm 2026-02-21T09:08:09.0179010Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r109; 2026-02-21T09:08:09.0179630Z // end inline asm 2026-02-21T09:08:09.0179927Z // begin inline asm 2026-02-21T09:08:09.0180476Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r110; 2026-02-21T09:08:09.0181111Z // end inline asm 2026-02-21T09:08:09.0181398Z // begin inline asm 2026-02-21T09:08:09.0181966Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r111; 2026-02-21T09:08:09.0182603Z // end inline asm 2026-02-21T09:08:09.0182904Z // begin inline asm 2026-02-21T09:08:09.0183475Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r111; 2026-02-21T09:08:09.0184261Z // end inline asm 2026-02-21T09:08:09.0184561Z // begin inline asm 2026-02-21T09:08:09.0185264Z @%p126 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:08:09.0186030Z // end inline asm 2026-02-21T09:08:09.0186326Z // begin inline asm 2026-02-21T09:08:09.0186939Z @%p126 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r113; 2026-02-21T09:08:09.0187627Z // end inline asm 2026-02-21T09:08:09.0187920Z // begin inline asm 2026-02-21T09:08:09.0188515Z @%p126 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r113; 2026-02-21T09:08:09.0189182Z // end inline asm 2026-02-21T09:08:09.0189479Z // begin inline asm 2026-02-21T09:08:09.0190000Z @%p126 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:08:09.0190629Z // end inline asm 2026-02-21T09:08:09.0191012Z // begin inline asm 2026-02-21T09:08:09.0191619Z @%p126 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:09.0192318Z // end inline asm 
2026-02-21T09:08:09.0192611Z // begin inline asm 2026-02-21T09:08:09.0193182Z @%p126 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:08:09.0193808Z // end inline asm 2026-02-21T09:08:09.0194107Z // begin inline asm 2026-02-21T09:08:09.0194649Z @%p126 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:08:09.0195569Z // end inline asm 2026-02-21T09:08:09.0195882Z // begin inline asm 2026-02-21T09:08:09.0196703Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:08:09.0197602Z // end inline asm 2026-02-21T09:08:09.0197903Z // begin inline asm 2026-02-21T09:08:09.0198398Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:08:09.0198989Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:08:09.0199438Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:09.0199858Z // end inline asm 2026-02-21T09:08:09.0200152Z bar.sync 0; 2026-02-21T09:08:09.0200475Z cvta.global.u64 %rd57, %rd37; 2026-02-21T09:08:09.0201163Z .loc 1 31 106 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:31:106 2026-02-21T09:08:09.0201893Z setp.gt.u32 %p39, %r3, 127; 2026-02-21T09:08:09.0202265Z @%p39 bra $L__BB0_8; 2026-02-21T09:08:09.0202638Z // %bb.1: // %.lr.ph 2026-02-21T09:08:09.0203344Z .loc 1 43 45 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:43:45 2026-02-21T09:08:09.0204041Z and.b32 %r4, %r1, 96; 2026-02-21T09:08:09.0204386Z bfe.u32 %r5, %r1, 5, 2; 2026-02-21T09:08:09.0204810Z shl.b32 %r710, %r1, 3; 2026-02-21T09:08:09.0205152Z and.b32 %r711, %r710, 248; 2026-02-21T09:08:09.0205493Z shr.u32 %r712, %r1, 5; 2026-02-21T09:08:09.0205822Z and.b32 %r713, %r1, 7; 2026-02-21T09:08:09.0206148Z shl.b32 %r714, %r713, 11; 2026-02-21T09:08:09.0206489Z shl.b32 %r715, %r1, 4; 2026-02-21T09:08:09.0206805Z and.b32 %r716, %r715, 2032; 2026-02-21T09:08:09.0207158Z or.b32 %r717, %r714, 
%r716; 2026-02-21T09:08:09.0207505Z xor.b32 %r718, %r717, 16; 2026-02-21T09:08:09.0207838Z xor.b32 %r719, %r717, 32; 2026-02-21T09:08:09.0208178Z xor.b32 %r720, %r717, 48; 2026-02-21T09:08:09.0208493Z xor.b32 %r721, %r717, 64; 2026-02-21T09:08:09.0208830Z xor.b32 %r722, %r717, 80; 2026-02-21T09:08:09.0209154Z xor.b32 %r723, %r717, 96; 2026-02-21T09:08:09.0209494Z xor.b32 %r724, %r717, 112; 2026-02-21T09:08:09.0209832Z shl.b32 %r725, %r4, 6; 2026-02-21T09:08:09.0210167Z shl.b32 %r726, %r713, 4; 2026-02-21T09:08:09.0210494Z shr.u32 %r727, %r4, 1; 2026-02-21T09:08:09.0210822Z bfe.s32 %r728, %r1, 3, 1; 2026-02-21T09:08:09.0211156Z and.b32 %r729, %r728, 8256; 2026-02-21T09:08:09.0211490Z and.b32 %r730, %r710, 128; 2026-02-21T09:08:09.0211839Z or.b32 %r731, %r725, %r726; 2026-02-21T09:08:09.0212347Z or.b32 %r732, %r729, %r727; 2026-02-21T09:08:09.0212711Z xor.b32 %r733, %r732, %r731; 2026-02-21T09:08:09.0213068Z add.s32 %r734, %r106, %r730; 2026-02-21T09:08:09.0213443Z add.s32 %r1374, %r734, %r733; 2026-02-21T09:08:09.0214083Z .loc 1 38 33 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:38:33 2026-02-21T09:08:09.0214941Z shr.u32 %r735, %r3, 4; 2026-02-21T09:08:09.0215280Z and.b32 %r736, %r735, 6; 2026-02-21T09:08:09.0215906Z .loc 1 40 64 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:40:64 2026-02-21T09:08:09.0216594Z and.b32 %r737, %r3, 1; 2026-02-21T09:08:09.0217203Z .loc 1 40 30 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:40:30 2026-02-21T09:08:09.0217885Z or.b32 %r738, %r736, %r737; 2026-02-21T09:08:09.0218508Z .loc 1 42 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:42:27 2026-02-21T09:08:09.0219298Z shl.b32 %r767, %r738, 8; 2026-02-21T09:08:09.0219927Z .loc 1 44 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:44:27 2026-02-21T09:08:09.0220602Z shl.b32 %r739, %r3, 7; 2026-02-21T09:08:09.0220930Z and.b32 %r763, %r739, 3840; 2026-02-21T09:08:09.0221556Z .loc 1 45 32 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:45:32 2026-02-21T09:08:09.0222229Z or.b32 %r25, %r763, %r5; 2026-02-21T09:08:09.0222909Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0223653Z shfl.sync.idx.b32 %r89, %r712, 0, 31, -1; 2026-02-21T09:08:09.0224064Z shl.b32 %r740, %r89, 21; 2026-02-21T09:08:09.0224415Z and.b32 %r741, %r740, 6291456; 2026-02-21T09:08:09.0224863Z add.s32 %r1369, %r741, %r2843; 2026-02-21T09:08:09.0225179Z mov.pred %p40, -1; 2026-02-21T09:08:09.0225510Z // begin inline asm 2026-02-21T09:08:09.0226390Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 0], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0227341Z // end inline asm 2026-02-21T09:08:09.0227638Z // begin inline asm 2026-02-21T09:08:09.0228513Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 16], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0229465Z // end inline asm 2026-02-21T09:08:09.0229762Z // begin inline asm 2026-02-21T09:08:09.0230627Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 32], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0231549Z // end inline asm 2026-02-21T09:08:09.0231857Z // begin inline asm 2026-02-21T09:08:09.0232705Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 48], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0233657Z // end inline asm 2026-02-21T09:08:09.0233955Z // begin inline asm 2026-02-21T09:08:09.0234917Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 64], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0235869Z // end 
inline asm 2026-02-21T09:08:09.0236156Z // begin inline asm 2026-02-21T09:08:09.0236991Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 80], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0237921Z // end inline asm 2026-02-21T09:08:09.0238212Z // begin inline asm 2026-02-21T09:08:09.0239072Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 96], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0239997Z // end inline asm 2026-02-21T09:08:09.0240295Z // begin inline asm 2026-02-21T09:08:09.0241138Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 112], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0242220Z // end inline asm 2026-02-21T09:08:09.0242530Z // begin inline asm 2026-02-21T09:08:09.0243458Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 128], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0244399Z // end inline asm 2026-02-21T09:08:09.0244775Z // begin inline asm 2026-02-21T09:08:09.0245671Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 144], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0246620Z // end inline asm 2026-02-21T09:08:09.0246933Z // begin inline asm 2026-02-21T09:08:09.0247880Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 160], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0248833Z // end inline asm 2026-02-21T09:08:09.0249141Z // begin inline asm 2026-02-21T09:08:09.0249998Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 176], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, 
%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0250948Z // end inline asm 2026-02-21T09:08:09.0251244Z // begin inline asm 2026-02-21T09:08:09.0252158Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 192], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0253112Z // end inline asm 2026-02-21T09:08:09.0253402Z // begin inline asm 2026-02-21T09:08:09.0254280Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 208], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0255349Z // end inline asm 2026-02-21T09:08:09.0255652Z // begin inline asm 2026-02-21T09:08:09.0256519Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 224], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0257471Z // end inline asm 2026-02-21T09:08:09.0257775Z // begin inline asm 2026-02-21T09:08:09.0258640Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 240], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0259579Z // end inline asm 2026-02-21T09:08:09.0259867Z // begin inline asm 2026-02-21T09:08:09.0260732Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 256], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0261711Z // end inline asm 2026-02-21T09:08:09.0261999Z // begin inline asm 2026-02-21T09:08:09.0262865Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 272], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0263795Z // end inline asm 2026-02-21T09:08:09.0264090Z // begin inline asm 2026-02-21T09:08:09.0265019Z @%p40 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 288], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0265952Z // end inline asm 2026-02-21T09:08:09.0266254Z // begin inline asm 2026-02-21T09:08:09.0267100Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 304], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0268015Z // end inline asm 2026-02-21T09:08:09.0268305Z // begin inline asm 2026-02-21T09:08:09.0269155Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 320], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0270270Z // end inline asm 2026-02-21T09:08:09.0270556Z // begin inline asm 2026-02-21T09:08:09.0271426Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 336], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0272487Z // end inline asm 2026-02-21T09:08:09.0272788Z // begin inline asm 2026-02-21T09:08:09.0273640Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 352], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0274582Z // end inline asm 2026-02-21T09:08:09.0274994Z // begin inline asm 2026-02-21T09:08:09.0275857Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 368], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0276794Z // end inline asm 2026-02-21T09:08:09.0277182Z // begin inline asm 2026-02-21T09:08:09.0278045Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 384], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0278988Z 
// end inline asm 2026-02-21T09:08:09.0279285Z // begin inline asm 2026-02-21T09:08:09.0280141Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 400], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0281157Z // end inline asm 2026-02-21T09:08:09.0281466Z // begin inline asm 2026-02-21T09:08:09.0282323Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 416], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0283271Z // end inline asm 2026-02-21T09:08:09.0283562Z // begin inline asm 2026-02-21T09:08:09.0284416Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 432], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0285444Z // end inline asm 2026-02-21T09:08:09.0285738Z // begin inline asm 2026-02-21T09:08:09.0286582Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 448], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0287523Z // end inline asm 2026-02-21T09:08:09.0287819Z // begin inline asm 2026-02-21T09:08:09.0288671Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 464], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0289601Z // end inline asm 2026-02-21T09:08:09.0289894Z // begin inline asm 2026-02-21T09:08:09.0290749Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 480], {%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0291680Z // end inline asm 2026-02-21T09:08:09.0291964Z // begin inline asm 2026-02-21T09:08:09.0292830Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r1369 + 496], {%r116, %r116, %r116, %r116, %r116, %r116, 
%r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116, %r116}; 2026-02-21T09:08:09.0293773Z // end inline asm 2026-02-21T09:08:09.0294052Z // begin inline asm 2026-02-21T09:08:09.0294397Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:08:09.0294869Z // end inline asm 2026-02-21T09:08:09.0295170Z bar.sync 0; 2026-02-21T09:08:09.0295758Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0296461Z add.s32 %r2845, %r106, 65568; 2026-02-21T09:08:09.0296817Z // begin inline asm 2026-02-21T09:08:09.0297192Z @%p126 mbarrier.init.shared::cta.b64 [%r2845], 1; 2026-02-21T09:08:09.0297637Z // end inline asm 2026-02-21T09:08:09.0297917Z bar.sync 0; 2026-02-21T09:08:09.0298218Z add.s32 %r676, %r106, 65576; 2026-02-21T09:08:09.0298727Z // begin inline asm 2026-02-21T09:08:09.0299114Z @%p126 mbarrier.init.shared::cta.b64 [%r676], 1; 2026-02-21T09:08:09.0299547Z // end inline asm 2026-02-21T09:08:09.0299854Z add.s32 %r677, %r106, 65536; 2026-02-21T09:08:09.0300201Z // begin inline asm 2026-02-21T09:08:09.0300661Z @%p126 mbarrier.init.shared::cta.b64 [%r677], 1; 2026-02-21T09:08:09.0301100Z // end inline asm 2026-02-21T09:08:09.0301373Z bar.sync 0; 2026-02-21T09:08:09.0301662Z add.s32 %r678, %r106, 65544; 2026-02-21T09:08:09.0301998Z // begin inline asm 2026-02-21T09:08:09.0302371Z @%p126 mbarrier.init.shared::cta.b64 [%r678], 1; 2026-02-21T09:08:09.0302797Z // end inline asm 2026-02-21T09:08:09.0303088Z bar.sync 0; 2026-02-21T09:08:09.0303370Z add.s32 %r679, %r106, 65552; 2026-02-21T09:08:09.0303715Z // begin inline asm 2026-02-21T09:08:09.0304083Z @%p126 mbarrier.init.shared::cta.b64 [%r679], 1; 2026-02-21T09:08:09.0304502Z // end inline asm 2026-02-21T09:08:09.0305016Z bar.sync 0; 2026-02-21T09:08:09.0305305Z add.s32 %r760, %r106, 65560; 2026-02-21T09:08:09.0305649Z // begin inline asm 2026-02-21T09:08:09.0306019Z @%p126 mbarrier.init.shared::cta.b64 [%r760], 1; 2026-02-21T09:08:09.0306458Z // end inline asm 
2026-02-21T09:08:09.0306734Z bar.sync 0; 2026-02-21T09:08:09.0307017Z // begin inline asm 2026-02-21T09:08:09.0307445Z @%p126 mbarrier.arrive.expect_tx.shared.b64 _, [%r677], 16384; 2026-02-21T09:08:09.0307957Z // end inline asm 2026-02-21T09:08:09.0308646Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0309326Z // begin inline asm 2026-02-21T09:08:09.0309673Z fence.proxy.async.shared::cta; 2026-02-21T09:08:09.0310037Z // end inline asm 2026-02-21T09:08:09.0310318Z bar.sync 0; 2026-02-21T09:08:09.0310614Z elect.sync %r742|%p88, -1; 2026-02-21T09:08:09.0310984Z and.pred %p79, %p1, %p88; 2026-02-21T09:08:09.0311325Z // begin inline asm 2026-02-21T09:08:09.0312116Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r106], [%rd56, {%r116, %r763}], [%r677]; 2026-02-21T09:08:09.0312961Z // end inline asm 2026-02-21T09:08:09.0313530Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 2026-02-21T09:08:09.0314211Z bar.sync 0; 2026-02-21T09:08:09.0314500Z elect.sync %r743|%p89, -1; 2026-02-21T09:08:09.0314926Z and.pred %p80, %p1, %p89; 2026-02-21T09:08:09.0315279Z add.s32 %r686, %r106, 32768; 2026-02-21T09:08:09.0315630Z // begin inline asm 2026-02-21T09:08:09.0316394Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r686], [%rd57, {%r116, %r767}], [%r677]; 2026-02-21T09:08:09.0317224Z // end inline asm 2026-02-21T09:08:09.0317818Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0318488Z bar.sync 0; 2026-02-21T09:08:09.0318781Z // begin inline asm 2026-02-21T09:08:09.0319226Z @%p126 mbarrier.arrive.expect_tx.shared.b64 _, [%r678], 16384; 2026-02-21T09:08:09.0319750Z // end inline asm 2026-02-21T09:08:09.0320343Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0321009Z bar.sync 0; 2026-02-21T09:08:09.0321318Z elect.sync %r744|%p90, 
-1; 2026-02-21T09:08:09.0321680Z and.pred %p82, %p1, %p90; 2026-02-21T09:08:09.0322040Z add.s32 %r691, %r106, 8192; 2026-02-21T09:08:09.0322384Z // begin inline asm 2026-02-21T09:08:09.0323153Z @%p82 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r691], [%rd56, {%r109, %r763}], [%r678]; 2026-02-21T09:08:09.0323986Z // end inline asm 2026-02-21T09:08:09.0324591Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 2026-02-21T09:08:09.0325346Z bar.sync 0; 2026-02-21T09:08:09.0325641Z elect.sync %r745|%p91, -1; 2026-02-21T09:08:09.0326016Z and.pred %p83, %p1, %p91; 2026-02-21T09:08:09.0326359Z add.s32 %r695, %r106, 40960; 2026-02-21T09:08:09.0326841Z // begin inline asm 2026-02-21T09:08:09.0327597Z @%p83 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r695], [%rd57, {%r109, %r767}], [%r678]; 2026-02-21T09:08:09.0328433Z // end inline asm 2026-02-21T09:08:09.0329103Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0329758Z bar.sync 0; 2026-02-21T09:08:09.0330040Z // begin inline asm 2026-02-21T09:08:09.0330479Z @%p126 mbarrier.arrive.expect_tx.shared.b64 _, [%r679], 16384; 2026-02-21T09:08:09.0331003Z // end inline asm 2026-02-21T09:08:09.0331584Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0332258Z bar.sync 0; 2026-02-21T09:08:09.0332543Z elect.sync %r746|%p92, -1; 2026-02-21T09:08:09.0332910Z and.pred %p85, %p1, %p92; 2026-02-21T09:08:09.0333267Z add.s32 %r700, %r106, 16384; 2026-02-21T09:08:09.0333683Z mov.b32 %r701, 32; 2026-02-21T09:08:09.0334004Z // begin inline asm 2026-02-21T09:08:09.0334821Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r700], [%rd56, {%r701, %r763}], [%r679]; 2026-02-21T09:08:09.0335659Z // end inline asm 2026-02-21T09:08:09.0336237Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 
2026-02-21T09:08:09.0336911Z bar.sync 0; 2026-02-21T09:08:09.0337216Z elect.sync %r747|%p93, -1; 2026-02-21T09:08:09.0337644Z and.pred %p86, %p1, %p93; 2026-02-21T09:08:09.0338006Z add.s32 %r704, %r106, 49152; 2026-02-21T09:08:09.0338342Z // begin inline asm 2026-02-21T09:08:09.0339098Z @%p86 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r704], [%rd57, {%r701, %r767}], [%r679]; 2026-02-21T09:08:09.0339931Z // end inline asm 2026-02-21T09:08:09.0340512Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0341168Z bar.sync 0; 2026-02-21T09:08:09.0341451Z // begin inline asm 2026-02-21T09:08:09.0341746Z 2026-02-21T09:08:09.0341980Z { 2026-02-21T09:08:09.0342250Z .reg .pred complete; 2026-02-21T09:08:09.0342555Z waitLoop: 2026-02-21T09:08:09.0342978Z mbarrier.try_wait.parity.shared.b64 complete, [%r677], %r116; 2026-02-21T09:08:09.0343532Z @!complete bra.uni waitLoop; 2026-02-21T09:08:09.0343876Z } 2026-02-21T09:08:09.0344023Z 2026-02-21T09:08:09.0344134Z // end inline asm 2026-02-21T09:08:09.0344785Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0345486Z setp.ne.b32 %p94, %r89, 0; 2026-02-21T09:08:09.0345832Z @%p94 bra $L__BB0_3; 2026-02-21T09:08:09.0346141Z // %bb.2: 2026-02-21T09:08:09.0346683Z .loc 1 0 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:0:52 2026-02-21T09:08:09.0347366Z add.s32 %r753, %r106, 4096; 2026-02-21T09:08:09.0347718Z bfe.u32 %r754, %r753, 4, 14; 2026-02-21T09:08:09.0348079Z cvt.u64.u32 %rd53, %r754; 2026-02-21T09:08:09.0348448Z or.b64 %rd50, %rd53, -4611685949674356736; 2026-02-21T09:08:09.0348870Z bfe.u32 %r756, %r686, 4, 14; 2026-02-21T09:08:09.0349222Z cvt.u64.u32 %rd54, %r756; 2026-02-21T09:08:09.0349592Z or.b64 %rd49, %rd54, -4611685949674356736; 2026-02-21T09:08:09.0350009Z bfe.u32 %r757, %r106, 4, 14; 2026-02-21T09:08:09.0350353Z cvt.u64.u32 %rd55, %r757; 2026-02-21T09:08:09.0350720Z or.b64 
%rd48, %rd55, -4611685949674356736; 2026-02-21T09:08:09.0351394Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0352106Z elect.sync %r758|%p96, -1; 2026-02-21T09:08:09.0352452Z mov.b32 %r749, 138412048; 2026-02-21T09:08:09.0352790Z mov.pred %p95, 0; 2026-02-21T09:08:09.0353095Z // begin inline asm 2026-02-21T09:08:09.0353615Z @%p96 tcgen05.mma.cta_group::1.kind::f16 [ %r2843 + 0 ], %rd48, %rd49, %r749, %p95; 2026-02-21T09:08:09.0354224Z // end inline asm 2026-02-21T09:08:09.0354518Z // begin inline asm 2026-02-21T09:08:09.0355282Z @%p96 tcgen05.mma.cta_group::1.kind::f16 [ %r2843 + 256 ], %rd50, %rd49, %r749, %p95; 2026-02-21T09:08:09.0355875Z // end inline asm 2026-02-21T09:08:09.0356184Z add.s32 %r759, %r106, 65568; 2026-02-21T09:08:09.0356529Z cvt.u64.u32 %rd52, %r759; 2026-02-21T09:08:09.0356953Z // begin inline asm 2026-02-21T09:08:09.0357442Z @%p96 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd52]; 2026-02-21T09:08:09.0357970Z // end inline asm 2026-02-21T09:08:09.0358259Z $L__BB0_3: 2026-02-21T09:08:09.0358808Z .loc 1 0 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:0:52 2026-02-21T09:08:09.0359537Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:08:09.0359973Z add.s32 %r6, %r106, %r717; 2026-02-21T09:08:09.0360330Z add.s32 %r7, %r106, %r718; 2026-02-21T09:08:09.0360654Z add.s32 %r8, %r106, %r719; 2026-02-21T09:08:09.0360995Z add.s32 %r9, %r106, %r720; 2026-02-21T09:08:09.0361417Z add.s32 %r10, %r106, %r721; 2026-02-21T09:08:09.0361770Z add.s32 %r11, %r106, %r722; 2026-02-21T09:08:09.0362116Z add.s32 %r12, %r106, %r723; 2026-02-21T09:08:09.0362457Z add.s32 %r13, %r106, %r724; 2026-02-21T09:08:09.0362799Z add.s32 %r1379, %r1374, 256; 2026-02-21T09:08:09.0363139Z add.s32 %r1384, %r1374, 512; 2026-02-21T09:08:09.0363496Z add.s32 %r1389, %r1374, 768; 2026-02-21T09:08:09.0363837Z add.s32 %r1394, %r1374, 1024; 2026-02-21T09:08:09.0364198Z add.s32 %r1399, %r1374, 1280; 
2026-02-21T09:08:09.0364599Z add.s32 %r1404, %r1374, 1536; 2026-02-21T09:08:09.0365442Z add.s32 %r1409, %r1374, 1792; 2026-02-21T09:08:09.0365799Z or.b32 %r23, %r767, %r711; 2026-02-21T09:08:09.0366136Z or.b32 %r26, %r25, 4; 2026-02-21T09:08:09.0366455Z or.b32 %r27, %r25, 8; 2026-02-21T09:08:09.0366764Z or.b32 %r28, %r25, 12; 2026-02-21T09:08:09.0367093Z or.b32 %r29, %r25, 16; 2026-02-21T09:08:09.0367447Z or.b32 %r30, %r25, 20; 2026-02-21T09:08:09.0367801Z or.b32 %r31, %r25, 24; 2026-02-21T09:08:09.0368144Z or.b32 %r32, %r25, 28; 2026-02-21T09:08:09.0368497Z or.b32 %r33, %r25, 32; 2026-02-21T09:08:09.0368840Z or.b32 %r34, %r25, 36; 2026-02-21T09:08:09.0369178Z or.b32 %r35, %r25, 40; 2026-02-21T09:08:09.0369460Z or.b32 %r36, %r25, 44; 2026-02-21T09:08:09.0369768Z or.b32 %r37, %r25, 48; 2026-02-21T09:08:09.0370101Z or.b32 %r38, %r25, 52; 2026-02-21T09:08:09.0370418Z or.b32 %r39, %r25, 56; 2026-02-21T09:08:09.0370719Z or.b32 %r40, %r25, 60; 2026-02-21T09:08:09.0370996Z or.b32 %r41, %r25, 64; 2026-02-21T09:08:09.0371307Z or.b32 %r42, %r25, 68; 2026-02-21T09:08:09.0371579Z or.b32 %r43, %r25, 72; 2026-02-21T09:08:09.0371898Z or.b32 %r44, %r25, 76; 2026-02-21T09:08:09.0372199Z or.b32 %r45, %r25, 80; 2026-02-21T09:08:09.0372513Z or.b32 %r46, %r25, 84; 2026-02-21T09:08:09.0372826Z or.b32 %r47, %r25, 88; 2026-02-21T09:08:09.0373133Z or.b32 %r48, %r25, 92; 2026-02-21T09:08:09.0373450Z or.b32 %r49, %r25, 96; 2026-02-21T09:08:09.0373767Z or.b32 %r50, %r25, 100; 2026-02-21T09:08:09.0374101Z or.b32 %r51, %r25, 104; 2026-02-21T09:08:09.0374410Z or.b32 %r52, %r25, 108; 2026-02-21T09:08:09.0374838Z or.b32 %r53, %r25, 112; 2026-02-21T09:08:09.0375158Z or.b32 %r54, %r25, 116; 2026-02-21T09:08:09.0375491Z or.b32 %r55, %r25, 120; 2026-02-21T09:08:09.0375802Z or.b32 %r56, %r25, 124; 2026-02-21T09:08:09.0376126Z or.b32 %r57, %r25, 128; 2026-02-21T09:08:09.0376443Z or.b32 %r58, %r25, 132; 2026-02-21T09:08:09.0376749Z or.b32 %r59, %r25, 136; 2026-02-21T09:08:09.0377049Z or.b32 %r60, %r25, 
140; 2026-02-21T09:08:09.0377363Z or.b32 %r61, %r25, 144; 2026-02-21T09:08:09.0377628Z or.b32 %r62, %r25, 148; 2026-02-21T09:08:09.0377903Z or.b32 %r63, %r25, 152; 2026-02-21T09:08:09.0378187Z or.b32 %r64, %r25, 156; 2026-02-21T09:08:09.0378472Z or.b32 %r65, %r25, 160; 2026-02-21T09:08:09.0378765Z or.b32 %r66, %r25, 164; 2026-02-21T09:08:09.0379052Z or.b32 %r67, %r25, 168; 2026-02-21T09:08:09.0379344Z or.b32 %r68, %r25, 172; 2026-02-21T09:08:09.0379635Z or.b32 %r69, %r25, 176; 2026-02-21T09:08:09.0380080Z or.b32 %r70, %r25, 180; 2026-02-21T09:08:09.0380377Z or.b32 %r71, %r25, 184; 2026-02-21T09:08:09.0380657Z or.b32 %r72, %r25, 188; 2026-02-21T09:08:09.0380957Z or.b32 %r73, %r25, 192; 2026-02-21T09:08:09.0381238Z or.b32 %r74, %r25, 196; 2026-02-21T09:08:09.0381606Z or.b32 %r75, %r25, 200; 2026-02-21T09:08:09.0381894Z or.b32 %r76, %r25, 204; 2026-02-21T09:08:09.0382190Z or.b32 %r77, %r25, 208; 2026-02-21T09:08:09.0382493Z or.b32 %r78, %r25, 212; 2026-02-21T09:08:09.0382840Z or.b32 %r79, %r25, 216; 2026-02-21T09:08:09.0383197Z or.b32 %r80, %r25, 220; 2026-02-21T09:08:09.0383538Z or.b32 %r81, %r25, 224; 2026-02-21T09:08:09.0383846Z or.b32 %r82, %r25, 228; 2026-02-21T09:08:09.0384139Z or.b32 %r83, %r25, 232; 2026-02-21T09:08:09.0384435Z or.b32 %r84, %r25, 236; 2026-02-21T09:08:09.0384852Z or.b32 %r85, %r25, 240; 2026-02-21T09:08:09.0385152Z or.b32 %r86, %r25, 244; 2026-02-21T09:08:09.0385430Z or.b32 %r87, %r25, 248; 2026-02-21T09:08:09.0385855Z or.b32 %r88, %r25, 252; 2026-02-21T09:08:09.0386395Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0387059Z bar.sync 0; 2026-02-21T09:08:09.0387350Z // begin inline asm 2026-02-21T09:08:09.0387794Z @%p126 mbarrier.arrive.expect_tx.shared.b64 _, [%r760], 16384; 2026-02-21T09:08:09.0388310Z // end inline asm 2026-02-21T09:08:09.0388876Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0389639Z bar.sync 0; 
2026-02-21T09:08:09.0389943Z elect.sync %r774|%p104, -1; 2026-02-21T09:08:09.0390323Z and.pred %p101, %p1, %p104; 2026-02-21T09:08:09.0390666Z add.s32 %r761, %r106, 24576; 2026-02-21T09:08:09.0391006Z mov.b32 %r762, 48; 2026-02-21T09:08:09.0391309Z // begin inline asm 2026-02-21T09:08:09.0392083Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r761], [%rd56, {%r762, %r763}], [%r760]; 2026-02-21T09:08:09.0392935Z // end inline asm 2026-02-21T09:08:09.0393493Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 2026-02-21T09:08:09.0394147Z bar.sync 0; 2026-02-21T09:08:09.0394425Z elect.sync %r775|%p105, -1; 2026-02-21T09:08:09.0394886Z and.pred %p102, %p1, %p105; 2026-02-21T09:08:09.0395245Z add.s32 %r765, %r106, 57344; 2026-02-21T09:08:09.0395567Z // begin inline asm 2026-02-21T09:08:09.0396340Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r765], [%rd57, {%r762, %r767}], [%r760]; 2026-02-21T09:08:09.0397178Z // end inline asm 2026-02-21T09:08:09.0397473Z mov.b32 %r2849, 1; 2026-02-21T09:08:09.0397762Z mov.b32 %r2848, 3; 2026-02-21T09:08:09.0398049Z mov.b32 %r2844, 0; 2026-02-21T09:08:09.0398338Z mov.b32 %r2846, %r2844; 2026-02-21T09:08:09.0398653Z mov.b32 %r2847, %r2844; 2026-02-21T09:08:09.0398965Z mov.b32 %r2850, %r2844; 2026-02-21T09:08:09.0399278Z mov.b32 %r2851, %r2844; 2026-02-21T09:08:09.0399603Z bra.uni $L__BB0_4; 2026-02-21T09:08:09.0400003Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:08:09.0400765Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0401446Z setp.lt.u32 %p116, %r2851, 1984; 2026-02-21T09:08:09.0402023Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0402614Z // begin inline asm 2026-02-21T09:08:09.0402879Z 2026-02-21T09:08:09.0403094Z { 2026-02-21T09:08:09.0403326Z .reg .pred complete; 2026-02-21T09:08:09.0403623Z waitLoop: 
2026-02-21T09:08:09.0404026Z mbarrier.try_wait.parity.shared.b64 complete, [%r2845], %r2844; 2026-02-21T09:08:09.0404556Z @!complete bra.uni waitLoop; 2026-02-21T09:08:09.0404959Z } 2026-02-21T09:08:09.0405096Z 2026-02-21T09:08:09.0405204Z // end inline asm 2026-02-21T09:08:09.0405467Z add.s32 %r809, %r2849, 1; 2026-02-21T09:08:09.0405791Z setp.gt.s32 %p119, %r809, 1; 2026-02-21T09:08:09.0406288Z selp.b32 %r2849, 0, %r809, %p119; 2026-02-21T09:08:09.0406642Z selp.b32 %r810, 1, 0, %p119; 2026-02-21T09:08:09.0406983Z xor.b32 %r2850, %r819, %r810; 2026-02-21T09:08:09.0407578Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0408365Z add.s32 %r811, %r2848, 1; 2026-02-21T09:08:09.0408689Z setp.gt.s32 %p120, %r811, 3; 2026-02-21T09:08:09.0409043Z selp.b32 %r2848, 0, %r811, %p120; 2026-02-21T09:08:09.0409403Z shl.b32 %r812, %r2848, 3; 2026-02-21T09:08:09.0409724Z add.s32 %r814, %r106, %r812; 2026-02-21T09:08:09.0410038Z add.s32 %r804, %r814, 65536; 2026-02-21T09:08:09.0410358Z bar.sync 0; 2026-02-21T09:08:09.0410645Z and.pred %p113, %p126, %p116; 2026-02-21T09:08:09.0410976Z // begin inline asm 2026-02-21T09:08:09.0411404Z @%p113 mbarrier.arrive.expect_tx.shared.b64 _, [%r804], 16384; 2026-02-21T09:08:09.0411891Z // end inline asm 2026-02-21T09:08:09.0412544Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0413212Z shl.b32 %r815, %r2848, 13; 2026-02-21T09:08:09.0413555Z add.s32 %r801, %r106, %r815; 2026-02-21T09:08:09.0413873Z bar.sync 0; 2026-02-21T09:08:09.0414170Z elect.sync %r816|%p121, -1; 2026-02-21T09:08:09.0414523Z and.pred %p122, %p116, %p121; 2026-02-21T09:08:09.0414956Z and.pred %p114, %p1, %p122; 2026-02-21T09:08:09.0415281Z add.s32 %r802, %r2851, 64; 2026-02-21T09:08:09.0415654Z // begin inline asm 2026-02-21T09:08:09.0416401Z @%p114 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r801], [%rd56, {%r802, %r763}], [%r804]; 
2026-02-21T09:08:09.0417194Z // end inline asm 2026-02-21T09:08:09.0417739Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 2026-02-21T09:08:09.0418372Z add.s32 %r805, %r801, 32768; 2026-02-21T09:08:09.0418675Z bar.sync 0; 2026-02-21T09:08:09.0418963Z elect.sync %r817|%p123, -1; 2026-02-21T09:08:09.0419298Z and.pred %p124, %p116, %p123; 2026-02-21T09:08:09.0419644Z and.pred %p115, %p1, %p124; 2026-02-21T09:08:09.0419965Z // begin inline asm 2026-02-21T09:08:09.0420694Z @%p115 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r805], [%rd57, {%r802, %r767}], [%r804]; 2026-02-21T09:08:09.0421489Z // end inline asm 2026-02-21T09:08:09.0422041Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0422713Z setp.lt.u32 %p125, %r2851, 2016; 2026-02-21T09:08:09.0423067Z add.s32 %r2851, %r2851, 16; 2026-02-21T09:08:09.0423397Z mov.b32 %r2844, %r819; 2026-02-21T09:08:09.0423705Z mov.b32 %r2845, %r818; 2026-02-21T09:08:09.0424019Z @%p125 bra $L__BB0_4; 2026-02-21T09:08:09.0424318Z bra.uni $L__BB0_7; 2026-02-21T09:08:09.0424806Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:08:09.0425502Z .loc 1 0 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:0:57 2026-02-21T09:08:09.0426123Z mov.b32 %r819, %r2850; 2026-02-21T09:08:09.0426702Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0427346Z add.s32 %r778, %r2847, 1; 2026-02-21T09:08:09.0427685Z setp.gt.s32 %p107, %r778, 3; 2026-02-21T09:08:09.0428021Z selp.b32 %r2847, 0, %r778, %p107; 2026-02-21T09:08:09.0428387Z selp.b32 %r779, 1, 0, %p107; 2026-02-21T09:08:09.0428725Z xor.b32 %r2846, %r2846, %r779; 2026-02-21T09:08:09.0429070Z shl.b32 %r780, %r2847, 3; 2026-02-21T09:08:09.0429393Z add.s32 %r782, %r106, %r780; 2026-02-21T09:08:09.0429719Z add.s32 %r776, %r782, 65536; 2026-02-21T09:08:09.0430040Z bar.sync 0; 2026-02-21T09:08:09.0430311Z // begin inline 
asm 2026-02-21T09:08:09.0430606Z 2026-02-21T09:08:09.0430823Z { 2026-02-21T09:08:09.0431078Z .reg .pred complete; 2026-02-21T09:08:09.0431369Z waitLoop: 2026-02-21T09:08:09.0431778Z mbarrier.try_wait.parity.shared.b64 complete, [%r776], %r2846; 2026-02-21T09:08:09.0432437Z @!complete bra.uni waitLoop; 2026-02-21T09:08:09.0432762Z } 2026-02-21T09:08:09.0432900Z 2026-02-21T09:08:09.0433026Z // end inline asm 2026-02-21T09:08:09.0433333Z shl.b32 %r783, %r2849, 3; 2026-02-21T09:08:09.0433741Z add.s32 %r784, %r106, %r783; 2026-02-21T09:08:09.0434065Z add.s32 %r818, %r784, 65568; 2026-02-21T09:08:09.0434785Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0435394Z @%p94 bra $L__BB0_6; 2026-02-21T09:08:09.0435775Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:08:09.0436489Z .loc 1 54 31 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:54:31 2026-02-21T09:08:09.0437144Z shl.b32 %r789, %r2847, 13; 2026-02-21T09:08:09.0437474Z add.s32 %r791, %r106, %r789; 2026-02-21T09:08:09.0438134Z .loc 1 55 44 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:55:44 2026-02-21T09:08:09.0438794Z add.s32 %r792, %r791, 32768; 2026-02-21T09:08:09.0439375Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0440029Z elect.sync %r793|%p109, -1; 2026-02-21T09:08:09.0440380Z bfe.u32 %r794, %r791, 4, 14; 2026-02-21T09:08:09.0440717Z cvt.u64.u32 %rd63, %r794; 2026-02-21T09:08:09.0441076Z or.b64 %rd58, %rd63, -4611685949674356736; 2026-02-21T09:08:09.0441542Z bfe.u32 %r795, %r792, 4, 14; 2026-02-21T09:08:09.0441878Z cvt.u64.u32 %rd64, %r795; 2026-02-21T09:08:09.0442229Z or.b64 %rd59, %rd64, -4611685949674356736; 2026-02-21T09:08:09.0442626Z mov.b32 %r786, 138412048; 2026-02-21T09:08:09.0442945Z mov.pred %p108, -1; 2026-02-21T09:08:09.0443258Z // begin inline asm 2026-02-21T09:08:09.0443758Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r2843 + 0 ], %rd58, %rd59, %r786, %p108; 
2026-02-21T09:08:09.0444335Z // end inline asm 2026-02-21T09:08:09.0444629Z add.s32 %r796, %r791, 4096; 2026-02-21T09:08:09.0445029Z bfe.u32 %r797, %r796, 4, 14; 2026-02-21T09:08:09.0445343Z cvt.u64.u32 %rd65, %r797; 2026-02-21T09:08:09.0445670Z or.b64 %rd60, %rd65, -4611685949674356736; 2026-02-21T09:08:09.0446057Z // begin inline asm 2026-02-21T09:08:09.0446562Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r2843 + 256 ], %rd60, %rd59, %r786, %p108; 2026-02-21T09:08:09.0447145Z // end inline asm 2026-02-21T09:08:09.0447432Z cvt.u64.u32 %rd62, %r818; 2026-02-21T09:08:09.0447755Z // begin inline asm 2026-02-21T09:08:09.0448209Z @%p109 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd62]; 2026-02-21T09:08:09.0448721Z // end inline asm 2026-02-21T09:08:09.0449009Z bra.uni $L__BB0_6; 2026-02-21T09:08:09.0449375Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:08:09.0450098Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0450747Z // begin inline asm 2026-02-21T09:08:09.0451038Z 2026-02-21T09:08:09.0451261Z { 2026-02-21T09:08:09.0451515Z .reg .pred complete; 2026-02-21T09:08:09.0451805Z waitLoop: 2026-02-21T09:08:09.0452199Z mbarrier.try_wait.parity.shared.b64 complete, [%r818], %r819; 2026-02-21T09:08:09.0452709Z @!complete bra.uni waitLoop; 2026-02-21T09:08:09.0453018Z } 2026-02-21T09:08:09.0453143Z 2026-02-21T09:08:09.0453258Z // end inline asm 2026-02-21T09:08:09.0453792Z .loc 1 50 57 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:50:57 2026-02-21T09:08:09.0454416Z bar.sync 0; 2026-02-21T09:08:09.0454761Z // begin inline asm 2026-02-21T09:08:09.0455116Z @%p126 mbarrier.inval.shared::cta.b64 [%r677]; 2026-02-21T09:08:09.0455507Z // end inline asm 2026-02-21T09:08:09.0455774Z bar.sync 0; 2026-02-21T09:08:09.0456048Z // begin inline asm 2026-02-21T09:08:09.0456386Z @%p126 mbarrier.inval.shared::cta.b64 [%r678]; 2026-02-21T09:08:09.0456789Z // end inline asm 2026-02-21T09:08:09.0457053Z bar.sync 0; 
2026-02-21T09:08:09.0457470Z // begin inline asm 2026-02-21T09:08:09.0457807Z @%p126 mbarrier.inval.shared::cta.b64 [%r679]; 2026-02-21T09:08:09.0458212Z // end inline asm 2026-02-21T09:08:09.0458477Z bar.sync 0; 2026-02-21T09:08:09.0458739Z // begin inline asm 2026-02-21T09:08:09.0459163Z @%p126 mbarrier.inval.shared::cta.b64 [%r760]; 2026-02-21T09:08:09.0459552Z // end inline asm 2026-02-21T09:08:09.0459839Z add.s32 %r824, %r106, 65568; 2026-02-21T09:08:09.0460150Z // begin inline asm 2026-02-21T09:08:09.0460491Z @%p126 mbarrier.inval.shared::cta.b64 [%r824]; 2026-02-21T09:08:09.0460881Z // end inline asm 2026-02-21T09:08:09.0461151Z bar.sync 0; 2026-02-21T09:08:09.0461407Z // begin inline asm 2026-02-21T09:08:09.0461749Z @%p126 mbarrier.inval.shared::cta.b64 [%r676]; 2026-02-21T09:08:09.0462134Z // end inline asm 2026-02-21T09:08:09.0462684Z .loc 1 59 45 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:59:45 2026-02-21T09:08:09.0463426Z shl.b32 %r1947, %r25, 11; 2026-02-21T09:08:09.0463757Z shl.b32 %r1948, %r26, 11; 2026-02-21T09:08:09.0464081Z shl.b32 %r1949, %r27, 11; 2026-02-21T09:08:09.0464390Z shl.b32 %r1950, %r28, 11; 2026-02-21T09:08:09.0464798Z shl.b32 %r1951, %r29, 11; 2026-02-21T09:08:09.0465082Z shl.b32 %r1952, %r30, 11; 2026-02-21T09:08:09.0465381Z shl.b32 %r1953, %r31, 11; 2026-02-21T09:08:09.0465661Z shl.b32 %r1954, %r32, 11; 2026-02-21T09:08:09.0465963Z shl.b32 %r1955, %r33, 11; 2026-02-21T09:08:09.0466280Z shl.b32 %r1956, %r34, 11; 2026-02-21T09:08:09.0466650Z shl.b32 %r1957, %r35, 11; 2026-02-21T09:08:09.0466965Z shl.b32 %r1958, %r36, 11; 2026-02-21T09:08:09.0467271Z shl.b32 %r1959, %r37, 11; 2026-02-21T09:08:09.0467582Z shl.b32 %r1960, %r38, 11; 2026-02-21T09:08:09.0467883Z shl.b32 %r1961, %r39, 11; 2026-02-21T09:08:09.0468196Z shl.b32 %r1962, %r40, 11; 2026-02-21T09:08:09.0468499Z shl.b32 %r1963, %r41, 11; 2026-02-21T09:08:09.0468806Z shl.b32 %r1964, %r42, 11; 2026-02-21T09:08:09.0469112Z shl.b32 %r1965, %r43, 11; 
2026-02-21T09:08:09.0469411Z shl.b32 %r1966, %r44, 11; 2026-02-21T09:08:09.0469681Z shl.b32 %r1967, %r45, 11; 2026-02-21T09:08:09.0469999Z shl.b32 %r1968, %r46, 11; 2026-02-21T09:08:09.0470316Z shl.b32 %r1969, %r47, 11; 2026-02-21T09:08:09.0470625Z shl.b32 %r1970, %r48, 11; 2026-02-21T09:08:09.0470946Z shl.b32 %r1971, %r49, 11; 2026-02-21T09:08:09.0471251Z shl.b32 %r1972, %r50, 11; 2026-02-21T09:08:09.0471567Z shl.b32 %r1973, %r51, 11; 2026-02-21T09:08:09.0471881Z shl.b32 %r1974, %r52, 11; 2026-02-21T09:08:09.0472206Z shl.b32 %r1975, %r53, 11; 2026-02-21T09:08:09.0472527Z shl.b32 %r1976, %r54, 11; 2026-02-21T09:08:09.0472853Z shl.b32 %r1977, %r55, 11; 2026-02-21T09:08:09.0473172Z shl.b32 %r1978, %r56, 11; 2026-02-21T09:08:09.0473482Z shl.b32 %r1979, %r57, 11; 2026-02-21T09:08:09.0473799Z shl.b32 %r1980, %r58, 11; 2026-02-21T09:08:09.0474109Z shl.b32 %r1981, %r59, 11; 2026-02-21T09:08:09.0474431Z shl.b32 %r1982, %r60, 11; 2026-02-21T09:08:09.0474812Z shl.b32 %r1983, %r61, 11; 2026-02-21T09:08:09.0475142Z shl.b32 %r1984, %r62, 11; 2026-02-21T09:08:09.0475459Z shl.b32 %r1985, %r63, 11; 2026-02-21T09:08:09.0475777Z shl.b32 %r1986, %r64, 11; 2026-02-21T09:08:09.0476085Z shl.b32 %r1987, %r65, 11; 2026-02-21T09:08:09.0476407Z shl.b32 %r1988, %r66, 11; 2026-02-21T09:08:09.0476726Z shl.b32 %r1989, %r67, 11; 2026-02-21T09:08:09.0477034Z shl.b32 %r1990, %r68, 11; 2026-02-21T09:08:09.0477348Z shl.b32 %r1991, %r69, 11; 2026-02-21T09:08:09.0477653Z shl.b32 %r1992, %r70, 11; 2026-02-21T09:08:09.0477976Z shl.b32 %r1993, %r71, 11; 2026-02-21T09:08:09.0478285Z shl.b32 %r1994, %r72, 11; 2026-02-21T09:08:09.0478602Z shl.b32 %r1995, %r73, 11; 2026-02-21T09:08:09.0478909Z shl.b32 %r1996, %r74, 11; 2026-02-21T09:08:09.0479225Z shl.b32 %r1997, %r75, 11; 2026-02-21T09:08:09.0479533Z shl.b32 %r1998, %r76, 11; 2026-02-21T09:08:09.0479859Z shl.b32 %r1999, %r77, 11; 2026-02-21T09:08:09.0480184Z shl.b32 %r2000, %r78, 11; 2026-02-21T09:08:09.0480493Z shl.b32 %r2001, %r79, 11; 
2026-02-21T09:08:09.0480938Z shl.b32 %r2002, %r80, 11; 2026-02-21T09:08:09.0481244Z shl.b32 %r2003, %r81, 11; 2026-02-21T09:08:09.0481564Z shl.b32 %r2004, %r82, 11; 2026-02-21T09:08:09.0481874Z shl.b32 %r2005, %r83, 11; 2026-02-21T09:08:09.0482193Z shl.b32 %r2006, %r84, 11; 2026-02-21T09:08:09.0482578Z shl.b32 %r2007, %r85, 11; 2026-02-21T09:08:09.0482899Z shl.b32 %r2008, %r86, 11; 2026-02-21T09:08:09.0483211Z shl.b32 %r2009, %r87, 11; 2026-02-21T09:08:09.0483521Z shl.b32 %r2010, %r88, 11; 2026-02-21T09:08:09.0484148Z .loc 1 59 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:59:52 2026-02-21T09:08:09.0484987Z or.b32 %r2011, %r23, %r1947; 2026-02-21T09:08:09.0485346Z or.b32 %r2012, %r23, %r1948; 2026-02-21T09:08:09.0485683Z or.b32 %r2013, %r23, %r1949; 2026-02-21T09:08:09.0486029Z or.b32 %r2014, %r23, %r1950; 2026-02-21T09:08:09.0486350Z or.b32 %r2015, %r23, %r1951; 2026-02-21T09:08:09.0486756Z or.b32 %r2016, %r23, %r1952; 2026-02-21T09:08:09.0487088Z or.b32 %r2017, %r23, %r1953; 2026-02-21T09:08:09.0487420Z or.b32 %r2018, %r23, %r1954; 2026-02-21T09:08:09.0487757Z or.b32 %r2019, %r23, %r1955; 2026-02-21T09:08:09.0488079Z or.b32 %r2020, %r23, %r1956; 2026-02-21T09:08:09.0488417Z or.b32 %r2021, %r23, %r1957; 2026-02-21T09:08:09.0488751Z or.b32 %r2022, %r23, %r1958; 2026-02-21T09:08:09.0489090Z or.b32 %r2023, %r23, %r1959; 2026-02-21T09:08:09.0489412Z or.b32 %r2024, %r23, %r1960; 2026-02-21T09:08:09.0489755Z or.b32 %r2025, %r23, %r1961; 2026-02-21T09:08:09.0490150Z or.b32 %r2026, %r23, %r1962; 2026-02-21T09:08:09.0490499Z or.b32 %r2027, %r23, %r1963; 2026-02-21T09:08:09.0490849Z or.b32 %r2028, %r23, %r1964; 2026-02-21T09:08:09.0491175Z or.b32 %r2029, %r23, %r1965; 2026-02-21T09:08:09.0491515Z or.b32 %r2030, %r23, %r1966; 2026-02-21T09:08:09.0491838Z or.b32 %r2031, %r23, %r1967; 2026-02-21T09:08:09.0492179Z or.b32 %r2032, %r23, %r1968; 2026-02-21T09:08:09.0492505Z or.b32 %r2033, %r23, %r1969; 2026-02-21T09:08:09.0492843Z or.b32 %r2034, %r23, %r1970; 
2026-02-21T09:08:09.0493170Z or.b32 %r2035, %r23, %r1971; 2026-02-21T09:08:09.0493509Z or.b32 %r2036, %r23, %r1972; 2026-02-21T09:08:09.0493838Z or.b32 %r2037, %r23, %r1973; 2026-02-21T09:08:09.0494177Z or.b32 %r2038, %r23, %r1974; 2026-02-21T09:08:09.0494520Z or.b32 %r2039, %r23, %r1975; 2026-02-21T09:08:09.0494917Z or.b32 %r2040, %r23, %r1976; 2026-02-21T09:08:09.0495254Z or.b32 %r2041, %r23, %r1977; 2026-02-21T09:08:09.0495577Z or.b32 %r2042, %r23, %r1978; 2026-02-21T09:08:09.0495924Z or.b32 %r2043, %r23, %r1979; 2026-02-21T09:08:09.0496246Z or.b32 %r2044, %r23, %r1980; 2026-02-21T09:08:09.0496585Z or.b32 %r2045, %r23, %r1981; 2026-02-21T09:08:09.0496921Z or.b32 %r2046, %r23, %r1982; 2026-02-21T09:08:09.0497251Z or.b32 %r2047, %r23, %r1983; 2026-02-21T09:08:09.0497584Z or.b32 %r2048, %r23, %r1984; 2026-02-21T09:08:09.0497911Z or.b32 %r2049, %r23, %r1985; 2026-02-21T09:08:09.0498252Z or.b32 %r2050, %r23, %r1986; 2026-02-21T09:08:09.0498579Z or.b32 %r2051, %r23, %r1987; 2026-02-21T09:08:09.0498915Z or.b32 %r2052, %r23, %r1988; 2026-02-21T09:08:09.0499247Z or.b32 %r2053, %r23, %r1989; 2026-02-21T09:08:09.0499584Z or.b32 %r2054, %r23, %r1990; 2026-02-21T09:08:09.0499905Z or.b32 %r2055, %r23, %r1991; 2026-02-21T09:08:09.0500246Z or.b32 %r2056, %r23, %r1992; 2026-02-21T09:08:09.0500572Z or.b32 %r2057, %r23, %r1993; 2026-02-21T09:08:09.0500898Z or.b32 %r2058, %r23, %r1994; 2026-02-21T09:08:09.0501230Z or.b32 %r2059, %r23, %r1995; 2026-02-21T09:08:09.0501550Z or.b32 %r2060, %r23, %r1996; 2026-02-21T09:08:09.0501891Z or.b32 %r2061, %r23, %r1997; 2026-02-21T09:08:09.0502214Z or.b32 %r2062, %r23, %r1998; 2026-02-21T09:08:09.0502545Z or.b32 %r2063, %r23, %r1999; 2026-02-21T09:08:09.0502874Z or.b32 %r2064, %r23, %r2000; 2026-02-21T09:08:09.0503208Z or.b32 %r2065, %r23, %r2001; 2026-02-21T09:08:09.0503534Z or.b32 %r2066, %r23, %r2002; 2026-02-21T09:08:09.0503873Z or.b32 %r2067, %r23, %r2003; 2026-02-21T09:08:09.0504201Z or.b32 %r2068, %r23, %r2004; 
2026-02-21T09:08:09.0504774Z or.b32 %r2069, %r23, %r2005; 2026-02-21T09:08:09.0505118Z or.b32 %r2070, %r23, %r2006; 2026-02-21T09:08:09.0505438Z or.b32 %r2071, %r23, %r2007; 2026-02-21T09:08:09.0505777Z or.b32 %r2072, %r23, %r2008; 2026-02-21T09:08:09.0506209Z or.b32 %r2073, %r23, %r2009; 2026-02-21T09:08:09.0506548Z or.b32 %r2074, %r23, %r2010; 2026-02-21T09:08:09.0507149Z .loc 1 59 24 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:59:24 2026-02-21T09:08:09.0507848Z mad.wide.u32 %rd68, %r2011, 2, %rd3; 2026-02-21T09:08:09.0508254Z mad.wide.u32 %rd69, %r2012, 2, %rd3; 2026-02-21T09:08:09.0508636Z mad.wide.u32 %rd70, %r2013, 2, %rd3; 2026-02-21T09:08:09.0509028Z mad.wide.u32 %rd71, %r2014, 2, %rd3; 2026-02-21T09:08:09.0509405Z mad.wide.u32 %rd72, %r2015, 2, %rd3; 2026-02-21T09:08:09.0509791Z mad.wide.u32 %rd73, %r2016, 2, %rd3; 2026-02-21T09:08:09.0510242Z mad.wide.u32 %rd74, %r2017, 2, %rd3; 2026-02-21T09:08:09.0510630Z mad.wide.u32 %rd75, %r2018, 2, %rd3; 2026-02-21T09:08:09.0511011Z mad.wide.u32 %rd76, %r2019, 2, %rd3; 2026-02-21T09:08:09.0511394Z mad.wide.u32 %rd77, %r2020, 2, %rd3; 2026-02-21T09:08:09.0511782Z mad.wide.u32 %rd78, %r2021, 2, %rd3; 2026-02-21T09:08:09.0512158Z mad.wide.u32 %rd79, %r2022, 2, %rd3; 2026-02-21T09:08:09.0512544Z mad.wide.u32 %rd80, %r2023, 2, %rd3; 2026-02-21T09:08:09.0512919Z mad.wide.u32 %rd81, %r2024, 2, %rd3; 2026-02-21T09:08:09.0513305Z mad.wide.u32 %rd82, %r2025, 2, %rd3; 2026-02-21T09:08:09.0513748Z mad.wide.u32 %rd83, %r2026, 2, %rd3; 2026-02-21T09:08:09.0514143Z mad.wide.u32 %rd84, %r2027, 2, %rd3; 2026-02-21T09:08:09.0514514Z mad.wide.u32 %rd85, %r2028, 2, %rd3; 2026-02-21T09:08:09.0515002Z mad.wide.u32 %rd86, %r2029, 2, %rd3; 2026-02-21T09:08:09.0515401Z mad.wide.u32 %rd87, %r2030, 2, %rd3; 2026-02-21T09:08:09.0515781Z mad.wide.u32 %rd88, %r2031, 2, %rd3; 2026-02-21T09:08:09.0516168Z mad.wide.u32 %rd89, %r2032, 2, %rd3; 2026-02-21T09:08:09.0516544Z mad.wide.u32 %rd90, %r2033, 2, %rd3; 
2026-02-21T09:08:09.0516933Z mad.wide.u32 %rd91, %r2034, 2, %rd3; 2026-02-21T09:08:09.0517306Z mad.wide.u32 %rd92, %r2035, 2, %rd3; 2026-02-21T09:08:09.0517685Z mad.wide.u32 %rd93, %r2036, 2, %rd3; 2026-02-21T09:08:09.0518063Z mad.wide.u32 %rd94, %r2037, 2, %rd3; 2026-02-21T09:08:09.0518452Z mad.wide.u32 %rd95, %r2038, 2, %rd3; 2026-02-21T09:08:09.0518838Z mad.wide.u32 %rd96, %r2039, 2, %rd3; 2026-02-21T09:08:09.0519221Z mad.wide.u32 %rd97, %r2040, 2, %rd3; 2026-02-21T09:08:09.0519616Z mad.wide.u32 %rd98, %r2041, 2, %rd3; 2026-02-21T09:08:09.0519997Z mad.wide.u32 %rd99, %r2042, 2, %rd3; 2026-02-21T09:08:09.0520402Z mad.wide.u32 %rd100, %r2043, 2, %rd3; 2026-02-21T09:08:09.0520802Z mad.wide.u32 %rd101, %r2044, 2, %rd3; 2026-02-21T09:08:09.0521206Z mad.wide.u32 %rd102, %r2045, 2, %rd3; 2026-02-21T09:08:09.0521588Z mad.wide.u32 %rd103, %r2046, 2, %rd3; 2026-02-21T09:08:09.0521987Z mad.wide.u32 %rd104, %r2047, 2, %rd3; 2026-02-21T09:08:09.0522383Z mad.wide.u32 %rd105, %r2048, 2, %rd3; 2026-02-21T09:08:09.0522769Z mad.wide.u32 %rd106, %r2049, 2, %rd3; 2026-02-21T09:08:09.0523163Z mad.wide.u32 %rd107, %r2050, 2, %rd3; 2026-02-21T09:08:09.0523553Z mad.wide.u32 %rd108, %r2051, 2, %rd3; 2026-02-21T09:08:09.0523950Z mad.wide.u32 %rd109, %r2052, 2, %rd3; 2026-02-21T09:08:09.0524330Z mad.wide.u32 %rd110, %r2053, 2, %rd3; 2026-02-21T09:08:09.0524869Z mad.wide.u32 %rd111, %r2054, 2, %rd3; 2026-02-21T09:08:09.0525250Z mad.wide.u32 %rd112, %r2055, 2, %rd3; 2026-02-21T09:08:09.0525639Z mad.wide.u32 %rd113, %r2056, 2, %rd3; 2026-02-21T09:08:09.0526025Z mad.wide.u32 %rd114, %r2057, 2, %rd3; 2026-02-21T09:08:09.0526419Z mad.wide.u32 %rd115, %r2058, 2, %rd3; 2026-02-21T09:08:09.0526808Z mad.wide.u32 %rd116, %r2059, 2, %rd3; 2026-02-21T09:08:09.0527186Z mad.wide.u32 %rd117, %r2060, 2, %rd3; 2026-02-21T09:08:09.0527576Z mad.wide.u32 %rd118, %r2061, 2, %rd3; 2026-02-21T09:08:09.0527963Z mad.wide.u32 %rd119, %r2062, 2, %rd3; 2026-02-21T09:08:09.0528483Z mad.wide.u32 %rd120, %r2063, 2, 
%rd3; 2026-02-21T09:08:09.0528858Z mad.wide.u32 %rd121, %r2064, 2, %rd3; 2026-02-21T09:08:09.0529248Z mad.wide.u32 %rd122, %r2065, 2, %rd3; 2026-02-21T09:08:09.0529623Z mad.wide.u32 %rd123, %r2066, 2, %rd3; 2026-02-21T09:08:09.0530104Z mad.wide.u32 %rd124, %r2067, 2, %rd3; 2026-02-21T09:08:09.0530494Z mad.wide.u32 %rd125, %r2068, 2, %rd3; 2026-02-21T09:08:09.0530873Z mad.wide.u32 %rd126, %r2069, 2, %rd3; 2026-02-21T09:08:09.0531255Z mad.wide.u32 %rd127, %r2070, 2, %rd3; 2026-02-21T09:08:09.0531637Z mad.wide.u32 %rd128, %r2071, 2, %rd3; 2026-02-21T09:08:09.0532026Z mad.wide.u32 %rd129, %r2072, 2, %rd3; 2026-02-21T09:08:09.0532397Z mad.wide.u32 %rd130, %r2073, 2, %rd3; 2026-02-21T09:08:09.0532786Z mad.wide.u32 %rd131, %r2074, 2, %rd3; 2026-02-21T09:08:09.0533470Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0534230Z // begin inline asm 2026-02-21T09:08:09.0535181Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841}, [%r1369 + 0]; 2026-02-21T09:08:09.0536108Z // end inline asm 2026-02-21T09:08:09.0536434Z cvt.u64.u32 %rd132, %r830; 2026-02-21T09:08:09.0536779Z cvt.u64.u32 %rd133, %r831; 2026-02-21T09:08:09.0537130Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:08:09.0537479Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:08:09.0537840Z cvt.u64.u32 %rd136, %r832; 2026-02-21T09:08:09.0538248Z cvt.u64.u32 %rd137, %r833; 2026-02-21T09:08:09.0538590Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:08:09.0538942Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:08:09.0539293Z cvt.u64.u32 %rd140, %r838; 2026-02-21T09:08:09.0539637Z cvt.u64.u32 %rd141, %r839; 2026-02-21T09:08:09.0539967Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:08:09.0540318Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:08:09.0540660Z cvt.u64.u32 %rd144, %r840; 2026-02-21T09:08:09.0541002Z cvt.u64.u32 %rd145, %r841; 2026-02-21T09:08:09.0541332Z shl.b64 %rd146, %rd145, 32; 
2026-02-21T09:08:09.0541674Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:08:09.0542025Z // begin inline asm 2026-02-21T09:08:09.0542869Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r843, %r844, %r845, %r846, %r847, %r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858}, [%r1369 + 16]; 2026-02-21T09:08:09.0543788Z // end inline asm 2026-02-21T09:08:09.0544074Z cvt.u64.u32 %rd148, %r847; 2026-02-21T09:08:09.0544420Z cvt.u64.u32 %rd149, %r848; 2026-02-21T09:08:09.0544841Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:08:09.0545199Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:08:09.0545550Z cvt.u64.u32 %rd152, %r849; 2026-02-21T09:08:09.0545885Z cvt.u64.u32 %rd153, %r850; 2026-02-21T09:08:09.0546219Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:08:09.0546558Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:08:09.0546909Z cvt.u64.u32 %rd156, %r855; 2026-02-21T09:08:09.0547242Z cvt.u64.u32 %rd157, %r856; 2026-02-21T09:08:09.0547597Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:08:09.0547945Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:08:09.0548307Z cvt.u64.u32 %rd160, %r857; 2026-02-21T09:08:09.0548645Z cvt.u64.u32 %rd161, %r858; 2026-02-21T09:08:09.0548992Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:08:09.0549342Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:08:09.0549686Z // begin inline asm 2026-02-21T09:08:09.0550540Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r860, %r861, %r862, %r863, %r864, %r865, %r866, %r867, %r868, %r869, %r870, %r871, %r872, %r873, %r874, %r875}, [%r1369 + 32]; 2026-02-21T09:08:09.0551427Z // end inline asm 2026-02-21T09:08:09.0551733Z cvt.u64.u32 %rd164, %r864; 2026-02-21T09:08:09.0552057Z cvt.u64.u32 %rd165, %r865; 2026-02-21T09:08:09.0552398Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:08:09.0552729Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:08:09.0553075Z cvt.u64.u32 %rd168, %r866; 2026-02-21T09:08:09.0553415Z cvt.u64.u32 %rd169, %r867; 2026-02-21T09:08:09.0553849Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:08:09.0554195Z or.b64 
%rd171, %rd168, %rd170; 2026-02-21T09:08:09.0554536Z cvt.u64.u32 %rd172, %r872; 2026-02-21T09:08:09.0554973Z cvt.u64.u32 %rd173, %r873; 2026-02-21T09:08:09.0555305Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:08:09.0555717Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:08:09.0556058Z cvt.u64.u32 %rd176, %r874; 2026-02-21T09:08:09.0556397Z cvt.u64.u32 %rd177, %r875; 2026-02-21T09:08:09.0556720Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:08:09.0557067Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:08:09.0557417Z // begin inline asm 2026-02-21T09:08:09.0558254Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r877, %r878, %r879, %r880, %r881, %r882, %r883, %r884, %r885, %r886, %r887, %r888, %r889, %r890, %r891, %r892}, [%r1369 + 48]; 2026-02-21T09:08:09.0559167Z // end inline asm 2026-02-21T09:08:09.0559464Z cvt.u64.u32 %rd180, %r881; 2026-02-21T09:08:09.0559804Z cvt.u64.u32 %rd181, %r882; 2026-02-21T09:08:09.0560211Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:08:09.0560558Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:08:09.0560902Z cvt.u64.u32 %rd184, %r883; 2026-02-21T09:08:09.0561251Z cvt.u64.u32 %rd185, %r884; 2026-02-21T09:08:09.0561600Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:08:09.0561937Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:08:09.0562056Z cvt.u64.u32 %rd188, %r889; 2026-02-21T09:08:09.0562186Z cvt.u64.u32 %rd189, %r890; 2026-02-21T09:08:09.0562301Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:08:09.0562490Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:08:09.0562622Z cvt.u64.u32 %rd192, %r891; 2026-02-21T09:08:09.0562739Z cvt.u64.u32 %rd193, %r892; 2026-02-21T09:08:09.0562860Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:08:09.0562978Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:08:09.0563105Z // begin inline asm 2026-02-21T09:08:09.0563755Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r894, %r895, %r896, %r897, %r898, %r899, %r900, %r901, %r902, %r903, %r904, %r905, %r906, %r907, %r908, %r909}, [%r1369 + 64]; 2026-02-21T09:08:09.0563869Z // end inline asm 
2026-02-21T09:08:09.0564001Z cvt.u64.u32 %rd196, %r898; 2026-02-21T09:08:09.0564118Z cvt.u64.u32 %rd197, %r899; 2026-02-21T09:08:09.0564233Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:08:09.0564365Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:08:09.0564481Z cvt.u64.u32 %rd200, %r900; 2026-02-21T09:08:09.0564595Z cvt.u64.u32 %rd201, %r901; 2026-02-21T09:08:09.0564794Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:08:09.0564929Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:08:09.0565048Z cvt.u64.u32 %rd204, %r906; 2026-02-21T09:08:09.0565160Z cvt.u64.u32 %rd205, %r907; 2026-02-21T09:08:09.0565289Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:08:09.0565406Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:08:09.0565521Z cvt.u64.u32 %rd208, %r908; 2026-02-21T09:08:09.0565636Z cvt.u64.u32 %rd209, %r909; 2026-02-21T09:08:09.0565770Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:08:09.0565892Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:08:09.0566010Z // begin inline asm 2026-02-21T09:08:09.0566643Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r911, %r912, %r913, %r914, %r915, %r916, %r917, %r918, %r919, %r920, %r921, %r922, %r923, %r924, %r925, %r926}, [%r1369 + 80]; 2026-02-21T09:08:09.0566757Z // end inline asm 2026-02-21T09:08:09.0566876Z cvt.u64.u32 %rd212, %r915; 2026-02-21T09:08:09.0567005Z cvt.u64.u32 %rd213, %r916; 2026-02-21T09:08:09.0567123Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:08:09.0567244Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:08:09.0567362Z cvt.u64.u32 %rd216, %r917; 2026-02-21T09:08:09.0567490Z cvt.u64.u32 %rd217, %r918; 2026-02-21T09:08:09.0567604Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:08:09.0567720Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:08:09.0567848Z cvt.u64.u32 %rd220, %r923; 2026-02-21T09:08:09.0567965Z cvt.u64.u32 %rd221, %r924; 2026-02-21T09:08:09.0568085Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:08:09.0568207Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:08:09.0568405Z cvt.u64.u32 %rd224, %r925; 2026-02-21T09:08:09.0568521Z cvt.u64.u32 
%rd225, %r926; 2026-02-21T09:08:09.0568637Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:08:09.0568765Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:08:09.0568878Z // begin inline asm 2026-02-21T09:08:09.0569585Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r928, %r929, %r930, %r931, %r932, %r933, %r934, %r935, %r936, %r937, %r938, %r939, %r940, %r941, %r942, %r943}, [%r1369 + 96]; 2026-02-21T09:08:09.0569696Z // end inline asm 2026-02-21T09:08:09.0569828Z cvt.u64.u32 %rd228, %r932; 2026-02-21T09:08:09.0569945Z cvt.u64.u32 %rd229, %r933; 2026-02-21T09:08:09.0570065Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:08:09.0570194Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:08:09.0570311Z cvt.u64.u32 %rd232, %r934; 2026-02-21T09:08:09.0570429Z cvt.u64.u32 %rd233, %r935; 2026-02-21T09:08:09.0570554Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:08:09.0570675Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:08:09.0570858Z cvt.u64.u32 %rd236, %r940; 2026-02-21T09:08:09.0570980Z cvt.u64.u32 %rd237, %r941; 2026-02-21T09:08:09.0571108Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:08:09.0571229Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:08:09.0571347Z cvt.u64.u32 %rd240, %r942; 2026-02-21T09:08:09.0571479Z cvt.u64.u32 %rd241, %r943; 2026-02-21T09:08:09.0571595Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:08:09.0571714Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:08:09.0571826Z // begin inline asm 2026-02-21T09:08:09.0572589Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r945, %r946, %r947, %r948, %r949, %r950, %r951, %r952, %r953, %r954, %r955, %r956, %r957, %r958, %r959, %r960}, [%r1369 + 112]; 2026-02-21T09:08:09.0572703Z // end inline asm 2026-02-21T09:08:09.0572822Z cvt.u64.u32 %rd244, %r949; 2026-02-21T09:08:09.0572953Z cvt.u64.u32 %rd245, %r950; 2026-02-21T09:08:09.0573076Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:08:09.0573195Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T09:08:09.0573327Z cvt.u64.u32 %rd248, %r951; 2026-02-21T09:08:09.0573438Z cvt.u64.u32 %rd249, %r952; 
2026-02-21T09:08:09.0573555Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:08:09.0573671Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T09:08:09.0573800Z cvt.u64.u32 %rd252, %r957; 2026-02-21T09:08:09.0573918Z cvt.u64.u32 %rd253, %r958; 2026-02-21T09:08:09.0574040Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:08:09.0574161Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:08:09.0574279Z cvt.u64.u32 %rd256, %r959; 2026-02-21T09:08:09.0574396Z cvt.u64.u32 %rd257, %r960; 2026-02-21T09:08:09.0574515Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:08:09.0574643Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T09:08:09.0574854Z // begin inline asm 2026-02-21T09:08:09.0575504Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r962, %r963, %r964, %r965, %r966, %r967, %r968, %r969, %r970, %r971, %r972, %r973, %r974, %r975, %r976, %r977}, [%r1369 + 128]; 2026-02-21T09:08:09.0575628Z // end inline asm 2026-02-21T09:08:09.0575749Z cvt.u64.u32 %rd260, %r966; 2026-02-21T09:08:09.0575868Z cvt.u64.u32 %rd261, %r967; 2026-02-21T09:08:09.0575986Z shl.b64 %rd262, %rd261, 32; 2026-02-21T09:08:09.0576115Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T09:08:09.0576232Z cvt.u64.u32 %rd264, %r968; 2026-02-21T09:08:09.0576347Z cvt.u64.u32 %rd265, %r969; 2026-02-21T09:08:09.0576481Z shl.b64 %rd266, %rd265, 32; 2026-02-21T09:08:09.0576599Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T09:08:09.0576713Z cvt.u64.u32 %rd268, %r974; 2026-02-21T09:08:09.0576841Z cvt.u64.u32 %rd269, %r975; 2026-02-21T09:08:09.0576963Z shl.b64 %rd270, %rd269, 32; 2026-02-21T09:08:09.0577082Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T09:08:09.0577198Z cvt.u64.u32 %rd272, %r976; 2026-02-21T09:08:09.0577327Z cvt.u64.u32 %rd273, %r977; 2026-02-21T09:08:09.0577447Z shl.b64 %rd274, %rd273, 32; 2026-02-21T09:08:09.0577569Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T09:08:09.0577694Z // begin inline asm 2026-02-21T09:08:09.0578327Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r979, %r980, %r981, %r982, %r983, %r984, %r985, %r986, %r987, %r988, %r989, %r990, %r991, 
%r992, %r993, %r994}, [%r1369 + 144]; 2026-02-21T09:08:09.0578517Z // end inline asm 2026-02-21T09:08:09.0578638Z cvt.u64.u32 %rd276, %r983; 2026-02-21T09:08:09.0578766Z cvt.u64.u32 %rd277, %r984; 2026-02-21T09:08:09.0578937Z shl.b64 %rd278, %rd277, 32; 2026-02-21T09:08:09.0579054Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T09:08:09.0579184Z cvt.u64.u32 %rd280, %r985; 2026-02-21T09:08:09.0579303Z cvt.u64.u32 %rd281, %r986; 2026-02-21T09:08:09.0579429Z shl.b64 %rd282, %rd281, 32; 2026-02-21T09:08:09.0579559Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T09:08:09.0579676Z cvt.u64.u32 %rd284, %r991; 2026-02-21T09:08:09.0579793Z cvt.u64.u32 %rd285, %r992; 2026-02-21T09:08:09.0579910Z shl.b64 %rd286, %rd285, 32; 2026-02-21T09:08:09.0580041Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T09:08:09.0580157Z cvt.u64.u32 %rd288, %r993; 2026-02-21T09:08:09.0580274Z cvt.u64.u32 %rd289, %r994; 2026-02-21T09:08:09.0580466Z shl.b64 %rd290, %rd289, 32; 2026-02-21T09:08:09.0580585Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T09:08:09.0580696Z // begin inline asm 2026-02-21T09:08:09.0581432Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r996, %r997, %r998, %r999, %r1000, %r1001, %r1002, %r1003, %r1004, %r1005, %r1006, %r1007, %r1008, %r1009, %r1010, %r1011}, [%r1369 + 160]; 2026-02-21T09:08:09.0581560Z // end inline asm 2026-02-21T09:08:09.0581678Z cvt.u64.u32 %rd292, %r1000; 2026-02-21T09:08:09.0581799Z cvt.u64.u32 %rd293, %r1001; 2026-02-21T09:08:09.0582009Z shl.b64 %rd294, %rd293, 32; 2026-02-21T09:08:09.0582130Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T09:08:09.0582246Z cvt.u64.u32 %rd296, %r1002; 2026-02-21T09:08:09.0582381Z cvt.u64.u32 %rd297, %r1003; 2026-02-21T09:08:09.0582506Z shl.b64 %rd298, %rd297, 32; 2026-02-21T09:08:09.0582629Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T09:08:09.0582749Z cvt.u64.u32 %rd300, %r1008; 2026-02-21T09:08:09.0582878Z cvt.u64.u32 %rd301, %r1009; 2026-02-21T09:08:09.0582995Z shl.b64 %rd302, %rd301, 32; 2026-02-21T09:08:09.0583114Z or.b64 %rd303, %rd300, %rd302; 
2026-02-21T09:08:09.0583240Z cvt.u64.u32 %rd304, %r1010; 2026-02-21T09:08:09.0583356Z cvt.u64.u32 %rd305, %r1011; 2026-02-21T09:08:09.0583473Z shl.b64 %rd306, %rd305, 32; 2026-02-21T09:08:09.0583595Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T09:08:09.0583718Z // begin inline asm 2026-02-21T09:08:09.0584451Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1013, %r1014, %r1015, %r1016, %r1017, %r1018, %r1019, %r1020, %r1021, %r1022, %r1023, %r1024, %r1025, %r1026, %r1027, %r1028}, [%r1369 + 176]; 2026-02-21T09:08:09.0584560Z // end inline asm 2026-02-21T09:08:09.0584793Z cvt.u64.u32 %rd308, %r1017; 2026-02-21T09:08:09.0584909Z cvt.u64.u32 %rd309, %r1018; 2026-02-21T09:08:09.0585029Z shl.b64 %rd310, %rd309, 32; 2026-02-21T09:08:09.0585159Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T09:08:09.0585280Z cvt.u64.u32 %rd312, %r1019; 2026-02-21T09:08:09.0585395Z cvt.u64.u32 %rd313, %r1020; 2026-02-21T09:08:09.0585520Z shl.b64 %rd314, %rd313, 32; 2026-02-21T09:08:09.0585653Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T09:08:09.0585770Z cvt.u64.u32 %rd316, %r1025; 2026-02-21T09:08:09.0585887Z cvt.u64.u32 %rd317, %r1026; 2026-02-21T09:08:09.0586014Z shl.b64 %rd318, %rd317, 32; 2026-02-21T09:08:09.0586134Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T09:08:09.0586250Z cvt.u64.u32 %rd320, %r1027; 2026-02-21T09:08:09.0586366Z cvt.u64.u32 %rd321, %r1028; 2026-02-21T09:08:09.0586497Z shl.b64 %rd322, %rd321, 32; 2026-02-21T09:08:09.0586624Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T09:08:09.0586737Z // begin inline asm 2026-02-21T09:08:09.0587469Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1030, %r1031, %r1032, %r1033, %r1034, %r1035, %r1036, %r1037, %r1038, %r1039, %r1040, %r1041, %r1042, %r1043, %r1044, %r1045}, [%r1369 + 192]; 2026-02-21T09:08:09.0587578Z // end inline asm 2026-02-21T09:08:09.0587691Z cvt.u64.u32 %rd324, %r1034; 2026-02-21T09:08:09.0587814Z cvt.u64.u32 %rd325, %r1035; 2026-02-21T09:08:09.0587932Z shl.b64 %rd326, %rd325, 32; 2026-02-21T09:08:09.0588131Z or.b64 %rd327, %rd324, 
%rd326; 2026-02-21T09:08:09.0588250Z cvt.u64.u32 %rd328, %r1036; 2026-02-21T09:08:09.0588382Z cvt.u64.u32 %rd329, %r1037; 2026-02-21T09:08:09.0588499Z shl.b64 %rd330, %rd329, 32; 2026-02-21T09:08:09.0588679Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T09:08:09.0588800Z cvt.u64.u32 %rd332, %r1042; 2026-02-21T09:08:09.0588918Z cvt.u64.u32 %rd333, %r1043; 2026-02-21T09:08:09.0589037Z shl.b64 %rd334, %rd333, 32; 2026-02-21T09:08:09.0589157Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T09:08:09.0589286Z cvt.u64.u32 %rd336, %r1044; 2026-02-21T09:08:09.0589403Z cvt.u64.u32 %rd337, %r1045; 2026-02-21T09:08:09.0589519Z shl.b64 %rd338, %rd337, 32; 2026-02-21T09:08:09.0589647Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T09:08:09.0589760Z // begin inline asm 2026-02-21T09:08:09.0590539Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1047, %r1048, %r1049, %r1050, %r1051, %r1052, %r1053, %r1054, %r1055, %r1056, %r1057, %r1058, %r1059, %r1060, %r1061, %r1062}, [%r1369 + 208]; 2026-02-21T09:08:09.0590661Z // end inline asm 2026-02-21T09:08:09.0590780Z cvt.u64.u32 %rd340, %r1051; 2026-02-21T09:08:09.0590895Z cvt.u64.u32 %rd341, %r1052; 2026-02-21T09:08:09.0591011Z shl.b64 %rd342, %rd341, 32; 2026-02-21T09:08:09.0591146Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T09:08:09.0591266Z cvt.u64.u32 %rd344, %r1053; 2026-02-21T09:08:09.0591382Z cvt.u64.u32 %rd345, %r1054; 2026-02-21T09:08:09.0591510Z shl.b64 %rd346, %rd345, 32; 2026-02-21T09:08:09.0591683Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T09:08:09.0591803Z cvt.u64.u32 %rd348, %r1059; 2026-02-21T09:08:09.0591921Z cvt.u64.u32 %rd349, %r1060; 2026-02-21T09:08:09.0592045Z shl.b64 %rd350, %rd349, 32; 2026-02-21T09:08:09.0592165Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T09:08:09.0592282Z cvt.u64.u32 %rd352, %r1061; 2026-02-21T09:08:09.0592413Z cvt.u64.u32 %rd353, %r1062; 2026-02-21T09:08:09.0592529Z shl.b64 %rd354, %rd353, 32; 2026-02-21T09:08:09.0592651Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T09:08:09.0592766Z // begin inline asm 
2026-02-21T09:08:09.0593513Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1064, %r1065, %r1066, %r1067, %r1068, %r1069, %r1070, %r1071, %r1072, %r1073, %r1074, %r1075, %r1076, %r1077, %r1078, %r1079}, [%r1369 + 224]; 2026-02-21T09:08:09.0593629Z // end inline asm 2026-02-21T09:08:09.0593746Z cvt.u64.u32 %rd356, %r1068; 2026-02-21T09:08:09.0593875Z cvt.u64.u32 %rd357, %r1069; 2026-02-21T09:08:09.0593992Z shl.b64 %rd358, %rd357, 32; 2026-02-21T09:08:09.0594112Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T09:08:09.0594243Z cvt.u64.u32 %rd360, %r1070; 2026-02-21T09:08:09.0594360Z cvt.u64.u32 %rd361, %r1071; 2026-02-21T09:08:09.0594476Z shl.b64 %rd362, %rd361, 32; 2026-02-21T09:08:09.0594594Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T09:08:09.0594786Z cvt.u64.u32 %rd364, %r1076; 2026-02-21T09:08:09.0594907Z cvt.u64.u32 %rd365, %r1077; 2026-02-21T09:08:09.0595027Z shl.b64 %rd366, %rd365, 32; 2026-02-21T09:08:09.0595155Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T09:08:09.0595274Z cvt.u64.u32 %rd368, %r1078; 2026-02-21T09:08:09.0595390Z cvt.u64.u32 %rd369, %r1079; 2026-02-21T09:08:09.0595507Z shl.b64 %rd370, %rd369, 32; 2026-02-21T09:08:09.0595638Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T09:08:09.0595752Z // begin inline asm 2026-02-21T09:08:09.0596474Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1081, %r1082, %r1083, %r1084, %r1085, %r1086, %r1087, %r1088, %r1089, %r1090, %r1091, %r1092, %r1093, %r1094, %r1095, %r1096}, [%r1369 + 240]; 2026-02-21T09:08:09.0596597Z // end inline asm 2026-02-21T09:08:09.0596714Z cvt.u64.u32 %rd372, %r1085; 2026-02-21T09:08:09.0596835Z cvt.u64.u32 %rd373, %r1086; 2026-02-21T09:08:09.0596961Z shl.b64 %rd374, %rd373, 32; 2026-02-21T09:08:09.0597080Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T09:08:09.0597196Z cvt.u64.u32 %rd376, %r1087; 2026-02-21T09:08:09.0597311Z cvt.u64.u32 %rd377, %r1088; 2026-02-21T09:08:09.0597440Z shl.b64 %rd378, %rd377, 32; 2026-02-21T09:08:09.0597562Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T09:08:09.0597754Z cvt.u64.u32 
%rd380, %r1093; 2026-02-21T09:08:09.0597885Z cvt.u64.u32 %rd381, %r1094; 2026-02-21T09:08:09.0598006Z shl.b64 %rd382, %rd381, 32; 2026-02-21T09:08:09.0598129Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T09:08:09.0598305Z cvt.u64.u32 %rd384, %r1095; 2026-02-21T09:08:09.0598431Z cvt.u64.u32 %rd385, %r1096; 2026-02-21T09:08:09.0598554Z shl.b64 %rd386, %rd385, 32; 2026-02-21T09:08:09.0598677Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T09:08:09.0598807Z // begin inline asm 2026-02-21T09:08:09.0599540Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1098, %r1099, %r1100, %r1101, %r1102, %r1103, %r1104, %r1105, %r1106, %r1107, %r1108, %r1109, %r1110, %r1111, %r1112, %r1113}, [%r1369 + 256]; 2026-02-21T09:08:09.0599648Z // end inline asm 2026-02-21T09:08:09.0599776Z cvt.u64.u32 %rd388, %r1102; 2026-02-21T09:08:09.0599897Z cvt.u64.u32 %rd389, %r1103; 2026-02-21T09:08:09.0600069Z shl.b64 %rd390, %rd389, 32; 2026-02-21T09:08:09.0600193Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T09:08:09.0600324Z cvt.u64.u32 %rd392, %r1104; 2026-02-21T09:08:09.0600447Z cvt.u64.u32 %rd393, %r1105; 2026-02-21T09:08:09.0600564Z shl.b64 %rd394, %rd393, 32; 2026-02-21T09:08:09.0600696Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T09:08:09.0600816Z cvt.u64.u32 %rd396, %r1110; 2026-02-21T09:08:09.0600936Z cvt.u64.u32 %rd397, %r1111; 2026-02-21T09:08:09.0601052Z shl.b64 %rd398, %rd397, 32; 2026-02-21T09:08:09.0601186Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T09:08:09.0601356Z cvt.u64.u32 %rd400, %r1112; 2026-02-21T09:08:09.0601472Z cvt.u64.u32 %rd401, %r1113; 2026-02-21T09:08:09.0601597Z shl.b64 %rd402, %rd401, 32; 2026-02-21T09:08:09.0601713Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T09:08:09.0601825Z // begin inline asm 2026-02-21T09:08:09.0602568Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1115, %r1116, %r1117, %r1118, %r1119, %r1120, %r1121, %r1122, %r1123, %r1124, %r1125, %r1126, %r1127, %r1128, %r1129, %r1130}, [%r1369 + 272]; 2026-02-21T09:08:09.0602681Z // end inline asm 2026-02-21T09:08:09.0602800Z 
cvt.u64.u32 %rd404, %r1119; 2026-02-21T09:08:09.0602940Z cvt.u64.u32 %rd405, %r1120; 2026-02-21T09:08:09.0603067Z shl.b64 %rd406, %rd405, 32; 2026-02-21T09:08:09.0603182Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T09:08:09.0603303Z cvt.u64.u32 %rd408, %r1121; 2026-02-21T09:08:09.0603431Z cvt.u64.u32 %rd409, %r1122; 2026-02-21T09:08:09.0603546Z shl.b64 %rd410, %rd409, 32; 2026-02-21T09:08:09.0603662Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T09:08:09.0603788Z cvt.u64.u32 %rd412, %r1127; 2026-02-21T09:08:09.0603914Z cvt.u64.u32 %rd413, %r1128; 2026-02-21T09:08:09.0604028Z shl.b64 %rd414, %rd413, 32; 2026-02-21T09:08:09.0604147Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T09:08:09.0604274Z cvt.u64.u32 %rd416, %r1129; 2026-02-21T09:08:09.0604391Z cvt.u64.u32 %rd417, %r1130; 2026-02-21T09:08:09.0604504Z shl.b64 %rd418, %rd417, 32; 2026-02-21T09:08:09.0604631Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T09:08:09.0604808Z // begin inline asm 2026-02-21T09:08:09.0605543Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1132, %r1133, %r1134, %r1135, %r1136, %r1137, %r1138, %r1139, %r1140, %r1141, %r1142, %r1143, %r1144, %r1145, %r1146, %r1147}, [%r1369 + 288]; 2026-02-21T09:08:09.0605654Z // end inline asm 2026-02-21T09:08:09.0605787Z cvt.u64.u32 %rd420, %r1136; 2026-02-21T09:08:09.0605906Z cvt.u64.u32 %rd421, %r1137; 2026-02-21T09:08:09.0606023Z shl.b64 %rd422, %rd421, 32; 2026-02-21T09:08:09.0606154Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T09:08:09.0606273Z cvt.u64.u32 %rd424, %r1138; 2026-02-21T09:08:09.0606391Z cvt.u64.u32 %rd425, %r1139; 2026-02-21T09:08:09.0606519Z shl.b64 %rd426, %rd425, 32; 2026-02-21T09:08:09.0606636Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T09:08:09.0606754Z cvt.u64.u32 %rd428, %r1144; 2026-02-21T09:08:09.0606869Z cvt.u64.u32 %rd429, %r1145; 2026-02-21T09:08:09.0606999Z shl.b64 %rd430, %rd429, 32; 2026-02-21T09:08:09.0607122Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T09:08:09.0607308Z cvt.u64.u32 %rd432, %r1146; 2026-02-21T09:08:09.0607439Z cvt.u64.u32 
%rd433, %r1147; 2026-02-21T09:08:09.0607555Z shl.b64 %rd434, %rd433, 32; 2026-02-21T09:08:09.0607675Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T09:08:09.0607790Z // begin inline asm 2026-02-21T09:08:09.0608598Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1149, %r1150, %r1151, %r1152, %r1153, %r1154, %r1155, %r1156, %r1157, %r1158, %r1159, %r1160, %r1161, %r1162, %r1163, %r1164}, [%r1369 + 304]; 2026-02-21T09:08:09.0608710Z // end inline asm 2026-02-21T09:08:09.0608830Z cvt.u64.u32 %rd436, %r1153; 2026-02-21T09:08:09.0608961Z cvt.u64.u32 %rd437, %r1154; 2026-02-21T09:08:09.0609081Z shl.b64 %rd438, %rd437, 32; 2026-02-21T09:08:09.0609202Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T09:08:09.0609327Z cvt.u64.u32 %rd440, %r1155; 2026-02-21T09:08:09.0609446Z cvt.u64.u32 %rd441, %r1156; 2026-02-21T09:08:09.0609561Z shl.b64 %rd442, %rd441, 32; 2026-02-21T09:08:09.0613265Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T09:08:09.0613505Z cvt.u64.u32 %rd444, %r1161; 2026-02-21T09:08:09.0613626Z cvt.u64.u32 %rd445, %r1162; 2026-02-21T09:08:09.0613744Z shl.b64 %rd446, %rd445, 32; 2026-02-21T09:08:09.0613875Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T09:08:09.0613993Z cvt.u64.u32 %rd448, %r1163; 2026-02-21T09:08:09.0614114Z cvt.u64.u32 %rd449, %r1164; 2026-02-21T09:08:09.0614234Z shl.b64 %rd450, %rd449, 32; 2026-02-21T09:08:09.0614362Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T09:08:09.0614470Z // begin inline asm 2026-02-21T09:08:09.0615385Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1166, %r1167, %r1168, %r1169, %r1170, %r1171, %r1172, %r1173, %r1174, %r1175, %r1176, %r1177, %r1178, %r1179, %r1180, %r1181}, [%r1369 + 320]; 2026-02-21T09:08:09.0615512Z // end inline asm 2026-02-21T09:08:09.0615629Z cvt.u64.u32 %rd452, %r1170; 2026-02-21T09:08:09.0615749Z cvt.u64.u32 %rd453, %r1171; 2026-02-21T09:08:09.0615881Z shl.b64 %rd454, %rd453, 32; 2026-02-21T09:08:09.0616005Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T09:08:09.0616127Z cvt.u64.u32 %rd456, %r1172; 2026-02-21T09:08:09.0616241Z 
cvt.u64.u32 %rd457, %r1173; 2026-02-21T09:08:09.0616369Z shl.b64 %rd458, %rd457, 32; 2026-02-21T09:08:09.0616489Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T09:08:09.0616605Z cvt.u64.u32 %rd460, %r1178; 2026-02-21T09:08:09.0616734Z cvt.u64.u32 %rd461, %r1179; 2026-02-21T09:08:09.0616852Z shl.b64 %rd462, %rd461, 32; 2026-02-21T09:08:09.0616972Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T09:08:09.0617092Z cvt.u64.u32 %rd464, %r1180; 2026-02-21T09:08:09.0617219Z cvt.u64.u32 %rd465, %r1181; 2026-02-21T09:08:09.0617336Z shl.b64 %rd466, %rd465, 32; 2026-02-21T09:08:09.0617453Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T09:08:09.0617575Z // begin inline asm 2026-02-21T09:08:09.0618308Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1183, %r1184, %r1185, %r1186, %r1187, %r1188, %r1189, %r1190, %r1191, %r1192, %r1193, %r1194, %r1195, %r1196, %r1197, %r1198}, [%r1369 + 336]; 2026-02-21T09:08:09.0618422Z // end inline asm 2026-02-21T09:08:09.0618554Z cvt.u64.u32 %rd468, %r1187; 2026-02-21T09:08:09.0618672Z cvt.u64.u32 %rd469, %r1188; 2026-02-21T09:08:09.0618787Z shl.b64 %rd470, %rd469, 32; 2026-02-21T09:08:09.0618907Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T09:08:09.0619037Z cvt.u64.u32 %rd472, %r1189; 2026-02-21T09:08:09.0619155Z cvt.u64.u32 %rd473, %r1190; 2026-02-21T09:08:09.0619278Z shl.b64 %rd474, %rd473, 32; 2026-02-21T09:08:09.0619407Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T09:08:09.0619526Z cvt.u64.u32 %rd476, %r1195; 2026-02-21T09:08:09.0619643Z cvt.u64.u32 %rd477, %r1196; 2026-02-21T09:08:09.0619758Z shl.b64 %rd478, %rd477, 32; 2026-02-21T09:08:09.0619884Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T09:08:09.0620000Z cvt.u64.u32 %rd480, %r1197; 2026-02-21T09:08:09.0620116Z cvt.u64.u32 %rd481, %r1198; 2026-02-21T09:08:09.0620242Z shl.b64 %rd482, %rd481, 32; 2026-02-21T09:08:09.0620359Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T09:08:09.0620475Z // begin inline asm 2026-02-21T09:08:09.0621290Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1200, %r1201, %r1202, %r1203, %r1204, 
%r1205, %r1206, %r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215}, [%r1369 + 352]; 2026-02-21T09:08:09.0621410Z // end inline asm 2026-02-21T09:08:09.0621524Z cvt.u64.u32 %rd484, %r1204; 2026-02-21T09:08:09.0621714Z cvt.u64.u32 %rd485, %r1205; 2026-02-21T09:08:09.0621842Z shl.b64 %rd486, %rd485, 32; 2026-02-21T09:08:09.0621964Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T09:08:09.0622082Z cvt.u64.u32 %rd488, %r1206; 2026-02-21T09:08:09.0622208Z cvt.u64.u32 %rd489, %r1207; 2026-02-21T09:08:09.0622327Z shl.b64 %rd490, %rd489, 32; 2026-02-21T09:08:09.0622444Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T09:08:09.0622561Z cvt.u64.u32 %rd492, %r1212; 2026-02-21T09:08:09.0622690Z cvt.u64.u32 %rd493, %r1213; 2026-02-21T09:08:09.0622805Z shl.b64 %rd494, %rd493, 32; 2026-02-21T09:08:09.0622925Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T09:08:09.0623120Z cvt.u64.u32 %rd496, %r1214; 2026-02-21T09:08:09.0623245Z cvt.u64.u32 %rd497, %r1215; 2026-02-21T09:08:09.0623364Z shl.b64 %rd498, %rd497, 32; 2026-02-21T09:08:09.0623489Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T09:08:09.0623612Z // begin inline asm 2026-02-21T09:08:09.0624343Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1217, %r1218, %r1219, %r1220, %r1221, %r1222, %r1223, %r1224, %r1225, %r1226, %r1227, %r1228, %r1229, %r1230, %r1231, %r1232}, [%r1369 + 368]; 2026-02-21T09:08:09.0624453Z // end inline asm 2026-02-21T09:08:09.0624631Z cvt.u64.u32 %rd500, %r1221; 2026-02-21T09:08:09.0624853Z cvt.u64.u32 %rd501, %r1222; 2026-02-21T09:08:09.0624978Z shl.b64 %rd502, %rd501, 32; 2026-02-21T09:08:09.0625110Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T09:08:09.0625228Z cvt.u64.u32 %rd504, %r1223; 2026-02-21T09:08:09.0625345Z cvt.u64.u32 %rd505, %r1224; 2026-02-21T09:08:09.0625465Z shl.b64 %rd506, %rd505, 32; 2026-02-21T09:08:09.0625595Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T09:08:09.0625713Z cvt.u64.u32 %rd508, %r1229; 2026-02-21T09:08:09.0625827Z cvt.u64.u32 %rd509, %r1230; 2026-02-21T09:08:09.0625957Z shl.b64 
%rd510, %rd509, 32; 2026-02-21T09:08:09.0626072Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T09:08:09.0626188Z cvt.u64.u32 %rd512, %r1231; 2026-02-21T09:08:09.0626306Z cvt.u64.u32 %rd513, %r1232; 2026-02-21T09:08:09.0626437Z shl.b64 %rd514, %rd513, 32; 2026-02-21T09:08:09.0626557Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T09:08:09.0626667Z // begin inline asm 2026-02-21T09:08:09.0627400Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1234, %r1235, %r1236, %r1237, %r1238, %r1239, %r1240, %r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249}, [%r1369 + 384]; 2026-02-21T09:08:09.0627507Z // end inline asm 2026-02-21T09:08:09.0627623Z cvt.u64.u32 %rd516, %r1238; 2026-02-21T09:08:09.0627752Z cvt.u64.u32 %rd517, %r1239; 2026-02-21T09:08:09.0627868Z shl.b64 %rd518, %rd517, 32; 2026-02-21T09:08:09.0627985Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T09:08:09.0628105Z cvt.u64.u32 %rd520, %r1240; 2026-02-21T09:08:09.0628239Z cvt.u64.u32 %rd521, %r1241; 2026-02-21T09:08:09.0628353Z shl.b64 %rd522, %rd521, 32; 2026-02-21T09:08:09.0628473Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T09:08:09.0628599Z cvt.u64.u32 %rd524, %r1246; 2026-02-21T09:08:09.0628721Z cvt.u64.u32 %rd525, %r1247; 2026-02-21T09:08:09.0628841Z shl.b64 %rd526, %rd525, 32; 2026-02-21T09:08:09.0628959Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T09:08:09.0629086Z cvt.u64.u32 %rd528, %r1248; 2026-02-21T09:08:09.0629205Z cvt.u64.u32 %rd529, %r1249; 2026-02-21T09:08:09.0629323Z shl.b64 %rd530, %rd529, 32; 2026-02-21T09:08:09.0629453Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T09:08:09.0629566Z // begin inline asm 2026-02-21T09:08:09.0630296Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1251, %r1252, %r1253, %r1254, %r1255, %r1256, %r1257, %r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266}, [%r1369 + 400]; 2026-02-21T09:08:09.0630411Z // end inline asm 2026-02-21T09:08:09.0630527Z cvt.u64.u32 %rd532, %r1255; 2026-02-21T09:08:09.0630733Z cvt.u64.u32 %rd533, %r1256; 2026-02-21T09:08:09.0630849Z 
shl.b64 %rd534, %rd533, 32; 2026-02-21T09:08:09.0630980Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T09:08:09.0631100Z cvt.u64.u32 %rd536, %r1257; 2026-02-21T09:08:09.0631280Z cvt.u64.u32 %rd537, %r1258; 2026-02-21T09:08:09.0631409Z shl.b64 %rd538, %rd537, 32; 2026-02-21T09:08:09.0631526Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T09:08:09.0631646Z cvt.u64.u32 %rd540, %r1263; 2026-02-21T09:08:09.0631771Z cvt.u64.u32 %rd541, %r1264; 2026-02-21T09:08:09.0631899Z shl.b64 %rd542, %rd541, 32; 2026-02-21T09:08:09.0632017Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T09:08:09.0632133Z cvt.u64.u32 %rd544, %r1265; 2026-02-21T09:08:09.0632258Z cvt.u64.u32 %rd545, %r1266; 2026-02-21T09:08:09.0632373Z shl.b64 %rd546, %rd545, 32; 2026-02-21T09:08:09.0632484Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T09:08:09.0632611Z // begin inline asm 2026-02-21T09:08:09.0633398Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1268, %r1269, %r1270, %r1271, %r1272, %r1273, %r1274, %r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283}, [%r1369 + 416]; 2026-02-21T09:08:09.0633510Z // end inline asm 2026-02-21T09:08:09.0633632Z cvt.u64.u32 %rd548, %r1270; 2026-02-21T09:08:09.0633770Z cvt.u64.u32 %rd549, %r1271; 2026-02-21T09:08:09.0633891Z shl.b64 %rd550, %rd549, 32; 2026-02-21T09:08:09.0634011Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T09:08:09.0634135Z cvt.u64.u32 %rd552, %r1272; 2026-02-21T09:08:09.0634302Z cvt.u64.u32 %rd553, %r1273; 2026-02-21T09:08:09.0634422Z shl.b64 %rd554, %rd553, 32; 2026-02-21T09:08:09.0634548Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T09:08:09.0634747Z cvt.u64.u32 %rd556, %r1274; 2026-02-21T09:08:09.0634871Z cvt.u64.u32 %rd557, %r1275; 2026-02-21T09:08:09.0634986Z shl.b64 %rd558, %rd557, 32; 2026-02-21T09:08:09.0635117Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T09:08:09.0635233Z cvt.u64.u32 %rd560, %r1278; 2026-02-21T09:08:09.0635350Z cvt.u64.u32 %rd561, %r1279; 2026-02-21T09:08:09.0635480Z shl.b64 %rd562, %rd561, 32; 2026-02-21T09:08:09.0635595Z or.b64 %rd563, 
%rd560, %rd562; 2026-02-21T09:08:09.0635711Z cvt.u64.u32 %rd564, %r1280; 2026-02-21T09:08:09.0635827Z cvt.u64.u32 %rd565, %r1281; 2026-02-21T09:08:09.0635953Z shl.b64 %rd566, %rd565, 32; 2026-02-21T09:08:09.0636071Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T09:08:09.0636194Z cvt.u64.u32 %rd568, %r1282; 2026-02-21T09:08:09.0636318Z cvt.u64.u32 %rd569, %r1283; 2026-02-21T09:08:09.0636438Z shl.b64 %rd570, %rd569, 32; 2026-02-21T09:08:09.0636556Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T09:08:09.0636674Z // begin inline asm 2026-02-21T09:08:09.0637422Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1285, %r1286, %r1287, %r1288, %r1289, %r1290, %r1291, %r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300}, [%r1369 + 432]; 2026-02-21T09:08:09.0637534Z // end inline asm 2026-02-21T09:08:09.0637651Z cvt.u64.u32 %rd572, %r1287; 2026-02-21T09:08:09.0637781Z cvt.u64.u32 %rd573, %r1288; 2026-02-21T09:08:09.0637896Z shl.b64 %rd574, %rd573, 32; 2026-02-21T09:08:09.0638015Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T09:08:09.0638143Z cvt.u64.u32 %rd576, %r1289; 2026-02-21T09:08:09.0638258Z cvt.u64.u32 %rd577, %r1290; 2026-02-21T09:08:09.0638382Z shl.b64 %rd578, %rd577, 32; 2026-02-21T09:08:09.0638500Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T09:08:09.0638630Z cvt.u64.u32 %rd580, %r1291; 2026-02-21T09:08:09.0638746Z cvt.u64.u32 %rd581, %r1292; 2026-02-21T09:08:09.0638866Z shl.b64 %rd582, %rd581, 32; 2026-02-21T09:08:09.0638990Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T09:08:09.0639111Z cvt.u64.u32 %rd584, %r1297; 2026-02-21T09:08:09.0639225Z cvt.u64.u32 %rd585, %r1298; 2026-02-21T09:08:09.0639340Z shl.b64 %rd586, %rd585, 32; 2026-02-21T09:08:09.0639462Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T09:08:09.0639576Z cvt.u64.u32 %rd588, %r1299; 2026-02-21T09:08:09.0639690Z cvt.u64.u32 %rd589, %r1300; 2026-02-21T09:08:09.0639819Z shl.b64 %rd590, %rd589, 32; 2026-02-21T09:08:09.0640013Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T09:08:09.0640125Z // begin inline asm 
2026-02-21T09:08:09.0640858Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1302, %r1303, %r1304, %r1305, %r1306, %r1307, %r1308, %r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317}, [%r1369 + 448]; 2026-02-21T09:08:09.0641021Z // end inline asm 2026-02-21T09:08:09.0641139Z cvt.u64.u32 %rd592, %r1304; 2026-02-21T09:08:09.0641257Z cvt.u64.u32 %rd593, %r1305; 2026-02-21T09:08:09.0641388Z shl.b64 %rd594, %rd593, 32; 2026-02-21T09:08:09.0641507Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T09:08:09.0641621Z cvt.u64.u32 %rd596, %r1306; 2026-02-21T09:08:09.0641745Z cvt.u64.u32 %rd597, %r1307; 2026-02-21T09:08:09.0641859Z shl.b64 %rd598, %rd597, 32; 2026-02-21T09:08:09.0641978Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T09:08:09.0642093Z cvt.u64.u32 %rd600, %r1308; 2026-02-21T09:08:09.0642217Z cvt.u64.u32 %rd601, %r1309; 2026-02-21T09:08:09.0642384Z shl.b64 %rd602, %rd601, 32; 2026-02-21T09:08:09.0642506Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T09:08:09.0642635Z cvt.u64.u32 %rd604, %r1312; 2026-02-21T09:08:09.0642754Z cvt.u64.u32 %rd605, %r1313; 2026-02-21T09:08:09.0642872Z shl.b64 %rd606, %rd605, 32; 2026-02-21T09:08:09.0642997Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T09:08:09.0643123Z cvt.u64.u32 %rd608, %r1314; 2026-02-21T09:08:09.0643241Z cvt.u64.u32 %rd609, %r1315; 2026-02-21T09:08:09.0643357Z shl.b64 %rd610, %rd609, 32; 2026-02-21T09:08:09.0643536Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T09:08:09.0643654Z cvt.u64.u32 %rd612, %r1316; 2026-02-21T09:08:09.0643769Z cvt.u64.u32 %rd613, %r1317; 2026-02-21T09:08:09.0643896Z shl.b64 %rd614, %rd613, 32; 2026-02-21T09:08:09.0644017Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T09:08:09.0644129Z // begin inline asm 2026-02-21T09:08:09.0644966Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1319, %r1320, %r1321, %r1322, %r1323, %r1324, %r1325, %r1326, %r1327, %r1328, %r1329, %r1330, %r1331, %r1332, %r1333, %r1334}, [%r1369 + 464]; 2026-02-21T09:08:09.0645091Z // end inline asm 2026-02-21T09:08:09.0645207Z cvt.u64.u32 
%rd616, %r1321; 2026-02-21T09:08:09.0645327Z cvt.u64.u32 %rd617, %r1322; 2026-02-21T09:08:09.0645457Z shl.b64 %rd618, %rd617, 32; 2026-02-21T09:08:09.0645576Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T09:08:09.0645697Z cvt.u64.u32 %rd620, %r1323; 2026-02-21T09:08:09.0645811Z cvt.u64.u32 %rd621, %r1324; 2026-02-21T09:08:09.0645937Z shl.b64 %rd622, %rd621, 32; 2026-02-21T09:08:09.0646058Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T09:08:09.0646173Z cvt.u64.u32 %rd624, %r1325; 2026-02-21T09:08:09.0646302Z cvt.u64.u32 %rd625, %r1326; 2026-02-21T09:08:09.0646423Z shl.b64 %rd626, %rd625, 32; 2026-02-21T09:08:09.0646536Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T09:08:09.0646667Z cvt.u64.u32 %rd628, %r1329; 2026-02-21T09:08:09.0646785Z cvt.u64.u32 %rd629, %r1330; 2026-02-21T09:08:09.0646907Z shl.b64 %rd630, %rd629, 32; 2026-02-21T09:08:09.0647041Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T09:08:09.0647173Z cvt.u64.u32 %rd632, %r1331; 2026-02-21T09:08:09.0647294Z cvt.u64.u32 %rd633, %r1332; 2026-02-21T09:08:09.0647415Z shl.b64 %rd634, %rd633, 32; 2026-02-21T09:08:09.0647546Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T09:08:09.0647667Z cvt.u64.u32 %rd636, %r1333; 2026-02-21T09:08:09.0647786Z cvt.u64.u32 %rd637, %r1334; 2026-02-21T09:08:09.0647902Z shl.b64 %rd638, %rd637, 32; 2026-02-21T09:08:09.0648030Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T09:08:09.0648140Z // begin inline asm 2026-02-21T09:08:09.0648866Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1336, %r1337, %r1338, %r1339, %r1340, %r1341, %r1342, %r1343, %r1344, %r1345, %r1346, %r1347, %r1348, %r1349, %r1350, %r1351}, [%r1369 + 480]; 2026-02-21T09:08:09.0648987Z // end inline asm 2026-02-21T09:08:09.0649101Z cvt.u64.u32 %rd640, %r1338; 2026-02-21T09:08:09.0649215Z cvt.u64.u32 %rd641, %r1339; 2026-02-21T09:08:09.0649343Z shl.b64 %rd642, %rd641, 32; 2026-02-21T09:08:09.0649464Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T09:08:09.0649658Z cvt.u64.u32 %rd644, %r1340; 2026-02-21T09:08:09.0649781Z cvt.u64.u32 %rd645, %r1341; 
2026-02-21T09:08:09.0649912Z shl.b64 %rd646, %rd645, 32; 2026-02-21T09:08:09.0650029Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T09:08:09.0650247Z cvt.u64.u32 %rd648, %r1342; 2026-02-21T09:08:09.0650371Z cvt.u64.u32 %rd649, %r1343; 2026-02-21T09:08:09.0650490Z shl.b64 %rd650, %rd649, 32; 2026-02-21T09:08:09.0650607Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T09:08:09.0650727Z cvt.u64.u32 %rd652, %r1346; 2026-02-21T09:08:09.0650854Z cvt.u64.u32 %rd653, %r1347; 2026-02-21T09:08:09.0650969Z shl.b64 %rd654, %rd653, 32; 2026-02-21T09:08:09.0651087Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T09:08:09.0651208Z cvt.u64.u32 %rd656, %r1348; 2026-02-21T09:08:09.0651326Z cvt.u64.u32 %rd657, %r1349; 2026-02-21T09:08:09.0651438Z shl.b64 %rd658, %rd657, 32; 2026-02-21T09:08:09.0651552Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T09:08:09.0651738Z cvt.u64.u32 %rd660, %r1350; 2026-02-21T09:08:09.0651858Z cvt.u64.u32 %rd661, %r1351; 2026-02-21T09:08:09.0651973Z shl.b64 %rd662, %rd661, 32; 2026-02-21T09:08:09.0652100Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T09:08:09.0652213Z // begin inline asm 2026-02-21T09:08:09.0652956Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1353, %r1354, %r1355, %r1356, %r1357, %r1358, %r1359, %r1360, %r1361, %r1362, %r1363, %r1364, %r1365, %r1366, %r1367, %r1368}, [%r1369 + 496]; 2026-02-21T09:08:09.0653076Z // end inline asm 2026-02-21T09:08:09.0653256Z cvt.u64.u32 %rd664, %r1355; 2026-02-21T09:08:09.0653379Z cvt.u64.u32 %rd665, %r1356; 2026-02-21T09:08:09.0653498Z shl.b64 %rd666, %rd665, 32; 2026-02-21T09:08:09.0653632Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T09:08:09.0653748Z cvt.u64.u32 %rd668, %r1357; 2026-02-21T09:08:09.0653867Z cvt.u64.u32 %rd669, %r1358; 2026-02-21T09:08:09.0653994Z shl.b64 %rd670, %rd669, 32; 2026-02-21T09:08:09.0654112Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T09:08:09.0654232Z cvt.u64.u32 %rd672, %r1359; 2026-02-21T09:08:09.0654348Z cvt.u64.u32 %rd673, %r1360; 2026-02-21T09:08:09.0654469Z shl.b64 %rd674, %rd673, 32; 
2026-02-21T09:08:09.0654586Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T09:08:09.0654803Z cvt.u64.u32 %rd676, %r1363; 2026-02-21T09:08:09.0654939Z cvt.u64.u32 %rd677, %r1364; 2026-02-21T09:08:09.0655057Z shl.b64 %rd678, %rd677, 32; 2026-02-21T09:08:09.0655177Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T09:08:09.0655306Z cvt.u64.u32 %rd680, %r1365; 2026-02-21T09:08:09.0655430Z cvt.u64.u32 %rd681, %r1366; 2026-02-21T09:08:09.0655550Z shl.b64 %rd682, %rd681, 32; 2026-02-21T09:08:09.0655667Z or.b64 %rd683, %rd680, %rd682; 2026-02-21T09:08:09.0655792Z cvt.u64.u32 %rd684, %r1367; 2026-02-21T09:08:09.0655907Z cvt.u64.u32 %rd685, %r1368; 2026-02-21T09:08:09.0656030Z shl.b64 %rd686, %rd685, 32; 2026-02-21T09:08:09.0656157Z or.b64 %rd687, %rd684, %rd686; 2026-02-21T09:08:09.0656273Z // begin inline asm 2026-02-21T09:08:09.0656445Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:08:09.0656554Z // end inline asm 2026-02-21T09:08:09.0656684Z cvt.u64.u32 %rd688, %r826; 2026-02-21T09:08:09.0656803Z cvt.u64.u32 %rd689, %r827; 2026-02-21T09:08:09.0656921Z shl.b64 %rd690, %rd689, 32; 2026-02-21T09:08:09.0657049Z or.b64 %rd691, %rd688, %rd690; 2026-02-21T09:08:09.0657468Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0657600Z mov.b64 {%r2075, %r2076}, %rd691; 2026-02-21T09:08:09.0657759Z cvt.rn.f16x2.f32 %r2077, %r2076, %r2075; 2026-02-21T09:08:09.0658164Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0658283Z cvt.u64.u32 %rd692, %r828; 2026-02-21T09:08:09.0658399Z cvt.u64.u32 %rd693, %r829; 2026-02-21T09:08:09.0658526Z shl.b64 %rd694, %rd693, 32; 2026-02-21T09:08:09.0658646Z or.b64 %rd695, %rd692, %rd694; 2026-02-21T09:08:09.0659040Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0659260Z mov.b64 {%r2078, %r2079}, %rd695; 2026-02-21T09:08:09.0659409Z cvt.rn.f16x2.f32 %r2080, %r2079, %r2078; 2026-02-21T09:08:09.0659533Z mov.b64 {%r2081, 
%r2082}, %rd135; 2026-02-21T09:08:09.0659674Z cvt.rn.f16x2.f32 %r2083, %r2082, %r2081; 2026-02-21T09:08:09.0659859Z mov.b64 {%r2084, %r2085}, %rd139; 2026-02-21T09:08:09.0660001Z cvt.rn.f16x2.f32 %r2086, %r2085, %r2084; 2026-02-21T09:08:09.0660416Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0660546Z cvt.u64.u32 %rd696, %r834; 2026-02-21T09:08:09.0660664Z cvt.u64.u32 %rd697, %r835; 2026-02-21T09:08:09.0660782Z shl.b64 %rd698, %rd697, 32; 2026-02-21T09:08:09.0660907Z or.b64 %rd699, %rd696, %rd698; 2026-02-21T09:08:09.0661299Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0661485Z mov.b64 {%r2087, %r2088}, %rd699; 2026-02-21T09:08:09.0661629Z cvt.rn.f16x2.f32 %r2089, %r2088, %r2087; 2026-02-21T09:08:09.0662031Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0662156Z cvt.u64.u32 %rd700, %r836; 2026-02-21T09:08:09.0662279Z cvt.u64.u32 %rd701, %r837; 2026-02-21T09:08:09.0662418Z shl.b64 %rd702, %rd701, 32; 2026-02-21T09:08:09.0662533Z or.b64 %rd703, %rd700, %rd702; 2026-02-21T09:08:09.0662983Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0663116Z mov.b64 {%r2090, %r2091}, %rd703; 2026-02-21T09:08:09.0663259Z cvt.rn.f16x2.f32 %r2092, %r2091, %r2090; 2026-02-21T09:08:09.0663384Z mov.b64 {%r2093, %r2094}, %rd143; 2026-02-21T09:08:09.0663522Z cvt.rn.f16x2.f32 %r2095, %r2094, %r2093; 2026-02-21T09:08:09.0663652Z mov.b64 {%r2096, %r2097}, %rd147; 2026-02-21T09:08:09.0663789Z cvt.rn.f16x2.f32 %r2098, %r2097, %r2096; 2026-02-21T09:08:09.0664178Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0664308Z cvt.u64.u32 %rd704, %r843; 2026-02-21T09:08:09.0664424Z cvt.u64.u32 %rd705, %r844; 2026-02-21T09:08:09.0664539Z shl.b64 %rd706, %rd705, 32; 2026-02-21T09:08:09.0664726Z or.b64 %rd707, %rd704, %rd706; 
2026-02-21T09:08:09.0665106Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0665225Z mov.b64 {%r2099, %r2100}, %rd707; 2026-02-21T09:08:09.0665368Z cvt.rn.f16x2.f32 %r2101, %r2100, %r2099; 2026-02-21T09:08:09.0665769Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0665889Z cvt.u64.u32 %rd708, %r845; 2026-02-21T09:08:09.0666003Z cvt.u64.u32 %rd709, %r846; 2026-02-21T09:08:09.0666130Z shl.b64 %rd710, %rd709, 32; 2026-02-21T09:08:09.0666250Z or.b64 %rd711, %rd708, %rd710; 2026-02-21T09:08:09.0666640Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0666767Z mov.b64 {%r2102, %r2103}, %rd711; 2026-02-21T09:08:09.0666902Z cvt.rn.f16x2.f32 %r2104, %r2103, %r2102; 2026-02-21T09:08:09.0667021Z mov.b64 {%r2105, %r2106}, %rd151; 2026-02-21T09:08:09.0667165Z cvt.rn.f16x2.f32 %r2107, %r2106, %r2105; 2026-02-21T09:08:09.0667297Z mov.b64 {%r2108, %r2109}, %rd155; 2026-02-21T09:08:09.0667434Z cvt.rn.f16x2.f32 %r2110, %r2109, %r2108; 2026-02-21T09:08:09.0667826Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0667953Z cvt.u64.u32 %rd712, %r851; 2026-02-21T09:08:09.0668064Z cvt.u64.u32 %rd713, %r852; 2026-02-21T09:08:09.0668178Z shl.b64 %rd714, %rd713, 32; 2026-02-21T09:08:09.0668308Z or.b64 %rd715, %rd712, %rd714; 2026-02-21T09:08:09.0668689Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0668810Z mov.b64 {%r2111, %r2112}, %rd715; 2026-02-21T09:08:09.0669069Z cvt.rn.f16x2.f32 %r2113, %r2112, %r2111; 2026-02-21T09:08:09.0669462Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0669584Z cvt.u64.u32 %rd716, %r853; 2026-02-21T09:08:09.0669770Z cvt.u64.u32 %rd717, %r854; 2026-02-21T09:08:09.0669900Z shl.b64 %rd718, %rd717, 32; 2026-02-21T09:08:09.0670021Z or.b64 %rd719, %rd716, %rd718; 
2026-02-21T09:08:09.0670417Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0670543Z mov.b64 {%r2114, %r2115}, %rd719; 2026-02-21T09:08:09.0670683Z cvt.rn.f16x2.f32 %r2116, %r2115, %r2114; 2026-02-21T09:08:09.0670806Z mov.b64 {%r2117, %r2118}, %rd159; 2026-02-21T09:08:09.0670941Z cvt.rn.f16x2.f32 %r2119, %r2118, %r2117; 2026-02-21T09:08:09.0671070Z mov.b64 {%r2120, %r2121}, %rd163; 2026-02-21T09:08:09.0671296Z cvt.rn.f16x2.f32 %r2122, %r2121, %r2120; 2026-02-21T09:08:09.0671702Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0671838Z cvt.u64.u32 %rd720, %r860; 2026-02-21T09:08:09.0671960Z cvt.u64.u32 %rd721, %r861; 2026-02-21T09:08:09.0672079Z shl.b64 %rd722, %rd721, 32; 2026-02-21T09:08:09.0672214Z or.b64 %rd723, %rd720, %rd722; 2026-02-21T09:08:09.0672601Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0672789Z mov.b64 {%r2123, %r2124}, %rd723; 2026-02-21T09:08:09.0672928Z cvt.rn.f16x2.f32 %r2125, %r2124, %r2123; 2026-02-21T09:08:09.0673332Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0673453Z cvt.u64.u32 %rd724, %r862; 2026-02-21T09:08:09.0673570Z cvt.u64.u32 %rd725, %r863; 2026-02-21T09:08:09.0673703Z shl.b64 %rd726, %rd725, 32; 2026-02-21T09:08:09.0673823Z or.b64 %rd727, %rd724, %rd726; 2026-02-21T09:08:09.0674211Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0674346Z mov.b64 {%r2126, %r2127}, %rd727; 2026-02-21T09:08:09.0674479Z cvt.rn.f16x2.f32 %r2128, %r2127, %r2126; 2026-02-21T09:08:09.0674597Z mov.b64 {%r2129, %r2130}, %rd167; 2026-02-21T09:08:09.0674843Z cvt.rn.f16x2.f32 %r2131, %r2130, %r2129; 2026-02-21T09:08:09.0674977Z mov.b64 {%r2132, %r2133}, %rd171; 2026-02-21T09:08:09.0675115Z cvt.rn.f16x2.f32 %r2134, %r2133, %r2132; 2026-02-21T09:08:09.0675509Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0675640Z cvt.u64.u32 %rd728, %r868; 2026-02-21T09:08:09.0675763Z cvt.u64.u32 %rd729, %r869; 2026-02-21T09:08:09.0675885Z shl.b64 %rd730, %rd729, 32; 2026-02-21T09:08:09.0676007Z or.b64 %rd731, %rd728, %rd730; 2026-02-21T09:08:09.0676401Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0676523Z mov.b64 {%r2135, %r2136}, %rd731; 2026-02-21T09:08:09.0676661Z cvt.rn.f16x2.f32 %r2137, %r2136, %r2135; 2026-02-21T09:08:09.0677060Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0677184Z cvt.u64.u32 %rd732, %r870; 2026-02-21T09:08:09.0677301Z cvt.u64.u32 %rd733, %r871; 2026-02-21T09:08:09.0677428Z shl.b64 %rd734, %rd733, 32; 2026-02-21T09:08:09.0677549Z or.b64 %rd735, %rd732, %rd734; 2026-02-21T09:08:09.0677936Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0678063Z mov.b64 {%r2138, %r2139}, %rd735; 2026-02-21T09:08:09.0678197Z cvt.rn.f16x2.f32 %r2140, %r2139, %r2138; 2026-02-21T09:08:09.0678317Z mov.b64 {%r2141, %r2142}, %rd175; 2026-02-21T09:08:09.0678452Z cvt.rn.f16x2.f32 %r2143, %r2142, %r2141; 2026-02-21T09:08:09.0678579Z mov.b64 {%r2144, %r2145}, %rd179; 2026-02-21T09:08:09.0678716Z cvt.rn.f16x2.f32 %r2146, %r2145, %r2144; 2026-02-21T09:08:09.0679203Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0679337Z cvt.u64.u32 %rd736, %r877; 2026-02-21T09:08:09.0679456Z cvt.u64.u32 %rd737, %r878; 2026-02-21T09:08:09.0679641Z shl.b64 %rd738, %rd737, 32; 2026-02-21T09:08:09.0679763Z or.b64 %rd739, %rd736, %rd738; 2026-02-21T09:08:09.0680166Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0680290Z mov.b64 {%r2147, %r2148}, %rd739; 2026-02-21T09:08:09.0680436Z cvt.rn.f16x2.f32 %r2149, %r2148, %r2147; 2026-02-21T09:08:09.0680841Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0680963Z cvt.u64.u32 %rd740, %r879; 2026-02-21T09:08:09.0681083Z cvt.u64.u32 %rd741, %r880; 2026-02-21T09:08:09.0681213Z shl.b64 %rd742, %rd741, 32; 2026-02-21T09:08:09.0681400Z or.b64 %rd743, %rd740, %rd742; 2026-02-21T09:08:09.0681798Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0681918Z mov.b64 {%r2150, %r2151}, %rd743; 2026-02-21T09:08:09.0682078Z cvt.rn.f16x2.f32 %r2152, %r2151, %r2150; 2026-02-21T09:08:09.0682203Z mov.b64 {%r2153, %r2154}, %rd183; 2026-02-21T09:08:09.0682345Z cvt.rn.f16x2.f32 %r2155, %r2154, %r2153; 2026-02-21T09:08:09.0682483Z mov.b64 {%r2156, %r2157}, %rd187; 2026-02-21T09:08:09.0682676Z cvt.rn.f16x2.f32 %r2158, %r2157, %r2156; 2026-02-21T09:08:09.0683073Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0683208Z cvt.u64.u32 %rd744, %r885; 2026-02-21T09:08:09.0683329Z cvt.u64.u32 %rd745, %r886; 2026-02-21T09:08:09.0683450Z shl.b64 %rd746, %rd745, 32; 2026-02-21T09:08:09.0683570Z or.b64 %rd747, %rd744, %rd746; 2026-02-21T09:08:09.0683977Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0684102Z mov.b64 {%r2159, %r2160}, %rd747; 2026-02-21T09:08:09.0684237Z cvt.rn.f16x2.f32 %r2161, %r2160, %r2159; 2026-02-21T09:08:09.0684640Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0684846Z cvt.u64.u32 %rd748, %r887; 2026-02-21T09:08:09.0684962Z cvt.u64.u32 %rd749, %r888; 2026-02-21T09:08:09.0685095Z shl.b64 %rd750, %rd749, 32; 2026-02-21T09:08:09.0685215Z or.b64 %rd751, %rd748, %rd750; 2026-02-21T09:08:09.0685612Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0685733Z mov.b64 {%r2162, %r2163}, %rd751; 2026-02-21T09:08:09.0685883Z cvt.rn.f16x2.f32 %r2164, %r2163, %r2162; 2026-02-21T09:08:09.0686001Z mov.b64 {%r2165, 
%r2166}, %rd191; 2026-02-21T09:08:09.0686136Z cvt.rn.f16x2.f32 %r2167, %r2166, %r2165; 2026-02-21T09:08:09.0686263Z mov.b64 {%r2168, %r2169}, %rd195; 2026-02-21T09:08:09.0686399Z cvt.rn.f16x2.f32 %r2170, %r2169, %r2168; 2026-02-21T09:08:09.0686802Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0686929Z cvt.u64.u32 %rd752, %r894; 2026-02-21T09:08:09.0687044Z cvt.u64.u32 %rd753, %r895; 2026-02-21T09:08:09.0687164Z shl.b64 %rd754, %rd753, 32; 2026-02-21T09:08:09.0687283Z or.b64 %rd755, %rd752, %rd754; 2026-02-21T09:08:09.0687690Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0687810Z mov.b64 {%r2171, %r2172}, %rd755; 2026-02-21T09:08:09.0687946Z cvt.rn.f16x2.f32 %r2173, %r2172, %r2171; 2026-02-21T09:08:09.0688348Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0688467Z cvt.u64.u32 %rd756, %r896; 2026-02-21T09:08:09.0688584Z cvt.u64.u32 %rd757, %r897; 2026-02-21T09:08:09.0688717Z shl.b64 %rd758, %rd757, 32; 2026-02-21T09:08:09.0688840Z or.b64 %rd759, %rd756, %rd758; 2026-02-21T09:08:09.0689367Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0689490Z mov.b64 {%r2174, %r2175}, %rd759; 2026-02-21T09:08:09.0689641Z cvt.rn.f16x2.f32 %r2176, %r2175, %r2174; 2026-02-21T09:08:09.0689820Z mov.b64 {%r2177, %r2178}, %rd199; 2026-02-21T09:08:09.0689955Z cvt.rn.f16x2.f32 %r2179, %r2178, %r2177; 2026-02-21T09:08:09.0690082Z mov.b64 {%r2180, %r2181}, %rd203; 2026-02-21T09:08:09.0690220Z cvt.rn.f16x2.f32 %r2182, %r2181, %r2180; 2026-02-21T09:08:09.0690608Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0690738Z cvt.u64.u32 %rd760, %r902; 2026-02-21T09:08:09.0690857Z cvt.u64.u32 %rd761, %r903; 2026-02-21T09:08:09.0690974Z shl.b64 %rd762, %rd761, 32; 2026-02-21T09:08:09.0691092Z or.b64 %rd763, %rd760, %rd762; 
2026-02-21T09:08:09.0691548Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0691674Z mov.b64 {%r2183, %r2184}, %rd763; 2026-02-21T09:08:09.0691816Z cvt.rn.f16x2.f32 %r2185, %r2184, %r2183; 2026-02-21T09:08:09.0692212Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0692335Z cvt.u64.u32 %rd764, %r904; 2026-02-21T09:08:09.0692455Z cvt.u64.u32 %rd765, %r905; 2026-02-21T09:08:09.0692585Z shl.b64 %rd766, %rd765, 32; 2026-02-21T09:08:09.0692770Z or.b64 %rd767, %rd764, %rd766; 2026-02-21T09:08:09.0693167Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0693289Z mov.b64 {%r2186, %r2187}, %rd767; 2026-02-21T09:08:09.0693443Z cvt.rn.f16x2.f32 %r2188, %r2187, %r2186; 2026-02-21T09:08:09.0693559Z mov.b64 {%r2189, %r2190}, %rd207; 2026-02-21T09:08:09.0693700Z cvt.rn.f16x2.f32 %r2191, %r2190, %r2189; 2026-02-21T09:08:09.0693830Z mov.b64 {%r2192, %r2193}, %rd211; 2026-02-21T09:08:09.0693968Z cvt.rn.f16x2.f32 %r2194, %r2193, %r2192; 2026-02-21T09:08:09.0694355Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0694485Z cvt.u64.u32 %rd768, %r911; 2026-02-21T09:08:09.0694603Z cvt.u64.u32 %rd769, %r912; 2026-02-21T09:08:09.0694818Z shl.b64 %rd770, %rd769, 32; 2026-02-21T09:08:09.0694939Z or.b64 %rd771, %rd768, %rd770; 2026-02-21T09:08:09.0695342Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0695463Z mov.b64 {%r2195, %r2196}, %rd771; 2026-02-21T09:08:09.0695601Z cvt.rn.f16x2.f32 %r2197, %r2196, %r2195; 2026-02-21T09:08:09.0696009Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0696129Z cvt.u64.u32 %rd772, %r913; 2026-02-21T09:08:09.0696250Z cvt.u64.u32 %rd773, %r914; 2026-02-21T09:08:09.0696373Z shl.b64 %rd774, %rd773, 32; 2026-02-21T09:08:09.0696509Z or.b64 %rd775, %rd772, %rd774; 
2026-02-21T09:08:09.0696905Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0697028Z mov.b64 {%r2198, %r2199}, %rd775; 2026-02-21T09:08:09.0697182Z cvt.rn.f16x2.f32 %r2200, %r2199, %r2198; 2026-02-21T09:08:09.0697304Z mov.b64 {%r2201, %r2202}, %rd215; 2026-02-21T09:08:09.0697440Z cvt.rn.f16x2.f32 %r2203, %r2202, %r2201; 2026-02-21T09:08:09.0697570Z mov.b64 {%r2204, %r2205}, %rd219; 2026-02-21T09:08:09.0697712Z cvt.rn.f16x2.f32 %r2206, %r2205, %r2204; 2026-02-21T09:08:09.0698111Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0698230Z cvt.u64.u32 %rd776, %r919; 2026-02-21T09:08:09.0698364Z cvt.u64.u32 %rd777, %r920; 2026-02-21T09:08:09.0698483Z shl.b64 %rd778, %rd777, 32; 2026-02-21T09:08:09.0698601Z or.b64 %rd779, %rd776, %rd778; 2026-02-21T09:08:09.0699005Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0699207Z mov.b64 {%r2207, %r2208}, %rd779; 2026-02-21T09:08:09.0699348Z cvt.rn.f16x2.f32 %r2209, %r2208, %r2207; 2026-02-21T09:08:09.0699751Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0699935Z cvt.u64.u32 %rd780, %r921; 2026-02-21T09:08:09.0700055Z cvt.u64.u32 %rd781, %r922; 2026-02-21T09:08:09.0700179Z shl.b64 %rd782, %rd781, 32; 2026-02-21T09:08:09.0700319Z or.b64 %rd783, %rd780, %rd782; 2026-02-21T09:08:09.0700720Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0700842Z mov.b64 {%r2210, %r2211}, %rd783; 2026-02-21T09:08:09.0701001Z cvt.rn.f16x2.f32 %r2212, %r2211, %r2210; 2026-02-21T09:08:09.0701125Z mov.b64 {%r2213, %r2214}, %rd223; 2026-02-21T09:08:09.0701265Z cvt.rn.f16x2.f32 %r2215, %r2214, %r2213; 2026-02-21T09:08:09.0701460Z mov.b64 {%r2216, %r2217}, %rd227; 2026-02-21T09:08:09.0701599Z cvt.rn.f16x2.f32 %r2218, %r2217, %r2216; 2026-02-21T09:08:09.0701995Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0702121Z cvt.u64.u32 %rd784, %r928; 2026-02-21T09:08:09.0702252Z cvt.u64.u32 %rd785, %r929; 2026-02-21T09:08:09.0702372Z shl.b64 %rd786, %rd785, 32; 2026-02-21T09:08:09.0702488Z or.b64 %rd787, %rd784, %rd786; 2026-02-21T09:08:09.0702972Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0703094Z mov.b64 {%r2219, %r2220}, %rd787; 2026-02-21T09:08:09.0703235Z cvt.rn.f16x2.f32 %r2221, %r2220, %r2219; 2026-02-21T09:08:09.0703633Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0703751Z cvt.u64.u32 %rd788, %r930; 2026-02-21T09:08:09.0703868Z cvt.u64.u32 %rd789, %r931; 2026-02-21T09:08:09.0703988Z shl.b64 %rd790, %rd789, 32; 2026-02-21T09:08:09.0704124Z or.b64 %rd791, %rd788, %rd790; 2026-02-21T09:08:09.0704512Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0704631Z mov.b64 {%r2222, %r2223}, %rd791; 2026-02-21T09:08:09.0704867Z cvt.rn.f16x2.f32 %r2224, %r2223, %r2222; 2026-02-21T09:08:09.0704989Z mov.b64 {%r2225, %r2226}, %rd231; 2026-02-21T09:08:09.0705126Z cvt.rn.f16x2.f32 %r2227, %r2226, %r2225; 2026-02-21T09:08:09.0705257Z mov.b64 {%r2228, %r2229}, %rd235; 2026-02-21T09:08:09.0705393Z cvt.rn.f16x2.f32 %r2230, %r2229, %r2228; 2026-02-21T09:08:09.0705783Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0705901Z cvt.u64.u32 %rd792, %r936; 2026-02-21T09:08:09.0706025Z cvt.u64.u32 %rd793, %r937; 2026-02-21T09:08:09.0706143Z shl.b64 %rd794, %rd793, 32; 2026-02-21T09:08:09.0706261Z or.b64 %rd795, %rd792, %rd794; 2026-02-21T09:08:09.0706663Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0706783Z mov.b64 {%r2231, %r2232}, %rd795; 2026-02-21T09:08:09.0706923Z cvt.rn.f16x2.f32 %r2233, %r2232, %r2231; 2026-02-21T09:08:09.0707322Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0707444Z cvt.u64.u32 %rd796, %r938; 2026-02-21T09:08:09.0707561Z cvt.u64.u32 %rd797, %r939; 2026-02-21T09:08:09.0707680Z shl.b64 %rd798, %rd797, 32; 2026-02-21T09:08:09.0707812Z or.b64 %rd799, %rd796, %rd798; 2026-02-21T09:08:09.0708196Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0708314Z mov.b64 {%r2234, %r2235}, %rd799; 2026-02-21T09:08:09.0708463Z cvt.rn.f16x2.f32 %r2236, %r2235, %r2234; 2026-02-21T09:08:09.0708581Z mov.b64 {%r2237, %r2238}, %rd239; 2026-02-21T09:08:09.0708718Z cvt.rn.f16x2.f32 %r2239, %r2238, %r2237; 2026-02-21T09:08:09.0708922Z mov.b64 {%r2240, %r2241}, %rd243; 2026-02-21T09:08:09.0709059Z cvt.rn.f16x2.f32 %r2242, %r2241, %r2240; 2026-02-21T09:08:09.0709448Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0709637Z cvt.u64.u32 %rd800, %r945; 2026-02-21T09:08:09.0709768Z cvt.u64.u32 %rd801, %r946; 2026-02-21T09:08:09.0709888Z shl.b64 %rd802, %rd801, 32; 2026-02-21T09:08:09.0710014Z or.b64 %rd803, %rd800, %rd802; 2026-02-21T09:08:09.0710422Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0710544Z mov.b64 {%r2243, %r2244}, %rd803; 2026-02-21T09:08:09.0710680Z cvt.rn.f16x2.f32 %r2245, %r2244, %r2243; 2026-02-21T09:08:09.0711072Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0711194Z cvt.u64.u32 %rd804, %r947; 2026-02-21T09:08:09.0711376Z cvt.u64.u32 %rd805, %r948; 2026-02-21T09:08:09.0711499Z shl.b64 %rd806, %rd805, 32; 2026-02-21T09:08:09.0711626Z or.b64 %rd807, %rd804, %rd806; 2026-02-21T09:08:09.0712014Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0712135Z mov.b64 {%r2246, %r2247}, %rd807; 2026-02-21T09:08:09.0712283Z cvt.rn.f16x2.f32 %r2248, %r2247, %r2246; 2026-02-21T09:08:09.0712401Z mov.b64 {%r2249, 
%r2250}, %rd247; 2026-02-21T09:08:09.0712543Z cvt.rn.f16x2.f32 %r2251, %r2250, %r2249; 2026-02-21T09:08:09.0712719Z mov.b64 {%r2252, %r2253}, %rd251; 2026-02-21T09:08:09.0712870Z cvt.rn.f16x2.f32 %r2254, %r2253, %r2252; 2026-02-21T09:08:09.0713267Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0713388Z cvt.u64.u32 %rd808, %r953; 2026-02-21T09:08:09.0713519Z cvt.u64.u32 %rd809, %r954; 2026-02-21T09:08:09.0713640Z shl.b64 %rd810, %rd809, 32; 2026-02-21T09:08:09.0713767Z or.b64 %rd811, %rd808, %rd810; 2026-02-21T09:08:09.0714168Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0714293Z mov.b64 {%r2255, %r2256}, %rd811; 2026-02-21T09:08:09.0714435Z cvt.rn.f16x2.f32 %r2257, %r2256, %r2255; 2026-02-21T09:08:09.0714930Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0715068Z cvt.u64.u32 %rd812, %r955; 2026-02-21T09:08:09.0715191Z cvt.u64.u32 %rd813, %r956; 2026-02-21T09:08:09.0715312Z shl.b64 %rd814, %rd813, 32; 2026-02-21T09:08:09.0715444Z or.b64 %rd815, %rd812, %rd814; 2026-02-21T09:08:09.0715833Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0715955Z mov.b64 {%r2258, %r2259}, %rd815; 2026-02-21T09:08:09.0716106Z cvt.rn.f16x2.f32 %r2260, %r2259, %r2258; 2026-02-21T09:08:09.0716222Z mov.b64 {%r2261, %r2262}, %rd255; 2026-02-21T09:08:09.0716362Z cvt.rn.f16x2.f32 %r2263, %r2262, %r2261; 2026-02-21T09:08:09.0716482Z mov.b64 {%r2264, %r2265}, %rd259; 2026-02-21T09:08:09.0716629Z cvt.rn.f16x2.f32 %r2266, %r2265, %r2264; 2026-02-21T09:08:09.0717011Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0717130Z cvt.u64.u32 %rd816, %r962; 2026-02-21T09:08:09.0717260Z cvt.u64.u32 %rd817, %r963; 2026-02-21T09:08:09.0717379Z shl.b64 %rd818, %rd817, 32; 2026-02-21T09:08:09.0717501Z or.b64 %rd819, %rd816, %rd818; 
2026-02-21T09:08:09.0717906Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0718025Z mov.b64 {%r2267, %r2268}, %rd819; 2026-02-21T09:08:09.0718159Z cvt.rn.f16x2.f32 %r2269, %r2268, %r2267; 2026-02-21T09:08:09.0718566Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0718694Z cvt.u64.u32 %rd820, %r964; 2026-02-21T09:08:09.0718815Z cvt.u64.u32 %rd821, %r965; 2026-02-21T09:08:09.0719017Z shl.b64 %rd822, %rd821, 32; 2026-02-21T09:08:09.0719147Z or.b64 %rd823, %rd820, %rd822; 2026-02-21T09:08:09.0719536Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0719710Z mov.b64 {%r2270, %r2271}, %rd823; 2026-02-21T09:08:09.0719855Z cvt.rn.f16x2.f32 %r2272, %r2271, %r2270; 2026-02-21T09:08:09.0719974Z mov.b64 {%r2273, %r2274}, %rd263; 2026-02-21T09:08:09.0720112Z cvt.rn.f16x2.f32 %r2275, %r2274, %r2273; 2026-02-21T09:08:09.0720236Z mov.b64 {%r2276, %r2277}, %rd267; 2026-02-21T09:08:09.0720386Z cvt.rn.f16x2.f32 %r2278, %r2277, %r2276; 2026-02-21T09:08:09.0720774Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0720893Z cvt.u64.u32 %rd824, %r970; 2026-02-21T09:08:09.0721021Z cvt.u64.u32 %rd825, %r971; 2026-02-21T09:08:09.0721142Z shl.b64 %rd826, %rd825, 32; 2026-02-21T09:08:09.0721326Z or.b64 %rd827, %rd824, %rd826; 2026-02-21T09:08:09.0721734Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0721854Z mov.b64 {%r2279, %r2280}, %rd827; 2026-02-21T09:08:09.0721994Z cvt.rn.f16x2.f32 %r2281, %r2280, %r2279; 2026-02-21T09:08:09.0722388Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0722515Z cvt.u64.u32 %rd828, %r972; 2026-02-21T09:08:09.0722721Z cvt.u64.u32 %rd829, %r973; 2026-02-21T09:08:09.0722842Z shl.b64 %rd830, %rd829, 32; 2026-02-21T09:08:09.0722975Z or.b64 %rd831, %rd828, %rd830; 
2026-02-21T09:08:09.0723364Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0723482Z mov.b64 {%r2282, %r2283}, %rd831; 2026-02-21T09:08:09.0723627Z cvt.rn.f16x2.f32 %r2284, %r2283, %r2282; 2026-02-21T09:08:09.0723750Z mov.b64 {%r2285, %r2286}, %rd271; 2026-02-21T09:08:09.0723886Z cvt.rn.f16x2.f32 %r2287, %r2286, %r2285; 2026-02-21T09:08:09.0724006Z mov.b64 {%r2288, %r2289}, %rd275; 2026-02-21T09:08:09.0724148Z cvt.rn.f16x2.f32 %r2290, %r2289, %r2288; 2026-02-21T09:08:09.0724534Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0724655Z cvt.u64.u32 %rd832, %r979; 2026-02-21T09:08:09.0724870Z cvt.u64.u32 %rd833, %r980; 2026-02-21T09:08:09.0724990Z shl.b64 %rd834, %rd833, 32; 2026-02-21T09:08:09.0725114Z or.b64 %rd835, %rd832, %rd834; 2026-02-21T09:08:09.0725532Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0725652Z mov.b64 {%r2291, %r2292}, %rd835; 2026-02-21T09:08:09.0725791Z cvt.rn.f16x2.f32 %r2293, %r2292, %r2291; 2026-02-21T09:08:09.0726190Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0726325Z cvt.u64.u32 %rd836, %r981; 2026-02-21T09:08:09.0726450Z cvt.u64.u32 %rd837, %r982; 2026-02-21T09:08:09.0726571Z shl.b64 %rd838, %rd837, 32; 2026-02-21T09:08:09.0726700Z or.b64 %rd839, %rd836, %rd838; 2026-02-21T09:08:09.0727086Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0727213Z mov.b64 {%r2294, %r2295}, %rd839; 2026-02-21T09:08:09.0727361Z cvt.rn.f16x2.f32 %r2296, %r2295, %r2294; 2026-02-21T09:08:09.0727481Z mov.b64 {%r2297, %r2298}, %rd279; 2026-02-21T09:08:09.0727621Z cvt.rn.f16x2.f32 %r2299, %r2298, %r2297; 2026-02-21T09:08:09.0727737Z mov.b64 {%r2300, %r2301}, %rd283; 2026-02-21T09:08:09.0727885Z cvt.rn.f16x2.f32 %r2302, %r2301, %r2300; 2026-02-21T09:08:09.0728279Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0728399Z cvt.u64.u32 %rd840, %r987; 2026-02-21T09:08:09.0728526Z cvt.u64.u32 %rd841, %r988; 2026-02-21T09:08:09.0728645Z shl.b64 %rd842, %rd841, 32; 2026-02-21T09:08:09.0729216Z or.b64 %rd843, %rd840, %rd842; 2026-02-21T09:08:09.0729618Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0729748Z mov.b64 {%r2303, %r2304}, %rd843; 2026-02-21T09:08:09.0729888Z cvt.rn.f16x2.f32 %r2305, %r2304, %r2303; 2026-02-21T09:08:09.0730351Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0730484Z cvt.u64.u32 %rd844, %r989; 2026-02-21T09:08:09.0730606Z cvt.u64.u32 %rd845, %r990; 2026-02-21T09:08:09.0730724Z shl.b64 %rd846, %rd845, 32; 2026-02-21T09:08:09.0730856Z or.b64 %rd847, %rd844, %rd846; 2026-02-21T09:08:09.0731250Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0731369Z mov.b64 {%r2306, %r2307}, %rd847; 2026-02-21T09:08:09.0731506Z cvt.rn.f16x2.f32 %r2308, %r2307, %r2306; 2026-02-21T09:08:09.0731722Z mov.b64 {%r2309, %r2310}, %rd287; 2026-02-21T09:08:09.0731867Z cvt.rn.f16x2.f32 %r2311, %r2310, %r2309; 2026-02-21T09:08:09.0731982Z mov.b64 {%r2312, %r2313}, %rd291; 2026-02-21T09:08:09.0732134Z cvt.rn.f16x2.f32 %r2314, %r2313, %r2312; 2026-02-21T09:08:09.0732527Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0732649Z cvt.u64.u32 %rd848, %r996; 2026-02-21T09:08:09.0732777Z cvt.u64.u32 %rd849, %r997; 2026-02-21T09:08:09.0732898Z shl.b64 %rd850, %rd849, 32; 2026-02-21T09:08:09.0733083Z or.b64 %rd851, %rd848, %rd850; 2026-02-21T09:08:09.0733481Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0733612Z mov.b64 {%r2315, %r2316}, %rd851; 2026-02-21T09:08:09.0733749Z cvt.rn.f16x2.f32 %r2317, %r2316, %r2315; 2026-02-21T09:08:09.0734138Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0734273Z cvt.u64.u32 %rd852, %r998; 2026-02-21T09:08:09.0734399Z cvt.u64.u32 %rd853, %r999; 2026-02-21T09:08:09.0734523Z shl.b64 %rd854, %rd853, 32; 2026-02-21T09:08:09.0734655Z or.b64 %rd855, %rd852, %rd854; 2026-02-21T09:08:09.0735156Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0735280Z mov.b64 {%r2318, %r2319}, %rd855; 2026-02-21T09:08:09.0735424Z cvt.rn.f16x2.f32 %r2320, %r2319, %r2318; 2026-02-21T09:08:09.0735555Z mov.b64 {%r2321, %r2322}, %rd295; 2026-02-21T09:08:09.0735694Z cvt.rn.f16x2.f32 %r2323, %r2322, %r2321; 2026-02-21T09:08:09.0735810Z mov.b64 {%r2324, %r2325}, %rd299; 2026-02-21T09:08:09.0735960Z cvt.rn.f16x2.f32 %r2326, %r2325, %r2324; 2026-02-21T09:08:09.0736349Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0736473Z cvt.u64.u32 %rd856, %r1004; 2026-02-21T09:08:09.0736600Z cvt.u64.u32 %rd857, %r1005; 2026-02-21T09:08:09.0736719Z shl.b64 %rd858, %rd857, 32; 2026-02-21T09:08:09.0736842Z or.b64 %rd859, %rd856, %rd858; 2026-02-21T09:08:09.0737233Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0737367Z mov.b64 {%r2327, %r2328}, %rd859; 2026-02-21T09:08:09.0737510Z cvt.rn.f16x2.f32 %r2329, %r2328, %r2327; 2026-02-21T09:08:09.0737904Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0738036Z cvt.u64.u32 %rd860, %r1006; 2026-02-21T09:08:09.0738157Z cvt.u64.u32 %rd861, %r1007; 2026-02-21T09:08:09.0738274Z shl.b64 %rd862, %rd861, 32; 2026-02-21T09:08:09.0738410Z or.b64 %rd863, %rd860, %rd862; 2026-02-21T09:08:09.0738796Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0738914Z mov.b64 {%r2330, %r2331}, %rd863; 2026-02-21T09:08:09.0739054Z cvt.rn.f16x2.f32 %r2332, %r2331, %r2330; 2026-02-21T09:08:09.0739188Z mov.b64 {%r2333, 
%r2334}, %rd303; 2026-02-21T09:08:09.0739408Z cvt.rn.f16x2.f32 %r2335, %r2334, %r2333; 2026-02-21T09:08:09.0739535Z mov.b64 {%r2336, %r2337}, %rd307; 2026-02-21T09:08:09.0739685Z cvt.rn.f16x2.f32 %r2338, %r2337, %r2336; 2026-02-21T09:08:09.0740081Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0740270Z cvt.u64.u32 %rd864, %r1013; 2026-02-21T09:08:09.0740402Z cvt.u64.u32 %rd865, %r1014; 2026-02-21T09:08:09.0740522Z shl.b64 %rd866, %rd865, 32; 2026-02-21T09:08:09.0740646Z or.b64 %rd867, %rd864, %rd866; 2026-02-21T09:08:09.0741041Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0741175Z mov.b64 {%r2339, %r2340}, %rd867; 2026-02-21T09:08:09.0741312Z cvt.rn.f16x2.f32 %r2341, %r2340, %r2339; 2026-02-21T09:08:09.0741769Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0741906Z cvt.u64.u32 %rd868, %r1015; 2026-02-21T09:08:09.0742029Z cvt.u64.u32 %rd869, %r1016; 2026-02-21T09:08:09.0742149Z shl.b64 %rd870, %rd869, 32; 2026-02-21T09:08:09.0742283Z or.b64 %rd871, %rd868, %rd870; 2026-02-21T09:08:09.0742671Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0742791Z mov.b64 {%r2342, %r2343}, %rd871; 2026-02-21T09:08:09.0742934Z cvt.rn.f16x2.f32 %r2344, %r2343, %r2342; 2026-02-21T09:08:09.0743125Z mov.b64 {%r2345, %r2346}, %rd311; 2026-02-21T09:08:09.0743265Z cvt.rn.f16x2.f32 %r2347, %r2346, %r2345; 2026-02-21T09:08:09.0743384Z mov.b64 {%r2348, %r2349}, %rd315; 2026-02-21T09:08:09.0743536Z cvt.rn.f16x2.f32 %r2350, %r2349, %r2348; 2026-02-21T09:08:09.0743936Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0744054Z cvt.u64.u32 %rd872, %r1021; 2026-02-21T09:08:09.0744188Z cvt.u64.u32 %rd873, %r1022; 2026-02-21T09:08:09.0744307Z shl.b64 %rd874, %rd873, 32; 2026-02-21T09:08:09.0744429Z or.b64 %rd875, %rd872, %rd874; 
2026-02-21T09:08:09.0744911Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0745045Z mov.b64 {%r2351, %r2352}, %rd875; 2026-02-21T09:08:09.0745187Z cvt.rn.f16x2.f32 %r2353, %r2352, %r2351; 2026-02-21T09:08:09.0745579Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0745715Z cvt.u64.u32 %rd876, %r1023; 2026-02-21T09:08:09.0745837Z cvt.u64.u32 %rd877, %r1024; 2026-02-21T09:08:09.0745954Z shl.b64 %rd878, %rd877, 32; 2026-02-21T09:08:09.0746083Z or.b64 %rd879, %rd876, %rd878; 2026-02-21T09:08:09.0746469Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0746592Z mov.b64 {%r2354, %r2355}, %rd879; 2026-02-21T09:08:09.0746734Z cvt.rn.f16x2.f32 %r2356, %r2355, %r2354; 2026-02-21T09:08:09.0746876Z mov.b64 {%r2357, %r2358}, %rd319; 2026-02-21T09:08:09.0747014Z cvt.rn.f16x2.f32 %r2359, %r2358, %r2357; 2026-02-21T09:08:09.0747134Z mov.b64 {%r2360, %r2361}, %rd323; 2026-02-21T09:08:09.0747283Z cvt.rn.f16x2.f32 %r2362, %r2361, %r2360; 2026-02-21T09:08:09.0747674Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0747791Z cvt.u64.u32 %rd880, %r1030; 2026-02-21T09:08:09.0747921Z cvt.u64.u32 %rd881, %r1031; 2026-02-21T09:08:09.0748041Z shl.b64 %rd882, %rd881, 32; 2026-02-21T09:08:09.0748161Z or.b64 %rd883, %rd880, %rd882; 2026-02-21T09:08:09.0748546Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0748678Z mov.b64 {%r2363, %r2364}, %rd883; 2026-02-21T09:08:09.0748819Z cvt.rn.f16x2.f32 %r2365, %r2364, %r2363; 2026-02-21T09:08:09.0749213Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0749421Z cvt.u64.u32 %rd884, %r1032; 2026-02-21T09:08:09.0749542Z cvt.u64.u32 %rd885, %r1033; 2026-02-21T09:08:09.0749660Z shl.b64 %rd886, %rd885, 32; 2026-02-21T09:08:09.0749788Z or.b64 %rd887, %rd884, %rd886; 
2026-02-21T09:08:09.0750245Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0750366Z mov.b64 {%r2366, %r2367}, %rd887; 2026-02-21T09:08:09.0750501Z cvt.rn.f16x2.f32 %r2368, %r2367, %r2366; 2026-02-21T09:08:09.0750636Z mov.b64 {%r2369, %r2370}, %rd327; 2026-02-21T09:08:09.0750774Z cvt.rn.f16x2.f32 %r2371, %r2370, %r2369; 2026-02-21T09:08:09.0750893Z mov.b64 {%r2372, %r2373}, %rd331; 2026-02-21T09:08:09.0751042Z cvt.rn.f16x2.f32 %r2374, %r2373, %r2372; 2026-02-21T09:08:09.0751437Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0751615Z cvt.u64.u32 %rd888, %r1038; 2026-02-21T09:08:09.0751745Z cvt.u64.u32 %rd889, %r1039; 2026-02-21T09:08:09.0751869Z shl.b64 %rd890, %rd889, 32; 2026-02-21T09:08:09.0751992Z or.b64 %rd891, %rd888, %rd890; 2026-02-21T09:08:09.0752378Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0752510Z mov.b64 {%r2375, %r2376}, %rd891; 2026-02-21T09:08:09.0752652Z cvt.rn.f16x2.f32 %r2377, %r2376, %r2375; 2026-02-21T09:08:09.0753102Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0753230Z cvt.u64.u32 %rd892, %r1040; 2026-02-21T09:08:09.0753349Z cvt.u64.u32 %rd893, %r1041; 2026-02-21T09:08:09.0753469Z shl.b64 %rd894, %rd893, 32; 2026-02-21T09:08:09.0753591Z or.b64 %rd895, %rd892, %rd894; 2026-02-21T09:08:09.0753996Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0754118Z mov.b64 {%r2378, %r2379}, %rd895; 2026-02-21T09:08:09.0754259Z cvt.rn.f16x2.f32 %r2380, %r2379, %r2378; 2026-02-21T09:08:09.0754392Z mov.b64 {%r2381, %r2382}, %rd335; 2026-02-21T09:08:09.0754528Z cvt.rn.f16x2.f32 %r2383, %r2382, %r2381; 2026-02-21T09:08:09.0754650Z mov.b64 {%r2384, %r2385}, %rd339; 2026-02-21T09:08:09.0754879Z cvt.rn.f16x2.f32 %r2386, %r2385, %r2384; 2026-02-21T09:08:09.0755288Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0755411Z cvt.u64.u32 %rd896, %r1047; 2026-02-21T09:08:09.0755540Z cvt.u64.u32 %rd897, %r1048; 2026-02-21T09:08:09.0755675Z shl.b64 %rd898, %rd897, 32; 2026-02-21T09:08:09.0755799Z or.b64 %rd899, %rd896, %rd898; 2026-02-21T09:08:09.0756205Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0756332Z mov.b64 {%r2387, %r2388}, %rd899; 2026-02-21T09:08:09.0756471Z cvt.rn.f16x2.f32 %r2389, %r2388, %r2387; 2026-02-21T09:08:09.0756865Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0756996Z cvt.u64.u32 %rd900, %r1049; 2026-02-21T09:08:09.0757111Z cvt.u64.u32 %rd901, %r1050; 2026-02-21T09:08:09.0757228Z shl.b64 %rd902, %rd901, 32; 2026-02-21T09:08:09.0757351Z or.b64 %rd903, %rd900, %rd902; 2026-02-21T09:08:09.0757757Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0757877Z mov.b64 {%r2390, %r2391}, %rd903; 2026-02-21T09:08:09.0758017Z cvt.rn.f16x2.f32 %r2392, %r2391, %r2390; 2026-02-21T09:08:09.0758153Z mov.b64 {%r2393, %r2394}, %rd343; 2026-02-21T09:08:09.0758287Z cvt.rn.f16x2.f32 %r2395, %r2394, %r2393; 2026-02-21T09:08:09.0758406Z mov.b64 {%r2396, %r2397}, %rd347; 2026-02-21T09:08:09.0758554Z cvt.rn.f16x2.f32 %r2398, %r2397, %r2396; 2026-02-21T09:08:09.0758942Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0759066Z cvt.u64.u32 %rd904, %r1055; 2026-02-21T09:08:09.0759269Z cvt.u64.u32 %rd905, %r1056; 2026-02-21T09:08:09.0759402Z shl.b64 %rd906, %rd905, 32; 2026-02-21T09:08:09.0759524Z or.b64 %rd907, %rd904, %rd906; 2026-02-21T09:08:09.0759913Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0760110Z mov.b64 {%r2399, %r2400}, %rd907; 2026-02-21T09:08:09.0760246Z cvt.rn.f16x2.f32 %r2401, %r2400, %r2399; 2026-02-21T09:08:09.0760630Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0760762Z cvt.u64.u32 %rd908, %r1057; 2026-02-21T09:08:09.0760880Z cvt.u64.u32 %rd909, %r1058; 2026-02-21T09:08:09.0760995Z shl.b64 %rd910, %rd909, 32; 2026-02-21T09:08:09.0761111Z or.b64 %rd911, %rd908, %rd910; 2026-02-21T09:08:09.0761499Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0761707Z mov.b64 {%r2402, %r2403}, %rd911; 2026-02-21T09:08:09.0761847Z cvt.rn.f16x2.f32 %r2404, %r2403, %r2402; 2026-02-21T09:08:09.0761973Z mov.b64 {%r2405, %r2406}, %rd351; 2026-02-21T09:08:09.0762105Z cvt.rn.f16x2.f32 %r2407, %r2406, %r2405; 2026-02-21T09:08:09.0762221Z mov.b64 {%r2408, %r2409}, %rd355; 2026-02-21T09:08:09.0762370Z cvt.rn.f16x2.f32 %r2410, %r2409, %r2408; 2026-02-21T09:08:09.0762757Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0762944Z cvt.u64.u32 %rd912, %r1064; 2026-02-21T09:08:09.0763067Z cvt.u64.u32 %rd913, %r1065; 2026-02-21T09:08:09.0763195Z shl.b64 %rd914, %rd913, 32; 2026-02-21T09:08:09.0763317Z or.b64 %rd915, %rd912, %rd914; 2026-02-21T09:08:09.0763706Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0763830Z mov.b64 {%r2411, %r2412}, %rd915; 2026-02-21T09:08:09.0763969Z cvt.rn.f16x2.f32 %r2413, %r2412, %r2411; 2026-02-21T09:08:09.0764366Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0764499Z cvt.u64.u32 %rd916, %r1066; 2026-02-21T09:08:09.0764620Z cvt.u64.u32 %rd917, %r1067; 2026-02-21T09:08:09.0764837Z shl.b64 %rd918, %rd917, 32; 2026-02-21T09:08:09.0764966Z or.b64 %rd919, %rd916, %rd918; 2026-02-21T09:08:09.0765366Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0765487Z mov.b64 {%r2414, %r2415}, %rd919; 2026-02-21T09:08:09.0765628Z cvt.rn.f16x2.f32 %r2416, %r2415, %r2414; 2026-02-21T09:08:09.0765756Z mov.b64 
{%r2417, %r2418}, %rd359; 2026-02-21T09:08:09.0765896Z cvt.rn.f16x2.f32 %r2419, %r2418, %r2417; 2026-02-21T09:08:09.0766014Z mov.b64 {%r2420, %r2421}, %rd363; 2026-02-21T09:08:09.0766159Z cvt.rn.f16x2.f32 %r2422, %r2421, %r2420; 2026-02-21T09:08:09.0766547Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0766670Z cvt.u64.u32 %rd920, %r1072; 2026-02-21T09:08:09.0766788Z cvt.u64.u32 %rd921, %r1073; 2026-02-21T09:08:09.0766918Z shl.b64 %rd922, %rd921, 32; 2026-02-21T09:08:09.0767038Z or.b64 %rd923, %rd920, %rd922; 2026-02-21T09:08:09.0767424Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0767558Z mov.b64 {%r2423, %r2424}, %rd923; 2026-02-21T09:08:09.0767694Z cvt.rn.f16x2.f32 %r2425, %r2424, %r2423; 2026-02-21T09:08:09.0768089Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0768218Z cvt.u64.u32 %rd924, %r1074; 2026-02-21T09:08:09.0768337Z cvt.u64.u32 %rd925, %r1075; 2026-02-21T09:08:09.0768456Z shl.b64 %rd926, %rd925, 32; 2026-02-21T09:08:09.0768575Z or.b64 %rd927, %rd924, %rd926; 2026-02-21T09:08:09.0768980Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0769101Z mov.b64 {%r2426, %r2427}, %rd927; 2026-02-21T09:08:09.0769318Z cvt.rn.f16x2.f32 %r2428, %r2427, %r2426; 2026-02-21T09:08:09.0769449Z mov.b64 {%r2429, %r2430}, %rd367; 2026-02-21T09:08:09.0769591Z cvt.rn.f16x2.f32 %r2431, %r2430, %r2429; 2026-02-21T09:08:09.0769707Z mov.b64 {%r2432, %r2433}, %rd371; 2026-02-21T09:08:09.0769914Z cvt.rn.f16x2.f32 %r2434, %r2433, %r2432; 2026-02-21T09:08:09.0770306Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0770430Z cvt.u64.u32 %rd928, %r1081; 2026-02-21T09:08:09.0770550Z cvt.u64.u32 %rd929, %r1082; 2026-02-21T09:08:09.0770680Z shl.b64 %rd930, %rd929, 32; 2026-02-21T09:08:09.0770799Z or.b64 %rd931, %rd928, %rd930; 
2026-02-21T09:08:09.0771191Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0771320Z mov.b64 {%r2435, %r2436}, %rd931; 2026-02-21T09:08:09.0771519Z cvt.rn.f16x2.f32 %r2437, %r2436, %r2435; 2026-02-21T09:08:09.0771914Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0772048Z cvt.u64.u32 %rd932, %r1083; 2026-02-21T09:08:09.0772167Z cvt.u64.u32 %rd933, %r1084; 2026-02-21T09:08:09.0772289Z shl.b64 %rd934, %rd933, 32; 2026-02-21T09:08:09.0772418Z or.b64 %rd935, %rd932, %rd934; 2026-02-21T09:08:09.0772822Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0773001Z mov.b64 {%r2438, %r2439}, %rd935; 2026-02-21T09:08:09.0773141Z cvt.rn.f16x2.f32 %r2440, %r2439, %r2438; 2026-02-21T09:08:09.0773271Z mov.b64 {%r2441, %r2442}, %rd375; 2026-02-21T09:08:09.0773411Z cvt.rn.f16x2.f32 %r2443, %r2442, %r2441; 2026-02-21T09:08:09.0773533Z mov.b64 {%r2444, %r2445}, %rd379; 2026-02-21T09:08:09.0773683Z cvt.rn.f16x2.f32 %r2446, %r2445, %r2444; 2026-02-21T09:08:09.0774073Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0774195Z cvt.u64.u32 %rd936, %r1089; 2026-02-21T09:08:09.0774313Z cvt.u64.u32 %rd937, %r1090; 2026-02-21T09:08:09.0774441Z shl.b64 %rd938, %rd937, 32; 2026-02-21T09:08:09.0774562Z or.b64 %rd939, %rd936, %rd938; 2026-02-21T09:08:09.0775027Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0775172Z mov.b64 {%r2447, %r2448}, %rd939; 2026-02-21T09:08:09.0775316Z cvt.rn.f16x2.f32 %r2449, %r2448, %r2447; 2026-02-21T09:08:09.0775704Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0775837Z cvt.u64.u32 %rd940, %r1091; 2026-02-21T09:08:09.0775957Z cvt.u64.u32 %rd941, %r1092; 2026-02-21T09:08:09.0776075Z shl.b64 %rd942, %rd941, 32; 2026-02-21T09:08:09.0776196Z or.b64 %rd943, %rd940, %rd942; 
2026-02-21T09:08:09.0776598Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0776721Z mov.b64 {%r2450, %r2451}, %rd943; 2026-02-21T09:08:09.0776860Z cvt.rn.f16x2.f32 %r2452, %r2451, %r2450; 2026-02-21T09:08:09.0776990Z mov.b64 {%r2453, %r2454}, %rd383; 2026-02-21T09:08:09.0777128Z cvt.rn.f16x2.f32 %r2455, %r2454, %r2453; 2026-02-21T09:08:09.0777256Z mov.b64 {%r2456, %r2457}, %rd387; 2026-02-21T09:08:09.0777404Z cvt.rn.f16x2.f32 %r2458, %r2457, %r2456; 2026-02-21T09:08:09.0777801Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0777922Z cvt.u64.u32 %rd944, %r1098; 2026-02-21T09:08:09.0778039Z cvt.u64.u32 %rd945, %r1099; 2026-02-21T09:08:09.0778171Z shl.b64 %rd946, %rd945, 32; 2026-02-21T09:08:09.0778290Z or.b64 %rd947, %rd944, %rd946; 2026-02-21T09:08:09.0778676Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0778810Z mov.b64 {%r2459, %r2460}, %rd947; 2026-02-21T09:08:09.0778949Z cvt.rn.f16x2.f32 %r2461, %r2460, %r2459; 2026-02-21T09:08:09.0779421Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0779552Z cvt.u64.u32 %rd948, %r1100; 2026-02-21T09:08:09.0779673Z cvt.u64.u32 %rd949, %r1101; 2026-02-21T09:08:09.0779864Z shl.b64 %rd950, %rd949, 32; 2026-02-21T09:08:09.0780004Z or.b64 %rd951, %rd948, %rd950; 2026-02-21T09:08:09.0780410Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0780526Z mov.b64 {%r2462, %r2463}, %rd951; 2026-02-21T09:08:09.0780657Z cvt.rn.f16x2.f32 %r2464, %r2463, %r2462; 2026-02-21T09:08:09.0780786Z mov.b64 {%r2465, %r2466}, %rd391; 2026-02-21T09:08:09.0780918Z cvt.rn.f16x2.f32 %r2467, %r2466, %r2465; 2026-02-21T09:08:09.0781035Z mov.b64 {%r2468, %r2469}, %rd395; 2026-02-21T09:08:09.0781172Z cvt.rn.f16x2.f32 %r2470, %r2469, %r2468; 2026-02-21T09:08:09.0781640Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0781760Z cvt.u64.u32 %rd952, %r1106; 2026-02-21T09:08:09.0781877Z cvt.u64.u32 %rd953, %r1107; 2026-02-21T09:08:09.0782004Z shl.b64 %rd954, %rd953, 32; 2026-02-21T09:08:09.0782122Z or.b64 %rd955, %rd952, %rd954; 2026-02-21T09:08:09.0782520Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0782647Z mov.b64 {%r2471, %r2472}, %rd955; 2026-02-21T09:08:09.0782869Z cvt.rn.f16x2.f32 %r2473, %r2472, %r2471; 2026-02-21T09:08:09.0783259Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0783390Z cvt.u64.u32 %rd956, %r1108; 2026-02-21T09:08:09.0783512Z cvt.u64.u32 %rd957, %r1109; 2026-02-21T09:08:09.0783631Z shl.b64 %rd958, %rd957, 32; 2026-02-21T09:08:09.0783752Z or.b64 %rd959, %rd956, %rd958; 2026-02-21T09:08:09.0784155Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0784278Z mov.b64 {%r2474, %r2475}, %rd959; 2026-02-21T09:08:09.0784410Z cvt.rn.f16x2.f32 %r2476, %r2475, %r2474; 2026-02-21T09:08:09.0784535Z mov.b64 {%r2477, %r2478}, %rd399; 2026-02-21T09:08:09.0784792Z cvt.rn.f16x2.f32 %r2479, %r2478, %r2477; 2026-02-21T09:08:09.0784918Z mov.b64 {%r2480, %r2481}, %rd403; 2026-02-21T09:08:09.0785057Z cvt.rn.f16x2.f32 %r2482, %r2481, %r2480; 2026-02-21T09:08:09.0785463Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0785580Z cvt.u64.u32 %rd960, %r1115; 2026-02-21T09:08:09.0785700Z cvt.u64.u32 %rd961, %r1116; 2026-02-21T09:08:09.0785827Z shl.b64 %rd962, %rd961, 32; 2026-02-21T09:08:09.0785946Z or.b64 %rd963, %rd960, %rd962; 2026-02-21T09:08:09.0786332Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0786460Z mov.b64 {%r2483, %r2484}, %rd963; 2026-02-21T09:08:09.0786602Z cvt.rn.f16x2.f32 %r2485, %r2484, %r2483; 2026-02-21T09:08:09.0786989Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0787119Z cvt.u64.u32 %rd964, %r1117; 2026-02-21T09:08:09.0787237Z cvt.u64.u32 %rd965, %r1118; 2026-02-21T09:08:09.0787356Z shl.b64 %rd966, %rd965, 32; 2026-02-21T09:08:09.0787475Z or.b64 %rd967, %rd964, %rd966; 2026-02-21T09:08:09.0787877Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0787997Z mov.b64 {%r2486, %r2487}, %rd967; 2026-02-21T09:08:09.0788133Z cvt.rn.f16x2.f32 %r2488, %r2487, %r2486; 2026-02-21T09:08:09.0788265Z mov.b64 {%r2489, %r2490}, %rd407; 2026-02-21T09:08:09.0788402Z cvt.rn.f16x2.f32 %r2491, %r2490, %r2489; 2026-02-21T09:08:09.0788521Z mov.b64 {%r2492, %r2493}, %rd411; 2026-02-21T09:08:09.0788658Z cvt.rn.f16x2.f32 %r2494, %r2493, %r2492; 2026-02-21T09:08:09.0789056Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0789260Z cvt.u64.u32 %rd968, %r1123; 2026-02-21T09:08:09.0789383Z cvt.u64.u32 %rd969, %r1124; 2026-02-21T09:08:09.0789513Z shl.b64 %rd970, %rd969, 32; 2026-02-21T09:08:09.0789700Z or.b64 %rd971, %rd968, %rd970; 2026-02-21T09:08:09.0790092Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0790219Z mov.b64 {%r2495, %r2496}, %rd971; 2026-02-21T09:08:09.0790361Z cvt.rn.f16x2.f32 %r2497, %r2496, %r2495; 2026-02-21T09:08:09.0790756Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0790874Z cvt.u64.u32 %rd972, %r1125; 2026-02-21T09:08:09.0791004Z cvt.u64.u32 %rd973, %r1126; 2026-02-21T09:08:09.0791124Z shl.b64 %rd974, %rd973, 32; 2026-02-21T09:08:09.0791242Z or.b64 %rd975, %rd972, %rd974; 2026-02-21T09:08:09.0791715Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0791838Z mov.b64 {%r2498, %r2499}, %rd975; 2026-02-21T09:08:09.0791981Z cvt.rn.f16x2.f32 %r2500, %r2499, %r2498; 2026-02-21T09:08:09.0792114Z mov.b64 
{%r2501, %r2502}, %rd415; 2026-02-21T09:08:09.0792256Z cvt.rn.f16x2.f32 %r2503, %r2502, %r2501; 2026-02-21T09:08:09.0792375Z mov.b64 {%r2504, %r2505}, %rd419; 2026-02-21T09:08:09.0792508Z cvt.rn.f16x2.f32 %r2506, %r2505, %r2504; 2026-02-21T09:08:09.0792992Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0793112Z cvt.u64.u32 %rd976, %r1132; 2026-02-21T09:08:09.0793224Z cvt.u64.u32 %rd977, %r1133; 2026-02-21T09:08:09.0793356Z shl.b64 %rd978, %rd977, 32; 2026-02-21T09:08:09.0793476Z or.b64 %rd979, %rd976, %rd978; 2026-02-21T09:08:09.0793864Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0793997Z mov.b64 {%r2507, %r2508}, %rd979; 2026-02-21T09:08:09.0794136Z cvt.rn.f16x2.f32 %r2509, %r2508, %r2507; 2026-02-21T09:08:09.0794533Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0794653Z cvt.u64.u32 %rd980, %r1134; 2026-02-21T09:08:09.0794855Z cvt.u64.u32 %rd981, %r1135; 2026-02-21T09:08:09.0794973Z shl.b64 %rd982, %rd981, 32; 2026-02-21T09:08:09.0795097Z or.b64 %rd983, %rd980, %rd982; 2026-02-21T09:08:09.0795493Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0795615Z mov.b64 {%r2510, %r2511}, %rd983; 2026-02-21T09:08:09.0795758Z cvt.rn.f16x2.f32 %r2512, %r2511, %r2510; 2026-02-21T09:08:09.0795888Z mov.b64 {%r2513, %r2514}, %rd423; 2026-02-21T09:08:09.0796028Z cvt.rn.f16x2.f32 %r2515, %r2514, %r2513; 2026-02-21T09:08:09.0796145Z mov.b64 {%r2516, %r2517}, %rd427; 2026-02-21T09:08:09.0796287Z cvt.rn.f16x2.f32 %r2518, %r2517, %r2516; 2026-02-21T09:08:09.0796701Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0796822Z cvt.u64.u32 %rd984, %r1140; 2026-02-21T09:08:09.0796939Z cvt.u64.u32 %rd985, %r1141; 2026-02-21T09:08:09.0797069Z shl.b64 %rd986, %rd985, 32; 2026-02-21T09:08:09.0797193Z or.b64 %rd987, %rd984, %rd986; 
2026-02-21T09:08:09.0797582Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0797710Z mov.b64 {%r2519, %r2520}, %rd987; 2026-02-21T09:08:09.0797847Z cvt.rn.f16x2.f32 %r2521, %r2520, %r2519; 2026-02-21T09:08:09.0798235Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0798354Z cvt.u64.u32 %rd988, %r1142; 2026-02-21T09:08:09.0798484Z cvt.u64.u32 %rd989, %r1143; 2026-02-21T09:08:09.0798602Z shl.b64 %rd990, %rd989, 32; 2026-02-21T09:08:09.0798724Z or.b64 %rd991, %rd988, %rd990; 2026-02-21T09:08:09.0799208Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0799331Z mov.b64 {%r2522, %r2523}, %rd991; 2026-02-21T09:08:09.0799475Z cvt.rn.f16x2.f32 %r2524, %r2523, %r2522; 2026-02-21T09:08:09.0799608Z mov.b64 {%r2525, %r2526}, %rd431; 2026-02-21T09:08:09.0799845Z cvt.rn.f16x2.f32 %r2527, %r2526, %r2525; 2026-02-21T09:08:09.0799966Z mov.b64 {%r2528, %r2529}, %rd435; 2026-02-21T09:08:09.0800107Z cvt.rn.f16x2.f32 %r2530, %r2529, %r2528; 2026-02-21T09:08:09.0800511Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0800636Z cvt.u64.u32 %rd992, %r1149; 2026-02-21T09:08:09.0800756Z cvt.u64.u32 %rd993, %r1150; 2026-02-21T09:08:09.0800883Z shl.b64 %rd994, %rd993, 32; 2026-02-21T09:08:09.0801003Z or.b64 %rd995, %rd992, %rd994; 2026-02-21T09:08:09.0801469Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0801603Z mov.b64 {%r2531, %r2532}, %rd995; 2026-02-21T09:08:09.0801738Z cvt.rn.f16x2.f32 %r2533, %r2532, %r2531; 2026-02-21T09:08:09.0802121Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0802241Z cvt.u64.u32 %rd996, %r1151; 2026-02-21T09:08:09.0802367Z cvt.u64.u32 %rd997, %r1152; 2026-02-21T09:08:09.0802482Z shl.b64 %rd998, %rd997, 32; 2026-02-21T09:08:09.0802600Z or.b64 %rd999, %rd996, %rd998; 
2026-02-21T09:08:09.0803060Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0803184Z mov.b64 {%r2534, %r2535}, %rd999; 2026-02-21T09:08:09.0803320Z cvt.rn.f16x2.f32 %r2536, %r2535, %r2534; 2026-02-21T09:08:09.0803448Z mov.b64 {%r2537, %r2538}, %rd439; 2026-02-21T09:08:09.0803583Z cvt.rn.f16x2.f32 %r2539, %r2538, %r2537; 2026-02-21T09:08:09.0803703Z mov.b64 {%r2540, %r2541}, %rd443; 2026-02-21T09:08:09.0803842Z cvt.rn.f16x2.f32 %r2542, %r2541, %r2540; 2026-02-21T09:08:09.0804246Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0804372Z cvt.u64.u32 %rd1000, %r1157; 2026-02-21T09:08:09.0804495Z cvt.u64.u32 %rd1001, %r1158; 2026-02-21T09:08:09.0804634Z shl.b64 %rd1002, %rd1001, 32; 2026-02-21T09:08:09.0804826Z or.b64 %rd1003, %rd1000, %rd1002; 2026-02-21T09:08:09.0805226Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0805368Z mov.b64 {%r2543, %r2544}, %rd1003; 2026-02-21T09:08:09.0805513Z cvt.rn.f16x2.f32 %r2545, %r2544, %r2543; 2026-02-21T09:08:09.0805904Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0806027Z cvt.u64.u32 %rd1004, %r1159; 2026-02-21T09:08:09.0806162Z cvt.u64.u32 %rd1005, %r1160; 2026-02-21T09:08:09.0806286Z shl.b64 %rd1006, %rd1005, 32; 2026-02-21T09:08:09.0806408Z or.b64 %rd1007, %rd1004, %rd1006; 2026-02-21T09:08:09.0806811Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0806936Z mov.b64 {%r2546, %r2547}, %rd1007; 2026-02-21T09:08:09.0807075Z cvt.rn.f16x2.f32 %r2548, %r2547, %r2546; 2026-02-21T09:08:09.0807207Z mov.b64 {%r2549, %r2550}, %rd447; 2026-02-21T09:08:09.0807342Z cvt.rn.f16x2.f32 %r2551, %r2550, %r2549; 2026-02-21T09:08:09.0807461Z mov.b64 {%r2552, %r2553}, %rd451; 2026-02-21T09:08:09.0807597Z cvt.rn.f16x2.f32 %r2554, %r2553, %r2552; 2026-02-21T09:08:09.0808000Z .loc 1 56 52 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0808120Z cvt.u64.u32 %rd1008, %r1166; 2026-02-21T09:08:09.0808239Z cvt.u64.u32 %rd1009, %r1167; 2026-02-21T09:08:09.0808375Z shl.b64 %rd1010, %rd1009, 32; 2026-02-21T09:08:09.0808499Z or.b64 %rd1011, %rd1008, %rd1010; 2026-02-21T09:08:09.0808891Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0809103Z mov.b64 {%r2555, %r2556}, %rd1011; 2026-02-21T09:08:09.0809246Z cvt.rn.f16x2.f32 %r2557, %r2556, %r2555; 2026-02-21T09:08:09.0809635Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0809819Z cvt.u64.u32 %rd1012, %r1168; 2026-02-21T09:08:09.0809952Z cvt.u64.u32 %rd1013, %r1169; 2026-02-21T09:08:09.0810077Z shl.b64 %rd1014, %rd1013, 32; 2026-02-21T09:08:09.0810202Z or.b64 %rd1015, %rd1012, %rd1014; 2026-02-21T09:08:09.0810607Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0810728Z mov.b64 {%r2558, %r2559}, %rd1015; 2026-02-21T09:08:09.0810867Z cvt.rn.f16x2.f32 %r2560, %r2559, %r2558; 2026-02-21T09:08:09.0810995Z mov.b64 {%r2561, %r2562}, %rd455; 2026-02-21T09:08:09.0811133Z cvt.rn.f16x2.f32 %r2563, %r2562, %r2561; 2026-02-21T09:08:09.0811311Z mov.b64 {%r2564, %r2565}, %rd459; 2026-02-21T09:08:09.0811453Z cvt.rn.f16x2.f32 %r2566, %r2565, %r2564; 2026-02-21T09:08:09.0811882Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0812008Z cvt.u64.u32 %rd1016, %r1174; 2026-02-21T09:08:09.0812129Z cvt.u64.u32 %rd1017, %r1175; 2026-02-21T09:08:09.0812260Z shl.b64 %rd1018, %rd1017, 32; 2026-02-21T09:08:09.0812380Z or.b64 %rd1019, %rd1016, %rd1018; 2026-02-21T09:08:09.0812832Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0812969Z mov.b64 {%r2567, %r2568}, %rd1019; 2026-02-21T09:08:09.0813109Z cvt.rn.f16x2.f32 %r2569, %r2568, %r2567; 
2026-02-21T09:08:09.0813506Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0813626Z cvt.u64.u32 %rd1020, %r1176; 2026-02-21T09:08:09.0813755Z cvt.u64.u32 %rd1021, %r1177; 2026-02-21T09:08:09.0813882Z shl.b64 %rd1022, %rd1021, 32; 2026-02-21T09:08:09.0813998Z or.b64 %rd1023, %rd1020, %rd1022; 2026-02-21T09:08:09.0814413Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0814533Z mov.b64 {%r2570, %r2571}, %rd1023; 2026-02-21T09:08:09.0814748Z cvt.rn.f16x2.f32 %r2572, %r2571, %r2570; 2026-02-21T09:08:09.0814869Z mov.b64 {%r2573, %r2574}, %rd463; 2026-02-21T09:08:09.0815021Z cvt.rn.f16x2.f32 %r2575, %r2574, %r2573; 2026-02-21T09:08:09.0815138Z mov.b64 {%r2576, %r2577}, %rd467; 2026-02-21T09:08:09.0815276Z cvt.rn.f16x2.f32 %r2578, %r2577, %r2576; 2026-02-21T09:08:09.0815683Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0815804Z cvt.u64.u32 %rd1024, %r1183; 2026-02-21T09:08:09.0815926Z cvt.u64.u32 %rd1025, %r1184; 2026-02-21T09:08:09.0816058Z shl.b64 %rd1026, %rd1025, 32; 2026-02-21T09:08:09.0816181Z or.b64 %rd1027, %rd1024, %rd1026; 2026-02-21T09:08:09.0816578Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0816702Z mov.b64 {%r2579, %r2580}, %rd1027; 2026-02-21T09:08:09.0816850Z cvt.rn.f16x2.f32 %r2581, %r2580, %r2579; 2026-02-21T09:08:09.0817244Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0817367Z cvt.u64.u32 %rd1028, %r1185; 2026-02-21T09:08:09.0817502Z cvt.u64.u32 %rd1029, %r1186; 2026-02-21T09:08:09.0817626Z shl.b64 %rd1030, %rd1029, 32; 2026-02-21T09:08:09.0817746Z or.b64 %rd1031, %rd1028, %rd1030; 2026-02-21T09:08:09.0818156Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0818276Z mov.b64 {%r2582, %r2583}, %rd1031; 2026-02-21T09:08:09.0818414Z cvt.rn.f16x2.f32 
%r2584, %r2583, %r2582; 2026-02-21T09:08:09.0818533Z mov.b64 {%r2585, %r2586}, %rd471; 2026-02-21T09:08:09.0818685Z cvt.rn.f16x2.f32 %r2587, %r2586, %r2585; 2026-02-21T09:08:09.0818881Z mov.b64 {%r2588, %r2589}, %rd475; 2026-02-21T09:08:09.0819016Z cvt.rn.f16x2.f32 %r2590, %r2589, %r2588; 2026-02-21T09:08:09.0819424Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0819607Z cvt.u64.u32 %rd1032, %r1191; 2026-02-21T09:08:09.0819725Z cvt.u64.u32 %rd1033, %r1192; 2026-02-21T09:08:09.0819862Z shl.b64 %rd1034, %rd1033, 32; 2026-02-21T09:08:09.0819988Z or.b64 %rd1035, %rd1032, %rd1034; 2026-02-21T09:08:09.0820384Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0820503Z mov.b64 {%r2591, %r2592}, %rd1035; 2026-02-21T09:08:09.0820656Z cvt.rn.f16x2.f32 %r2593, %r2592, %r2591; 2026-02-21T09:08:09.0821053Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0821233Z cvt.u64.u32 %rd1036, %r1193; 2026-02-21T09:08:09.0821368Z cvt.u64.u32 %rd1037, %r1194; 2026-02-21T09:08:09.0821490Z shl.b64 %rd1038, %rd1037, 32; 2026-02-21T09:08:09.0821614Z or.b64 %rd1039, %rd1036, %rd1038; 2026-02-21T09:08:09.0822030Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0822158Z mov.b64 {%r2594, %r2595}, %rd1039; 2026-02-21T09:08:09.0822293Z cvt.rn.f16x2.f32 %r2596, %r2595, %r2594; 2026-02-21T09:08:09.0822409Z mov.b64 {%r2597, %r2598}, %rd479; 2026-02-21T09:08:09.0822614Z cvt.rn.f16x2.f32 %r2599, %r2598, %r2597; 2026-02-21T09:08:09.0822732Z mov.b64 {%r2600, %r2601}, %rd483; 2026-02-21T09:08:09.0822868Z cvt.rn.f16x2.f32 %r2602, %r2601, %r2600; 2026-02-21T09:08:09.0823276Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0823394Z cvt.u64.u32 %rd1040, %r1200; 2026-02-21T09:08:09.0823518Z cvt.u64.u32 %rd1041, %r1201; 2026-02-21T09:08:09.0823651Z shl.b64 %rd1042, 
%rd1041, 32; 2026-02-21T09:08:09.0823772Z or.b64 %rd1043, %rd1040, %rd1042; 2026-02-21T09:08:09.0824164Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0824285Z mov.b64 {%r2603, %r2604}, %rd1043; 2026-02-21T09:08:09.0824439Z cvt.rn.f16x2.f32 %r2605, %r2604, %r2603; 2026-02-21T09:08:09.0824935Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0825057Z cvt.u64.u32 %rd1044, %r1202; 2026-02-21T09:08:09.0825194Z cvt.u64.u32 %rd1045, %r1203; 2026-02-21T09:08:09.0825314Z shl.b64 %rd1046, %rd1045, 32; 2026-02-21T09:08:09.0825438Z or.b64 %rd1047, %rd1044, %rd1046; 2026-02-21T09:08:09.0825832Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0825949Z mov.b64 {%r2606, %r2607}, %rd1047; 2026-02-21T09:08:09.0826083Z cvt.rn.f16x2.f32 %r2608, %r2607, %r2606; 2026-02-21T09:08:09.0826205Z mov.b64 {%r2609, %r2610}, %rd487; 2026-02-21T09:08:09.0826355Z cvt.rn.f16x2.f32 %r2611, %r2610, %r2609; 2026-02-21T09:08:09.0826471Z mov.b64 {%r2612, %r2613}, %rd491; 2026-02-21T09:08:09.0826609Z cvt.rn.f16x2.f32 %r2614, %r2613, %r2612; 2026-02-21T09:08:09.0827013Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0827137Z cvt.u64.u32 %rd1048, %r1208; 2026-02-21T09:08:09.0827258Z cvt.u64.u32 %rd1049, %r1209; 2026-02-21T09:08:09.0827391Z shl.b64 %rd1050, %rd1049, 32; 2026-02-21T09:08:09.0827507Z or.b64 %rd1051, %rd1048, %rd1050; 2026-02-21T09:08:09.0827900Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0828020Z mov.b64 {%r2615, %r2616}, %rd1051; 2026-02-21T09:08:09.0828167Z cvt.rn.f16x2.f32 %r2617, %r2616, %r2615; 2026-02-21T09:08:09.0828557Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0828757Z cvt.u64.u32 %rd1052, %r1210; 2026-02-21T09:08:09.0828892Z cvt.u64.u32 %rd1053, %r1211; 
2026-02-21T09:08:09.0829014Z shl.b64 %rd1054, %rd1053, 32; 2026-02-21T09:08:09.0829134Z or.b64 %rd1055, %rd1052, %rd1054; 2026-02-21T09:08:09.0829535Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0829718Z mov.b64 {%r2618, %r2619}, %rd1055; 2026-02-21T09:08:09.0829858Z cvt.rn.f16x2.f32 %r2620, %r2619, %r2618; 2026-02-21T09:08:09.0829984Z mov.b64 {%r2621, %r2622}, %rd495; 2026-02-21T09:08:09.0830138Z cvt.rn.f16x2.f32 %r2623, %r2622, %r2621; 2026-02-21T09:08:09.0830257Z mov.b64 {%r2624, %r2625}, %rd499; 2026-02-21T09:08:09.0830394Z cvt.rn.f16x2.f32 %r2626, %r2625, %r2624; 2026-02-21T09:08:09.0830800Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0830916Z cvt.u64.u32 %rd1056, %r1217; 2026-02-21T09:08:09.0831101Z cvt.u64.u32 %rd1057, %r1218; 2026-02-21T09:08:09.0831239Z shl.b64 %rd1058, %rd1057, 32; 2026-02-21T09:08:09.0831359Z or.b64 %rd1059, %rd1056, %rd1058; 2026-02-21T09:08:09.0831764Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0831891Z mov.b64 {%r2627, %r2628}, %rd1059; 2026-02-21T09:08:09.0832040Z cvt.rn.f16x2.f32 %r2629, %r2628, %r2627; 2026-02-21T09:08:09.0832490Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0832610Z cvt.u64.u32 %rd1060, %r1219; 2026-02-21T09:08:09.0832737Z cvt.u64.u32 %rd1061, %r1220; 2026-02-21T09:08:09.0832854Z shl.b64 %rd1062, %rd1061, 32; 2026-02-21T09:08:09.0832976Z or.b64 %rd1063, %rd1060, %rd1062; 2026-02-21T09:08:09.0833371Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0833490Z mov.b64 {%r2630, %r2631}, %rd1063; 2026-02-21T09:08:09.0833632Z cvt.rn.f16x2.f32 %r2632, %r2631, %r2630; 2026-02-21T09:08:09.0833747Z mov.b64 {%r2633, %r2634}, %rd503; 2026-02-21T09:08:09.0833891Z cvt.rn.f16x2.f32 %r2635, %r2634, %r2633; 2026-02-21T09:08:09.0834008Z mov.b64 {%r2636, %r2637}, %rd507; 
2026-02-21T09:08:09.0834144Z cvt.rn.f16x2.f32 %r2638, %r2637, %r2636; 2026-02-21T09:08:09.0834543Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0834664Z cvt.u64.u32 %rd1064, %r1225; 2026-02-21T09:08:09.0834844Z cvt.u64.u32 %rd1065, %r1226; 2026-02-21T09:08:09.0834975Z shl.b64 %rd1066, %rd1065, 32; 2026-02-21T09:08:09.0835094Z or.b64 %rd1067, %rd1064, %rd1066; 2026-02-21T09:08:09.0835485Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0835607Z mov.b64 {%r2639, %r2640}, %rd1067; 2026-02-21T09:08:09.0835756Z cvt.rn.f16x2.f32 %r2641, %r2640, %r2639; 2026-02-21T09:08:09.0836144Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0836266Z cvt.u64.u32 %rd1068, %r1227; 2026-02-21T09:08:09.0836387Z cvt.u64.u32 %rd1069, %r1228; 2026-02-21T09:08:09.0836508Z shl.b64 %rd1070, %rd1069, 32; 2026-02-21T09:08:09.0836626Z or.b64 %rd1071, %rd1068, %rd1070; 2026-02-21T09:08:09.0837018Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0837137Z mov.b64 {%r2642, %r2643}, %rd1071; 2026-02-21T09:08:09.0837275Z cvt.rn.f16x2.f32 %r2644, %r2643, %r2642; 2026-02-21T09:08:09.0837393Z mov.b64 {%r2645, %r2646}, %rd511; 2026-02-21T09:08:09.0837536Z cvt.rn.f16x2.f32 %r2647, %r2646, %r2645; 2026-02-21T09:08:09.0837655Z mov.b64 {%r2648, %r2649}, %rd515; 2026-02-21T09:08:09.0837789Z cvt.rn.f16x2.f32 %r2650, %r2649, %r2648; 2026-02-21T09:08:09.0838184Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0838306Z cvt.u64.u32 %rd1072, %r1234; 2026-02-21T09:08:09.0838536Z cvt.u64.u32 %rd1073, %r1235; 2026-02-21T09:08:09.0838674Z shl.b64 %rd1074, %rd1073, 32; 2026-02-21T09:08:09.0838797Z or.b64 %rd1075, %rd1072, %rd1074; 2026-02-21T09:08:09.0839183Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0839371Z mov.b64 
{%r2651, %r2652}, %rd1075; 2026-02-21T09:08:09.0839517Z cvt.rn.f16x2.f32 %r2653, %r2652, %r2651; 2026-02-21T09:08:09.0839901Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0840020Z cvt.u64.u32 %rd1076, %r1236; 2026-02-21T09:08:09.0840148Z cvt.u64.u32 %rd1077, %r1237; 2026-02-21T09:08:09.0840267Z shl.b64 %rd1078, %rd1077, 32; 2026-02-21T09:08:09.0840382Z or.b64 %rd1079, %rd1076, %rd1078; 2026-02-21T09:08:09.0840780Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0840960Z mov.b64 {%r2654, %r2655}, %rd1079; 2026-02-21T09:08:09.0841099Z cvt.rn.f16x2.f32 %r2656, %r2655, %r2654; 2026-02-21T09:08:09.0841214Z mov.b64 {%r2657, %r2658}, %rd519; 2026-02-21T09:08:09.0841365Z cvt.rn.f16x2.f32 %r2659, %r2658, %r2657; 2026-02-21T09:08:09.0841484Z mov.b64 {%r2660, %r2661}, %rd523; 2026-02-21T09:08:09.0841618Z cvt.rn.f16x2.f32 %r2662, %r2661, %r2660; 2026-02-21T09:08:09.0842012Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0842190Z cvt.u64.u32 %rd1080, %r1242; 2026-02-21T09:08:09.0842312Z cvt.u64.u32 %rd1081, %r1243; 2026-02-21T09:08:09.0842443Z shl.b64 %rd1082, %rd1081, 32; 2026-02-21T09:08:09.0842564Z or.b64 %rd1083, %rd1080, %rd1082; 2026-02-21T09:08:09.0842955Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0843074Z mov.b64 {%r2663, %r2664}, %rd1083; 2026-02-21T09:08:09.0843222Z cvt.rn.f16x2.f32 %r2665, %r2664, %r2663; 2026-02-21T09:08:09.0843618Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0843738Z cvt.u64.u32 %rd1084, %r1244; 2026-02-21T09:08:09.0843866Z cvt.u64.u32 %rd1085, %r1245; 2026-02-21T09:08:09.0843988Z shl.b64 %rd1086, %rd1085, 32; 2026-02-21T09:08:09.0844108Z or.b64 %rd1087, %rd1084, %rd1086; 2026-02-21T09:08:09.0844514Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 
2026-02-21T09:08:09.0844639Z mov.b64 {%r2666, %r2667}, %rd1087; 2026-02-21T09:08:09.0844848Z cvt.rn.f16x2.f32 %r2668, %r2667, %r2666; 2026-02-21T09:08:09.0844968Z mov.b64 {%r2669, %r2670}, %rd527; 2026-02-21T09:08:09.0845120Z cvt.rn.f16x2.f32 %r2671, %r2670, %r2669; 2026-02-21T09:08:09.0845238Z mov.b64 {%r2672, %r2673}, %rd531; 2026-02-21T09:08:09.0845375Z cvt.rn.f16x2.f32 %r2674, %r2673, %r2672; 2026-02-21T09:08:09.0845780Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0845905Z cvt.u64.u32 %rd1088, %r1251; 2026-02-21T09:08:09.0846023Z cvt.u64.u32 %rd1089, %r1252; 2026-02-21T09:08:09.0846154Z shl.b64 %rd1090, %rd1089, 32; 2026-02-21T09:08:09.0846272Z or.b64 %rd1091, %rd1088, %rd1090; 2026-02-21T09:08:09.0846661Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0846783Z mov.b64 {%r2675, %r2676}, %rd1091; 2026-02-21T09:08:09.0846930Z cvt.rn.f16x2.f32 %r2677, %r2676, %r2675; 2026-02-21T09:08:09.0847312Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0847431Z cvt.u64.u32 %rd1092, %r1253; 2026-02-21T09:08:09.0847560Z cvt.u64.u32 %rd1093, %r1254; 2026-02-21T09:08:09.0847679Z shl.b64 %rd1094, %rd1093, 32; 2026-02-21T09:08:09.0847795Z or.b64 %rd1095, %rd1092, %rd1094; 2026-02-21T09:08:09.0848197Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0848390Z mov.b64 {%r2678, %r2679}, %rd1095; 2026-02-21T09:08:09.0848527Z cvt.rn.f16x2.f32 %r2680, %r2679, %r2678; 2026-02-21T09:08:09.0848645Z mov.b64 {%r2681, %r2682}, %rd535; 2026-02-21T09:08:09.0848797Z cvt.rn.f16x2.f32 %r2683, %r2682, %r2681; 2026-02-21T09:08:09.0848974Z mov.b64 {%r2684, %r2685}, %rd539; 2026-02-21T09:08:09.0849111Z cvt.rn.f16x2.f32 %r2686, %r2685, %r2684; 2026-02-21T09:08:09.0849522Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0849649Z cvt.u64.u32 %rd1096, 
%r1259; 2026-02-21T09:08:09.0849772Z cvt.u64.u32 %rd1097, %r1260; 2026-02-21T09:08:09.0849896Z shl.b64 %rd1098, %rd1097, 32; 2026-02-21T09:08:09.0850030Z or.b64 %rd1099, %rd1096, %rd1098; 2026-02-21T09:08:09.0850428Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0850615Z mov.b64 {%r2687, %r2688}, %rd1099; 2026-02-21T09:08:09.0850761Z cvt.rn.f16x2.f32 %r2689, %r2688, %r2687; 2026-02-21T09:08:09.0851162Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0851283Z cvt.u64.u32 %rd1100, %r1261; 2026-02-21T09:08:09.0851419Z cvt.u64.u32 %rd1101, %r1262; 2026-02-21T09:08:09.0851540Z shl.b64 %rd1102, %rd1101, 32; 2026-02-21T09:08:09.0851657Z or.b64 %rd1103, %rd1100, %rd1102; 2026-02-21T09:08:09.0852121Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0852240Z mov.b64 {%r2690, %r2691}, %rd1103; 2026-02-21T09:08:09.0852380Z cvt.rn.f16x2.f32 %r2692, %r2691, %r2690; 2026-02-21T09:08:09.0852498Z mov.b64 {%r2693, %r2694}, %rd543; 2026-02-21T09:08:09.0852645Z cvt.rn.f16x2.f32 %r2695, %r2694, %r2693; 2026-02-21T09:08:09.0852761Z mov.b64 {%r2696, %r2697}, %rd547; 2026-02-21T09:08:09.0852891Z cvt.rn.f16x2.f32 %r2698, %r2697, %r2696; 2026-02-21T09:08:09.0853293Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0853414Z cvt.u64.u32 %rd1104, %r1268; 2026-02-21T09:08:09.0853534Z cvt.u64.u32 %rd1105, %r1269; 2026-02-21T09:08:09.0853657Z shl.b64 %rd1106, %rd1105, 32; 2026-02-21T09:08:09.0853784Z or.b64 %rd1107, %rd1104, %rd1106; 2026-02-21T09:08:09.0854176Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0854297Z mov.b64 {%r2699, %r2700}, %rd1107; 2026-02-21T09:08:09.0854448Z cvt.rn.f16x2.f32 %r2701, %r2700, %r2699; 2026-02-21T09:08:09.0854566Z mov.b64 {%r2702, %r2703}, %rd551; 2026-02-21T09:08:09.0854772Z cvt.rn.f16x2.f32 %r2704, %r2703, %r2702; 
2026-02-21T09:08:09.0854902Z mov.b64 {%r2705, %r2706}, %rd555; 2026-02-21T09:08:09.0855039Z cvt.rn.f16x2.f32 %r2707, %r2706, %r2705; 2026-02-21T09:08:09.0855157Z mov.b64 {%r2708, %r2709}, %rd559; 2026-02-21T09:08:09.0855300Z cvt.rn.f16x2.f32 %r2710, %r2709, %r2708; 2026-02-21T09:08:09.0855703Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0855819Z cvt.u64.u32 %rd1108, %r1276; 2026-02-21T09:08:09.0855944Z cvt.u64.u32 %rd1109, %r1277; 2026-02-21T09:08:09.0856074Z shl.b64 %rd1110, %rd1109, 32; 2026-02-21T09:08:09.0856195Z or.b64 %rd1111, %rd1108, %rd1110; 2026-02-21T09:08:09.0856584Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0856722Z mov.b64 {%r2711, %r2712}, %rd1111; 2026-02-21T09:08:09.0856863Z cvt.rn.f16x2.f32 %r2713, %r2712, %r2711; 2026-02-21T09:08:09.0856981Z mov.b64 {%r2714, %r2715}, %rd563; 2026-02-21T09:08:09.0857121Z cvt.rn.f16x2.f32 %r2716, %r2715, %r2714; 2026-02-21T09:08:09.0857252Z mov.b64 {%r2717, %r2718}, %rd567; 2026-02-21T09:08:09.0857388Z cvt.rn.f16x2.f32 %r2719, %r2718, %r2717; 2026-02-21T09:08:09.0857507Z mov.b64 {%r2720, %r2721}, %rd571; 2026-02-21T09:08:09.0857660Z cvt.rn.f16x2.f32 %r2722, %r2721, %r2720; 2026-02-21T09:08:09.0858113Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0858230Z cvt.u64.u32 %rd1112, %r1285; 2026-02-21T09:08:09.0858361Z cvt.u64.u32 %rd1113, %r1286; 2026-02-21T09:08:09.0858542Z shl.b64 %rd1114, %rd1113, 32; 2026-02-21T09:08:09.0858661Z or.b64 %rd1115, %rd1112, %rd1114; 2026-02-21T09:08:09.0859050Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0859183Z mov.b64 {%r2723, %r2724}, %rd1115; 2026-02-21T09:08:09.0859320Z cvt.rn.f16x2.f32 %r2725, %r2724, %r2723; 2026-02-21T09:08:09.0859441Z mov.b64 {%r2726, %r2727}, %rd575; 2026-02-21T09:08:09.0859588Z cvt.rn.f16x2.f32 %r2728, %r2727, %r2726; 2026-02-21T09:08:09.0859704Z 
mov.b64 {%r2729, %r2730}, %rd579; 2026-02-21T09:08:09.0859840Z cvt.rn.f16x2.f32 %r2731, %r2730, %r2729; 2026-02-21T09:08:09.0860038Z mov.b64 {%r2732, %r2733}, %rd583; 2026-02-21T09:08:09.0860181Z cvt.rn.f16x2.f32 %r2734, %r2733, %r2732; 2026-02-21T09:08:09.0860571Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0860696Z cvt.u64.u32 %rd1116, %r1293; 2026-02-21T09:08:09.0860829Z cvt.u64.u32 %rd1117, %r1294; 2026-02-21T09:08:09.0860954Z shl.b64 %rd1118, %rd1117, 32; 2026-02-21T09:08:09.0861076Z or.b64 %rd1119, %rd1116, %rd1118; 2026-02-21T09:08:09.0861536Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0861663Z mov.b64 {%r2735, %r2736}, %rd1119; 2026-02-21T09:08:09.0861804Z cvt.rn.f16x2.f32 %r2737, %r2736, %r2735; 2026-02-21T09:08:09.0862195Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0862318Z cvt.u64.u32 %rd1120, %r1295; 2026-02-21T09:08:09.0862435Z cvt.u64.u32 %rd1121, %r1296; 2026-02-21T09:08:09.0862557Z shl.b64 %rd1122, %rd1121, 32; 2026-02-21T09:08:09.0862694Z or.b64 %rd1123, %rd1120, %rd1122; 2026-02-21T09:08:09.0863079Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0863202Z mov.b64 {%r2738, %r2739}, %rd1123; 2026-02-21T09:08:09.0863353Z cvt.rn.f16x2.f32 %r2740, %r2739, %r2738; 2026-02-21T09:08:09.0863475Z mov.b64 {%r2741, %r2742}, %rd587; 2026-02-21T09:08:09.0863609Z cvt.rn.f16x2.f32 %r2743, %r2742, %r2741; 2026-02-21T09:08:09.0863742Z mov.b64 {%r2744, %r2745}, %rd591; 2026-02-21T09:08:09.0863877Z cvt.rn.f16x2.f32 %r2746, %r2745, %r2744; 2026-02-21T09:08:09.0864256Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0864376Z cvt.u64.u32 %rd1124, %r1302; 2026-02-21T09:08:09.0864506Z cvt.u64.u32 %rd1125, %r1303; 2026-02-21T09:08:09.0864625Z shl.b64 %rd1126, %rd1125, 32; 2026-02-21T09:08:09.0864833Z or.b64 
%rd1127, %rd1124, %rd1126; 2026-02-21T09:08:09.0865247Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0865371Z mov.b64 {%r2747, %r2748}, %rd1127; 2026-02-21T09:08:09.0865508Z cvt.rn.f16x2.f32 %r2749, %r2748, %r2747; 2026-02-21T09:08:09.0865625Z mov.b64 {%r2750, %r2751}, %rd595; 2026-02-21T09:08:09.0865779Z cvt.rn.f16x2.f32 %r2752, %r2751, %r2750; 2026-02-21T09:08:09.0865898Z mov.b64 {%r2753, %r2754}, %rd599; 2026-02-21T09:08:09.0866054Z cvt.rn.f16x2.f32 %r2755, %r2754, %r2753; 2026-02-21T09:08:09.0866192Z mov.b64 {%r2756, %r2757}, %rd603; 2026-02-21T09:08:09.0866330Z cvt.rn.f16x2.f32 %r2758, %r2757, %r2756; 2026-02-21T09:08:09.0866725Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0866854Z cvt.u64.u32 %rd1128, %r1310; 2026-02-21T09:08:09.0866975Z cvt.u64.u32 %rd1129, %r1311; 2026-02-21T09:08:09.0867099Z shl.b64 %rd1130, %rd1129, 32; 2026-02-21T09:08:09.0867226Z or.b64 %rd1131, %rd1128, %rd1130; 2026-02-21T09:08:09.0867716Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0867839Z mov.b64 {%r2759, %r2760}, %rd1131; 2026-02-21T09:08:09.0867979Z cvt.rn.f16x2.f32 %r2761, %r2760, %r2759; 2026-02-21T09:08:09.0868179Z mov.b64 {%r2762, %r2763}, %rd607; 2026-02-21T09:08:09.0868320Z cvt.rn.f16x2.f32 %r2764, %r2763, %r2762; 2026-02-21T09:08:09.0868441Z mov.b64 {%r2765, %r2766}, %rd611; 2026-02-21T09:08:09.0868590Z cvt.rn.f16x2.f32 %r2767, %r2766, %r2765; 2026-02-21T09:08:09.0868705Z mov.b64 {%r2768, %r2769}, %rd615; 2026-02-21T09:08:09.0868842Z cvt.rn.f16x2.f32 %r2770, %r2769, %r2768; 2026-02-21T09:08:09.0869234Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0869365Z cvt.u64.u32 %rd1132, %r1319; 2026-02-21T09:08:09.0869485Z cvt.u64.u32 %rd1133, %r1320; 2026-02-21T09:08:09.0869666Z shl.b64 %rd1134, %rd1133, 32; 2026-02-21T09:08:09.0869802Z or.b64 %rd1135, %rd1132, %rd1134; 
2026-02-21T09:08:09.0870202Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0870326Z mov.b64 {%r2771, %r2772}, %rd1135; 2026-02-21T09:08:09.0870479Z cvt.rn.f16x2.f32 %r2773, %r2772, %r2771; 2026-02-21T09:08:09.0870604Z mov.b64 {%r2774, %r2775}, %rd619; 2026-02-21T09:08:09.0870740Z cvt.rn.f16x2.f32 %r2776, %r2775, %r2774; 2026-02-21T09:08:09.0870860Z mov.b64 {%r2777, %r2778}, %rd623; 2026-02-21T09:08:09.0871098Z cvt.rn.f16x2.f32 %r2779, %r2778, %r2777; 2026-02-21T09:08:09.0871219Z mov.b64 {%r2780, %r2781}, %rd627; 2026-02-21T09:08:09.0871352Z cvt.rn.f16x2.f32 %r2782, %r2781, %r2780; 2026-02-21T09:08:09.0871753Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0871869Z cvt.u64.u32 %rd1136, %r1327; 2026-02-21T09:08:09.0871989Z cvt.u64.u32 %rd1137, %r1328; 2026-02-21T09:08:09.0872123Z shl.b64 %rd1138, %rd1137, 32; 2026-02-21T09:08:09.0872243Z or.b64 %rd1139, %rd1136, %rd1138; 2026-02-21T09:08:09.0872627Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0872747Z mov.b64 {%r2783, %r2784}, %rd1139; 2026-02-21T09:08:09.0872895Z cvt.rn.f16x2.f32 %r2785, %r2784, %r2783; 2026-02-21T09:08:09.0873011Z mov.b64 {%r2786, %r2787}, %rd631; 2026-02-21T09:08:09.0873145Z cvt.rn.f16x2.f32 %r2788, %r2787, %r2786; 2026-02-21T09:08:09.0873273Z mov.b64 {%r2789, %r2790}, %rd635; 2026-02-21T09:08:09.0873409Z cvt.rn.f16x2.f32 %r2791, %r2790, %r2789; 2026-02-21T09:08:09.0873527Z mov.b64 {%r2792, %r2793}, %rd639; 2026-02-21T09:08:09.0873661Z cvt.rn.f16x2.f32 %r2794, %r2793, %r2792; 2026-02-21T09:08:09.0874061Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0874180Z cvt.u64.u32 %rd1140, %r1336; 2026-02-21T09:08:09.0874299Z cvt.u64.u32 %rd1141, %r1337; 2026-02-21T09:08:09.0874431Z shl.b64 %rd1142, %rd1141, 32; 2026-02-21T09:08:09.0874547Z or.b64 %rd1143, %rd1140, %rd1142; 2026-02-21T09:08:09.0875043Z 
.loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0875182Z mov.b64 {%r2795, %r2796}, %rd1143; 2026-02-21T09:08:09.0875327Z cvt.rn.f16x2.f32 %r2797, %r2796, %r2795; 2026-02-21T09:08:09.0875444Z mov.b64 {%r2798, %r2799}, %rd643; 2026-02-21T09:08:09.0875582Z cvt.rn.f16x2.f32 %r2800, %r2799, %r2798; 2026-02-21T09:08:09.0875713Z mov.b64 {%r2801, %r2802}, %rd647; 2026-02-21T09:08:09.0875852Z cvt.rn.f16x2.f32 %r2803, %r2802, %r2801; 2026-02-21T09:08:09.0875967Z mov.b64 {%r2804, %r2805}, %rd651; 2026-02-21T09:08:09.0876109Z cvt.rn.f16x2.f32 %r2806, %r2805, %r2804; 2026-02-21T09:08:09.0876499Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0876619Z cvt.u64.u32 %rd1144, %r1344; 2026-02-21T09:08:09.0876754Z cvt.u64.u32 %rd1145, %r1345; 2026-02-21T09:08:09.0876962Z shl.b64 %rd1146, %rd1145, 32; 2026-02-21T09:08:09.0877080Z or.b64 %rd1147, %rd1144, %rd1146; 2026-02-21T09:08:09.0877479Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0877670Z mov.b64 {%r2807, %r2808}, %rd1147; 2026-02-21T09:08:09.0877807Z cvt.rn.f16x2.f32 %r2809, %r2808, %r2807; 2026-02-21T09:08:09.0877927Z mov.b64 {%r2810, %r2811}, %rd655; 2026-02-21T09:08:09.0878075Z cvt.rn.f16x2.f32 %r2812, %r2811, %r2810; 2026-02-21T09:08:09.0878192Z mov.b64 {%r2813, %r2814}, %rd659; 2026-02-21T09:08:09.0878332Z cvt.rn.f16x2.f32 %r2815, %r2814, %r2813; 2026-02-21T09:08:09.0878461Z mov.b64 {%r2816, %r2817}, %rd663; 2026-02-21T09:08:09.0878595Z cvt.rn.f16x2.f32 %r2818, %r2817, %r2816; 2026-02-21T09:08:09.0878987Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0879191Z cvt.u64.u32 %rd1148, %r1353; 2026-02-21T09:08:09.0879327Z cvt.u64.u32 %rd1149, %r1354; 2026-02-21T09:08:09.0879444Z shl.b64 %rd1150, %rd1149, 32; 2026-02-21T09:08:09.0879563Z or.b64 %rd1151, %rd1148, %rd1150; 2026-02-21T09:08:09.0879966Z .loc 1 58 27 // 
crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0880088Z mov.b64 {%r2819, %r2820}, %rd1151; 2026-02-21T09:08:09.0880227Z cvt.rn.f16x2.f32 %r2821, %r2820, %r2819; 2026-02-21T09:08:09.0880352Z mov.b64 {%r2822, %r2823}, %rd667; 2026-02-21T09:08:09.0880553Z cvt.rn.f16x2.f32 %r2824, %r2823, %r2822; 2026-02-21T09:08:09.0880673Z mov.b64 {%r2825, %r2826}, %rd671; 2026-02-21T09:08:09.0880808Z cvt.rn.f16x2.f32 %r2827, %r2826, %r2825; 2026-02-21T09:08:09.0880941Z mov.b64 {%r2828, %r2829}, %rd675; 2026-02-21T09:08:09.0881076Z cvt.rn.f16x2.f32 %r2830, %r2829, %r2828; 2026-02-21T09:08:09.0881458Z .loc 1 56 52 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:56:52 2026-02-21T09:08:09.0881592Z cvt.u64.u32 %rd1152, %r1361; 2026-02-21T09:08:09.0881715Z cvt.u64.u32 %rd1153, %r1362; 2026-02-21T09:08:09.0881834Z shl.b64 %rd1154, %rd1153, 32; 2026-02-21T09:08:09.0881963Z or.b64 %rd1155, %rd1152, %rd1154; 2026-02-21T09:08:09.0882358Z .loc 1 58 27 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:58:27 2026-02-21T09:08:09.0882478Z mov.b64 {%r2831, %r2832}, %rd1155; 2026-02-21T09:08:09.0882614Z cvt.rn.f16x2.f32 %r2833, %r2832, %r2831; 2026-02-21T09:08:09.0882743Z mov.b64 {%r2834, %r2835}, %rd679; 2026-02-21T09:08:09.0882885Z cvt.rn.f16x2.f32 %r2836, %r2835, %r2834; 2026-02-21T09:08:09.0882997Z mov.b64 {%r2837, %r2838}, %rd683; 2026-02-21T09:08:09.0883141Z cvt.rn.f16x2.f32 %r2839, %r2838, %r2837; 2026-02-21T09:08:09.0883255Z mov.b64 {%r2840, %r2841}, %rd687; 2026-02-21T09:08:09.0883388Z cvt.rn.f16x2.f32 %r2842, %r2841, %r2840; 2026-02-21T09:08:09.0883786Z .loc 1 59 82 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:59:82 2026-02-21T09:08:09.0884010Z st.shared.v4.b32 [%r6], {%r2077, %r2089, %r2101, %r2113}; 2026-02-21T09:08:09.0884218Z st.shared.v4.b32 [%r7], {%r2125, %r2137, %r2149, %r2161}; 2026-02-21T09:08:09.0884422Z st.shared.v4.b32 [%r8], {%r2173, %r2185, %r2197, %r2209}; 2026-02-21T09:08:09.0884628Z st.shared.v4.b32 
[%r9], {%r2221, %r2233, %r2245, %r2257}; 2026-02-21T09:08:09.0884980Z st.shared.v4.b32 [%r10], {%r2269, %r2281, %r2293, %r2305}; 2026-02-21T09:08:09.0885202Z st.shared.v4.b32 [%r11], {%r2317, %r2329, %r2341, %r2353}; 2026-02-21T09:08:09.0885426Z st.shared.v4.b32 [%r12], {%r2365, %r2377, %r2389, %r2401}; 2026-02-21T09:08:09.0885624Z st.shared.v4.b32 [%r13], {%r2413, %r2425, %r2437, %r2449}; 2026-02-21T09:08:09.0885734Z bar.sync 0; 2026-02-21T09:08:09.0885864Z // begin inline asm 2026-02-21T09:08:09.0886229Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1690, %r1694, %r1698, %r1702}, [%r1374]; 2026-02-21T09:08:09.0886343Z // end inline asm 2026-02-21T09:08:09.0886462Z // begin inline asm 2026-02-21T09:08:09.0886901Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1706, %r1710, %r1714, %r1718}, [%r1379]; 2026-02-21T09:08:09.0887009Z // end inline asm 2026-02-21T09:08:09.0887128Z // begin inline asm 2026-02-21T09:08:09.0887490Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1722, %r1726, %r1730, %r1734}, [%r1384]; 2026-02-21T09:08:09.0887661Z // end inline asm 2026-02-21T09:08:09.0887773Z // begin inline asm 2026-02-21T09:08:09.0888125Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1738, %r1742, %r1746, %r1750}, [%r1389]; 2026-02-21T09:08:09.0888243Z // end inline asm 2026-02-21T09:08:09.0888356Z // begin inline asm 2026-02-21T09:08:09.0888707Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1754, %r1758, %r1762, %r1766}, [%r1394]; 2026-02-21T09:08:09.0888826Z // end inline asm 2026-02-21T09:08:09.0888937Z // begin inline asm 2026-02-21T09:08:09.0889283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1770, %r1774, %r1778, %r1782}, [%r1399]; 2026-02-21T09:08:09.0889457Z // end inline asm 2026-02-21T09:08:09.0889576Z // begin inline asm 2026-02-21T09:08:09.0889916Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1786, %r1790, %r1794, %r1798}, [%r1404]; 2026-02-21T09:08:09.0890027Z // end inline asm 2026-02-21T09:08:09.0890152Z // begin inline asm 2026-02-21T09:08:09.0890494Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1802, %r1806, %r1810, %r1814}, [%r1409]; 2026-02-21T09:08:09.0890605Z // end inline asm 2026-02-21T09:08:09.0890722Z bar.sync 0; 2026-02-21T09:08:09.0890986Z st.shared.v4.b32 [%r6], {%r2461, %r2473, %r2485, %r2497}; 2026-02-21T09:08:09.0891186Z st.shared.v4.b32 [%r7], {%r2509, %r2521, %r2533, %r2545}; 2026-02-21T09:08:09.0891398Z st.shared.v4.b32 [%r8], {%r2557, %r2569, %r2581, %r2593}; 2026-02-21T09:08:09.0891593Z st.shared.v4.b32 [%r9], {%r2605, %r2617, %r2629, %r2641}; 2026-02-21T09:08:09.0891799Z st.shared.v4.b32 [%r10], {%r2653, %r2665, %r2677, %r2689}; 2026-02-21T09:08:09.0892002Z st.shared.v4.b32 [%r11], {%r2701, %r2713, %r2725, %r2737}; 2026-02-21T09:08:09.0892211Z st.shared.v4.b32 [%r12], {%r2749, %r2761, %r2773, %r2785}; 2026-02-21T09:08:09.0892407Z st.shared.v4.b32 [%r13], {%r2797, %r2809, %r2821, %r2833}; 2026-02-21T09:08:09.0892514Z bar.sync 0; 2026-02-21T09:08:09.0892638Z // begin inline asm 2026-02-21T09:08:09.0892982Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1818, %r1822, %r1826, %r1830}, [%r1374]; 2026-02-21T09:08:09.0893093Z // end inline asm 2026-02-21T09:08:09.0893216Z // begin inline asm 2026-02-21T09:08:09.0893561Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1834, %r1838, %r1842, %r1846}, [%r1379]; 2026-02-21T09:08:09.0893671Z // end inline asm 2026-02-21T09:08:09.0893784Z // begin inline asm 2026-02-21T09:08:09.0894131Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1850, %r1854, %r1858, %r1862}, [%r1384]; 2026-02-21T09:08:09.0894242Z // end inline asm 2026-02-21T09:08:09.0894354Z // begin inline asm 2026-02-21T09:08:09.0894789Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1866, %r1870, %r1874, %r1878}, [%r1389]; 2026-02-21T09:08:09.0894900Z // end inline asm 2026-02-21T09:08:09.0895020Z // begin inline asm 2026-02-21T09:08:09.0895365Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1882, %r1886, %r1890, %r1894}, [%r1394]; 2026-02-21T09:08:09.0895485Z // end inline asm 2026-02-21T09:08:09.0895599Z // 
begin inline asm 2026-02-21T09:08:09.0895936Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1898, %r1902, %r1906, %r1910}, [%r1399]; 2026-02-21T09:08:09.0896054Z // end inline asm 2026-02-21T09:08:09.0896160Z // begin inline asm 2026-02-21T09:08:09.0896505Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1914, %r1918, %r1922, %r1926}, [%r1404]; 2026-02-21T09:08:09.0896625Z // end inline asm 2026-02-21T09:08:09.0896739Z // begin inline asm 2026-02-21T09:08:09.0897078Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1930, %r1934, %r1938, %r1942}, [%r1409]; 2026-02-21T09:08:09.0897186Z // end inline asm 2026-02-21T09:08:09.0897305Z bar.sync 0; 2026-02-21T09:08:09.0897501Z st.shared.v4.b32 [%r6], {%r2080, %r2092, %r2104, %r2116}; 2026-02-21T09:08:09.0897800Z st.shared.v4.b32 [%r7], {%r2128, %r2140, %r2152, %r2164}; 2026-02-21T09:08:09.0898016Z st.shared.v4.b32 [%r8], {%r2176, %r2188, %r2200, %r2212}; 2026-02-21T09:08:09.0898212Z st.shared.v4.b32 [%r9], {%r2224, %r2236, %r2248, %r2260}; 2026-02-21T09:08:09.0898417Z st.shared.v4.b32 [%r10], {%r2272, %r2284, %r2296, %r2308}; 2026-02-21T09:08:09.0898696Z st.shared.v4.b32 [%r11], {%r2320, %r2332, %r2344, %r2356}; 2026-02-21T09:08:09.0898899Z st.shared.v4.b32 [%r12], {%r2368, %r2380, %r2392, %r2404}; 2026-02-21T09:08:09.0899102Z st.shared.v4.b32 [%r13], {%r2416, %r2428, %r2440, %r2452}; 2026-02-21T09:08:09.0899210Z bar.sync 0; 2026-02-21T09:08:09.0899342Z // begin inline asm 2026-02-21T09:08:09.0899690Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1691, %r1695, %r1699, %r1703}, [%r1374]; 2026-02-21T09:08:09.0899798Z // end inline asm 2026-02-21T09:08:09.0899919Z // begin inline asm 2026-02-21T09:08:09.0900334Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1707, %r1711, %r1715, %r1719}, [%r1379]; 2026-02-21T09:08:09.0900448Z // end inline asm 2026-02-21T09:08:09.0900577Z // begin inline asm 2026-02-21T09:08:09.0900928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1723, %r1727, %r1731, %r1735}, [%r1384]; 2026-02-21T09:08:09.0901040Z // end 
inline asm 2026-02-21T09:08:09.0901155Z // begin inline asm 2026-02-21T09:08:09.0901510Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1739, %r1743, %r1747, %r1751}, [%r1389]; 2026-02-21T09:08:09.0901617Z // end inline asm 2026-02-21T09:08:09.0901729Z // begin inline asm 2026-02-21T09:08:09.0902152Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1755, %r1759, %r1763, %r1767}, [%r1394]; 2026-02-21T09:08:09.0902260Z // end inline asm 2026-02-21T09:08:09.0902371Z // begin inline asm 2026-02-21T09:08:09.0902716Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1771, %r1775, %r1779, %r1783}, [%r1399]; 2026-02-21T09:08:09.0902840Z // end inline asm 2026-02-21T09:08:09.0902950Z // begin inline asm 2026-02-21T09:08:09.0903296Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1787, %r1791, %r1795, %r1799}, [%r1404]; 2026-02-21T09:08:09.0903415Z // end inline asm 2026-02-21T09:08:09.0903529Z // begin inline asm 2026-02-21T09:08:09.0903866Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1803, %r1807, %r1811, %r1815}, [%r1409]; 2026-02-21T09:08:09.0903987Z // end inline asm 2026-02-21T09:08:09.0904096Z bar.sync 0; 2026-02-21T09:08:09.0904292Z st.shared.v4.b32 [%r6], {%r2464, %r2476, %r2488, %r2500}; 2026-02-21T09:08:09.0904496Z st.shared.v4.b32 [%r7], {%r2512, %r2524, %r2536, %r2548}; 2026-02-21T09:08:09.0904777Z st.shared.v4.b32 [%r8], {%r2560, %r2572, %r2584, %r2596}; 2026-02-21T09:08:09.0904976Z st.shared.v4.b32 [%r9], {%r2608, %r2620, %r2632, %r2644}; 2026-02-21T09:08:09.0905176Z st.shared.v4.b32 [%r10], {%r2656, %r2668, %r2680, %r2692}; 2026-02-21T09:08:09.0905396Z st.shared.v4.b32 [%r11], {%r2704, %r2716, %r2728, %r2740}; 2026-02-21T09:08:09.0905593Z st.shared.v4.b32 [%r12], {%r2752, %r2764, %r2776, %r2788}; 2026-02-21T09:08:09.0905796Z st.shared.v4.b32 [%r13], {%r2800, %r2812, %r2824, %r2836}; 2026-02-21T09:08:09.0905915Z bar.sync 0; 2026-02-21T09:08:09.0906029Z // begin inline asm 2026-02-21T09:08:09.0906371Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1819, %r1823, %r1827, %r1831}, 
[%r1374]; 2026-02-21T09:08:09.0906477Z // end inline asm 2026-02-21T09:08:09.0906605Z // begin inline asm 2026-02-21T09:08:09.0906946Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1835, %r1839, %r1843, %r1847}, [%r1379]; 2026-02-21T09:08:09.0907051Z // end inline asm 2026-02-21T09:08:09.0907177Z // begin inline asm 2026-02-21T09:08:09.0907519Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1851, %r1855, %r1859, %r1863}, [%r1384]; 2026-02-21T09:08:09.0907626Z // end inline asm 2026-02-21T09:08:09.0907738Z // begin inline asm 2026-02-21T09:08:09.0908093Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1867, %r1871, %r1875, %r1879}, [%r1389]; 2026-02-21T09:08:09.0908200Z // end inline asm 2026-02-21T09:08:09.0908309Z // begin inline asm 2026-02-21T09:08:09.0908665Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1883, %r1887, %r1891, %r1895}, [%r1394]; 2026-02-21T09:08:09.0908870Z // end inline asm 2026-02-21T09:08:09.0908982Z // begin inline asm 2026-02-21T09:08:09.0909339Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1899, %r1903, %r1907, %r1911}, [%r1399]; 2026-02-21T09:08:09.0909543Z // end inline asm 2026-02-21T09:08:09.0909654Z // begin inline asm 2026-02-21T09:08:09.0909991Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1915, %r1919, %r1923, %r1927}, [%r1404]; 2026-02-21T09:08:09.0910113Z // end inline asm 2026-02-21T09:08:09.0910224Z // begin inline asm 2026-02-21T09:08:09.0910564Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1931, %r1935, %r1939, %r1943}, [%r1409]; 2026-02-21T09:08:09.0910682Z // end inline asm 2026-02-21T09:08:09.0910788Z bar.sync 0; 2026-02-21T09:08:09.0910992Z st.shared.v4.b32 [%r6], {%r2083, %r2095, %r2107, %r2119}; 2026-02-21T09:08:09.0911194Z st.shared.v4.b32 [%r7], {%r2131, %r2143, %r2155, %r2167}; 2026-02-21T09:08:09.0911497Z st.shared.v4.b32 [%r8], {%r2179, %r2191, %r2203, %r2215}; 2026-02-21T09:08:09.0911697Z st.shared.v4.b32 [%r9], {%r2227, %r2239, %r2251, %r2263}; 2026-02-21T09:08:09.0911899Z st.shared.v4.b32 [%r10], {%r2275, %r2287, %r2299, 
%r2311}; 2026-02-21T09:08:09.0912108Z st.shared.v4.b32 [%r11], {%r2323, %r2335, %r2347, %r2359}; 2026-02-21T09:08:09.0912308Z st.shared.v4.b32 [%r12], {%r2371, %r2383, %r2395, %r2407}; 2026-02-21T09:08:09.0912506Z st.shared.v4.b32 [%r13], {%r2419, %r2431, %r2443, %r2455}; 2026-02-21T09:08:09.0912684Z bar.sync 0; 2026-02-21T09:08:09.0912800Z // begin inline asm 2026-02-21T09:08:09.0913151Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1692, %r1696, %r1700, %r1704}, [%r1374]; 2026-02-21T09:08:09.0913271Z // end inline asm 2026-02-21T09:08:09.0913386Z // begin inline asm 2026-02-21T09:08:09.0913734Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1708, %r1712, %r1716, %r1720}, [%r1379]; 2026-02-21T09:08:09.0913840Z // end inline asm 2026-02-21T09:08:09.0913964Z // begin inline asm 2026-02-21T09:08:09.0914315Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1724, %r1728, %r1732, %r1736}, [%r1384]; 2026-02-21T09:08:09.0914421Z // end inline asm 2026-02-21T09:08:09.0914544Z // begin inline asm 2026-02-21T09:08:09.0914986Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1740, %r1744, %r1748, %r1752}, [%r1389]; 2026-02-21T09:08:09.0915099Z // end inline asm 2026-02-21T09:08:09.0915214Z // begin inline asm 2026-02-21T09:08:09.0915578Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1756, %r1760, %r1764, %r1768}, [%r1394]; 2026-02-21T09:08:09.0915681Z // end inline asm 2026-02-21T09:08:09.0915793Z // begin inline asm 2026-02-21T09:08:09.0916152Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1772, %r1776, %r1780, %r1784}, [%r1399]; 2026-02-21T09:08:09.0916259Z // end inline asm 2026-02-21T09:08:09.0916370Z // begin inline asm 2026-02-21T09:08:09.0916729Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1788, %r1792, %r1796, %r1800}, [%r1404]; 2026-02-21T09:08:09.0916844Z // end inline asm 2026-02-21T09:08:09.0916957Z // begin inline asm 2026-02-21T09:08:09.0917299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1804, %r1808, %r1812, %r1816}, [%r1409]; 2026-02-21T09:08:09.0917415Z // end inline asm 
2026-02-21T09:08:09.0917518Z bar.sync 0; 2026-02-21T09:08:09.0917720Z st.shared.v4.b32 [%r6], {%r2467, %r2479, %r2491, %r2503}; 2026-02-21T09:08:09.0917937Z st.shared.v4.b32 [%r7], {%r2515, %r2527, %r2539, %r2551}; 2026-02-21T09:08:09.0918131Z st.shared.v4.b32 [%r8], {%r2563, %r2575, %r2587, %r2599}; 2026-02-21T09:08:09.0918329Z st.shared.v4.b32 [%r9], {%r2611, %r2623, %r2635, %r2647}; 2026-02-21T09:08:09.0918545Z st.shared.v4.b32 [%r10], {%r2659, %r2671, %r2683, %r2695}; 2026-02-21T09:08:09.0918749Z st.shared.v4.b32 [%r11], {%r2707, %r2719, %r2731, %r2743}; 2026-02-21T09:08:09.0918950Z st.shared.v4.b32 [%r12], {%r2755, %r2767, %r2779, %r2791}; 2026-02-21T09:08:09.0919146Z st.shared.v4.b32 [%r13], {%r2803, %r2815, %r2827, %r2839}; 2026-02-21T09:08:09.0919260Z bar.sync 0; 2026-02-21T09:08:09.0919373Z // begin inline asm 2026-02-21T09:08:09.0919800Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1820, %r1824, %r1828, %r1832}, [%r1374]; 2026-02-21T09:08:09.0919923Z // end inline asm 2026-02-21T09:08:09.0920036Z // begin inline asm 2026-02-21T09:08:09.0920383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1836, %r1840, %r1844, %r1848}, [%r1379]; 2026-02-21T09:08:09.0920567Z // end inline asm 2026-02-21T09:08:09.0920678Z // begin inline asm 2026-02-21T09:08:09.0921031Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1852, %r1856, %r1860, %r1864}, [%r1384]; 2026-02-21T09:08:09.0921140Z // end inline asm 2026-02-21T09:08:09.0921261Z // begin inline asm 2026-02-21T09:08:09.0921608Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1868, %r1872, %r1876, %r1880}, [%r1389]; 2026-02-21T09:08:09.0921720Z // end inline asm 2026-02-21T09:08:09.0921837Z // begin inline asm 2026-02-21T09:08:09.0922253Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1884, %r1888, %r1892, %r1896}, [%r1394]; 2026-02-21T09:08:09.0922365Z // end inline asm 2026-02-21T09:08:09.0922482Z // begin inline asm 2026-02-21T09:08:09.0922835Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1900, %r1904, %r1908, %r1912}, [%r1399]; 
2026-02-21T09:08:09.0922946Z // end inline asm 2026-02-21T09:08:09.0923065Z // begin inline asm 2026-02-21T09:08:09.0923418Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1916, %r1920, %r1924, %r1928}, [%r1404]; 2026-02-21T09:08:09.0923526Z // end inline asm 2026-02-21T09:08:09.0923637Z // begin inline asm 2026-02-21T09:08:09.0924057Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1932, %r1936, %r1940, %r1944}, [%r1409]; 2026-02-21T09:08:09.0924166Z // end inline asm 2026-02-21T09:08:09.0924274Z bar.sync 0; 2026-02-21T09:08:09.0924481Z st.shared.v4.b32 [%r6], {%r2086, %r2098, %r2110, %r2122}; 2026-02-21T09:08:09.0924772Z st.shared.v4.b32 [%r7], {%r2134, %r2146, %r2158, %r2170}; 2026-02-21T09:08:09.0924977Z st.shared.v4.b32 [%r8], {%r2182, %r2194, %r2206, %r2218}; 2026-02-21T09:08:09.0925183Z st.shared.v4.b32 [%r9], {%r2230, %r2242, %r2254, %r2266}; 2026-02-21T09:08:09.0925408Z st.shared.v4.b32 [%r10], {%r2278, %r2290, %r2302, %r2314}; 2026-02-21T09:08:09.0925604Z st.shared.v4.b32 [%r11], {%r2326, %r2338, %r2350, %r2362}; 2026-02-21T09:08:09.0925801Z st.shared.v4.b32 [%r12], {%r2374, %r2386, %r2398, %r2410}; 2026-02-21T09:08:09.0926014Z st.shared.v4.b32 [%r13], {%r2422, %r2434, %r2446, %r2458}; 2026-02-21T09:08:09.0926120Z bar.sync 0; 2026-02-21T09:08:09.0926239Z // begin inline asm 2026-02-21T09:08:09.0926592Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1693, %r1697, %r1701, %r1705}, [%r1374]; 2026-02-21T09:08:09.0926713Z // end inline asm 2026-02-21T09:08:09.0926824Z // begin inline asm 2026-02-21T09:08:09.0927175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1709, %r1713, %r1717, %r1721}, [%r1379]; 2026-02-21T09:08:09.0927294Z // end inline asm 2026-02-21T09:08:09.0927405Z // begin inline asm 2026-02-21T09:08:09.0927764Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1725, %r1729, %r1733, %r1737}, [%r1384]; 2026-02-21T09:08:09.0927885Z // end inline asm 2026-02-21T09:08:09.0927998Z // begin inline asm 2026-02-21T09:08:09.0928347Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r1741, %r1745, %r1749, %r1753}, [%r1389]; 2026-02-21T09:08:09.0928457Z // end inline asm 2026-02-21T09:08:09.0928585Z // begin inline asm 2026-02-21T09:08:09.0928939Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1757, %r1761, %r1765, %r1769}, [%r1394]; 2026-02-21T09:08:09.0929045Z // end inline asm 2026-02-21T09:08:09.0929167Z // begin inline asm 2026-02-21T09:08:09.0929524Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1773, %r1777, %r1781, %r1785}, [%r1399]; 2026-02-21T09:08:09.0929635Z // end inline asm 2026-02-21T09:08:09.0929750Z // begin inline asm 2026-02-21T09:08:09.0930114Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1789, %r1793, %r1797, %r1801}, [%r1404]; 2026-02-21T09:08:09.0930223Z // end inline asm 2026-02-21T09:08:09.0930337Z // begin inline asm 2026-02-21T09:08:09.0930702Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1805, %r1809, %r1813, %r1817}, [%r1409]; 2026-02-21T09:08:09.0930887Z // end inline asm 2026-02-21T09:08:09.0930995Z bar.sync 0; 2026-02-21T09:08:09.0931215Z st.shared.v4.b32 [%r6], {%r2470, %r2482, %r2494, %r2506}; 2026-02-21T09:08:09.0931416Z st.shared.v4.b32 [%r7], {%r2518, %r2530, %r2542, %r2554}; 2026-02-21T09:08:09.0931676Z st.shared.v4.b32 [%r8], {%r2566, %r2578, %r2590, %r2602}; 2026-02-21T09:08:09.0931873Z st.shared.v4.b32 [%r9], {%r2614, %r2626, %r2638, %r2650}; 2026-02-21T09:08:09.0932092Z st.shared.v4.b32 [%r10], {%r2662, %r2674, %r2686, %r2698}; 2026-02-21T09:08:09.0932294Z st.shared.v4.b32 [%r11], {%r2710, %r2722, %r2734, %r2746}; 2026-02-21T09:08:09.0932492Z st.shared.v4.b32 [%r12], {%r2758, %r2770, %r2782, %r2794}; 2026-02-21T09:08:09.0932703Z st.shared.v4.b32 [%r13], {%r2806, %r2818, %r2830, %r2842}; 2026-02-21T09:08:09.0932809Z bar.sync 0; 2026-02-21T09:08:09.0932924Z // begin inline asm 2026-02-21T09:08:09.0933353Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1821, %r1825, %r1829, %r1833}, [%r1374]; 2026-02-21T09:08:09.0933464Z // end inline asm 2026-02-21T09:08:09.0933576Z // begin inline asm 
2026-02-21T09:08:09.0933930Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1837, %r1841, %r1845, %r1849}, [%r1379]; 2026-02-21T09:08:09.0934049Z // end inline asm 2026-02-21T09:08:09.0934164Z // begin inline asm 2026-02-21T09:08:09.0934510Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1853, %r1857, %r1861, %r1865}, [%r1384]; 2026-02-21T09:08:09.0934627Z // end inline asm 2026-02-21T09:08:09.0934879Z // begin inline asm 2026-02-21T09:08:09.0935232Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1869, %r1873, %r1877, %r1881}, [%r1389]; 2026-02-21T09:08:09.0935349Z // end inline asm 2026-02-21T09:08:09.0935461Z // begin inline asm 2026-02-21T09:08:09.0935801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1885, %r1889, %r1893, %r1897}, [%r1394]; 2026-02-21T09:08:09.0935911Z // end inline asm 2026-02-21T09:08:09.0936034Z // begin inline asm 2026-02-21T09:08:09.0936380Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1901, %r1905, %r1909, %r1913}, [%r1399]; 2026-02-21T09:08:09.0936491Z // end inline asm 2026-02-21T09:08:09.0936609Z // begin inline asm 2026-02-21T09:08:09.0936952Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1917, %r1921, %r1925, %r1929}, [%r1404]; 2026-02-21T09:08:09.0937062Z // end inline asm 2026-02-21T09:08:09.0937173Z // begin inline asm 2026-02-21T09:08:09.0937521Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1933, %r1937, %r1941, %r1945}, [%r1409]; 2026-02-21T09:08:09.0937634Z // end inline asm 2026-02-21T09:08:09.0937750Z // begin inline asm 2026-02-21T09:08:09.0938001Z st.global.v4.b32 [ %rd68 + 0 ], { %r1690, %r1691, %r1692, %r1693 }; 2026-02-21T09:08:09.0938111Z // end inline asm 2026-02-21T09:08:09.0938224Z // begin inline asm 2026-02-21T09:08:09.0938470Z st.global.v4.b32 [ %rd69 + 0 ], { %r1694, %r1695, %r1696, %r1697 }; 2026-02-21T09:08:09.0938580Z // end inline asm 2026-02-21T09:08:09.0938692Z // begin inline asm 2026-02-21T09:08:09.0938921Z st.global.v4.b32 [ %rd70 + 0 ], { %r1698, %r1699, %r1700, %r1701 }; 2026-02-21T09:08:09.0939042Z // end inline asm 
2026-02-21T09:08:09.0939154Z // begin inline asm 2026-02-21T09:08:09.0939374Z st.global.v4.b32 [ %rd71 + 0 ], { %r1702, %r1703, %r1704, %r1705 }; 2026-02-21T09:08:09.0939493Z // end inline asm 2026-02-21T09:08:09.0939604Z // begin inline asm 2026-02-21T09:08:09.0939822Z st.global.v4.b32 [ %rd72 + 0 ], { %r1706, %r1707, %r1708, %r1709 }; 2026-02-21T09:08:09.0939927Z // end inline asm 2026-02-21T09:08:09.0940049Z // begin inline asm 2026-02-21T09:08:09.0940268Z st.global.v4.b32 [ %rd73 + 0 ], { %r1710, %r1711, %r1712, %r1713 }; 2026-02-21T09:08:09.0940379Z // end inline asm 2026-02-21T09:08:09.0940503Z // begin inline asm 2026-02-21T09:08:09.0940776Z st.global.v4.b32 [ %rd74 + 0 ], { %r1714, %r1715, %r1716, %r1717 }; 2026-02-21T09:08:09.0940907Z // end inline asm 2026-02-21T09:08:09.0941020Z // begin inline asm 2026-02-21T09:08:09.0941256Z st.global.v4.b32 [ %rd75 + 0 ], { %r1718, %r1719, %r1720, %r1721 }; 2026-02-21T09:08:09.0941441Z // end inline asm 2026-02-21T09:08:09.0941555Z // begin inline asm 2026-02-21T09:08:09.0941790Z st.global.v4.b32 [ %rd76 + 0 ], { %r1722, %r1723, %r1724, %r1725 }; 2026-02-21T09:08:09.0941899Z // end inline asm 2026-02-21T09:08:09.0942081Z // begin inline asm 2026-02-21T09:08:09.0942312Z st.global.v4.b32 [ %rd77 + 0 ], { %r1726, %r1727, %r1728, %r1729 }; 2026-02-21T09:08:09.0942421Z // end inline asm 2026-02-21T09:08:09.0942539Z // begin inline asm 2026-02-21T09:08:09.0942767Z st.global.v4.b32 [ %rd78 + 0 ], { %r1730, %r1731, %r1732, %r1733 }; 2026-02-21T09:08:09.0942887Z // end inline asm 2026-02-21T09:08:09.0942999Z // begin inline asm 2026-02-21T09:08:09.0943230Z st.global.v4.b32 [ %rd79 + 0 ], { %r1734, %r1735, %r1736, %r1737 }; 2026-02-21T09:08:09.0943348Z // end inline asm 2026-02-21T09:08:09.0943459Z // begin inline asm 2026-02-21T09:08:09.0943739Z st.global.v4.b32 [ %rd80 + 0 ], { %r1738, %r1739, %r1740, %r1741 }; 2026-02-21T09:08:09.0943848Z // end inline asm 2026-02-21T09:08:09.0943971Z // begin inline asm 
2026-02-21T09:08:09.0944192Z st.global.v4.b32 [ %rd81 + 0 ], { %r1742, %r1743, %r1744, %r1745 }; 2026-02-21T09:08:09.0944300Z // end inline asm 2026-02-21T09:08:09.0944423Z // begin inline asm 2026-02-21T09:08:09.0944647Z st.global.v4.b32 [ %rd82 + 0 ], { %r1746, %r1747, %r1748, %r1749 }; 2026-02-21T09:08:09.0944826Z // end inline asm 2026-02-21T09:08:09.0944950Z // begin inline asm 2026-02-21T09:08:09.0945239Z st.global.v4.b32 [ %rd83 + 0 ], { %r1750, %r1751, %r1752, %r1753 }; 2026-02-21T09:08:09.0945351Z // end inline asm 2026-02-21T09:08:09.0945460Z // begin inline asm 2026-02-21T09:08:09.0945694Z st.global.v4.b32 [ %rd84 + 0 ], { %r1754, %r1755, %r1756, %r1757 }; 2026-02-21T09:08:09.0945802Z // end inline asm 2026-02-21T09:08:09.0945919Z // begin inline asm 2026-02-21T09:08:09.0946149Z st.global.v4.b32 [ %rd85 + 0 ], { %r1758, %r1759, %r1760, %r1761 }; 2026-02-21T09:08:09.0946262Z // end inline asm 2026-02-21T09:08:09.0946379Z // begin inline asm 2026-02-21T09:08:09.0946597Z st.global.v4.b32 [ %rd86 + 0 ], { %r1762, %r1763, %r1764, %r1765 }; 2026-02-21T09:08:09.0946710Z // end inline asm 2026-02-21T09:08:09.0946822Z // begin inline asm 2026-02-21T09:08:09.0947042Z st.global.v4.b32 [ %rd87 + 0 ], { %r1766, %r1767, %r1768, %r1769 }; 2026-02-21T09:08:09.0947159Z // end inline asm 2026-02-21T09:08:09.0947272Z // begin inline asm 2026-02-21T09:08:09.0947493Z st.global.v4.b32 [ %rd88 + 0 ], { %r1770, %r1771, %r1772, %r1773 }; 2026-02-21T09:08:09.0947615Z // end inline asm 2026-02-21T09:08:09.0947730Z // begin inline asm 2026-02-21T09:08:09.0947953Z st.global.v4.b32 [ %rd89 + 0 ], { %r1774, %r1775, %r1776, %r1777 }; 2026-02-21T09:08:09.0948063Z // end inline asm 2026-02-21T09:08:09.0948186Z // begin inline asm 2026-02-21T09:08:09.0948416Z st.global.v4.b32 [ %rd90 + 0 ], { %r1778, %r1779, %r1780, %r1781 }; 2026-02-21T09:08:09.0948521Z // end inline asm 2026-02-21T09:08:09.0948648Z // begin inline asm 2026-02-21T09:08:09.0948877Z st.global.v4.b32 [ %rd91 + 0 ], { 
%r1782, %r1783, %r1784, %r1785 }; 2026-02-21T09:08:09.0948978Z // end inline asm 2026-02-21T09:08:09.0949089Z // begin inline asm 2026-02-21T09:08:09.0949321Z st.global.v4.b32 [ %rd92 + 0 ], { %r1786, %r1787, %r1788, %r1789 }; 2026-02-21T09:08:09.0949432Z // end inline asm 2026-02-21T09:08:09.0949542Z // begin inline asm 2026-02-21T09:08:09.0949776Z st.global.v4.b32 [ %rd93 + 0 ], { %r1790, %r1791, %r1792, %r1793 }; 2026-02-21T09:08:09.0949883Z // end inline asm 2026-02-21T09:08:09.0949996Z // begin inline asm 2026-02-21T09:08:09.0950217Z st.global.v4.b32 [ %rd94 + 0 ], { %r1794, %r1795, %r1796, %r1797 }; 2026-02-21T09:08:09.0950332Z // end inline asm 2026-02-21T09:08:09.0950441Z // begin inline asm 2026-02-21T09:08:09.0950663Z st.global.v4.b32 [ %rd95 + 0 ], { %r1798, %r1799, %r1800, %r1801 }; 2026-02-21T09:08:09.0950782Z // end inline asm 2026-02-21T09:08:09.0950890Z // begin inline asm 2026-02-21T09:08:09.0951112Z st.global.v4.b32 [ %rd96 + 0 ], { %r1802, %r1803, %r1804, %r1805 }; 2026-02-21T09:08:09.0951316Z // end inline asm 2026-02-21T09:08:09.0951428Z // begin inline asm 2026-02-21T09:08:09.0951648Z st.global.v4.b32 [ %rd97 + 0 ], { %r1806, %r1807, %r1808, %r1809 }; 2026-02-21T09:08:09.0951759Z // end inline asm 2026-02-21T09:08:09.0951977Z // begin inline asm 2026-02-21T09:08:09.0952200Z st.global.v4.b32 [ %rd98 + 0 ], { %r1810, %r1811, %r1812, %r1813 }; 2026-02-21T09:08:09.0952307Z // end inline asm 2026-02-21T09:08:09.0952425Z // begin inline asm 2026-02-21T09:08:09.0952645Z st.global.v4.b32 [ %rd99 + 0 ], { %r1814, %r1815, %r1816, %r1817 }; 2026-02-21T09:08:09.0952753Z // end inline asm 2026-02-21T09:08:09.0952863Z // begin inline asm 2026-02-21T09:08:09.0953110Z st.global.v4.b32 [ %rd100 + 0 ], { %r1818, %r1819, %r1820, %r1821 }; 2026-02-21T09:08:09.0953221Z // end inline asm 2026-02-21T09:08:09.0953334Z // begin inline asm 2026-02-21T09:08:09.0953643Z st.global.v4.b32 [ %rd101 + 0 ], { %r1822, %r1823, %r1824, %r1825 }; 2026-02-21T09:08:09.0953756Z // 
end inline asm 2026-02-21T09:08:09.0953868Z // begin inline asm 2026-02-21T09:08:09.0954111Z st.global.v4.b32 [ %rd102 + 0 ], { %r1826, %r1827, %r1828, %r1829 }; 2026-02-21T09:08:09.0954218Z // end inline asm 2026-02-21T09:08:09.0954331Z // begin inline asm 2026-02-21T09:08:09.0954560Z st.global.v4.b32 [ %rd103 + 0 ], { %r1830, %r1831, %r1832, %r1833 }; 2026-02-21T09:08:09.0954739Z // end inline asm 2026-02-21T09:08:09.0954854Z // begin inline asm 2026-02-21T09:08:09.0955140Z st.global.v4.b32 [ %rd104 + 0 ], { %r1834, %r1835, %r1836, %r1837 }; 2026-02-21T09:08:09.0955271Z // end inline asm 2026-02-21T09:08:09.0955389Z // begin inline asm 2026-02-21T09:08:09.0955617Z st.global.v4.b32 [ %rd105 + 0 ], { %r1838, %r1839, %r1840, %r1841 }; 2026-02-21T09:08:09.0955728Z // end inline asm 2026-02-21T09:08:09.0955852Z // begin inline asm 2026-02-21T09:08:09.0956082Z st.global.v4.b32 [ %rd106 + 0 ], { %r1842, %r1843, %r1844, %r1845 }; 2026-02-21T09:08:09.0956191Z // end inline asm 2026-02-21T09:08:09.0956317Z // begin inline asm 2026-02-21T09:08:09.0956544Z st.global.v4.b32 [ %rd107 + 0 ], { %r1846, %r1847, %r1848, %r1849 }; 2026-02-21T09:08:09.0956647Z // end inline asm 2026-02-21T09:08:09.0956760Z // begin inline asm 2026-02-21T09:08:09.0956997Z st.global.v4.b32 [ %rd108 + 0 ], { %r1850, %r1851, %r1852, %r1853 }; 2026-02-21T09:08:09.0957104Z // end inline asm 2026-02-21T09:08:09.0957215Z // begin inline asm 2026-02-21T09:08:09.0957450Z st.global.v4.b32 [ %rd109 + 0 ], { %r1854, %r1855, %r1856, %r1857 }; 2026-02-21T09:08:09.0957559Z // end inline asm 2026-02-21T09:08:09.0957673Z // begin inline asm 2026-02-21T09:08:09.0957907Z st.global.v4.b32 [ %rd110 + 0 ], { %r1858, %r1859, %r1860, %r1861 }; 2026-02-21T09:08:09.0958015Z // end inline asm 2026-02-21T09:08:09.0958130Z // begin inline asm 2026-02-21T09:08:09.0958353Z st.global.v4.b32 [ %rd111 + 0 ], { %r1862, %r1863, %r1864, %r1865 }; 2026-02-21T09:08:09.0958474Z // end inline asm 2026-02-21T09:08:09.0958590Z // begin inline 
asm 2026-02-21T09:08:09.0958817Z st.global.v4.b32 [ %rd112 + 0 ], { %r1866, %r1867, %r1868, %r1869 }; 2026-02-21T09:08:09.0958933Z // end inline asm 2026-02-21T09:08:09.0959042Z // begin inline asm 2026-02-21T09:08:09.0959267Z st.global.v4.b32 [ %rd113 + 0 ], { %r1870, %r1871, %r1872, %r1873 }; 2026-02-21T09:08:09.0959377Z // end inline asm 2026-02-21T09:08:09.0959502Z // begin inline asm 2026-02-21T09:08:09.0959725Z st.global.v4.b32 [ %rd114 + 0 ], { %r1874, %r1875, %r1876, %r1877 }; 2026-02-21T09:08:09.0959834Z // end inline asm 2026-02-21T09:08:09.0959955Z // begin inline asm 2026-02-21T09:08:09.0960180Z st.global.v4.b32 [ %rd115 + 0 ], { %r1878, %r1879, %r1880, %r1881 }; 2026-02-21T09:08:09.0960290Z // end inline asm 2026-02-21T09:08:09.0960411Z // begin inline asm 2026-02-21T09:08:09.0960638Z st.global.v4.b32 [ %rd116 + 0 ], { %r1882, %r1883, %r1884, %r1885 }; 2026-02-21T09:08:09.0960745Z // end inline asm 2026-02-21T09:08:09.0960853Z // begin inline asm 2026-02-21T09:08:09.0961162Z st.global.v4.b32 [ %rd117 + 0 ], { %r1886, %r1887, %r1888, %r1889 }; 2026-02-21T09:08:09.0961272Z // end inline asm 2026-02-21T09:08:09.0961383Z // begin inline asm 2026-02-21T09:08:09.0961617Z st.global.v4.b32 [ %rd118 + 0 ], { %r1890, %r1891, %r1892, %r1893 }; 2026-02-21T09:08:09.0961789Z // end inline asm 2026-02-21T09:08:09.0961902Z // begin inline asm 2026-02-21T09:08:09.0962124Z st.global.v4.b32 [ %rd119 + 0 ], { %r1894, %r1895, %r1896, %r1897 }; 2026-02-21T09:08:09.0962241Z // end inline asm 2026-02-21T09:08:09.0962353Z // begin inline asm 2026-02-21T09:08:09.0962572Z st.global.v4.b32 [ %rd120 + 0 ], { %r1898, %r1899, %r1900, %r1901 }; 2026-02-21T09:08:09.0962690Z // end inline asm 2026-02-21T09:08:09.0962805Z // begin inline asm 2026-02-21T09:08:09.0963028Z st.global.v4.b32 [ %rd121 + 0 ], { %r1902, %r1903, %r1904, %r1905 }; 2026-02-21T09:08:09.0963148Z // end inline asm 2026-02-21T09:08:09.0963261Z // begin inline asm 2026-02-21T09:08:09.0963545Z st.global.v4.b32 [ %rd122 
+ 0 ], { %r1906, %r1907, %r1908, %r1909 }; 2026-02-21T09:08:09.0963653Z // end inline asm 2026-02-21T09:08:09.0963778Z // begin inline asm 2026-02-21T09:08:09.0964004Z st.global.v4.b32 [ %rd123 + 0 ], { %r1910, %r1911, %r1912, %r1913 }; 2026-02-21T09:08:09.0964112Z // end inline asm 2026-02-21T09:08:09.0964236Z // begin inline asm 2026-02-21T09:08:09.0964467Z st.global.v4.b32 [ %rd124 + 0 ], { %r1914, %r1915, %r1916, %r1917 }; 2026-02-21T09:08:09.0964572Z // end inline asm 2026-02-21T09:08:09.0964839Z // begin inline asm 2026-02-21T09:08:09.0965082Z st.global.v4.b32 [ %rd125 + 0 ], { %r1918, %r1919, %r1920, %r1921 }; 2026-02-21T09:08:09.0965192Z // end inline asm 2026-02-21T09:08:09.0965314Z // begin inline asm 2026-02-21T09:08:09.0965554Z st.global.v4.b32 [ %rd126 + 0 ], { %r1922, %r1923, %r1924, %r1925 }; 2026-02-21T09:08:09.0965664Z // end inline asm 2026-02-21T09:08:09.0965779Z // begin inline asm 2026-02-21T09:08:09.0966013Z st.global.v4.b32 [ %rd127 + 0 ], { %r1926, %r1927, %r1928, %r1929 }; 2026-02-21T09:08:09.0966123Z // end inline asm 2026-02-21T09:08:09.0966239Z // begin inline asm 2026-02-21T09:08:09.0966459Z st.global.v4.b32 [ %rd128 + 0 ], { %r1930, %r1931, %r1932, %r1933 }; 2026-02-21T09:08:09.0966576Z // end inline asm 2026-02-21T09:08:09.0966685Z // begin inline asm 2026-02-21T09:08:09.0966905Z st.global.v4.b32 [ %rd129 + 0 ], { %r1934, %r1935, %r1936, %r1937 }; 2026-02-21T09:08:09.0967025Z // end inline asm 2026-02-21T09:08:09.0967137Z // begin inline asm 2026-02-21T09:08:09.0967364Z st.global.v4.b32 [ %rd130 + 0 ], { %r1938, %r1939, %r1940, %r1941 }; 2026-02-21T09:08:09.0967472Z // end inline asm 2026-02-21T09:08:09.0967596Z // begin inline asm 2026-02-21T09:08:09.0967818Z st.global.v4.b32 [ %rd131 + 0 ], { %r1942, %r1943, %r1944, %r1945 }; 2026-02-21T09:08:09.0967924Z // end inline asm 2026-02-21T09:08:09.0968105Z $L__BB0_8: // %._crit_edge 2026-02-21T09:08:09.0968527Z .loc 1 31 4 // crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py:31:4 
2026-02-21T09:08:09.0968642Z bar.sync 0; 2026-02-21T09:08:09.0968766Z // begin inline asm 2026-02-21T09:08:09.0969041Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2843, 512; 2026-02-21T09:08:09.0969149Z // end inline asm 2026-02-21T09:08:09.0969249Z ret; 2026-02-21T09:08:09.0969367Z $L__tmp1: 2026-02-21T09:08:09.0969476Z $L__func_end0: 2026-02-21T09:08:09.0969653Z // -- End function 2026-02-21T09:08:09.0969770Z } 2026-02-21T09:08:09.0970254Z .file 1 "/tmp/torchinductor_root/rq/crqol364wqpplmioxbmwhptyl7gcyfqrtrehpewzjgcxvcr54wh4.py" 2026-02-21T09:08:09.0970384Z .section .debug_abbrev 2026-02-21T09:08:09.0970483Z { 2026-02-21T09:08:09.0970673Z .b8 1 // Abbreviation Code 2026-02-21T09:08:09.0970859Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:08:09.0971029Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:09.0971206Z .b8 37 // DW_AT_producer 2026-02-21T09:08:09.0971455Z .b8 8 // DW_FORM_string 2026-02-21T09:08:09.0971613Z .b8 19 // DW_AT_language 2026-02-21T09:08:09.0971790Z .b8 5 // DW_FORM_data2 2026-02-21T09:08:09.0972003Z .b8 3 // DW_AT_name 2026-02-21T09:08:09.0972162Z .b8 8 // DW_FORM_string 2026-02-21T09:08:09.0972332Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:08:09.0972504Z .b8 6 // DW_FORM_data4 2026-02-21T09:08:09.0972664Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:08:09.0972818Z .b8 8 // DW_FORM_string 2026-02-21T09:08:09.0972973Z .b8 0 // EOM(1) 2026-02-21T09:08:09.0973112Z .b8 0 // EOM(2) 2026-02-21T09:08:09.0973322Z .b8 0 // EOM(3) 2026-02-21T09:08:09.0973439Z } 2026-02-21T09:08:09.0973565Z .section .debug_info 2026-02-21T09:08:09.0973663Z { 2026-02-21T09:08:09.0973837Z .b32 104 // Length of Unit 2026-02-21T09:08:09.0974037Z .b8 2 // DWARF version number 2026-02-21T09:08:09.0974138Z .b8 0 2026-02-21T09:08:09.0974382Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:08:09.0974617Z .b8 8 // Address Size (in bytes) 2026-02-21T09:08:09.0974914Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:08:09.0975086Z .b8 116 // DW_AT_producer 2026-02-21T09:08:09.0975202Z .b8 114 2026-02-21T09:08:09.0975302Z .b8 105 2026-02-21T09:08:09.0975398Z .b8 116 2026-02-21T09:08:09.0975494Z .b8 111 2026-02-21T09:08:09.0975602Z .b8 110 2026-02-21T09:08:09.0975696Z .b8 0 2026-02-21T09:08:09.0975851Z .b8 2 // DW_AT_language 2026-02-21T09:08:09.0975958Z .b8 0 2026-02-21T09:08:09.0976112Z .b8 99 // DW_AT_name 2026-02-21T09:08:09.0976212Z .b8 114 2026-02-21T09:08:09.0976309Z .b8 113 2026-02-21T09:08:09.0976415Z .b8 111 2026-02-21T09:08:09.0976512Z .b8 108 2026-02-21T09:08:09.0976610Z .b8 51 2026-02-21T09:08:09.0976707Z .b8 54 2026-02-21T09:08:09.0976809Z .b8 52 2026-02-21T09:08:09.0976903Z .b8 119 2026-02-21T09:08:09.0977002Z .b8 113 2026-02-21T09:08:09.0977108Z .b8 112 2026-02-21T09:08:09.0977228Z .b8 112 2026-02-21T09:08:09.0977348Z .b8 108 2026-02-21T09:08:09.0977469Z .b8 109 2026-02-21T09:08:09.0977601Z .b8 105 2026-02-21T09:08:09.0977777Z .b8 111 2026-02-21T09:08:09.0977899Z .b8 120 2026-02-21T09:08:09.0978033Z .b8 98 2026-02-21T09:08:09.0978155Z .b8 109 2026-02-21T09:08:09.0978279Z .b8 119 2026-02-21T09:08:09.0978386Z .b8 104 2026-02-21T09:08:09.0978492Z .b8 112 2026-02-21T09:08:09.0978588Z .b8 116 2026-02-21T09:08:09.0978684Z .b8 121 2026-02-21T09:08:09.0978795Z .b8 108 2026-02-21T09:08:09.0978886Z .b8 55 2026-02-21T09:08:09.0978979Z .b8 103 2026-02-21T09:08:09.0979073Z .b8 99 2026-02-21T09:08:09.0979176Z .b8 121 2026-02-21T09:08:09.0979272Z .b8 102 2026-02-21T09:08:09.0979366Z .b8 113 2026-02-21T09:08:09.0979459Z .b8 114 2026-02-21T09:08:09.0979565Z .b8 116 2026-02-21T09:08:09.0979660Z .b8 114 2026-02-21T09:08:09.0979763Z .b8 101 2026-02-21T09:08:09.0979867Z .b8 104 2026-02-21T09:08:09.0979959Z .b8 112 2026-02-21T09:08:09.0980053Z .b8 101 2026-02-21T09:08:09.0980146Z .b8 119 2026-02-21T09:08:09.0980249Z .b8 122 
2026-02-21T09:08:09.0980340Z .b8 106 2026-02-21T09:08:09.0980432Z .b8 103 2026-02-21T09:08:09.0980530Z .b8 99 2026-02-21T09:08:09.0980623Z .b8 120 2026-02-21T09:08:09.0980713Z .b8 118 2026-02-21T09:08:09.0980803Z .b8 99 2026-02-21T09:08:09.0980907Z .b8 114 2026-02-21T09:08:09.0980999Z .b8 53 2026-02-21T09:08:09.0981090Z .b8 52 2026-02-21T09:08:09.0981181Z .b8 119 2026-02-21T09:08:09.0981283Z .b8 104 2026-02-21T09:08:09.0981376Z .b8 52 2026-02-21T09:08:09.0981564Z .b8 46 2026-02-21T09:08:09.0981671Z .b8 112 2026-02-21T09:08:09.0981768Z .b8 121 2026-02-21T09:08:09.0981862Z .b8 0 2026-02-21T09:08:09.0982053Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:08:09.0982213Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:08:09.0982362Z .b8 116 2026-02-21T09:08:09.0982457Z .b8 109 2026-02-21T09:08:09.0982558Z .b8 112 2026-02-21T09:08:09.0982654Z .b8 47 2026-02-21T09:08:09.0982747Z .b8 116 2026-02-21T09:08:09.0982838Z .b8 111 2026-02-21T09:08:09.0982943Z .b8 114 2026-02-21T09:08:09.0983035Z .b8 99 2026-02-21T09:08:09.0983131Z .b8 104 2026-02-21T09:08:09.0983235Z .b8 105 2026-02-21T09:08:09.0983326Z .b8 110 2026-02-21T09:08:09.0983416Z .b8 100 2026-02-21T09:08:09.0983509Z .b8 117 2026-02-21T09:08:09.0983609Z .b8 99 2026-02-21T09:08:09.0983700Z .b8 116 2026-02-21T09:08:09.0983792Z .b8 111 2026-02-21T09:08:09.0983896Z .b8 114 2026-02-21T09:08:09.0983987Z .b8 95 2026-02-21T09:08:09.0984158Z .b8 114 2026-02-21T09:08:09.0984257Z .b8 111 2026-02-21T09:08:09.0984370Z .b8 111 2026-02-21T09:08:09.0984470Z .b8 116 2026-02-21T09:08:09.0984570Z .b8 47 2026-02-21T09:08:09.0984664Z .b8 114 2026-02-21T09:08:09.0984836Z .b8 113 2026-02-21T09:08:09.0984933Z .b8 0 2026-02-21T09:08:09.0985028Z } 2026-02-21T09:08:09.0985183Z .section .debug_macinfo { } 2026-02-21T09:08:09.0985200Z 2026-02-21T09:08:09.0985357Z ================================================================ 2026-02-21T09:08:09.0985628Z please share the reproducer above with Triton project. 
2026-02-21T09:08:10.4165860Z 2026-02-21T09:08:10.4165883Z 2026-02-21T09:08:10.4165892Z 2026-02-21T09:08:10.4166445Z ================================================================ 2026-02-21T09:08:10.4167018Z Internal Triton PTX codegen error 2026-02-21T09:08:10.4167403Z `ptxas` stderr: 2026-02-21T09:08:10.4168443Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 210 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:10.4169597Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:10.4169938Z 2026-02-21T09:08:10.4170566Z [190s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:08:10.4173501Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:08:10.4177043Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:08:10.4177596Z `ptxas` stderr: 2026-02-21T09:08:10.4178593Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 210 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:08:10.4179799Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:10.4180178Z 2026-02-21T09:08:10.4181232Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyahdi95x.ptx -o /tmp/tmpyahdi95x.ptx.o 2026-02-21T09:08:10.4182408Z 2026-02-21T09:08:10.4182735Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:08:10.4184107Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpyahdi95x.ptx -o /tmp/tmpyahdi95x.ptx.o 2026-02-21T09:08:10.4185320Z 2026-02-21T09:08:10.4185327Z 2026-02-21T09:08:10.4185441Z // 2026-02-21T09:08:10.4185747Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:08:10.4186121Z // 2026-02-21T09:08:10.4186263Z 2026-02-21T09:08:10.4186391Z .version 8.7 2026-02-21T09:08:10.4186683Z .target sm_100a 2026-02-21T09:08:10.4187430Z .address_size 64 2026-02-21T09:08:10.4187611Z 2026-02-21T09:08:10.4187884Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:08:10.4188483Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:08:10.4188951Z // @_helion_matmul 2026-02-21T09:08:10.4189567Z .visible .entry _helion_matmul( 2026-02-21T09:08:10.4190067Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:08:10.4190679Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:08:10.4191293Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:08:10.4191887Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:08:10.4192494Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:08:10.4192952Z ) 2026-02-21T09:08:10.4193209Z .reqntid 384 2026-02-21T09:08:10.4193478Z .maxnreg 32 2026-02-21T09:08:10.4193742Z { 2026-02-21T09:08:10.4194145Z .reg .pred %p<115>; 2026-02-21T09:08:10.4194466Z .reg .b32 %r<973>; 
2026-02-21T09:08:10.4194845Z .reg .b64 %rd<337>; 2026-02-21T09:08:10.4195458Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19:0 2026-02-21T09:08:10.4196158Z $L__func_begin0: 2026-02-21T09:08:10.4196741Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19:0 2026-02-21T09:08:10.4197320Z 2026-02-21T09:08:10.4197429Z // %bb.0: 2026-02-21T09:08:10.4197861Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:08:10.4198292Z $L__tmp0: 2026-02-21T09:08:10.4198839Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19 2026-02-21T09:08:10.4199513Z mov.u32 %r1, %tid.x; 2026-02-21T09:08:10.4199841Z shr.u32 %r2, %r1, 5; 2026-02-21T09:08:10.4200183Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:08:10.4200610Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:08:10.4200957Z @%p3 bra $L__BB0_16; 2026-02-21T09:08:10.4201279Z bra.uni $L__BB0_1; 2026-02-21T09:08:10.4201571Z $L__BB0_16: 2026-02-21T09:08:10.4202136Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:0 2026-02-21T09:08:10.4202887Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:08:10.4203370Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:08:10.4204080Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19 2026-02-21T09:08:10.4204909Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:08:10.4205356Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:08:10.4205720Z mov.b32 %r174, global_smem; 2026-02-21T09:08:10.4206078Z // begin inline asm 2026-02-21T09:08:10.4206629Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r174], 256; 2026-02-21T09:08:10.4207225Z // end inline asm 2026-02-21T09:08:10.4207538Z bar.sync 0, 256; 2026-02-21T09:08:10.4207863Z ld.shared.b32 %r944, [global_smem]; 2026-02-21T09:08:10.4208262Z bar.sync 0, 256; 2026-02-21T09:08:10.4208561Z // begin inline asm 2026-02-21T09:08:10.4209034Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 
2026-02-21T09:08:10.4209558Z // end inline asm 2026-02-21T09:08:10.4210152Z .loc 1 21 67 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:21:67 2026-02-21T09:08:10.4210859Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:08:10.4211197Z mov.u32 %r343, %ctaid.y; 2026-02-21T09:08:10.4211544Z mov.u32 %r344, %ctaid.z; 2026-02-21T09:08:10.4211886Z mov.u32 %r345, %nctaid.x; 2026-02-21T09:08:10.4212241Z mov.u32 %r346, %nctaid.y; 2026-02-21T09:08:10.4212595Z mad.lo.s32 %r347, %r344, %r346, %r343; 2026-02-21T09:08:10.4213007Z mad.lo.s32 %r348, %r347, %r345, %r41; 2026-02-21T09:08:10.4213394Z shl.b32 %r349, %r348, 8; 2026-02-21T09:08:10.4213737Z cvt.s64.s32 %rd64, %r349; 2026-02-21T09:08:10.4214081Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T09:08:10.4214442Z shl.b32 %r350, %r1, 2; 2026-02-21T09:08:10.4214858Z add.s32 %r175, %r174, %r350; 2026-02-21T09:08:10.4215306Z mov.b32 %r972, 0; 2026-02-21T09:08:10.4215615Z // begin inline asm 2026-02-21T09:08:10.4215959Z @%p33 st.shared.b32 [ %r175 + 0 ], %r972; 2026-02-21T09:08:10.4216371Z // end inline asm 2026-02-21T09:08:10.4216677Z bar.warp.sync -1; 2026-02-21T09:08:10.4217075Z setp.eq.b32 %p103, %r1, 0; 2026-02-21T09:08:10.4217427Z cvt.u64.u32 %rd28, %r174; 2026-02-21T09:08:10.4217772Z // begin inline asm 2026-02-21T09:08:10.4218375Z @%p103 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T09:08:10.4219046Z // end inline asm 2026-02-21T09:08:10.4219349Z // begin inline asm 2026-02-21T09:08:10.4219877Z @%p103 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:08:10.4220494Z // end inline asm 2026-02-21T09:08:10.4220782Z mov.b32 %r177, 32; 2026-02-21T09:08:10.4221089Z // begin inline asm 2026-02-21T09:08:10.4221706Z @%p103 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r177; 2026-02-21T09:08:10.4222374Z // end inline asm 2026-02-21T09:08:10.4222675Z mov.b32 %r178, 256; 2026-02-21T09:08:10.4222984Z // begin inline asm 
2026-02-21T09:08:10.4223552Z @%p103 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r178; 2026-02-21T09:08:10.4224200Z // end inline asm 2026-02-21T09:08:10.4224504Z mov.b32 %r179, 2048; 2026-02-21T09:08:10.4224907Z // begin inline asm 2026-02-21T09:08:10.4225563Z @%p103 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r179; 2026-02-21T09:08:10.4226239Z // end inline asm 2026-02-21T09:08:10.4226540Z mov.b32 %r180, 4096; 2026-02-21T09:08:10.4226860Z // begin inline asm 2026-02-21T09:08:10.4227439Z @%p103 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r180; 2026-02-21T09:08:10.4228118Z // end inline asm 2026-02-21T09:08:10.4228419Z mov.b64 %rd36, 4096; 2026-02-21T09:08:10.4228745Z // begin inline asm 2026-02-21T09:08:10.4229352Z @%p103 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:08:10.4230056Z // end inline asm 2026-02-21T09:08:10.4230352Z mov.b32 %r181, 1; 2026-02-21T09:08:10.4230644Z // begin inline asm 2026-02-21T09:08:10.4231265Z @%p103 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r181; 2026-02-21T09:08:10.4231967Z // end inline asm 2026-02-21T09:08:10.4232266Z // begin inline asm 2026-02-21T09:08:10.4232873Z @%p103 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r181; 2026-02-21T09:08:10.4233580Z // end inline asm 2026-02-21T09:08:10.4233869Z // begin inline asm 2026-02-21T09:08:10.4234423Z @%p103 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:08:10.4235210Z // end inline asm 2026-02-21T09:08:10.4235601Z // begin inline asm 2026-02-21T09:08:10.4236416Z @%p103 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:08:10.4237274Z // end inline asm 2026-02-21T09:08:10.4237599Z // begin inline asm 2026-02-21T09:08:10.4238152Z @%p103 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:08:10.4238847Z // end inline asm 2026-02-21T09:08:10.4239193Z // begin inline asm 2026-02-21T09:08:10.4239793Z @%p103 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:08:10.4240457Z // end inline asm 2026-02-21T09:08:10.4240746Z // begin inline asm 2026-02-21T09:08:10.4241555Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:08:10.4242439Z // end inline asm 2026-02-21T09:08:10.4242732Z // begin inline asm 2026-02-21T09:08:10.4243208Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:08:10.4243785Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:08:10.4244216Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:10.4244617Z // end inline asm 2026-02-21T09:08:10.4245076Z bar.sync 0, 256; 2026-02-21T09:08:10.4245651Z .loc 1 22 67 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:22:67 2026-02-21T09:08:10.4246353Z add.s64 %rd61, %rd43, 128; 2026-02-21T09:08:10.4246681Z bar.sync 0, 256; 2026-02-21T09:08:10.4247043Z // begin inline asm 2026-02-21T09:08:10.4247375Z @%p33 st.shared.b32 [ %r175 + 0 ], %r972; 2026-02-21T09:08:10.4247762Z // end inline asm 2026-02-21T09:08:10.4248070Z bar.warp.sync -1; 2026-02-21T09:08:10.4248372Z // begin inline asm 2026-02-21T09:08:10.4248948Z @%p103 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T09:08:10.4249592Z // end inline asm 2026-02-21T09:08:10.4249882Z // begin inline asm 2026-02-21T09:08:10.4250386Z @%p103 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:08:10.4250978Z // end inline asm 2026-02-21T09:08:10.4251271Z // begin inline asm 2026-02-21T09:08:10.4251871Z @%p103 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r177; 2026-02-21T09:08:10.4252508Z // end 
inline asm 2026-02-21T09:08:10.4252793Z mov.b32 %r186, 128; 2026-02-21T09:08:10.4253098Z // begin inline asm 2026-02-21T09:08:10.4253627Z @%p103 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r186; 2026-02-21T09:08:10.4254271Z // end inline asm 2026-02-21T09:08:10.4254553Z // begin inline asm 2026-02-21T09:08:10.4255261Z @%p103 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r179; 2026-02-21T09:08:10.4255916Z // end inline asm 2026-02-21T09:08:10.4256195Z // begin inline asm 2026-02-21T09:08:10.4256754Z @%p103 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r179; 2026-02-21T09:08:10.4257394Z // end inline asm 2026-02-21T09:08:10.4257688Z // begin inline asm 2026-02-21T09:08:10.4258269Z @%p103 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:08:10.4258940Z // end inline asm 2026-02-21T09:08:10.4259232Z // begin inline asm 2026-02-21T09:08:10.4259818Z @%p103 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r181; 2026-02-21T09:08:10.4260495Z // end inline asm 2026-02-21T09:08:10.4260776Z // begin inline asm 2026-02-21T09:08:10.4261366Z @%p103 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r181; 2026-02-21T09:08:10.4262027Z // end inline asm 2026-02-21T09:08:10.4262316Z // begin inline asm 2026-02-21T09:08:10.4262857Z @%p103 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:08:10.4263463Z // end inline asm 2026-02-21T09:08:10.4263750Z // begin inline asm 2026-02-21T09:08:10.4264317Z @%p103 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:08:10.4265063Z // end inline asm 2026-02-21T09:08:10.4265343Z // begin inline asm 2026-02-21T09:08:10.4265900Z @%p103 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:08:10.4266537Z // end inline asm 
2026-02-21T09:08:10.4266817Z // begin inline asm 2026-02-21T09:08:10.4267346Z @%p103 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:08:10.4267947Z // end inline asm 2026-02-21T09:08:10.4268242Z // begin inline asm 2026-02-21T09:08:10.4269046Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:08:10.4269945Z // end inline asm 2026-02-21T09:08:10.4270225Z // begin inline asm 2026-02-21T09:08:10.4270694Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:08:10.4271273Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:08:10.4271690Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:10.4272081Z // end inline asm 2026-02-21T09:08:10.4272350Z bar.sync 0, 256; 2026-02-21T09:08:10.4272936Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4273715Z sub.s32 %r353, 256, %r41; 2026-02-21T09:08:10.4274065Z mul.hi.s32 %r354, %r353, -580400985; 2026-02-21T09:08:10.4274453Z add.s32 %r355, %r354, %r353; 2026-02-21T09:08:10.4274892Z shr.u32 %r356, %r355, 31; 2026-02-21T09:08:10.4275287Z shr.s32 %r357, %r355, 11; 2026-02-21T09:08:10.4275607Z add.s32 %r358, %r357, %r356; 2026-02-21T09:08:10.4275950Z mul.lo.s32 %r359, %r358, 2368; 2026-02-21T09:08:10.4276307Z setp.ne.b32 %p94, %r353, %r359; 2026-02-21T09:08:10.4276677Z setp.lt.u32 %p95, %r41, 257; 2026-02-21T09:08:10.4277057Z and.pred %p96, %p95, %p94; 2026-02-21T09:08:10.4277406Z selp.b32 %r360, 1, 0, %p96; 2026-02-21T09:08:10.4277735Z add.s32 %r361, %r358, %r360; 2026-02-21T09:08:10.4278065Z shl.b32 %r59, %r361, 6; 2026-02-21T09:08:10.4278653Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4279434Z shfl.sync.idx.b32 %r362, %r2, 0, 31, -1; 2026-02-21T09:08:10.4279839Z shl.b32 %r363, %r362, 21; 2026-02-21T09:08:10.4280157Z and.b32 %r364, %r363, 6291456; 
2026-02-21T09:08:10.4280494Z add.s32 %r365, %r364, %r944; 2026-02-21T09:08:10.4280871Z shl.b32 %r366, %r362, 5; 2026-02-21T09:08:10.4281303Z and.b32 %r367, %r366, 128; 2026-02-21T09:08:10.4281677Z add.s32 %r191, %r365, %r367; 2026-02-21T09:08:10.4282018Z mov.pred %p71, -1; 2026-02-21T09:08:10.4282320Z // begin inline asm 2026-02-21T09:08:10.4283200Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 0], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4284082Z // end inline asm 2026-02-21T09:08:10.4284358Z // begin inline asm 2026-02-21T09:08:10.4285289Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 16], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4286189Z // end inline asm 2026-02-21T09:08:10.4286485Z // begin inline asm 2026-02-21T09:08:10.4287330Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 32], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4288235Z // end inline asm 2026-02-21T09:08:10.4288535Z // begin inline asm 2026-02-21T09:08:10.4289369Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 48], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4290276Z // end inline asm 2026-02-21T09:08:10.4290560Z // begin inline asm 2026-02-21T09:08:10.4291399Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 64], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4292302Z // end inline asm 2026-02-21T09:08:10.4292585Z // begin inline asm 2026-02-21T09:08:10.4293428Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 80], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, 
%r972, %r972}; 2026-02-21T09:08:10.4294326Z // end inline asm 2026-02-21T09:08:10.4294621Z // begin inline asm 2026-02-21T09:08:10.4295541Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 96], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4296456Z // end inline asm 2026-02-21T09:08:10.4296752Z // begin inline asm 2026-02-21T09:08:10.4297601Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r191 + 112], {%r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972, %r972}; 2026-02-21T09:08:10.4298525Z // end inline asm 2026-02-21T09:08:10.4298819Z // begin inline asm 2026-02-21T09:08:10.4299159Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:08:10.4299523Z // end inline asm 2026-02-21T09:08:10.4299800Z bar.sync 0, 256; 2026-02-21T09:08:10.4300412Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4301242Z add.s32 %r327, %r174, 114688; 2026-02-21T09:08:10.4301597Z // begin inline asm 2026-02-21T09:08:10.4301964Z @%p103 mbarrier.init.shared::cta.b64 [%r327], 1; 2026-02-21T09:08:10.4302469Z // end inline asm 2026-02-21T09:08:10.4302745Z bar.sync 0, 256; 2026-02-21T09:08:10.4303053Z add.s32 %r328, %r174, 114696; 2026-02-21T09:08:10.4303393Z // begin inline asm 2026-02-21T09:08:10.4303766Z @%p103 mbarrier.init.shared::cta.b64 [%r328], 1; 2026-02-21T09:08:10.4304202Z // end inline asm 2026-02-21T09:08:10.4304478Z bar.sync 0, 256; 2026-02-21T09:08:10.4304882Z add.s32 %r329, %r174, 114704; 2026-02-21T09:08:10.4305209Z // begin inline asm 2026-02-21T09:08:10.4305574Z @%p103 mbarrier.init.shared::cta.b64 [%r329], 1; 2026-02-21T09:08:10.4305986Z // end inline asm 2026-02-21T09:08:10.4306278Z bar.sync 0, 256; 2026-02-21T09:08:10.4306644Z add.s32 %r330, %r174, 114712; 2026-02-21T09:08:10.4306995Z // begin inline asm 2026-02-21T09:08:10.4307372Z @%p103 mbarrier.init.shared::cta.b64 [%r330], 1; 
2026-02-21T09:08:10.4307787Z // end inline asm 2026-02-21T09:08:10.4308094Z add.s32 %r331, %r174, 114720; 2026-02-21T09:08:10.4308432Z // begin inline asm 2026-02-21T09:08:10.4308793Z @%p103 mbarrier.init.shared::cta.b64 [%r331], 1; 2026-02-21T09:08:10.4309211Z // end inline asm 2026-02-21T09:08:10.4309493Z bar.sync 0, 256; 2026-02-21T09:08:10.4309777Z add.s32 %r332, %r174, 114728; 2026-02-21T09:08:10.4310176Z // begin inline asm 2026-02-21T09:08:10.4310543Z @%p103 mbarrier.init.shared::cta.b64 [%r332], 1; 2026-02-21T09:08:10.4310956Z // end inline asm 2026-02-21T09:08:10.4311245Z bar.sync 0, 256; 2026-02-21T09:08:10.4311529Z add.s32 %r333, %r174, 114736; 2026-02-21T09:08:10.4311868Z // begin inline asm 2026-02-21T09:08:10.4312174Z @%p103 mbarrier.init.shared::cta.b64 [%r333], 1; 2026-02-21T09:08:10.4312513Z // end inline asm 2026-02-21T09:08:10.4312747Z bar.sync 0, 256; 2026-02-21T09:08:10.4313021Z add.s32 %r334, %r174, 114744; 2026-02-21T09:08:10.4313336Z // begin inline asm 2026-02-21T09:08:10.4313695Z @%p103 mbarrier.init.shared::cta.b64 [%r334], 1; 2026-02-21T09:08:10.4314110Z // end inline asm 2026-02-21T09:08:10.4314667Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4315410Z bar.sync 0, 256; 2026-02-21T09:08:10.4315690Z // begin inline asm 2026-02-21T09:08:10.4316065Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r327]; 2026-02-21T09:08:10.4316503Z // end inline asm 2026-02-21T09:08:10.4316789Z bar.sync 0, 256; 2026-02-21T09:08:10.4317065Z // begin inline asm 2026-02-21T09:08:10.4317442Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r328]; 2026-02-21T09:08:10.4317876Z // end inline asm 2026-02-21T09:08:10.4318147Z bar.sync 0, 256; 2026-02-21T09:08:10.4318438Z // begin inline asm 2026-02-21T09:08:10.4318791Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r329]; 2026-02-21T09:08:10.4319225Z // end inline asm 2026-02-21T09:08:10.4319499Z bar.sync 0, 256; 2026-02-21T09:08:10.4319786Z // begin inline asm 
2026-02-21T09:08:10.4320150Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r330]; 2026-02-21T09:08:10.4320578Z // end inline asm 2026-02-21T09:08:10.4321169Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4321839Z bar.sync 0, 256; 2026-02-21T09:08:10.4322144Z add.s32 %r339, %r174, 114752; 2026-02-21T09:08:10.4322481Z // begin inline asm 2026-02-21T09:08:10.4322850Z @%p103 mbarrier.init.shared::cta.b64 [%r339], 1; 2026-02-21T09:08:10.4323264Z // end inline asm 2026-02-21T09:08:10.4323559Z add.s32 %r932, %r174, 114768; 2026-02-21T09:08:10.4323887Z // begin inline asm 2026-02-21T09:08:10.4324257Z @%p103 mbarrier.init.shared::cta.b64 [%r932], 1; 2026-02-21T09:08:10.4324814Z // end inline asm 2026-02-21T09:08:10.4325377Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4326172Z bar.sync 0, 256; 2026-02-21T09:08:10.4326460Z // begin inline asm 2026-02-21T09:08:10.4326839Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r932]; 2026-02-21T09:08:10.4327271Z // end inline asm 2026-02-21T09:08:10.4327854Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4328669Z st.shared.b32 [global_smem+114776], 33554689; 2026-02-21T09:08:10.4329138Z st.shared.b32 [global_smem+98304], %r944; 2026-02-21T09:08:10.4329591Z st.shared.b32 [global_smem+98312], %r59; 2026-02-21T09:08:10.4329990Z barrier.sync 1; 2026-02-21T09:08:10.4330340Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:08:10.4330728Z barrier.sync 1; 2026-02-21T09:08:10.4331034Z setp.lt.s32 %p97, %r361, 1; 2026-02-21T09:08:10.4331379Z @%p97 bra $L__BB0_23; 2026-02-21T09:08:10.4331746Z // %bb.17: // %.lr.ph10 2026-02-21T09:08:10.4332524Z .loc 1 0 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:130 2026-02-21T09:08:10.4333246Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:08:10.4333655Z shl.b32 %r351, %r1, 3; 2026-02-21T09:08:10.4333975Z and.b32 %r42, 
%r351, 120; 2026-02-21T09:08:10.4334316Z shr.u32 %r352, %r1, 4; 2026-02-21T09:08:10.4334643Z bfe.u32 %r43, %r1, 4, 4; 2026-02-21T09:08:10.4335045Z or.b32 %r44, %r43, 16; 2026-02-21T09:08:10.4335354Z or.b32 %r45, %r43, 32; 2026-02-21T09:08:10.4335672Z or.b32 %r46, %r352, 48; 2026-02-21T09:08:10.4336058Z or.b32 %r47, %r43, 64; 2026-02-21T09:08:10.4336375Z or.b32 %r48, %r43, 80; 2026-02-21T09:08:10.4336684Z or.b32 %r49, %r43, 96; 2026-02-21T09:08:10.4337002Z or.b32 %r50, %r352, 112; 2026-02-21T09:08:10.4337339Z or.b32 %r51, %r43, 128; 2026-02-21T09:08:10.4337655Z or.b32 %r52, %r43, 144; 2026-02-21T09:08:10.4337984Z or.b32 %r53, %r43, 160; 2026-02-21T09:08:10.4338368Z or.b32 %r54, %r352, 176; 2026-02-21T09:08:10.4338778Z or.b32 %r55, %r43, 192; 2026-02-21T09:08:10.4339126Z or.b32 %r56, %r43, 208; 2026-02-21T09:08:10.4339440Z or.b32 %r57, %r43, 224; 2026-02-21T09:08:10.4339756Z or.b32 %r58, %r352, 240; 2026-02-21T09:08:10.4340417Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4341120Z add.s32 %r969, %r41, -2368; 2026-02-21T09:08:10.4341456Z shl.b32 %r370, %r1, 11; 2026-02-21T09:08:10.4341808Z and.b32 %r371, %r370, 12288; 2026-02-21T09:08:10.4342124Z shl.b32 %r372, %r1, 4; 2026-02-21T09:08:10.4342436Z and.b32 %r373, %r372, 4080; 2026-02-21T09:08:10.4342759Z or.b32 %r374, %r371, %r373; 2026-02-21T09:08:10.4343070Z add.s32 %r376, %r174, 98304; 2026-02-21T09:08:10.4343345Z add.s32 %r62, %r376, %r374; 2026-02-21T09:08:10.4343659Z xor.b32 %r377, %r374, 32; 2026-02-21T09:08:10.4343969Z add.s32 %r63, %r376, %r377; 2026-02-21T09:08:10.4344289Z xor.b32 %r378, %r374, 64; 2026-02-21T09:08:10.4344612Z add.s32 %r64, %r376, %r378; 2026-02-21T09:08:10.4345012Z xor.b32 %r379, %r374, 96; 2026-02-21T09:08:10.4345329Z add.s32 %r65, %r376, %r379; 2026-02-21T09:08:10.4345638Z shl.b32 %r380, %r1, 7; 2026-02-21T09:08:10.4345944Z and.b32 %r381, %r380, 12288; 2026-02-21T09:08:10.4346254Z shl.b32 %r382, %r1, 5; 2026-02-21T09:08:10.4346565Z 
and.b32 %r383, %r382, 864; 2026-02-21T09:08:10.4346882Z and.b32 %r384, %r1, 224; 2026-02-21T09:08:10.4347196Z and.b32 %r386, %r350, 16; 2026-02-21T09:08:10.4347500Z or.b32 %r387, %r381, %r383; 2026-02-21T09:08:10.4347821Z xor.b32 %r388, %r387, %r384; 2026-02-21T09:08:10.4348149Z add.s32 %r389, %r376, %r386; 2026-02-21T09:08:10.4348468Z add.s32 %r535, %r389, %r388; 2026-02-21T09:08:10.4348790Z add.s32 %r540, %r535, 1024; 2026-02-21T09:08:10.4349095Z add.s32 %r545, %r535, 2048; 2026-02-21T09:08:10.4349406Z add.s32 %r550, %r535, 3072; 2026-02-21T09:08:10.4349715Z max.s32 %r962, %r59, 1; 2026-02-21T09:08:10.4350021Z mov.b32 %r967, -1; 2026-02-21T09:08:10.4350298Z mov.b32 %r970, %r972; 2026-02-21T09:08:10.4350607Z mov.b32 %r971, %r972; 2026-02-21T09:08:10.4351031Z bra.uni $L__BB0_18; 2026-02-21T09:08:10.4351431Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:08:10.4352187Z .loc 1 40 32 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:40:32 2026-02-21T09:08:10.4352898Z or.b32 %r676, %r971, %r42; 2026-02-21T09:08:10.4353497Z .loc 1 42 32 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:42:32 2026-02-21T09:08:10.4354131Z add.s32 %r677, %r970, %r43; 2026-02-21T09:08:10.4354464Z add.s32 %r678, %r970, %r44; 2026-02-21T09:08:10.4354859Z add.s32 %r679, %r970, %r45; 2026-02-21T09:08:10.4355187Z add.s32 %r680, %r970, %r46; 2026-02-21T09:08:10.4355514Z add.s32 %r681, %r970, %r47; 2026-02-21T09:08:10.4355833Z add.s32 %r682, %r970, %r48; 2026-02-21T09:08:10.4356159Z add.s32 %r683, %r970, %r49; 2026-02-21T09:08:10.4356483Z add.s32 %r684, %r970, %r50; 2026-02-21T09:08:10.4356889Z add.s32 %r685, %r970, %r51; 2026-02-21T09:08:10.4357213Z add.s32 %r686, %r970, %r52; 2026-02-21T09:08:10.4357547Z add.s32 %r687, %r970, %r53; 2026-02-21T09:08:10.4357853Z add.s32 %r688, %r970, %r54; 2026-02-21T09:08:10.4358148Z add.s32 %r689, %r970, %r55; 2026-02-21T09:08:10.4358460Z add.s32 %r690, %r970, %r56; 2026-02-21T09:08:10.4358758Z add.s32 %r691, %r970, %r57; 
2026-02-21T09:08:10.4359072Z add.s32 %r692, %r970, %r58; 2026-02-21T09:08:10.4359679Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4360325Z bar.sync 0, 256; 2026-02-21T09:08:10.4360612Z // begin inline asm 2026-02-21T09:08:10.4360930Z 2026-02-21T09:08:10.4361159Z { 2026-02-21T09:08:10.4361421Z .reg .pred complete; 2026-02-21T09:08:10.4361716Z waitLoop: 2026-02-21T09:08:10.4362132Z mbarrier.try_wait.parity.shared.b64 complete, [%r339], %r972; 2026-02-21T09:08:10.4362676Z @!complete bra.uni waitLoop; 2026-02-21T09:08:10.4363000Z } 2026-02-21T09:08:10.4363140Z 2026-02-21T09:08:10.4363265Z // end inline asm 2026-02-21T09:08:10.4363553Z // begin inline asm 2026-02-21T09:08:10.4364409Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409}, [%r191 + 0]; 2026-02-21T09:08:10.4365450Z // end inline asm 2026-02-21T09:08:10.4365746Z // begin inline asm 2026-02-21T09:08:10.4366596Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426}, [%r191 + 16]; 2026-02-21T09:08:10.4367509Z // end inline asm 2026-02-21T09:08:10.4367804Z // begin inline asm 2026-02-21T09:08:10.4368632Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443}, [%r191 + 32]; 2026-02-21T09:08:10.4369558Z // end inline asm 2026-02-21T09:08:10.4369838Z // begin inline asm 2026-02-21T09:08:10.4370666Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460}, [%r191 + 48]; 2026-02-21T09:08:10.4371587Z // end inline asm 2026-02-21T09:08:10.4371867Z // begin inline asm 2026-02-21T09:08:10.4372691Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r462, %r463, %r464, %r465, %r466, %r467, %r468, 
%r469, %r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477}, [%r191 + 64]; 2026-02-21T09:08:10.4373605Z // end inline asm 2026-02-21T09:08:10.4373894Z // begin inline asm 2026-02-21T09:08:10.4374799Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494}, [%r191 + 80]; 2026-02-21T09:08:10.4375709Z // end inline asm 2026-02-21T09:08:10.4375995Z // begin inline asm 2026-02-21T09:08:10.4376808Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511}, [%r191 + 96]; 2026-02-21T09:08:10.4377826Z // end inline asm 2026-02-21T09:08:10.4378101Z // begin inline asm 2026-02-21T09:08:10.4378924Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528}, [%r191 + 112]; 2026-02-21T09:08:10.4379958Z // end inline asm 2026-02-21T09:08:10.4380238Z // begin inline asm 2026-02-21T09:08:10.4380580Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:08:10.4380928Z // end inline asm 2026-02-21T09:08:10.4381217Z bar.sync 0, 256; 2026-02-21T09:08:10.4381499Z // begin inline asm 2026-02-21T09:08:10.4381877Z @%p103 mbarrier.arrive.shared::cta.b64 _, [%r932]; 2026-02-21T09:08:10.4382318Z // end inline asm 2026-02-21T09:08:10.4382617Z cvt.u64.u32 %rd81, %r394; 2026-02-21T09:08:10.4382950Z cvt.u64.u32 %rd82, %r395; 2026-02-21T09:08:10.4383285Z shl.b64 %rd83, %rd82, 32; 2026-02-21T09:08:10.4383620Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T09:08:10.4384313Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4385091Z mov.b64 {%r694, %r695}, %rd84; 2026-02-21T09:08:10.4385461Z cvt.rn.f16x2.f32 %r696, %r695, %r694; 2026-02-21T09:08:10.4386137Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4386806Z cvt.u64.u32 %rd85, %r396; 
2026-02-21T09:08:10.4387145Z cvt.u64.u32 %rd86, %r397; 2026-02-21T09:08:10.4387472Z shl.b64 %rd87, %rd86, 32; 2026-02-21T09:08:10.4387873Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T09:08:10.4388494Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4389164Z mov.b64 {%r697, %r698}, %rd88; 2026-02-21T09:08:10.4389530Z cvt.rn.f16x2.f32 %r699, %r698, %r697; 2026-02-21T09:08:10.4390175Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4390851Z cvt.u64.u32 %rd89, %r398; 2026-02-21T09:08:10.4391173Z cvt.u64.u32 %rd90, %r399; 2026-02-21T09:08:10.4391508Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:08:10.4391839Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:08:10.4392437Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4393117Z mov.b64 {%r700, %r701}, %rd92; 2026-02-21T09:08:10.4393475Z cvt.rn.f16x2.f32 %r702, %r701, %r700; 2026-02-21T09:08:10.4394134Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4394869Z cvt.u64.u32 %rd93, %r400; 2026-02-21T09:08:10.4395202Z cvt.u64.u32 %rd94, %r401; 2026-02-21T09:08:10.4395521Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:08:10.4395851Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:08:10.4396468Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4397274Z mov.b64 {%r703, %r704}, %rd96; 2026-02-21T09:08:10.4397701Z cvt.rn.f16x2.f32 %r705, %r704, %r703; 2026-02-21T09:08:10.4398369Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4399069Z cvt.u64.u32 %rd97, %r402; 2026-02-21T09:08:10.4399403Z cvt.u64.u32 %rd98, %r403; 2026-02-21T09:08:10.4399741Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:08:10.4400085Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:08:10.4400716Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4401423Z 
mov.b64 {%r706, %r707}, %rd100; 2026-02-21T09:08:10.4401800Z cvt.rn.f16x2.f32 %r708, %r707, %r706; 2026-02-21T09:08:10.4402477Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4403167Z cvt.u64.u32 %rd101, %r404; 2026-02-21T09:08:10.4403523Z cvt.u64.u32 %rd102, %r405; 2026-02-21T09:08:10.4403864Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:08:10.4404233Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:08:10.4405008Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4405705Z mov.b64 {%r709, %r710}, %rd104; 2026-02-21T09:08:10.4406092Z cvt.rn.f16x2.f32 %r711, %r710, %r709; 2026-02-21T09:08:10.4406755Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4407530Z cvt.u64.u32 %rd105, %r406; 2026-02-21T09:08:10.4407873Z cvt.u64.u32 %rd106, %r407; 2026-02-21T09:08:10.4408228Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:08:10.4408584Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:08:10.4409246Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4409955Z mov.b64 {%r712, %r713}, %rd108; 2026-02-21T09:08:10.4410332Z cvt.rn.f16x2.f32 %r714, %r713, %r712; 2026-02-21T09:08:10.4411087Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4411786Z cvt.u64.u32 %rd109, %r408; 2026-02-21T09:08:10.4412142Z cvt.u64.u32 %rd110, %r409; 2026-02-21T09:08:10.4412484Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:08:10.4412846Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:08:10.4413506Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4414217Z mov.b64 {%r715, %r716}, %rd112; 2026-02-21T09:08:10.4414606Z cvt.rn.f16x2.f32 %r717, %r716, %r715; 2026-02-21T09:08:10.4415411Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4416117Z cvt.u64.u32 %rd113, %r411; 
2026-02-21T09:08:10.4416457Z cvt.u64.u32 %rd114, %r412; 2026-02-21T09:08:10.4416801Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:08:10.4417154Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:08:10.4417792Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4418507Z mov.b64 {%r718, %r719}, %rd116; 2026-02-21T09:08:10.4418886Z cvt.rn.f16x2.f32 %r720, %r719, %r718; 2026-02-21T09:08:10.4419574Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4420257Z cvt.u64.u32 %rd117, %r413; 2026-02-21T09:08:10.4420626Z cvt.u64.u32 %rd118, %r414; 2026-02-21T09:08:10.4420974Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:08:10.4421319Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:08:10.4421972Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4422665Z mov.b64 {%r721, %r722}, %rd120; 2026-02-21T09:08:10.4423049Z cvt.rn.f16x2.f32 %r723, %r722, %r721; 2026-02-21T09:08:10.4423729Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4424428Z cvt.u64.u32 %rd121, %r415; 2026-02-21T09:08:10.4424844Z cvt.u64.u32 %rd122, %r416; 2026-02-21T09:08:10.4425365Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:08:10.4425736Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:08:10.4426377Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4427087Z mov.b64 {%r724, %r725}, %rd124; 2026-02-21T09:08:10.4427476Z cvt.rn.f16x2.f32 %r726, %r725, %r724; 2026-02-21T09:08:10.4428142Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4428884Z cvt.u64.u32 %rd125, %r417; 2026-02-21T09:08:10.4429228Z cvt.u64.u32 %rd126, %r418; 2026-02-21T09:08:10.4429578Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:08:10.4429923Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:08:10.4430571Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4431257Z mov.b64 {%r727, %r728}, %rd128; 2026-02-21T09:08:10.4431641Z cvt.rn.f16x2.f32 %r729, %r728, %r727; 2026-02-21T09:08:10.4432323Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4433111Z cvt.u64.u32 %rd129, %r419; 2026-02-21T09:08:10.4433467Z cvt.u64.u32 %rd130, %r420; 2026-02-21T09:08:10.4433809Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:08:10.4434231Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:08:10.4434949Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4435649Z mov.b64 {%r730, %r731}, %rd132; 2026-02-21T09:08:10.4436094Z cvt.rn.f16x2.f32 %r732, %r731, %r730; 2026-02-21T09:08:10.4436989Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4437753Z cvt.u64.u32 %rd133, %r421; 2026-02-21T09:08:10.4438134Z cvt.u64.u32 %rd134, %r422; 2026-02-21T09:08:10.4438540Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:08:10.4438972Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:08:10.4439720Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4440441Z mov.b64 {%r733, %r734}, %rd136; 2026-02-21T09:08:10.4440855Z cvt.rn.f16x2.f32 %r735, %r734, %r733; 2026-02-21T09:08:10.4441513Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4442185Z cvt.u64.u32 %rd137, %r423; 2026-02-21T09:08:10.4442522Z cvt.u64.u32 %rd138, %r424; 2026-02-21T09:08:10.4442850Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:08:10.4443249Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:08:10.4443865Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4444535Z mov.b64 {%r736, %r737}, %rd140; 2026-02-21T09:08:10.4444946Z cvt.rn.f16x2.f32 %r738, %r737, %r736; 2026-02-21T09:08:10.4445589Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4446264Z cvt.u64.u32 %rd141, 
%r425; 2026-02-21T09:08:10.4446594Z cvt.u64.u32 %rd142, %r426; 2026-02-21T09:08:10.4446931Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:08:10.4447265Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:08:10.4447895Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4448557Z mov.b64 {%r739, %r740}, %rd144; 2026-02-21T09:08:10.4448924Z cvt.rn.f16x2.f32 %r741, %r740, %r739; 2026-02-21T09:08:10.4449581Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4450240Z cvt.u64.u32 %rd145, %r428; 2026-02-21T09:08:10.4450580Z cvt.u64.u32 %rd146, %r429; 2026-02-21T09:08:10.4450906Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:08:10.4451248Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:08:10.4451853Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4452535Z mov.b64 {%r742, %r743}, %rd148; 2026-02-21T09:08:10.4452901Z cvt.rn.f16x2.f32 %r744, %r743, %r742; 2026-02-21T09:08:10.4453548Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4454219Z cvt.u64.u32 %rd149, %r430; 2026-02-21T09:08:10.4454548Z cvt.u64.u32 %rd150, %r431; 2026-02-21T09:08:10.4454959Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:08:10.4455293Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:08:10.4455917Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4456588Z mov.b64 {%r745, %r746}, %rd152; 2026-02-21T09:08:10.4456957Z cvt.rn.f16x2.f32 %r747, %r746, %r745; 2026-02-21T09:08:10.4457614Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4458277Z cvt.u64.u32 %rd153, %r432; 2026-02-21T09:08:10.4458614Z cvt.u64.u32 %rd154, %r433; 2026-02-21T09:08:10.4474511Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:08:10.4475154Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:08:10.4476020Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4476736Z mov.b64 {%r748, %r749}, %rd156; 2026-02-21T09:08:10.4477120Z cvt.rn.f16x2.f32 %r750, %r749, %r748; 2026-02-21T09:08:10.4477874Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4478570Z cvt.u64.u32 %rd157, %r434; 2026-02-21T09:08:10.4478927Z cvt.u64.u32 %rd158, %r435; 2026-02-21T09:08:10.4479281Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:08:10.4479628Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:08:10.4480264Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4480934Z mov.b64 {%r751, %r752}, %rd160; 2026-02-21T09:08:10.4481319Z cvt.rn.f16x2.f32 %r753, %r752, %r751; 2026-02-21T09:08:10.4482047Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4482718Z cvt.u64.u32 %rd161, %r436; 2026-02-21T09:08:10.4483064Z cvt.u64.u32 %rd162, %r437; 2026-02-21T09:08:10.4483398Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:08:10.4483751Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:08:10.4484360Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4485101Z mov.b64 {%r754, %r755}, %rd164; 2026-02-21T09:08:10.4485461Z cvt.rn.f16x2.f32 %r756, %r755, %r754; 2026-02-21T09:08:10.4486167Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4486823Z cvt.u64.u32 %rd165, %r438; 2026-02-21T09:08:10.4487144Z cvt.u64.u32 %rd166, %r439; 2026-02-21T09:08:10.4487477Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:08:10.4487808Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:08:10.4488408Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4489040Z mov.b64 {%r757, %r758}, %rd168; 2026-02-21T09:08:10.4489398Z cvt.rn.f16x2.f32 %r759, %r758, %r757; 2026-02-21T09:08:10.4490017Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4490658Z cvt.u64.u32 %rd169, 
%r440; 2026-02-21T09:08:10.4490992Z cvt.u64.u32 %rd170, %r441; 2026-02-21T09:08:10.4491310Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:08:10.4491641Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:08:10.4492255Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4492947Z mov.b64 {%r760, %r761}, %rd172; 2026-02-21T09:08:10.4493312Z cvt.rn.f16x2.f32 %r762, %r761, %r760; 2026-02-21T09:08:10.4493973Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4494650Z cvt.u64.u32 %rd173, %r442; 2026-02-21T09:08:10.4495082Z cvt.u64.u32 %rd174, %r443; 2026-02-21T09:08:10.4495428Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:08:10.4495767Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:08:10.4496396Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4497067Z mov.b64 {%r763, %r764}, %rd176; 2026-02-21T09:08:10.4497441Z cvt.rn.f16x2.f32 %r765, %r764, %r763; 2026-02-21T09:08:10.4498098Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4498905Z cvt.u64.u32 %rd177, %r445; 2026-02-21T09:08:10.4499317Z cvt.u64.u32 %rd178, %r446; 2026-02-21T09:08:10.4499678Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:08:10.4500035Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:08:10.4500680Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4501384Z mov.b64 {%r766, %r767}, %rd180; 2026-02-21T09:08:10.4501759Z cvt.rn.f16x2.f32 %r768, %r767, %r766; 2026-02-21T09:08:10.4502552Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4503262Z cvt.u64.u32 %rd181, %r447; 2026-02-21T09:08:10.4503598Z cvt.u64.u32 %rd182, %r448; 2026-02-21T09:08:10.4503944Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:08:10.4504350Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:08:10.4505095Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4505794Z mov.b64 {%r769, %r770}, %rd184; 2026-02-21T09:08:10.4506175Z cvt.rn.f16x2.f32 %r771, %r770, %r769; 2026-02-21T09:08:10.4506849Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4507557Z cvt.u64.u32 %rd185, %r449; 2026-02-21T09:08:10.4507911Z cvt.u64.u32 %rd186, %r450; 2026-02-21T09:08:10.4508250Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:08:10.4508609Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:08:10.4509312Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4510024Z mov.b64 {%r772, %r773}, %rd188; 2026-02-21T09:08:10.4510394Z cvt.rn.f16x2.f32 %r774, %r773, %r772; 2026-02-21T09:08:10.4511084Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4511797Z cvt.u64.u32 %rd189, %r451; 2026-02-21T09:08:10.4512137Z cvt.u64.u32 %rd190, %r452; 2026-02-21T09:08:10.4512566Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:08:10.4512914Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:08:10.4513566Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4514259Z mov.b64 {%r775, %r776}, %rd192; 2026-02-21T09:08:10.4514640Z cvt.rn.f16x2.f32 %r777, %r776, %r775; 2026-02-21T09:08:10.4515376Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4516081Z cvt.u64.u32 %rd193, %r453; 2026-02-21T09:08:10.4516438Z cvt.u64.u32 %rd194, %r454; 2026-02-21T09:08:10.4516777Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:08:10.4517133Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:08:10.4517774Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4518482Z mov.b64 {%r778, %r779}, %rd196; 2026-02-21T09:08:10.4518853Z cvt.rn.f16x2.f32 %r780, %r779, %r778; 2026-02-21T09:08:10.4519538Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4520239Z cvt.u64.u32 %rd197, 
%r455; 2026-02-21T09:08:10.4520583Z cvt.u64.u32 %rd198, %r456; 2026-02-21T09:08:10.4520932Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:08:10.4521278Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:08:10.4521936Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4522639Z mov.b64 {%r781, %r782}, %rd200; 2026-02-21T09:08:10.4523022Z cvt.rn.f16x2.f32 %r783, %r782, %r781; 2026-02-21T09:08:10.4523698Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4524398Z cvt.u64.u32 %rd201, %r457; 2026-02-21T09:08:10.4524810Z cvt.u64.u32 %rd202, %r458; 2026-02-21T09:08:10.4525155Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:08:10.4525511Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:08:10.4526164Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4526875Z mov.b64 {%r784, %r785}, %rd204; 2026-02-21T09:08:10.4527249Z cvt.rn.f16x2.f32 %r786, %r785, %r784; 2026-02-21T09:08:10.4527944Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4528655Z cvt.u64.u32 %rd205, %r459; 2026-02-21T09:08:10.4528999Z cvt.u64.u32 %rd206, %r460; 2026-02-21T09:08:10.4529357Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:08:10.4529786Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:08:10.4530453Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4531158Z mov.b64 {%r787, %r788}, %rd208; 2026-02-21T09:08:10.4531874Z cvt.rn.f16x2.f32 %r789, %r788, %r787; 2026-02-21T09:08:10.4532554Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4533261Z cvt.u64.u32 %rd209, %r462; 2026-02-21T09:08:10.4533627Z cvt.u64.u32 %rd210, %r463; 2026-02-21T09:08:10.4533971Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:08:10.4534330Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:08:10.4535075Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4535783Z mov.b64 {%r790, %r791}, %rd212; 2026-02-21T09:08:10.4536150Z cvt.rn.f16x2.f32 %r792, %r791, %r790; 2026-02-21T09:08:10.4536899Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4537616Z cvt.u64.u32 %rd213, %r464; 2026-02-21T09:08:10.4537956Z cvt.u64.u32 %rd214, %r465; 2026-02-21T09:08:10.4538304Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:08:10.4538653Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:08:10.4539315Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4540013Z mov.b64 {%r793, %r794}, %rd216; 2026-02-21T09:08:10.4540452Z cvt.rn.f16x2.f32 %r795, %r794, %r793; 2026-02-21T09:08:10.4541131Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4541836Z cvt.u64.u32 %rd217, %r466; 2026-02-21T09:08:10.4542188Z cvt.u64.u32 %rd218, %r467; 2026-02-21T09:08:10.4542525Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:08:10.4542882Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:08:10.4543522Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4544235Z mov.b64 {%r796, %r797}, %rd220; 2026-02-21T09:08:10.4544602Z cvt.rn.f16x2.f32 %r798, %r797, %r796; 2026-02-21T09:08:10.4545389Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4546192Z cvt.u64.u32 %rd221, %r468; 2026-02-21T09:08:10.4546619Z cvt.u64.u32 %rd222, %r469; 2026-02-21T09:08:10.4547117Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:08:10.4547596Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:08:10.4548350Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4549048Z mov.b64 {%r799, %r800}, %rd224; 2026-02-21T09:08:10.4549437Z cvt.rn.f16x2.f32 %r801, %r800, %r799; 2026-02-21T09:08:10.4550113Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4550817Z cvt.u64.u32 %rd225, 
%r470; 2026-02-21T09:08:10.4551180Z cvt.u64.u32 %rd226, %r471; 2026-02-21T09:08:10.4551554Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:08:10.4551960Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:08:10.4552638Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4553356Z mov.b64 {%r802, %r803}, %rd228; 2026-02-21T09:08:10.4553718Z cvt.rn.f16x2.f32 %r804, %r803, %r802; 2026-02-21T09:08:10.4554382Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4555145Z cvt.u64.u32 %rd229, %r472; 2026-02-21T09:08:10.4555480Z cvt.u64.u32 %rd230, %r473; 2026-02-21T09:08:10.4555823Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:08:10.4556160Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:08:10.4556802Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4557484Z mov.b64 {%r805, %r806}, %rd232; 2026-02-21T09:08:10.4557860Z cvt.rn.f16x2.f32 %r807, %r806, %r805; 2026-02-21T09:08:10.4558588Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4559280Z cvt.u64.u32 %rd233, %r474; 2026-02-21T09:08:10.4559625Z cvt.u64.u32 %rd234, %r475; 2026-02-21T09:08:10.4560015Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:08:10.4560364Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:08:10.4560991Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4561682Z mov.b64 {%r808, %r809}, %rd236; 2026-02-21T09:08:10.4562055Z cvt.rn.f16x2.f32 %r810, %r809, %r808; 2026-02-21T09:08:10.4562706Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4563387Z cvt.u64.u32 %rd237, %r476; 2026-02-21T09:08:10.4563713Z cvt.u64.u32 %rd238, %r477; 2026-02-21T09:08:10.4564050Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:08:10.4564494Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:08:10.4565183Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4565875Z mov.b64 {%r811, %r812}, %rd240; 2026-02-21T09:08:10.4566240Z cvt.rn.f16x2.f32 %r813, %r812, %r811; 2026-02-21T09:08:10.4566917Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4567601Z cvt.u64.u32 %rd241, %r479; 2026-02-21T09:08:10.4567950Z cvt.u64.u32 %rd242, %r480; 2026-02-21T09:08:10.4568342Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:08:10.4568688Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:08:10.4569313Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4569993Z mov.b64 {%r814, %r815}, %rd244; 2026-02-21T09:08:10.4570365Z cvt.rn.f16x2.f32 %r816, %r815, %r814; 2026-02-21T09:08:10.4571014Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4571695Z cvt.u64.u32 %rd245, %r481; 2026-02-21T09:08:10.4572019Z cvt.u64.u32 %rd246, %r482; 2026-02-21T09:08:10.4572352Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:08:10.4572694Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:08:10.4573313Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4573991Z mov.b64 {%r817, %r818}, %rd248; 2026-02-21T09:08:10.4574349Z cvt.rn.f16x2.f32 %r819, %r818, %r817; 2026-02-21T09:08:10.4575118Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4575788Z cvt.u64.u32 %rd249, %r483; 2026-02-21T09:08:10.4576123Z cvt.u64.u32 %rd250, %r484; 2026-02-21T09:08:10.4576451Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:08:10.4576796Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:08:10.4577437Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4578122Z mov.b64 {%r820, %r821}, %rd252; 2026-02-21T09:08:10.4578492Z cvt.rn.f16x2.f32 %r822, %r821, %r820; 2026-02-21T09:08:10.4579151Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4579852Z cvt.u64.u32 %rd253, 
%r485; 2026-02-21T09:08:10.4580198Z cvt.u64.u32 %rd254, %r486; 2026-02-21T09:08:10.4580521Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:08:10.4580859Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:08:10.4581488Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4582180Z mov.b64 {%r823, %r824}, %rd256; 2026-02-21T09:08:10.4582545Z cvt.rn.f16x2.f32 %r825, %r824, %r823; 2026-02-21T09:08:10.4583206Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4583889Z cvt.u64.u32 %rd257, %r487; 2026-02-21T09:08:10.4584214Z cvt.u64.u32 %rd258, %r488; 2026-02-21T09:08:10.4584545Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:08:10.4585039Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:08:10.4585672Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4586350Z mov.b64 {%r826, %r827}, %rd260; 2026-02-21T09:08:10.4586789Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T09:08:10.4587451Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4588125Z cvt.u64.u32 %rd261, %r489; 2026-02-21T09:08:10.4588465Z cvt.u64.u32 %rd262, %r490; 2026-02-21T09:08:10.4588794Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:08:10.4589140Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:08:10.4589762Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4590441Z mov.b64 {%r829, %r830}, %rd264; 2026-02-21T09:08:10.4590809Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T09:08:10.4591515Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4592205Z cvt.u64.u32 %rd265, %r491; 2026-02-21T09:08:10.4592527Z cvt.u64.u32 %rd266, %r492; 2026-02-21T09:08:10.4592864Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:08:10.4593272Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:08:10.4594045Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4595053Z mov.b64 {%r832, %r833}, %rd268; 2026-02-21T09:08:10.4595451Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T09:08:10.4596129Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4596810Z cvt.u64.u32 %rd269, %r493; 2026-02-21T09:08:10.4597160Z cvt.u64.u32 %rd270, %r494; 2026-02-21T09:08:10.4597501Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:08:10.4597857Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:08:10.4598495Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4599196Z mov.b64 {%r835, %r836}, %rd272; 2026-02-21T09:08:10.4599571Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T09:08:10.4600242Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4600952Z cvt.u64.u32 %rd273, %r496; 2026-02-21T09:08:10.4601286Z cvt.u64.u32 %rd274, %r497; 2026-02-21T09:08:10.4601634Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:08:10.4601985Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:08:10.4602634Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4603323Z mov.b64 {%r838, %r839}, %rd276; 2026-02-21T09:08:10.4603702Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T09:08:10.4604379Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4605132Z cvt.u64.u32 %rd277, %r498; 2026-02-21T09:08:10.4605487Z cvt.u64.u32 %rd278, %r499; 2026-02-21T09:08:10.4605823Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:08:10.4606175Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:08:10.4606807Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4607496Z mov.b64 {%r841, %r842}, %rd280; 2026-02-21T09:08:10.4607880Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T09:08:10.4608542Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4609243Z cvt.u64.u32 %rd281, 
%r500; 2026-02-21T09:08:10.4609588Z cvt.u64.u32 %rd282, %r501; 2026-02-21T09:08:10.4609935Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:08:10.4610285Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:08:10.4610921Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4611608Z mov.b64 {%r844, %r845}, %rd284; 2026-02-21T09:08:10.4612057Z cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T09:08:10.4612736Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4613423Z cvt.u64.u32 %rd285, %r502; 2026-02-21T09:08:10.4613779Z cvt.u64.u32 %rd286, %r503; 2026-02-21T09:08:10.4614175Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:08:10.4614527Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:08:10.4615232Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4615923Z mov.b64 {%r847, %r848}, %rd288; 2026-02-21T09:08:10.4616306Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T09:08:10.4616971Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4617667Z cvt.u64.u32 %rd289, %r504; 2026-02-21T09:08:10.4618008Z cvt.u64.u32 %rd290, %r505; 2026-02-21T09:08:10.4618420Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:08:10.4618767Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:08:10.4619420Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4620106Z mov.b64 {%r850, %r851}, %rd292; 2026-02-21T09:08:10.4620488Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T09:08:10.4621169Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4621853Z cvt.u64.u32 %rd293, %r506; 2026-02-21T09:08:10.4622265Z cvt.u64.u32 %rd294, %r507; 2026-02-21T09:08:10.4622611Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:08:10.4622965Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:08:10.4623601Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4624297Z mov.b64 {%r853, %r854}, %rd296; 2026-02-21T09:08:10.4624755Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T09:08:10.4625415Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4626098Z cvt.u64.u32 %rd297, %r508; 2026-02-21T09:08:10.4626419Z cvt.u64.u32 %rd298, %r509; 2026-02-21T09:08:10.4626752Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:08:10.4627082Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:08:10.4627704Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4628360Z mov.b64 {%r856, %r857}, %rd300; 2026-02-21T09:08:10.4628720Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T09:08:10.4629326Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4629949Z cvt.u64.u32 %rd301, %r510; 2026-02-21T09:08:10.4630268Z cvt.u64.u32 %rd302, %r511; 2026-02-21T09:08:10.4630576Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:08:10.4630912Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:08:10.4631548Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4632241Z mov.b64 {%r859, %r860}, %rd304; 2026-02-21T09:08:10.4632625Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T09:08:10.4633304Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4634004Z cvt.u64.u32 %rd305, %r513; 2026-02-21T09:08:10.4634341Z cvt.u64.u32 %rd306, %r514; 2026-02-21T09:08:10.4634789Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:08:10.4635141Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:08:10.4635806Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4636497Z mov.b64 {%r862, %r863}, %rd308; 2026-02-21T09:08:10.4636879Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T09:08:10.4637561Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4638260Z cvt.u64.u32 %rd309, 
%r515; 2026-02-21T09:08:10.4638689Z cvt.u64.u32 %rd310, %r516; 2026-02-21T09:08:10.4639299Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:08:10.4639666Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:08:10.4640314Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4641085Z mov.b64 {%r865, %r866}, %rd312; 2026-02-21T09:08:10.4641468Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T09:08:10.4642153Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4642869Z cvt.u64.u32 %rd313, %r517; 2026-02-21T09:08:10.4643208Z cvt.u64.u32 %rd314, %r518; 2026-02-21T09:08:10.4643559Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:08:10.4643900Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:08:10.4644555Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4645351Z mov.b64 {%r868, %r869}, %rd316; 2026-02-21T09:08:10.4645792Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T09:08:10.4646486Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4647181Z cvt.u64.u32 %rd317, %r519; 2026-02-21T09:08:10.4647527Z cvt.u64.u32 %rd318, %r520; 2026-02-21T09:08:10.4647865Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:08:10.4648216Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:08:10.4648854Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4649618Z mov.b64 {%r871, %r872}, %rd320; 2026-02-21T09:08:10.4650003Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T09:08:10.4650680Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4651388Z cvt.u64.u32 %rd321, %r521; 2026-02-21T09:08:10.4651730Z cvt.u64.u32 %rd322, %r522; 2026-02-21T09:08:10.4652079Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:08:10.4652429Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:08:10.4653081Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 
2026-02-21T09:08:10.4653778Z mov.b64 {%r874, %r875}, %rd324; 2026-02-21T09:08:10.4654162Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T09:08:10.4654942Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4655637Z cvt.u64.u32 %rd325, %r523; 2026-02-21T09:08:10.4655981Z cvt.u64.u32 %rd326, %r524; 2026-02-21T09:08:10.4656331Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:08:10.4656683Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:08:10.4657325Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4658027Z mov.b64 {%r877, %r878}, %rd328; 2026-02-21T09:08:10.4658405Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T09:08:10.4659085Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4659790Z cvt.u64.u32 %rd329, %r525; 2026-02-21T09:08:10.4660123Z cvt.u64.u32 %rd330, %r526; 2026-02-21T09:08:10.4660464Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:08:10.4660809Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:08:10.4661459Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4662162Z mov.b64 {%r880, %r881}, %rd332; 2026-02-21T09:08:10.4662542Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T09:08:10.4663368Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4664166Z cvt.u64.u32 %rd333, %r527; 2026-02-21T09:08:10.4664514Z cvt.u64.u32 %rd334, %r528; 2026-02-21T09:08:10.4664929Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:08:10.4665305Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:08:10.4665985Z .loc 1 55 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:55:27 2026-02-21T09:08:10.4666723Z mov.b64 {%r883, %r884}, %rd336; 2026-02-21T09:08:10.4667241Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T09:08:10.4667949Z .loc 1 56 45 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:56:45 2026-02-21T09:08:10.4668663Z shl.b32 %r886, %r677, 
11; 2026-02-21T09:08:10.4669130Z shl.b32 %r887, %r678, 11; 2026-02-21T09:08:10.4669483Z shl.b32 %r888, %r679, 11; 2026-02-21T09:08:10.4669799Z shl.b32 %r889, %r680, 11; 2026-02-21T09:08:10.4670124Z shl.b32 %r890, %r681, 11; 2026-02-21T09:08:10.4670440Z shl.b32 %r891, %r682, 11; 2026-02-21T09:08:10.4670765Z shl.b32 %r892, %r683, 11; 2026-02-21T09:08:10.4671087Z shl.b32 %r893, %r684, 11; 2026-02-21T09:08:10.4671395Z shl.b32 %r894, %r685, 11; 2026-02-21T09:08:10.4671712Z shl.b32 %r895, %r686, 11; 2026-02-21T09:08:10.4672020Z shl.b32 %r896, %r687, 11; 2026-02-21T09:08:10.4672339Z shl.b32 %r897, %r688, 11; 2026-02-21T09:08:10.4672651Z shl.b32 %r898, %r689, 11; 2026-02-21T09:08:10.4673033Z shl.b32 %r899, %r690, 11; 2026-02-21T09:08:10.4673347Z shl.b32 %r900, %r691, 11; 2026-02-21T09:08:10.4673665Z shl.b32 %r901, %r692, 11; 2026-02-21T09:08:10.4674256Z .loc 1 56 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:56:52 2026-02-21T09:08:10.4675041Z add.s32 %r902, %r886, %r676; 2026-02-21T09:08:10.4675398Z add.s32 %r903, %r887, %r676; 2026-02-21T09:08:10.4675738Z add.s32 %r904, %r888, %r676; 2026-02-21T09:08:10.4676078Z add.s32 %r905, %r889, %r676; 2026-02-21T09:08:10.4676472Z add.s32 %r906, %r890, %r676; 2026-02-21T09:08:10.4676805Z add.s32 %r907, %r891, %r676; 2026-02-21T09:08:10.4677126Z add.s32 %r908, %r892, %r676; 2026-02-21T09:08:10.4677456Z add.s32 %r909, %r893, %r676; 2026-02-21T09:08:10.4677776Z add.s32 %r910, %r894, %r676; 2026-02-21T09:08:10.4678106Z add.s32 %r911, %r895, %r676; 2026-02-21T09:08:10.4678439Z add.s32 %r912, %r896, %r676; 2026-02-21T09:08:10.4678762Z add.s32 %r913, %r897, %r676; 2026-02-21T09:08:10.4679101Z add.s32 %r914, %r898, %r676; 2026-02-21T09:08:10.4679425Z add.s32 %r915, %r899, %r676; 2026-02-21T09:08:10.4679774Z add.s32 %r916, %r900, %r676; 2026-02-21T09:08:10.4680097Z add.s32 %r917, %r901, %r676; 2026-02-21T09:08:10.4680707Z .loc 1 56 24 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:56:24 2026-02-21T09:08:10.4681398Z 
mad.wide.s32 %rd65, %r902, 2, %rd5; 2026-02-21T09:08:10.4681799Z mad.wide.s32 %rd66, %r903, 2, %rd5; 2026-02-21T09:08:10.4682202Z mad.wide.s32 %rd67, %r904, 2, %rd5; 2026-02-21T09:08:10.4682593Z mad.wide.s32 %rd68, %r905, 2, %rd5; 2026-02-21T09:08:10.4682991Z mad.wide.s32 %rd69, %r906, 2, %rd5; 2026-02-21T09:08:10.4683395Z mad.wide.s32 %rd70, %r907, 2, %rd5; 2026-02-21T09:08:10.4683854Z mad.wide.s32 %rd71, %r908, 2, %rd5; 2026-02-21T09:08:10.4684303Z mad.wide.s32 %rd72, %r909, 2, %rd5; 2026-02-21T09:08:10.4684882Z mad.wide.s32 %rd73, %r910, 2, %rd5; 2026-02-21T09:08:10.4685331Z mad.wide.s32 %rd74, %r911, 2, %rd5; 2026-02-21T09:08:10.4685775Z mad.wide.s32 %rd75, %r912, 2, %rd5; 2026-02-21T09:08:10.4686170Z mad.wide.s32 %rd76, %r913, 2, %rd5; 2026-02-21T09:08:10.4686557Z mad.wide.s32 %rd77, %r914, 2, %rd5; 2026-02-21T09:08:10.4686948Z mad.wide.s32 %rd78, %r915, 2, %rd5; 2026-02-21T09:08:10.4687332Z mad.wide.s32 %rd79, %r916, 2, %rd5; 2026-02-21T09:08:10.4687727Z mad.wide.s32 %rd80, %r917, 2, %rd5; 2026-02-21T09:08:10.4688391Z .loc 1 56 82 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:56:82 2026-02-21T09:08:10.4689089Z bar.sync 0, 256; 2026-02-21T09:08:10.4689485Z st.shared.v4.b32 [%r62], {%r696, %r708, %r720, %r732}; 2026-02-21T09:08:10.4690038Z st.shared.v4.b32 [%r63], {%r744, %r756, %r768, %r780}; 2026-02-21T09:08:10.4690575Z st.shared.v4.b32 [%r64], {%r792, %r804, %r816, %r828}; 2026-02-21T09:08:10.4691095Z st.shared.v4.b32 [%r65], {%r840, %r852, %r864, %r876}; 2026-02-21T09:08:10.4691556Z bar.sync 0, 256; 2026-02-21T09:08:10.4691853Z // begin inline asm 2026-02-21T09:08:10.4692417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r611, %r615, %r619, %r623}, [%r535]; 2026-02-21T09:08:10.4693139Z // end inline asm 2026-02-21T09:08:10.4693444Z // begin inline asm 2026-02-21T09:08:10.4693979Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r627, %r631, %r635, %r639}, [%r540]; 2026-02-21T09:08:10.4694761Z // end inline asm 2026-02-21T09:08:10.4695073Z // begin 
inline asm 2026-02-21T09:08:10.4695606Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r643, %r647, %r651, %r655}, [%r545]; 2026-02-21T09:08:10.4696229Z // end inline asm 2026-02-21T09:08:10.4696516Z // begin inline asm 2026-02-21T09:08:10.4697046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r659, %r663, %r667, %r671}, [%r550]; 2026-02-21T09:08:10.4697660Z // end inline asm 2026-02-21T09:08:10.4697954Z bar.sync 0, 256; 2026-02-21T09:08:10.4698326Z st.shared.v4.b32 [%r62], {%r699, %r711, %r723, %r735}; 2026-02-21T09:08:10.4698865Z st.shared.v4.b32 [%r63], {%r747, %r759, %r771, %r783}; 2026-02-21T09:08:10.4699467Z st.shared.v4.b32 [%r64], {%r795, %r807, %r819, %r831}; 2026-02-21T09:08:10.4700000Z st.shared.v4.b32 [%r65], {%r843, %r855, %r867, %r879}; 2026-02-21T09:08:10.4700458Z bar.sync 0, 256; 2026-02-21T09:08:10.4700753Z // begin inline asm 2026-02-21T09:08:10.4701293Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r612, %r616, %r620, %r624}, [%r535]; 2026-02-21T09:08:10.4701913Z // end inline asm 2026-02-21T09:08:10.4702216Z // begin inline asm 2026-02-21T09:08:10.4702741Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r628, %r632, %r636, %r640}, [%r540]; 2026-02-21T09:08:10.4703426Z // end inline asm 2026-02-21T09:08:10.4703730Z // begin inline asm 2026-02-21T09:08:10.4704260Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r644, %r648, %r652, %r656}, [%r545]; 2026-02-21T09:08:10.4704966Z // end inline asm 2026-02-21T09:08:10.4705254Z // begin inline asm 2026-02-21T09:08:10.4705783Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r660, %r664, %r668, %r672}, [%r550]; 2026-02-21T09:08:10.4706399Z // end inline asm 2026-02-21T09:08:10.4706697Z bar.sync 0, 256; 2026-02-21T09:08:10.4707069Z st.shared.v4.b32 [%r62], {%r702, %r714, %r726, %r738}; 2026-02-21T09:08:10.4707608Z st.shared.v4.b32 [%r63], {%r750, %r762, %r774, %r786}; 2026-02-21T09:08:10.4708141Z st.shared.v4.b32 [%r64], {%r798, %r810, %r822, %r834}; 2026-02-21T09:08:10.4708668Z st.shared.v4.b32 [%r65], {%r846, %r858, %r870, 
%r882}; 2026-02-21T09:08:10.4709124Z bar.sync 0, 256; 2026-02-21T09:08:10.4709413Z // begin inline asm 2026-02-21T09:08:10.4709959Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r613, %r617, %r621, %r625}, [%r535]; 2026-02-21T09:08:10.4710577Z // end inline asm 2026-02-21T09:08:10.4710873Z // begin inline asm 2026-02-21T09:08:10.4711396Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r629, %r633, %r637, %r641}, [%r540]; 2026-02-21T09:08:10.4712027Z // end inline asm 2026-02-21T09:08:10.4712321Z // begin inline asm 2026-02-21T09:08:10.4712846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r645, %r649, %r653, %r657}, [%r545]; 2026-02-21T09:08:10.4713475Z // end inline asm 2026-02-21T09:08:10.4713762Z // begin inline asm 2026-02-21T09:08:10.4714296Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r661, %r665, %r669, %r673}, [%r550]; 2026-02-21T09:08:10.4715010Z // end inline asm 2026-02-21T09:08:10.4715304Z bar.sync 0, 256; 2026-02-21T09:08:10.4715673Z st.shared.v4.b32 [%r62], {%r705, %r717, %r729, %r741}; 2026-02-21T09:08:10.4716207Z st.shared.v4.b32 [%r63], {%r753, %r765, %r777, %r789}; 2026-02-21T09:08:10.4716737Z st.shared.v4.b32 [%r64], {%r801, %r813, %r825, %r837}; 2026-02-21T09:08:10.4717265Z st.shared.v4.b32 [%r65], {%r849, %r861, %r873, %r885}; 2026-02-21T09:08:10.4717725Z bar.sync 0, 256; 2026-02-21T09:08:10.4718019Z // begin inline asm 2026-02-21T09:08:10.4718565Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r614, %r618, %r622, %r626}, [%r535]; 2026-02-21T09:08:10.4719192Z // end inline asm 2026-02-21T09:08:10.4719492Z // begin inline asm 2026-02-21T09:08:10.4720023Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r630, %r634, %r638, %r642}, [%r540]; 2026-02-21T09:08:10.4720731Z // end inline asm 2026-02-21T09:08:10.4721026Z // begin inline asm 2026-02-21T09:08:10.4721542Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r646, %r650, %r654, %r658}, [%r545]; 2026-02-21T09:08:10.4722159Z // end inline asm 2026-02-21T09:08:10.4722504Z // begin inline asm 2026-02-21T09:08:10.4723029Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r662, %r666, %r670, %r674}, [%r550]; 2026-02-21T09:08:10.4723636Z // end inline asm 2026-02-21T09:08:10.4723934Z // begin inline asm 2026-02-21T09:08:10.4724348Z st.global.v4.b32 [ %rd65 + 0 ], { %r611, %r612, %r613, %r614 }; 2026-02-21T09:08:10.4724927Z // end inline asm 2026-02-21T09:08:10.4725225Z // begin inline asm 2026-02-21T09:08:10.4725635Z st.global.v4.b32 [ %rd66 + 0 ], { %r615, %r616, %r617, %r618 }; 2026-02-21T09:08:10.4726130Z // end inline asm 2026-02-21T09:08:10.4726418Z // begin inline asm 2026-02-21T09:08:10.4726900Z st.global.v4.b32 [ %rd67 + 0 ], { %r619, %r620, %r621, %r622 }; 2026-02-21T09:08:10.4727380Z // end inline asm 2026-02-21T09:08:10.4727677Z // begin inline asm 2026-02-21T09:08:10.4728078Z st.global.v4.b32 [ %rd68 + 0 ], { %r623, %r624, %r625, %r626 }; 2026-02-21T09:08:10.4728577Z // end inline asm 2026-02-21T09:08:10.4728883Z // begin inline asm 2026-02-21T09:08:10.4729289Z st.global.v4.b32 [ %rd69 + 0 ], { %r627, %r628, %r629, %r630 }; 2026-02-21T09:08:10.4729776Z // end inline asm 2026-02-21T09:08:10.4730062Z // begin inline asm 2026-02-21T09:08:10.4730517Z st.global.v4.b32 [ %rd70 + 0 ], { %r631, %r632, %r633, %r634 }; 2026-02-21T09:08:10.4730996Z // end inline asm 2026-02-21T09:08:10.4731290Z // begin inline asm 2026-02-21T09:08:10.4731686Z st.global.v4.b32 [ %rd71 + 0 ], { %r635, %r636, %r637, %r638 }; 2026-02-21T09:08:10.4732246Z // end inline asm 2026-02-21T09:08:10.4732598Z // begin inline asm 2026-02-21T09:08:10.4733097Z st.global.v4.b32 [ %rd72 + 0 ], { %r639, %r640, %r641, %r642 }; 2026-02-21T09:08:10.4733586Z // end inline asm 2026-02-21T09:08:10.4733881Z // begin inline asm 2026-02-21T09:08:10.4734289Z st.global.v4.b32 [ %rd73 + 0 ], { %r643, %r644, %r645, %r646 }; 2026-02-21T09:08:10.4734826Z // end inline asm 2026-02-21T09:08:10.4735124Z // begin inline asm 2026-02-21T09:08:10.4735522Z st.global.v4.b32 [ %rd74 + 0 ], { %r647, %r648, %r649, %r650 }; 2026-02-21T09:08:10.4736014Z 
// end inline asm 2026-02-21T09:08:10.4736306Z // begin inline asm 2026-02-21T09:08:10.4736718Z st.global.v4.b32 [ %rd75 + 0 ], { %r651, %r652, %r653, %r654 }; 2026-02-21T09:08:10.4737201Z // end inline asm 2026-02-21T09:08:10.4737489Z // begin inline asm 2026-02-21T09:08:10.4737899Z st.global.v4.b32 [ %rd76 + 0 ], { %r655, %r656, %r657, %r658 }; 2026-02-21T09:08:10.4738379Z // end inline asm 2026-02-21T09:08:10.4738676Z // begin inline asm 2026-02-21T09:08:10.4739074Z st.global.v4.b32 [ %rd77 + 0 ], { %r659, %r660, %r661, %r662 }; 2026-02-21T09:08:10.4739555Z // end inline asm 2026-02-21T09:08:10.4739838Z // begin inline asm 2026-02-21T09:08:10.4740250Z st.global.v4.b32 [ %rd78 + 0 ], { %r663, %r664, %r665, %r666 }; 2026-02-21T09:08:10.4740740Z // end inline asm 2026-02-21T09:08:10.4741029Z // begin inline asm 2026-02-21T09:08:10.4741435Z st.global.v4.b32 [ %rd79 + 0 ], { %r667, %r668, %r669, %r670 }; 2026-02-21T09:08:10.4741914Z // end inline asm 2026-02-21T09:08:10.4742217Z // begin inline asm 2026-02-21T09:08:10.4742612Z st.global.v4.b32 [ %rd80 + 0 ], { %r671, %r672, %r673, %r674 }; 2026-02-21T09:08:10.4743094Z // end inline asm 2026-02-21T09:08:10.4743377Z mov.b32 %r968, 1; 2026-02-21T09:08:10.4743802Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:08:10.4744616Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4745403Z xor.b32 %r972, %r968, %r972; 2026-02-21T09:08:10.4746074Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4746799Z add.s32 %r962, %r962, -1; 2026-02-21T09:08:10.4747249Z setp.ne.b32 %p102, %r962, 0; 2026-02-21T09:08:10.4747607Z @%p102 bra $L__BB0_18; 2026-02-21T09:08:10.4747944Z bra.uni $L__BB0_23; 2026-02-21T09:08:10.4748368Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:08:10.4748882Z add.s32 %r391, %r967, 1; 2026-02-21T09:08:10.4749294Z setp.eq.b32 %p98, %r967, 63; 2026-02-21T09:08:10.4749660Z selp.b32 %r967, 0, 
%r391, %p98; 2026-02-21T09:08:10.4750041Z setp.eq.b32 %p99, %r967, 63; 2026-02-21T09:08:10.4750389Z @%p99 bra $L__BB0_21; 2026-02-21T09:08:10.4750813Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:08:10.4751601Z .loc 1 0 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:130 2026-02-21T09:08:10.4752306Z mov.b32 %r968, 0; 2026-02-21T09:08:10.4752905Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4753857Z setp.ne.b32 %p100, %r967, 0; 2026-02-21T09:08:10.4754266Z @%p100 bra $L__BB0_22; 2026-02-21T09:08:10.4754623Z // %bb.20: // %.thread 2026-02-21T09:08:10.4755224Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:08:10.4755698Z add.s32 %r969, %r969, 2368; 2026-02-21T09:08:10.4756356Z .loc 1 34 35 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:34:35 2026-02-21T09:08:10.4757047Z shr.s32 %r919, %r969, 31; 2026-02-21T09:08:10.4757455Z shr.u32 %r920, %r919, 27; 2026-02-21T09:08:10.4757802Z add.s32 %r921, %r969, %r920; 2026-02-21T09:08:10.4758147Z shr.s32 %r922, %r921, 5; 2026-02-21T09:08:10.4758784Z .loc 1 35 33 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:35:33 2026-02-21T09:08:10.4759472Z shl.b32 %r923, %r922, 1; 2026-02-21T09:08:10.4760108Z .loc 1 36 39 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:36:39 2026-02-21T09:08:10.4760797Z sub.s32 %r924, 16, %r923; 2026-02-21T09:08:10.4761428Z .loc 1 36 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:36:52 2026-02-21T09:08:10.4762130Z min.s32 %r925, %r924, 2; 2026-02-21T09:08:10.4762745Z .loc 1 37 45 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:45 2026-02-21T09:08:10.4763443Z and.b32 %r926, %r921, -32; 2026-02-21T09:08:10.4763794Z sub.s32 %r927, %r969, %r926; 2026-02-21T09:08:10.4764424Z .loc 1 38 51 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:38:51 2026-02-21T09:08:10.4765214Z div.s32 %r928, %r927, %r925; 2026-02-21T09:08:10.4765859Z .loc 1 37 64 // 
cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:64 2026-02-21T09:08:10.4766572Z mul.lo.s32 %r929, %r928, %r925; 2026-02-21T09:08:10.4766937Z sub.s32 %r930, %r927, %r929; 2026-02-21T09:08:10.4767578Z .loc 1 37 30 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:30 2026-02-21T09:08:10.4768277Z add.s32 %r931, %r930, %r923; 2026-02-21T09:08:10.4768915Z .loc 1 39 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:39:27 2026-02-21T09:08:10.4769609Z shl.b32 %r971, %r931, 7; 2026-02-21T09:08:10.4770236Z .loc 1 41 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:41:27 2026-02-21T09:08:10.4770935Z shl.b32 %r970, %r928, 8; 2026-02-21T09:08:10.4771569Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4772291Z bra.uni $L__BB0_22; 2026-02-21T09:08:10.4772695Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:08:10.4773470Z .loc 1 0 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:130 2026-02-21T09:08:10.4774174Z mov.b32 %r89, global_smem; 2026-02-21T09:08:10.4774528Z add.s32 %r90, %r89, %r3; 2026-02-21T09:08:10.4774951Z bra.uni $L__BB0_2; 2026-02-21T09:08:10.4775529Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4776501Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4777428Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4777921Z barrier.sync 1; 2026-02-21T09:08:10.4778242Z barrier.sync 1; 2026-02-21T09:08:10.4778599Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4779069Z $L__BB0_2: // %.preheader 2026-02-21T09:08:10.4779586Z // =>This Loop Header: Depth=1 2026-02-21T09:08:10.4780122Z // Child Loop BB0_11 Depth 2 2026-02-21T09:08:10.4780644Z // Child Loop BB0_7 Depth 2 2026-02-21T09:08:10.4781376Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19 2026-02-21T09:08:10.4782157Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:08:10.4782576Z barrier.sync 
1; 2026-02-21T09:08:10.4782894Z ld.shared.b8 %r88, [%r90+114768]; 2026-02-21T09:08:10.4783297Z setp.gt.u32 %p4, %r88, 3; 2026-02-21T09:08:10.4783648Z @%p4 bra $L__BB0_4; 2026-02-21T09:08:10.4784009Z // %bb.3: // %.preheader 2026-02-21T09:08:10.4784519Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4785052Z $L_brx_0: .branchtargets 2026-02-21T09:08:10.4785388Z $L__BB0_5, 2026-02-21T09:08:10.4785715Z $L__BB0_9, 2026-02-21T09:08:10.4785987Z $L__BB0_15, 2026-02-21T09:08:10.4786252Z $L__BB0_24; 2026-02-21T09:08:10.4786535Z brx.idx %r88, $L_brx_0; 2026-02-21T09:08:10.4786954Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4787755Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4788523Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4788985Z ld.shared.b32 %r139, [global_smem+98304]; 2026-02-21T09:08:10.4789450Z ld.shared.b32 %r946, [global_smem+98312]; 2026-02-21T09:08:10.4789856Z barrier.sync 1; 2026-02-21T09:08:10.4790178Z setp.lt.s32 %p17, %r946, 1; 2026-02-21T09:08:10.4790529Z @%p17 bra $L__BB0_8; 2026-02-21T09:08:10.4790894Z // %bb.6: // %.lr.ph7 2026-02-21T09:08:10.4791387Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4792154Z .loc 1 0 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:130 2026-02-21T09:08:10.4792864Z mov.b32 %r950, -1; 2026-02-21T09:08:10.4793182Z mov.pred %p114, 0; 2026-02-21T09:08:10.4793494Z mov.b32 %r947, 0; 2026-02-21T09:08:10.4793793Z mov.b32 %r948, %r947; 2026-02-21T09:08:10.4794117Z mov.b32 %r949, %r947; 2026-02-21T09:08:10.4794523Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:08:10.4795170Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:10.4795967Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4796675Z add.s32 %r149, %r950, 1; 2026-02-21T09:08:10.4797035Z setp.eq.b32 %p30, %r950, 63; 2026-02-21T09:08:10.4797411Z selp.b32 %r950, 0, %r149, %p30; 
2026-02-21T09:08:10.4797792Z shl.b32 %r150, %r949, 3; 2026-02-21T09:08:10.4798128Z add.s32 %r152, %r89, %r150; 2026-02-21T09:08:10.4798490Z add.s32 %r153, %r152, 114688; 2026-02-21T09:08:10.4798848Z add.s32 %r137, %r152, 114720; 2026-02-21T09:08:10.4799510Z .loc 1 51 31 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:51:31 2026-02-21T09:08:10.4800215Z shl.b32 %r154, %r949, 14; 2026-02-21T09:08:10.4800557Z add.s32 %r155, %r89, %r154; 2026-02-21T09:08:10.4801200Z .loc 1 52 44 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:52:44 2026-02-21T09:08:10.4802003Z shl.b32 %r156, %r949, 13; 2026-02-21T09:08:10.4802582Z add.s32 %r157, %r89, %r156; 2026-02-21T09:08:10.4802996Z add.s32 %r158, %r157, 65536; 2026-02-21T09:08:10.4803621Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4804315Z bar.warp.sync -1; 2026-02-21T09:08:10.4804749Z // begin inline asm 2026-02-21T09:08:10.4805057Z 2026-02-21T09:08:10.4805299Z { 2026-02-21T09:08:10.4805568Z .reg .pred complete; 2026-02-21T09:08:10.4805884Z waitLoop: 2026-02-21T09:08:10.4806326Z mbarrier.try_wait.parity.shared.b64 complete, [%r137], %r948; 2026-02-21T09:08:10.4806878Z @!complete bra.uni waitLoop; 2026-02-21T09:08:10.4807227Z } 2026-02-21T09:08:10.4807376Z 2026-02-21T09:08:10.4807495Z // end inline asm 2026-02-21T09:08:10.4808102Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4808819Z setp.eq.b32 %p29, %r950, 63; 2026-02-21T09:08:10.4809540Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4810266Z elect.sync %r159|%p20, -1; 2026-02-21T09:08:10.4810626Z bfe.u32 %r160, %r155, 4, 14; 2026-02-21T09:08:10.4810986Z cvt.u64.u32 %rd22, %r160; 2026-02-21T09:08:10.4811361Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:08:10.4811792Z bfe.u32 %r161, %r158, 4, 14; 2026-02-21T09:08:10.4812138Z cvt.u64.u32 %rd23, %r161; 2026-02-21T09:08:10.4812508Z or.b64 %rd13, 
%rd23, -9223371899382267904; 2026-02-21T09:08:10.4812989Z mov.b32 %r140, 136314896; 2026-02-21T09:08:10.4813322Z // begin inline asm 2026-02-21T09:08:10.4813877Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r139 + 0 ], %rd12, %rd13, %r140, %p114; 2026-02-21T09:08:10.4814476Z // end inline asm 2026-02-21T09:08:10.4814825Z add.s32 %r162, %r155, 32; 2026-02-21T09:08:10.4815158Z bfe.u32 %r163, %r162, 4, 14; 2026-02-21T09:08:10.4815507Z cvt.u64.u32 %rd24, %r163; 2026-02-21T09:08:10.4815869Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:08:10.4816286Z add.s32 %r164, %r157, 65568; 2026-02-21T09:08:10.4816638Z bfe.u32 %r165, %r164, 4, 14; 2026-02-21T09:08:10.4816981Z cvt.u64.u32 %rd25, %r165; 2026-02-21T09:08:10.4817343Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T09:08:10.4817744Z mov.pred %p21, -1; 2026-02-21T09:08:10.4818063Z // begin inline asm 2026-02-21T09:08:10.4818564Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r139 + 0 ], %rd14, %rd15, %r140, %p21; 2026-02-21T09:08:10.4819158Z // end inline asm 2026-02-21T09:08:10.4819462Z add.s32 %r166, %r155, 8192; 2026-02-21T09:08:10.4819816Z bfe.u32 %r167, %r166, 4, 14; 2026-02-21T09:08:10.4820164Z cvt.u64.u32 %rd26, %r167; 2026-02-21T09:08:10.4820520Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:08:10.4820927Z // begin inline asm 2026-02-21T09:08:10.4821439Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r139 + 128 ], %rd16, %rd13, %r140, %p114; 2026-02-21T09:08:10.4822055Z // end inline asm 2026-02-21T09:08:10.4822365Z add.s32 %r168, %r155, 8224; 2026-02-21T09:08:10.4822727Z bfe.u32 %r169, %r168, 4, 14; 2026-02-21T09:08:10.4823070Z cvt.u64.u32 %rd27, %r169; 2026-02-21T09:08:10.4823442Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:08:10.4823844Z // begin inline asm 2026-02-21T09:08:10.4824350Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r139 + 128 ], %rd18, %rd15, %r140, %p21; 2026-02-21T09:08:10.4825062Z // end inline asm 2026-02-21T09:08:10.4825359Z cvt.u64.u32 %rd20, %r153; 
2026-02-21T09:08:10.4825696Z // begin inline asm 2026-02-21T09:08:10.4826172Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:08:10.4826720Z // end inline asm 2026-02-21T09:08:10.4827030Z and.pred %p28, %p29, %p20; 2026-02-21T09:08:10.4827396Z add.s32 %r170, %r89, 114752; 2026-02-21T09:08:10.4827754Z cvt.u64.u32 %rd21, %r170; 2026-02-21T09:08:10.4828082Z // begin inline asm 2026-02-21T09:08:10.4828555Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:08:10.4829095Z // end inline asm 2026-02-21T09:08:10.4829747Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4830427Z setp.ne.b32 %p114, %r950, 63; 2026-02-21T09:08:10.4831079Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4831841Z selp.b32 %r171, 1, 0, %p29; 2026-02-21T09:08:10.4832197Z xor.b32 %r947, %r947, %r171; 2026-02-21T09:08:10.4832548Z add.s32 %r147, %r89, 114768; 2026-02-21T09:08:10.4832890Z // begin inline asm 2026-02-21T09:08:10.4833182Z 2026-02-21T09:08:10.4833414Z { 2026-02-21T09:08:10.4833690Z @!%p29 bra.uni skipWait; 2026-02-21T09:08:10.4834038Z .reg .pred complete; 2026-02-21T09:08:10.4834361Z waitLoop: 2026-02-21T09:08:10.4834875Z mbarrier.try_wait.parity.shared.b64 complete, [%r147], %r947; 2026-02-21T09:08:10.4835431Z @!complete bra.uni waitLoop; 2026-02-21T09:08:10.4835785Z skipWait: 2026-02-21T09:08:10.4836105Z } 2026-02-21T09:08:10.4836251Z 2026-02-21T09:08:10.4836382Z // end inline asm 2026-02-21T09:08:10.4836945Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4837646Z add.s32 %r172, %r949, 1; 2026-02-21T09:08:10.4837994Z setp.eq.b32 %p31, %r172, 4; 2026-02-21T09:08:10.4838374Z selp.b32 %r949, 0, %r172, %p31; 2026-02-21T09:08:10.4838749Z selp.b32 %r173, 1, 0, %p31; 2026-02-21T09:08:10.4839105Z xor.b32 %r948, %r948, %r173; 2026-02-21T09:08:10.4839814Z .loc 1 28 130 // 
cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4840532Z add.s32 %r946, %r946, -1; 2026-02-21T09:08:10.4840889Z setp.ne.b32 %p32, %r946, 0; 2026-02-21T09:08:10.4841240Z @%p32 bra $L__BB0_7; 2026-02-21T09:08:10.4841625Z $L__BB0_8: // %._crit_edge8 2026-02-21T09:08:10.4842139Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4842619Z barrier.sync 1; 2026-02-21T09:08:10.4842976Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4843396Z bra.uni $L__BB0_2; 2026-02-21T09:08:10.4843814Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4844592Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4845416Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4845868Z ld.shared.b32 %r951, [global_smem+98312]; 2026-02-21T09:08:10.4846287Z barrier.sync 1; 2026-02-21T09:08:10.4846876Z .loc 1 21 67 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:21:67 2026-02-21T09:08:10.4847591Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:08:10.4847935Z mov.u32 %r91, %ctaid.y; 2026-02-21T09:08:10.4848265Z mov.u32 %r92, %ctaid.z; 2026-02-21T09:08:10.4848609Z mov.u32 %r93, %nctaid.x; 2026-02-21T09:08:10.4848942Z mov.u32 %r94, %nctaid.y; 2026-02-21T09:08:10.4849301Z mad.lo.s32 %r95, %r92, %r94, %r91; 2026-02-21T09:08:10.4849693Z mad.lo.s32 %r96, %r95, %r93, %r17; 2026-02-21T09:08:10.4850079Z shl.b32 %r97, %r96, 8; 2026-02-21T09:08:10.4850690Z .loc 1 22 67 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:22:67 2026-02-21T09:08:10.4851394Z cvt.s64.s32 %rd7, %r97; 2026-02-21T09:08:10.4851745Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:08:10.4852087Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:08:10.4852445Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:08:10.4853083Z .loc 1 21 67 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:21:67 2026-02-21T09:08:10.4853790Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:08:10.4854436Z .loc 1 28 130 // 
cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4855222Z setp.lt.s32 %p5, %r951, 1; 2026-02-21T09:08:10.4855574Z @%p5 bra $L__BB0_14; 2026-02-21T09:08:10.4855939Z // %bb.10: // %.lr.ph 2026-02-21T09:08:10.4856448Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4857005Z add.s32 %r961, %r17, -2368; 2026-02-21T09:08:10.4857361Z add.s32 %r19, %r1, -256; 2026-02-21T09:08:10.4857692Z mov.b32 %r958, -1; 2026-02-21T09:08:10.4858002Z mov.b32 %r952, 0; 2026-02-21T09:08:10.4858354Z mov.b32 %r953, %r952; 2026-02-21T09:08:10.4858679Z mov.b32 %r960, %r952; 2026-02-21T09:08:10.4858990Z mov.b32 %r959, %r952; 2026-02-21T09:08:10.4859306Z mov.b32 %r956, %r952; 2026-02-21T09:08:10.4859617Z bra.uni $L__BB0_11; 2026-02-21T09:08:10.4860043Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:08:10.4860837Z .loc 1 0 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0:130 2026-02-21T09:08:10.4861544Z selp.b32 %r118, 0, %r956, %p8; 2026-02-21T09:08:10.4861930Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:08:10.4862287Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:08:10.4863000Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4863702Z shl.b32 %r125, %r953, 3; 2026-02-21T09:08:10.4864048Z add.s32 %r127, %r89, %r125; 2026-02-21T09:08:10.4864408Z add.s32 %r114, %r127, 114688; 2026-02-21T09:08:10.4865101Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4865770Z // begin inline asm 2026-02-21T09:08:10.4866060Z 2026-02-21T09:08:10.4866308Z { 2026-02-21T09:08:10.4866567Z .reg .pred complete; 2026-02-21T09:08:10.4866994Z waitLoop: 2026-02-21T09:08:10.4867414Z mbarrier.try_wait.parity.shared.b64 complete, [%r114], %r952; 2026-02-21T09:08:10.4867972Z @!complete bra.uni waitLoop; 2026-02-21T09:08:10.4868306Z } 2026-02-21T09:08:10.4868461Z 2026-02-21T09:08:10.4868595Z // end inline asm 2026-02-21T09:08:10.4869214Z .loc 1 28 130 // 
cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4869913Z add.s32 %r120, %r127, 114720; 2026-02-21T09:08:10.4870532Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4871188Z bar.sync 3, 64; 2026-02-21T09:08:10.4871498Z // begin inline asm 2026-02-21T09:08:10.4871927Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r120], 24576; 2026-02-21T09:08:10.4872443Z // end inline asm 2026-02-21T09:08:10.4873044Z .loc 1 51 31 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:51:31 2026-02-21T09:08:10.4873730Z shl.b32 %r128, %r953, 14; 2026-02-21T09:08:10.4874083Z add.s32 %r117, %r89, %r128; 2026-02-21T09:08:10.4874785Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4875447Z bar.sync 3, 64; 2026-02-21T09:08:10.4875762Z elect.sync %r129|%p13, -1; 2026-02-21T09:08:10.4876137Z and.pred %p10, %p12, %p13; 2026-02-21T09:08:10.4876483Z // begin inline asm 2026-02-21T09:08:10.4877291Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd10, {%r118, %r960}], [%r120]; 2026-02-21T09:08:10.4878160Z // end inline asm 2026-02-21T09:08:10.4878749Z .loc 1 52 44 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:52:44 2026-02-21T09:08:10.4879451Z shl.b32 %r130, %r953, 13; 2026-02-21T09:08:10.4879791Z add.s32 %r131, %r89, %r130; 2026-02-21T09:08:10.4880146Z add.s32 %r121, %r131, 65536; 2026-02-21T09:08:10.4880752Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4881409Z bar.sync 3, 64; 2026-02-21T09:08:10.4881723Z elect.sync %r132|%p14, -1; 2026-02-21T09:08:10.4882079Z and.pred %p11, %p12, %p14; 2026-02-21T09:08:10.4882432Z // begin inline asm 2026-02-21T09:08:10.4883208Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd11, {%r118, %r959}], [%r120]; 2026-02-21T09:08:10.4884075Z // end inline asm 2026-02-21T09:08:10.4884759Z .loc 
1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4885561Z add.s32 %r956, %r118, 32; 2026-02-21T09:08:10.4886159Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4886874Z add.s32 %r133, %r953, 1; 2026-02-21T09:08:10.4887229Z setp.eq.b32 %p15, %r133, 4; 2026-02-21T09:08:10.4887587Z selp.b32 %r953, 0, %r133, %p15; 2026-02-21T09:08:10.4887968Z selp.b32 %r134, 1, 0, %p15; 2026-02-21T09:08:10.4888320Z xor.b32 %r952, %r952, %r134; 2026-02-21T09:08:10.4888978Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4889683Z add.s32 %r951, %r951, -1; 2026-02-21T09:08:10.4890019Z setp.ne.b32 %p16, %r951, 0; 2026-02-21T09:08:10.4890360Z @%p16 bra $L__BB0_11; 2026-02-21T09:08:10.4890664Z bra.uni $L__BB0_14; 2026-02-21T09:08:10.4891145Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:08:10.4891705Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:10.4892191Z add.s32 %r100, %r958, 1; 2026-02-21T09:08:10.4892521Z setp.eq.b32 %p6, %r958, 63; 2026-02-21T09:08:10.4892887Z selp.b32 %r958, 0, %r100, %p6; 2026-02-21T09:08:10.4893263Z setp.ne.b32 %p7, %r958, 0; 2026-02-21T09:08:10.4893606Z setp.eq.b32 %p8, %r958, 0; 2026-02-21T09:08:10.4893945Z @%p7 bra $L__BB0_13; 2026-02-21T09:08:10.4894400Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:08:10.4894973Z add.s32 %r961, %r961, 2368; 2026-02-21T09:08:10.4895595Z .loc 1 34 35 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:34:35 2026-02-21T09:08:10.4896293Z shr.s32 %r101, %r961, 31; 2026-02-21T09:08:10.4896623Z shr.u32 %r102, %r101, 27; 2026-02-21T09:08:10.4896971Z add.s32 %r103, %r961, %r102; 2026-02-21T09:08:10.4897316Z shr.s32 %r104, %r103, 5; 2026-02-21T09:08:10.4897919Z .loc 1 35 33 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:35:33 2026-02-21T09:08:10.4898591Z shl.b32 %r105, %r104, 1; 2026-02-21T09:08:10.4899180Z .loc 1 36 39 // 
cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:36:39 2026-02-21T09:08:10.4899837Z sub.s32 %r106, 16, %r105; 2026-02-21T09:08:10.4900418Z .loc 1 36 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:36:52 2026-02-21T09:08:10.4901082Z min.s32 %r107, %r106, 2; 2026-02-21T09:08:10.4901683Z .loc 1 37 45 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:45 2026-02-21T09:08:10.4902329Z and.b32 %r108, %r103, -32; 2026-02-21T09:08:10.4902640Z sub.s32 %r109, %r961, %r108; 2026-02-21T09:08:10.4903196Z .loc 1 38 51 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:38:51 2026-02-21T09:08:10.4903827Z div.s32 %r110, %r109, %r107; 2026-02-21T09:08:10.4904412Z .loc 1 37 64 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:64 2026-02-21T09:08:10.4905143Z mul.lo.s32 %r111, %r110, %r107; 2026-02-21T09:08:10.4905468Z sub.s32 %r112, %r109, %r111; 2026-02-21T09:08:10.4906006Z .loc 1 37 30 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:37:30 2026-02-21T09:08:10.4906600Z add.s32 %r113, %r112, %r105; 2026-02-21T09:08:10.4907182Z .loc 1 39 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:39:27 2026-02-21T09:08:10.4907855Z shl.b32 %r959, %r113, 7; 2026-02-21T09:08:10.4908450Z .loc 1 41 27 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:41:27 2026-02-21T09:08:10.4909132Z shl.b32 %r960, %r110, 8; 2026-02-21T09:08:10.4909466Z bra.uni $L__BB0_13; 2026-02-21T09:08:10.4909829Z $L__BB0_14: // %._crit_edge 2026-02-21T09:08:10.4910353Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4911117Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4911960Z barrier.sync 1; 2026-02-21T09:08:10.4912315Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:08:10.4912744Z bra.uni $L__BB0_2; 2026-02-21T09:08:10.4913217Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:10.4913973Z .loc 1 19 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:19 
2026-02-21T09:08:10.4914654Z barrier.sync 1; 2026-02-21T09:08:10.4915014Z barrier.sync 1; 2026-02-21T09:08:10.4915318Z bra.uni $L__BB0_2; 2026-02-21T09:08:10.4915688Z $L__BB0_23: // %._crit_edge11 2026-02-21T09:08:10.4916455Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4917160Z barrier.sync 1; 2026-02-21T09:08:10.4917519Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:08:10.4918315Z .loc 1 53 52 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:53:52 2026-02-21T09:08:10.4919015Z bar.sync 0, 256; 2026-02-21T09:08:10.4919326Z // begin inline asm 2026-02-21T09:08:10.4919622Z 2026-02-21T09:08:10.4919869Z { 2026-02-21T09:08:10.4920125Z .reg .pred complete; 2026-02-21T09:08:10.4920448Z waitLoop: 2026-02-21T09:08:10.4920878Z mbarrier.try_wait.parity.shared.b64 complete, [%r932], %r972; 2026-02-21T09:08:10.4921455Z @!complete bra.uni waitLoop; 2026-02-21T09:08:10.4921796Z } 2026-02-21T09:08:10.4922014Z 2026-02-21T09:08:10.4922131Z // end inline asm 2026-02-21T09:08:10.4922739Z .loc 1 28 130 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:130 2026-02-21T09:08:10.4923437Z bar.sync 0, 256; 2026-02-21T09:08:10.4923743Z // begin inline asm 2026-02-21T09:08:10.4924127Z @%p103 mbarrier.inval.shared::cta.b64 [%r932]; 2026-02-21T09:08:10.4924576Z // end inline asm 2026-02-21T09:08:10.4924919Z // begin inline asm 2026-02-21T09:08:10.4925302Z @%p103 mbarrier.inval.shared::cta.b64 [%r339]; 2026-02-21T09:08:10.4925743Z // end inline asm 2026-02-21T09:08:10.4926046Z // begin inline asm 2026-02-21T09:08:10.4926418Z @%p103 mbarrier.inval.shared::cta.b64 [%r331]; 2026-02-21T09:08:10.4926846Z // end inline asm 2026-02-21T09:08:10.4927144Z bar.sync 0, 256; 2026-02-21T09:08:10.4927437Z // begin inline asm 2026-02-21T09:08:10.4927803Z @%p103 mbarrier.inval.shared::cta.b64 [%r332]; 2026-02-21T09:08:10.4927918Z // end inline asm 2026-02-21T09:08:10.4928034Z bar.sync 0, 256; 2026-02-21T09:08:10.4928163Z // begin 
inline asm 2026-02-21T09:08:10.4928342Z @%p103 mbarrier.inval.shared::cta.b64 [%r333]; 2026-02-21T09:08:10.4928453Z // end inline asm 2026-02-21T09:08:10.4928576Z bar.sync 0, 256; 2026-02-21T09:08:10.4928692Z // begin inline asm 2026-02-21T09:08:10.4928863Z @%p103 mbarrier.inval.shared::cta.b64 [%r334]; 2026-02-21T09:08:10.4928976Z // end inline asm 2026-02-21T09:08:10.4929103Z // begin inline asm 2026-02-21T09:08:10.4929278Z @%p103 mbarrier.inval.shared::cta.b64 [%r327]; 2026-02-21T09:08:10.4929392Z // end inline asm 2026-02-21T09:08:10.4929510Z bar.sync 0, 256; 2026-02-21T09:08:10.4929627Z // begin inline asm 2026-02-21T09:08:10.4929799Z @%p103 mbarrier.inval.shared::cta.b64 [%r328]; 2026-02-21T09:08:10.4929913Z // end inline asm 2026-02-21T09:08:10.4930035Z bar.sync 0, 256; 2026-02-21T09:08:10.4930151Z // begin inline asm 2026-02-21T09:08:10.4930323Z @%p103 mbarrier.inval.shared::cta.b64 [%r329]; 2026-02-21T09:08:10.4930446Z // end inline asm 2026-02-21T09:08:10.4930561Z bar.sync 0, 256; 2026-02-21T09:08:10.4930678Z // begin inline asm 2026-02-21T09:08:10.4930851Z @%p103 mbarrier.inval.shared::cta.b64 [%r330]; 2026-02-21T09:08:10.4930972Z // end inline asm 2026-02-21T09:08:10.4931388Z .loc 1 28 4 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:28:4 2026-02-21T09:08:10.4931500Z bar.sync 0, 256; 2026-02-21T09:08:10.4931629Z // begin inline asm 2026-02-21T09:08:10.4931904Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r944, 256; 2026-02-21T09:08:10.4932095Z // end inline asm 2026-02-21T09:08:10.4932275Z st.shared.b32 [global_smem+114776], 50529027; 2026-02-21T09:08:10.4932393Z barrier.sync 1; 2026-02-21T09:08:10.4932564Z $L__BB0_24: // %common.ret 2026-02-21T09:08:10.4933017Z .loc 1 0 0 // cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py:0 2026-02-21T09:08:10.4933133Z ret; 2026-02-21T09:08:10.4933247Z $L__tmp1: 2026-02-21T09:08:10.4933362Z $L__func_end0: 2026-02-21T09:08:10.4933546Z // -- End function 2026-02-21T09:08:10.4933647Z } 
2026-02-21T09:08:10.4934145Z .file 1 "/tmp/torchinductor_root/ev/cevpxfmw74fxlcmuprgmwdfwfsb6qlkbhzq4sq6vp77mu3vwhpiu.py" 2026-02-21T09:08:10.4934287Z .section .debug_abbrev 2026-02-21T09:08:10.4934388Z { 2026-02-21T09:08:10.4934571Z .b8 1 // Abbreviation Code 2026-02-21T09:08:10.4934879Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:08:10.4935066Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:10.4935234Z .b8 37 // DW_AT_producer 2026-02-21T09:08:10.4935394Z .b8 8 // DW_FORM_string 2026-02-21T09:08:10.4935567Z .b8 19 // DW_AT_language 2026-02-21T09:08:10.4935732Z .b8 5 // DW_FORM_data2 2026-02-21T09:08:10.4935943Z .b8 3 // DW_AT_name 2026-02-21T09:08:10.4936113Z .b8 8 // DW_FORM_string 2026-02-21T09:08:10.4936282Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:08:10.4936445Z .b8 6 // DW_FORM_data4 2026-02-21T09:08:10.4936605Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:08:10.4936776Z .b8 8 // DW_FORM_string 2026-02-21T09:08:10.4936926Z .b8 0 // EOM(1) 2026-02-21T09:08:10.4937066Z .b8 0 // EOM(2) 2026-02-21T09:08:10.4937215Z .b8 0 // EOM(3) 2026-02-21T09:08:10.4937317Z } 2026-02-21T09:08:10.4937442Z .section .debug_info 2026-02-21T09:08:10.4937556Z { 2026-02-21T09:08:10.4937733Z .b32 104 // Length of Unit 2026-02-21T09:08:10.4937912Z .b8 2 // DWARF version number 2026-02-21T09:08:10.4938019Z .b8 0 2026-02-21T09:08:10.4938291Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:08:10.4938481Z .b8 8 // Address Size (in bytes) 2026-02-21T09:08:10.4938704Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:08:10.4938892Z .b8 116 // DW_AT_producer 2026-02-21T09:08:10.4939003Z .b8 114 2026-02-21T09:08:10.4939110Z .b8 105 2026-02-21T09:08:10.4939215Z .b8 116 2026-02-21T09:08:10.4939326Z .b8 111 2026-02-21T09:08:10.4939427Z .b8 110 2026-02-21T09:08:10.4939526Z .b8 0 2026-02-21T09:08:10.4939691Z .b8 2 // DW_AT_language 2026-02-21T09:08:10.4939791Z .b8 0 2026-02-21T09:08:10.4939948Z .b8 99 // DW_AT_name 2026-02-21T09:08:10.4940051Z .b8 101 2026-02-21T09:08:10.4940164Z .b8 118 2026-02-21T09:08:10.4940268Z .b8 112 2026-02-21T09:08:10.4940370Z .b8 120 2026-02-21T09:08:10.4940486Z .b8 102 2026-02-21T09:08:10.4940587Z .b8 109 2026-02-21T09:08:10.4940689Z .b8 119 2026-02-21T09:08:10.4940790Z .b8 55 2026-02-21T09:08:10.4940902Z .b8 52 2026-02-21T09:08:10.4941003Z .b8 102 2026-02-21T09:08:10.4941106Z .b8 120 2026-02-21T09:08:10.4941217Z .b8 108 2026-02-21T09:08:10.4941317Z .b8 99 2026-02-21T09:08:10.4941418Z .b8 109 2026-02-21T09:08:10.4941521Z .b8 117 2026-02-21T09:08:10.4941632Z .b8 112 2026-02-21T09:08:10.4941732Z .b8 114 2026-02-21T09:08:10.4941853Z .b8 103 2026-02-21T09:08:10.4942062Z .b8 109 2026-02-21T09:08:10.4942197Z .b8 119 2026-02-21T09:08:10.4942319Z .b8 100 2026-02-21T09:08:10.4942464Z .b8 102 2026-02-21T09:08:10.4942595Z .b8 119 2026-02-21T09:08:10.4942728Z .b8 102 2026-02-21T09:08:10.4942849Z .b8 115 2026-02-21T09:08:10.4942971Z .b8 98 2026-02-21T09:08:10.4943131Z .b8 54 2026-02-21T09:08:10.4943234Z .b8 113 2026-02-21T09:08:10.4943336Z .b8 108 2026-02-21T09:08:10.4943447Z .b8 107 2026-02-21T09:08:10.4943550Z .b8 98 2026-02-21T09:08:10.4943648Z .b8 104 2026-02-21T09:08:10.4943752Z .b8 122 2026-02-21T09:08:10.4943865Z .b8 113 2026-02-21T09:08:10.4943965Z .b8 52 2026-02-21T09:08:10.4944066Z .b8 115 2026-02-21T09:08:10.4944177Z .b8 113 2026-02-21T09:08:10.4944277Z .b8 54 2026-02-21T09:08:10.4944379Z .b8 118 
2026-02-21T09:08:10.4944478Z .b8 112 2026-02-21T09:08:10.4944588Z .b8 55 2026-02-21T09:08:10.4944754Z .b8 55 2026-02-21T09:08:10.4944857Z .b8 109 2026-02-21T09:08:10.4944961Z .b8 117 2026-02-21T09:08:10.4945073Z .b8 51 2026-02-21T09:08:10.4945256Z .b8 118 2026-02-21T09:08:10.4945359Z .b8 119 2026-02-21T09:08:10.4945469Z .b8 104 2026-02-21T09:08:10.4945569Z .b8 112 2026-02-21T09:08:10.4945669Z .b8 105 2026-02-21T09:08:10.4945768Z .b8 117 2026-02-21T09:08:10.4945877Z .b8 46 2026-02-21T09:08:10.4945977Z .b8 112 2026-02-21T09:08:10.4946079Z .b8 121 2026-02-21T09:08:10.4946191Z .b8 0 2026-02-21T09:08:10.4946396Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:08:10.4946556Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:08:10.4946716Z .b8 116 2026-02-21T09:08:10.4946829Z .b8 109 2026-02-21T09:08:10.4946929Z .b8 112 2026-02-21T09:08:10.4947029Z .b8 47 2026-02-21T09:08:10.4947139Z .b8 116 2026-02-21T09:08:10.4947243Z .b8 111 2026-02-21T09:08:10.4947343Z .b8 114 2026-02-21T09:08:10.4947443Z .b8 99 2026-02-21T09:08:10.4947554Z .b8 104 2026-02-21T09:08:10.4947655Z .b8 105 2026-02-21T09:08:10.4947755Z .b8 110 2026-02-21T09:08:10.4947855Z .b8 100 2026-02-21T09:08:10.4947965Z .b8 117 2026-02-21T09:08:10.4948067Z .b8 99 2026-02-21T09:08:10.4948168Z .b8 116 2026-02-21T09:08:10.4948278Z .b8 111 2026-02-21T09:08:10.4948378Z .b8 114 2026-02-21T09:08:10.4948478Z .b8 95 2026-02-21T09:08:10.4948578Z .b8 114 2026-02-21T09:08:10.4948701Z .b8 111 2026-02-21T09:08:10.4948801Z .b8 111 2026-02-21T09:08:10.4948897Z .b8 116 2026-02-21T09:08:10.4949007Z .b8 47 2026-02-21T09:08:10.4949103Z .b8 101 2026-02-21T09:08:10.4949200Z .b8 118 2026-02-21T09:08:10.4949294Z .b8 0 2026-02-21T09:08:10.4949402Z } 2026-02-21T09:08:10.4949534Z .section .debug_macinfo { } 2026-02-21T09:08:10.4949546Z 2026-02-21T09:08:10.4949705Z ================================================================ 2026-02-21T09:08:10.4949947Z please share the reproducer above with Triton project. 
2026-02-21T09:08:10.4949955Z 2026-02-21T09:08:10.4950707Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 45/45 11.2 configs/s 2026-02-21T09:08:11.6147695Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 860.2 2026-02-21T09:08:11.6148530Z configs/s 2026-02-21T09:08:11.7229051Z [192s] Generation 8 complete: 2026-02-21T09:08:11.7229515Z error=14 2026-02-21T09:08:11.7229772Z ok=34 2026-02-21T09:08:11.7230041Z min=0.0440 2026-02-21T09:08:11.7230331Z mid=0.0676 2026-02-21T09:08:11.7230670Z max=13.0785 2026-02-21T09:08:11.7231008Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:08:11.7231592Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:08:11.7232158Z 'l2_groupings': [1], 2026-02-21T09:08:11.7232551Z 'load_eviction_policies': ['first', ''], 2026-02-21T09:08:11.7232962Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:11.7233301Z 'num_stages': 7, 2026-02-21T09:08:11.7233584Z 'num_warps': 8, 2026-02-21T09:08:11.7233894Z 'pid_type': 'flat', 2026-02-21T09:08:11.7234205Z 'range_flattens': [None, None], 2026-02-21T09:08:11.7234615Z 'range_multi_buffers': [None, None], 2026-02-21T09:08:11.7235507Z 'range_num_stages': [0, 0], 2026-02-21T09:08:11.7235860Z 'range_unroll_factors': [0, 0], 2026-02-21T09:08:11.7236847Z 'range_warp_specializes': [None, False]} 2026-02-21T09:08:11.7286371Z [192s] Fitting surrogate: 734 points, 734 targets 2026-02-21T09:08:12.7553322Z [193s] Generation 9 starting: 33 neighbors, 2 active search path(s) 2026-02-21T09:08:20.7985453Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35/35 2.8 configs/s 2026-02-21T09:08:22.7496254Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 35/35 17.3 configs/s 2026-02-21T09:08:22.9590079Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 4977.2 2026-02-21T09:08:22.9593211Z configs/s 2026-02-21T09:08:22.9822499Z [203s] Generation 9 complete: 2026-02-21T09:08:22.9822890Z error=11 2026-02-21T09:08:22.9823130Z ok=25 
2026-02-21T09:08:22.9823362Z min=0.0543 2026-02-21T09:08:22.9823591Z mid=0.0635 2026-02-21T09:08:22.9823821Z max=13.1748 2026-02-21T09:08:22.9824498Z best={'block_sizes': [256, 128, 32], 2026-02-21T09:08:22.9825432Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:08:22.9825947Z 'l2_groupings': [2], 2026-02-21T09:08:22.9826264Z 'load_eviction_policies': ['first', ''], 2026-02-21T09:08:22.9826654Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:22.9826959Z 'num_stages': 5, 2026-02-21T09:08:22.9827231Z 'num_warps': 8, 2026-02-21T09:08:22.9827491Z 'pid_type': 'flat', 2026-02-21T09:08:22.9827794Z 'range_flattens': [None, None], 2026-02-21T09:08:22.9828314Z 'range_multi_buffers': [None, False], 2026-02-21T09:08:22.9828698Z 'range_num_stages': [0, 0], 2026-02-21T09:08:22.9829021Z 'range_unroll_factors': [0, 0], 2026-02-21T09:08:22.9829386Z 'range_warp_specializes': [None, False]} 2026-02-21T09:08:22.9883685Z [203s] Fitting surrogate: 770 points, 770 targets 2026-02-21T09:08:23.6067016Z [203s] Generation 10 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:08:34.9533790Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0.9 configs/s 2026-02-21T09:08:36.1474014Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 14.7 configs/s 2026-02-21T09:08:36.3037379Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5908.2 2026-02-21T09:08:36.3038037Z configs/s 2026-02-21T09:08:36.3347730Z [216s] Generation 10 complete: 2026-02-21T09:08:36.3348075Z error=2 2026-02-21T09:08:36.3348417Z ok=16 2026-02-21T09:08:36.3348652Z min=0.0369 2026-02-21T09:08:36.3348911Z mid=0.1660 2026-02-21T09:08:36.3349176Z max=23.6242 2026-02-21T09:08:36.3349445Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:08:36.3349918Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:08:36.3350420Z 'l2_groupings': [2], 2026-02-21T09:08:36.3350770Z 'load_eviction_policies': ['first', ''], 
2026-02-21T09:08:36.3351146Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:36.3351416Z 'num_stages': 5, 2026-02-21T09:08:36.3351676Z 'num_warps': 4, 2026-02-21T09:08:36.3351926Z 'pid_type': 'flat', 2026-02-21T09:08:36.3352203Z 'range_flattens': [None, None], 2026-02-21T09:08:36.3352529Z 'range_multi_buffers': [None, False], 2026-02-21T09:08:36.3352873Z 'range_num_stages': [0, 0], 2026-02-21T09:08:36.3353171Z 'range_unroll_factors': [0, 0], 2026-02-21T09:08:36.3353517Z 'range_warp_specializes': [None, None]} 2026-02-21T09:08:36.3373098Z [216s] Fitting surrogate: 788 points, 788 targets 2026-02-21T09:08:36.8555647Z [217s] Generation 11 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:08:40.5593173Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 4.1 configs/s 2026-02-21T09:08:41.2410698Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 24.2 configs/s 2026-02-21T09:08:41.5471996Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3148.2 2026-02-21T09:08:41.5472409Z configs/s 2026-02-21T09:08:41.5870011Z [221s] Generation 11 complete: 2026-02-21T09:08:41.5870990Z error=7 2026-02-21T09:08:41.5871320Z ok=10 2026-02-21T09:08:41.5871643Z min=0.0368 2026-02-21T09:08:41.5871981Z mid=0.0635 2026-02-21T09:08:41.5872310Z max=12.4242 2026-02-21T09:08:41.5872686Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:08:41.5873543Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:08:41.5874252Z 'l2_groupings': [2], 2026-02-21T09:08:41.5875165Z 'load_eviction_policies': ['first', ''], 2026-02-21T09:08:41.5875849Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:41.5876296Z 'num_stages': 5, 2026-02-21T09:08:41.5876683Z 'num_warps': 8, 2026-02-21T09:08:41.5877081Z 'pid_type': 'flat', 2026-02-21T09:08:41.5877506Z 'range_flattens': [None, None], 2026-02-21T09:08:41.5878021Z 'range_multi_buffers': [None, None], 2026-02-21T09:08:41.5878527Z 'range_num_stages': [0, 0], 2026-02-21T09:08:41.5879008Z 
'range_unroll_factors': [0, 0], 2026-02-21T09:08:41.5879690Z 'range_warp_specializes': [None, None]} 2026-02-21T09:08:41.5895764Z [221s] Fitting surrogate: 805 points, 805 targets 2026-02-21T09:08:41.9660824Z [222s] Generation 12 starting: 15 neighbors, 1 active search path(s) 2026-02-21T09:08:46.2535979Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 3.5 configs/s 2026-02-21T09:08:46.8796902Z 2026-02-21T09:08:46.8796916Z 2026-02-21T09:08:46.8797385Z ================================================================ 2026-02-21T09:08:46.8797880Z Internal Triton PTX codegen error 2026-02-21T09:08:46.8798243Z `ptxas` stderr: 2026-02-21T09:08:46.8799099Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 208 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:46.8800078Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:46.8800379Z 2026-02-21T09:08:46.8801211Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpndn_sp91.ptx -o /tmp/tmpndn_sp91.ptx.o 2026-02-21T09:08:46.8802146Z 2026-02-21T09:08:46.8802170Z 2026-02-21T09:08:46.8802270Z // 2026-02-21T09:08:46.8802514Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:08:46.8802840Z // 2026-02-21T09:08:46.8803639Z [227s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:08:46.8806270Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=5, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]), static_shapes=True) 2026-02-21T09:08:46.8808791Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:08:46.8809267Z `ptxas` stderr: 2026-02-21T09:08:46.8810090Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 208 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:08:46.8811055Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:46.8811347Z 2026-02-21T09:08:46.8812573Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpndn_sp91.ptx -o /tmp/tmpndn_sp91.ptx.o 2026-02-21T09:08:46.8813505Z 2026-02-21T09:08:46.8813772Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:08:46.8814146Z 2026-02-21T09:08:46.8814242Z .version 8.7 2026-02-21T09:08:46.8814482Z .target sm_100a 2026-02-21T09:08:46.8814791Z .address_size 64 2026-02-21T09:08:46.8814969Z 2026-02-21T09:08:46.8815198Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:08:46.8815674Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:08:46.8816082Z // @_helion_matmul 2026-02-21T09:08:46.8816659Z .visible .entry _helion_matmul( 2026-02-21T09:08:46.8817044Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:08:46.8817543Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:08:46.8818012Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:08:46.8818479Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:08:46.8819114Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:08:46.8819504Z ) 2026-02-21T09:08:46.8819705Z .reqntid 256 2026-02-21T09:08:46.8819936Z .maxnreg 32 2026-02-21T09:08:46.8820137Z { 2026-02-21T09:08:46.8820354Z .reg .pred %p<101>; 2026-02-21T09:08:46.8820606Z .reg .b32 %r<1608>; 2026-02-21T09:08:46.8820851Z .reg .b64 %rd<656>; 2026-02-21T09:08:46.8821102Z $L__func_begin0: 2026-02-21T09:08:46.8821248Z 2026-02-21T09:08:46.8821336Z // %bb.0: 2026-02-21T09:08:46.8821943Z .loc 1 19 0 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:19 2026-02-21T09:08:46.8822504Z mov.u32 %r1, %tid.x; 2026-02-21T09:08:46.8822816Z ld.param.b64 %rd16, [_helion_matmul_param_1]; 2026-02-21T09:08:46.8823182Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:08:46.8823480Z mov.b32 %r76, global_smem; 2026-02-21T09:08:46.8823770Z // begin inline asm 2026-02-21T09:08:46.8824218Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r76], 512; 2026-02-21T09:08:46.8824791Z // end inline asm 2026-02-21T09:08:46.8825088Z ld.param.b64 %rd33, [_helion_matmul_param_3]; 2026-02-21T09:08:46.8825449Z bar.sync 0; 2026-02-21T09:08:46.8825699Z 
ld.shared.b32 %r1600, [global_smem]; 2026-02-21T09:08:46.8826015Z bar.sync 0; 2026-02-21T09:08:46.8826237Z // begin inline asm 2026-02-21T09:08:46.8826615Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:08:46.8827037Z // end inline asm 2026-02-21T09:08:46.8827532Z .loc 1 21 67 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:21:67 2026-02-21T09:08:46.8828111Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:08:46.8828384Z mov.u32 %r85, %ctaid.y; 2026-02-21T09:08:46.8828656Z mov.u32 %r86, %ctaid.z; 2026-02-21T09:08:46.8828922Z mov.u32 %r87, %nctaid.x; 2026-02-21T09:08:46.8829205Z mov.u32 %r88, %nctaid.y; 2026-02-21T09:08:46.8829481Z mad.lo.s32 %r89, %r86, %r88, %r85; 2026-02-21T09:08:46.8829804Z mad.lo.s32 %r90, %r89, %r87, %r3; 2026-02-21T09:08:46.8830101Z shl.b32 %r91, %r90, 7; 2026-02-21T09:08:46.8830380Z cvt.s64.s32 %rd34, %r91; 2026-02-21T09:08:46.8830663Z add.s64 %rd30, %rd33, %rd34; 2026-02-21T09:08:46.8830955Z shl.b32 %r92, %r1, 2; 2026-02-21T09:08:46.8831220Z add.s32 %r77, %r76, %r92; 2026-02-21T09:08:46.8831484Z mov.b32 %r94, 0; 2026-02-21T09:08:46.8831725Z // begin inline asm 2026-02-21T09:08:46.8831993Z @%p1 st.shared.b32 [ %r77 + 0 ], %r94; 2026-02-21T09:08:46.8832334Z // end inline asm 2026-02-21T09:08:46.8832585Z bar.warp.sync -1; 2026-02-21T09:08:46.8832861Z setp.eq.b32 %p93, %r1, 0; 2026-02-21T09:08:46.8833164Z cvt.u64.u32 %rd15, %r76; 2026-02-21T09:08:46.8833462Z // begin inline asm 2026-02-21T09:08:46.8834001Z @%p93 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd15 + 0 ], %rd16; 2026-02-21T09:08:46.8834569Z // end inline asm 2026-02-21T09:08:46.8835060Z // begin inline asm 2026-02-21T09:08:46.8835497Z @%p93 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1; 2026-02-21T09:08:46.8836012Z // end inline asm 2026-02-21T09:08:46.8836253Z mov.b32 %r79, 32; 2026-02-21T09:08:46.8836522Z // begin inline asm 2026-02-21T09:08:46.8836969Z @%p93 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r79; 2026-02-21T09:08:46.8837495Z // end inline asm 2026-02-21T09:08:46.8837733Z mov.b32 %r80, 256; 2026-02-21T09:08:46.8837968Z // begin inline asm 2026-02-21T09:08:46.8838419Z @%p93 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r80; 2026-02-21T09:08:46.8838940Z // end inline asm 2026-02-21T09:08:46.8839310Z mov.b32 %r81, 2048; 2026-02-21T09:08:46.8839553Z // begin inline asm 2026-02-21T09:08:46.8840022Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r81; 2026-02-21T09:08:46.8840553Z // end inline asm 2026-02-21T09:08:46.8840805Z // begin inline asm 2026-02-21T09:08:46.8841268Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r81; 2026-02-21T09:08:46.8841792Z // end inline asm 2026-02-21T09:08:46.8842136Z mov.b64 %rd23, 4096; 2026-02-21T09:08:46.8842396Z // begin inline asm 2026-02-21T09:08:46.8842881Z @%p93 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd15 + 0 ], 0x0, %rd23; 2026-02-21T09:08:46.8843433Z // end inline asm 2026-02-21T09:08:46.8843666Z mov.b32 %r83, 1; 2026-02-21T09:08:46.8843901Z // begin inline asm 2026-02-21T09:08:46.8844377Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r83; 2026-02-21T09:08:46.8845154Z // end inline asm 2026-02-21T09:08:46.8845404Z // begin inline asm 2026-02-21T09:08:46.8845897Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r83; 2026-02-21T09:08:46.8846453Z // end inline asm 2026-02-21T09:08:46.8846694Z // begin inline asm 2026-02-21T09:08:46.8847145Z @%p93 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x6; 2026-02-21T09:08:46.8847663Z // end inline asm 2026-02-21T09:08:46.8847905Z // begin inline asm 2026-02-21T09:08:46.8848384Z @%p93 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 
2026-02-21T09:08:46.8848944Z // end inline asm 2026-02-21T09:08:46.8849177Z // begin inline asm 2026-02-21T09:08:46.8849633Z @%p93 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x2; 2026-02-21T09:08:46.8850147Z // end inline asm 2026-02-21T09:08:46.8850390Z // begin inline asm 2026-02-21T09:08:46.8850833Z @%p93 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T09:08:46.8851342Z // end inline asm 2026-02-21T09:08:46.8851585Z // begin inline asm 2026-02-21T09:08:46.8852254Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd30 + 0 ], [ %rd15 + 0 ], 0x80; 2026-02-21T09:08:46.8853001Z // end inline asm 2026-02-21T09:08:46.8853239Z // begin inline asm 2026-02-21T09:08:46.8853628Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd30 + 0 ], 0x80; 2026-02-21T09:08:46.8854113Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:08:46.8854459Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:08:46.8854844Z // end inline asm 2026-02-21T09:08:46.8855072Z bar.sync 0; 2026-02-21T09:08:46.8855324Z cvta.global.u64 %rd82, %rd30; 2026-02-21T09:08:46.8855858Z .loc 1 27 99 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:27:99 2026-02-21T09:08:46.8856442Z setp.gt.u32 %p21, %r3, 127; 2026-02-21T09:08:46.8856727Z @%p21 bra $L__BB0_8; 2026-02-21T09:08:46.8857027Z // %bb.1: // %.lr.ph 2026-02-21T09:08:46.8857610Z .loc 1 0 99 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:0:99 2026-02-21T09:08:46.8858220Z ld.param.b64 %rd13, [_helion_matmul_param_0]; 2026-02-21T09:08:46.8858816Z .loc 1 47 48 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:47:48 2026-02-21T09:08:46.8859502Z shl.b32 %r426, %r1, 3; 2026-02-21T09:08:46.8859780Z and.b32 %r427, %r426, 24; 2026-02-21T09:08:46.8860282Z .loc 1 39 45 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:39:45 2026-02-21T09:08:46.8860846Z and.b32 %r428, %r1, 31; 2026-02-21T09:08:46.8861122Z shl.b32 
%r429, %r428, 3; 2026-02-21T09:08:46.8861385Z and.b32 %r430, %r1, 224; 2026-02-21T09:08:46.8861663Z bfe.u32 %r431, %r1, 5, 3; 2026-02-21T09:08:46.8861926Z shr.u32 %r432, %r1, 2; 2026-02-21T09:08:46.8862191Z bfe.u32 %r4, %r1, 2, 6; 2026-02-21T09:08:46.8862453Z shr.u32 %r433, %r1, 5; 2026-02-21T09:08:46.8862841Z shl.b32 %r434, %r1, 4; 2026-02-21T09:08:46.8863101Z and.b32 %r435, %r434, 4080; 2026-02-21T09:08:46.8863381Z shl.b32 %r436, %r1, 1; 2026-02-21T09:08:46.8863641Z and.b32 %r437, %r436, 48; 2026-02-21T09:08:46.8863917Z xor.b32 %r5, %r435, %r437; 2026-02-21T09:08:46.8864206Z add.s32 %r372, %r76, %r5; 2026-02-21T09:08:46.8864473Z add.s32 %r374, %r372, 4096; 2026-02-21T09:08:46.8864857Z add.s32 %r376, %r372, 8192; 2026-02-21T09:08:46.8865145Z add.s32 %r378, %r372, 12288; 2026-02-21T09:08:46.8865538Z add.s32 %r385, %r372, 16384; 2026-02-21T09:08:46.8865826Z add.s32 %r387, %r372, 20480; 2026-02-21T09:08:46.8866104Z add.s32 %r389, %r372, 24576; 2026-02-21T09:08:46.8866372Z add.s32 %r391, %r372, 28672; 2026-02-21T09:08:46.8866650Z add.s32 %r398, %r372, 32768; 2026-02-21T09:08:46.8866919Z add.s32 %r400, %r372, 36864; 2026-02-21T09:08:46.8867197Z add.s32 %r402, %r372, 40960; 2026-02-21T09:08:46.8867472Z add.s32 %r404, %r372, 45056; 2026-02-21T09:08:46.8867844Z add.s32 %r411, %r372, 49152; 2026-02-21T09:08:46.8868143Z add.s32 %r413, %r372, 53248; 2026-02-21T09:08:46.8868408Z add.s32 %r415, %r372, 57344; 2026-02-21T09:08:46.8868681Z add.s32 %r417, %r372, 61440; 2026-02-21T09:08:46.8868951Z or.b32 %r6, %r427, 128; 2026-02-21T09:08:46.8869224Z add.s32 %r501, %r372, 65536; 2026-02-21T09:08:46.8869502Z add.s32 %r503, %r372, 69632; 2026-02-21T09:08:46.8869781Z add.s32 %r505, %r372, 73728; 2026-02-21T09:08:46.8870060Z add.s32 %r507, %r372, 77824; 2026-02-21T09:08:46.8870335Z shl.b32 %r439, %r1, 12; 2026-02-21T09:08:46.8870600Z and.b32 %r440, %r439, 28672; 2026-02-21T09:08:46.8870869Z or.b32 %r441, %r440, %r435; 2026-02-21T09:08:46.8871150Z xor.b32 %r442, %r441, 16; 
2026-02-21T09:08:46.8871414Z xor.b32 %r443, %r441, 32; 2026-02-21T09:08:46.8871685Z xor.b32 %r444, %r441, 48; 2026-02-21T09:08:46.8871946Z xor.b32 %r445, %r441, 64; 2026-02-21T09:08:46.8872219Z xor.b32 %r446, %r441, 80; 2026-02-21T09:08:46.8872484Z xor.b32 %r447, %r441, 96; 2026-02-21T09:08:46.8872758Z xor.b32 %r448, %r441, 112; 2026-02-21T09:08:46.8873038Z shl.b32 %r449, %r430, 7; 2026-02-21T09:08:46.8873296Z shl.b32 %r450, %r428, 4; 2026-02-21T09:08:46.8873566Z shr.u32 %r451, %r430, 1; 2026-02-21T09:08:46.8873827Z or.b32 %r452, %r449, %r450; 2026-02-21T09:08:46.8874121Z xor.b32 %r453, %r452, %r451; 2026-02-21T09:08:46.8874395Z add.s32 %r867, %r76, %r453; 2026-02-21T09:08:46.8874995Z .loc 1 27 99 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:27:99 2026-02-21T09:08:46.8875558Z cvt.u64.u32 %rd55, %r427; 2026-02-21T09:08:46.8876062Z .loc 1 34 33 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:34:33 2026-02-21T09:08:46.8876622Z shr.u32 %r454, %r3, 4; 2026-02-21T09:08:46.8876887Z and.b32 %r455, %r454, 6; 2026-02-21T09:08:46.8877381Z .loc 1 36 64 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:36:64 2026-02-21T09:08:46.8877934Z and.b32 %r456, %r3, 1; 2026-02-21T09:08:46.8878419Z .loc 1 36 30 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:36:30 2026-02-21T09:08:46.8878976Z or.b32 %r457, %r455, %r456; 2026-02-21T09:08:46.8879481Z .loc 1 38 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:38:27 2026-02-21T09:08:46.8880153Z shl.b32 %r512, %r457, 8; 2026-02-21T09:08:46.8880636Z .loc 1 40 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:40:27 2026-02-21T09:08:46.8881196Z shl.b32 %r458, %r3, 7; 2026-02-21T09:08:46.8881455Z and.b32 %r459, %r458, 3840; 2026-02-21T09:08:46.8881958Z .loc 1 41 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:41:32 2026-02-21T09:08:46.8882502Z or.b32 %r460, %r459, %r4; 2026-02-21T09:08:46.8882777Z or.b32 %r461, %r432, %r459; 2026-02-21T09:08:46.8883053Z or.b32 
%r29, %r459, %r431; 2026-02-21T09:08:46.8883559Z .loc 1 51 53 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:53 2026-02-21T09:08:46.8884245Z shl.b32 %r462, %r460, 11; 2026-02-21T09:08:46.8884511Z shl.b32 %r463, %r461, 11; 2026-02-21T09:08:46.8884861Z or.b32 %r464, %r463, 393216; 2026-02-21T09:08:46.8885367Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.8885969Z shfl.sync.idx.b32 %r61, %r433, 0, 31, -1; 2026-02-21T09:08:46.8886303Z shl.b32 %r465, %r61, 21; 2026-02-21T09:08:46.8886585Z and.b32 %r466, %r465, 6291456; 2026-02-21T09:08:46.8887029Z add.s32 %r467, %r466, %r1600; 2026-02-21T09:08:46.8887325Z shl.b32 %r468, %r61, 6; 2026-02-21T09:08:46.8887600Z and.b32 %r469, %r468, 256; 2026-02-21T09:08:46.8887875Z add.s32 %r862, %r467, %r469; 2026-02-21T09:08:46.8888164Z mov.pred %p61, -1; 2026-02-21T09:08:46.8888420Z // begin inline asm 2026-02-21T09:08:46.8889197Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 0], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8889938Z // end inline asm 2026-02-21T09:08:46.8890187Z // begin inline asm 2026-02-21T09:08:46.8890840Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 16], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8891558Z // end inline asm 2026-02-21T09:08:46.8891805Z // begin inline asm 2026-02-21T09:08:46.8892435Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 32], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8893151Z // end inline asm 2026-02-21T09:08:46.8893383Z // begin inline asm 2026-02-21T09:08:46.8894015Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 48], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8894842Z // end inline asm 
2026-02-21T09:08:46.8895084Z // begin inline asm 2026-02-21T09:08:46.8895734Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 64], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8896455Z // end inline asm 2026-02-21T09:08:46.8896703Z // begin inline asm 2026-02-21T09:08:46.8897334Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 80], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8898061Z // end inline asm 2026-02-21T09:08:46.8898305Z // begin inline asm 2026-02-21T09:08:46.8898944Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 96], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8899664Z // end inline asm 2026-02-21T09:08:46.8899898Z // begin inline asm 2026-02-21T09:08:46.8900549Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 112], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8901283Z // end inline asm 2026-02-21T09:08:46.8901521Z // begin inline asm 2026-02-21T09:08:46.8902168Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 128], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8902883Z // end inline asm 2026-02-21T09:08:46.8903241Z // begin inline asm 2026-02-21T09:08:46.8903880Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 144], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8904609Z // end inline asm 2026-02-21T09:08:46.8904913Z // begin inline asm 2026-02-21T09:08:46.8905565Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 160], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8906288Z // end inline asm 
2026-02-21T09:08:46.8906525Z // begin inline asm 2026-02-21T09:08:46.8907183Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 176], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8908010Z // end inline asm 2026-02-21T09:08:46.8908256Z // begin inline asm 2026-02-21T09:08:46.8908904Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 192], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8909625Z // end inline asm 2026-02-21T09:08:46.8909870Z // begin inline asm 2026-02-21T09:08:46.8910605Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 208], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8911343Z // end inline asm 2026-02-21T09:08:46.8911574Z // begin inline asm 2026-02-21T09:08:46.8912212Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 224], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8912928Z // end inline asm 2026-02-21T09:08:46.8913255Z // begin inline asm 2026-02-21T09:08:46.8913916Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 240], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:08:46.8914625Z // end inline asm 2026-02-21T09:08:46.8914994Z // begin inline asm 2026-02-21T09:08:46.8915269Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:08:46.8915566Z // end inline asm 2026-02-21T09:08:46.8915793Z bar.sync 0; 2026-02-21T09:08:46.8916270Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8916844Z add.s32 %r1602, %r76, 163888; 2026-02-21T09:08:46.8917125Z // begin inline asm 2026-02-21T09:08:46.8917432Z @%p93 mbarrier.init.shared::cta.b64 [%r1602], 1; 2026-02-21T09:08:46.8917790Z // end inline asm 2026-02-21T09:08:46.8918022Z bar.sync 0; 
2026-02-21T09:08:46.8918250Z add.s32 %r366, %r76, 163896; 2026-02-21T09:08:46.8918534Z // begin inline asm 2026-02-21T09:08:46.8918833Z @%p93 mbarrier.init.shared::cta.b64 [%r366], 1; 2026-02-21T09:08:46.8919203Z // end inline asm 2026-02-21T09:08:46.8919448Z add.s32 %r367, %r76, 163840; 2026-02-21T09:08:46.8919719Z // begin inline asm 2026-02-21T09:08:46.8920022Z @%p93 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:08:46.8920371Z // end inline asm 2026-02-21T09:08:46.8920605Z bar.sync 0; 2026-02-21T09:08:46.8920830Z add.s32 %r368, %r76, 163848; 2026-02-21T09:08:46.8921117Z // begin inline asm 2026-02-21T09:08:46.8921407Z @%p93 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:08:46.8921759Z // end inline asm 2026-02-21T09:08:46.8921990Z bar.sync 0; 2026-02-21T09:08:46.8922213Z add.s32 %r369, %r76, 163856; 2026-02-21T09:08:46.8922490Z // begin inline asm 2026-02-21T09:08:46.8922777Z @%p93 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:08:46.8923130Z // end inline asm 2026-02-21T09:08:46.8923353Z bar.sync 0; 2026-02-21T09:08:46.8923583Z add.s32 %r370, %r76, 163864; 2026-02-21T09:08:46.8923858Z // begin inline asm 2026-02-21T09:08:46.8924164Z @%p93 mbarrier.init.shared::cta.b64 [%r370], 1; 2026-02-21T09:08:46.8924502Z // end inline asm 2026-02-21T09:08:46.8924816Z bar.sync 0; 2026-02-21T09:08:46.8925052Z add.s32 %r509, %r76, 163872; 2026-02-21T09:08:46.8925321Z // begin inline asm 2026-02-21T09:08:46.8925738Z @%p93 mbarrier.init.shared::cta.b64 [%r509], 1; 2026-02-21T09:08:46.8926083Z // end inline asm 2026-02-21T09:08:46.8926562Z .loc 1 51 60 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:60 2026-02-21T09:08:46.8927134Z or.b32 %r470, %r462, %r427; 2026-02-21T09:08:46.8927429Z or.b32 %r471, %r464, %r427; 2026-02-21T09:08:46.8927941Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.8928530Z mad.wide.u32 %rd35, %r470, 2, %rd13; 2026-02-21T09:08:46.8928867Z cvt.u64.u32 %rd2, %r462; 
2026-02-21T09:08:46.8929143Z or.b64 %rd56, %rd2, %rd55; 2026-02-21T09:08:46.8929439Z shl.b64 %rd57, %rd56, 1; 2026-02-21T09:08:46.8929828Z add.s64 %rd3, %rd13, %rd57; 2026-02-21T09:08:46.8930115Z add.s64 %rd36, %rd3, 262144; 2026-02-21T09:08:46.8930392Z add.s64 %rd37, %rd3, 524288; 2026-02-21T09:08:46.8930691Z mad.wide.u32 %rd38, %r471, 2, %rd13; 2026-02-21T09:08:46.8930998Z mov.b32 %r502, 16; 2026-02-21T09:08:46.8931489Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.8932054Z // begin inline asm 2026-02-21T09:08:46.8932526Z cp.async.cg.shared.global [ %r372 + 0 ], [ %rd35 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8932975Z // end inline asm 2026-02-21T09:08:46.8933208Z // begin inline asm 2026-02-21T09:08:46.8933580Z cp.async.cg.shared.global [ %r374 + 0 ], [ %rd36 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8934003Z // end inline asm 2026-02-21T09:08:46.8934239Z // begin inline asm 2026-02-21T09:08:46.8934593Z cp.async.cg.shared.global [ %r376 + 0 ], [ %rd37 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8935221Z // end inline asm 2026-02-21T09:08:46.8935481Z // begin inline asm 2026-02-21T09:08:46.8935867Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd38 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8936305Z // end inline asm 2026-02-21T09:08:46.8936552Z cp.async.commit_group; 2026-02-21T09:08:46.8937063Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8937624Z bar.sync 0; 2026-02-21T09:08:46.8937858Z // begin inline asm 2026-02-21T09:08:46.8938214Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r367], 16384; 2026-02-21T09:08:46.8938639Z // end inline asm 2026-02-21T09:08:46.8939114Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.8939670Z // begin inline asm 2026-02-21T09:08:46.8939958Z fence.proxy.async.shared::cta; 2026-02-21T09:08:46.8940254Z // end inline asm 2026-02-21T09:08:46.8940495Z bar.sync 0; 2026-02-21T09:08:46.8940744Z 
elect.sync %r472|%p54, -1; 2026-02-21T09:08:46.8941048Z and.pred %p46, %p1, %p54; 2026-02-21T09:08:46.8941330Z add.s32 %r381, %r76, 81920; 2026-02-21T09:08:46.8941610Z // begin inline asm 2026-02-21T09:08:46.8942258Z @%p46 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r381], [%rd82, {%r94, %r512}], [%r367]; 2026-02-21T09:08:46.8942970Z // end inline asm 2026-02-21T09:08:46.8943450Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.8944008Z add.s64 %rd40, %rd3, 64; 2026-02-21T09:08:46.8944289Z or.b32 %r473, %r470, 32; 2026-02-21T09:08:46.8944571Z mad.wide.u32 %rd58, %r473, 2, %rd13; 2026-02-21T09:08:46.8944984Z add.s64 %rd41, %rd58, 262144; 2026-02-21T09:08:46.8945279Z add.s64 %rd42, %rd58, 524288; 2026-02-21T09:08:46.8945560Z cvt.u64.u32 %rd4, %r464; 2026-02-21T09:08:46.8945839Z or.b64 %rd59, %rd4, %rd55; 2026-02-21T09:08:46.8946113Z shl.b64 %rd60, %rd59, 1; 2026-02-21T09:08:46.8946396Z add.s64 %rd5, %rd13, %rd60; 2026-02-21T09:08:46.8946680Z add.s64 %rd43, %rd5, 64; 2026-02-21T09:08:46.8947185Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.8947736Z // begin inline asm 2026-02-21T09:08:46.8948113Z cp.async.cg.shared.global [ %r385 + 0 ], [ %rd40 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8948656Z // end inline asm 2026-02-21T09:08:46.8948888Z // begin inline asm 2026-02-21T09:08:46.8949260Z cp.async.cg.shared.global [ %r387 + 0 ], [ %rd41 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8949687Z // end inline asm 2026-02-21T09:08:46.8949927Z // begin inline asm 2026-02-21T09:08:46.8950284Z cp.async.cg.shared.global [ %r389 + 0 ], [ %rd42 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8950710Z // end inline asm 2026-02-21T09:08:46.8950940Z // begin inline asm 2026-02-21T09:08:46.8951300Z cp.async.cg.shared.global [ %r391 + 0 ], [ %rd43 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8951722Z // end inline asm 2026-02-21T09:08:46.8951984Z cp.async.commit_group; 
2026-02-21T09:08:46.8952603Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8953145Z bar.sync 0; 2026-02-21T09:08:46.8953382Z // begin inline asm 2026-02-21T09:08:46.8953732Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r368], 16384; 2026-02-21T09:08:46.8954149Z // end inline asm 2026-02-21T09:08:46.8954613Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.8955254Z bar.sync 0; 2026-02-21T09:08:46.8955600Z elect.sync %r474|%p55, -1; 2026-02-21T09:08:46.8955913Z and.pred %p48, %p1, %p55; 2026-02-21T09:08:46.8956204Z add.s32 %r394, %r76, 98304; 2026-02-21T09:08:46.8956476Z // begin inline asm 2026-02-21T09:08:46.8957113Z @%p48 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r394], [%rd82, {%r79, %r512}], [%r368]; 2026-02-21T09:08:46.8957810Z // end inline asm 2026-02-21T09:08:46.8958392Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.8958967Z add.s64 %rd45, %rd3, 128; 2026-02-21T09:08:46.8959251Z or.b32 %r475, %r470, 64; 2026-02-21T09:08:46.8959536Z mad.wide.u32 %rd61, %r475, 2, %rd13; 2026-02-21T09:08:46.8959859Z add.s64 %rd46, %rd61, 262144; 2026-02-21T09:08:46.8960156Z add.s64 %rd47, %rd61, 524288; 2026-02-21T09:08:46.8960433Z add.s64 %rd48, %rd5, 128; 2026-02-21T09:08:46.8960943Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.8961500Z // begin inline asm 2026-02-21T09:08:46.8961871Z cp.async.cg.shared.global [ %r398 + 0 ], [ %rd45 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8962300Z // end inline asm 2026-02-21T09:08:46.8962549Z // begin inline asm 2026-02-21T09:08:46.8962914Z cp.async.cg.shared.global [ %r400 + 0 ], [ %rd46 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8963334Z // end inline asm 2026-02-21T09:08:46.8963576Z // begin inline asm 2026-02-21T09:08:46.8963933Z cp.async.cg.shared.global [ %r402 + 0 ], [ %rd47 + 0 ], 0x10, %r502; 
2026-02-21T09:08:46.8964367Z // end inline asm 2026-02-21T09:08:46.8964598Z // begin inline asm 2026-02-21T09:08:46.8965037Z cp.async.cg.shared.global [ %r404 + 0 ], [ %rd48 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8965460Z // end inline asm 2026-02-21T09:08:46.8965713Z cp.async.commit_group; 2026-02-21T09:08:46.8966211Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8966760Z bar.sync 0; 2026-02-21T09:08:46.8966994Z // begin inline asm 2026-02-21T09:08:46.8967341Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r369], 16384; 2026-02-21T09:08:46.8967758Z // end inline asm 2026-02-21T09:08:46.8968218Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.8968772Z bar.sync 0; 2026-02-21T09:08:46.8969016Z elect.sync %r476|%p56, -1; 2026-02-21T09:08:46.8969325Z and.pred %p50, %p1, %p56; 2026-02-21T09:08:46.8969620Z add.s32 %r407, %r76, 114688; 2026-02-21T09:08:46.8969891Z mov.b32 %r408, 64; 2026-02-21T09:08:46.8970140Z // begin inline asm 2026-02-21T09:08:46.8970769Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r407], [%rd82, {%r408, %r512}], [%r369]; 2026-02-21T09:08:46.8971583Z // end inline asm 2026-02-21T09:08:46.8972049Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.8972612Z add.s64 %rd50, %rd3, 192; 2026-02-21T09:08:46.8972893Z or.b32 %r477, %r470, 96; 2026-02-21T09:08:46.8973176Z mad.wide.u32 %rd62, %r477, 2, %rd13; 2026-02-21T09:08:46.8973498Z add.s64 %rd51, %rd62, 262144; 2026-02-21T09:08:46.8973779Z add.s64 %rd52, %rd62, 524288; 2026-02-21T09:08:46.8974061Z add.s64 %rd53, %rd5, 192; 2026-02-21T09:08:46.8974561Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.8975252Z // begin inline asm 2026-02-21T09:08:46.8975771Z cp.async.cg.shared.global [ %r411 + 0 ], [ %rd50 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8976202Z // end inline asm 
2026-02-21T09:08:46.8976446Z // begin inline asm 2026-02-21T09:08:46.8976809Z cp.async.cg.shared.global [ %r413 + 0 ], [ %rd51 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8977250Z // end inline asm 2026-02-21T09:08:46.8977486Z // begin inline asm 2026-02-21T09:08:46.8977849Z cp.async.cg.shared.global [ %r415 + 0 ], [ %rd52 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8978367Z // end inline asm 2026-02-21T09:08:46.8978620Z // begin inline asm 2026-02-21T09:08:46.8978972Z cp.async.cg.shared.global [ %r417 + 0 ], [ %rd53 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.8979396Z // end inline asm 2026-02-21T09:08:46.8979644Z cp.async.commit_group; 2026-02-21T09:08:46.8980139Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8980696Z bar.sync 0; 2026-02-21T09:08:46.8981031Z // begin inline asm 2026-02-21T09:08:46.8981400Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r370], 16384; 2026-02-21T09:08:46.8981808Z // end inline asm 2026-02-21T09:08:46.8982276Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.8982823Z bar.sync 0; 2026-02-21T09:08:46.8983068Z elect.sync %r478|%p57, -1; 2026-02-21T09:08:46.8983370Z and.pred %p52, %p1, %p57; 2026-02-21T09:08:46.8983647Z add.s32 %r420, %r76, 131072; 2026-02-21T09:08:46.8983929Z mov.b32 %r421, 96; 2026-02-21T09:08:46.8984169Z // begin inline asm 2026-02-21T09:08:46.8984872Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r420], [%rd82, {%r421, %r512}], [%r370]; 2026-02-21T09:08:46.8985573Z // end inline asm 2026-02-21T09:08:46.8986053Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.8986625Z cp.async.wait_group 3; 2026-02-21T09:08:46.8986897Z bar.sync 0; 2026-02-21T09:08:46.8987361Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.8987912Z // begin inline asm 2026-02-21T09:08:46.8988154Z 2026-02-21T09:08:46.8988346Z { 
2026-02-21T09:08:46.8988564Z .reg .pred complete; 2026-02-21T09:08:46.8988826Z waitLoop: 2026-02-21T09:08:46.8989170Z mbarrier.try_wait.parity.shared.b64 complete, [%r367], %r94; 2026-02-21T09:08:46.8989607Z @!complete bra.uni waitLoop; 2026-02-21T09:08:46.8989887Z } 2026-02-21T09:08:46.8990011Z 2026-02-21T09:08:46.8990113Z // end inline asm 2026-02-21T09:08:46.8990579Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.8991150Z setp.ne.b32 %p58, %r61, 0; 2026-02-21T09:08:46.8991431Z @%p58 bra $L__BB0_3; 2026-02-21T09:08:46.8991685Z // %bb.2: 2026-02-21T09:08:46.8992119Z .loc 1 0 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:0:52 2026-02-21T09:08:46.8992687Z add.s32 %r488, %r76, 8224; 2026-02-21T09:08:46.8992973Z bfe.u32 %r489, %r488, 4, 14; 2026-02-21T09:08:46.8993263Z cvt.u64.u32 %rd72, %r489; 2026-02-21T09:08:46.8993568Z or.b64 %rd69, %rd72, -9223371899348713472; 2026-02-21T09:08:46.8993899Z add.s32 %r490, %r76, 8192; 2026-02-21T09:08:46.8994325Z bfe.u32 %r491, %r490, 4, 14; 2026-02-21T09:08:46.8994604Z cvt.u64.u32 %rd73, %r491; 2026-02-21T09:08:46.8994987Z or.b64 %rd67, %rd73, -9223371899348713472; 2026-02-21T09:08:46.8995325Z add.s32 %r493, %r76, 81952; 2026-02-21T09:08:46.8995609Z bfe.u32 %r494, %r493, 4, 14; 2026-02-21T09:08:46.8995885Z cvt.u64.u32 %rd74, %r494; 2026-02-21T09:08:46.8996183Z or.b64 %rd66, %rd74, -9223371899348713472; 2026-02-21T09:08:46.8996518Z add.s32 %r495, %r76, 32; 2026-02-21T09:08:46.8996785Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T09:08:46.8997071Z cvt.u64.u32 %rd75, %r496; 2026-02-21T09:08:46.8997358Z or.b64 %rd65, %rd75, -9223371899348713472; 2026-02-21T09:08:46.8997700Z bfe.u32 %r497, %r381, 4, 14; 2026-02-21T09:08:46.8998108Z cvt.u64.u32 %rd76, %r497; 2026-02-21T09:08:46.8998403Z or.b64 %rd64, %rd76, -9223371899348713472; 2026-02-21T09:08:46.8998724Z bfe.u32 %r498, %r76, 4, 14; 2026-02-21T09:08:46.8999004Z cvt.u64.u32 %rd77, %r498; 2026-02-21T09:08:46.8999293Z or.b64 
%rd63, %rd77, -9223371899348713472; 2026-02-21T09:08:46.8999858Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9000434Z elect.sync %r499|%p60, -1; 2026-02-21T09:08:46.9000811Z mov.b32 %r480, 138412048; 2026-02-21T09:08:46.9001098Z mov.pred %p59, 0; 2026-02-21T09:08:46.9001341Z // begin inline asm 2026-02-21T09:08:46.9001768Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd63, %rd64, %r480, %p59; 2026-02-21T09:08:46.9002252Z // end inline asm 2026-02-21T09:08:46.9002497Z // begin inline asm 2026-02-21T09:08:46.9003007Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd65, %rd66, %r480, %p61; 2026-02-21T09:08:46.9003502Z // end inline asm 2026-02-21T09:08:46.9003745Z // begin inline asm 2026-02-21T09:08:46.9004159Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd67, %rd64, %r480, %p59; 2026-02-21T09:08:46.9004654Z // end inline asm 2026-02-21T09:08:46.9004951Z // begin inline asm 2026-02-21T09:08:46.9005374Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd69, %rd66, %r480, %p61; 2026-02-21T09:08:46.9005858Z // end inline asm 2026-02-21T09:08:46.9006104Z add.s32 %r500, %r76, 163888; 2026-02-21T09:08:46.9006394Z cvt.u64.u32 %rd71, %r500; 2026-02-21T09:08:46.9006656Z // begin inline asm 2026-02-21T09:08:46.9007044Z @%p60 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd71]; 2026-02-21T09:08:46.9007481Z // end inline asm 2026-02-21T09:08:46.9007717Z $L__BB0_3: 2026-02-21T09:08:46.9008160Z .loc 1 0 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:0:52 2026-02-21T09:08:46.9008781Z ld.param.b64 %rd14, [_helion_matmul_param_2]; 2026-02-21T09:08:46.9009157Z add.s32 %r11, %r76, %r441; 2026-02-21T09:08:46.9009432Z add.s32 %r12, %r76, %r442; 2026-02-21T09:08:46.9009706Z add.s32 %r13, %r76, %r443; 2026-02-21T09:08:46.9009977Z add.s32 %r14, %r76, %r444; 2026-02-21T09:08:46.9010254Z add.s32 %r15, %r76, %r445; 2026-02-21T09:08:46.9010525Z add.s32 %r16, %r76, 
%r446; 2026-02-21T09:08:46.9010798Z add.s32 %r17, %r76, %r447; 2026-02-21T09:08:46.9011062Z add.s32 %r18, %r76, %r448; 2026-02-21T09:08:46.9011338Z add.s32 %r872, %r867, 512; 2026-02-21T09:08:46.9011620Z add.s32 %r877, %r867, 1024; 2026-02-21T09:08:46.9011904Z add.s32 %r882, %r867, 1536; 2026-02-21T09:08:46.9012185Z add.s32 %r887, %r867, 2048; 2026-02-21T09:08:46.9012457Z add.s32 %r892, %r867, 2560; 2026-02-21T09:08:46.9012734Z add.s32 %r897, %r867, 3072; 2026-02-21T09:08:46.9013002Z add.s32 %r902, %r867, 3584; 2026-02-21T09:08:46.9013281Z or.b32 %r28, %r512, %r429; 2026-02-21T09:08:46.9013547Z or.b32 %r30, %r29, 8; 2026-02-21T09:08:46.9013811Z or.b32 %r31, %r29, 16; 2026-02-21T09:08:46.9014079Z or.b32 %r32, %r29, 24; 2026-02-21T09:08:46.9014340Z or.b32 %r33, %r29, 32; 2026-02-21T09:08:46.9014593Z or.b32 %r34, %r29, 40; 2026-02-21T09:08:46.9014915Z or.b32 %r35, %r29, 48; 2026-02-21T09:08:46.9015172Z or.b32 %r36, %r29, 56; 2026-02-21T09:08:46.9015531Z or.b32 %r37, %r29, 64; 2026-02-21T09:08:46.9015790Z or.b32 %r38, %r29, 72; 2026-02-21T09:08:46.9016036Z or.b32 %r39, %r29, 80; 2026-02-21T09:08:46.9016291Z or.b32 %r40, %r29, 88; 2026-02-21T09:08:46.9016540Z or.b32 %r41, %r29, 96; 2026-02-21T09:08:46.9016802Z or.b32 %r42, %r29, 104; 2026-02-21T09:08:46.9017068Z or.b32 %r43, %r29, 112; 2026-02-21T09:08:46.9017337Z or.b32 %r44, %r29, 120; 2026-02-21T09:08:46.9017589Z or.b32 %r45, %r29, 128; 2026-02-21T09:08:46.9017852Z or.b32 %r46, %r29, 136; 2026-02-21T09:08:46.9018114Z or.b32 %r47, %r29, 144; 2026-02-21T09:08:46.9018364Z or.b32 %r48, %r29, 152; 2026-02-21T09:08:46.9018632Z or.b32 %r49, %r29, 160; 2026-02-21T09:08:46.9018886Z or.b32 %r50, %r29, 168; 2026-02-21T09:08:46.9019271Z or.b32 %r51, %r29, 176; 2026-02-21T09:08:46.9019521Z or.b32 %r52, %r29, 184; 2026-02-21T09:08:46.9019776Z or.b32 %r53, %r29, 192; 2026-02-21T09:08:46.9020029Z or.b32 %r54, %r29, 200; 2026-02-21T09:08:46.9020289Z or.b32 %r55, %r29, 208; 2026-02-21T09:08:46.9020543Z or.b32 %r56, %r29, 216; 
2026-02-21T09:08:46.9020801Z or.b32 %r57, %r29, 224; 2026-02-21T09:08:46.9021058Z or.b32 %r58, %r29, 232; 2026-02-21T09:08:46.9021309Z or.b32 %r59, %r29, 240; 2026-02-21T09:08:46.9021665Z or.b32 %r60, %r29, 248; 2026-02-21T09:08:46.9022162Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.9022723Z add.s64 %rd78, %rd3, 256; 2026-02-21T09:08:46.9022995Z cvt.u64.u32 %rd84, %r6; 2026-02-21T09:08:46.9023267Z add.s64 %rd85, %rd2, %rd84; 2026-02-21T09:08:46.9023543Z shl.b64 %rd86, %rd85, 1; 2026-02-21T09:08:46.9023927Z add.s64 %rd87, %rd13, %rd86; 2026-02-21T09:08:46.9024221Z add.s64 %rd79, %rd87, 262144; 2026-02-21T09:08:46.9024518Z add.s64 %rd80, %rd87, 524288; 2026-02-21T09:08:46.9024874Z add.s64 %rd81, %rd5, 256; 2026-02-21T09:08:46.9025373Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.9025935Z bar.sync 0; 2026-02-21T09:08:46.9026159Z // begin inline asm 2026-02-21T09:08:46.9026533Z cp.async.cg.shared.global [ %r501 + 0 ], [ %rd78 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.9026957Z // end inline asm 2026-02-21T09:08:46.9027203Z // begin inline asm 2026-02-21T09:08:46.9027571Z cp.async.cg.shared.global [ %r503 + 0 ], [ %rd79 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.9028000Z // end inline asm 2026-02-21T09:08:46.9028243Z // begin inline asm 2026-02-21T09:08:46.9028601Z cp.async.cg.shared.global [ %r505 + 0 ], [ %rd80 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.9029031Z // end inline asm 2026-02-21T09:08:46.9029263Z // begin inline asm 2026-02-21T09:08:46.9029630Z cp.async.cg.shared.global [ %r507 + 0 ], [ %rd81 + 0 ], 0x10, %r502; 2026-02-21T09:08:46.9030059Z // end inline asm 2026-02-21T09:08:46.9030313Z cp.async.commit_group; 2026-02-21T09:08:46.9030807Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9031369Z // begin inline asm 2026-02-21T09:08:46.9031726Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r509], 16384; 
2026-02-21T09:08:46.9032134Z // end inline asm 2026-02-21T09:08:46.9032604Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.9033148Z bar.sync 0; 2026-02-21T09:08:46.9033403Z elect.sync %r519|%p71, -1; 2026-02-21T09:08:46.9033692Z and.pred %p69, %p1, %p71; 2026-02-21T09:08:46.9033978Z add.s32 %r510, %r76, 147456; 2026-02-21T09:08:46.9034250Z mov.b32 %r511, 128; 2026-02-21T09:08:46.9034505Z // begin inline asm 2026-02-21T09:08:46.9035228Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r510], [%rd82, {%r511, %r512}], [%r509]; 2026-02-21T09:08:46.9035943Z // end inline asm 2026-02-21T09:08:46.9036421Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9036978Z shl.b64 %rd88, %rd4, 1; 2026-02-21T09:08:46.9037372Z add.s64 %rd6, %rd88, 320; 2026-02-21T09:08:46.9037646Z and.b32 %r520, %r1, 3; 2026-02-21T09:08:46.9037935Z mad.wide.u32 %rd654, %r520, 16, %rd13; 2026-02-21T09:08:46.9038264Z shl.b32 %r521, %r3, 18; 2026-02-21T09:08:46.9038540Z and.b32 %r522, %r521, 7864320; 2026-02-21T09:08:46.9038835Z shl.b32 %r523, %r4, 11; 2026-02-21T09:08:46.9039097Z or.b32 %r524, %r522, %r523; 2026-02-21T09:08:46.9039384Z mul.wide.u32 %rd8, %r524, 2; 2026-02-21T09:08:46.9039657Z mov.b32 %r1606, 1; 2026-02-21T09:08:46.9039903Z mov.b32 %r1605, 4; 2026-02-21T09:08:46.9040136Z mov.b32 %r1601, 0; 2026-02-21T09:08:46.9040378Z mov.b64 %rd655, 0; 2026-02-21T09:08:46.9040620Z mov.b32 %r1603, %r1601; 2026-02-21T09:08:46.9041041Z mov.b32 %r1604, %r1601; 2026-02-21T09:08:46.9041307Z mov.b32 %r1607, %r1601; 2026-02-21T09:08:46.9041563Z bra.uni $L__BB0_4; 2026-02-21T09:08:46.9041901Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:08:46.9042532Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9043118Z setp.lt.u64 %p85, %rd655, 1888; 2026-02-21T09:08:46.9043761Z .loc 1 53 52 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9044324Z // begin inline asm 2026-02-21T09:08:46.9044557Z 2026-02-21T09:08:46.9044815Z { 2026-02-21T09:08:46.9045054Z .reg .pred complete; 2026-02-21T09:08:46.9045302Z waitLoop: 2026-02-21T09:08:46.9045656Z mbarrier.try_wait.parity.shared.b64 complete, [%r1602], %r1601; 2026-02-21T09:08:46.9046109Z @!complete bra.uni waitLoop; 2026-02-21T09:08:46.9046388Z } 2026-02-21T09:08:46.9046499Z 2026-02-21T09:08:46.9046697Z // end inline asm 2026-02-21T09:08:46.9046960Z add.s32 %r572, %r1606, 1; 2026-02-21T09:08:46.9047237Z setp.gt.s32 %p88, %r572, 1; 2026-02-21T09:08:46.9047536Z selp.b32 %r1606, 0, %r572, %p88; 2026-02-21T09:08:46.9047849Z selp.b32 %r573, 1, 0, %p88; 2026-02-21T09:08:46.9048139Z xor.b32 %r74, %r1607, %r573; 2026-02-21T09:08:46.9048653Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9049207Z add.s32 %r574, %r1605, 1; 2026-02-21T09:08:46.9049497Z setp.gt.s32 %p89, %r574, 4; 2026-02-21T09:08:46.9049798Z selp.b32 %r1605, 0, %r574, %p89; 2026-02-21T09:08:46.9050337Z .loc 1 51 32 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:32 2026-02-21T09:08:46.9050896Z add.s64 %rd109, %rd654, %rd8; 2026-02-21T09:08:46.9051191Z add.s64 %rd104, %rd109, 320; 2026-02-21T09:08:46.9051483Z add.s64 %rd105, %rd109, 262464; 2026-02-21T09:08:46.9051786Z add.s64 %rd106, %rd109, 524608; 2026-02-21T09:08:46.9052319Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.9052878Z add.s64 %rd107, %rd654, %rd6; 2026-02-21T09:08:46.9053170Z shl.b32 %r575, %r1605, 14; 2026-02-21T09:08:46.9053445Z add.s32 %r577, %r76, %r575; 2026-02-21T09:08:46.9053728Z bar.sync 0; 2026-02-21T09:08:46.9053958Z add.s32 %r559, %r577, %r5; 2026-02-21T09:08:46.9054248Z selp.b32 %r560, 16, 0, %p85; 2026-02-21T09:08:46.9054535Z // begin inline asm 2026-02-21T09:08:46.9054973Z cp.async.cg.shared.global [ %r559 + 0 ], [ %rd104 + 0 
], 0x10, %r560; 2026-02-21T09:08:46.9055417Z // end inline asm 2026-02-21T09:08:46.9055654Z add.s32 %r561, %r559, 4096; 2026-02-21T09:08:46.9055933Z // begin inline asm 2026-02-21T09:08:46.9056294Z cp.async.cg.shared.global [ %r561 + 0 ], [ %rd105 + 0 ], 0x10, %r560; 2026-02-21T09:08:46.9056729Z // end inline asm 2026-02-21T09:08:46.9056966Z add.s32 %r563, %r559, 8192; 2026-02-21T09:08:46.9057251Z // begin inline asm 2026-02-21T09:08:46.9057623Z cp.async.cg.shared.global [ %r563 + 0 ], [ %rd106 + 0 ], 0x10, %r560; 2026-02-21T09:08:46.9058052Z // end inline asm 2026-02-21T09:08:46.9058303Z add.s32 %r565, %r559, 12288; 2026-02-21T09:08:46.9058578Z // begin inline asm 2026-02-21T09:08:46.9058944Z cp.async.cg.shared.global [ %r565 + 0 ], [ %rd107 + 0 ], 0x10, %r560; 2026-02-21T09:08:46.9059531Z // end inline asm 2026-02-21T09:08:46.9059784Z cp.async.commit_group; 2026-02-21T09:08:46.9060283Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9060847Z shl.b32 %r578, %r1605, 3; 2026-02-21T09:08:46.9061125Z add.s32 %r579, %r76, %r578; 2026-02-21T09:08:46.9061399Z add.s32 %r571, %r579, 163840; 2026-02-21T09:08:46.9061694Z and.pred %p83, %p93, %p85; 2026-02-21T09:08:46.9061969Z // begin inline asm 2026-02-21T09:08:46.9062325Z @%p83 mbarrier.arrive.expect_tx.shared.b64 _, [%r571], 16384; 2026-02-21T09:08:46.9062742Z // end inline asm 2026-02-21T09:08:46.9063327Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.9063884Z add.s32 %r568, %r577, 81920; 2026-02-21T09:08:46.9064167Z bar.sync 0; 2026-02-21T09:08:46.9064410Z elect.sync %r580|%p90, -1; 2026-02-21T09:08:46.9064762Z and.pred %p91, %p85, %p90; 2026-02-21T09:08:46.9065016Z and.pred %p84, %p1, %p91; 2026-02-21T09:08:46.9065292Z cvt.u32.u64 %r581, %rd655; 2026-02-21T09:08:46.9065567Z add.s32 %r569, %r581, 160; 2026-02-21T09:08:46.9065929Z // begin inline asm 2026-02-21T09:08:46.9066576Z @%p84 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r568], [%rd82, {%r569, %r512}], [%r571]; 2026-02-21T09:08:46.9067271Z // end inline asm 2026-02-21T09:08:46.9067749Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9068310Z add.s64 %rd654, %rd654, 64; 2026-02-21T09:08:46.9068703Z setp.lt.u64 %p92, %rd655, 1984; 2026-02-21T09:08:46.9069028Z add.s64 %rd655, %rd655, 32; 2026-02-21T09:08:46.9069301Z mov.b32 %r1601, %r1607; 2026-02-21T09:08:46.9069568Z mov.b32 %r1602, %r582; 2026-02-21T09:08:46.9069828Z mov.b32 %r1607, %r74; 2026-02-21T09:08:46.9070089Z @%p92 bra $L__BB0_4; 2026-02-21T09:08:46.9070348Z bra.uni $L__BB0_7; 2026-02-21T09:08:46.9070694Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:08:46.9071110Z add.s32 %r527, %r1604, 1; 2026-02-21T09:08:46.9071387Z setp.gt.s32 %p73, %r527, 4; 2026-02-21T09:08:46.9071692Z selp.b32 %r1604, 0, %r527, %p73; 2026-02-21T09:08:46.9071995Z selp.b32 %r528, 1, 0, %p73; 2026-02-21T09:08:46.9072287Z xor.b32 %r1603, %r1603, %r528; 2026-02-21T09:08:46.9072806Z .loc 1 51 85 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:51:85 2026-02-21T09:08:46.9073378Z cp.async.wait_group 3; 2026-02-21T09:08:46.9073642Z bar.sync 0; 2026-02-21T09:08:46.9074107Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9074723Z shl.b32 %r529, %r1604, 3; 2026-02-21T09:08:46.9075004Z add.s32 %r531, %r76, %r529; 2026-02-21T09:08:46.9075290Z add.s32 %r525, %r531, 163840; 2026-02-21T09:08:46.9075567Z // begin inline asm 2026-02-21T09:08:46.9075815Z 2026-02-21T09:08:46.9076002Z { 2026-02-21T09:08:46.9076216Z .reg .pred complete; 2026-02-21T09:08:46.9076464Z waitLoop: 2026-02-21T09:08:46.9076816Z mbarrier.try_wait.parity.shared.b64 complete, [%r525], %r1603; 2026-02-21T09:08:46.9077268Z @!complete bra.uni waitLoop; 2026-02-21T09:08:46.9077545Z } 2026-02-21T09:08:46.9077656Z 2026-02-21T09:08:46.9077758Z // end inline 
asm 2026-02-21T09:08:46.9077996Z shl.b32 %r532, %r1606, 3; 2026-02-21T09:08:46.9078271Z add.s32 %r533, %r76, %r532; 2026-02-21T09:08:46.9078548Z add.s32 %r582, %r533, 163888; 2026-02-21T09:08:46.9079064Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9079618Z @%p58 bra $L__BB0_6; 2026-02-21T09:08:46.9079958Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:08:46.9080574Z .loc 1 52 44 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:52:44 2026-02-21T09:08:46.9081939Z shl.b32 %r542, %r1604, 14; 2026-02-21T09:08:46.9082233Z add.s32 %r544, %r76, %r542; 2026-02-21T09:08:46.9082494Z add.s32 %r545, %r544, 81920; 2026-02-21T09:08:46.9083020Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9083588Z elect.sync %r546|%p75, -1; 2026-02-21T09:08:46.9083884Z bfe.u32 %r547, %r544, 4, 14; 2026-02-21T09:08:46.9084160Z cvt.u64.u32 %rd98, %r547; 2026-02-21T09:08:46.9084465Z or.b64 %rd89, %rd98, -9223371899348713472; 2026-02-21T09:08:46.9084887Z bfe.u32 %r548, %r545, 4, 14; 2026-02-21T09:08:46.9085165Z cvt.u64.u32 %rd99, %r548; 2026-02-21T09:08:46.9085478Z or.b64 %rd90, %rd99, -9223371899348713472; 2026-02-21T09:08:46.9085941Z mov.b32 %r535, 138412048; 2026-02-21T09:08:46.9086218Z mov.pred %p74, -1; 2026-02-21T09:08:46.9086465Z // begin inline asm 2026-02-21T09:08:46.9086897Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd89, %rd90, %r535, %p74; 2026-02-21T09:08:46.9087390Z // end inline asm 2026-02-21T09:08:46.9087635Z add.s32 %r549, %r544, 32; 2026-02-21T09:08:46.9087908Z bfe.u32 %r550, %r549, 4, 14; 2026-02-21T09:08:46.9088194Z cvt.u64.u32 %rd100, %r550; 2026-02-21T09:08:46.9088618Z or.b64 %rd91, %rd100, -9223371899348713472; 2026-02-21T09:08:46.9088964Z add.s32 %r551, %r544, 81952; 2026-02-21T09:08:46.9089245Z bfe.u32 %r552, %r551, 4, 14; 2026-02-21T09:08:46.9089520Z cvt.u64.u32 %rd101, %r552; 2026-02-21T09:08:46.9089825Z or.b64 %rd92, %rd101, 
-9223371899348713472; 2026-02-21T09:08:46.9090150Z // begin inline asm 2026-02-21T09:08:46.9090567Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd91, %rd92, %r535, %p74; 2026-02-21T09:08:46.9091178Z // end inline asm 2026-02-21T09:08:46.9091433Z add.s32 %r553, %r544, 8192; 2026-02-21T09:08:46.9091715Z bfe.u32 %r554, %r553, 4, 14; 2026-02-21T09:08:46.9091988Z cvt.u64.u32 %rd102, %r554; 2026-02-21T09:08:46.9092292Z or.b64 %rd93, %rd102, -9223371899348713472; 2026-02-21T09:08:46.9092623Z // begin inline asm 2026-02-21T09:08:46.9093043Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd93, %rd90, %r535, %p74; 2026-02-21T09:08:46.9093532Z // end inline asm 2026-02-21T09:08:46.9093783Z add.s32 %r555, %r544, 8224; 2026-02-21T09:08:46.9094065Z bfe.u32 %r556, %r555, 4, 14; 2026-02-21T09:08:46.9094340Z cvt.u64.u32 %rd103, %r556; 2026-02-21T09:08:46.9094638Z or.b64 %rd95, %rd103, -9223371899348713472; 2026-02-21T09:08:46.9095037Z // begin inline asm 2026-02-21T09:08:46.9095458Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd95, %rd92, %r535, %p74; 2026-02-21T09:08:46.9095940Z // end inline asm 2026-02-21T09:08:46.9096198Z cvt.u64.u32 %rd97, %r582; 2026-02-21T09:08:46.9096462Z // begin inline asm 2026-02-21T09:08:46.9096850Z @%p75 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd97]; 2026-02-21T09:08:46.9097296Z // end inline asm 2026-02-21T09:08:46.9097527Z bra.uni $L__BB0_6; 2026-02-21T09:08:46.9097847Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:08:46.9098457Z .loc 1 0 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:0:52 2026-02-21T09:08:46.9099011Z mov.b32 %r583, 1; 2026-02-21T09:08:46.9099489Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9100052Z // begin inline asm 2026-02-21T09:08:46.9100287Z 2026-02-21T09:08:46.9100485Z { 2026-02-21T09:08:46.9100696Z .reg .pred complete; 2026-02-21T09:08:46.9100947Z waitLoop: 2026-02-21T09:08:46.9101288Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r582], %r583; 2026-02-21T09:08:46.9101726Z @!complete bra.uni waitLoop; 2026-02-21T09:08:46.9102009Z } 2026-02-21T09:08:46.9102125Z 2026-02-21T09:08:46.9102218Z // end inline asm 2026-02-21T09:08:46.9102694Z .loc 1 46 42 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:46:42 2026-02-21T09:08:46.9103252Z cp.async.wait_group 0; 2026-02-21T09:08:46.9103523Z bar.sync 0; 2026-02-21T09:08:46.9103877Z // begin inline asm 2026-02-21T09:08:46.9104164Z @%p93 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:08:46.9104520Z // end inline asm 2026-02-21T09:08:46.9104801Z bar.sync 0; 2026-02-21T09:08:46.9105042Z // begin inline asm 2026-02-21T09:08:46.9105336Z @%p93 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:08:46.9105693Z // end inline asm 2026-02-21T09:08:46.9105922Z bar.sync 0; 2026-02-21T09:08:46.9106152Z // begin inline asm 2026-02-21T09:08:46.9106448Z @%p93 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:08:46.9106798Z // end inline asm 2026-02-21T09:08:46.9107039Z bar.sync 0; 2026-02-21T09:08:46.9107270Z // begin inline asm 2026-02-21T09:08:46.9107569Z @%p93 mbarrier.inval.shared::cta.b64 [%r370]; 2026-02-21T09:08:46.9108034Z // end inline asm 2026-02-21T09:08:46.9108267Z bar.sync 0; 2026-02-21T09:08:46.9108484Z // begin inline asm 2026-02-21T09:08:46.9108783Z @%p93 mbarrier.inval.shared::cta.b64 [%r509]; 2026-02-21T09:08:46.9109125Z // end inline asm 2026-02-21T09:08:46.9109370Z add.s32 %r589, %r76, 163888; 2026-02-21T09:08:46.9109642Z // begin inline asm 2026-02-21T09:08:46.9109948Z @%p93 mbarrier.inval.shared::cta.b64 [%r589]; 2026-02-21T09:08:46.9110390Z // end inline asm 2026-02-21T09:08:46.9110627Z bar.sync 0; 2026-02-21T09:08:46.9110856Z // begin inline asm 2026-02-21T09:08:46.9111146Z @%p93 mbarrier.inval.shared::cta.b64 [%r366]; 2026-02-21T09:08:46.9111494Z // end inline asm 2026-02-21T09:08:46.9111962Z .loc 1 56 45 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:56:45 
2026-02-21T09:08:46.9112530Z shl.b32 %r1152, %r29, 11; 2026-02-21T09:08:46.9112902Z shl.b32 %r1153, %r30, 11; 2026-02-21T09:08:46.9113198Z shl.b32 %r1154, %r31, 11; 2026-02-21T09:08:46.9113475Z shl.b32 %r1155, %r32, 11; 2026-02-21T09:08:46.9113739Z shl.b32 %r1156, %r33, 11; 2026-02-21T09:08:46.9114014Z shl.b32 %r1157, %r34, 11; 2026-02-21T09:08:46.9114278Z shl.b32 %r1158, %r35, 11; 2026-02-21T09:08:46.9114557Z shl.b32 %r1159, %r36, 11; 2026-02-21T09:08:46.9114917Z shl.b32 %r1160, %r37, 11; 2026-02-21T09:08:46.9115211Z shl.b32 %r1161, %r38, 11; 2026-02-21T09:08:46.9115482Z shl.b32 %r1162, %r39, 11; 2026-02-21T09:08:46.9115756Z shl.b32 %r1163, %r40, 11; 2026-02-21T09:08:46.9116028Z shl.b32 %r1164, %r41, 11; 2026-02-21T09:08:46.9116288Z shl.b32 %r1165, %r42, 11; 2026-02-21T09:08:46.9116559Z shl.b32 %r1166, %r43, 11; 2026-02-21T09:08:46.9116817Z shl.b32 %r1167, %r44, 11; 2026-02-21T09:08:46.9117085Z shl.b32 %r1168, %r45, 11; 2026-02-21T09:08:46.9117345Z shl.b32 %r1169, %r46, 11; 2026-02-21T09:08:46.9117614Z shl.b32 %r1170, %r47, 11; 2026-02-21T09:08:46.9117877Z shl.b32 %r1171, %r48, 11; 2026-02-21T09:08:46.9118154Z shl.b32 %r1172, %r49, 11; 2026-02-21T09:08:46.9118416Z shl.b32 %r1173, %r50, 11; 2026-02-21T09:08:46.9118684Z shl.b32 %r1174, %r51, 11; 2026-02-21T09:08:46.9118951Z shl.b32 %r1175, %r52, 11; 2026-02-21T09:08:46.9119214Z shl.b32 %r1176, %r53, 11; 2026-02-21T09:08:46.9119486Z shl.b32 %r1177, %r54, 11; 2026-02-21T09:08:46.9119742Z shl.b32 %r1178, %r55, 11; 2026-02-21T09:08:46.9120012Z shl.b32 %r1179, %r56, 11; 2026-02-21T09:08:46.9120271Z shl.b32 %r1180, %r57, 11; 2026-02-21T09:08:46.9120543Z shl.b32 %r1181, %r58, 11; 2026-02-21T09:08:46.9120807Z shl.b32 %r1182, %r59, 11; 2026-02-21T09:08:46.9121076Z shl.b32 %r1183, %r60, 11; 2026-02-21T09:08:46.9121566Z .loc 1 56 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:56:52 2026-02-21T09:08:46.9122133Z or.b32 %r1184, %r28, %r1152; 2026-02-21T09:08:46.9122420Z or.b32 %r1185, %r28, %r1153; 
2026-02-21T09:08:46.9122695Z or.b32 %r1186, %r28, %r1154; 2026-02-21T09:08:46.9122982Z or.b32 %r1187, %r28, %r1155; 2026-02-21T09:08:46.9123263Z or.b32 %r1188, %r28, %r1156; 2026-02-21T09:08:46.9123536Z or.b32 %r1189, %r28, %r1157; 2026-02-21T09:08:46.9123806Z or.b32 %r1190, %r28, %r1158; 2026-02-21T09:08:46.9124083Z or.b32 %r1191, %r28, %r1159; 2026-02-21T09:08:46.9124473Z or.b32 %r1192, %r28, %r1160; 2026-02-21T09:08:46.9124818Z or.b32 %r1193, %r28, %r1161; 2026-02-21T09:08:46.9125100Z or.b32 %r1194, %r28, %r1162; 2026-02-21T09:08:46.9125367Z or.b32 %r1195, %r28, %r1163; 2026-02-21T09:08:46.9125650Z or.b32 %r1196, %r28, %r1164; 2026-02-21T09:08:46.9125919Z or.b32 %r1197, %r28, %r1165; 2026-02-21T09:08:46.9126197Z or.b32 %r1198, %r28, %r1166; 2026-02-21T09:08:46.9126470Z or.b32 %r1199, %r28, %r1167; 2026-02-21T09:08:46.9126748Z or.b32 %r1200, %r28, %r1168; 2026-02-21T09:08:46.9127012Z or.b32 %r1201, %r28, %r1169; 2026-02-21T09:08:46.9127287Z or.b32 %r1202, %r28, %r1170; 2026-02-21T09:08:46.9127555Z or.b32 %r1203, %r28, %r1171; 2026-02-21T09:08:46.9127836Z or.b32 %r1204, %r28, %r1172; 2026-02-21T09:08:46.9128230Z or.b32 %r1205, %r28, %r1173; 2026-02-21T09:08:46.9128495Z or.b32 %r1206, %r28, %r1174; 2026-02-21T09:08:46.9128770Z or.b32 %r1207, %r28, %r1175; 2026-02-21T09:08:46.9129043Z or.b32 %r1208, %r28, %r1176; 2026-02-21T09:08:46.9129329Z or.b32 %r1209, %r28, %r1177; 2026-02-21T09:08:46.9129601Z or.b32 %r1210, %r28, %r1178; 2026-02-21T09:08:46.9129879Z or.b32 %r1211, %r28, %r1179; 2026-02-21T09:08:46.9130141Z or.b32 %r1212, %r28, %r1180; 2026-02-21T09:08:46.9130510Z or.b32 %r1213, %r28, %r1181; 2026-02-21T09:08:46.9130806Z or.b32 %r1214, %r28, %r1182; 2026-02-21T09:08:46.9131075Z or.b32 %r1215, %r28, %r1183; 2026-02-21T09:08:46.9131592Z .loc 1 56 24 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:56:24 2026-02-21T09:08:46.9132163Z mad.wide.u32 %rd110, %r1184, 2, %rd14; 2026-02-21T09:08:46.9132517Z mad.wide.u32 %rd111, %r1185, 2, %rd14; 
2026-02-21T09:08:46.9132983Z mad.wide.u32 %rd112, %r1186, 2, %rd14; 2026-02-21T09:08:46.9133336Z mad.wide.u32 %rd113, %r1187, 2, %rd14; 2026-02-21T09:08:46.9133659Z mad.wide.u32 %rd114, %r1188, 2, %rd14; 2026-02-21T09:08:46.9133988Z mad.wide.u32 %rd115, %r1189, 2, %rd14; 2026-02-21T09:08:46.9134317Z mad.wide.u32 %rd116, %r1190, 2, %rd14; 2026-02-21T09:08:46.9134647Z mad.wide.u32 %rd117, %r1191, 2, %rd14; 2026-02-21T09:08:46.9135045Z mad.wide.u32 %rd118, %r1192, 2, %rd14; 2026-02-21T09:08:46.9135367Z mad.wide.u32 %rd119, %r1193, 2, %rd14; 2026-02-21T09:08:46.9135699Z mad.wide.u32 %rd120, %r1194, 2, %rd14; 2026-02-21T09:08:46.9136022Z mad.wide.u32 %rd121, %r1195, 2, %rd14; 2026-02-21T09:08:46.9136350Z mad.wide.u32 %rd122, %r1196, 2, %rd14; 2026-02-21T09:08:46.9136676Z mad.wide.u32 %rd123, %r1197, 2, %rd14; 2026-02-21T09:08:46.9137003Z mad.wide.u32 %rd124, %r1198, 2, %rd14; 2026-02-21T09:08:46.9137336Z mad.wide.u32 %rd125, %r1199, 2, %rd14; 2026-02-21T09:08:46.9137657Z mad.wide.u32 %rd126, %r1200, 2, %rd14; 2026-02-21T09:08:46.9137988Z mad.wide.u32 %rd127, %r1201, 2, %rd14; 2026-02-21T09:08:46.9138310Z mad.wide.u32 %rd128, %r1202, 2, %rd14; 2026-02-21T09:08:46.9138639Z mad.wide.u32 %rd129, %r1203, 2, %rd14; 2026-02-21T09:08:46.9138959Z mad.wide.u32 %rd130, %r1204, 2, %rd14; 2026-02-21T09:08:46.9139287Z mad.wide.u32 %rd131, %r1205, 2, %rd14; 2026-02-21T09:08:46.9139613Z mad.wide.u32 %rd132, %r1206, 2, %rd14; 2026-02-21T09:08:46.9139942Z mad.wide.u32 %rd133, %r1207, 2, %rd14; 2026-02-21T09:08:46.9140278Z mad.wide.u32 %rd134, %r1208, 2, %rd14; 2026-02-21T09:08:46.9140599Z mad.wide.u32 %rd135, %r1209, 2, %rd14; 2026-02-21T09:08:46.9140928Z mad.wide.u32 %rd136, %r1210, 2, %rd14; 2026-02-21T09:08:46.9141247Z mad.wide.u32 %rd137, %r1211, 2, %rd14; 2026-02-21T09:08:46.9141576Z mad.wide.u32 %rd138, %r1212, 2, %rd14; 2026-02-21T09:08:46.9141896Z mad.wide.u32 %rd139, %r1213, 2, %rd14; 2026-02-21T09:08:46.9142227Z mad.wide.u32 %rd140, %r1214, 2, %rd14; 
2026-02-21T09:08:46.9142552Z mad.wide.u32 %rd141, %r1215, 2, %rd14; 2026-02-21T09:08:46.9143098Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9143671Z // begin inline asm 2026-02-21T09:08:46.9144380Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606}, [%r862 + 0]; 2026-02-21T09:08:46.9145332Z // end inline asm 2026-02-21T09:08:46.9145567Z // begin inline asm 2026-02-21T09:08:46.9146277Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623}, [%r862 + 16]; 2026-02-21T09:08:46.9147046Z // end inline asm 2026-02-21T09:08:46.9147293Z // begin inline asm 2026-02-21T09:08:46.9147996Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640}, [%r862 + 32]; 2026-02-21T09:08:46.9148765Z // end inline asm 2026-02-21T09:08:46.9149019Z // begin inline asm 2026-02-21T09:08:46.9149841Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657}, [%r862 + 48]; 2026-02-21T09:08:46.9150629Z // end inline asm 2026-02-21T09:08:46.9150877Z // begin inline asm 2026-02-21T09:08:46.9151581Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674}, [%r862 + 64]; 2026-02-21T09:08:46.9152489Z // end inline asm 2026-02-21T09:08:46.9152742Z // begin inline asm 2026-02-21T09:08:46.9153434Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691}, [%r862 + 80]; 2026-02-21T09:08:46.9154199Z // end inline asm 2026-02-21T09:08:46.9154443Z // begin inline asm 
2026-02-21T09:08:46.9155319Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708}, [%r862 + 96]; 2026-02-21T09:08:46.9156103Z // end inline asm 2026-02-21T09:08:46.9156343Z // begin inline asm 2026-02-21T09:08:46.9157029Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725}, [%r862 + 112]; 2026-02-21T09:08:46.9157820Z // end inline asm 2026-02-21T09:08:46.9158051Z // begin inline asm 2026-02-21T09:08:46.9158750Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742}, [%r862 + 128]; 2026-02-21T09:08:46.9159523Z // end inline asm 2026-02-21T09:08:46.9159756Z // begin inline asm 2026-02-21T09:08:46.9160444Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759}, [%r862 + 144]; 2026-02-21T09:08:46.9161222Z // end inline asm 2026-02-21T09:08:46.9161470Z // begin inline asm 2026-02-21T09:08:46.9162148Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776}, [%r862 + 160]; 2026-02-21T09:08:46.9162912Z // end inline asm 2026-02-21T09:08:46.9163163Z // begin inline asm 2026-02-21T09:08:46.9163845Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793}, [%r862 + 176]; 2026-02-21T09:08:46.9164775Z // end inline asm 2026-02-21T09:08:46.9165120Z // begin inline asm 2026-02-21T09:08:46.9165888Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810}, [%r862 + 192]; 2026-02-21T09:08:46.9166670Z // end inline 
asm 2026-02-21T09:08:46.9166904Z // begin inline asm 2026-02-21T09:08:46.9167606Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827}, [%r862 + 208]; 2026-02-21T09:08:46.9168390Z // end inline asm 2026-02-21T09:08:46.9168633Z // begin inline asm 2026-02-21T09:08:46.9169319Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844}, [%r862 + 224]; 2026-02-21T09:08:46.9170212Z // end inline asm 2026-02-21T09:08:46.9170456Z // begin inline asm 2026-02-21T09:08:46.9171145Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r846, %r847, %r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858, %r859, %r860, %r861}, [%r862 + 240]; 2026-02-21T09:08:46.9171918Z // end inline asm 2026-02-21T09:08:46.9172154Z // begin inline asm 2026-02-21T09:08:46.9172428Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:08:46.9172719Z // end inline asm 2026-02-21T09:08:46.9172974Z cvt.u64.u32 %rd142, %r591; 2026-02-21T09:08:46.9173257Z cvt.u64.u32 %rd143, %r592; 2026-02-21T09:08:46.9173663Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:08:46.9173958Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:08:46.9174486Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9175148Z mov.b64 {%r1216, %r1217}, %rd145; 2026-02-21T09:08:46.9175478Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T09:08:46.9176044Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9176703Z cvt.u64.u32 %rd146, %r593; 2026-02-21T09:08:46.9177012Z cvt.u64.u32 %rd147, %r594; 2026-02-21T09:08:46.9177294Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:08:46.9177590Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:08:46.9178118Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9178686Z mov.b64 {%r1219, 
%r1220}, %rd149; 2026-02-21T09:08:46.9179146Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T09:08:46.9179725Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9180294Z cvt.u64.u32 %rd150, %r595; 2026-02-21T09:08:46.9180568Z cvt.u64.u32 %rd151, %r596; 2026-02-21T09:08:46.9180852Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:08:46.9181152Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:08:46.9181666Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9182238Z mov.b64 {%r1222, %r1223}, %rd153; 2026-02-21T09:08:46.9182558Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T09:08:46.9183114Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9183673Z cvt.u64.u32 %rd154, %r597; 2026-02-21T09:08:46.9183961Z cvt.u64.u32 %rd155, %r598; 2026-02-21T09:08:46.9184242Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:08:46.9184532Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:08:46.9185138Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9185705Z mov.b64 {%r1225, %r1226}, %rd157; 2026-02-21T09:08:46.9186031Z cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T09:08:46.9186579Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9187146Z cvt.u64.u32 %rd158, %r599; 2026-02-21T09:08:46.9187420Z cvt.u64.u32 %rd159, %r600; 2026-02-21T09:08:46.9187707Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:08:46.9187998Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:08:46.9188510Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9189076Z mov.b64 {%r1228, %r1229}, %rd161; 2026-02-21T09:08:46.9189403Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T09:08:46.9189967Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9190534Z cvt.u64.u32 %rd162, %r601; 
2026-02-21T09:08:46.9190819Z cvt.u64.u32 %rd163, %r602; 2026-02-21T09:08:46.9191100Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:08:46.9191384Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:08:46.9191899Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9192576Z mov.b64 {%r1231, %r1232}, %rd165; 2026-02-21T09:08:46.9192899Z cvt.rn.f16x2.f32 %r1233, %r1232, %r1231; 2026-02-21T09:08:46.9193444Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9194007Z cvt.u64.u32 %rd166, %r603; 2026-02-21T09:08:46.9194282Z cvt.u64.u32 %rd167, %r604; 2026-02-21T09:08:46.9194568Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:08:46.9194931Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:08:46.9195445Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9196008Z mov.b64 {%r1234, %r1235}, %rd169; 2026-02-21T09:08:46.9196449Z cvt.rn.f16x2.f32 %r1236, %r1235, %r1234; 2026-02-21T09:08:46.9197001Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9197550Z cvt.u64.u32 %rd170, %r605; 2026-02-21T09:08:46.9197838Z cvt.u64.u32 %rd171, %r606; 2026-02-21T09:08:46.9197935Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:08:46.9198069Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:08:46.9198492Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9198605Z mov.b64 {%r1237, %r1238}, %rd173; 2026-02-21T09:08:46.9198722Z cvt.rn.f16x2.f32 %r1239, %r1238, %r1237; 2026-02-21T09:08:46.9199054Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9199153Z cvt.u64.u32 %rd174, %r608; 2026-02-21T09:08:46.9199335Z cvt.u64.u32 %rd175, %r609; 2026-02-21T09:08:46.9199458Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:08:46.9199565Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:08:46.9199888Z .loc 1 55 27 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9199993Z mov.b64 {%r1240, %r1241}, %rd177; 2026-02-21T09:08:46.9200115Z cvt.rn.f16x2.f32 %r1242, %r1241, %r1240; 2026-02-21T09:08:46.9200430Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9200530Z cvt.u64.u32 %rd178, %r610; 2026-02-21T09:08:46.9200638Z cvt.u64.u32 %rd179, %r611; 2026-02-21T09:08:46.9200738Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:08:46.9200836Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:08:46.9201163Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9201264Z mov.b64 {%r1243, %r1244}, %rd181; 2026-02-21T09:08:46.9201381Z cvt.rn.f16x2.f32 %r1245, %r1244, %r1243; 2026-02-21T09:08:46.9201706Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9201808Z cvt.u64.u32 %rd182, %r612; 2026-02-21T09:08:46.9201904Z cvt.u64.u32 %rd183, %r613; 2026-02-21T09:08:46.9202002Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:08:46.9202116Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:08:46.9202434Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9202537Z mov.b64 {%r1246, %r1247}, %rd185; 2026-02-21T09:08:46.9202657Z cvt.rn.f16x2.f32 %r1248, %r1247, %r1246; 2026-02-21T09:08:46.9202972Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9203076Z cvt.u64.u32 %rd186, %r614; 2026-02-21T09:08:46.9203181Z cvt.u64.u32 %rd187, %r615; 2026-02-21T09:08:46.9203278Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:08:46.9203381Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:08:46.9203706Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9203819Z mov.b64 {%r1249, %r1250}, %rd189; 2026-02-21T09:08:46.9203932Z cvt.rn.f16x2.f32 %r1251, %r1250, %r1249; 2026-02-21T09:08:46.9204248Z .loc 1 53 52 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9204464Z cvt.u64.u32 %rd190, %r616; 2026-02-21T09:08:46.9204560Z cvt.u64.u32 %rd191, %r617; 2026-02-21T09:08:46.9204662Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:08:46.9204820Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:08:46.9205144Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9205241Z mov.b64 {%r1252, %r1253}, %rd193; 2026-02-21T09:08:46.9205353Z cvt.rn.f16x2.f32 %r1254, %r1253, %r1252; 2026-02-21T09:08:46.9205676Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9205776Z cvt.u64.u32 %rd194, %r618; 2026-02-21T09:08:46.9205971Z cvt.u64.u32 %rd195, %r619; 2026-02-21T09:08:46.9206080Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:08:46.9206181Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:08:46.9206508Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9206619Z mov.b64 {%r1255, %r1256}, %rd197; 2026-02-21T09:08:46.9206732Z cvt.rn.f16x2.f32 %r1257, %r1256, %r1255; 2026-02-21T09:08:46.9207172Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9207284Z cvt.u64.u32 %rd198, %r620; 2026-02-21T09:08:46.9207389Z cvt.u64.u32 %rd199, %r621; 2026-02-21T09:08:46.9207486Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:08:46.9207586Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:08:46.9207916Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9208099Z mov.b64 {%r1258, %r1259}, %rd201; 2026-02-21T09:08:46.9208221Z cvt.rn.f16x2.f32 %r1260, %r1259, %r1258; 2026-02-21T09:08:46.9208558Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9208657Z cvt.u64.u32 %rd202, %r622; 2026-02-21T09:08:46.9208758Z cvt.u64.u32 %rd203, %r623; 2026-02-21T09:08:46.9208856Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:08:46.9208964Z or.b64 
%rd205, %rd202, %rd204; 2026-02-21T09:08:46.9209289Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9209388Z mov.b64 {%r1261, %r1262}, %rd205; 2026-02-21T09:08:46.9209511Z cvt.rn.f16x2.f32 %r1263, %r1262, %r1261; 2026-02-21T09:08:46.9209830Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9209927Z cvt.u64.u32 %rd206, %r625; 2026-02-21T09:08:46.9210033Z cvt.u64.u32 %rd207, %r626; 2026-02-21T09:08:46.9210130Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:08:46.9210233Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:08:46.9210554Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9210663Z mov.b64 {%r1264, %r1265}, %rd209; 2026-02-21T09:08:46.9210778Z cvt.rn.f16x2.f32 %r1266, %r1265, %r1264; 2026-02-21T09:08:46.9211094Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9211209Z cvt.u64.u32 %rd210, %r627; 2026-02-21T09:08:46.9211308Z cvt.u64.u32 %rd211, %r628; 2026-02-21T09:08:46.9211406Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:08:46.9211516Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:08:46.9211831Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9211930Z mov.b64 {%r1267, %r1268}, %rd213; 2026-02-21T09:08:46.9212042Z cvt.rn.f16x2.f32 %r1269, %r1268, %r1267; 2026-02-21T09:08:46.9212374Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9212475Z cvt.u64.u32 %rd214, %r629; 2026-02-21T09:08:46.9212572Z cvt.u64.u32 %rd215, %r630; 2026-02-21T09:08:46.9212681Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:08:46.9212890Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:08:46.9213219Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9213327Z mov.b64 {%r1270, %r1271}, %rd217; 2026-02-21T09:08:46.9213443Z cvt.rn.f16x2.f32 %r1272, %r1271, %r1270; 
2026-02-21T09:08:46.9213759Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9213860Z cvt.u64.u32 %rd218, %r631; 2026-02-21T09:08:46.9213967Z cvt.u64.u32 %rd219, %r632; 2026-02-21T09:08:46.9214066Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:08:46.9214164Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:08:46.9214493Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9214736Z mov.b64 {%r1273, %r1274}, %rd221; 2026-02-21T09:08:46.9214852Z cvt.rn.f16x2.f32 %r1275, %r1274, %r1273; 2026-02-21T09:08:46.9215206Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9215312Z cvt.u64.u32 %rd222, %r633; 2026-02-21T09:08:46.9215408Z cvt.u64.u32 %rd223, %r634; 2026-02-21T09:08:46.9215506Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:08:46.9215701Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:08:46.9216032Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9216130Z mov.b64 {%r1276, %r1277}, %rd225; 2026-02-21T09:08:46.9216252Z cvt.rn.f16x2.f32 %r1278, %r1277, %r1276; 2026-02-21T09:08:46.9216568Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9216754Z cvt.u64.u32 %rd226, %r635; 2026-02-21T09:08:46.9216877Z cvt.u64.u32 %rd227, %r636; 2026-02-21T09:08:46.9216977Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:08:46.9217077Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:08:46.9217405Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9217519Z mov.b64 {%r1279, %r1280}, %rd229; 2026-02-21T09:08:46.9217630Z cvt.rn.f16x2.f32 %r1281, %r1280, %r1279; 2026-02-21T09:08:46.9217954Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9218061Z cvt.u64.u32 %rd230, %r637; 2026-02-21T09:08:46.9218158Z cvt.u64.u32 %rd231, %r638; 2026-02-21T09:08:46.9218256Z shl.b64 %rd232, %rd231, 
32; 2026-02-21T09:08:46.9218362Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:08:46.9218683Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9218785Z mov.b64 {%r1282, %r1283}, %rd233; 2026-02-21T09:08:46.9218901Z cvt.rn.f16x2.f32 %r1284, %r1283, %r1282; 2026-02-21T09:08:46.9219227Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9219331Z cvt.u64.u32 %rd234, %r639; 2026-02-21T09:08:46.9219435Z cvt.u64.u32 %rd235, %r640; 2026-02-21T09:08:46.9219540Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:08:46.9219638Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:08:46.9219966Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9220074Z mov.b64 {%r1285, %r1286}, %rd237; 2026-02-21T09:08:46.9220186Z cvt.rn.f16x2.f32 %r1287, %r1286, %r1285; 2026-02-21T09:08:46.9220508Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9220606Z cvt.u64.u32 %rd238, %r642; 2026-02-21T09:08:46.9220713Z cvt.u64.u32 %rd239, %r643; 2026-02-21T09:08:46.9220817Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:08:46.9220919Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:08:46.9221251Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9221351Z mov.b64 {%r1288, %r1289}, %rd241; 2026-02-21T09:08:46.9221571Z cvt.rn.f16x2.f32 %r1290, %r1289, %r1288; 2026-02-21T09:08:46.9221903Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9222004Z cvt.u64.u32 %rd242, %r644; 2026-02-21T09:08:46.9222104Z cvt.u64.u32 %rd243, %r645; 2026-02-21T09:08:46.9222201Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:08:46.9222307Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:08:46.9222626Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9222723Z mov.b64 {%r1291, %r1292}, %rd245; 2026-02-21T09:08:46.9222843Z 
cvt.rn.f16x2.f32 %r1293, %r1292, %r1291; 2026-02-21T09:08:46.9223163Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9223365Z cvt.u64.u32 %rd246, %r646; 2026-02-21T09:08:46.9223472Z cvt.u64.u32 %rd247, %r647; 2026-02-21T09:08:46.9223569Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:08:46.9223674Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:08:46.9223998Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9224109Z mov.b64 {%r1294, %r1295}, %rd249; 2026-02-21T09:08:46.9224286Z cvt.rn.f16x2.f32 %r1296, %r1295, %r1294; 2026-02-21T09:08:46.9224612Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9224779Z cvt.u64.u32 %rd250, %r648; 2026-02-21T09:08:46.9224879Z cvt.u64.u32 %rd251, %r649; 2026-02-21T09:08:46.9224977Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:08:46.9225087Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:08:46.9225483Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9225593Z mov.b64 {%r1297, %r1298}, %rd253; 2026-02-21T09:08:46.9225708Z cvt.rn.f16x2.f32 %r1299, %r1298, %r1297; 2026-02-21T09:08:46.9226044Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9226146Z cvt.u64.u32 %rd254, %r650; 2026-02-21T09:08:46.9226245Z cvt.u64.u32 %rd255, %r651; 2026-02-21T09:08:46.9226353Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:08:46.9226461Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:08:46.9226789Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9226897Z mov.b64 {%r1300, %r1301}, %rd257; 2026-02-21T09:08:46.9227011Z cvt.rn.f16x2.f32 %r1302, %r1301, %r1300; 2026-02-21T09:08:46.9227341Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9227442Z cvt.u64.u32 %rd258, %r652; 2026-02-21T09:08:46.9227561Z cvt.u64.u32 %rd259, %r653; 
2026-02-21T09:08:46.9227660Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:08:46.9227761Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:08:46.9228095Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9228199Z mov.b64 {%r1303, %r1304}, %rd261; 2026-02-21T09:08:46.9228314Z cvt.rn.f16x2.f32 %r1305, %r1304, %r1303; 2026-02-21T09:08:46.9228656Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9228756Z cvt.u64.u32 %rd262, %r654; 2026-02-21T09:08:46.9228854Z cvt.u64.u32 %rd263, %r655; 2026-02-21T09:08:46.9228951Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:08:46.9229058Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:08:46.9229388Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9229491Z mov.b64 {%r1306, %r1307}, %rd265; 2026-02-21T09:08:46.9229619Z cvt.rn.f16x2.f32 %r1308, %r1307, %r1306; 2026-02-21T09:08:46.9229946Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9230044Z cvt.u64.u32 %rd266, %r656; 2026-02-21T09:08:46.9230265Z cvt.u64.u32 %rd267, %r657; 2026-02-21T09:08:46.9230362Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:08:46.9230461Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:08:46.9230789Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9230901Z mov.b64 {%r1309, %r1310}, %rd269; 2026-02-21T09:08:46.9231013Z cvt.rn.f16x2.f32 %r1311, %r1310, %r1309; 2026-02-21T09:08:46.9231339Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9231445Z cvt.u64.u32 %rd270, %r659; 2026-02-21T09:08:46.9231541Z cvt.u64.u32 %rd271, %r660; 2026-02-21T09:08:46.9231644Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:08:46.9231858Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:08:46.9232185Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9232285Z mov.b64 {%r1312, 
%r1313}, %rd273; 2026-02-21T09:08:46.9232399Z cvt.rn.f16x2.f32 %r1314, %r1313, %r1312; 2026-02-21T09:08:46.9232736Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9232834Z cvt.u64.u32 %rd274, %r661; 2026-02-21T09:08:46.9233012Z cvt.u64.u32 %rd275, %r662; 2026-02-21T09:08:46.9233131Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:08:46.9233230Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:08:46.9233551Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9233658Z mov.b64 {%r1315, %r1316}, %rd277; 2026-02-21T09:08:46.9233770Z cvt.rn.f16x2.f32 %r1317, %r1316, %r1315; 2026-02-21T09:08:46.9234156Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9234274Z cvt.u64.u32 %rd278, %r663; 2026-02-21T09:08:46.9234383Z cvt.u64.u32 %rd279, %r664; 2026-02-21T09:08:46.9234481Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:08:46.9234580Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:08:46.9234986Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9235091Z mov.b64 {%r1318, %r1319}, %rd281; 2026-02-21T09:08:46.9235205Z cvt.rn.f16x2.f32 %r1320, %r1319, %r1318; 2026-02-21T09:08:46.9235535Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9235634Z cvt.u64.u32 %rd282, %r665; 2026-02-21T09:08:46.9235730Z cvt.u64.u32 %rd283, %r666; 2026-02-21T09:08:46.9235827Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:08:46.9235936Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:08:46.9236257Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9236358Z mov.b64 {%r1321, %r1322}, %rd285; 2026-02-21T09:08:46.9236479Z cvt.rn.f16x2.f32 %r1323, %r1322, %r1321; 2026-02-21T09:08:46.9236797Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9236898Z cvt.u64.u32 %rd286, %r667; 
2026-02-21T09:08:46.9237005Z cvt.u64.u32 %rd287, %r668; 2026-02-21T09:08:46.9237101Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:08:46.9237202Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:08:46.9237521Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9237632Z mov.b64 {%r1324, %r1325}, %rd289; 2026-02-21T09:08:46.9237746Z cvt.rn.f16x2.f32 %r1326, %r1325, %r1324; 2026-02-21T09:08:46.9238368Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9238486Z cvt.u64.u32 %rd290, %r669; 2026-02-21T09:08:46.9238592Z cvt.u64.u32 %rd291, %r670; 2026-02-21T09:08:46.9238688Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:08:46.9238796Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:08:46.9239113Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9239330Z mov.b64 {%r1327, %r1328}, %rd293; 2026-02-21T09:08:46.9239453Z cvt.rn.f16x2.f32 %r1329, %r1328, %r1327; 2026-02-21T09:08:46.9239777Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9239876Z cvt.u64.u32 %rd294, %r671; 2026-02-21T09:08:46.9239971Z cvt.u64.u32 %rd295, %r672; 2026-02-21T09:08:46.9240076Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:08:46.9240175Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:08:46.9240498Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9240610Z mov.b64 {%r1330, %r1331}, %rd297; 2026-02-21T09:08:46.9240722Z cvt.rn.f16x2.f32 %r1332, %r1331, %r1330; 2026-02-21T09:08:46.9241168Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9241277Z cvt.u64.u32 %rd298, %r673; 2026-02-21T09:08:46.9241375Z cvt.u64.u32 %rd299, %r674; 2026-02-21T09:08:46.9241477Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:08:46.9241575Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:08:46.9241979Z .loc 1 55 27 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9242086Z mov.b64 {%r1333, %r1334}, %rd301; 2026-02-21T09:08:46.9242199Z cvt.rn.f16x2.f32 %r1335, %r1334, %r1333; 2026-02-21T09:08:46.9242527Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9242625Z cvt.u64.u32 %rd302, %r676; 2026-02-21T09:08:46.9242722Z cvt.u64.u32 %rd303, %r677; 2026-02-21T09:08:46.9242829Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:08:46.9243036Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:08:46.9243374Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9243469Z mov.b64 {%r1336, %r1337}, %rd305; 2026-02-21T09:08:46.9243589Z cvt.rn.f16x2.f32 %r1338, %r1337, %r1336; 2026-02-21T09:08:46.9243908Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9244007Z cvt.u64.u32 %rd306, %r678; 2026-02-21T09:08:46.9244116Z cvt.u64.u32 %rd307, %r679; 2026-02-21T09:08:46.9244214Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:08:46.9244313Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:08:46.9244642Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9244813Z mov.b64 {%r1339, %r1340}, %rd309; 2026-02-21T09:08:46.9244930Z cvt.rn.f16x2.f32 %r1341, %r1340, %r1339; 2026-02-21T09:08:46.9245253Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9245366Z cvt.u64.u32 %rd310, %r680; 2026-02-21T09:08:46.9245461Z cvt.u64.u32 %rd311, %r681; 2026-02-21T09:08:46.9245558Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:08:46.9245663Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:08:46.9245986Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9246085Z mov.b64 {%r1342, %r1343}, %rd313; 2026-02-21T09:08:46.9246208Z cvt.rn.f16x2.f32 %r1344, %r1343, %r1342; 2026-02-21T09:08:46.9246529Z .loc 1 53 52 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9246626Z cvt.u64.u32 %rd314, %r682; 2026-02-21T09:08:46.9246717Z cvt.u64.u32 %rd315, %r683; 2026-02-21T09:08:46.9246828Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:08:46.9246927Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:08:46.9247250Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9247363Z mov.b64 {%r1345, %r1346}, %rd317; 2026-02-21T09:08:46.9247474Z cvt.rn.f16x2.f32 %r1347, %r1346, %r1345; 2026-02-21T09:08:46.9247792Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9248020Z cvt.u64.u32 %rd318, %r684; 2026-02-21T09:08:46.9248118Z cvt.u64.u32 %rd319, %r685; 2026-02-21T09:08:46.9248212Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:08:46.9248314Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:08:46.9248648Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9248746Z mov.b64 {%r1348, %r1349}, %rd321; 2026-02-21T09:08:46.9248859Z cvt.rn.f16x2.f32 %r1350, %r1349, %r1348; 2026-02-21T09:08:46.9249190Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9249293Z cvt.u64.u32 %rd322, %r686; 2026-02-21T09:08:46.9249389Z cvt.u64.u32 %rd323, %r687; 2026-02-21T09:08:46.9249597Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:08:46.9249698Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:08:46.9250018Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9250120Z mov.b64 {%r1351, %r1352}, %rd325; 2026-02-21T09:08:46.9250241Z cvt.rn.f16x2.f32 %r1353, %r1352, %r1351; 2026-02-21T09:08:46.9250637Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9250747Z cvt.u64.u32 %rd326, %r688; 2026-02-21T09:08:46.9250856Z cvt.u64.u32 %rd327, %r689; 2026-02-21T09:08:46.9250952Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:08:46.9251051Z or.b64 
%rd329, %rd326, %rd328; 2026-02-21T09:08:46.9251380Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9251478Z mov.b64 {%r1354, %r1355}, %rd329; 2026-02-21T09:08:46.9251673Z cvt.rn.f16x2.f32 %r1356, %r1355, %r1354; 2026-02-21T09:08:46.9252015Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9252120Z cvt.u64.u32 %rd330, %r690; 2026-02-21T09:08:46.9252219Z cvt.u64.u32 %rd331, %r691; 2026-02-21T09:08:46.9252321Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:08:46.9252428Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:08:46.9252755Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9252855Z mov.b64 {%r1357, %r1358}, %rd333; 2026-02-21T09:08:46.9252978Z cvt.rn.f16x2.f32 %r1359, %r1358, %r1357; 2026-02-21T09:08:46.9253303Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9253401Z cvt.u64.u32 %rd334, %r693; 2026-02-21T09:08:46.9253499Z cvt.u64.u32 %rd335, %r694; 2026-02-21T09:08:46.9253605Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:08:46.9253709Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:08:46.9254039Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9254150Z mov.b64 {%r1360, %r1361}, %rd337; 2026-02-21T09:08:46.9254263Z cvt.rn.f16x2.f32 %r1362, %r1361, %r1360; 2026-02-21T09:08:46.9254590Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9254753Z cvt.u64.u32 %rd338, %r695; 2026-02-21T09:08:46.9254859Z cvt.u64.u32 %rd339, %r696; 2026-02-21T09:08:46.9254951Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:08:46.9255051Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:08:46.9255382Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9255482Z mov.b64 {%r1363, %r1364}, %rd341; 2026-02-21T09:08:46.9255595Z cvt.rn.f16x2.f32 %r1365, %r1364, %r1363; 
2026-02-21T09:08:46.9255931Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9256031Z cvt.u64.u32 %rd342, %r697; 2026-02-21T09:08:46.9256126Z cvt.u64.u32 %rd343, %r698; 2026-02-21T09:08:46.9256234Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:08:46.9256333Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:08:46.9256776Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9256874Z mov.b64 {%r1366, %r1367}, %rd345; 2026-02-21T09:08:46.9256999Z cvt.rn.f16x2.f32 %r1368, %r1367, %r1366; 2026-02-21T09:08:46.9257320Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9257415Z cvt.u64.u32 %rd346, %r699; 2026-02-21T09:08:46.9257521Z cvt.u64.u32 %rd347, %r700; 2026-02-21T09:08:46.9257618Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:08:46.9257715Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:08:46.9258054Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9258256Z mov.b64 {%r1369, %r1370}, %rd349; 2026-02-21T09:08:46.9258370Z cvt.rn.f16x2.f32 %r1371, %r1370, %r1369; 2026-02-21T09:08:46.9258693Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9258806Z cvt.u64.u32 %rd350, %r701; 2026-02-21T09:08:46.9258902Z cvt.u64.u32 %rd351, %r702; 2026-02-21T09:08:46.9259001Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:08:46.9259108Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:08:46.9259518Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9259631Z mov.b64 {%r1372, %r1373}, %rd353; 2026-02-21T09:08:46.9259753Z cvt.rn.f16x2.f32 %r1374, %r1373, %r1372; 2026-02-21T09:08:46.9260077Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9260176Z cvt.u64.u32 %rd354, %r703; 2026-02-21T09:08:46.9260355Z cvt.u64.u32 %rd355, %r704; 2026-02-21T09:08:46.9260475Z shl.b64 %rd356, %rd355, 
32; 2026-02-21T09:08:46.9260572Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:08:46.9260937Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9261053Z mov.b64 {%r1375, %r1376}, %rd357; 2026-02-21T09:08:46.9261164Z cvt.rn.f16x2.f32 %r1377, %r1376, %r1375; 2026-02-21T09:08:46.9261481Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9261587Z cvt.u64.u32 %rd358, %r705; 2026-02-21T09:08:46.9261683Z cvt.u64.u32 %rd359, %r706; 2026-02-21T09:08:46.9261779Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:08:46.9261876Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:08:46.9262202Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9262301Z mov.b64 {%r1378, %r1379}, %rd361; 2026-02-21T09:08:46.9262415Z cvt.rn.f16x2.f32 %r1380, %r1379, %r1378; 2026-02-21T09:08:46.9262741Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9262837Z cvt.u64.u32 %rd362, %r707; 2026-02-21T09:08:46.9262932Z cvt.u64.u32 %rd363, %r708; 2026-02-21T09:08:46.9263043Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:08:46.9263141Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:08:46.9263459Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9263561Z mov.b64 {%r1381, %r1382}, %rd365; 2026-02-21T09:08:46.9263683Z cvt.rn.f16x2.f32 %r1383, %r1382, %r1381; 2026-02-21T09:08:46.9263995Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9264089Z cvt.u64.u32 %rd366, %r710; 2026-02-21T09:08:46.9264215Z cvt.u64.u32 %rd367, %r711; 2026-02-21T09:08:46.9264314Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:08:46.9264417Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:08:46.9264827Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9264936Z mov.b64 {%r1384, %r1385}, %rd369; 2026-02-21T09:08:46.9265048Z 
cvt.rn.f16x2.f32 %r1386, %r1385, %r1384; 2026-02-21T09:08:46.9265491Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9265600Z cvt.u64.u32 %rd370, %r712; 2026-02-21T09:08:46.9265698Z cvt.u64.u32 %rd371, %r713; 2026-02-21T09:08:46.9265799Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:08:46.9265903Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:08:46.9266230Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9266330Z mov.b64 {%r1387, %r1388}, %rd373; 2026-02-21T09:08:46.9266457Z cvt.rn.f16x2.f32 %r1389, %r1388, %r1387; 2026-02-21T09:08:46.9266784Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9266986Z cvt.u64.u32 %rd374, %r714; 2026-02-21T09:08:46.9267084Z cvt.u64.u32 %rd375, %r715; 2026-02-21T09:08:46.9267193Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:08:46.9267293Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:08:46.9267619Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9267736Z mov.b64 {%r1390, %r1391}, %rd377; 2026-02-21T09:08:46.9267847Z cvt.rn.f16x2.f32 %r1392, %r1391, %r1390; 2026-02-21T09:08:46.9268255Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9268375Z cvt.u64.u32 %rd378, %r716; 2026-02-21T09:08:46.9268473Z cvt.u64.u32 %rd379, %r717; 2026-02-21T09:08:46.9268570Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:08:46.9268664Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:08:46.9269075Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9269188Z mov.b64 {%r1393, %r1394}, %rd381; 2026-02-21T09:08:46.9269304Z cvt.rn.f16x2.f32 %r1395, %r1394, %r1393; 2026-02-21T09:08:46.9269636Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9269738Z cvt.u64.u32 %rd382, %r718; 2026-02-21T09:08:46.9269836Z cvt.u64.u32 %rd383, %r719; 
2026-02-21T09:08:46.9269944Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:08:46.9270045Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:08:46.9270373Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9270474Z mov.b64 {%r1396, %r1397}, %rd385; 2026-02-21T09:08:46.9270595Z cvt.rn.f16x2.f32 %r1398, %r1397, %r1396; 2026-02-21T09:08:46.9270919Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9271018Z cvt.u64.u32 %rd386, %r720; 2026-02-21T09:08:46.9271126Z cvt.u64.u32 %rd387, %r721; 2026-02-21T09:08:46.9271229Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:08:46.9271325Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:08:46.9271658Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9271761Z mov.b64 {%r1399, %r1400}, %rd389; 2026-02-21T09:08:46.9271877Z cvt.rn.f16x2.f32 %r1401, %r1400, %r1399; 2026-02-21T09:08:46.9272201Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9272312Z cvt.u64.u32 %rd390, %r722; 2026-02-21T09:08:46.9272408Z cvt.u64.u32 %rd391, %r723; 2026-02-21T09:08:46.9272505Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:08:46.9272611Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:08:46.9272937Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9273036Z mov.b64 {%r1402, %r1403}, %rd393; 2026-02-21T09:08:46.9273162Z cvt.rn.f16x2.f32 %r1404, %r1403, %r1402; 2026-02-21T09:08:46.9273492Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9273589Z cvt.u64.u32 %rd394, %r724; 2026-02-21T09:08:46.9273685Z cvt.u64.u32 %rd395, %r725; 2026-02-21T09:08:46.9273789Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:08:46.9274016Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:08:46.9274328Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9274443Z mov.b64 {%r1405, 
%r1406}, %rd397; 2026-02-21T09:08:46.9274556Z cvt.rn.f16x2.f32 %r1407, %r1406, %r1405; 2026-02-21T09:08:46.9274917Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9275033Z cvt.u64.u32 %rd398, %r727; 2026-02-21T09:08:46.9275130Z cvt.u64.u32 %rd399, %r728; 2026-02-21T09:08:46.9275229Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:08:46.9275333Z or.b64 %rd401, %rd398, %rd400; 2026-02-21T09:08:46.9275754Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9275852Z mov.b64 {%r1408, %r1409}, %rd401; 2026-02-21T09:08:46.9275965Z cvt.rn.f16x2.f32 %r1410, %r1409, %r1408; 2026-02-21T09:08:46.9276293Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9276397Z cvt.u64.u32 %rd402, %r729; 2026-02-21T09:08:46.9276492Z cvt.u64.u32 %rd403, %r730; 2026-02-21T09:08:46.9276687Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:08:46.9276798Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:08:46.9277117Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9277218Z mov.b64 {%r1411, %r1412}, %rd405; 2026-02-21T09:08:46.9277340Z cvt.rn.f16x2.f32 %r1413, %r1412, %r1411; 2026-02-21T09:08:46.9277737Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9277848Z cvt.u64.u32 %rd406, %r731; 2026-02-21T09:08:46.9277955Z cvt.u64.u32 %rd407, %r732; 2026-02-21T09:08:46.9278052Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:08:46.9278155Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:08:46.9278487Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9278592Z mov.b64 {%r1414, %r1415}, %rd409; 2026-02-21T09:08:46.9278707Z cvt.rn.f16x2.f32 %r1416, %r1415, %r1414; 2026-02-21T09:08:46.9279036Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9279144Z cvt.u64.u32 %rd410, %r733; 
2026-02-21T09:08:46.9279241Z cvt.u64.u32 %rd411, %r734; 2026-02-21T09:08:46.9279339Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:08:46.9279445Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:08:46.9279769Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9279875Z mov.b64 {%r1417, %r1418}, %rd413; 2026-02-21T09:08:46.9279998Z cvt.rn.f16x2.f32 %r1419, %r1418, %r1417; 2026-02-21T09:08:46.9280324Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9280422Z cvt.u64.u32 %rd414, %r735; 2026-02-21T09:08:46.9280523Z cvt.u64.u32 %rd415, %r736; 2026-02-21T09:08:46.9280632Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:08:46.9280730Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:08:46.9281057Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9281165Z mov.b64 {%r1420, %r1421}, %rd417; 2026-02-21T09:08:46.9281277Z cvt.rn.f16x2.f32 %r1422, %r1421, %r1420; 2026-02-21T09:08:46.9281601Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9281711Z cvt.u64.u32 %rd418, %r737; 2026-02-21T09:08:46.9281814Z cvt.u64.u32 %rd419, %r738; 2026-02-21T09:08:46.9281915Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:08:46.9282013Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:08:46.9282346Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9282450Z mov.b64 {%r1423, %r1424}, %rd421; 2026-02-21T09:08:46.9282668Z cvt.rn.f16x2.f32 %r1425, %r1424, %r1423; 2026-02-21T09:08:46.9283001Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9283104Z cvt.u64.u32 %rd422, %r739; 2026-02-21T09:08:46.9283204Z cvt.u64.u32 %rd423, %r740; 2026-02-21T09:08:46.9283311Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:08:46.9283412Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:08:46.9283731Z .loc 1 55 27 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9283830Z mov.b64 {%r1426, %r1427}, %rd425; 2026-02-21T09:08:46.9283958Z cvt.rn.f16x2.f32 %r1428, %r1427, %r1426; 2026-02-21T09:08:46.9284362Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9284463Z cvt.u64.u32 %rd426, %r741; 2026-02-21T09:08:46.9284569Z cvt.u64.u32 %rd427, %r742; 2026-02-21T09:08:46.9284665Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:08:46.9284839Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:08:46.9285171Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9285354Z mov.b64 {%r1429, %r1430}, %rd429; 2026-02-21T09:08:46.9285475Z cvt.rn.f16x2.f32 %r1431, %r1430, %r1429; 2026-02-21T09:08:46.9285797Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9285903Z cvt.u64.u32 %rd430, %r744; 2026-02-21T09:08:46.9285999Z cvt.u64.u32 %rd431, %r745; 2026-02-21T09:08:46.9286096Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:08:46.9286203Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:08:46.9286600Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9286717Z mov.b64 {%r1432, %r1433}, %rd433; 2026-02-21T09:08:46.9286840Z cvt.rn.f16x2.f32 %r1434, %r1433, %r1432; 2026-02-21T09:08:46.9287163Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9287266Z cvt.u64.u32 %rd434, %r746; 2026-02-21T09:08:46.9287364Z cvt.u64.u32 %rd435, %r747; 2026-02-21T09:08:46.9287472Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:08:46.9287572Z or.b64 %rd437, %rd434, %rd436; 2026-02-21T09:08:46.9287897Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9288009Z mov.b64 {%r1435, %r1436}, %rd437; 2026-02-21T09:08:46.9288122Z cvt.rn.f16x2.f32 %r1437, %r1436, %r1435; 2026-02-21T09:08:46.9288445Z .loc 1 53 52 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9288555Z cvt.u64.u32 %rd438, %r748; 2026-02-21T09:08:46.9288650Z cvt.u64.u32 %rd439, %r749; 2026-02-21T09:08:46.9288747Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:08:46.9288844Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:08:46.9289175Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9289278Z mov.b64 {%r1438, %r1439}, %rd441; 2026-02-21T09:08:46.9289390Z cvt.rn.f16x2.f32 %r1440, %r1439, %r1438; 2026-02-21T09:08:46.9289724Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9289822Z cvt.u64.u32 %rd442, %r750; 2026-02-21T09:08:46.9289918Z cvt.u64.u32 %rd443, %r751; 2026-02-21T09:08:46.9290028Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:08:46.9290126Z or.b64 %rd445, %rd442, %rd444; 2026-02-21T09:08:46.9290453Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9290557Z mov.b64 {%r1441, %r1442}, %rd445; 2026-02-21T09:08:46.9290686Z cvt.rn.f16x2.f32 %r1443, %r1442, %r1441; 2026-02-21T09:08:46.9291008Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9291108Z cvt.u64.u32 %rd446, %r752; 2026-02-21T09:08:46.9291321Z cvt.u64.u32 %rd447, %r753; 2026-02-21T09:08:46.9291418Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:08:46.9291517Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:08:46.9291848Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9291947Z mov.b64 {%r1444, %r1445}, %rd449; 2026-02-21T09:08:46.9292058Z cvt.rn.f16x2.f32 %r1446, %r1445, %r1444; 2026-02-21T09:08:46.9292377Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9292486Z cvt.u64.u32 %rd450, %r754; 2026-02-21T09:08:46.9292585Z cvt.u64.u32 %rd451, %r755; 2026-02-21T09:08:46.9292686Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:08:46.9292891Z or.b64 
%rd453, %rd450, %rd452; 2026-02-21T09:08:46.9293218Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9293315Z mov.b64 {%r1447, %r1448}, %rd453; 2026-02-21T09:08:46.9293446Z cvt.rn.f16x2.f32 %r1449, %r1448, %r1447; 2026-02-21T09:08:46.9293770Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9293938Z cvt.u64.u32 %rd454, %r756; 2026-02-21T09:08:46.9294047Z cvt.u64.u32 %rd455, %r757; 2026-02-21T09:08:46.9294156Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:08:46.9294255Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:08:46.9294575Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9294731Z mov.b64 {%r1450, %r1451}, %rd457; 2026-02-21T09:08:46.9294872Z cvt.rn.f16x2.f32 %r1452, %r1451, %r1450; 2026-02-21T09:08:46.9295271Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9295393Z cvt.u64.u32 %rd458, %r758; 2026-02-21T09:08:46.9295491Z cvt.u64.u32 %rd459, %r759; 2026-02-21T09:08:46.9295588Z shl.b64 %rd460, %rd459, 32; 2026-02-21T09:08:46.9295692Z or.b64 %rd461, %rd458, %rd460; 2026-02-21T09:08:46.9296019Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9296120Z mov.b64 {%r1453, %r1454}, %rd461; 2026-02-21T09:08:46.9296233Z cvt.rn.f16x2.f32 %r1455, %r1454, %r1453; 2026-02-21T09:08:46.9296558Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9296658Z cvt.u64.u32 %rd462, %r761; 2026-02-21T09:08:46.9296754Z cvt.u64.u32 %rd463, %r762; 2026-02-21T09:08:46.9296861Z shl.b64 %rd464, %rd463, 32; 2026-02-21T09:08:46.9296961Z or.b64 %rd465, %rd462, %rd464; 2026-02-21T09:08:46.9297282Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9297387Z mov.b64 {%r1456, %r1457}, %rd465; 2026-02-21T09:08:46.9297514Z cvt.rn.f16x2.f32 %r1458, %r1457, %r1456; 
2026-02-21T09:08:46.9297834Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9297939Z cvt.u64.u32 %rd466, %r763; 2026-02-21T09:08:46.9298045Z cvt.u64.u32 %rd467, %r764; 2026-02-21T09:08:46.9298141Z shl.b64 %rd468, %rd467, 32; 2026-02-21T09:08:46.9298242Z or.b64 %rd469, %rd466, %rd468; 2026-02-21T09:08:46.9298571Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9298672Z mov.b64 {%r1459, %r1460}, %rd469; 2026-02-21T09:08:46.9298783Z cvt.rn.f16x2.f32 %r1461, %r1460, %r1459; 2026-02-21T09:08:46.9299097Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9299215Z cvt.u64.u32 %rd470, %r765; 2026-02-21T09:08:46.9299315Z cvt.u64.u32 %rd471, %r766; 2026-02-21T09:08:46.9299412Z shl.b64 %rd472, %rd471, 32; 2026-02-21T09:08:46.9299519Z or.b64 %rd473, %rd470, %rd472; 2026-02-21T09:08:46.9299838Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9300047Z mov.b64 {%r1462, %r1463}, %rd473; 2026-02-21T09:08:46.9300173Z cvt.rn.f16x2.f32 %r1464, %r1463, %r1462; 2026-02-21T09:08:46.9300501Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9300600Z cvt.u64.u32 %rd474, %r767; 2026-02-21T09:08:46.9300699Z cvt.u64.u32 %rd475, %r768; 2026-02-21T09:08:46.9300803Z shl.b64 %rd476, %rd475, 32; 2026-02-21T09:08:46.9300902Z or.b64 %rd477, %rd474, %rd476; 2026-02-21T09:08:46.9301226Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9301335Z mov.b64 {%r1465, %r1466}, %rd477; 2026-02-21T09:08:46.9301545Z cvt.rn.f16x2.f32 %r1467, %r1466, %r1465; 2026-02-21T09:08:46.9301864Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9301974Z cvt.u64.u32 %rd478, %r769; 2026-02-21T09:08:46.9302078Z cvt.u64.u32 %rd479, %r770; 2026-02-21T09:08:46.9302175Z shl.b64 %rd480, %rd479, 
32; 2026-02-21T09:08:46.9302275Z or.b64 %rd481, %rd478, %rd480; 2026-02-21T09:08:46.9302682Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9302791Z mov.b64 {%r1468, %r1469}, %rd481; 2026-02-21T09:08:46.9302904Z cvt.rn.f16x2.f32 %r1470, %r1469, %r1468; 2026-02-21T09:08:46.9303229Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9303326Z cvt.u64.u32 %rd482, %r771; 2026-02-21T09:08:46.9303424Z cvt.u64.u32 %rd483, %r772; 2026-02-21T09:08:46.9303623Z shl.b64 %rd484, %rd483, 32; 2026-02-21T09:08:46.9303736Z or.b64 %rd485, %rd482, %rd484; 2026-02-21T09:08:46.9304061Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9304160Z mov.b64 {%r1471, %r1472}, %rd485; 2026-02-21T09:08:46.9304283Z cvt.rn.f16x2.f32 %r1473, %r1472, %r1471; 2026-02-21T09:08:46.9304603Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9304775Z cvt.u64.u32 %rd486, %r773; 2026-02-21T09:08:46.9304892Z cvt.u64.u32 %rd487, %r774; 2026-02-21T09:08:46.9304991Z shl.b64 %rd488, %rd487, 32; 2026-02-21T09:08:46.9305089Z or.b64 %rd489, %rd486, %rd488; 2026-02-21T09:08:46.9305419Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9305520Z mov.b64 {%r1474, %r1475}, %rd489; 2026-02-21T09:08:46.9305634Z cvt.rn.f16x2.f32 %r1476, %r1475, %r1474; 2026-02-21T09:08:46.9305957Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9306071Z cvt.u64.u32 %rd490, %r775; 2026-02-21T09:08:46.9306170Z cvt.u64.u32 %rd491, %r776; 2026-02-21T09:08:46.9306266Z shl.b64 %rd492, %rd491, 32; 2026-02-21T09:08:46.9306378Z or.b64 %rd493, %rd490, %rd492; 2026-02-21T09:08:46.9306700Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9306799Z mov.b64 {%r1477, %r1478}, %rd493; 2026-02-21T09:08:46.9306923Z 
cvt.rn.f16x2.f32 %r1479, %r1478, %r1477; 2026-02-21T09:08:46.9307248Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9307352Z cvt.u64.u32 %rd494, %r778; 2026-02-21T09:08:46.9307450Z cvt.u64.u32 %rd495, %r779; 2026-02-21T09:08:46.9307554Z shl.b64 %rd496, %rd495, 32; 2026-02-21T09:08:46.9307654Z or.b64 %rd497, %rd494, %rd496; 2026-02-21T09:08:46.9307979Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9308093Z mov.b64 {%r1480, %r1481}, %rd497; 2026-02-21T09:08:46.9308207Z cvt.rn.f16x2.f32 %r1482, %r1481, %r1480; 2026-02-21T09:08:46.9308526Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9308744Z cvt.u64.u32 %rd498, %r780; 2026-02-21T09:08:46.9308842Z cvt.u64.u32 %rd499, %r781; 2026-02-21T09:08:46.9308939Z shl.b64 %rd500, %rd499, 32; 2026-02-21T09:08:46.9309039Z or.b64 %rd501, %rd498, %rd500; 2026-02-21T09:08:46.9309369Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9309467Z mov.b64 {%r1483, %r1484}, %rd501; 2026-02-21T09:08:46.9309579Z cvt.rn.f16x2.f32 %r1485, %r1484, %r1483; 2026-02-21T09:08:46.9309902Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9310007Z cvt.u64.u32 %rd502, %r782; 2026-02-21T09:08:46.9310215Z cvt.u64.u32 %rd503, %r783; 2026-02-21T09:08:46.9310324Z shl.b64 %rd504, %rd503, 32; 2026-02-21T09:08:46.9310424Z or.b64 %rd505, %rd502, %rd504; 2026-02-21T09:08:46.9310745Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9310847Z mov.b64 {%r1486, %r1487}, %rd505; 2026-02-21T09:08:46.9310970Z cvt.rn.f16x2.f32 %r1488, %r1487, %r1486; 2026-02-21T09:08:46.9311370Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9311480Z cvt.u64.u32 %rd506, %r784; 2026-02-21T09:08:46.9311586Z cvt.u64.u32 %rd507, %r785; 
2026-02-21T09:08:46.9311682Z shl.b64 %rd508, %rd507, 32; 2026-02-21T09:08:46.9311781Z or.b64 %rd509, %rd506, %rd508; 2026-02-21T09:08:46.9312112Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9312287Z mov.b64 {%r1489, %r1490}, %rd509; 2026-02-21T09:08:46.9312416Z cvt.rn.f16x2.f32 %r1491, %r1490, %r1489; 2026-02-21T09:08:46.9312732Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9312842Z cvt.u64.u32 %rd510, %r786; 2026-02-21T09:08:46.9312944Z cvt.u64.u32 %rd511, %r787; 2026-02-21T09:08:46.9313042Z shl.b64 %rd512, %rd511, 32; 2026-02-21T09:08:46.9313150Z or.b64 %rd513, %rd510, %rd512; 2026-02-21T09:08:46.9313469Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9313569Z mov.b64 {%r1492, %r1493}, %rd513; 2026-02-21T09:08:46.9313692Z cvt.rn.f16x2.f32 %r1494, %r1493, %r1492; 2026-02-21T09:08:46.9314010Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9314108Z cvt.u64.u32 %rd514, %r788; 2026-02-21T09:08:46.9314204Z cvt.u64.u32 %rd515, %r789; 2026-02-21T09:08:46.9314316Z shl.b64 %rd516, %rd515, 32; 2026-02-21T09:08:46.9314418Z or.b64 %rd517, %rd514, %rd516; 2026-02-21T09:08:46.9314812Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9314929Z mov.b64 {%r1495, %r1496}, %rd517; 2026-02-21T09:08:46.9315041Z cvt.rn.f16x2.f32 %r1497, %r1496, %r1495; 2026-02-21T09:08:46.9315368Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9315479Z cvt.u64.u32 %rd518, %r790; 2026-02-21T09:08:46.9315579Z cvt.u64.u32 %rd519, %r791; 2026-02-21T09:08:46.9315680Z shl.b64 %rd520, %rd519, 32; 2026-02-21T09:08:46.9315777Z or.b64 %rd521, %rd518, %rd520; 2026-02-21T09:08:46.9316102Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9316200Z mov.b64 {%r1498, 
%r1499}, %rd521; 2026-02-21T09:08:46.9316313Z cvt.rn.f16x2.f32 %r1500, %r1499, %r1498; 2026-02-21T09:08:46.9316641Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9316744Z cvt.u64.u32 %rd522, %r792; 2026-02-21T09:08:46.9316841Z cvt.u64.u32 %rd523, %r793; 2026-02-21T09:08:46.9316946Z shl.b64 %rd524, %rd523, 32; 2026-02-21T09:08:46.9317046Z or.b64 %rd525, %rd522, %rd524; 2026-02-21T09:08:46.9317501Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9317601Z mov.b64 {%r1501, %r1502}, %rd525; 2026-02-21T09:08:46.9317728Z cvt.rn.f16x2.f32 %r1503, %r1502, %r1501; 2026-02-21T09:08:46.9318050Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9318147Z cvt.u64.u32 %rd526, %r795; 2026-02-21T09:08:46.9318257Z cvt.u64.u32 %rd527, %r796; 2026-02-21T09:08:46.9318354Z shl.b64 %rd528, %rd527, 32; 2026-02-21T09:08:46.9318453Z or.b64 %rd529, %rd526, %rd528; 2026-02-21T09:08:46.9318787Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9318992Z mov.b64 {%r1504, %r1505}, %rd529; 2026-02-21T09:08:46.9319104Z cvt.rn.f16x2.f32 %r1506, %r1505, %r1504; 2026-02-21T09:08:46.9319424Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9319538Z cvt.u64.u32 %rd530, %r797; 2026-02-21T09:08:46.9319636Z cvt.u64.u32 %rd531, %r798; 2026-02-21T09:08:46.9319734Z shl.b64 %rd532, %rd531, 32; 2026-02-21T09:08:46.9319919Z or.b64 %rd533, %rd530, %rd532; 2026-02-21T09:08:46.9320257Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9320355Z mov.b64 {%r1507, %r1508}, %rd533; 2026-02-21T09:08:46.9320473Z cvt.rn.f16x2.f32 %r1509, %r1508, %r1507; 2026-02-21T09:08:46.9320789Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9320969Z cvt.u64.u32 %rd534, %r799; 
2026-02-21T09:08:46.9321078Z cvt.u64.u32 %rd535, %r800; 2026-02-21T09:08:46.9321185Z shl.b64 %rd536, %rd535, 32; 2026-02-21T09:08:46.9321281Z or.b64 %rd537, %rd534, %rd536; 2026-02-21T09:08:46.9321599Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9321712Z mov.b64 {%r1510, %r1511}, %rd537; 2026-02-21T09:08:46.9321822Z cvt.rn.f16x2.f32 %r1512, %r1511, %r1510; 2026-02-21T09:08:46.9322138Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9322245Z cvt.u64.u32 %rd538, %r801; 2026-02-21T09:08:46.9322340Z cvt.u64.u32 %rd539, %r802; 2026-02-21T09:08:46.9322437Z shl.b64 %rd540, %rd539, 32; 2026-02-21T09:08:46.9322535Z or.b64 %rd541, %rd538, %rd540; 2026-02-21T09:08:46.9322859Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9322962Z mov.b64 {%r1513, %r1514}, %rd541; 2026-02-21T09:08:46.9323082Z cvt.rn.f16x2.f32 %r1515, %r1514, %r1513; 2026-02-21T09:08:46.9323406Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9323503Z cvt.u64.u32 %rd542, %r803; 2026-02-21T09:08:46.9323606Z cvt.u64.u32 %rd543, %r804; 2026-02-21T09:08:46.9323717Z shl.b64 %rd544, %rd543, 32; 2026-02-21T09:08:46.9323818Z or.b64 %rd545, %rd542, %rd544; 2026-02-21T09:08:46.9324135Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9324233Z mov.b64 {%r1516, %r1517}, %rd545; 2026-02-21T09:08:46.9324353Z cvt.rn.f16x2.f32 %r1518, %r1517, %r1516; 2026-02-21T09:08:46.9324761Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9324867Z cvt.u64.u32 %rd546, %r805; 2026-02-21T09:08:46.9324972Z cvt.u64.u32 %rd547, %r806; 2026-02-21T09:08:46.9325073Z shl.b64 %rd548, %rd547, 32; 2026-02-21T09:08:46.9325171Z or.b64 %rd549, %rd546, %rd548; 2026-02-21T09:08:46.9325505Z .loc 1 55 27 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9325607Z mov.b64 {%r1519, %r1520}, %rd549; 2026-02-21T09:08:46.9325719Z cvt.rn.f16x2.f32 %r1521, %r1520, %r1519; 2026-02-21T09:08:46.9326153Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9326263Z cvt.u64.u32 %rd550, %r807; 2026-02-21T09:08:46.9326362Z cvt.u64.u32 %rd551, %r808; 2026-02-21T09:08:46.9326463Z shl.b64 %rd552, %rd551, 32; 2026-02-21T09:08:46.9326570Z or.b64 %rd553, %rd550, %rd552; 2026-02-21T09:08:46.9326890Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9326988Z mov.b64 {%r1522, %r1523}, %rd553; 2026-02-21T09:08:46.9327109Z cvt.rn.f16x2.f32 %r1524, %r1523, %r1522; 2026-02-21T09:08:46.9327440Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9327638Z cvt.u64.u32 %rd554, %r809; 2026-02-21T09:08:46.9327737Z cvt.u64.u32 %rd555, %r810; 2026-02-21T09:08:46.9327841Z shl.b64 %rd556, %rd555, 32; 2026-02-21T09:08:46.9327942Z or.b64 %rd557, %rd554, %rd556; 2026-02-21T09:08:46.9328270Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9328379Z mov.b64 {%r1525, %r1526}, %rd557; 2026-02-21T09:08:46.9328573Z cvt.rn.f16x2.f32 %r1527, %r1526, %r1525; 2026-02-21T09:08:46.9328906Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9329012Z cvt.u64.u32 %rd558, %r812; 2026-02-21T09:08:46.9329105Z cvt.u64.u32 %rd559, %r813; 2026-02-21T09:08:46.9329203Z shl.b64 %rd560, %rd559, 32; 2026-02-21T09:08:46.9329301Z or.b64 %rd561, %rd558, %rd560; 2026-02-21T09:08:46.9329706Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9329821Z mov.b64 {%r1528, %r1529}, %rd561; 2026-02-21T09:08:46.9329936Z cvt.rn.f16x2.f32 %r1530, %r1529, %r1528; 2026-02-21T09:08:46.9330268Z .loc 1 53 52 // 
cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9330371Z cvt.u64.u32 %rd562, %r814; 2026-02-21T09:08:46.9330489Z cvt.u64.u32 %rd563, %r815; 2026-02-21T09:08:46.9330600Z shl.b64 %rd564, %rd563, 32; 2026-02-21T09:08:46.9330704Z or.b64 %rd565, %rd562, %rd564; 2026-02-21T09:08:46.9331037Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9331135Z mov.b64 {%r1531, %r1532}, %rd565; 2026-02-21T09:08:46.9331256Z cvt.rn.f16x2.f32 %r1533, %r1532, %r1531; 2026-02-21T09:08:46.9331584Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9331689Z cvt.u64.u32 %rd566, %r816; 2026-02-21T09:08:46.9331795Z cvt.u64.u32 %rd567, %r817; 2026-02-21T09:08:46.9331905Z shl.b64 %rd568, %rd567, 32; 2026-02-21T09:08:46.9332004Z or.b64 %rd569, %rd566, %rd568; 2026-02-21T09:08:46.9332336Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9332439Z mov.b64 {%r1534, %r1535}, %rd569; 2026-02-21T09:08:46.9332551Z cvt.rn.f16x2.f32 %r1536, %r1535, %r1534; 2026-02-21T09:08:46.9332874Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9332982Z cvt.u64.u32 %rd570, %r818; 2026-02-21T09:08:46.9333077Z cvt.u64.u32 %rd571, %r819; 2026-02-21T09:08:46.9333175Z shl.b64 %rd572, %rd571, 32; 2026-02-21T09:08:46.9333282Z or.b64 %rd573, %rd570, %rd572; 2026-02-21T09:08:46.9333605Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9333702Z mov.b64 {%r1537, %r1538}, %rd573; 2026-02-21T09:08:46.9333826Z cvt.rn.f16x2.f32 %r1539, %r1538, %r1537; 2026-02-21T09:08:46.9334154Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9334249Z cvt.u64.u32 %rd574, %r820; 2026-02-21T09:08:46.9334346Z cvt.u64.u32 %rd575, %r821; 2026-02-21T09:08:46.9334544Z shl.b64 %rd576, %rd575, 32; 2026-02-21T09:08:46.9334643Z or.b64 
%rd577, %rd574, %rd576; 2026-02-21T09:08:46.9335024Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9335136Z mov.b64 {%r1540, %r1541}, %rd577; 2026-02-21T09:08:46.9335249Z cvt.rn.f16x2.f32 %r1542, %r1541, %r1540; 2026-02-21T09:08:46.9335566Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9335671Z cvt.u64.u32 %rd578, %r822; 2026-02-21T09:08:46.9335768Z cvt.u64.u32 %rd579, %r823; 2026-02-21T09:08:46.9335865Z shl.b64 %rd580, %rd579, 32; 2026-02-21T09:08:46.9335971Z or.b64 %rd581, %rd578, %rd580; 2026-02-21T09:08:46.9336402Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9336502Z mov.b64 {%r1543, %r1544}, %rd581; 2026-02-21T09:08:46.9336614Z cvt.rn.f16x2.f32 %r1545, %r1544, %r1543; 2026-02-21T09:08:46.9336945Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9337043Z cvt.u64.u32 %rd582, %r824; 2026-02-21T09:08:46.9337256Z cvt.u64.u32 %rd583, %r825; 2026-02-21T09:08:46.9337376Z shl.b64 %rd584, %rd583, 32; 2026-02-21T09:08:46.9337475Z or.b64 %rd585, %rd582, %rd584; 2026-02-21T09:08:46.9337799Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9337895Z mov.b64 {%r1546, %r1547}, %rd585; 2026-02-21T09:08:46.9338016Z cvt.rn.f16x2.f32 %r1548, %r1547, %r1546; 2026-02-21T09:08:46.9338423Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9338537Z cvt.u64.u32 %rd586, %r826; 2026-02-21T09:08:46.9338642Z cvt.u64.u32 %rd587, %r827; 2026-02-21T09:08:46.9338738Z shl.b64 %rd588, %rd587, 32; 2026-02-21T09:08:46.9338836Z or.b64 %rd589, %rd586, %rd588; 2026-02-21T09:08:46.9339173Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9339272Z mov.b64 {%r1549, %r1550}, %rd589; 2026-02-21T09:08:46.9339389Z cvt.rn.f16x2.f32 %r1551, %r1550, %r1549; 
2026-02-21T09:08:46.9339709Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9339815Z cvt.u64.u32 %rd590, %r829; 2026-02-21T09:08:46.9339911Z cvt.u64.u32 %rd591, %r830; 2026-02-21T09:08:46.9340004Z shl.b64 %rd592, %rd591, 32; 2026-02-21T09:08:46.9340117Z or.b64 %rd593, %rd590, %rd592; 2026-02-21T09:08:46.9340440Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9340544Z mov.b64 {%r1552, %r1553}, %rd593; 2026-02-21T09:08:46.9340666Z cvt.rn.f16x2.f32 %r1554, %r1553, %r1552; 2026-02-21T09:08:46.9340987Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9341089Z cvt.u64.u32 %rd594, %r831; 2026-02-21T09:08:46.9341186Z cvt.u64.u32 %rd595, %r832; 2026-02-21T09:08:46.9341291Z shl.b64 %rd596, %rd595, 32; 2026-02-21T09:08:46.9341387Z or.b64 %rd597, %rd594, %rd596; 2026-02-21T09:08:46.9341715Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9341824Z mov.b64 {%r1555, %r1556}, %rd597; 2026-02-21T09:08:46.9341937Z cvt.rn.f16x2.f32 %r1557, %r1556, %r1555; 2026-02-21T09:08:46.9342258Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9342364Z cvt.u64.u32 %rd598, %r833; 2026-02-21T09:08:46.9342462Z cvt.u64.u32 %rd599, %r834; 2026-02-21T09:08:46.9342564Z shl.b64 %rd600, %rd599, 32; 2026-02-21T09:08:46.9342659Z or.b64 %rd601, %rd598, %rd600; 2026-02-21T09:08:46.9342985Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9343197Z mov.b64 {%r1558, %r1559}, %rd601; 2026-02-21T09:08:46.9343310Z cvt.rn.f16x2.f32 %r1560, %r1559, %r1558; 2026-02-21T09:08:46.9343640Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9343739Z cvt.u64.u32 %rd602, %r835; 2026-02-21T09:08:46.9343835Z cvt.u64.u32 %rd603, %r836; 2026-02-21T09:08:46.9343941Z shl.b64 %rd604, %rd603, 
32; 2026-02-21T09:08:46.9344041Z or.b64 %rd605, %rd602, %rd604; 2026-02-21T09:08:46.9344357Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9344454Z mov.b64 {%r1561, %r1562}, %rd605; 2026-02-21T09:08:46.9344577Z cvt.rn.f16x2.f32 %r1563, %r1562, %r1561; 2026-02-21T09:08:46.9345059Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9345164Z cvt.u64.u32 %rd606, %r837; 2026-02-21T09:08:46.9345270Z cvt.u64.u32 %rd607, %r838; 2026-02-21T09:08:46.9345374Z shl.b64 %rd608, %rd607, 32; 2026-02-21T09:08:46.9345471Z or.b64 %rd609, %rd606, %rd608; 2026-02-21T09:08:46.9345798Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9345987Z mov.b64 {%r1564, %r1565}, %rd609; 2026-02-21T09:08:46.9346110Z cvt.rn.f16x2.f32 %r1566, %r1565, %r1564; 2026-02-21T09:08:46.9346432Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9346541Z cvt.u64.u32 %rd610, %r839; 2026-02-21T09:08:46.9346637Z cvt.u64.u32 %rd611, %r840; 2026-02-21T09:08:46.9346736Z shl.b64 %rd612, %rd611, 32; 2026-02-21T09:08:46.9346919Z or.b64 %rd613, %rd610, %rd612; 2026-02-21T09:08:46.9347255Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9347356Z mov.b64 {%r1567, %r1568}, %rd613; 2026-02-21T09:08:46.9347480Z cvt.rn.f16x2.f32 %r1569, %r1568, %r1567; 2026-02-21T09:08:46.9347803Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9347900Z cvt.u64.u32 %rd614, %r841; 2026-02-21T09:08:46.9347996Z cvt.u64.u32 %rd615, %r842; 2026-02-21T09:08:46.9348103Z shl.b64 %rd616, %rd615, 32; 2026-02-21T09:08:46.9348201Z or.b64 %rd617, %rd614, %rd616; 2026-02-21T09:08:46.9348529Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9348638Z mov.b64 {%r1570, %r1571}, %rd617; 2026-02-21T09:08:46.9348750Z 
cvt.rn.f16x2.f32 %r1572, %r1571, %r1570; 2026-02-21T09:08:46.9349067Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9349180Z cvt.u64.u32 %rd618, %r843; 2026-02-21T09:08:46.9349277Z cvt.u64.u32 %rd619, %r844; 2026-02-21T09:08:46.9349379Z shl.b64 %rd620, %rd619, 32; 2026-02-21T09:08:46.9349492Z or.b64 %rd621, %rd618, %rd620; 2026-02-21T09:08:46.9349937Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9350051Z mov.b64 {%r1573, %r1574}, %rd621; 2026-02-21T09:08:46.9350181Z cvt.rn.f16x2.f32 %r1575, %r1574, %r1573; 2026-02-21T09:08:46.9350534Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9350634Z cvt.u64.u32 %rd622, %r846; 2026-02-21T09:08:46.9350732Z cvt.u64.u32 %rd623, %r847; 2026-02-21T09:08:46.9350836Z shl.b64 %rd624, %rd623, 32; 2026-02-21T09:08:46.9350940Z or.b64 %rd625, %rd622, %rd624; 2026-02-21T09:08:46.9351262Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9351370Z mov.b64 {%r1576, %r1577}, %rd625; 2026-02-21T09:08:46.9351493Z cvt.rn.f16x2.f32 %r1578, %r1577, %r1576; 2026-02-21T09:08:46.9351808Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9352014Z cvt.u64.u32 %rd626, %r848; 2026-02-21T09:08:46.9352121Z cvt.u64.u32 %rd627, %r849; 2026-02-21T09:08:46.9352217Z shl.b64 %rd628, %rd627, 32; 2026-02-21T09:08:46.9352316Z or.b64 %rd629, %rd626, %rd628; 2026-02-21T09:08:46.9352649Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9352748Z mov.b64 {%r1579, %r1580}, %rd629; 2026-02-21T09:08:46.9352860Z cvt.rn.f16x2.f32 %r1581, %r1580, %r1579; 2026-02-21T09:08:46.9353178Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9353284Z cvt.u64.u32 %rd630, %r850; 2026-02-21T09:08:46.9353384Z cvt.u64.u32 %rd631, %r851; 
2026-02-21T09:08:46.9353578Z shl.b64 %rd632, %rd631, 32; 2026-02-21T09:08:46.9353687Z or.b64 %rd633, %rd630, %rd632; 2026-02-21T09:08:46.9354003Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9354109Z mov.b64 {%r1582, %r1583}, %rd633; 2026-02-21T09:08:46.9354230Z cvt.rn.f16x2.f32 %r1584, %r1583, %r1582; 2026-02-21T09:08:46.9354544Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9354758Z cvt.u64.u32 %rd634, %r852; 2026-02-21T09:08:46.9354867Z cvt.u64.u32 %rd635, %r853; 2026-02-21T09:08:46.9354970Z shl.b64 %rd636, %rd635, 32; 2026-02-21T09:08:46.9355070Z or.b64 %rd637, %rd634, %rd636; 2026-02-21T09:08:46.9355395Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9355503Z mov.b64 {%r1585, %r1586}, %rd637; 2026-02-21T09:08:46.9355693Z cvt.rn.f16x2.f32 %r1587, %r1586, %r1585; 2026-02-21T09:08:46.9356028Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9356135Z cvt.u64.u32 %rd638, %r854; 2026-02-21T09:08:46.9356231Z cvt.u64.u32 %rd639, %r855; 2026-02-21T09:08:46.9356332Z shl.b64 %rd640, %rd639, 32; 2026-02-21T09:08:46.9356430Z or.b64 %rd641, %rd638, %rd640; 2026-02-21T09:08:46.9356751Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9356856Z mov.b64 {%r1588, %r1589}, %rd641; 2026-02-21T09:08:46.9356969Z cvt.rn.f16x2.f32 %r1590, %r1589, %r1588; 2026-02-21T09:08:46.9357296Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9357394Z cvt.u64.u32 %rd642, %r856; 2026-02-21T09:08:46.9357491Z cvt.u64.u32 %rd643, %r857; 2026-02-21T09:08:46.9357597Z shl.b64 %rd644, %rd643, 32; 2026-02-21T09:08:46.9357700Z or.b64 %rd645, %rd642, %rd644; 2026-02-21T09:08:46.9358021Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9358119Z mov.b64 {%r1591, 
%r1592}, %rd645; 2026-02-21T09:08:46.9358242Z cvt.rn.f16x2.f32 %r1593, %r1592, %r1591; 2026-02-21T09:08:46.9358568Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9358669Z cvt.u64.u32 %rd646, %r858; 2026-02-21T09:08:46.9358778Z cvt.u64.u32 %rd647, %r859; 2026-02-21T09:08:46.9358878Z shl.b64 %rd648, %rd647, 32; 2026-02-21T09:08:46.9358980Z or.b64 %rd649, %rd646, %rd648; 2026-02-21T09:08:46.9359310Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9359411Z mov.b64 {%r1594, %r1595}, %rd649; 2026-02-21T09:08:46.9359525Z cvt.rn.f16x2.f32 %r1596, %r1595, %r1594; 2026-02-21T09:08:46.9359851Z .loc 1 53 52 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:53:52 2026-02-21T09:08:46.9359967Z cvt.u64.u32 %rd650, %r860; 2026-02-21T09:08:46.9360065Z cvt.u64.u32 %rd651, %r861; 2026-02-21T09:08:46.9360164Z shl.b64 %rd652, %rd651, 32; 2026-02-21T09:08:46.9360271Z or.b64 %rd653, %rd650, %rd652; 2026-02-21T09:08:46.9360592Z .loc 1 55 27 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:55:27 2026-02-21T09:08:46.9360797Z mov.b64 {%r1597, %r1598}, %rd653; 2026-02-21T09:08:46.9360921Z cvt.rn.f16x2.f32 %r1599, %r1598, %r1597; 2026-02-21T09:08:46.9361247Z .loc 1 56 82 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:56:82 2026-02-21T09:08:46.9361426Z st.shared.v4.b32 [%r11], {%r1218, %r1230, %r1242, %r1254}; 2026-02-21T09:08:46.9361595Z st.shared.v4.b32 [%r12], {%r1266, %r1278, %r1290, %r1302}; 2026-02-21T09:08:46.9361771Z st.shared.v4.b32 [%r13], {%r1314, %r1326, %r1338, %r1350}; 2026-02-21T09:08:46.9361936Z st.shared.v4.b32 [%r14], {%r1362, %r1374, %r1386, %r1398}; 2026-02-21T09:08:46.9362099Z st.shared.v4.b32 [%r15], {%r1410, %r1422, %r1434, %r1446}; 2026-02-21T09:08:46.9362369Z st.shared.v4.b32 [%r16], {%r1458, %r1470, %r1482, %r1494}; 2026-02-21T09:08:46.9362530Z st.shared.v4.b32 [%r17], {%r1506, %r1518, %r1530, %r1542}; 2026-02-21T09:08:46.9362692Z 
st.shared.v4.b32 [%r18], {%r1554, %r1566, %r1578, %r1590}; 2026-02-21T09:08:46.9362797Z bar.sync 0; 2026-02-21T09:08:46.9362896Z // begin inline asm 2026-02-21T09:08:46.9363270Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1023, %r1027, %r1031, %r1035}, [%r867]; 2026-02-21T09:08:46.9363375Z // end inline asm 2026-02-21T09:08:46.9363480Z // begin inline asm 2026-02-21T09:08:46.9363776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1039, %r1043, %r1047, %r1051}, [%r872]; 2026-02-21T09:08:46.9363867Z // end inline asm 2026-02-21T09:08:46.9363972Z // begin inline asm 2026-02-21T09:08:46.9364260Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1055, %r1059, %r1063, %r1067}, [%r877]; 2026-02-21T09:08:46.9364417Z // end inline asm 2026-02-21T09:08:46.9364540Z // begin inline asm 2026-02-21T09:08:46.9364889Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1071, %r1075, %r1079, %r1083}, [%r882]; 2026-02-21T09:08:46.9364987Z // end inline asm 2026-02-21T09:08:46.9365082Z // begin inline asm 2026-02-21T09:08:46.9365378Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1087, %r1091, %r1095, %r1099}, [%r887]; 2026-02-21T09:08:46.9365467Z // end inline asm 2026-02-21T09:08:46.9365559Z // begin inline asm 2026-02-21T09:08:46.9365856Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1103, %r1107, %r1111, %r1115}, [%r892]; 2026-02-21T09:08:46.9365946Z // end inline asm 2026-02-21T09:08:46.9366038Z // begin inline asm 2026-02-21T09:08:46.9366321Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1119, %r1123, %r1127, %r1131}, [%r897]; 2026-02-21T09:08:46.9366423Z // end inline asm 2026-02-21T09:08:46.9366515Z // begin inline asm 2026-02-21T09:08:46.9366801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1135, %r1139, %r1143, %r1147}, [%r902]; 2026-02-21T09:08:46.9366903Z // end inline asm 2026-02-21T09:08:46.9366992Z bar.sync 0; 2026-02-21T09:08:46.9367158Z st.shared.v4.b32 [%r11], {%r1221, %r1233, %r1245, %r1257}; 2026-02-21T09:08:46.9367330Z st.shared.v4.b32 [%r12], {%r1269, %r1281, %r1293, %r1305}; 
2026-02-21T09:08:46.9367494Z st.shared.v4.b32 [%r13], {%r1317, %r1329, %r1341, %r1353}; 2026-02-21T09:08:46.9367665Z st.shared.v4.b32 [%r14], {%r1365, %r1377, %r1389, %r1401}; 2026-02-21T09:08:46.9367826Z st.shared.v4.b32 [%r15], {%r1413, %r1425, %r1437, %r1449}; 2026-02-21T09:08:46.9367998Z st.shared.v4.b32 [%r16], {%r1461, %r1473, %r1485, %r1497}; 2026-02-21T09:08:46.9368157Z st.shared.v4.b32 [%r17], {%r1509, %r1521, %r1533, %r1545}; 2026-02-21T09:08:46.9368315Z st.shared.v4.b32 [%r18], {%r1557, %r1569, %r1581, %r1593}; 2026-02-21T09:08:46.9368414Z bar.sync 0; 2026-02-21T09:08:46.9368509Z // begin inline asm 2026-02-21T09:08:46.9368801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1028, %r1032, %r1036}, [%r867]; 2026-02-21T09:08:46.9368899Z // end inline asm 2026-02-21T09:08:46.9368996Z // begin inline asm 2026-02-21T09:08:46.9369280Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1040, %r1044, %r1048, %r1052}, [%r872]; 2026-02-21T09:08:46.9369369Z // end inline asm 2026-02-21T09:08:46.9369474Z // begin inline asm 2026-02-21T09:08:46.9369861Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1056, %r1060, %r1064, %r1068}, [%r877]; 2026-02-21T09:08:46.9369953Z // end inline asm 2026-02-21T09:08:46.9370053Z // begin inline asm 2026-02-21T09:08:46.9370330Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1072, %r1076, %r1080, %r1084}, [%r882]; 2026-02-21T09:08:46.9370423Z // end inline asm 2026-02-21T09:08:46.9370516Z // begin inline asm 2026-02-21T09:08:46.9370801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1088, %r1092, %r1096, %r1100}, [%r887]; 2026-02-21T09:08:46.9370889Z // end inline asm 2026-02-21T09:08:46.9370982Z // begin inline asm 2026-02-21T09:08:46.9371270Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1104, %r1108, %r1112, %r1116}, [%r892]; 2026-02-21T09:08:46.9371487Z // end inline asm 2026-02-21T09:08:46.9371594Z // begin inline asm 2026-02-21T09:08:46.9371989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1120, %r1124, %r1128, %r1132}, [%r897]; 
2026-02-21T09:08:46.9372093Z // end inline asm 2026-02-21T09:08:46.9372202Z // begin inline asm 2026-02-21T09:08:46.9372513Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1136, %r1140, %r1144, %r1148}, [%r902]; 2026-02-21T09:08:46.9372609Z // end inline asm 2026-02-21T09:08:46.9372696Z bar.sync 0; 2026-02-21T09:08:46.9372947Z st.shared.v4.b32 [%r11], {%r1224, %r1236, %r1248, %r1260}; 2026-02-21T09:08:46.9373123Z st.shared.v4.b32 [%r12], {%r1272, %r1284, %r1296, %r1308}; 2026-02-21T09:08:46.9373287Z st.shared.v4.b32 [%r13], {%r1320, %r1332, %r1344, %r1356}; 2026-02-21T09:08:46.9373447Z st.shared.v4.b32 [%r14], {%r1368, %r1380, %r1392, %r1404}; 2026-02-21T09:08:46.9373616Z st.shared.v4.b32 [%r15], {%r1416, %r1428, %r1440, %r1452}; 2026-02-21T09:08:46.9373849Z st.shared.v4.b32 [%r16], {%r1464, %r1476, %r1488, %r1500}; 2026-02-21T09:08:46.9374023Z st.shared.v4.b32 [%r17], {%r1512, %r1524, %r1536, %r1548}; 2026-02-21T09:08:46.9374183Z st.shared.v4.b32 [%r18], {%r1560, %r1572, %r1584, %r1596}; 2026-02-21T09:08:46.9374279Z bar.sync 0; 2026-02-21T09:08:46.9374377Z // begin inline asm 2026-02-21T09:08:46.9374663Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r867]; 2026-02-21T09:08:46.9374850Z // end inline asm 2026-02-21T09:08:46.9374946Z // begin inline asm 2026-02-21T09:08:46.9375237Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r872]; 2026-02-21T09:08:46.9375338Z // end inline asm 2026-02-21T09:08:46.9375432Z // begin inline asm 2026-02-21T09:08:46.9375722Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r877]; 2026-02-21T09:08:46.9375811Z // end inline asm 2026-02-21T09:08:46.9375909Z // begin inline asm 2026-02-21T09:08:46.9376200Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r882]; 2026-02-21T09:08:46.9376292Z // end inline asm 2026-02-21T09:08:46.9376392Z // begin inline asm 2026-02-21T09:08:46.9376675Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r1089, %r1093, %r1097, %r1101}, [%r887]; 2026-02-21T09:08:46.9376770Z // end inline asm 2026-02-21T09:08:46.9376863Z // begin inline asm 2026-02-21T09:08:46.9377153Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1105, %r1109, %r1113, %r1117}, [%r892]; 2026-02-21T09:08:46.9377244Z // end inline asm 2026-02-21T09:08:46.9377340Z // begin inline asm 2026-02-21T09:08:46.9377632Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1121, %r1125, %r1129, %r1133}, [%r897]; 2026-02-21T09:08:46.9377720Z // end inline asm 2026-02-21T09:08:46.9377812Z // begin inline asm 2026-02-21T09:08:46.9378106Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1137, %r1141, %r1145, %r1149}, [%r902]; 2026-02-21T09:08:46.9378198Z // end inline asm 2026-02-21T09:08:46.9378285Z bar.sync 0; 2026-02-21T09:08:46.9378454Z st.shared.v4.b32 [%r11], {%r1227, %r1239, %r1251, %r1263}; 2026-02-21T09:08:46.9378629Z st.shared.v4.b32 [%r12], {%r1275, %r1287, %r1299, %r1311}; 2026-02-21T09:08:46.9378795Z st.shared.v4.b32 [%r13], {%r1323, %r1335, %r1347, %r1359}; 2026-02-21T09:08:46.9378955Z st.shared.v4.b32 [%r14], {%r1371, %r1383, %r1395, %r1407}; 2026-02-21T09:08:46.9379242Z st.shared.v4.b32 [%r15], {%r1419, %r1431, %r1443, %r1455}; 2026-02-21T09:08:46.9379401Z st.shared.v4.b32 [%r16], {%r1467, %r1479, %r1491, %r1503}; 2026-02-21T09:08:46.9379565Z st.shared.v4.b32 [%r17], {%r1515, %r1527, %r1539, %r1551}; 2026-02-21T09:08:46.9379734Z st.shared.v4.b32 [%r18], {%r1563, %r1575, %r1587, %r1599}; 2026-02-21T09:08:46.9379823Z bar.sync 0; 2026-02-21T09:08:46.9379916Z // begin inline asm 2026-02-21T09:08:46.9380204Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r867]; 2026-02-21T09:08:46.9380306Z // end inline asm 2026-02-21T09:08:46.9380400Z // begin inline asm 2026-02-21T09:08:46.9380690Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r872]; 2026-02-21T09:08:46.9380891Z // end inline asm 2026-02-21T09:08:46.9380988Z // begin inline asm 2026-02-21T09:08:46.9381274Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r877]; 2026-02-21T09:08:46.9381365Z // end inline asm 2026-02-21T09:08:46.9381467Z // begin inline asm 2026-02-21T09:08:46.9381750Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r882]; 2026-02-21T09:08:46.9381918Z // end inline asm 2026-02-21T09:08:46.9382033Z // begin inline asm 2026-02-21T09:08:46.9382315Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r887]; 2026-02-21T09:08:46.9382406Z // end inline asm 2026-02-21T09:08:46.9382511Z // begin inline asm 2026-02-21T09:08:46.9382791Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r892]; 2026-02-21T09:08:46.9382883Z // end inline asm 2026-02-21T09:08:46.9383055Z // begin inline asm 2026-02-21T09:08:46.9383357Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1122, %r1126, %r1130, %r1134}, [%r897]; 2026-02-21T09:08:46.9383446Z // end inline asm 2026-02-21T09:08:46.9383538Z // begin inline asm 2026-02-21T09:08:46.9383830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1138, %r1142, %r1146, %r1150}, [%r902]; 2026-02-21T09:08:46.9383924Z // end inline asm 2026-02-21T09:08:46.9384012Z // begin inline asm 2026-02-21T09:08:46.9384226Z st.global.v4.b32 [ %rd110 + 0 ], { %r1023, %r1024, %r1025, %r1026 }; 2026-02-21T09:08:46.9384318Z // end inline asm 2026-02-21T09:08:46.9384412Z // begin inline asm 2026-02-21T09:08:46.9384605Z st.global.v4.b32 [ %rd111 + 0 ], { %r1027, %r1028, %r1029, %r1030 }; 2026-02-21T09:08:46.9384781Z // end inline asm 2026-02-21T09:08:46.9384877Z // begin inline asm 2026-02-21T09:08:46.9385065Z st.global.v4.b32 [ %rd112 + 0 ], { %r1031, %r1032, %r1033, %r1034 }; 2026-02-21T09:08:46.9385162Z // end inline asm 2026-02-21T09:08:46.9385259Z // begin inline asm 2026-02-21T09:08:46.9385447Z st.global.v4.b32 [ %rd113 + 0 ], { %r1035, %r1036, %r1037, %r1038 }; 2026-02-21T09:08:46.9385537Z // end inline asm 2026-02-21T09:08:46.9385635Z // begin inline 
asm 2026-02-21T09:08:46.9385815Z st.global.v4.b32 [ %rd114 + 0 ], { %r1039, %r1040, %r1041, %r1042 }; 2026-02-21T09:08:46.9385909Z // end inline asm 2026-02-21T09:08:46.9386007Z // begin inline asm 2026-02-21T09:08:46.9386188Z st.global.v4.b32 [ %rd115 + 0 ], { %r1043, %r1044, %r1045, %r1046 }; 2026-02-21T09:08:46.9386278Z // end inline asm 2026-02-21T09:08:46.9386371Z // begin inline asm 2026-02-21T09:08:46.9386563Z st.global.v4.b32 [ %rd116 + 0 ], { %r1047, %r1048, %r1049, %r1050 }; 2026-02-21T09:08:46.9386651Z // end inline asm 2026-02-21T09:08:46.9386743Z // begin inline asm 2026-02-21T09:08:46.9386934Z st.global.v4.b32 [ %rd117 + 0 ], { %r1051, %r1052, %r1053, %r1054 }; 2026-02-21T09:08:46.9387028Z // end inline asm 2026-02-21T09:08:46.9387120Z // begin inline asm 2026-02-21T09:08:46.9387314Z st.global.v4.b32 [ %rd118 + 0 ], { %r1055, %r1056, %r1057, %r1058 }; 2026-02-21T09:08:46.9387407Z // end inline asm 2026-02-21T09:08:46.9387498Z // begin inline asm 2026-02-21T09:08:46.9387678Z st.global.v4.b32 [ %rd119 + 0 ], { %r1059, %r1060, %r1061, %r1062 }; 2026-02-21T09:08:46.9387774Z // end inline asm 2026-02-21T09:08:46.9387982Z // begin inline asm 2026-02-21T09:08:46.9388164Z st.global.v4.b32 [ %rd120 + 0 ], { %r1063, %r1064, %r1065, %r1066 }; 2026-02-21T09:08:46.9388262Z // end inline asm 2026-02-21T09:08:46.9388356Z // begin inline asm 2026-02-21T09:08:46.9388537Z st.global.v4.b32 [ %rd121 + 0 ], { %r1067, %r1068, %r1069, %r1070 }; 2026-02-21T09:08:46.9388627Z // end inline asm 2026-02-21T09:08:46.9388726Z // begin inline asm 2026-02-21T09:08:46.9388906Z st.global.v4.b32 [ %rd122 + 0 ], { %r1071, %r1072, %r1073, %r1074 }; 2026-02-21T09:08:46.9388997Z // end inline asm 2026-02-21T09:08:46.9389098Z // begin inline asm 2026-02-21T09:08:46.9389282Z st.global.v4.b32 [ %rd123 + 0 ], { %r1075, %r1076, %r1077, %r1078 }; 2026-02-21T09:08:46.9389475Z // end inline asm 2026-02-21T09:08:46.9389571Z // begin inline asm 2026-02-21T09:08:46.9389755Z st.global.v4.b32 [ %rd124 
+ 0 ], { %r1079, %r1080, %r1081, %r1082 }; 2026-02-21T09:08:46.9389846Z // end inline asm 2026-02-21T09:08:46.9389938Z // begin inline asm 2026-02-21T09:08:46.9390134Z st.global.v4.b32 [ %rd125 + 0 ], { %r1083, %r1084, %r1085, %r1086 }; 2026-02-21T09:08:46.9390224Z // end inline asm 2026-02-21T09:08:46.9390315Z // begin inline asm 2026-02-21T09:08:46.9390587Z st.global.v4.b32 [ %rd126 + 0 ], { %r1087, %r1088, %r1089, %r1090 }; 2026-02-21T09:08:46.9390686Z // end inline asm 2026-02-21T09:08:46.9390780Z // begin inline asm 2026-02-21T09:08:46.9390961Z st.global.v4.b32 [ %rd127 + 0 ], { %r1091, %r1092, %r1093, %r1094 }; 2026-02-21T09:08:46.9391058Z // end inline asm 2026-02-21T09:08:46.9391152Z // begin inline asm 2026-02-21T09:08:46.9391332Z st.global.v4.b32 [ %rd128 + 0 ], { %r1095, %r1096, %r1097, %r1098 }; 2026-02-21T09:08:46.9391510Z // end inline asm 2026-02-21T09:08:46.9391617Z // begin inline asm 2026-02-21T09:08:46.9391798Z st.global.v4.b32 [ %rd129 + 0 ], { %r1099, %r1100, %r1101, %r1102 }; 2026-02-21T09:08:46.9391898Z // end inline asm 2026-02-21T09:08:46.9391991Z // begin inline asm 2026-02-21T09:08:46.9392173Z st.global.v4.b32 [ %rd130 + 0 ], { %r1103, %r1104, %r1105, %r1106 }; 2026-02-21T09:08:46.9392259Z // end inline asm 2026-02-21T09:08:46.9392363Z // begin inline asm 2026-02-21T09:08:46.9392545Z st.global.v4.b32 [ %rd131 + 0 ], { %r1107, %r1108, %r1109, %r1110 }; 2026-02-21T09:08:46.9392636Z // end inline asm 2026-02-21T09:08:46.9392737Z // begin inline asm 2026-02-21T09:08:46.9392915Z st.global.v4.b32 [ %rd132 + 0 ], { %r1111, %r1112, %r1113, %r1114 }; 2026-02-21T09:08:46.9393002Z // end inline asm 2026-02-21T09:08:46.9393093Z // begin inline asm 2026-02-21T09:08:46.9393281Z st.global.v4.b32 [ %rd133 + 0 ], { %r1115, %r1116, %r1117, %r1118 }; 2026-02-21T09:08:46.9393373Z // end inline asm 2026-02-21T09:08:46.9393465Z // begin inline asm 2026-02-21T09:08:46.9393658Z st.global.v4.b32 [ %rd134 + 0 ], { %r1119, %r1120, %r1121, %r1122 }; 
2026-02-21T09:08:46.9393745Z // end inline asm 2026-02-21T09:08:46.9393856Z // begin inline asm 2026-02-21T09:08:46.9394052Z st.global.v4.b32 [ %rd135 + 0 ], { %r1123, %r1124, %r1125, %r1126 }; 2026-02-21T09:08:46.9394148Z // end inline asm 2026-02-21T09:08:46.9394239Z // begin inline asm 2026-02-21T09:08:46.9394417Z st.global.v4.b32 [ %rd136 + 0 ], { %r1127, %r1128, %r1129, %r1130 }; 2026-02-21T09:08:46.9394520Z // end inline asm 2026-02-21T09:08:46.9394611Z // begin inline asm 2026-02-21T09:08:46.9394854Z st.global.v4.b32 [ %rd137 + 0 ], { %r1131, %r1132, %r1133, %r1134 }; 2026-02-21T09:08:46.9394962Z // end inline asm 2026-02-21T09:08:46.9395053Z // begin inline asm 2026-02-21T09:08:46.9395235Z st.global.v4.b32 [ %rd138 + 0 ], { %r1135, %r1136, %r1137, %r1138 }; 2026-02-21T09:08:46.9395325Z // end inline asm 2026-02-21T09:08:46.9395431Z // begin inline asm 2026-02-21T09:08:46.9395617Z st.global.v4.b32 [ %rd139 + 0 ], { %r1139, %r1140, %r1141, %r1142 }; 2026-02-21T09:08:46.9395707Z // end inline asm 2026-02-21T09:08:46.9395810Z // begin inline asm 2026-02-21T09:08:46.9395992Z st.global.v4.b32 [ %rd140 + 0 ], { %r1143, %r1144, %r1145, %r1146 }; 2026-02-21T09:08:46.9396190Z // end inline asm 2026-02-21T09:08:46.9396282Z // begin inline asm 2026-02-21T09:08:46.9396476Z st.global.v4.b32 [ %rd141 + 0 ], { %r1147, %r1148, %r1149, %r1150 }; 2026-02-21T09:08:46.9396563Z // end inline asm 2026-02-21T09:08:46.9396705Z $L__BB0_8: // %._crit_edge 2026-02-21T09:08:46.9397049Z .loc 1 27 4 // cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py:27:4 2026-02-21T09:08:46.9397140Z bar.sync 0; 2026-02-21T09:08:46.9397234Z // begin inline asm 2026-02-21T09:08:46.9397465Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1600, 512; 2026-02-21T09:08:46.9397554Z // end inline asm 2026-02-21T09:08:46.9397642Z ret; 2026-02-21T09:08:46.9397824Z $L__tmp0: 2026-02-21T09:08:46.9397925Z $L__func_end0: 2026-02-21T09:08:46.9398065Z // -- End function 2026-02-21T09:08:46.9398151Z } 
2026-02-21T09:08:46.9398595Z .file 1 "/tmp/torchinductor_root/mz/cmzfh5bgn7zmvufvezd3kcvl35l2lxjeurmohauqozfhcar75h5b.py" 2026-02-21T09:08:46.9398723Z .section .debug_abbrev 2026-02-21T09:08:46.9398817Z { 2026-02-21T09:08:46.9398983Z .b8 1 // Abbreviation Code 2026-02-21T09:08:46.9399314Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:08:46.9399485Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:46.9399642Z .b8 37 // DW_AT_producer 2026-02-21T09:08:46.9399799Z .b8 8 // DW_FORM_string 2026-02-21T09:08:46.9399946Z .b8 19 // DW_AT_language 2026-02-21T09:08:46.9400178Z .b8 5 // DW_FORM_data2 2026-02-21T09:08:46.9400333Z .b8 3 // DW_AT_name 2026-02-21T09:08:46.9400460Z .b8 8 // DW_FORM_string 2026-02-21T09:08:46.9400591Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:08:46.9400730Z .b8 6 // DW_FORM_data4 2026-02-21T09:08:46.9400860Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:08:46.9400989Z .b8 8 // DW_FORM_string 2026-02-21T09:08:46.9401109Z .b8 0 // EOM(1) 2026-02-21T09:08:46.9401233Z .b8 0 // EOM(2) 2026-02-21T09:08:46.9401343Z .b8 0 // EOM(3) 2026-02-21T09:08:46.9401424Z } 2026-02-21T09:08:46.9401532Z .section .debug_info 2026-02-21T09:08:46.9401612Z { 2026-02-21T09:08:46.9401751Z .b32 104 // Length of Unit 2026-02-21T09:08:46.9401901Z .b8 2 // DWARF version number 2026-02-21T09:08:46.9401997Z .b8 0 2026-02-21T09:08:46.9402202Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:08:46.9402352Z .b8 8 // Address Size (in bytes) 2026-02-21T09:08:46.9402535Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:08:46.9402674Z .b8 116 // DW_AT_producer 2026-02-21T09:08:46.9402763Z .b8 114 2026-02-21T09:08:46.9402857Z .b8 105 2026-02-21T09:08:46.9402942Z .b8 116 2026-02-21T09:08:46.9403024Z .b8 111 2026-02-21T09:08:46.9403109Z .b8 110 2026-02-21T09:08:46.9403195Z .b8 0 2026-02-21T09:08:46.9403321Z .b8 2 // DW_AT_language 2026-02-21T09:08:46.9403401Z .b8 0 2026-02-21T09:08:46.9403535Z .b8 99 // DW_AT_name 2026-02-21T09:08:46.9403621Z .b8 109 2026-02-21T09:08:46.9403702Z .b8 122 2026-02-21T09:08:46.9403783Z .b8 102 2026-02-21T09:08:46.9403883Z .b8 104 2026-02-21T09:08:46.9403968Z .b8 53 2026-02-21T09:08:46.9404049Z .b8 98 2026-02-21T09:08:46.9404139Z .b8 103 2026-02-21T09:08:46.9404220Z .b8 110 2026-02-21T09:08:46.9404300Z .b8 55 2026-02-21T09:08:46.9404381Z .b8 122 2026-02-21T09:08:46.9404471Z .b8 109 2026-02-21T09:08:46.9404551Z .b8 118 2026-02-21T09:08:46.9404820Z .b8 117 2026-02-21T09:08:46.9404903Z .b8 102 2026-02-21T09:08:46.9404997Z .b8 118 2026-02-21T09:08:46.9405080Z .b8 101 2026-02-21T09:08:46.9405160Z .b8 122 2026-02-21T09:08:46.9405250Z .b8 100 2026-02-21T09:08:46.9405335Z .b8 51 2026-02-21T09:08:46.9405420Z .b8 107 2026-02-21T09:08:46.9405501Z .b8 99 2026-02-21T09:08:46.9405592Z .b8 118 2026-02-21T09:08:46.9405672Z .b8 108 2026-02-21T09:08:46.9405753Z .b8 51 2026-02-21T09:08:46.9405843Z .b8 53 2026-02-21T09:08:46.9405923Z .b8 108 2026-02-21T09:08:46.9406002Z .b8 50 2026-02-21T09:08:46.9406082Z .b8 108 2026-02-21T09:08:46.9406173Z .b8 120 2026-02-21T09:08:46.9406256Z .b8 106 2026-02-21T09:08:46.9406335Z .b8 101 2026-02-21T09:08:46.9406431Z .b8 117 2026-02-21T09:08:46.9406617Z .b8 114 2026-02-21T09:08:46.9406702Z .b8 109 2026-02-21T09:08:46.9406789Z .b8 111 2026-02-21T09:08:46.9406885Z .b8 104 2026-02-21T09:08:46.9406968Z .b8 97 2026-02-21T09:08:46.9407050Z .b8 117 2026-02-21T09:08:46.9407132Z .b8 113 
2026-02-21T09:08:46.9407228Z .b8 111 2026-02-21T09:08:46.9407310Z .b8 122 2026-02-21T09:08:46.9407391Z .b8 102 2026-02-21T09:08:46.9407482Z .b8 104 2026-02-21T09:08:46.9407562Z .b8 99 2026-02-21T09:08:46.9407642Z .b8 97 2026-02-21T09:08:46.9407724Z .b8 114 2026-02-21T09:08:46.9407899Z .b8 55 2026-02-21T09:08:46.9407991Z .b8 53 2026-02-21T09:08:46.9408075Z .b8 104 2026-02-21T09:08:46.9408163Z .b8 53 2026-02-21T09:08:46.9408244Z .b8 98 2026-02-21T09:08:46.9408323Z .b8 46 2026-02-21T09:08:46.9408405Z .b8 112 2026-02-21T09:08:46.9408496Z .b8 121 2026-02-21T09:08:46.9408577Z .b8 0 2026-02-21T09:08:46.9408745Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:08:46.9408964Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:08:46.9409064Z .b8 116 2026-02-21T09:08:46.9409145Z .b8 109 2026-02-21T09:08:46.9409224Z .b8 112 2026-02-21T09:08:46.9409313Z .b8 47 2026-02-21T09:08:46.9409394Z .b8 116 2026-02-21T09:08:46.9409475Z .b8 111 2026-02-21T09:08:46.9409556Z .b8 114 2026-02-21T09:08:46.9409646Z .b8 99 2026-02-21T09:08:46.9409732Z .b8 104 2026-02-21T09:08:46.9409812Z .b8 105 2026-02-21T09:08:46.9409899Z .b8 110 2026-02-21T09:08:46.9409979Z .b8 100 2026-02-21T09:08:46.9410061Z .b8 117 2026-02-21T09:08:46.9410140Z .b8 99 2026-02-21T09:08:46.9410232Z .b8 116 2026-02-21T09:08:46.9410313Z .b8 111 2026-02-21T09:08:46.9410394Z .b8 114 2026-02-21T09:08:46.9410481Z .b8 95 2026-02-21T09:08:46.9410562Z .b8 114 2026-02-21T09:08:46.9410643Z .b8 111 2026-02-21T09:08:46.9410723Z .b8 111 2026-02-21T09:08:46.9410815Z .b8 116 2026-02-21T09:08:46.9410896Z .b8 47 2026-02-21T09:08:46.9410979Z .b8 109 2026-02-21T09:08:46.9411069Z .b8 122 2026-02-21T09:08:46.9411151Z .b8 0 2026-02-21T09:08:46.9411232Z } 2026-02-21T09:08:46.9411348Z .section .debug_macinfo { } 2026-02-21T09:08:46.9411362Z 2026-02-21T09:08:46.9411513Z ================================================================ 2026-02-21T09:08:46.9411696Z please share the reproducer above with Triton project. 
2026-02-21T09:08:47.3592555Z 2026-02-21T09:08:47.3593656Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 14.0 configs/s 2026-02-21T09:08:47.9989924Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1544.0 2026-02-21T09:08:47.9990353Z configs/s 2026-02-21T09:08:48.0771280Z [228s] Generation 12 complete: 2026-02-21T09:08:48.0771615Z error=5 2026-02-21T09:08:48.0771795Z ok=11 2026-02-21T09:08:48.0772122Z min=0.0369 2026-02-21T09:08:48.0772300Z mid=0.0389 2026-02-21T09:08:48.0772508Z max=12.4048 2026-02-21T09:08:48.0772704Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:08:48.0773112Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:08:48.0773404Z 'l2_groupings': [2], 2026-02-21T09:08:48.0773614Z 'load_eviction_policies': ['first', ''], 2026-02-21T09:08:48.0773828Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:48.0774014Z 'num_stages': 5, 2026-02-21T09:08:48.0774168Z 'num_warps': 8, 2026-02-21T09:08:48.0774329Z 'pid_type': 'flat', 2026-02-21T09:08:48.0775255Z 'range_flattens': [None, None], 2026-02-21T09:08:48.0775476Z 'range_multi_buffers': [None, None], 2026-02-21T09:08:48.0775693Z 'range_num_stages': [0, 0], 2026-02-21T09:08:48.0775901Z 'range_unroll_factors': [0, 0], 2026-02-21T09:08:48.0776114Z 'range_warp_specializes': [None, None]} 2026-02-21T09:08:48.0811262Z [228s] Fitting surrogate: 821 points, 821 targets 2026-02-21T09:08:48.3260460Z [228s] Autotuning complete in 228.7s after searching 789 configs. 
2026-02-21T09:08:48.3260847Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:08:48.3262091Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=5, num_warps=8, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:08:48.3263691Z 2026-02-21T09:08:48.3263991Z [228s] Code of selected kernel: /tmp/torchinductor_root/za/czag5jicnz7ybidn76f35bmauskar7h64lbkv3qtkcrhdsim45yx.py 2026-02-21T09:08:48.3412686Z from __future__ import annotations 2026-02-21T09:08:48.3412951Z 2026-02-21T09:08:48.3416421Z import torch 2026-02-21T09:08:48.3416720Z import helion 2026-02-21T09:08:48.3416884Z import triton 2026-02-21T09:08:48.3417072Z import triton.language as tl 2026-02-21T09:08:48.3417356Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:08:48.3417595Z 2026-02-21T09:08:48.3417692Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:08:48.3418158Z _BLOCK_SIZE_0 = tl.constexpr(256) 2026-02-21T09:08:48.3418389Z _BLOCK_SIZE_2 = tl.constexpr(32) 2026-02-21T09:08:48.3418589Z # src[matmul.py:42]: def matmul( 2026-02-21T09:08:48.3418815Z # src[matmul.py:43]: x: Tensor, 2026-02-21T09:08:48.3419035Z # src[matmul.py:44]: y: Tensor, 2026-02-21T09:08:48.3419230Z # src[matmul.py:42-68]: ... 
2026-02-21T09:08:48.3419450Z helion.runtime.set_triton_allocator() 2026-02-21T09:08:48.3419595Z 2026-02-21T09:08:48.3419658Z @triton.jit 2026-02-21T09:08:48.3419828Z def _helion_matmul(x, y, out): 2026-02-21T09:08:48.3420114Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:08:48.3420547Z x_desc = tl.make_tensor_descriptor(x, [4096, 2048], [2048, 1], [_BLOCK_SIZE_0, _BLOCK_SIZE_2]) 2026-02-21T09:08:48.3420986Z y_desc = tl.make_tensor_descriptor(y, [2048, 2048], [2048, 1], [_BLOCK_SIZE_1, _BLOCK_SIZE_2]) 2026-02-21T09:08:48.3421350Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:08:48.3421626Z num_pid_m = tl.cdiv(2048, _BLOCK_SIZE_1) 2026-02-21T09:08:48.3421854Z num_pid_n = tl.cdiv(4096, _BLOCK_SIZE_0) 2026-02-21T09:08:48.3422074Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:08:48.3422279Z num_pid_in_group = 2 * num_pid_n 2026-02-21T09:08:48.3422509Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:08:48.3422743Z first_pid_m = group_id * 2 2026-02-21T09:08:48.3422964Z group_size_m = min(num_pid_m - first_pid_m, 2) 2026-02-21T09:08:48.3423271Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:08:48.3423589Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:08:48.3423847Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:08:48.3424107Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:08:48.3424408Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:08:48.3424659Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:08:48.3425106Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:08:48.3425464Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:08:48.3425745Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:08:48.3426074Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 
2026-02-21T09:08:48.3426492Z for offset_2 in tl.range(0, 2048, _BLOCK_SIZE_2): 2026-02-21T09:08:48.3426737Z acc_copy = acc 2026-02-21T09:08:48.3426916Z acc_copy_0 = acc_copy 2026-02-21T09:08:48.3427207Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:08:48.3427526Z load = x_desc.load([offset_0, offset_2]) 2026-02-21T09:08:48.3427800Z load_1 = tl.permute(y_desc.load([offset_1, offset_2]), [1, 0]) 2026-02-21T09:08:48.3428294Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:08:48.3428800Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:08:48.3429178Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:08:48.3429471Z tl.store(out + (indices_0[:, None] * 2048 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:08:48.3429698Z 2026-02-21T09:08:48.3430061Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:08:48.3430632Z """ 2026-02-21T09:08:48.3430943Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:08:48.3431263Z Args: 2026-02-21T09:08:48.3431432Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:08:48.3431683Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:08:48.3432044Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:08:48.3432467Z after the matmul. Defaults to identity (no change). 2026-02-21T09:08:48.3432716Z Returns: 2026-02-21T09:08:48.3432889Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:08:48.3433103Z """ 2026-02-21T09:08:48.3433259Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:08:48.3433478Z m, k = x.size() 2026-02-21T09:08:48.3433645Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:08:48.3433850Z k2, n = y.size() 2026-02-21T09:08:48.3434079Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:08:48.3434357Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:08:48.3434597Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:08:48.3434967Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:08:48.3435314Z # src[matmul.py:62]: ) 2026-02-21T09:08:48.3435604Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:08:48.3435973Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:08:48.3436214Z _BLOCK_SIZE_1 = 256 2026-02-21T09:08:48.3436393Z _BLOCK_SIZE_0 = 256 2026-02-21T09:08:48.3436600Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:08:48.3436920Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:08:48.3437232Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:08:48.3437458Z # src[matmul.py:63-67]: ... 
2026-02-21T09:08:48.3437873Z _launcher(_helion_matmul, (triton.cdiv(2048, _BLOCK_SIZE_1) * triton.cdiv(4096, _BLOCK_SIZE_0),), x, y, out, num_warps=8, num_stages=5) 2026-02-21T09:08:48.3438304Z # src[matmul.py:68]: return out 2026-02-21T09:08:48.3438497Z return out 2026-02-21T09:08:59.0384326Z WARNING:tritonbench.utils.triton_op:Completed input ID 2: 2026-02-21T09:08:59.0384610Z (M, N, K) 2026-02-21T09:08:59.0385026Z ------------------ 2026-02-21T09:08:59.0385170Z (4096, 2048, 2048) 2026-02-21T09:08:59.0385276Z 2026-02-21T09:08:59.0394160Z 25%|██▌ | 2/8 [10:55<33:28, 334.75s/it]WARNING:tritonbench.utils.triton_op:Running input ID 3: 2026-02-21T09:08:59.0394566Z (M, N, K) 2026-02-21T09:08:59.0394840Z ------------------ 2026-02-21T09:08:59.0399825Z (2048, 4096, 2048) 2026-02-21T09:08:59.0400166Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:09:42.4119853Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:10:20.9017248Z INFO:tritonbench.utils.triton_op:Took 83.00ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:11:03.2776741Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:11:03.2780952Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:11:03.2782541Z 'dtype': 'torch.float16', 2026-02-21T09:11:03.2782831Z 'shape': (2048, 2048), 2026-02-21T09:11:03.2787764Z 'stride': (2048, 1)}, 2026-02-21T09:11:03.2789392Z { 'device': 'cuda:0', 2026-02-21T09:11:03.2790139Z 'dtype': 'torch.float16', 2026-02-21T09:11:03.2793805Z 'shape': (2048, 4096), 2026-02-21T09:11:03.2797698Z 'stride': (1, 2048)}, 2026-02-21T09:11:03.2801165Z None), 2026-02-21T09:11:03.2802738Z 'kwargs': {}} 2026-02-21T09:11:03.2812408Z INFO:tritonbench.utils.triton_op:Took 4.24ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:11:03.3710186Z [0s] Autotune random seed: 2137757931 2026-02-21T09:11:03.4988636Z [0s] Starting LFBOPatternSearch with 
initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:11:08.1627338Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 18.1 configs/s 2026-02-21T09:11:20.4198869Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 8.1 configs/s 2026-02-21T09:11:20.4208051Z [16s] Adaptive compile timeout: 30s (90% percentile=2.6s, bounds=[30.0s, 30s]) 2026-02-21T09:11:21.6220823Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━━ 513/513 387.1 configs/s 2026-02-21T09:11:21.7656189Z [18s] Initial random population of 100, 5 starting points: 2026-02-21T09:11:21.7658030Z error=19 2026-02-21T09:11:21.7658183Z ok=81 2026-02-21T09:11:21.7658310Z min=0.3892 2026-02-21T09:11:21.7658436Z mid=2.5180 2026-02-21T09:11:21.7658583Z max=212.9642 2026-02-21T09:11:21.7658729Z best={'block_sizes': [256, 16, 16], 2026-02-21T09:11:21.7659007Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:21.7659273Z 'l2_groupings': [4], 2026-02-21T09:11:21.7659438Z 'load_eviction_policies': ['', ''], 2026-02-21T09:11:21.7659616Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:21.7659772Z 'maxnreg': 32, 2026-02-21T09:11:21.7659918Z 'num_sm_multiplier': 16, 2026-02-21T09:11:21.7660063Z 'num_stages': 4, 2026-02-21T09:11:21.7660202Z 'num_warps': 16, 2026-02-21T09:11:21.7660347Z 'pid_type': 'persistent_blocked', 2026-02-21T09:11:21.7660534Z 'range_flattens': [True, None], 2026-02-21T09:11:21.7660708Z 'range_multi_buffers': [False, None], 2026-02-21T09:11:21.7660898Z 'range_num_stages': [0, 0], 2026-02-21T09:11:21.7661055Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:21.7661233Z 'range_warp_specializes': [True, None]} 2026-02-21T09:11:21.7676414Z [18s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:11:23.0427460Z [19s] Generation 1 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:11:28.0521929Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 14.3 configs/s 
2026-02-21T09:11:28.2578916Z 2026-02-21T09:11:28.2579016Z 2026-02-21T09:11:28.2579420Z ================================================================ 2026-02-21T09:11:28.2579773Z Internal Triton PTX codegen error 2026-02-21T09:11:28.2580003Z `ptxas` stderr: 2026-02-21T09:11:28.2580590Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 262 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:11:28.2581246Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:11:28.2581469Z 2026-02-21T09:11:28.2582016Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp3rwhehg_.ptx -o /tmp/tmp3rwhehg_.ptx.o 2026-02-21T09:11:28.2582911Z 2026-02-21T09:11:28.2582928Z 2026-02-21T09:11:28.2583004Z // 2026-02-21T09:11:28.2583180Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:11:28.2583412Z // 2026-02-21T09:11:28.2583495Z 2026-02-21T09:11:28.2583563Z .version 8.7 2026-02-21T09:11:28.2583733Z .target sm_100a 2026-02-21T09:11:28.2583912Z .address_size 64 2026-02-21T09:11:28.2584014Z 2026-02-21T09:11:28.2584167Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:11:28.2584493Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:11:28.2584833Z // @_helion_matmul 2026-02-21T09:11:28.2585093Z .visible .entry _helion_matmul( 2026-02-21T09:11:28.2585361Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:11:28.2585819Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:11:28.2586154Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:11:28.2586472Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:11:28.2586802Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:11:28.2587050Z ) 2026-02-21T09:11:28.2587188Z .reqntid 384 2026-02-21T09:11:28.2587332Z .maxnreg 32 2026-02-21T09:11:28.2587475Z 
{ 2026-02-21T09:11:28.2587681Z .reg .pred %p<125>; 2026-02-21T09:11:28.2587866Z .reg .b32 %r<525>; 2026-02-21T09:11:28.2588039Z .reg .b64 %rd<209>; 2026-02-21T09:11:28.2588365Z .loc 1 19 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19:0 2026-02-21T09:11:28.2588711Z $L__func_begin0: 2026-02-21T09:11:28.2588998Z .loc 1 19 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19:0 2026-02-21T09:11:28.2589365Z 2026-02-21T09:11:28.2589427Z // %bb.0: 2026-02-21T09:11:28.2589607Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:11:28.2589833Z $L__tmp0: 2026-02-21T09:11:28.2590098Z .loc 1 19 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19 2026-02-21T09:11:28.2590438Z mov.u32 %r1, %tid.x; 2026-02-21T09:11:28.2590621Z shr.u32 %r2, %r1, 5; 2026-02-21T09:11:28.2590809Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:11:28.2591041Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:11:28.2591230Z @%p3 bra $L__BB0_16; 2026-02-21T09:11:28.2591410Z bra.uni $L__BB0_1; 2026-02-21T09:11:28.2591569Z $L__BB0_16: 2026-02-21T09:11:28.2591858Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0:0 2026-02-21T09:11:28.2592226Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:11:28.2592485Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:11:28.2592745Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:11:28.2593098Z .loc 1 19 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19 2026-02-21T09:11:28.2593478Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:11:28.2593705Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:11:28.2593906Z mov.b32 %r152, global_smem; 2026-02-21T09:11:28.2594097Z // begin inline asm 2026-02-21T09:11:28.2594400Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r152], 128; 2026-02-21T09:11:28.2594791Z // end inline asm 2026-02-21T09:11:28.2594963Z bar.sync 0, 256; 2026-02-21T09:11:28.2595148Z ld.shared.b32 %r496, [global_smem]; 2026-02-21T09:11:28.2595358Z 
bar.sync 0, 256; 2026-02-21T09:11:28.2595531Z // begin inline asm 2026-02-21T09:11:28.2595787Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:11:28.2596084Z // end inline asm 2026-02-21T09:11:28.2596388Z .loc 1 21 67 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:21:67 2026-02-21T09:11:28.2596758Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:11:28.2596946Z mov.u32 %r261, %ctaid.y; 2026-02-21T09:11:28.2597127Z mov.u32 %r262, %ctaid.z; 2026-02-21T09:11:28.2597314Z mov.u32 %r263, %nctaid.x; 2026-02-21T09:11:28.2597502Z mov.u32 %r264, %nctaid.y; 2026-02-21T09:11:28.2597702Z mad.lo.s32 %r265, %r262, %r264, %r261; 2026-02-21T09:11:28.2597999Z mad.lo.s32 %r266, %r265, %r263, %r41; 2026-02-21T09:11:28.2598252Z mul.lo.s32 %r267, %r266, 384; 2026-02-21T09:11:28.2598441Z cvt.s64.s32 %rd77, %r267; 2026-02-21T09:11:28.2598639Z add.s64 %rd38, %rd7, %rd77; 2026-02-21T09:11:28.2598826Z shl.b32 %r268, %r1, 2; 2026-02-21T09:11:28.2599016Z add.s32 %r153, %r152, %r268; 2026-02-21T09:11:28.2599198Z mov.b32 %r524, 0; 2026-02-21T09:11:28.2599366Z // begin inline asm 2026-02-21T09:11:28.2599547Z @%p29 st.shared.b32 [ %r153 + 0 ], %r524; 2026-02-21T09:11:28.2599763Z // end inline asm 2026-02-21T09:11:28.2599939Z bar.warp.sync -1; 2026-02-21T09:11:28.2600115Z setp.eq.b32 %p113, %r1, 0; 2026-02-21T09:11:28.2600304Z cvt.u64.u32 %rd23, %r152; 2026-02-21T09:11:28.2600538Z // begin inline asm 2026-02-21T09:11:28.2600839Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd4; 2026-02-21T09:11:28.2601175Z // end inline asm 2026-02-21T09:11:28.2601329Z // begin inline asm 2026-02-21T09:11:28.2601586Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:11:28.2601881Z // end inline asm 2026-02-21T09:11:28.2602029Z mov.b32 %r155, 16; 2026-02-21T09:11:28.2602243Z // begin inline asm 2026-02-21T09:11:28.2602528Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, 
%r155; 2026-02-21T09:11:28.2602844Z // end inline asm 2026-02-21T09:11:28.2602997Z mov.b32 %r156, 256; 2026-02-21T09:11:28.2603149Z // begin inline asm 2026-02-21T09:11:28.2603428Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r156; 2026-02-21T09:11:28.2603738Z // end inline asm 2026-02-21T09:11:28.2603945Z mov.b32 %r157, 2048; 2026-02-21T09:11:28.2604103Z // begin inline asm 2026-02-21T09:11:28.2604393Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r157; 2026-02-21T09:11:28.2605139Z [24s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:11:28.2606758Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:11:28.2608182Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:11:28.2608462Z `ptxas` stderr: 2026-02-21T09:11:28.2608950Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 262 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:11:28.2609511Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:11:28.2609683Z 2026-02-21T09:11:28.2610155Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp3rwhehg_.ptx -o /tmp/tmp3rwhehg_.ptx.o 2026-02-21T09:11:28.2610682Z 2026-02-21T09:11:28.2610831Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:11:28.2611116Z // end inline asm 2026-02-21T09:11:28.2611264Z // begin inline asm 2026-02-21T09:11:28.2611566Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r157; 2026-02-21T09:11:28.2611906Z // end inline asm 2026-02-21T09:11:28.2612055Z mov.b64 %rd31, 4096; 2026-02-21T09:11:28.2612221Z // begin inline asm 2026-02-21T09:11:28.2612523Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:11:28.2612871Z // end inline asm 2026-02-21T09:11:28.2613016Z mov.b32 %r159, 1; 2026-02-21T09:11:28.2613176Z // begin inline asm 2026-02-21T09:11:28.2613477Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r159; 2026-02-21T09:11:28.2613897Z // end inline asm 2026-02-21T09:11:28.2614059Z // begin inline asm 2026-02-21T09:11:28.2614356Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r159; 2026-02-21T09:11:28.2614745Z // end inline asm 2026-02-21T09:11:28.2614903Z // begin inline asm 2026-02-21T09:11:28.2615194Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:11:28.2615512Z // end inline asm 2026-02-21T09:11:28.2615680Z // begin inline asm 2026-02-21T09:11:28.2616001Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2616362Z // end inline asm 2026-02-21T09:11:28.2616593Z // begin inline asm 2026-02-21T09:11:28.2616888Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:11:28.2617206Z // end inline asm 2026-02-21T09:11:28.2617353Z // begin inline asm 2026-02-21T09:11:28.2617647Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2617982Z // end inline asm 2026-02-21T09:11:28.2618145Z // begin inline asm 2026-02-21T09:11:28.2618643Z @%p29 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd38 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:11:28.2619123Z // end inline asm 2026-02-21T09:11:28.2619290Z // begin inline asm 2026-02-21T09:11:28.2619552Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd38 + 0 ], 0x80; 2026-02-21T09:11:28.2619879Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.2620117Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.2620393Z // end inline asm 2026-02-21T09:11:28.2620564Z bar.sync 0, 256; 2026-02-21T09:11:28.2620881Z .loc 1 22 67 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:22:67 2026-02-21T09:11:28.2621255Z add.s32 %r269, %r267, 128; 2026-02-21T09:11:28.2621443Z cvt.s64.s32 %rd78, %r269; 2026-02-21T09:11:28.2621645Z add.s64 %rd56, %rd7, %rd78; 2026-02-21T09:11:28.2621832Z bar.sync 0, 256; 2026-02-21T09:11:28.2621999Z // begin inline asm 2026-02-21T09:11:28.2622181Z @%p29 st.shared.b32 [ %r153 + 0 ], %r524; 2026-02-21T09:11:28.2622405Z // end inline asm 2026-02-21T09:11:28.2622577Z bar.warp.sync -1; 2026-02-21T09:11:28.2622743Z // begin inline asm 2026-02-21T09:11:28.2623064Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd5; 2026-02-21T09:11:28.2623413Z // end inline asm 2026-02-21T09:11:28.2623578Z // begin inline asm 2026-02-21T09:11:28.2623854Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:11:28.2624184Z // end inline asm 2026-02-21T09:11:28.2624344Z // begin inline asm 2026-02-21T09:11:28.2624651Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r155; 2026-02-21T09:11:28.2625058Z // end inline asm 2026-02-21T09:11:28.2625237Z mov.b32 %r164, 64; 2026-02-21T09:11:28.2625431Z // begin inline asm 2026-02-21T09:11:28.2625759Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r164; 2026-02-21T09:11:28.2626139Z // end inline asm 2026-02-21T09:11:28.2626317Z // 
begin inline asm 2026-02-21T09:11:28.2626673Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r157; 2026-02-21T09:11:28.2627016Z // end inline asm 2026-02-21T09:11:28.2627167Z mov.b32 %r166, 4096; 2026-02-21T09:11:28.2627335Z // begin inline asm 2026-02-21T09:11:28.2627623Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r166; 2026-02-21T09:11:28.2627959Z // end inline asm 2026-02-21T09:11:28.2628110Z // begin inline asm 2026-02-21T09:11:28.2628421Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd31; 2026-02-21T09:11:28.2628761Z // end inline asm 2026-02-21T09:11:28.2628920Z // begin inline asm 2026-02-21T09:11:28.2629225Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r159; 2026-02-21T09:11:28.2629638Z // end inline asm 2026-02-21T09:11:28.2629796Z // begin inline asm 2026-02-21T09:11:28.2630094Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r159; 2026-02-21T09:11:28.2630438Z // end inline asm 2026-02-21T09:11:28.2630586Z // begin inline asm 2026-02-21T09:11:28.2630867Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:11:28.2631181Z // end inline asm 2026-02-21T09:11:28.2631329Z // begin inline asm 2026-02-21T09:11:28.2631632Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2631968Z // end inline asm 2026-02-21T09:11:28.2632175Z // begin inline asm 2026-02-21T09:11:28.2632456Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:11:28.2632786Z // end inline asm 2026-02-21T09:11:28.2632939Z // begin inline asm 2026-02-21T09:11:28.2633204Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2633511Z // end inline asm 2026-02-21T09:11:28.2633657Z // begin inline asm 
2026-02-21T09:11:28.2634140Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd56 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:11:28.2634579Z // end inline asm 2026-02-21T09:11:28.2634790Z // begin inline asm 2026-02-21T09:11:28.2635037Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd56 + 0 ], 0x80; 2026-02-21T09:11:28.2635322Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.2635558Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.2635831Z // end inline asm 2026-02-21T09:11:28.2636001Z bar.sync 0, 256; 2026-02-21T09:11:28.2636309Z .loc 1 24 71 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:24:71 2026-02-21T09:11:28.2636653Z add.s32 %r270, %r267, 256; 2026-02-21T09:11:28.2636830Z cvt.s64.s32 %rd79, %r270; 2026-02-21T09:11:28.2637017Z add.s64 %rd74, %rd7, %rd79; 2026-02-21T09:11:28.2637200Z bar.sync 0, 256; 2026-02-21T09:11:28.2637352Z // begin inline asm 2026-02-21T09:11:28.2637530Z @%p29 st.shared.b32 [ %r153 + 0 ], %r524; 2026-02-21T09:11:28.2637728Z // end inline asm 2026-02-21T09:11:28.2637888Z bar.warp.sync -1; 2026-02-21T09:11:28.2638042Z // begin inline asm 2026-02-21T09:11:28.2638334Z @%p113 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd23 + 0 ], %rd6; 2026-02-21T09:11:28.2638647Z // end inline asm 2026-02-21T09:11:28.2638804Z // begin inline asm 2026-02-21T09:11:28.2639063Z @%p113 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1; 2026-02-21T09:11:28.2639357Z // end inline asm 2026-02-21T09:11:28.2639518Z // begin inline asm 2026-02-21T09:11:28.2639788Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r164; 2026-02-21T09:11:28.2640104Z // end inline asm 2026-02-21T09:11:28.2640251Z // begin inline asm 2026-02-21T09:11:28.2640524Z @%p113 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r156; 2026-02-21T09:11:28.2640838Z // end inline asm 2026-02-21T09:11:28.2640992Z // begin inline 
asm 2026-02-21T09:11:28.2641278Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r166; 2026-02-21T09:11:28.2641593Z // end inline asm 2026-02-21T09:11:28.2641748Z // begin inline asm 2026-02-21T09:11:28.2642026Z @%p113 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r157; 2026-02-21T09:11:28.2642348Z // end inline asm 2026-02-21T09:11:28.2642496Z mov.b64 %rd67, 8192; 2026-02-21T09:11:28.2642662Z // begin inline asm 2026-02-21T09:11:28.2642962Z @%p113 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd23 + 0 ], 0x0, %rd67; 2026-02-21T09:11:28.2643289Z // end inline asm 2026-02-21T09:11:28.2643443Z // begin inline asm 2026-02-21T09:11:28.2643735Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0, %r159; 2026-02-21T09:11:28.2644188Z // end inline asm 2026-02-21T09:11:28.2644336Z // begin inline asm 2026-02-21T09:11:28.2644645Z @%p113 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x1, %r159; 2026-02-21T09:11:28.2645062Z // end inline asm 2026-02-21T09:11:28.2645224Z // begin inline asm 2026-02-21T09:11:28.2645517Z @%p113 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x6; 2026-02-21T09:11:28.2645844Z // end inline asm 2026-02-21T09:11:28.2646012Z // begin inline asm 2026-02-21T09:11:28.2646319Z @%p113 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2646663Z // end inline asm 2026-02-21T09:11:28.2646820Z // begin inline asm 2026-02-21T09:11:28.2647188Z @%p113 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x3; 2026-02-21T09:11:28.2647542Z // end inline asm 2026-02-21T09:11:28.2647701Z // begin inline asm 2026-02-21T09:11:28.2648000Z @%p113 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd23 + 0 ], 0x0; 2026-02-21T09:11:28.2648334Z // end inline asm 2026-02-21T09:11:28.2648504Z // begin inline asm 
2026-02-21T09:11:28.2648989Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd74 + 0 ], [ %rd23 + 0 ], 0x80; 2026-02-21T09:11:28.2649478Z // end inline asm 2026-02-21T09:11:28.2649643Z // begin inline asm 2026-02-21T09:11:28.2649905Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd74 + 0 ], 0x80; 2026-02-21T09:11:28.2650231Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.2650469Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.2650700Z // end inline asm 2026-02-21T09:11:28.2650909Z bar.sync 0, 256; 2026-02-21T09:11:28.2651096Z cvta.global.u64 %rd80, %rd74; 2026-02-21T09:11:28.2651460Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2651847Z max.u32 %r271, %r41, 511; 2026-02-21T09:11:28.2652043Z shl.b32 %r272, %r271, 7; 2026-02-21T09:11:28.2652235Z sub.s32 %r42, 65536, %r272; 2026-02-21T09:11:28.2652580Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2652968Z shfl.sync.idx.b32 %r273, %r2, 0, 31, -1; 2026-02-21T09:11:28.2653216Z shl.b32 %r274, %r273, 21; 2026-02-21T09:11:28.2653406Z and.b32 %r275, %r274, 6291456; 2026-02-21T09:11:28.2653619Z add.s32 %r276, %r275, %r496; 2026-02-21T09:11:28.2653811Z shl.b32 %r277, %r273, 4; 2026-02-21T09:11:28.2654002Z and.b32 %r278, %r277, 64; 2026-02-21T09:11:28.2654197Z add.s32 %r177, %r276, %r278; 2026-02-21T09:11:28.2654393Z mov.pred %p85, -1; 2026-02-21T09:11:28.2654578Z // begin inline asm 2026-02-21T09:11:28.2655073Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r177 + 0], {%r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524}; 2026-02-21T09:11:28.2655545Z // end inline asm 2026-02-21T09:11:28.2655696Z // begin inline asm 2026-02-21T09:11:28.2656121Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r177 + 16], {%r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, 
%r524, %r524, %r524, %r524}; 2026-02-21T09:11:28.2656629Z // end inline asm 2026-02-21T09:11:28.2656792Z // begin inline asm 2026-02-21T09:11:28.2657253Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r177 + 32], {%r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524}; 2026-02-21T09:11:28.2657753Z // end inline asm 2026-02-21T09:11:28.2657922Z // begin inline asm 2026-02-21T09:11:28.2658381Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r177 + 48], {%r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524, %r524}; 2026-02-21T09:11:28.2658893Z // end inline asm 2026-02-21T09:11:28.2659064Z // begin inline asm 2026-02-21T09:11:28.2659247Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:11:28.2659456Z // end inline asm 2026-02-21T09:11:28.2659682Z bar.sync 0, 256; 2026-02-21T09:11:28.2660015Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2660393Z add.s32 %r245, %r152, 73728; 2026-02-21T09:11:28.2660589Z // begin inline asm 2026-02-21T09:11:28.2660797Z @%p113 mbarrier.init.shared::cta.b64 [%r245], 1; 2026-02-21T09:11:28.2661032Z // end inline asm 2026-02-21T09:11:28.2661196Z bar.sync 0, 256; 2026-02-21T09:11:28.2661358Z add.s32 %r246, %r152, 73736; 2026-02-21T09:11:28.2661547Z // begin inline asm 2026-02-21T09:11:28.2661747Z @%p113 mbarrier.init.shared::cta.b64 [%r246], 1; 2026-02-21T09:11:28.2661992Z // end inline asm 2026-02-21T09:11:28.2662152Z bar.sync 0, 256; 2026-02-21T09:11:28.2662375Z add.s32 %r247, %r152, 73744; 2026-02-21T09:11:28.2662567Z // begin inline asm 2026-02-21T09:11:28.2662774Z @%p113 mbarrier.init.shared::cta.b64 [%r247], 1; 2026-02-21T09:11:28.2663019Z // end inline asm 2026-02-21T09:11:28.2663181Z bar.sync 0, 256; 2026-02-21T09:11:28.2663355Z add.s32 %r248, %r152, 73752; 2026-02-21T09:11:28.2663540Z // begin inline asm 2026-02-21T09:11:28.2663745Z @%p113 mbarrier.init.shared::cta.b64 [%r248], 
1; 2026-02-21T09:11:28.2664022Z // end inline asm 2026-02-21T09:11:28.2664196Z add.s32 %r249, %r152, 73760; 2026-02-21T09:11:28.2664390Z // begin inline asm 2026-02-21T09:11:28.2664601Z @%p113 mbarrier.init.shared::cta.b64 [%r249], 1; 2026-02-21T09:11:28.2664893Z // end inline asm 2026-02-21T09:11:28.2665055Z bar.sync 0, 256; 2026-02-21T09:11:28.2665231Z add.s32 %r250, %r152, 73768; 2026-02-21T09:11:28.2665417Z // begin inline asm 2026-02-21T09:11:28.2665672Z @%p113 mbarrier.init.shared::cta.b64 [%r250], 1; 2026-02-21T09:11:28.2665914Z // end inline asm 2026-02-21T09:11:28.2666082Z bar.sync 0, 256; 2026-02-21T09:11:28.2666247Z add.s32 %r251, %r152, 73776; 2026-02-21T09:11:28.2666442Z // begin inline asm 2026-02-21T09:11:28.2666629Z @%p113 mbarrier.init.shared::cta.b64 [%r251], 1; 2026-02-21T09:11:28.2666857Z // end inline asm 2026-02-21T09:11:28.2667013Z bar.sync 0, 256; 2026-02-21T09:11:28.2667168Z add.s32 %r252, %r152, 73784; 2026-02-21T09:11:28.2667347Z // begin inline asm 2026-02-21T09:11:28.2667539Z @%p113 mbarrier.init.shared::cta.b64 [%r252], 1; 2026-02-21T09:11:28.2667763Z // end inline asm 2026-02-21T09:11:28.2668048Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2668384Z bar.sync 0, 256; 2026-02-21T09:11:28.2668534Z // begin inline asm 2026-02-21T09:11:28.2668739Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r245]; 2026-02-21T09:11:28.2668976Z // end inline asm 2026-02-21T09:11:28.2669122Z bar.sync 0, 256; 2026-02-21T09:11:28.2669278Z // begin inline asm 2026-02-21T09:11:28.2669470Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r246]; 2026-02-21T09:11:28.2669698Z // end inline asm 2026-02-21T09:11:28.2669842Z bar.sync 0, 256; 2026-02-21T09:11:28.2669996Z // begin inline asm 2026-02-21T09:11:28.2670189Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r247]; 2026-02-21T09:11:28.2670410Z // end inline asm 2026-02-21T09:11:28.2670560Z bar.sync 0, 256; 2026-02-21T09:11:28.2670709Z // begin inline asm 
2026-02-21T09:11:28.2670906Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r248]; 2026-02-21T09:11:28.2671124Z // end inline asm 2026-02-21T09:11:28.2671425Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2671760Z bar.sync 0, 256; 2026-02-21T09:11:28.2671922Z add.s32 %r257, %r152, 73792; 2026-02-21T09:11:28.2672092Z // begin inline asm 2026-02-21T09:11:28.2672287Z @%p113 mbarrier.init.shared::cta.b64 [%r257], 1; 2026-02-21T09:11:28.2672512Z // end inline asm 2026-02-21T09:11:28.2672663Z add.s32 %r484, %r152, 73808; 2026-02-21T09:11:28.2672844Z // begin inline asm 2026-02-21T09:11:28.2673027Z @%p113 mbarrier.init.shared::cta.b64 [%r484], 1; 2026-02-21T09:11:28.2673250Z // end inline asm 2026-02-21T09:11:28.2673597Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2673928Z bar.sync 0, 256; 2026-02-21T09:11:28.2674073Z // begin inline asm 2026-02-21T09:11:28.2674269Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r484]; 2026-02-21T09:11:28.2674492Z // end inline asm 2026-02-21T09:11:28.2674857Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2675237Z st.shared.b32 [global_smem+73816], 33554689; 2026-02-21T09:11:28.2675474Z st.shared.b32 [global_smem+32768], %r496; 2026-02-21T09:11:28.2675685Z barrier.sync 1; 2026-02-21T09:11:28.2675872Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:11:28.2676159Z barrier.sync 1; 2026-02-21T09:11:28.2676339Z setp.lt.s32 %p104, %r42, 1; 2026-02-21T09:11:28.2676550Z @%p104 bra $L__BB0_23; 2026-02-21T09:11:28.2676761Z // %bb.17: // %.lr.ph12 2026-02-21T09:11:28.2676999Z add.s32 %r521, %r41, -1; 2026-02-21T09:11:28.2677192Z shl.b32 %r281, %r1, 7; 2026-02-21T09:11:28.2677374Z and.b32 %r282, %r281, 32640; 2026-02-21T09:11:28.2677572Z shl.b32 %r283, %r1, 4; 2026-02-21T09:11:28.2677753Z and.b32 %r284, %r283, 112; 2026-02-21T09:11:28.2677995Z or.b32 %r285, %r282, %r284; 
2026-02-21T09:11:28.2678191Z add.s32 %r287, %r152, 32768; 2026-02-21T09:11:28.2678386Z add.s32 %r45, %r287, %r285; 2026-02-21T09:11:28.2678577Z xor.b32 %r288, %r285, 16; 2026-02-21T09:11:28.2678769Z add.s32 %r46, %r287, %r288; 2026-02-21T09:11:28.2678965Z xor.b32 %r289, %r285, 32; 2026-02-21T09:11:28.2679146Z add.s32 %r47, %r287, %r289; 2026-02-21T09:11:28.2679337Z xor.b32 %r290, %r285, 48; 2026-02-21T09:11:28.2679562Z add.s32 %r48, %r287, %r290; 2026-02-21T09:11:28.2679756Z xor.b32 %r291, %r285, 64; 2026-02-21T09:11:28.2679935Z add.s32 %r49, %r287, %r291; 2026-02-21T09:11:28.2680125Z xor.b32 %r292, %r285, 80; 2026-02-21T09:11:28.2680303Z add.s32 %r50, %r287, %r292; 2026-02-21T09:11:28.2680494Z xor.b32 %r293, %r285, 96; 2026-02-21T09:11:28.2680672Z add.s32 %r51, %r287, %r293; 2026-02-21T09:11:28.2680865Z xor.b32 %r294, %r285, 112; 2026-02-21T09:11:28.2681056Z add.s32 %r52, %r287, %r294; 2026-02-21T09:11:28.2681236Z mov.b32 %r518, -1; 2026-02-21T09:11:28.2681411Z mov.b32 %r522, %r524; 2026-02-21T09:11:28.2681588Z mov.b32 %r523, %r524; 2026-02-21T09:11:28.2681765Z mov.b32 %r519, %r524; 2026-02-21T09:11:28.2681933Z bra.uni $L__BB0_18; 2026-02-21T09:11:28.2682169Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:11:28.2682582Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2682944Z bar.sync 0, 256; 2026-02-21T09:11:28.2683117Z // begin inline asm 2026-02-21T09:11:28.2683279Z 2026-02-21T09:11:28.2683416Z { 2026-02-21T09:11:28.2683560Z .reg .pred complete; 2026-02-21T09:11:28.2683741Z waitLoop: 2026-02-21T09:11:28.2683969Z mbarrier.try_wait.parity.shared.b64 complete, [%r257], %r524; 2026-02-21T09:11:28.2684274Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.2684455Z } 2026-02-21T09:11:28.2684542Z 2026-02-21T09:11:28.2684605Z // end inline asm 2026-02-21T09:11:28.2684828Z // begin inline asm 2026-02-21T09:11:28.2685290Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r299, %r300, %r301, %r302, %r303, %r304, %r305, 
%r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314}, [%r177 + 0]; 2026-02-21T09:11:28.2685820Z // end inline asm 2026-02-21T09:11:28.2685979Z // begin inline asm 2026-02-21T09:11:28.2686418Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331}, [%r177 + 16]; 2026-02-21T09:11:28.2686898Z // end inline asm 2026-02-21T09:11:28.2687065Z // begin inline asm 2026-02-21T09:11:28.2687500Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348}, [%r177 + 32]; 2026-02-21T09:11:28.2688063Z // end inline asm 2026-02-21T09:11:28.2688229Z // begin inline asm 2026-02-21T09:11:28.2688669Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365}, [%r177 + 48]; 2026-02-21T09:11:28.2689148Z // end inline asm 2026-02-21T09:11:28.2689305Z // begin inline asm 2026-02-21T09:11:28.2689497Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:11:28.2689692Z // end inline asm 2026-02-21T09:11:28.2689862Z bar.sync 0, 256; 2026-02-21T09:11:28.2690026Z // begin inline asm 2026-02-21T09:11:28.2690232Z @%p113 mbarrier.arrive.shared::cta.b64 _, [%r484]; 2026-02-21T09:11:28.2690483Z // end inline asm 2026-02-21T09:11:28.2690768Z cvt.u64.u32 %rd81, %r299; 2026-02-21T09:11:28.2690948Z cvt.u64.u32 %rd82, %r300; 2026-02-21T09:11:28.2691116Z shl.b64 %rd83, %rd82, 32; 2026-02-21T09:11:28.2691294Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T09:11:28.2691602Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2691952Z mov.b64 {%r373, %r374}, %rd84; 2026-02-21T09:11:28.2692151Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T09:11:28.2692545Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2692891Z cvt.u64.u32 %rd85, %r301; 
2026-02-21T09:11:28.2693058Z cvt.u64.u32 %rd86, %r302; 2026-02-21T09:11:28.2693233Z shl.b64 %rd87, %rd86, 32; 2026-02-21T09:11:28.2693405Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T09:11:28.2693714Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2694102Z mov.b64 {%r376, %r377}, %rd88; 2026-02-21T09:11:28.2694297Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T09:11:28.2694624Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2695007Z cvt.u64.u32 %rd89, %r303; 2026-02-21T09:11:28.2695186Z cvt.u64.u32 %rd90, %r304; 2026-02-21T09:11:28.2695353Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:11:28.2695528Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:11:28.2695827Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2696166Z mov.b64 {%r379, %r380}, %rd92; 2026-02-21T09:11:28.2696356Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T09:11:28.2696678Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2697022Z cvt.u64.u32 %rd93, %r305; 2026-02-21T09:11:28.2697201Z cvt.u64.u32 %rd94, %r306; 2026-02-21T09:11:28.2697392Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:11:28.2697573Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:11:28.2697911Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2698272Z mov.b64 {%r382, %r383}, %rd96; 2026-02-21T09:11:28.2698468Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T09:11:28.2698817Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2699171Z cvt.u64.u32 %rd97, %r307; 2026-02-21T09:11:28.2699359Z cvt.u64.u32 %rd98, %r308; 2026-02-21T09:11:28.2699542Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:11:28.2699733Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:11:28.2700059Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2700428Z 
mov.b64 {%r385, %r386}, %rd100; 2026-02-21T09:11:28.2700646Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T09:11:28.2700993Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2701362Z cvt.u64.u32 %rd101, %r309; 2026-02-21T09:11:28.2701553Z cvt.u64.u32 %rd102, %r310; 2026-02-21T09:11:28.2701750Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:11:28.2701938Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:11:28.2702356Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2702731Z mov.b64 {%r388, %r389}, %rd104; 2026-02-21T09:11:28.2702938Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T09:11:28.2703297Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2703656Z cvt.u64.u32 %rd105, %r311; 2026-02-21T09:11:28.2703850Z cvt.u64.u32 %rd106, %r312; 2026-02-21T09:11:28.2704036Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:11:28.2704231Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:11:28.2704569Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2705064Z mov.b64 {%r391, %r392}, %rd108; 2026-02-21T09:11:28.2705275Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T09:11:28.2705625Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2705977Z cvt.u64.u32 %rd109, %r313; 2026-02-21T09:11:28.2706151Z cvt.u64.u32 %rd110, %r314; 2026-02-21T09:11:28.2706329Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:11:28.2706502Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:11:28.2706859Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2707198Z mov.b64 {%r394, %r395}, %rd112; 2026-02-21T09:11:28.2707388Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T09:11:28.2707711Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2708040Z cvt.u64.u32 %rd113, %r316; 
2026-02-21T09:11:28.2708263Z cvt.u64.u32 %rd114, %r317; 2026-02-21T09:11:28.2708441Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:11:28.2708632Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:11:28.2708937Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2709278Z mov.b64 {%r397, %r398}, %rd116; 2026-02-21T09:11:28.2709474Z cvt.rn.f16x2.f32 %r399, %r398, %r397; 2026-02-21T09:11:28.2709793Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2710134Z cvt.u64.u32 %rd117, %r318; 2026-02-21T09:11:28.2710309Z cvt.u64.u32 %rd118, %r319; 2026-02-21T09:11:28.2710489Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:11:28.2710665Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:11:28.2710983Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2711327Z mov.b64 {%r400, %r401}, %rd120; 2026-02-21T09:11:28.2711518Z cvt.rn.f16x2.f32 %r402, %r401, %r400; 2026-02-21T09:11:28.2711856Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2712193Z cvt.u64.u32 %rd121, %r320; 2026-02-21T09:11:28.2712375Z cvt.u64.u32 %rd122, %r321; 2026-02-21T09:11:28.2712550Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:11:28.2712737Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:11:28.2713048Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2713402Z mov.b64 {%r403, %r404}, %rd124; 2026-02-21T09:11:28.2713602Z cvt.rn.f16x2.f32 %r405, %r404, %r403; 2026-02-21T09:11:28.2713931Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2714283Z cvt.u64.u32 %rd125, %r322; 2026-02-21T09:11:28.2714457Z cvt.u64.u32 %rd126, %r323; 2026-02-21T09:11:28.2714639Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:11:28.2714908Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:11:28.2715240Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 
2026-02-21T09:11:28.2715597Z mov.b64 {%r406, %r407}, %rd128; 2026-02-21T09:11:28.2715793Z cvt.rn.f16x2.f32 %r408, %r407, %r406; 2026-02-21T09:11:28.2716151Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2716579Z cvt.u64.u32 %rd129, %r324; 2026-02-21T09:11:28.2716774Z cvt.u64.u32 %rd130, %r325; 2026-02-21T09:11:28.2716964Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:11:28.2717159Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:11:28.2717493Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2717861Z mov.b64 {%r409, %r410}, %rd132; 2026-02-21T09:11:28.2718071Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:11:28.2718418Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2718834Z cvt.u64.u32 %rd133, %r326; 2026-02-21T09:11:28.2719017Z cvt.u64.u32 %rd134, %r327; 2026-02-21T09:11:28.2719214Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:11:28.2719399Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:11:28.2719735Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2720099Z mov.b64 {%r412, %r413}, %rd136; 2026-02-21T09:11:28.2720298Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:11:28.2720691Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2721050Z cvt.u64.u32 %rd137, %r328; 2026-02-21T09:11:28.2721244Z cvt.u64.u32 %rd138, %r329; 2026-02-21T09:11:28.2721430Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:11:28.2721627Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:11:28.2721961Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2722369Z mov.b64 {%r415, %r416}, %rd140; 2026-02-21T09:11:28.2722584Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:11:28.2722935Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2723301Z cvt.u64.u32 %rd141, 
%r330; 2026-02-21T09:11:28.2723488Z cvt.u64.u32 %rd142, %r331; 2026-02-21T09:11:28.2723685Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:11:28.2723875Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:11:28.2724225Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2724601Z mov.b64 {%r418, %r419}, %rd144; 2026-02-21T09:11:28.2724856Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:11:28.2725240Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2725635Z cvt.u64.u32 %rd145, %r333; 2026-02-21T09:11:28.2725829Z cvt.u64.u32 %rd146, %r334; 2026-02-21T09:11:28.2726016Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:11:28.2726213Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:11:28.2726548Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2726917Z mov.b64 {%r421, %r422}, %rd148; 2026-02-21T09:11:28.2727128Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:11:28.2727477Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2727855Z cvt.u64.u32 %rd149, %r335; 2026-02-21T09:11:28.2728034Z cvt.u64.u32 %rd150, %r336; 2026-02-21T09:11:28.2728228Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:11:28.2728408Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:11:28.2728742Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2729104Z mov.b64 {%r424, %r425}, %rd152; 2026-02-21T09:11:28.2729299Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:11:28.2729650Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2730002Z cvt.u64.u32 %rd153, %r337; 2026-02-21T09:11:28.2730189Z cvt.u64.u32 %rd154, %r338; 2026-02-21T09:11:28.2730368Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:11:28.2730561Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:11:28.2730950Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 
2026-02-21T09:11:28.2731305Z mov.b64 {%r427, %r428}, %rd156; 2026-02-21T09:11:28.2731505Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:11:28.2731844Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2732198Z cvt.u64.u32 %rd157, %r339; 2026-02-21T09:11:28.2732378Z cvt.u64.u32 %rd158, %r340; 2026-02-21T09:11:28.2732561Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:11:28.2732743Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:11:28.2733079Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2733476Z mov.b64 {%r430, %r431}, %rd160; 2026-02-21T09:11:28.2733670Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:11:28.2734009Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2734350Z cvt.u64.u32 %rd161, %r341; 2026-02-21T09:11:28.2734535Z cvt.u64.u32 %rd162, %r342; 2026-02-21T09:11:28.2734767Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:11:28.2735006Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:11:28.2735333Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2735684Z mov.b64 {%r433, %r434}, %rd164; 2026-02-21T09:11:28.2735885Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:11:28.2736222Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2736626Z cvt.u64.u32 %rd165, %r343; 2026-02-21T09:11:28.2736816Z cvt.u64.u32 %rd166, %r344; 2026-02-21T09:11:28.2737012Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:11:28.2737206Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:11:28.2737573Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2737967Z mov.b64 {%r436, %r437}, %rd168; 2026-02-21T09:11:28.2738185Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:11:28.2738560Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2738922Z cvt.u64.u32 %rd169, 
%r345; 2026-02-21T09:11:28.2739123Z cvt.u64.u32 %rd170, %r346; 2026-02-21T09:11:28.2739312Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:11:28.2739514Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:11:28.2739860Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2740249Z mov.b64 {%r439, %r440}, %rd172; 2026-02-21T09:11:28.2740473Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:11:28.2740838Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2741213Z cvt.u64.u32 %rd173, %r347; 2026-02-21T09:11:28.2741399Z cvt.u64.u32 %rd174, %r348; 2026-02-21T09:11:28.2741599Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:11:28.2741788Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:11:28.2742131Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2742502Z mov.b64 {%r442, %r443}, %rd176; 2026-02-21T09:11:28.2742703Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:11:28.2743058Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2743413Z cvt.u64.u32 %rd177, %r350; 2026-02-21T09:11:28.2743609Z cvt.u64.u32 %rd178, %r351; 2026-02-21T09:11:28.2743794Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:11:28.2743996Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:11:28.2744331Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2744738Z mov.b64 {%r445, %r446}, %rd180; 2026-02-21T09:11:28.2744956Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:11:28.2745393Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2745753Z cvt.u64.u32 %rd181, %r352; 2026-02-21T09:11:28.2745947Z cvt.u64.u32 %rd182, %r353; 2026-02-21T09:11:28.2746132Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:11:28.2746318Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:11:28.2746646Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 
2026-02-21T09:11:28.2746995Z mov.b64 {%r448, %r449}, %rd184; 2026-02-21T09:11:28.2747194Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:11:28.2747534Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2747953Z cvt.u64.u32 %rd185, %r354; 2026-02-21T09:11:28.2748135Z cvt.u64.u32 %rd186, %r355; 2026-02-21T09:11:28.2748304Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:11:28.2748489Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:11:28.2748794Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2749126Z mov.b64 {%r451, %r452}, %rd188; 2026-02-21T09:11:28.2749320Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:11:28.2749673Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2750013Z cvt.u64.u32 %rd189, %r356; 2026-02-21T09:11:28.2750183Z cvt.u64.u32 %rd190, %r357; 2026-02-21T09:11:28.2750359Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:11:28.2750533Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:11:28.2750894Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2751238Z mov.b64 {%r454, %r455}, %rd192; 2026-02-21T09:11:28.2751425Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:11:28.2751753Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2752083Z cvt.u64.u32 %rd193, %r358; 2026-02-21T09:11:28.2752262Z cvt.u64.u32 %rd194, %r359; 2026-02-21T09:11:28.2752436Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:11:28.2752621Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:11:28.2752926Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2753259Z mov.b64 {%r457, %r458}, %rd196; 2026-02-21T09:11:28.2753452Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:11:28.2753772Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2754104Z cvt.u64.u32 %rd197, 
%r360; 2026-02-21T09:11:28.2754276Z cvt.u64.u32 %rd198, %r361; 2026-02-21T09:11:28.2754454Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:11:28.2754631Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:11:28.2755039Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2755410Z mov.b64 {%r460, %r461}, %rd200; 2026-02-21T09:11:28.2755617Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:11:28.2755972Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2756328Z cvt.u64.u32 %rd201, %r362; 2026-02-21T09:11:28.2756508Z cvt.u64.u32 %rd202, %r363; 2026-02-21T09:11:28.2756676Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:11:28.2756856Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:11:28.2757163Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2757498Z mov.b64 {%r463, %r464}, %rd204; 2026-02-21T09:11:28.2757693Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:11:28.2758012Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2758349Z cvt.u64.u32 %rd205, %r364; 2026-02-21T09:11:28.2758521Z cvt.u64.u32 %rd206, %r365; 2026-02-21T09:11:28.2758703Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:11:28.2758936Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:11:28.2759253Z .loc 1 58 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:58:27 2026-02-21T09:11:28.2759592Z mov.b64 {%r466, %r467}, %rd208; 2026-02-21T09:11:28.2759777Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T09:11:28.2760104Z .loc 1 59 45 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:59:45 2026-02-21T09:11:28.2760452Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:11:28.2760650Z bar.sync 0, 256; 2026-02-21T09:11:28.2760850Z st.shared.v4.b32 [%r45], {%r375, %r378, %r381, %r384}; 2026-02-21T09:11:28.2761129Z st.shared.v4.b32 [%r46], {%r387, %r390, %r393, %r396}; 2026-02-21T09:11:28.2761436Z st.shared.v4.b32 [%r47], 
{%r399, %r402, %r405, %r408}; 2026-02-21T09:11:28.2761700Z st.shared.v4.b32 [%r48], {%r411, %r414, %r417, %r420}; 2026-02-21T09:11:28.2761957Z st.shared.v4.b32 [%r49], {%r423, %r426, %r429, %r432}; 2026-02-21T09:11:28.2762210Z st.shared.v4.b32 [%r50], {%r435, %r438, %r441, %r444}; 2026-02-21T09:11:28.2762466Z st.shared.v4.b32 [%r51], {%r447, %r450, %r453, %r456}; 2026-02-21T09:11:28.2762714Z st.shared.v4.b32 [%r52], {%r459, %r462, %r465, %r468}; 2026-02-21T09:11:28.2762976Z // begin inline asm 2026-02-21T09:11:28.2763156Z fence.proxy.async.shared::cta; 2026-02-21T09:11:28.2763348Z // end inline asm 2026-02-21T09:11:28.2763497Z bar.sync 0, 256; 2026-02-21T09:11:28.2763661Z elect.sync %r469|%p111, -1; 2026-02-21T09:11:28.2763855Z and.pred %p109, %p29, %p111; 2026-02-21T09:11:28.2764036Z // begin inline asm 2026-02-21T09:11:28.2764392Z @%p109 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd80, {%r522, %r523}], [%r287]; 2026-02-21T09:11:28.2764782Z // end inline asm 2026-02-21T09:11:28.2764956Z cp.async.bulk.commit_group; 2026-02-21T09:11:28.2765133Z mov.b32 %r520, 1; 2026-02-21T09:11:28.2765350Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:11:28.2765741Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2766114Z xor.b32 %r524, %r520, %r524; 2026-02-21T09:11:28.2766465Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2766828Z add.s32 %r519, %r519, 1; 2026-02-21T09:11:28.2767026Z setp.lt.s32 %p112, %r519, %r42; 2026-02-21T09:11:28.2767224Z @%p112 bra $L__BB0_18; 2026-02-21T09:11:28.2767407Z bra.uni $L__BB0_23; 2026-02-21T09:11:28.2767643Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:11:28.2767933Z add.s32 %r296, %r518, 1; 2026-02-21T09:11:28.2768133Z setp.eq.b32 %p105, %r518, 127; 2026-02-21T09:11:28.2768342Z selp.b32 %r518, 0, %r296, %p105; 2026-02-21T09:11:28.2768559Z setp.eq.b32 %p106, %r518, 127; 2026-02-21T09:11:28.2768755Z 
@%p106 bra $L__BB0_21; 2026-02-21T09:11:28.2768993Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:11:28.2769405Z .loc 1 0 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0:106 2026-02-21T09:11:28.2769778Z mov.b32 %r520, 0; 2026-02-21T09:11:28.2770101Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2770476Z setp.ne.b32 %p107, %r518, 0; 2026-02-21T09:11:28.2770686Z @%p107 bra $L__BB0_22; 2026-02-21T09:11:28.2770881Z // %bb.20: // %.thread 2026-02-21T09:11:28.2771159Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:11:28.2771411Z add.s32 %r521, %r521, 1; 2026-02-21T09:11:28.2771744Z .loc 1 39 35 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:39:35 2026-02-21T09:11:28.2772102Z shr.s32 %r471, %r521, 31; 2026-02-21T09:11:28.2772295Z shr.u32 %r472, %r471, 24; 2026-02-21T09:11:28.2772488Z add.s32 %r473, %r521, %r472; 2026-02-21T09:11:28.2772676Z shr.s32 %r474, %r473, 8; 2026-02-21T09:11:28.2773078Z .loc 1 40 33 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:40:33 2026-02-21T09:11:28.2773437Z shl.b32 %r475, %r474, 2; 2026-02-21T09:11:28.2773771Z .loc 1 41 39 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:41:39 2026-02-21T09:11:28.2774131Z sub.s32 %r476, 8, %r475; 2026-02-21T09:11:28.2774456Z .loc 1 41 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:41:52 2026-02-21T09:11:28.2774883Z min.s32 %r477, %r476, 4; 2026-02-21T09:11:28.2775210Z .loc 1 42 45 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:45 2026-02-21T09:11:28.2775585Z and.b32 %r478, %r473, -256; 2026-02-21T09:11:28.2775832Z sub.s32 %r479, %r521, %r478; 2026-02-21T09:11:28.2776157Z .loc 1 43 51 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:43:51 2026-02-21T09:11:28.2776512Z div.s32 %r480, %r479, %r477; 2026-02-21T09:11:28.2776843Z .loc 1 42 64 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:64 2026-02-21T09:11:28.2777210Z mul.lo.s32 %r481, %r480, 
%r477; 2026-02-21T09:11:28.2777446Z sub.s32 %r482, %r479, %r481; 2026-02-21T09:11:28.2777782Z .loc 1 42 30 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:30 2026-02-21T09:11:28.2778133Z add.s32 %r483, %r482, %r475; 2026-02-21T09:11:28.2778467Z .loc 1 44 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:44:27 2026-02-21T09:11:28.2778826Z shl.b32 %r523, %r483, 8; 2026-02-21T09:11:28.2779190Z .loc 1 45 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:45:27 2026-02-21T09:11:28.2779563Z shl.b32 %r522, %r480, 6; 2026-02-21T09:11:28.2779892Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2780238Z bra.uni $L__BB0_22; 2026-02-21T09:11:28.2780444Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:11:28.2780816Z .loc 1 0 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0:106 2026-02-21T09:11:28.2781150Z mov.b32 %r71, global_smem; 2026-02-21T09:11:28.2781332Z add.s32 %r72, %r71, %r3; 2026-02-21T09:11:28.2781506Z mov.u32 %r120, %ctaid.x; 2026-02-21T09:11:28.2781677Z max.u32 %r121, %r120, 511; 2026-02-21T09:11:28.2781853Z shl.b32 %r122, %r121, 7; 2026-02-21T09:11:28.2782014Z sub.s32 %r5, 65536, %r122; 2026-02-21T09:11:28.2782196Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:11:28.2782367Z bra.uni $L__BB0_2; 2026-02-21T09:11:28.2782586Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2782967Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2783331Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2783546Z barrier.sync 1; 2026-02-21T09:11:28.2783700Z barrier.sync 1; 2026-02-21T09:11:28.2783883Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2784113Z $L__BB0_2: // %.preheader 2026-02-21T09:11:28.2784375Z // =>This Loop Header: Depth=1 2026-02-21T09:11:28.2784636Z // Child Loop BB0_11 Depth 2 2026-02-21T09:11:28.2784969Z // Child Loop BB0_7 Depth 2 2026-02-21T09:11:28.2785314Z .loc 1 19 0 // 
cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19 2026-02-21T09:11:28.2785674Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:11:28.2785897Z barrier.sync 1; 2026-02-21T09:11:28.2786073Z ld.shared.b8 %r70, [%r72+73808]; 2026-02-21T09:11:28.2786292Z setp.gt.u32 %p4, %r70, 3; 2026-02-21T09:11:28.2786481Z @%p4 bra $L__BB0_4; 2026-02-21T09:11:28.2786687Z // %bb.3: // %.preheader 2026-02-21T09:11:28.2786957Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2787282Z $L_brx_0: .branchtargets 2026-02-21T09:11:28.2787464Z $L__BB0_5, 2026-02-21T09:11:28.2787616Z $L__BB0_9, 2026-02-21T09:11:28.2787770Z $L__BB0_15, 2026-02-21T09:11:28.2787918Z $L__BB0_24; 2026-02-21T09:11:28.2788081Z brx.idx %r70, $L_brx_0; 2026-02-21T09:11:28.2788313Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2788737Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2789128Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2789378Z ld.shared.b32 %r127, [global_smem+32768]; 2026-02-21T09:11:28.2789600Z barrier.sync 1; 2026-02-21T09:11:28.2789827Z @%p17 bra $L__BB0_8; 2026-02-21T09:11:28.2790027Z // %bb.6: // %.lr.ph9 2026-02-21T09:11:28.2790289Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2790692Z .loc 1 0 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0:106 2026-02-21T09:11:28.2791050Z mov.b32 %r501, -1; 2026-02-21T09:11:28.2791230Z mov.pred %p124, 0; 2026-02-21T09:11:28.2791440Z mov.b32 %r498, 0; 2026-02-21T09:11:28.2791616Z mov.b32 %r499, %r498; 2026-02-21T09:11:28.2791795Z mov.b32 %r500, %r498; 2026-02-21T09:11:28.2791974Z mov.b32 %r502, %r498; 2026-02-21T09:11:28.2792205Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:11:28.2792504Z // => This Inner Loop Header: Depth=2 2026-02-21T09:11:28.2792979Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2793353Z add.s32 %r133, %r501, 1; 2026-02-21T09:11:28.2793552Z setp.eq.b32 %p26, 
%r501, 127; 2026-02-21T09:11:28.2793759Z selp.b32 %r501, 0, %r133, %p26; 2026-02-21T09:11:28.2793968Z shl.b32 %r134, %r500, 3; 2026-02-21T09:11:28.2794159Z add.s32 %r136, %r71, %r134; 2026-02-21T09:11:28.2794347Z add.s32 %r137, %r136, 73728; 2026-02-21T09:11:28.2794544Z add.s32 %r125, %r136, 73760; 2026-02-21T09:11:28.2794924Z .loc 1 54 31 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:54:31 2026-02-21T09:11:28.2795305Z shl.b32 %r138, %r500, 13; 2026-02-21T09:11:28.2795496Z add.s32 %r139, %r71, %r138; 2026-02-21T09:11:28.2795833Z .loc 1 55 44 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:55:44 2026-02-21T09:11:28.2796160Z shl.b32 %r140, %r500, 11; 2026-02-21T09:11:28.2796339Z add.s32 %r141, %r71, %r140; 2026-02-21T09:11:28.2796520Z add.s32 %r142, %r141, 65536; 2026-02-21T09:11:28.2796817Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2797158Z bar.warp.sync -1; 2026-02-21T09:11:28.2797315Z // begin inline asm 2026-02-21T09:11:28.2797472Z 2026-02-21T09:11:28.2797596Z { 2026-02-21T09:11:28.2797738Z .reg .pred complete; 2026-02-21T09:11:28.2797912Z waitLoop: 2026-02-21T09:11:28.2798148Z mbarrier.try_wait.parity.shared.b64 complete, [%r125], %r499; 2026-02-21T09:11:28.2798442Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.2798631Z } 2026-02-21T09:11:28.2798712Z 2026-02-21T09:11:28.2798787Z // end inline asm 2026-02-21T09:11:28.2799105Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2799484Z setp.eq.b32 %p25, %r501, 127; 2026-02-21T09:11:28.2799823Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2800192Z elect.sync %r143|%p20, -1; 2026-02-21T09:11:28.2800393Z bfe.u32 %r144, %r139, 4, 14; 2026-02-21T09:11:28.2800591Z cvt.u64.u32 %rd20, %r144; 2026-02-21T09:11:28.2800804Z or.b64 %rd14, %rd20, -4611685949674356736; 2026-02-21T09:11:28.2801027Z bfe.u32 %r145, %r142, 4, 14; 
2026-02-21T09:11:28.2801226Z cvt.u64.u32 %rd21, %r145; 2026-02-21T09:11:28.2801493Z or.b64 %rd15, %rd21, -4611685949699522560; 2026-02-21T09:11:28.2801723Z mov.b32 %r128, 135266320; 2026-02-21T09:11:28.2801903Z // begin inline asm 2026-02-21T09:11:28.2802195Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r127 + 0 ], %rd14, %rd15, %r128, %p124; 2026-02-21T09:11:28.2802515Z // end inline asm 2026-02-21T09:11:28.2802686Z add.s32 %r146, %r139, 4096; 2026-02-21T09:11:28.2802886Z bfe.u32 %r147, %r146, 4, 14; 2026-02-21T09:11:28.2803072Z cvt.u64.u32 %rd22, %r147; 2026-02-21T09:11:28.2803277Z or.b64 %rd16, %rd22, -4611685949674356736; 2026-02-21T09:11:28.2803493Z // begin inline asm 2026-02-21T09:11:28.2803780Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r127 + 64 ], %rd16, %rd15, %r128, %p124; 2026-02-21T09:11:28.2804150Z // end inline asm 2026-02-21T09:11:28.2804325Z cvt.u64.u32 %rd18, %r137; 2026-02-21T09:11:28.2804503Z // begin inline asm 2026-02-21T09:11:28.2804835Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd18]; 2026-02-21T09:11:28.2805164Z // end inline asm 2026-02-21T09:11:28.2805352Z and.pred %p24, %p25, %p20; 2026-02-21T09:11:28.2805572Z add.s32 %r148, %r71, 73792; 2026-02-21T09:11:28.2805771Z cvt.u64.u32 %rd19, %r148; 2026-02-21T09:11:28.2806007Z // begin inline asm 2026-02-21T09:11:28.2806262Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd19]; 2026-02-21T09:11:28.2806552Z // end inline asm 2026-02-21T09:11:28.2806847Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2807218Z setp.ne.b32 %p124, %r501, 127; 2026-02-21T09:11:28.2819188Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2819606Z selp.b32 %r149, 1, 0, %p25; 2026-02-21T09:11:28.2819829Z xor.b32 %r498, %r498, %r149; 2026-02-21T09:11:28.2820026Z add.s32 %r131, %r71, 73808; 2026-02-21T09:11:28.2820230Z // begin inline asm 2026-02-21T09:11:28.2820408Z 
2026-02-21T09:11:28.2820560Z { 2026-02-21T09:11:28.2820730Z @!%p25 bra.uni skipWait; 2026-02-21T09:11:28.2820925Z .reg .pred complete; 2026-02-21T09:11:28.2821112Z waitLoop: 2026-02-21T09:11:28.2821355Z mbarrier.try_wait.parity.shared.b64 complete, [%r131], %r498; 2026-02-21T09:11:28.2821665Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.2821859Z skipWait: 2026-02-21T09:11:28.2822013Z } 2026-02-21T09:11:28.2822093Z 2026-02-21T09:11:28.2822164Z // end inline asm 2026-02-21T09:11:28.2822483Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2822849Z add.s32 %r150, %r500, 1; 2026-02-21T09:11:28.2823042Z setp.eq.b32 %p27, %r150, 4; 2026-02-21T09:11:28.2823256Z selp.b32 %r500, 0, %r150, %p27; 2026-02-21T09:11:28.2823466Z selp.b32 %r151, 1, 0, %p27; 2026-02-21T09:11:28.2823667Z xor.b32 %r499, %r499, %r151; 2026-02-21T09:11:28.2824013Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2824393Z add.s32 %r502, %r502, 1; 2026-02-21T09:11:28.2824594Z setp.lt.s32 %p28, %r502, %r5; 2026-02-21T09:11:28.2824858Z @%p28 bra $L__BB0_7; 2026-02-21T09:11:28.2825104Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:11:28.2825418Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2825710Z barrier.sync 1; 2026-02-21T09:11:28.2825925Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2826158Z bra.uni $L__BB0_2; 2026-02-21T09:11:28.2826394Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2826850Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2827284Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2827524Z barrier.sync 1; 2026-02-21T09:11:28.2827867Z .loc 1 21 67 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:21:67 2026-02-21T09:11:28.2828259Z mov.u32 %r73, %ctaid.y; 2026-02-21T09:11:28.2828556Z mov.u32 %r74, %ctaid.z; 2026-02-21T09:11:28.2828750Z mov.u32 %r75, %nctaid.x; 2026-02-21T09:11:28.2828954Z 
mov.u32 %r76, %nctaid.y; 2026-02-21T09:11:28.2829157Z mad.lo.s32 %r77, %r74, %r76, %r73; 2026-02-21T09:11:28.2829401Z mad.lo.s32 %r78, %r77, %r75, %r120; 2026-02-21T09:11:28.2829646Z mul.lo.s32 %r79, %r78, 384; 2026-02-21T09:11:28.2829951Z .loc 1 22 67 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:22:67 2026-02-21T09:11:28.2830291Z add.s32 %r80, %r79, 128; 2026-02-21T09:11:28.2830462Z cvt.s64.s32 %rd8, %r80; 2026-02-21T09:11:28.2830646Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:11:28.2830833Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:11:28.2831224Z .loc 1 21 67 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:21:67 2026-02-21T09:11:28.2831569Z cvt.s64.s32 %rd10, %r79; 2026-02-21T09:11:28.2831754Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:11:28.2831947Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:11:28.2832276Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2832628Z @%p17 bra $L__BB0_14; 2026-02-21T09:11:28.2832869Z // %bb.10: // %.lr.ph 2026-02-21T09:11:28.2833137Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2833370Z add.s32 %r513, %r120, -1; 2026-02-21T09:11:28.2833562Z add.s32 %r19, %r1, -256; 2026-02-21T09:11:28.2833733Z mov.b32 %r509, -1; 2026-02-21T09:11:28.2833901Z mov.b32 %r503, 0; 2026-02-21T09:11:28.2834068Z mov.b32 %r504, %r503; 2026-02-21T09:11:28.2834230Z mov.b32 %r512, %r503; 2026-02-21T09:11:28.2834448Z mov.b32 %r511, %r503; 2026-02-21T09:11:28.2834624Z mov.b32 %r507, %r503; 2026-02-21T09:11:28.2834858Z mov.b32 %r510, %r503; 2026-02-21T09:11:28.2835032Z bra.uni $L__BB0_11; 2026-02-21T09:11:28.2835278Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:11:28.2835733Z .loc 1 0 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0:106 2026-02-21T09:11:28.2836143Z selp.b32 %r103, 0, %r507, %p8; 2026-02-21T09:11:28.2836344Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:11:28.2836533Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:11:28.2836855Z .loc 1 33 106 // 
cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2837192Z shl.b32 %r110, %r504, 3; 2026-02-21T09:11:28.2837369Z add.s32 %r112, %r71, %r110; 2026-02-21T09:11:28.2837544Z add.s32 %r99, %r112, 73728; 2026-02-21T09:11:28.2837851Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2838182Z // begin inline asm 2026-02-21T09:11:28.2838335Z 2026-02-21T09:11:28.2838464Z { 2026-02-21T09:11:28.2838602Z .reg .pred complete; 2026-02-21T09:11:28.2838775Z waitLoop: 2026-02-21T09:11:28.2838985Z mbarrier.try_wait.parity.shared.b64 complete, [%r99], %r503; 2026-02-21T09:11:28.2839262Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.2839435Z } 2026-02-21T09:11:28.2839516Z 2026-02-21T09:11:28.2839578Z // end inline asm 2026-02-21T09:11:28.2839872Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2840220Z add.s32 %r105, %r112, 73760; 2026-02-21T09:11:28.2840527Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2840842Z bar.sync 3, 64; 2026-02-21T09:11:28.2841005Z // begin inline asm 2026-02-21T09:11:28.2841221Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r105], 10240; 2026-02-21T09:11:28.2841480Z // end inline asm 2026-02-21T09:11:28.2841768Z .loc 1 54 31 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:54:31 2026-02-21T09:11:28.2842111Z shl.b32 %r113, %r504, 13; 2026-02-21T09:11:28.2842293Z add.s32 %r102, %r71, %r113; 2026-02-21T09:11:28.2842587Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2842985Z bar.sync 3, 64; 2026-02-21T09:11:28.2843150Z elect.sync %r114|%p13, -1; 2026-02-21T09:11:28.2843351Z and.pred %p10, %p12, %p13; 2026-02-21T09:11:28.2843531Z // begin inline asm 2026-02-21T09:11:28.2843938Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r102], [%rd12, {%r103, %r511}], [%r105]; 2026-02-21T09:11:28.2844364Z // end 
inline asm 2026-02-21T09:11:28.2844665Z .loc 1 55 44 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:55:44 2026-02-21T09:11:28.2845053Z shl.b32 %r115, %r504, 11; 2026-02-21T09:11:28.2845233Z add.s32 %r116, %r71, %r115; 2026-02-21T09:11:28.2845493Z add.s32 %r106, %r116, 65536; 2026-02-21T09:11:28.2845819Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2846173Z bar.sync 3, 64; 2026-02-21T09:11:28.2846344Z elect.sync %r117|%p14, -1; 2026-02-21T09:11:28.2846553Z and.pred %p11, %p12, %p14; 2026-02-21T09:11:28.2846745Z // begin inline asm 2026-02-21T09:11:28.2847228Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r106], [%rd13, {%r103, %r512}], [%r105]; 2026-02-21T09:11:28.2847699Z // end inline asm 2026-02-21T09:11:28.2848016Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2848418Z add.s32 %r507, %r103, 16; 2026-02-21T09:11:28.2848763Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2849151Z add.s32 %r118, %r504, 1; 2026-02-21T09:11:28.2849411Z setp.eq.b32 %p15, %r118, 4; 2026-02-21T09:11:28.2849632Z selp.b32 %r504, 0, %r118, %p15; 2026-02-21T09:11:28.2849865Z selp.b32 %r119, 1, 0, %p15; 2026-02-21T09:11:28.2850070Z xor.b32 %r503, %r503, %r119; 2026-02-21T09:11:28.2850458Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2850825Z add.s32 %r510, %r510, 1; 2026-02-21T09:11:28.2851028Z setp.lt.s32 %p16, %r510, %r5; 2026-02-21T09:11:28.2851228Z @%p16 bra $L__BB0_11; 2026-02-21T09:11:28.2851419Z bra.uni $L__BB0_14; 2026-02-21T09:11:28.2851657Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:11:28.2851973Z // => This Inner Loop Header: Depth=2 2026-02-21T09:11:28.2852251Z add.s32 %r85, %r509, 1; 2026-02-21T09:11:28.2852440Z setp.eq.b32 %p6, %r509, 127; 2026-02-21T09:11:28.2852651Z selp.b32 %r509, 0, %r85, %p6; 
2026-02-21T09:11:28.2852849Z setp.ne.b32 %p7, %r509, 0; 2026-02-21T09:11:28.2853049Z setp.eq.b32 %p8, %r509, 0; 2026-02-21T09:11:28.2853241Z @%p7 bra $L__BB0_13; 2026-02-21T09:11:28.2853479Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:11:28.2853751Z add.s32 %r513, %r513, 1; 2026-02-21T09:11:28.2854080Z .loc 1 39 35 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:39:35 2026-02-21T09:11:28.2854451Z shr.s32 %r86, %r513, 31; 2026-02-21T09:11:28.2854634Z shr.u32 %r87, %r86, 24; 2026-02-21T09:11:28.2854878Z add.s32 %r88, %r513, %r87; 2026-02-21T09:11:28.2855065Z shr.s32 %r89, %r88, 8; 2026-02-21T09:11:28.2855403Z .loc 1 40 33 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:40:33 2026-02-21T09:11:28.2855783Z shl.b32 %r90, %r89, 2; 2026-02-21T09:11:28.2856115Z .loc 1 41 39 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:41:39 2026-02-21T09:11:28.2856483Z sub.s32 %r91, 8, %r90; 2026-02-21T09:11:28.2856806Z .loc 1 41 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:41:52 2026-02-21T09:11:28.2857172Z min.s32 %r92, %r91, 4; 2026-02-21T09:11:28.2857494Z .loc 1 42 45 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:45 2026-02-21T09:11:28.2857929Z and.b32 %r93, %r88, -256; 2026-02-21T09:11:28.2858115Z sub.s32 %r94, %r513, %r93; 2026-02-21T09:11:28.2858452Z .loc 1 43 51 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:43:51 2026-02-21T09:11:28.2858820Z div.s32 %r95, %r94, %r92; 2026-02-21T09:11:28.2859149Z .loc 1 42 64 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:64 2026-02-21T09:11:28.2859523Z mul.lo.s32 %r96, %r95, %r92; 2026-02-21T09:11:28.2859717Z sub.s32 %r97, %r94, %r96; 2026-02-21T09:11:28.2860050Z .loc 1 42 30 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:42:30 2026-02-21T09:11:28.2860412Z add.s32 %r98, %r97, %r90; 2026-02-21T09:11:28.2860738Z .loc 1 44 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:44:27 2026-02-21T09:11:28.2861138Z shl.b32 
%r511, %r98, 8; 2026-02-21T09:11:28.2861441Z .loc 1 45 27 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:45:27 2026-02-21T09:11:28.2861783Z shl.b32 %r512, %r95, 6; 2026-02-21T09:11:28.2861952Z bra.uni $L__BB0_13; 2026-02-21T09:11:28.2862148Z $L__BB0_14: // %._crit_edge 2026-02-21T09:11:28.2862297Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2862501Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2862576Z barrier.sync 1; 2026-02-21T09:11:28.2862669Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:11:28.2862735Z bra.uni $L__BB0_2; 2026-02-21T09:11:28.2862851Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:11:28.2863086Z .loc 1 19 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:19 2026-02-21T09:11:28.2863160Z barrier.sync 1; 2026-02-21T09:11:28.2863225Z barrier.sync 1; 2026-02-21T09:11:28.2863287Z bra.uni $L__BB0_2; 2026-02-21T09:11:28.2863380Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:11:28.2863587Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 2026-02-21T09:11:28.2863670Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:11:28.2863730Z bar.sync 0, 256; 2026-02-21T09:11:28.2863800Z barrier.sync 1; 2026-02-21T09:11:28.2863885Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:11:28.2864080Z .loc 1 56 52 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:56:52 2026-02-21T09:11:28.2864149Z // begin inline asm 2026-02-21T09:11:28.2864202Z 2026-02-21T09:11:28.2864254Z { 2026-02-21T09:11:28.2864321Z .reg .pred complete; 2026-02-21T09:11:28.2864392Z waitLoop: 2026-02-21T09:11:28.2864534Z mbarrier.try_wait.parity.shared.b64 complete, [%r484], %r524; 2026-02-21T09:11:28.2864607Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.2864713Z } 2026-02-21T09:11:28.2864720Z 2026-02-21T09:11:28.2864783Z // end inline asm 2026-02-21T09:11:28.2864983Z .loc 1 33 106 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:106 
2026-02-21T09:11:28.2865050Z bar.sync 0, 256; 2026-02-21T09:11:28.2865119Z // begin inline asm 2026-02-21T09:11:28.2865222Z @%p113 mbarrier.inval.shared::cta.b64 [%r484]; 2026-02-21T09:11:28.2865282Z // end inline asm 2026-02-21T09:11:28.2865351Z // begin inline asm 2026-02-21T09:11:28.2865446Z @%p113 mbarrier.inval.shared::cta.b64 [%r257]; 2026-02-21T09:11:28.2865507Z // end inline asm 2026-02-21T09:11:28.2865578Z // begin inline asm 2026-02-21T09:11:28.2865672Z @%p113 mbarrier.inval.shared::cta.b64 [%r249]; 2026-02-21T09:11:28.2865735Z // end inline asm 2026-02-21T09:11:28.2865796Z bar.sync 0, 256; 2026-02-21T09:11:28.2865866Z // begin inline asm 2026-02-21T09:11:28.2865956Z @%p113 mbarrier.inval.shared::cta.b64 [%r250]; 2026-02-21T09:11:28.2866014Z // end inline asm 2026-02-21T09:11:28.2866080Z bar.sync 0, 256; 2026-02-21T09:11:28.2866142Z // begin inline asm 2026-02-21T09:11:28.2866284Z @%p113 mbarrier.inval.shared::cta.b64 [%r251]; 2026-02-21T09:11:28.2866346Z // end inline asm 2026-02-21T09:11:28.2866421Z bar.sync 0, 256; 2026-02-21T09:11:28.2866484Z // begin inline asm 2026-02-21T09:11:28.2866579Z @%p113 mbarrier.inval.shared::cta.b64 [%r252]; 2026-02-21T09:11:28.2866650Z // end inline asm 2026-02-21T09:11:28.2866714Z // begin inline asm 2026-02-21T09:11:28.2866808Z @%p113 mbarrier.inval.shared::cta.b64 [%r245]; 2026-02-21T09:11:28.2866873Z // end inline asm 2026-02-21T09:11:28.2866945Z bar.sync 0, 256; 2026-02-21T09:11:28.2867008Z // begin inline asm 2026-02-21T09:11:28.2867101Z @%p113 mbarrier.inval.shared::cta.b64 [%r246]; 2026-02-21T09:11:28.2867173Z // end inline asm 2026-02-21T09:11:28.2867299Z bar.sync 0, 256; 2026-02-21T09:11:28.2867364Z // begin inline asm 2026-02-21T09:11:28.2867468Z @%p113 mbarrier.inval.shared::cta.b64 [%r247]; 2026-02-21T09:11:28.2867530Z // end inline asm 2026-02-21T09:11:28.2867591Z bar.sync 0, 256; 2026-02-21T09:11:28.2867657Z // begin inline asm 2026-02-21T09:11:28.2867760Z @%p113 mbarrier.inval.shared::cta.b64 [%r248]; 
2026-02-21T09:11:28.2867822Z // end inline asm 2026-02-21T09:11:28.2868069Z .loc 1 33 4 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:33:4 2026-02-21T09:11:28.2868142Z bar.sync 0, 256; 2026-02-21T09:11:28.2868210Z // begin inline asm 2026-02-21T09:11:28.2868355Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r496, 128; 2026-02-21T09:11:28.2868427Z // end inline asm 2026-02-21T09:11:28.2868520Z st.shared.b32 [global_smem+73816], 50529027; 2026-02-21T09:11:28.2868583Z barrier.sync 1; 2026-02-21T09:11:28.2868720Z $L__BB0_24: // %common.ret 2026-02-21T09:11:28.2868932Z .loc 1 0 0 // cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py:0 2026-02-21T09:11:28.2869003Z ret; 2026-02-21T09:11:28.2869078Z $L__tmp1: 2026-02-21T09:11:28.2869150Z $L__func_end0: 2026-02-21T09:11:28.2869244Z // -- End function 2026-02-21T09:11:28.2869302Z } 2026-02-21T09:11:28.2869567Z .file 1 "/tmp/torchinductor_root/ve/cveo7q3btgxnl6kdwwq5ewrgt4pjdnnz5n7krxcr5hnx2irfx7xs.py" 2026-02-21T09:11:28.2869640Z .section .debug_abbrev 2026-02-21T09:11:28.2869697Z { 2026-02-21T09:11:28.2869806Z .b8 1 // Abbreviation Code 2026-02-21T09:11:28.2869909Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:11:28.2870005Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:11:28.2870102Z .b8 37 // DW_AT_producer 2026-02-21T09:11:28.2870194Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.2870281Z .b8 19 // DW_AT_language 2026-02-21T09:11:28.2870373Z .b8 5 // DW_FORM_data2 2026-02-21T09:11:28.2870469Z .b8 3 // DW_AT_name 2026-02-21T09:11:28.2870555Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.2870647Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:11:28.2870743Z .b8 6 // DW_FORM_data4 2026-02-21T09:11:28.2870831Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:11:28.2870918Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.2871008Z .b8 0 // EOM(1) 2026-02-21T09:11:28.2871090Z .b8 0 // EOM(2) 2026-02-21T09:11:28.2871167Z .b8 0 // EOM(3) 2026-02-21T09:11:28.2871224Z } 2026-02-21T09:11:28.2871301Z .section .debug_info 
2026-02-21T09:11:28.2871359Z { 2026-02-21T09:11:28.2871459Z .b32 104 // Length of Unit 2026-02-21T09:11:28.2871565Z .b8 2 // DWARF version number 2026-02-21T09:11:28.2871622Z .b8 0 2026-02-21T09:11:28.2871761Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:11:28.2871897Z .b8 8 // Address Size (in bytes) 2026-02-21T09:11:28.2872025Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:11:28.2872121Z .b8 116 // DW_AT_producer 2026-02-21T09:11:28.2872184Z .b8 114 2026-02-21T09:11:28.2872251Z .b8 105 2026-02-21T09:11:28.2872309Z .b8 116 2026-02-21T09:11:28.2872365Z .b8 111 2026-02-21T09:11:28.2872428Z .b8 110 2026-02-21T09:11:28.2872484Z .b8 0 2026-02-21T09:11:28.2872572Z .b8 2 // DW_AT_language 2026-02-21T09:11:28.2872628Z .b8 0 2026-02-21T09:11:28.2872725Z .b8 99 // DW_AT_name 2026-02-21T09:11:28.2872815Z .b8 118 2026-02-21T09:11:28.2872873Z .b8 101 2026-02-21T09:11:28.2872939Z .b8 111 2026-02-21T09:11:28.2872998Z .b8 55 2026-02-21T09:11:28.2873055Z .b8 113 2026-02-21T09:11:28.2873111Z .b8 51 2026-02-21T09:11:28.2873176Z .b8 98 2026-02-21T09:11:28.2873240Z .b8 116 2026-02-21T09:11:28.2873297Z .b8 103 2026-02-21T09:11:28.2873355Z .b8 120 2026-02-21T09:11:28.2873420Z .b8 110 2026-02-21T09:11:28.2873476Z .b8 108 2026-02-21T09:11:28.2873532Z .b8 54 2026-02-21T09:11:28.2873597Z .b8 107 2026-02-21T09:11:28.2873685Z .b8 100 2026-02-21T09:11:28.2873746Z .b8 119 2026-02-21T09:11:28.2873803Z .b8 119 2026-02-21T09:11:28.2873869Z .b8 113 2026-02-21T09:11:28.2873924Z .b8 53 2026-02-21T09:11:28.2873981Z .b8 101 2026-02-21T09:11:28.2874046Z .b8 119 2026-02-21T09:11:28.2874105Z .b8 114 2026-02-21T09:11:28.2874162Z .b8 103 2026-02-21T09:11:28.2874218Z .b8 116 2026-02-21T09:11:28.2874282Z .b8 52 2026-02-21T09:11:28.2874338Z .b8 112 2026-02-21T09:11:28.2874429Z .b8 106 2026-02-21T09:11:28.2874496Z .b8 100 2026-02-21T09:11:28.2874557Z .b8 110 2026-02-21T09:11:28.2874618Z .b8 110 2026-02-21T09:11:28.2874728Z .b8 122 2026-02-21T09:11:28.2874797Z .b8 53 
2026-02-21T09:11:28.2874855Z .b8 110 2026-02-21T09:11:28.2874911Z .b8 55 2026-02-21T09:11:28.2874968Z .b8 107 2026-02-21T09:11:28.2875035Z .b8 114 2026-02-21T09:11:28.2875095Z .b8 120 2026-02-21T09:11:28.2875153Z .b8 99 2026-02-21T09:11:28.2875220Z .b8 114 2026-02-21T09:11:28.2875276Z .b8 53 2026-02-21T09:11:28.2875335Z .b8 104 2026-02-21T09:11:28.2875395Z .b8 110 2026-02-21T09:11:28.2875468Z .b8 120 2026-02-21T09:11:28.2875528Z .b8 50 2026-02-21T09:11:28.2875591Z .b8 105 2026-02-21T09:11:28.2875660Z .b8 114 2026-02-21T09:11:28.2875720Z .b8 102 2026-02-21T09:11:28.2875781Z .b8 120 2026-02-21T09:11:28.2875842Z .b8 55 2026-02-21T09:11:28.2875911Z .b8 120 2026-02-21T09:11:28.2875971Z .b8 115 2026-02-21T09:11:28.2876033Z .b8 46 2026-02-21T09:11:28.2876095Z .b8 112 2026-02-21T09:11:28.2876164Z .b8 121 2026-02-21T09:11:28.2876234Z .b8 0 2026-02-21T09:11:28.2876335Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:11:28.2876427Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:11:28.2876480Z .b8 116 2026-02-21T09:11:28.2876532Z .b8 109 2026-02-21T09:11:28.2876585Z .b8 112 2026-02-21T09:11:28.2876645Z .b8 47 2026-02-21T09:11:28.2876697Z .b8 116 2026-02-21T09:11:28.2876750Z .b8 111 2026-02-21T09:11:28.2876810Z .b8 114 2026-02-21T09:11:28.2876863Z .b8 99 2026-02-21T09:11:28.2876918Z .b8 104 2026-02-21T09:11:28.2876970Z .b8 105 2026-02-21T09:11:28.2877032Z .b8 110 2026-02-21T09:11:28.2877084Z .b8 100 2026-02-21T09:11:28.2877137Z .b8 117 2026-02-21T09:11:28.2877197Z .b8 99 2026-02-21T09:11:28.2877251Z .b8 116 2026-02-21T09:11:28.2877306Z .b8 111 2026-02-21T09:11:28.2877358Z .b8 114 2026-02-21T09:11:28.2877418Z .b8 95 2026-02-21T09:11:28.2877470Z .b8 114 2026-02-21T09:11:28.2877525Z .b8 111 2026-02-21T09:11:28.2877588Z .b8 111 2026-02-21T09:11:28.2877644Z .b8 116 2026-02-21T09:11:28.2877699Z .b8 47 2026-02-21T09:11:28.2877758Z .b8 118 2026-02-21T09:11:28.2877823Z .b8 101 2026-02-21T09:11:28.2877876Z .b8 0 2026-02-21T09:11:28.2877931Z } 2026-02-21T09:11:28.2878007Z .section 
.debug_macinfo { } 2026-02-21T09:11:28.2878019Z 2026-02-21T09:11:28.2878108Z ================================================================ 2026-02-21T09:11:28.2878282Z please share the reproducer above with Triton project. 2026-02-21T09:11:28.4254304Z [24s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:11:28.4256056Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:11:28.4256458Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:11:28.4256548Z `ptxas` stderr: 2026-02-21T09:11:28.4256558Z 2026-02-21T09:11:28.4256563Z 2026-02-21T09:11:28.4256567Z 2026-02-21T09:11:28.4256666Z ================================================================ 2026-02-21T09:11:28.4256764Z Internal Triton PTX codegen error 2026-02-21T09:11:28.4256856Z `ptxas` stderr: 2026-02-21T09:11:28.4257358Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 272 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:11:28.4257476Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:11:28.4257482Z 2026-02-21T09:11:28.4258031Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4db5tzdi.ptx -o /tmp/tmp4db5tzdi.ptx.o 2026-02-21T09:11:28.4258036Z 2026-02-21T09:11:28.4258112Z 2026-02-21T09:11:28.4258184Z // 2026-02-21T09:11:28.4258283Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:11:28.4258344Z // 2026-02-21T09:11:28.4258349Z 2026-02-21T09:11:28.4258415Z .version 8.7 2026-02-21T09:11:28.4258482Z .target sm_100a 2026-02-21T09:11:28.4258555Z .address_size 64 2026-02-21T09:11:28.4258562Z 2026-02-21T09:11:28.4258721Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:11:28.4258820Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:11:28.4258928Z // @_helion_matmul 2026-02-21T09:11:28.4259016Z .visible .entry _helion_matmul( 2026-02-21T09:11:28.4259151Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:11:28.4259279Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:11:28.4259400Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:11:28.4259520Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:11:28.4259647Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:11:28.4259716Z ) 2026-02-21T09:11:28.4259782Z .reqntid 128 2026-02-21T09:11:28.4259848Z .maxnreg 32 2026-02-21T09:11:28.4259913Z { 2026-02-21T09:11:28.4259989Z .reg .pred %p<124>; 2026-02-21T09:11:28.4260073Z .reg .b32 %r<328>; 2026-02-21T09:11:28.4260145Z .reg .b64 %rd<153>; 2026-02-21T09:11:28.4260400Z .loc 1 19 0 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:19:0 2026-02-21T09:11:28.4260470Z $L__func_begin0: 2026-02-21T09:11:28.4260701Z .loc 1 19 0 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:19:0 2026-02-21T09:11:28.4260706Z 
2026-02-21T09:11:28.4260777Z // %bb.0: 2026-02-21T09:11:28.4260881Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:11:28.4260944Z $L__tmp0: 2026-02-21T09:11:28.4261174Z .loc 1 19 0 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:19 2026-02-21T09:11:28.4261245Z mov.u32 %r1, %tid.x; 2026-02-21T09:11:28.4261354Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:11:28.4261434Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:11:28.4261541Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:11:28.4261618Z mov.b32 %r25, global_smem; 2026-02-21T09:11:28.4261758Z // begin inline asm 2026-02-21T09:11:28.4262381Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 272 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:11:28.4262495Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:11:28.4262501Z 2026-02-21T09:11:28.4263016Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp4db5tzdi.ptx -o /tmp/tmp4db5tzdi.ptx.o 2026-02-21T09:11:28.4263032Z 2026-02-21T09:11:28.4263193Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:11:28.4263399Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r25], 32; 2026-02-21T09:11:28.4263528Z // end inline asm 2026-02-21T09:11:28.4263632Z ld.param.b64 %rd58, [_helion_matmul_param_3]; 2026-02-21T09:11:28.4263695Z bar.sync 0; 2026-02-21T09:11:28.4263774Z ld.shared.b32 %r319, [global_smem]; 2026-02-21T09:11:28.4263845Z bar.sync 0; 2026-02-21T09:11:28.4263910Z // begin inline asm 2026-02-21T09:11:28.4264051Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:11:28.4264122Z // end inline asm 2026-02-21T09:11:28.4264371Z .loc 1 21 67 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:21:67 2026-02-21T09:11:28.4264443Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:11:28.4264512Z mov.u32 %r50, %ctaid.y; 2026-02-21T09:11:28.4264589Z mov.u32 %r51, %ctaid.z; 2026-02-21T09:11:28.4264658Z mov.u32 %r52, %nctaid.x; 2026-02-21T09:11:28.4264780Z mov.u32 %r53, %nctaid.y; 2026-02-21T09:11:28.4264866Z mad.lo.s32 %r54, %r51, %r53, %r50; 2026-02-21T09:11:28.4264984Z mad.lo.s32 %r55, %r54, %r52, %r3; 2026-02-21T09:11:28.4265057Z mul.lo.s32 %r56, %r55, 384; 2026-02-21T09:11:28.4265128Z cvt.s64.s32 %rd59, %r56; 2026-02-21T09:11:28.4265209Z add.s64 %rd19, %rd58, %rd59; 2026-02-21T09:11:28.4265275Z shl.b32 %r57, %r1, 2; 2026-02-21T09:11:28.4265339Z add.s32 %r26, %r25, %r57; 2026-02-21T09:11:28.4265407Z mov.b32 %r35, 0; 2026-02-21T09:11:28.4265469Z // begin inline asm 2026-02-21T09:11:28.4265550Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:11:28.4265618Z // end inline asm 2026-02-21T09:11:28.4265701Z bar.warp.sync -1; 2026-02-21T09:11:28.4265780Z setp.eq.b32 %p114, %r1, 0; 2026-02-21T09:11:28.4265852Z cvt.u64.u32 %rd4, %r25; 2026-02-21T09:11:28.4265925Z // begin inline asm 2026-02-21T09:11:28.4266146Z @%p114 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:11:28.4266211Z // end inline asm 2026-02-21T09:11:28.4266296Z // begin inline asm 2026-02-21T09:11:28.4266470Z @%p114 
tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4266533Z // end inline asm 2026-02-21T09:11:28.4266595Z mov.b32 %r28, 16; 2026-02-21T09:11:28.4266666Z // begin inline asm 2026-02-21T09:11:28.4266849Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r28; 2026-02-21T09:11:28.4266915Z // end inline asm 2026-02-21T09:11:28.4266983Z mov.b32 %r29, 256; 2026-02-21T09:11:28.4267045Z // begin inline asm 2026-02-21T09:11:28.4267230Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r29; 2026-02-21T09:11:28.4267297Z // end inline asm 2026-02-21T09:11:28.4267359Z mov.b32 %r30, 2048; 2026-02-21T09:11:28.4267423Z // begin inline asm 2026-02-21T09:11:28.4267622Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T09:11:28.4267690Z // end inline asm 2026-02-21T09:11:28.4267751Z // begin inline asm 2026-02-21T09:11:28.4267944Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r30; 2026-02-21T09:11:28.4268013Z // end inline asm 2026-02-21T09:11:28.4268076Z mov.b64 %rd12, 4096; 2026-02-21T09:11:28.4268137Z // begin inline asm 2026-02-21T09:11:28.4268347Z @%p114 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:11:28.4268495Z // end inline asm 2026-02-21T09:11:28.4268555Z mov.b32 %r32, 1; 2026-02-21T09:11:28.4268616Z // begin inline asm 2026-02-21T09:11:28.4268839Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:11:28.4268901Z // end inline asm 2026-02-21T09:11:28.4268962Z // begin inline asm 2026-02-21T09:11:28.4269175Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:11:28.4269234Z // end inline asm 2026-02-21T09:11:28.4269294Z // begin inline asm 2026-02-21T09:11:28.4269483Z @%p114 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 
+ 0 ], 0x6; 2026-02-21T09:11:28.4269548Z // end inline asm 2026-02-21T09:11:28.4269663Z // begin inline asm 2026-02-21T09:11:28.4269870Z @%p114 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4269939Z // end inline asm 2026-02-21T09:11:28.4270004Z // begin inline asm 2026-02-21T09:11:28.4270194Z @%p114 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4270264Z // end inline asm 2026-02-21T09:11:28.4270326Z // begin inline asm 2026-02-21T09:11:28.4270543Z @%p114 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4270618Z // end inline asm 2026-02-21T09:11:28.4270682Z // begin inline asm 2026-02-21T09:11:28.4271009Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:11:28.4271079Z // end inline asm 2026-02-21T09:11:28.4271142Z // begin inline asm 2026-02-21T09:11:28.4271325Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:11:28.4271412Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.4271504Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.4271568Z // end inline asm 2026-02-21T09:11:28.4271626Z bar.sync 0; 2026-02-21T09:11:28.4271706Z cvta.global.u64 %rd76, %rd19; 2026-02-21T09:11:28.4271923Z .loc 1 22 67 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:22:67 2026-02-21T09:11:28.4271989Z add.s32 %r58, %r56, 128; 2026-02-21T09:11:28.4272057Z cvt.s64.s32 %rd60, %r58; 2026-02-21T09:11:28.4272133Z add.s64 %rd37, %rd58, %rd60; 2026-02-21T09:11:28.4272194Z bar.sync 0; 2026-02-21T09:11:28.4272256Z // begin inline asm 2026-02-21T09:11:28.4272338Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:11:28.4272399Z // end inline asm 2026-02-21T09:11:28.4272465Z bar.warp.sync -1; 2026-02-21T09:11:28.4272537Z // begin inline asm 2026-02-21T09:11:28.4272732Z @%p114 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:11:28.4272795Z // end inline asm 2026-02-21T09:11:28.4272858Z // begin inline asm 2026-02-21T09:11:28.4273035Z @%p114 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4273100Z // end inline asm 2026-02-21T09:11:28.4273163Z // begin inline asm 2026-02-21T09:11:28.4273353Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r28; 2026-02-21T09:11:28.4273416Z // end inline asm 2026-02-21T09:11:28.4273479Z // begin inline asm 2026-02-21T09:11:28.4273670Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r28; 2026-02-21T09:11:28.4273731Z // end inline asm 2026-02-21T09:11:28.4273793Z // begin inline asm 2026-02-21T09:11:28.4273984Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r30; 2026-02-21T09:11:28.4274054Z // end inline asm 2026-02-21T09:11:28.4274116Z mov.b32 %r39, 4096; 2026-02-21T09:11:28.4274180Z // begin inline asm 2026-02-21T09:11:28.4274384Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r39; 2026-02-21T09:11:28.4274444Z // end inline asm 2026-02-21T09:11:28.4274505Z // begin inline asm 2026-02-21T09:11:28.4274765Z @%p114 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:11:28.4274875Z // end inline asm 2026-02-21T09:11:28.4274939Z // begin inline asm 2026-02-21T09:11:28.4275150Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:11:28.4275216Z // end inline asm 2026-02-21T09:11:28.4275278Z // begin inline asm 2026-02-21T09:11:28.4275480Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:11:28.4275548Z // end inline asm 2026-02-21T09:11:28.4275611Z // begin inline asm 2026-02-21T09:11:28.4275794Z @%p114 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:11:28.4275862Z // end inline asm 2026-02-21T09:11:28.4275964Z // begin inline asm 2026-02-21T09:11:28.4276171Z @%p114 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4276239Z // end inline asm 2026-02-21T09:11:28.4276301Z // begin inline asm 2026-02-21T09:11:28.4276490Z @%p114 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4276550Z // end inline asm 2026-02-21T09:11:28.4276619Z // begin inline asm 2026-02-21T09:11:28.4276828Z @%p114 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4276890Z // end inline asm 2026-02-21T09:11:28.4276964Z // begin inline asm 2026-02-21T09:11:28.4277288Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:11:28.4277350Z // end inline asm 2026-02-21T09:11:28.4277418Z // begin inline asm 2026-02-21T09:11:28.4277612Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:11:28.4277697Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.4277781Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.4277853Z // end inline asm 2026-02-21T09:11:28.4277915Z bar.sync 0; 2026-02-21T09:11:28.4277989Z cvta.global.u64 %rd77, %rd37; 2026-02-21T09:11:28.4278217Z .loc 1 24 71 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:24:71 2026-02-21T09:11:28.4278283Z add.s32 %r59, %r56, 256; 2026-02-21T09:11:28.4278350Z cvt.s64.s32 %rd61, %r59; 2026-02-21T09:11:28.4278427Z add.s64 %rd55, %rd58, %rd61; 2026-02-21T09:11:28.4278486Z bar.sync 0; 2026-02-21T09:11:28.4278551Z // begin inline asm 2026-02-21T09:11:28.4278624Z @%p1 st.shared.b32 [ %r26 + 0 ], %r35; 2026-02-21T09:11:28.4278694Z // end inline asm 2026-02-21T09:11:28.4278760Z bar.warp.sync -1; 2026-02-21T09:11:28.4278821Z // begin inline asm 
2026-02-21T09:11:28.4279029Z @%p114 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:11:28.4279090Z // end inline asm 2026-02-21T09:11:28.4279151Z // begin inline asm 2026-02-21T09:11:28.4279322Z @%p114 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4279393Z // end inline asm 2026-02-21T09:11:28.4279455Z // begin inline asm 2026-02-21T09:11:28.4279637Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r28; 2026-02-21T09:11:28.4279706Z // end inline asm 2026-02-21T09:11:28.4279768Z // begin inline asm 2026-02-21T09:11:28.4279955Z @%p114 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r29; 2026-02-21T09:11:28.4280023Z // end inline asm 2026-02-21T09:11:28.4280083Z // begin inline asm 2026-02-21T09:11:28.4280278Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r39; 2026-02-21T09:11:28.4280346Z // end inline asm 2026-02-21T09:11:28.4280408Z // begin inline asm 2026-02-21T09:11:28.4280603Z @%p114 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r30; 2026-02-21T09:11:28.4280664Z // end inline asm 2026-02-21T09:11:28.4280735Z mov.b64 %rd48, 8192; 2026-02-21T09:11:28.4280798Z // begin inline asm 2026-02-21T09:11:28.4281003Z @%p114 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd48; 2026-02-21T09:11:28.4281108Z // end inline asm 2026-02-21T09:11:28.4281170Z // begin inline asm 2026-02-21T09:11:28.4281380Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r32; 2026-02-21T09:11:28.4281449Z // end inline asm 2026-02-21T09:11:28.4281508Z // begin inline asm 2026-02-21T09:11:28.4281707Z @%p114 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r32; 2026-02-21T09:11:28.4281764Z // end inline asm 2026-02-21T09:11:28.4281834Z // begin inline asm 2026-02-21T09:11:28.4282013Z @%p114 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:11:28.4282075Z // end inline asm 2026-02-21T09:11:28.4282182Z // begin inline asm 2026-02-21T09:11:28.4282392Z @%p114 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4282450Z // end inline asm 2026-02-21T09:11:28.4282523Z // begin inline asm 2026-02-21T09:11:28.4282709Z @%p114 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:11:28.4282770Z // end inline asm 2026-02-21T09:11:28.4282834Z // begin inline asm 2026-02-21T09:11:28.4283047Z @%p114 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:11:28.4283109Z // end inline asm 2026-02-21T09:11:28.4283170Z // begin inline asm 2026-02-21T09:11:28.4283499Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:11:28.4283562Z // end inline asm 2026-02-21T09:11:28.4283622Z // begin inline asm 2026-02-21T09:11:28.4283831Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:11:28.4283916Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:11:28.4284002Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:11:28.4284077Z // end inline asm 2026-02-21T09:11:28.4284135Z bar.sync 0; 2026-02-21T09:11:28.4284209Z cvta.global.u64 %rd88, %rd55; 2026-02-21T09:11:28.4284425Z .loc 1 33 74 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:33:74 2026-02-21T09:11:28.4284504Z setp.gt.u32 %p57, %r3, 2047; 2026-02-21T09:11:28.4284570Z @%p57 bra $L__BB0_8; 2026-02-21T09:11:28.4284660Z // %bb.1: // %.lr.ph 2026-02-21T09:11:28.4284924Z .loc 1 0 74 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:0:74 2026-02-21T09:11:28.4284994Z shl.b32 %r129, %r1, 5; 2026-02-21T09:11:28.4285067Z and.b32 %r130, %r129, 3936; 2026-02-21T09:11:28.4285152Z bfe.s32 %r131, %r1, 2, 1; 2026-02-21T09:11:28.4285217Z and.b32 
%r132, %r131, 144; 2026-02-21T09:11:28.4285285Z or.b32 %r133, %r132, %r130; 2026-02-21T09:11:28.4285350Z add.s32 %r134, %r25, 32768; 2026-02-21T09:11:28.4285426Z xor.b32 %r135, %r133, 16; 2026-02-21T09:11:28.4285490Z shr.u32 %r136, %r1, 5; 2026-02-21T09:11:28.4285694Z .loc 1 40 33 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:40:33 2026-02-21T09:11:28.4285766Z shr.u32 %r137, %r3, 3; 2026-02-21T09:11:28.4285831Z and.b32 %r138, %r137, 252; 2026-02-21T09:11:28.4286037Z .loc 1 42 64 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:42:64 2026-02-21T09:11:28.4286100Z and.b32 %r139, %r3, 3; 2026-02-21T09:11:28.4286305Z .loc 1 42 30 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:42:30 2026-02-21T09:11:28.4286372Z or.b32 %r140, %r138, %r139; 2026-02-21T09:11:28.4286573Z .loc 1 44 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:44:27 2026-02-21T09:11:28.4286648Z shl.b32 %r266, %r140, 4; 2026-02-21T09:11:28.4286850Z .loc 1 45 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:45:27 2026-02-21T09:11:28.4286916Z shl.b32 %r141, %r3, 6; 2026-02-21T09:11:28.4286989Z and.b32 %r267, %r141, 1792; 2026-02-21T09:11:28.4287189Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4287334Z shfl.sync.idx.b32 %r8, %r136, 0, 31, -1; 2026-02-21T09:11:28.4287408Z shl.b32 %r142, %r8, 21; 2026-02-21T09:11:28.4287478Z and.b32 %r143, %r142, 6291456; 2026-02-21T09:11:28.4287544Z add.s32 %r265, %r143, %r319; 2026-02-21T09:11:28.4287614Z mov.pred %p58, -1; 2026-02-21T09:11:28.4287684Z // begin inline asm 2026-02-21T09:11:28.4287983Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r265 + 0], {%r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35}; 2026-02-21T09:11:28.4288046Z // end inline asm 2026-02-21T09:11:28.4288116Z // begin inline asm 2026-02-21T09:11:28.4288421Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r265 + 16], {%r35, %r35, %r35, %r35, %r35, 
%r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35, %r35}; 2026-02-21T09:11:28.4288528Z // end inline asm 2026-02-21T09:11:28.4288598Z // begin inline asm 2026-02-21T09:11:28.4288681Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:11:28.4288744Z // end inline asm 2026-02-21T09:11:28.4288804Z bar.sync 0; 2026-02-21T09:11:28.4289020Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4289125Z add.s32 %r321, %r25, 43040; 2026-02-21T09:11:28.4289190Z // begin inline asm 2026-02-21T09:11:28.4289301Z @%p114 mbarrier.init.shared::cta.b64 [%r321], 1; 2026-02-21T09:11:28.4289362Z // end inline asm 2026-02-21T09:11:28.4289420Z bar.sync 0; 2026-02-21T09:11:28.4289485Z add.s32 %r95, %r25, 43048; 2026-02-21T09:11:28.4289555Z // begin inline asm 2026-02-21T09:11:28.4289654Z @%p114 mbarrier.init.shared::cta.b64 [%r95], 1; 2026-02-21T09:11:28.4289776Z // end inline asm 2026-02-21T09:11:28.4289852Z add.s32 %r96, %r25, 43008; 2026-02-21T09:11:28.4289915Z // begin inline asm 2026-02-21T09:11:28.4290013Z @%p114 mbarrier.init.shared::cta.b64 [%r96], 1; 2026-02-21T09:11:28.4290085Z // end inline asm 2026-02-21T09:11:28.4290151Z bar.sync 0; 2026-02-21T09:11:28.4290223Z add.s32 %r97, %r25, 43016; 2026-02-21T09:11:28.4290289Z // begin inline asm 2026-02-21T09:11:28.4290396Z @%p114 mbarrier.init.shared::cta.b64 [%r97], 1; 2026-02-21T09:11:28.4290463Z // end inline asm 2026-02-21T09:11:28.4290529Z bar.sync 0; 2026-02-21T09:11:28.4290605Z add.s32 %r98, %r25, 43024; 2026-02-21T09:11:28.4290672Z // begin inline asm 2026-02-21T09:11:28.4290772Z @%p114 mbarrier.init.shared::cta.b64 [%r98], 1; 2026-02-21T09:11:28.4290840Z // end inline asm 2026-02-21T09:11:28.4290913Z bar.sync 0; 2026-02-21T09:11:28.4290985Z add.s32 %r162, %r25, 43032; 2026-02-21T09:11:28.4291051Z // begin inline asm 2026-02-21T09:11:28.4291169Z @%p114 mbarrier.init.shared::cta.b64 [%r162], 1; 2026-02-21T09:11:28.4291236Z // end inline asm 2026-02-21T09:11:28.4291300Z bar.sync 
0; 2026-02-21T09:11:28.4291366Z // begin inline asm 2026-02-21T09:11:28.4291517Z @%p114 mbarrier.arrive.expect_tx.shared.b64 _, [%r96], 8704; 2026-02-21T09:11:28.4291582Z // end inline asm 2026-02-21T09:11:28.4291811Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4291883Z // begin inline asm 2026-02-21T09:11:28.4291970Z fence.proxy.async.shared::cta; 2026-02-21T09:11:28.4292036Z // end inline asm 2026-02-21T09:11:28.4292107Z bar.sync 0; 2026-02-21T09:11:28.4292185Z elect.sync %r144|%p76, -1; 2026-02-21T09:11:28.4292262Z and.pred %p67, %p1, %p76; 2026-02-21T09:11:28.4292329Z // begin inline asm 2026-02-21T09:11:28.4292656Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r25], [%rd76, {%r35, %r267}], [%r96]; 2026-02-21T09:11:28.4292722Z // end inline asm 2026-02-21T09:11:28.4292947Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4293021Z bar.sync 0; 2026-02-21T09:11:28.4293098Z elect.sync %r145|%p77, -1; 2026-02-21T09:11:28.4293172Z and.pred %p68, %p1, %p77; 2026-02-21T09:11:28.4293249Z add.s32 %r105, %r25, 40960; 2026-02-21T09:11:28.4293352Z // begin inline asm 2026-02-21T09:11:28.4293669Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r105], [%rd77, {%r35, %r266}], [%r96]; 2026-02-21T09:11:28.4293742Z // end inline asm 2026-02-21T09:11:28.4293982Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4294049Z bar.sync 0; 2026-02-21T09:11:28.4294117Z // begin inline asm 2026-02-21T09:11:28.4294264Z @%p114 mbarrier.arrive.expect_tx.shared.b64 _, [%r97], 8704; 2026-02-21T09:11:28.4294329Z // end inline asm 2026-02-21T09:11:28.4294555Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4294662Z bar.sync 0; 2026-02-21T09:11:28.4294787Z elect.sync %r146|%p78, -1; 2026-02-21T09:11:28.4294863Z and.pred %p70, 
%p1, %p78; 2026-02-21T09:11:28.4294937Z add.s32 %r110, %r25, 8192; 2026-02-21T09:11:28.4295019Z // begin inline asm 2026-02-21T09:11:28.4295335Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r110], [%rd76, {%r28, %r267}], [%r97]; 2026-02-21T09:11:28.4295402Z // end inline asm 2026-02-21T09:11:28.4295673Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4295741Z bar.sync 0; 2026-02-21T09:11:28.4295817Z elect.sync %r147|%p79, -1; 2026-02-21T09:11:28.4295901Z and.pred %p71, %p1, %p79; 2026-02-21T09:11:28.4295973Z add.s32 %r114, %r25, 41472; 2026-02-21T09:11:28.4296043Z // begin inline asm 2026-02-21T09:11:28.4296393Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r114], [%rd77, {%r28, %r266}], [%r97]; 2026-02-21T09:11:28.4296472Z // end inline asm 2026-02-21T09:11:28.4296701Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4296766Z bar.sync 0; 2026-02-21T09:11:28.4296841Z // begin inline asm 2026-02-21T09:11:28.4296980Z @%p114 mbarrier.arrive.expect_tx.shared.b64 _, [%r98], 8704; 2026-02-21T09:11:28.4297049Z // end inline asm 2026-02-21T09:11:28.4297282Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4297349Z bar.sync 0; 2026-02-21T09:11:28.4297421Z elect.sync %r148|%p80, -1; 2026-02-21T09:11:28.4297497Z and.pred %p73, %p1, %p80; 2026-02-21T09:11:28.4297574Z add.s32 %r119, %r25, 16384; 2026-02-21T09:11:28.4297640Z mov.b32 %r120, 32; 2026-02-21T09:11:28.4297708Z // begin inline asm 2026-02-21T09:11:28.4298040Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r119], [%rd76, {%r120, %r267}], [%r98]; 2026-02-21T09:11:28.4298108Z // end inline asm 2026-02-21T09:11:28.4298330Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4298402Z bar.sync 0; 
2026-02-21T09:11:28.4298476Z elect.sync %r149|%p81, -1; 2026-02-21T09:11:28.4298552Z and.pred %p74, %p1, %p81; 2026-02-21T09:11:28.4298621Z add.s32 %r123, %r25, 41984; 2026-02-21T09:11:28.4298698Z // begin inline asm 2026-02-21T09:11:28.4299015Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r123], [%rd77, {%r120, %r266}], [%r98]; 2026-02-21T09:11:28.4299083Z // end inline asm 2026-02-21T09:11:28.4299310Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4299377Z bar.sync 0; 2026-02-21T09:11:28.4299444Z // begin inline asm 2026-02-21T09:11:28.4299511Z 2026-02-21T09:11:28.4299569Z { 2026-02-21T09:11:28.4299645Z .reg .pred complete; 2026-02-21T09:11:28.4299716Z waitLoop: 2026-02-21T09:11:28.4299876Z mbarrier.try_wait.parity.shared.b64 complete, [%r96], %r35; 2026-02-21T09:11:28.4299955Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.4300015Z } 2026-02-21T09:11:28.4300021Z 2026-02-21T09:11:28.4300095Z // end inline asm 2026-02-21T09:11:28.4300369Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4300444Z setp.ne.b32 %p82, %r8, 0; 2026-02-21T09:11:28.4300514Z @%p82 bra $L__BB0_3; 2026-02-21T09:11:28.4300584Z // %bb.2: 2026-02-21T09:11:28.4300811Z .loc 1 0 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:0:52 2026-02-21T09:11:28.4300881Z add.s32 %r155, %r25, 4096; 2026-02-21T09:11:28.4300963Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:11:28.4301037Z cvt.u64.u32 %rd73, %r156; 2026-02-21T09:11:28.4301127Z or.b64 %rd70, %rd73, -4611685949674356736; 2026-02-21T09:11:28.4301206Z bfe.u32 %r158, %r105, 4, 14; 2026-02-21T09:11:28.4301279Z cvt.u64.u32 %rd74, %r158; 2026-02-21T09:11:28.4301412Z or.b64 %rd69, %rd74, -4611685949705814016; 2026-02-21T09:11:28.4301484Z bfe.u32 %r159, %r25, 4, 14; 2026-02-21T09:11:28.4301565Z cvt.u64.u32 %rd75, %r159; 2026-02-21T09:11:28.4301650Z or.b64 %rd68, %rd75, -4611685949674356736; 
2026-02-21T09:11:28.4301877Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4301957Z elect.sync %r160|%p84, -1; 2026-02-21T09:11:28.4302047Z mov.b32 %r151, 134479888; 2026-02-21T09:11:28.4302112Z mov.pred %p83, 0; 2026-02-21T09:11:28.4302174Z // begin inline asm 2026-02-21T09:11:28.4302351Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r319 + 0 ], %rd68, %rd69, %r151, %p83; 2026-02-21T09:11:28.4302413Z // end inline asm 2026-02-21T09:11:28.4302475Z // begin inline asm 2026-02-21T09:11:28.4302646Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r319 + 16 ], %rd70, %rd69, %r151, %p83; 2026-02-21T09:11:28.4302738Z // end inline asm 2026-02-21T09:11:28.4302804Z add.s32 %r161, %r25, 43040; 2026-02-21T09:11:28.4302877Z cvt.u64.u32 %rd72, %r161; 2026-02-21T09:11:28.4302941Z // begin inline asm 2026-02-21T09:11:28.4303089Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd72]; 2026-02-21T09:11:28.4303151Z // end inline asm 2026-02-21T09:11:28.4303222Z $L__BB0_3: 2026-02-21T09:11:28.4303429Z .loc 1 0 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:0:52 2026-02-21T09:11:28.4303496Z add.s32 %r4, %r134, %r133; 2026-02-21T09:11:28.4303574Z add.s32 %r5, %r134, %r135; 2026-02-21T09:11:28.4303779Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4303839Z bar.sync 0; 2026-02-21T09:11:28.4303910Z // begin inline asm 2026-02-21T09:11:28.4304043Z @%p114 mbarrier.arrive.expect_tx.shared.b64 _, [%r162], 8704; 2026-02-21T09:11:28.4304103Z // end inline asm 2026-02-21T09:11:28.4304310Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4304379Z bar.sync 0; 2026-02-21T09:11:28.4304449Z elect.sync %r176|%p92, -1; 2026-02-21T09:11:28.4304523Z and.pred %p89, %p1, %p92; 2026-02-21T09:11:28.4304599Z add.s32 %r163, %r25, 24576; 2026-02-21T09:11:28.4304717Z mov.b32 %r164, 48; 2026-02-21T09:11:28.4304789Z // begin inline asm 
2026-02-21T09:11:28.4305117Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r163], [%rd76, {%r164, %r267}], [%r162]; 2026-02-21T09:11:28.4305189Z // end inline asm 2026-02-21T09:11:28.4305409Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4305473Z bar.sync 0; 2026-02-21T09:11:28.4305557Z elect.sync %r177|%p93, -1; 2026-02-21T09:11:28.4305632Z and.pred %p90, %p1, %p93; 2026-02-21T09:11:28.4305702Z add.s32 %r167, %r25, 42496; 2026-02-21T09:11:28.4305777Z // begin inline asm 2026-02-21T09:11:28.4306096Z @%p90 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r167], [%rd77, {%r164, %r266}], [%r162]; 2026-02-21T09:11:28.4306160Z // end inline asm 2026-02-21T09:11:28.4306229Z mov.b32 %r325, 1; 2026-02-21T09:11:28.4306291Z mov.b32 %r324, 3; 2026-02-21T09:11:28.4306348Z mov.b32 %r320, 0; 2026-02-21T09:11:28.4311144Z mov.b32 %r322, %r320; 2026-02-21T09:11:28.4311221Z mov.b32 %r323, %r320; 2026-02-21T09:11:28.4311283Z mov.b32 %r326, %r320; 2026-02-21T09:11:28.4311344Z mov.b32 %r327, %r320; 2026-02-21T09:11:28.4311421Z bra.uni $L__BB0_4; 2026-02-21T09:11:28.4311540Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:11:28.4311750Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4311823Z setp.lt.u32 %p104, %r327, 1984; 2026-02-21T09:11:28.4312044Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4312108Z // begin inline asm 2026-02-21T09:11:28.4312240Z 2026-02-21T09:11:28.4312304Z { 2026-02-21T09:11:28.4312372Z .reg .pred complete; 2026-02-21T09:11:28.4312442Z waitLoop: 2026-02-21T09:11:28.4312591Z mbarrier.try_wait.parity.shared.b64 complete, [%r321], %r320; 2026-02-21T09:11:28.4312668Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.4312724Z } 2026-02-21T09:11:28.4312729Z 2026-02-21T09:11:28.4312791Z // end inline asm 2026-02-21T09:11:28.4312865Z add.s32 
%r213, %r325, 1; 2026-02-21T09:11:28.4312982Z setp.gt.s32 %p107, %r213, 1; 2026-02-21T09:11:28.4313060Z selp.b32 %r325, 0, %r213, %p107; 2026-02-21T09:11:28.4313137Z selp.b32 %r214, 1, 0, %p107; 2026-02-21T09:11:28.4313202Z xor.b32 %r326, %r225, %r214; 2026-02-21T09:11:28.4313410Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4313484Z add.s32 %r215, %r324, 1; 2026-02-21T09:11:28.4313551Z setp.gt.s32 %p108, %r215, 3; 2026-02-21T09:11:28.4313676Z selp.b32 %r324, 0, %r215, %p108; 2026-02-21T09:11:28.4313744Z shl.b32 %r216, %r324, 3; 2026-02-21T09:11:28.4313815Z add.s32 %r218, %r25, %r216; 2026-02-21T09:11:28.4313875Z add.s32 %r208, %r218, 43008; 2026-02-21T09:11:28.4313932Z bar.sync 0; 2026-02-21T09:11:28.4314012Z and.pred %p101, %p114, %p104; 2026-02-21T09:11:28.4314074Z // begin inline asm 2026-02-21T09:11:28.4314201Z @%p101 mbarrier.arrive.expect_tx.shared.b64 _, [%r208], 8704; 2026-02-21T09:11:28.4314261Z // end inline asm 2026-02-21T09:11:28.4314469Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4314534Z shl.b32 %r219, %r324, 13; 2026-02-21T09:11:28.4314598Z add.s32 %r205, %r25, %r219; 2026-02-21T09:11:28.4314661Z bar.sync 0; 2026-02-21T09:11:28.4314795Z elect.sync %r220|%p109, -1; 2026-02-21T09:11:28.4314869Z and.pred %p110, %p104, %p109; 2026-02-21T09:11:28.4314940Z and.pred %p102, %p1, %p110; 2026-02-21T09:11:28.4315015Z add.s32 %r206, %r327, 64; 2026-02-21T09:11:28.4315084Z // begin inline asm 2026-02-21T09:11:28.4315420Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r205], [%rd76, {%r206, %r267}], [%r208]; 2026-02-21T09:11:28.4315494Z // end inline asm 2026-02-21T09:11:28.4315716Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4315789Z shl.b32 %r221, %r324, 9; 2026-02-21T09:11:28.4315866Z add.s32 %r222, %r25, %r221; 2026-02-21T09:11:28.4315938Z add.s32 %r209, %r222, 
40960; 2026-02-21T09:11:28.4316002Z bar.sync 0; 2026-02-21T09:11:28.4316078Z elect.sync %r223|%p111, -1; 2026-02-21T09:11:28.4316162Z and.pred %p112, %p104, %p111; 2026-02-21T09:11:28.4316236Z and.pred %p103, %p1, %p112; 2026-02-21T09:11:28.4316303Z // begin inline asm 2026-02-21T09:11:28.4316635Z @%p103 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r209], [%rd77, {%r206, %r266}], [%r208]; 2026-02-21T09:11:28.4316706Z // end inline asm 2026-02-21T09:11:28.4316929Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4317016Z setp.lt.u32 %p113, %r327, 2016; 2026-02-21T09:11:28.4317086Z add.s32 %r327, %r327, 16; 2026-02-21T09:11:28.4317194Z mov.b32 %r320, %r225; 2026-02-21T09:11:28.4317263Z mov.b32 %r321, %r224; 2026-02-21T09:11:28.4317344Z @%p113 bra $L__BB0_4; 2026-02-21T09:11:28.4317414Z bra.uni $L__BB0_7; 2026-02-21T09:11:28.4317546Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:11:28.4317776Z .loc 1 0 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:0:42 2026-02-21T09:11:28.4317842Z mov.b32 %r225, %r326; 2026-02-21T09:11:28.4318063Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4318141Z add.s32 %r180, %r323, 1; 2026-02-21T09:11:28.4318216Z setp.gt.s32 %p95, %r180, 3; 2026-02-21T09:11:28.4318353Z selp.b32 %r323, 0, %r180, %p95; 2026-02-21T09:11:28.4318427Z selp.b32 %r181, 1, 0, %p95; 2026-02-21T09:11:28.4318505Z xor.b32 %r322, %r322, %r181; 2026-02-21T09:11:28.4318573Z shl.b32 %r182, %r323, 3; 2026-02-21T09:11:28.4318642Z add.s32 %r184, %r25, %r182; 2026-02-21T09:11:28.4318722Z add.s32 %r178, %r184, 43008; 2026-02-21T09:11:28.4318787Z bar.sync 0; 2026-02-21T09:11:28.4318853Z // begin inline asm 2026-02-21T09:11:28.4318911Z 2026-02-21T09:11:28.4318979Z { 2026-02-21T09:11:28.4319112Z .reg .pred complete; 2026-02-21T09:11:28.4319181Z waitLoop: 2026-02-21T09:11:28.4319344Z mbarrier.try_wait.parity.shared.b64 
complete, [%r178], %r322; 2026-02-21T09:11:28.4319422Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.4319479Z } 2026-02-21T09:11:28.4319485Z 2026-02-21T09:11:28.4319561Z // end inline asm 2026-02-21T09:11:28.4319632Z shl.b32 %r185, %r325, 3; 2026-02-21T09:11:28.4319700Z add.s32 %r186, %r25, %r185; 2026-02-21T09:11:28.4319810Z add.s32 %r224, %r186, 43040; 2026-02-21T09:11:28.4320055Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4320124Z @%p82 bra $L__BB0_6; 2026-02-21T09:11:28.4320248Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:11:28.4320480Z .loc 1 54 31 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:54:31 2026-02-21T09:11:28.4320548Z shl.b32 %r191, %r323, 13; 2026-02-21T09:11:28.4320617Z add.s32 %r193, %r25, %r191; 2026-02-21T09:11:28.4320846Z .loc 1 55 44 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:55:44 2026-02-21T09:11:28.4320915Z shl.b32 %r194, %r323, 9; 2026-02-21T09:11:28.4320988Z add.s32 %r195, %r25, %r194; 2026-02-21T09:11:28.4321059Z add.s32 %r196, %r195, 40960; 2026-02-21T09:11:28.4321287Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4321364Z elect.sync %r197|%p97, -1; 2026-02-21T09:11:28.4321437Z bfe.u32 %r198, %r193, 4, 14; 2026-02-21T09:11:28.4321516Z cvt.u64.u32 %rd83, %r198; 2026-02-21T09:11:28.4321605Z or.b64 %rd78, %rd83, -4611685949674356736; 2026-02-21T09:11:28.4321675Z bfe.u32 %r199, %r196, 4, 14; 2026-02-21T09:11:28.4321748Z cvt.u64.u32 %rd84, %r199; 2026-02-21T09:11:28.4321842Z or.b64 %rd79, %rd84, -4611685949705814016; 2026-02-21T09:11:28.4321913Z mov.b32 %r188, 134479888; 2026-02-21T09:11:28.4321983Z mov.pred %p96, -1; 2026-02-21T09:11:28.4322062Z // begin inline asm 2026-02-21T09:11:28.4322246Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r319 + 0 ], %rd78, %rd79, %r188, %p96; 2026-02-21T09:11:28.4322314Z // end inline asm 2026-02-21T09:11:28.4322390Z add.s32 %r200, %r193, 4096; 
2026-02-21T09:11:28.4322461Z bfe.u32 %r201, %r200, 4, 14; 2026-02-21T09:11:28.4322532Z cvt.u64.u32 %rd85, %r201; 2026-02-21T09:11:28.4322614Z or.b64 %rd80, %rd85, -4611685949674356736; 2026-02-21T09:11:28.4322692Z // begin inline asm 2026-02-21T09:11:28.4322874Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r319 + 16 ], %rd80, %rd79, %r188, %p96; 2026-02-21T09:11:28.4322940Z // end inline asm 2026-02-21T09:11:28.4323018Z cvt.u64.u32 %rd82, %r224; 2026-02-21T09:11:28.4323087Z // begin inline asm 2026-02-21T09:11:28.4323244Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd82]; 2026-02-21T09:11:28.4323364Z // end inline asm 2026-02-21T09:11:28.4323432Z bra.uni $L__BB0_6; 2026-02-21T09:11:28.4323554Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:11:28.4323780Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4323857Z // begin inline asm 2026-02-21T09:11:28.4323914Z 2026-02-21T09:11:28.4323972Z { 2026-02-21T09:11:28.4324052Z .reg .pred complete; 2026-02-21T09:11:28.4324119Z waitLoop: 2026-02-21T09:11:28.4324265Z mbarrier.try_wait.parity.shared.b64 complete, [%r224], %r225; 2026-02-21T09:11:28.4324343Z @!complete bra.uni waitLoop; 2026-02-21T09:11:28.4324444Z } 2026-02-21T09:11:28.4324449Z 2026-02-21T09:11:28.4324515Z // end inline asm 2026-02-21T09:11:28.4324794Z .loc 1 50 42 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:50:42 2026-02-21T09:11:28.4324873Z bar.sync 0; 2026-02-21T09:11:28.4324941Z // begin inline asm 2026-02-21T09:11:28.4325046Z @%p114 mbarrier.inval.shared::cta.b64 [%r96]; 2026-02-21T09:11:28.4325119Z // end inline asm 2026-02-21T09:11:28.4325185Z bar.sync 0; 2026-02-21T09:11:28.4325290Z // begin inline asm 2026-02-21T09:11:28.4325392Z @%p114 mbarrier.inval.shared::cta.b64 [%r97]; 2026-02-21T09:11:28.4325465Z // end inline asm 2026-02-21T09:11:28.4325531Z bar.sync 0; 2026-02-21T09:11:28.4325597Z // begin inline asm 2026-02-21T09:11:28.4325701Z @%p114 
mbarrier.inval.shared::cta.b64 [%r98]; 2026-02-21T09:11:28.4325764Z // end inline asm 2026-02-21T09:11:28.4325830Z bar.sync 0; 2026-02-21T09:11:28.4325897Z // begin inline asm 2026-02-21T09:11:28.4326042Z @%p114 mbarrier.inval.shared::cta.b64 [%r162]; 2026-02-21T09:11:28.4326113Z // end inline asm 2026-02-21T09:11:28.4326189Z add.s32 %r230, %r25, 43040; 2026-02-21T09:11:28.4326266Z // begin inline asm 2026-02-21T09:11:28.4326373Z @%p114 mbarrier.inval.shared::cta.b64 [%r230]; 2026-02-21T09:11:28.4326444Z // end inline asm 2026-02-21T09:11:28.4326512Z bar.sync 0; 2026-02-21T09:11:28.4326589Z // begin inline asm 2026-02-21T09:11:28.4326691Z @%p114 mbarrier.inval.shared::cta.b64 [%r95]; 2026-02-21T09:11:28.4326761Z // end inline asm 2026-02-21T09:11:28.4327006Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4327076Z // begin inline asm 2026-02-21T09:11:28.4327479Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r232, %r233, %r234, %r235, %r236, %r237, %r238, %r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247}, [%r265 + 0]; 2026-02-21T09:11:28.4327558Z // end inline asm 2026-02-21T09:11:28.4327631Z // begin inline asm 2026-02-21T09:11:28.4328017Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r249, %r250, %r251, %r252, %r253, %r254, %r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264}, [%r265 + 16]; 2026-02-21T09:11:28.4328095Z // end inline asm 2026-02-21T09:11:28.4328167Z // begin inline asm 2026-02-21T09:11:28.4328258Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:11:28.4328326Z // end inline asm 2026-02-21T09:11:28.4328409Z cvt.u64.u32 %rd89, %r232; 2026-02-21T09:11:28.4328483Z cvt.u64.u32 %rd90, %r233; 2026-02-21T09:11:28.4328557Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:11:28.4328641Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:11:28.4328879Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4328957Z mov.b64 {%r270, %r271}, %rd92; 
2026-02-21T09:11:28.4329044Z cvt.rn.f16x2.f32 %r272, %r271, %r270; 2026-02-21T09:11:28.4329285Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4329361Z cvt.u64.u32 %rd93, %r234; 2026-02-21T09:11:28.4329432Z cvt.u64.u32 %rd94, %r235; 2026-02-21T09:11:28.4329514Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:11:28.4329592Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:11:28.4329807Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4329929Z mov.b64 {%r273, %r274}, %rd96; 2026-02-21T09:11:28.4330004Z cvt.rn.f16x2.f32 %r275, %r274, %r273; 2026-02-21T09:11:28.4330216Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4330290Z cvt.u64.u32 %rd97, %r236; 2026-02-21T09:11:28.4330354Z cvt.u64.u32 %rd98, %r237; 2026-02-21T09:11:28.4330420Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:11:28.4330487Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:11:28.4330699Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4330771Z mov.b64 {%r276, %r277}, %rd100; 2026-02-21T09:11:28.4330884Z cvt.rn.f16x2.f32 %r278, %r277, %r276; 2026-02-21T09:11:28.4331102Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4331169Z cvt.u64.u32 %rd101, %r238; 2026-02-21T09:11:28.4331237Z cvt.u64.u32 %rd102, %r239; 2026-02-21T09:11:28.4331304Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:11:28.4331379Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:11:28.4331610Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4331678Z mov.b64 {%r279, %r280}, %rd104; 2026-02-21T09:11:28.4331762Z cvt.rn.f16x2.f32 %r281, %r280, %r279; 2026-02-21T09:11:28.4331972Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4332042Z cvt.u64.u32 %rd105, %r240; 2026-02-21T09:11:28.4332114Z cvt.u64.u32 %rd106, %r241; 
2026-02-21T09:11:28.4332212Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:11:28.4332281Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:11:28.4332490Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4332564Z mov.b64 {%r282, %r283}, %rd108; 2026-02-21T09:11:28.4332638Z cvt.rn.f16x2.f32 %r284, %r283, %r282; 2026-02-21T09:11:28.4332847Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4332922Z cvt.u64.u32 %rd109, %r242; 2026-02-21T09:11:28.4332989Z cvt.u64.u32 %rd110, %r243; 2026-02-21T09:11:28.4333054Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:11:28.4333126Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:11:28.4333337Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4333402Z mov.b64 {%r285, %r286}, %rd112; 2026-02-21T09:11:28.4333474Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 2026-02-21T09:11:28.4333694Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4333761Z cvt.u64.u32 %rd113, %r244; 2026-02-21T09:11:28.4333826Z cvt.u64.u32 %rd114, %r245; 2026-02-21T09:11:28.4333900Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:11:28.4333966Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:11:28.4334176Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4334248Z mov.b64 {%r288, %r289}, %rd116; 2026-02-21T09:11:28.4334321Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:11:28.4334538Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4334617Z cvt.u64.u32 %rd117, %r246; 2026-02-21T09:11:28.4334745Z cvt.u64.u32 %rd118, %r247; 2026-02-21T09:11:28.4334819Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:11:28.4334891Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:11:28.4335125Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4335198Z mov.b64 {%r291, %r292}, %rd120; 
2026-02-21T09:11:28.4335277Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:11:28.4335507Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4335621Z cvt.u64.u32 %rd121, %r249; 2026-02-21T09:11:28.4335691Z cvt.u64.u32 %rd122, %r250; 2026-02-21T09:11:28.4335764Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:11:28.4335843Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:11:28.4336065Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4336136Z mov.b64 {%r294, %r295}, %rd124; 2026-02-21T09:11:28.4336219Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:11:28.4336443Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4336510Z cvt.u64.u32 %rd125, %r251; 2026-02-21T09:11:28.4336622Z cvt.u64.u32 %rd126, %r252; 2026-02-21T09:11:28.4336687Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:11:28.4336752Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:11:28.4336964Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4337039Z mov.b64 {%r297, %r298}, %rd128; 2026-02-21T09:11:28.4337109Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 2026-02-21T09:11:28.4337344Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4337418Z cvt.u64.u32 %rd129, %r253; 2026-02-21T09:11:28.4337482Z cvt.u64.u32 %rd130, %r254; 2026-02-21T09:11:28.4337549Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:11:28.4337622Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:11:28.4337830Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4337896Z mov.b64 {%r300, %r301}, %rd132; 2026-02-21T09:11:28.4338003Z cvt.rn.f16x2.f32 %r302, %r301, %r300; 2026-02-21T09:11:28.4338216Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4338280Z cvt.u64.u32 %rd133, %r255; 2026-02-21T09:11:28.4338344Z cvt.u64.u32 %rd134, %r256; 
2026-02-21T09:11:28.4338418Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:11:28.4338484Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:11:28.4338690Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4338764Z mov.b64 {%r303, %r304}, %rd136; 2026-02-21T09:11:28.4338838Z cvt.rn.f16x2.f32 %r305, %r304, %r303; 2026-02-21T09:11:28.4339044Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4339110Z cvt.u64.u32 %rd137, %r257; 2026-02-21T09:11:28.4339182Z cvt.u64.u32 %rd138, %r258; 2026-02-21T09:11:28.4339248Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:11:28.4339315Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:11:28.4339528Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4339593Z mov.b64 {%r306, %r307}, %rd140; 2026-02-21T09:11:28.4339663Z cvt.rn.f16x2.f32 %r308, %r307, %r306; 2026-02-21T09:11:28.4339876Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4339939Z cvt.u64.u32 %rd141, %r259; 2026-02-21T09:11:28.4340003Z cvt.u64.u32 %rd142, %r260; 2026-02-21T09:11:28.4340069Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:11:28.4340142Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:11:28.4340349Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4340415Z mov.b64 {%r309, %r310}, %rd144; 2026-02-21T09:11:28.4340496Z cvt.rn.f16x2.f32 %r311, %r310, %r309; 2026-02-21T09:11:28.4340704Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4340773Z cvt.u64.u32 %rd145, %r261; 2026-02-21T09:11:28.4340846Z cvt.u64.u32 %rd146, %r262; 2026-02-21T09:11:28.4340912Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:11:28.4340979Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:11:28.4341215Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4341290Z mov.b64 {%r312, %r313}, %rd148; 
2026-02-21T09:11:28.4341363Z cvt.rn.f16x2.f32 %r314, %r313, %r312; 2026-02-21T09:11:28.4341568Z .loc 1 56 52 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:56:52 2026-02-21T09:11:28.4341644Z cvt.u64.u32 %rd149, %r263; 2026-02-21T09:11:28.4341708Z cvt.u64.u32 %rd150, %r264; 2026-02-21T09:11:28.4341772Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:11:28.4341844Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:11:28.4342051Z .loc 1 58 27 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:58:27 2026-02-21T09:11:28.4342171Z mov.b64 {%r315, %r316}, %rd152; 2026-02-21T09:11:28.4342243Z cvt.rn.f16x2.f32 %r317, %r316, %r315; 2026-02-21T09:11:28.4342456Z .loc 1 59 45 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:59:45 2026-02-21T09:11:28.4342539Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:11:28.4342600Z bar.sync 0; 2026-02-21T09:11:28.4342715Z st.shared.v4.b32 [%r4], {%r272, %r275, %r278, %r281}; 2026-02-21T09:11:28.4342854Z st.shared.v4.b32 [%r4+4096], {%r296, %r299, %r302, %r305}; 2026-02-21T09:11:28.4342960Z st.shared.v4.b32 [%r5], {%r284, %r287, %r290, %r293}; 2026-02-21T09:11:28.4343080Z st.shared.v4.b32 [%r5+4096], {%r308, %r311, %r314, %r317}; 2026-02-21T09:11:28.4343146Z // begin inline asm 2026-02-21T09:11:28.4343234Z fence.proxy.async.shared::cta; 2026-02-21T09:11:28.4343297Z // end inline asm 2026-02-21T09:11:28.4343367Z bar.sync 0; 2026-02-21T09:11:28.4343440Z elect.sync %r318|%p122, -1; 2026-02-21T09:11:28.4343543Z and.pred %p120, %p1, %p122; 2026-02-21T09:11:28.4343618Z // begin inline asm 2026-02-21T09:11:28.4343845Z @%p120 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd88, {%r266, %r267}], [%r134]; 2026-02-21T09:11:28.4343911Z // end inline asm 2026-02-21T09:11:28.4343996Z cp.async.bulk.commit_group; 2026-02-21T09:11:28.4344089Z $L__BB0_8: // %._crit_edge 2026-02-21T09:11:28.4344304Z .loc 1 33 74 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:33:74 2026-02-21T09:11:28.4344388Z cp.async.bulk.wait_group.read 
0; 2026-02-21T09:11:28.4344455Z bar.sync 0; 2026-02-21T09:11:28.4344662Z .loc 1 33 4 // cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py:33:4 2026-02-21T09:11:28.4344773Z bar.sync 0; 2026-02-21T09:11:28.4344849Z // begin inline asm 2026-02-21T09:11:28.4344983Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r319, 32; 2026-02-21T09:11:28.4345045Z // end inline asm 2026-02-21T09:11:28.4345110Z ret; 2026-02-21T09:11:28.4345174Z $L__tmp1: 2026-02-21T09:11:28.4345234Z $L__func_end0: 2026-02-21T09:11:28.4345325Z // -- End function 2026-02-21T09:11:28.4345388Z } 2026-02-21T09:11:28.4345641Z .file 1 "/tmp/torchinductor_root/fu/cfunh46fzhqeretiljodqwyizfkjp2p5vdijfvbvflduc5qpzief.py" 2026-02-21T09:11:28.4345712Z .section .debug_abbrev 2026-02-21T09:11:28.4345773Z { 2026-02-21T09:11:28.4345871Z .b8 1 // Abbreviation Code 2026-02-21T09:11:28.4345968Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:11:28.4346058Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:11:28.4346155Z .b8 37 // DW_AT_producer 2026-02-21T09:11:28.4346239Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.4346326Z .b8 19 // DW_AT_language 2026-02-21T09:11:28.4346420Z .b8 5 // DW_FORM_data2 2026-02-21T09:11:28.4346506Z .b8 3 // DW_AT_name 2026-02-21T09:11:28.4346587Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.4346687Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:11:28.4346824Z .b8 6 // DW_FORM_data4 2026-02-21T09:11:28.4346911Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:11:28.4346999Z .b8 8 // DW_FORM_string 2026-02-21T09:11:28.4347093Z .b8 0 // EOM(1) 2026-02-21T09:11:28.4347173Z .b8 0 // EOM(2) 2026-02-21T09:11:28.4347251Z .b8 0 // EOM(3) 2026-02-21T09:11:28.4347312Z } 2026-02-21T09:11:28.4347379Z .section .debug_info 2026-02-21T09:11:28.4347433Z { 2026-02-21T09:11:28.4347534Z .b32 104 // Length of Unit 2026-02-21T09:11:28.4347635Z .b8 2 // DWARF version number 2026-02-21T09:11:28.4347729Z .b8 0 2026-02-21T09:11:28.4347866Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:11:28.4347976Z .b8 8 // Address Size (in bytes) 2026-02-21T09:11:28.4348091Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:11:28.4348182Z .b8 116 // DW_AT_producer 2026-02-21T09:11:28.4348251Z .b8 114 2026-02-21T09:11:28.4348309Z .b8 105 2026-02-21T09:11:28.4348396Z .b8 116 2026-02-21T09:11:28.4348454Z .b8 111 2026-02-21T09:11:28.4348520Z .b8 110 2026-02-21T09:11:28.4348575Z .b8 0 2026-02-21T09:11:28.4348658Z .b8 2 // DW_AT_language 2026-02-21T09:11:28.4348721Z .b8 0 2026-02-21T09:11:28.4348805Z .b8 99 // DW_AT_name 2026-02-21T09:11:28.4348860Z .b8 102 2026-02-21T09:11:28.4348914Z .b8 117 2026-02-21T09:11:28.4348974Z .b8 110 2026-02-21T09:11:28.4349068Z .b8 104 2026-02-21T09:11:28.4349128Z .b8 52 2026-02-21T09:11:28.4349196Z .b8 54 2026-02-21T09:11:28.4349250Z .b8 102 2026-02-21T09:11:28.4349304Z .b8 122 2026-02-21T09:11:28.4349355Z .b8 104 2026-02-21T09:11:28.4349418Z .b8 113 2026-02-21T09:11:28.4349470Z .b8 101 2026-02-21T09:11:28.4349526Z .b8 114 2026-02-21T09:11:28.4349589Z .b8 101 2026-02-21T09:11:28.4349643Z .b8 116 2026-02-21T09:11:28.4349696Z .b8 105 2026-02-21T09:11:28.4349749Z .b8 108 2026-02-21T09:11:28.4349811Z .b8 106 2026-02-21T09:11:28.4349866Z .b8 111 2026-02-21T09:11:28.4349921Z .b8 100 2026-02-21T09:11:28.4349984Z .b8 113 2026-02-21T09:11:28.4350038Z .b8 119 2026-02-21T09:11:28.4350092Z .b8 121 2026-02-21T09:11:28.4350147Z .b8 105 2026-02-21T09:11:28.4350211Z .b8 122 2026-02-21T09:11:28.4350264Z .b8 102 2026-02-21T09:11:28.4350318Z .b8 107 2026-02-21T09:11:28.4350372Z .b8 106 2026-02-21T09:11:28.4350432Z .b8 112 2026-02-21T09:11:28.4350485Z .b8 50 2026-02-21T09:11:28.4350541Z .b8 112 2026-02-21T09:11:28.4350604Z .b8 53 2026-02-21T09:11:28.4350659Z .b8 118 2026-02-21T09:11:28.4350715Z .b8 100 2026-02-21T09:11:28.4350769Z .b8 105 2026-02-21T09:11:28.4350831Z .b8 106 2026-02-21T09:11:28.4350887Z .b8 102 2026-02-21T09:11:28.4350941Z .b8 118 2026-02-21T09:11:28.4351000Z .b8 98 2026-02-21T09:11:28.4351054Z .b8 118 
2026-02-21T09:11:28.4351111Z .b8 102 2026-02-21T09:11:28.4351164Z .b8 108 2026-02-21T09:11:28.4351229Z .b8 100 2026-02-21T09:11:28.4351283Z .b8 117 2026-02-21T09:11:28.4351336Z .b8 99 2026-02-21T09:11:28.4351390Z .b8 53 2026-02-21T09:11:28.4351452Z .b8 113 2026-02-21T09:11:28.4351508Z .b8 112 2026-02-21T09:11:28.4351563Z .b8 122 2026-02-21T09:11:28.4351624Z .b8 105 2026-02-21T09:11:28.4351680Z .b8 101 2026-02-21T09:11:28.4351735Z .b8 102 2026-02-21T09:11:28.4351787Z .b8 46 2026-02-21T09:11:28.4351850Z .b8 112 2026-02-21T09:11:28.4351905Z .b8 121 2026-02-21T09:11:28.4351958Z .b8 0 2026-02-21T09:11:28.4352070Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:11:28.4352160Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:11:28.4352217Z .b8 116 2026-02-21T09:11:28.4352273Z .b8 109 2026-02-21T09:11:28.4352335Z .b8 112 2026-02-21T09:11:28.4352388Z .b8 47 2026-02-21T09:11:28.4352441Z .b8 116 2026-02-21T09:11:28.4352504Z .b8 111 2026-02-21T09:11:28.4352558Z .b8 114 2026-02-21T09:11:28.4352651Z .b8 99 2026-02-21T09:11:28.4352704Z .b8 104 2026-02-21T09:11:28.4352764Z .b8 105 2026-02-21T09:11:28.4352819Z .b8 110 2026-02-21T09:11:28.4352873Z .b8 100 2026-02-21T09:11:28.4352935Z .b8 117 2026-02-21T09:11:28.4352988Z .b8 99 2026-02-21T09:11:28.4353043Z .b8 116 2026-02-21T09:11:28.4353098Z .b8 111 2026-02-21T09:11:28.4353163Z .b8 114 2026-02-21T09:11:28.4353217Z .b8 95 2026-02-21T09:11:28.4353270Z .b8 114 2026-02-21T09:11:28.4353325Z .b8 111 2026-02-21T09:11:28.4353387Z .b8 111 2026-02-21T09:11:28.4353442Z .b8 116 2026-02-21T09:11:28.4353498Z .b8 47 2026-02-21T09:11:28.4353560Z .b8 102 2026-02-21T09:11:28.4353615Z .b8 117 2026-02-21T09:11:28.4353669Z .b8 0 2026-02-21T09:11:28.4353723Z } 2026-02-21T09:11:28.4353808Z .section .debug_macinfo { } 2026-02-21T09:11:28.4353843Z 2026-02-21T09:11:28.4353935Z ================================================================ 2026-02-21T09:11:28.4354059Z please share the reproducer above with Triton project. 
2026-02-21T09:11:32.4662212Z 2026-02-21T09:11:32.4663143Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 19.9 configs/s 2026-02-21T09:11:33.8990913Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 679.5 2026-02-21T09:11:33.8991796Z configs/s 2026-02-21T09:11:33.9852288Z [30s] Generation 1 complete: 2026-02-21T09:11:33.9852586Z error=20 2026-02-21T09:11:33.9852782Z ok=71 2026-02-21T09:11:33.9852963Z min=0.1352 2026-02-21T09:11:33.9853163Z mid=0.4414 2026-02-21T09:11:33.9853350Z max=2.3954 2026-02-21T09:11:33.9853545Z best={'block_sizes': [64, 128, 16], 2026-02-21T09:11:33.9854196Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:33.9854635Z 'l2_groupings': [32], 2026-02-21T09:11:33.9855225Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:11:33.9855517Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:33.9855755Z 'num_stages': 8, 2026-02-21T09:11:33.9855958Z 'num_warps': 4, 2026-02-21T09:11:33.9856182Z 'pid_type': 'flat', 2026-02-21T09:11:33.9856407Z 'range_flattens': [None, None], 2026-02-21T09:11:33.9856688Z 'range_multi_buffers': [None, False], 2026-02-21T09:11:33.9856974Z 'range_num_stages': [0, 0], 2026-02-21T09:11:33.9857229Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:33.9857509Z 'range_warp_specializes': [None, None]} 2026-02-21T09:11:33.9874155Z [30s] Fitting surrogate: 191 points, 191 targets 2026-02-21T09:11:35.2997901Z [31s] Generation 2 starting: 81 neighbors, 5 active search path(s) 2026-02-21T09:11:40.8610128Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 8.7 configs/s 2026-02-21T09:11:45.2699879Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 19.5 configs/s 2026-02-21T09:11:45.5755491Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3077.6 2026-02-21T09:11:45.5758900Z configs/s 2026-02-21T09:11:45.6156316Z [42s] Generation 2 complete: 2026-02-21T09:11:45.6160328Z error=11 2026-02-21T09:11:45.6163778Z ok=76 
2026-02-21T09:11:45.6168026Z min=0.0594 2026-02-21T09:11:45.6171989Z mid=0.1843 2026-02-21T09:11:45.6176090Z max=7.3657 2026-02-21T09:11:45.6177595Z best={'block_sizes': [512, 128, 16], 2026-02-21T09:11:45.6177876Z 'indexing': ['tensor_descriptor', 'pointer', 'tensor_descriptor'], 2026-02-21T09:11:45.6178134Z 'l2_groupings': [2], 2026-02-21T09:11:45.6178295Z 'load_eviction_policies': ['', ''], 2026-02-21T09:11:45.6178480Z 'loop_orders': [[0, 1]], 2026-02-21T09:11:45.6178632Z 'num_stages': 5, 2026-02-21T09:11:45.6178776Z 'num_warps': 8, 2026-02-21T09:11:45.6178913Z 'pid_type': 'flat', 2026-02-21T09:11:45.6179083Z 'range_flattens': [None, None], 2026-02-21T09:11:45.6179547Z 'range_multi_buffers': [None, True], 2026-02-21T09:11:45.6179736Z 'range_num_stages': [0, 0], 2026-02-21T09:11:45.6179906Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:45.6180079Z 'range_warp_specializes': [None, None]} 2026-02-21T09:11:45.6180287Z [42s] Fitting surrogate: 278 points, 278 targets 2026-02-21T09:11:47.0020179Z [43s] Generation 3 starting: 75 neighbors, 5 active search path(s) 2026-02-21T09:11:51.1811393Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77/77 20.0 configs/s 2026-02-21T09:11:54.7991140Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 77/77 21.3 configs/s 2026-02-21T09:11:55.6892109Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1113.0 2026-02-21T09:11:55.6892563Z configs/s 2026-02-21T09:11:55.7560776Z [52s] Generation 3 complete: 2026-02-21T09:11:55.7565645Z error=14 2026-02-21T09:11:55.7570795Z ok=66 2026-02-21T09:11:55.7575455Z min=0.0409 2026-02-21T09:11:55.7578850Z mid=0.1331 2026-02-21T09:11:55.7583338Z max=0.5204 2026-02-21T09:11:55.7588069Z best={'block_sizes': [512, 128, 32], 2026-02-21T09:11:55.7590351Z 'indexing': ['tensor_descriptor', 'pointer', 'tensor_descriptor'], 2026-02-21T09:11:55.7590642Z 'l2_groupings': [2], 2026-02-21T09:11:55.7590817Z 'load_eviction_policies': ['', ''], 2026-02-21T09:11:55.7590998Z 
'loop_orders': [[0, 1]], 2026-02-21T09:11:55.7591159Z 'num_stages': 5, 2026-02-21T09:11:55.7591299Z 'num_warps': 4, 2026-02-21T09:11:55.7591710Z 'pid_type': 'flat', 2026-02-21T09:11:55.7591884Z 'range_flattens': [None, None], 2026-02-21T09:11:55.7592070Z 'range_multi_buffers': [None, True], 2026-02-21T09:11:55.7592259Z 'range_num_stages': [0, 0], 2026-02-21T09:11:55.7592421Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:55.7592608Z 'range_warp_specializes': [None, None]} 2026-02-21T09:11:55.7592825Z [52s] Fitting surrogate: 358 points, 358 targets 2026-02-21T09:11:56.8805668Z [53s] Generation 4 starting: 72 neighbors, 5 active search path(s) 2026-02-21T09:12:28.8891757Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74/74 0.4 configs/s 2026-02-21T09:12:32.4419839Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 74/74 20.9 configs/s 2026-02-21T09:12:34.3521418Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 609.7 2026-02-21T09:12:34.3521812Z configs/s 2026-02-21T09:12:34.4559835Z [90s] Generation 4 complete: 2026-02-21T09:12:34.4561269Z error=23 2026-02-21T09:12:34.4561475Z ok=54 2026-02-21T09:12:34.4566705Z min=0.0388 2026-02-21T09:12:34.4568730Z mid=0.0757 2026-02-21T09:12:34.4568899Z max=12.8020 2026-02-21T09:12:34.4569105Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:12:34.4569336Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:12:34.4569551Z 'l2_groupings': [1], 2026-02-21T09:12:34.4573771Z 'load_eviction_policies': ['', ''], 2026-02-21T09:12:34.4575917Z 'loop_orders': [[1, 0]], 2026-02-21T09:12:34.4576123Z 'num_stages': 4, 2026-02-21T09:12:34.4576328Z 'num_warps': 2, 2026-02-21T09:12:34.4580312Z 'pid_type': 'flat', 2026-02-21T09:12:34.4585153Z 'range_flattens': [None, True], 2026-02-21T09:12:34.4588772Z 'range_multi_buffers': [None, True], 2026-02-21T09:12:34.4589030Z 'range_num_stages': [0, 0], 2026-02-21T09:12:34.4589204Z 'range_unroll_factors': [0, 0], 2026-02-21T09:12:34.4592191Z 
'range_warp_specializes': [None, True]} 2026-02-21T09:12:34.4592521Z [90s] Fitting surrogate: 435 points, 435 targets 2026-02-21T09:12:35.5655180Z [92s] Generation 5 starting: 71 neighbors, 5 active search path(s) 2026-02-21T09:13:06.7558356Z [123s] Timeout after 30s compiling Config(block_sizes=[256, 512, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=4, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:13:07.2808205Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72/72 0.5 configs/s 2026-02-21T09:13:09.7829418Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 72/72 29.2 configs/s 2026-02-21T09:13:11.3239524Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 650.9 2026-02-21T09:13:11.3240004Z configs/s 2026-02-21T09:13:11.4240272Z [127s] Generation 5 complete: 2026-02-21T09:13:11.4244533Z error=33 2026-02-21T09:13:11.4247813Z timeout=1 2026-02-21T09:13:11.4251964Z ok=42 2026-02-21T09:13:11.4257353Z min=0.0369 2026-02-21T09:13:11.4262087Z mid=0.0553 2026-02-21T09:13:11.4262286Z max=12.9547 2026-02-21T09:13:11.4267906Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:13:11.4269488Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:13:11.4270153Z 'l2_groupings': [1], 2026-02-21T09:13:11.4274660Z 'load_eviction_policies': ['', ''], 2026-02-21T09:13:11.4278654Z 'loop_orders': [[0, 1]], 2026-02-21T09:13:11.4280180Z 'num_stages': 4, 2026-02-21T09:13:11.4280410Z 'num_warps': 4, 2026-02-21T09:13:11.4283549Z 'pid_type': 'flat', 2026-02-21T09:13:11.4283812Z 'range_flattens': [None, True], 2026-02-21T09:13:11.4288557Z 'range_multi_buffers': [None, True], 2026-02-21T09:13:11.4293595Z 'range_num_stages': [0, 0], 2026-02-21T09:13:11.4297768Z 'range_unroll_factors': 
[0, 0], 2026-02-21T09:13:11.4300132Z 'range_warp_specializes': [None, True]} 2026-02-21T09:13:11.4300471Z [127s] Fitting surrogate: 511 points, 511 targets 2026-02-21T09:13:12.5036270Z [129s] Generation 6 starting: 65 neighbors, 4 active search path(s) 2026-02-21T09:13:44.8768737Z [161s] Timeout after 30s compiling Config(block_sizes=[1024, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_stages=8, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:13:45.0482924Z [161s] Timeout after 30s compiling Config(block_sizes=[1024, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_stages=8, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:13:45.0496479Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67/67 0.6 configs/s 2026-02-21T09:13:45.3538586Z 2026-02-21T09:13:45.3538644Z 2026-02-21T09:13:45.3538869Z ================================================================ 2026-02-21T09:13:45.3539169Z Internal Triton PTX codegen error 2026-02-21T09:13:45.3539382Z `ptxas` stderr: 2026-02-21T09:13:45.3539854Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 259 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:13:45.3540346Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:45.3540532Z 2026-02-21T09:13:45.3540955Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp70tkmfdx.ptx -o /tmp/tmp70tkmfdx.ptx.o 2026-02-21T09:13:45.3541395Z 2026-02-21T09:13:45.3541399Z 2026-02-21T09:13:45.3541462Z // 2026-02-21T09:13:45.3541607Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:13:45.3541782Z // 2026-02-21T09:13:45.3541848Z 2026-02-21T09:13:45.3541902Z .version 8.7 2026-02-21T09:13:45.3542041Z .target sm_100a 2026-02-21T09:13:45.3542169Z .address_size 64 2026-02-21T09:13:45.3542248Z 2026-02-21T09:13:45.3542377Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:13:45.3542631Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:13:45.3543078Z // @_helion_matmul 2026-02-21T09:13:45.3543270Z .visible .entry _helion_matmul( 2026-02-21T09:13:45.3543484Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:13:45.3543738Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:13:45.3544053Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:13:45.3544296Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:13:45.3544536Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:13:45.3544823Z ) 2026-02-21T09:13:45.3544941Z .reqntid 256 2026-02-21T09:13:45.3545074Z .maxnreg 32 2026-02-21T09:13:45.3545191Z { 2026-02-21T09:13:45.3545320Z .reg .pred %p<157>; 2026-02-21T09:13:45.3545473Z .reg .b32 %r<2156>; 2026-02-21T09:13:45.3545613Z .reg .b64 %rd<1112>; 2026-02-21T09:13:45.3545953Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19:0 2026-02-21T09:13:45.3546245Z $L__func_begin0: 2026-02-21T09:13:45.3546493Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19:0 2026-02-21T09:13:45.3546723Z 
2026-02-21T09:13:45.3546775Z // %bb.0: 2026-02-21T09:13:45.3546932Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:13:45.3547127Z $L__tmp0: 2026-02-21T09:13:45.3547347Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19 2026-02-21T09:13:45.3547635Z mov.u32 %r1, %tid.x; 2026-02-21T09:13:45.3547837Z shr.u32 %r2, %r1, 5; 2026-02-21T09:13:45.3547999Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:13:45.3548178Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:13:45.3548338Z @%p3 bra $L__BB0_16; 2026-02-21T09:13:45.3548476Z bra.uni $L__BB0_1; 2026-02-21T09:13:45.3548621Z $L__BB0_16: 2026-02-21T09:13:45.3548854Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:0 2026-02-21T09:13:45.3549168Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:13:45.3549392Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:13:45.3549595Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:13:45.3549891Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19 2026-02-21T09:13:45.3550195Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:45.3550413Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:13:45.3550588Z mov.b32 %r163, global_smem; 2026-02-21T09:13:45.3550750Z // begin inline asm 2026-02-21T09:13:45.3551008Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r163], 512; 2026-02-21T09:13:45.3551266Z // end inline asm 2026-02-21T09:13:45.3551431Z bar.sync 0, 128; 2026-02-21T09:13:45.3551577Z ld.shared.b32 %r2127, [global_smem]; 2026-02-21T09:13:45.3551778Z bar.sync 0, 128; 2026-02-21T09:13:45.3551914Z // begin inline asm 2026-02-21T09:13:45.3552132Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:13:45.3552362Z // end inline asm 2026-02-21T09:13:45.3552602Z .loc 1 21 67 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:21:67 2026-02-21T09:13:45.3552897Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:13:45.3553048Z mov.u32 %r748, %ctaid.y; 
2026-02-21T09:13:45.3553213Z mov.u32 %r749, %ctaid.z; 2026-02-21T09:13:45.3553377Z mov.u32 %r750, %nctaid.x; 2026-02-21T09:13:45.3553538Z mov.u32 %r751, %nctaid.y; 2026-02-21T09:13:45.3553707Z mad.lo.s32 %r752, %r749, %r751, %r748; 2026-02-21T09:13:45.3553889Z mad.lo.s32 %r753, %r752, %r750, %r41; 2026-02-21T09:13:45.3554077Z mul.lo.s32 %r754, %r753, 384; 2026-02-21T09:13:45.3554236Z cvt.s64.s32 %rd84, %r754; 2026-02-21T09:13:45.3554397Z add.s64 %rd45, %rd7, %rd84; 2026-02-21T09:13:45.3554551Z shl.b32 %r755, %r1, 2; 2026-02-21T09:13:45.3554747Z add.s32 %r164, %r163, %r755; 2026-02-21T09:13:45.3554902Z mov.b32 %r2155, 0; 2026-02-21T09:13:45.3555053Z // begin inline asm 2026-02-21T09:13:45.3555210Z @%p33 st.shared.b32 [ %r164 + 0 ], %r2155; 2026-02-21T09:13:45.3555439Z // end inline asm 2026-02-21T09:13:45.3555581Z bar.warp.sync -1; 2026-02-21T09:13:45.3555724Z setp.eq.b32 %p145, %r1, 0; 2026-02-21T09:13:45.3555884Z cvt.u64.u32 %rd30, %r163; 2026-02-21T09:13:45.3556027Z // begin inline asm 2026-02-21T09:13:45.3556326Z @%p145 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd4; 2026-02-21T09:13:45.3556609Z // end inline asm 2026-02-21T09:13:45.3556748Z // begin inline asm 2026-02-21T09:13:45.3556974Z @%p145 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:45.3557240Z // end inline asm 2026-02-21T09:13:45.3557376Z mov.b32 %r166, 32; 2026-02-21T09:13:45.3557508Z // begin inline asm 2026-02-21T09:13:45.3557749Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r166; 2026-02-21T09:13:45.3558021Z // end inline asm 2026-02-21T09:13:45.3558159Z mov.b32 %r167, 256; 2026-02-21T09:13:45.3558334Z // begin inline asm 2026-02-21T09:13:45.3558572Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r167; 2026-02-21T09:13:45.3558847Z // end inline asm 2026-02-21T09:13:45.3558976Z mov.b32 %r168, 2048; 2026-02-21T09:13:45.3559120Z // begin inline asm 
2026-02-21T09:13:45.3559356Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r168; 2026-02-21T09:13:45.3559638Z // end inline asm 2026-02-21T09:13:45.3559765Z // begin inline asm 2026-02-21T09:13:45.3560040Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r168; 2026-02-21T09:13:45.3560322Z // end inline asm 2026-02-21T09:13:45.3560460Z mov.b64 %rd38, 4096; 2026-02-21T09:13:45.3560600Z // begin inline asm 2026-02-21T09:13:45.3561281Z [161s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:13:45.3562543Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:13:45.3563721Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:13:45.3563951Z `ptxas` stderr: 2026-02-21T09:13:45.3564358Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 259 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:13:45.3564860Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:45.3565002Z 2026-02-21T09:13:45.3565378Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp70tkmfdx.ptx -o /tmp/tmp70tkmfdx.ptx.o 2026-02-21T09:13:45.3565839Z 2026-02-21T09:13:45.3565970Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:13:45.3566353Z @%p145 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd38; 2026-02-21T09:13:45.3566651Z // end inline asm 2026-02-21T09:13:45.3566800Z mov.b32 %r170, 1; 2026-02-21T09:13:45.3566941Z // begin inline asm 2026-02-21T09:13:45.3567219Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r170; 2026-02-21T09:13:45.3567518Z // end inline asm 2026-02-21T09:13:45.3567663Z // begin inline asm 2026-02-21T09:13:45.3567938Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r170; 2026-02-21T09:13:45.3568234Z // end inline asm 2026-02-21T09:13:45.3568385Z // begin inline asm 2026-02-21T09:13:45.3568631Z @%p145 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:45.3568918Z // end inline asm 2026-02-21T09:13:45.3569057Z // begin inline asm 2026-02-21T09:13:45.3569359Z @%p145 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3569653Z // end inline asm 2026-02-21T09:13:45.3569788Z // begin inline asm 2026-02-21T09:13:45.3570039Z @%p145 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x2; 2026-02-21T09:13:45.3570344Z // end inline asm 2026-02-21T09:13:45.3570488Z // begin inline asm 2026-02-21T09:13:45.3570719Z @%p145 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3570990Z // end inline asm 2026-02-21T09:13:45.3571125Z // begin inline asm 2026-02-21T09:13:45.3571498Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd45 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:45.3571907Z // end inline asm 2026-02-21T09:13:45.3572045Z // begin inline asm 2026-02-21T09:13:45.3572312Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd45 + 0 ], 0x80; 2026-02-21T09:13:45.3572576Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:45.3572780Z @%p33 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:45.3572957Z // end inline asm 2026-02-21T09:13:45.3573102Z bar.sync 0, 128; 2026-02-21T09:13:45.3573366Z .loc 1 22 67 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:22:67 2026-02-21T09:13:45.3573663Z add.s32 %r756, %r754, 128; 2026-02-21T09:13:45.3573841Z cvt.s64.s32 %rd85, %r756; 2026-02-21T09:13:45.3574020Z add.s64 %rd63, %rd7, %rd85; 2026-02-21T09:13:45.3574180Z bar.sync 0, 128; 2026-02-21T09:13:45.3574310Z // begin inline asm 2026-02-21T09:13:45.3574467Z @%p33 st.shared.b32 [ %r164 + 0 ], %r2155; 2026-02-21T09:13:45.3574641Z // end inline asm 2026-02-21T09:13:45.3574810Z bar.warp.sync -1; 2026-02-21T09:13:45.3574944Z // begin inline asm 2026-02-21T09:13:45.3575186Z @%p145 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd5; 2026-02-21T09:13:45.3575462Z // end inline asm 2026-02-21T09:13:45.3575590Z // begin inline asm 2026-02-21T09:13:45.3575811Z @%p145 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:45.3576061Z // end inline asm 2026-02-21T09:13:45.3576193Z // begin inline asm 2026-02-21T09:13:45.3576420Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r166; 2026-02-21T09:13:45.3576689Z // end inline asm 2026-02-21T09:13:45.3576827Z // begin inline asm 2026-02-21T09:13:45.3577062Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r167; 2026-02-21T09:13:45.3577329Z // end inline asm 2026-02-21T09:13:45.3577457Z // begin inline asm 2026-02-21T09:13:45.3577697Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r168; 2026-02-21T09:13:45.3577960Z // end inline asm 2026-02-21T09:13:45.3578096Z mov.b32 %r177, 4096; 2026-02-21T09:13:45.3578233Z // begin inline asm 2026-02-21T09:13:45.3578475Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r177; 2026-02-21T09:13:45.3578749Z // end inline asm 
2026-02-21T09:13:45.3578878Z // begin inline asm 2026-02-21T09:13:45.3579133Z @%p145 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd38; 2026-02-21T09:13:45.3579413Z // end inline asm 2026-02-21T09:13:45.3579551Z // begin inline asm 2026-02-21T09:13:45.3579797Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r170; 2026-02-21T09:13:45.3580082Z // end inline asm 2026-02-21T09:13:45.3580218Z // begin inline asm 2026-02-21T09:13:45.3580461Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r170; 2026-02-21T09:13:45.3580740Z // end inline asm 2026-02-21T09:13:45.3580866Z // begin inline asm 2026-02-21T09:13:45.3581097Z @%p145 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:45.3581357Z // end inline asm 2026-02-21T09:13:45.3581492Z // begin inline asm 2026-02-21T09:13:45.3581773Z @%p145 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3582046Z // end inline asm 2026-02-21T09:13:45.3582182Z // begin inline asm 2026-02-21T09:13:45.3582409Z @%p145 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x2; 2026-02-21T09:13:45.3582744Z // end inline asm 2026-02-21T09:13:45.3582873Z // begin inline asm 2026-02-21T09:13:45.3583100Z @%p145 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3583356Z // end inline asm 2026-02-21T09:13:45.3583484Z // begin inline asm 2026-02-21T09:13:45.3583836Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:45.3584195Z // end inline asm 2026-02-21T09:13:45.3584331Z // begin inline asm 2026-02-21T09:13:45.3584533Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T09:13:45.3584833Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:45.3585017Z @%p33 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:45.3585194Z // end inline asm 2026-02-21T09:13:45.3585324Z bar.sync 0, 128; 2026-02-21T09:13:45.3585560Z .loc 1 24 71 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:24:71 2026-02-21T09:13:45.3585855Z add.s32 %r757, %r754, 256; 2026-02-21T09:13:45.3586008Z cvt.s64.s32 %rd86, %r757; 2026-02-21T09:13:45.3586168Z add.s64 %rd81, %rd7, %rd86; 2026-02-21T09:13:45.3586344Z bar.sync 0, 128; 2026-02-21T09:13:45.3586484Z // begin inline asm 2026-02-21T09:13:45.3586630Z @%p33 st.shared.b32 [ %r164 + 0 ], %r2155; 2026-02-21T09:13:45.3586806Z // end inline asm 2026-02-21T09:13:45.3586943Z bar.warp.sync -1; 2026-02-21T09:13:45.3587076Z // begin inline asm 2026-02-21T09:13:45.3587322Z @%p145 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd6; 2026-02-21T09:13:45.3587591Z // end inline asm 2026-02-21T09:13:45.3587726Z // begin inline asm 2026-02-21T09:13:45.3587941Z @%p145 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:45.3588197Z // end inline asm 2026-02-21T09:13:45.3588324Z mov.b32 %r182, 64; 2026-02-21T09:13:45.3588465Z // begin inline asm 2026-02-21T09:13:45.3588700Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r182; 2026-02-21T09:13:45.3588960Z // end inline asm 2026-02-21T09:13:45.3589100Z // begin inline asm 2026-02-21T09:13:45.3589328Z @%p145 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r167; 2026-02-21T09:13:45.3589618Z // end inline asm 2026-02-21T09:13:45.3589747Z // begin inline asm 2026-02-21T09:13:45.3589989Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r177; 2026-02-21T09:13:45.3590270Z // end inline asm 2026-02-21T09:13:45.3590395Z // begin inline asm 2026-02-21T09:13:45.3590634Z @%p145 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r168; 2026-02-21T09:13:45.3590902Z // end inline asm 
2026-02-21T09:13:45.3591038Z mov.b64 %rd74, 8192; 2026-02-21T09:13:45.3591173Z // begin inline asm 2026-02-21T09:13:45.3591420Z @%p145 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd74; 2026-02-21T09:13:45.3591717Z // end inline asm 2026-02-21T09:13:45.3591846Z // begin inline asm 2026-02-21T09:13:45.3592095Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r170; 2026-02-21T09:13:45.3592369Z // end inline asm 2026-02-21T09:13:45.3592506Z // begin inline asm 2026-02-21T09:13:45.3592751Z @%p145 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r170; 2026-02-21T09:13:45.3593033Z // end inline asm 2026-02-21T09:13:45.3593160Z // begin inline asm 2026-02-21T09:13:45.3593390Z @%p145 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:45.3593647Z // end inline asm 2026-02-21T09:13:45.3593776Z // begin inline asm 2026-02-21T09:13:45.3594052Z @%p145 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3594333Z // end inline asm 2026-02-21T09:13:45.3594466Z // begin inline asm 2026-02-21T09:13:45.3594728Z @%p145 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x3; 2026-02-21T09:13:45.3595022Z // end inline asm 2026-02-21T09:13:45.3595153Z // begin inline asm 2026-02-21T09:13:45.3595371Z @%p145 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:45.3595634Z // end inline asm 2026-02-21T09:13:45.3595760Z // begin inline asm 2026-02-21T09:13:45.3596096Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd81 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:45.3596455Z // end inline asm 2026-02-21T09:13:45.3596589Z // begin inline asm 2026-02-21T09:13:45.3596814Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd81 + 0 ], 0x80; 2026-02-21T09:13:45.3597060Z @%p33 
cp.async.bulk.commit_group ; 2026-02-21T09:13:45.3597251Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:45.3597422Z // end inline asm 2026-02-21T09:13:45.3597558Z bar.sync 0, 128; 2026-02-21T09:13:45.3597695Z cvta.global.u64 %rd87, %rd81; 2026-02-21T09:13:45.3597978Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3598267Z max.u32 %r758, %r41, 127; 2026-02-21T09:13:45.3598424Z shl.b32 %r759, %r758, 6; 2026-02-21T09:13:45.3598608Z sub.s32 %r42, 8192, %r759; 2026-02-21T09:13:45.3598863Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3599162Z shfl.sync.idx.b32 %r760, %r2, 0, 31, -1; 2026-02-21T09:13:45.3599339Z and.b32 %r43, %r760, 3; 2026-02-21T09:13:45.3599495Z shl.b32 %r761, %r43, 21; 2026-02-21T09:13:45.3599645Z add.s32 %r188, %r761, %r2127; 2026-02-21T09:13:45.3599810Z mov.pred %p89, -1; 2026-02-21T09:13:45.3599951Z // begin inline asm 2026-02-21T09:13:45.3600362Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 0], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3600787Z // end inline asm 2026-02-21T09:13:45.3600915Z // begin inline asm 2026-02-21T09:13:45.3601303Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 16], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3601725Z // end inline asm 2026-02-21T09:13:45.3601859Z // begin inline asm 2026-02-21T09:13:45.3602245Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 32], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3602656Z // end inline asm 2026-02-21T09:13:45.3602793Z // begin inline asm 2026-02-21T09:13:45.3603171Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 48], 
{%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3603599Z // end inline asm 2026-02-21T09:13:45.3603727Z // begin inline asm 2026-02-21T09:13:45.3604102Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 64], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3604530Z // end inline asm 2026-02-21T09:13:45.3604658Z // begin inline asm 2026-02-21T09:13:45.3605066Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 80], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3605471Z // end inline asm 2026-02-21T09:13:45.3605608Z // begin inline asm 2026-02-21T09:13:45.3605980Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 96], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3606413Z // end inline asm 2026-02-21T09:13:45.3606550Z // begin inline asm 2026-02-21T09:13:45.3606917Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 112], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3607368Z // end inline asm 2026-02-21T09:13:45.3607497Z // begin inline asm 2026-02-21T09:13:45.3607889Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 128], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3608364Z // end inline asm 2026-02-21T09:13:45.3608492Z // begin inline asm 2026-02-21T09:13:45.3608903Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 144], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, 
%r2155, %r2155}; 2026-02-21T09:13:45.3609345Z // end inline asm 2026-02-21T09:13:45.3609485Z // begin inline asm 2026-02-21T09:13:45.3609870Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 160], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3610314Z // end inline asm 2026-02-21T09:13:45.3610458Z // begin inline asm 2026-02-21T09:13:45.3610887Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 176], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3611348Z // end inline asm 2026-02-21T09:13:45.3611482Z // begin inline asm 2026-02-21T09:13:45.3611894Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 192], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3612335Z // end inline asm 2026-02-21T09:13:45.3612472Z // begin inline asm 2026-02-21T09:13:45.3612887Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 208], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3613324Z // end inline asm 2026-02-21T09:13:45.3613467Z // begin inline asm 2026-02-21T09:13:45.3613863Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 224], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3614305Z // end inline asm 2026-02-21T09:13:45.3614450Z // begin inline asm 2026-02-21T09:13:45.3614876Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 240], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3615303Z // end inline asm 2026-02-21T09:13:45.3615440Z // begin inline 
asm 2026-02-21T09:13:45.3615837Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 256], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3616271Z // end inline asm 2026-02-21T09:13:45.3616417Z // begin inline asm 2026-02-21T09:13:45.3616794Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 272], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3617193Z // end inline asm 2026-02-21T09:13:45.3617328Z // begin inline asm 2026-02-21T09:13:45.3617697Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 288], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3618126Z // end inline asm 2026-02-21T09:13:45.3618265Z // begin inline asm 2026-02-21T09:13:45.3618636Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 304], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3619082Z // end inline asm 2026-02-21T09:13:45.3619211Z // begin inline asm 2026-02-21T09:13:45.3619586Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 320], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3620042Z // end inline asm 2026-02-21T09:13:45.3620170Z // begin inline asm 2026-02-21T09:13:45.3620558Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 336], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3620978Z // end inline asm 2026-02-21T09:13:45.3621112Z // begin inline asm 2026-02-21T09:13:45.3621524Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 352], {%r2155, %r2155, 
%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3621934Z // end inline asm 2026-02-21T09:13:45.3622068Z // begin inline asm 2026-02-21T09:13:45.3622449Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 368], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3622861Z // end inline asm 2026-02-21T09:13:45.3622988Z // begin inline asm 2026-02-21T09:13:45.3623384Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 384], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3623807Z // end inline asm 2026-02-21T09:13:45.3623935Z // begin inline asm 2026-02-21T09:13:45.3624314Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 400], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3624765Z // end inline asm 2026-02-21T09:13:45.3624898Z // begin inline asm 2026-02-21T09:13:45.3625266Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 416], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3625679Z // end inline asm 2026-02-21T09:13:45.3625812Z // begin inline asm 2026-02-21T09:13:45.3626182Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 432], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3626610Z // end inline asm 2026-02-21T09:13:45.3626739Z // begin inline asm 2026-02-21T09:13:45.3627130Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 448], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, 
%r2155}; 2026-02-21T09:13:45.3627539Z // end inline asm 2026-02-21T09:13:45.3627669Z // begin inline asm 2026-02-21T09:13:45.3628057Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 464], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3628478Z // end inline asm 2026-02-21T09:13:45.3628612Z // begin inline asm 2026-02-21T09:13:45.3628993Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 480], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3629404Z // end inline asm 2026-02-21T09:13:45.3629539Z // begin inline asm 2026-02-21T09:13:45.3629907Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 496], {%r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155, %r2155}; 2026-02-21T09:13:45.3630315Z // end inline asm 2026-02-21T09:13:45.3630442Z // begin inline asm 2026-02-21T09:13:45.3630600Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:13:45.3630805Z // end inline asm 2026-02-21T09:13:45.3630942Z bar.sync 0, 128; 2026-02-21T09:13:45.3631200Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3631514Z add.s32 %r732, %r163, 262144; 2026-02-21T09:13:45.3631673Z // begin inline asm 2026-02-21T09:13:45.3631834Z @%p145 mbarrier.init.shared::cta.b64 [%r732], 1; 2026-02-21T09:13:45.3632026Z // end inline asm 2026-02-21T09:13:45.3632158Z bar.sync 0, 128; 2026-02-21T09:13:45.3632307Z add.s32 %r733, %r163, 262152; 2026-02-21T09:13:45.3632460Z // begin inline asm 2026-02-21T09:13:45.3632623Z @%p145 mbarrier.init.shared::cta.b64 [%r733], 1; 2026-02-21T09:13:45.3632813Z // end inline asm 2026-02-21T09:13:45.3632939Z bar.sync 0, 128; 2026-02-21T09:13:45.3633076Z add.s32 %r734, %r163, 262160; 2026-02-21T09:13:45.3633224Z // begin inline asm 
2026-02-21T09:13:45.3633411Z @%p145 mbarrier.init.shared::cta.b64 [%r734], 1; 2026-02-21T09:13:45.3633593Z // end inline asm 2026-02-21T09:13:45.3633725Z bar.sync 0, 128; 2026-02-21T09:13:45.3633855Z add.s32 %r735, %r163, 262168; 2026-02-21T09:13:45.3634011Z // begin inline asm 2026-02-21T09:13:45.3634166Z @%p145 mbarrier.init.shared::cta.b64 [%r735], 1; 2026-02-21T09:13:45.3634353Z // end inline asm 2026-02-21T09:13:45.3634488Z add.s32 %r736, %r163, 262176; 2026-02-21T09:13:45.3634633Z // begin inline asm 2026-02-21T09:13:45.3634822Z @%p145 mbarrier.init.shared::cta.b64 [%r736], 1; 2026-02-21T09:13:45.3635024Z // end inline asm 2026-02-21T09:13:45.3635158Z bar.sync 0, 128; 2026-02-21T09:13:45.3635286Z add.s32 %r737, %r163, 262184; 2026-02-21T09:13:45.3635441Z // begin inline asm 2026-02-21T09:13:45.3635594Z @%p145 mbarrier.init.shared::cta.b64 [%r737], 1; 2026-02-21T09:13:45.3635777Z // end inline asm 2026-02-21T09:13:45.3635909Z bar.sync 0, 128; 2026-02-21T09:13:45.3636037Z add.s32 %r738, %r163, 262192; 2026-02-21T09:13:45.3636190Z // begin inline asm 2026-02-21T09:13:45.3636344Z @%p145 mbarrier.init.shared::cta.b64 [%r738], 1; 2026-02-21T09:13:45.3636527Z // end inline asm 2026-02-21T09:13:45.3636651Z bar.sync 0, 128; 2026-02-21T09:13:45.3636784Z add.s32 %r739, %r163, 262200; 2026-02-21T09:13:45.3636931Z // begin inline asm 2026-02-21T09:13:45.3637093Z @%p145 mbarrier.init.shared::cta.b64 [%r739], 1; 2026-02-21T09:13:45.3637270Z // end inline asm 2026-02-21T09:13:45.3637511Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3637786Z bar.sync 0, 128; 2026-02-21T09:13:45.3637912Z // begin inline asm 2026-02-21T09:13:45.3638085Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r732]; 2026-02-21T09:13:45.3638274Z // end inline asm 2026-02-21T09:13:45.3638405Z bar.sync 0, 128; 2026-02-21T09:13:45.3638532Z // begin inline asm 2026-02-21T09:13:45.3638701Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r733]; 
2026-02-21T09:13:45.3638886Z // end inline asm 2026-02-21T09:13:45.3639025Z bar.sync 0, 128; 2026-02-21T09:13:45.3639168Z // begin inline asm 2026-02-21T09:13:45.3639326Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r734]; 2026-02-21T09:13:45.3639515Z // end inline asm 2026-02-21T09:13:45.3639640Z bar.sync 0, 128; 2026-02-21T09:13:45.3639776Z // begin inline asm 2026-02-21T09:13:45.3639933Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r735]; 2026-02-21T09:13:45.3640118Z // end inline asm 2026-02-21T09:13:45.3640356Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3640638Z bar.sync 0, 128; 2026-02-21T09:13:45.3640774Z add.s32 %r744, %r163, 262208; 2026-02-21T09:13:45.3640922Z // begin inline asm 2026-02-21T09:13:45.3641084Z @%p145 mbarrier.init.shared::cta.b64 [%r744], 1; 2026-02-21T09:13:45.3641264Z // end inline asm 2026-02-21T09:13:45.3641403Z add.s32 %r2115, %r163, 262224; 2026-02-21T09:13:45.3641556Z // begin inline asm 2026-02-21T09:13:45.3641722Z @%p145 mbarrier.init.shared::cta.b64 [%r2115], 1; 2026-02-21T09:13:45.3641934Z // end inline asm 2026-02-21T09:13:45.3642184Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3642461Z bar.sync 0, 128; 2026-02-21T09:13:45.3642589Z // begin inline asm 2026-02-21T09:13:45.3642790Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r2115]; 2026-02-21T09:13:45.3642976Z // end inline asm 2026-02-21T09:13:45.3643218Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3643519Z st.shared.b32 [global_smem+262232], 33554689; 2026-02-21T09:13:45.3643722Z st.shared.b32 [global_smem], %r2127; 2026-02-21T09:13:45.3643887Z barrier.sync 1; 2026-02-21T09:13:45.3644045Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:45.3644223Z barrier.sync 1; 2026-02-21T09:13:45.3644359Z setp.lt.s32 %p136, %r42, 1; 2026-02-21T09:13:45.3644523Z @%p136 bra $L__BB0_23; 2026-02-21T09:13:45.3644734Z // %bb.17: 
// %.lr.ph9 2026-02-21T09:13:45.3644926Z add.s32 %r2152, %r41, -1; 2026-02-21T09:13:45.3645073Z shl.b32 %r764, %r1, 7; 2026-02-21T09:13:45.3645226Z and.b32 %r765, %r764, 16256; 2026-02-21T09:13:45.3645379Z shl.b32 %r766, %r1, 4; 2026-02-21T09:13:45.3645530Z and.b32 %r767, %r766, 112; 2026-02-21T09:13:45.3645688Z or.b32 %r768, %r765, %r767; 2026-02-21T09:13:45.3645840Z add.s32 %r46, %r163, %r768; 2026-02-21T09:13:45.3645996Z xor.b32 %r770, %r768, 16; 2026-02-21T09:13:45.3646179Z add.s32 %r47, %r163, %r770; 2026-02-21T09:13:45.3646339Z xor.b32 %r771, %r768, 32; 2026-02-21T09:13:45.3646482Z add.s32 %r48, %r163, %r771; 2026-02-21T09:13:45.3646634Z xor.b32 %r772, %r768, 48; 2026-02-21T09:13:45.3646777Z add.s32 %r49, %r163, %r772; 2026-02-21T09:13:45.3646930Z xor.b32 %r773, %r768, 64; 2026-02-21T09:13:45.3647073Z add.s32 %r50, %r163, %r773; 2026-02-21T09:13:45.3647224Z xor.b32 %r774, %r768, 80; 2026-02-21T09:13:45.3647377Z add.s32 %r51, %r163, %r774; 2026-02-21T09:13:45.3647524Z xor.b32 %r775, %r768, 96; 2026-02-21T09:13:45.3647679Z add.s32 %r52, %r163, %r775; 2026-02-21T09:13:45.3647832Z xor.b32 %r776, %r768, 112; 2026-02-21T09:13:45.3647996Z add.s32 %r53, %r163, %r776; 2026-02-21T09:13:45.3648150Z shl.b32 %r777, %r43, 15; 2026-02-21T09:13:45.3648307Z add.s32 %r1329, %r163, %r777; 2026-02-21T09:13:45.3648458Z shl.b32 %r55, %r43, 6; 2026-02-21T09:13:45.3648607Z mov.b32 %r2149, -1; 2026-02-21T09:13:45.3648749Z mov.b32 %r2153, %r2155; 2026-02-21T09:13:45.3648901Z mov.b32 %r2154, %r2155; 2026-02-21T09:13:45.3649051Z mov.b32 %r2150, %r2155; 2026-02-21T09:13:45.3649193Z bra.uni $L__BB0_18; 2026-02-21T09:13:45.3649385Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:45.3649713Z .loc 1 0 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:106 2026-02-21T09:13:45.3650005Z setp.lt.u32 %p142, %r1, 128; 2026-02-21T09:13:45.3650273Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3650554Z bar.sync 0, 128; 
2026-02-21T09:13:45.3650696Z // begin inline asm 2026-02-21T09:13:45.3650825Z 2026-02-21T09:13:45.3650945Z { 2026-02-21T09:13:45.3651064Z .reg .pred complete; 2026-02-21T09:13:45.3651213Z waitLoop: 2026-02-21T09:13:45.3651401Z mbarrier.try_wait.parity.shared.b64 complete, [%r744], %r2155; 2026-02-21T09:13:45.3651637Z @!complete bra.uni waitLoop; 2026-02-21T09:13:45.3651782Z } 2026-02-21T09:13:45.3651851Z 2026-02-21T09:13:45.3651905Z // end inline asm 2026-02-21T09:13:45.3652038Z // begin inline asm 2026-02-21T09:13:45.3652409Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797}, [%r188 + 0]; 2026-02-21T09:13:45.3652818Z // end inline asm 2026-02-21T09:13:45.3652953Z // begin inline asm 2026-02-21T09:13:45.3653312Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814}, [%r188 + 16]; 2026-02-21T09:13:45.3653730Z // end inline asm 2026-02-21T09:13:45.3653870Z // begin inline asm 2026-02-21T09:13:45.3654223Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829, %r830, %r831}, [%r188 + 32]; 2026-02-21T09:13:45.3654657Z // end inline asm 2026-02-21T09:13:45.3654839Z cvt.u64.u32 %rd88, %r816; 2026-02-21T09:13:45.3655000Z cvt.u64.u32 %rd89, %r817; 2026-02-21T09:13:45.3655161Z shl.b64 %rd90, %rd89, 32; 2026-02-21T09:13:45.3655314Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T09:13:45.3655477Z cvt.u64.u32 %rd92, %r818; 2026-02-21T09:13:45.3655626Z cvt.u64.u32 %rd93, %r819; 2026-02-21T09:13:45.3655781Z shl.b64 %rd94, %rd93, 32; 2026-02-21T09:13:45.3655931Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T09:13:45.3656095Z cvt.u64.u32 %rd96, %r820; 2026-02-21T09:13:45.3656277Z cvt.u64.u32 %rd97, %r821; 2026-02-21T09:13:45.3656430Z shl.b64 %rd98, %rd97, 32; 2026-02-21T09:13:45.3656589Z or.b64 %rd99, %rd96, %rd98; 
2026-02-21T09:13:45.3656749Z cvt.u64.u32 %rd100, %r822; 2026-02-21T09:13:45.3656914Z cvt.u64.u32 %rd101, %r823; 2026-02-21T09:13:45.3657068Z shl.b64 %rd102, %rd101, 32; 2026-02-21T09:13:45.3657244Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T09:13:45.3657410Z cvt.u64.u32 %rd104, %r824; 2026-02-21T09:13:45.3657578Z cvt.u64.u32 %rd105, %r825; 2026-02-21T09:13:45.3657733Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:13:45.3657930Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T09:13:45.3658100Z cvt.u64.u32 %rd108, %r826; 2026-02-21T09:13:45.3658251Z cvt.u64.u32 %rd109, %r827; 2026-02-21T09:13:45.3658410Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:13:45.3658567Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:13:45.3658733Z cvt.u64.u32 %rd112, %r828; 2026-02-21T09:13:45.3658884Z cvt.u64.u32 %rd113, %r829; 2026-02-21T09:13:45.3659044Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:13:45.3659201Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:13:45.3659366Z cvt.u64.u32 %rd116, %r830; 2026-02-21T09:13:45.3659518Z cvt.u64.u32 %rd117, %r831; 2026-02-21T09:13:45.3659679Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:13:45.3659841Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:13:45.3660001Z // begin inline asm 2026-02-21T09:13:45.3660378Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846, %r847, %r848}, [%r188 + 48]; 2026-02-21T09:13:45.3660744Z // end inline asm 2026-02-21T09:13:45.3660883Z cvt.u64.u32 %rd120, %r833; 2026-02-21T09:13:45.3661030Z cvt.u64.u32 %rd121, %r834; 2026-02-21T09:13:45.3661183Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:13:45.3661333Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:13:45.3661492Z cvt.u64.u32 %rd124, %r835; 2026-02-21T09:13:45.3661644Z cvt.u64.u32 %rd125, %r836; 2026-02-21T09:13:45.3661788Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:13:45.3661944Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:13:45.3662096Z cvt.u64.u32 %rd128, %r837; 2026-02-21T09:13:45.3662248Z cvt.u64.u32 %rd129, 
%r838; 2026-02-21T09:13:45.3662391Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:13:45.3662545Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:13:45.3662692Z cvt.u64.u32 %rd132, %r839; 2026-02-21T09:13:45.3662845Z cvt.u64.u32 %rd133, %r840; 2026-02-21T09:13:45.3662995Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:13:45.3663140Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:13:45.3663298Z cvt.u64.u32 %rd136, %r841; 2026-02-21T09:13:45.3663443Z cvt.u64.u32 %rd137, %r842; 2026-02-21T09:13:45.3663595Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:13:45.3663742Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:13:45.3663898Z cvt.u64.u32 %rd140, %r843; 2026-02-21T09:13:45.3664042Z cvt.u64.u32 %rd141, %r844; 2026-02-21T09:13:45.3664191Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:13:45.3664336Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:13:45.3664494Z cvt.u64.u32 %rd144, %r845; 2026-02-21T09:13:45.3664646Z cvt.u64.u32 %rd145, %r846; 2026-02-21T09:13:45.3664858Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:13:45.3665015Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:13:45.3665166Z cvt.u64.u32 %rd148, %r847; 2026-02-21T09:13:45.3665320Z cvt.u64.u32 %rd149, %r848; 2026-02-21T09:13:45.3665505Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:13:45.3665666Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:13:45.3665821Z // begin inline asm 2026-02-21T09:13:45.3666188Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858, %r859, %r860, %r861, %r862, %r863, %r864, %r865}, [%r188 + 64]; 2026-02-21T09:13:45.3666580Z // end inline asm 2026-02-21T09:13:45.3666712Z // begin inline asm 2026-02-21T09:13:45.3667051Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r867, %r868, %r869, %r870, %r871, %r872, %r873, %r874, %r875, %r876, %r877, %r878, %r879, %r880, %r881, %r882}, [%r188 + 80]; 2026-02-21T09:13:45.3667435Z // end inline asm 2026-02-21T09:13:45.3667605Z // begin inline asm 2026-02-21T09:13:45.3667953Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r884, %r885, %r886, %r887, 
%r888, %r889, %r890, %r891, %r892, %r893, %r894, %r895, %r896, %r897, %r898, %r899}, [%r188 + 96]; 2026-02-21T09:13:45.3668322Z // end inline asm 2026-02-21T09:13:45.3668461Z cvt.u64.u32 %rd152, %r884; 2026-02-21T09:13:45.3668610Z cvt.u64.u32 %rd153, %r885; 2026-02-21T09:13:45.3668766Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:13:45.3668912Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:13:45.3669113Z cvt.u64.u32 %rd156, %r886; 2026-02-21T09:13:45.3669257Z cvt.u64.u32 %rd157, %r887; 2026-02-21T09:13:45.3669409Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:13:45.3669554Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:13:45.3669710Z cvt.u64.u32 %rd160, %r888; 2026-02-21T09:13:45.3669853Z cvt.u64.u32 %rd161, %r889; 2026-02-21T09:13:45.3670005Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:13:45.3670158Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:13:45.3670308Z cvt.u64.u32 %rd164, %r890; 2026-02-21T09:13:45.3670460Z cvt.u64.u32 %rd165, %r891; 2026-02-21T09:13:45.3670604Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:13:45.3670758Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:13:45.3670908Z cvt.u64.u32 %rd168, %r892; 2026-02-21T09:13:45.3671062Z cvt.u64.u32 %rd169, %r893; 2026-02-21T09:13:45.3671209Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:13:45.3671362Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:13:45.3671516Z cvt.u64.u32 %rd172, %r894; 2026-02-21T09:13:45.3671657Z cvt.u64.u32 %rd173, %r895; 2026-02-21T09:13:45.3671806Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:13:45.3671952Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:13:45.3672108Z cvt.u64.u32 %rd176, %r896; 2026-02-21T09:13:45.3672249Z cvt.u64.u32 %rd177, %r897; 2026-02-21T09:13:45.3672397Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:13:45.3672541Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:13:45.3672697Z cvt.u64.u32 %rd180, %r898; 2026-02-21T09:13:45.3672842Z cvt.u64.u32 %rd181, %r899; 2026-02-21T09:13:45.3672993Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:13:45.3673147Z or.b64 %rd183, %rd180, %rd182; 
2026-02-21T09:13:45.3673295Z // begin inline asm 2026-02-21T09:13:45.3673638Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r901, %r902, %r903, %r904, %r905, %r906, %r907, %r908, %r909, %r910, %r911, %r912, %r913, %r914, %r915, %r916}, [%r188 + 112]; 2026-02-21T09:13:45.3674006Z // end inline asm 2026-02-21T09:13:45.3674146Z cvt.u64.u32 %rd184, %r901; 2026-02-21T09:13:45.3674289Z cvt.u64.u32 %rd185, %r902; 2026-02-21T09:13:45.3674442Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:13:45.3674588Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:13:45.3674784Z cvt.u64.u32 %rd188, %r903; 2026-02-21T09:13:45.3674941Z cvt.u64.u32 %rd189, %r904; 2026-02-21T09:13:45.3675097Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:13:45.3675256Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:13:45.3675409Z cvt.u64.u32 %rd192, %r905; 2026-02-21T09:13:45.3675562Z cvt.u64.u32 %rd193, %r906; 2026-02-21T09:13:45.3675710Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:13:45.3675913Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:13:45.3676062Z cvt.u64.u32 %rd196, %r907; 2026-02-21T09:13:45.3676213Z cvt.u64.u32 %rd197, %r908; 2026-02-21T09:13:45.3676358Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:13:45.3676545Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:13:45.3676703Z cvt.u64.u32 %rd200, %r909; 2026-02-21T09:13:45.3676848Z cvt.u64.u32 %rd201, %r910; 2026-02-21T09:13:45.3677004Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:13:45.3677156Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:13:45.3677317Z cvt.u64.u32 %rd204, %r911; 2026-02-21T09:13:45.3677464Z cvt.u64.u32 %rd205, %r912; 2026-02-21T09:13:45.3677616Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:13:45.3677763Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:13:45.3677921Z cvt.u64.u32 %rd208, %r913; 2026-02-21T09:13:45.3678071Z cvt.u64.u32 %rd209, %r914; 2026-02-21T09:13:45.3678215Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:13:45.3678402Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:13:45.3678555Z cvt.u64.u32 %rd212, %r915; 2026-02-21T09:13:45.3678704Z cvt.u64.u32 
%rd213, %r916; 2026-02-21T09:13:45.3678845Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:13:45.3678995Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:13:45.3679141Z // begin inline asm 2026-02-21T09:13:45.3679481Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r918, %r919, %r920, %r921, %r922, %r923, %r924, %r925, %r926, %r927, %r928, %r929, %r930, %r931, %r932, %r933}, [%r188 + 128]; 2026-02-21T09:13:45.3679886Z // end inline asm 2026-02-21T09:13:45.3680018Z // begin inline asm 2026-02-21T09:13:45.3680366Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r935, %r936, %r937, %r938, %r939, %r940, %r941, %r942, %r943, %r944, %r945, %r946, %r947, %r948, %r949, %r950}, [%r188 + 144]; 2026-02-21T09:13:45.3680745Z // end inline asm 2026-02-21T09:13:45.3680880Z // begin inline asm 2026-02-21T09:13:45.3681222Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r952, %r953, %r954, %r955, %r956, %r957, %r958, %r959, %r960, %r961, %r962, %r963, %r964, %r965, %r966, %r967}, [%r188 + 160]; 2026-02-21T09:13:45.3681612Z // end inline asm 2026-02-21T09:13:45.3681748Z cvt.u64.u32 %rd216, %r952; 2026-02-21T09:13:45.3681895Z cvt.u64.u32 %rd217, %r953; 2026-02-21T09:13:45.3682048Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:13:45.3682197Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:13:45.3682357Z cvt.u64.u32 %rd220, %r954; 2026-02-21T09:13:45.3682502Z cvt.u64.u32 %rd221, %r955; 2026-02-21T09:13:45.3682655Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:13:45.3682803Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:13:45.3682960Z cvt.u64.u32 %rd224, %r956; 2026-02-21T09:13:45.3683103Z cvt.u64.u32 %rd225, %r957; 2026-02-21T09:13:45.3683256Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:13:45.3683412Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:13:45.3683563Z cvt.u64.u32 %rd228, %r958; 2026-02-21T09:13:45.3683720Z cvt.u64.u32 %rd229, %r959; 2026-02-21T09:13:45.3683868Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:13:45.3684030Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:13:45.3684181Z cvt.u64.u32 %rd232, %r960; 
2026-02-21T09:13:45.3684332Z cvt.u64.u32 %rd233, %r961; 2026-02-21T09:13:45.3684477Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:13:45.3684629Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:13:45.3684829Z cvt.u64.u32 %rd236, %r962; 2026-02-21T09:13:45.3684980Z cvt.u64.u32 %rd237, %r963; 2026-02-21T09:13:45.3685130Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:13:45.3685278Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:13:45.3685438Z cvt.u64.u32 %rd240, %r964; 2026-02-21T09:13:45.3685583Z cvt.u64.u32 %rd241, %r965; 2026-02-21T09:13:45.3685734Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:13:45.3685882Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:13:45.3686037Z cvt.u64.u32 %rd244, %r966; 2026-02-21T09:13:45.3686183Z cvt.u64.u32 %rd245, %r967; 2026-02-21T09:13:45.3686334Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:13:45.3686488Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T09:13:45.3686639Z // begin inline asm 2026-02-21T09:13:45.3687006Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r969, %r970, %r971, %r972, %r973, %r974, %r975, %r976, %r977, %r978, %r979, %r980, %r981, %r982, %r983, %r984}, [%r188 + 176]; 2026-02-21T09:13:45.3687368Z // end inline asm 2026-02-21T09:13:45.3687506Z cvt.u64.u32 %rd248, %r969; 2026-02-21T09:13:45.3687681Z cvt.u64.u32 %rd249, %r970; 2026-02-21T09:13:45.3687833Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:13:45.3687981Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T09:13:45.3688139Z cvt.u64.u32 %rd252, %r971; 2026-02-21T09:13:45.3688290Z cvt.u64.u32 %rd253, %r972; 2026-02-21T09:13:45.3688434Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:13:45.3688586Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:13:45.3688734Z cvt.u64.u32 %rd256, %r973; 2026-02-21T09:13:45.3688884Z cvt.u64.u32 %rd257, %r974; 2026-02-21T09:13:45.3689026Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:13:45.3689178Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T09:13:45.3689354Z cvt.u64.u32 %rd260, %r975; 2026-02-21T09:13:45.3689507Z cvt.u64.u32 %rd261, %r976; 2026-02-21T09:13:45.3689651Z shl.b64 %rd262, 
%rd261, 32; 2026-02-21T09:13:45.3689803Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T09:13:45.3689958Z cvt.u64.u32 %rd264, %r977; 2026-02-21T09:13:45.3690102Z cvt.u64.u32 %rd265, %r978; 2026-02-21T09:13:45.3690255Z shl.b64 %rd266, %rd265, 32; 2026-02-21T09:13:45.3690403Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T09:13:45.3690559Z cvt.u64.u32 %rd268, %r979; 2026-02-21T09:13:45.3690702Z cvt.u64.u32 %rd269, %r980; 2026-02-21T09:13:45.3690881Z shl.b64 %rd270, %rd269, 32; 2026-02-21T09:13:45.3691029Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T09:13:45.3691187Z cvt.u64.u32 %rd272, %r981; 2026-02-21T09:13:45.3691330Z cvt.u64.u32 %rd273, %r982; 2026-02-21T09:13:45.3691482Z shl.b64 %rd274, %rd273, 32; 2026-02-21T09:13:45.3691635Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T09:13:45.3691786Z cvt.u64.u32 %rd276, %r983; 2026-02-21T09:13:45.3691942Z cvt.u64.u32 %rd277, %r984; 2026-02-21T09:13:45.3692091Z shl.b64 %rd278, %rd277, 32; 2026-02-21T09:13:45.3692249Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T09:13:45.3692397Z // begin inline asm 2026-02-21T09:13:45.3692750Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r986, %r987, %r988, %r989, %r990, %r991, %r992, %r993, %r994, %r995, %r996, %r997, %r998, %r999, %r1000, %r1001}, [%r188 + 192]; 2026-02-21T09:13:45.3693126Z // end inline asm 2026-02-21T09:13:45.3693264Z // begin inline asm 2026-02-21T09:13:45.3693641Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1003, %r1004, %r1005, %r1006, %r1007, %r1008, %r1009, %r1010, %r1011, %r1012, %r1013, %r1014, %r1015, %r1016, %r1017, %r1018}, [%r188 + 208]; 2026-02-21T09:13:45.3694033Z // end inline asm 2026-02-21T09:13:45.3694170Z // begin inline asm 2026-02-21T09:13:45.3694531Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1020, %r1021, %r1022, %r1023, %r1024, %r1025, %r1026, %r1027, %r1028, %r1029, %r1030, %r1031, %r1032, %r1033, %r1034, %r1035}, [%r188 + 224]; 2026-02-21T09:13:45.3694972Z // end inline asm 2026-02-21T09:13:45.3695115Z cvt.u64.u32 %rd280, %r1020; 2026-02-21T09:13:45.3695264Z cvt.u64.u32 
%rd281, %r1021; 2026-02-21T09:13:45.3695420Z shl.b64 %rd282, %rd281, 32; 2026-02-21T09:13:45.3695568Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T09:13:45.3695729Z cvt.u64.u32 %rd284, %r1022; 2026-02-21T09:13:45.3695876Z cvt.u64.u32 %rd285, %r1023; 2026-02-21T09:13:45.3696051Z shl.b64 %rd286, %rd285, 32; 2026-02-21T09:13:45.3696206Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T09:13:45.3696378Z cvt.u64.u32 %rd288, %r1024; 2026-02-21T09:13:45.3696531Z cvt.u64.u32 %rd289, %r1025; 2026-02-21T09:13:45.3696692Z shl.b64 %rd290, %rd289, 32; 2026-02-21T09:13:45.3696854Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T09:13:45.3697011Z cvt.u64.u32 %rd292, %r1026; 2026-02-21T09:13:45.3697173Z cvt.u64.u32 %rd293, %r1027; 2026-02-21T09:13:45.3697324Z shl.b64 %rd294, %rd293, 32; 2026-02-21T09:13:45.3697485Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T09:13:45.3697644Z cvt.u64.u32 %rd296, %r1028; 2026-02-21T09:13:45.3697834Z cvt.u64.u32 %rd297, %r1029; 2026-02-21T09:13:45.3697985Z shl.b64 %rd298, %rd297, 32; 2026-02-21T09:13:45.3698146Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T09:13:45.3698305Z cvt.u64.u32 %rd300, %r1030; 2026-02-21T09:13:45.3698463Z cvt.u64.u32 %rd301, %r1031; 2026-02-21T09:13:45.3698655Z shl.b64 %rd302, %rd301, 32; 2026-02-21T09:13:45.3698810Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T09:13:45.3698974Z cvt.u64.u32 %rd304, %r1032; 2026-02-21T09:13:45.3699124Z cvt.u64.u32 %rd305, %r1033; 2026-02-21T09:13:45.3699296Z shl.b64 %rd306, %rd305, 32; 2026-02-21T09:13:45.3699475Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T09:13:45.3699640Z cvt.u64.u32 %rd308, %r1034; 2026-02-21T09:13:45.3699793Z cvt.u64.u32 %rd309, %r1035; 2026-02-21T09:13:45.3699956Z shl.b64 %rd310, %rd309, 32; 2026-02-21T09:13:45.3700140Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T09:13:45.3700330Z // begin inline asm 2026-02-21T09:13:45.3700794Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1037, %r1038, %r1039, %r1040, %r1041, %r1042, %r1043, %r1044, %r1045, %r1046, %r1047, %r1048, %r1049, %r1050, %r1051, %r1052}, 
[%r188 + 240]; 2026-02-21T09:13:45.3701212Z // end inline asm 2026-02-21T09:13:45.3701358Z cvt.u64.u32 %rd312, %r1037; 2026-02-21T09:13:45.3701511Z cvt.u64.u32 %rd313, %r1038; 2026-02-21T09:13:45.3701674Z shl.b64 %rd314, %rd313, 32; 2026-02-21T09:13:45.3701830Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T09:13:45.3701995Z cvt.u64.u32 %rd316, %r1039; 2026-02-21T09:13:45.3702155Z cvt.u64.u32 %rd317, %r1040; 2026-02-21T09:13:45.3702339Z shl.b64 %rd318, %rd317, 32; 2026-02-21T09:13:45.3702509Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T09:13:45.3702664Z cvt.u64.u32 %rd320, %r1041; 2026-02-21T09:13:45.3702824Z cvt.u64.u32 %rd321, %r1042; 2026-02-21T09:13:45.3702973Z shl.b64 %rd322, %rd321, 32; 2026-02-21T09:13:45.3703136Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T09:13:45.3703293Z cvt.u64.u32 %rd324, %r1043; 2026-02-21T09:13:45.3703455Z cvt.u64.u32 %rd325, %r1044; 2026-02-21T09:13:45.3703607Z shl.b64 %rd326, %rd325, 32; 2026-02-21T09:13:45.3703768Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T09:13:45.3703941Z cvt.u64.u32 %rd328, %r1045; 2026-02-21T09:13:45.3704086Z cvt.u64.u32 %rd329, %r1046; 2026-02-21T09:13:45.3704241Z shl.b64 %rd330, %rd329, 32; 2026-02-21T09:13:45.3704390Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T09:13:45.3704545Z cvt.u64.u32 %rd332, %r1047; 2026-02-21T09:13:45.3704722Z cvt.u64.u32 %rd333, %r1048; 2026-02-21T09:13:45.3704879Z shl.b64 %rd334, %rd333, 32; 2026-02-21T09:13:45.3705028Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T09:13:45.3705186Z cvt.u64.u32 %rd336, %r1049; 2026-02-21T09:13:45.3705335Z cvt.u64.u32 %rd337, %r1050; 2026-02-21T09:13:45.3705479Z shl.b64 %rd338, %rd337, 32; 2026-02-21T09:13:45.3705631Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T09:13:45.3705780Z cvt.u64.u32 %rd340, %r1051; 2026-02-21T09:13:45.3705932Z cvt.u64.u32 %rd341, %r1052; 2026-02-21T09:13:45.3706078Z shl.b64 %rd342, %rd341, 32; 2026-02-21T09:13:45.3706231Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T09:13:45.3706381Z // begin inline asm 2026-02-21T09:13:45.3706748Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1054, %r1055, %r1056, %r1057, %r1058, %r1059, %r1060, %r1061, %r1062, %r1063, %r1064, %r1065, %r1066, %r1067, %r1068, %r1069}, [%r188 + 256]; 2026-02-21T09:13:45.3707148Z // end inline asm 2026-02-21T09:13:45.3707276Z // begin inline asm 2026-02-21T09:13:45.3707642Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1071, %r1072, %r1073, %r1074, %r1075, %r1076, %r1077, %r1078, %r1079, %r1080, %r1081, %r1082, %r1083, %r1084, %r1085, %r1086}, [%r188 + 272]; 2026-02-21T09:13:45.3708033Z // end inline asm 2026-02-21T09:13:45.3708168Z // begin inline asm 2026-02-21T09:13:45.3708523Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1088, %r1089, %r1090, %r1091, %r1092, %r1093, %r1094, %r1095, %r1096, %r1097, %r1098, %r1099, %r1100, %r1101, %r1102, %r1103}, [%r188 + 288]; 2026-02-21T09:13:45.3708920Z // end inline asm 2026-02-21T09:13:45.3709058Z cvt.u64.u32 %rd344, %r1088; 2026-02-21T09:13:45.3709237Z cvt.u64.u32 %rd345, %r1089; 2026-02-21T09:13:45.3709392Z shl.b64 %rd346, %rd345, 32; 2026-02-21T09:13:45.3709539Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T09:13:45.3709693Z cvt.u64.u32 %rd348, %r1090; 2026-02-21T09:13:45.3709839Z cvt.u64.u32 %rd349, %r1091; 2026-02-21T09:13:45.3710020Z shl.b64 %rd350, %rd349, 32; 2026-02-21T09:13:45.3710165Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T09:13:45.3710323Z cvt.u64.u32 %rd352, %r1092; 2026-02-21T09:13:45.3710474Z cvt.u64.u32 %rd353, %r1093; 2026-02-21T09:13:45.3710618Z shl.b64 %rd354, %rd353, 32; 2026-02-21T09:13:45.3710770Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T09:13:45.3710919Z cvt.u64.u32 %rd356, %r1094; 2026-02-21T09:13:45.3711079Z cvt.u64.u32 %rd357, %r1095; 2026-02-21T09:13:45.3711224Z shl.b64 %rd358, %rd357, 32; 2026-02-21T09:13:45.3711377Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T09:13:45.3711523Z cvt.u64.u32 %rd360, %r1096; 2026-02-21T09:13:45.3711725Z cvt.u64.u32 %rd361, %r1097; 2026-02-21T09:13:45.3711876Z shl.b64 %rd362, %rd361, 32; 2026-02-21T09:13:45.3712030Z or.b64 %rd363, %rd360, 
%rd362; 2026-02-21T09:13:45.3712186Z cvt.u64.u32 %rd364, %r1098; 2026-02-21T09:13:45.3712330Z cvt.u64.u32 %rd365, %r1099; 2026-02-21T09:13:45.3712482Z shl.b64 %rd366, %rd365, 32; 2026-02-21T09:13:45.3712626Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T09:13:45.3712779Z cvt.u64.u32 %rd368, %r1100; 2026-02-21T09:13:45.3712923Z cvt.u64.u32 %rd369, %r1101; 2026-02-21T09:13:45.3713098Z shl.b64 %rd370, %rd369, 32; 2026-02-21T09:13:45.3713248Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T09:13:45.3713404Z cvt.u64.u32 %rd372, %r1102; 2026-02-21T09:13:45.3713553Z cvt.u64.u32 %rd373, %r1103; 2026-02-21T09:13:45.3713697Z shl.b64 %rd374, %rd373, 32; 2026-02-21T09:13:45.3713847Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T09:13:45.3713992Z // begin inline asm 2026-02-21T09:13:45.3714358Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1105, %r1106, %r1107, %r1108, %r1109, %r1110, %r1111, %r1112, %r1113, %r1114, %r1115, %r1116, %r1117, %r1118, %r1119, %r1120}, [%r188 + 304]; 2026-02-21T09:13:45.3714793Z // end inline asm 2026-02-21T09:13:45.3714931Z cvt.u64.u32 %rd376, %r1105; 2026-02-21T09:13:45.3715076Z cvt.u64.u32 %rd377, %r1106; 2026-02-21T09:13:45.3715226Z shl.b64 %rd378, %rd377, 32; 2026-02-21T09:13:45.3715379Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T09:13:45.3715528Z cvt.u64.u32 %rd380, %r1107; 2026-02-21T09:13:45.3715678Z cvt.u64.u32 %rd381, %r1108; 2026-02-21T09:13:45.3715823Z shl.b64 %rd382, %rd381, 32; 2026-02-21T09:13:45.3715976Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T09:13:45.3716125Z cvt.u64.u32 %rd384, %r1109; 2026-02-21T09:13:45.3716277Z cvt.u64.u32 %rd385, %r1110; 2026-02-21T09:13:45.3716423Z shl.b64 %rd386, %rd385, 32; 2026-02-21T09:13:45.3716578Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T09:13:45.3716727Z cvt.u64.u32 %rd388, %r1111; 2026-02-21T09:13:45.3716878Z cvt.u64.u32 %rd389, %r1112; 2026-02-21T09:13:45.3717031Z shl.b64 %rd390, %rd389, 32; 2026-02-21T09:13:45.3717179Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T09:13:45.3717336Z cvt.u64.u32 %rd392, %r1113; 
2026-02-21T09:13:45.3717483Z cvt.u64.u32 %rd393, %r1114; 2026-02-21T09:13:45.3717634Z shl.b64 %rd394, %rd393, 32; 2026-02-21T09:13:45.3717783Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T09:13:45.3717942Z cvt.u64.u32 %rd396, %r1115; 2026-02-21T09:13:45.3718088Z cvt.u64.u32 %rd397, %r1116; 2026-02-21T09:13:45.3718239Z shl.b64 %rd398, %rd397, 32; 2026-02-21T09:13:45.3718390Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T09:13:45.3718538Z cvt.u64.u32 %rd400, %r1117; 2026-02-21T09:13:45.3718689Z cvt.u64.u32 %rd401, %r1118; 2026-02-21T09:13:45.3718832Z shl.b64 %rd402, %rd401, 32; 2026-02-21T09:13:45.3718982Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T09:13:45.3719128Z cvt.u64.u32 %rd404, %r1119; 2026-02-21T09:13:45.3719280Z cvt.u64.u32 %rd405, %r1120; 2026-02-21T09:13:45.3719423Z shl.b64 %rd406, %rd405, 32; 2026-02-21T09:13:45.3719575Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T09:13:45.3719756Z // begin inline asm 2026-02-21T09:13:45.3720135Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1122, %r1123, %r1124, %r1125, %r1126, %r1127, %r1128, %r1129, %r1130, %r1131, %r1132, %r1133, %r1134, %r1135, %r1136, %r1137}, [%r188 + 320]; 2026-02-21T09:13:45.3720584Z // end inline asm 2026-02-21T09:13:45.3720719Z // begin inline asm 2026-02-21T09:13:45.3721090Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1139, %r1140, %r1141, %r1142, %r1143, %r1144, %r1145, %r1146, %r1147, %r1148, %r1149, %r1150, %r1151, %r1152, %r1153, %r1154}, [%r188 + 336]; 2026-02-21T09:13:45.3721486Z // end inline asm 2026-02-21T09:13:45.3721627Z cvt.u64.u32 %rd408, %r1147; 2026-02-21T09:13:45.3721776Z cvt.u64.u32 %rd409, %r1148; 2026-02-21T09:13:45.3721932Z shl.b64 %rd410, %rd409, 32; 2026-02-21T09:13:45.3722090Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T09:13:45.3722242Z cvt.u64.u32 %rd412, %r1149; 2026-02-21T09:13:45.3722397Z cvt.u64.u32 %rd413, %r1150; 2026-02-21T09:13:45.3722572Z shl.b64 %rd414, %rd413, 32; 2026-02-21T09:13:45.3722729Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T09:13:45.3722876Z // begin inline asm 
2026-02-21T09:13:45.3723250Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1156, %r1157, %r1158, %r1159, %r1160, %r1161, %r1162, %r1163, %r1164, %r1165, %r1166, %r1167, %r1168, %r1169, %r1170, %r1171}, [%r188 + 352]; 2026-02-21T09:13:45.3723650Z // end inline asm 2026-02-21T09:13:45.3723787Z cvt.u64.u32 %rd416, %r1156; 2026-02-21T09:13:45.3723934Z cvt.u64.u32 %rd417, %r1157; 2026-02-21T09:13:45.3724103Z shl.b64 %rd418, %rd417, 32; 2026-02-21T09:13:45.3724259Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T09:13:45.3724407Z cvt.u64.u32 %rd420, %r1158; 2026-02-21T09:13:45.3724557Z cvt.u64.u32 %rd421, %r1159; 2026-02-21T09:13:45.3724737Z shl.b64 %rd422, %rd421, 32; 2026-02-21T09:13:45.3724891Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T09:13:45.3725039Z cvt.u64.u32 %rd424, %r1160; 2026-02-21T09:13:45.3725190Z cvt.u64.u32 %rd425, %r1161; 2026-02-21T09:13:45.3725344Z shl.b64 %rd426, %rd425, 32; 2026-02-21T09:13:45.3725492Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T09:13:45.3725647Z cvt.u64.u32 %rd428, %r1162; 2026-02-21T09:13:45.3725790Z cvt.u64.u32 %rd429, %r1163; 2026-02-21T09:13:45.3725943Z shl.b64 %rd430, %rd429, 32; 2026-02-21T09:13:45.3726091Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T09:13:45.3726250Z cvt.u64.u32 %rd432, %r1164; 2026-02-21T09:13:45.3726396Z cvt.u64.u32 %rd433, %r1165; 2026-02-21T09:13:45.3726551Z shl.b64 %rd434, %rd433, 32; 2026-02-21T09:13:45.3726699Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T09:13:45.3726856Z cvt.u64.u32 %rd436, %r1166; 2026-02-21T09:13:45.3727008Z cvt.u64.u32 %rd437, %r1167; 2026-02-21T09:13:45.3727151Z shl.b64 %rd438, %rd437, 32; 2026-02-21T09:13:45.3727304Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T09:13:45.3727452Z cvt.u64.u32 %rd440, %r1168; 2026-02-21T09:13:45.3727602Z cvt.u64.u32 %rd441, %r1169; 2026-02-21T09:13:45.3727744Z shl.b64 %rd442, %rd441, 32; 2026-02-21T09:13:45.3727896Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T09:13:45.3728043Z cvt.u64.u32 %rd444, %r1170; 2026-02-21T09:13:45.3728197Z cvt.u64.u32 %rd445, %r1171; 
2026-02-21T09:13:45.3728344Z shl.b64 %rd446, %rd445, 32; 2026-02-21T09:13:45.3728488Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T09:13:45.3728640Z // begin inline asm 2026-02-21T09:13:45.3729017Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1173, %r1174, %r1175, %r1176, %r1177, %r1178, %r1179, %r1180, %r1181, %r1182, %r1183, %r1184, %r1185, %r1186, %r1187, %r1188}, [%r188 + 368]; 2026-02-21T09:13:45.3729441Z // end inline asm 2026-02-21T09:13:45.3729572Z cvt.u64.u32 %rd448, %r1173; 2026-02-21T09:13:45.3729723Z cvt.u64.u32 %rd449, %r1174; 2026-02-21T09:13:45.3729867Z shl.b64 %rd450, %rd449, 32; 2026-02-21T09:13:45.3730021Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T09:13:45.3730176Z cvt.u64.u32 %rd452, %r1175; 2026-02-21T09:13:45.3730318Z cvt.u64.u32 %rd453, %r1176; 2026-02-21T09:13:45.3730470Z shl.b64 %rd454, %rd453, 32; 2026-02-21T09:13:45.3730616Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T09:13:45.3730800Z cvt.u64.u32 %rd456, %r1177; 2026-02-21T09:13:45.3730943Z cvt.u64.u32 %rd457, %r1178; 2026-02-21T09:13:45.3731092Z shl.b64 %rd458, %rd457, 32; 2026-02-21T09:13:45.3731235Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T09:13:45.3731390Z cvt.u64.u32 %rd460, %r1179; 2026-02-21T09:13:45.3731559Z cvt.u64.u32 %rd461, %r1180; 2026-02-21T09:13:45.3731707Z shl.b64 %rd462, %rd461, 32; 2026-02-21T09:13:45.3731860Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T09:13:45.3732009Z cvt.u64.u32 %rd464, %r1181; 2026-02-21T09:13:45.3732161Z cvt.u64.u32 %rd465, %r1182; 2026-02-21T09:13:45.3732304Z shl.b64 %rd466, %rd465, 32; 2026-02-21T09:13:45.3732453Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T09:13:45.3732599Z cvt.u64.u32 %rd468, %r1183; 2026-02-21T09:13:45.3732748Z cvt.u64.u32 %rd469, %r1184; 2026-02-21T09:13:45.3732891Z shl.b64 %rd470, %rd469, 32; 2026-02-21T09:13:45.3733044Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T09:13:45.3733224Z cvt.u64.u32 %rd472, %r1185; 2026-02-21T09:13:45.3733374Z cvt.u64.u32 %rd473, %r1186; 2026-02-21T09:13:45.3733530Z shl.b64 %rd474, %rd473, 32; 
2026-02-21T09:13:45.3733681Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T09:13:45.3733840Z cvt.u64.u32 %rd476, %r1187; 2026-02-21T09:13:45.3733987Z cvt.u64.u32 %rd477, %r1188; 2026-02-21T09:13:45.3734145Z shl.b64 %rd478, %rd477, 32; 2026-02-21T09:13:45.3734296Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T09:13:45.3734455Z // begin inline asm 2026-02-21T09:13:45.3734889Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1190, %r1191, %r1192, %r1193, %r1194, %r1195, %r1196, %r1197, %r1198, %r1199, %r1200, %r1201, %r1202, %r1203, %r1204, %r1205}, [%r188 + 384]; 2026-02-21T09:13:45.3735312Z // end inline asm 2026-02-21T09:13:45.3735448Z // begin inline asm 2026-02-21T09:13:45.3735822Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215, %r1216, %r1217, %r1218, %r1219, %r1220, %r1221, %r1222}, [%r188 + 400]; 2026-02-21T09:13:45.3736221Z // end inline asm 2026-02-21T09:13:45.3736355Z cvt.u64.u32 %rd480, %r1215; 2026-02-21T09:13:45.3736513Z cvt.u64.u32 %rd481, %r1216; 2026-02-21T09:13:45.3736659Z shl.b64 %rd482, %rd481, 32; 2026-02-21T09:13:45.3736816Z or.b64 %rd483, %rd480, %rd482; 2026-02-21T09:13:45.3736977Z cvt.u64.u32 %rd484, %r1217; 2026-02-21T09:13:45.3737121Z cvt.u64.u32 %rd485, %r1218; 2026-02-21T09:13:45.3737278Z shl.b64 %rd486, %rd485, 32; 2026-02-21T09:13:45.3737424Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T09:13:45.3737579Z cvt.u64.u32 %rd488, %r1219; 2026-02-21T09:13:45.3737722Z cvt.u64.u32 %rd489, %r1220; 2026-02-21T09:13:45.3737875Z shl.b64 %rd490, %rd489, 32; 2026-02-21T09:13:45.3738021Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T09:13:45.3738178Z cvt.u64.u32 %rd492, %r1221; 2026-02-21T09:13:45.3738328Z cvt.u64.u32 %rd493, %r1222; 2026-02-21T09:13:45.3738473Z shl.b64 %rd494, %rd493, 32; 2026-02-21T09:13:45.3738628Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T09:13:45.3738776Z // begin inline asm 2026-02-21T09:13:45.3739152Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1224, %r1225, %r1226, %r1227, %r1228, %r1229, 
%r1230, %r1231, %r1232, %r1233, %r1234, %r1235, %r1236, %r1237, %r1238, %r1239}, [%r188 + 416]; 2026-02-21T09:13:45.3739567Z // end inline asm 2026-02-21T09:13:45.3739707Z cvt.u64.u32 %rd496, %r1224; 2026-02-21T09:13:45.3739850Z cvt.u64.u32 %rd497, %r1225; 2026-02-21T09:13:45.3740000Z shl.b64 %rd498, %rd497, 32; 2026-02-21T09:13:45.3740150Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T09:13:45.3740319Z cvt.u64.u32 %rd500, %r1226; 2026-02-21T09:13:45.3740477Z cvt.u64.u32 %rd501, %r1227; 2026-02-21T09:13:45.3740629Z shl.b64 %rd502, %rd501, 32; 2026-02-21T09:13:45.3740789Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T09:13:45.3740943Z cvt.u64.u32 %rd504, %r1228; 2026-02-21T09:13:45.3741100Z cvt.u64.u32 %rd505, %r1229; 2026-02-21T09:13:45.3741251Z shl.b64 %rd506, %rd505, 32; 2026-02-21T09:13:45.3741413Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T09:13:45.3741571Z cvt.u64.u32 %rd508, %r1230; 2026-02-21T09:13:45.3741759Z cvt.u64.u32 %rd509, %r1231; 2026-02-21T09:13:45.3741921Z shl.b64 %rd510, %rd509, 32; 2026-02-21T09:13:45.3742073Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T09:13:45.3742236Z cvt.u64.u32 %rd512, %r1232; 2026-02-21T09:13:45.3742422Z cvt.u64.u32 %rd513, %r1233; 2026-02-21T09:13:45.3742582Z shl.b64 %rd514, %rd513, 32; 2026-02-21T09:13:45.3742734Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T09:13:45.3742895Z cvt.u64.u32 %rd516, %r1234; 2026-02-21T09:13:45.3743049Z cvt.u64.u32 %rd517, %r1235; 2026-02-21T09:13:45.3743208Z shl.b64 %rd518, %rd517, 32; 2026-02-21T09:13:45.3743368Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T09:13:45.3743525Z cvt.u64.u32 %rd520, %r1236; 2026-02-21T09:13:45.3743685Z cvt.u64.u32 %rd521, %r1237; 2026-02-21T09:13:45.3743836Z shl.b64 %rd522, %rd521, 32; 2026-02-21T09:13:45.3743996Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T09:13:45.3744148Z cvt.u64.u32 %rd524, %r1238; 2026-02-21T09:13:45.3744335Z cvt.u64.u32 %rd525, %r1239; 2026-02-21T09:13:45.3744491Z shl.b64 %rd526, %rd525, 32; 2026-02-21T09:13:45.3744652Z or.b64 %rd527, %rd524, %rd526; 
2026-02-21T09:13:45.3744839Z // begin inline asm 2026-02-21T09:13:45.3745221Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249, %r1250, %r1251, %r1252, %r1253, %r1254, %r1255, %r1256}, [%r188 + 432]; 2026-02-21T09:13:45.3745641Z // end inline asm 2026-02-21T09:13:45.3745778Z cvt.u64.u32 %rd528, %r1241; 2026-02-21T09:13:45.3745961Z cvt.u64.u32 %rd529, %r1242; 2026-02-21T09:13:45.3746122Z shl.b64 %rd530, %rd529, 32; 2026-02-21T09:13:45.3746284Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T09:13:45.3746439Z cvt.u64.u32 %rd532, %r1243; 2026-02-21T09:13:45.3746596Z cvt.u64.u32 %rd533, %r1244; 2026-02-21T09:13:45.3746747Z shl.b64 %rd534, %rd533, 32; 2026-02-21T09:13:45.3746908Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T09:13:45.3747069Z cvt.u64.u32 %rd536, %r1245; 2026-02-21T09:13:45.3747221Z cvt.u64.u32 %rd537, %r1246; 2026-02-21T09:13:45.3747279Z shl.b64 %rd538, %rd537, 32; 2026-02-21T09:13:45.3747345Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T09:13:45.3747401Z cvt.u64.u32 %rd540, %r1247; 2026-02-21T09:13:45.3747458Z cvt.u64.u32 %rd541, %r1248; 2026-02-21T09:13:45.3747522Z shl.b64 %rd542, %rd541, 32; 2026-02-21T09:13:45.3747581Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T09:13:45.3747636Z cvt.u64.u32 %rd544, %r1249; 2026-02-21T09:13:45.3747692Z cvt.u64.u32 %rd545, %r1250; 2026-02-21T09:13:45.3747756Z shl.b64 %rd546, %rd545, 32; 2026-02-21T09:13:45.3747815Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T09:13:45.3747871Z cvt.u64.u32 %rd548, %r1251; 2026-02-21T09:13:45.3747942Z cvt.u64.u32 %rd549, %r1252; 2026-02-21T09:13:45.3747995Z shl.b64 %rd550, %rd549, 32; 2026-02-21T09:13:45.3748051Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T09:13:45.3748112Z cvt.u64.u32 %rd552, %r1253; 2026-02-21T09:13:45.3748165Z cvt.u64.u32 %rd553, %r1254; 2026-02-21T09:13:45.3748219Z shl.b64 %rd554, %rd553, 32; 2026-02-21T09:13:45.3748274Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T09:13:45.3748334Z cvt.u64.u32 %rd556, %r1255; 
2026-02-21T09:13:45.3748386Z cvt.u64.u32 %rd557, %r1256; 2026-02-21T09:13:45.3748439Z shl.b64 %rd558, %rd557, 32; 2026-02-21T09:13:45.3748500Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T09:13:45.3748556Z // begin inline asm 2026-02-21T09:13:45.3748849Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266, %r1267, %r1268, %r1269, %r1270, %r1271, %r1272, %r1273}, [%r188 + 448]; 2026-02-21T09:13:45.3748901Z // end inline asm 2026-02-21T09:13:45.3748961Z // begin inline asm 2026-02-21T09:13:45.3749239Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283, %r1284, %r1285, %r1286, %r1287, %r1288, %r1289, %r1290}, [%r188 + 464]; 2026-02-21T09:13:45.3749292Z // end inline asm 2026-02-21T09:13:45.3749353Z cvt.u64.u32 %rd560, %r1283; 2026-02-21T09:13:45.3749408Z cvt.u64.u32 %rd561, %r1284; 2026-02-21T09:13:45.3749490Z shl.b64 %rd562, %rd561, 32; 2026-02-21T09:13:45.3749552Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T09:13:45.3749605Z cvt.u64.u32 %rd564, %r1285; 2026-02-21T09:13:45.3749658Z cvt.u64.u32 %rd565, %r1286; 2026-02-21T09:13:45.3749757Z shl.b64 %rd566, %rd565, 32; 2026-02-21T09:13:45.3749820Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T09:13:45.3749874Z cvt.u64.u32 %rd568, %r1287; 2026-02-21T09:13:45.3749927Z cvt.u64.u32 %rd569, %r1288; 2026-02-21T09:13:45.3749987Z shl.b64 %rd570, %rd569, 32; 2026-02-21T09:13:45.3750043Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T09:13:45.3750097Z cvt.u64.u32 %rd572, %r1289; 2026-02-21T09:13:45.3750151Z cvt.u64.u32 %rd573, %r1290; 2026-02-21T09:13:45.3750214Z shl.b64 %rd574, %rd573, 32; 2026-02-21T09:13:45.3750269Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T09:13:45.3750322Z // begin inline asm 2026-02-21T09:13:45.3750635Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300, %r1301, %r1302, %r1303, %r1304, %r1305, %r1306, %r1307}, [%r188 + 480]; 
2026-02-21T09:13:45.3750692Z // end inline asm 2026-02-21T09:13:45.3750749Z cvt.u64.u32 %rd576, %r1292; 2026-02-21T09:13:45.3750811Z cvt.u64.u32 %rd577, %r1293; 2026-02-21T09:13:45.3750867Z shl.b64 %rd578, %rd577, 32; 2026-02-21T09:13:45.3750922Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T09:13:45.3750977Z cvt.u64.u32 %rd580, %r1294; 2026-02-21T09:13:45.3751039Z cvt.u64.u32 %rd581, %r1295; 2026-02-21T09:13:45.3751118Z shl.b64 %rd582, %rd581, 32; 2026-02-21T09:13:45.3751176Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T09:13:45.3751236Z cvt.u64.u32 %rd584, %r1296; 2026-02-21T09:13:45.3751290Z cvt.u64.u32 %rd585, %r1297; 2026-02-21T09:13:45.3751344Z shl.b64 %rd586, %rd585, 32; 2026-02-21T09:13:45.3751399Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T09:13:45.3751462Z cvt.u64.u32 %rd588, %r1298; 2026-02-21T09:13:45.3751516Z cvt.u64.u32 %rd589, %r1299; 2026-02-21T09:13:45.3751573Z shl.b64 %rd590, %rd589, 32; 2026-02-21T09:13:45.3751638Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T09:13:45.3751692Z cvt.u64.u32 %rd592, %r1300; 2026-02-21T09:13:45.3751746Z cvt.u64.u32 %rd593, %r1301; 2026-02-21T09:13:45.3751808Z shl.b64 %rd594, %rd593, 32; 2026-02-21T09:13:45.3751867Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T09:13:45.3751922Z cvt.u64.u32 %rd596, %r1302; 2026-02-21T09:13:45.3751976Z cvt.u64.u32 %rd597, %r1303; 2026-02-21T09:13:45.3752038Z shl.b64 %rd598, %rd597, 32; 2026-02-21T09:13:45.3752096Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T09:13:45.3752150Z cvt.u64.u32 %rd600, %r1304; 2026-02-21T09:13:45.3752210Z cvt.u64.u32 %rd601, %r1305; 2026-02-21T09:13:45.3752264Z shl.b64 %rd602, %rd601, 32; 2026-02-21T09:13:45.3752317Z or.b64 %rd603, %rd600, %rd602; 2026-02-21T09:13:45.3752370Z cvt.u64.u32 %rd604, %r1306; 2026-02-21T09:13:45.3752432Z cvt.u64.u32 %rd605, %r1307; 2026-02-21T09:13:45.3752486Z shl.b64 %rd606, %rd605, 32; 2026-02-21T09:13:45.3752540Z or.b64 %rd607, %rd604, %rd606; 2026-02-21T09:13:45.3752601Z // begin inline asm 2026-02-21T09:13:45.3752904Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317, %r1318, %r1319, %r1320, %r1321, %r1322, %r1323, %r1324}, [%r188 + 496]; 2026-02-21T09:13:45.3752957Z // end inline asm 2026-02-21T09:13:45.3753018Z cvt.u64.u32 %rd608, %r1309; 2026-02-21T09:13:45.3753073Z cvt.u64.u32 %rd609, %r1310; 2026-02-21T09:13:45.3753126Z shl.b64 %rd610, %rd609, 32; 2026-02-21T09:13:45.3753182Z or.b64 %rd611, %rd608, %rd610; 2026-02-21T09:13:45.3753244Z cvt.u64.u32 %rd612, %r1311; 2026-02-21T09:13:45.3753298Z cvt.u64.u32 %rd613, %r1312; 2026-02-21T09:13:45.3753353Z shl.b64 %rd614, %rd613, 32; 2026-02-21T09:13:45.3753416Z or.b64 %rd615, %rd612, %rd614; 2026-02-21T09:13:45.3753471Z cvt.u64.u32 %rd616, %r1313; 2026-02-21T09:13:45.3753523Z cvt.u64.u32 %rd617, %r1314; 2026-02-21T09:13:45.3753577Z shl.b64 %rd618, %rd617, 32; 2026-02-21T09:13:45.3753641Z or.b64 %rd619, %rd616, %rd618; 2026-02-21T09:13:45.3753718Z cvt.u64.u32 %rd620, %r1315; 2026-02-21T09:13:45.3753772Z cvt.u64.u32 %rd621, %r1316; 2026-02-21T09:13:45.3753832Z shl.b64 %rd622, %rd621, 32; 2026-02-21T09:13:45.3753889Z or.b64 %rd623, %rd620, %rd622; 2026-02-21T09:13:45.3753966Z cvt.u64.u32 %rd624, %r1317; 2026-02-21T09:13:45.3754019Z cvt.u64.u32 %rd625, %r1318; 2026-02-21T09:13:45.3754081Z shl.b64 %rd626, %rd625, 32; 2026-02-21T09:13:45.3754136Z or.b64 %rd627, %rd624, %rd626; 2026-02-21T09:13:45.3754190Z cvt.u64.u32 %rd628, %r1319; 2026-02-21T09:13:45.3754251Z cvt.u64.u32 %rd629, %r1320; 2026-02-21T09:13:45.3754305Z shl.b64 %rd630, %rd629, 32; 2026-02-21T09:13:45.3754360Z or.b64 %rd631, %rd628, %rd630; 2026-02-21T09:13:45.3754413Z cvt.u64.u32 %rd632, %r1321; 2026-02-21T09:13:45.3754475Z cvt.u64.u32 %rd633, %r1322; 2026-02-21T09:13:45.3754529Z shl.b64 %rd634, %rd633, 32; 2026-02-21T09:13:45.3754584Z or.b64 %rd635, %rd632, %rd634; 2026-02-21T09:13:45.3754703Z cvt.u64.u32 %rd636, %r1323; 2026-02-21T09:13:45.3754763Z cvt.u64.u32 %rd637, %r1324; 2026-02-21T09:13:45.3754817Z shl.b64 
%rd638, %rd637, 32; 2026-02-21T09:13:45.3754878Z or.b64 %rd639, %rd636, %rd638; 2026-02-21T09:13:45.3754930Z // begin inline asm 2026-02-21T09:13:45.3755000Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:13:45.3755052Z // end inline asm 2026-02-21T09:13:45.3755112Z bar.sync 0, 128; 2026-02-21T09:13:45.3755165Z // begin inline asm 2026-02-21T09:13:45.3755258Z @%p145 mbarrier.arrive.shared::cta.b64 _, [%r2115]; 2026-02-21T09:13:45.3755343Z // end inline asm 2026-02-21T09:13:45.3755402Z cvt.u64.u32 %rd640, %r782; 2026-02-21T09:13:45.3755459Z cvt.u64.u32 %rd641, %r783; 2026-02-21T09:13:45.3755513Z shl.b64 %rd642, %rd641, 32; 2026-02-21T09:13:45.3755577Z or.b64 %rd643, %rd640, %rd642; 2026-02-21T09:13:45.3755748Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3755812Z mov.b64 {%r1332, %r1333}, %rd643; 2026-02-21T09:13:45.3755893Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T09:13:45.3756059Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3756115Z cvt.u64.u32 %rd644, %r784; 2026-02-21T09:13:45.3756176Z cvt.u64.u32 %rd645, %r785; 2026-02-21T09:13:45.3756232Z shl.b64 %rd646, %rd645, 32; 2026-02-21T09:13:45.3756287Z or.b64 %rd647, %rd644, %rd646; 2026-02-21T09:13:45.3756451Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3756519Z mov.b64 {%r1335, %r1336}, %rd647; 2026-02-21T09:13:45.3756588Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 2026-02-21T09:13:45.3756754Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3756817Z cvt.u64.u32 %rd648, %r786; 2026-02-21T09:13:45.3756873Z cvt.u64.u32 %rd649, %r787; 2026-02-21T09:13:45.3756929Z shl.b64 %rd650, %rd649, 32; 2026-02-21T09:13:45.3756992Z or.b64 %rd651, %rd648, %rd650; 2026-02-21T09:13:45.3757151Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3757210Z mov.b64 {%r1338, 
%r1339}, %rd651; 2026-02-21T09:13:45.3757276Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T09:13:45.3757442Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3757497Z cvt.u64.u32 %rd652, %r788; 2026-02-21T09:13:45.3757553Z cvt.u64.u32 %rd653, %r789; 2026-02-21T09:13:45.3757614Z shl.b64 %rd654, %rd653, 32; 2026-02-21T09:13:45.3757670Z or.b64 %rd655, %rd652, %rd654; 2026-02-21T09:13:45.3757827Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3757891Z mov.b64 {%r1341, %r1342}, %rd655; 2026-02-21T09:13:45.3757957Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T09:13:45.3758120Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3758205Z cvt.u64.u32 %rd656, %r790; 2026-02-21T09:13:45.3758270Z cvt.u64.u32 %rd657, %r791; 2026-02-21T09:13:45.3758325Z shl.b64 %rd658, %rd657, 32; 2026-02-21T09:13:45.3758380Z or.b64 %rd659, %rd656, %rd658; 2026-02-21T09:13:45.3758571Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3758629Z mov.b64 {%r1344, %r1345}, %rd659; 2026-02-21T09:13:45.3758694Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T09:13:45.3758858Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3758915Z cvt.u64.u32 %rd660, %r792; 2026-02-21T09:13:45.3758970Z cvt.u64.u32 %rd661, %r793; 2026-02-21T09:13:45.3759025Z shl.b64 %rd662, %rd661, 32; 2026-02-21T09:13:45.3759087Z or.b64 %rd663, %rd660, %rd662; 2026-02-21T09:13:45.3759272Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3759332Z mov.b64 {%r1347, %r1348}, %rd663; 2026-02-21T09:13:45.3759401Z cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T09:13:45.3759563Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3759621Z cvt.u64.u32 %rd664, %r794; 
2026-02-21T09:13:45.3759683Z cvt.u64.u32 %rd665, %r795; 2026-02-21T09:13:45.3759738Z shl.b64 %rd666, %rd665, 32; 2026-02-21T09:13:45.3759814Z or.b64 %rd667, %rd664, %rd666; 2026-02-21T09:13:45.3759980Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3760046Z mov.b64 {%r1350, %r1351}, %rd667; 2026-02-21T09:13:45.3760111Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T09:13:45.3760269Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3760334Z cvt.u64.u32 %rd668, %r796; 2026-02-21T09:13:45.3760389Z cvt.u64.u32 %rd669, %r797; 2026-02-21T09:13:45.3760446Z shl.b64 %rd670, %rd669, 32; 2026-02-21T09:13:45.3760509Z or.b64 %rd671, %rd668, %rd670; 2026-02-21T09:13:45.3760665Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3760724Z mov.b64 {%r1353, %r1354}, %rd671; 2026-02-21T09:13:45.3760788Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T09:13:45.3760952Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3761008Z cvt.u64.u32 %rd672, %r799; 2026-02-21T09:13:45.3761064Z cvt.u64.u32 %rd673, %r800; 2026-02-21T09:13:45.3761127Z shl.b64 %rd674, %rd673, 32; 2026-02-21T09:13:45.3761185Z or.b64 %rd675, %rd672, %rd674; 2026-02-21T09:13:45.3761343Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3761410Z mov.b64 {%r1356, %r1357}, %rd675; 2026-02-21T09:13:45.3761475Z cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T09:13:45.3761631Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3761688Z cvt.u64.u32 %rd676, %r801; 2026-02-21T09:13:45.3761752Z cvt.u64.u32 %rd677, %r802; 2026-02-21T09:13:45.3761811Z shl.b64 %rd678, %rd677, 32; 2026-02-21T09:13:45.3761867Z or.b64 %rd679, %rd676, %rd678; 2026-02-21T09:13:45.3762032Z .loc 1 58 27 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3762091Z mov.b64 {%r1359, %r1360}, %rd679; 2026-02-21T09:13:45.3762156Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T09:13:45.3762320Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3762375Z cvt.u64.u32 %rd680, %r803; 2026-02-21T09:13:45.3762432Z cvt.u64.u32 %rd681, %r804; 2026-02-21T09:13:45.3762487Z shl.b64 %rd682, %rd681, 32; 2026-02-21T09:13:45.3762552Z or.b64 %rd683, %rd680, %rd682; 2026-02-21T09:13:45.3762740Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3762798Z mov.b64 {%r1362, %r1363}, %rd683; 2026-02-21T09:13:45.3762869Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T09:13:45.3763053Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3763108Z cvt.u64.u32 %rd684, %r805; 2026-02-21T09:13:45.3763168Z cvt.u64.u32 %rd685, %r806; 2026-02-21T09:13:45.3763223Z shl.b64 %rd686, %rd685, 32; 2026-02-21T09:13:45.3763279Z or.b64 %rd687, %rd684, %rd686; 2026-02-21T09:13:45.3763432Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3763496Z mov.b64 {%r1365, %r1366}, %rd687; 2026-02-21T09:13:45.3763561Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T09:13:45.3763738Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3763804Z cvt.u64.u32 %rd688, %r807; 2026-02-21T09:13:45.3763857Z cvt.u64.u32 %rd689, %r808; 2026-02-21T09:13:45.3763913Z shl.b64 %rd690, %rd689, 32; 2026-02-21T09:13:45.3763976Z or.b64 %rd691, %rd688, %rd690; 2026-02-21T09:13:45.3764135Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3764192Z mov.b64 {%r1368, %r1369}, %rd691; 2026-02-21T09:13:45.3764280Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T09:13:45.3764449Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3764504Z cvt.u64.u32 %rd692, %r809; 2026-02-21T09:13:45.3764558Z cvt.u64.u32 %rd693, %r810; 2026-02-21T09:13:45.3764619Z shl.b64 %rd694, %rd693, 32; 2026-02-21T09:13:45.3764704Z or.b64 %rd695, %rd692, %rd694; 2026-02-21T09:13:45.3764871Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3764934Z mov.b64 {%r1371, %r1372}, %rd695; 2026-02-21T09:13:45.3764998Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T09:13:45.3765159Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3765216Z cvt.u64.u32 %rd696, %r811; 2026-02-21T09:13:45.3765278Z cvt.u64.u32 %rd697, %r812; 2026-02-21T09:13:45.3765333Z shl.b64 %rd698, %rd697, 32; 2026-02-21T09:13:45.3765388Z or.b64 %rd699, %rd696, %rd698; 2026-02-21T09:13:45.3765554Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3765610Z mov.b64 {%r1374, %r1375}, %rd699; 2026-02-21T09:13:45.3765672Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T09:13:45.3765840Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3765896Z cvt.u64.u32 %rd700, %r813; 2026-02-21T09:13:45.3765951Z cvt.u64.u32 %rd701, %r814; 2026-02-21T09:13:45.3766007Z shl.b64 %rd702, %rd701, 32; 2026-02-21T09:13:45.3766071Z or.b64 %rd703, %rd700, %rd702; 2026-02-21T09:13:45.3766234Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3766291Z mov.b64 {%r1377, %r1378}, %rd703; 2026-02-21T09:13:45.3766362Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T09:13:45.3766423Z mov.b64 {%r1380, %r1381}, %rd91; 2026-02-21T09:13:45.3766488Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T09:13:45.3766555Z mov.b64 {%r1383, %r1384}, %rd95; 2026-02-21T09:13:45.3766617Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T09:13:45.3766676Z mov.b64 {%r1386, 
%r1387}, %rd99; 2026-02-21T09:13:45.3766739Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T09:13:45.3766803Z mov.b64 {%r1389, %r1390}, %rd103; 2026-02-21T09:13:45.3766864Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T09:13:45.3766921Z mov.b64 {%r1392, %r1393}, %rd107; 2026-02-21T09:13:45.3767043Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T09:13:45.3767099Z mov.b64 {%r1395, %r1396}, %rd111; 2026-02-21T09:13:45.3767160Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T09:13:45.3767215Z mov.b64 {%r1398, %r1399}, %rd115; 2026-02-21T09:13:45.3767315Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T09:13:45.3767372Z mov.b64 {%r1401, %r1402}, %rd119; 2026-02-21T09:13:45.3767433Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T09:13:45.3767499Z mov.b64 {%r1404, %r1405}, %rd123; 2026-02-21T09:13:45.3767562Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T09:13:45.3767618Z mov.b64 {%r1407, %r1408}, %rd127; 2026-02-21T09:13:45.3767689Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T09:13:45.3767746Z mov.b64 {%r1410, %r1411}, %rd131; 2026-02-21T09:13:45.3767809Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T09:13:45.3767865Z mov.b64 {%r1413, %r1414}, %rd135; 2026-02-21T09:13:45.3767962Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T09:13:45.3768021Z mov.b64 {%r1416, %r1417}, %rd139; 2026-02-21T09:13:45.3768083Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T09:13:45.3768143Z mov.b64 {%r1419, %r1420}, %rd143; 2026-02-21T09:13:45.3768204Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 2026-02-21T09:13:45.3768259Z mov.b64 {%r1422, %r1423}, %rd147; 2026-02-21T09:13:45.3768321Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T09:13:45.3768380Z mov.b64 {%r1425, %r1426}, %rd151; 2026-02-21T09:13:45.3768463Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T09:13:45.3768631Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3768694Z cvt.u64.u32 %rd704, %r850; 
2026-02-21T09:13:45.3768749Z cvt.u64.u32 %rd705, %r851; 2026-02-21T09:13:45.3768805Z shl.b64 %rd706, %rd705, 32; 2026-02-21T09:13:45.3768869Z or.b64 %rd707, %rd704, %rd706; 2026-02-21T09:13:45.3769034Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3769091Z mov.b64 {%r1428, %r1429}, %rd707; 2026-02-21T09:13:45.3769152Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 2026-02-21T09:13:45.3769322Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3769378Z cvt.u64.u32 %rd708, %r852; 2026-02-21T09:13:45.3769434Z cvt.u64.u32 %rd709, %r853; 2026-02-21T09:13:45.3769500Z shl.b64 %rd710, %rd709, 32; 2026-02-21T09:13:45.3769558Z or.b64 %rd711, %rd708, %rd710; 2026-02-21T09:13:45.3769720Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3769783Z mov.b64 {%r1431, %r1432}, %rd711; 2026-02-21T09:13:45.3769845Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T09:13:45.3770006Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3770062Z cvt.u64.u32 %rd712, %r854; 2026-02-21T09:13:45.3770125Z cvt.u64.u32 %rd713, %r855; 2026-02-21T09:13:45.3770182Z shl.b64 %rd714, %rd713, 32; 2026-02-21T09:13:45.3770238Z or.b64 %rd715, %rd712, %rd714; 2026-02-21T09:13:45.3770404Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3770461Z mov.b64 {%r1434, %r1435}, %rd715; 2026-02-21T09:13:45.3770522Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T09:13:45.3770690Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3770744Z cvt.u64.u32 %rd716, %r856; 2026-02-21T09:13:45.3770799Z cvt.u64.u32 %rd717, %r857; 2026-02-21T09:13:45.3770854Z shl.b64 %rd718, %rd717, 32; 2026-02-21T09:13:45.3770916Z or.b64 %rd719, %rd716, %rd718; 2026-02-21T09:13:45.3771077Z .loc 1 58 27 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3771132Z mov.b64 {%r1437, %r1438}, %rd719; 2026-02-21T09:13:45.3771204Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T09:13:45.3771387Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3771442Z cvt.u64.u32 %rd720, %r858; 2026-02-21T09:13:45.3771505Z cvt.u64.u32 %rd721, %r859; 2026-02-21T09:13:45.3771582Z shl.b64 %rd722, %rd721, 32; 2026-02-21T09:13:45.3771638Z or.b64 %rd723, %rd720, %rd722; 2026-02-21T09:13:45.3771797Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3771861Z mov.b64 {%r1440, %r1441}, %rd723; 2026-02-21T09:13:45.3771924Z cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T09:13:45.3772084Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3772147Z cvt.u64.u32 %rd724, %r860; 2026-02-21T09:13:45.3772201Z cvt.u64.u32 %rd725, %r861; 2026-02-21T09:13:45.3772256Z shl.b64 %rd726, %rd725, 32; 2026-02-21T09:13:45.3772339Z or.b64 %rd727, %rd724, %rd726; 2026-02-21T09:13:45.3772499Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3772553Z mov.b64 {%r1443, %r1444}, %rd727; 2026-02-21T09:13:45.3772615Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T09:13:45.3772780Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3772835Z cvt.u64.u32 %rd728, %r862; 2026-02-21T09:13:45.3772889Z cvt.u64.u32 %rd729, %r863; 2026-02-21T09:13:45.3772972Z shl.b64 %rd730, %rd729, 32; 2026-02-21T09:13:45.3773028Z or.b64 %rd731, %rd728, %rd730; 2026-02-21T09:13:45.3773185Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3773248Z mov.b64 {%r1446, %r1447}, %rd731; 2026-02-21T09:13:45.3773310Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T09:13:45.3773464Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3773522Z cvt.u64.u32 %rd732, %r864; 2026-02-21T09:13:45.3773584Z cvt.u64.u32 %rd733, %r865; 2026-02-21T09:13:45.3773638Z shl.b64 %rd734, %rd733, 32; 2026-02-21T09:13:45.3773693Z or.b64 %rd735, %rd732, %rd734; 2026-02-21T09:13:45.3773855Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3773911Z mov.b64 {%r1449, %r1450}, %rd735; 2026-02-21T09:13:45.3773974Z cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T09:13:45.3774136Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3774191Z cvt.u64.u32 %rd736, %r867; 2026-02-21T09:13:45.3774245Z cvt.u64.u32 %rd737, %r868; 2026-02-21T09:13:45.3774301Z shl.b64 %rd738, %rd737, 32; 2026-02-21T09:13:45.3774363Z or.b64 %rd739, %rd736, %rd738; 2026-02-21T09:13:45.3774521Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3774578Z mov.b64 {%r1452, %r1453}, %rd739; 2026-02-21T09:13:45.3774647Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T09:13:45.3774837Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3774894Z cvt.u64.u32 %rd740, %r869; 2026-02-21T09:13:45.3774957Z cvt.u64.u32 %rd741, %r870; 2026-02-21T09:13:45.3775012Z shl.b64 %rd742, %rd741, 32; 2026-02-21T09:13:45.3775067Z or.b64 %rd743, %rd740, %rd742; 2026-02-21T09:13:45.3775225Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3775288Z mov.b64 {%r1455, %r1456}, %rd743; 2026-02-21T09:13:45.3775351Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T09:13:45.3775508Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3775570Z cvt.u64.u32 %rd744, %r871; 2026-02-21T09:13:45.3775625Z cvt.u64.u32 %rd745, %r872; 2026-02-21T09:13:45.3775709Z shl.b64 %rd746, %rd745, 32; 2026-02-21T09:13:45.3775772Z or.b64 
%rd747, %rd744, %rd746; 2026-02-21T09:13:45.3775928Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3776010Z mov.b64 {%r1458, %r1459}, %rd747; 2026-02-21T09:13:45.3776073Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T09:13:45.3776243Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3776299Z cvt.u64.u32 %rd748, %r873; 2026-02-21T09:13:45.3776353Z cvt.u64.u32 %rd749, %r874; 2026-02-21T09:13:45.3776415Z shl.b64 %rd750, %rd749, 32; 2026-02-21T09:13:45.3776470Z or.b64 %rd751, %rd748, %rd750; 2026-02-21T09:13:45.3776628Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3776690Z mov.b64 {%r1461, %r1462}, %rd751; 2026-02-21T09:13:45.3776775Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T09:13:45.3776939Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3776994Z cvt.u64.u32 %rd752, %r875; 2026-02-21T09:13:45.3777063Z cvt.u64.u32 %rd753, %r876; 2026-02-21T09:13:45.3777119Z shl.b64 %rd754, %rd753, 32; 2026-02-21T09:13:45.3777175Z or.b64 %rd755, %rd752, %rd754; 2026-02-21T09:13:45.3777339Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3777431Z mov.b64 {%r1464, %r1465}, %rd755; 2026-02-21T09:13:45.3777497Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T09:13:45.3777663Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3777719Z cvt.u64.u32 %rd756, %r877; 2026-02-21T09:13:45.3777772Z cvt.u64.u32 %rd757, %r878; 2026-02-21T09:13:45.3777827Z shl.b64 %rd758, %rd757, 32; 2026-02-21T09:13:45.3777891Z or.b64 %rd759, %rd756, %rd758; 2026-02-21T09:13:45.3778052Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3778108Z mov.b64 {%r1467, %r1468}, %rd759; 2026-02-21T09:13:45.3778178Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 
2026-02-21T09:13:45.3778336Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3778390Z cvt.u64.u32 %rd760, %r879; 2026-02-21T09:13:45.3778450Z cvt.u64.u32 %rd761, %r880; 2026-02-21T09:13:45.3778505Z shl.b64 %rd762, %rd761, 32; 2026-02-21T09:13:45.3778561Z or.b64 %rd763, %rd760, %rd762; 2026-02-21T09:13:45.3778719Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3778781Z mov.b64 {%r1470, %r1471}, %rd763; 2026-02-21T09:13:45.3778843Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T09:13:45.3778999Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3779063Z cvt.u64.u32 %rd764, %r881; 2026-02-21T09:13:45.3779119Z cvt.u64.u32 %rd765, %r882; 2026-02-21T09:13:45.3779176Z shl.b64 %rd766, %rd765, 32; 2026-02-21T09:13:45.3779239Z or.b64 %rd767, %rd764, %rd766; 2026-02-21T09:13:45.3779396Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3779454Z mov.b64 {%r1473, %r1474}, %rd767; 2026-02-21T09:13:45.3779517Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T09:13:45.3779579Z mov.b64 {%r1476, %r1477}, %rd155; 2026-02-21T09:13:45.3779640Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T09:13:45.3779695Z mov.b64 {%r1479, %r1480}, %rd159; 2026-02-21T09:13:45.3779761Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T09:13:45.3779815Z mov.b64 {%r1482, %r1483}, %rd163; 2026-02-21T09:13:45.3779876Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T09:13:45.3779937Z mov.b64 {%r1485, %r1486}, %rd167; 2026-02-21T09:13:45.3779999Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T09:13:45.3780075Z mov.b64 {%r1488, %r1489}, %rd171; 2026-02-21T09:13:45.3780135Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T09:13:45.3780197Z mov.b64 {%r1491, %r1492}, %rd175; 2026-02-21T09:13:45.3780257Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T09:13:45.3780334Z mov.b64 {%r1494, 
%r1495}, %rd179; 2026-02-21T09:13:45.3780400Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T09:13:45.3780454Z mov.b64 {%r1497, %r1498}, %rd183; 2026-02-21T09:13:45.3780516Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T09:13:45.3780570Z mov.b64 {%r1500, %r1501}, %rd187; 2026-02-21T09:13:45.3780637Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T09:13:45.3780692Z mov.b64 {%r1503, %r1504}, %rd191; 2026-02-21T09:13:45.3780753Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T09:13:45.3780814Z mov.b64 {%r1506, %r1507}, %rd195; 2026-02-21T09:13:45.3780896Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T09:13:45.3780952Z mov.b64 {%r1509, %r1510}, %rd199; 2026-02-21T09:13:45.3781020Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T09:13:45.3781074Z mov.b64 {%r1512, %r1513}, %rd203; 2026-02-21T09:13:45.3781135Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T09:13:45.3781189Z mov.b64 {%r1515, %r1516}, %rd207; 2026-02-21T09:13:45.3781259Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T09:13:45.3781313Z mov.b64 {%r1518, %r1519}, %rd211; 2026-02-21T09:13:45.3781373Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T09:13:45.3781464Z mov.b64 {%r1521, %r1522}, %rd215; 2026-02-21T09:13:45.3781526Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T09:13:45.3781688Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3781750Z cvt.u64.u32 %rd768, %r918; 2026-02-21T09:13:45.3781806Z cvt.u64.u32 %rd769, %r919; 2026-02-21T09:13:45.3781861Z shl.b64 %rd770, %rd769, 32; 2026-02-21T09:13:45.3781917Z or.b64 %rd771, %rd768, %rd770; 2026-02-21T09:13:45.3782085Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3782139Z mov.b64 {%r1524, %r1525}, %rd771; 2026-02-21T09:13:45.3782199Z cvt.rn.f16x2.f32 %r1526, %r1525, %r1524; 2026-02-21T09:13:45.3782366Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3782422Z 
cvt.u64.u32 %rd772, %r920; 2026-02-21T09:13:45.3782476Z cvt.u64.u32 %rd773, %r921; 2026-02-21T09:13:45.3782540Z shl.b64 %rd774, %rd773, 32; 2026-02-21T09:13:45.3782596Z or.b64 %rd775, %rd772, %rd774; 2026-02-21T09:13:45.3782757Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3782813Z mov.b64 {%r1527, %r1528}, %rd775; 2026-02-21T09:13:45.3782883Z cvt.rn.f16x2.f32 %r1529, %r1528, %r1527; 2026-02-21T09:13:45.3783046Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3783103Z cvt.u64.u32 %rd776, %r922; 2026-02-21T09:13:45.3783166Z cvt.u64.u32 %rd777, %r923; 2026-02-21T09:13:45.3783220Z shl.b64 %rd778, %rd777, 32; 2026-02-21T09:13:45.3783275Z or.b64 %rd779, %rd776, %rd778; 2026-02-21T09:13:45.3783444Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3783501Z mov.b64 {%r1530, %r1531}, %rd779; 2026-02-21T09:13:45.3783563Z cvt.rn.f16x2.f32 %r1532, %r1531, %r1530; 2026-02-21T09:13:45.3783721Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3783785Z cvt.u64.u32 %rd780, %r924; 2026-02-21T09:13:45.3783840Z cvt.u64.u32 %rd781, %r925; 2026-02-21T09:13:45.3783894Z shl.b64 %rd782, %rd781, 32; 2026-02-21T09:13:45.3783955Z or.b64 %rd783, %rd780, %rd782; 2026-02-21T09:13:45.3784113Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3784192Z mov.b64 {%r1533, %r1534}, %rd783; 2026-02-21T09:13:45.3784263Z cvt.rn.f16x2.f32 %r1535, %r1534, %r1533; 2026-02-21T09:13:45.3784430Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3784509Z cvt.u64.u32 %rd784, %r926; 2026-02-21T09:13:45.3784567Z cvt.u64.u32 %rd785, %r927; 2026-02-21T09:13:45.3784630Z shl.b64 %rd786, %rd785, 32; 2026-02-21T09:13:45.3784717Z or.b64 %rd787, %rd784, %rd786; 2026-02-21T09:13:45.3784887Z .loc 1 58 27 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3784953Z mov.b64 {%r1536, %r1537}, %rd787; 2026-02-21T09:13:45.3785017Z cvt.rn.f16x2.f32 %r1538, %r1537, %r1536; 2026-02-21T09:13:45.3785183Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3785248Z cvt.u64.u32 %rd788, %r928; 2026-02-21T09:13:45.3785338Z cvt.u64.u32 %rd789, %r929; 2026-02-21T09:13:45.3785398Z shl.b64 %rd790, %rd789, 32; 2026-02-21T09:13:45.3785458Z or.b64 %rd791, %rd788, %rd790; 2026-02-21T09:13:45.3785637Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3785698Z mov.b64 {%r1539, %r1540}, %rd791; 2026-02-21T09:13:45.3785763Z cvt.rn.f16x2.f32 %r1541, %r1540, %r1539; 2026-02-21T09:13:45.3785942Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3786026Z cvt.u64.u32 %rd792, %r930; 2026-02-21T09:13:45.3786085Z cvt.u64.u32 %rd793, %r931; 2026-02-21T09:13:45.3786151Z shl.b64 %rd794, %rd793, 32; 2026-02-21T09:13:45.3786211Z or.b64 %rd795, %rd792, %rd794; 2026-02-21T09:13:45.3786380Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3786438Z mov.b64 {%r1542, %r1543}, %rd795; 2026-02-21T09:13:45.3786513Z cvt.rn.f16x2.f32 %r1544, %r1543, %r1542; 2026-02-21T09:13:45.3786679Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3786737Z cvt.u64.u32 %rd796, %r932; 2026-02-21T09:13:45.3786802Z cvt.u64.u32 %rd797, %r933; 2026-02-21T09:13:45.3786861Z shl.b64 %rd798, %rd797, 32; 2026-02-21T09:13:45.3786920Z or.b64 %rd799, %rd796, %rd798; 2026-02-21T09:13:45.3787094Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3787153Z mov.b64 {%r1545, %r1546}, %rd799; 2026-02-21T09:13:45.3787219Z cvt.rn.f16x2.f32 %r1547, %r1546, %r1545; 2026-02-21T09:13:45.3787386Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3787450Z cvt.u64.u32 %rd800, %r935; 2026-02-21T09:13:45.3787507Z cvt.u64.u32 %rd801, %r936; 2026-02-21T09:13:45.3787564Z shl.b64 %rd802, %rd801, 32; 2026-02-21T09:13:45.3787630Z or.b64 %rd803, %rd800, %rd802; 2026-02-21T09:13:45.3787795Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3787852Z mov.b64 {%r1548, %r1549}, %rd803; 2026-02-21T09:13:45.3787922Z cvt.rn.f16x2.f32 %r1550, %r1549, %r1548; 2026-02-21T09:13:45.3788087Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3788144Z cvt.u64.u32 %rd804, %r937; 2026-02-21T09:13:45.3788200Z cvt.u64.u32 %rd805, %r938; 2026-02-21T09:13:45.3788266Z shl.b64 %rd806, %rd805, 32; 2026-02-21T09:13:45.3788326Z or.b64 %rd807, %rd804, %rd806; 2026-02-21T09:13:45.3788492Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3788556Z mov.b64 {%r1551, %r1552}, %rd807; 2026-02-21T09:13:45.3788620Z cvt.rn.f16x2.f32 %r1553, %r1552, %r1551; 2026-02-21T09:13:45.3788788Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3788878Z cvt.u64.u32 %rd808, %r939; 2026-02-21T09:13:45.3788935Z cvt.u64.u32 %rd809, %r940; 2026-02-21T09:13:45.3788993Z shl.b64 %rd810, %rd809, 32; 2026-02-21T09:13:45.3789050Z or.b64 %rd811, %rd808, %rd810; 2026-02-21T09:13:45.3789227Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3789318Z mov.b64 {%r1554, %r1555}, %rd811; 2026-02-21T09:13:45.3789382Z cvt.rn.f16x2.f32 %r1556, %r1555, %r1554; 2026-02-21T09:13:45.3789552Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3789610Z cvt.u64.u32 %rd812, %r941; 2026-02-21T09:13:45.3789667Z cvt.u64.u32 %rd813, %r942; 2026-02-21T09:13:45.3789731Z shl.b64 %rd814, %rd813, 32; 2026-02-21T09:13:45.3789789Z or.b64 
%rd815, %rd812, %rd814; 2026-02-21T09:13:45.3789978Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3790041Z mov.b64 {%r1557, %r1558}, %rd815; 2026-02-21T09:13:45.3790112Z cvt.rn.f16x2.f32 %r1559, %r1558, %r1557; 2026-02-21T09:13:45.3790282Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3790343Z cvt.u64.u32 %rd816, %r943; 2026-02-21T09:13:45.3790406Z cvt.u64.u32 %rd817, %r944; 2026-02-21T09:13:45.3790465Z shl.b64 %rd818, %rd817, 32; 2026-02-21T09:13:45.3790523Z or.b64 %rd819, %rd816, %rd818; 2026-02-21T09:13:45.3790717Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3790778Z mov.b64 {%r1560, %r1561}, %rd819; 2026-02-21T09:13:45.3790842Z cvt.rn.f16x2.f32 %r1562, %r1561, %r1560; 2026-02-21T09:13:45.3791010Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3791076Z cvt.u64.u32 %rd820, %r945; 2026-02-21T09:13:45.3791135Z cvt.u64.u32 %rd821, %r946; 2026-02-21T09:13:45.3791195Z shl.b64 %rd822, %rd821, 32; 2026-02-21T09:13:45.3791260Z or.b64 %rd823, %rd820, %rd822; 2026-02-21T09:13:45.3791427Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3791485Z mov.b64 {%r1563, %r1564}, %rd823; 2026-02-21T09:13:45.3791559Z cvt.rn.f16x2.f32 %r1565, %r1564, %r1563; 2026-02-21T09:13:45.3791729Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3791787Z cvt.u64.u32 %rd824, %r947; 2026-02-21T09:13:45.3791844Z cvt.u64.u32 %rd825, %r948; 2026-02-21T09:13:45.3791908Z shl.b64 %rd826, %rd825, 32; 2026-02-21T09:13:45.3791964Z or.b64 %rd827, %rd824, %rd826; 2026-02-21T09:13:45.3792139Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3792201Z mov.b64 {%r1566, %r1567}, %rd827; 2026-02-21T09:13:45.3792264Z cvt.rn.f16x2.f32 %r1568, %r1567, %r1566; 
2026-02-21T09:13:45.3792423Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3792484Z cvt.u64.u32 %rd828, %r949; 2026-02-21T09:13:45.3792538Z cvt.u64.u32 %rd829, %r950; 2026-02-21T09:13:45.3792594Z shl.b64 %rd830, %rd829, 32; 2026-02-21T09:13:45.3792651Z or.b64 %rd831, %rd828, %rd830; 2026-02-21T09:13:45.3792814Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3792872Z mov.b64 {%r1569, %r1570}, %rd831; 2026-02-21T09:13:45.3792934Z cvt.rn.f16x2.f32 %r1571, %r1570, %r1569; 2026-02-21T09:13:45.3792995Z mov.b64 {%r1572, %r1573}, %rd219; 2026-02-21T09:13:45.3793056Z cvt.rn.f16x2.f32 %r1574, %r1573, %r1572; 2026-02-21T09:13:45.3793110Z mov.b64 {%r1575, %r1576}, %rd223; 2026-02-21T09:13:45.3793171Z cvt.rn.f16x2.f32 %r1577, %r1576, %r1575; 2026-02-21T09:13:45.3793231Z mov.b64 {%r1578, %r1579}, %rd227; 2026-02-21T09:13:45.3793293Z cvt.rn.f16x2.f32 %r1580, %r1579, %r1578; 2026-02-21T09:13:45.3793370Z mov.b64 {%r1581, %r1582}, %rd231; 2026-02-21T09:13:45.3793438Z cvt.rn.f16x2.f32 %r1583, %r1582, %r1581; 2026-02-21T09:13:45.3793492Z mov.b64 {%r1584, %r1585}, %rd235; 2026-02-21T09:13:45.3793552Z cvt.rn.f16x2.f32 %r1586, %r1585, %r1584; 2026-02-21T09:13:45.3793637Z mov.b64 {%r1587, %r1588}, %rd239; 2026-02-21T09:13:45.3793698Z cvt.rn.f16x2.f32 %r1589, %r1588, %r1587; 2026-02-21T09:13:45.3793751Z mov.b64 {%r1590, %r1591}, %rd243; 2026-02-21T09:13:45.3793814Z cvt.rn.f16x2.f32 %r1592, %r1591, %r1590; 2026-02-21T09:13:45.3793877Z mov.b64 {%r1593, %r1594}, %rd247; 2026-02-21T09:13:45.3793937Z cvt.rn.f16x2.f32 %r1595, %r1594, %r1593; 2026-02-21T09:13:45.3793992Z mov.b64 {%r1596, %r1597}, %rd251; 2026-02-21T09:13:45.3794060Z cvt.rn.f16x2.f32 %r1598, %r1597, %r1596; 2026-02-21T09:13:45.3794114Z mov.b64 {%r1599, %r1600}, %rd255; 2026-02-21T09:13:45.3794174Z cvt.rn.f16x2.f32 %r1601, %r1600, %r1599; 2026-02-21T09:13:45.3794249Z mov.b64 {%r1602, %r1603}, %rd259; 2026-02-21T09:13:45.3794319Z 
cvt.rn.f16x2.f32 %r1604, %r1603, %r1602; 2026-02-21T09:13:45.3794374Z mov.b64 {%r1605, %r1606}, %rd263; 2026-02-21T09:13:45.3794435Z cvt.rn.f16x2.f32 %r1607, %r1606, %r1605; 2026-02-21T09:13:45.3794499Z mov.b64 {%r1608, %r1609}, %rd267; 2026-02-21T09:13:45.3794562Z cvt.rn.f16x2.f32 %r1610, %r1609, %r1608; 2026-02-21T09:13:45.3794616Z mov.b64 {%r1611, %r1612}, %rd271; 2026-02-21T09:13:45.3794714Z cvt.rn.f16x2.f32 %r1613, %r1612, %r1611; 2026-02-21T09:13:45.3794797Z mov.b64 {%r1614, %r1615}, %rd275; 2026-02-21T09:13:45.3794860Z cvt.rn.f16x2.f32 %r1616, %r1615, %r1614; 2026-02-21T09:13:45.3794915Z mov.b64 {%r1617, %r1618}, %rd279; 2026-02-21T09:13:45.3794984Z cvt.rn.f16x2.f32 %r1619, %r1618, %r1617; 2026-02-21T09:13:45.3795144Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3795199Z cvt.u64.u32 %rd832, %r986; 2026-02-21T09:13:45.3795263Z cvt.u64.u32 %rd833, %r987; 2026-02-21T09:13:45.3795319Z shl.b64 %rd834, %rd833, 32; 2026-02-21T09:13:45.3795374Z or.b64 %rd835, %rd832, %rd834; 2026-02-21T09:13:45.3795541Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3795597Z mov.b64 {%r1620, %r1621}, %rd835; 2026-02-21T09:13:45.3795660Z cvt.rn.f16x2.f32 %r1622, %r1621, %r1620; 2026-02-21T09:13:45.3795817Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3795881Z cvt.u64.u32 %rd836, %r988; 2026-02-21T09:13:45.3795937Z cvt.u64.u32 %rd837, %r989; 2026-02-21T09:13:45.3795992Z shl.b64 %rd838, %rd837, 32; 2026-02-21T09:13:45.3796055Z or.b64 %rd839, %rd836, %rd838; 2026-02-21T09:13:45.3796213Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3796269Z mov.b64 {%r1623, %r1624}, %rd839; 2026-02-21T09:13:45.3796338Z cvt.rn.f16x2.f32 %r1625, %r1624, %r1623; 2026-02-21T09:13:45.3796496Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3796550Z 
cvt.u64.u32 %rd840, %r990; 2026-02-21T09:13:45.3796604Z cvt.u64.u32 %rd841, %r991; 2026-02-21T09:13:45.3796666Z shl.b64 %rd842, %rd841, 32; 2026-02-21T09:13:45.3796724Z or.b64 %rd843, %rd840, %rd842; 2026-02-21T09:13:45.3796883Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3796945Z mov.b64 {%r1626, %r1627}, %rd843; 2026-02-21T09:13:45.3797008Z cvt.rn.f16x2.f32 %r1628, %r1627, %r1626; 2026-02-21T09:13:45.3797165Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3797227Z cvt.u64.u32 %rd844, %r992; 2026-02-21T09:13:45.3797280Z cvt.u64.u32 %rd845, %r993; 2026-02-21T09:13:45.3797335Z shl.b64 %rd846, %rd845, 32; 2026-02-21T09:13:45.3797391Z or.b64 %rd847, %rd844, %rd846; 2026-02-21T09:13:45.3797583Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3797638Z mov.b64 {%r1629, %r1630}, %rd847; 2026-02-21T09:13:45.3797701Z cvt.rn.f16x2.f32 %r1631, %r1630, %r1629; 2026-02-21T09:13:45.3797894Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3797949Z cvt.u64.u32 %rd848, %r994; 2026-02-21T09:13:45.3798004Z cvt.u64.u32 %rd849, %r995; 2026-02-21T09:13:45.3798066Z shl.b64 %rd850, %rd849, 32; 2026-02-21T09:13:45.3798121Z or.b64 %rd851, %rd848, %rd850; 2026-02-21T09:13:45.3798281Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3798337Z mov.b64 {%r1632, %r1633}, %rd851; 2026-02-21T09:13:45.3798405Z cvt.rn.f16x2.f32 %r1634, %r1633, %r1632; 2026-02-21T09:13:45.3798599Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3798658Z cvt.u64.u32 %rd852, %r996; 2026-02-21T09:13:45.3798720Z cvt.u64.u32 %rd853, %r997; 2026-02-21T09:13:45.3798776Z shl.b64 %rd854, %rd853, 32; 2026-02-21T09:13:45.3798831Z or.b64 %rd855, %rd852, %rd854; 2026-02-21T09:13:45.3799001Z .loc 1 58 27 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3799059Z mov.b64 {%r1635, %r1636}, %rd855; 2026-02-21T09:13:45.3799121Z cvt.rn.f16x2.f32 %r1637, %r1636, %r1635; 2026-02-21T09:13:45.3799300Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3799363Z cvt.u64.u32 %rd856, %r998; 2026-02-21T09:13:45.3799418Z cvt.u64.u32 %rd857, %r999; 2026-02-21T09:13:45.3799475Z shl.b64 %rd858, %rd857, 32; 2026-02-21T09:13:45.3799540Z or.b64 %rd859, %rd856, %rd858; 2026-02-21T09:13:45.3799703Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3799761Z mov.b64 {%r1638, %r1639}, %rd859; 2026-02-21T09:13:45.3799831Z cvt.rn.f16x2.f32 %r1640, %r1639, %r1638; 2026-02-21T09:13:45.3799992Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3800049Z cvt.u64.u32 %rd860, %r1000; 2026-02-21T09:13:45.3800103Z cvt.u64.u32 %rd861, %r1001; 2026-02-21T09:13:45.3800164Z shl.b64 %rd862, %rd861, 32; 2026-02-21T09:13:45.3800220Z or.b64 %rd863, %rd860, %rd862; 2026-02-21T09:13:45.3800378Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3800443Z mov.b64 {%r1641, %r1642}, %rd863; 2026-02-21T09:13:45.3800505Z cvt.rn.f16x2.f32 %r1643, %r1642, %r1641; 2026-02-21T09:13:45.3800660Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3800721Z cvt.u64.u32 %rd864, %r1003; 2026-02-21T09:13:45.3800777Z cvt.u64.u32 %rd865, %r1004; 2026-02-21T09:13:45.3800833Z shl.b64 %rd866, %rd865, 32; 2026-02-21T09:13:45.3800889Z or.b64 %rd867, %rd864, %rd866; 2026-02-21T09:13:45.3801057Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3801112Z mov.b64 {%r1644, %r1645}, %rd867; 2026-02-21T09:13:45.3801175Z cvt.rn.f16x2.f32 %r1646, %r1645, %r1644; 2026-02-21T09:13:45.3801341Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3801398Z cvt.u64.u32 %rd868, %r1005; 2026-02-21T09:13:45.3801451Z cvt.u64.u32 %rd869, %r1006; 2026-02-21T09:13:45.3801511Z shl.b64 %rd870, %rd869, 32; 2026-02-21T09:13:45.3801567Z or.b64 %rd871, %rd868, %rd870; 2026-02-21T09:13:45.3801723Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3801780Z mov.b64 {%r1647, %r1648}, %rd871; 2026-02-21T09:13:45.3801851Z cvt.rn.f16x2.f32 %r1649, %r1648, %r1647; 2026-02-21T09:13:45.3802034Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3802089Z cvt.u64.u32 %rd872, %r1007; 2026-02-21T09:13:45.3802152Z cvt.u64.u32 %rd873, %r1008; 2026-02-21T09:13:45.3802207Z shl.b64 %rd874, %rd873, 32; 2026-02-21T09:13:45.3802283Z or.b64 %rd875, %rd872, %rd874; 2026-02-21T09:13:45.3802447Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3802505Z mov.b64 {%r1650, %r1651}, %rd875; 2026-02-21T09:13:45.3802568Z cvt.rn.f16x2.f32 %r1652, %r1651, %r1650; 2026-02-21T09:13:45.3802722Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3802783Z cvt.u64.u32 %rd876, %r1009; 2026-02-21T09:13:45.3802838Z cvt.u64.u32 %rd877, %r1010; 2026-02-21T09:13:45.3802891Z shl.b64 %rd878, %rd877, 32; 2026-02-21T09:13:45.3802973Z or.b64 %rd879, %rd876, %rd878; 2026-02-21T09:13:45.3803133Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3803188Z mov.b64 {%r1653, %r1654}, %rd879; 2026-02-21T09:13:45.3803258Z cvt.rn.f16x2.f32 %r1655, %r1654, %r1653; 2026-02-21T09:13:45.3803419Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3803475Z cvt.u64.u32 %rd880, %r1011; 2026-02-21T09:13:45.3803528Z cvt.u64.u32 %rd881, %r1012; 2026-02-21T09:13:45.3803609Z shl.b64 %rd882, %rd881, 32; 2026-02-21T09:13:45.3803665Z 
or.b64 %rd883, %rd880, %rd882; 2026-02-21T09:13:45.3803826Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3803890Z mov.b64 {%r1656, %r1657}, %rd883; 2026-02-21T09:13:45.3803951Z cvt.rn.f16x2.f32 %r1658, %r1657, %r1656; 2026-02-21T09:13:45.3804108Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3804170Z cvt.u64.u32 %rd884, %r1013; 2026-02-21T09:13:45.3804224Z cvt.u64.u32 %rd885, %r1014; 2026-02-21T09:13:45.3804278Z shl.b64 %rd886, %rd885, 32; 2026-02-21T09:13:45.3804334Z or.b64 %rd887, %rd884, %rd886; 2026-02-21T09:13:45.3804494Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3804550Z mov.b64 {%r1659, %r1660}, %rd887; 2026-02-21T09:13:45.3804612Z cvt.rn.f16x2.f32 %r1661, %r1660, %r1659; 2026-02-21T09:13:45.3804812Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3804867Z cvt.u64.u32 %rd888, %r1015; 2026-02-21T09:13:45.3804920Z cvt.u64.u32 %rd889, %r1016; 2026-02-21T09:13:45.3804981Z shl.b64 %rd890, %rd889, 32; 2026-02-21T09:13:45.3805037Z or.b64 %rd891, %rd888, %rd890; 2026-02-21T09:13:45.3805201Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3805258Z mov.b64 {%r1662, %r1663}, %rd891; 2026-02-21T09:13:45.3805327Z cvt.rn.f16x2.f32 %r1664, %r1663, %r1662; 2026-02-21T09:13:45.3805484Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3805539Z cvt.u64.u32 %rd892, %r1017; 2026-02-21T09:13:45.3805602Z cvt.u64.u32 %rd893, %r1018; 2026-02-21T09:13:45.3805656Z shl.b64 %rd894, %rd893, 32; 2026-02-21T09:13:45.3805712Z or.b64 %rd895, %rd892, %rd894; 2026-02-21T09:13:45.3805882Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3805937Z mov.b64 {%r1665, %r1666}, %rd895; 2026-02-21T09:13:45.3805999Z cvt.rn.f16x2.f32 %r1667, %r1666, 
%r1665; 2026-02-21T09:13:45.3806054Z mov.b64 {%r1668, %r1669}, %rd283; 2026-02-21T09:13:45.3806125Z cvt.rn.f16x2.f32 %r1670, %r1669, %r1668; 2026-02-21T09:13:45.3806181Z mov.b64 {%r1671, %r1672}, %rd287; 2026-02-21T09:13:45.3806244Z cvt.rn.f16x2.f32 %r1673, %r1672, %r1671; 2026-02-21T09:13:45.3806337Z mov.b64 {%r1674, %r1675}, %rd291; 2026-02-21T09:13:45.3806401Z cvt.rn.f16x2.f32 %r1676, %r1675, %r1674; 2026-02-21T09:13:45.3806456Z mov.b64 {%r1677, %r1678}, %rd295; 2026-02-21T09:13:45.3806525Z cvt.rn.f16x2.f32 %r1679, %r1678, %r1677; 2026-02-21T09:13:45.3806619Z mov.b64 {%r1680, %r1681}, %rd299; 2026-02-21T09:13:45.3806681Z cvt.rn.f16x2.f32 %r1682, %r1681, %r1680; 2026-02-21T09:13:45.3806736Z mov.b64 {%r1683, %r1684}, %rd303; 2026-02-21T09:13:45.3806805Z cvt.rn.f16x2.f32 %r1685, %r1684, %r1683; 2026-02-21T09:13:45.3806861Z mov.b64 {%r1686, %r1687}, %rd307; 2026-02-21T09:13:45.3806922Z cvt.rn.f16x2.f32 %r1688, %r1687, %r1686; 2026-02-21T09:13:45.3806983Z mov.b64 {%r1689, %r1690}, %rd311; 2026-02-21T09:13:45.3807045Z cvt.rn.f16x2.f32 %r1691, %r1690, %r1689; 2026-02-21T09:13:45.3807101Z mov.b64 {%r1692, %r1693}, %rd315; 2026-02-21T09:13:45.3807163Z cvt.rn.f16x2.f32 %r1694, %r1693, %r1692; 2026-02-21T09:13:45.3807251Z mov.b64 {%r1695, %r1696}, %rd319; 2026-02-21T09:13:45.3807317Z cvt.rn.f16x2.f32 %r1697, %r1696, %r1695; 2026-02-21T09:13:45.3807370Z mov.b64 {%r1698, %r1699}, %rd323; 2026-02-21T09:13:45.3807440Z cvt.rn.f16x2.f32 %r1700, %r1699, %r1698; 2026-02-21T09:13:45.3807495Z mov.b64 {%r1701, %r1702}, %rd327; 2026-02-21T09:13:45.3807560Z cvt.rn.f16x2.f32 %r1703, %r1702, %r1701; 2026-02-21T09:13:45.3807622Z mov.b64 {%r1704, %r1705}, %rd331; 2026-02-21T09:13:45.3807684Z cvt.rn.f16x2.f32 %r1706, %r1705, %r1704; 2026-02-21T09:13:45.3807762Z mov.b64 {%r1707, %r1708}, %rd335; 2026-02-21T09:13:45.3807826Z cvt.rn.f16x2.f32 %r1709, %r1708, %r1707; 2026-02-21T09:13:45.3807887Z mov.b64 {%r1710, %r1711}, %rd339; 2026-02-21T09:13:45.3807948Z cvt.rn.f16x2.f32 %r1712, %r1711, 
%r1710; 2026-02-21T09:13:45.3808003Z mov.b64 {%r1713, %r1714}, %rd343; 2026-02-21T09:13:45.3808070Z cvt.rn.f16x2.f32 %r1715, %r1714, %r1713; 2026-02-21T09:13:45.3808233Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3808290Z cvt.u64.u32 %rd896, %r1054; 2026-02-21T09:13:45.3808351Z cvt.u64.u32 %rd897, %r1055; 2026-02-21T09:13:45.3808405Z shl.b64 %rd898, %rd897, 32; 2026-02-21T09:13:45.3808460Z or.b64 %rd899, %rd896, %rd898; 2026-02-21T09:13:45.3808624Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3808688Z mov.b64 {%r1716, %r1717}, %rd899; 2026-02-21T09:13:45.3808749Z cvt.rn.f16x2.f32 %r1718, %r1717, %r1716; 2026-02-21T09:13:45.3808913Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3808975Z cvt.u64.u32 %rd900, %r1056; 2026-02-21T09:13:45.3809038Z cvt.u64.u32 %rd901, %r1057; 2026-02-21T09:13:45.3809092Z shl.b64 %rd902, %rd901, 32; 2026-02-21T09:13:45.3809147Z or.b64 %rd903, %rd900, %rd902; 2026-02-21T09:13:45.3809316Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3809373Z mov.b64 {%r1719, %r1720}, %rd903; 2026-02-21T09:13:45.3809434Z cvt.rn.f16x2.f32 %r1721, %r1720, %r1719; 2026-02-21T09:13:45.3809599Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3809655Z cvt.u64.u32 %rd904, %r1058; 2026-02-21T09:13:45.3809709Z cvt.u64.u32 %rd905, %r1059; 2026-02-21T09:13:45.3809769Z shl.b64 %rd906, %rd905, 32; 2026-02-21T09:13:45.3809824Z or.b64 %rd907, %rd904, %rd906; 2026-02-21T09:13:45.3809979Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3810043Z mov.b64 {%r1722, %r1723}, %rd907; 2026-02-21T09:13:45.3810104Z cvt.rn.f16x2.f32 %r1724, %r1723, %r1722; 2026-02-21T09:13:45.3810262Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 
2026-02-21T09:13:45.3810316Z cvt.u64.u32 %rd908, %r1060; 2026-02-21T09:13:45.3810381Z cvt.u64.u32 %rd909, %r1061; 2026-02-21T09:13:45.3810458Z shl.b64 %rd910, %rd909, 32; 2026-02-21T09:13:45.3810515Z or.b64 %rd911, %rd908, %rd910; 2026-02-21T09:13:45.3810679Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3810757Z mov.b64 {%r1725, %r1726}, %rd911; 2026-02-21T09:13:45.3810819Z cvt.rn.f16x2.f32 %r1727, %r1726, %r1725; 2026-02-21T09:13:45.3810984Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3811039Z cvt.u64.u32 %rd912, %r1062; 2026-02-21T09:13:45.3811094Z cvt.u64.u32 %rd913, %r1063; 2026-02-21T09:13:45.3811148Z shl.b64 %rd914, %rd913, 32; 2026-02-21T09:13:45.3811210Z or.b64 %rd915, %rd912, %rd914; 2026-02-21T09:13:45.3811366Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3811422Z mov.b64 {%r1728, %r1729}, %rd915; 2026-02-21T09:13:45.3811513Z cvt.rn.f16x2.f32 %r1730, %r1729, %r1728; 2026-02-21T09:13:45.3811670Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3811723Z cvt.u64.u32 %rd916, %r1064; 2026-02-21T09:13:45.3811784Z cvt.u64.u32 %rd917, %r1065; 2026-02-21T09:13:45.3811839Z shl.b64 %rd918, %rd917, 32; 2026-02-21T09:13:45.3811894Z or.b64 %rd919, %rd916, %rd918; 2026-02-21T09:13:45.3812050Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3812132Z mov.b64 {%r1731, %r1732}, %rd919; 2026-02-21T09:13:45.3812194Z cvt.rn.f16x2.f32 %r1733, %r1732, %r1731; 2026-02-21T09:13:45.3812349Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3812411Z cvt.u64.u32 %rd920, %r1066; 2026-02-21T09:13:45.3812464Z cvt.u64.u32 %rd921, %r1067; 2026-02-21T09:13:45.3812517Z shl.b64 %rd922, %rd921, 32; 2026-02-21T09:13:45.3812580Z or.b64 %rd923, %rd920, %rd922; 2026-02-21T09:13:45.3812736Z .loc 
1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3812791Z mov.b64 {%r1734, %r1735}, %rd923; 2026-02-21T09:13:45.3812852Z cvt.rn.f16x2.f32 %r1736, %r1735, %r1734; 2026-02-21T09:13:45.3813018Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3813071Z cvt.u64.u32 %rd924, %r1068; 2026-02-21T09:13:45.3813124Z cvt.u64.u32 %rd925, %r1069; 2026-02-21T09:13:45.3813186Z shl.b64 %rd926, %rd925, 32; 2026-02-21T09:13:45.3813242Z or.b64 %rd927, %rd924, %rd926; 2026-02-21T09:13:45.3813398Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3813459Z mov.b64 {%r1737, %r1738}, %rd927; 2026-02-21T09:13:45.3813520Z cvt.rn.f16x2.f32 %r1739, %r1738, %r1737; 2026-02-21T09:13:45.3813678Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3813733Z cvt.u64.u32 %rd928, %r1071; 2026-02-21T09:13:45.3813793Z cvt.u64.u32 %rd929, %r1072; 2026-02-21T09:13:45.3813847Z shl.b64 %rd930, %rd929, 32; 2026-02-21T09:13:45.3813902Z or.b64 %rd931, %rd928, %rd930; 2026-02-21T09:13:45.3814067Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3814125Z mov.b64 {%r1740, %r1741}, %rd931; 2026-02-21T09:13:45.3814186Z cvt.rn.f16x2.f32 %r1742, %r1741, %r1740; 2026-02-21T09:13:45.3814347Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3814402Z cvt.u64.u32 %rd932, %r1073; 2026-02-21T09:13:45.3814455Z cvt.u64.u32 %rd933, %r1074; 2026-02-21T09:13:45.3814509Z shl.b64 %rd934, %rd933, 32; 2026-02-21T09:13:45.3814571Z or.b64 %rd935, %rd932, %rd934; 2026-02-21T09:13:45.3814755Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3814838Z mov.b64 {%r1743, %r1744}, %rd935; 2026-02-21T09:13:45.3814907Z cvt.rn.f16x2.f32 %r1745, %r1744, %r1743; 2026-02-21T09:13:45.3815070Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3815165Z cvt.u64.u32 %rd936, %r1075; 2026-02-21T09:13:45.3815227Z cvt.u64.u32 %rd937, %r1076; 2026-02-21T09:13:45.3815282Z shl.b64 %rd938, %rd937, 32; 2026-02-21T09:13:45.3815338Z or.b64 %rd939, %rd936, %rd938; 2026-02-21T09:13:45.3815497Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3815563Z mov.b64 {%r1746, %r1747}, %rd939; 2026-02-21T09:13:45.3815626Z cvt.rn.f16x2.f32 %r1748, %r1747, %r1746; 2026-02-21T09:13:45.3815785Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3815849Z cvt.u64.u32 %rd940, %r1077; 2026-02-21T09:13:45.3815930Z cvt.u64.u32 %rd941, %r1078; 2026-02-21T09:13:45.3815990Z shl.b64 %rd942, %rd941, 32; 2026-02-21T09:13:45.3816054Z or.b64 %rd943, %rd940, %rd942; 2026-02-21T09:13:45.3816211Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3816269Z mov.b64 {%r1749, %r1750}, %rd943; 2026-02-21T09:13:45.3816330Z cvt.rn.f16x2.f32 %r1751, %r1750, %r1749; 2026-02-21T09:13:45.3816498Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3816581Z cvt.u64.u32 %rd944, %r1079; 2026-02-21T09:13:45.3816639Z cvt.u64.u32 %rd945, %r1080; 2026-02-21T09:13:45.3816703Z shl.b64 %rd946, %rd945, 32; 2026-02-21T09:13:45.3816758Z or.b64 %rd947, %rd944, %rd946; 2026-02-21T09:13:45.3816920Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3816983Z mov.b64 {%r1752, %r1753}, %rd947; 2026-02-21T09:13:45.3817045Z cvt.rn.f16x2.f32 %r1754, %r1753, %r1752; 2026-02-21T09:13:45.3817203Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3817258Z cvt.u64.u32 %rd948, %r1081; 2026-02-21T09:13:45.3817318Z cvt.u64.u32 %rd949, %r1082; 2026-02-21T09:13:45.3817374Z shl.b64 %rd950, %rd949, 32; 2026-02-21T09:13:45.3817430Z 
or.b64 %rd951, %rd948, %rd950; 2026-02-21T09:13:45.3817595Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3817651Z mov.b64 {%r1755, %r1756}, %rd951; 2026-02-21T09:13:45.3817713Z cvt.rn.f16x2.f32 %r1757, %r1756, %r1755; 2026-02-21T09:13:45.3817880Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3817935Z cvt.u64.u32 %rd952, %r1083; 2026-02-21T09:13:45.3817989Z cvt.u64.u32 %rd953, %r1084; 2026-02-21T09:13:45.3818043Z shl.b64 %rd954, %rd953, 32; 2026-02-21T09:13:45.3818107Z or.b64 %rd955, %rd952, %rd954; 2026-02-21T09:13:45.3818266Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3818322Z mov.b64 {%r1758, %r1759}, %rd955; 2026-02-21T09:13:45.3818390Z cvt.rn.f16x2.f32 %r1760, %r1759, %r1758; 2026-02-21T09:13:45.3818548Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3818605Z cvt.u64.u32 %rd956, %r1085; 2026-02-21T09:13:45.3818668Z cvt.u64.u32 %rd957, %r1086; 2026-02-21T09:13:45.3818723Z shl.b64 %rd958, %rd957, 32; 2026-02-21T09:13:45.3818780Z or.b64 %rd959, %rd956, %rd958; 2026-02-21T09:13:45.3818937Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3819001Z mov.b64 {%r1761, %r1762}, %rd959; 2026-02-21T09:13:45.3819062Z cvt.rn.f16x2.f32 %r1763, %r1762, %r1761; 2026-02-21T09:13:45.3819116Z mov.b64 {%r1764, %r1765}, %rd347; 2026-02-21T09:13:45.3819186Z cvt.rn.f16x2.f32 %r1766, %r1765, %r1764; 2026-02-21T09:13:45.3819265Z mov.b64 {%r1767, %r1768}, %rd351; 2026-02-21T09:13:45.3819327Z cvt.rn.f16x2.f32 %r1769, %r1768, %r1767; 2026-02-21T09:13:45.3819389Z mov.b64 {%r1770, %r1771}, %rd355; 2026-02-21T09:13:45.3819448Z cvt.rn.f16x2.f32 %r1772, %r1771, %r1770; 2026-02-21T09:13:45.3819528Z mov.b64 {%r1773, %r1774}, %rd359; 2026-02-21T09:13:45.3819589Z cvt.rn.f16x2.f32 %r1775, %r1774, %r1773; 2026-02-21T09:13:45.3819650Z mov.b64 
{%r1776, %r1777}, %rd363; 2026-02-21T09:13:45.3819711Z cvt.rn.f16x2.f32 %r1778, %r1777, %r1776; 2026-02-21T09:13:45.3819766Z mov.b64 {%r1779, %r1780}, %rd367; 2026-02-21T09:13:45.3819831Z cvt.rn.f16x2.f32 %r1781, %r1780, %r1779; 2026-02-21T09:13:45.3819884Z mov.b64 {%r1782, %r1783}, %rd371; 2026-02-21T09:13:45.3819944Z cvt.rn.f16x2.f32 %r1784, %r1783, %r1782; 2026-02-21T09:13:45.3819999Z mov.b64 {%r1785, %r1786}, %rd375; 2026-02-21T09:13:45.3820067Z cvt.rn.f16x2.f32 %r1787, %r1786, %r1785; 2026-02-21T09:13:45.3820143Z mov.b64 {%r1788, %r1789}, %rd379; 2026-02-21T09:13:45.3820207Z cvt.rn.f16x2.f32 %r1790, %r1789, %r1788; 2026-02-21T09:13:45.3820269Z mov.b64 {%r1791, %r1792}, %rd383; 2026-02-21T09:13:45.3820328Z cvt.rn.f16x2.f32 %r1793, %r1792, %r1791; 2026-02-21T09:13:45.3820382Z mov.b64 {%r1794, %r1795}, %rd387; 2026-02-21T09:13:45.3820451Z cvt.rn.f16x2.f32 %r1796, %r1795, %r1794; 2026-02-21T09:13:45.3820505Z mov.b64 {%r1797, %r1798}, %rd391; 2026-02-21T09:13:45.3820566Z cvt.rn.f16x2.f32 %r1799, %r1798, %r1797; 2026-02-21T09:13:45.3820641Z mov.b64 {%r1800, %r1801}, %rd395; 2026-02-21T09:13:45.3820709Z cvt.rn.f16x2.f32 %r1802, %r1801, %r1800; 2026-02-21T09:13:45.3820765Z mov.b64 {%r1803, %r1804}, %rd399; 2026-02-21T09:13:45.3820824Z cvt.rn.f16x2.f32 %r1805, %r1804, %r1803; 2026-02-21T09:13:45.3820884Z mov.b64 {%r1806, %r1807}, %rd403; 2026-02-21T09:13:45.3820944Z cvt.rn.f16x2.f32 %r1808, %r1807, %r1806; 2026-02-21T09:13:45.3820998Z mov.b64 {%r1809, %r1810}, %rd407; 2026-02-21T09:13:45.3821060Z cvt.rn.f16x2.f32 %r1811, %r1810, %r1809; 2026-02-21T09:13:45.3821226Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3821281Z cvt.u64.u32 %rd960, %r1122; 2026-02-21T09:13:45.3821335Z cvt.u64.u32 %rd961, %r1123; 2026-02-21T09:13:45.3821401Z shl.b64 %rd962, %rd961, 32; 2026-02-21T09:13:45.3821456Z or.b64 %rd963, %rd960, %rd962; 2026-02-21T09:13:45.3821613Z .loc 1 58 27 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3821674Z mov.b64 {%r1812, %r1813}, %rd963; 2026-02-21T09:13:45.3821735Z cvt.rn.f16x2.f32 %r1814, %r1813, %r1812; 2026-02-21T09:13:45.3821896Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3821956Z cvt.u64.u32 %rd964, %r1124; 2026-02-21T09:13:45.3822011Z cvt.u64.u32 %rd965, %r1125; 2026-02-21T09:13:45.3822065Z shl.b64 %rd966, %rd965, 32; 2026-02-21T09:13:45.3822122Z or.b64 %rd967, %rd964, %rd966; 2026-02-21T09:13:45.3822286Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3822341Z mov.b64 {%r1815, %r1816}, %rd967; 2026-02-21T09:13:45.3822402Z cvt.rn.f16x2.f32 %r1817, %r1816, %r1815; 2026-02-21T09:13:45.3822563Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3822619Z cvt.u64.u32 %rd968, %r1126; 2026-02-21T09:13:45.3822674Z cvt.u64.u32 %rd969, %r1127; 2026-02-21T09:13:45.3822736Z shl.b64 %rd970, %rd969, 32; 2026-02-21T09:13:45.3822791Z or.b64 %rd971, %rd968, %rd970; 2026-02-21T09:13:45.3822946Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3823002Z mov.b64 {%r1818, %r1819}, %rd971; 2026-02-21T09:13:45.3823071Z cvt.rn.f16x2.f32 %r1820, %r1819, %r1818; 2026-02-21T09:13:45.3823238Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3823318Z cvt.u64.u32 %rd972, %r1128; 2026-02-21T09:13:45.3823380Z cvt.u64.u32 %rd973, %r1129; 2026-02-21T09:13:45.3823434Z shl.b64 %rd974, %rd973, 32; 2026-02-21T09:13:45.3823491Z or.b64 %rd975, %rd972, %rd974; 2026-02-21T09:13:45.3823678Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3823733Z mov.b64 {%r1821, %r1822}, %rd975; 2026-02-21T09:13:45.3823798Z cvt.rn.f16x2.f32 %r1823, %r1822, %r1821; 2026-02-21T09:13:45.3823956Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3824020Z cvt.u64.u32 %rd976, %r1130; 2026-02-21T09:13:45.3824073Z cvt.u64.u32 %rd977, %r1131; 2026-02-21T09:13:45.3824127Z shl.b64 %rd978, %rd977, 32; 2026-02-21T09:13:45.3824190Z or.b64 %rd979, %rd976, %rd978; 2026-02-21T09:13:45.3824369Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3824426Z mov.b64 {%r1824, %r1825}, %rd979; 2026-02-21T09:13:45.3824495Z cvt.rn.f16x2.f32 %r1826, %r1825, %r1824; 2026-02-21T09:13:45.3824647Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3824733Z cvt.u64.u32 %rd980, %r1132; 2026-02-21T09:13:45.3824787Z cvt.u64.u32 %rd981, %r1133; 2026-02-21T09:13:45.3824849Z shl.b64 %rd982, %rd981, 32; 2026-02-21T09:13:45.3824905Z or.b64 %rd983, %rd980, %rd982; 2026-02-21T09:13:45.3825094Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3825161Z mov.b64 {%r1827, %r1828}, %rd983; 2026-02-21T09:13:45.3825223Z cvt.rn.f16x2.f32 %r1829, %r1828, %r1827; 2026-02-21T09:13:45.3825387Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3825450Z cvt.u64.u32 %rd984, %r1134; 2026-02-21T09:13:45.3825506Z cvt.u64.u32 %rd985, %r1135; 2026-02-21T09:13:45.3825566Z shl.b64 %rd986, %rd985, 32; 2026-02-21T09:13:45.3825623Z or.b64 %rd987, %rd984, %rd986; 2026-02-21T09:13:45.3825798Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3825857Z mov.b64 {%r1830, %r1831}, %rd987; 2026-02-21T09:13:45.3825920Z cvt.rn.f16x2.f32 %r1832, %r1831, %r1830; 2026-02-21T09:13:45.3826088Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3826146Z cvt.u64.u32 %rd988, %r1136; 2026-02-21T09:13:45.3826202Z cvt.u64.u32 %rd989, %r1137; 2026-02-21T09:13:45.3826265Z shl.b64 %rd990, %rd989, 32; 2026-02-21T09:13:45.3826321Z 
or.b64 %rd991, %rd988, %rd990; 2026-02-21T09:13:45.3826476Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3826532Z mov.b64 {%r1833, %r1834}, %rd991; 2026-02-21T09:13:45.3826603Z cvt.rn.f16x2.f32 %r1835, %r1834, %r1833; 2026-02-21T09:13:45.3826762Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3826816Z cvt.u64.u32 %rd992, %r1139; 2026-02-21T09:13:45.3826878Z cvt.u64.u32 %rd993, %r1140; 2026-02-21T09:13:45.3826934Z shl.b64 %rd994, %rd993, 32; 2026-02-21T09:13:45.3826989Z or.b64 %rd995, %rd992, %rd994; 2026-02-21T09:13:45.3827153Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3827209Z mov.b64 {%r1836, %r1837}, %rd995; 2026-02-21T09:13:45.3827273Z cvt.rn.f16x2.f32 %r1838, %r1837, %r1836; 2026-02-21T09:13:45.3827430Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3827492Z cvt.u64.u32 %rd996, %r1141; 2026-02-21T09:13:45.3827546Z cvt.u64.u32 %rd997, %r1142; 2026-02-21T09:13:45.3827599Z shl.b64 %rd998, %rd997, 32; 2026-02-21T09:13:45.3827664Z or.b64 %rd999, %rd996, %rd998; 2026-02-21T09:13:45.3827848Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3827904Z mov.b64 {%r1839, %r1840}, %rd999; 2026-02-21T09:13:45.3827971Z cvt.rn.f16x2.f32 %r1841, %r1840, %r1839; 2026-02-21T09:13:45.3828162Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3828225Z cvt.u64.u32 %rd1000, %r1143; 2026-02-21T09:13:45.3828286Z cvt.u64.u32 %rd1001, %r1144; 2026-02-21T09:13:45.3828356Z shl.b64 %rd1002, %rd1001, 32; 2026-02-21T09:13:45.3828415Z or.b64 %rd1003, %rd1000, %rd1002; 2026-02-21T09:13:45.3828581Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3828650Z mov.b64 {%r1842, %r1843}, %rd1003; 2026-02-21T09:13:45.3828716Z cvt.rn.f16x2.f32 %r1844, 
%r1843, %r1842; 2026-02-21T09:13:45.3828907Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3828978Z cvt.u64.u32 %rd1004, %r1145; 2026-02-21T09:13:45.3829036Z cvt.u64.u32 %rd1005, %r1146; 2026-02-21T09:13:45.3829096Z shl.b64 %rd1006, %rd1005, 32; 2026-02-21T09:13:45.3829155Z or.b64 %rd1007, %rd1004, %rd1006; 2026-02-21T09:13:45.3829326Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3829387Z mov.b64 {%r1845, %r1846}, %rd1007; 2026-02-21T09:13:45.3829476Z cvt.rn.f16x2.f32 %r1847, %r1846, %r1845; 2026-02-21T09:13:45.3829541Z mov.b64 {%r1848, %r1849}, %rd411; 2026-02-21T09:13:45.3829605Z cvt.rn.f16x2.f32 %r1850, %r1849, %r1848; 2026-02-21T09:13:45.3829663Z mov.b64 {%r1851, %r1852}, %rd415; 2026-02-21T09:13:45.3829733Z cvt.rn.f16x2.f32 %r1853, %r1852, %r1851; 2026-02-21T09:13:45.3829901Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3829960Z cvt.u64.u32 %rd1008, %r1151; 2026-02-21T09:13:45.3830019Z cvt.u64.u32 %rd1009, %r1152; 2026-02-21T09:13:45.3830086Z shl.b64 %rd1010, %rd1009, 32; 2026-02-21T09:13:45.3830145Z or.b64 %rd1011, %rd1008, %rd1010; 2026-02-21T09:13:45.3830310Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3830377Z mov.b64 {%r1854, %r1855}, %rd1011; 2026-02-21T09:13:45.3830443Z cvt.rn.f16x2.f32 %r1856, %r1855, %r1854; 2026-02-21T09:13:45.3830609Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3830675Z cvt.u64.u32 %rd1012, %r1153; 2026-02-21T09:13:45.3830732Z cvt.u64.u32 %rd1013, %r1154; 2026-02-21T09:13:45.3830791Z shl.b64 %rd1014, %rd1013, 32; 2026-02-21T09:13:45.3830849Z or.b64 %rd1015, %rd1012, %rd1014; 2026-02-21T09:13:45.3831024Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3831084Z mov.b64 {%r1857, %r1858}, %rd1015; 2026-02-21T09:13:45.3831151Z 
cvt.rn.f16x2.f32 %r1859, %r1858, %r1857; 2026-02-21T09:13:45.3831215Z mov.b64 {%r1860, %r1861}, %rd419; 2026-02-21T09:13:45.3831279Z cvt.rn.f16x2.f32 %r1862, %r1861, %r1860; 2026-02-21T09:13:45.3831336Z mov.b64 {%r1863, %r1864}, %rd423; 2026-02-21T09:13:45.3831408Z cvt.rn.f16x2.f32 %r1865, %r1864, %r1863; 2026-02-21T09:13:45.3831464Z mov.b64 {%r1866, %r1867}, %rd427; 2026-02-21T09:13:45.3831528Z cvt.rn.f16x2.f32 %r1868, %r1867, %r1866; 2026-02-21T09:13:45.3831585Z mov.b64 {%r1869, %r1870}, %rd431; 2026-02-21T09:13:45.3831658Z cvt.rn.f16x2.f32 %r1871, %r1870, %r1869; 2026-02-21T09:13:45.3831714Z mov.b64 {%r1872, %r1873}, %rd435; 2026-02-21T09:13:45.3831777Z cvt.rn.f16x2.f32 %r1874, %r1873, %r1872; 2026-02-21T09:13:45.3831841Z mov.b64 {%r1875, %r1876}, %rd439; 2026-02-21T09:13:45.3831906Z cvt.rn.f16x2.f32 %r1877, %r1876, %r1875; 2026-02-21T09:13:45.3831963Z mov.b64 {%r1878, %r1879}, %rd443; 2026-02-21T09:13:45.3832027Z cvt.rn.f16x2.f32 %r1880, %r1879, %r1878; 2026-02-21T09:13:45.3832129Z mov.b64 {%r1881, %r1882}, %rd447; 2026-02-21T09:13:45.3832194Z cvt.rn.f16x2.f32 %r1883, %r1882, %r1881; 2026-02-21T09:13:45.3832250Z mov.b64 {%r1884, %r1885}, %rd451; 2026-02-21T09:13:45.3832321Z cvt.rn.f16x2.f32 %r1886, %r1885, %r1884; 2026-02-21T09:13:45.3832402Z mov.b64 {%r1887, %r1888}, %rd455; 2026-02-21T09:13:45.3832467Z cvt.rn.f16x2.f32 %r1889, %r1888, %r1887; 2026-02-21T09:13:45.3832529Z mov.b64 {%r1890, %r1891}, %rd459; 2026-02-21T09:13:45.3832594Z cvt.rn.f16x2.f32 %r1892, %r1891, %r1890; 2026-02-21T09:13:45.3832651Z mov.b64 {%r1893, %r1894}, %rd463; 2026-02-21T09:13:45.3832715Z cvt.rn.f16x2.f32 %r1895, %r1894, %r1893; 2026-02-21T09:13:45.3832780Z mov.b64 {%r1896, %r1897}, %rd467; 2026-02-21T09:13:45.3832843Z cvt.rn.f16x2.f32 %r1898, %r1897, %r1896; 2026-02-21T09:13:45.3832900Z mov.b64 {%r1899, %r1900}, %rd471; 2026-02-21T09:13:45.3832969Z cvt.rn.f16x2.f32 %r1901, %r1900, %r1899; 2026-02-21T09:13:45.3833047Z mov.b64 {%r1902, %r1903}, %rd475; 2026-02-21T09:13:45.3833114Z 
cvt.rn.f16x2.f32 %r1904, %r1903, %r1902; 2026-02-21T09:13:45.3833172Z mov.b64 {%r1905, %r1906}, %rd479; 2026-02-21T09:13:45.3833242Z cvt.rn.f16x2.f32 %r1907, %r1906, %r1905; 2026-02-21T09:13:45.3833414Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3833474Z cvt.u64.u32 %rd1016, %r1190; 2026-02-21T09:13:45.3833540Z cvt.u64.u32 %rd1017, %r1191; 2026-02-21T09:13:45.3833619Z shl.b64 %rd1018, %rd1017, 32; 2026-02-21T09:13:45.3833679Z or.b64 %rd1019, %rd1016, %rd1018; 2026-02-21T09:13:45.3833861Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3833920Z mov.b64 {%r1908, %r1909}, %rd1019; 2026-02-21T09:13:45.3833985Z cvt.rn.f16x2.f32 %r1910, %r1909, %r1908; 2026-02-21T09:13:45.3834154Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3834221Z cvt.u64.u32 %rd1020, %r1192; 2026-02-21T09:13:45.3834280Z cvt.u64.u32 %rd1021, %r1193; 2026-02-21T09:13:45.3834339Z shl.b64 %rd1022, %rd1021, 32; 2026-02-21T09:13:45.3834404Z or.b64 %rd1023, %rd1020, %rd1022; 2026-02-21T09:13:45.3834574Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3834634Z mov.b64 {%r1911, %r1912}, %rd1023; 2026-02-21T09:13:45.3834736Z cvt.rn.f16x2.f32 %r1913, %r1912, %r1911; 2026-02-21T09:13:45.3834906Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3834966Z cvt.u64.u32 %rd1024, %r1194; 2026-02-21T09:13:45.3835024Z cvt.u64.u32 %rd1025, %r1195; 2026-02-21T09:13:45.3835090Z shl.b64 %rd1026, %rd1025, 32; 2026-02-21T09:13:45.3835148Z or.b64 %rd1027, %rd1024, %rd1026; 2026-02-21T09:13:45.3835317Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3835384Z mov.b64 {%r1914, %r1915}, %rd1027; 2026-02-21T09:13:45.3835451Z cvt.rn.f16x2.f32 %r1916, %r1915, %r1914; 2026-02-21T09:13:45.3835619Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3835685Z cvt.u64.u32 %rd1028, %r1196; 2026-02-21T09:13:45.3835744Z cvt.u64.u32 %rd1029, %r1197; 2026-02-21T09:13:45.3835802Z shl.b64 %rd1030, %rd1029, 32; 2026-02-21T09:13:45.3835861Z or.b64 %rd1031, %rd1028, %rd1030; 2026-02-21T09:13:45.3836036Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3836096Z mov.b64 {%r1917, %r1918}, %rd1031; 2026-02-21T09:13:45.3836161Z cvt.rn.f16x2.f32 %r1919, %r1918, %r1917; 2026-02-21T09:13:45.3836338Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3836398Z cvt.u64.u32 %rd1032, %r1198; 2026-02-21T09:13:45.3836458Z cvt.u64.u32 %rd1033, %r1199; 2026-02-21T09:13:45.3836563Z shl.b64 %rd1034, %rd1033, 32; 2026-02-21T09:13:45.3836620Z or.b64 %rd1035, %rd1032, %rd1034; 2026-02-21T09:13:45.3836777Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3836833Z mov.b64 {%r1920, %r1921}, %rd1035; 2026-02-21T09:13:45.3836933Z cvt.rn.f16x2.f32 %r1922, %r1921, %r1920; 2026-02-21T09:13:45.3837089Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3837144Z cvt.u64.u32 %rd1036, %r1200; 2026-02-21T09:13:45.3837205Z cvt.u64.u32 %rd1037, %r1201; 2026-02-21T09:13:45.3837262Z shl.b64 %rd1038, %rd1037, 32; 2026-02-21T09:13:45.3837318Z or.b64 %rd1039, %rd1036, %rd1038; 2026-02-21T09:13:45.3837483Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3837540Z mov.b64 {%r1923, %r1924}, %rd1039; 2026-02-21T09:13:45.3837627Z cvt.rn.f16x2.f32 %r1925, %r1924, %r1923; 2026-02-21T09:13:45.3837795Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3837852Z cvt.u64.u32 %rd1040, %r1202; 2026-02-21T09:13:45.3837907Z cvt.u64.u32 %rd1041, %r1203; 2026-02-21T09:13:45.3837965Z shl.b64 %rd1042, %rd1041, 32; 
2026-02-21T09:13:45.3838029Z or.b64 %rd1043, %rd1040, %rd1042; 2026-02-21T09:13:45.3838189Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3838269Z mov.b64 {%r1926, %r1927}, %rd1043; 2026-02-21T09:13:45.3838339Z cvt.rn.f16x2.f32 %r1928, %r1927, %r1926; 2026-02-21T09:13:45.3838498Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3838554Z cvt.u64.u32 %rd1044, %r1204; 2026-02-21T09:13:45.3838616Z cvt.u64.u32 %rd1045, %r1205; 2026-02-21T09:13:45.3838672Z shl.b64 %rd1046, %rd1045, 32; 2026-02-21T09:13:45.3838729Z or.b64 %rd1047, %rd1044, %rd1046; 2026-02-21T09:13:45.3838890Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3838952Z mov.b64 {%r1929, %r1930}, %rd1047; 2026-02-21T09:13:45.3839015Z cvt.rn.f16x2.f32 %r1931, %r1930, %r1929; 2026-02-21T09:13:45.3839177Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3839240Z cvt.u64.u32 %rd1048, %r1207; 2026-02-21T09:13:45.3839296Z cvt.u64.u32 %rd1049, %r1208; 2026-02-21T09:13:45.3839354Z shl.b64 %rd1050, %rd1049, 32; 2026-02-21T09:13:45.3839418Z or.b64 %rd1051, %rd1048, %rd1050; 2026-02-21T09:13:45.3839581Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3839637Z mov.b64 {%r1932, %r1933}, %rd1051; 2026-02-21T09:13:45.3839699Z cvt.rn.f16x2.f32 %r1934, %r1933, %r1932; 2026-02-21T09:13:45.3839869Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3839926Z cvt.u64.u32 %rd1052, %r1209; 2026-02-21T09:13:45.3839980Z cvt.u64.u32 %rd1053, %r1210; 2026-02-21T09:13:45.3840043Z shl.b64 %rd1054, %rd1053, 32; 2026-02-21T09:13:45.3840100Z or.b64 %rd1055, %rd1052, %rd1054; 2026-02-21T09:13:45.3840258Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3840321Z mov.b64 {%r1935, %r1936}, %rd1055; 
2026-02-21T09:13:45.3840386Z cvt.rn.f16x2.f32 %r1937, %r1936, %r1935; 2026-02-21T09:13:45.3840543Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3840597Z cvt.u64.u32 %rd1056, %r1211; 2026-02-21T09:13:45.3840660Z cvt.u64.u32 %rd1057, %r1212; 2026-02-21T09:13:45.3840717Z shl.b64 %rd1058, %rd1057, 32; 2026-02-21T09:13:45.3840775Z or.b64 %rd1059, %rd1056, %rd1058; 2026-02-21T09:13:45.3840937Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3841014Z mov.b64 {%r1938, %r1939}, %rd1059; 2026-02-21T09:13:45.3841076Z cvt.rn.f16x2.f32 %r1940, %r1939, %r1938; 2026-02-21T09:13:45.3841239Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3841318Z cvt.u64.u32 %rd1060, %r1213; 2026-02-21T09:13:45.3841372Z cvt.u64.u32 %rd1061, %r1214; 2026-02-21T09:13:45.3841428Z shl.b64 %rd1062, %rd1061, 32; 2026-02-21T09:13:45.3841492Z or.b64 %rd1063, %rd1060, %rd1062; 2026-02-21T09:13:45.3841650Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3841706Z mov.b64 {%r1941, %r1942}, %rd1063; 2026-02-21T09:13:45.3841781Z cvt.rn.f16x2.f32 %r1943, %r1942, %r1941; 2026-02-21T09:13:45.3841835Z mov.b64 {%r1944, %r1945}, %rd483; 2026-02-21T09:13:45.3841897Z cvt.rn.f16x2.f32 %r1946, %r1945, %r1944; 2026-02-21T09:13:45.3841996Z mov.b64 {%r1947, %r1948}, %rd487; 2026-02-21T09:13:45.3842063Z cvt.rn.f16x2.f32 %r1949, %r1948, %r1947; 2026-02-21T09:13:45.3842118Z mov.b64 {%r1950, %r1951}, %rd491; 2026-02-21T09:13:45.3842180Z cvt.rn.f16x2.f32 %r1952, %r1951, %r1950; 2026-02-21T09:13:45.3842242Z mov.b64 {%r1953, %r1954}, %rd495; 2026-02-21T09:13:45.3842305Z cvt.rn.f16x2.f32 %r1955, %r1954, %r1953; 2026-02-21T09:13:45.3842358Z mov.b64 {%r1956, %r1957}, %rd499; 2026-02-21T09:13:45.3842425Z cvt.rn.f16x2.f32 %r1958, %r1957, %r1956; 2026-02-21T09:13:45.3842503Z mov.b64 {%r1959, %r1960}, %rd503; 2026-02-21T09:13:45.3842566Z 
cvt.rn.f16x2.f32 %r1961, %r1960, %r1959; 2026-02-21T09:13:45.3842620Z mov.b64 {%r1962, %r1963}, %rd507; 2026-02-21T09:13:45.3842688Z cvt.rn.f16x2.f32 %r1964, %r1963, %r1962; 2026-02-21T09:13:45.3842742Z mov.b64 {%r1965, %r1966}, %rd511; 2026-02-21T09:13:45.3842804Z cvt.rn.f16x2.f32 %r1967, %r1966, %r1965; 2026-02-21T09:13:45.3842865Z mov.b64 {%r1968, %r1969}, %rd515; 2026-02-21T09:13:45.3842928Z cvt.rn.f16x2.f32 %r1970, %r1969, %r1968; 2026-02-21T09:13:45.3842984Z mov.b64 {%r1971, %r1972}, %rd519; 2026-02-21T09:13:45.3843052Z cvt.rn.f16x2.f32 %r1973, %r1972, %r1971; 2026-02-21T09:13:45.3843106Z mov.b64 {%r1974, %r1975}, %rd523; 2026-02-21T09:13:45.3843168Z cvt.rn.f16x2.f32 %r1976, %r1975, %r1974; 2026-02-21T09:13:45.3843224Z mov.b64 {%r1977, %r1978}, %rd527; 2026-02-21T09:13:45.3843294Z cvt.rn.f16x2.f32 %r1979, %r1978, %r1977; 2026-02-21T09:13:45.3843349Z mov.b64 {%r1980, %r1981}, %rd531; 2026-02-21T09:13:45.3843411Z cvt.rn.f16x2.f32 %r1982, %r1981, %r1980; 2026-02-21T09:13:45.3843472Z mov.b64 {%r1983, %r1984}, %rd535; 2026-02-21T09:13:45.3843532Z cvt.rn.f16x2.f32 %r1985, %r1984, %r1983; 2026-02-21T09:13:45.3843585Z mov.b64 {%r1986, %r1987}, %rd539; 2026-02-21T09:13:45.3843646Z cvt.rn.f16x2.f32 %r1988, %r1987, %r1986; 2026-02-21T09:13:45.3843708Z mov.b64 {%r1989, %r1990}, %rd543; 2026-02-21T09:13:45.3843768Z cvt.rn.f16x2.f32 %r1991, %r1990, %r1989; 2026-02-21T09:13:45.3843824Z mov.b64 {%r1992, %r1993}, %rd547; 2026-02-21T09:13:45.3843894Z cvt.rn.f16x2.f32 %r1994, %r1993, %r1992; 2026-02-21T09:13:45.3843947Z mov.b64 {%r1995, %r1996}, %rd551; 2026-02-21T09:13:45.3844008Z cvt.rn.f16x2.f32 %r1997, %r1996, %r1995; 2026-02-21T09:13:45.3844066Z mov.b64 {%r1998, %r1999}, %rd555; 2026-02-21T09:13:45.3844128Z cvt.rn.f16x2.f32 %r2000, %r1999, %r1998; 2026-02-21T09:13:45.3844182Z mov.b64 {%r2001, %r2002}, %rd559; 2026-02-21T09:13:45.3844244Z cvt.rn.f16x2.f32 %r2003, %r2002, %r2001; 2026-02-21T09:13:45.3844412Z .loc 1 56 52 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3844468Z cvt.u64.u32 %rd1064, %r1258; 2026-02-21T09:13:45.3844523Z cvt.u64.u32 %rd1065, %r1259; 2026-02-21T09:13:45.3844585Z shl.b64 %rd1066, %rd1065, 32; 2026-02-21T09:13:45.3844640Z or.b64 %rd1067, %rd1064, %rd1066; 2026-02-21T09:13:45.3844833Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3844922Z mov.b64 {%r2004, %r2005}, %rd1067; 2026-02-21T09:13:45.3844982Z cvt.rn.f16x2.f32 %r2006, %r2005, %r2004; 2026-02-21T09:13:45.3845145Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3845201Z cvt.u64.u32 %rd1068, %r1260; 2026-02-21T09:13:45.3845291Z cvt.u64.u32 %rd1069, %r1261; 2026-02-21T09:13:45.3845347Z shl.b64 %rd1070, %rd1069, 32; 2026-02-21T09:13:45.3845402Z or.b64 %rd1071, %rd1068, %rd1070; 2026-02-21T09:13:45.3845569Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3845627Z mov.b64 {%r2007, %r2008}, %rd1071; 2026-02-21T09:13:45.3845690Z cvt.rn.f16x2.f32 %r2009, %r2008, %r2007; 2026-02-21T09:13:45.3845859Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3845915Z cvt.u64.u32 %rd1072, %r1262; 2026-02-21T09:13:45.3845993Z cvt.u64.u32 %rd1073, %r1263; 2026-02-21T09:13:45.3846051Z shl.b64 %rd1074, %rd1073, 32; 2026-02-21T09:13:45.3846114Z or.b64 %rd1075, %rd1072, %rd1074; 2026-02-21T09:13:45.3846275Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3846332Z mov.b64 {%r2010, %r2011}, %rd1075; 2026-02-21T09:13:45.3846401Z cvt.rn.f16x2.f32 %r2012, %r2011, %r2010; 2026-02-21T09:13:45.3846557Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3846649Z cvt.u64.u32 %rd1076, %r1264; 2026-02-21T09:13:45.3846713Z cvt.u64.u32 %rd1077, %r1265; 2026-02-21T09:13:45.3846771Z shl.b64 %rd1078, %rd1077, 32; 
2026-02-21T09:13:45.3846828Z or.b64 %rd1079, %rd1076, %rd1078; 2026-02-21T09:13:45.3846987Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3847050Z mov.b64 {%r2013, %r2014}, %rd1079; 2026-02-21T09:13:45.3847116Z cvt.rn.f16x2.f32 %r2015, %r2014, %r2013; 2026-02-21T09:13:45.3847275Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3847339Z cvt.u64.u32 %rd1080, %r1266; 2026-02-21T09:13:45.3847394Z cvt.u64.u32 %rd1081, %r1267; 2026-02-21T09:13:45.3847452Z shl.b64 %rd1082, %rd1081, 32; 2026-02-21T09:13:45.3847515Z or.b64 %rd1083, %rd1080, %rd1082; 2026-02-21T09:13:45.3847673Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3847731Z mov.b64 {%r2016, %r2017}, %rd1083; 2026-02-21T09:13:45.3847794Z cvt.rn.f16x2.f32 %r2018, %r2017, %r2016; 2026-02-21T09:13:45.3847963Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3848018Z cvt.u64.u32 %rd1084, %r1268; 2026-02-21T09:13:45.3848073Z cvt.u64.u32 %rd1085, %r1269; 2026-02-21T09:13:45.3848136Z shl.b64 %rd1086, %rd1085, 32; 2026-02-21T09:13:45.3848192Z or.b64 %rd1087, %rd1084, %rd1086; 2026-02-21T09:13:45.3848356Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3848422Z mov.b64 {%r2019, %r2020}, %rd1087; 2026-02-21T09:13:45.3848486Z cvt.rn.f16x2.f32 %r2021, %r2020, %r2019; 2026-02-21T09:13:45.3848646Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3848702Z cvt.u64.u32 %rd1088, %r1270; 2026-02-21T09:13:45.3848765Z cvt.u64.u32 %rd1089, %r1271; 2026-02-21T09:13:45.3848822Z shl.b64 %rd1090, %rd1089, 32; 2026-02-21T09:13:45.3848876Z or.b64 %rd1091, %rd1088, %rd1090; 2026-02-21T09:13:45.3849039Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3849094Z mov.b64 {%r2022, %r2023}, %rd1091; 
2026-02-21T09:13:45.3849156Z cvt.rn.f16x2.f32 %r2024, %r2023, %r2022; 2026-02-21T09:13:45.3849320Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3849396Z cvt.u64.u32 %rd1092, %r1272; 2026-02-21T09:13:45.3849451Z cvt.u64.u32 %rd1093, %r1273; 2026-02-21T09:13:45.3849507Z shl.b64 %rd1094, %rd1093, 32; 2026-02-21T09:13:45.3849569Z or.b64 %rd1095, %rd1092, %rd1094; 2026-02-21T09:13:45.3849748Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3849804Z mov.b64 {%r2025, %r2026}, %rd1095; 2026-02-21T09:13:45.3849876Z cvt.rn.f16x2.f32 %r2027, %r2026, %r2025; 2026-02-21T09:13:45.3850034Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3850089Z cvt.u64.u32 %rd1096, %r1275; 2026-02-21T09:13:45.3850149Z cvt.u64.u32 %rd1097, %r1276; 2026-02-21T09:13:45.3850206Z shl.b64 %rd1098, %rd1097, 32; 2026-02-21T09:13:45.3850262Z or.b64 %rd1099, %rd1096, %rd1098; 2026-02-21T09:13:45.3850444Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3850509Z mov.b64 {%r2028, %r2029}, %rd1099; 2026-02-21T09:13:45.3850572Z cvt.rn.f16x2.f32 %r2030, %r2029, %r2028; 2026-02-21T09:13:45.3850730Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3850796Z cvt.u64.u32 %rd1100, %r1277; 2026-02-21T09:13:45.3850852Z cvt.u64.u32 %rd1101, %r1278; 2026-02-21T09:13:45.3850909Z shl.b64 %rd1102, %rd1101, 32; 2026-02-21T09:13:45.3850993Z or.b64 %rd1103, %rd1100, %rd1102; 2026-02-21T09:13:45.3851152Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3851208Z mov.b64 {%r2031, %r2032}, %rd1103; 2026-02-21T09:13:45.3851273Z cvt.rn.f16x2.f32 %r2033, %r2032, %r2031; 2026-02-21T09:13:45.3851437Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3851493Z cvt.u64.u32 %rd1104, %r1279; 
2026-02-21T09:13:45.3851549Z cvt.u64.u32 %rd1105, %r1280; 2026-02-21T09:13:45.3851613Z shl.b64 %rd1106, %rd1105, 32; 2026-02-21T09:13:45.3851667Z or.b64 %rd1107, %rd1104, %rd1106; 2026-02-21T09:13:45.3851827Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3851890Z mov.b64 {%r2034, %r2035}, %rd1107; 2026-02-21T09:13:45.3851950Z cvt.rn.f16x2.f32 %r2036, %r2035, %r2034; 2026-02-21T09:13:45.3852110Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3852173Z cvt.u64.u32 %rd1108, %r1281; 2026-02-21T09:13:45.3852228Z cvt.u64.u32 %rd1109, %r1282; 2026-02-21T09:13:45.3852284Z shl.b64 %rd1110, %rd1109, 32; 2026-02-21T09:13:45.3852339Z or.b64 %rd1111, %rd1108, %rd1110; 2026-02-21T09:13:45.3852505Z .loc 1 58 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:58:27 2026-02-21T09:13:45.3852562Z mov.b64 {%r2037, %r2038}, %rd1111; 2026-02-21T09:13:45.3852625Z cvt.rn.f16x2.f32 %r2039, %r2038, %r2037; 2026-02-21T09:13:45.3852684Z mov.b64 {%r2040, %r2041}, %rd563; 2026-02-21T09:13:45.3852745Z cvt.rn.f16x2.f32 %r2042, %r2041, %r2040; 2026-02-21T09:13:45.3852799Z mov.b64 {%r2043, %r2044}, %rd567; 2026-02-21T09:13:45.3852864Z cvt.rn.f16x2.f32 %r2045, %r2044, %r2043; 2026-02-21T09:13:45.3852924Z mov.b64 {%r2046, %r2047}, %rd571; 2026-02-21T09:13:45.3852985Z cvt.rn.f16x2.f32 %r2048, %r2047, %r2046; 2026-02-21T09:13:45.3853040Z mov.b64 {%r2049, %r2050}, %rd575; 2026-02-21T09:13:45.3853109Z cvt.rn.f16x2.f32 %r2051, %r2050, %r2049; 2026-02-21T09:13:45.3853162Z mov.b64 {%r2052, %r2053}, %rd579; 2026-02-21T09:13:45.3853223Z cvt.rn.f16x2.f32 %r2054, %r2053, %r2052; 2026-02-21T09:13:45.3853282Z mov.b64 {%r2055, %r2056}, %rd583; 2026-02-21T09:13:45.3853344Z cvt.rn.f16x2.f32 %r2057, %r2056, %r2055; 2026-02-21T09:13:45.3853397Z mov.b64 {%r2058, %r2059}, %rd587; 2026-02-21T09:13:45.3853458Z cvt.rn.f16x2.f32 %r2060, %r2059, %r2058; 2026-02-21T09:13:45.3853542Z mov.b64 {%r2061, %r2062}, 
%rd591; 2026-02-21T09:13:45.3853604Z cvt.rn.f16x2.f32 %r2063, %r2062, %r2061; 2026-02-21T09:13:45.3853658Z mov.b64 {%r2064, %r2065}, %rd595; 2026-02-21T09:13:45.3853724Z cvt.rn.f16x2.f32 %r2066, %r2065, %r2064; 2026-02-21T09:13:45.3853802Z mov.b64 {%r2067, %r2068}, %rd599; 2026-02-21T09:13:45.3853865Z cvt.rn.f16x2.f32 %r2069, %r2068, %r2067; 2026-02-21T09:13:45.3853919Z mov.b64 {%r2070, %r2071}, %rd603; 2026-02-21T09:13:45.3853989Z cvt.rn.f16x2.f32 %r2072, %r2071, %r2070; 2026-02-21T09:13:45.3854043Z mov.b64 {%r2073, %r2074}, %rd607; 2026-02-21T09:13:45.3854104Z cvt.rn.f16x2.f32 %r2075, %r2074, %r2073; 2026-02-21T09:13:45.3854165Z mov.b64 {%r2076, %r2077}, %rd611; 2026-02-21T09:13:45.3854227Z cvt.rn.f16x2.f32 %r2078, %r2077, %r2076; 2026-02-21T09:13:45.3854281Z mov.b64 {%r2079, %r2080}, %rd615; 2026-02-21T09:13:45.3854350Z cvt.rn.f16x2.f32 %r2081, %r2080, %r2079; 2026-02-21T09:13:45.3854425Z mov.b64 {%r2082, %r2083}, %rd619; 2026-02-21T09:13:45.3854490Z cvt.rn.f16x2.f32 %r2084, %r2083, %r2082; 2026-02-21T09:13:45.3854543Z mov.b64 {%r2085, %r2086}, %rd623; 2026-02-21T09:13:45.3854611Z cvt.rn.f16x2.f32 %r2087, %r2086, %r2085; 2026-02-21T09:13:45.3854665Z mov.b64 {%r2088, %r2089}, %rd627; 2026-02-21T09:13:45.3854754Z cvt.rn.f16x2.f32 %r2090, %r2089, %r2088; 2026-02-21T09:13:45.3854817Z mov.b64 {%r2091, %r2092}, %rd631; 2026-02-21T09:13:45.3854878Z cvt.rn.f16x2.f32 %r2093, %r2092, %r2091; 2026-02-21T09:13:45.3854964Z mov.b64 {%r2094, %r2095}, %rd635; 2026-02-21T09:13:45.3855028Z cvt.rn.f16x2.f32 %r2096, %r2095, %r2094; 2026-02-21T09:13:45.3855089Z mov.b64 {%r2097, %r2098}, %rd639; 2026-02-21T09:13:45.3855150Z cvt.rn.f16x2.f32 %r2099, %r2098, %r2097; 2026-02-21T09:13:45.3855317Z .loc 1 59 45 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:59:45 2026-02-21T09:13:45.3855396Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:13:45.3855455Z bar.sync 0, 128; 2026-02-21T09:13:45.3855555Z st.shared.v4.b32 [%r46], {%r1334, %r1337, %r1340, %r1343}; 
2026-02-21T09:13:45.3855671Z st.shared.v4.b32 [%r46+32768], {%r1430, %r1433, %r1436, %r1439}; 2026-02-21T09:13:45.3855771Z st.shared.v4.b32 [%r46+65536], {%r1526, %r1529, %r1532, %r1535}; 2026-02-21T09:13:45.3855868Z st.shared.v4.b32 [%r46+98304], {%r1622, %r1625, %r1628, %r1631}; 2026-02-21T09:13:45.3855972Z st.shared.v4.b32 [%r46+16384], {%r1718, %r1721, %r1724, %r1727}; 2026-02-21T09:13:45.3856067Z st.shared.v4.b32 [%r46+49152], {%r1814, %r1817, %r1820, %r1823}; 2026-02-21T09:13:45.3856161Z st.shared.v4.b32 [%r46+81920], {%r1910, %r1913, %r1916, %r1919}; 2026-02-21T09:13:45.3856264Z st.shared.v4.b32 [%r46+114688], {%r2006, %r2009, %r2012, %r2015}; 2026-02-21T09:13:45.3856361Z st.shared.v4.b32 [%r47], {%r1346, %r1349, %r1352, %r1355}; 2026-02-21T09:13:45.3856455Z st.shared.v4.b32 [%r47+32768], {%r1442, %r1445, %r1448, %r1451}; 2026-02-21T09:13:45.3856550Z st.shared.v4.b32 [%r47+65536], {%r1538, %r1541, %r1544, %r1547}; 2026-02-21T09:13:45.3856653Z st.shared.v4.b32 [%r47+98304], {%r1634, %r1637, %r1640, %r1643}; 2026-02-21T09:13:45.3856746Z st.shared.v4.b32 [%r47+16384], {%r1730, %r1733, %r1736, %r1739}; 2026-02-21T09:13:45.3856838Z st.shared.v4.b32 [%r47+49152], {%r1826, %r1829, %r1832, %r1835}; 2026-02-21T09:13:45.3856937Z st.shared.v4.b32 [%r47+81920], {%r1922, %r1925, %r1928, %r1931}; 2026-02-21T09:13:45.3857031Z st.shared.v4.b32 [%r47+114688], {%r2018, %r2021, %r2024, %r2027}; 2026-02-21T09:13:45.3857121Z st.shared.v4.b32 [%r48], {%r1358, %r1361, %r1364, %r1367}; 2026-02-21T09:13:45.3857221Z st.shared.v4.b32 [%r48+32768], {%r1454, %r1457, %r1460, %r1463}; 2026-02-21T09:13:45.3857313Z st.shared.v4.b32 [%r48+65536], {%r1550, %r1553, %r1556, %r1559}; 2026-02-21T09:13:45.3857404Z st.shared.v4.b32 [%r48+98304], {%r1646, %r1649, %r1652, %r1655}; 2026-02-21T09:13:45.3857494Z st.shared.v4.b32 [%r48+16384], {%r1742, %r1745, %r1748, %r1751}; 2026-02-21T09:13:45.3857594Z st.shared.v4.b32 [%r48+49152], {%r1838, %r1841, %r1844, %r1847}; 2026-02-21T09:13:45.3857713Z 
st.shared.v4.b32 [%r48+81920], {%r1934, %r1937, %r1940, %r1943}; 2026-02-21T09:13:45.3857813Z st.shared.v4.b32 [%r48+114688], {%r2030, %r2033, %r2036, %r2039}; 2026-02-21T09:13:45.3857910Z st.shared.v4.b32 [%r49], {%r1370, %r1373, %r1376, %r1379}; 2026-02-21T09:13:45.3858028Z st.shared.v4.b32 [%r49+32768], {%r1466, %r1469, %r1472, %r1475}; 2026-02-21T09:13:45.3858120Z st.shared.v4.b32 [%r49+65536], {%r1562, %r1565, %r1568, %r1571}; 2026-02-21T09:13:45.3858220Z st.shared.v4.b32 [%r49+98304], {%r1658, %r1661, %r1664, %r1667}; 2026-02-21T09:13:45.3858311Z st.shared.v4.b32 [%r49+16384], {%r1754, %r1757, %r1760, %r1763}; 2026-02-21T09:13:45.3858403Z st.shared.v4.b32 [%r49+49152], {%r1850, %r1853, %r1856, %r1859}; 2026-02-21T09:13:45.3858501Z st.shared.v4.b32 [%r49+81920], {%r1946, %r1949, %r1952, %r1955}; 2026-02-21T09:13:45.3858622Z st.shared.v4.b32 [%r49+114688], {%r2042, %r2045, %r2048, %r2051}; 2026-02-21T09:13:45.3858711Z st.shared.v4.b32 [%r50], {%r1382, %r1385, %r1388, %r1391}; 2026-02-21T09:13:45.3858803Z st.shared.v4.b32 [%r50+32768], {%r1478, %r1481, %r1484, %r1487}; 2026-02-21T09:13:45.3858902Z st.shared.v4.b32 [%r50+65536], {%r1574, %r1577, %r1580, %r1583}; 2026-02-21T09:13:45.3858995Z st.shared.v4.b32 [%r50+98304], {%r1670, %r1673, %r1676, %r1679}; 2026-02-21T09:13:45.3859086Z st.shared.v4.b32 [%r50+16384], {%r1766, %r1769, %r1772, %r1775}; 2026-02-21T09:13:45.3859210Z st.shared.v4.b32 [%r50+49152], {%r1862, %r1865, %r1868, %r1871}; 2026-02-21T09:13:45.3859303Z st.shared.v4.b32 [%r50+81920], {%r1958, %r1961, %r1964, %r1967}; 2026-02-21T09:13:45.3859397Z st.shared.v4.b32 [%r50+114688], {%r2054, %r2057, %r2060, %r2063}; 2026-02-21T09:13:45.3859489Z st.shared.v4.b32 [%r51], {%r1394, %r1397, %r1400, %r1403}; 2026-02-21T09:13:45.3859579Z st.shared.v4.b32 [%r51+32768], {%r1490, %r1493, %r1496, %r1499}; 2026-02-21T09:13:45.3859673Z st.shared.v4.b32 [%r51+65536], {%r1586, %r1589, %r1592, %r1595}; 2026-02-21T09:13:45.3859773Z st.shared.v4.b32 [%r51+98304], {%r1682, 
%r1685, %r1688, %r1691}; 2026-02-21T09:13:45.3859864Z st.shared.v4.b32 [%r51+16384], {%r1778, %r1781, %r1784, %r1787}; 2026-02-21T09:13:45.3859953Z st.shared.v4.b32 [%r51+49152], {%r1874, %r1877, %r1880, %r1883}; 2026-02-21T09:13:45.3860044Z st.shared.v4.b32 [%r51+81920], {%r1970, %r1973, %r1976, %r1979}; 2026-02-21T09:13:45.3860145Z st.shared.v4.b32 [%r51+114688], {%r2066, %r2069, %r2072, %r2075}; 2026-02-21T09:13:45.3860232Z st.shared.v4.b32 [%r52], {%r1406, %r1409, %r1412, %r1415}; 2026-02-21T09:13:45.3860322Z st.shared.v4.b32 [%r52+32768], {%r1502, %r1505, %r1508, %r1511}; 2026-02-21T09:13:45.3860421Z st.shared.v4.b32 [%r52+65536], {%r1598, %r1601, %r1604, %r1607}; 2026-02-21T09:13:45.3860514Z st.shared.v4.b32 [%r52+98304], {%r1694, %r1697, %r1700, %r1703}; 2026-02-21T09:13:45.3860606Z st.shared.v4.b32 [%r52+16384], {%r1790, %r1793, %r1796, %r1799}; 2026-02-21T09:13:45.3860705Z st.shared.v4.b32 [%r52+49152], {%r1886, %r1889, %r1892, %r1895}; 2026-02-21T09:13:45.3860798Z st.shared.v4.b32 [%r52+81920], {%r1982, %r1985, %r1988, %r1991}; 2026-02-21T09:13:45.3860893Z st.shared.v4.b32 [%r52+114688], {%r2078, %r2081, %r2084, %r2087}; 2026-02-21T09:13:45.3860985Z st.shared.v4.b32 [%r53], {%r1418, %r1421, %r1424, %r1427}; 2026-02-21T09:13:45.3861078Z st.shared.v4.b32 [%r53+32768], {%r1514, %r1517, %r1520, %r1523}; 2026-02-21T09:13:45.3861169Z st.shared.v4.b32 [%r53+65536], {%r1610, %r1613, %r1616, %r1619}; 2026-02-21T09:13:45.3861262Z st.shared.v4.b32 [%r53+98304], {%r1706, %r1709, %r1712, %r1715}; 2026-02-21T09:13:45.3861360Z st.shared.v4.b32 [%r53+16384], {%r1802, %r1805, %r1808, %r1811}; 2026-02-21T09:13:45.3861450Z st.shared.v4.b32 [%r53+49152], {%r1898, %r1901, %r1904, %r1907}; 2026-02-21T09:13:45.3861542Z st.shared.v4.b32 [%r53+81920], {%r1994, %r1997, %r2000, %r2003}; 2026-02-21T09:13:45.3861642Z st.shared.v4.b32 [%r53+114688], {%r2090, %r2093, %r2096, %r2099}; 2026-02-21T09:13:45.3861699Z // begin inline asm 2026-02-21T09:13:45.3861793Z 
fence.proxy.async.shared::cta; 2026-02-21T09:13:45.3861852Z // end inline asm 2026-02-21T09:13:45.3861906Z bar.sync 0, 128; 2026-02-21T09:13:45.3861970Z elect.sync %r2100|%p143, -1; 2026-02-21T09:13:45.3862031Z and.pred %p141, %p142, %p143; 2026-02-21T09:13:45.3862116Z add.s32 %r1327, %r2153, %r55; 2026-02-21T09:13:45.3862171Z // begin inline asm 2026-02-21T09:13:45.3862363Z @%p141 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd87, {%r1327, %r2154}], [%r1329]; 2026-02-21T09:13:45.3862424Z // end inline asm 2026-02-21T09:13:45.3862490Z cp.async.bulk.commit_group; 2026-02-21T09:13:45.3862543Z mov.b32 %r2151, 1; 2026-02-21T09:13:45.3862652Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:45.3862821Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3862882Z xor.b32 %r2155, %r2151, %r2155; 2026-02-21T09:13:45.3863089Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3863157Z add.s32 %r2150, %r2150, 1; 2026-02-21T09:13:45.3863222Z setp.lt.s32 %p144, %r2150, %r42; 2026-02-21T09:13:45.3863281Z @%p144 bra $L__BB0_18; 2026-02-21T09:13:45.3863346Z bra.uni $L__BB0_23; 2026-02-21T09:13:45.3863448Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:13:45.3863505Z add.s32 %r779, %r2149, 1; 2026-02-21T09:13:45.3863594Z setp.eq.b32 %p137, %r2149, 63; 2026-02-21T09:13:45.3863658Z selp.b32 %r2149, 0, %r779, %p137; 2026-02-21T09:13:45.3863718Z setp.eq.b32 %p138, %r2149, 63; 2026-02-21T09:13:45.3863775Z @%p138 bra $L__BB0_21; 2026-02-21T09:13:45.3863880Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:45.3864045Z .loc 1 0 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:106 2026-02-21T09:13:45.3864101Z mov.b32 %r2151, 0; 2026-02-21T09:13:45.3864273Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3864331Z setp.ne.b32 %p139, %r2149, 0; 2026-02-21T09:13:45.3864386Z @%p139 
bra $L__BB0_22; 2026-02-21T09:13:45.3864467Z // %bb.20: // %.thread 2026-02-21T09:13:45.3864558Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:45.3864615Z add.s32 %r2152, %r2152, 1; 2026-02-21T09:13:45.3864809Z .loc 1 39 35 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:39:35 2026-02-21T09:13:45.3864874Z shr.s32 %r2102, %r2152, 31; 2026-02-21T09:13:45.3864930Z shr.u32 %r2103, %r2102, 25; 2026-02-21T09:13:45.3864989Z add.s32 %r2104, %r2152, %r2103; 2026-02-21T09:13:45.3865052Z shr.s32 %r2105, %r2104, 7; 2026-02-21T09:13:45.3865214Z .loc 1 40 33 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:40:33 2026-02-21T09:13:45.3865270Z shl.b32 %r2106, %r2105, 3; 2026-02-21T09:13:45.3865438Z .loc 1 41 39 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:41:39 2026-02-21T09:13:45.3865494Z sub.s32 %r2107, 8, %r2106; 2026-02-21T09:13:45.3865652Z .loc 1 41 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:41:52 2026-02-21T09:13:45.3865707Z min.s32 %r2108, %r2107, 8; 2026-02-21T09:13:45.3865875Z .loc 1 42 45 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:45 2026-02-21T09:13:45.3865933Z and.b32 %r2109, %r2104, -128; 2026-02-21T09:13:45.3865990Z sub.s32 %r2110, %r2152, %r2109; 2026-02-21T09:13:45.3866155Z .loc 1 43 51 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:43:51 2026-02-21T09:13:45.3866213Z div.s32 %r2111, %r2110, %r2108; 2026-02-21T09:13:45.3866375Z .loc 1 42 64 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:64 2026-02-21T09:13:45.3866443Z mul.lo.s32 %r2112, %r2111, %r2108; 2026-02-21T09:13:45.3866540Z sub.s32 %r2113, %r2110, %r2112; 2026-02-21T09:13:45.3866702Z .loc 1 42 30 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:30 2026-02-21T09:13:45.3866760Z add.s32 %r2114, %r2113, %r2106; 2026-02-21T09:13:45.3866955Z .loc 1 44 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:44:27 2026-02-21T09:13:45.3867011Z shl.b32 %r2154, %r2114, 8; 
2026-02-21T09:13:45.3867175Z .loc 1 45 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:45:27 2026-02-21T09:13:45.3867238Z shl.b32 %r2153, %r2111, 8; 2026-02-21T09:13:45.3867406Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3867461Z bra.uni $L__BB0_22; 2026-02-21T09:13:45.3867563Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:13:45.3867750Z .loc 1 0 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:106 2026-02-21T09:13:45.3867813Z mov.b32 %r74, global_smem; 2026-02-21T09:13:45.3867878Z add.s32 %r75, %r74, %r3; 2026-02-21T09:13:45.3867935Z mov.u32 %r122, %ctaid.x; 2026-02-21T09:13:45.3867989Z max.u32 %r123, %r122, 127; 2026-02-21T09:13:45.3868046Z shl.b32 %r124, %r123, 6; 2026-02-21T09:13:45.3868110Z sub.s32 %r5, 8192, %r124; 2026-02-21T09:13:45.3868169Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:13:45.3868222Z bra.uni $L__BB0_2; 2026-02-21T09:13:45.3868348Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3868516Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3868593Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3868649Z barrier.sync 1; 2026-02-21T09:13:45.3868710Z barrier.sync 1; 2026-02-21T09:13:45.3868785Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3868864Z $L__BB0_2: // %.preheader 2026-02-21T09:13:45.3868960Z // =>This Loop Header: Depth=1 2026-02-21T09:13:45.3869045Z // Child Loop BB0_11 Depth 2 2026-02-21T09:13:45.3869125Z // Child Loop BB0_7 Depth 2 2026-02-21T09:13:45.3869283Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19 2026-02-21T09:13:45.3869357Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:13:45.3869412Z barrier.sync 1; 2026-02-21T09:13:45.3869482Z ld.shared.b8 %r73, [%r75+262228]; 2026-02-21T09:13:45.3869540Z setp.gt.u32 %p4, %r73, 3; 2026-02-21T09:13:45.3869595Z @%p4 bra $L__BB0_4; 2026-02-21T09:13:45.3869672Z // %bb.3: // %.preheader 
2026-02-21T09:13:45.3869764Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3869824Z $L_brx_0: .branchtargets 2026-02-21T09:13:45.3869877Z $L__BB0_5, 2026-02-21T09:13:45.3869935Z $L__BB0_9, 2026-02-21T09:13:45.3869986Z $L__BB0_15, 2026-02-21T09:13:45.3870034Z $L__BB0_24; 2026-02-21T09:13:45.3870093Z brx.idx %r73, $L_brx_0; 2026-02-21T09:13:45.3870191Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3870359Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3870432Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3870503Z ld.shared.b32 %r129, [global_smem]; 2026-02-21T09:13:45.3870557Z barrier.sync 1; 2026-02-21T09:13:45.3870612Z @%p17 bra $L__BB0_8; 2026-02-21T09:13:45.3870684Z // %bb.6: // %.lr.ph6 2026-02-21T09:13:45.3870772Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3870935Z .loc 1 0 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:106 2026-02-21T09:13:45.3870990Z mov.b32 %r2132, -1; 2026-02-21T09:13:45.3871077Z mov.pred %p156, 0; 2026-02-21T09:13:45.3871130Z mov.b32 %r2129, 0; 2026-02-21T09:13:45.3871186Z mov.b32 %r2130, %r2129; 2026-02-21T09:13:45.3871248Z mov.b32 %r2131, %r2129; 2026-02-21T09:13:45.3871302Z mov.b32 %r2133, %r2129; 2026-02-21T09:13:45.3871395Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:45.3871507Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:45.3871682Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3871738Z add.s32 %r139, %r2132, 1; 2026-02-21T09:13:45.3871797Z setp.eq.b32 %p30, %r2132, 63; 2026-02-21T09:13:45.3871867Z selp.b32 %r2132, 0, %r139, %p30; 2026-02-21T09:13:45.3871925Z shl.b32 %r140, %r2131, 3; 2026-02-21T09:13:45.3871984Z add.s32 %r142, %r74, %r140; 2026-02-21T09:13:45.3872051Z add.s32 %r143, %r142, 262144; 2026-02-21T09:13:45.3872128Z add.s32 %r127, %r142, 262176; 2026-02-21T09:13:45.3872304Z .loc 1 54 31 // 
c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:54:31 2026-02-21T09:13:45.3872363Z shl.b32 %r144, %r2131, 14; 2026-02-21T09:13:45.3872431Z add.s32 %r145, %r74, %r144; 2026-02-21T09:13:45.3872490Z add.s32 %r146, %r145, 131072; 2026-02-21T09:13:45.3872657Z .loc 1 55 44 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:55:44 2026-02-21T09:13:45.3872724Z add.s32 %r147, %r145, 196608; 2026-02-21T09:13:45.3872908Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3872974Z bar.warp.sync -1; 2026-02-21T09:13:45.3873038Z // begin inline asm 2026-02-21T09:13:45.3873091Z 2026-02-21T09:13:45.3873142Z { 2026-02-21T09:13:45.3873204Z .reg .pred complete; 2026-02-21T09:13:45.3873267Z waitLoop: 2026-02-21T09:13:45.3873394Z mbarrier.try_wait.parity.shared.b64 complete, [%r127], %r2130; 2026-02-21T09:13:45.3873462Z @!complete bra.uni waitLoop; 2026-02-21T09:13:45.3873520Z } 2026-02-21T09:13:45.3873524Z 2026-02-21T09:13:45.3873581Z // end inline asm 2026-02-21T09:13:45.3873753Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3873815Z setp.eq.b32 %p29, %r2132, 63; 2026-02-21T09:13:45.3873991Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3874054Z elect.sync %r148|%p20, -1; 2026-02-21T09:13:45.3874119Z bfe.u32 %r149, %r146, 4, 14; 2026-02-21T09:13:45.3874184Z cvt.u64.u32 %rd24, %r149; 2026-02-21T09:13:45.3874259Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:13:45.3874319Z bfe.u32 %r150, %r147, 4, 14; 2026-02-21T09:13:45.3874386Z cvt.u64.u32 %rd25, %r150; 2026-02-21T09:13:45.3874457Z or.b64 %rd15, %rd25, -9223371899348713472; 2026-02-21T09:13:45.3874520Z mov.b32 %r130, 138412048; 2026-02-21T09:13:45.3874576Z // begin inline asm 2026-02-21T09:13:45.3874825Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r129 + 0 ], %rd14, %rd15, %r130, %p156; 2026-02-21T09:13:45.3874884Z // end inline asm 2026-02-21T09:13:45.3874942Z 
add.s32 %r151, %r145, 131104; 2026-02-21T09:13:45.3875007Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T09:13:45.3875064Z cvt.u64.u32 %rd26, %r152; 2026-02-21T09:13:45.3875134Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:13:45.3875198Z add.s32 %r153, %r145, 196640; 2026-02-21T09:13:45.3875254Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T09:13:45.3875313Z cvt.u64.u32 %rd27, %r154; 2026-02-21T09:13:45.3875379Z or.b64 %rd17, %rd27, -9223371899348713472; 2026-02-21T09:13:45.3875445Z mov.pred %p21, -1; 2026-02-21T09:13:45.3875500Z // begin inline asm 2026-02-21T09:13:45.3875641Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r129 + 0 ], %rd16, %rd17, %r130, %p21; 2026-02-21T09:13:45.3875703Z // end inline asm 2026-02-21T09:13:45.3875761Z add.s32 %r155, %r145, 139264; 2026-02-21T09:13:45.3875820Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:13:45.3875914Z cvt.u64.u32 %rd28, %r156; 2026-02-21T09:13:45.3875988Z or.b64 %rd18, %rd28, -9223371899348713472; 2026-02-21T09:13:45.3876044Z // begin inline asm 2026-02-21T09:13:45.3876181Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r129 + 256 ], %rd18, %rd15, %r130, %p156; 2026-02-21T09:13:45.3876268Z // end inline asm 2026-02-21T09:13:45.3876324Z add.s32 %r157, %r145, 139296; 2026-02-21T09:13:45.3876380Z bfe.u32 %r158, %r157, 4, 14; 2026-02-21T09:13:45.3876444Z cvt.u64.u32 %rd29, %r158; 2026-02-21T09:13:45.3876512Z or.b64 %rd20, %rd29, -9223371899348713472; 2026-02-21T09:13:45.3876568Z // begin inline asm 2026-02-21T09:13:45.3876703Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r129 + 256 ], %rd20, %rd17, %r130, %p21; 2026-02-21T09:13:45.3876767Z // end inline asm 2026-02-21T09:13:45.3876823Z cvt.u64.u32 %rd22, %r143; 2026-02-21T09:13:45.3876878Z // begin inline asm 2026-02-21T09:13:45.3877040Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:13:45.3877096Z // end inline asm 2026-02-21T09:13:45.3877161Z and.pred %p28, %p29, %p20; 2026-02-21T09:13:45.3877228Z add.s32 %r159, %r74, 262208; 
2026-02-21T09:13:45.3877285Z cvt.u64.u32 %rd23, %r159; 2026-02-21T09:13:45.3877340Z // begin inline asm 2026-02-21T09:13:45.3877463Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:13:45.3877527Z // end inline asm 2026-02-21T09:13:45.3877688Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3877775Z setp.ne.b32 %p156, %r2132, 63; 2026-02-21T09:13:45.3877955Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3878017Z selp.b32 %r160, 1, 0, %p29; 2026-02-21T09:13:45.3878078Z xor.b32 %r2129, %r2129, %r160; 2026-02-21T09:13:45.3878143Z add.s32 %r137, %r74, 262224; 2026-02-21T09:13:45.3878200Z // begin inline asm 2026-02-21T09:13:45.3878250Z 2026-02-21T09:13:45.3878301Z { 2026-02-21T09:13:45.3878371Z @!%p29 bra.uni skipWait; 2026-02-21T09:13:45.3878432Z .reg .pred complete; 2026-02-21T09:13:45.3878485Z waitLoop: 2026-02-21T09:13:45.3878614Z mbarrier.try_wait.parity.shared.b64 complete, [%r137], %r2129; 2026-02-21T09:13:45.3878679Z @!complete bra.uni waitLoop; 2026-02-21T09:13:45.3878734Z skipWait: 2026-02-21T09:13:45.3878784Z } 2026-02-21T09:13:45.3878788Z 2026-02-21T09:13:45.3878850Z // end inline asm 2026-02-21T09:13:45.3879015Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3879074Z add.s32 %r161, %r2131, 1; 2026-02-21T09:13:45.3879142Z setp.eq.b32 %p31, %r161, 4; 2026-02-21T09:13:45.3879207Z selp.b32 %r2131, 0, %r161, %p31; 2026-02-21T09:13:45.3879266Z selp.b32 %r162, 1, 0, %p31; 2026-02-21T09:13:45.3879326Z xor.b32 %r2130, %r2130, %r162; 2026-02-21T09:13:45.3879513Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3879572Z add.s32 %r2133, %r2133, 1; 2026-02-21T09:13:45.3879636Z setp.lt.s32 %p32, %r2133, %r5; 2026-02-21T09:13:45.3879703Z @%p32 bra $L__BB0_7; 2026-02-21T09:13:45.3879787Z $L__BB0_8: // %._crit_edge7 2026-02-21T09:13:45.3879885Z // in 
Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3879947Z barrier.sync 1; 2026-02-21T09:13:45.3880022Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3880077Z bra.uni $L__BB0_2; 2026-02-21T09:13:45.3880169Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3880344Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3880420Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3880474Z barrier.sync 1; 2026-02-21T09:13:45.3880643Z .loc 1 21 67 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:21:67 2026-02-21T09:13:45.3880702Z mov.u32 %r76, %ctaid.y; 2026-02-21T09:13:45.3880781Z mov.u32 %r77, %ctaid.z; 2026-02-21T09:13:45.3880844Z mov.u32 %r78, %nctaid.x; 2026-02-21T09:13:45.3880901Z mov.u32 %r79, %nctaid.y; 2026-02-21T09:13:45.3880961Z mad.lo.s32 %r80, %r77, %r79, %r76; 2026-02-21T09:13:45.3881021Z mad.lo.s32 %r81, %r80, %r78, %r122; 2026-02-21T09:13:45.3881124Z mul.lo.s32 %r82, %r81, 384; 2026-02-21T09:13:45.3881283Z .loc 1 22 67 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:22:67 2026-02-21T09:13:45.3881340Z add.s32 %r83, %r82, 128; 2026-02-21T09:13:45.3881404Z cvt.s64.s32 %rd8, %r83; 2026-02-21T09:13:45.3881460Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:13:45.3881519Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:13:45.3881684Z .loc 1 21 67 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:21:67 2026-02-21T09:13:45.3881742Z cvt.s64.s32 %rd10, %r82; 2026-02-21T09:13:45.3881798Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:13:45.3881886Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:13:45.3882059Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3882115Z @%p17 bra $L__BB0_14; 2026-02-21T09:13:45.3882187Z // %bb.10: // %.lr.ph 2026-02-21T09:13:45.3882281Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3882339Z add.s32 %r2144, %r122, -1; 2026-02-21T09:13:45.3882394Z add.s32 %r19, %r1, -128; 2026-02-21T09:13:45.3882469Z 
mov.b32 %r2140, -1; 2026-02-21T09:13:45.3882529Z mov.b32 %r2134, 0; 2026-02-21T09:13:45.3882583Z mov.b32 %r2135, %r2134; 2026-02-21T09:13:45.3882637Z mov.b32 %r2143, %r2134; 2026-02-21T09:13:45.3882696Z mov.b32 %r2142, %r2134; 2026-02-21T09:13:45.3882748Z mov.b32 %r2138, %r2134; 2026-02-21T09:13:45.3882800Z mov.b32 %r2141, %r2134; 2026-02-21T09:13:45.3882862Z bra.uni $L__BB0_11; 2026-02-21T09:13:45.3882959Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:45.3883124Z .loc 1 0 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0:106 2026-02-21T09:13:45.3883185Z selp.b32 %r106, 0, %r2138, %p8; 2026-02-21T09:13:45.3883251Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:13:45.3883310Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:13:45.3883473Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3883536Z shl.b32 %r113, %r2135, 3; 2026-02-21T09:13:45.3883592Z add.s32 %r115, %r74, %r113; 2026-02-21T09:13:45.3883648Z add.s32 %r102, %r115, 262144; 2026-02-21T09:13:45.3883809Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3883864Z // begin inline asm 2026-02-21T09:13:45.3883912Z 2026-02-21T09:13:45.3883957Z { 2026-02-21T09:13:45.3884020Z .reg .pred complete; 2026-02-21T09:13:45.3884072Z waitLoop: 2026-02-21T09:13:45.3884189Z mbarrier.try_wait.parity.shared.b64 complete, [%r102], %r2134; 2026-02-21T09:13:45.3884257Z @!complete bra.uni waitLoop; 2026-02-21T09:13:45.3884304Z } 2026-02-21T09:13:45.3884307Z 2026-02-21T09:13:45.3884359Z // end inline asm 2026-02-21T09:13:45.3884525Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3884589Z add.s32 %r108, %r115, 262176; 2026-02-21T09:13:45.3884774Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3884828Z bar.sync 3, 64; 2026-02-21T09:13:45.3884891Z // begin inline asm 2026-02-21T09:13:45.3884999Z @%p9 
mbarrier.arrive.expect_tx.shared.b64 _, [%r108], 32768; 2026-02-21T09:13:45.3885051Z // end inline asm 2026-02-21T09:13:45.3885214Z .loc 1 54 31 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:54:31 2026-02-21T09:13:45.3885270Z shl.b32 %r116, %r2135, 14; 2026-02-21T09:13:45.3885325Z add.s32 %r117, %r74, %r116; 2026-02-21T09:13:45.3885381Z add.s32 %r105, %r117, 131072; 2026-02-21T09:13:45.3885568Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3885620Z bar.sync 3, 64; 2026-02-21T09:13:45.3885680Z elect.sync %r118|%p13, -1; 2026-02-21T09:13:45.3885772Z and.pred %p10, %p12, %p13; 2026-02-21T09:13:45.3885826Z // begin inline asm 2026-02-21T09:13:45.3886081Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r105], [%rd12, {%r106, %r2142}], [%r108]; 2026-02-21T09:13:45.3886143Z // end inline asm 2026-02-21T09:13:45.3886304Z .loc 1 55 44 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:55:44 2026-02-21T09:13:45.3886359Z add.s32 %r109, %r117, 196608; 2026-02-21T09:13:45.3886514Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3886567Z bar.sync 3, 64; 2026-02-21T09:13:45.3886652Z elect.sync %r119|%p14, -1; 2026-02-21T09:13:45.3886712Z and.pred %p11, %p12, %p14; 2026-02-21T09:13:45.3886774Z // begin inline asm 2026-02-21T09:13:45.3887018Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r109], [%rd13, {%r106, %r2143}], [%r108]; 2026-02-21T09:13:45.3887072Z // end inline asm 2026-02-21T09:13:45.3887245Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3887299Z add.s32 %r2138, %r106, 32; 2026-02-21T09:13:45.3887474Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3887537Z add.s32 %r120, %r2135, 1; 2026-02-21T09:13:45.3887597Z setp.eq.b32 %p15, %r120, 4; 2026-02-21T09:13:45.3887657Z selp.b32 %r2135, 0, %r120, 
%p15; 2026-02-21T09:13:45.3887714Z selp.b32 %r121, 1, 0, %p15; 2026-02-21T09:13:45.3887779Z xor.b32 %r2134, %r2134, %r121; 2026-02-21T09:13:45.3887948Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3888005Z add.s32 %r2141, %r2141, 1; 2026-02-21T09:13:45.3888071Z setp.lt.s32 %p16, %r2141, %r5; 2026-02-21T09:13:45.3888129Z @%p16 bra $L__BB0_11; 2026-02-21T09:13:45.3888182Z bra.uni $L__BB0_14; 2026-02-21T09:13:45.3888280Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:45.3888380Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:45.3888435Z add.s32 %r88, %r2140, 1; 2026-02-21T09:13:45.3888496Z setp.eq.b32 %p6, %r2140, 63; 2026-02-21T09:13:45.3888563Z selp.b32 %r2140, 0, %r88, %p6; 2026-02-21T09:13:45.3888620Z setp.ne.b32 %p7, %r2140, 0; 2026-02-21T09:13:45.3888677Z setp.eq.b32 %p8, %r2140, 0; 2026-02-21T09:13:45.3888739Z @%p7 bra $L__BB0_13; 2026-02-21T09:13:45.3888829Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:45.3888884Z add.s32 %r2144, %r2144, 1; 2026-02-21T09:13:45.3889043Z .loc 1 39 35 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:39:35 2026-02-21T09:13:45.3889107Z shr.s32 %r89, %r2144, 31; 2026-02-21T09:13:45.3889161Z shr.u32 %r90, %r89, 25; 2026-02-21T09:13:45.3889217Z add.s32 %r91, %r2144, %r90; 2026-02-21T09:13:45.3889280Z shr.s32 %r92, %r91, 7; 2026-02-21T09:13:45.3889440Z .loc 1 40 33 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:40:33 2026-02-21T09:13:45.3889495Z shl.b32 %r93, %r92, 3; 2026-02-21T09:13:45.3889656Z .loc 1 41 39 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:41:39 2026-02-21T09:13:45.3889711Z sub.s32 %r94, 8, %r93; 2026-02-21T09:13:45.3889871Z .loc 1 41 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:41:52 2026-02-21T09:13:45.3889923Z min.s32 %r95, %r94, 8; 2026-02-21T09:13:45.3890088Z .loc 1 42 45 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:45 2026-02-21T09:13:45.3890145Z and.b32 
%r96, %r91, -128; 2026-02-21T09:13:45.3890224Z sub.s32 %r97, %r2144, %r96; 2026-02-21T09:13:45.3890390Z .loc 1 43 51 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:43:51 2026-02-21T09:13:45.3890446Z div.s32 %r98, %r97, %r95; 2026-02-21T09:13:45.3890627Z .loc 1 42 64 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:64 2026-02-21T09:13:45.3890692Z mul.lo.s32 %r99, %r98, %r95; 2026-02-21T09:13:45.3890746Z sub.s32 %r100, %r97, %r99; 2026-02-21T09:13:45.3890902Z .loc 1 42 30 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:42:30 2026-02-21T09:13:45.3890965Z add.s32 %r101, %r100, %r93; 2026-02-21T09:13:45.3891127Z .loc 1 44 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:44:27 2026-02-21T09:13:45.3891181Z shl.b32 %r2142, %r101, 8; 2026-02-21T09:13:45.3891358Z .loc 1 45 27 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:45:27 2026-02-21T09:13:45.3891425Z shl.b32 %r2143, %r98, 8; 2026-02-21T09:13:45.3891479Z bra.uni $L__BB0_13; 2026-02-21T09:13:45.3891560Z $L__BB0_14: // %._crit_edge 2026-02-21T09:13:45.3891652Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3891819Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3891873Z barrier.sync 1; 2026-02-21T09:13:45.3891977Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:45.3892033Z bra.uni $L__BB0_2; 2026-02-21T09:13:45.3892124Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:45.3892275Z .loc 1 19 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:19 2026-02-21T09:13:45.3892336Z barrier.sync 1; 2026-02-21T09:13:45.3892389Z barrier.sync 1; 2026-02-21T09:13:45.3892442Z bra.uni $L__BB0_2; 2026-02-21T09:13:45.3892532Z $L__BB0_23: // %._crit_edge10 2026-02-21T09:13:45.3892692Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3892759Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:13:45.3892820Z bar.sync 0, 128; 2026-02-21T09:13:45.3892874Z 
barrier.sync 1; 2026-02-21T09:13:45.3892947Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:45.3893103Z .loc 1 56 52 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:56:52 2026-02-21T09:13:45.3893164Z // begin inline asm 2026-02-21T09:13:45.3893210Z 2026-02-21T09:13:45.3893256Z { 2026-02-21T09:13:45.3893319Z .reg .pred complete; 2026-02-21T09:13:45.3893369Z waitLoop: 2026-02-21T09:13:45.3893485Z mbarrier.try_wait.parity.shared.b64 complete, [%r2115], %r2155; 2026-02-21T09:13:45.3893546Z @!complete bra.uni waitLoop; 2026-02-21T09:13:45.3893599Z } 2026-02-21T09:13:45.3893602Z 2026-02-21T09:13:45.3893655Z // end inline asm 2026-02-21T09:13:45.3893816Z .loc 1 33 106 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:106 2026-02-21T09:13:45.3893878Z bar.sync 0, 128; 2026-02-21T09:13:45.3893933Z // begin inline asm 2026-02-21T09:13:45.3894018Z @%p145 mbarrier.inval.shared::cta.b64 [%r2115]; 2026-02-21T09:13:45.3894077Z // end inline asm 2026-02-21T09:13:45.3894131Z // begin inline asm 2026-02-21T09:13:45.3894214Z @%p145 mbarrier.inval.shared::cta.b64 [%r744]; 2026-02-21T09:13:45.3894265Z // end inline asm 2026-02-21T09:13:45.3894325Z // begin inline asm 2026-02-21T09:13:45.3894403Z @%p145 mbarrier.inval.shared::cta.b64 [%r736]; 2026-02-21T09:13:45.3894453Z // end inline asm 2026-02-21T09:13:45.3894510Z bar.sync 0, 128; 2026-02-21T09:13:45.3894562Z // begin inline asm 2026-02-21T09:13:45.3894636Z @%p145 mbarrier.inval.shared::cta.b64 [%r737]; 2026-02-21T09:13:45.3894717Z // end inline asm 2026-02-21T09:13:45.3894779Z bar.sync 0, 128; 2026-02-21T09:13:45.3894834Z // begin inline asm 2026-02-21T09:13:45.3894939Z @%p145 mbarrier.inval.shared::cta.b64 [%r738]; 2026-02-21T09:13:45.3894997Z // end inline asm 2026-02-21T09:13:45.3895048Z bar.sync 0, 128; 2026-02-21T09:13:45.3895099Z // begin inline asm 2026-02-21T09:13:45.3895177Z @%p145 mbarrier.inval.shared::cta.b64 [%r739]; 2026-02-21T09:13:45.3895258Z // end inline asm 2026-02-21T09:13:45.3895310Z 
// begin inline asm 2026-02-21T09:13:45.3895383Z @%p145 mbarrier.inval.shared::cta.b64 [%r732]; 2026-02-21T09:13:45.3895442Z // end inline asm 2026-02-21T09:13:45.3895495Z bar.sync 0, 128; 2026-02-21T09:13:45.3895548Z // begin inline asm 2026-02-21T09:13:45.3895628Z @%p145 mbarrier.inval.shared::cta.b64 [%r733]; 2026-02-21T09:13:45.3895679Z // end inline asm 2026-02-21T09:13:45.3895731Z bar.sync 0, 128; 2026-02-21T09:13:45.3895783Z // begin inline asm 2026-02-21T09:13:45.3895862Z @%p145 mbarrier.inval.shared::cta.b64 [%r734]; 2026-02-21T09:13:45.3895913Z // end inline asm 2026-02-21T09:13:45.3895987Z bar.sync 0, 128; 2026-02-21T09:13:45.3896048Z // begin inline asm 2026-02-21T09:13:45.3896124Z @%p145 mbarrier.inval.shared::cta.b64 [%r735]; 2026-02-21T09:13:45.3896175Z // end inline asm 2026-02-21T09:13:45.3896340Z .loc 1 33 4 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:33:4 2026-02-21T09:13:45.3896393Z bar.sync 0, 128; 2026-02-21T09:13:45.3896447Z // begin inline asm 2026-02-21T09:13:45.3896564Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r2127, 512; 2026-02-21T09:13:45.3896622Z // end inline asm 2026-02-21T09:13:45.3896725Z st.shared.b32 [global_smem+262232], 50529027; 2026-02-21T09:13:45.3896782Z barrier.sync 1; 2026-02-21T09:13:45.3896867Z $L__BB0_24: // %common.ret 2026-02-21T09:13:45.3897020Z .loc 1 0 0 // c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py:0 2026-02-21T09:13:45.3897069Z ret; 2026-02-21T09:13:45.3897122Z $L__tmp1: 2026-02-21T09:13:45.3897182Z $L__func_end0: 2026-02-21T09:13:45.3897261Z // -- End function 2026-02-21T09:13:45.3897309Z } 2026-02-21T09:13:45.3897510Z .file 1 "/tmp/torchinductor_root/6j/c6jxaho2zcccndj4fe2uvcxe2a7hsc2t33uw3vmdybo6egxaukgu.py" 2026-02-21T09:13:45.3897568Z .section .debug_abbrev 2026-02-21T09:13:45.3897616Z { 2026-02-21T09:13:45.3897708Z .b8 1 // Abbreviation Code 2026-02-21T09:13:45.3897791Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:13:45.3897870Z .b8 0 // DW_CHILDREN_no 
2026-02-21T09:13:45.3897945Z .b8 37 // DW_AT_producer 2026-02-21T09:13:45.3898022Z .b8 8 // DW_FORM_string 2026-02-21T09:13:45.3898091Z .b8 19 // DW_AT_language 2026-02-21T09:13:45.3898163Z .b8 5 // DW_FORM_data2 2026-02-21T09:13:45.3898241Z .b8 3 // DW_AT_name 2026-02-21T09:13:45.3898312Z .b8 8 // DW_FORM_string 2026-02-21T09:13:45.3898387Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:13:45.3898464Z .b8 6 // DW_FORM_data4 2026-02-21T09:13:45.3898535Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:13:45.3898606Z .b8 8 // DW_FORM_string 2026-02-21T09:13:45.3898672Z .b8 0 // EOM(1) 2026-02-21T09:13:45.3898745Z .b8 0 // EOM(2) 2026-02-21T09:13:45.3898807Z .b8 0 // EOM(3) 2026-02-21T09:13:45.3898853Z } 2026-02-21T09:13:45.3898916Z .section .debug_info 2026-02-21T09:13:45.3898964Z { 2026-02-21T09:13:45.3899043Z .b32 104 // Length of Unit 2026-02-21T09:13:45.3899129Z .b8 2 // DWARF version number 2026-02-21T09:13:45.3899178Z .b8 0 2026-02-21T09:13:45.3899291Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:13:45.3899412Z .b8 8 // Address Size (in bytes) 2026-02-21T09:13:45.3899517Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:13:45.3899596Z .b8 116 // DW_AT_producer 2026-02-21T09:13:45.3899667Z .b8 114 2026-02-21T09:13:45.3899725Z .b8 105 2026-02-21T09:13:45.3899773Z .b8 116 2026-02-21T09:13:45.3899820Z .b8 111 2026-02-21T09:13:45.3899867Z .b8 110 2026-02-21T09:13:45.3899924Z .b8 0 2026-02-21T09:13:45.3899993Z .b8 2 // DW_AT_language 2026-02-21T09:13:45.3900040Z .b8 0 2026-02-21T09:13:45.3900118Z .b8 99 // DW_AT_name 2026-02-21T09:13:45.3900165Z .b8 54 2026-02-21T09:13:45.3900213Z .b8 106 2026-02-21T09:13:45.3900259Z .b8 120 2026-02-21T09:13:45.3900314Z .b8 97 2026-02-21T09:13:45.3900363Z .b8 104 2026-02-21T09:13:45.3900410Z .b8 111 2026-02-21T09:13:45.3900490Z .b8 50 2026-02-21T09:13:45.3900540Z .b8 122 2026-02-21T09:13:45.3900588Z .b8 99 2026-02-21T09:13:45.3900634Z .b8 99 2026-02-21T09:13:45.3900688Z .b8 99 2026-02-21T09:13:45.3900735Z .b8 110 
2026-02-21T09:13:45.3900780Z .b8 100 2026-02-21T09:13:45.3900833Z .b8 106 2026-02-21T09:13:45.3900881Z .b8 52 2026-02-21T09:13:45.3900930Z .b8 102 2026-02-21T09:13:45.3900977Z .b8 101 2026-02-21T09:13:45.3901030Z .b8 50 2026-02-21T09:13:45.3901076Z .b8 117 2026-02-21T09:13:45.3901123Z .b8 118 2026-02-21T09:13:45.3901169Z .b8 99 2026-02-21T09:13:45.3901224Z .b8 120 2026-02-21T09:13:45.3901291Z .b8 101 2026-02-21T09:13:45.3901339Z .b8 50 2026-02-21T09:13:45.3901393Z .b8 97 2026-02-21T09:13:45.3901439Z .b8 55 2026-02-21T09:13:45.3901487Z .b8 104 2026-02-21T09:13:45.3901534Z .b8 115 2026-02-21T09:13:45.3901586Z .b8 99 2026-02-21T09:13:45.3901632Z .b8 50 2026-02-21T09:13:45.3901679Z .b8 116 2026-02-21T09:13:45.3901731Z .b8 51 2026-02-21T09:13:45.3901777Z .b8 51 2026-02-21T09:13:45.3901825Z .b8 117 2026-02-21T09:13:45.3901872Z .b8 119 2026-02-21T09:13:45.3901927Z .b8 51 2026-02-21T09:13:45.3901975Z .b8 118 2026-02-21T09:13:45.3902023Z .b8 109 2026-02-21T09:13:45.3902080Z .b8 100 2026-02-21T09:13:45.3902128Z .b8 121 2026-02-21T09:13:45.3902174Z .b8 98 2026-02-21T09:13:45.3902221Z .b8 111 2026-02-21T09:13:45.3902274Z .b8 54 2026-02-21T09:13:45.3902322Z .b8 101 2026-02-21T09:13:45.3902371Z .b8 103 2026-02-21T09:13:45.3902418Z .b8 120 2026-02-21T09:13:45.3902474Z .b8 97 2026-02-21T09:13:45.3902520Z .b8 117 2026-02-21T09:13:45.3902568Z .b8 107 2026-02-21T09:13:45.3902623Z .b8 103 2026-02-21T09:13:45.3902671Z .b8 117 2026-02-21T09:13:45.3902718Z .b8 46 2026-02-21T09:13:45.3902765Z .b8 112 2026-02-21T09:13:45.3902821Z .b8 121 2026-02-21T09:13:45.3902868Z .b8 0 2026-02-21T09:13:45.3902959Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:13:45.3903037Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:13:45.3903085Z .b8 116 2026-02-21T09:13:45.3903133Z .b8 109 2026-02-21T09:13:45.3903183Z .b8 112 2026-02-21T09:13:45.3903237Z .b8 47 2026-02-21T09:13:45.3903287Z .b8 116 2026-02-21T09:13:45.3903333Z .b8 111 2026-02-21T09:13:45.3903387Z .b8 114 2026-02-21T09:13:45.3903433Z .b8 99 
2026-02-21T09:13:45.3903480Z .b8 104 2026-02-21T09:13:45.3903528Z .b8 105 2026-02-21T09:13:45.3903581Z .b8 110 2026-02-21T09:13:45.3903630Z .b8 100 2026-02-21T09:13:45.3903677Z .b8 117 2026-02-21T09:13:45.3903722Z .b8 99 2026-02-21T09:13:45.3903777Z .b8 116 2026-02-21T09:13:45.3903825Z .b8 111 2026-02-21T09:13:45.3903872Z .b8 114 2026-02-21T09:13:45.3903925Z .b8 95 2026-02-21T09:13:45.3903981Z .b8 114 2026-02-21T09:13:45.3904029Z .b8 111 2026-02-21T09:13:45.3904076Z .b8 111 2026-02-21T09:13:45.3904131Z .b8 116 2026-02-21T09:13:45.3904177Z .b8 47 2026-02-21T09:13:45.3904224Z .b8 54 2026-02-21T09:13:45.3904278Z .b8 106 2026-02-21T09:13:45.3904325Z .b8 0 2026-02-21T09:13:45.3904371Z } 2026-02-21T09:13:45.3904433Z .section .debug_macinfo { } 2026-02-21T09:13:45.3904437Z 2026-02-21T09:13:45.3904520Z ================================================================ 2026-02-21T09:13:45.3904644Z please share the reproducer above with Triton project. 2026-02-21T09:13:46.0987562Z [162s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:13:46.0987857Z 2026-02-21T09:13:46.0987954Z 2026-02-21T09:13:46.0988256Z 2026-02-21T09:13:46.0988487Z ================================================================ 2026-02-21T09:13:46.0988721Z Internal Triton PTX codegen error 2026-02-21T09:13:46.0988899Z `ptxas` stderr: 2026-02-21T09:13:46.0989369Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 270 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:13:46.0989846Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:46.0989999Z 2026-02-21T09:13:46.0990494Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpv2efbo2e.ptx -o /tmp/tmpv2efbo2e.ptx.o 2026-02-21T09:13:46.0990973Z 2026-02-21T09:13:46.0990976Z 2026-02-21T09:13:46.0991031Z // 2026-02-21T09:13:46.0991174Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:13:46.0991352Z // 2026-02-21T09:13:46.0991419Z 2026-02-21T09:13:46.0991481Z .version 8.7 2026-02-21T09:13:46.0991617Z .target sm_100a 2026-02-21T09:13:46.0991754Z .address_size 64 2026-02-21T09:13:46.0991838Z 2026-02-21T09:13:46.0991957Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:13:46.0992268Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:13:46.0992478Z // @_helion_matmul 2026-02-21T09:13:46.0992687Z .visible .entry _helion_matmul( 2026-02-21T09:13:46.0992905Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:13:46.0993165Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:13:46.0993420Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:13:46.0993660Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:13:46.0993919Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:13:46.0994125Z ) 2026-02-21T09:13:46.0994258Z .reqntid 256 2026-02-21T09:13:46.0994388Z .maxnreg 32 2026-02-21T09:13:46.0994520Z { 2026-02-21T09:13:46.0994650Z .reg .pred %p<144>; 2026-02-21T09:13:46.0997434Z .reg .b32 %r<1235>; 2026-02-21T09:13:46.0997599Z .reg .b64 %rd<600>; 2026-02-21T09:13:46.0997867Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19:0 2026-02-21T09:13:46.0998165Z $L__func_begin0: 2026-02-21T09:13:46.0998414Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19:0 2026-02-21T09:13:46.0998651Z 
2026-02-21T09:13:46.0998704Z // %bb.0: 2026-02-21T09:13:46.0998855Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:13:46.0999045Z $L__tmp0: 2026-02-21T09:13:46.0999280Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19 2026-02-21T09:13:46.0999558Z mov.u32 %r1, %tid.x; 2026-02-21T09:13:46.0999707Z shr.u32 %r2, %r1, 5; 2026-02-21T09:13:46.0999863Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:13:46.1001299Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:13:46.1002541Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:13:46.1002774Z `ptxas` stderr: 2026-02-21T09:13:46.1003176Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 270 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:13:46.1003735Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:46.1003875Z 2026-02-21T09:13:46.1004249Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpv2efbo2e.ptx -o /tmp/tmpv2efbo2e.ptx.o 2026-02-21T09:13:46.1004766Z 2026-02-21T09:13:46.1004892Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:13:46.1005140Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:13:46.1005296Z @%p3 bra $L__BB0_16; 2026-02-21T09:13:46.1005444Z bra.uni $L__BB0_1; 2026-02-21T09:13:46.1005579Z $L__BB0_16: 2026-02-21T09:13:46.1005820Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:0 2026-02-21T09:13:46.1006127Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:13:46.1006342Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:13:46.1006594Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:13:46.1006880Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19 2026-02-21T09:13:46.1007193Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:46.1007377Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:13:46.1007547Z mov.b32 %r161, global_smem; 2026-02-21T09:13:46.1007704Z // begin inline asm 2026-02-21T09:13:46.1007956Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r161], 256; 2026-02-21T09:13:46.1008210Z // end inline asm 2026-02-21T09:13:46.1008389Z bar.sync 0, 128; 2026-02-21T09:13:46.1008549Z ld.shared.b32 %r1206, [global_smem]; 2026-02-21T09:13:46.1008717Z bar.sync 0, 128; 2026-02-21T09:13:46.1008855Z // begin inline asm 2026-02-21T09:13:46.1009053Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:13:46.1009275Z // end inline asm 2026-02-21T09:13:46.1009516Z .loc 1 21 67 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:21:67 2026-02-21T09:13:46.1009804Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:13:46.1009961Z mov.u32 %r474, %ctaid.y; 2026-02-21T09:13:46.1010108Z mov.u32 %r475, %ctaid.z; 2026-02-21T09:13:46.1010258Z mov.u32 %r476, %nctaid.x; 2026-02-21T09:13:46.1010409Z mov.u32 %r477, %nctaid.y; 2026-02-21T09:13:46.1010572Z mad.lo.s32 %r478, %r475, %r477, %r474; 2026-02-21T09:13:46.1010750Z mad.lo.s32 %r479, %r478, %r476, %r41; 2026-02-21T09:13:46.1010928Z mul.lo.s32 %r480, %r479, 384; 2026-02-21T09:13:46.1011086Z cvt.s64.s32 %rd84, %r480; 
2026-02-21T09:13:46.1011245Z add.s64 %rd45, %rd7, %rd84; 2026-02-21T09:13:46.1011399Z shl.b32 %r481, %r1, 2; 2026-02-21T09:13:46.1011554Z add.s32 %r162, %r161, %r481; 2026-02-21T09:13:46.1011709Z mov.b32 %r1234, 0; 2026-02-21T09:13:46.1011843Z // begin inline asm 2026-02-21T09:13:46.1012000Z @%p33 st.shared.b32 [ %r162 + 0 ], %r1234; 2026-02-21T09:13:46.1012169Z // end inline asm 2026-02-21T09:13:46.1012311Z bar.warp.sync -1; 2026-02-21T09:13:46.1012454Z setp.eq.b32 %p132, %r1, 0; 2026-02-21T09:13:46.1012614Z cvt.u64.u32 %rd30, %r161; 2026-02-21T09:13:46.1012760Z // begin inline asm 2026-02-21T09:13:46.1013010Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd4; 2026-02-21T09:13:46.1013290Z // end inline asm 2026-02-21T09:13:46.1013421Z // begin inline asm 2026-02-21T09:13:46.1013646Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:46.1013893Z // end inline asm 2026-02-21T09:13:46.1014031Z mov.b32 %r164, 32; 2026-02-21T09:13:46.1014169Z // begin inline asm 2026-02-21T09:13:46.1014418Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r164; 2026-02-21T09:13:46.1014725Z // end inline asm 2026-02-21T09:13:46.1014868Z mov.b32 %r165, 256; 2026-02-21T09:13:46.1015016Z // begin inline asm 2026-02-21T09:13:46.1015254Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r165; 2026-02-21T09:13:46.1015534Z // end inline asm 2026-02-21T09:13:46.1015720Z mov.b32 %r166, 2048; 2026-02-21T09:13:46.1015872Z // begin inline asm 2026-02-21T09:13:46.1016121Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r166; 2026-02-21T09:13:46.1016415Z // end inline asm 2026-02-21T09:13:46.1016599Z // begin inline asm 2026-02-21T09:13:46.1016854Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r166; 2026-02-21T09:13:46.1017144Z // end inline asm 2026-02-21T09:13:46.1017279Z mov.b64 
%rd38, 4096; 2026-02-21T09:13:46.1017433Z // begin inline asm 2026-02-21T09:13:46.1017691Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd38; 2026-02-21T09:13:46.1017987Z // end inline asm 2026-02-21T09:13:46.1018123Z mov.b32 %r168, 1; 2026-02-21T09:13:46.1018268Z // begin inline asm 2026-02-21T09:13:46.1018535Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r168; 2026-02-21T09:13:46.1018870Z // end inline asm 2026-02-21T09:13:46.1019017Z // begin inline asm 2026-02-21T09:13:46.1019281Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r168; 2026-02-21T09:13:46.1019577Z // end inline asm 2026-02-21T09:13:46.1019711Z // begin inline asm 2026-02-21T09:13:46.1019955Z @%p132 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:46.1020225Z // end inline asm 2026-02-21T09:13:46.1020366Z // begin inline asm 2026-02-21T09:13:46.1020661Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1020953Z // end inline asm 2026-02-21T09:13:46.1021093Z // begin inline asm 2026-02-21T09:13:46.1021330Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x2; 2026-02-21T09:13:46.1021610Z // end inline asm 2026-02-21T09:13:46.1021749Z // begin inline asm 2026-02-21T09:13:46.1021978Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1022238Z // end inline asm 2026-02-21T09:13:46.1022366Z // begin inline asm 2026-02-21T09:13:46.1022715Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd45 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:46.1023078Z // end inline asm 2026-02-21T09:13:46.1023217Z // begin inline asm 2026-02-21T09:13:46.1023420Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd45 + 0 ], 0x80; 2026-02-21T09:13:46.1023670Z @%p33 
cp.async.bulk.commit_group ; 2026-02-21T09:13:46.1023865Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:46.1024033Z // end inline asm 2026-02-21T09:13:46.1024166Z bar.sync 0, 128; 2026-02-21T09:13:46.1024405Z .loc 1 22 67 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:22:67 2026-02-21T09:13:46.1024726Z add.s32 %r482, %r480, 128; 2026-02-21T09:13:46.1024880Z cvt.s64.s32 %rd85, %r482; 2026-02-21T09:13:46.1025036Z add.s64 %rd63, %rd7, %rd85; 2026-02-21T09:13:46.1025187Z bar.sync 0, 128; 2026-02-21T09:13:46.1025325Z // begin inline asm 2026-02-21T09:13:46.1025483Z @%p33 st.shared.b32 [ %r162 + 0 ], %r1234; 2026-02-21T09:13:46.1025653Z // end inline asm 2026-02-21T09:13:46.1025791Z bar.warp.sync -1; 2026-02-21T09:13:46.1025925Z // begin inline asm 2026-02-21T09:13:46.1026173Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd5; 2026-02-21T09:13:46.1026440Z // end inline asm 2026-02-21T09:13:46.1026573Z // begin inline asm 2026-02-21T09:13:46.1026790Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:46.1027043Z // end inline asm 2026-02-21T09:13:46.1027179Z // begin inline asm 2026-02-21T09:13:46.1027412Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r164; 2026-02-21T09:13:46.1027685Z // end inline asm 2026-02-21T09:13:46.1027833Z mov.b32 %r173, 128; 2026-02-21T09:13:46.1027979Z // begin inline asm 2026-02-21T09:13:46.1028208Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r173; 2026-02-21T09:13:46.1028510Z // end inline asm 2026-02-21T09:13:46.1028638Z // begin inline asm 2026-02-21T09:13:46.1028880Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r166; 2026-02-21T09:13:46.1029208Z // end inline asm 2026-02-21T09:13:46.1029338Z mov.b32 %r175, 4096; 2026-02-21T09:13:46.1029484Z // begin inline asm 2026-02-21T09:13:46.1029721Z @%p132 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r175; 2026-02-21T09:13:46.1029996Z // end inline asm 2026-02-21T09:13:46.1030124Z // begin inline asm 2026-02-21T09:13:46.1030381Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd38; 2026-02-21T09:13:46.1030662Z // end inline asm 2026-02-21T09:13:46.1030791Z // begin inline asm 2026-02-21T09:13:46.1031052Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r168; 2026-02-21T09:13:46.1031363Z // end inline asm 2026-02-21T09:13:46.1031503Z // begin inline asm 2026-02-21T09:13:46.1031747Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r168; 2026-02-21T09:13:46.1032027Z // end inline asm 2026-02-21T09:13:46.1032155Z // begin inline asm 2026-02-21T09:13:46.1032387Z @%p132 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:46.1032658Z // end inline asm 2026-02-21T09:13:46.1032785Z // begin inline asm 2026-02-21T09:13:46.1033063Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1033350Z // end inline asm 2026-02-21T09:13:46.1033486Z // begin inline asm 2026-02-21T09:13:46.1033713Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x2; 2026-02-21T09:13:46.1033984Z // end inline asm 2026-02-21T09:13:46.1034116Z // begin inline asm 2026-02-21T09:13:46.1034338Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1034597Z // end inline asm 2026-02-21T09:13:46.1034751Z // begin inline asm 2026-02-21T09:13:46.1035097Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:46.1035470Z // end inline asm 2026-02-21T09:13:46.1035608Z // begin inline asm 2026-02-21T09:13:46.1035823Z @%p33 
fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T09:13:46.1036069Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:46.1036266Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:46.1036440Z // end inline asm 2026-02-21T09:13:46.1036578Z bar.sync 0, 128; 2026-02-21T09:13:46.1036827Z .loc 1 24 71 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:24:71 2026-02-21T09:13:46.1037135Z add.s32 %r483, %r480, 256; 2026-02-21T09:13:46.1037290Z cvt.s64.s32 %rd86, %r483; 2026-02-21T09:13:46.1037456Z add.s64 %rd81, %rd7, %rd86; 2026-02-21T09:13:46.1037619Z bar.sync 0, 128; 2026-02-21T09:13:46.1037750Z // begin inline asm 2026-02-21T09:13:46.1037911Z @%p33 st.shared.b32 [ %r162 + 0 ], %r1234; 2026-02-21T09:13:46.1038084Z // end inline asm 2026-02-21T09:13:46.1038229Z bar.warp.sync -1; 2026-02-21T09:13:46.1038368Z // begin inline asm 2026-02-21T09:13:46.1038619Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd6; 2026-02-21T09:13:46.1038900Z // end inline asm 2026-02-21T09:13:46.1039034Z // begin inline asm 2026-02-21T09:13:46.1039262Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:13:46.1039513Z // end inline asm 2026-02-21T09:13:46.1039646Z mov.b32 %r180, 64; 2026-02-21T09:13:46.1039776Z // begin inline asm 2026-02-21T09:13:46.1040012Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r180; 2026-02-21T09:13:46.1040274Z // end inline asm 2026-02-21T09:13:46.1040409Z // begin inline asm 2026-02-21T09:13:46.1040640Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r165; 2026-02-21T09:13:46.1040934Z // end inline asm 2026-02-21T09:13:46.1041067Z // begin inline asm 2026-02-21T09:13:46.1041309Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r175; 2026-02-21T09:13:46.1041620Z // end inline asm 2026-02-21T09:13:46.1041747Z // begin inline asm 
2026-02-21T09:13:46.1041990Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r166; 2026-02-21T09:13:46.1042258Z // end inline asm 2026-02-21T09:13:46.1042395Z mov.b64 %rd74, 8192; 2026-02-21T09:13:46.1042536Z // begin inline asm 2026-02-21T09:13:46.1042783Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd74; 2026-02-21T09:13:46.1043068Z // end inline asm 2026-02-21T09:13:46.1043194Z // begin inline asm 2026-02-21T09:13:46.1043477Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r168; 2026-02-21T09:13:46.1043763Z // end inline asm 2026-02-21T09:13:46.1043898Z // begin inline asm 2026-02-21T09:13:46.1044149Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r168; 2026-02-21T09:13:46.1044423Z // end inline asm 2026-02-21T09:13:46.1044558Z // begin inline asm 2026-02-21T09:13:46.1044816Z @%p132 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:13:46.1045081Z // end inline asm 2026-02-21T09:13:46.1045207Z // begin inline asm 2026-02-21T09:13:46.1045483Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1045767Z // end inline asm 2026-02-21T09:13:46.1045901Z // begin inline asm 2026-02-21T09:13:46.1046137Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x3; 2026-02-21T09:13:46.1046394Z // end inline asm 2026-02-21T09:13:46.1046530Z // begin inline asm 2026-02-21T09:13:46.1046752Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:13:46.1047016Z // end inline asm 2026-02-21T09:13:46.1047143Z // begin inline asm 2026-02-21T09:13:46.1047483Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd81 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:13:46.1047852Z // end inline asm 2026-02-21T09:13:46.1047981Z // 
begin inline asm 2026-02-21T09:13:46.1048190Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd81 + 0 ], 0x80; 2026-02-21T09:13:46.1048431Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:46.1048625Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:46.1048797Z // end inline asm 2026-02-21T09:13:46.1048937Z bar.sync 0, 128; 2026-02-21T09:13:46.1049078Z cvta.global.u64 %rd87, %rd81; 2026-02-21T09:13:46.1049365Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1049670Z sub.s32 %r484, 256, %r41; 2026-02-21T09:13:46.1049828Z mul.hi.s32 %r485, %r484, -580400985; 2026-02-21T09:13:46.1050009Z add.s32 %r486, %r485, %r484; 2026-02-21T09:13:46.1050159Z shr.u32 %r487, %r486, 31; 2026-02-21T09:13:46.1050310Z shr.s32 %r488, %r486, 11; 2026-02-21T09:13:46.1050456Z add.s32 %r489, %r488, %r487; 2026-02-21T09:13:46.1050617Z mul.lo.s32 %r490, %r489, 2368; 2026-02-21T09:13:46.1050782Z setp.ne.b32 %p120, %r484, %r490; 2026-02-21T09:13:46.1050955Z setp.lt.u32 %p121, %r41, 257; 2026-02-21T09:13:46.1051119Z and.pred %p122, %p121, %p120; 2026-02-21T09:13:46.1051275Z selp.b32 %r491, 1, 0, %p122; 2026-02-21T09:13:46.1051432Z add.s32 %r492, %r489, %r491; 2026-02-21T09:13:46.1051580Z shl.b32 %r42, %r492, 6; 2026-02-21T09:13:46.1051845Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1052148Z shfl.sync.idx.b32 %r43, %r2, 0, 31, -1; 2026-02-21T09:13:46.1052332Z shl.b32 %r493, %r43, 21; 2026-02-21T09:13:46.1052485Z and.b32 %r494, %r493, 6291456; 2026-02-21T09:13:46.1052649Z add.s32 %r186, %r494, %r1206; 2026-02-21T09:13:46.1052840Z mov.pred %p89, -1; 2026-02-21T09:13:46.1052980Z // begin inline asm 2026-02-21T09:13:46.1053387Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 0], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1053824Z // end inline asm 
2026-02-21T09:13:46.1053961Z // begin inline asm 2026-02-21T09:13:46.1054339Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 16], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1054795Z // end inline asm 2026-02-21T09:13:46.1054931Z // begin inline asm 2026-02-21T09:13:46.1055300Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 32], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1055727Z // end inline asm 2026-02-21T09:13:46.1055863Z // begin inline asm 2026-02-21T09:13:46.1056244Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 48], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1056676Z // end inline asm 2026-02-21T09:13:46.1056816Z // begin inline asm 2026-02-21T09:13:46.1057239Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 64], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1057660Z // end inline asm 2026-02-21T09:13:46.1057801Z // begin inline asm 2026-02-21T09:13:46.1058179Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 80], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1058605Z // end inline asm 2026-02-21T09:13:46.1058747Z // begin inline asm 2026-02-21T09:13:46.1059126Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 96], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1059553Z // end inline asm 2026-02-21T09:13:46.1059686Z // begin inline asm 2026-02-21T09:13:46.1060081Z @%p89 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 112], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1060510Z // end inline asm 2026-02-21T09:13:46.1060644Z // begin inline asm 2026-02-21T09:13:46.1061036Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 128], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1061456Z // end inline asm 2026-02-21T09:13:46.1061602Z // begin inline asm 2026-02-21T09:13:46.1062000Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 144], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1062433Z // end inline asm 2026-02-21T09:13:46.1062571Z // begin inline asm 2026-02-21T09:13:46.1062952Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 160], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1063389Z // end inline asm 2026-02-21T09:13:46.1063522Z // begin inline asm 2026-02-21T09:13:46.1063915Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 176], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1064356Z // end inline asm 2026-02-21T09:13:46.1064500Z // begin inline asm 2026-02-21T09:13:46.1064895Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 192], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1065356Z // end inline asm 2026-02-21T09:13:46.1065491Z // begin inline asm 2026-02-21T09:13:46.1065857Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 208], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, 
%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1066306Z // end inline asm 2026-02-21T09:13:46.1066442Z // begin inline asm 2026-02-21T09:13:46.1066816Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 224], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1067240Z // end inline asm 2026-02-21T09:13:46.1067369Z // begin inline asm 2026-02-21T09:13:46.1067774Z @%p89 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r186 + 240], {%r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234, %r1234}; 2026-02-21T09:13:46.1068187Z // end inline asm 2026-02-21T09:13:46.1068316Z // begin inline asm 2026-02-21T09:13:46.1068466Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:13:46.1068620Z // end inline asm 2026-02-21T09:13:46.1068757Z bar.sync 0, 128; 2026-02-21T09:13:46.1069006Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1069309Z add.s32 %r458, %r161, 163840; 2026-02-21T09:13:46.1069457Z // begin inline asm 2026-02-21T09:13:46.1069656Z @%p132 mbarrier.init.shared::cta.b64 [%r458], 1; 2026-02-21T09:13:46.1069850Z // end inline asm 2026-02-21T09:13:46.1069979Z bar.sync 0, 128; 2026-02-21T09:13:46.1070114Z add.s32 %r459, %r161, 163848; 2026-02-21T09:13:46.1070261Z // begin inline asm 2026-02-21T09:13:46.1070430Z @%p132 mbarrier.init.shared::cta.b64 [%r459], 1; 2026-02-21T09:13:46.1070613Z // end inline asm 2026-02-21T09:13:46.1070745Z bar.sync 0, 128; 2026-02-21T09:13:46.1070876Z add.s32 %r460, %r161, 163856; 2026-02-21T09:13:46.1071033Z // begin inline asm 2026-02-21T09:13:46.1071190Z @%p132 mbarrier.init.shared::cta.b64 [%r460], 1; 2026-02-21T09:13:46.1071378Z // end inline asm 2026-02-21T09:13:46.1071509Z bar.sync 0, 128; 2026-02-21T09:13:46.1071639Z add.s32 %r461, %r161, 163864; 2026-02-21T09:13:46.1071798Z // 
begin inline asm 2026-02-21T09:13:46.1071954Z @%p132 mbarrier.init.shared::cta.b64 [%r461], 1; 2026-02-21T09:13:46.1072139Z // end inline asm 2026-02-21T09:13:46.1072271Z add.s32 %r462, %r161, 163872; 2026-02-21T09:13:46.1072431Z // begin inline asm 2026-02-21T09:13:46.1072585Z @%p132 mbarrier.init.shared::cta.b64 [%r462], 1; 2026-02-21T09:13:46.1072771Z // end inline asm 2026-02-21T09:13:46.1072899Z bar.sync 0, 128; 2026-02-21T09:13:46.1073039Z add.s32 %r463, %r161, 163880; 2026-02-21T09:13:46.1073211Z // begin inline asm 2026-02-21T09:13:46.1073368Z @%p132 mbarrier.init.shared::cta.b64 [%r463], 1; 2026-02-21T09:13:46.1073557Z // end inline asm 2026-02-21T09:13:46.1073685Z bar.sync 0, 128; 2026-02-21T09:13:46.1073822Z add.s32 %r464, %r161, 163888; 2026-02-21T09:13:46.1073968Z // begin inline asm 2026-02-21T09:13:46.1074130Z @%p132 mbarrier.init.shared::cta.b64 [%r464], 1; 2026-02-21T09:13:46.1074310Z // end inline asm 2026-02-21T09:13:46.1074443Z bar.sync 0, 128; 2026-02-21T09:13:46.1074579Z add.s32 %r465, %r161, 163896; 2026-02-21T09:13:46.1074754Z // begin inline asm 2026-02-21T09:13:46.1074921Z @%p132 mbarrier.init.shared::cta.b64 [%r465], 1; 2026-02-21T09:13:46.1075098Z // end inline asm 2026-02-21T09:13:46.1075353Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1075630Z bar.sync 0, 128; 2026-02-21T09:13:46.1075769Z // begin inline asm 2026-02-21T09:13:46.1075939Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r458]; 2026-02-21T09:13:46.1076141Z // end inline asm 2026-02-21T09:13:46.1076278Z bar.sync 0, 128; 2026-02-21T09:13:46.1076408Z // begin inline asm 2026-02-21T09:13:46.1076583Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r459]; 2026-02-21T09:13:46.1076820Z // end inline asm 2026-02-21T09:13:46.1076953Z bar.sync 0, 128; 2026-02-21T09:13:46.1077081Z // begin inline asm 2026-02-21T09:13:46.1077247Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r460]; 2026-02-21T09:13:46.1077464Z // end inline asm 
2026-02-21T09:13:46.1077600Z bar.sync 0, 128; 2026-02-21T09:13:46.1077727Z // begin inline asm 2026-02-21T09:13:46.1077893Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r461]; 2026-02-21T09:13:46.1078083Z // end inline asm 2026-02-21T09:13:46.1078331Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1078624Z bar.sync 0, 128; 2026-02-21T09:13:46.1078758Z add.s32 %r470, %r161, 163904; 2026-02-21T09:13:46.1078913Z // begin inline asm 2026-02-21T09:13:46.1079070Z @%p132 mbarrier.init.shared::cta.b64 [%r470], 1; 2026-02-21T09:13:46.1079256Z // end inline asm 2026-02-21T09:13:46.1079423Z add.s32 %r1194, %r161, 163920; 2026-02-21T09:13:46.1079578Z // begin inline asm 2026-02-21T09:13:46.1079738Z @%p132 mbarrier.init.shared::cta.b64 [%r1194], 1; 2026-02-21T09:13:46.1079919Z // end inline asm 2026-02-21T09:13:46.1080166Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1080446Z bar.sync 0, 128; 2026-02-21T09:13:46.1080583Z // begin inline asm 2026-02-21T09:13:46.1080746Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r1194]; 2026-02-21T09:13:46.1080970Z // end inline asm 2026-02-21T09:13:46.1081225Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1081527Z st.shared.b32 [global_smem+163928], 33554689; 2026-02-21T09:13:46.1081743Z st.shared.b32 [global_smem+65536], %r1206; 2026-02-21T09:13:46.1081939Z st.shared.b32 [global_smem+65544], %r42; 2026-02-21T09:13:46.1082122Z barrier.sync 1; 2026-02-21T09:13:46.1082281Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:46.1082470Z barrier.sync 1; 2026-02-21T09:13:46.1082610Z setp.lt.s32 %p123, %r492, 1; 2026-02-21T09:13:46.1082774Z @%p123 bra $L__BB0_23; 2026-02-21T09:13:46.1082945Z // %bb.17: // %.lr.ph11 2026-02-21T09:13:46.1083131Z add.s32 %r1231, %r41, -2368; 2026-02-21T09:13:46.1083293Z shl.b32 %r497, %r1, 7; 2026-02-21T09:13:46.1083437Z and.b32 %r498, %r497, 16256; 
2026-02-21T09:13:46.1083593Z shl.b32 %r499, %r1, 4; 2026-02-21T09:13:46.1083734Z and.b32 %r500, %r499, 112; 2026-02-21T09:13:46.1083892Z or.b32 %r501, %r498, %r500; 2026-02-21T09:13:46.1084042Z add.s32 %r503, %r161, 65536; 2026-02-21T09:13:46.1084199Z add.s32 %r46, %r503, %r501; 2026-02-21T09:13:46.1084349Z xor.b32 %r504, %r501, 16; 2026-02-21T09:13:46.1084504Z add.s32 %r47, %r503, %r504; 2026-02-21T09:13:46.1084660Z xor.b32 %r505, %r501, 32; 2026-02-21T09:13:46.1084848Z add.s32 %r48, %r503, %r505; 2026-02-21T09:13:46.1085003Z xor.b32 %r506, %r501, 48; 2026-02-21T09:13:46.1085149Z add.s32 %r49, %r503, %r506; 2026-02-21T09:13:46.1085303Z xor.b32 %r507, %r501, 64; 2026-02-21T09:13:46.1085447Z add.s32 %r50, %r503, %r507; 2026-02-21T09:13:46.1085597Z xor.b32 %r508, %r501, 80; 2026-02-21T09:13:46.1085737Z add.s32 %r51, %r503, %r508; 2026-02-21T09:13:46.1085890Z xor.b32 %r509, %r501, 96; 2026-02-21T09:13:46.1086033Z add.s32 %r52, %r503, %r509; 2026-02-21T09:13:46.1086186Z xor.b32 %r510, %r501, 112; 2026-02-21T09:13:46.1086340Z add.s32 %r53, %r503, %r510; 2026-02-21T09:13:46.1086489Z and.b32 %r511, %r43, 1; 2026-02-21T09:13:46.1086642Z shl.b32 %r512, %r511, 15; 2026-02-21T09:13:46.1086783Z add.s32 %r792, %r503, %r512; 2026-02-21T09:13:46.1086939Z shl.b32 %r55, %r511, 6; 2026-02-21T09:13:46.1087084Z max.s32 %r1224, %r42, 1; 2026-02-21T09:13:46.1087235Z mov.b32 %r1229, -1; 2026-02-21T09:13:46.1087373Z mov.b32 %r1232, %r1234; 2026-02-21T09:13:46.1087521Z mov.b32 %r1233, %r1234; 2026-02-21T09:13:46.1087661Z bra.uni $L__BB0_18; 2026-02-21T09:13:46.1087853Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:46.1088221Z .loc 1 0 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:130 2026-02-21T09:13:46.1088514Z setp.lt.u32 %p129, %r1, 64; 2026-02-21T09:13:46.1088786Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1089093Z bar.sync 0, 128; 2026-02-21T09:13:46.1089236Z // begin inline asm 
2026-02-21T09:13:46.1089368Z 2026-02-21T09:13:46.1089489Z { 2026-02-21T09:13:46.1089616Z .reg .pred complete; 2026-02-21T09:13:46.1089759Z waitLoop: 2026-02-21T09:13:46.1089953Z mbarrier.try_wait.parity.shared.b64 complete, [%r470], %r1234; 2026-02-21T09:13:46.1090183Z @!complete bra.uni waitLoop; 2026-02-21T09:13:46.1090342Z } 2026-02-21T09:13:46.1090406Z 2026-02-21T09:13:46.1090461Z // end inline asm 2026-02-21T09:13:46.1090607Z // begin inline asm 2026-02-21T09:13:46.1090989Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532}, [%r186 + 0]; 2026-02-21T09:13:46.1091374Z // end inline asm 2026-02-21T09:13:46.1091511Z // begin inline asm 2026-02-21T09:13:46.1091854Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549}, [%r186 + 16]; 2026-02-21T09:13:46.1092227Z // end inline asm 2026-02-21T09:13:46.1092355Z // begin inline asm 2026-02-21T09:13:46.1092716Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566}, [%r186 + 32]; 2026-02-21T09:13:46.1093095Z // end inline asm 2026-02-21T09:13:46.1093231Z // begin inline asm 2026-02-21T09:13:46.1093568Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583}, [%r186 + 48]; 2026-02-21T09:13:46.1093943Z // end inline asm 2026-02-21T09:13:46.1094080Z // begin inline asm 2026-02-21T09:13:46.1094414Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600}, [%r186 + 64]; 2026-02-21T09:13:46.1094827Z // end inline asm 2026-02-21T09:13:46.1094955Z // begin inline asm 2026-02-21T09:13:46.1095305Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r602, %r603, 
%r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617}, [%r186 + 80]; 2026-02-21T09:13:46.1095676Z // end inline asm 2026-02-21T09:13:46.1095804Z // begin inline asm 2026-02-21T09:13:46.1096153Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634}, [%r186 + 96]; 2026-02-21T09:13:46.1096532Z // end inline asm 2026-02-21T09:13:46.1096668Z // begin inline asm 2026-02-21T09:13:46.1096997Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651}, [%r186 + 112]; 2026-02-21T09:13:46.1097386Z // end inline asm 2026-02-21T09:13:46.1097518Z // begin inline asm 2026-02-21T09:13:46.1097849Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668}, [%r186 + 128]; 2026-02-21T09:13:46.1098241Z // end inline asm 2026-02-21T09:13:46.1098372Z // begin inline asm 2026-02-21T09:13:46.1098714Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685}, [%r186 + 144]; 2026-02-21T09:13:46.1099102Z // end inline asm 2026-02-21T09:13:46.1099231Z // begin inline asm 2026-02-21T09:13:46.1099572Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702}, [%r186 + 160]; 2026-02-21T09:13:46.1099961Z // end inline asm 2026-02-21T09:13:46.1100093Z // begin inline asm 2026-02-21T09:13:46.1100424Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719}, [%r186 + 176]; 2026-02-21T09:13:46.1100853Z // end inline asm 2026-02-21T09:13:46.1100994Z // begin inline asm 2026-02-21T09:13:46.1101350Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736}, [%r186 + 192]; 2026-02-21T09:13:46.1101748Z // end inline asm 2026-02-21T09:13:46.1101884Z // begin inline asm 2026-02-21T09:13:46.1102239Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753}, [%r186 + 208]; 2026-02-21T09:13:46.1102640Z // end inline asm 2026-02-21T09:13:46.1102783Z // begin inline asm 2026-02-21T09:13:46.1103163Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770}, [%r186 + 224]; 2026-02-21T09:13:46.1103544Z // end inline asm 2026-02-21T09:13:46.1103608Z // begin inline asm 2026-02-21T09:13:46.1103874Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787}, [%r186 + 240]; 2026-02-21T09:13:46.1103931Z // end inline asm 2026-02-21T09:13:46.1104013Z // begin inline asm 2026-02-21T09:13:46.1104096Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:13:46.1104151Z // end inline asm 2026-02-21T09:13:46.1104208Z bar.sync 0, 128; 2026-02-21T09:13:46.1104286Z // begin inline asm 2026-02-21T09:13:46.1104381Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r1194]; 2026-02-21T09:13:46.1104440Z // end inline asm 2026-02-21T09:13:46.1104505Z cvt.u64.u32 %rd88, %r517; 2026-02-21T09:13:46.1104575Z cvt.u64.u32 %rd89, %r518; 2026-02-21T09:13:46.1104634Z shl.b64 %rd90, %rd89, 32; 2026-02-21T09:13:46.1104727Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T09:13:46.1104917Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1104981Z mov.b64 {%r795, %r796}, %rd91; 2026-02-21T09:13:46.1105053Z cvt.rn.f16x2.f32 %r797, %r796, %r795; 2026-02-21T09:13:46.1105236Z .loc 1 53 52 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1105297Z cvt.u64.u32 %rd92, %r519; 2026-02-21T09:13:46.1105356Z cvt.u64.u32 %rd93, %r520; 2026-02-21T09:13:46.1105413Z shl.b64 %rd94, %rd93, 32; 2026-02-21T09:13:46.1105480Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T09:13:46.1105653Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1105715Z mov.b64 {%r798, %r799}, %rd95; 2026-02-21T09:13:46.1105790Z cvt.rn.f16x2.f32 %r800, %r799, %r798; 2026-02-21T09:13:46.1105964Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1106024Z cvt.u64.u32 %rd96, %r521; 2026-02-21T09:13:46.1106089Z cvt.u64.u32 %rd97, %r522; 2026-02-21T09:13:46.1106147Z shl.b64 %rd98, %rd97, 32; 2026-02-21T09:13:46.1106209Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T09:13:46.1106381Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1106449Z mov.b64 {%r801, %r802}, %rd99; 2026-02-21T09:13:46.1106517Z cvt.rn.f16x2.f32 %r803, %r802, %r801; 2026-02-21T09:13:46.1106687Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1106757Z cvt.u64.u32 %rd100, %r523; 2026-02-21T09:13:46.1106818Z cvt.u64.u32 %rd101, %r524; 2026-02-21T09:13:46.1106877Z shl.b64 %rd102, %rd101, 32; 2026-02-21T09:13:46.1106946Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T09:13:46.1107118Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1107211Z mov.b64 {%r804, %r805}, %rd103; 2026-02-21T09:13:46.1107278Z cvt.rn.f16x2.f32 %r806, %r805, %r804; 2026-02-21T09:13:46.1107452Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1107536Z cvt.u64.u32 %rd104, %r525; 2026-02-21T09:13:46.1107597Z cvt.u64.u32 %rd105, %r526; 2026-02-21T09:13:46.1107663Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:13:46.1107724Z or.b64 %rd107, %rd104, %rd106; 
2026-02-21T09:13:46.1107897Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1107965Z mov.b64 {%r807, %r808}, %rd107; 2026-02-21T09:13:46.1108039Z cvt.rn.f16x2.f32 %r809, %r808, %r807; 2026-02-21T09:13:46.1108203Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1108284Z cvt.u64.u32 %rd108, %r527; 2026-02-21T09:13:46.1108349Z cvt.u64.u32 %rd109, %r528; 2026-02-21T09:13:46.1108406Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:13:46.1108463Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:13:46.1108635Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1108693Z mov.b64 {%r810, %r811}, %rd111; 2026-02-21T09:13:46.1108754Z cvt.rn.f16x2.f32 %r812, %r811, %r810; 2026-02-21T09:13:46.1108960Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1109017Z cvt.u64.u32 %rd112, %r529; 2026-02-21T09:13:46.1109072Z cvt.u64.u32 %rd113, %r530; 2026-02-21T09:13:46.1109127Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:13:46.1109191Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:13:46.1109350Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1109408Z mov.b64 {%r813, %r814}, %rd115; 2026-02-21T09:13:46.1109476Z cvt.rn.f16x2.f32 %r815, %r814, %r813; 2026-02-21T09:13:46.1109636Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1109692Z cvt.u64.u32 %rd116, %r531; 2026-02-21T09:13:46.1109751Z cvt.u64.u32 %rd117, %r532; 2026-02-21T09:13:46.1109808Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:13:46.1109867Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:13:46.1110031Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1110094Z mov.b64 {%r816, %r817}, %rd119; 2026-02-21T09:13:46.1110154Z cvt.rn.f16x2.f32 %r818, %r817, %r816; 2026-02-21T09:13:46.1110316Z .loc 1 53 52 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1110380Z cvt.u64.u32 %rd120, %r534; 2026-02-21T09:13:46.1110435Z cvt.u64.u32 %rd121, %r535; 2026-02-21T09:13:46.1110489Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:13:46.1110554Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:13:46.1110716Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1110772Z mov.b64 {%r819, %r820}, %rd123; 2026-02-21T09:13:46.1110832Z cvt.rn.f16x2.f32 %r821, %r820, %r819; 2026-02-21T09:13:46.1111002Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1111056Z cvt.u64.u32 %rd124, %r536; 2026-02-21T09:13:46.1111111Z cvt.u64.u32 %rd125, %r537; 2026-02-21T09:13:46.1111175Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:13:46.1111232Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:13:46.1111392Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1111456Z mov.b64 {%r822, %r823}, %rd127; 2026-02-21T09:13:46.1111517Z cvt.rn.f16x2.f32 %r824, %r823, %r822; 2026-02-21T09:13:46.1111685Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1111773Z cvt.u64.u32 %rd128, %r538; 2026-02-21T09:13:46.1111836Z cvt.u64.u32 %rd129, %r539; 2026-02-21T09:13:46.1111892Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:13:46.1111948Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:13:46.1112143Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1112200Z mov.b64 {%r825, %r826}, %rd131; 2026-02-21T09:13:46.1112259Z cvt.rn.f16x2.f32 %r827, %r826, %r825; 2026-02-21T09:13:46.1112434Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1112489Z cvt.u64.u32 %rd132, %r540; 2026-02-21T09:13:46.1112544Z cvt.u64.u32 %rd133, %r541; 2026-02-21T09:13:46.1112600Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:13:46.1112664Z or.b64 %rd135, %rd132, 
%rd134; 2026-02-21T09:13:46.1112847Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1112909Z mov.b64 {%r828, %r829}, %rd135; 2026-02-21T09:13:46.1112980Z cvt.rn.f16x2.f32 %r830, %r829, %r828; 2026-02-21T09:13:46.1113136Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1113195Z cvt.u64.u32 %rd136, %r542; 2026-02-21T09:13:46.1113270Z cvt.u64.u32 %rd137, %r543; 2026-02-21T09:13:46.1113326Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:13:46.1113383Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:13:46.1113564Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1113631Z mov.b64 {%r831, %r832}, %rd139; 2026-02-21T09:13:46.1113691Z cvt.rn.f16x2.f32 %r833, %r832, %r831; 2026-02-21T09:13:46.1113847Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1113912Z cvt.u64.u32 %rd140, %r544; 2026-02-21T09:13:46.1113968Z cvt.u64.u32 %rd141, %r545; 2026-02-21T09:13:46.1114026Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:13:46.1114090Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:13:46.1114247Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1114303Z mov.b64 {%r834, %r835}, %rd143; 2026-02-21T09:13:46.1114365Z cvt.rn.f16x2.f32 %r836, %r835, %r834; 2026-02-21T09:13:46.1114530Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1114588Z cvt.u64.u32 %rd144, %r546; 2026-02-21T09:13:46.1114644Z cvt.u64.u32 %rd145, %r547; 2026-02-21T09:13:46.1114742Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:13:46.1114801Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:13:46.1114966Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1115028Z mov.b64 {%r837, %r838}, %rd147; 2026-02-21T09:13:46.1115091Z cvt.rn.f16x2.f32 %r839, %r838, %r837; 2026-02-21T09:13:46.1115254Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1115310Z cvt.u64.u32 %rd148, %r548; 2026-02-21T09:13:46.1115374Z cvt.u64.u32 %rd149, %r549; 2026-02-21T09:13:46.1115430Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:13:46.1115490Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:13:46.1115663Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1115720Z mov.b64 {%r840, %r841}, %rd151; 2026-02-21T09:13:46.1115780Z cvt.rn.f16x2.f32 %r842, %r841, %r840; 2026-02-21T09:13:46.1115951Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1116007Z cvt.u64.u32 %rd152, %r551; 2026-02-21T09:13:46.1116064Z cvt.u64.u32 %rd153, %r552; 2026-02-21T09:13:46.1116121Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:13:46.1116186Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:13:46.1116351Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1116440Z mov.b64 {%r843, %r844}, %rd155; 2026-02-21T09:13:46.1116510Z cvt.rn.f16x2.f32 %r845, %r844, %r843; 2026-02-21T09:13:46.1116676Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1116757Z cvt.u64.u32 %rd156, %r553; 2026-02-21T09:13:46.1116820Z cvt.u64.u32 %rd157, %r554; 2026-02-21T09:13:46.1116877Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:13:46.1116935Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:13:46.1117092Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1117157Z mov.b64 {%r846, %r847}, %rd159; 2026-02-21T09:13:46.1117218Z cvt.rn.f16x2.f32 %r848, %r847, %r846; 2026-02-21T09:13:46.1117380Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1117468Z cvt.u64.u32 %rd160, %r555; 2026-02-21T09:13:46.1117527Z cvt.u64.u32 %rd161, %r556; 2026-02-21T09:13:46.1117584Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:13:46.1117645Z or.b64 %rd163, %rd160, 
%rd162; 2026-02-21T09:13:46.1117810Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1117867Z mov.b64 {%r849, %r850}, %rd163; 2026-02-21T09:13:46.1117927Z cvt.rn.f16x2.f32 %r851, %r850, %r849; 2026-02-21T09:13:46.1118120Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1118176Z cvt.u64.u32 %rd164, %r557; 2026-02-21T09:13:46.1118231Z cvt.u64.u32 %rd165, %r558; 2026-02-21T09:13:46.1118293Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:13:46.1118349Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:13:46.1118511Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1118575Z mov.b64 {%r852, %r853}, %rd167; 2026-02-21T09:13:46.1118637Z cvt.rn.f16x2.f32 %r854, %r853, %r852; 2026-02-21T09:13:46.1118797Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1118852Z cvt.u64.u32 %rd168, %r559; 2026-02-21T09:13:46.1118916Z cvt.u64.u32 %rd169, %r560; 2026-02-21T09:13:46.1118972Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:13:46.1119028Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:13:46.1119199Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1119255Z mov.b64 {%r855, %r856}, %rd171; 2026-02-21T09:13:46.1119315Z cvt.rn.f16x2.f32 %r857, %r856, %r855; 2026-02-21T09:13:46.1119479Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1119535Z cvt.u64.u32 %rd172, %r561; 2026-02-21T09:13:46.1119590Z cvt.u64.u32 %rd173, %r562; 2026-02-21T09:13:46.1119646Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:13:46.1119713Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:13:46.1119877Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1119933Z mov.b64 {%r858, %r859}, %rd175; 2026-02-21T09:13:46.1120000Z cvt.rn.f16x2.f32 %r860, %r859, %r858; 2026-02-21T09:13:46.1120165Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1120221Z cvt.u64.u32 %rd176, %r563; 2026-02-21T09:13:46.1120285Z cvt.u64.u32 %rd177, %r564; 2026-02-21T09:13:46.1120341Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:13:46.1120398Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:13:46.1120564Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1120627Z mov.b64 {%r861, %r862}, %rd179; 2026-02-21T09:13:46.1120687Z cvt.rn.f16x2.f32 %r863, %r862, %r861; 2026-02-21T09:13:46.1120853Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1120940Z cvt.u64.u32 %rd180, %r565; 2026-02-21T09:13:46.1120997Z cvt.u64.u32 %rd181, %r566; 2026-02-21T09:13:46.1121054Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:13:46.1121111Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:13:46.1121308Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1121365Z mov.b64 {%r864, %r865}, %rd183; 2026-02-21T09:13:46.1121426Z cvt.rn.f16x2.f32 %r866, %r865, %r864; 2026-02-21T09:13:46.1121603Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1121660Z cvt.u64.u32 %rd184, %r568; 2026-02-21T09:13:46.1121716Z cvt.u64.u32 %rd185, %r569; 2026-02-21T09:13:46.1121781Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:13:46.1121841Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:13:46.1122043Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1122112Z mov.b64 {%r867, %r868}, %rd187; 2026-02-21T09:13:46.1122173Z cvt.rn.f16x2.f32 %r869, %r868, %r867; 2026-02-21T09:13:46.1122337Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1122396Z cvt.u64.u32 %rd188, %r570; 2026-02-21T09:13:46.1122459Z cvt.u64.u32 %rd189, %r571; 2026-02-21T09:13:46.1122514Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:13:46.1122592Z or.b64 %rd191, %rd188, 
%rd190; 2026-02-21T09:13:46.1122766Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1122823Z mov.b64 {%r870, %r871}, %rd191; 2026-02-21T09:13:46.1122881Z cvt.rn.f16x2.f32 %r872, %r871, %r870; 2026-02-21T09:13:46.1123048Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1123104Z cvt.u64.u32 %rd192, %r572; 2026-02-21T09:13:46.1123159Z cvt.u64.u32 %rd193, %r573; 2026-02-21T09:13:46.1123216Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:13:46.1123279Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:13:46.1123440Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1123499Z mov.b64 {%r873, %r874}, %rd195; 2026-02-21T09:13:46.1123565Z cvt.rn.f16x2.f32 %r875, %r874, %r873; 2026-02-21T09:13:46.1123730Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1123788Z cvt.u64.u32 %rd196, %r574; 2026-02-21T09:13:46.1123850Z cvt.u64.u32 %rd197, %r575; 2026-02-21T09:13:46.1123906Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:13:46.1123963Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:13:46.1124126Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1124190Z mov.b64 {%r876, %r877}, %rd199; 2026-02-21T09:13:46.1124250Z cvt.rn.f16x2.f32 %r878, %r877, %r876; 2026-02-21T09:13:46.1124414Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1124478Z cvt.u64.u32 %rd200, %r576; 2026-02-21T09:13:46.1124532Z cvt.u64.u32 %rd201, %r577; 2026-02-21T09:13:46.1124589Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:13:46.1124644Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:13:46.1124848Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1124906Z mov.b64 {%r879, %r880}, %rd203; 2026-02-21T09:13:46.1124967Z cvt.rn.f16x2.f32 %r881, %r880, %r879; 2026-02-21T09:13:46.1125139Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1125196Z cvt.u64.u32 %rd204, %r578; 2026-02-21T09:13:46.1125252Z cvt.u64.u32 %rd205, %r579; 2026-02-21T09:13:46.1125315Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:13:46.1125372Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:13:46.1125565Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1125623Z mov.b64 {%r882, %r883}, %rd207; 2026-02-21T09:13:46.1125692Z cvt.rn.f16x2.f32 %r884, %r883, %r882; 2026-02-21T09:13:46.1125879Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1125935Z cvt.u64.u32 %rd208, %r580; 2026-02-21T09:13:46.1125999Z cvt.u64.u32 %rd209, %r581; 2026-02-21T09:13:46.1126057Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:13:46.1126113Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:13:46.1126287Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1126346Z mov.b64 {%r885, %r886}, %rd211; 2026-02-21T09:13:46.1126407Z cvt.rn.f16x2.f32 %r887, %r886, %r885; 2026-02-21T09:13:46.1126618Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1126677Z cvt.u64.u32 %rd212, %r582; 2026-02-21T09:13:46.1126733Z cvt.u64.u32 %rd213, %r583; 2026-02-21T09:13:46.1126788Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:13:46.1126850Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:13:46.1127012Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1127069Z mov.b64 {%r888, %r889}, %rd215; 2026-02-21T09:13:46.1127136Z cvt.rn.f16x2.f32 %r890, %r889, %r888; 2026-02-21T09:13:46.1127325Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1127382Z cvt.u64.u32 %rd216, %r585; 2026-02-21T09:13:46.1127436Z cvt.u64.u32 %rd217, %r586; 2026-02-21T09:13:46.1127499Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:13:46.1127555Z or.b64 %rd219, %rd216, 
%rd218; 2026-02-21T09:13:46.1127717Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1127780Z mov.b64 {%r891, %r892}, %rd219; 2026-02-21T09:13:46.1127840Z cvt.rn.f16x2.f32 %r893, %r892, %r891; 2026-02-21T09:13:46.1127998Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1128059Z cvt.u64.u32 %rd220, %r587; 2026-02-21T09:13:46.1128115Z cvt.u64.u32 %rd221, %r588; 2026-02-21T09:13:46.1128170Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:13:46.1128226Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:13:46.1128391Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1128448Z mov.b64 {%r894, %r895}, %rd223; 2026-02-21T09:13:46.1128506Z cvt.rn.f16x2.f32 %r896, %r895, %r894; 2026-02-21T09:13:46.1128669Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1128726Z cvt.u64.u32 %rd224, %r589; 2026-02-21T09:13:46.1128781Z cvt.u64.u32 %rd225, %r590; 2026-02-21T09:13:46.1128844Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:13:46.1128903Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:13:46.1129059Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1129114Z mov.b64 {%r897, %r898}, %rd227; 2026-02-21T09:13:46.1129183Z cvt.rn.f16x2.f32 %r899, %r898, %r897; 2026-02-21T09:13:46.1129344Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1129399Z cvt.u64.u32 %rd228, %r591; 2026-02-21T09:13:46.1129463Z cvt.u64.u32 %rd229, %r592; 2026-02-21T09:13:46.1129518Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:13:46.1129575Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:13:46.1129744Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1129801Z mov.b64 {%r900, %r901}, %rd231; 2026-02-21T09:13:46.1129860Z cvt.rn.f16x2.f32 %r902, %r901, %r900; 2026-02-21T09:13:46.1130018Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1130108Z cvt.u64.u32 %rd232, %r593; 2026-02-21T09:13:46.1130165Z cvt.u64.u32 %rd233, %r594; 2026-02-21T09:13:46.1130222Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:13:46.1130314Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:13:46.1130485Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1130543Z mov.b64 {%r903, %r904}, %rd235; 2026-02-21T09:13:46.1130617Z cvt.rn.f16x2.f32 %r905, %r904, %r903; 2026-02-21T09:13:46.1130782Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1130839Z cvt.u64.u32 %rd236, %r595; 2026-02-21T09:13:46.1130895Z cvt.u64.u32 %rd237, %r596; 2026-02-21T09:13:46.1130959Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:13:46.1131015Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:13:46.1131202Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1131270Z mov.b64 {%r906, %r907}, %rd239; 2026-02-21T09:13:46.1131332Z cvt.rn.f16x2.f32 %r908, %r907, %r906; 2026-02-21T09:13:46.1131493Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1131557Z cvt.u64.u32 %rd240, %r597; 2026-02-21T09:13:46.1131612Z cvt.u64.u32 %rd241, %r598; 2026-02-21T09:13:46.1131668Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:13:46.1131747Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:13:46.1131921Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1131977Z mov.b64 {%r909, %r910}, %rd243; 2026-02-21T09:13:46.1132037Z cvt.rn.f16x2.f32 %r911, %r910, %r909; 2026-02-21T09:13:46.1132211Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1132267Z cvt.u64.u32 %rd244, %r599; 2026-02-21T09:13:46.1132324Z cvt.u64.u32 %rd245, %r600; 2026-02-21T09:13:46.1132388Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:13:46.1132443Z or.b64 %rd247, %rd244, 
%rd246; 2026-02-21T09:13:46.1132608Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1132666Z mov.b64 {%r912, %r913}, %rd247; 2026-02-21T09:13:46.1132733Z cvt.rn.f16x2.f32 %r914, %r913, %r912; 2026-02-21T09:13:46.1132900Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1132956Z cvt.u64.u32 %rd248, %r602; 2026-02-21T09:13:46.1133019Z cvt.u64.u32 %rd249, %r603; 2026-02-21T09:13:46.1133075Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:13:46.1133131Z or.b64 %rd251, %rd248, %rd250; 2026-02-21T09:13:46.1133309Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1133367Z mov.b64 {%r915, %r916}, %rd251; 2026-02-21T09:13:46.1133429Z cvt.rn.f16x2.f32 %r917, %r916, %r915; 2026-02-21T09:13:46.1133593Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1133656Z cvt.u64.u32 %rd252, %r604; 2026-02-21T09:13:46.1133713Z cvt.u64.u32 %rd253, %r605; 2026-02-21T09:13:46.1133771Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:13:46.1133837Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:13:46.1134005Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1134062Z mov.b64 {%r918, %r919}, %rd255; 2026-02-21T09:13:46.1134129Z cvt.rn.f16x2.f32 %r920, %r919, %r918; 2026-02-21T09:13:46.1134294Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1134350Z cvt.u64.u32 %rd256, %r606; 2026-02-21T09:13:46.1134405Z cvt.u64.u32 %rd257, %r607; 2026-02-21T09:13:46.1134471Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:13:46.1134526Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T09:13:46.1134770Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1134837Z mov.b64 {%r921, %r922}, %rd259; 2026-02-21T09:13:46.1134896Z cvt.rn.f16x2.f32 %r923, %r922, %r921; 2026-02-21T09:13:46.1135093Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1135154Z cvt.u64.u32 %rd260, %r608; 2026-02-21T09:13:46.1135211Z cvt.u64.u32 %rd261, %r609; 2026-02-21T09:13:46.1135268Z shl.b64 %rd262, %rd261, 32; 2026-02-21T09:13:46.1135324Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T09:13:46.1135492Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1135549Z mov.b64 {%r924, %r925}, %rd263; 2026-02-21T09:13:46.1135610Z cvt.rn.f16x2.f32 %r926, %r925, %r924; 2026-02-21T09:13:46.1135810Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1135870Z cvt.u64.u32 %rd264, %r610; 2026-02-21T09:13:46.1135927Z cvt.u64.u32 %rd265, %r611; 2026-02-21T09:13:46.1135991Z shl.b64 %rd266, %rd265, 32; 2026-02-21T09:13:46.1136047Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T09:13:46.1136214Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1136271Z mov.b64 {%r927, %r928}, %rd267; 2026-02-21T09:13:46.1136366Z cvt.rn.f16x2.f32 %r929, %r928, %r927; 2026-02-21T09:13:46.1136534Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1136590Z cvt.u64.u32 %rd268, %r612; 2026-02-21T09:13:46.1136653Z cvt.u64.u32 %rd269, %r613; 2026-02-21T09:13:46.1136709Z shl.b64 %rd270, %rd269, 32; 2026-02-21T09:13:46.1136768Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T09:13:46.1136942Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1137002Z mov.b64 {%r930, %r931}, %rd271; 2026-02-21T09:13:46.1137063Z cvt.rn.f16x2.f32 %r932, %r931, %r930; 2026-02-21T09:13:46.1137229Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1137295Z cvt.u64.u32 %rd272, %r614; 2026-02-21T09:13:46.1137349Z cvt.u64.u32 %rd273, %r615; 2026-02-21T09:13:46.1137404Z shl.b64 %rd274, %rd273, 32; 2026-02-21T09:13:46.1137469Z or.b64 %rd275, %rd272, 
%rd274; 2026-02-21T09:13:46.1137636Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1137695Z mov.b64 {%r933, %r934}, %rd275; 2026-02-21T09:13:46.1137763Z cvt.rn.f16x2.f32 %r935, %r934, %r933; 2026-02-21T09:13:46.1137926Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1137982Z cvt.u64.u32 %rd276, %r616; 2026-02-21T09:13:46.1138037Z cvt.u64.u32 %rd277, %r617; 2026-02-21T09:13:46.1138104Z shl.b64 %rd278, %rd277, 32; 2026-02-21T09:13:46.1138161Z or.b64 %rd279, %rd276, %rd278; 2026-02-21T09:13:46.1138323Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1138388Z mov.b64 {%r936, %r937}, %rd279; 2026-02-21T09:13:46.1138448Z cvt.rn.f16x2.f32 %r938, %r937, %r936; 2026-02-21T09:13:46.1138611Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1138674Z cvt.u64.u32 %rd280, %r619; 2026-02-21T09:13:46.1138731Z cvt.u64.u32 %rd281, %r620; 2026-02-21T09:13:46.1138787Z shl.b64 %rd282, %rd281, 32; 2026-02-21T09:13:46.1138845Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T09:13:46.1139019Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1139078Z mov.b64 {%r939, %r940}, %rd283; 2026-02-21T09:13:46.1139141Z cvt.rn.f16x2.f32 %r941, %r940, %r939; 2026-02-21T09:13:46.1139369Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1139428Z cvt.u64.u32 %rd284, %r621; 2026-02-21T09:13:46.1139484Z cvt.u64.u32 %rd285, %r622; 2026-02-21T09:13:46.1139548Z shl.b64 %rd286, %rd285, 32; 2026-02-21T09:13:46.1139625Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T09:13:46.1139789Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1139848Z mov.b64 {%r942, %r943}, %rd287; 2026-02-21T09:13:46.1139917Z cvt.rn.f16x2.f32 %r944, %r943, %r942; 2026-02-21T09:13:46.1140081Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1140138Z cvt.u64.u32 %rd288, %r623; 2026-02-21T09:13:46.1140202Z cvt.u64.u32 %rd289, %r624; 2026-02-21T09:13:46.1140256Z shl.b64 %rd290, %rd289, 32; 2026-02-21T09:13:46.1140345Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T09:13:46.1140528Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1140589Z mov.b64 {%r945, %r946}, %rd291; 2026-02-21T09:13:46.1140651Z cvt.rn.f16x2.f32 %r947, %r946, %r945; 2026-02-21T09:13:46.1140818Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1140883Z cvt.u64.u32 %rd292, %r625; 2026-02-21T09:13:46.1140938Z cvt.u64.u32 %rd293, %r626; 2026-02-21T09:13:46.1141015Z shl.b64 %rd294, %rd293, 32; 2026-02-21T09:13:46.1141080Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T09:13:46.1141245Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1141302Z mov.b64 {%r948, %r949}, %rd295; 2026-02-21T09:13:46.1141367Z cvt.rn.f16x2.f32 %r950, %r949, %r948; 2026-02-21T09:13:46.1141533Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1141591Z cvt.u64.u32 %rd296, %r627; 2026-02-21T09:13:46.1141646Z cvt.u64.u32 %rd297, %r628; 2026-02-21T09:13:46.1141709Z shl.b64 %rd298, %rd297, 32; 2026-02-21T09:13:46.1141765Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T09:13:46.1141926Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1141992Z mov.b64 {%r951, %r952}, %rd299; 2026-02-21T09:13:46.1142052Z cvt.rn.f16x2.f32 %r953, %r952, %r951; 2026-02-21T09:13:46.1142220Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1142285Z cvt.u64.u32 %rd300, %r629; 2026-02-21T09:13:46.1142340Z cvt.u64.u32 %rd301, %r630; 2026-02-21T09:13:46.1142396Z shl.b64 %rd302, %rd301, 32; 2026-02-21T09:13:46.1142451Z or.b64 %rd303, %rd300, 
%rd302; 2026-02-21T09:13:46.1142626Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1142685Z mov.b64 {%r954, %r955}, %rd303; 2026-02-21T09:13:46.1142746Z cvt.rn.f16x2.f32 %r956, %r955, %r954; 2026-02-21T09:13:46.1142961Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1143017Z cvt.u64.u32 %rd304, %r631; 2026-02-21T09:13:46.1143073Z cvt.u64.u32 %rd305, %r632; 2026-02-21T09:13:46.1143137Z shl.b64 %rd306, %rd305, 32; 2026-02-21T09:13:46.1143191Z or.b64 %rd307, %rd304, %rd306; 2026-02-21T09:13:46.1143359Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1143414Z mov.b64 {%r957, %r958}, %rd307; 2026-02-21T09:13:46.1143483Z cvt.rn.f16x2.f32 %r959, %r958, %r957; 2026-02-21T09:13:46.1143654Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1143711Z cvt.u64.u32 %rd308, %r633; 2026-02-21T09:13:46.1143776Z cvt.u64.u32 %rd309, %r634; 2026-02-21T09:13:46.1143833Z shl.b64 %rd310, %rd309, 32; 2026-02-21T09:13:46.1143912Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T09:13:46.1144091Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1144166Z mov.b64 {%r960, %r961}, %rd311; 2026-02-21T09:13:46.1144262Z cvt.rn.f16x2.f32 %r962, %r961, %r960; 2026-02-21T09:13:46.1144432Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1144498Z cvt.u64.u32 %rd312, %r636; 2026-02-21T09:13:46.1144559Z cvt.u64.u32 %rd313, %r637; 2026-02-21T09:13:46.1144616Z shl.b64 %rd314, %rd313, 32; 2026-02-21T09:13:46.1144733Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T09:13:46.1144905Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1144964Z mov.b64 {%r963, %r964}, %rd315; 2026-02-21T09:13:46.1145033Z cvt.rn.f16x2.f32 %r965, %r964, %r963; 2026-02-21T09:13:46.1145236Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1145297Z cvt.u64.u32 %rd316, %r638; 2026-02-21T09:13:46.1145354Z cvt.u64.u32 %rd317, %r639; 2026-02-21T09:13:46.1145419Z shl.b64 %rd318, %rd317, 32; 2026-02-21T09:13:46.1145477Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T09:13:46.1145650Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1145716Z mov.b64 {%r966, %r967}, %rd319; 2026-02-21T09:13:46.1145804Z cvt.rn.f16x2.f32 %r968, %r967, %r966; 2026-02-21T09:13:46.1145978Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1146046Z cvt.u64.u32 %rd320, %r640; 2026-02-21T09:13:46.1146106Z cvt.u64.u32 %rd321, %r641; 2026-02-21T09:13:46.1146164Z shl.b64 %rd322, %rd321, 32; 2026-02-21T09:13:46.1146224Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T09:13:46.1146403Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1146465Z mov.b64 {%r969, %r970}, %rd323; 2026-02-21T09:13:46.1146530Z cvt.rn.f16x2.f32 %r971, %r970, %r969; 2026-02-21T09:13:46.1146708Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1146770Z cvt.u64.u32 %rd324, %r642; 2026-02-21T09:13:46.1146830Z cvt.u64.u32 %rd325, %r643; 2026-02-21T09:13:46.1146898Z shl.b64 %rd326, %rd325, 32; 2026-02-21T09:13:46.1146959Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T09:13:46.1147129Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1147188Z mov.b64 {%r972, %r973}, %rd327; 2026-02-21T09:13:46.1147262Z cvt.rn.f16x2.f32 %r974, %r973, %r972; 2026-02-21T09:13:46.1147432Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1147493Z cvt.u64.u32 %rd328, %r644; 2026-02-21T09:13:46.1147561Z cvt.u64.u32 %rd329, %r645; 2026-02-21T09:13:46.1147619Z shl.b64 %rd330, %rd329, 32; 2026-02-21T09:13:46.1147679Z or.b64 %rd331, %rd328, 
%rd330; 2026-02-21T09:13:46.1147858Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1147921Z mov.b64 {%r975, %r976}, %rd331; 2026-02-21T09:13:46.1147986Z cvt.rn.f16x2.f32 %r977, %r976, %r975; 2026-02-21T09:13:46.1148165Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1148235Z cvt.u64.u32 %rd332, %r646; 2026-02-21T09:13:46.1148296Z cvt.u64.u32 %rd333, %r647; 2026-02-21T09:13:46.1148366Z shl.b64 %rd334, %rd333, 32; 2026-02-21T09:13:46.1148434Z or.b64 %rd335, %rd332, %rd334; 2026-02-21T09:13:46.1148609Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1148669Z mov.b64 {%r978, %r979}, %rd335; 2026-02-21T09:13:46.1148741Z cvt.rn.f16x2.f32 %r980, %r979, %r978; 2026-02-21T09:13:46.1148935Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1148994Z cvt.u64.u32 %rd336, %r648; 2026-02-21T09:13:46.1149052Z cvt.u64.u32 %rd337, %r649; 2026-02-21T09:13:46.1149148Z shl.b64 %rd338, %rd337, 32; 2026-02-21T09:13:46.1149207Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T09:13:46.1149374Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1149442Z mov.b64 {%r981, %r982}, %rd339; 2026-02-21T09:13:46.1149504Z cvt.rn.f16x2.f32 %r983, %r982, %r981; 2026-02-21T09:13:46.1149669Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1149735Z cvt.u64.u32 %rd340, %r650; 2026-02-21T09:13:46.1149793Z cvt.u64.u32 %rd341, %r651; 2026-02-21T09:13:46.1149850Z shl.b64 %rd342, %rd341, 32; 2026-02-21T09:13:46.1149929Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T09:13:46.1150106Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1150165Z mov.b64 {%r984, %r985}, %rd343; 2026-02-21T09:13:46.1150227Z cvt.rn.f16x2.f32 %r986, %r985, %r984; 2026-02-21T09:13:46.1150402Z .loc 1 53 52 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1150461Z cvt.u64.u32 %rd344, %r653; 2026-02-21T09:13:46.1150518Z cvt.u64.u32 %rd345, %r654; 2026-02-21T09:13:46.1150602Z shl.b64 %rd346, %rd345, 32; 2026-02-21T09:13:46.1150663Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T09:13:46.1150831Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1150890Z mov.b64 {%r987, %r988}, %rd347; 2026-02-21T09:13:46.1150962Z cvt.rn.f16x2.f32 %r989, %r988, %r987; 2026-02-21T09:13:46.1151134Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1151194Z cvt.u64.u32 %rd348, %r655; 2026-02-21T09:13:46.1151260Z cvt.u64.u32 %rd349, %r656; 2026-02-21T09:13:46.1151318Z shl.b64 %rd350, %rd349, 32; 2026-02-21T09:13:46.1151375Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T09:13:46.1151558Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1151616Z mov.b64 {%r990, %r991}, %rd351; 2026-02-21T09:13:46.1151679Z cvt.rn.f16x2.f32 %r992, %r991, %r990; 2026-02-21T09:13:46.1151850Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1151917Z cvt.u64.u32 %rd352, %r657; 2026-02-21T09:13:46.1151974Z cvt.u64.u32 %rd353, %r658; 2026-02-21T09:13:46.1152033Z shl.b64 %rd354, %rd353, 32; 2026-02-21T09:13:46.1152109Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T09:13:46.1152275Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1152332Z mov.b64 {%r993, %r994}, %rd355; 2026-02-21T09:13:46.1152398Z cvt.rn.f16x2.f32 %r995, %r994, %r993; 2026-02-21T09:13:46.1152562Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1152619Z cvt.u64.u32 %rd356, %r659; 2026-02-21T09:13:46.1152673Z cvt.u64.u32 %rd357, %r660; 2026-02-21T09:13:46.1152735Z shl.b64 %rd358, %rd357, 32; 2026-02-21T09:13:46.1152789Z or.b64 %rd359, %rd356, 
%rd358; 2026-02-21T09:13:46.1152952Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1153016Z mov.b64 {%r996, %r997}, %rd359; 2026-02-21T09:13:46.1153076Z cvt.rn.f16x2.f32 %r998, %r997, %r996; 2026-02-21T09:13:46.1153236Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1153298Z cvt.u64.u32 %rd360, %r661; 2026-02-21T09:13:46.1153354Z cvt.u64.u32 %rd361, %r662; 2026-02-21T09:13:46.1153434Z shl.b64 %rd362, %rd361, 32; 2026-02-21T09:13:46.1153489Z or.b64 %rd363, %rd360, %rd362; 2026-02-21T09:13:46.1153660Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1153720Z mov.b64 {%r999, %r1000}, %rd363; 2026-02-21T09:13:46.1153809Z cvt.rn.f16x2.f32 %r1001, %r1000, %r999; 2026-02-21T09:13:46.1153982Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1154038Z cvt.u64.u32 %rd364, %r663; 2026-02-21T09:13:46.1154093Z cvt.u64.u32 %rd365, %r664; 2026-02-21T09:13:46.1154154Z shl.b64 %rd366, %rd365, 32; 2026-02-21T09:13:46.1154211Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T09:13:46.1154373Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1154433Z mov.b64 {%r1002, %r1003}, %rd367; 2026-02-21T09:13:46.1154532Z cvt.rn.f16x2.f32 %r1004, %r1003, %r1002; 2026-02-21T09:13:46.1154732Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1154791Z cvt.u64.u32 %rd368, %r665; 2026-02-21T09:13:46.1154853Z cvt.u64.u32 %rd369, %r666; 2026-02-21T09:13:46.1154909Z shl.b64 %rd370, %rd369, 32; 2026-02-21T09:13:46.1154968Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T09:13:46.1155139Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1155226Z mov.b64 {%r1005, %r1006}, %rd371; 2026-02-21T09:13:46.1155293Z cvt.rn.f16x2.f32 %r1007, %r1006, %r1005; 2026-02-21T09:13:46.1155456Z 
.loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1155520Z cvt.u64.u32 %rd372, %r667; 2026-02-21T09:13:46.1155575Z cvt.u64.u32 %rd373, %r668; 2026-02-21T09:13:46.1155631Z shl.b64 %rd374, %rd373, 32; 2026-02-21T09:13:46.1155697Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T09:13:46.1155858Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1155919Z mov.b64 {%r1008, %r1009}, %rd375; 2026-02-21T09:13:46.1155992Z cvt.rn.f16x2.f32 %r1010, %r1009, %r1008; 2026-02-21T09:13:46.1156156Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1156213Z cvt.u64.u32 %rd376, %r670; 2026-02-21T09:13:46.1156269Z cvt.u64.u32 %rd377, %r671; 2026-02-21T09:13:46.1156336Z shl.b64 %rd378, %rd377, 32; 2026-02-21T09:13:46.1156393Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T09:13:46.1156553Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1156618Z mov.b64 {%r1011, %r1012}, %rd379; 2026-02-21T09:13:46.1156684Z cvt.rn.f16x2.f32 %r1013, %r1012, %r1011; 2026-02-21T09:13:46.1156846Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1156911Z cvt.u64.u32 %rd380, %r672; 2026-02-21T09:13:46.1156968Z cvt.u64.u32 %rd381, %r673; 2026-02-21T09:13:46.1157026Z shl.b64 %rd382, %rd381, 32; 2026-02-21T09:13:46.1157084Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T09:13:46.1157256Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1157314Z mov.b64 {%r1014, %r1015}, %rd383; 2026-02-21T09:13:46.1157377Z cvt.rn.f16x2.f32 %r1016, %r1015, %r1014; 2026-02-21T09:13:46.1157545Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1157600Z cvt.u64.u32 %rd384, %r674; 2026-02-21T09:13:46.1157656Z cvt.u64.u32 %rd385, %r675; 2026-02-21T09:13:46.1157717Z shl.b64 %rd386, %rd385, 32; 
2026-02-21T09:13:46.1157772Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T09:13:46.1157934Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1157991Z mov.b64 {%r1017, %r1018}, %rd387; 2026-02-21T09:13:46.1158089Z cvt.rn.f16x2.f32 %r1019, %r1018, %r1017; 2026-02-21T09:13:46.1158252Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1158308Z cvt.u64.u32 %rd388, %r676; 2026-02-21T09:13:46.1158397Z cvt.u64.u32 %rd389, %r677; 2026-02-21T09:13:46.1158453Z shl.b64 %rd390, %rd389, 32; 2026-02-21T09:13:46.1158509Z or.b64 %rd391, %rd388, %rd390; 2026-02-21T09:13:46.1158681Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1158738Z mov.b64 {%r1020, %r1021}, %rd391; 2026-02-21T09:13:46.1158801Z cvt.rn.f16x2.f32 %r1022, %r1021, %r1020; 2026-02-21T09:13:46.1158963Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1159026Z cvt.u64.u32 %rd392, %r678; 2026-02-21T09:13:46.1159081Z cvt.u64.u32 %rd393, %r679; 2026-02-21T09:13:46.1159160Z shl.b64 %rd394, %rd393, 32; 2026-02-21T09:13:46.1159228Z or.b64 %rd395, %rd392, %rd394; 2026-02-21T09:13:46.1159394Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1159451Z mov.b64 {%r1023, %r1024}, %rd395; 2026-02-21T09:13:46.1159524Z cvt.rn.f16x2.f32 %r1025, %r1024, %r1023; 2026-02-21T09:13:46.1159684Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1159760Z cvt.u64.u32 %rd396, %r680; 2026-02-21T09:13:46.1159816Z cvt.u64.u32 %rd397, %r681; 2026-02-21T09:13:46.1159881Z shl.b64 %rd398, %rd397, 32; 2026-02-21T09:13:46.1159936Z or.b64 %rd399, %rd396, %rd398; 2026-02-21T09:13:46.1160098Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1160164Z mov.b64 {%r1026, %r1027}, %rd399; 2026-02-21T09:13:46.1160230Z 
cvt.rn.f16x2.f32 %r1028, %r1027, %r1026; 2026-02-21T09:13:46.1160392Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1160457Z cvt.u64.u32 %rd400, %r682; 2026-02-21T09:13:46.1160513Z cvt.u64.u32 %rd401, %r683; 2026-02-21T09:13:46.1160568Z shl.b64 %rd402, %rd401, 32; 2026-02-21T09:13:46.1160626Z or.b64 %rd403, %rd400, %rd402; 2026-02-21T09:13:46.1160795Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1160854Z mov.b64 {%r1029, %r1030}, %rd403; 2026-02-21T09:13:46.1160918Z cvt.rn.f16x2.f32 %r1031, %r1030, %r1029; 2026-02-21T09:13:46.1161087Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1161143Z cvt.u64.u32 %rd404, %r684; 2026-02-21T09:13:46.1161197Z cvt.u64.u32 %rd405, %r685; 2026-02-21T09:13:46.1161258Z shl.b64 %rd406, %rd405, 32; 2026-02-21T09:13:46.1161314Z or.b64 %rd407, %rd404, %rd406; 2026-02-21T09:13:46.1161477Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1161534Z mov.b64 {%r1032, %r1033}, %rd407; 2026-02-21T09:13:46.1161605Z cvt.rn.f16x2.f32 %r1034, %r1033, %r1032; 2026-02-21T09:13:46.1161766Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1161822Z cvt.u64.u32 %rd408, %r687; 2026-02-21T09:13:46.1161885Z cvt.u64.u32 %rd409, %r688; 2026-02-21T09:13:46.1161941Z shl.b64 %rd410, %rd409, 32; 2026-02-21T09:13:46.1161998Z or.b64 %rd411, %rd408, %rd410; 2026-02-21T09:13:46.1162166Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1162223Z mov.b64 {%r1035, %r1036}, %rd411; 2026-02-21T09:13:46.1162286Z cvt.rn.f16x2.f32 %r1037, %r1036, %r1035; 2026-02-21T09:13:46.1162447Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1162510Z cvt.u64.u32 %rd412, %r689; 2026-02-21T09:13:46.1162597Z cvt.u64.u32 %rd413, %r690; 
2026-02-21T09:13:46.1162653Z shl.b64 %rd414, %rd413, 32; 2026-02-21T09:13:46.1162714Z or.b64 %rd415, %rd412, %rd414; 2026-02-21T09:13:46.1162880Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1162958Z mov.b64 {%r1038, %r1039}, %rd415; 2026-02-21T09:13:46.1163029Z cvt.rn.f16x2.f32 %r1040, %r1039, %r1038; 2026-02-21T09:13:46.1163197Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1163254Z cvt.u64.u32 %rd416, %r691; 2026-02-21T09:13:46.1163308Z cvt.u64.u32 %rd417, %r692; 2026-02-21T09:13:46.1163369Z shl.b64 %rd418, %rd417, 32; 2026-02-21T09:13:46.1163425Z or.b64 %rd419, %rd416, %rd418; 2026-02-21T09:13:46.1163587Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1163670Z mov.b64 {%r1041, %r1042}, %rd419; 2026-02-21T09:13:46.1163737Z cvt.rn.f16x2.f32 %r1043, %r1042, %r1041; 2026-02-21T09:13:46.1163901Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1163963Z cvt.u64.u32 %rd420, %r693; 2026-02-21T09:13:46.1164019Z cvt.u64.u32 %rd421, %r694; 2026-02-21T09:13:46.1164073Z shl.b64 %rd422, %rd421, 32; 2026-02-21T09:13:46.1164127Z or.b64 %rd423, %rd420, %rd422; 2026-02-21T09:13:46.1164410Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1164470Z mov.b64 {%r1044, %r1045}, %rd423; 2026-02-21T09:13:46.1164535Z cvt.rn.f16x2.f32 %r1046, %r1045, %r1044; 2026-02-21T09:13:46.1164764Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1164823Z cvt.u64.u32 %rd424, %r695; 2026-02-21T09:13:46.1164879Z cvt.u64.u32 %rd425, %r696; 2026-02-21T09:13:46.1164944Z shl.b64 %rd426, %rd425, 32; 2026-02-21T09:13:46.1165004Z or.b64 %rd427, %rd424, %rd426; 2026-02-21T09:13:46.1165173Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1165231Z mov.b64 {%r1047, 
%r1048}, %rd427; 2026-02-21T09:13:46.1165304Z cvt.rn.f16x2.f32 %r1049, %r1048, %r1047; 2026-02-21T09:13:46.1165464Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1165521Z cvt.u64.u32 %rd428, %r697; 2026-02-21T09:13:46.1165590Z cvt.u64.u32 %rd429, %r698; 2026-02-21T09:13:46.1165646Z shl.b64 %rd430, %rd429, 32; 2026-02-21T09:13:46.1165703Z or.b64 %rd431, %rd428, %rd430; 2026-02-21T09:13:46.1165876Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1165934Z mov.b64 {%r1050, %r1051}, %rd431; 2026-02-21T09:13:46.1166003Z cvt.rn.f16x2.f32 %r1052, %r1051, %r1050; 2026-02-21T09:13:46.1166167Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1166236Z cvt.u64.u32 %rd432, %r699; 2026-02-21T09:13:46.1166294Z cvt.u64.u32 %rd433, %r700; 2026-02-21T09:13:46.1166354Z shl.b64 %rd434, %rd433, 32; 2026-02-21T09:13:46.1166427Z or.b64 %rd435, %rd432, %rd434; 2026-02-21T09:13:46.1166593Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1166654Z mov.b64 {%r1053, %r1054}, %rd435; 2026-02-21T09:13:46.1166724Z cvt.rn.f16x2.f32 %r1055, %r1054, %r1053; 2026-02-21T09:13:46.1166885Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1166941Z cvt.u64.u32 %rd436, %r701; 2026-02-21T09:13:46.1166996Z cvt.u64.u32 %rd437, %r702; 2026-02-21T09:13:46.1167060Z shl.b64 %rd438, %rd437, 32; 2026-02-21T09:13:46.1167116Z or.b64 %rd439, %rd436, %rd438; 2026-02-21T09:13:46.1167280Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1167375Z mov.b64 {%r1056, %r1057}, %rd439; 2026-02-21T09:13:46.1167439Z cvt.rn.f16x2.f32 %r1058, %r1057, %r1056; 2026-02-21T09:13:46.1167596Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1167685Z cvt.u64.u32 %rd440, %r704; 
2026-02-21T09:13:46.1167741Z cvt.u64.u32 %rd441, %r705; 2026-02-21T09:13:46.1167797Z shl.b64 %rd442, %rd441, 32; 2026-02-21T09:13:46.1167854Z or.b64 %rd443, %rd440, %rd442; 2026-02-21T09:13:46.1168026Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1168081Z mov.b64 {%r1059, %r1060}, %rd443; 2026-02-21T09:13:46.1168144Z cvt.rn.f16x2.f32 %r1061, %r1060, %r1059; 2026-02-21T09:13:46.1168312Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1168394Z cvt.u64.u32 %rd444, %r706; 2026-02-21T09:13:46.1168453Z cvt.u64.u32 %rd445, %r707; 2026-02-21T09:13:46.1168515Z shl.b64 %rd446, %rd445, 32; 2026-02-21T09:13:46.1168570Z or.b64 %rd447, %rd444, %rd446; 2026-02-21T09:13:46.1168729Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1168787Z mov.b64 {%r1062, %r1063}, %rd447; 2026-02-21T09:13:46.1168855Z cvt.rn.f16x2.f32 %r1064, %r1063, %r1062; 2026-02-21T09:13:46.1169039Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1169096Z cvt.u64.u32 %rd448, %r708; 2026-02-21T09:13:46.1169158Z cvt.u64.u32 %rd449, %r709; 2026-02-21T09:13:46.1169215Z shl.b64 %rd450, %rd449, 32; 2026-02-21T09:13:46.1169271Z or.b64 %rd451, %rd448, %rd450; 2026-02-21T09:13:46.1169440Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1169498Z mov.b64 {%r1065, %r1066}, %rd451; 2026-02-21T09:13:46.1169564Z cvt.rn.f16x2.f32 %r1067, %r1066, %r1065; 2026-02-21T09:13:46.1169726Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1169788Z cvt.u64.u32 %rd452, %r710; 2026-02-21T09:13:46.1169845Z cvt.u64.u32 %rd453, %r711; 2026-02-21T09:13:46.1169899Z shl.b64 %rd454, %rd453, 32; 2026-02-21T09:13:46.1169961Z or.b64 %rd455, %rd452, %rd454; 2026-02-21T09:13:46.1170127Z .loc 1 55 27 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1170184Z mov.b64 {%r1068, %r1069}, %rd455; 2026-02-21T09:13:46.1170253Z cvt.rn.f16x2.f32 %r1070, %r1069, %r1068; 2026-02-21T09:13:46.1170418Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1170473Z cvt.u64.u32 %rd456, %r712; 2026-02-21T09:13:46.1170526Z cvt.u64.u32 %rd457, %r713; 2026-02-21T09:13:46.1170589Z shl.b64 %rd458, %rd457, 32; 2026-02-21T09:13:46.1170646Z or.b64 %rd459, %rd456, %rd458; 2026-02-21T09:13:46.1170806Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1170869Z mov.b64 {%r1071, %r1072}, %rd459; 2026-02-21T09:13:46.1170932Z cvt.rn.f16x2.f32 %r1073, %r1072, %r1071; 2026-02-21T09:13:46.1171094Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1171155Z cvt.u64.u32 %rd460, %r714; 2026-02-21T09:13:46.1171211Z cvt.u64.u32 %rd461, %r715; 2026-02-21T09:13:46.1171266Z shl.b64 %rd462, %rd461, 32; 2026-02-21T09:13:46.1171322Z or.b64 %rd463, %rd460, %rd462; 2026-02-21T09:13:46.1171490Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1171546Z mov.b64 {%r1074, %r1075}, %rd463; 2026-02-21T09:13:46.1171608Z cvt.rn.f16x2.f32 %r1076, %r1075, %r1074; 2026-02-21T09:13:46.1171779Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1171858Z cvt.u64.u32 %rd464, %r716; 2026-02-21T09:13:46.1171912Z cvt.u64.u32 %rd465, %r717; 2026-02-21T09:13:46.1171974Z shl.b64 %rd466, %rd465, 32; 2026-02-21T09:13:46.1172030Z or.b64 %rd467, %rd464, %rd466; 2026-02-21T09:13:46.1172218Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1172275Z mov.b64 {%r1077, %r1078}, %rd467; 2026-02-21T09:13:46.1172347Z cvt.rn.f16x2.f32 %r1079, %r1078, %r1077; 2026-02-21T09:13:46.1172511Z .loc 1 53 52 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1172567Z cvt.u64.u32 %rd468, %r718; 2026-02-21T09:13:46.1172628Z cvt.u64.u32 %rd469, %r719; 2026-02-21T09:13:46.1172683Z shl.b64 %rd470, %rd469, 32; 2026-02-21T09:13:46.1172740Z or.b64 %rd471, %rd468, %rd470; 2026-02-21T09:13:46.1172933Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1172994Z mov.b64 {%r1080, %r1081}, %rd471; 2026-02-21T09:13:46.1173059Z cvt.rn.f16x2.f32 %r1082, %r1081, %r1080; 2026-02-21T09:13:46.1173224Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1173288Z cvt.u64.u32 %rd472, %r721; 2026-02-21T09:13:46.1173343Z cvt.u64.u32 %rd473, %r722; 2026-02-21T09:13:46.1173398Z shl.b64 %rd474, %rd473, 32; 2026-02-21T09:13:46.1173481Z or.b64 %rd475, %rd472, %rd474; 2026-02-21T09:13:46.1173645Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1173703Z mov.b64 {%r1083, %r1084}, %rd475; 2026-02-21T09:13:46.1173775Z cvt.rn.f16x2.f32 %r1085, %r1084, %r1083; 2026-02-21T09:13:46.1173942Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1174000Z cvt.u64.u32 %rd476, %r723; 2026-02-21T09:13:46.1174059Z cvt.u64.u32 %rd477, %r724; 2026-02-21T09:13:46.1174124Z shl.b64 %rd478, %rd477, 32; 2026-02-21T09:13:46.1174181Z or.b64 %rd479, %rd476, %rd478; 2026-02-21T09:13:46.1174347Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1174415Z mov.b64 {%r1086, %r1087}, %rd479; 2026-02-21T09:13:46.1174480Z cvt.rn.f16x2.f32 %r1088, %r1087, %r1086; 2026-02-21T09:13:46.1174648Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1174745Z cvt.u64.u32 %rd480, %r725; 2026-02-21T09:13:46.1174804Z cvt.u64.u32 %rd481, %r726; 2026-02-21T09:13:46.1174861Z shl.b64 %rd482, %rd481, 32; 2026-02-21T09:13:46.1174917Z or.b64 
%rd483, %rd480, %rd482; 2026-02-21T09:13:46.1175092Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1175151Z mov.b64 {%r1089, %r1090}, %rd483; 2026-02-21T09:13:46.1175217Z cvt.rn.f16x2.f32 %r1091, %r1090, %r1089; 2026-02-21T09:13:46.1175385Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1175442Z cvt.u64.u32 %rd484, %r727; 2026-02-21T09:13:46.1175497Z cvt.u64.u32 %rd485, %r728; 2026-02-21T09:13:46.1175563Z shl.b64 %rd486, %rd485, 32; 2026-02-21T09:13:46.1175619Z or.b64 %rd487, %rd484, %rd486; 2026-02-21T09:13:46.1175788Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1175845Z mov.b64 {%r1092, %r1093}, %rd487; 2026-02-21T09:13:46.1175916Z cvt.rn.f16x2.f32 %r1094, %r1093, %r1092; 2026-02-21T09:13:46.1176078Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1176133Z cvt.u64.u32 %rd488, %r729; 2026-02-21T09:13:46.1176195Z cvt.u64.u32 %rd489, %r730; 2026-02-21T09:13:46.1176252Z shl.b64 %rd490, %rd489, 32; 2026-02-21T09:13:46.1176309Z or.b64 %rd491, %rd488, %rd490; 2026-02-21T09:13:46.1176522Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1176580Z mov.b64 {%r1095, %r1096}, %rd491; 2026-02-21T09:13:46.1176644Z cvt.rn.f16x2.f32 %r1097, %r1096, %r1095; 2026-02-21T09:13:46.1176837Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1176900Z cvt.u64.u32 %rd492, %r731; 2026-02-21T09:13:46.1176958Z cvt.u64.u32 %rd493, %r732; 2026-02-21T09:13:46.1177014Z shl.b64 %rd494, %rd493, 32; 2026-02-21T09:13:46.1177077Z or.b64 %rd495, %rd492, %rd494; 2026-02-21T09:13:46.1177240Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1177298Z mov.b64 {%r1098, %r1099}, %rd495; 2026-02-21T09:13:46.1177371Z cvt.rn.f16x2.f32 %r1100, %r1099, %r1098; 
2026-02-21T09:13:46.1177560Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1177620Z cvt.u64.u32 %rd496, %r733; 2026-02-21T09:13:46.1177676Z cvt.u64.u32 %rd497, %r734; 2026-02-21T09:13:46.1177740Z shl.b64 %rd498, %rd497, 32; 2026-02-21T09:13:46.1177797Z or.b64 %rd499, %rd496, %rd498; 2026-02-21T09:13:46.1177964Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1178028Z mov.b64 {%r1101, %r1102}, %rd499; 2026-02-21T09:13:46.1178126Z cvt.rn.f16x2.f32 %r1103, %r1102, %r1101; 2026-02-21T09:13:46.1178290Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1178353Z cvt.u64.u32 %rd500, %r735; 2026-02-21T09:13:46.1178410Z cvt.u64.u32 %rd501, %r736; 2026-02-21T09:13:46.1178466Z shl.b64 %rd502, %rd501, 32; 2026-02-21T09:13:46.1178521Z or.b64 %rd503, %rd500, %rd502; 2026-02-21T09:13:46.1178695Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1178754Z mov.b64 {%r1104, %r1105}, %rd503; 2026-02-21T09:13:46.1178818Z cvt.rn.f16x2.f32 %r1106, %r1105, %r1104; 2026-02-21T09:13:46.1178984Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1179041Z cvt.u64.u32 %rd504, %r738; 2026-02-21T09:13:46.1179095Z cvt.u64.u32 %rd505, %r739; 2026-02-21T09:13:46.1179156Z shl.b64 %rd506, %rd505, 32; 2026-02-21T09:13:46.1179212Z or.b64 %rd507, %rd504, %rd506; 2026-02-21T09:13:46.1179376Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1179432Z mov.b64 {%r1107, %r1108}, %rd507; 2026-02-21T09:13:46.1179501Z cvt.rn.f16x2.f32 %r1109, %r1108, %r1107; 2026-02-21T09:13:46.1179661Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1179718Z cvt.u64.u32 %rd508, %r740; 2026-02-21T09:13:46.1179780Z cvt.u64.u32 %rd509, %r741; 2026-02-21T09:13:46.1179835Z shl.b64 %rd510, %rd509, 
32; 2026-02-21T09:13:46.1179892Z or.b64 %rd511, %rd508, %rd510; 2026-02-21T09:13:46.1180061Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1180118Z mov.b64 {%r1110, %r1111}, %rd511; 2026-02-21T09:13:46.1180182Z cvt.rn.f16x2.f32 %r1112, %r1111, %r1110; 2026-02-21T09:13:46.1180344Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1180406Z cvt.u64.u32 %rd512, %r742; 2026-02-21T09:13:46.1180460Z cvt.u64.u32 %rd513, %r743; 2026-02-21T09:13:46.1180515Z shl.b64 %rd514, %rd513, 32; 2026-02-21T09:13:46.1180579Z or.b64 %rd515, %rd512, %rd514; 2026-02-21T09:13:46.1180741Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1180798Z mov.b64 {%r1113, %r1114}, %rd515; 2026-02-21T09:13:46.1180869Z cvt.rn.f16x2.f32 %r1115, %r1114, %r1113; 2026-02-21T09:13:46.1181058Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1181114Z cvt.u64.u32 %rd516, %r744; 2026-02-21T09:13:46.1181170Z cvt.u64.u32 %rd517, %r745; 2026-02-21T09:13:46.1181256Z shl.b64 %rd518, %rd517, 32; 2026-02-21T09:13:46.1181312Z or.b64 %rd519, %rd516, %rd518; 2026-02-21T09:13:46.1181478Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1181542Z mov.b64 {%r1116, %r1117}, %rd519; 2026-02-21T09:13:46.1181606Z cvt.rn.f16x2.f32 %r1118, %r1117, %r1116; 2026-02-21T09:13:46.1181766Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1181828Z cvt.u64.u32 %rd520, %r746; 2026-02-21T09:13:46.1181882Z cvt.u64.u32 %rd521, %r747; 2026-02-21T09:13:46.1181937Z shl.b64 %rd522, %rd521, 32; 2026-02-21T09:13:46.1182014Z or.b64 %rd523, %rd520, %rd522; 2026-02-21T09:13:46.1182190Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1182248Z mov.b64 {%r1119, %r1120}, %rd523; 2026-02-21T09:13:46.1182312Z 
cvt.rn.f16x2.f32 %r1121, %r1120, %r1119; 2026-02-21T09:13:46.1182480Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1182536Z cvt.u64.u32 %rd524, %r748; 2026-02-21T09:13:46.1182613Z cvt.u64.u32 %rd525, %r749; 2026-02-21T09:13:46.1182677Z shl.b64 %rd526, %rd525, 32; 2026-02-21T09:13:46.1182734Z or.b64 %rd527, %rd524, %rd526; 2026-02-21T09:13:46.1182899Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1182955Z mov.b64 {%r1122, %r1123}, %rd527; 2026-02-21T09:13:46.1183028Z cvt.rn.f16x2.f32 %r1124, %r1123, %r1122; 2026-02-21T09:13:46.1183200Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1183259Z cvt.u64.u32 %rd528, %r750; 2026-02-21T09:13:46.1183324Z cvt.u64.u32 %rd529, %r751; 2026-02-21T09:13:46.1183390Z shl.b64 %rd530, %rd529, 32; 2026-02-21T09:13:46.1196712Z or.b64 %rd531, %rd528, %rd530; 2026-02-21T09:13:46.1197043Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1197118Z mov.b64 {%r1125, %r1126}, %rd531; 2026-02-21T09:13:46.1197204Z cvt.rn.f16x2.f32 %r1127, %r1126, %r1125; 2026-02-21T09:13:46.1197395Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1197462Z cvt.u64.u32 %rd532, %r752; 2026-02-21T09:13:46.1197526Z cvt.u64.u32 %rd533, %r753; 2026-02-21T09:13:46.1197589Z shl.b64 %rd534, %rd533, 32; 2026-02-21T09:13:46.1197663Z or.b64 %rd535, %rd532, %rd534; 2026-02-21T09:13:46.1197845Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1197913Z mov.b64 {%r1128, %r1129}, %rd535; 2026-02-21T09:13:46.1197993Z cvt.rn.f16x2.f32 %r1130, %r1129, %r1128; 2026-02-21T09:13:46.1198163Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1198225Z cvt.u64.u32 %rd536, %r755; 2026-02-21T09:13:46.1198293Z cvt.u64.u32 %rd537, %r756; 
2026-02-21T09:13:46.1198354Z shl.b64 %rd538, %rd537, 32; 2026-02-21T09:13:46.1198414Z or.b64 %rd539, %rd536, %rd538; 2026-02-21T09:13:46.1198584Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1198654Z mov.b64 {%r1131, %r1132}, %rd539; 2026-02-21T09:13:46.1198721Z cvt.rn.f16x2.f32 %r1133, %r1132, %r1131; 2026-02-21T09:13:46.1198889Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1198958Z cvt.u64.u32 %rd540, %r757; 2026-02-21T09:13:46.1199020Z cvt.u64.u32 %rd541, %r758; 2026-02-21T09:13:46.1199186Z shl.b64 %rd542, %rd541, 32; 2026-02-21T09:13:46.1199255Z or.b64 %rd543, %rd540, %rd542; 2026-02-21T09:13:46.1199428Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1199530Z mov.b64 {%r1134, %r1135}, %rd543; 2026-02-21T09:13:46.1199599Z cvt.rn.f16x2.f32 %r1136, %r1135, %r1134; 2026-02-21T09:13:46.1199776Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1199837Z cvt.u64.u32 %rd544, %r759; 2026-02-21T09:13:46.1199895Z cvt.u64.u32 %rd545, %r760; 2026-02-21T09:13:46.1199963Z shl.b64 %rd546, %rd545, 32; 2026-02-21T09:13:46.1200024Z or.b64 %rd547, %rd544, %rd546; 2026-02-21T09:13:46.1200192Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1200262Z mov.b64 {%r1137, %r1138}, %rd547; 2026-02-21T09:13:46.1200365Z cvt.rn.f16x2.f32 %r1139, %r1138, %r1137; 2026-02-21T09:13:46.1200535Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1200605Z cvt.u64.u32 %rd548, %r761; 2026-02-21T09:13:46.1200662Z cvt.u64.u32 %rd549, %r762; 2026-02-21T09:13:46.1200721Z shl.b64 %rd550, %rd549, 32; 2026-02-21T09:13:46.1200780Z or.b64 %rd551, %rd548, %rd550; 2026-02-21T09:13:46.1200958Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1201052Z mov.b64 {%r1140, 
%r1141}, %rd551; 2026-02-21T09:13:46.1201120Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 2026-02-21T09:13:46.1201292Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1201350Z cvt.u64.u32 %rd552, %r763; 2026-02-21T09:13:46.1201407Z cvt.u64.u32 %rd553, %r764; 2026-02-21T09:13:46.1201464Z shl.b64 %rd554, %rd553, 32; 2026-02-21T09:13:46.1201533Z or.b64 %rd555, %rd552, %rd554; 2026-02-21T09:13:46.1201704Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1201764Z mov.b64 {%r1143, %r1144}, %rd555; 2026-02-21T09:13:46.1201838Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T09:13:46.1202006Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1202064Z cvt.u64.u32 %rd556, %r765; 2026-02-21T09:13:46.1202131Z cvt.u64.u32 %rd557, %r766; 2026-02-21T09:13:46.1202190Z shl.b64 %rd558, %rd557, 32; 2026-02-21T09:13:46.1202247Z or.b64 %rd559, %rd556, %rd558; 2026-02-21T09:13:46.1202411Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1202477Z mov.b64 {%r1146, %r1147}, %rd559; 2026-02-21T09:13:46.1202540Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T09:13:46.1202705Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1202774Z cvt.u64.u32 %rd560, %r767; 2026-02-21T09:13:46.1202830Z cvt.u64.u32 %rd561, %r768; 2026-02-21T09:13:46.1202889Z shl.b64 %rd562, %rd561, 32; 2026-02-21T09:13:46.1202956Z or.b64 %rd563, %rd560, %rd562; 2026-02-21T09:13:46.1203122Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1203181Z mov.b64 {%r1149, %r1150}, %rd563; 2026-02-21T09:13:46.1203247Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T09:13:46.1203424Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1203484Z cvt.u64.u32 %rd564, %r769; 
2026-02-21T09:13:46.1203541Z cvt.u64.u32 %rd565, %r770; 2026-02-21T09:13:46.1203609Z shl.b64 %rd566, %rd565, 32; 2026-02-21T09:13:46.1203669Z or.b64 %rd567, %rd564, %rd566; 2026-02-21T09:13:46.1203833Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1203928Z mov.b64 {%r1152, %r1153}, %rd567; 2026-02-21T09:13:46.1203993Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T09:13:46.1204159Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1204253Z cvt.u64.u32 %rd568, %r772; 2026-02-21T09:13:46.1204315Z cvt.u64.u32 %rd569, %r773; 2026-02-21T09:13:46.1204372Z shl.b64 %rd570, %rd569, 32; 2026-02-21T09:13:46.1204431Z or.b64 %rd571, %rd568, %rd570; 2026-02-21T09:13:46.1204608Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1204714Z mov.b64 {%r1155, %r1156}, %rd571; 2026-02-21T09:13:46.1204781Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T09:13:46.1204956Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1205013Z cvt.u64.u32 %rd572, %r774; 2026-02-21T09:13:46.1205095Z cvt.u64.u32 %rd573, %r775; 2026-02-21T09:13:46.1205156Z shl.b64 %rd574, %rd573, 32; 2026-02-21T09:13:46.1205225Z or.b64 %rd575, %rd572, %rd574; 2026-02-21T09:13:46.1205384Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1205447Z mov.b64 {%r1158, %r1159}, %rd575; 2026-02-21T09:13:46.1205527Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T09:13:46.1205691Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1205775Z cvt.u64.u32 %rd576, %r776; 2026-02-21T09:13:46.1205847Z cvt.u64.u32 %rd577, %r777; 2026-02-21T09:13:46.1205906Z shl.b64 %rd578, %rd577, 32; 2026-02-21T09:13:46.1205965Z or.b64 %rd579, %rd576, %rd578; 2026-02-21T09:13:46.1206130Z .loc 1 55 27 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1206199Z mov.b64 {%r1161, %r1162}, %rd579; 2026-02-21T09:13:46.1206264Z cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T09:13:46.1206427Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1206494Z cvt.u64.u32 %rd580, %r778; 2026-02-21T09:13:46.1206551Z cvt.u64.u32 %rd581, %r779; 2026-02-21T09:13:46.1206611Z shl.b64 %rd582, %rd581, 32; 2026-02-21T09:13:46.1206678Z or.b64 %rd583, %rd580, %rd582; 2026-02-21T09:13:46.1206839Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1206900Z mov.b64 {%r1164, %r1165}, %rd583; 2026-02-21T09:13:46.1206964Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T09:13:46.1207133Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1207192Z cvt.u64.u32 %rd584, %r780; 2026-02-21T09:13:46.1207248Z cvt.u64.u32 %rd585, %r781; 2026-02-21T09:13:46.1207314Z shl.b64 %rd586, %rd585, 32; 2026-02-21T09:13:46.1207374Z or.b64 %rd587, %rd584, %rd586; 2026-02-21T09:13:46.1207537Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1207604Z mov.b64 {%r1167, %r1168}, %rd587; 2026-02-21T09:13:46.1207670Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T09:13:46.1207836Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1207902Z cvt.u64.u32 %rd588, %r782; 2026-02-21T09:13:46.1207959Z cvt.u64.u32 %rd589, %r783; 2026-02-21T09:13:46.1208018Z shl.b64 %rd590, %rd589, 32; 2026-02-21T09:13:46.1208078Z or.b64 %rd591, %rd588, %rd590; 2026-02-21T09:13:46.1208253Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1208311Z mov.b64 {%r1170, %r1171}, %rd591; 2026-02-21T09:13:46.1208377Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T09:13:46.1208551Z .loc 1 53 52 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1208637Z cvt.u64.u32 %rd592, %r784; 2026-02-21T09:13:46.1208695Z cvt.u64.u32 %rd593, %r785; 2026-02-21T09:13:46.1208751Z shl.b64 %rd594, %rd593, 32; 2026-02-21T09:13:46.1208820Z or.b64 %rd595, %rd592, %rd594; 2026-02-21T09:13:46.1208987Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1209075Z mov.b64 {%r1173, %r1174}, %rd595; 2026-02-21T09:13:46.1209150Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T09:13:46.1209316Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1209373Z cvt.u64.u32 %rd596, %r786; 2026-02-21T09:13:46.1209438Z cvt.u64.u32 %rd597, %r787; 2026-02-21T09:13:46.1209495Z shl.b64 %rd598, %rd597, 32; 2026-02-21T09:13:46.1209551Z or.b64 %rd599, %rd596, %rd598; 2026-02-21T09:13:46.1209734Z .loc 1 55 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:55:27 2026-02-21T09:13:46.1209803Z mov.b64 {%r1176, %r1177}, %rd599; 2026-02-21T09:13:46.1209866Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T09:13:46.1210033Z .loc 1 56 45 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:56:45 2026-02-21T09:13:46.1210118Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:13:46.1210174Z bar.sync 0, 128; 2026-02-21T09:13:46.1210272Z st.shared.v4.b32 [%r46], {%r797, %r800, %r803, %r806}; 2026-02-21T09:13:46.1210402Z st.shared.v4.b32 [%r46+32768], {%r893, %r896, %r899, %r902}; 2026-02-21T09:13:46.1210499Z st.shared.v4.b32 [%r46+16384], {%r989, %r992, %r995, %r998}; 2026-02-21T09:13:46.1210606Z st.shared.v4.b32 [%r46+49152], {%r1085, %r1088, %r1091, %r1094}; 2026-02-21T09:13:46.1210702Z st.shared.v4.b32 [%r47], {%r809, %r812, %r815, %r818}; 2026-02-21T09:13:46.1210794Z st.shared.v4.b32 [%r47+32768], {%r905, %r908, %r911, %r914}; 2026-02-21T09:13:46.1210895Z st.shared.v4.b32 [%r47+16384], {%r1001, %r1004, %r1007, %r1010}; 2026-02-21T09:13:46.1210991Z st.shared.v4.b32 
[%r47+49152], {%r1097, %r1100, %r1103, %r1106}; 2026-02-21T09:13:46.1211088Z st.shared.v4.b32 [%r48], {%r821, %r824, %r827, %r830}; 2026-02-21T09:13:46.1211179Z st.shared.v4.b32 [%r48+32768], {%r917, %r920, %r923, %r926}; 2026-02-21T09:13:46.1211276Z st.shared.v4.b32 [%r48+16384], {%r1013, %r1016, %r1019, %r1022}; 2026-02-21T09:13:46.1211381Z st.shared.v4.b32 [%r48+49152], {%r1109, %r1112, %r1115, %r1118}; 2026-02-21T09:13:46.1211468Z st.shared.v4.b32 [%r49], {%r833, %r836, %r839, %r842}; 2026-02-21T09:13:46.1211560Z st.shared.v4.b32 [%r49+32768], {%r929, %r932, %r935, %r938}; 2026-02-21T09:13:46.1211665Z st.shared.v4.b32 [%r49+16384], {%r1025, %r1028, %r1031, %r1034}; 2026-02-21T09:13:46.1211760Z st.shared.v4.b32 [%r49+49152], {%r1121, %r1124, %r1127, %r1130}; 2026-02-21T09:13:46.1211845Z st.shared.v4.b32 [%r50], {%r845, %r848, %r851, %r854}; 2026-02-21T09:13:46.1211943Z st.shared.v4.b32 [%r50+32768], {%r941, %r944, %r947, %r950}; 2026-02-21T09:13:46.1212040Z st.shared.v4.b32 [%r50+16384], {%r1037, %r1040, %r1043, %r1046}; 2026-02-21T09:13:46.1212134Z st.shared.v4.b32 [%r50+49152], {%r1133, %r1136, %r1139, %r1142}; 2026-02-21T09:13:46.1212219Z st.shared.v4.b32 [%r51], {%r857, %r860, %r863, %r866}; 2026-02-21T09:13:46.1212320Z st.shared.v4.b32 [%r51+32768], {%r953, %r956, %r959, %r962}; 2026-02-21T09:13:46.1212415Z st.shared.v4.b32 [%r51+16384], {%r1049, %r1052, %r1055, %r1058}; 2026-02-21T09:13:46.1212514Z st.shared.v4.b32 [%r51+49152], {%r1145, %r1148, %r1151, %r1154}; 2026-02-21T09:13:46.1212608Z st.shared.v4.b32 [%r52], {%r869, %r872, %r875, %r878}; 2026-02-21T09:13:46.1212700Z st.shared.v4.b32 [%r52+32768], {%r965, %r968, %r971, %r974}; 2026-02-21T09:13:46.1212796Z st.shared.v4.b32 [%r52+16384], {%r1061, %r1064, %r1067, %r1070}; 2026-02-21T09:13:46.1212899Z st.shared.v4.b32 [%r52+49152], {%r1157, %r1160, %r1163, %r1166}; 2026-02-21T09:13:46.1212986Z st.shared.v4.b32 [%r53], {%r881, %r884, %r887, %r890}; 2026-02-21T09:13:46.1213079Z st.shared.v4.b32 
[%r53+32768], {%r977, %r980, %r983, %r986}; 2026-02-21T09:13:46.1213197Z st.shared.v4.b32 [%r53+16384], {%r1073, %r1076, %r1079, %r1082}; 2026-02-21T09:13:46.1213303Z st.shared.v4.b32 [%r53+49152], {%r1169, %r1172, %r1175, %r1178}; 2026-02-21T09:13:46.1213387Z // begin inline asm 2026-02-21T09:13:46.1213469Z fence.proxy.async.shared::cta; 2026-02-21T09:13:46.1213536Z // end inline asm 2026-02-21T09:13:46.1213595Z bar.sync 0, 128; 2026-02-21T09:13:46.1213668Z elect.sync %r1179|%p130, -1; 2026-02-21T09:13:46.1213745Z and.pred %p128, %p129, %p130; 2026-02-21T09:13:46.1213804Z add.s32 %r790, %r1233, %r55; 2026-02-21T09:13:46.1213861Z // begin inline asm 2026-02-21T09:13:46.1214048Z @%p128 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd87, {%r790, %r1232}], [%r792]; 2026-02-21T09:13:46.1214114Z // end inline asm 2026-02-21T09:13:46.1214182Z cp.async.bulk.commit_group; 2026-02-21T09:13:46.1214237Z mov.b32 %r1230, 1; 2026-02-21T09:13:46.1214388Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:46.1214563Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1214627Z xor.b32 %r1234, %r1230, %r1234; 2026-02-21T09:13:46.1214848Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1214912Z add.s32 %r1224, %r1224, -1; 2026-02-21T09:13:46.1214975Z setp.ne.b32 %p131, %r1224, 0; 2026-02-21T09:13:46.1215059Z @%p131 bra $L__BB0_18; 2026-02-21T09:13:46.1215129Z bra.uni $L__BB0_23; 2026-02-21T09:13:46.1215233Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:13:46.1215293Z add.s32 %r514, %r1229, 1; 2026-02-21T09:13:46.1215365Z setp.eq.b32 %p124, %r1229, 63; 2026-02-21T09:13:46.1215431Z selp.b32 %r1229, 0, %r514, %p124; 2026-02-21T09:13:46.1215494Z setp.eq.b32 %p125, %r1229, 63; 2026-02-21T09:13:46.1215555Z @%p125 bra $L__BB0_21; 2026-02-21T09:13:46.1215662Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:46.1215834Z .loc 1 0 130 // 
c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:130 2026-02-21T09:13:46.1215890Z mov.b32 %r1230, 0; 2026-02-21T09:13:46.1216063Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1216123Z setp.ne.b32 %p126, %r1229, 0; 2026-02-21T09:13:46.1216184Z @%p126 bra $L__BB0_22; 2026-02-21T09:13:46.1216275Z // %bb.20: // %.thread 2026-02-21T09:13:46.1216363Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:46.1216426Z add.s32 %r1231, %r1231, 2368; 2026-02-21T09:13:46.1216598Z .loc 1 36 35 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:36:35 2026-02-21T09:13:46.1216658Z shr.s32 %r1181, %r1231, 31; 2026-02-21T09:13:46.1216719Z shr.u32 %r1182, %r1181, 26; 2026-02-21T09:13:46.1216781Z add.s32 %r1183, %r1231, %r1182; 2026-02-21T09:13:46.1216849Z shr.s32 %r1184, %r1183, 6; 2026-02-21T09:13:46.1217009Z .loc 1 37 33 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:37:33 2026-02-21T09:13:46.1217066Z shl.b32 %r1185, %r1184, 3; 2026-02-21T09:13:46.1217235Z .loc 1 38 39 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:38:39 2026-02-21T09:13:46.1217294Z sub.s32 %r1186, 32, %r1185; 2026-02-21T09:13:46.1217453Z .loc 1 38 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:38:52 2026-02-21T09:13:46.1217521Z min.s32 %r1187, %r1186, 8; 2026-02-21T09:13:46.1217677Z .loc 1 39 45 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:45 2026-02-21T09:13:46.1217737Z and.b32 %r1188, %r1183, -64; 2026-02-21T09:13:46.1217798Z sub.s32 %r1189, %r1231, %r1188; 2026-02-21T09:13:46.1217964Z .loc 1 40 51 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:40:51 2026-02-21T09:13:46.1218053Z div.s32 %r1190, %r1189, %r1187; 2026-02-21T09:13:46.1218217Z .loc 1 39 64 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:64 2026-02-21T09:13:46.1218289Z mul.lo.s32 %r1191, %r1190, %r1187; 2026-02-21T09:13:46.1218376Z sub.s32 %r1192, %r1189, %r1191; 2026-02-21T09:13:46.1218540Z .loc 1 39 30 
// c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:30 2026-02-21T09:13:46.1218608Z add.s32 %r1193, %r1192, %r1185; 2026-02-21T09:13:46.1218772Z .loc 1 41 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:41:27 2026-02-21T09:13:46.1218828Z shl.b32 %r1233, %r1193, 7; 2026-02-21T09:13:46.1219001Z .loc 1 42 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:42:27 2026-02-21T09:13:46.1219057Z shl.b32 %r1232, %r1190, 8; 2026-02-21T09:13:46.1219265Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1219326Z bra.uni $L__BB0_22; 2026-02-21T09:13:46.1219433Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:13:46.1219603Z .loc 1 0 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:130 2026-02-21T09:13:46.1219664Z mov.b32 %r75, global_smem; 2026-02-21T09:13:46.1219732Z add.s32 %r76, %r75, %r3; 2026-02-21T09:13:46.1219787Z bra.uni $L__BB0_2; 2026-02-21T09:13:46.1219900Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1220083Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1220166Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1220222Z barrier.sync 1; 2026-02-21T09:13:46.1220280Z barrier.sync 1; 2026-02-21T09:13:46.1220366Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1220448Z $L__BB0_2: // %.preheader 2026-02-21T09:13:46.1220538Z // =>This Loop Header: Depth=1 2026-02-21T09:13:46.1220630Z // Child Loop BB0_11 Depth 2 2026-02-21T09:13:46.1220715Z // Child Loop BB0_7 Depth 2 2026-02-21T09:13:46.1220878Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19 2026-02-21T09:13:46.1220964Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:13:46.1221022Z barrier.sync 1; 2026-02-21T09:13:46.1221088Z ld.shared.b8 %r74, [%r76+163924]; 2026-02-21T09:13:46.1221154Z setp.gt.u32 %p4, %r74, 3; 2026-02-21T09:13:46.1221221Z @%p4 bra $L__BB0_4; 2026-02-21T09:13:46.1221299Z // %bb.3: // 
%.preheader 2026-02-21T09:13:46.1221384Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1221457Z $L_brx_0: .branchtargets 2026-02-21T09:13:46.1221514Z $L__BB0_5, 2026-02-21T09:13:46.1221568Z $L__BB0_9, 2026-02-21T09:13:46.1221629Z $L__BB0_15, 2026-02-21T09:13:46.1221683Z $L__BB0_24; 2026-02-21T09:13:46.1221743Z brx.idx %r74, $L_brx_0; 2026-02-21T09:13:46.1221836Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1222023Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1222100Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1222179Z ld.shared.b32 %r126, [global_smem+65536]; 2026-02-21T09:13:46.1222264Z ld.shared.b32 %r1208, [global_smem+65544]; 2026-02-21T09:13:46.1222321Z barrier.sync 1; 2026-02-21T09:13:46.1222385Z setp.lt.s32 %p17, %r1208, 1; 2026-02-21T09:13:46.1222442Z @%p17 bra $L__BB0_8; 2026-02-21T09:13:46.1222531Z // %bb.6: // %.lr.ph8 2026-02-21T09:13:46.1222618Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1222788Z .loc 1 0 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:130 2026-02-21T09:13:46.1222878Z mov.b32 %r1212, -1; 2026-02-21T09:13:46.1222939Z mov.pred %p143, 0; 2026-02-21T09:13:46.1222993Z mov.b32 %r1209, 0; 2026-02-21T09:13:46.1223059Z mov.b32 %r1210, %r1209; 2026-02-21T09:13:46.1223140Z mov.b32 %r1211, %r1209; 2026-02-21T09:13:46.1223236Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:46.1223326Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:46.1223507Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1223565Z add.s32 %r136, %r1212, 1; 2026-02-21T09:13:46.1223628Z setp.eq.b32 %p30, %r1212, 63; 2026-02-21T09:13:46.1223703Z selp.b32 %r1212, 0, %r136, %p30; 2026-02-21T09:13:46.1223761Z shl.b32 %r137, %r1211, 3; 2026-02-21T09:13:46.1223841Z add.s32 %r139, %r75, %r137; 2026-02-21T09:13:46.1223910Z add.s32 %r140, %r139, 163840; 2026-02-21T09:13:46.1223968Z add.s32 %r124, %r139, 
163872; 2026-02-21T09:13:46.1224134Z .loc 1 51 31 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:51:31 2026-02-21T09:13:46.1224190Z shl.b32 %r141, %r1211, 14; 2026-02-21T09:13:46.1224260Z add.s32 %r142, %r75, %r141; 2026-02-21T09:13:46.1224420Z .loc 1 52 44 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:52:44 2026-02-21T09:13:46.1224478Z shl.b32 %r143, %r1211, 13; 2026-02-21T09:13:46.1224563Z add.s32 %r144, %r75, %r143; 2026-02-21T09:13:46.1224621Z add.s32 %r145, %r144, 131072; 2026-02-21T09:13:46.1224822Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1224895Z bar.warp.sync -1; 2026-02-21T09:13:46.1224954Z // begin inline asm 2026-02-21T09:13:46.1225008Z 2026-02-21T09:13:46.1225060Z { 2026-02-21T09:13:46.1225133Z .reg .pred complete; 2026-02-21T09:13:46.1225188Z waitLoop: 2026-02-21T09:13:46.1225315Z mbarrier.try_wait.parity.shared.b64 complete, [%r124], %r1210; 2026-02-21T09:13:46.1225388Z @!complete bra.uni waitLoop; 2026-02-21T09:13:46.1225438Z } 2026-02-21T09:13:46.1225445Z 2026-02-21T09:13:46.1225503Z // end inline asm 2026-02-21T09:13:46.1225675Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1225750Z setp.eq.b32 %p29, %r1212, 63; 2026-02-21T09:13:46.1225916Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1225978Z elect.sync %r146|%p20, -1; 2026-02-21T09:13:46.1226053Z bfe.u32 %r147, %r142, 4, 14; 2026-02-21T09:13:46.1226116Z cvt.u64.u32 %rd24, %r147; 2026-02-21T09:13:46.1226189Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:13:46.1226259Z bfe.u32 %r148, %r145, 4, 14; 2026-02-21T09:13:46.1226319Z cvt.u64.u32 %rd25, %r148; 2026-02-21T09:13:46.1226392Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T09:13:46.1226452Z mov.b32 %r127, 136314896; 2026-02-21T09:13:46.1226516Z // begin inline asm 2026-02-21T09:13:46.1226664Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 0 ], 
%rd14, %rd15, %r127, %p143; 2026-02-21T09:13:46.1226720Z // end inline asm 2026-02-21T09:13:46.1226784Z add.s32 %r149, %r142, 32; 2026-02-21T09:13:46.1226842Z bfe.u32 %r150, %r149, 4, 14; 2026-02-21T09:13:46.1226899Z cvt.u64.u32 %rd26, %r150; 2026-02-21T09:13:46.1226972Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:13:46.1227028Z add.s32 %r151, %r144, 131104; 2026-02-21T09:13:46.1227084Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T09:13:46.1227148Z cvt.u64.u32 %rd27, %r152; 2026-02-21T09:13:46.1227210Z or.b64 %rd17, %rd27, -9223371899382267904; 2026-02-21T09:13:46.1227268Z mov.pred %p21, -1; 2026-02-21T09:13:46.1227332Z // begin inline asm 2026-02-21T09:13:46.1227470Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 0 ], %rd16, %rd17, %r127, %p21; 2026-02-21T09:13:46.1227526Z // end inline asm 2026-02-21T09:13:46.1227615Z add.s32 %r153, %r142, 8192; 2026-02-21T09:13:46.1227682Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T09:13:46.1227741Z cvt.u64.u32 %rd28, %r154; 2026-02-21T09:13:46.1227805Z or.b64 %rd18, %rd28, -9223371899348713472; 2026-02-21T09:13:46.1227900Z // begin inline asm 2026-02-21T09:13:46.1228036Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 128 ], %rd18, %rd15, %r127, %p143; 2026-02-21T09:13:46.1228089Z // end inline asm 2026-02-21T09:13:46.1228153Z add.s32 %r155, %r142, 8224; 2026-02-21T09:13:46.1228210Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:13:46.1228265Z cvt.u64.u32 %rd29, %r156; 2026-02-21T09:13:46.1228329Z or.b64 %rd20, %rd29, -9223371899348713472; 2026-02-21T09:13:46.1228390Z // begin inline asm 2026-02-21T09:13:46.1228521Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 128 ], %rd20, %rd17, %r127, %p21; 2026-02-21T09:13:46.1228573Z // end inline asm 2026-02-21T09:13:46.1228665Z cvt.u64.u32 %rd22, %r140; 2026-02-21T09:13:46.1228721Z // begin inline asm 2026-02-21T09:13:46.1228847Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:13:46.1228900Z // end inline asm 2026-02-21T09:13:46.1228971Z 
and.pred %p28, %p29, %p20; 2026-02-21T09:13:46.1229026Z add.s32 %r157, %r75, 163904; 2026-02-21T09:13:46.1229083Z cvt.u64.u32 %rd23, %r157; 2026-02-21T09:13:46.1229147Z // begin inline asm 2026-02-21T09:13:46.1229262Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:13:46.1229339Z // end inline asm 2026-02-21T09:13:46.1229509Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1229570Z setp.ne.b32 %p143, %r1212, 63; 2026-02-21T09:13:46.1229737Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1229797Z selp.b32 %r158, 1, 0, %p29; 2026-02-21T09:13:46.1229861Z xor.b32 %r1209, %r1209, %r158; 2026-02-21T09:13:46.1229919Z add.s32 %r134, %r75, 163920; 2026-02-21T09:13:46.1229975Z // begin inline asm 2026-02-21T09:13:46.1230033Z 2026-02-21T09:13:46.1230084Z { 2026-02-21T09:13:46.1230147Z @!%p29 bra.uni skipWait; 2026-02-21T09:13:46.1230206Z .reg .pred complete; 2026-02-21T09:13:46.1230269Z waitLoop: 2026-02-21T09:13:46.1230394Z mbarrier.try_wait.parity.shared.b64 complete, [%r134], %r1209; 2026-02-21T09:13:46.1230460Z @!complete bra.uni waitLoop; 2026-02-21T09:13:46.1230523Z skipWait: 2026-02-21T09:13:46.1230574Z } 2026-02-21T09:13:46.1230580Z 2026-02-21T09:13:46.1230636Z // end inline asm 2026-02-21T09:13:46.1230810Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1230868Z add.s32 %r159, %r1211, 1; 2026-02-21T09:13:46.1230929Z setp.eq.b32 %p31, %r159, 4; 2026-02-21T09:13:46.1230995Z selp.b32 %r1211, 0, %r159, %p31; 2026-02-21T09:13:46.1231062Z selp.b32 %r160, 1, 0, %p31; 2026-02-21T09:13:46.1231124Z xor.b32 %r1210, %r1210, %r160; 2026-02-21T09:13:46.1231306Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1231373Z add.s32 %r1208, %r1208, -1; 2026-02-21T09:13:46.1231434Z setp.ne.b32 %p32, %r1208, 0; 2026-02-21T09:13:46.1231491Z @%p32 bra $L__BB0_7; 
2026-02-21T09:13:46.1231584Z $L__BB0_8: // %._crit_edge9 2026-02-21T09:13:46.1231675Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1231733Z barrier.sync 1; 2026-02-21T09:13:46.1231813Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1231885Z bra.uni $L__BB0_2; 2026-02-21T09:13:46.1231984Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1232172Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1232251Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1232327Z ld.shared.b32 %r1213, [global_smem+65544]; 2026-02-21T09:13:46.1232408Z barrier.sync 1; 2026-02-21T09:13:46.1232582Z .loc 1 21 67 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:21:67 2026-02-21T09:13:46.1232644Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:13:46.1232702Z mov.u32 %r77, %ctaid.y; 2026-02-21T09:13:46.1232806Z mov.u32 %r78, %ctaid.z; 2026-02-21T09:13:46.1232867Z mov.u32 %r79, %nctaid.x; 2026-02-21T09:13:46.1232927Z mov.u32 %r80, %nctaid.y; 2026-02-21T09:13:46.1233000Z mad.lo.s32 %r81, %r78, %r80, %r77; 2026-02-21T09:13:46.1233064Z mad.lo.s32 %r82, %r81, %r79, %r17; 2026-02-21T09:13:46.1233123Z mul.lo.s32 %r83, %r82, 384; 2026-02-21T09:13:46.1233297Z .loc 1 22 67 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:22:67 2026-02-21T09:13:46.1233362Z add.s32 %r84, %r83, 128; 2026-02-21T09:13:46.1233421Z cvt.s64.s32 %rd8, %r84; 2026-02-21T09:13:46.1233481Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:13:46.1233573Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:13:46.1233745Z .loc 1 21 67 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:21:67 2026-02-21T09:13:46.1233805Z cvt.s64.s32 %rd10, %r83; 2026-02-21T09:13:46.1233872Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:13:46.1233936Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:13:46.1234116Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1234177Z setp.lt.s32 %p5, %r1213, 1; 2026-02-21T09:13:46.1234262Z 
@%p5 bra $L__BB0_14; 2026-02-21T09:13:46.1234341Z // %bb.10: // %.lr.ph 2026-02-21T09:13:46.1234430Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1234498Z add.s32 %r1223, %r17, -2368; 2026-02-21T09:13:46.1234557Z add.s32 %r19, %r1, -128; 2026-02-21T09:13:46.1234615Z mov.b32 %r1220, -1; 2026-02-21T09:13:46.1234695Z mov.b32 %r1214, 0; 2026-02-21T09:13:46.1234763Z mov.b32 %r1215, %r1214; 2026-02-21T09:13:46.1234822Z mov.b32 %r1222, %r1214; 2026-02-21T09:13:46.1234881Z mov.b32 %r1221, %r1214; 2026-02-21T09:13:46.1234945Z mov.b32 %r1218, %r1214; 2026-02-21T09:13:46.1235001Z bra.uni $L__BB0_11; 2026-02-21T09:13:46.1235104Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:46.1235289Z .loc 1 0 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0:130 2026-02-21T09:13:46.1235354Z selp.b32 %r105, 0, %r1218, %p8; 2026-02-21T09:13:46.1235417Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:13:46.1235478Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:13:46.1235664Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1235721Z shl.b32 %r112, %r1215, 3; 2026-02-21T09:13:46.1235779Z add.s32 %r114, %r75, %r112; 2026-02-21T09:13:46.1235845Z add.s32 %r101, %r114, 163840; 2026-02-21T09:13:46.1236015Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1236075Z // begin inline asm 2026-02-21T09:13:46.1236131Z 2026-02-21T09:13:46.1236182Z { 2026-02-21T09:13:46.1236243Z .reg .pred complete; 2026-02-21T09:13:46.1236297Z waitLoop: 2026-02-21T09:13:46.1236428Z mbarrier.try_wait.parity.shared.b64 complete, [%r101], %r1214; 2026-02-21T09:13:46.1236497Z @!complete bra.uni waitLoop; 2026-02-21T09:13:46.1236547Z } 2026-02-21T09:13:46.1236551Z 2026-02-21T09:13:46.1236615Z // end inline asm 2026-02-21T09:13:46.1236792Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1236852Z add.s32 %r107, %r114, 163872; 2026-02-21T09:13:46.1237018Z .loc 1 
0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1237080Z bar.sync 3, 64; 2026-02-21T09:13:46.1237138Z // begin inline asm 2026-02-21T09:13:46.1237253Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r107], 24576; 2026-02-21T09:13:46.1237320Z // end inline asm 2026-02-21T09:13:46.1237535Z .loc 1 51 31 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:51:31 2026-02-21T09:13:46.1237595Z shl.b32 %r115, %r1215, 14; 2026-02-21T09:13:46.1237659Z add.s32 %r104, %r75, %r115; 2026-02-21T09:13:46.1237855Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1237913Z bar.sync 3, 64; 2026-02-21T09:13:46.1237977Z elect.sync %r116|%p13, -1; 2026-02-21T09:13:46.1238049Z and.pred %p10, %p12, %p13; 2026-02-21T09:13:46.1238105Z // begin inline asm 2026-02-21T09:13:46.1238375Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r104], [%rd12, {%r105, %r1222}], [%r107]; 2026-02-21T09:13:46.1238437Z // end inline asm 2026-02-21T09:13:46.1238606Z .loc 1 52 44 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:52:44 2026-02-21T09:13:46.1238688Z shl.b32 %r117, %r1215, 13; 2026-02-21T09:13:46.1238754Z add.s32 %r118, %r75, %r117; 2026-02-21T09:13:46.1238813Z add.s32 %r108, %r118, 131072; 2026-02-21T09:13:46.1238968Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1239028Z bar.sync 3, 64; 2026-02-21T09:13:46.1239092Z elect.sync %r119|%p14, -1; 2026-02-21T09:13:46.1239154Z and.pred %p11, %p12, %p14; 2026-02-21T09:13:46.1239208Z // begin inline asm 2026-02-21T09:13:46.1239498Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd13, {%r105, %r1221}], [%r107]; 2026-02-21T09:13:46.1239555Z // end inline asm 2026-02-21T09:13:46.1239725Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1239790Z add.s32 %r1218, %r105, 32; 
2026-02-21T09:13:46.1239940Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1239998Z add.s32 %r120, %r1215, 1; 2026-02-21T09:13:46.1240068Z setp.eq.b32 %p15, %r120, 4; 2026-02-21T09:13:46.1240130Z selp.b32 %r1215, 0, %r120, %p15; 2026-02-21T09:13:46.1240188Z selp.b32 %r121, 1, 0, %p15; 2026-02-21T09:13:46.1240248Z xor.b32 %r1214, %r1214, %r121; 2026-02-21T09:13:46.1240424Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1240483Z add.s32 %r1213, %r1213, -1; 2026-02-21T09:13:46.1240544Z setp.ne.b32 %p16, %r1213, 0; 2026-02-21T09:13:46.1240611Z @%p16 bra $L__BB0_11; 2026-02-21T09:13:46.1240667Z bra.uni $L__BB0_14; 2026-02-21T09:13:46.1240767Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:46.1240866Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:46.1240924Z add.s32 %r87, %r1220, 1; 2026-02-21T09:13:46.1240984Z setp.eq.b32 %p6, %r1220, 63; 2026-02-21T09:13:46.1241046Z selp.b32 %r1220, 0, %r87, %p6; 2026-02-21T09:13:46.1241115Z setp.ne.b32 %p7, %r1220, 0; 2026-02-21T09:13:46.1241174Z setp.eq.b32 %p8, %r1220, 0; 2026-02-21T09:13:46.1241232Z @%p7 bra $L__BB0_13; 2026-02-21T09:13:46.1241338Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:46.1241399Z add.s32 %r1223, %r1223, 2368; 2026-02-21T09:13:46.1241566Z .loc 1 36 35 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:36:35 2026-02-21T09:13:46.1241624Z shr.s32 %r88, %r1223, 31; 2026-02-21T09:13:46.1241691Z shr.u32 %r89, %r88, 26; 2026-02-21T09:13:46.1241749Z add.s32 %r90, %r1223, %r89; 2026-02-21T09:13:46.1241804Z shr.s32 %r91, %r90, 6; 2026-02-21T09:13:46.1241970Z .loc 1 37 33 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:37:33 2026-02-21T09:13:46.1242025Z shl.b32 %r92, %r91, 3; 2026-02-21T09:13:46.1242184Z .loc 1 38 39 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:38:39 2026-02-21T09:13:46.1242246Z sub.s32 %r93, 32, %r92; 2026-02-21T09:13:46.1242435Z .loc 
1 38 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:38:52 2026-02-21T09:13:46.1242489Z min.s32 %r94, %r93, 8; 2026-02-21T09:13:46.1242657Z .loc 1 39 45 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:45 2026-02-21T09:13:46.1242741Z and.b32 %r95, %r90, -64; 2026-02-21T09:13:46.1242798Z sub.s32 %r96, %r1223, %r95; 2026-02-21T09:13:46.1242963Z .loc 1 40 51 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:40:51 2026-02-21T09:13:46.1243028Z div.s32 %r97, %r96, %r94; 2026-02-21T09:13:46.1243192Z .loc 1 39 64 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:64 2026-02-21T09:13:46.1243249Z mul.lo.s32 %r98, %r97, %r94; 2026-02-21T09:13:46.1243313Z sub.s32 %r99, %r96, %r98; 2026-02-21T09:13:46.1243501Z .loc 1 39 30 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:39:30 2026-02-21T09:13:46.1243560Z add.s32 %r100, %r99, %r92; 2026-02-21T09:13:46.1243733Z .loc 1 41 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:41:27 2026-02-21T09:13:46.1243789Z shl.b32 %r1221, %r100, 7; 2026-02-21T09:13:46.1243955Z .loc 1 42 27 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:42:27 2026-02-21T09:13:46.1244012Z shl.b32 %r1222, %r97, 8; 2026-02-21T09:13:46.1244075Z bra.uni $L__BB0_13; 2026-02-21T09:13:46.1244174Z $L__BB0_14: // %._crit_edge 2026-02-21T09:13:46.1244261Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1244439Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1244495Z barrier.sync 1; 2026-02-21T09:13:46.1244570Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:46.1244631Z bra.uni $L__BB0_2; 2026-02-21T09:13:46.1244785Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:46.1244945Z .loc 1 19 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:19 2026-02-21T09:13:46.1245000Z barrier.sync 1; 2026-02-21T09:13:46.1245064Z barrier.sync 1; 2026-02-21T09:13:46.1245119Z bra.uni $L__BB0_2; 2026-02-21T09:13:46.1245206Z 
$L__BB0_23: // %._crit_edge12 2026-02-21T09:13:46.1245383Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1245454Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:13:46.1245507Z bar.sync 0, 128; 2026-02-21T09:13:46.1245569Z barrier.sync 1; 2026-02-21T09:13:46.1245643Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:13:46.1245804Z .loc 1 53 52 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:53:52 2026-02-21T09:13:46.1245860Z // begin inline asm 2026-02-21T09:13:46.1245917Z 2026-02-21T09:13:46.1245965Z { 2026-02-21T09:13:46.1246022Z .reg .pred complete; 2026-02-21T09:13:46.1246085Z waitLoop: 2026-02-21T09:13:46.1246202Z mbarrier.try_wait.parity.shared.b64 complete, [%r1194], %r1234; 2026-02-21T09:13:46.1246265Z @!complete bra.uni waitLoop; 2026-02-21T09:13:46.1246313Z } 2026-02-21T09:13:46.1246317Z 2026-02-21T09:13:46.1246380Z // end inline asm 2026-02-21T09:13:46.1246547Z .loc 1 30 130 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:130 2026-02-21T09:13:46.1246601Z bar.sync 0, 128; 2026-02-21T09:13:46.1246665Z // begin inline asm 2026-02-21T09:13:46.1246752Z @%p132 mbarrier.inval.shared::cta.b64 [%r1194]; 2026-02-21T09:13:46.1246805Z // end inline asm 2026-02-21T09:13:46.1246865Z // begin inline asm 2026-02-21T09:13:46.1246949Z @%p132 mbarrier.inval.shared::cta.b64 [%r470]; 2026-02-21T09:13:46.1247001Z // end inline asm 2026-02-21T09:13:46.1247055Z // begin inline asm 2026-02-21T09:13:46.1247141Z @%p132 mbarrier.inval.shared::cta.b64 [%r462]; 2026-02-21T09:13:46.1247195Z // end inline asm 2026-02-21T09:13:46.1247277Z bar.sync 0, 128; 2026-02-21T09:13:46.1247338Z // begin inline asm 2026-02-21T09:13:46.1247414Z @%p132 mbarrier.inval.shared::cta.b64 [%r463]; 2026-02-21T09:13:46.1247466Z // end inline asm 2026-02-21T09:13:46.1247518Z bar.sync 0, 128; 2026-02-21T09:13:46.1247606Z // begin inline asm 2026-02-21T09:13:46.1247680Z @%p132 mbarrier.inval.shared::cta.b64 [%r464]; 
2026-02-21T09:13:46.1247731Z // end inline asm 2026-02-21T09:13:46.1247792Z bar.sync 0, 128; 2026-02-21T09:13:46.1247846Z // begin inline asm 2026-02-21T09:13:46.1247923Z @%p132 mbarrier.inval.shared::cta.b64 [%r465]; 2026-02-21T09:13:46.1247982Z // end inline asm 2026-02-21T09:13:46.1248035Z // begin inline asm 2026-02-21T09:13:46.1248109Z @%p132 mbarrier.inval.shared::cta.b64 [%r458]; 2026-02-21T09:13:46.1248160Z // end inline asm 2026-02-21T09:13:46.1248220Z bar.sync 0, 128; 2026-02-21T09:13:46.1248274Z // begin inline asm 2026-02-21T09:13:46.1248374Z @%p132 mbarrier.inval.shared::cta.b64 [%r459]; 2026-02-21T09:13:46.1248437Z // end inline asm 2026-02-21T09:13:46.1248490Z bar.sync 0, 128; 2026-02-21T09:13:46.1248544Z // begin inline asm 2026-02-21T09:13:46.1248620Z @%p132 mbarrier.inval.shared::cta.b64 [%r460]; 2026-02-21T09:13:46.1248682Z // end inline asm 2026-02-21T09:13:46.1248737Z bar.sync 0, 128; 2026-02-21T09:13:46.1248791Z // begin inline asm 2026-02-21T09:13:46.1248877Z @%p132 mbarrier.inval.shared::cta.b64 [%r461]; 2026-02-21T09:13:46.1248930Z // end inline asm 2026-02-21T09:13:46.1249122Z .loc 1 30 4 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:30:4 2026-02-21T09:13:46.1249184Z bar.sync 0, 128; 2026-02-21T09:13:46.1249237Z // begin inline asm 2026-02-21T09:13:46.1249354Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1206, 256; 2026-02-21T09:13:46.1249407Z // end inline asm 2026-02-21T09:13:46.1249490Z st.shared.b32 [global_smem+163928], 50529027; 2026-02-21T09:13:46.1249545Z barrier.sync 1; 2026-02-21T09:13:46.1249624Z $L__BB0_24: // %common.ret 2026-02-21T09:13:46.1249791Z .loc 1 0 0 // c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py:0 2026-02-21T09:13:46.1249841Z ret; 2026-02-21T09:13:46.1249895Z $L__tmp1: 2026-02-21T09:13:46.1249947Z $L__func_end0: 2026-02-21T09:13:46.1250035Z // -- End function 2026-02-21T09:13:46.1250083Z } 2026-02-21T09:13:46.1250282Z .file 1 
"/tmp/torchinductor_root/7l/c7lmbwdluhl6gqabvengfgxq46uurymhuyxcz6gnrjq42yk3dgz5.py" 2026-02-21T09:13:46.1250352Z .section .debug_abbrev 2026-02-21T09:13:46.1250400Z { 2026-02-21T09:13:46.1250485Z .b8 1 // Abbreviation Code 2026-02-21T09:13:46.1250576Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:13:46.1250653Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:13:46.1250728Z .b8 37 // DW_AT_producer 2026-02-21T09:13:46.1250799Z .b8 8 // DW_FORM_string 2026-02-21T09:13:46.1250878Z .b8 19 // DW_AT_language 2026-02-21T09:13:46.1250953Z .b8 5 // DW_FORM_data2 2026-02-21T09:13:46.1251024Z .b8 3 // DW_AT_name 2026-02-21T09:13:46.1251102Z .b8 8 // DW_FORM_string 2026-02-21T09:13:46.1251176Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:13:46.1251248Z .b8 6 // DW_FORM_data4 2026-02-21T09:13:46.1251327Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:13:46.1251397Z .b8 8 // DW_FORM_string 2026-02-21T09:13:46.1251464Z .b8 0 // EOM(1) 2026-02-21T09:13:46.1251529Z .b8 0 // EOM(2) 2026-02-21T09:13:46.1251600Z .b8 0 // EOM(3) 2026-02-21T09:13:46.1251649Z } 2026-02-21T09:13:46.1251706Z .section .debug_info 2026-02-21T09:13:46.1251796Z { 2026-02-21T09:13:46.1251876Z .b32 104 // Length of Unit 2026-02-21T09:13:46.1251959Z .b8 2 // DWARF version number 2026-02-21T09:13:46.1252015Z .b8 0 2026-02-21T09:13:46.1252149Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:13:46.1252233Z .b8 8 // Address Size (in bytes) 2026-02-21T09:13:46.1252332Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:13:46.1252416Z .b8 116 // DW_AT_producer 2026-02-21T09:13:46.1252467Z .b8 114 2026-02-21T09:13:46.1252518Z .b8 105 2026-02-21T09:13:46.1252574Z .b8 116 2026-02-21T09:13:46.1252622Z .b8 111 2026-02-21T09:13:46.1252669Z .b8 110 2026-02-21T09:13:46.1252716Z .b8 0 2026-02-21T09:13:46.1252792Z .b8 2 // DW_AT_language 2026-02-21T09:13:46.1252862Z .b8 0 2026-02-21T09:13:46.1252935Z .b8 99 // DW_AT_name 2026-02-21T09:13:46.1252991Z .b8 55 2026-02-21T09:13:46.1253039Z .b8 108 2026-02-21T09:13:46.1253085Z .b8 109 2026-02-21T09:13:46.1253133Z .b8 98 2026-02-21T09:13:46.1253188Z .b8 119 2026-02-21T09:13:46.1253235Z .b8 100 2026-02-21T09:13:46.1253284Z .b8 108 2026-02-21T09:13:46.1253337Z .b8 117 2026-02-21T09:13:46.1253386Z .b8 104 2026-02-21T09:13:46.1253434Z .b8 108 2026-02-21T09:13:46.1253481Z .b8 54 2026-02-21T09:13:46.1253537Z .b8 103 2026-02-21T09:13:46.1253609Z .b8 113 2026-02-21T09:13:46.1253658Z .b8 97 2026-02-21T09:13:46.1253713Z .b8 98 2026-02-21T09:13:46.1253761Z .b8 118 2026-02-21T09:13:46.1253810Z .b8 101 2026-02-21T09:13:46.1253858Z .b8 110 2026-02-21T09:13:46.1253914Z .b8 103 2026-02-21T09:13:46.1253962Z .b8 102 2026-02-21T09:13:46.1254010Z .b8 103 2026-02-21T09:13:46.1254057Z .b8 120 2026-02-21T09:13:46.1254112Z .b8 113 2026-02-21T09:13:46.1254159Z .b8 52 2026-02-21T09:13:46.1254205Z .b8 54 2026-02-21T09:13:46.1254261Z .b8 117 2026-02-21T09:13:46.1254311Z .b8 117 2026-02-21T09:13:46.1254360Z .b8 114 2026-02-21T09:13:46.1254407Z .b8 121 2026-02-21T09:13:46.1254462Z .b8 109 2026-02-21T09:13:46.1254511Z .b8 104 2026-02-21T09:13:46.1254560Z .b8 117 2026-02-21T09:13:46.1254615Z .b8 121 2026-02-21T09:13:46.1254666Z .b8 120 2026-02-21T09:13:46.1254749Z .b8 99 2026-02-21T09:13:46.1254799Z .b8 122 2026-02-21T09:13:46.1254855Z .b8 54 2026-02-21T09:13:46.1254902Z .b8 103 2026-02-21T09:13:46.1254952Z .b8 110 
2026-02-21T09:13:46.1254998Z .b8 114 2026-02-21T09:13:46.1255056Z .b8 106 2026-02-21T09:13:46.1255105Z .b8 113 2026-02-21T09:13:46.1255154Z .b8 52 2026-02-21T09:13:46.1255212Z .b8 50 2026-02-21T09:13:46.1255261Z .b8 121 2026-02-21T09:13:46.1255312Z .b8 107 2026-02-21T09:13:46.1255360Z .b8 51 2026-02-21T09:13:46.1255416Z .b8 100 2026-02-21T09:13:46.1255464Z .b8 103 2026-02-21T09:13:46.1255512Z .b8 122 2026-02-21T09:13:46.1255567Z .b8 53 2026-02-21T09:13:46.1255614Z .b8 46 2026-02-21T09:13:46.1255662Z .b8 112 2026-02-21T09:13:46.1255711Z .b8 121 2026-02-21T09:13:46.1255769Z .b8 0 2026-02-21T09:13:46.1255859Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:13:46.1255934Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:13:46.1255994Z .b8 116 2026-02-21T09:13:46.1256045Z .b8 109 2026-02-21T09:13:46.1256098Z .b8 112 2026-02-21T09:13:46.1256144Z .b8 47 2026-02-21T09:13:46.1256199Z .b8 116 2026-02-21T09:13:46.1256248Z .b8 111 2026-02-21T09:13:46.1256297Z .b8 114 2026-02-21T09:13:46.1256352Z .b8 99 2026-02-21T09:13:46.1256401Z .b8 104 2026-02-21T09:13:46.1256449Z .b8 105 2026-02-21T09:13:46.1256498Z .b8 110 2026-02-21T09:13:46.1256556Z .b8 100 2026-02-21T09:13:46.1256604Z .b8 117 2026-02-21T09:13:46.1256651Z .b8 99 2026-02-21T09:13:46.1256699Z .b8 116 2026-02-21T09:13:46.1256753Z .b8 111 2026-02-21T09:13:46.1256801Z .b8 114 2026-02-21T09:13:46.1256849Z .b8 95 2026-02-21T09:13:46.1256903Z .b8 114 2026-02-21T09:13:46.1256952Z .b8 111 2026-02-21T09:13:46.1257000Z .b8 111 2026-02-21T09:13:46.1257050Z .b8 116 2026-02-21T09:13:46.1257139Z .b8 47 2026-02-21T09:13:46.1257188Z .b8 55 2026-02-21T09:13:46.1257235Z .b8 108 2026-02-21T09:13:46.1257290Z .b8 0 2026-02-21T09:13:46.1257338Z } 2026-02-21T09:13:46.1257401Z .section .debug_macinfo { } 2026-02-21T09:13:46.1257405Z 2026-02-21T09:13:46.1257508Z ================================================================ 2026-02-21T09:13:46.1257616Z please share the reproducer above with Triton project. 
2026-02-21T09:13:47.7930525Z [164s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:13:47.7930875Z 2026-02-21T09:13:47.7930879Z 2026-02-21T09:13:47.7930882Z 2026-02-21T09:13:47.7930998Z ================================================================ 2026-02-21T09:13:47.7931257Z Internal Triton PTX codegen error 2026-02-21T09:13:47.7931484Z `ptxas` stderr: 2026-02-21T09:13:47.7932246Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 210 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:13:47.7932829Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:47.7933004Z 2026-02-21T09:13:47.7933498Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxtz453hx.ptx -o /tmp/tmpxtz453hx.ptx.o 2026-02-21T09:13:47.7934003Z 2026-02-21T09:13:47.7934008Z 2026-02-21T09:13:47.7934074Z // 2026-02-21T09:13:47.7934242Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:13:47.7934531Z // 2026-02-21T09:13:47.7934615Z 2026-02-21T09:13:47.7934798Z .version 8.7 2026-02-21T09:13:47.7934981Z .target sm_100a 2026-02-21T09:13:47.7935163Z .address_size 64 2026-02-21T09:13:47.7935248Z 2026-02-21T09:13:47.7935383Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:13:47.7935667Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:13:47.7935886Z // @_helion_matmul 2026-02-21T09:13:47.7936114Z .visible .entry _helion_matmul( 2026-02-21T09:13:47.7936329Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:13:47.7936626Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:13:47.7936907Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:13:47.7937183Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:13:47.7937446Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_4 2026-02-21T09:13:47.7937651Z ) 2026-02-21T09:13:47.7937784Z .reqntid 384 2026-02-21T09:13:47.7937909Z .maxnreg 32 2026-02-21T09:13:47.7938031Z { 2026-02-21T09:13:47.7938148Z .reg .pred %p<131>; 2026-02-21T09:13:47.7938300Z .reg .b32 %r<651>; 2026-02-21T09:13:47.7938444Z .reg .b64 %rd<201>; 2026-02-21T09:13:47.7938698Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19:0 2026-02-21T09:13:47.7938984Z $L__func_begin0: 2026-02-21T09:13:47.7939225Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19:0 2026-02-21T09:13:47.7939458Z 2026-02-21T09:13:47.7939510Z // %bb.0: 2026-02-21T09:13:47.7939656Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:13:47.7939843Z $L__tmp0: 2026-02-21T09:13:47.7940070Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19 2026-02-21T09:13:47.7940342Z mov.u32 %r1, %tid.x; 2026-02-21T09:13:47.7940493Z shr.u32 %r2, %r1, 5; 2026-02-21T09:13:47.7940651Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:13:47.7940845Z setp.lt.u32 %p3, %r3, 8; 2026-02-21T09:13:47.7940995Z @%p3 bra $L__BB0_16; 2026-02-21T09:13:47.7941140Z bra.uni $L__BB0_1; 2026-02-21T09:13:47.7941273Z $L__BB0_16: 2026-02-21T09:13:47.7941507Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:0 2026-02-21T09:13:47.7941808Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:13:47.7942012Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:13:47.7942403Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19 2026-02-21T09:13:47.7942701Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:13:47.7942896Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:13:47.7943054Z mov.b32 %r163, global_smem; 2026-02-21T09:13:47.7943268Z // begin inline asm 2026-02-21T09:13:47.7943501Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r163], 128; 2026-02-21T09:13:47.7943754Z // end inline asm 2026-02-21T09:13:47.7943893Z bar.sync 0, 256; 
2026-02-21T09:13:47.7944034Z ld.shared.b32 %r622, [global_smem]; 2026-02-21T09:13:47.7944207Z bar.sync 0, 256; 2026-02-21T09:13:47.7944336Z // begin inline asm 2026-02-21T09:13:47.7944539Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:13:47.7944797Z // end inline asm 2026-02-21T09:13:47.7945084Z .loc 1 21 67 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:21:67 2026-02-21T09:13:47.7945375Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:13:47.7945536Z mov.u32 %r276, %ctaid.y; 2026-02-21T09:13:47.7945690Z mov.u32 %r277, %ctaid.z; 2026-02-21T09:13:47.7945838Z mov.u32 %r278, %nctaid.x; 2026-02-21T09:13:47.7945996Z mov.u32 %r279, %nctaid.y; 2026-02-21T09:13:47.7946154Z mad.lo.s32 %r280, %r277, %r279, %r276; 2026-02-21T09:13:47.7946336Z mad.lo.s32 %r281, %r280, %r278, %r41; 2026-02-21T09:13:47.7946502Z shl.b32 %r282, %r281, 8; 2026-02-21T09:13:47.7946654Z cvt.s64.s32 %rd64, %r282; 2026-02-21T09:13:47.7946843Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T09:13:47.7947003Z shl.b32 %r283, %r1, 2; 2026-02-21T09:13:47.7947150Z add.s32 %r164, %r163, %r283; 2026-02-21T09:13:47.7947305Z mov.b32 %r650, 0; 2026-02-21T09:13:47.7947442Z // begin inline asm 2026-02-21T09:13:47.7947593Z @%p33 st.shared.b32 [ %r164 + 0 ], %r650; 2026-02-21T09:13:47.7947770Z // end inline asm 2026-02-21T09:13:47.7947905Z bar.warp.sync -1; 2026-02-21T09:13:47.7948056Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:13:47.7948208Z cvt.u64.u32 %rd28, %r163; 2026-02-21T09:13:47.7948360Z // begin inline asm 2026-02-21T09:13:47.7948604Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T09:13:47.7948887Z // end inline asm 2026-02-21T09:13:47.7949025Z // begin inline asm 2026-02-21T09:13:47.7949243Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:13:47.7949497Z // end inline asm 2026-02-21T09:13:47.7949625Z mov.b32 %r166, 32; 2026-02-21T09:13:47.7949765Z // begin inline asm 
2026-02-21T09:13:47.7949996Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r166; 2026-02-21T09:13:47.7950269Z // end inline asm 2026-02-21T09:13:47.7950399Z mov.b32 %r167, 256; 2026-02-21T09:13:47.7950548Z // begin inline asm 2026-02-21T09:13:47.7950783Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r167; 2026-02-21T09:13:47.7951044Z // end inline asm 2026-02-21T09:13:47.7951182Z mov.b32 %r168, 2048; 2026-02-21T09:13:47.7951317Z // begin inline asm 2026-02-21T09:13:47.7951563Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r168; 2026-02-21T09:13:47.7951830Z // end inline asm 2026-02-21T09:13:47.7951963Z // begin inline asm 2026-02-21T09:13:47.7952207Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r168; 2026-02-21T09:13:47.7952476Z // end inline asm 2026-02-21T09:13:47.7952613Z mov.b64 %rd36, 4096; 2026-02-21T09:13:47.7952748Z // begin inline asm 2026-02-21T09:13:47.7952998Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:13:47.7953275Z // end inline asm 2026-02-21T09:13:47.7954562Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=8, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:13:47.7955867Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:13:47.7956141Z `ptxas` stderr: 2026-02-21T09:13:47.7956551Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 210 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T09:13:47.7957026Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:13:47.7957168Z 2026-02-21T09:13:47.7957551Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxtz453hx.ptx -o /tmp/tmpxtz453hx.ptx.o 2026-02-21T09:13:47.7957983Z 2026-02-21T09:13:47.7958137Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:13:47.7958376Z mov.b32 %r170, 1; 2026-02-21T09:13:47.7958512Z // begin inline asm 2026-02-21T09:13:47.7958774Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r170; 2026-02-21T09:13:47.7959067Z // end inline asm 2026-02-21T09:13:47.7959195Z // begin inline asm 2026-02-21T09:13:47.7959447Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r170; 2026-02-21T09:13:47.7959730Z // end inline asm 2026-02-21T09:13:47.7959863Z // begin inline asm 2026-02-21T09:13:47.7960114Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:13:47.7960382Z // end inline asm 2026-02-21T09:13:47.7960509Z // begin inline asm 2026-02-21T09:13:47.7960758Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:13:47.7961041Z // end inline asm 2026-02-21T09:13:47.7961167Z // begin inline asm 2026-02-21T09:13:47.7961405Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:13:47.7961669Z // end inline asm 2026-02-21T09:13:47.7961801Z // begin inline asm 2026-02-21T09:13:47.7962022Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:13:47.7962286Z // end inline asm 2026-02-21T09:13:47.7962424Z // begin inline asm 2026-02-21T09:13:47.7962758Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 
+ 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:13:47.7963146Z // end inline asm 2026-02-21T09:13:47.7963274Z // begin inline asm 2026-02-21T09:13:47.7963480Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:13:47.7963722Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:47.7963919Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:47.7964102Z // end inline asm 2026-02-21T09:13:47.7964233Z bar.sync 0, 256; 2026-02-21T09:13:47.7964488Z .loc 1 22 67 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:22:67 2026-02-21T09:13:47.7964806Z add.s64 %rd61, %rd43, 128; 2026-02-21T09:13:47.7964962Z bar.sync 0, 256; 2026-02-21T09:13:47.7965091Z // begin inline asm 2026-02-21T09:13:47.7965247Z @%p33 st.shared.b32 [ %r164 + 0 ], %r650; 2026-02-21T09:13:47.7965419Z // end inline asm 2026-02-21T09:13:47.7965560Z bar.warp.sync -1; 2026-02-21T09:13:47.7965693Z // begin inline asm 2026-02-21T09:13:47.7965943Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T09:13:47.7966220Z // end inline asm 2026-02-21T09:13:47.7966348Z // begin inline asm 2026-02-21T09:13:47.7966570Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:13:47.7966819Z // end inline asm 2026-02-21T09:13:47.7966956Z // begin inline asm 2026-02-21T09:13:47.7967186Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r166; 2026-02-21T09:13:47.7967460Z // end inline asm 2026-02-21T09:13:47.7967633Z mov.b32 %r175, 64; 2026-02-21T09:13:47.7967764Z // begin inline asm 2026-02-21T09:13:47.7967992Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r175; 2026-02-21T09:13:47.7968246Z // end inline asm 2026-02-21T09:13:47.7968380Z // begin inline asm 2026-02-21T09:13:47.7968649Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r168; 2026-02-21T09:13:47.7968921Z // end inline asm 
2026-02-21T09:13:47.7969050Z mov.b32 %r177, 4096; 2026-02-21T09:13:47.7969192Z // begin inline asm 2026-02-21T09:13:47.7969434Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r177; 2026-02-21T09:13:47.7969706Z // end inline asm 2026-02-21T09:13:47.7969840Z // begin inline asm 2026-02-21T09:13:47.7970086Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:13:47.7970368Z // end inline asm 2026-02-21T09:13:47.7970520Z // begin inline asm 2026-02-21T09:13:47.7970786Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r170; 2026-02-21T09:13:47.7971072Z // end inline asm 2026-02-21T09:13:47.7971198Z // begin inline asm 2026-02-21T09:13:47.7971451Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r170; 2026-02-21T09:13:47.7971727Z // end inline asm 2026-02-21T09:13:47.7971860Z // begin inline asm 2026-02-21T09:13:47.7972112Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:13:47.7972381Z // end inline asm 2026-02-21T09:13:47.7972520Z // begin inline asm 2026-02-21T09:13:47.7972772Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:13:47.7973068Z // end inline asm 2026-02-21T09:13:47.7973202Z // begin inline asm 2026-02-21T09:13:47.7973446Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:13:47.7973721Z // end inline asm 2026-02-21T09:13:47.7973862Z // begin inline asm 2026-02-21T09:13:47.7974091Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:13:47.7974364Z // end inline asm 2026-02-21T09:13:47.7974508Z // begin inline asm 2026-02-21T09:13:47.7974907Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:13:47.7975310Z // 
end inline asm 2026-02-21T09:13:47.7975446Z // begin inline asm 2026-02-21T09:13:47.7975668Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:13:47.7975920Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:13:47.7976120Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:13:47.7976305Z // end inline asm 2026-02-21T09:13:47.7976438Z bar.sync 0, 256; 2026-02-21T09:13:47.7976699Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.7976999Z sub.s32 %r285, 512, %r41; 2026-02-21T09:13:47.7977175Z mul.hi.s32 %r286, %r285, -580400985; 2026-02-21T09:13:47.7977354Z add.s32 %r287, %r286, %r285; 2026-02-21T09:13:47.7977524Z shr.u32 %r288, %r287, 31; 2026-02-21T09:13:47.7977679Z shr.s32 %r289, %r287, 8; 2026-02-21T09:13:47.7977845Z add.s32 %r290, %r289, %r288; 2026-02-21T09:13:47.7978018Z mul.lo.s32 %r291, %r290, 296; 2026-02-21T09:13:47.7978189Z setp.ne.b32 %p102, %r285, %r291; 2026-02-21T09:13:47.7978371Z setp.lt.u32 %p103, %r41, 513; 2026-02-21T09:13:47.7978538Z and.pred %p104, %p103, %p102; 2026-02-21T09:13:47.7978708Z selp.b32 %r292, 1, 0, %p104; 2026-02-21T09:13:47.7978864Z add.s32 %r293, %r290, %r292; 2026-02-21T09:13:47.7979029Z shl.b32 %r52, %r293, 6; 2026-02-21T09:13:47.7979303Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.7979618Z shfl.sync.idx.b32 %r294, %r2, 0, 31, -1; 2026-02-21T09:13:47.7979809Z shl.b32 %r295, %r294, 21; 2026-02-21T09:13:47.7979966Z and.b32 %r296, %r295, 6291456; 2026-02-21T09:13:47.7980198Z add.s32 %r297, %r296, %r622; 2026-02-21T09:13:47.7980359Z shl.b32 %r298, %r294, 4; 2026-02-21T09:13:47.7980521Z and.b32 %r299, %r298, 64; 2026-02-21T09:13:47.7980678Z add.s32 %r180, %r297, %r299; 2026-02-21T09:13:47.7980874Z mov.pred %p71, -1; 2026-02-21T09:13:47.7981018Z // begin inline asm 2026-02-21T09:13:47.7981407Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r180 + 0], {%r650, %r650, %r650, %r650, %r650, %r650, 
%r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650}; 2026-02-21T09:13:47.7981800Z // end inline asm 2026-02-21T09:13:47.7981931Z // begin inline asm 2026-02-21T09:13:47.7982287Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r180 + 16], {%r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650}; 2026-02-21T09:13:47.7982663Z // end inline asm 2026-02-21T09:13:47.7982799Z // begin inline asm 2026-02-21T09:13:47.7983163Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r180 + 32], {%r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650}; 2026-02-21T09:13:47.7983559Z // end inline asm 2026-02-21T09:13:47.7983695Z // begin inline asm 2026-02-21T09:13:47.7984036Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r180 + 48], {%r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650, %r650}; 2026-02-21T09:13:47.7984433Z // end inline asm 2026-02-21T09:13:47.7984591Z // begin inline asm 2026-02-21T09:13:47.7984774Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:13:47.7984932Z // end inline asm 2026-02-21T09:13:47.7985068Z bar.sync 0, 256; 2026-02-21T09:13:47.7985326Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.7985612Z add.s32 %r248, %r163, 172032; 2026-02-21T09:13:47.7985774Z // begin inline asm 2026-02-21T09:13:47.7985941Z @%p111 mbarrier.init.shared::cta.b64 [%r248], 1; 2026-02-21T09:13:47.7986138Z // end inline asm 2026-02-21T09:13:47.7986265Z bar.sync 0, 256; 2026-02-21T09:13:47.7986405Z add.s32 %r249, %r163, 172040; 2026-02-21T09:13:47.7986560Z // begin inline asm 2026-02-21T09:13:47.7986737Z @%p111 mbarrier.init.shared::cta.b64 [%r249], 1; 2026-02-21T09:13:47.7986934Z // end inline asm 2026-02-21T09:13:47.7987062Z bar.sync 0, 256; 2026-02-21T09:13:47.7987199Z add.s32 %r250, %r163, 172048; 2026-02-21T09:13:47.7987345Z // begin inline asm 
2026-02-21T09:13:47.7987509Z @%p111 mbarrier.init.shared::cta.b64 [%r250], 1; 2026-02-21T09:13:47.7987690Z // end inline asm 2026-02-21T09:13:47.7987823Z bar.sync 0, 256; 2026-02-21T09:13:47.7987952Z add.s32 %r251, %r163, 172056; 2026-02-21T09:13:47.7988108Z // begin inline asm 2026-02-21T09:13:47.7988263Z @%p111 mbarrier.init.shared::cta.b64 [%r251], 1; 2026-02-21T09:13:47.7988449Z // end inline asm 2026-02-21T09:13:47.7988582Z bar.sync 0, 256; 2026-02-21T09:13:47.7988715Z add.s32 %r252, %r163, 172064; 2026-02-21T09:13:47.7988870Z // begin inline asm 2026-02-21T09:13:47.7989023Z @%p111 mbarrier.init.shared::cta.b64 [%r252], 1; 2026-02-21T09:13:47.7989210Z // end inline asm 2026-02-21T09:13:47.7989335Z bar.sync 0, 256; 2026-02-21T09:13:47.7989469Z add.s32 %r253, %r163, 172072; 2026-02-21T09:13:47.7989615Z // begin inline asm 2026-02-21T09:13:47.7989778Z @%p111 mbarrier.init.shared::cta.b64 [%r253], 1; 2026-02-21T09:13:47.7989959Z // end inline asm 2026-02-21T09:13:47.7990083Z bar.sync 0, 256; 2026-02-21T09:13:47.7990219Z add.s32 %r254, %r163, 172080; 2026-02-21T09:13:47.7990365Z // begin inline asm 2026-02-21T09:13:47.7990525Z @%p111 mbarrier.init.shared::cta.b64 [%r254], 1; 2026-02-21T09:13:47.7990701Z // end inline asm 2026-02-21T09:13:47.7990832Z bar.sync 0, 256; 2026-02-21T09:13:47.7990960Z add.s32 %r255, %r163, 172088; 2026-02-21T09:13:47.7991113Z // begin inline asm 2026-02-21T09:13:47.7991268Z @%p111 mbarrier.init.shared::cta.b64 [%r255], 1; 2026-02-21T09:13:47.7991452Z // end inline asm 2026-02-21T09:13:47.7991620Z add.s32 %r256, %r163, 172096; 2026-02-21T09:13:47.7991765Z // begin inline asm 2026-02-21T09:13:47.7991926Z @%p111 mbarrier.init.shared::cta.b64 [%r256], 1; 2026-02-21T09:13:47.7992102Z // end inline asm 2026-02-21T09:13:47.7992234Z bar.sync 0, 256; 2026-02-21T09:13:47.7992390Z add.s32 %r257, %r163, 172104; 2026-02-21T09:13:47.7992545Z // begin inline asm 2026-02-21T09:13:47.7992701Z @%p111 mbarrier.init.shared::cta.b64 [%r257], 1; 
2026-02-21T09:13:47.7992887Z // end inline asm 2026-02-21T09:13:47.7993023Z bar.sync 0, 256; 2026-02-21T09:13:47.7993156Z add.s32 %r258, %r163, 172112; 2026-02-21T09:13:47.7993312Z // begin inline asm 2026-02-21T09:13:47.7993470Z @%p111 mbarrier.init.shared::cta.b64 [%r258], 1; 2026-02-21T09:13:47.7993659Z // end inline asm 2026-02-21T09:13:47.7993785Z bar.sync 0, 256; 2026-02-21T09:13:47.7993921Z add.s32 %r259, %r163, 172120; 2026-02-21T09:13:47.7994066Z // begin inline asm 2026-02-21T09:13:47.7994255Z @%p111 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:13:47.7994434Z // end inline asm 2026-02-21T09:13:47.7994568Z bar.sync 0, 256; 2026-02-21T09:13:47.7994742Z add.s32 %r260, %r163, 172128; 2026-02-21T09:13:47.7994892Z // begin inline asm 2026-02-21T09:13:47.7995054Z @%p111 mbarrier.init.shared::cta.b64 [%r260], 1; 2026-02-21T09:13:47.7995236Z // end inline asm 2026-02-21T09:13:47.7995370Z bar.sync 0, 256; 2026-02-21T09:13:47.7995502Z add.s32 %r261, %r163, 172136; 2026-02-21T09:13:47.7995657Z // begin inline asm 2026-02-21T09:13:47.7995845Z @%p111 mbarrier.init.shared::cta.b64 [%r261], 1; 2026-02-21T09:13:47.7996033Z // end inline asm 2026-02-21T09:13:47.7996166Z bar.sync 0, 256; 2026-02-21T09:13:47.7996295Z add.s32 %r262, %r163, 172144; 2026-02-21T09:13:47.7996451Z // begin inline asm 2026-02-21T09:13:47.7996608Z @%p111 mbarrier.init.shared::cta.b64 [%r262], 1; 2026-02-21T09:13:47.7996792Z // end inline asm 2026-02-21T09:13:47.7996917Z bar.sync 0, 256; 2026-02-21T09:13:47.7997055Z add.s32 %r263, %r163, 172152; 2026-02-21T09:13:47.7997202Z // begin inline asm 2026-02-21T09:13:47.7997362Z @%p111 mbarrier.init.shared::cta.b64 [%r263], 1; 2026-02-21T09:13:47.7997536Z // end inline asm 2026-02-21T09:13:47.7997780Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.7998058Z bar.sync 0, 256; 2026-02-21T09:13:47.7998184Z // begin inline asm 2026-02-21T09:13:47.7998355Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r248]; 
2026-02-21T09:13:47.7998543Z // end inline asm 2026-02-21T09:13:47.7998675Z bar.sync 0, 256; 2026-02-21T09:13:47.7998799Z // begin inline asm 2026-02-21T09:13:47.7998966Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r249]; 2026-02-21T09:13:47.7999147Z // end inline asm 2026-02-21T09:13:47.7999278Z bar.sync 0, 256; 2026-02-21T09:13:47.7999410Z // begin inline asm 2026-02-21T09:13:47.7999569Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r250]; 2026-02-21T09:13:47.7999757Z // end inline asm 2026-02-21T09:13:47.7999882Z bar.sync 0, 256; 2026-02-21T09:13:47.8000018Z // begin inline asm 2026-02-21T09:13:47.8000177Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r251]; 2026-02-21T09:13:47.8000364Z // end inline asm 2026-02-21T09:13:47.8000489Z bar.sync 0, 256; 2026-02-21T09:13:47.8000621Z // begin inline asm 2026-02-21T09:13:47.8000786Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r252]; 2026-02-21T09:13:47.8000966Z // end inline asm 2026-02-21T09:13:47.8001101Z bar.sync 0, 256; 2026-02-21T09:13:47.8001229Z // begin inline asm 2026-02-21T09:13:47.8001397Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r253]; 2026-02-21T09:13:47.8001580Z // end inline asm 2026-02-21T09:13:47.8001713Z bar.sync 0, 256; 2026-02-21T09:13:47.8001838Z // begin inline asm 2026-02-21T09:13:47.8002004Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r254]; 2026-02-21T09:13:47.8002184Z // end inline asm 2026-02-21T09:13:47.8002315Z bar.sync 0, 256; 2026-02-21T09:13:47.8002448Z // begin inline asm 2026-02-21T09:13:47.8002606Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r255]; 2026-02-21T09:13:47.8002825Z // end inline asm 2026-02-21T09:13:47.8003066Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8003350Z bar.sync 0, 256; 2026-02-21T09:13:47.8003512Z add.s32 %r272, %r163, 172160; 2026-02-21T09:13:47.8003666Z // begin inline asm 2026-02-21T09:13:47.8003821Z @%p111 mbarrier.init.shared::cta.b64 [%r272], 1; 2026-02-21T09:13:47.8004010Z // end inline asm 
2026-02-21T09:13:47.8004150Z add.s32 %r602, %r163, 172176; 2026-02-21T09:13:47.8004300Z // begin inline asm 2026-02-21T09:13:47.8004464Z @%p111 mbarrier.init.shared::cta.b64 [%r602], 1; 2026-02-21T09:13:47.8004642Z // end inline asm 2026-02-21T09:13:47.8004933Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8005212Z bar.sync 0, 256; 2026-02-21T09:13:47.8005349Z // begin inline asm 2026-02-21T09:13:47.8005536Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r602]; 2026-02-21T09:13:47.8005728Z // end inline asm 2026-02-21T09:13:47.8005966Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8006271Z st.shared.b32 [global_smem+172184], 33554689; 2026-02-21T09:13:47.8006482Z st.shared.b32 [global_smem+163840], %r622; 2026-02-21T09:13:47.8006676Z st.shared.b32 [global_smem+163848], %r52; 2026-02-21T09:13:47.8006860Z barrier.sync 1; 2026-02-21T09:13:47.8007043Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:13:47.8007225Z barrier.sync 1; 2026-02-21T09:13:47.8007363Z setp.lt.s32 %p105, %r293, 1; 2026-02-21T09:13:47.8007527Z @%p105 bra $L__BB0_23; 2026-02-21T09:13:47.8007695Z // %bb.17: // %.lr.ph10 2026-02-21T09:13:47.8007986Z .loc 1 0 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:97 2026-02-21T09:13:47.8008293Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:13:47.8008474Z shl.b32 %r42, %r1, 3; 2026-02-21T09:13:47.8008624Z and.b32 %r43, %r42, 56; 2026-02-21T09:13:47.8008767Z shr.u32 %r284, %r1, 3; 2026-02-21T09:13:47.8008918Z bfe.u32 %r44, %r1, 3, 5; 2026-02-21T09:13:47.8009060Z or.b32 %r45, %r44, 32; 2026-02-21T09:13:47.8009206Z or.b32 %r46, %r44, 64; 2026-02-21T09:13:47.8009352Z or.b32 %r47, %r284, 96; 2026-02-21T09:13:47.8009494Z or.b32 %r48, %r44, 128; 2026-02-21T09:13:47.8009639Z or.b32 %r49, %r44, 160; 2026-02-21T09:13:47.8009775Z or.b32 %r50, %r44, 192; 2026-02-21T09:13:47.8009921Z or.b32 %r51, %r284, 224; 2026-02-21T09:13:47.8010172Z .loc 1 
28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8010465Z add.s32 %r647, %r41, -296; 2026-02-21T09:13:47.8010618Z shl.b32 %r302, %r1, 4; 2026-02-21T09:13:47.8010779Z and.b32 %r303, %r302, 4016; 2026-02-21T09:13:47.8010939Z bfe.s32 %r304, %r1, 2, 1; 2026-02-21T09:13:47.8011084Z and.b32 %r305, %r304, 4160; 2026-02-21T09:13:47.8011242Z or.b32 %r306, %r305, %r303; 2026-02-21T09:13:47.8011391Z add.s32 %r308, %r163, 163840; 2026-02-21T09:13:47.8011545Z add.s32 %r55, %r308, %r306; 2026-02-21T09:13:47.8011690Z xor.b32 %r309, %r306, 64; 2026-02-21T09:13:47.8011843Z add.s32 %r56, %r308, %r309; 2026-02-21T09:13:47.8011990Z shl.b32 %r310, %r1, 6; 2026-02-21T09:13:47.8012138Z and.b32 %r311, %r310, 1600; 2026-02-21T09:13:47.8012285Z and.b32 %r312, %r42, 48; 2026-02-21T09:13:47.8012437Z shl.b32 %r313, %r1, 1; 2026-02-21T09:13:47.8012588Z and.b32 %r314, %r313, 384; 2026-02-21T09:13:47.8012735Z bfe.s32 %r315, %r1, 5, 1; 2026-02-21T09:13:47.8012886Z and.b32 %r316, %r315, 4160; 2026-02-21T09:13:47.8013028Z or.b32 %r317, %r311, %r312; 2026-02-21T09:13:47.8013181Z or.b32 %r318, %r316, %r314; 2026-02-21T09:13:47.8013329Z xor.b32 %r319, %r318, %r317; 2026-02-21T09:13:47.8013489Z add.s32 %r397, %r308, %r319; 2026-02-21T09:13:47.8013635Z add.s32 %r402, %r397, 2048; 2026-02-21T09:13:47.8013788Z max.s32 %r640, %r52, 1; 2026-02-21T09:13:47.8013961Z mov.b32 %r645, -1; 2026-02-21T09:13:47.8014104Z mov.b32 %r648, %r650; 2026-02-21T09:13:47.8014252Z mov.b32 %r649, %r650; 2026-02-21T09:13:47.8014390Z bra.uni $L__BB0_18; 2026-02-21T09:13:47.8014580Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:47.8014986Z .loc 1 40 32 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:40:32 2026-02-21T09:13:47.8015274Z or.b32 %r466, %r649, %r43; 2026-02-21T09:13:47.8015532Z .loc 1 42 32 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:42:32 2026-02-21T09:13:47.8015818Z add.s32 %r467, %r648, %r44; 2026-02-21T09:13:47.8015978Z add.s32 
%r468, %r648, %r45; 2026-02-21T09:13:47.8016130Z add.s32 %r469, %r648, %r46; 2026-02-21T09:13:47.8016292Z add.s32 %r470, %r648, %r47; 2026-02-21T09:13:47.8016446Z add.s32 %r471, %r648, %r48; 2026-02-21T09:13:47.8016609Z add.s32 %r472, %r648, %r49; 2026-02-21T09:13:47.8016791Z add.s32 %r473, %r648, %r50; 2026-02-21T09:13:47.8016955Z add.s32 %r474, %r648, %r51; 2026-02-21T09:13:47.8017217Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8017508Z bar.sync 0, 256; 2026-02-21T09:13:47.8017657Z // begin inline asm 2026-02-21T09:13:47.8017798Z 2026-02-21T09:13:47.8017920Z { 2026-02-21T09:13:47.8018042Z .reg .pred complete; 2026-02-21T09:13:47.8018197Z waitLoop: 2026-02-21T09:13:47.8018440Z mbarrier.try_wait.parity.shared.b64 complete, [%r272], %r650; 2026-02-21T09:13:47.8018693Z @!complete bra.uni waitLoop; 2026-02-21T09:13:47.8018845Z } 2026-02-21T09:13:47.8018923Z 2026-02-21T09:13:47.8018980Z // end inline asm 2026-02-21T09:13:47.8019118Z // begin inline asm 2026-02-21T09:13:47.8019510Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339}, [%r180 + 0]; 2026-02-21T09:13:47.8019920Z // end inline asm 2026-02-21T09:13:47.8020059Z // begin inline asm 2026-02-21T09:13:47.8020422Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356}, [%r180 + 16]; 2026-02-21T09:13:47.8020814Z // end inline asm 2026-02-21T09:13:47.8020959Z // begin inline asm 2026-02-21T09:13:47.8021307Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373}, [%r180 + 32]; 2026-02-21T09:13:47.8021712Z // end inline asm 2026-02-21T09:13:47.8021858Z // begin inline asm 2026-02-21T09:13:47.8022205Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r375, %r376, %r377, %r378, %r379, 
%r380, %r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390}, [%r180 + 48]; 2026-02-21T09:13:47.8022608Z // end inline asm 2026-02-21T09:13:47.8022741Z // begin inline asm 2026-02-21T09:13:47.8022903Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:13:47.8023068Z // end inline asm 2026-02-21T09:13:47.8023205Z bar.sync 0, 256; 2026-02-21T09:13:47.8023338Z // begin inline asm 2026-02-21T09:13:47.8023504Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r602]; 2026-02-21T09:13:47.8023699Z // end inline asm 2026-02-21T09:13:47.8023830Z cvt.u64.u32 %rd73, %r324; 2026-02-21T09:13:47.8023989Z cvt.u64.u32 %rd74, %r325; 2026-02-21T09:13:47.8024134Z shl.b64 %rd75, %rd74, 32; 2026-02-21T09:13:47.8024287Z or.b64 %rd76, %rd73, %rd75; 2026-02-21T09:13:47.8024543Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8024861Z mov.b64 {%r476, %r477}, %rd76; 2026-02-21T09:13:47.8025033Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T09:13:47.8025312Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8025597Z cvt.u64.u32 %rd77, %r326; 2026-02-21T09:13:47.8025744Z cvt.u64.u32 %rd78, %r327; 2026-02-21T09:13:47.8025895Z shl.b64 %rd79, %rd78, 32; 2026-02-21T09:13:47.8026070Z or.b64 %rd80, %rd77, %rd79; 2026-02-21T09:13:47.8026332Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8026610Z mov.b64 {%r479, %r480}, %rd80; 2026-02-21T09:13:47.8026817Z cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T09:13:47.8027097Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8027378Z cvt.u64.u32 %rd81, %r328; 2026-02-21T09:13:47.8027530Z cvt.u64.u32 %rd82, %r329; 2026-02-21T09:13:47.8027671Z shl.b64 %rd83, %rd82, 32; 2026-02-21T09:13:47.8027820Z or.b64 %rd84, %rd81, %rd83; 2026-02-21T09:13:47.8028074Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 
2026-02-21T09:13:47.8028359Z mov.b64 {%r482, %r483}, %rd84; 2026-02-21T09:13:47.8028525Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T09:13:47.8028819Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8029105Z cvt.u64.u32 %rd85, %r330; 2026-02-21T09:13:47.8029249Z cvt.u64.u32 %rd86, %r331; 2026-02-21T09:13:47.8029398Z shl.b64 %rd87, %rd86, 32; 2026-02-21T09:13:47.8029541Z or.b64 %rd88, %rd85, %rd87; 2026-02-21T09:13:47.8029802Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8030076Z mov.b64 {%r485, %r486}, %rd88; 2026-02-21T09:13:47.8030270Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T09:13:47.8030550Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8030822Z cvt.u64.u32 %rd89, %r332; 2026-02-21T09:13:47.8030975Z cvt.u64.u32 %rd90, %r333; 2026-02-21T09:13:47.8031117Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:13:47.8031267Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:13:47.8031513Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8031796Z mov.b64 {%r488, %r489}, %rd92; 2026-02-21T09:13:47.8031961Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T09:13:47.8032224Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8032509Z cvt.u64.u32 %rd93, %r334; 2026-02-21T09:13:47.8032651Z cvt.u64.u32 %rd94, %r335; 2026-02-21T09:13:47.8032800Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:13:47.8032942Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:13:47.8033208Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8033477Z mov.b64 {%r491, %r492}, %rd96; 2026-02-21T09:13:47.8033641Z cvt.rn.f16x2.f32 %r493, %r492, %r491; 2026-02-21T09:13:47.8033912Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8034186Z cvt.u64.u32 %rd97, %r336; 
2026-02-21T09:13:47.8034338Z cvt.u64.u32 %rd98, %r337; 2026-02-21T09:13:47.8034481Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:13:47.8034635Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:13:47.8034921Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8035220Z mov.b64 {%r494, %r495}, %rd100; 2026-02-21T09:13:47.8035389Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T09:13:47.8035664Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8035956Z cvt.u64.u32 %rd101, %r338; 2026-02-21T09:13:47.8036109Z cvt.u64.u32 %rd102, %r339; 2026-02-21T09:13:47.8036265Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:13:47.8036414Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:13:47.8036683Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8036968Z mov.b64 {%r497, %r498}, %rd104; 2026-02-21T09:13:47.8037136Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T09:13:47.8037448Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8037722Z cvt.u64.u32 %rd105, %r341; 2026-02-21T09:13:47.8037877Z cvt.u64.u32 %rd106, %r342; 2026-02-21T09:13:47.8038021Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:13:47.8038207Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:13:47.8038456Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8038736Z mov.b64 {%r500, %r501}, %rd108; 2026-02-21T09:13:47.8038903Z cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T09:13:47.8039172Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8039451Z cvt.u64.u32 %rd109, %r343; 2026-02-21T09:13:47.8039596Z cvt.u64.u32 %rd110, %r344; 2026-02-21T09:13:47.8039750Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:13:47.8039898Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:13:47.8040203Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 
2026-02-21T09:13:47.8040488Z mov.b64 {%r503, %r504}, %rd112; 2026-02-21T09:13:47.8040653Z cvt.rn.f16x2.f32 %r505, %r504, %r503; 2026-02-21T09:13:47.8040924Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8041203Z cvt.u64.u32 %rd113, %r345; 2026-02-21T09:13:47.8041361Z cvt.u64.u32 %rd114, %r346; 2026-02-21T09:13:47.8041539Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:13:47.8041700Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:13:47.8041960Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8042246Z mov.b64 {%r506, %r507}, %rd116; 2026-02-21T09:13:47.8042413Z cvt.rn.f16x2.f32 %r508, %r507, %r506; 2026-02-21T09:13:47.8042675Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8042952Z cvt.u64.u32 %rd117, %r347; 2026-02-21T09:13:47.8043098Z cvt.u64.u32 %rd118, %r348; 2026-02-21T09:13:47.8043251Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:13:47.8043398Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:13:47.8043652Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8043930Z mov.b64 {%r509, %r510}, %rd120; 2026-02-21T09:13:47.8044095Z cvt.rn.f16x2.f32 %r511, %r510, %r509; 2026-02-21T09:13:47.8044366Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8044635Z cvt.u64.u32 %rd121, %r349; 2026-02-21T09:13:47.8044822Z cvt.u64.u32 %rd122, %r350; 2026-02-21T09:13:47.8044969Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:13:47.8045126Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:13:47.8045382Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8045674Z mov.b64 {%r512, %r513}, %rd124; 2026-02-21T09:13:47.8045843Z cvt.rn.f16x2.f32 %r514, %r513, %r512; 2026-02-21T09:13:47.8046112Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8046394Z cvt.u64.u32 %rd125, 
%r351; 2026-02-21T09:13:47.8046545Z cvt.u64.u32 %rd126, %r352; 2026-02-21T09:13:47.8046699Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:13:47.8046850Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:13:47.8047115Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8047389Z mov.b64 {%r515, %r516}, %rd128; 2026-02-21T09:13:47.8047556Z cvt.rn.f16x2.f32 %r517, %r516, %r515; 2026-02-21T09:13:47.8047830Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8048106Z cvt.u64.u32 %rd129, %r353; 2026-02-21T09:13:47.8048259Z cvt.u64.u32 %rd130, %r354; 2026-02-21T09:13:47.8048407Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:13:47.8048589Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:13:47.8048837Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8049116Z mov.b64 {%r518, %r519}, %rd132; 2026-02-21T09:13:47.8049280Z cvt.rn.f16x2.f32 %r520, %r519, %r518; 2026-02-21T09:13:47.8049570Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8049852Z cvt.u64.u32 %rd133, %r355; 2026-02-21T09:13:47.8050000Z cvt.u64.u32 %rd134, %r356; 2026-02-21T09:13:47.8050153Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:13:47.8050302Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:13:47.8050560Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8050830Z mov.b64 {%r521, %r522}, %rd136; 2026-02-21T09:13:47.8050996Z cvt.rn.f16x2.f32 %r523, %r522, %r521; 2026-02-21T09:13:47.8051291Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8051562Z cvt.u64.u32 %rd137, %r358; 2026-02-21T09:13:47.8051716Z cvt.u64.u32 %rd138, %r359; 2026-02-21T09:13:47.8051864Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:13:47.8052025Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:13:47.8052288Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 
2026-02-21T09:13:47.8052567Z mov.b64 {%r524, %r525}, %rd140; 2026-02-21T09:13:47.8052759Z cvt.rn.f16x2.f32 %r526, %r525, %r524; 2026-02-21T09:13:47.8053023Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8053299Z cvt.u64.u32 %rd141, %r360; 2026-02-21T09:13:47.8053443Z cvt.u64.u32 %rd142, %r361; 2026-02-21T09:13:47.8053596Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:13:47.8053744Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:13:47.8054006Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8054276Z mov.b64 {%r527, %r528}, %rd144; 2026-02-21T09:13:47.8054443Z cvt.rn.f16x2.f32 %r529, %r528, %r527; 2026-02-21T09:13:47.8054740Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8055013Z cvt.u64.u32 %rd145, %r362; 2026-02-21T09:13:47.8055167Z cvt.u64.u32 %rd146, %r363; 2026-02-21T09:13:47.8055314Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:13:47.8055473Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:13:47.8055729Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8056016Z mov.b64 {%r530, %r531}, %rd148; 2026-02-21T09:13:47.8056181Z cvt.rn.f16x2.f32 %r532, %r531, %r530; 2026-02-21T09:13:47.8056447Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8056731Z cvt.u64.u32 %rd149, %r364; 2026-02-21T09:13:47.8056876Z cvt.u64.u32 %rd150, %r365; 2026-02-21T09:13:47.8057032Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:13:47.8057181Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:13:47.8057442Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8057716Z mov.b64 {%r533, %r534}, %rd152; 2026-02-21T09:13:47.8057881Z cvt.rn.f16x2.f32 %r535, %r534, %r533; 2026-02-21T09:13:47.8058152Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8058426Z cvt.u64.u32 %rd153, 
%r366; 2026-02-21T09:13:47.8058580Z cvt.u64.u32 %rd154, %r367; 2026-02-21T09:13:47.8058722Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:13:47.8058878Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:13:47.8059136Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8059449Z mov.b64 {%r536, %r537}, %rd156; 2026-02-21T09:13:47.8059614Z cvt.rn.f16x2.f32 %r538, %r537, %r536; 2026-02-21T09:13:47.8059916Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8060197Z cvt.u64.u32 %rd157, %r368; 2026-02-21T09:13:47.8060341Z cvt.u64.u32 %rd158, %r369; 2026-02-21T09:13:47.8060527Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:13:47.8060676Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:13:47.8060947Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8061237Z mov.b64 {%r539, %r540}, %rd160; 2026-02-21T09:13:47.8061413Z cvt.rn.f16x2.f32 %r541, %r540, %r539; 2026-02-21T09:13:47.8061707Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8061994Z cvt.u64.u32 %rd161, %r370; 2026-02-21T09:13:47.8062158Z cvt.u64.u32 %rd162, %r371; 2026-02-21T09:13:47.8062308Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:13:47.8062526Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:13:47.8062798Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8063100Z mov.b64 {%r542, %r543}, %rd164; 2026-02-21T09:13:47.8063282Z cvt.rn.f16x2.f32 %r544, %r543, %r542; 2026-02-21T09:13:47.8063566Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8063873Z cvt.u64.u32 %rd165, %r372; 2026-02-21T09:13:47.8064031Z cvt.u64.u32 %rd166, %r373; 2026-02-21T09:13:47.8064223Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:13:47.8064381Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:13:47.8064660Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 
2026-02-21T09:13:47.8064982Z mov.b64 {%r545, %r546}, %rd168; 2026-02-21T09:13:47.8065156Z cvt.rn.f16x2.f32 %r547, %r546, %r545; 2026-02-21T09:13:47.8065451Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8065740Z cvt.u64.u32 %rd169, %r375; 2026-02-21T09:13:47.8065902Z cvt.u64.u32 %rd170, %r376; 2026-02-21T09:13:47.8066054Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:13:47.8066217Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:13:47.8066490Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8066791Z mov.b64 {%r548, %r549}, %rd172; 2026-02-21T09:13:47.8066964Z cvt.rn.f16x2.f32 %r550, %r549, %r548; 2026-02-21T09:13:47.8067248Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8067545Z cvt.u64.u32 %rd173, %r377; 2026-02-21T09:13:47.8067699Z cvt.u64.u32 %rd174, %r378; 2026-02-21T09:13:47.8067861Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:13:47.8068018Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:13:47.8068298Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8068587Z mov.b64 {%r551, %r552}, %rd176; 2026-02-21T09:13:47.8068759Z cvt.rn.f16x2.f32 %r553, %r552, %r551; 2026-02-21T09:13:47.8069093Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8069374Z cvt.u64.u32 %rd177, %r379; 2026-02-21T09:13:47.8069531Z cvt.u64.u32 %rd178, %r380; 2026-02-21T09:13:47.8069675Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:13:47.8069830Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:13:47.8070088Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8070368Z mov.b64 {%r554, %r555}, %rd180; 2026-02-21T09:13:47.8070529Z cvt.rn.f16x2.f32 %r556, %r555, %r554; 2026-02-21T09:13:47.8070798Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8071079Z cvt.u64.u32 %rd181, 
%r381; 2026-02-21T09:13:47.8071224Z cvt.u64.u32 %rd182, %r382; 2026-02-21T09:13:47.8071377Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:13:47.8071556Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:13:47.8071814Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8072094Z mov.b64 {%r557, %r558}, %rd184; 2026-02-21T09:13:47.8072287Z cvt.rn.f16x2.f32 %r559, %r558, %r557; 2026-02-21T09:13:47.8072554Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8072825Z cvt.u64.u32 %rd185, %r383; 2026-02-21T09:13:47.8072977Z cvt.u64.u32 %rd186, %r384; 2026-02-21T09:13:47.8073123Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:13:47.8073279Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:13:47.8073529Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8073813Z mov.b64 {%r560, %r561}, %rd188; 2026-02-21T09:13:47.8073984Z cvt.rn.f16x2.f32 %r562, %r561, %r560; 2026-02-21T09:13:47.8074278Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8074559Z cvt.u64.u32 %rd189, %r385; 2026-02-21T09:13:47.8074732Z cvt.u64.u32 %rd190, %r386; 2026-02-21T09:13:47.8074886Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:13:47.8075038Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:13:47.8075298Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8075573Z mov.b64 {%r563, %r564}, %rd192; 2026-02-21T09:13:47.8075765Z cvt.rn.f16x2.f32 %r565, %r564, %r563; 2026-02-21T09:13:47.8076035Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8076302Z cvt.u64.u32 %rd193, %r387; 2026-02-21T09:13:47.8076456Z cvt.u64.u32 %rd194, %r388; 2026-02-21T09:13:47.8076601Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:13:47.8076760Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:13:47.8077015Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 
2026-02-21T09:13:47.8077296Z mov.b64 {%r566, %r567}, %rd196; 2026-02-21T09:13:47.8077463Z cvt.rn.f16x2.f32 %r568, %r567, %r566; 2026-02-21T09:13:47.8077723Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8077999Z cvt.u64.u32 %rd197, %r389; 2026-02-21T09:13:47.8078144Z cvt.u64.u32 %rd198, %r390; 2026-02-21T09:13:47.8078295Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:13:47.8078443Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:13:47.8078701Z .loc 1 55 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:55:27 2026-02-21T09:13:47.8078979Z mov.b64 {%r569, %r570}, %rd200; 2026-02-21T09:13:47.8079142Z cvt.rn.f16x2.f32 %r571, %r570, %r569; 2026-02-21T09:13:47.8079408Z .loc 1 56 45 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:56:45 2026-02-21T09:13:47.8079678Z shl.b32 %r572, %r467, 12; 2026-02-21T09:13:47.8079832Z shl.b32 %r573, %r468, 12; 2026-02-21T09:13:47.8079975Z shl.b32 %r574, %r469, 12; 2026-02-21T09:13:47.8080123Z shl.b32 %r575, %r470, 12; 2026-02-21T09:13:47.8080263Z shl.b32 %r576, %r471, 12; 2026-02-21T09:13:47.8080411Z shl.b32 %r577, %r472, 12; 2026-02-21T09:13:47.8080550Z shl.b32 %r578, %r473, 12; 2026-02-21T09:13:47.8080697Z shl.b32 %r579, %r474, 12; 2026-02-21T09:13:47.8080946Z .loc 1 56 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:56:52 2026-02-21T09:13:47.8081217Z add.s32 %r580, %r572, %r466; 2026-02-21T09:13:47.8081376Z add.s32 %r581, %r573, %r466; 2026-02-21T09:13:47.8081522Z add.s32 %r582, %r574, %r466; 2026-02-21T09:13:47.8081674Z add.s32 %r583, %r575, %r466; 2026-02-21T09:13:47.8081818Z add.s32 %r584, %r576, %r466; 2026-02-21T09:13:47.8081972Z add.s32 %r585, %r577, %r466; 2026-02-21T09:13:47.8082115Z add.s32 %r586, %r578, %r466; 2026-02-21T09:13:47.8082266Z add.s32 %r587, %r579, %r466; 2026-02-21T09:13:47.8082554Z .loc 1 56 24 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:56:24 2026-02-21T09:13:47.8082835Z mad.wide.s32 %rd65, %r580, 2, %rd5; 
2026-02-21T09:13:47.8083017Z mad.wide.s32 %rd66, %r581, 2, %rd5; 2026-02-21T09:13:47.8083216Z mad.wide.s32 %rd67, %r582, 2, %rd5; 2026-02-21T09:13:47.8083391Z mad.wide.s32 %rd68, %r583, 2, %rd5; 2026-02-21T09:13:47.8083553Z mad.wide.s32 %rd69, %r584, 2, %rd5; 2026-02-21T09:13:47.8083725Z mad.wide.s32 %rd70, %r585, 2, %rd5; 2026-02-21T09:13:47.8083889Z mad.wide.s32 %rd71, %r586, 2, %rd5; 2026-02-21T09:13:47.8084063Z mad.wide.s32 %rd72, %r587, 2, %rd5; 2026-02-21T09:13:47.8084334Z .loc 1 56 82 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:56:82 2026-02-21T09:13:47.8084605Z bar.sync 0, 256; 2026-02-21T09:13:47.8084817Z st.shared.v4.b32 [%r55], {%r478, %r490, %r502, %r514}; 2026-02-21T09:13:47.8085070Z st.shared.v4.b32 [%r56], {%r526, %r538, %r550, %r562}; 2026-02-21T09:13:47.8085270Z bar.sync 0, 256; 2026-02-21T09:13:47.8085403Z // begin inline asm 2026-02-21T09:13:47.8085634Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r433, %r437, %r441, %r445}, [%r397]; 2026-02-21T09:13:47.8085895Z // end inline asm 2026-02-21T09:13:47.8086024Z // begin inline asm 2026-02-21T09:13:47.8086252Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r449, %r453, %r457, %r461}, [%r402]; 2026-02-21T09:13:47.8086498Z // end inline asm 2026-02-21T09:13:47.8086638Z bar.sync 0, 256; 2026-02-21T09:13:47.8086825Z st.shared.v4.b32 [%r55], {%r481, %r493, %r505, %r517}; 2026-02-21T09:13:47.8087053Z st.shared.v4.b32 [%r56], {%r529, %r541, %r553, %r565}; 2026-02-21T09:13:47.8087237Z bar.sync 0, 256; 2026-02-21T09:13:47.8087376Z // begin inline asm 2026-02-21T09:13:47.8087592Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r434, %r438, %r442, %r446}, [%r397]; 2026-02-21T09:13:47.8087855Z // end inline asm 2026-02-21T09:13:47.8087991Z // begin inline asm 2026-02-21T09:13:47.8088208Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r450, %r454, %r458, %r462}, [%r402]; 2026-02-21T09:13:47.8088464Z // end inline asm 2026-02-21T09:13:47.8088589Z bar.sync 0, 256; 2026-02-21T09:13:47.8088754Z st.shared.v4.b32 
[%r55], {%r484, %r496, %r508, %r520}; 2026-02-21T09:13:47.8088970Z st.shared.v4.b32 [%r56], {%r532, %r544, %r556, %r568}; 2026-02-21T09:13:47.8089160Z bar.sync 0, 256; 2026-02-21T09:13:47.8089287Z // begin inline asm 2026-02-21T09:13:47.8089512Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r435, %r439, %r443, %r447}, [%r397]; 2026-02-21T09:13:47.8089767Z // end inline asm 2026-02-21T09:13:47.8089897Z // begin inline asm 2026-02-21T09:13:47.8090120Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r451, %r455, %r459, %r463}, [%r402]; 2026-02-21T09:13:47.8090369Z // end inline asm 2026-02-21T09:13:47.8090500Z bar.sync 0, 256; 2026-02-21T09:13:47.8090656Z st.shared.v4.b32 [%r55], {%r487, %r499, %r511, %r523}; 2026-02-21T09:13:47.8090878Z st.shared.v4.b32 [%r56], {%r535, %r547, %r559, %r571}; 2026-02-21T09:13:47.8091065Z bar.sync 0, 256; 2026-02-21T09:13:47.8091199Z // begin inline asm 2026-02-21T09:13:47.8091422Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r436, %r440, %r444, %r448}, [%r397]; 2026-02-21T09:13:47.8091672Z // end inline asm 2026-02-21T09:13:47.8091806Z // begin inline asm 2026-02-21T09:13:47.8092020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r452, %r456, %r460, %r464}, [%r402]; 2026-02-21T09:13:47.8092271Z // end inline asm 2026-02-21T09:13:47.8092400Z // begin inline asm 2026-02-21T09:13:47.8092584Z st.global.v4.b32 [ %rd65 + 0 ], { %r433, %r434, %r435, %r436 }; 2026-02-21T09:13:47.8092787Z // end inline asm 2026-02-21T09:13:47.8092921Z // begin inline asm 2026-02-21T09:13:47.8093100Z st.global.v4.b32 [ %rd66 + 0 ], { %r437, %r438, %r439, %r440 }; 2026-02-21T09:13:47.8093299Z // end inline asm 2026-02-21T09:13:47.8093434Z // begin inline asm 2026-02-21T09:13:47.8093605Z st.global.v4.b32 [ %rd67 + 0 ], { %r441, %r442, %r443, %r444 }; 2026-02-21T09:13:47.8093827Z // end inline asm 2026-02-21T09:13:47.8093983Z // begin inline asm 2026-02-21T09:13:47.8094159Z st.global.v4.b32 [ %rd68 + 0 ], { %r445, %r446, %r447, %r448 }; 2026-02-21T09:13:47.8094354Z // end inline 
asm 2026-02-21T09:13:47.8094488Z // begin inline asm 2026-02-21T09:13:47.8094664Z st.global.v4.b32 [ %rd69 + 0 ], { %r449, %r450, %r451, %r452 }; 2026-02-21T09:13:47.8094913Z // end inline asm 2026-02-21T09:13:47.8095047Z // begin inline asm 2026-02-21T09:13:47.8095215Z st.global.v4.b32 [ %rd70 + 0 ], { %r453, %r454, %r455, %r456 }; 2026-02-21T09:13:47.8095417Z // end inline asm 2026-02-21T09:13:47.8095544Z // begin inline asm 2026-02-21T09:13:47.8095715Z st.global.v4.b32 [ %rd71 + 0 ], { %r457, %r458, %r459, %r460 }; 2026-02-21T09:13:47.8095914Z // end inline asm 2026-02-21T09:13:47.8096050Z // begin inline asm 2026-02-21T09:13:47.8096217Z st.global.v4.b32 [ %rd72 + 0 ], { %r461, %r462, %r463, %r464 }; 2026-02-21T09:13:47.8096424Z // end inline asm 2026-02-21T09:13:47.8096587Z mov.b32 %r646, 1; 2026-02-21T09:13:47.8096767Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:47.8097103Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8097390Z xor.b32 %r650, %r646, %r650; 2026-02-21T09:13:47.8097662Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8097944Z add.s32 %r640, %r640, -1; 2026-02-21T09:13:47.8098133Z setp.ne.b32 %p110, %r640, 0; 2026-02-21T09:13:47.8098298Z @%p110 bra $L__BB0_18; 2026-02-21T09:13:47.8098440Z bra.uni $L__BB0_23; 2026-02-21T09:13:47.8098630Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:13:47.8098840Z add.s32 %r321, %r645, 1; 2026-02-21T09:13:47.8099000Z setp.eq.b32 %p106, %r645, 63; 2026-02-21T09:13:47.8099162Z selp.b32 %r645, 0, %r321, %p106; 2026-02-21T09:13:47.8099334Z setp.eq.b32 %p107, %r645, 63; 2026-02-21T09:13:47.8099490Z @%p107 bra $L__BB0_21; 2026-02-21T09:13:47.8099677Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:47.8099988Z .loc 1 0 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:97 2026-02-21T09:13:47.8100269Z mov.b32 %r646, 0; 2026-02-21T09:13:47.8100513Z .loc 1 28 97 
// cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8100793Z setp.ne.b32 %p108, %r645, 0; 2026-02-21T09:13:47.8100953Z @%p108 bra $L__BB0_22; 2026-02-21T09:13:47.8101109Z // %bb.20: // %.thread 2026-02-21T09:13:47.8101326Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:13:47.8101530Z add.s32 %r647, %r647, 296; 2026-02-21T09:13:47.8101791Z .loc 1 34 35 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:34:35 2026-02-21T09:13:47.8102079Z shr.s32 %r589, %r647, 31; 2026-02-21T09:13:47.8102227Z shr.u32 %r590, %r589, 25; 2026-02-21T09:13:47.8102383Z add.s32 %r591, %r647, %r590; 2026-02-21T09:13:47.8102532Z shr.s32 %r592, %r591, 7; 2026-02-21T09:13:47.8102791Z .loc 1 35 33 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:35:33 2026-02-21T09:13:47.8103067Z shl.b32 %r593, %r592, 4; 2026-02-21T09:13:47.8103330Z .loc 1 36 39 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:36:39 2026-02-21T09:13:47.8103611Z sub.s32 %r594, 64, %r593; 2026-02-21T09:13:47.8103871Z .loc 1 36 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:36:52 2026-02-21T09:13:47.8104194Z min.s32 %r595, %r594, 16; 2026-02-21T09:13:47.8104444Z .loc 1 37 45 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:45 2026-02-21T09:13:47.8104783Z and.b32 %r596, %r591, -128; 2026-02-21T09:13:47.8104951Z sub.s32 %r597, %r647, %r596; 2026-02-21T09:13:47.8105235Z .loc 1 38 51 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:38:51 2026-02-21T09:13:47.8105561Z div.s32 %r598, %r597, %r595; 2026-02-21T09:13:47.8105824Z .loc 1 37 64 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:64 2026-02-21T09:13:47.8106179Z mul.lo.s32 %r599, %r598, %r595; 2026-02-21T09:13:47.8106343Z sub.s32 %r600, %r597, %r599; 2026-02-21T09:13:47.8106615Z .loc 1 37 30 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:30 2026-02-21T09:13:47.8106899Z add.s32 %r601, %r600, %r593; 2026-02-21T09:13:47.8107171Z .loc 1 39 27 // 
cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:39:27 2026-02-21T09:13:47.8107466Z shl.b32 %r649, %r601, 6; 2026-02-21T09:13:47.8107726Z .loc 1 41 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:41:27 2026-02-21T09:13:47.8108024Z shl.b32 %r648, %r598, 8; 2026-02-21T09:13:47.8108310Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8108615Z bra.uni $L__BB0_22; 2026-02-21T09:13:47.8108808Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:13:47.8109144Z .loc 1 0 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:97 2026-02-21T09:13:47.8109456Z mov.b32 %r78, global_smem; 2026-02-21T09:13:47.8109622Z add.s32 %r79, %r78, %r3; 2026-02-21T09:13:47.8109785Z bra.uni $L__BB0_2; 2026-02-21T09:13:47.8110003Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8110338Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8110645Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8110836Z barrier.sync 1; 2026-02-21T09:13:47.8110980Z barrier.sync 1; 2026-02-21T09:13:47.8111138Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8111348Z $L__BB0_2: // %.preheader 2026-02-21T09:13:47.8111569Z // =>This Loop Header: Depth=1 2026-02-21T09:13:47.8111803Z // Child Loop BB0_11 Depth 2 2026-02-21T09:13:47.8112029Z // Child Loop BB0_7 Depth 2 2026-02-21T09:13:47.8112341Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19 2026-02-21T09:13:47.8112632Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:13:47.8112806Z barrier.sync 1; 2026-02-21T09:13:47.8112952Z ld.shared.b8 %r77, [%r79+172176]; 2026-02-21T09:13:47.8113119Z setp.gt.u32 %p4, %r77, 3; 2026-02-21T09:13:47.8113277Z @%p4 bra $L__BB0_4; 2026-02-21T09:13:47.8113435Z // %bb.3: // %.preheader 2026-02-21T09:13:47.8113654Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8113853Z $L_brx_0: .branchtargets 2026-02-21T09:13:47.8114007Z $L__BB0_5, 
2026-02-21T09:13:47.8114138Z $L__BB0_9, 2026-02-21T09:13:47.8114258Z $L__BB0_15, 2026-02-21T09:13:47.8114383Z $L__BB0_24; 2026-02-21T09:13:47.8114508Z brx.idx %r77, $L_brx_0; 2026-02-21T09:13:47.8114716Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8115037Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8115350Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8115547Z ld.shared.b32 %r128, [global_smem+163840]; 2026-02-21T09:13:47.8115755Z ld.shared.b32 %r624, [global_smem+163848]; 2026-02-21T09:13:47.8115936Z barrier.sync 1; 2026-02-21T09:13:47.8116075Z setp.lt.s32 %p17, %r624, 1; 2026-02-21T09:13:47.8116237Z @%p17 bra $L__BB0_8; 2026-02-21T09:13:47.8116392Z // %bb.6: // %.lr.ph7 2026-02-21T09:13:47.8116605Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8116907Z .loc 1 0 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:97 2026-02-21T09:13:47.8117213Z mov.b32 %r628, -1; 2026-02-21T09:13:47.8117354Z mov.pred %p130, 0; 2026-02-21T09:13:47.8117498Z mov.b32 %r625, 0; 2026-02-21T09:13:47.8117637Z mov.b32 %r626, %r625; 2026-02-21T09:13:47.8117811Z mov.b32 %r627, %r625; 2026-02-21T09:13:47.8117995Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:47.8118228Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:47.8118546Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8118825Z add.s32 %r138, %r628, 1; 2026-02-21T09:13:47.8118987Z setp.eq.b32 %p30, %r628, 63; 2026-02-21T09:13:47.8119146Z selp.b32 %r628, 0, %r138, %p30; 2026-02-21T09:13:47.8119313Z shl.b32 %r139, %r627, 3; 2026-02-21T09:13:47.8119465Z add.s32 %r141, %r78, %r139; 2026-02-21T09:13:47.8119648Z add.s32 %r142, %r141, 172032; 2026-02-21T09:13:47.8119811Z add.s32 %r126, %r141, 172096; 2026-02-21T09:13:47.8120062Z .loc 1 51 31 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:51:31 2026-02-21T09:13:47.8120340Z shl.b32 %r143, %r627, 14; 
2026-02-21T09:13:47.8120489Z add.s32 %r144, %r78, %r143; 2026-02-21T09:13:47.8120744Z .loc 1 52 44 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:52:44 2026-02-21T09:13:47.8121021Z shl.b32 %r145, %r627, 12; 2026-02-21T09:13:47.8121202Z add.s32 %r146, %r78, %r145; 2026-02-21T09:13:47.8121363Z add.s32 %r147, %r146, 131072; 2026-02-21T09:13:47.8121619Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8121901Z bar.warp.sync -1; 2026-02-21T09:13:47.8122045Z // begin inline asm 2026-02-21T09:13:47.8122187Z 2026-02-21T09:13:47.8122303Z { 2026-02-21T09:13:47.8122433Z .reg .pred complete; 2026-02-21T09:13:47.8122579Z waitLoop: 2026-02-21T09:13:47.8122773Z mbarrier.try_wait.parity.shared.b64 complete, [%r126], %r626; 2026-02-21T09:13:47.8123009Z @!complete bra.uni waitLoop; 2026-02-21T09:13:47.8123161Z } 2026-02-21T09:13:47.8123226Z 2026-02-21T09:13:47.8123290Z // end inline asm 2026-02-21T09:13:47.8123539Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8123830Z setp.eq.b32 %p29, %r628, 63; 2026-02-21T09:13:47.8124089Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8124380Z elect.sync %r148|%p20, -1; 2026-02-21T09:13:47.8124540Z bfe.u32 %r149, %r144, 4, 14; 2026-02-21T09:13:47.8124734Z cvt.u64.u32 %rd22, %r149; 2026-02-21T09:13:47.8124901Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:13:47.8125081Z bfe.u32 %r150, %r147, 4, 14; 2026-02-21T09:13:47.8125243Z cvt.u64.u32 %rd23, %r150; 2026-02-21T09:13:47.8125406Z or.b64 %rd13, %rd23, -9223371899399045120; 2026-02-21T09:13:47.8125593Z mov.b32 %r129, 135266320; 2026-02-21T09:13:47.8125737Z // begin inline asm 2026-02-21T09:13:47.8125968Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r128 + 0 ], %rd12, %rd13, %r129, %p130; 2026-02-21T09:13:47.8126222Z // end inline asm 2026-02-21T09:13:47.8126365Z add.s32 %r151, %r144, 32; 2026-02-21T09:13:47.8126528Z bfe.u32 %r152, 
%r151, 4, 14; 2026-02-21T09:13:47.8126676Z cvt.u64.u32 %rd24, %r152; 2026-02-21T09:13:47.8126838Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:13:47.8127013Z add.s32 %r153, %r146, 131104; 2026-02-21T09:13:47.8127171Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T09:13:47.8127319Z cvt.u64.u32 %rd25, %r154; 2026-02-21T09:13:47.8127481Z or.b64 %rd15, %rd25, -9223371899399045120; 2026-02-21T09:13:47.8127652Z mov.pred %p21, -1; 2026-02-21T09:13:47.8127798Z // begin inline asm 2026-02-21T09:13:47.8128017Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r128 + 0 ], %rd14, %rd15, %r129, %p21; 2026-02-21T09:13:47.8128259Z // end inline asm 2026-02-21T09:13:47.8128431Z add.s32 %r155, %r144, 8192; 2026-02-21T09:13:47.8128583Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:13:47.8128737Z cvt.u64.u32 %rd26, %r156; 2026-02-21T09:13:47.8128890Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:13:47.8129065Z // begin inline asm 2026-02-21T09:13:47.8129309Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r128 + 64 ], %rd16, %rd13, %r129, %p130; 2026-02-21T09:13:47.8129557Z // end inline asm 2026-02-21T09:13:47.8129695Z add.s32 %r157, %r144, 8224; 2026-02-21T09:13:47.8129844Z bfe.u32 %r158, %r157, 4, 14; 2026-02-21T09:13:47.8130001Z cvt.u64.u32 %rd27, %r158; 2026-02-21T09:13:47.8130153Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:13:47.8130327Z // begin inline asm 2026-02-21T09:13:47.8130533Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r128 + 64 ], %rd18, %rd15, %r129, %p21; 2026-02-21T09:13:47.8130778Z // end inline asm 2026-02-21T09:13:47.8130907Z cvt.u64.u32 %rd20, %r142; 2026-02-21T09:13:47.8131085Z // begin inline asm 2026-02-21T09:13:47.8131291Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:13:47.8131519Z // end inline asm 2026-02-21T09:13:47.8131663Z and.pred %p28, %p29, %p20; 2026-02-21T09:13:47.8131815Z add.s32 %r159, %r78, 172160; 2026-02-21T09:13:47.8131971Z cvt.u64.u32 %rd21, %r159; 2026-02-21T09:13:47.8132114Z // begin 
inline asm 2026-02-21T09:13:47.8132315Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:13:47.8132536Z // end inline asm 2026-02-21T09:13:47.8132805Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8133095Z setp.ne.b32 %p130, %r628, 63; 2026-02-21T09:13:47.8133364Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 2026-02-21T09:13:47.8133652Z selp.b32 %r160, 1, 0, %p29; 2026-02-21T09:13:47.8133806Z xor.b32 %r625, %r625, %r160; 2026-02-21T09:13:47.8133966Z add.s32 %r136, %r78, 172176; 2026-02-21T09:13:47.8134113Z // begin inline asm 2026-02-21T09:13:47.8134250Z 2026-02-21T09:13:47.8134361Z { 2026-02-21T09:13:47.8134502Z @!%p29 bra.uni skipWait; 2026-02-21T09:13:47.8134662Z .reg .pred complete; 2026-02-21T09:13:47.8134839Z waitLoop: 2026-02-21T09:13:47.8135035Z mbarrier.try_wait.parity.shared.b64 complete, [%r136], %r625; 2026-02-21T09:13:47.8135273Z @!complete bra.uni waitLoop; 2026-02-21T09:13:47.8135437Z skipWait: 2026-02-21T09:13:47.8135558Z } 2026-02-21T09:13:47.8135631Z 2026-02-21T09:13:47.8135688Z // end inline asm 2026-02-21T09:13:47.8135926Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8136215Z add.s32 %r161, %r627, 1; 2026-02-21T09:13:47.8136380Z setp.eq.b32 %p31, %r161, 8; 2026-02-21T09:13:47.8136544Z selp.b32 %r627, 0, %r161, %p31; 2026-02-21T09:13:47.8136718Z selp.b32 %r162, 1, 0, %p31; 2026-02-21T09:13:47.8136874Z xor.b32 %r626, %r626, %r162; 2026-02-21T09:13:47.8137153Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8137445Z add.s32 %r624, %r624, -1; 2026-02-21T09:13:47.8137611Z setp.ne.b32 %p32, %r624, 0; 2026-02-21T09:13:47.8137770Z @%p32 bra $L__BB0_7; 2026-02-21T09:13:47.8137950Z $L__BB0_8: // %._crit_edge8 2026-02-21T09:13:47.8138184Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8138387Z barrier.sync 1; 2026-02-21T09:13:47.8138556Z 
setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8138741Z bra.uni $L__BB0_2; 2026-02-21T09:13:47.8138929Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8139247Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8139555Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8139767Z ld.shared.b32 %r629, [global_smem+163848]; 2026-02-21T09:13:47.8139950Z barrier.sync 1; 2026-02-21T09:13:47.8140224Z .loc 1 21 67 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:21:67 2026-02-21T09:13:47.8140503Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:13:47.8140656Z mov.u32 %r80, %ctaid.y; 2026-02-21T09:13:47.8140799Z mov.u32 %r81, %ctaid.z; 2026-02-21T09:13:47.8140981Z mov.u32 %r82, %nctaid.x; 2026-02-21T09:13:47.8141127Z mov.u32 %r83, %nctaid.y; 2026-02-21T09:13:47.8141287Z mad.lo.s32 %r84, %r81, %r83, %r80; 2026-02-21T09:13:47.8141460Z mad.lo.s32 %r85, %r84, %r82, %r17; 2026-02-21T09:13:47.8141631Z shl.b32 %r86, %r85, 8; 2026-02-21T09:13:47.8141883Z .loc 1 22 67 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:22:67 2026-02-21T09:13:47.8142163Z cvt.s64.s32 %rd7, %r86; 2026-02-21T09:13:47.8142317Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:13:47.8142374Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:13:47.8142436Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:13:47.8142632Z .loc 1 21 67 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:21:67 2026-02-21T09:13:47.8142698Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:13:47.8142858Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8142928Z setp.lt.s32 %p5, %r629, 1; 2026-02-21T09:13:47.8142984Z @%p5 bra $L__BB0_14; 2026-02-21T09:13:47.8143056Z // %bb.10: // %.lr.ph 2026-02-21T09:13:47.8143170Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8143238Z add.s32 %r639, %r17, -296; 2026-02-21T09:13:47.8143297Z add.s32 %r19, %r1, -256; 2026-02-21T09:13:47.8143351Z mov.b32 %r636, -1; 
2026-02-21T09:13:47.8143411Z mov.b32 %r630, 0; 2026-02-21T09:13:47.8143465Z mov.b32 %r631, %r630; 2026-02-21T09:13:47.8143519Z mov.b32 %r638, %r630; 2026-02-21T09:13:47.8143573Z mov.b32 %r637, %r630; 2026-02-21T09:13:47.8143637Z mov.b32 %r634, %r630; 2026-02-21T09:13:47.8143695Z bra.uni $L__BB0_11; 2026-02-21T09:13:47.8143795Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:47.8143965Z .loc 1 0 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0:97 2026-02-21T09:13:47.8144030Z selp.b32 %r107, 0, %r634, %p8; 2026-02-21T09:13:47.8144093Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:13:47.8144159Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:13:47.8144326Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8144383Z shl.b32 %r114, %r631, 3; 2026-02-21T09:13:47.8144439Z add.s32 %r116, %r78, %r114; 2026-02-21T09:13:47.8144505Z add.s32 %r103, %r116, 172032; 2026-02-21T09:13:47.8144665Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8144750Z // begin inline asm 2026-02-21T09:13:47.8144806Z 2026-02-21T09:13:47.8144853Z { 2026-02-21T09:13:47.8144914Z .reg .pred complete; 2026-02-21T09:13:47.8144968Z waitLoop: 2026-02-21T09:13:47.8145090Z mbarrier.try_wait.parity.shared.b64 complete, [%r103], %r630; 2026-02-21T09:13:47.8145150Z @!complete bra.uni waitLoop; 2026-02-21T09:13:47.8145198Z } 2026-02-21T09:13:47.8145201Z 2026-02-21T09:13:47.8145264Z // end inline asm 2026-02-21T09:13:47.8145431Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8145488Z add.s32 %r109, %r116, 172096; 2026-02-21T09:13:47.8145655Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8145710Z bar.sync 3, 64; 2026-02-21T09:13:47.8145764Z // begin inline asm 2026-02-21T09:13:47.8145874Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r109], 20480; 2026-02-21T09:13:47.8145934Z // end inline asm 
2026-02-21T09:13:47.8146098Z .loc 1 51 31 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:51:31 2026-02-21T09:13:47.8146154Z shl.b32 %r117, %r631, 14; 2026-02-21T09:13:47.8146279Z add.s32 %r106, %r78, %r117; 2026-02-21T09:13:47.8146430Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8146485Z bar.sync 3, 64; 2026-02-21T09:13:47.8146554Z elect.sync %r118|%p13, -1; 2026-02-21T09:13:47.8146641Z and.pred %p10, %p12, %p13; 2026-02-21T09:13:47.8146696Z // begin inline asm 2026-02-21T09:13:47.8146941Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r106], [%rd10, {%r107, %r638}], [%r109]; 2026-02-21T09:13:47.8147003Z // end inline asm 2026-02-21T09:13:47.8147170Z .loc 1 52 44 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:52:44 2026-02-21T09:13:47.8147225Z shl.b32 %r119, %r631, 12; 2026-02-21T09:13:47.8147289Z add.s32 %r120, %r78, %r119; 2026-02-21T09:13:47.8147347Z add.s32 %r110, %r120, 131072; 2026-02-21T09:13:47.8147523Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8147589Z bar.sync 3, 64; 2026-02-21T09:13:47.8147650Z elect.sync %r121|%p14, -1; 2026-02-21T09:13:47.8147709Z and.pred %p11, %p12, %p14; 2026-02-21T09:13:47.8147764Z // begin inline asm 2026-02-21T09:13:47.8148012Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r110], [%rd11, {%r107, %r637}], [%r109]; 2026-02-21T09:13:47.8148069Z // end inline asm 2026-02-21T09:13:47.8148257Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8148322Z add.s32 %r634, %r107, 32; 2026-02-21T09:13:47.8148478Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8148534Z add.s32 %r122, %r631, 1; 2026-02-21T09:13:47.8148601Z setp.eq.b32 %p15, %r122, 8; 2026-02-21T09:13:47.8148662Z selp.b32 %r631, 0, %r122, %p15; 2026-02-21T09:13:47.8148721Z selp.b32 %r123, 1, 0, 
%p15; 2026-02-21T09:13:47.8148788Z xor.b32 %r630, %r630, %r123; 2026-02-21T09:13:47.8148962Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8149021Z add.s32 %r629, %r629, -1; 2026-02-21T09:13:47.8149082Z setp.ne.b32 %p16, %r629, 0; 2026-02-21T09:13:47.8149151Z @%p16 bra $L__BB0_11; 2026-02-21T09:13:47.8149208Z bra.uni $L__BB0_14; 2026-02-21T09:13:47.8149310Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:13:47.8149416Z // => This Inner Loop Header: Depth=2 2026-02-21T09:13:47.8149476Z add.s32 %r89, %r636, 1; 2026-02-21T09:13:47.8149538Z setp.eq.b32 %p6, %r636, 63; 2026-02-21T09:13:47.8149600Z selp.b32 %r636, 0, %r89, %p6; 2026-02-21T09:13:47.8149668Z setp.ne.b32 %p7, %r636, 0; 2026-02-21T09:13:47.8149729Z setp.eq.b32 %p8, %r636, 0; 2026-02-21T09:13:47.8149787Z @%p7 bra $L__BB0_13; 2026-02-21T09:13:47.8149892Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:13:47.8149953Z add.s32 %r639, %r639, 296; 2026-02-21T09:13:47.8150125Z .loc 1 34 35 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:34:35 2026-02-21T09:13:47.8150190Z shr.s32 %r90, %r639, 31; 2026-02-21T09:13:47.8150249Z shr.u32 %r91, %r90, 25; 2026-02-21T09:13:47.8150307Z add.s32 %r92, %r639, %r91; 2026-02-21T09:13:47.8150363Z shr.s32 %r93, %r92, 7; 2026-02-21T09:13:47.8150543Z .loc 1 35 33 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:35:33 2026-02-21T09:13:47.8150600Z shl.b32 %r94, %r93, 4; 2026-02-21T09:13:47.8150770Z .loc 1 36 39 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:36:39 2026-02-21T09:13:47.8150834Z sub.s32 %r95, 64, %r94; 2026-02-21T09:13:47.8151003Z .loc 1 36 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:36:52 2026-02-21T09:13:47.8151062Z min.s32 %r96, %r95, 16; 2026-02-21T09:13:47.8151244Z .loc 1 37 45 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:45 2026-02-21T09:13:47.8151335Z and.b32 %r97, %r92, -128; 2026-02-21T09:13:47.8151393Z sub.s32 %r98, %r639, %r97; 
2026-02-21T09:13:47.8151564Z .loc 1 38 51 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:38:51 2026-02-21T09:13:47.8151654Z div.s32 %r99, %r98, %r96; 2026-02-21T09:13:47.8151828Z .loc 1 37 64 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:64 2026-02-21T09:13:47.8151890Z mul.lo.s32 %r100, %r99, %r96; 2026-02-21T09:13:47.8151955Z sub.s32 %r101, %r98, %r100; 2026-02-21T09:13:47.8152122Z .loc 1 37 30 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:37:30 2026-02-21T09:13:47.8152180Z add.s32 %r102, %r101, %r94; 2026-02-21T09:13:47.8152354Z .loc 1 39 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:39:27 2026-02-21T09:13:47.8152434Z shl.b32 %r637, %r102, 6; 2026-02-21T09:13:47.8152609Z .loc 1 41 27 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:41:27 2026-02-21T09:13:47.8152675Z shl.b32 %r638, %r99, 8; 2026-02-21T09:13:47.8152733Z bra.uni $L__BB0_13; 2026-02-21T09:13:47.8152819Z $L__BB0_14: // %._crit_edge 2026-02-21T09:13:47.8152914Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8153115Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8153175Z barrier.sync 1; 2026-02-21T09:13:47.8153256Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:13:47.8153324Z bra.uni $L__BB0_2; 2026-02-21T09:13:47.8153422Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:13:47.8153584Z .loc 1 19 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:19 2026-02-21T09:13:47.8153651Z barrier.sync 1; 2026-02-21T09:13:47.8153709Z barrier.sync 1; 2026-02-21T09:13:47.8153766Z bra.uni $L__BB0_2; 2026-02-21T09:13:47.8153850Z $L__BB0_23: // %._crit_edge11 2026-02-21T09:13:47.8154019Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8154077Z barrier.sync 1; 2026-02-21T09:13:47.8154154Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:13:47.8154328Z .loc 1 53 52 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:53:52 
2026-02-21T09:13:47.8154384Z bar.sync 0, 256; 2026-02-21T09:13:47.8154440Z // begin inline asm 2026-02-21T09:13:47.8154497Z 2026-02-21T09:13:47.8154548Z { 2026-02-21T09:13:47.8154609Z .reg .pred complete; 2026-02-21T09:13:47.8154662Z waitLoop: 2026-02-21T09:13:47.8154839Z mbarrier.try_wait.parity.shared.b64 complete, [%r602], %r650; 2026-02-21T09:13:47.8154906Z @!complete bra.uni waitLoop; 2026-02-21T09:13:47.8154955Z } 2026-02-21T09:13:47.8154960Z 2026-02-21T09:13:47.8155025Z // end inline asm 2026-02-21T09:13:47.8155193Z .loc 1 28 97 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:97 2026-02-21T09:13:47.8155248Z bar.sync 0, 256; 2026-02-21T09:13:47.8155305Z // begin inline asm 2026-02-21T09:13:47.8155404Z @%p111 mbarrier.inval.shared::cta.b64 [%r602]; 2026-02-21T09:13:47.8155460Z // end inline asm 2026-02-21T09:13:47.8155514Z // begin inline asm 2026-02-21T09:13:47.8155608Z @%p111 mbarrier.inval.shared::cta.b64 [%r272]; 2026-02-21T09:13:47.8155666Z // end inline asm 2026-02-21T09:13:47.8155722Z // begin inline asm 2026-02-21T09:13:47.8155812Z @%p111 mbarrier.inval.shared::cta.b64 [%r256]; 2026-02-21T09:13:47.8155867Z // end inline asm 2026-02-21T09:13:47.8155923Z bar.sync 0, 256; 2026-02-21T09:13:47.8155978Z // begin inline asm 2026-02-21T09:13:47.8156066Z @%p111 mbarrier.inval.shared::cta.b64 [%r257]; 2026-02-21T09:13:47.8156131Z // end inline asm 2026-02-21T09:13:47.8156186Z bar.sync 0, 256; 2026-02-21T09:13:47.8156276Z // begin inline asm 2026-02-21T09:13:47.8156350Z @%p111 mbarrier.inval.shared::cta.b64 [%r258]; 2026-02-21T09:13:47.8156402Z // end inline asm 2026-02-21T09:13:47.8156454Z bar.sync 0, 256; 2026-02-21T09:13:47.8156515Z // begin inline asm 2026-02-21T09:13:47.8156615Z @%p111 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:13:47.8156666Z // end inline asm 2026-02-21T09:13:47.8156725Z bar.sync 0, 256; 2026-02-21T09:13:47.8156779Z // begin inline asm 2026-02-21T09:13:47.8156854Z @%p111 mbarrier.inval.shared::cta.b64 [%r260]; 
2026-02-21T09:13:47.8156906Z // end inline asm 2026-02-21T09:13:47.8156965Z bar.sync 0, 256; 2026-02-21T09:13:47.8157018Z // begin inline asm 2026-02-21T09:13:47.8157091Z @%p111 mbarrier.inval.shared::cta.b64 [%r261]; 2026-02-21T09:13:47.8157148Z // end inline asm 2026-02-21T09:13:47.8157199Z bar.sync 0, 256; 2026-02-21T09:13:47.8157251Z // begin inline asm 2026-02-21T09:13:47.8157353Z @%p111 mbarrier.inval.shared::cta.b64 [%r262]; 2026-02-21T09:13:47.8157406Z // end inline asm 2026-02-21T09:13:47.8157458Z bar.sync 0, 256; 2026-02-21T09:13:47.8157510Z // begin inline asm 2026-02-21T09:13:47.8157590Z @%p111 mbarrier.inval.shared::cta.b64 [%r263]; 2026-02-21T09:13:47.8157641Z // end inline asm 2026-02-21T09:13:47.8157696Z // begin inline asm 2026-02-21T09:13:47.8157776Z @%p111 mbarrier.inval.shared::cta.b64 [%r248]; 2026-02-21T09:13:47.8157827Z // end inline asm 2026-02-21T09:13:47.8157877Z bar.sync 0, 256; 2026-02-21T09:13:47.8157953Z // begin inline asm 2026-02-21T09:13:47.8158034Z @%p111 mbarrier.inval.shared::cta.b64 [%r249]; 2026-02-21T09:13:47.8158085Z // end inline asm 2026-02-21T09:13:47.8158136Z bar.sync 0, 256; 2026-02-21T09:13:47.8158196Z // begin inline asm 2026-02-21T09:13:47.8158269Z @%p111 mbarrier.inval.shared::cta.b64 [%r250]; 2026-02-21T09:13:47.8158321Z // end inline asm 2026-02-21T09:13:47.8158372Z bar.sync 0, 256; 2026-02-21T09:13:47.8158432Z // begin inline asm 2026-02-21T09:13:47.8158508Z @%p111 mbarrier.inval.shared::cta.b64 [%r251]; 2026-02-21T09:13:47.8158561Z // end inline asm 2026-02-21T09:13:47.8158621Z bar.sync 0, 256; 2026-02-21T09:13:47.8158676Z // begin inline asm 2026-02-21T09:13:47.8158750Z @%p111 mbarrier.inval.shared::cta.b64 [%r252]; 2026-02-21T09:13:47.8158811Z // end inline asm 2026-02-21T09:13:47.8158863Z bar.sync 0, 256; 2026-02-21T09:13:47.8158915Z // begin inline asm 2026-02-21T09:13:47.8158990Z @%p111 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T09:13:47.8159048Z // end inline asm 2026-02-21T09:13:47.8159101Z 
bar.sync 0, 256; 2026-02-21T09:13:47.8159154Z // begin inline asm 2026-02-21T09:13:47.8159233Z @%p111 mbarrier.inval.shared::cta.b64 [%r254]; 2026-02-21T09:13:47.8159286Z // end inline asm 2026-02-21T09:13:47.8159337Z bar.sync 0, 256; 2026-02-21T09:13:47.8159391Z // begin inline asm 2026-02-21T09:13:47.8159472Z @%p111 mbarrier.inval.shared::cta.b64 [%r255]; 2026-02-21T09:13:47.8159523Z // end inline asm 2026-02-21T09:13:47.8159690Z .loc 1 28 4 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:28:4 2026-02-21T09:13:47.8159752Z bar.sync 0, 256; 2026-02-21T09:13:47.8159807Z // begin inline asm 2026-02-21T09:13:47.8159922Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r622, 128; 2026-02-21T09:13:47.8159985Z // end inline asm 2026-02-21T09:13:47.8160062Z st.shared.b32 [global_smem+172184], 50529027; 2026-02-21T09:13:47.8160119Z barrier.sync 1; 2026-02-21T09:13:47.8160201Z $L__BB0_24: // %common.ret 2026-02-21T09:13:47.8160371Z .loc 1 0 0 // cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py:0 2026-02-21T09:13:47.8160422Z ret; 2026-02-21T09:13:47.8160476Z $L__tmp1: 2026-02-21T09:13:47.8160537Z $L__func_end0: 2026-02-21T09:13:47.8160617Z // -- End function 2026-02-21T09:13:47.8160664Z } 2026-02-21T09:13:47.8160868Z .file 1 "/tmp/torchinductor_root/gj/cgjdw4vttjogrk4m5kf6d3hyd7frbd6cutraw2o3qyxadgenqltk.py" 2026-02-21T09:13:47.8160958Z .section .debug_abbrev 2026-02-21T09:13:47.8161006Z { 2026-02-21T09:13:47.8161091Z .b8 1 // Abbreviation Code 2026-02-21T09:13:47.8161182Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:13:47.8161278Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:13:47.8161353Z .b8 37 // DW_AT_producer 2026-02-21T09:13:47.8161432Z .b8 8 // DW_FORM_string 2026-02-21T09:13:47.8161504Z .b8 19 // DW_AT_language 2026-02-21T09:13:47.8161579Z .b8 5 // DW_FORM_data2 2026-02-21T09:13:47.8161650Z .b8 3 // DW_AT_name 2026-02-21T09:13:47.8161727Z .b8 8 // DW_FORM_string 2026-02-21T09:13:47.8161803Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:13:47.8161895Z .b8 6 
// DW_FORM_data4 2026-02-21T09:13:47.8161975Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:13:47.8162045Z .b8 8 // DW_FORM_string 2026-02-21T09:13:47.8162112Z .b8 0 // EOM(1) 2026-02-21T09:13:47.8162187Z .b8 0 // EOM(2) 2026-02-21T09:13:47.8162250Z .b8 0 // EOM(3) 2026-02-21T09:13:47.8162297Z } 2026-02-21T09:13:47.8162353Z .section .debug_info 2026-02-21T09:13:47.8162443Z { 2026-02-21T09:13:47.8162522Z .b32 104 // Length of Unit 2026-02-21T09:13:47.8162606Z .b8 2 // DWARF version number 2026-02-21T09:13:47.8162663Z .b8 0 2026-02-21T09:13:47.8162775Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:13:47.8162861Z .b8 8 // Address Size (in bytes) 2026-02-21T09:13:47.8162966Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:13:47.8163045Z .b8 116 // DW_AT_producer 2026-02-21T09:13:47.8163096Z .b8 114 2026-02-21T09:13:47.8163146Z .b8 105 2026-02-21T09:13:47.8163203Z .b8 116 2026-02-21T09:13:47.8163253Z .b8 111 2026-02-21T09:13:47.8163304Z .b8 110 2026-02-21T09:13:47.8163360Z .b8 0 2026-02-21T09:13:47.8163430Z .b8 2 // DW_AT_language 2026-02-21T09:13:47.8163477Z .b8 0 2026-02-21T09:13:47.8163549Z .b8 99 // DW_AT_name 2026-02-21T09:13:47.8163606Z .b8 103 2026-02-21T09:13:47.8163652Z .b8 106 2026-02-21T09:13:47.8163700Z .b8 100 2026-02-21T09:13:47.8163754Z .b8 119 2026-02-21T09:13:47.8163802Z .b8 52 2026-02-21T09:13:47.8163851Z .b8 118 2026-02-21T09:13:47.8163899Z .b8 116 2026-02-21T09:13:47.8163955Z .b8 116 2026-02-21T09:13:47.8164002Z .b8 106 2026-02-21T09:13:47.8164050Z .b8 111 2026-02-21T09:13:47.8164106Z .b8 103 2026-02-21T09:13:47.8164153Z .b8 114 2026-02-21T09:13:47.8164200Z .b8 107 2026-02-21T09:13:47.8164251Z .b8 52 2026-02-21T09:13:47.8164306Z .b8 109 2026-02-21T09:13:47.8164353Z .b8 53 2026-02-21T09:13:47.8164401Z .b8 107 2026-02-21T09:13:47.8164449Z .b8 102 2026-02-21T09:13:47.8164503Z .b8 54 2026-02-21T09:13:47.8164550Z .b8 100 2026-02-21T09:13:47.8164598Z .b8 51 2026-02-21T09:13:47.8164652Z .b8 104 2026-02-21T09:13:47.8164735Z .b8 121 
2026-02-21T09:13:47.8164784Z .b8 100 2026-02-21T09:13:47.8164830Z .b8 55 2026-02-21T09:13:47.8164885Z .b8 102 2026-02-21T09:13:47.8164932Z .b8 114 2026-02-21T09:13:47.8164980Z .b8 98 2026-02-21T09:13:47.8165035Z .b8 100 2026-02-21T09:13:47.8165083Z .b8 54 2026-02-21T09:13:47.8165131Z .b8 99 2026-02-21T09:13:47.8165180Z .b8 117 2026-02-21T09:13:47.8165236Z .b8 116 2026-02-21T09:13:47.8165284Z .b8 114 2026-02-21T09:13:47.8165332Z .b8 97 2026-02-21T09:13:47.8165388Z .b8 119 2026-02-21T09:13:47.8165436Z .b8 50 2026-02-21T09:13:47.8165485Z .b8 111 2026-02-21T09:13:47.8165533Z .b8 51 2026-02-21T09:13:47.8165590Z .b8 113 2026-02-21T09:13:47.8165640Z .b8 121 2026-02-21T09:13:47.8165722Z .b8 120 2026-02-21T09:13:47.8165771Z .b8 97 2026-02-21T09:13:47.8165830Z .b8 100 2026-02-21T09:13:47.8165879Z .b8 103 2026-02-21T09:13:47.8165927Z .b8 101 2026-02-21T09:13:47.8165983Z .b8 110 2026-02-21T09:13:47.8166030Z .b8 113 2026-02-21T09:13:47.8166104Z .b8 108 2026-02-21T09:13:47.8166154Z .b8 116 2026-02-21T09:13:47.8166213Z .b8 107 2026-02-21T09:13:47.8166262Z .b8 46 2026-02-21T09:13:47.8166312Z .b8 112 2026-02-21T09:13:47.8166370Z .b8 121 2026-02-21T09:13:47.8166426Z .b8 0 2026-02-21T09:13:47.8166518Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:13:47.8166590Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:13:47.8166645Z .b8 116 2026-02-21T09:13:47.8166693Z .b8 109 2026-02-21T09:13:47.8166742Z .b8 112 2026-02-21T09:13:47.8166795Z .b8 47 2026-02-21T09:13:47.8166845Z .b8 116 2026-02-21T09:13:47.8166892Z .b8 111 2026-02-21T09:13:47.8166938Z .b8 114 2026-02-21T09:13:47.8166993Z .b8 99 2026-02-21T09:13:47.8167069Z .b8 104 2026-02-21T09:13:47.8167120Z .b8 105 2026-02-21T09:13:47.8167168Z .b8 110 2026-02-21T09:13:47.8167223Z .b8 100 2026-02-21T09:13:47.8167270Z .b8 117 2026-02-21T09:13:47.8167318Z .b8 99 2026-02-21T09:13:47.8167371Z .b8 116 2026-02-21T09:13:47.8167418Z .b8 111 2026-02-21T09:13:47.8167467Z .b8 114 2026-02-21T09:13:47.8167513Z .b8 95 2026-02-21T09:13:47.8167569Z .b8 114 
2026-02-21T09:13:47.8167618Z .b8 111 2026-02-21T09:13:47.8167665Z .b8 111 2026-02-21T09:13:47.8167719Z .b8 116 2026-02-21T09:13:47.8167766Z .b8 47 2026-02-21T09:13:47.8167841Z .b8 103 2026-02-21T09:13:47.8167890Z .b8 106 2026-02-21T09:13:47.8167946Z .b8 0 2026-02-21T09:13:47.8167993Z } 2026-02-21T09:13:47.8168057Z .section .debug_macinfo { } 2026-02-21T09:13:47.8168061Z 2026-02-21T09:13:47.8168143Z ================================================================ 2026-02-21T09:13:47.8168245Z please share the reproducer above with Triton project. 2026-02-21T09:13:48.7340443Z 2026-02-21T09:13:48.7344891Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 67/67 19.5 configs/s 2026-02-21T09:13:50.7109538Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 508.4 2026-02-21T09:13:50.7113606Z configs/s 2026-02-21T09:13:50.8317927Z [167s] Generation 6 complete: 2026-02-21T09:13:50.8319785Z error=17 2026-02-21T09:13:50.8319947Z timeout=2 2026-02-21T09:13:50.8320071Z ok=50 2026-02-21T09:13:50.8320198Z min=0.0369 2026-02-21T09:13:50.8320320Z mid=0.0655 2026-02-21T09:13:50.8320463Z max=12.5501 2026-02-21T09:13:50.8320604Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:13:50.8320830Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:13:50.8321028Z 'l2_groupings': [1], 2026-02-21T09:13:50.8321194Z 'load_eviction_policies': ['', ''], 2026-02-21T09:13:50.8321371Z 'loop_orders': [[0, 1]], 2026-02-21T09:13:50.8321531Z 'num_stages': 4, 2026-02-21T09:13:50.8321668Z 'num_warps': 4, 2026-02-21T09:13:50.8321824Z 'pid_type': 'flat', 2026-02-21T09:13:50.8321995Z 'range_flattens': [None, True], 2026-02-21T09:13:50.8322170Z 'range_multi_buffers': [None, None], 2026-02-21T09:13:50.8322354Z 'range_num_stages': [0, 0], 2026-02-21T09:13:50.8322513Z 'range_unroll_factors': [0, 0], 2026-02-21T09:13:50.8322693Z 'range_warp_specializes': [None, True]} 2026-02-21T09:13:50.8341014Z [167s] Fitting surrogate: 580 points, 580 targets 2026-02-21T09:13:51.7367001Z 
[168s] Generation 7 starting: 43 neighbors, 3 active search path(s) 2026-02-21T09:13:58.8539660Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44/44 3.4 configs/s 2026-02-21T09:14:00.8083198Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 44/44 23.2 configs/s 2026-02-21T09:14:02.2196585Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 711.1 2026-02-21T09:14:02.2197116Z configs/s 2026-02-21T09:14:02.3175160Z [178s] Generation 7 complete: 2026-02-21T09:14:02.3176810Z error=14 2026-02-21T09:14:02.3177257Z ok=32 2026-02-21T09:14:02.3177386Z min=0.0370 2026-02-21T09:14:02.3177509Z mid=0.0512 2026-02-21T09:14:02.3177636Z max=13.2465 2026-02-21T09:14:02.3177776Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:14:02.3178027Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:14:02.3182274Z 'l2_groupings': [1], 2026-02-21T09:14:02.3187595Z 'load_eviction_policies': ['', ''], 2026-02-21T09:14:02.3189810Z 'loop_orders': [[0, 1]], 2026-02-21T09:14:02.3190003Z 'num_stages': 4, 2026-02-21T09:14:02.3190521Z 'num_warps': 4, 2026-02-21T09:14:02.3197338Z 'pid_type': 'flat', 2026-02-21T09:14:02.3197591Z 'range_flattens': [None, False], 2026-02-21T09:14:02.3197784Z 'range_multi_buffers': [None, None], 2026-02-21T09:14:02.3197982Z 'range_num_stages': [0, 0], 2026-02-21T09:14:02.3198149Z 'range_unroll_factors': [0, 0], 2026-02-21T09:14:02.3198338Z 'range_warp_specializes': [None, True]} 2026-02-21T09:14:02.3198790Z [178s] Fitting surrogate: 626 points, 626 targets 2026-02-21T09:14:03.1073616Z [179s] Generation 8 starting: 45 neighbors, 3 active search path(s) 2026-02-21T09:14:31.0975491Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46/46 0.5 configs/s 2026-02-21T09:14:32.9972998Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 46/46 25.0 configs/s 2026-02-21T09:14:34.7585212Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 715.5 2026-02-21T09:14:34.7585699Z configs/s 
2026-02-21T09:14:34.8624351Z [211s] Generation 8 complete: 2026-02-21T09:14:34.8625264Z error=14 2026-02-21T09:14:34.8625442Z ok=34 2026-02-21T09:14:34.8625582Z min=0.0388 2026-02-21T09:14:34.8625736Z mid=0.0532 2026-02-21T09:14:34.8625886Z max=1.4965 2026-02-21T09:14:34.8626060Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:14:34.8626287Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:14:34.8626519Z 'l2_groupings': [1], 2026-02-21T09:14:34.8626720Z 'load_eviction_policies': ['', ''], 2026-02-21T09:14:34.8626935Z 'loop_orders': [[0, 1]], 2026-02-21T09:14:34.8627117Z 'num_stages': 4, 2026-02-21T09:14:34.8627274Z 'num_warps': 4, 2026-02-21T09:14:34.8627416Z 'pid_type': 'flat', 2026-02-21T09:14:34.8627568Z 'range_flattens': [None, False], 2026-02-21T09:14:34.8627758Z 'range_multi_buffers': [None, None], 2026-02-21T09:14:34.8627944Z 'range_num_stages': [0, 0], 2026-02-21T09:14:34.8628112Z 'range_unroll_factors': [0, 0], 2026-02-21T09:14:34.8628289Z 'range_warp_specializes': [None, True]} 2026-02-21T09:14:34.8642164Z [211s] Fitting surrogate: 674 points, 674 targets 2026-02-21T09:14:35.4498971Z [211s] Generation 9 starting: 29 neighbors, 2 active search path(s) 2026-02-21T09:14:37.9305757Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29/29 16.2 configs/s 2026-02-21T09:14:39.0866044Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 29/29 25.6 configs/s 2026-02-21T09:14:40.2424483Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 864.8 2026-02-21T09:14:40.2425137Z configs/s 2026-02-21T09:14:40.3270924Z [216s] Generation 9 complete: 2026-02-21T09:14:40.3271203Z error=11 2026-02-21T09:14:40.3271363Z ok=20 2026-02-21T09:14:40.3271867Z min=0.0388 2026-02-21T09:14:40.3272028Z mid=0.0409 2026-02-21T09:14:40.3272173Z max=3.8446 2026-02-21T09:14:40.3272346Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:14:40.3272613Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:14:40.3272883Z 'l2_groupings': [1], 
2026-02-21T09:14:40.3273110Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:14:40.3273341Z 'loop_orders': [[0, 1]], 2026-02-21T09:14:40.3273535Z 'num_stages': 4, 2026-02-21T09:14:40.3273703Z 'num_warps': 4, 2026-02-21T09:14:40.3273877Z 'pid_type': 'flat', 2026-02-21T09:14:40.3274297Z 'range_flattens': [None, False], 2026-02-21T09:14:40.3274531Z 'range_multi_buffers': [None, None], 2026-02-21T09:14:40.3274964Z 'range_num_stages': [0, 0], 2026-02-21T09:14:40.3275329Z 'range_unroll_factors': [0, 0], 2026-02-21T09:14:40.3275564Z 'range_warp_specializes': [None, True]} 2026-02-21T09:14:40.3293297Z [216s] Fitting surrogate: 705 points, 705 targets 2026-02-21T09:14:40.8278435Z [217s] Generation 10 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:14:42.8017905Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 13.6 configs/s 2026-02-21T09:14:43.4794097Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 26.1 configs/s 2026-02-21T09:14:44.2895387Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1220.8 2026-02-21T09:14:44.2900123Z configs/s 2026-02-21T09:14:44.3591936Z [220s] Generation 10 complete: 2026-02-21T09:14:44.3596022Z error=5 2026-02-21T09:14:44.3597523Z ok=13 2026-02-21T09:14:44.3597742Z min=0.0368 2026-02-21T09:14:44.3603161Z mid=0.0390 2026-02-21T09:14:44.3608174Z max=2.4341 2026-02-21T09:14:44.3612403Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:14:44.3614665Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:14:44.3619039Z 'l2_groupings': [16], 2026-02-21T09:14:44.3623219Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:14:44.3626220Z 'loop_orders': [[1, 0]], 2026-02-21T09:14:44.3628235Z 'num_stages': 3, 2026-02-21T09:14:44.3628424Z 'num_warps': 8, 2026-02-21T09:14:44.3628577Z 'pid_type': 'flat', 2026-02-21T09:14:44.3628748Z 'range_flattens': [None, False], 2026-02-21T09:14:44.3628956Z 'range_multi_buffers': [None, None], 2026-02-21T09:14:44.3629148Z 
'range_num_stages': [0, 0], 2026-02-21T09:14:44.3629322Z 'range_unroll_factors': [0, 0], 2026-02-21T09:14:44.3629500Z 'range_warp_specializes': [None, True]} 2026-02-21T09:14:44.3629827Z [220s] Fitting surrogate: 723 points, 723 targets 2026-02-21T09:14:44.8597751Z [221s] Generation 11 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:14:47.7393505Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 8.4 configs/s 2026-02-21T09:14:48.6570230Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 20.9 configs/s 2026-02-21T09:14:49.6740331Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 978.9 2026-02-21T09:14:49.6744225Z configs/s 2026-02-21T09:14:49.7501387Z [226s] Generation 11 complete: 2026-02-21T09:14:49.7501677Z error=3 2026-02-21T09:14:49.7501841Z ok=16 2026-02-21T09:14:49.7501979Z min=0.0349 2026-02-21T09:14:49.7502172Z mid=0.0472 2026-02-21T09:14:49.7502300Z max=2.3419 2026-02-21T09:14:49.7502475Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:14:49.7502721Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:14:49.7502944Z 'l2_groupings': [16], 2026-02-21T09:14:49.7503125Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:14:49.7503316Z 'loop_orders': [[1, 0]], 2026-02-21T09:14:49.7503480Z 'num_stages': 3, 2026-02-21T09:14:49.7503629Z 'num_warps': 8, 2026-02-21T09:14:49.7504119Z 'pid_type': 'flat', 2026-02-21T09:14:49.7504283Z 'range_flattens': [None, False], 2026-02-21T09:14:49.7504460Z 'range_multi_buffers': [None, None], 2026-02-21T09:14:49.7504647Z 'range_num_stages': [0, 0], 2026-02-21T09:14:49.7504983Z 'range_unroll_factors': [0, 0], 2026-02-21T09:14:49.7505262Z 'range_warp_specializes': [None, True]} 2026-02-21T09:14:49.7528173Z [226s] Fitting surrogate: 742 points, 742 targets 2026-02-21T09:14:50.0284399Z [226s] Autotuning complete in 226.5s after searching 714 configs. 
2026-02-21T09:14:50.0285101Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:14:50.0286432Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:14:50.0287375Z 2026-02-21T09:14:50.0287645Z [226s] Code of selected kernel: /tmp/torchinductor_root/2r/c2rkbjf7f3e77aus4anwvlfnc3763vmgiqcj7fzamluvglnazb43.py 2026-02-21T09:14:50.0401459Z from __future__ import annotations 2026-02-21T09:14:50.0401701Z 2026-02-21T09:14:50.0406000Z import torch 2026-02-21T09:14:50.0410509Z import triton 2026-02-21T09:14:50.0412071Z import triton.language as tl 2026-02-21T09:14:50.0412601Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:14:50.0415273Z 2026-02-21T09:14:50.0415435Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:14:50.0415657Z _BLOCK_SIZE_0 = tl.constexpr(256) 2026-02-21T09:14:50.0415835Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:14:50.0415949Z 2026-02-21T09:14:50.0416017Z @triton.jit 2026-02-21T09:14:50.0416164Z def _helion_matmul(x, y, out): 2026-02-21T09:14:50.0416391Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:14:50.0416643Z num_pid_m = tl.cdiv(4096, _BLOCK_SIZE_1) 2026-02-21T09:14:50.0416848Z num_pid_n = tl.cdiv(2048, _BLOCK_SIZE_0) 2026-02-21T09:14:50.0417034Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:14:50.0417227Z num_pid_in_group = 16 * num_pid_n 2026-02-21T09:14:50.0417431Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:14:50.0417627Z first_pid_m = group_id * 16 2026-02-21T09:14:50.0417830Z group_size_m = min(num_pid_m - first_pid_m, 16) 2026-02-21T09:14:50.0418094Z pid_0 = first_pid_m + inner_2d_pid % 
num_pid_in_group % group_size_m 2026-02-21T09:14:50.0418385Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:14:50.0418601Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:14:50.0418835Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:14:50.0419070Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:14:50.0419288Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:14:50.0419610Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:14:50.0419904Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:14:50.0420149Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:14:50.0420423Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:14:50.0420772Z for offset_2 in tl.range(0, 2048, _BLOCK_SIZE_2, warp_specialize=True, flatten=False): 2026-02-21T09:14:50.0421103Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:14:50.0421332Z acc_copy = acc 2026-02-21T09:14:50.0421491Z acc_copy_0 = acc_copy 2026-02-21T09:14:50.0421728Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:14:50.0422058Z load = tl.load(x + (indices_0[:, None] * 2048 + indices_2[None, :] * 1), None) 2026-02-21T09:14:50.0422449Z load_1 = tl.load(y + (indices_2[:, None] * 1 + indices_1[None, :] * 2048), None, eviction_policy='evict_first') 2026-02-21T09:14:50.0423147Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:14:50.0423583Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:14:50.0423886Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:14:50.0424131Z tl.store(out + (indices_0[:, None] * 4096 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:14:50.0424317Z 2026-02-21T09:14:50.0424598Z def matmul(x: Tensor, y: 
Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:14:50.0425038Z """ 2026-02-21T09:14:50.0425259Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:14:50.0425512Z Args: 2026-02-21T09:14:50.0425660Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:14:50.0425901Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:14:50.0426197Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:14:50.0426509Z after the matmul. Defaults to identity (no change). 2026-02-21T09:14:50.0426710Z Returns: 2026-02-21T09:14:50.0426865Z Tensor: Resulting matrix of shape [m, n]. 2026-02-21T09:14:50.0427040Z """ 2026-02-21T09:14:50.0427177Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:14:50.0427345Z m, k = x.size() 2026-02-21T09:14:50.0427533Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:14:50.0427708Z k2, n = y.size() 2026-02-21T09:14:50.0427903Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:14:50.0428144Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:14:50.0428339Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:14:50.0428621Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:14:50.0428895Z # src[matmul.py:62]: ) 2026-02-21T09:14:50.0429151Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:14:50.0429458Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:14:50.0429680Z _BLOCK_SIZE_1 = 256 2026-02-21T09:14:50.0429839Z _BLOCK_SIZE_0 = 256 2026-02-21T09:14:50.0430022Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:14:50.0430307Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:14:50.0430577Z # src[matmul.py:65]: for tile_k in hl.tile(k): 
2026-02-21T09:14:50.0430787Z # src[matmul.py:63-67]: ... 2026-02-21T09:14:50.0431142Z _launcher(_helion_matmul, (triton.cdiv(4096, _BLOCK_SIZE_1) * triton.cdiv(2048, _BLOCK_SIZE_0),), x, y, out, num_warps=8, num_stages=3) 2026-02-21T09:14:50.0431518Z # src[matmul.py:68]: return out 2026-02-21T09:14:50.0431694Z return out 2026-02-21T09:15:18.9092634Z WARNING:tritonbench.utils.triton_op:Completed input ID 3: 2026-02-21T09:15:18.9094183Z (M, N, K) 2026-02-21T09:15:18.9094386Z ------------------ 2026-02-21T09:15:18.9094577Z (2048, 4096, 2048) 2026-02-21T09:15:18.9094863Z 2026-02-21T09:15:18.9103563Z 38%|███▊ | 3/8 [17:14<29:36, 355.35s/it]WARNING:tritonbench.utils.triton_op:Running input ID 5: 2026-02-21T09:15:18.9107718Z (M, N, K) 2026-02-21T09:15:18.9107964Z ------------------ 2026-02-21T09:15:18.9108142Z (1024, 8192, 1024) 2026-02-21T09:15:18.9108483Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:16:06.9924068Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:16:45.8141586Z INFO:tritonbench.utils.triton_op:Took 108.47ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:17:16.9062981Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:17:16.9063478Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:17:16.9067640Z 'dtype': 'torch.float16', 2026-02-21T09:17:16.9068467Z 'shape': (1024, 1024), 2026-02-21T09:17:16.9068786Z 'stride': (1024, 1)}, 2026-02-21T09:17:16.9069224Z { 'device': 'cuda:0', 2026-02-21T09:17:16.9069566Z 'dtype': 'torch.float16', 2026-02-21T09:17:16.9070028Z 'shape': (1024, 8192), 2026-02-21T09:17:16.9070345Z 'stride': (1, 1024)}, 2026-02-21T09:17:16.9070638Z None), 2026-02-21T09:17:16.9070903Z 'kwargs': {}} 2026-02-21T09:17:16.9109191Z INFO:tritonbench.utils.triton_op:Took 5.11ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:17:17.0168790Z [0s] Autotune random seed: 2137757931 
2026-02-21T09:17:17.1416281Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:17:25.3471782Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 31.3 configs/s 2026-02-21T09:17:30.4736142Z [13s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:17:30.4738631Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:17:30.4740829Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:17:30.4741266Z `ptxas` stderr: 2026-02-21T09:17:30.4742057Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:17:30.4742966Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:17:30.4743259Z 2026-02-21T09:17:30.4744030Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5q2czhtf.ptx -o /tmp/tmp5q2czhtf.ptx.o 2026-02-21T09:17:30.4744968Z 2026-02-21T09:17:30.4745212Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:17:30.4745575Z 2026-02-21T09:17:30.4745582Z 2026-02-21T09:17:30.4745719Z ================================================================ 2026-02-21T09:17:30.4746136Z Internal Triton PTX codegen error 2026-02-21T09:17:30.4746453Z `ptxas` stderr: 2026-02-21T09:17:30.4747228Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:17:30.4748130Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:17:30.4748400Z 2026-02-21T09:17:30.4749148Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5q2czhtf.ptx -o /tmp/tmp5q2czhtf.ptx.o 2026-02-21T09:17:30.4749996Z 2026-02-21T09:17:30.4750002Z 2026-02-21T09:17:30.4750089Z // 2026-02-21T09:17:30.4750331Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:17:30.4750628Z // 2026-02-21T09:17:30.4750756Z 2026-02-21T09:17:30.4750846Z .version 8.7 2026-02-21T09:17:30.4751069Z .target sm_100a 2026-02-21T09:17:30.4751307Z .address_size 64 2026-02-21T09:17:30.4751452Z 2026-02-21T09:17:30.4751684Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:17:30.4752146Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:17:30.4752527Z // @_helion_matmul 2026-02-21T09:17:30.4752883Z .visible .entry _helion_matmul( 2026-02-21T09:17:30.4753270Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:17:30.4753735Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:17:30.4754201Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:17:30.4754839Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:17:30.4755294Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:17:30.4755668Z ) 2026-02-21T09:17:30.4755869Z .reqntid 256 2026-02-21T09:17:30.4756178Z .maxnreg 32 2026-02-21T09:17:30.4756380Z 
{ 2026-02-21T09:17:30.4756595Z .reg .pred %p<31>; 2026-02-21T09:17:30.4756844Z .reg .b16 %rs<11>; 2026-02-21T09:17:30.4757085Z .reg .b32 %r<578>; 2026-02-21T09:17:30.4757316Z .reg .b64 %rd<232>; 2026-02-21T09:17:30.4757548Z $L__func_begin0: 2026-02-21T09:17:30.4757682Z 2026-02-21T09:17:30.4757772Z // %bb.0: 2026-02-21T09:17:30.4758183Z .loc 1 14 0 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:14 2026-02-21T09:17:30.4758695Z mov.u32 %r1, %tid.x; 2026-02-21T09:17:30.4758938Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:17:30.4759208Z mov.b32 %r86, global_smem; 2026-02-21T09:17:30.4759516Z // begin inline asm 2026-02-21T09:17:30.4759945Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r86], 128; 2026-02-21T09:17:30.4760398Z // end inline asm 2026-02-21T09:17:30.4760642Z bar.sync 0; 2026-02-21T09:17:30.4760904Z ld.shared.b32 %r570, [global_smem]; 2026-02-21T09:17:30.4761216Z bar.sync 0; 2026-02-21T09:17:30.4761447Z // begin inline asm 2026-02-21T09:17:30.4761810Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:17:30.4762232Z // end inline asm 2026-02-21T09:17:30.4762764Z .loc 1 21 30 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:21:30 2026-02-21T09:17:30.4763342Z mov.u32 %r87, %ctaid.x; 2026-02-21T09:17:30.4763809Z .loc 1 21 35 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:21:35 2026-02-21T09:17:30.4764370Z shl.b32 %r571, %r87, 1; 2026-02-21T09:17:30.4764921Z .loc 1 22 37 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:22:37 2026-02-21T09:17:30.4765474Z add.s32 %r88, %r571, 2; 2026-02-21T09:17:30.4765981Z .loc 1 22 49 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:22:49 2026-02-21T09:17:30.4766528Z min.s32 %r4, %r88, 512; 2026-02-21T09:17:30.4767038Z .loc 1 23 107 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:23:107 2026-02-21T09:17:30.4767627Z setp.ge.s32 %p3, %r571, %r4; 2026-02-21T09:17:30.4767922Z @%p3 bra $L__BB0_9; 2026-02-21T09:17:30.4768216Z // 
%bb.1: // %.lr.ph 2026-02-21T09:17:30.4768783Z .loc 1 0 107 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:0:107 2026-02-21T09:17:30.4769393Z ld.param.b64 %rd16, [_helion_matmul_param_2]; 2026-02-21T09:17:30.4769794Z ld.param.b64 %rd15, [_helion_matmul_param_1]; 2026-02-21T09:17:30.4770187Z ld.param.b64 %rd14, [_helion_matmul_param_0]; 2026-02-21T09:17:30.4770530Z shr.u32 %r5, %r1, 5; 2026-02-21T09:17:30.4770796Z bfe.u32 %r6, %r1, 2, 6; 2026-02-21T09:17:30.4771067Z shl.b32 %r7, %r1, 3; 2026-02-21T09:17:30.4771311Z and.b32 %r8, %r7, 56; 2026-02-21T09:17:30.4771581Z bfe.u32 %r9, %r1, 1, 7; 2026-02-21T09:17:30.4771817Z shr.u32 %r89, %r1, 3; 2026-02-21T09:17:30.4772063Z bfe.u32 %r10, %r1, 3, 5; 2026-02-21T09:17:30.4772309Z or.b32 %r11, %r10, 32; 2026-02-21T09:17:30.4772550Z or.b32 %r12, %r10, 64; 2026-02-21T09:17:30.4772780Z or.b32 %r13, %r89, 96; 2026-02-21T09:17:30.4773022Z or.b32 %r14, %r10, 128; 2026-02-21T09:17:30.4773258Z or.b32 %r15, %r10, 160; 2026-02-21T09:17:30.4773499Z or.b32 %r16, %r10, 192; 2026-02-21T09:17:30.4773739Z or.b32 %r17, %r89, 224; 2026-02-21T09:17:30.4773969Z and.b32 %r18, %r7, 8; 2026-02-21T09:17:30.4774204Z shl.b32 %r90, %r1, 2; 2026-02-21T09:17:30.4774433Z and.b32 %r19, %r90, 12; 2026-02-21T09:17:30.4774749Z shl.b32 %r91, %r1, 4; 2026-02-21T09:17:30.4774950Z and.b32 %r92, %r91, 3952; 2026-02-21T09:17:30.4775164Z bfe.s32 %r93, %r1, 3, 1; 2026-02-21T09:17:30.4775370Z and.b32 %r94, %r93, 144; 2026-02-21T09:17:30.4775682Z xor.b32 %r20, %r94, %r92; 2026-02-21T09:17:30.4775904Z add.s32 %r192, %r86, %r20; 2026-02-21T09:17:30.4776138Z add.s32 %r194, %r192, 4096; 2026-02-21T09:17:30.4776371Z and.b32 %r96, %r7, 1912; 2026-02-21T09:17:30.4776580Z bfe.s32 %r97, %r1, 4, 1; 2026-02-21T09:17:30.4776868Z and.b32 %r98, %r97, 144; 2026-02-21T09:17:30.4777080Z xor.b32 %r99, %r98, %r96; 2026-02-21T09:17:30.4777306Z add.s32 %r100, %r86, 65536; 2026-02-21T09:17:30.4777530Z add.s32 %r196, %r100, %r99; 2026-02-21T09:17:30.4777758Z or.b32 %r24, %r18, 16; 
2026-02-21T09:17:30.4777964Z add.s32 %r198, %r192, 8192; 2026-02-21T09:17:30.4778196Z add.s32 %r200, %r192, 12288; 2026-02-21T09:17:30.4778421Z add.s32 %r101, %r86, %r99; 2026-02-21T09:17:30.4778648Z add.s32 %r202, %r101, 67584; 2026-02-21T09:17:30.4778873Z or.b32 %r28, %r18, 32; 2026-02-21T09:17:30.4779079Z add.s32 %r204, %r192, 16384; 2026-02-21T09:17:30.4779304Z add.s32 %r206, %r192, 20480; 2026-02-21T09:17:30.4779579Z add.s32 %r208, %r101, 69632; 2026-02-21T09:17:30.4779821Z or.b32 %r32, %r18, 48; 2026-02-21T09:17:30.4780024Z add.s32 %r210, %r192, 24576; 2026-02-21T09:17:30.4780253Z add.s32 %r212, %r192, 28672; 2026-02-21T09:17:30.4780468Z add.s32 %r214, %r101, 71680; 2026-02-21T09:17:30.4780689Z or.b32 %r36, %r18, 64; 2026-02-21T09:17:30.4780897Z add.s32 %r216, %r192, 32768; 2026-02-21T09:17:30.4781127Z add.s32 %r218, %r192, 36864; 2026-02-21T09:17:30.4781353Z add.s32 %r220, %r101, 73728; 2026-02-21T09:17:30.4781624Z or.b32 %r40, %r18, 80; 2026-02-21T09:17:30.4781851Z add.s32 %r222, %r192, 40960; 2026-02-21T09:17:30.4782068Z add.s32 %r224, %r192, 45056; 2026-02-21T09:17:30.4782290Z add.s32 %r226, %r101, 75776; 2026-02-21T09:17:30.4782501Z or.b32 %r44, %r18, 96; 2026-02-21T09:17:30.4782711Z add.s32 %r228, %r192, 49152; 2026-02-21T09:17:30.4782925Z add.s32 %r230, %r192, 53248; 2026-02-21T09:17:30.4783148Z add.s32 %r232, %r101, 77824; 2026-02-21T09:17:30.4783375Z bfe.u32 %r102, %r86, 4, 14; 2026-02-21T09:17:30.4783597Z cvt.u64.u32 %rd17, %r102; 2026-02-21T09:17:30.4783851Z or.b64 %rd69, %rd17, -4611685949674356736; 2026-02-21T09:17:30.4784315Z bfe.u32 %r103, %r100, 4, 14; 2026-02-21T09:17:30.4784541Z cvt.u64.u32 %rd18, %r103; 2026-02-21T09:17:30.4784859Z or.b64 %rd70, %rd18, -4611685949699522560; 2026-02-21T09:17:30.4785144Z add.s32 %r104, %r86, 4096; 2026-02-21T09:17:30.4785364Z bfe.u32 %r105, %r104, 4, 14; 2026-02-21T09:17:30.4785590Z cvt.u64.u32 %rd19, %r105; 2026-02-21T09:17:30.4785825Z or.b64 %rd71, %rd19, -4611685949674356736; 2026-02-21T09:17:30.4786062Z 
or.b32 %r48, %r18, 112; 2026-02-21T09:17:30.4786262Z add.s32 %r261, %r192, 57344; 2026-02-21T09:17:30.4786462Z add.s32 %r263, %r192, 61440; 2026-02-21T09:17:30.4786668Z add.s32 %r265, %r101, 79872; 2026-02-21T09:17:30.4786867Z and.b32 %r106, %r91, 4016; 2026-02-21T09:17:30.4787075Z bfe.s32 %r107, %r1, 2, 1; 2026-02-21T09:17:30.4787274Z and.b32 %r108, %r107, 4160; 2026-02-21T09:17:30.4787487Z or.b32 %r109, %r108, %r106; 2026-02-21T09:17:30.4787692Z add.s32 %r52, %r86, %r109; 2026-02-21T09:17:30.4787898Z xor.b32 %r110, %r109, 64; 2026-02-21T09:17:30.4788100Z add.s32 %r53, %r86, %r110; 2026-02-21T09:17:30.4788298Z shl.b32 %r111, %r1, 6; 2026-02-21T09:17:30.4788496Z and.b32 %r112, %r111, 1600; 2026-02-21T09:17:30.4788698Z and.b32 %r113, %r7, 48; 2026-02-21T09:17:30.4788902Z shl.b32 %r114, %r1, 1; 2026-02-21T09:17:30.4789091Z and.b32 %r115, %r114, 384; 2026-02-21T09:17:30.4789298Z bfe.s32 %r116, %r1, 5, 1; 2026-02-21T09:17:30.4789495Z and.b32 %r117, %r116, 4160; 2026-02-21T09:17:30.4789702Z or.b32 %r118, %r112, %r113; 2026-02-21T09:17:30.4789900Z or.b32 %r119, %r117, %r115; 2026-02-21T09:17:30.4790111Z xor.b32 %r120, %r119, %r118; 2026-02-21T09:17:30.4790326Z add.s32 %r389, %r86, %r120; 2026-02-21T09:17:30.4790527Z add.s32 %r394, %r389, 2048; 2026-02-21T09:17:30.4790952Z .loc 1 23 107 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:23:107 2026-02-21T09:17:30.4791439Z and.b32 %r121, %r1, 1; 2026-02-21T09:17:30.4791785Z mad.wide.u32 %rd20, %r121, 16, %rd14; 2026-02-21T09:17:30.4792051Z add.s64 %rd4, %rd20, 262400; 2026-02-21T09:17:30.4792285Z shl.b32 %r56, %r9, 10; 2026-02-21T09:17:30.4792731Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.4793245Z or.b32 %r57, %r19, 128; 2026-02-21T09:17:30.4793461Z setp.eq.b32 %p8, %r1, 0; 2026-02-21T09:17:30.4793681Z bra.uni $L__BB0_2; 2026-02-21T09:17:30.4793962Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:17:30.4794482Z .loc 1 0 89 // 
csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:0:89 2026-02-21T09:17:30.4795031Z mov.b32 %r314, 1; 2026-02-21T09:17:30.4795425Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4795891Z // begin inline asm 2026-02-21T09:17:30.4796148Z 2026-02-21T09:17:30.4796310Z { 2026-02-21T09:17:30.4796490Z .reg .pred complete; 2026-02-21T09:17:30.4796689Z waitLoop: 2026-02-21T09:17:30.4796980Z mbarrier.try_wait.parity.shared.b64 complete, [%r313], %r314; 2026-02-21T09:17:30.4797345Z @!complete bra.uni waitLoop; 2026-02-21T09:17:30.4797572Z } 2026-02-21T09:17:30.4797663Z 2026-02-21T09:17:30.4797737Z // end inline asm 2026-02-21T09:17:30.4798134Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.4798665Z cp.async.wait_group 0; 2026-02-21T09:17:30.4798885Z bar.sync 0; 2026-02-21T09:17:30.4799079Z add.s32 %r315, %r86, 81920; 2026-02-21T09:17:30.4799299Z // begin inline asm 2026-02-21T09:17:30.4799555Z @%p8 mbarrier.inval.shared::cta.b64 [%r315]; 2026-02-21T09:17:30.4799837Z // end inline asm 2026-02-21T09:17:30.4800032Z bar.sync 0; 2026-02-21T09:17:30.4800210Z // begin inline asm 2026-02-21T09:17:30.4800450Z @%p8 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T09:17:30.4800726Z // end inline asm 2026-02-21T09:17:30.4801124Z .loc 1 52 45 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:52:45 2026-02-21T09:17:30.4801618Z shl.b32 %r458, %r64, 13; 2026-02-21T09:17:30.4801835Z shl.b32 %r459, %r65, 13; 2026-02-21T09:17:30.4802058Z shl.b32 %r460, %r66, 13; 2026-02-21T09:17:30.4802266Z shl.b32 %r461, %r67, 13; 2026-02-21T09:17:30.4802483Z shl.b32 %r462, %r68, 13; 2026-02-21T09:17:30.4802687Z shl.b32 %r463, %r69, 13; 2026-02-21T09:17:30.4802903Z shl.b32 %r464, %r70, 13; 2026-02-21T09:17:30.4803113Z shl.b32 %r465, %r71, 13; 2026-02-21T09:17:30.4803516Z .loc 1 52 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:52:52 2026-02-21T09:17:30.4803937Z add.s32 %r466, 
%r458, %r63; 2026-02-21T09:17:30.4804141Z add.s32 %r467, %r459, %r63; 2026-02-21T09:17:30.4804351Z add.s32 %r468, %r460, %r63; 2026-02-21T09:17:30.4804550Z add.s32 %r469, %r461, %r63; 2026-02-21T09:17:30.4804838Z add.s32 %r470, %r462, %r63; 2026-02-21T09:17:30.4805062Z add.s32 %r471, %r463, %r63; 2026-02-21T09:17:30.4805288Z add.s32 %r472, %r464, %r63; 2026-02-21T09:17:30.4805505Z add.s32 %r473, %r465, %r63; 2026-02-21T09:17:30.4805923Z .loc 1 52 24 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:52:24 2026-02-21T09:17:30.4806431Z mad.wide.s32 %rd94, %r466, 2, %rd16; 2026-02-21T09:17:30.4806705Z mad.wide.s32 %rd95, %r467, 2, %rd16; 2026-02-21T09:17:30.4806973Z mad.wide.s32 %rd96, %r468, 2, %rd16; 2026-02-21T09:17:30.4807235Z mad.wide.s32 %rd97, %r469, 2, %rd16; 2026-02-21T09:17:30.4807501Z mad.wide.s32 %rd98, %r470, 2, %rd16; 2026-02-21T09:17:30.4807753Z mad.wide.s32 %rd99, %r471, 2, %rd16; 2026-02-21T09:17:30.4808017Z mad.wide.s32 %rd100, %r472, 2, %rd16; 2026-02-21T09:17:30.4808281Z mad.wide.s32 %rd101, %r473, 2, %rd16; 2026-02-21T09:17:30.4808719Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4809159Z // begin inline asm 2026-02-21T09:17:30.4809883Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332}, [%r384 + 0]; 2026-02-21T09:17:30.4810564Z // end inline asm 2026-02-21T09:17:30.4810845Z // begin inline asm 2026-02-21T09:17:30.4811471Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348, %r349}, [%r384 + 16]; 2026-02-21T09:17:30.4812156Z // end inline asm 2026-02-21T09:17:30.4812372Z // begin inline asm 2026-02-21T09:17:30.4812991Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366}, [%r384 + 32]; 
2026-02-21T09:17:30.4813661Z // end inline asm 2026-02-21T09:17:30.4813880Z // begin inline asm 2026-02-21T09:17:30.4814544Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381, %r382, %r383}, [%r384 + 48]; 2026-02-21T09:17:30.4815285Z // end inline asm 2026-02-21T09:17:30.4815511Z // begin inline asm 2026-02-21T09:17:30.4815761Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:17:30.4816051Z // end inline asm 2026-02-21T09:17:30.4816298Z cvt.u64.u32 %rd102, %r317; 2026-02-21T09:17:30.4816589Z cvt.u64.u32 %rd103, %r318; 2026-02-21T09:17:30.4816871Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:17:30.4817160Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:17:30.4817730Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4818302Z mov.b64 {%r474, %r475}, %rd105; 2026-02-21T09:17:30.4818619Z cvt.rn.f16x2.f32 %r476, %r475, %r474; 2026-02-21T09:17:30.4819166Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4819731Z cvt.u64.u32 %rd106, %r319; 2026-02-21T09:17:30.4820012Z cvt.u64.u32 %rd107, %r320; 2026-02-21T09:17:30.4820300Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:17:30.4820586Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:17:30.4821106Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4821657Z mov.b64 {%r477, %r478}, %rd109; 2026-02-21T09:17:30.4821970Z cvt.rn.f16x2.f32 %r479, %r478, %r477; 2026-02-21T09:17:30.4822521Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4823072Z cvt.u64.u32 %rd110, %r321; 2026-02-21T09:17:30.4823360Z cvt.u64.u32 %rd111, %r322; 2026-02-21T09:17:30.4823634Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:17:30.4823925Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:17:30.4824440Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 
2026-02-21T09:17:30.4825037Z mov.b64 {%r480, %r481}, %rd113; 2026-02-21T09:17:30.4825353Z cvt.rn.f16x2.f32 %r482, %r481, %r480; 2026-02-21T09:17:30.4825895Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4826453Z cvt.u64.u32 %rd114, %r323; 2026-02-21T09:17:30.4826732Z cvt.u64.u32 %rd115, %r324; 2026-02-21T09:17:30.4827025Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:17:30.4827314Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:17:30.4827834Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4828384Z mov.b64 {%r483, %r484}, %rd117; 2026-02-21T09:17:30.4828694Z cvt.rn.f16x2.f32 %r485, %r484, %r483; 2026-02-21T09:17:30.4829235Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4829784Z cvt.u64.u32 %rd118, %r325; 2026-02-21T09:17:30.4830066Z cvt.u64.u32 %rd119, %r326; 2026-02-21T09:17:30.4830337Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:17:30.4830631Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:17:30.4831138Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4831749Z mov.b64 {%r486, %r487}, %rd121; 2026-02-21T09:17:30.4832057Z cvt.rn.f16x2.f32 %r488, %r487, %r486; 2026-02-21T09:17:30.4832592Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4833202Z cvt.u64.u32 %rd122, %r327; 2026-02-21T09:17:30.4833480Z cvt.u64.u32 %rd123, %r328; 2026-02-21T09:17:30.4833768Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:17:30.4834050Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:17:30.4834584Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4835175Z mov.b64 {%r489, %r490}, %rd125; 2026-02-21T09:17:30.4835486Z cvt.rn.f16x2.f32 %r491, %r490, %r489; 2026-02-21T09:17:30.4836077Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4836628Z cvt.u64.u32 %rd126, 
%r329; 2026-02-21T09:17:30.4836915Z cvt.u64.u32 %rd127, %r330; 2026-02-21T09:17:30.4837184Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:17:30.4837470Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:17:30.4837975Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4838530Z mov.b64 {%r492, %r493}, %rd129; 2026-02-21T09:17:30.4838842Z cvt.rn.f16x2.f32 %r494, %r493, %r492; 2026-02-21T09:17:30.4839424Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4839980Z cvt.u64.u32 %rd130, %r331; 2026-02-21T09:17:30.4840254Z cvt.u64.u32 %rd131, %r332; 2026-02-21T09:17:30.4840533Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:17:30.4840810Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:17:30.4841326Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4841878Z mov.b64 {%r495, %r496}, %rd133; 2026-02-21T09:17:30.4842191Z cvt.rn.f16x2.f32 %r497, %r496, %r495; 2026-02-21T09:17:30.4842734Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4843276Z cvt.u64.u32 %rd134, %r334; 2026-02-21T09:17:30.4843564Z cvt.u64.u32 %rd135, %r335; 2026-02-21T09:17:30.4843841Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:17:30.4844132Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:17:30.4844650Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4845260Z mov.b64 {%r498, %r499}, %rd137; 2026-02-21T09:17:30.4845569Z cvt.rn.f16x2.f32 %r500, %r499, %r498; 2026-02-21T09:17:30.4846112Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4846675Z cvt.u64.u32 %rd138, %r336; 2026-02-21T09:17:30.4846956Z cvt.u64.u32 %rd139, %r337; 2026-02-21T09:17:30.4847250Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:17:30.4847537Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:17:30.4848062Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 
2026-02-21T09:17:30.4848607Z mov.b64 {%r501, %r502}, %rd141; 2026-02-21T09:17:30.4848918Z cvt.rn.f16x2.f32 %r503, %r502, %r501; 2026-02-21T09:17:30.4849455Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4849998Z cvt.u64.u32 %rd142, %r338; 2026-02-21T09:17:30.4850288Z cvt.u64.u32 %rd143, %r339; 2026-02-21T09:17:30.4850566Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:17:30.4850858Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:17:30.4851369Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4851927Z mov.b64 {%r504, %r505}, %rd145; 2026-02-21T09:17:30.4852235Z cvt.rn.f16x2.f32 %r506, %r505, %r504; 2026-02-21T09:17:30.4852772Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4853389Z cvt.u64.u32 %rd146, %r340; 2026-02-21T09:17:30.4853663Z cvt.u64.u32 %rd147, %r341; 2026-02-21T09:17:30.4853950Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:17:30.4854233Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:17:30.4854849Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4855409Z mov.b64 {%r507, %r508}, %rd149; 2026-02-21T09:17:30.4855726Z cvt.rn.f16x2.f32 %r509, %r508, %r507; 2026-02-21T09:17:30.4856270Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4856820Z cvt.u64.u32 %rd150, %r342; 2026-02-21T09:17:30.4857106Z cvt.u64.u32 %rd151, %r343; 2026-02-21T09:17:30.4857381Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:17:30.4857665Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:17:30.4858226Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4858788Z mov.b64 {%r510, %r511}, %rd153; 2026-02-21T09:17:30.4859094Z cvt.rn.f16x2.f32 %r512, %r511, %r510; 2026-02-21T09:17:30.4859626Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4860179Z cvt.u64.u32 %rd154, 
%r344; 2026-02-21T09:17:30.4860454Z cvt.u64.u32 %rd155, %r345; 2026-02-21T09:17:30.4860735Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:17:30.4861055Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:17:30.4861571Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4862112Z mov.b64 {%r513, %r514}, %rd157; 2026-02-21T09:17:30.4862420Z cvt.rn.f16x2.f32 %r515, %r514, %r513; 2026-02-21T09:17:30.4862965Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4863507Z cvt.u64.u32 %rd158, %r346; 2026-02-21T09:17:30.4863795Z cvt.u64.u32 %rd159, %r347; 2026-02-21T09:17:30.4864068Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:17:30.4864358Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:17:30.4864934Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4865506Z mov.b64 {%r516, %r517}, %rd161; 2026-02-21T09:17:30.4865820Z cvt.rn.f16x2.f32 %r518, %r517, %r516; 2026-02-21T09:17:30.4866367Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4866930Z cvt.u64.u32 %rd162, %r348; 2026-02-21T09:17:30.4867206Z cvt.u64.u32 %rd163, %r349; 2026-02-21T09:17:30.4867494Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:17:30.4867776Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:17:30.4868309Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4868870Z mov.b64 {%r519, %r520}, %rd165; 2026-02-21T09:17:30.4869184Z cvt.rn.f16x2.f32 %r521, %r520, %r519; 2026-02-21T09:17:30.4869745Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4870249Z cvt.u64.u32 %rd166, %r351; 2026-02-21T09:17:30.4870508Z cvt.u64.u32 %rd167, %r352; 2026-02-21T09:17:30.4870760Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:17:30.4871022Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:17:30.4871488Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 
2026-02-21T09:17:30.4872001Z mov.b64 {%r522, %r523}, %rd169; 2026-02-21T09:17:30.4872288Z cvt.rn.f16x2.f32 %r524, %r523, %r522; 2026-02-21T09:17:30.4872783Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4873303Z cvt.u64.u32 %rd170, %r353; 2026-02-21T09:17:30.4873559Z cvt.u64.u32 %rd171, %r354; 2026-02-21T09:17:30.4873826Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:17:30.4874088Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:17:30.4874642Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4875240Z mov.b64 {%r525, %r526}, %rd173; 2026-02-21T09:17:30.4875556Z cvt.rn.f16x2.f32 %r527, %r526, %r525; 2026-02-21T09:17:30.4876157Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4876706Z cvt.u64.u32 %rd174, %r355; 2026-02-21T09:17:30.4876990Z cvt.u64.u32 %rd175, %r356; 2026-02-21T09:17:30.4877265Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:17:30.4877555Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:17:30.4878069Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4878621Z mov.b64 {%r528, %r529}, %rd177; 2026-02-21T09:17:30.4878929Z cvt.rn.f16x2.f32 %r530, %r529, %r528; 2026-02-21T09:17:30.4879513Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4880071Z cvt.u64.u32 %rd178, %r357; 2026-02-21T09:17:30.4880339Z cvt.u64.u32 %rd179, %r358; 2026-02-21T09:17:30.4880616Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:17:30.4880897Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:17:30.4881417Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4881966Z mov.b64 {%r531, %r532}, %rd181; 2026-02-21T09:17:30.4882271Z cvt.rn.f16x2.f32 %r533, %r532, %r531; 2026-02-21T09:17:30.4882886Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4883429Z cvt.u64.u32 %rd182, 
%r359; 2026-02-21T09:17:30.4883711Z cvt.u64.u32 %rd183, %r360; 2026-02-21T09:17:30.4883982Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:17:30.4884271Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:17:30.4884838Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4885399Z mov.b64 {%r534, %r535}, %rd185; 2026-02-21T09:17:30.4885708Z cvt.rn.f16x2.f32 %r536, %r535, %r534; 2026-02-21T09:17:30.4886246Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4886802Z cvt.u64.u32 %rd186, %r361; 2026-02-21T09:17:30.4887080Z cvt.u64.u32 %rd187, %r362; 2026-02-21T09:17:30.4887365Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:17:30.4887641Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:17:30.4888160Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4888714Z mov.b64 {%r537, %r538}, %rd189; 2026-02-21T09:17:30.4889026Z cvt.rn.f16x2.f32 %r539, %r538, %r537; 2026-02-21T09:17:30.4889586Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4890140Z cvt.u64.u32 %rd190, %r363; 2026-02-21T09:17:30.4890430Z cvt.u64.u32 %rd191, %r364; 2026-02-21T09:17:30.4890706Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:17:30.4890996Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:17:30.4891517Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4892076Z mov.b64 {%r540, %r541}, %rd193; 2026-02-21T09:17:30.4892391Z cvt.rn.f16x2.f32 %r542, %r541, %r540; 2026-02-21T09:17:30.4892928Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4893484Z cvt.u64.u32 %rd194, %r365; 2026-02-21T09:17:30.4893756Z cvt.u64.u32 %rd195, %r366; 2026-02-21T09:17:30.4894038Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:17:30.4894319Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:17:30.4894911Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 
2026-02-21T09:17:30.4895462Z mov.b64 {%r543, %r544}, %rd197; 2026-02-21T09:17:30.4895775Z cvt.rn.f16x2.f32 %r545, %r544, %r543; 2026-02-21T09:17:30.4896324Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4896927Z cvt.u64.u32 %rd198, %r368; 2026-02-21T09:17:30.4897214Z cvt.u64.u32 %rd199, %r369; 2026-02-21T09:17:30.4897487Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:17:30.4897825Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:17:30.4898331Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4898889Z mov.b64 {%r546, %r547}, %rd201; 2026-02-21T09:17:30.4899192Z cvt.rn.f16x2.f32 %r548, %r547, %r546; 2026-02-21T09:17:30.4899732Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4900290Z cvt.u64.u32 %rd202, %r370; 2026-02-21T09:17:30.4900563Z cvt.u64.u32 %rd203, %r371; 2026-02-21T09:17:30.4900845Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:17:30.4901122Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:17:30.4901652Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4902178Z mov.b64 {%r549, %r550}, %rd205; 2026-02-21T09:17:30.4902486Z cvt.rn.f16x2.f32 %r551, %r550, %r549; 2026-02-21T09:17:30.4903020Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4903571Z cvt.u64.u32 %rd206, %r372; 2026-02-21T09:17:30.4903854Z cvt.u64.u32 %rd207, %r373; 2026-02-21T09:17:30.4904108Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:17:30.4904408Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:17:30.4904936Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4905451Z mov.b64 {%r552, %r553}, %rd209; 2026-02-21T09:17:30.4905748Z cvt.rn.f16x2.f32 %r554, %r553, %r552; 2026-02-21T09:17:30.4906289Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4906853Z cvt.u64.u32 %rd210, 
%r374; 2026-02-21T09:17:30.4907138Z cvt.u64.u32 %rd211, %r375; 2026-02-21T09:17:30.4907402Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:17:30.4907667Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:17:30.4908148Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4908660Z mov.b64 {%r555, %r556}, %rd213; 2026-02-21T09:17:30.4908952Z cvt.rn.f16x2.f32 %r557, %r556, %r555; 2026-02-21T09:17:30.4909455Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4909957Z cvt.u64.u32 %rd214, %r376; 2026-02-21T09:17:30.4910224Z cvt.u64.u32 %rd215, %r377; 2026-02-21T09:17:30.4910475Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:17:30.4910739Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:17:30.4911206Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4911729Z mov.b64 {%r558, %r559}, %rd217; 2026-02-21T09:17:30.4912017Z cvt.rn.f16x2.f32 %r560, %r559, %r558; 2026-02-21T09:17:30.4912509Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4913021Z cvt.u64.u32 %rd218, %r378; 2026-02-21T09:17:30.4913279Z cvt.u64.u32 %rd219, %r379; 2026-02-21T09:17:30.4913539Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:17:30.4913798Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:17:30.4914279Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4914821Z mov.b64 {%r561, %r562}, %rd221; 2026-02-21T09:17:30.4915104Z cvt.rn.f16x2.f32 %r563, %r562, %r561; 2026-02-21T09:17:30.4915611Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4916151Z cvt.u64.u32 %rd222, %r380; 2026-02-21T09:17:30.4916434Z cvt.u64.u32 %rd223, %r381; 2026-02-21T09:17:30.4916709Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:17:30.4917054Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:17:30.4917566Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 
2026-02-21T09:17:30.4918131Z mov.b64 {%r564, %r565}, %rd225; 2026-02-21T09:17:30.4918422Z cvt.rn.f16x2.f32 %r566, %r565, %r564; 2026-02-21T09:17:30.4918997Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4919560Z cvt.u64.u32 %rd226, %r382; 2026-02-21T09:17:30.4919839Z cvt.u64.u32 %rd227, %r383; 2026-02-21T09:17:30.4920121Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:17:30.4920404Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:17:30.4920930Z .loc 1 51 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:51:27 2026-02-21T09:17:30.4921485Z mov.b64 {%r567, %r568}, %rd229; 2026-02-21T09:17:30.4921798Z cvt.rn.f16x2.f32 %r569, %r568, %r567; 2026-02-21T09:17:30.4922390Z .loc 1 52 82 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:52:82 2026-02-21T09:17:30.4923006Z st.shared.v4.b32 [%r52], {%r476, %r488, %r500, %r512}; 2026-02-21T09:17:30.4923442Z st.shared.v4.b32 [%r53], {%r524, %r536, %r548, %r560}; 2026-02-21T09:17:30.4923797Z bar.sync 0; 2026-02-21T09:17:30.4924042Z // begin inline asm 2026-02-21T09:17:30.4924475Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r386, %r387, %r388}, [%r389]; 2026-02-21T09:17:30.4925004Z // end inline asm 2026-02-21T09:17:30.4925232Z // begin inline asm 2026-02-21T09:17:30.4925663Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r390, %r391, %r392, %r393}, [%r394]; 2026-02-21T09:17:30.4926151Z // end inline asm 2026-02-21T09:17:30.4926383Z bar.sync 0; 2026-02-21T09:17:30.4926680Z st.shared.v4.b32 [%r52], {%r479, %r491, %r503, %r515}; 2026-02-21T09:17:30.4927099Z st.shared.v4.b32 [%r53], {%r527, %r539, %r551, %r563}; 2026-02-21T09:17:30.4927455Z bar.sync 0; 2026-02-21T09:17:30.4927678Z // begin inline asm 2026-02-21T09:17:30.4928113Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r395, %r396, %r397, %r398}, [%r389]; 2026-02-21T09:17:30.4928606Z // end inline asm 2026-02-21T09:17:30.4928843Z // begin inline asm 2026-02-21T09:17:30.4929274Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r400, %r401, %r402, %r403}, [%r394]; 2026-02-21T09:17:30.4929754Z // end inline asm 2026-02-21T09:17:30.4929995Z bar.sync 0; 2026-02-21T09:17:30.4930279Z st.shared.v4.b32 [%r52], {%r482, %r494, %r506, %r518}; 2026-02-21T09:17:30.4930705Z st.shared.v4.b32 [%r53], {%r530, %r542, %r554, %r566}; 2026-02-21T09:17:30.4931052Z bar.sync 0; 2026-02-21T09:17:30.4931281Z // begin inline asm 2026-02-21T09:17:30.4931704Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r405, %r406, %r407, %r408}, [%r389]; 2026-02-21T09:17:30.4932191Z // end inline asm 2026-02-21T09:17:30.4932433Z // begin inline asm 2026-02-21T09:17:30.4932848Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r410, %r411, %r412, %r413}, [%r394]; 2026-02-21T09:17:30.4933334Z // end inline asm 2026-02-21T09:17:30.4933563Z bar.sync 0; 2026-02-21T09:17:30.4933851Z st.shared.v4.b32 [%r52], {%r485, %r497, %r509, %r521}; 2026-02-21T09:17:30.4934263Z st.shared.v4.b32 [%r53], {%r533, %r545, %r557, %r569}; 2026-02-21T09:17:30.4934614Z bar.sync 0; 2026-02-21T09:17:30.4934925Z // begin inline asm 2026-02-21T09:17:30.4935357Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r415, %r416, %r417, %r418}, [%r389]; 2026-02-21T09:17:30.4935852Z // end inline asm 2026-02-21T09:17:30.4936087Z // begin inline asm 2026-02-21T09:17:30.4936513Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r420, %r421, %r422, %r423}, [%r394]; 2026-02-21T09:17:30.4936989Z // end inline asm 2026-02-21T09:17:30.4937232Z // begin inline asm 2026-02-21T09:17:30.4937560Z st.global.v4.b32 [ %rd94 + 0 ], { %r385, %r395, %r405, %r415 }; 2026-02-21T09:17:30.4937953Z // end inline asm 2026-02-21T09:17:30.4938186Z // begin inline asm 2026-02-21T09:17:30.4938513Z st.global.v4.b32 [ %rd95 + 0 ], { %r386, %r396, %r406, %r416 }; 2026-02-21T09:17:30.4938904Z // end inline asm 2026-02-21T09:17:30.4939193Z // begin inline asm 2026-02-21T09:17:30.4939523Z st.global.v4.b32 [ %rd96 + 0 ], { %r387, %r397, %r407, %r417 }; 2026-02-21T09:17:30.4939900Z // end inline asm 
2026-02-21T09:17:30.4940140Z // begin inline asm 2026-02-21T09:17:30.4940506Z st.global.v4.b32 [ %rd97 + 0 ], { %r388, %r398, %r408, %r418 }; 2026-02-21T09:17:30.4940890Z // end inline asm 2026-02-21T09:17:30.4941119Z // begin inline asm 2026-02-21T09:17:30.4941444Z st.global.v4.b32 [ %rd98 + 0 ], { %r390, %r400, %r410, %r420 }; 2026-02-21T09:17:30.4941825Z // end inline asm 2026-02-21T09:17:30.4942048Z // begin inline asm 2026-02-21T09:17:30.4942376Z st.global.v4.b32 [ %rd99 + 0 ], { %r391, %r401, %r411, %r421 }; 2026-02-21T09:17:30.4942749Z // end inline asm 2026-02-21T09:17:30.4942992Z // begin inline asm 2026-02-21T09:17:30.4943322Z st.global.v4.b32 [ %rd100 + 0 ], { %r392, %r402, %r412, %r422 }; 2026-02-21T09:17:30.4943717Z // end inline asm 2026-02-21T09:17:30.4944014Z // begin inline asm 2026-02-21T09:17:30.4944359Z st.global.v4.b32 [ %rd101 + 0 ], { %r393, %r403, %r413, %r423 }; 2026-02-21T09:17:30.4944787Z // end inline asm 2026-02-21T09:17:30.4945289Z .loc 1 23 107 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:23:107 2026-02-21T09:17:30.4945875Z add.s32 %r571, %r571, 1; 2026-02-21T09:17:30.4946165Z setp.ne.b32 %p29, %r571, %r4; 2026-02-21T09:17:30.4946462Z @%p29 bra $L__BB0_2; 2026-02-21T09:17:30.4946711Z bra.uni $L__BB0_9; 2026-02-21T09:17:30.4947087Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:17:30.4947535Z // Child Loop BB0_5 Depth 2 2026-02-21T09:17:30.4948154Z .loc 1 29 35 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:29:35 2026-02-21T09:17:30.4948729Z shr.s32 %r234, %r571, 31; 2026-02-21T09:17:30.4949007Z shr.u32 %r235, %r234, 28; 2026-02-21T09:17:30.4949298Z add.s32 %r236, %r571, %r235; 2026-02-21T09:17:30.4949811Z .loc 1 32 45 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:32:45 2026-02-21T09:17:30.4950367Z and.b32 %r237, %r236, 65520; 2026-02-21T09:17:30.4950647Z sub.s32 %r238, %r571, %r237; 2026-02-21T09:17:30.4951162Z .loc 1 32 64 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:32:64 
2026-02-21T09:17:30.4951720Z cvt.u16.u32 %rs1, %r238; 2026-02-21T09:17:30.4951993Z cvt.s8.s32 %rs2, %r238; 2026-02-21T09:17:30.4952497Z .loc 1 33 51 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:33:51 2026-02-21T09:17:30.4953042Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:17:30.4953314Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:17:30.4953580Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:17:30.4953859Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:17:30.4954118Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:17:30.4954618Z .loc 1 32 64 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:32:64 2026-02-21T09:17:30.4955232Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:17:30.4955500Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:17:30.4956000Z .loc 1 34 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:34:27 2026-02-21T09:17:30.4956547Z shl.b32 %r239, %r236, 4; 2026-02-21T09:17:30.4956829Z and.b32 %r61, %r239, -256; 2026-02-21T09:17:30.4957108Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:17:30.4957391Z mul.wide.s16 %r62, %rs10, 64; 2026-02-21T09:17:30.4957679Z add.s32 %r240, %r62, %r61; 2026-02-21T09:17:30.4958186Z .loc 1 35 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:35:32 2026-02-21T09:17:30.4958738Z or.b32 %r241, %r240, %r6; 2026-02-21T09:17:30.4959233Z .loc 1 36 27 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:36:27 2026-02-21T09:17:30.4959802Z mul.wide.s16 %r242, %rs7, 256; 2026-02-21T09:17:30.4960318Z .loc 1 37 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:37:32 2026-02-21T09:17:30.4960932Z or.b32 %r243, %r242, %r9; 2026-02-21T09:17:30.4961426Z .loc 1 47 53 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:53 2026-02-21T09:17:30.4961988Z shl.b32 %r244, %r243, 10; 2026-02-21T09:17:30.4962534Z .loc 1 48 80 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:80 2026-02-21T09:17:30.4963080Z shl.b32 %r245, %r241, 10; 2026-02-21T09:17:30.4963586Z .loc 1 49 52 // 
csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.4964157Z shfl.sync.idx.b32 %r72, %r5, 0, 31, -1; 2026-02-21T09:17:30.4964498Z shl.b32 %r246, %r72, 21; 2026-02-21T09:17:30.4964808Z and.b32 %r247, %r246, 6291456; 2026-02-21T09:17:30.4965108Z add.s32 %r248, %r247, %r570; 2026-02-21T09:17:30.4965397Z shl.b32 %r249, %r72, 4; 2026-02-21T09:17:30.4965658Z and.b32 %r250, %r249, 64; 2026-02-21T09:17:30.4966006Z add.s32 %r384, %r248, %r250; 2026-02-21T09:17:30.4966285Z mov.pred %p4, -1; 2026-02-21T09:17:30.4966540Z mov.b32 %r572, 0; 2026-02-21T09:17:30.4966779Z // begin inline asm 2026-02-21T09:17:30.4967490Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r384 + 0], {%r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572}; 2026-02-21T09:17:30.4968236Z // end inline asm 2026-02-21T09:17:30.4968504Z // begin inline asm 2026-02-21T09:17:30.4969245Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r384 + 16], {%r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572}; 2026-02-21T09:17:30.4969981Z // end inline asm 2026-02-21T09:17:30.4970225Z // begin inline asm 2026-02-21T09:17:30.4970904Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r384 + 32], {%r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572}; 2026-02-21T09:17:30.4971654Z // end inline asm 2026-02-21T09:17:30.4971896Z // begin inline asm 2026-02-21T09:17:30.4972575Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r384 + 48], {%r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572, %r572}; 2026-02-21T09:17:30.4973311Z // end inline asm 2026-02-21T09:17:30.4973542Z // begin inline asm 2026-02-21T09:17:30.4973820Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:17:30.4974110Z // end inline asm 2026-02-21T09:17:30.4974348Z bar.sync 0; 2026-02-21T09:17:30.4974882Z .loc 1 42 89 // 
csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.4975450Z add.s32 %r573, %r86, 81920; 2026-02-21T09:17:30.4975730Z // begin inline asm 2026-02-21T09:17:30.4976043Z @%p8 mbarrier.init.shared::cta.b64 [%r573], 1; 2026-02-21T09:17:30.4976399Z // end inline asm 2026-02-21T09:17:30.4976627Z bar.sync 0; 2026-02-21T09:17:30.4976870Z add.s32 %r191, %r86, 81928; 2026-02-21T09:17:30.4977146Z // begin inline asm 2026-02-21T09:17:30.4977444Z @%p8 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T09:17:30.4977780Z // end inline asm 2026-02-21T09:17:30.4978253Z .loc 1 47 60 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:60 2026-02-21T09:17:30.4978807Z or.b32 %r252, %r244, %r18; 2026-02-21T09:17:30.4979319Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.4979887Z mad.wide.s32 %rd21, %r252, 2, %rd14; 2026-02-21T09:17:30.4980199Z cvt.u64.u32 %rd42, %r18; 2026-02-21T09:17:30.4980485Z cvt.s64.s32 %rd5, %r244; 2026-02-21T09:17:30.4980761Z or.b64 %rd43, %rd5, %rd42; 2026-02-21T09:17:30.4981043Z shl.b64 %rd44, %rd43, 1; 2026-02-21T09:17:30.4981311Z add.s64 %rd6, %rd14, %rd44; 2026-02-21T09:17:30.4981602Z add.s64 %rd22, %rd6, 262144; 2026-02-21T09:17:30.4981875Z mov.b32 %r262, 16; 2026-02-21T09:17:30.4982368Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.4982921Z // begin inline asm 2026-02-21T09:17:30.4983352Z cp.async.cg.shared.global [ %r192 + 0 ], [ %rd21 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.4983780Z // end inline asm 2026-02-21T09:17:30.4984021Z // begin inline asm 2026-02-21T09:17:30.4984391Z cp.async.cg.shared.global [ %r194 + 0 ], [ %rd22 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.4984909Z // end inline asm 2026-02-21T09:17:30.4985168Z cp.async.commit_group; 2026-02-21T09:17:30.4985673Z .loc 1 48 59 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:59 2026-02-21T09:17:30.4986247Z or.b32 %r253, %r245, %r19; 
2026-02-21T09:17:30.4986763Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.4987344Z mad.wide.s32 %rd23, %r253, 2, %rd15; 2026-02-21T09:17:30.4987661Z mov.b32 %r197, 8; 2026-02-21T09:17:30.4988133Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.4988737Z // begin inline asm 2026-02-21T09:17:30.4989115Z cp.async.ca.shared.global [ %r196 + 0 ], [ %rd23 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.4989549Z // end inline asm 2026-02-21T09:17:30.4989826Z cp.async.commit_group; 2026-02-21T09:17:30.4990342Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.4990917Z add.s64 %rd24, %rd6, 32; 2026-02-21T09:17:30.4991190Z cvt.u64.u32 %rd45, %r24; 2026-02-21T09:17:30.4991475Z or.b64 %rd46, %rd5, %rd45; 2026-02-21T09:17:30.4991798Z shl.b64 %rd47, %rd46, 1; 2026-02-21T09:17:30.4992082Z add.s64 %rd48, %rd14, %rd47; 2026-02-21T09:17:30.4992369Z add.s64 %rd25, %rd48, 262144; 2026-02-21T09:17:30.4992895Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.4993439Z bar.sync 0; 2026-02-21T09:17:30.4993670Z // begin inline asm 2026-02-21T09:17:30.4994044Z cp.async.cg.shared.global [ %r198 + 0 ], [ %rd24 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.4994467Z // end inline asm 2026-02-21T09:17:30.4994761Z // begin inline asm 2026-02-21T09:17:30.4995123Z cp.async.cg.shared.global [ %r200 + 0 ], [ %rd25 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.4995541Z // end inline asm 2026-02-21T09:17:30.4995790Z cp.async.commit_group; 2026-02-21T09:17:30.4996303Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.4996872Z add.s64 %rd26, %rd23, 32; 2026-02-21T09:17:30.4997377Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.4997932Z // begin inline asm 2026-02-21T09:17:30.4998295Z cp.async.ca.shared.global [ %r202 + 0 ], [ %rd26 + 
0 ], 0x8, %r197; 2026-02-21T09:17:30.4998714Z // end inline asm 2026-02-21T09:17:30.4998958Z cp.async.commit_group; 2026-02-21T09:17:30.4999469Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5000031Z add.s64 %rd27, %rd6, 64; 2026-02-21T09:17:30.5000305Z cvt.u64.u32 %rd49, %r28; 2026-02-21T09:17:30.5000588Z or.b64 %rd50, %rd5, %rd49; 2026-02-21T09:17:30.5000866Z shl.b64 %rd51, %rd50, 1; 2026-02-21T09:17:30.5001146Z add.s64 %rd52, %rd14, %rd51; 2026-02-21T09:17:30.5001433Z add.s64 %rd28, %rd52, 262144; 2026-02-21T09:17:30.5001958Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5002498Z bar.sync 0; 2026-02-21T09:17:30.5002742Z // begin inline asm 2026-02-21T09:17:30.5003114Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd27 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5003538Z // end inline asm 2026-02-21T09:17:30.5003787Z // begin inline asm 2026-02-21T09:17:30.5004149Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd28 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5004566Z // end inline asm 2026-02-21T09:17:30.5004854Z cp.async.commit_group; 2026-02-21T09:17:30.5005360Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5005970Z add.s64 %rd29, %rd23, 64; 2026-02-21T09:17:30.5006483Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5007034Z // begin inline asm 2026-02-21T09:17:30.5007449Z cp.async.ca.shared.global [ %r208 + 0 ], [ %rd29 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5007866Z // end inline asm 2026-02-21T09:17:30.5008111Z cp.async.commit_group; 2026-02-21T09:17:30.5008615Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5009175Z add.s64 %rd30, %rd6, 96; 2026-02-21T09:17:30.5009460Z cvt.u64.u32 %rd53, %r32; 2026-02-21T09:17:30.5009737Z or.b64 %rd54, %rd5, %rd53; 2026-02-21T09:17:30.5010022Z shl.b64 %rd55, %rd54, 1; 
2026-02-21T09:17:30.5010301Z add.s64 %rd56, %rd14, %rd55; 2026-02-21T09:17:30.5010592Z add.s64 %rd31, %rd56, 262144; 2026-02-21T09:17:30.5011160Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5011710Z bar.sync 0; 2026-02-21T09:17:30.5011951Z // begin inline asm 2026-02-21T09:17:30.5012323Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd30 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5012750Z // end inline asm 2026-02-21T09:17:30.5012988Z // begin inline asm 2026-02-21T09:17:30.5013358Z cp.async.cg.shared.global [ %r212 + 0 ], [ %rd31 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5013779Z // end inline asm 2026-02-21T09:17:30.5014074Z cp.async.commit_group; 2026-02-21T09:17:30.5014586Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5015185Z add.s64 %rd32, %rd23, 96; 2026-02-21T09:17:30.5015699Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5016250Z // begin inline asm 2026-02-21T09:17:30.5016625Z cp.async.ca.shared.global [ %r214 + 0 ], [ %rd32 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5017052Z // end inline asm 2026-02-21T09:17:30.5017298Z cp.async.commit_group; 2026-02-21T09:17:30.5017806Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5018377Z add.s64 %rd33, %rd6, 128; 2026-02-21T09:17:30.5018661Z cvt.u64.u32 %rd57, %r36; 2026-02-21T09:17:30.5018939Z or.b64 %rd58, %rd5, %rd57; 2026-02-21T09:17:30.5019229Z shl.b64 %rd59, %rd58, 1; 2026-02-21T09:17:30.5019503Z add.s64 %rd60, %rd14, %rd59; 2026-02-21T09:17:30.5019800Z add.s64 %rd34, %rd60, 262144; 2026-02-21T09:17:30.5020326Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5020867Z bar.sync 0; 2026-02-21T09:17:30.5021108Z // begin inline asm 2026-02-21T09:17:30.5021480Z cp.async.cg.shared.global [ %r216 + 0 ], [ %rd33 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5021909Z // end inline 
asm 2026-02-21T09:17:30.5022149Z // begin inline asm 2026-02-21T09:17:30.5022524Z cp.async.cg.shared.global [ %r218 + 0 ], [ %rd34 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5022938Z // end inline asm 2026-02-21T09:17:30.5023196Z cp.async.commit_group; 2026-02-21T09:17:30.5023715Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5024285Z add.s64 %rd35, %rd23, 128; 2026-02-21T09:17:30.5024850Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5025404Z // begin inline asm 2026-02-21T09:17:30.5025779Z cp.async.ca.shared.global [ %r220 + 0 ], [ %rd35 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5026198Z // end inline asm 2026-02-21T09:17:30.5026460Z cp.async.commit_group; 2026-02-21T09:17:30.5026958Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5027529Z add.s64 %rd36, %rd6, 160; 2026-02-21T09:17:30.5027825Z cvt.u64.u32 %rd61, %r40; 2026-02-21T09:17:30.5028101Z or.b64 %rd62, %rd5, %rd61; 2026-02-21T09:17:30.5028451Z shl.b64 %rd63, %rd62, 1; 2026-02-21T09:17:30.5028728Z add.s64 %rd64, %rd14, %rd63; 2026-02-21T09:17:30.5029041Z add.s64 %rd37, %rd64, 262144; 2026-02-21T09:17:30.5029565Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5030174Z bar.sync 0; 2026-02-21T09:17:30.5030406Z // begin inline asm 2026-02-21T09:17:30.5030787Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd36 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5031225Z // end inline asm 2026-02-21T09:17:30.5031473Z // begin inline asm 2026-02-21T09:17:30.5031852Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd37 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5032274Z // end inline asm 2026-02-21T09:17:30.5032531Z cp.async.commit_group; 2026-02-21T09:17:30.5033033Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5033663Z add.s64 %rd38, %rd23, 160; 2026-02-21T09:17:30.5034197Z .loc 1 48 
87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5034837Z // begin inline asm 2026-02-21T09:17:30.5035219Z cp.async.ca.shared.global [ %r226 + 0 ], [ %rd38 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5035647Z // end inline asm 2026-02-21T09:17:30.5035911Z cp.async.commit_group; 2026-02-21T09:17:30.5036416Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5037029Z add.s64 %rd39, %rd6, 192; 2026-02-21T09:17:30.5037316Z cvt.u64.u32 %rd65, %r44; 2026-02-21T09:17:30.5037606Z or.b64 %rd66, %rd5, %rd65; 2026-02-21T09:17:30.5037901Z shl.b64 %rd67, %rd66, 1; 2026-02-21T09:17:30.5038184Z add.s64 %rd68, %rd14, %rd67; 2026-02-21T09:17:30.5038488Z add.s64 %rd40, %rd68, 262144; 2026-02-21T09:17:30.5039017Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5039583Z bar.sync 0; 2026-02-21T09:17:30.5039816Z // begin inline asm 2026-02-21T09:17:30.5040199Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd39 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5040621Z // end inline asm 2026-02-21T09:17:30.5040869Z // begin inline asm 2026-02-21T09:17:30.5041249Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd40 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5041675Z // end inline asm 2026-02-21T09:17:30.5041937Z cp.async.commit_group; 2026-02-21T09:17:30.5042452Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5043031Z add.s64 %rd41, %rd23, 192; 2026-02-21T09:17:30.5043549Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5044121Z // begin inline asm 2026-02-21T09:17:30.5044489Z cp.async.ca.shared.global [ %r232 + 0 ], [ %rd41 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5044958Z // end inline asm 2026-02-21T09:17:30.5045220Z cp.async.commit_group; 2026-02-21T09:17:30.5045704Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5046253Z 
cp.async.wait_group 12; 2026-02-21T09:17:30.5046525Z bar.sync 0; 2026-02-21T09:17:30.5046993Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.5047557Z setp.ne.b32 %p10, %r72, 0; 2026-02-21T09:17:30.5047845Z @%p10 bra $L__BB0_4; 2026-02-21T09:17:30.5048181Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:17:30.5048585Z elect.sync %r258|%p12, -1; 2026-02-21T09:17:30.5048877Z mov.b32 %r255, 135266320; 2026-02-21T09:17:30.5049146Z mov.pred %p11, 0; 2026-02-21T09:17:30.5049417Z // begin inline asm 2026-02-21T09:17:30.5049830Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r570 + 0 ], %rd69, %rd70, %r255, %p11; 2026-02-21T09:17:30.5050301Z // end inline asm 2026-02-21T09:17:30.5050540Z // begin inline asm 2026-02-21T09:17:30.5050949Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r570 + 64 ], %rd71, %rd70, %r255, %p11; 2026-02-21T09:17:30.5051521Z // end inline asm 2026-02-21T09:17:30.5051766Z add.s32 %r260, %r86, 81920; 2026-02-21T09:17:30.5052060Z cvt.u64.u32 %rd73, %r260; 2026-02-21T09:17:30.5052418Z // begin inline asm 2026-02-21T09:17:30.5052806Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd73]; 2026-02-21T09:17:30.5053234Z // end inline asm 2026-02-21T09:17:30.5053561Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:17:30.5054168Z .loc 1 0 0 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:0 2026-02-21T09:17:30.5054780Z cvt.s32.s16 %r60, %rs7; 2026-02-21T09:17:30.5055060Z or.b32 %r63, %r240, %r8; 2026-02-21T09:17:30.5055326Z or.b32 %r64, %r242, %r10; 2026-02-21T09:17:30.5055602Z or.b32 %r65, %r242, %r11; 2026-02-21T09:17:30.5055864Z or.b32 %r66, %r242, %r12; 2026-02-21T09:17:30.5056179Z or.b32 %r67, %r242, %r13; 2026-02-21T09:17:30.5056440Z or.b32 %r68, %r242, %r14; 2026-02-21T09:17:30.5056705Z or.b32 %r69, %r242, %r15; 2026-02-21T09:17:30.5056957Z or.b32 %r70, %r242, %r16; 2026-02-21T09:17:30.5057223Z or.b32 %r71, %r242, %r17; 2026-02-21T09:17:30.5057729Z .loc 1 47 32 // 
csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5058289Z add.s64 %rd74, %rd6, 224; 2026-02-21T09:17:30.5058568Z cvt.u64.u32 %rd78, %r48; 2026-02-21T09:17:30.5058845Z add.s64 %rd79, %rd5, %rd78; 2026-02-21T09:17:30.5059180Z shl.b64 %rd80, %rd79, 1; 2026-02-21T09:17:30.5059449Z add.s64 %rd81, %rd14, %rd80; 2026-02-21T09:17:30.5059740Z add.s64 %rd75, %rd81, 262144; 2026-02-21T09:17:30.5060249Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5060814Z bar.sync 0; 2026-02-21T09:17:30.5061048Z // begin inline asm 2026-02-21T09:17:30.5061421Z cp.async.cg.shared.global [ %r261 + 0 ], [ %rd74 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5061861Z // end inline asm 2026-02-21T09:17:30.5062101Z // begin inline asm 2026-02-21T09:17:30.5062476Z cp.async.cg.shared.global [ %r263 + 0 ], [ %rd75 + 0 ], 0x10, %r262; 2026-02-21T09:17:30.5062889Z // end inline asm 2026-02-21T09:17:30.5063149Z cp.async.commit_group; 2026-02-21T09:17:30.5063652Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5064222Z add.s64 %rd76, %rd23, 224; 2026-02-21T09:17:30.5064798Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5065344Z // begin inline asm 2026-02-21T09:17:30.5065715Z cp.async.ca.shared.global [ %r265 + 0 ], [ %rd76 + 0 ], 0x8, %r197; 2026-02-21T09:17:30.5066120Z // end inline asm 2026-02-21T09:17:30.5066375Z cp.async.commit_group; 2026-02-21T09:17:30.5066874Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.5067441Z shl.b32 %r271, %r60, 18; 2026-02-21T09:17:30.5067720Z or.b32 %r272, %r56, %r271; 2026-02-21T09:17:30.5068011Z mad.wide.s32 %rd230, %r272, 2, %rd4; 2026-02-21T09:17:30.5068334Z add.s32 %r273, %r6, %r61; 2026-02-21T09:17:30.5068605Z add.s32 %r274, %r273, %r62; 2026-02-21T09:17:30.5068895Z shl.b32 %r275, %r274, 10; 2026-02-21T09:17:30.5069164Z or.b32 %r276, 
%r57, %r275; 2026-02-21T09:17:30.5069440Z cvt.u64.u32 %rd9, %r276; 2026-02-21T09:17:30.5069704Z mov.b32 %r576, 1; 2026-02-21T09:17:30.5069951Z mov.b32 %r575, 7; 2026-02-21T09:17:30.5070182Z mov.b64 %rd231, 0; 2026-02-21T09:17:30.5070433Z mov.b32 %r574, %r572; 2026-02-21T09:17:30.5070695Z mov.b32 %r577, %r572; 2026-02-21T09:17:30.5070945Z bra.uni $L__BB0_5; 2026-02-21T09:17:30.5071279Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:17:30.5071908Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.5072479Z setp.lt.u64 %p23, %rd231, 896; 2026-02-21T09:17:30.5073063Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.5073622Z // begin inline asm 2026-02-21T09:17:30.5073858Z 2026-02-21T09:17:30.5074058Z { 2026-02-21T09:17:30.5074333Z .reg .pred complete; 2026-02-21T09:17:30.5074586Z waitLoop: 2026-02-21T09:17:30.5074988Z mbarrier.try_wait.parity.shared.b64 complete, [%r573], %r572; 2026-02-21T09:17:30.5075427Z @!complete bra.uni waitLoop; 2026-02-21T09:17:30.5075711Z } 2026-02-21T09:17:30.5075825Z 2026-02-21T09:17:30.5075919Z // end inline asm 2026-02-21T09:17:30.5076170Z add.s32 %r305, %r576, 1; 2026-02-21T09:17:30.5076445Z setp.gt.s32 %p24, %r305, 1; 2026-02-21T09:17:30.5076743Z selp.b32 %r576, 0, %r305, %p24; 2026-02-21T09:17:30.5077041Z selp.b32 %r306, 1, 0, %p24; 2026-02-21T09:17:30.5077324Z xor.b32 %r83, %r577, %r306; 2026-02-21T09:17:30.5077890Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.5078446Z add.s32 %r307, %r575, 1; 2026-02-21T09:17:30.5078726Z setp.gt.s32 %p25, %r307, 7; 2026-02-21T09:17:30.5079015Z selp.b32 %r575, 0, %r307, %p25; 2026-02-21T09:17:30.5079556Z .loc 1 47 32 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:32 2026-02-21T09:17:30.5080134Z add.s64 %rd90, %rd230, -262144; 2026-02-21T09:17:30.5080706Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 
2026-02-21T09:17:30.5081266Z shl.b32 %r308, %r575, 13; 2026-02-21T09:17:30.5081543Z add.s32 %r310, %r86, %r308; 2026-02-21T09:17:30.5081823Z bar.sync 0; 2026-02-21T09:17:30.5082054Z add.s32 %r299, %r310, %r20; 2026-02-21T09:17:30.5082342Z selp.b32 %r300, 16, 0, %p23; 2026-02-21T09:17:30.5082623Z // begin inline asm 2026-02-21T09:17:30.5083006Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd90 + 0 ], 0x10, %r300; 2026-02-21T09:17:30.5083432Z // end inline asm 2026-02-21T09:17:30.5083681Z add.s32 %r301, %r299, 4096; 2026-02-21T09:17:30.5083958Z // begin inline asm 2026-02-21T09:17:30.5084326Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd230 + 0 ], 0x10, %r300; 2026-02-21T09:17:30.5084807Z // end inline asm 2026-02-21T09:17:30.5085051Z cp.async.commit_group; 2026-02-21T09:17:30.5085557Z .loc 1 48 34 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:34 2026-02-21T09:17:30.5086111Z add.s64 %rd93, %rd9, %rd231; 2026-02-21T09:17:30.5086405Z cvt.u32.u64 %r311, %rd93; 2026-02-21T09:17:30.5086692Z mad.wide.s32 %rd92, %r311, 2, %rd15; 2026-02-21T09:17:30.5087245Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5087814Z shl.b32 %r312, %r575, 11; 2026-02-21T09:17:30.5088094Z add.s32 %r303, %r196, %r312; 2026-02-21T09:17:30.5088387Z selp.b32 %r304, 8, 0, %p23; 2026-02-21T09:17:30.5088668Z // begin inline asm 2026-02-21T09:17:30.5089046Z cp.async.ca.shared.global [ %r303 + 0 ], [ %rd92 + 0 ], 0x8, %r304; 2026-02-21T09:17:30.5089467Z // end inline asm 2026-02-21T09:17:30.5089721Z cp.async.commit_group; 2026-02-21T09:17:30.5090219Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.5090778Z add.s64 %rd12, %rd231, 16; 2026-02-21T09:17:30.5091070Z add.s64 %rd230, %rd230, 32; 2026-02-21T09:17:30.5091357Z setp.lt.u64 %p26, %rd231, 992; 2026-02-21T09:17:30.5091657Z mov.b64 %rd231, %rd12; 2026-02-21T09:17:30.5091918Z mov.b32 %r572, %r577; 2026-02-21T09:17:30.5092178Z mov.b32 
%r573, %r313; 2026-02-21T09:17:30.5092425Z mov.b32 %r577, %r83; 2026-02-21T09:17:30.5092684Z @%p26 bra $L__BB0_5; 2026-02-21T09:17:30.5092930Z bra.uni $L__BB0_8; 2026-02-21T09:17:30.5093263Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:17:30.5093714Z // => This Inner Loop Header: Depth=2 2026-02-21T09:17:30.5094095Z add.s32 %r278, %r574, 1; 2026-02-21T09:17:30.5094443Z setp.gt.s32 %p17, %r278, 7; 2026-02-21T09:17:30.5094774Z selp.b32 %r574, 0, %r278, %p17; 2026-02-21T09:17:30.5095313Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5095934Z cp.async.wait_group 12; 2026-02-21T09:17:30.5096212Z bar.sync 0; 2026-02-21T09:17:30.5096666Z .loc 1 42 89 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:42:89 2026-02-21T09:17:30.5097232Z shl.b32 %r279, %r576, 3; 2026-02-21T09:17:30.5097509Z add.s32 %r281, %r86, %r279; 2026-02-21T09:17:30.5097788Z add.s32 %r313, %r281, 81920; 2026-02-21T09:17:30.5098299Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.5098848Z @%p10 bra $L__BB0_7; 2026-02-21T09:17:30.5099185Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:17:30.5099835Z .loc 1 48 87 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:48:87 2026-02-21T09:17:30.5100397Z shl.b32 %r286, %r574, 11; 2026-02-21T09:17:30.5100686Z add.s32 %r288, %r86, %r286; 2026-02-21T09:17:30.5100961Z add.s32 %r289, %r288, 65536; 2026-02-21T09:17:30.5101468Z .loc 1 47 85 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:47:85 2026-02-21T09:17:30.5102020Z shl.b32 %r290, %r574, 13; 2026-02-21T09:17:30.5102302Z add.s32 %r291, %r86, %r290; 2026-02-21T09:17:30.5102843Z .loc 1 49 52 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:49:52 2026-02-21T09:17:30.5103422Z elect.sync %r292|%p19, -1; 2026-02-21T09:17:30.5103716Z bfe.u32 %r293, %r291, 4, 14; 2026-02-21T09:17:30.5104008Z cvt.u64.u32 %rd87, %r293; 2026-02-21T09:17:30.5104313Z or.b64 %rd82, %rd87, 
-4611685949674356736; 2026-02-21T09:17:30.5104643Z bfe.u32 %r294, %r289, 4, 14; 2026-02-21T09:17:30.5104981Z cvt.u64.u32 %rd88, %r294; 2026-02-21T09:17:30.5105276Z or.b64 %rd83, %rd88, -4611685949699522560; 2026-02-21T09:17:30.5105608Z mov.b32 %r283, 135266320; 2026-02-21T09:17:30.5105875Z mov.pred %p18, -1; 2026-02-21T09:17:30.5106141Z // begin inline asm 2026-02-21T09:17:30.5106563Z @%p19 tcgen05.mma.cta_group::1.kind::f16 [ %r570 + 0 ], %rd82, %rd83, %r283, %p18; 2026-02-21T09:17:30.5107043Z // end inline asm 2026-02-21T09:17:30.5107297Z add.s32 %r295, %r291, 4096; 2026-02-21T09:17:30.5107578Z bfe.u32 %r296, %r295, 4, 14; 2026-02-21T09:17:30.5107863Z cvt.u64.u32 %rd89, %r296; 2026-02-21T09:17:30.5108154Z or.b64 %rd84, %rd89, -4611685949674356736; 2026-02-21T09:17:30.5108475Z // begin inline asm 2026-02-21T09:17:30.5108879Z @%p19 tcgen05.mma.cta_group::1.kind::f16 [ %r570 + 64 ], %rd84, %rd83, %r283, %p18; 2026-02-21T09:17:30.5109355Z // end inline asm 2026-02-21T09:17:30.5109593Z cvt.u64.u32 %rd86, %r313; 2026-02-21T09:17:30.5109867Z // begin inline asm 2026-02-21T09:17:30.5110255Z @%p19 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T09:17:30.5110683Z // end inline asm 2026-02-21T09:17:30.5110926Z bra.uni $L__BB0_7; 2026-02-21T09:17:30.5111215Z $L__BB0_9: // %._crit_edge 2026-02-21T09:17:30.5111803Z .loc 1 23 4 // csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py:23:4 2026-02-21T09:17:30.5112353Z bar.sync 0; 2026-02-21T09:17:30.5112591Z // begin inline asm 2026-02-21T09:17:30.5112951Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r570, 128; 2026-02-21T09:17:30.5113370Z // end inline asm 2026-02-21T09:17:30.5113601Z ret; 2026-02-21T09:17:30.5113800Z $L__tmp0: 2026-02-21T09:17:30.5114021Z $L__func_end0: 2026-02-21T09:17:30.5114293Z // -- End function 2026-02-21T09:17:30.5114630Z } 2026-02-21T09:17:30.5115202Z .file 1 "/tmp/torchinductor_root/sb/csbwgh7wkzhsnogvhkyswe2ftawptflnamjiwy2paxwxuexbk4hf.py" 
2026-02-21T09:17:30.5115833Z .section .debug_abbrev 2026-02-21T09:17:30.5116081Z { 2026-02-21T09:17:30.5116349Z .b8 1 // Abbreviation Code 2026-02-21T09:17:30.5116829Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:17:30.5117219Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:17:30.5117608Z .b8 37 // DW_AT_producer 2026-02-21T09:17:30.5118037Z .b8 8 // DW_FORM_string 2026-02-21T09:17:30.5118417Z .b8 19 // DW_AT_language 2026-02-21T09:17:30.5118792Z .b8 5 // DW_FORM_data2 2026-02-21T09:17:30.5119165Z .b8 3 // DW_AT_name 2026-02-21T09:17:30.5119538Z .b8 8 // DW_FORM_string 2026-02-21T09:17:30.5119914Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:17:30.5120298Z .b8 6 // DW_FORM_data4 2026-02-21T09:17:30.5120712Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:17:30.5121094Z .b8 8 // DW_FORM_string 2026-02-21T09:17:30.5121453Z .b8 0 // EOM(1) 2026-02-21T09:17:30.5121805Z .b8 0 // EOM(2) 2026-02-21T09:17:30.5122146Z .b8 0 // EOM(3) 2026-02-21T09:17:30.5122468Z } 2026-02-21T09:17:30.5122708Z .section .debug_info 2026-02-21T09:17:30.5122954Z { 2026-02-21T09:17:30.5123209Z .b32 104 // Length of Unit 2026-02-21T09:17:30.5123652Z .b8 2 // DWARF version number 2026-02-21T09:17:30.5124004Z .b8 0 2026-02-21T09:17:30.5124319Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:17:30.5124846Z .b8 8 // Address Size (in bytes) 2026-02-21T09:17:30.5125282Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:17:30.5125727Z .b8 116 // DW_AT_producer 2026-02-21T09:17:30.5126073Z .b8 114 2026-02-21T09:17:30.5126270Z .b8 105 2026-02-21T09:17:30.5126477Z .b8 116 2026-02-21T09:17:30.5126666Z .b8 111 2026-02-21T09:17:30.5126863Z .b8 110 2026-02-21T09:17:30.5127053Z .b8 0 2026-02-21T09:17:30.5127303Z .b8 2 // DW_AT_language 2026-02-21T09:17:30.5127619Z .b8 0 2026-02-21T09:17:30.5127867Z .b8 99 // DW_AT_name 2026-02-21T09:17:30.5128185Z .b8 115 2026-02-21T09:17:30.5128381Z .b8 98 2026-02-21T09:17:30.5128468Z .b8 119 2026-02-21T09:17:30.5128558Z .b8 103 2026-02-21T09:17:30.5128642Z .b8 104 2026-02-21T09:17:30.5128724Z .b8 55 2026-02-21T09:17:30.5128805Z .b8 119 2026-02-21T09:17:30.5128894Z .b8 107 2026-02-21T09:17:30.5128976Z .b8 122 2026-02-21T09:17:30.5129055Z .b8 104 2026-02-21T09:17:30.5129141Z .b8 115 2026-02-21T09:17:30.5129220Z .b8 110 2026-02-21T09:17:30.5129301Z .b8 111 2026-02-21T09:17:30.5129382Z .b8 103 2026-02-21T09:17:30.5129475Z .b8 118 2026-02-21T09:17:30.5129556Z .b8 104 2026-02-21T09:17:30.5129640Z .b8 107 2026-02-21T09:17:30.5129720Z .b8 121 2026-02-21T09:17:30.5129812Z .b8 115 2026-02-21T09:17:30.5129892Z .b8 119 2026-02-21T09:17:30.5129973Z .b8 101 2026-02-21T09:17:30.5130062Z .b8 50 2026-02-21T09:17:30.5130142Z .b8 102 2026-02-21T09:17:30.5130225Z .b8 116 2026-02-21T09:17:30.5130308Z .b8 97 2026-02-21T09:17:30.5130397Z .b8 119 2026-02-21T09:17:30.5130479Z .b8 112 2026-02-21T09:17:30.5130560Z .b8 116 2026-02-21T09:17:30.5130649Z .b8 102 2026-02-21T09:17:30.5130731Z .b8 108 2026-02-21T09:17:30.5130814Z .b8 110 2026-02-21T09:17:30.5130893Z .b8 97 2026-02-21T09:17:30.5130986Z .b8 109 2026-02-21T09:17:30.5131065Z .b8 106 2026-02-21T09:17:30.5131145Z .b8 105 2026-02-21T09:17:30.5131236Z .b8 119 2026-02-21T09:17:30.5131317Z .b8 121 2026-02-21T09:17:30.5131397Z .b8 50 2026-02-21T09:17:30.5131478Z .b8 112 
2026-02-21T09:17:30.5131569Z .b8 97 2026-02-21T09:17:30.5131648Z .b8 120 2026-02-21T09:17:30.5131730Z .b8 119 2026-02-21T09:17:30.5131814Z .b8 120 2026-02-21T09:17:30.5132010Z .b8 117 2026-02-21T09:17:30.5132094Z .b8 101 2026-02-21T09:17:30.5132177Z .b8 120 2026-02-21T09:17:30.5132267Z .b8 98 2026-02-21T09:17:30.5132349Z .b8 107 2026-02-21T09:17:30.5132430Z .b8 52 2026-02-21T09:17:30.5132511Z .b8 104 2026-02-21T09:17:30.5132605Z .b8 102 2026-02-21T09:17:30.5132735Z .b8 46 2026-02-21T09:17:30.5132816Z .b8 112 2026-02-21T09:17:30.5132908Z .b8 121 2026-02-21T09:17:30.5132989Z .b8 0 2026-02-21T09:17:30.5133172Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:17:30.5133322Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:17:30.5133411Z .b8 116 2026-02-21T09:17:30.5133494Z .b8 109 2026-02-21T09:17:30.5133575Z .b8 112 2026-02-21T09:17:30.5133664Z .b8 47 2026-02-21T09:17:30.5133748Z .b8 116 2026-02-21T09:17:30.5133832Z .b8 111 2026-02-21T09:17:30.5133913Z .b8 114 2026-02-21T09:17:30.5134003Z .b8 99 2026-02-21T09:17:30.5134083Z .b8 104 2026-02-21T09:17:30.5134164Z .b8 105 2026-02-21T09:17:30.5134285Z .b8 110 2026-02-21T09:17:30.5134374Z .b8 100 2026-02-21T09:17:30.5134455Z .b8 117 2026-02-21T09:17:30.5134536Z .b8 99 2026-02-21T09:17:30.5134623Z .b8 116 2026-02-21T09:17:30.5134758Z .b8 111 2026-02-21T09:17:30.5134839Z .b8 114 2026-02-21T09:17:30.5134917Z .b8 95 2026-02-21T09:17:30.5135007Z .b8 114 2026-02-21T09:17:30.5135092Z .b8 111 2026-02-21T09:17:30.5135172Z .b8 111 2026-02-21T09:17:30.5135261Z .b8 116 2026-02-21T09:17:30.5135341Z .b8 47 2026-02-21T09:17:30.5135421Z .b8 115 2026-02-21T09:17:30.5135500Z .b8 98 2026-02-21T09:17:30.5135591Z .b8 0 2026-02-21T09:17:30.5135715Z } 2026-02-21T09:17:30.5135833Z .section .debug_macinfo { } 2026-02-21T09:17:30.5135842Z 2026-02-21T09:17:30.5135982Z ================================================================ 2026-02-21T09:17:30.5136170Z please share the reproducer above with Triton project. 
2026-02-21T09:17:33.1190666Z 2026-02-21T09:17:33.1195707Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━ 100/100 12.9 configs/s 2026-02-21T09:17:33.1201173Z [15s] Adaptive compile timeout: 30s (90% percentile=6.0s, bounds=[30.0s, 30s]) 2026-02-21T09:17:33.6281454Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 1829.6 configs/s 2026-02-21T09:17:33.6878728Z [16s] Initial random population of 100, 5 starting points: 2026-02-21T09:17:33.6879043Z error=16 2026-02-21T09:17:33.6879254Z ok=84 2026-02-21T09:17:33.6879392Z min=0.1127 2026-02-21T09:17:33.6879542Z mid=1.5125 2026-02-21T09:17:33.6879677Z max=44.5880 2026-02-21T09:17:33.6879843Z best={'block_sizes': [16, 128, 32], 2026-02-21T09:17:33.6880120Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:17:33.6880396Z 'l2_groupings': [64], 2026-02-21T09:17:33.6880593Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:17:33.6880822Z 'loop_orders': [[1, 0]], 2026-02-21T09:17:33.6881005Z 'maxnreg': 128, 2026-02-21T09:17:33.6881169Z 'num_sm_multiplier': 16, 2026-02-21T09:17:33.6881349Z 'num_stages': 4, 2026-02-21T09:17:33.6881505Z 'num_warps': 1, 2026-02-21T09:17:33.6881697Z 'pid_type': 'persistent_blocked', 2026-02-21T09:17:33.6881901Z 'range_flattens': [True, True], 2026-02-21T09:17:33.6882097Z 'range_multi_buffers': [True, False], 2026-02-21T09:17:33.6882290Z 'range_num_stages': [0, 0], 2026-02-21T09:17:33.6882474Z 'range_unroll_factors': [0, 0], 2026-02-21T09:17:33.6882669Z 'range_warp_specializes': [False, False]} 2026-02-21T09:17:33.6897631Z [16s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:17:35.1379460Z [17s] Generation 1 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:17:41.9445567Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90/90 24.4 configs/s 2026-02-21T09:17:47.1587579Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 90/90 17.4 configs/s 2026-02-21T09:17:47.3353995Z Generation 1: verifying top 
configs 100% ━━━━━━━━━━━━━ 1000/1000 5160.4 2026-02-21T09:17:47.3355191Z configs/s 2026-02-21T09:17:47.3682660Z [30s] Generation 1 complete: 2026-02-21T09:17:47.3683363Z ok=92 2026-02-21T09:17:47.3683954Z min=0.0492 2026-02-21T09:17:47.3684136Z mid=0.1986 2026-02-21T09:17:47.3684276Z max=1.6353 2026-02-21T09:17:47.3684424Z best={'block_sizes': [64, 128, 32], 2026-02-21T09:17:47.3684795Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:17:47.3685341Z 'l2_groupings': [64], 2026-02-21T09:17:47.3685537Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:17:47.3685762Z 'loop_orders': [[1, 0]], 2026-02-21T09:17:47.3685948Z 'maxnreg': 128, 2026-02-21T09:17:47.3686123Z 'num_sm_multiplier': 16, 2026-02-21T09:17:47.3686284Z 'num_stages': 4, 2026-02-21T09:17:47.3686437Z 'num_warps': 1, 2026-02-21T09:17:47.3686595Z 'pid_type': 'persistent_blocked', 2026-02-21T09:17:47.3700279Z 'range_flattens': [False, True], 2026-02-21T09:17:47.3700522Z 'range_multi_buffers': [True, False], 2026-02-21T09:17:47.3700737Z 'range_num_stages': [0, 0], 2026-02-21T09:17:47.3701136Z 'range_unroll_factors': [0, 0], 2026-02-21T09:17:47.3701350Z 'range_warp_specializes': [True, None]} 2026-02-21T09:17:47.3701585Z [30s] Fitting surrogate: 192 points, 192 targets 2026-02-21T09:17:49.0593712Z [31s] Generation 2 starting: 80 neighbors, 5 active search path(s) 2026-02-21T09:17:54.3282257Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 17.8 configs/s 2026-02-21T09:17:58.8937006Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 18.6 configs/s 2026-02-21T09:18:00.6551331Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 568.3 2026-02-21T09:18:00.6552226Z configs/s 2026-02-21T09:18:00.7578668Z [43s] Generation 2 complete: 2026-02-21T09:18:00.7583768Z error=5 2026-02-21T09:18:00.7589197Z ok=81 2026-02-21T09:18:00.7591664Z min=0.0492 2026-02-21T09:18:00.7591888Z mid=0.1065 2026-02-21T09:18:00.7596904Z max=1.1966 
2026-02-21T09:18:00.7601679Z best={'block_sizes': [64, 128, 32], 2026-02-21T09:18:00.7605319Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:18:00.7609880Z 'l2_groupings': [64], 2026-02-21T09:18:00.7611359Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:18:00.7611590Z 'loop_orders': [[1, 0]], 2026-02-21T09:18:00.7611762Z 'maxnreg': 128, 2026-02-21T09:18:00.7611925Z 'num_sm_multiplier': 16, 2026-02-21T09:18:00.7612088Z 'num_stages': 4, 2026-02-21T09:18:00.7612223Z 'num_warps': 1, 2026-02-21T09:18:00.7612388Z 'pid_type': 'persistent_blocked', 2026-02-21T09:18:00.7612586Z 'range_flattens': [False, True], 2026-02-21T09:18:00.7612763Z 'range_multi_buffers': [True, False], 2026-02-21T09:18:00.7612948Z 'range_num_stages': [0, 0], 2026-02-21T09:18:00.7613112Z 'range_unroll_factors': [0, 0], 2026-02-21T09:18:00.7613295Z 'range_warp_specializes': [True, None]} 2026-02-21T09:18:00.7613584Z [43s] Fitting surrogate: 278 points, 278 targets 2026-02-21T09:18:02.0525263Z [44s] Generation 3 starting: 82 neighbors, 5 active search path(s) 2026-02-21T09:18:08.2827248Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 14.5 configs/s 2026-02-21T09:18:12.8881439Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 18.7 configs/s 2026-02-21T09:18:15.2445688Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 427.9 2026-02-21T09:18:15.2450815Z configs/s 2026-02-21T09:18:15.3907122Z [58s] Generation 3 complete: 2026-02-21T09:18:15.3908960Z error=8 2026-02-21T09:18:15.3909110Z ok=80 2026-02-21T09:18:15.3909276Z min=0.0348 2026-02-21T09:18:15.3909403Z mid=0.0716 2026-02-21T09:18:15.3909531Z max=3.2451 2026-02-21T09:18:15.3909702Z best={'block_sizes': [128, 128, 32], 2026-02-21T09:18:15.3909941Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:18:15.3910166Z 'l2_groupings': [16], 2026-02-21T09:18:15.3910335Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:18:15.3910527Z 
'loop_orders': [[1, 0]], 2026-02-21T09:18:15.3910684Z 'num_stages': 6, 2026-02-21T09:18:15.3911117Z 'num_warps': 4, 2026-02-21T09:18:15.3911258Z 'pid_type': 'flat', 2026-02-21T09:18:15.3911423Z 'range_flattens': [None, True], 2026-02-21T09:18:15.3911607Z 'range_multi_buffers': [None, True], 2026-02-21T09:18:15.3911786Z 'range_num_stages': [0, 0], 2026-02-21T09:18:15.3911965Z 'range_unroll_factors': [0, 0], 2026-02-21T09:18:15.3912140Z 'range_warp_specializes': [None, False]} 2026-02-21T09:18:15.3927119Z [58s] Fitting surrogate: 366 points, 366 targets 2026-02-21T09:18:16.7117576Z [59s] Generation 4 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:18:28.5164650Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 1.4 configs/s 2026-02-21T09:18:29.7936986Z 2026-02-21T09:18:29.7940820Z 2026-02-21T09:18:29.7942693Z ================================================================ 2026-02-21T09:18:29.7942989Z Internal Triton PTX codegen error 2026-02-21T09:18:29.7943166Z `ptxas` stderr: 2026-02-21T09:18:29.7943935Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 197 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:18:29.7944449Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:18:29.7944604Z 2026-02-21T09:18:29.7945095Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpk0pebrgw.ptx -o /tmp/tmpk0pebrgw.ptx.o 2026-02-21T09:18:29.7945551Z 2026-02-21T09:18:29.7945555Z 2026-02-21T09:18:29.7945626Z // 2026-02-21T09:18:29.7945766Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:18:29.7945939Z // 2026-02-21T09:18:29.7946005Z 2026-02-21T09:18:29.7946060Z .version 8.7 2026-02-21T09:18:29.7946199Z .target sm_100a 2026-02-21T09:18:29.7946330Z .address_size 64 2026-02-21T09:18:29.7946421Z 2026-02-21T09:18:29.7946539Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:18:29.7946795Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:18:29.7947001Z // @_helion_matmul 2026-02-21T09:18:29.7947206Z .visible .entry _helion_matmul( 2026-02-21T09:18:29.7947418Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:18:29.7947675Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:18:29.7947924Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:18:29.7948186Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:18:29.7948446Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:18:29.7948650Z ) 2026-02-21T09:18:29.7948788Z .reqntid 128 2026-02-21T09:18:29.7948914Z .maxnreg 32 2026-02-21T09:18:29.7949038Z { 2026-02-21T09:18:29.7949159Z .reg .pred %p<88>; 2026-02-21T09:18:29.7949308Z .reg .b32 %r<1572>; 2026-02-21T09:18:29.7949448Z .reg .b64 %rd<631>; 2026-02-21T09:18:29.7949590Z $L__func_begin0: 2026-02-21T09:18:29.7949669Z 2026-02-21T09:18:29.7949737Z // %bb.0: 2026-02-21T09:18:29.7949990Z .loc 1 19 0 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:19 2026-02-21T09:18:29.7950283Z mov.u32 %r1, %tid.x; 2026-02-21T09:18:29.7950452Z 
ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T09:18:29.7950655Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:18:29.7950944Z mov.b32 %r76, global_smem; 2026-02-21T09:18:29.7951541Z [72s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:18:29.7952814Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=4, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:18:29.7953998Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:18:29.7954324Z `ptxas` stderr: 2026-02-21T09:18:29.7954789Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 197 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:18:29.7955278Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:18:29.7955436Z 2026-02-21T09:18:29.7955896Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpk0pebrgw.ptx -o /tmp/tmpk0pebrgw.ptx.o 2026-02-21T09:18:29.7956367Z 2026-02-21T09:18:29.7956507Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:18:29.7956754Z // begin inline asm 2026-02-21T09:18:29.7957013Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r76], 256; 2026-02-21T09:18:29.7957262Z // end inline asm 2026-02-21T09:18:29.7957476Z ld.param.b64 %rd29, [_helion_matmul_param_3]; 2026-02-21T09:18:29.7957669Z bar.sync 0; 2026-02-21T09:18:29.7957828Z ld.shared.b32 %r1564, [global_smem]; 2026-02-21T09:18:29.7958002Z bar.sync 0; 2026-02-21T09:18:29.7958144Z // begin inline asm 2026-02-21T09:18:29.7958360Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:18:29.7958590Z // end inline asm 2026-02-21T09:18:29.7958860Z .loc 1 21 67 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:21:67 2026-02-21T09:18:29.7959162Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:18:29.7959338Z mov.u32 %r85, %ctaid.y; 2026-02-21T09:18:29.7959495Z mov.u32 %r86, %ctaid.z; 2026-02-21T09:18:29.7959670Z mov.u32 %r87, %nctaid.x; 2026-02-21T09:18:29.7959835Z mov.u32 %r88, %nctaid.y; 2026-02-21T09:18:29.7960002Z mad.lo.s32 %r89, %r86, %r88, %r85; 2026-02-21T09:18:29.7960189Z mad.lo.s32 %r90, %r89, %r87, %r3; 2026-02-21T09:18:29.7960358Z shl.b32 %r91, %r90, 7; 2026-02-21T09:18:29.7960516Z cvt.s64.s32 %rd30, %r91; 2026-02-21T09:18:29.7960674Z add.s64 %rd26, %rd29, %rd30; 2026-02-21T09:18:29.7960843Z shl.b32 %r92, %r1, 2; 2026-02-21T09:18:29.7960993Z add.s32 %r77, %r76, %r92; 2026-02-21T09:18:29.7961148Z mov.b32 %r94, 0; 2026-02-21T09:18:29.7961285Z // begin inline asm 2026-02-21T09:18:29.7961447Z @%p1 st.shared.b32 [ %r77 + 0 ], %r94; 2026-02-21T09:18:29.7961638Z // end inline asm 2026-02-21T09:18:29.7961772Z bar.warp.sync -1; 2026-02-21T09:18:29.7961922Z setp.eq.b32 %p81, %r1, 0; 2026-02-21T09:18:29.7962073Z cvt.u64.u32 %rd11, %r76; 2026-02-21T09:18:29.7962226Z // begin inline asm 2026-02-21T09:18:29.7962469Z @%p81 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:18:29.7962745Z // end inline asm 2026-02-21T09:18:29.7962873Z // begin 
inline asm 2026-02-21T09:18:29.7963097Z @%p81 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:18:29.7963346Z // end inline asm 2026-02-21T09:18:29.7963475Z mov.b32 %r79, 32; 2026-02-21T09:18:29.7963612Z // begin inline asm 2026-02-21T09:18:29.7963840Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r79; 2026-02-21T09:18:29.7964106Z // end inline asm 2026-02-21T09:18:29.7964236Z mov.b32 %r80, 256; 2026-02-21T09:18:29.7964374Z // begin inline asm 2026-02-21T09:18:29.7964637Z @%p81 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r80; 2026-02-21T09:18:29.7964943Z // end inline asm 2026-02-21T09:18:29.7965079Z mov.b32 %r81, 1024; 2026-02-21T09:18:29.7965216Z // begin inline asm 2026-02-21T09:18:29.7965459Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r81; 2026-02-21T09:18:29.7965726Z // end inline asm 2026-02-21T09:18:29.7965864Z mov.b32 %r82, 8192; 2026-02-21T09:18:29.7965997Z // begin inline asm 2026-02-21T09:18:29.7966238Z @%p81 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r82; 2026-02-21T09:18:29.7966509Z // end inline asm 2026-02-21T09:18:29.7966648Z mov.b64 %rd19, 2048; 2026-02-21T09:18:29.7966830Z // begin inline asm 2026-02-21T09:18:29.7967075Z @%p81 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:18:29.7967358Z // end inline asm 2026-02-21T09:18:29.7967486Z mov.b32 %r83, 1; 2026-02-21T09:18:29.7967620Z // begin inline asm 2026-02-21T09:18:29.7967868Z @%p81 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r83; 2026-02-21T09:18:29.7968156Z // end inline asm 2026-02-21T09:18:29.7968287Z // begin inline asm 2026-02-21T09:18:29.7968565Z @%p81 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r83; 2026-02-21T09:18:29.7968847Z // end inline asm 2026-02-21T09:18:29.7968977Z // begin inline asm 
2026-02-21T09:18:29.7969209Z @%p81 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:18:29.7969466Z // end inline asm 2026-02-21T09:18:29.7969598Z // begin inline asm 2026-02-21T09:18:29.7969869Z @%p81 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:18:29.7970157Z // end inline asm 2026-02-21T09:18:29.7970292Z // begin inline asm 2026-02-21T09:18:29.7970518Z @%p81 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x2; 2026-02-21T09:18:29.7970784Z // end inline asm 2026-02-21T09:18:29.7970913Z // begin inline asm 2026-02-21T09:18:29.7971138Z @%p81 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:18:29.7971393Z // end inline asm 2026-02-21T09:18:29.7971531Z // begin inline asm 2026-02-21T09:18:29.7971880Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:18:29.7972252Z // end inline asm 2026-02-21T09:18:29.7972390Z // begin inline asm 2026-02-21T09:18:29.7972593Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T09:18:29.7972852Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:18:29.7973040Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:18:29.7973218Z // end inline asm 2026-02-21T09:18:29.7973347Z bar.sync 0; 2026-02-21T09:18:29.7973491Z cvta.global.u64 %rd61, %rd26; 2026-02-21T09:18:29.7973770Z .loc 1 30 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:30:52 2026-02-21T09:18:29.7974060Z setp.gt.u32 %p21, %r3, 255; 2026-02-21T09:18:29.7974225Z @%p21 bra $L__BB0_8; 2026-02-21T09:18:29.7974382Z // %bb.1: // %.lr.ph 2026-02-21T09:18:29.7974749Z .loc 1 0 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:0:52 2026-02-21T09:18:29.7975060Z ld.param.b64 %rd9, [_helion_matmul_param_0]; 2026-02-21T09:18:29.7975370Z .loc 1 44 45 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:44:45 2026-02-21T09:18:29.7975658Z shl.b32 %r412, %r1, 3; 2026-02-21T09:18:29.7975919Z .loc 1 50 48 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:50:48 2026-02-21T09:18:29.7976210Z and.b32 %r413, %r412, 24; 2026-02-21T09:18:29.7976467Z .loc 1 44 45 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:44:45 2026-02-21T09:18:29.7976758Z and.b32 %r414, %r412, 248; 2026-02-21T09:18:29.7977020Z .loc 1 42 45 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:42:45 2026-02-21T09:18:29.7977345Z and.b32 %r415, %r1, 96; 2026-02-21T09:18:29.7977505Z bfe.u32 %r416, %r1, 5, 2; 2026-02-21T09:18:29.7977654Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:18:29.7977806Z shr.u32 %r417, %r1, 5; 2026-02-21T09:18:29.7977948Z shl.b32 %r418, %r1, 4; 2026-02-21T09:18:29.7978100Z and.b32 %r419, %r418, 2032; 2026-02-21T09:18:29.7978251Z shl.b32 %r420, %r1, 1; 2026-02-21T09:18:29.7978401Z and.b32 %r421, %r420, 48; 2026-02-21T09:18:29.7978548Z xor.b32 %r5, %r419, %r421; 2026-02-21T09:18:29.7978710Z or.b32 %r6, %r413, 96; 2026-02-21T09:18:29.7978855Z add.s32 %r422, %r76, %r5; 2026-02-21T09:18:29.7979021Z add.s32 %r470, %r422, 90112; 2026-02-21T09:18:29.7979237Z add.s32 %r472, %r422, 92160; 2026-02-21T09:18:29.7979386Z add.s32 %r474, %r422, 94208; 2026-02-21T09:18:29.7979542Z add.s32 %r476, %r422, 96256; 2026-02-21T09:18:29.7979687Z and.b32 %r423, %r1, 7; 2026-02-21T09:18:29.7979835Z shl.b32 %r424, %r423, 11; 2026-02-21T09:18:29.7979980Z or.b32 %r425, %r424, %r419; 2026-02-21T09:18:29.7980135Z xor.b32 %r426, %r425, 16; 2026-02-21T09:18:29.7980274Z xor.b32 %r427, %r425, 32; 2026-02-21T09:18:29.7980450Z xor.b32 %r428, %r425, 48; 2026-02-21T09:18:29.7980594Z xor.b32 %r429, %r425, 64; 2026-02-21T09:18:29.7980744Z xor.b32 %r430, %r425, 80; 2026-02-21T09:18:29.7980890Z xor.b32 %r431, %r425, 96; 2026-02-21T09:18:29.7981032Z xor.b32 %r432, %r425, 112; 2026-02-21T09:18:29.7981190Z shl.b32 %r433, %r415, 6; 
2026-02-21T09:18:29.7981334Z shl.b32 %r434, %r423, 4; 2026-02-21T09:18:29.7981484Z shr.u32 %r435, %r415, 1; 2026-02-21T09:18:29.7981655Z bfe.s32 %r436, %r1, 3, 1; 2026-02-21T09:18:29.7981812Z and.b32 %r437, %r436, 8256; 2026-02-21T09:18:29.7981961Z and.b32 %r438, %r412, 128; 2026-02-21T09:18:29.7982117Z or.b32 %r439, %r433, %r434; 2026-02-21T09:18:29.7982271Z or.b32 %r440, %r437, %r435; 2026-02-21T09:18:29.7982420Z xor.b32 %r441, %r440, %r439; 2026-02-21T09:18:29.7982576Z add.s32 %r442, %r76, %r438; 2026-02-21T09:18:29.7982723Z add.s32 %r831, %r442, %r441; 2026-02-21T09:18:29.7982878Z add.s32 %r397, %r422, 81920; 2026-02-21T09:18:29.7983024Z add.s32 %r403, %r422, 88064; 2026-02-21T09:18:29.7983177Z add.s32 %r401, %r422, 86016; 2026-02-21T09:18:29.7983320Z add.s32 %r399, %r422, 83968; 2026-02-21T09:18:29.7983470Z add.s32 %r384, %r422, 73728; 2026-02-21T09:18:29.7983614Z add.s32 %r390, %r422, 79872; 2026-02-21T09:18:29.7983767Z add.s32 %r388, %r422, 77824; 2026-02-21T09:18:29.7983921Z add.s32 %r386, %r422, 75776; 2026-02-21T09:18:29.7984065Z add.s32 %r371, %r422, 65536; 2026-02-21T09:18:29.7984217Z add.s32 %r377, %r422, 71680; 2026-02-21T09:18:29.7984361Z add.s32 %r375, %r422, 69632; 2026-02-21T09:18:29.7984515Z add.s32 %r373, %r422, 67584; 2026-02-21T09:18:29.7984810Z .loc 1 41 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:41:27 2026-02-21T09:18:29.7985099Z shl.b32 %r443, %r3, 7; 2026-02-21T09:18:29.7985244Z and.b32 %r444, %r443, 896; 2026-02-21T09:18:29.7985508Z .loc 1 42 32 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:42:32 2026-02-21T09:18:29.7985790Z or.b32 %r445, %r444, %r4; 2026-02-21T09:18:29.7985939Z or.b32 %r27, %r444, %r416; 2026-02-21T09:18:29.7986201Z .loc 1 43 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:43:27 2026-02-21T09:18:29.7986477Z shl.b32 %r446, %r3, 5; 2026-02-21T09:18:29.7986629Z and.b32 %r481, %r446, 7936; 2026-02-21T09:18:29.7986881Z .loc 1 54 53 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:53 2026-02-21T09:18:29.7987166Z shl.b32 %r447, %r445, 10; 2026-02-21T09:18:29.7987430Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.7987730Z shfl.sync.idx.b32 %r61, %r417, 0, 31, -1; 2026-02-21T09:18:29.7987923Z shl.b32 %r448, %r61, 21; 2026-02-21T09:18:29.7988078Z and.b32 %r449, %r448, 6291456; 2026-02-21T09:18:29.7988292Z add.s32 %r826, %r449, %r1564; 2026-02-21T09:18:29.7988447Z mov.pred %p22, -1; 2026-02-21T09:18:29.7988592Z // begin inline asm 2026-02-21T09:18:29.7988921Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 0], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7989280Z // end inline asm 2026-02-21T09:18:29.7989419Z // begin inline asm 2026-02-21T09:18:29.7989738Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 16], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7990095Z // end inline asm 2026-02-21T09:18:29.7990225Z // begin inline asm 2026-02-21T09:18:29.7990604Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 32], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7990951Z // end inline asm 2026-02-21T09:18:29.7991090Z // begin inline asm 2026-02-21T09:18:29.7991406Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 48], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7991781Z // end inline asm 2026-02-21T09:18:29.7991922Z // begin inline asm 2026-02-21T09:18:29.7992233Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 64], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7992583Z // end inline asm 2026-02-21T09:18:29.7992717Z // begin inline asm 
2026-02-21T09:18:29.7993068Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 80], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7993423Z // end inline asm 2026-02-21T09:18:29.7993549Z // begin inline asm 2026-02-21T09:18:29.7993869Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 96], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7994218Z // end inline asm 2026-02-21T09:18:29.7994353Z // begin inline asm 2026-02-21T09:18:29.7994720Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 112], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7995097Z // end inline asm 2026-02-21T09:18:29.7995240Z // begin inline asm 2026-02-21T09:18:29.7995579Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 128], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7995959Z // end inline asm 2026-02-21T09:18:29.7996092Z // begin inline asm 2026-02-21T09:18:29.7996451Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 144], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7996837Z // end inline asm 2026-02-21T09:18:29.7996972Z // begin inline asm 2026-02-21T09:18:29.7997314Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 160], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7997689Z // end inline asm 2026-02-21T09:18:29.7997833Z // begin inline asm 2026-02-21T09:18:29.7998168Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 176], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7998556Z // end inline asm 2026-02-21T09:18:29.7998698Z // begin inline asm 
2026-02-21T09:18:29.7999043Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 192], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.7999434Z // end inline asm 2026-02-21T09:18:29.7999569Z // begin inline asm 2026-02-21T09:18:29.7999913Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 208], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.8000323Z // end inline asm 2026-02-21T09:18:29.8000467Z // begin inline asm 2026-02-21T09:18:29.8000817Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 224], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.8001194Z // end inline asm 2026-02-21T09:18:29.8001340Z // begin inline asm 2026-02-21T09:18:29.8001681Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r826 + 240], {%r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94, %r94}; 2026-02-21T09:18:29.8002065Z // end inline asm 2026-02-21T09:18:29.8002199Z // begin inline asm 2026-02-21T09:18:29.8002367Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:18:29.8002568Z // end inline asm 2026-02-21T09:18:29.8002703Z bar.sync 0; 2026-02-21T09:18:29.8002963Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8003289Z add.s32 %r1566, %r76, 98336; 2026-02-21T09:18:29.8003454Z // begin inline asm 2026-02-21T09:18:29.8003624Z @%p81 mbarrier.init.shared::cta.b64 [%r1566], 1; 2026-02-21T09:18:29.8003825Z // end inline asm 2026-02-21T09:18:29.8003959Z bar.sync 0; 2026-02-21T09:18:29.8004129Z add.s32 %r366, %r76, 98344; 2026-02-21T09:18:29.8004297Z // begin inline asm 2026-02-21T09:18:29.8004462Z @%p81 mbarrier.init.shared::cta.b64 [%r366], 1; 2026-02-21T09:18:29.8004659Z // end inline asm 2026-02-21T09:18:29.8004828Z add.s32 %r367, %r76, 98304; 2026-02-21T09:18:29.8004993Z // begin inline asm 
2026-02-21T09:18:29.8005156Z @%p81 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:18:29.8005352Z // end inline asm 2026-02-21T09:18:29.8005511Z bar.sync 0; 2026-02-21T09:18:29.8005655Z add.s32 %r368, %r76, 98312; 2026-02-21T09:18:29.8005809Z // begin inline asm 2026-02-21T09:18:29.8005977Z @%p81 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:18:29.8006167Z // end inline asm 2026-02-21T09:18:29.8006299Z bar.sync 0; 2026-02-21T09:18:29.8006438Z add.s32 %r369, %r76, 98320; 2026-02-21T09:18:29.8006590Z // begin inline asm 2026-02-21T09:18:29.8006759Z @%p81 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:18:29.8006951Z // end inline asm 2026-02-21T09:18:29.8007081Z bar.sync 0; 2026-02-21T09:18:29.8007206Z add.s32 %r478, %r76, 98328; 2026-02-21T09:18:29.8007361Z // begin inline asm 2026-02-21T09:18:29.8007519Z @%p81 mbarrier.init.shared::cta.b64 [%r478], 1; 2026-02-21T09:18:29.8007691Z // end inline asm 2026-02-21T09:18:29.8007935Z .loc 1 54 60 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:60 2026-02-21T09:18:29.8008213Z or.b32 %r450, %r447, %r413; 2026-02-21T09:18:29.8008474Z .loc 1 54 32 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:32 2026-02-21T09:18:29.8008767Z mad.wide.u32 %rd31, %r450, 2, %rd9; 2026-02-21T09:18:29.8008947Z cvt.u64.u32 %rd2, %r447; 2026-02-21T09:18:29.8009100Z add.s64 %rd32, %rd31, 65536; 2026-02-21T09:18:29.8009263Z add.s64 %rd33, %rd31, 131072; 2026-02-21T09:18:29.8009428Z add.s64 %rd34, %rd31, 196608; 2026-02-21T09:18:29.8009576Z mov.b32 %r471, 16; 2026-02-21T09:18:29.8009824Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8010104Z // begin inline asm 2026-02-21T09:18:29.8010308Z cp.async.cg.shared.global [ %r371 + 0 ], [ %rd31 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8010525Z // end inline asm 2026-02-21T09:18:29.8010663Z // begin inline asm 2026-02-21T09:18:29.8010854Z cp.async.cg.shared.global [ %r373 + 0 ], [ %rd32 + 0 ], 0x10, %r471; 
2026-02-21T09:18:29.8011080Z // end inline asm 2026-02-21T09:18:29.8011217Z // begin inline asm 2026-02-21T09:18:29.8011404Z cp.async.cg.shared.global [ %r375 + 0 ], [ %rd33 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8011626Z // end inline asm 2026-02-21T09:18:29.8011753Z // begin inline asm 2026-02-21T09:18:29.8011945Z cp.async.cg.shared.global [ %r377 + 0 ], [ %rd34 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8012196Z // end inline asm 2026-02-21T09:18:29.8012340Z cp.async.commit_group; 2026-02-21T09:18:29.8012591Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8012873Z bar.sync 0; 2026-02-21T09:18:29.8013007Z // begin inline asm 2026-02-21T09:18:29.8013193Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r367], 16384; 2026-02-21T09:18:29.8013413Z // end inline asm 2026-02-21T09:18:29.8013646Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8013939Z // begin inline asm 2026-02-21T09:18:29.8014086Z fence.proxy.async.shared::cta; 2026-02-21T09:18:29.8014251Z // end inline asm 2026-02-21T09:18:29.8014424Z bar.sync 0; 2026-02-21T09:18:29.8014566Z elect.sync %r451|%p51, -1; 2026-02-21T09:18:29.8014761Z and.pred %p45, %p1, %p51; 2026-02-21T09:18:29.8014912Z // begin inline asm 2026-02-21T09:18:29.8015239Z @%p45 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r76], [%rd61, {%r94, %r481}], [%r367]; 2026-02-21T09:18:29.8015581Z // end inline asm 2026-02-21T09:18:29.8015869Z .loc 1 54 32 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:32 2026-02-21T09:18:29.8016148Z add.s64 %rd36, %rd31, 64; 2026-02-21T09:18:29.8016304Z or.b32 %r452, %r450, 32; 2026-02-21T09:18:29.8016466Z mad.wide.u32 %rd46, %r452, 2, %rd9; 2026-02-21T09:18:29.8016635Z add.s64 %rd37, %rd46, 65536; 2026-02-21T09:18:29.8016796Z add.s64 %rd38, %rd46, 131072; 2026-02-21T09:18:29.8016950Z add.s64 %rd39, %rd46, 196608; 2026-02-21T09:18:29.8017245Z .loc 1 54 85 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8017523Z // begin inline asm 2026-02-21T09:18:29.8017718Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd36 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8017931Z // end inline asm 2026-02-21T09:18:29.8018066Z // begin inline asm 2026-02-21T09:18:29.8018260Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd37 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8018471Z // end inline asm 2026-02-21T09:18:29.8018605Z // begin inline asm 2026-02-21T09:18:29.8018790Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd38 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8019016Z // end inline asm 2026-02-21T09:18:29.8019141Z // begin inline asm 2026-02-21T09:18:29.8019331Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd39 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8019542Z // end inline asm 2026-02-21T09:18:29.8019686Z cp.async.commit_group; 2026-02-21T09:18:29.8019947Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8020221Z bar.sync 0; 2026-02-21T09:18:29.8020359Z // begin inline asm 2026-02-21T09:18:29.8020543Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r368], 16384; 2026-02-21T09:18:29.8020766Z // end inline asm 2026-02-21T09:18:29.8021001Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8021292Z bar.sync 0; 2026-02-21T09:18:29.8021426Z elect.sync %r453|%p52, -1; 2026-02-21T09:18:29.8021596Z and.pred %p47, %p1, %p52; 2026-02-21T09:18:29.8021759Z add.s32 %r393, %r76, 16384; 2026-02-21T09:18:29.8021905Z // begin inline asm 2026-02-21T09:18:29.8022234Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r393], [%rd61, {%r79, %r481}], [%r368]; 2026-02-21T09:18:29.8022570Z // end inline asm 2026-02-21T09:18:29.8022813Z .loc 1 54 32 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:32 2026-02-21T09:18:29.8023089Z add.s64 %rd41, %rd31, 128; 2026-02-21T09:18:29.8023250Z or.b32 %r454, 
%r450, 64; 2026-02-21T09:18:29.8023406Z mad.wide.u32 %rd47, %r454, 2, %rd9; 2026-02-21T09:18:29.8023577Z add.s64 %rd42, %rd47, 65536; 2026-02-21T09:18:29.8023738Z add.s64 %rd43, %rd47, 131072; 2026-02-21T09:18:29.8023890Z add.s64 %rd44, %rd47, 196608; 2026-02-21T09:18:29.8024196Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8024474Z // begin inline asm 2026-02-21T09:18:29.8024704Z cp.async.cg.shared.global [ %r397 + 0 ], [ %rd41 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8024923Z // end inline asm 2026-02-21T09:18:29.8025062Z // begin inline asm 2026-02-21T09:18:29.8025255Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd42 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8025473Z // end inline asm 2026-02-21T09:18:29.8025612Z // begin inline asm 2026-02-21T09:18:29.8025800Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd43 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8026021Z // end inline asm 2026-02-21T09:18:29.8026150Z // begin inline asm 2026-02-21T09:18:29.8026378Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd44 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8026589Z // end inline asm 2026-02-21T09:18:29.8026732Z cp.async.commit_group; 2026-02-21T09:18:29.8026985Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8027262Z bar.sync 0; 2026-02-21T09:18:29.8027393Z // begin inline asm 2026-02-21T09:18:29.8027606Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r369], 16384; 2026-02-21T09:18:29.8027829Z // end inline asm 2026-02-21T09:18:29.8028068Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8028350Z bar.sync 0; 2026-02-21T09:18:29.8028482Z elect.sync %r455|%p53, -1; 2026-02-21T09:18:29.8028649Z and.pred %p49, %p1, %p53; 2026-02-21T09:18:29.8028809Z add.s32 %r406, %r76, 32768; 2026-02-21T09:18:29.8028955Z mov.b32 %r407, 64; 2026-02-21T09:18:29.8029149Z // begin inline asm 2026-02-21T09:18:29.8029462Z @%p49 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r406], [%rd61, {%r407, %r481}], [%r369]; 2026-02-21T09:18:29.8029806Z // end inline asm 2026-02-21T09:18:29.8030043Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8030330Z cp.async.wait_group 2; 2026-02-21T09:18:29.8030476Z bar.sync 0; 2026-02-21T09:18:29.8030715Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8031000Z // begin inline asm 2026-02-21T09:18:29.8031132Z 2026-02-21T09:18:29.8031259Z { 2026-02-21T09:18:29.8031383Z .reg .pred complete; 2026-02-21T09:18:29.8031541Z waitLoop: 2026-02-21T09:18:29.8031729Z mbarrier.try_wait.parity.shared.b64 complete, [%r367], %r94; 2026-02-21T09:18:29.8031971Z @!complete bra.uni waitLoop; 2026-02-21T09:18:29.8032118Z } 2026-02-21T09:18:29.8032189Z 2026-02-21T09:18:29.8032244Z // end inline asm 2026-02-21T09:18:29.8032492Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8032775Z setp.ne.b32 %p54, %r61, 0; 2026-02-21T09:18:29.8032935Z @%p54 bra $L__BB0_3; 2026-02-21T09:18:29.8033072Z // %bb.2: 2026-02-21T09:18:29.8033308Z .loc 1 0 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:0:52 2026-02-21T09:18:29.8033590Z add.s32 %r461, %r76, 32; 2026-02-21T09:18:29.8033748Z bfe.u32 %r462, %r461, 4, 14; 2026-02-21T09:18:29.8033899Z cvt.u64.u32 %rd53, %r462; 2026-02-21T09:18:29.8034070Z or.b64 %rd51, %rd53, -9223371899348713472; 2026-02-21T09:18:29.8034251Z add.s32 %r463, %r76, 65536; 2026-02-21T09:18:29.8034400Z add.s32 %r464, %r76, 65568; 2026-02-21T09:18:29.8034556Z bfe.u32 %r465, %r464, 4, 14; 2026-02-21T09:18:29.8034738Z cvt.u64.u32 %rd54, %r465; 2026-02-21T09:18:29.8034911Z or.b64 %rd50, %rd54, -9223371899382267904; 2026-02-21T09:18:29.8035090Z bfe.u32 %r466, %r76, 4, 14; 2026-02-21T09:18:29.8035244Z cvt.u64.u32 %rd55, %r466; 2026-02-21T09:18:29.8035400Z or.b64 %rd49, %rd55, 
-9223371899348713472; 2026-02-21T09:18:29.8035581Z bfe.u32 %r467, %r463, 4, 14; 2026-02-21T09:18:29.8035731Z cvt.u64.u32 %rd56, %r467; 2026-02-21T09:18:29.8035967Z or.b64 %rd48, %rd56, -9223371899382267904; 2026-02-21T09:18:29.8036246Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8036525Z elect.sync %r468|%p56, -1; 2026-02-21T09:18:29.8036683Z mov.b32 %r457, 138412048; 2026-02-21T09:18:29.8036829Z mov.pred %p55, 0; 2026-02-21T09:18:29.8036973Z // begin inline asm 2026-02-21T09:18:29.8037192Z @%p56 tcgen05.mma.cta_group::1.kind::f16 [ %r1564 + 0 ], %rd48, %rd49, %r457, %p55; 2026-02-21T09:18:29.8037448Z // end inline asm 2026-02-21T09:18:29.8037586Z // begin inline asm 2026-02-21T09:18:29.8037798Z @%p56 tcgen05.mma.cta_group::1.kind::f16 [ %r1564 + 0 ], %rd50, %rd51, %r457, %p22; 2026-02-21T09:18:29.8038098Z // end inline asm 2026-02-21T09:18:29.8038234Z add.s32 %r469, %r76, 98336; 2026-02-21T09:18:29.8038394Z cvt.u64.u32 %rd52, %r469; 2026-02-21T09:18:29.8038544Z // begin inline asm 2026-02-21T09:18:29.8038758Z @%p56 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd52]; 2026-02-21T09:18:29.8038989Z // end inline asm 2026-02-21T09:18:29.8039134Z $L__BB0_3: 2026-02-21T09:18:29.8039403Z .loc 1 0 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:0:52 2026-02-21T09:18:29.8039791Z ld.param.b64 %rd10, [_helion_matmul_param_2]; 2026-02-21T09:18:29.8040007Z add.s32 %r11, %r76, %r425; 2026-02-21T09:18:29.8040164Z add.s32 %r12, %r76, %r426; 2026-02-21T09:18:29.8040326Z add.s32 %r13, %r76, %r427; 2026-02-21T09:18:29.8040477Z add.s32 %r14, %r76, %r428; 2026-02-21T09:18:29.8040639Z add.s32 %r15, %r76, %r429; 2026-02-21T09:18:29.8040790Z add.s32 %r16, %r76, %r430; 2026-02-21T09:18:29.8040985Z add.s32 %r17, %r76, %r431; 2026-02-21T09:18:29.8041154Z add.s32 %r18, %r76, %r432; 2026-02-21T09:18:29.8041306Z add.s32 %r836, %r831, 256; 2026-02-21T09:18:29.8041467Z add.s32 %r841, %r831, 512; 
2026-02-21T09:18:29.8041617Z add.s32 %r846, %r831, 768; 2026-02-21T09:18:29.8041775Z add.s32 %r851, %r831, 1024; 2026-02-21T09:18:29.8041929Z add.s32 %r856, %r831, 1280; 2026-02-21T09:18:29.8042089Z add.s32 %r861, %r831, 1536; 2026-02-21T09:18:29.8042240Z add.s32 %r866, %r831, 1792; 2026-02-21T09:18:29.8042398Z or.b32 %r28, %r27, 4; 2026-02-21T09:18:29.8042545Z or.b32 %r29, %r27, 8; 2026-02-21T09:18:29.8042699Z or.b32 %r30, %r27, 12; 2026-02-21T09:18:29.8042857Z or.b32 %r31, %r27, 16; 2026-02-21T09:18:29.8043006Z or.b32 %r32, %r27, 20; 2026-02-21T09:18:29.8043162Z or.b32 %r33, %r27, 24; 2026-02-21T09:18:29.8043307Z or.b32 %r34, %r27, 28; 2026-02-21T09:18:29.8043459Z or.b32 %r35, %r27, 32; 2026-02-21T09:18:29.8043602Z or.b32 %r36, %r27, 36; 2026-02-21T09:18:29.8043751Z or.b32 %r37, %r27, 40; 2026-02-21T09:18:29.8043896Z or.b32 %r38, %r27, 44; 2026-02-21T09:18:29.8044047Z or.b32 %r39, %r27, 48; 2026-02-21T09:18:29.8044189Z or.b32 %r40, %r27, 52; 2026-02-21T09:18:29.8044337Z or.b32 %r41, %r27, 56; 2026-02-21T09:18:29.8044478Z or.b32 %r42, %r27, 60; 2026-02-21T09:18:29.8044626Z or.b32 %r43, %r27, 64; 2026-02-21T09:18:29.8044806Z or.b32 %r44, %r27, 68; 2026-02-21T09:18:29.8044949Z or.b32 %r45, %r27, 72; 2026-02-21T09:18:29.8045098Z or.b32 %r46, %r27, 76; 2026-02-21T09:18:29.8045241Z or.b32 %r47, %r27, 80; 2026-02-21T09:18:29.8045392Z or.b32 %r48, %r27, 84; 2026-02-21T09:18:29.8045533Z or.b32 %r49, %r27, 88; 2026-02-21T09:18:29.8045682Z or.b32 %r50, %r27, 92; 2026-02-21T09:18:29.8045822Z or.b32 %r51, %r27, 96; 2026-02-21T09:18:29.8045975Z or.b32 %r52, %r27, 100; 2026-02-21T09:18:29.8046127Z or.b32 %r53, %r27, 104; 2026-02-21T09:18:29.8046286Z or.b32 %r54, %r27, 108; 2026-02-21T09:18:29.8046438Z or.b32 %r55, %r27, 112; 2026-02-21T09:18:29.8046584Z or.b32 %r56, %r27, 116; 2026-02-21T09:18:29.8046736Z or.b32 %r57, %r27, 120; 2026-02-21T09:18:29.8046881Z or.b32 %r58, %r27, 124; 2026-02-21T09:18:29.8047036Z or.b32 %r60, %r481, %r414; 2026-02-21T09:18:29.8047314Z .loc 1 54 32 
// ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:32 2026-02-21T09:18:29.8047651Z add.s64 %rd57, %rd31, 192; 2026-02-21T09:18:29.8047809Z cvt.u64.u32 %rd63, %r6; 2026-02-21T09:18:29.8047971Z add.s64 %rd64, %rd2, %rd63; 2026-02-21T09:18:29.8048137Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:18:29.8048293Z add.s64 %rd66, %rd9, %rd65; 2026-02-21T09:18:29.8048461Z add.s64 %rd58, %rd66, 65536; 2026-02-21T09:18:29.8048622Z add.s64 %rd59, %rd66, 131072; 2026-02-21T09:18:29.8048796Z add.s64 %rd60, %rd66, 196608; 2026-02-21T09:18:29.8049072Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8049399Z bar.sync 0; 2026-02-21T09:18:29.8049528Z // begin inline asm 2026-02-21T09:18:29.8049734Z cp.async.cg.shared.global [ %r470 + 0 ], [ %rd57 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8049986Z // end inline asm 2026-02-21T09:18:29.8050120Z // begin inline asm 2026-02-21T09:18:29.8050321Z cp.async.cg.shared.global [ %r472 + 0 ], [ %rd58 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8050535Z // end inline asm 2026-02-21T09:18:29.8050674Z // begin inline asm 2026-02-21T09:18:29.8050862Z cp.async.cg.shared.global [ %r474 + 0 ], [ %rd59 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8051081Z // end inline asm 2026-02-21T09:18:29.8051213Z // begin inline asm 2026-02-21T09:18:29.8051434Z cp.async.cg.shared.global [ %r476 + 0 ], [ %rd60 + 0 ], 0x10, %r471; 2026-02-21T09:18:29.8051661Z // end inline asm 2026-02-21T09:18:29.8051799Z cp.async.commit_group; 2026-02-21T09:18:29.8052063Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8052343Z // begin inline asm 2026-02-21T09:18:29.8052567Z @%p81 mbarrier.arrive.expect_tx.shared.b64 _, [%r478], 16384; 2026-02-21T09:18:29.8052779Z // end inline asm 2026-02-21T09:18:29.8053022Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8053296Z bar.sync 0; 2026-02-21T09:18:29.8053437Z elect.sync %r488|%p63, -1; 
2026-02-21T09:18:29.8053601Z and.pred %p61, %p1, %p63; 2026-02-21T09:18:29.8053754Z add.s32 %r479, %r76, 49152; 2026-02-21T09:18:29.8053908Z mov.b32 %r480, 96; 2026-02-21T09:18:29.8054041Z // begin inline asm 2026-02-21T09:18:29.8054364Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r479], [%rd61, {%r480, %r481}], [%r478]; 2026-02-21T09:18:29.8054741Z // end inline asm 2026-02-21T09:18:29.8054985Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8055275Z and.b32 %r489, %r1, 3; 2026-02-21T09:18:29.8055423Z mul.wide.u32 %rd67, %r489, 16; 2026-02-21T09:18:29.8055587Z and.b32 %r490, %r3, 7; 2026-02-21T09:18:29.8055731Z shl.b32 %r491, %r490, 17; 2026-02-21T09:18:29.8055887Z shl.b32 %r492, %r4, 10; 2026-02-21T09:18:29.8056028Z or.b32 %r493, %r491, %r492; 2026-02-21T09:18:29.8056185Z mul.wide.u32 %rd68, %r493, 2; 2026-02-21T09:18:29.8056339Z or.b64 %rd69, %rd67, %rd68; 2026-02-21T09:18:29.8056495Z add.s64 %rd70, %rd69, %rd9; 2026-02-21T09:18:29.8056647Z add.s64 %rd629, %rd70, 196864; 2026-02-21T09:18:29.8056808Z mov.b32 %r1570, 1; 2026-02-21T09:18:29.8056947Z mov.b32 %r1569, 3; 2026-02-21T09:18:29.8057076Z mov.b32 %r1565, 0; 2026-02-21T09:18:29.8057215Z mov.b64 %rd630, 0; 2026-02-21T09:18:29.8057349Z mov.b32 %r1567, %r1565; 2026-02-21T09:18:29.8057503Z mov.b32 %r1568, %r1565; 2026-02-21T09:18:29.8057643Z mov.b32 %r1571, %r1565; 2026-02-21T09:18:29.8057788Z bra.uni $L__BB0_4; 2026-02-21T09:18:29.8057966Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:18:29.8058292Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8058578Z setp.lt.u64 %p73, %rd630, 896; 2026-02-21T09:18:29.8058856Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8059146Z // begin inline asm 2026-02-21T09:18:29.8059278Z 2026-02-21T09:18:29.8059426Z { 2026-02-21T09:18:29.8059544Z .reg .pred complete; 
2026-02-21T09:18:29.8059691Z waitLoop: 2026-02-21T09:18:29.8059880Z mbarrier.try_wait.parity.shared.b64 complete, [%r1566], %r1565; 2026-02-21T09:18:29.8060117Z @!complete bra.uni waitLoop; 2026-02-21T09:18:29.8060263Z } 2026-02-21T09:18:29.8060333Z 2026-02-21T09:18:29.8060388Z // end inline asm 2026-02-21T09:18:29.8060527Z add.s32 %r535, %r1570, 1; 2026-02-21T09:18:29.8060679Z setp.gt.s32 %p76, %r535, 1; 2026-02-21T09:18:29.8060846Z selp.b32 %r1570, 0, %r535, %p76; 2026-02-21T09:18:29.8061012Z selp.b32 %r536, 1, 0, %p76; 2026-02-21T09:18:29.8061173Z xor.b32 %r74, %r1571, %r536; 2026-02-21T09:18:29.8061436Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8061753Z add.s32 %r537, %r1569, 1; 2026-02-21T09:18:29.8061903Z setp.gt.s32 %p77, %r537, 3; 2026-02-21T09:18:29.8062068Z selp.b32 %r1569, 0, %r537, %p77; 2026-02-21T09:18:29.8062339Z .loc 1 54 32 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:32 2026-02-21T09:18:29.8062622Z add.s64 %rd80, %rd629, -196608; 2026-02-21T09:18:29.8062794Z add.s64 %rd81, %rd629, -131072; 2026-02-21T09:18:29.8062983Z add.s64 %rd82, %rd629, -65536; 2026-02-21T09:18:29.8063260Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8063546Z shl.b32 %r538, %r1569, 13; 2026-02-21T09:18:29.8063706Z add.s32 %r540, %r76, %r538; 2026-02-21T09:18:29.8063865Z add.s32 %r541, %r540, %r5; 2026-02-21T09:18:29.8064013Z bar.sync 0; 2026-02-21T09:18:29.8064153Z add.s32 %r522, %r541, 65536; 2026-02-21T09:18:29.8064335Z selp.b32 %r523, 16, 0, %p73; 2026-02-21T09:18:29.8064494Z // begin inline asm 2026-02-21T09:18:29.8064722Z cp.async.cg.shared.global [ %r522 + 0 ], [ %rd80 + 0 ], 0x10, %r523; 2026-02-21T09:18:29.8064947Z // end inline asm 2026-02-21T09:18:29.8065078Z add.s32 %r524, %r541, 67584; 2026-02-21T09:18:29.8065235Z // begin inline asm 2026-02-21T09:18:29.8065424Z cp.async.cg.shared.global [ %r524 + 0 ], [ %rd81 + 0 ], 0x10, %r523; 
2026-02-21T09:18:29.8065645Z // end inline asm 2026-02-21T09:18:29.8065783Z add.s32 %r526, %r541, 69632; 2026-02-21T09:18:29.8065933Z // begin inline asm 2026-02-21T09:18:29.8066125Z cp.async.cg.shared.global [ %r526 + 0 ], [ %rd82 + 0 ], 0x10, %r523; 2026-02-21T09:18:29.8066337Z // end inline asm 2026-02-21T09:18:29.8066477Z add.s32 %r528, %r541, 71680; 2026-02-21T09:18:29.8066623Z // begin inline asm 2026-02-21T09:18:29.8066819Z cp.async.cg.shared.global [ %r528 + 0 ], [ %rd629 + 0 ], 0x10, %r523; 2026-02-21T09:18:29.8067029Z // end inline asm 2026-02-21T09:18:29.8067174Z cp.async.commit_group; 2026-02-21T09:18:29.8067437Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8067711Z shl.b32 %r542, %r1569, 3; 2026-02-21T09:18:29.8067871Z add.s32 %r543, %r76, %r542; 2026-02-21T09:18:29.8068023Z add.s32 %r534, %r543, 98304; 2026-02-21T09:18:29.8068193Z and.pred %p71, %p81, %p73; 2026-02-21T09:18:29.8068347Z // begin inline asm 2026-02-21T09:18:29.8068540Z @%p71 mbarrier.arrive.expect_tx.shared.b64 _, [%r534], 16384; 2026-02-21T09:18:29.8068754Z // end inline asm 2026-02-21T09:18:29.8069000Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8069279Z shl.b32 %r544, %r1569, 14; 2026-02-21T09:18:29.8069429Z add.s32 %r531, %r76, %r544; 2026-02-21T09:18:29.8069582Z bar.sync 0; 2026-02-21T09:18:29.8069714Z elect.sync %r545|%p78, -1; 2026-02-21T09:18:29.8069875Z and.pred %p79, %p73, %p78; 2026-02-21T09:18:29.8070029Z and.pred %p72, %p1, %p79; 2026-02-21T09:18:29.8070192Z cvt.u32.u64 %r546, %rd630; 2026-02-21T09:18:29.8070338Z add.s32 %r532, %r546, 128; 2026-02-21T09:18:29.8070489Z // begin inline asm 2026-02-21T09:18:29.8070811Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r531], [%rd61, {%r532, %r481}], [%r534]; 2026-02-21T09:18:29.8071184Z // end inline asm 2026-02-21T09:18:29.8071432Z .loc 1 49 57 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8071711Z add.s64 %rd629, %rd629, 64; 2026-02-21T09:18:29.8071875Z setp.lt.u64 %p80, %rd630, 960; 2026-02-21T09:18:29.8072033Z add.s64 %rd630, %rd630, 32; 2026-02-21T09:18:29.8072188Z mov.b32 %r1565, %r1571; 2026-02-21T09:18:29.8072331Z mov.b32 %r1566, %r547; 2026-02-21T09:18:29.8072481Z mov.b32 %r1571, %r74; 2026-02-21T09:18:29.8072629Z @%p80 bra $L__BB0_4; 2026-02-21T09:18:29.8072767Z bra.uni $L__BB0_7; 2026-02-21T09:18:29.8072954Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:18:29.8073192Z add.s32 %r496, %r1568, 1; 2026-02-21T09:18:29.8073350Z setp.gt.s32 %p65, %r496, 3; 2026-02-21T09:18:29.8073508Z selp.b32 %r1568, 0, %r496, %p65; 2026-02-21T09:18:29.8073677Z selp.b32 %r497, 1, 0, %p65; 2026-02-21T09:18:29.8073829Z xor.b32 %r1567, %r1567, %r497; 2026-02-21T09:18:29.8074094Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8074385Z cp.async.wait_group 2; 2026-02-21T09:18:29.8074575Z bar.sync 0; 2026-02-21T09:18:29.8074836Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8075109Z shl.b32 %r498, %r1568, 3; 2026-02-21T09:18:29.8075263Z add.s32 %r500, %r76, %r498; 2026-02-21T09:18:29.8075412Z add.s32 %r494, %r500, 98304; 2026-02-21T09:18:29.8075568Z // begin inline asm 2026-02-21T09:18:29.8075695Z 2026-02-21T09:18:29.8075811Z { 2026-02-21T09:18:29.8075980Z .reg .pred complete; 2026-02-21T09:18:29.8076129Z waitLoop: 2026-02-21T09:18:29.8076321Z mbarrier.try_wait.parity.shared.b64 complete, [%r494], %r1567; 2026-02-21T09:18:29.8076552Z @!complete bra.uni waitLoop; 2026-02-21T09:18:29.8076706Z } 2026-02-21T09:18:29.8076768Z 2026-02-21T09:18:29.8076822Z // end inline asm 2026-02-21T09:18:29.8076964Z shl.b32 %r501, %r1570, 3; 2026-02-21T09:18:29.8077111Z add.s32 %r502, %r76, %r501; 2026-02-21T09:18:29.8077273Z add.s32 %r547, %r502, 98336; 2026-02-21T09:18:29.8077534Z .loc 1 56 
52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8077830Z @%p54 bra $L__BB0_6; 2026-02-21T09:18:29.8078026Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:18:29.8078335Z .loc 1 55 44 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:55:44 2026-02-21T09:18:29.8078619Z shl.b32 %r507, %r1568, 14; 2026-02-21T09:18:29.8078769Z add.s32 %r509, %r76, %r507; 2026-02-21T09:18:29.8079029Z .loc 1 54 85 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:54:85 2026-02-21T09:18:29.8079304Z shl.b32 %r510, %r1568, 13; 2026-02-21T09:18:29.8079459Z add.s32 %r511, %r76, %r510; 2026-02-21T09:18:29.8079615Z add.s32 %r512, %r511, 65536; 2026-02-21T09:18:29.8079868Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8080152Z elect.sync %r513|%p67, -1; 2026-02-21T09:18:29.8080307Z bfe.u32 %r514, %r512, 4, 14; 2026-02-21T09:18:29.8080464Z cvt.u64.u32 %rd76, %r514; 2026-02-21T09:18:29.8080624Z or.b64 %rd71, %rd76, -9223371899382267904; 2026-02-21T09:18:29.8080807Z bfe.u32 %r515, %r509, 4, 14; 2026-02-21T09:18:29.8080957Z cvt.u64.u32 %rd77, %r515; 2026-02-21T09:18:29.8081126Z or.b64 %rd72, %rd77, -9223371899348713472; 2026-02-21T09:18:29.8081302Z mov.b32 %r504, 138412048; 2026-02-21T09:18:29.8081447Z mov.pred %p66, -1; 2026-02-21T09:18:29.8081594Z // begin inline asm 2026-02-21T09:18:29.8081811Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r1564 + 0 ], %rd71, %rd72, %r504, %p66; 2026-02-21T09:18:29.8082084Z // end inline asm 2026-02-21T09:18:29.8082222Z add.s32 %r516, %r511, 65568; 2026-02-21T09:18:29.8082383Z bfe.u32 %r517, %r516, 4, 14; 2026-02-21T09:18:29.8082577Z cvt.u64.u32 %rd78, %r517; 2026-02-21T09:18:29.8082747Z or.b64 %rd73, %rd78, -9223371899382267904; 2026-02-21T09:18:29.8082931Z add.s32 %r518, %r509, 32; 2026-02-21T09:18:29.8083084Z bfe.u32 %r519, %r518, 4, 14; 2026-02-21T09:18:29.8083244Z cvt.u64.u32 %rd79, %r519; 2026-02-21T09:18:29.8083407Z or.b64 %rd74, %rd79, 
-9223371899348713472; 2026-02-21T09:18:29.8083589Z // begin inline asm 2026-02-21T09:18:29.8083809Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r1564 + 0 ], %rd73, %rd74, %r504, %p66; 2026-02-21T09:18:29.8084070Z // end inline asm 2026-02-21T09:18:29.8084206Z cvt.u64.u32 %rd75, %r547; 2026-02-21T09:18:29.8084363Z // begin inline asm 2026-02-21T09:18:29.8084580Z @%p67 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd75]; 2026-02-21T09:18:29.8084874Z // end inline asm 2026-02-21T09:18:29.8085017Z bra.uni $L__BB0_6; 2026-02-21T09:18:29.8085197Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:18:29.8085527Z .loc 1 0 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:0:52 2026-02-21T09:18:29.8085817Z mov.b32 %r548, 1; 2026-02-21T09:18:29.8086125Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8086449Z // begin inline asm 2026-02-21T09:18:29.8086584Z 2026-02-21T09:18:29.8086706Z { 2026-02-21T09:18:29.8086830Z .reg .pred complete; 2026-02-21T09:18:29.8086986Z waitLoop: 2026-02-21T09:18:29.8087176Z mbarrier.try_wait.parity.shared.b64 complete, [%r547], %r548; 2026-02-21T09:18:29.8087424Z @!complete bra.uni waitLoop; 2026-02-21T09:18:29.8087581Z } 2026-02-21T09:18:29.8087657Z 2026-02-21T09:18:29.8087741Z // end inline asm 2026-02-21T09:18:29.8087994Z .loc 1 49 57 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:49:57 2026-02-21T09:18:29.8088318Z cp.async.wait_group 0; 2026-02-21T09:18:29.8088477Z bar.sync 0; 2026-02-21T09:18:29.8088608Z // begin inline asm 2026-02-21T09:18:29.8088784Z @%p81 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:18:29.8088975Z // end inline asm 2026-02-21T09:18:29.8089114Z bar.sync 0; 2026-02-21T09:18:29.8089246Z // begin inline asm 2026-02-21T09:18:29.8089417Z @%p81 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:18:29.8089601Z // end inline asm 2026-02-21T09:18:29.8089742Z bar.sync 0; 2026-02-21T09:18:29.8089870Z // begin inline asm 
2026-02-21T09:18:29.8090040Z @%p81 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:18:29.8090230Z // end inline asm 2026-02-21T09:18:29.8090364Z bar.sync 0; 2026-02-21T09:18:29.8090499Z // begin inline asm 2026-02-21T09:18:29.8090660Z @%p81 mbarrier.inval.shared::cta.b64 [%r478]; 2026-02-21T09:18:29.8090852Z // end inline asm 2026-02-21T09:18:29.8090991Z add.s32 %r553, %r76, 98336; 2026-02-21T09:18:29.8091153Z // begin inline asm 2026-02-21T09:18:29.8091314Z @%p81 mbarrier.inval.shared::cta.b64 [%r553]; 2026-02-21T09:18:29.8091505Z // end inline asm 2026-02-21T09:18:29.8091646Z bar.sync 0; 2026-02-21T09:18:29.8091774Z // begin inline asm 2026-02-21T09:18:29.8091942Z @%p81 mbarrier.inval.shared::cta.b64 [%r366]; 2026-02-21T09:18:29.8092125Z // end inline asm 2026-02-21T09:18:29.8092386Z .loc 1 59 45 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:59:45 2026-02-21T09:18:29.8092660Z shl.b32 %r1116, %r27, 13; 2026-02-21T09:18:29.8092812Z shl.b32 %r1117, %r28, 13; 2026-02-21T09:18:29.8092956Z shl.b32 %r1118, %r29, 13; 2026-02-21T09:18:29.8093103Z shl.b32 %r1119, %r30, 13; 2026-02-21T09:18:29.8093250Z shl.b32 %r1120, %r31, 13; 2026-02-21T09:18:29.8093389Z shl.b32 %r1121, %r32, 13; 2026-02-21T09:18:29.8093537Z shl.b32 %r1122, %r33, 13; 2026-02-21T09:18:29.8093678Z shl.b32 %r1123, %r34, 13; 2026-02-21T09:18:29.8093828Z shl.b32 %r1124, %r35, 13; 2026-02-21T09:18:29.8093969Z shl.b32 %r1125, %r36, 13; 2026-02-21T09:18:29.8094114Z shl.b32 %r1126, %r37, 13; 2026-02-21T09:18:29.8094254Z shl.b32 %r1127, %r38, 13; 2026-02-21T09:18:29.8094434Z shl.b32 %r1128, %r39, 13; 2026-02-21T09:18:29.8094576Z shl.b32 %r1129, %r40, 13; 2026-02-21T09:18:29.8094756Z shl.b32 %r1130, %r41, 13; 2026-02-21T09:18:29.8094905Z shl.b32 %r1131, %r42, 13; 2026-02-21T09:18:29.8095047Z shl.b32 %r1132, %r43, 13; 2026-02-21T09:18:29.8095198Z shl.b32 %r1133, %r44, 13; 2026-02-21T09:18:29.8095341Z shl.b32 %r1134, %r45, 13; 2026-02-21T09:18:29.8095494Z shl.b32 %r1135, %r46, 13; 
2026-02-21T09:18:29.8095640Z shl.b32 %r1136, %r47, 13; 2026-02-21T09:18:29.8095800Z shl.b32 %r1137, %r48, 13; 2026-02-21T09:18:29.8107558Z shl.b32 %r1138, %r49, 13; 2026-02-21T09:18:29.8107761Z shl.b32 %r1139, %r50, 13; 2026-02-21T09:18:29.8107942Z shl.b32 %r1140, %r51, 13; 2026-02-21T09:18:29.8108194Z shl.b32 %r1141, %r52, 13; 2026-02-21T09:18:29.8108349Z shl.b32 %r1142, %r53, 13; 2026-02-21T09:18:29.8108493Z shl.b32 %r1143, %r54, 13; 2026-02-21T09:18:29.8108651Z shl.b32 %r1144, %r55, 13; 2026-02-21T09:18:29.8108793Z shl.b32 %r1145, %r56, 13; 2026-02-21T09:18:29.8108948Z shl.b32 %r1146, %r57, 13; 2026-02-21T09:18:29.8109094Z shl.b32 %r1147, %r58, 13; 2026-02-21T09:18:29.8109394Z .loc 1 59 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:59:52 2026-02-21T09:18:29.8109752Z or.b32 %r1148, %r1116, %r60; 2026-02-21T09:18:29.8109918Z or.b32 %r1149, %r1117, %r60; 2026-02-21T09:18:29.8110083Z or.b32 %r1150, %r1118, %r60; 2026-02-21T09:18:29.8110233Z or.b32 %r1151, %r1119, %r60; 2026-02-21T09:18:29.8110390Z or.b32 %r1152, %r1120, %r60; 2026-02-21T09:18:29.8110538Z or.b32 %r1153, %r1121, %r60; 2026-02-21T09:18:29.8110696Z or.b32 %r1154, %r1122, %r60; 2026-02-21T09:18:29.8110880Z or.b32 %r1155, %r1123, %r60; 2026-02-21T09:18:29.8111038Z or.b32 %r1156, %r1124, %r60; 2026-02-21T09:18:29.8111185Z or.b32 %r1157, %r1125, %r60; 2026-02-21T09:18:29.8111340Z or.b32 %r1158, %r1126, %r60; 2026-02-21T09:18:29.8111492Z or.b32 %r1159, %r1127, %r60; 2026-02-21T09:18:29.8111640Z or.b32 %r1160, %r1128, %r60; 2026-02-21T09:18:29.8111800Z or.b32 %r1161, %r1129, %r60; 2026-02-21T09:18:29.8111947Z or.b32 %r1162, %r1130, %r60; 2026-02-21T09:18:29.8112102Z or.b32 %r1163, %r1131, %r60; 2026-02-21T09:18:29.8112248Z or.b32 %r1164, %r1132, %r60; 2026-02-21T09:18:29.8112407Z or.b32 %r1165, %r1133, %r60; 2026-02-21T09:18:29.8112556Z or.b32 %r1166, %r1134, %r60; 2026-02-21T09:18:29.8112713Z or.b32 %r1167, %r1135, %r60; 2026-02-21T09:18:29.8112861Z or.b32 %r1168, %r1136, %r60; 
2026-02-21T09:18:29.8113018Z or.b32 %r1169, %r1137, %r60; 2026-02-21T09:18:29.8113172Z or.b32 %r1170, %r1138, %r60; 2026-02-21T09:18:29.8113319Z or.b32 %r1171, %r1139, %r60; 2026-02-21T09:18:29.8113477Z or.b32 %r1172, %r1140, %r60; 2026-02-21T09:18:29.8113625Z or.b32 %r1173, %r1141, %r60; 2026-02-21T09:18:29.8113782Z or.b32 %r1174, %r1142, %r60; 2026-02-21T09:18:29.8113929Z or.b32 %r1175, %r1143, %r60; 2026-02-21T09:18:29.8114089Z or.b32 %r1176, %r1144, %r60; 2026-02-21T09:18:29.8114236Z or.b32 %r1177, %r1145, %r60; 2026-02-21T09:18:29.8114397Z or.b32 %r1178, %r1146, %r60; 2026-02-21T09:18:29.8114551Z or.b32 %r1179, %r1147, %r60; 2026-02-21T09:18:29.8114891Z .loc 1 59 24 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:59:24 2026-02-21T09:18:29.8115207Z mad.wide.u32 %rd85, %r1148, 2, %rd10; 2026-02-21T09:18:29.8115396Z mad.wide.u32 %rd86, %r1149, 2, %rd10; 2026-02-21T09:18:29.8115583Z mad.wide.u32 %rd87, %r1150, 2, %rd10; 2026-02-21T09:18:29.8115754Z mad.wide.u32 %rd88, %r1151, 2, %rd10; 2026-02-21T09:18:29.8115934Z mad.wide.u32 %rd89, %r1152, 2, %rd10; 2026-02-21T09:18:29.8116102Z mad.wide.u32 %rd90, %r1153, 2, %rd10; 2026-02-21T09:18:29.8116285Z mad.wide.u32 %rd91, %r1154, 2, %rd10; 2026-02-21T09:18:29.8116463Z mad.wide.u32 %rd92, %r1155, 2, %rd10; 2026-02-21T09:18:29.8116634Z mad.wide.u32 %rd93, %r1156, 2, %rd10; 2026-02-21T09:18:29.8116810Z mad.wide.u32 %rd94, %r1157, 2, %rd10; 2026-02-21T09:18:29.8116977Z mad.wide.u32 %rd95, %r1158, 2, %rd10; 2026-02-21T09:18:29.8117189Z mad.wide.u32 %rd96, %r1159, 2, %rd10; 2026-02-21T09:18:29.8117356Z mad.wide.u32 %rd97, %r1160, 2, %rd10; 2026-02-21T09:18:29.8117533Z mad.wide.u32 %rd98, %r1161, 2, %rd10; 2026-02-21T09:18:29.8117704Z mad.wide.u32 %rd99, %r1162, 2, %rd10; 2026-02-21T09:18:29.8117888Z mad.wide.u32 %rd100, %r1163, 2, %rd10; 2026-02-21T09:18:29.8118074Z mad.wide.u32 %rd101, %r1164, 2, %rd10; 2026-02-21T09:18:29.8118247Z mad.wide.u32 %rd102, %r1165, 2, %rd10; 2026-02-21T09:18:29.8118429Z mad.wide.u32 
%rd103, %r1166, 2, %rd10; 2026-02-21T09:18:29.8118600Z mad.wide.u32 %rd104, %r1167, 2, %rd10; 2026-02-21T09:18:29.8118785Z mad.wide.u32 %rd105, %r1168, 2, %rd10; 2026-02-21T09:18:29.8118960Z mad.wide.u32 %rd106, %r1169, 2, %rd10; 2026-02-21T09:18:29.8119172Z mad.wide.u32 %rd107, %r1170, 2, %rd10; 2026-02-21T09:18:29.8119339Z mad.wide.u32 %rd108, %r1171, 2, %rd10; 2026-02-21T09:18:29.8119519Z mad.wide.u32 %rd109, %r1172, 2, %rd10; 2026-02-21T09:18:29.8119698Z mad.wide.u32 %rd110, %r1173, 2, %rd10; 2026-02-21T09:18:29.8119867Z mad.wide.u32 %rd111, %r1174, 2, %rd10; 2026-02-21T09:18:29.8120049Z mad.wide.u32 %rd112, %r1175, 2, %rd10; 2026-02-21T09:18:29.8120219Z mad.wide.u32 %rd113, %r1176, 2, %rd10; 2026-02-21T09:18:29.8120421Z mad.wide.u32 %rd114, %r1177, 2, %rd10; 2026-02-21T09:18:29.8120598Z mad.wide.u32 %rd115, %r1178, 2, %rd10; 2026-02-21T09:18:29.8120780Z mad.wide.u32 %rd116, %r1179, 2, %rd10; 2026-02-21T09:18:29.8121063Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8121361Z // begin inline asm 2026-02-21T09:18:29.8121769Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570}, [%r826 + 0]; 2026-02-21T09:18:29.8122167Z // end inline asm 2026-02-21T09:18:29.8122316Z // begin inline asm 2026-02-21T09:18:29.8122663Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587}, [%r826 + 16]; 2026-02-21T09:18:29.8123048Z // end inline asm 2026-02-21T09:18:29.8123196Z // begin inline asm 2026-02-21T09:18:29.8123539Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604}, [%r826 + 32]; 2026-02-21T09:18:29.8123937Z // end inline asm 2026-02-21T09:18:29.8124070Z // begin inline asm 2026-02-21T09:18:29.8124415Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621}, [%r826 + 48]; 2026-02-21T09:18:29.8124827Z // end inline asm 2026-02-21T09:18:29.8124980Z // begin inline asm 2026-02-21T09:18:29.8125330Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638}, [%r826 + 64]; 2026-02-21T09:18:29.8125695Z // end inline asm 2026-02-21T09:18:29.8125838Z // begin inline asm 2026-02-21T09:18:29.8126181Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655}, [%r826 + 80]; 2026-02-21T09:18:29.8126570Z // end inline asm 2026-02-21T09:18:29.8126700Z // begin inline asm 2026-02-21T09:18:29.8127057Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672}, [%r826 + 96]; 2026-02-21T09:18:29.8127469Z // end inline asm 2026-02-21T09:18:29.8127613Z // begin inline asm 2026-02-21T09:18:29.8128004Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689}, [%r826 + 112]; 2026-02-21T09:18:29.8128405Z // end inline asm 2026-02-21T09:18:29.8128555Z // begin inline asm 2026-02-21T09:18:29.8128938Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706}, [%r826 + 128]; 2026-02-21T09:18:29.8129385Z // end inline asm 2026-02-21T09:18:29.8129532Z // begin inline asm 2026-02-21T09:18:29.8129891Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723}, [%r826 + 144]; 2026-02-21T09:18:29.8130301Z // end inline asm 2026-02-21T09:18:29.8130440Z 
// begin inline asm 2026-02-21T09:18:29.8130811Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740}, [%r826 + 160]; 2026-02-21T09:18:29.8131221Z // end inline asm 2026-02-21T09:18:29.8131393Z // begin inline asm 2026-02-21T09:18:29.8131768Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757}, [%r826 + 176]; 2026-02-21T09:18:29.8132159Z // end inline asm 2026-02-21T09:18:29.8132315Z // begin inline asm 2026-02-21T09:18:29.8132674Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774}, [%r826 + 192]; 2026-02-21T09:18:29.8133121Z // end inline asm 2026-02-21T09:18:29.8133270Z // begin inline asm 2026-02-21T09:18:29.8133633Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791}, [%r826 + 208]; 2026-02-21T09:18:29.8134038Z // end inline asm 2026-02-21T09:18:29.8134175Z // begin inline asm 2026-02-21T09:18:29.8134569Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808}, [%r826 + 224]; 2026-02-21T09:18:29.8135009Z // end inline asm 2026-02-21T09:18:29.8135153Z // begin inline asm 2026-02-21T09:18:29.8135511Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825}, [%r826 + 240]; 2026-02-21T09:18:29.8135898Z // end inline asm 2026-02-21T09:18:29.8136045Z // begin inline asm 2026-02-21T09:18:29.8136211Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:18:29.8136391Z // end inline asm 2026-02-21T09:18:29.8136537Z cvt.u64.u32 %rd117, %r555; 2026-02-21T09:18:29.8136713Z cvt.u64.u32 %rd118, %r556; 
2026-02-21T09:18:29.8136882Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:18:29.8137056Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:18:29.8137341Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8137631Z mov.b64 {%r1180, %r1181}, %rd120; 2026-02-21T09:18:29.8137825Z cvt.rn.f16x2.f32 %r1182, %r1181, %r1180; 2026-02-21T09:18:29.8138110Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8138399Z cvt.u64.u32 %rd121, %r557; 2026-02-21T09:18:29.8138555Z cvt.u64.u32 %rd122, %r558; 2026-02-21T09:18:29.8138721Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:18:29.8138890Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:18:29.8139157Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8139455Z mov.b64 {%r1183, %r1184}, %rd124; 2026-02-21T09:18:29.8139628Z cvt.rn.f16x2.f32 %r1185, %r1184, %r1183; 2026-02-21T09:18:29.8139917Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8140196Z cvt.u64.u32 %rd125, %r559; 2026-02-21T09:18:29.8140363Z cvt.u64.u32 %rd126, %r560; 2026-02-21T09:18:29.8140517Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:18:29.8140686Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:18:29.8140962Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8141243Z mov.b64 {%r1186, %r1187}, %rd128; 2026-02-21T09:18:29.8141453Z cvt.rn.f16x2.f32 %r1188, %r1187, %r1186; 2026-02-21T09:18:29.8141728Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8142014Z cvt.u64.u32 %rd129, %r561; 2026-02-21T09:18:29.8142168Z cvt.u64.u32 %rd130, %r562; 2026-02-21T09:18:29.8142329Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:18:29.8142493Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:18:29.8142753Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8143040Z mov.b64 {%r1189, 
%r1190}, %rd132; 2026-02-21T09:18:29.8143212Z cvt.rn.f16x2.f32 %r1191, %r1190, %r1189; 2026-02-21T09:18:29.8143498Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8143812Z cvt.u64.u32 %rd133, %r563; 2026-02-21T09:18:29.8143972Z cvt.u64.u32 %rd134, %r564; 2026-02-21T09:18:29.8144122Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:18:29.8144289Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:18:29.8144553Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8144873Z mov.b64 {%r1192, %r1193}, %rd136; 2026-02-21T09:18:29.8145081Z cvt.rn.f16x2.f32 %r1194, %r1193, %r1192; 2026-02-21T09:18:29.8145364Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8145642Z cvt.u64.u32 %rd137, %r565; 2026-02-21T09:18:29.8145803Z cvt.u64.u32 %rd138, %r566; 2026-02-21T09:18:29.8145953Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:18:29.8146116Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:18:29.8146423Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8146718Z mov.b64 {%r1195, %r1196}, %rd140; 2026-02-21T09:18:29.8146893Z cvt.rn.f16x2.f32 %r1197, %r1196, %r1195; 2026-02-21T09:18:29.8147167Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8147449Z cvt.u64.u32 %rd141, %r567; 2026-02-21T09:18:29.8147595Z cvt.u64.u32 %rd142, %r568; 2026-02-21T09:18:29.8147748Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:18:29.8147900Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:18:29.8148163Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8148441Z mov.b64 {%r1198, %r1199}, %rd144; 2026-02-21T09:18:29.8148615Z cvt.rn.f16x2.f32 %r1200, %r1199, %r1198; 2026-02-21T09:18:29.8148893Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8149173Z cvt.u64.u32 %rd145, %r569; 
2026-02-21T09:18:29.8149334Z cvt.u64.u32 %rd146, %r570; 2026-02-21T09:18:29.8149481Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:18:29.8149641Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:18:29.8149904Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8150189Z mov.b64 {%r1201, %r1202}, %rd148; 2026-02-21T09:18:29.8150363Z cvt.rn.f16x2.f32 %r1203, %r1202, %r1201; 2026-02-21T09:18:29.8150643Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8150922Z cvt.u64.u32 %rd149, %r572; 2026-02-21T09:18:29.8151068Z cvt.u64.u32 %rd150, %r573; 2026-02-21T09:18:29.8151222Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:18:29.8151372Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:18:29.8151633Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8151909Z mov.b64 {%r1204, %r1205}, %rd152; 2026-02-21T09:18:29.8152088Z cvt.rn.f16x2.f32 %r1206, %r1205, %r1204; 2026-02-21T09:18:29.8152367Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8152641Z cvt.u64.u32 %rd153, %r574; 2026-02-21T09:18:29.8152823Z cvt.u64.u32 %rd154, %r575; 2026-02-21T09:18:29.8152976Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:18:29.8153126Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:18:29.8153388Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8153661Z mov.b64 {%r1207, %r1208}, %rd156; 2026-02-21T09:18:29.8153832Z cvt.rn.f16x2.f32 %r1209, %r1208, %r1207; 2026-02-21T09:18:29.8154100Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8154378Z cvt.u64.u32 %rd157, %r576; 2026-02-21T09:18:29.8154533Z cvt.u64.u32 %rd158, %r577; 2026-02-21T09:18:29.8154709Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:18:29.8154902Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:18:29.8155158Z .loc 1 58 27 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8155442Z mov.b64 {%r1210, %r1211}, %rd160; 2026-02-21T09:18:29.8155606Z cvt.rn.f16x2.f32 %r1212, %r1211, %r1210; 2026-02-21T09:18:29.8155886Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8156166Z cvt.u64.u32 %rd161, %r578; 2026-02-21T09:18:29.8156348Z cvt.u64.u32 %rd162, %r579; 2026-02-21T09:18:29.8156507Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:18:29.8156659Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:18:29.8156923Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8157201Z mov.b64 {%r1213, %r1214}, %rd164; 2026-02-21T09:18:29.8157370Z cvt.rn.f16x2.f32 %r1215, %r1214, %r1213; 2026-02-21T09:18:29.8157667Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8157960Z cvt.u64.u32 %rd165, %r580; 2026-02-21T09:18:29.8158110Z cvt.u64.u32 %rd166, %r581; 2026-02-21T09:18:29.8158255Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:18:29.8158410Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:18:29.8158667Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8158948Z mov.b64 {%r1216, %r1217}, %rd168; 2026-02-21T09:18:29.8159114Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T09:18:29.8159392Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8159670Z cvt.u64.u32 %rd169, %r582; 2026-02-21T09:18:29.8159818Z cvt.u64.u32 %rd170, %r583; 2026-02-21T09:18:29.8159971Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:18:29.8160122Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:18:29.8160388Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8160663Z mov.b64 {%r1219, %r1220}, %rd172; 2026-02-21T09:18:29.8160834Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T09:18:29.8161107Z .loc 1 56 52 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8161390Z cvt.u64.u32 %rd173, %r584; 2026-02-21T09:18:29.8161545Z cvt.u64.u32 %rd174, %r585; 2026-02-21T09:18:29.8161692Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:18:29.8161855Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:18:29.8162113Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8162398Z mov.b64 {%r1222, %r1223}, %rd176; 2026-02-21T09:18:29.8162564Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T09:18:29.8162850Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8163134Z cvt.u64.u32 %rd177, %r586; 2026-02-21T09:18:29.8163282Z cvt.u64.u32 %rd178, %r587; 2026-02-21T09:18:29.8163436Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:18:29.8163585Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:18:29.8163849Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8164156Z mov.b64 {%r1225, %r1226}, %rd180; 2026-02-21T09:18:29.8164327Z cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T09:18:29.8164600Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8164917Z cvt.u64.u32 %rd181, %r589; 2026-02-21T09:18:29.8165070Z cvt.u64.u32 %rd182, %r590; 2026-02-21T09:18:29.8165217Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:18:29.8165377Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:18:29.8165632Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8165915Z mov.b64 {%r1228, %r1229}, %rd184; 2026-02-21T09:18:29.8166104Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T09:18:29.8166381Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8166665Z cvt.u64.u32 %rd185, %r591; 2026-02-21T09:18:29.8166814Z cvt.u64.u32 %rd186, %r592; 2026-02-21T09:18:29.8166970Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:18:29.8167122Z or.b64 
%rd188, %rd185, %rd187; 2026-02-21T09:18:29.8167408Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8167685Z mov.b64 {%r1231, %r1232}, %rd188; 2026-02-21T09:18:29.8167855Z cvt.rn.f16x2.f32 %r1233, %r1232, %r1231; 2026-02-21T09:18:29.8168130Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8168414Z cvt.u64.u32 %rd189, %r593; 2026-02-21T09:18:29.8168471Z cvt.u64.u32 %rd190, %r594; 2026-02-21T09:18:29.8168558Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:18:29.8168618Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:18:29.8168784Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8168848Z mov.b64 {%r1234, %r1235}, %rd192; 2026-02-21T09:18:29.8168912Z cvt.rn.f16x2.f32 %r1236, %r1235, %r1234; 2026-02-21T09:18:29.8169072Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8169127Z cvt.u64.u32 %rd193, %r595; 2026-02-21T09:18:29.8169192Z cvt.u64.u32 %rd194, %r596; 2026-02-21T09:18:29.8169248Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:18:29.8169305Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:18:29.8169474Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8169532Z mov.b64 {%r1237, %r1238}, %rd196; 2026-02-21T09:18:29.8169596Z cvt.rn.f16x2.f32 %r1239, %r1238, %r1237; 2026-02-21T09:18:29.8169762Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8169820Z cvt.u64.u32 %rd197, %r597; 2026-02-21T09:18:29.8169876Z cvt.u64.u32 %rd198, %r598; 2026-02-21T09:18:29.8169931Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:18:29.8169995Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:18:29.8170155Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8170213Z mov.b64 {%r1240, %r1241}, %rd200; 2026-02-21T09:18:29.8170286Z cvt.rn.f16x2.f32 %r1242, %r1241, %r1240; 
2026-02-21T09:18:29.8170450Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8170508Z cvt.u64.u32 %rd201, %r599; 2026-02-21T09:18:29.8170574Z cvt.u64.u32 %rd202, %r600; 2026-02-21T09:18:29.8170632Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:18:29.8170691Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:18:29.8170880Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8170951Z mov.b64 {%r1243, %r1244}, %rd204; 2026-02-21T09:18:29.8171017Z cvt.rn.f16x2.f32 %r1245, %r1244, %r1243; 2026-02-21T09:18:29.8171202Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8171298Z cvt.u64.u32 %rd205, %r601; 2026-02-21T09:18:29.8171359Z cvt.u64.u32 %rd206, %r602; 2026-02-21T09:18:29.8171418Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:18:29.8171479Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:18:29.8171661Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8171724Z mov.b64 {%r1246, %r1247}, %rd208; 2026-02-21T09:18:29.8171790Z cvt.rn.f16x2.f32 %r1248, %r1247, %r1246; 2026-02-21T09:18:29.8171988Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8172049Z cvt.u64.u32 %rd209, %r603; 2026-02-21T09:18:29.8172107Z cvt.u64.u32 %rd210, %r604; 2026-02-21T09:18:29.8172197Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:18:29.8172258Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:18:29.8172452Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8172523Z mov.b64 {%r1249, %r1250}, %rd212; 2026-02-21T09:18:29.8172594Z cvt.rn.f16x2.f32 %r1251, %r1250, %r1249; 2026-02-21T09:18:29.8172790Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8172852Z cvt.u64.u32 %rd213, %r606; 2026-02-21T09:18:29.8172922Z cvt.u64.u32 %rd214, %r607; 2026-02-21T09:18:29.8172982Z shl.b64 %rd215, %rd214, 
32; 2026-02-21T09:18:29.8173042Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:18:29.8173233Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8173293Z mov.b64 {%r1252, %r1253}, %rd216; 2026-02-21T09:18:29.8173381Z cvt.rn.f16x2.f32 %r1254, %r1253, %r1252; 2026-02-21T09:18:29.8173563Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8173623Z cvt.u64.u32 %rd217, %r608; 2026-02-21T09:18:29.8173681Z cvt.u64.u32 %rd218, %r609; 2026-02-21T09:18:29.8173741Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:18:29.8173808Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:18:29.8174000Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8174058Z mov.b64 {%r1255, %r1256}, %rd220; 2026-02-21T09:18:29.8174132Z cvt.rn.f16x2.f32 %r1257, %r1256, %r1255; 2026-02-21T09:18:29.8174324Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8174382Z cvt.u64.u32 %rd221, %r610; 2026-02-21T09:18:29.8174447Z cvt.u64.u32 %rd222, %r611; 2026-02-21T09:18:29.8174505Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:18:29.8174567Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:18:29.8174768Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8174837Z mov.b64 {%r1258, %r1259}, %rd224; 2026-02-21T09:18:29.8174903Z cvt.rn.f16x2.f32 %r1260, %r1259, %r1258; 2026-02-21T09:18:29.8175074Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8175140Z cvt.u64.u32 %rd225, %r612; 2026-02-21T09:18:29.8175199Z cvt.u64.u32 %rd226, %r613; 2026-02-21T09:18:29.8175258Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:18:29.8175317Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:18:29.8175515Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8175575Z mov.b64 {%r1261, %r1262}, %rd228; 2026-02-21T09:18:29.8175641Z 
cvt.rn.f16x2.f32 %r1263, %r1262, %r1261; 2026-02-21T09:18:29.8175840Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8175900Z cvt.u64.u32 %rd229, %r614; 2026-02-21T09:18:29.8175958Z cvt.u64.u32 %rd230, %r615; 2026-02-21T09:18:29.8176025Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:18:29.8176083Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:18:29.8176286Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8176353Z mov.b64 {%r1264, %r1265}, %rd232; 2026-02-21T09:18:29.8176421Z cvt.rn.f16x2.f32 %r1266, %r1265, %r1264; 2026-02-21T09:18:29.8176591Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8176651Z cvt.u64.u32 %rd233, %r616; 2026-02-21T09:18:29.8176717Z cvt.u64.u32 %rd234, %r617; 2026-02-21T09:18:29.8176776Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:18:29.8176837Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:18:29.8177018Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8177107Z mov.b64 {%r1267, %r1268}, %rd236; 2026-02-21T09:18:29.8177173Z cvt.rn.f16x2.f32 %r1269, %r1268, %r1267; 2026-02-21T09:18:29.8177347Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8177408Z cvt.u64.u32 %rd237, %r618; 2026-02-21T09:18:29.8177465Z cvt.u64.u32 %rd238, %r619; 2026-02-21T09:18:29.8177523Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:18:29.8177589Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:18:29.8177801Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8177862Z mov.b64 {%r1270, %r1271}, %rd240; 2026-02-21T09:18:29.8177936Z cvt.rn.f16x2.f32 %r1272, %r1271, %r1270; 2026-02-21T09:18:29.8178101Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8178159Z cvt.u64.u32 %rd241, %r620; 2026-02-21T09:18:29.8178255Z cvt.u64.u32 %rd242, %r621; 
2026-02-21T09:18:29.8178316Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:18:29.8178374Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:18:29.8178544Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8178611Z mov.b64 {%r1273, %r1274}, %rd244; 2026-02-21T09:18:29.8178677Z cvt.rn.f16x2.f32 %r1275, %r1274, %r1273; 2026-02-21T09:18:29.8178847Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8178912Z cvt.u64.u32 %rd245, %r623; 2026-02-21T09:18:29.8178971Z cvt.u64.u32 %rd246, %r624; 2026-02-21T09:18:29.8179029Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:18:29.8179087Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:18:29.8179265Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8179323Z mov.b64 {%r1276, %r1277}, %rd248; 2026-02-21T09:18:29.8179389Z cvt.rn.f16x2.f32 %r1278, %r1277, %r1276; 2026-02-21T09:18:29.8179567Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8179624Z cvt.u64.u32 %rd249, %r625; 2026-02-21T09:18:29.8179682Z cvt.u64.u32 %rd250, %r626; 2026-02-21T09:18:29.8179749Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:18:29.8179808Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:18:29.8179980Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8180047Z mov.b64 {%r1279, %r1280}, %rd252; 2026-02-21T09:18:29.8180115Z cvt.rn.f16x2.f32 %r1281, %r1280, %r1279; 2026-02-21T09:18:29.8180284Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8180343Z cvt.u64.u32 %rd253, %r627; 2026-02-21T09:18:29.8180411Z cvt.u64.u32 %rd254, %r628; 2026-02-21T09:18:29.8180469Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:18:29.8180540Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:18:29.8180712Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8180771Z mov.b64 {%r1282, 
%r1283}, %rd256; 2026-02-21T09:18:29.8180835Z cvt.rn.f16x2.f32 %r1284, %r1283, %r1282; 2026-02-21T09:18:29.8181026Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8181084Z cvt.u64.u32 %rd257, %r629; 2026-02-21T09:18:29.8181139Z cvt.u64.u32 %rd258, %r630; 2026-02-21T09:18:29.8181198Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:18:29.8181263Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:18:29.8181426Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8181485Z mov.b64 {%r1285, %r1286}, %rd260; 2026-02-21T09:18:29.8181557Z cvt.rn.f16x2.f32 %r1287, %r1286, %r1285; 2026-02-21T09:18:29.8181720Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8181799Z cvt.u64.u32 %rd261, %r631; 2026-02-21T09:18:29.8181865Z cvt.u64.u32 %rd262, %r632; 2026-02-21T09:18:29.8181922Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:18:29.8181980Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:18:29.8182143Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8182209Z mov.b64 {%r1288, %r1289}, %rd264; 2026-02-21T09:18:29.8182272Z cvt.rn.f16x2.f32 %r1290, %r1289, %r1288; 2026-02-21T09:18:29.8182454Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8182518Z cvt.u64.u32 %rd265, %r633; 2026-02-21T09:18:29.8182573Z cvt.u64.u32 %rd266, %r634; 2026-02-21T09:18:29.8182630Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:18:29.8182687Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:18:29.8182880Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8182940Z mov.b64 {%r1291, %r1292}, %rd268; 2026-02-21T09:18:29.8183002Z cvt.rn.f16x2.f32 %r1293, %r1292, %r1291; 2026-02-21T09:18:29.8183167Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8183224Z cvt.u64.u32 %rd269, %r635; 
2026-02-21T09:18:29.8183279Z cvt.u64.u32 %rd270, %r636; 2026-02-21T09:18:29.8183340Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:18:29.8183397Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:18:29.8183557Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8183622Z mov.b64 {%r1294, %r1295}, %rd272; 2026-02-21T09:18:29.8183686Z cvt.rn.f16x2.f32 %r1296, %r1295, %r1294; 2026-02-21T09:18:29.8183843Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8183898Z cvt.u64.u32 %rd273, %r637; 2026-02-21T09:18:29.8183962Z cvt.u64.u32 %rd274, %r638; 2026-02-21T09:18:29.8184018Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:18:29.8184073Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:18:29.8184238Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8184295Z mov.b64 {%r1297, %r1298}, %rd276; 2026-02-21T09:18:29.8184359Z cvt.rn.f16x2.f32 %r1299, %r1298, %r1297; 2026-02-21T09:18:29.8184525Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8184584Z cvt.u64.u32 %rd277, %r640; 2026-02-21T09:18:29.8184638Z cvt.u64.u32 %rd278, %r641; 2026-02-21T09:18:29.8184724Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:18:29.8184789Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:18:29.8184952Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8185009Z mov.b64 {%r1300, %r1301}, %rd280; 2026-02-21T09:18:29.8185081Z cvt.rn.f16x2.f32 %r1302, %r1301, %r1300; 2026-02-21T09:18:29.8185249Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8185306Z cvt.u64.u32 %rd281, %r642; 2026-02-21T09:18:29.8185369Z cvt.u64.u32 %rd282, %r643; 2026-02-21T09:18:29.8185425Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:18:29.8185509Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:18:29.8185674Z .loc 1 58 27 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8185740Z mov.b64 {%r1303, %r1304}, %rd284; 2026-02-21T09:18:29.8185805Z cvt.rn.f16x2.f32 %r1305, %r1304, %r1303; 2026-02-21T09:18:29.8185966Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8186030Z cvt.u64.u32 %rd285, %r644; 2026-02-21T09:18:29.8186086Z cvt.u64.u32 %rd286, %r645; 2026-02-21T09:18:29.8186141Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:18:29.8186199Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:18:29.8186395Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8186454Z mov.b64 {%r1306, %r1307}, %rd288; 2026-02-21T09:18:29.8186518Z cvt.rn.f16x2.f32 %r1308, %r1307, %r1306; 2026-02-21T09:18:29.8186685Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8186741Z cvt.u64.u32 %rd289, %r646; 2026-02-21T09:18:29.8186796Z cvt.u64.u32 %rd290, %r647; 2026-02-21T09:18:29.8186881Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:18:29.8186939Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:18:29.8187099Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8187161Z mov.b64 {%r1309, %r1310}, %rd292; 2026-02-21T09:18:29.8187224Z cvt.rn.f16x2.f32 %r1311, %r1310, %r1309; 2026-02-21T09:18:29.8187411Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8187469Z cvt.u64.u32 %rd293, %r648; 2026-02-21T09:18:29.8187530Z cvt.u64.u32 %rd294, %r649; 2026-02-21T09:18:29.8187586Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:18:29.8187641Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:18:29.8187810Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8187869Z mov.b64 {%r1312, %r1313}, %rd296; 2026-02-21T09:18:29.8187930Z cvt.rn.f16x2.f32 %r1314, %r1313, %r1312; 2026-02-21T09:18:29.8188103Z .loc 1 56 52 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8188159Z cvt.u64.u32 %rd297, %r650; 2026-02-21T09:18:29.8188214Z cvt.u64.u32 %rd298, %r651; 2026-02-21T09:18:29.8188271Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:18:29.8188335Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:18:29.8188495Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8188554Z mov.b64 {%r1315, %r1316}, %rd300; 2026-02-21T09:18:29.8188625Z cvt.rn.f16x2.f32 %r1317, %r1316, %r1315; 2026-02-21T09:18:29.8188787Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8188844Z cvt.u64.u32 %rd301, %r652; 2026-02-21T09:18:29.8188907Z cvt.u64.u32 %rd302, %r653; 2026-02-21T09:18:29.8188961Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:18:29.8189018Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:18:29.8189182Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8189246Z mov.b64 {%r1318, %r1319}, %rd304; 2026-02-21T09:18:29.8189309Z cvt.rn.f16x2.f32 %r1320, %r1319, %r1318; 2026-02-21T09:18:29.8189474Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8189536Z cvt.u64.u32 %rd305, %r654; 2026-02-21T09:18:29.8189594Z cvt.u64.u32 %rd306, %r655; 2026-02-21T09:18:29.8189652Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:18:29.8189709Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:18:29.8189881Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8189937Z mov.b64 {%r1321, %r1322}, %rd308; 2026-02-21T09:18:29.8190024Z cvt.rn.f16x2.f32 %r1323, %r1322, %r1321; 2026-02-21T09:18:29.8190200Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8190258Z cvt.u64.u32 %rd309, %r657; 2026-02-21T09:18:29.8190316Z cvt.u64.u32 %rd310, %r658; 2026-02-21T09:18:29.8190381Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:18:29.8190439Z or.b64 
%rd312, %rd309, %rd311; 2026-02-21T09:18:29.8190600Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8190676Z mov.b64 {%r1324, %r1325}, %rd312; 2026-02-21T09:18:29.8190743Z cvt.rn.f16x2.f32 %r1326, %r1325, %r1324; 2026-02-21T09:18:29.8190928Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8190984Z cvt.u64.u32 %rd313, %r659; 2026-02-21T09:18:29.8191049Z cvt.u64.u32 %rd314, %r660; 2026-02-21T09:18:29.8191105Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:18:29.8191163Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:18:29.8191332Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8191411Z mov.b64 {%r1327, %r1328}, %rd316; 2026-02-21T09:18:29.8191476Z cvt.rn.f16x2.f32 %r1329, %r1328, %r1327; 2026-02-21T09:18:29.8191641Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8191698Z cvt.u64.u32 %rd317, %r661; 2026-02-21T09:18:29.8191752Z cvt.u64.u32 %rd318, %r662; 2026-02-21T09:18:29.8191808Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:18:29.8191893Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:18:29.8192059Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8192117Z mov.b64 {%r1330, %r1331}, %rd320; 2026-02-21T09:18:29.8192187Z cvt.rn.f16x2.f32 %r1332, %r1331, %r1330; 2026-02-21T09:18:29.8192351Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8192410Z cvt.u64.u32 %rd321, %r663; 2026-02-21T09:18:29.8192473Z cvt.u64.u32 %rd322, %r664; 2026-02-21T09:18:29.8192531Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:18:29.8192588Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:18:29.8192751Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8192817Z mov.b64 {%r1333, %r1334}, %rd324; 2026-02-21T09:18:29.8192881Z cvt.rn.f16x2.f32 %r1335, %r1334, %r1333; 
2026-02-21T09:18:29.8193050Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8193116Z cvt.u64.u32 %rd325, %r665; 2026-02-21T09:18:29.8193172Z cvt.u64.u32 %rd326, %r666; 2026-02-21T09:18:29.8193228Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:18:29.8193293Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:18:29.8193460Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8193519Z mov.b64 {%r1336, %r1337}, %rd328; 2026-02-21T09:18:29.8193583Z cvt.rn.f16x2.f32 %r1338, %r1337, %r1336; 2026-02-21T09:18:29.8193756Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8193812Z cvt.u64.u32 %rd329, %r667; 2026-02-21T09:18:29.8193868Z cvt.u64.u32 %rd330, %r668; 2026-02-21T09:18:29.8193930Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:18:29.8193988Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:18:29.8194152Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8194218Z mov.b64 {%r1339, %r1340}, %rd332; 2026-02-21T09:18:29.8194289Z cvt.rn.f16x2.f32 %r1341, %r1340, %r1339; 2026-02-21T09:18:29.8194463Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8194522Z cvt.u64.u32 %rd333, %r669; 2026-02-21T09:18:29.8194610Z cvt.u64.u32 %rd334, %r670; 2026-02-21T09:18:29.8194703Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:18:29.8194765Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:18:29.8194946Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8195009Z mov.b64 {%r1342, %r1343}, %rd336; 2026-02-21T09:18:29.8195074Z cvt.rn.f16x2.f32 %r1344, %r1343, %r1342; 2026-02-21T09:18:29.8195252Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8195311Z cvt.u64.u32 %rd337, %r671; 2026-02-21T09:18:29.8195370Z cvt.u64.u32 %rd338, %r672; 2026-02-21T09:18:29.8195432Z shl.b64 %rd339, %rd338, 
32; 2026-02-21T09:18:29.8195547Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:18:29.8195719Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8195773Z mov.b64 {%r1345, %r1346}, %rd340; 2026-02-21T09:18:29.8195845Z cvt.rn.f16x2.f32 %r1347, %r1346, %r1345; 2026-02-21T09:18:29.8196000Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8196087Z cvt.u64.u32 %rd341, %r674; 2026-02-21T09:18:29.8196152Z cvt.u64.u32 %rd342, %r675; 2026-02-21T09:18:29.8196208Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:18:29.8196264Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:18:29.8196424Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8196487Z mov.b64 {%r1348, %r1349}, %rd344; 2026-02-21T09:18:29.8196576Z cvt.rn.f16x2.f32 %r1350, %r1349, %r1348; 2026-02-21T09:18:29.8196742Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8196809Z cvt.u64.u32 %rd345, %r676; 2026-02-21T09:18:29.8196866Z cvt.u64.u32 %rd346, %r677; 2026-02-21T09:18:29.8196922Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:18:29.8196988Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:18:29.8197152Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8197212Z mov.b64 {%r1351, %r1352}, %rd348; 2026-02-21T09:18:29.8197276Z cvt.rn.f16x2.f32 %r1353, %r1352, %r1351; 2026-02-21T09:18:29.8197445Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8197500Z cvt.u64.u32 %rd349, %r678; 2026-02-21T09:18:29.8197557Z cvt.u64.u32 %rd350, %r679; 2026-02-21T09:18:29.8197621Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:18:29.8197678Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:18:29.8197841Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8197906Z mov.b64 {%r1354, %r1355}, %rd352; 2026-02-21T09:18:29.8197969Z 
cvt.rn.f16x2.f32 %r1356, %r1355, %r1354; 2026-02-21T09:18:29.8198132Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8198189Z cvt.u64.u32 %rd353, %r680; 2026-02-21T09:18:29.8198253Z cvt.u64.u32 %rd354, %r681; 2026-02-21T09:18:29.8198308Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:18:29.8198367Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:18:29.8198537Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8198595Z mov.b64 {%r1357, %r1358}, %rd356; 2026-02-21T09:18:29.8198658Z cvt.rn.f16x2.f32 %r1359, %r1358, %r1357; 2026-02-21T09:18:29.8198829Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8198886Z cvt.u64.u32 %rd357, %r682; 2026-02-21T09:18:29.8198943Z cvt.u64.u32 %rd358, %r683; 2026-02-21T09:18:29.8199001Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:18:29.8199067Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T09:18:29.8199230Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8199318Z mov.b64 {%r1360, %r1361}, %rd360; 2026-02-21T09:18:29.8199393Z cvt.rn.f16x2.f32 %r1362, %r1361, %r1360; 2026-02-21T09:18:29.8199560Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8199618Z cvt.u64.u32 %rd361, %r684; 2026-02-21T09:18:29.8199684Z cvt.u64.u32 %rd362, %r685; 2026-02-21T09:18:29.8199741Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:18:29.8199797Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:18:29.8199958Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8200024Z mov.b64 {%r1363, %r1364}, %rd364; 2026-02-21T09:18:29.8200122Z cvt.rn.f16x2.f32 %r1365, %r1364, %r1363; 2026-02-21T09:18:29.8200283Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8200345Z cvt.u64.u32 %rd365, %r686; 2026-02-21T09:18:29.8200401Z cvt.u64.u32 %rd366, %r687; 
2026-02-21T09:18:29.8200456Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:18:29.8200519Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:18:29.8200706Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8200765Z mov.b64 {%r1366, %r1367}, %rd368; 2026-02-21T09:18:29.8200827Z cvt.rn.f16x2.f32 %r1368, %r1367, %r1366; 2026-02-21T09:18:29.8200997Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8201055Z cvt.u64.u32 %rd369, %r688; 2026-02-21T09:18:29.8201110Z cvt.u64.u32 %rd370, %r689; 2026-02-21T09:18:29.8201193Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:18:29.8201251Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:18:29.8201407Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8201472Z mov.b64 {%r1369, %r1370}, %rd372; 2026-02-21T09:18:29.8201537Z cvt.rn.f16x2.f32 %r1371, %r1370, %r1369; 2026-02-21T09:18:29.8201697Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8201751Z cvt.u64.u32 %rd373, %r691; 2026-02-21T09:18:29.8201815Z cvt.u64.u32 %rd374, %r692; 2026-02-21T09:18:29.8201872Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:18:29.8201929Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:18:29.8202096Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8202153Z mov.b64 {%r1372, %r1373}, %rd376; 2026-02-21T09:18:29.8202216Z cvt.rn.f16x2.f32 %r1374, %r1373, %r1372; 2026-02-21T09:18:29.8202380Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8202437Z cvt.u64.u32 %rd377, %r693; 2026-02-21T09:18:29.8202491Z cvt.u64.u32 %rd378, %r694; 2026-02-21T09:18:29.8202546Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:18:29.8202611Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:18:29.8202767Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8202824Z mov.b64 {%r1375, 
%r1376}, %rd380; 2026-02-21T09:18:29.8202892Z cvt.rn.f16x2.f32 %r1377, %r1376, %r1375; 2026-02-21T09:18:29.8203047Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8203102Z cvt.u64.u32 %rd381, %r695; 2026-02-21T09:18:29.8203164Z cvt.u64.u32 %rd382, %r696; 2026-02-21T09:18:29.8203218Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:18:29.8203274Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:18:29.8203433Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8203497Z mov.b64 {%r1378, %r1379}, %rd384; 2026-02-21T09:18:29.8203559Z cvt.rn.f16x2.f32 %r1380, %r1379, %r1378; 2026-02-21T09:18:29.8203716Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8203803Z cvt.u64.u32 %rd385, %r697; 2026-02-21T09:18:29.8203858Z cvt.u64.u32 %rd386, %r698; 2026-02-21T09:18:29.8203913Z shl.b64 %rd387, %rd386, 32; 2026-02-21T09:18:29.8203976Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:18:29.8204137Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8204193Z mov.b64 {%r1381, %r1382}, %rd388; 2026-02-21T09:18:29.8204256Z cvt.rn.f16x2.f32 %r1383, %r1382, %r1381; 2026-02-21T09:18:29.8204423Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8204480Z cvt.u64.u32 %rd389, %r699; 2026-02-21T09:18:29.8204564Z cvt.u64.u32 %rd390, %r700; 2026-02-21T09:18:29.8204628Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:18:29.8204710Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:18:29.8204875Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8204941Z mov.b64 {%r1384, %r1385}, %rd392; 2026-02-21T09:18:29.8205004Z cvt.rn.f16x2.f32 %r1386, %r1385, %r1384; 2026-02-21T09:18:29.8205194Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8205253Z cvt.u64.u32 %rd393, %r701; 
2026-02-21T09:18:29.8205314Z cvt.u64.u32 %rd394, %r702; 2026-02-21T09:18:29.8205370Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:18:29.8205427Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:18:29.8205596Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8205677Z mov.b64 {%r1387, %r1388}, %rd396; 2026-02-21T09:18:29.8205743Z cvt.rn.f16x2.f32 %r1389, %r1388, %r1387; 2026-02-21T09:18:29.8205912Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8205968Z cvt.u64.u32 %rd397, %r703; 2026-02-21T09:18:29.8206025Z cvt.u64.u32 %rd398, %r704; 2026-02-21T09:18:29.8206081Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:18:29.8206145Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:18:29.8206313Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8206372Z mov.b64 {%r1390, %r1391}, %rd400; 2026-02-21T09:18:29.8206442Z cvt.rn.f16x2.f32 %r1392, %r1391, %r1390; 2026-02-21T09:18:29.8206604Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8206660Z cvt.u64.u32 %rd401, %r705; 2026-02-21T09:18:29.8206722Z cvt.u64.u32 %rd402, %r706; 2026-02-21T09:18:29.8206778Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:18:29.8206836Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:18:29.8207001Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8207064Z mov.b64 {%r1393, %r1394}, %rd404; 2026-02-21T09:18:29.8207129Z cvt.rn.f16x2.f32 %r1395, %r1394, %r1393; 2026-02-21T09:18:29.8207296Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8207359Z cvt.u64.u32 %rd405, %r708; 2026-02-21T09:18:29.8207416Z cvt.u64.u32 %rd406, %r709; 2026-02-21T09:18:29.8207473Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:18:29.8207538Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:18:29.8207703Z .loc 1 58 27 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8207761Z mov.b64 {%r1396, %r1397}, %rd408; 2026-02-21T09:18:29.8207825Z cvt.rn.f16x2.f32 %r1398, %r1397, %r1396; 2026-02-21T09:18:29.8208001Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8208060Z cvt.u64.u32 %rd409, %r710; 2026-02-21T09:18:29.8208117Z cvt.u64.u32 %rd410, %r711; 2026-02-21T09:18:29.8208184Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:18:29.8208240Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T09:18:29.8208429Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8208495Z mov.b64 {%r1399, %r1400}, %rd412; 2026-02-21T09:18:29.8208561Z cvt.rn.f16x2.f32 %r1401, %r1400, %r1399; 2026-02-21T09:18:29.8208720Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8208775Z cvt.u64.u32 %rd413, %r712; 2026-02-21T09:18:29.8208837Z cvt.u64.u32 %rd414, %r713; 2026-02-21T09:18:29.8208892Z shl.b64 %rd415, %rd414, 32; 2026-02-21T09:18:29.8208948Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:18:29.8209119Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8209207Z mov.b64 {%r1402, %r1403}, %rd416; 2026-02-21T09:18:29.8209271Z cvt.rn.f16x2.f32 %r1404, %r1403, %r1402; 2026-02-21T09:18:29.8209439Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8209497Z cvt.u64.u32 %rd417, %r714; 2026-02-21T09:18:29.8209553Z cvt.u64.u32 %rd418, %r715; 2026-02-21T09:18:29.8209608Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:18:29.8209690Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:18:29.8209857Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8209914Z mov.b64 {%r1405, %r1406}, %rd420; 2026-02-21T09:18:29.8209983Z cvt.rn.f16x2.f32 %r1407, %r1406, %r1405; 2026-02-21T09:18:29.8210145Z .loc 1 56 52 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8210221Z cvt.u64.u32 %rd421, %r716; 2026-02-21T09:18:29.8210286Z cvt.u64.u32 %rd422, %r717; 2026-02-21T09:18:29.8210342Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:18:29.8210399Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:18:29.8210554Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8210619Z mov.b64 {%r1408, %r1409}, %rd424; 2026-02-21T09:18:29.8210682Z cvt.rn.f16x2.f32 %r1410, %r1409, %r1408; 2026-02-21T09:18:29.8210838Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8210900Z cvt.u64.u32 %rd425, %r718; 2026-02-21T09:18:29.8210956Z cvt.u64.u32 %rd426, %r719; 2026-02-21T09:18:29.8211011Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:18:29.8211075Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:18:29.8211231Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8211290Z mov.b64 {%r1411, %r1412}, %rd428; 2026-02-21T09:18:29.8211354Z cvt.rn.f16x2.f32 %r1413, %r1412, %r1411; 2026-02-21T09:18:29.8211515Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8211572Z cvt.u64.u32 %rd429, %r720; 2026-02-21T09:18:29.8211627Z cvt.u64.u32 %rd430, %r721; 2026-02-21T09:18:29.8211691Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:18:29.8211747Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:18:29.8211907Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8211970Z mov.b64 {%r1414, %r1415}, %rd432; 2026-02-21T09:18:29.8212032Z cvt.rn.f16x2.f32 %r1416, %r1415, %r1414; 2026-02-21T09:18:29.8212187Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8212242Z cvt.u64.u32 %rd433, %r722; 2026-02-21T09:18:29.8212304Z cvt.u64.u32 %rd434, %r723; 2026-02-21T09:18:29.8212360Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:18:29.8212415Z or.b64 
%rd436, %rd433, %rd435; 2026-02-21T09:18:29.8212578Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8212635Z mov.b64 {%r1417, %r1418}, %rd436; 2026-02-21T09:18:29.8212697Z cvt.rn.f16x2.f32 %r1419, %r1418, %r1417; 2026-02-21T09:18:29.8212898Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8212957Z cvt.u64.u32 %rd437, %r725; 2026-02-21T09:18:29.8213017Z cvt.u64.u32 %rd438, %r726; 2026-02-21T09:18:29.8213076Z shl.b64 %rd439, %rd438, 32; 2026-02-21T09:18:29.8213142Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:18:29.8213310Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8213370Z mov.b64 {%r1420, %r1421}, %rd440; 2026-02-21T09:18:29.8213442Z cvt.rn.f16x2.f32 %r1422, %r1421, %r1420; 2026-02-21T09:18:29.8213614Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8213695Z cvt.u64.u32 %rd441, %r727; 2026-02-21T09:18:29.8213761Z cvt.u64.u32 %rd442, %r728; 2026-02-21T09:18:29.8213819Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:18:29.8213877Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:18:29.8214048Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8214115Z mov.b64 {%r1423, %r1424}, %rd444; 2026-02-21T09:18:29.8214201Z cvt.rn.f16x2.f32 %r1425, %r1424, %r1423; 2026-02-21T09:18:29.8214373Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8214437Z cvt.u64.u32 %rd445, %r729; 2026-02-21T09:18:29.8214494Z cvt.u64.u32 %rd446, %r730; 2026-02-21T09:18:29.8214553Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:18:29.8214619Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:18:29.8214853Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8214919Z mov.b64 {%r1426, %r1427}, %rd448; 2026-02-21T09:18:29.8214986Z cvt.rn.f16x2.f32 %r1428, %r1427, %r1426; 
2026-02-21T09:18:29.8215164Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8215225Z cvt.u64.u32 %rd449, %r731; 2026-02-21T09:18:29.8215284Z cvt.u64.u32 %rd450, %r732; 2026-02-21T09:18:29.8215349Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:18:29.8215410Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:18:29.8215577Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8215652Z mov.b64 {%r1429, %r1430}, %rd452; 2026-02-21T09:18:29.8215718Z cvt.rn.f16x2.f32 %r1431, %r1430, %r1429; 2026-02-21T09:18:29.8215887Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8215946Z cvt.u64.u32 %rd453, %r733; 2026-02-21T09:18:29.8216012Z cvt.u64.u32 %rd454, %r734; 2026-02-21T09:18:29.8216071Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:18:29.8216130Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:18:29.8216301Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8216362Z mov.b64 {%r1432, %r1433}, %rd456; 2026-02-21T09:18:29.8216428Z cvt.rn.f16x2.f32 %r1434, %r1433, %r1432; 2026-02-21T09:18:29.8216604Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8216664Z cvt.u64.u32 %rd457, %r735; 2026-02-21T09:18:29.8216723Z cvt.u64.u32 %rd458, %r736; 2026-02-21T09:18:29.8216781Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:18:29.8216850Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:18:29.8217020Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8217081Z mov.b64 {%r1435, %r1436}, %rd460; 2026-02-21T09:18:29.8217157Z cvt.rn.f16x2.f32 %r1437, %r1436, %r1435; 2026-02-21T09:18:29.8217323Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8217381Z cvt.u64.u32 %rd461, %r737; 2026-02-21T09:18:29.8217448Z cvt.u64.u32 %rd462, %r738; 2026-02-21T09:18:29.8217534Z shl.b64 %rd463, %rd462, 
32; 2026-02-21T09:18:29.8217594Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:18:29.8217771Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8217838Z mov.b64 {%r1438, %r1439}, %rd464; 2026-02-21T09:18:29.8217904Z cvt.rn.f16x2.f32 %r1440, %r1439, %r1438; 2026-02-21T09:18:29.8218077Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8218142Z cvt.u64.u32 %rd465, %r739; 2026-02-21T09:18:29.8218198Z cvt.u64.u32 %rd466, %r740; 2026-02-21T09:18:29.8218256Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:18:29.8218323Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:18:29.8218523Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8218582Z mov.b64 {%r1441, %r1442}, %rd468; 2026-02-21T09:18:29.8218647Z cvt.rn.f16x2.f32 %r1443, %r1442, %r1441; 2026-02-21T09:18:29.8218828Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8218886Z cvt.u64.u32 %rd469, %r742; 2026-02-21T09:18:29.8218971Z cvt.u64.u32 %rd470, %r743; 2026-02-21T09:18:29.8219037Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:18:29.8219096Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:18:29.8219268Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8219334Z mov.b64 {%r1444, %r1445}, %rd472; 2026-02-21T09:18:29.8219402Z cvt.rn.f16x2.f32 %r1446, %r1445, %r1444; 2026-02-21T09:18:29.8219595Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8219655Z cvt.u64.u32 %rd473, %r744; 2026-02-21T09:18:29.8219720Z cvt.u64.u32 %rd474, %r745; 2026-02-21T09:18:29.8219778Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:18:29.8219835Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:18:29.8220016Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8220076Z mov.b64 {%r1447, %r1448}, %rd476; 2026-02-21T09:18:29.8220143Z 
cvt.rn.f16x2.f32 %r1449, %r1448, %r1447; 2026-02-21T09:18:29.8220319Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8220376Z cvt.u64.u32 %rd477, %r746; 2026-02-21T09:18:29.8220433Z cvt.u64.u32 %rd478, %r747; 2026-02-21T09:18:29.8220492Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:18:29.8220559Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:18:29.8220736Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8220795Z mov.b64 {%r1450, %r1451}, %rd480; 2026-02-21T09:18:29.8220868Z cvt.rn.f16x2.f32 %r1452, %r1451, %r1450; 2026-02-21T09:18:29.8221036Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8221095Z cvt.u64.u32 %rd481, %r748; 2026-02-21T09:18:29.8221160Z cvt.u64.u32 %rd482, %r749; 2026-02-21T09:18:29.8221217Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:18:29.8221286Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T09:18:29.8221449Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8221512Z mov.b64 {%r1453, %r1454}, %rd484; 2026-02-21T09:18:29.8221574Z cvt.rn.f16x2.f32 %r1455, %r1454, %r1453; 2026-02-21T09:18:29.8221735Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8221797Z cvt.u64.u32 %rd485, %r750; 2026-02-21T09:18:29.8221854Z cvt.u64.u32 %rd486, %r751; 2026-02-21T09:18:29.8221910Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:18:29.8221974Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:18:29.8222133Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8222213Z mov.b64 {%r1456, %r1457}, %rd488; 2026-02-21T09:18:29.8222275Z cvt.rn.f16x2.f32 %r1458, %r1457, %r1456; 2026-02-21T09:18:29.8222446Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8222502Z cvt.u64.u32 %rd489, %r752; 2026-02-21T09:18:29.8222555Z cvt.u64.u32 %rd490, %r753; 
2026-02-21T09:18:29.8222617Z shl.b64 %rd491, %rd490, 32; 2026-02-21T09:18:29.8222674Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T09:18:29.8222838Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8222900Z mov.b64 {%r1459, %r1460}, %rd492; 2026-02-21T09:18:29.8222964Z cvt.rn.f16x2.f32 %r1461, %r1460, %r1459; 2026-02-21T09:18:29.8223150Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8223206Z cvt.u64.u32 %rd493, %r754; 2026-02-21T09:18:29.8223268Z cvt.u64.u32 %rd494, %r755; 2026-02-21T09:18:29.8223326Z shl.b64 %rd495, %rd494, 32; 2026-02-21T09:18:29.8223383Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T09:18:29.8223556Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8223636Z mov.b64 {%r1462, %r1463}, %rd496; 2026-02-21T09:18:29.8223700Z cvt.rn.f16x2.f32 %r1464, %r1463, %r1462; 2026-02-21T09:18:29.8223871Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8223927Z cvt.u64.u32 %rd497, %r756; 2026-02-21T09:18:29.8223983Z cvt.u64.u32 %rd498, %r757; 2026-02-21T09:18:29.8224039Z shl.b64 %rd499, %rd498, 32; 2026-02-21T09:18:29.8224124Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T09:18:29.8224291Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8224349Z mov.b64 {%r1465, %r1466}, %rd500; 2026-02-21T09:18:29.8224418Z cvt.rn.f16x2.f32 %r1467, %r1466, %r1465; 2026-02-21T09:18:29.8224583Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8224638Z cvt.u64.u32 %rd501, %r759; 2026-02-21T09:18:29.8224726Z cvt.u64.u32 %rd502, %r760; 2026-02-21T09:18:29.8224784Z shl.b64 %rd503, %rd502, 32; 2026-02-21T09:18:29.8224840Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T09:18:29.8225005Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8225068Z mov.b64 {%r1468, 
%r1469}, %rd504; 2026-02-21T09:18:29.8225131Z cvt.rn.f16x2.f32 %r1470, %r1469, %r1468; 2026-02-21T09:18:29.8225298Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8225362Z cvt.u64.u32 %rd505, %r761; 2026-02-21T09:18:29.8225418Z cvt.u64.u32 %rd506, %r762; 2026-02-21T09:18:29.8225474Z shl.b64 %rd507, %rd506, 32; 2026-02-21T09:18:29.8225538Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T09:18:29.8225710Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8225768Z mov.b64 {%r1471, %r1472}, %rd508; 2026-02-21T09:18:29.8225838Z cvt.rn.f16x2.f32 %r1473, %r1472, %r1471; 2026-02-21T09:18:29.8226015Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8226071Z cvt.u64.u32 %rd509, %r763; 2026-02-21T09:18:29.8226127Z cvt.u64.u32 %rd510, %r764; 2026-02-21T09:18:29.8226192Z shl.b64 %rd511, %rd510, 32; 2026-02-21T09:18:29.8226249Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T09:18:29.8226417Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8226481Z mov.b64 {%r1474, %r1475}, %rd512; 2026-02-21T09:18:29.8226545Z cvt.rn.f16x2.f32 %r1476, %r1475, %r1474; 2026-02-21T09:18:29.8226709Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8226797Z cvt.u64.u32 %rd513, %r765; 2026-02-21T09:18:29.8226861Z cvt.u64.u32 %rd514, %r766; 2026-02-21T09:18:29.8226916Z shl.b64 %rd515, %rd514, 32; 2026-02-21T09:18:29.8226971Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T09:18:29.8227136Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8227194Z mov.b64 {%r1477, %r1478}, %rd516; 2026-02-21T09:18:29.8227258Z cvt.rn.f16x2.f32 %r1479, %r1478, %r1477; 2026-02-21T09:18:29.8227425Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8227481Z cvt.u64.u32 %rd517, %r767; 
2026-02-21T09:18:29.8227538Z cvt.u64.u32 %rd518, %r768; 2026-02-21T09:18:29.8227621Z shl.b64 %rd519, %rd518, 32; 2026-02-21T09:18:29.8227685Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T09:18:29.8227851Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8227909Z mov.b64 {%r1480, %r1481}, %rd520; 2026-02-21T09:18:29.8227979Z cvt.rn.f16x2.f32 %r1482, %r1481, %r1480; 2026-02-21T09:18:29.8228144Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8228226Z cvt.u64.u32 %rd521, %r769; 2026-02-21T09:18:29.8228292Z cvt.u64.u32 %rd522, %r770; 2026-02-21T09:18:29.8228347Z shl.b64 %rd523, %rd522, 32; 2026-02-21T09:18:29.8228404Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T09:18:29.8228560Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8228625Z mov.b64 {%r1483, %r1484}, %rd524; 2026-02-21T09:18:29.8228727Z cvt.rn.f16x2.f32 %r1485, %r1484, %r1483; 2026-02-21T09:18:29.8228892Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8228955Z cvt.u64.u32 %rd525, %r771; 2026-02-21T09:18:29.8229010Z cvt.u64.u32 %rd526, %r772; 2026-02-21T09:18:29.8229068Z shl.b64 %rd527, %rd526, 32; 2026-02-21T09:18:29.8229133Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T09:18:29.8229301Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8229360Z mov.b64 {%r1486, %r1487}, %rd528; 2026-02-21T09:18:29.8229423Z cvt.rn.f16x2.f32 %r1488, %r1487, %r1486; 2026-02-21T09:18:29.8229592Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8229648Z cvt.u64.u32 %rd529, %r773; 2026-02-21T09:18:29.8229703Z cvt.u64.u32 %rd530, %r774; 2026-02-21T09:18:29.8229765Z shl.b64 %rd531, %rd530, 32; 2026-02-21T09:18:29.8229823Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T09:18:29.8229986Z .loc 1 58 27 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8230049Z mov.b64 {%r1489, %r1490}, %rd532; 2026-02-21T09:18:29.8230112Z cvt.rn.f16x2.f32 %r1491, %r1490, %r1489; 2026-02-21T09:18:29.8230273Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8230329Z cvt.u64.u32 %rd533, %r776; 2026-02-21T09:18:29.8230391Z cvt.u64.u32 %rd534, %r777; 2026-02-21T09:18:29.8230449Z shl.b64 %rd535, %rd534, 32; 2026-02-21T09:18:29.8230505Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T09:18:29.8230674Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8230732Z mov.b64 {%r1492, %r1493}, %rd536; 2026-02-21T09:18:29.8230793Z cvt.rn.f16x2.f32 %r1494, %r1493, %r1492; 2026-02-21T09:18:29.8230964Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8231023Z cvt.u64.u32 %rd537, %r778; 2026-02-21T09:18:29.8231079Z cvt.u64.u32 %rd538, %r779; 2026-02-21T09:18:29.8231136Z shl.b64 %rd539, %rd538, 32; 2026-02-21T09:18:29.8231202Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T09:18:29.8231365Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8231447Z mov.b64 {%r1495, %r1496}, %rd540; 2026-02-21T09:18:29.8231516Z cvt.rn.f16x2.f32 %r1497, %r1496, %r1495; 2026-02-21T09:18:29.8231684Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8231740Z cvt.u64.u32 %rd541, %r780; 2026-02-21T09:18:29.8231802Z cvt.u64.u32 %rd542, %r781; 2026-02-21T09:18:29.8231857Z shl.b64 %rd543, %rd542, 32; 2026-02-21T09:18:29.8231912Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T09:18:29.8232080Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8232168Z mov.b64 {%r1498, %r1499}, %rd544; 2026-02-21T09:18:29.8232231Z cvt.rn.f16x2.f32 %r1500, %r1499, %r1498; 2026-02-21T09:18:29.8232395Z .loc 1 56 52 // 
ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8232460Z cvt.u64.u32 %rd545, %r782; 2026-02-21T09:18:29.8232515Z cvt.u64.u32 %rd546, %r783; 2026-02-21T09:18:29.8232571Z shl.b64 %rd547, %rd546, 32; 2026-02-21T09:18:29.8232635Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T09:18:29.8232820Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8232880Z mov.b64 {%r1501, %r1502}, %rd548; 2026-02-21T09:18:29.8232944Z cvt.rn.f16x2.f32 %r1503, %r1502, %r1501; 2026-02-21T09:18:29.8233114Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8233171Z cvt.u64.u32 %rd549, %r784; 2026-02-21T09:18:29.8233249Z cvt.u64.u32 %rd550, %r785; 2026-02-21T09:18:29.8233315Z shl.b64 %rd551, %rd550, 32; 2026-02-21T09:18:29.8233372Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T09:18:29.8233541Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8233606Z mov.b64 {%r1504, %r1505}, %rd552; 2026-02-21T09:18:29.8233672Z cvt.rn.f16x2.f32 %r1506, %r1505, %r1504; 2026-02-21T09:18:29.8233830Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8233888Z cvt.u64.u32 %rd553, %r786; 2026-02-21T09:18:29.8233953Z cvt.u64.u32 %rd554, %r787; 2026-02-21T09:18:29.8234011Z shl.b64 %rd555, %rd554, 32; 2026-02-21T09:18:29.8234070Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T09:18:29.8234242Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8234301Z mov.b64 {%r1507, %r1508}, %rd556; 2026-02-21T09:18:29.8234367Z cvt.rn.f16x2.f32 %r1509, %r1508, %r1507; 2026-02-21T09:18:29.8234547Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8234607Z cvt.u64.u32 %rd557, %r788; 2026-02-21T09:18:29.8234663Z cvt.u64.u32 %rd558, %r789; 2026-02-21T09:18:29.8234750Z shl.b64 %rd559, %rd558, 32; 2026-02-21T09:18:29.8234818Z or.b64 
%rd560, %rd557, %rd559; 2026-02-21T09:18:29.8234977Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8235035Z mov.b64 {%r1510, %r1511}, %rd560; 2026-02-21T09:18:29.8235106Z cvt.rn.f16x2.f32 %r1512, %r1511, %r1510; 2026-02-21T09:18:29.8235267Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8235322Z cvt.u64.u32 %rd561, %r790; 2026-02-21T09:18:29.8235386Z cvt.u64.u32 %rd562, %r791; 2026-02-21T09:18:29.8235440Z shl.b64 %rd563, %rd562, 32; 2026-02-21T09:18:29.8235499Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T09:18:29.8235654Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8235721Z mov.b64 {%r1513, %r1514}, %rd564; 2026-02-21T09:18:29.8235782Z cvt.rn.f16x2.f32 %r1515, %r1514, %r1513; 2026-02-21T09:18:29.8235936Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8236031Z cvt.u64.u32 %rd565, %r793; 2026-02-21T09:18:29.8236087Z cvt.u64.u32 %rd566, %r794; 2026-02-21T09:18:29.8236144Z shl.b64 %rd567, %rd566, 32; 2026-02-21T09:18:29.8236208Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T09:18:29.8236367Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8236424Z mov.b64 {%r1516, %r1517}, %rd568; 2026-02-21T09:18:29.8236487Z cvt.rn.f16x2.f32 %r1518, %r1517, %r1516; 2026-02-21T09:18:29.8236657Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8236743Z cvt.u64.u32 %rd569, %r795; 2026-02-21T09:18:29.8236799Z cvt.u64.u32 %rd570, %r796; 2026-02-21T09:18:29.8236863Z shl.b64 %rd571, %rd570, 32; 2026-02-21T09:18:29.8236919Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T09:18:29.8237080Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8237147Z mov.b64 {%r1519, %r1520}, %rd572; 2026-02-21T09:18:29.8237211Z cvt.rn.f16x2.f32 %r1521, %r1520, %r1519; 
2026-02-21T09:18:29.8237398Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8237457Z cvt.u64.u32 %rd573, %r797; 2026-02-21T09:18:29.8237521Z cvt.u64.u32 %rd574, %r798; 2026-02-21T09:18:29.8237577Z shl.b64 %rd575, %rd574, 32; 2026-02-21T09:18:29.8237633Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T09:18:29.8237831Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8237891Z mov.b64 {%r1522, %r1523}, %rd576; 2026-02-21T09:18:29.8237958Z cvt.rn.f16x2.f32 %r1524, %r1523, %r1522; 2026-02-21T09:18:29.8238128Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8238184Z cvt.u64.u32 %rd577, %r799; 2026-02-21T09:18:29.8238240Z cvt.u64.u32 %rd578, %r800; 2026-02-21T09:18:29.8238295Z shl.b64 %rd579, %rd578, 32; 2026-02-21T09:18:29.8238359Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T09:18:29.8238521Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8238579Z mov.b64 {%r1525, %r1526}, %rd580; 2026-02-21T09:18:29.8238649Z cvt.rn.f16x2.f32 %r1527, %r1526, %r1525; 2026-02-21T09:18:29.8238807Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8238862Z cvt.u64.u32 %rd581, %r801; 2026-02-21T09:18:29.8238926Z cvt.u64.u32 %rd582, %r802; 2026-02-21T09:18:29.8238981Z shl.b64 %rd583, %rd582, 32; 2026-02-21T09:18:29.8239038Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T09:18:29.8239198Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8239264Z mov.b64 {%r1528, %r1529}, %rd584; 2026-02-21T09:18:29.8239329Z cvt.rn.f16x2.f32 %r1530, %r1529, %r1528; 2026-02-21T09:18:29.8239491Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8239555Z cvt.u64.u32 %rd585, %r803; 2026-02-21T09:18:29.8239610Z cvt.u64.u32 %rd586, %r804; 2026-02-21T09:18:29.8239665Z shl.b64 %rd587, %rd586, 
32; 2026-02-21T09:18:29.8239728Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T09:18:29.8239890Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8239946Z mov.b64 {%r1531, %r1532}, %rd588; 2026-02-21T09:18:29.8240009Z cvt.rn.f16x2.f32 %r1533, %r1532, %r1531; 2026-02-21T09:18:29.8240177Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8240235Z cvt.u64.u32 %rd589, %r805; 2026-02-21T09:18:29.8240290Z cvt.u64.u32 %rd590, %r806; 2026-02-21T09:18:29.8240352Z shl.b64 %rd591, %rd590, 32; 2026-02-21T09:18:29.8240433Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T09:18:29.8240593Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8240657Z mov.b64 {%r1534, %r1535}, %rd592; 2026-02-21T09:18:29.8240719Z cvt.rn.f16x2.f32 %r1536, %r1535, %r1534; 2026-02-21T09:18:29.8240874Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8240929Z cvt.u64.u32 %rd593, %r807; 2026-02-21T09:18:29.8240991Z cvt.u64.u32 %rd594, %r808; 2026-02-21T09:18:29.8241046Z shl.b64 %rd595, %rd594, 32; 2026-02-21T09:18:29.8241101Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T09:18:29.8241265Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8241359Z mov.b64 {%r1537, %r1538}, %rd596; 2026-02-21T09:18:29.8241423Z cvt.rn.f16x2.f32 %r1539, %r1538, %r1537; 2026-02-21T09:18:29.8241590Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8241650Z cvt.u64.u32 %rd597, %r810; 2026-02-21T09:18:29.8241705Z cvt.u64.u32 %rd598, %r811; 2026-02-21T09:18:29.8241784Z shl.b64 %rd599, %rd598, 32; 2026-02-21T09:18:29.8241851Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T09:18:29.8242011Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8242069Z mov.b64 {%r1540, %r1541}, %rd600; 2026-02-21T09:18:29.8242139Z 
cvt.rn.f16x2.f32 %r1542, %r1541, %r1540; 2026-02-21T09:18:29.8242318Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8242375Z cvt.u64.u32 %rd601, %r812; 2026-02-21T09:18:29.8242441Z cvt.u64.u32 %rd602, %r813; 2026-02-21T09:18:29.8242497Z shl.b64 %rd603, %rd602, 32; 2026-02-21T09:18:29.8242554Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T09:18:29.8242718Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8242785Z mov.b64 {%r1543, %r1544}, %rd604; 2026-02-21T09:18:29.8242849Z cvt.rn.f16x2.f32 %r1545, %r1544, %r1543; 2026-02-21T09:18:29.8243015Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8243083Z cvt.u64.u32 %rd605, %r814; 2026-02-21T09:18:29.8243141Z cvt.u64.u32 %rd606, %r815; 2026-02-21T09:18:29.8243200Z shl.b64 %rd607, %rd606, 32; 2026-02-21T09:18:29.8243270Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T09:18:29.8243437Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8243496Z mov.b64 {%r1546, %r1547}, %rd608; 2026-02-21T09:18:29.8243559Z cvt.rn.f16x2.f32 %r1548, %r1547, %r1546; 2026-02-21T09:18:29.8243728Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8243784Z cvt.u64.u32 %rd609, %r816; 2026-02-21T09:18:29.8243840Z cvt.u64.u32 %rd610, %r817; 2026-02-21T09:18:29.8243903Z shl.b64 %rd611, %rd610, 32; 2026-02-21T09:18:29.8243959Z or.b64 %rd612, %rd609, %rd611; 2026-02-21T09:18:29.8244124Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8244188Z mov.b64 {%r1549, %r1550}, %rd612; 2026-02-21T09:18:29.8244251Z cvt.rn.f16x2.f32 %r1551, %r1550, %r1549; 2026-02-21T09:18:29.8244413Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8244469Z cvt.u64.u32 %rd613, %r818; 2026-02-21T09:18:29.8244533Z cvt.u64.u32 %rd614, %r819; 
2026-02-21T09:18:29.8244590Z shl.b64 %rd615, %rd614, 32; 2026-02-21T09:18:29.8244647Z or.b64 %rd616, %rd613, %rd615; 2026-02-21T09:18:29.8244848Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8244906Z mov.b64 {%r1552, %r1553}, %rd616; 2026-02-21T09:18:29.8244997Z cvt.rn.f16x2.f32 %r1554, %r1553, %r1552; 2026-02-21T09:18:29.8245161Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8245218Z cvt.u64.u32 %rd617, %r820; 2026-02-21T09:18:29.8245274Z cvt.u64.u32 %rd618, %r821; 2026-02-21T09:18:29.8245329Z shl.b64 %rd619, %rd618, 32; 2026-02-21T09:18:29.8245392Z or.b64 %rd620, %rd617, %rd619; 2026-02-21T09:18:29.8245553Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8245610Z mov.b64 {%r1555, %r1556}, %rd620; 2026-02-21T09:18:29.8245677Z cvt.rn.f16x2.f32 %r1557, %r1556, %r1555; 2026-02-21T09:18:29.8245840Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8245924Z cvt.u64.u32 %rd621, %r822; 2026-02-21T09:18:29.8245986Z cvt.u64.u32 %rd622, %r823; 2026-02-21T09:18:29.8246041Z shl.b64 %rd623, %rd622, 32; 2026-02-21T09:18:29.8246099Z or.b64 %rd624, %rd621, %rd623; 2026-02-21T09:18:29.8246262Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8246326Z mov.b64 {%r1558, %r1559}, %rd624; 2026-02-21T09:18:29.8246426Z cvt.rn.f16x2.f32 %r1560, %r1559, %r1558; 2026-02-21T09:18:29.8246589Z .loc 1 56 52 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:56:52 2026-02-21T09:18:29.8246653Z cvt.u64.u32 %rd625, %r824; 2026-02-21T09:18:29.8246707Z cvt.u64.u32 %rd626, %r825; 2026-02-21T09:18:29.8246763Z shl.b64 %rd627, %rd626, 32; 2026-02-21T09:18:29.8246827Z or.b64 %rd628, %rd625, %rd627; 2026-02-21T09:18:29.8247011Z .loc 1 58 27 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:58:27 2026-02-21T09:18:29.8247072Z mov.b64 {%r1561, 
%r1562}, %rd628; 2026-02-21T09:18:29.8247136Z cvt.rn.f16x2.f32 %r1563, %r1562, %r1561; 2026-02-21T09:18:29.8247304Z .loc 1 59 82 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:59:82 2026-02-21T09:18:29.8247404Z st.shared.v4.b32 [%r11], {%r1182, %r1194, %r1206, %r1218}; 2026-02-21T09:18:29.8247497Z st.shared.v4.b32 [%r12], {%r1230, %r1242, %r1254, %r1266}; 2026-02-21T09:18:29.8247594Z st.shared.v4.b32 [%r13], {%r1278, %r1290, %r1302, %r1314}; 2026-02-21T09:18:29.8247681Z st.shared.v4.b32 [%r14], {%r1326, %r1338, %r1350, %r1362}; 2026-02-21T09:18:29.8247768Z st.shared.v4.b32 [%r15], {%r1374, %r1386, %r1398, %r1410}; 2026-02-21T09:18:29.8247861Z st.shared.v4.b32 [%r16], {%r1422, %r1434, %r1446, %r1458}; 2026-02-21T09:18:29.8247946Z st.shared.v4.b32 [%r17], {%r1470, %r1482, %r1494, %r1506}; 2026-02-21T09:18:29.8248031Z st.shared.v4.b32 [%r18], {%r1518, %r1530, %r1542, %r1554}; 2026-02-21T09:18:29.8248087Z bar.sync 0; 2026-02-21T09:18:29.8248152Z // begin inline asm 2026-02-21T09:18:29.8248304Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r987, %r991, %r995, %r999}, [%r831]; 2026-02-21T09:18:29.8248358Z // end inline asm 2026-02-21T09:18:29.8248421Z // begin inline asm 2026-02-21T09:18:29.8248571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1003, %r1007, %r1011, %r1015}, [%r836]; 2026-02-21T09:18:29.8248624Z // end inline asm 2026-02-21T09:18:29.8248687Z // begin inline asm 2026-02-21T09:18:29.8248836Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1019, %r1023, %r1027, %r1031}, [%r841]; 2026-02-21T09:18:29.8248888Z // end inline asm 2026-02-21T09:18:29.8248941Z // begin inline asm 2026-02-21T09:18:29.8249093Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1035, %r1039, %r1043, %r1047}, [%r846]; 2026-02-21T09:18:29.8249146Z // end inline asm 2026-02-21T09:18:29.8249200Z // begin inline asm 2026-02-21T09:18:29.8249348Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1051, %r1055, %r1059, %r1063}, [%r851]; 2026-02-21T09:18:29.8249402Z // end inline asm 
2026-02-21T09:18:29.8249456Z // begin inline asm 2026-02-21T09:18:29.8249596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1067, %r1071, %r1075, %r1079}, [%r856]; 2026-02-21T09:18:29.8249679Z // end inline asm 2026-02-21T09:18:29.8249732Z // begin inline asm 2026-02-21T09:18:29.8249872Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1083, %r1087, %r1091, %r1095}, [%r861]; 2026-02-21T09:18:29.8249936Z // end inline asm 2026-02-21T09:18:29.8249990Z // begin inline asm 2026-02-21T09:18:29.8250130Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1099, %r1103, %r1107, %r1111}, [%r866]; 2026-02-21T09:18:29.8250191Z // end inline asm 2026-02-21T09:18:29.8250246Z bar.sync 0; 2026-02-21T09:18:29.8250335Z st.shared.v4.b32 [%r11], {%r1185, %r1197, %r1209, %r1221}; 2026-02-21T09:18:29.8250423Z st.shared.v4.b32 [%r12], {%r1233, %r1245, %r1257, %r1269}; 2026-02-21T09:18:29.8250517Z st.shared.v4.b32 [%r13], {%r1281, %r1293, %r1305, %r1317}; 2026-02-21T09:18:29.8250627Z st.shared.v4.b32 [%r14], {%r1329, %r1341, %r1353, %r1365}; 2026-02-21T09:18:29.8250714Z st.shared.v4.b32 [%r15], {%r1377, %r1389, %r1401, %r1413}; 2026-02-21T09:18:29.8250807Z st.shared.v4.b32 [%r16], {%r1425, %r1437, %r1449, %r1461}; 2026-02-21T09:18:29.8250893Z st.shared.v4.b32 [%r17], {%r1473, %r1485, %r1497, %r1509}; 2026-02-21T09:18:29.8250978Z st.shared.v4.b32 [%r18], {%r1521, %r1533, %r1545, %r1557}; 2026-02-21T09:18:29.8251059Z bar.sync 0; 2026-02-21T09:18:29.8251115Z // begin inline asm 2026-02-21T09:18:29.8251262Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r988, %r992, %r996, %r1000}, [%r831]; 2026-02-21T09:18:29.8251314Z // end inline asm 2026-02-21T09:18:29.8251376Z // begin inline asm 2026-02-21T09:18:29.8251521Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1004, %r1008, %r1012, %r1016}, [%r836]; 2026-02-21T09:18:29.8251574Z // end inline asm 2026-02-21T09:18:29.8251658Z // begin inline asm 2026-02-21T09:18:29.8251803Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1024, %r1028, %r1032}, [%r841]; 
2026-02-21T09:18:29.8251857Z // end inline asm 2026-02-21T09:18:29.8251912Z // begin inline asm 2026-02-21T09:18:29.8252063Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1036, %r1040, %r1044, %r1048}, [%r846]; 2026-02-21T09:18:29.8252119Z // end inline asm 2026-02-21T09:18:29.8252174Z // begin inline asm 2026-02-21T09:18:29.8252320Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1052, %r1056, %r1060, %r1064}, [%r851]; 2026-02-21T09:18:29.8252376Z // end inline asm 2026-02-21T09:18:29.8252429Z // begin inline asm 2026-02-21T09:18:29.8252578Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1068, %r1072, %r1076, %r1080}, [%r856]; 2026-02-21T09:18:29.8252630Z // end inline asm 2026-02-21T09:18:29.8252685Z // begin inline asm 2026-02-21T09:18:29.8252825Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1084, %r1088, %r1092, %r1096}, [%r861]; 2026-02-21T09:18:29.8252885Z // end inline asm 2026-02-21T09:18:29.8252940Z // begin inline asm 2026-02-21T09:18:29.8253081Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1100, %r1104, %r1108, %r1112}, [%r866]; 2026-02-21T09:18:29.8253142Z // end inline asm 2026-02-21T09:18:29.8253194Z bar.sync 0; 2026-02-21T09:18:29.8253282Z st.shared.v4.b32 [%r11], {%r1188, %r1200, %r1212, %r1224}; 2026-02-21T09:18:29.8253379Z st.shared.v4.b32 [%r12], {%r1236, %r1248, %r1260, %r1272}; 2026-02-21T09:18:29.8253468Z st.shared.v4.b32 [%r13], {%r1284, %r1296, %r1308, %r1320}; 2026-02-21T09:18:29.8253557Z st.shared.v4.b32 [%r14], {%r1332, %r1344, %r1356, %r1368}; 2026-02-21T09:18:29.8253643Z st.shared.v4.b32 [%r15], {%r1380, %r1392, %r1404, %r1416}; 2026-02-21T09:18:29.8253737Z st.shared.v4.b32 [%r16], {%r1428, %r1440, %r1452, %r1464}; 2026-02-21T09:18:29.8253821Z st.shared.v4.b32 [%r17], {%r1476, %r1488, %r1500, %r1512}; 2026-02-21T09:18:29.8253907Z st.shared.v4.b32 [%r18], {%r1524, %r1536, %r1548, %r1560}; 2026-02-21T09:18:29.8253969Z bar.sync 0; 2026-02-21T09:18:29.8254025Z // begin inline asm 2026-02-21T09:18:29.8254168Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r989, %r993, %r997, %r1001}, [%r831]; 2026-02-21T09:18:29.8254228Z // end inline asm 2026-02-21T09:18:29.8254282Z // begin inline asm 2026-02-21T09:18:29.8254423Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1005, %r1009, %r1013, %r1017}, [%r836]; 2026-02-21T09:18:29.8254499Z // end inline asm 2026-02-21T09:18:29.8254562Z // begin inline asm 2026-02-21T09:18:29.8254736Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1021, %r1025, %r1029, %r1033}, [%r841]; 2026-02-21T09:18:29.8254790Z // end inline asm 2026-02-21T09:18:29.8254850Z // begin inline asm 2026-02-21T09:18:29.8254989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1037, %r1041, %r1045, %r1049}, [%r846]; 2026-02-21T09:18:29.8255041Z // end inline asm 2026-02-21T09:18:29.8255095Z // begin inline asm 2026-02-21T09:18:29.8255240Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1053, %r1057, %r1061, %r1065}, [%r851]; 2026-02-21T09:18:29.8255294Z // end inline asm 2026-02-21T09:18:29.8255375Z // begin inline asm 2026-02-21T09:18:29.8255522Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1069, %r1073, %r1077, %r1081}, [%r856]; 2026-02-21T09:18:29.8255576Z // end inline asm 2026-02-21T09:18:29.8255629Z // begin inline asm 2026-02-21T09:18:29.8255782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1085, %r1089, %r1093, %r1097}, [%r861]; 2026-02-21T09:18:29.8255836Z // end inline asm 2026-02-21T09:18:29.8255891Z // begin inline asm 2026-02-21T09:18:29.8256064Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1101, %r1105, %r1109, %r1113}, [%r866]; 2026-02-21T09:18:29.8256128Z // end inline asm 2026-02-21T09:18:29.8256182Z bar.sync 0; 2026-02-21T09:18:29.8256274Z st.shared.v4.b32 [%r11], {%r1191, %r1203, %r1215, %r1227}; 2026-02-21T09:18:29.8256373Z st.shared.v4.b32 [%r12], {%r1239, %r1251, %r1263, %r1275}; 2026-02-21T09:18:29.8256464Z st.shared.v4.b32 [%r13], {%r1287, %r1299, %r1311, %r1323}; 2026-02-21T09:18:29.8256581Z st.shared.v4.b32 [%r14], {%r1335, %r1347, %r1359, %r1371}; 2026-02-21T09:18:29.8256681Z st.shared.v4.b32 [%r15], 
{%r1383, %r1395, %r1407, %r1419}; 2026-02-21T09:18:29.8256770Z st.shared.v4.b32 [%r16], {%r1431, %r1443, %r1455, %r1467}; 2026-02-21T09:18:29.8256858Z st.shared.v4.b32 [%r17], {%r1479, %r1491, %r1503, %r1515}; 2026-02-21T09:18:29.8256947Z st.shared.v4.b32 [%r18], {%r1527, %r1539, %r1551, %r1563}; 2026-02-21T09:18:29.8257008Z bar.sync 0; 2026-02-21T09:18:29.8257063Z // begin inline asm 2026-02-21T09:18:29.8257272Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r990, %r994, %r998, %r1002}, [%r831]; 2026-02-21T09:18:29.8257327Z // end inline asm 2026-02-21T09:18:29.8257382Z // begin inline asm 2026-02-21T09:18:29.8257538Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1006, %r1010, %r1014, %r1018}, [%r836]; 2026-02-21T09:18:29.8257592Z // end inline asm 2026-02-21T09:18:29.8257648Z // begin inline asm 2026-02-21T09:18:29.8257805Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1022, %r1026, %r1030, %r1034}, [%r841]; 2026-02-21T09:18:29.8257860Z // end inline asm 2026-02-21T09:18:29.8257916Z // begin inline asm 2026-02-21T09:18:29.8258070Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1038, %r1042, %r1046, %r1050}, [%r846]; 2026-02-21T09:18:29.8258124Z // end inline asm 2026-02-21T09:18:29.8258180Z // begin inline asm 2026-02-21T09:18:29.8258330Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1054, %r1058, %r1062, %r1066}, [%r851]; 2026-02-21T09:18:29.8258394Z // end inline asm 2026-02-21T09:18:29.8258450Z // begin inline asm 2026-02-21T09:18:29.8258604Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1070, %r1074, %r1078, %r1082}, [%r856]; 2026-02-21T09:18:29.8258666Z // end inline asm 2026-02-21T09:18:29.8258721Z // begin inline asm 2026-02-21T09:18:29.8258867Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1086, %r1090, %r1094, %r1098}, [%r861]; 2026-02-21T09:18:29.8258921Z // end inline asm 2026-02-21T09:18:29.8258985Z // begin inline asm 2026-02-21T09:18:29.8259134Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1102, %r1106, %r1110, %r1114}, [%r866]; 2026-02-21T09:18:29.8259191Z // end 
inline asm 2026-02-21T09:18:29.8259255Z // begin inline asm 2026-02-21T09:18:29.8259359Z st.global.v4.b32 [ %rd85 + 0 ], { %r987, %r988, %r989, %r990 }; 2026-02-21T09:18:29.8259415Z // end inline asm 2026-02-21T09:18:29.8259479Z // begin inline asm 2026-02-21T09:18:29.8259609Z st.global.v4.b32 [ %rd86 + 0 ], { %r991, %r992, %r993, %r994 }; 2026-02-21T09:18:29.8259665Z // end inline asm 2026-02-21T09:18:29.8259720Z // begin inline asm 2026-02-21T09:18:29.8259825Z st.global.v4.b32 [ %rd87 + 0 ], { %r995, %r996, %r997, %r998 }; 2026-02-21T09:18:29.8259882Z // end inline asm 2026-02-21T09:18:29.8259939Z // begin inline asm 2026-02-21T09:18:29.8260055Z st.global.v4.b32 [ %rd88 + 0 ], { %r999, %r1000, %r1001, %r1002 }; 2026-02-21T09:18:29.8260111Z // end inline asm 2026-02-21T09:18:29.8260167Z // begin inline asm 2026-02-21T09:18:29.8260276Z st.global.v4.b32 [ %rd89 + 0 ], { %r1003, %r1004, %r1005, %r1006 }; 2026-02-21T09:18:29.8260342Z // end inline asm 2026-02-21T09:18:29.8260428Z // begin inline asm 2026-02-21T09:18:29.8260532Z st.global.v4.b32 [ %rd90 + 0 ], { %r1007, %r1008, %r1009, %r1010 }; 2026-02-21T09:18:29.8260597Z // end inline asm 2026-02-21T09:18:29.8260653Z // begin inline asm 2026-02-21T09:18:29.8260753Z st.global.v4.b32 [ %rd91 + 0 ], { %r1011, %r1012, %r1013, %r1014 }; 2026-02-21T09:18:29.8260816Z // end inline asm 2026-02-21T09:18:29.8260872Z // begin inline asm 2026-02-21T09:18:29.8260992Z st.global.v4.b32 [ %rd92 + 0 ], { %r1015, %r1016, %r1017, %r1018 }; 2026-02-21T09:18:29.8261048Z // end inline asm 2026-02-21T09:18:29.8261112Z // begin inline asm 2026-02-21T09:18:29.8261213Z st.global.v4.b32 [ %rd93 + 0 ], { %r1019, %r1020, %r1021, %r1022 }; 2026-02-21T09:18:29.8261267Z // end inline asm 2026-02-21T09:18:29.8261328Z // begin inline asm 2026-02-21T09:18:29.8261426Z st.global.v4.b32 [ %rd94 + 0 ], { %r1023, %r1024, %r1025, %r1026 }; 2026-02-21T09:18:29.8261480Z // end inline asm 2026-02-21T09:18:29.8261558Z // begin inline asm 
2026-02-21T09:18:29.8261667Z st.global.v4.b32 [ %rd95 + 0 ], { %r1027, %r1028, %r1029, %r1030 }; 2026-02-21T09:18:29.8261722Z // end inline asm 2026-02-21T09:18:29.8261778Z // begin inline asm 2026-02-21T09:18:29.8261883Z st.global.v4.b32 [ %rd96 + 0 ], { %r1031, %r1032, %r1033, %r1034 }; 2026-02-21T09:18:29.8261939Z // end inline asm 2026-02-21T09:18:29.8261995Z // begin inline asm 2026-02-21T09:18:29.8262093Z st.global.v4.b32 [ %rd97 + 0 ], { %r1035, %r1036, %r1037, %r1038 }; 2026-02-21T09:18:29.8262156Z // end inline asm 2026-02-21T09:18:29.8262214Z // begin inline asm 2026-02-21T09:18:29.8262312Z st.global.v4.b32 [ %rd98 + 0 ], { %r1039, %r1040, %r1041, %r1042 }; 2026-02-21T09:18:29.8262374Z // end inline asm 2026-02-21T09:18:29.8262430Z // begin inline asm 2026-02-21T09:18:29.8262526Z st.global.v4.b32 [ %rd99 + 0 ], { %r1043, %r1044, %r1045, %r1046 }; 2026-02-21T09:18:29.8262588Z // end inline asm 2026-02-21T09:18:29.8262644Z // begin inline asm 2026-02-21T09:18:29.8262750Z st.global.v4.b32 [ %rd100 + 0 ], { %r1047, %r1048, %r1049, %r1050 }; 2026-02-21T09:18:29.8262807Z // end inline asm 2026-02-21T09:18:29.8262871Z // begin inline asm 2026-02-21T09:18:29.8262972Z st.global.v4.b32 [ %rd101 + 0 ], { %r1051, %r1052, %r1053, %r1054 }; 2026-02-21T09:18:29.8263029Z // end inline asm 2026-02-21T09:18:29.8263093Z // begin inline asm 2026-02-21T09:18:29.8263193Z st.global.v4.b32 [ %rd102 + 0 ], { %r1055, %r1056, %r1057, %r1058 }; 2026-02-21T09:18:29.8263247Z // end inline asm 2026-02-21T09:18:29.8263303Z // begin inline asm 2026-02-21T09:18:29.8263412Z st.global.v4.b32 [ %rd103 + 0 ], { %r1059, %r1060, %r1061, %r1062 }; 2026-02-21T09:18:29.8263468Z // end inline asm 2026-02-21T09:18:29.8263525Z // begin inline asm 2026-02-21T09:18:29.8263631Z st.global.v4.b32 [ %rd104 + 0 ], { %r1063, %r1064, %r1065, %r1066 }; 2026-02-21T09:18:29.8263687Z // end inline asm 2026-02-21T09:18:29.8263754Z // begin inline asm 2026-02-21T09:18:29.8263855Z st.global.v4.b32 [ %rd105 + 0 ], { 
%r1067, %r1068, %r1069, %r1070 }; 2026-02-21T09:18:29.8263908Z // end inline asm 2026-02-21T09:18:29.8263961Z // begin inline asm 2026-02-21T09:18:29.8264055Z st.global.v4.b32 [ %rd106 + 0 ], { %r1071, %r1072, %r1073, %r1074 }; 2026-02-21T09:18:29.8264114Z // end inline asm 2026-02-21T09:18:29.8264187Z // begin inline asm 2026-02-21T09:18:29.8264280Z st.global.v4.b32 [ %rd107 + 0 ], { %r1075, %r1076, %r1077, %r1078 }; 2026-02-21T09:18:29.8264339Z // end inline asm 2026-02-21T09:18:29.8264392Z // begin inline asm 2026-02-21T09:18:29.8264485Z st.global.v4.b32 [ %rd108 + 0 ], { %r1079, %r1080, %r1081, %r1082 }; 2026-02-21T09:18:29.8264538Z // end inline asm 2026-02-21T09:18:29.8264598Z // begin inline asm 2026-02-21T09:18:29.8264732Z st.global.v4.b32 [ %rd109 + 0 ], { %r1083, %r1084, %r1085, %r1086 }; 2026-02-21T09:18:29.8264785Z // end inline asm 2026-02-21T09:18:29.8264846Z // begin inline asm 2026-02-21T09:18:29.8264942Z st.global.v4.b32 [ %rd110 + 0 ], { %r1087, %r1088, %r1089, %r1090 }; 2026-02-21T09:18:29.8265033Z // end inline asm 2026-02-21T09:18:29.8265086Z // begin inline asm 2026-02-21T09:18:29.8265186Z st.global.v4.b32 [ %rd111 + 0 ], { %r1091, %r1092, %r1093, %r1094 }; 2026-02-21T09:18:29.8265239Z // end inline asm 2026-02-21T09:18:29.8265292Z // begin inline asm 2026-02-21T09:18:29.8265395Z st.global.v4.b32 [ %rd112 + 0 ], { %r1095, %r1096, %r1097, %r1098 }; 2026-02-21T09:18:29.8265448Z // end inline asm 2026-02-21T09:18:29.8265501Z // begin inline asm 2026-02-21T09:18:29.8265634Z st.global.v4.b32 [ %rd113 + 0 ], { %r1099, %r1100, %r1101, %r1102 }; 2026-02-21T09:18:29.8265687Z // end inline asm 2026-02-21T09:18:29.8265741Z // begin inline asm 2026-02-21T09:18:29.8265835Z st.global.v4.b32 [ %rd114 + 0 ], { %r1103, %r1104, %r1105, %r1106 }; 2026-02-21T09:18:29.8265896Z // end inline asm 2026-02-21T09:18:29.8265950Z // begin inline asm 2026-02-21T09:18:29.8266043Z st.global.v4.b32 [ %rd115 + 0 ], { %r1107, %r1108, %r1109, %r1110 }; 
2026-02-21T09:18:29.8266129Z // end inline asm 2026-02-21T09:18:29.8266187Z // begin inline asm 2026-02-21T09:18:29.8266283Z st.global.v4.b32 [ %rd116 + 0 ], { %r1111, %r1112, %r1113, %r1114 }; 2026-02-21T09:18:29.8266335Z // end inline asm 2026-02-21T09:18:29.8266426Z $L__BB0_8: // %._crit_edge 2026-02-21T09:18:29.8266602Z .loc 1 30 4 // ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py:30:4 2026-02-21T09:18:29.8266654Z bar.sync 0; 2026-02-21T09:18:29.8266717Z // begin inline asm 2026-02-21T09:18:29.8266838Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1564, 256; 2026-02-21T09:18:29.8266891Z // end inline asm 2026-02-21T09:18:29.8266947Z ret; 2026-02-21T09:18:29.8267003Z $L__tmp0: 2026-02-21T09:18:29.8267057Z $L__func_end0: 2026-02-21T09:18:29.8267142Z // -- End function 2026-02-21T09:18:29.8267202Z } 2026-02-21T09:18:29.8267413Z .file 1 "/tmp/torchinductor_root/iy/ciykg2yook2isgud2xldroriuyrbxk6l2mkwknxyzkwbv3tt35uq.py" 2026-02-21T09:18:29.8267476Z .section .debug_abbrev 2026-02-21T09:18:29.8267538Z { 2026-02-21T09:18:29.8267622Z .b8 1 // Abbreviation Code 2026-02-21T09:18:29.8267708Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:18:29.8267796Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:18:29.8267876Z .b8 37 // DW_AT_producer 2026-02-21T09:18:29.8267950Z .b8 8 // DW_FORM_string 2026-02-21T09:18:29.8268023Z .b8 19 // DW_AT_language 2026-02-21T09:18:29.8268107Z .b8 5 // DW_FORM_data2 2026-02-21T09:18:29.8268181Z .b8 3 // DW_AT_name 2026-02-21T09:18:29.8268251Z .b8 8 // DW_FORM_string 2026-02-21T09:18:29.8268336Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:18:29.8268411Z .b8 6 // DW_FORM_data4 2026-02-21T09:18:29.8268483Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:18:29.8268557Z .b8 8 // DW_FORM_string 2026-02-21T09:18:29.8268627Z .b8 0 // EOM(1) 2026-02-21T09:18:29.8268694Z .b8 0 // EOM(2) 2026-02-21T09:18:29.8268782Z .b8 0 // EOM(3) 2026-02-21T09:18:29.8268839Z } 2026-02-21T09:18:29.8268898Z .section .debug_info 2026-02-21T09:18:29.8268945Z { 
2026-02-21T09:18:29.8269034Z .b32 104 // Length of Unit 2026-02-21T09:18:29.8269119Z .b8 2 // DWARF version number 2026-02-21T09:18:29.8269169Z .b8 0 2026-02-21T09:18:29.8269280Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:18:29.8269374Z .b8 8 // Address Size (in bytes) 2026-02-21T09:18:29.8269469Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:18:29.8269584Z .b8 116 // DW_AT_producer 2026-02-21T09:18:29.8269644Z .b8 114 2026-02-21T09:18:29.8269695Z .b8 105 2026-02-21T09:18:29.8269746Z .b8 116 2026-02-21T09:18:29.8269803Z .b8 111 2026-02-21T09:18:29.8269854Z .b8 110 2026-02-21T09:18:29.8269904Z .b8 0 2026-02-21T09:18:29.8269974Z .b8 2 // DW_AT_language 2026-02-21T09:18:29.8270033Z .b8 0 2026-02-21T09:18:29.8270129Z .b8 99 // DW_AT_name 2026-02-21T09:18:29.8270180Z .b8 105 2026-02-21T09:18:29.8270235Z .b8 121 2026-02-21T09:18:29.8270284Z .b8 107 2026-02-21T09:18:29.8270332Z .b8 103 2026-02-21T09:18:29.8270382Z .b8 50 2026-02-21T09:18:29.8270439Z .b8 121 2026-02-21T09:18:29.8270487Z .b8 111 2026-02-21T09:18:29.8270536Z .b8 111 2026-02-21T09:18:29.8270591Z .b8 107 2026-02-21T09:18:29.8270641Z .b8 50 2026-02-21T09:18:29.8270689Z .b8 105 2026-02-21T09:18:29.8270739Z .b8 115 2026-02-21T09:18:29.8270820Z .b8 103 2026-02-21T09:18:29.8270870Z .b8 117 2026-02-21T09:18:29.8270917Z .b8 100 2026-02-21T09:18:29.8270966Z .b8 50 2026-02-21T09:18:29.8271023Z .b8 120 2026-02-21T09:18:29.8271070Z .b8 108 2026-02-21T09:18:29.8271118Z .b8 100 2026-02-21T09:18:29.8271172Z .b8 114 2026-02-21T09:18:29.8271220Z .b8 111 2026-02-21T09:18:29.8271269Z .b8 114 2026-02-21T09:18:29.8271316Z .b8 105 2026-02-21T09:18:29.8271372Z .b8 117 2026-02-21T09:18:29.8271420Z .b8 121 2026-02-21T09:18:29.8271467Z .b8 114 2026-02-21T09:18:29.8271522Z .b8 98 2026-02-21T09:18:29.8271571Z .b8 120 2026-02-21T09:18:29.8271619Z .b8 107 2026-02-21T09:18:29.8271667Z .b8 54 2026-02-21T09:18:29.8271722Z .b8 108 2026-02-21T09:18:29.8271770Z .b8 50 2026-02-21T09:18:29.8271818Z .b8 109 
2026-02-21T09:18:29.8271864Z .b8 107 2026-02-21T09:18:29.8271919Z .b8 119 2026-02-21T09:18:29.8271966Z .b8 107 2026-02-21T09:18:29.8272014Z .b8 110 2026-02-21T09:18:29.8272071Z .b8 120 2026-02-21T09:18:29.8272120Z .b8 121 2026-02-21T09:18:29.8272170Z .b8 122 2026-02-21T09:18:29.8272220Z .b8 107 2026-02-21T09:18:29.8272281Z .b8 119 2026-02-21T09:18:29.8272332Z .b8 98 2026-02-21T09:18:29.8272382Z .b8 118 2026-02-21T09:18:29.8272436Z .b8 51 2026-02-21T09:18:29.8272484Z .b8 116 2026-02-21T09:18:29.8272531Z .b8 116 2026-02-21T09:18:29.8272579Z .b8 51 2026-02-21T09:18:29.8272640Z .b8 53 2026-02-21T09:18:29.8272691Z .b8 117 2026-02-21T09:18:29.8272741Z .b8 113 2026-02-21T09:18:29.8272789Z .b8 46 2026-02-21T09:18:29.8272846Z .b8 112 2026-02-21T09:18:29.8272896Z .b8 121 2026-02-21T09:18:29.8272945Z .b8 0 2026-02-21T09:18:29.8273042Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:18:29.8273115Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:18:29.8273163Z .b8 116 2026-02-21T09:18:29.8273221Z .b8 109 2026-02-21T09:18:29.8273269Z .b8 112 2026-02-21T09:18:29.8273316Z .b8 47 2026-02-21T09:18:29.8273364Z .b8 116 2026-02-21T09:18:29.8273421Z .b8 111 2026-02-21T09:18:29.8273470Z .b8 114 2026-02-21T09:18:29.8273520Z .b8 99 2026-02-21T09:18:29.8273571Z .b8 104 2026-02-21T09:18:29.8273632Z .b8 105 2026-02-21T09:18:29.8273682Z .b8 110 2026-02-21T09:18:29.8273733Z .b8 100 2026-02-21T09:18:29.8273789Z .b8 117 2026-02-21T09:18:29.8273837Z .b8 99 2026-02-21T09:18:29.8273888Z .b8 116 2026-02-21T09:18:29.8273936Z .b8 111 2026-02-21T09:18:29.8273993Z .b8 114 2026-02-21T09:18:29.8274066Z .b8 95 2026-02-21T09:18:29.8274114Z .b8 114 2026-02-21T09:18:29.8274169Z .b8 111 2026-02-21T09:18:29.8274218Z .b8 111 2026-02-21T09:18:29.8274265Z .b8 116 2026-02-21T09:18:29.8274313Z .b8 47 2026-02-21T09:18:29.8274372Z .b8 105 2026-02-21T09:18:29.8274421Z .b8 121 2026-02-21T09:18:29.8274469Z .b8 0 2026-02-21T09:18:29.8274518Z } 2026-02-21T09:18:29.8274588Z .section .debug_macinfo { } 2026-02-21T09:18:29.8274595Z 
2026-02-21T09:18:29.8274699Z ================================================================ 2026-02-21T09:18:29.8274801Z please share the reproducer above with Triton project. 2026-02-21T09:18:30.5167328Z [73s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:18:30.5167999Z 2026-02-21T09:18:30.5168004Z 2026-02-21T09:18:30.5168008Z 2026-02-21T09:18:30.5168109Z ================================================================ 2026-02-21T09:18:30.5168345Z Internal Triton PTX codegen error 2026-02-21T09:18:30.5168541Z `ptxas` stderr: 2026-02-21T09:18:30.5169012Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 316 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:18:30.5169605Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:18:30.5169775Z 2026-02-21T09:18:30.5170196Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpab90n44s.ptx -o /tmp/tmpab90n44s.ptx.o 2026-02-21T09:18:30.5170660Z 2026-02-21T09:18:30.5170664Z 2026-02-21T09:18:30.5170730Z // 2026-02-21T09:18:30.5170873Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:18:30.5171131Z // 2026-02-21T09:18:30.5171206Z 2026-02-21T09:18:30.5171265Z .version 8.7 2026-02-21T09:18:30.5171411Z .target sm_100a 2026-02-21T09:18:30.5171549Z .address_size 64 2026-02-21T09:18:30.5171643Z 2026-02-21T09:18:30.5171770Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:18:30.5172043Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:18:30.5172270Z // @_helion_matmul 2026-02-21T09:18:30.5172485Z .visible .entry _helion_matmul( 2026-02-21T09:18:30.5172708Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:18:30.5172979Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:18:30.5173233Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_2, 2026-02-21T09:18:30.5173498Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:18:30.5173753Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:18:30.5173951Z ) 2026-02-21T09:18:30.5174070Z .reqntid 128 2026-02-21T09:18:30.5174194Z .maxnreg 32 2026-02-21T09:18:30.5175820Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:18:30.5177108Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:18:30.5177363Z `ptxas` stderr: 2026-02-21T09:18:30.5177822Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 316 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:18:30.5178336Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:18:30.5178506Z 2026-02-21T09:18:30.5178931Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpab90n44s.ptx -o /tmp/tmpab90n44s.ptx.o 2026-02-21T09:18:30.5179406Z 2026-02-21T09:18:30.5179553Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:18:30.5179869Z { 2026-02-21T09:18:30.5180013Z .reg .pred %p<81>; 2026-02-21T09:18:30.5180180Z .reg .b16 %rs<8>; 2026-02-21T09:18:30.5180319Z .reg .b32 %r<944>; 2026-02-21T09:18:30.5180453Z .reg .b64 %rd<372>; 2026-02-21T09:18:30.5180599Z $L__func_begin0: 2026-02-21T09:18:30.5180678Z 2026-02-21T09:18:30.5180730Z // %bb.0: 2026-02-21T09:18:30.5180977Z .loc 1 19 0 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:19 2026-02-21T09:18:30.5181268Z mov.u32 %r1, %tid.x; 2026-02-21T09:18:30.5181435Z ld.param.b64 %rd18, [_helion_matmul_param_1]; 2026-02-21T09:18:30.5181639Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:18:30.5181803Z mov.b32 %r93, global_smem; 2026-02-21T09:18:30.5182003Z // begin inline asm 2026-02-21T09:18:30.5182244Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r93], 128; 2026-02-21T09:18:30.5182498Z // end inline asm 2026-02-21T09:18:30.5182654Z ld.param.b64 %rd35, [_helion_matmul_param_3]; 2026-02-21T09:18:30.5182846Z bar.sync 0; 2026-02-21T09:18:30.5182992Z ld.shared.b32 %r935, [global_smem]; 2026-02-21T09:18:30.5183158Z bar.sync 0; 2026-02-21T09:18:30.5183300Z // begin inline asm 2026-02-21T09:18:30.5183531Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:18:30.5183762Z // end inline asm 2026-02-21T09:18:30.5184007Z .loc 1 21 67 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:21:67 2026-02-21T09:18:30.5184302Z mov.u32 %r102, %ctaid.x; 2026-02-21T09:18:30.5184449Z mov.u32 %r103, %ctaid.y; 2026-02-21T09:18:30.5184607Z mov.u32 %r104, %ctaid.z; 2026-02-21T09:18:30.5184803Z mov.u32 %r105, %nctaid.x; 2026-02-21T09:18:30.5184999Z mov.u32 %r106, %nctaid.y; 2026-02-21T09:18:30.5185176Z mad.lo.s32 %r107, %r104, %r106, %r103; 2026-02-21T09:18:30.5185364Z mad.lo.s32 %r108, %r107, %r105, %r102; 2026-02-21T09:18:30.5185567Z shl.b32 %r109, %r108, 7; 2026-02-21T09:18:30.5185740Z cvt.s64.s32 %rd36, %r109; 2026-02-21T09:18:30.5185910Z add.s64 %rd32, %rd35, %rd36; 2026-02-21T09:18:30.5186065Z 
shl.b32 %r110, %r1, 2; 2026-02-21T09:18:30.5186221Z add.s32 %r94, %r93, %r110; 2026-02-21T09:18:30.5186367Z mov.b32 %r95, 0; 2026-02-21T09:18:30.5186510Z // begin inline asm 2026-02-21T09:18:30.5186658Z @%p1 st.shared.b32 [ %r94 + 0 ], %r95; 2026-02-21T09:18:30.5186834Z // end inline asm 2026-02-21T09:18:30.5186967Z bar.warp.sync -1; 2026-02-21T09:18:30.5187117Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:18:30.5187264Z cvt.u64.u32 %rd17, %r93; 2026-02-21T09:18:30.5187412Z // begin inline asm 2026-02-21T09:18:30.5187664Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd17 + 0 ], %rd18; 2026-02-21T09:18:30.5187934Z // end inline asm 2026-02-21T09:18:30.5188070Z // begin inline asm 2026-02-21T09:18:30.5188286Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x1; 2026-02-21T09:18:30.5188532Z // end inline asm 2026-02-21T09:18:30.5188661Z mov.b32 %r96, 32; 2026-02-21T09:18:30.5188800Z // begin inline asm 2026-02-21T09:18:30.5189026Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x0, %r96; 2026-02-21T09:18:30.5189284Z // end inline asm 2026-02-21T09:18:30.5189418Z mov.b32 %r97, 128; 2026-02-21T09:18:30.5189549Z // begin inline asm 2026-02-21T09:18:30.5189778Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x1, %r97; 2026-02-21T09:18:30.5190026Z // end inline asm 2026-02-21T09:18:30.5190159Z mov.b32 %r98, 1024; 2026-02-21T09:18:30.5190292Z // begin inline asm 2026-02-21T09:18:30.5190526Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x0, %r98; 2026-02-21T09:18:30.5190791Z // end inline asm 2026-02-21T09:18:30.5190919Z mov.b32 %r99, 8192; 2026-02-21T09:18:30.5191061Z // begin inline asm 2026-02-21T09:18:30.5191286Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x1, %r99; 2026-02-21T09:18:30.5191557Z // end inline asm 2026-02-21T09:18:30.5191686Z mov.b64 %rd25, 2048; 2026-02-21T09:18:30.5191864Z // begin inline asm 
2026-02-21T09:18:30.5192103Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd17 + 0 ], 0x0, %rd25; 2026-02-21T09:18:30.5192382Z // end inline asm 2026-02-21T09:18:30.5192517Z mov.b32 %r100, 1; 2026-02-21T09:18:30.5192655Z // begin inline asm 2026-02-21T09:18:30.5192923Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x0, %r100; 2026-02-21T09:18:30.5193218Z // end inline asm 2026-02-21T09:18:30.5193360Z // begin inline asm 2026-02-21T09:18:30.5193623Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x1, %r100; 2026-02-21T09:18:30.5193926Z // end inline asm 2026-02-21T09:18:30.5194065Z // begin inline asm 2026-02-21T09:18:30.5194343Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x6; 2026-02-21T09:18:30.5194629Z // end inline asm 2026-02-21T09:18:30.5194807Z // begin inline asm 2026-02-21T09:18:30.5195079Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x0; 2026-02-21T09:18:30.5195375Z // end inline asm 2026-02-21T09:18:30.5195519Z // begin inline asm 2026-02-21T09:18:30.5195809Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x2; 2026-02-21T09:18:30.5196088Z // end inline asm 2026-02-21T09:18:30.5196223Z // begin inline asm 2026-02-21T09:18:30.5196440Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd17 + 0 ], 0x0; 2026-02-21T09:18:30.5196695Z // end inline asm 2026-02-21T09:18:30.5196822Z // begin inline asm 2026-02-21T09:18:30.5197191Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd32 + 0 ], [ %rd17 + 0 ], 0x80; 2026-02-21T09:18:30.5197569Z // end inline asm 2026-02-21T09:18:30.5197705Z // begin inline asm 2026-02-21T09:18:30.5197914Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd32 + 0 ], 0x80; 2026-02-21T09:18:30.5198156Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:18:30.5198347Z @%p1 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:18:30.5198521Z // end inline asm 2026-02-21T09:18:30.5198657Z bar.sync 0; 2026-02-21T09:18:30.5198794Z cvta.global.u64 %rd77, %rd32; 2026-02-21T09:18:30.5199080Z .loc 1 28 35 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:28:35 2026-02-21T09:18:30.5199373Z shl.b32 %r936, %r102, 1; 2026-02-21T09:18:30.5199642Z .loc 1 29 37 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:29:37 2026-02-21T09:18:30.5199932Z add.s32 %r111, %r936, 2; 2026-02-21T09:18:30.5200185Z .loc 1 29 49 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:29:49 2026-02-21T09:18:30.5200474Z min.s32 %r4, %r111, 512; 2026-02-21T09:18:30.5200730Z .loc 1 30 74 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:30:74 2026-02-21T09:18:30.5201033Z setp.ge.s32 %p21, %r936, %r4; 2026-02-21T09:18:30.5201201Z @%p21 bra $L__BB0_9; 2026-02-21T09:18:30.5201378Z // %bb.1: // %.lr.ph 2026-02-21T09:18:30.5201706Z .loc 1 0 74 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:0:74 2026-02-21T09:18:30.5202049Z ld.param.b64 %rd16, [_helion_matmul_param_2]; 2026-02-21T09:18:30.5202284Z ld.param.b64 %rd15, [_helion_matmul_param_0]; 2026-02-21T09:18:30.5202482Z shr.u32 %r5, %r1, 5; 2026-02-21T09:18:30.5202639Z bfe.u32 %r6, %r1, 2, 5; 2026-02-21T09:18:30.5202796Z shr.u32 %r112, %r1, 4; 2026-02-21T09:18:30.5202958Z bfe.u32 %r7, %r1, 4, 3; 2026-02-21T09:18:30.5203110Z or.b32 %r8, %r7, 8; 2026-02-21T09:18:30.5203273Z or.b32 %r9, %r7, 16; 2026-02-21T09:18:30.5203414Z or.b32 %r10, %r7, 24; 2026-02-21T09:18:30.5203554Z or.b32 %r11, %r7, 32; 2026-02-21T09:18:30.5203698Z or.b32 %r12, %r7, 40; 2026-02-21T09:18:30.5203832Z or.b32 %r13, %r7, 48; 2026-02-21T09:18:30.5203980Z or.b32 %r14, %r112, 56; 2026-02-21T09:18:30.5204131Z or.b32 %r15, %r7, 64; 2026-02-21T09:18:30.5204282Z or.b32 %r16, %r7, 72; 2026-02-21T09:18:30.5204472Z or.b32 %r17, %r7, 80; 2026-02-21T09:18:30.5204622Z or.b32 %r18, %r7, 88; 2026-02-21T09:18:30.5204801Z or.b32 %r19, %r7, 
96; 2026-02-21T09:18:30.5204956Z or.b32 %r20, %r7, 104; 2026-02-21T09:18:30.5205122Z or.b32 %r21, %r7, 112; 2026-02-21T09:18:30.5205276Z or.b32 %r22, %r112, 120; 2026-02-21T09:18:30.5205460Z shl.b32 %r113, %r1, 3; 2026-02-21T09:18:30.5205631Z and.b32 %r23, %r113, 120; 2026-02-21T09:18:30.5205817Z and.b32 %r24, %r1, 3; 2026-02-21T09:18:30.5205980Z shl.b32 %r25, %r24, 3; 2026-02-21T09:18:30.5206148Z shl.b32 %r114, %r1, 4; 2026-02-21T09:18:30.5206311Z and.b32 %r115, %r114, 2032; 2026-02-21T09:18:30.5206497Z and.b32 %r116, %r1, 24; 2026-02-21T09:18:30.5206667Z shl.b32 %r117, %r116, 1; 2026-02-21T09:18:30.5206881Z xor.b32 %r26, %r115, %r117; 2026-02-21T09:18:30.5207053Z add.s32 %r285, %r93, %r26; 2026-02-21T09:18:30.5207214Z add.s32 %r287, %r285, 2048; 2026-02-21T09:18:30.5207379Z add.s32 %r289, %r285, 4096; 2026-02-21T09:18:30.5207536Z add.s32 %r291, %r285, 6144; 2026-02-21T09:18:30.5207699Z or.b32 %r31, %r25, 32; 2026-02-21T09:18:30.5207850Z add.s32 %r298, %r285, 8192; 2026-02-21T09:18:30.5208016Z add.s32 %r300, %r285, 10240; 2026-02-21T09:18:30.5208212Z add.s32 %r302, %r285, 12288; 2026-02-21T09:18:30.5208384Z add.s32 %r304, %r285, 14336; 2026-02-21T09:18:30.5208542Z or.b32 %r36, %r25, 64; 2026-02-21T09:18:30.5208700Z add.s32 %r311, %r285, 16384; 2026-02-21T09:18:30.5208864Z add.s32 %r313, %r285, 18432; 2026-02-21T09:18:30.5209020Z add.s32 %r315, %r285, 20480; 2026-02-21T09:18:30.5209182Z add.s32 %r317, %r285, 22528; 2026-02-21T09:18:30.5209343Z bfe.u32 %r119, %r93, 4, 14; 2026-02-21T09:18:30.5209558Z cvt.u64.u32 %rd37, %r119; 2026-02-21T09:18:30.5209741Z or.b64 %rd68, %rd37, -9223371899382267904; 2026-02-21T09:18:30.5209932Z add.s32 %r120, %r93, 32768; 2026-02-21T09:18:30.5210089Z bfe.u32 %r121, %r120, 4, 14; 2026-02-21T09:18:30.5210254Z cvt.u64.u32 %rd38, %r121; 2026-02-21T09:18:30.5210423Z or.b64 %rd69, %rd38, -9223371899382267904; 2026-02-21T09:18:30.5210618Z add.s32 %r122, %r93, 32; 2026-02-21T09:18:30.5210781Z bfe.u32 %r123, %r122, 4, 14; 
2026-02-21T09:18:30.5210941Z cvt.u64.u32 %rd39, %r123; 2026-02-21T09:18:30.5211117Z or.b64 %rd70, %rd39, -9223371899382267904; 2026-02-21T09:18:30.5211299Z add.s32 %r124, %r93, 32800; 2026-02-21T09:18:30.5211461Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T09:18:30.5211620Z cvt.u64.u32 %rd40, %r125; 2026-02-21T09:18:30.5211793Z or.b64 %rd71, %rd40, -9223371899382267904; 2026-02-21T09:18:30.5211972Z or.b32 %r41, %r25, 96; 2026-02-21T09:18:30.5212128Z add.s32 %r349, %r285, 24576; 2026-02-21T09:18:30.5212293Z add.s32 %r351, %r285, 26624; 2026-02-21T09:18:30.5212448Z add.s32 %r353, %r285, 28672; 2026-02-21T09:18:30.5212612Z add.s32 %r355, %r285, 30720; 2026-02-21T09:18:30.5212769Z shl.b32 %r126, %r1, 10; 2026-02-21T09:18:30.5212932Z and.b32 %r127, %r126, 6144; 2026-02-21T09:18:30.5213088Z or.b32 %r128, %r127, %r115; 2026-02-21T09:18:30.5213255Z add.s32 %r46, %r93, %r128; 2026-02-21T09:18:30.5213413Z xor.b32 %r129, %r128, 32; 2026-02-21T09:18:30.5213578Z add.s32 %r47, %r93, %r129; 2026-02-21T09:18:30.5213737Z xor.b32 %r130, %r128, 64; 2026-02-21T09:18:30.5213903Z add.s32 %r48, %r93, %r130; 2026-02-21T09:18:30.5214069Z xor.b32 %r131, %r128, 96; 2026-02-21T09:18:30.5214224Z add.s32 %r49, %r93, %r131; 2026-02-21T09:18:30.5214386Z and.b32 %r132, %r1, 96; 2026-02-21T09:18:30.5214537Z shl.b32 %r133, %r132, 6; 2026-02-21T09:18:30.5214761Z shl.b32 %r134, %r24, 5; 2026-02-21T09:18:30.5214917Z shl.b32 %r135, %r116, 4; 2026-02-21T09:18:30.5215079Z and.b32 %r137, %r110, 16; 2026-02-21T09:18:30.5215233Z or.b32 %r138, %r133, %r134; 2026-02-21T09:18:30.5215401Z or.b32 %r139, %r135, %r132; 2026-02-21T09:18:30.5215559Z xor.b32 %r140, %r138, %r139; 2026-02-21T09:18:30.5215723Z add.s32 %r141, %r93, %r137; 2026-02-21T09:18:30.5215885Z add.s32 %r570, %r141, %r140; 2026-02-21T09:18:30.5216053Z add.s32 %r575, %r570, 512; 2026-02-21T09:18:30.5216253Z add.s32 %r580, %r570, 1024; 2026-02-21T09:18:30.5216406Z add.s32 %r585, %r570, 1536; 2026-02-21T09:18:30.5216692Z .loc 1 30 74 // 
ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:30:74 2026-02-21T09:18:30.5217020Z mad.wide.u32 %rd41, %r24, 16, %rd15; 2026-02-21T09:18:30.5217212Z add.s64 %rd6, %rd41, 131328; 2026-02-21T09:18:30.5217499Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5217800Z or.b32 %r54, %r25, 98432; 2026-02-21T09:18:30.5217965Z bra.uni $L__BB0_2; 2026-02-21T09:18:30.5218163Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:18:30.5218515Z .loc 1 0 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:0:42 2026-02-21T09:18:30.5218897Z mov.b32 %r423, 1; 2026-02-21T09:18:30.5219144Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5219426Z // begin inline asm 2026-02-21T09:18:30.5219564Z 2026-02-21T09:18:30.5219680Z { 2026-02-21T09:18:30.5219798Z .reg .pred complete; 2026-02-21T09:18:30.5219945Z waitLoop: 2026-02-21T09:18:30.5220157Z mbarrier.try_wait.parity.shared.b64 complete, [%r422], %r423; 2026-02-21T09:18:30.5220396Z @!complete bra.uni waitLoop; 2026-02-21T09:18:30.5220539Z } 2026-02-21T09:18:30.5220609Z 2026-02-21T09:18:30.5220662Z // end inline asm 2026-02-21T09:18:30.5220902Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5221191Z cp.async.wait_group 0; 2026-02-21T09:18:30.5221344Z bar.sync 0; 2026-02-21T09:18:30.5221496Z // begin inline asm 2026-02-21T09:18:30.5221670Z @%p4 mbarrier.inval.shared::cta.b64 [%r281]; 2026-02-21T09:18:30.5221849Z // end inline asm 2026-02-21T09:18:30.5221989Z bar.sync 0; 2026-02-21T09:18:30.5222114Z // begin inline asm 2026-02-21T09:18:30.5222278Z @%p4 mbarrier.inval.shared::cta.b64 [%r282]; 2026-02-21T09:18:30.5222455Z // end inline asm 2026-02-21T09:18:30.5222589Z bar.sync 0; 2026-02-21T09:18:30.5222712Z // begin inline asm 2026-02-21T09:18:30.5222872Z @%p4 mbarrier.inval.shared::cta.b64 [%r283]; 2026-02-21T09:18:30.5223056Z // end inline asm 
2026-02-21T09:18:30.5223185Z bar.sync 0; 2026-02-21T09:18:30.5223326Z // begin inline asm 2026-02-21T09:18:30.5223478Z @%p4 mbarrier.inval.shared::cta.b64 [%r357]; 2026-02-21T09:18:30.5223658Z // end inline asm 2026-02-21T09:18:30.5223787Z add.s32 %r428, %r93, 65568; 2026-02-21T09:18:30.5223944Z // begin inline asm 2026-02-21T09:18:30.5224093Z @%p4 mbarrier.inval.shared::cta.b64 [%r428]; 2026-02-21T09:18:30.5224272Z // end inline asm 2026-02-21T09:18:30.5224403Z bar.sync 0; 2026-02-21T09:18:30.5224531Z // begin inline asm 2026-02-21T09:18:30.5224763Z @%p4 mbarrier.inval.shared::cta.b64 [%r280]; 2026-02-21T09:18:30.5224966Z // end inline asm 2026-02-21T09:18:30.5225248Z .loc 1 59 45 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:59:45 2026-02-21T09:18:30.5225579Z shl.b32 %r711, %r59, 13; 2026-02-21T09:18:30.5225756Z shl.b32 %r712, %r60, 13; 2026-02-21T09:18:30.5225922Z shl.b32 %r713, %r61, 13; 2026-02-21T09:18:30.5226096Z shl.b32 %r714, %r62, 13; 2026-02-21T09:18:30.5226268Z shl.b32 %r715, %r63, 13; 2026-02-21T09:18:30.5226410Z shl.b32 %r716, %r64, 13; 2026-02-21T09:18:30.5226558Z shl.b32 %r717, %r65, 13; 2026-02-21T09:18:30.5226697Z shl.b32 %r718, %r66, 13; 2026-02-21T09:18:30.5226846Z shl.b32 %r719, %r67, 13; 2026-02-21T09:18:30.5226985Z shl.b32 %r720, %r68, 13; 2026-02-21T09:18:30.5227131Z shl.b32 %r721, %r69, 13; 2026-02-21T09:18:30.5227269Z shl.b32 %r722, %r70, 13; 2026-02-21T09:18:30.5227416Z shl.b32 %r723, %r71, 13; 2026-02-21T09:18:30.5227553Z shl.b32 %r724, %r72, 13; 2026-02-21T09:18:30.5227699Z shl.b32 %r725, %r73, 13; 2026-02-21T09:18:30.5227843Z shl.b32 %r726, %r74, 13; 2026-02-21T09:18:30.5228088Z .loc 1 59 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:59:52 2026-02-21T09:18:30.5228409Z add.s32 %r727, %r711, %r76; 2026-02-21T09:18:30.5228559Z add.s32 %r728, %r712, %r76; 2026-02-21T09:18:30.5228710Z add.s32 %r729, %r713, %r76; 2026-02-21T09:18:30.5228857Z add.s32 %r730, %r714, %r76; 2026-02-21T09:18:30.5229004Z add.s32 %r731, 
%r715, %r76; 2026-02-21T09:18:30.5229146Z add.s32 %r732, %r716, %r76; 2026-02-21T09:18:30.5229299Z add.s32 %r733, %r717, %r76; 2026-02-21T09:18:30.5229440Z add.s32 %r734, %r718, %r76; 2026-02-21T09:18:30.5229589Z add.s32 %r735, %r719, %r76; 2026-02-21T09:18:30.5229739Z add.s32 %r736, %r720, %r76; 2026-02-21T09:18:30.5229882Z add.s32 %r737, %r721, %r76; 2026-02-21T09:18:30.5230034Z add.s32 %r738, %r722, %r76; 2026-02-21T09:18:30.5230210Z add.s32 %r739, %r723, %r76; 2026-02-21T09:18:30.5230363Z add.s32 %r740, %r724, %r76; 2026-02-21T09:18:30.5230505Z add.s32 %r741, %r725, %r76; 2026-02-21T09:18:30.5230658Z add.s32 %r742, %r726, %r76; 2026-02-21T09:18:30.5230914Z .loc 1 59 24 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:59:24 2026-02-21T09:18:30.5231219Z mad.wide.s32 %rd98, %r727, 2, %rd16; 2026-02-21T09:18:30.5231402Z mad.wide.s32 %rd99, %r728, 2, %rd16; 2026-02-21T09:18:30.5231614Z mad.wide.s32 %rd100, %r729, 2, %rd16; 2026-02-21T09:18:30.5231802Z mad.wide.s32 %rd101, %r730, 2, %rd16; 2026-02-21T09:18:30.5231976Z mad.wide.s32 %rd102, %r731, 2, %rd16; 2026-02-21T09:18:30.5232149Z mad.wide.s32 %rd103, %r732, 2, %rd16; 2026-02-21T09:18:30.5232311Z mad.wide.s32 %rd104, %r733, 2, %rd16; 2026-02-21T09:18:30.5232480Z mad.wide.s32 %rd105, %r734, 2, %rd16; 2026-02-21T09:18:30.5232669Z mad.wide.s32 %rd106, %r735, 2, %rd16; 2026-02-21T09:18:30.5232842Z mad.wide.s32 %rd107, %r736, 2, %rd16; 2026-02-21T09:18:30.5233011Z mad.wide.s32 %rd108, %r737, 2, %rd16; 2026-02-21T09:18:30.5233175Z mad.wide.s32 %rd109, %r738, 2, %rd16; 2026-02-21T09:18:30.5233350Z mad.wide.s32 %rd110, %r739, 2, %rd16; 2026-02-21T09:18:30.5233517Z mad.wide.s32 %rd111, %r740, 2, %rd16; 2026-02-21T09:18:30.5233694Z mad.wide.s32 %rd112, %r741, 2, %rd16; 2026-02-21T09:18:30.5233861Z mad.wide.s32 %rd113, %r742, 2, %rd16; 2026-02-21T09:18:30.5234143Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5234430Z // begin inline asm 
2026-02-21T09:18:30.5234853Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445}, [%r565 + 0]; 2026-02-21T09:18:30.5235311Z // end inline asm 2026-02-21T09:18:30.5235467Z // begin inline asm 2026-02-21T09:18:30.5235884Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461, %r462}, [%r565 + 16]; 2026-02-21T09:18:30.5236314Z // end inline asm 2026-02-21T09:18:30.5236453Z // begin inline asm 2026-02-21T09:18:30.5236796Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478, %r479}, [%r565 + 32]; 2026-02-21T09:18:30.5237206Z // end inline asm 2026-02-21T09:18:30.5237363Z // begin inline asm 2026-02-21T09:18:30.5237755Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495, %r496}, [%r565 + 48]; 2026-02-21T09:18:30.5238192Z // end inline asm 2026-02-21T09:18:30.5238342Z // begin inline asm 2026-02-21T09:18:30.5238745Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512, %r513}, [%r565 + 64]; 2026-02-21T09:18:30.5239185Z // end inline asm 2026-02-21T09:18:30.5239335Z // begin inline asm 2026-02-21T09:18:30.5239722Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530}, [%r565 + 80]; 2026-02-21T09:18:30.5240188Z // end inline asm 2026-02-21T09:18:30.5240344Z // begin inline asm 2026-02-21T09:18:30.5240676Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547}, [%r565 + 96]; 2026-02-21T09:18:30.5241045Z // end inline asm 
2026-02-21T09:18:30.5241178Z // begin inline asm 2026-02-21T09:18:30.5241514Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564}, [%r565 + 112]; 2026-02-21T09:18:30.5241887Z // end inline asm 2026-02-21T09:18:30.5242014Z // begin inline asm 2026-02-21T09:18:30.5242167Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:18:30.5242351Z // end inline asm 2026-02-21T09:18:30.5242492Z cvt.u64.u32 %rd114, %r430; 2026-02-21T09:18:30.5242654Z cvt.u64.u32 %rd115, %r431; 2026-02-21T09:18:30.5242803Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:18:30.5242963Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:18:30.5243231Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5243523Z mov.b64 {%r743, %r744}, %rd117; 2026-02-21T09:18:30.5243720Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:18:30.5244008Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5244290Z cvt.u64.u32 %rd118, %r432; 2026-02-21T09:18:30.5244451Z cvt.u64.u32 %rd119, %r433; 2026-02-21T09:18:30.5244612Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:18:30.5244804Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:18:30.5245123Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5245467Z mov.b64 {%r746, %r747}, %rd121; 2026-02-21T09:18:30.5245673Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:18:30.5245996Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5246311Z cvt.u64.u32 %rd122, %r434; 2026-02-21T09:18:30.5246485Z cvt.u64.u32 %rd123, %r435; 2026-02-21T09:18:30.5246652Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:18:30.5246829Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:18:30.5247119Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5247435Z mov.b64 {%r749, %r750}, %rd125; 
2026-02-21T09:18:30.5247617Z cvt.rn.f16x2.f32 %r751, %r750, %r749; 2026-02-21T09:18:30.5247923Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5248228Z cvt.u64.u32 %rd126, %r436; 2026-02-21T09:18:30.5248402Z cvt.u64.u32 %rd127, %r437; 2026-02-21T09:18:30.5248575Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:18:30.5248744Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:18:30.5249039Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5249347Z mov.b64 {%r752, %r753}, %rd129; 2026-02-21T09:18:30.5249533Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:18:30.5249832Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5250148Z cvt.u64.u32 %rd130, %r438; 2026-02-21T09:18:30.5250323Z cvt.u64.u32 %rd131, %r439; 2026-02-21T09:18:30.5250486Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:18:30.5250663Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:18:30.5250951Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5251273Z mov.b64 {%r755, %r756}, %rd133; 2026-02-21T09:18:30.5251450Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:18:30.5251753Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5252079Z cvt.u64.u32 %rd134, %r440; 2026-02-21T09:18:30.5252244Z cvt.u64.u32 %rd135, %r441; 2026-02-21T09:18:30.5252441Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:18:30.5252601Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:18:30.5252883Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5253187Z mov.b64 {%r758, %r759}, %rd137; 2026-02-21T09:18:30.5253366Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:18:30.5253653Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5253954Z cvt.u64.u32 %rd138, %r442; 2026-02-21T09:18:30.5254117Z cvt.u64.u32 %rd139, %r443; 
2026-02-21T09:18:30.5254275Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:18:30.5254444Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:18:30.5254787Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5255101Z mov.b64 {%r761, %r762}, %rd141; 2026-02-21T09:18:30.5255273Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:18:30.5255569Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5255883Z cvt.u64.u32 %rd142, %r444; 2026-02-21T09:18:30.5256033Z cvt.u64.u32 %rd143, %r445; 2026-02-21T09:18:30.5256242Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:18:30.5256401Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:18:30.5256672Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5256949Z mov.b64 {%r764, %r765}, %rd145; 2026-02-21T09:18:30.5257117Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:18:30.5257411Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5257708Z cvt.u64.u32 %rd146, %r447; 2026-02-21T09:18:30.5257861Z cvt.u64.u32 %rd147, %r448; 2026-02-21T09:18:30.5258008Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:18:30.5258166Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:18:30.5258432Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5258724Z mov.b64 {%r767, %r768}, %rd149; 2026-02-21T09:18:30.5258884Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:18:30.5259169Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5259459Z cvt.u64.u32 %rd150, %r449; 2026-02-21T09:18:30.5259606Z cvt.u64.u32 %rd151, %r450; 2026-02-21T09:18:30.5259758Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:18:30.5259910Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:18:30.5260178Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5260467Z mov.b64 {%r770, %r771}, %rd153; 
2026-02-21T09:18:30.5260644Z cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T09:18:30.5260925Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5261218Z cvt.u64.u32 %rd154, %r451; 2026-02-21T09:18:30.5261379Z cvt.u64.u32 %rd155, %r452; 2026-02-21T09:18:30.5261532Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:18:30.5261698Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:18:30.5261972Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5262270Z mov.b64 {%r773, %r774}, %rd157; 2026-02-21T09:18:30.5262435Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:18:30.5262725Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5263023Z cvt.u64.u32 %rd158, %r453; 2026-02-21T09:18:30.5263177Z cvt.u64.u32 %rd159, %r454; 2026-02-21T09:18:30.5263335Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:18:30.5263496Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:18:30.5263779Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5264071Z mov.b64 {%r776, %r777}, %rd161; 2026-02-21T09:18:30.5264274Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:18:30.5264559Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5264896Z cvt.u64.u32 %rd162, %r455; 2026-02-21T09:18:30.5265062Z cvt.u64.u32 %rd163, %r456; 2026-02-21T09:18:30.5265221Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:18:30.5265389Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:18:30.5265688Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5266032Z mov.b64 {%r779, %r780}, %rd165; 2026-02-21T09:18:30.5266214Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:18:30.5266525Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5266845Z cvt.u64.u32 %rd166, %r457; 2026-02-21T09:18:30.5267003Z cvt.u64.u32 %rd167, %r458; 
2026-02-21T09:18:30.5267163Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:18:30.5267326Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:18:30.5267607Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5267901Z mov.b64 {%r782, %r783}, %rd169; 2026-02-21T09:18:30.5268112Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T09:18:30.5268395Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5268693Z cvt.u64.u32 %rd170, %r459; 2026-02-21T09:18:30.5268853Z cvt.u64.u32 %rd171, %r460; 2026-02-21T09:18:30.5269007Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:18:30.5269171Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:18:30.5269467Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5269773Z mov.b64 {%r785, %r786}, %rd173; 2026-02-21T09:18:30.5269952Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T09:18:30.5270236Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5270525Z cvt.u64.u32 %rd174, %r461; 2026-02-21T09:18:30.5270678Z cvt.u64.u32 %rd175, %r462; 2026-02-21T09:18:30.5270835Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:18:30.5270993Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:18:30.5271266Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5271553Z mov.b64 {%r788, %r789}, %rd177; 2026-02-21T09:18:30.5271730Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T09:18:30.5272010Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5272308Z cvt.u64.u32 %rd178, %r464; 2026-02-21T09:18:30.5272472Z cvt.u64.u32 %rd179, %r465; 2026-02-21T09:18:30.5272626Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:18:30.5272790Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:18:30.5273057Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5273349Z mov.b64 {%r791, %r792}, %rd181; 
2026-02-21T09:18:30.5273513Z cvt.rn.f16x2.f32 %r793, %r792, %r791; 2026-02-21T09:18:30.5273806Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5274102Z cvt.u64.u32 %rd182, %r466; 2026-02-21T09:18:30.5274254Z cvt.u64.u32 %rd183, %r467; 2026-02-21T09:18:30.5274412Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:18:30.5274566Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:18:30.5274871Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5275156Z mov.b64 {%r794, %r795}, %rd185; 2026-02-21T09:18:30.5275324Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T09:18:30.5275612Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5275917Z cvt.u64.u32 %rd186, %r468; 2026-02-21T09:18:30.5276082Z cvt.u64.u32 %rd187, %r469; 2026-02-21T09:18:30.5276269Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:18:30.5276438Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:18:30.5276721Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5277034Z mov.b64 {%r797, %r798}, %rd189; 2026-02-21T09:18:30.5277206Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T09:18:30.5277393Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5277454Z cvt.u64.u32 %rd190, %r470; 2026-02-21T09:18:30.5277513Z cvt.u64.u32 %rd191, %r471; 2026-02-21T09:18:30.5277581Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:18:30.5277643Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:18:30.5277854Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5277915Z mov.b64 {%r800, %r801}, %rd193; 2026-02-21T09:18:30.5277987Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T09:18:30.5278172Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5278231Z cvt.u64.u32 %rd194, %r472; 2026-02-21T09:18:30.5278328Z cvt.u64.u32 %rd195, %r473; 
2026-02-21T09:18:30.5278391Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:18:30.5278453Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:18:30.5278647Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5278710Z mov.b64 {%r803, %r804}, %rd197; 2026-02-21T09:18:30.5278776Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T09:18:30.5278986Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5279062Z cvt.u64.u32 %rd198, %r474; 2026-02-21T09:18:30.5279122Z cvt.u64.u32 %rd199, %r475; 2026-02-21T09:18:30.5279182Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:18:30.5279250Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:18:30.5279434Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5279495Z mov.b64 {%r806, %r807}, %rd201; 2026-02-21T09:18:30.5279566Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T09:18:30.5279746Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5279807Z cvt.u64.u32 %rd202, %r476; 2026-02-21T09:18:30.5279866Z cvt.u64.u32 %rd203, %r477; 2026-02-21T09:18:30.5279933Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:18:30.5279995Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:18:30.5280177Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5280246Z mov.b64 {%r809, %r810}, %rd205; 2026-02-21T09:18:30.5280310Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T09:18:30.5280488Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5280558Z cvt.u64.u32 %rd206, %r478; 2026-02-21T09:18:30.5280619Z cvt.u64.u32 %rd207, %r479; 2026-02-21T09:18:30.5280679Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:18:30.5280739Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:18:30.5280932Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5280992Z mov.b64 {%r812, %r813}, %rd209; 
2026-02-21T09:18:30.5281057Z cvt.rn.f16x2.f32 %r814, %r813, %r812; 2026-02-21T09:18:30.5281243Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5281303Z cvt.u64.u32 %rd210, %r481; 2026-02-21T09:18:30.5281362Z cvt.u64.u32 %rd211, %r482; 2026-02-21T09:18:30.5281431Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:18:30.5281492Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:18:30.5281673Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5281735Z mov.b64 {%r815, %r816}, %rd213; 2026-02-21T09:18:30.5281831Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T09:18:30.5282009Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5282070Z cvt.u64.u32 %rd214, %r483; 2026-02-21T09:18:30.5282138Z cvt.u64.u32 %rd215, %r484; 2026-02-21T09:18:30.5282198Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:18:30.5282258Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:18:30.5282445Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5282506Z mov.b64 {%r818, %r819}, %rd217; 2026-02-21T09:18:30.5282572Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T09:18:30.5282779Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5282847Z cvt.u64.u32 %rd218, %r485; 2026-02-21T09:18:30.5282906Z cvt.u64.u32 %rd219, %r486; 2026-02-21T09:18:30.5282965Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:18:30.5283034Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:18:30.5283216Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5283301Z mov.b64 {%r821, %r822}, %rd221; 2026-02-21T09:18:30.5283373Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T09:18:30.5283550Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5283609Z cvt.u64.u32 %rd222, %r487; 2026-02-21T09:18:30.5283666Z cvt.u64.u32 %rd223, %r488; 
2026-02-21T09:18:30.5283733Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:18:30.5283792Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:18:30.5283992Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5284061Z mov.b64 {%r824, %r825}, %rd225; 2026-02-21T09:18:30.5284126Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T09:18:30.5284306Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5284374Z cvt.u64.u32 %rd226, %r489; 2026-02-21T09:18:30.5284434Z cvt.u64.u32 %rd227, %r490; 2026-02-21T09:18:30.5284496Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:18:30.5284558Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:18:30.5284794Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5284859Z mov.b64 {%r827, %r828}, %rd229; 2026-02-21T09:18:30.5284929Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T09:18:30.5285134Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5285199Z cvt.u64.u32 %rd230, %r491; 2026-02-21T09:18:30.5285264Z cvt.u64.u32 %rd231, %r492; 2026-02-21T09:18:30.5285336Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:18:30.5285400Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:18:30.5285651Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5285721Z mov.b64 {%r830, %r831}, %rd233; 2026-02-21T09:18:30.5285787Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T09:18:30.5285957Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5286013Z cvt.u64.u32 %rd234, %r493; 2026-02-21T09:18:30.5286075Z cvt.u64.u32 %rd235, %r494; 2026-02-21T09:18:30.5286131Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:18:30.5286186Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:18:30.5286362Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5286419Z mov.b64 {%r833, %r834}, %rd237; 
2026-02-21T09:18:30.5286481Z cvt.rn.f16x2.f32 %r835, %r834, %r833; 2026-02-21T09:18:30.5286649Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5286712Z cvt.u64.u32 %rd238, %r495; 2026-02-21T09:18:30.5286796Z cvt.u64.u32 %rd239, %r496; 2026-02-21T09:18:30.5286852Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:18:30.5286915Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:18:30.5287082Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5287139Z mov.b64 {%r836, %r837}, %rd241; 2026-02-21T09:18:30.5287207Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T09:18:30.5287372Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5287427Z cvt.u64.u32 %rd242, %r498; 2026-02-21T09:18:30.5287481Z cvt.u64.u32 %rd243, %r499; 2026-02-21T09:18:30.5287545Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:18:30.5287640Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:18:30.5287805Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5287871Z mov.b64 {%r839, %r840}, %rd245; 2026-02-21T09:18:30.5287933Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T09:18:30.5288099Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5288164Z cvt.u64.u32 %rd246, %r500; 2026-02-21T09:18:30.5288244Z cvt.u64.u32 %rd247, %r501; 2026-02-21T09:18:30.5288305Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:18:30.5288361Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:18:30.5288537Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5288594Z mov.b64 {%r842, %r843}, %rd249; 2026-02-21T09:18:30.5288653Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T09:18:30.5288849Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5288908Z cvt.u64.u32 %rd250, %r502; 2026-02-21T09:18:30.5288963Z cvt.u64.u32 %rd251, %r503; 
2026-02-21T09:18:30.5289027Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:18:30.5289083Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:18:30.5289249Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5289306Z mov.b64 {%r845, %r846}, %rd253; 2026-02-21T09:18:30.5289375Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T09:18:30.5289539Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5289594Z cvt.u64.u32 %rd254, %r504; 2026-02-21T09:18:30.5289658Z cvt.u64.u32 %rd255, %r505; 2026-02-21T09:18:30.5289715Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:18:30.5289772Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:18:30.5289947Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5290006Z mov.b64 {%r848, %r849}, %rd257; 2026-02-21T09:18:30.5290066Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:18:30.5290233Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5290296Z cvt.u64.u32 %rd258, %r506; 2026-02-21T09:18:30.5290352Z cvt.u64.u32 %rd259, %r507; 2026-02-21T09:18:30.5290409Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:18:30.5290472Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:18:30.5290633Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5290689Z mov.b64 {%r851, %r852}, %rd261; 2026-02-21T09:18:30.5290755Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:18:30.5290922Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5290979Z cvt.u64.u32 %rd262, %r508; 2026-02-21T09:18:30.5291035Z cvt.u64.u32 %rd263, %r509; 2026-02-21T09:18:30.5291099Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:18:30.5291155Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:18:30.5291321Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5291410Z mov.b64 {%r854, %r855}, %rd265; 
2026-02-21T09:18:30.5291470Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:18:30.5291640Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5291706Z cvt.u64.u32 %rd266, %r510; 2026-02-21T09:18:30.5291762Z cvt.u64.u32 %rd267, %r511; 2026-02-21T09:18:30.5291821Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:18:30.5291876Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:18:30.5292049Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5292106Z mov.b64 {%r857, %r858}, %rd269; 2026-02-21T09:18:30.5292167Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:18:30.5292358Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5292413Z cvt.u64.u32 %rd270, %r512; 2026-02-21T09:18:30.5292469Z cvt.u64.u32 %rd271, %r513; 2026-02-21T09:18:30.5292534Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:18:30.5292590Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:18:30.5292755Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5292832Z mov.b64 {%r860, %r861}, %rd273; 2026-02-21T09:18:30.5292902Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:18:30.5293065Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5293120Z cvt.u64.u32 %rd274, %r515; 2026-02-21T09:18:30.5293182Z cvt.u64.u32 %rd275, %r516; 2026-02-21T09:18:30.5293238Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:18:30.5293331Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:18:30.5293507Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5293563Z mov.b64 {%r863, %r864}, %rd277; 2026-02-21T09:18:30.5293623Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:18:30.5293788Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5293852Z cvt.u64.u32 %rd278, %r517; 2026-02-21T09:18:30.5293905Z cvt.u64.u32 %rd279, %r518; 
2026-02-21T09:18:30.5293962Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:18:30.5294024Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:18:30.5294187Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5294242Z mov.b64 {%r866, %r867}, %rd281; 2026-02-21T09:18:30.5294308Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:18:30.5294477Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5294541Z cvt.u64.u32 %rd282, %r519; 2026-02-21T09:18:30.5294604Z cvt.u64.u32 %rd283, %r520; 2026-02-21T09:18:30.5294705Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:18:30.5294771Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:18:30.5294967Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5295041Z mov.b64 {%r869, %r870}, %rd285; 2026-02-21T09:18:30.5295110Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:18:30.5295303Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5295377Z cvt.u64.u32 %rd286, %r521; 2026-02-21T09:18:30.5295441Z cvt.u64.u32 %rd287, %r522; 2026-02-21T09:18:30.5295505Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:18:30.5295570Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:18:30.5295772Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5295838Z mov.b64 {%r872, %r873}, %rd289; 2026-02-21T09:18:30.5295907Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:18:30.5296105Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5296170Z cvt.u64.u32 %rd290, %r523; 2026-02-21T09:18:30.5296261Z cvt.u64.u32 %rd291, %r524; 2026-02-21T09:18:30.5296324Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:18:30.5296381Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:18:30.5296550Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5296607Z mov.b64 {%r875, %r876}, %rd293; 
2026-02-21T09:18:30.5296674Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:18:30.5296841Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5296907Z cvt.u64.u32 %rd294, %r525; 2026-02-21T09:18:30.5296984Z cvt.u64.u32 %rd295, %r526; 2026-02-21T09:18:30.5297051Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:18:30.5297159Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:18:30.5297361Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5297429Z mov.b64 {%r878, %r879}, %rd297; 2026-02-21T09:18:30.5297500Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:18:30.5297693Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5297795Z cvt.u64.u32 %rd298, %r527; 2026-02-21T09:18:30.5297862Z cvt.u64.u32 %rd299, %r528; 2026-02-21T09:18:30.5297926Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:18:30.5297997Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:18:30.5298192Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5298257Z mov.b64 {%r881, %r882}, %rd301; 2026-02-21T09:18:30.5298334Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:18:30.5298552Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5298619Z cvt.u64.u32 %rd302, %r529; 2026-02-21T09:18:30.5298682Z cvt.u64.u32 %rd303, %r530; 2026-02-21T09:18:30.5298753Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:18:30.5298819Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:18:30.5299010Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5299083Z mov.b64 {%r884, %r885}, %rd305; 2026-02-21T09:18:30.5299153Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:18:30.5299348Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5299420Z cvt.u64.u32 %rd306, %r532; 2026-02-21T09:18:30.5299484Z cvt.u64.u32 %rd307, %r533; 
2026-02-21T09:18:30.5299549Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:18:30.5299613Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:18:30.5299812Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5299879Z mov.b64 {%r887, %r888}, %rd309; 2026-02-21T09:18:30.5299948Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:18:30.5300144Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5300211Z cvt.u64.u32 %rd310, %r534; 2026-02-21T09:18:30.5300274Z cvt.u64.u32 %rd311, %r535; 2026-02-21T09:18:30.5300337Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:18:30.5300412Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:18:30.5300602Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5300667Z mov.b64 {%r890, %r891}, %rd313; 2026-02-21T09:18:30.5300744Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:18:30.5300943Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5301001Z cvt.u64.u32 %rd314, %r536; 2026-02-21T09:18:30.5301066Z cvt.u64.u32 %rd315, %r537; 2026-02-21T09:18:30.5301121Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:18:30.5301177Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:18:30.5301343Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5301431Z mov.b64 {%r893, %r894}, %rd317; 2026-02-21T09:18:30.5301492Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:18:30.5301660Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5301724Z cvt.u64.u32 %rd318, %r538; 2026-02-21T09:18:30.5301779Z cvt.u64.u32 %rd319, %r539; 2026-02-21T09:18:30.5301835Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:18:30.5301898Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:18:30.5302061Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5302118Z mov.b64 {%r896, %r897}, %rd321; 
2026-02-21T09:18:30.5302199Z cvt.rn.f16x2.f32 %r898, %r897, %r896; 2026-02-21T09:18:30.5302371Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5302427Z cvt.u64.u32 %rd322, %r540; 2026-02-21T09:18:30.5302480Z cvt.u64.u32 %rd323, %r541; 2026-02-21T09:18:30.5302545Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:18:30.5302601Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:18:30.5302784Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5302847Z mov.b64 {%r899, %r900}, %rd325; 2026-02-21T09:18:30.5302909Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T09:18:30.5303071Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5303133Z cvt.u64.u32 %rd326, %r542; 2026-02-21T09:18:30.5303188Z cvt.u64.u32 %rd327, %r543; 2026-02-21T09:18:30.5303264Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:18:30.5303321Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:18:30.5303497Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5303554Z mov.b64 {%r902, %r903}, %rd329; 2026-02-21T09:18:30.5303613Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T09:18:30.5303786Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5303841Z cvt.u64.u32 %rd330, %r544; 2026-02-21T09:18:30.5303895Z cvt.u64.u32 %rd331, %r545; 2026-02-21T09:18:30.5303949Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:18:30.5304013Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:18:30.5304176Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5304231Z mov.b64 {%r905, %r906}, %rd333; 2026-02-21T09:18:30.5304298Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T09:18:30.5304465Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5304522Z cvt.u64.u32 %rd334, %r546; 2026-02-21T09:18:30.5304583Z cvt.u64.u32 %rd335, %r547; 
2026-02-21T09:18:30.5304638Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:18:30.5304722Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:18:30.5304889Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5304952Z mov.b64 {%r908, %r909}, %rd337; 2026-02-21T09:18:30.5305013Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T09:18:30.5305174Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5305236Z cvt.u64.u32 %rd338, %r549; 2026-02-21T09:18:30.5305291Z cvt.u64.u32 %rd339, %r550; 2026-02-21T09:18:30.5305347Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:18:30.5305409Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:18:30.5305574Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5305636Z mov.b64 {%r911, %r912}, %rd341; 2026-02-21T09:18:30.5305700Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T09:18:30.5305881Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5305990Z cvt.u64.u32 %rd342, %r551; 2026-02-21T09:18:30.5306049Z cvt.u64.u32 %rd343, %r552; 2026-02-21T09:18:30.5306123Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:18:30.5306184Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:18:30.5306369Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5306442Z mov.b64 {%r914, %r915}, %rd345; 2026-02-21T09:18:30.5306515Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T09:18:30.5306694Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5306761Z cvt.u64.u32 %rd346, %r553; 2026-02-21T09:18:30.5306822Z cvt.u64.u32 %rd347, %r554; 2026-02-21T09:18:30.5306913Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:18:30.5306976Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:18:30.5307169Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5307233Z mov.b64 {%r917, %r918}, %rd349; 
2026-02-21T09:18:30.5307301Z cvt.rn.f16x2.f32 %r919, %r918, %r917; 2026-02-21T09:18:30.5307495Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5307587Z cvt.u64.u32 %rd350, %r555; 2026-02-21T09:18:30.5307649Z cvt.u64.u32 %rd351, %r556; 2026-02-21T09:18:30.5307709Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:18:30.5307777Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:18:30.5307955Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5308016Z mov.b64 {%r920, %r921}, %rd353; 2026-02-21T09:18:30.5308120Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T09:18:30.5308304Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5308364Z cvt.u64.u32 %rd354, %r557; 2026-02-21T09:18:30.5308431Z cvt.u64.u32 %rd355, %r558; 2026-02-21T09:18:30.5308491Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:18:30.5308554Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:18:30.5308731Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5308799Z mov.b64 {%r923, %r924}, %rd357; 2026-02-21T09:18:30.5308863Z cvt.rn.f16x2.f32 %r925, %r924, %r923; 2026-02-21T09:18:30.5309040Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5309109Z cvt.u64.u32 %rd358, %r559; 2026-02-21T09:18:30.5309169Z cvt.u64.u32 %rd359, %r560; 2026-02-21T09:18:30.5309229Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:18:30.5309297Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:18:30.5309476Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5309536Z mov.b64 {%r926, %r927}, %rd361; 2026-02-21T09:18:30.5309603Z cvt.rn.f16x2.f32 %r928, %r927, %r926; 2026-02-21T09:18:30.5309789Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5309850Z cvt.u64.u32 %rd362, %r561; 2026-02-21T09:18:30.5309910Z cvt.u64.u32 %rd363, %r562; 
2026-02-21T09:18:30.5309979Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:18:30.5310040Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:18:30.5310216Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5310284Z mov.b64 {%r929, %r930}, %rd365; 2026-02-21T09:18:30.5310349Z cvt.rn.f16x2.f32 %r931, %r930, %r929; 2026-02-21T09:18:30.5310535Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5310596Z cvt.u64.u32 %rd366, %r563; 2026-02-21T09:18:30.5310664Z cvt.u64.u32 %rd367, %r564; 2026-02-21T09:18:30.5310725Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:18:30.5310786Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:18:30.5310971Z .loc 1 58 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:58:27 2026-02-21T09:18:30.5311062Z mov.b64 {%r932, %r933}, %rd369; 2026-02-21T09:18:30.5311126Z cvt.rn.f16x2.f32 %r934, %r933, %r932; 2026-02-21T09:18:30.5311314Z .loc 1 59 82 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:59:82 2026-02-21T09:18:30.5311415Z st.shared.v4.b32 [%r46], {%r745, %r757, %r769, %r781}; 2026-02-21T09:18:30.5311512Z st.shared.v4.b32 [%r47], {%r793, %r805, %r817, %r829}; 2026-02-21T09:18:30.5311608Z st.shared.v4.b32 [%r48], {%r841, %r853, %r865, %r877}; 2026-02-21T09:18:30.5311698Z st.shared.v4.b32 [%r49], {%r889, %r901, %r913, %r925}; 2026-02-21T09:18:30.5311756Z bar.sync 0; 2026-02-21T09:18:30.5311849Z // begin inline asm 2026-02-21T09:18:30.5312018Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r566, %r567, %r568, %r569}, [%r570]; 2026-02-21T09:18:30.5312078Z // end inline asm 2026-02-21T09:18:30.5312137Z // begin inline asm 2026-02-21T09:18:30.5312298Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r571, %r572, %r573, %r574}, [%r575]; 2026-02-21T09:18:30.5312357Z // end inline asm 2026-02-21T09:18:30.5312415Z // begin inline asm 2026-02-21T09:18:30.5312587Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r576, %r577, %r578, %r579}, [%r580]; 
2026-02-21T09:18:30.5312653Z // end inline asm 2026-02-21T09:18:30.5312710Z // begin inline asm 2026-02-21T09:18:30.5312861Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r581, %r582, %r583, %r584}, [%r585]; 2026-02-21T09:18:30.5312924Z // end inline asm 2026-02-21T09:18:30.5312980Z bar.sync 0; 2026-02-21T09:18:30.5313071Z st.shared.v4.b32 [%r46], {%r748, %r760, %r772, %r784}; 2026-02-21T09:18:30.5313189Z st.shared.v4.b32 [%r47], {%r796, %r808, %r820, %r832}; 2026-02-21T09:18:30.5313277Z st.shared.v4.b32 [%r48], {%r844, %r856, %r868, %r880}; 2026-02-21T09:18:30.5313364Z st.shared.v4.b32 [%r49], {%r892, %r904, %r916, %r928}; 2026-02-21T09:18:30.5313419Z bar.sync 0; 2026-02-21T09:18:30.5313483Z // begin inline asm 2026-02-21T09:18:30.5313633Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r586, %r587, %r588, %r589}, [%r570]; 2026-02-21T09:18:30.5313688Z // end inline asm 2026-02-21T09:18:30.5313752Z // begin inline asm 2026-02-21T09:18:30.5313903Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r591, %r592, %r593, %r594}, [%r575]; 2026-02-21T09:18:30.5313959Z // end inline asm 2026-02-21T09:18:30.5314015Z // begin inline asm 2026-02-21T09:18:30.5314171Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r596, %r597, %r598, %r599}, [%r580]; 2026-02-21T09:18:30.5314227Z // end inline asm 2026-02-21T09:18:30.5314284Z // begin inline asm 2026-02-21T09:18:30.5314441Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r601, %r602, %r603, %r604}, [%r585]; 2026-02-21T09:18:30.5314498Z // end inline asm 2026-02-21T09:18:30.5314554Z bar.sync 0; 2026-02-21T09:18:30.5314650Z st.shared.v4.b32 [%r46], {%r751, %r763, %r775, %r787}; 2026-02-21T09:18:30.5314771Z st.shared.v4.b32 [%r47], {%r799, %r811, %r823, %r835}; 2026-02-21T09:18:30.5314860Z st.shared.v4.b32 [%r48], {%r847, %r859, %r871, %r883}; 2026-02-21T09:18:30.5314947Z st.shared.v4.b32 [%r49], {%r895, %r907, %r919, %r931}; 2026-02-21T09:18:30.5315012Z bar.sync 0; 2026-02-21T09:18:30.5315070Z // begin inline asm 2026-02-21T09:18:30.5315219Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r606, %r607, %r608, %r609}, [%r570]; 2026-02-21T09:18:30.5315284Z // end inline asm 2026-02-21T09:18:30.5315343Z // begin inline asm 2026-02-21T09:18:30.5315490Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r611, %r612, %r613, %r614}, [%r575]; 2026-02-21T09:18:30.5315546Z // end inline asm 2026-02-21T09:18:30.5315613Z // begin inline asm 2026-02-21T09:18:30.5315763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r616, %r617, %r618, %r619}, [%r580]; 2026-02-21T09:18:30.5315819Z // end inline asm 2026-02-21T09:18:30.5315879Z // begin inline asm 2026-02-21T09:18:30.5316015Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r622, %r623, %r624}, [%r585]; 2026-02-21T09:18:30.5316066Z // end inline asm 2026-02-21T09:18:30.5316151Z bar.sync 0; 2026-02-21T09:18:30.5316233Z st.shared.v4.b32 [%r46], {%r754, %r766, %r778, %r790}; 2026-02-21T09:18:30.5316314Z st.shared.v4.b32 [%r47], {%r802, %r814, %r826, %r838}; 2026-02-21T09:18:30.5316397Z st.shared.v4.b32 [%r48], {%r850, %r862, %r874, %r886}; 2026-02-21T09:18:30.5316487Z st.shared.v4.b32 [%r49], {%r898, %r910, %r922, %r934}; 2026-02-21T09:18:30.5316541Z bar.sync 0; 2026-02-21T09:18:30.5316597Z // begin inline asm 2026-02-21T09:18:30.5316743Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r626, %r627, %r628, %r629}, [%r570]; 2026-02-21T09:18:30.5316798Z // end inline asm 2026-02-21T09:18:30.5316853Z // begin inline asm 2026-02-21T09:18:30.5316995Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r631, %r632, %r633, %r634}, [%r575]; 2026-02-21T09:18:30.5317079Z // end inline asm 2026-02-21T09:18:30.5317133Z // begin inline asm 2026-02-21T09:18:30.5317267Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r636, %r637, %r638, %r639}, [%r580]; 2026-02-21T09:18:30.5317327Z // end inline asm 2026-02-21T09:18:30.5317381Z // begin inline asm 2026-02-21T09:18:30.5317513Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r641, %r642, %r643, %r644}, [%r585]; 2026-02-21T09:18:30.5317573Z // end inline asm 
2026-02-21T09:18:30.5317651Z // begin inline asm 2026-02-21T09:18:30.5317753Z st.global.v4.b32 [ %rd98 + 0 ], { %r566, %r586, %r606, %r626 }; 2026-02-21T09:18:30.5317806Z // end inline asm 2026-02-21T09:18:30.5317867Z // begin inline asm 2026-02-21T09:18:30.5317964Z st.global.v4.b32 [ %rd99 + 0 ], { %r567, %r587, %r607, %r627 }; 2026-02-21T09:18:30.5318015Z // end inline asm 2026-02-21T09:18:30.5318073Z // begin inline asm 2026-02-21T09:18:30.5318199Z st.global.v4.b32 [ %rd100 + 0 ], { %r568, %r588, %r608, %r628 }; 2026-02-21T09:18:30.5318255Z // end inline asm 2026-02-21T09:18:30.5318307Z // begin inline asm 2026-02-21T09:18:30.5318410Z st.global.v4.b32 [ %rd101 + 0 ], { %r569, %r589, %r609, %r629 }; 2026-02-21T09:18:30.5318463Z // end inline asm 2026-02-21T09:18:30.5318517Z // begin inline asm 2026-02-21T09:18:30.5318621Z st.global.v4.b32 [ %rd102 + 0 ], { %r571, %r591, %r611, %r631 }; 2026-02-21T09:18:30.5318674Z // end inline asm 2026-02-21T09:18:30.5318728Z // begin inline asm 2026-02-21T09:18:30.5318829Z st.global.v4.b32 [ %rd103 + 0 ], { %r572, %r592, %r612, %r632 }; 2026-02-21T09:18:30.5318883Z // end inline asm 2026-02-21T09:18:30.5318935Z // begin inline asm 2026-02-21T09:18:30.5319027Z st.global.v4.b32 [ %rd104 + 0 ], { %r573, %r593, %r613, %r633 }; 2026-02-21T09:18:30.5319086Z // end inline asm 2026-02-21T09:18:30.5319138Z // begin inline asm 2026-02-21T09:18:30.5319230Z st.global.v4.b32 [ %rd105 + 0 ], { %r574, %r594, %r614, %r634 }; 2026-02-21T09:18:30.5319289Z // end inline asm 2026-02-21T09:18:30.5319344Z // begin inline asm 2026-02-21T09:18:30.5319433Z st.global.v4.b32 [ %rd106 + 0 ], { %r576, %r596, %r616, %r636 }; 2026-02-21T09:18:30.5319485Z // end inline asm 2026-02-21T09:18:30.5319547Z // begin inline asm 2026-02-21T09:18:30.5319636Z st.global.v4.b32 [ %rd107 + 0 ], { %r577, %r597, %r617, %r637 }; 2026-02-21T09:18:30.5319690Z // end inline asm 2026-02-21T09:18:30.5319750Z // begin inline asm 2026-02-21T09:18:30.5319841Z st.global.v4.b32 [ 
%rd108 + 0 ], { %r578, %r598, %r618, %r638 }; 2026-02-21T09:18:30.5319895Z // end inline asm 2026-02-21T09:18:30.5319955Z // begin inline asm 2026-02-21T09:18:30.5320045Z st.global.v4.b32 [ %rd109 + 0 ], { %r579, %r599, %r619, %r639 }; 2026-02-21T09:18:30.5320097Z // end inline asm 2026-02-21T09:18:30.5320150Z // begin inline asm 2026-02-21T09:18:30.5320247Z st.global.v4.b32 [ %rd110 + 0 ], { %r581, %r601, %r621, %r641 }; 2026-02-21T09:18:30.5320299Z // end inline asm 2026-02-21T09:18:30.5320352Z // begin inline asm 2026-02-21T09:18:30.5320451Z st.global.v4.b32 [ %rd111 + 0 ], { %r582, %r602, %r622, %r642 }; 2026-02-21T09:18:30.5320504Z // end inline asm 2026-02-21T09:18:30.5320557Z // begin inline asm 2026-02-21T09:18:30.5320647Z st.global.v4.b32 [ %rd112 + 0 ], { %r583, %r603, %r623, %r643 }; 2026-02-21T09:18:30.5320733Z // end inline asm 2026-02-21T09:18:30.5320787Z // begin inline asm 2026-02-21T09:18:30.5320877Z st.global.v4.b32 [ %rd113 + 0 ], { %r584, %r604, %r624, %r644 }; 2026-02-21T09:18:30.5320936Z // end inline asm 2026-02-21T09:18:30.5321105Z .loc 1 30 74 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:30:74 2026-02-21T09:18:30.5321163Z add.s32 %r936, %r936, 1; 2026-02-21T09:18:30.5321233Z setp.ne.b32 %p79, %r936, %r4; 2026-02-21T09:18:30.5321290Z @%p79 bra $L__BB0_2; 2026-02-21T09:18:30.5321344Z bra.uni $L__BB0_9; 2026-02-21T09:18:30.5321443Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:18:30.5321540Z // Child Loop BB0_5 Depth 2 2026-02-21T09:18:30.5321737Z .loc 1 36 35 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:36:35 2026-02-21T09:18:30.5321796Z shr.s32 %r326, %r936, 31; 2026-02-21T09:18:30.5321861Z shr.u32 %r327, %r326, 23; 2026-02-21T09:18:30.5321922Z add.s32 %r328, %r936, %r327; 2026-02-21T09:18:30.5322093Z .loc 1 39 45 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:39:45 2026-02-21T09:18:30.5322180Z and.b32 %r329, %r328, 65024; 2026-02-21T09:18:30.5322238Z sub.s32 %r330, %r936, %r329; 
2026-02-21T09:18:30.5322406Z .loc 1 39 64 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:39:64 2026-02-21T09:18:30.5322466Z cvt.u16.u32 %rs1, %r330; 2026-02-21T09:18:30.5322647Z .loc 1 40 51 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:40:51 2026-02-21T09:18:30.5322706Z shr.s16 %rs2, %rs1, 15; 2026-02-21T09:18:30.5322794Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:18:30.5322861Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T09:18:30.5322917Z shr.s16 %rs5, %rs4, 3; 2026-02-21T09:18:30.5323085Z .loc 1 39 64 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:39:64 2026-02-21T09:18:30.5323157Z and.b16 %rs6, %rs4, -8; 2026-02-21T09:18:30.5323213Z sub.s16 %rs7, %rs1, %rs6; 2026-02-21T09:18:30.5323380Z .loc 1 41 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:41:27 2026-02-21T09:18:30.5323438Z shl.b32 %r331, %r328, 1; 2026-02-21T09:18:30.5323507Z and.b32 %r57, %r331, -1024; 2026-02-21T09:18:30.5323571Z mul.wide.s16 %r58, %rs7, 128; 2026-02-21T09:18:30.5323631Z add.s32 %r332, %r58, %r57; 2026-02-21T09:18:30.5323808Z .loc 1 42 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:42:32 2026-02-21T09:18:30.5323864Z or.b32 %r333, %r332, %r6; 2026-02-21T09:18:30.5324033Z .loc 1 43 27 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:43:27 2026-02-21T09:18:30.5324103Z mul.wide.s16 %r360, %rs5, 128; 2026-02-21T09:18:30.5324268Z .loc 1 54 53 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:53 2026-02-21T09:18:30.5324323Z shl.b32 %r334, %r333, 10; 2026-02-21T09:18:30.5324490Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5324576Z shfl.sync.idx.b32 %r77, %r5, 0, 31, -1; 2026-02-21T09:18:30.5324637Z shl.b32 %r335, %r77, 21; 2026-02-21T09:18:30.5324723Z and.b32 %r336, %r335, 6291456; 2026-02-21T09:18:30.5324792Z add.s32 %r565, %r336, %r935; 2026-02-21T09:18:30.5324856Z mov.pred %p22, -1; 2026-02-21T09:18:30.5324913Z mov.b32 %r937, 0; 2026-02-21T09:18:30.5324970Z // 
begin inline asm 2026-02-21T09:18:30.5325297Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 0], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5325358Z // end inline asm 2026-02-21T09:18:30.5325415Z // begin inline asm 2026-02-21T09:18:30.5325728Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 16], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5325818Z // end inline asm 2026-02-21T09:18:30.5325875Z // begin inline asm 2026-02-21T09:18:30.5326171Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 32], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5326225Z // end inline asm 2026-02-21T09:18:30.5326278Z // begin inline asm 2026-02-21T09:18:30.5326618Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 48], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5326675Z // end inline asm 2026-02-21T09:18:30.5326732Z // begin inline asm 2026-02-21T09:18:30.5327026Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 64], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5327112Z // end inline asm 2026-02-21T09:18:30.5327170Z // begin inline asm 2026-02-21T09:18:30.5327457Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 80], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5327521Z // end inline asm 2026-02-21T09:18:30.5327605Z // begin inline asm 2026-02-21T09:18:30.5327905Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 96], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, 
%r937}; 2026-02-21T09:18:30.5327970Z // end inline asm 2026-02-21T09:18:30.5328026Z // begin inline asm 2026-02-21T09:18:30.5328343Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r565 + 112], {%r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937, %r937}; 2026-02-21T09:18:30.5328411Z // end inline asm 2026-02-21T09:18:30.5328468Z // begin inline asm 2026-02-21T09:18:30.5328543Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:18:30.5328607Z // end inline asm 2026-02-21T09:18:30.5328664Z bar.sync 0; 2026-02-21T09:18:30.5328846Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5328909Z add.s32 %r938, %r93, 65568; 2026-02-21T09:18:30.5328975Z // begin inline asm 2026-02-21T09:18:30.5329067Z @%p4 mbarrier.init.shared::cta.b64 [%r938], 1; 2026-02-21T09:18:30.5329122Z // end inline asm 2026-02-21T09:18:30.5329184Z bar.sync 0; 2026-02-21T09:18:30.5329244Z add.s32 %r280, %r93, 65576; 2026-02-21T09:18:30.5329301Z // begin inline asm 2026-02-21T09:18:30.5329389Z @%p4 mbarrier.init.shared::cta.b64 [%r280], 1; 2026-02-21T09:18:30.5329453Z // end inline asm 2026-02-21T09:18:30.5329512Z add.s32 %r281, %r93, 65536; 2026-02-21T09:18:30.5329569Z // begin inline asm 2026-02-21T09:18:30.5329659Z @%p4 mbarrier.init.shared::cta.b64 [%r281], 1; 2026-02-21T09:18:30.5329713Z // end inline asm 2026-02-21T09:18:30.5329768Z bar.sync 0; 2026-02-21T09:18:30.5329834Z add.s32 %r282, %r93, 65544; 2026-02-21T09:18:30.5329892Z // begin inline asm 2026-02-21T09:18:30.5329972Z @%p4 mbarrier.init.shared::cta.b64 [%r282], 1; 2026-02-21T09:18:30.5330026Z // end inline asm 2026-02-21T09:18:30.5330090Z bar.sync 0; 2026-02-21T09:18:30.5330149Z add.s32 %r283, %r93, 65552; 2026-02-21T09:18:30.5330208Z // begin inline asm 2026-02-21T09:18:30.5330294Z @%p4 mbarrier.init.shared::cta.b64 [%r283], 1; 2026-02-21T09:18:30.5330350Z // end inline asm 2026-02-21T09:18:30.5330403Z bar.sync 0; 2026-02-21T09:18:30.5330463Z 
add.s32 %r357, %r93, 65560; 2026-02-21T09:18:30.5330528Z // begin inline asm 2026-02-21T09:18:30.5330607Z @%p4 mbarrier.init.shared::cta.b64 [%r357], 1; 2026-02-21T09:18:30.5330662Z // end inline asm 2026-02-21T09:18:30.5330850Z .loc 1 54 60 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:60 2026-02-21T09:18:30.5330915Z or.b32 %r338, %r334, %r25; 2026-02-21T09:18:30.5331092Z .loc 1 54 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:32 2026-02-21T09:18:30.5331193Z mad.wide.s32 %rd42, %r338, 2, %rd15; 2026-02-21T09:18:30.5331255Z cvt.u64.u32 %rd57, %r25; 2026-02-21T09:18:30.5331315Z cvt.s64.s32 %rd7, %r334; 2026-02-21T09:18:30.5331376Z or.b64 %rd58, %rd7, %rd57; 2026-02-21T09:18:30.5331445Z shl.b64 %rd59, %rd58, 1; 2026-02-21T09:18:30.5331505Z add.s64 %rd8, %rd15, %rd59; 2026-02-21T09:18:30.5331565Z add.s64 %rd43, %rd8, 65536; 2026-02-21T09:18:30.5331635Z add.s64 %rd44, %rd8, 131072; 2026-02-21T09:18:30.5331695Z add.s64 %rd45, %rd8, 196608; 2026-02-21T09:18:30.5331752Z mov.b32 %r350, 16; 2026-02-21T09:18:30.5331936Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5332004Z // begin inline asm 2026-02-21T09:18:30.5332157Z cp.async.cg.shared.global [ %r285 + 0 ], [ %rd42 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5332215Z // end inline asm 2026-02-21T09:18:30.5332279Z // begin inline asm 2026-02-21T09:18:30.5332399Z cp.async.cg.shared.global [ %r287 + 0 ], [ %rd43 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5332457Z // end inline asm 2026-02-21T09:18:30.5332523Z // begin inline asm 2026-02-21T09:18:30.5332638Z cp.async.cg.shared.global [ %r289 + 0 ], [ %rd44 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5332718Z // end inline asm 2026-02-21T09:18:30.5332777Z // begin inline asm 2026-02-21T09:18:30.5332900Z cp.async.cg.shared.global [ %r291 + 0 ], [ %rd45 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5332960Z // end inline asm 2026-02-21T09:18:30.5333026Z cp.async.commit_group; 2026-02-21T09:18:30.5333219Z 
.loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5333276Z bar.sync 0; 2026-02-21T09:18:30.5333356Z // begin inline asm 2026-02-21T09:18:30.5333473Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r281], 8192; 2026-02-21T09:18:30.5333539Z // end inline asm 2026-02-21T09:18:30.5333720Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5333775Z bar.sync 0; 2026-02-21T09:18:30.5333852Z elect.sync %r339|%p43, -1; 2026-02-21T09:18:30.5333917Z and.pred %p37, %p1, %p43; 2026-02-21T09:18:30.5333974Z // begin inline asm 2026-02-21T09:18:30.5334250Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r120], [%rd77, {%r937, %r360}], [%r281]; 2026-02-21T09:18:30.5334307Z // end inline asm 2026-02-21T09:18:30.5334494Z .loc 1 54 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:32 2026-02-21T09:18:30.5334551Z add.s64 %rd47, %rd8, 64; 2026-02-21T09:18:30.5334615Z cvt.u64.u32 %rd60, %r31; 2026-02-21T09:18:30.5334712Z or.b64 %rd61, %rd7, %rd60; 2026-02-21T09:18:30.5334769Z shl.b64 %rd62, %rd61, 1; 2026-02-21T09:18:30.5334836Z add.s64 %rd63, %rd15, %rd62; 2026-02-21T09:18:30.5334892Z add.s64 %rd48, %rd63, 65536; 2026-02-21T09:18:30.5334950Z add.s64 %rd49, %rd63, 131072; 2026-02-21T09:18:30.5335015Z add.s64 %rd50, %rd63, 196608; 2026-02-21T09:18:30.5335183Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5335241Z // begin inline asm 2026-02-21T09:18:30.5335356Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd47 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5335421Z // end inline asm 2026-02-21T09:18:30.5335478Z // begin inline asm 2026-02-21T09:18:30.5335592Z cp.async.cg.shared.global [ %r300 + 0 ], [ %rd48 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5335656Z // end inline asm 2026-02-21T09:18:30.5335714Z // begin inline asm 2026-02-21T09:18:30.5335827Z cp.async.cg.shared.global [ %r302 + 0 ], [ %rd49 + 0 
], 0x10, %r350; 2026-02-21T09:18:30.5335885Z // end inline asm 2026-02-21T09:18:30.5335956Z // begin inline asm 2026-02-21T09:18:30.5336078Z cp.async.cg.shared.global [ %r304 + 0 ], [ %rd50 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5336138Z // end inline asm 2026-02-21T09:18:30.5336218Z cp.async.commit_group; 2026-02-21T09:18:30.5336408Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5336503Z bar.sync 0; 2026-02-21T09:18:30.5336571Z // begin inline asm 2026-02-21T09:18:30.5336698Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r282], 8192; 2026-02-21T09:18:30.5336756Z // end inline asm 2026-02-21T09:18:30.5336930Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5336994Z bar.sync 0; 2026-02-21T09:18:30.5337062Z elect.sync %r340|%p44, -1; 2026-02-21T09:18:30.5337126Z and.pred %p39, %p1, %p44; 2026-02-21T09:18:30.5337193Z add.s32 %r307, %r93, 40960; 2026-02-21T09:18:30.5337249Z mov.b32 %r308, 32; 2026-02-21T09:18:30.5338992Z // begin inline asm 2026-02-21T09:18:30.5343243Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r307], [%rd77, {%r308, %r360}], [%r282]; 2026-02-21T09:18:30.5343302Z // end inline asm 2026-02-21T09:18:30.5343497Z .loc 1 54 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:32 2026-02-21T09:18:30.5343561Z add.s64 %rd52, %rd8, 128; 2026-02-21T09:18:30.5343622Z cvt.u64.u32 %rd64, %r36; 2026-02-21T09:18:30.5343683Z or.b64 %rd65, %rd7, %rd64; 2026-02-21T09:18:30.5343813Z shl.b64 %rd66, %rd65, 1; 2026-02-21T09:18:30.5343877Z add.s64 %rd67, %rd15, %rd66; 2026-02-21T09:18:30.5343938Z add.s64 %rd53, %rd67, 65536; 2026-02-21T09:18:30.5344005Z add.s64 %rd54, %rd67, 131072; 2026-02-21T09:18:30.5344065Z add.s64 %rd55, %rd67, 196608; 2026-02-21T09:18:30.5344248Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5344305Z // begin inline asm 
2026-02-21T09:18:30.5344505Z cp.async.cg.shared.global [ %r311 + 0 ], [ %rd52 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5344589Z // end inline asm 2026-02-21T09:18:30.5344651Z // begin inline asm 2026-02-21T09:18:30.5344808Z cp.async.cg.shared.global [ %r313 + 0 ], [ %rd53 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5344866Z // end inline asm 2026-02-21T09:18:30.5344923Z // begin inline asm 2026-02-21T09:18:30.5345043Z cp.async.cg.shared.global [ %r315 + 0 ], [ %rd54 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5345103Z // end inline asm 2026-02-21T09:18:30.5345166Z // begin inline asm 2026-02-21T09:18:30.5345291Z cp.async.cg.shared.global [ %r317 + 0 ], [ %rd55 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5345364Z // end inline asm 2026-02-21T09:18:30.5345437Z cp.async.commit_group; 2026-02-21T09:18:30.5345632Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5345705Z bar.sync 0; 2026-02-21T09:18:30.5345766Z // begin inline asm 2026-02-21T09:18:30.5345886Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r283], 8192; 2026-02-21T09:18:30.5345946Z // end inline asm 2026-02-21T09:18:30.5346109Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5346161Z bar.sync 0; 2026-02-21T09:18:30.5346222Z elect.sync %r341|%p45, -1; 2026-02-21T09:18:30.5346291Z and.pred %p41, %p1, %p45; 2026-02-21T09:18:30.5346348Z add.s32 %r320, %r93, 49152; 2026-02-21T09:18:30.5346404Z mov.b32 %r321, 64; 2026-02-21T09:18:30.5346466Z // begin inline asm 2026-02-21T09:18:30.5346710Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r320], [%rd77, {%r321, %r360}], [%r283]; 2026-02-21T09:18:30.5346763Z // end inline asm 2026-02-21T09:18:30.5346934Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5346995Z cp.async.wait_group 2; 2026-02-21T09:18:30.5347046Z bar.sync 0; 2026-02-21T09:18:30.5347210Z .loc 1 49 42 // 
ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5347272Z // begin inline asm 2026-02-21T09:18:30.5347324Z 2026-02-21T09:18:30.5347374Z { 2026-02-21T09:18:30.5347441Z .reg .pred complete; 2026-02-21T09:18:30.5347512Z waitLoop: 2026-02-21T09:18:30.5347631Z mbarrier.try_wait.parity.shared.b64 complete, [%r281], %r937; 2026-02-21T09:18:30.5347693Z @!complete bra.uni waitLoop; 2026-02-21T09:18:30.5347742Z } 2026-02-21T09:18:30.5347746Z 2026-02-21T09:18:30.5347807Z // end inline asm 2026-02-21T09:18:30.5347966Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5348026Z setp.ne.b32 %p46, %r77, 0; 2026-02-21T09:18:30.5348090Z @%p46 bra $L__BB0_4; 2026-02-21T09:18:30.5348187Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:18:30.5348248Z elect.sync %r346|%p48, -1; 2026-02-21T09:18:30.5348312Z mov.b32 %r343, 136314896; 2026-02-21T09:18:30.5348448Z mov.pred %p47, 0; 2026-02-21T09:18:30.5348541Z // begin inline asm 2026-02-21T09:18:30.5348682Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r935 + 0 ], %rd68, %rd69, %r343, %p47; 2026-02-21T09:18:30.5348743Z // end inline asm 2026-02-21T09:18:30.5348794Z // begin inline asm 2026-02-21T09:18:30.5348926Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r935 + 0 ], %rd70, %rd71, %r343, %p22; 2026-02-21T09:18:30.5348988Z // end inline asm 2026-02-21T09:18:30.5349044Z add.s32 %r348, %r93, 65568; 2026-02-21T09:18:30.5349132Z cvt.u64.u32 %rd72, %r348; 2026-02-21T09:18:30.5349194Z // begin inline asm 2026-02-21T09:18:30.5349313Z @%p48 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd72]; 2026-02-21T09:18:30.5349367Z // end inline asm 2026-02-21T09:18:30.5349466Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:18:30.5349667Z .loc 1 0 0 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:0 2026-02-21T09:18:30.5349727Z or.b32 %r59, %r332, %r7; 2026-02-21T09:18:30.5349784Z or.b32 %r60, %r332, %r8; 2026-02-21T09:18:30.5349847Z or.b32 
%r61, %r332, %r9; 2026-02-21T09:18:30.5349903Z or.b32 %r62, %r332, %r10; 2026-02-21T09:18:30.5349957Z or.b32 %r63, %r332, %r11; 2026-02-21T09:18:30.5350017Z or.b32 %r64, %r332, %r12; 2026-02-21T09:18:30.5350072Z or.b32 %r65, %r332, %r13; 2026-02-21T09:18:30.5350127Z or.b32 %r66, %r332, %r14; 2026-02-21T09:18:30.5350180Z or.b32 %r67, %r332, %r15; 2026-02-21T09:18:30.5350243Z or.b32 %r68, %r332, %r16; 2026-02-21T09:18:30.5350296Z or.b32 %r69, %r332, %r17; 2026-02-21T09:18:30.5350349Z or.b32 %r70, %r332, %r18; 2026-02-21T09:18:30.5350407Z or.b32 %r71, %r332, %r19; 2026-02-21T09:18:30.5350460Z or.b32 %r72, %r332, %r20; 2026-02-21T09:18:30.5350513Z or.b32 %r73, %r332, %r21; 2026-02-21T09:18:30.5350566Z or.b32 %r74, %r332, %r22; 2026-02-21T09:18:30.5350626Z or.b32 %r76, %r360, %r23; 2026-02-21T09:18:30.5350796Z .loc 1 54 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:32 2026-02-21T09:18:30.5350854Z add.s64 %rd73, %rd8, 192; 2026-02-21T09:18:30.5350916Z cvt.u64.u32 %rd79, %r41; 2026-02-21T09:18:30.5350973Z add.s64 %rd80, %rd7, %rd79; 2026-02-21T09:18:30.5351027Z shl.b64 %rd81, %rd80, 1; 2026-02-21T09:18:30.5351085Z add.s64 %rd82, %rd15, %rd81; 2026-02-21T09:18:30.5351148Z add.s64 %rd74, %rd82, 65536; 2026-02-21T09:18:30.5351205Z add.s64 %rd75, %rd82, 131072; 2026-02-21T09:18:30.5351261Z add.s64 %rd76, %rd82, 196608; 2026-02-21T09:18:30.5351429Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5351481Z bar.sync 0; 2026-02-21T09:18:30.5351535Z // begin inline asm 2026-02-21T09:18:30.5351649Z cp.async.cg.shared.global [ %r349 + 0 ], [ %rd73 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5351702Z // end inline asm 2026-02-21T09:18:30.5351754Z // begin inline asm 2026-02-21T09:18:30.5351865Z cp.async.cg.shared.global [ %r351 + 0 ], [ %rd74 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5351926Z // end inline asm 2026-02-21T09:18:30.5351979Z // begin inline asm 2026-02-21T09:18:30.5352084Z cp.async.cg.shared.global [ %r353 + 
0 ], [ %rd75 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5352143Z // end inline asm 2026-02-21T09:18:30.5352195Z // begin inline asm 2026-02-21T09:18:30.5352299Z cp.async.cg.shared.global [ %r355 + 0 ], [ %rd76 + 0 ], 0x10, %r350; 2026-02-21T09:18:30.5352350Z // end inline asm 2026-02-21T09:18:30.5352416Z cp.async.commit_group; 2026-02-21T09:18:30.5352580Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5352632Z // begin inline asm 2026-02-21T09:18:30.5352741Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r357], 8192; 2026-02-21T09:18:30.5352792Z // end inline asm 2026-02-21T09:18:30.5352955Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5353015Z bar.sync 0; 2026-02-21T09:18:30.5353110Z elect.sync %r367|%p55, -1; 2026-02-21T09:18:30.5353197Z and.pred %p53, %p1, %p55; 2026-02-21T09:18:30.5353254Z add.s32 %r358, %r93, 57344; 2026-02-21T09:18:30.5353314Z mov.b32 %r359, 96; 2026-02-21T09:18:30.5353368Z // begin inline asm 2026-02-21T09:18:30.5353607Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r358], [%rd77, {%r359, %r360}], [%r357]; 2026-02-21T09:18:30.5353667Z // end inline asm 2026-02-21T09:18:30.5353852Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5353910Z add.s32 %r368, %r6, %r57; 2026-02-21T09:18:30.5353974Z add.s32 %r369, %r368, %r58; 2026-02-21T09:18:30.5354030Z shl.b32 %r370, %r369, 10; 2026-02-21T09:18:30.5354096Z mad.wide.s32 %rd370, %r370, 2, %rd6; 2026-02-21T09:18:30.5354152Z add.s32 %r371, %r54, %r370; 2026-02-21T09:18:30.5354215Z cvt.u64.u32 %rd10, %r371; 2026-02-21T09:18:30.5354287Z mov.b32 %r942, 1; 2026-02-21T09:18:30.5354342Z mov.b32 %r941, 3; 2026-02-21T09:18:30.5354406Z mov.b64 %rd371, 0; 2026-02-21T09:18:30.5354460Z mov.b32 %r939, %r937; 2026-02-21T09:18:30.5354515Z mov.b32 %r940, %r937; 2026-02-21T09:18:30.5354576Z mov.b32 %r943, %r937; 
2026-02-21T09:18:30.5354646Z bra.uni $L__BB0_5; 2026-02-21T09:18:30.5354802Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:18:30.5354996Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5355083Z setp.lt.u64 %p65, %rd371, 896; 2026-02-21T09:18:30.5355273Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5355338Z // begin inline asm 2026-02-21T09:18:30.5355406Z 2026-02-21T09:18:30.5355464Z { 2026-02-21T09:18:30.5355530Z .reg .pred complete; 2026-02-21T09:18:30.5355592Z waitLoop: 2026-02-21T09:18:30.5355736Z mbarrier.try_wait.parity.shared.b64 complete, [%r938], %r937; 2026-02-21T09:18:30.5355810Z @!complete bra.uni waitLoop; 2026-02-21T09:18:30.5355868Z } 2026-02-21T09:18:30.5355874Z 2026-02-21T09:18:30.5355941Z // end inline asm 2026-02-21T09:18:30.5356005Z add.s32 %r411, %r942, 1; 2026-02-21T09:18:30.5356073Z setp.gt.s32 %p68, %r411, 1; 2026-02-21T09:18:30.5356144Z selp.b32 %r942, 0, %r411, %p68; 2026-02-21T09:18:30.5356218Z selp.b32 %r412, 1, 0, %p68; 2026-02-21T09:18:30.5356288Z xor.b32 %r90, %r943, %r412; 2026-02-21T09:18:30.5356453Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5356520Z add.s32 %r413, %r941, 1; 2026-02-21T09:18:30.5356580Z setp.gt.s32 %p69, %r413, 3; 2026-02-21T09:18:30.5356643Z selp.b32 %r941, 0, %r413, %p69; 2026-02-21T09:18:30.5356709Z add.s64 %rd97, %rd10, %rd371; 2026-02-21T09:18:30.5356868Z .loc 1 54 32 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:32 2026-02-21T09:18:30.5356942Z add.s64 %rd92, %rd370, -131072; 2026-02-21T09:18:30.5357015Z add.s64 %rd93, %rd370, -65536; 2026-02-21T09:18:30.5357091Z cvt.u32.u64 %r414, %rd97; 2026-02-21T09:18:30.5357165Z mad.wide.s32 %rd95, %r414, 2, %rd15; 2026-02-21T09:18:30.5357356Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5357430Z shl.b32 %r415, %r941, 13; 
2026-02-21T09:18:30.5357492Z add.s32 %r417, %r93, %r415; 2026-02-21T09:18:30.5357553Z bar.sync 0; 2026-02-21T09:18:30.5357623Z add.s32 %r398, %r417, %r26; 2026-02-21T09:18:30.5357716Z selp.b32 %r399, 16, 0, %p65; 2026-02-21T09:18:30.5357788Z // begin inline asm 2026-02-21T09:18:30.5357909Z cp.async.cg.shared.global [ %r398 + 0 ], [ %rd92 + 0 ], 0x10, %r399; 2026-02-21T09:18:30.5357978Z // end inline asm 2026-02-21T09:18:30.5358040Z add.s32 %r400, %r398, 2048; 2026-02-21T09:18:30.5358100Z // begin inline asm 2026-02-21T09:18:30.5358229Z cp.async.cg.shared.global [ %r400 + 0 ], [ %rd93 + 0 ], 0x10, %r399; 2026-02-21T09:18:30.5358325Z // end inline asm 2026-02-21T09:18:30.5358415Z add.s32 %r402, %r398, 4096; 2026-02-21T09:18:30.5358476Z // begin inline asm 2026-02-21T09:18:30.5358613Z cp.async.cg.shared.global [ %r402 + 0 ], [ %rd370 + 0 ], 0x10, %r399; 2026-02-21T09:18:30.5358672Z // end inline asm 2026-02-21T09:18:30.5358734Z add.s32 %r404, %r398, 6144; 2026-02-21T09:18:30.5358800Z // begin inline asm 2026-02-21T09:18:30.5358920Z cp.async.cg.shared.global [ %r404 + 0 ], [ %rd95 + 0 ], 0x10, %r399; 2026-02-21T09:18:30.5358979Z // end inline asm 2026-02-21T09:18:30.5359076Z cp.async.commit_group; 2026-02-21T09:18:30.5359275Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5359339Z shl.b32 %r418, %r941, 3; 2026-02-21T09:18:30.5359402Z add.s32 %r419, %r93, %r418; 2026-02-21T09:18:30.5359474Z add.s32 %r410, %r419, 65536; 2026-02-21T09:18:30.5359542Z and.pred %p63, %p4, %p65; 2026-02-21T09:18:30.5359638Z // begin inline asm 2026-02-21T09:18:30.5359771Z @%p63 mbarrier.arrive.expect_tx.shared.b64 _, [%r410], 8192; 2026-02-21T09:18:30.5359832Z // end inline asm 2026-02-21T09:18:30.5360020Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5360084Z add.s32 %r407, %r417, 32768; 2026-02-21T09:18:30.5360149Z bar.sync 0; 2026-02-21T09:18:30.5360218Z elect.sync %r420|%p70, -1; 
2026-02-21T09:18:30.5360286Z and.pred %p71, %p65, %p70; 2026-02-21T09:18:30.5360359Z and.pred %p64, %p1, %p71; 2026-02-21T09:18:30.5360423Z cvt.u32.u64 %r421, %rd371; 2026-02-21T09:18:30.5360486Z add.s32 %r408, %r421, 128; 2026-02-21T09:18:30.5360547Z // begin inline asm 2026-02-21T09:18:30.5360826Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r407], [%rd77, {%r408, %r360}], [%r410]; 2026-02-21T09:18:30.5360886Z // end inline asm 2026-02-21T09:18:30.5361077Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5361151Z add.s64 %rd370, %rd370, 64; 2026-02-21T09:18:30.5361224Z setp.lt.u64 %p72, %rd371, 960; 2026-02-21T09:18:30.5361286Z add.s64 %rd371, %rd371, 32; 2026-02-21T09:18:30.5361353Z mov.b32 %r937, %r943; 2026-02-21T09:18:30.5361413Z mov.b32 %r938, %r422; 2026-02-21T09:18:30.5361473Z mov.b32 %r943, %r90; 2026-02-21T09:18:30.5361535Z @%p72 bra $L__BB0_5; 2026-02-21T09:18:30.5361601Z bra.uni $L__BB0_8; 2026-02-21T09:18:30.5361710Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:18:30.5361815Z // => This Inner Loop Header: Depth=2 2026-02-21T09:18:30.5361885Z add.s32 %r374, %r940, 1; 2026-02-21T09:18:30.5361952Z setp.gt.s32 %p57, %r374, 3; 2026-02-21T09:18:30.5362020Z selp.b32 %r940, 0, %r374, %p57; 2026-02-21T09:18:30.5362092Z selp.b32 %r375, 1, 0, %p57; 2026-02-21T09:18:30.5362162Z xor.b32 %r939, %r939, %r375; 2026-02-21T09:18:30.5362330Z .loc 1 54 85 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:54:85 2026-02-21T09:18:30.5362394Z cp.async.wait_group 2; 2026-02-21T09:18:30.5362454Z bar.sync 0; 2026-02-21T09:18:30.5362620Z .loc 1 49 42 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:49:42 2026-02-21T09:18:30.5362674Z shl.b32 %r376, %r940, 3; 2026-02-21T09:18:30.5362738Z add.s32 %r378, %r93, %r376; 2026-02-21T09:18:30.5362798Z add.s32 %r372, %r378, 65536; 2026-02-21T09:18:30.5362854Z // begin inline asm 2026-02-21T09:18:30.5362905Z 
2026-02-21T09:18:30.5362964Z { 2026-02-21T09:18:30.5363027Z .reg .pred complete; 2026-02-21T09:18:30.5363082Z waitLoop: 2026-02-21T09:18:30.5363211Z mbarrier.try_wait.parity.shared.b64 complete, [%r372], %r939; 2026-02-21T09:18:30.5363277Z @!complete bra.uni waitLoop; 2026-02-21T09:18:30.5363328Z } 2026-02-21T09:18:30.5363332Z 2026-02-21T09:18:30.5363397Z // end inline asm 2026-02-21T09:18:30.5363457Z shl.b32 %r379, %r942, 3; 2026-02-21T09:18:30.5363518Z add.s32 %r380, %r93, %r379; 2026-02-21T09:18:30.5363649Z add.s32 %r422, %r380, 65568; 2026-02-21T09:18:30.5363832Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5363891Z @%p46 bra $L__BB0_7; 2026-02-21T09:18:30.5363986Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:18:30.5364168Z .loc 1 55 44 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:55:44 2026-02-21T09:18:30.5364247Z shl.b32 %r385, %r940, 13; 2026-02-21T09:18:30.5364306Z add.s32 %r387, %r93, %r385; 2026-02-21T09:18:30.5364370Z add.s32 %r388, %r387, 32768; 2026-02-21T09:18:30.5364539Z .loc 1 56 52 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:56:52 2026-02-21T09:18:30.5364604Z elect.sync %r389|%p59, -1; 2026-02-21T09:18:30.5364695Z bfe.u32 %r390, %r387, 4, 14; 2026-02-21T09:18:30.5364774Z cvt.u64.u32 %rd88, %r390; 2026-02-21T09:18:30.5364884Z or.b64 %rd83, %rd88, -9223371899382267904; 2026-02-21T09:18:30.5364955Z bfe.u32 %r391, %r388, 4, 14; 2026-02-21T09:18:30.5365026Z cvt.u64.u32 %rd89, %r391; 2026-02-21T09:18:30.5365107Z or.b64 %rd84, %rd89, -9223371899382267904; 2026-02-21T09:18:30.5365170Z mov.b32 %r382, 136314896; 2026-02-21T09:18:30.5365235Z mov.pred %p58, -1; 2026-02-21T09:18:30.5365305Z // begin inline asm 2026-02-21T09:18:30.5365468Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r935 + 0 ], %rd83, %rd84, %r382, %p58; 2026-02-21T09:18:30.5365530Z // end inline asm 2026-02-21T09:18:30.5365601Z add.s32 %r392, %r387, 32; 2026-02-21T09:18:30.5365666Z bfe.u32 %r393, 
%r392, 4, 14; 2026-02-21T09:18:30.5365729Z cvt.u64.u32 %rd90, %r393; 2026-02-21T09:18:30.5365813Z or.b64 %rd85, %rd90, -9223371899382267904; 2026-02-21T09:18:30.5365878Z add.s32 %r394, %r387, 32800; 2026-02-21T09:18:30.5365941Z bfe.u32 %r395, %r394, 4, 14; 2026-02-21T09:18:30.5366003Z cvt.u64.u32 %rd91, %r395; 2026-02-21T09:18:30.5366089Z or.b64 %rd86, %rd91, -9223371899382267904; 2026-02-21T09:18:30.5366153Z // begin inline asm 2026-02-21T09:18:30.5366310Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r935 + 0 ], %rd85, %rd86, %r382, %p58; 2026-02-21T09:18:30.5366387Z // end inline asm 2026-02-21T09:18:30.5366447Z cvt.u64.u32 %rd87, %r422; 2026-02-21T09:18:30.5366506Z // begin inline asm 2026-02-21T09:18:30.5366644Z @%p59 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd87]; 2026-02-21T09:18:30.5366700Z // end inline asm 2026-02-21T09:18:30.5366758Z bra.uni $L__BB0_7; 2026-02-21T09:18:30.5366845Z $L__BB0_9: // %._crit_edge 2026-02-21T09:18:30.5367039Z .loc 1 30 4 // ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py:30:4 2026-02-21T09:18:30.5367096Z bar.sync 0; 2026-02-21T09:18:30.5367153Z // begin inline asm 2026-02-21T09:18:30.5367281Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r935, 128; 2026-02-21T09:18:30.5367338Z // end inline asm 2026-02-21T09:18:30.5367393Z ret; 2026-02-21T09:18:30.5367451Z $L__tmp0: 2026-02-21T09:18:30.5367517Z $L__func_end0: 2026-02-21T09:18:30.5367601Z // -- End function 2026-02-21T09:18:30.5367653Z } 2026-02-21T09:18:30.5367894Z .file 1 "/tmp/torchinductor_root/a2/ca2mvzteldnzsctpj2dbl4mg3gilsikrz6apprjqn7ekfykxpgzn.py" 2026-02-21T09:18:30.5367956Z .section .debug_abbrev 2026-02-21T09:18:30.5368006Z { 2026-02-21T09:18:30.5368092Z .b8 1 // Abbreviation Code 2026-02-21T09:18:30.5368187Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:18:30.5368267Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:18:30.5368347Z .b8 37 // DW_AT_producer 2026-02-21T09:18:30.5368430Z .b8 8 // DW_FORM_string 2026-02-21T09:18:30.5368504Z .b8 19 // 
DW_AT_language 2026-02-21T09:18:30.5368582Z .b8 5 // DW_FORM_data2 2026-02-21T09:18:30.5368694Z .b8 3 // DW_AT_name 2026-02-21T09:18:30.5368796Z .b8 8 // DW_FORM_string 2026-02-21T09:18:30.5368875Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:18:30.5368959Z .b8 6 // DW_FORM_data4 2026-02-21T09:18:30.5369036Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:18:30.5369109Z .b8 8 // DW_FORM_string 2026-02-21T09:18:30.5369210Z .b8 0 // EOM(1) 2026-02-21T09:18:30.5369290Z .b8 0 // EOM(2) 2026-02-21T09:18:30.5369355Z .b8 0 // EOM(3) 2026-02-21T09:18:30.5369406Z } 2026-02-21T09:18:30.5369473Z .section .debug_info 2026-02-21T09:18:30.5369526Z { 2026-02-21T09:18:30.5369610Z .b32 104 // Length of Unit 2026-02-21T09:18:30.5369720Z .b8 2 // DWARF version number 2026-02-21T09:18:30.5369784Z .b8 0 2026-02-21T09:18:30.5369899Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:18:30.5369989Z .b8 8 // Address Size (in bytes) 2026-02-21T09:18:30.5370096Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:18:30.5370176Z .b8 116 // DW_AT_producer 2026-02-21T09:18:30.5370229Z .b8 114 2026-02-21T09:18:30.5370288Z .b8 105 2026-02-21T09:18:30.5370339Z .b8 116 2026-02-21T09:18:30.5370390Z .b8 111 2026-02-21T09:18:30.5370439Z .b8 110 2026-02-21T09:18:30.5370498Z .b8 0 2026-02-21T09:18:30.5370571Z .b8 2 // DW_AT_language 2026-02-21T09:18:30.5370621Z .b8 0 2026-02-21T09:18:30.5370704Z .b8 99 // DW_AT_name 2026-02-21T09:18:30.5370757Z .b8 97 2026-02-21T09:18:30.5370809Z .b8 50 2026-02-21T09:18:30.5370859Z .b8 109 2026-02-21T09:18:30.5370918Z .b8 118 2026-02-21T09:18:30.5370969Z .b8 122 2026-02-21T09:18:30.5371021Z .b8 116 2026-02-21T09:18:30.5371078Z .b8 101 2026-02-21T09:18:30.5371129Z .b8 108 2026-02-21T09:18:30.5371181Z .b8 100 2026-02-21T09:18:30.5371233Z .b8 110 2026-02-21T09:18:30.5371293Z .b8 122 2026-02-21T09:18:30.5371345Z .b8 115 2026-02-21T09:18:30.5371396Z .b8 99 2026-02-21T09:18:30.5371448Z .b8 116 2026-02-21T09:18:30.5371508Z .b8 112 2026-02-21T09:18:30.5371559Z .b8 
106 2026-02-21T09:18:30.5371610Z .b8 50 2026-02-21T09:18:30.5371668Z .b8 100 2026-02-21T09:18:30.5371720Z .b8 98 2026-02-21T09:18:30.5371770Z .b8 108 2026-02-21T09:18:30.5371820Z .b8 52 2026-02-21T09:18:30.5371876Z .b8 109 2026-02-21T09:18:30.5371927Z .b8 103 2026-02-21T09:18:30.5371976Z .b8 51 2026-02-21T09:18:30.5372033Z .b8 103 2026-02-21T09:18:30.5372084Z .b8 105 2026-02-21T09:18:30.5372134Z .b8 108 2026-02-21T09:18:30.5372183Z .b8 115 2026-02-21T09:18:30.5372240Z .b8 105 2026-02-21T09:18:30.5372289Z .b8 107 2026-02-21T09:18:30.5372338Z .b8 114 2026-02-21T09:18:30.5372397Z .b8 122 2026-02-21T09:18:30.5372447Z .b8 54 2026-02-21T09:18:30.5372498Z .b8 97 2026-02-21T09:18:30.5372548Z .b8 112 2026-02-21T09:18:30.5372607Z .b8 112 2026-02-21T09:18:30.5372657Z .b8 114 2026-02-21T09:18:30.5372706Z .b8 106 2026-02-21T09:18:30.5372756Z .b8 113 2026-02-21T09:18:30.5372815Z .b8 110 2026-02-21T09:18:30.5372867Z .b8 55 2026-02-21T09:18:30.5372917Z .b8 101 2026-02-21T09:18:30.5372972Z .b8 107 2026-02-21T09:18:30.5373022Z .b8 102 2026-02-21T09:18:30.5373072Z .b8 121 2026-02-21T09:18:30.5373120Z .b8 107 2026-02-21T09:18:30.5373178Z .b8 120 2026-02-21T09:18:30.5373229Z .b8 112 2026-02-21T09:18:30.5373280Z .b8 103 2026-02-21T09:18:30.5373338Z .b8 122 2026-02-21T09:18:30.5373389Z .b8 110 2026-02-21T09:18:30.5373438Z .b8 46 2026-02-21T09:18:30.5373488Z .b8 112 2026-02-21T09:18:30.5373545Z .b8 121 2026-02-21T09:18:30.5373605Z .b8 0 2026-02-21T09:18:30.5373693Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:18:30.5373772Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:18:30.5373845Z .b8 116 2026-02-21T09:18:30.5373915Z .b8 109 2026-02-21T09:18:30.5373963Z .b8 112 2026-02-21T09:18:30.5374018Z .b8 47 2026-02-21T09:18:30.5374067Z .b8 116 2026-02-21T09:18:30.5374113Z .b8 111 2026-02-21T09:18:30.5374161Z .b8 114 2026-02-21T09:18:30.5374215Z .b8 99 2026-02-21T09:18:30.5374262Z .b8 104 2026-02-21T09:18:30.5374309Z .b8 105 2026-02-21T09:18:30.5374365Z .b8 110 2026-02-21T09:18:30.5374411Z .b8 100 
2026-02-21T09:18:30.5374458Z .b8 117 2026-02-21T09:18:30.5374506Z .b8 99 2026-02-21T09:18:30.5374583Z .b8 116 2026-02-21T09:18:30.5374632Z .b8 111 2026-02-21T09:18:30.5374737Z .b8 114 2026-02-21T09:18:30.5374793Z .b8 95 2026-02-21T09:18:30.5374842Z .b8 114 2026-02-21T09:18:30.5374891Z .b8 111 2026-02-21T09:18:30.5374938Z .b8 111 2026-02-21T09:18:30.5374994Z .b8 116 2026-02-21T09:18:30.5375046Z .b8 47 2026-02-21T09:18:30.5375101Z .b8 97 2026-02-21T09:18:30.5375165Z .b8 50 2026-02-21T09:18:30.5375221Z .b8 0 2026-02-21T09:18:30.5375276Z } 2026-02-21T09:18:30.5375388Z .section .debug_macinfo { } 2026-02-21T09:18:30.5375396Z 2026-02-21T09:18:30.5375493Z ================================================================ 2026-02-21T09:18:30.5375610Z please share the reproducer above with Triton project. 2026-02-21T09:18:33.4449272Z 2026-02-21T09:18:33.4455970Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 18.0 configs/s 2026-02-21T09:18:35.1527859Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 767.3 2026-02-21T09:18:35.1530536Z configs/s 2026-02-21T09:18:35.2549106Z [78s] Generation 4 complete: 2026-02-21T09:18:35.2549444Z error=18 2026-02-21T09:18:35.2549605Z ok=73 2026-02-21T09:18:35.2549765Z min=0.0266 2026-02-21T09:18:35.2549903Z mid=0.0512 2026-02-21T09:18:35.2550037Z max=7.6278 2026-02-21T09:18:35.2550179Z best={'block_sizes': [128, 512, 32], 2026-02-21T09:18:35.2550427Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:18:35.2550705Z 'l2_groupings': [32], 2026-02-21T09:18:35.2550886Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:18:35.2551079Z 'loop_orders': [[1, 0]], 2026-02-21T09:18:35.2551227Z 'num_stages': 4, 2026-02-21T09:18:35.2551367Z 'num_warps': 4, 2026-02-21T09:18:35.2551510Z 'pid_type': 'flat', 2026-02-21T09:18:35.2551659Z 'range_flattens': [None, False], 2026-02-21T09:18:35.2551842Z 'range_multi_buffers': [None, False], 2026-02-21T09:18:35.2552018Z 'range_num_stages': [0, 
0], 2026-02-21T09:18:35.2552191Z 'range_unroll_factors': [0, 0], 2026-02-21T09:18:35.2552365Z 'range_warp_specializes': [None, False]} 2026-02-21T09:18:35.2567875Z [78s] Fitting surrogate: 457 points, 457 targets 2026-02-21T09:18:36.6027256Z [79s] Generation 5 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:18:43.7384214Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 4.0 configs/s 2026-02-21T09:18:48.6359875Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 18.1 configs/s 2026-02-21T09:18:50.8897400Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 449.4 2026-02-21T09:18:50.8901499Z configs/s 2026-02-21T09:18:51.0564652Z [93s] Generation 5 complete: 2026-02-21T09:18:51.0570252Z error=9 2026-02-21T09:18:51.0571848Z ok=81 2026-02-21T09:18:51.0572025Z min=0.0245 2026-02-21T09:18:51.0572156Z mid=0.0388 2026-02-21T09:18:51.0572290Z max=4.0224 2026-02-21T09:18:51.0572429Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:18:51.0572715Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:18:51.0572950Z 'l2_groupings': [16], 2026-02-21T09:18:51.0573126Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:18:51.0573340Z 'loop_orders': [[1, 0]], 2026-02-21T09:18:51.0573495Z 'num_stages': 6, 2026-02-21T09:18:51.0573643Z 'num_warps': 8, 2026-02-21T09:18:51.0573781Z 'pid_type': 'flat', 2026-02-21T09:18:51.0573950Z 'range_flattens': [None, True], 2026-02-21T09:18:51.0574539Z 'range_multi_buffers': [None, None], 2026-02-21T09:18:51.0574904Z 'range_num_stages': [0, 0], 2026-02-21T09:18:51.0575078Z 'range_unroll_factors': [0, 0], 2026-02-21T09:18:51.0575255Z 'range_warp_specializes': [None, False]} 2026-02-21T09:18:51.0591348Z [93s] Fitting surrogate: 547 points, 547 targets 2026-02-21T09:18:52.2945534Z [95s] Generation 6 starting: 72 neighbors, 4 active search path(s) 2026-02-21T09:19:05.8616532Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73/73 1.0 configs/s 
2026-02-21T09:19:09.1964961Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 73/73 22.3 configs/s 2026-02-21T09:19:11.5233287Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 435.4 2026-02-21T09:19:11.5235074Z configs/s 2026-02-21T09:19:11.6952187Z [114s] Generation 6 complete: 2026-02-21T09:19:11.6956048Z error=19 2026-02-21T09:19:11.6960847Z ok=57 2026-02-21T09:19:11.6962759Z min=0.0245 2026-02-21T09:19:11.6962957Z mid=0.0348 2026-02-21T09:19:11.6963099Z max=3.8175 2026-02-21T09:19:11.6963234Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:19:11.6963480Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:19:11.6963701Z 'l2_groupings': [16], 2026-02-21T09:19:11.6963890Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:19:11.6964083Z 'loop_orders': [[1, 0]], 2026-02-21T09:19:11.6964239Z 'num_stages': 6, 2026-02-21T09:19:11.6964374Z 'num_warps': 8, 2026-02-21T09:19:11.6964531Z 'pid_type': 'flat', 2026-02-21T09:19:11.6964861Z 'range_flattens': [None, True], 2026-02-21T09:19:11.6965041Z 'range_multi_buffers': [None, None], 2026-02-21T09:19:11.6965226Z 'range_num_stages': [0, 0], 2026-02-21T09:19:11.6965387Z 'range_unroll_factors': [0, 0], 2026-02-21T09:19:11.6965572Z 'range_warp_specializes': [None, False]} 2026-02-21T09:19:11.6977404Z [114s] Fitting surrogate: 623 points, 623 targets 2026-02-21T09:19:12.9108952Z [115s] Generation 7 starting: 65 neighbors, 4 active search path(s) 2026-02-21T09:19:27.0639288Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66/66 0.9 configs/s 2026-02-21T09:19:29.0601850Z [131s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:19:29.0602141Z 2026-02-21T09:19:29.0605737Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:19:29.0607040Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:19:29.0607296Z `ptxas` stderr: 2026-02-21T09:19:29.0607720Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 296 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:19:29.0608261Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:19:29.0608410Z 2026-02-21T09:19:29.0608828Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp1tvw3f3y.ptx -o /tmp/tmp1tvw3f3y.ptx.o 2026-02-21T09:19:29.0609307Z 2026-02-21T09:19:29.0609439Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:19:29.0609623Z 2026-02-21T09:19:29.0609709Z ================================================================ 2026-02-21T09:19:29.0609917Z Internal Triton PTX codegen error 2026-02-21T09:19:29.0610093Z `ptxas` stderr: 2026-02-21T09:19:29.0610516Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 296 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:19:29.0611244Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:19:29.0611527Z 2026-02-21T09:19:29.0611921Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp1tvw3f3y.ptx -o /tmp/tmp1tvw3f3y.ptx.o 2026-02-21T09:19:29.0612353Z 2026-02-21T09:19:29.0612356Z 2026-02-21T09:19:29.0612416Z // 2026-02-21T09:19:29.0612563Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:19:29.0612734Z // 2026-02-21T09:19:29.0612798Z 2026-02-21T09:19:29.0612924Z .version 8.7 2026-02-21T09:19:29.0613062Z .target sm_100a 2026-02-21T09:19:29.0613191Z .address_size 64 2026-02-21T09:19:29.0613270Z 2026-02-21T09:19:29.0613393Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:19:29.0613634Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:19:29.0613843Z // @_helion_matmul 2026-02-21T09:19:29.0614033Z .visible .entry _helion_matmul( 2026-02-21T09:19:29.0614246Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:19:29.0614496Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:19:29.0614786Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:19:29.0615045Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:19:29.0615301Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:19:29.0615519Z ) 2026-02-21T09:19:29.0615639Z .reqntid 128 2026-02-21T09:19:29.0615781Z .maxnreg 32 2026-02-21T09:19:29.0615904Z { 2026-02-21T09:19:29.0616040Z .reg .pred %p<87>; 2026-02-21T09:19:29.0616208Z .reg .b16 %rs<11>; 2026-02-21T09:19:29.0616341Z .reg .b32 %r<643>; 2026-02-21T09:19:29.0616476Z .reg .b64 %rd<252>; 2026-02-21T09:19:29.0616611Z $L__func_begin0: 2026-02-21T09:19:29.0616689Z 2026-02-21T09:19:29.0616746Z // %bb.0: 2026-02-21T09:19:29.0616975Z .loc 1 19 0 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:19 2026-02-21T09:19:29.0617270Z mov.u32 %r1, 
%tid.x; 2026-02-21T09:19:29.0617438Z ld.param.b64 %rd17, [_helion_matmul_param_1]; 2026-02-21T09:19:29.0617638Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:19:29.0617800Z mov.b32 %r81, global_smem; 2026-02-21T09:19:29.0618134Z // begin inline asm 2026-02-21T09:19:29.0618375Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r81], 64; 2026-02-21T09:19:29.0618614Z // end inline asm 2026-02-21T09:19:29.0618775Z ld.param.b64 %rd34, [_helion_matmul_param_3]; 2026-02-21T09:19:29.0618958Z bar.sync 0; 2026-02-21T09:19:29.0619106Z ld.shared.b32 %r634, [global_smem]; 2026-02-21T09:19:29.0619270Z bar.sync 0; 2026-02-21T09:19:29.0619401Z // begin inline asm 2026-02-21T09:19:29.0619602Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:19:29.0619823Z // end inline asm 2026-02-21T09:19:29.0620076Z .loc 1 21 67 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:21:67 2026-02-21T09:19:29.0620360Z mov.u32 %r90, %ctaid.x; 2026-02-21T09:19:29.0620513Z mov.u32 %r91, %ctaid.y; 2026-02-21T09:19:29.0620654Z mov.u32 %r92, %ctaid.z; 2026-02-21T09:19:29.0620802Z mov.u32 %r93, %nctaid.x; 2026-02-21T09:19:29.0620947Z mov.u32 %r94, %nctaid.y; 2026-02-21T09:19:29.0621101Z mad.lo.s32 %r95, %r92, %r94, %r91; 2026-02-21T09:19:29.0621275Z mad.lo.s32 %r96, %r95, %r93, %r90; 2026-02-21T09:19:29.0621436Z shl.b32 %r97, %r96, 7; 2026-02-21T09:19:29.0621589Z cvt.s64.s32 %rd35, %r97; 2026-02-21T09:19:29.0621740Z add.s64 %rd31, %rd34, %rd35; 2026-02-21T09:19:29.0621903Z shl.b32 %r98, %r1, 2; 2026-02-21T09:19:29.0622045Z add.s32 %r82, %r81, %r98; 2026-02-21T09:19:29.0622195Z mov.b32 %r83, 0; 2026-02-21T09:19:29.0622323Z // begin inline asm 2026-02-21T09:19:29.0622475Z @%p1 st.shared.b32 [ %r82 + 0 ], %r83; 2026-02-21T09:19:29.0622639Z // end inline asm 2026-02-21T09:19:29.0622778Z bar.warp.sync -1; 2026-02-21T09:19:29.0622922Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:19:29.0623123Z cvt.u64.u32 %rd16, %r81; 2026-02-21T09:19:29.0623318Z // begin inline asm 
2026-02-21T09:19:29.0623594Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd16 + 0 ], %rd17; 2026-02-21T09:19:29.0623868Z // end inline asm 2026-02-21T09:19:29.0623998Z // begin inline asm 2026-02-21T09:19:29.0624220Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:19:29.0624465Z // end inline asm 2026-02-21T09:19:29.0624601Z mov.b32 %r84, 32; 2026-02-21T09:19:29.0624784Z // begin inline asm 2026-02-21T09:19:29.0625049Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r84; 2026-02-21T09:19:29.0625324Z // end inline asm 2026-02-21T09:19:29.0625451Z mov.b32 %r85, 64; 2026-02-21T09:19:29.0625588Z // begin inline asm 2026-02-21T09:19:29.0625809Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r85; 2026-02-21T09:19:29.0626075Z // end inline asm 2026-02-21T09:19:29.0626203Z mov.b32 %r86, 1024; 2026-02-21T09:19:29.0626347Z // begin inline asm 2026-02-21T09:19:29.0626582Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r86; 2026-02-21T09:19:29.0626844Z // end inline asm 2026-02-21T09:19:29.0626981Z mov.b32 %r87, 8192; 2026-02-21T09:19:29.0627118Z // begin inline asm 2026-02-21T09:19:29.0627354Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r87; 2026-02-21T09:19:29.0627618Z // end inline asm 2026-02-21T09:19:29.0627758Z mov.b64 %rd24, 2048; 2026-02-21T09:19:29.0627894Z // begin inline asm 2026-02-21T09:19:29.0628144Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd16 + 0 ], 0x0, %rd24; 2026-02-21T09:19:29.0628423Z // end inline asm 2026-02-21T09:19:29.0628550Z mov.b32 %r88, 1; 2026-02-21T09:19:29.0628686Z // begin inline asm 2026-02-21T09:19:29.0628929Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r88; 2026-02-21T09:19:29.0629217Z // end inline asm 2026-02-21T09:19:29.0629344Z // begin inline asm 
2026-02-21T09:19:29.0629590Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r88; 2026-02-21T09:19:29.0629874Z // end inline asm 2026-02-21T09:19:29.0630000Z // begin inline asm 2026-02-21T09:19:29.0630228Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x6; 2026-02-21T09:19:29.0630479Z // end inline asm 2026-02-21T09:19:29.0630613Z // begin inline asm 2026-02-21T09:19:29.0630853Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:19:29.0631133Z // end inline asm 2026-02-21T09:19:29.0631268Z // begin inline asm 2026-02-21T09:19:29.0631495Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x2; 2026-02-21T09:19:29.0631756Z // end inline asm 2026-02-21T09:19:29.0631883Z // begin inline asm 2026-02-21T09:19:29.0632106Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:19:29.0632349Z // end inline asm 2026-02-21T09:19:29.0632484Z // begin inline asm 2026-02-21T09:19:29.0632814Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd31 + 0 ], [ %rd16 + 0 ], 0x80; 2026-02-21T09:19:29.0633178Z // end inline asm 2026-02-21T09:19:29.0633314Z // begin inline asm 2026-02-21T09:19:29.0633512Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd31 + 0 ], 0x80; 2026-02-21T09:19:29.0633763Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:19:29.0633944Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:19:29.0634120Z // end inline asm 2026-02-21T09:19:29.0634244Z bar.sync 0; 2026-02-21T09:19:29.0634385Z cvta.global.u64 %rd94, %rd31; 2026-02-21T09:19:29.0634656Z .loc 1 28 35 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:28:35 2026-02-21T09:19:29.0635004Z shl.b32 %r635, %r90, 2; 2026-02-21T09:19:29.0635283Z .loc 1 29 37 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:29:37 2026-02-21T09:19:29.0635660Z add.s32 %r99, %r635, 4; 
2026-02-21T09:19:29.0635965Z .loc 1 29 49 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:29:49 2026-02-21T09:19:29.0636323Z min.s32 %r4, %r99, 1024; 2026-02-21T09:19:29.0636578Z .loc 1 30 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:30:52 2026-02-21T09:19:29.0636865Z setp.ge.s32 %p21, %r635, %r4; 2026-02-21T09:19:29.0637022Z @%p21 bra $L__BB0_9; 2026-02-21T09:19:29.0637185Z // %bb.1: // %.lr.ph 2026-02-21T09:19:29.0637510Z .loc 1 0 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:0:52 2026-02-21T09:19:29.0637813Z ld.param.b64 %rd15, [_helion_matmul_param_2]; 2026-02-21T09:19:29.0638015Z ld.param.b64 %rd14, [_helion_matmul_param_0]; 2026-02-21T09:19:29.0638201Z shr.u32 %r5, %r1, 5; 2026-02-21T09:19:29.0638337Z shl.b32 %r6, %r1, 3; 2026-02-21T09:19:29.0638480Z and.b32 %r7, %r6, 56; 2026-02-21T09:19:29.0638629Z bfe.u32 %r8, %r1, 2, 5; 2026-02-21T09:19:29.0638774Z shr.u32 %r100, %r1, 3; 2026-02-21T09:19:29.0638928Z bfe.u32 %r9, %r1, 3, 4; 2026-02-21T09:19:29.0639068Z or.b32 %r10, %r9, 16; 2026-02-21T09:19:29.0639216Z or.b32 %r11, %r9, 32; 2026-02-21T09:19:29.0639351Z or.b32 %r12, %r9, 48; 2026-02-21T09:19:29.0639492Z or.b32 %r13, %r9, 64; 2026-02-21T09:19:29.0639624Z or.b32 %r14, %r9, 80; 2026-02-21T09:19:29.0639763Z or.b32 %r15, %r9, 96; 2026-02-21T09:19:29.0639895Z or.b32 %r16, %r100, 112; 2026-02-21T09:19:29.0640046Z and.b32 %r17, %r6, 24; 2026-02-21T09:19:29.0640190Z shl.b32 %r101, %r1, 4; 2026-02-21T09:19:29.0640335Z and.b32 %r102, %r101, 2032; 2026-02-21T09:19:29.0640516Z and.b32 %r103, %r1, 24; 2026-02-21T09:19:29.0640656Z shl.b32 %r104, %r103, 1; 2026-02-21T09:19:29.0640807Z xor.b32 %r18, %r102, %r104; 2026-02-21T09:19:29.0640959Z add.s32 %r204, %r81, %r18; 2026-02-21T09:19:29.0641114Z add.s32 %r206, %r204, 2048; 2026-02-21T09:19:29.0641260Z add.s32 %r208, %r204, 4096; 2026-02-21T09:19:29.0641413Z add.s32 %r210, %r204, 6144; 2026-02-21T09:19:29.0641557Z or.b32 %r23, %r17, 32; 2026-02-21T09:19:29.0641702Z add.s32 %r217, 
%r204, 8192; 2026-02-21T09:19:29.0641856Z add.s32 %r219, %r204, 10240; 2026-02-21T09:19:29.0642005Z add.s32 %r221, %r204, 12288; 2026-02-21T09:19:29.0642161Z add.s32 %r223, %r204, 14336; 2026-02-21T09:19:29.0642303Z or.b32 %r28, %r17, 64; 2026-02-21T09:19:29.0642451Z add.s32 %r230, %r204, 16384; 2026-02-21T09:19:29.0642595Z add.s32 %r232, %r204, 18432; 2026-02-21T09:19:29.0642746Z add.s32 %r234, %r204, 20480; 2026-02-21T09:19:29.0642887Z add.s32 %r236, %r204, 22528; 2026-02-21T09:19:29.0643036Z or.b32 %r33, %r17, 96; 2026-02-21T09:19:29.0643172Z add.s32 %r243, %r204, 24576; 2026-02-21T09:19:29.0643322Z add.s32 %r245, %r204, 26624; 2026-02-21T09:19:29.0643467Z add.s32 %r247, %r204, 28672; 2026-02-21T09:19:29.0643609Z add.s32 %r249, %r204, 30720; 2026-02-21T09:19:29.0643762Z or.b32 %r38, %r17, 128; 2026-02-21T09:19:29.0643902Z add.s32 %r256, %r204, 32768; 2026-02-21T09:19:29.0644053Z add.s32 %r258, %r204, 34816; 2026-02-21T09:19:29.0644196Z add.s32 %r260, %r204, 36864; 2026-02-21T09:19:29.0644345Z add.s32 %r262, %r204, 38912; 2026-02-21T09:19:29.0644494Z bfe.u32 %r106, %r81, 4, 14; 2026-02-21T09:19:29.0644651Z cvt.u64.u32 %rd36, %r106; 2026-02-21T09:19:29.0644871Z or.b64 %rd85, %rd36, -9223371899382267904; 2026-02-21T09:19:29.0645043Z add.s32 %r107, %r81, 49152; 2026-02-21T09:19:29.0645197Z bfe.u32 %r108, %r107, 4, 14; 2026-02-21T09:19:29.0645348Z cvt.u64.u32 %rd37, %r108; 2026-02-21T09:19:29.0645516Z or.b64 %rd86, %rd37, -9223371899399045120; 2026-02-21T09:19:29.0645687Z add.s32 %r109, %r81, 32; 2026-02-21T09:19:29.0645838Z bfe.u32 %r110, %r109, 4, 14; 2026-02-21T09:19:29.0645984Z cvt.u64.u32 %rd38, %r110; 2026-02-21T09:19:29.0646153Z or.b64 %rd87, %rd38, -9223371899382267904; 2026-02-21T09:19:29.0646323Z add.s32 %r111, %r81, 49184; 2026-02-21T09:19:29.0646480Z bfe.u32 %r112, %r111, 4, 14; 2026-02-21T09:19:29.0646710Z cvt.u64.u32 %rd39, %r112; 2026-02-21T09:19:29.0646927Z or.b64 %rd88, %rd39, -9223371899399045120; 2026-02-21T09:19:29.0647115Z or.b32 %r43, %r17, 160; 
2026-02-21T09:19:29.0647267Z add.s32 %r297, %r204, 40960; 2026-02-21T09:19:29.0647426Z add.s32 %r299, %r204, 43008; 2026-02-21T09:19:29.0647578Z add.s32 %r301, %r204, 45056; 2026-02-21T09:19:29.0647735Z add.s32 %r303, %r204, 47104; 2026-02-21T09:19:29.0647884Z and.b32 %r113, %r101, 1968; 2026-02-21T09:19:29.0648042Z bfe.s32 %r114, %r1, 2, 1; 2026-02-21T09:19:29.0648244Z and.b32 %r115, %r114, 2112; 2026-02-21T09:19:29.0648403Z or.b32 %r116, %r115, %r113; 2026-02-21T09:19:29.0648574Z add.s32 %r48, %r81, %r116; 2026-02-21T09:19:29.0648733Z xor.b32 %r117, %r116, 64; 2026-02-21T09:19:29.0648899Z add.s32 %r49, %r81, %r117; 2026-02-21T09:19:29.0649058Z shl.b32 %r118, %r1, 6; 2026-02-21T09:19:29.0649214Z and.b32 %r119, %r118, 2112; 2026-02-21T09:19:29.0649375Z shl.b32 %r120, %r103, 5; 2026-02-21T09:19:29.0649546Z and.b32 %r121, %r6, 48; 2026-02-21T09:19:29.0649702Z shl.b32 %r122, %r1, 1; 2026-02-21T09:19:29.0649863Z and.b32 %r123, %r122, 192; 2026-02-21T09:19:29.0650028Z or.b32 %r124, %r119, %r120; 2026-02-21T09:19:29.0650188Z or.b32 %r125, %r121, %r123; 2026-02-21T09:19:29.0650358Z xor.b32 %r126, %r124, %r125; 2026-02-21T09:19:29.0650514Z add.s32 %r453, %r81, %r126; 2026-02-21T09:19:29.0650677Z add.s32 %r458, %r453, 1024; 2026-02-21T09:19:29.0650955Z .loc 1 30 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:30:52 2026-02-21T09:19:29.0651262Z and.b32 %r127, %r1, 3; 2026-02-21T09:19:29.0651425Z mad.wide.u32 %rd40, %r127, 16, %rd14; 2026-02-21T09:19:29.0651614Z add.s64 %rd6, %rd40, 196992; 2026-02-21T09:19:29.0651780Z shl.b32 %r52, %r8, 10; 2026-02-21T09:19:29.0651932Z bra.uni $L__BB0_2; 2026-02-21T09:19:29.0652133Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:29.0652467Z .loc 1 0 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:0:52 2026-02-21T09:19:29.0652774Z mov.b32 %r372, 1; 2026-02-21T09:19:29.0653029Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0653327Z // begin inline asm 
2026-02-21T09:19:29.0653470Z 2026-02-21T09:19:29.0653596Z { 2026-02-21T09:19:29.0653732Z .reg .pred complete; 2026-02-21T09:19:29.0653884Z waitLoop: 2026-02-21T09:19:29.0654086Z mbarrier.try_wait.parity.shared.b64 complete, [%r371], %r372; 2026-02-21T09:19:29.0654329Z @!complete bra.uni waitLoop; 2026-02-21T09:19:29.0654494Z } 2026-02-21T09:19:29.0654561Z 2026-02-21T09:19:29.0654620Z // end inline asm 2026-02-21T09:19:29.0654953Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0655244Z cp.async.wait_group 0; 2026-02-21T09:19:29.0655396Z bar.sync 0; 2026-02-21T09:19:29.0655531Z // begin inline asm 2026-02-21T09:19:29.0655709Z @%p4 mbarrier.inval.shared::cta.b64 [%r198]; 2026-02-21T09:19:29.0655911Z // end inline asm 2026-02-21T09:19:29.0656048Z bar.sync 0; 2026-02-21T09:19:29.0656188Z // begin inline asm 2026-02-21T09:19:29.0656356Z @%p4 mbarrier.inval.shared::cta.b64 [%r199]; 2026-02-21T09:19:29.0656552Z // end inline asm 2026-02-21T09:19:29.0656686Z bar.sync 0; 2026-02-21T09:19:29.0656823Z // begin inline asm 2026-02-21T09:19:29.0656992Z @%p4 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T09:19:29.0657174Z // end inline asm 2026-02-21T09:19:29.0657306Z bar.sync 0; 2026-02-21T09:19:29.0657423Z // begin inline asm 2026-02-21T09:19:29.0657581Z @%p4 mbarrier.inval.shared::cta.b64 [%r201]; 2026-02-21T09:19:29.0657756Z // end inline asm 2026-02-21T09:19:29.0657887Z bar.sync 0; 2026-02-21T09:19:29.0658007Z // begin inline asm 2026-02-21T09:19:29.0658164Z @%p4 mbarrier.inval.shared::cta.b64 [%r202]; 2026-02-21T09:19:29.0658335Z // end inline asm 2026-02-21T09:19:29.0658467Z bar.sync 0; 2026-02-21T09:19:29.0658629Z // begin inline asm 2026-02-21T09:19:29.0658816Z @%p4 mbarrier.inval.shared::cta.b64 [%r305]; 2026-02-21T09:19:29.0659026Z // end inline asm 2026-02-21T09:19:29.0659157Z add.s32 %r379, %r81, 73776; 2026-02-21T09:19:29.0659312Z // begin inline asm 2026-02-21T09:19:29.0659462Z @%p4 
mbarrier.inval.shared::cta.b64 [%r379]; 2026-02-21T09:19:29.0659641Z // end inline asm 2026-02-21T09:19:29.0659764Z bar.sync 0; 2026-02-21T09:19:29.0659894Z // begin inline asm 2026-02-21T09:19:29.0660041Z @%p4 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T09:19:29.0660255Z // end inline asm 2026-02-21T09:19:29.0660504Z .loc 1 59 45 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:59:45 2026-02-21T09:19:29.0660786Z shl.b32 %r522, %r57, 13; 2026-02-21T09:19:29.0660939Z shl.b32 %r523, %r58, 13; 2026-02-21T09:19:29.0661083Z shl.b32 %r524, %r59, 13; 2026-02-21T09:19:29.0661231Z shl.b32 %r525, %r60, 13; 2026-02-21T09:19:29.0661369Z shl.b32 %r526, %r61, 13; 2026-02-21T09:19:29.0661516Z shl.b32 %r527, %r62, 13; 2026-02-21T09:19:29.0661658Z shl.b32 %r528, %r63, 13; 2026-02-21T09:19:29.0661807Z shl.b32 %r529, %r64, 13; 2026-02-21T09:19:29.0662063Z .loc 1 59 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:59:52 2026-02-21T09:19:29.0662341Z add.s32 %r530, %r522, %r56; 2026-02-21T09:19:29.0662499Z add.s32 %r531, %r523, %r56; 2026-02-21T09:19:29.0662645Z add.s32 %r532, %r524, %r56; 2026-02-21T09:19:29.0662797Z add.s32 %r533, %r525, %r56; 2026-02-21T09:19:29.0662942Z add.s32 %r534, %r526, %r56; 2026-02-21T09:19:29.0663091Z add.s32 %r535, %r527, %r56; 2026-02-21T09:19:29.0663234Z add.s32 %r536, %r528, %r56; 2026-02-21T09:19:29.0663386Z add.s32 %r537, %r529, %r56; 2026-02-21T09:19:29.0663638Z .loc 1 59 24 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:59:24 2026-02-21T09:19:29.0663932Z mad.wide.s32 %rd114, %r530, 2, %rd15; 2026-02-21T09:19:29.0664118Z mad.wide.s32 %rd115, %r531, 2, %rd15; 2026-02-21T09:19:29.0664290Z mad.wide.s32 %rd116, %r532, 2, %rd15; 2026-02-21T09:19:29.0664465Z mad.wide.s32 %rd117, %r533, 2, %rd15; 2026-02-21T09:19:29.0664643Z mad.wide.s32 %rd118, %r534, 2, %rd15; 2026-02-21T09:19:29.0664905Z mad.wide.s32 %rd119, %r535, 2, %rd15; 2026-02-21T09:19:29.0665079Z mad.wide.s32 %rd120, %r536, 2, %rd15; 
2026-02-21T09:19:29.0665264Z mad.wide.s32 %rd121, %r537, 2, %rd15; 2026-02-21T09:19:29.0665559Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0665856Z // begin inline asm 2026-02-21T09:19:29.0666239Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r381, %r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396}, [%r448 + 0]; 2026-02-21T09:19:29.0666624Z // end inline asm 2026-02-21T09:19:29.0666759Z // begin inline asm 2026-02-21T09:19:29.0667094Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r398, %r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413}, [%r448 + 16]; 2026-02-21T09:19:29.0667476Z // end inline asm 2026-02-21T09:19:29.0667616Z // begin inline asm 2026-02-21T09:19:29.0667950Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r415, %r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430}, [%r448 + 32]; 2026-02-21T09:19:29.0668335Z // end inline asm 2026-02-21T09:19:29.0668463Z // begin inline asm 2026-02-21T09:19:29.0668800Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r432, %r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447}, [%r448 + 48]; 2026-02-21T09:19:29.0669182Z // end inline asm 2026-02-21T09:19:29.0669308Z // begin inline asm 2026-02-21T09:19:29.0669460Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:19:29.0669616Z // end inline asm 2026-02-21T09:19:29.0669755Z cvt.u64.u32 %rd122, %r381; 2026-02-21T09:19:29.0669904Z cvt.u64.u32 %rd123, %r382; 2026-02-21T09:19:29.0670097Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:19:29.0670279Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:19:29.0670574Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0670851Z mov.b64 {%r538, %r539}, %rd125; 2026-02-21T09:19:29.0671021Z cvt.rn.f16x2.f32 %r540, %r539, %r538; 2026-02-21T09:19:29.0671296Z .loc 1 56 
52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0671574Z cvt.u64.u32 %rd126, %r383; 2026-02-21T09:19:29.0671782Z cvt.u64.u32 %rd127, %r384; 2026-02-21T09:19:29.0671932Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:19:29.0672091Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:19:29.0672351Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0672633Z mov.b64 {%r541, %r542}, %rd129; 2026-02-21T09:19:29.0672801Z cvt.rn.f16x2.f32 %r543, %r542, %r541; 2026-02-21T09:19:29.0673075Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0673360Z cvt.u64.u32 %rd130, %r385; 2026-02-21T09:19:29.0673511Z cvt.u64.u32 %rd131, %r386; 2026-02-21T09:19:29.0673665Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:19:29.0673817Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:19:29.0674086Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0674359Z mov.b64 {%r544, %r545}, %rd133; 2026-02-21T09:19:29.0674527Z cvt.rn.f16x2.f32 %r546, %r545, %r544; 2026-02-21T09:19:29.0674848Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0675131Z cvt.u64.u32 %rd134, %r387; 2026-02-21T09:19:29.0675285Z cvt.u64.u32 %rd135, %r388; 2026-02-21T09:19:29.0675440Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:19:29.0675609Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:19:29.0675892Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0676201Z mov.b64 {%r547, %r548}, %rd137; 2026-02-21T09:19:29.0676379Z cvt.rn.f16x2.f32 %r549, %r548, %r547; 2026-02-21T09:19:29.0676669Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0676974Z cvt.u64.u32 %rd138, %r389; 2026-02-21T09:19:29.0677132Z cvt.u64.u32 %rd139, %r390; 2026-02-21T09:19:29.0677296Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:19:29.0677456Z or.b64 %rd141, 
%rd138, %rd140; 2026-02-21T09:19:29.0677745Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0678042Z mov.b64 {%r550, %r551}, %rd141; 2026-02-21T09:19:29.0678219Z cvt.rn.f16x2.f32 %r552, %r551, %r550; 2026-02-21T09:19:29.0678530Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0678824Z cvt.u64.u32 %rd142, %r391; 2026-02-21T09:19:29.0678992Z cvt.u64.u32 %rd143, %r392; 2026-02-21T09:19:29.0679151Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:19:29.0679318Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:19:29.0679596Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0679909Z mov.b64 {%r553, %r554}, %rd145; 2026-02-21T09:19:29.0680082Z cvt.rn.f16x2.f32 %r555, %r554, %r553; 2026-02-21T09:19:29.0680369Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0680672Z cvt.u64.u32 %rd146, %r393; 2026-02-21T09:19:29.0680826Z cvt.u64.u32 %rd147, %r394; 2026-02-21T09:19:29.0680986Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:19:29.0681146Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:19:29.0681430Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0681722Z mov.b64 {%r556, %r557}, %rd149; 2026-02-21T09:19:29.0681932Z cvt.rn.f16x2.f32 %r558, %r557, %r556; 2026-02-21T09:19:29.0682286Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0682591Z cvt.u64.u32 %rd150, %r395; 2026-02-21T09:19:29.0682757Z cvt.u64.u32 %rd151, %r396; 2026-02-21T09:19:29.0682911Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:19:29.0683079Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:19:29.0683359Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0683702Z mov.b64 {%r559, %r560}, %rd153; 2026-02-21T09:19:29.0683885Z cvt.rn.f16x2.f32 %r561, %r560, %r559; 2026-02-21T09:19:29.0684179Z .loc 1 
56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0684484Z cvt.u64.u32 %rd154, %r398; 2026-02-21T09:19:29.0684645Z cvt.u64.u32 %rd155, %r399; 2026-02-21T09:19:29.0684860Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:19:29.0685023Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:19:29.0685309Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0685617Z mov.b64 {%r562, %r563}, %rd157; 2026-02-21T09:19:29.0685793Z cvt.rn.f16x2.f32 %r564, %r563, %r562; 2026-02-21T09:19:29.0686095Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0686393Z cvt.u64.u32 %rd158, %r400; 2026-02-21T09:19:29.0686556Z cvt.u64.u32 %rd159, %r401; 2026-02-21T09:19:29.0686715Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:19:29.0686884Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:19:29.0687165Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0687466Z mov.b64 {%r565, %r566}, %rd161; 2026-02-21T09:19:29.0687641Z cvt.rn.f16x2.f32 %r567, %r566, %r565; 2026-02-21T09:19:29.0687951Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0688248Z cvt.u64.u32 %rd162, %r402; 2026-02-21T09:19:29.0688402Z cvt.u64.u32 %rd163, %r403; 2026-02-21T09:19:29.0688559Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:19:29.0688714Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:19:29.0688992Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0689279Z mov.b64 {%r568, %r569}, %rd165; 2026-02-21T09:19:29.0689449Z cvt.rn.f16x2.f32 %r570, %r569, %r568; 2026-02-21T09:19:29.0689736Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0690021Z cvt.u64.u32 %rd166, %r404; 2026-02-21T09:19:29.0690182Z cvt.u64.u32 %rd167, %r405; 2026-02-21T09:19:29.0690335Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:19:29.0690508Z or.b64 %rd169, 
%rd166, %rd168; 2026-02-21T09:19:29.0690764Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0691056Z mov.b64 {%r571, %r572}, %rd169; 2026-02-21T09:19:29.0691229Z cvt.rn.f16x2.f32 %r573, %r572, %r571; 2026-02-21T09:19:29.0691511Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0691802Z cvt.u64.u32 %rd170, %r406; 2026-02-21T09:19:29.0691953Z cvt.u64.u32 %rd171, %r407; 2026-02-21T09:19:29.0692114Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:19:29.0692269Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:19:29.0692550Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0692837Z mov.b64 {%r574, %r575}, %rd173; 2026-02-21T09:19:29.0693007Z cvt.rn.f16x2.f32 %r576, %r575, %r574; 2026-02-21T09:19:29.0693292Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0693582Z cvt.u64.u32 %rd174, %r408; 2026-02-21T09:19:29.0693742Z cvt.u64.u32 %rd175, %r409; 2026-02-21T09:19:29.0693979Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:19:29.0694171Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:19:29.0694434Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0694779Z mov.b64 {%r577, %r578}, %rd177; 2026-02-21T09:19:29.0694958Z cvt.rn.f16x2.f32 %r579, %r578, %r577; 2026-02-21T09:19:29.0695251Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0695593Z cvt.u64.u32 %rd178, %r410; 2026-02-21T09:19:29.0695750Z cvt.u64.u32 %rd179, %r411; 2026-02-21T09:19:29.0695912Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:19:29.0696072Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:19:29.0696357Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0696642Z mov.b64 {%r580, %r581}, %rd181; 2026-02-21T09:19:29.0696815Z cvt.rn.f16x2.f32 %r582, %r581, %r580; 2026-02-21T09:19:29.0697108Z .loc 1 
56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0697401Z cvt.u64.u32 %rd182, %r412; 2026-02-21T09:19:29.0697563Z cvt.u64.u32 %rd183, %r413; 2026-02-21T09:19:29.0697714Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:19:29.0697880Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:19:29.0698148Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0698445Z mov.b64 {%r583, %r584}, %rd185; 2026-02-21T09:19:29.0698617Z cvt.rn.f16x2.f32 %r585, %r584, %r583; 2026-02-21T09:19:29.0698900Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0699196Z cvt.u64.u32 %rd186, %r415; 2026-02-21T09:19:29.0699347Z cvt.u64.u32 %rd187, %r416; 2026-02-21T09:19:29.0699506Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:19:29.0699663Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:19:29.0699942Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0700229Z mov.b64 {%r586, %r587}, %rd189; 2026-02-21T09:19:29.0700399Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T09:19:29.0700684Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0700979Z cvt.u64.u32 %rd190, %r417; 2026-02-21T09:19:29.0701136Z cvt.u64.u32 %rd191, %r418; 2026-02-21T09:19:29.0701292Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:19:29.0701449Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:19:29.0701709Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0701991Z mov.b64 {%r589, %r590}, %rd193; 2026-02-21T09:19:29.0702156Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T09:19:29.0702421Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0702701Z cvt.u64.u32 %rd194, %r419; 2026-02-21T09:19:29.0702846Z cvt.u64.u32 %rd195, %r420; 2026-02-21T09:19:29.0702998Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:19:29.0703147Z or.b64 %rd197, 
%rd194, %rd196; 2026-02-21T09:19:29.0703413Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0703685Z mov.b64 {%r592, %r593}, %rd197; 2026-02-21T09:19:29.0703846Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T09:19:29.0704116Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0704394Z cvt.u64.u32 %rd198, %r421; 2026-02-21T09:19:29.0704546Z cvt.u64.u32 %rd199, %r422; 2026-02-21T09:19:29.0704736Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:19:29.0704896Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:19:29.0705154Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0705521Z mov.b64 {%r595, %r596}, %rd201; 2026-02-21T09:19:29.0705725Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T09:19:29.0706033Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0706332Z cvt.u64.u32 %rd202, %r423; 2026-02-21T09:19:29.0706488Z cvt.u64.u32 %rd203, %r424; 2026-02-21T09:19:29.0706650Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:19:29.0706809Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:19:29.0707114Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0707413Z mov.b64 {%r598, %r599}, %rd205; 2026-02-21T09:19:29.0707590Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T09:19:29.0707885Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0708177Z cvt.u64.u32 %rd206, %r425; 2026-02-21T09:19:29.0708338Z cvt.u64.u32 %rd207, %r426; 2026-02-21T09:19:29.0708493Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:19:29.0708663Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:19:29.0708940Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0709244Z mov.b64 {%r601, %r602}, %rd209; 2026-02-21T09:19:29.0709419Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T09:19:29.0709705Z .loc 1 
56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0710006Z cvt.u64.u32 %rd210, %r427; 2026-02-21T09:19:29.0710164Z cvt.u64.u32 %rd211, %r428; 2026-02-21T09:19:29.0710325Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:19:29.0710484Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:19:29.0710765Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0711059Z mov.b64 {%r604, %r605}, %rd213; 2026-02-21T09:19:29.0711235Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 2026-02-21T09:19:29.0711531Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0711828Z cvt.u64.u32 %rd214, %r429; 2026-02-21T09:19:29.0711990Z cvt.u64.u32 %rd215, %r430; 2026-02-21T09:19:29.0712145Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:19:29.0712311Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:19:29.0712583Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0712884Z mov.b64 {%r607, %r608}, %rd217; 2026-02-21T09:19:29.0713058Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T09:19:29.0713345Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0713647Z cvt.u64.u32 %rd218, %r432; 2026-02-21T09:19:29.0713801Z cvt.u64.u32 %rd219, %r433; 2026-02-21T09:19:29.0713962Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:19:29.0714120Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:19:29.0714404Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0714740Z mov.b64 {%r610, %r611}, %rd221; 2026-02-21T09:19:29.0714941Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T09:19:29.0715233Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0715533Z cvt.u64.u32 %rd222, %r434; 2026-02-21T09:19:29.0715703Z cvt.u64.u32 %rd223, %r435; 2026-02-21T09:19:29.0715846Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:19:29.0715997Z or.b64 %rd225, 
%rd222, %rd224; 2026-02-21T09:19:29.0716254Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0716536Z mov.b64 {%r613, %r614}, %rd225; 2026-02-21T09:19:29.0716700Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T09:19:29.0716967Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0717248Z cvt.u64.u32 %rd226, %r436; 2026-02-21T09:19:29.0717424Z cvt.u64.u32 %rd227, %r437; 2026-02-21T09:19:29.0717604Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:19:29.0717786Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:19:29.0718041Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0718316Z mov.b64 {%r616, %r617}, %rd229; 2026-02-21T09:19:29.0718477Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T09:19:29.0718744Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0719042Z cvt.u64.u32 %rd230, %r438; 2026-02-21T09:19:29.0719198Z cvt.u64.u32 %rd231, %r439; 2026-02-21T09:19:29.0719343Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:19:29.0719499Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:19:29.0719752Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0720030Z mov.b64 {%r619, %r620}, %rd233; 2026-02-21T09:19:29.0720195Z cvt.rn.f16x2.f32 %r621, %r620, %r619; 2026-02-21T09:19:29.0720462Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0720743Z cvt.u64.u32 %rd234, %r440; 2026-02-21T09:19:29.0720889Z cvt.u64.u32 %rd235, %r441; 2026-02-21T09:19:29.0721041Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:19:29.0721190Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:19:29.0721454Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0721728Z mov.b64 {%r622, %r623}, %rd237; 2026-02-21T09:19:29.0721893Z cvt.rn.f16x2.f32 %r624, %r623, %r622; 2026-02-21T09:19:29.0722166Z .loc 1 
56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0722441Z cvt.u64.u32 %rd238, %r442; 2026-02-21T09:19:29.0722591Z cvt.u64.u32 %rd239, %r443; 2026-02-21T09:19:29.0722736Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:19:29.0722889Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:19:29.0723149Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0723426Z mov.b64 {%r625, %r626}, %rd241; 2026-02-21T09:19:29.0723600Z cvt.rn.f16x2.f32 %r627, %r626, %r625; 2026-02-21T09:19:29.0723864Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0724144Z cvt.u64.u32 %rd242, %r444; 2026-02-21T09:19:29.0724288Z cvt.u64.u32 %rd243, %r445; 2026-02-21T09:19:29.0724440Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:19:29.0724597Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:19:29.0724921Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0725220Z mov.b64 {%r628, %r629}, %rd245; 2026-02-21T09:19:29.0725397Z cvt.rn.f16x2.f32 %r630, %r629, %r628; 2026-02-21T09:19:29.0725697Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0726004Z cvt.u64.u32 %rd246, %r446; 2026-02-21T09:19:29.0726178Z cvt.u64.u32 %rd247, %r447; 2026-02-21T09:19:29.0726322Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:19:29.0726478Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:19:29.0726735Z .loc 1 58 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:58:27 2026-02-21T09:19:29.0727037Z mov.b64 {%r631, %r632}, %rd249; 2026-02-21T09:19:29.0727213Z cvt.rn.f16x2.f32 %r633, %r632, %r631; 2026-02-21T09:19:29.0727505Z .loc 1 59 82 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:59:82 2026-02-21T09:19:29.0727847Z st.shared.v4.b32 [%r48], {%r540, %r552, %r564, %r576}; 2026-02-21T09:19:29.0728088Z st.shared.v4.b32 [%r49], {%r588, %r600, %r612, %r624}; 2026-02-21T09:19:29.0728297Z bar.sync 0; 
2026-02-21T09:19:29.0728432Z // begin inline asm 2026-02-21T09:19:29.0728686Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r449, %r450, %r451, %r452}, [%r453]; 2026-02-21T09:19:29.0729000Z // end inline asm 2026-02-21T09:19:29.0729179Z // begin inline asm 2026-02-21T09:19:29.0729446Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r454, %r455, %r456, %r457}, [%r458]; 2026-02-21T09:19:29.0729710Z // end inline asm 2026-02-21T09:19:29.0729854Z bar.sync 0; 2026-02-21T09:19:29.0730020Z st.shared.v4.b32 [%r48], {%r543, %r555, %r567, %r579}; 2026-02-21T09:19:29.0730260Z st.shared.v4.b32 [%r49], {%r591, %r603, %r615, %r627}; 2026-02-21T09:19:29.0730453Z bar.sync 0; 2026-02-21T09:19:29.0730592Z // begin inline asm 2026-02-21T09:19:29.0730854Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r459, %r460, %r461, %r462}, [%r453]; 2026-02-21T09:19:29.0731131Z // end inline asm 2026-02-21T09:19:29.0731276Z // begin inline asm 2026-02-21T09:19:29.0731508Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r464, %r465, %r466, %r467}, [%r458]; 2026-02-21T09:19:29.0731789Z // end inline asm 2026-02-21T09:19:29.0731924Z bar.sync 0; 2026-02-21T09:19:29.0732099Z st.shared.v4.b32 [%r48], {%r546, %r558, %r570, %r582}; 2026-02-21T09:19:29.0732339Z st.shared.v4.b32 [%r49], {%r594, %r606, %r618, %r630}; 2026-02-21T09:19:29.0732541Z bar.sync 0; 2026-02-21T09:19:29.0732669Z // begin inline asm 2026-02-21T09:19:29.0732903Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r469, %r470, %r471, %r472}, [%r453]; 2026-02-21T09:19:29.0733169Z // end inline asm 2026-02-21T09:19:29.0733303Z // begin inline asm 2026-02-21T09:19:29.0733536Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r474, %r475, %r476, %r477}, [%r458]; 2026-02-21T09:19:29.0733799Z // end inline asm 2026-02-21T09:19:29.0733936Z bar.sync 0; 2026-02-21T09:19:29.0734094Z st.shared.v4.b32 [%r48], {%r549, %r561, %r573, %r585}; 2026-02-21T09:19:29.0734335Z st.shared.v4.b32 [%r49], {%r597, %r609, %r621, %r633}; 2026-02-21T09:19:29.0734520Z bar.sync 0; 
2026-02-21T09:19:29.0734648Z // begin inline asm 2026-02-21T09:19:29.0734942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r479, %r480, %r481, %r482}, [%r453]; 2026-02-21T09:19:29.0735203Z // end inline asm 2026-02-21T09:19:29.0735347Z // begin inline asm 2026-02-21T09:19:29.0735571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r484, %r485, %r486, %r487}, [%r458]; 2026-02-21T09:19:29.0735836Z // end inline asm 2026-02-21T09:19:29.0735968Z // begin inline asm 2026-02-21T09:19:29.0736160Z st.global.v4.b32 [ %rd114 + 0 ], { %r449, %r459, %r469, %r479 }; 2026-02-21T09:19:29.0736370Z // end inline asm 2026-02-21T09:19:29.0736510Z // begin inline asm 2026-02-21T09:19:29.0736698Z st.global.v4.b32 [ %rd115 + 0 ], { %r450, %r460, %r470, %r480 }; 2026-02-21T09:19:29.0736908Z // end inline asm 2026-02-21T09:19:29.0737045Z // begin inline asm 2026-02-21T09:19:29.0737222Z st.global.v4.b32 [ %rd116 + 0 ], { %r451, %r461, %r471, %r481 }; 2026-02-21T09:19:29.0737438Z // end inline asm 2026-02-21T09:19:29.0737569Z // begin inline asm 2026-02-21T09:19:29.0737751Z st.global.v4.b32 [ %rd117 + 0 ], { %r452, %r462, %r472, %r482 }; 2026-02-21T09:19:29.0737956Z // end inline asm 2026-02-21T09:19:29.0738095Z // begin inline asm 2026-02-21T09:19:29.0738272Z st.global.v4.b32 [ %rd118 + 0 ], { %r454, %r464, %r474, %r484 }; 2026-02-21T09:19:29.0738485Z // end inline asm 2026-02-21T09:19:29.0738619Z // begin inline asm 2026-02-21T09:19:29.0738793Z st.global.v4.b32 [ %rd119 + 0 ], { %r455, %r465, %r475, %r485 }; 2026-02-21T09:19:29.0739004Z // end inline asm 2026-02-21T09:19:29.0739136Z // begin inline asm 2026-02-21T09:19:29.0739316Z st.global.v4.b32 [ %rd120 + 0 ], { %r456, %r466, %r476, %r486 }; 2026-02-21T09:19:29.0739521Z // end inline asm 2026-02-21T09:19:29.0739661Z // begin inline asm 2026-02-21T09:19:29.0739838Z st.global.v4.b32 [ %rd121 + 0 ], { %r457, %r467, %r477, %r487 }; 2026-02-21T09:19:29.0740053Z // end inline asm 2026-02-21T09:19:29.0740314Z .loc 1 30 52 // 
chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:30:52 2026-02-21T09:19:29.0740608Z add.s32 %r635, %r635, 1; 2026-02-21T09:19:29.0740777Z setp.ne.b32 %p85, %r635, %r4; 2026-02-21T09:19:29.0740999Z @%p85 bra $L__BB0_2; 2026-02-21T09:19:29.0741179Z bra.uni $L__BB0_9; 2026-02-21T09:19:29.0741395Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:19:29.0741645Z // Child Loop BB0_5 Depth 2 2026-02-21T09:19:29.0741962Z .loc 1 36 35 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:36:35 2026-02-21T09:19:29.0742253Z shr.s32 %r271, %r635, 31; 2026-02-21T09:19:29.0742419Z shr.u32 %r272, %r271, 27; 2026-02-21T09:19:29.0742619Z add.s32 %r273, %r635, %r272; 2026-02-21T09:19:29.0742880Z .loc 1 39 45 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:39:45 2026-02-21T09:19:29.0743160Z and.b32 %r274, %r273, 65504; 2026-02-21T09:19:29.0743319Z sub.s32 %r275, %r635, %r274; 2026-02-21T09:19:29.0745076Z .loc 1 39 64 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:39:64 2026-02-21T09:19:29.0746407Z cvt.u16.u32 %rs1, %r275; 2026-02-21T09:19:29.0746580Z cvt.s8.s32 %rs2, %r275; 2026-02-21T09:19:29.0746846Z .loc 1 40 51 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:40:51 2026-02-21T09:19:29.0747143Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:19:29.0747302Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:19:29.0747450Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:19:29.0747615Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:19:29.0747762Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:19:29.0748025Z .loc 1 39 64 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:39:64 2026-02-21T09:19:29.0748304Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:19:29.0748457Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:19:29.0748717Z .loc 1 41 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:41:27 2026-02-21T09:19:29.0748993Z shl.b32 %r276, %r273, 3; 2026-02-21T09:19:29.0749148Z and.b32 %r277, %r276, -256; 2026-02-21T09:19:29.0749332Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:19:29.0749497Z 
mad.wide.s16 %r308, %rs10, 64, %r277; 2026-02-21T09:19:29.0749774Z .loc 1 43 27 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:43:27 2026-02-21T09:19:29.0750060Z mul.wide.s16 %r278, %rs7, 128; 2026-02-21T09:19:29.0750326Z .loc 1 44 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:44:32 2026-02-21T09:19:29.0750603Z or.b32 %r279, %r278, %r8; 2026-02-21T09:19:29.0750859Z .loc 1 54 53 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:53 2026-02-21T09:19:29.0751138Z shl.b32 %r280, %r279, 10; 2026-02-21T09:19:29.0751396Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0751692Z shfl.sync.idx.b32 %r65, %r5, 0, 31, -1; 2026-02-21T09:19:29.0751876Z shl.b32 %r281, %r65, 21; 2026-02-21T09:19:29.0752029Z and.b32 %r282, %r281, 6291456; 2026-02-21T09:19:29.0752188Z add.s32 %r448, %r282, %r634; 2026-02-21T09:19:29.0752350Z mov.pred %p22, -1; 2026-02-21T09:19:29.0752488Z mov.b32 %r636, 0; 2026-02-21T09:19:29.0752630Z // begin inline asm 2026-02-21T09:19:29.0752998Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r448 + 0], {%r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636}; 2026-02-21T09:19:29.0753394Z // end inline asm 2026-02-21T09:19:29.0753525Z // begin inline asm 2026-02-21T09:19:29.0753883Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r448 + 16], {%r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636}; 2026-02-21T09:19:29.0754287Z // end inline asm 2026-02-21T09:19:29.0754416Z // begin inline asm 2026-02-21T09:19:29.0754848Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r448 + 32], {%r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636}; 2026-02-21T09:19:29.0755248Z // end inline asm 2026-02-21T09:19:29.0755399Z // begin inline asm 2026-02-21T09:19:29.0755914Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r448 + 
48], {%r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636, %r636}; 2026-02-21T09:19:29.0756315Z // end inline asm 2026-02-21T09:19:29.0756452Z // begin inline asm 2026-02-21T09:19:29.0756606Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:19:29.0756780Z // end inline asm 2026-02-21T09:19:29.0756912Z bar.sync 0; 2026-02-21T09:19:29.0757160Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0757436Z add.s32 %r637, %r81, 73776; 2026-02-21T09:19:29.0757591Z // begin inline asm 2026-02-21T09:19:29.0757760Z @%p4 mbarrier.init.shared::cta.b64 [%r637], 1; 2026-02-21T09:19:29.0757944Z // end inline asm 2026-02-21T09:19:29.0758079Z bar.sync 0; 2026-02-21T09:19:29.0758205Z add.s32 %r197, %r81, 73784; 2026-02-21T09:19:29.0758431Z // begin inline asm 2026-02-21T09:19:29.0758638Z @%p4 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T09:19:29.0758826Z // end inline asm 2026-02-21T09:19:29.0758959Z add.s32 %r198, %r81, 73728; 2026-02-21T09:19:29.0759113Z // begin inline asm 2026-02-21T09:19:29.0759275Z @%p4 mbarrier.init.shared::cta.b64 [%r198], 1; 2026-02-21T09:19:29.0759450Z // end inline asm 2026-02-21T09:19:29.0759585Z bar.sync 0; 2026-02-21T09:19:29.0759711Z add.s32 %r199, %r81, 73736; 2026-02-21T09:19:29.0759862Z // begin inline asm 2026-02-21T09:19:29.0760012Z @%p4 mbarrier.init.shared::cta.b64 [%r199], 1; 2026-02-21T09:19:29.0760192Z // end inline asm 2026-02-21T09:19:29.0760317Z bar.sync 0; 2026-02-21T09:19:29.0760445Z add.s32 %r200, %r81, 73744; 2026-02-21T09:19:29.0760589Z // begin inline asm 2026-02-21T09:19:29.0760747Z @%p4 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T09:19:29.0760927Z // end inline asm 2026-02-21T09:19:29.0761052Z bar.sync 0; 2026-02-21T09:19:29.0761182Z add.s32 %r201, %r81, 73752; 2026-02-21T09:19:29.0761326Z // begin inline asm 2026-02-21T09:19:29.0761484Z @%p4 mbarrier.init.shared::cta.b64 [%r201], 1; 2026-02-21T09:19:29.0761659Z // end 
inline asm 2026-02-21T09:19:29.0761792Z bar.sync 0; 2026-02-21T09:19:29.0761914Z add.s32 %r202, %r81, 73760; 2026-02-21T09:19:29.0762066Z // begin inline asm 2026-02-21T09:19:29.0762218Z @%p4 mbarrier.init.shared::cta.b64 [%r202], 1; 2026-02-21T09:19:29.0762398Z // end inline asm 2026-02-21T09:19:29.0762530Z bar.sync 0; 2026-02-21T09:19:29.0762653Z add.s32 %r305, %r81, 73768; 2026-02-21T09:19:29.0762803Z // begin inline asm 2026-02-21T09:19:29.0762954Z @%p4 mbarrier.init.shared::cta.b64 [%r305], 1; 2026-02-21T09:19:29.0763134Z // end inline asm 2026-02-21T09:19:29.0763375Z .loc 1 54 60 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:60 2026-02-21T09:19:29.0763658Z or.b32 %r284, %r280, %r17; 2026-02-21T09:19:29.0763917Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0764209Z mad.wide.s32 %rd41, %r284, 2, %rd14; 2026-02-21T09:19:29.0764385Z cvt.u64.u32 %rd66, %r17; 2026-02-21T09:19:29.0764535Z cvt.s64.s32 %rd7, %r280; 2026-02-21T09:19:29.0764745Z or.b64 %rd67, %rd7, %rd66; 2026-02-21T09:19:29.0764909Z shl.b64 %rd68, %rd67, 1; 2026-02-21T09:19:29.0765071Z add.s64 %rd8, %rd14, %rd68; 2026-02-21T09:19:29.0765230Z add.s64 %rd42, %rd8, 65536; 2026-02-21T09:19:29.0765400Z add.s64 %rd43, %rd8, 131072; 2026-02-21T09:19:29.0765563Z add.s64 %rd44, %rd8, 196608; 2026-02-21T09:19:29.0765727Z mov.b32 %r298, 16; 2026-02-21T09:19:29.0765988Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0766289Z // begin inline asm 2026-02-21T09:19:29.0766495Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd41 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0766710Z // end inline asm 2026-02-21T09:19:29.0766843Z // begin inline asm 2026-02-21T09:19:29.0767049Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd42 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0767302Z // end inline asm 2026-02-21T09:19:29.0767466Z // begin inline asm 2026-02-21T09:19:29.0767666Z cp.async.cg.shared.global [ %r208 + 0 ], [ 
%rd43 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0767882Z // end inline asm 2026-02-21T09:19:29.0768019Z // begin inline asm 2026-02-21T09:19:29.0768208Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd44 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0768432Z // end inline asm 2026-02-21T09:19:29.0768573Z cp.async.commit_group; 2026-02-21T09:19:29.0768837Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0769126Z bar.sync 0; 2026-02-21T09:19:29.0769254Z // begin inline asm 2026-02-21T09:19:29.0769445Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r198], 4096; 2026-02-21T09:19:29.0769654Z // end inline asm 2026-02-21T09:19:29.0769928Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0770229Z bar.sync 0; 2026-02-21T09:19:29.0770372Z elect.sync %r285|%p45, -1; 2026-02-21T09:19:29.0770530Z and.pred %p35, %p1, %p45; 2026-02-21T09:19:29.0770684Z // begin inline asm 2026-02-21T09:19:29.0771011Z @%p35 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r107], [%rd94, {%r636, %r308}], [%r198]; 2026-02-21T09:19:29.0771358Z // end inline asm 2026-02-21T09:19:29.0771600Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0771873Z add.s64 %rd46, %rd8, 64; 2026-02-21T09:19:29.0772029Z cvt.u64.u32 %rd69, %r23; 2026-02-21T09:19:29.0772175Z or.b64 %rd70, %rd7, %rd69; 2026-02-21T09:19:29.0772329Z shl.b64 %rd71, %rd70, 1; 2026-02-21T09:19:29.0772479Z add.s64 %rd72, %rd14, %rd71; 2026-02-21T09:19:29.0772628Z add.s64 %rd47, %rd72, 65536; 2026-02-21T09:19:29.0772786Z add.s64 %rd48, %rd72, 131072; 2026-02-21T09:19:29.0772941Z add.s64 %rd49, %rd72, 196608; 2026-02-21T09:19:29.0773198Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0773475Z // begin inline asm 2026-02-21T09:19:29.0773672Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd46 + 0 ], 0x10, %r298; 
2026-02-21T09:19:29.0773885Z // end inline asm 2026-02-21T09:19:29.0774025Z // begin inline asm 2026-02-21T09:19:29.0774221Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd47 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0774433Z // end inline asm 2026-02-21T09:19:29.0774570Z // begin inline asm 2026-02-21T09:19:29.0774804Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd48 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0775026Z // end inline asm 2026-02-21T09:19:29.0775160Z // begin inline asm 2026-02-21T09:19:29.0775364Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd49 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0775593Z // end inline asm 2026-02-21T09:19:29.0775745Z cp.async.commit_group; 2026-02-21T09:19:29.0776026Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0776323Z bar.sync 0; 2026-02-21T09:19:29.0776472Z // begin inline asm 2026-02-21T09:19:29.0776653Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r199], 4096; 2026-02-21T09:19:29.0776870Z // end inline asm 2026-02-21T09:19:29.0777106Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0777408Z bar.sync 0; 2026-02-21T09:19:29.0777548Z elect.sync %r286|%p46, -1; 2026-02-21T09:19:29.0777724Z and.pred %p37, %p1, %p46; 2026-02-21T09:19:29.0777893Z add.s32 %r226, %r81, 53248; 2026-02-21T09:19:29.0778047Z mov.b32 %r227, 32; 2026-02-21T09:19:29.0778193Z // begin inline asm 2026-02-21T09:19:29.0778533Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r226], [%rd94, {%r227, %r308}], [%r199]; 2026-02-21T09:19:29.0778923Z // end inline asm 2026-02-21T09:19:29.0779185Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0779525Z add.s64 %rd51, %rd8, 128; 2026-02-21T09:19:29.0779750Z cvt.u64.u32 %rd73, %r28; 2026-02-21T09:19:29.0779917Z or.b64 %rd74, %rd7, %rd73; 2026-02-21T09:19:29.0780079Z shl.b64 %rd75, %rd74, 1; 2026-02-21T09:19:29.0780236Z add.s64 %rd76, 
%rd14, %rd75; 2026-02-21T09:19:29.0780410Z add.s64 %rd52, %rd76, 65536; 2026-02-21T09:19:29.0780573Z add.s64 %rd53, %rd76, 131072; 2026-02-21T09:19:29.0780745Z add.s64 %rd54, %rd76, 196608; 2026-02-21T09:19:29.0781015Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0781314Z // begin inline asm 2026-02-21T09:19:29.0781517Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd51 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0781755Z // end inline asm 2026-02-21T09:19:29.0781902Z // begin inline asm 2026-02-21T09:19:29.0782131Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd52 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0782394Z // end inline asm 2026-02-21T09:19:29.0782536Z // begin inline asm 2026-02-21T09:19:29.0782739Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd53 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0782962Z // end inline asm 2026-02-21T09:19:29.0783106Z // begin inline asm 2026-02-21T09:19:29.0783301Z cp.async.cg.shared.global [ %r236 + 0 ], [ %rd54 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0783540Z // end inline asm 2026-02-21T09:19:29.0783689Z cp.async.commit_group; 2026-02-21T09:19:29.0783961Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0784257Z bar.sync 0; 2026-02-21T09:19:29.0784386Z // begin inline asm 2026-02-21T09:19:29.0784582Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r200], 4096; 2026-02-21T09:19:29.0784855Z // end inline asm 2026-02-21T09:19:29.0785116Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0785407Z bar.sync 0; 2026-02-21T09:19:29.0785557Z elect.sync %r287|%p47, -1; 2026-02-21T09:19:29.0785733Z and.pred %p39, %p1, %p47; 2026-02-21T09:19:29.0785897Z add.s32 %r239, %r81, 57344; 2026-02-21T09:19:29.0786060Z mov.b32 %r240, 64; 2026-02-21T09:19:29.0786200Z // begin inline asm 2026-02-21T09:19:29.0786533Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r239], [%rd94, 
{%r240, %r308}], [%r200]; 2026-02-21T09:19:29.0786870Z // end inline asm 2026-02-21T09:19:29.0787122Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0787416Z add.s64 %rd56, %rd8, 192; 2026-02-21T09:19:29.0787563Z cvt.u64.u32 %rd77, %r33; 2026-02-21T09:19:29.0787720Z or.b64 %rd78, %rd7, %rd77; 2026-02-21T09:19:29.0787869Z shl.b64 %rd79, %rd78, 1; 2026-02-21T09:19:29.0788019Z add.s64 %rd80, %rd14, %rd79; 2026-02-21T09:19:29.0788169Z add.s64 %rd57, %rd80, 65536; 2026-02-21T09:19:29.0788329Z add.s64 %rd58, %rd80, 131072; 2026-02-21T09:19:29.0788479Z add.s64 %rd59, %rd80, 196608; 2026-02-21T09:19:29.0788746Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0789032Z // begin inline asm 2026-02-21T09:19:29.0789219Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd56 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0789439Z // end inline asm 2026-02-21T09:19:29.0789568Z // begin inline asm 2026-02-21T09:19:29.0789760Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd57 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0789975Z // end inline asm 2026-02-21T09:19:29.0790106Z // begin inline asm 2026-02-21T09:19:29.0790289Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd58 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0790505Z // end inline asm 2026-02-21T09:19:29.0790640Z // begin inline asm 2026-02-21T09:19:29.0790822Z cp.async.cg.shared.global [ %r249 + 0 ], [ %rd59 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0791039Z // end inline asm 2026-02-21T09:19:29.0791173Z cp.async.commit_group; 2026-02-21T09:19:29.0791435Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0791762Z bar.sync 0; 2026-02-21T09:19:29.0791891Z // begin inline asm 2026-02-21T09:19:29.0792068Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r201], 4096; 2026-02-21T09:19:29.0792278Z // end inline asm 2026-02-21T09:19:29.0792512Z .loc 1 55 44 // 
chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0792784Z bar.sync 0; 2026-02-21T09:19:29.0792923Z elect.sync %r288|%p48, -1; 2026-02-21T09:19:29.0793077Z and.pred %p41, %p1, %p48; 2026-02-21T09:19:29.0793237Z add.s32 %r252, %r81, 61440; 2026-02-21T09:19:29.0793384Z mov.b32 %r253, 96; 2026-02-21T09:19:29.0793522Z // begin inline asm 2026-02-21T09:19:29.0793834Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r252], [%rd94, {%r253, %r308}], [%r201]; 2026-02-21T09:19:29.0794212Z // end inline asm 2026-02-21T09:19:29.0794483Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0794880Z add.s64 %rd61, %rd8, 256; 2026-02-21T09:19:29.0795102Z cvt.u64.u32 %rd81, %r38; 2026-02-21T09:19:29.0795264Z or.b64 %rd82, %rd7, %rd81; 2026-02-21T09:19:29.0795430Z shl.b64 %rd83, %rd82, 1; 2026-02-21T09:19:29.0795591Z add.s64 %rd84, %rd14, %rd83; 2026-02-21T09:19:29.0795761Z add.s64 %rd62, %rd84, 65536; 2026-02-21T09:19:29.0795924Z add.s64 %rd63, %rd84, 131072; 2026-02-21T09:19:29.0796096Z add.s64 %rd64, %rd84, 196608; 2026-02-21T09:19:29.0796381Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0796659Z // begin inline asm 2026-02-21T09:19:29.0796859Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd61 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0797077Z // end inline asm 2026-02-21T09:19:29.0797215Z // begin inline asm 2026-02-21T09:19:29.0797407Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd62 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0797632Z // end inline asm 2026-02-21T09:19:29.0797760Z // begin inline asm 2026-02-21T09:19:29.0797958Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd63 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0798178Z // end inline asm 2026-02-21T09:19:29.0798307Z // begin inline asm 2026-02-21T09:19:29.0798497Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd64 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0798708Z // 
end inline asm 2026-02-21T09:19:29.0798847Z cp.async.commit_group; 2026-02-21T09:19:29.0799108Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0799395Z bar.sync 0; 2026-02-21T09:19:29.0799517Z // begin inline asm 2026-02-21T09:19:29.0799701Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r202], 4096; 2026-02-21T09:19:29.0799910Z // end inline asm 2026-02-21T09:19:29.0800147Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0800427Z bar.sync 0; 2026-02-21T09:19:29.0800559Z elect.sync %r289|%p49, -1; 2026-02-21T09:19:29.0800721Z and.pred %p43, %p1, %p49; 2026-02-21T09:19:29.0800875Z add.s32 %r265, %r81, 65536; 2026-02-21T09:19:29.0801028Z mov.b32 %r266, 128; 2026-02-21T09:19:29.0801168Z // begin inline asm 2026-02-21T09:19:29.0801488Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r265], [%rd94, {%r266, %r308}], [%r202]; 2026-02-21T09:19:29.0801854Z // end inline asm 2026-02-21T09:19:29.0802093Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0802378Z cp.async.wait_group 4; 2026-02-21T09:19:29.0802521Z bar.sync 0; 2026-02-21T09:19:29.0802759Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0803034Z // begin inline asm 2026-02-21T09:19:29.0803169Z 2026-02-21T09:19:29.0803287Z { 2026-02-21T09:19:29.0803405Z .reg .pred complete; 2026-02-21T09:19:29.0803585Z waitLoop: 2026-02-21T09:19:29.0803766Z mbarrier.try_wait.parity.shared.b64 complete, [%r198], %r636; 2026-02-21T09:19:29.0804026Z @!complete bra.uni waitLoop; 2026-02-21T09:19:29.0804172Z } 2026-02-21T09:19:29.0804242Z 2026-02-21T09:19:29.0804296Z // end inline asm 2026-02-21T09:19:29.0804536Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0804893Z setp.ne.b32 %p50, %r65, 0; 2026-02-21T09:19:29.0805055Z @%p50 bra $L__BB0_4; 
2026-02-21T09:19:29.0805256Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:29.0805493Z elect.sync %r294|%p52, -1; 2026-02-21T09:19:29.0805655Z mov.b32 %r291, 135266320; 2026-02-21T09:19:29.0805820Z mov.pred %p51, 0; 2026-02-21T09:19:29.0805967Z // begin inline asm 2026-02-21T09:19:29.0806247Z @%p52 tcgen05.mma.cta_group::1.kind::f16 [ %r634 + 0 ], %rd85, %rd86, %r291, %p51; 2026-02-21T09:19:29.0806494Z // end inline asm 2026-02-21T09:19:29.0806660Z // begin inline asm 2026-02-21T09:19:29.0806886Z @%p52 tcgen05.mma.cta_group::1.kind::f16 [ %r634 + 0 ], %rd87, %rd88, %r291, %p22; 2026-02-21T09:19:29.0807142Z // end inline asm 2026-02-21T09:19:29.0807290Z add.s32 %r296, %r81, 73776; 2026-02-21T09:19:29.0807451Z cvt.u64.u32 %rd89, %r296; 2026-02-21T09:19:29.0807612Z // begin inline asm 2026-02-21T09:19:29.0807820Z @%p52 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd89]; 2026-02-21T09:19:29.0808070Z // end inline asm 2026-02-21T09:19:29.0808255Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:29.0808591Z .loc 1 0 0 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:0 2026-02-21T09:19:29.0808893Z cvt.s32.s16 %r54, %rs7; 2026-02-21T09:19:29.0809050Z or.b32 %r56, %r308, %r7; 2026-02-21T09:19:29.0809210Z or.b32 %r57, %r278, %r9; 2026-02-21T09:19:29.0809365Z or.b32 %r58, %r278, %r10; 2026-02-21T09:19:29.0809527Z or.b32 %r59, %r278, %r11; 2026-02-21T09:19:29.0809681Z or.b32 %r60, %r278, %r12; 2026-02-21T09:19:29.0809839Z or.b32 %r61, %r278, %r13; 2026-02-21T09:19:29.0809989Z or.b32 %r62, %r278, %r14; 2026-02-21T09:19:29.0810144Z or.b32 %r63, %r278, %r15; 2026-02-21T09:19:29.0810302Z or.b32 %r64, %r278, %r16; 2026-02-21T09:19:29.0810573Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0810877Z add.s64 %rd90, %rd8, 320; 2026-02-21T09:19:29.0811032Z cvt.u64.u32 %rd96, %r43; 2026-02-21T09:19:29.0811194Z add.s64 %rd97, %rd7, %rd96; 2026-02-21T09:19:29.0811354Z shl.b64 %rd98, %rd97, 1; 
2026-02-21T09:19:29.0811515Z add.s64 %rd99, %rd14, %rd98; 2026-02-21T09:19:29.0811678Z add.s64 %rd91, %rd99, 65536; 2026-02-21T09:19:29.0811845Z add.s64 %rd92, %rd99, 131072; 2026-02-21T09:19:29.0812008Z add.s64 %rd93, %rd99, 196608; 2026-02-21T09:19:29.0812289Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0812581Z bar.sync 0; 2026-02-21T09:19:29.0812716Z // begin inline asm 2026-02-21T09:19:29.0812931Z cp.async.cg.shared.global [ %r297 + 0 ], [ %rd90 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0813164Z // end inline asm 2026-02-21T09:19:29.0813308Z // begin inline asm 2026-02-21T09:19:29.0813509Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd91 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0813743Z // end inline asm 2026-02-21T09:19:29.0813879Z // begin inline asm 2026-02-21T09:19:29.0814084Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd92 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0814318Z // end inline asm 2026-02-21T09:19:29.0814454Z // begin inline asm 2026-02-21T09:19:29.0814665Z cp.async.cg.shared.global [ %r303 + 0 ], [ %rd93 + 0 ], 0x10, %r298; 2026-02-21T09:19:29.0814906Z // end inline asm 2026-02-21T09:19:29.0815050Z cp.async.commit_group; 2026-02-21T09:19:29.0815303Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0815613Z // begin inline asm 2026-02-21T09:19:29.0815844Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r305], 4096; 2026-02-21T09:19:29.0816098Z // end inline asm 2026-02-21T09:19:29.0816357Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0816658Z bar.sync 0; 2026-02-21T09:19:29.0816816Z elect.sync %r315|%p59, -1; 2026-02-21T09:19:29.0816974Z and.pred %p57, %p1, %p59; 2026-02-21T09:19:29.0817132Z add.s32 %r306, %r81, 69632; 2026-02-21T09:19:29.0817280Z mov.b32 %r307, 160; 2026-02-21T09:19:29.0817424Z // begin inline asm 2026-02-21T09:19:29.0817749Z @%p57 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r306], [%rd94, {%r307, %r308}], [%r305]; 2026-02-21T09:19:29.0818092Z // end inline asm 2026-02-21T09:19:29.0818363Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0818667Z shl.b32 %r316, %r54, 17; 2026-02-21T09:19:29.0818825Z or.b32 %r317, %r52, %r316; 2026-02-21T09:19:29.0818982Z mad.wide.s32 %rd250, %r317, 2, %rd6; 2026-02-21T09:19:29.0819156Z mov.b32 %r641, 1; 2026-02-21T09:19:29.0819286Z mov.b32 %r640, 5; 2026-02-21T09:19:29.0819423Z mov.b64 %rd251, 0; 2026-02-21T09:19:29.0819560Z mov.b32 %r638, %r636; 2026-02-21T09:19:29.0819698Z mov.b32 %r639, %r636; 2026-02-21T09:19:29.0819843Z mov.b32 %r642, %r636; 2026-02-21T09:19:29.0819978Z bra.uni $L__BB0_5; 2026-02-21T09:19:29.0820162Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:19:29.0820466Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0820762Z setp.lt.u64 %p69, %rd251, 832; 2026-02-21T09:19:29.0821022Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0821306Z // begin inline asm 2026-02-21T09:19:29.0821440Z 2026-02-21T09:19:29.0821548Z { 2026-02-21T09:19:29.0821672Z .reg .pred complete; 2026-02-21T09:19:29.0821810Z waitLoop: 2026-02-21T09:19:29.0821998Z mbarrier.try_wait.parity.shared.b64 complete, [%r637], %r636; 2026-02-21T09:19:29.0822061Z @!complete bra.uni waitLoop; 2026-02-21T09:19:29.0822108Z } 2026-02-21T09:19:29.0822112Z 2026-02-21T09:19:29.0822172Z // end inline asm 2026-02-21T09:19:29.0822229Z add.s32 %r359, %r641, 1; 2026-02-21T09:19:29.0822287Z setp.gt.s32 %p72, %r359, 1; 2026-02-21T09:19:29.0822347Z selp.b32 %r641, 0, %r359, %p72; 2026-02-21T09:19:29.0822408Z selp.b32 %r360, 1, 0, %p72; 2026-02-21T09:19:29.0822462Z xor.b32 %r78, %r642, %r360; 2026-02-21T09:19:29.0822621Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 
2026-02-21T09:19:29.0822683Z add.s32 %r361, %r640, 1; 2026-02-21T09:19:29.0822741Z setp.gt.s32 %p73, %r361, 5; 2026-02-21T09:19:29.0822801Z selp.b32 %r640, 0, %r361, %p73; 2026-02-21T09:19:29.0822967Z .loc 1 54 32 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:32 2026-02-21T09:19:29.0823032Z add.s64 %rd109, %rd250, -196608; 2026-02-21T09:19:29.0823094Z add.s64 %rd110, %rd250, -131072; 2026-02-21T09:19:29.0823153Z add.s64 %rd111, %rd250, -65536; 2026-02-21T09:19:29.0823313Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0823368Z shl.b32 %r362, %r640, 13; 2026-02-21T09:19:29.0823424Z add.s32 %r364, %r81, %r362; 2026-02-21T09:19:29.0823484Z bar.sync 0; 2026-02-21T09:19:29.0823538Z add.s32 %r346, %r364, %r18; 2026-02-21T09:19:29.0823597Z selp.b32 %r347, 16, 0, %p69; 2026-02-21T09:19:29.0823658Z // begin inline asm 2026-02-21T09:19:29.0823773Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd109 + 0 ], 0x10, %r347; 2026-02-21T09:19:29.0823826Z // end inline asm 2026-02-21T09:19:29.0823880Z add.s32 %r348, %r346, 2048; 2026-02-21T09:19:29.0823942Z // begin inline asm 2026-02-21T09:19:29.0824055Z cp.async.cg.shared.global [ %r348 + 0 ], [ %rd110 + 0 ], 0x10, %r347; 2026-02-21T09:19:29.0824172Z // end inline asm 2026-02-21T09:19:29.0824234Z add.s32 %r350, %r346, 4096; 2026-02-21T09:19:29.0824287Z // begin inline asm 2026-02-21T09:19:29.0824398Z cp.async.cg.shared.global [ %r350 + 0 ], [ %rd111 + 0 ], 0x10, %r347; 2026-02-21T09:19:29.0824451Z // end inline asm 2026-02-21T09:19:29.0824513Z add.s32 %r352, %r346, 6144; 2026-02-21T09:19:29.0824566Z // begin inline asm 2026-02-21T09:19:29.0824695Z cp.async.cg.shared.global [ %r352 + 0 ], [ %rd250 + 0 ], 0x10, %r347; 2026-02-21T09:19:29.0824758Z // end inline asm 2026-02-21T09:19:29.0824817Z cp.async.commit_group; 2026-02-21T09:19:29.0824977Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0825041Z shl.b32 %r365, %r640, 
3; 2026-02-21T09:19:29.0825098Z add.s32 %r366, %r81, %r365; 2026-02-21T09:19:29.0825182Z add.s32 %r358, %r366, 73728; 2026-02-21T09:19:29.0825269Z and.pred %p67, %p4, %p69; 2026-02-21T09:19:29.0825333Z // begin inline asm 2026-02-21T09:19:29.0825442Z @%p67 mbarrier.arrive.expect_tx.shared.b64 _, [%r358], 4096; 2026-02-21T09:19:29.0825494Z // end inline asm 2026-02-21T09:19:29.0825666Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0825722Z shl.b32 %r367, %r640, 12; 2026-02-21T09:19:29.0825778Z add.s32 %r368, %r81, %r367; 2026-02-21T09:19:29.0825833Z add.s32 %r355, %r368, 49152; 2026-02-21T09:19:29.0825896Z bar.sync 0; 2026-02-21T09:19:29.0825957Z elect.sync %r369|%p74, -1; 2026-02-21T09:19:29.0826018Z and.pred %p75, %p69, %p74; 2026-02-21T09:19:29.0826083Z and.pred %p68, %p1, %p75; 2026-02-21T09:19:29.0826138Z cvt.u32.u64 %r370, %rd251; 2026-02-21T09:19:29.0826190Z add.s32 %r356, %r370, 192; 2026-02-21T09:19:29.0826249Z // begin inline asm 2026-02-21T09:19:29.0826489Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r355], [%rd94, {%r356, %r308}], [%r358]; 2026-02-21T09:19:29.0826546Z // end inline asm 2026-02-21T09:19:29.0826712Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0826773Z add.s64 %rd250, %rd250, 64; 2026-02-21T09:19:29.0826835Z setp.lt.u64 %p76, %rd251, 960; 2026-02-21T09:19:29.0826889Z add.s64 %rd251, %rd251, 32; 2026-02-21T09:19:29.0826950Z mov.b32 %r636, %r642; 2026-02-21T09:19:29.0827004Z mov.b32 %r637, %r371; 2026-02-21T09:19:29.0827058Z mov.b32 %r642, %r78; 2026-02-21T09:19:29.0827121Z @%p76 bra $L__BB0_5; 2026-02-21T09:19:29.0827173Z bra.uni $L__BB0_8; 2026-02-21T09:19:29.0827267Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:19:29.0827361Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:29.0827422Z add.s32 %r320, %r639, 1; 2026-02-21T09:19:29.0827480Z setp.gt.s32 %p61, %r320, 5; 
2026-02-21T09:19:29.0827541Z selp.b32 %r639, 0, %r320, %p61; 2026-02-21T09:19:29.0827606Z selp.b32 %r321, 1, 0, %p61; 2026-02-21T09:19:29.0827665Z xor.b32 %r638, %r638, %r321; 2026-02-21T09:19:29.0827827Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0827886Z cp.async.wait_group 4; 2026-02-21T09:19:29.0827945Z bar.sync 0; 2026-02-21T09:19:29.0828103Z .loc 1 49 90 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:49:90 2026-02-21T09:19:29.0828157Z shl.b32 %r322, %r639, 3; 2026-02-21T09:19:29.0828219Z add.s32 %r324, %r81, %r322; 2026-02-21T09:19:29.0828274Z add.s32 %r318, %r324, 73728; 2026-02-21T09:19:29.0828327Z // begin inline asm 2026-02-21T09:19:29.0828382Z 2026-02-21T09:19:29.0828430Z { 2026-02-21T09:19:29.0828489Z .reg .pred complete; 2026-02-21T09:19:29.0828540Z waitLoop: 2026-02-21T09:19:29.0828659Z mbarrier.try_wait.parity.shared.b64 complete, [%r318], %r638; 2026-02-21T09:19:29.0828719Z @!complete bra.uni waitLoop; 2026-02-21T09:19:29.0828767Z } 2026-02-21T09:19:29.0828803Z 2026-02-21T09:19:29.0828863Z // end inline asm 2026-02-21T09:19:29.0828942Z shl.b32 %r325, %r641, 3; 2026-02-21T09:19:29.0828996Z add.s32 %r326, %r81, %r325; 2026-02-21T09:19:29.0829052Z add.s32 %r371, %r326, 73776; 2026-02-21T09:19:29.0829222Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0829277Z @%p50 bra $L__BB0_7; 2026-02-21T09:19:29.0829368Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:19:29.0829537Z .loc 1 55 44 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:55:44 2026-02-21T09:19:29.0829591Z shl.b32 %r331, %r639, 12; 2026-02-21T09:19:29.0829644Z add.s32 %r333, %r81, %r331; 2026-02-21T09:19:29.0829705Z add.s32 %r334, %r333, 49152; 2026-02-21T09:19:29.0829890Z .loc 1 54 85 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:54:85 2026-02-21T09:19:29.0829969Z shl.b32 %r335, %r639, 13; 2026-02-21T09:19:29.0830026Z add.s32 %r336, %r81, %r335; 
2026-02-21T09:19:29.0830202Z .loc 1 56 52 // chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:56:52 2026-02-21T09:19:29.0830263Z elect.sync %r337|%p63, -1; 2026-02-21T09:19:29.0830320Z bfe.u32 %r338, %r336, 4, 14; 2026-02-21T09:19:29.0830384Z cvt.u64.u32 %rd105, %r338; 2026-02-21T09:19:29.0830454Z or.b64 %rd100, %rd105, -9223371899382267904; 2026-02-21T09:19:29.0830510Z bfe.u32 %r339, %r334, 4, 14; 2026-02-21T09:19:29.0830573Z cvt.u64.u32 %rd106, %r339; 2026-02-21T09:19:29.0830642Z or.b64 %rd101, %rd106, -9223371899399045120; 2026-02-21T09:19:29.0830696Z mov.b32 %r328, 135266320; 2026-02-21T09:19:29.0830753Z mov.pred %p62, -1; 2026-02-21T09:19:29.0830814Z // begin inline asm 2026-02-21T09:19:29.0830955Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r634 + 0 ], %rd100, %rd101, %r328, %p62; 2026-02-21T09:19:29.0831010Z // end inline asm 2026-02-21T09:19:29.0831072Z add.s32 %r340, %r336, 32; 2026-02-21T09:19:29.0831129Z bfe.u32 %r341, %r340, 4, 14; 2026-02-21T09:19:29.0831185Z cvt.u64.u32 %rd107, %r341; 2026-02-21T09:19:29.0831252Z or.b64 %rd102, %rd107, -9223371899382267904; 2026-02-21T09:19:29.0831314Z add.s32 %r342, %r333, 49184; 2026-02-21T09:19:29.0831368Z bfe.u32 %r343, %r342, 4, 14; 2026-02-21T09:19:29.0831424Z cvt.u64.u32 %rd108, %r343; 2026-02-21T09:19:29.0831497Z or.b64 %rd103, %rd108, -9223371899399045120; 2026-02-21T09:19:29.0831552Z // begin inline asm 2026-02-21T09:19:29.0831686Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r634 + 0 ], %rd102, %rd103, %r328, %p62; 2026-02-21T09:19:29.0831746Z // end inline asm 2026-02-21T09:19:29.0831802Z cvt.u64.u32 %rd104, %r371; 2026-02-21T09:19:29.0831856Z // begin inline asm 2026-02-21T09:19:29.0831976Z @%p63 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd104]; 2026-02-21T09:19:29.0832037Z // end inline asm 2026-02-21T09:19:29.0832093Z bra.uni $L__BB0_7; 2026-02-21T09:19:29.0832177Z $L__BB0_9: // %._crit_edge 2026-02-21T09:19:29.0832360Z .loc 1 30 4 // 
chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py:30:4 2026-02-21T09:19:29.0832416Z bar.sync 0; 2026-02-21T09:19:29.0832471Z // begin inline asm 2026-02-21T09:19:29.0832595Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r634, 64; 2026-02-21T09:19:29.0832649Z // end inline asm 2026-02-21T09:19:29.0832700Z ret; 2026-02-21T09:19:29.0832754Z $L__tmp0: 2026-02-21T09:19:29.0832819Z $L__func_end0: 2026-02-21T09:19:29.0832900Z // -- End function 2026-02-21T09:19:29.0832951Z } 2026-02-21T09:19:29.0833171Z .file 1 "/tmp/torchinductor_root/hx/chxmvwsgej624wuiernxjllh5v5mfu2ilfzsjuanqi6o4vdue7fv.py" 2026-02-21T09:19:29.0833232Z .section .debug_abbrev 2026-02-21T09:19:29.0833283Z { 2026-02-21T09:19:29.0833370Z .b8 1 // Abbreviation Code 2026-02-21T09:19:29.0833465Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:19:29.0833568Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:19:29.0833672Z .b8 37 // DW_AT_producer 2026-02-21T09:19:29.0833756Z .b8 8 // DW_FORM_string 2026-02-21T09:19:29.0833830Z .b8 19 // DW_AT_language 2026-02-21T09:19:29.0833907Z .b8 5 // DW_FORM_data2 2026-02-21T09:19:29.0833988Z .b8 3 // DW_AT_name 2026-02-21T09:19:29.0834061Z .b8 8 // DW_FORM_string 2026-02-21T09:19:29.0834138Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:19:29.0834216Z .b8 6 // DW_FORM_data4 2026-02-21T09:19:29.0834299Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:19:29.0834393Z .b8 8 // DW_FORM_string 2026-02-21T09:19:29.0834488Z .b8 0 // EOM(1) 2026-02-21T09:19:29.0834567Z .b8 0 // EOM(2) 2026-02-21T09:19:29.0834634Z .b8 0 // EOM(3) 2026-02-21T09:19:29.0834745Z } 2026-02-21T09:19:29.0834812Z .section .debug_info 2026-02-21T09:19:29.0834864Z { 2026-02-21T09:19:29.0834946Z .b32 104 // Length of Unit 2026-02-21T09:19:29.0835032Z .b8 2 // DWARF version number 2026-02-21T09:19:29.0835091Z .b8 0 2026-02-21T09:19:29.0835205Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:19:29.0835294Z .b8 8 // Address Size (in bytes) 2026-02-21T09:19:29.0835402Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:19:29.0835481Z .b8 116 // DW_AT_producer 2026-02-21T09:19:29.0835534Z .b8 114 2026-02-21T09:19:29.0835595Z .b8 105 2026-02-21T09:19:29.0835646Z .b8 116 2026-02-21T09:19:29.0835697Z .b8 111 2026-02-21T09:19:29.0835747Z .b8 110 2026-02-21T09:19:29.0835808Z .b8 0 2026-02-21T09:19:29.0835881Z .b8 2 // DW_AT_language 2026-02-21T09:19:29.0835931Z .b8 0 2026-02-21T09:19:29.0836012Z .b8 99 // DW_AT_name 2026-02-21T09:19:29.0836062Z .b8 104 2026-02-21T09:19:29.0836112Z .b8 120 2026-02-21T09:19:29.0836162Z .b8 109 2026-02-21T09:19:29.0836222Z .b8 118 2026-02-21T09:19:29.0836272Z .b8 119 2026-02-21T09:19:29.0836324Z .b8 115 2026-02-21T09:19:29.0836373Z .b8 103 2026-02-21T09:19:29.0836429Z .b8 101 2026-02-21T09:19:29.0836479Z .b8 106 2026-02-21T09:19:29.0836530Z .b8 54 2026-02-21T09:19:29.0836589Z .b8 50 2026-02-21T09:19:29.0836638Z .b8 52 2026-02-21T09:19:29.0836688Z .b8 119 2026-02-21T09:19:29.0836737Z .b8 117 2026-02-21T09:19:29.0836791Z .b8 105 2026-02-21T09:19:29.0836841Z .b8 101 2026-02-21T09:19:29.0836890Z .b8 114 2026-02-21T09:19:29.0836945Z .b8 110 2026-02-21T09:19:29.0836996Z .b8 120 2026-02-21T09:19:29.0837048Z .b8 106 2026-02-21T09:19:29.0837099Z .b8 108 2026-02-21T09:19:29.0837158Z .b8 108 2026-02-21T09:19:29.0837209Z .b8 104 2026-02-21T09:19:29.0837259Z .b8 53 2026-02-21T09:19:29.0837316Z .b8 118 2026-02-21T09:19:29.0837366Z .b8 53 2026-02-21T09:19:29.0837415Z .b8 109 2026-02-21T09:19:29.0837464Z .b8 102 2026-02-21T09:19:29.0837520Z .b8 117 2026-02-21T09:19:29.0837570Z .b8 50 2026-02-21T09:19:29.0837620Z .b8 105 2026-02-21T09:19:29.0837670Z .b8 108 2026-02-21T09:19:29.0837727Z .b8 102 2026-02-21T09:19:29.0837777Z .b8 122 2026-02-21T09:19:29.0837827Z .b8 115 2026-02-21T09:19:29.0837883Z .b8 106 2026-02-21T09:19:29.0837933Z .b8 117 2026-02-21T09:19:29.0837982Z .b8 97 2026-02-21T09:19:29.0838031Z .b8 110 
2026-02-21T09:19:29.0838087Z .b8 113 2026-02-21T09:19:29.0838135Z .b8 105 2026-02-21T09:19:29.0838186Z .b8 54 2026-02-21T09:19:29.0838242Z .b8 111 2026-02-21T09:19:29.0838292Z .b8 52 2026-02-21T09:19:29.0838342Z .b8 118 2026-02-21T09:19:29.0838394Z .b8 100 2026-02-21T09:19:29.0838451Z .b8 117 2026-02-21T09:19:29.0838502Z .b8 101 2026-02-21T09:19:29.0838586Z .b8 55 2026-02-21T09:19:29.0838666Z .b8 102 2026-02-21T09:19:29.0838725Z .b8 118 2026-02-21T09:19:29.0838777Z .b8 46 2026-02-21T09:19:29.0838825Z .b8 112 2026-02-21T09:19:29.0838884Z .b8 121 2026-02-21T09:19:29.0838934Z .b8 0 2026-02-21T09:19:29.0839025Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:19:29.0839101Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:19:29.0839159Z .b8 116 2026-02-21T09:19:29.0839208Z .b8 109 2026-02-21T09:19:29.0839257Z .b8 112 2026-02-21T09:19:29.0839314Z .b8 47 2026-02-21T09:19:29.0839364Z .b8 116 2026-02-21T09:19:29.0839413Z .b8 111 2026-02-21T09:19:29.0839461Z .b8 114 2026-02-21T09:19:29.0839520Z .b8 99 2026-02-21T09:19:29.0839569Z .b8 104 2026-02-21T09:19:29.0839618Z .b8 105 2026-02-21T09:19:29.0839673Z .b8 110 2026-02-21T09:19:29.0839722Z .b8 100 2026-02-21T09:19:29.0839770Z .b8 117 2026-02-21T09:19:29.0839845Z .b8 99 2026-02-21T09:19:29.0839905Z .b8 116 2026-02-21T09:19:29.0840003Z .b8 111 2026-02-21T09:19:29.0840060Z .b8 114 2026-02-21T09:19:29.0840116Z .b8 95 2026-02-21T09:19:29.0840166Z .b8 114 2026-02-21T09:19:29.0840213Z .b8 111 2026-02-21T09:19:29.0840262Z .b8 111 2026-02-21T09:19:29.0840317Z .b8 116 2026-02-21T09:19:29.0840365Z .b8 47 2026-02-21T09:19:29.0840411Z .b8 104 2026-02-21T09:19:29.0840458Z .b8 120 2026-02-21T09:19:29.0840512Z .b8 0 2026-02-21T09:19:29.0840560Z } 2026-02-21T09:19:29.0840622Z .section .debug_macinfo { } 2026-02-21T09:19:29.0840626Z 2026-02-21T09:19:29.0840709Z ================================================================ 2026-02-21T09:19:29.0840809Z please share the reproducer above with Triton project. 
2026-02-21T09:19:31.2609799Z 2026-02-21T09:19:31.2612405Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 66/66 15.9 configs/s 2026-02-21T09:19:33.7068470Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 414.6 2026-02-21T09:19:33.7069863Z configs/s 2026-02-21T09:19:33.8826067Z [136s] Generation 7 complete: 2026-02-21T09:19:33.8830430Z error=7 2026-02-21T09:19:33.8831811Z ok=62 2026-02-21T09:19:33.8831969Z min=0.0245 2026-02-21T09:19:33.8832105Z mid=0.0328 2026-02-21T09:19:33.8832224Z max=7.7620 2026-02-21T09:19:33.8832366Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:19:33.8832603Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:19:33.8832826Z 'l2_groupings': [16], 2026-02-21T09:19:33.8832994Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:19:33.8833188Z 'loop_orders': [[1, 0]], 2026-02-21T09:19:33.8833345Z 'num_stages': 6, 2026-02-21T09:19:33.8833478Z 'num_warps': 8, 2026-02-21T09:19:33.8833620Z 'pid_type': 'flat', 2026-02-21T09:19:33.8833771Z 'range_flattens': [None, True], 2026-02-21T09:19:33.8833951Z 'range_multi_buffers': [None, None], 2026-02-21T09:19:33.8834126Z 'range_num_stages': [0, 0], 2026-02-21T09:19:33.8834306Z 'range_unroll_factors': [0, 0], 2026-02-21T09:19:33.8834487Z 'range_warp_specializes': [None, False]} 2026-02-21T09:19:33.8847098Z [136s] Fitting surrogate: 692 points, 692 targets 2026-02-21T09:19:34.8233424Z [137s] Generation 8 starting: 50 neighbors, 3 active search path(s) 2026-02-21T09:19:46.2703020Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 52/52 1.0 configs/s 2026-02-21T09:19:48.6681847Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 52/52 21.8 configs/s 2026-02-21T09:19:49.9178855Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 806.8 2026-02-21T09:19:49.9180509Z configs/s 2026-02-21T09:19:50.0139069Z [152s] Generation 8 complete: 2026-02-21T09:19:50.0139347Z error=17 2026-02-21T09:19:50.0144319Z ok=37 
2026-02-21T09:19:50.0146609Z min=0.0245 2026-02-21T09:19:50.0151634Z mid=0.0369 2026-02-21T09:19:50.0156222Z max=12.4549 2026-02-21T09:19:50.0161133Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:19:50.0161504Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:19:50.0162069Z 'l2_groupings': [16], 2026-02-21T09:19:50.0162317Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:19:50.0162523Z 'loop_orders': [[1, 0]], 2026-02-21T09:19:50.0162674Z 'num_stages': 6, 2026-02-21T09:19:50.0162822Z 'num_warps': 8, 2026-02-21T09:19:50.0162960Z 'pid_type': 'flat', 2026-02-21T09:19:50.0163121Z 'range_flattens': [None, True], 2026-02-21T09:19:50.0163305Z 'range_multi_buffers': [None, None], 2026-02-21T09:19:50.0163487Z 'range_num_stages': [0, 0], 2026-02-21T09:19:50.0163655Z 'range_unroll_factors': [0, 0], 2026-02-21T09:19:50.0163826Z 'range_warp_specializes': [None, False]} 2026-02-21T09:19:50.0164033Z [152s] Fitting surrogate: 746 points, 746 targets 2026-02-21T09:19:50.9588386Z [153s] Generation 9 starting: 47 neighbors, 3 active search path(s) 2026-02-21T09:20:23.0808747Z [185s] Timeout after 30s compiling Config(block_sizes=[256, 512, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:20:23.0822747Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48/48 0.3 configs/s 2026-02-21T09:20:25.1441656Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 48/48 24.0 configs/s 2026-02-21T09:20:26.3073537Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 867.3 2026-02-21T09:20:26.3077612Z configs/s 2026-02-21T09:20:26.4050894Z [189s] Generation 9 complete: 2026-02-21T09:20:26.4053427Z error=14 
2026-02-21T09:20:26.4053597Z timeout=1 2026-02-21T09:20:26.4053727Z ok=36 2026-02-21T09:20:26.4053863Z min=0.0245 2026-02-21T09:20:26.4054023Z mid=0.0369 2026-02-21T09:20:26.4054151Z max=6.5536 2026-02-21T09:20:26.4054302Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:20:26.4054572Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:20:26.4054863Z 'l2_groupings': [16], 2026-02-21T09:20:26.4055051Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:20:26.4055262Z 'loop_orders': [[1, 0]], 2026-02-21T09:20:26.4055428Z 'num_stages': 6, 2026-02-21T09:20:26.4055586Z 'num_warps': 8, 2026-02-21T09:20:26.4055733Z 'pid_type': 'flat', 2026-02-21T09:20:26.4055910Z 'range_flattens': [None, True], 2026-02-21T09:20:26.4056099Z 'range_multi_buffers': [None, None], 2026-02-21T09:20:26.4056302Z 'range_num_stages': [0, 0], 2026-02-21T09:20:26.4056473Z 'range_unroll_factors': [0, 0], 2026-02-21T09:20:26.4056666Z 'range_warp_specializes': [None, False]} 2026-02-21T09:20:26.4077979Z [189s] Fitting surrogate: 797 points, 797 targets 2026-02-21T09:20:27.2216621Z [190s] Generation 10 starting: 38 neighbors, 3 active search path(s) 2026-02-21T09:21:00.3673304Z [223s] Timeout after 30s compiling Config(block_sizes=[128, 2048, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:21:00.3689909Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39/39 0.2 configs/s 2026-02-21T09:21:01.8508986Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 39/39 26.8 configs/s 2026-02-21T09:21:02.8652538Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 992.7 2026-02-21T09:21:02.8654140Z configs/s 2026-02-21T09:21:02.9534112Z [225s] 
Generation 10 complete: 2026-02-21T09:21:02.9538533Z error=15 2026-02-21T09:21:02.9539989Z timeout=1 2026-02-21T09:21:02.9540184Z ok=26 2026-02-21T09:21:02.9540332Z min=0.0245 2026-02-21T09:21:02.9540773Z mid=0.0266 2026-02-21T09:21:02.9540979Z max=6.2905 2026-02-21T09:21:02.9541115Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:21:02.9541361Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:21:02.9541577Z 'l2_groupings': [16], 2026-02-21T09:21:02.9541757Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:21:02.9541953Z 'loop_orders': [[1, 0]], 2026-02-21T09:21:02.9542118Z 'num_stages': 6, 2026-02-21T09:21:02.9542259Z 'num_warps': 8, 2026-02-21T09:21:02.9542413Z 'pid_type': 'flat', 2026-02-21T09:21:02.9542572Z 'range_flattens': [None, True], 2026-02-21T09:21:02.9542745Z 'range_multi_buffers': [None, None], 2026-02-21T09:21:02.9542926Z 'range_num_stages': [0, 0], 2026-02-21T09:21:02.9543085Z 'range_unroll_factors': [0, 0], 2026-02-21T09:21:02.9543264Z 'range_warp_specializes': [None, False]} 2026-02-21T09:21:02.9560917Z [225s] Fitting surrogate: 839 points, 839 targets 2026-02-21T09:21:03.4441668Z [226s] Generation 11 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:21:05.7271100Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 6.2 configs/s 2026-02-21T09:21:06.4919326Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 24.2 configs/s 2026-02-21T09:21:06.4919807Z [229s] Generation 11 complete: 2026-02-21T09:21:06.4920014Z error=5 2026-02-21T09:21:06.4920154Z ok=13 2026-02-21T09:21:06.4920277Z min=0.0245 2026-02-21T09:21:06.4920433Z mid=0.1086 2026-02-21T09:21:06.4926641Z max=6.2229 2026-02-21T09:21:06.4931311Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:21:06.4935372Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:21:06.4939414Z 'l2_groupings': [16], 2026-02-21T09:21:06.4941015Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:21:06.4941248Z 
'loop_orders': [[1, 0]], 2026-02-21T09:21:06.4941446Z 'num_stages': 6, 2026-02-21T09:21:06.4941590Z 'num_warps': 8, 2026-02-21T09:21:06.4941772Z 'pid_type': 'flat', 2026-02-21T09:21:06.4941937Z 'range_flattens': [None, True], 2026-02-21T09:21:06.4942127Z 'range_multi_buffers': [None, None], 2026-02-21T09:21:06.4942310Z 'range_num_stages': [0, 0], 2026-02-21T09:21:06.4942478Z 'range_unroll_factors': [0, 0], 2026-02-21T09:21:06.4942656Z 'range_warp_specializes': [None, False]} 2026-02-21T09:21:06.4947162Z [229s] Fitting surrogate: 857 points, 857 targets 2026-02-21T09:21:06.9842176Z [229s] Generation 12 starting: 20 neighbors, 1 active search path(s) 2026-02-21T09:21:12.7127107Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21/21 2.0 configs/s 2026-02-21T09:21:13.5083327Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 21/21 27.4 configs/s 2026-02-21T09:21:13.5090269Z [236s] Generation 12 complete: 2026-02-21T09:21:13.5090563Z error=9 2026-02-21T09:21:13.5095816Z ok=13 2026-02-21T09:21:13.5100216Z min=0.0245 2026-02-21T09:21:13.5104370Z mid=0.0717 2026-02-21T09:21:13.5106557Z max=7.7241 2026-02-21T09:21:13.5106801Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:21:13.5107097Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:21:13.5107385Z 'l2_groupings': [16], 2026-02-21T09:21:13.5107585Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:21:13.5107812Z 'loop_orders': [[1, 0]], 2026-02-21T09:21:13.5107984Z 'num_stages': 6, 2026-02-21T09:21:13.5108148Z 'num_warps': 8, 2026-02-21T09:21:13.5108307Z 'pid_type': 'flat', 2026-02-21T09:21:13.5108502Z 'range_flattens': [None, True], 2026-02-21T09:21:13.5108688Z 'range_multi_buffers': [None, None], 2026-02-21T09:21:13.5108887Z 'range_num_stages': [0, 0], 2026-02-21T09:21:13.5109064Z 'range_unroll_factors': [0, 0], 2026-02-21T09:21:13.5109250Z 'range_warp_specializes': [None, False]} 2026-02-21T09:21:13.5119579Z [236s] Fitting surrogate: 879 points, 879 targets 
2026-02-21T09:21:14.3898371Z [237s] Generation 13 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:21:40.6348317Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0.3 configs/s 2026-02-21T09:21:41.4652090Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 20.6 configs/s 2026-02-21T09:21:41.4657546Z [264s] Generation 13 complete: 2026-02-21T09:21:41.4662107Z error=4 2026-02-21T09:21:41.4666629Z ok=14 2026-02-21T09:21:41.4668745Z min=0.0245 2026-02-21T09:21:41.4668906Z mid=0.0757 2026-02-21T09:21:41.4669043Z max=5.1098 2026-02-21T09:21:41.4669182Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:21:41.4669428Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:21:41.4669654Z 'l2_groupings': [16], 2026-02-21T09:21:41.4669825Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:21:41.4670020Z 'loop_orders': [[1, 0]], 2026-02-21T09:21:41.4670168Z 'num_stages': 6, 2026-02-21T09:21:41.4670309Z 'num_warps': 8, 2026-02-21T09:21:41.4670447Z 'pid_type': 'flat', 2026-02-21T09:21:41.4670605Z 'range_flattens': [None, True], 2026-02-21T09:21:41.4670777Z 'range_multi_buffers': [None, None], 2026-02-21T09:21:41.4671195Z 'range_num_stages': [0, 0], 2026-02-21T09:21:41.4671432Z 'range_unroll_factors': [0, 0], 2026-02-21T09:21:41.4671621Z 'range_warp_specializes': [None, False]} 2026-02-21T09:21:41.4681364Z [264s] Fitting surrogate: 897 points, 897 targets 2026-02-21T09:21:41.9499884Z [264s] Generation 14 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:22:12.0526699Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0.3 configs/s 2026-02-21T09:22:12.6856029Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 30.2 configs/s 2026-02-21T09:22:12.8153708Z Generation 14: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 7181.3 2026-02-21T09:22:12.8154107Z configs/s 2026-02-21T09:22:12.8438501Z [295s] Generation 14 complete: 2026-02-21T09:22:12.8442712Z error=7 
2026-02-21T09:22:12.8446752Z ok=11 2026-02-21T09:22:12.8451394Z min=0.0246 2026-02-21T09:22:12.8456584Z mid=0.0512 2026-02-21T09:22:12.8458104Z max=0.0940 2026-02-21T09:22:12.8458320Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:22:12.8458633Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:22:12.8458915Z 'l2_groupings': [16], 2026-02-21T09:22:12.8459132Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:22:12.8459372Z 'loop_orders': [[1, 0]], 2026-02-21T09:22:12.8459560Z 'num_stages': 6, 2026-02-21T09:22:12.8459733Z 'num_warps': 8, 2026-02-21T09:22:12.8459894Z 'pid_type': 'flat', 2026-02-21T09:22:12.8460095Z 'range_flattens': [None, True], 2026-02-21T09:22:12.8460297Z 'range_multi_buffers': [None, None], 2026-02-21T09:22:12.8460513Z 'range_num_stages': [0, 0], 2026-02-21T09:22:12.8460694Z 'range_unroll_factors': [0, 0], 2026-02-21T09:22:12.8460899Z 'range_warp_specializes': [None, False]} 2026-02-21T09:22:12.8468960Z [295s] Fitting surrogate: 915 points, 915 targets 2026-02-21T09:22:13.4231959Z [296s] Generation 15 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:22:26.7258143Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.8 configs/s 2026-02-21T09:22:27.5742363Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 21.5 configs/s 2026-02-21T09:22:27.7042285Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 7177.8 2026-02-21T09:22:27.7046910Z configs/s 2026-02-21T09:22:27.7324351Z [310s] Generation 15 complete: 2026-02-21T09:22:27.7328672Z error=6 2026-02-21T09:22:27.7333197Z ok=13 2026-02-21T09:22:27.7334801Z min=0.0245 2026-02-21T09:22:27.7334983Z mid=0.0451 2026-02-21T09:22:27.7335137Z max=13.1983 2026-02-21T09:22:27.7335302Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:22:27.7335587Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:22:27.7335846Z 'l2_groupings': [16], 2026-02-21T09:22:27.7336039Z 'load_eviction_policies': ['last', 
'first'], 2026-02-21T09:22:27.7336259Z 'loop_orders': [[1, 0]], 2026-02-21T09:22:27.7336461Z 'num_stages': 6, 2026-02-21T09:22:27.7336634Z 'num_warps': 8, 2026-02-21T09:22:27.7336798Z 'pid_type': 'flat', 2026-02-21T09:22:27.7336987Z 'range_flattens': [None, True], 2026-02-21T09:22:27.7337182Z 'range_multi_buffers': [None, None], 2026-02-21T09:22:27.7337392Z 'range_num_stages': [0, 0], 2026-02-21T09:22:27.7337574Z 'range_unroll_factors': [0, 0], 2026-02-21T09:22:27.7337782Z 'range_warp_specializes': [None, False]} 2026-02-21T09:22:27.7358157Z [310s] Fitting surrogate: 934 points, 934 targets 2026-02-21T09:22:28.2625312Z [311s] Generation 16 starting: 14 neighbors, 1 active search path(s) 2026-02-21T09:22:31.8536846Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 4.4 configs/s 2026-02-21T09:22:32.8254411Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 16.1 configs/s 2026-02-21T09:22:33.0336132Z Generation 16: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 4585.9 2026-02-21T09:22:33.0337427Z configs/s 2026-02-21T09:22:33.0673577Z [315s] Generation 16 complete: 2026-02-21T09:22:33.0674854Z error=1 2026-02-21T09:22:33.0675066Z ok=15 2026-02-21T09:22:33.0675210Z min=0.0246 2026-02-21T09:22:33.0675365Z mid=0.0491 2026-02-21T09:22:33.0675505Z max=5.2900 2026-02-21T09:22:33.0675659Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:22:33.0675941Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:22:33.0676208Z 'l2_groupings': [16], 2026-02-21T09:22:33.0676411Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:22:33.0676639Z 'loop_orders': [[1, 0]], 2026-02-21T09:22:33.0676821Z 'num_stages': 6, 2026-02-21T09:22:33.0676976Z 'num_warps': 8, 2026-02-21T09:22:33.0677136Z 'pid_type': 'flat', 2026-02-21T09:22:33.0677311Z 'range_flattens': [None, True], 2026-02-21T09:22:33.0677517Z 'range_multi_buffers': [None, None], 2026-02-21T09:22:33.0677727Z 'range_num_stages': [0, 0], 2026-02-21T09:22:33.0677913Z 
'range_unroll_factors': [0, 0], 2026-02-21T09:22:33.0678139Z 'range_warp_specializes': [None, False]} 2026-02-21T09:22:33.0707115Z [315s] Fitting surrogate: 950 points, 950 targets 2026-02-21T09:22:33.4028500Z [316s] Autotuning complete in 316.3s after searching 916 configs. 2026-02-21T09:22:33.4028948Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:22:33.4030137Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=8, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:22:33.4031177Z 2026-02-21T09:22:33.4031475Z [316s] Code of selected kernel: /tmp/torchinductor_root/um/cumfkfhuvu2juy3h2cvq36tllrmi2bsjofqwp4iwfuiyawferwdk.py 2026-02-21T09:22:33.4169793Z from __future__ import annotations 2026-02-21T09:22:33.4170069Z 2026-02-21T09:22:33.4170164Z import torch 2026-02-21T09:22:33.4170759Z import helion 2026-02-21T09:22:33.4170933Z import triton 2026-02-21T09:22:33.4171180Z import triton.language as tl 2026-02-21T09:22:33.4171476Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:22:33.4171718Z 2026-02-21T09:22:33.4171806Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:22:33.4172032Z _BLOCK_SIZE_0 = tl.constexpr(256) 2026-02-21T09:22:33.4172243Z _BLOCK_SIZE_2 = tl.constexpr(32) 2026-02-21T09:22:33.4172463Z # src[matmul.py:42]: def matmul( 2026-02-21T09:22:33.4172686Z # src[matmul.py:43]: x: Tensor, 2026-02-21T09:22:33.4172899Z # src[matmul.py:44]: y: Tensor, 2026-02-21T09:22:33.4173124Z # src[matmul.py:42-68]: ... 
2026-02-21T09:22:33.4173350Z helion.runtime.set_triton_allocator() 2026-02-21T09:22:33.4173516Z 2026-02-21T09:22:33.4173584Z @triton.jit 2026-02-21T09:22:33.4173760Z def _helion_matmul(x, y, out): 2026-02-21T09:22:33.4174182Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:22:33.4174948Z y_desc = tl.make_tensor_descriptor(y, [8192, 1024], [1024, 1], [_BLOCK_SIZE_1, _BLOCK_SIZE_2]) 2026-02-21T09:22:33.4175409Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:22:33.4175718Z num_pid_m = tl.cdiv(8192, _BLOCK_SIZE_1) 2026-02-21T09:22:33.4175957Z num_pid_n = tl.cdiv(1024, _BLOCK_SIZE_0) 2026-02-21T09:22:33.4176198Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:22:33.4176418Z num_pid_in_group = 16 * num_pid_n 2026-02-21T09:22:33.4176672Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:22:33.4176914Z first_pid_m = group_id * 16 2026-02-21T09:22:33.4177191Z group_size_m = min(num_pid_m - first_pid_m, 16) 2026-02-21T09:22:33.4177521Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:22:33.4177858Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:22:33.4178133Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:22:33.4178412Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:22:33.4178703Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:22:33.4178981Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:22:33.4179354Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:22:33.4179733Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:22:33.4180026Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:22:33.4180373Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:22:33.4180823Z for offset_2 in tl.range(0, 1024, _BLOCK_SIZE_2, warp_specialize=False, flatten=True): 
2026-02-21T09:22:33.4181222Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:22:33.4181507Z acc_copy = acc 2026-02-21T09:22:33.4181697Z acc_copy_0 = acc_copy 2026-02-21T09:22:33.4182000Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:22:33.4182505Z load = tl.load(x + (indices_0[:, None] * 1024 + indices_2[None, :] * 1), None, eviction_policy='evict_last') 2026-02-21T09:22:33.4182978Z load_1 = tl.permute(y_desc.load([offset_1, offset_2]), [1, 0]) 2026-02-21T09:22:33.4183501Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:22:33.4184023Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:22:33.4184388Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:22:33.4184660Z tl.store(out + (indices_0[:, None] * 8192 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:22:33.4184928Z 2026-02-21T09:22:33.4185245Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:22:33.4185691Z """ 2026-02-21T09:22:33.4185961Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:22:33.4186377Z Args: 2026-02-21T09:22:33.4186554Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:22:33.4186815Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:22:33.4187168Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:22:33.4187588Z after the matmul. Defaults to identity (no change). 2026-02-21T09:22:33.4187849Z Returns: 2026-02-21T09:22:33.4188032Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:22:33.4188261Z """ 2026-02-21T09:22:33.4188420Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:22:33.4188637Z m, k = x.size() 2026-02-21T09:22:33.4188817Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:22:33.4189037Z k2, n = y.size() 2026-02-21T09:22:33.4189318Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:22:33.4189670Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:22:33.4189928Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:22:33.4190294Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:22:33.4190646Z # src[matmul.py:62]: ) 2026-02-21T09:22:33.4190962Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:22:33.4191391Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:22:33.4191655Z _BLOCK_SIZE_1 = 256 2026-02-21T09:22:33.4191848Z _BLOCK_SIZE_0 = 256 2026-02-21T09:22:33.4192087Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:22:33.4192434Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:22:33.4192781Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:22:33.4193038Z # src[matmul.py:63-67]: ... 
2026-02-21T09:22:33.4193508Z _launcher(_helion_matmul, (triton.cdiv(8192, _BLOCK_SIZE_1) * triton.cdiv(1024, _BLOCK_SIZE_0),), x, y, out, num_warps=8, num_stages=6) 2026-02-21T09:22:33.4193989Z # src[matmul.py:68]: return out 2026-02-21T09:22:33.4194202Z return out 2026-02-21T09:22:54.8804458Z WARNING:tritonbench.utils.triton_op:Completed input ID 5: 2026-02-21T09:22:54.8805898Z (M, N, K) 2026-02-21T09:22:54.8806116Z ------------------ 2026-02-21T09:22:54.8806307Z (1024, 8192, 1024) 2026-02-21T09:22:54.8806434Z 2026-02-21T09:22:54.8820952Z 50%|█████ | 4/8 [24:50<26:20, 395.08s/it]WARNING:tritonbench.utils.triton_op:Running input ID 6: 2026-02-21T09:22:54.8824807Z (M, N, K) 2026-02-21T09:22:54.8826355Z ------------------ 2026-02-21T09:22:54.8826608Z (8192, 2048, 2048) 2026-02-21T09:22:54.8827003Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:23:42.5654980Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:24:18.9208742Z INFO:tritonbench.utils.triton_op:Took 81.77ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:25:01.0060730Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:25:01.0065147Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:25:01.0068907Z 'dtype': 'torch.float16', 2026-02-21T09:25:01.0070403Z 'shape': (8192, 2048), 2026-02-21T09:25:01.0070663Z 'stride': (2048, 1)}, 2026-02-21T09:25:01.0074262Z { 'device': 'cuda:0', 2026-02-21T09:25:01.0074549Z 'dtype': 'torch.float16', 2026-02-21T09:25:01.0079059Z 'shape': (2048, 2048), 2026-02-21T09:25:01.0084294Z 'stride': (1, 2048)}, 2026-02-21T09:25:01.0086333Z None), 2026-02-21T09:25:01.0086562Z 'kwargs': {}} 2026-02-21T09:25:01.0108637Z INFO:tritonbench.utils.triton_op:Took 5.47ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:25:01.1030826Z [0s] Autotune random seed: 2137757931 2026-02-21T09:25:01.2331222Z [0s] Starting LFBOPatternSearch with 
initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:25:06.8460557Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 27.6 configs/s 2026-02-21T09:25:30.9415332Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 4.1 configs/s 2026-02-21T09:25:30.9426567Z [29s] Adaptive compile timeout: 30s (90% percentile=3.8s, bounds=[30.0s, 30s]) 2026-02-21T09:25:31.3940763Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 1956.1 configs/s 2026-02-21T09:25:31.4670895Z [30s] Initial random population of 100, 5 starting points: 2026-02-21T09:25:31.4676154Z error=16 2026-02-21T09:25:31.4680102Z ok=84 2026-02-21T09:25:31.4685274Z min=0.1803 2026-02-21T09:25:31.4689332Z mid=4.4831 2026-02-21T09:25:31.4690810Z max=1063.1486 2026-02-21T09:25:31.4691006Z best={'block_sizes': [64, 256, 16], 2026-02-21T09:25:31.4691247Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:25:31.4691725Z 'l2_groupings': [1], 2026-02-21T09:25:31.4691977Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:25:31.4692198Z 'loop_orders': [[1, 0]], 2026-02-21T09:25:31.4692348Z 'num_stages': 7, 2026-02-21T09:25:31.4692489Z 'num_warps': 8, 2026-02-21T09:25:31.4692623Z 'pid_type': 'flat', 2026-02-21T09:25:31.4692780Z 'range_flattens': [None, None], 2026-02-21T09:25:31.4692953Z 'range_multi_buffers': [None, False], 2026-02-21T09:25:31.4693138Z 'range_num_stages': [0, 0], 2026-02-21T09:25:31.4693306Z 'range_unroll_factors': [0, 0], 2026-02-21T09:25:31.4693478Z 'range_warp_specializes': [None, None]} 2026-02-21T09:25:31.4693770Z [30s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:25:32.6634048Z [31s] Generation 1 starting: 83 neighbors, 5 active search path(s) 2026-02-21T09:25:37.7973123Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 17.9 configs/s 2026-02-21T09:25:43.4065986Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 15.9 configs/s 
2026-02-21T09:25:44.0397533Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1494.8 2026-02-21T09:25:44.0401919Z configs/s 2026-02-21T09:25:44.0982939Z [42s] Generation 1 complete: 2026-02-21T09:25:44.0988088Z error=4 2026-02-21T09:25:44.0989823Z ok=84 2026-02-21T09:25:44.0990035Z min=0.1023 2026-02-21T09:25:44.0996178Z mid=0.4669 2026-02-21T09:25:44.1000936Z max=15.2789 2026-02-21T09:25:44.1005501Z best={'block_sizes': [256, 256, 16], 2026-02-21T09:25:44.1006948Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:25:44.1007199Z 'l2_groupings': [2], 2026-02-21T09:25:44.1007382Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:25:44.1007574Z 'loop_orders': [[1, 0]], 2026-02-21T09:25:44.1007736Z 'num_stages': 7, 2026-02-21T09:25:44.1007874Z 'num_warps': 4, 2026-02-21T09:25:44.1008020Z 'pid_type': 'flat', 2026-02-21T09:25:44.1008173Z 'range_flattens': [None, None], 2026-02-21T09:25:44.1008380Z 'range_multi_buffers': [None, False], 2026-02-21T09:25:44.1008575Z 'range_num_stages': [0, 0], 2026-02-21T09:25:44.1008738Z 'range_unroll_factors': [0, 0], 2026-02-21T09:25:44.1008919Z 'range_warp_specializes': [None, None]} 2026-02-21T09:25:44.1009198Z [42s] Fitting surrogate: 188 points, 188 targets 2026-02-21T09:25:45.3496145Z [44s] Generation 2 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:25:56.4668507Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 1.2 configs/s 2026-02-21T09:26:01.1210495Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 19.1 configs/s 2026-02-21T09:26:04.1407742Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 329.2 2026-02-21T09:26:04.1408088Z configs/s 2026-02-21T09:26:04.2633636Z [63s] Generation 2 complete: 2026-02-21T09:26:04.2635296Z error=20 2026-02-21T09:26:04.2635469Z ok=70 2026-02-21T09:26:04.2635619Z min=0.1024 2026-02-21T09:26:04.2635758Z mid=0.2302 2026-02-21T09:26:04.2635892Z max=15.4932 2026-02-21T09:26:04.2636348Z 
best={'block_sizes': [256, 256, 16], 2026-02-21T09:26:04.2636677Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:26:04.2636897Z 'l2_groupings': [32], 2026-02-21T09:26:04.2637071Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:26:04.2637257Z 'loop_orders': [[1, 0]], 2026-02-21T09:26:04.2637420Z 'num_stages': 6, 2026-02-21T09:26:04.2637558Z 'num_warps': 8, 2026-02-21T09:26:04.2637708Z 'pid_type': 'flat', 2026-02-21T09:26:04.2637864Z 'range_flattens': [None, None], 2026-02-21T09:26:04.2638050Z 'range_multi_buffers': [None, False], 2026-02-21T09:26:04.2638232Z 'range_num_stages': [0, 0], 2026-02-21T09:26:04.2638409Z 'range_unroll_factors': [0, 0], 2026-02-21T09:26:04.2638595Z 'range_warp_specializes': [None, None]} 2026-02-21T09:26:04.2651116Z [63s] Fitting surrogate: 278 points, 278 targets 2026-02-21T09:26:05.4945681Z [64s] Generation 3 starting: 78 neighbors, 5 active search path(s) 2026-02-21T09:26:13.6889092Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 1.5 configs/s 2026-02-21T09:26:18.1768581Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 17.8 configs/s 2026-02-21T09:26:18.7916588Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1547.3 2026-02-21T09:26:18.7917401Z configs/s 2026-02-21T09:26:18.8490886Z [77s] Generation 3 complete: 2026-02-21T09:26:18.8494967Z error=7 2026-02-21T09:26:18.8498839Z ok=77 2026-02-21T09:26:18.8502739Z min=0.0745 2026-02-21T09:26:18.8506853Z mid=0.1393 2026-02-21T09:26:18.8508962Z max=11.4693 2026-02-21T09:26:18.8509169Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:26:18.8509417Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:26:18.8509655Z 'l2_groupings': [2], 2026-02-21T09:26:18.8509843Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:26:18.8510071Z 'loop_orders': [[1, 0]], 2026-02-21T09:26:18.8510245Z 'num_stages': 7, 2026-02-21T09:26:18.8510399Z 'num_warps': 8, 2026-02-21T09:26:18.8510566Z 'pid_type': 
'flat', 2026-02-21T09:26:18.8510726Z 'range_flattens': [None, None], 2026-02-21T09:26:18.8510917Z 'range_multi_buffers': [None, False], 2026-02-21T09:26:18.8511104Z 'range_num_stages': [0, 0], 2026-02-21T09:26:18.8511283Z 'range_unroll_factors': [0, 0], 2026-02-21T09:26:18.8511466Z 'range_warp_specializes': [None, False]} 2026-02-21T09:26:18.8511693Z [77s] Fitting surrogate: 362 points, 362 targets 2026-02-21T09:26:20.5306762Z [79s] Generation 4 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:26:52.1390780Z [110s] Timeout after 30s compiling Config(block_sizes=[256, 1024, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=7, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:26:52.1405014Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 0.4 configs/s 2026-02-21T09:26:55.9570531Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 23.2 configs/s 2026-02-21T09:26:57.7525328Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 550.9 2026-02-21T09:26:57.7529270Z configs/s 2026-02-21T09:26:57.8481108Z [116s] Generation 4 complete: 2026-02-21T09:26:57.8484943Z error=24 2026-02-21T09:26:57.8490058Z timeout=1 2026-02-21T09:26:57.8491580Z ok=66 2026-02-21T09:26:57.8491795Z min=0.0745 2026-02-21T09:26:57.8498037Z mid=0.1065 2026-02-21T09:26:57.8500378Z max=10.4837 2026-02-21T09:26:57.8500602Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:26:57.8500865Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:26:57.8505542Z 'l2_groupings': [2], 2026-02-21T09:26:57.8507294Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:26:57.8507853Z 'loop_orders': [[1, 0]], 2026-02-21T09:26:57.8508087Z 'num_stages': 7, 2026-02-21T09:26:57.8508242Z 'num_warps': 
8, 2026-02-21T09:26:57.8508385Z 'pid_type': 'flat', 2026-02-21T09:26:57.8508555Z 'range_flattens': [None, False], 2026-02-21T09:26:57.8508750Z 'range_multi_buffers': [None, False], 2026-02-21T09:26:57.8508936Z 'range_num_stages': [0, 0], 2026-02-21T09:26:57.8509113Z 'range_unroll_factors': [0, 0], 2026-02-21T09:26:57.8509361Z 'range_warp_specializes': [None, False]} 2026-02-21T09:26:57.8509584Z [116s] Fitting surrogate: 453 points, 453 targets 2026-02-21T09:26:59.1481126Z [117s] Generation 5 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:27:04.8346604Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 24.8 configs/s 2026-02-21T09:27:09.0121058Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 87/87 20.6 configs/s 2026-02-21T09:27:13.1595113Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 241.7 2026-02-21T09:27:13.1595575Z configs/s 2026-02-21T09:27:13.3469935Z [132s] Generation 5 complete: 2026-02-21T09:27:13.3473630Z error=18 2026-02-21T09:27:13.3478474Z ok=73 2026-02-21T09:27:13.3482507Z min=0.0706 2026-02-21T09:27:13.3487924Z mid=0.1005 2026-02-21T09:27:13.3489767Z max=15.6682 2026-02-21T09:27:13.3490014Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:27:13.3493251Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:27:13.3493568Z 'l2_groupings': [16], 2026-02-21T09:27:13.3499856Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:27:13.3501394Z 'loop_orders': [[1, 0]], 2026-02-21T09:27:13.3501594Z 'num_stages': 3, 2026-02-21T09:27:13.3501804Z 'num_warps': 8, 2026-02-21T09:27:13.3501952Z 'pid_type': 'flat', 2026-02-21T09:27:13.3502127Z 'range_flattens': [None, False], 2026-02-21T09:27:13.3507027Z 'range_multi_buffers': [None, True], 2026-02-21T09:27:13.3508401Z 'range_num_stages': [0, 0], 2026-02-21T09:27:13.3508634Z 'range_unroll_factors': [0, 0], 2026-02-21T09:27:13.3508837Z 'range_warp_specializes': [None, None]} 2026-02-21T09:27:13.3509133Z [132s] Fitting surrogate: 
544 points, 544 targets 2026-02-21T09:27:14.8146455Z [133s] Generation 6 starting: 93 neighbors, 5 active search path(s) 2026-02-21T09:27:27.4948328Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94/94 1.6 configs/s 2026-02-21T09:27:30.4235598Z 2026-02-21T09:27:30.4235614Z 2026-02-21T09:27:30.4235997Z ================================================================ 2026-02-21T09:27:30.4236292Z Internal Triton PTX codegen error 2026-02-21T09:27:30.4236477Z `ptxas` stderr: 2026-02-21T09:27:30.4236937Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:27:30.4237461Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:27:30.4237623Z 2026-02-21T09:27:30.4238045Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplsezy5b7.ptx -o /tmp/tmplsezy5b7.ptx.o 2026-02-21T09:27:30.4238850Z 2026-02-21T09:27:30.4239123Z 2026-02-21T09:27:30.4239202Z // 2026-02-21T09:27:30.4239350Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:27:30.4239540Z // 2026-02-21T09:27:30.4239674Z 2026-02-21T09:27:30.4239733Z .version 8.7 2026-02-21T09:27:30.4239878Z .target sm_100a 2026-02-21T09:27:30.4240013Z .address_size 64 2026-02-21T09:27:30.4240106Z 2026-02-21T09:27:30.4240230Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:27:30.4240484Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:27:30.4240697Z // @_helion_matmul 2026-02-21T09:27:30.4240899Z .visible .entry _helion_matmul( 2026-02-21T09:27:30.4241110Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:27:30.4241364Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:27:30.4241610Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:27:30.4241846Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_3, 2026-02-21T09:27:30.4242093Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:27:30.4242289Z ) 2026-02-21T09:27:30.4242415Z .reqntid 256 2026-02-21T09:27:30.4242538Z .maxnreg 32 2026-02-21T09:27:30.4242661Z { 2026-02-21T09:27:30.4242779Z .reg .pred %p<101>; 2026-02-21T09:27:30.4242932Z .reg .b16 %rs<4>; 2026-02-21T09:27:30.4243065Z .reg .b32 %r<1608>; 2026-02-21T09:27:30.4243206Z .reg .b64 %rd<656>; 2026-02-21T09:27:30.4243347Z $L__func_begin0: 2026-02-21T09:27:30.4243427Z 2026-02-21T09:27:30.4243478Z // %bb.0: 2026-02-21T09:27:30.4243717Z .loc 1 19 0 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:19 2026-02-21T09:27:30.4244004Z mov.u32 %r1, %tid.x; 2026-02-21T09:27:30.4244310Z ld.param.b64 %rd16, [_helion_matmul_param_1]; 2026-02-21T09:27:30.4244511Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:27:30.4244900Z mov.b32 %r77, global_smem; 2026-02-21T09:27:30.4245061Z // begin inline asm 2026-02-21T09:27:30.4245312Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r77], 512; 2026-02-21T09:27:30.4245562Z // end inline asm 2026-02-21T09:27:30.4245722Z ld.param.b64 %rd33, [_helion_matmul_param_3]; 2026-02-21T09:27:30.4245914Z bar.sync 0; 2026-02-21T09:27:30.4246054Z ld.shared.b32 %r1600, [global_smem]; 2026-02-21T09:27:30.4246228Z bar.sync 0; 2026-02-21T09:27:30.4246355Z // begin inline asm 2026-02-21T09:27:30.4246562Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:27:30.4246786Z // end inline asm 2026-02-21T09:27:30.4247041Z .loc 1 21 67 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:21:67 2026-02-21T09:27:30.4247345Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:27:30.4247497Z mov.u32 %r86, %ctaid.y; 2026-02-21T09:27:30.4247654Z mov.u32 %r87, %ctaid.z; 2026-02-21T09:27:30.4247807Z mov.u32 %r88, %nctaid.x; 2026-02-21T09:27:30.4247957Z mov.u32 %r89, %nctaid.y; 2026-02-21T09:27:30.4248116Z mad.lo.s32 %r90, %r87, %r89, %r86; 2026-02-21T09:27:30.4248286Z mad.lo.s32 %r91, 
%r90, %r88, %r3; 2026-02-21T09:27:30.4248462Z shl.b32 %r92, %r91, 7; 2026-02-21T09:27:30.4248611Z cvt.s64.s32 %rd34, %r92; 2026-02-21T09:27:30.4248779Z add.s64 %rd30, %rd33, %rd34; 2026-02-21T09:27:30.4248935Z shl.b32 %r93, %r1, 2; 2026-02-21T09:27:30.4249096Z add.s32 %r78, %r77, %r93; 2026-02-21T09:27:30.4249256Z mov.b32 %r95, 0; 2026-02-21T09:27:30.4249394Z // begin inline asm 2026-02-21T09:27:30.4249556Z @%p1 st.shared.b32 [ %r78 + 0 ], %r95; 2026-02-21T09:27:30.4249726Z // end inline asm 2026-02-21T09:27:30.4249868Z bar.warp.sync -1; 2026-02-21T09:27:30.4250009Z setp.eq.b32 %p93, %r1, 0; 2026-02-21T09:27:30.4250164Z cvt.u64.u32 %rd15, %r77; 2026-02-21T09:27:30.4250308Z // begin inline asm 2026-02-21T09:27:30.4250570Z @%p93 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd15 + 0 ], %rd16; 2026-02-21T09:27:30.4250925Z // end inline asm 2026-02-21T09:27:30.4251055Z // begin inline asm 2026-02-21T09:27:30.4251287Z @%p93 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1; 2026-02-21T09:27:30.4251533Z // end inline asm 2026-02-21T09:27:30.4251669Z mov.b32 %r80, 32; 2026-02-21T09:27:30.4251800Z // begin inline asm 2026-02-21T09:27:30.4252036Z @%p93 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r80; 2026-02-21T09:27:30.4252293Z // end inline asm 2026-02-21T09:27:30.4252430Z mov.b32 %r81, 256; 2026-02-21T09:27:30.4252568Z // begin inline asm 2026-02-21T09:27:30.4252795Z @%p93 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r81; 2026-02-21T09:27:30.4253055Z // end inline asm 2026-02-21T09:27:30.4253185Z mov.b32 %r82, 2048; 2026-02-21T09:27:30.4253334Z // begin inline asm 2026-02-21T09:27:30.4253571Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r82; 2026-02-21T09:27:30.4253859Z // end inline asm 2026-02-21T09:27:30.4253999Z // begin inline asm 2026-02-21T09:27:30.4254236Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd15 + 0 ], 
0x1, %r82; 2026-02-21T09:27:30.4254498Z // end inline asm 2026-02-21T09:27:30.4254626Z mov.b64 %rd23, 4096; 2026-02-21T09:27:30.4254803Z // begin inline asm 2026-02-21T09:27:30.4255049Z @%p93 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd15 + 0 ], 0x0, %rd23; 2026-02-21T09:27:30.4255330Z // end inline asm 2026-02-21T09:27:30.4255456Z mov.b32 %r84, 1; 2026-02-21T09:27:30.4255591Z // begin inline asm 2026-02-21T09:27:30.4255843Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0, %r84; 2026-02-21T09:27:30.4256119Z // end inline asm 2026-02-21T09:27:30.4256255Z // begin inline asm 2026-02-21T09:27:30.4256528Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x1, %r84; 2026-02-21T09:27:30.4256840Z // end inline asm 2026-02-21T09:27:30.4256971Z // begin inline asm 2026-02-21T09:27:30.4257201Z @%p93 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x6; 2026-02-21T09:27:30.4257462Z // end inline asm 2026-02-21T09:27:30.4257599Z // begin inline asm 2026-02-21T09:27:30.4257844Z @%p93 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T09:27:30.4258120Z // end inline asm 2026-02-21T09:27:30.4258256Z // begin inline asm 2026-02-21T09:27:30.4258485Z @%p93 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x2; 2026-02-21T09:27:30.4258752Z // end inline asm 2026-02-21T09:27:30.4258880Z // begin inline asm 2026-02-21T09:27:30.4259109Z @%p93 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd15 + 0 ], 0x0; 2026-02-21T09:27:30.4259365Z // end inline asm 2026-02-21T09:27:30.4259491Z // begin inline asm 2026-02-21T09:27:30.4259838Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd30 + 0 ], [ %rd15 + 0 ], 0x80; 2026-02-21T09:27:30.4260210Z // end inline asm 2026-02-21T09:27:30.4260345Z // begin inline asm 2026-02-21T09:27:30.4260544Z @%p1 
fence.proxy.tensormap::generic.acquire.gpu [ %rd30 + 0 ], 0x80; 2026-02-21T09:27:30.4260796Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:27:30.4260986Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:27:30.4261156Z // end inline asm 2026-02-21T09:27:30.4261287Z bar.sync 0; 2026-02-21T09:27:30.4261421Z cvta.global.u64 %rd82, %rd30; 2026-02-21T09:27:30.4261702Z .loc 1 27 62 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:27:62 2026-02-21T09:27:30.4262004Z setp.gt.u32 %p21, %r3, 255; 2026-02-21T09:27:30.4262183Z @%p21 bra $L__BB0_8; 2026-02-21T09:27:30.4262341Z // %bb.1: // %.lr.ph 2026-02-21T09:27:30.4262637Z .loc 1 0 62 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:0:62 2026-02-21T09:27:30.4262947Z ld.param.b64 %rd13, [_helion_matmul_param_0]; 2026-02-21T09:27:30.4263311Z .loc 1 47 48 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:47:48 2026-02-21T09:27:30.4263594Z shl.b32 %r427, %r1, 3; 2026-02-21T09:27:30.4263745Z and.b32 %r428, %r427, 24; 2026-02-21T09:27:30.4264004Z .loc 1 39 45 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:39:45 2026-02-21T09:27:30.4264285Z and.b32 %r429, %r1, 31; 2026-02-21T09:27:30.4264441Z shl.b32 %r430, %r429, 3; 2026-02-21T09:27:30.4264592Z and.b32 %r431, %r1, 224; 2026-02-21T09:27:30.4264786Z bfe.u32 %r432, %r1, 5, 3; 2026-02-21T09:27:30.4264940Z shr.u32 %r433, %r1, 2; 2026-02-21T09:27:30.4265086Z bfe.u32 %r4, %r1, 2, 6; 2026-02-21T09:27:30.4265237Z shr.u32 %r434, %r1, 5; 2026-02-21T09:27:30.4265379Z shl.b32 %r435, %r1, 4; 2026-02-21T09:27:30.4265529Z and.b32 %r436, %r435, 4080; 2026-02-21T09:27:30.4265681Z shl.b32 %r437, %r1, 1; 2026-02-21T09:27:30.4265833Z and.b32 %r438, %r437, 48; 2026-02-21T09:27:30.4265987Z xor.b32 %r5, %r436, %r438; 2026-02-21T09:27:30.4266150Z add.s32 %r373, %r77, %r5; 2026-02-21T09:27:30.4266306Z add.s32 %r375, %r373, 4096; 2026-02-21T09:27:30.4266456Z add.s32 %r377, %r373, 8192; 2026-02-21T09:27:30.4266618Z add.s32 %r379, %r373, 12288; 
2026-02-21T09:27:30.4266771Z add.s32 %r386, %r373, 16384; 2026-02-21T09:27:30.4266930Z add.s32 %r388, %r373, 20480; 2026-02-21T09:27:30.4267079Z add.s32 %r390, %r373, 24576; 2026-02-21T09:27:30.4267231Z add.s32 %r392, %r373, 28672; 2026-02-21T09:27:30.4267373Z add.s32 %r399, %r373, 32768; 2026-02-21T09:27:30.4267527Z add.s32 %r401, %r373, 36864; 2026-02-21T09:27:30.4267672Z add.s32 %r403, %r373, 40960; 2026-02-21T09:27:30.4267823Z add.s32 %r405, %r373, 45056; 2026-02-21T09:27:30.4267975Z add.s32 %r412, %r373, 49152; 2026-02-21T09:27:30.4268123Z add.s32 %r414, %r373, 53248; 2026-02-21T09:27:30.4268322Z add.s32 %r416, %r373, 57344; 2026-02-21T09:27:30.4268550Z add.s32 %r418, %r373, 61440; 2026-02-21T09:27:30.4268739Z or.b32 %r6, %r428, 128; 2026-02-21T09:27:30.4268923Z add.s32 %r501, %r373, 65536; 2026-02-21T09:27:30.4269104Z add.s32 %r503, %r373, 69632; 2026-02-21T09:27:30.4269283Z add.s32 %r505, %r373, 73728; 2026-02-21T09:27:30.4269466Z add.s32 %r507, %r373, 77824; 2026-02-21T09:27:30.4269646Z shl.b32 %r440, %r1, 12; 2026-02-21T09:27:30.4269816Z and.b32 %r441, %r440, 28672; 2026-02-21T09:27:30.4270010Z or.b32 %r442, %r441, %r436; 2026-02-21T09:27:30.4270181Z xor.b32 %r443, %r442, 16; 2026-02-21T09:27:30.4270352Z xor.b32 %r444, %r442, 32; 2026-02-21T09:27:30.4270521Z xor.b32 %r445, %r442, 48; 2026-02-21T09:27:30.4270688Z xor.b32 %r446, %r442, 64; 2026-02-21T09:27:30.4270860Z xor.b32 %r447, %r442, 80; 2026-02-21T09:27:30.4271034Z xor.b32 %r448, %r442, 96; 2026-02-21T09:27:30.4271208Z xor.b32 %r449, %r442, 112; 2026-02-21T09:27:30.4271398Z shl.b32 %r450, %r431, 7; 2026-02-21T09:27:30.4271581Z shl.b32 %r451, %r429, 4; 2026-02-21T09:27:30.4271741Z shr.u32 %r452, %r431, 1; 2026-02-21T09:27:30.4271912Z or.b32 %r453, %r450, %r451; 2026-02-21T09:27:30.4272065Z xor.b32 %r454, %r453, %r452; 2026-02-21T09:27:30.4272229Z add.s32 %r867, %r77, %r454; 2026-02-21T09:27:30.4272483Z .loc 1 27 62 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:27:62 
2026-02-21T09:27:30.4272774Z cvt.u64.u32 %rd55, %r428; 2026-02-21T09:27:30.4273024Z .loc 1 34 33 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:34:33 2026-02-21T09:27:30.4273308Z shr.u32 %r455, %r3, 3; 2026-02-21T09:27:30.4273461Z and.b32 %r456, %r455, 30; 2026-02-21T09:27:30.4273707Z .loc 1 36 64 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:36:64 2026-02-21T09:27:30.4273986Z and.b32 %r27, %r3, 1; 2026-02-21T09:27:30.4274230Z .loc 1 36 30 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:36:30 2026-02-21T09:27:30.4274514Z or.b32 %r457, %r456, %r27; 2026-02-21T09:27:30.4274812Z .loc 1 38 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:38:27 2026-02-21T09:27:30.4275159Z shl.b32 %r458, %r457, 8; 2026-02-21T09:27:30.4275424Z .loc 1 39 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:39:32 2026-02-21T09:27:30.4275702Z or.b32 %r459, %r458, %r4; 2026-02-21T09:27:30.4275859Z or.b32 %r460, %r433, %r458; 2026-02-21T09:27:30.4276011Z or.b32 %r28, %r458, %r432; 2026-02-21T09:27:30.4276273Z .loc 1 40 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:40:27 2026-02-21T09:27:30.4276552Z shl.b32 %r461, %r3, 7; 2026-02-21T09:27:30.4276708Z and.b32 %r512, %r461, 1792; 2026-02-21T09:27:30.4276964Z .loc 1 51 53 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:53 2026-02-21T09:27:30.4277257Z shl.b32 %r462, %r459, 11; 2026-02-21T09:27:30.4277414Z shl.b32 %r463, %r460, 11; 2026-02-21T09:27:30.4277562Z or.b32 %r464, %r463, 393216; 2026-02-21T09:27:30.4277828Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4278124Z shfl.sync.idx.b32 %r62, %r434, 0, 31, -1; 2026-02-21T09:27:30.4278313Z shl.b32 %r465, %r62, 21; 2026-02-21T09:27:30.4278462Z and.b32 %r466, %r465, 6291456; 2026-02-21T09:27:30.4278630Z add.s32 %r467, %r466, %r1600; 2026-02-21T09:27:30.4278790Z shl.b32 %r468, %r62, 6; 2026-02-21T09:27:30.4278934Z and.b32 %r469, %r468, 256; 
2026-02-21T09:27:30.4279090Z add.s32 %r862, %r467, %r469; 2026-02-21T09:27:30.4279241Z mov.pred %p61, -1; 2026-02-21T09:27:30.4279387Z // begin inline asm 2026-02-21T09:27:30.4279729Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 0], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4280104Z // end inline asm 2026-02-21T09:27:30.4280237Z // begin inline asm 2026-02-21T09:27:30.4280653Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 16], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4281031Z // end inline asm 2026-02-21T09:27:30.4281160Z // begin inline asm 2026-02-21T09:27:30.4281478Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 32], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4281841Z // end inline asm 2026-02-21T09:27:30.4281978Z // begin inline asm 2026-02-21T09:27:30.4282296Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 48], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4282651Z // end inline asm 2026-02-21T09:27:30.4282788Z // begin inline asm 2026-02-21T09:27:30.4283096Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 64], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4283451Z // end inline asm 2026-02-21T09:27:30.4283582Z // begin inline asm 2026-02-21T09:27:30.4283905Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 80], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4284285Z // end inline asm 2026-02-21T09:27:30.4284414Z // begin inline asm 2026-02-21T09:27:30.4284764Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 96], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, 
%r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4285111Z // end inline asm 2026-02-21T09:27:30.4285245Z // begin inline asm 2026-02-21T09:27:30.4285562Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 112], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4286379Z [149s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:27:30.4287601Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=5, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:27:30.4288741Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:27:30.4288969Z `ptxas` stderr: 2026-02-21T09:27:30.4289387Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:27:30.4289842Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:27:30.4289989Z 2026-02-21T09:27:30.4290386Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplsezy5b7.ptx -o /tmp/tmplsezy5b7.ptx.o 2026-02-21T09:27:30.4290810Z 2026-02-21T09:27:30.4290941Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:27:30.4291170Z // end inline asm 2026-02-21T09:27:30.4291308Z // begin inline asm 2026-02-21T09:27:30.4291628Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 128], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4291984Z // end inline asm 2026-02-21T09:27:30.4292112Z // begin inline asm 2026-02-21T09:27:30.4292433Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 144], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4292781Z // end inline asm 2026-02-21T09:27:30.4292907Z // begin inline asm 2026-02-21T09:27:30.4293261Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 160], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4293628Z // end inline asm 2026-02-21T09:27:30.4293767Z // begin inline asm 2026-02-21T09:27:30.4294084Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 176], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4294429Z // end inline asm 2026-02-21T09:27:30.4294562Z // begin inline asm 2026-02-21T09:27:30.4294931Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 192], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4295297Z // end inline asm 2026-02-21T09:27:30.4295425Z // begin inline asm 2026-02-21T09:27:30.4295742Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 208], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4296096Z // end inline asm 2026-02-21T09:27:30.4296224Z // begin inline asm 2026-02-21T09:27:30.4296545Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 224], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 
2026-02-21T09:27:30.4296894Z // end inline asm 2026-02-21T09:27:30.4297028Z // begin inline asm 2026-02-21T09:27:30.4297348Z @%p61 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r862 + 240], {%r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95, %r95}; 2026-02-21T09:27:30.4297701Z // end inline asm 2026-02-21T09:27:30.4297829Z // begin inline asm 2026-02-21T09:27:30.4297988Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:27:30.4298154Z // end inline asm 2026-02-21T09:27:30.4298283Z bar.sync 0; 2026-02-21T09:27:30.4298531Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4298815Z add.s32 %r1602, %r77, 163888; 2026-02-21T09:27:30.4298979Z // begin inline asm 2026-02-21T09:27:30.4299144Z @%p93 mbarrier.init.shared::cta.b64 [%r1602], 1; 2026-02-21T09:27:30.4299341Z // end inline asm 2026-02-21T09:27:30.4299472Z bar.sync 0; 2026-02-21T09:27:30.4299646Z add.s32 %r367, %r77, 163896; 2026-02-21T09:27:30.4299838Z // begin inline asm 2026-02-21T09:27:30.4299996Z @%p93 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:27:30.4300185Z // end inline asm 2026-02-21T09:27:30.4300317Z add.s32 %r368, %r77, 163840; 2026-02-21T09:27:30.4300472Z // begin inline asm 2026-02-21T09:27:30.4300627Z @%p93 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:27:30.4300816Z // end inline asm 2026-02-21T09:27:30.4300942Z bar.sync 0; 2026-02-21T09:27:30.4301075Z add.s32 %r369, %r77, 163848; 2026-02-21T09:27:30.4301229Z // begin inline asm 2026-02-21T09:27:30.4301382Z @%p93 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:27:30.4301567Z // end inline asm 2026-02-21T09:27:30.4301692Z bar.sync 0; 2026-02-21T09:27:30.4301826Z add.s32 %r370, %r77, 163856; 2026-02-21T09:27:30.4301971Z // begin inline asm 2026-02-21T09:27:30.4302134Z @%p93 mbarrier.init.shared::cta.b64 [%r370], 1; 2026-02-21T09:27:30.4302314Z // end inline asm 2026-02-21T09:27:30.4302449Z bar.sync 0; 2026-02-21T09:27:30.4302577Z add.s32 %r371, 
%r77, 163864; 2026-02-21T09:27:30.4302731Z // begin inline asm 2026-02-21T09:27:30.4302891Z @%p93 mbarrier.init.shared::cta.b64 [%r371], 1; 2026-02-21T09:27:30.4303064Z // end inline asm 2026-02-21T09:27:30.4303196Z bar.sync 0; 2026-02-21T09:27:30.4303318Z add.s32 %r509, %r77, 163872; 2026-02-21T09:27:30.4303469Z // begin inline asm 2026-02-21T09:27:30.4303620Z @%p93 mbarrier.init.shared::cta.b64 [%r509], 1; 2026-02-21T09:27:30.4303800Z // end inline asm 2026-02-21T09:27:30.4304035Z .loc 1 51 60 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:60 2026-02-21T09:27:30.4304317Z or.b32 %r470, %r462, %r428; 2026-02-21T09:27:30.4304474Z or.b32 %r471, %r464, %r428; 2026-02-21T09:27:30.4304809Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4305132Z mad.wide.u32 %rd35, %r470, 2, %rd13; 2026-02-21T09:27:30.4305309Z cvt.u64.u32 %rd2, %r462; 2026-02-21T09:27:30.4305469Z or.b64 %rd56, %rd2, %rd55; 2026-02-21T09:27:30.4305619Z shl.b64 %rd57, %rd56, 1; 2026-02-21T09:27:30.4305773Z add.s64 %rd3, %rd13, %rd57; 2026-02-21T09:27:30.4305927Z add.s64 %rd36, %rd3, 262144; 2026-02-21T09:27:30.4306085Z add.s64 %rd37, %rd3, 524288; 2026-02-21T09:27:30.4306248Z mad.wide.u32 %rd38, %r471, 2, %rd13; 2026-02-21T09:27:30.4306412Z mov.b32 %r502, 16; 2026-02-21T09:27:30.4306660Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4306932Z // begin inline asm 2026-02-21T09:27:30.4307134Z cp.async.cg.shared.global [ %r373 + 0 ], [ %rd35 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4307352Z // end inline asm 2026-02-21T09:27:30.4307491Z // begin inline asm 2026-02-21T09:27:30.4307682Z cp.async.cg.shared.global [ %r375 + 0 ], [ %rd36 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4307908Z // end inline asm 2026-02-21T09:27:30.4308052Z // begin inline asm 2026-02-21T09:27:30.4308240Z cp.async.cg.shared.global [ %r377 + 0 ], [ %rd37 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4308461Z // end inline asm 
2026-02-21T09:27:30.4308590Z // begin inline asm 2026-02-21T09:27:30.4308779Z cp.async.cg.shared.global [ %r379 + 0 ], [ %rd38 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4308990Z // end inline asm 2026-02-21T09:27:30.4309132Z cp.async.commit_group; 2026-02-21T09:27:30.4309380Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4309658Z bar.sync 0; 2026-02-21T09:27:30.4309790Z // begin inline asm 2026-02-21T09:27:30.4309973Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r368], 16384; 2026-02-21T09:27:30.4310197Z // end inline asm 2026-02-21T09:27:30.4310428Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4310703Z // begin inline asm 2026-02-21T09:27:30.4310852Z fence.proxy.async.shared::cta; 2026-02-21T09:27:30.4311064Z // end inline asm 2026-02-21T09:27:30.4311215Z bar.sync 0; 2026-02-21T09:27:30.4311358Z elect.sync %r472|%p54, -1; 2026-02-21T09:27:30.4311523Z and.pred %p46, %p1, %p54; 2026-02-21T09:27:30.4311678Z add.s32 %r382, %r77, 81920; 2026-02-21T09:27:30.4311832Z // begin inline asm 2026-02-21T09:27:30.4312149Z @%p46 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r382], [%rd82, {%r95, %r512}], [%r368]; 2026-02-21T09:27:30.4312504Z // end inline asm 2026-02-21T09:27:30.4312747Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4313034Z add.s64 %rd40, %rd3, 64; 2026-02-21T09:27:30.4313190Z or.b32 %r473, %r470, 32; 2026-02-21T09:27:30.4313347Z mad.wide.u32 %rd58, %r473, 2, %rd13; 2026-02-21T09:27:30.4313521Z add.s64 %rd41, %rd58, 262144; 2026-02-21T09:27:30.4313679Z add.s64 %rd42, %rd58, 524288; 2026-02-21T09:27:30.4313842Z cvt.u64.u32 %rd4, %r464; 2026-02-21T09:27:30.4313990Z or.b64 %rd59, %rd4, %rd55; 2026-02-21T09:27:30.4314148Z shl.b64 %rd60, %rd59, 1; 2026-02-21T09:27:30.4314292Z add.s64 %rd5, %rd13, %rd60; 2026-02-21T09:27:30.4314449Z add.s64 %rd43, %rd5, 64; 
2026-02-21T09:27:30.4314757Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4315043Z // begin inline asm 2026-02-21T09:27:30.4315240Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd40 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4315458Z // end inline asm 2026-02-21T09:27:30.4315597Z // begin inline asm 2026-02-21T09:27:30.4315786Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd41 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4316007Z // end inline asm 2026-02-21T09:27:30.4316136Z // begin inline asm 2026-02-21T09:27:30.4316328Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd42 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4316543Z // end inline asm 2026-02-21T09:27:30.4316723Z // begin inline asm 2026-02-21T09:27:30.4316954Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd43 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4317174Z // end inline asm 2026-02-21T09:27:30.4317317Z cp.async.commit_group; 2026-02-21T09:27:30.4317576Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4317878Z bar.sync 0; 2026-02-21T09:27:30.4318010Z // begin inline asm 2026-02-21T09:27:30.4318216Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r369], 16384; 2026-02-21T09:27:30.4318440Z // end inline asm 2026-02-21T09:27:30.4318691Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4318985Z bar.sync 0; 2026-02-21T09:27:30.4319122Z elect.sync %r474|%p55, -1; 2026-02-21T09:27:30.4319295Z and.pred %p48, %p1, %p55; 2026-02-21T09:27:30.4319456Z add.s32 %r395, %r77, 98304; 2026-02-21T09:27:30.4319619Z // begin inline asm 2026-02-21T09:27:30.4319959Z @%p48 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r395], [%rd82, {%r80, %r512}], [%r369]; 2026-02-21T09:27:30.4320328Z // end inline asm 2026-02-21T09:27:30.4320580Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4320869Z add.s64 %rd45, %rd3, 128; 
2026-02-21T09:27:30.4321033Z or.b32 %r475, %r470, 64; 2026-02-21T09:27:30.4321191Z mad.wide.u32 %rd61, %r475, 2, %rd13; 2026-02-21T09:27:30.4321374Z add.s64 %rd46, %rd61, 262144; 2026-02-21T09:27:30.4321534Z add.s64 %rd47, %rd61, 524288; 2026-02-21T09:27:30.4321702Z add.s64 %rd48, %rd5, 128; 2026-02-21T09:27:30.4321966Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4322266Z // begin inline asm 2026-02-21T09:27:30.4322470Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd45 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4322697Z // end inline asm 2026-02-21T09:27:30.4322840Z // begin inline asm 2026-02-21T09:27:30.4323035Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd46 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4323371Z // end inline asm 2026-02-21T09:27:30.4323506Z // begin inline asm 2026-02-21T09:27:30.4323703Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd47 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4323927Z // end inline asm 2026-02-21T09:27:30.4324068Z // begin inline asm 2026-02-21T09:27:30.4324267Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd48 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4324491Z // end inline asm 2026-02-21T09:27:30.4324639Z cp.async.commit_group; 2026-02-21T09:27:30.4324967Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4325270Z bar.sync 0; 2026-02-21T09:27:30.4325400Z // begin inline asm 2026-02-21T09:27:30.4325600Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r370], 16384; 2026-02-21T09:27:30.4325822Z // end inline asm 2026-02-21T09:27:30.4326079Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4326372Z bar.sync 0; 2026-02-21T09:27:30.4326513Z elect.sync %r476|%p56, -1; 2026-02-21T09:27:30.4326695Z and.pred %p50, %p1, %p56; 2026-02-21T09:27:30.4326848Z add.s32 %r408, %r77, 114688; 2026-02-21T09:27:30.4327004Z mov.b32 %r409, 64; 2026-02-21T09:27:30.4327135Z // begin inline asm 
2026-02-21T09:27:30.4327457Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r408], [%rd82, {%r409, %r512}], [%r370]; 2026-02-21T09:27:30.4327818Z // end inline asm 2026-02-21T09:27:30.4328059Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4328342Z add.s64 %rd50, %rd3, 192; 2026-02-21T09:27:30.4328492Z or.b32 %r477, %r470, 96; 2026-02-21T09:27:30.4328660Z mad.wide.u32 %rd62, %r477, 2, %rd13; 2026-02-21T09:27:30.4328830Z add.s64 %rd51, %rd62, 262144; 2026-02-21T09:27:30.4329036Z add.s64 %rd52, %rd62, 524288; 2026-02-21T09:27:30.4329225Z add.s64 %rd53, %rd5, 192; 2026-02-21T09:27:30.4329487Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4329767Z // begin inline asm 2026-02-21T09:27:30.4329957Z cp.async.cg.shared.global [ %r412 + 0 ], [ %rd50 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4330177Z // end inline asm 2026-02-21T09:27:30.4330306Z // begin inline asm 2026-02-21T09:27:30.4330498Z cp.async.cg.shared.global [ %r414 + 0 ], [ %rd51 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4330711Z // end inline asm 2026-02-21T09:27:30.4330846Z // begin inline asm 2026-02-21T09:27:30.4331029Z cp.async.cg.shared.global [ %r416 + 0 ], [ %rd52 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4331251Z // end inline asm 2026-02-21T09:27:30.4331385Z // begin inline asm 2026-02-21T09:27:30.4331569Z cp.async.cg.shared.global [ %r418 + 0 ], [ %rd53 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4331786Z // end inline asm 2026-02-21T09:27:30.4331920Z cp.async.commit_group; 2026-02-21T09:27:30.4332182Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4332455Z bar.sync 0; 2026-02-21T09:27:30.4332586Z // begin inline asm 2026-02-21T09:27:30.4332768Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r371], 16384; 2026-02-21T09:27:30.4332990Z // end inline asm 2026-02-21T09:27:30.4333230Z .loc 1 52 44 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4333511Z bar.sync 0; 2026-02-21T09:27:30.4333652Z elect.sync %r478|%p57, -1; 2026-02-21T09:27:30.4333808Z and.pred %p52, %p1, %p57; 2026-02-21T09:27:30.4333966Z add.s32 %r421, %r77, 131072; 2026-02-21T09:27:30.4334110Z mov.b32 %r422, 96; 2026-02-21T09:27:30.4334245Z // begin inline asm 2026-02-21T09:27:30.4334555Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r421], [%rd82, {%r422, %r512}], [%r371]; 2026-02-21T09:27:30.4335016Z // end inline asm 2026-02-21T09:27:30.4335261Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4335609Z cp.async.wait_group 3; 2026-02-21T09:27:30.4335763Z bar.sync 0; 2026-02-21T09:27:30.4335997Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4336279Z // begin inline asm 2026-02-21T09:27:30.4336406Z 2026-02-21T09:27:30.4336524Z { 2026-02-21T09:27:30.4336642Z .reg .pred complete; 2026-02-21T09:27:30.4336792Z waitLoop: 2026-02-21T09:27:30.4336980Z mbarrier.try_wait.parity.shared.b64 complete, [%r368], %r95; 2026-02-21T09:27:30.4337206Z @!complete bra.uni waitLoop; 2026-02-21T09:27:30.4337363Z } 2026-02-21T09:27:30.4337425Z 2026-02-21T09:27:30.4337479Z // end inline asm 2026-02-21T09:27:30.4337720Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4338009Z setp.ne.b32 %p58, %r62, 0; 2026-02-21T09:27:30.4338172Z @%p58 bra $L__BB0_3; 2026-02-21T09:27:30.4338310Z // %bb.2: 2026-02-21T09:27:30.4338547Z .loc 1 0 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:0:52 2026-02-21T09:27:30.4338837Z add.s32 %r488, %r77, 8224; 2026-02-21T09:27:30.4338991Z bfe.u32 %r489, %r488, 4, 14; 2026-02-21T09:27:30.4339149Z cvt.u64.u32 %rd72, %r489; 2026-02-21T09:27:30.4339309Z or.b64 %rd69, %rd72, -9223371899348713472; 2026-02-21T09:27:30.4339491Z add.s32 %r490, %r77, 8192; 
2026-02-21T09:27:30.4339638Z bfe.u32 %r491, %r490, 4, 14; 2026-02-21T09:27:30.4339796Z cvt.u64.u32 %rd73, %r491; 2026-02-21T09:27:30.4339952Z or.b64 %rd67, %rd73, -9223371899348713472; 2026-02-21T09:27:30.4340134Z add.s32 %r493, %r77, 81952; 2026-02-21T09:27:30.4340287Z bfe.u32 %r494, %r493, 4, 14; 2026-02-21T09:27:30.4340433Z cvt.u64.u32 %rd74, %r494; 2026-02-21T09:27:30.4340593Z or.b64 %rd66, %rd74, -9223371899348713472; 2026-02-21T09:27:30.4340802Z add.s32 %r495, %r77, 32; 2026-02-21T09:27:30.4340986Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T09:27:30.4341135Z cvt.u64.u32 %rd75, %r496; 2026-02-21T09:27:30.4341294Z or.b64 %rd65, %rd75, -9223371899348713472; 2026-02-21T09:27:30.4341462Z bfe.u32 %r497, %r382, 4, 14; 2026-02-21T09:27:30.4341618Z cvt.u64.u32 %rd76, %r497; 2026-02-21T09:27:30.4341772Z or.b64 %rd64, %rd76, -9223371899348713472; 2026-02-21T09:27:30.4341951Z bfe.u32 %r498, %r77, 4, 14; 2026-02-21T09:27:30.4342107Z cvt.u64.u32 %rd77, %r498; 2026-02-21T09:27:30.4342259Z or.b64 %rd63, %rd77, -9223371899348713472; 2026-02-21T09:27:30.4342546Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4342826Z elect.sync %r499|%p60, -1; 2026-02-21T09:27:30.4342983Z mov.b32 %r480, 138412048; 2026-02-21T09:27:30.4343125Z mov.pred %p59, 0; 2026-02-21T09:27:30.4343269Z // begin inline asm 2026-02-21T09:27:30.4343496Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd63, %rd64, %r480, %p59; 2026-02-21T09:27:30.4343744Z // end inline asm 2026-02-21T09:27:30.4343882Z // begin inline asm 2026-02-21T09:27:30.4344100Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd65, %rd66, %r480, %p61; 2026-02-21T09:27:30.4344347Z // end inline asm 2026-02-21T09:27:30.4344476Z // begin inline asm 2026-02-21T09:27:30.4344735Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd67, %rd64, %r480, %p59; 2026-02-21T09:27:30.4344982Z // end inline asm 2026-02-21T09:27:30.4345120Z // begin inline asm 
2026-02-21T09:27:30.4345336Z @%p60 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd69, %rd66, %r480, %p61; 2026-02-21T09:27:30.4345579Z // end inline asm 2026-02-21T09:27:30.4345722Z add.s32 %r500, %r77, 163888; 2026-02-21T09:27:30.4345874Z cvt.u64.u32 %rd71, %r500; 2026-02-21T09:27:30.4346026Z // begin inline asm 2026-02-21T09:27:30.4346223Z @%p60 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd71]; 2026-02-21T09:27:30.4346451Z // end inline asm 2026-02-21T09:27:30.4346577Z $L__BB0_3: 2026-02-21T09:27:30.4346813Z .loc 1 0 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:0:52 2026-02-21T09:27:30.4347194Z ld.param.b64 %rd14, [_helion_matmul_param_2]; 2026-02-21T09:27:30.4347382Z add.s32 %r11, %r77, %r442; 2026-02-21T09:27:30.4347540Z add.s32 %r12, %r77, %r443; 2026-02-21T09:27:30.4347688Z add.s32 %r13, %r77, %r444; 2026-02-21T09:27:30.4347843Z add.s32 %r14, %r77, %r445; 2026-02-21T09:27:30.4347990Z add.s32 %r15, %r77, %r446; 2026-02-21T09:27:30.4348143Z add.s32 %r16, %r77, %r447; 2026-02-21T09:27:30.4348284Z add.s32 %r17, %r77, %r448; 2026-02-21T09:27:30.4348438Z add.s32 %r18, %r77, %r449; 2026-02-21T09:27:30.4348598Z add.s32 %r872, %r867, 512; 2026-02-21T09:27:30.4348751Z add.s32 %r877, %r867, 1024; 2026-02-21T09:27:30.4348911Z add.s32 %r882, %r867, 1536; 2026-02-21T09:27:30.4349057Z add.s32 %r887, %r867, 2048; 2026-02-21T09:27:30.4349208Z add.s32 %r892, %r867, 2560; 2026-02-21T09:27:30.4349354Z add.s32 %r897, %r867, 3072; 2026-02-21T09:27:30.4349507Z add.s32 %r902, %r867, 3584; 2026-02-21T09:27:30.4349654Z or.b32 %r29, %r28, 8; 2026-02-21T09:27:30.4349803Z or.b32 %r30, %r28, 16; 2026-02-21T09:27:30.4349945Z or.b32 %r31, %r28, 24; 2026-02-21T09:27:30.4350093Z or.b32 %r32, %r28, 32; 2026-02-21T09:27:30.4350236Z or.b32 %r33, %r28, 40; 2026-02-21T09:27:30.4350372Z or.b32 %r34, %r28, 48; 2026-02-21T09:27:30.4350515Z or.b32 %r35, %r28, 56; 2026-02-21T09:27:30.4350649Z or.b32 %r36, %r28, 64; 2026-02-21T09:27:30.4350793Z or.b32 %r37, %r28, 
72; 2026-02-21T09:27:30.4350928Z or.b32 %r38, %r28, 80; 2026-02-21T09:27:30.4351071Z or.b32 %r39, %r28, 88; 2026-02-21T09:27:30.4351204Z or.b32 %r40, %r28, 96; 2026-02-21T09:27:30.4351350Z or.b32 %r41, %r28, 104; 2026-02-21T09:27:30.4351492Z or.b32 %r42, %r28, 112; 2026-02-21T09:27:30.4351639Z or.b32 %r43, %r28, 120; 2026-02-21T09:27:30.4351784Z or.b32 %r44, %r28, 128; 2026-02-21T09:27:30.4351953Z or.b32 %r45, %r28, 136; 2026-02-21T09:27:30.4352135Z or.b32 %r46, %r28, 144; 2026-02-21T09:27:30.4352275Z or.b32 %r47, %r28, 152; 2026-02-21T09:27:30.4352422Z or.b32 %r48, %r28, 160; 2026-02-21T09:27:30.4352558Z or.b32 %r49, %r28, 168; 2026-02-21T09:27:30.4352701Z or.b32 %r50, %r28, 176; 2026-02-21T09:27:30.4352836Z or.b32 %r51, %r28, 184; 2026-02-21T09:27:30.4352977Z or.b32 %r52, %r28, 192; 2026-02-21T09:27:30.4353113Z or.b32 %r53, %r28, 200; 2026-02-21T09:27:30.4353255Z or.b32 %r54, %r28, 208; 2026-02-21T09:27:30.4353401Z or.b32 %r55, %r28, 216; 2026-02-21T09:27:30.4353536Z or.b32 %r56, %r28, 224; 2026-02-21T09:27:30.4353681Z or.b32 %r57, %r28, 232; 2026-02-21T09:27:30.4353818Z or.b32 %r58, %r28, 240; 2026-02-21T09:27:30.4353961Z or.b32 %r59, %r28, 248; 2026-02-21T09:27:30.4354101Z or.b32 %r61, %r512, %r430; 2026-02-21T09:27:30.4354361Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4354644Z add.s64 %rd78, %rd3, 256; 2026-02-21T09:27:30.4354840Z cvt.u64.u32 %rd84, %r6; 2026-02-21T09:27:30.4354998Z add.s64 %rd85, %rd2, %rd84; 2026-02-21T09:27:30.4355152Z shl.b64 %rd86, %rd85, 1; 2026-02-21T09:27:30.4355314Z add.s64 %rd87, %rd13, %rd86; 2026-02-21T09:27:30.4355470Z add.s64 %rd79, %rd87, 262144; 2026-02-21T09:27:30.4355640Z add.s64 %rd80, %rd87, 524288; 2026-02-21T09:27:30.4355794Z add.s64 %rd81, %rd5, 256; 2026-02-21T09:27:30.4356058Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4356331Z bar.sync 0; 2026-02-21T09:27:30.4356464Z // begin inline asm 
2026-02-21T09:27:30.4356663Z cp.async.cg.shared.global [ %r501 + 0 ], [ %rd78 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4356894Z // end inline asm 2026-02-21T09:27:30.4357042Z // begin inline asm 2026-02-21T09:27:30.4357236Z cp.async.cg.shared.global [ %r503 + 0 ], [ %rd79 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4357463Z // end inline asm 2026-02-21T09:27:30.4357596Z // begin inline asm 2026-02-21T09:27:30.4357791Z cp.async.cg.shared.global [ %r505 + 0 ], [ %rd80 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4358051Z // end inline asm 2026-02-21T09:27:30.4358217Z // begin inline asm 2026-02-21T09:27:30.4358403Z cp.async.cg.shared.global [ %r507 + 0 ], [ %rd81 + 0 ], 0x10, %r502; 2026-02-21T09:27:30.4358623Z // end inline asm 2026-02-21T09:27:30.4358768Z cp.async.commit_group; 2026-02-21T09:27:30.4359023Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4359306Z // begin inline asm 2026-02-21T09:27:30.4359493Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r509], 16384; 2026-02-21T09:27:30.4359716Z // end inline asm 2026-02-21T09:27:30.4359968Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4360250Z bar.sync 0; 2026-02-21T09:27:30.4360393Z elect.sync %r519|%p71, -1; 2026-02-21T09:27:30.4360557Z and.pred %p69, %p1, %p71; 2026-02-21T09:27:30.4360725Z add.s32 %r510, %r77, 147456; 2026-02-21T09:27:30.4360881Z mov.b32 %r511, 128; 2026-02-21T09:27:30.4361031Z // begin inline asm 2026-02-21T09:27:30.4361364Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r510], [%rd82, {%r511, %r512}], [%r509]; 2026-02-21T09:27:30.4361728Z // end inline asm 2026-02-21T09:27:30.4361977Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4362284Z shl.b64 %rd88, %rd4, 1; 2026-02-21T09:27:30.4362444Z add.s64 %rd6, %rd88, 320; 2026-02-21T09:27:30.4362599Z and.b32 %r520, %r1, 3; 2026-02-21T09:27:30.4362768Z mad.wide.u32 
%rd654, %r520, 16, %rd13; 2026-02-21T09:27:30.4362947Z cvt.u16.u32 %rs1, %r3; 2026-02-21T09:27:30.4363106Z shr.u16 %rs2, %rs1, 4; 2026-02-21T09:27:30.4363253Z and.b16 %rs3, %rs2, 15; 2026-02-21T09:27:30.4363415Z mul.wide.u16 %r521, %rs3, 512; 2026-02-21T09:27:30.4363577Z shl.b32 %r522, %r27, 8; 2026-02-21T09:27:30.4363773Z or.b32 %r523, %r521, %r522; 2026-02-21T09:27:30.4363978Z or.b32 %r524, %r523, %r4; 2026-02-21T09:27:30.4364147Z mul.wide.u32 %rd8, %r524, 4096; 2026-02-21T09:27:30.4364322Z mov.b32 %r1606, 1; 2026-02-21T09:27:30.4364461Z mov.b32 %r1605, 4; 2026-02-21T09:27:30.4364605Z mov.b32 %r1601, 0; 2026-02-21T09:27:30.4364790Z mov.b64 %rd655, 0; 2026-02-21T09:27:30.4364939Z mov.b32 %r1603, %r1601; 2026-02-21T09:27:30.4365092Z mov.b32 %r1604, %r1601; 2026-02-21T09:27:30.4365250Z mov.b32 %r1607, %r1601; 2026-02-21T09:27:30.4365401Z bra.uni $L__BB0_4; 2026-02-21T09:27:30.4365600Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:27:30.4365942Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4366243Z setp.lt.u64 %p85, %rd655, 1888; 2026-02-21T09:27:30.4366532Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4366826Z // begin inline asm 2026-02-21T09:27:30.4366970Z 2026-02-21T09:27:30.4367086Z { 2026-02-21T09:27:30.4367216Z .reg .pred complete; 2026-02-21T09:27:30.4367363Z waitLoop: 2026-02-21T09:27:30.4367568Z mbarrier.try_wait.parity.shared.b64 complete, [%r1602], %r1601; 2026-02-21T09:27:30.4367820Z @!complete bra.uni waitLoop; 2026-02-21T09:27:30.4367972Z } 2026-02-21T09:27:30.4368038Z 2026-02-21T09:27:30.4368101Z // end inline asm 2026-02-21T09:27:30.4368250Z add.s32 %r572, %r1606, 1; 2026-02-21T09:27:30.4368408Z setp.gt.s32 %p88, %r572, 1; 2026-02-21T09:27:30.4368569Z selp.b32 %r1606, 0, %r572, %p88; 2026-02-21T09:27:30.4368739Z selp.b32 %r573, 1, 0, %p88; 2026-02-21T09:27:30.4368894Z xor.b32 %r75, %r1607, %r573; 2026-02-21T09:27:30.4369163Z .loc 1 46 
80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4369458Z add.s32 %r574, %r1605, 1; 2026-02-21T09:27:30.4369607Z setp.gt.s32 %p89, %r574, 4; 2026-02-21T09:27:30.4369772Z selp.b32 %r1605, 0, %r574, %p89; 2026-02-21T09:27:30.4370042Z .loc 1 51 32 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:32 2026-02-21T09:27:30.4370428Z add.s64 %rd109, %rd654, %rd8; 2026-02-21T09:27:30.4370584Z add.s64 %rd104, %rd109, 320; 2026-02-21T09:27:30.4370743Z add.s64 %rd105, %rd109, 262464; 2026-02-21T09:27:30.4370902Z add.s64 %rd106, %rd109, 524608; 2026-02-21T09:27:30.4371171Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4371455Z add.s64 %rd107, %rd654, %rd6; 2026-02-21T09:27:30.4371611Z shl.b32 %r575, %r1605, 14; 2026-02-21T09:27:30.4371768Z add.s32 %r577, %r77, %r575; 2026-02-21T09:27:30.4371913Z bar.sync 0; 2026-02-21T09:27:30.4372051Z add.s32 %r559, %r577, %r5; 2026-02-21T09:27:30.4372203Z selp.b32 %r560, 16, 0, %p85; 2026-02-21T09:27:30.4372364Z // begin inline asm 2026-02-21T09:27:30.4372563Z cp.async.cg.shared.global [ %r559 + 0 ], [ %rd104 + 0 ], 0x10, %r560; 2026-02-21T09:27:30.4372790Z // end inline asm 2026-02-21T09:27:30.4372931Z add.s32 %r561, %r559, 4096; 2026-02-21T09:27:30.4373081Z // begin inline asm 2026-02-21T09:27:30.4373281Z cp.async.cg.shared.global [ %r561 + 0 ], [ %rd105 + 0 ], 0x10, %r560; 2026-02-21T09:27:30.4373498Z // end inline asm 2026-02-21T09:27:30.4373636Z add.s32 %r563, %r559, 8192; 2026-02-21T09:27:30.4373781Z // begin inline asm 2026-02-21T09:27:30.4373982Z cp.async.cg.shared.global [ %r563 + 0 ], [ %rd106 + 0 ], 0x10, %r560; 2026-02-21T09:27:30.4374198Z // end inline asm 2026-02-21T09:27:30.4374338Z add.s32 %r565, %r559, 12288; 2026-02-21T09:27:30.4374488Z // begin inline asm 2026-02-21T09:27:30.4374724Z cp.async.cg.shared.global [ %r565 + 0 ], [ %rd107 + 0 ], 0x10, %r560; 2026-02-21T09:27:30.4374957Z // end inline asm 
2026-02-21T09:27:30.4375097Z cp.async.commit_group; 2026-02-21T09:27:30.4375352Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4375663Z shl.b32 %r578, %r1605, 3; 2026-02-21T09:27:30.4375856Z add.s32 %r579, %r77, %r578; 2026-02-21T09:27:30.4376015Z add.s32 %r571, %r579, 163840; 2026-02-21T09:27:30.4376191Z and.pred %p83, %p93, %p85; 2026-02-21T09:27:30.4376346Z // begin inline asm 2026-02-21T09:27:30.4376536Z @%p83 mbarrier.arrive.expect_tx.shared.b64 _, [%r571], 16384; 2026-02-21T09:27:30.4376754Z // end inline asm 2026-02-21T09:27:30.4376986Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4377265Z add.s32 %r568, %r577, 81920; 2026-02-21T09:27:30.4377408Z bar.sync 0; 2026-02-21T09:27:30.4377546Z elect.sync %r580|%p90, -1; 2026-02-21T09:27:30.4377701Z and.pred %p91, %p85, %p90; 2026-02-21T09:27:30.4377859Z and.pred %p84, %p1, %p91; 2026-02-21T09:27:30.4378007Z cvt.u32.u64 %r581, %rd655; 2026-02-21T09:27:30.4378162Z add.s32 %r569, %r581, 160; 2026-02-21T09:27:30.4378311Z // begin inline asm 2026-02-21T09:27:30.4378627Z @%p84 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r568], [%rd82, {%r569, %r512}], [%r571]; 2026-02-21T09:27:30.4378977Z // end inline asm 2026-02-21T09:27:30.4379214Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4379492Z add.s64 %rd654, %rd654, 64; 2026-02-21T09:27:30.4379648Z setp.lt.u64 %p92, %rd655, 1984; 2026-02-21T09:27:30.4379814Z add.s64 %rd655, %rd655, 32; 2026-02-21T09:27:30.4379968Z mov.b32 %r1601, %r1607; 2026-02-21T09:27:30.4380111Z mov.b32 %r1602, %r582; 2026-02-21T09:27:30.4380259Z mov.b32 %r1607, %r75; 2026-02-21T09:27:30.4380401Z @%p92 bra $L__BB0_4; 2026-02-21T09:27:30.4380546Z bra.uni $L__BB0_7; 2026-02-21T09:27:30.4380723Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:27:30.4380937Z add.s32 %r527, %r1604, 1; 
2026-02-21T09:27:30.4381086Z setp.gt.s32 %p73, %r527, 4; 2026-02-21T09:27:30.4381248Z selp.b32 %r1604, 0, %r527, %p73; 2026-02-21T09:27:30.4381409Z selp.b32 %r528, 1, 0, %p73; 2026-02-21T09:27:30.4381657Z xor.b32 %r1603, %r1603, %r528; 2026-02-21T09:27:30.4381948Z .loc 1 51 85 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:51:85 2026-02-21T09:27:30.4382233Z cp.async.wait_group 3; 2026-02-21T09:27:30.4382386Z bar.sync 0; 2026-02-21T09:27:30.4382613Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4382888Z shl.b32 %r529, %r1604, 3; 2026-02-21T09:27:30.4383033Z add.s32 %r531, %r77, %r529; 2026-02-21T09:27:30.4383189Z add.s32 %r525, %r531, 163840; 2026-02-21T09:27:30.4383339Z // begin inline asm 2026-02-21T09:27:30.4383474Z 2026-02-21T09:27:30.4383589Z { 2026-02-21T09:27:30.4383705Z .reg .pred complete; 2026-02-21T09:27:30.4383852Z waitLoop: 2026-02-21T09:27:30.4384039Z mbarrier.try_wait.parity.shared.b64 complete, [%r525], %r1603; 2026-02-21T09:27:30.4384276Z @!complete bra.uni waitLoop; 2026-02-21T09:27:30.4384426Z } 2026-02-21T09:27:30.4384497Z 2026-02-21T09:27:30.4384553Z // end inline asm 2026-02-21T09:27:30.4384742Z shl.b32 %r532, %r1606, 3; 2026-02-21T09:27:30.4384899Z add.s32 %r533, %r77, %r532; 2026-02-21T09:27:30.4385046Z add.s32 %r582, %r533, 163888; 2026-02-21T09:27:30.4385308Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4385586Z @%p58 bra $L__BB0_6; 2026-02-21T09:27:30.4385770Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:27:30.4386085Z .loc 1 52 44 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:52:44 2026-02-21T09:27:30.4386359Z shl.b32 %r542, %r1604, 14; 2026-02-21T09:27:30.4386516Z add.s32 %r544, %r77, %r542; 2026-02-21T09:27:30.4386666Z add.s32 %r545, %r544, 81920; 2026-02-21T09:27:30.4386928Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4387249Z elect.sync 
%r546|%p75, -1; 2026-02-21T09:27:30.4387434Z bfe.u32 %r547, %r544, 4, 14; 2026-02-21T09:27:30.4387595Z cvt.u64.u32 %rd98, %r547; 2026-02-21T09:27:30.4387759Z or.b64 %rd89, %rd98, -9223371899348713472; 2026-02-21T09:27:30.4387944Z bfe.u32 %r548, %r545, 4, 14; 2026-02-21T09:27:30.4388093Z cvt.u64.u32 %rd99, %r548; 2026-02-21T09:27:30.4388256Z or.b64 %rd90, %rd99, -9223371899348713472; 2026-02-21T09:27:30.4388426Z mov.b32 %r535, 138412048; 2026-02-21T09:27:30.4388577Z mov.pred %p74, -1; 2026-02-21T09:27:30.4388722Z // begin inline asm 2026-02-21T09:27:30.4388941Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd89, %rd90, %r535, %p74; 2026-02-21T09:27:30.4389200Z // end inline asm 2026-02-21T09:27:30.4389330Z add.s32 %r549, %r544, 32; 2026-02-21T09:27:30.4389479Z bfe.u32 %r550, %r549, 4, 14; 2026-02-21T09:27:30.4389626Z cvt.u64.u32 %rd100, %r550; 2026-02-21T09:27:30.4389793Z or.b64 %rd91, %rd100, -9223371899348713472; 2026-02-21T09:27:30.4389966Z add.s32 %r551, %r544, 81952; 2026-02-21T09:27:30.4390118Z bfe.u32 %r552, %r551, 4, 14; 2026-02-21T09:27:30.4390273Z cvt.u64.u32 %rd101, %r552; 2026-02-21T09:27:30.4390430Z or.b64 %rd92, %rd101, -9223371899348713472; 2026-02-21T09:27:30.4390606Z // begin inline asm 2026-02-21T09:27:30.4390818Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 0 ], %rd91, %rd92, %r535, %p74; 2026-02-21T09:27:30.4391065Z // end inline asm 2026-02-21T09:27:30.4391198Z add.s32 %r553, %r544, 8192; 2026-02-21T09:27:30.4391351Z bfe.u32 %r554, %r553, 4, 14; 2026-02-21T09:27:30.4391500Z cvt.u64.u32 %rd102, %r554; 2026-02-21T09:27:30.4391663Z or.b64 %rd93, %rd102, -9223371899348713472; 2026-02-21T09:27:30.4391841Z // begin inline asm 2026-02-21T09:27:30.4392053Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd93, %rd90, %r535, %p74; 2026-02-21T09:27:30.4392307Z // end inline asm 2026-02-21T09:27:30.4392438Z add.s32 %r555, %r544, 8224; 2026-02-21T09:27:30.4392592Z bfe.u32 %r556, %r555, 4, 14; 2026-02-21T09:27:30.4392740Z 
cvt.u64.u32 %rd103, %r556; 2026-02-21T09:27:30.4392905Z or.b64 %rd95, %rd103, -9223371899348713472; 2026-02-21T09:27:30.4393130Z // begin inline asm 2026-02-21T09:27:30.4393351Z @%p75 tcgen05.mma.cta_group::1.kind::f16 [ %r1600 + 256 ], %rd95, %rd92, %r535, %p74; 2026-02-21T09:27:30.4393607Z // end inline asm 2026-02-21T09:27:30.4393742Z cvt.u64.u32 %rd97, %r582; 2026-02-21T09:27:30.4393899Z // begin inline asm 2026-02-21T09:27:30.4394101Z @%p75 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd97]; 2026-02-21T09:27:30.4394331Z // end inline asm 2026-02-21T09:27:30.4394460Z bra.uni $L__BB0_6; 2026-02-21T09:27:30.4394635Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:27:30.4394965Z .loc 1 0 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:0:52 2026-02-21T09:27:30.4395247Z mov.b32 %r583, 1; 2026-02-21T09:27:30.4395497Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4395775Z // begin inline asm 2026-02-21T09:27:30.4395912Z 2026-02-21T09:27:30.4396020Z { 2026-02-21T09:27:30.4396145Z .reg .pred complete; 2026-02-21T09:27:30.4396284Z waitLoop: 2026-02-21T09:27:30.4396469Z mbarrier.try_wait.parity.shared.b64 complete, [%r582], %r583; 2026-02-21T09:27:30.4396694Z @!complete bra.uni waitLoop; 2026-02-21T09:27:30.4396848Z } 2026-02-21T09:27:30.4396912Z 2026-02-21T09:27:30.4396973Z // end inline asm 2026-02-21T09:27:30.4397214Z .loc 1 46 80 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:46:80 2026-02-21T09:27:30.4397503Z cp.async.wait_group 0; 2026-02-21T09:27:30.4397648Z bar.sync 0; 2026-02-21T09:27:30.4397779Z // begin inline asm 2026-02-21T09:27:30.4397938Z @%p93 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:27:30.4398129Z // end inline asm 2026-02-21T09:27:30.4398256Z bar.sync 0; 2026-02-21T09:27:30.4398388Z // begin inline asm 2026-02-21T09:27:30.4398581Z @%p93 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:27:30.4398787Z // end inline asm 2026-02-21T09:27:30.4398926Z 
bar.sync 0; 2026-02-21T09:27:30.4399048Z // begin inline asm 2026-02-21T09:27:30.4399206Z @%p93 mbarrier.inval.shared::cta.b64 [%r370]; 2026-02-21T09:27:30.4399378Z // end inline asm 2026-02-21T09:27:30.4399511Z bar.sync 0; 2026-02-21T09:27:30.4399631Z // begin inline asm 2026-02-21T09:27:30.4399790Z @%p93 mbarrier.inval.shared::cta.b64 [%r371]; 2026-02-21T09:27:30.4399972Z // end inline asm 2026-02-21T09:27:30.4400097Z bar.sync 0; 2026-02-21T09:27:30.4400227Z // begin inline asm 2026-02-21T09:27:30.4400381Z @%p93 mbarrier.inval.shared::cta.b64 [%r509]; 2026-02-21T09:27:30.4400562Z // end inline asm 2026-02-21T09:27:30.4400693Z add.s32 %r589, %r77, 163888; 2026-02-21T09:27:30.4400847Z // begin inline asm 2026-02-21T09:27:30.4400997Z @%p93 mbarrier.inval.shared::cta.b64 [%r589]; 2026-02-21T09:27:30.4401176Z // end inline asm 2026-02-21T09:27:30.4401301Z bar.sync 0; 2026-02-21T09:27:30.4401432Z // begin inline asm 2026-02-21T09:27:30.4401592Z @%p93 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:27:30.4401768Z // end inline asm 2026-02-21T09:27:30.4402009Z .loc 1 56 45 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:56:45 2026-02-21T09:27:30.4402288Z shl.b32 %r1152, %r28, 11; 2026-02-21T09:27:30.4402445Z shl.b32 %r1153, %r29, 11; 2026-02-21T09:27:30.4402598Z shl.b32 %r1154, %r30, 11; 2026-02-21T09:27:30.4402748Z shl.b32 %r1155, %r31, 11; 2026-02-21T09:27:30.4402889Z shl.b32 %r1156, %r32, 11; 2026-02-21T09:27:30.4403036Z shl.b32 %r1157, %r33, 11; 2026-02-21T09:27:30.4403184Z shl.b32 %r1158, %r34, 11; 2026-02-21T09:27:30.4403323Z shl.b32 %r1159, %r35, 11; 2026-02-21T09:27:30.4403468Z shl.b32 %r1160, %r36, 11; 2026-02-21T09:27:30.4403605Z shl.b32 %r1161, %r37, 11; 2026-02-21T09:27:30.4403752Z shl.b32 %r1162, %r38, 11; 2026-02-21T09:27:30.4403891Z shl.b32 %r1163, %r39, 11; 2026-02-21T09:27:30.4404040Z shl.b32 %r1164, %r40, 11; 2026-02-21T09:27:30.4404183Z shl.b32 %r1165, %r41, 11; 2026-02-21T09:27:30.4404369Z shl.b32 %r1166, %r42, 11; 
2026-02-21T09:27:30.4404544Z shl.b32 %r1167, %r43, 11; 2026-02-21T09:27:30.4404750Z shl.b32 %r1168, %r44, 11; 2026-02-21T09:27:30.4404908Z shl.b32 %r1169, %r45, 11; 2026-02-21T09:27:30.4405056Z shl.b32 %r1170, %r46, 11; 2026-02-21T09:27:30.4405214Z shl.b32 %r1171, %r47, 11; 2026-02-21T09:27:30.4405363Z shl.b32 %r1172, %r48, 11; 2026-02-21T09:27:30.4405521Z shl.b32 %r1173, %r49, 11; 2026-02-21T09:27:30.4405669Z shl.b32 %r1174, %r50, 11; 2026-02-21T09:27:30.4405825Z shl.b32 %r1175, %r51, 11; 2026-02-21T09:27:30.4405972Z shl.b32 %r1176, %r52, 11; 2026-02-21T09:27:30.4406125Z shl.b32 %r1177, %r53, 11; 2026-02-21T09:27:30.4406272Z shl.b32 %r1178, %r54, 11; 2026-02-21T09:27:30.4406425Z shl.b32 %r1179, %r55, 11; 2026-02-21T09:27:30.4406580Z shl.b32 %r1180, %r56, 11; 2026-02-21T09:27:30.4406725Z shl.b32 %r1181, %r57, 11; 2026-02-21T09:27:30.4406882Z shl.b32 %r1182, %r58, 11; 2026-02-21T09:27:30.4407032Z shl.b32 %r1183, %r59, 11; 2026-02-21T09:27:30.4407304Z .loc 1 56 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:56:52 2026-02-21T09:27:30.4407603Z or.b32 %r1184, %r1152, %r61; 2026-02-21T09:27:30.4407769Z or.b32 %r1185, %r1153, %r61; 2026-02-21T09:27:30.4407924Z or.b32 %r1186, %r1154, %r61; 2026-02-21T09:27:30.4408086Z or.b32 %r1187, %r1155, %r61; 2026-02-21T09:27:30.4408244Z or.b32 %r1188, %r1156, %r61; 2026-02-21T09:27:30.4408398Z or.b32 %r1189, %r1157, %r61; 2026-02-21T09:27:30.4408560Z or.b32 %r1190, %r1158, %r61; 2026-02-21T09:27:30.4408710Z or.b32 %r1191, %r1159, %r61; 2026-02-21T09:27:30.4408869Z or.b32 %r1192, %r1160, %r61; 2026-02-21T09:27:30.4409021Z or.b32 %r1193, %r1161, %r61; 2026-02-21T09:27:30.4409181Z or.b32 %r1194, %r1162, %r61; 2026-02-21T09:27:30.4409332Z or.b32 %r1195, %r1163, %r61; 2026-02-21T09:27:30.4409491Z or.b32 %r1196, %r1164, %r61; 2026-02-21T09:27:30.4409673Z or.b32 %r1197, %r1165, %r61; 2026-02-21T09:27:30.4409880Z or.b32 %r1198, %r1166, %r61; 2026-02-21T09:27:30.4410044Z or.b32 %r1199, %r1167, %r61; 2026-02-21T09:27:30.4410198Z 
or.b32 %r1200, %r1168, %r61; 2026-02-21T09:27:30.4410362Z or.b32 %r1201, %r1169, %r61; 2026-02-21T09:27:30.4410515Z or.b32 %r1202, %r1170, %r61; 2026-02-21T09:27:30.4410678Z or.b32 %r1203, %r1171, %r61; 2026-02-21T09:27:30.4410830Z or.b32 %r1204, %r1172, %r61; 2026-02-21T09:27:30.4410989Z or.b32 %r1205, %r1173, %r61; 2026-02-21T09:27:30.4411140Z or.b32 %r1206, %r1174, %r61; 2026-02-21T09:27:30.4411299Z or.b32 %r1207, %r1175, %r61; 2026-02-21T09:27:30.4411456Z or.b32 %r1208, %r1176, %r61; 2026-02-21T09:27:30.4411605Z or.b32 %r1209, %r1177, %r61; 2026-02-21T09:27:30.4411764Z or.b32 %r1210, %r1178, %r61; 2026-02-21T09:27:30.4411915Z or.b32 %r1211, %r1179, %r61; 2026-02-21T09:27:30.4412074Z or.b32 %r1212, %r1180, %r61; 2026-02-21T09:27:30.4412224Z or.b32 %r1213, %r1181, %r61; 2026-02-21T09:27:30.4412384Z or.b32 %r1214, %r1182, %r61; 2026-02-21T09:27:30.4412544Z or.b32 %r1215, %r1183, %r61; 2026-02-21T09:27:30.4412801Z .loc 1 56 24 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:56:24 2026-02-21T09:27:30.4413089Z mad.wide.u32 %rd110, %r1184, 2, %rd14; 2026-02-21T09:27:30.4413267Z mad.wide.u32 %rd111, %r1185, 2, %rd14; 2026-02-21T09:27:30.4413449Z mad.wide.u32 %rd112, %r1186, 2, %rd14; 2026-02-21T09:27:30.4413619Z mad.wide.u32 %rd113, %r1187, 2, %rd14; 2026-02-21T09:27:30.4413795Z mad.wide.u32 %rd114, %r1188, 2, %rd14; 2026-02-21T09:27:30.4413962Z mad.wide.u32 %rd115, %r1189, 2, %rd14; 2026-02-21T09:27:30.4414134Z mad.wide.u32 %rd116, %r1190, 2, %rd14; 2026-02-21T09:27:30.4414299Z mad.wide.u32 %rd117, %r1191, 2, %rd14; 2026-02-21T09:27:30.4414474Z mad.wide.u32 %rd118, %r1192, 2, %rd14; 2026-02-21T09:27:30.4414647Z mad.wide.u32 %rd119, %r1193, 2, %rd14; 2026-02-21T09:27:30.4414895Z mad.wide.u32 %rd120, %r1194, 2, %rd14; 2026-02-21T09:27:30.4415072Z mad.wide.u32 %rd121, %r1195, 2, %rd14; 2026-02-21T09:27:30.4415243Z mad.wide.u32 %rd122, %r1196, 2, %rd14; 2026-02-21T09:27:30.4415454Z mad.wide.u32 %rd123, %r1197, 2, %rd14; 2026-02-21T09:27:30.4415653Z mad.wide.u32 
%rd124, %r1198, 2, %rd14; 2026-02-21T09:27:30.4415828Z mad.wide.u32 %rd125, %r1199, 2, %rd14; 2026-02-21T09:27:30.4415995Z mad.wide.u32 %rd126, %r1200, 2, %rd14; 2026-02-21T09:27:30.4416172Z mad.wide.u32 %rd127, %r1201, 2, %rd14; 2026-02-21T09:27:30.4416339Z mad.wide.u32 %rd128, %r1202, 2, %rd14; 2026-02-21T09:27:30.4416514Z mad.wide.u32 %rd129, %r1203, 2, %rd14; 2026-02-21T09:27:30.4416688Z mad.wide.u32 %rd130, %r1204, 2, %rd14; 2026-02-21T09:27:30.4416856Z mad.wide.u32 %rd131, %r1205, 2, %rd14; 2026-02-21T09:27:30.4417030Z mad.wide.u32 %rd132, %r1206, 2, %rd14; 2026-02-21T09:27:30.4417195Z mad.wide.u32 %rd133, %r1207, 2, %rd14; 2026-02-21T09:27:30.4417368Z mad.wide.u32 %rd134, %r1208, 2, %rd14; 2026-02-21T09:27:30.4417533Z mad.wide.u32 %rd135, %r1209, 2, %rd14; 2026-02-21T09:27:30.4417707Z mad.wide.u32 %rd136, %r1210, 2, %rd14; 2026-02-21T09:27:30.4417873Z mad.wide.u32 %rd137, %r1211, 2, %rd14; 2026-02-21T09:27:30.4418048Z mad.wide.u32 %rd138, %r1212, 2, %rd14; 2026-02-21T09:27:30.4418224Z mad.wide.u32 %rd139, %r1213, 2, %rd14; 2026-02-21T09:27:30.4418391Z mad.wide.u32 %rd140, %r1214, 2, %rd14; 2026-02-21T09:27:30.4418565Z mad.wide.u32 %rd141, %r1215, 2, %rd14; 2026-02-21T09:27:30.4418826Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4419106Z // begin inline asm 2026-02-21T09:27:30.4419455Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606}, [%r862 + 0]; 2026-02-21T09:27:30.4419840Z // end inline asm 2026-02-21T09:27:30.4419981Z // begin inline asm 2026-02-21T09:27:30.4420348Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623}, [%r862 + 16]; 2026-02-21T09:27:30.4420772Z // end inline asm 2026-02-21T09:27:30.4420905Z // begin inline asm 2026-02-21T09:27:30.4421247Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r625, 
%r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640}, [%r862 + 32]; 2026-02-21T09:27:30.4421621Z // end inline asm 2026-02-21T09:27:30.4421751Z // begin inline asm 2026-02-21T09:27:30.4422084Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657}, [%r862 + 48]; 2026-02-21T09:27:30.4422443Z // end inline asm 2026-02-21T09:27:30.4422577Z // begin inline asm 2026-02-21T09:27:30.4422908Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674}, [%r862 + 64]; 2026-02-21T09:27:30.4423273Z // end inline asm 2026-02-21T09:27:30.4423411Z // begin inline asm 2026-02-21T09:27:30.4423744Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691}, [%r862 + 80]; 2026-02-21T09:27:30.4424120Z // end inline asm 2026-02-21T09:27:30.4424248Z // begin inline asm 2026-02-21T09:27:30.4424597Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708}, [%r862 + 96]; 2026-02-21T09:27:30.4425010Z // end inline asm 2026-02-21T09:27:30.4425147Z // begin inline asm 2026-02-21T09:27:30.4425496Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725}, [%r862 + 112]; 2026-02-21T09:27:30.4425866Z // end inline asm 2026-02-21T09:27:30.4426001Z // begin inline asm 2026-02-21T09:27:30.4426338Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742}, [%r862 + 128]; 2026-02-21T09:27:30.4426756Z // end inline asm 2026-02-21T09:27:30.4426918Z // begin inline asm 2026-02-21T09:27:30.4427261Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759}, [%r862 + 144]; 2026-02-21T09:27:30.4427634Z // end inline asm 2026-02-21T09:27:30.4427762Z // begin inline asm 2026-02-21T09:27:30.4428100Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776}, [%r862 + 160]; 2026-02-21T09:27:30.4428464Z // end inline asm 2026-02-21T09:27:30.4428596Z // begin inline asm 2026-02-21T09:27:30.4428933Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793}, [%r862 + 176]; 2026-02-21T09:27:30.4429305Z // end inline asm 2026-02-21T09:27:30.4429440Z // begin inline asm 2026-02-21T09:27:30.4429771Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810}, [%r862 + 192]; 2026-02-21T09:27:30.4430145Z // end inline asm 2026-02-21T09:27:30.4430273Z // begin inline asm 2026-02-21T09:27:30.4430611Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827}, [%r862 + 208]; 2026-02-21T09:27:30.4430982Z // end inline asm 2026-02-21T09:27:30.4431110Z // begin inline asm 2026-02-21T09:27:30.4431449Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844}, [%r862 + 224]; 2026-02-21T09:27:30.4431812Z // end inline asm 2026-02-21T09:27:30.4431944Z // begin inline asm 2026-02-21T09:27:30.4432326Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r846, %r847, %r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858, %r859, %r860, %r861}, [%r862 + 240]; 2026-02-21T09:27:30.4432713Z // end inline asm 
2026-02-21T09:27:30.4432851Z // begin inline asm 2026-02-21T09:27:30.4432998Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:27:30.4433170Z // end inline asm 2026-02-21T09:27:30.4433303Z cvt.u64.u32 %rd142, %r591; 2026-02-21T09:27:30.4433469Z cvt.u64.u32 %rd143, %r592; 2026-02-21T09:27:30.4433622Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:27:30.4433782Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:27:30.4434045Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4434335Z mov.b64 {%r1216, %r1217}, %rd145; 2026-02-21T09:27:30.4434517Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T09:27:30.4434850Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4435130Z cvt.u64.u32 %rd146, %r593; 2026-02-21T09:27:30.4435281Z cvt.u64.u32 %rd147, %r594; 2026-02-21T09:27:30.4435437Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:27:30.4435591Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:27:30.4435855Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4436137Z mov.b64 {%r1219, %r1220}, %rd149; 2026-02-21T09:27:30.4436314Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T09:27:30.4436594Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4436863Z cvt.u64.u32 %rd150, %r595; 2026-02-21T09:27:30.4437017Z cvt.u64.u32 %rd151, %r596; 2026-02-21T09:27:30.4437163Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:27:30.4437325Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:27:30.4437579Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4437863Z mov.b64 {%r1222, %r1223}, %rd153; 2026-02-21T09:27:30.4438040Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T09:27:30.4438309Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4438638Z cvt.u64.u32 %rd154, %r597; 2026-02-21T09:27:30.4438787Z cvt.u64.u32 %rd155, %r598; 
2026-02-21T09:27:30.4438938Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:27:30.4439087Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:27:30.4439348Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4439623Z mov.b64 {%r1225, %r1226}, %rd157; 2026-02-21T09:27:30.4439798Z cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T09:27:30.4440075Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4440350Z cvt.u64.u32 %rd158, %r599; 2026-02-21T09:27:30.4440501Z cvt.u64.u32 %rd159, %r600; 2026-02-21T09:27:30.4440648Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:27:30.4440802Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:27:30.4441059Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4441347Z mov.b64 {%r1228, %r1229}, %rd161; 2026-02-21T09:27:30.4441519Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T09:27:30.4441789Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4442070Z cvt.u64.u32 %rd162, %r601; 2026-02-21T09:27:30.4442216Z cvt.u64.u32 %rd163, %r602; 2026-02-21T09:27:30.4442372Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:27:30.4442523Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:27:30.4442787Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4443072Z mov.b64 {%r1231, %r1232}, %rd165; 2026-02-21T09:27:30.4443240Z cvt.rn.f16x2.f32 %r1233, %r1232, %r1231; 2026-02-21T09:27:30.4443544Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4443855Z cvt.u64.u32 %rd166, %r603; 2026-02-21T09:27:30.4444015Z cvt.u64.u32 %rd167, %r604; 2026-02-21T09:27:30.4444164Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:27:30.4444334Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:27:30.4444586Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4444901Z mov.b64 {%r1234, 
%r1235}, %rd169; 2026-02-21T09:27:30.4444966Z cvt.rn.f16x2.f32 %r1236, %r1235, %r1234; 2026-02-21T09:27:30.4445132Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4445189Z cvt.u64.u32 %rd170, %r605; 2026-02-21T09:27:30.4445245Z cvt.u64.u32 %rd171, %r606; 2026-02-21T09:27:30.4445302Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:27:30.4445367Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:27:30.4445526Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4445586Z mov.b64 {%r1237, %r1238}, %rd173; 2026-02-21T09:27:30.4445660Z cvt.rn.f16x2.f32 %r1239, %r1238, %r1237; 2026-02-21T09:27:30.4445819Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4445876Z cvt.u64.u32 %rd174, %r608; 2026-02-21T09:27:30.4445938Z cvt.u64.u32 %rd175, %r609; 2026-02-21T09:27:30.4445994Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:27:30.4446052Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:27:30.4446208Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4446273Z mov.b64 {%r1240, %r1241}, %rd177; 2026-02-21T09:27:30.4446336Z cvt.rn.f16x2.f32 %r1242, %r1241, %r1240; 2026-02-21T09:27:30.4446492Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4446556Z cvt.u64.u32 %rd178, %r610; 2026-02-21T09:27:30.4446614Z cvt.u64.u32 %rd179, %r611; 2026-02-21T09:27:30.4446674Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:27:30.4446778Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:27:30.4446977Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4447040Z mov.b64 {%r1243, %r1244}, %rd181; 2026-02-21T09:27:30.4447114Z cvt.rn.f16x2.f32 %r1245, %r1244, %r1243; 2026-02-21T09:27:30.4447281Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4447339Z cvt.u64.u32 %rd182, %r612; 
2026-02-21T09:27:30.4447398Z cvt.u64.u32 %rd183, %r613; 2026-02-21T09:27:30.4447465Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:27:30.4447525Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:27:30.4447695Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4447765Z mov.b64 {%r1246, %r1247}, %rd185; 2026-02-21T09:27:30.4447833Z cvt.rn.f16x2.f32 %r1248, %r1247, %r1246; 2026-02-21T09:27:30.4448002Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4448072Z cvt.u64.u32 %rd186, %r614; 2026-02-21T09:27:30.4448132Z cvt.u64.u32 %rd187, %r615; 2026-02-21T09:27:30.4448190Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:27:30.4448248Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:27:30.4448426Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4448486Z mov.b64 {%r1249, %r1250}, %rd189; 2026-02-21T09:27:30.4448551Z cvt.rn.f16x2.f32 %r1251, %r1250, %r1249; 2026-02-21T09:27:30.4448725Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4448784Z cvt.u64.u32 %rd190, %r616; 2026-02-21T09:27:30.4448841Z cvt.u64.u32 %rd191, %r617; 2026-02-21T09:27:30.4448900Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:27:30.4449007Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:27:30.4449222Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4449285Z mov.b64 {%r1252, %r1253}, %rd193; 2026-02-21T09:27:30.4449359Z cvt.rn.f16x2.f32 %r1254, %r1253, %r1252; 2026-02-21T09:27:30.4449530Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4449588Z cvt.u64.u32 %rd194, %r618; 2026-02-21T09:27:30.4449651Z cvt.u64.u32 %rd195, %r619; 2026-02-21T09:27:30.4449709Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:27:30.4449768Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:27:30.4449940Z .loc 1 55 27 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4450006Z mov.b64 {%r1255, %r1256}, %rd197; 2026-02-21T09:27:30.4450072Z cvt.rn.f16x2.f32 %r1257, %r1256, %r1255; 2026-02-21T09:27:30.4450243Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4450310Z cvt.u64.u32 %rd198, %r620; 2026-02-21T09:27:30.4450369Z cvt.u64.u32 %rd199, %r621; 2026-02-21T09:27:30.4450428Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:27:30.4450493Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:27:30.4450664Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4450725Z mov.b64 {%r1258, %r1259}, %rd201; 2026-02-21T09:27:30.4450797Z cvt.rn.f16x2.f32 %r1260, %r1259, %r1258; 2026-02-21T09:27:30.4450967Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4451025Z cvt.u64.u32 %rd202, %r622; 2026-02-21T09:27:30.4451082Z cvt.u64.u32 %rd203, %r623; 2026-02-21T09:27:30.4451148Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:27:30.4451206Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:27:30.4451378Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4451448Z mov.b64 {%r1261, %r1262}, %rd205; 2026-02-21T09:27:30.4451542Z cvt.rn.f16x2.f32 %r1263, %r1262, %r1261; 2026-02-21T09:27:30.4451740Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4451806Z cvt.u64.u32 %rd206, %r625; 2026-02-21T09:27:30.4451866Z cvt.u64.u32 %rd207, %r626; 2026-02-21T09:27:30.4451925Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:27:30.4451984Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:27:30.4452162Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4452222Z mov.b64 {%r1264, %r1265}, %rd209; 2026-02-21T09:27:30.4452289Z cvt.rn.f16x2.f32 %r1266, %r1265, %r1264; 2026-02-21T09:27:30.4452466Z .loc 1 53 52 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4452525Z cvt.u64.u32 %rd210, %r627; 2026-02-21T09:27:30.4452585Z cvt.u64.u32 %rd211, %r628; 2026-02-21T09:27:30.4452646Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:27:30.4452715Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:27:30.4452889Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4452950Z mov.b64 {%r1267, %r1268}, %rd213; 2026-02-21T09:27:30.4453028Z cvt.rn.f16x2.f32 %r1269, %r1268, %r1267; 2026-02-21T09:27:30.4453200Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4453261Z cvt.u64.u32 %rd214, %r629; 2026-02-21T09:27:30.4453331Z cvt.u64.u32 %rd215, %r630; 2026-02-21T09:27:30.4453392Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:27:30.4453451Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:27:30.4453624Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4453693Z mov.b64 {%r1270, %r1271}, %rd217; 2026-02-21T09:27:30.4453781Z cvt.rn.f16x2.f32 %r1272, %r1271, %r1270; 2026-02-21T09:27:30.4453975Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4454046Z cvt.u64.u32 %rd218, %r631; 2026-02-21T09:27:30.4454105Z cvt.u64.u32 %rd219, %r632; 2026-02-21T09:27:30.4454163Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:27:30.4454240Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:27:30.4454397Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4454455Z mov.b64 {%r1273, %r1274}, %rd221; 2026-02-21T09:27:30.4454525Z cvt.rn.f16x2.f32 %r1275, %r1274, %r1273; 2026-02-21T09:27:30.4454725Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4454783Z cvt.u64.u32 %rd222, %r633; 2026-02-21T09:27:30.4454837Z cvt.u64.u32 %rd223, %r634; 2026-02-21T09:27:30.4454900Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:27:30.4454957Z or.b64 
%rd225, %rd222, %rd224; 2026-02-21T09:27:30.4455123Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4455191Z mov.b64 {%r1276, %r1277}, %rd225; 2026-02-21T09:27:30.4455256Z cvt.rn.f16x2.f32 %r1278, %r1277, %r1276; 2026-02-21T09:27:30.4455419Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4455483Z cvt.u64.u32 %rd226, %r635; 2026-02-21T09:27:30.4455538Z cvt.u64.u32 %rd227, %r636; 2026-02-21T09:27:30.4455594Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:27:30.4455652Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:27:30.4455823Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4455878Z mov.b64 {%r1279, %r1280}, %rd229; 2026-02-21T09:27:30.4455943Z cvt.rn.f16x2.f32 %r1281, %r1280, %r1279; 2026-02-21T09:27:30.4456113Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4456170Z cvt.u64.u32 %rd230, %r637; 2026-02-21T09:27:30.4456255Z cvt.u64.u32 %rd231, %r638; 2026-02-21T09:27:30.4456335Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:27:30.4456401Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:27:30.4456566Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4456624Z mov.b64 {%r1282, %r1283}, %rd233; 2026-02-21T09:27:30.4456695Z cvt.rn.f16x2.f32 %r1284, %r1283, %r1282; 2026-02-21T09:27:30.4456858Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4456914Z cvt.u64.u32 %rd234, %r639; 2026-02-21T09:27:30.4456977Z cvt.u64.u32 %rd235, %r640; 2026-02-21T09:27:30.4457033Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:27:30.4457089Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:27:30.4457247Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4457312Z mov.b64 {%r1285, %r1286}, %rd237; 2026-02-21T09:27:30.4457376Z cvt.rn.f16x2.f32 %r1287, %r1286, %r1285; 
2026-02-21T09:27:30.4457538Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4457600Z cvt.u64.u32 %rd238, %r642; 2026-02-21T09:27:30.4457655Z cvt.u64.u32 %rd239, %r643; 2026-02-21T09:27:30.4457710Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:27:30.4457772Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:27:30.4457935Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4457991Z mov.b64 {%r1288, %r1289}, %rd241; 2026-02-21T09:27:30.4458059Z cvt.rn.f16x2.f32 %r1290, %r1289, %r1288; 2026-02-21T09:27:30.4458216Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4458273Z cvt.u64.u32 %rd242, %r644; 2026-02-21T09:27:30.4458328Z cvt.u64.u32 %rd243, %r645; 2026-02-21T09:27:30.4458414Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:27:30.4458497Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:27:30.4458660Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4458724Z mov.b64 {%r1291, %r1292}, %rd245; 2026-02-21T09:27:30.4458786Z cvt.rn.f16x2.f32 %r1293, %r1292, %r1291; 2026-02-21T09:27:30.4458943Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4459005Z cvt.u64.u32 %rd246, %r646; 2026-02-21T09:27:30.4459060Z cvt.u64.u32 %rd247, %r647; 2026-02-21T09:27:30.4459114Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:27:30.4459169Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:27:30.4459333Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4459389Z mov.b64 {%r1294, %r1295}, %rd249; 2026-02-21T09:27:30.4459451Z cvt.rn.f16x2.f32 %r1296, %r1295, %r1294; 2026-02-21T09:27:30.4459615Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4459674Z cvt.u64.u32 %rd250, %r648; 2026-02-21T09:27:30.4459729Z cvt.u64.u32 %rd251, %r649; 2026-02-21T09:27:30.4459784Z shl.b64 %rd252, %rd251, 
32; 2026-02-21T09:27:30.4459848Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:27:30.4460008Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4460063Z mov.b64 {%r1297, %r1298}, %rd253; 2026-02-21T09:27:30.4460133Z cvt.rn.f16x2.f32 %r1299, %r1298, %r1297; 2026-02-21T09:27:30.4460291Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4460347Z cvt.u64.u32 %rd254, %r650; 2026-02-21T09:27:30.4460409Z cvt.u64.u32 %rd255, %r651; 2026-02-21T09:27:30.4460465Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:27:30.4460522Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:27:30.4460688Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4460775Z mov.b64 {%r1300, %r1301}, %rd257; 2026-02-21T09:27:30.4460860Z cvt.rn.f16x2.f32 %r1302, %r1301, %r1300; 2026-02-21T09:27:30.4461022Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4461087Z cvt.u64.u32 %rd258, %r652; 2026-02-21T09:27:30.4461142Z cvt.u64.u32 %rd259, %r653; 2026-02-21T09:27:30.4461197Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:27:30.4461261Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:27:30.4461423Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4461481Z mov.b64 {%r1303, %r1304}, %rd261; 2026-02-21T09:27:30.4461552Z cvt.rn.f16x2.f32 %r1305, %r1304, %r1303; 2026-02-21T09:27:30.4461715Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4461772Z cvt.u64.u32 %rd262, %r654; 2026-02-21T09:27:30.4461830Z cvt.u64.u32 %rd263, %r655; 2026-02-21T09:27:30.4461898Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:27:30.4461957Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:27:30.4462120Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4462188Z mov.b64 {%r1306, %r1307}, %rd265; 2026-02-21T09:27:30.4462251Z 
cvt.rn.f16x2.f32 %r1308, %r1307, %r1306; 2026-02-21T09:27:30.4462416Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4462479Z cvt.u64.u32 %rd266, %r656; 2026-02-21T09:27:30.4462536Z cvt.u64.u32 %rd267, %r657; 2026-02-21T09:27:30.4462590Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:27:30.4462647Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:27:30.4462816Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4462895Z mov.b64 {%r1309, %r1310}, %rd269; 2026-02-21T09:27:30.4462980Z cvt.rn.f16x2.f32 %r1311, %r1310, %r1309; 2026-02-21T09:27:30.4463146Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4463203Z cvt.u64.u32 %rd270, %r659; 2026-02-21T09:27:30.4463257Z cvt.u64.u32 %rd271, %r660; 2026-02-21T09:27:30.4463314Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:27:30.4463378Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:27:30.4463536Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4463594Z mov.b64 {%r1312, %r1313}, %rd273; 2026-02-21T09:27:30.4463664Z cvt.rn.f16x2.f32 %r1314, %r1313, %r1312; 2026-02-21T09:27:30.4463823Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4463879Z cvt.u64.u32 %rd274, %r661; 2026-02-21T09:27:30.4463942Z cvt.u64.u32 %rd275, %r662; 2026-02-21T09:27:30.4463999Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:27:30.4464058Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:27:30.4464218Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4464283Z mov.b64 {%r1315, %r1316}, %rd277; 2026-02-21T09:27:30.4464347Z cvt.rn.f16x2.f32 %r1317, %r1316, %r1315; 2026-02-21T09:27:30.4464503Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4464567Z cvt.u64.u32 %rd278, %r663; 2026-02-21T09:27:30.4464623Z cvt.u64.u32 %rd279, %r664; 
2026-02-21T09:27:30.4464700Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:27:30.4464764Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:27:30.4464923Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4464980Z mov.b64 {%r1318, %r1319}, %rd281; 2026-02-21T09:27:30.4465050Z cvt.rn.f16x2.f32 %r1320, %r1319, %r1318; 2026-02-21T09:27:30.4465213Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4465296Z cvt.u64.u32 %rd282, %r665; 2026-02-21T09:27:30.4465390Z cvt.u64.u32 %rd283, %r666; 2026-02-21T09:27:30.4465452Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:27:30.4465509Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:27:30.4465670Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4465734Z mov.b64 {%r1321, %r1322}, %rd285; 2026-02-21T09:27:30.4465796Z cvt.rn.f16x2.f32 %r1323, %r1322, %r1321; 2026-02-21T09:27:30.4465953Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4466015Z cvt.u64.u32 %rd286, %r667; 2026-02-21T09:27:30.4466069Z cvt.u64.u32 %rd287, %r668; 2026-02-21T09:27:30.4466125Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:27:30.4466183Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:27:30.4466353Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4466412Z mov.b64 {%r1324, %r1325}, %rd289; 2026-02-21T09:27:30.4466477Z cvt.rn.f16x2.f32 %r1326, %r1325, %r1324; 2026-02-21T09:27:30.4466643Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4466698Z cvt.u64.u32 %rd290, %r669; 2026-02-21T09:27:30.4466752Z cvt.u64.u32 %rd291, %r670; 2026-02-21T09:27:30.4466807Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:27:30.4466869Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:27:30.4467027Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4467085Z mov.b64 {%r1327, 
%r1328}, %rd293; 2026-02-21T09:27:30.4467154Z cvt.rn.f16x2.f32 %r1329, %r1328, %r1327; 2026-02-21T09:27:30.4467315Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4467396Z cvt.u64.u32 %rd294, %r671; 2026-02-21T09:27:30.4467480Z cvt.u64.u32 %rd295, %r672; 2026-02-21T09:27:30.4467539Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:27:30.4467596Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:27:30.4467750Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4467813Z mov.b64 {%r1330, %r1331}, %rd297; 2026-02-21T09:27:30.4467875Z cvt.rn.f16x2.f32 %r1332, %r1331, %r1330; 2026-02-21T09:27:30.4468030Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4468093Z cvt.u64.u32 %rd298, %r673; 2026-02-21T09:27:30.4468147Z cvt.u64.u32 %rd299, %r674; 2026-02-21T09:27:30.4468202Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:27:30.4468264Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:27:30.4468417Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4468474Z mov.b64 {%r1333, %r1334}, %rd301; 2026-02-21T09:27:30.4468545Z cvt.rn.f16x2.f32 %r1335, %r1334, %r1333; 2026-02-21T09:27:30.4468700Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4468757Z cvt.u64.u32 %rd302, %r676; 2026-02-21T09:27:30.4468812Z cvt.u64.u32 %rd303, %r677; 2026-02-21T09:27:30.4468875Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:27:30.4468931Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:27:30.4469087Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4469151Z mov.b64 {%r1336, %r1337}, %rd305; 2026-02-21T09:27:30.4469214Z cvt.rn.f16x2.f32 %r1338, %r1337, %r1336; 2026-02-21T09:27:30.4469367Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4469430Z cvt.u64.u32 %rd306, %r678; 
2026-02-21T09:27:30.4469486Z cvt.u64.u32 %rd307, %r679; 2026-02-21T09:27:30.4469543Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:27:30.4469600Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:27:30.4469792Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4469870Z mov.b64 {%r1339, %r1340}, %rd309; 2026-02-21T09:27:30.4469934Z cvt.rn.f16x2.f32 %r1341, %r1340, %r1339; 2026-02-21T09:27:30.4470103Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4470158Z cvt.u64.u32 %rd310, %r680; 2026-02-21T09:27:30.4470214Z cvt.u64.u32 %rd311, %r681; 2026-02-21T09:27:30.4470270Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:27:30.4470336Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:27:30.4470497Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4470557Z mov.b64 {%r1342, %r1343}, %rd313; 2026-02-21T09:27:30.4470630Z cvt.rn.f16x2.f32 %r1344, %r1343, %r1342; 2026-02-21T09:27:30.4470792Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4470849Z cvt.u64.u32 %rd314, %r682; 2026-02-21T09:27:30.4470913Z cvt.u64.u32 %rd315, %r683; 2026-02-21T09:27:30.4470970Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:27:30.4471026Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:27:30.4471183Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4471246Z mov.b64 {%r1345, %r1346}, %rd317; 2026-02-21T09:27:30.4471310Z cvt.rn.f16x2.f32 %r1347, %r1346, %r1345; 2026-02-21T09:27:30.4471468Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4471530Z cvt.u64.u32 %rd318, %r684; 2026-02-21T09:27:30.4471584Z cvt.u64.u32 %rd319, %r685; 2026-02-21T09:27:30.4471638Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:27:30.4471702Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:27:30.4471905Z .loc 1 55 27 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4471965Z mov.b64 {%r1348, %r1349}, %rd321; 2026-02-21T09:27:30.4472037Z cvt.rn.f16x2.f32 %r1350, %r1349, %r1348; 2026-02-21T09:27:30.4472197Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4472254Z cvt.u64.u32 %rd322, %r686; 2026-02-21T09:27:30.4472308Z cvt.u64.u32 %rd323, %r687; 2026-02-21T09:27:30.4472370Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:27:30.4472426Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:27:30.4472588Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4472650Z mov.b64 {%r1351, %r1352}, %rd325; 2026-02-21T09:27:30.4472712Z cvt.rn.f16x2.f32 %r1353, %r1352, %r1351; 2026-02-21T09:27:30.4472872Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4472937Z cvt.u64.u32 %rd326, %r688; 2026-02-21T09:27:30.4472995Z cvt.u64.u32 %rd327, %r689; 2026-02-21T09:27:30.4473052Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:27:30.4473111Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:27:30.4473280Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4473336Z mov.b64 {%r1354, %r1355}, %rd329; 2026-02-21T09:27:30.4473400Z cvt.rn.f16x2.f32 %r1356, %r1355, %r1354; 2026-02-21T09:27:30.4473572Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4473629Z cvt.u64.u32 %rd330, %r690; 2026-02-21T09:27:30.4473684Z cvt.u64.u32 %rd331, %r691; 2026-02-21T09:27:30.4473741Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:27:30.4473804Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:27:30.4473965Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4474025Z mov.b64 {%r1357, %r1358}, %rd333; 2026-02-21T09:27:30.4474098Z cvt.rn.f16x2.f32 %r1359, %r1358, %r1357; 2026-02-21T09:27:30.4474280Z .loc 1 53 52 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4474355Z cvt.u64.u32 %rd334, %r693; 2026-02-21T09:27:30.4474417Z cvt.u64.u32 %rd335, %r694; 2026-02-21T09:27:30.4474472Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:27:30.4474529Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:27:30.4474737Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4474804Z mov.b64 {%r1360, %r1361}, %rd337; 2026-02-21T09:27:30.4474869Z cvt.rn.f16x2.f32 %r1362, %r1361, %r1360; 2026-02-21T09:27:30.4475037Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4475102Z cvt.u64.u32 %rd338, %r695; 2026-02-21T09:27:30.4475160Z cvt.u64.u32 %rd339, %r696; 2026-02-21T09:27:30.4475216Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:27:30.4475284Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:27:30.4475450Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4475510Z mov.b64 {%r1363, %r1364}, %rd341; 2026-02-21T09:27:30.4475583Z cvt.rn.f16x2.f32 %r1365, %r1364, %r1363; 2026-02-21T09:27:30.4475748Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4475805Z cvt.u64.u32 %rd342, %r697; 2026-02-21T09:27:30.4475863Z cvt.u64.u32 %rd343, %r698; 2026-02-21T09:27:30.4475928Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:27:30.4475987Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:27:30.4476154Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4476217Z mov.b64 {%r1366, %r1367}, %rd345; 2026-02-21T09:27:30.4476282Z cvt.rn.f16x2.f32 %r1368, %r1367, %r1366; 2026-02-21T09:27:30.4476502Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4476567Z cvt.u64.u32 %rd346, %r699; 2026-02-21T09:27:30.4476625Z cvt.u64.u32 %rd347, %r700; 2026-02-21T09:27:30.4476679Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:27:30.4476735Z or.b64 
%rd349, %rd346, %rd348; 2026-02-21T09:27:30.4476904Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4476960Z mov.b64 {%r1369, %r1370}, %rd349; 2026-02-21T09:27:30.4477023Z cvt.rn.f16x2.f32 %r1371, %r1370, %r1369; 2026-02-21T09:27:30.4477188Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4477244Z cvt.u64.u32 %rd350, %r701; 2026-02-21T09:27:30.4477299Z cvt.u64.u32 %rd351, %r702; 2026-02-21T09:27:30.4477353Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:27:30.4477417Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:27:30.4477583Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4477641Z mov.b64 {%r1372, %r1373}, %rd353; 2026-02-21T09:27:30.4477716Z cvt.rn.f16x2.f32 %r1374, %r1373, %r1372; 2026-02-21T09:27:30.4477876Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4477931Z cvt.u64.u32 %rd354, %r703; 2026-02-21T09:27:30.4477993Z cvt.u64.u32 %rd355, %r704; 2026-02-21T09:27:30.4478048Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:27:30.4478104Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:27:30.4478270Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4478335Z mov.b64 {%r1375, %r1376}, %rd357; 2026-02-21T09:27:30.4478400Z cvt.rn.f16x2.f32 %r1377, %r1376, %r1375; 2026-02-21T09:27:30.4478564Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4478629Z cvt.u64.u32 %rd358, %r705; 2026-02-21T09:27:30.4478687Z cvt.u64.u32 %rd359, %r706; 2026-02-21T09:27:30.4478744Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:27:30.4478858Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:27:30.4479025Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4479082Z mov.b64 {%r1378, %r1379}, %rd361; 2026-02-21T09:27:30.4479157Z cvt.rn.f16x2.f32 %r1380, %r1379, %r1378; 
2026-02-21T09:27:30.4479321Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4479377Z cvt.u64.u32 %rd362, %r707; 2026-02-21T09:27:30.4479443Z cvt.u64.u32 %rd363, %r708; 2026-02-21T09:27:30.4479507Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:27:30.4479563Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:27:30.4479730Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4479795Z mov.b64 {%r1381, %r1382}, %rd365; 2026-02-21T09:27:30.4479861Z cvt.rn.f16x2.f32 %r1383, %r1382, %r1381; 2026-02-21T09:27:30.4480021Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4480085Z cvt.u64.u32 %rd366, %r710; 2026-02-21T09:27:30.4480140Z cvt.u64.u32 %rd367, %r711; 2026-02-21T09:27:30.4480196Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:27:30.4480252Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:27:30.4480422Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4480478Z mov.b64 {%r1384, %r1385}, %rd369; 2026-02-21T09:27:30.4480541Z cvt.rn.f16x2.f32 %r1386, %r1385, %r1384; 2026-02-21T09:27:30.4480711Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4480766Z cvt.u64.u32 %rd370, %r712; 2026-02-21T09:27:30.4480821Z cvt.u64.u32 %rd371, %r713; 2026-02-21T09:27:30.4480885Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:27:30.4480964Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:27:30.4481149Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4481208Z mov.b64 {%r1387, %r1388}, %rd373; 2026-02-21T09:27:30.4481278Z cvt.rn.f16x2.f32 %r1389, %r1388, %r1387; 2026-02-21T09:27:30.4481442Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4481497Z cvt.u64.u32 %rd374, %r714; 2026-02-21T09:27:30.4481559Z cvt.u64.u32 %rd375, %r715; 2026-02-21T09:27:30.4481615Z shl.b64 %rd376, %rd375, 
32; 2026-02-21T09:27:30.4481672Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:27:30.4481842Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4481899Z mov.b64 {%r1390, %r1391}, %rd377; 2026-02-21T09:27:30.4481962Z cvt.rn.f16x2.f32 %r1392, %r1391, %r1390; 2026-02-21T09:27:30.4482127Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4482192Z cvt.u64.u32 %rd378, %r716; 2026-02-21T09:27:30.4482250Z cvt.u64.u32 %rd379, %r717; 2026-02-21T09:27:30.4482306Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:27:30.4482370Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:27:30.4482529Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4482585Z mov.b64 {%r1393, %r1394}, %rd381; 2026-02-21T09:27:30.4482655Z cvt.rn.f16x2.f32 %r1395, %r1394, %r1393; 2026-02-21T09:27:30.4482819Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4482875Z cvt.u64.u32 %rd382, %r718; 2026-02-21T09:27:30.4482930Z cvt.u64.u32 %rd383, %r719; 2026-02-21T09:27:30.4482991Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:27:30.4483047Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:27:30.4483213Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4483276Z mov.b64 {%r1396, %r1397}, %rd385; 2026-02-21T09:27:30.4483408Z cvt.rn.f16x2.f32 %r1398, %r1397, %r1396; 2026-02-21T09:27:30.4483599Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4483662Z cvt.u64.u32 %rd386, %r720; 2026-02-21T09:27:30.4483718Z cvt.u64.u32 %rd387, %r721; 2026-02-21T09:27:30.4483773Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:27:30.4483830Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:27:30.4483996Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4484053Z mov.b64 {%r1399, %r1400}, %rd389; 2026-02-21T09:27:30.4484114Z 
cvt.rn.f16x2.f32 %r1401, %r1400, %r1399; 2026-02-21T09:27:30.4484284Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4484339Z cvt.u64.u32 %rd390, %r722; 2026-02-21T09:27:30.4484395Z cvt.u64.u32 %rd391, %r723; 2026-02-21T09:27:30.4484458Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:27:30.4484515Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:27:30.4484721Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4484781Z mov.b64 {%r1402, %r1403}, %rd393; 2026-02-21T09:27:30.4484851Z cvt.rn.f16x2.f32 %r1404, %r1403, %r1402; 2026-02-21T09:27:30.4485013Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4485070Z cvt.u64.u32 %rd394, %r724; 2026-02-21T09:27:30.4485132Z cvt.u64.u32 %rd395, %r725; 2026-02-21T09:27:30.4485188Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:27:30.4485244Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:27:30.4485415Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4485475Z mov.b64 {%r1405, %r1406}, %rd397; 2026-02-21T09:27:30.4485569Z cvt.rn.f16x2.f32 %r1407, %r1406, %r1405; 2026-02-21T09:27:30.4485758Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4485825Z cvt.u64.u32 %rd398, %r727; 2026-02-21T09:27:30.4485880Z cvt.u64.u32 %rd399, %r728; 2026-02-21T09:27:30.4485934Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:27:30.4485997Z or.b64 %rd401, %rd398, %rd400; 2026-02-21T09:27:30.4486157Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4486214Z mov.b64 {%r1408, %r1409}, %rd401; 2026-02-21T09:27:30.4486284Z cvt.rn.f16x2.f32 %r1410, %r1409, %r1408; 2026-02-21T09:27:30.4486444Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4486500Z cvt.u64.u32 %rd402, %r729; 2026-02-21T09:27:30.4486556Z cvt.u64.u32 %rd403, %r730; 
2026-02-21T09:27:30.4486619Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:27:30.4486677Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:27:30.4486840Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4486907Z mov.b64 {%r1411, %r1412}, %rd405; 2026-02-21T09:27:30.4486971Z cvt.rn.f16x2.f32 %r1413, %r1412, %r1411; 2026-02-21T09:27:30.4487131Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4487194Z cvt.u64.u32 %rd406, %r731; 2026-02-21T09:27:30.4487250Z cvt.u64.u32 %rd407, %r732; 2026-02-21T09:27:30.4487306Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:27:30.4487362Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:27:30.4487533Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4487590Z mov.b64 {%r1414, %r1415}, %rd409; 2026-02-21T09:27:30.4487656Z cvt.rn.f16x2.f32 %r1416, %r1415, %r1414; 2026-02-21T09:27:30.4487826Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4487885Z cvt.u64.u32 %rd410, %r733; 2026-02-21T09:27:30.4487971Z cvt.u64.u32 %rd411, %r734; 2026-02-21T09:27:30.4488064Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:27:30.4488121Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:27:30.4488279Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4488336Z mov.b64 {%r1417, %r1418}, %rd413; 2026-02-21T09:27:30.4488406Z cvt.rn.f16x2.f32 %r1419, %r1418, %r1417; 2026-02-21T09:27:30.4488563Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4488618Z cvt.u64.u32 %rd414, %r735; 2026-02-21T09:27:30.4488679Z cvt.u64.u32 %rd415, %r736; 2026-02-21T09:27:30.4488736Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:27:30.4488791Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:27:30.4488957Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4489015Z mov.b64 {%r1420, 
%r1421}, %rd417; 2026-02-21T09:27:30.4489080Z cvt.rn.f16x2.f32 %r1422, %r1421, %r1420; 2026-02-21T09:27:30.4489240Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4489302Z cvt.u64.u32 %rd418, %r737; 2026-02-21T09:27:30.4489356Z cvt.u64.u32 %rd419, %r738; 2026-02-21T09:27:30.4489412Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:27:30.4489474Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:27:30.4489633Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4489689Z mov.b64 {%r1423, %r1424}, %rd421; 2026-02-21T09:27:30.4489759Z cvt.rn.f16x2.f32 %r1425, %r1424, %r1423; 2026-02-21T09:27:30.4489916Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4489971Z cvt.u64.u32 %rd422, %r739; 2026-02-21T09:27:30.4490046Z cvt.u64.u32 %rd423, %r740; 2026-02-21T09:27:30.4490134Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:27:30.4490196Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:27:30.4490366Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4490432Z mov.b64 {%r1426, %r1427}, %rd425; 2026-02-21T09:27:30.4490500Z cvt.rn.f16x2.f32 %r1428, %r1427, %r1426; 2026-02-21T09:27:30.4490670Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4490733Z cvt.u64.u32 %rd426, %r741; 2026-02-21T09:27:30.4490792Z cvt.u64.u32 %rd427, %r742; 2026-02-21T09:27:30.4490849Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:27:30.4490909Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:27:30.4491086Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4491145Z mov.b64 {%r1429, %r1430}, %rd429; 2026-02-21T09:27:30.4491213Z cvt.rn.f16x2.f32 %r1431, %r1430, %r1429; 2026-02-21T09:27:30.4491394Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4491456Z cvt.u64.u32 %rd430, %r744; 
2026-02-21T09:27:30.4491513Z cvt.u64.u32 %rd431, %r745; 2026-02-21T09:27:30.4491578Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:27:30.4491636Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:27:30.4491801Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4491862Z mov.b64 {%r1432, %r1433}, %rd433; 2026-02-21T09:27:30.4491935Z cvt.rn.f16x2.f32 %r1434, %r1433, %r1432; 2026-02-21T09:27:30.4492102Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4492159Z cvt.u64.u32 %rd434, %r746; 2026-02-21T09:27:30.4492224Z cvt.u64.u32 %rd435, %r747; 2026-02-21T09:27:30.4492282Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:27:30.4492342Z or.b64 %rd437, %rd434, %rd436; 2026-02-21T09:27:30.4492520Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4492630Z mov.b64 {%r1435, %r1436}, %rd437; 2026-02-21T09:27:30.4492697Z cvt.rn.f16x2.f32 %r1437, %r1436, %r1435; 2026-02-21T09:27:30.4492863Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4492927Z cvt.u64.u32 %rd438, %r748; 2026-02-21T09:27:30.4492985Z cvt.u64.u32 %rd439, %r749; 2026-02-21T09:27:30.4493042Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:27:30.4493106Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:27:30.4493272Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4493331Z mov.b64 {%r1438, %r1439}, %rd441; 2026-02-21T09:27:30.4493403Z cvt.rn.f16x2.f32 %r1440, %r1439, %r1438; 2026-02-21T09:27:30.4493569Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4493629Z cvt.u64.u32 %rd442, %r750; 2026-02-21T09:27:30.4493688Z cvt.u64.u32 %rd443, %r751; 2026-02-21T09:27:30.4493755Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:27:30.4493813Z or.b64 %rd445, %rd442, %rd444; 2026-02-21T09:27:30.4493979Z .loc 1 55 27 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4494045Z mov.b64 {%r1441, %r1442}, %rd445; 2026-02-21T09:27:30.4494109Z cvt.rn.f16x2.f32 %r1443, %r1442, %r1441; 2026-02-21T09:27:30.4494300Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4494363Z cvt.u64.u32 %rd446, %r752; 2026-02-21T09:27:30.4494421Z cvt.u64.u32 %rd447, %r753; 2026-02-21T09:27:30.4494478Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:27:30.4494538Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:27:30.4494764Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4494853Z mov.b64 {%r1444, %r1445}, %rd449; 2026-02-21T09:27:30.4494923Z cvt.rn.f16x2.f32 %r1446, %r1445, %r1444; 2026-02-21T09:27:30.4495103Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4495161Z cvt.u64.u32 %rd450, %r754; 2026-02-21T09:27:30.4495220Z cvt.u64.u32 %rd451, %r755; 2026-02-21T09:27:30.4495284Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:27:30.4495342Z or.b64 %rd453, %rd450, %rd452; 2026-02-21T09:27:30.4495530Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4495589Z mov.b64 {%r1447, %r1448}, %rd453; 2026-02-21T09:27:30.4495663Z cvt.rn.f16x2.f32 %r1449, %r1448, %r1447; 2026-02-21T09:27:30.4495851Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4495909Z cvt.u64.u32 %rd454, %r756; 2026-02-21T09:27:30.4495977Z cvt.u64.u32 %rd455, %r757; 2026-02-21T09:27:30.4496037Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:27:30.4496096Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:27:30.4496274Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4496333Z mov.b64 {%r1450, %r1451}, %rd457; 2026-02-21T09:27:30.4496399Z cvt.rn.f16x2.f32 %r1452, %r1451, %r1450; 2026-02-21T09:27:30.4496587Z .loc 1 53 52 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4496655Z cvt.u64.u32 %rd458, %r758; 2026-02-21T09:27:30.4496712Z cvt.u64.u32 %rd459, %r759; 2026-02-21T09:27:30.4496771Z shl.b64 %rd460, %rd459, 32; 2026-02-21T09:27:30.4496839Z or.b64 %rd461, %rd458, %rd460; 2026-02-21T09:27:30.4497028Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4497089Z mov.b64 {%r1453, %r1454}, %rd461; 2026-02-21T09:27:30.4497164Z cvt.rn.f16x2.f32 %r1455, %r1454, %r1453; 2026-02-21T09:27:30.4497332Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4497441Z cvt.u64.u32 %rd462, %r761; 2026-02-21T09:27:30.4497499Z cvt.u64.u32 %rd463, %r762; 2026-02-21T09:27:30.4497564Z shl.b64 %rd464, %rd463, 32; 2026-02-21T09:27:30.4497622Z or.b64 %rd465, %rd462, %rd464; 2026-02-21T09:27:30.4497811Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4497877Z mov.b64 {%r1456, %r1457}, %rd465; 2026-02-21T09:27:30.4497944Z cvt.rn.f16x2.f32 %r1458, %r1457, %r1456; 2026-02-21T09:27:30.4498131Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4498195Z cvt.u64.u32 %rd466, %r763; 2026-02-21T09:27:30.4498252Z cvt.u64.u32 %rd467, %r764; 2026-02-21T09:27:30.4498309Z shl.b64 %rd468, %rd467, 32; 2026-02-21T09:27:30.4498366Z or.b64 %rd469, %rd466, %rd468; 2026-02-21T09:27:30.4498541Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4498604Z mov.b64 {%r1459, %r1460}, %rd469; 2026-02-21T09:27:30.4498670Z cvt.rn.f16x2.f32 %r1461, %r1460, %r1459; 2026-02-21T09:27:30.4498858Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4498916Z cvt.u64.u32 %rd470, %r765; 2026-02-21T09:27:30.4498974Z cvt.u64.u32 %rd471, %r766; 2026-02-21T09:27:30.4499041Z shl.b64 %rd472, %rd471, 32; 2026-02-21T09:27:30.4499099Z or.b64 
%rd473, %rd470, %rd472; 2026-02-21T09:27:30.4499262Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4499321Z mov.b64 {%r1462, %r1463}, %rd473; 2026-02-21T09:27:30.4499392Z cvt.rn.f16x2.f32 %r1464, %r1463, %r1462; 2026-02-21T09:27:30.4499604Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4499696Z cvt.u64.u32 %rd474, %r767; 2026-02-21T09:27:30.4499773Z cvt.u64.u32 %rd475, %r768; 2026-02-21T09:27:30.4499831Z shl.b64 %rd476, %rd475, 32; 2026-02-21T09:27:30.4499887Z or.b64 %rd477, %rd474, %rd476; 2026-02-21T09:27:30.4500050Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4500108Z mov.b64 {%r1465, %r1466}, %rd477; 2026-02-21T09:27:30.4500171Z cvt.rn.f16x2.f32 %r1467, %r1466, %r1465; 2026-02-21T09:27:30.4500325Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4500388Z cvt.u64.u32 %rd478, %r769; 2026-02-21T09:27:30.4500444Z cvt.u64.u32 %rd479, %r770; 2026-02-21T09:27:30.4500501Z shl.b64 %rd480, %rd479, 32; 2026-02-21T09:27:30.4500564Z or.b64 %rd481, %rd478, %rd480; 2026-02-21T09:27:30.4500718Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4500777Z mov.b64 {%r1468, %r1469}, %rd481; 2026-02-21T09:27:30.4500848Z cvt.rn.f16x2.f32 %r1470, %r1469, %r1468; 2026-02-21T09:27:30.4501005Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4501061Z cvt.u64.u32 %rd482, %r771; 2026-02-21T09:27:30.4501117Z cvt.u64.u32 %rd483, %r772; 2026-02-21T09:27:30.4501180Z shl.b64 %rd484, %rd483, 32; 2026-02-21T09:27:30.4501237Z or.b64 %rd485, %rd482, %rd484; 2026-02-21T09:27:30.4501392Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4501455Z mov.b64 {%r1471, %r1472}, %rd485; 2026-02-21T09:27:30.4501516Z cvt.rn.f16x2.f32 %r1473, %r1472, %r1471; 
2026-02-21T09:27:30.4501669Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4501731Z cvt.u64.u32 %rd486, %r773; 2026-02-21T09:27:30.4501786Z cvt.u64.u32 %rd487, %r774; 2026-02-21T09:27:30.4501842Z shl.b64 %rd488, %rd487, 32; 2026-02-21T09:27:30.4501899Z or.b64 %rd489, %rd486, %rd488; 2026-02-21T09:27:30.4502108Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4502165Z mov.b64 {%r1474, %r1475}, %rd489; 2026-02-21T09:27:30.4502229Z cvt.rn.f16x2.f32 %r1476, %r1475, %r1474; 2026-02-21T09:27:30.4502401Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4502458Z cvt.u64.u32 %rd490, %r775; 2026-02-21T09:27:30.4502513Z cvt.u64.u32 %rd491, %r776; 2026-02-21T09:27:30.4502573Z shl.b64 %rd492, %rd491, 32; 2026-02-21T09:27:30.4502628Z or.b64 %rd493, %rd490, %rd492; 2026-02-21T09:27:30.4502785Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4502842Z mov.b64 {%r1477, %r1478}, %rd493; 2026-02-21T09:27:30.4502912Z cvt.rn.f16x2.f32 %r1479, %r1478, %r1477; 2026-02-21T09:27:30.4503077Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4503136Z cvt.u64.u32 %rd494, %r778; 2026-02-21T09:27:30.4503200Z cvt.u64.u32 %rd495, %r779; 2026-02-21T09:27:30.4503255Z shl.b64 %rd496, %rd495, 32; 2026-02-21T09:27:30.4503311Z or.b64 %rd497, %rd494, %rd496; 2026-02-21T09:27:30.4503478Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4503536Z mov.b64 {%r1480, %r1481}, %rd497; 2026-02-21T09:27:30.4503599Z cvt.rn.f16x2.f32 %r1482, %r1481, %r1480; 2026-02-21T09:27:30.4503762Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4503825Z cvt.u64.u32 %rd498, %r780; 2026-02-21T09:27:30.4503880Z cvt.u64.u32 %rd499, %r781; 2026-02-21T09:27:30.4503936Z shl.b64 %rd500, %rd499, 
32; 2026-02-21T09:27:30.4503998Z or.b64 %rd501, %rd498, %rd500; 2026-02-21T09:27:30.4504198Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4504259Z mov.b64 {%r1483, %r1484}, %rd501; 2026-02-21T09:27:30.4504329Z cvt.rn.f16x2.f32 %r1485, %r1484, %r1483; 2026-02-21T09:27:30.4504489Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4504544Z cvt.u64.u32 %rd502, %r782; 2026-02-21T09:27:30.4504599Z cvt.u64.u32 %rd503, %r783; 2026-02-21T09:27:30.4504708Z shl.b64 %rd504, %rd503, 32; 2026-02-21T09:27:30.4504766Z or.b64 %rd505, %rd502, %rd504; 2026-02-21T09:27:30.4504922Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4504986Z mov.b64 {%r1486, %r1487}, %rd505; 2026-02-21T09:27:30.4505049Z cvt.rn.f16x2.f32 %r1488, %r1487, %r1486; 2026-02-21T09:27:30.4505205Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4505270Z cvt.u64.u32 %rd506, %r784; 2026-02-21T09:27:30.4505328Z cvt.u64.u32 %rd507, %r785; 2026-02-21T09:27:30.4505387Z shl.b64 %rd508, %rd507, 32; 2026-02-21T09:27:30.4505445Z or.b64 %rd509, %rd506, %rd508; 2026-02-21T09:27:30.4505613Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4505671Z mov.b64 {%r1489, %r1490}, %rd509; 2026-02-21T09:27:30.4505737Z cvt.rn.f16x2.f32 %r1491, %r1490, %r1489; 2026-02-21T09:27:30.4505900Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4505958Z cvt.u64.u32 %rd510, %r786; 2026-02-21T09:27:30.4506013Z cvt.u64.u32 %rd511, %r787; 2026-02-21T09:27:30.4506075Z shl.b64 %rd512, %rd511, 32; 2026-02-21T09:27:30.4506131Z or.b64 %rd513, %rd510, %rd512; 2026-02-21T09:27:30.4506290Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4506347Z mov.b64 {%r1492, %r1493}, %rd513; 2026-02-21T09:27:30.4506420Z 
cvt.rn.f16x2.f32 %r1494, %r1493, %r1492; 2026-02-21T09:27:30.4506643Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4506699Z cvt.u64.u32 %rd514, %r788; 2026-02-21T09:27:30.4506762Z cvt.u64.u32 %rd515, %r789; 2026-02-21T09:27:30.4506817Z shl.b64 %rd516, %rd515, 32; 2026-02-21T09:27:30.4506873Z or.b64 %rd517, %rd514, %rd516; 2026-02-21T09:27:30.4507042Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4507098Z mov.b64 {%r1495, %r1496}, %rd517; 2026-02-21T09:27:30.4507161Z cvt.rn.f16x2.f32 %r1497, %r1496, %r1495; 2026-02-21T09:27:30.4507319Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4507382Z cvt.u64.u32 %rd518, %r790; 2026-02-21T09:27:30.4507437Z cvt.u64.u32 %rd519, %r791; 2026-02-21T09:27:30.4507494Z shl.b64 %rd520, %rd519, 32; 2026-02-21T09:27:30.4507560Z or.b64 %rd521, %rd518, %rd520; 2026-02-21T09:27:30.4507721Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4507779Z mov.b64 {%r1498, %r1499}, %rd521; 2026-02-21T09:27:30.4507849Z cvt.rn.f16x2.f32 %r1500, %r1499, %r1498; 2026-02-21T09:27:30.4508008Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4508065Z cvt.u64.u32 %rd522, %r792; 2026-02-21T09:27:30.4508120Z cvt.u64.u32 %rd523, %r793; 2026-02-21T09:27:30.4508184Z shl.b64 %rd524, %rd523, 32; 2026-02-21T09:27:30.4508240Z or.b64 %rd525, %rd522, %rd524; 2026-02-21T09:27:30.4508399Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4508463Z mov.b64 {%r1501, %r1502}, %rd525; 2026-02-21T09:27:30.4508527Z cvt.rn.f16x2.f32 %r1503, %r1502, %r1501; 2026-02-21T09:27:30.4508736Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4508803Z cvt.u64.u32 %rd526, %r795; 2026-02-21T09:27:30.4508860Z cvt.u64.u32 %rd527, %r796; 
2026-02-21T09:27:30.4508915Z shl.b64 %rd528, %rd527, 32; 2026-02-21T09:27:30.4508971Z or.b64 %rd529, %rd526, %rd528; 2026-02-21T09:27:30.4509139Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4509197Z mov.b64 {%r1504, %r1505}, %rd529; 2026-02-21T09:27:30.4509259Z cvt.rn.f16x2.f32 %r1506, %r1505, %r1504; 2026-02-21T09:27:30.4509425Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4509481Z cvt.u64.u32 %rd530, %r797; 2026-02-21T09:27:30.4509535Z cvt.u64.u32 %rd531, %r798; 2026-02-21T09:27:30.4509597Z shl.b64 %rd532, %rd531, 32; 2026-02-21T09:27:30.4509653Z or.b64 %rd533, %rd530, %rd532; 2026-02-21T09:27:30.4509813Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4509871Z mov.b64 {%r1507, %r1508}, %rd533; 2026-02-21T09:27:30.4509942Z cvt.rn.f16x2.f32 %r1509, %r1508, %r1507; 2026-02-21T09:27:30.4510101Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4510156Z cvt.u64.u32 %rd534, %r799; 2026-02-21T09:27:30.4510219Z cvt.u64.u32 %rd535, %r800; 2026-02-21T09:27:30.4510274Z shl.b64 %rd536, %rd535, 32; 2026-02-21T09:27:30.4510329Z or.b64 %rd537, %rd534, %rd536; 2026-02-21T09:27:30.4510496Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4510553Z mov.b64 {%r1510, %r1511}, %rd537; 2026-02-21T09:27:30.4510615Z cvt.rn.f16x2.f32 %r1512, %r1511, %r1510; 2026-02-21T09:27:30.4510774Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4510837Z cvt.u64.u32 %rd538, %r801; 2026-02-21T09:27:30.4510893Z cvt.u64.u32 %rd539, %r802; 2026-02-21T09:27:30.4510970Z shl.b64 %rd540, %rd539, 32; 2026-02-21T09:27:30.4511055Z or.b64 %rd541, %rd538, %rd540; 2026-02-21T09:27:30.4511217Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4511273Z mov.b64 {%r1513, 
%r1514}, %rd541; 2026-02-21T09:27:30.4511342Z cvt.rn.f16x2.f32 %r1515, %r1514, %r1513; 2026-02-21T09:27:30.4511503Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4511558Z cvt.u64.u32 %rd542, %r803; 2026-02-21T09:27:30.4511613Z cvt.u64.u32 %rd543, %r804; 2026-02-21T09:27:30.4511677Z shl.b64 %rd544, %rd543, 32; 2026-02-21T09:27:30.4511732Z or.b64 %rd545, %rd542, %rd544; 2026-02-21T09:27:30.4511891Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4511955Z mov.b64 {%r1516, %r1517}, %rd545; 2026-02-21T09:27:30.4512018Z cvt.rn.f16x2.f32 %r1518, %r1517, %r1516; 2026-02-21T09:27:30.4512180Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4512243Z cvt.u64.u32 %rd546, %r805; 2026-02-21T09:27:30.4512299Z cvt.u64.u32 %rd547, %r806; 2026-02-21T09:27:30.4512355Z shl.b64 %rd548, %rd547, 32; 2026-02-21T09:27:30.4512410Z or.b64 %rd549, %rd546, %rd548; 2026-02-21T09:27:30.4512577Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4512634Z mov.b64 {%r1519, %r1520}, %rd549; 2026-02-21T09:27:30.4512696Z cvt.rn.f16x2.f32 %r1521, %r1520, %r1519; 2026-02-21T09:27:30.4512866Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4512921Z cvt.u64.u32 %rd550, %r807; 2026-02-21T09:27:30.4512976Z cvt.u64.u32 %rd551, %r808; 2026-02-21T09:27:30.4513064Z shl.b64 %rd552, %rd551, 32; 2026-02-21T09:27:30.4513143Z or.b64 %rd553, %rd550, %rd552; 2026-02-21T09:27:30.4513304Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4513364Z mov.b64 {%r1522, %r1523}, %rd553; 2026-02-21T09:27:30.4513435Z cvt.rn.f16x2.f32 %r1524, %r1523, %r1522; 2026-02-21T09:27:30.4513599Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4513654Z cvt.u64.u32 %rd554, %r809; 
2026-02-21T09:27:30.4513717Z cvt.u64.u32 %rd555, %r810; 2026-02-21T09:27:30.4513773Z shl.b64 %rd556, %rd555, 32; 2026-02-21T09:27:30.4513830Z or.b64 %rd557, %rd554, %rd556; 2026-02-21T09:27:30.4513998Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4514056Z mov.b64 {%r1525, %r1526}, %rd557; 2026-02-21T09:27:30.4514120Z cvt.rn.f16x2.f32 %r1527, %r1526, %r1525; 2026-02-21T09:27:30.4514282Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4514347Z cvt.u64.u32 %rd558, %r812; 2026-02-21T09:27:30.4514413Z cvt.u64.u32 %rd559, %r813; 2026-02-21T09:27:30.4514469Z shl.b64 %rd560, %rd559, 32; 2026-02-21T09:27:30.4514534Z or.b64 %rd561, %rd558, %rd560; 2026-02-21T09:27:30.4514759Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4514820Z mov.b64 {%r1528, %r1529}, %rd561; 2026-02-21T09:27:30.4514889Z cvt.rn.f16x2.f32 %r1530, %r1529, %r1528; 2026-02-21T09:27:30.4515048Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4515103Z cvt.u64.u32 %rd562, %r814; 2026-02-21T09:27:30.4515158Z cvt.u64.u32 %rd563, %r815; 2026-02-21T09:27:30.4515220Z shl.b64 %rd564, %rd563, 32; 2026-02-21T09:27:30.4515276Z or.b64 %rd565, %rd562, %rd564; 2026-02-21T09:27:30.4515437Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4515527Z mov.b64 {%r1531, %r1532}, %rd565; 2026-02-21T09:27:30.4515615Z cvt.rn.f16x2.f32 %r1533, %r1532, %r1531; 2026-02-21T09:27:30.4515776Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4515837Z cvt.u64.u32 %rd566, %r816; 2026-02-21T09:27:30.4515892Z cvt.u64.u32 %rd567, %r817; 2026-02-21T09:27:30.4515947Z shl.b64 %rd568, %rd567, 32; 2026-02-21T09:27:30.4516003Z or.b64 %rd569, %rd566, %rd568; 2026-02-21T09:27:30.4516172Z .loc 1 55 27 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4516228Z mov.b64 {%r1534, %r1535}, %rd569; 2026-02-21T09:27:30.4516291Z cvt.rn.f16x2.f32 %r1536, %r1535, %r1534; 2026-02-21T09:27:30.4516459Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4516516Z cvt.u64.u32 %rd570, %r818; 2026-02-21T09:27:30.4516572Z cvt.u64.u32 %rd571, %r819; 2026-02-21T09:27:30.4516636Z shl.b64 %rd572, %rd571, 32; 2026-02-21T09:27:30.4516693Z or.b64 %rd573, %rd570, %rd572; 2026-02-21T09:27:30.4516853Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4516910Z mov.b64 {%r1537, %r1538}, %rd573; 2026-02-21T09:27:30.4516981Z cvt.rn.f16x2.f32 %r1539, %r1538, %r1537; 2026-02-21T09:27:30.4517145Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4517199Z cvt.u64.u32 %rd574, %r820; 2026-02-21T09:27:30.4517262Z cvt.u64.u32 %rd575, %r821; 2026-02-21T09:27:30.4517318Z shl.b64 %rd576, %rd575, 32; 2026-02-21T09:27:30.4517374Z or.b64 %rd577, %rd574, %rd576; 2026-02-21T09:27:30.4517540Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4517646Z mov.b64 {%r1540, %r1541}, %rd577; 2026-02-21T09:27:30.4517733Z cvt.rn.f16x2.f32 %r1542, %r1541, %r1540; 2026-02-21T09:27:30.4517897Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4517964Z cvt.u64.u32 %rd578, %r822; 2026-02-21T09:27:30.4518020Z cvt.u64.u32 %rd579, %r823; 2026-02-21T09:27:30.4518077Z shl.b64 %rd580, %rd579, 32; 2026-02-21T09:27:30.4518142Z or.b64 %rd581, %rd578, %rd580; 2026-02-21T09:27:30.4518305Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4518361Z mov.b64 {%r1543, %r1544}, %rd581; 2026-02-21T09:27:30.4518432Z cvt.rn.f16x2.f32 %r1545, %r1544, %r1543; 2026-02-21T09:27:30.4518595Z .loc 1 53 52 // 
cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4518652Z cvt.u64.u32 %rd582, %r824; 2026-02-21T09:27:30.4518708Z cvt.u64.u32 %rd583, %r825; 2026-02-21T09:27:30.4518773Z shl.b64 %rd584, %rd583, 32; 2026-02-21T09:27:30.4518830Z or.b64 %rd585, %rd582, %rd584; 2026-02-21T09:27:30.4518990Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4519054Z mov.b64 {%r1546, %r1547}, %rd585; 2026-02-21T09:27:30.4519117Z cvt.rn.f16x2.f32 %r1548, %r1547, %r1546; 2026-02-21T09:27:30.4519281Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4519344Z cvt.u64.u32 %rd586, %r826; 2026-02-21T09:27:30.4519401Z cvt.u64.u32 %rd587, %r827; 2026-02-21T09:27:30.4519457Z shl.b64 %rd588, %rd587, 32; 2026-02-21T09:27:30.4519513Z or.b64 %rd589, %rd586, %rd588; 2026-02-21T09:27:30.4519679Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4519736Z mov.b64 {%r1549, %r1550}, %rd589; 2026-02-21T09:27:30.4519800Z cvt.rn.f16x2.f32 %r1551, %r1550, %r1549; 2026-02-21T09:27:30.4519969Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4520049Z cvt.u64.u32 %rd590, %r829; 2026-02-21T09:27:30.4520123Z cvt.u64.u32 %rd591, %r830; 2026-02-21T09:27:30.4520185Z shl.b64 %rd592, %rd591, 32; 2026-02-21T09:27:30.4520241Z or.b64 %rd593, %rd590, %rd592; 2026-02-21T09:27:30.4520405Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4520461Z mov.b64 {%r1552, %r1553}, %rd593; 2026-02-21T09:27:30.4520529Z cvt.rn.f16x2.f32 %r1554, %r1553, %r1552; 2026-02-21T09:27:30.4520686Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4520743Z cvt.u64.u32 %rd594, %r831; 2026-02-21T09:27:30.4520805Z cvt.u64.u32 %rd595, %r832; 2026-02-21T09:27:30.4520860Z shl.b64 %rd596, %rd595, 32; 2026-02-21T09:27:30.4520916Z or.b64 
%rd597, %rd594, %rd596; 2026-02-21T09:27:30.4521080Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4521138Z mov.b64 {%r1555, %r1556}, %rd597; 2026-02-21T09:27:30.4521202Z cvt.rn.f16x2.f32 %r1557, %r1556, %r1555; 2026-02-21T09:27:30.4521358Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4521420Z cvt.u64.u32 %rd598, %r833; 2026-02-21T09:27:30.4521477Z cvt.u64.u32 %rd599, %r834; 2026-02-21T09:27:30.4521533Z shl.b64 %rd600, %rd599, 32; 2026-02-21T09:27:30.4521596Z or.b64 %rd601, %rd598, %rd600; 2026-02-21T09:27:30.4521749Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4521806Z mov.b64 {%r1558, %r1559}, %rd601; 2026-02-21T09:27:30.4521876Z cvt.rn.f16x2.f32 %r1560, %r1559, %r1558; 2026-02-21T09:27:30.4522034Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4522112Z cvt.u64.u32 %rd602, %r835; 2026-02-21T09:27:30.4522192Z cvt.u64.u32 %rd603, %r836; 2026-02-21T09:27:30.4522259Z shl.b64 %rd604, %rd603, 32; 2026-02-21T09:27:30.4522318Z or.b64 %rd605, %rd602, %rd604; 2026-02-21T09:27:30.4522482Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4522548Z mov.b64 {%r1561, %r1562}, %rd605; 2026-02-21T09:27:30.4522614Z cvt.rn.f16x2.f32 %r1563, %r1562, %r1561; 2026-02-21T09:27:30.4522777Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4522842Z cvt.u64.u32 %rd606, %r837; 2026-02-21T09:27:30.4522898Z cvt.u64.u32 %rd607, %r838; 2026-02-21T09:27:30.4522954Z shl.b64 %rd608, %rd607, 32; 2026-02-21T09:27:30.4523014Z or.b64 %rd609, %rd606, %rd608; 2026-02-21T09:27:30.4523182Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4523239Z mov.b64 {%r1564, %r1565}, %rd609; 2026-02-21T09:27:30.4523304Z cvt.rn.f16x2.f32 %r1566, %r1565, %r1564; 
2026-02-21T09:27:30.4523472Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4523529Z cvt.u64.u32 %rd610, %r839; 2026-02-21T09:27:30.4523584Z cvt.u64.u32 %rd611, %r840; 2026-02-21T09:27:30.4523646Z shl.b64 %rd612, %rd611, 32; 2026-02-21T09:27:30.4523704Z or.b64 %rd613, %rd610, %rd612; 2026-02-21T09:27:30.4523865Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4523922Z mov.b64 {%r1567, %r1568}, %rd613; 2026-02-21T09:27:30.4523992Z cvt.rn.f16x2.f32 %r1569, %r1568, %r1567; 2026-02-21T09:27:30.4524154Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4524210Z cvt.u64.u32 %rd614, %r841; 2026-02-21T09:27:30.4524272Z cvt.u64.u32 %rd615, %r842; 2026-02-21T09:27:30.4524328Z shl.b64 %rd616, %rd615, 32; 2026-02-21T09:27:30.4524387Z or.b64 %rd617, %rd614, %rd616; 2026-02-21T09:27:30.4524554Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4524651Z mov.b64 {%r1570, %r1571}, %rd617; 2026-02-21T09:27:30.4524750Z cvt.rn.f16x2.f32 %r1572, %r1571, %r1570; 2026-02-21T09:27:30.4524912Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4524975Z cvt.u64.u32 %rd618, %r843; 2026-02-21T09:27:30.4525032Z cvt.u64.u32 %rd619, %r844; 2026-02-21T09:27:30.4525087Z shl.b64 %rd620, %rd619, 32; 2026-02-21T09:27:30.4525152Z or.b64 %rd621, %rd618, %rd620; 2026-02-21T09:27:30.4525314Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4525372Z mov.b64 {%r1573, %r1574}, %rd621; 2026-02-21T09:27:30.4525441Z cvt.rn.f16x2.f32 %r1575, %r1574, %r1573; 2026-02-21T09:27:30.4525604Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4525663Z cvt.u64.u32 %rd622, %r846; 2026-02-21T09:27:30.4525719Z cvt.u64.u32 %rd623, %r847; 2026-02-21T09:27:30.4525783Z shl.b64 %rd624, %rd623, 
32; 2026-02-21T09:27:30.4525840Z or.b64 %rd625, %rd622, %rd624; 2026-02-21T09:27:30.4526005Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4526069Z mov.b64 {%r1576, %r1577}, %rd625; 2026-02-21T09:27:30.4526133Z cvt.rn.f16x2.f32 %r1578, %r1577, %r1576; 2026-02-21T09:27:30.4526294Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4526359Z cvt.u64.u32 %rd626, %r848; 2026-02-21T09:27:30.4526414Z cvt.u64.u32 %rd627, %r849; 2026-02-21T09:27:30.4526469Z shl.b64 %rd628, %rd627, 32; 2026-02-21T09:27:30.4526526Z or.b64 %rd629, %rd626, %rd628; 2026-02-21T09:27:30.4526730Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4526812Z mov.b64 {%r1579, %r1580}, %rd629; 2026-02-21T09:27:30.4526881Z cvt.rn.f16x2.f32 %r1581, %r1580, %r1579; 2026-02-21T09:27:30.4527051Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4527107Z cvt.u64.u32 %rd630, %r850; 2026-02-21T09:27:30.4527162Z cvt.u64.u32 %rd631, %r851; 2026-02-21T09:27:30.4527223Z shl.b64 %rd632, %rd631, 32; 2026-02-21T09:27:30.4527280Z or.b64 %rd633, %rd630, %rd632; 2026-02-21T09:27:30.4527441Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4527498Z mov.b64 {%r1582, %r1583}, %rd633; 2026-02-21T09:27:30.4527566Z cvt.rn.f16x2.f32 %r1584, %r1583, %r1582; 2026-02-21T09:27:30.4527731Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4527785Z cvt.u64.u32 %rd634, %r852; 2026-02-21T09:27:30.4527846Z cvt.u64.u32 %rd635, %r853; 2026-02-21T09:27:30.4527902Z shl.b64 %rd636, %rd635, 32; 2026-02-21T09:27:30.4527959Z or.b64 %rd637, %rd634, %rd636; 2026-02-21T09:27:30.4528128Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4528184Z mov.b64 {%r1585, %r1586}, %rd637; 2026-02-21T09:27:30.4528246Z 
cvt.rn.f16x2.f32 %r1587, %r1586, %r1585; 2026-02-21T09:27:30.4528409Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4528472Z cvt.u64.u32 %rd638, %r854; 2026-02-21T09:27:30.4528527Z cvt.u64.u32 %rd639, %r855; 2026-02-21T09:27:30.4528581Z shl.b64 %rd640, %rd639, 32; 2026-02-21T09:27:30.4528643Z or.b64 %rd641, %rd638, %rd640; 2026-02-21T09:27:30.4528809Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4528865Z mov.b64 {%r1588, %r1589}, %rd641; 2026-02-21T09:27:30.4528934Z cvt.rn.f16x2.f32 %r1590, %r1589, %r1588; 2026-02-21T09:27:30.4529100Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4529205Z cvt.u64.u32 %rd642, %r856; 2026-02-21T09:27:30.4529260Z cvt.u64.u32 %rd643, %r857; 2026-02-21T09:27:30.4529324Z shl.b64 %rd644, %rd643, 32; 2026-02-21T09:27:30.4529380Z or.b64 %rd645, %rd642, %rd644; 2026-02-21T09:27:30.4529540Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4529603Z mov.b64 {%r1591, %r1592}, %rd645; 2026-02-21T09:27:30.4529666Z cvt.rn.f16x2.f32 %r1593, %r1592, %r1591; 2026-02-21T09:27:30.4529829Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4529891Z cvt.u64.u32 %rd646, %r858; 2026-02-21T09:27:30.4529946Z cvt.u64.u32 %rd647, %r859; 2026-02-21T09:27:30.4530002Z shl.b64 %rd648, %rd647, 32; 2026-02-21T09:27:30.4530059Z or.b64 %rd649, %rd646, %rd648; 2026-02-21T09:27:30.4530230Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4530290Z mov.b64 {%r1594, %r1595}, %rd649; 2026-02-21T09:27:30.4530353Z cvt.rn.f16x2.f32 %r1596, %r1595, %r1594; 2026-02-21T09:27:30.4530521Z .loc 1 53 52 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:53:52 2026-02-21T09:27:30.4530578Z cvt.u64.u32 %rd650, %r860; 2026-02-21T09:27:30.4530633Z cvt.u64.u32 %rd651, %r861; 
2026-02-21T09:27:30.4530697Z shl.b64 %rd652, %rd651, 32; 2026-02-21T09:27:30.4530753Z or.b64 %rd653, %rd650, %rd652; 2026-02-21T09:27:30.4530912Z .loc 1 55 27 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:55:27 2026-02-21T09:27:30.4530969Z mov.b64 {%r1597, %r1598}, %rd653; 2026-02-21T09:27:30.4531040Z cvt.rn.f16x2.f32 %r1599, %r1598, %r1597; 2026-02-21T09:27:30.4531221Z .loc 1 56 82 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:56:82 2026-02-21T09:27:30.4531343Z st.shared.v4.b32 [%r11], {%r1218, %r1230, %r1242, %r1254}; 2026-02-21T09:27:30.4531449Z st.shared.v4.b32 [%r12], {%r1266, %r1278, %r1290, %r1302}; 2026-02-21T09:27:30.4531539Z st.shared.v4.b32 [%r13], {%r1314, %r1326, %r1338, %r1350}; 2026-02-21T09:27:30.4531626Z st.shared.v4.b32 [%r14], {%r1362, %r1374, %r1386, %r1398}; 2026-02-21T09:27:30.4531722Z st.shared.v4.b32 [%r15], {%r1410, %r1422, %r1434, %r1446}; 2026-02-21T09:27:30.4531810Z st.shared.v4.b32 [%r16], {%r1458, %r1470, %r1482, %r1494}; 2026-02-21T09:27:30.4531895Z st.shared.v4.b32 [%r17], {%r1506, %r1518, %r1530, %r1542}; 2026-02-21T09:27:30.4531980Z st.shared.v4.b32 [%r18], {%r1554, %r1566, %r1578, %r1590}; 2026-02-21T09:27:30.4532043Z bar.sync 0; 2026-02-21T09:27:30.4532100Z // begin inline asm 2026-02-21T09:27:30.4532256Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1023, %r1027, %r1031, %r1035}, [%r867]; 2026-02-21T09:27:30.4532319Z // end inline asm 2026-02-21T09:27:30.4532377Z // begin inline asm 2026-02-21T09:27:30.4532527Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1039, %r1043, %r1047, %r1051}, [%r872]; 2026-02-21T09:27:30.4532590Z // end inline asm 2026-02-21T09:27:30.4532644Z // begin inline asm 2026-02-21T09:27:30.4532789Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1055, %r1059, %r1063, %r1067}, [%r877]; 2026-02-21T09:27:30.4532841Z // end inline asm 2026-02-21T09:27:30.4532902Z // begin inline asm 2026-02-21T09:27:30.4533044Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1071, %r1075, %r1079, %r1083}, [%r882]; 
2026-02-21T09:27:30.4533097Z // end inline asm 2026-02-21T09:27:30.4533158Z // begin inline asm 2026-02-21T09:27:30.4533299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1087, %r1091, %r1095, %r1099}, [%r887]; 2026-02-21T09:27:30.4533351Z // end inline asm 2026-02-21T09:27:30.4533405Z // begin inline asm 2026-02-21T09:27:30.4533552Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1103, %r1107, %r1111, %r1115}, [%r892]; 2026-02-21T09:27:30.4533605Z // end inline asm 2026-02-21T09:27:30.4533660Z // begin inline asm 2026-02-21T09:27:30.4533830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1119, %r1123, %r1127, %r1131}, [%r897]; 2026-02-21T09:27:30.4533918Z // end inline asm 2026-02-21T09:27:30.4533974Z // begin inline asm 2026-02-21T09:27:30.4534142Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1135, %r1139, %r1143, %r1147}, [%r902]; 2026-02-21T09:27:30.4534197Z // end inline asm 2026-02-21T09:27:30.4534253Z bar.sync 0; 2026-02-21T09:27:30.4534347Z st.shared.v4.b32 [%r11], {%r1221, %r1233, %r1245, %r1257}; 2026-02-21T09:27:30.4534447Z st.shared.v4.b32 [%r12], {%r1269, %r1281, %r1293, %r1305}; 2026-02-21T09:27:30.4534539Z st.shared.v4.b32 [%r13], {%r1317, %r1329, %r1341, %r1353}; 2026-02-21T09:27:30.4534631Z st.shared.v4.b32 [%r14], {%r1365, %r1377, %r1389, %r1401}; 2026-02-21T09:27:30.4534769Z st.shared.v4.b32 [%r15], {%r1413, %r1425, %r1437, %r1449}; 2026-02-21T09:27:30.4534861Z st.shared.v4.b32 [%r16], {%r1461, %r1473, %r1485, %r1497}; 2026-02-21T09:27:30.4534952Z st.shared.v4.b32 [%r17], {%r1509, %r1521, %r1533, %r1545}; 2026-02-21T09:27:30.4535051Z st.shared.v4.b32 [%r18], {%r1557, %r1569, %r1581, %r1593}; 2026-02-21T09:27:30.4535105Z bar.sync 0; 2026-02-21T09:27:30.4535162Z // begin inline asm 2026-02-21T09:27:30.4535314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1028, %r1032, %r1036}, [%r867]; 2026-02-21T09:27:30.4535377Z // end inline asm 2026-02-21T09:27:30.4535432Z // begin inline asm 2026-02-21T09:27:30.4535580Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r1040, %r1044, %r1048, %r1052}, [%r872]; 2026-02-21T09:27:30.4535644Z // end inline asm 2026-02-21T09:27:30.4535700Z // begin inline asm 2026-02-21T09:27:30.4535847Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1056, %r1060, %r1064, %r1068}, [%r877]; 2026-02-21T09:27:30.4535910Z // end inline asm 2026-02-21T09:27:30.4535965Z // begin inline asm 2026-02-21T09:27:30.4536146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1072, %r1076, %r1080, %r1084}, [%r882]; 2026-02-21T09:27:30.4536232Z // end inline asm 2026-02-21T09:27:30.4536299Z // begin inline asm 2026-02-21T09:27:30.4536450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1088, %r1092, %r1096, %r1100}, [%r887]; 2026-02-21T09:27:30.4536505Z // end inline asm 2026-02-21T09:27:30.4536565Z // begin inline asm 2026-02-21T09:27:30.4536710Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1104, %r1108, %r1112, %r1116}, [%r892]; 2026-02-21T09:27:30.4536765Z // end inline asm 2026-02-21T09:27:30.4536820Z // begin inline asm 2026-02-21T09:27:30.4536976Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1120, %r1124, %r1128, %r1132}, [%r897]; 2026-02-21T09:27:30.4537030Z // end inline asm 2026-02-21T09:27:30.4537087Z // begin inline asm 2026-02-21T09:27:30.4537241Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1136, %r1140, %r1144, %r1148}, [%r902]; 2026-02-21T09:27:30.4537295Z // end inline asm 2026-02-21T09:27:30.4537349Z bar.sync 0; 2026-02-21T09:27:30.4537448Z st.shared.v4.b32 [%r11], {%r1224, %r1236, %r1248, %r1260}; 2026-02-21T09:27:30.4537542Z st.shared.v4.b32 [%r12], {%r1272, %r1284, %r1296, %r1308}; 2026-02-21T09:27:30.4537637Z st.shared.v4.b32 [%r13], {%r1320, %r1332, %r1344, %r1356}; 2026-02-21T09:27:30.4537728Z st.shared.v4.b32 [%r14], {%r1368, %r1380, %r1392, %r1404}; 2026-02-21T09:27:30.4537825Z st.shared.v4.b32 [%r15], {%r1416, %r1428, %r1440, %r1452}; 2026-02-21T09:27:30.4537915Z st.shared.v4.b32 [%r16], {%r1464, %r1476, %r1488, %r1500}; 2026-02-21T09:27:30.4538005Z st.shared.v4.b32 [%r17], {%r1512, %r1524, %r1536, 
%r1548}; 2026-02-21T09:27:30.4538104Z st.shared.v4.b32 [%r18], {%r1560, %r1572, %r1584, %r1596}; 2026-02-21T09:27:30.4538158Z bar.sync 0; 2026-02-21T09:27:30.4538215Z // begin inline asm 2026-02-21T09:27:30.4538372Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r867]; 2026-02-21T09:27:30.4538427Z // end inline asm 2026-02-21T09:27:30.4538483Z // begin inline asm 2026-02-21T09:27:30.4538633Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r872]; 2026-02-21T09:27:30.4538696Z // end inline asm 2026-02-21T09:27:30.4538779Z // begin inline asm 2026-02-21T09:27:30.4538952Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r877]; 2026-02-21T09:27:30.4539013Z // end inline asm 2026-02-21T09:27:30.4539068Z // begin inline asm 2026-02-21T09:27:30.4539215Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r882]; 2026-02-21T09:27:30.4539269Z // end inline asm 2026-02-21T09:27:30.4539333Z // begin inline asm 2026-02-21T09:27:30.4539480Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r887]; 2026-02-21T09:27:30.4539533Z // end inline asm 2026-02-21T09:27:30.4539596Z // begin inline asm 2026-02-21T09:27:30.4539743Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1105, %r1109, %r1113, %r1117}, [%r892]; 2026-02-21T09:27:30.4539798Z // end inline asm 2026-02-21T09:27:30.4539860Z // begin inline asm 2026-02-21T09:27:30.4540010Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1121, %r1125, %r1129, %r1133}, [%r897]; 2026-02-21T09:27:30.4540066Z // end inline asm 2026-02-21T09:27:30.4540123Z // begin inline asm 2026-02-21T09:27:30.4540279Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1137, %r1141, %r1145, %r1149}, [%r902]; 2026-02-21T09:27:30.4540334Z // end inline asm 2026-02-21T09:27:30.4540389Z bar.sync 0; 2026-02-21T09:27:30.4540491Z st.shared.v4.b32 [%r11], {%r1227, %r1239, %r1251, %r1263}; 2026-02-21T09:27:30.4540584Z st.shared.v4.b32 [%r12], 
{%r1275, %r1287, %r1299, %r1311}; 2026-02-21T09:27:30.4540678Z st.shared.v4.b32 [%r13], {%r1323, %r1335, %r1347, %r1359}; 2026-02-21T09:27:30.4540776Z st.shared.v4.b32 [%r14], {%r1371, %r1383, %r1395, %r1407}; 2026-02-21T09:27:30.4540867Z st.shared.v4.b32 [%r15], {%r1419, %r1431, %r1443, %r1455}; 2026-02-21T09:27:30.4540955Z st.shared.v4.b32 [%r16], {%r1467, %r1479, %r1491, %r1503}; 2026-02-21T09:27:30.4541045Z st.shared.v4.b32 [%r17], {%r1515, %r1527, %r1539, %r1551}; 2026-02-21T09:27:30.4541181Z st.shared.v4.b32 [%r18], {%r1563, %r1575, %r1587, %r1599}; 2026-02-21T09:27:30.4541239Z bar.sync 0; 2026-02-21T09:27:30.4541297Z // begin inline asm 2026-02-21T09:27:30.4541454Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r867]; 2026-02-21T09:27:30.4541508Z // end inline asm 2026-02-21T09:27:30.4541563Z // begin inline asm 2026-02-21T09:27:30.4541712Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r872]; 2026-02-21T09:27:30.4541776Z // end inline asm 2026-02-21T09:27:30.4541830Z // begin inline asm 2026-02-21T09:27:30.4541978Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r877]; 2026-02-21T09:27:30.4542041Z // end inline asm 2026-02-21T09:27:30.4542096Z // begin inline asm 2026-02-21T09:27:30.4542243Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r882]; 2026-02-21T09:27:30.4542306Z // end inline asm 2026-02-21T09:27:30.4542362Z // begin inline asm 2026-02-21T09:27:30.4542511Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r887]; 2026-02-21T09:27:30.4542568Z // end inline asm 2026-02-21T09:27:30.4542632Z // begin inline asm 2026-02-21T09:27:30.4542779Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r892]; 2026-02-21T09:27:30.4542833Z // end inline asm 2026-02-21T09:27:30.4542896Z // begin inline asm 2026-02-21T09:27:30.4543040Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1122, %r1126, 
%r1130, %r1134}, [%r897]; 2026-02-21T09:27:30.4543094Z // end inline asm 2026-02-21T09:27:30.4543157Z // begin inline asm 2026-02-21T09:27:30.4543303Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1138, %r1142, %r1146, %r1150}, [%r902]; 2026-02-21T09:27:30.4543358Z // end inline asm 2026-02-21T09:27:30.4543414Z // begin inline asm 2026-02-21T09:27:30.4543547Z st.global.v4.b32 [ %rd110 + 0 ], { %r1023, %r1024, %r1025, %r1026 }; 2026-02-21T09:27:30.4543600Z // end inline asm 2026-02-21T09:27:30.4543655Z // begin inline asm 2026-02-21T09:27:30.4543765Z st.global.v4.b32 [ %rd111 + 0 ], { %r1027, %r1028, %r1029, %r1030 }; 2026-02-21T09:27:30.4543866Z // end inline asm 2026-02-21T09:27:30.4543918Z // begin inline asm 2026-02-21T09:27:30.4544017Z st.global.v4.b32 [ %rd112 + 0 ], { %r1031, %r1032, %r1033, %r1034 }; 2026-02-21T09:27:30.4544079Z // end inline asm 2026-02-21T09:27:30.4544133Z // begin inline asm 2026-02-21T09:27:30.4544234Z st.global.v4.b32 [ %rd113 + 0 ], { %r1035, %r1036, %r1037, %r1038 }; 2026-02-21T09:27:30.4544294Z // end inline asm 2026-02-21T09:27:30.4544346Z // begin inline asm 2026-02-21T09:27:30.4544442Z st.global.v4.b32 [ %rd114 + 0 ], { %r1039, %r1040, %r1041, %r1042 }; 2026-02-21T09:27:30.4544501Z // end inline asm 2026-02-21T09:27:30.4544553Z // begin inline asm 2026-02-21T09:27:30.4544649Z st.global.v4.b32 [ %rd115 + 0 ], { %r1043, %r1044, %r1045, %r1046 }; 2026-02-21T09:27:30.4544727Z // end inline asm 2026-02-21T09:27:30.4544789Z // begin inline asm 2026-02-21T09:27:30.4544886Z st.global.v4.b32 [ %rd116 + 0 ], { %r1047, %r1048, %r1049, %r1050 }; 2026-02-21T09:27:30.4544939Z // end inline asm 2026-02-21T09:27:30.4545002Z // begin inline asm 2026-02-21T09:27:30.4545096Z st.global.v4.b32 [ %rd117 + 0 ], { %r1051, %r1052, %r1053, %r1054 }; 2026-02-21T09:27:30.4545148Z // end inline asm 2026-02-21T09:27:30.4545200Z // begin inline asm 2026-02-21T09:27:30.4545302Z st.global.v4.b32 [ %rd118 + 0 ], { %r1055, %r1056, %r1057, %r1058 }; 
2026-02-21T09:27:30.4545354Z // end inline asm 2026-02-21T09:27:30.4545405Z // begin inline asm 2026-02-21T09:27:30.4545507Z st.global.v4.b32 [ %rd119 + 0 ], { %r1059, %r1060, %r1061, %r1062 }; 2026-02-21T09:27:30.4545559Z // end inline asm 2026-02-21T09:27:30.4545611Z // begin inline asm 2026-02-21T09:27:30.4545706Z st.global.v4.b32 [ %rd120 + 0 ], { %r1063, %r1064, %r1065, %r1066 }; 2026-02-21T09:27:30.4545765Z // end inline asm 2026-02-21T09:27:30.4545817Z // begin inline asm 2026-02-21T09:27:30.4545937Z st.global.v4.b32 [ %rd121 + 0 ], { %r1067, %r1068, %r1069, %r1070 }; 2026-02-21T09:27:30.4546023Z // end inline asm 2026-02-21T09:27:30.4546079Z // begin inline asm 2026-02-21T09:27:30.4546175Z st.global.v4.b32 [ %rd122 + 0 ], { %r1071, %r1072, %r1073, %r1074 }; 2026-02-21T09:27:30.4546234Z // end inline asm 2026-02-21T09:27:30.4546287Z // begin inline asm 2026-02-21T09:27:30.4546379Z st.global.v4.b32 [ %rd123 + 0 ], { %r1075, %r1076, %r1077, %r1078 }; 2026-02-21T09:27:30.4546430Z // end inline asm 2026-02-21T09:27:30.4546490Z // begin inline asm 2026-02-21T09:27:30.4546585Z st.global.v4.b32 [ %rd124 + 0 ], { %r1079, %r1080, %r1081, %r1082 }; 2026-02-21T09:27:30.4546636Z // end inline asm 2026-02-21T09:27:30.4546694Z // begin inline asm 2026-02-21T09:27:30.4546788Z st.global.v4.b32 [ %rd125 + 0 ], { %r1083, %r1084, %r1085, %r1086 }; 2026-02-21T09:27:30.4546841Z // end inline asm 2026-02-21T09:27:30.4546893Z // begin inline asm 2026-02-21T09:27:30.4546996Z st.global.v4.b32 [ %rd126 + 0 ], { %r1087, %r1088, %r1089, %r1090 }; 2026-02-21T09:27:30.4547050Z // end inline asm 2026-02-21T09:27:30.4547104Z // begin inline asm 2026-02-21T09:27:30.4547207Z st.global.v4.b32 [ %rd127 + 0 ], { %r1091, %r1092, %r1093, %r1094 }; 2026-02-21T09:27:30.4547262Z // end inline asm 2026-02-21T09:27:30.4547314Z // begin inline asm 2026-02-21T09:27:30.4547416Z st.global.v4.b32 [ %rd128 + 0 ], { %r1095, %r1096, %r1097, %r1098 }; 2026-02-21T09:27:30.4547469Z // end inline asm 
2026-02-21T09:27:30.4547521Z // begin inline asm 2026-02-21T09:27:30.4547615Z st.global.v4.b32 [ %rd129 + 0 ], { %r1099, %r1100, %r1101, %r1102 }; 2026-02-21T09:27:30.4547676Z // end inline asm 2026-02-21T09:27:30.4547729Z // begin inline asm 2026-02-21T09:27:30.4547823Z st.global.v4.b32 [ %rd130 + 0 ], { %r1103, %r1104, %r1105, %r1106 }; 2026-02-21T09:27:30.4547883Z // end inline asm 2026-02-21T09:27:30.4547936Z // begin inline asm 2026-02-21T09:27:30.4548030Z st.global.v4.b32 [ %rd131 + 0 ], { %r1107, %r1108, %r1109, %r1110 }; 2026-02-21T09:27:30.4548086Z // end inline asm 2026-02-21T09:27:30.4548150Z // begin inline asm 2026-02-21T09:27:30.4548283Z st.global.v4.b32 [ %rd132 + 0 ], { %r1111, %r1112, %r1113, %r1114 }; 2026-02-21T09:27:30.4548360Z // end inline asm 2026-02-21T09:27:30.4548421Z // begin inline asm 2026-02-21T09:27:30.4548516Z st.global.v4.b32 [ %rd133 + 0 ], { %r1115, %r1116, %r1117, %r1118 }; 2026-02-21T09:27:30.4548569Z // end inline asm 2026-02-21T09:27:30.4548627Z // begin inline asm 2026-02-21T09:27:30.4548719Z st.global.v4.b32 [ %rd134 + 0 ], { %r1119, %r1120, %r1121, %r1122 }; 2026-02-21T09:27:30.4548772Z // end inline asm 2026-02-21T09:27:30.4548825Z // begin inline asm 2026-02-21T09:27:30.4548927Z st.global.v4.b32 [ %rd135 + 0 ], { %r1123, %r1124, %r1125, %r1126 }; 2026-02-21T09:27:30.4548979Z // end inline asm 2026-02-21T09:27:30.4549031Z // begin inline asm 2026-02-21T09:27:30.4549131Z st.global.v4.b32 [ %rd136 + 0 ], { %r1127, %r1128, %r1129, %r1130 }; 2026-02-21T09:27:30.4549182Z // end inline asm 2026-02-21T09:27:30.4549237Z // begin inline asm 2026-02-21T09:27:30.4549331Z st.global.v4.b32 [ %rd137 + 0 ], { %r1131, %r1132, %r1133, %r1134 }; 2026-02-21T09:27:30.4549392Z // end inline asm 2026-02-21T09:27:30.4549445Z // begin inline asm 2026-02-21T09:27:30.4549537Z st.global.v4.b32 [ %rd138 + 0 ], { %r1135, %r1136, %r1137, %r1138 }; 2026-02-21T09:27:30.4549595Z // end inline asm 2026-02-21T09:27:30.4549648Z // begin inline asm 
2026-02-21T09:27:30.4549741Z st.global.v4.b32 [ %rd139 + 0 ], { %r1139, %r1140, %r1141, %r1142 }; 2026-02-21T09:27:30.4549799Z // end inline asm 2026-02-21T09:27:30.4549852Z // begin inline asm 2026-02-21T09:27:30.4549944Z st.global.v4.b32 [ %rd140 + 0 ], { %r1143, %r1144, %r1145, %r1146 }; 2026-02-21T09:27:30.4549996Z // end inline asm 2026-02-21T09:27:30.4550055Z // begin inline asm 2026-02-21T09:27:30.4550147Z st.global.v4.b32 [ %rd141 + 0 ], { %r1147, %r1148, %r1149, %r1150 }; 2026-02-21T09:27:30.4550198Z // end inline asm 2026-02-21T09:27:30.4550305Z $L__BB0_8: // %._crit_edge 2026-02-21T09:27:30.4550504Z .loc 1 27 4 // cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py:27:4 2026-02-21T09:27:30.4550561Z bar.sync 0; 2026-02-21T09:27:30.4550614Z // begin inline asm 2026-02-21T09:27:30.4550741Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1600, 512; 2026-02-21T09:27:30.4550795Z // end inline asm 2026-02-21T09:27:30.4550845Z ret; 2026-02-21T09:27:30.4550907Z $L__tmp0: 2026-02-21T09:27:30.4550961Z $L__func_end0: 2026-02-21T09:27:30.4551041Z // -- End function 2026-02-21T09:27:30.4551099Z } 2026-02-21T09:27:30.4551299Z .file 1 "/tmp/torchinductor_root/ux/cuxdywlcovhrwi7fxv56djvse343pge5g2a7ifpwxlmgfg4rxro2.py" 2026-02-21T09:27:30.4551360Z .section .debug_abbrev 2026-02-21T09:27:30.4551408Z { 2026-02-21T09:27:30.4551497Z .b8 1 // Abbreviation Code 2026-02-21T09:27:30.4551582Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:27:30.4551661Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:27:30.4551747Z .b8 37 // DW_AT_producer 2026-02-21T09:27:30.4551821Z .b8 8 // DW_FORM_string 2026-02-21T09:27:30.4551892Z .b8 19 // DW_AT_language 2026-02-21T09:27:30.4551972Z .b8 5 // DW_FORM_data2 2026-02-21T09:27:30.4552044Z .b8 3 // DW_AT_name 2026-02-21T09:27:30.4552114Z .b8 8 // DW_FORM_string 2026-02-21T09:27:30.4552188Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:27:30.4552268Z .b8 6 // DW_FORM_data4 2026-02-21T09:27:30.4552337Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:27:30.4552408Z 
.b8 8 // DW_FORM_string 2026-02-21T09:27:30.4552483Z .b8 0 // EOM(1) 2026-02-21T09:27:30.4552553Z .b8 0 // EOM(2) 2026-02-21T09:27:30.4552658Z .b8 0 // EOM(3) 2026-02-21T09:27:30.4552731Z } 2026-02-21T09:27:30.4552789Z .section .debug_info 2026-02-21T09:27:30.4552836Z { 2026-02-21T09:27:30.4552916Z .b32 104 // Length of Unit 2026-02-21T09:27:30.4553006Z .b8 2 // DWARF version number 2026-02-21T09:27:30.4553055Z .b8 0 2026-02-21T09:27:30.4553164Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:27:30.4553257Z .b8 8 // Address Size (in bytes) 2026-02-21T09:27:30.4553352Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:27:30.4553427Z .b8 116 // DW_AT_producer 2026-02-21T09:27:30.4553478Z .b8 114 2026-02-21T09:27:30.4553537Z .b8 105 2026-02-21T09:27:30.4553585Z .b8 116 2026-02-21T09:27:30.4553634Z .b8 111 2026-02-21T09:27:30.4553691Z .b8 110 2026-02-21T09:27:30.4553741Z .b8 0 2026-02-21T09:27:30.4553814Z .b8 2 // DW_AT_language 2026-02-21T09:27:30.4553862Z .b8 0 2026-02-21T09:27:30.4553940Z .b8 99 // DW_AT_name 2026-02-21T09:27:30.4553989Z .b8 117 2026-02-21T09:27:30.4554037Z .b8 120 2026-02-21T09:27:30.4554092Z .b8 100 2026-02-21T09:27:30.4554141Z .b8 121 2026-02-21T09:27:30.4554190Z .b8 119 2026-02-21T09:27:30.4554238Z .b8 108 2026-02-21T09:27:30.4554295Z .b8 99 2026-02-21T09:27:30.4554344Z .b8 111 2026-02-21T09:27:30.4554393Z .b8 118 2026-02-21T09:27:30.4554448Z .b8 104 2026-02-21T09:27:30.4554496Z .b8 114 2026-02-21T09:27:30.4554544Z .b8 119 2026-02-21T09:27:30.4554593Z .b8 105 2026-02-21T09:27:30.4554649Z .b8 55 2026-02-21T09:27:30.4554731Z .b8 102 2026-02-21T09:27:30.4554782Z .b8 120 2026-02-21T09:27:30.4554842Z .b8 118 2026-02-21T09:27:30.4554891Z .b8 53 2026-02-21T09:27:30.4554940Z .b8 54 2026-02-21T09:27:30.4554989Z .b8 100 2026-02-21T09:27:30.4555082Z .b8 106 2026-02-21T09:27:30.4555157Z .b8 118 2026-02-21T09:27:30.4555209Z .b8 115 2026-02-21T09:27:30.4555260Z .b8 101 2026-02-21T09:27:30.4555319Z .b8 51 
2026-02-21T09:27:30.4555368Z .b8 52 2026-02-21T09:27:30.4555416Z .b8 51 2026-02-21T09:27:30.4555470Z .b8 112 2026-02-21T09:27:30.4555518Z .b8 103 2026-02-21T09:27:30.4555566Z .b8 101 2026-02-21T09:27:30.4555613Z .b8 53 2026-02-21T09:27:30.4555670Z .b8 103 2026-02-21T09:27:30.4555718Z .b8 50 2026-02-21T09:27:30.4555766Z .b8 97 2026-02-21T09:27:30.4555822Z .b8 55 2026-02-21T09:27:30.4555871Z .b8 105 2026-02-21T09:27:30.4555918Z .b8 102 2026-02-21T09:27:30.4555965Z .b8 112 2026-02-21T09:27:30.4556022Z .b8 119 2026-02-21T09:27:30.4556069Z .b8 120 2026-02-21T09:27:30.4556116Z .b8 108 2026-02-21T09:27:30.4556165Z .b8 109 2026-02-21T09:27:30.4556220Z .b8 103 2026-02-21T09:27:30.4556267Z .b8 102 2026-02-21T09:27:30.4556316Z .b8 103 2026-02-21T09:27:30.4556374Z .b8 52 2026-02-21T09:27:30.4556422Z .b8 114 2026-02-21T09:27:30.4556470Z .b8 120 2026-02-21T09:27:30.4556521Z .b8 114 2026-02-21T09:27:30.4556579Z .b8 111 2026-02-21T09:27:30.4556628Z .b8 50 2026-02-21T09:27:30.4556678Z .b8 46 2026-02-21T09:27:30.4556736Z .b8 112 2026-02-21T09:27:30.4556786Z .b8 121 2026-02-21T09:27:30.4556836Z .b8 0 2026-02-21T09:27:30.4556926Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:27:30.4557007Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:27:30.4557056Z .b8 116 2026-02-21T09:27:30.4557104Z .b8 109 2026-02-21T09:27:30.4557160Z .b8 112 2026-02-21T09:27:30.4557208Z .b8 47 2026-02-21T09:27:30.4557255Z .b8 116 2026-02-21T09:27:30.4557303Z .b8 111 2026-02-21T09:27:30.4557359Z .b8 114 2026-02-21T09:27:30.4557406Z .b8 99 2026-02-21T09:27:30.4557454Z .b8 104 2026-02-21T09:27:30.4557509Z .b8 105 2026-02-21T09:27:30.4557557Z .b8 110 2026-02-21T09:27:30.4557605Z .b8 100 2026-02-21T09:27:30.4557653Z .b8 117 2026-02-21T09:27:30.4557708Z .b8 99 2026-02-21T09:27:30.4557754Z .b8 116 2026-02-21T09:27:30.4557801Z .b8 111 2026-02-21T09:27:30.4557850Z .b8 114 2026-02-21T09:27:30.4557907Z .b8 95 2026-02-21T09:27:30.4557983Z .b8 114 2026-02-21T09:27:30.4558031Z .b8 111 2026-02-21T09:27:30.4558110Z .b8 111 
2026-02-21T09:27:30.4558158Z .b8 116 2026-02-21T09:27:30.4558206Z .b8 47 2026-02-21T09:27:30.4558254Z .b8 117 2026-02-21T09:27:30.4558307Z .b8 120 2026-02-21T09:27:30.4558355Z .b8 0 2026-02-21T09:27:30.4558402Z } 2026-02-21T09:27:30.4558472Z .section .debug_macinfo { } 2026-02-21T09:27:30.4558477Z 2026-02-21T09:27:30.4558551Z ================================================================ 2026-02-21T09:27:30.4558650Z please share the reproducer above with Triton project. 2026-02-21T09:27:32.6865511Z 2026-02-21T09:27:32.6871196Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 94/94 18.1 configs/s 2026-02-21T09:27:37.3579612Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 233.8 2026-02-21T09:27:37.3579971Z configs/s 2026-02-21T09:27:37.5495119Z [156s] Generation 6 complete: 2026-02-21T09:27:37.5496923Z error=24 2026-02-21T09:27:37.5497141Z ok=74 2026-02-21T09:27:37.5502050Z min=0.0706 2026-02-21T09:27:37.5507159Z mid=0.0962 2026-02-21T09:27:37.5509408Z max=25.2867 2026-02-21T09:27:37.5513408Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:27:37.5514982Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:27:37.5515248Z 'l2_groupings': [16], 2026-02-21T09:27:37.5515434Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:27:37.5515646Z 'loop_orders': [[1, 0]], 2026-02-21T09:27:37.5515809Z 'num_stages': 3, 2026-02-21T09:27:37.5515966Z 'num_warps': 8, 2026-02-21T09:27:37.5516110Z 'pid_type': 'flat', 2026-02-21T09:27:37.5516272Z 'range_flattens': [None, False], 2026-02-21T09:27:37.5516447Z 'range_multi_buffers': [None, True], 2026-02-21T09:27:37.5516633Z 'range_num_stages': [0, 0], 2026-02-21T09:27:37.5516797Z 'range_unroll_factors': [0, 0], 2026-02-21T09:27:37.5517401Z 'range_warp_specializes': [None, None]} 2026-02-21T09:27:37.5517701Z [156s] Fitting surrogate: 642 points, 642 targets 2026-02-21T09:27:38.9017345Z [157s] Generation 7 starting: 79 neighbors, 5 active search path(s) 2026-02-21T09:28:13.2205812Z 
[191s] Timeout after 30s compiling Config(block_sizes=[512, 256, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=16, num_stages=6, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:28:13.7990641Z [192s] Timeout after 30s compiling Config(block_sizes=[256, 512, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:28:13.8015650Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 0.4 configs/s 2026-02-21T09:28:18.2003158Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 19.3 configs/s 2026-02-21T09:28:21.8059831Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 282.2 2026-02-21T09:28:21.8063601Z configs/s 2026-02-21T09:28:22.0685667Z [200s] Generation 7 complete: 2026-02-21T09:28:22.0690593Z error=27 2026-02-21T09:28:22.0690871Z timeout=2 2026-02-21T09:28:22.0691031Z ok=55 2026-02-21T09:28:22.0691214Z min=0.0707 2026-02-21T09:28:22.0691374Z mid=0.0952 2026-02-21T09:28:22.0691563Z max=25.4690 2026-02-21T09:28:22.0691748Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:28:22.0692095Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:28:22.0692416Z 'l2_groupings': [16], 2026-02-21T09:28:22.0692650Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:28:22.0693475Z 'loop_orders': [[1, 0]], 2026-02-21T09:28:22.0693685Z 'num_stages': 3, 2026-02-21T09:28:22.0693881Z 'num_warps': 8, 
2026-02-21T09:28:22.0694074Z 'pid_type': 'flat', 2026-02-21T09:28:22.0694307Z 'range_flattens': [None, False], 2026-02-21T09:28:22.0694549Z 'range_multi_buffers': [None, True], 2026-02-21T09:28:22.0696410Z 'range_num_stages': [0, 0], 2026-02-21T09:28:22.0696613Z 'range_unroll_factors': [0, 0], 2026-02-21T09:28:22.0696841Z 'range_warp_specializes': [None, None]} 2026-02-21T09:28:22.0736282Z [200s] Fitting surrogate: 726 points, 726 targets 2026-02-21T09:28:23.0968215Z [201s] Generation 8 starting: 32 neighbors, 2 active search path(s) 2026-02-21T09:28:30.9436679Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/32 2.0 configs/s 2026-02-21T09:28:31.5615852Z 2026-02-21T09:28:31.5615883Z 2026-02-21T09:28:31.5616730Z ================================================================ 2026-02-21T09:28:31.5617336Z Internal Triton PTX codegen error 2026-02-21T09:28:31.5617703Z `ptxas` stderr: 2026-02-21T09:28:31.5618544Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 253 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:28:31.5619503Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:28:31.5619792Z 2026-02-21T09:28:31.5620614Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp9cf896ua.ptx -o /tmp/tmp9cf896ua.ptx.o 2026-02-21T09:28:31.5621497Z 2026-02-21T09:28:31.5621504Z 2026-02-21T09:28:31.5621604Z // 2026-02-21T09:28:31.5621866Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:28:31.5622181Z // 2026-02-21T09:28:31.5622312Z 2026-02-21T09:28:31.5622406Z .version 8.7 2026-02-21T09:28:31.5622648Z .target sm_100a 2026-02-21T09:28:31.5622954Z .address_size 64 2026-02-21T09:28:31.5623112Z 2026-02-21T09:28:31.5623342Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:28:31.5623840Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:28:31.5624242Z // @_helion_matmul 2026-02-21T09:28:31.5624608Z .visible .entry _helion_matmul( 2026-02-21T09:28:31.5625116Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:28:31.5625605Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:28:31.5626101Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:28:31.5626577Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:28:31.5627061Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:28:31.5627459Z ) 2026-02-21T09:28:31.5627673Z .reqntid 384 2026-02-21T09:28:31.5627901Z .maxnreg 32 2026-02-21T09:28:31.5628114Z { 2026-02-21T09:28:31.5628343Z .reg .pred %p<122>; 2026-02-21T09:28:31.5628610Z .reg .b32 %r<1472>; 2026-02-21T09:28:31.5628875Z .reg .b64 %rd<625>; 2026-02-21T09:28:31.5629412Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19:0 2026-02-21T09:28:31.5629972Z $L__func_begin0: 2026-02-21T09:28:31.5630477Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19:0 2026-02-21T09:28:31.5630944Z 
2026-02-21T09:28:31.5631035Z // %bb.0: 2026-02-21T09:28:31.5631323Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:28:31.5631683Z $L__tmp0: 2026-02-21T09:28:31.5632129Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19 2026-02-21T09:28:31.5632876Z mov.u32 %r1, %tid.x; 2026-02-21T09:28:31.5633194Z shr.u32 %r2, %r1, 5; 2026-02-21T09:28:31.5633480Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:28:31.5633840Z setp.lt.u32 %p1, %r3, 8; 2026-02-21T09:28:31.5634124Z @%p1 bra $L__BB0_12; 2026-02-21T09:28:31.5634383Z bra.uni $L__BB0_1; 2026-02-21T09:28:31.5634646Z $L__BB0_12: 2026-02-21T09:28:31.5635169Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0:0 2026-02-21T09:28:31.5636037Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:28:31.5636434Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:28:31.5637033Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19 2026-02-21T09:28:31.5637638Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:28:31.5637985Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:28:31.5638287Z mov.b32 %r112, global_smem; 2026-02-21T09:28:31.5638579Z // begin inline asm 2026-02-21T09:28:31.5639036Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r112], 512; 2026-02-21T09:28:31.5639501Z // end inline asm 2026-02-21T09:28:31.5639762Z bar.sync 0, 256; 2026-02-21T09:28:31.5640018Z ld.shared.b32 %r1465, [global_smem]; 2026-02-21T09:28:31.5640333Z bar.sync 0, 256; 2026-02-21T09:28:31.5640651Z // begin inline asm 2026-02-21T09:28:31.5641070Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:28:31.5641503Z // end inline asm 2026-02-21T09:28:31.5641987Z .loc 1 21 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:21:67 2026-02-21T09:28:31.5642547Z mov.u32 %r21, %ctaid.x; 2026-02-21T09:28:31.5642821Z mov.u32 %r129, %ctaid.y; 2026-02-21T09:28:31.5643097Z mov.u32 %r130, %ctaid.z; 2026-02-21T09:28:31.5643366Z 
mov.u32 %r131, %nctaid.x; 2026-02-21T09:28:31.5643642Z mov.u32 %r132, %nctaid.y; 2026-02-21T09:28:31.5643935Z mad.lo.s32 %r133, %r130, %r132, %r129; 2026-02-21T09:28:31.5644260Z mad.lo.s32 %r134, %r133, %r131, %r21; 2026-02-21T09:28:31.5644571Z shl.b32 %r135, %r134, 8; 2026-02-21T09:28:31.5644883Z cvt.s64.s32 %rd80, %r135; 2026-02-21T09:28:31.5645179Z add.s64 %rd59, %rd6, %rd80; 2026-02-21T09:28:31.5645459Z shl.b32 %r136, %r1, 2; 2026-02-21T09:28:31.5645733Z add.s32 %r113, %r112, %r136; 2026-02-21T09:28:31.5646020Z mov.b32 %r138, 0; 2026-02-21T09:28:31.5646271Z // begin inline asm 2026-02-21T09:28:31.5647214Z [210s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:28:31.5649635Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T09:28:31.5652057Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:28:31.5652530Z `ptxas` stderr: 2026-02-21T09:28:31.5653365Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 253 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:28:31.5654332Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:28:31.5654611Z 2026-02-21T09:28:31.5655504Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp9cf896ua.ptx -o /tmp/tmp9cf896ua.ptx.o 2026-02-21T09:28:31.5656420Z 2026-02-21T09:28:31.5656657Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:28:31.5657154Z @%p34 st.shared.b32 [ %r113 + 0 ], %r138; 2026-02-21T09:28:31.5657483Z // end inline asm 2026-02-21T09:28:31.5657743Z bar.warp.sync -1; 2026-02-21T09:28:31.5658005Z setp.eq.b32 %p89, %r1, 0; 2026-02-21T09:28:31.5658313Z cvt.u64.u32 %rd44, %r112; 2026-02-21T09:28:31.5658594Z // begin inline asm 2026-02-21T09:28:31.5659076Z @%p89 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd44 + 0 ], %rd3; 2026-02-21T09:28:31.5659648Z // end inline asm 2026-02-21T09:28:31.5659882Z // begin inline asm 2026-02-21T09:28:31.5660329Z @%p89 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1; 2026-02-21T09:28:31.5660993Z // end inline asm 2026-02-21T09:28:31.5661246Z mov.b32 %r115, 32; 2026-02-21T09:28:31.5661490Z // begin inline asm 2026-02-21T09:28:31.5661963Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r115; 2026-02-21T09:28:31.5662507Z // end inline asm 2026-02-21T09:28:31.5662743Z mov.b32 %r116, 256; 2026-02-21T09:28:31.5663017Z // begin inline asm 2026-02-21T09:28:31.5663461Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r116; 2026-02-21T09:28:31.5663974Z // end inline asm 2026-02-21T09:28:31.5664206Z mov.b32 %r117, 2048; 2026-02-21T09:28:31.5664459Z // begin inline asm 2026-02-21T09:28:31.5664971Z @%p89 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r117; 2026-02-21T09:28:31.5665506Z // end inline asm 2026-02-21T09:28:31.5665743Z mov.b32 %r118, 8192; 2026-02-21T09:28:31.5666070Z // 
begin inline asm 2026-02-21T09:28:31.5666607Z @%p89 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r118; 2026-02-21T09:28:31.5667134Z // end inline asm 2026-02-21T09:28:31.5667383Z mov.b64 %rd52, 4096; 2026-02-21T09:28:31.5667627Z // begin inline asm 2026-02-21T09:28:31.5668109Z @%p89 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd44 + 0 ], 0x0, %rd52; 2026-02-21T09:28:31.5668665Z // end inline asm 2026-02-21T09:28:31.5668898Z mov.b32 %r119, 1; 2026-02-21T09:28:31.5669156Z // begin inline asm 2026-02-21T09:28:31.5669631Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r119; 2026-02-21T09:28:31.5670193Z // end inline asm 2026-02-21T09:28:31.5670422Z // begin inline asm 2026-02-21T09:28:31.5670896Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r119; 2026-02-21T09:28:31.5671469Z // end inline asm 2026-02-21T09:28:31.5671699Z // begin inline asm 2026-02-21T09:28:31.5672161Z @%p89 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x6; 2026-02-21T09:28:31.5672656Z // end inline asm 2026-02-21T09:28:31.5672893Z // begin inline asm 2026-02-21T09:28:31.5673370Z @%p89 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0; 2026-02-21T09:28:31.5673902Z // end inline asm 2026-02-21T09:28:31.5674153Z // begin inline asm 2026-02-21T09:28:31.5674592Z @%p89 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x2; 2026-02-21T09:28:31.5675162Z // end inline asm 2026-02-21T09:28:31.5675391Z // begin inline asm 2026-02-21T09:28:31.5675814Z @%p89 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0; 2026-02-21T09:28:31.5676332Z // end inline asm 2026-02-21T09:28:31.5676573Z // begin inline asm 2026-02-21T09:28:31.5677250Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd59 + 0 ], [ %rd44 + 0 ], 0x80; 
2026-02-21T09:28:31.5677966Z // end inline asm 2026-02-21T09:28:31.5678215Z // begin inline asm 2026-02-21T09:28:31.5678591Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd59 + 0 ], 0x80; 2026-02-21T09:28:31.5679081Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:28:31.5679421Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:28:31.5679745Z // end inline asm 2026-02-21T09:28:31.5679986Z bar.sync 0, 256; 2026-02-21T09:28:31.5680458Z .loc 1 22 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:22:67 2026-02-21T09:28:31.5681042Z add.s64 %rd77, %rd59, 128; 2026-02-21T09:28:31.5681306Z bar.sync 0, 256; 2026-02-21T09:28:31.5681546Z // begin inline asm 2026-02-21T09:28:31.5681817Z @%p34 st.shared.b32 [ %r113 + 0 ], %r138; 2026-02-21T09:28:31.5682138Z // end inline asm 2026-02-21T09:28:31.5682373Z bar.warp.sync -1; 2026-02-21T09:28:31.5682619Z // begin inline asm 2026-02-21T09:28:31.5683088Z @%p89 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd44 + 0 ], %rd4; 2026-02-21T09:28:31.5683612Z // end inline asm 2026-02-21T09:28:31.5683853Z // begin inline asm 2026-02-21T09:28:31.5684366Z @%p89 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1; 2026-02-21T09:28:31.5684963Z // end inline asm 2026-02-21T09:28:31.5685193Z // begin inline asm 2026-02-21T09:28:31.5685656Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r115; 2026-02-21T09:28:31.5686156Z // end inline asm 2026-02-21T09:28:31.5686397Z // begin inline asm 2026-02-21T09:28:31.5686849Z @%p89 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r116; 2026-02-21T09:28:31.5687353Z // end inline asm 2026-02-21T09:28:31.5687605Z // begin inline asm 2026-02-21T09:28:31.5688049Z @%p89 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r117; 2026-02-21T09:28:31.5688604Z // end inline asm 2026-02-21T09:28:31.5688830Z // begin inline asm 2026-02-21T09:28:31.5689366Z @%p89 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r117; 2026-02-21T09:28:31.5689952Z // end inline asm 2026-02-21T09:28:31.5690196Z // begin inline asm 2026-02-21T09:28:31.5690695Z @%p89 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd44 + 0 ], 0x0, %rd52; 2026-02-21T09:28:31.5691224Z // end inline asm 2026-02-21T09:28:31.5691460Z // begin inline asm 2026-02-21T09:28:31.5691940Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0, %r119; 2026-02-21T09:28:31.5692492Z // end inline asm 2026-02-21T09:28:31.5692721Z // begin inline asm 2026-02-21T09:28:31.5693191Z @%p89 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x1, %r119; 2026-02-21T09:28:31.5693756Z // end inline asm 2026-02-21T09:28:31.5693992Z // begin inline asm 2026-02-21T09:28:31.5694432Z @%p89 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x6; 2026-02-21T09:28:31.5695022Z // end inline asm 2026-02-21T09:28:31.5695281Z // begin inline asm 2026-02-21T09:28:31.5695770Z @%p89 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0; 2026-02-21T09:28:31.5696344Z // end inline asm 2026-02-21T09:28:31.5696581Z // begin inline asm 2026-02-21T09:28:31.5697079Z @%p89 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x2; 2026-02-21T09:28:31.5697599Z // end inline asm 2026-02-21T09:28:31.5697849Z // begin inline asm 2026-02-21T09:28:31.5698295Z @%p89 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd44 + 0 ], 0x0; 2026-02-21T09:28:31.5698835Z // end inline asm 2026-02-21T09:28:31.5699084Z // begin inline asm 2026-02-21T09:28:31.5699821Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd77 + 0 ], [ %rd44 + 0 ], 0x80; 2026-02-21T09:28:31.5700597Z // end inline asm 2026-02-21T09:28:31.5700844Z // begin inline asm 2026-02-21T09:28:31.5701242Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ 
%rd77 + 0 ], 0x80; 2026-02-21T09:28:31.5701751Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:28:31.5702109Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:28:31.5702448Z // end inline asm 2026-02-21T09:28:31.5702697Z bar.sync 0, 256; 2026-02-21T09:28:31.5703182Z .loc 1 31 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:31:52 2026-02-21T09:28:31.5703814Z setp.gt.u32 %p72, %r21, 255; 2026-02-21T09:28:31.5704120Z @%p72 bra $L__BB0_14; 2026-02-21T09:28:31.5704425Z // %bb.13: // %.lr.ph 2026-02-21T09:28:31.5705051Z .loc 1 0 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0:52 2026-02-21T09:28:31.5705695Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:28:31.5706318Z .loc 1 43 45 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:43:45 2026-02-21T09:28:31.5706895Z and.b32 %r1003, %r1, 31; 2026-02-21T09:28:31.5707177Z shl.b32 %r1004, %r1003, 3; 2026-02-21T09:28:31.5707471Z and.b32 %r1005, %r1, 224; 2026-02-21T09:28:31.5707754Z bfe.u32 %r1006, %r1, 5, 3; 2026-02-21T09:28:31.5708268Z .loc 1 44 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:44:27 2026-02-21T09:28:31.5709018Z shl.b32 %r1007, %r21, 5; 2026-02-21T09:28:31.5709518Z .loc 1 43 45 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:43:45 2026-02-21T09:28:31.5710079Z or.b32 %r1008, %r1006, %r1007; 2026-02-21T09:28:31.5710396Z shl.b32 %r1009, %r1005, 7; 2026-02-21T09:28:31.5710693Z shl.b32 %r1010, %r1003, 4; 2026-02-21T09:28:31.5710992Z or.b32 %r1011, %r1009, %r1010; 2026-02-21T09:28:31.5711277Z shr.u32 %r1012, %r1005, 1; 2026-02-21T09:28:31.5711563Z xor.b32 %r1013, %r1011, %r1012; 2026-02-21T09:28:31.5711855Z add.s32 %r719, %r112, %r1013; 2026-02-21T09:28:31.5712160Z add.s32 %r754, %r719, 3584; 2026-02-21T09:28:31.5712442Z add.s32 %r749, %r719, 3072; 2026-02-21T09:28:31.5712738Z add.s32 %r744, %r719, 2560; 2026-02-21T09:28:31.5713091Z add.s32 %r739, %r719, 2048; 2026-02-21T09:28:31.5713435Z add.s32 %r734, %r719, 1536; 
2026-02-21T09:28:31.5713723Z add.s32 %r729, %r719, 1024; 2026-02-21T09:28:31.5714009Z add.s32 %r724, %r719, 512; 2026-02-21T09:28:31.5714300Z shl.b32 %r1015, %r1, 12; 2026-02-21T09:28:31.5714586Z and.b32 %r1016, %r1015, 28672; 2026-02-21T09:28:31.5714932Z shl.b32 %r1017, %r1, 4; 2026-02-21T09:28:31.5715198Z and.b32 %r1018, %r1017, 4080; 2026-02-21T09:28:31.5715501Z or.b32 %r1019, %r1016, %r1018; 2026-02-21T09:28:31.5715811Z xor.b32 %r1020, %r1019, 112; 2026-02-21T09:28:31.5716117Z add.s32 %r1021, %r112, %r1020; 2026-02-21T09:28:31.5716414Z xor.b32 %r1022, %r1019, 96; 2026-02-21T09:28:31.5716721Z add.s32 %r1023, %r112, %r1022; 2026-02-21T09:28:31.5717026Z xor.b32 %r1024, %r1019, 80; 2026-02-21T09:28:31.5717297Z add.s32 %r1025, %r112, %r1024; 2026-02-21T09:28:31.5717601Z xor.b32 %r1026, %r1019, 64; 2026-02-21T09:28:31.5717877Z add.s32 %r1027, %r112, %r1026; 2026-02-21T09:28:31.5718166Z xor.b32 %r1028, %r1019, 48; 2026-02-21T09:28:31.5718444Z add.s32 %r1029, %r112, %r1028; 2026-02-21T09:28:31.5718735Z xor.b32 %r1030, %r1019, 32; 2026-02-21T09:28:31.5719024Z add.s32 %r1031, %r112, %r1030; 2026-02-21T09:28:31.5719312Z xor.b32 %r1032, %r1019, 16; 2026-02-21T09:28:31.5719599Z add.s32 %r1033, %r112, %r1032; 2026-02-21T09:28:31.5719903Z add.s32 %r1034, %r112, %r1019; 2026-02-21T09:28:31.5720434Z .loc 1 42 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:42:27 2026-02-21T09:28:31.5720994Z shl.b32 %r1035, %r21, 8; 2026-02-21T09:28:31.5721295Z and.b32 %r1036, %r1035, 1792; 2026-02-21T09:28:31.5721818Z .loc 1 43 32 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:43:32 2026-02-21T09:28:31.5722400Z or.b32 %r1037, %r1036, %r1004; 2026-02-21T09:28:31.5722915Z .loc 1 44 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:44:27 2026-02-21T09:28:31.5723491Z and.b32 %r1038, %r1007, 7936; 2026-02-21T09:28:31.5724014Z .loc 1 45 32 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:45:32 2026-02-21T09:28:31.5724580Z or.b32 %r1039, %r1038, %r1006; 
2026-02-21T09:28:31.5724920Z shl.b32 %r1040, %r1008, 11; 2026-02-21T09:28:31.5725407Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5726022Z shfl.sync.idx.b32 %r1041, %r2, 0, 31, -1; 2026-02-21T09:28:31.5726354Z shl.b32 %r1042, %r1041, 21; 2026-02-21T09:28:31.5726646Z and.b32 %r1043, %r1042, 6291456; 2026-02-21T09:28:31.5726952Z add.s32 %r1044, %r1043, %r1465; 2026-02-21T09:28:31.5727239Z shl.b32 %r1045, %r1041, 6; 2026-02-21T09:28:31.5727527Z and.b32 %r1046, %r1045, 256; 2026-02-21T09:28:31.5727810Z add.s32 %r137, %r1044, %r1046; 2026-02-21T09:28:31.5728102Z mov.pred %p73, -1; 2026-02-21T09:28:31.5728372Z // begin inline asm 2026-02-21T09:28:31.5729105Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 0], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5729988Z // end inline asm 2026-02-21T09:28:31.5730289Z // begin inline asm 2026-02-21T09:28:31.5731002Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 16], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5731786Z // end inline asm 2026-02-21T09:28:31.5732033Z // begin inline asm 2026-02-21T09:28:31.5732739Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 32], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5733516Z // end inline asm 2026-02-21T09:28:31.5733754Z // begin inline asm 2026-02-21T09:28:31.5734445Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 48], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5735307Z // end inline asm 2026-02-21T09:28:31.5735596Z // begin inline asm 2026-02-21T09:28:31.5736311Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 64], {%r138, %r138, %r138, %r138, 
%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5737084Z // end inline asm 2026-02-21T09:28:31.5737326Z // begin inline asm 2026-02-21T09:28:31.5738030Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 80], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5738795Z // end inline asm 2026-02-21T09:28:31.5739048Z // begin inline asm 2026-02-21T09:28:31.5739731Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 96], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5740506Z // end inline asm 2026-02-21T09:28:31.5740737Z // begin inline asm 2026-02-21T09:28:31.5741460Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 112], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5742243Z // end inline asm 2026-02-21T09:28:31.5742484Z // begin inline asm 2026-02-21T09:28:31.5743193Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 128], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5743959Z // end inline asm 2026-02-21T09:28:31.5744197Z // begin inline asm 2026-02-21T09:28:31.5744940Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 144], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5745717Z // end inline asm 2026-02-21T09:28:31.5745970Z // begin inline asm 2026-02-21T09:28:31.5746663Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 160], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5747452Z // end inline asm 2026-02-21T09:28:31.5747687Z // begin inline asm 
2026-02-21T09:28:31.5748384Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 176], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5749171Z // end inline asm 2026-02-21T09:28:31.5749411Z // begin inline asm 2026-02-21T09:28:31.5750119Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 192], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5750881Z // end inline asm 2026-02-21T09:28:31.5751123Z // begin inline asm 2026-02-21T09:28:31.5751815Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 208], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5752597Z // end inline asm 2026-02-21T09:28:31.5752834Z // begin inline asm 2026-02-21T09:28:31.5753525Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 224], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5754427Z // end inline asm 2026-02-21T09:28:31.5754724Z // begin inline asm 2026-02-21T09:28:31.5755444Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 240], {%r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138, %r138}; 2026-02-21T09:28:31.5756222Z // end inline asm 2026-02-21T09:28:31.5756453Z // begin inline asm 2026-02-21T09:28:31.5756742Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:28:31.5757023Z // end inline asm 2026-02-21T09:28:31.5757267Z bar.sync 0, 256; 2026-02-21T09:28:31.5757753Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.5758348Z add.s32 %r409, %r112, 196736; 2026-02-21T09:28:31.5758632Z // begin inline asm 2026-02-21T09:28:31.5759024Z @%p89 mbarrier.init.shared::cta.b64 [%r409], 1; 2026-02-21T09:28:31.5759385Z // end inline asm 
2026-02-21T09:28:31.5759623Z bar.sync 0, 256; 2026-02-21T09:28:31.5759868Z add.s32 %r410, %r112, 196744; 2026-02-21T09:28:31.5760161Z // begin inline asm 2026-02-21T09:28:31.5760462Z @%p89 mbarrier.init.shared::cta.b64 [%r410], 1; 2026-02-21T09:28:31.5760830Z // end inline asm 2026-02-21T09:28:31.5761068Z bar.sync 0, 256; 2026-02-21T09:28:31.5761310Z add.s32 %r411, %r112, 196752; 2026-02-21T09:28:31.5761601Z // begin inline asm 2026-02-21T09:28:31.5761900Z @%p89 mbarrier.init.shared::cta.b64 [%r411], 1; 2026-02-21T09:28:31.5762249Z // end inline asm 2026-02-21T09:28:31.5762499Z bar.sync 0, 256; 2026-02-21T09:28:31.5762728Z add.s32 %r412, %r112, 196760; 2026-02-21T09:28:31.5763019Z // begin inline asm 2026-02-21T09:28:31.5763305Z @%p89 mbarrier.init.shared::cta.b64 [%r412], 1; 2026-02-21T09:28:31.5763658Z // end inline asm 2026-02-21T09:28:31.5763887Z bar.sync 0, 256; 2026-02-21T09:28:31.5764129Z add.s32 %r413, %r112, 196768; 2026-02-21T09:28:31.5764426Z // begin inline asm 2026-02-21T09:28:31.5764756Z @%p89 mbarrier.init.shared::cta.b64 [%r413], 1; 2026-02-21T09:28:31.5765123Z // end inline asm 2026-02-21T09:28:31.5765347Z bar.sync 0, 256; 2026-02-21T09:28:31.5765594Z add.s32 %r414, %r112, 196776; 2026-02-21T09:28:31.5765871Z // begin inline asm 2026-02-21T09:28:31.5766164Z @%p89 mbarrier.init.shared::cta.b64 [%r414], 1; 2026-02-21T09:28:31.5766515Z // end inline asm 2026-02-21T09:28:31.5766756Z add.s32 %r415, %r112, 196784; 2026-02-21T09:28:31.5767050Z // begin inline asm 2026-02-21T09:28:31.5767341Z @%p89 mbarrier.init.shared::cta.b64 [%r415], 1; 2026-02-21T09:28:31.5767693Z // end inline asm 2026-02-21T09:28:31.5767923Z bar.sync 0, 256; 2026-02-21T09:28:31.5768171Z add.s32 %r416, %r112, 196792; 2026-02-21T09:28:31.5768449Z // begin inline asm 2026-02-21T09:28:31.5768745Z @%p89 mbarrier.init.shared::cta.b64 [%r416], 1; 2026-02-21T09:28:31.5769088Z // end inline asm 2026-02-21T09:28:31.5769325Z bar.sync 0, 256; 2026-02-21T09:28:31.5769568Z add.s32 %r417, %r112, 
196800; 2026-02-21T09:28:31.5769842Z // begin inline asm 2026-02-21T09:28:31.5770138Z @%p89 mbarrier.init.shared::cta.b64 [%r417], 1; 2026-02-21T09:28:31.5770475Z // end inline asm 2026-02-21T09:28:31.5770706Z bar.sync 0, 256; 2026-02-21T09:28:31.5770948Z add.s32 %r418, %r112, 196808; 2026-02-21T09:28:31.5771226Z // begin inline asm 2026-02-21T09:28:31.5771523Z @%p89 mbarrier.init.shared::cta.b64 [%r418], 1; 2026-02-21T09:28:31.5771866Z // end inline asm 2026-02-21T09:28:31.5772099Z bar.sync 0, 256; 2026-02-21T09:28:31.5772334Z add.s32 %r419, %r112, 196816; 2026-02-21T09:28:31.5772605Z // begin inline asm 2026-02-21T09:28:31.5772906Z @%p89 mbarrier.init.shared::cta.b64 [%r419], 1; 2026-02-21T09:28:31.5773246Z // end inline asm 2026-02-21T09:28:31.5773484Z bar.sync 0, 256; 2026-02-21T09:28:31.5773727Z add.s32 %r420, %r112, 196824; 2026-02-21T09:28:31.5774003Z // begin inline asm 2026-02-21T09:28:31.5774306Z @%p89 mbarrier.init.shared::cta.b64 [%r420], 1; 2026-02-21T09:28:31.5774786Z // end inline asm 2026-02-21T09:28:31.5775296Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.5775836Z bar.sync 0, 256; 2026-02-21T09:28:31.5776094Z // begin inline asm 2026-02-21T09:28:31.5776393Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r409]; 2026-02-21T09:28:31.5776775Z // end inline asm 2026-02-21T09:28:31.5777008Z bar.sync 0, 256; 2026-02-21T09:28:31.5777247Z // begin inline asm 2026-02-21T09:28:31.5777557Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r410]; 2026-02-21T09:28:31.5777921Z // end inline asm 2026-02-21T09:28:31.5778154Z bar.sync 0, 256; 2026-02-21T09:28:31.5778385Z // begin inline asm 2026-02-21T09:28:31.5778696Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r411]; 2026-02-21T09:28:31.5779049Z // end inline asm 2026-02-21T09:28:31.5779309Z bar.sync 0, 256; 2026-02-21T09:28:31.5779583Z // begin inline asm 2026-02-21T09:28:31.5779961Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r412]; 2026-02-21T09:28:31.5780317Z // end inline asm 
2026-02-21T09:28:31.5780558Z bar.sync 0, 256; 2026-02-21T09:28:31.5780794Z // begin inline asm 2026-02-21T09:28:31.5781097Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r413]; 2026-02-21T09:28:31.5781454Z // end inline asm 2026-02-21T09:28:31.5781696Z bar.sync 0, 256; 2026-02-21T09:28:31.5781929Z // begin inline asm 2026-02-21T09:28:31.5782219Z @%p89 mbarrier.arrive.shared::cta.b64 _, [%r414]; 2026-02-21T09:28:31.5782578Z // end inline asm 2026-02-21T09:28:31.5783090Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.5783664Z bar.sync 0, 256; 2026-02-21T09:28:31.5783915Z add.s32 %r427, %r112, 196832; 2026-02-21T09:28:31.5784207Z // begin inline asm 2026-02-21T09:28:31.5784499Z @%p89 mbarrier.init.shared::cta.b64 [%r427], 1; 2026-02-21T09:28:31.5784901Z // end inline asm 2026-02-21T09:28:31.5785197Z st.shared.b32 [global_smem+196840], 33554689; 2026-02-21T09:28:31.5785596Z st.shared.b32 [global_smem+196608], %r1465; 2026-02-21T09:28:31.5786042Z st.shared.v2.b32 [global_smem+196616], {%r1038, %r1036}; 2026-02-21T09:28:31.5786432Z barrier.sync 1; 2026-02-21T09:28:31.5786718Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:28:31.5787055Z barrier.sync 1; 2026-02-21T09:28:31.5787294Z barrier.sync 1; 2026-02-21T09:28:31.5787581Z setmaxnreg.inc.sync.aligned.u32 32; 2026-02-21T09:28:31.5788147Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5788715Z bar.sync 0, 256; 2026-02-21T09:28:31.5788957Z // begin inline asm 2026-02-21T09:28:31.5789192Z 2026-02-21T09:28:31.5789400Z { 2026-02-21T09:28:31.5789610Z .reg .pred complete; 2026-02-21T09:28:31.5789872Z waitLoop: 2026-02-21T09:28:31.5790221Z mbarrier.try_wait.parity.shared.b64 complete, [%r427], %r138; 2026-02-21T09:28:31.5790684Z @!complete bra.uni waitLoop; 2026-02-21T09:28:31.5790956Z } 2026-02-21T09:28:31.5791090Z 2026-02-21T09:28:31.5791185Z // end inline asm 2026-02-21T09:28:31.5791657Z .loc 1 50 112 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.5792214Z bar.sync 0, 256; 2026-02-21T09:28:31.5792452Z // begin inline asm 2026-02-21T09:28:31.5792750Z @%p89 mbarrier.inval.shared::cta.b64 [%r427]; 2026-02-21T09:28:31.5793119Z // end inline asm 2026-02-21T09:28:31.5793356Z // begin inline asm 2026-02-21T09:28:31.5793661Z @%p89 mbarrier.inval.shared::cta.b64 [%r415]; 2026-02-21T09:28:31.5794002Z // end inline asm 2026-02-21T09:28:31.5794234Z bar.sync 0, 256; 2026-02-21T09:28:31.5794473Z // begin inline asm 2026-02-21T09:28:31.5794827Z @%p89 mbarrier.inval.shared::cta.b64 [%r416]; 2026-02-21T09:28:31.5795166Z // end inline asm 2026-02-21T09:28:31.5795418Z bar.sync 0, 256; 2026-02-21T09:28:31.5795662Z // begin inline asm 2026-02-21T09:28:31.5795971Z @%p89 mbarrier.inval.shared::cta.b64 [%r417]; 2026-02-21T09:28:31.5796322Z // end inline asm 2026-02-21T09:28:31.5796644Z bar.sync 0, 256; 2026-02-21T09:28:31.5796936Z // begin inline asm 2026-02-21T09:28:31.5797229Z @%p89 mbarrier.inval.shared::cta.b64 [%r418]; 2026-02-21T09:28:31.5797582Z // end inline asm 2026-02-21T09:28:31.5797818Z bar.sync 0, 256; 2026-02-21T09:28:31.5798065Z // begin inline asm 2026-02-21T09:28:31.5798344Z @%p89 mbarrier.inval.shared::cta.b64 [%r419]; 2026-02-21T09:28:31.5798702Z // end inline asm 2026-02-21T09:28:31.5798944Z bar.sync 0, 256; 2026-02-21T09:28:31.5799182Z // begin inline asm 2026-02-21T09:28:31.5799476Z @%p89 mbarrier.inval.shared::cta.b64 [%r420]; 2026-02-21T09:28:31.5799817Z // end inline asm 2026-02-21T09:28:31.5800065Z // begin inline asm 2026-02-21T09:28:31.5800353Z @%p89 mbarrier.inval.shared::cta.b64 [%r409]; 2026-02-21T09:28:31.5800705Z // end inline asm 2026-02-21T09:28:31.5800927Z bar.sync 0, 256; 2026-02-21T09:28:31.5801174Z // begin inline asm 2026-02-21T09:28:31.5801518Z @%p89 mbarrier.inval.shared::cta.b64 [%r410]; 2026-02-21T09:28:31.5801918Z // end inline asm 2026-02-21T09:28:31.5802159Z bar.sync 0, 256; 2026-02-21T09:28:31.5802406Z 
// begin inline asm 2026-02-21T09:28:31.5802694Z @%p89 mbarrier.inval.shared::cta.b64 [%r411]; 2026-02-21T09:28:31.5803053Z // end inline asm 2026-02-21T09:28:31.5803297Z bar.sync 0, 256; 2026-02-21T09:28:31.5803534Z // begin inline asm 2026-02-21T09:28:31.5803826Z @%p89 mbarrier.inval.shared::cta.b64 [%r412]; 2026-02-21T09:28:31.5804167Z // end inline asm 2026-02-21T09:28:31.5804407Z bar.sync 0, 256; 2026-02-21T09:28:31.5804652Z // begin inline asm 2026-02-21T09:28:31.5804991Z @%p89 mbarrier.inval.shared::cta.b64 [%r413]; 2026-02-21T09:28:31.5805338Z // end inline asm 2026-02-21T09:28:31.5805582Z bar.sync 0, 256; 2026-02-21T09:28:31.5805822Z // begin inline asm 2026-02-21T09:28:31.5806120Z @%p89 mbarrier.inval.shared::cta.b64 [%r414]; 2026-02-21T09:28:31.5806456Z // end inline asm 2026-02-21T09:28:31.5806939Z .loc 1 45 32 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:45:32 2026-02-21T09:28:31.5807535Z shl.b32 %r1047, %r1039, 11; 2026-02-21T09:28:31.5808067Z .loc 1 59 45 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:59:45 2026-02-21T09:28:31.5808606Z or.b32 %r1048, %r1040, %r1037; 2026-02-21T09:28:31.5808914Z or.b32 %r1049, %r1047, %r1037; 2026-02-21T09:28:31.5809439Z .loc 1 59 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:59:52 2026-02-21T09:28:31.5810011Z or.b32 %r1050, %r1049, 16384; 2026-02-21T09:28:31.5810286Z or.b32 %r1051, %r1049, 32768; 2026-02-21T09:28:31.5810581Z or.b32 %r1052, %r1049, 49152; 2026-02-21T09:28:31.5810860Z or.b32 %r1053, %r1049, 65536; 2026-02-21T09:28:31.5811137Z or.b32 %r1054, %r1049, 81920; 2026-02-21T09:28:31.5811412Z or.b32 %r1055, %r1049, 98304; 2026-02-21T09:28:31.5811680Z or.b32 %r1056, %r1049, 114688; 2026-02-21T09:28:31.5811985Z or.b32 %r1057, %r1049, 131072; 2026-02-21T09:28:31.5812269Z or.b32 %r1058, %r1049, 147456; 2026-02-21T09:28:31.5812562Z or.b32 %r1059, %r1049, 163840; 2026-02-21T09:28:31.5812853Z or.b32 %r1060, %r1049, 180224; 2026-02-21T09:28:31.5813132Z or.b32 %r1061, %r1049, 
196608; 2026-02-21T09:28:31.5813422Z or.b32 %r1062, %r1049, 212992; 2026-02-21T09:28:31.5813702Z or.b32 %r1063, %r1049, 229376; 2026-02-21T09:28:31.5813997Z or.b32 %r1064, %r1049, 245760; 2026-02-21T09:28:31.5814274Z or.b32 %r1065, %r1049, 262144; 2026-02-21T09:28:31.5814553Z or.b32 %r1066, %r1049, 278528; 2026-02-21T09:28:31.5814886Z or.b32 %r1067, %r1049, 294912; 2026-02-21T09:28:31.5815167Z or.b32 %r1068, %r1049, 311296; 2026-02-21T09:28:31.5815457Z or.b32 %r1069, %r1049, 327680; 2026-02-21T09:28:31.5815731Z or.b32 %r1070, %r1049, 344064; 2026-02-21T09:28:31.5816025Z or.b32 %r1071, %r1049, 360448; 2026-02-21T09:28:31.5816311Z or.b32 %r1072, %r1049, 376832; 2026-02-21T09:28:31.5816606Z or.b32 %r1073, %r1049, 393216; 2026-02-21T09:28:31.5816896Z or.b32 %r1074, %r1049, 409600; 2026-02-21T09:28:31.5817181Z or.b32 %r1075, %r1049, 425984; 2026-02-21T09:28:31.5817554Z or.b32 %r1076, %r1049, 442368; 2026-02-21T09:28:31.5817883Z or.b32 %r1077, %r1048, 458752; 2026-02-21T09:28:31.5818160Z or.b32 %r1078, %r1048, 475136; 2026-02-21T09:28:31.5818440Z or.b32 %r1079, %r1048, 491520; 2026-02-21T09:28:31.5818737Z or.b32 %r1080, %r1048, 507904; 2026-02-21T09:28:31.5819268Z .loc 1 59 24 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:59:24 2026-02-21T09:28:31.5819861Z mad.wide.u32 %rd81, %r1049, 2, %rd5; 2026-02-21T09:28:31.5820204Z mad.wide.u32 %rd82, %r1050, 2, %rd5; 2026-02-21T09:28:31.5820518Z mad.wide.u32 %rd83, %r1051, 2, %rd5; 2026-02-21T09:28:31.5820829Z mad.wide.u32 %rd84, %r1052, 2, %rd5; 2026-02-21T09:28:31.5821137Z mad.wide.u32 %rd85, %r1053, 2, %rd5; 2026-02-21T09:28:31.5821451Z mad.wide.u32 %rd86, %r1054, 2, %rd5; 2026-02-21T09:28:31.5821775Z mad.wide.u32 %rd87, %r1055, 2, %rd5; 2026-02-21T09:28:31.5822172Z mad.wide.u32 %rd88, %r1056, 2, %rd5; 2026-02-21T09:28:31.5822486Z mad.wide.u32 %rd89, %r1057, 2, %rd5; 2026-02-21T09:28:31.5822807Z mad.wide.u32 %rd90, %r1058, 2, %rd5; 2026-02-21T09:28:31.5823115Z mad.wide.u32 %rd91, %r1059, 2, %rd5; 
2026-02-21T09:28:31.5823450Z mad.wide.u32 %rd92, %r1060, 2, %rd5; 2026-02-21T09:28:31.5823761Z mad.wide.u32 %rd93, %r1061, 2, %rd5; 2026-02-21T09:28:31.5824092Z mad.wide.u32 %rd94, %r1062, 2, %rd5; 2026-02-21T09:28:31.5824418Z mad.wide.u32 %rd95, %r1063, 2, %rd5; 2026-02-21T09:28:31.5824775Z mad.wide.u32 %rd96, %r1064, 2, %rd5; 2026-02-21T09:28:31.5825090Z mad.wide.u32 %rd97, %r1065, 2, %rd5; 2026-02-21T09:28:31.5825403Z mad.wide.u32 %rd98, %r1066, 2, %rd5; 2026-02-21T09:28:31.5825730Z mad.wide.u32 %rd99, %r1067, 2, %rd5; 2026-02-21T09:28:31.5826068Z mad.wide.u32 %rd100, %r1068, 2, %rd5; 2026-02-21T09:28:31.5826406Z mad.wide.u32 %rd101, %r1069, 2, %rd5; 2026-02-21T09:28:31.5826735Z mad.wide.u32 %rd102, %r1070, 2, %rd5; 2026-02-21T09:28:31.5827081Z mad.wide.u32 %rd103, %r1071, 2, %rd5; 2026-02-21T09:28:31.5827380Z mad.wide.u32 %rd104, %r1072, 2, %rd5; 2026-02-21T09:28:31.5827704Z mad.wide.u32 %rd105, %r1073, 2, %rd5; 2026-02-21T09:28:31.5828013Z mad.wide.u32 %rd106, %r1074, 2, %rd5; 2026-02-21T09:28:31.5828348Z mad.wide.u32 %rd107, %r1075, 2, %rd5; 2026-02-21T09:28:31.5828676Z mad.wide.u32 %rd108, %r1076, 2, %rd5; 2026-02-21T09:28:31.5829001Z mad.wide.u32 %rd109, %r1077, 2, %rd5; 2026-02-21T09:28:31.5829326Z mad.wide.u32 %rd110, %r1078, 2, %rd5; 2026-02-21T09:28:31.5829652Z mad.wide.u32 %rd111, %r1079, 2, %rd5; 2026-02-21T09:28:31.5829964Z mad.wide.u32 %rd112, %r1080, 2, %rd5; 2026-02-21T09:28:31.5830529Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5831111Z // begin inline asm 2026-02-21T09:28:31.5831828Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458}, [%r137 + 0]; 2026-02-21T09:28:31.5832620Z // end inline asm 2026-02-21T09:28:31.5832887Z // begin inline asm 2026-02-21T09:28:31.5833588Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, 
%r472, %r473, %r474, %r475}, [%r137 + 16]; 2026-02-21T09:28:31.5834394Z // end inline asm 2026-02-21T09:28:31.5834638Z // begin inline asm 2026-02-21T09:28:31.5835382Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492}, [%r137 + 32]; 2026-02-21T09:28:31.5836184Z // end inline asm 2026-02-21T09:28:31.5836426Z // begin inline asm 2026-02-21T09:28:31.5837143Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509}, [%r137 + 48]; 2026-02-21T09:28:31.5837932Z // end inline asm 2026-02-21T09:28:31.5838182Z // begin inline asm 2026-02-21T09:28:31.5838895Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526}, [%r137 + 64]; 2026-02-21T09:28:31.5839791Z // end inline asm 2026-02-21T09:28:31.5840049Z // begin inline asm 2026-02-21T09:28:31.5840758Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543}, [%r137 + 80]; 2026-02-21T09:28:31.5841535Z // end inline asm 2026-02-21T09:28:31.5841779Z // begin inline asm 2026-02-21T09:28:31.5842473Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560}, [%r137 + 96]; 2026-02-21T09:28:31.5843244Z // end inline asm 2026-02-21T09:28:31.5843497Z // begin inline asm 2026-02-21T09:28:31.5844242Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577}, [%r137 + 112]; 2026-02-21T09:28:31.5845099Z // end inline asm 2026-02-21T09:28:31.5845363Z // begin inline asm 2026-02-21T09:28:31.5846069Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r579, %r580, %r581, %r582, %r583, 
%r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594}, [%r137 + 128]; 2026-02-21T09:28:31.5846851Z // end inline asm 2026-02-21T09:28:31.5847088Z // begin inline asm 2026-02-21T09:28:31.5847800Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611}, [%r137 + 144]; 2026-02-21T09:28:31.5848561Z // end inline asm 2026-02-21T09:28:31.5848794Z // begin inline asm 2026-02-21T09:28:31.5849500Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628}, [%r137 + 160]; 2026-02-21T09:28:31.5850280Z // end inline asm 2026-02-21T09:28:31.5850525Z // begin inline asm 2026-02-21T09:28:31.5851215Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645}, [%r137 + 176]; 2026-02-21T09:28:31.5852008Z // end inline asm 2026-02-21T09:28:31.5852264Z // begin inline asm 2026-02-21T09:28:31.5852971Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662}, [%r137 + 192]; 2026-02-21T09:28:31.5853692Z // end inline asm 2026-02-21T09:28:31.5853919Z // begin inline asm 2026-02-21T09:28:31.5854594Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679}, [%r137 + 208]; 2026-02-21T09:28:31.5855438Z // end inline asm 2026-02-21T09:28:31.5855688Z // begin inline asm 2026-02-21T09:28:31.5856404Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696}, [%r137 + 224]; 2026-02-21T09:28:31.5857148Z // end inline asm 2026-02-21T09:28:31.5857388Z // begin inline asm 2026-02-21T09:28:31.5858085Z 
tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713}, [%r137 + 240]; 2026-02-21T09:28:31.5858884Z // end inline asm 2026-02-21T09:28:31.5859133Z // begin inline asm 2026-02-21T09:28:31.5859410Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:28:31.5859720Z // end inline asm 2026-02-21T09:28:31.5859964Z cvt.u64.u32 %rd113, %r443; 2026-02-21T09:28:31.5860266Z cvt.u64.u32 %rd114, %r444; 2026-02-21T09:28:31.5860563Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:28:31.5860863Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:28:31.5861413Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5862011Z mov.b64 {%r1081, %r1082}, %rd116; 2026-02-21T09:28:31.5862361Z cvt.rn.f16x2.f32 %r1083, %r1082, %r1081; 2026-02-21T09:28:31.5863018Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5863648Z cvt.u64.u32 %rd117, %r445; 2026-02-21T09:28:31.5863935Z cvt.u64.u32 %rd118, %r446; 2026-02-21T09:28:31.5864243Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:28:31.5864525Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:28:31.5865130Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5865710Z mov.b64 {%r1084, %r1085}, %rd120; 2026-02-21T09:28:31.5866049Z cvt.rn.f16x2.f32 %r1086, %r1085, %r1084; 2026-02-21T09:28:31.5866629Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5867197Z cvt.u64.u32 %rd121, %r447; 2026-02-21T09:28:31.5867485Z cvt.u64.u32 %rd122, %r448; 2026-02-21T09:28:31.5867769Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:28:31.5868165Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:28:31.5868752Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5869329Z mov.b64 {%r1087, %r1088}, %rd124; 2026-02-21T09:28:31.5869655Z cvt.rn.f16x2.f32 %r1089, %r1088, %r1087; 
2026-02-21T09:28:31.5870223Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5870799Z cvt.u64.u32 %rd125, %r449; 2026-02-21T09:28:31.5871084Z cvt.u64.u32 %rd126, %r450; 2026-02-21T09:28:31.5871358Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:28:31.5871645Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:28:31.5872164Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5872736Z mov.b64 {%r1090, %r1091}, %rd128; 2026-02-21T09:28:31.5873082Z cvt.rn.f16x2.f32 %r1092, %r1091, %r1090; 2026-02-21T09:28:31.5873666Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5874245Z cvt.u64.u32 %rd129, %r451; 2026-02-21T09:28:31.5874534Z cvt.u64.u32 %rd130, %r452; 2026-02-21T09:28:31.5874863Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:28:31.5875157Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:28:31.5875677Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5876247Z mov.b64 {%r1093, %r1094}, %rd132; 2026-02-21T09:28:31.5876573Z cvt.rn.f16x2.f32 %r1095, %r1094, %r1093; 2026-02-21T09:28:31.5877130Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5877706Z cvt.u64.u32 %rd133, %r453; 2026-02-21T09:28:31.5877986Z cvt.u64.u32 %rd134, %r454; 2026-02-21T09:28:31.5878306Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:28:31.5878649Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:28:31.5879284Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5879979Z mov.b64 {%r1096, %r1097}, %rd136; 2026-02-21T09:28:31.5880352Z cvt.rn.f16x2.f32 %r1098, %r1097, %r1096; 2026-02-21T09:28:31.5880946Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5881525Z cvt.u64.u32 %rd137, %r455; 2026-02-21T09:28:31.5881813Z cvt.u64.u32 %rd138, %r456; 2026-02-21T09:28:31.5882104Z shl.b64 %rd139, %rd138, 
32; 2026-02-21T09:28:31.5882406Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:28:31.5882944Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5883538Z mov.b64 {%r1099, %r1100}, %rd140; 2026-02-21T09:28:31.5883888Z cvt.rn.f16x2.f32 %r1101, %r1100, %r1099; 2026-02-21T09:28:31.5884462Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5885098Z cvt.u64.u32 %rd141, %r457; 2026-02-21T09:28:31.5885394Z cvt.u64.u32 %rd142, %r458; 2026-02-21T09:28:31.5885694Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:28:31.5886080Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:28:31.5886670Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5887235Z mov.b64 {%r1102, %r1103}, %rd144; 2026-02-21T09:28:31.5887555Z cvt.rn.f16x2.f32 %r1104, %r1103, %r1102; 2026-02-21T09:28:31.5888164Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5888747Z cvt.u64.u32 %rd145, %r460; 2026-02-21T09:28:31.5889033Z cvt.u64.u32 %rd146, %r461; 2026-02-21T09:28:31.5889325Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:28:31.5889641Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:28:31.5890183Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5890781Z mov.b64 {%r1105, %r1106}, %rd148; 2026-02-21T09:28:31.5891174Z cvt.rn.f16x2.f32 %r1107, %r1106, %r1105; 2026-02-21T09:28:31.5891776Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5892384Z cvt.u64.u32 %rd149, %r462; 2026-02-21T09:28:31.5892684Z cvt.u64.u32 %rd150, %r463; 2026-02-21T09:28:31.5892982Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:28:31.5893273Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:28:31.5893818Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5894392Z mov.b64 {%r1108, %r1109}, %rd152; 2026-02-21T09:28:31.5894770Z 
cvt.rn.f16x2.f32 %r1110, %r1109, %r1108; 2026-02-21T09:28:31.5895355Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5895947Z cvt.u64.u32 %rd153, %r464; 2026-02-21T09:28:31.5896255Z cvt.u64.u32 %rd154, %r465; 2026-02-21T09:28:31.5896529Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:28:31.5896841Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:28:31.5897383Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5897991Z mov.b64 {%r1111, %r1112}, %rd156; 2026-02-21T09:28:31.5898336Z cvt.rn.f16x2.f32 %r1113, %r1112, %r1111; 2026-02-21T09:28:31.5898899Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5899466Z cvt.u64.u32 %rd157, %r466; 2026-02-21T09:28:31.5899756Z cvt.u64.u32 %rd158, %r467; 2026-02-21T09:28:31.5900047Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:28:31.5900340Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:28:31.5900878Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5901451Z mov.b64 {%r1114, %r1115}, %rd160; 2026-02-21T09:28:31.5901795Z cvt.rn.f16x2.f32 %r1116, %r1115, %r1114; 2026-02-21T09:28:31.5902384Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5902972Z cvt.u64.u32 %rd161, %r468; 2026-02-21T09:28:31.5903285Z cvt.u64.u32 %rd162, %r469; 2026-02-21T09:28:31.5903570Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:28:31.5903861Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:28:31.5904403Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5905044Z mov.b64 {%r1117, %r1118}, %rd164; 2026-02-21T09:28:31.5905375Z cvt.rn.f16x2.f32 %r1119, %r1118, %r1117; 2026-02-21T09:28:31.5905957Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5906556Z cvt.u64.u32 %rd165, %r470; 2026-02-21T09:28:31.5906844Z cvt.u64.u32 %rd166, %r471; 
2026-02-21T09:28:31.5907158Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:28:31.5907451Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:28:31.5908002Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5908596Z mov.b64 {%r1120, %r1121}, %rd168; 2026-02-21T09:28:31.5909008Z cvt.rn.f16x2.f32 %r1122, %r1121, %r1120; 2026-02-21T09:28:31.5909646Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5910234Z cvt.u64.u32 %rd169, %r472; 2026-02-21T09:28:31.5910540Z cvt.u64.u32 %rd170, %r473; 2026-02-21T09:28:31.5910822Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:28:31.5911128Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:28:31.5911664Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5912264Z mov.b64 {%r1123, %r1124}, %rd172; 2026-02-21T09:28:31.5912608Z cvt.rn.f16x2.f32 %r1125, %r1124, %r1123; 2026-02-21T09:28:31.5913200Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5913794Z cvt.u64.u32 %rd173, %r474; 2026-02-21T09:28:31.5914130Z cvt.u64.u32 %rd174, %r475; 2026-02-21T09:28:31.5914473Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:28:31.5914826Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:28:31.5915415Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5916155Z mov.b64 {%r1126, %r1127}, %rd176; 2026-02-21T09:28:31.5916514Z cvt.rn.f16x2.f32 %r1128, %r1127, %r1126; 2026-02-21T09:28:31.5917106Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5917701Z cvt.u64.u32 %rd177, %r477; 2026-02-21T09:28:31.5917997Z cvt.u64.u32 %rd178, %r478; 2026-02-21T09:28:31.5918288Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:28:31.5918603Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:28:31.5919152Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5919705Z mov.b64 {%r1129, 
%r1130}, %rd180; 2026-02-21T09:28:31.5920052Z cvt.rn.f16x2.f32 %r1131, %r1130, %r1129; 2026-02-21T09:28:31.5920616Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5921222Z cvt.u64.u32 %rd181, %r479; 2026-02-21T09:28:31.5921524Z cvt.u64.u32 %rd182, %r480; 2026-02-21T09:28:31.5921812Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:28:31.5922108Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:28:31.5922663Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5923247Z mov.b64 {%r1132, %r1133}, %rd184; 2026-02-21T09:28:31.5923579Z cvt.rn.f16x2.f32 %r1134, %r1133, %r1132; 2026-02-21T09:28:31.5924157Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5924773Z cvt.u64.u32 %rd185, %r481; 2026-02-21T09:28:31.5925077Z cvt.u64.u32 %rd186, %r482; 2026-02-21T09:28:31.5925361Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:28:31.5925657Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:28:31.5926174Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5926772Z mov.b64 {%r1135, %r1136}, %rd188; 2026-02-21T09:28:31.5927105Z cvt.rn.f16x2.f32 %r1137, %r1136, %r1135; 2026-02-21T09:28:31.5927684Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5928259Z cvt.u64.u32 %rd189, %r483; 2026-02-21T09:28:31.5928541Z cvt.u64.u32 %rd190, %r484; 2026-02-21T09:28:31.5928835Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:28:31.5929122Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:28:31.5929658Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5930248Z mov.b64 {%r1138, %r1139}, %rd192; 2026-02-21T09:28:31.5930573Z cvt.rn.f16x2.f32 %r1140, %r1139, %r1138; 2026-02-21T09:28:31.5931143Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5931733Z cvt.u64.u32 %rd193, %r485; 
2026-02-21T09:28:31.5932093Z cvt.u64.u32 %rd194, %r486; 2026-02-21T09:28:31.5932420Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:28:31.5932716Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:28:31.5933221Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5933760Z mov.b64 {%r1141, %r1142}, %rd196; 2026-02-21T09:28:31.5934051Z cvt.rn.f16x2.f32 %r1143, %r1142, %r1141; 2026-02-21T09:28:31.5934555Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5935097Z cvt.u64.u32 %rd197, %r487; 2026-02-21T09:28:31.5935352Z cvt.u64.u32 %rd198, %r488; 2026-02-21T09:28:31.5935612Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:28:31.5935865Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:28:31.5936384Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5936959Z mov.b64 {%r1144, %r1145}, %rd200; 2026-02-21T09:28:31.5937237Z cvt.rn.f16x2.f32 %r1146, %r1145, %r1144; 2026-02-21T09:28:31.5937736Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5938227Z cvt.u64.u32 %rd201, %r489; 2026-02-21T09:28:31.5938472Z cvt.u64.u32 %rd202, %r490; 2026-02-21T09:28:31.5938719Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:28:31.5938980Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:28:31.5939450Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5940005Z mov.b64 {%r1147, %r1148}, %rd204; 2026-02-21T09:28:31.5940321Z cvt.rn.f16x2.f32 %r1149, %r1148, %r1147; 2026-02-21T09:28:31.5940838Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5941356Z cvt.u64.u32 %rd205, %r491; 2026-02-21T09:28:31.5941608Z cvt.u64.u32 %rd206, %r492; 2026-02-21T09:28:31.5941873Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:28:31.5942140Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:28:31.5942624Z .loc 1 58 27 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5943167Z mov.b64 {%r1150, %r1151}, %rd208; 2026-02-21T09:28:31.5943456Z cvt.rn.f16x2.f32 %r1152, %r1151, %r1150; 2026-02-21T09:28:31.5943975Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5944505Z cvt.u64.u32 %rd209, %r494; 2026-02-21T09:28:31.5944869Z cvt.u64.u32 %rd210, %r495; 2026-02-21T09:28:31.5945141Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:28:31.5945414Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:28:31.5945903Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5946420Z mov.b64 {%r1153, %r1154}, %rd212; 2026-02-21T09:28:31.5946726Z cvt.rn.f16x2.f32 %r1155, %r1154, %r1153; 2026-02-21T09:28:31.5947244Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5947770Z cvt.u64.u32 %rd213, %r496; 2026-02-21T09:28:31.5948028Z cvt.u64.u32 %rd214, %r497; 2026-02-21T09:28:31.5948296Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:28:31.5948596Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:28:31.5949101Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5949631Z mov.b64 {%r1156, %r1157}, %rd216; 2026-02-21T09:28:31.5949928Z cvt.rn.f16x2.f32 %r1158, %r1157, %r1156; 2026-02-21T09:28:31.5950456Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5950969Z cvt.u64.u32 %rd217, %r498; 2026-02-21T09:28:31.5951240Z cvt.u64.u32 %rd218, %r499; 2026-02-21T09:28:31.5951500Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:28:31.5951774Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:28:31.5952279Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5953022Z mov.b64 {%r1159, %r1160}, %rd220; 2026-02-21T09:28:31.5953334Z cvt.rn.f16x2.f32 %r1161, %r1160, %r1159; 2026-02-21T09:28:31.5953891Z .loc 1 56 52 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5954459Z cvt.u64.u32 %rd221, %r500; 2026-02-21T09:28:31.5954780Z cvt.u64.u32 %rd222, %r501; 2026-02-21T09:28:31.5955051Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:28:31.5955318Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:28:31.5955813Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5956340Z mov.b64 {%r1162, %r1163}, %rd224; 2026-02-21T09:28:31.5956635Z cvt.rn.f16x2.f32 %r1164, %r1163, %r1162; 2026-02-21T09:28:31.5957269Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5957837Z cvt.u64.u32 %rd225, %r502; 2026-02-21T09:28:31.5958121Z cvt.u64.u32 %rd226, %r503; 2026-02-21T09:28:31.5958420Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:28:31.5958693Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:28:31.5959194Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5959722Z mov.b64 {%r1165, %r1166}, %rd228; 2026-02-21T09:28:31.5960030Z cvt.rn.f16x2.f32 %r1167, %r1166, %r1165; 2026-02-21T09:28:31.5960540Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5961066Z cvt.u64.u32 %rd229, %r504; 2026-02-21T09:28:31.5961326Z cvt.u64.u32 %rd230, %r505; 2026-02-21T09:28:31.5961590Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:28:31.5961856Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:28:31.5962358Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5962945Z mov.b64 {%r1168, %r1169}, %rd232; 2026-02-21T09:28:31.5963250Z cvt.rn.f16x2.f32 %r1170, %r1169, %r1168; 2026-02-21T09:28:31.5963785Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5964311Z cvt.u64.u32 %rd233, %r506; 2026-02-21T09:28:31.5964581Z cvt.u64.u32 %rd234, %r507; 2026-02-21T09:28:31.5964868Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:28:31.5965127Z or.b64 
%rd236, %rd233, %rd235; 2026-02-21T09:28:31.5965618Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5966142Z mov.b64 {%r1171, %r1172}, %rd236; 2026-02-21T09:28:31.5966450Z cvt.rn.f16x2.f32 %r1173, %r1172, %r1171; 2026-02-21T09:28:31.5966965Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5967515Z cvt.u64.u32 %rd237, %r508; 2026-02-21T09:28:31.5967809Z cvt.u64.u32 %rd238, %r509; 2026-02-21T09:28:31.5968085Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:28:31.5968356Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:28:31.5968864Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5969401Z mov.b64 {%r1174, %r1175}, %rd240; 2026-02-21T09:28:31.5969698Z cvt.rn.f16x2.f32 %r1176, %r1175, %r1174; 2026-02-21T09:28:31.5970231Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5970759Z cvt.u64.u32 %rd241, %r511; 2026-02-21T09:28:31.5971028Z cvt.u64.u32 %rd242, %r512; 2026-02-21T09:28:31.5971287Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:28:31.5971564Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:28:31.5972112Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5972649Z mov.b64 {%r1177, %r1178}, %rd244; 2026-02-21T09:28:31.5972962Z cvt.rn.f16x2.f32 %r1179, %r1178, %r1177; 2026-02-21T09:28:31.5973488Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5974179Z cvt.u64.u32 %rd245, %r513; 2026-02-21T09:28:31.5974443Z cvt.u64.u32 %rd246, %r514; 2026-02-21T09:28:31.5974758Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:28:31.5975025Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:28:31.5975515Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5976045Z mov.b64 {%r1180, %r1181}, %rd248; 2026-02-21T09:28:31.5976343Z cvt.rn.f16x2.f32 %r1182, %r1181, %r1180; 
2026-02-21T09:28:31.5976925Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5977459Z cvt.u64.u32 %rd249, %r515; 2026-02-21T09:28:31.5977735Z cvt.u64.u32 %rd250, %r516; 2026-02-21T09:28:31.5977997Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:28:31.5978272Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:28:31.5978898Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5979445Z mov.b64 {%r1183, %r1184}, %rd252; 2026-02-21T09:28:31.5979748Z cvt.rn.f16x2.f32 %r1185, %r1184, %r1183; 2026-02-21T09:28:31.5980264Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5980794Z cvt.u64.u32 %rd253, %r517; 2026-02-21T09:28:31.5981100Z cvt.u64.u32 %rd254, %r518; 2026-02-21T09:28:31.5981368Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:28:31.5981631Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:28:31.5982124Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5982657Z mov.b64 {%r1186, %r1187}, %rd256; 2026-02-21T09:28:31.5982954Z cvt.rn.f16x2.f32 %r1188, %r1187, %r1186; 2026-02-21T09:28:31.5983480Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5984004Z cvt.u64.u32 %rd257, %r519; 2026-02-21T09:28:31.5984281Z cvt.u64.u32 %rd258, %r520; 2026-02-21T09:28:31.5984547Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:28:31.5984883Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:28:31.5985409Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5985966Z mov.b64 {%r1189, %r1190}, %rd260; 2026-02-21T09:28:31.5986280Z cvt.rn.f16x2.f32 %r1191, %r1190, %r1189; 2026-02-21T09:28:31.5986800Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5987329Z cvt.u64.u32 %rd261, %r521; 2026-02-21T09:28:31.5987592Z cvt.u64.u32 %rd262, %r522; 2026-02-21T09:28:31.5987858Z shl.b64 %rd263, %rd262, 
32; 2026-02-21T09:28:31.5988125Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:28:31.5988620Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5989159Z mov.b64 {%r1192, %r1193}, %rd264; 2026-02-21T09:28:31.5989463Z cvt.rn.f16x2.f32 %r1194, %r1193, %r1192; 2026-02-21T09:28:31.5990044Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5990588Z cvt.u64.u32 %rd265, %r523; 2026-02-21T09:28:31.5990862Z cvt.u64.u32 %rd266, %r524; 2026-02-21T09:28:31.5991129Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:28:31.5991399Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:28:31.5991899Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5992424Z mov.b64 {%r1195, %r1196}, %rd268; 2026-02-21T09:28:31.5992728Z cvt.rn.f16x2.f32 %r1197, %r1196, %r1195; 2026-02-21T09:28:31.5993246Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5993782Z cvt.u64.u32 %rd269, %r525; 2026-02-21T09:28:31.5994042Z cvt.u64.u32 %rd270, %r526; 2026-02-21T09:28:31.5994357Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:28:31.5994633Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:28:31.5995302Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5995901Z mov.b64 {%r1198, %r1199}, %rd272; 2026-02-21T09:28:31.5996199Z cvt.rn.f16x2.f32 %r1200, %r1199, %r1198; 2026-02-21T09:28:31.5996717Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.5997236Z cvt.u64.u32 %rd273, %r528; 2026-02-21T09:28:31.5997504Z cvt.u64.u32 %rd274, %r529; 2026-02-21T09:28:31.5997767Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:28:31.5998045Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:28:31.5998540Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.5999123Z mov.b64 {%r1201, %r1202}, %rd276; 2026-02-21T09:28:31.5999433Z 
cvt.rn.f16x2.f32 %r1203, %r1202, %r1201; 2026-02-21T09:28:31.6000084Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6000626Z cvt.u64.u32 %rd277, %r530; 2026-02-21T09:28:31.6000884Z cvt.u64.u32 %rd278, %r531; 2026-02-21T09:28:31.6001156Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:28:31.6001429Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:28:31.6001910Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6002447Z mov.b64 {%r1204, %r1205}, %rd280; 2026-02-21T09:28:31.6002741Z cvt.rn.f16x2.f32 %r1206, %r1205, %r1204; 2026-02-21T09:28:31.6003319Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6003859Z cvt.u64.u32 %rd281, %r532; 2026-02-21T09:28:31.6004135Z cvt.u64.u32 %rd282, %r533; 2026-02-21T09:28:31.6004401Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:28:31.6004721Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:28:31.6005233Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6005764Z mov.b64 {%r1207, %r1208}, %rd284; 2026-02-21T09:28:31.6006078Z cvt.rn.f16x2.f32 %r1209, %r1208, %r1207; 2026-02-21T09:28:31.6006594Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6007124Z cvt.u64.u32 %rd285, %r534; 2026-02-21T09:28:31.6007390Z cvt.u64.u32 %rd286, %r535; 2026-02-21T09:28:31.6007698Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:28:31.6007970Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:28:31.6008464Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6009005Z mov.b64 {%r1210, %r1211}, %rd288; 2026-02-21T09:28:31.6009305Z cvt.rn.f16x2.f32 %r1212, %r1211, %r1210; 2026-02-21T09:28:31.6009828Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6010357Z cvt.u64.u32 %rd289, %r536; 2026-02-21T09:28:31.6010631Z cvt.u64.u32 %rd290, %r537; 
2026-02-21T09:28:31.6010894Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:28:31.6011177Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:28:31.6011669Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6012254Z mov.b64 {%r1213, %r1214}, %rd292; 2026-02-21T09:28:31.6012565Z cvt.rn.f16x2.f32 %r1215, %r1214, %r1213; 2026-02-21T09:28:31.6013087Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6013621Z cvt.u64.u32 %rd293, %r538; 2026-02-21T09:28:31.6013886Z cvt.u64.u32 %rd294, %r539; 2026-02-21T09:28:31.6014153Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:28:31.6014429Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:28:31.6014978Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6015522Z mov.b64 {%r1216, %r1217}, %rd296; 2026-02-21T09:28:31.6015823Z cvt.rn.f16x2.f32 %r1218, %r1217, %r1216; 2026-02-21T09:28:31.6016501Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6017095Z cvt.u64.u32 %rd297, %r540; 2026-02-21T09:28:31.6017366Z cvt.u64.u32 %rd298, %r541; 2026-02-21T09:28:31.6017635Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:28:31.6017909Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:28:31.6018403Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6018927Z mov.b64 {%r1219, %r1220}, %rd300; 2026-02-21T09:28:31.6019237Z cvt.rn.f16x2.f32 %r1221, %r1220, %r1219; 2026-02-21T09:28:31.6019762Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6020295Z cvt.u64.u32 %rd301, %r542; 2026-02-21T09:28:31.6020558Z cvt.u64.u32 %rd302, %r543; 2026-02-21T09:28:31.6020951Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:28:31.6021283Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:28:31.6021795Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6022335Z mov.b64 {%r1222, 
%r1223}, %rd304; 2026-02-21T09:28:31.6022630Z cvt.rn.f16x2.f32 %r1224, %r1223, %r1222; 2026-02-21T09:28:31.6023153Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6023671Z cvt.u64.u32 %rd305, %r545; 2026-02-21T09:28:31.6023938Z cvt.u64.u32 %rd306, %r546; 2026-02-21T09:28:31.6024198Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:28:31.6024474Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:28:31.6025044Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6025639Z mov.b64 {%r1225, %r1226}, %rd308; 2026-02-21T09:28:31.6025953Z cvt.rn.f16x2.f32 %r1227, %r1226, %r1225; 2026-02-21T09:28:31.6026479Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6027064Z cvt.u64.u32 %rd309, %r547; 2026-02-21T09:28:31.6027336Z cvt.u64.u32 %rd310, %r548; 2026-02-21T09:28:31.6027615Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:28:31.6027907Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:28:31.6028398Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6028929Z mov.b64 {%r1228, %r1229}, %rd312; 2026-02-21T09:28:31.6029230Z cvt.rn.f16x2.f32 %r1230, %r1229, %r1228; 2026-02-21T09:28:31.6029755Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6030271Z cvt.u64.u32 %rd313, %r549; 2026-02-21T09:28:31.6030541Z cvt.u64.u32 %rd314, %r550; 2026-02-21T09:28:31.6030810Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:28:31.6031078Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:28:31.6031599Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6032157Z mov.b64 {%r1231, %r1232}, %rd316; 2026-02-21T09:28:31.6032500Z cvt.rn.f16x2.f32 %r1233, %r1232, %r1231; 2026-02-21T09:28:31.6033040Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6033567Z cvt.u64.u32 %rd317, %r551; 
2026-02-21T09:28:31.6033832Z cvt.u64.u32 %rd318, %r552; 2026-02-21T09:28:31.6034103Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:28:31.6034373Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:28:31.6034908Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6035442Z mov.b64 {%r1234, %r1235}, %rd320; 2026-02-21T09:28:31.6035736Z cvt.rn.f16x2.f32 %r1236, %r1235, %r1234; 2026-02-21T09:28:31.6036281Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6036834Z cvt.u64.u32 %rd321, %r553; 2026-02-21T09:28:31.6037108Z cvt.u64.u32 %rd322, %r554; 2026-02-21T09:28:31.6037497Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:28:31.6037809Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:28:31.6038300Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6038818Z mov.b64 {%r1237, %r1238}, %rd324; 2026-02-21T09:28:31.6039118Z cvt.rn.f16x2.f32 %r1239, %r1238, %r1237; 2026-02-21T09:28:31.6039634Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6040170Z cvt.u64.u32 %rd325, %r555; 2026-02-21T09:28:31.6040463Z cvt.u64.u32 %rd326, %r556; 2026-02-21T09:28:31.6040728Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:28:31.6040997Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:28:31.6041480Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6041665Z mov.b64 {%r1240, %r1241}, %rd328; 2026-02-21T09:28:31.6041834Z cvt.rn.f16x2.f32 %r1242, %r1241, %r1240; 2026-02-21T09:28:31.6042148Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6042249Z cvt.u64.u32 %rd329, %r557; 2026-02-21T09:28:31.6042353Z cvt.u64.u32 %rd330, %r558; 2026-02-21T09:28:31.6042446Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:28:31.6042543Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:28:31.6042859Z .loc 1 58 27 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6042955Z mov.b64 {%r1243, %r1244}, %rd332; 2026-02-21T09:28:31.6043066Z cvt.rn.f16x2.f32 %r1245, %r1244, %r1243; 2026-02-21T09:28:31.6043368Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6043472Z cvt.u64.u32 %rd333, %r559; 2026-02-21T09:28:31.6043566Z cvt.u64.u32 %rd334, %r560; 2026-02-21T09:28:31.6043662Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:28:31.6043768Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:28:31.6044081Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6044182Z mov.b64 {%r1246, %r1247}, %rd336; 2026-02-21T09:28:31.6044301Z cvt.rn.f16x2.f32 %r1248, %r1247, %r1246; 2026-02-21T09:28:31.6044608Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6044770Z cvt.u64.u32 %rd337, %r562; 2026-02-21T09:28:31.6044885Z cvt.u64.u32 %rd338, %r563; 2026-02-21T09:28:31.6044999Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:28:31.6045094Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:28:31.6045415Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6045518Z mov.b64 {%r1249, %r1250}, %rd340; 2026-02-21T09:28:31.6045625Z cvt.rn.f16x2.f32 %r1251, %r1250, %r1249; 2026-02-21T09:28:31.6045937Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6046045Z cvt.u64.u32 %rd341, %r564; 2026-02-21T09:28:31.6046141Z cvt.u64.u32 %rd342, %r565; 2026-02-21T09:28:31.6046236Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:28:31.6046331Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:28:31.6046647Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6046746Z mov.b64 {%r1252, %r1253}, %rd344; 2026-02-21T09:28:31.6046857Z cvt.rn.f16x2.f32 %r1254, %r1253, %r1252; 2026-02-21T09:28:31.6047173Z .loc 1 56 52 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6047270Z cvt.u64.u32 %rd345, %r566; 2026-02-21T09:28:31.6047366Z cvt.u64.u32 %rd346, %r567; 2026-02-21T09:28:31.6047485Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:28:31.6047584Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:28:31.6047900Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6048089Z mov.b64 {%r1255, %r1256}, %rd348; 2026-02-21T09:28:31.6048252Z cvt.rn.f16x2.f32 %r1257, %r1256, %r1255; 2026-02-21T09:28:31.6048557Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6048651Z cvt.u64.u32 %rd349, %r568; 2026-02-21T09:28:31.6048754Z cvt.u64.u32 %rd350, %r569; 2026-02-21T09:28:31.6048850Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:28:31.6048950Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:28:31.6049278Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6049405Z mov.b64 {%r1258, %r1259}, %rd352; 2026-02-21T09:28:31.6049545Z cvt.rn.f16x2.f32 %r1260, %r1259, %r1258; 2026-02-21T09:28:31.6049905Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6050097Z cvt.u64.u32 %rd353, %r570; 2026-02-21T09:28:31.6050259Z cvt.u64.u32 %rd354, %r571; 2026-02-21T09:28:31.6050379Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:28:31.6050492Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:28:31.6050832Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6050940Z mov.b64 {%r1261, %r1262}, %rd356; 2026-02-21T09:28:31.6051062Z cvt.rn.f16x2.f32 %r1263, %r1262, %r1261; 2026-02-21T09:28:31.6051401Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6051510Z cvt.u64.u32 %rd357, %r572; 2026-02-21T09:28:31.6051612Z cvt.u64.u32 %rd358, %r573; 2026-02-21T09:28:31.6051719Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:28:31.6051823Z or.b64 
%rd360, %rd357, %rd359; 2026-02-21T09:28:31.6052138Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6052247Z mov.b64 {%r1264, %r1265}, %rd360; 2026-02-21T09:28:31.6052362Z cvt.rn.f16x2.f32 %r1266, %r1265, %r1264; 2026-02-21T09:28:31.6052682Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6052789Z cvt.u64.u32 %rd361, %r574; 2026-02-21T09:28:31.6052884Z cvt.u64.u32 %rd362, %r575; 2026-02-21T09:28:31.6052980Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:28:31.6053076Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:28:31.6053393Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6053489Z mov.b64 {%r1267, %r1268}, %rd364; 2026-02-21T09:28:31.6053599Z cvt.rn.f16x2.f32 %r1269, %r1268, %r1267; 2026-02-21T09:28:31.6053920Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6054019Z cvt.u64.u32 %rd365, %r576; 2026-02-21T09:28:31.6054115Z cvt.u64.u32 %rd366, %r577; 2026-02-21T09:28:31.6054219Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:28:31.6054318Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:28:31.6054632Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6054814Z mov.b64 {%r1270, %r1271}, %rd368; 2026-02-21T09:28:31.6054940Z cvt.rn.f16x2.f32 %r1272, %r1271, %r1270; 2026-02-21T09:28:31.6055255Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6055351Z cvt.u64.u32 %rd369, %r579; 2026-02-21T09:28:31.6055457Z cvt.u64.u32 %rd370, %r580; 2026-02-21T09:28:31.6055553Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:28:31.6055649Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:28:31.6055971Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6056067Z mov.b64 {%r1273, %r1274}, %rd372; 2026-02-21T09:28:31.6056176Z cvt.rn.f16x2.f32 %r1275, %r1274, %r1273; 
2026-02-21T09:28:31.6056558Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6056758Z cvt.u64.u32 %rd373, %r581; 2026-02-21T09:28:31.6056894Z cvt.u64.u32 %rd374, %r582; 2026-02-21T09:28:31.6056989Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:28:31.6057094Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:28:31.6057430Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6057526Z mov.b64 {%r1276, %r1277}, %rd376; 2026-02-21T09:28:31.6057641Z cvt.rn.f16x2.f32 %r1278, %r1277, %r1276; 2026-02-21T09:28:31.6057953Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6058049Z cvt.u64.u32 %rd377, %r583; 2026-02-21T09:28:31.6058144Z cvt.u64.u32 %rd378, %r584; 2026-02-21T09:28:31.6058248Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:28:31.6058346Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:28:31.6058726Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6058876Z mov.b64 {%r1279, %r1280}, %rd380; 2026-02-21T09:28:31.6058996Z cvt.rn.f16x2.f32 %r1281, %r1280, %r1279; 2026-02-21T09:28:31.6059311Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6059416Z cvt.u64.u32 %rd381, %r585; 2026-02-21T09:28:31.6059510Z cvt.u64.u32 %rd382, %r586; 2026-02-21T09:28:31.6059607Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:28:31.6059703Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:28:31.6060022Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6060118Z mov.b64 {%r1282, %r1283}, %rd384; 2026-02-21T09:28:31.6060228Z cvt.rn.f16x2.f32 %r1284, %r1283, %r1282; 2026-02-21T09:28:31.6060548Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6060649Z cvt.u64.u32 %rd385, %r587; 2026-02-21T09:28:31.6060748Z cvt.u64.u32 %rd386, %r588; 2026-02-21T09:28:31.6060860Z shl.b64 %rd387, %rd386, 
32; 2026-02-21T09:28:31.6060961Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:28:31.6061312Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6061418Z mov.b64 {%r1285, %r1286}, %rd388; 2026-02-21T09:28:31.6061541Z cvt.rn.f16x2.f32 %r1287, %r1286, %r1285; 2026-02-21T09:28:31.6061868Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6061962Z cvt.u64.u32 %rd389, %r589; 2026-02-21T09:28:31.6062069Z cvt.u64.u32 %rd390, %r590; 2026-02-21T09:28:31.6062162Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:28:31.6062258Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:28:31.6062577Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6062679Z mov.b64 {%r1288, %r1289}, %rd392; 2026-02-21T09:28:31.6062794Z cvt.rn.f16x2.f32 %r1290, %r1289, %r1288; 2026-02-21T09:28:31.6063114Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6063228Z cvt.u64.u32 %rd393, %r591; 2026-02-21T09:28:31.6063323Z cvt.u64.u32 %rd394, %r592; 2026-02-21T09:28:31.6063419Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:28:31.6063525Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:28:31.6063842Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6063938Z mov.b64 {%r1291, %r1292}, %rd396; 2026-02-21T09:28:31.6064061Z cvt.rn.f16x2.f32 %r1293, %r1292, %r1291; 2026-02-21T09:28:31.6064375Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6064473Z cvt.u64.u32 %rd397, %r593; 2026-02-21T09:28:31.6064569Z cvt.u64.u32 %rd398, %r594; 2026-02-21T09:28:31.6064752Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:28:31.6064855Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:28:31.6065179Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6065420Z mov.b64 {%r1294, %r1295}, %rd400; 2026-02-21T09:28:31.6065534Z 
cvt.rn.f16x2.f32 %r1296, %r1295, %r1294; 2026-02-21T09:28:31.6065848Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6065955Z cvt.u64.u32 %rd401, %r596; 2026-02-21T09:28:31.6066071Z cvt.u64.u32 %rd402, %r597; 2026-02-21T09:28:31.6066183Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:28:31.6066283Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:28:31.6066613Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6066706Z mov.b64 {%r1297, %r1298}, %rd404; 2026-02-21T09:28:31.6066815Z cvt.rn.f16x2.f32 %r1299, %r1298, %r1297; 2026-02-21T09:28:31.6067214Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6067381Z cvt.u64.u32 %rd405, %r598; 2026-02-21T09:28:31.6067486Z cvt.u64.u32 %rd406, %r599; 2026-02-21T09:28:31.6067592Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:28:31.6067689Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:28:31.6068000Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6068097Z mov.b64 {%r1300, %r1301}, %rd408; 2026-02-21T09:28:31.6068218Z cvt.rn.f16x2.f32 %r1302, %r1301, %r1300; 2026-02-21T09:28:31.6068529Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6068627Z cvt.u64.u32 %rd409, %r600; 2026-02-21T09:28:31.6068731Z cvt.u64.u32 %rd410, %r601; 2026-02-21T09:28:31.6068825Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:28:31.6068923Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T09:28:31.6069248Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6069345Z mov.b64 {%r1303, %r1304}, %rd412; 2026-02-21T09:28:31.6069458Z cvt.rn.f16x2.f32 %r1305, %r1304, %r1303; 2026-02-21T09:28:31.6069769Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6069876Z cvt.u64.u32 %rd413, %r602; 2026-02-21T09:28:31.6069970Z cvt.u64.u32 %rd414, %r603; 
2026-02-21T09:28:31.6070066Z shl.b64 %rd415, %rd414, 32; 2026-02-21T09:28:31.6070172Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:28:31.6070487Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6070586Z mov.b64 {%r1306, %r1307}, %rd416; 2026-02-21T09:28:31.6070709Z cvt.rn.f16x2.f32 %r1308, %r1307, %r1306; 2026-02-21T09:28:31.6071084Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6071181Z cvt.u64.u32 %rd417, %r604; 2026-02-21T09:28:31.6071273Z cvt.u64.u32 %rd418, %r605; 2026-02-21T09:28:31.6071376Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:28:31.6071475Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:28:31.6071802Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6071906Z mov.b64 {%r1309, %r1310}, %rd420; 2026-02-21T09:28:31.6072018Z cvt.rn.f16x2.f32 %r1311, %r1310, %r1309; 2026-02-21T09:28:31.6072328Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6072435Z cvt.u64.u32 %rd421, %r606; 2026-02-21T09:28:31.6072530Z cvt.u64.u32 %rd422, %r607; 2026-02-21T09:28:31.6072629Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:28:31.6072728Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:28:31.6073048Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6073146Z mov.b64 {%r1312, %r1313}, %rd424; 2026-02-21T09:28:31.6073260Z cvt.rn.f16x2.f32 %r1314, %r1313, %r1312; 2026-02-21T09:28:31.6073586Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6073801Z cvt.u64.u32 %rd425, %r608; 2026-02-21T09:28:31.6073900Z cvt.u64.u32 %rd426, %r609; 2026-02-21T09:28:31.6074008Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:28:31.6074107Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:28:31.6074419Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6074517Z mov.b64 {%r1315, 
%r1316}, %rd428; 2026-02-21T09:28:31.6074637Z cvt.rn.f16x2.f32 %r1317, %r1316, %r1315; 2026-02-21T09:28:31.6075013Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6075112Z cvt.u64.u32 %rd429, %r610; 2026-02-21T09:28:31.6075218Z cvt.u64.u32 %rd430, %r611; 2026-02-21T09:28:31.6075312Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:28:31.6075471Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:28:31.6075879Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6075982Z mov.b64 {%r1318, %r1319}, %rd432; 2026-02-21T09:28:31.6076090Z cvt.rn.f16x2.f32 %r1320, %r1319, %r1318; 2026-02-21T09:28:31.6076412Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6076513Z cvt.u64.u32 %rd433, %r613; 2026-02-21T09:28:31.6076606Z cvt.u64.u32 %rd434, %r614; 2026-02-21T09:28:31.6076698Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:28:31.6076803Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T09:28:31.6077116Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6077213Z mov.b64 {%r1321, %r1322}, %rd436; 2026-02-21T09:28:31.6077331Z cvt.rn.f16x2.f32 %r1323, %r1322, %r1321; 2026-02-21T09:28:31.6077645Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6077745Z cvt.u64.u32 %rd437, %r615; 2026-02-21T09:28:31.6077843Z cvt.u64.u32 %rd438, %r616; 2026-02-21T09:28:31.6077949Z shl.b64 %rd439, %rd438, 32; 2026-02-21T09:28:31.6078046Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:28:31.6078359Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6078467Z mov.b64 {%r1324, %r1325}, %rd440; 2026-02-21T09:28:31.6078580Z cvt.rn.f16x2.f32 %r1326, %r1325, %r1324; 2026-02-21T09:28:31.6078893Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6078998Z cvt.u64.u32 %rd441, %r617; 
2026-02-21T09:28:31.6079094Z cvt.u64.u32 %rd442, %r618; 2026-02-21T09:28:31.6079190Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:28:31.6079288Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:28:31.6079610Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6079712Z mov.b64 {%r1327, %r1328}, %rd444; 2026-02-21T09:28:31.6079825Z cvt.rn.f16x2.f32 %r1329, %r1328, %r1327; 2026-02-21T09:28:31.6080151Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6080261Z cvt.u64.u32 %rd445, %r619; 2026-02-21T09:28:31.6080372Z cvt.u64.u32 %rd446, %r620; 2026-02-21T09:28:31.6080486Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:28:31.6080584Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:28:31.6080904Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6081001Z mov.b64 {%r1330, %r1331}, %rd448; 2026-02-21T09:28:31.6081118Z cvt.rn.f16x2.f32 %r1332, %r1331, %r1330; 2026-02-21T09:28:31.6081431Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6081531Z cvt.u64.u32 %rd449, %r621; 2026-02-21T09:28:31.6081639Z cvt.u64.u32 %rd450, %r622; 2026-02-21T09:28:31.6081739Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:28:31.6081929Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:28:31.6082299Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6082398Z mov.b64 {%r1333, %r1334}, %rd452; 2026-02-21T09:28:31.6082508Z cvt.rn.f16x2.f32 %r1335, %r1334, %r1333; 2026-02-21T09:28:31.6082824Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6082929Z cvt.u64.u32 %rd453, %r623; 2026-02-21T09:28:31.6083024Z cvt.u64.u32 %rd454, %r624; 2026-02-21T09:28:31.6083122Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:28:31.6083228Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:28:31.6083542Z .loc 1 58 27 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6083640Z mov.b64 {%r1336, %r1337}, %rd456; 2026-02-21T09:28:31.6083796Z cvt.rn.f16x2.f32 %r1338, %r1337, %r1336; 2026-02-21T09:28:31.6084144Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6084248Z cvt.u64.u32 %rd457, %r625; 2026-02-21T09:28:31.6084344Z cvt.u64.u32 %rd458, %r626; 2026-02-21T09:28:31.6084452Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:28:31.6084549Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:28:31.6084925Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6085049Z mov.b64 {%r1339, %r1340}, %rd460; 2026-02-21T09:28:31.6085172Z cvt.rn.f16x2.f32 %r1341, %r1340, %r1339; 2026-02-21T09:28:31.6085490Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6085592Z cvt.u64.u32 %rd461, %r627; 2026-02-21T09:28:31.6085688Z cvt.u64.u32 %rd462, %r628; 2026-02-21T09:28:31.6085782Z shl.b64 %rd463, %rd462, 32; 2026-02-21T09:28:31.6085881Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:28:31.6086205Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6086312Z mov.b64 {%r1342, %r1343}, %rd464; 2026-02-21T09:28:31.6086422Z cvt.rn.f16x2.f32 %r1344, %r1343, %r1342; 2026-02-21T09:28:31.6086741Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6086839Z cvt.u64.u32 %rd465, %r630; 2026-02-21T09:28:31.6086936Z cvt.u64.u32 %rd466, %r631; 2026-02-21T09:28:31.6087042Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:28:31.6087142Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:28:31.6087452Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6087551Z mov.b64 {%r1345, %r1346}, %rd468; 2026-02-21T09:28:31.6087674Z cvt.rn.f16x2.f32 %r1347, %r1346, %r1345; 2026-02-21T09:28:31.6087986Z .loc 1 56 52 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6088085Z cvt.u64.u32 %rd469, %r632; 2026-02-21T09:28:31.6088192Z cvt.u64.u32 %rd470, %r633; 2026-02-21T09:28:31.6088292Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:28:31.6088390Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:28:31.6088712Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6088811Z mov.b64 {%r1348, %r1349}, %rd472; 2026-02-21T09:28:31.6088924Z cvt.rn.f16x2.f32 %r1350, %r1349, %r1348; 2026-02-21T09:28:31.6089234Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6089337Z cvt.u64.u32 %rd473, %r634; 2026-02-21T09:28:31.6089434Z cvt.u64.u32 %rd474, %r635; 2026-02-21T09:28:31.6089530Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:28:31.6089653Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:28:31.6089995Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6090096Z mov.b64 {%r1351, %r1352}, %rd476; 2026-02-21T09:28:31.6090311Z cvt.rn.f16x2.f32 %r1353, %r1352, %r1351; 2026-02-21T09:28:31.6090685Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6090780Z cvt.u64.u32 %rd477, %r636; 2026-02-21T09:28:31.6090881Z cvt.u64.u32 %rd478, %r637; 2026-02-21T09:28:31.6090985Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:28:31.6091085Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:28:31.6091402Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6091510Z mov.b64 {%r1354, %r1355}, %rd480; 2026-02-21T09:28:31.6091623Z cvt.rn.f16x2.f32 %r1356, %r1355, %r1354; 2026-02-21T09:28:31.6091932Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6092037Z cvt.u64.u32 %rd481, %r638; 2026-02-21T09:28:31.6092192Z cvt.u64.u32 %rd482, %r639; 2026-02-21T09:28:31.6092333Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:28:31.6092436Z or.b64 
%rd484, %rd481, %rd483; 2026-02-21T09:28:31.6092795Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6092910Z mov.b64 {%r1357, %r1358}, %rd484; 2026-02-21T09:28:31.6093038Z cvt.rn.f16x2.f32 %r1359, %r1358, %r1357; 2026-02-21T09:28:31.6093413Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6093526Z cvt.u64.u32 %rd485, %r640; 2026-02-21T09:28:31.6093638Z cvt.u64.u32 %rd486, %r641; 2026-02-21T09:28:31.6093743Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:28:31.6093840Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:28:31.6094155Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6094252Z mov.b64 {%r1360, %r1361}, %rd488; 2026-02-21T09:28:31.6094373Z cvt.rn.f16x2.f32 %r1362, %r1361, %r1360; 2026-02-21T09:28:31.6094810Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6094913Z cvt.u64.u32 %rd489, %r642; 2026-02-21T09:28:31.6095018Z cvt.u64.u32 %rd490, %r643; 2026-02-21T09:28:31.6095111Z shl.b64 %rd491, %rd490, 32; 2026-02-21T09:28:31.6095207Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T09:28:31.6095547Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6095643Z mov.b64 {%r1363, %r1364}, %rd492; 2026-02-21T09:28:31.6095750Z cvt.rn.f16x2.f32 %r1365, %r1364, %r1363; 2026-02-21T09:28:31.6096046Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6096153Z cvt.u64.u32 %rd493, %r644; 2026-02-21T09:28:31.6096249Z cvt.u64.u32 %rd494, %r645; 2026-02-21T09:28:31.6096348Z shl.b64 %rd495, %rd494, 32; 2026-02-21T09:28:31.6096453Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T09:28:31.6096776Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6096876Z mov.b64 {%r1366, %r1367}, %rd496; 2026-02-21T09:28:31.6096994Z cvt.rn.f16x2.f32 %r1368, %r1367, %r1366; 
2026-02-21T09:28:31.6097306Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6097402Z cvt.u64.u32 %rd497, %r647; 2026-02-21T09:28:31.6097499Z cvt.u64.u32 %rd498, %r648; 2026-02-21T09:28:31.6097607Z shl.b64 %rd499, %rd498, 32; 2026-02-21T09:28:31.6097703Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T09:28:31.6098018Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6098124Z mov.b64 {%r1369, %r1370}, %rd500; 2026-02-21T09:28:31.6098234Z cvt.rn.f16x2.f32 %r1371, %r1370, %r1369; 2026-02-21T09:28:31.6098552Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6098658Z cvt.u64.u32 %rd501, %r649; 2026-02-21T09:28:31.6098843Z cvt.u64.u32 %rd502, %r650; 2026-02-21T09:28:31.6098983Z shl.b64 %rd503, %rd502, 32; 2026-02-21T09:28:31.6099080Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T09:28:31.6099483Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6099580Z mov.b64 {%r1372, %r1373}, %rd504; 2026-02-21T09:28:31.6099689Z cvt.rn.f16x2.f32 %r1374, %r1373, %r1372; 2026-02-21T09:28:31.6100005Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6100097Z cvt.u64.u32 %rd505, %r651; 2026-02-21T09:28:31.6100187Z cvt.u64.u32 %rd506, %r652; 2026-02-21T09:28:31.6100286Z shl.b64 %rd507, %rd506, 32; 2026-02-21T09:28:31.6100376Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T09:28:31.6100671Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6100852Z mov.b64 {%r1375, %r1376}, %rd508; 2026-02-21T09:28:31.6101015Z cvt.rn.f16x2.f32 %r1377, %r1376, %r1375; 2026-02-21T09:28:31.6101332Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6101427Z cvt.u64.u32 %rd509, %r653; 2026-02-21T09:28:31.6101531Z cvt.u64.u32 %rd510, %r654; 2026-02-21T09:28:31.6101624Z shl.b64 %rd511, %rd510, 
32; 2026-02-21T09:28:31.6101719Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T09:28:31.6102030Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6102124Z mov.b64 {%r1378, %r1379}, %rd512; 2026-02-21T09:28:31.6102232Z cvt.rn.f16x2.f32 %r1380, %r1379, %r1378; 2026-02-21T09:28:31.6102533Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6102633Z cvt.u64.u32 %rd513, %r655; 2026-02-21T09:28:31.6102727Z cvt.u64.u32 %rd514, %r656; 2026-02-21T09:28:31.6102822Z shl.b64 %rd515, %rd514, 32; 2026-02-21T09:28:31.6102933Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T09:28:31.6103242Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6103337Z mov.b64 {%r1381, %r1382}, %rd516; 2026-02-21T09:28:31.6103454Z cvt.rn.f16x2.f32 %r1383, %r1382, %r1381; 2026-02-21T09:28:31.6103757Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6103862Z cvt.u64.u32 %rd517, %r657; 2026-02-21T09:28:31.6103969Z cvt.u64.u32 %rd518, %r658; 2026-02-21T09:28:31.6104082Z shl.b64 %rd519, %rd518, 32; 2026-02-21T09:28:31.6104175Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T09:28:31.6104482Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6104583Z mov.b64 {%r1384, %r1385}, %rd520; 2026-02-21T09:28:31.6104760Z cvt.rn.f16x2.f32 %r1386, %r1385, %r1384; 2026-02-21T09:28:31.6105085Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6105194Z cvt.u64.u32 %rd521, %r659; 2026-02-21T09:28:31.6105289Z cvt.u64.u32 %rd522, %r660; 2026-02-21T09:28:31.6105385Z shl.b64 %rd523, %rd522, 32; 2026-02-21T09:28:31.6105479Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T09:28:31.6105799Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6105895Z mov.b64 {%r1387, %r1388}, %rd524; 2026-02-21T09:28:31.6106003Z 
cvt.rn.f16x2.f32 %r1389, %r1388, %r1387; 2026-02-21T09:28:31.6106321Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6106415Z cvt.u64.u32 %rd525, %r661; 2026-02-21T09:28:31.6106511Z cvt.u64.u32 %rd526, %r662; 2026-02-21T09:28:31.6106615Z shl.b64 %rd527, %rd526, 32; 2026-02-21T09:28:31.6106710Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T09:28:31.6107019Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6107200Z mov.b64 {%r1390, %r1391}, %rd528; 2026-02-21T09:28:31.6107361Z cvt.rn.f16x2.f32 %r1392, %r1391, %r1390; 2026-02-21T09:28:31.6107668Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6107765Z cvt.u64.u32 %rd529, %r664; 2026-02-21T09:28:31.6107868Z cvt.u64.u32 %rd530, %r665; 2026-02-21T09:28:31.6107963Z shl.b64 %rd531, %rd530, 32; 2026-02-21T09:28:31.6108057Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T09:28:31.6108375Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6108489Z mov.b64 {%r1393, %r1394}, %rd532; 2026-02-21T09:28:31.6108617Z cvt.rn.f16x2.f32 %r1395, %r1394, %r1393; 2026-02-21T09:28:31.6108935Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6109101Z cvt.u64.u32 %rd533, %r666; 2026-02-21T09:28:31.6109239Z cvt.u64.u32 %rd534, %r667; 2026-02-21T09:28:31.6109337Z shl.b64 %rd535, %rd534, 32; 2026-02-21T09:28:31.6109447Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T09:28:31.6109751Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6109847Z mov.b64 {%r1396, %r1397}, %rd536; 2026-02-21T09:28:31.6109964Z cvt.rn.f16x2.f32 %r1398, %r1397, %r1396; 2026-02-21T09:28:31.6110267Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6110364Z cvt.u64.u32 %rd537, %r668; 2026-02-21T09:28:31.6110458Z cvt.u64.u32 %rd538, %r669; 
2026-02-21T09:28:31.6110559Z shl.b64 %rd539, %rd538, 32; 2026-02-21T09:28:31.6110654Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T09:28:31.6110962Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6111071Z mov.b64 {%r1399, %r1400}, %rd540; 2026-02-21T09:28:31.6111184Z cvt.rn.f16x2.f32 %r1401, %r1400, %r1399; 2026-02-21T09:28:31.6111490Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6111596Z cvt.u64.u32 %rd541, %r670; 2026-02-21T09:28:31.6111689Z cvt.u64.u32 %rd542, %r671; 2026-02-21T09:28:31.6111783Z shl.b64 %rd543, %rd542, 32; 2026-02-21T09:28:31.6111877Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T09:28:31.6112190Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6112287Z mov.b64 {%r1402, %r1403}, %rd544; 2026-02-21T09:28:31.6112397Z cvt.rn.f16x2.f32 %r1404, %r1403, %r1402; 2026-02-21T09:28:31.6112707Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6112800Z cvt.u64.u32 %rd545, %r672; 2026-02-21T09:28:31.6112891Z cvt.u64.u32 %rd546, %r673; 2026-02-21T09:28:31.6112994Z shl.b64 %rd547, %rd546, 32; 2026-02-21T09:28:31.6113110Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T09:28:31.6113441Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6113535Z mov.b64 {%r1405, %r1406}, %rd548; 2026-02-21T09:28:31.6113650Z cvt.rn.f16x2.f32 %r1407, %r1406, %r1405; 2026-02-21T09:28:31.6113965Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6114056Z cvt.u64.u32 %rd549, %r674; 2026-02-21T09:28:31.6114155Z cvt.u64.u32 %rd550, %r675; 2026-02-21T09:28:31.6114250Z shl.b64 %rd551, %rd550, 32; 2026-02-21T09:28:31.6114346Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T09:28:31.6114659Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6114831Z mov.b64 {%r1408, 
%r1409}, %rd552; 2026-02-21T09:28:31.6114944Z cvt.rn.f16x2.f32 %r1410, %r1409, %r1408; 2026-02-21T09:28:31.6115255Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6115492Z cvt.u64.u32 %rd553, %r676; 2026-02-21T09:28:31.6115588Z cvt.u64.u32 %rd554, %r677; 2026-02-21T09:28:31.6115682Z shl.b64 %rd555, %rd554, 32; 2026-02-21T09:28:31.6115786Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T09:28:31.6116095Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6116190Z mov.b64 {%r1411, %r1412}, %rd556; 2026-02-21T09:28:31.6116308Z cvt.rn.f16x2.f32 %r1413, %r1412, %r1411; 2026-02-21T09:28:31.6116612Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6116707Z cvt.u64.u32 %rd557, %r678; 2026-02-21T09:28:31.6116800Z cvt.u64.u32 %rd558, %r679; 2026-02-21T09:28:31.6116904Z shl.b64 %rd559, %rd558, 32; 2026-02-21T09:28:31.6117001Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T09:28:31.6117391Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6117501Z mov.b64 {%r1414, %r1415}, %rd560; 2026-02-21T09:28:31.6117633Z cvt.rn.f16x2.f32 %r1416, %r1415, %r1414; 2026-02-21T09:28:31.6117958Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6118056Z cvt.u64.u32 %rd561, %r681; 2026-02-21T09:28:31.6118148Z cvt.u64.u32 %rd562, %r682; 2026-02-21T09:28:31.6118242Z shl.b64 %rd563, %rd562, 32; 2026-02-21T09:28:31.6118334Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T09:28:31.6118656Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6118748Z mov.b64 {%r1417, %r1418}, %rd564; 2026-02-21T09:28:31.6118859Z cvt.rn.f16x2.f32 %r1419, %r1418, %r1417; 2026-02-21T09:28:31.6119170Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6119265Z cvt.u64.u32 %rd565, %r683; 
2026-02-21T09:28:31.6119363Z cvt.u64.u32 %rd566, %r684; 2026-02-21T09:28:31.6119469Z shl.b64 %rd567, %rd566, 32; 2026-02-21T09:28:31.6119568Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T09:28:31.6119869Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6119968Z mov.b64 {%r1420, %r1421}, %rd568; 2026-02-21T09:28:31.6120086Z cvt.rn.f16x2.f32 %r1422, %r1421, %r1420; 2026-02-21T09:28:31.6120392Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6120489Z cvt.u64.u32 %rd569, %r685; 2026-02-21T09:28:31.6120592Z cvt.u64.u32 %rd570, %r686; 2026-02-21T09:28:31.6120684Z shl.b64 %rd571, %rd570, 32; 2026-02-21T09:28:31.6120778Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T09:28:31.6121088Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6121186Z mov.b64 {%r1423, %r1424}, %rd572; 2026-02-21T09:28:31.6121300Z cvt.rn.f16x2.f32 %r1425, %r1424, %r1423; 2026-02-21T09:28:31.6121608Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6121713Z cvt.u64.u32 %rd573, %r687; 2026-02-21T09:28:31.6121807Z cvt.u64.u32 %rd574, %r688; 2026-02-21T09:28:31.6121902Z shl.b64 %rd575, %rd574, 32; 2026-02-21T09:28:31.6122006Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T09:28:31.6122368Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6122463Z mov.b64 {%r1426, %r1427}, %rd576; 2026-02-21T09:28:31.6122575Z cvt.rn.f16x2.f32 %r1428, %r1427, %r1426; 2026-02-21T09:28:31.6122886Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6122978Z cvt.u64.u32 %rd577, %r689; 2026-02-21T09:28:31.6123070Z cvt.u64.u32 %rd578, %r690; 2026-02-21T09:28:31.6123170Z shl.b64 %rd579, %rd578, 32; 2026-02-21T09:28:31.6123272Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T09:28:31.6123650Z .loc 1 58 27 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6123815Z mov.b64 {%r1429, %r1430}, %rd580; 2026-02-21T09:28:31.6123926Z cvt.rn.f16x2.f32 %r1431, %r1430, %r1429; 2026-02-21T09:28:31.6124230Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6124336Z cvt.u64.u32 %rd581, %r691; 2026-02-21T09:28:31.6124431Z cvt.u64.u32 %rd582, %r692; 2026-02-21T09:28:31.6124529Z shl.b64 %rd583, %rd582, 32; 2026-02-21T09:28:31.6124624Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T09:28:31.6124992Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6125091Z mov.b64 {%r1432, %r1433}, %rd584; 2026-02-21T09:28:31.6125205Z cvt.rn.f16x2.f32 %r1434, %r1433, %r1432; 2026-02-21T09:28:31.6125617Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6125717Z cvt.u64.u32 %rd585, %r693; 2026-02-21T09:28:31.6125812Z cvt.u64.u32 %rd586, %r694; 2026-02-21T09:28:31.6125914Z shl.b64 %rd587, %rd586, 32; 2026-02-21T09:28:31.6126010Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T09:28:31.6126316Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6126408Z mov.b64 {%r1435, %r1436}, %rd588; 2026-02-21T09:28:31.6126526Z cvt.rn.f16x2.f32 %r1437, %r1436, %r1435; 2026-02-21T09:28:31.6126885Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6126979Z cvt.u64.u32 %rd589, %r695; 2026-02-21T09:28:31.6127078Z cvt.u64.u32 %rd590, %r696; 2026-02-21T09:28:31.6127166Z shl.b64 %rd591, %rd590, 32; 2026-02-21T09:28:31.6127259Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T09:28:31.6127579Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6127673Z mov.b64 {%r1438, %r1439}, %rd592; 2026-02-21T09:28:31.6127786Z cvt.rn.f16x2.f32 %r1440, %r1439, %r1438; 2026-02-21T09:28:31.6128088Z .loc 1 56 52 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6128190Z cvt.u64.u32 %rd593, %r698; 2026-02-21T09:28:31.6128282Z cvt.u64.u32 %rd594, %r699; 2026-02-21T09:28:31.6128374Z shl.b64 %rd595, %rd594, 32; 2026-02-21T09:28:31.6128477Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T09:28:31.6128779Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6128873Z mov.b64 {%r1441, %r1442}, %rd596; 2026-02-21T09:28:31.6128988Z cvt.rn.f16x2.f32 %r1443, %r1442, %r1441; 2026-02-21T09:28:31.6129291Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6129389Z cvt.u64.u32 %rd597, %r700; 2026-02-21T09:28:31.6129483Z cvt.u64.u32 %rd598, %r701; 2026-02-21T09:28:31.6129589Z shl.b64 %rd599, %rd598, 32; 2026-02-21T09:28:31.6129687Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T09:28:31.6129990Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6130092Z mov.b64 {%r1444, %r1445}, %rd600; 2026-02-21T09:28:31.6130198Z cvt.rn.f16x2.f32 %r1446, %r1445, %r1444; 2026-02-21T09:28:31.6130500Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6130601Z cvt.u64.u32 %rd601, %r702; 2026-02-21T09:28:31.6130692Z cvt.u64.u32 %rd602, %r703; 2026-02-21T09:28:31.6130784Z shl.b64 %rd603, %rd602, 32; 2026-02-21T09:28:31.6130878Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T09:28:31.6131194Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6131310Z mov.b64 {%r1447, %r1448}, %rd604; 2026-02-21T09:28:31.6131436Z cvt.rn.f16x2.f32 %r1449, %r1448, %r1447; 2026-02-21T09:28:31.6131853Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6132028Z cvt.u64.u32 %rd605, %r704; 2026-02-21T09:28:31.6132121Z cvt.u64.u32 %rd606, %r705; 2026-02-21T09:28:31.6132221Z shl.b64 %rd607, %rd606, 32; 2026-02-21T09:28:31.6132316Z or.b64 
%rd608, %rd605, %rd607; 2026-02-21T09:28:31.6132617Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6132711Z mov.b64 {%r1450, %r1451}, %rd608; 2026-02-21T09:28:31.6132826Z cvt.rn.f16x2.f32 %r1452, %r1451, %r1450; 2026-02-21T09:28:31.6133130Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6133224Z cvt.u64.u32 %rd609, %r706; 2026-02-21T09:28:31.6133328Z cvt.u64.u32 %rd610, %r707; 2026-02-21T09:28:31.6133421Z shl.b64 %rd611, %rd610, 32; 2026-02-21T09:28:31.6133596Z or.b64 %rd612, %rd609, %rd611; 2026-02-21T09:28:31.6133910Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6134008Z mov.b64 {%r1453, %r1454}, %rd612; 2026-02-21T09:28:31.6134118Z cvt.rn.f16x2.f32 %r1455, %r1454, %r1453; 2026-02-21T09:28:31.6134419Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6134523Z cvt.u64.u32 %rd613, %r708; 2026-02-21T09:28:31.6134616Z cvt.u64.u32 %rd614, %r709; 2026-02-21T09:28:31.6134785Z shl.b64 %rd615, %rd614, 32; 2026-02-21T09:28:31.6134893Z or.b64 %rd616, %rd613, %rd615; 2026-02-21T09:28:31.6135196Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6135292Z mov.b64 {%r1456, %r1457}, %rd616; 2026-02-21T09:28:31.6135411Z cvt.rn.f16x2.f32 %r1458, %r1457, %r1456; 2026-02-21T09:28:31.6135725Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6135831Z cvt.u64.u32 %rd617, %r710; 2026-02-21T09:28:31.6135945Z cvt.u64.u32 %rd618, %r711; 2026-02-21T09:28:31.6136056Z shl.b64 %rd619, %rd618, 32; 2026-02-21T09:28:31.6136151Z or.b64 %rd620, %rd617, %rd619; 2026-02-21T09:28:31.6136460Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6136560Z mov.b64 {%r1459, %r1460}, %rd620; 2026-02-21T09:28:31.6136666Z cvt.rn.f16x2.f32 %r1461, %r1460, %r1459; 
2026-02-21T09:28:31.6136976Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6137081Z cvt.u64.u32 %rd621, %r712; 2026-02-21T09:28:31.6137179Z cvt.u64.u32 %rd622, %r713; 2026-02-21T09:28:31.6137274Z shl.b64 %rd623, %rd622, 32; 2026-02-21T09:28:31.6137369Z or.b64 %rd624, %rd621, %rd623; 2026-02-21T09:28:31.6137688Z .loc 1 58 27 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:58:27 2026-02-21T09:28:31.6137787Z mov.b64 {%r1462, %r1463}, %rd624; 2026-02-21T09:28:31.6137901Z cvt.rn.f16x2.f32 %r1464, %r1463, %r1462; 2026-02-21T09:28:31.6138214Z .loc 1 59 82 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:59:82 2026-02-21T09:28:31.6138392Z st.shared.v4.b32 [%r1034], {%r1083, %r1095, %r1107, %r1119}; 2026-02-21T09:28:31.6138563Z st.shared.v4.b32 [%r1033], {%r1131, %r1143, %r1155, %r1167}; 2026-02-21T09:28:31.6138735Z st.shared.v4.b32 [%r1031], {%r1179, %r1191, %r1203, %r1215}; 2026-02-21T09:28:31.6138896Z st.shared.v4.b32 [%r1029], {%r1227, %r1239, %r1251, %r1263}; 2026-02-21T09:28:31.6139058Z st.shared.v4.b32 [%r1027], {%r1275, %r1287, %r1299, %r1311}; 2026-02-21T09:28:31.6139226Z st.shared.v4.b32 [%r1025], {%r1323, %r1335, %r1347, %r1359}; 2026-02-21T09:28:31.6139383Z st.shared.v4.b32 [%r1023], {%r1371, %r1383, %r1395, %r1407}; 2026-02-21T09:28:31.6139542Z st.shared.v4.b32 [%r1021], {%r1419, %r1431, %r1443, %r1455}; 2026-02-21T09:28:31.6139635Z bar.sync 0, 256; 2026-02-21T09:28:31.6139836Z // begin inline asm 2026-02-21T09:28:31.6140163Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r875, %r879, %r883, %r887}, [%r719]; 2026-02-21T09:28:31.6140255Z // end inline asm 2026-02-21T09:28:31.6140376Z // begin inline asm 2026-02-21T09:28:31.6140657Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r891, %r895, %r899, %r903}, [%r724]; 2026-02-21T09:28:31.6140744Z // end inline asm 2026-02-21T09:28:31.6140832Z // begin inline asm 2026-02-21T09:28:31.6141103Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r907, %r911, %r915, 
%r919}, [%r729]; 2026-02-21T09:28:31.6141189Z // end inline asm 2026-02-21T09:28:31.6141278Z // begin inline asm 2026-02-21T09:28:31.6141539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r923, %r927, %r931, %r935}, [%r734]; 2026-02-21T09:28:31.6141627Z // end inline asm 2026-02-21T09:28:31.6141717Z // begin inline asm 2026-02-21T09:28:31.6142063Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r939, %r943, %r947, %r951}, [%r739]; 2026-02-21T09:28:31.6142208Z // end inline asm 2026-02-21T09:28:31.6142302Z // begin inline asm 2026-02-21T09:28:31.6142565Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r955, %r959, %r963, %r967}, [%r744]; 2026-02-21T09:28:31.6142661Z // end inline asm 2026-02-21T09:28:31.6142751Z // begin inline asm 2026-02-21T09:28:31.6143013Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r971, %r975, %r979, %r983}, [%r749]; 2026-02-21T09:28:31.6143106Z // end inline asm 2026-02-21T09:28:31.6143197Z // begin inline asm 2026-02-21T09:28:31.6143458Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r987, %r991, %r995, %r999}, [%r754]; 2026-02-21T09:28:31.6143542Z // end inline asm 2026-02-21T09:28:31.6143638Z bar.sync 0, 256; 2026-02-21T09:28:31.6143802Z st.shared.v4.b32 [%r1034], {%r1086, %r1098, %r1110, %r1122}; 2026-02-21T09:28:31.6143966Z st.shared.v4.b32 [%r1033], {%r1134, %r1146, %r1158, %r1170}; 2026-02-21T09:28:31.6144136Z st.shared.v4.b32 [%r1031], {%r1182, %r1194, %r1206, %r1218}; 2026-02-21T09:28:31.6144299Z st.shared.v4.b32 [%r1029], {%r1230, %r1242, %r1254, %r1266}; 2026-02-21T09:28:31.6144462Z st.shared.v4.b32 [%r1027], {%r1278, %r1290, %r1302, %r1314}; 2026-02-21T09:28:31.6144628Z st.shared.v4.b32 [%r1025], {%r1326, %r1338, %r1350, %r1362}; 2026-02-21T09:28:31.6144857Z st.shared.v4.b32 [%r1023], {%r1374, %r1386, %r1398, %r1410}; 2026-02-21T09:28:31.6145057Z st.shared.v4.b32 [%r1021], {%r1422, %r1434, %r1446, %r1458}; 2026-02-21T09:28:31.6145166Z bar.sync 0, 256; 2026-02-21T09:28:31.6145256Z // begin inline asm 2026-02-21T09:28:31.6145522Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r876, %r880, %r884, %r888}, [%r719]; 2026-02-21T09:28:31.6145607Z // end inline asm 2026-02-21T09:28:31.6145703Z // begin inline asm 2026-02-21T09:28:31.6145968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r892, %r896, %r900, %r904}, [%r724]; 2026-02-21T09:28:31.6146052Z // end inline asm 2026-02-21T09:28:31.6146150Z // begin inline asm 2026-02-21T09:28:31.6146411Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r908, %r912, %r916, %r920}, [%r729]; 2026-02-21T09:28:31.6146503Z // end inline asm 2026-02-21T09:28:31.6146594Z // begin inline asm 2026-02-21T09:28:31.6146860Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r924, %r928, %r932, %r936}, [%r734]; 2026-02-21T09:28:31.6146948Z // end inline asm 2026-02-21T09:28:31.6147037Z // begin inline asm 2026-02-21T09:28:31.6147301Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r940, %r944, %r948, %r952}, [%r739]; 2026-02-21T09:28:31.6147388Z // end inline asm 2026-02-21T09:28:31.6147479Z // begin inline asm 2026-02-21T09:28:31.6147742Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r956, %r960, %r964, %r968}, [%r744]; 2026-02-21T09:28:31.6147829Z // end inline asm 2026-02-21T09:28:31.6147918Z // begin inline asm 2026-02-21T09:28:31.6148176Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r972, %r976, %r980, %r984}, [%r749]; 2026-02-21T09:28:31.6148273Z // end inline asm 2026-02-21T09:28:31.6148362Z // begin inline asm 2026-02-21T09:28:31.6148634Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r988, %r992, %r996, %r1000}, [%r754]; 2026-02-21T09:28:31.6148827Z // end inline asm 2026-02-21T09:28:31.6148956Z bar.sync 0, 256; 2026-02-21T09:28:31.6149122Z st.shared.v4.b32 [%r1034], {%r1089, %r1101, %r1113, %r1125}; 2026-02-21T09:28:31.6149286Z st.shared.v4.b32 [%r1033], {%r1137, %r1149, %r1161, %r1173}; 2026-02-21T09:28:31.6149453Z st.shared.v4.b32 [%r1031], {%r1185, %r1197, %r1209, %r1221}; 2026-02-21T09:28:31.6149652Z st.shared.v4.b32 [%r1029], {%r1233, %r1245, %r1257, %r1269}; 2026-02-21T09:28:31.6149822Z 
st.shared.v4.b32 [%r1027], {%r1281, %r1293, %r1305, %r1317}; 2026-02-21T09:28:31.6149990Z st.shared.v4.b32 [%r1025], {%r1329, %r1341, %r1353, %r1365}; 2026-02-21T09:28:31.6150149Z st.shared.v4.b32 [%r1023], {%r1377, %r1389, %r1401, %r1413}; 2026-02-21T09:28:31.6150307Z st.shared.v4.b32 [%r1021], {%r1425, %r1437, %r1449, %r1461}; 2026-02-21T09:28:31.6150402Z bar.sync 0, 256; 2026-02-21T09:28:31.6150561Z // begin inline asm 2026-02-21T09:28:31.6150862Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r877, %r881, %r885, %r889}, [%r719]; 2026-02-21T09:28:31.6150955Z // end inline asm 2026-02-21T09:28:31.6151054Z // begin inline asm 2026-02-21T09:28:31.6151311Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r893, %r897, %r901, %r905}, [%r724]; 2026-02-21T09:28:31.6151399Z // end inline asm 2026-02-21T09:28:31.6151500Z // begin inline asm 2026-02-21T09:28:31.6151762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r909, %r913, %r917, %r921}, [%r729]; 2026-02-21T09:28:31.6151851Z // end inline asm 2026-02-21T09:28:31.6151950Z // begin inline asm 2026-02-21T09:28:31.6152208Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r925, %r929, %r933, %r937}, [%r734]; 2026-02-21T09:28:31.6152294Z // end inline asm 2026-02-21T09:28:31.6152382Z // begin inline asm 2026-02-21T09:28:31.6152651Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r941, %r945, %r949, %r953}, [%r739]; 2026-02-21T09:28:31.6152736Z // end inline asm 2026-02-21T09:28:31.6152826Z // begin inline asm 2026-02-21T09:28:31.6153092Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r957, %r961, %r965, %r969}, [%r744]; 2026-02-21T09:28:31.6153186Z // end inline asm 2026-02-21T09:28:31.6153272Z // begin inline asm 2026-02-21T09:28:31.6153533Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r973, %r977, %r981, %r985}, [%r749]; 2026-02-21T09:28:31.6153628Z // end inline asm 2026-02-21T09:28:31.6153714Z // begin inline asm 2026-02-21T09:28:31.6153977Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r989, %r993, %r997, %r1001}, [%r754]; 2026-02-21T09:28:31.6154084Z 
// end inline asm 2026-02-21T09:28:31.6154181Z bar.sync 0, 256; 2026-02-21T09:28:31.6154349Z st.shared.v4.b32 [%r1034], {%r1092, %r1104, %r1116, %r1128}; 2026-02-21T09:28:31.6154515Z st.shared.v4.b32 [%r1033], {%r1140, %r1152, %r1164, %r1176}; 2026-02-21T09:28:31.6154759Z st.shared.v4.b32 [%r1031], {%r1188, %r1200, %r1212, %r1224}; 2026-02-21T09:28:31.6154934Z st.shared.v4.b32 [%r1029], {%r1236, %r1248, %r1260, %r1272}; 2026-02-21T09:28:31.6155101Z st.shared.v4.b32 [%r1027], {%r1284, %r1296, %r1308, %r1320}; 2026-02-21T09:28:31.6155268Z st.shared.v4.b32 [%r1025], {%r1332, %r1344, %r1356, %r1368}; 2026-02-21T09:28:31.6155429Z st.shared.v4.b32 [%r1023], {%r1380, %r1392, %r1404, %r1416}; 2026-02-21T09:28:31.6155605Z st.shared.v4.b32 [%r1021], {%r1428, %r1440, %r1452, %r1464}; 2026-02-21T09:28:31.6155702Z bar.sync 0, 256; 2026-02-21T09:28:31.6155791Z // begin inline asm 2026-02-21T09:28:31.6156052Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r878, %r882, %r886, %r890}, [%r719]; 2026-02-21T09:28:31.6156149Z // end inline asm 2026-02-21T09:28:31.6156239Z // begin inline asm 2026-02-21T09:28:31.6156496Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r894, %r898, %r902, %r906}, [%r724]; 2026-02-21T09:28:31.6156584Z // end inline asm 2026-02-21T09:28:31.6156682Z // begin inline asm 2026-02-21T09:28:31.6156941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r910, %r914, %r918, %r922}, [%r729]; 2026-02-21T09:28:31.6157029Z // end inline asm 2026-02-21T09:28:31.6157128Z // begin inline asm 2026-02-21T09:28:31.6157473Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r926, %r930, %r934, %r938}, [%r734]; 2026-02-21T09:28:31.6157601Z // end inline asm 2026-02-21T09:28:31.6157702Z // begin inline asm 2026-02-21T09:28:31.6157960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r942, %r946, %r950, %r954}, [%r739]; 2026-02-21T09:28:31.6158045Z // end inline asm 2026-02-21T09:28:31.6158135Z // begin inline asm 2026-02-21T09:28:31.6158399Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r958, %r962, %r966, 
%r970}, [%r744]; 2026-02-21T09:28:31.6158483Z // end inline asm 2026-02-21T09:28:31.6158573Z // begin inline asm 2026-02-21T09:28:31.6158899Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r974, %r978, %r982, %r986}, [%r749]; 2026-02-21T09:28:31.6158984Z // end inline asm 2026-02-21T09:28:31.6159070Z // begin inline asm 2026-02-21T09:28:31.6159341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r990, %r994, %r998, %r1002}, [%r754]; 2026-02-21T09:28:31.6159503Z // end inline asm 2026-02-21T09:28:31.6159637Z // begin inline asm 2026-02-21T09:28:31.6159820Z st.global.v4.b32 [ %rd81 + 0 ], { %r875, %r876, %r877, %r878 }; 2026-02-21T09:28:31.6159921Z // end inline asm 2026-02-21T09:28:31.6160011Z // begin inline asm 2026-02-21T09:28:31.6160184Z st.global.v4.b32 [ %rd82 + 0 ], { %r879, %r880, %r881, %r882 }; 2026-02-21T09:28:31.6160281Z // end inline asm 2026-02-21T09:28:31.6160370Z // begin inline asm 2026-02-21T09:28:31.6160539Z st.global.v4.b32 [ %rd83 + 0 ], { %r883, %r884, %r885, %r886 }; 2026-02-21T09:28:31.6160625Z // end inline asm 2026-02-21T09:28:31.6160721Z // begin inline asm 2026-02-21T09:28:31.6160885Z st.global.v4.b32 [ %rd84 + 0 ], { %r887, %r888, %r889, %r890 }; 2026-02-21T09:28:31.6160971Z // end inline asm 2026-02-21T09:28:31.6161067Z // begin inline asm 2026-02-21T09:28:31.6161233Z st.global.v4.b32 [ %rd85 + 0 ], { %r891, %r892, %r893, %r894 }; 2026-02-21T09:28:31.6161320Z // end inline asm 2026-02-21T09:28:31.6161410Z // begin inline asm 2026-02-21T09:28:31.6161585Z st.global.v4.b32 [ %rd86 + 0 ], { %r895, %r896, %r897, %r898 }; 2026-02-21T09:28:31.6161674Z // end inline asm 2026-02-21T09:28:31.6161764Z // begin inline asm 2026-02-21T09:28:31.6161933Z st.global.v4.b32 [ %rd87 + 0 ], { %r899, %r900, %r901, %r902 }; 2026-02-21T09:28:31.6162018Z // end inline asm 2026-02-21T09:28:31.6162106Z // begin inline asm 2026-02-21T09:28:31.6162268Z st.global.v4.b32 [ %rd88 + 0 ], { %r903, %r904, %r905, %r906 }; 2026-02-21T09:28:31.6162361Z // end inline asm 
2026-02-21T09:28:31.6162451Z // begin inline asm 2026-02-21T09:28:31.6162615Z st.global.v4.b32 [ %rd89 + 0 ], { %r907, %r908, %r909, %r910 }; 2026-02-21T09:28:31.6162708Z // end inline asm 2026-02-21T09:28:31.6162797Z // begin inline asm 2026-02-21T09:28:31.6162958Z st.global.v4.b32 [ %rd90 + 0 ], { %r911, %r912, %r913, %r914 }; 2026-02-21T09:28:31.6163051Z // end inline asm 2026-02-21T09:28:31.6163145Z // begin inline asm 2026-02-21T09:28:31.6163347Z st.global.v4.b32 [ %rd91 + 0 ], { %r915, %r916, %r917, %r918 }; 2026-02-21T09:28:31.6163439Z // end inline asm 2026-02-21T09:28:31.6163539Z // begin inline asm 2026-02-21T09:28:31.6163707Z st.global.v4.b32 [ %rd92 + 0 ], { %r919, %r920, %r921, %r922 }; 2026-02-21T09:28:31.6163792Z // end inline asm 2026-02-21T09:28:31.6163888Z // begin inline asm 2026-02-21T09:28:31.6164053Z st.global.v4.b32 [ %rd93 + 0 ], { %r923, %r924, %r925, %r926 }; 2026-02-21T09:28:31.6164139Z // end inline asm 2026-02-21T09:28:31.6164226Z // begin inline asm 2026-02-21T09:28:31.6164394Z st.global.v4.b32 [ %rd94 + 0 ], { %r927, %r928, %r929, %r930 }; 2026-02-21T09:28:31.6164483Z // end inline asm 2026-02-21T09:28:31.6164574Z // begin inline asm 2026-02-21T09:28:31.6164800Z st.global.v4.b32 [ %rd95 + 0 ], { %r931, %r932, %r933, %r934 }; 2026-02-21T09:28:31.6164890Z // end inline asm 2026-02-21T09:28:31.6164981Z // begin inline asm 2026-02-21T09:28:31.6165143Z st.global.v4.b32 [ %rd96 + 0 ], { %r935, %r936, %r937, %r938 }; 2026-02-21T09:28:31.6165240Z // end inline asm 2026-02-21T09:28:31.6165333Z // begin inline asm 2026-02-21T09:28:31.6165606Z st.global.v4.b32 [ %rd97 + 0 ], { %r939, %r940, %r941, %r942 }; 2026-02-21T09:28:31.6165745Z // end inline asm 2026-02-21T09:28:31.6165836Z // begin inline asm 2026-02-21T09:28:31.6166002Z st.global.v4.b32 [ %rd98 + 0 ], { %r943, %r944, %r945, %r946 }; 2026-02-21T09:28:31.6166098Z // end inline asm 2026-02-21T09:28:31.6166189Z // begin inline asm 2026-02-21T09:28:31.6166353Z st.global.v4.b32 [ %rd99 + 0 ], 
{ %r947, %r948, %r949, %r950 }; 2026-02-21T09:28:31.6166438Z // end inline asm 2026-02-21T09:28:31.6166536Z // begin inline asm 2026-02-21T09:28:31.6166714Z st.global.v4.b32 [ %rd100 + 0 ], { %r951, %r952, %r953, %r954 }; 2026-02-21T09:28:31.6166801Z // end inline asm 2026-02-21T09:28:31.6166901Z // begin inline asm 2026-02-21T09:28:31.6167074Z st.global.v4.b32 [ %rd101 + 0 ], { %r955, %r956, %r957, %r958 }; 2026-02-21T09:28:31.6167161Z // end inline asm 2026-02-21T09:28:31.6167304Z // begin inline asm 2026-02-21T09:28:31.6167521Z st.global.v4.b32 [ %rd102 + 0 ], { %r959, %r960, %r961, %r962 }; 2026-02-21T09:28:31.6167615Z // end inline asm 2026-02-21T09:28:31.6167703Z // begin inline asm 2026-02-21T09:28:31.6167915Z st.global.v4.b32 [ %rd103 + 0 ], { %r963, %r964, %r965, %r966 }; 2026-02-21T09:28:31.6168015Z // end inline asm 2026-02-21T09:28:31.6168102Z // begin inline asm 2026-02-21T09:28:31.6168275Z st.global.v4.b32 [ %rd104 + 0 ], { %r967, %r968, %r969, %r970 }; 2026-02-21T09:28:31.6168359Z // end inline asm 2026-02-21T09:28:31.6168448Z // begin inline asm 2026-02-21T09:28:31.6168613Z st.global.v4.b32 [ %rd105 + 0 ], { %r971, %r972, %r973, %r974 }; 2026-02-21T09:28:31.6168705Z // end inline asm 2026-02-21T09:28:31.6168792Z // begin inline asm 2026-02-21T09:28:31.6168956Z st.global.v4.b32 [ %rd106 + 0 ], { %r975, %r976, %r977, %r978 }; 2026-02-21T09:28:31.6169049Z // end inline asm 2026-02-21T09:28:31.6169139Z // begin inline asm 2026-02-21T09:28:31.6169309Z st.global.v4.b32 [ %rd107 + 0 ], { %r979, %r980, %r981, %r982 }; 2026-02-21T09:28:31.6169399Z // end inline asm 2026-02-21T09:28:31.6169498Z // begin inline asm 2026-02-21T09:28:31.6169667Z st.global.v4.b32 [ %rd108 + 0 ], { %r983, %r984, %r985, %r986 }; 2026-02-21T09:28:31.6169755Z // end inline asm 2026-02-21T09:28:31.6169852Z // begin inline asm 2026-02-21T09:28:31.6170016Z st.global.v4.b32 [ %rd109 + 0 ], { %r987, %r988, %r989, %r990 }; 2026-02-21T09:28:31.6170102Z // end inline asm 
2026-02-21T09:28:31.6170189Z // begin inline asm 2026-02-21T09:28:31.6170364Z st.global.v4.b32 [ %rd110 + 0 ], { %r991, %r992, %r993, %r994 }; 2026-02-21T09:28:31.6170450Z // end inline asm 2026-02-21T09:28:31.6170541Z // begin inline asm 2026-02-21T09:28:31.6170711Z st.global.v4.b32 [ %rd111 + 0 ], { %r995, %r996, %r997, %r998 }; 2026-02-21T09:28:31.6170799Z // end inline asm 2026-02-21T09:28:31.6170891Z // begin inline asm 2026-02-21T09:28:31.6171082Z st.global.v4.b32 [ %rd112 + 0 ], { %r999, %r1000, %r1001, %r1002 }; 2026-02-21T09:28:31.6171171Z // end inline asm 2026-02-21T09:28:31.6171309Z $L__BB0_14: // %._crit_edge 2026-02-21T09:28:31.6171629Z .loc 1 31 4 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:31:4 2026-02-21T09:28:31.6171725Z bar.sync 0, 256; 2026-02-21T09:28:31.6171811Z // begin inline asm 2026-02-21T09:28:31.6172023Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1465, 512; 2026-02-21T09:28:31.6172117Z // end inline asm 2026-02-21T09:28:31.6172251Z st.shared.b32 [global_smem+196840], 50529027; 2026-02-21T09:28:31.6172350Z barrier.sync 1; 2026-02-21T09:28:31.6172513Z $L__BB0_15: // %common.ret 2026-02-21T09:28:31.6172823Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6172905Z ret; 2026-02-21T09:28:31.6173062Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:28:31.6173172Z mov.b32 %r23, global_smem; 2026-02-21T09:28:31.6173272Z add.s32 %r24, %r23, %r3; 2026-02-21T09:28:31.6173371Z add.s32 %r54, %r23, 196784; 2026-02-21T09:28:31.6210427Z bfe.u32 %r68, %r23, 4, 14; 2026-02-21T09:28:31.6210531Z cvt.u64.u32 %rd22, %r68; 2026-02-21T09:28:31.6210674Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:28:31.6210772Z add.s32 %r69, %r23, 98304; 2026-02-21T09:28:31.6210874Z bfe.u32 %r70, %r69, 4, 14; 2026-02-21T09:28:31.6210971Z cvt.u64.u32 %rd23, %r70; 2026-02-21T09:28:31.6211092Z or.b64 %rd17, %rd23, -9223371899348713472; 2026-02-21T09:28:31.6211195Z add.s32 %r71, %r23, 32; 
2026-02-21T09:28:31.6211286Z bfe.u32 %r72, %r71, 4, 14; 2026-02-21T09:28:31.6211376Z bra.uni $L__BB0_2; 2026-02-21T09:28:31.6211564Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6211895Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6212032Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6212237Z barrier.sync 1; 2026-02-21T09:28:31.6212408Z barrier.sync 1; 2026-02-21T09:28:31.6212571Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6212724Z $L__BB0_2: // %.preheader 2026-02-21T09:28:31.6212891Z // =>This Loop Header: Depth=1 2026-02-21T09:28:31.6213038Z // Child Loop BB0_9 Depth 2 2026-02-21T09:28:31.6213177Z // Child Loop BB0_6 Depth 2 2026-02-21T09:28:31.6213495Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19 2026-02-21T09:28:31.6213619Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:28:31.6213712Z barrier.sync 1; 2026-02-21T09:28:31.6213820Z ld.shared.b8 %r22, [%r24+196832]; 2026-02-21T09:28:31.6213932Z setp.gt.u32 %p2, %r22, 3; 2026-02-21T09:28:31.6214025Z @%p2 bra $L__BB0_4; 2026-02-21T09:28:31.6214158Z // %bb.3: // %.preheader 2026-02-21T09:28:31.6214323Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6214428Z $L_brx_0: .branchtargets 2026-02-21T09:28:31.6214518Z $L__BB0_5, 2026-02-21T09:28:31.6214599Z $L__BB0_8, 2026-02-21T09:28:31.6214769Z $L__BB0_11, 2026-02-21T09:28:31.6214853Z $L__BB0_15; 2026-02-21T09:28:31.6214952Z brx.idx %r22, $L_brx_0; 2026-02-21T09:28:31.6215090Z $L__BB0_5: // %.peel.next 2026-02-21T09:28:31.6215238Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6215562Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6215696Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6215826Z ld.shared.b32 %r56, [global_smem+196608]; 2026-02-21T09:28:31.6215917Z barrier.sync 1; 2026-02-21T09:28:31.6216211Z .loc 1 0 0 // 
cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6216327Z bar.warp.sync -1; 2026-02-21T09:28:31.6216419Z mov.b32 %r1466, 0; 2026-02-21T09:28:31.6216514Z // begin inline asm 2026-02-21T09:28:31.6216607Z 2026-02-21T09:28:31.6216686Z { 2026-02-21T09:28:31.6216786Z .reg .pred complete; 2026-02-21T09:28:31.6216872Z waitLoop: 2026-02-21T09:28:31.6217110Z mbarrier.try_wait.parity.shared.b64 complete, [%r54], %r1466; 2026-02-21T09:28:31.6217236Z @!complete bra.uni waitLoop; 2026-02-21T09:28:31.6217321Z } 2026-02-21T09:28:31.6217330Z 2026-02-21T09:28:31.6217430Z // end inline asm 2026-02-21T09:28:31.6217747Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6217852Z elect.sync %r67|%p12, -1; 2026-02-21T09:28:31.6217953Z mov.b32 %r57, 138412048; 2026-02-21T09:28:31.6218047Z mov.pred %p11, 0; 2026-02-21T09:28:31.6218135Z // begin inline asm 2026-02-21T09:28:31.6218413Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 0 ], %rd12, %rd17, %r57, %p11; 2026-02-21T09:28:31.6218513Z // end inline asm 2026-02-21T09:28:31.6218613Z cvt.u64.u32 %rd24, %r72; 2026-02-21T09:28:31.6218850Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:28:31.6219020Z add.s32 %r73, %r23, 98336; 2026-02-21T09:28:31.6219120Z bfe.u32 %r74, %r73, 4, 14; 2026-02-21T09:28:31.6219218Z cvt.u64.u32 %rd25, %r74; 2026-02-21T09:28:31.6219337Z or.b64 %rd15, %rd25, -9223371899348713472; 2026-02-21T09:28:31.6219443Z mov.pred %p13, -1; 2026-02-21T09:28:31.6219533Z // begin inline asm 2026-02-21T09:28:31.6219779Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 0 ], %rd14, %rd15, %r57, %p13; 2026-02-21T09:28:31.6219877Z // end inline asm 2026-02-21T09:28:31.6219972Z add.s32 %r75, %r23, 8192; 2026-02-21T09:28:31.6220065Z bfe.u32 %r76, %r75, 4, 14; 2026-02-21T09:28:31.6220167Z cvt.u64.u32 %rd26, %r76; 2026-02-21T09:28:31.6220278Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:28:31.6220370Z // begin inline asm 
2026-02-21T09:28:31.6220725Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 256 ], %rd16, %rd17, %r57, %p11; 2026-02-21T09:28:31.6220829Z // end inline asm 2026-02-21T09:28:31.6220927Z add.s32 %r77, %r23, 8224; 2026-02-21T09:28:31.6221023Z bfe.u32 %r78, %r77, 4, 14; 2026-02-21T09:28:31.6221128Z cvt.u64.u32 %rd27, %r78; 2026-02-21T09:28:31.6221240Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:28:31.6221328Z // begin inline asm 2026-02-21T09:28:31.6221632Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 256 ], %rd18, %rd15, %r57, %p13; 2026-02-21T09:28:31.6221720Z // end inline asm 2026-02-21T09:28:31.6221815Z add.s32 %r79, %r23, 196736; 2026-02-21T09:28:31.6221906Z cvt.u64.u32 %rd20, %r79; 2026-02-21T09:28:31.6222005Z // begin inline asm 2026-02-21T09:28:31.6222231Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:28:31.6222317Z // end inline asm 2026-02-21T09:28:31.6222419Z add.s32 %r80, %r23, 196832; 2026-02-21T09:28:31.6222511Z cvt.u64.u32 %rd21, %r80; 2026-02-21T09:28:31.6222600Z // begin inline asm 2026-02-21T09:28:31.6222822Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:28:31.6222928Z // end inline asm 2026-02-21T09:28:31.6223017Z mov.b32 %r1468, 1; 2026-02-21T09:28:31.6223112Z mov.b32 %r1467, %r1466; 2026-02-21T09:28:31.6223289Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:28:31.6223447Z // => This Inner Loop Header: Depth=2 2026-02-21T09:28:31.6223772Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6223874Z shl.b32 %r91, %r1468, 3; 2026-02-21T09:28:31.6223968Z add.s32 %r93, %r23, %r91; 2026-02-21T09:28:31.6224063Z add.s32 %r94, %r93, 196736; 2026-02-21T09:28:31.6224157Z add.s32 %r81, %r93, 196784; 2026-02-21T09:28:31.6224480Z .loc 1 54 31 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:54:31 2026-02-21T09:28:31.6224577Z shl.b32 %r95, %r1468, 14; 2026-02-21T09:28:31.6224735Z add.s32 %r96, %r23, %r95; 
2026-02-21T09:28:31.6225064Z .loc 1 55 44 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:55:44 2026-02-21T09:28:31.6225162Z add.s32 %r97, %r96, 98304; 2026-02-21T09:28:31.6225453Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6225558Z bar.warp.sync -1; 2026-02-21T09:28:31.6225653Z // begin inline asm 2026-02-21T09:28:31.6225733Z 2026-02-21T09:28:31.6225811Z { 2026-02-21T09:28:31.6225917Z .reg .pred complete; 2026-02-21T09:28:31.6226003Z waitLoop: 2026-02-21T09:28:31.6226230Z mbarrier.try_wait.parity.shared.b64 complete, [%r81], %r1467; 2026-02-21T09:28:31.6226363Z @!complete bra.uni waitLoop; 2026-02-21T09:28:31.6226446Z } 2026-02-21T09:28:31.6226453Z 2026-02-21T09:28:31.6226539Z // end inline asm 2026-02-21T09:28:31.6226861Z .loc 1 56 52 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:56:52 2026-02-21T09:28:31.6226968Z setp.eq.b32 %p31, %r1466, 1984; 2026-02-21T09:28:31.6227185Z elect.sync %r98|%p22, -1; 2026-02-21T09:28:31.6227328Z bfe.u32 %r99, %r96, 4, 14; 2026-02-21T09:28:31.6227437Z cvt.u64.u32 %rd38, %r99; 2026-02-21T09:28:31.6227555Z or.b64 %rd28, %rd38, -9223371899348713472; 2026-02-21T09:28:31.6227651Z bfe.u32 %r100, %r97, 4, 14; 2026-02-21T09:28:31.6227754Z cvt.u64.u32 %rd39, %r100; 2026-02-21T09:28:31.6227867Z or.b64 %rd29, %rd39, -9223371899348713472; 2026-02-21T09:28:31.6227959Z // begin inline asm 2026-02-21T09:28:31.6228201Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 0 ], %rd28, %rd29, %r57, %p13; 2026-02-21T09:28:31.6228297Z // end inline asm 2026-02-21T09:28:31.6228390Z add.s32 %r101, %r96, 32; 2026-02-21T09:28:31.6228490Z bfe.u32 %r102, %r101, 4, 14; 2026-02-21T09:28:31.6228595Z cvt.u64.u32 %rd40, %r102; 2026-02-21T09:28:31.6228704Z or.b64 %rd30, %rd40, -9223371899348713472; 2026-02-21T09:28:31.6228799Z add.s32 %r103, %r96, 98336; 2026-02-21T09:28:31.6229030Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T09:28:31.6229128Z cvt.u64.u32 %rd41, %r104; 2026-02-21T09:28:31.6229241Z 
or.b64 %rd31, %rd41, -9223371899348713472; 2026-02-21T09:28:31.6229333Z // begin inline asm 2026-02-21T09:28:31.6229579Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 0 ], %rd30, %rd31, %r57, %p13; 2026-02-21T09:28:31.6229668Z // end inline asm 2026-02-21T09:28:31.6229764Z add.s32 %r105, %r96, 8192; 2026-02-21T09:28:31.6229866Z bfe.u32 %r106, %r105, 4, 14; 2026-02-21T09:28:31.6229961Z cvt.u64.u32 %rd42, %r106; 2026-02-21T09:28:31.6230069Z or.b64 %rd32, %rd42, -9223371899348713472; 2026-02-21T09:28:31.6230159Z // begin inline asm 2026-02-21T09:28:31.6230402Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 256 ], %rd32, %rd29, %r57, %p13; 2026-02-21T09:28:31.6230488Z // end inline asm 2026-02-21T09:28:31.6230580Z add.s32 %r107, %r96, 8224; 2026-02-21T09:28:31.6230699Z bfe.u32 %r108, %r107, 4, 14; 2026-02-21T09:28:31.6230811Z cvt.u64.u32 %rd43, %r108; 2026-02-21T09:28:31.6230930Z or.b64 %rd34, %rd43, -9223371899348713472; 2026-02-21T09:28:31.6231031Z // begin inline asm 2026-02-21T09:28:31.6231272Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r56 + 256 ], %rd34, %rd31, %r57, %p13; 2026-02-21T09:28:31.6231357Z // end inline asm 2026-02-21T09:28:31.6231449Z cvt.u64.u32 %rd36, %r94; 2026-02-21T09:28:31.6231544Z // begin inline asm 2026-02-21T09:28:31.6231762Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd36]; 2026-02-21T09:28:31.6231848Z // end inline asm 2026-02-21T09:28:31.6231966Z and.pred %p30, %p31, %p22; 2026-02-21T09:28:31.6232057Z // begin inline asm 2026-02-21T09:28:31.6232267Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:28:31.6232361Z // end inline asm 2026-02-21T09:28:31.6232660Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6232753Z add.s32 %r110, %r1468, 1; 2026-02-21T09:28:31.6232856Z setp.eq.b32 %p32, %r110, 6; 2026-02-21T09:28:31.6232972Z selp.b32 %r1468, 0, %r110, %p32; 2026-02-21T09:28:31.6233074Z selp.b32 %r111, 1, 0, %p32; 
2026-02-21T09:28:31.6233177Z xor.b32 %r1467, %r1467, %r111; 2026-02-21T09:28:31.6233516Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6233611Z add.s32 %r1466, %r1466, 32; 2026-02-21T09:28:31.6233717Z setp.lt.u32 %p33, %r1466, 2016; 2026-02-21T09:28:31.6233817Z @%p33 bra $L__BB0_6; 2026-02-21T09:28:31.6233947Z // %bb.7: // %.loopexit 2026-02-21T09:28:31.6234096Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6234188Z barrier.sync 1; 2026-02-21T09:28:31.6234328Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6234419Z bra.uni $L__BB0_2; 2026-02-21T09:28:31.6234585Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6234977Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6235218Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6235453Z ld.shared.v2.b32 {%r40, %r44}, [global_smem+196616]; 2026-02-21T09:28:31.6235554Z barrier.sync 1; 2026-02-21T09:28:31.6235870Z .loc 1 21 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:21:67 2026-02-21T09:28:31.6235964Z mov.u32 %r27, %ctaid.x; 2026-02-21T09:28:31.6236060Z mov.u32 %r28, %ctaid.y; 2026-02-21T09:28:31.6236160Z mov.u32 %r29, %ctaid.z; 2026-02-21T09:28:31.6236250Z mov.u32 %r30, %nctaid.x; 2026-02-21T09:28:31.6236344Z mov.u32 %r31, %nctaid.y; 2026-02-21T09:28:31.6236454Z mad.lo.s32 %r32, %r29, %r31, %r28; 2026-02-21T09:28:31.6236555Z mad.lo.s32 %r33, %r32, %r30, %r27; 2026-02-21T09:28:31.6236649Z shl.b32 %r34, %r33, 8; 2026-02-21T09:28:31.6236960Z .loc 1 22 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:22:67 2026-02-21T09:28:31.6237138Z cvt.s64.s32 %rd7, %r34; 2026-02-21T09:28:31.6237289Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:28:31.6237390Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:28:31.6237506Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:28:31.6237813Z .loc 1 21 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:21:67 
2026-02-21T09:28:31.6237919Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:28:31.6238025Z add.s32 %r13, %r1, -256; 2026-02-21T09:28:31.6238112Z mov.b32 %r1470, 0; 2026-02-21T09:28:31.6238202Z mov.b32 %r1469, -32; 2026-02-21T09:28:31.6238294Z mov.b32 %r1471, %r1470; 2026-02-21T09:28:31.6238465Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:28:31.6238619Z // => This Inner Loop Header: Depth=2 2026-02-21T09:28:31.6238927Z .loc 1 0 67 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0:67 2026-02-21T09:28:31.6239039Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:28:31.6239141Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:28:31.6239457Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6239564Z add.s32 %r1469, %r1469, 32; 2026-02-21T09:28:31.6239674Z shl.b32 %r46, %r1471, 3; 2026-02-21T09:28:31.6239783Z add.s32 %r48, %r23, %r46; 2026-02-21T09:28:31.6239884Z add.s32 %r35, %r48, 196736; 2026-02-21T09:28:31.6240186Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6240277Z // begin inline asm 2026-02-21T09:28:31.6240358Z 2026-02-21T09:28:31.6240442Z { 2026-02-21T09:28:31.6240539Z .reg .pred complete; 2026-02-21T09:28:31.6240621Z waitLoop: 2026-02-21T09:28:31.6240827Z mbarrier.try_wait.parity.shared.b64 complete, [%r35], %r1470; 2026-02-21T09:28:31.6240943Z @!complete bra.uni waitLoop; 2026-02-21T09:28:31.6241019Z } 2026-02-21T09:28:31.6241027Z 2026-02-21T09:28:31.6241116Z // end inline asm 2026-02-21T09:28:31.6241446Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6241545Z add.s32 %r41, %r48, 196784; 2026-02-21T09:28:31.6241834Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6241933Z bar.sync 3, 64; 2026-02-21T09:28:31.6242025Z // begin inline asm 2026-02-21T09:28:31.6242217Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r41], 32768; 2026-02-21T09:28:31.6242304Z // end inline asm 
2026-02-21T09:28:31.6242626Z .loc 1 54 31 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:54:31 2026-02-21T09:28:31.6242720Z shl.b32 %r49, %r1471, 14; 2026-02-21T09:28:31.6242812Z add.s32 %r38, %r23, %r49; 2026-02-21T09:28:31.6243110Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6243200Z bar.sync 3, 64; 2026-02-21T09:28:31.6243305Z elect.sync %r50|%p7, -1; 2026-02-21T09:28:31.6243418Z and.pred %p4, %p6, %p7; 2026-02-21T09:28:31.6243512Z // begin inline asm 2026-02-21T09:28:31.6244072Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r38], [%rd10, {%r1469, %r40}], [%r41]; 2026-02-21T09:28:31.6244233Z // end inline asm 2026-02-21T09:28:31.6244554Z .loc 1 55 44 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:55:44 2026-02-21T09:28:31.6244651Z add.s32 %r42, %r38, 98304; 2026-02-21T09:28:31.6245031Z .loc 1 0 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:0 2026-02-21T09:28:31.6245126Z bar.sync 3, 64; 2026-02-21T09:28:31.6245225Z elect.sync %r51|%p8, -1; 2026-02-21T09:28:31.6245331Z and.pred %p5, %p6, %p8; 2026-02-21T09:28:31.6245432Z // begin inline asm 2026-02-21T09:28:31.6245876Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r42], [%rd11, {%r1469, %r44}], [%r41]; 2026-02-21T09:28:31.6245965Z // end inline asm 2026-02-21T09:28:31.6246161Z add.s32 %r52, %r1471, 1; 2026-02-21T09:28:31.6246312Z setp.eq.b32 %p9, %r52, 6; 2026-02-21T09:28:31.6246422Z selp.b32 %r1471, 0, %r52, %p9; 2026-02-21T09:28:31.6246523Z selp.b32 %r53, 1, 0, %p9; 2026-02-21T09:28:31.6246636Z xor.b32 %r1470, %r1470, %r53; 2026-02-21T09:28:31.6246956Z .loc 1 50 112 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:50:112 2026-02-21T09:28:31.6247063Z setp.lt.u32 %p10, %r1469, 2016; 2026-02-21T09:28:31.6247175Z @%p10 bra $L__BB0_9; 2026-02-21T09:28:31.6247342Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6247434Z barrier.sync 
1; 2026-02-21T09:28:31.6247563Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:28:31.6247667Z bra.uni $L__BB0_2; 2026-02-21T09:28:31.6247830Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:28:31.6248127Z .loc 1 19 0 // cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py:19 2026-02-21T09:28:31.6248227Z barrier.sync 1; 2026-02-21T09:28:31.6248318Z barrier.sync 1; 2026-02-21T09:28:31.6248410Z bra.uni $L__BB0_2; 2026-02-21T09:28:31.6248517Z $L__tmp1: 2026-02-21T09:28:31.6248616Z $L__func_end0: 2026-02-21T09:28:31.6248759Z // -- End function 2026-02-21T09:28:31.6248840Z } 2026-02-21T09:28:31.6249241Z .file 1 "/tmp/torchinductor_root/ez/cezw7iarvnfjaotsjin2h2aslylqsui2nn2bzcwodebhciax3ae6.py" 2026-02-21T09:28:31.6249340Z .section .debug_abbrev 2026-02-21T09:28:31.6249418Z { 2026-02-21T09:28:31.6249573Z .b8 1 // Abbreviation Code 2026-02-21T09:28:31.6249718Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:28:31.6249851Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:28:31.6249984Z .b8 37 // DW_AT_producer 2026-02-21T09:28:31.6250120Z .b8 8 // DW_FORM_string 2026-02-21T09:28:31.6250268Z .b8 19 // DW_AT_language 2026-02-21T09:28:31.6250421Z .b8 5 // DW_FORM_data2 2026-02-21T09:28:31.6250574Z .b8 3 // DW_AT_name 2026-02-21T09:28:31.6250716Z .b8 8 // DW_FORM_string 2026-02-21T09:28:31.6250864Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:28:31.6251022Z .b8 6 // DW_FORM_data4 2026-02-21T09:28:31.6251180Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:28:31.6251304Z .b8 8 // DW_FORM_string 2026-02-21T09:28:31.6251429Z .b8 0 // EOM(1) 2026-02-21T09:28:31.6251542Z .b8 0 // EOM(2) 2026-02-21T09:28:31.6251653Z .b8 0 // EOM(3) 2026-02-21T09:28:31.6251735Z } 2026-02-21T09:28:31.6251843Z .section .debug_info 2026-02-21T09:28:31.6251923Z { 2026-02-21T09:28:31.6252067Z .b32 104 // Length of Unit 2026-02-21T09:28:31.6252324Z .b8 2 // DWARF version number 2026-02-21T09:28:31.6252454Z .b8 0 2026-02-21T09:28:31.6252660Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:28:31.6252815Z .b8 8 // Address Size (in bytes) 2026-02-21T09:28:31.6252997Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:28:31.6253130Z .b8 116 // DW_AT_producer 2026-02-21T09:28:31.6253230Z .b8 114 2026-02-21T09:28:31.6253333Z .b8 105 2026-02-21T09:28:31.6253422Z .b8 116 2026-02-21T09:28:31.6253506Z .b8 111 2026-02-21T09:28:31.6253587Z .b8 110 2026-02-21T09:28:31.6253671Z .b8 0 2026-02-21T09:28:31.6253796Z .b8 2 // DW_AT_language 2026-02-21T09:28:31.6253875Z .b8 0 2026-02-21T09:28:31.6254061Z .b8 99 // DW_AT_name 2026-02-21T09:28:31.6254144Z .b8 101 2026-02-21T09:28:31.6254279Z .b8 122 2026-02-21T09:28:31.6254364Z .b8 119 2026-02-21T09:28:31.6254452Z .b8 55 2026-02-21T09:28:31.6254532Z .b8 105 2026-02-21T09:28:31.6254615Z .b8 97 2026-02-21T09:28:31.6254759Z .b8 114 2026-02-21T09:28:31.6254842Z .b8 118 2026-02-21T09:28:31.6254923Z .b8 110 2026-02-21T09:28:31.6255003Z .b8 102 2026-02-21T09:28:31.6255093Z .b8 106 2026-02-21T09:28:31.6255176Z .b8 97 2026-02-21T09:28:31.6255257Z .b8 111 2026-02-21T09:28:31.6255344Z .b8 116 2026-02-21T09:28:31.6255425Z .b8 115 2026-02-21T09:28:31.6255506Z .b8 106 2026-02-21T09:28:31.6255589Z .b8 105 2026-02-21T09:28:31.6255677Z .b8 110 2026-02-21T09:28:31.6255757Z .b8 50 2026-02-21T09:28:31.6255838Z .b8 104 2026-02-21T09:28:31.6255928Z .b8 50 2026-02-21T09:28:31.6256008Z .b8 97 2026-02-21T09:28:31.6256088Z .b8 115 2026-02-21T09:28:31.6256168Z .b8 108 2026-02-21T09:28:31.6256258Z .b8 121 2026-02-21T09:28:31.6256339Z .b8 108 2026-02-21T09:28:31.6256418Z .b8 113 2026-02-21T09:28:31.6256501Z .b8 115 2026-02-21T09:28:31.6256592Z .b8 117 2026-02-21T09:28:31.6256673Z .b8 105 2026-02-21T09:28:31.6256756Z .b8 50 2026-02-21T09:28:31.6256847Z .b8 110 2026-02-21T09:28:31.6256926Z .b8 110 2026-02-21T09:28:31.6257005Z .b8 50 2026-02-21T09:28:31.6257083Z .b8 98 2026-02-21T09:28:31.6257172Z .b8 122 2026-02-21T09:28:31.6257251Z .b8 99 2026-02-21T09:28:31.6257329Z .b8 119 2026-02-21T09:28:31.6257416Z .b8 111 
2026-02-21T09:28:31.6257495Z .b8 100 2026-02-21T09:28:31.6257575Z .b8 101 2026-02-21T09:28:31.6257654Z .b8 98 2026-02-21T09:28:31.6257739Z .b8 104 2026-02-21T09:28:31.6257817Z .b8 99 2026-02-21T09:28:31.6257897Z .b8 105 2026-02-21T09:28:31.6257975Z .b8 97 2026-02-21T09:28:31.6258068Z .b8 120 2026-02-21T09:28:31.6258163Z .b8 51 2026-02-21T09:28:31.6258255Z .b8 97 2026-02-21T09:28:31.6258354Z .b8 101 2026-02-21T09:28:31.6258434Z .b8 54 2026-02-21T09:28:31.6258516Z .b8 46 2026-02-21T09:28:31.6258594Z .b8 112 2026-02-21T09:28:31.6258681Z .b8 121 2026-02-21T09:28:31.6258758Z .b8 0 2026-02-21T09:28:31.6258934Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:28:31.6259078Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:28:31.6259163Z .b8 116 2026-02-21T09:28:31.6259241Z .b8 109 2026-02-21T09:28:31.6259317Z .b8 112 2026-02-21T09:28:31.6259400Z .b8 47 2026-02-21T09:28:31.6259475Z .b8 116 2026-02-21T09:28:31.6259554Z .b8 111 2026-02-21T09:28:31.6259644Z .b8 114 2026-02-21T09:28:31.6259724Z .b8 99 2026-02-21T09:28:31.6259805Z .b8 104 2026-02-21T09:28:31.6259886Z .b8 105 2026-02-21T09:28:31.6259975Z .b8 110 2026-02-21T09:28:31.6260056Z .b8 100 2026-02-21T09:28:31.6260135Z .b8 117 2026-02-21T09:28:31.6260225Z .b8 99 2026-02-21T09:28:31.6260306Z .b8 116 2026-02-21T09:28:31.6260386Z .b8 111 2026-02-21T09:28:31.6260464Z .b8 114 2026-02-21T09:28:31.6260552Z .b8 95 2026-02-21T09:28:31.6260631Z .b8 114 2026-02-21T09:28:31.6260710Z .b8 111 2026-02-21T09:28:31.6260790Z .b8 111 2026-02-21T09:28:31.6260880Z .b8 116 2026-02-21T09:28:31.6260960Z .b8 47 2026-02-21T09:28:31.6261042Z .b8 101 2026-02-21T09:28:31.6261132Z .b8 122 2026-02-21T09:28:31.6261214Z .b8 0 2026-02-21T09:28:31.6261405Z } 2026-02-21T09:28:31.6261574Z .section .debug_macinfo { } 2026-02-21T09:28:31.6261584Z 2026-02-21T09:28:31.6261727Z ================================================================ 2026-02-21T09:28:31.6261910Z please share the reproducer above with Triton project. 
2026-02-21T09:28:33.2297761Z 2026-02-21T09:28:33.2298805Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 32/32 14.2 configs/s 2026-02-21T09:28:34.9218984Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 586.8 2026-02-21T09:28:34.9219705Z configs/s 2026-02-21T09:28:35.0235364Z [213s] Generation 8 complete: 2026-02-21T09:28:35.0235811Z error=10 2026-02-21T09:28:35.0236096Z ok=25 2026-02-21T09:28:35.0236384Z min=0.0718 2026-02-21T09:28:35.0236685Z mid=0.0911 2026-02-21T09:28:35.0236964Z max=27.1350 2026-02-21T09:28:35.0237791Z best={'block_sizes': [256, 256, 64], 2026-02-21T09:28:35.0238429Z 'indexing': ['pointer', 'pointer', 'tensor_descriptor'], 2026-02-21T09:28:35.0239016Z 'l2_groupings': [16], 2026-02-21T09:28:35.0239416Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:28:35.0239904Z 'loop_orders': [[1, 0]], 2026-02-21T09:28:35.0240263Z 'num_stages': 3, 2026-02-21T09:28:35.0240595Z 'num_warps': 8, 2026-02-21T09:28:35.0240917Z 'pid_type': 'flat', 2026-02-21T09:28:35.0241285Z 'range_flattens': [None, False], 2026-02-21T09:28:35.0241732Z 'range_multi_buffers': [None, True], 2026-02-21T09:28:35.0242172Z 'range_num_stages': [0, 0], 2026-02-21T09:28:35.0242573Z 'range_unroll_factors': [0, 0], 2026-02-21T09:28:35.0243010Z 'range_warp_specializes': [None, None]} 2026-02-21T09:28:35.0261344Z [213s] Fitting surrogate: 761 points, 761 targets 2026-02-21T09:28:35.2421926Z [214s] Autotuning complete in 214.0s after searching 732 configs. 
2026-02-21T09:28:35.2422646Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:28:35.2425387Z @helion.kernel(config=helion.Config(block_sizes=[256, 256, 64], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:28:35.2427623Z 2026-02-21T09:28:35.2428214Z [214s] Code of selected kernel: /tmp/torchinductor_root/s7/cs7vm2jlrejztuzmc6dhixu5tznn5tmjiqllffuzorznwnq3majd.py 2026-02-21T09:28:35.2556919Z from __future__ import annotations 2026-02-21T09:28:35.2557228Z 2026-02-21T09:28:35.2557340Z import torch 2026-02-21T09:28:35.2557586Z import helion 2026-02-21T09:28:35.2557834Z import triton 2026-02-21T09:28:35.2558113Z import triton.language as tl 2026-02-21T09:28:35.2558572Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:28:35.2558966Z 2026-02-21T09:28:35.2559089Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:28:35.2559459Z _BLOCK_SIZE_0 = tl.constexpr(256) 2026-02-21T09:28:35.2559788Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:28:35.2560124Z # src[matmul.py:42]: def matmul( 2026-02-21T09:28:35.2560450Z # src[matmul.py:43]: x: Tensor, 2026-02-21T09:28:35.2560811Z # src[matmul.py:44]: y: Tensor, 2026-02-21T09:28:35.2561139Z # src[matmul.py:42-68]: ... 
2026-02-21T09:28:35.2561481Z helion.runtime.set_triton_allocator() 2026-02-21T09:28:35.2561740Z 2026-02-21T09:28:35.2561848Z @triton.jit 2026-02-21T09:28:35.2562118Z def _helion_matmul(x, y, out): 2026-02-21T09:28:35.2562587Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:28:35.2563312Z out_desc = tl.make_tensor_descriptor(out, [8192, 2048], [2048, 1], [_BLOCK_SIZE_0, _BLOCK_SIZE_1]) 2026-02-21T09:28:35.2564013Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:28:35.2564473Z num_pid_m = tl.cdiv(2048, _BLOCK_SIZE_1) 2026-02-21T09:28:35.2564939Z num_pid_n = tl.cdiv(8192, _BLOCK_SIZE_0) 2026-02-21T09:28:35.2565751Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:28:35.2566135Z num_pid_in_group = 16 * num_pid_n 2026-02-21T09:28:35.2566517Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:28:35.2566906Z first_pid_m = group_id * 16 2026-02-21T09:28:35.2567288Z group_size_m = min(num_pid_m - first_pid_m, 16) 2026-02-21T09:28:35.2567806Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:28:35.2568356Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:28:35.2568784Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:28:35.2569236Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:28:35.2569698Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:28:35.2570127Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:28:35.2570866Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:28:35.2571455Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:28:35.2571943Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:28:35.2572486Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:28:35.2573275Z for offset_2 in tl.range(0, 2048, _BLOCK_SIZE_2, disallow_acc_multi_buffer=False, flatten=False): 
2026-02-21T09:28:35.2573964Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:28:35.2574414Z acc_copy = acc 2026-02-21T09:28:35.2574778Z acc_copy_0 = acc_copy 2026-02-21T09:28:35.2575261Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:28:35.2576062Z load = tl.load(x + (indices_0[:, None] * 2048 + indices_2[None, :] * 1), None, eviction_policy='evict_first') 2026-02-21T09:28:35.2576967Z load_1 = tl.load(y + (indices_2[:, None] * 1 + indices_1[None, :] * 2048), None, eviction_policy='evict_last') 2026-02-21T09:28:35.2577989Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:28:35.2578866Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:28:35.2579348Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:28:35.2579721Z out_desc.store([offset_0, offset_1], v_0) 2026-02-21T09:28:35.2579987Z 2026-02-21T09:28:35.2580548Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:28:35.2581321Z """ 2026-02-21T09:28:35.2581748Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:28:35.2582270Z Args: 2026-02-21T09:28:35.2582573Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:28:35.2582970Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:28:35.2583569Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:28:35.2584218Z after the matmul. Defaults to identity (no change). 2026-02-21T09:28:35.2584623Z Returns: 2026-02-21T09:28:35.2584949Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:28:35.2585321Z """ 2026-02-21T09:28:35.2585571Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:28:35.2585894Z m, k = x.size() 2026-02-21T09:28:35.2586171Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:28:35.2586501Z k2, n = y.size() 2026-02-21T09:28:35.2586887Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:28:35.2587369Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:28:35.2587759Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:28:35.2588323Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:28:35.2588909Z # src[matmul.py:62]: ) 2026-02-21T09:28:35.2589422Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:28:35.2590215Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:28:35.2590643Z _BLOCK_SIZE_1 = 256 2026-02-21T09:28:35.2590920Z _BLOCK_SIZE_0 = 256 2026-02-21T09:28:35.2591304Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:28:35.2591862Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:28:35.2592417Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:28:35.2592816Z # src[matmul.py:63-67]: ... 
2026-02-21T09:28:35.2593536Z _launcher(_helion_matmul, (triton.cdiv(2048, _BLOCK_SIZE_1) * triton.cdiv(8192, _BLOCK_SIZE_0),), x, y, out, num_warps=8, num_stages=3) 2026-02-21T09:28:35.2594336Z # src[matmul.py:68]: return out 2026-02-21T09:28:35.2594742Z return out 2026-02-21T09:28:55.0833831Z WARNING:tritonbench.utils.triton_op:Completed input ID 6: 2026-02-21T09:28:55.0834215Z (M, N, K) 2026-02-21T09:28:55.0834373Z ------------------ 2026-02-21T09:28:55.0834539Z (8192, 2048, 2048) 2026-02-21T09:28:55.0834635Z 2026-02-21T09:28:55.0843748Z 62%|██████▎ | 5/8 [30:51<19:07, 382.50s/it]WARNING:tritonbench.utils.triton_op:Running input ID 8: 2026-02-21T09:28:55.0844120Z (M, N, K) 2026-02-21T09:28:55.0844277Z ------------------- 2026-02-21T09:28:55.0847114Z (12288, 1024, 1024) 2026-02-21T09:28:55.0847449Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:29:40.3534024Z INFO:tritonbench.utils.triton_op:Took 0.02ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:30:11.4924131Z INFO:tritonbench.utils.triton_op:Took 97.08ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:30:48.0503615Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:30:48.0505310Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:30:48.0505566Z 'dtype': 'torch.float16', 2026-02-21T09:30:48.0505773Z 'shape': (12288, 1024), 2026-02-21T09:30:48.0505963Z 'stride': (1024, 1)}, 2026-02-21T09:30:48.0506141Z { 'device': 'cuda:0', 2026-02-21T09:30:48.0506306Z 'dtype': 'torch.float16', 2026-02-21T09:30:48.0506489Z 'shape': (1024, 1024), 2026-02-21T09:30:48.0506661Z 'stride': (1, 1024)}, 2026-02-21T09:30:48.0506815Z None), 2026-02-21T09:30:48.0506960Z 'kwargs': {}} 2026-02-21T09:30:48.0546393Z INFO:tritonbench.utils.triton_op:Took 4.98ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:30:48.1442419Z [0s] Autotune random seed: 2137757931 2026-02-21T09:30:48.2476794Z [0s] Starting LFBOPatternSearch with 
initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:30:56.8731595Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 13.8 configs/s 2026-02-21T09:31:01.6328637Z 2026-02-21T09:31:01.6328686Z 2026-02-21T09:31:01.6329148Z ================================================================ 2026-02-21T09:31:01.6329526Z Internal Triton PTX codegen error 2026-02-21T09:31:01.6329742Z `ptxas` stderr: 2026-02-21T09:31:01.6330194Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 277 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:01.6330690Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:01.6330834Z 2026-02-21T09:31:01.6331262Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5lxm22xb.ptx -o /tmp/tmp5lxm22xb.ptx.o 2026-02-21T09:31:01.6331745Z 2026-02-21T09:31:01.6331997Z [13s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:31:01.6332232Z 2026-02-21T09:31:01.6332300Z // 2026-02-21T09:31:01.6333387Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[True, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:01.6335825Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:01.6336095Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:01.6336319Z `ptxas` stderr: 2026-02-21T09:31:01.6336752Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 277 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:01.6337246Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:01.6337402Z 2026-02-21T09:31:01.6337962Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp5lxm22xb.ptx -o /tmp/tmp5lxm22xb.ptx.o 2026-02-21T09:31:01.6338421Z 2026-02-21T09:31:01.6338552Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:31:01.6338797Z // 2026-02-21T09:31:01.6338869Z 2026-02-21T09:31:01.6338925Z .version 8.7 2026-02-21T09:31:01.6339076Z .target sm_100a 2026-02-21T09:31:01.6339211Z .address_size 64 2026-02-21T09:31:01.6339307Z 2026-02-21T09:31:01.6339432Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:01.6339702Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:01.6339913Z // @_helion_matmul 2026-02-21T09:31:01.6340121Z .visible .entry _helion_matmul( 2026-02-21T09:31:01.6340336Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:01.6340592Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:01.6340837Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:01.6341091Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:01.6341357Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:01.6341557Z ) 2026-02-21T09:31:01.6341685Z .reqntid 128 2026-02-21T09:31:01.6341819Z .maxnreg 32 2026-02-21T09:31:01.6341959Z { 2026-02-21T09:31:01.6342084Z .reg .pred %p<113>; 2026-02-21T09:31:01.6342242Z .reg .b16 %rs<8>; 2026-02-21T09:31:01.6342383Z .reg .b32 %r<310>; 2026-02-21T09:31:01.6342549Z .reg .b64 %rd<102>; 2026-02-21T09:31:01.6342821Z .loc 1 19 0 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:19:0 2026-02-21T09:31:01.6343106Z $L__func_begin0: 2026-02-21T09:31:01.6343360Z .loc 1 19 0 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:19:0 2026-02-21T09:31:01.6343586Z 2026-02-21T09:31:01.6343640Z // %bb.0: 2026-02-21T09:31:01.6343800Z ld.param.b64 %rd7, [_helion_matmul_param_0]; 2026-02-21T09:31:01.6343983Z $L__tmp0: 2026-02-21T09:31:01.6344220Z .loc 1 19 0 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:19 2026-02-21T09:31:01.6344497Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:01.6344716Z ld.param.b64 %rd25, [_helion_matmul_param_1]; 2026-02-21T09:31:01.6344918Z setp.lt.u32 %p1, %r1, 32; 
2026-02-21T09:31:01.6345077Z mov.b32 %r38, global_smem; 2026-02-21T09:31:01.6345236Z // begin inline asm 2026-02-21T09:31:01.6345472Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r38], 32; 2026-02-21T09:31:01.6345715Z // end inline asm 2026-02-21T09:31:01.6345870Z ld.param.b64 %rd42, [_helion_matmul_param_3]; 2026-02-21T09:31:01.6346060Z bar.sync 0; 2026-02-21T09:31:01.6346199Z ld.shared.b32 %r300, [global_smem]; 2026-02-21T09:31:01.6346372Z bar.sync 0; 2026-02-21T09:31:01.6346503Z // begin inline asm 2026-02-21T09:31:01.6346696Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:01.6346920Z // end inline asm 2026-02-21T09:31:01.6347161Z .loc 1 21 68 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:21:68 2026-02-21T09:31:01.6347525Z mov.u32 %r55, %ctaid.x; 2026-02-21T09:31:01.6347669Z mov.u32 %r56, %ctaid.y; 2026-02-21T09:31:01.6347817Z mov.u32 %r57, %ctaid.z; 2026-02-21T09:31:01.6347960Z mov.u32 %r58, %nctaid.x; 2026-02-21T09:31:01.6348114Z mov.u32 %r59, %nctaid.y; 2026-02-21T09:31:01.6348275Z mad.lo.s32 %r60, %r57, %r59, %r56; 2026-02-21T09:31:01.6348444Z mad.lo.s32 %r61, %r60, %r58, %r55; 2026-02-21T09:31:01.6348613Z shl.b32 %r62, %r61, 8; 2026-02-21T09:31:01.6348758Z cvt.s64.s32 %rd43, %r62; 2026-02-21T09:31:01.6348916Z add.s64 %rd21, %rd42, %rd43; 2026-02-21T09:31:01.6349072Z shl.b32 %r63, %r1, 2; 2026-02-21T09:31:01.6349222Z add.s32 %r39, %r38, %r63; 2026-02-21T09:31:01.6349364Z mov.b32 %r48, 0; 2026-02-21T09:31:01.6349503Z // begin inline asm 2026-02-21T09:31:01.6349649Z @%p1 st.shared.b32 [ %r39 + 0 ], %r48; 2026-02-21T09:31:01.6349851Z // end inline asm 2026-02-21T09:31:01.6350023Z bar.warp.sync -1; 2026-02-21T09:31:01.6350170Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:31:01.6350326Z cvt.u64.u32 %rd6, %r38; 2026-02-21T09:31:01.6350469Z // begin inline asm 2026-02-21T09:31:01.6350720Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd6 + 0 ], %rd7; 2026-02-21T09:31:01.6350987Z // end 
inline asm 2026-02-21T09:31:01.6351122Z // begin inline asm 2026-02-21T09:31:01.6351337Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1; 2026-02-21T09:31:01.6351591Z // end inline asm 2026-02-21T09:31:01.6351726Z mov.b32 %r41, 16; 2026-02-21T09:31:01.6351857Z // begin inline asm 2026-02-21T09:31:01.6352091Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r41; 2026-02-21T09:31:01.6352345Z // end inline asm 2026-02-21T09:31:01.6352484Z mov.b32 %r42, 128; 2026-02-21T09:31:01.6352620Z // begin inline asm 2026-02-21T09:31:01.6352852Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r42; 2026-02-21T09:31:01.6353110Z // end inline asm 2026-02-21T09:31:01.6353241Z mov.b32 %r43, 1024; 2026-02-21T09:31:01.6353385Z // begin inline asm 2026-02-21T09:31:01.6353611Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r43; 2026-02-21T09:31:01.6354005Z // end inline asm 2026-02-21T09:31:01.6354132Z mov.b32 %r44, 12288; 2026-02-21T09:31:01.6354275Z // begin inline asm 2026-02-21T09:31:01.6354500Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r44; 2026-02-21T09:31:01.6354797Z // end inline asm 2026-02-21T09:31:01.6354932Z mov.b64 %rd14, 2048; 2026-02-21T09:31:01.6355069Z // begin inline asm 2026-02-21T09:31:01.6355319Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd6 + 0 ], 0x0, %rd14; 2026-02-21T09:31:01.6355591Z // end inline asm 2026-02-21T09:31:01.6355721Z mov.b32 %r45, 1; 2026-02-21T09:31:01.6355848Z // begin inline asm 2026-02-21T09:31:01.6356097Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r45; 2026-02-21T09:31:01.6356379Z // end inline asm 2026-02-21T09:31:01.6356510Z // begin inline asm 2026-02-21T09:31:01.6356756Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r45; 2026-02-21T09:31:01.6357030Z // end inline asm 
2026-02-21T09:31:01.6357169Z // begin inline asm 2026-02-21T09:31:01.6357409Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x6; 2026-02-21T09:31:01.6365791Z // end inline asm 2026-02-21T09:31:01.6365959Z // begin inline asm 2026-02-21T09:31:01.6366247Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:31:01.6366538Z // end inline asm 2026-02-21T09:31:01.6366685Z // begin inline asm 2026-02-21T09:31:01.6366929Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1; 2026-02-21T09:31:01.6367192Z // end inline asm 2026-02-21T09:31:01.6367344Z // begin inline asm 2026-02-21T09:31:01.6367573Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:31:01.6367970Z // end inline asm 2026-02-21T09:31:01.6368106Z // begin inline asm 2026-02-21T09:31:01.6368455Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd21 + 0 ], [ %rd6 + 0 ], 0x80; 2026-02-21T09:31:01.6368830Z // end inline asm 2026-02-21T09:31:01.6368963Z // begin inline asm 2026-02-21T09:31:01.6369183Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd21 + 0 ], 0x80; 2026-02-21T09:31:01.6369431Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:01.6369628Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:01.6369802Z // end inline asm 2026-02-21T09:31:01.6369943Z bar.sync 0; 2026-02-21T09:31:01.6370083Z cvta.global.u64 %rd59, %rd21; 2026-02-21T09:31:01.6370370Z .loc 1 22 67 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:22:67 2026-02-21T09:31:01.6370714Z add.s64 %rd39, %rd21, 128; 2026-02-21T09:31:01.6370899Z bar.sync 0; 2026-02-21T09:31:01.6371039Z // begin inline asm 2026-02-21T09:31:01.6371187Z @%p1 st.shared.b32 [ %r39 + 0 ], %r48; 2026-02-21T09:31:01.6371363Z // end inline asm 2026-02-21T09:31:01.6371503Z bar.warp.sync -1; 2026-02-21T09:31:01.6371653Z // begin inline asm 2026-02-21T09:31:01.6371896Z 
@%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd6 + 0 ], %rd25; 2026-02-21T09:31:01.6372176Z // end inline asm 2026-02-21T09:31:01.6372318Z // begin inline asm 2026-02-21T09:31:01.6372531Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1; 2026-02-21T09:31:01.6372780Z // end inline asm 2026-02-21T09:31:01.6372914Z // begin inline asm 2026-02-21T09:31:01.6373149Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r41; 2026-02-21T09:31:01.6373402Z // end inline asm 2026-02-21T09:31:01.6373543Z // begin inline asm 2026-02-21T09:31:01.6373774Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r41; 2026-02-21T09:31:01.6374026Z // end inline asm 2026-02-21T09:31:01.6374173Z // begin inline asm 2026-02-21T09:31:01.6374408Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r43; 2026-02-21T09:31:01.6374738Z // end inline asm 2026-02-21T09:31:01.6374872Z // begin inline asm 2026-02-21T09:31:01.6375106Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r43; 2026-02-21T09:31:01.6375367Z // end inline asm 2026-02-21T09:31:01.6375512Z // begin inline asm 2026-02-21T09:31:01.6375759Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd6 + 0 ], 0x0, %rd14; 2026-02-21T09:31:01.6376032Z // end inline asm 2026-02-21T09:31:01.6376170Z // begin inline asm 2026-02-21T09:31:01.6376415Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r45; 2026-02-21T09:31:01.6376705Z // end inline asm 2026-02-21T09:31:01.6376842Z // begin inline asm 2026-02-21T09:31:01.6377104Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r45; 2026-02-21T09:31:01.6377401Z // end inline asm 2026-02-21T09:31:01.6377538Z // begin inline asm 2026-02-21T09:31:01.6377785Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x6; 2026-02-21T09:31:01.6378051Z 
// end inline asm 2026-02-21T09:31:01.6378200Z // begin inline asm 2026-02-21T09:31:01.6378450Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:31:01.6378751Z // end inline asm 2026-02-21T09:31:01.6378887Z // begin inline asm 2026-02-21T09:31:01.6379138Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1; 2026-02-21T09:31:01.6379416Z // end inline asm 2026-02-21T09:31:01.6379554Z // begin inline asm 2026-02-21T09:31:01.6379794Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:31:01.6380052Z // end inline asm 2026-02-21T09:31:01.6380199Z // begin inline asm 2026-02-21T09:31:01.6380545Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd39 + 0 ], [ %rd6 + 0 ], 0x80; 2026-02-21T09:31:01.6381041Z // end inline asm 2026-02-21T09:31:01.6381187Z // begin inline asm 2026-02-21T09:31:01.6381400Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd39 + 0 ], 0x80; 2026-02-21T09:31:01.6381665Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:01.6381862Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:01.6382055Z // end inline asm 2026-02-21T09:31:01.6382192Z bar.sync 0; 2026-02-21T09:31:01.6382344Z cvta.global.u64 %rd60, %rd39; 2026-02-21T09:31:01.6382636Z .loc 1 29 35 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:29:35 2026-02-21T09:31:01.6382951Z shl.b32 %r301, %r55, 1; 2026-02-21T09:31:01.6383238Z .loc 1 30 37 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:30:37 2026-02-21T09:31:01.6383560Z add.s32 %r64, %r301, 2; 2026-02-21T09:31:01.6383876Z .loc 1 30 49 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:30:49 2026-02-21T09:31:01.6384159Z min.s32 %r4, %r64, 6144; 2026-02-21T09:31:01.6384438Z .loc 1 31 107 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:31:107 2026-02-21T09:31:01.6384784Z setp.ge.s32 %p39, %r301, %r4; 2026-02-21T09:31:01.6384946Z 
@%p39 bra $L__BB0_9; 2026-02-21T09:31:01.6385121Z // %bb.1: // %.lr.ph 2026-02-21T09:31:01.6385423Z .loc 1 0 107 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:0:107 2026-02-21T09:31:01.6385746Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:31:01.6385934Z shr.u32 %r5, %r1, 5; 2026-02-21T09:31:01.6386092Z bfe.u32 %r6, %r1, 1, 6; 2026-02-21T09:31:01.6386238Z or.b32 %r7, %r6, 64; 2026-02-21T09:31:01.6386389Z and.b32 %r8, %r1, 1; 2026-02-21T09:31:01.6386532Z shl.b32 %r9, %r8, 3; 2026-02-21T09:31:01.6386681Z bfe.u32 %r66, %r38, 4, 14; 2026-02-21T09:31:01.6386855Z cvt.u64.u32 %rd44, %r66; 2026-02-21T09:31:01.6387030Z or.b64 %rd56, %rd44, -4611685949691133952; 2026-02-21T09:31:01.6387229Z add.s32 %r67, %r38, 24576; 2026-02-21T09:31:01.6387383Z bfe.u32 %r68, %r67, 4, 14; 2026-02-21T09:31:01.6387543Z cvt.u64.u32 %rd45, %r68; 2026-02-21T09:31:01.6387700Z or.b64 %rd57, %rd45, -4611685949705814016; 2026-02-21T09:31:01.6387884Z shl.b32 %r69, %r1, 4; 2026-02-21T09:31:01.6388031Z and.b32 %r70, %r69, 1968; 2026-02-21T09:31:01.6388190Z bfe.s32 %r71, %r1, 2, 1; 2026-02-21T09:31:01.6388344Z and.b32 %r72, %r71, 2112; 2026-02-21T09:31:01.6388493Z or.b32 %r73, %r72, %r70; 2026-02-21T09:31:01.6388649Z add.s32 %r10, %r38, %r73; 2026-02-21T09:31:01.6388797Z xor.b32 %r74, %r73, 64; 2026-02-21T09:31:01.6388954Z add.s32 %r11, %r38, %r74; 2026-02-21T09:31:01.6389100Z shl.b32 %r75, %r1, 3; 2026-02-21T09:31:01.6389249Z and.b32 %r76, %r75, 944; 2026-02-21T09:31:01.6389395Z shl.b32 %r77, %r8, 6; 2026-02-21T09:31:01.6389550Z bfe.s32 %r78, %r1, 3, 1; 2026-02-21T09:31:01.6389696Z and.b32 %r79, %r78, 2112; 2026-02-21T09:31:01.6389855Z or.b32 %r80, %r76, %r77; 2026-02-21T09:31:01.6390008Z xor.b32 %r81, %r80, %r79; 2026-02-21T09:31:01.6390153Z add.s32 %r12, %r38, %r81; 2026-02-21T09:31:01.6390305Z bra.uni $L__BB0_2; 2026-02-21T09:31:01.6390488Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:01.6390703Z mov.b32 %r237, 1; 2026-02-21T09:31:01.6390948Z .loc 1 56 52 // 
cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6391237Z // begin inline asm 2026-02-21T09:31:01.6391370Z 2026-02-21T09:31:01.6391492Z { 2026-02-21T09:31:01.6391614Z .reg .pred complete; 2026-02-21T09:31:01.6391764Z waitLoop: 2026-02-21T09:31:01.6391959Z mbarrier.try_wait.parity.shared.b64 complete, [%r236], %r237; 2026-02-21T09:31:01.6392189Z @!complete bra.uni waitLoop; 2026-02-21T09:31:01.6392351Z } 2026-02-21T09:31:01.6392419Z 2026-02-21T09:31:01.6392475Z // end inline asm 2026-02-21T09:31:01.6392735Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6393079Z bar.sync 0; 2026-02-21T09:31:01.6393218Z // begin inline asm 2026-02-21T09:31:01.6393391Z @%p4 mbarrier.inval.shared::cta.b64 [%r101]; 2026-02-21T09:31:01.6393575Z // end inline asm 2026-02-21T09:31:01.6393714Z bar.sync 0; 2026-02-21T09:31:01.6393840Z // begin inline asm 2026-02-21T09:31:01.6394007Z @%p4 mbarrier.inval.shared::cta.b64 [%r102]; 2026-02-21T09:31:01.6394184Z // end inline asm 2026-02-21T09:31:01.6394324Z bar.sync 0; 2026-02-21T09:31:01.6394452Z // begin inline asm 2026-02-21T09:31:01.6394621Z @%p4 mbarrier.inval.shared::cta.b64 [%r103]; 2026-02-21T09:31:01.6394847Z // end inline asm 2026-02-21T09:31:01.6394987Z bar.sync 0; 2026-02-21T09:31:01.6395123Z // begin inline asm 2026-02-21T09:31:01.6395276Z @%p4 mbarrier.inval.shared::cta.b64 [%r104]; 2026-02-21T09:31:01.6395489Z // end inline asm 2026-02-21T09:31:01.6395645Z bar.sync 0; 2026-02-21T09:31:01.6395786Z // begin inline asm 2026-02-21T09:31:01.6395942Z @%p4 mbarrier.inval.shared::cta.b64 [%r105]; 2026-02-21T09:31:01.6396129Z // end inline asm 2026-02-21T09:31:01.6396257Z bar.sync 0; 2026-02-21T09:31:01.6396389Z // begin inline asm 2026-02-21T09:31:01.6396541Z @%p4 mbarrier.inval.shared::cta.b64 [%r178]; 2026-02-21T09:31:01.6396723Z // end inline asm 2026-02-21T09:31:01.6396869Z add.s32 %r244, %r38, 27696; 2026-02-21T09:31:01.6397022Z // begin inline asm 
2026-02-21T09:31:01.6397185Z @%p4 mbarrier.inval.shared::cta.b64 [%r244]; 2026-02-21T09:31:01.6397358Z // end inline asm 2026-02-21T09:31:01.6397494Z bar.sync 0; 2026-02-21T09:31:01.6397618Z // begin inline asm 2026-02-21T09:31:01.6397776Z @%p4 mbarrier.inval.shared::cta.b64 [%r100]; 2026-02-21T09:31:01.6397949Z // end inline asm 2026-02-21T09:31:01.6398206Z .loc 1 59 45 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:59:45 2026-02-21T09:31:01.6398496Z shl.b32 %r272, %r16, 10; 2026-02-21T09:31:01.6398643Z shl.b32 %r273, %r17, 10; 2026-02-21T09:31:01.6398903Z .loc 1 59 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:59:52 2026-02-21T09:31:01.6399192Z add.s32 %r274, %r272, %r19; 2026-02-21T09:31:01.6399342Z add.s32 %r275, %r273, %r19; 2026-02-21T09:31:01.6399599Z .loc 1 59 24 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:59:24 2026-02-21T09:31:01.6399884Z mad.wide.s32 %rd68, %r274, 2, %rd5; 2026-02-21T09:31:01.6400084Z mad.wide.s32 %rd69, %r275, 2, %rd5; 2026-02-21T09:31:01.6400360Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6400654Z // begin inline asm 2026-02-21T09:31:01.6401019Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254, %r255, %r256, %r257, %r258, %r259, %r260, %r261}, [%r262 + 0]; 2026-02-21T09:31:01.6401391Z // end inline asm 2026-02-21T09:31:01.6401529Z // begin inline asm 2026-02-21T09:31:01.6401675Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:01.6401839Z // end inline asm 2026-02-21T09:31:01.6401969Z cvt.u64.u32 %rd70, %r246; 2026-02-21T09:31:01.6402121Z cvt.u64.u32 %rd71, %r247; 2026-02-21T09:31:01.6402263Z shl.b64 %rd72, %rd71, 32; 2026-02-21T09:31:01.6402416Z or.b64 %rd73, %rd70, %rd72; 2026-02-21T09:31:01.6402678Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6402965Z mov.b64 {%r276, %r277}, %rd73; 2026-02-21T09:31:01.6403137Z cvt.rn.f16x2.f32 
%r278, %r277, %r276; 2026-02-21T09:31:01.6403406Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6403683Z cvt.u64.u32 %rd74, %r248; 2026-02-21T09:31:01.6403826Z cvt.u64.u32 %rd75, %r249; 2026-02-21T09:31:01.6403976Z shl.b64 %rd76, %rd75, 32; 2026-02-21T09:31:01.6404121Z or.b64 %rd77, %rd74, %rd76; 2026-02-21T09:31:01.6404385Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6404780Z mov.b64 {%r279, %r280}, %rd77; 2026-02-21T09:31:01.6404944Z cvt.rn.f16x2.f32 %r281, %r280, %r279; 2026-02-21T09:31:01.6405224Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6405503Z cvt.u64.u32 %rd78, %r250; 2026-02-21T09:31:01.6405653Z cvt.u64.u32 %rd79, %r251; 2026-02-21T09:31:01.6405796Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:31:01.6405947Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:31:01.6406216Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6406497Z mov.b64 {%r282, %r283}, %rd81; 2026-02-21T09:31:01.6406665Z cvt.rn.f16x2.f32 %r284, %r283, %r282; 2026-02-21T09:31:01.6406995Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6407300Z cvt.u64.u32 %rd82, %r252; 2026-02-21T09:31:01.6407458Z cvt.u64.u32 %rd83, %r253; 2026-02-21T09:31:01.6407602Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:31:01.6407755Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:31:01.6408006Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6408291Z mov.b64 {%r285, %r286}, %rd85; 2026-02-21T09:31:01.6408449Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 2026-02-21T09:31:01.6408725Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6409012Z cvt.u64.u32 %rd86, %r254; 2026-02-21T09:31:01.6409155Z cvt.u64.u32 %rd87, %r255; 2026-02-21T09:31:01.6409305Z shl.b64 %rd88, %rd87, 32; 
2026-02-21T09:31:01.6409450Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:31:01.6409710Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6409985Z mov.b64 {%r288, %r289}, %rd89; 2026-02-21T09:31:01.6410155Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:31:01.6410431Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6410704Z cvt.u64.u32 %rd90, %r256; 2026-02-21T09:31:01.6410857Z cvt.u64.u32 %rd91, %r257; 2026-02-21T09:31:01.6410998Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:31:01.6411149Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:31:01.6411400Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6411676Z mov.b64 {%r291, %r292}, %rd93; 2026-02-21T09:31:01.6411831Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:31:01.6412106Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6412382Z cvt.u64.u32 %rd94, %r258; 2026-02-21T09:31:01.6412523Z cvt.u64.u32 %rd95, %r259; 2026-02-21T09:31:01.6412674Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:31:01.6412818Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:31:01.6413077Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6413358Z mov.b64 {%r294, %r295}, %rd97; 2026-02-21T09:31:01.6413526Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:31:01.6413791Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6414078Z cvt.u64.u32 %rd98, %r260; 2026-02-21T09:31:01.6414228Z cvt.u64.u32 %rd99, %r261; 2026-02-21T09:31:01.6414374Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:31:01.6414534Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:31:01.6414843Z .loc 1 58 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:58:27 2026-02-21T09:31:01.6415131Z mov.b64 {%r297, %r298}, %rd101; 2026-02-21T09:31:01.6415295Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 
2026-02-21T09:31:01.6415581Z .loc 1 59 82 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:59:82 2026-02-21T09:31:01.6415935Z st.shared.v4.b32 [%r10], {%r278, %r281, %r284, %r287}; 2026-02-21T09:31:01.6416191Z st.shared.v4.b32 [%r11], {%r290, %r293, %r296, %r299}; 2026-02-21T09:31:01.6416390Z bar.sync 0; 2026-02-21T09:31:01.6416560Z ld.shared.v4.b32 {%r267, %r268, %r269, %r270}, [%r12+1024]; 2026-02-21T09:31:01.6416804Z ld.shared.v4.b32 {%r263, %r264, %r265, %r266}, [%r12]; 2026-02-21T09:31:01.6416995Z // begin inline asm 2026-02-21T09:31:01.6417187Z st.global.v4.b32 [ %rd68 + 0 ], { %r263, %r264, %r265, %r266 }; 2026-02-21T09:31:01.6417397Z // end inline asm 2026-02-21T09:31:01.6417531Z // begin inline asm 2026-02-21T09:31:01.6417710Z st.global.v4.b32 [ %rd69 + 0 ], { %r267, %r268, %r269, %r270 }; 2026-02-21T09:31:01.6417913Z // end inline asm 2026-02-21T09:31:01.6418166Z .loc 1 31 107 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:31:107 2026-02-21T09:31:01.6418483Z add.s32 %r301, %r301, 1; 2026-02-21T09:31:01.6418667Z setp.ne.b32 %p111, %r301, %r4; 2026-02-21T09:31:01.6418827Z @%p111 bra $L__BB0_2; 2026-02-21T09:31:01.6418978Z bra.uni $L__BB0_9; 2026-02-21T09:31:01.6419156Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:31:01.6419396Z // Child Loop BB0_5 Depth 2 2026-02-21T09:31:01.6419707Z .loc 1 37 35 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:37:35 2026-02-21T09:31:01.6419994Z shr.s32 %r154, %r301, 31; 2026-02-21T09:31:01.6420154Z shr.u32 %r155, %r154, 22; 2026-02-21T09:31:01.6420310Z add.s32 %r156, %r301, %r155; 2026-02-21T09:31:01.6420591Z .loc 1 40 45 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:40:45 2026-02-21T09:31:01.6420898Z and.b32 %r157, %r156, 64512; 2026-02-21T09:31:01.6421064Z sub.s32 %r158, %r301, %r157; 2026-02-21T09:31:01.6421348Z .loc 1 40 64 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:40:64 2026-02-21T09:31:01.6421646Z cvt.u16.u32 %rs1, %r158; 
2026-02-21T09:31:01.6421924Z .loc 1 41 51 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:41:51 2026-02-21T09:31:01.6422220Z shr.s16 %rs2, %rs1, 15; 2026-02-21T09:31:01.6422382Z shr.u16 %rs3, %rs2, 12; 2026-02-21T09:31:01.6422534Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T09:31:01.6422694Z shr.s16 %rs5, %rs4, 4; 2026-02-21T09:31:01.6422961Z .loc 1 40 64 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:40:64 2026-02-21T09:31:01.6423254Z and.b16 %rs6, %rs4, -16; 2026-02-21T09:31:01.6423414Z sub.s16 %rs7, %rs1, %rs6; 2026-02-21T09:31:01.6423678Z .loc 1 42 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:42:27 2026-02-21T09:31:01.6423976Z shl.b32 %r159, %r156, 1; 2026-02-21T09:31:01.6424126Z and.b32 %r160, %r159, -2048; 2026-02-21T09:31:01.6424300Z mad.wide.s16 %r181, %rs7, 128, %r160; 2026-02-21T09:31:01.6424592Z .loc 1 44 27 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:44:27 2026-02-21T09:31:01.6424936Z mul.wide.s16 %r185, %rs5, 16; 2026-02-21T09:31:01.6425218Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6425522Z shfl.sync.idx.b32 %r20, %r5, 0, 31, -1; 2026-02-21T09:31:01.6425715Z shl.b32 %r161, %r20, 21; 2026-02-21T09:31:01.6425870Z and.b32 %r162, %r161, 6291456; 2026-02-21T09:31:01.6426037Z add.s32 %r262, %r162, %r300; 2026-02-21T09:31:01.6426194Z mov.pred %p40, -1; 2026-02-21T09:31:01.6426348Z mov.b32 %r302, 0; 2026-02-21T09:31:01.6426487Z // begin inline asm 2026-02-21T09:31:01.6426879Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r262 + 0], {%r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302, %r302}; 2026-02-21T09:31:01.6427289Z // end inline asm 2026-02-21T09:31:01.6427443Z // begin inline asm 2026-02-21T09:31:01.6427605Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:01.6427794Z // end inline asm 2026-02-21T09:31:01.6427959Z bar.sync 0; 2026-02-21T09:31:01.6428201Z .loc 1 50 57 // 
cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6428496Z add.s32 %r303, %r38, 27696; 2026-02-21T09:31:01.6428652Z // begin inline asm 2026-02-21T09:31:01.6428812Z @%p4 mbarrier.init.shared::cta.b64 [%r303], 1; 2026-02-21T09:31:01.6429000Z // end inline asm 2026-02-21T09:31:01.6429127Z bar.sync 0; 2026-02-21T09:31:01.6429261Z add.s32 %r100, %r38, 27704; 2026-02-21T09:31:01.6429408Z // begin inline asm 2026-02-21T09:31:01.6429574Z @%p4 mbarrier.init.shared::cta.b64 [%r100], 1; 2026-02-21T09:31:01.6429754Z // end inline asm 2026-02-21T09:31:01.6429895Z add.s32 %r101, %r38, 27648; 2026-02-21T09:31:01.6430041Z // begin inline asm 2026-02-21T09:31:01.6430200Z @%p4 mbarrier.init.shared::cta.b64 [%r101], 1; 2026-02-21T09:31:01.6430410Z // end inline asm 2026-02-21T09:31:01.6430541Z bar.sync 0; 2026-02-21T09:31:01.6430706Z add.s32 %r102, %r38, 27656; 2026-02-21T09:31:01.6430857Z // begin inline asm 2026-02-21T09:31:01.6431016Z @%p4 mbarrier.init.shared::cta.b64 [%r102], 1; 2026-02-21T09:31:01.6431193Z // end inline asm 2026-02-21T09:31:01.6431326Z bar.sync 0; 2026-02-21T09:31:01.6431448Z add.s32 %r103, %r38, 27664; 2026-02-21T09:31:01.6431600Z // begin inline asm 2026-02-21T09:31:01.6431759Z @%p4 mbarrier.init.shared::cta.b64 [%r103], 1; 2026-02-21T09:31:01.6431934Z // end inline asm 2026-02-21T09:31:01.6432066Z bar.sync 0; 2026-02-21T09:31:01.6432190Z add.s32 %r104, %r38, 27672; 2026-02-21T09:31:01.6432343Z // begin inline asm 2026-02-21T09:31:01.6432496Z @%p4 mbarrier.init.shared::cta.b64 [%r104], 1; 2026-02-21T09:31:01.6432678Z // end inline asm 2026-02-21T09:31:01.6432802Z bar.sync 0; 2026-02-21T09:31:01.6432933Z add.s32 %r105, %r38, 27680; 2026-02-21T09:31:01.6433076Z // begin inline asm 2026-02-21T09:31:01.6433236Z @%p4 mbarrier.init.shared::cta.b64 [%r105], 1; 2026-02-21T09:31:01.6433417Z // end inline asm 2026-02-21T09:31:01.6433545Z bar.sync 0; 2026-02-21T09:31:01.6433676Z add.s32 %r178, %r38, 27688; 2026-02-21T09:31:01.6433821Z // 
begin inline asm 2026-02-21T09:31:01.6433978Z @%p4 mbarrier.init.shared::cta.b64 [%r178], 1; 2026-02-21T09:31:01.6434152Z // end inline asm 2026-02-21T09:31:01.6434285Z bar.sync 0; 2026-02-21T09:31:01.6434405Z // begin inline asm 2026-02-21T09:31:01.6434594Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r101], 4608; 2026-02-21T09:31:01.6434849Z // end inline asm 2026-02-21T09:31:01.6435087Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6435376Z // begin inline asm 2026-02-21T09:31:01.6435524Z fence.proxy.async.shared::cta; 2026-02-21T09:31:01.6435691Z // end inline asm 2026-02-21T09:31:01.6435819Z bar.sync 0; 2026-02-21T09:31:01.6435959Z elect.sync %r163|%p65, -1; 2026-02-21T09:31:01.6436122Z and.pred %p50, %p1, %p65; 2026-02-21T09:31:01.6436284Z // begin inline asm 2026-02-21T09:31:01.6436622Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r38], [%rd59, {%r302, %r181}], [%r101]; 2026-02-21T09:31:01.6436984Z // end inline asm 2026-02-21T09:31:01.6437238Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6437524Z bar.sync 0; 2026-02-21T09:31:01.6437664Z elect.sync %r164|%p66, -1; 2026-02-21T09:31:01.6437820Z and.pred %p51, %p1, %p66; 2026-02-21T09:31:01.6437973Z // begin inline asm 2026-02-21T09:31:01.6438281Z @%p51 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r67], [%rd60, {%r302, %r185}], [%r101]; 2026-02-21T09:31:01.6438626Z // end inline asm 2026-02-21T09:31:01.6438868Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6439147Z bar.sync 0; 2026-02-21T09:31:01.6439281Z // begin inline asm 2026-02-21T09:31:01.6439462Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r102], 4608; 2026-02-21T09:31:01.6439734Z // end inline asm 2026-02-21T09:31:01.6439968Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 
2026-02-21T09:31:01.6440243Z bar.sync 0; 2026-02-21T09:31:01.6440378Z elect.sync %r165|%p67, -1; 2026-02-21T09:31:01.6440533Z and.pred %p53, %p1, %p67; 2026-02-21T09:31:01.6440690Z add.s32 %r117, %r38, 4096; 2026-02-21T09:31:01.6440835Z mov.b32 %r118, 16; 2026-02-21T09:31:01.6440977Z // begin inline asm 2026-02-21T09:31:01.6441289Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r117], [%rd59, {%r118, %r181}], [%r102]; 2026-02-21T09:31:01.6441648Z // end inline asm 2026-02-21T09:31:01.6441885Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6442178Z bar.sync 0; 2026-02-21T09:31:01.6442345Z elect.sync %r166|%p68, -1; 2026-02-21T09:31:01.6442529Z and.pred %p54, %p1, %p68; 2026-02-21T09:31:01.6442692Z add.s32 %r121, %r38, 25088; 2026-02-21T09:31:01.6442845Z // begin inline asm 2026-02-21T09:31:01.6443163Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd60, {%r118, %r185}], [%r102]; 2026-02-21T09:31:01.6443538Z // end inline asm 2026-02-21T09:31:01.6443804Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6444109Z bar.sync 0; 2026-02-21T09:31:01.6444234Z // begin inline asm 2026-02-21T09:31:01.6444424Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r103], 4608; 2026-02-21T09:31:01.6444649Z // end inline asm 2026-02-21T09:31:01.6444956Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6445259Z bar.sync 0; 2026-02-21T09:31:01.6445398Z elect.sync %r167|%p69, -1; 2026-02-21T09:31:01.6445559Z and.pred %p56, %p1, %p69; 2026-02-21T09:31:01.6445724Z add.s32 %r126, %r38, 8192; 2026-02-21T09:31:01.6445879Z mov.b32 %r127, 32; 2026-02-21T09:31:01.6446015Z // begin inline asm 2026-02-21T09:31:01.6446360Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r126], [%rd59, {%r127, %r181}], [%r103]; 2026-02-21T09:31:01.6446730Z 
// end inline asm 2026-02-21T09:31:01.6447005Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6447313Z bar.sync 0; 2026-02-21T09:31:01.6447460Z elect.sync %r168|%p70, -1; 2026-02-21T09:31:01.6447623Z and.pred %p57, %p1, %p70; 2026-02-21T09:31:01.6447790Z add.s32 %r130, %r38, 25600; 2026-02-21T09:31:01.6447957Z // begin inline asm 2026-02-21T09:31:01.6448295Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r130], [%rd60, {%r127, %r185}], [%r103]; 2026-02-21T09:31:01.6448663Z // end inline asm 2026-02-21T09:31:01.6448907Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6449198Z bar.sync 0; 2026-02-21T09:31:01.6449322Z // begin inline asm 2026-02-21T09:31:01.6449507Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r104], 4608; 2026-02-21T09:31:01.6449719Z // end inline asm 2026-02-21T09:31:01.6449960Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6450251Z bar.sync 0; 2026-02-21T09:31:01.6450379Z elect.sync %r169|%p71, -1; 2026-02-21T09:31:01.6450542Z and.pred %p59, %p1, %p71; 2026-02-21T09:31:01.6450695Z add.s32 %r135, %r38, 12288; 2026-02-21T09:31:01.6450848Z mov.b32 %r136, 48; 2026-02-21T09:31:01.6450981Z // begin inline asm 2026-02-21T09:31:01.6451309Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r135], [%rd59, {%r136, %r181}], [%r104]; 2026-02-21T09:31:01.6451666Z // end inline asm 2026-02-21T09:31:01.6451909Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6452214Z bar.sync 0; 2026-02-21T09:31:01.6452387Z elect.sync %r170|%p72, -1; 2026-02-21T09:31:01.6452545Z and.pred %p60, %p1, %p72; 2026-02-21T09:31:01.6452693Z add.s32 %r139, %r38, 26112; 2026-02-21T09:31:01.6452846Z // begin inline asm 2026-02-21T09:31:01.6453157Z @%p60 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r139], [%rd60, {%r136, %r185}], [%r104]; 2026-02-21T09:31:01.6453508Z // end inline asm 2026-02-21T09:31:01.6453754Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6454025Z bar.sync 0; 2026-02-21T09:31:01.6454151Z // begin inline asm 2026-02-21T09:31:01.6454325Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r105], 4608; 2026-02-21T09:31:01.6454530Z // end inline asm 2026-02-21T09:31:01.6454845Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6455149Z bar.sync 0; 2026-02-21T09:31:01.6455285Z elect.sync %r171|%p73, -1; 2026-02-21T09:31:01.6455439Z and.pred %p62, %p1, %p73; 2026-02-21T09:31:01.6455594Z add.s32 %r144, %r38, 16384; 2026-02-21T09:31:01.6455737Z mov.b32 %r145, 64; 2026-02-21T09:31:01.6455874Z // begin inline asm 2026-02-21T09:31:01.6456179Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd59, {%r145, %r181}], [%r105]; 2026-02-21T09:31:01.6456522Z // end inline asm 2026-02-21T09:31:01.6456764Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6457036Z bar.sync 0; 2026-02-21T09:31:01.6457175Z elect.sync %r172|%p74, -1; 2026-02-21T09:31:01.6457330Z and.pred %p63, %p1, %p74; 2026-02-21T09:31:01.6457489Z add.s32 %r148, %r38, 26624; 2026-02-21T09:31:01.6457636Z // begin inline asm 2026-02-21T09:31:01.6457951Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd60, {%r145, %r185}], [%r105]; 2026-02-21T09:31:01.6458287Z // end inline asm 2026-02-21T09:31:01.6458536Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6458815Z bar.sync 0; 2026-02-21T09:31:01.6458943Z // begin inline asm 2026-02-21T09:31:01.6459079Z 2026-02-21T09:31:01.6459189Z { 2026-02-21T09:31:01.6459316Z .reg .pred complete; 
2026-02-21T09:31:01.6459456Z waitLoop: 2026-02-21T09:31:01.6459643Z mbarrier.try_wait.parity.shared.b64 complete, [%r101], %r302; 2026-02-21T09:31:01.6459868Z @!complete bra.uni waitLoop; 2026-02-21T09:31:01.6460021Z } 2026-02-21T09:31:01.6460084Z 2026-02-21T09:31:01.6460144Z // end inline asm 2026-02-21T09:31:01.6460379Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6460662Z setp.ne.b32 %p75, %r20, 0; 2026-02-21T09:31:01.6460813Z @%p75 bra $L__BB0_4; 2026-02-21T09:31:01.6461003Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:01.6461213Z elect.sync %r175|%p77, -1; 2026-02-21T09:31:01.6461373Z mov.b32 %r174, 134479888; 2026-02-21T09:31:01.6461518Z mov.pred %p76, 0; 2026-02-21T09:31:01.6461658Z // begin inline asm 2026-02-21T09:31:01.6461881Z @%p77 tcgen05.mma.cta_group::1.kind::f16 [ %r300 + 0 ], %rd56, %rd57, %r174, %p76; 2026-02-21T09:31:01.6462123Z // end inline asm 2026-02-21T09:31:01.6462265Z add.s32 %r177, %r38, 27696; 2026-02-21T09:31:01.6462414Z cvt.u64.u32 %rd58, %r177; 2026-02-21T09:31:01.6462568Z // begin inline asm 2026-02-21T09:31:01.6462766Z @%p77 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd58]; 2026-02-21T09:31:01.6462997Z // end inline asm 2026-02-21T09:31:01.6463166Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:01.6463476Z .loc 1 0 0 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:0 2026-02-21T09:31:01.6463761Z or.b32 %r16, %r181, %r6; 2026-02-21T09:31:01.6463909Z or.b32 %r17, %r181, %r7; 2026-02-21T09:31:01.6464087Z or.b32 %r19, %r185, %r9; 2026-02-21T09:31:01.6464358Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6464631Z bar.sync 0; 2026-02-21T09:31:01.6464803Z // begin inline asm 2026-02-21T09:31:01.6464993Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r178], 4608; 2026-02-21T09:31:01.6465204Z // end inline asm 2026-02-21T09:31:01.6465449Z .loc 1 54 31 // 
cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6465739Z bar.sync 0; 2026-02-21T09:31:01.6465873Z elect.sync %r192|%p83, -1; 2026-02-21T09:31:01.6466028Z and.pred %p80, %p1, %p83; 2026-02-21T09:31:01.6466181Z add.s32 %r179, %r38, 20480; 2026-02-21T09:31:01.6466335Z mov.b32 %r180, 80; 2026-02-21T09:31:01.6466470Z // begin inline asm 2026-02-21T09:31:01.6466857Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r179], [%rd59, {%r180, %r181}], [%r178]; 2026-02-21T09:31:01.6467222Z // end inline asm 2026-02-21T09:31:01.6467473Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6467764Z bar.sync 0; 2026-02-21T09:31:01.6467901Z elect.sync %r193|%p84, -1; 2026-02-21T09:31:01.6468069Z and.pred %p81, %p1, %p84; 2026-02-21T09:31:01.6468225Z add.s32 %r183, %r38, 27136; 2026-02-21T09:31:01.6468385Z // begin inline asm 2026-02-21T09:31:01.6468717Z @%p81 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r183], [%rd60, {%r180, %r185}], [%r178]; 2026-02-21T09:31:01.6469091Z // end inline asm 2026-02-21T09:31:01.6469233Z mov.b32 %r307, 1; 2026-02-21T09:31:01.6469368Z mov.b32 %r306, 5; 2026-02-21T09:31:01.6469513Z mov.b32 %r304, %r302; 2026-02-21T09:31:01.6469658Z mov.b32 %r305, %r302; 2026-02-21T09:31:01.6469808Z mov.b32 %r308, %r302; 2026-02-21T09:31:01.6469949Z mov.b32 %r309, %r302; 2026-02-21T09:31:01.6470095Z bra.uni $L__BB0_5; 2026-02-21T09:31:01.6470275Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:01.6470605Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6470908Z setp.lt.u32 %p93, %r309, 928; 2026-02-21T09:31:01.6471184Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6471473Z // begin inline asm 2026-02-21T09:31:01.6471602Z 2026-02-21T09:31:01.6471716Z { 2026-02-21T09:31:01.6471834Z .reg .pred complete; 
2026-02-21T09:31:01.6471982Z waitLoop: 2026-02-21T09:31:01.6472170Z mbarrier.try_wait.parity.shared.b64 complete, [%r303], %r302; 2026-02-21T09:31:01.6472412Z @!complete bra.uni waitLoop; 2026-02-21T09:31:01.6472566Z } 2026-02-21T09:31:01.6472628Z 2026-02-21T09:31:01.6472683Z // end inline asm 2026-02-21T09:31:01.6472822Z add.s32 %r225, %r307, 1; 2026-02-21T09:31:01.6472976Z setp.gt.s32 %p96, %r225, 1; 2026-02-21T09:31:01.6473141Z selp.b32 %r307, 0, %r225, %p96; 2026-02-21T09:31:01.6473314Z selp.b32 %r226, 1, 0, %p96; 2026-02-21T09:31:01.6473474Z xor.b32 %r34, %r308, %r226; 2026-02-21T09:31:01.6473740Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6474036Z add.s32 %r227, %r306, 1; 2026-02-21T09:31:01.6474187Z setp.gt.s32 %p97, %r227, 5; 2026-02-21T09:31:01.6474338Z selp.b32 %r306, 0, %r227, %p97; 2026-02-21T09:31:01.6474498Z shl.b32 %r228, %r306, 3; 2026-02-21T09:31:01.6474638Z add.s32 %r230, %r38, %r228; 2026-02-21T09:31:01.6474825Z add.s32 %r220, %r230, 27648; 2026-02-21T09:31:01.6474971Z bar.sync 0; 2026-02-21T09:31:01.6475103Z and.pred %p90, %p4, %p93; 2026-02-21T09:31:01.6475249Z // begin inline asm 2026-02-21T09:31:01.6475435Z @%p90 mbarrier.arrive.expect_tx.shared.b64 _, [%r220], 4608; 2026-02-21T09:31:01.6475644Z // end inline asm 2026-02-21T09:31:01.6475898Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6476236Z shl.b32 %r231, %r306, 12; 2026-02-21T09:31:01.6476381Z add.s32 %r217, %r38, %r231; 2026-02-21T09:31:01.6476530Z bar.sync 0; 2026-02-21T09:31:01.6476659Z elect.sync %r232|%p98, -1; 2026-02-21T09:31:01.6476822Z and.pred %p99, %p93, %p98; 2026-02-21T09:31:01.6476973Z and.pred %p91, %p1, %p99; 2026-02-21T09:31:01.6477130Z add.s32 %r218, %r309, 96; 2026-02-21T09:31:01.6477272Z // begin inline asm 2026-02-21T09:31:01.6477597Z @%p91 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r217], [%rd59, {%r218, %r181}], [%r220]; 
2026-02-21T09:31:01.6477960Z // end inline asm 2026-02-21T09:31:01.6478197Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6478481Z shl.b32 %r233, %r306, 9; 2026-02-21T09:31:01.6478627Z add.s32 %r234, %r38, %r233; 2026-02-21T09:31:01.6478805Z add.s32 %r221, %r234, 24576; 2026-02-21T09:31:01.6478979Z bar.sync 0; 2026-02-21T09:31:01.6479119Z elect.sync %r235|%p100, -1; 2026-02-21T09:31:01.6479283Z and.pred %p101, %p93, %p100; 2026-02-21T09:31:01.6479437Z and.pred %p92, %p1, %p101; 2026-02-21T09:31:01.6479592Z // begin inline asm 2026-02-21T09:31:01.6479913Z @%p92 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r221], [%rd60, {%r218, %r185}], [%r220]; 2026-02-21T09:31:01.6480271Z // end inline asm 2026-02-21T09:31:01.6480513Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6480813Z setp.lt.u32 %p102, %r309, 992; 2026-02-21T09:31:01.6480971Z add.s32 %r309, %r309, 16; 2026-02-21T09:31:01.6481119Z mov.b32 %r302, %r308; 2026-02-21T09:31:01.6481262Z mov.b32 %r303, %r236; 2026-02-21T09:31:01.6481398Z mov.b32 %r308, %r34; 2026-02-21T09:31:01.6481541Z @%p102 bra $L__BB0_5; 2026-02-21T09:31:01.6481679Z bra.uni $L__BB0_8; 2026-02-21T09:31:01.6481863Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:31:01.6482100Z // => This Inner Loop Header: Depth=2 2026-02-21T09:31:01.6482419Z .loc 1 50 57 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:50:57 2026-02-21T09:31:01.6482705Z add.s32 %r196, %r305, 1; 2026-02-21T09:31:01.6482852Z setp.gt.s32 %p86, %r196, 5; 2026-02-21T09:31:01.6483013Z selp.b32 %r305, 0, %r196, %p86; 2026-02-21T09:31:01.6483170Z selp.b32 %r197, 1, 0, %p86; 2026-02-21T09:31:01.6483323Z xor.b32 %r304, %r304, %r197; 2026-02-21T09:31:01.6483472Z shl.b32 %r198, %r305, 3; 2026-02-21T09:31:01.6483617Z add.s32 %r200, %r38, %r198; 2026-02-21T09:31:01.6483762Z add.s32 %r194, %r200, 27648; 2026-02-21T09:31:01.6483910Z bar.sync 0; 
2026-02-21T09:31:01.6484031Z // begin inline asm 2026-02-21T09:31:01.6484162Z 2026-02-21T09:31:01.6484275Z { 2026-02-21T09:31:01.6484386Z .reg .pred complete; 2026-02-21T09:31:01.6484528Z waitLoop: 2026-02-21T09:31:01.6484752Z mbarrier.try_wait.parity.shared.b64 complete, [%r194], %r304; 2026-02-21T09:31:01.6484986Z @!complete bra.uni waitLoop; 2026-02-21T09:31:01.6485128Z } 2026-02-21T09:31:01.6485196Z 2026-02-21T09:31:01.6485247Z // end inline asm 2026-02-21T09:31:01.6485378Z shl.b32 %r201, %r307, 3; 2026-02-21T09:31:01.6485527Z add.s32 %r202, %r38, %r201; 2026-02-21T09:31:01.6485670Z add.s32 %r236, %r202, 27696; 2026-02-21T09:31:01.6485934Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6486229Z @%p75 bra $L__BB0_7; 2026-02-21T09:31:01.6486406Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:01.6486721Z .loc 1 54 31 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:54:31 2026-02-21T09:31:01.6486997Z shl.b32 %r205, %r305, 12; 2026-02-21T09:31:01.6487147Z add.s32 %r207, %r38, %r205; 2026-02-21T09:31:01.6487407Z .loc 1 55 44 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:55:44 2026-02-21T09:31:01.6487744Z shl.b32 %r208, %r305, 9; 2026-02-21T09:31:01.6487904Z add.s32 %r209, %r38, %r208; 2026-02-21T09:31:01.6488052Z add.s32 %r210, %r209, 24576; 2026-02-21T09:31:01.6488309Z .loc 1 56 52 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:56:52 2026-02-21T09:31:01.6488594Z elect.sync %r211|%p88, -1; 2026-02-21T09:31:01.6488754Z bfe.u32 %r212, %r207, 4, 14; 2026-02-21T09:31:01.6488913Z cvt.u64.u32 %rd64, %r212; 2026-02-21T09:31:01.6489090Z or.b64 %rd61, %rd64, -4611685949691133952; 2026-02-21T09:31:01.6489271Z bfe.u32 %r213, %r210, 4, 14; 2026-02-21T09:31:01.6489423Z cvt.u64.u32 %rd65, %r213; 2026-02-21T09:31:01.6489588Z or.b64 %rd62, %rd65, -4611685949705814016; 2026-02-21T09:31:01.6489763Z mov.b32 %r204, 134479888; 2026-02-21T09:31:01.6489916Z mov.pred %p87, -1; 
2026-02-21T09:31:01.6490054Z // begin inline asm 2026-02-21T09:31:01.6490328Z @%p88 tcgen05.mma.cta_group::1.kind::f16 [ %r300 + 0 ], %rd61, %rd62, %r204, %p87; 2026-02-21T09:31:01.6490578Z // end inline asm 2026-02-21T09:31:01.6490718Z cvt.u64.u32 %rd63, %r236; 2026-02-21T09:31:01.6490862Z // begin inline asm 2026-02-21T09:31:01.6491069Z @%p88 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T09:31:01.6491298Z // end inline asm 2026-02-21T09:31:01.6491427Z bra.uni $L__BB0_7; 2026-02-21T09:31:01.6491591Z $L__BB0_9: // %._crit_edge 2026-02-21T09:31:01.6491881Z .loc 1 31 4 // cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py:31:4 2026-02-21T09:31:01.6492166Z bar.sync 0; 2026-02-21T09:31:01.6492289Z // begin inline asm 2026-02-21T09:31:01.6492484Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r300, 32; 2026-02-21T09:31:01.6492698Z // end inline asm 2026-02-21T09:31:01.6492833Z ret; 2026-02-21T09:31:01.6492957Z $L__tmp1: 2026-02-21T09:31:01.6493077Z $L__func_end0: 2026-02-21T09:31:01.6493233Z // -- End function 2026-02-21T09:31:01.6493409Z } 2026-02-21T09:31:01.6493676Z .file 1 "/tmp/torchinductor_root/gd/cgdqolknytx5zf4yhiqln5ar3vlo63qsm6sldlolmnwyswpodiee.py" 2026-02-21T09:31:01.6493997Z .section .debug_abbrev 2026-02-21T09:31:01.6494141Z { 2026-02-21T09:31:01.6494285Z .b8 1 // Abbreviation Code 2026-02-21T09:31:01.6494506Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:01.6494787Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:01.6494992Z .b8 37 // DW_AT_producer 2026-02-21T09:31:01.6495195Z .b8 8 // DW_FORM_string 2026-02-21T09:31:01.6495391Z .b8 19 // DW_AT_language 2026-02-21T09:31:01.6495599Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:01.6495797Z .b8 3 // DW_AT_name 2026-02-21T09:31:01.6496001Z .b8 8 // DW_FORM_string 2026-02-21T09:31:01.6496208Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:01.6496408Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:01.6496613Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:01.6496806Z .b8 8 // DW_FORM_string 
2026-02-21T09:31:01.6497011Z .b8 0 // EOM(1) 2026-02-21T09:31:01.6497199Z .b8 0 // EOM(2) 2026-02-21T09:31:01.6497391Z .b8 0 // EOM(3) 2026-02-21T09:31:01.6497559Z } 2026-02-21T09:31:01.6497686Z .section .debug_info 2026-02-21T09:31:01.6497826Z { 2026-02-21T09:31:01.6497965Z .b32 104 // Length of Unit 2026-02-21T09:31:01.6498184Z .b8 2 // DWARF version number 2026-02-21T09:31:01.6498366Z .b8 0 2026-02-21T09:31:01.6498549Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:31:01.6498837Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:01.6499112Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:01.6499349Z .b8 116 // DW_AT_producer 2026-02-21T09:31:01.6499526Z .b8 114 2026-02-21T09:31:01.6499648Z .b8 105 2026-02-21T09:31:01.6499762Z .b8 116 2026-02-21T09:31:01.6499880Z .b8 111 2026-02-21T09:31:01.6499989Z .b8 110 2026-02-21T09:31:01.6500106Z .b8 0 2026-02-21T09:31:01.6500242Z .b8 2 // DW_AT_language 2026-02-21T09:31:01.6500420Z .b8 0 2026-02-21T09:31:01.6500551Z .b8 99 // DW_AT_name 2026-02-21T09:31:01.6500731Z .b8 103 2026-02-21T09:31:01.6500840Z .b8 100 2026-02-21T09:31:01.6500955Z .b8 113 2026-02-21T09:31:01.6501068Z .b8 111 2026-02-21T09:31:01.6501175Z .b8 108 2026-02-21T09:31:01.6501334Z .b8 107 2026-02-21T09:31:01.6501445Z .b8 110 2026-02-21T09:31:01.6501591Z .b8 121 2026-02-21T09:31:01.6501702Z .b8 116 2026-02-21T09:31:01.6501817Z .b8 120 2026-02-21T09:31:01.6501924Z .b8 53 2026-02-21T09:31:01.6502041Z .b8 122 2026-02-21T09:31:01.6502147Z .b8 102 2026-02-21T09:31:01.6502263Z .b8 52 2026-02-21T09:31:01.6502373Z .b8 121 2026-02-21T09:31:01.6502490Z .b8 104 2026-02-21T09:31:01.6502597Z .b8 105 2026-02-21T09:31:01.6502712Z .b8 113 2026-02-21T09:31:01.6502819Z .b8 108 2026-02-21T09:31:01.6502932Z .b8 110 2026-02-21T09:31:01.6503048Z .b8 53 2026-02-21T09:31:01.6503156Z .b8 97 2026-02-21T09:31:01.6503270Z .b8 114 2026-02-21T09:31:01.6503377Z .b8 51 2026-02-21T09:31:01.6503491Z .b8 118 2026-02-21T09:31:01.6503600Z .b8 108 
2026-02-21T09:31:01.6503716Z .b8 111 2026-02-21T09:31:01.6503822Z .b8 54 2026-02-21T09:31:01.6503939Z .b8 51 2026-02-21T09:31:01.6504049Z .b8 113 2026-02-21T09:31:01.6504169Z .b8 115 2026-02-21T09:31:01.6504279Z .b8 109 2026-02-21T09:31:01.6504395Z .b8 54 2026-02-21T09:31:01.6504504Z .b8 115 2026-02-21T09:31:01.6504619Z .b8 108 2026-02-21T09:31:01.6504774Z .b8 100 2026-02-21T09:31:01.6504882Z .b8 108 2026-02-21T09:31:01.6504996Z .b8 111 2026-02-21T09:31:01.6505103Z .b8 108 2026-02-21T09:31:01.6505217Z .b8 109 2026-02-21T09:31:01.6505323Z .b8 110 2026-02-21T09:31:01.6505440Z .b8 119 2026-02-21T09:31:01.6505546Z .b8 121 2026-02-21T09:31:01.6505660Z .b8 115 2026-02-21T09:31:01.6505766Z .b8 119 2026-02-21T09:31:01.6505880Z .b8 112 2026-02-21T09:31:01.6505988Z .b8 111 2026-02-21T09:31:01.6506107Z .b8 100 2026-02-21T09:31:01.6506215Z .b8 105 2026-02-21T09:31:01.6506330Z .b8 101 2026-02-21T09:31:01.6506445Z .b8 101 2026-02-21T09:31:01.6506552Z .b8 46 2026-02-21T09:31:01.6506666Z .b8 112 2026-02-21T09:31:01.6506772Z .b8 121 2026-02-21T09:31:01.6506887Z .b8 0 2026-02-21T09:31:01.6507036Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:01.6507255Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:01.6507426Z .b8 116 2026-02-21T09:31:01.6507541Z .b8 109 2026-02-21T09:31:01.6507649Z .b8 112 2026-02-21T09:31:01.6507763Z .b8 47 2026-02-21T09:31:01.6507870Z .b8 116 2026-02-21T09:31:01.6507987Z .b8 111 2026-02-21T09:31:01.6508093Z .b8 114 2026-02-21T09:31:01.6508206Z .b8 99 2026-02-21T09:31:01.6508319Z .b8 104 2026-02-21T09:31:01.6508426Z .b8 105 2026-02-21T09:31:01.6508540Z .b8 110 2026-02-21T09:31:01.6508647Z .b8 100 2026-02-21T09:31:01.6508761Z .b8 117 2026-02-21T09:31:01.6508869Z .b8 99 2026-02-21T09:31:01.6508984Z .b8 116 2026-02-21T09:31:01.6509090Z .b8 111 2026-02-21T09:31:01.6509204Z .b8 114 2026-02-21T09:31:01.6509310Z .b8 95 2026-02-21T09:31:01.6509423Z .b8 114 2026-02-21T09:31:01.6509531Z .b8 111 2026-02-21T09:31:01.6509646Z .b8 111 2026-02-21T09:31:01.6509755Z .b8 116 
2026-02-21T09:31:01.6509875Z .b8 47 2026-02-21T09:31:01.6509997Z .b8 103 2026-02-21T09:31:01.6510112Z .b8 100 2026-02-21T09:31:01.6510235Z .b8 0 2026-02-21T09:31:01.6510350Z } 2026-02-21T09:31:01.6510485Z .section .debug_macinfo { } 2026-02-21T09:31:01.6510594Z 2026-02-21T09:31:01.6510676Z ================================================================ 2026-02-21T09:31:01.6510966Z please share the reproducer above with Triton project. 2026-02-21T09:31:03.1640055Z [14s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:31:03.1640366Z 2026-02-21T09:31:03.1641498Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:03.1642644Z 2026-02-21T09:31:03.1642648Z 2026-02-21T09:31:03.1642907Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:03.1643165Z `ptxas` stderr: 2026-02-21T09:31:03.1643958Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:03.1644493Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:03.1644646Z 2026-02-21T09:31:03.1645264Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpipsfb61h.ptx -o /tmp/tmpipsfb61h.ptx.o 2026-02-21T09:31:03.1645739Z 2026-02-21T09:31:03.1645877Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
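Note on the failure reported above: the skipped config sets `maxnreg=32`, and `ptxas` rejects it with diagnostic C7602, suggesting a register target of 34 or higher. When triaging many such entries it can help to pull those numbers out of the `ptxas` stderr text. The following is a minimal sketch using only the Python standard library. It assumes the diagnostic wording is exactly as printed in this log; the helper name `parse_insufficient_registers` and the dictionary keys are hypothetical and are not part of Helion, Triton, or `ptxas`.

```python
import re

# Matches the C7602 diagnostic as worded in this log (assumption: the wording
# is stable; other ptxas versions may phrase it differently).
_C7602 = re.compile(
    r"\(C7602\) Insufficient registers \((?P<have>\d+)\) to compile instruction "
    r"at line (?P<line>\d+) in function (?P<func>\w+)\. "
    r"Try to compile with register target of (?P<need>\d+) or higher"
)


def parse_insufficient_registers(stderr: str):
    """Hypothetical helper: extract register info from a C7602 message.

    Returns a dict, or None when the diagnostic is not present.
    """
    match = _C7602.search(stderr)
    if match is None:
        return None
    return {
        "function": match.group("func"),
        "ptx_line": int(match.group("line")),
        "maxnreg": int(match.group("have")),
        "suggested_min": int(match.group("need")),
    }


# Message text copied from the log entry above.
sample = (
    "ptxas fatal : (C7602) Insufficient registers (32) to compile instruction "
    "at line 209 in function _helion_matmul. "
    "Try to compile with register target of 34 or higher."
)
info = parse_insufficient_registers(sample)
print(info)
```

A result with `suggested_min` greater than `maxnreg` points at the register cap in the config rather than at the kernel source, which is consistent with the autotuner skipping this config and moving on.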
2026-02-21T09:31:03.1646168Z ================================================================ 2026-02-21T09:31:03.1646389Z Internal Triton PTX codegen error 2026-02-21T09:31:03.1646584Z `ptxas` stderr: 2026-02-21T09:31:03.1647016Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 209 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:03.1647532Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:03.1647683Z 2026-02-21T09:31:03.1648076Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpipsfb61h.ptx -o /tmp/tmpipsfb61h.ptx.o 2026-02-21T09:31:03.1648532Z 2026-02-21T09:31:03.1648536Z 2026-02-21T09:31:03.1648591Z // 2026-02-21T09:31:03.1648732Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:03.1648895Z // 2026-02-21T09:31:03.1648959Z 2026-02-21T09:31:03.1649022Z .version 8.7 2026-02-21T09:31:03.1649148Z .target sm_100a 2026-02-21T09:31:03.1649284Z .address_size 64 2026-02-21T09:31:03.1649365Z 2026-02-21T09:31:03.1649482Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:03.1649738Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:03.1649940Z // @_helion_matmul 2026-02-21T09:31:03.1650140Z .visible .entry _helion_matmul( 2026-02-21T09:31:03.1650354Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:03.1650598Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:03.1650848Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:03.1651087Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:03.1651330Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:03.1651523Z ) 2026-02-21T09:31:03.1651643Z .reqntid 256 2026-02-21T09:31:03.1651765Z .maxnreg 32 2026-02-21T09:31:03.1651889Z { 2026-02-21T09:31:03.1652015Z .reg .pred %p<31>; 
2026-02-21T09:31:03.1652155Z .reg .b16 %rs<8>; 2026-02-21T09:31:03.1652293Z .reg .b32 %r<577>; 2026-02-21T09:31:03.1652426Z .reg .b64 %rd<232>; 2026-02-21T09:31:03.1652567Z $L__func_begin0: 2026-02-21T09:31:03.1652646Z 2026-02-21T09:31:03.1652697Z // %bb.0: 2026-02-21T09:31:03.1652937Z .loc 1 14 0 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:14 2026-02-21T09:31:03.1653227Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:03.1653381Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:03.1653690Z mov.b32 %r85, global_smem; 2026-02-21T09:31:03.1653843Z // begin inline asm 2026-02-21T09:31:03.1654089Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r85], 128; 2026-02-21T09:31:03.1654333Z // end inline asm 2026-02-21T09:31:03.1654469Z bar.sync 0; 2026-02-21T09:31:03.1654608Z ld.shared.b32 %r569, [global_smem]; 2026-02-21T09:31:03.1654814Z bar.sync 0; 2026-02-21T09:31:03.1654942Z // begin inline asm 2026-02-21T09:31:03.1655145Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:03.1655373Z // end inline asm 2026-02-21T09:31:03.1655623Z .loc 1 21 30 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:21:30 2026-02-21T09:31:03.1655916Z mov.u32 %r86, %ctaid.x; 2026-02-21T09:31:03.1656222Z .loc 1 21 35 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:21:35 2026-02-21T09:31:03.1656547Z mul.lo.s32 %r570, %r86, 3; 2026-02-21T09:31:03.1656811Z .loc 1 22 37 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:22:37 2026-02-21T09:31:03.1657100Z add.s32 %r87, %r570, 3; 2026-02-21T09:31:03.1657356Z .loc 1 22 49 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:22:49 2026-02-21T09:31:03.1657632Z min.s32 %r4, %r87, 768; 2026-02-21T09:31:03.1657897Z .loc 1 23 107 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:23:107 2026-02-21T09:31:03.1658196Z setp.ge.s32 %p3, %r570, %r4; 2026-02-21T09:31:03.1658364Z @%p3 bra $L__BB0_9; 2026-02-21T09:31:03.1658527Z // %bb.1: // %.lr.ph 2026-02-21T09:31:03.1658834Z .loc 
1 0 107 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:0:107 2026-02-21T09:31:03.1659154Z ld.param.b64 %rd16, [_helion_matmul_param_2]; 2026-02-21T09:31:03.1659363Z ld.param.b64 %rd15, [_helion_matmul_param_1]; 2026-02-21T09:31:03.1659573Z ld.param.b64 %rd14, [_helion_matmul_param_0]; 2026-02-21T09:31:03.1659752Z shr.u32 %r5, %r1, 5; 2026-02-21T09:31:03.1659903Z bfe.u32 %r6, %r1, 2, 6; 2026-02-21T09:31:03.1660045Z shl.b32 %r7, %r1, 3; 2026-02-21T09:31:03.1660189Z and.b32 %r8, %r7, 56; 2026-02-21T09:31:03.1660330Z bfe.u32 %r9, %r1, 1, 7; 2026-02-21T09:31:03.1660481Z shr.u32 %r88, %r1, 3; 2026-02-21T09:31:03.1660626Z bfe.u32 %r10, %r1, 3, 5; 2026-02-21T09:31:03.1660781Z or.b32 %r11, %r10, 32; 2026-02-21T09:31:03.1660932Z or.b32 %r12, %r10, 64; 2026-02-21T09:31:03.1661071Z or.b32 %r13, %r88, 96; 2026-02-21T09:31:03.1661216Z or.b32 %r14, %r10, 128; 2026-02-21T09:31:03.1661356Z or.b32 %r15, %r10, 160; 2026-02-21T09:31:03.1661503Z or.b32 %r16, %r10, 192; 2026-02-21T09:31:03.1661639Z or.b32 %r17, %r88, 224; 2026-02-21T09:31:03.1661787Z and.b32 %r18, %r7, 8; 2026-02-21T09:31:03.1661924Z shl.b32 %r89, %r1, 2; 2026-02-21T09:31:03.1662068Z and.b32 %r19, %r89, 12; 2026-02-21T09:31:03.1662205Z shl.b32 %r90, %r1, 4; 2026-02-21T09:31:03.1662352Z and.b32 %r91, %r90, 3952; 2026-02-21T09:31:03.1662507Z bfe.s32 %r92, %r1, 3, 1; 2026-02-21T09:31:03.1662650Z and.b32 %r93, %r92, 144; 2026-02-21T09:31:03.1662801Z xor.b32 %r20, %r93, %r91; 2026-02-21T09:31:03.1662947Z add.s32 %r191, %r85, %r20; 2026-02-21T09:31:03.1663105Z add.s32 %r193, %r191, 4096; 2026-02-21T09:31:03.1663254Z and.b32 %r95, %r7, 1912; 2026-02-21T09:31:03.1663402Z bfe.s32 %r96, %r1, 4, 1; 2026-02-21T09:31:03.1663542Z and.b32 %r97, %r96, 144; 2026-02-21T09:31:03.1663689Z xor.b32 %r98, %r97, %r95; 2026-02-21T09:31:03.1663832Z add.s32 %r99, %r85, 65536; 2026-02-21T09:31:03.1663987Z add.s32 %r195, %r99, %r98; 2026-02-21T09:31:03.1664139Z or.b32 %r24, %r18, 16; 2026-02-21T09:31:03.1664281Z add.s32 %r197, %r191, 
8192; 2026-02-21T09:31:03.1664441Z add.s32 %r199, %r191, 12288; 2026-02-21T09:31:03.1664590Z add.s32 %r100, %r85, %r98; 2026-02-21T09:31:03.1664798Z add.s32 %r201, %r100, 67584; 2026-02-21T09:31:03.1664950Z or.b32 %r28, %r18, 32; 2026-02-21T09:31:03.1665147Z add.s32 %r203, %r191, 16384; 2026-02-21T09:31:03.1665327Z add.s32 %r205, %r191, 20480; 2026-02-21T09:31:03.1665482Z add.s32 %r207, %r100, 69632; 2026-02-21T09:31:03.1665628Z or.b32 %r32, %r18, 48; 2026-02-21T09:31:03.1665776Z add.s32 %r209, %r191, 24576; 2026-02-21T09:31:03.1665928Z add.s32 %r211, %r191, 28672; 2026-02-21T09:31:03.1666072Z add.s32 %r213, %r100, 71680; 2026-02-21T09:31:03.1666228Z or.b32 %r36, %r18, 64; 2026-02-21T09:31:03.1666369Z add.s32 %r215, %r191, 32768; 2026-02-21T09:31:03.1666526Z add.s32 %r217, %r191, 36864; 2026-02-21T09:31:03.1666674Z add.s32 %r219, %r100, 73728; 2026-02-21T09:31:03.1666825Z or.b32 %r40, %r18, 80; 2026-02-21T09:31:03.1666965Z add.s32 %r221, %r191, 40960; 2026-02-21T09:31:03.1667128Z add.s32 %r223, %r191, 45056; 2026-02-21T09:31:03.1667290Z add.s32 %r225, %r100, 75776; 2026-02-21T09:31:03.1667435Z or.b32 %r44, %r18, 96; 2026-02-21T09:31:03.1667608Z add.s32 %r227, %r191, 49152; 2026-02-21T09:31:03.1667784Z add.s32 %r229, %r191, 53248; 2026-02-21T09:31:03.1667940Z add.s32 %r231, %r100, 77824; 2026-02-21T09:31:03.1668090Z bfe.u32 %r101, %r85, 4, 14; 2026-02-21T09:31:03.1668249Z cvt.u64.u32 %rd17, %r101; 2026-02-21T09:31:03.1668416Z or.b64 %rd69, %rd17, -4611685949674356736; 2026-02-21T09:31:03.1668598Z bfe.u32 %r102, %r99, 4, 14; 2026-02-21T09:31:03.1668745Z cvt.u64.u32 %rd18, %r102; 2026-02-21T09:31:03.1668910Z or.b64 %rd70, %rd18, -4611685949699522560; 2026-02-21T09:31:03.1669087Z add.s32 %r103, %r85, 4096; 2026-02-21T09:31:03.1669235Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T09:31:03.1669389Z cvt.u64.u32 %rd19, %r104; 2026-02-21T09:31:03.1669543Z or.b64 %rd71, %rd19, -4611685949674356736; 2026-02-21T09:31:03.1669719Z or.b32 %r48, %r18, 112; 2026-02-21T09:31:03.1669861Z 
add.s32 %r260, %r191, 57344; 2026-02-21T09:31:03.1670013Z add.s32 %r262, %r191, 61440; 2026-02-21T09:31:03.1670157Z add.s32 %r264, %r100, 79872; 2026-02-21T09:31:03.1670309Z and.b32 %r105, %r90, 4016; 2026-02-21T09:31:03.1670456Z bfe.s32 %r106, %r1, 2, 1; 2026-02-21T09:31:03.1670609Z and.b32 %r107, %r106, 4160; 2026-02-21T09:31:03.1670760Z or.b32 %r108, %r107, %r105; 2026-02-21T09:31:03.1670908Z add.s32 %r52, %r85, %r108; 2026-02-21T09:31:03.1671061Z xor.b32 %r109, %r108, 64; 2026-02-21T09:31:03.1671206Z add.s32 %r53, %r85, %r109; 2026-02-21T09:31:03.1671356Z shl.b32 %r110, %r1, 6; 2026-02-21T09:31:03.1671496Z and.b32 %r111, %r110, 1600; 2026-02-21T09:31:03.1671649Z and.b32 %r112, %r7, 48; 2026-02-21T09:31:03.1671791Z shl.b32 %r113, %r1, 1; 2026-02-21T09:31:03.1671938Z and.b32 %r114, %r113, 384; 2026-02-21T09:31:03.1672081Z bfe.s32 %r115, %r1, 5, 1; 2026-02-21T09:31:03.1672232Z and.b32 %r116, %r115, 4160; 2026-02-21T09:31:03.1672386Z or.b32 %r117, %r111, %r112; 2026-02-21T09:31:03.1672528Z or.b32 %r118, %r116, %r114; 2026-02-21T09:31:03.1672682Z xor.b32 %r119, %r118, %r117; 2026-02-21T09:31:03.1672827Z add.s32 %r388, %r85, %r119; 2026-02-21T09:31:03.1672979Z add.s32 %r393, %r388, 2048; 2026-02-21T09:31:03.1673251Z .loc 1 23 107 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:23:107 2026-02-21T09:31:03.1673545Z and.b32 %r120, %r1, 1; 2026-02-21T09:31:03.1673696Z mad.wide.u32 %rd20, %r120, 16, %rd14; 2026-02-21T09:31:03.1673876Z add.s64 %rd4, %rd20, 262400; 2026-02-21T09:31:03.1674031Z shl.b32 %r56, %r9, 10; 2026-02-21T09:31:03.1674282Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1674571Z or.b32 %r57, %r19, 128; 2026-02-21T09:31:03.1674758Z setp.eq.b32 %p8, %r1, 0; 2026-02-21T09:31:03.1674919Z bra.uni $L__BB0_2; 2026-02-21T09:31:03.1675105Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:03.1675427Z .loc 1 0 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:0:89 
2026-02-21T09:31:03.1675713Z mov.b32 %r313, 1; 2026-02-21T09:31:03.1675958Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1676311Z // begin inline asm 2026-02-21T09:31:03.1676441Z 2026-02-21T09:31:03.1676558Z { 2026-02-21T09:31:03.1676674Z .reg .pred complete; 2026-02-21T09:31:03.1676823Z waitLoop: 2026-02-21T09:31:03.1677007Z mbarrier.try_wait.parity.shared.b64 complete, [%r312], %r313; 2026-02-21T09:31:03.1677243Z @!complete bra.uni waitLoop; 2026-02-21T09:31:03.1677390Z } 2026-02-21T09:31:03.1677462Z 2026-02-21T09:31:03.1677516Z // end inline asm 2026-02-21T09:31:03.1677769Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1678054Z cp.async.wait_group 0; 2026-02-21T09:31:03.1678207Z bar.sync 0; 2026-02-21T09:31:03.1678339Z add.s32 %r314, %r85, 81920; 2026-02-21T09:31:03.1678492Z // begin inline asm 2026-02-21T09:31:03.1678655Z @%p8 mbarrier.inval.shared::cta.b64 [%r314]; 2026-02-21T09:31:03.1678870Z // end inline asm 2026-02-21T09:31:03.1679030Z bar.sync 0; 2026-02-21T09:31:03.1679166Z // begin inline asm 2026-02-21T09:31:03.1679331Z @%p8 mbarrier.inval.shared::cta.b64 [%r190]; 2026-02-21T09:31:03.1679507Z // end inline asm 2026-02-21T09:31:03.1679759Z .loc 1 52 45 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:52:45 2026-02-21T09:31:03.1680033Z shl.b32 %r457, %r63, 10; 2026-02-21T09:31:03.1680189Z shl.b32 %r458, %r64, 10; 2026-02-21T09:31:03.1680331Z shl.b32 %r459, %r65, 10; 2026-02-21T09:31:03.1680480Z shl.b32 %r460, %r66, 10; 2026-02-21T09:31:03.1680622Z shl.b32 %r461, %r67, 10; 2026-02-21T09:31:03.1680769Z shl.b32 %r462, %r68, 10; 2026-02-21T09:31:03.1680916Z shl.b32 %r463, %r69, 10; 2026-02-21T09:31:03.1681057Z shl.b32 %r464, %r70, 10; 2026-02-21T09:31:03.1681311Z .loc 1 52 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:52:52 2026-02-21T09:31:03.1681584Z add.s32 %r465, %r457, %r62; 2026-02-21T09:31:03.1681742Z add.s32 %r466, %r458, 
%r62; 2026-02-21T09:31:03.1681891Z add.s32 %r467, %r459, %r62; 2026-02-21T09:31:03.1682046Z add.s32 %r468, %r460, %r62; 2026-02-21T09:31:03.1682192Z add.s32 %r469, %r461, %r62; 2026-02-21T09:31:03.1682344Z add.s32 %r470, %r462, %r62; 2026-02-21T09:31:03.1682497Z add.s32 %r471, %r463, %r62; 2026-02-21T09:31:03.1682642Z add.s32 %r472, %r464, %r62; 2026-02-21T09:31:03.1682921Z .loc 1 52 24 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:52:24 2026-02-21T09:31:03.1683217Z mad.wide.s32 %rd94, %r465, 2, %rd16; 2026-02-21T09:31:03.1683408Z mad.wide.s32 %rd95, %r466, 2, %rd16; 2026-02-21T09:31:03.1683586Z mad.wide.s32 %rd96, %r467, 2, %rd16; 2026-02-21T09:31:03.1683768Z mad.wide.s32 %rd97, %r468, 2, %rd16; 2026-02-21T09:31:03.1683941Z mad.wide.s32 %rd98, %r469, 2, %rd16; 2026-02-21T09:31:03.1684131Z mad.wide.s32 %rd99, %r470, 2, %rd16; 2026-02-21T09:31:03.1684314Z mad.wide.s32 %rd100, %r471, 2, %rd16; 2026-02-21T09:31:03.1684496Z mad.wide.s32 %rd101, %r472, 2, %rd16; 2026-02-21T09:31:03.1684837Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1685130Z // begin inline asm 2026-02-21T09:31:03.1685522Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331}, [%r383 + 0]; 2026-02-21T09:31:03.1685937Z // end inline asm 2026-02-21T09:31:03.1686084Z // begin inline asm 2026-02-21T09:31:03.1686460Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, %r347, %r348}, [%r383 + 16]; 2026-02-21T09:31:03.1686859Z // end inline asm 2026-02-21T09:31:03.1687003Z // begin inline asm 2026-02-21T09:31:03.1687355Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365}, [%r383 + 32]; 2026-02-21T09:31:03.1687757Z // end inline asm 2026-02-21T09:31:03.1687897Z // 
begin inline asm 2026-02-21T09:31:03.1688321Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381, %r382}, [%r383 + 48]; 2026-02-21T09:31:03.1688750Z // end inline asm 2026-02-21T09:31:03.1688885Z // begin inline asm 2026-02-21T09:31:03.1689046Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:03.1689211Z // end inline asm 2026-02-21T09:31:03.1689359Z cvt.u64.u32 %rd102, %r316; 2026-02-21T09:31:03.1689523Z cvt.u64.u32 %rd103, %r317; 2026-02-21T09:31:03.1689691Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:31:03.1689851Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:31:03.1690143Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1690430Z mov.b64 {%r473, %r474}, %rd105; 2026-02-21T09:31:03.1690595Z cvt.rn.f16x2.f32 %r475, %r474, %r473; 2026-02-21T09:31:03.1690934Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1691221Z cvt.u64.u32 %rd106, %r318; 2026-02-21T09:31:03.1691378Z cvt.u64.u32 %rd107, %r319; 2026-02-21T09:31:03.1691524Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:31:03.1691679Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:31:03.1691941Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1692215Z mov.b64 {%r476, %r477}, %rd109; 2026-02-21T09:31:03.1692386Z cvt.rn.f16x2.f32 %r478, %r477, %r476; 2026-02-21T09:31:03.1692653Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1692936Z cvt.u64.u32 %rd110, %r320; 2026-02-21T09:31:03.1693082Z cvt.u64.u32 %rd111, %r321; 2026-02-21T09:31:03.1693235Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:31:03.1693383Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:31:03.1693652Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1693936Z mov.b64 {%r479, %r480}, %rd113; 2026-02-21T09:31:03.1694098Z 
cvt.rn.f16x2.f32 %r481, %r480, %r479; 2026-02-21T09:31:03.1694375Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1694647Z cvt.u64.u32 %rd114, %r322; 2026-02-21T09:31:03.1694834Z cvt.u64.u32 %rd115, %r323; 2026-02-21T09:31:03.1694983Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:31:03.1695140Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:31:03.1695415Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1695705Z mov.b64 {%r482, %r483}, %rd117; 2026-02-21T09:31:03.1695877Z cvt.rn.f16x2.f32 %r484, %r483, %r482; 2026-02-21T09:31:03.1696158Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1696457Z cvt.u64.u32 %rd118, %r324; 2026-02-21T09:31:03.1696607Z cvt.u64.u32 %rd119, %r325; 2026-02-21T09:31:03.1696761Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:31:03.1696914Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:31:03.1697191Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1697485Z mov.b64 {%r485, %r486}, %rd121; 2026-02-21T09:31:03.1697650Z cvt.rn.f16x2.f32 %r487, %r486, %r485; 2026-02-21T09:31:03.1697936Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1698214Z cvt.u64.u32 %rd122, %r326; 2026-02-21T09:31:03.1698368Z cvt.u64.u32 %rd123, %r327; 2026-02-21T09:31:03.1698518Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:31:03.1698676Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:31:03.1698952Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1699237Z mov.b64 {%r488, %r489}, %rd125; 2026-02-21T09:31:03.1699407Z cvt.rn.f16x2.f32 %r490, %r489, %r488; 2026-02-21T09:31:03.1699702Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1700020Z cvt.u64.u32 %rd126, %r328; 2026-02-21T09:31:03.1700171Z cvt.u64.u32 %rd127, %r329; 2026-02-21T09:31:03.1700327Z 
shl.b64 %rd128, %rd127, 32; 2026-02-21T09:31:03.1700478Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:31:03.1700749Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1701034Z mov.b64 {%r491, %r492}, %rd129; 2026-02-21T09:31:03.1701193Z cvt.rn.f16x2.f32 %r493, %r492, %r491; 2026-02-21T09:31:03.1701467Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1701742Z cvt.u64.u32 %rd130, %r330; 2026-02-21T09:31:03.1701893Z cvt.u64.u32 %rd131, %r331; 2026-02-21T09:31:03.1702039Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:31:03.1702221Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:31:03.1702517Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1702798Z mov.b64 {%r494, %r495}, %rd133; 2026-02-21T09:31:03.1702964Z cvt.rn.f16x2.f32 %r496, %r495, %r494; 2026-02-21T09:31:03.1703236Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1703522Z cvt.u64.u32 %rd134, %r333; 2026-02-21T09:31:03.1703667Z cvt.u64.u32 %rd135, %r334; 2026-02-21T09:31:03.1703821Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:31:03.1703970Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:31:03.1704240Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1704528Z mov.b64 {%r497, %r498}, %rd137; 2026-02-21T09:31:03.1704737Z cvt.rn.f16x2.f32 %r499, %r498, %r497; 2026-02-21T09:31:03.1705018Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1705300Z cvt.u64.u32 %rd138, %r335; 2026-02-21T09:31:03.1705460Z cvt.u64.u32 %rd139, %r336; 2026-02-21T09:31:03.1705607Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:31:03.1705765Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:31:03.1706034Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1706310Z mov.b64 {%r500, %r501}, %rd141; 2026-02-21T09:31:03.1706481Z 
cvt.rn.f16x2.f32 %r502, %r501, %r500; 2026-02-21T09:31:03.1706750Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1707043Z cvt.u64.u32 %rd142, %r337; 2026-02-21T09:31:03.1707196Z cvt.u64.u32 %rd143, %r338; 2026-02-21T09:31:03.1707358Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:31:03.1707509Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:31:03.1707773Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1708058Z mov.b64 {%r503, %r504}, %rd145; 2026-02-21T09:31:03.1708218Z cvt.rn.f16x2.f32 %r505, %r504, %r503; 2026-02-21T09:31:03.1708499Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1708782Z cvt.u64.u32 %rd146, %r339; 2026-02-21T09:31:03.1708936Z cvt.u64.u32 %rd147, %r340; 2026-02-21T09:31:03.1709082Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:31:03.1709238Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:31:03.1709501Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1709776Z mov.b64 {%r506, %r507}, %rd149; 2026-02-21T09:31:03.1709943Z cvt.rn.f16x2.f32 %r508, %r507, %r506; 2026-02-21T09:31:03.1710207Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1710489Z cvt.u64.u32 %rd150, %r341; 2026-02-21T09:31:03.1710637Z cvt.u64.u32 %rd151, %r342; 2026-02-21T09:31:03.1710793Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:31:03.1710972Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:31:03.1711264Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1711547Z mov.b64 {%r509, %r510}, %rd153; 2026-02-21T09:31:03.1711705Z cvt.rn.f16x2.f32 %r511, %r510, %r509; 2026-02-21T09:31:03.1711985Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1712266Z cvt.u64.u32 %rd154, %r343; 2026-02-21T09:31:03.1712420Z cvt.u64.u32 %rd155, %r344; 2026-02-21T09:31:03.1712566Z 
shl.b64 %rd156, %rd155, 32; 2026-02-21T09:31:03.1712721Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:31:03.1712987Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1713264Z mov.b64 {%r512, %r513}, %rd157; 2026-02-21T09:31:03.1713451Z cvt.rn.f16x2.f32 %r514, %r513, %r512; 2026-02-21T09:31:03.1713746Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1714036Z cvt.u64.u32 %rd158, %r345; 2026-02-21T09:31:03.1714182Z cvt.u64.u32 %rd159, %r346; 2026-02-21T09:31:03.1714335Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:31:03.1714485Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:31:03.1714781Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1715065Z mov.b64 {%r515, %r516}, %rd161; 2026-02-21T09:31:03.1715225Z cvt.rn.f16x2.f32 %r517, %r516, %r515; 2026-02-21T09:31:03.1715501Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1715776Z cvt.u64.u32 %rd162, %r347; 2026-02-21T09:31:03.1715930Z cvt.u64.u32 %rd163, %r348; 2026-02-21T09:31:03.1716075Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:31:03.1716234Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:31:03.1716501Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1716784Z mov.b64 {%r518, %r519}, %rd165; 2026-02-21T09:31:03.1716953Z cvt.rn.f16x2.f32 %r520, %r519, %r518; 2026-02-21T09:31:03.1717223Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1717508Z cvt.u64.u32 %rd166, %r350; 2026-02-21T09:31:03.1717657Z cvt.u64.u32 %rd167, %r351; 2026-02-21T09:31:03.1717813Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:31:03.1717965Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:31:03.1718232Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1718526Z mov.b64 {%r521, %r522}, %rd169; 2026-02-21T09:31:03.1718684Z 
cvt.rn.f16x2.f32 %r523, %r522, %r521; 2026-02-21T09:31:03.1718961Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1719236Z cvt.u64.u32 %rd170, %r352; 2026-02-21T09:31:03.1719390Z cvt.u64.u32 %rd171, %r353; 2026-02-21T09:31:03.1719537Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:31:03.1719697Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:31:03.1719960Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1720236Z mov.b64 {%r524, %r525}, %rd173; 2026-02-21T09:31:03.1720403Z cvt.rn.f16x2.f32 %r526, %r525, %r524; 2026-02-21T09:31:03.1720672Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1720953Z cvt.u64.u32 %rd174, %r354; 2026-02-21T09:31:03.1721099Z cvt.u64.u32 %rd175, %r355; 2026-02-21T09:31:03.1721251Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:31:03.1721399Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:31:03.1721666Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1721951Z mov.b64 {%r527, %r528}, %rd177; 2026-02-21T09:31:03.1722111Z cvt.rn.f16x2.f32 %r529, %r528, %r527; 2026-02-21T09:31:03.1722412Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1722726Z cvt.u64.u32 %rd178, %r356; 2026-02-21T09:31:03.1722882Z cvt.u64.u32 %rd179, %r357; 2026-02-21T09:31:03.1723028Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:31:03.1723184Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:31:03.1723454Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1723733Z mov.b64 {%r530, %r531}, %rd181; 2026-02-21T09:31:03.1723898Z cvt.rn.f16x2.f32 %r532, %r531, %r530; 2026-02-21T09:31:03.1724176Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1724458Z cvt.u64.u32 %rd182, %r358; 2026-02-21T09:31:03.1724604Z cvt.u64.u32 %rd183, %r359; 2026-02-21T09:31:03.1724811Z 
shl.b64 %rd184, %rd183, 32; 2026-02-21T09:31:03.1724998Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:31:03.1725267Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1725548Z mov.b64 {%r533, %r534}, %rd185; 2026-02-21T09:31:03.1725712Z cvt.rn.f16x2.f32 %r535, %r534, %r533; 2026-02-21T09:31:03.1726004Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1726295Z cvt.u64.u32 %rd186, %r360; 2026-02-21T09:31:03.1726457Z cvt.u64.u32 %rd187, %r361; 2026-02-21T09:31:03.1726612Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:31:03.1726775Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:31:03.1727055Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1727345Z mov.b64 {%r536, %r537}, %rd189; 2026-02-21T09:31:03.1727522Z cvt.rn.f16x2.f32 %r538, %r537, %r536; 2026-02-21T09:31:03.1727811Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1728112Z cvt.u64.u32 %rd190, %r362; 2026-02-21T09:31:03.1728269Z cvt.u64.u32 %rd191, %r363; 2026-02-21T09:31:03.1728432Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:31:03.1728590Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:31:03.1728883Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1729184Z mov.b64 {%r539, %r540}, %rd193; 2026-02-21T09:31:03.1729352Z cvt.rn.f16x2.f32 %r541, %r540, %r539; 2026-02-21T09:31:03.1729639Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1729928Z cvt.u64.u32 %rd194, %r364; 2026-02-21T09:31:03.1730091Z cvt.u64.u32 %rd195, %r365; 2026-02-21T09:31:03.1730242Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:31:03.1730404Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:31:03.1730689Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1730980Z mov.b64 {%r542, %r543}, %rd197; 2026-02-21T09:31:03.1731154Z 
cvt.rn.f16x2.f32 %r544, %r543, %r542; 2026-02-21T09:31:03.1731439Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1731733Z cvt.u64.u32 %rd198, %r367; 2026-02-21T09:31:03.1731886Z cvt.u64.u32 %rd199, %r368; 2026-02-21T09:31:03.1732046Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:31:03.1732202Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:31:03.1732481Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1732779Z mov.b64 {%r545, %r546}, %rd201; 2026-02-21T09:31:03.1732946Z cvt.rn.f16x2.f32 %r547, %r546, %r545; 2026-02-21T09:31:03.1733234Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1733519Z cvt.u64.u32 %rd202, %r369; 2026-02-21T09:31:03.1733681Z cvt.u64.u32 %rd203, %r370; 2026-02-21T09:31:03.1733836Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:31:03.1734058Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:31:03.1734320Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1734596Z mov.b64 {%r548, %r549}, %rd205; 2026-02-21T09:31:03.1734800Z cvt.rn.f16x2.f32 %r550, %r549, %r548; 2026-02-21T09:31:03.1735078Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1735362Z cvt.u64.u32 %rd206, %r371; 2026-02-21T09:31:03.1735510Z cvt.u64.u32 %rd207, %r372; 2026-02-21T09:31:03.1735665Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:31:03.1735814Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:31:03.1736082Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1736366Z mov.b64 {%r551, %r552}, %rd209; 2026-02-21T09:31:03.1736575Z cvt.rn.f16x2.f32 %r553, %r552, %r551; 2026-02-21T09:31:03.1736890Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1737178Z cvt.u64.u32 %rd210, %r373; 2026-02-21T09:31:03.1737331Z cvt.u64.u32 %rd211, %r374; 2026-02-21T09:31:03.1737478Z 
shl.b64 %rd212, %rd211, 32; 2026-02-21T09:31:03.1737635Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:31:03.1737898Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1738173Z mov.b64 {%r554, %r555}, %rd213; 2026-02-21T09:31:03.1738340Z cvt.rn.f16x2.f32 %r556, %r555, %r554; 2026-02-21T09:31:03.1738611Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1738908Z cvt.u64.u32 %rd214, %r375; 2026-02-21T09:31:03.1739055Z cvt.u64.u32 %rd215, %r376; 2026-02-21T09:31:03.1739207Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:31:03.1739360Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:31:03.1739631Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1739931Z mov.b64 {%r557, %r558}, %rd217; 2026-02-21T09:31:03.1740093Z cvt.rn.f16x2.f32 %r559, %r558, %r557; 2026-02-21T09:31:03.1740375Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1740653Z cvt.u64.u32 %rd218, %r377; 2026-02-21T09:31:03.1740807Z cvt.u64.u32 %rd219, %r378; 2026-02-21T09:31:03.1740953Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:31:03.1741110Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:31:03.1741373Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1741650Z mov.b64 {%r560, %r561}, %rd221; 2026-02-21T09:31:03.1741816Z cvt.rn.f16x2.f32 %r562, %r561, %r560; 2026-02-21T09:31:03.1742089Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1742370Z cvt.u64.u32 %rd222, %r379; 2026-02-21T09:31:03.1742519Z cvt.u64.u32 %rd223, %r380; 2026-02-21T09:31:03.1742675Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:31:03.1742825Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:31:03.1743088Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1743379Z mov.b64 {%r563, %r564}, %rd225; 2026-02-21T09:31:03.1743539Z 
cvt.rn.f16x2.f32 %r565, %r564, %r563; 2026-02-21T09:31:03.1743812Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1744089Z cvt.u64.u32 %rd226, %r381; 2026-02-21T09:31:03.1744245Z cvt.u64.u32 %rd227, %r382; 2026-02-21T09:31:03.1744390Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:31:03.1744548Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:31:03.1744851Z .loc 1 51 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:51:27 2026-02-21T09:31:03.1745130Z mov.b64 {%r566, %r567}, %rd229; 2026-02-21T09:31:03.1745322Z cvt.rn.f16x2.f32 %r568, %r567, %r566; 2026-02-21T09:31:03.1745616Z .loc 1 52 82 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:52:82 2026-02-21T09:31:03.1745933Z st.shared.v4.b32 [%r52], {%r475, %r487, %r499, %r511}; 2026-02-21T09:31:03.1746159Z st.shared.v4.b32 [%r53], {%r523, %r535, %r547, %r559}; 2026-02-21T09:31:03.1746353Z bar.sync 0; 2026-02-21T09:31:03.1746490Z // begin inline asm 2026-02-21T09:31:03.1746718Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r384, %r385, %r386, %r387}, [%r388]; 2026-02-21T09:31:03.1746988Z // end inline asm 2026-02-21T09:31:03.1747121Z // begin inline asm 2026-02-21T09:31:03.1747349Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r389, %r390, %r391, %r392}, [%r393]; 2026-02-21T09:31:03.1747600Z // end inline asm 2026-02-21T09:31:03.1747737Z bar.sync 0; 2026-02-21T09:31:03.1747919Z st.shared.v4.b32 [%r52], {%r478, %r490, %r502, %r514}; 2026-02-21T09:31:03.1748168Z st.shared.v4.b32 [%r53], {%r526, %r538, %r550, %r562}; 2026-02-21T09:31:03.1748362Z bar.sync 0; 2026-02-21T09:31:03.1748490Z // begin inline asm 2026-02-21T09:31:03.1748717Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r394, %r395, %r396, %r397}, [%r388]; 2026-02-21T09:31:03.1748967Z // end inline asm 2026-02-21T09:31:03.1749107Z // begin inline asm 2026-02-21T09:31:03.1749320Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r399, %r400, %r401, %r402}, [%r393]; 2026-02-21T09:31:03.1749578Z // end inline asm 
2026-02-21T09:31:03.1749707Z bar.sync 0; 2026-02-21T09:31:03.1749872Z st.shared.v4.b32 [%r52], {%r481, %r493, %r505, %r517}; 2026-02-21T09:31:03.1750098Z st.shared.v4.b32 [%r53], {%r529, %r541, %r553, %r565}; 2026-02-21T09:31:03.1750285Z bar.sync 0; 2026-02-21T09:31:03.1750420Z // begin inline asm 2026-02-21T09:31:03.1750642Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r404, %r405, %r406, %r407}, [%r388]; 2026-02-21T09:31:03.1750904Z // end inline asm 2026-02-21T09:31:03.1751036Z // begin inline asm 2026-02-21T09:31:03.1751257Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r409, %r410, %r411, %r412}, [%r393]; 2026-02-21T09:31:03.1751501Z // end inline asm 2026-02-21T09:31:03.1751635Z bar.sync 0; 2026-02-21T09:31:03.1751791Z st.shared.v4.b32 [%r52], {%r484, %r496, %r508, %r520}; 2026-02-21T09:31:03.1752013Z st.shared.v4.b32 [%r53], {%r532, %r544, %r556, %r568}; 2026-02-21T09:31:03.1752311Z bar.sync 0; 2026-02-21T09:31:03.1752433Z // begin inline asm 2026-02-21T09:31:03.1752656Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r414, %r415, %r416, %r417}, [%r388]; 2026-02-21T09:31:03.1752905Z // end inline asm 2026-02-21T09:31:03.1753039Z // begin inline asm 2026-02-21T09:31:03.1753254Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r419, %r420, %r421, %r422}, [%r393]; 2026-02-21T09:31:03.1753508Z // end inline asm 2026-02-21T09:31:03.1753646Z // begin inline asm 2026-02-21T09:31:03.1753825Z st.global.v4.b32 [ %rd94 + 0 ], { %r384, %r394, %r404, %r414 }; 2026-02-21T09:31:03.1754042Z // end inline asm 2026-02-21T09:31:03.1754173Z // begin inline asm 2026-02-21T09:31:03.1754356Z st.global.v4.b32 [ %rd95 + 0 ], { %r385, %r395, %r405, %r415 }; 2026-02-21T09:31:03.1754557Z // end inline asm 2026-02-21T09:31:03.1754752Z // begin inline asm 2026-02-21T09:31:03.1754928Z st.global.v4.b32 [ %rd96 + 0 ], { %r386, %r396, %r406, %r416 }; 2026-02-21T09:31:03.1755140Z // end inline asm 2026-02-21T09:31:03.1755271Z // begin inline asm 2026-02-21T09:31:03.1755454Z st.global.v4.b32 [ %rd97 + 0 ], { 
%r387, %r397, %r407, %r417 }; 2026-02-21T09:31:03.1755661Z // end inline asm 2026-02-21T09:31:03.1755794Z // begin inline asm 2026-02-21T09:31:03.1755973Z st.global.v4.b32 [ %rd98 + 0 ], { %r389, %r399, %r409, %r419 }; 2026-02-21T09:31:03.1756173Z // end inline asm 2026-02-21T09:31:03.1756308Z // begin inline asm 2026-02-21T09:31:03.1756480Z st.global.v4.b32 [ %rd99 + 0 ], { %r390, %r400, %r410, %r420 }; 2026-02-21T09:31:03.1756688Z // end inline asm 2026-02-21T09:31:03.1756819Z // begin inline asm 2026-02-21T09:31:03.1757007Z st.global.v4.b32 [ %rd100 + 0 ], { %r391, %r401, %r411, %r421 }; 2026-02-21T09:31:03.1757282Z // end inline asm 2026-02-21T09:31:03.1757410Z // begin inline asm 2026-02-21T09:31:03.1757589Z st.global.v4.b32 [ %rd101 + 0 ], { %r392, %r402, %r412, %r422 }; 2026-02-21T09:31:03.1757797Z // end inline asm 2026-02-21T09:31:03.1758057Z .loc 1 23 107 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:23:107 2026-02-21T09:31:03.1758354Z add.s32 %r570, %r570, 1; 2026-02-21T09:31:03.1758520Z setp.ne.b32 %p29, %r570, %r4; 2026-02-21T09:31:03.1758678Z @%p29 bra $L__BB0_2; 2026-02-21T09:31:03.1758827Z bra.uni $L__BB0_9; 2026-02-21T09:31:03.1759012Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:31:03.1759247Z // Child Loop BB0_5 Depth 2 2026-02-21T09:31:03.1759587Z .loc 1 29 35 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:29:35 2026-02-21T09:31:03.1759915Z mul.hi.s32 %r233, %r570, 715827883; 2026-02-21T09:31:03.1760097Z shr.u32 %r234, %r233, 31; 2026-02-21T09:31:03.1760255Z shr.s32 %r235, %r233, 5; 2026-02-21T09:31:03.1760415Z add.s32 %r236, %r235, %r234; 2026-02-21T09:31:03.1760686Z .loc 1 32 45 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:32:45 2026-02-21T09:31:03.1760976Z mul.lo.s32 %r237, %r236, 192; 2026-02-21T09:31:03.1761142Z sub.s32 %r238, %r570, %r237; 2026-02-21T09:31:03.1761404Z .loc 1 32 64 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:32:64 2026-02-21T09:31:03.1761695Z cvt.u16.u32 %rs1, 
%r238; 2026-02-21T09:31:03.1761951Z .loc 1 33 51 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:33:51 2026-02-21T09:31:03.1762240Z shr.s16 %rs2, %rs1, 15; 2026-02-21T09:31:03.1762388Z shr.u16 %rs3, %rs2, 14; 2026-02-21T09:31:03.1762542Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T09:31:03.1762701Z shr.s16 %rs5, %rs4, 2; 2026-02-21T09:31:03.1762956Z .loc 1 32 64 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:32:64 2026-02-21T09:31:03.1763254Z and.b16 %rs6, %rs4, -4; 2026-02-21T09:31:03.1763402Z sub.s16 %rs7, %rs1, %rs6; 2026-02-21T09:31:03.1763665Z .loc 1 34 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:34:27 2026-02-21T09:31:03.1763946Z shl.b32 %r60, %r236, 8; 2026-02-21T09:31:03.1764102Z mul.wide.s16 %r61, %rs7, 64; 2026-02-21T09:31:03.1764265Z add.s32 %r239, %r61, %r60; 2026-02-21T09:31:03.1764528Z .loc 1 35 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:35:32 2026-02-21T09:31:03.1764863Z or.b32 %r240, %r239, %r6; 2026-02-21T09:31:03.1765118Z .loc 1 36 27 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:36:27 2026-02-21T09:31:03.1765418Z mul.wide.s16 %r241, %rs5, 256; 2026-02-21T09:31:03.1765691Z .loc 1 37 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:37:32 2026-02-21T09:31:03.1765977Z or.b32 %r242, %r241, %r9; 2026-02-21T09:31:03.1766239Z .loc 1 47 53 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:53 2026-02-21T09:31:03.1766515Z shl.b32 %r243, %r242, 10; 2026-02-21T09:31:03.1766776Z .loc 1 48 80 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:80 2026-02-21T09:31:03.1767054Z shl.b32 %r244, %r240, 10; 2026-02-21T09:31:03.1767315Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1767610Z shfl.sync.idx.b32 %r71, %r5, 0, 31, -1; 2026-02-21T09:31:03.1767794Z shl.b32 %r245, %r71, 21; 2026-02-21T09:31:03.1767954Z and.b32 %r246, %r245, 6291456; 2026-02-21T09:31:03.1768111Z add.s32 %r247, %r246, %r569; 
2026-02-21T09:31:03.1768270Z shl.b32 %r248, %r71, 4; 2026-02-21T09:31:03.1768412Z and.b32 %r249, %r248, 64; 2026-02-21T09:31:03.1768570Z add.s32 %r383, %r247, %r249; 2026-02-21T09:31:03.1768723Z mov.pred %p4, -1; 2026-02-21T09:31:03.1768901Z mov.b32 %r571, 0; 2026-02-21T09:31:03.1769060Z // begin inline asm 2026-02-21T09:31:03.1769434Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 0], {%r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571}; 2026-02-21T09:31:03.1769844Z // end inline asm 2026-02-21T09:31:03.1769992Z // begin inline asm 2026-02-21T09:31:03.1770369Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 16], {%r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571}; 2026-02-21T09:31:03.1770768Z // end inline asm 2026-02-21T09:31:03.1770922Z // begin inline asm 2026-02-21T09:31:03.1771284Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 32], {%r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571}; 2026-02-21T09:31:03.1771738Z // end inline asm 2026-02-21T09:31:03.1771917Z // begin inline asm 2026-02-21T09:31:03.1772276Z @%p4 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r383 + 48], {%r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571, %r571}; 2026-02-21T09:31:03.1772678Z // end inline asm 2026-02-21T09:31:03.1772815Z // begin inline asm 2026-02-21T09:31:03.1772978Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:03.1773146Z // end inline asm 2026-02-21T09:31:03.1773296Z bar.sync 0; 2026-02-21T09:31:03.1773552Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1773858Z add.s32 %r572, %r85, 81920; 2026-02-21T09:31:03.1774021Z // begin inline asm 2026-02-21T09:31:03.1774192Z @%p8 mbarrier.init.shared::cta.b64 [%r572], 1; 2026-02-21T09:31:03.1774392Z // end inline asm 
2026-02-21T09:31:03.1774527Z bar.sync 0; 2026-02-21T09:31:03.1774714Z add.s32 %r190, %r85, 81928; 2026-02-21T09:31:03.1774874Z // begin inline asm 2026-02-21T09:31:03.1775049Z @%p8 mbarrier.init.shared::cta.b64 [%r190], 1; 2026-02-21T09:31:03.1775238Z // end inline asm 2026-02-21T09:31:03.1775500Z .loc 1 47 60 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:60 2026-02-21T09:31:03.1775801Z or.b32 %r251, %r243, %r18; 2026-02-21T09:31:03.1776077Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1776386Z mad.wide.s32 %rd21, %r251, 2, %rd14; 2026-02-21T09:31:03.1776568Z cvt.u64.u32 %rd42, %r18; 2026-02-21T09:31:03.1776734Z cvt.s64.s32 %rd5, %r243; 2026-02-21T09:31:03.1776892Z or.b64 %rd43, %rd5, %rd42; 2026-02-21T09:31:03.1777056Z shl.b64 %rd44, %rd43, 1; 2026-02-21T09:31:03.1777211Z add.s64 %rd6, %rd14, %rd44; 2026-02-21T09:31:03.1777380Z add.s64 %rd22, %rd6, 262144; 2026-02-21T09:31:03.1777541Z mov.b32 %r261, 16; 2026-02-21T09:31:03.1777797Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1778080Z // begin inline asm 2026-02-21T09:31:03.1778277Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd21 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1778504Z // end inline asm 2026-02-21T09:31:03.1778635Z // begin inline asm 2026-02-21T09:31:03.1778831Z cp.async.cg.shared.global [ %r193 + 0 ], [ %rd22 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1779046Z // end inline asm 2026-02-21T09:31:03.1779189Z cp.async.commit_group; 2026-02-21T09:31:03.1779447Z .loc 1 48 59 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:59 2026-02-21T09:31:03.1779726Z or.b32 %r252, %r244, %r19; 2026-02-21T09:31:03.1779987Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1780269Z mad.wide.s32 %rd23, %r252, 2, %rd15; 2026-02-21T09:31:03.1780443Z mov.b32 %r196, 8; 2026-02-21T09:31:03.1780680Z .loc 1 48 87 // 
c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1780963Z // begin inline asm 2026-02-21T09:31:03.1781190Z cp.async.ca.shared.global [ %r195 + 0 ], [ %rd23 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1781432Z // end inline asm 2026-02-21T09:31:03.1781579Z cp.async.commit_group; 2026-02-21T09:31:03.1781829Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1782113Z add.s64 %rd24, %rd6, 32; 2026-02-21T09:31:03.1782262Z cvt.u64.u32 %rd45, %r24; 2026-02-21T09:31:03.1782421Z or.b64 %rd46, %rd5, %rd45; 2026-02-21T09:31:03.1782576Z shl.b64 %rd47, %rd46, 1; 2026-02-21T09:31:03.1782738Z add.s64 %rd48, %rd14, %rd47; 2026-02-21T09:31:03.1782901Z add.s64 %rd25, %rd48, 262144; 2026-02-21T09:31:03.1783161Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1783460Z bar.sync 0; 2026-02-21T09:31:03.1783596Z // begin inline asm 2026-02-21T09:31:03.1783823Z cp.async.cg.shared.global [ %r197 + 0 ], [ %rd24 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1784062Z // end inline asm 2026-02-21T09:31:03.1784205Z // begin inline asm 2026-02-21T09:31:03.1784394Z cp.async.cg.shared.global [ %r199 + 0 ], [ %rd25 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1784616Z // end inline asm 2026-02-21T09:31:03.1784802Z cp.async.commit_group; 2026-02-21T09:31:03.1785060Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1785350Z add.s64 %rd26, %rd23, 32; 2026-02-21T09:31:03.1785610Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1785892Z // begin inline asm 2026-02-21T09:31:03.1786081Z cp.async.ca.shared.global [ %r201 + 0 ], [ %rd26 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1786309Z // end inline asm 2026-02-21T09:31:03.1786453Z cp.async.commit_group; 2026-02-21T09:31:03.1786710Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1787008Z add.s64 
%rd27, %rd6, 64; 2026-02-21T09:31:03.1787159Z cvt.u64.u32 %rd49, %r28; 2026-02-21T09:31:03.1787318Z or.b64 %rd50, %rd5, %rd49; 2026-02-21T09:31:03.1787471Z shl.b64 %rd51, %rd50, 1; 2026-02-21T09:31:03.1787628Z add.s64 %rd52, %rd14, %rd51; 2026-02-21T09:31:03.1787785Z add.s64 %rd28, %rd52, 262144; 2026-02-21T09:31:03.1788058Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1788343Z bar.sync 0; 2026-02-21T09:31:03.1788469Z // begin inline asm 2026-02-21T09:31:03.1788666Z cp.async.cg.shared.global [ %r203 + 0 ], [ %rd27 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1788886Z // end inline asm 2026-02-21T09:31:03.1789022Z // begin inline asm 2026-02-21T09:31:03.1789210Z cp.async.cg.shared.global [ %r205 + 0 ], [ %rd28 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1789428Z // end inline asm 2026-02-21T09:31:03.1789563Z cp.async.commit_group; 2026-02-21T09:31:03.1789826Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1790113Z add.s64 %rd29, %rd23, 64; 2026-02-21T09:31:03.1790373Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1790661Z // begin inline asm 2026-02-21T09:31:03.1790850Z cp.async.ca.shared.global [ %r207 + 0 ], [ %rd29 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1791069Z // end inline asm 2026-02-21T09:31:03.1791204Z cp.async.commit_group; 2026-02-21T09:31:03.1791463Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1791750Z add.s64 %rd30, %rd6, 96; 2026-02-21T09:31:03.1791904Z cvt.u64.u32 %rd53, %r32; 2026-02-21T09:31:03.1792057Z or.b64 %rd54, %rd5, %rd53; 2026-02-21T09:31:03.1792208Z shl.b64 %rd55, %rd54, 1; 2026-02-21T09:31:03.1792365Z add.s64 %rd56, %rd14, %rd55; 2026-02-21T09:31:03.1792519Z add.s64 %rd31, %rd56, 262144; 2026-02-21T09:31:03.1792792Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1793134Z bar.sync 0; 
2026-02-21T09:31:03.1793279Z // begin inline asm 2026-02-21T09:31:03.1793478Z cp.async.cg.shared.global [ %r209 + 0 ], [ %rd30 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1793709Z // end inline asm 2026-02-21T09:31:03.1793853Z // begin inline asm 2026-02-21T09:31:03.1794046Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd31 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1794274Z // end inline asm 2026-02-21T09:31:03.1794416Z cp.async.commit_group; 2026-02-21T09:31:03.1794795Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1795075Z add.s64 %rd32, %rd23, 96; 2026-02-21T09:31:03.1795341Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1795623Z // begin inline asm 2026-02-21T09:31:03.1795868Z cp.async.ca.shared.global [ %r213 + 0 ], [ %rd32 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1796096Z // end inline asm 2026-02-21T09:31:03.1796233Z cp.async.commit_group; 2026-02-21T09:31:03.1796489Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1796765Z add.s64 %rd33, %rd6, 128; 2026-02-21T09:31:03.1796922Z cvt.u64.u32 %rd57, %r36; 2026-02-21T09:31:03.1797072Z or.b64 %rd58, %rd5, %rd57; 2026-02-21T09:31:03.1797230Z shl.b64 %rd59, %rd58, 1; 2026-02-21T09:31:03.1797388Z add.s64 %rd60, %rd14, %rd59; 2026-02-21T09:31:03.1797545Z add.s64 %rd34, %rd60, 262144; 2026-02-21T09:31:03.1797809Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1798082Z bar.sync 0; 2026-02-21T09:31:03.1798216Z // begin inline asm 2026-02-21T09:31:03.1798404Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd33 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1798624Z // end inline asm 2026-02-21T09:31:03.1798755Z // begin inline asm 2026-02-21T09:31:03.1798951Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd34 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1799174Z // end inline asm 2026-02-21T09:31:03.1799324Z cp.async.commit_group; 
2026-02-21T09:31:03.1799577Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1799854Z add.s64 %rd35, %rd23, 128; 2026-02-21T09:31:03.1800116Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1800385Z // begin inline asm 2026-02-21T09:31:03.1800581Z cp.async.ca.shared.global [ %r219 + 0 ], [ %rd35 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1800794Z // end inline asm 2026-02-21T09:31:03.1800933Z cp.async.commit_group; 2026-02-21T09:31:03.1801188Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1801466Z add.s64 %rd36, %rd6, 160; 2026-02-21T09:31:03.1801624Z cvt.u64.u32 %rd61, %r40; 2026-02-21T09:31:03.1801775Z or.b64 %rd62, %rd5, %rd61; 2026-02-21T09:31:03.1801932Z shl.b64 %rd63, %rd62, 1; 2026-02-21T09:31:03.1802083Z add.s64 %rd64, %rd14, %rd63; 2026-02-21T09:31:03.1802244Z add.s64 %rd37, %rd64, 262144; 2026-02-21T09:31:03.1802504Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1802790Z bar.sync 0; 2026-02-21T09:31:03.1802924Z // begin inline asm 2026-02-21T09:31:03.1803116Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd36 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1803344Z // end inline asm 2026-02-21T09:31:03.1803477Z // begin inline asm 2026-02-21T09:31:03.1803673Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd37 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1803888Z // end inline asm 2026-02-21T09:31:03.1804030Z cp.async.commit_group; 2026-02-21T09:31:03.1804277Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1804568Z add.s64 %rd38, %rd23, 160; 2026-02-21T09:31:03.1804927Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1805245Z // begin inline asm 2026-02-21T09:31:03.1805441Z cp.async.ca.shared.global [ %r225 + 0 ], [ %rd38 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1805653Z // end 
inline asm 2026-02-21T09:31:03.1805796Z cp.async.commit_group; 2026-02-21T09:31:03.1806050Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1806339Z add.s64 %rd39, %rd6, 192; 2026-02-21T09:31:03.1806498Z cvt.u64.u32 %rd65, %r44; 2026-02-21T09:31:03.1806648Z or.b64 %rd66, %rd5, %rd65; 2026-02-21T09:31:03.1806805Z shl.b64 %rd67, %rd66, 1; 2026-02-21T09:31:03.1806955Z add.s64 %rd68, %rd14, %rd67; 2026-02-21T09:31:03.1807120Z add.s64 %rd40, %rd68, 262144; 2026-02-21T09:31:03.1807408Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1807723Z bar.sync 0; 2026-02-21T09:31:03.1807854Z // begin inline asm 2026-02-21T09:31:03.1808047Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd39 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1808268Z // end inline asm 2026-02-21T09:31:03.1808397Z // begin inline asm 2026-02-21T09:31:03.1808591Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd40 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1808808Z // end inline asm 2026-02-21T09:31:03.1808949Z cp.async.commit_group; 2026-02-21T09:31:03.1809194Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1809480Z add.s64 %rd41, %rd23, 192; 2026-02-21T09:31:03.1809728Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1810004Z // begin inline asm 2026-02-21T09:31:03.1810195Z cp.async.ca.shared.global [ %r231 + 0 ], [ %rd41 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1810407Z // end inline asm 2026-02-21T09:31:03.1810551Z cp.async.commit_group; 2026-02-21T09:31:03.1810795Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1811082Z cp.async.wait_group 12; 2026-02-21T09:31:03.1811231Z bar.sync 0; 2026-02-21T09:31:03.1811465Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1811750Z setp.ne.b32 %p10, %r71, 0; 
2026-02-21T09:31:03.1811902Z @%p10 bra $L__BB0_4; 2026-02-21T09:31:03.1812090Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:03.1812304Z elect.sync %r257|%p12, -1; 2026-02-21T09:31:03.1812466Z mov.b32 %r254, 135266320; 2026-02-21T09:31:03.1812616Z mov.pred %p11, 0; 2026-02-21T09:31:03.1812759Z // begin inline asm 2026-02-21T09:31:03.1812981Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r569 + 0 ], %rd69, %rd70, %r254, %p11; 2026-02-21T09:31:03.1813241Z // end inline asm 2026-02-21T09:31:03.1813386Z // begin inline asm 2026-02-21T09:31:03.1813613Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r569 + 64 ], %rd71, %rd70, %r254, %p11; 2026-02-21T09:31:03.1813882Z // end inline asm 2026-02-21T09:31:03.1814025Z add.s32 %r259, %r85, 81920; 2026-02-21T09:31:03.1814207Z cvt.u64.u32 %rd73, %r259; 2026-02-21T09:31:03.1814361Z // begin inline asm 2026-02-21T09:31:03.1814576Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd73]; 2026-02-21T09:31:03.1815033Z // end inline asm 2026-02-21T09:31:03.1815239Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:03.1815588Z .loc 1 0 0 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:0 2026-02-21T09:31:03.1815883Z cvt.s32.s16 %r59, %rs5; 2026-02-21T09:31:03.1816050Z or.b32 %r62, %r239, %r8; 2026-02-21T09:31:03.1816205Z or.b32 %r63, %r241, %r10; 2026-02-21T09:31:03.1816368Z or.b32 %r64, %r241, %r11; 2026-02-21T09:31:03.1816521Z or.b32 %r65, %r241, %r12; 2026-02-21T09:31:03.1816681Z or.b32 %r66, %r241, %r13; 2026-02-21T09:31:03.1816867Z or.b32 %r67, %r241, %r14; 2026-02-21T09:31:03.1817065Z or.b32 %r68, %r241, %r15; 2026-02-21T09:31:03.1817220Z or.b32 %r69, %r241, %r16; 2026-02-21T09:31:03.1817369Z or.b32 %r70, %r241, %r17; 2026-02-21T09:31:03.1817641Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1817933Z add.s64 %rd74, %rd6, 224; 2026-02-21T09:31:03.1818096Z cvt.u64.u32 %rd78, %r48; 2026-02-21T09:31:03.1818253Z add.s64 %rd79, %rd5, %rd78; 
2026-02-21T09:31:03.1818420Z shl.b64 %rd80, %rd79, 1; 2026-02-21T09:31:03.1818575Z add.s64 %rd81, %rd14, %rd80; 2026-02-21T09:31:03.1818743Z add.s64 %rd75, %rd81, 262144; 2026-02-21T09:31:03.1819020Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1819314Z bar.sync 0; 2026-02-21T09:31:03.1819457Z // begin inline asm 2026-02-21T09:31:03.1819707Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd74 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1819950Z // end inline asm 2026-02-21T09:31:03.1820088Z // begin inline asm 2026-02-21T09:31:03.1820293Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd75 + 0 ], 0x10, %r261; 2026-02-21T09:31:03.1820517Z // end inline asm 2026-02-21T09:31:03.1820664Z cp.async.commit_group; 2026-02-21T09:31:03.1820938Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1821233Z add.s64 %rd76, %rd23, 224; 2026-02-21T09:31:03.1821510Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1821791Z // begin inline asm 2026-02-21T09:31:03.1821984Z cp.async.ca.shared.global [ %r264 + 0 ], [ %rd76 + 0 ], 0x8, %r196; 2026-02-21T09:31:03.1822198Z // end inline asm 2026-02-21T09:31:03.1822339Z cp.async.commit_group; 2026-02-21T09:31:03.1822595Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1822880Z shl.b32 %r270, %r59, 18; 2026-02-21T09:31:03.1823039Z or.b32 %r271, %r56, %r270; 2026-02-21T09:31:03.1823198Z mad.wide.s32 %rd230, %r271, 2, %rd4; 2026-02-21T09:31:03.1823373Z add.s32 %r272, %r6, %r60; 2026-02-21T09:31:03.1823521Z add.s32 %r273, %r272, %r61; 2026-02-21T09:31:03.1823679Z shl.b32 %r274, %r273, 10; 2026-02-21T09:31:03.1823826Z or.b32 %r275, %r57, %r274; 2026-02-21T09:31:03.1823986Z cvt.u64.u32 %rd9, %r275; 2026-02-21T09:31:03.1824131Z mov.b32 %r575, 1; 2026-02-21T09:31:03.1824274Z mov.b32 %r574, 7; 2026-02-21T09:31:03.1824417Z mov.b64 %rd231, 0; 
2026-02-21T09:31:03.1824552Z mov.b32 %r573, %r571; 2026-02-21T09:31:03.1824731Z mov.b32 %r576, %r571; 2026-02-21T09:31:03.1824871Z bra.uni $L__BB0_5; 2026-02-21T09:31:03.1825054Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:03.1825364Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1825658Z setp.lt.u64 %p23, %rd231, 896; 2026-02-21T09:31:03.1825934Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1826219Z // begin inline asm 2026-02-21T09:31:03.1826356Z 2026-02-21T09:31:03.1826467Z { 2026-02-21T09:31:03.1826595Z .reg .pred complete; 2026-02-21T09:31:03.1826735Z waitLoop: 2026-02-21T09:31:03.1826929Z mbarrier.try_wait.parity.shared.b64 complete, [%r572], %r571; 2026-02-21T09:31:03.1827164Z @!complete bra.uni waitLoop; 2026-02-21T09:31:03.1827321Z } 2026-02-21T09:31:03.1827385Z 2026-02-21T09:31:03.1827439Z // end inline asm 2026-02-21T09:31:03.1827579Z add.s32 %r304, %r575, 1; 2026-02-21T09:31:03.1827729Z setp.gt.s32 %p24, %r304, 1; 2026-02-21T09:31:03.1827893Z selp.b32 %r575, 0, %r304, %p24; 2026-02-21T09:31:03.1828065Z selp.b32 %r305, 1, 0, %p24; 2026-02-21T09:31:03.1828215Z xor.b32 %r82, %r576, %r305; 2026-02-21T09:31:03.1828482Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1828790Z add.s32 %r306, %r574, 1; 2026-02-21T09:31:03.1828970Z setp.gt.s32 %p25, %r306, 7; 2026-02-21T09:31:03.1829120Z selp.b32 %r574, 0, %r306, %p25; 2026-02-21T09:31:03.1829393Z .loc 1 47 32 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:32 2026-02-21T09:31:03.1829684Z add.s64 %rd90, %rd230, -262144; 2026-02-21T09:31:03.1829947Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1830231Z shl.b32 %r307, %r574, 13; 2026-02-21T09:31:03.1830376Z add.s32 %r309, %r85, %r307; 2026-02-21T09:31:03.1830529Z bar.sync 0; 2026-02-21T09:31:03.1830656Z add.s32 %r298, %r309, 
%r20; 2026-02-21T09:31:03.1830816Z selp.b32 %r299, 16, 0, %p23; 2026-02-21T09:31:03.1830965Z // begin inline asm 2026-02-21T09:31:03.1831190Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd90 + 0 ], 0x10, %r299; 2026-02-21T09:31:03.1831415Z // end inline asm 2026-02-21T09:31:03.1831577Z add.s32 %r300, %r298, 4096; 2026-02-21T09:31:03.1831735Z // begin inline asm 2026-02-21T09:31:03.1831928Z cp.async.cg.shared.global [ %r300 + 0 ], [ %rd230 + 0 ], 0x10, %r299; 2026-02-21T09:31:03.1832150Z // end inline asm 2026-02-21T09:31:03.1832285Z cp.async.commit_group; 2026-02-21T09:31:03.1832549Z .loc 1 48 34 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:34 2026-02-21T09:31:03.1832825Z add.s64 %rd93, %rd9, %rd231; 2026-02-21T09:31:03.1832987Z cvt.u32.u64 %r310, %rd93; 2026-02-21T09:31:03.1833149Z mad.wide.s32 %rd92, %r310, 2, %rd15; 2026-02-21T09:31:03.1833419Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1833706Z shl.b32 %r311, %r574, 11; 2026-02-21T09:31:03.1833857Z add.s32 %r302, %r195, %r311; 2026-02-21T09:31:03.1834017Z selp.b32 %r303, 8, 0, %p23; 2026-02-21T09:31:03.1834168Z // begin inline asm 2026-02-21T09:31:03.1834368Z cp.async.ca.shared.global [ %r302 + 0 ], [ %rd92 + 0 ], 0x8, %r303; 2026-02-21T09:31:03.1834583Z // end inline asm 2026-02-21T09:31:03.1834769Z cp.async.commit_group; 2026-02-21T09:31:03.1835035Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1835324Z add.s64 %rd12, %rd231, 16; 2026-02-21T09:31:03.1835486Z add.s64 %rd230, %rd230, 32; 2026-02-21T09:31:03.1835644Z setp.lt.u64 %p26, %rd231, 992; 2026-02-21T09:31:03.1835808Z mov.b64 %rd231, %rd12; 2026-02-21T09:31:03.1835951Z mov.b32 %r571, %r576; 2026-02-21T09:31:03.1836095Z mov.b32 %r572, %r312; 2026-02-21T09:31:03.1836230Z mov.b32 %r576, %r82; 2026-02-21T09:31:03.1836378Z @%p26 bra $L__BB0_5; 2026-02-21T09:31:03.1836516Z bra.uni $L__BB0_8; 2026-02-21T09:31:03.1836702Z $L__BB0_5: // 
Parent Loop BB0_2 Depth=1 2026-02-21T09:31:03.1836947Z // => This Inner Loop Header: Depth=2 2026-02-21T09:31:03.1837154Z add.s32 %r277, %r573, 1; 2026-02-21T09:31:03.1837313Z setp.gt.s32 %p17, %r277, 7; 2026-02-21T09:31:03.1837470Z selp.b32 %r573, 0, %r277, %p17; 2026-02-21T09:31:03.1837749Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1838036Z cp.async.wait_group 12; 2026-02-21T09:31:03.1838190Z bar.sync 0; 2026-02-21T09:31:03.1838435Z .loc 1 42 89 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:42:89 2026-02-21T09:31:03.1838717Z shl.b32 %r278, %r575, 3; 2026-02-21T09:31:03.1838869Z add.s32 %r280, %r85, %r278; 2026-02-21T09:31:03.1839020Z add.s32 %r312, %r280, 81920; 2026-02-21T09:31:03.1839281Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1839558Z @%p10 bra $L__BB0_7; 2026-02-21T09:31:03.1839742Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:03.1840063Z .loc 1 48 87 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:48:87 2026-02-21T09:31:03.1840393Z shl.b32 %r285, %r573, 11; 2026-02-21T09:31:03.1840548Z add.s32 %r287, %r85, %r285; 2026-02-21T09:31:03.1840695Z add.s32 %r288, %r287, 65536; 2026-02-21T09:31:03.1840954Z .loc 1 47 85 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:47:85 2026-02-21T09:31:03.1841225Z shl.b32 %r289, %r573, 13; 2026-02-21T09:31:03.1841376Z add.s32 %r290, %r85, %r289; 2026-02-21T09:31:03.1841624Z .loc 1 49 52 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:49:52 2026-02-21T09:31:03.1841918Z elect.sync %r291|%p19, -1; 2026-02-21T09:31:03.1842081Z bfe.u32 %r292, %r290, 4, 14; 2026-02-21T09:31:03.1842230Z cvt.u64.u32 %rd87, %r292; 2026-02-21T09:31:03.1842401Z or.b64 %rd82, %rd87, -4611685949674356736; 2026-02-21T09:31:03.1842576Z bfe.u32 %r293, %r288, 4, 14; 2026-02-21T09:31:03.1842753Z cvt.u64.u32 %rd88, %r293; 2026-02-21T09:31:03.1842934Z or.b64 %rd83, %rd88, -4611685949699522560; 
2026-02-21T09:31:03.1843120Z mov.b32 %r282, 135266320; 2026-02-21T09:31:03.1843268Z mov.pred %p18, -1; 2026-02-21T09:31:03.1843412Z // begin inline asm 2026-02-21T09:31:03.1843639Z @%p19 tcgen05.mma.cta_group::1.kind::f16 [ %r569 + 0 ], %rd82, %rd83, %r282, %p18; 2026-02-21T09:31:03.1843886Z // end inline asm 2026-02-21T09:31:03.1844032Z add.s32 %r294, %r290, 4096; 2026-02-21T09:31:03.1844184Z bfe.u32 %r295, %r294, 4, 14; 2026-02-21T09:31:03.1844339Z cvt.u64.u32 %rd89, %r295; 2026-02-21T09:31:03.1844495Z or.b64 %rd84, %rd89, -4611685949674356736; 2026-02-21T09:31:03.1844699Z // begin inline asm 2026-02-21T09:31:03.1844915Z @%p19 tcgen05.mma.cta_group::1.kind::f16 [ %r569 + 64 ], %rd84, %rd83, %r282, %p18; 2026-02-21T09:31:03.1845175Z // end inline asm 2026-02-21T09:31:03.1845318Z cvt.u64.u32 %rd86, %r312; 2026-02-21T09:31:03.1845461Z // begin inline asm 2026-02-21T09:31:03.1845668Z @%p19 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T09:31:03.1845890Z // end inline asm 2026-02-21T09:31:03.1846029Z bra.uni $L__BB0_7; 2026-02-21T09:31:03.1846183Z $L__BB0_9: // %._crit_edge 2026-02-21T09:31:03.1846478Z .loc 1 23 4 // c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py:23:4 2026-02-21T09:31:03.1846757Z bar.sync 0; 2026-02-21T09:31:03.1846894Z // begin inline asm 2026-02-21T09:31:03.1847094Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r569, 128; 2026-02-21T09:31:03.1847306Z // end inline asm 2026-02-21T09:31:03.1847441Z ret; 2026-02-21T09:31:03.1847561Z $L__tmp0: 2026-02-21T09:31:03.1847689Z $L__func_end0: 2026-02-21T09:31:03.1847838Z // -- End function 2026-02-21T09:31:03.1848019Z } 2026-02-21T09:31:03.1848276Z .file 1 "/tmp/torchinductor_root/2v/c2vt3krzxukldj2jxfawqezsf5teluajyytcmun2orsiaxvz5cef.py" 2026-02-21T09:31:03.1848595Z .section .debug_abbrev 2026-02-21T09:31:03.1848739Z { 2026-02-21T09:31:03.1848885Z .b8 1 // Abbreviation Code 2026-02-21T09:31:03.1849107Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:03.1849314Z .b8 0 
// DW_CHILDREN_no 2026-02-21T09:31:03.1849521Z .b8 37 // DW_AT_producer 2026-02-21T09:31:03.1849717Z .b8 8 // DW_FORM_string 2026-02-21T09:31:03.1849918Z .b8 19 // DW_AT_language 2026-02-21T09:31:03.1850113Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:03.1850315Z .b8 3 // DW_AT_name 2026-02-21T09:31:03.1850517Z .b8 8 // DW_FORM_string 2026-02-21T09:31:03.1850714Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:03.1850919Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:03.1851113Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:03.1851344Z .b8 8 // DW_FORM_string 2026-02-21T09:31:03.1851559Z .b8 0 // EOM(1) 2026-02-21T09:31:03.1851747Z .b8 0 // EOM(2) 2026-02-21T09:31:03.1851931Z .b8 0 // EOM(3) 2026-02-21T09:31:03.1852092Z } 2026-02-21T09:31:03.1852217Z .section .debug_info 2026-02-21T09:31:03.1852352Z { 2026-02-21T09:31:03.1852500Z .b32 104 // Length of Unit 2026-02-21T09:31:03.1852712Z .b8 2 // DWARF version number 2026-02-21T09:31:03.1852903Z .b8 0 2026-02-21T09:31:03.1853079Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:31:03.1853335Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:03.1853585Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:03.1853861Z .b8 116 // DW_AT_producer 2026-02-21T09:31:03.1854053Z .b8 114 2026-02-21T09:31:03.1854170Z .b8 105 2026-02-21T09:31:03.1854290Z .b8 116 2026-02-21T09:31:03.1854401Z .b8 111 2026-02-21T09:31:03.1854516Z .b8 110 2026-02-21T09:31:03.1854623Z .b8 0 2026-02-21T09:31:03.1854799Z .b8 2 // DW_AT_language 2026-02-21T09:31:03.1854971Z .b8 0 2026-02-21T09:31:03.1855117Z .b8 99 // DW_AT_name 2026-02-21T09:31:03.1855303Z .b8 50 2026-02-21T09:31:03.1855423Z .b8 118 2026-02-21T09:31:03.1855546Z .b8 116 2026-02-21T09:31:03.1855660Z .b8 51 2026-02-21T09:31:03.1855785Z .b8 107 2026-02-21T09:31:03.1855836Z .b8 114 2026-02-21T09:31:03.1855887Z .b8 122 2026-02-21T09:31:03.1855945Z .b8 120 2026-02-21T09:31:03.1855998Z .b8 117 2026-02-21T09:31:03.1856048Z .b8 107 2026-02-21T09:31:03.1856099Z .b8 108 2026-02-21T09:31:03.1856157Z .b8 100 2026-02-21T09:31:03.1856208Z .b8 106 2026-02-21T09:31:03.1856259Z .b8 50 2026-02-21T09:31:03.1856317Z .b8 106 2026-02-21T09:31:03.1856369Z .b8 120 2026-02-21T09:31:03.1856421Z .b8 102 2026-02-21T09:31:03.1856474Z .b8 97 2026-02-21T09:31:03.1856532Z .b8 119 2026-02-21T09:31:03.1856582Z .b8 113 2026-02-21T09:31:03.1856633Z .b8 101 2026-02-21T09:31:03.1856683Z .b8 122 2026-02-21T09:31:03.1856740Z .b8 115 2026-02-21T09:31:03.1856790Z .b8 102 2026-02-21T09:31:03.1856840Z .b8 53 2026-02-21T09:31:03.1856897Z .b8 116 2026-02-21T09:31:03.1856947Z .b8 101 2026-02-21T09:31:03.1856996Z .b8 108 2026-02-21T09:31:03.1857047Z .b8 117 2026-02-21T09:31:03.1857104Z .b8 97 2026-02-21T09:31:03.1857154Z .b8 106 2026-02-21T09:31:03.1857204Z .b8 121 2026-02-21T09:31:03.1857260Z .b8 121 2026-02-21T09:31:03.1857310Z .b8 116 2026-02-21T09:31:03.1857358Z .b8 99 2026-02-21T09:31:03.1857408Z .b8 109 2026-02-21T09:31:03.1857465Z .b8 117 2026-02-21T09:31:03.1857514Z .b8 110 2026-02-21T09:31:03.1857564Z .b8 50 
2026-02-21T09:31:03.1857612Z .b8 111 2026-02-21T09:31:03.1857669Z .b8 114 2026-02-21T09:31:03.1857721Z .b8 115 2026-02-21T09:31:03.1857770Z .b8 105 2026-02-21T09:31:03.1857827Z .b8 97 2026-02-21T09:31:03.1857879Z .b8 120 2026-02-21T09:31:03.1857931Z .b8 118 2026-02-21T09:31:03.1857981Z .b8 122 2026-02-21T09:31:03.1858037Z .b8 53 2026-02-21T09:31:03.1858089Z .b8 99 2026-02-21T09:31:03.1858140Z .b8 101 2026-02-21T09:31:03.1858195Z .b8 102 2026-02-21T09:31:03.1858246Z .b8 46 2026-02-21T09:31:03.1858295Z .b8 112 2026-02-21T09:31:03.1858346Z .b8 121 2026-02-21T09:31:03.1858404Z .b8 0 2026-02-21T09:31:03.1858499Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:03.1858577Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:03.1858634Z .b8 116 2026-02-21T09:31:03.1858684Z .b8 109 2026-02-21T09:31:03.1858734Z .b8 112 2026-02-21T09:31:03.1858784Z .b8 47 2026-02-21T09:31:03.1858844Z .b8 116 2026-02-21T09:31:03.1858894Z .b8 111 2026-02-21T09:31:03.1858943Z .b8 114 2026-02-21T09:31:03.1859003Z .b8 99 2026-02-21T09:31:03.1859054Z .b8 104 2026-02-21T09:31:03.1859107Z .b8 105 2026-02-21T09:31:03.1859157Z .b8 110 2026-02-21T09:31:03.1859219Z .b8 100 2026-02-21T09:31:03.1859316Z .b8 117 2026-02-21T09:31:03.1859393Z .b8 99 2026-02-21T09:31:03.1859446Z .b8 116 2026-02-21T09:31:03.1859509Z .b8 111 2026-02-21T09:31:03.1859563Z .b8 114 2026-02-21T09:31:03.1859616Z .b8 95 2026-02-21T09:31:03.1859674Z .b8 114 2026-02-21T09:31:03.1859726Z .b8 111 2026-02-21T09:31:03.1859778Z .b8 111 2026-02-21T09:31:03.1859829Z .b8 116 2026-02-21T09:31:03.1859886Z .b8 47 2026-02-21T09:31:03.1859936Z .b8 50 2026-02-21T09:31:03.1859986Z .b8 118 2026-02-21T09:31:03.1860042Z .b8 0 2026-02-21T09:31:03.1860095Z } 2026-02-21T09:31:03.1860163Z .section .debug_macinfo { } 2026-02-21T09:31:03.1860168Z 2026-02-21T09:31:03.1860252Z ================================================================ 2026-02-21T09:31:03.1860367Z please share the reproducer above with Triton project. 
2026-02-21T09:31:06.4387995Z 2026-02-21T09:31:06.4389306Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━ 100/100 10.5 configs/s 2026-02-21T09:31:06.4400149Z [18s] Adaptive compile timeout: 30s (90% percentile=6.6s, bounds=[30.0s, 30s]) 2026-02-21T09:31:07.1293105Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━ 1000/1000 1340.2 configs/s 2026-02-21T09:31:07.2065624Z [18s] Initial random population of 100, 5 starting points: 2026-02-21T09:31:07.2067266Z error=18 2026-02-21T09:31:07.2067471Z ok=82 2026-02-21T09:31:07.2073044Z min=0.1617 2026-02-21T09:31:07.2074590Z mid=2.2375 2026-02-21T09:31:07.2075054Z max=59.4125 2026-02-21T09:31:07.2075200Z best={'block_sizes': [128, 16, 32], 2026-02-21T09:31:07.2078104Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:31:07.2078419Z 'l2_groupings': [16], 2026-02-21T09:31:07.2083188Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:31:07.2087208Z 'loop_orders': [[1, 0]], 2026-02-21T09:31:07.2091157Z 'maxnreg': 128, 2026-02-21T09:31:07.2092516Z 'num_sm_multiplier': 8, 2026-02-21T09:31:07.2092719Z 'num_stages': 7, 2026-02-21T09:31:07.2092886Z 'num_warps': 4, 2026-02-21T09:31:07.2093061Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:31:07.2093262Z 'range_flattens': [None, False], 2026-02-21T09:31:07.2093453Z 'range_multi_buffers': [False, False], 2026-02-21T09:31:07.2093641Z 'range_num_stages': [0, 0], 2026-02-21T09:31:07.2093806Z 'range_unroll_factors': [0, 0], 2026-02-21T09:31:07.2093987Z 'range_warp_specializes': [True, None]} 2026-02-21T09:31:07.2094278Z [18s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:31:08.5827204Z [20s] Generation 1 starting: 94 neighbors, 5 active search path(s) 2026-02-21T09:31:15.8876274Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96/96 15.2 configs/s 2026-02-21T09:31:16.3680338Z 2026-02-21T09:31:16.3680353Z 2026-02-21T09:31:16.3680780Z ================================================================ 
2026-02-21T09:31:16.3681142Z Internal Triton PTX codegen error 2026-02-21T09:31:16.3681343Z `ptxas` stderr: 2026-02-21T09:31:16.3681892Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:16.3682471Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:16.3682654Z 2026-02-21T09:31:16.3683146Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpux472jqq.ptx -o /tmp/tmpux472jqq.ptx.o 2026-02-21T09:31:16.3683707Z 2026-02-21T09:31:16.3683711Z 2026-02-21T09:31:16.3683792Z // 2026-02-21T09:31:16.3683962Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:16.3684191Z // 2026-02-21T09:31:16.3684277Z 2026-02-21T09:31:16.3684345Z .version 8.7 2026-02-21T09:31:16.3684529Z .target sm_100a 2026-02-21T09:31:16.3684804Z .address_size 64 2026-02-21T09:31:16.3684938Z 2026-02-21T09:31:16.3685105Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:16.3685461Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:16.3685755Z // @_helion_matmul 2026-02-21T09:31:16.3686353Z .visible .entry _helion_matmul( 2026-02-21T09:31:16.3686733Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:16.3687079Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:16.3687416Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:16.3687769Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:16.3688084Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:16.3688329Z ) 2026-02-21T09:31:16.3688476Z .reqntid 128 2026-02-21T09:31:16.3688624Z .maxnreg 32 2026-02-21T09:31:16.3688776Z { 2026-02-21T09:31:16.3688923Z .reg .pred %p<133>; 2026-02-21T09:31:16.3689104Z .reg .b16 %rs<6>; 2026-02-21T09:31:16.3689268Z .reg .b32 %r<417>; 
2026-02-21T09:31:16.3689445Z .reg .b64 %rd<162>; 2026-02-21T09:31:16.3689882Z .loc 1 19 0 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:19:0 2026-02-21T09:31:16.3690331Z $L__func_begin0: 2026-02-21T09:31:16.3690658Z .loc 1 19 0 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:19:0 2026-02-21T09:31:16.3690972Z 2026-02-21T09:31:16.3691034Z // %bb.0: 2026-02-21T09:31:16.3691227Z ld.param.b64 %rd11, [_helion_matmul_param_0]; 2026-02-21T09:31:16.3691462Z $L__tmp0: 2026-02-21T09:31:16.3691773Z .loc 1 19 0 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:19 2026-02-21T09:31:16.3692130Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:16.3692348Z ld.param.b64 %rd29, [_helion_matmul_param_1]; 2026-02-21T09:31:16.3692986Z [28s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:31:16.3694826Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=8, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:16.3696362Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:16.3696643Z `ptxas` stderr: 2026-02-21T09:31:16.3697128Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:31:16.3697693Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:16.3697866Z 2026-02-21T09:31:16.3698332Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpux472jqq.ptx -o /tmp/tmpux472jqq.ptx.o 2026-02-21T09:31:16.3698862Z 2026-02-21T09:31:16.3699009Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:31:16.3699312Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:16.3699501Z mov.b32 %r40, global_smem; 2026-02-21T09:31:16.3699687Z // begin inline asm 2026-02-21T09:31:16.3699962Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r40], 32; 2026-02-21T09:31:16.3700253Z // end inline asm 2026-02-21T09:31:16.3700442Z ld.param.b64 %rd46, [_helion_matmul_param_3]; 2026-02-21T09:31:16.3700658Z bar.sync 0; 2026-02-21T09:31:16.3700828Z ld.shared.b32 %r407, [global_smem]; 2026-02-21T09:31:16.3701027Z bar.sync 0; 2026-02-21T09:31:16.3701188Z // begin inline asm 2026-02-21T09:31:16.3701442Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:16.3701748Z // end inline asm 2026-02-21T09:31:16.3702112Z .loc 1 21 68 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:21:68 2026-02-21T09:31:16.3702486Z mov.u32 %r408, %ctaid.x; 2026-02-21T09:31:16.3702670Z mov.u32 %r57, %ctaid.y; 2026-02-21T09:31:16.3702844Z mov.u32 %r58, %ctaid.z; 2026-02-21T09:31:16.3703024Z mov.u32 %r59, %nctaid.x; 2026-02-21T09:31:16.3703322Z mov.u32 %r60, %nctaid.y; 2026-02-21T09:31:16.3703567Z mad.lo.s32 %r61, %r58, %r60, %r57; 2026-02-21T09:31:16.3703768Z mad.lo.s32 %r62, %r61, %r59, %r408; 2026-02-21T09:31:16.3703966Z shl.b32 %r63, %r62, 8; 2026-02-21T09:31:16.3704135Z cvt.s64.s32 %rd47, %r63; 2026-02-21T09:31:16.3704316Z add.s64 %rd25, %rd46, %rd47; 2026-02-21T09:31:16.3704492Z shl.b32 %r64, %r1, 2; 2026-02-21T09:31:16.3704660Z add.s32 %r41, %r40, %r64; 2026-02-21T09:31:16.3704890Z mov.b32 %r50, 0; 
2026-02-21T09:31:16.3705044Z // begin inline asm 2026-02-21T09:31:16.3705227Z @%p1 st.shared.b32 [ %r41 + 0 ], %r50; 2026-02-21T09:31:16.3705424Z // end inline asm 2026-02-21T09:31:16.3705587Z bar.warp.sync -1; 2026-02-21T09:31:16.3705800Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:31:16.3705984Z cvt.u64.u32 %rd10, %r40; 2026-02-21T09:31:16.3706171Z // begin inline asm 2026-02-21T09:31:16.3706610Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T09:31:16.3706976Z // end inline asm 2026-02-21T09:31:16.3707152Z // begin inline asm 2026-02-21T09:31:16.3707428Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T09:31:16.3707751Z // end inline asm 2026-02-21T09:31:16.3707911Z mov.b32 %r43, 32; 2026-02-21T09:31:16.3708083Z // begin inline asm 2026-02-21T09:31:16.3708373Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r43; 2026-02-21T09:31:16.3708711Z // end inline asm 2026-02-21T09:31:16.3708877Z mov.b32 %r44, 256; 2026-02-21T09:31:16.3709041Z // begin inline asm 2026-02-21T09:31:16.3709331Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r44; 2026-02-21T09:31:16.3709674Z // end inline asm 2026-02-21T09:31:16.3709856Z mov.b32 %r45, 1024; 2026-02-21T09:31:16.3710037Z // begin inline asm 2026-02-21T09:31:16.3710385Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r45; 2026-02-21T09:31:16.3710799Z // end inline asm 2026-02-21T09:31:16.3710976Z mov.b32 %r46, 12288; 2026-02-21T09:31:16.3711156Z // begin inline asm 2026-02-21T09:31:16.3711451Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r46; 2026-02-21T09:31:16.3711799Z // end inline asm 2026-02-21T09:31:16.3711960Z mov.b64 %rd18, 2048; 2026-02-21T09:31:16.3712134Z // begin inline asm 2026-02-21T09:31:16.3712448Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 
2026-02-21T09:31:16.3712815Z // end inline asm 2026-02-21T09:31:16.3712975Z mov.b32 %r47, 1; 2026-02-21T09:31:16.3713142Z // begin inline asm 2026-02-21T09:31:16.3713467Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r47; 2026-02-21T09:31:16.3713825Z // end inline asm 2026-02-21T09:31:16.3713994Z // begin inline asm 2026-02-21T09:31:16.3714306Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r47; 2026-02-21T09:31:16.3714713Z // end inline asm 2026-02-21T09:31:16.3714886Z // begin inline asm 2026-02-21T09:31:16.3715191Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T09:31:16.3715533Z // end inline asm 2026-02-21T09:31:16.3715695Z // begin inline asm 2026-02-21T09:31:16.3716019Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:16.3716381Z // end inline asm 2026-02-21T09:31:16.3716546Z // begin inline asm 2026-02-21T09:31:16.3716837Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x2; 2026-02-21T09:31:16.3717183Z // end inline asm 2026-02-21T09:31:16.3717359Z // begin inline asm 2026-02-21T09:31:16.3717631Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:16.3717963Z // end inline asm 2026-02-21T09:31:16.3718127Z // begin inline asm 2026-02-21T09:31:16.3718630Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T09:31:16.3719281Z // end inline asm 2026-02-21T09:31:16.3719502Z // begin inline asm 2026-02-21T09:31:16.3719755Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T09:31:16.3720058Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:16.3720294Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:16.3720502Z // end inline asm 2026-02-21T09:31:16.3720664Z bar.sync 0; 
2026-02-21T09:31:16.3720827Z cvta.global.u64 %rd75, %rd25; 2026-02-21T09:31:16.3721181Z .loc 1 22 67 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:22:67 2026-02-21T09:31:16.3721544Z add.s64 %rd43, %rd25, 128; 2026-02-21T09:31:16.3721732Z bar.sync 0; 2026-02-21T09:31:16.3721887Z // begin inline asm 2026-02-21T09:31:16.3722073Z @%p1 st.shared.b32 [ %r41 + 0 ], %r50; 2026-02-21T09:31:16.3722283Z // end inline asm 2026-02-21T09:31:16.3722506Z bar.warp.sync -1; 2026-02-21T09:31:16.3722729Z // begin inline asm 2026-02-21T09:31:16.3723042Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd29; 2026-02-21T09:31:16.3723376Z // end inline asm 2026-02-21T09:31:16.3723533Z // begin inline asm 2026-02-21T09:31:16.3723810Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T09:31:16.3724124Z // end inline asm 2026-02-21T09:31:16.3724277Z // begin inline asm 2026-02-21T09:31:16.3724563Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r43; 2026-02-21T09:31:16.3724928Z // end inline asm 2026-02-21T09:31:16.3725091Z mov.b32 %r52, 16; 2026-02-21T09:31:16.3725250Z // begin inline asm 2026-02-21T09:31:16.3725535Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r52; 2026-02-21T09:31:16.3725860Z // end inline asm 2026-02-21T09:31:16.3726034Z // begin inline asm 2026-02-21T09:31:16.3726370Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r45; 2026-02-21T09:31:16.3726765Z // end inline asm 2026-02-21T09:31:16.3726959Z // begin inline asm 2026-02-21T09:31:16.3727320Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r45; 2026-02-21T09:31:16.3727667Z // end inline asm 2026-02-21T09:31:16.3727831Z // begin inline asm 2026-02-21T09:31:16.3728153Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T09:31:16.3728522Z // end inline asm 
2026-02-21T09:31:16.3728685Z // begin inline asm 2026-02-21T09:31:16.3729009Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r47; 2026-02-21T09:31:16.3729368Z // end inline asm 2026-02-21T09:31:16.3729540Z // begin inline asm 2026-02-21T09:31:16.3729852Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r47; 2026-02-21T09:31:16.3730217Z // end inline asm 2026-02-21T09:31:16.3730379Z // begin inline asm 2026-02-21T09:31:16.3730677Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T09:31:16.3731015Z // end inline asm 2026-02-21T09:31:16.3731179Z // begin inline asm 2026-02-21T09:31:16.3731497Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:16.3731849Z // end inline asm 2026-02-21T09:31:16.3732014Z // begin inline asm 2026-02-21T09:31:16.3732304Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x2; 2026-02-21T09:31:16.3732644Z // end inline asm 2026-02-21T09:31:16.3732812Z // begin inline asm 2026-02-21T09:31:16.3733091Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:16.3733421Z // end inline asm 2026-02-21T09:31:16.3733580Z // begin inline asm 2026-02-21T09:31:16.3734029Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T09:31:16.3734571Z // end inline asm 2026-02-21T09:31:16.3734818Z // begin inline asm 2026-02-21T09:31:16.3735234Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:31:16.3735604Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:16.3735854Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:16.3736064Z // end inline asm 2026-02-21T09:31:16.3736227Z bar.sync 0; 2026-02-21T09:31:16.3736389Z cvta.global.u64 %rd76, %rd43; 2026-02-21T09:31:16.3736744Z .loc 1 28 97 // 
cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:28:97 2026-02-21T09:31:16.3737112Z setp.gt.u32 %p39, %r408, 3071; 2026-02-21T09:31:16.3737318Z @%p39 bra $L__BB0_9; 2026-02-21T09:31:16.3737516Z // %bb.1: // %.lr.ph 2026-02-21T09:31:16.3737886Z .loc 1 0 97 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:0:97 2026-02-21T09:31:16.3738279Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T09:31:16.3738596Z shr.u32 %r4, %r1, 5; 2026-02-21T09:31:16.3738808Z and.b32 %r5, %r1, 1; 2026-02-21T09:31:16.3738968Z shl.b32 %r6, %r5, 3; 2026-02-21T09:31:16.3739139Z bfe.u32 %r7, %r1, 1, 6; 2026-02-21T09:31:16.3739304Z or.b32 %r8, %r7, 64; 2026-02-21T09:31:16.3739466Z or.b32 %r9, %r7, 128; 2026-02-21T09:31:16.3739638Z or.b32 %r10, %r7, 192; 2026-02-21T09:31:16.3739811Z bfe.u32 %r66, %r40, 4, 14; 2026-02-21T09:31:16.3739995Z cvt.u64.u32 %rd48, %r66; 2026-02-21T09:31:16.3740187Z or.b64 %rd66, %rd48, -9223371899348713472; 2026-02-21T09:31:16.3740407Z add.s32 %r67, %r40, 114688; 2026-02-21T09:31:16.3740591Z bfe.u32 %r68, %r67, 4, 14; 2026-02-21T09:31:16.3740772Z cvt.u64.u32 %rd49, %r68; 2026-02-21T09:31:16.3740956Z or.b64 %rd67, %rd49, -9223371899411628032; 2026-02-21T09:31:16.3741168Z add.s32 %r69, %r40, 32; 2026-02-21T09:31:16.3741340Z bfe.u32 %r70, %r69, 4, 14; 2026-02-21T09:31:16.3741525Z cvt.u64.u32 %rd50, %r70; 2026-02-21T09:31:16.3741710Z or.b64 %rd68, %rd50, -9223371899348713472; 2026-02-21T09:31:16.3741911Z add.s32 %r71, %r40, 114720; 2026-02-21T09:31:16.3742097Z bfe.u32 %r72, %r71, 4, 14; 2026-02-21T09:31:16.3742276Z cvt.u64.u32 %rd51, %r72; 2026-02-21T09:31:16.3742479Z or.b64 %rd69, %rd51, -9223371899411628032; 2026-02-21T09:31:16.3742698Z add.s32 %r73, %r40, 8192; 2026-02-21T09:31:16.3742898Z bfe.u32 %r74, %r73, 4, 14; 2026-02-21T09:31:16.3743092Z cvt.u64.u32 %rd52, %r74; 2026-02-21T09:31:16.3743308Z or.b64 %rd70, %rd52, -9223371899348713472; 2026-02-21T09:31:16.3743553Z add.s32 %r75, %r40, 8224; 2026-02-21T09:31:16.3743726Z bfe.u32 %r76, %r75, 4, 14; 
2026-02-21T09:31:16.3743914Z cvt.u64.u32 %rd53, %r76; 2026-02-21T09:31:16.3744097Z or.b64 %rd72, %rd53, -9223371899348713472; 2026-02-21T09:31:16.3744303Z shl.b32 %r77, %r1, 4; 2026-02-21T09:31:16.3744469Z and.b32 %r78, %r77, 1968; 2026-02-21T09:31:16.3744728Z bfe.s32 %r79, %r1, 2, 1; 2026-02-21T09:31:16.3744934Z and.b32 %r80, %r79, 2112; 2026-02-21T09:31:16.3745123Z or.b32 %r81, %r80, %r78; 2026-02-21T09:31:16.3745318Z add.s32 %r11, %r40, %r81; 2026-02-21T09:31:16.3745497Z xor.b32 %r82, %r81, 64; 2026-02-21T09:31:16.3745667Z add.s32 %r12, %r40, %r82; 2026-02-21T09:31:16.3745844Z shl.b32 %r83, %r1, 3; 2026-02-21T09:31:16.3746009Z and.b32 %r84, %r83, 944; 2026-02-21T09:31:16.3746182Z shl.b32 %r85, %r5, 6; 2026-02-21T09:31:16.3746354Z bfe.s32 %r86, %r1, 3, 1; 2026-02-21T09:31:16.3746559Z and.b32 %r87, %r86, 2112; 2026-02-21T09:31:16.3746756Z or.b32 %r88, %r84, %r85; 2026-02-21T09:31:16.3746959Z xor.b32 %r89, %r88, %r87; 2026-02-21T09:31:16.3747155Z add.s32 %r13, %r40, %r89; 2026-02-21T09:31:16.3747363Z bra.uni $L__BB0_2; 2026-02-21T09:31:16.3747620Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:16.3747914Z mov.b32 %r290, 1; 2026-02-21T09:31:16.3748274Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3748645Z // begin inline asm 2026-02-21T09:31:16.3748816Z 2026-02-21T09:31:16.3748958Z { 2026-02-21T09:31:16.3749119Z .reg .pred complete; 2026-02-21T09:31:16.3749395Z waitLoop: 2026-02-21T09:31:16.3749646Z mbarrier.try_wait.parity.shared.b64 complete, [%r289], %r290; 2026-02-21T09:31:16.3750039Z @!complete bra.uni waitLoop; 2026-02-21T09:31:16.3750249Z } 2026-02-21T09:31:16.3750334Z 2026-02-21T09:31:16.3750416Z // end inline asm 2026-02-21T09:31:16.3750790Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3751251Z bar.sync 0; 2026-02-21T09:31:16.3751428Z // begin inline asm 2026-02-21T09:31:16.3751679Z @%p4 mbarrier.inval.shared::cta.b64 [%r126]; 
2026-02-21T09:31:16.3751946Z // end inline asm 2026-02-21T09:31:16.3752118Z bar.sync 0; 2026-02-21T09:31:16.3752291Z // begin inline asm 2026-02-21T09:31:16.3752511Z @%p4 mbarrier.inval.shared::cta.b64 [%r127]; 2026-02-21T09:31:16.3752775Z // end inline asm 2026-02-21T09:31:16.3752957Z bar.sync 0; 2026-02-21T09:31:16.3753122Z // begin inline asm 2026-02-21T09:31:16.3753425Z @%p4 mbarrier.inval.shared::cta.b64 [%r128]; 2026-02-21T09:31:16.3753675Z // end inline asm 2026-02-21T09:31:16.3753836Z bar.sync 0; 2026-02-21T09:31:16.3754017Z // begin inline asm 2026-02-21T09:31:16.3754217Z @%p4 mbarrier.inval.shared::cta.b64 [%r129]; 2026-02-21T09:31:16.3754447Z // end inline asm 2026-02-21T09:31:16.3754621Z bar.sync 0; 2026-02-21T09:31:16.3754834Z // begin inline asm 2026-02-21T09:31:16.3755045Z @%p4 mbarrier.inval.shared::cta.b64 [%r130]; 2026-02-21T09:31:16.3755277Z // end inline asm 2026-02-21T09:31:16.3755447Z bar.sync 0; 2026-02-21T09:31:16.3755600Z // begin inline asm 2026-02-21T09:31:16.3755812Z @%p4 mbarrier.inval.shared::cta.b64 [%r131]; 2026-02-21T09:31:16.3756031Z // end inline asm 2026-02-21T09:31:16.3756178Z bar.sync 0; 2026-02-21T09:31:16.3756328Z // begin inline asm 2026-02-21T09:31:16.3756518Z @%p4 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:31:16.3756755Z // end inline asm 2026-02-21T09:31:16.3756925Z add.s32 %r298, %r40, 121920; 2026-02-21T09:31:16.3757127Z // begin inline asm 2026-02-21T09:31:16.3757323Z @%p4 mbarrier.inval.shared::cta.b64 [%r298]; 2026-02-21T09:31:16.3757564Z // end inline asm 2026-02-21T09:31:16.3757724Z bar.sync 0; 2026-02-21T09:31:16.3757883Z // begin inline asm 2026-02-21T09:31:16.3758082Z @%p4 mbarrier.inval.shared::cta.b64 [%r125]; 2026-02-21T09:31:16.3758311Z // end inline asm 2026-02-21T09:31:16.3758649Z .loc 1 56 45 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:56:45 2026-02-21T09:31:16.3759057Z shl.b32 %r351, %r18, 10; 2026-02-21T09:31:16.3759267Z shl.b32 %r352, %r19, 10; 2026-02-21T09:31:16.3759470Z 
shl.b32 %r353, %r20, 10; 2026-02-21T09:31:16.3759692Z shl.b32 %r354, %r21, 10; 2026-02-21T09:31:16.3760052Z .loc 1 56 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:56:52 2026-02-21T09:31:16.3760433Z add.s32 %r355, %r351, %r16; 2026-02-21T09:31:16.3760635Z add.s32 %r356, %r352, %r16; 2026-02-21T09:31:16.3760827Z add.s32 %r357, %r353, %r16; 2026-02-21T09:31:16.3761030Z add.s32 %r358, %r354, %r16; 2026-02-21T09:31:16.3761371Z .loc 1 56 24 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:56:24 2026-02-21T09:31:16.3761756Z mad.wide.u32 %rd94, %r355, 2, %rd9; 2026-02-21T09:31:16.3761979Z mad.wide.u32 %rd95, %r356, 2, %rd9; 2026-02-21T09:31:16.3762205Z mad.wide.u32 %rd96, %r357, 2, %rd9; 2026-02-21T09:31:16.3762424Z mad.wide.u32 %rd97, %r358, 2, %rd9; 2026-02-21T09:31:16.3762794Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3763178Z // begin inline asm 2026-02-21T09:31:16.3763654Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r300, %r301, %r302, %r303, %r304, %r305, %r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315}, [%r333 + 0]; 2026-02-21T09:31:16.3764173Z // end inline asm 2026-02-21T09:31:16.3764339Z // begin inline asm 2026-02-21T09:31:16.3764868Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332}, [%r333 + 16]; 2026-02-21T09:31:16.3765495Z // end inline asm 2026-02-21T09:31:16.3765658Z // begin inline asm 2026-02-21T09:31:16.3765850Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:16.3766047Z // end inline asm 2026-02-21T09:31:16.3766219Z cvt.u64.u32 %rd98, %r300; 2026-02-21T09:31:16.3766406Z cvt.u64.u32 %rd99, %r301; 2026-02-21T09:31:16.3766604Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:31:16.3766802Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:31:16.3767127Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3767513Z mov.b64 
{%r359, %r360}, %rd101; 2026-02-21T09:31:16.3767734Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T09:31:16.3768125Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3768600Z cvt.u64.u32 %rd102, %r302; 2026-02-21T09:31:16.3768841Z cvt.u64.u32 %rd103, %r303; 2026-02-21T09:31:16.3769024Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:31:16.3769216Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:31:16.3769534Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3769878Z mov.b64 {%r362, %r363}, %rd105; 2026-02-21T09:31:16.3770080Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T09:31:16.3770405Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3770746Z cvt.u64.u32 %rd106, %r304; 2026-02-21T09:31:16.3770921Z cvt.u64.u32 %rd107, %r305; 2026-02-21T09:31:16.3771101Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:31:16.3771280Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:31:16.3771595Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3771938Z mov.b64 {%r365, %r366}, %rd109; 2026-02-21T09:31:16.3772131Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T09:31:16.3772464Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3772802Z cvt.u64.u32 %rd110, %r306; 2026-02-21T09:31:16.3772982Z cvt.u64.u32 %rd111, %r307; 2026-02-21T09:31:16.3773155Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:31:16.3773342Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:31:16.3773648Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3773987Z mov.b64 {%r368, %r369}, %rd113; 2026-02-21T09:31:16.3774184Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T09:31:16.3774508Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3774918Z cvt.u64.u32 %rd114, %r308; 2026-02-21T09:31:16.3775107Z 
cvt.u64.u32 %rd115, %r309; 2026-02-21T09:31:16.3775303Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:31:16.3775500Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:31:16.3775875Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3776286Z mov.b64 {%r371, %r372}, %rd117; 2026-02-21T09:31:16.3776501Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T09:31:16.3776917Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3777294Z cvt.u64.u32 %rd118, %r310; 2026-02-21T09:31:16.3777484Z cvt.u64.u32 %rd119, %r311; 2026-02-21T09:31:16.3777656Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:31:16.3777841Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:31:16.3778157Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3778507Z mov.b64 {%r374, %r375}, %rd121; 2026-02-21T09:31:16.3778704Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T09:31:16.3779045Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3779481Z cvt.u64.u32 %rd122, %r312; 2026-02-21T09:31:16.3779703Z cvt.u64.u32 %rd123, %r313; 2026-02-21T09:31:16.3779882Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:31:16.3780057Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:31:16.3780381Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3780721Z mov.b64 {%r377, %r378}, %rd125; 2026-02-21T09:31:16.3780909Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T09:31:16.3781243Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3781580Z cvt.u64.u32 %rd126, %r314; 2026-02-21T09:31:16.3781761Z cvt.u64.u32 %rd127, %r315; 2026-02-21T09:31:16.3781931Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:31:16.3782116Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:31:16.3782469Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3782859Z 
mov.b64 {%r380, %r381}, %rd129; 2026-02-21T09:31:16.3783061Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T09:31:16.3783389Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3783728Z cvt.u64.u32 %rd130, %r317; 2026-02-21T09:31:16.3783901Z cvt.u64.u32 %rd131, %r318; 2026-02-21T09:31:16.3784080Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:31:16.3784256Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:31:16.3784597Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3785022Z mov.b64 {%r383, %r384}, %rd133; 2026-02-21T09:31:16.3785236Z cvt.rn.f16x2.f32 %r385, %r384, %r383; 2026-02-21T09:31:16.3785661Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3786052Z cvt.u64.u32 %rd134, %r319; 2026-02-21T09:31:16.3786251Z cvt.u64.u32 %rd135, %r320; 2026-02-21T09:31:16.3786438Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:31:16.3786638Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:31:16.3786977Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3787345Z mov.b64 {%r386, %r387}, %rd137; 2026-02-21T09:31:16.3787554Z cvt.rn.f16x2.f32 %r388, %r387, %r386; 2026-02-21T09:31:16.3787910Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3788281Z cvt.u64.u32 %rd138, %r321; 2026-02-21T09:31:16.3788468Z cvt.u64.u32 %rd139, %r322; 2026-02-21T09:31:16.3788665Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:31:16.3788854Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:31:16.3789199Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3789572Z mov.b64 {%r389, %r390}, %rd141; 2026-02-21T09:31:16.3789777Z cvt.rn.f16x2.f32 %r391, %r390, %r389; 2026-02-21T09:31:16.3790143Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3790509Z cvt.u64.u32 %rd142, %r323; 
2026-02-21T09:31:16.3790703Z cvt.u64.u32 %rd143, %r324; 2026-02-21T09:31:16.3790886Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:31:16.3791085Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:31:16.3791423Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3791796Z mov.b64 {%r392, %r393}, %rd145; 2026-02-21T09:31:16.3792004Z cvt.rn.f16x2.f32 %r394, %r393, %r392; 2026-02-21T09:31:16.3792365Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3792737Z cvt.u64.u32 %rd146, %r325; 2026-02-21T09:31:16.3792921Z cvt.u64.u32 %rd147, %r326; 2026-02-21T09:31:16.3793114Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:31:16.3793314Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:31:16.3793705Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3794315Z mov.b64 {%r395, %r396}, %rd149; 2026-02-21T09:31:16.3794562Z cvt.rn.f16x2.f32 %r397, %r396, %r395; 2026-02-21T09:31:16.3795025Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3795406Z cvt.u64.u32 %rd150, %r327; 2026-02-21T09:31:16.3795604Z cvt.u64.u32 %rd151, %r328; 2026-02-21T09:31:16.3795794Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:31:16.3795996Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:31:16.3796346Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3796726Z mov.b64 {%r398, %r399}, %rd153; 2026-02-21T09:31:16.3796943Z cvt.rn.f16x2.f32 %r400, %r399, %r398; 2026-02-21T09:31:16.3797372Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3797788Z cvt.u64.u32 %rd154, %r329; 2026-02-21T09:31:16.3797981Z cvt.u64.u32 %rd155, %r330; 2026-02-21T09:31:16.3798180Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:31:16.3798369Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:31:16.3798716Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 
2026-02-21T09:31:16.3799087Z mov.b64 {%r401, %r402}, %rd157; 2026-02-21T09:31:16.3799289Z cvt.rn.f16x2.f32 %r403, %r402, %r401; 2026-02-21T09:31:16.3799650Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3800016Z cvt.u64.u32 %rd158, %r331; 2026-02-21T09:31:16.3800213Z cvt.u64.u32 %rd159, %r332; 2026-02-21T09:31:16.3800400Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:31:16.3800598Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:31:16.3800944Z .loc 1 55 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:55:27 2026-02-21T09:31:16.3801308Z mov.b64 {%r404, %r405}, %rd161; 2026-02-21T09:31:16.3801512Z cvt.rn.f16x2.f32 %r406, %r405, %r404; 2026-02-21T09:31:16.3801849Z .loc 1 56 82 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:56:82 2026-02-21T09:31:16.3802238Z st.shared.v4.b32 [%r11], {%r361, %r364, %r367, %r370}; 2026-02-21T09:31:16.3802525Z st.shared.v4.b32 [%r12], {%r373, %r376, %r379, %r382}; 2026-02-21T09:31:16.3802781Z bar.sync 0; 2026-02-21T09:31:16.3802986Z ld.shared.v4.b32 {%r334, %r335, %r336, %r337}, [%r13]; 2026-02-21T09:31:16.3803307Z ld.shared.v4.b32 {%r338, %r339, %r340, %r341}, [%r13+1024]; 2026-02-21T09:31:16.3803593Z bar.sync 0; 2026-02-21T09:31:16.3803804Z st.shared.v4.b32 [%r11], {%r385, %r388, %r391, %r394}; 2026-02-21T09:31:16.3804114Z st.shared.v4.b32 [%r12], {%r397, %r400, %r403, %r406}; 2026-02-21T09:31:16.3804354Z bar.sync 0; 2026-02-21T09:31:16.3804545Z ld.shared.v4.b32 {%r342, %r343, %r344, %r345}, [%r13]; 2026-02-21T09:31:16.3804866Z ld.shared.v4.b32 {%r346, %r347, %r348, %r349}, [%r13+1024]; 2026-02-21T09:31:16.3805123Z // begin inline asm 2026-02-21T09:31:16.3805357Z st.global.v4.b32 [ %rd94 + 0 ], { %r334, %r335, %r336, %r337 }; 2026-02-21T09:31:16.3805633Z // end inline asm 2026-02-21T09:31:16.3805805Z // begin inline asm 2026-02-21T09:31:16.3806031Z st.global.v4.b32 [ %rd95 + 0 ], { %r338, %r339, %r340, %r341 }; 2026-02-21T09:31:16.3806300Z // end inline asm 
2026-02-21T09:31:16.3806465Z // begin inline asm 2026-02-21T09:31:16.3806696Z st.global.v4.b32 [ %rd96 + 0 ], { %r342, %r343, %r344, %r345 }; 2026-02-21T09:31:16.3806955Z // end inline asm 2026-02-21T09:31:16.3807123Z // begin inline asm 2026-02-21T09:31:16.3807340Z st.global.v4.b32 [ %rd97 + 0 ], { %r346, %r347, %r348, %r349 }; 2026-02-21T09:31:16.3807603Z // end inline asm 2026-02-21T09:31:16.3807935Z .loc 1 28 97 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:28:97 2026-02-21T09:31:16.3808310Z add.s32 %r39, %r408, 1184; 2026-02-21T09:31:16.3808518Z setp.lt.u32 %p131, %r408, 1888; 2026-02-21T09:31:16.3808722Z mov.b32 %r408, %r39; 2026-02-21T09:31:16.3808997Z @%p131 bra $L__BB0_2; 2026-02-21T09:31:16.3809224Z bra.uni $L__BB0_9; 2026-02-21T09:31:16.3809456Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:31:16.3809755Z // Child Loop BB0_5 Depth 2 2026-02-21T09:31:16.3810153Z .loc 1 34 35 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:34:35 2026-02-21T09:31:16.3810495Z cvt.u16.u32 %rs1, %r408; 2026-02-21T09:31:16.3810676Z mul.hi.u16 %rs2, %rs1, -21845; 2026-02-21T09:31:16.3810868Z shr.u16 %rs3, %rs2, 9; 2026-02-21T09:31:16.3811163Z .loc 1 37 45 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:37:45 2026-02-21T09:31:16.3811502Z mul.lo.s16 %rs4, %rs3, 768; 2026-02-21T09:31:16.3811682Z sub.s16 %rs5, %rs1, %rs4; 2026-02-21T09:31:16.3812112Z .loc 1 39 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:39:27 2026-02-21T09:31:16.3812525Z mul.wide.u16 %r189, %rs3, 256; 2026-02-21T09:31:16.3812763Z mul.wide.u16 %r190, %rs5, 16; 2026-02-21T09:31:16.3813006Z and.b32 %r191, %r190, 240; 2026-02-21T09:31:16.3813216Z or.b32 %r224, %r191, %r189; 2026-02-21T09:31:16.3813558Z .loc 1 41 27 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:41:27 2026-02-21T09:31:16.3813928Z and.b32 %r220, %r190, 16128; 2026-02-21T09:31:16.3814265Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 
2026-02-21T09:31:16.3814625Z shfl.sync.idx.b32 %r22, %r4, 0, 31, -1; 2026-02-21T09:31:16.3814892Z shl.b32 %r192, %r22, 21; 2026-02-21T09:31:16.3815089Z and.b32 %r193, %r192, 6291456; 2026-02-21T09:31:16.3815286Z add.s32 %r333, %r193, %r407; 2026-02-21T09:31:16.3815489Z mov.pred %p85, -1; 2026-02-21T09:31:16.3815659Z mov.b32 %r409, 0; 2026-02-21T09:31:16.3815835Z // begin inline asm 2026-02-21T09:31:16.3816330Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r333 + 0], {%r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409}; 2026-02-21T09:31:16.3816851Z // end inline asm 2026-02-21T09:31:16.3817024Z // begin inline asm 2026-02-21T09:31:16.3817468Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r333 + 16], {%r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409, %r409}; 2026-02-21T09:31:16.3817946Z // end inline asm 2026-02-21T09:31:16.3818099Z // begin inline asm 2026-02-21T09:31:16.3818278Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:16.3818469Z // end inline asm 2026-02-21T09:31:16.3818627Z bar.sync 0; 2026-02-21T09:31:16.3818913Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3819263Z add.s32 %r410, %r40, 121920; 2026-02-21T09:31:16.3819446Z // begin inline asm 2026-02-21T09:31:16.3819639Z @%p4 mbarrier.init.shared::cta.b64 [%r410], 1; 2026-02-21T09:31:16.3819864Z // end inline asm 2026-02-21T09:31:16.3820013Z bar.sync 0; 2026-02-21T09:31:16.3820168Z add.s32 %r125, %r40, 121928; 2026-02-21T09:31:16.3820342Z // begin inline asm 2026-02-21T09:31:16.3820537Z @%p4 mbarrier.init.shared::cta.b64 [%r125], 1; 2026-02-21T09:31:16.3820771Z // end inline asm 2026-02-21T09:31:16.3820943Z add.s32 %r126, %r40, 121856; 2026-02-21T09:31:16.3821148Z // begin inline asm 2026-02-21T09:31:16.3821358Z @%p4 mbarrier.init.shared::cta.b64 [%r126], 1; 2026-02-21T09:31:16.3821613Z // end inline asm 2026-02-21T09:31:16.3821783Z 
bar.sync 0; 2026-02-21T09:31:16.3821964Z add.s32 %r127, %r40, 121864; 2026-02-21T09:31:16.3822157Z // begin inline asm 2026-02-21T09:31:16.3822350Z @%p4 mbarrier.init.shared::cta.b64 [%r127], 1; 2026-02-21T09:31:16.3822564Z // end inline asm 2026-02-21T09:31:16.3822720Z bar.sync 0; 2026-02-21T09:31:16.3822867Z add.s32 %r128, %r40, 121872; 2026-02-21T09:31:16.3823053Z // begin inline asm 2026-02-21T09:31:16.3823247Z @%p4 mbarrier.init.shared::cta.b64 [%r128], 1; 2026-02-21T09:31:16.3823548Z // end inline asm 2026-02-21T09:31:16.3823746Z bar.sync 0; 2026-02-21T09:31:16.3823891Z add.s32 %r129, %r40, 121880; 2026-02-21T09:31:16.3824069Z // begin inline asm 2026-02-21T09:31:16.3824254Z @%p4 mbarrier.init.shared::cta.b64 [%r129], 1; 2026-02-21T09:31:16.3824474Z // end inline asm 2026-02-21T09:31:16.3824632Z bar.sync 0; 2026-02-21T09:31:16.3824837Z add.s32 %r130, %r40, 121888; 2026-02-21T09:31:16.3825026Z // begin inline asm 2026-02-21T09:31:16.3825226Z @%p4 mbarrier.init.shared::cta.b64 [%r130], 1; 2026-02-21T09:31:16.3825462Z // end inline asm 2026-02-21T09:31:16.3825617Z bar.sync 0; 2026-02-21T09:31:16.3825779Z add.s32 %r131, %r40, 121896; 2026-02-21T09:31:16.3825966Z // begin inline asm 2026-02-21T09:31:16.3826166Z @%p4 mbarrier.init.shared::cta.b64 [%r131], 1; 2026-02-21T09:31:16.3826393Z // end inline asm 2026-02-21T09:31:16.3826559Z bar.sync 0; 2026-02-21T09:31:16.3826759Z add.s32 %r217, %r40, 121904; 2026-02-21T09:31:16.3826996Z // begin inline asm 2026-02-21T09:31:16.3827204Z @%p4 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:31:16.3827429Z // end inline asm 2026-02-21T09:31:16.3827595Z bar.sync 0; 2026-02-21T09:31:16.3827744Z // begin inline asm 2026-02-21T09:31:16.3827984Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r126], 17408; 2026-02-21T09:31:16.3828255Z // end inline asm 2026-02-21T09:31:16.3828571Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3828926Z // begin inline asm 
2026-02-21T09:31:16.3829118Z fence.proxy.async.shared::cta; 2026-02-21T09:31:16.3829327Z // end inline asm 2026-02-21T09:31:16.3829483Z bar.sync 0; 2026-02-21T09:31:16.3829664Z elect.sync %r194|%p70, -1; 2026-02-21T09:31:16.3829879Z and.pred %p52, %p1, %p70; 2026-02-21T09:31:16.3830091Z // begin inline asm 2026-02-21T09:31:16.3830561Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r40], [%rd75, {%r409, %r220}], [%r126]; 2026-02-21T09:31:16.3831114Z // end inline asm 2026-02-21T09:31:16.3831441Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3831812Z bar.sync 0; 2026-02-21T09:31:16.3831982Z elect.sync %r195|%p71, -1; 2026-02-21T09:31:16.3832184Z and.pred %p53, %p1, %p71; 2026-02-21T09:31:16.3832379Z // begin inline asm 2026-02-21T09:31:16.3832793Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r67], [%rd76, {%r409, %r224}], [%r126]; 2026-02-21T09:31:16.3833245Z // end inline asm 2026-02-21T09:31:16.3833560Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3833924Z bar.sync 0; 2026-02-21T09:31:16.3834084Z // begin inline asm 2026-02-21T09:31:16.3834317Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r127], 17408; 2026-02-21T09:31:16.3834603Z // end inline asm 2026-02-21T09:31:16.3834985Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3835346Z bar.sync 0; 2026-02-21T09:31:16.3835499Z elect.sync %r196|%p72, -1; 2026-02-21T09:31:16.3835690Z and.pred %p55, %p1, %p72; 2026-02-21T09:31:16.3835870Z add.s32 %r143, %r40, 16384; 2026-02-21T09:31:16.3836050Z mov.b32 %r144, 32; 2026-02-21T09:31:16.3836214Z // begin inline asm 2026-02-21T09:31:16.3836600Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r143], [%rd75, {%r144, %r220}], [%r127]; 2026-02-21T09:31:16.3837026Z // end inline asm 2026-02-21T09:31:16.3837316Z .loc 1 
52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3837652Z bar.sync 0; 2026-02-21T09:31:16.3837802Z elect.sync %r197|%p73, -1; 2026-02-21T09:31:16.3837996Z and.pred %p56, %p1, %p73; 2026-02-21T09:31:16.3838177Z add.s32 %r147, %r40, 115712; 2026-02-21T09:31:16.3838360Z // begin inline asm 2026-02-21T09:31:16.3838772Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd76, {%r144, %r224}], [%r127]; 2026-02-21T09:31:16.3839380Z // end inline asm 2026-02-21T09:31:16.3839736Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3840139Z bar.sync 0; 2026-02-21T09:31:16.3840294Z // begin inline asm 2026-02-21T09:31:16.3840510Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r128], 17408; 2026-02-21T09:31:16.3840770Z // end inline asm 2026-02-21T09:31:16.3841066Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3841397Z bar.sync 0; 2026-02-21T09:31:16.3841557Z elect.sync %r198|%p74, -1; 2026-02-21T09:31:16.3841740Z and.pred %p58, %p1, %p74; 2026-02-21T09:31:16.3841925Z add.s32 %r152, %r40, 32768; 2026-02-21T09:31:16.3842101Z mov.b32 %r153, 64; 2026-02-21T09:31:16.3842325Z // begin inline asm 2026-02-21T09:31:16.3842747Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r152], [%rd75, {%r153, %r220}], [%r128]; 2026-02-21T09:31:16.3843174Z // end inline asm 2026-02-21T09:31:16.3843470Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3843798Z bar.sync 0; 2026-02-21T09:31:16.3843955Z elect.sync %r199|%p75, -1; 2026-02-21T09:31:16.3844137Z and.pred %p59, %p1, %p75; 2026-02-21T09:31:16.3844321Z add.s32 %r156, %r40, 116736; 2026-02-21T09:31:16.3844492Z // begin inline asm 2026-02-21T09:31:16.3844926Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r156], [%rd76, {%r153, %r224}], [%r128]; 
2026-02-21T09:31:16.3845349Z // end inline asm 2026-02-21T09:31:16.3845647Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3846010Z bar.sync 0; 2026-02-21T09:31:16.3846166Z // begin inline asm 2026-02-21T09:31:16.3846407Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r129], 17408; 2026-02-21T09:31:16.3846677Z // end inline asm 2026-02-21T09:31:16.3846994Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3847351Z bar.sync 0; 2026-02-21T09:31:16.3847528Z elect.sync %r200|%p76, -1; 2026-02-21T09:31:16.3847749Z and.pred %p61, %p1, %p76; 2026-02-21T09:31:16.3847961Z add.s32 %r161, %r40, 49152; 2026-02-21T09:31:16.3848180Z mov.b32 %r162, 96; 2026-02-21T09:31:16.3848365Z // begin inline asm 2026-02-21T09:31:16.3848868Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r161], [%rd75, {%r162, %r220}], [%r129]; 2026-02-21T09:31:16.3849368Z // end inline asm 2026-02-21T09:31:16.3849689Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3850053Z bar.sync 0; 2026-02-21T09:31:16.3850216Z elect.sync %r201|%p77, -1; 2026-02-21T09:31:16.3850424Z and.pred %p62, %p1, %p77; 2026-02-21T09:31:16.3850622Z add.s32 %r165, %r40, 117760; 2026-02-21T09:31:16.3850820Z // begin inline asm 2026-02-21T09:31:16.3851230Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r165], [%rd76, {%r162, %r224}], [%r129]; 2026-02-21T09:31:16.3851686Z // end inline asm 2026-02-21T09:31:16.3851996Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3852360Z bar.sync 0; 2026-02-21T09:31:16.3852522Z // begin inline asm 2026-02-21T09:31:16.3852756Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r130], 17408; 2026-02-21T09:31:16.3853031Z // end inline asm 2026-02-21T09:31:16.3853339Z .loc 1 51 31 // 
cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3853706Z bar.sync 0; 2026-02-21T09:31:16.3853868Z elect.sync %r202|%p78, -1; 2026-02-21T09:31:16.3854073Z and.pred %p64, %p1, %p78; 2026-02-21T09:31:16.3854267Z add.s32 %r170, %r40, 65536; 2026-02-21T09:31:16.3854615Z mov.b32 %r171, 128; 2026-02-21T09:31:16.3854838Z // begin inline asm 2026-02-21T09:31:16.3855249Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r170], [%rd75, {%r171, %r220}], [%r130]; 2026-02-21T09:31:16.3855747Z // end inline asm 2026-02-21T09:31:16.3856059Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3856434Z bar.sync 0; 2026-02-21T09:31:16.3856603Z elect.sync %r203|%p79, -1; 2026-02-21T09:31:16.3856839Z and.pred %p65, %p1, %p79; 2026-02-21T09:31:16.3857063Z add.s32 %r174, %r40, 118784; 2026-02-21T09:31:16.3857278Z // begin inline asm 2026-02-21T09:31:16.3857777Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r174], [%rd76, {%r171, %r224}], [%r130]; 2026-02-21T09:31:16.3858327Z // end inline asm 2026-02-21T09:31:16.3858759Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3859130Z bar.sync 0; 2026-02-21T09:31:16.3859294Z // begin inline asm 2026-02-21T09:31:16.3859531Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r131], 17408; 2026-02-21T09:31:16.3859798Z // end inline asm 2026-02-21T09:31:16.3860115Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3860469Z bar.sync 0; 2026-02-21T09:31:16.3860650Z elect.sync %r204|%p80, -1; 2026-02-21T09:31:16.3860849Z and.pred %p67, %p1, %p80; 2026-02-21T09:31:16.3861045Z add.s32 %r179, %r40, 81920; 2026-02-21T09:31:16.3861229Z mov.b32 %r180, 160; 2026-02-21T09:31:16.3861409Z // begin inline asm 2026-02-21T09:31:16.3861826Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes 
[%r179], [%rd75, {%r180, %r220}], [%r131]; 2026-02-21T09:31:16.3862273Z // end inline asm 2026-02-21T09:31:16.3862609Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3862959Z bar.sync 0; 2026-02-21T09:31:16.3863117Z elect.sync %r205|%p81, -1; 2026-02-21T09:31:16.3863297Z and.pred %p68, %p1, %p81; 2026-02-21T09:31:16.3863481Z add.s32 %r183, %r40, 119808; 2026-02-21T09:31:16.3863654Z // begin inline asm 2026-02-21T09:31:16.3864035Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r183], [%rd76, {%r180, %r224}], [%r131]; 2026-02-21T09:31:16.3864451Z // end inline asm 2026-02-21T09:31:16.3864798Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3865175Z bar.sync 0; 2026-02-21T09:31:16.3865334Z // begin inline asm 2026-02-21T09:31:16.3865499Z 2026-02-21T09:31:16.3865634Z { 2026-02-21T09:31:16.3865799Z .reg .pred complete; 2026-02-21T09:31:16.3865994Z waitLoop: 2026-02-21T09:31:16.3866240Z mbarrier.try_wait.parity.shared.b64 complete, [%r126], %r409; 2026-02-21T09:31:16.3866557Z @!complete bra.uni waitLoop; 2026-02-21T09:31:16.3866757Z } 2026-02-21T09:31:16.3866840Z 2026-02-21T09:31:16.3866917Z // end inline asm 2026-02-21T09:31:16.3867267Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3867627Z setp.ne.b32 %p82, %r22, 0; 2026-02-21T09:31:16.3867805Z @%p82 bra $L__BB0_4; 2026-02-21T09:31:16.3868025Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:16.3868282Z elect.sync %r214|%p84, -1; 2026-02-21T09:31:16.3868460Z mov.b32 %r207, 134479888; 2026-02-21T09:31:16.3868638Z mov.pred %p83, 0; 2026-02-21T09:31:16.3868794Z // begin inline asm 2026-02-21T09:31:16.3869062Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 0 ], %rd66, %rd67, %r207, %p83; 2026-02-21T09:31:16.3869358Z // end inline asm 2026-02-21T09:31:16.3869515Z // begin inline asm 2026-02-21T09:31:16.3869772Z @%p84 
tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 0 ], %rd68, %rd69, %r207, %p85; 2026-02-21T09:31:16.3870161Z // end inline asm 2026-02-21T09:31:16.3870365Z // begin inline asm 2026-02-21T09:31:16.3870615Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 16 ], %rd70, %rd67, %r207, %p83; 2026-02-21T09:31:16.3870911Z // end inline asm 2026-02-21T09:31:16.3871061Z // begin inline asm 2026-02-21T09:31:16.3871316Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 16 ], %rd72, %rd69, %r207, %p85; 2026-02-21T09:31:16.3871600Z // end inline asm 2026-02-21T09:31:16.3871762Z add.s32 %r216, %r40, 121920; 2026-02-21T09:31:16.3871939Z cvt.u64.u32 %rd74, %r216; 2026-02-21T09:31:16.3872118Z // begin inline asm 2026-02-21T09:31:16.3872361Z @%p84 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd74]; 2026-02-21T09:31:16.3872624Z // end inline asm 2026-02-21T09:31:16.3872831Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:16.3873250Z .loc 1 0 0 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:0 2026-02-21T09:31:16.3873632Z or.b32 %r16, %r224, %r6; 2026-02-21T09:31:16.3873810Z or.b32 %r18, %r220, %r7; 2026-02-21T09:31:16.3873986Z or.b32 %r19, %r220, %r8; 2026-02-21T09:31:16.3874159Z or.b32 %r20, %r220, %r9; 2026-02-21T09:31:16.3874327Z or.b32 %r21, %r220, %r10; 2026-02-21T09:31:16.3874654Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3875098Z bar.sync 0; 2026-02-21T09:31:16.3875276Z // begin inline asm 2026-02-21T09:31:16.3875541Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r217], 17408; 2026-02-21T09:31:16.3875868Z // end inline asm 2026-02-21T09:31:16.3876231Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3876598Z bar.sync 0; 2026-02-21T09:31:16.3876771Z elect.sync %r231|%p96, -1; 2026-02-21T09:31:16.3876969Z and.pred %p93, %p1, %p96; 2026-02-21T09:31:16.3877169Z add.s32 %r218, %r40, 98304; 2026-02-21T09:31:16.3877359Z mov.b32 %r219, 192; 
2026-02-21T09:31:16.3877539Z // begin inline asm 2026-02-21T09:31:16.3877954Z @%p93 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r218], [%rd75, {%r219, %r220}], [%r217]; 2026-02-21T09:31:16.3878417Z // end inline asm 2026-02-21T09:31:16.3878729Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3879088Z bar.sync 0; 2026-02-21T09:31:16.3879257Z elect.sync %r232|%p97, -1; 2026-02-21T09:31:16.3879452Z and.pred %p94, %p1, %p97; 2026-02-21T09:31:16.3879650Z add.s32 %r222, %r40, 120832; 2026-02-21T09:31:16.3879836Z // begin inline asm 2026-02-21T09:31:16.3880252Z @%p94 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r222], [%rd76, {%r219, %r224}], [%r217]; 2026-02-21T09:31:16.3880698Z // end inline asm 2026-02-21T09:31:16.3880865Z mov.b32 %r414, 1; 2026-02-21T09:31:16.3881028Z mov.b32 %r413, 6; 2026-02-21T09:31:16.3881203Z mov.b32 %r411, %r409; 2026-02-21T09:31:16.3881391Z mov.b32 %r412, %r409; 2026-02-21T09:31:16.3881572Z mov.b32 %r415, %r409; 2026-02-21T09:31:16.3881753Z mov.b32 %r416, %r409; 2026-02-21T09:31:16.3881922Z bra.uni $L__BB0_5; 2026-02-21T09:31:16.3882151Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:16.3882567Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3882953Z setp.lt.u32 %p112, %r416, 800; 2026-02-21T09:31:16.3883309Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3883682Z // begin inline asm 2026-02-21T09:31:16.3883864Z 2026-02-21T09:31:16.3884010Z { 2026-02-21T09:31:16.3884175Z .reg .pred complete; 2026-02-21T09:31:16.3884359Z waitLoop: 2026-02-21T09:31:16.3884634Z mbarrier.try_wait.parity.shared.b64 complete, [%r410], %r409; 2026-02-21T09:31:16.3885045Z @!complete bra.uni waitLoop; 2026-02-21T09:31:16.3885278Z } 2026-02-21T09:31:16.3885442Z 2026-02-21T09:31:16.3885508Z // end inline asm 2026-02-21T09:31:16.3885729Z 
add.s32 %r278, %r414, 1; 2026-02-21T09:31:16.3885935Z setp.gt.s32 %p115, %r278, 1; 2026-02-21T09:31:16.3886135Z selp.b32 %r414, 0, %r278, %p115; 2026-02-21T09:31:16.3886352Z selp.b32 %r279, 1, 0, %p115; 2026-02-21T09:31:16.3886545Z xor.b32 %r36, %r415, %r279; 2026-02-21T09:31:16.3886889Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3887257Z add.s32 %r280, %r413, 1; 2026-02-21T09:31:16.3887453Z setp.gt.s32 %p116, %r280, 6; 2026-02-21T09:31:16.3887653Z selp.b32 %r413, 0, %r280, %p116; 2026-02-21T09:31:16.3887862Z shl.b32 %r281, %r413, 3; 2026-02-21T09:31:16.3888054Z add.s32 %r283, %r40, %r281; 2026-02-21T09:31:16.3888245Z add.s32 %r273, %r283, 121856; 2026-02-21T09:31:16.3888440Z bar.sync 0; 2026-02-21T09:31:16.3888653Z and.pred %p109, %p4, %p112; 2026-02-21T09:31:16.3888852Z // begin inline asm 2026-02-21T09:31:16.3889134Z @%p109 mbarrier.arrive.expect_tx.shared.b64 _, [%r273], 17408; 2026-02-21T09:31:16.3889439Z // end inline asm 2026-02-21T09:31:16.3889748Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3890123Z shl.b32 %r284, %r413, 14; 2026-02-21T09:31:16.3890317Z add.s32 %r270, %r40, %r284; 2026-02-21T09:31:16.3890502Z bar.sync 0; 2026-02-21T09:31:16.3890674Z elect.sync %r285|%p117, -1; 2026-02-21T09:31:16.3890875Z and.pred %p118, %p112, %p117; 2026-02-21T09:31:16.3891086Z and.pred %p110, %p1, %p118; 2026-02-21T09:31:16.3891282Z add.s32 %r271, %r416, 224; 2026-02-21T09:31:16.3891474Z // begin inline asm 2026-02-21T09:31:16.3891908Z @%p110 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r270], [%rd75, {%r271, %r220}], [%r273]; 2026-02-21T09:31:16.3892341Z // end inline asm 2026-02-21T09:31:16.3892641Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3892998Z shl.b32 %r286, %r413, 10; 2026-02-21T09:31:16.3893193Z add.s32 %r287, %r40, %r286; 2026-02-21T09:31:16.3893383Z add.s32 %r274, 
%r287, 114688; 2026-02-21T09:31:16.3893581Z bar.sync 0; 2026-02-21T09:31:16.3893749Z elect.sync %r288|%p119, -1; 2026-02-21T09:31:16.3893974Z and.pred %p120, %p112, %p119; 2026-02-21T09:31:16.3894196Z and.pred %p111, %p1, %p120; 2026-02-21T09:31:16.3894402Z // begin inline asm 2026-02-21T09:31:16.3894892Z @%p111 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r274], [%rd76, {%r271, %r224}], [%r273]; 2026-02-21T09:31:16.3895353Z // end inline asm 2026-02-21T09:31:16.3895671Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3896072Z setp.lt.u32 %p121, %r416, 960; 2026-02-21T09:31:16.3896266Z add.s32 %r416, %r416, 32; 2026-02-21T09:31:16.3896437Z mov.b32 %r409, %r415; 2026-02-21T09:31:16.3896613Z mov.b32 %r410, %r289; 2026-02-21T09:31:16.3896776Z mov.b32 %r415, %r36; 2026-02-21T09:31:16.3896949Z @%p121 bra $L__BB0_5; 2026-02-21T09:31:16.3897119Z bra.uni $L__BB0_8; 2026-02-21T09:31:16.3897326Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:31:16.3897617Z // => This Inner Loop Header: Depth=2 2026-02-21T09:31:16.3897991Z .loc 1 47 89 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:47:89 2026-02-21T09:31:16.3898335Z add.s32 %r235, %r412, 1; 2026-02-21T09:31:16.3898511Z setp.gt.s32 %p99, %r235, 6; 2026-02-21T09:31:16.3898703Z selp.b32 %r412, 0, %r235, %p99; 2026-02-21T09:31:16.3898900Z selp.b32 %r236, 1, 0, %p99; 2026-02-21T09:31:16.3899078Z xor.b32 %r411, %r411, %r236; 2026-02-21T09:31:16.3899259Z shl.b32 %r237, %r412, 3; 2026-02-21T09:31:16.3899434Z add.s32 %r239, %r40, %r237; 2026-02-21T09:31:16.3899615Z add.s32 %r233, %r239, 121856; 2026-02-21T09:31:16.3899786Z bar.sync 0; 2026-02-21T09:31:16.3899940Z // begin inline asm 2026-02-21T09:31:16.3900172Z 2026-02-21T09:31:16.3900340Z { 2026-02-21T09:31:16.3900476Z .reg .pred complete; 2026-02-21T09:31:16.3900642Z waitLoop: 2026-02-21T09:31:16.3900853Z mbarrier.try_wait.parity.shared.b64 complete, [%r233], %r411; 
2026-02-21T09:31:16.3901130Z @!complete bra.uni waitLoop; 2026-02-21T09:31:16.3901305Z } 2026-02-21T09:31:16.3901377Z 2026-02-21T09:31:16.3901437Z // end inline asm 2026-02-21T09:31:16.3901597Z shl.b32 %r240, %r414, 3; 2026-02-21T09:31:16.3901765Z add.s32 %r241, %r40, %r240; 2026-02-21T09:31:16.3901962Z add.s32 %r289, %r241, 121920; 2026-02-21T09:31:16.3902308Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3902691Z @%p82 bra $L__BB0_7; 2026-02-21T09:31:16.3902936Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:16.3903500Z .loc 1 51 31 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:51:31 2026-02-21T09:31:16.3903852Z shl.b32 %r250, %r412, 14; 2026-02-21T09:31:16.3904026Z add.s32 %r252, %r40, %r250; 2026-02-21T09:31:16.3904338Z .loc 1 52 44 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:52:44 2026-02-21T09:31:16.3904722Z shl.b32 %r253, %r412, 10; 2026-02-21T09:31:16.3904921Z add.s32 %r254, %r40, %r253; 2026-02-21T09:31:16.3905109Z add.s32 %r255, %r254, 114688; 2026-02-21T09:31:16.3905447Z .loc 1 53 52 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:53:52 2026-02-21T09:31:16.3905820Z elect.sync %r256|%p101, -1; 2026-02-21T09:31:16.3906017Z bfe.u32 %r257, %r252, 4, 14; 2026-02-21T09:31:16.3906202Z cvt.u64.u32 %rd86, %r257; 2026-02-21T09:31:16.3906392Z or.b64 %rd77, %rd86, -9223371899348713472; 2026-02-21T09:31:16.3906605Z bfe.u32 %r258, %r255, 4, 14; 2026-02-21T09:31:16.3906780Z cvt.u64.u32 %rd87, %r258; 2026-02-21T09:31:16.3906976Z or.b64 %rd78, %rd87, -9223371899411628032; 2026-02-21T09:31:16.3907178Z mov.b32 %r243, 134479888; 2026-02-21T09:31:16.3907359Z mov.pred %p100, -1; 2026-02-21T09:31:16.3907528Z // begin inline asm 2026-02-21T09:31:16.3907789Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 0 ], %rd77, %rd78, %r243, %p100; 2026-02-21T09:31:16.3908091Z // end inline asm 2026-02-21T09:31:16.3908242Z add.s32 %r259, %r252, 32; 2026-02-21T09:31:16.3908421Z 
bfe.u32 %r260, %r259, 4, 14; 2026-02-21T09:31:16.3908594Z cvt.u64.u32 %rd88, %r260; 2026-02-21T09:31:16.3908782Z or.b64 %rd79, %rd88, -9223371899348713472; 2026-02-21T09:31:16.3908983Z add.s32 %r261, %r254, 114720; 2026-02-21T09:31:16.3909168Z bfe.u32 %r262, %r261, 4, 14; 2026-02-21T09:31:16.3909344Z cvt.u64.u32 %rd89, %r262; 2026-02-21T09:31:16.3909533Z or.b64 %rd80, %rd89, -9223371899411628032; 2026-02-21T09:31:16.3909743Z // begin inline asm 2026-02-21T09:31:16.3910000Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 0 ], %rd79, %rd80, %r243, %p100; 2026-02-21T09:31:16.3910301Z // end inline asm 2026-02-21T09:31:16.3910458Z add.s32 %r263, %r252, 8192; 2026-02-21T09:31:16.3910646Z bfe.u32 %r264, %r263, 4, 14; 2026-02-21T09:31:16.3910836Z cvt.u64.u32 %rd90, %r264; 2026-02-21T09:31:16.3911044Z or.b64 %rd81, %rd90, -9223371899348713472; 2026-02-21T09:31:16.3911274Z // begin inline asm 2026-02-21T09:31:16.3911569Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 16 ], %rd81, %rd78, %r243, %p100; 2026-02-21T09:31:16.3911937Z // end inline asm 2026-02-21T09:31:16.3912102Z add.s32 %r265, %r252, 8224; 2026-02-21T09:31:16.3912282Z bfe.u32 %r266, %r265, 4, 14; 2026-02-21T09:31:16.3912456Z cvt.u64.u32 %rd91, %r266; 2026-02-21T09:31:16.3912648Z or.b64 %rd83, %rd91, -9223371899348713472; 2026-02-21T09:31:16.3912847Z // begin inline asm 2026-02-21T09:31:16.3913111Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r407 + 16 ], %rd83, %rd80, %r243, %p100; 2026-02-21T09:31:16.3913410Z // end inline asm 2026-02-21T09:31:16.3913565Z cvt.u64.u32 %rd85, %r289; 2026-02-21T09:31:16.3913741Z // begin inline asm 2026-02-21T09:31:16.3914082Z @%p101 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd85]; 2026-02-21T09:31:16.3914399Z // end inline asm 2026-02-21T09:31:16.3914546Z bra.uni $L__BB0_7; 2026-02-21T09:31:16.3914789Z $L__BB0_9: // %._crit_edge 2026-02-21T09:31:16.3915163Z .loc 1 28 4 // cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py:28:4 
2026-02-21T09:31:16.3915521Z bar.sync 0; 2026-02-21T09:31:16.3915683Z // begin inline asm 2026-02-21T09:31:16.3915921Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r407, 32; 2026-02-21T09:31:16.3916205Z // end inline asm 2026-02-21T09:31:16.3916359Z ret; 2026-02-21T09:31:16.3916510Z $L__tmp1: 2026-02-21T09:31:16.3916657Z $L__func_end0: 2026-02-21T09:31:16.3916849Z // -- End function 2026-02-21T09:31:16.3917068Z } 2026-02-21T09:31:16.3917508Z .file 1 "/tmp/torchinductor_root/n4/cn4dl3w6lptir73yvprkbfngfnsah2ppp2hxdqxvxvhllmpznmej.py" 2026-02-21T09:31:16.3917933Z .section .debug_abbrev 2026-02-21T09:31:16.3918105Z { 2026-02-21T09:31:16.3918292Z .b8 1 // Abbreviation Code 2026-02-21T09:31:16.3918569Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:16.3918848Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:16.3919112Z .b8 37 // DW_AT_producer 2026-02-21T09:31:16.3919374Z .b8 8 // DW_FORM_string 2026-02-21T09:31:16.3919638Z .b8 19 // DW_AT_language 2026-02-21T09:31:16.3919920Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:16.3920205Z .b8 3 // DW_AT_name 2026-02-21T09:31:16.3920479Z .b8 8 // DW_FORM_string 2026-02-21T09:31:16.3920795Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:16.3921097Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:16.3921356Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:16.3921606Z .b8 8 // DW_FORM_string 2026-02-21T09:31:16.3921858Z .b8 0 // EOM(1) 2026-02-21T09:31:16.3922102Z .b8 0 // EOM(2) 2026-02-21T09:31:16.3922332Z .b8 0 // EOM(3) 2026-02-21T09:31:16.3922548Z } 2026-02-21T09:31:16.3922695Z .section .debug_info 2026-02-21T09:31:16.3922868Z { 2026-02-21T09:31:16.3923041Z .b32 104 // Length of Unit 2026-02-21T09:31:16.3923321Z .b8 2 // DWARF version number 2026-02-21T09:31:16.3923557Z .b8 0 2026-02-21T09:31:16.3923784Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:31:16.3924111Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:16.3924409Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:16.3924764Z .b8 116 // DW_AT_producer 2026-02-21T09:31:16.3925006Z .b8 114 2026-02-21T09:31:16.3925156Z .b8 105 2026-02-21T09:31:16.3925293Z .b8 116 2026-02-21T09:31:16.3925437Z .b8 111 2026-02-21T09:31:16.3925572Z .b8 110 2026-02-21T09:31:16.3925713Z .b8 0 2026-02-21T09:31:16.3925890Z .b8 2 // DW_AT_language 2026-02-21T09:31:16.3926112Z .b8 0 2026-02-21T09:31:16.3926287Z .b8 99 // DW_AT_name 2026-02-21T09:31:16.3926509Z .b8 110 2026-02-21T09:31:16.3926654Z .b8 52 2026-02-21T09:31:16.3926788Z .b8 100 2026-02-21T09:31:16.3926928Z .b8 108 2026-02-21T09:31:16.3927060Z .b8 51 2026-02-21T09:31:16.3927201Z .b8 119 2026-02-21T09:31:16.3927333Z .b8 54 2026-02-21T09:31:16.3927476Z .b8 108 2026-02-21T09:31:16.3927608Z .b8 112 2026-02-21T09:31:16.3927749Z .b8 116 2026-02-21T09:31:16.3927883Z .b8 105 2026-02-21T09:31:16.3928024Z .b8 114 2026-02-21T09:31:16.3928162Z .b8 55 2026-02-21T09:31:16.3928385Z .b8 51 2026-02-21T09:31:16.3928569Z .b8 121 2026-02-21T09:31:16.3928707Z .b8 118 2026-02-21T09:31:16.3928865Z .b8 112 2026-02-21T09:31:16.3929007Z .b8 114 2026-02-21T09:31:16.3929162Z .b8 107 2026-02-21T09:31:16.3929303Z .b8 98 2026-02-21T09:31:16.3929455Z .b8 102 2026-02-21T09:31:16.3929595Z .b8 110 2026-02-21T09:31:16.3929748Z .b8 103 2026-02-21T09:31:16.3929897Z .b8 102 2026-02-21T09:31:16.3930057Z .b8 110 2026-02-21T09:31:16.3930207Z .b8 115 2026-02-21T09:31:16.3930353Z .b8 97 2026-02-21T09:31:16.3930491Z .b8 104 2026-02-21T09:31:16.3930622Z .b8 50 2026-02-21T09:31:16.3930761Z .b8 112 2026-02-21T09:31:16.3930891Z .b8 112 2026-02-21T09:31:16.3931030Z .b8 112 2026-02-21T09:31:16.3931161Z .b8 50 2026-02-21T09:31:16.3931303Z .b8 104 2026-02-21T09:31:16.3931437Z .b8 120 2026-02-21T09:31:16.3931576Z .b8 100 2026-02-21T09:31:16.3931710Z .b8 113 2026-02-21T09:31:16.3931851Z .b8 120 2026-02-21T09:31:16.3932054Z .b8 118 
2026-02-21T09:31:16.3932205Z .b8 120 2026-02-21T09:31:16.3932379Z .b8 118 2026-02-21T09:31:16.3932516Z .b8 104 2026-02-21T09:31:16.3932651Z .b8 108 2026-02-21T09:31:16.3932777Z .b8 108 2026-02-21T09:31:16.3932907Z .b8 109 2026-02-21T09:31:16.3933028Z .b8 112 2026-02-21T09:31:16.3933157Z .b8 122 2026-02-21T09:31:16.3933278Z .b8 110 2026-02-21T09:31:16.3933410Z .b8 109 2026-02-21T09:31:16.3933534Z .b8 101 2026-02-21T09:31:16.3933664Z .b8 106 2026-02-21T09:31:16.3933787Z .b8 46 2026-02-21T09:31:16.3933916Z .b8 112 2026-02-21T09:31:16.3934037Z .b8 121 2026-02-21T09:31:16.3934168Z .b8 0 2026-02-21T09:31:16.3934347Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:16.3934619Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:16.3934893Z .b8 116 2026-02-21T09:31:16.3935027Z .b8 109 2026-02-21T09:31:16.3935167Z .b8 112 2026-02-21T09:31:16.3935296Z .b8 47 2026-02-21T09:31:16.3935436Z .b8 116 2026-02-21T09:31:16.3935566Z .b8 111 2026-02-21T09:31:16.3935708Z .b8 114 2026-02-21T09:31:16.3935840Z .b8 99 2026-02-21T09:31:16.3935984Z .b8 104 2026-02-21T09:31:16.3936118Z .b8 105 2026-02-21T09:31:16.3936262Z .b8 110 2026-02-21T09:31:16.3936392Z .b8 100 2026-02-21T09:31:16.3936531Z .b8 117 2026-02-21T09:31:16.3936662Z .b8 99 2026-02-21T09:31:16.3936799Z .b8 116 2026-02-21T09:31:16.3936943Z .b8 111 2026-02-21T09:31:16.3937063Z .b8 114 2026-02-21T09:31:16.3937192Z .b8 95 2026-02-21T09:31:16.3937312Z .b8 114 2026-02-21T09:31:16.3937438Z .b8 111 2026-02-21T09:31:16.3937559Z .b8 111 2026-02-21T09:31:16.3937689Z .b8 116 2026-02-21T09:31:16.3937813Z .b8 47 2026-02-21T09:31:16.3937952Z .b8 110 2026-02-21T09:31:16.3938085Z .b8 52 2026-02-21T09:31:16.3938224Z .b8 0 2026-02-21T09:31:16.3938361Z } 2026-02-21T09:31:16.3938520Z .section .debug_macinfo { } 2026-02-21T09:31:16.3938658Z 2026-02-21T09:31:16.3938760Z ================================================================ 2026-02-21T09:31:16.3939073Z please share the reproducer above with Triton project. 
2026-02-21T09:31:17.8284083Z [29s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:31:17.8284443Z 2026-02-21T09:31:17.8284751Z 2026-02-21T09:31:17.8284786Z 2026-02-21T09:31:17.8285088Z ================================================================ 2026-02-21T09:31:17.8285384Z Internal Triton PTX codegen error 2026-02-21T09:31:17.8285594Z `ptxas` stderr: 2026-02-21T09:31:17.8286084Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:17.8286615Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:17.8286788Z 2026-02-21T09:31:17.8287221Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpffwgqgf3.ptx -o /tmp/tmpffwgqgf3.ptx.o 2026-02-21T09:31:17.8287702Z 2026-02-21T09:31:17.8287706Z 2026-02-21T09:31:17.8288873Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 32, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:31:17.8290446Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:17.8290694Z `ptxas` stderr: 2026-02-21T09:31:17.8291139Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 227 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:31:17.8291642Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:17.8291796Z 2026-02-21T09:31:17.8292335Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpffwgqgf3.ptx -o /tmp/tmpffwgqgf3.ptx.o 2026-02-21T09:31:17.8292866Z 2026-02-21T09:31:17.8293007Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:31:17.8293281Z // 2026-02-21T09:31:17.8293437Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:17.8293632Z // 2026-02-21T09:31:17.8293704Z 2026-02-21T09:31:17.8293767Z .version 8.7 2026-02-21T09:31:17.8293922Z .target sm_100a 2026-02-21T09:31:17.8294068Z .address_size 64 2026-02-21T09:31:17.8294164Z 2026-02-21T09:31:17.8294301Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:17.8294572Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:17.8294845Z // @_helion_matmul 2026-02-21T09:31:17.8295063Z .visible .entry _helion_matmul( 2026-02-21T09:31:17.8295288Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:17.8295563Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:17.8295821Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:17.8296085Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:17.8296355Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:17.8296564Z ) 2026-02-21T09:31:17.8296698Z .reqntid 256 2026-02-21T09:31:17.8296832Z .maxnreg 32 2026-02-21T09:31:17.8296970Z { 2026-02-21T09:31:17.8297100Z .reg .pred %p<70>; 2026-02-21T09:31:17.8297270Z .reg .b16 %rs<8>; 2026-02-21T09:31:17.8297408Z .reg .b32 %r<286>; 2026-02-21T09:31:17.8297554Z .reg .b64 %rd<87>; 2026-02-21T09:31:17.8297689Z $L__func_begin0: 2026-02-21T09:31:17.8297778Z 2026-02-21T09:31:17.8297831Z // %bb.0: 2026-02-21T09:31:17.8298091Z .loc 1 19 0 // 
cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:19 2026-02-21T09:31:17.8298392Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:17.8298573Z ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T09:31:17.8298776Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:17.8298950Z mov.b32 %r45, global_smem; 2026-02-21T09:31:17.8299113Z // begin inline asm 2026-02-21T09:31:17.8299366Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r45], 32; 2026-02-21T09:31:17.8299615Z // end inline asm 2026-02-21T09:31:17.8299788Z ld.param.b64 %rd28, [_helion_matmul_param_3]; 2026-02-21T09:31:17.8299986Z bar.sync 0; 2026-02-21T09:31:17.8300133Z ld.shared.b32 %r277, [global_smem]; 2026-02-21T09:31:17.8300319Z bar.sync 0; 2026-02-21T09:31:17.8300446Z // begin inline asm 2026-02-21T09:31:17.8300656Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:17.8300886Z // end inline asm 2026-02-21T09:31:17.8301158Z .loc 1 21 67 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:21:67 2026-02-21T09:31:17.8301465Z mov.u32 %r54, %ctaid.x; 2026-02-21T09:31:17.8301630Z mov.u32 %r55, %ctaid.y; 2026-02-21T09:31:17.8301786Z mov.u32 %r56, %ctaid.z; 2026-02-21T09:31:17.8301938Z mov.u32 %r57, %nctaid.x; 2026-02-21T09:31:17.8302104Z mov.u32 %r58, %nctaid.y; 2026-02-21T09:31:17.8302314Z mad.lo.s32 %r59, %r56, %r58, %r55; 2026-02-21T09:31:17.8302542Z mad.lo.s32 %r60, %r59, %r57, %r54; 2026-02-21T09:31:17.8302718Z shl.b32 %r61, %r60, 7; 2026-02-21T09:31:17.8302894Z cvt.s64.s32 %rd29, %r61; 2026-02-21T09:31:17.8303058Z add.s64 %rd25, %rd28, %rd29; 2026-02-21T09:31:17.8303238Z shl.b32 %r62, %r1, 2; 2026-02-21T09:31:17.8303393Z add.s32 %r46, %r45, %r62; 2026-02-21T09:31:17.8303562Z mov.b32 %r47, 0; 2026-02-21T09:31:17.8303713Z // begin inline asm 2026-02-21T09:31:17.8303870Z @%p1 st.shared.b32 [ %r46 + 0 ], %r47; 2026-02-21T09:31:17.8304051Z // end inline asm 2026-02-21T09:31:17.8304191Z bar.warp.sync -1; 2026-02-21T09:31:17.8304346Z setp.eq.b32 %p4, %r1, 0; 
2026-02-21T09:31:17.8304501Z cvt.u64.u32 %rd10, %r45; 2026-02-21T09:31:17.8304656Z // begin inline asm 2026-02-21T09:31:17.8304985Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T09:31:17.8305318Z // end inline asm 2026-02-21T09:31:17.8305466Z // begin inline asm 2026-02-21T09:31:17.8305696Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T09:31:17.8305957Z // end inline asm 2026-02-21T09:31:17.8306091Z mov.b32 %r48, 16; 2026-02-21T09:31:17.8306237Z // begin inline asm 2026-02-21T09:31:17.8306475Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r48; 2026-02-21T09:31:17.8306754Z // end inline asm 2026-02-21T09:31:17.8306888Z mov.b32 %r49, 32; 2026-02-21T09:31:17.8307033Z // begin inline asm 2026-02-21T09:31:17.8307272Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r49; 2026-02-21T09:31:17.8307537Z // end inline asm 2026-02-21T09:31:17.8307676Z mov.b32 %r50, 1024; 2026-02-21T09:31:17.8307817Z // begin inline asm 2026-02-21T09:31:17.8308062Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r50; 2026-02-21T09:31:17.8308335Z // end inline asm 2026-02-21T09:31:17.8308476Z // begin inline asm 2026-02-21T09:31:17.8308711Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r50; 2026-02-21T09:31:17.8308995Z // end inline asm 2026-02-21T09:31:17.8309129Z mov.b64 %rd18, 2048; 2026-02-21T09:31:17.8309265Z // begin inline asm 2026-02-21T09:31:17.8309513Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T09:31:17.8309785Z // end inline asm 2026-02-21T09:31:17.8309919Z mov.b32 %r52, 1; 2026-02-21T09:31:17.8310046Z // begin inline asm 2026-02-21T09:31:17.8310294Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r52; 2026-02-21T09:31:17.8310572Z // end inline asm 
2026-02-21T09:31:17.8310697Z // begin inline asm 2026-02-21T09:31:17.8310943Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r52; 2026-02-21T09:31:17.8311211Z // end inline asm 2026-02-21T09:31:17.8311346Z // begin inline asm 2026-02-21T09:31:17.8311567Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x6; 2026-02-21T09:31:17.8311829Z // end inline asm 2026-02-21T09:31:17.8311957Z // begin inline asm 2026-02-21T09:31:17.8312202Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:17.8312482Z // end inline asm 2026-02-21T09:31:17.8312611Z // begin inline asm 2026-02-21T09:31:17.8312848Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T09:31:17.8313102Z // end inline asm 2026-02-21T09:31:17.8313237Z // begin inline asm 2026-02-21T09:31:17.8313457Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:31:17.8313711Z // end inline asm 2026-02-21T09:31:17.8313844Z // begin inline asm 2026-02-21T09:31:17.8314181Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd25 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T09:31:17.8314555Z // end inline asm 2026-02-21T09:31:17.8314712Z // begin inline asm 2026-02-21T09:31:17.8314954Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd25 + 0 ], 0x80; 2026-02-21T09:31:17.8315230Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:17.8315422Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:17.8315601Z // end inline asm 2026-02-21T09:31:17.8315729Z bar.sync 0; 2026-02-21T09:31:17.8315871Z cvta.global.u64 %rd42, %rd25; 2026-02-21T09:31:17.8316147Z .loc 1 28 35 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:28:35 2026-02-21T09:31:17.8316441Z mul.lo.s32 %r278, %r54, 3; 2026-02-21T09:31:17.8316706Z .loc 1 29 37 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:29:37 
2026-02-21T09:31:17.8316997Z add.s32 %r63, %r278, 3; 2026-02-21T09:31:17.8317254Z .loc 1 29 49 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:29:49 2026-02-21T09:31:17.8317565Z min.s32 %r4, %r63, 3072; 2026-02-21T09:31:17.8317856Z .loc 1 30 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:30:52 2026-02-21T09:31:17.8318140Z setp.ge.s32 %p21, %r278, %r4; 2026-02-21T09:31:17.8318305Z @%p21 bra $L__BB0_9; 2026-02-21T09:31:17.8318463Z // %bb.1: // %.lr.ph 2026-02-21T09:31:17.8318759Z .loc 1 0 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:0:52 2026-02-21T09:31:17.8319054Z ld.param.b64 %rd9, [_helion_matmul_param_2]; 2026-02-21T09:31:17.8319270Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T09:31:17.8319458Z shr.u32 %r5, %r1, 5; 2026-02-21T09:31:17.8319596Z shr.u32 %r6, %r1, 1; 2026-02-21T09:31:17.8319741Z bfe.u32 %r7, %r1, 1, 7; 2026-02-21T09:31:17.8319884Z and.b32 %r8, %r1, 252; 2026-02-21T09:31:17.8320037Z bfe.u32 %r9, %r1, 2, 6; 2026-02-21T09:31:17.8320175Z or.b32 %r10, %r9, 64; 2026-02-21T09:31:17.8320319Z and.b32 %r11, %r1, 3; 2026-02-21T09:31:17.8320459Z shl.b32 %r12, %r11, 3; 2026-02-21T09:31:17.8320610Z shl.b32 %r64, %r1, 3; 2026-02-21T09:31:17.8320749Z and.b32 %r13, %r64, 8; 2026-02-21T09:31:17.8320896Z shl.b32 %r65, %r1, 4; 2026-02-21T09:31:17.8321042Z and.b32 %r66, %r65, 3952; 2026-02-21T09:31:17.8321195Z bfe.s32 %r67, %r1, 3, 1; 2026-02-21T09:31:17.8321349Z and.b32 %r68, %r67, 144; 2026-02-21T09:31:17.8321493Z xor.b32 %r69, %r68, %r66; 2026-02-21T09:31:17.8321649Z add.s32 %r110, %r45, %r69; 2026-02-21T09:31:17.8321804Z add.s32 %r117, %r110, 4096; 2026-02-21T09:31:17.8321961Z add.s32 %r124, %r110, 8192; 2026-02-21T09:31:17.8322111Z bfe.u32 %r71, %r45, 4, 14; 2026-02-21T09:31:17.8322272Z cvt.u64.u32 %rd30, %r71; 2026-02-21T09:31:17.8322434Z or.b64 %rd38, %rd30, -4611685949691133952; 2026-02-21T09:31:17.8322622Z add.s32 %r72, %r45, 16384; 2026-02-21T09:31:17.8322779Z bfe.u32 %r73, %r72, 4, 14; 
2026-02-21T09:31:17.8322925Z cvt.u64.u32 %rd31, %r73; 2026-02-21T09:31:17.8323088Z or.b64 %rd39, %rd31, -4611685949703716864; 2026-02-21T09:31:17.8323263Z add.s32 %r157, %r110, 12288; 2026-02-21T09:31:17.8323423Z shl.b32 %r74, %r1, 10; 2026-02-21T09:31:17.8323566Z and.b32 %r75, %r74, 6144; 2026-02-21T09:31:17.8323718Z and.b32 %r76, %r65, 2032; 2026-02-21T09:31:17.8323861Z and.b32 %r77, %r6, 64; 2026-02-21T09:31:17.8324008Z xor.b32 %r78, %r76, %r77; 2026-02-21T09:31:17.8324158Z or.b32 %r79, %r78, %r75; 2026-02-21T09:31:17.8324300Z add.s32 %r18, %r45, %r79; 2026-02-21T09:31:17.8324450Z xor.b32 %r80, %r79, 32; 2026-02-21T09:31:17.8324590Z add.s32 %r19, %r45, %r80; 2026-02-21T09:31:17.8324767Z shl.b32 %r81, %r1, 8; 2026-02-21T09:31:17.8324908Z and.b32 %r82, %r81, 6144; 2026-02-21T09:31:17.8325058Z shl.b32 %r83, %r11, 5; 2026-02-21T09:31:17.8325199Z shl.b32 %r84, %r8, 2; 2026-02-21T09:31:17.8325345Z or.b32 %r85, %r82, %r83; 2026-02-21T09:31:17.8325487Z xor.b32 %r86, %r85, %r84; 2026-02-21T09:31:17.8325639Z add.s32 %r20, %r45, %r86; 2026-02-21T09:31:17.8325915Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8326229Z or.b32 %r21, %r13, 64; 2026-02-21T09:31:17.8326406Z bra.uni $L__BB0_2; 2026-02-21T09:31:17.8326585Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:17.8326900Z .loc 1 0 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:0:57 2026-02-21T09:31:17.8327178Z mov.b32 %r216, 1; 2026-02-21T09:31:17.8327426Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8327709Z // begin inline asm 2026-02-21T09:31:17.8327839Z 2026-02-21T09:31:17.8327960Z { 2026-02-21T09:31:17.8328080Z .reg .pred complete; 2026-02-21T09:31:17.8328230Z waitLoop: 2026-02-21T09:31:17.8328414Z mbarrier.try_wait.parity.shared.b64 complete, [%r215], %r216; 2026-02-21T09:31:17.8328651Z @!complete bra.uni waitLoop; 2026-02-21T09:31:17.8328797Z } 2026-02-21T09:31:17.8328867Z 
2026-02-21T09:31:17.8328969Z // end inline asm 2026-02-21T09:31:17.8329240Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8329531Z cp.async.wait_group 0; 2026-02-21T09:31:17.8329682Z bar.sync 0; 2026-02-21T09:31:17.8329807Z // begin inline asm 2026-02-21T09:31:17.8329975Z @%p4 mbarrier.inval.shared::cta.b64 [%r106]; 2026-02-21T09:31:17.8330157Z // end inline asm 2026-02-21T09:31:17.8330294Z bar.sync 0; 2026-02-21T09:31:17.8330418Z // begin inline asm 2026-02-21T09:31:17.8330582Z @%p4 mbarrier.inval.shared::cta.b64 [%r107]; 2026-02-21T09:31:17.8330759Z // end inline asm 2026-02-21T09:31:17.8330894Z bar.sync 0; 2026-02-21T09:31:17.8331016Z // begin inline asm 2026-02-21T09:31:17.8331182Z @%p4 mbarrier.inval.shared::cta.b64 [%r108]; 2026-02-21T09:31:17.8331372Z // end inline asm 2026-02-21T09:31:17.8331502Z bar.sync 0; 2026-02-21T09:31:17.8331630Z // begin inline asm 2026-02-21T09:31:17.8331782Z @%p4 mbarrier.inval.shared::cta.b64 [%r159]; 2026-02-21T09:31:17.8331966Z // end inline asm 2026-02-21T09:31:17.8332099Z add.s32 %r221, %r45, 20512; 2026-02-21T09:31:17.8332257Z // begin inline asm 2026-02-21T09:31:17.8332407Z @%p4 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T09:31:17.8332588Z // end inline asm 2026-02-21T09:31:17.8332721Z bar.sync 0; 2026-02-21T09:31:17.8332842Z // begin inline asm 2026-02-21T09:31:17.8332999Z @%p4 mbarrier.inval.shared::cta.b64 [%r105]; 2026-02-21T09:31:17.8333169Z // end inline asm 2026-02-21T09:31:17.8333419Z .loc 1 59 45 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:59:45 2026-02-21T09:31:17.8333697Z shl.b32 %r249, %r25, 10; 2026-02-21T09:31:17.8333852Z shl.b32 %r250, %r26, 10; 2026-02-21T09:31:17.8334103Z .loc 1 59 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:59:52 2026-02-21T09:31:17.8334389Z add.s32 %r251, %r249, %r28; 2026-02-21T09:31:17.8334547Z add.s32 %r252, %r250, %r28; 2026-02-21T09:31:17.8334834Z .loc 1 59 24 // 
cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:59:24 2026-02-21T09:31:17.8335130Z mad.wide.s32 %rd52, %r251, 2, %rd9; 2026-02-21T09:31:17.8335310Z mad.wide.s32 %rd53, %r252, 2, %rd9; 2026-02-21T09:31:17.8335586Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8335858Z // begin inline asm 2026-02-21T09:31:17.8336213Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r223, %r224, %r225, %r226, %r227, %r228, %r229, %r230, %r231, %r232, %r233, %r234, %r235, %r236, %r237, %r238}, [%r239 + 0]; 2026-02-21T09:31:17.8336598Z // end inline asm 2026-02-21T09:31:17.8336728Z // begin inline asm 2026-02-21T09:31:17.8336879Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:17.8337036Z // end inline asm 2026-02-21T09:31:17.8337174Z cvt.u64.u32 %rd54, %r223; 2026-02-21T09:31:17.8337325Z cvt.u64.u32 %rd55, %r224; 2026-02-21T09:31:17.8337480Z shl.b64 %rd56, %rd55, 32; 2026-02-21T09:31:17.8337626Z or.b64 %rd57, %rd54, %rd56; 2026-02-21T09:31:17.8337891Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8338230Z mov.b64 {%r253, %r254}, %rd57; 2026-02-21T09:31:17.8338394Z cvt.rn.f16x2.f32 %r255, %r254, %r253; 2026-02-21T09:31:17.8338676Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8338950Z cvt.u64.u32 %rd58, %r225; 2026-02-21T09:31:17.8339102Z cvt.u64.u32 %rd59, %r226; 2026-02-21T09:31:17.8339244Z shl.b64 %rd60, %rd59, 32; 2026-02-21T09:31:17.8339395Z or.b64 %rd61, %rd58, %rd60; 2026-02-21T09:31:17.8339656Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8339934Z mov.b64 {%r256, %r257}, %rd61; 2026-02-21T09:31:17.8340106Z cvt.rn.f16x2.f32 %r258, %r257, %r256; 2026-02-21T09:31:17.8340407Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8340725Z cvt.u64.u32 %rd62, %r227; 2026-02-21T09:31:17.8340874Z cvt.u64.u32 %rd63, %r228; 
2026-02-21T09:31:17.8341028Z shl.b64 %rd64, %rd63, 32; 2026-02-21T09:31:17.8341174Z or.b64 %rd65, %rd62, %rd64; 2026-02-21T09:31:17.8341442Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8341732Z mov.b64 {%r259, %r260}, %rd65; 2026-02-21T09:31:17.8341893Z cvt.rn.f16x2.f32 %r261, %r260, %r259; 2026-02-21T09:31:17.8342173Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8342452Z cvt.u64.u32 %rd66, %r229; 2026-02-21T09:31:17.8342601Z cvt.u64.u32 %rd67, %r230; 2026-02-21T09:31:17.8342744Z shl.b64 %rd68, %rd67, 32; 2026-02-21T09:31:17.8342896Z or.b64 %rd69, %rd66, %rd68; 2026-02-21T09:31:17.8343149Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8343432Z mov.b64 {%r262, %r263}, %rd69; 2026-02-21T09:31:17.8343602Z cvt.rn.f16x2.f32 %r264, %r263, %r262; 2026-02-21T09:31:17.8343871Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8344155Z cvt.u64.u32 %rd70, %r231; 2026-02-21T09:31:17.8344299Z cvt.u64.u32 %rd71, %r232; 2026-02-21T09:31:17.8344450Z shl.b64 %rd72, %rd71, 32; 2026-02-21T09:31:17.8344594Z or.b64 %rd73, %rd70, %rd72; 2026-02-21T09:31:17.8344901Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8345202Z mov.b64 {%r265, %r266}, %rd73; 2026-02-21T09:31:17.8345370Z cvt.rn.f16x2.f32 %r267, %r266, %r265; 2026-02-21T09:31:17.8345668Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8345957Z cvt.u64.u32 %rd74, %r233; 2026-02-21T09:31:17.8346116Z cvt.u64.u32 %rd75, %r234; 2026-02-21T09:31:17.8346270Z shl.b64 %rd76, %rd75, 32; 2026-02-21T09:31:17.8346434Z or.b64 %rd77, %rd74, %rd76; 2026-02-21T09:31:17.8346705Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8347001Z mov.b64 {%r268, %r269}, %rd77; 2026-02-21T09:31:17.8347174Z 
cvt.rn.f16x2.f32 %r270, %r269, %r268; 2026-02-21T09:31:17.8347454Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8347747Z cvt.u64.u32 %rd78, %r235; 2026-02-21T09:31:17.8347897Z cvt.u64.u32 %rd79, %r236; 2026-02-21T09:31:17.8348056Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:31:17.8348206Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:31:17.8348492Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8348790Z mov.b64 {%r271, %r272}, %rd81; 2026-02-21T09:31:17.8348956Z cvt.rn.f16x2.f32 %r273, %r272, %r271; 2026-02-21T09:31:17.8349257Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8349578Z cvt.u64.u32 %rd82, %r237; 2026-02-21T09:31:17.8349768Z cvt.u64.u32 %rd83, %r238; 2026-02-21T09:31:17.8349919Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:31:17.8350079Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:31:17.8350344Z .loc 1 58 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:58:27 2026-02-21T09:31:17.8350643Z mov.b64 {%r274, %r275}, %rd85; 2026-02-21T09:31:17.8350817Z cvt.rn.f16x2.f32 %r276, %r275, %r274; 2026-02-21T09:31:17.8351098Z .loc 1 59 82 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:59:82 2026-02-21T09:31:17.8351436Z st.shared.v4.b32 [%r18], {%r255, %r258, %r261, %r264}; 2026-02-21T09:31:17.8351675Z st.shared.v4.b32 [%r19], {%r267, %r270, %r273, %r276}; 2026-02-21T09:31:17.8351886Z bar.sync 0; 2026-02-21T09:31:17.8352065Z ld.shared.v4.b32 {%r244, %r245, %r246, %r247}, [%r20+1024]; 2026-02-21T09:31:17.8352369Z ld.shared.v4.b32 {%r240, %r241, %r242, %r243}, [%r20]; 2026-02-21T09:31:17.8352617Z // begin inline asm 2026-02-21T09:31:17.8352799Z st.global.v4.b32 [ %rd52 + 0 ], { %r240, %r241, %r242, %r243 }; 2026-02-21T09:31:17.8353022Z // end inline asm 2026-02-21T09:31:17.8353157Z // begin inline asm 2026-02-21T09:31:17.8353341Z st.global.v4.b32 [ %rd53 + 0 ], { %r244, %r245, %r246, %r247 }; 
2026-02-21T09:31:17.8353543Z // end inline asm 2026-02-21T09:31:17.8353793Z .loc 1 30 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:30:52 2026-02-21T09:31:17.8354074Z add.s32 %r278, %r278, 1; 2026-02-21T09:31:17.8354240Z setp.ne.b32 %p68, %r278, %r4; 2026-02-21T09:31:17.8354409Z @%p68 bra $L__BB0_2; 2026-02-21T09:31:17.8354550Z bra.uni $L__BB0_9; 2026-02-21T09:31:17.8354765Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:31:17.8354993Z // Child Loop BB0_5 Depth 2 2026-02-21T09:31:17.8355305Z .loc 1 36 35 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:36:35 2026-02-21T09:31:17.8355585Z shr.s32 %r133, %r278, 31; 2026-02-21T09:31:17.8355739Z shr.u32 %r134, %r133, 24; 2026-02-21T09:31:17.8355895Z add.s32 %r135, %r278, %r134; 2026-02-21T09:31:17.8356150Z .loc 1 39 45 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:39:45 2026-02-21T09:31:17.8356432Z and.b32 %r136, %r135, 65280; 2026-02-21T09:31:17.8356583Z sub.s32 %r137, %r278, %r136; 2026-02-21T09:31:17.8356843Z .loc 1 39 64 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:39:64 2026-02-21T09:31:17.8357122Z cvt.u16.u32 %rs1, %r137; 2026-02-21T09:31:17.8357379Z .loc 1 40 51 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:40:51 2026-02-21T09:31:17.8357669Z shr.s16 %rs2, %rs1, 15; 2026-02-21T09:31:17.8357817Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:31:17.8357973Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T09:31:17.8358120Z shr.s16 %rs5, %rs4, 3; 2026-02-21T09:31:17.8358374Z .loc 1 39 64 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:39:64 2026-02-21T09:31:17.8358651Z and.b16 %rs6, %rs4, -8; 2026-02-21T09:31:17.8358801Z sub.s16 %rs7, %rs1, %rs6; 2026-02-21T09:31:17.8359047Z .loc 1 41 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:41:27 2026-02-21T09:31:17.8359337Z shl.b32 %r138, %r135, 2; 2026-02-21T09:31:17.8359491Z and.b32 %r23, %r138, -1024; 2026-02-21T09:31:17.8359648Z mul.wide.s16 %r24, %rs7, 128; 2026-02-21T09:31:17.8359811Z 
add.s32 %r139, %r24, %r23; 2026-02-21T09:31:17.8360064Z .loc 1 42 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:42:32 2026-02-21T09:31:17.8360345Z or.b32 %r140, %r139, %r7; 2026-02-21T09:31:17.8360593Z .loc 1 43 27 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:43:27 2026-02-21T09:31:17.8360881Z mul.wide.s16 %r162, %rs5, 32; 2026-02-21T09:31:17.8361145Z .loc 1 54 53 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:53 2026-02-21T09:31:17.8361469Z shl.b32 %r141, %r140, 10; 2026-02-21T09:31:17.8361722Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8362009Z shfl.sync.idx.b32 %r29, %r5, 0, 31, -1; 2026-02-21T09:31:17.8362191Z shl.b32 %r142, %r29, 21; 2026-02-21T09:31:17.8362338Z and.b32 %r143, %r142, 6291456; 2026-02-21T09:31:17.8362503Z add.s32 %r144, %r143, %r277; 2026-02-21T09:31:17.8362652Z shl.b32 %r145, %r29, 2; 2026-02-21T09:31:17.8362802Z and.b32 %r146, %r145, 16; 2026-02-21T09:31:17.8362954Z add.s32 %r239, %r144, %r146; 2026-02-21T09:31:17.8363107Z mov.pred %p22, -1; 2026-02-21T09:31:17.8363254Z mov.b32 %r279, 0; 2026-02-21T09:31:17.8363391Z // begin inline asm 2026-02-21T09:31:17.8363815Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r239 + 0], {%r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279, %r279}; 2026-02-21T09:31:17.8364198Z // end inline asm 2026-02-21T09:31:17.8364340Z // begin inline asm 2026-02-21T09:31:17.8364488Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:17.8364654Z // end inline asm 2026-02-21T09:31:17.8364814Z bar.sync 0; 2026-02-21T09:31:17.8365051Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8365341Z add.s32 %r280, %r45, 20512; 2026-02-21T09:31:17.8365489Z // begin inline asm 2026-02-21T09:31:17.8365655Z @%p4 mbarrier.init.shared::cta.b64 [%r280], 1; 2026-02-21T09:31:17.8365839Z // end inline asm 2026-02-21T09:31:17.8365973Z bar.sync 0; 
2026-02-21T09:31:17.8366101Z add.s32 %r105, %r45, 20520; 2026-02-21T09:31:17.8366254Z // begin inline asm 2026-02-21T09:31:17.8366418Z @%p4 mbarrier.init.shared::cta.b64 [%r105], 1; 2026-02-21T09:31:17.8366599Z // end inline asm 2026-02-21T09:31:17.8366740Z add.s32 %r106, %r45, 20480; 2026-02-21T09:31:17.8366889Z // begin inline asm 2026-02-21T09:31:17.8367054Z @%p4 mbarrier.init.shared::cta.b64 [%r106], 1; 2026-02-21T09:31:17.8367234Z // end inline asm 2026-02-21T09:31:17.8367366Z bar.sync 0; 2026-02-21T09:31:17.8367492Z add.s32 %r107, %r45, 20488; 2026-02-21T09:31:17.8367646Z // begin inline asm 2026-02-21T09:31:17.8367797Z @%p4 mbarrier.init.shared::cta.b64 [%r107], 1; 2026-02-21T09:31:17.8367981Z // end inline asm 2026-02-21T09:31:17.8368113Z bar.sync 0; 2026-02-21T09:31:17.8368235Z add.s32 %r108, %r45, 20496; 2026-02-21T09:31:17.8368386Z // begin inline asm 2026-02-21T09:31:17.8368537Z @%p4 mbarrier.init.shared::cta.b64 [%r108], 1; 2026-02-21T09:31:17.8368717Z // end inline asm 2026-02-21T09:31:17.8368843Z bar.sync 0; 2026-02-21T09:31:17.8368972Z add.s32 %r159, %r45, 20504; 2026-02-21T09:31:17.8369115Z // begin inline asm 2026-02-21T09:31:17.8369273Z @%p4 mbarrier.init.shared::cta.b64 [%r159], 1; 2026-02-21T09:31:17.8369456Z // end inline asm 2026-02-21T09:31:17.8369700Z .loc 1 54 60 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:60 2026-02-21T09:31:17.8369990Z or.b32 %r148, %r141, %r13; 2026-02-21T09:31:17.8370246Z .loc 1 54 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:32 2026-02-21T09:31:17.8370537Z mad.wide.s32 %rd32, %r148, 2, %rd8; 2026-02-21T09:31:17.8370700Z mov.b32 %r111, 16; 2026-02-21T09:31:17.8370945Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8371218Z // begin inline asm 2026-02-21T09:31:17.8371423Z cp.async.cg.shared.global [ %r110 + 0 ], [ %rd32 + 0 ], 0x10, %r111; 2026-02-21T09:31:17.8371652Z // end inline asm 2026-02-21T09:31:17.8371788Z 
cp.async.commit_group; 2026-02-21T09:31:17.8372049Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8372325Z bar.sync 0; 2026-02-21T09:31:17.8372459Z // begin inline asm 2026-02-21T09:31:17.8372644Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r106], 1024; 2026-02-21T09:31:17.8372936Z // end inline asm 2026-02-21T09:31:17.8373191Z .loc 1 55 44 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8373470Z bar.sync 0; 2026-02-21T09:31:17.8373612Z elect.sync %r149|%p36, -1; 2026-02-21T09:31:17.8373780Z and.pred %p30, %p1, %p36; 2026-02-21T09:31:17.8373946Z // begin inline asm 2026-02-21T09:31:17.8374272Z @%p30 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r72], [%rd42, {%r279, %r162}], [%r106]; 2026-02-21T09:31:17.8374636Z // end inline asm 2026-02-21T09:31:17.8374909Z .loc 1 54 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:32 2026-02-21T09:31:17.8375203Z add.s64 %rd34, %rd32, 32; 2026-02-21T09:31:17.8375497Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8375768Z // begin inline asm 2026-02-21T09:31:17.8376008Z cp.async.cg.shared.global [ %r117 + 0 ], [ %rd34 + 0 ], 0x10, %r111; 2026-02-21T09:31:17.8376227Z // end inline asm 2026-02-21T09:31:17.8376369Z cp.async.commit_group; 2026-02-21T09:31:17.8376623Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8376904Z bar.sync 0; 2026-02-21T09:31:17.8377034Z // begin inline asm 2026-02-21T09:31:17.8377214Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r107], 1024; 2026-02-21T09:31:17.8377428Z // end inline asm 2026-02-21T09:31:17.8377665Z .loc 1 55 44 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8377954Z bar.sync 0; 2026-02-21T09:31:17.8378083Z elect.sync %r150|%p37, -1; 2026-02-21T09:31:17.8378244Z and.pred %p32, %p1, %p37; 2026-02-21T09:31:17.8378394Z 
add.s32 %r120, %r45, 17408; 2026-02-21T09:31:17.8378547Z // begin inline asm 2026-02-21T09:31:17.8378872Z @%p32 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r120], [%rd42, {%r111, %r162}], [%r107]; 2026-02-21T09:31:17.8379214Z // end inline asm 2026-02-21T09:31:17.8379460Z .loc 1 54 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:32 2026-02-21T09:31:17.8379736Z add.s64 %rd36, %rd32, 64; 2026-02-21T09:31:17.8379995Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8380268Z // begin inline asm 2026-02-21T09:31:17.8380467Z cp.async.cg.shared.global [ %r124 + 0 ], [ %rd36 + 0 ], 0x10, %r111; 2026-02-21T09:31:17.8380691Z // end inline asm 2026-02-21T09:31:17.8380824Z cp.async.commit_group; 2026-02-21T09:31:17.8381083Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8381353Z bar.sync 0; 2026-02-21T09:31:17.8381483Z // begin inline asm 2026-02-21T09:31:17.8381665Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r108], 1024; 2026-02-21T09:31:17.8381878Z // end inline asm 2026-02-21T09:31:17.8382115Z .loc 1 55 44 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8382395Z bar.sync 0; 2026-02-21T09:31:17.8382533Z elect.sync %r151|%p38, -1; 2026-02-21T09:31:17.8382688Z and.pred %p34, %p1, %p38; 2026-02-21T09:31:17.8382846Z add.s32 %r127, %r45, 18432; 2026-02-21T09:31:17.8382991Z mov.b32 %r128, 32; 2026-02-21T09:31:17.8383131Z // begin inline asm 2026-02-21T09:31:17.8383444Z @%p34 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r127], [%rd42, {%r128, %r162}], [%r108]; 2026-02-21T09:31:17.8383790Z // end inline asm 2026-02-21T09:31:17.8384039Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8384326Z cp.async.wait_group 2; 2026-02-21T09:31:17.8384482Z bar.sync 0; 2026-02-21T09:31:17.8384746Z .loc 1 49 57 // 
cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8385058Z // begin inline asm 2026-02-21T09:31:17.8385212Z 2026-02-21T09:31:17.8385329Z { 2026-02-21T09:31:17.8385445Z .reg .pred complete; 2026-02-21T09:31:17.8385590Z waitLoop: 2026-02-21T09:31:17.8385773Z mbarrier.try_wait.parity.shared.b64 complete, [%r106], %r279; 2026-02-21T09:31:17.8386003Z @!complete bra.uni waitLoop; 2026-02-21T09:31:17.8386156Z } 2026-02-21T09:31:17.8386219Z 2026-02-21T09:31:17.8386273Z // end inline asm 2026-02-21T09:31:17.8386515Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8386792Z setp.ne.b32 %p39, %r29, 0; 2026-02-21T09:31:17.8386951Z @%p39 bra $L__BB0_4; 2026-02-21T09:31:17.8387129Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:17.8387343Z elect.sync %r154|%p41, -1; 2026-02-21T09:31:17.8387502Z mov.b32 %r153, 134742032; 2026-02-21T09:31:17.8387677Z mov.pred %p40, 0; 2026-02-21T09:31:17.8387848Z // begin inline asm 2026-02-21T09:31:17.8388069Z @%p41 tcgen05.mma.cta_group::1.kind::f16 [ %r277 + 0 ], %rd38, %rd39, %r153, %p40; 2026-02-21T09:31:17.8388327Z // end inline asm 2026-02-21T09:31:17.8388467Z add.s32 %r156, %r45, 20512; 2026-02-21T09:31:17.8388633Z cvt.u64.u32 %rd40, %r156; 2026-02-21T09:31:17.8388783Z // begin inline asm 2026-02-21T09:31:17.8389000Z @%p41 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:31:17.8389238Z // end inline asm 2026-02-21T09:31:17.8389420Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:31:17.8389756Z .loc 1 0 0 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:0 2026-02-21T09:31:17.8390048Z or.b32 %r25, %r139, %r9; 2026-02-21T09:31:17.8390209Z or.b32 %r26, %r139, %r10; 2026-02-21T09:31:17.8390361Z or.b32 %r28, %r162, %r12; 2026-02-21T09:31:17.8390642Z .loc 1 54 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:32 2026-02-21T09:31:17.8390940Z add.s64 %rd41, %rd32, 96; 
2026-02-21T09:31:17.8391213Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8391511Z bar.sync 0; 2026-02-21T09:31:17.8391643Z // begin inline asm 2026-02-21T09:31:17.8391854Z cp.async.cg.shared.global [ %r157 + 0 ], [ %rd41 + 0 ], 0x10, %r111; 2026-02-21T09:31:17.8392082Z // end inline asm 2026-02-21T09:31:17.8392232Z cp.async.commit_group; 2026-02-21T09:31:17.8392500Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8392796Z // begin inline asm 2026-02-21T09:31:17.8392994Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r159], 1024; 2026-02-21T09:31:17.8393211Z // end inline asm 2026-02-21T09:31:17.8393476Z .loc 1 55 44 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8393764Z bar.sync 0; 2026-02-21T09:31:17.8393913Z elect.sync %r169|%p46, -1; 2026-02-21T09:31:17.8394080Z and.pred %p44, %p1, %p46; 2026-02-21T09:31:17.8394251Z add.s32 %r160, %r45, 19456; 2026-02-21T09:31:17.8394408Z mov.b32 %r161, 48; 2026-02-21T09:31:17.8394562Z // begin inline asm 2026-02-21T09:31:17.8394939Z @%p44 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd42, {%r161, %r162}], [%r159]; 2026-02-21T09:31:17.8395296Z // end inline asm 2026-02-21T09:31:17.8395556Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8395857Z add.s32 %r170, %r7, %r23; 2026-02-21T09:31:17.8396022Z add.s32 %r171, %r170, %r24; 2026-02-21T09:31:17.8396177Z shl.b32 %r172, %r171, 10; 2026-02-21T09:31:17.8396338Z or.b32 %r173, %r21, %r172; 2026-02-21T09:31:17.8396494Z cvt.u64.u32 %rd5, %r173; 2026-02-21T09:31:17.8396655Z mov.b32 %r284, 1; 2026-02-21T09:31:17.8396800Z mov.b32 %r283, 3; 2026-02-21T09:31:17.8396937Z mov.b64 %rd86, 0; 2026-02-21T09:31:17.8397088Z mov.b32 %r281, %r279; 2026-02-21T09:31:17.8397257Z mov.b32 %r282, %r279; 2026-02-21T09:31:17.8397434Z mov.b32 %r285, %r279; 
2026-02-21T09:31:17.8397570Z bra.uni $L__BB0_5; 2026-02-21T09:31:17.8397750Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:17.8398058Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8398345Z setp.lt.u64 %p54, %rd86, 960; 2026-02-21T09:31:17.8398614Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8398884Z // begin inline asm 2026-02-21T09:31:17.8399020Z 2026-02-21T09:31:17.8399130Z { 2026-02-21T09:31:17.8399253Z .reg .pred complete; 2026-02-21T09:31:17.8399391Z waitLoop: 2026-02-21T09:31:17.8399580Z mbarrier.try_wait.parity.shared.b64 complete, [%r280], %r279; 2026-02-21T09:31:17.8399827Z @!complete bra.uni waitLoop; 2026-02-21T09:31:17.8399983Z } 2026-02-21T09:31:17.8400076Z 2026-02-21T09:31:17.8400138Z // end inline asm 2026-02-21T09:31:17.8400271Z add.s32 %r203, %r284, 1; 2026-02-21T09:31:17.8400424Z setp.gt.s32 %p57, %r203, 1; 2026-02-21T09:31:17.8400580Z selp.b32 %r284, 0, %r203, %p57; 2026-02-21T09:31:17.8400748Z selp.b32 %r204, 1, 0, %p57; 2026-02-21T09:31:17.8400895Z xor.b32 %r42, %r285, %r204; 2026-02-21T09:31:17.8401153Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8401424Z add.s32 %r205, %r283, 1; 2026-02-21T09:31:17.8401576Z setp.gt.s32 %p58, %r205, 3; 2026-02-21T09:31:17.8401728Z selp.b32 %r283, 0, %r205, %p58; 2026-02-21T09:31:17.8402001Z .loc 1 54 32 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:32 2026-02-21T09:31:17.8402281Z add.s64 %rd51, %rd5, %rd86; 2026-02-21T09:31:17.8402431Z cvt.u32.u64 %r206, %rd51; 2026-02-21T09:31:17.8402594Z mad.wide.s32 %rd49, %r206, 2, %rd8; 2026-02-21T09:31:17.8402861Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8403143Z shl.b32 %r207, %r283, 12; 2026-02-21T09:31:17.8403285Z bar.sync 0; 2026-02-21T09:31:17.8403423Z add.s32 %r196, %r110, %r207; 2026-02-21T09:31:17.8403585Z selp.b32 
%r197, 16, 0, %p54; 2026-02-21T09:31:17.8403736Z // begin inline asm 2026-02-21T09:31:17.8403941Z cp.async.cg.shared.global [ %r196 + 0 ], [ %rd49 + 0 ], 0x10, %r197; 2026-02-21T09:31:17.8404157Z // end inline asm 2026-02-21T09:31:17.8404304Z cp.async.commit_group; 2026-02-21T09:31:17.8404556Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8404865Z shl.b32 %r208, %r283, 3; 2026-02-21T09:31:17.8405013Z add.s32 %r210, %r45, %r208; 2026-02-21T09:31:17.8405170Z add.s32 %r202, %r210, 20480; 2026-02-21T09:31:17.8405329Z and.pred %p52, %p4, %p54; 2026-02-21T09:31:17.8405479Z // begin inline asm 2026-02-21T09:31:17.8405671Z @%p52 mbarrier.arrive.expect_tx.shared.b64 _, [%r202], 1024; 2026-02-21T09:31:17.8405881Z // end inline asm 2026-02-21T09:31:17.8406133Z .loc 1 55 44 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8406421Z shl.b32 %r211, %r283, 10; 2026-02-21T09:31:17.8406573Z add.s32 %r212, %r45, %r211; 2026-02-21T09:31:17.8406724Z add.s32 %r199, %r212, 16384; 2026-02-21T09:31:17.8406878Z bar.sync 0; 2026-02-21T09:31:17.8407013Z elect.sync %r213|%p59, -1; 2026-02-21T09:31:17.8407177Z and.pred %p60, %p54, %p59; 2026-02-21T09:31:17.8407336Z and.pred %p53, %p1, %p60; 2026-02-21T09:31:17.8407485Z cvt.u32.u64 %r214, %rd86; 2026-02-21T09:31:17.8407637Z add.s32 %r200, %r214, 64; 2026-02-21T09:31:17.8407780Z // begin inline asm 2026-02-21T09:31:17.8408103Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r199], [%rd42, {%r200, %r162}], [%r202]; 2026-02-21T09:31:17.8408457Z // end inline asm 2026-02-21T09:31:17.8408711Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8409055Z add.s64 %rd7, %rd86, 16; 2026-02-21T09:31:17.8409210Z setp.lt.u64 %p61, %rd86, 992; 2026-02-21T09:31:17.8409373Z mov.b64 %rd86, %rd7; 2026-02-21T09:31:17.8409512Z mov.b32 %r279, %r285; 2026-02-21T09:31:17.8409659Z mov.b32 %r280, %r215; 
2026-02-21T09:31:17.8409794Z mov.b32 %r285, %r42; 2026-02-21T09:31:17.8409937Z @%p61 bra $L__BB0_5; 2026-02-21T09:31:17.8410116Z bra.uni $L__BB0_8; 2026-02-21T09:31:17.8410291Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:31:17.8410533Z // => This Inner Loop Header: Depth=2 2026-02-21T09:31:17.8410741Z add.s32 %r176, %r282, 1; 2026-02-21T09:31:17.8410888Z setp.gt.s32 %p48, %r176, 3; 2026-02-21T09:31:17.8411053Z selp.b32 %r282, 0, %r176, %p48; 2026-02-21T09:31:17.8411240Z selp.b32 %r177, 1, 0, %p48; 2026-02-21T09:31:17.8411433Z xor.b32 %r281, %r281, %r177; 2026-02-21T09:31:17.8411691Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8411985Z cp.async.wait_group 2; 2026-02-21T09:31:17.8412136Z bar.sync 0; 2026-02-21T09:31:17.8412366Z .loc 1 49 57 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:49:57 2026-02-21T09:31:17.8412654Z shl.b32 %r178, %r282, 3; 2026-02-21T09:31:17.8412797Z add.s32 %r180, %r45, %r178; 2026-02-21T09:31:17.8412955Z add.s32 %r174, %r180, 20480; 2026-02-21T09:31:17.8413104Z // begin inline asm 2026-02-21T09:31:17.8413241Z 2026-02-21T09:31:17.8413349Z { 2026-02-21T09:31:17.8413477Z .reg .pred complete; 2026-02-21T09:31:17.8413617Z waitLoop: 2026-02-21T09:31:17.8413811Z mbarrier.try_wait.parity.shared.b64 complete, [%r174], %r281; 2026-02-21T09:31:17.8414039Z @!complete bra.uni waitLoop; 2026-02-21T09:31:17.8414187Z } 2026-02-21T09:31:17.8414249Z 2026-02-21T09:31:17.8414310Z // end inline asm 2026-02-21T09:31:17.8414443Z shl.b32 %r181, %r284, 3; 2026-02-21T09:31:17.8414596Z add.s32 %r182, %r45, %r181; 2026-02-21T09:31:17.8414767Z add.s32 %r215, %r182, 20512; 2026-02-21T09:31:17.8415037Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8415318Z @%p39 bra $L__BB0_7; 2026-02-21T09:31:17.8415501Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:31:17.8415823Z .loc 1 55 44 // 
cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:55:44 2026-02-21T09:31:17.8416098Z shl.b32 %r185, %r282, 10; 2026-02-21T09:31:17.8416254Z add.s32 %r187, %r45, %r185; 2026-02-21T09:31:17.8416404Z add.s32 %r188, %r187, 16384; 2026-02-21T09:31:17.8416666Z .loc 1 54 85 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:54:85 2026-02-21T09:31:17.8416944Z shl.b32 %r189, %r282, 12; 2026-02-21T09:31:17.8417099Z add.s32 %r190, %r45, %r189; 2026-02-21T09:31:17.8417365Z .loc 1 56 52 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:56:52 2026-02-21T09:31:17.8417652Z elect.sync %r191|%p50, -1; 2026-02-21T09:31:17.8417816Z bfe.u32 %r192, %r190, 4, 14; 2026-02-21T09:31:17.8417965Z cvt.u64.u32 %rd47, %r192; 2026-02-21T09:31:17.8418133Z or.b64 %rd44, %rd47, -4611685949691133952; 2026-02-21T09:31:17.8418306Z bfe.u32 %r193, %r188, 4, 14; 2026-02-21T09:31:17.8418459Z cvt.u64.u32 %rd48, %r193; 2026-02-21T09:31:17.8418619Z or.b64 %rd45, %rd48, -4611685949703716864; 2026-02-21T09:31:17.8418795Z mov.b32 %r184, 134742032; 2026-02-21T09:31:17.8418948Z mov.pred %p49, -1; 2026-02-21T09:31:17.8419089Z // begin inline asm 2026-02-21T09:31:17.8419314Z @%p50 tcgen05.mma.cta_group::1.kind::f16 [ %r277 + 0 ], %rd44, %rd45, %r184, %p49; 2026-02-21T09:31:17.8419556Z // end inline asm 2026-02-21T09:31:17.8419694Z cvt.u64.u32 %rd46, %r215; 2026-02-21T09:31:17.8419839Z // begin inline asm 2026-02-21T09:31:17.8420045Z @%p50 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd46]; 2026-02-21T09:31:17.8420318Z // end inline asm 2026-02-21T09:31:17.8420481Z bra.uni $L__BB0_7; 2026-02-21T09:31:17.8420642Z $L__BB0_9: // %._crit_edge 2026-02-21T09:31:17.8420930Z .loc 1 30 4 // cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py:30:4 2026-02-21T09:31:17.8421208Z bar.sync 0; 2026-02-21T09:31:17.8421331Z // begin inline asm 2026-02-21T09:31:17.8421525Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r277, 32; 2026-02-21T09:31:17.8421735Z // end inline asm 
2026-02-21T09:31:17.8421869Z ret; 2026-02-21T09:31:17.8421986Z $L__tmp0: 2026-02-21T09:31:17.8422113Z $L__func_end0: 2026-02-21T09:31:17.8422269Z // -- End function 2026-02-21T09:31:17.8422442Z } 2026-02-21T09:31:17.8422706Z .file 1 "/tmp/torchinductor_root/br/cbroctmaz4wzutr4okjbm7vslpbowmvkpctpfs6yba6vq542uxcj.py" 2026-02-21T09:31:17.8423045Z .section .debug_abbrev 2026-02-21T09:31:17.8423225Z { 2026-02-21T09:31:17.8423378Z .b8 1 // Abbreviation Code 2026-02-21T09:31:17.8423605Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:17.8423814Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:17.8424023Z .b8 37 // DW_AT_producer 2026-02-21T09:31:17.8424230Z .b8 8 // DW_FORM_string 2026-02-21T09:31:17.8424425Z .b8 19 // DW_AT_language 2026-02-21T09:31:17.8424627Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:17.8424852Z .b8 3 // DW_AT_name 2026-02-21T09:31:17.8425053Z .b8 8 // DW_FORM_string 2026-02-21T09:31:17.8425248Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:17.8425457Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:17.8425662Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:17.8425858Z .b8 8 // DW_FORM_string 2026-02-21T09:31:17.8426058Z .b8 0 // EOM(1) 2026-02-21T09:31:17.8426244Z .b8 0 // EOM(2) 2026-02-21T09:31:17.8426432Z .b8 0 // EOM(3) 2026-02-21T09:31:17.8426598Z } 2026-02-21T09:31:17.8426724Z .section .debug_info 2026-02-21T09:31:17.8426857Z { 2026-02-21T09:31:17.8427008Z .b32 104 // Length of Unit 2026-02-21T09:31:17.8427231Z .b8 2 // DWARF version number 2026-02-21T09:31:17.8427415Z .b8 0 2026-02-21T09:31:17.8427596Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:31:17.8427842Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:17.8428079Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:17.8428313Z .b8 116 // DW_AT_producer 2026-02-21T09:31:17.8428499Z .b8 114 2026-02-21T09:31:17.8428615Z .b8 105 2026-02-21T09:31:17.8428732Z .b8 116 2026-02-21T09:31:17.8428847Z .b8 111 2026-02-21T09:31:17.8428954Z .b8 110 2026-02-21T09:31:17.8429068Z .b8 0 2026-02-21T09:31:17.8429201Z .b8 2 // DW_AT_language 2026-02-21T09:31:17.8429379Z .b8 0 2026-02-21T09:31:17.8429513Z .b8 99 // DW_AT_name 2026-02-21T09:31:17.8429690Z .b8 98 2026-02-21T09:31:17.8429801Z .b8 114 2026-02-21T09:31:17.8429915Z .b8 111 2026-02-21T09:31:17.8430024Z .b8 99 2026-02-21T09:31:17.8430144Z .b8 116 2026-02-21T09:31:17.8430253Z .b8 109 2026-02-21T09:31:17.8430368Z .b8 97 2026-02-21T09:31:17.8430476Z .b8 122 2026-02-21T09:31:17.8430592Z .b8 52 2026-02-21T09:31:17.8430706Z .b8 119 2026-02-21T09:31:17.8430816Z .b8 122 2026-02-21T09:31:17.8430931Z .b8 117 2026-02-21T09:31:17.8431040Z .b8 116 2026-02-21T09:31:17.8431163Z .b8 114 2026-02-21T09:31:17.8431277Z .b8 52 2026-02-21T09:31:17.8431429Z .b8 111 2026-02-21T09:31:17.8431574Z .b8 107 2026-02-21T09:31:17.8431690Z .b8 106 2026-02-21T09:31:17.8431798Z .b8 98 2026-02-21T09:31:17.8431915Z .b8 109 2026-02-21T09:31:17.8432039Z .b8 55 2026-02-21T09:31:17.8432161Z .b8 118 2026-02-21T09:31:17.8432275Z .b8 115 2026-02-21T09:31:17.8432399Z .b8 108 2026-02-21T09:31:17.8432519Z .b8 112 2026-02-21T09:31:17.8432632Z .b8 98 2026-02-21T09:31:17.8432752Z .b8 111 2026-02-21T09:31:17.8432866Z .b8 119 2026-02-21T09:31:17.8432987Z .b8 109 2026-02-21T09:31:17.8433101Z .b8 118 2026-02-21T09:31:17.8433222Z .b8 107 2026-02-21T09:31:17.8433335Z .b8 112 2026-02-21T09:31:17.8433455Z .b8 99 2026-02-21T09:31:17.8433568Z .b8 116 2026-02-21T09:31:17.8433692Z .b8 112 2026-02-21T09:31:17.8433804Z .b8 102 2026-02-21T09:31:17.8433926Z .b8 115 2026-02-21T09:31:17.8434039Z .b8 54 2026-02-21T09:31:17.8434160Z .b8 121 
2026-02-21T09:31:17.8434277Z .b8 98 2026-02-21T09:31:17.8434422Z .b8 97 2026-02-21T09:31:17.8434543Z .b8 54 2026-02-21T09:31:17.8434714Z .b8 118 2026-02-21T09:31:17.8434840Z .b8 113 2026-02-21T09:31:17.8434957Z .b8 53 2026-02-21T09:31:17.8435081Z .b8 52 2026-02-21T09:31:17.8435194Z .b8 50 2026-02-21T09:31:17.8435314Z .b8 117 2026-02-21T09:31:17.8435426Z .b8 120 2026-02-21T09:31:17.8435550Z .b8 99 2026-02-21T09:31:17.8435664Z .b8 106 2026-02-21T09:31:17.8435785Z .b8 46 2026-02-21T09:31:17.8435899Z .b8 112 2026-02-21T09:31:17.8436022Z .b8 121 2026-02-21T09:31:17.8436135Z .b8 0 2026-02-21T09:31:17.8436301Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:17.8436531Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:17.8436713Z .b8 116 2026-02-21T09:31:17.8436835Z .b8 109 2026-02-21T09:31:17.8436948Z .b8 112 2026-02-21T09:31:17.8437070Z .b8 47 2026-02-21T09:31:17.8437183Z .b8 116 2026-02-21T09:31:17.8437303Z .b8 111 2026-02-21T09:31:17.8437415Z .b8 114 2026-02-21T09:31:17.8437540Z .b8 99 2026-02-21T09:31:17.8437656Z .b8 104 2026-02-21T09:31:17.8437784Z .b8 105 2026-02-21T09:31:17.8437902Z .b8 110 2026-02-21T09:31:17.8438024Z .b8 100 2026-02-21T09:31:17.8438145Z .b8 117 2026-02-21T09:31:17.8438258Z .b8 99 2026-02-21T09:31:17.8438380Z .b8 116 2026-02-21T09:31:17.8438493Z .b8 111 2026-02-21T09:31:17.8438612Z .b8 114 2026-02-21T09:31:17.8438724Z .b8 95 2026-02-21T09:31:17.8438842Z .b8 114 2026-02-21T09:31:17.8438956Z .b8 111 2026-02-21T09:31:17.8439076Z .b8 111 2026-02-21T09:31:17.8439188Z .b8 116 2026-02-21T09:31:17.8439309Z .b8 47 2026-02-21T09:31:17.8439422Z .b8 98 2026-02-21T09:31:17.8439544Z .b8 114 2026-02-21T09:31:17.8439655Z .b8 0 2026-02-21T09:31:17.8439778Z } 2026-02-21T09:31:17.8439921Z .section .debug_macinfo { } 2026-02-21T09:31:17.8440025Z 2026-02-21T09:31:17.8440101Z ================================================================ 2026-02-21T09:31:17.8440336Z please share the reproducer above with Triton project. 
2026-02-21T09:31:21.2408947Z 2026-02-21T09:31:21.2409923Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 96/96 17.9 configs/s 2026-02-21T09:31:21.7271468Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1966.8 2026-02-21T09:31:21.7271865Z configs/s 2026-02-21T09:31:21.7777064Z [33s] Generation 1 complete: 2026-02-21T09:31:21.7777346Z error=8 2026-02-21T09:31:21.7777489Z ok=91 2026-02-21T09:31:21.7777641Z min=0.0735 2026-02-21T09:31:21.7777792Z mid=0.2335 2026-02-21T09:31:21.7777931Z max=3.6629 2026-02-21T09:31:21.7778091Z best={'block_sizes': [128, 64, 32], 2026-02-21T09:31:21.7778407Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:31:21.7778702Z 'l2_groupings': [16], 2026-02-21T09:31:21.7778895Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:31:21.7779118Z 'loop_orders': [[1, 0]], 2026-02-21T09:31:21.7779319Z 'maxnreg': 128, 2026-02-21T09:31:21.7779490Z 'num_sm_multiplier': 8, 2026-02-21T09:31:21.7779664Z 'num_stages': 7, 2026-02-21T09:31:21.7779859Z 'num_warps': 4, 2026-02-21T09:31:21.7780043Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:31:21.7780590Z 'range_flattens': [None, True], 2026-02-21T09:31:21.7780871Z 'range_multi_buffers': [False, False], 2026-02-21T09:31:21.7781091Z 'range_num_stages': [0, 0], 2026-02-21T09:31:21.7781292Z 'range_unroll_factors': [0, 0], 2026-02-21T09:31:21.7781493Z 'range_warp_specializes': [True, None]} 2026-02-21T09:31:21.7795126Z [33s] Fitting surrogate: 199 points, 199 targets 2026-02-21T09:31:23.1941091Z [34s] Generation 2 starting: 88 neighbors, 5 active search path(s) 2026-02-21T09:31:31.6339006Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92/92 26.8 configs/s 2026-02-21T09:31:36.9606166Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 92/92 17.4 configs/s 2026-02-21T09:31:37.9588211Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 995.1 2026-02-21T09:31:37.9588565Z configs/s 2026-02-21T09:31:38.0303582Z 
[49s] Generation 2 complete: 2026-02-21T09:31:38.0305455Z error=2 2026-02-21T09:31:38.0305716Z ok=92 2026-02-21T09:31:38.0310805Z min=0.0491 2026-02-21T09:31:38.0315448Z mid=0.1147 2026-02-21T09:31:38.0316977Z max=4.2680 2026-02-21T09:31:38.0317164Z best={'block_sizes': [128, 128, 32], 2026-02-21T09:31:38.0317408Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:31:38.0317638Z 'l2_groupings': [16], 2026-02-21T09:31:38.0317806Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:31:38.0318004Z 'loop_orders': [[1, 0]], 2026-02-21T09:31:38.0318155Z 'maxnreg': 128, 2026-02-21T09:31:38.0318308Z 'num_sm_multiplier': 8, 2026-02-21T09:31:38.0318467Z 'num_stages': 7, 2026-02-21T09:31:38.0318603Z 'num_warps': 4, 2026-02-21T09:31:38.0318760Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:31:38.0318951Z 'range_flattens': [None, True], 2026-02-21T09:31:38.0319132Z 'range_multi_buffers': [False, False], 2026-02-21T09:31:38.0319315Z 'range_num_stages': [0, 0], 2026-02-21T09:31:38.0319502Z 'range_unroll_factors': [0, 0], 2026-02-21T09:31:38.0319702Z 'range_warp_specializes': [True, None]} 2026-02-21T09:31:38.0323244Z [49s] Fitting surrogate: 293 points, 293 targets 2026-02-21T09:31:39.2960319Z [51s] Generation 3 starting: 83 neighbors, 5 active search path(s) 2026-02-21T09:31:49.6970964Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 4.0 configs/s 2026-02-21T09:31:51.3904630Z [63s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:31:51.3905301Z 2026-02-21T09:31:51.3905394Z 2026-02-21T09:31:51.3905584Z ================================================================ 2026-02-21T09:31:51.3905854Z Internal Triton PTX codegen error 2026-02-21T09:31:51.3907327Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:51.3908741Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:51.3909019Z `ptxas` stderr: 2026-02-21T09:31:51.3909532Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 215 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:51.3910097Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:51.3910274Z 2026-02-21T09:31:51.3910736Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpu18bx_vf.ptx -o /tmp/tmpu18bx_vf.ptx.o 2026-02-21T09:31:51.3911235Z 2026-02-21T09:31:51.3911399Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:31:51.3911690Z `ptxas` stderr: 2026-02-21T09:31:51.3912176Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 215 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:31:51.3913113Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:51.3913286Z 2026-02-21T09:31:51.3913727Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpu18bx_vf.ptx -o /tmp/tmpu18bx_vf.ptx.o 2026-02-21T09:31:51.3914243Z 2026-02-21T09:31:51.3914247Z 2026-02-21T09:31:51.3914311Z // 2026-02-21T09:31:51.3914467Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:51.3914751Z // 2026-02-21T09:31:51.3914829Z 2026-02-21T09:31:51.3914901Z .version 8.7 2026-02-21T09:31:51.3915062Z .target sm_100a 2026-02-21T09:31:51.3915222Z .address_size 64 2026-02-21T09:31:51.3915317Z 2026-02-21T09:31:51.3915551Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:51.3915922Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:51.3916172Z // @_helion_matmul 2026-02-21T09:31:51.3916408Z .visible .entry _helion_matmul( 2026-02-21T09:31:51.3916652Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:51.3916952Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:51.3917246Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:51.3917536Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:51.3917841Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:51.3918075Z ) 2026-02-21T09:31:51.3918216Z .reqntid 256 2026-02-21T09:31:51.3918363Z .maxnreg 32 2026-02-21T09:31:51.3918506Z { 2026-02-21T09:31:51.3918643Z .reg .pred %p<100>; 2026-02-21T09:31:51.3918818Z .reg .b32 %r<683>; 2026-02-21T09:31:51.3918984Z .reg .b64 %rd<305>; 2026-02-21T09:31:51.3919140Z $L__func_begin0: 2026-02-21T09:31:51.3919234Z 2026-02-21T09:31:51.3919306Z // %bb.0: 2026-02-21T09:31:51.3919578Z .loc 1 19 0 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:19 2026-02-21T09:31:51.3919915Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:51.3920108Z 
ld.param.b64 %rd17, [_helion_matmul_param_1]; 2026-02-21T09:31:51.3920342Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:51.3920524Z mov.b32 %r39, global_smem; 2026-02-21T09:31:51.3920708Z // begin inline asm 2026-02-21T09:31:51.3920992Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r39], 128; 2026-02-21T09:31:51.3921269Z // end inline asm 2026-02-21T09:31:51.3921458Z ld.param.b64 %rd34, [_helion_matmul_param_3]; 2026-02-21T09:31:51.3921670Z bar.sync 0; 2026-02-21T09:31:51.3921840Z ld.shared.b32 %r675, [global_smem]; 2026-02-21T09:31:51.3922033Z bar.sync 0; 2026-02-21T09:31:51.3922186Z // begin inline asm 2026-02-21T09:31:51.3922416Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:51.3922682Z // end inline asm 2026-02-21T09:31:51.3922978Z .loc 1 21 67 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:21:67 2026-02-21T09:31:51.3923305Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:31:51.3923483Z mov.u32 %r48, %ctaid.y; 2026-02-21T09:31:51.3923649Z mov.u32 %r49, %ctaid.z; 2026-02-21T09:31:51.3923820Z mov.u32 %r50, %nctaid.x; 2026-02-21T09:31:51.3923989Z mov.u32 %r51, %nctaid.y; 2026-02-21T09:31:51.3924168Z mad.lo.s32 %r52, %r49, %r51, %r48; 2026-02-21T09:31:51.3924367Z mad.lo.s32 %r53, %r52, %r50, %r3; 2026-02-21T09:31:51.3924563Z shl.b32 %r54, %r53, 7; 2026-02-21T09:31:51.3924790Z cvt.s64.s32 %rd35, %r54; 2026-02-21T09:31:51.3924964Z add.s64 %rd31, %rd34, %rd35; 2026-02-21T09:31:51.3925150Z shl.b32 %r55, %r1, 2; 2026-02-21T09:31:51.3925314Z add.s32 %r40, %r39, %r55; 2026-02-21T09:31:51.3925491Z mov.b32 %r57, 0; 2026-02-21T09:31:51.3925642Z // begin inline asm 2026-02-21T09:31:51.3925819Z @%p1 st.shared.b32 [ %r40 + 0 ], %r57; 2026-02-21T09:31:51.3926015Z // end inline asm 2026-02-21T09:31:51.3926179Z bar.warp.sync -1; 2026-02-21T09:31:51.3926405Z setp.eq.b32 %p93, %r1, 0; 2026-02-21T09:31:51.3926626Z cvt.u64.u32 %rd16, %r39; 2026-02-21T09:31:51.3926803Z // begin inline asm 2026-02-21T09:31:51.3927094Z @%p93 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd16 + 0 ], %rd17; 2026-02-21T09:31:51.3927438Z // end inline asm 2026-02-21T09:31:51.3927590Z // begin inline asm 2026-02-21T09:31:51.3927851Z @%p93 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:31:51.3928151Z // end inline asm 2026-02-21T09:31:51.3928320Z mov.b32 %r42, 64; 2026-02-21T09:31:51.3928485Z // begin inline asm 2026-02-21T09:31:51.3928763Z @%p93 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r42; 2026-02-21T09:31:51.3929069Z // end inline asm 2026-02-21T09:31:51.3929218Z mov.b32 %r43, 128; 2026-02-21T09:31:51.3929383Z // begin inline asm 2026-02-21T09:31:51.3929715Z @%p93 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r43; 2026-02-21T09:31:51.3930027Z // end inline asm 2026-02-21T09:31:51.3930181Z mov.b32 %r44, 1024; 2026-02-21T09:31:51.3930350Z // begin inline asm 2026-02-21T09:31:51.3930623Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r44; 2026-02-21T09:31:51.3930939Z // end inline asm 2026-02-21T09:31:51.3931094Z // begin inline asm 2026-02-21T09:31:51.3931360Z @%p93 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r44; 2026-02-21T09:31:51.3931675Z // end inline asm 2026-02-21T09:31:51.3931825Z mov.b64 %rd24, 2048; 2026-02-21T09:31:51.3931992Z // begin inline asm 2026-02-21T09:31:51.3932275Z @%p93 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd16 + 0 ], 0x0, %rd24; 2026-02-21T09:31:51.3932605Z // end inline asm 2026-02-21T09:31:51.3932762Z mov.b32 %r46, 1; 2026-02-21T09:31:51.3932919Z // begin inline asm 2026-02-21T09:31:51.3933211Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r46; 2026-02-21T09:31:51.3933526Z // end inline asm 2026-02-21T09:31:51.3933682Z // begin inline asm 2026-02-21T09:31:51.3933955Z @%p93 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ 
%rd16 + 0 ], 0x1, %r46; 2026-02-21T09:31:51.3934284Z // end inline asm 2026-02-21T09:31:51.3934433Z // begin inline asm 2026-02-21T09:31:51.3934753Z @%p93 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x6; 2026-02-21T09:31:51.3935062Z // end inline asm 2026-02-21T09:31:51.3935210Z // begin inline asm 2026-02-21T09:31:51.3935498Z @%p93 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:31:51.3935816Z // end inline asm 2026-02-21T09:31:51.3935972Z // begin inline asm 2026-02-21T09:31:51.3936236Z @%p93 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x3; 2026-02-21T09:31:51.3936557Z // end inline asm 2026-02-21T09:31:51.3936713Z // begin inline asm 2026-02-21T09:31:51.3936975Z @%p93 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:31:51.3937283Z // end inline asm 2026-02-21T09:31:51.3937433Z // begin inline asm 2026-02-21T09:31:51.3937840Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd31 + 0 ], [ %rd16 + 0 ], 0x80; 2026-02-21T09:31:51.3938274Z // end inline asm 2026-02-21T09:31:51.3938433Z // begin inline asm 2026-02-21T09:31:51.3938677Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd31 + 0 ], 0x80; 2026-02-21T09:31:51.3938959Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:31:51.3939182Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:31:51.3939383Z // end inline asm 2026-02-21T09:31:51.3939545Z bar.sync 0; 2026-02-21T09:31:51.3939708Z cvta.global.u64 %rd111, %rd31; 2026-02-21T09:31:51.3940039Z .loc 1 27 97 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:27:97 2026-02-21T09:31:51.3950362Z setp.gt.u32 %p21, %r3, 767; 2026-02-21T09:31:51.3950667Z @%p21 bra $L__BB0_8; 2026-02-21T09:31:51.3951020Z // %bb.1: // %.lr.ph 2026-02-21T09:31:51.3951441Z .loc 1 0 97 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:0:97 2026-02-21T09:31:51.3951816Z ld.param.b64 
%rd14, [_helion_matmul_param_0]; 2026-02-21T09:31:51.3952157Z .loc 1 39 45 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:39:45 2026-02-21T09:31:51.3952491Z shl.b32 %r195, %r1, 3; 2026-02-21T09:31:51.3952680Z and.b32 %r196, %r195, 120; 2026-02-21T09:31:51.3952860Z and.b32 %r197, %r1, 240; 2026-02-21T09:31:51.3953046Z bfe.u32 %r4, %r1, 4, 4; 2026-02-21T09:31:51.3953218Z shr.u32 %r198, %r1, 5; 2026-02-21T09:31:51.3953391Z shl.b32 %r199, %r1, 4; 2026-02-21T09:31:51.3953556Z and.b32 %r200, %r199, 112; 2026-02-21T09:31:51.3953740Z shl.b32 %r201, %r197, 3; 2026-02-21T09:31:51.3953907Z and.b32 %r202, %r1, 112; 2026-02-21T09:31:51.3954130Z shl.b32 %r203, %r1, 11; 2026-02-21T09:31:51.3954336Z and.b32 %r204, %r203, 16384; 2026-02-21T09:31:51.3954530Z or.b32 %r205, %r200, %r201; 2026-02-21T09:31:51.3954810Z xor.b32 %r206, %r205, %r202; 2026-02-21T09:31:51.3954992Z or.b32 %r5, %r206, %r204; 2026-02-21T09:31:51.3955179Z add.s32 %r130, %r39, %r5; 2026-02-21T09:31:51.3955354Z add.s32 %r132, %r130, 2048; 2026-02-21T09:31:51.3955781Z add.s32 %r134, %r130, 4096; 2026-02-21T09:31:51.3955956Z add.s32 %r136, %r130, 6144; 2026-02-21T09:31:51.3956139Z add.s32 %r138, %r130, 8192; 2026-02-21T09:31:51.3956323Z add.s32 %r140, %r130, 10240; 2026-02-21T09:31:51.3956505Z add.s32 %r142, %r130, 12288; 2026-02-21T09:31:51.3956672Z add.s32 %r144, %r130, 14336; 2026-02-21T09:31:51.3956860Z setp.lt.u32 %p38, %r1, 64; 2026-02-21T09:31:51.3957048Z add.s32 %r151, %r130, 32768; 2026-02-21T09:31:51.3957220Z add.s32 %r153, %r130, 34816; 2026-02-21T09:31:51.3957408Z add.s32 %r155, %r130, 36864; 2026-02-21T09:31:51.3957583Z add.s32 %r157, %r130, 38912; 2026-02-21T09:31:51.3957767Z add.s32 %r159, %r130, 40960; 2026-02-21T09:31:51.3957941Z add.s32 %r161, %r130, 43008; 2026-02-21T09:31:51.3958127Z add.s32 %r163, %r130, 45056; 2026-02-21T09:31:51.3958299Z add.s32 %r165, %r130, 47104; 2026-02-21T09:31:51.3958482Z add.s32 %r172, %r130, 65536; 2026-02-21T09:31:51.3958670Z add.s32 %r174, %r130, 67584; 
2026-02-21T09:31:51.3958845Z add.s32 %r176, %r130, 69632; 2026-02-21T09:31:51.3959033Z add.s32 %r178, %r130, 71680; 2026-02-21T09:31:51.3959205Z add.s32 %r180, %r130, 73728; 2026-02-21T09:31:51.3959387Z add.s32 %r182, %r130, 75776; 2026-02-21T09:31:51.3959559Z add.s32 %r184, %r130, 77824; 2026-02-21T09:31:51.3959740Z add.s32 %r186, %r130, 79872; 2026-02-21T09:31:51.3959917Z or.b32 %r6, %r196, 384; 2026-02-21T09:31:51.3960096Z add.s32 %r299, %r130, 98304; 2026-02-21T09:31:51.3960272Z add.s32 %r301, %r130, 100352; 2026-02-21T09:31:51.3960466Z add.s32 %r303, %r130, 102400; 2026-02-21T09:31:51.3960655Z add.s32 %r305, %r130, 104448; 2026-02-21T09:31:51.3960833Z add.s32 %r307, %r130, 106496; 2026-02-21T09:31:51.3961018Z add.s32 %r309, %r130, 108544; 2026-02-21T09:31:51.3961194Z add.s32 %r311, %r130, 110592; 2026-02-21T09:31:51.3961380Z add.s32 %r313, %r130, 112640; 2026-02-21T09:31:51.3961556Z shl.b32 %r208, %r1, 10; 2026-02-21T09:31:51.3961736Z and.b32 %r209, %r208, 6144; 2026-02-21T09:31:51.3961911Z and.b32 %r210, %r199, 2032; 2026-02-21T09:31:51.3962097Z shr.u32 %r211, %r1, 1; 2026-02-21T09:31:51.3962269Z and.b32 %r212, %r211, 64; 2026-02-21T09:31:51.3962456Z xor.b32 %r213, %r210, %r212; 2026-02-21T09:31:51.3962642Z or.b32 %r214, %r213, %r209; 2026-02-21T09:31:51.3962819Z xor.b32 %r215, %r214, 32; 2026-02-21T09:31:51.3963000Z shl.b32 %r216, %r1, 6; 2026-02-21T09:31:51.3963169Z and.b32 %r217, %r216, 6144; 2026-02-21T09:31:51.3963351Z shl.b32 %r218, %r1, 5; 2026-02-21T09:31:51.3963517Z and.b32 %r219, %r218, 864; 2026-02-21T09:31:51.3963698Z and.b32 %r220, %r1, 224; 2026-02-21T09:31:51.3963871Z and.b32 %r222, %r55, 16; 2026-02-21T09:31:51.3964053Z or.b32 %r223, %r217, %r219; 2026-02-21T09:31:51.3964305Z xor.b32 %r224, %r223, %r220; 2026-02-21T09:31:51.3964518Z add.s32 %r225, %r39, %r222; 2026-02-21T09:31:51.3964742Z add.s32 %r506, %r225, %r224; 2026-02-21T09:31:51.3965049Z .loc 1 27 97 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:27:97 
2026-02-21T09:31:51.3965386Z cvt.u64.u32 %rd63, %r196; 2026-02-21T09:31:51.3965685Z .loc 1 38 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:38:27 2026-02-21T09:31:51.3966021Z shl.b32 %r226, %r3, 7; 2026-02-21T09:31:51.3966191Z and.b32 %r318, %r226, 896; 2026-02-21T09:31:51.3966491Z .loc 1 40 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:40:27 2026-02-21T09:31:51.3966832Z shl.b32 %r227, %r3, 4; 2026-02-21T09:31:51.3967001Z and.b32 %r228, %r227, 16256; 2026-02-21T09:31:51.3967368Z .loc 1 41 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:41:32 2026-02-21T09:31:51.3967735Z or.b32 %r229, %r228, %r4; 2026-02-21T09:31:51.3967926Z or.b32 %r230, %r4, %r227; 2026-02-21T09:31:51.3968213Z .loc 1 51 53 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:53 2026-02-21T09:31:51.3968545Z shl.b32 %r231, %r229, 10; 2026-02-21T09:31:51.3968729Z shl.b32 %r232, %r230, 10; 2026-02-21T09:31:51.3968901Z or.b32 %r233, %r232, 114688; 2026-02-21T09:31:51.3969209Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.3969561Z shfl.sync.idx.b32 %r21, %r198, 0, 31, -1; 2026-02-21T09:31:51.3969790Z shl.b32 %r234, %r21, 21; 2026-02-21T09:31:51.3969972Z and.b32 %r235, %r234, 6291456; 2026-02-21T09:31:51.3970170Z add.s32 %r236, %r235, %r675; 2026-02-21T09:31:51.3970355Z shl.b32 %r237, %r21, 4; 2026-02-21T09:31:51.3970527Z and.b32 %r238, %r237, 64; 2026-02-21T09:31:51.3970705Z add.s32 %r501, %r236, %r238; 2026-02-21T09:31:51.3970881Z mov.pred %p45, -1; 2026-02-21T09:31:51.3971064Z // begin inline asm 2026-02-21T09:31:51.3971483Z @%p45 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r501 + 0], {%r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57}; 2026-02-21T09:31:51.3971934Z // end inline asm 2026-02-21T09:31:51.3972094Z // begin inline asm 2026-02-21T09:31:51.3972509Z @%p45 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r501 + 16], {%r57, %r57, %r57, %r57, %r57, %r57, 
%r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57}; 2026-02-21T09:31:51.3972954Z // end inline asm 2026-02-21T09:31:51.3973110Z // begin inline asm 2026-02-21T09:31:51.3973495Z @%p45 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r501 + 32], {%r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57}; 2026-02-21T09:31:51.3973904Z // end inline asm 2026-02-21T09:31:51.3974064Z // begin inline asm 2026-02-21T09:31:51.3974433Z @%p45 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r501 + 48], {%r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57, %r57}; 2026-02-21T09:31:51.3974911Z // end inline asm 2026-02-21T09:31:51.3975076Z // begin inline asm 2026-02-21T09:31:51.3975263Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:51.3975461Z // end inline asm 2026-02-21T09:31:51.3975613Z bar.sync 0; 2026-02-21T09:31:51.3975899Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.3976226Z add.s32 %r677, %r39, 262176; 2026-02-21T09:31:51.3976414Z // begin inline asm 2026-02-21T09:31:51.3976607Z @%p93 mbarrier.init.shared::cta.b64 [%r677], 1; 2026-02-21T09:31:51.3976837Z // end inline asm 2026-02-21T09:31:51.3976990Z bar.sync 0; 2026-02-21T09:31:51.3977155Z add.s32 %r125, %r39, 262184; 2026-02-21T09:31:51.3977337Z // begin inline asm 2026-02-21T09:31:51.3977525Z @%p93 mbarrier.init.shared::cta.b64 [%r125], 1; 2026-02-21T09:31:51.3977754Z // end inline asm 2026-02-21T09:31:51.3977911Z add.s32 %r126, %r39, 262144; 2026-02-21T09:31:51.3978095Z // begin inline asm 2026-02-21T09:31:51.3978317Z @%p93 mbarrier.init.shared::cta.b64 [%r126], 1; 2026-02-21T09:31:51.3978572Z // end inline asm 2026-02-21T09:31:51.3978722Z bar.sync 0; 2026-02-21T09:31:51.3978884Z add.s32 %r127, %r39, 262152; 2026-02-21T09:31:51.3979068Z // begin inline asm 2026-02-21T09:31:51.3979253Z @%p93 mbarrier.init.shared::cta.b64 [%r127], 1; 2026-02-21T09:31:51.3979472Z // end inline asm 
2026-02-21T09:31:51.3979621Z bar.sync 0; 2026-02-21T09:31:51.3979782Z add.s32 %r128, %r39, 262160; 2026-02-21T09:31:51.3979956Z // begin inline asm 2026-02-21T09:31:51.3980149Z @%p93 mbarrier.init.shared::cta.b64 [%r128], 1; 2026-02-21T09:31:51.3980357Z // end inline asm 2026-02-21T09:31:51.3980519Z bar.sync 0; 2026-02-21T09:31:51.3980672Z add.s32 %r315, %r39, 262168; 2026-02-21T09:31:51.3980860Z // begin inline asm 2026-02-21T09:31:51.3981057Z @%p93 mbarrier.init.shared::cta.b64 [%r315], 1; 2026-02-21T09:31:51.3981300Z // end inline asm 2026-02-21T09:31:51.3981621Z .loc 1 51 60 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:60 2026-02-21T09:31:51.3981959Z or.b32 %r239, %r231, %r196; 2026-02-21T09:31:51.3982154Z or.b32 %r240, %r233, %r196; 2026-02-21T09:31:51.3982465Z .loc 1 51 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:32 2026-02-21T09:31:51.3982813Z mad.wide.u32 %rd36, %r239, 2, %rd14; 2026-02-21T09:31:51.3983030Z cvt.u64.u32 %rd2, %r231; 2026-02-21T09:31:51.3983210Z or.b64 %rd64, %rd2, %rd63; 2026-02-21T09:31:51.3983399Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:31:51.3983597Z add.s64 %rd3, %rd14, %rd65; 2026-02-21T09:31:51.3983788Z add.s64 %rd37, %rd3, 32768; 2026-02-21T09:31:51.3983972Z add.s64 %rd38, %rd3, 65536; 2026-02-21T09:31:51.3984168Z add.s64 %rd39, %rd3, 98304; 2026-02-21T09:31:51.3984353Z add.s64 %rd40, %rd3, 131072; 2026-02-21T09:31:51.3984554Z add.s64 %rd41, %rd3, 163840; 2026-02-21T09:31:51.3984780Z add.s64 %rd42, %rd3, 196608; 2026-02-21T09:31:51.3984988Z mad.wide.u32 %rd43, %r240, 2, %rd14; 2026-02-21T09:31:51.3985200Z mov.b32 %r300, 16; 2026-02-21T09:31:51.3985486Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.3985830Z // begin inline asm 2026-02-21T09:31:51.3986068Z cp.async.cg.shared.global [ %r130 + 0 ], [ %rd36 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3986321Z // end inline asm 2026-02-21T09:31:51.3986477Z // begin inline asm 2026-02-21T09:31:51.3986695Z 
cp.async.cg.shared.global [ %r132 + 0 ], [ %rd37 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3986970Z // end inline asm 2026-02-21T09:31:51.3987124Z // begin inline asm 2026-02-21T09:31:51.3987348Z cp.async.cg.shared.global [ %r134 + 0 ], [ %rd38 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3987618Z // end inline asm 2026-02-21T09:31:51.3987770Z // begin inline asm 2026-02-21T09:31:51.3988037Z cp.async.cg.shared.global [ %r136 + 0 ], [ %rd39 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3988320Z // end inline asm 2026-02-21T09:31:51.3988480Z // begin inline asm 2026-02-21T09:31:51.3988703Z cp.async.cg.shared.global [ %r138 + 0 ], [ %rd40 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3988960Z // end inline asm 2026-02-21T09:31:51.3989118Z // begin inline asm 2026-02-21T09:31:51.3989333Z cp.async.cg.shared.global [ %r140 + 0 ], [ %rd41 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3989593Z // end inline asm 2026-02-21T09:31:51.3989742Z // begin inline asm 2026-02-21T09:31:51.3989967Z cp.async.cg.shared.global [ %r142 + 0 ], [ %rd42 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3990223Z // end inline asm 2026-02-21T09:31:51.3990383Z // begin inline asm 2026-02-21T09:31:51.3990595Z cp.async.cg.shared.global [ %r144 + 0 ], [ %rd43 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.3990870Z // end inline asm 2026-02-21T09:31:51.3991037Z cp.async.commit_group; 2026-02-21T09:31:51.3991338Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.3991688Z bar.sync 0; 2026-02-21T09:31:51.3991871Z // begin inline asm 2026-02-21T09:31:51.3992131Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r126], 32768; 2026-02-21T09:31:51.3992382Z // end inline asm 2026-02-21T09:31:51.3992672Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 2026-02-21T09:31:51.3992994Z bar.sync 0; 2026-02-21T09:31:51.3993171Z elect.sync %r241|%p39, -1; 2026-02-21T09:31:51.3993371Z and.pred %p33, %p38, %p39; 2026-02-21T09:31:51.3993565Z and.b32 %r242, %r21, 1; 
2026-02-21T09:31:51.3993737Z shl.b32 %r23, %r242, 13; 2026-02-21T09:31:51.3993917Z shl.b32 %r243, %r242, 14; 2026-02-21T09:31:51.3994095Z add.s32 %r244, %r39, %r243; 2026-02-21T09:31:51.3994269Z add.s32 %r147, %r244, 131072; 2026-02-21T09:31:51.3994455Z shl.b32 %r148, %r242, 6; 2026-02-21T09:31:51.3994619Z // begin inline asm 2026-02-21T09:31:51.3995217Z @%p33 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd111, {%r148, %r318}], [%r126]; 2026-02-21T09:31:51.3995630Z // end inline asm 2026-02-21T09:31:51.3996007Z .loc 1 51 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:32 2026-02-21T09:31:51.3996392Z add.s64 %rd45, %rd3, 256; 2026-02-21T09:31:51.3996621Z or.b32 %r245, %r239, 128; 2026-02-21T09:31:51.3996885Z mad.wide.u32 %rd66, %r245, 2, %rd14; 2026-02-21T09:31:51.3997137Z add.s64 %rd46, %rd66, 32768; 2026-02-21T09:31:51.3997327Z add.s64 %rd47, %rd66, 65536; 2026-02-21T09:31:51.3997499Z add.s64 %rd48, %rd66, 98304; 2026-02-21T09:31:51.3997681Z add.s64 %rd49, %rd66, 131072; 2026-02-21T09:31:51.3997859Z add.s64 %rd50, %rd66, 163840; 2026-02-21T09:31:51.3998041Z add.s64 %rd51, %rd66, 196608; 2026-02-21T09:31:51.3998216Z cvt.u64.u32 %rd4, %r233; 2026-02-21T09:31:51.3998397Z or.b64 %rd67, %rd4, %rd63; 2026-02-21T09:31:51.3998575Z shl.b64 %rd68, %rd67, 1; 2026-02-21T09:31:51.3998745Z add.s64 %rd5, %rd14, %rd68; 2026-02-21T09:31:51.3998932Z add.s64 %rd52, %rd5, 256; 2026-02-21T09:31:51.3999228Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.3999560Z // begin inline asm 2026-02-21T09:31:51.3999787Z cp.async.cg.shared.global [ %r151 + 0 ], [ %rd45 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4000053Z // end inline asm 2026-02-21T09:31:51.4000203Z // begin inline asm 2026-02-21T09:31:51.4000431Z cp.async.cg.shared.global [ %r153 + 0 ], [ %rd46 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4000684Z // end inline asm 2026-02-21T09:31:51.4000833Z // begin inline asm 
2026-02-21T09:31:51.4001055Z cp.async.cg.shared.global [ %r155 + 0 ], [ %rd47 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4001302Z // end inline asm 2026-02-21T09:31:51.4001460Z // begin inline asm 2026-02-21T09:31:51.4001675Z cp.async.cg.shared.global [ %r157 + 0 ], [ %rd48 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4001926Z // end inline asm 2026-02-21T09:31:51.4002076Z // begin inline asm 2026-02-21T09:31:51.4002301Z cp.async.cg.shared.global [ %r159 + 0 ], [ %rd49 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4002562Z // end inline asm 2026-02-21T09:31:51.4002712Z // begin inline asm 2026-02-21T09:31:51.4002937Z cp.async.cg.shared.global [ %r161 + 0 ], [ %rd50 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4003184Z // end inline asm 2026-02-21T09:31:51.4003364Z // begin inline asm 2026-02-21T09:31:51.4003620Z cp.async.cg.shared.global [ %r163 + 0 ], [ %rd51 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4003922Z // end inline asm 2026-02-21T09:31:51.4004101Z // begin inline asm 2026-02-21T09:31:51.4004367Z cp.async.cg.shared.global [ %r165 + 0 ], [ %rd52 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4004766Z // end inline asm 2026-02-21T09:31:51.4004962Z cp.async.commit_group; 2026-02-21T09:31:51.4005330Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4005650Z bar.sync 0; 2026-02-21T09:31:51.4005809Z // begin inline asm 2026-02-21T09:31:51.4006035Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r127], 32768; 2026-02-21T09:31:51.4006344Z // end inline asm 2026-02-21T09:31:51.4006650Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 2026-02-21T09:31:51.4006973Z bar.sync 0; 2026-02-21T09:31:51.4007138Z elect.sync %r246|%p40, -1; 2026-02-21T09:31:51.4007326Z and.pred %p35, %p38, %p40; 2026-02-21T09:31:51.4007510Z add.s32 %r168, %r244, 163840; 2026-02-21T09:31:51.4007687Z or.b32 %r169, %r148, 128; 2026-02-21T09:31:51.4007860Z // begin inline asm 2026-02-21T09:31:51.4008232Z @%p35 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r168], [%rd111, {%r169, %r318}], [%r127]; 2026-02-21T09:31:51.4008644Z // end inline asm 2026-02-21T09:31:51.4008913Z .loc 1 51 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:32 2026-02-21T09:31:51.4009237Z add.s64 %rd54, %rd3, 512; 2026-02-21T09:31:51.4009453Z or.b32 %r247, %r239, 256; 2026-02-21T09:31:51.4009653Z mad.wide.u32 %rd69, %r247, 2, %rd14; 2026-02-21T09:31:51.4009856Z add.s64 %rd55, %rd69, 32768; 2026-02-21T09:31:51.4010033Z add.s64 %rd56, %rd69, 65536; 2026-02-21T09:31:51.4010208Z add.s64 %rd57, %rd69, 98304; 2026-02-21T09:31:51.4010376Z add.s64 %rd58, %rd69, 131072; 2026-02-21T09:31:51.4010556Z add.s64 %rd59, %rd69, 163840; 2026-02-21T09:31:51.4010726Z add.s64 %rd60, %rd69, 196608; 2026-02-21T09:31:51.4010902Z add.s64 %rd61, %rd5, 512; 2026-02-21T09:31:51.4011192Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.4011509Z // begin inline asm 2026-02-21T09:31:51.4011732Z cp.async.cg.shared.global [ %r172 + 0 ], [ %rd54 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4011974Z // end inline asm 2026-02-21T09:31:51.4012124Z // begin inline asm 2026-02-21T09:31:51.4012334Z cp.async.cg.shared.global [ %r174 + 0 ], [ %rd55 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4012589Z // end inline asm 2026-02-21T09:31:51.4012735Z // begin inline asm 2026-02-21T09:31:51.4012954Z cp.async.cg.shared.global [ %r176 + 0 ], [ %rd56 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4013205Z // end inline asm 2026-02-21T09:31:51.4013350Z // begin inline asm 2026-02-21T09:31:51.4013564Z cp.async.cg.shared.global [ %r178 + 0 ], [ %rd57 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4013803Z // end inline asm 2026-02-21T09:31:51.4013957Z // begin inline asm 2026-02-21T09:31:51.4014164Z cp.async.cg.shared.global [ %r180 + 0 ], [ %rd58 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4014418Z // end inline asm 2026-02-21T09:31:51.4014563Z // begin inline asm 
2026-02-21T09:31:51.4014839Z cp.async.cg.shared.global [ %r182 + 0 ], [ %rd59 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4015091Z // end inline asm 2026-02-21T09:31:51.4015239Z // begin inline asm 2026-02-21T09:31:51.4015462Z cp.async.cg.shared.global [ %r184 + 0 ], [ %rd60 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4015706Z // end inline asm 2026-02-21T09:31:51.4015862Z // begin inline asm 2026-02-21T09:31:51.4016080Z cp.async.cg.shared.global [ %r186 + 0 ], [ %rd61 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4016348Z // end inline asm 2026-02-21T09:31:51.4016500Z cp.async.commit_group; 2026-02-21T09:31:51.4016794Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4017119Z bar.sync 0; 2026-02-21T09:31:51.4017264Z // begin inline asm 2026-02-21T09:31:51.4017480Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r128], 32768; 2026-02-21T09:31:51.4017719Z // end inline asm 2026-02-21T09:31:51.4017986Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 2026-02-21T09:31:51.4018284Z bar.sync 0; 2026-02-21T09:31:51.4018442Z elect.sync %r248|%p41, -1; 2026-02-21T09:31:51.4018622Z and.pred %p37, %p38, %p41; 2026-02-21T09:31:51.4018802Z add.s32 %r189, %r244, 196608; 2026-02-21T09:31:51.4018979Z or.b32 %r190, %r148, 256; 2026-02-21T09:31:51.4019141Z // begin inline asm 2026-02-21T09:31:51.4019515Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r189], [%rd111, {%r190, %r318}], [%r128]; 2026-02-21T09:31:51.4019999Z // end inline asm 2026-02-21T09:31:51.4020271Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.4020586Z cp.async.wait_group 2; 2026-02-21T09:31:51.4020760Z bar.sync 0; 2026-02-21T09:31:51.4021022Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4021337Z // begin inline asm 2026-02-21T09:31:51.4021495Z 2026-02-21T09:31:51.4021620Z { 2026-02-21T09:31:51.4021767Z .reg 
.pred complete; 2026-02-21T09:31:51.4021927Z waitLoop: 2026-02-21T09:31:51.4022139Z mbarrier.try_wait.parity.shared.b64 complete, [%r126], %r57; 2026-02-21T09:31:51.4022393Z @!complete bra.uni waitLoop; 2026-02-21T09:31:51.4022564Z } 2026-02-21T09:31:51.4022640Z 2026-02-21T09:31:51.4022737Z // end inline asm 2026-02-21T09:31:51.4023042Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4023373Z setp.ne.b32 %p42, %r21, 0; 2026-02-21T09:31:51.4023549Z @%p42 bra $L__BB0_3; 2026-02-21T09:31:51.4023711Z // %bb.2: 2026-02-21T09:31:51.4023970Z .loc 1 0 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:0:52 2026-02-21T09:31:51.4024290Z add.s32 %r266, %r39, 131072; 2026-02-21T09:31:51.4024463Z add.s32 %r267, %r39, 147552; 2026-02-21T09:31:51.4024653Z bfe.u32 %r268, %r267, 4, 14; 2026-02-21T09:31:51.4024900Z cvt.u64.u32 %rd87, %r268; 2026-02-21T09:31:51.4025096Z or.b64 %rd85, %rd87, 4611686293372403712; 2026-02-21T09:31:51.4025304Z add.s32 %r269, %r39, 16480; 2026-02-21T09:31:51.4025479Z bfe.u32 %r270, %r269, 4, 14; 2026-02-21T09:31:51.4025656Z cvt.u64.u32 %rd88, %r270; 2026-02-21T09:31:51.4025839Z or.b64 %rd84, %rd88, 4611686293372403712; 2026-02-21T09:31:51.4026047Z add.s32 %r271, %r39, 147520; 2026-02-21T09:31:51.4026217Z bfe.u32 %r272, %r271, 4, 14; 2026-02-21T09:31:51.4026396Z cvt.u64.u32 %rd89, %r272; 2026-02-21T09:31:51.4026576Z or.b64 %rd83, %rd89, 4611686293372403712; 2026-02-21T09:31:51.4026783Z add.s32 %r273, %r39, 16448; 2026-02-21T09:31:51.4026963Z bfe.u32 %r274, %r273, 4, 14; 2026-02-21T09:31:51.4027132Z cvt.u64.u32 %rd90, %r274; 2026-02-21T09:31:51.4027321Z or.b64 %rd82, %rd90, 4611686293372403712; 2026-02-21T09:31:51.4027512Z add.s32 %r275, %r39, 147488; 2026-02-21T09:31:51.4027689Z bfe.u32 %r276, %r275, 4, 14; 2026-02-21T09:31:51.4027859Z cvt.u64.u32 %rd91, %r276; 2026-02-21T09:31:51.4028040Z or.b64 %rd81, %rd91, 4611686293372403712; 2026-02-21T09:31:51.4028231Z add.s32 %r277, %r39, 16416; 
2026-02-21T09:31:51.4028407Z bfe.u32 %r278, %r277, 4, 14; 2026-02-21T09:31:51.4028584Z cvt.u64.u32 %rd92, %r278; 2026-02-21T09:31:51.4028758Z or.b64 %rd80, %rd92, 4611686293372403712; 2026-02-21T09:31:51.4028964Z add.s32 %r279, %r39, 147456; 2026-02-21T09:31:51.4029135Z bfe.u32 %r280, %r279, 4, 14; 2026-02-21T09:31:51.4029312Z cvt.u64.u32 %rd93, %r280; 2026-02-21T09:31:51.4029487Z or.b64 %rd79, %rd93, 4611686293372403712; 2026-02-21T09:31:51.4029682Z add.s32 %r281, %r39, 16384; 2026-02-21T09:31:51.4029848Z bfe.u32 %r282, %r281, 4, 14; 2026-02-21T09:31:51.4030024Z cvt.u64.u32 %rd94, %r282; 2026-02-21T09:31:51.4030197Z or.b64 %rd78, %rd94, 4611686293372403712; 2026-02-21T09:31:51.4030395Z add.s32 %r283, %r39, 131168; 2026-02-21T09:31:51.4030570Z bfe.u32 %r284, %r283, 4, 14; 2026-02-21T09:31:51.4030738Z cvt.u64.u32 %rd95, %r284; 2026-02-21T09:31:51.4030919Z or.b64 %rd77, %rd95, 4611686293372403712; 2026-02-21T09:31:51.4031112Z add.s32 %r285, %r39, 96; 2026-02-21T09:31:51.4031284Z bfe.u32 %r286, %r285, 4, 14; 2026-02-21T09:31:51.4031451Z cvt.u64.u32 %rd96, %r286; 2026-02-21T09:31:51.4031631Z or.b64 %rd76, %rd96, 4611686293372403712; 2026-02-21T09:31:51.4031821Z add.s32 %r287, %r39, 131136; 2026-02-21T09:31:51.4031997Z bfe.u32 %r288, %r287, 4, 14; 2026-02-21T09:31:51.4032175Z cvt.u64.u32 %rd97, %r288; 2026-02-21T09:31:51.4032395Z or.b64 %rd75, %rd97, 4611686293372403712; 2026-02-21T09:31:51.4032629Z add.s32 %r289, %r39, 64; 2026-02-21T09:31:51.4032796Z bfe.u32 %r290, %r289, 4, 14; 2026-02-21T09:31:51.4032977Z cvt.u64.u32 %rd98, %r290; 2026-02-21T09:31:51.4033156Z or.b64 %rd74, %rd98, 4611686293372403712; 2026-02-21T09:31:51.4033354Z add.s32 %r291, %r39, 131104; 2026-02-21T09:31:51.4033528Z bfe.u32 %r292, %r291, 4, 14; 2026-02-21T09:31:51.4033710Z cvt.u64.u32 %rd99, %r292; 2026-02-21T09:31:51.4033888Z or.b64 %rd73, %rd99, 4611686293372403712; 2026-02-21T09:31:51.4034092Z add.s32 %r293, %r39, 32; 2026-02-21T09:31:51.4034266Z bfe.u32 %r294, %r293, 4, 14; 
2026-02-21T09:31:51.4034442Z cvt.u64.u32 %rd100, %r294; 2026-02-21T09:31:51.4034635Z or.b64 %rd72, %rd100, 4611686293372403712; 2026-02-21T09:31:51.4034892Z bfe.u32 %r295, %r266, 4, 14; 2026-02-21T09:31:51.4035079Z cvt.u64.u32 %rd101, %r295; 2026-02-21T09:31:51.4035356Z or.b64 %rd71, %rd101, 4611686293372403712; 2026-02-21T09:31:51.4035569Z bfe.u32 %r296, %r39, 4, 14; 2026-02-21T09:31:51.4035739Z cvt.u64.u32 %rd102, %r296; 2026-02-21T09:31:51.4035929Z or.b64 %rd70, %rd102, 4611686293372403712; 2026-02-21T09:31:51.4036272Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4036604Z elect.sync %r297|%p44, -1; 2026-02-21T09:31:51.4036796Z mov.b32 %r250, 136314896; 2026-02-21T09:31:51.4036963Z mov.pred %p43, 0; 2026-02-21T09:31:51.4037126Z // begin inline asm 2026-02-21T09:31:51.4037385Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd70, %rd71, %r250, %p43; 2026-02-21T09:31:51.4037690Z // end inline asm 2026-02-21T09:31:51.4037845Z // begin inline asm 2026-02-21T09:31:51.4038102Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd72, %rd73, %r250, %p45; 2026-02-21T09:31:51.4038391Z // end inline asm 2026-02-21T09:31:51.4038548Z // begin inline asm 2026-02-21T09:31:51.4038799Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd74, %rd75, %r250, %p45; 2026-02-21T09:31:51.4039085Z // end inline asm 2026-02-21T09:31:51.4039241Z // begin inline asm 2026-02-21T09:31:51.4039475Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd76, %rd77, %r250, %p45; 2026-02-21T09:31:51.4039758Z // end inline asm 2026-02-21T09:31:51.4039916Z // begin inline asm 2026-02-21T09:31:51.4040149Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd78, %rd79, %r250, %p45; 2026-02-21T09:31:51.4040431Z // end inline asm 2026-02-21T09:31:51.4040577Z // begin inline asm 2026-02-21T09:31:51.4040832Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd80, %rd81, %r250, %p45; 2026-02-21T09:31:51.4041121Z // 
end inline asm 2026-02-21T09:31:51.4041276Z // begin inline asm 2026-02-21T09:31:51.4041520Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd82, %rd83, %r250, %p45; 2026-02-21T09:31:51.4041815Z // end inline asm 2026-02-21T09:31:51.4041969Z // begin inline asm 2026-02-21T09:31:51.4042223Z @%p44 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd84, %rd85, %r250, %p45; 2026-02-21T09:31:51.4042520Z // end inline asm 2026-02-21T09:31:51.4042683Z add.s32 %r298, %r39, 262176; 2026-02-21T09:31:51.4042861Z cvt.u64.u32 %rd86, %r298; 2026-02-21T09:31:51.4043020Z // begin inline asm 2026-02-21T09:31:51.4043252Z @%p44 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd86]; 2026-02-21T09:31:51.4043515Z // end inline asm 2026-02-21T09:31:51.4043658Z $L__BB0_3: 2026-02-21T09:31:51.4043923Z .loc 1 0 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:0:52 2026-02-21T09:31:51.4044264Z ld.param.b64 %rd15, [_helion_matmul_param_2]; 2026-02-21T09:31:51.4044485Z add.s32 %r15, %r39, %r214; 2026-02-21T09:31:51.4044656Z add.s32 %r16, %r39, %r215; 2026-02-21T09:31:51.4044932Z add.s32 %r511, %r506, 1024; 2026-02-21T09:31:51.4045108Z or.b32 %r20, %r318, %r196; 2026-02-21T09:31:51.4045424Z .loc 1 51 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:32 2026-02-21T09:31:51.4045806Z add.s64 %rd103, %rd3, 768; 2026-02-21T09:31:51.4046029Z cvt.u64.u32 %rd113, %r6; 2026-02-21T09:31:51.4046216Z add.s64 %rd114, %rd2, %rd113; 2026-02-21T09:31:51.4046396Z shl.b64 %rd115, %rd114, 1; 2026-02-21T09:31:51.4046585Z add.s64 %rd116, %rd14, %rd115; 2026-02-21T09:31:51.4046774Z add.s64 %rd104, %rd116, 32768; 2026-02-21T09:31:51.4046964Z add.s64 %rd105, %rd116, 65536; 2026-02-21T09:31:51.4047143Z add.s64 %rd106, %rd116, 98304; 2026-02-21T09:31:51.4047334Z add.s64 %rd107, %rd116, 131072; 2026-02-21T09:31:51.4047527Z add.s64 %rd108, %rd116, 163840; 2026-02-21T09:31:51.4047712Z add.s64 %rd109, %rd116, 196608; 2026-02-21T09:31:51.4047905Z add.s64 %rd110, %rd5, 768; 
2026-02-21T09:31:51.4048201Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.4048523Z bar.sync 0; 2026-02-21T09:31:51.4048706Z // begin inline asm 2026-02-21T09:31:51.4048965Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd103 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4049226Z // end inline asm 2026-02-21T09:31:51.4049381Z // begin inline asm 2026-02-21T09:31:51.4049600Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd104 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4049862Z // end inline asm 2026-02-21T09:31:51.4050010Z // begin inline asm 2026-02-21T09:31:51.4050220Z cp.async.cg.shared.global [ %r303 + 0 ], [ %rd105 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4050475Z // end inline asm 2026-02-21T09:31:51.4050625Z // begin inline asm 2026-02-21T09:31:51.4050847Z cp.async.cg.shared.global [ %r305 + 0 ], [ %rd106 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4051093Z // end inline asm 2026-02-21T09:31:51.4051247Z // begin inline asm 2026-02-21T09:31:51.4051460Z cp.async.cg.shared.global [ %r307 + 0 ], [ %rd107 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4051724Z // end inline asm 2026-02-21T09:31:51.4051868Z // begin inline asm 2026-02-21T09:31:51.4052073Z cp.async.cg.shared.global [ %r309 + 0 ], [ %rd108 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4052325Z // end inline asm 2026-02-21T09:31:51.4052473Z // begin inline asm 2026-02-21T09:31:51.4052691Z cp.async.cg.shared.global [ %r311 + 0 ], [ %rd109 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4052932Z // end inline asm 2026-02-21T09:31:51.4053081Z // begin inline asm 2026-02-21T09:31:51.4053294Z cp.async.cg.shared.global [ %r313 + 0 ], [ %rd110 + 0 ], 0x10, %r300; 2026-02-21T09:31:51.4053546Z // end inline asm 2026-02-21T09:31:51.4053705Z cp.async.commit_group; 2026-02-21T09:31:51.4053993Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4054318Z // begin inline asm 2026-02-21T09:31:51.4054530Z @%p93 mbarrier.arrive.expect_tx.shared.b64 _, [%r315], 
32768; 2026-02-21T09:31:51.4054828Z // end inline asm 2026-02-21T09:31:51.4055101Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 2026-02-21T09:31:51.4055431Z bar.sync 0; 2026-02-21T09:31:51.4055590Z elect.sync %r325|%p63, -1; 2026-02-21T09:31:51.4055784Z and.pred %p61, %p38, %p63; 2026-02-21T09:31:51.4055969Z shl.b32 %r326, %r23, 1; 2026-02-21T09:31:51.4056144Z add.s32 %r327, %r39, %r326; 2026-02-21T09:31:51.4056331Z add.s32 %r316, %r327, 229376; 2026-02-21T09:31:51.4056507Z or.b32 %r317, %r148, 384; 2026-02-21T09:31:51.4056683Z // begin inline asm 2026-02-21T09:31:51.4057061Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r316], [%rd111, {%r317, %r318}], [%r315]; 2026-02-21T09:31:51.4057478Z // end inline asm 2026-02-21T09:31:51.4057757Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4058077Z shl.b64 %rd117, %rd4, 1; 2026-02-21T09:31:51.4058259Z add.s64 %rd6, %rd117, 1024; 2026-02-21T09:31:51.4058434Z and.b32 %r328, %r1, 15; 2026-02-21T09:31:51.4058622Z mad.wide.u32 %rd303, %r328, 16, %rd14; 2026-02-21T09:31:51.4058821Z shl.b32 %r329, %r3, 14; 2026-02-21T09:31:51.4059083Z and.b32 %r330, %r329, 16646144; 2026-02-21T09:31:51.4059308Z shl.b32 %r331, %r4, 10; 2026-02-21T09:31:51.4059488Z or.b32 %r332, %r330, %r331; 2026-02-21T09:31:51.4059675Z mul.wide.u32 %rd8, %r332, 2; 2026-02-21T09:31:51.4059858Z mul.wide.u32 %rd118, %r242, 64; 2026-02-21T09:31:51.4060056Z or.b64 %rd9, %rd118, 512; 2026-02-21T09:31:51.4060224Z mov.b32 %r681, 1; 2026-02-21T09:31:51.4060388Z mov.b32 %r680, 3; 2026-02-21T09:31:51.4060537Z mov.b32 %r676, 0; 2026-02-21T09:31:51.4060691Z mov.b64 %rd304, 0; 2026-02-21T09:31:51.4060847Z mov.b32 %r678, %r676; 2026-02-21T09:31:51.4061016Z mov.b32 %r679, %r676; 2026-02-21T09:31:51.4061176Z mov.b32 %r682, %r676; 2026-02-21T09:31:51.4061341Z bra.uni $L__BB0_4; 2026-02-21T09:31:51.4061547Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 
2026-02-21T09:31:51.4062020Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4062399Z setp.lt.u64 %p85, %rd304, 512; 2026-02-21T09:31:51.4062706Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4063034Z // begin inline asm 2026-02-21T09:31:51.4063187Z 2026-02-21T09:31:51.4063337Z { 2026-02-21T09:31:51.4063482Z .reg .pred complete; 2026-02-21T09:31:51.4063667Z waitLoop: 2026-02-21T09:31:51.4063886Z mbarrier.try_wait.parity.shared.b64 complete, [%r677], %r676; 2026-02-21T09:31:51.4064159Z @!complete bra.uni waitLoop; 2026-02-21T09:31:51.4064337Z } 2026-02-21T09:31:51.4064413Z 2026-02-21T09:31:51.4064477Z // end inline asm 2026-02-21T09:31:51.4064640Z add.s32 %r417, %r681, 1; 2026-02-21T09:31:51.4064876Z setp.gt.s32 %p88, %r417, 1; 2026-02-21T09:31:51.4065068Z selp.b32 %r681, 0, %r417, %p88; 2026-02-21T09:31:51.4065253Z selp.b32 %r418, 1, 0, %p88; 2026-02-21T09:31:51.4065440Z xor.b32 %r37, %r682, %r418; 2026-02-21T09:31:51.4065735Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4066081Z add.s32 %r419, %r680, 1; 2026-02-21T09:31:51.4066275Z setp.gt.s32 %p89, %r419, 3; 2026-02-21T09:31:51.4066462Z selp.b32 %r680, 0, %r419, %p89; 2026-02-21T09:31:51.4066786Z .loc 1 51 32 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:32 2026-02-21T09:31:51.4067128Z add.s64 %rd161, %rd303, %rd8; 2026-02-21T09:31:51.4067330Z add.s64 %rd152, %rd161, 1024; 2026-02-21T09:31:51.4067514Z add.s64 %rd153, %rd161, 33792; 2026-02-21T09:31:51.4067706Z add.s64 %rd154, %rd161, 66560; 2026-02-21T09:31:51.4067892Z add.s64 %rd155, %rd161, 99328; 2026-02-21T09:31:51.4068077Z add.s64 %rd156, %rd161, 132096; 2026-02-21T09:31:51.4068271Z add.s64 %rd157, %rd161, 164864; 2026-02-21T09:31:51.4068451Z add.s64 %rd158, %rd161, 197632; 2026-02-21T09:31:51.4068775Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 
2026-02-21T09:31:51.4069119Z add.s64 %rd159, %rd303, %rd6; 2026-02-21T09:31:51.4069317Z shl.b32 %r420, %r680, 15; 2026-02-21T09:31:51.4069492Z add.s32 %r422, %r39, %r420; 2026-02-21T09:31:51.4069671Z bar.sync 0; 2026-02-21T09:31:51.4069823Z add.s32 %r396, %r422, %r5; 2026-02-21T09:31:51.4070015Z selp.b32 %r397, 16, 0, %p85; 2026-02-21T09:31:51.4070208Z // begin inline asm 2026-02-21T09:31:51.4070441Z cp.async.cg.shared.global [ %r396 + 0 ], [ %rd152 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4070715Z // end inline asm 2026-02-21T09:31:51.4070871Z add.s32 %r398, %r396, 2048; 2026-02-21T09:31:51.4071058Z // begin inline asm 2026-02-21T09:31:51.4071289Z cp.async.cg.shared.global [ %r398 + 0 ], [ %rd153 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4071566Z // end inline asm 2026-02-21T09:31:51.4071721Z add.s32 %r400, %r396, 4096; 2026-02-21T09:31:51.4071899Z // begin inline asm 2026-02-21T09:31:51.4072129Z cp.async.cg.shared.global [ %r400 + 0 ], [ %rd154 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4072377Z // end inline asm 2026-02-21T09:31:51.4072535Z add.s32 %r402, %r396, 6144; 2026-02-21T09:31:51.4072775Z // begin inline asm 2026-02-21T09:31:51.4073002Z cp.async.cg.shared.global [ %r402 + 0 ], [ %rd155 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4073250Z // end inline asm 2026-02-21T09:31:51.4073403Z add.s32 %r404, %r396, 8192; 2026-02-21T09:31:51.4073569Z // begin inline asm 2026-02-21T09:31:51.4073794Z cp.async.cg.shared.global [ %r404 + 0 ], [ %rd156 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4074048Z // end inline asm 2026-02-21T09:31:51.4074200Z add.s32 %r406, %r396, 10240; 2026-02-21T09:31:51.4074379Z // begin inline asm 2026-02-21T09:31:51.4074594Z cp.async.cg.shared.global [ %r406 + 0 ], [ %rd157 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4074928Z // end inline asm 2026-02-21T09:31:51.4075080Z add.s32 %r408, %r396, 12288; 2026-02-21T09:31:51.4075256Z // begin inline asm 2026-02-21T09:31:51.4075508Z cp.async.cg.shared.global [ %r408 + 0 ], [ %rd158 + 0 ], 0x10, %r397; 
2026-02-21T09:31:51.4075813Z // end inline asm 2026-02-21T09:31:51.4075976Z add.s32 %r410, %r396, 14336; 2026-02-21T09:31:51.4076151Z // begin inline asm 2026-02-21T09:31:51.4076375Z cp.async.cg.shared.global [ %r410 + 0 ], [ %rd159 + 0 ], 0x10, %r397; 2026-02-21T09:31:51.4076621Z // end inline asm 2026-02-21T09:31:51.4076783Z cp.async.commit_group; 2026-02-21T09:31:51.4077069Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4077401Z shl.b32 %r423, %r680, 3; 2026-02-21T09:31:51.4077571Z add.s32 %r424, %r39, %r423; 2026-02-21T09:31:51.4077752Z add.s32 %r416, %r424, 262144; 2026-02-21T09:31:51.4077944Z and.pred %p83, %p93, %p85; 2026-02-21T09:31:51.4078118Z // begin inline asm 2026-02-21T09:31:51.4078337Z @%p83 mbarrier.arrive.expect_tx.shared.b64 _, [%r416], 32768; 2026-02-21T09:31:51.4078584Z // end inline asm 2026-02-21T09:31:51.4078861Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 2026-02-21T09:31:51.4079174Z bar.sync 0; 2026-02-21T09:31:51.4079337Z elect.sync %r425|%p90, -1; 2026-02-21T09:31:51.4079517Z and.pred %p91, %p85, %p90; 2026-02-21T09:31:51.4079700Z and.pred %p84, %p38, %p91; 2026-02-21T09:31:51.4079875Z add.s32 %r413, %r147, %r420; 2026-02-21T09:31:51.4080061Z add.s64 %rd162, %rd9, %rd304; 2026-02-21T09:31:51.4080247Z cvt.u32.u64 %r414, %rd162; 2026-02-21T09:31:51.4080416Z // begin inline asm 2026-02-21T09:31:51.4080795Z @%p84 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r413], [%rd111, {%r414, %r318}], [%r416]; 2026-02-21T09:31:51.4081201Z // end inline asm 2026-02-21T09:31:51.4081481Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4081802Z add.s64 %rd12, %rd304, 128; 2026-02-21T09:31:51.4081987Z add.s64 %rd303, %rd303, 256; 2026-02-21T09:31:51.4082172Z setp.lt.u64 %p92, %rd304, 768; 2026-02-21T09:31:51.4082356Z mov.b64 %rd304, %rd12; 2026-02-21T09:31:51.4082532Z mov.b32 %r676, %r682; 
2026-02-21T09:31:51.4082694Z mov.b32 %r677, %r426; 2026-02-21T09:31:51.4082863Z mov.b32 %r682, %r37; 2026-02-21T09:31:51.4083025Z @%p92 bra $L__BB0_4; 2026-02-21T09:31:51.4083195Z bra.uni $L__BB0_7; 2026-02-21T09:31:51.4083405Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:31:51.4083674Z add.s32 %r336, %r679, 1; 2026-02-21T09:31:51.4083852Z setp.gt.s32 %p65, %r336, 3; 2026-02-21T09:31:51.4084041Z selp.b32 %r679, 0, %r336, %p65; 2026-02-21T09:31:51.4084231Z selp.b32 %r337, 1, 0, %p65; 2026-02-21T09:31:51.4084405Z xor.b32 %r678, %r678, %r337; 2026-02-21T09:31:51.4084758Z .loc 1 51 85 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:51:85 2026-02-21T09:31:51.4085114Z cp.async.wait_group 2; 2026-02-21T09:31:51.4085298Z bar.sync 0; 2026-02-21T09:31:51.4085590Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4085951Z shl.b32 %r338, %r679, 3; 2026-02-21T09:31:51.4086174Z add.s32 %r340, %r39, %r338; 2026-02-21T09:31:51.4086391Z add.s32 %r334, %r340, 262144; 2026-02-21T09:31:51.4086575Z // begin inline asm 2026-02-21T09:31:51.4086724Z 2026-02-21T09:31:51.4086855Z { 2026-02-21T09:31:51.4086993Z .reg .pred complete; 2026-02-21T09:31:51.4087162Z waitLoop: 2026-02-21T09:31:51.4087375Z mbarrier.try_wait.parity.shared.b64 complete, [%r334], %r678; 2026-02-21T09:31:51.4087654Z @!complete bra.uni waitLoop; 2026-02-21T09:31:51.4087824Z } 2026-02-21T09:31:51.4087906Z 2026-02-21T09:31:51.4087970Z // end inline asm 2026-02-21T09:31:51.4088133Z shl.b32 %r341, %r681, 3; 2026-02-21T09:31:51.4088304Z add.s32 %r342, %r39, %r341; 2026-02-21T09:31:51.4088487Z add.s32 %r426, %r342, 262176; 2026-02-21T09:31:51.4088787Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4089119Z @%p42 bra $L__BB0_6; 2026-02-21T09:31:51.4089396Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:31:51.4089764Z .loc 1 52 44 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:52:44 
2026-02-21T09:31:51.4090087Z shl.b32 %r359, %r679, 15; 2026-02-21T09:31:51.4090262Z add.s32 %r361, %r39, %r359; 2026-02-21T09:31:51.4090441Z add.s32 %r362, %r361, 131072; 2026-02-21T09:31:51.4090731Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4091062Z elect.sync %r363|%p67, -1; 2026-02-21T09:31:51.4091246Z bfe.u32 %r364, %r361, 4, 14; 2026-02-21T09:31:51.4091434Z cvt.u64.u32 %rd136, %r364; 2026-02-21T09:31:51.4091623Z or.b64 %rd119, %rd136, 4611686293372403712; 2026-02-21T09:31:51.4091845Z bfe.u32 %r365, %r362, 4, 14; 2026-02-21T09:31:51.4092020Z cvt.u64.u32 %rd137, %r365; 2026-02-21T09:31:51.4092218Z or.b64 %rd120, %rd137, 4611686293372403712; 2026-02-21T09:31:51.4092430Z mov.b32 %r344, 136314896; 2026-02-21T09:31:51.4092602Z mov.pred %p66, -1; 2026-02-21T09:31:51.4092773Z // begin inline asm 2026-02-21T09:31:51.4093032Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd119, %rd120, %r344, %p66; 2026-02-21T09:31:51.4093338Z // end inline asm 2026-02-21T09:31:51.4093493Z add.s32 %r366, %r361, 32; 2026-02-21T09:31:51.4093669Z bfe.u32 %r367, %r366, 4, 14; 2026-02-21T09:31:51.4093841Z cvt.u64.u32 %rd138, %r367; 2026-02-21T09:31:51.4094030Z or.b64 %rd121, %rd138, 4611686293372403712; 2026-02-21T09:31:51.4094237Z add.s32 %r368, %r361, 131104; 2026-02-21T09:31:51.4094409Z bfe.u32 %r369, %r368, 4, 14; 2026-02-21T09:31:51.4094587Z cvt.u64.u32 %rd139, %r369; 2026-02-21T09:31:51.4094828Z or.b64 %rd122, %rd139, 4611686293372403712; 2026-02-21T09:31:51.4095035Z // begin inline asm 2026-02-21T09:31:51.4095287Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd121, %rd122, %r344, %p66; 2026-02-21T09:31:51.4095581Z // end inline asm 2026-02-21T09:31:51.4095735Z add.s32 %r370, %r361, 64; 2026-02-21T09:31:51.4095914Z bfe.u32 %r371, %r370, 4, 14; 2026-02-21T09:31:51.4096096Z cvt.u64.u32 %rd140, %r371; 2026-02-21T09:31:51.4096283Z or.b64 %rd123, %rd140, 4611686293372403712; 2026-02-21T09:31:51.4096490Z add.s32 
%r372, %r361, 131136; 2026-02-21T09:31:51.4096667Z bfe.u32 %r373, %r372, 4, 14; 2026-02-21T09:31:51.4096847Z cvt.u64.u32 %rd141, %r373; 2026-02-21T09:31:51.4097030Z or.b64 %rd124, %rd141, 4611686293372403712; 2026-02-21T09:31:51.4097235Z // begin inline asm 2026-02-21T09:31:51.4097482Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd123, %rd124, %r344, %p66; 2026-02-21T09:31:51.4097776Z // end inline asm 2026-02-21T09:31:51.4097937Z add.s32 %r374, %r361, 96; 2026-02-21T09:31:51.4098108Z bfe.u32 %r375, %r374, 4, 14; 2026-02-21T09:31:51.4098291Z cvt.u64.u32 %rd142, %r375; 2026-02-21T09:31:51.4098474Z or.b64 %rd125, %rd142, 4611686293372403712; 2026-02-21T09:31:51.4098683Z add.s32 %r376, %r361, 131168; 2026-02-21T09:31:51.4098861Z bfe.u32 %r377, %r376, 4, 14; 2026-02-21T09:31:51.4099045Z cvt.u64.u32 %rd143, %r377; 2026-02-21T09:31:51.4099283Z or.b64 %rd126, %rd143, 4611686293372403712; 2026-02-21T09:31:51.4099516Z // begin inline asm 2026-02-21T09:31:51.4099766Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd125, %rd126, %r344, %p66; 2026-02-21T09:31:51.4100058Z // end inline asm 2026-02-21T09:31:51.4100223Z add.s32 %r378, %r361, 16384; 2026-02-21T09:31:51.4100394Z bfe.u32 %r379, %r378, 4, 14; 2026-02-21T09:31:51.4100577Z cvt.u64.u32 %rd144, %r379; 2026-02-21T09:31:51.4100759Z or.b64 %rd127, %rd144, 4611686293372403712; 2026-02-21T09:31:51.4100969Z add.s32 %r380, %r361, 147456; 2026-02-21T09:31:51.4101146Z bfe.u32 %r381, %r380, 4, 14; 2026-02-21T09:31:51.4101325Z cvt.u64.u32 %rd145, %r381; 2026-02-21T09:31:51.4101513Z or.b64 %rd128, %rd145, 4611686293372403712; 2026-02-21T09:31:51.4101710Z // begin inline asm 2026-02-21T09:31:51.4101997Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd127, %rd128, %r344, %p66; 2026-02-21T09:31:51.4102320Z // end inline asm 2026-02-21T09:31:51.4102487Z add.s32 %r382, %r361, 16416; 2026-02-21T09:31:51.4102657Z bfe.u32 %r383, %r382, 4, 14; 2026-02-21T09:31:51.4102835Z cvt.u64.u32 %rd146, %r383; 
2026-02-21T09:31:51.4103016Z or.b64 %rd129, %rd146, 4611686293372403712; 2026-02-21T09:31:51.4103221Z add.s32 %r384, %r361, 147488; 2026-02-21T09:31:51.4103402Z bfe.u32 %r385, %r384, 4, 14; 2026-02-21T09:31:51.4103572Z cvt.u64.u32 %rd147, %r385; 2026-02-21T09:31:51.4103763Z or.b64 %rd130, %rd147, 4611686293372403712; 2026-02-21T09:31:51.4103955Z // begin inline asm 2026-02-21T09:31:51.4104210Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd129, %rd130, %r344, %p66; 2026-02-21T09:31:51.4104490Z // end inline asm 2026-02-21T09:31:51.4104648Z add.s32 %r386, %r361, 16448; 2026-02-21T09:31:51.4104867Z bfe.u32 %r387, %r386, 4, 14; 2026-02-21T09:31:51.4105045Z cvt.u64.u32 %rd148, %r387; 2026-02-21T09:31:51.4105237Z or.b64 %rd131, %rd148, 4611686293372403712; 2026-02-21T09:31:51.4105439Z add.s32 %r388, %r361, 147520; 2026-02-21T09:31:51.4105619Z bfe.u32 %r389, %r388, 4, 14; 2026-02-21T09:31:51.4105790Z cvt.u64.u32 %rd149, %r389; 2026-02-21T09:31:51.4105976Z or.b64 %rd132, %rd149, 4611686293372403712; 2026-02-21T09:31:51.4106170Z // begin inline asm 2026-02-21T09:31:51.4106418Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd131, %rd132, %r344, %p66; 2026-02-21T09:31:51.4106699Z // end inline asm 2026-02-21T09:31:51.4106858Z add.s32 %r390, %r361, 16480; 2026-02-21T09:31:51.4107040Z bfe.u32 %r391, %r390, 4, 14; 2026-02-21T09:31:51.4107214Z cvt.u64.u32 %rd150, %r391; 2026-02-21T09:31:51.4107404Z or.b64 %rd133, %rd150, 4611686293372403712; 2026-02-21T09:31:51.4107600Z add.s32 %r392, %r361, 147552; 2026-02-21T09:31:51.4107780Z bfe.u32 %r393, %r392, 4, 14; 2026-02-21T09:31:51.4107951Z cvt.u64.u32 %rd151, %r393; 2026-02-21T09:31:51.4108141Z or.b64 %rd134, %rd151, 4611686293372403712; 2026-02-21T09:31:51.4108339Z // begin inline asm 2026-02-21T09:31:51.4108592Z @%p67 tcgen05.mma.cta_group::1.kind::f16 [ %r675 + 0 ], %rd133, %rd134, %r344, %p66; 2026-02-21T09:31:51.4108883Z // end inline asm 2026-02-21T09:31:51.4109037Z cvt.u64.u32 %rd135, %r426; 
2026-02-21T09:31:51.4109215Z // begin inline asm 2026-02-21T09:31:51.4109450Z @%p67 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd135]; 2026-02-21T09:31:51.4109718Z // end inline asm 2026-02-21T09:31:51.4109869Z bra.uni $L__BB0_6; 2026-02-21T09:31:51.4110076Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:31:51.4110431Z .loc 1 0 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:0:52 2026-02-21T09:31:51.4110761Z cvt.u32.u64 %r574, %rd4; 2026-02-21T09:31:51.4110938Z cvt.u32.u64 %r575, %rd2; 2026-02-21T09:31:51.4111100Z mov.b32 %r427, 1; 2026-02-21T09:31:51.4111385Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4111709Z // begin inline asm 2026-02-21T09:31:51.4111867Z 2026-02-21T09:31:51.4112040Z { 2026-02-21T09:31:51.4112243Z .reg .pred complete; 2026-02-21T09:31:51.4112409Z waitLoop: 2026-02-21T09:31:51.4112628Z mbarrier.try_wait.parity.shared.b64 complete, [%r426], %r427; 2026-02-21T09:31:51.4112902Z @!complete bra.uni waitLoop; 2026-02-21T09:31:51.4113072Z } 2026-02-21T09:31:51.4113146Z 2026-02-21T09:31:51.4113216Z // end inline asm 2026-02-21T09:31:51.4113492Z .loc 1 46 57 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:46:57 2026-02-21T09:31:51.4113830Z cp.async.wait_group 0; 2026-02-21T09:31:51.4114003Z bar.sync 0; 2026-02-21T09:31:51.4114160Z // begin inline asm 2026-02-21T09:31:51.4114344Z @%p93 mbarrier.inval.shared::cta.b64 [%r126]; 2026-02-21T09:31:51.4114564Z // end inline asm 2026-02-21T09:31:51.4114779Z bar.sync 0; 2026-02-21T09:31:51.4114924Z // begin inline asm 2026-02-21T09:31:51.4115151Z @%p93 mbarrier.inval.shared::cta.b64 [%r127]; 2026-02-21T09:31:51.4115363Z // end inline asm 2026-02-21T09:31:51.4115556Z bar.sync 0; 2026-02-21T09:31:51.4115701Z // begin inline asm 2026-02-21T09:31:51.4115887Z @%p93 mbarrier.inval.shared::cta.b64 [%r128]; 2026-02-21T09:31:51.4116094Z // end inline asm 2026-02-21T09:31:51.4116248Z bar.sync 0; 2026-02-21T09:31:51.4116389Z // 
begin inline asm 2026-02-21T09:31:51.4116571Z @%p93 mbarrier.inval.shared::cta.b64 [%r315]; 2026-02-21T09:31:51.4116778Z // end inline asm 2026-02-21T09:31:51.4116932Z add.s32 %r432, %r39, 262176; 2026-02-21T09:31:51.4117112Z // begin inline asm 2026-02-21T09:31:51.4117286Z @%p93 mbarrier.inval.shared::cta.b64 [%r432]; 2026-02-21T09:31:51.4117493Z // end inline asm 2026-02-21T09:31:51.4117636Z bar.sync 0; 2026-02-21T09:31:51.4117784Z // begin inline asm 2026-02-21T09:31:51.4117959Z @%p93 mbarrier.inval.shared::cta.b64 [%r125]; 2026-02-21T09:31:51.4118168Z // end inline asm 2026-02-21T09:31:51.4118449Z .loc 1 56 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:56:52 2026-02-21T09:31:51.4118776Z or.b32 %r577, %r575, %r20; 2026-02-21T09:31:51.4118960Z or.b32 %r578, %r574, %r20; 2026-02-21T09:31:51.4119250Z .loc 1 56 24 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:56:24 2026-02-21T09:31:51.4119590Z mad.wide.u32 %rd163, %r577, 2, %rd15; 2026-02-21T09:31:51.4119788Z cvt.u64.u32 %rd171, %r20; 2026-02-21T09:31:51.4119973Z add.s64 %rd172, %rd2, %rd171; 2026-02-21T09:31:51.4120155Z shl.b64 %rd173, %rd172, 1; 2026-02-21T09:31:51.4120338Z add.s64 %rd174, %rd15, %rd173; 2026-02-21T09:31:51.4120528Z add.s64 %rd164, %rd174, 32768; 2026-02-21T09:31:51.4120704Z add.s64 %rd165, %rd174, 65536; 2026-02-21T09:31:51.4120891Z add.s64 %rd166, %rd174, 98304; 2026-02-21T09:31:51.4121069Z add.s64 %rd167, %rd174, 131072; 2026-02-21T09:31:51.4121258Z add.s64 %rd168, %rd174, 163840; 2026-02-21T09:31:51.4121440Z add.s64 %rd169, %rd174, 196608; 2026-02-21T09:31:51.4121634Z mad.wide.u32 %rd170, %r578, 2, %rd15; 2026-02-21T09:31:51.4121952Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4122289Z // begin inline asm 2026-02-21T09:31:51.4122718Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449}, [%r501 + 0]; 
2026-02-21T09:31:51.4123162Z // end inline asm 2026-02-21T09:31:51.4123322Z // begin inline asm 2026-02-21T09:31:51.4123723Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466}, [%r501 + 16]; 2026-02-21T09:31:51.4124181Z // end inline asm 2026-02-21T09:31:51.4124331Z // begin inline asm 2026-02-21T09:31:51.4124804Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r468, %r469, %r470, %r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483}, [%r501 + 32]; 2026-02-21T09:31:51.4125245Z // end inline asm 2026-02-21T09:31:51.4125398Z // begin inline asm 2026-02-21T09:31:51.4125844Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500}, [%r501 + 48]; 2026-02-21T09:31:51.4126329Z // end inline asm 2026-02-21T09:31:51.4126487Z // begin inline asm 2026-02-21T09:31:51.4126657Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:51.4126851Z // end inline asm 2026-02-21T09:31:51.4127013Z cvt.u64.u32 %rd175, %r434; 2026-02-21T09:31:51.4127190Z cvt.u64.u32 %rd176, %r435; 2026-02-21T09:31:51.4127379Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:31:51.4127557Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:31:51.4127865Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4128191Z mov.b64 {%r579, %r580}, %rd178; 2026-02-21T09:31:51.4128396Z cvt.rn.f16x2.f32 %r581, %r580, %r579; 2026-02-21T09:31:51.4128773Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4129110Z cvt.u64.u32 %rd179, %r436; 2026-02-21T09:31:51.4129292Z cvt.u64.u32 %rd180, %r437; 2026-02-21T09:31:51.4129464Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:31:51.4129647Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:31:51.4129946Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 
2026-02-21T09:31:51.4130270Z mov.b64 {%r582, %r583}, %rd182; 2026-02-21T09:31:51.4130458Z cvt.rn.f16x2.f32 %r584, %r583, %r582; 2026-02-21T09:31:51.4130780Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4131104Z cvt.u64.u32 %rd183, %r438; 2026-02-21T09:31:51.4131278Z cvt.u64.u32 %rd184, %r439; 2026-02-21T09:31:51.4131457Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:31:51.4131632Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:31:51.4131944Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4132263Z mov.b64 {%r585, %r586}, %rd186; 2026-02-21T09:31:51.4132461Z cvt.rn.f16x2.f32 %r587, %r586, %r585; 2026-02-21T09:31:51.4132776Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4133100Z cvt.u64.u32 %rd187, %r440; 2026-02-21T09:31:51.4133280Z cvt.u64.u32 %rd188, %r441; 2026-02-21T09:31:51.4133452Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:31:51.4133634Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:31:51.4133932Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4134260Z mov.b64 {%r588, %r589}, %rd190; 2026-02-21T09:31:51.4134443Z cvt.rn.f16x2.f32 %r590, %r589, %r588; 2026-02-21T09:31:51.4134810Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4135141Z cvt.u64.u32 %rd191, %r442; 2026-02-21T09:31:51.4135314Z cvt.u64.u32 %rd192, %r443; 2026-02-21T09:31:51.4135495Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:31:51.4135674Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:31:51.4135974Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4136287Z mov.b64 {%r591, %r592}, %rd194; 2026-02-21T09:31:51.4136478Z cvt.rn.f16x2.f32 %r593, %r592, %r591; 2026-02-21T09:31:51.4136786Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4137104Z cvt.u64.u32 %rd195, 
%r444; 2026-02-21T09:31:51.4137293Z cvt.u64.u32 %rd196, %r445; 2026-02-21T09:31:51.4137465Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:31:51.4137643Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:31:51.4137935Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4138266Z mov.b64 {%r594, %r595}, %rd198; 2026-02-21T09:31:51.4138447Z cvt.rn.f16x2.f32 %r596, %r595, %r594; 2026-02-21T09:31:51.4138761Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4139145Z cvt.u64.u32 %rd199, %r446; 2026-02-21T09:31:51.4139316Z cvt.u64.u32 %rd200, %r447; 2026-02-21T09:31:51.4139493Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:31:51.4139666Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:31:51.4139964Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4140274Z mov.b64 {%r597, %r598}, %rd202; 2026-02-21T09:31:51.4140467Z cvt.rn.f16x2.f32 %r599, %r598, %r597; 2026-02-21T09:31:51.4140772Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4141096Z cvt.u64.u32 %rd203, %r448; 2026-02-21T09:31:51.4141276Z cvt.u64.u32 %rd204, %r449; 2026-02-21T09:31:51.4141447Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:31:51.4141665Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:31:51.4142016Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4142352Z mov.b64 {%r600, %r601}, %rd206; 2026-02-21T09:31:51.4142533Z cvt.rn.f16x2.f32 %r602, %r601, %r600; 2026-02-21T09:31:51.4142844Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4143166Z cvt.u64.u32 %rd207, %r451; 2026-02-21T09:31:51.4143337Z cvt.u64.u32 %rd208, %r452; 2026-02-21T09:31:51.4143515Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:31:51.4143687Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:31:51.4143990Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 
2026-02-21T09:31:51.4144303Z mov.b64 {%r603, %r604}, %rd210; 2026-02-21T09:31:51.4144497Z cvt.rn.f16x2.f32 %r605, %r604, %r603; 2026-02-21T09:31:51.4144847Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4145179Z cvt.u64.u32 %rd211, %r453; 2026-02-21T09:31:51.4145362Z cvt.u64.u32 %rd212, %r454; 2026-02-21T09:31:51.4145536Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:31:51.4145718Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:31:51.4146018Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4146349Z mov.b64 {%r606, %r607}, %rd214; 2026-02-21T09:31:51.4146531Z cvt.rn.f16x2.f32 %r608, %r607, %r606; 2026-02-21T09:31:51.4146849Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4147172Z cvt.u64.u32 %rd215, %r455; 2026-02-21T09:31:51.4147341Z cvt.u64.u32 %rd216, %r456; 2026-02-21T09:31:51.4147520Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:31:51.4147692Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:31:51.4147994Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4148315Z mov.b64 {%r609, %r610}, %rd218; 2026-02-21T09:31:51.4148508Z cvt.rn.f16x2.f32 %r611, %r610, %r609; 2026-02-21T09:31:51.4148814Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4149146Z cvt.u64.u32 %rd219, %r457; 2026-02-21T09:31:51.4149323Z cvt.u64.u32 %rd220, %r458; 2026-02-21T09:31:51.4149492Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:31:51.4149674Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:31:51.4149966Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4150287Z mov.b64 {%r612, %r613}, %rd222; 2026-02-21T09:31:51.4150470Z cvt.rn.f16x2.f32 %r614, %r613, %r612; 2026-02-21T09:31:51.4150784Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4151105Z cvt.u64.u32 %rd223, 
%r459; 2026-02-21T09:31:51.4151278Z cvt.u64.u32 %rd224, %r460; 2026-02-21T09:31:51.4151454Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:31:51.4151628Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:31:51.4151990Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4152314Z mov.b64 {%r615, %r616}, %rd226; 2026-02-21T09:31:51.4152513Z cvt.rn.f16x2.f32 %r617, %r616, %r615; 2026-02-21T09:31:51.4152820Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4153146Z cvt.u64.u32 %rd227, %r461; 2026-02-21T09:31:51.4153329Z cvt.u64.u32 %rd228, %r462; 2026-02-21T09:31:51.4153507Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:31:51.4153693Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:31:51.4153988Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4154313Z mov.b64 {%r618, %r619}, %rd230; 2026-02-21T09:31:51.4154504Z cvt.rn.f16x2.f32 %r620, %r619, %r618; 2026-02-21T09:31:51.4154920Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4155253Z cvt.u64.u32 %rd231, %r463; 2026-02-21T09:31:51.4155423Z cvt.u64.u32 %rd232, %r464; 2026-02-21T09:31:51.4155599Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:31:51.4155772Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:31:51.4156067Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4156389Z mov.b64 {%r621, %r622}, %rd234; 2026-02-21T09:31:51.4156580Z cvt.rn.f16x2.f32 %r623, %r622, %r621; 2026-02-21T09:31:51.4156884Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4157209Z cvt.u64.u32 %rd235, %r465; 2026-02-21T09:31:51.4157390Z cvt.u64.u32 %rd236, %r466; 2026-02-21T09:31:51.4157562Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:31:51.4157743Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:31:51.4158036Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 
2026-02-21T09:31:51.4158360Z mov.b64 {%r624, %r625}, %rd238; 2026-02-21T09:31:51.4158544Z cvt.rn.f16x2.f32 %r626, %r625, %r624; 2026-02-21T09:31:51.4158859Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4159189Z cvt.u64.u32 %rd239, %r468; 2026-02-21T09:31:51.4159358Z cvt.u64.u32 %rd240, %r469; 2026-02-21T09:31:51.4159534Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:31:51.4159705Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:31:51.4159997Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4160311Z mov.b64 {%r627, %r628}, %rd242; 2026-02-21T09:31:51.4160503Z cvt.rn.f16x2.f32 %r629, %r628, %r627; 2026-02-21T09:31:51.4160810Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4161132Z cvt.u64.u32 %rd243, %r470; 2026-02-21T09:31:51.4161309Z cvt.u64.u32 %rd244, %r471; 2026-02-21T09:31:51.4161479Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:31:51.4161660Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:31:51.4161947Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4162267Z mov.b64 {%r630, %r631}, %rd246; 2026-02-21T09:31:51.4162447Z cvt.rn.f16x2.f32 %r632, %r631, %r630; 2026-02-21T09:31:51.4162758Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4163076Z cvt.u64.u32 %rd247, %r472; 2026-02-21T09:31:51.4163245Z cvt.u64.u32 %rd248, %r473; 2026-02-21T09:31:51.4163422Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:31:51.4163594Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:31:51.4163885Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4164200Z mov.b64 {%r633, %r634}, %rd250; 2026-02-21T09:31:51.4164392Z cvt.rn.f16x2.f32 %r635, %r634, %r633; 2026-02-21T09:31:51.4164774Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4165172Z cvt.u64.u32 %rd251, 
%r474; 2026-02-21T09:31:51.4165350Z cvt.u64.u32 %rd252, %r475; 2026-02-21T09:31:51.4165518Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:31:51.4165697Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:31:51.4165984Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4166306Z mov.b64 {%r636, %r637}, %rd254; 2026-02-21T09:31:51.4166488Z cvt.rn.f16x2.f32 %r638, %r637, %r636; 2026-02-21T09:31:51.4166806Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4167129Z cvt.u64.u32 %rd255, %r476; 2026-02-21T09:31:51.4167298Z cvt.u64.u32 %rd256, %r477; 2026-02-21T09:31:51.4167473Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:31:51.4167678Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:31:51.4168013Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4168330Z mov.b64 {%r639, %r640}, %rd258; 2026-02-21T09:31:51.4168522Z cvt.rn.f16x2.f32 %r641, %r640, %r639; 2026-02-21T09:31:51.4168826Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4169147Z cvt.u64.u32 %rd259, %r478; 2026-02-21T09:31:51.4169325Z cvt.u64.u32 %rd260, %r479; 2026-02-21T09:31:51.4169496Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:31:51.4169677Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:31:51.4169969Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4170289Z mov.b64 {%r642, %r643}, %rd262; 2026-02-21T09:31:51.4170473Z cvt.rn.f16x2.f32 %r644, %r643, %r642; 2026-02-21T09:31:51.4170787Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4171108Z cvt.u64.u32 %rd263, %r480; 2026-02-21T09:31:51.4171282Z cvt.u64.u32 %rd264, %r481; 2026-02-21T09:31:51.4171462Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:31:51.4171636Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:31:51.4171934Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 
2026-02-21T09:31:51.4172250Z mov.b64 {%r645, %r646}, %rd266; 2026-02-21T09:31:51.4172445Z cvt.rn.f16x2.f32 %r647, %r646, %r645; 2026-02-21T09:31:51.4172752Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4173085Z cvt.u64.u32 %rd267, %r482; 2026-02-21T09:31:51.4173258Z cvt.u64.u32 %rd268, %r483; 2026-02-21T09:31:51.4173427Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:31:51.4173611Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:31:51.4173902Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4174224Z mov.b64 {%r648, %r649}, %rd270; 2026-02-21T09:31:51.4174412Z cvt.rn.f16x2.f32 %r650, %r649, %r648; 2026-02-21T09:31:51.4174772Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4175099Z cvt.u64.u32 %rd271, %r485; 2026-02-21T09:31:51.4175271Z cvt.u64.u32 %rd272, %r486; 2026-02-21T09:31:51.4175450Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:31:51.4175619Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:31:51.4175917Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4176230Z mov.b64 {%r651, %r652}, %rd274; 2026-02-21T09:31:51.4176420Z cvt.rn.f16x2.f32 %r653, %r652, %r651; 2026-02-21T09:31:51.4176724Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4177045Z cvt.u64.u32 %rd275, %r487; 2026-02-21T09:31:51.4177225Z cvt.u64.u32 %rd276, %r488; 2026-02-21T09:31:51.4177396Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:31:51.4177616Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:31:51.4177941Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4178273Z mov.b64 {%r654, %r655}, %rd278; 2026-02-21T09:31:51.4178455Z cvt.rn.f16x2.f32 %r656, %r655, %r654; 2026-02-21T09:31:51.4178772Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4179095Z cvt.u64.u32 %rd279, 
%r489; 2026-02-21T09:31:51.4179263Z cvt.u64.u32 %rd280, %r490; 2026-02-21T09:31:51.4179441Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:31:51.4179618Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:31:51.4179921Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4180238Z mov.b64 {%r657, %r658}, %rd282; 2026-02-21T09:31:51.4180463Z cvt.rn.f16x2.f32 %r659, %r658, %r657; 2026-02-21T09:31:51.4180801Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4181125Z cvt.u64.u32 %rd283, %r491; 2026-02-21T09:31:51.4181303Z cvt.u64.u32 %rd284, %r492; 2026-02-21T09:31:51.4181475Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:31:51.4181657Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:31:51.4181953Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4182274Z mov.b64 {%r660, %r661}, %rd286; 2026-02-21T09:31:51.4182462Z cvt.rn.f16x2.f32 %r662, %r661, %r660; 2026-02-21T09:31:51.4182780Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4183102Z cvt.u64.u32 %rd287, %r493; 2026-02-21T09:31:51.4183274Z cvt.u64.u32 %rd288, %r494; 2026-02-21T09:31:51.4183454Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:31:51.4183630Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:31:51.4183938Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4184257Z mov.b64 {%r663, %r664}, %rd290; 2026-02-21T09:31:51.4184453Z cvt.rn.f16x2.f32 %r665, %r664, %r663; 2026-02-21T09:31:51.4184816Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4185137Z cvt.u64.u32 %rd291, %r495; 2026-02-21T09:31:51.4185316Z cvt.u64.u32 %rd292, %r496; 2026-02-21T09:31:51.4185485Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:31:51.4185666Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:31:51.4185955Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 
2026-02-21T09:31:51.4186274Z mov.b64 {%r666, %r667}, %rd294; 2026-02-21T09:31:51.4186458Z cvt.rn.f16x2.f32 %r668, %r667, %r666; 2026-02-21T09:31:51.4186773Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4187101Z cvt.u64.u32 %rd295, %r497; 2026-02-21T09:31:51.4187274Z cvt.u64.u32 %rd296, %r498; 2026-02-21T09:31:51.4187457Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:31:51.4187630Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:31:51.4187929Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4188241Z mov.b64 {%r669, %r670}, %rd298; 2026-02-21T09:31:51.4188435Z cvt.rn.f16x2.f32 %r671, %r670, %r669; 2026-02-21T09:31:51.4188745Z .loc 1 53 52 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:53:52 2026-02-21T09:31:51.4189056Z cvt.u64.u32 %rd299, %r499; 2026-02-21T09:31:51.4189231Z cvt.u64.u32 %rd300, %r500; 2026-02-21T09:31:51.4189401Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:31:51.4189579Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:31:51.4189868Z .loc 1 55 27 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:55:27 2026-02-21T09:31:51.4190196Z mov.b64 {%r672, %r673}, %rd302; 2026-02-21T09:31:51.4190380Z cvt.rn.f16x2.f32 %r674, %r673, %r672; 2026-02-21T09:31:51.4190753Z .loc 1 56 82 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:56:82 2026-02-21T09:31:51.4191119Z st.shared.v4.b32 [%r15], {%r581, %r593, %r605, %r617}; 2026-02-21T09:31:51.4191385Z st.shared.v4.b32 [%r16], {%r629, %r641, %r653, %r665}; 2026-02-21T09:31:51.4191609Z bar.sync 0; 2026-02-21T09:31:51.4191760Z // begin inline asm 2026-02-21T09:31:51.4192033Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r542, %r546, %r550, %r554}, [%r506]; 2026-02-21T09:31:51.4192333Z // end inline asm 2026-02-21T09:31:51.4192491Z // begin inline asm 2026-02-21T09:31:51.4192747Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r558, %r562, %r566, %r570}, [%r511]; 2026-02-21T09:31:51.4193047Z // end inline asm 
2026-02-21T09:31:51.4193205Z bar.sync 0; 2026-02-21T09:31:51.4193390Z st.shared.v4.b32 [%r15], {%r584, %r596, %r608, %r620}; 2026-02-21T09:31:51.4193722Z st.shared.v4.b32 [%r16], {%r632, %r644, %r656, %r668}; 2026-02-21T09:31:51.4193942Z bar.sync 0; 2026-02-21T09:31:51.4194096Z // begin inline asm 2026-02-21T09:31:51.4194353Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r543, %r547, %r551, %r555}, [%r506]; 2026-02-21T09:31:51.4194658Z // end inline asm 2026-02-21T09:31:51.4194861Z // begin inline asm 2026-02-21T09:31:51.4195125Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r559, %r563, %r567, %r571}, [%r511]; 2026-02-21T09:31:51.4195429Z // end inline asm 2026-02-21T09:31:51.4195581Z bar.sync 0; 2026-02-21T09:31:51.4195773Z st.shared.v4.b32 [%r15], {%r587, %r599, %r611, %r623}; 2026-02-21T09:31:51.4196029Z st.shared.v4.b32 [%r16], {%r635, %r647, %r659, %r671}; 2026-02-21T09:31:51.4196250Z bar.sync 0; 2026-02-21T09:31:51.4196392Z // begin inline asm 2026-02-21T09:31:51.4196656Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r544, %r548, %r552, %r556}, [%r506]; 2026-02-21T09:31:51.4196952Z // end inline asm 2026-02-21T09:31:51.4197111Z // begin inline asm 2026-02-21T09:31:51.4197372Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r560, %r564, %r568, %r572}, [%r511]; 2026-02-21T09:31:51.4197669Z // end inline asm 2026-02-21T09:31:51.4197825Z bar.sync 0; 2026-02-21T09:31:51.4198004Z st.shared.v4.b32 [%r15], {%r590, %r602, %r614, %r626}; 2026-02-21T09:31:51.4198264Z st.shared.v4.b32 [%r16], {%r638, %r650, %r662, %r674}; 2026-02-21T09:31:51.4198479Z bar.sync 0; 2026-02-21T09:31:51.4198630Z // begin inline asm 2026-02-21T09:31:51.4198885Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r545, %r549, %r553, %r557}, [%r506]; 2026-02-21T09:31:51.4199194Z // end inline asm 2026-02-21T09:31:51.4199349Z // begin inline asm 2026-02-21T09:31:51.4199603Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r561, %r565, %r569, %r573}, [%r511]; 2026-02-21T09:31:51.4199904Z // end inline asm 
2026-02-21T09:31:51.4200054Z // begin inline asm 2026-02-21T09:31:51.4200269Z st.global.v4.b32 [ %rd163 + 0 ], { %r542, %r543, %r544, %r545 }; 2026-02-21T09:31:51.4200508Z // end inline asm 2026-02-21T09:31:51.4200664Z // begin inline asm 2026-02-21T09:31:51.4200868Z st.global.v4.b32 [ %rd164 + 0 ], { %r546, %r547, %r548, %r549 }; 2026-02-21T09:31:51.4201114Z // end inline asm 2026-02-21T09:31:51.4201268Z // begin inline asm 2026-02-21T09:31:51.4201466Z st.global.v4.b32 [ %rd165 + 0 ], { %r550, %r551, %r552, %r553 }; 2026-02-21T09:31:51.4201707Z // end inline asm 2026-02-21T09:31:51.4201855Z // begin inline asm 2026-02-21T09:31:51.4202059Z st.global.v4.b32 [ %rd166 + 0 ], { %r554, %r555, %r556, %r557 }; 2026-02-21T09:31:51.4202294Z // end inline asm 2026-02-21T09:31:51.4202448Z // begin inline asm 2026-02-21T09:31:51.4202647Z st.global.v4.b32 [ %rd167 + 0 ], { %r558, %r559, %r560, %r561 }; 2026-02-21T09:31:51.4202889Z // end inline asm 2026-02-21T09:31:51.4203036Z // begin inline asm 2026-02-21T09:31:51.4203242Z st.global.v4.b32 [ %rd168 + 0 ], { %r562, %r563, %r564, %r565 }; 2026-02-21T09:31:51.4203481Z // end inline asm 2026-02-21T09:31:51.4203631Z // begin inline asm 2026-02-21T09:31:51.4203837Z st.global.v4.b32 [ %rd169 + 0 ], { %r566, %r567, %r568, %r569 }; 2026-02-21T09:31:51.4204106Z // end inline asm 2026-02-21T09:31:51.4204294Z // begin inline asm 2026-02-21T09:31:51.4204492Z st.global.v4.b32 [ %rd170 + 0 ], { %r570, %r571, %r572, %r573 }; 2026-02-21T09:31:51.4204782Z // end inline asm 2026-02-21T09:31:51.4204963Z $L__BB0_8: // %._crit_edge 2026-02-21T09:31:51.4205313Z .loc 1 27 4 // cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py:27:4 2026-02-21T09:31:51.4205646Z bar.sync 0; 2026-02-21T09:31:51.4205794Z // begin inline asm 2026-02-21T09:31:51.4206027Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r675, 128; 2026-02-21T09:31:51.4206279Z // end inline asm 2026-02-21T09:31:51.4206432Z ret; 2026-02-21T09:31:51.4206571Z $L__tmp0: 
2026-02-21T09:31:51.4206722Z $L__func_end0: 2026-02-21T09:31:51.4206900Z // -- End function 2026-02-21T09:31:51.4207158Z } 2026-02-21T09:31:51.4207507Z .file 1 "/tmp/torchinductor_root/dg/cdgnqkrpofwq2elt4o27y546ecze5mjkad6q3rxqva2ei2v542fa.py" 2026-02-21T09:31:51.4207888Z .section .debug_abbrev 2026-02-21T09:31:51.4208057Z { 2026-02-21T09:31:51.4208226Z .b8 1 // Abbreviation Code 2026-02-21T09:31:51.4208491Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:51.4208738Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:51.4208992Z .b8 37 // DW_AT_producer 2026-02-21T09:31:51.4209231Z .b8 8 // DW_FORM_string 2026-02-21T09:31:51.4209468Z .b8 19 // DW_AT_language 2026-02-21T09:31:51.4209701Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:51.4209924Z .b8 3 // DW_AT_name 2026-02-21T09:31:51.4210153Z .b8 8 // DW_FORM_string 2026-02-21T09:31:51.4210385Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:51.4210623Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:51.4210850Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:51.4211080Z .b8 8 // DW_FORM_string 2026-02-21T09:31:51.4211308Z .b8 0 // EOM(1) 2026-02-21T09:31:51.4211521Z .b8 0 // EOM(2) 2026-02-21T09:31:51.4211736Z .b8 0 // EOM(3) 2026-02-21T09:31:51.4211924Z } 2026-02-21T09:31:51.4212065Z .section .debug_info 2026-02-21T09:31:51.4212218Z { 2026-02-21T09:31:51.4212386Z .b32 104 // Length of Unit 2026-02-21T09:31:51.4212633Z .b8 2 // DWARF version number 2026-02-21T09:31:51.4212855Z .b8 0 2026-02-21T09:31:51.4213063Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:31:51.4213349Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:51.4213627Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:51.4213894Z .b8 116 // DW_AT_producer 2026-02-21T09:31:51.4214108Z .b8 114 2026-02-21T09:31:51.4214240Z .b8 105 2026-02-21T09:31:51.4214376Z .b8 116 2026-02-21T09:31:51.4214501Z .b8 111 2026-02-21T09:31:51.4214636Z .b8 110 2026-02-21T09:31:51.4214810Z .b8 0 2026-02-21T09:31:51.4214967Z .b8 2 // DW_AT_language 2026-02-21T09:31:51.4215178Z .b8 0 2026-02-21T09:31:51.4215335Z .b8 99 // DW_AT_name 2026-02-21T09:31:51.4215551Z .b8 100 2026-02-21T09:31:51.4215680Z .b8 103 2026-02-21T09:31:51.4215823Z .b8 110 2026-02-21T09:31:51.4215950Z .b8 113 2026-02-21T09:31:51.4216084Z .b8 107 2026-02-21T09:31:51.4216209Z .b8 114 2026-02-21T09:31:51.4216275Z .b8 112 2026-02-21T09:31:51.4216332Z .b8 111 2026-02-21T09:31:51.4216389Z .b8 102 2026-02-21T09:31:51.4216449Z .b8 119 2026-02-21T09:31:51.4216515Z .b8 113 2026-02-21T09:31:51.4216610Z .b8 50 2026-02-21T09:31:51.4216696Z .b8 101 2026-02-21T09:31:51.4216764Z .b8 108 2026-02-21T09:31:51.4216821Z .b8 116 2026-02-21T09:31:51.4216880Z .b8 52 2026-02-21T09:31:51.4216936Z .b8 111 2026-02-21T09:31:51.4217002Z .b8 50 2026-02-21T09:31:51.4217057Z .b8 55 2026-02-21T09:31:51.4217111Z .b8 121 2026-02-21T09:31:51.4217176Z .b8 53 2026-02-21T09:31:51.4217235Z .b8 52 2026-02-21T09:31:51.4217289Z .b8 54 2026-02-21T09:31:51.4217347Z .b8 101 2026-02-21T09:31:51.4217411Z .b8 99 2026-02-21T09:31:51.4217475Z .b8 122 2026-02-21T09:31:51.4217536Z .b8 101 2026-02-21T09:31:51.4217603Z .b8 53 2026-02-21T09:31:51.4217660Z .b8 109 2026-02-21T09:31:51.4217718Z .b8 106 2026-02-21T09:31:51.4217774Z .b8 107 2026-02-21T09:31:51.4217839Z .b8 97 2026-02-21T09:31:51.4217897Z .b8 100 2026-02-21T09:31:51.4217952Z .b8 54 2026-02-21T09:31:51.4218010Z .b8 113 2026-02-21T09:31:51.4218077Z .b8 51 2026-02-21T09:31:51.4218132Z .b8 114 2026-02-21T09:31:51.4218218Z .b8 120 2026-02-21T09:31:51.4218317Z .b8 113 
2026-02-21T09:31:51.4218378Z .b8 118 2026-02-21T09:31:51.4218436Z .b8 97 2026-02-21T09:31:51.4218494Z .b8 50 2026-02-21T09:31:51.4218566Z .b8 101 2026-02-21T09:31:51.4218626Z .b8 105 2026-02-21T09:31:51.4218684Z .b8 50 2026-02-21T09:31:51.4218750Z .b8 118 2026-02-21T09:31:51.4218809Z .b8 53 2026-02-21T09:31:51.4218865Z .b8 52 2026-02-21T09:31:51.4218922Z .b8 50 2026-02-21T09:31:51.4218990Z .b8 102 2026-02-21T09:31:51.4219046Z .b8 97 2026-02-21T09:31:51.4219100Z .b8 46 2026-02-21T09:31:51.4219155Z .b8 112 2026-02-21T09:31:51.4219218Z .b8 121 2026-02-21T09:31:51.4219273Z .b8 0 2026-02-21T09:31:51.4219385Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:51.4219479Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:51.4219537Z .b8 116 2026-02-21T09:31:51.4219593Z .b8 109 2026-02-21T09:31:51.4219648Z .b8 112 2026-02-21T09:31:51.4219717Z .b8 47 2026-02-21T09:31:51.4219776Z .b8 116 2026-02-21T09:31:51.4219835Z .b8 111 2026-02-21T09:31:51.4219903Z .b8 114 2026-02-21T09:31:51.4219963Z .b8 99 2026-02-21T09:31:51.4220023Z .b8 104 2026-02-21T09:31:51.4220082Z .b8 105 2026-02-21T09:31:51.4220154Z .b8 110 2026-02-21T09:31:51.4220213Z .b8 100 2026-02-21T09:31:51.4220271Z .b8 117 2026-02-21T09:31:51.4220338Z .b8 99 2026-02-21T09:31:51.4220397Z .b8 116 2026-02-21T09:31:51.4220454Z .b8 111 2026-02-21T09:31:51.4220512Z .b8 114 2026-02-21T09:31:51.4220577Z .b8 95 2026-02-21T09:31:51.4220637Z .b8 114 2026-02-21T09:31:51.4220696Z .b8 111 2026-02-21T09:31:51.4220766Z .b8 111 2026-02-21T09:31:51.4220826Z .b8 116 2026-02-21T09:31:51.4220884Z .b8 47 2026-02-21T09:31:51.4220940Z .b8 100 2026-02-21T09:31:51.4221009Z .b8 103 2026-02-21T09:31:51.4221069Z .b8 0 2026-02-21T09:31:51.4221128Z } 2026-02-21T09:31:51.4221206Z .section .debug_macinfo { } 2026-02-21T09:31:51.4221219Z 2026-02-21T09:31:51.4221311Z ================================================================ 2026-02-21T09:31:51.4221432Z please share the reproducer above with Triton project. 
2026-02-21T09:31:52.3882269Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:31:52.3882630Z 2026-02-21T09:31:52.3883725Z 2026-02-21T09:31:52.3883759Z 2026-02-21T09:31:52.3884033Z ================================================================ 2026-02-21T09:31:52.3884325Z Internal Triton PTX codegen error 2026-02-21T09:31:52.3884539Z `ptxas` stderr: 2026-02-21T09:31:52.3885389Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 147 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:52.3886021Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:52.3886214Z 2026-02-21T09:31:52.3886728Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpsqprp07n.ptx -o /tmp/tmpsqprp07n.ptx.o 2026-02-21T09:31:52.3887219Z 2026-02-21T09:31:52.3887237Z 2026-02-21T09:31:52.3887300Z // 2026-02-21T09:31:52.3887474Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:52.3887978Z // 2026-02-21T09:31:52.3889316Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=3, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:52.3890646Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:52.3890925Z `ptxas` stderr: 2026-02-21T09:31:52.3891396Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 147 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:31:52.3892028Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:52.3892193Z 2026-02-21T09:31:52.3892677Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpsqprp07n.ptx -o /tmp/tmpsqprp07n.ptx.o 2026-02-21T09:31:52.3893164Z 2026-02-21T09:31:52.3893306Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:31:52.3893516Z 2026-02-21T09:31:52.3893579Z .version 8.7 2026-02-21T09:31:52.3893728Z .target sm_100a 2026-02-21T09:31:52.3893883Z .address_size 64 2026-02-21T09:31:52.3893975Z 2026-02-21T09:31:52.3894118Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:52.3894402Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:52.3894649Z // @_helion_matmul 2026-02-21T09:31:52.3894919Z .visible .entry _helion_matmul( 2026-02-21T09:31:52.3895168Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:52.3895458Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:52.3895750Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:52.3896061Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:52.3896356Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:52.3896601Z ) 2026-02-21T09:31:52.3896745Z .reqntid 128 2026-02-21T09:31:52.3896915Z .maxnreg 32 2026-02-21T09:31:52.3897062Z { 2026-02-21T09:31:52.3897224Z .reg .pred %p<42>; 2026-02-21T09:31:52.3897396Z .reg .b32 %r<916>; 2026-02-21T09:31:52.3897568Z .reg .b64 %rd<394>; 2026-02-21T09:31:52.3897726Z $L__func_begin0: 2026-02-21T09:31:52.3897828Z 2026-02-21T09:31:52.3897888Z // %bb.0: 2026-02-21T09:31:52.3898172Z .loc 1 14 0 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:14 2026-02-21T09:31:52.3898505Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:52.3898687Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:52.3898872Z mov.b32 %r53, 
global_smem; 2026-02-21T09:31:52.3899059Z // begin inline asm 2026-02-21T09:31:52.3899335Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r53], 128; 2026-02-21T09:31:52.3899625Z // end inline asm 2026-02-21T09:31:52.3899786Z bar.sync 0; 2026-02-21T09:31:52.3899950Z ld.shared.b32 %r909, [global_smem]; 2026-02-21T09:31:52.3900150Z bar.sync 0; 2026-02-21T09:31:52.3900303Z // begin inline asm 2026-02-21T09:31:52.3900537Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:52.3900790Z // end inline asm 2026-02-21T09:31:52.3901076Z .loc 1 20 46 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:20:46 2026-02-21T09:31:52.3901395Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:31:52.3901687Z .loc 1 20 98 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:20:98 2026-02-21T09:31:52.3902019Z setp.gt.u32 %p3, %r3, 767; 2026-02-21T09:31:52.3902191Z @%p3 bra $L__BB0_8; 2026-02-21T09:31:52.3902374Z // %bb.1: // %.lr.ph 2026-02-21T09:31:52.3902703Z .loc 1 0 98 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:0:98 2026-02-21T09:31:52.3903169Z ld.param.b64 %rd27, [_helion_matmul_param_1]; 2026-02-21T09:31:52.3903406Z ld.param.b64 %rd26, [_helion_matmul_param_0]; 2026-02-21T09:31:52.3903745Z .loc 1 32 45 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:32:45 2026-02-21T09:31:52.3904077Z shl.b32 %r232, %r1, 3; 2026-02-21T09:31:52.3904362Z .loc 1 40 48 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:40:48 2026-02-21T09:31:52.3904817Z and.b32 %r233, %r232, 24; 2026-02-21T09:31:52.3905109Z .loc 1 34 45 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:34:45 2026-02-21T09:31:52.3905432Z shr.u32 %r234, %r1, 3; 2026-02-21T09:31:52.3905715Z .loc 1 33 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:33:27 2026-02-21T09:31:52.3906070Z shl.b32 %r235, %r3, 4; 2026-02-21T09:31:52.3906284Z and.b32 %r23, %r235, 16128; 2026-02-21T09:31:52.3906580Z .loc 1 34 45 // 
czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:34:45 2026-02-21T09:31:52.3906916Z or.b32 %r236, %r234, %r23; 2026-02-21T09:31:52.3907200Z .loc 1 32 45 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:32:45 2026-02-21T09:31:52.3907522Z and.b32 %r237, %r232, 56; 2026-02-21T09:31:52.3907690Z shr.u32 %r238, %r1, 2; 2026-02-21T09:31:52.3907859Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:31:52.3908025Z or.b32 %r239, %r4, 32; 2026-02-21T09:31:52.3908197Z shr.u32 %r240, %r1, 5; 2026-02-21T09:31:52.3908376Z setp.eq.b32 %p39, %r1, 0; 2026-02-21T09:31:52.3908546Z shl.b32 %r241, %r1, 4; 2026-02-21T09:31:52.3908718Z and.b32 %r242, %r241, 2032; 2026-02-21T09:31:52.3908892Z and.b32 %r243, %r1, 24; 2026-02-21T09:31:52.3909062Z shl.b32 %r244, %r243, 1; 2026-02-21T09:31:52.3909232Z xor.b32 %r5, %r242, %r244; 2026-02-21T09:31:52.3909410Z add.s32 %r192, %r53, %r5; 2026-02-21T09:31:52.3909577Z add.s32 %r194, %r192, 2048; 2026-02-21T09:31:52.3909751Z add.s32 %r196, %r192, 4096; 2026-02-21T09:31:52.3909916Z add.s32 %r198, %r192, 6144; 2026-02-21T09:31:52.3910087Z add.s32 %r200, %r192, 8192; 2026-02-21T09:31:52.3910263Z add.s32 %r202, %r192, 10240; 2026-02-21T09:31:52.3910435Z add.s32 %r204, %r192, 12288; 2026-02-21T09:31:52.3910614Z add.s32 %r206, %r192, 14336; 2026-02-21T09:31:52.3910780Z add.s32 %r208, %r192, 49152; 2026-02-21T09:31:52.3910952Z add.s32 %r210, %r192, 51200; 2026-02-21T09:31:52.3911115Z add.s32 %r212, %r192, 16384; 2026-02-21T09:31:52.3911287Z add.s32 %r214, %r192, 18432; 2026-02-21T09:31:52.3911450Z add.s32 %r216, %r192, 20480; 2026-02-21T09:31:52.3911623Z add.s32 %r218, %r192, 22528; 2026-02-21T09:31:52.3911795Z add.s32 %r220, %r192, 24576; 2026-02-21T09:31:52.3911960Z add.s32 %r222, %r192, 26624; 2026-02-21T09:31:52.3912133Z add.s32 %r224, %r192, 28672; 2026-02-21T09:31:52.3912297Z add.s32 %r226, %r192, 30720; 2026-02-21T09:31:52.3912471Z add.s32 %r228, %r192, 53248; 2026-02-21T09:31:52.3912637Z add.s32 %r230, %r192, 55296; 2026-02-21T09:31:52.3912809Z 
or.b32 %r6, %r233, 64; 2026-02-21T09:31:52.3912965Z add.s32 %r302, %r192, 32768; 2026-02-21T09:31:52.3913135Z add.s32 %r304, %r192, 34816; 2026-02-21T09:31:52.3913299Z add.s32 %r306, %r192, 36864; 2026-02-21T09:31:52.3913469Z add.s32 %r308, %r192, 38912; 2026-02-21T09:31:52.3913880Z add.s32 %r310, %r192, 40960; 2026-02-21T09:31:52.3914043Z add.s32 %r312, %r192, 43008; 2026-02-21T09:31:52.3914215Z add.s32 %r314, %r192, 45056; 2026-02-21T09:31:52.3914377Z add.s32 %r316, %r192, 47104; 2026-02-21T09:31:52.3914548Z add.s32 %r318, %r192, 57344; 2026-02-21T09:31:52.3914765Z add.s32 %r320, %r192, 59392; 2026-02-21T09:31:52.3914943Z and.b32 %r246, %r241, 1968; 2026-02-21T09:31:52.3915115Z bfe.s32 %r247, %r1, 2, 1; 2026-02-21T09:31:52.3915286Z and.b32 %r248, %r247, 2112; 2026-02-21T09:31:52.3915459Z or.b32 %r249, %r248, %r246; 2026-02-21T09:31:52.3915627Z xor.b32 %r250, %r249, 64; 2026-02-21T09:31:52.3915844Z shl.b32 %r251, %r1, 6; 2026-02-21T09:31:52.3916039Z and.b32 %r252, %r251, 2112; 2026-02-21T09:31:52.3916221Z shl.b32 %r253, %r243, 5; 2026-02-21T09:31:52.3916392Z and.b32 %r254, %r232, 48; 2026-02-21T09:31:52.3916572Z shl.b32 %r255, %r1, 1; 2026-02-21T09:31:52.3916741Z and.b32 %r256, %r255, 192; 2026-02-21T09:31:52.3916929Z or.b32 %r257, %r252, %r253; 2026-02-21T09:31:52.3917105Z or.b32 %r258, %r254, %r256; 2026-02-21T09:31:52.3917298Z xor.b32 %r259, %r257, %r258; 2026-02-21T09:31:52.3917490Z add.s32 %r543, %r53, %r259; 2026-02-21T09:31:52.3917796Z .loc 1 31 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:31:27 2026-02-21T09:31:52.3918147Z shl.b32 %r260, %r3, 6; 2026-02-21T09:31:52.3918317Z and.b32 %r21, %r260, 960; 2026-02-21T09:31:52.3918655Z .loc 1 32 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:32:32 2026-02-21T09:31:52.3919014Z or.b32 %r261, %r21, %r4; 2026-02-21T09:31:52.3919192Z or.b32 %r262, %r21, %r239; 2026-02-21T09:31:52.3919488Z .loc 1 34 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:34:32 
2026-02-21T09:31:52.3919796Z or.b32 %r263, %r23, %r4; 2026-02-21T09:31:52.3919967Z or.b32 %r264, %r23, %r239; 2026-02-21T09:31:52.3920130Z or.b32 %r265, %r238, %r23; 2026-02-21T09:31:52.3920306Z or.b32 %r39, %r234, %r235; 2026-02-21T09:31:52.3920468Z and.b32 %r24, %r39, 16143; 2026-02-21T09:31:52.3920757Z .loc 1 44 53 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:53 2026-02-21T09:31:52.3921103Z shl.b32 %r266, %r263, 10; 2026-02-21T09:31:52.3921278Z shl.b32 %r267, %r264, 10; 2026-02-21T09:31:52.3921439Z shl.b32 %r268, %r265, 10; 2026-02-21T09:31:52.3921607Z or.b32 %r269, %r268, 229376; 2026-02-21T09:31:52.3921891Z .loc 1 45 80 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:80 2026-02-21T09:31:52.3922218Z shl.b32 %r270, %r261, 10; 2026-02-21T09:31:52.3922380Z shl.b32 %r271, %r262, 10; 2026-02-21T09:31:52.3922670Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.3923017Z shfl.sync.idx.b32 %r40, %r240, 0, 31, -1; 2026-02-21T09:31:52.3923218Z shl.b32 %r272, %r40, 21; 2026-02-21T09:31:52.3923395Z and.b32 %r273, %r272, 6291456; 2026-02-21T09:31:52.3923573Z add.s32 %r538, %r273, %r909; 2026-02-21T09:31:52.3923751Z mov.pred %p17, -1; 2026-02-21T09:31:52.3923906Z mov.b32 %r910, 0; 2026-02-21T09:31:52.3924065Z // begin inline asm 2026-02-21T09:31:52.3924474Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 0], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3924966Z // end inline asm 2026-02-21T09:31:52.3925124Z // begin inline asm 2026-02-21T09:31:52.3925528Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 16], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3925994Z // end inline asm 2026-02-21T09:31:52.3926146Z // begin inline asm 2026-02-21T09:31:52.3926547Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 
32], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3926995Z // end inline asm 2026-02-21T09:31:52.3927142Z // begin inline asm 2026-02-21T09:31:52.3927539Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 48], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3927967Z // end inline asm 2026-02-21T09:31:52.3928122Z // begin inline asm 2026-02-21T09:31:52.3928527Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 64], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3928969Z // end inline asm 2026-02-21T09:31:52.3929167Z // begin inline asm 2026-02-21T09:31:52.3929582Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 80], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3930014Z // end inline asm 2026-02-21T09:31:52.3930162Z // begin inline asm 2026-02-21T09:31:52.3930558Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 96], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3930979Z // end inline asm 2026-02-21T09:31:52.3931131Z // begin inline asm 2026-02-21T09:31:52.3931533Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r538 + 112], {%r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910, %r910}; 2026-02-21T09:31:52.3931968Z // end inline asm 2026-02-21T09:31:52.3932164Z // begin inline asm 2026-02-21T09:31:52.3932372Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:52.3932563Z // end inline asm 2026-02-21T09:31:52.3932709Z bar.sync 0; 2026-02-21T09:31:52.3932987Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.3933323Z add.s32 
%r911, %r53, 61440; 2026-02-21T09:31:52.3933493Z // begin inline asm 2026-02-21T09:31:52.3933683Z @%p39 mbarrier.init.shared::cta.b64 [%r911], 1; 2026-02-21T09:31:52.3933895Z // end inline asm 2026-02-21T09:31:52.3934054Z bar.sync 0; 2026-02-21T09:31:52.3934203Z add.s32 %r191, %r53, 61448; 2026-02-21T09:31:52.3934384Z // begin inline asm 2026-02-21T09:31:52.3934563Z @%p39 mbarrier.init.shared::cta.b64 [%r191], 1; 2026-02-21T09:31:52.3934817Z // end inline asm 2026-02-21T09:31:52.3935093Z .loc 1 44 60 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:60 2026-02-21T09:31:52.3935420Z or.b32 %r274, %r266, %r233; 2026-02-21T09:31:52.3935600Z or.b32 %r275, %r267, %r233; 2026-02-21T09:31:52.3935771Z or.b32 %r276, %r269, %r233; 2026-02-21T09:31:52.3936077Z .loc 1 44 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:32 2026-02-21T09:31:52.3936417Z mad.wide.u32 %rd29, %r274, 2, %rd26; 2026-02-21T09:31:52.3936635Z mad.wide.u32 %rd30, %r275, 2, %rd26; 2026-02-21T09:31:52.3936833Z cvt.u64.u32 %rd49, %r233; 2026-02-21T09:31:52.3937016Z cvt.u64.u32 %rd1, %r266; 2026-02-21T09:31:52.3937197Z or.b64 %rd50, %rd1, %rd49; 2026-02-21T09:31:52.3937370Z shl.b64 %rd51, %rd50, 1; 2026-02-21T09:31:52.3937547Z add.s64 %rd2, %rd26, %rd51; 2026-02-21T09:31:52.3937725Z add.s64 %rd31, %rd2, 131072; 2026-02-21T09:31:52.3937924Z add.s64 %rd32, %rd2, 196608; 2026-02-21T09:31:52.3938097Z add.s64 %rd33, %rd2, 262144; 2026-02-21T09:31:52.3938281Z add.s64 %rd34, %rd2, 327680; 2026-02-21T09:31:52.3938448Z add.s64 %rd35, %rd2, 393216; 2026-02-21T09:31:52.3938634Z mad.wide.u32 %rd36, %r276, 2, %rd26; 2026-02-21T09:31:52.3938824Z mov.b32 %r303, 16; 2026-02-21T09:31:52.3939155Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.3939504Z // begin inline asm 2026-02-21T09:31:52.3939747Z cp.async.cg.shared.global [ %r192 + 0 ], [ %rd29 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3940025Z // end inline asm 2026-02-21T09:31:52.3940177Z // begin 
inline asm 2026-02-21T09:31:52.3940420Z cp.async.cg.shared.global [ %r194 + 0 ], [ %rd30 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3940688Z // end inline asm 2026-02-21T09:31:52.3940845Z // begin inline asm 2026-02-21T09:31:52.3941073Z cp.async.cg.shared.global [ %r196 + 0 ], [ %rd31 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3941351Z // end inline asm 2026-02-21T09:31:52.3941506Z // begin inline asm 2026-02-21T09:31:52.3941733Z cp.async.cg.shared.global [ %r198 + 0 ], [ %rd32 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3942012Z // end inline asm 2026-02-21T09:31:52.3942159Z // begin inline asm 2026-02-21T09:31:52.3942402Z cp.async.cg.shared.global [ %r200 + 0 ], [ %rd33 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3942711Z // end inline asm 2026-02-21T09:31:52.3942907Z // begin inline asm 2026-02-21T09:31:52.3943136Z cp.async.cg.shared.global [ %r202 + 0 ], [ %rd34 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3943407Z // end inline asm 2026-02-21T09:31:52.3943562Z // begin inline asm 2026-02-21T09:31:52.3943790Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd35 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3944062Z // end inline asm 2026-02-21T09:31:52.3944219Z // begin inline asm 2026-02-21T09:31:52.3944457Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd36 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3944779Z // end inline asm 2026-02-21T09:31:52.3944946Z cp.async.commit_group; 2026-02-21T09:31:52.3945244Z .loc 1 45 59 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:59 2026-02-21T09:31:52.3945602Z or.b32 %r277, %r270, %r233; 2026-02-21T09:31:52.3945826Z or.b32 %r278, %r271, %r233; 2026-02-21T09:31:52.3946159Z .loc 1 45 34 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:34 2026-02-21T09:31:52.3946515Z mad.wide.u32 %rd37, %r277, 2, %rd27; 2026-02-21T09:31:52.3946724Z mad.wide.u32 %rd38, %r278, 2, %rd27; 2026-02-21T09:31:52.3947060Z .loc 1 45 87 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:87 2026-02-21T09:31:52.3947396Z // begin inline asm 
2026-02-21T09:31:52.3947625Z cp.async.cg.shared.global [ %r208 + 0 ], [ %rd37 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3947906Z // end inline asm 2026-02-21T09:31:52.3948057Z // begin inline asm 2026-02-21T09:31:52.3948280Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd38 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3948546Z // end inline asm 2026-02-21T09:31:52.3948713Z cp.async.commit_group; 2026-02-21T09:31:52.3949007Z .loc 1 44 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:32 2026-02-21T09:31:52.3949354Z add.s64 %rd39, %rd2, 64; 2026-02-21T09:31:52.3949531Z add.s64 %rd40, %rd30, 64; 2026-02-21T09:31:52.3949717Z or.b32 %r279, %r274, 32; 2026-02-21T09:31:52.3949908Z mad.wide.u32 %rd52, %r279, 2, %rd26; 2026-02-21T09:31:52.3950100Z add.s64 %rd41, %rd52, 131072; 2026-02-21T09:31:52.3950285Z add.s64 %rd42, %rd52, 196608; 2026-02-21T09:31:52.3950458Z add.s64 %rd43, %rd52, 262144; 2026-02-21T09:31:52.3950641Z add.s64 %rd44, %rd52, 327680; 2026-02-21T09:31:52.3950811Z add.s64 %rd45, %rd52, 393216; 2026-02-21T09:31:52.3950988Z cvt.u64.u32 %rd4, %r269; 2026-02-21T09:31:52.3951155Z or.b64 %rd53, %rd4, %rd49; 2026-02-21T09:31:52.3951334Z shl.b64 %rd54, %rd53, 1; 2026-02-21T09:31:52.3951499Z add.s64 %rd5, %rd26, %rd54; 2026-02-21T09:31:52.3951674Z add.s64 %rd46, %rd5, 64; 2026-02-21T09:31:52.3951963Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.3952282Z bar.sync 0; 2026-02-21T09:31:52.3952430Z // begin inline asm 2026-02-21T09:31:52.3952647Z cp.async.cg.shared.global [ %r212 + 0 ], [ %rd39 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3952901Z // end inline asm 2026-02-21T09:31:52.3953051Z // begin inline asm 2026-02-21T09:31:52.3953269Z cp.async.cg.shared.global [ %r214 + 0 ], [ %rd40 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3953518Z // end inline asm 2026-02-21T09:31:52.3953670Z // begin inline asm 2026-02-21T09:31:52.3953885Z cp.async.cg.shared.global [ %r216 + 0 ], [ %rd41 + 0 ], 0x10, %r303; 
2026-02-21T09:31:52.3954128Z // end inline asm 2026-02-21T09:31:52.3954282Z // begin inline asm 2026-02-21T09:31:52.3954490Z cp.async.cg.shared.global [ %r218 + 0 ], [ %rd42 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3954797Z // end inline asm 2026-02-21T09:31:52.3954943Z // begin inline asm 2026-02-21T09:31:52.3955157Z cp.async.cg.shared.global [ %r220 + 0 ], [ %rd43 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3955398Z // end inline asm 2026-02-21T09:31:52.3955551Z // begin inline asm 2026-02-21T09:31:52.3955767Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd44 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3956006Z // end inline asm 2026-02-21T09:31:52.3956221Z // begin inline asm 2026-02-21T09:31:52.3956463Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd45 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3956725Z // end inline asm 2026-02-21T09:31:52.3956871Z // begin inline asm 2026-02-21T09:31:52.3957090Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd46 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3957328Z // end inline asm 2026-02-21T09:31:52.3957488Z cp.async.commit_group; 2026-02-21T09:31:52.3957786Z .loc 1 45 34 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:34 2026-02-21T09:31:52.3958112Z add.s64 %rd47, %rd37, 64; 2026-02-21T09:31:52.3958300Z add.s64 %rd48, %rd38, 64; 2026-02-21T09:31:52.3958586Z .loc 1 45 87 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:87 2026-02-21T09:31:52.3958902Z // begin inline asm 2026-02-21T09:31:52.3959140Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd47 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3959419Z // end inline asm 2026-02-21T09:31:52.3959569Z // begin inline asm 2026-02-21T09:31:52.3959794Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd48 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3960044Z // end inline asm 2026-02-21T09:31:52.3960196Z cp.async.commit_group; 2026-02-21T09:31:52.3960488Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.3960815Z cp.async.wait_group 2; 
2026-02-21T09:31:52.3960987Z bar.sync 0; 2026-02-21T09:31:52.3961246Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.3961573Z setp.ne.b32 %p14, %r40, 0; 2026-02-21T09:31:52.3961748Z @%p14 bra $L__BB0_3; 2026-02-21T09:31:52.3961911Z // %bb.2: 2026-02-21T09:31:52.3962172Z .loc 1 0 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:0:52 2026-02-21T09:31:52.3962493Z add.s32 %r289, %r53, 8224; 2026-02-21T09:31:52.3962677Z bfe.u32 %r290, %r289, 4, 14; 2026-02-21T09:31:52.3962853Z cvt.u64.u32 %rd64, %r290; 2026-02-21T09:31:52.3963044Z or.b64 %rd61, %rd64, -9223371899348713472; 2026-02-21T09:31:52.3963246Z add.s32 %r291, %r53, 8192; 2026-02-21T09:31:52.3963422Z bfe.u32 %r292, %r291, 4, 14; 2026-02-21T09:31:52.3963588Z cvt.u64.u32 %rd65, %r292; 2026-02-21T09:31:52.3963774Z or.b64 %rd59, %rd65, -9223371899348713472; 2026-02-21T09:31:52.3963984Z add.s32 %r293, %r53, 49152; 2026-02-21T09:31:52.3964152Z add.s32 %r294, %r53, 49184; 2026-02-21T09:31:52.3964327Z bfe.u32 %r295, %r294, 4, 14; 2026-02-21T09:31:52.3964492Z cvt.u64.u32 %rd66, %r295; 2026-02-21T09:31:52.3964743Z or.b64 %rd58, %rd66, -9223371899399045120; 2026-02-21T09:31:52.3964938Z add.s32 %r296, %r53, 32; 2026-02-21T09:31:52.3965110Z bfe.u32 %r297, %r296, 4, 14; 2026-02-21T09:31:52.3965275Z cvt.u64.u32 %rd67, %r297; 2026-02-21T09:31:52.3965459Z or.b64 %rd57, %rd67, -9223371899348713472; 2026-02-21T09:31:52.3965661Z bfe.u32 %r298, %r293, 4, 14; 2026-02-21T09:31:52.3965830Z cvt.u64.u32 %rd68, %r298; 2026-02-21T09:31:52.3966016Z or.b64 %rd56, %rd68, -9223371899399045120; 2026-02-21T09:31:52.3966210Z bfe.u32 %r299, %r53, 4, 14; 2026-02-21T09:31:52.3966387Z cvt.u64.u32 %rd69, %r299; 2026-02-21T09:31:52.3966562Z or.b64 %rd55, %rd69, -9223371899348713472; 2026-02-21T09:31:52.3966895Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.3967225Z elect.sync %r300|%p16, -1; 2026-02-21T09:31:52.3967408Z mov.b32 %r281, 
135266320; 2026-02-21T09:31:52.3967585Z mov.pred %p15, 0; 2026-02-21T09:31:52.3967764Z // begin inline asm 2026-02-21T09:31:52.3968029Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 0 ], %rd55, %rd56, %r281, %p15; 2026-02-21T09:31:52.3968312Z // end inline asm 2026-02-21T09:31:52.3968467Z // begin inline asm 2026-02-21T09:31:52.3968710Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 0 ], %rd57, %rd58, %r281, %p17; 2026-02-21T09:31:52.3968994Z // end inline asm 2026-02-21T09:31:52.3969182Z // begin inline asm 2026-02-21T09:31:52.3969473Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 64 ], %rd59, %rd56, %r281, %p15; 2026-02-21T09:31:52.3969760Z // end inline asm 2026-02-21T09:31:52.3969906Z // begin inline asm 2026-02-21T09:31:52.3970147Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 64 ], %rd61, %rd58, %r281, %p17; 2026-02-21T09:31:52.3970419Z // end inline asm 2026-02-21T09:31:52.3970577Z add.s32 %r301, %r53, 61440; 2026-02-21T09:31:52.3970752Z cvt.u64.u32 %rd63, %r301; 2026-02-21T09:31:52.3970929Z // begin inline asm 2026-02-21T09:31:52.3971153Z @%p16 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd63]; 2026-02-21T09:31:52.3971419Z // end inline asm 2026-02-21T09:31:52.3971583Z $L__BB0_3: 2026-02-21T09:31:52.3971843Z .loc 1 0 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:0:52 2026-02-21T09:31:52.3972241Z ld.param.b64 %rd28, [_helion_matmul_param_2]; 2026-02-21T09:31:52.3972493Z add.s32 %r17, %r53, %r249; 2026-02-21T09:31:52.3972698Z add.s32 %r18, %r53, %r250; 2026-02-21T09:31:52.3972885Z add.s32 %r548, %r543, 1024; 2026-02-21T09:31:52.3973075Z or.b32 %r22, %r21, %r237; 2026-02-21T09:31:52.3973254Z or.b32 %r25, %r24, 16; 2026-02-21T09:31:52.3973451Z or.b32 %r26, %r24, 32; 2026-02-21T09:31:52.3973634Z or.b32 %r27, %r24, 48; 2026-02-21T09:31:52.3973810Z or.b32 %r28, %r24, 64; 2026-02-21T09:31:52.3973996Z or.b32 %r29, %r24, 80; 2026-02-21T09:31:52.3974165Z or.b32 %r30, %r24, 96; 2026-02-21T09:31:52.3974351Z or.b32 %r31, 
%r236, 112; 2026-02-21T09:31:52.3974534Z or.b32 %r32, %r24, 128; 2026-02-21T09:31:52.3974774Z or.b32 %r33, %r24, 144; 2026-02-21T09:31:52.3974944Z or.b32 %r34, %r24, 160; 2026-02-21T09:31:52.3975130Z or.b32 %r35, %r24, 176; 2026-02-21T09:31:52.3975298Z or.b32 %r36, %r24, 192; 2026-02-21T09:31:52.3975481Z or.b32 %r37, %r24, 208; 2026-02-21T09:31:52.3975664Z or.b32 %r38, %r24, 224; 2026-02-21T09:31:52.3975959Z .loc 1 44 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:32 2026-02-21T09:31:52.3976318Z add.s64 %rd70, %rd2, 128; 2026-02-21T09:31:52.3976504Z add.s64 %rd71, %rd30, 128; 2026-02-21T09:31:52.3976709Z cvt.u64.u32 %rd81, %r6; 2026-02-21T09:31:52.3976890Z add.s64 %rd82, %rd1, %rd81; 2026-02-21T09:31:52.3977093Z shl.b64 %rd83, %rd82, 1; 2026-02-21T09:31:52.3977279Z add.s64 %rd84, %rd26, %rd83; 2026-02-21T09:31:52.3977496Z add.s64 %rd72, %rd84, 131072; 2026-02-21T09:31:52.3977681Z add.s64 %rd73, %rd84, 196608; 2026-02-21T09:31:52.3977854Z add.s64 %rd74, %rd84, 262144; 2026-02-21T09:31:52.3978036Z add.s64 %rd75, %rd84, 327680; 2026-02-21T09:31:52.3978208Z add.s64 %rd76, %rd84, 393216; 2026-02-21T09:31:52.3978408Z add.s64 %rd77, %rd5, 128; 2026-02-21T09:31:52.3978719Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.3979066Z bar.sync 0; 2026-02-21T09:31:52.3979215Z // begin inline asm 2026-02-21T09:31:52.3979468Z cp.async.cg.shared.global [ %r302 + 0 ], [ %rd70 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3979748Z // end inline asm 2026-02-21T09:31:52.3979901Z // begin inline asm 2026-02-21T09:31:52.3980143Z cp.async.cg.shared.global [ %r304 + 0 ], [ %rd71 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3980410Z // end inline asm 2026-02-21T09:31:52.3980567Z // begin inline asm 2026-02-21T09:31:52.3980797Z cp.async.cg.shared.global [ %r306 + 0 ], [ %rd72 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3981068Z // end inline asm 2026-02-21T09:31:52.3981215Z // begin inline asm 2026-02-21T09:31:52.3981452Z 
cp.async.cg.shared.global [ %r308 + 0 ], [ %rd73 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3981724Z // end inline asm 2026-02-21T09:31:52.3981873Z // begin inline asm 2026-02-21T09:31:52.3982106Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd74 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3982367Z // end inline asm 2026-02-21T09:31:52.3982526Z // begin inline asm 2026-02-21T09:31:52.3982757Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd75 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3983186Z // end inline asm 2026-02-21T09:31:52.3983369Z // begin inline asm 2026-02-21T09:31:52.3983609Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd76 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3983871Z // end inline asm 2026-02-21T09:31:52.3984032Z // begin inline asm 2026-02-21T09:31:52.3984267Z cp.async.cg.shared.global [ %r316 + 0 ], [ %rd77 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3984531Z // end inline asm 2026-02-21T09:31:52.3984815Z cp.async.commit_group; 2026-02-21T09:31:52.3985112Z .loc 1 45 34 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:34 2026-02-21T09:31:52.3985489Z add.s64 %rd78, %rd37, 128; 2026-02-21T09:31:52.3985668Z add.s64 %rd79, %rd38, 128; 2026-02-21T09:31:52.3985976Z .loc 1 45 87 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:87 2026-02-21T09:31:52.3986313Z // begin inline asm 2026-02-21T09:31:52.3986591Z cp.async.cg.shared.global [ %r318 + 0 ], [ %rd78 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3986857Z // end inline asm 2026-02-21T09:31:52.3987008Z // begin inline asm 2026-02-21T09:31:52.3987232Z cp.async.cg.shared.global [ %r320 + 0 ], [ %rd79 + 0 ], 0x10, %r303; 2026-02-21T09:31:52.3987503Z // end inline asm 2026-02-21T09:31:52.3987666Z cp.async.commit_group; 2026-02-21T09:31:52.3987956Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.3988305Z and.b32 %r326, %r1, 3; 2026-02-21T09:31:52.3988486Z mul.wide.u32 %rd8, %r326, 16; 2026-02-21T09:31:52.3988670Z shl.b64 %rd85, %rd4, 1; 
2026-02-21T09:31:52.3988853Z add.s64 %rd86, %rd85, %rd26; 2026-02-21T09:31:52.3989034Z add.s64 %rd393, %rd86, 192; 2026-02-21T09:31:52.3989220Z add.s32 %r327, %r21, %r4; 2026-02-21T09:31:52.3989394Z add.s32 %r328, %r327, 32; 2026-02-21T09:31:52.3989588Z mad.wide.u32 %rd87, %r328, 2048, %rd27; 2026-02-21T09:31:52.3989793Z add.s64 %rd392, %rd87, 192; 2026-02-21T09:31:52.3989980Z and.b32 %r329, %r3, 15; 2026-02-21T09:31:52.3990159Z shl.b32 %r330, %r329, 16; 2026-02-21T09:31:52.3990334Z shl.b32 %r331, %r4, 10; 2026-02-21T09:31:52.3990505Z or.b32 %r332, %r330, %r331; 2026-02-21T09:31:52.3990684Z mad.wide.u32 %rd88, %r332, 2, %rd27; 2026-02-21T09:31:52.3990885Z add.s64 %rd391, %rd88, 192; 2026-02-21T09:31:52.3991050Z shl.b32 %r333, %r3, 14; 2026-02-21T09:31:52.3991222Z and.b32 %r334, %r333, 16515072; 2026-02-21T09:31:52.3991401Z or.b32 %r335, %r334, %r331; 2026-02-21T09:31:52.3991584Z mad.wide.u32 %rd390, %r335, 2, %rd26; 2026-02-21T09:31:52.3991772Z add.s32 %r336, %r23, %r4; 2026-02-21T09:31:52.3991943Z add.s32 %r337, %r336, 32; 2026-02-21T09:31:52.3992124Z mad.wide.u32 %rd89, %r337, 2048, %rd26; 2026-02-21T09:31:52.3992316Z add.s64 %rd389, %rd89, 192; 2026-02-21T09:31:52.3992489Z mov.b32 %r914, 1; 2026-02-21T09:31:52.3992637Z mov.b32 %r913, 2; 2026-02-21T09:31:52.3992800Z mov.b64 %rd388, -32; 2026-02-21T09:31:52.3992960Z mov.b32 %r912, %r910; 2026-02-21T09:31:52.3993128Z mov.b32 %r915, %r910; 2026-02-21T09:31:52.3993282Z bra.uni $L__BB0_4; 2026-02-21T09:31:52.3993496Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:31:52.3993858Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.3994182Z add.s64 %rd388, %rd388, 32; 2026-02-21T09:31:52.3994371Z setp.lt.u64 %p35, %rd388, 928; 2026-02-21T09:31:52.3994722Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.3995059Z // begin inline asm 2026-02-21T09:31:52.3995216Z 2026-02-21T09:31:52.3995355Z { 
2026-02-21T09:31:52.3995495Z .reg .pred complete; 2026-02-21T09:31:52.3995667Z waitLoop: 2026-02-21T09:31:52.3995885Z mbarrier.try_wait.parity.shared.b64 complete, [%r911], %r910; 2026-02-21T09:31:52.3996177Z @!complete bra.uni waitLoop; 2026-02-21T09:31:52.3996356Z } 2026-02-21T09:31:52.3996430Z 2026-02-21T09:31:52.3996492Z // end inline asm 2026-02-21T09:31:52.3996693Z add.s32 %r390, %r914, 1; 2026-02-21T09:31:52.3996892Z setp.gt.s32 %p36, %r390, 1; 2026-02-21T09:31:52.3997075Z selp.b32 %r914, 0, %r390, %p36; 2026-02-21T09:31:52.3997256Z selp.b32 %r391, 1, 0, %p36; 2026-02-21T09:31:52.3997432Z xor.b32 %r51, %r915, %r391; 2026-02-21T09:31:52.3997715Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.3998041Z add.s32 %r392, %r913, 1; 2026-02-21T09:31:52.3998221Z setp.gt.s32 %p37, %r392, 2; 2026-02-21T09:31:52.3998400Z selp.b32 %r913, 0, %r392, %p37; 2026-02-21T09:31:52.3998725Z .loc 1 44 32 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:32 2026-02-21T09:31:52.3999045Z add.s64 %rd115, %rd390, %rd8; 2026-02-21T09:31:52.3999229Z add.s64 %rd105, %rd115, 192; 2026-02-21T09:31:52.3999403Z add.s64 %rd106, %rd389, %rd8; 2026-02-21T09:31:52.3999616Z add.s64 %rd107, %rd115, 131264; 2026-02-21T09:31:52.3999842Z add.s64 %rd108, %rd115, 196800; 2026-02-21T09:31:52.4000035Z add.s64 %rd109, %rd115, 262336; 2026-02-21T09:31:52.4000225Z add.s64 %rd110, %rd115, 327872; 2026-02-21T09:31:52.4000406Z add.s64 %rd111, %rd115, 393408; 2026-02-21T09:31:52.4000717Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.4001051Z add.s64 %rd112, %rd393, %rd8; 2026-02-21T09:31:52.4001241Z shl.b32 %r393, %r913, 14; 2026-02-21T09:31:52.4001423Z add.s32 %r395, %r53, %r393; 2026-02-21T09:31:52.4001600Z bar.sync 0; 2026-02-21T09:31:52.4001747Z add.s32 %r370, %r395, %r5; 2026-02-21T09:31:52.4001934Z selp.b32 %r371, 16, 0, %p35; 2026-02-21T09:31:52.4002115Z // begin inline asm 
2026-02-21T09:31:52.4002353Z cp.async.cg.shared.global [ %r370 + 0 ], [ %rd105 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4002631Z // end inline asm 2026-02-21T09:31:52.4002788Z add.s32 %r372, %r370, 2048; 2026-02-21T09:31:52.4002967Z // begin inline asm 2026-02-21T09:31:52.4003192Z cp.async.cg.shared.global [ %r372 + 0 ], [ %rd106 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4003457Z // end inline asm 2026-02-21T09:31:52.4003608Z add.s32 %r374, %r370, 4096; 2026-02-21T09:31:52.4003787Z // begin inline asm 2026-02-21T09:31:52.4004025Z cp.async.cg.shared.global [ %r374 + 0 ], [ %rd107 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4004286Z // end inline asm 2026-02-21T09:31:52.4004450Z add.s32 %r376, %r370, 6144; 2026-02-21T09:31:52.4004621Z // begin inline asm 2026-02-21T09:31:52.4004916Z cp.async.cg.shared.global [ %r376 + 0 ], [ %rd108 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4005178Z // end inline asm 2026-02-21T09:31:52.4005346Z add.s32 %r378, %r370, 8192; 2026-02-21T09:31:52.4005522Z // begin inline asm 2026-02-21T09:31:52.4005769Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd109 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4006035Z // end inline asm 2026-02-21T09:31:52.4006196Z add.s32 %r380, %r370, 10240; 2026-02-21T09:31:52.4006426Z // begin inline asm 2026-02-21T09:31:52.4006696Z cp.async.cg.shared.global [ %r380 + 0 ], [ %rd110 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4006958Z // end inline asm 2026-02-21T09:31:52.4007122Z add.s32 %r382, %r370, 12288; 2026-02-21T09:31:52.4007295Z // begin inline asm 2026-02-21T09:31:52.4007521Z cp.async.cg.shared.global [ %r382 + 0 ], [ %rd111 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4007774Z // end inline asm 2026-02-21T09:31:52.4007947Z add.s32 %r384, %r370, 14336; 2026-02-21T09:31:52.4008118Z // begin inline asm 2026-02-21T09:31:52.4008347Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd112 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4008597Z // end inline asm 2026-02-21T09:31:52.4008764Z cp.async.commit_group; 2026-02-21T09:31:52.4009074Z .loc 
1 45 34 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:34 2026-02-21T09:31:52.4009406Z add.s64 %rd113, %rd391, %rd8; 2026-02-21T09:31:52.4009728Z .loc 1 45 87 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:87 2026-02-21T09:31:52.4010123Z add.s64 %rd114, %rd392, %rd8; 2026-02-21T09:31:52.4010391Z shl.b32 %r396, %r913, 12; 2026-02-21T09:31:52.4010625Z add.s32 %r397, %r53, %r396; 2026-02-21T09:31:52.4010875Z add.s32 %r398, %r397, %r5; 2026-02-21T09:31:52.4011130Z add.s32 %r386, %r398, 49152; 2026-02-21T09:31:52.4011349Z // begin inline asm 2026-02-21T09:31:52.4011577Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd113 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4011845Z // end inline asm 2026-02-21T09:31:52.4012002Z add.s32 %r388, %r398, 51200; 2026-02-21T09:31:52.4012168Z // begin inline asm 2026-02-21T09:31:52.4012393Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd114 + 0 ], 0x10, %r371; 2026-02-21T09:31:52.4012663Z // end inline asm 2026-02-21T09:31:52.4012825Z cp.async.commit_group; 2026-02-21T09:31:52.4013114Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.4013496Z add.s64 %rd393, %rd393, 64; 2026-02-21T09:31:52.4013714Z add.s64 %rd392, %rd392, 64; 2026-02-21T09:31:52.4013890Z add.s64 %rd391, %rd391, 64; 2026-02-21T09:31:52.4014066Z add.s64 %rd390, %rd390, 64; 2026-02-21T09:31:52.4014231Z add.s64 %rd389, %rd389, 64; 2026-02-21T09:31:52.4014417Z setp.lt.u64 %p38, %rd388, 960; 2026-02-21T09:31:52.4014599Z mov.b32 %r910, %r915; 2026-02-21T09:31:52.4014838Z mov.b32 %r911, %r399; 2026-02-21T09:31:52.4015002Z mov.b32 %r915, %r51; 2026-02-21T09:31:52.4015172Z @%p38 bra $L__BB0_4; 2026-02-21T09:31:52.4015340Z bra.uni $L__BB0_7; 2026-02-21T09:31:52.4015551Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:31:52.4015810Z add.s32 %r339, %r912, 1; 2026-02-21T09:31:52.4015987Z setp.gt.s32 %p25, %r339, 2; 2026-02-21T09:31:52.4016179Z selp.b32 %r912, 0, %r339, %p25; 
2026-02-21T09:31:52.4016493Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.4016840Z cp.async.wait_group 2; 2026-02-21T09:31:52.4017013Z bar.sync 0; 2026-02-21T09:31:52.4017320Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.4017683Z shl.b32 %r340, %r914, 3; 2026-02-21T09:31:52.4017852Z add.s32 %r342, %r53, %r340; 2026-02-21T09:31:52.4018032Z add.s32 %r399, %r342, 61440; 2026-02-21T09:31:52.4018363Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4018706Z @%p14 bra $L__BB0_6; 2026-02-21T09:31:52.4018916Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:31:52.4019325Z .loc 1 45 87 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:45:87 2026-02-21T09:31:52.4019677Z shl.b32 %r351, %r912, 12; 2026-02-21T09:31:52.4019847Z add.s32 %r353, %r53, %r351; 2026-02-21T09:31:52.4020048Z add.s32 %r354, %r353, 49152; 2026-02-21T09:31:52.4020378Z .loc 1 44 85 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:44:85 2026-02-21T09:31:52.4020721Z shl.b32 %r355, %r912, 14; 2026-02-21T09:31:52.4020887Z add.s32 %r356, %r53, %r355; 2026-02-21T09:31:52.4021184Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4021520Z elect.sync %r357|%p27, -1; 2026-02-21T09:31:52.4021700Z bfe.u32 %r358, %r356, 4, 14; 2026-02-21T09:31:52.4021876Z cvt.u64.u32 %rd99, %r358; 2026-02-21T09:31:52.4022060Z or.b64 %rd90, %rd99, -9223371899348713472; 2026-02-21T09:31:52.4022270Z bfe.u32 %r359, %r354, 4, 14; 2026-02-21T09:31:52.4022439Z cvt.u64.u32 %rd100, %r359; 2026-02-21T09:31:52.4022630Z or.b64 %rd91, %rd100, -9223371899399045120; 2026-02-21T09:31:52.4022825Z mov.b32 %r344, 135266320; 2026-02-21T09:31:52.4022998Z mov.pred %p26, -1; 2026-02-21T09:31:52.4023152Z // begin inline asm 2026-02-21T09:31:52.4023413Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 0 ], %rd90, %rd91, %r344, %p26; 
2026-02-21T09:31:52.4023697Z // end inline asm 2026-02-21T09:31:52.4023887Z add.s32 %r360, %r356, 32; 2026-02-21T09:31:52.4024088Z bfe.u32 %r361, %r360, 4, 14; 2026-02-21T09:31:52.4024256Z cvt.u64.u32 %rd101, %r361; 2026-02-21T09:31:52.4024444Z or.b64 %rd92, %rd101, -9223371899348713472; 2026-02-21T09:31:52.4024639Z add.s32 %r362, %r353, 49184; 2026-02-21T09:31:52.4024869Z bfe.u32 %r363, %r362, 4, 14; 2026-02-21T09:31:52.4025033Z cvt.u64.u32 %rd102, %r363; 2026-02-21T09:31:52.4025217Z or.b64 %rd93, %rd102, -9223371899399045120; 2026-02-21T09:31:52.4025417Z // begin inline asm 2026-02-21T09:31:52.4025663Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 0 ], %rd92, %rd93, %r344, %p26; 2026-02-21T09:31:52.4025950Z // end inline asm 2026-02-21T09:31:52.4026107Z add.s32 %r364, %r356, 8192; 2026-02-21T09:31:52.4026292Z bfe.u32 %r365, %r364, 4, 14; 2026-02-21T09:31:52.4026456Z cvt.u64.u32 %rd103, %r365; 2026-02-21T09:31:52.4026676Z or.b64 %rd94, %rd103, -9223371899348713472; 2026-02-21T09:31:52.4026902Z // begin inline asm 2026-02-21T09:31:52.4027161Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 64 ], %rd94, %rd91, %r344, %p26; 2026-02-21T09:31:52.4027454Z // end inline asm 2026-02-21T09:31:52.4027603Z add.s32 %r366, %r356, 8224; 2026-02-21T09:31:52.4027779Z bfe.u32 %r367, %r366, 4, 14; 2026-02-21T09:31:52.4027948Z cvt.u64.u32 %rd104, %r367; 2026-02-21T09:31:52.4028134Z or.b64 %rd96, %rd104, -9223371899348713472; 2026-02-21T09:31:52.4028331Z // begin inline asm 2026-02-21T09:31:52.4028580Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r909 + 64 ], %rd96, %rd93, %r344, %p26; 2026-02-21T09:31:52.4028862Z // end inline asm 2026-02-21T09:31:52.4029019Z cvt.u64.u32 %rd98, %r399; 2026-02-21T09:31:52.4029187Z // begin inline asm 2026-02-21T09:31:52.4029417Z @%p27 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd98]; 2026-02-21T09:31:52.4029681Z // end inline asm 2026-02-21T09:31:52.4029826Z bra.uni $L__BB0_6; 2026-02-21T09:31:52.4030025Z $L__BB0_7: // 
%._crit_edge.loopexit 2026-02-21T09:31:52.4030381Z .loc 1 0 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:0:52 2026-02-21T09:31:52.4030715Z mov.b32 %r400, 1; 2026-02-21T09:31:52.4031000Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4031312Z // begin inline asm 2026-02-21T09:31:52.4031464Z 2026-02-21T09:31:52.4031586Z { 2026-02-21T09:31:52.4031725Z .reg .pred complete; 2026-02-21T09:31:52.4031884Z waitLoop: 2026-02-21T09:31:52.4032094Z mbarrier.try_wait.parity.shared.b64 complete, [%r399], %r400; 2026-02-21T09:31:52.4032355Z @!complete bra.uni waitLoop; 2026-02-21T09:31:52.4032529Z } 2026-02-21T09:31:52.4032599Z 2026-02-21T09:31:52.4032660Z // end inline asm 2026-02-21T09:31:52.4032940Z .loc 1 39 90 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:39:90 2026-02-21T09:31:52.4033268Z cp.async.wait_group 0; 2026-02-21T09:31:52.4033435Z bar.sync 0; 2026-02-21T09:31:52.4033588Z add.s32 %r401, %r53, 61440; 2026-02-21T09:31:52.4033754Z // begin inline asm 2026-02-21T09:31:52.4033936Z @%p39 mbarrier.inval.shared::cta.b64 [%r401]; 2026-02-21T09:31:52.4034140Z // end inline asm 2026-02-21T09:31:52.4034291Z bar.sync 0; 2026-02-21T09:31:52.4034428Z // begin inline asm 2026-02-21T09:31:52.4034612Z @%p39 mbarrier.inval.shared::cta.b64 [%r191]; 2026-02-21T09:31:52.4034877Z // end inline asm 2026-02-21T09:31:52.4035145Z .loc 1 49 45 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:49:45 2026-02-21T09:31:52.4035471Z shl.b32 %r684, %r24, 10; 2026-02-21T09:31:52.4035642Z shl.b32 %r685, %r25, 10; 2026-02-21T09:31:52.4035815Z shl.b32 %r686, %r26, 10; 2026-02-21T09:31:52.4035980Z shl.b32 %r687, %r27, 10; 2026-02-21T09:31:52.4036161Z shl.b32 %r688, %r28, 10; 2026-02-21T09:31:52.4036319Z shl.b32 %r689, %r29, 10; 2026-02-21T09:31:52.4036489Z shl.b32 %r690, %r30, 10; 2026-02-21T09:31:52.4036657Z shl.b32 %r691, %r31, 10; 2026-02-21T09:31:52.4036857Z shl.b32 %r692, %r32, 10; 2026-02-21T09:31:52.4037053Z 
shl.b32 %r693, %r33, 10; 2026-02-21T09:31:52.4037211Z shl.b32 %r694, %r34, 10; 2026-02-21T09:31:52.4037376Z shl.b32 %r695, %r35, 10; 2026-02-21T09:31:52.4037533Z shl.b32 %r696, %r36, 10; 2026-02-21T09:31:52.4037697Z shl.b32 %r697, %r37, 10; 2026-02-21T09:31:52.4037858Z shl.b32 %r698, %r38, 10; 2026-02-21T09:31:52.4038024Z shl.b32 %r699, %r39, 10; 2026-02-21T09:31:52.4038304Z .loc 1 49 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:49:52 2026-02-21T09:31:52.4038637Z or.b32 %r700, %r684, %r22; 2026-02-21T09:31:52.4038813Z or.b32 %r701, %r685, %r22; 2026-02-21T09:31:52.4038979Z or.b32 %r702, %r686, %r22; 2026-02-21T09:31:52.4039149Z or.b32 %r703, %r687, %r22; 2026-02-21T09:31:52.4039311Z or.b32 %r704, %r688, %r22; 2026-02-21T09:31:52.4039479Z or.b32 %r705, %r689, %r22; 2026-02-21T09:31:52.4039676Z or.b32 %r706, %r690, %r22; 2026-02-21T09:31:52.4039872Z or.b32 %r707, %r691, %r22; 2026-02-21T09:31:52.4040039Z or.b32 %r708, %r692, %r22; 2026-02-21T09:31:52.4040210Z or.b32 %r709, %r693, %r22; 2026-02-21T09:31:52.4040379Z or.b32 %r710, %r694, %r22; 2026-02-21T09:31:52.4040542Z or.b32 %r711, %r695, %r22; 2026-02-21T09:31:52.4040708Z or.b32 %r712, %r696, %r22; 2026-02-21T09:31:52.4040867Z or.b32 %r713, %r697, %r22; 2026-02-21T09:31:52.4041035Z or.b32 %r714, %r698, %r22; 2026-02-21T09:31:52.4041195Z or.b32 %r715, %r699, %r22; 2026-02-21T09:31:52.4041372Z or.b32 %r716, %r715, 245760; 2026-02-21T09:31:52.4041662Z .loc 1 49 24 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:49:24 2026-02-21T09:31:52.4041990Z mad.wide.u32 %rd116, %r700, 2, %rd28; 2026-02-21T09:31:52.4042191Z mad.wide.u32 %rd117, %r701, 2, %rd28; 2026-02-21T09:31:52.4042390Z mad.wide.u32 %rd118, %r702, 2, %rd28; 2026-02-21T09:31:52.4042585Z mad.wide.u32 %rd119, %r703, 2, %rd28; 2026-02-21T09:31:52.4042776Z mad.wide.u32 %rd120, %r704, 2, %rd28; 2026-02-21T09:31:52.4042968Z mad.wide.u32 %rd121, %r705, 2, %rd28; 2026-02-21T09:31:52.4043156Z mad.wide.u32 %rd122, %r706, 2, %rd28; 
2026-02-21T09:31:52.4043347Z mad.wide.u32 %rd123, %r707, 2, %rd28; 2026-02-21T09:31:52.4043530Z mad.wide.u32 %rd124, %r708, 2, %rd28; 2026-02-21T09:31:52.4043720Z mad.wide.u32 %rd125, %r709, 2, %rd28; 2026-02-21T09:31:52.4043905Z mad.wide.u32 %rd126, %r710, 2, %rd28; 2026-02-21T09:31:52.4044097Z mad.wide.u32 %rd127, %r711, 2, %rd28; 2026-02-21T09:31:52.4044284Z mad.wide.u32 %rd128, %r712, 2, %rd28; 2026-02-21T09:31:52.4044467Z mad.wide.u32 %rd129, %r713, 2, %rd28; 2026-02-21T09:31:52.4044657Z mad.wide.u32 %rd130, %r714, 2, %rd28; 2026-02-21T09:31:52.4044883Z mad.wide.u32 %rd131, %r716, 2, %rd28; 2026-02-21T09:31:52.4045193Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4045501Z // begin inline asm 2026-02-21T09:31:52.4045915Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418}, [%r538 + 0]; 2026-02-21T09:31:52.4046364Z // end inline asm 2026-02-21T09:31:52.4046513Z // begin inline asm 2026-02-21T09:31:52.4046909Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435}, [%r538 + 16]; 2026-02-21T09:31:52.4047359Z // end inline asm 2026-02-21T09:31:52.4047520Z // begin inline asm 2026-02-21T09:31:52.4047918Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452}, [%r538 + 32]; 2026-02-21T09:31:52.4048367Z // end inline asm 2026-02-21T09:31:52.4048526Z // begin inline asm 2026-02-21T09:31:52.4048919Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469}, [%r538 + 48]; 2026-02-21T09:31:52.4049362Z // end inline asm 2026-02-21T09:31:52.4049556Z // begin inline asm 2026-02-21T09:31:52.4049993Z tcgen05.ld.sync.aligned.32x32b.x16.b32 
{%r471, %r472, %r473, %r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486}, [%r538 + 64]; 2026-02-21T09:31:52.4050431Z // end inline asm 2026-02-21T09:31:52.4050585Z // begin inline asm 2026-02-21T09:31:52.4050991Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503}, [%r538 + 80]; 2026-02-21T09:31:52.4051442Z // end inline asm 2026-02-21T09:31:52.4051600Z // begin inline asm 2026-02-21T09:31:52.4051998Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520}, [%r538 + 96]; 2026-02-21T09:31:52.4052454Z // end inline asm 2026-02-21T09:31:52.4052612Z // begin inline asm 2026-02-21T09:31:52.4053111Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537}, [%r538 + 112]; 2026-02-21T09:31:52.4053583Z // end inline asm 2026-02-21T09:31:52.4053731Z // begin inline asm 2026-02-21T09:31:52.4053908Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:52.4054103Z // end inline asm 2026-02-21T09:31:52.4054273Z cvt.u64.u32 %rd132, %r403; 2026-02-21T09:31:52.4054451Z cvt.u64.u32 %rd133, %r404; 2026-02-21T09:31:52.4054638Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:31:52.4054878Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:31:52.4055212Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4055565Z mov.b64 {%r717, %r718}, %rd135; 2026-02-21T09:31:52.4055763Z cvt.rn.f16x2.f32 %r719, %r718, %r717; 2026-02-21T09:31:52.4056114Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4056440Z cvt.u64.u32 %rd136, %r405; 2026-02-21T09:31:52.4056625Z cvt.u64.u32 %rd137, %r406; 2026-02-21T09:31:52.4056802Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:31:52.4056986Z or.b64 %rd139, %rd136, 
%rd138; 2026-02-21T09:31:52.4057311Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4057664Z mov.b64 {%r720, %r721}, %rd139; 2026-02-21T09:31:52.4057862Z cvt.rn.f16x2.f32 %r722, %r721, %r720; 2026-02-21T09:31:52.4058196Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4058525Z cvt.u64.u32 %rd140, %r407; 2026-02-21T09:31:52.4058694Z cvt.u64.u32 %rd141, %r408; 2026-02-21T09:31:52.4058871Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:31:52.4059051Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:31:52.4059354Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4059696Z mov.b64 {%r723, %r724}, %rd143; 2026-02-21T09:31:52.4059890Z cvt.rn.f16x2.f32 %r725, %r724, %r723; 2026-02-21T09:31:52.4060204Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4060515Z cvt.u64.u32 %rd144, %r409; 2026-02-21T09:31:52.4060692Z cvt.u64.u32 %rd145, %r410; 2026-02-21T09:31:52.4060858Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:31:52.4061035Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:31:52.4061333Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4061646Z mov.b64 {%r726, %r727}, %rd147; 2026-02-21T09:31:52.4061835Z cvt.rn.f16x2.f32 %r728, %r727, %r726; 2026-02-21T09:31:52.4062141Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4062467Z cvt.u64.u32 %rd148, %r411; 2026-02-21T09:31:52.4062634Z cvt.u64.u32 %rd149, %r412; 2026-02-21T09:31:52.4062808Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:31:52.4062987Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:31:52.4063323Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4063679Z mov.b64 {%r729, %r730}, %rd151; 2026-02-21T09:31:52.4063860Z cvt.rn.f16x2.f32 %r731, %r730, %r729; 2026-02-21T09:31:52.4064170Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4064478Z cvt.u64.u32 %rd152, %r413; 2026-02-21T09:31:52.4064653Z cvt.u64.u32 %rd153, %r414; 2026-02-21T09:31:52.4064858Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:31:52.4065033Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:31:52.4065332Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4065658Z mov.b64 {%r732, %r733}, %rd155; 2026-02-21T09:31:52.4065851Z cvt.rn.f16x2.f32 %r734, %r733, %r732; 2026-02-21T09:31:52.4066228Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4066573Z cvt.u64.u32 %rd156, %r415; 2026-02-21T09:31:52.4066744Z cvt.u64.u32 %rd157, %r416; 2026-02-21T09:31:52.4066923Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:31:52.4067105Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:31:52.4067404Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4067732Z mov.b64 {%r735, %r736}, %rd159; 2026-02-21T09:31:52.4067918Z cvt.rn.f16x2.f32 %r737, %r736, %r735; 2026-02-21T09:31:52.4068241Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4068571Z cvt.u64.u32 %rd160, %r417; 2026-02-21T09:31:52.4068748Z cvt.u64.u32 %rd161, %r418; 2026-02-21T09:31:52.4068917Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:31:52.4069099Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:31:52.4069411Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4069730Z mov.b64 {%r738, %r739}, %rd163; 2026-02-21T09:31:52.4069920Z cvt.rn.f16x2.f32 %r740, %r739, %r738; 2026-02-21T09:31:52.4070230Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4070564Z cvt.u64.u32 %rd164, %r420; 2026-02-21T09:31:52.4070732Z cvt.u64.u32 %rd165, %r421; 2026-02-21T09:31:52.4070911Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:31:52.4071087Z or.b64 %rd167, %rd164, 
%rd166; 2026-02-21T09:31:52.4071386Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4071714Z mov.b64 {%r741, %r742}, %rd167; 2026-02-21T09:31:52.4071894Z cvt.rn.f16x2.f32 %r743, %r742, %r741; 2026-02-21T09:31:52.4072210Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4072536Z cvt.u64.u32 %rd168, %r422; 2026-02-21T09:31:52.4072713Z cvt.u64.u32 %rd169, %r423; 2026-02-21T09:31:52.4072887Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:31:52.4073060Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:31:52.4073368Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4073692Z mov.b64 {%r744, %r745}, %rd171; 2026-02-21T09:31:52.4073884Z cvt.rn.f16x2.f32 %r746, %r745, %r744; 2026-02-21T09:31:52.4074198Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4074540Z cvt.u64.u32 %rd172, %r424; 2026-02-21T09:31:52.4074749Z cvt.u64.u32 %rd173, %r425; 2026-02-21T09:31:52.4074932Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:31:52.4075113Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:31:52.4075414Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4075741Z mov.b64 {%r747, %r748}, %rd175; 2026-02-21T09:31:52.4075929Z cvt.rn.f16x2.f32 %r749, %r748, %r747; 2026-02-21T09:31:52.4407026Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4407575Z cvt.u64.u32 %rd176, %r426; 2026-02-21T09:31:52.4407781Z cvt.u64.u32 %rd177, %r427; 2026-02-21T09:31:52.4407974Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:31:52.4408166Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:31:52.4408487Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4408824Z mov.b64 {%r750, %r751}, %rd179; 2026-02-21T09:31:52.4409036Z cvt.rn.f16x2.f32 %r752, %r751, %r750; 2026-02-21T09:31:52.4409360Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4409697Z cvt.u64.u32 %rd180, %r428; 2026-02-21T09:31:52.4409876Z cvt.u64.u32 %rd181, %r429; 2026-02-21T09:31:52.4410070Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:31:52.4410347Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:31:52.4410736Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4411110Z mov.b64 {%r753, %r754}, %rd183; 2026-02-21T09:31:52.4411316Z cvt.rn.f16x2.f32 %r755, %r754, %r753; 2026-02-21T09:31:52.4411664Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4411976Z cvt.u64.u32 %rd184, %r430; 2026-02-21T09:31:52.4412157Z cvt.u64.u32 %rd185, %r431; 2026-02-21T09:31:52.4412334Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:31:52.4412506Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:31:52.4412803Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4413114Z mov.b64 {%r756, %r757}, %rd187; 2026-02-21T09:31:52.4413309Z cvt.rn.f16x2.f32 %r758, %r757, %r756; 2026-02-21T09:31:52.4413617Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4413935Z cvt.u64.u32 %rd188, %r432; 2026-02-21T09:31:52.4414111Z cvt.u64.u32 %rd189, %r433; 2026-02-21T09:31:52.4414288Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:31:52.4414469Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:31:52.4414931Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4415266Z mov.b64 {%r759, %r760}, %rd191; 2026-02-21T09:31:52.4415458Z cvt.rn.f16x2.f32 %r761, %r760, %r759; 2026-02-21T09:31:52.4415800Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4416108Z cvt.u64.u32 %rd192, %r434; 2026-02-21T09:31:52.4416284Z cvt.u64.u32 %rd193, %r435; 2026-02-21T09:31:52.4416460Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:31:52.4416630Z or.b64 %rd195, %rd192, 
%rd194; 2026-02-21T09:31:52.4416927Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4417244Z mov.b64 {%r762, %r763}, %rd195; 2026-02-21T09:31:52.4417435Z cvt.rn.f16x2.f32 %r764, %r763, %r762; 2026-02-21T09:31:52.4417752Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4418064Z cvt.u64.u32 %rd196, %r437; 2026-02-21T09:31:52.4418227Z cvt.u64.u32 %rd197, %r438; 2026-02-21T09:31:52.4418396Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:31:52.4418568Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:31:52.4418877Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4419205Z mov.b64 {%r765, %r766}, %rd199; 2026-02-21T09:31:52.4419383Z cvt.rn.f16x2.f32 %r767, %r766, %r765; 2026-02-21T09:31:52.4419708Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4420025Z cvt.u64.u32 %rd200, %r439; 2026-02-21T09:31:52.4420201Z cvt.u64.u32 %rd201, %r440; 2026-02-21T09:31:52.4420373Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:31:52.4420598Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:31:52.4420952Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4421278Z mov.b64 {%r768, %r769}, %rd203; 2026-02-21T09:31:52.4421468Z cvt.rn.f16x2.f32 %r770, %r769, %r768; 2026-02-21T09:31:52.4421760Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4422110Z cvt.u64.u32 %rd204, %r441; 2026-02-21T09:31:52.4422276Z cvt.u64.u32 %rd205, %r442; 2026-02-21T09:31:52.4422454Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:31:52.4422632Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:31:52.4422937Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4423253Z mov.b64 {%r771, %r772}, %rd207; 2026-02-21T09:31:52.4423476Z cvt.rn.f16x2.f32 %r773, %r772, %r771; 2026-02-21T09:31:52.4423853Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4424196Z cvt.u64.u32 %rd208, %r443; 2026-02-21T09:31:52.4424365Z cvt.u64.u32 %rd209, %r444; 2026-02-21T09:31:52.4424530Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:31:52.4424748Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:31:52.4425050Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4425375Z mov.b64 {%r774, %r775}, %rd211; 2026-02-21T09:31:52.4425559Z cvt.rn.f16x2.f32 %r776, %r775, %r774; 2026-02-21T09:31:52.4425876Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4426189Z cvt.u64.u32 %rd212, %r445; 2026-02-21T09:31:52.4426352Z cvt.u64.u32 %rd213, %r446; 2026-02-21T09:31:52.4426521Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:31:52.4426695Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:31:52.4426980Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4427300Z mov.b64 {%r777, %r778}, %rd215; 2026-02-21T09:31:52.4427477Z cvt.rn.f16x2.f32 %r779, %r778, %r777; 2026-02-21T09:31:52.4427779Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4428083Z cvt.u64.u32 %rd216, %r447; 2026-02-21T09:31:52.4428255Z cvt.u64.u32 %rd217, %r448; 2026-02-21T09:31:52.4428423Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:31:52.4428586Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:31:52.4428877Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4429182Z mov.b64 {%r780, %r781}, %rd219; 2026-02-21T09:31:52.4429366Z cvt.rn.f16x2.f32 %r782, %r781, %r780; 2026-02-21T09:31:52.4429658Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4429968Z cvt.u64.u32 %rd220, %r449; 2026-02-21T09:31:52.4430130Z cvt.u64.u32 %rd221, %r450; 2026-02-21T09:31:52.4430302Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:31:52.4430471Z or.b64 %rd223, %rd220, 
%rd222; 2026-02-21T09:31:52.4430749Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4431070Z mov.b64 {%r783, %r784}, %rd223; 2026-02-21T09:31:52.4431245Z cvt.rn.f16x2.f32 %r785, %r784, %r783; 2026-02-21T09:31:52.4431549Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4431855Z cvt.u64.u32 %rd224, %r451; 2026-02-21T09:31:52.4432023Z cvt.u64.u32 %rd225, %r452; 2026-02-21T09:31:52.4432191Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:31:52.4432354Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:31:52.4432646Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4432950Z mov.b64 {%r786, %r787}, %rd227; 2026-02-21T09:31:52.4433135Z cvt.rn.f16x2.f32 %r788, %r787, %r786; 2026-02-21T09:31:52.4433498Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4433879Z cvt.u64.u32 %rd228, %r454; 2026-02-21T09:31:52.4434041Z cvt.u64.u32 %rd229, %r455; 2026-02-21T09:31:52.4434213Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:31:52.4434387Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:31:52.4434714Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4435035Z mov.b64 {%r789, %r790}, %rd231; 2026-02-21T09:31:52.4435247Z cvt.rn.f16x2.f32 %r791, %r790, %r789; 2026-02-21T09:31:52.4435557Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4435882Z cvt.u64.u32 %rd232, %r456; 2026-02-21T09:31:52.4436065Z cvt.u64.u32 %rd233, %r457; 2026-02-21T09:31:52.4436290Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:31:52.4436499Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:31:52.4436795Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4437115Z mov.b64 {%r792, %r793}, %rd235; 2026-02-21T09:31:52.4437306Z cvt.rn.f16x2.f32 %r794, %r793, %r792; 2026-02-21T09:31:52.4437635Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4437990Z cvt.u64.u32 %rd236, %r458; 2026-02-21T09:31:52.4438162Z cvt.u64.u32 %rd237, %r459; 2026-02-21T09:31:52.4438340Z shl.b64 %rd238, %rd237, 32; 2026-02-21T09:31:52.4438521Z or.b64 %rd239, %rd236, %rd238; 2026-02-21T09:31:52.4438837Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4439187Z mov.b64 {%r795, %r796}, %rd239; 2026-02-21T09:31:52.4439371Z cvt.rn.f16x2.f32 %r797, %r796, %r795; 2026-02-21T09:31:52.4439718Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4440066Z cvt.u64.u32 %rd240, %r460; 2026-02-21T09:31:52.4440250Z cvt.u64.u32 %rd241, %r461; 2026-02-21T09:31:52.4440429Z shl.b64 %rd242, %rd241, 32; 2026-02-21T09:31:52.4440603Z or.b64 %rd243, %rd240, %rd242; 2026-02-21T09:31:52.4440923Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4441266Z mov.b64 {%r798, %r799}, %rd243; 2026-02-21T09:31:52.4441459Z cvt.rn.f16x2.f32 %r800, %r799, %r798; 2026-02-21T09:31:52.4441791Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4442139Z cvt.u64.u32 %rd244, %r462; 2026-02-21T09:31:52.4442310Z cvt.u64.u32 %rd245, %r463; 2026-02-21T09:31:52.4442487Z shl.b64 %rd246, %rd245, 32; 2026-02-21T09:31:52.4442665Z or.b64 %rd247, %rd244, %rd246; 2026-02-21T09:31:52.4442982Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4443336Z mov.b64 {%r801, %r802}, %rd247; 2026-02-21T09:31:52.4443522Z cvt.rn.f16x2.f32 %r803, %r802, %r801; 2026-02-21T09:31:52.4443861Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4444198Z cvt.u64.u32 %rd248, %r464; 2026-02-21T09:31:52.4444375Z cvt.u64.u32 %rd249, %r465; 2026-02-21T09:31:52.4444553Z shl.b64 %rd250, %rd249, 32; 2026-02-21T09:31:52.4444782Z or.b64 %rd251, %rd248, 
%rd250; 2026-02-21T09:31:52.4445109Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4445414Z mov.b64 {%r804, %r805}, %rd251; 2026-02-21T09:31:52.4445594Z cvt.rn.f16x2.f32 %r806, %r805, %r804; 2026-02-21T09:31:52.4445881Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4446190Z cvt.u64.u32 %rd252, %r466; 2026-02-21T09:31:52.4446351Z cvt.u64.u32 %rd253, %r467; 2026-02-21T09:31:52.4446521Z shl.b64 %rd254, %rd253, 32; 2026-02-21T09:31:52.4446763Z or.b64 %rd255, %rd252, %rd254; 2026-02-21T09:31:52.4447047Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4447358Z mov.b64 {%r807, %r808}, %rd255; 2026-02-21T09:31:52.4447531Z cvt.rn.f16x2.f32 %r809, %r808, %r807; 2026-02-21T09:31:52.4447835Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4448133Z cvt.u64.u32 %rd256, %r468; 2026-02-21T09:31:52.4448303Z cvt.u64.u32 %rd257, %r469; 2026-02-21T09:31:52.4448485Z shl.b64 %rd258, %rd257, 32; 2026-02-21T09:31:52.4448653Z or.b64 %rd259, %rd256, %rd258; 2026-02-21T09:31:52.4448946Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4449243Z mov.b64 {%r810, %r811}, %rd259; 2026-02-21T09:31:52.4449456Z cvt.rn.f16x2.f32 %r812, %r811, %r810; 2026-02-21T09:31:52.4449774Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4450087Z cvt.u64.u32 %rd260, %r471; 2026-02-21T09:31:52.4450246Z cvt.u64.u32 %rd261, %r472; 2026-02-21T09:31:52.4450411Z shl.b64 %rd262, %rd261, 32; 2026-02-21T09:31:52.4450579Z or.b64 %rd263, %rd260, %rd262; 2026-02-21T09:31:52.4450855Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4451158Z mov.b64 {%r813, %r814}, %rd263; 2026-02-21T09:31:52.4451329Z cvt.rn.f16x2.f32 %r815, %r814, %r813; 2026-02-21T09:31:52.4451622Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4451921Z cvt.u64.u32 %rd264, %r473; 2026-02-21T09:31:52.4452087Z cvt.u64.u32 %rd265, %r474; 2026-02-21T09:31:52.4452253Z shl.b64 %rd266, %rd265, 32; 2026-02-21T09:31:52.4452418Z or.b64 %rd267, %rd264, %rd266; 2026-02-21T09:31:52.4452702Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4452999Z mov.b64 {%r816, %r817}, %rd267; 2026-02-21T09:31:52.4453177Z cvt.rn.f16x2.f32 %r818, %r817, %r816; 2026-02-21T09:31:52.4453464Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4453763Z cvt.u64.u32 %rd268, %r475; 2026-02-21T09:31:52.4453919Z cvt.u64.u32 %rd269, %r476; 2026-02-21T09:31:52.4454083Z shl.b64 %rd270, %rd269, 32; 2026-02-21T09:31:52.4454251Z or.b64 %rd271, %rd268, %rd270; 2026-02-21T09:31:52.4454421Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4454483Z mov.b64 {%r819, %r820}, %rd271; 2026-02-21T09:31:52.4454548Z cvt.rn.f16x2.f32 %r821, %r820, %r819; 2026-02-21T09:31:52.4454772Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4454835Z cvt.u64.u32 %rd272, %r477; 2026-02-21T09:31:52.4454897Z cvt.u64.u32 %rd273, %r478; 2026-02-21T09:31:52.4454967Z shl.b64 %rd274, %rd273, 32; 2026-02-21T09:31:52.4455028Z or.b64 %rd275, %rd272, %rd274; 2026-02-21T09:31:52.4455202Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4455273Z mov.b64 {%r822, %r823}, %rd275; 2026-02-21T09:31:52.4455338Z cvt.rn.f16x2.f32 %r824, %r823, %r822; 2026-02-21T09:31:52.4455514Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4455576Z cvt.u64.u32 %rd276, %r479; 2026-02-21T09:31:52.4455643Z cvt.u64.u32 %rd277, %r480; 2026-02-21T09:31:52.4455704Z shl.b64 %rd278, %rd277, 32; 2026-02-21T09:31:52.4455764Z or.b64 %rd279, %rd276, 
%rd278; 2026-02-21T09:31:52.4455946Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4456008Z mov.b64 {%r825, %r826}, %rd279; 2026-02-21T09:31:52.4456114Z cvt.rn.f16x2.f32 %r827, %r826, %r825; 2026-02-21T09:31:52.4456328Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4456389Z cvt.u64.u32 %rd280, %r481; 2026-02-21T09:31:52.4456449Z cvt.u64.u32 %rd281, %r482; 2026-02-21T09:31:52.4456509Z shl.b64 %rd282, %rd281, 32; 2026-02-21T09:31:52.4456591Z or.b64 %rd283, %rd280, %rd282; 2026-02-21T09:31:52.4456763Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4456823Z mov.b64 {%r828, %r829}, %rd283; 2026-02-21T09:31:52.4456899Z cvt.rn.f16x2.f32 %r830, %r829, %r828; 2026-02-21T09:31:52.4457070Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4457132Z cvt.u64.u32 %rd284, %r483; 2026-02-21T09:31:52.4457200Z cvt.u64.u32 %rd285, %r484; 2026-02-21T09:31:52.4457302Z shl.b64 %rd286, %rd285, 32; 2026-02-21T09:31:52.4457391Z or.b64 %rd287, %rd284, %rd286; 2026-02-21T09:31:52.4457575Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4457648Z mov.b64 {%r831, %r832}, %rd287; 2026-02-21T09:31:52.4457715Z cvt.rn.f16x2.f32 %r833, %r832, %r831; 2026-02-21T09:31:52.4457892Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4457967Z cvt.u64.u32 %rd288, %r485; 2026-02-21T09:31:52.4458031Z cvt.u64.u32 %rd289, %r486; 2026-02-21T09:31:52.4458097Z shl.b64 %rd290, %rd289, 32; 2026-02-21T09:31:52.4458170Z or.b64 %rd291, %rd288, %rd290; 2026-02-21T09:31:52.4458353Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4458416Z mov.b64 {%r834, %r835}, %rd291; 2026-02-21T09:31:52.4458485Z cvt.rn.f16x2.f32 %r836, %r835, %r834; 2026-02-21T09:31:52.4458674Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4458741Z cvt.u64.u32 %rd292, %r488; 2026-02-21T09:31:52.4458801Z cvt.u64.u32 %rd293, %r489; 2026-02-21T09:31:52.4458871Z shl.b64 %rd294, %rd293, 32; 2026-02-21T09:31:52.4458932Z or.b64 %rd295, %rd292, %rd294; 2026-02-21T09:31:52.4459106Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4459176Z mov.b64 {%r837, %r838}, %rd295; 2026-02-21T09:31:52.4459241Z cvt.rn.f16x2.f32 %r839, %r838, %r837; 2026-02-21T09:31:52.4459418Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4459478Z cvt.u64.u32 %rd296, %r490; 2026-02-21T09:31:52.4459546Z cvt.u64.u32 %rd297, %r491; 2026-02-21T09:31:52.4459606Z shl.b64 %rd298, %rd297, 32; 2026-02-21T09:31:52.4459666Z or.b64 %rd299, %rd296, %rd298; 2026-02-21T09:31:52.4459851Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4459914Z mov.b64 {%r840, %r841}, %rd299; 2026-02-21T09:31:52.4459982Z cvt.rn.f16x2.f32 %r842, %r841, %r840; 2026-02-21T09:31:52.4460167Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4460227Z cvt.u64.u32 %rd300, %r492; 2026-02-21T09:31:52.4460287Z cvt.u64.u32 %rd301, %r493; 2026-02-21T09:31:52.4460347Z shl.b64 %rd302, %rd301, 32; 2026-02-21T09:31:52.4460418Z or.b64 %rd303, %rd300, %rd302; 2026-02-21T09:31:52.4460596Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4460657Z mov.b64 {%r843, %r844}, %rd303; 2026-02-21T09:31:52.4460731Z cvt.rn.f16x2.f32 %r845, %r844, %r843; 2026-02-21T09:31:52.4460908Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4460972Z cvt.u64.u32 %rd304, %r494; 2026-02-21T09:31:52.4461042Z cvt.u64.u32 %rd305, %r495; 2026-02-21T09:31:52.4461127Z shl.b64 %rd306, %rd305, 32; 2026-02-21T09:31:52.4461236Z or.b64 %rd307, %rd304, 
%rd306; 2026-02-21T09:31:52.4461413Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4461481Z mov.b64 {%r846, %r847}, %rd307; 2026-02-21T09:31:52.4461548Z cvt.rn.f16x2.f32 %r848, %r847, %r846; 2026-02-21T09:31:52.4461727Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4461798Z cvt.u64.u32 %rd308, %r496; 2026-02-21T09:31:52.4461857Z cvt.u64.u32 %rd309, %r497; 2026-02-21T09:31:52.4461918Z shl.b64 %rd310, %rd309, 32; 2026-02-21T09:31:52.4461986Z or.b64 %rd311, %rd308, %rd310; 2026-02-21T09:31:52.4462169Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4462254Z mov.b64 {%r849, %r850}, %rd311; 2026-02-21T09:31:52.4462346Z cvt.rn.f16x2.f32 %r851, %r850, %r849; 2026-02-21T09:31:52.4462531Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4462593Z cvt.u64.u32 %rd312, %r498; 2026-02-21T09:31:52.4462652Z cvt.u64.u32 %rd313, %r499; 2026-02-21T09:31:52.4462722Z shl.b64 %rd314, %rd313, 32; 2026-02-21T09:31:52.4462785Z or.b64 %rd315, %rd312, %rd314; 2026-02-21T09:31:52.4462962Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4463034Z mov.b64 {%r852, %r853}, %rd315; 2026-02-21T09:31:52.4463100Z cvt.rn.f16x2.f32 %r854, %r853, %r852; 2026-02-21T09:31:52.4463274Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4463335Z cvt.u64.u32 %rd316, %r500; 2026-02-21T09:31:52.4463401Z cvt.u64.u32 %rd317, %r501; 2026-02-21T09:31:52.4463462Z shl.b64 %rd318, %rd317, 32; 2026-02-21T09:31:52.4463526Z or.b64 %rd319, %rd316, %rd318; 2026-02-21T09:31:52.4463709Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4463771Z mov.b64 {%r855, %r856}, %rd319; 2026-02-21T09:31:52.4463835Z cvt.rn.f16x2.f32 %r857, %r856, %r855; 2026-02-21T09:31:52.4464016Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4464076Z cvt.u64.u32 %rd320, %r502; 2026-02-21T09:31:52.4464135Z cvt.u64.u32 %rd321, %r503; 2026-02-21T09:31:52.4464195Z shl.b64 %rd322, %rd321, 32; 2026-02-21T09:31:52.4464264Z or.b64 %rd323, %rd320, %rd322; 2026-02-21T09:31:52.4464439Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4464499Z mov.b64 {%r858, %r859}, %rd323; 2026-02-21T09:31:52.4464571Z cvt.rn.f16x2.f32 %r860, %r859, %r858; 2026-02-21T09:31:52.4464817Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4464881Z cvt.u64.u32 %rd324, %r505; 2026-02-21T09:31:52.4464950Z cvt.u64.u32 %rd325, %r506; 2026-02-21T09:31:52.4465011Z shl.b64 %rd326, %rd325, 32; 2026-02-21T09:31:52.4465072Z or.b64 %rd327, %rd324, %rd326; 2026-02-21T09:31:52.4465251Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4465319Z mov.b64 {%r861, %r862}, %rd327; 2026-02-21T09:31:52.4465384Z cvt.rn.f16x2.f32 %r863, %r862, %r861; 2026-02-21T09:31:52.4465563Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4465636Z cvt.u64.u32 %rd328, %r507; 2026-02-21T09:31:52.4465696Z cvt.u64.u32 %rd329, %r508; 2026-02-21T09:31:52.4465757Z shl.b64 %rd330, %rd329, 32; 2026-02-21T09:31:52.4465826Z or.b64 %rd331, %rd328, %rd330; 2026-02-21T09:31:52.4466003Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4466066Z mov.b64 {%r864, %r865}, %rd331; 2026-02-21T09:31:52.4466209Z cvt.rn.f16x2.f32 %r866, %r865, %r864; 2026-02-21T09:31:52.4466394Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4466455Z cvt.u64.u32 %rd332, %r509; 2026-02-21T09:31:52.4466513Z cvt.u64.u32 %rd333, %r510; 2026-02-21T09:31:52.4466581Z shl.b64 %rd334, %rd333, 32; 2026-02-21T09:31:52.4466642Z or.b64 %rd335, %rd332, 
%rd334; 2026-02-21T09:31:52.4466815Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4466883Z mov.b64 {%r867, %r868}, %rd335; 2026-02-21T09:31:52.4466947Z cvt.rn.f16x2.f32 %r869, %r868, %r867; 2026-02-21T09:31:52.4467122Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4467180Z cvt.u64.u32 %rd336, %r511; 2026-02-21T09:31:52.4467280Z cvt.u64.u32 %rd337, %r512; 2026-02-21T09:31:52.4467369Z shl.b64 %rd338, %rd337, 32; 2026-02-21T09:31:52.4467435Z or.b64 %rd339, %rd336, %rd338; 2026-02-21T09:31:52.4467630Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4467695Z mov.b64 {%r870, %r871}, %rd339; 2026-02-21T09:31:52.4467764Z cvt.rn.f16x2.f32 %r872, %r871, %r870; 2026-02-21T09:31:52.4467956Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4468018Z cvt.u64.u32 %rd340, %r513; 2026-02-21T09:31:52.4468079Z cvt.u64.u32 %rd341, %r514; 2026-02-21T09:31:52.4468139Z shl.b64 %rd342, %rd341, 32; 2026-02-21T09:31:52.4468209Z or.b64 %rd343, %rd340, %rd342; 2026-02-21T09:31:52.4468387Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4468447Z mov.b64 {%r873, %r874}, %rd343; 2026-02-21T09:31:52.4468523Z cvt.rn.f16x2.f32 %r875, %r874, %r873; 2026-02-21T09:31:52.4468703Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4468766Z cvt.u64.u32 %rd344, %r515; 2026-02-21T09:31:52.4468832Z cvt.u64.u32 %rd345, %r516; 2026-02-21T09:31:52.4468893Z shl.b64 %rd346, %rd345, 32; 2026-02-21T09:31:52.4468954Z or.b64 %rd347, %rd344, %rd346; 2026-02-21T09:31:52.4469130Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4469199Z mov.b64 {%r876, %r877}, %rd347; 2026-02-21T09:31:52.4469262Z cvt.rn.f16x2.f32 %r878, %r877, %r876; 2026-02-21T09:31:52.4469442Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4469509Z cvt.u64.u32 %rd348, %r517; 2026-02-21T09:31:52.4469568Z cvt.u64.u32 %rd349, %r518; 2026-02-21T09:31:52.4469627Z shl.b64 %rd350, %rd349, 32; 2026-02-21T09:31:52.4469701Z or.b64 %rd351, %rd348, %rd350; 2026-02-21T09:31:52.4469878Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4469940Z mov.b64 {%r879, %r880}, %rd351; 2026-02-21T09:31:52.4470004Z cvt.rn.f16x2.f32 %r881, %r880, %r879; 2026-02-21T09:31:52.4470187Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4470246Z cvt.u64.u32 %rd352, %r519; 2026-02-21T09:31:52.4470306Z cvt.u64.u32 %rd353, %r520; 2026-02-21T09:31:52.4470374Z shl.b64 %rd354, %rd353, 32; 2026-02-21T09:31:52.4470435Z or.b64 %rd355, %rd352, %rd354; 2026-02-21T09:31:52.4470613Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4470680Z mov.b64 {%r882, %r883}, %rd355; 2026-02-21T09:31:52.4470746Z cvt.rn.f16x2.f32 %r884, %r883, %r882; 2026-02-21T09:31:52.4470929Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4470989Z cvt.u64.u32 %rd356, %r522; 2026-02-21T09:31:52.4471084Z cvt.u64.u32 %rd357, %r523; 2026-02-21T09:31:52.4471172Z shl.b64 %rd358, %rd357, 32; 2026-02-21T09:31:52.4471232Z or.b64 %rd359, %rd356, %rd358; 2026-02-21T09:31:52.4471415Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4471476Z mov.b64 {%r885, %r886}, %rd359; 2026-02-21T09:31:52.4471543Z cvt.rn.f16x2.f32 %r887, %r886, %r885; 2026-02-21T09:31:52.4471729Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4471790Z cvt.u64.u32 %rd360, %r524; 2026-02-21T09:31:52.4471849Z cvt.u64.u32 %rd361, %r525; 2026-02-21T09:31:52.4471908Z shl.b64 %rd362, %rd361, 32; 2026-02-21T09:31:52.4471977Z or.b64 %rd363, %rd360, 
%rd362; 2026-02-21T09:31:52.4472153Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4472236Z mov.b64 {%r888, %r889}, %rd363; 2026-02-21T09:31:52.4472334Z cvt.rn.f16x2.f32 %r890, %r889, %r888; 2026-02-21T09:31:52.4472515Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4472574Z cvt.u64.u32 %rd364, %r526; 2026-02-21T09:31:52.4472643Z cvt.u64.u32 %rd365, %r527; 2026-02-21T09:31:52.4472704Z shl.b64 %rd366, %rd365, 32; 2026-02-21T09:31:52.4472763Z or.b64 %rd367, %rd364, %rd366; 2026-02-21T09:31:52.4472937Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4473005Z mov.b64 {%r891, %r892}, %rd367; 2026-02-21T09:31:52.4473069Z cvt.rn.f16x2.f32 %r893, %r892, %r891; 2026-02-21T09:31:52.4473247Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4473314Z cvt.u64.u32 %rd368, %r528; 2026-02-21T09:31:52.4473373Z cvt.u64.u32 %rd369, %r529; 2026-02-21T09:31:52.4473434Z shl.b64 %rd370, %rd369, 32; 2026-02-21T09:31:52.4473506Z or.b64 %rd371, %rd368, %rd370; 2026-02-21T09:31:52.4473682Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4473745Z mov.b64 {%r894, %r895}, %rd371; 2026-02-21T09:31:52.4473809Z cvt.rn.f16x2.f32 %r896, %r895, %r894; 2026-02-21T09:31:52.4473991Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4474050Z cvt.u64.u32 %rd372, %r530; 2026-02-21T09:31:52.4474118Z cvt.u64.u32 %rd373, %r531; 2026-02-21T09:31:52.4474189Z shl.b64 %rd374, %rd373, 32; 2026-02-21T09:31:52.4474248Z or.b64 %rd375, %rd372, %rd374; 2026-02-21T09:31:52.4474425Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4474492Z mov.b64 {%r897, %r898}, %rd375; 2026-02-21T09:31:52.4474555Z cvt.rn.f16x2.f32 %r899, %r898, %r897; 2026-02-21T09:31:52.4474776Z .loc 1 46 52 
// czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4474839Z cvt.u64.u32 %rd376, %r532; 2026-02-21T09:31:52.4474908Z cvt.u64.u32 %rd377, %r533; 2026-02-21T09:31:52.4474968Z shl.b64 %rd378, %rd377, 32; 2026-02-21T09:31:52.4475028Z or.b64 %rd379, %rd376, %rd378; 2026-02-21T09:31:52.4475212Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4475273Z mov.b64 {%r900, %r901}, %rd379; 2026-02-21T09:31:52.4475336Z cvt.rn.f16x2.f32 %r902, %r901, %r900; 2026-02-21T09:31:52.4475524Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4475587Z cvt.u64.u32 %rd380, %r534; 2026-02-21T09:31:52.4475647Z cvt.u64.u32 %rd381, %r535; 2026-02-21T09:31:52.4475705Z shl.b64 %rd382, %rd381, 32; 2026-02-21T09:31:52.4475775Z or.b64 %rd383, %rd380, %rd382; 2026-02-21T09:31:52.4475951Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4476047Z mov.b64 {%r903, %r904}, %rd383; 2026-02-21T09:31:52.4476152Z cvt.rn.f16x2.f32 %r905, %r904, %r903; 2026-02-21T09:31:52.4476335Z .loc 1 46 52 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:46:52 2026-02-21T09:31:52.4476395Z cvt.u64.u32 %rd384, %r536; 2026-02-21T09:31:52.4476467Z cvt.u64.u32 %rd385, %r537; 2026-02-21T09:31:52.4476528Z shl.b64 %rd386, %rd385, 32; 2026-02-21T09:31:52.4476588Z or.b64 %rd387, %rd384, %rd386; 2026-02-21T09:31:52.4476768Z .loc 1 48 27 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:48:27 2026-02-21T09:31:52.4476840Z mov.b64 {%r906, %r907}, %rd387; 2026-02-21T09:31:52.4476906Z cvt.rn.f16x2.f32 %r908, %r907, %r906; 2026-02-21T09:31:52.4477088Z .loc 1 49 82 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:49:82 2026-02-21T09:31:52.4578971Z st.shared.v4.b32 [%r17], {%r719, %r731, %r743, %r755}; 2026-02-21T09:31:52.4579782Z st.shared.v4.b32 [%r18], {%r767, %r779, %r791, %r803}; 2026-02-21T09:31:52.4579883Z bar.sync 0; 
2026-02-21T09:31:52.4579997Z // begin inline asm 2026-02-21T09:31:52.4580179Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r619, %r623, %r627, %r631}, [%r543]; 2026-02-21T09:31:52.4580242Z // end inline asm 2026-02-21T09:31:52.4580303Z // begin inline asm 2026-02-21T09:31:52.4580476Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r635, %r639, %r643, %r647}, [%r548]; 2026-02-21T09:31:52.4580539Z // end inline asm 2026-02-21T09:31:52.4580602Z bar.sync 0; 2026-02-21T09:31:52.4580710Z st.shared.v4.b32 [%r17], {%r815, %r827, %r839, %r851}; 2026-02-21T09:31:52.4580801Z st.shared.v4.b32 [%r18], {%r863, %r875, %r887, %r899}; 2026-02-21T09:31:52.4580855Z bar.sync 0; 2026-02-21T09:31:52.4580916Z // begin inline asm 2026-02-21T09:31:52.4581073Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r651, %r655, %r659, %r663}, [%r543]; 2026-02-21T09:31:52.4581133Z // end inline asm 2026-02-21T09:31:52.4581193Z // begin inline asm 2026-02-21T09:31:52.4581351Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r667, %r671, %r675, %r679}, [%r548]; 2026-02-21T09:31:52.4581411Z // end inline asm 2026-02-21T09:31:52.4581468Z bar.sync 0; 2026-02-21T09:31:52.4581566Z st.shared.v4.b32 [%r17], {%r722, %r734, %r746, %r758}; 2026-02-21T09:31:52.4581655Z st.shared.v4.b32 [%r18], {%r770, %r782, %r794, %r806}; 2026-02-21T09:31:52.4581710Z bar.sync 0; 2026-02-21T09:31:52.4581769Z // begin inline asm 2026-02-21T09:31:52.4581928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r620, %r624, %r628, %r632}, [%r543]; 2026-02-21T09:31:52.4581984Z // end inline asm 2026-02-21T09:31:52.4582043Z // begin inline asm 2026-02-21T09:31:52.4582196Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r636, %r640, %r644, %r648}, [%r548]; 2026-02-21T09:31:52.4582253Z // end inline asm 2026-02-21T09:31:52.4582309Z bar.sync 0; 2026-02-21T09:31:52.4582402Z st.shared.v4.b32 [%r17], {%r818, %r830, %r842, %r854}; 2026-02-21T09:31:52.4582499Z st.shared.v4.b32 [%r18], {%r866, %r878, %r890, %r902}; 2026-02-21T09:31:52.4582557Z bar.sync 0; 
2026-02-21T09:31:52.4582617Z // begin inline asm 2026-02-21T09:31:52.4582772Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r652, %r656, %r660, %r664}, [%r543]; 2026-02-21T09:31:52.4582828Z // end inline asm 2026-02-21T09:31:52.4582886Z // begin inline asm 2026-02-21T09:31:52.4583040Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r668, %r672, %r676, %r680}, [%r548]; 2026-02-21T09:31:52.4583096Z // end inline asm 2026-02-21T09:31:52.4583151Z bar.sync 0; 2026-02-21T09:31:52.4583241Z st.shared.v4.b32 [%r17], {%r725, %r737, %r749, %r761}; 2026-02-21T09:31:52.4583336Z st.shared.v4.b32 [%r18], {%r773, %r785, %r797, %r809}; 2026-02-21T09:31:52.4583391Z bar.sync 0; 2026-02-21T09:31:52.4583448Z // begin inline asm 2026-02-21T09:31:52.4583602Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r625, %r629, %r633}, [%r543]; 2026-02-21T09:31:52.4583660Z // end inline asm 2026-02-21T09:31:52.4583719Z // begin inline asm 2026-02-21T09:31:52.4583865Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r641, %r645, %r649}, [%r548]; 2026-02-21T09:31:52.4584052Z // end inline asm 2026-02-21T09:31:52.4584109Z bar.sync 0; 2026-02-21T09:31:52.4584200Z st.shared.v4.b32 [%r17], {%r821, %r833, %r845, %r857}; 2026-02-21T09:31:52.4584298Z st.shared.v4.b32 [%r18], {%r869, %r881, %r893, %r905}; 2026-02-21T09:31:52.4584355Z bar.sync 0; 2026-02-21T09:31:52.4584415Z // begin inline asm 2026-02-21T09:31:52.4584576Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r657, %r661, %r665}, [%r543]; 2026-02-21T09:31:52.4584636Z // end inline asm 2026-02-21T09:31:52.4584798Z // begin inline asm 2026-02-21T09:31:52.4584949Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r669, %r673, %r677, %r681}, [%r548]; 2026-02-21T09:31:52.4585016Z // end inline asm 2026-02-21T09:31:52.4585073Z bar.sync 0; 2026-02-21T09:31:52.4585164Z st.shared.v4.b32 [%r17], {%r728, %r740, %r752, %r764}; 2026-02-21T09:31:52.4585360Z st.shared.v4.b32 [%r18], {%r776, %r788, %r800, %r812}; 2026-02-21T09:31:52.4585422Z bar.sync 0; 
2026-02-21T09:31:52.4585482Z // begin inline asm 2026-02-21T09:31:52.4585626Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r622, %r626, %r630, %r634}, [%r543]; 2026-02-21T09:31:52.4585692Z // end inline asm 2026-02-21T09:31:52.4585751Z // begin inline asm 2026-02-21T09:31:52.4585895Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r638, %r642, %r646, %r650}, [%r548]; 2026-02-21T09:31:52.4585961Z // end inline asm 2026-02-21T09:31:52.4586017Z bar.sync 0; 2026-02-21T09:31:52.4586107Z st.shared.v4.b32 [%r17], {%r824, %r836, %r848, %r860}; 2026-02-21T09:31:52.4586202Z st.shared.v4.b32 [%r18], {%r872, %r884, %r896, %r908}; 2026-02-21T09:31:52.4586258Z bar.sync 0; 2026-02-21T09:31:52.4586315Z // begin inline asm 2026-02-21T09:31:52.4586460Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r654, %r658, %r662, %r666}, [%r543]; 2026-02-21T09:31:52.4586525Z // end inline asm 2026-02-21T09:31:52.4586586Z // begin inline asm 2026-02-21T09:31:52.4586732Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r670, %r674, %r678, %r682}, [%r548]; 2026-02-21T09:31:52.4586800Z // end inline asm 2026-02-21T09:31:52.4586861Z // begin inline asm 2026-02-21T09:31:52.4586974Z st.global.v4.b32 [ %rd116 + 0 ], { %r619, %r620, %r621, %r622 }; 2026-02-21T09:31:52.4587032Z // end inline asm 2026-02-21T09:31:52.4587098Z // begin inline asm 2026-02-21T09:31:52.4587209Z st.global.v4.b32 [ %rd117 + 0 ], { %r623, %r624, %r625, %r626 }; 2026-02-21T09:31:52.4587267Z // end inline asm 2026-02-21T09:31:52.4587334Z // begin inline asm 2026-02-21T09:31:52.4587436Z st.global.v4.b32 [ %rd118 + 0 ], { %r627, %r628, %r629, %r630 }; 2026-02-21T09:31:52.4587492Z // end inline asm 2026-02-21T09:31:52.4587551Z // begin inline asm 2026-02-21T09:31:52.4587658Z st.global.v4.b32 [ %rd119 + 0 ], { %r631, %r632, %r633, %r634 }; 2026-02-21T09:31:52.4587717Z // end inline asm 2026-02-21T09:31:52.4587778Z // begin inline asm 2026-02-21T09:31:52.4587891Z st.global.v4.b32 [ %rd120 + 0 ], { %r635, %r636, %r637, %r638 }; 
2026-02-21T09:31:52.4587953Z // end inline asm 2026-02-21T09:31:52.4588016Z // begin inline asm 2026-02-21T09:31:52.4588127Z st.global.v4.b32 [ %rd121 + 0 ], { %r639, %r640, %r641, %r642 }; 2026-02-21T09:31:52.4588187Z // end inline asm 2026-02-21T09:31:52.4588249Z // begin inline asm 2026-02-21T09:31:52.4588353Z st.global.v4.b32 [ %rd122 + 0 ], { %r643, %r644, %r645, %r646 }; 2026-02-21T09:31:52.4588423Z // end inline asm 2026-02-21T09:31:52.4588485Z // begin inline asm 2026-02-21T09:31:52.4588592Z st.global.v4.b32 [ %rd123 + 0 ], { %r647, %r648, %r649, %r650 }; 2026-02-21T09:31:52.4588660Z // end inline asm 2026-02-21T09:31:52.4588723Z // begin inline asm 2026-02-21T09:31:52.4588827Z st.global.v4.b32 [ %rd124 + 0 ], { %r651, %r652, %r653, %r654 }; 2026-02-21T09:31:52.4588888Z // end inline asm 2026-02-21T09:31:52.4588958Z // begin inline asm 2026-02-21T09:31:52.4589062Z st.global.v4.b32 [ %rd125 + 0 ], { %r655, %r656, %r657, %r658 }; 2026-02-21T09:31:52.4589122Z // end inline asm 2026-02-21T09:31:52.4589228Z // begin inline asm 2026-02-21T09:31:52.4589365Z st.global.v4.b32 [ %rd126 + 0 ], { %r659, %r660, %r661, %r662 }; 2026-02-21T09:31:52.4589425Z // end inline asm 2026-02-21T09:31:52.4589492Z // begin inline asm 2026-02-21T09:31:52.4589596Z st.global.v4.b32 [ %rd127 + 0 ], { %r663, %r664, %r665, %r666 }; 2026-02-21T09:31:52.4589657Z // end inline asm 2026-02-21T09:31:52.4589717Z // begin inline asm 2026-02-21T09:31:52.4589826Z st.global.v4.b32 [ %rd128 + 0 ], { %r667, %r668, %r669, %r670 }; 2026-02-21T09:31:52.4589886Z // end inline asm 2026-02-21T09:31:52.4589946Z // begin inline asm 2026-02-21T09:31:52.4590060Z st.global.v4.b32 [ %rd129 + 0 ], { %r671, %r672, %r673, %r674 }; 2026-02-21T09:31:52.4590120Z // end inline asm 2026-02-21T09:31:52.4590181Z // begin inline asm 2026-02-21T09:31:52.4590286Z st.global.v4.b32 [ %rd130 + 0 ], { %r675, %r676, %r677, %r678 }; 2026-02-21T09:31:52.4590381Z // end inline asm 2026-02-21T09:31:52.4590481Z // begin inline asm 
2026-02-21T09:31:52.4590590Z st.global.v4.b32 [ %rd131 + 0 ], { %r679, %r680, %r681, %r682 }; 2026-02-21T09:31:52.4590661Z // end inline asm 2026-02-21T09:31:52.4590755Z $L__BB0_8: // %._crit_edge 2026-02-21T09:31:52.4590981Z .loc 1 20 4 // czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py:20:4 2026-02-21T09:31:52.4591049Z bar.sync 0; 2026-02-21T09:31:52.4591109Z // begin inline asm 2026-02-21T09:31:52.4591270Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r909, 128; 2026-02-21T09:31:52.4591330Z // end inline asm 2026-02-21T09:31:52.4591395Z ret; 2026-02-21T09:31:52.4591456Z $L__tmp0: 2026-02-21T09:31:52.4591518Z $L__func_end0: 2026-02-21T09:31:52.4591624Z // -- End function 2026-02-21T09:31:52.4591690Z } 2026-02-21T09:31:52.4591948Z .file 1 "/tmp/torchinductor_root/zc/czcfn5w56jsjj7zcdbwal2h2la4blfkzsahspgimoeb7ctkwqqr4.py" 2026-02-21T09:31:52.4592019Z .section .debug_abbrev 2026-02-21T09:31:52.4592086Z { 2026-02-21T09:31:52.4592185Z .b8 1 // Abbreviation Code 2026-02-21T09:31:52.4592284Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:52.4592379Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:52.4592470Z .b8 37 // DW_AT_producer 2026-02-21T09:31:52.4592551Z .b8 8 // DW_FORM_string 2026-02-21T09:31:52.4592638Z .b8 19 // DW_AT_language 2026-02-21T09:31:52.4592720Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:52.4592803Z .b8 3 // DW_AT_name 2026-02-21T09:31:52.4592882Z .b8 8 // DW_FORM_string 2026-02-21T09:31:52.4592975Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:52.4593057Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:52.4593140Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:52.4593230Z .b8 8 // DW_FORM_string 2026-02-21T09:31:52.4593309Z .b8 0 // EOM(1) 2026-02-21T09:31:52.4593385Z .b8 0 // EOM(2) 2026-02-21T09:31:52.4593462Z .b8 0 // EOM(3) 2026-02-21T09:31:52.4593516Z } 2026-02-21T09:31:52.4593582Z .section .debug_info 2026-02-21T09:31:52.4593637Z { 2026-02-21T09:31:52.4593737Z .b32 104 // Length of Unit 2026-02-21T09:31:52.4593831Z .b8 2 // DWARF version number 
2026-02-21T09:31:52.4593885Z .b8 0 2026-02-21T09:31:52.4594020Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:31:52.4594118Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:52.4594228Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:52.4594324Z .b8 116 // DW_AT_producer 2026-02-21T09:31:52.4594440Z .b8 114 2026-02-21T09:31:52.4594499Z .b8 105 2026-02-21T09:31:52.4594555Z .b8 116 2026-02-21T09:31:52.4594619Z .b8 111 2026-02-21T09:31:52.4594827Z .b8 110 2026-02-21T09:31:52.4594883Z .b8 0 2026-02-21T09:31:52.4594985Z .b8 2 // DW_AT_language 2026-02-21T09:31:52.4595038Z .b8 0 2026-02-21T09:31:52.4595115Z .b8 99 // DW_AT_name 2026-02-21T09:31:52.4595169Z .b8 122 2026-02-21T09:31:52.4595230Z .b8 99 2026-02-21T09:31:52.4595283Z .b8 102 2026-02-21T09:31:52.4595335Z .b8 110 2026-02-21T09:31:52.4595395Z .b8 53 2026-02-21T09:31:52.4595448Z .b8 119 2026-02-21T09:31:52.4595499Z .b8 53 2026-02-21T09:31:52.4595552Z .b8 54 2026-02-21T09:31:52.4595614Z .b8 106 2026-02-21T09:31:52.4595668Z .b8 115 2026-02-21T09:31:52.4595720Z .b8 106 2026-02-21T09:31:52.4595773Z .b8 106 2026-02-21T09:31:52.4595835Z .b8 55 2026-02-21T09:31:52.4595927Z .b8 122 2026-02-21T09:31:52.4596010Z .b8 99 2026-02-21T09:31:52.4596071Z .b8 100 2026-02-21T09:31:52.4596123Z .b8 98 2026-02-21T09:31:52.4596178Z .b8 119 2026-02-21T09:31:52.4596230Z .b8 97 2026-02-21T09:31:52.4596294Z .b8 108 2026-02-21T09:31:52.4596346Z .b8 50 2026-02-21T09:31:52.4596399Z .b8 104 2026-02-21T09:31:52.4596461Z .b8 50 2026-02-21T09:31:52.4596516Z .b8 108 2026-02-21T09:31:52.4596570Z .b8 97 2026-02-21T09:31:52.4596623Z .b8 52 2026-02-21T09:31:52.4596686Z .b8 98 2026-02-21T09:31:52.4596739Z .b8 108 2026-02-21T09:31:52.4596795Z .b8 102 2026-02-21T09:31:52.4596849Z .b8 107 2026-02-21T09:31:52.4596914Z .b8 122 2026-02-21T09:31:52.4596971Z .b8 115 2026-02-21T09:31:52.4597026Z .b8 97 2026-02-21T09:31:52.4597094Z .b8 104 2026-02-21T09:31:52.4597151Z .b8 115 2026-02-21T09:31:52.4597205Z .b8 112 
2026-02-21T09:31:52.4597257Z .b8 103 2026-02-21T09:31:52.4597318Z .b8 105 2026-02-21T09:31:52.4597370Z .b8 109 2026-02-21T09:31:52.4597421Z .b8 111 2026-02-21T09:31:52.4597479Z .b8 101 2026-02-21T09:31:52.4597532Z .b8 98 2026-02-21T09:31:52.4597585Z .b8 55 2026-02-21T09:31:52.4597640Z .b8 99 2026-02-21T09:31:52.4597704Z .b8 116 2026-02-21T09:31:52.4597757Z .b8 107 2026-02-21T09:31:52.4597810Z .b8 119 2026-02-21T09:31:52.4597870Z .b8 113 2026-02-21T09:31:52.4597924Z .b8 113 2026-02-21T09:31:52.4597975Z .b8 114 2026-02-21T09:31:52.4598027Z .b8 52 2026-02-21T09:31:52.4598087Z .b8 46 2026-02-21T09:31:52.4598139Z .b8 112 2026-02-21T09:31:52.4598192Z .b8 121 2026-02-21T09:31:52.4598242Z .b8 0 2026-02-21T09:31:52.4598346Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:52.4598426Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:52.4598478Z .b8 116 2026-02-21T09:31:52.4598537Z .b8 109 2026-02-21T09:31:52.4598589Z .b8 112 2026-02-21T09:31:52.4598640Z .b8 47 2026-02-21T09:31:52.4598691Z .b8 116 2026-02-21T09:31:52.4598752Z .b8 111 2026-02-21T09:31:52.4598805Z .b8 114 2026-02-21T09:31:52.4598856Z .b8 99 2026-02-21T09:31:52.4598916Z .b8 104 2026-02-21T09:31:52.4598969Z .b8 105 2026-02-21T09:31:52.4599023Z .b8 110 2026-02-21T09:31:52.4599077Z .b8 100 2026-02-21T09:31:52.4599136Z .b8 117 2026-02-21T09:31:52.4599190Z .b8 99 2026-02-21T09:31:52.4599240Z .b8 116 2026-02-21T09:31:52.4599300Z .b8 111 2026-02-21T09:31:52.4599351Z .b8 114 2026-02-21T09:31:52.4599404Z .b8 95 2026-02-21T09:31:52.4599454Z .b8 114 2026-02-21T09:31:52.4599512Z .b8 111 2026-02-21T09:31:52.4599563Z .b8 111 2026-02-21T09:31:52.4599617Z .b8 116 2026-02-21T09:31:52.4599674Z .b8 47 2026-02-21T09:31:52.4599727Z .b8 122 2026-02-21T09:31:52.4599779Z .b8 99 2026-02-21T09:31:52.4599830Z .b8 0 2026-02-21T09:31:52.4599888Z } 2026-02-21T09:31:52.4599959Z .section .debug_macinfo { } 2026-02-21T09:31:52.4599969Z 2026-02-21T09:31:52.4600050Z ================================================================ 
2026-02-21T09:31:52.4600166Z please share the reproducer above with Triton project. 2026-02-21T09:31:52.9778227Z 2026-02-21T09:31:52.9782787Z 2026-02-21T09:31:52.9784466Z 2026-02-21T09:31:52.9785101Z ================================================================ 2026-02-21T09:31:52.9785768Z Internal Triton PTX codegen error 2026-02-21T09:31:52.9786085Z `ptxas` stderr: 2026-02-21T09:31:52.9789086Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 181 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:52.9789763Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:52.9789940Z 2026-02-21T09:31:52.9790425Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjygui7eu.ptx -o /tmp/tmpjygui7eu.ptx.o 2026-02-21T09:31:52.9791025Z 2026-02-21T09:31:52.9791029Z 2026-02-21T09:31:52.9791095Z // 2026-02-21T09:31:52.9791264Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:31:52.9791462Z // 2026-02-21T09:31:52.9791543Z 2026-02-21T09:31:52.9791608Z .version 8.7 2026-02-21T09:31:52.9791769Z .target sm_100a 2026-02-21T09:31:52.9792177Z .address_size 64 2026-02-21T09:31:52.9792283Z 2026-02-21T09:31:52.9792513Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:31:52.9792819Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:31:52.9793075Z // @_helion_matmul 2026-02-21T09:31:52.9793310Z .visible .entry _helion_matmul( 2026-02-21T09:31:52.9793580Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:31:52.9793884Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:31:52.9794165Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:31:52.9794457Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:31:52.9794856Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:31:52.9795095Z 
) 2026-02-21T09:31:52.9795232Z .reqntid 128 2026-02-21T09:31:52.9795386Z .maxnreg 32 2026-02-21T09:31:52.9795523Z { 2026-02-21T09:31:52.9795674Z .reg .pred %p<58>; 2026-02-21T09:31:52.9795852Z .reg .b32 %r<1025>; 2026-02-21T09:31:52.9796018Z .reg .b64 %rd<500>; 2026-02-21T09:31:52.9796186Z $L__func_begin0: 2026-02-21T09:31:52.9796281Z 2026-02-21T09:31:52.9796342Z // %bb.0: 2026-02-21T09:31:52.9796629Z .loc 1 14 0 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:14 2026-02-21T09:31:52.9796953Z mov.u32 %r1, %tid.x; 2026-02-21T09:31:52.9797133Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:31:52.9797318Z mov.b32 %r50, global_smem; 2026-02-21T09:31:52.9797501Z // begin inline asm 2026-02-21T09:31:52.9797783Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r50], 128; 2026-02-21T09:31:52.9798057Z // end inline asm 2026-02-21T09:31:52.9798219Z bar.sync 0; 2026-02-21T09:31:52.9798379Z ld.shared.b32 %r1018, [global_smem]; 2026-02-21T09:31:52.9798582Z bar.sync 0; 2026-02-21T09:31:52.9798726Z // begin inline asm 2026-02-21T09:31:52.9798958Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:31:52.9799256Z // end inline asm 2026-02-21T09:31:52.9799548Z .loc 1 20 46 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:20:46 2026-02-21T09:31:52.9799891Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:31:52.9800194Z .loc 1 20 98 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:20:98 2026-02-21T09:31:52.9800522Z setp.gt.u32 %p3, %r3, 767; 2026-02-21T09:31:52.9800708Z @%p3 bra $L__BB0_8; 2026-02-21T09:31:52.9800889Z // %bb.1: // %.lr.ph 2026-02-21T09:31:52.9801245Z .loc 1 0 98 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:0:98 2026-02-21T09:31:52.9801616Z ld.param.b64 %rd48, [_helion_matmul_param_1]; 2026-02-21T09:31:52.9801872Z ld.param.b64 %rd47, [_helion_matmul_param_0]; 2026-02-21T09:31:52.9802206Z .loc 1 32 45 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:32:45 2026-02-21T09:31:52.9802537Z shr.u32 
%r269, %r1, 3; 2026-02-21T09:31:52.9802717Z bfe.u32 %r4, %r1, 3, 4; 2026-02-21T09:31:52.9802890Z or.b32 %r270, %r4, 48; 2026-02-21T09:31:52.9803130Z or.b32 %r271, %r4, 32; 2026-02-21T09:31:52.9803767Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:31:52.9805195Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=3, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:31:52.9806643Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:31:52.9806918Z `ptxas` stderr: 2026-02-21T09:31:52.9807475Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 181 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:31:52.9808046Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:31:52.9808212Z 2026-02-21T09:31:52.9808671Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjygui7eu.ptx -o /tmp/tmpjygui7eu.ptx.o 2026-02-21T09:31:52.9809217Z 2026-02-21T09:31:52.9809370Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:31:52.9809647Z or.b32 %r272, %r4, 16; 2026-02-21T09:31:52.9809811Z shl.b32 %r273, %r1, 3; 2026-02-21T09:31:52.9809986Z and.b32 %r274, %r273, 56; 2026-02-21T09:31:52.9810165Z shr.u32 %r275, %r1, 5; 2026-02-21T09:31:52.9810334Z setp.eq.b32 %p55, %r1, 0; 2026-02-21T09:31:52.9810501Z shl.b32 %r276, %r1, 4; 2026-02-21T09:31:52.9810671Z and.b32 %r277, %r276, 2032; 2026-02-21T09:31:52.9810843Z shl.b32 %r278, %r1, 1; 2026-02-21T09:31:52.9811011Z and.b32 %r279, %r278, 112; 2026-02-21T09:31:52.9811188Z xor.b32 %r5, %r277, %r279; 2026-02-21T09:31:52.9811359Z add.s32 %r189, %r50, %r5; 2026-02-21T09:31:52.9811541Z add.s32 %r191, %r189, 2048; 2026-02-21T09:31:52.9811711Z add.s32 %r193, %r189, 4096; 2026-02-21T09:31:52.9811885Z add.s32 %r195, %r189, 6144; 2026-02-21T09:31:52.9812050Z add.s32 %r197, %r189, 8192; 2026-02-21T09:31:52.9812227Z add.s32 %r199, %r189, 10240; 2026-02-21T09:31:52.9812399Z add.s32 %r201, %r189, 12288; 2026-02-21T09:31:52.9812579Z add.s32 %r203, %r189, 14336; 2026-02-21T09:31:52.9812748Z add.s32 %r205, %r189, 16384; 2026-02-21T09:31:52.9812924Z add.s32 %r207, %r189, 18432; 2026-02-21T09:31:52.9813093Z add.s32 %r209, %r189, 20480; 2026-02-21T09:31:52.9813258Z add.s32 %r211, %r189, 22528; 2026-02-21T09:31:52.9813435Z add.s32 %r213, %r189, 24576; 2026-02-21T09:31:52.9813605Z add.s32 %r215, %r189, 26624; 2026-02-21T09:31:52.9813783Z add.s32 %r217, %r189, 28672; 2026-02-21T09:31:52.9813949Z add.s32 %r219, %r189, 30720; 2026-02-21T09:31:52.9814123Z add.s32 %r221, %r189, 98304; 2026-02-21T09:31:52.9814293Z add.s32 %r223, %r189, 100352; 2026-02-21T09:31:52.9814480Z add.s32 %r225, %r189, 102400; 2026-02-21T09:31:52.9814657Z add.s32 %r227, %r189, 104448; 2026-02-21T09:31:52.9814880Z add.s32 %r229, %r189, 32768; 2026-02-21T09:31:52.9815054Z add.s32 %r231, %r189, 34816; 2026-02-21T09:31:52.9815217Z add.s32 %r233, %r189, 36864; 2026-02-21T09:31:52.9815389Z add.s32 %r235, %r189, 38912; 2026-02-21T09:31:52.9815552Z add.s32 %r237, %r189, 40960; 
2026-02-21T09:31:52.9815721Z add.s32 %r239, %r189, 43008; 2026-02-21T09:31:52.9815884Z add.s32 %r241, %r189, 45056; 2026-02-21T09:31:52.9816055Z add.s32 %r243, %r189, 47104; 2026-02-21T09:31:52.9816217Z add.s32 %r245, %r189, 49152; 2026-02-21T09:31:52.9816388Z add.s32 %r247, %r189, 51200; 2026-02-21T09:31:52.9816558Z add.s32 %r249, %r189, 53248; 2026-02-21T09:31:52.9816721Z add.s32 %r251, %r189, 55296; 2026-02-21T09:31:52.9816893Z add.s32 %r253, %r189, 57344; 2026-02-21T09:31:52.9817061Z add.s32 %r255, %r189, 59392; 2026-02-21T09:31:52.9817233Z add.s32 %r257, %r189, 61440; 2026-02-21T09:31:52.9817434Z add.s32 %r259, %r189, 63488; 2026-02-21T09:31:52.9817640Z add.s32 %r261, %r189, 106496; 2026-02-21T09:31:52.9817804Z add.s32 %r263, %r189, 108544; 2026-02-21T09:31:52.9817979Z add.s32 %r265, %r189, 110592; 2026-02-21T09:31:52.9818142Z add.s32 %r267, %r189, 112640; 2026-02-21T09:31:52.9818314Z or.b32 %r6, %r274, 128; 2026-02-21T09:31:52.9818482Z add.s32 %r371, %r189, 65536; 2026-02-21T09:31:52.9818644Z add.s32 %r373, %r189, 67584; 2026-02-21T09:31:52.9818812Z add.s32 %r375, %r189, 69632; 2026-02-21T09:31:52.9818974Z add.s32 %r377, %r189, 71680; 2026-02-21T09:31:52.9819143Z add.s32 %r379, %r189, 73728; 2026-02-21T09:31:52.9819304Z add.s32 %r381, %r189, 75776; 2026-02-21T09:31:52.9819471Z add.s32 %r383, %r189, 77824; 2026-02-21T09:31:52.9819632Z add.s32 %r385, %r189, 79872; 2026-02-21T09:31:52.9819802Z add.s32 %r387, %r189, 81920; 2026-02-21T09:31:52.9820004Z add.s32 %r389, %r189, 83968; 2026-02-21T09:31:52.9820203Z add.s32 %r391, %r189, 86016; 2026-02-21T09:31:52.9820377Z add.s32 %r393, %r189, 88064; 2026-02-21T09:31:52.9820543Z add.s32 %r395, %r189, 90112; 2026-02-21T09:31:52.9820712Z add.s32 %r397, %r189, 92160; 2026-02-21T09:31:52.9820875Z add.s32 %r399, %r189, 94208; 2026-02-21T09:31:52.9821044Z add.s32 %r401, %r189, 96256; 2026-02-21T09:31:52.9821207Z add.s32 %r403, %r189, 114688; 2026-02-21T09:31:52.9821383Z add.s32 %r405, %r189, 116736; 
2026-02-21T09:31:52.9821548Z add.s32 %r407, %r189, 118784; 2026-02-21T09:31:52.9821721Z add.s32 %r409, %r189, 120832; 2026-02-21T09:31:52.9821894Z and.b32 %r281, %r276, 1968; 2026-02-21T09:31:52.9822068Z bfe.s32 %r282, %r1, 2, 1; 2026-02-21T09:31:52.9822247Z and.b32 %r283, %r282, 2112; 2026-02-21T09:31:52.9822414Z or.b32 %r284, %r283, %r281; 2026-02-21T09:31:52.9822589Z xor.b32 %r285, %r284, 64; 2026-02-21T09:31:52.9822752Z shl.b32 %r286, %r1, 6; 2026-02-21T09:31:52.9822922Z and.b32 %r287, %r286, 2112; 2026-02-21T09:31:52.9823087Z shl.b32 %r288, %r1, 5; 2026-02-21T09:31:52.9823258Z and.b32 %r289, %r288, 768; 2026-02-21T09:31:52.9823423Z and.b32 %r290, %r273, 48; 2026-02-21T09:31:52.9823594Z and.b32 %r291, %r278, 192; 2026-02-21T09:31:52.9823764Z or.b32 %r292, %r287, %r289; 2026-02-21T09:31:52.9823926Z or.b32 %r293, %r290, %r291; 2026-02-21T09:31:52.9824098Z xor.b32 %r294, %r292, %r293; 2026-02-21T09:31:52.9824262Z add.s32 %r676, %r50, %r294; 2026-02-21T09:31:52.9824562Z .loc 1 31 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:31:27 2026-02-21T09:31:52.9824942Z shl.b32 %r295, %r3, 6; 2026-02-21T09:31:52.9825112Z and.b32 %r31, %r295, 960; 2026-02-21T09:31:52.9825414Z .loc 1 32 32 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:32:32 2026-02-21T09:31:52.9825752Z or.b32 %r296, %r31, %r4; 2026-02-21T09:31:52.9825928Z or.b32 %r297, %r31, %r272; 2026-02-21T09:31:52.9826096Z or.b32 %r298, %r31, %r271; 2026-02-21T09:31:52.9826269Z or.b32 %r299, %r31, %r270; 2026-02-21T09:31:52.9826561Z .loc 1 33 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:33:27 2026-02-21T09:31:52.9826883Z shl.b32 %r300, %r3, 4; 2026-02-21T09:31:52.9827042Z and.b32 %r33, %r300, 16128; 2026-02-21T09:31:52.9827340Z .loc 1 34 32 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:34:32 2026-02-21T09:31:52.9827667Z or.b32 %r301, %r33, %r4; 2026-02-21T09:31:52.9827832Z or.b32 %r302, %r33, %r272; 2026-02-21T09:31:52.9828002Z or.b32 %r303, %r33, %r271; 
2026-02-21T09:31:52.9828162Z or.b32 %r304, %r33, %r270; 2026-02-21T09:31:52.9828330Z or.b32 %r305, %r269, %r33; 2026-02-21T09:31:52.9828491Z or.b32 %r306, %r269, %r300; 2026-02-21T09:31:52.9828783Z .loc 1 44 53 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:53 2026-02-21T09:31:52.9829092Z shl.b32 %r307, %r301, 10; 2026-02-21T09:31:52.9829265Z shl.b32 %r34, %r302, 10; 2026-02-21T09:31:52.9829436Z shl.b32 %r35, %r303, 10; 2026-02-21T09:31:52.9829636Z shl.b32 %r36, %r304, 10; 2026-02-21T09:31:52.9829832Z shl.b32 %r308, %r305, 10; 2026-02-21T09:31:52.9829996Z or.b32 %r309, %r308, 114688; 2026-02-21T09:31:52.9830170Z shl.b32 %r310, %r306, 10; 2026-02-21T09:31:52.9830332Z or.b32 %r311, %r310, 245760; 2026-02-21T09:31:52.9830622Z .loc 1 45 80 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:80 2026-02-21T09:31:52.9830928Z shl.b32 %r312, %r296, 10; 2026-02-21T09:31:52.9831098Z shl.b32 %r313, %r297, 10; 2026-02-21T09:31:52.9831269Z shl.b32 %r314, %r298, 10; 2026-02-21T09:31:52.9831430Z shl.b32 %r315, %r299, 10; 2026-02-21T09:31:52.9831724Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9832074Z shfl.sync.idx.b32 %r37, %r275, 0, 31, -1; 2026-02-21T09:31:52.9832290Z shl.b32 %r316, %r37, 21; 2026-02-21T09:31:52.9832493Z and.b32 %r317, %r316, 6291456; 2026-02-21T09:31:52.9832721Z add.s32 %r671, %r317, %r1018; 2026-02-21T09:31:52.9832908Z mov.pred %p17, -1; 2026-02-21T09:31:52.9833086Z mov.b32 %r1019, 0; 2026-02-21T09:31:52.9833247Z // begin inline asm 2026-02-21T09:31:52.9833710Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 0], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9834202Z // end inline asm 2026-02-21T09:31:52.9834352Z // begin inline asm 2026-02-21T09:31:52.9835028Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 16], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, 
%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9835591Z // end inline asm 2026-02-21T09:31:52.9835776Z // begin inline asm 2026-02-21T09:31:52.9836299Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 32], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9836791Z // end inline asm 2026-02-21T09:31:52.9836949Z // begin inline asm 2026-02-21T09:31:52.9837372Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 48], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9837845Z // end inline asm 2026-02-21T09:31:52.9837994Z // begin inline asm 2026-02-21T09:31:52.9838422Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 64], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9838886Z // end inline asm 2026-02-21T09:31:52.9839034Z // begin inline asm 2026-02-21T09:31:52.9839477Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 80], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9839954Z // end inline asm 2026-02-21T09:31:52.9840111Z // begin inline asm 2026-02-21T09:31:52.9840551Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 96], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9841024Z // end inline asm 2026-02-21T09:31:52.9841178Z // begin inline asm 2026-02-21T09:31:52.9841618Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r671 + 112], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:31:52.9842099Z // end 
inline asm 2026-02-21T09:31:52.9842244Z // begin inline asm 2026-02-21T09:31:52.9842417Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:31:52.9842603Z // end inline asm 2026-02-21T09:31:52.9842751Z bar.sync 0; 2026-02-21T09:31:52.9843029Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9843352Z add.s32 %r1020, %r50, 122880; 2026-02-21T09:31:52.9843531Z // begin inline asm 2026-02-21T09:31:52.9843774Z @%p55 mbarrier.init.shared::cta.b64 [%r1020], 1; 2026-02-21T09:31:52.9844018Z // end inline asm 2026-02-21T09:31:52.9844163Z bar.sync 0; 2026-02-21T09:31:52.9844316Z add.s32 %r188, %r50, 122888; 2026-02-21T09:31:52.9844491Z // begin inline asm 2026-02-21T09:31:52.9844722Z @%p55 mbarrier.init.shared::cta.b64 [%r188], 1; 2026-02-21T09:31:52.9844943Z // end inline asm 2026-02-21T09:31:52.9845216Z .loc 1 44 60 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:60 2026-02-21T09:31:52.9845535Z or.b32 %r318, %r307, %r274; 2026-02-21T09:31:52.9845705Z or.b32 %r319, %r34, %r274; 2026-02-21T09:31:52.9845883Z or.b32 %r320, %r35, %r274; 2026-02-21T09:31:52.9846051Z or.b32 %r321, %r36, %r274; 2026-02-21T09:31:52.9846230Z or.b32 %r322, %r309, %r274; 2026-02-21T09:31:52.9846408Z or.b32 %r323, %r311, %r274; 2026-02-21T09:31:52.9846767Z .loc 1 44 32 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:32 2026-02-21T09:31:52.9847109Z mad.wide.u32 %rd50, %r318, 2, %rd47; 2026-02-21T09:31:52.9847317Z mad.wide.u32 %rd51, %r319, 2, %rd47; 2026-02-21T09:31:52.9847532Z mad.wide.u32 %rd52, %r320, 2, %rd47; 2026-02-21T09:31:52.9847741Z mad.wide.u32 %rd53, %r321, 2, %rd47; 2026-02-21T09:31:52.9847935Z cvt.u64.u32 %rd90, %r274; 2026-02-21T09:31:52.9848107Z cvt.u64.u32 %rd1, %r307; 2026-02-21T09:31:52.9848283Z or.b64 %rd91, %rd1, %rd90; 2026-02-21T09:31:52.9848455Z shl.b64 %rd92, %rd91, 1; 2026-02-21T09:31:52.9848621Z add.s64 %rd2, %rd47, %rd92; 2026-02-21T09:31:52.9848801Z add.s64 %rd54, %rd2, 131072; 2026-02-21T09:31:52.9848971Z 
add.s64 %rd55, %rd2, 163840; 2026-02-21T09:31:52.9849146Z add.s64 %rd56, %rd2, 196608; 2026-02-21T09:31:52.9849318Z mad.wide.u32 %rd57, %r322, 2, %rd47; 2026-02-21T09:31:52.9849506Z add.s64 %rd58, %rd2, 262144; 2026-02-21T09:31:52.9849670Z add.s64 %rd59, %rd2, 294912; 2026-02-21T09:31:52.9849844Z add.s64 %rd60, %rd2, 327680; 2026-02-21T09:31:52.9850013Z add.s64 %rd61, %rd2, 360448; 2026-02-21T09:31:52.9850187Z add.s64 %rd62, %rd2, 393216; 2026-02-21T09:31:52.9850360Z add.s64 %rd63, %rd2, 425984; 2026-02-21T09:31:52.9850524Z add.s64 %rd64, %rd2, 458752; 2026-02-21T09:31:52.9850702Z mad.wide.u32 %rd65, %r323, 2, %rd47; 2026-02-21T09:31:52.9850882Z mov.b32 %r372, 16; 2026-02-21T09:31:52.9851165Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9851473Z // begin inline asm 2026-02-21T09:31:52.9851703Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd50 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9851949Z // end inline asm 2026-02-21T09:31:52.9852103Z // begin inline asm 2026-02-21T09:31:52.9852323Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd51 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9852565Z // end inline asm 2026-02-21T09:31:52.9852719Z // begin inline asm 2026-02-21T09:31:52.9852928Z cp.async.cg.shared.global [ %r193 + 0 ], [ %rd52 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9853180Z // end inline asm 2026-02-21T09:31:52.9853324Z // begin inline asm 2026-02-21T09:31:52.9853539Z cp.async.cg.shared.global [ %r195 + 0 ], [ %rd53 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9853776Z // end inline asm 2026-02-21T09:31:52.9853927Z // begin inline asm 2026-02-21T09:31:52.9854141Z cp.async.cg.shared.global [ %r197 + 0 ], [ %rd54 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9854378Z // end inline asm 2026-02-21T09:31:52.9854528Z // begin inline asm 2026-02-21T09:31:52.9854779Z cp.async.cg.shared.global [ %r199 + 0 ], [ %rd55 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9855030Z // end inline asm 2026-02-21T09:31:52.9855175Z // begin inline asm 
2026-02-21T09:31:52.9855387Z cp.async.cg.shared.global [ %r201 + 0 ], [ %rd56 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9855622Z // end inline asm 2026-02-21T09:31:52.9855776Z // begin inline asm 2026-02-21T09:31:52.9855990Z cp.async.cg.shared.global [ %r203 + 0 ], [ %rd57 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9856236Z // end inline asm 2026-02-21T09:31:52.9856426Z // begin inline asm 2026-02-21T09:31:52.9856665Z cp.async.cg.shared.global [ %r205 + 0 ], [ %rd58 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9856916Z // end inline asm 2026-02-21T09:31:52.9857064Z // begin inline asm 2026-02-21T09:31:52.9857282Z cp.async.cg.shared.global [ %r207 + 0 ], [ %rd59 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9857525Z // end inline asm 2026-02-21T09:31:52.9857682Z // begin inline asm 2026-02-21T09:31:52.9857897Z cp.async.cg.shared.global [ %r209 + 0 ], [ %rd60 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9858141Z // end inline asm 2026-02-21T09:31:52.9858308Z // begin inline asm 2026-02-21T09:31:52.9858553Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd61 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9858852Z // end inline asm 2026-02-21T09:31:52.9859026Z // begin inline asm 2026-02-21T09:31:52.9859274Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd62 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9859540Z // end inline asm 2026-02-21T09:31:52.9859725Z // begin inline asm 2026-02-21T09:31:52.9859934Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd63 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9860189Z // end inline asm 2026-02-21T09:31:52.9860342Z // begin inline asm 2026-02-21T09:31:52.9860549Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd64 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9860804Z // end inline asm 2026-02-21T09:31:52.9860949Z // begin inline asm 2026-02-21T09:31:52.9861164Z cp.async.cg.shared.global [ %r219 + 0 ], [ %rd65 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9861415Z // end inline asm 2026-02-21T09:31:52.9861578Z cp.async.commit_group; 2026-02-21T09:31:52.9861865Z .loc 1 45 59 // 
clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:59 2026-02-21T09:31:52.9862197Z or.b32 %r324, %r312, %r274; 2026-02-21T09:31:52.9862374Z or.b32 %r325, %r313, %r274; 2026-02-21T09:31:52.9862543Z or.b32 %r326, %r314, %r274; 2026-02-21T09:31:52.9862718Z or.b32 %r327, %r315, %r274; 2026-02-21T09:31:52.9863004Z .loc 1 45 34 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:34 2026-02-21T09:31:52.9863333Z mad.wide.u32 %rd66, %r324, 2, %rd48; 2026-02-21T09:31:52.9863524Z mad.wide.u32 %rd67, %r325, 2, %rd48; 2026-02-21T09:31:52.9863718Z mad.wide.u32 %rd68, %r326, 2, %rd48; 2026-02-21T09:31:52.9863908Z mad.wide.u32 %rd69, %r327, 2, %rd48; 2026-02-21T09:31:52.9864209Z .loc 1 45 87 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:87 2026-02-21T09:31:52.9864528Z // begin inline asm 2026-02-21T09:31:52.9864811Z cp.async.cg.shared.global [ %r221 + 0 ], [ %rd66 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9865062Z // end inline asm 2026-02-21T09:31:52.9865210Z // begin inline asm 2026-02-21T09:31:52.9865433Z cp.async.cg.shared.global [ %r223 + 0 ], [ %rd67 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9865670Z // end inline asm 2026-02-21T09:31:52.9865827Z // begin inline asm 2026-02-21T09:31:52.9866048Z cp.async.cg.shared.global [ %r225 + 0 ], [ %rd68 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9866289Z // end inline asm 2026-02-21T09:31:52.9866444Z // begin inline asm 2026-02-21T09:31:52.9866654Z cp.async.cg.shared.global [ %r227 + 0 ], [ %rd69 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9866904Z // end inline asm 2026-02-21T09:31:52.9867059Z cp.async.commit_group; 2026-02-21T09:31:52.9867355Z .loc 1 44 32 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:32 2026-02-21T09:31:52.9867669Z add.s64 %rd70, %rd2, 128; 2026-02-21T09:31:52.9867852Z add.s64 %rd71, %rd51, 128; 2026-02-21T09:31:52.9868032Z add.s64 %rd72, %rd52, 128; 2026-02-21T09:31:52.9868204Z add.s64 %rd73, %rd53, 128; 2026-02-21T09:31:52.9868381Z or.b32 %r328, %r318, 64; 
2026-02-21T09:31:52.9868557Z mad.wide.u32 %rd93, %r328, 2, %rd47; 2026-02-21T09:31:52.9868762Z add.s64 %rd74, %rd93, 131072; 2026-02-21T09:31:52.9868937Z add.s64 %rd75, %rd93, 163840; 2026-02-21T09:31:52.9869120Z add.s64 %rd76, %rd93, 196608; 2026-02-21T09:31:52.9869298Z cvt.u64.u32 %rd6, %r309; 2026-02-21T09:31:52.9869507Z or.b64 %rd94, %rd6, %rd90; 2026-02-21T09:31:52.9869700Z shl.b64 %rd95, %rd94, 1; 2026-02-21T09:31:52.9869874Z add.s64 %rd7, %rd47, %rd95; 2026-02-21T09:31:52.9870049Z add.s64 %rd77, %rd7, 128; 2026-02-21T09:31:52.9870215Z add.s64 %rd78, %rd93, 262144; 2026-02-21T09:31:52.9870393Z add.s64 %rd79, %rd93, 294912; 2026-02-21T09:31:52.9870559Z add.s64 %rd80, %rd93, 327680; 2026-02-21T09:31:52.9870733Z add.s64 %rd81, %rd93, 360448; 2026-02-21T09:31:52.9870898Z add.s64 %rd82, %rd93, 393216; 2026-02-21T09:31:52.9871071Z add.s64 %rd83, %rd93, 425984; 2026-02-21T09:31:52.9871237Z add.s64 %rd84, %rd93, 458752; 2026-02-21T09:31:52.9871416Z cvt.u64.u32 %rd8, %r311; 2026-02-21T09:31:52.9871586Z or.b64 %rd96, %rd8, %rd90; 2026-02-21T09:31:52.9871752Z shl.b64 %rd97, %rd96, 1; 2026-02-21T09:31:52.9871924Z add.s64 %rd9, %rd47, %rd97; 2026-02-21T09:31:52.9872090Z add.s64 %rd85, %rd9, 128; 2026-02-21T09:31:52.9872470Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9872795Z bar.sync 0; 2026-02-21T09:31:52.9872946Z // begin inline asm 2026-02-21T09:31:52.9873161Z cp.async.cg.shared.global [ %r229 + 0 ], [ %rd70 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9873414Z // end inline asm 2026-02-21T09:31:52.9873561Z // begin inline asm 2026-02-21T09:31:52.9873781Z cp.async.cg.shared.global [ %r231 + 0 ], [ %rd71 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9874031Z // end inline asm 2026-02-21T09:31:52.9874176Z // begin inline asm 2026-02-21T09:31:52.9874394Z cp.async.cg.shared.global [ %r233 + 0 ], [ %rd72 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9874638Z // end inline asm 2026-02-21T09:31:52.9874842Z // begin inline asm 
2026-02-21T09:31:52.9875053Z cp.async.cg.shared.global [ %r235 + 0 ], [ %rd73 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9875304Z // end inline asm 2026-02-21T09:31:52.9875451Z // begin inline asm 2026-02-21T09:31:52.9875669Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd74 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9875919Z // end inline asm 2026-02-21T09:31:52.9876066Z // begin inline asm 2026-02-21T09:31:52.9876285Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd75 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9876526Z // end inline asm 2026-02-21T09:31:52.9876683Z // begin inline asm 2026-02-21T09:31:52.9876897Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd76 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9877149Z // end inline asm 2026-02-21T09:31:52.9877297Z // begin inline asm 2026-02-21T09:31:52.9877517Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd77 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9877772Z // end inline asm 2026-02-21T09:31:52.9877919Z // begin inline asm 2026-02-21T09:31:52.9878141Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd78 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9878384Z // end inline asm 2026-02-21T09:31:52.9878543Z // begin inline asm 2026-02-21T09:31:52.9878761Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd79 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9879014Z // end inline asm 2026-02-21T09:31:52.9879163Z // begin inline asm 2026-02-21T09:31:52.9879383Z cp.async.cg.shared.global [ %r249 + 0 ], [ %rd80 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9879629Z // end inline asm 2026-02-21T09:31:52.9879776Z // begin inline asm 2026-02-21T09:31:52.9879992Z cp.async.cg.shared.global [ %r251 + 0 ], [ %rd81 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9880233Z // end inline asm 2026-02-21T09:31:52.9880386Z // begin inline asm 2026-02-21T09:31:52.9880595Z cp.async.cg.shared.global [ %r253 + 0 ], [ %rd82 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9880841Z // end inline asm 2026-02-21T09:31:52.9880986Z // begin inline asm 2026-02-21T09:31:52.9881202Z cp.async.cg.shared.global [ 
%r255 + 0 ], [ %rd83 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9881448Z // end inline asm 2026-02-21T09:31:52.9881596Z // begin inline asm 2026-02-21T09:31:52.9881817Z cp.async.cg.shared.global [ %r257 + 0 ], [ %rd84 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9882060Z // end inline asm 2026-02-21T09:31:52.9882231Z // begin inline asm 2026-02-21T09:31:52.9882478Z cp.async.cg.shared.global [ %r259 + 0 ], [ %rd85 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9882753Z // end inline asm 2026-02-21T09:31:52.9882906Z cp.async.commit_group; 2026-02-21T09:31:52.9883203Z .loc 1 45 34 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:34 2026-02-21T09:31:52.9883535Z add.s64 %rd86, %rd66, 128; 2026-02-21T09:31:52.9883705Z add.s64 %rd87, %rd67, 128; 2026-02-21T09:31:52.9883882Z add.s64 %rd88, %rd68, 128; 2026-02-21T09:31:52.9884046Z add.s64 %rd89, %rd69, 128; 2026-02-21T09:31:52.9884337Z .loc 1 45 87 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:87 2026-02-21T09:31:52.9884648Z // begin inline asm 2026-02-21T09:31:52.9884942Z cp.async.cg.shared.global [ %r261 + 0 ], [ %rd86 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9885184Z // end inline asm 2026-02-21T09:31:52.9885395Z // begin inline asm 2026-02-21T09:31:52.9885672Z cp.async.cg.shared.global [ %r263 + 0 ], [ %rd87 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9885923Z // end inline asm 2026-02-21T09:31:52.9886084Z // begin inline asm 2026-02-21T09:31:52.9886300Z cp.async.cg.shared.global [ %r265 + 0 ], [ %rd88 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9886556Z // end inline asm 2026-02-21T09:31:52.9886707Z // begin inline asm 2026-02-21T09:31:52.9886927Z cp.async.cg.shared.global [ %r267 + 0 ], [ %rd89 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9887175Z // end inline asm 2026-02-21T09:31:52.9887342Z cp.async.commit_group; 2026-02-21T09:31:52.9887655Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9887985Z cp.async.wait_group 2; 2026-02-21T09:31:52.9888180Z bar.sync 0; 
2026-02-21T09:31:52.9888458Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9888804Z setp.ne.b32 %p14, %r37, 0; 2026-02-21T09:31:52.9888993Z @%p14 bra $L__BB0_3; 2026-02-21T09:31:52.9889186Z // %bb.2: 2026-02-21T09:31:52.9889454Z .loc 1 0 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:0:52 2026-02-21T09:31:52.9889800Z add.s32 %r346, %r50, 16480; 2026-02-21T09:31:52.9890011Z bfe.u32 %r347, %r346, 4, 14; 2026-02-21T09:31:52.9890223Z cvt.u64.u32 %rd115, %r347; 2026-02-21T09:31:52.9890438Z or.b64 %rd112, %rd115, 4611686293439512576; 2026-02-21T09:31:52.9890656Z add.s32 %r348, %r50, 16448; 2026-02-21T09:31:52.9890858Z bfe.u32 %r349, %r348, 4, 14; 2026-02-21T09:31:52.9891047Z cvt.u64.u32 %rd116, %r349; 2026-02-21T09:31:52.9891258Z or.b64 %rd110, %rd116, 4611686293439512576; 2026-02-21T09:31:52.9891477Z add.s32 %r350, %r50, 16416; 2026-02-21T09:31:52.9891681Z bfe.u32 %r351, %r350, 4, 14; 2026-02-21T09:31:52.9891869Z cvt.u64.u32 %rd117, %r351; 2026-02-21T09:31:52.9892083Z or.b64 %rd108, %rd117, 4611686293439512576; 2026-02-21T09:31:52.9892310Z add.s32 %r352, %r50, 16384; 2026-02-21T09:31:52.9892498Z bfe.u32 %r353, %r352, 4, 14; 2026-02-21T09:31:52.9892684Z cvt.u64.u32 %rd118, %r353; 2026-02-21T09:31:52.9892869Z or.b64 %rd106, %rd118, 4611686293439512576; 2026-02-21T09:31:52.9893077Z add.s32 %r354, %r50, 98304; 2026-02-21T09:31:52.9893249Z add.s32 %r355, %r50, 98400; 2026-02-21T09:31:52.9893430Z bfe.u32 %r356, %r355, 4, 14; 2026-02-21T09:31:52.9893603Z cvt.u64.u32 %rd119, %r356; 2026-02-21T09:31:52.9893795Z or.b64 %rd105, %rd119, 4611686293338849280; 2026-02-21T09:31:52.9894003Z add.s32 %r357, %r50, 96; 2026-02-21T09:31:52.9894175Z bfe.u32 %r358, %r357, 4, 14; 2026-02-21T09:31:52.9894356Z cvt.u64.u32 %rd120, %r358; 2026-02-21T09:31:52.9894536Z or.b64 %rd104, %rd120, 4611686293439512576; 2026-02-21T09:31:52.9894782Z add.s32 %r359, %r50, 98368; 2026-02-21T09:31:52.9894949Z bfe.u32 %r360, %r359, 4, 14; 
2026-02-21T09:31:52.9895124Z cvt.u64.u32 %rd121, %r360; 2026-02-21T09:31:52.9895295Z or.b64 %rd103, %rd121, 4611686293338849280; 2026-02-21T09:31:52.9895495Z add.s32 %r361, %r50, 64; 2026-02-21T09:31:52.9895668Z bfe.u32 %r362, %r361, 4, 14; 2026-02-21T09:31:52.9895905Z cvt.u64.u32 %rd122, %r362; 2026-02-21T09:31:52.9896086Z or.b64 %rd102, %rd122, 4611686293439512576; 2026-02-21T09:31:52.9896276Z add.s32 %r363, %r50, 98336; 2026-02-21T09:31:52.9896449Z bfe.u32 %r364, %r363, 4, 14; 2026-02-21T09:31:52.9896614Z cvt.u64.u32 %rd123, %r364; 2026-02-21T09:31:52.9896793Z or.b64 %rd101, %rd123, 4611686293338849280; 2026-02-21T09:31:52.9896983Z add.s32 %r365, %r50, 32; 2026-02-21T09:31:52.9897153Z bfe.u32 %r366, %r365, 4, 14; 2026-02-21T09:31:52.9897321Z cvt.u64.u32 %rd124, %r366; 2026-02-21T09:31:52.9897503Z or.b64 %rd100, %rd124, 4611686293439512576; 2026-02-21T09:31:52.9897703Z bfe.u32 %r367, %r354, 4, 14; 2026-02-21T09:31:52.9897872Z cvt.u64.u32 %rd125, %r367; 2026-02-21T09:31:52.9898059Z or.b64 %rd99, %rd125, 4611686293338849280; 2026-02-21T09:31:52.9898253Z bfe.u32 %r368, %r50, 4, 14; 2026-02-21T09:31:52.9898460Z cvt.u64.u32 %rd126, %r368; 2026-02-21T09:31:52.9898665Z or.b64 %rd98, %rd126, 4611686293439512576; 2026-02-21T09:31:52.9898996Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9899324Z elect.sync %r369|%p16, -1; 2026-02-21T09:31:52.9899511Z mov.b32 %r330, 135266320; 2026-02-21T09:31:52.9899688Z mov.pred %p15, 0; 2026-02-21T09:31:52.9899849Z // begin inline asm 2026-02-21T09:31:52.9900119Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd98, %rd99, %r330, %p15; 2026-02-21T09:31:52.9900400Z // end inline asm 2026-02-21T09:31:52.9900556Z // begin inline asm 2026-02-21T09:31:52.9900801Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd100, %rd101, %r330, %p17; 2026-02-21T09:31:52.9901092Z // end inline asm 2026-02-21T09:31:52.9901237Z // begin inline asm 2026-02-21T09:31:52.9901484Z @%p16 
tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd102, %rd103, %r330, %p17; 2026-02-21T09:31:52.9901772Z // end inline asm 2026-02-21T09:31:52.9901919Z // begin inline asm 2026-02-21T09:31:52.9902166Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd104, %rd105, %r330, %p17; 2026-02-21T09:31:52.9902447Z // end inline asm 2026-02-21T09:31:52.9902599Z // begin inline asm 2026-02-21T09:31:52.9902832Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd106, %rd99, %r330, %p15; 2026-02-21T09:31:52.9902901Z // end inline asm 2026-02-21T09:31:52.9902962Z // begin inline asm 2026-02-21T09:31:52.9903113Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd108, %rd101, %r330, %p17; 2026-02-21T09:31:52.9903181Z // end inline asm 2026-02-21T09:31:52.9903241Z // begin inline asm 2026-02-21T09:31:52.9903391Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd110, %rd103, %r330, %p17; 2026-02-21T09:31:52.9903459Z // end inline asm 2026-02-21T09:31:52.9903519Z // begin inline asm 2026-02-21T09:31:52.9903667Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd112, %rd105, %r330, %p17; 2026-02-21T09:31:52.9903736Z // end inline asm 2026-02-21T09:31:52.9903803Z add.s32 %r370, %r50, 122880; 2026-02-21T09:31:52.9903870Z cvt.u64.u32 %rd114, %r370; 2026-02-21T09:31:52.9903930Z // begin inline asm 2026-02-21T09:31:52.9904077Z @%p16 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd114]; 2026-02-21T09:31:52.9904135Z // end inline asm 2026-02-21T09:31:52.9904195Z $L__BB0_3: 2026-02-21T09:31:52.9904389Z .loc 1 0 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:0:52 2026-02-21T09:31:52.9904484Z ld.param.b64 %rd49, [_helion_matmul_param_2]; 2026-02-21T09:31:52.9904548Z add.s32 %r27, %r50, %r284; 2026-02-21T09:31:52.9904611Z add.s32 %r28, %r50, %r285; 2026-02-21T09:31:52.9904727Z add.s32 %r681, %r676, 1024; 2026-02-21T09:31:52.9904797Z or.b32 %r32, %r31, %r274; 2026-02-21T09:31:52.9904993Z .loc 1 44 32 // 
clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:32 2026-02-21T09:31:52.9905069Z add.s64 %rd127, %rd2, 256; 2026-02-21T09:31:52.9905138Z add.s64 %rd128, %rd51, 256; 2026-02-21T09:31:52.9905237Z add.s64 %rd129, %rd52, 256; 2026-02-21T09:31:52.9905335Z add.s64 %rd130, %rd53, 256; 2026-02-21T09:31:52.9905401Z cvt.u64.u32 %rd148, %r6; 2026-02-21T09:31:52.9905468Z add.s64 %rd149, %rd1, %rd148; 2026-02-21T09:31:52.9905531Z shl.b64 %rd150, %rd149, 1; 2026-02-21T09:31:52.9905606Z add.s64 %rd151, %rd47, %rd150; 2026-02-21T09:31:52.9905675Z add.s64 %rd131, %rd151, 131072; 2026-02-21T09:31:52.9905740Z add.s64 %rd132, %rd151, 163840; 2026-02-21T09:31:52.9905809Z add.s64 %rd133, %rd151, 196608; 2026-02-21T09:31:52.9905872Z add.s64 %rd134, %rd7, 256; 2026-02-21T09:31:52.9905935Z add.s64 %rd135, %rd151, 262144; 2026-02-21T09:31:52.9905997Z add.s64 %rd136, %rd151, 294912; 2026-02-21T09:31:52.9906067Z add.s64 %rd137, %rd151, 327680; 2026-02-21T09:31:52.9906129Z add.s64 %rd138, %rd151, 360448; 2026-02-21T09:31:52.9906191Z add.s64 %rd139, %rd151, 393216; 2026-02-21T09:31:52.9906291Z add.s64 %rd140, %rd151, 425984; 2026-02-21T09:31:52.9906384Z add.s64 %rd141, %rd151, 458752; 2026-02-21T09:31:52.9906452Z add.s64 %rd142, %rd9, 256; 2026-02-21T09:31:52.9906652Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9906711Z bar.sync 0; 2026-02-21T09:31:52.9906772Z // begin inline asm 2026-02-21T09:31:52.9906906Z cp.async.cg.shared.global [ %r371 + 0 ], [ %rd127 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9906975Z // end inline asm 2026-02-21T09:31:52.9907036Z // begin inline asm 2026-02-21T09:31:52.9907165Z cp.async.cg.shared.global [ %r373 + 0 ], [ %rd128 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9907230Z // end inline asm 2026-02-21T09:31:52.9907291Z // begin inline asm 2026-02-21T09:31:52.9907415Z cp.async.cg.shared.global [ %r375 + 0 ], [ %rd129 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9907475Z // end inline asm 
2026-02-21T09:31:52.9907546Z // begin inline asm 2026-02-21T09:31:52.9907674Z cp.async.cg.shared.global [ %r377 + 0 ], [ %rd130 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9907735Z // end inline asm 2026-02-21T09:31:52.9907805Z // begin inline asm 2026-02-21T09:31:52.9907928Z cp.async.cg.shared.global [ %r379 + 0 ], [ %rd131 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9907988Z // end inline asm 2026-02-21T09:31:52.9908048Z // begin inline asm 2026-02-21T09:31:52.9908178Z cp.async.cg.shared.global [ %r381 + 0 ], [ %rd132 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9908237Z // end inline asm 2026-02-21T09:31:52.9908297Z // begin inline asm 2026-02-21T09:31:52.9908425Z cp.async.cg.shared.global [ %r383 + 0 ], [ %rd133 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9908485Z // end inline asm 2026-02-21T09:31:52.9908546Z // begin inline asm 2026-02-21T09:31:52.9908676Z cp.async.cg.shared.global [ %r385 + 0 ], [ %rd134 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9908736Z // end inline asm 2026-02-21T09:31:52.9908799Z // begin inline asm 2026-02-21T09:31:52.9908926Z cp.async.cg.shared.global [ %r387 + 0 ], [ %rd135 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9908997Z // end inline asm 2026-02-21T09:31:52.9909060Z // begin inline asm 2026-02-21T09:31:52.9909184Z cp.async.cg.shared.global [ %r389 + 0 ], [ %rd136 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9909253Z // end inline asm 2026-02-21T09:31:52.9909314Z // begin inline asm 2026-02-21T09:31:52.9909437Z cp.async.cg.shared.global [ %r391 + 0 ], [ %rd137 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9909495Z // end inline asm 2026-02-21T09:31:52.9909563Z // begin inline asm 2026-02-21T09:31:52.9909684Z cp.async.cg.shared.global [ %r393 + 0 ], [ %rd138 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9909742Z // end inline asm 2026-02-21T09:31:52.9909809Z // begin inline asm 2026-02-21T09:31:52.9909930Z cp.async.cg.shared.global [ %r395 + 0 ], [ %rd139 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9909988Z // end inline asm 2026-02-21T09:31:52.9910055Z // begin inline asm 
2026-02-21T09:31:52.9910178Z cp.async.cg.shared.global [ %r397 + 0 ], [ %rd140 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9910238Z // end inline asm 2026-02-21T09:31:52.9910299Z // begin inline asm 2026-02-21T09:31:52.9910457Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd141 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9910537Z // end inline asm 2026-02-21T09:31:52.9910597Z // begin inline asm 2026-02-21T09:31:52.9910725Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd142 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9910785Z // end inline asm 2026-02-21T09:31:52.9910854Z cp.async.commit_group; 2026-02-21T09:31:52.9911041Z .loc 1 45 34 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:34 2026-02-21T09:31:52.9911114Z add.s64 %rd143, %rd66, 256; 2026-02-21T09:31:52.9911178Z add.s64 %rd144, %rd67, 256; 2026-02-21T09:31:52.9911241Z add.s64 %rd145, %rd68, 256; 2026-02-21T09:31:52.9911311Z add.s64 %rd146, %rd69, 256; 2026-02-21T09:31:52.9911499Z .loc 1 45 87 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:87 2026-02-21T09:31:52.9911582Z // begin inline asm 2026-02-21T09:31:52.9911736Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd143 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9911801Z // end inline asm 2026-02-21T09:31:52.9911864Z // begin inline asm 2026-02-21T09:31:52.9911985Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd144 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9912054Z // end inline asm 2026-02-21T09:31:52.9912113Z // begin inline asm 2026-02-21T09:31:52.9912236Z cp.async.cg.shared.global [ %r407 + 0 ], [ %rd145 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9912304Z // end inline asm 2026-02-21T09:31:52.9912364Z // begin inline asm 2026-02-21T09:31:52.9912487Z cp.async.cg.shared.global [ %r409 + 0 ], [ %rd146 + 0 ], 0x10, %r372; 2026-02-21T09:31:52.9912554Z // end inline asm 2026-02-21T09:31:52.9912622Z cp.async.commit_group; 2026-02-21T09:31:52.9912805Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9912869Z 
and.b32 %r415, %r1, 7; 2026-02-21T09:31:52.9912949Z mul.wide.u32 %rd14, %r415, 16; 2026-02-21T09:31:52.9913015Z shl.b64 %rd152, %rd8, 1; 2026-02-21T09:31:52.9913082Z add.s64 %rd153, %rd152, %rd47; 2026-02-21T09:31:52.9913155Z add.s64 %rd499, %rd153, 384; 2026-02-21T09:31:52.9913219Z shl.b64 %rd154, %rd6, 1; 2026-02-21T09:31:52.9913284Z add.s64 %rd155, %rd154, %rd47; 2026-02-21T09:31:52.9913347Z add.s64 %rd498, %rd155, 384; 2026-02-21T09:31:52.9913418Z add.s32 %r416, %r31, %r4; 2026-02-21T09:31:52.9913483Z add.s32 %r417, %r416, 48; 2026-02-21T09:31:52.9913561Z mad.wide.u32 %rd156, %r417, 2048, %rd48; 2026-02-21T09:31:52.9913632Z add.s64 %rd497, %rd156, 384; 2026-02-21T09:31:52.9913694Z add.s32 %r418, %r416, 32; 2026-02-21T09:31:52.9913768Z mad.wide.u32 %rd157, %r418, 2048, %rd48; 2026-02-21T09:31:52.9913831Z add.s64 %rd496, %rd157, 384; 2026-02-21T09:31:52.9913900Z add.s32 %r419, %r416, 16; 2026-02-21T09:31:52.9913972Z mad.wide.u32 %rd158, %r419, 2048, %rd48; 2026-02-21T09:31:52.9914035Z add.s64 %rd495, %rd158, 384; 2026-02-21T09:31:52.9914108Z and.b32 %r420, %r3, 15; 2026-02-21T09:31:52.9914170Z shl.b32 %r421, %r420, 16; 2026-02-21T09:31:52.9914233Z shl.b32 %r422, %r4, 10; 2026-02-21T09:31:52.9914304Z or.b32 %r423, %r421, %r422; 2026-02-21T09:31:52.9914376Z mad.wide.u32 %rd159, %r423, 2, %rd48; 2026-02-21T09:31:52.9914440Z add.s64 %rd494, %rd159, 384; 2026-02-21T09:31:52.9914501Z shl.b32 %r424, %r3, 14; 2026-02-21T09:31:52.9914576Z and.b32 %r425, %r424, 16515072; 2026-02-21T09:31:52.9914639Z or.b32 %r426, %r425, %r422; 2026-02-21T09:31:52.9914756Z mad.wide.u32 %rd493, %r426, 2, %rd47; 2026-02-21T09:31:52.9914827Z add.s32 %r427, %r33, %r4; 2026-02-21T09:31:52.9914890Z add.s32 %r428, %r427, 48; 2026-02-21T09:31:52.9914963Z mad.wide.u32 %rd160, %r428, 2048, %rd47; 2026-02-21T09:31:52.9915026Z add.s64 %rd492, %rd160, 384; 2026-02-21T09:31:52.9915097Z add.s32 %r429, %r427, 32; 2026-02-21T09:31:52.9915193Z mad.wide.u32 %rd161, %r429, 2048, %rd47; 
2026-02-21T09:31:52.9915258Z add.s64 %rd491, %rd161, 384; 2026-02-21T09:31:52.9915329Z add.s32 %r430, %r427, 16; 2026-02-21T09:31:52.9915433Z mad.wide.u32 %rd162, %r430, 2048, %rd47; 2026-02-21T09:31:52.9915524Z add.s64 %rd490, %rd162, 384; 2026-02-21T09:31:52.9915585Z mov.b32 %r1023, 1; 2026-02-21T09:31:52.9915651Z mov.b32 %r1022, 2; 2026-02-21T09:31:52.9915714Z mov.b64 %rd489, -64; 2026-02-21T09:31:52.9915778Z mov.b32 %r1021, %r1019; 2026-02-21T09:31:52.9915848Z mov.b32 %r1024, %r1019; 2026-02-21T09:31:52.9915910Z bra.uni $L__BB0_4; 2026-02-21T09:31:52.9916024Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:31:52.9916218Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9916283Z add.s64 %rd489, %rd489, 64; 2026-02-21T09:31:52.9916353Z setp.lt.u64 %p51, %rd489, 832; 2026-02-21T09:31:52.9916540Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9916665Z // begin inline asm 2026-02-21T09:31:52.9916724Z 2026-02-21T09:31:52.9916809Z { 2026-02-21T09:31:52.9916889Z .reg .pred complete; 2026-02-21T09:31:52.9916953Z waitLoop: 2026-02-21T09:31:52.9917094Z mbarrier.try_wait.parity.shared.b64 complete, [%r1020], %r1019; 2026-02-21T09:31:52.9917165Z @!complete bra.uni waitLoop; 2026-02-21T09:31:52.9917230Z } 2026-02-21T09:31:52.9917234Z 2026-02-21T09:31:52.9917296Z // end inline asm 2026-02-21T09:31:52.9917362Z add.s32 %r523, %r1023, 1; 2026-02-21T09:31:52.9917440Z setp.gt.s32 %p52, %r523, 1; 2026-02-21T09:31:52.9917512Z selp.b32 %r1023, 0, %r523, %p52; 2026-02-21T09:31:52.9917580Z selp.b32 %r524, 1, 0, %p52; 2026-02-21T09:31:52.9917650Z xor.b32 %r48, %r1024, %r524; 2026-02-21T09:31:52.9917834Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9917898Z add.s32 %r525, %r1022, 1; 2026-02-21T09:31:52.9917967Z setp.gt.s32 %p53, %r525, 2; 2026-02-21T09:31:52.9918056Z selp.b32 %r1022, 0, %r525, %p53; 2026-02-21T09:31:52.9918244Z 
.loc 1 44 32 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:32 2026-02-21T09:31:52.9918315Z add.s64 %rd212, %rd493, %rd14; 2026-02-21T09:31:52.9918391Z add.s64 %rd192, %rd212, 384; 2026-02-21T09:31:52.9918456Z add.s64 %rd193, %rd490, %rd14; 2026-02-21T09:31:52.9918521Z add.s64 %rd194, %rd491, %rd14; 2026-02-21T09:31:52.9918584Z add.s64 %rd195, %rd492, %rd14; 2026-02-21T09:31:52.9918658Z add.s64 %rd196, %rd212, 131456; 2026-02-21T09:31:52.9918722Z add.s64 %rd197, %rd212, 164224; 2026-02-21T09:31:52.9918785Z add.s64 %rd198, %rd212, 196992; 2026-02-21T09:31:52.9918856Z add.s64 %rd199, %rd498, %rd14; 2026-02-21T09:31:52.9918919Z add.s64 %rd200, %rd212, 262528; 2026-02-21T09:31:52.9918983Z add.s64 %rd201, %rd212, 295296; 2026-02-21T09:31:52.9919055Z add.s64 %rd202, %rd212, 328064; 2026-02-21T09:31:52.9919117Z add.s64 %rd203, %rd212, 360832; 2026-02-21T09:31:52.9919181Z add.s64 %rd204, %rd212, 393600; 2026-02-21T09:31:52.9919245Z add.s64 %rd205, %rd212, 426368; 2026-02-21T09:31:52.9919319Z add.s64 %rd206, %rd212, 459136; 2026-02-21T09:31:52.9919503Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9919568Z add.s64 %rd207, %rd499, %rd14; 2026-02-21T09:31:52.9919639Z shl.b32 %r526, %r1022, 15; 2026-02-21T09:31:52.9919702Z add.s32 %r528, %r50, %r526; 2026-02-21T09:31:52.9919760Z bar.sync 0; 2026-02-21T09:31:52.9919825Z add.s32 %r483, %r528, %r5; 2026-02-21T09:31:52.9919899Z selp.b32 %r484, 16, 0, %p51; 2026-02-21T09:31:52.9919961Z // begin inline asm 2026-02-21T09:31:52.9920087Z cp.async.cg.shared.global [ %r483 + 0 ], [ %rd192 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9920156Z // end inline asm 2026-02-21T09:31:52.9920219Z add.s32 %r485, %r483, 2048; 2026-02-21T09:31:52.9920280Z // begin inline asm 2026-02-21T09:31:52.9920412Z cp.async.cg.shared.global [ %r485 + 0 ], [ %rd193 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9920473Z // end inline asm 2026-02-21T09:31:52.9920536Z add.s32 %r487, %r483, 4096; 
2026-02-21T09:31:52.9920621Z // begin inline asm 2026-02-21T09:31:52.9920775Z cp.async.cg.shared.global [ %r487 + 0 ], [ %rd194 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9920834Z // end inline asm 2026-02-21T09:31:52.9920894Z add.s32 %r489, %r483, 6144; 2026-02-21T09:31:52.9920960Z // begin inline asm 2026-02-21T09:31:52.9921081Z cp.async.cg.shared.global [ %r489 + 0 ], [ %rd195 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9921141Z // end inline asm 2026-02-21T09:31:52.9921202Z add.s32 %r491, %r483, 8192; 2026-02-21T09:31:52.9921271Z // begin inline asm 2026-02-21T09:31:52.9921391Z cp.async.cg.shared.global [ %r491 + 0 ], [ %rd196 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9921449Z // end inline asm 2026-02-21T09:31:52.9921520Z add.s32 %r493, %r483, 10240; 2026-02-21T09:31:52.9921580Z // begin inline asm 2026-02-21T09:31:52.9921699Z cp.async.cg.shared.global [ %r493 + 0 ], [ %rd197 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9921779Z // end inline asm 2026-02-21T09:31:52.9921872Z add.s32 %r495, %r483, 12288; 2026-02-21T09:31:52.9921936Z // begin inline asm 2026-02-21T09:31:52.9922058Z cp.async.cg.shared.global [ %r495 + 0 ], [ %rd198 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9922142Z // end inline asm 2026-02-21T09:31:52.9922204Z add.s32 %r497, %r483, 14336; 2026-02-21T09:31:52.9922263Z // begin inline asm 2026-02-21T09:31:52.9922389Z cp.async.cg.shared.global [ %r497 + 0 ], [ %rd199 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9922448Z // end inline asm 2026-02-21T09:31:52.9922511Z add.s32 %r499, %r483, 16384; 2026-02-21T09:31:52.9922570Z // begin inline asm 2026-02-21T09:31:52.9922698Z cp.async.cg.shared.global [ %r499 + 0 ], [ %rd200 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9922756Z // end inline asm 2026-02-21T09:31:52.9922818Z add.s32 %r501, %r483, 18432; 2026-02-21T09:31:52.9922885Z // begin inline asm 2026-02-21T09:31:52.9923005Z cp.async.cg.shared.global [ %r501 + 0 ], [ %rd201 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9923064Z // end inline asm 2026-02-21T09:31:52.9923128Z 
add.s32 %r503, %r483, 20480; 2026-02-21T09:31:52.9923196Z // begin inline asm 2026-02-21T09:31:52.9923318Z cp.async.cg.shared.global [ %r503 + 0 ], [ %rd202 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9923376Z // end inline asm 2026-02-21T09:31:52.9923446Z add.s32 %r505, %r483, 22528; 2026-02-21T09:31:52.9923507Z // begin inline asm 2026-02-21T09:31:52.9923627Z cp.async.cg.shared.global [ %r505 + 0 ], [ %rd203 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9923692Z // end inline asm 2026-02-21T09:31:52.9923754Z add.s32 %r507, %r483, 24576; 2026-02-21T09:31:52.9923813Z // begin inline asm 2026-02-21T09:31:52.9923933Z cp.async.cg.shared.global [ %r507 + 0 ], [ %rd204 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9924000Z // end inline asm 2026-02-21T09:31:52.9924062Z add.s32 %r509, %r483, 26624; 2026-02-21T09:31:52.9924123Z // begin inline asm 2026-02-21T09:31:52.9924254Z cp.async.cg.shared.global [ %r509 + 0 ], [ %rd205 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9924313Z // end inline asm 2026-02-21T09:31:52.9924375Z add.s32 %r511, %r483, 28672; 2026-02-21T09:31:52.9924438Z // begin inline asm 2026-02-21T09:31:52.9924568Z cp.async.cg.shared.global [ %r511 + 0 ], [ %rd206 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9924626Z // end inline asm 2026-02-21T09:31:52.9924746Z add.s32 %r513, %r483, 30720; 2026-02-21T09:31:52.9924819Z // begin inline asm 2026-02-21T09:31:52.9924944Z cp.async.cg.shared.global [ %r513 + 0 ], [ %rd207 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9925005Z // end inline asm 2026-02-21T09:31:52.9925072Z cp.async.commit_group; 2026-02-21T09:31:52.9925269Z .loc 1 45 34 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:34 2026-02-21T09:31:52.9925348Z add.s64 %rd208, %rd494, %rd14; 2026-02-21T09:31:52.9925412Z add.s64 %rd209, %rd495, %rd14; 2026-02-21T09:31:52.9925485Z add.s64 %rd210, %rd496, %rd14; 2026-02-21T09:31:52.9925667Z .loc 1 45 87 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:87 2026-02-21T09:31:52.9925732Z add.s64 %rd211, %rd497, %rd14; 
2026-02-21T09:31:52.9925868Z shl.b32 %r529, %r1022, 13; 2026-02-21T09:31:52.9925932Z add.s32 %r530, %r50, %r529; 2026-02-21T09:31:52.9925995Z add.s32 %r531, %r530, %r5; 2026-02-21T09:31:52.9926057Z add.s32 %r515, %r531, 98304; 2026-02-21T09:31:52.9926126Z // begin inline asm 2026-02-21T09:31:52.9926248Z cp.async.cg.shared.global [ %r515 + 0 ], [ %rd208 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9926307Z // end inline asm 2026-02-21T09:31:52.9926379Z add.s32 %r517, %r531, 100352; 2026-02-21T09:31:52.9926439Z // begin inline asm 2026-02-21T09:31:52.9926560Z cp.async.cg.shared.global [ %r517 + 0 ], [ %rd209 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9926627Z // end inline asm 2026-02-21T09:31:52.9926693Z add.s32 %r519, %r531, 102400; 2026-02-21T09:31:52.9926753Z // begin inline asm 2026-02-21T09:31:52.9926875Z cp.async.cg.shared.global [ %r519 + 0 ], [ %rd210 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9926974Z // end inline asm 2026-02-21T09:31:52.9927065Z add.s32 %r521, %r531, 104448; 2026-02-21T09:31:52.9927130Z // begin inline asm 2026-02-21T09:31:52.9927260Z cp.async.cg.shared.global [ %r521 + 0 ], [ %rd211 + 0 ], 0x10, %r484; 2026-02-21T09:31:52.9927319Z // end inline asm 2026-02-21T09:31:52.9927386Z cp.async.commit_group; 2026-02-21T09:31:52.9927573Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9927645Z add.s64 %rd499, %rd499, 128; 2026-02-21T09:31:52.9927707Z add.s64 %rd498, %rd498, 128; 2026-02-21T09:31:52.9927769Z add.s64 %rd497, %rd497, 128; 2026-02-21T09:31:52.9927839Z add.s64 %rd496, %rd496, 128; 2026-02-21T09:31:52.9927901Z add.s64 %rd495, %rd495, 128; 2026-02-21T09:31:52.9927962Z add.s64 %rd494, %rd494, 128; 2026-02-21T09:31:52.9928032Z add.s64 %rd493, %rd493, 128; 2026-02-21T09:31:52.9928095Z add.s64 %rd492, %rd492, 128; 2026-02-21T09:31:52.9928157Z add.s64 %rd491, %rd491, 128; 2026-02-21T09:31:52.9928218Z add.s64 %rd490, %rd490, 128; 2026-02-21T09:31:52.9928298Z setp.lt.u64 %p54, %rd489, 896; 
2026-02-21T09:31:52.9928364Z mov.b32 %r1019, %r1024; 2026-02-21T09:31:52.9928427Z mov.b32 %r1020, %r532; 2026-02-21T09:31:52.9928496Z mov.b32 %r1024, %r48; 2026-02-21T09:31:52.9928560Z @%p54 bra $L__BB0_4; 2026-02-21T09:31:52.9928620Z bra.uni $L__BB0_7; 2026-02-21T09:31:52.9928733Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:31:52.9928805Z add.s32 %r432, %r1021, 1; 2026-02-21T09:31:52.9928870Z setp.gt.s32 %p33, %r432, 2; 2026-02-21T09:31:52.9928938Z selp.b32 %r1021, 0, %r432, %p33; 2026-02-21T09:31:52.9929137Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9929205Z cp.async.wait_group 2; 2026-02-21T09:31:52.9929263Z bar.sync 0; 2026-02-21T09:31:52.9929453Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9929524Z shl.b32 %r433, %r1023, 3; 2026-02-21T09:31:52.9929588Z add.s32 %r435, %r50, %r433; 2026-02-21T09:31:52.9929652Z add.s32 %r532, %r435, 122880; 2026-02-21T09:31:52.9929849Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9929912Z @%p14 bra $L__BB0_6; 2026-02-21T09:31:52.9930018Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:31:52.9930210Z .loc 1 45 87 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:45:87 2026-02-21T09:31:52.9930273Z shl.b32 %r452, %r1021, 13; 2026-02-21T09:31:52.9930334Z add.s32 %r454, %r50, %r452; 2026-02-21T09:31:52.9930403Z add.s32 %r455, %r454, 98304; 2026-02-21T09:31:52.9930588Z .loc 1 44 85 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:44:85 2026-02-21T09:31:52.9930650Z shl.b32 %r456, %r1021, 15; 2026-02-21T09:31:52.9930713Z add.s32 %r457, %r50, %r456; 2026-02-21T09:31:52.9930904Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9931021Z elect.sync %r458|%p35, -1; 2026-02-21T09:31:52.9931086Z bfe.u32 %r459, %r457, 4, 14; 2026-02-21T09:31:52.9931156Z cvt.u64.u32 %rd180, %r459; 
2026-02-21T09:31:52.9931236Z or.b64 %rd163, %rd180, 4611686293439512576; 2026-02-21T09:31:52.9931299Z bfe.u32 %r460, %r455, 4, 14; 2026-02-21T09:31:52.9931362Z cvt.u64.u32 %rd181, %r460; 2026-02-21T09:31:52.9931446Z or.b64 %rd164, %rd181, 4611686293338849280; 2026-02-21T09:31:52.9931508Z mov.b32 %r437, 135266320; 2026-02-21T09:31:52.9931574Z mov.pred %p34, -1; 2026-02-21T09:31:52.9931643Z // begin inline asm 2026-02-21T09:31:52.9931803Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd163, %rd164, %r437, %p34; 2026-02-21T09:31:52.9931864Z // end inline asm 2026-02-21T09:31:52.9931932Z add.s32 %r461, %r457, 32; 2026-02-21T09:31:52.9931994Z bfe.u32 %r462, %r461, 4, 14; 2026-02-21T09:31:52.9932077Z cvt.u64.u32 %rd182, %r462; 2026-02-21T09:31:52.9932172Z or.b64 %rd165, %rd182, 4611686293439512576; 2026-02-21T09:31:52.9932248Z add.s32 %r463, %r454, 98336; 2026-02-21T09:31:52.9932313Z bfe.u32 %r464, %r463, 4, 14; 2026-02-21T09:31:52.9932376Z cvt.u64.u32 %rd183, %r464; 2026-02-21T09:31:52.9932457Z or.b64 %rd166, %rd183, 4611686293338849280; 2026-02-21T09:31:52.9932519Z // begin inline asm 2026-02-21T09:31:52.9932671Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd165, %rd166, %r437, %p34; 2026-02-21T09:31:52.9932739Z // end inline asm 2026-02-21T09:31:52.9932801Z add.s32 %r465, %r457, 64; 2026-02-21T09:31:52.9932864Z bfe.u32 %r466, %r465, 4, 14; 2026-02-21T09:31:52.9932927Z cvt.u64.u32 %rd184, %r466; 2026-02-21T09:31:52.9933009Z or.b64 %rd167, %rd184, 4611686293439512576; 2026-02-21T09:31:52.9933075Z add.s32 %r467, %r454, 98368; 2026-02-21T09:31:52.9933136Z bfe.u32 %r468, %r467, 4, 14; 2026-02-21T09:31:52.9933211Z cvt.u64.u32 %rd185, %r468; 2026-02-21T09:31:52.9933287Z or.b64 %rd168, %rd185, 4611686293338849280; 2026-02-21T09:31:52.9933352Z // begin inline asm 2026-02-21T09:31:52.9933507Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd167, %rd168, %r437, %p34; 2026-02-21T09:31:52.9933579Z // end inline asm 2026-02-21T09:31:52.9933645Z 
add.s32 %r469, %r457, 96; 2026-02-21T09:31:52.9933710Z bfe.u32 %r470, %r469, 4, 14; 2026-02-21T09:31:52.9933782Z cvt.u64.u32 %rd186, %r470; 2026-02-21T09:31:52.9933856Z or.b64 %rd169, %rd186, 4611686293439512576; 2026-02-21T09:31:52.9933919Z add.s32 %r471, %r454, 98400; 2026-02-21T09:31:52.9933990Z bfe.u32 %r472, %r471, 4, 14; 2026-02-21T09:31:52.9934054Z cvt.u64.u32 %rd187, %r472; 2026-02-21T09:31:52.9934128Z or.b64 %rd170, %rd187, 4611686293338849280; 2026-02-21T09:31:52.9934189Z // begin inline asm 2026-02-21T09:31:52.9934344Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 0 ], %rd169, %rd170, %r437, %p34; 2026-02-21T09:31:52.9934404Z // end inline asm 2026-02-21T09:31:52.9934467Z add.s32 %r473, %r457, 16384; 2026-02-21T09:31:52.9934536Z bfe.u32 %r474, %r473, 4, 14; 2026-02-21T09:31:52.9934601Z cvt.u64.u32 %rd188, %r474; 2026-02-21T09:31:52.9934734Z or.b64 %rd171, %rd188, 4611686293439512576; 2026-02-21T09:31:52.9934797Z // begin inline asm 2026-02-21T09:31:52.9934958Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd171, %rd164, %r437, %p34; 2026-02-21T09:31:52.9935019Z // end inline asm 2026-02-21T09:31:52.9935080Z add.s32 %r475, %r457, 16416; 2026-02-21T09:31:52.9935152Z bfe.u32 %r476, %r475, 4, 14; 2026-02-21T09:31:52.9935216Z cvt.u64.u32 %rd189, %r476; 2026-02-21T09:31:52.9935289Z or.b64 %rd173, %rd189, 4611686293439512576; 2026-02-21T09:31:52.9935358Z // begin inline asm 2026-02-21T09:31:52.9935510Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd173, %rd166, %r437, %p34; 2026-02-21T09:31:52.9935570Z // end inline asm 2026-02-21T09:31:52.9935631Z add.s32 %r477, %r457, 16448; 2026-02-21T09:31:52.9935704Z bfe.u32 %r478, %r477, 4, 14; 2026-02-21T09:31:52.9935771Z cvt.u64.u32 %rd190, %r478; 2026-02-21T09:31:52.9935875Z or.b64 %rd175, %rd190, 4611686293439512576; 2026-02-21T09:31:52.9935988Z // begin inline asm 2026-02-21T09:31:52.9936137Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd175, %rd168, %r437, %p34; 
2026-02-21T09:31:52.9936198Z // end inline asm 2026-02-21T09:31:52.9936268Z add.s32 %r479, %r457, 16480; 2026-02-21T09:31:52.9936330Z bfe.u32 %r480, %r479, 4, 14; 2026-02-21T09:31:52.9936395Z cvt.u64.u32 %rd191, %r480; 2026-02-21T09:31:52.9936467Z or.b64 %rd177, %rd191, 4611686293439512576; 2026-02-21T09:31:52.9936538Z // begin inline asm 2026-02-21T09:31:52.9936684Z @%p35 tcgen05.mma.cta_group::1.kind::f16 [ %r1018 + 64 ], %rd177, %rd170, %r437, %p34; 2026-02-21T09:31:52.9936744Z // end inline asm 2026-02-21T09:31:52.9936818Z cvt.u64.u32 %rd179, %r532; 2026-02-21T09:31:52.9936879Z // begin inline asm 2026-02-21T09:31:52.9937051Z @%p35 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd179]; 2026-02-21T09:31:52.9937146Z // end inline asm 2026-02-21T09:31:52.9937211Z bra.uni $L__BB0_6; 2026-02-21T09:31:52.9937317Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:31:52.9937509Z .loc 1 0 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:0:52 2026-02-21T09:31:52.9937581Z cvt.u32.u64 %r816, %rd8; 2026-02-21T09:31:52.9937646Z cvt.u32.u64 %r817, %rd6; 2026-02-21T09:31:52.9937709Z cvt.u32.u64 %r818, %rd1; 2026-02-21T09:31:52.9937776Z mov.b32 %r533, 1; 2026-02-21T09:31:52.9937965Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9938026Z // begin inline asm 2026-02-21T09:31:52.9938082Z 2026-02-21T09:31:52.9938145Z { 2026-02-21T09:31:52.9938214Z .reg .pred complete; 2026-02-21T09:31:52.9938273Z waitLoop: 2026-02-21T09:31:52.9938411Z mbarrier.try_wait.parity.shared.b64 complete, [%r532], %r533; 2026-02-21T09:31:52.9938484Z @!complete bra.uni waitLoop; 2026-02-21T09:31:52.9938538Z } 2026-02-21T09:31:52.9938546Z 2026-02-21T09:31:52.9938615Z // end inline asm 2026-02-21T09:31:52.9938805Z .loc 1 39 90 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:39:90 2026-02-21T09:31:52.9938873Z cp.async.wait_group 0; 2026-02-21T09:31:52.9938932Z bar.sync 0; 2026-02-21T09:31:52.9939004Z add.s32 %r534, %r50, 
122880; 2026-02-21T09:31:52.9939067Z // begin inline asm 2026-02-21T09:31:52.9939160Z @%p55 mbarrier.inval.shared::cta.b64 [%r534]; 2026-02-21T09:31:52.9939227Z // end inline asm 2026-02-21T09:31:52.9939286Z bar.sync 0; 2026-02-21T09:31:52.9939345Z // begin inline asm 2026-02-21T09:31:52.9939435Z @%p55 mbarrier.inval.shared::cta.b64 [%r188]; 2026-02-21T09:31:52.9939502Z // end inline asm 2026-02-21T09:31:52.9939690Z .loc 1 49 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:49:52 2026-02-21T09:31:52.9939753Z or.b32 %r820, %r818, %r32; 2026-02-21T09:31:52.9939826Z or.b32 %r821, %r34, %r32; 2026-02-21T09:31:52.9939892Z or.b32 %r822, %r35, %r32; 2026-02-21T09:31:52.9939956Z or.b32 %r823, %r36, %r32; 2026-02-21T09:31:52.9940027Z or.b32 %r824, %r817, %r32; 2026-02-21T09:31:52.9940090Z or.b32 %r825, %r816, %r32; 2026-02-21T09:31:52.9940274Z .loc 1 49 24 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:49:24 2026-02-21T09:31:52.9940349Z mad.wide.u32 %rd213, %r820, 2, %rd49; 2026-02-21T09:31:52.9940430Z mad.wide.u32 %rd214, %r821, 2, %rd49; 2026-02-21T09:31:52.9940501Z mad.wide.u32 %rd215, %r822, 2, %rd49; 2026-02-21T09:31:52.9940569Z mad.wide.u32 %rd216, %r823, 2, %rd49; 2026-02-21T09:31:52.9940641Z cvt.u64.u32 %rd229, %r32; 2026-02-21T09:31:52.9940709Z add.s64 %rd230, %rd1, %rd229; 2026-02-21T09:31:52.9940774Z shl.b64 %rd231, %rd230, 1; 2026-02-21T09:31:52.9940841Z add.s64 %rd232, %rd49, %rd231; 2026-02-21T09:31:52.9940917Z add.s64 %rd217, %rd232, 131072; 2026-02-21T09:31:52.9940987Z add.s64 %rd218, %rd232, 163840; 2026-02-21T09:31:52.9941054Z add.s64 %rd219, %rd232, 196608; 2026-02-21T09:31:52.9941161Z mad.wide.u32 %rd220, %r824, 2, %rd49; 2026-02-21T09:31:52.9941251Z add.s64 %rd221, %rd232, 262144; 2026-02-21T09:31:52.9941316Z add.s64 %rd222, %rd232, 294912; 2026-02-21T09:31:52.9941389Z add.s64 %rd223, %rd232, 327680; 2026-02-21T09:31:52.9941452Z add.s64 %rd224, %rd232, 360448; 2026-02-21T09:31:52.9941516Z add.s64 %rd225, %rd232, 393216; 
2026-02-21T09:31:52.9941581Z add.s64 %rd226, %rd232, 425984; 2026-02-21T09:31:52.9941655Z add.s64 %rd227, %rd232, 458752; 2026-02-21T09:31:52.9941725Z mad.wide.u32 %rd228, %r825, 2, %rd49; 2026-02-21T09:31:52.9941911Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9941982Z // begin inline asm 2026-02-21T09:31:52.9942300Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551}, [%r671 + 0]; 2026-02-21T09:31:52.9942387Z // end inline asm 2026-02-21T09:31:52.9942482Z // begin inline asm 2026-02-21T09:31:52.9942794Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568}, [%r671 + 16]; 2026-02-21T09:31:52.9942857Z // end inline asm 2026-02-21T09:31:52.9942919Z // begin inline asm 2026-02-21T09:31:52.9943219Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585}, [%r671 + 32]; 2026-02-21T09:31:52.9943279Z // end inline asm 2026-02-21T09:31:52.9943340Z // begin inline asm 2026-02-21T09:31:52.9943651Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r587, %r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602}, [%r671 + 48]; 2026-02-21T09:31:52.9943712Z // end inline asm 2026-02-21T09:31:52.9943773Z // begin inline asm 2026-02-21T09:31:52.9944075Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619}, [%r671 + 64]; 2026-02-21T09:31:52.9944137Z // end inline asm 2026-02-21T09:31:52.9944199Z // begin inline asm 2026-02-21T09:31:52.9944504Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636}, [%r671 + 80]; 
2026-02-21T09:31:52.9944563Z // end inline asm 2026-02-21T09:31:52.9944623Z // begin inline asm 2026-02-21T09:31:52.9944959Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653}, [%r671 + 96]; 2026-02-21T09:31:52.9945076Z // end inline asm 2026-02-21T09:31:52.9945137Z // begin inline asm 2026-02-21T09:31:52.9945436Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670}, [%r671 + 112]; 2026-02-21T09:31:52.9945508Z // end inline asm 2026-02-21T09:31:52.9945573Z // begin inline asm 2026-02-21T09:31:52.9945653Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:31:52.9945720Z // end inline asm 2026-02-21T09:31:52.9945785Z cvt.u64.u32 %rd233, %r536; 2026-02-21T09:31:52.9945848Z cvt.u64.u32 %rd234, %r537; 2026-02-21T09:31:52.9945915Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:31:52.9945990Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:31:52.9946174Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9946241Z mov.b64 {%r826, %r827}, %rd236; 2026-02-21T09:31:52.9946320Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T09:31:52.9946500Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9946564Z cvt.u64.u32 %rd237, %r538; 2026-02-21T09:31:52.9946636Z cvt.u64.u32 %rd238, %r539; 2026-02-21T09:31:52.9946703Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:31:52.9946771Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:31:52.9947017Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9947131Z mov.b64 {%r829, %r830}, %rd240; 2026-02-21T09:31:52.9947203Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T09:31:52.9947387Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9947459Z cvt.u64.u32 %rd241, %r540; 
2026-02-21T09:31:52.9947523Z cvt.u64.u32 %rd242, %r541; 2026-02-21T09:31:52.9947588Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:31:52.9947657Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:31:52.9947842Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9947906Z mov.b64 {%r832, %r833}, %rd244; 2026-02-21T09:31:52.9947977Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T09:31:52.9948220Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9948287Z cvt.u64.u32 %rd245, %r542; 2026-02-21T09:31:52.9948352Z cvt.u64.u32 %rd246, %r543; 2026-02-21T09:31:52.9948425Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:31:52.9948489Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:31:52.9948673Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9948745Z mov.b64 {%r835, %r836}, %rd248; 2026-02-21T09:31:52.9948815Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T09:31:52.9949000Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9949064Z cvt.u64.u32 %rd249, %r544; 2026-02-21T09:31:52.9949136Z cvt.u64.u32 %rd250, %r545; 2026-02-21T09:31:52.9949200Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:31:52.9949263Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:31:52.9949457Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9949522Z mov.b64 {%r838, %r839}, %rd252; 2026-02-21T09:31:52.9949593Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T09:31:52.9949786Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9949850Z cvt.u64.u32 %rd253, %r546; 2026-02-21T09:31:52.9949913Z cvt.u64.u32 %rd254, %r547; 2026-02-21T09:31:52.9949977Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:31:52.9950049Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:31:52.9950236Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9950300Z mov.b64 {%r841, %r842}, %rd256; 2026-02-21T09:31:52.9950376Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T09:31:52.9950561Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9950625Z cvt.u64.u32 %rd257, %r548; 2026-02-21T09:31:52.9950696Z cvt.u64.u32 %rd258, %r549; 2026-02-21T09:31:52.9950760Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:31:52.9950825Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:31:52.9951011Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9951082Z mov.b64 {%r844, %r845}, %rd260; 2026-02-21T09:31:52.9951151Z cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T09:31:52.9951339Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9951411Z cvt.u64.u32 %rd261, %r550; 2026-02-21T09:31:52.9951473Z cvt.u64.u32 %rd262, %r551; 2026-02-21T09:31:52.9951536Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:31:52.9951608Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:31:52.9951794Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9951858Z mov.b64 {%r847, %r848}, %rd264; 2026-02-21T09:31:52.9951928Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T09:31:52.9952120Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9952232Z cvt.u64.u32 %rd265, %r553; 2026-02-21T09:31:52.9952296Z cvt.u64.u32 %rd266, %r554; 2026-02-21T09:31:52.9952371Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:31:52.9952436Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:31:52.9952626Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9952701Z mov.b64 {%r850, %r851}, %rd268; 2026-02-21T09:31:52.9952771Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T09:31:52.9952957Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9953022Z cvt.u64.u32 %rd269, 
%r555; 2026-02-21T09:31:52.9953093Z cvt.u64.u32 %rd270, %r556; 2026-02-21T09:31:52.9953157Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:31:52.9953239Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:31:52.9953456Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9953525Z mov.b64 {%r853, %r854}, %rd272; 2026-02-21T09:31:52.9953593Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T09:31:52.9953788Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9953851Z cvt.u64.u32 %rd273, %r557; 2026-02-21T09:31:52.9953914Z cvt.u64.u32 %rd274, %r558; 2026-02-21T09:31:52.9953978Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:31:52.9954050Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:31:52.9954237Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9954300Z mov.b64 {%r856, %r857}, %rd276; 2026-02-21T09:31:52.9954376Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T09:31:52.9954569Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9954635Z cvt.u64.u32 %rd277, %r559; 2026-02-21T09:31:52.9954760Z cvt.u64.u32 %rd278, %r560; 2026-02-21T09:31:52.9954829Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:31:52.9954893Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:31:52.9955083Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9955157Z mov.b64 {%r859, %r860}, %rd280; 2026-02-21T09:31:52.9955225Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T09:31:52.9955410Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9955482Z cvt.u64.u32 %rd281, %r561; 2026-02-21T09:31:52.9955547Z cvt.u64.u32 %rd282, %r562; 2026-02-21T09:31:52.9955611Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:31:52.9955685Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:31:52.9955864Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9955932Z mov.b64 {%r862, %r863}, %rd284; 2026-02-21T09:31:52.9956004Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T09:31:52.9956196Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9956261Z cvt.u64.u32 %rd285, %r563; 2026-02-21T09:31:52.9956325Z cvt.u64.u32 %rd286, %r564; 2026-02-21T09:31:52.9956399Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:31:52.9956464Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:31:52.9956647Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9956719Z mov.b64 {%r865, %r866}, %rd288; 2026-02-21T09:31:52.9956787Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T09:31:52.9956973Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9957039Z cvt.u64.u32 %rd289, %r565; 2026-02-21T09:31:52.9957111Z cvt.u64.u32 %rd290, %r566; 2026-02-21T09:31:52.9957178Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:31:52.9957246Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:31:52.9957511Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9957576Z mov.b64 {%r868, %r869}, %rd292; 2026-02-21T09:31:52.9957644Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T09:31:52.9957838Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9957901Z cvt.u64.u32 %rd293, %r567; 2026-02-21T09:31:52.9957965Z cvt.u64.u32 %rd294, %r568; 2026-02-21T09:31:52.9958029Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:31:52.9958101Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:31:52.9958286Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9958349Z mov.b64 {%r871, %r872}, %rd296; 2026-02-21T09:31:52.9958426Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T09:31:52.9958664Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9958734Z cvt.u64.u32 %rd297, 
%r570; 2026-02-21T09:31:52.9958804Z cvt.u64.u32 %rd298, %r571; 2026-02-21T09:31:52.9958868Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:31:52.9958931Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:31:52.9959109Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9959181Z mov.b64 {%r874, %r875}, %rd300; 2026-02-21T09:31:52.9959248Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T09:31:52.9959431Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9959501Z cvt.u64.u32 %rd301, %r572; 2026-02-21T09:31:52.9959564Z cvt.u64.u32 %rd302, %r573; 2026-02-21T09:31:52.9959627Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:31:52.9959697Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:31:52.9959880Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9959946Z mov.b64 {%r877, %r878}, %rd304; 2026-02-21T09:31:52.9960015Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T09:31:52.9960207Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9960270Z cvt.u64.u32 %rd305, %r574; 2026-02-21T09:31:52.9960333Z cvt.u64.u32 %rd306, %r575; 2026-02-21T09:31:52.9960405Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:31:52.9960468Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:31:52.9960652Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9960722Z mov.b64 {%r880, %r881}, %rd308; 2026-02-21T09:31:52.9960791Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T09:31:52.9960971Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9961035Z cvt.u64.u32 %rd309, %r576; 2026-02-21T09:31:52.9961108Z cvt.u64.u32 %rd310, %r577; 2026-02-21T09:31:52.9961175Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:31:52.9961240Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:31:52.9961432Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9961498Z mov.b64 {%r883, %r884}, %rd312; 2026-02-21T09:31:52.9961565Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T09:31:52.9961760Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9961824Z cvt.u64.u32 %rd313, %r578; 2026-02-21T09:31:52.9961887Z cvt.u64.u32 %rd314, %r579; 2026-02-21T09:31:52.9961952Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:31:52.9962027Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:31:52.9962214Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9962283Z mov.b64 {%r886, %r887}, %rd316; 2026-02-21T09:31:52.9962362Z cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T09:31:52.9962571Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9962676Z cvt.u64.u32 %rd317, %r580; 2026-02-21T09:31:52.9962747Z cvt.u64.u32 %rd318, %r581; 2026-02-21T09:31:52.9962811Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:31:52.9962876Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:31:52.9963060Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9963133Z mov.b64 {%r889, %r890}, %rd320; 2026-02-21T09:31:52.9963201Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T09:31:52.9963384Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9963458Z cvt.u64.u32 %rd321, %r582; 2026-02-21T09:31:52.9963522Z cvt.u64.u32 %rd322, %r583; 2026-02-21T09:31:52.9963585Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:31:52.9963677Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:31:52.9963887Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9963955Z mov.b64 {%r892, %r893}, %rd324; 2026-02-21T09:31:52.9964022Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T09:31:52.9964208Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9964272Z cvt.u64.u32 %rd325, 
%r584; 2026-02-21T09:31:52.9964335Z cvt.u64.u32 %rd326, %r585; 2026-02-21T09:31:52.9964406Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:31:52.9964469Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:31:52.9964655Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9964776Z mov.b64 {%r895, %r896}, %rd328; 2026-02-21T09:31:52.9964846Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T09:31:52.9965029Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9965094Z cvt.u64.u32 %rd329, %r587; 2026-02-21T09:31:52.9965168Z cvt.u64.u32 %rd330, %r588; 2026-02-21T09:31:52.9965232Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:31:52.9965295Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:31:52.9965484Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9965547Z mov.b64 {%r898, %r899}, %rd332; 2026-02-21T09:31:52.9965616Z cvt.rn.f16x2.f32 %r900, %r899, %r898; 2026-02-21T09:31:52.9965803Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9965867Z cvt.u64.u32 %rd333, %r589; 2026-02-21T09:31:52.9965931Z cvt.u64.u32 %rd334, %r590; 2026-02-21T09:31:52.9965994Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:31:52.9966067Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:31:52.9966253Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9966319Z mov.b64 {%r901, %r902}, %rd336; 2026-02-21T09:31:52.9966395Z cvt.rn.f16x2.f32 %r903, %r902, %r901; 2026-02-21T09:31:52.9966577Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9966641Z cvt.u64.u32 %rd337, %r591; 2026-02-21T09:31:52.9966713Z cvt.u64.u32 %rd338, %r592; 2026-02-21T09:31:52.9966779Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:31:52.9966844Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:31:52.9967028Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9967102Z mov.b64 {%r904, %r905}, %rd340; 2026-02-21T09:31:52.9967170Z cvt.rn.f16x2.f32 %r906, %r905, %r904; 2026-02-21T09:31:52.9967350Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9967421Z cvt.u64.u32 %rd341, %r593; 2026-02-21T09:31:52.9967485Z cvt.u64.u32 %rd342, %r594; 2026-02-21T09:31:52.9967550Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:31:52.9967654Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:31:52.9967866Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9967929Z mov.b64 {%r907, %r908}, %rd344; 2026-02-21T09:31:52.9967995Z cvt.rn.f16x2.f32 %r909, %r908, %r907; 2026-02-21T09:31:52.9968184Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9968247Z cvt.u64.u32 %rd345, %r595; 2026-02-21T09:31:52.9968309Z cvt.u64.u32 %rd346, %r596; 2026-02-21T09:31:52.9968381Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:31:52.9968446Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:31:52.9968636Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9968705Z mov.b64 {%r910, %r911}, %rd348; 2026-02-21T09:31:52.9968812Z cvt.rn.f16x2.f32 %r912, %r911, %r910; 2026-02-21T09:31:52.9969020Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9969089Z cvt.u64.u32 %rd349, %r597; 2026-02-21T09:31:52.9969159Z cvt.u64.u32 %rd350, %r598; 2026-02-21T09:31:52.9969221Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:31:52.9969284Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:31:52.9969478Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9969542Z mov.b64 {%r913, %r914}, %rd352; 2026-02-21T09:31:52.9969609Z cvt.rn.f16x2.f32 %r915, %r914, %r913; 2026-02-21T09:31:52.9969801Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9969866Z cvt.u64.u32 %rd353, 
%r599; 2026-02-21T09:31:52.9969928Z cvt.u64.u32 %rd354, %r600; 2026-02-21T09:31:52.9969992Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:31:52.9970063Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:31:52.9970254Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9970321Z mov.b64 {%r916, %r917}, %rd356; 2026-02-21T09:31:52.9970397Z cvt.rn.f16x2.f32 %r918, %r917, %r916; 2026-02-21T09:31:52.9970581Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9970645Z cvt.u64.u32 %rd357, %r601; 2026-02-21T09:31:52.9970716Z cvt.u64.u32 %rd358, %r602; 2026-02-21T09:31:52.9970782Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:31:52.9970845Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T09:31:52.9971031Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9971104Z mov.b64 {%r919, %r920}, %rd360; 2026-02-21T09:31:52.9971173Z cvt.rn.f16x2.f32 %r921, %r920, %r919; 2026-02-21T09:31:52.9971358Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9971433Z cvt.u64.u32 %rd361, %r604; 2026-02-21T09:31:52.9971499Z cvt.u64.u32 %rd362, %r605; 2026-02-21T09:31:52.9971564Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:31:52.9971635Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:31:52.9971821Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9971886Z mov.b64 {%r922, %r923}, %rd364; 2026-02-21T09:31:52.9971957Z cvt.rn.f16x2.f32 %r924, %r923, %r922; 2026-02-21T09:31:52.9972155Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9972221Z cvt.u64.u32 %rd365, %r606; 2026-02-21T09:31:52.9972285Z cvt.u64.u32 %rd366, %r607; 2026-02-21T09:31:52.9972360Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:31:52.9972425Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:31:52.9972609Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9972681Z mov.b64 {%r925, %r926}, %rd368; 2026-02-21T09:31:52.9972752Z cvt.rn.f16x2.f32 %r927, %r926, %r925; 2026-02-21T09:31:52.9973024Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9973089Z cvt.u64.u32 %rd369, %r608; 2026-02-21T09:31:52.9973160Z cvt.u64.u32 %rd370, %r609; 2026-02-21T09:31:52.9973223Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:31:52.9973287Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:31:52.9973478Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9973541Z mov.b64 {%r928, %r929}, %rd372; 2026-02-21T09:31:52.9973610Z cvt.rn.f16x2.f32 %r930, %r929, %r928; 2026-02-21T09:31:52.9973804Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9973867Z cvt.u64.u32 %rd373, %r610; 2026-02-21T09:31:52.9973931Z cvt.u64.u32 %rd374, %r611; 2026-02-21T09:31:52.9974016Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:31:52.9974127Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:31:52.9974316Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9974383Z mov.b64 {%r931, %r932}, %rd376; 2026-02-21T09:31:52.9974460Z cvt.rn.f16x2.f32 %r933, %r932, %r931; 2026-02-21T09:31:52.9974644Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9974757Z cvt.u64.u32 %rd377, %r612; 2026-02-21T09:31:52.9974828Z cvt.u64.u32 %rd378, %r613; 2026-02-21T09:31:52.9974891Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:31:52.9974954Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:31:52.9975140Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9975215Z mov.b64 {%r934, %r935}, %rd380; 2026-02-21T09:31:52.9975283Z cvt.rn.f16x2.f32 %r936, %r935, %r934; 2026-02-21T09:31:52.9975468Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9975542Z cvt.u64.u32 %rd381, 
%r614; 2026-02-21T09:31:52.9975606Z cvt.u64.u32 %rd382, %r615; 2026-02-21T09:31:52.9975670Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:31:52.9975733Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:31:52.9975923Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9975988Z mov.b64 {%r937, %r938}, %rd384; 2026-02-21T09:31:52.9976055Z cvt.rn.f16x2.f32 %r939, %r938, %r937; 2026-02-21T09:31:52.9976246Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9976310Z cvt.u64.u32 %rd385, %r616; 2026-02-21T09:31:52.9976373Z cvt.u64.u32 %rd386, %r617; 2026-02-21T09:31:52.9976443Z shl.b64 %rd387, %rd386, 32; 2026-02-21T09:31:52.9976506Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:31:52.9976690Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9976767Z mov.b64 {%r940, %r941}, %rd388; 2026-02-21T09:31:52.9976837Z cvt.rn.f16x2.f32 %r942, %r941, %r940; 2026-02-21T09:31:52.9977023Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9977088Z cvt.u64.u32 %rd389, %r618; 2026-02-21T09:31:52.9977159Z cvt.u64.u32 %rd390, %r619; 2026-02-21T09:31:52.9977224Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:31:52.9977289Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:31:52.9977473Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9977537Z mov.b64 {%r943, %r944}, %rd392; 2026-02-21T09:31:52.9977603Z cvt.rn.f16x2.f32 %r945, %r944, %r943; 2026-02-21T09:31:52.9977795Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9977860Z cvt.u64.u32 %rd393, %r621; 2026-02-21T09:31:52.9977924Z cvt.u64.u32 %rd394, %r622; 2026-02-21T09:31:52.9978021Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:31:52.9978118Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:31:52.9978304Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9978368Z mov.b64 {%r946, %r947}, %rd396; 2026-02-21T09:31:52.9978443Z cvt.rn.f16x2.f32 %r948, %r947, %r946; 2026-02-21T09:31:52.9978619Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9978683Z cvt.u64.u32 %rd397, %r623; 2026-02-21T09:31:52.9978753Z cvt.u64.u32 %rd398, %r624; 2026-02-21T09:31:52.9978816Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:31:52.9978879Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:31:52.9979063Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9979135Z mov.b64 {%r949, %r950}, %rd400; 2026-02-21T09:31:52.9979230Z cvt.rn.f16x2.f32 %r951, %r950, %r949; 2026-02-21T09:31:52.9979440Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9979518Z cvt.u64.u32 %rd401, %r625; 2026-02-21T09:31:52.9979580Z cvt.u64.u32 %rd402, %r626; 2026-02-21T09:31:52.9979644Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:31:52.9979708Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:31:52.9979899Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9979964Z mov.b64 {%r952, %r953}, %rd404; 2026-02-21T09:31:52.9980031Z cvt.rn.f16x2.f32 %r954, %r953, %r952; 2026-02-21T09:31:52.9980226Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9980290Z cvt.u64.u32 %rd405, %r627; 2026-02-21T09:31:52.9980353Z cvt.u64.u32 %rd406, %r628; 2026-02-21T09:31:52.9980424Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:31:52.9980489Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:31:52.9980674Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9980750Z mov.b64 {%r955, %r956}, %rd408; 2026-02-21T09:31:52.9980818Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T09:31:52.9981006Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9981071Z cvt.u64.u32 %rd409, 
%r629; 2026-02-21T09:31:52.9981141Z cvt.u64.u32 %rd410, %r630; 2026-02-21T09:31:52.9981207Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:31:52.9981271Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T09:31:52.9981468Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9981532Z mov.b64 {%r958, %r959}, %rd412; 2026-02-21T09:31:52.9981599Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T09:31:52.9981797Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9981865Z cvt.u64.u32 %rd413, %r631; 2026-02-21T09:31:52.9981932Z cvt.u64.u32 %rd414, %r632; 2026-02-21T09:31:52.9981999Z shl.b64 %rd415, %rd414, 32; 2026-02-21T09:31:52.9982074Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:31:52.9982264Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9982329Z mov.b64 {%r961, %r962}, %rd416; 2026-02-21T09:31:52.9982406Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T09:31:52.9982595Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9982658Z cvt.u64.u32 %rd417, %r633; 2026-02-21T09:31:52.9982721Z cvt.u64.u32 %rd418, %r634; 2026-02-21T09:31:52.9982791Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:31:52.9982854Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:31:52.9983037Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9983110Z mov.b64 {%r964, %r965}, %rd420; 2026-02-21T09:31:52.9983204Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T09:31:52.9983410Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9983480Z cvt.u64.u32 %rd421, %r635; 2026-02-21T09:31:52.9983544Z cvt.u64.u32 %rd422, %r636; 2026-02-21T09:31:52.9983609Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:31:52.9983673Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:31:52.9983865Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9983928Z mov.b64 {%r967, %r968}, %rd424; 2026-02-21T09:31:52.9983997Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T09:31:52.9984190Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9984254Z cvt.u64.u32 %rd425, %r638; 2026-02-21T09:31:52.9984342Z cvt.u64.u32 %rd426, %r639; 2026-02-21T09:31:52.9984445Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:31:52.9984513Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:31:52.9984736Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9984801Z mov.b64 {%r970, %r971}, %rd428; 2026-02-21T09:31:52.9984876Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T09:31:52.9985065Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9985128Z cvt.u64.u32 %rd429, %r640; 2026-02-21T09:31:52.9985199Z cvt.u64.u32 %rd430, %r641; 2026-02-21T09:31:52.9985262Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:31:52.9985326Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:31:52.9985527Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9985592Z mov.b64 {%r973, %r974}, %rd432; 2026-02-21T09:31:52.9985662Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T09:31:52.9985852Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9985926Z cvt.u64.u32 %rd433, %r642; 2026-02-21T09:31:52.9985988Z cvt.u64.u32 %rd434, %r643; 2026-02-21T09:31:52.9986052Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:31:52.9986123Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T09:31:52.9986308Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9986372Z mov.b64 {%r976, %r977}, %rd436; 2026-02-21T09:31:52.9986446Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T09:31:52.9986637Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9986701Z cvt.u64.u32 %rd437, 
%r644; 2026-02-21T09:31:52.9986763Z cvt.u64.u32 %rd438, %r645; 2026-02-21T09:31:52.9986836Z shl.b64 %rd439, %rd438, 32; 2026-02-21T09:31:52.9986901Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:31:52.9987093Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9987170Z mov.b64 {%r979, %r980}, %rd440; 2026-02-21T09:31:52.9987239Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T09:31:52.9987425Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9987494Z cvt.u64.u32 %rd441, %r646; 2026-02-21T09:31:52.9987557Z cvt.u64.u32 %rd442, %r647; 2026-02-21T09:31:52.9987620Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:31:52.9987682Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:31:52.9987878Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9987941Z mov.b64 {%r982, %r983}, %rd444; 2026-02-21T09:31:52.9988008Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T09:31:52.9988200Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9988266Z cvt.u64.u32 %rd445, %r648; 2026-02-21T09:31:52.9988330Z cvt.u64.u32 %rd446, %r649; 2026-02-21T09:31:52.9988434Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:31:52.9988527Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:31:52.9988707Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9988771Z mov.b64 {%r985, %r986}, %rd448; 2026-02-21T09:31:52.9988845Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T09:31:52.9989027Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9989090Z cvt.u64.u32 %rd449, %r650; 2026-02-21T09:31:52.9989164Z cvt.u64.u32 %rd450, %r651; 2026-02-21T09:31:52.9989227Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:31:52.9989292Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:31:52.9989486Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 
2026-02-21T09:31:52.9989580Z mov.b64 {%r988, %r989}, %rd452; 2026-02-21T09:31:52.9989675Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T09:31:52.9989861Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9989933Z cvt.u64.u32 %rd453, %r652; 2026-02-21T09:31:52.9989994Z cvt.u64.u32 %rd454, %r653; 2026-02-21T09:31:52.9990058Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:31:52.9990130Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:31:52.9990309Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9990374Z mov.b64 {%r991, %r992}, %rd456; 2026-02-21T09:31:52.9990449Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T09:31:52.9990632Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9990694Z cvt.u64.u32 %rd457, %r655; 2026-02-21T09:31:52.9990758Z cvt.u64.u32 %rd458, %r656; 2026-02-21T09:31:52.9990832Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:31:52.9990897Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:31:52.9991080Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9991154Z mov.b64 {%r994, %r995}, %rd460; 2026-02-21T09:31:52.9991222Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T09:31:52.9991400Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9991470Z cvt.u64.u32 %rd461, %r657; 2026-02-21T09:31:52.9991534Z cvt.u64.u32 %rd462, %r658; 2026-02-21T09:31:52.9991600Z shl.b64 %rd463, %rd462, 32; 2026-02-21T09:31:52.9991666Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:31:52.9991861Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9991927Z mov.b64 {%r997, %r998}, %rd464; 2026-02-21T09:31:52.9991996Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T09:31:52.9992195Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9992260Z cvt.u64.u32 %rd465, 
%r659; 2026-02-21T09:31:52.9992326Z cvt.u64.u32 %rd466, %r660; 2026-02-21T09:31:52.9992397Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:31:52.9992461Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:31:52.9992644Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9992714Z mov.b64 {%r1000, %r1001}, %rd468; 2026-02-21T09:31:52.9992798Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T09:31:52.9992982Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9993045Z cvt.u64.u32 %rd469, %r661; 2026-02-21T09:31:52.9993116Z cvt.u64.u32 %rd470, %r662; 2026-02-21T09:31:52.9993179Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:31:52.9993243Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:31:52.9993435Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9993530Z mov.b64 {%r1003, %r1004}, %rd472; 2026-02-21T09:31:52.9993632Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T09:31:52.9993813Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9993886Z cvt.u64.u32 %rd473, %r663; 2026-02-21T09:31:52.9993950Z cvt.u64.u32 %rd474, %r664; 2026-02-21T09:31:52.9994012Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:31:52.9994084Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:31:52.9994268Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9994333Z mov.b64 {%r1006, %r1007}, %rd476; 2026-02-21T09:31:52.9994414Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T09:31:52.9994598Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9994662Z cvt.u64.u32 %rd477, %r665; 2026-02-21T09:31:52.9994813Z cvt.u64.u32 %rd478, %r666; 2026-02-21T09:31:52.9994916Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:31:52.9994984Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:31:52.9995167Z .loc 1 48 27 // 
clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9995241Z mov.b64 {%r1009, %r1010}, %rd480; 2026-02-21T09:31:52.9995316Z cvt.rn.f16x2.f32 %r1011, %r1010, %r1009; 2026-02-21T09:31:52.9995501Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9995571Z cvt.u64.u32 %rd481, %r667; 2026-02-21T09:31:52.9995634Z cvt.u64.u32 %rd482, %r668; 2026-02-21T09:31:52.9995698Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:31:52.9995762Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T09:31:52.9995955Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9996020Z mov.b64 {%r1012, %r1013}, %rd484; 2026-02-21T09:31:52.9996096Z cvt.rn.f16x2.f32 %r1014, %r1013, %r1012; 2026-02-21T09:31:52.9996287Z .loc 1 46 52 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:46:52 2026-02-21T09:31:52.9996353Z cvt.u64.u32 %rd485, %r669; 2026-02-21T09:31:52.9996416Z cvt.u64.u32 %rd486, %r670; 2026-02-21T09:31:52.9996488Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:31:52.9996552Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:31:52.9996737Z .loc 1 48 27 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:48:27 2026-02-21T09:31:52.9996803Z mov.b64 {%r1015, %r1016}, %rd488; 2026-02-21T09:31:52.9996883Z cvt.rn.f16x2.f32 %r1017, %r1016, %r1015; 2026-02-21T09:31:52.9997065Z .loc 1 49 82 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:49:82 2026-02-21T09:31:52.9997171Z st.shared.v4.b32 [%r27], {%r828, %r840, %r852, %r864}; 2026-02-21T09:31:52.9997280Z st.shared.v4.b32 [%r28], {%r876, %r888, %r900, %r912}; 2026-02-21T09:31:52.9997345Z bar.sync 0; 2026-02-21T09:31:52.9997410Z // begin inline asm 2026-02-21T09:31:52.9997587Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r752, %r756, %r760, %r764}, [%r676]; 2026-02-21T09:31:52.9997650Z // end inline asm 2026-02-21T09:31:52.9997712Z // begin inline asm 2026-02-21T09:31:52.9997876Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r768, 
%r772, %r776, %r780}, [%r681]; 2026-02-21T09:31:52.9997944Z // end inline asm 2026-02-21T09:31:52.9998003Z bar.sync 0; 2026-02-21T09:31:52.9998100Z st.shared.v4.b32 [%r27], {%r924, %r936, %r948, %r960}; 2026-02-21T09:31:52.9998208Z st.shared.v4.b32 [%r28], {%r972, %r984, %r996, %r1008}; 2026-02-21T09:31:52.9998268Z bar.sync 0; 2026-02-21T09:31:52.9998330Z // begin inline asm 2026-02-21T09:31:52.9998494Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r784, %r788, %r792, %r796}, [%r676]; 2026-02-21T09:31:52.9998555Z // end inline asm 2026-02-21T09:31:52.9998616Z // begin inline asm 2026-02-21T09:31:52.9998776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r800, %r804, %r808, %r812}, [%r681]; 2026-02-21T09:31:52.9998844Z // end inline asm 2026-02-21T09:31:52.9998946Z bar.sync 0; 2026-02-21T09:31:52.9999069Z st.shared.v4.b32 [%r27], {%r831, %r843, %r855, %r867}; 2026-02-21T09:31:52.9999171Z st.shared.v4.b32 [%r28], {%r879, %r891, %r903, %r915}; 2026-02-21T09:31:52.9999231Z bar.sync 0; 2026-02-21T09:31:52.9999292Z // begin inline asm 2026-02-21T09:31:52.9999448Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r753, %r757, %r761, %r765}, [%r676]; 2026-02-21T09:31:52.9999516Z // end inline asm 2026-02-21T09:31:52.9999577Z // begin inline asm 2026-02-21T09:31:52.9999734Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r769, %r773, %r777, %r781}, [%r681]; 2026-02-21T09:31:52.9999803Z // end inline asm 2026-02-21T09:31:52.9999862Z bar.sync 0; 2026-02-21T09:31:52.9999956Z st.shared.v4.b32 [%r27], {%r927, %r939, %r951, %r963}; 2026-02-21T09:31:53.0000061Z st.shared.v4.b32 [%r28], {%r975, %r987, %r999, %r1011}; 2026-02-21T09:31:53.0000121Z bar.sync 0; 2026-02-21T09:31:53.0000207Z // begin inline asm 2026-02-21T09:31:53.0000389Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r785, %r789, %r793, %r797}, [%r676]; 2026-02-21T09:31:53.0000462Z // end inline asm 2026-02-21T09:31:53.0000527Z // begin inline asm 2026-02-21T09:31:53.0000682Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r801, %r805, %r809, 
%r813}, [%r681]; 2026-02-21T09:31:53.0000752Z // end inline asm 2026-02-21T09:31:53.0000813Z bar.sync 0; 2026-02-21T09:31:53.0000907Z st.shared.v4.b32 [%r27], {%r834, %r846, %r858, %r870}; 2026-02-21T09:31:53.0001005Z st.shared.v4.b32 [%r28], {%r882, %r894, %r906, %r918}; 2026-02-21T09:31:53.0001074Z bar.sync 0; 2026-02-21T09:31:53.0001136Z // begin inline asm 2026-02-21T09:31:53.0001292Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r754, %r758, %r762, %r766}, [%r676]; 2026-02-21T09:31:53.0001364Z // end inline asm 2026-02-21T09:31:53.0001425Z // begin inline asm 2026-02-21T09:31:53.0001579Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r770, %r774, %r778, %r782}, [%r681]; 2026-02-21T09:31:53.0001662Z // end inline asm 2026-02-21T09:31:53.0001724Z bar.sync 0; 2026-02-21T09:31:53.0001816Z st.shared.v4.b32 [%r27], {%r930, %r942, %r954, %r966}; 2026-02-21T09:31:53.0001931Z st.shared.v4.b32 [%r28], {%r978, %r990, %r1002, %r1014}; 2026-02-21T09:31:53.0001998Z bar.sync 0; 2026-02-21T09:31:53.0002059Z // begin inline asm 2026-02-21T09:31:53.0002213Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r786, %r790, %r794, %r798}, [%r676]; 2026-02-21T09:31:53.0002278Z // end inline asm 2026-02-21T09:31:53.0002340Z // begin inline asm 2026-02-21T09:31:53.0002490Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r802, %r806, %r810, %r814}, [%r681]; 2026-02-21T09:31:53.0002550Z // end inline asm 2026-02-21T09:31:53.0002617Z bar.sync 0; 2026-02-21T09:31:53.0002711Z st.shared.v4.b32 [%r27], {%r837, %r849, %r861, %r873}; 2026-02-21T09:31:53.0002803Z st.shared.v4.b32 [%r28], {%r885, %r897, %r909, %r921}; 2026-02-21T09:31:53.0002871Z bar.sync 0; 2026-02-21T09:31:53.0002932Z // begin inline asm 2026-02-21T09:31:53.0003087Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r755, %r759, %r763, %r767}, [%r676]; 2026-02-21T09:31:53.0003157Z // end inline asm 2026-02-21T09:31:53.0003218Z // begin inline asm 2026-02-21T09:31:53.0003370Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r771, %r775, %r779, %r783}, [%r681]; 
2026-02-21T09:31:53.0003431Z // end inline asm 2026-02-21T09:31:53.0003497Z bar.sync 0; 2026-02-21T09:31:53.0003589Z st.shared.v4.b32 [%r27], {%r933, %r945, %r957, %r969}; 2026-02-21T09:31:53.0003692Z st.shared.v4.b32 [%r28], {%r981, %r993, %r1005, %r1017}; 2026-02-21T09:31:53.0003758Z bar.sync 0; 2026-02-21T09:31:53.0003817Z // begin inline asm 2026-02-21T09:31:53.0003969Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r787, %r791, %r795, %r799}, [%r676]; 2026-02-21T09:31:53.0004028Z // end inline asm 2026-02-21T09:31:53.0004096Z // begin inline asm 2026-02-21T09:31:53.0004247Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r803, %r807, %r811, %r815}, [%r681]; 2026-02-21T09:31:53.0004307Z // end inline asm 2026-02-21T09:31:53.0004378Z // begin inline asm 2026-02-21T09:31:53.0004525Z st.global.v4.b32 [ %rd213 + 0 ], { %r752, %r753, %r754, %r755 }; 2026-02-21T09:31:53.0004604Z // end inline asm 2026-02-21T09:31:53.0004664Z // begin inline asm 2026-02-21T09:31:53.0004837Z st.global.v4.b32 [ %rd214 + 0 ], { %r756, %r757, %r758, %r759 }; 2026-02-21T09:31:53.0004896Z // end inline asm 2026-02-21T09:31:53.0004957Z // begin inline asm 2026-02-21T09:31:53.0005076Z st.global.v4.b32 [ %rd215 + 0 ], { %r760, %r761, %r762, %r763 }; 2026-02-21T09:31:53.0005137Z // end inline asm 2026-02-21T09:31:53.0005198Z // begin inline asm 2026-02-21T09:31:53.0005314Z st.global.v4.b32 [ %rd216 + 0 ], { %r764, %r765, %r766, %r767 }; 2026-02-21T09:31:53.0005374Z // end inline asm 2026-02-21T09:31:53.0005433Z // begin inline asm 2026-02-21T09:31:53.0005537Z st.global.v4.b32 [ %rd217 + 0 ], { %r768, %r769, %r770, %r771 }; 2026-02-21T09:31:53.0005605Z // end inline asm 2026-02-21T09:31:53.0005694Z // begin inline asm 2026-02-21T09:31:53.0005827Z st.global.v4.b32 [ %rd218 + 0 ], { %r772, %r773, %r774, %r775 }; 2026-02-21T09:31:53.0005896Z // end inline asm 2026-02-21T09:31:53.0005957Z // begin inline asm 2026-02-21T09:31:53.0006062Z st.global.v4.b32 [ %rd219 + 0 ], { %r776, %r777, %r778, %r779 }; 
2026-02-21T09:31:53.0006123Z // end inline asm 2026-02-21T09:31:53.0006191Z // begin inline asm 2026-02-21T09:31:53.0006294Z st.global.v4.b32 [ %rd220 + 0 ], { %r780, %r781, %r782, %r783 }; 2026-02-21T09:31:53.0006355Z // end inline asm 2026-02-21T09:31:53.0006425Z // begin inline asm 2026-02-21T09:31:53.0006529Z st.global.v4.b32 [ %rd221 + 0 ], { %r784, %r785, %r786, %r787 }; 2026-02-21T09:31:53.0006589Z // end inline asm 2026-02-21T09:31:53.0006657Z // begin inline asm 2026-02-21T09:31:53.0006763Z st.global.v4.b32 [ %rd222 + 0 ], { %r788, %r789, %r790, %r791 }; 2026-02-21T09:31:53.0006824Z // end inline asm 2026-02-21T09:31:53.0006883Z // begin inline asm 2026-02-21T09:31:53.0006998Z st.global.v4.b32 [ %rd223 + 0 ], { %r792, %r793, %r794, %r795 }; 2026-02-21T09:31:53.0007059Z // end inline asm 2026-02-21T09:31:53.0007120Z // begin inline asm 2026-02-21T09:31:53.0007229Z st.global.v4.b32 [ %rd224 + 0 ], { %r796, %r797, %r798, %r799 }; 2026-02-21T09:31:53.0007288Z // end inline asm 2026-02-21T09:31:53.0007347Z // begin inline asm 2026-02-21T09:31:53.0007450Z st.global.v4.b32 [ %rd225 + 0 ], { %r800, %r801, %r802, %r803 }; 2026-02-21T09:31:53.0007519Z // end inline asm 2026-02-21T09:31:53.0007579Z // begin inline asm 2026-02-21T09:31:53.0007685Z st.global.v4.b32 [ %rd226 + 0 ], { %r804, %r805, %r806, %r807 }; 2026-02-21T09:31:53.0007752Z // end inline asm 2026-02-21T09:31:53.0007812Z // begin inline asm 2026-02-21T09:31:53.0007914Z st.global.v4.b32 [ %rd227 + 0 ], { %r808, %r809, %r810, %r811 }; 2026-02-21T09:31:53.0007980Z // end inline asm 2026-02-21T09:31:53.0008040Z // begin inline asm 2026-02-21T09:31:53.0008144Z st.global.v4.b32 [ %rd228 + 0 ], { %r812, %r813, %r814, %r815 }; 2026-02-21T09:31:53.0008205Z // end inline asm 2026-02-21T09:31:53.0008306Z $L__BB0_8: // %._crit_edge 2026-02-21T09:31:53.0008501Z .loc 1 20 4 // clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py:20:4 2026-02-21T09:31:53.0008561Z bar.sync 0; 2026-02-21T09:31:53.0008630Z // begin 
inline asm 2026-02-21T09:31:53.0008761Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1018, 128; 2026-02-21T09:31:53.0008820Z // end inline asm 2026-02-21T09:31:53.0008876Z ret; 2026-02-21T09:31:53.0008944Z $L__tmp0: 2026-02-21T09:31:53.0009004Z $L__func_end0: 2026-02-21T09:31:53.0009095Z // -- End function 2026-02-21T09:31:53.0009159Z } 2026-02-21T09:31:53.0009391Z .file 1 "/tmp/torchinductor_root/lm/clmnifbiv6recpqnwzqcim377e27hfdh4ghpw6awpsvl6y5yjaa5.py" 2026-02-21T09:31:53.0009461Z .section .debug_abbrev 2026-02-21T09:31:53.0009523Z { 2026-02-21T09:31:53.0009623Z .b8 1 // Abbreviation Code 2026-02-21T09:31:53.0009722Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:31:53.0009867Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:31:53.0009963Z .b8 37 // DW_AT_producer 2026-02-21T09:31:53.0010045Z .b8 8 // DW_FORM_string 2026-02-21T09:31:53.0010127Z .b8 19 // DW_AT_language 2026-02-21T09:31:53.0010222Z .b8 5 // DW_FORM_data2 2026-02-21T09:31:53.0010304Z .b8 3 // DW_AT_name 2026-02-21T09:31:53.0010384Z .b8 8 // DW_FORM_string 2026-02-21T09:31:53.0010476Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:31:53.0010557Z .b8 6 // DW_FORM_data4 2026-02-21T09:31:53.0010638Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:31:53.0010741Z .b8 8 // DW_FORM_string 2026-02-21T09:31:53.0010846Z .b8 0 // EOM(1) 2026-02-21T09:31:53.0010926Z .b8 0 // EOM(2) 2026-02-21T09:31:53.0010999Z .b8 0 // EOM(3) 2026-02-21T09:31:53.0011061Z } 2026-02-21T09:31:53.0011126Z .section .debug_info 2026-02-21T09:31:53.0011181Z { 2026-02-21T09:31:53.0011270Z .b32 104 // Length of Unit 2026-02-21T09:31:53.0011372Z .b8 2 // DWARF version number 2026-02-21T09:31:53.0011428Z .b8 0 2026-02-21T09:31:53.0011557Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:31:53.0011662Z .b8 8 // Address Size (in bytes) 2026-02-21T09:31:53.0011772Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:31:53.0011859Z .b8 116 // DW_AT_producer 2026-02-21T09:31:53.0011926Z .b8 114 2026-02-21T09:31:53.0011983Z .b8 105 2026-02-21T09:31:53.0012039Z .b8 116 2026-02-21T09:31:53.0012096Z .b8 111 2026-02-21T09:31:53.0012162Z .b8 110 2026-02-21T09:31:53.0012216Z .b8 0 2026-02-21T09:31:53.0012297Z .b8 2 // DW_AT_language 2026-02-21T09:31:53.0012358Z .b8 0 2026-02-21T09:31:53.0012442Z .b8 99 // DW_AT_name 2026-02-21T09:31:53.0012497Z .b8 108 2026-02-21T09:31:53.0012551Z .b8 109 2026-02-21T09:31:53.0012615Z .b8 110 2026-02-21T09:31:53.0012670Z .b8 105 2026-02-21T09:31:53.0012726Z .b8 102 2026-02-21T09:31:53.0012789Z .b8 98 2026-02-21T09:31:53.0012844Z .b8 105 2026-02-21T09:31:53.0012899Z .b8 118 2026-02-21T09:31:53.0012953Z .b8 54 2026-02-21T09:31:53.0013017Z .b8 114 2026-02-21T09:31:53.0013071Z .b8 101 2026-02-21T09:31:53.0013124Z .b8 99 2026-02-21T09:31:53.0013187Z .b8 112 2026-02-21T09:31:53.0013242Z .b8 113 2026-02-21T09:31:53.0013297Z .b8 110 2026-02-21T09:31:53.0013351Z .b8 119 2026-02-21T09:31:53.0013415Z .b8 122 2026-02-21T09:31:53.0013469Z .b8 113 2026-02-21T09:31:53.0013524Z .b8 99 2026-02-21T09:31:53.0013579Z .b8 105 2026-02-21T09:31:53.0013643Z .b8 109 2026-02-21T09:31:53.0013696Z .b8 51 2026-02-21T09:31:53.0013749Z .b8 55 2026-02-21T09:31:53.0013810Z .b8 55 2026-02-21T09:31:53.0013865Z .b8 101 2026-02-21T09:31:53.0013918Z .b8 50 2026-02-21T09:31:53.0013972Z .b8 55 2026-02-21T09:31:53.0014034Z .b8 104 2026-02-21T09:31:53.0014088Z .b8 102 2026-02-21T09:31:53.0014141Z .b8 100 2026-02-21T09:31:53.0014202Z .b8 104 2026-02-21T09:31:53.0014255Z .b8 52 2026-02-21T09:31:53.0014311Z .b8 103 2026-02-21T09:31:53.0014365Z .b8 104 2026-02-21T09:31:53.0014426Z .b8 112 2026-02-21T09:31:53.0014479Z .b8 119 2026-02-21T09:31:53.0014534Z .b8 54 2026-02-21T09:31:53.0014587Z .b8 97 2026-02-21T09:31:53.0014648Z .b8 119 
2026-02-21T09:31:53.0014963Z .b8 112 2026-02-21T09:31:53.0015020Z .b8 115 2026-02-21T09:31:53.0015083Z .b8 118 2026-02-21T09:31:53.0015138Z .b8 108 2026-02-21T09:31:53.0015193Z .b8 54 2026-02-21T09:31:53.0015250Z .b8 121 2026-02-21T09:31:53.0015317Z .b8 53 2026-02-21T09:31:53.0015374Z .b8 121 2026-02-21T09:31:53.0015472Z .b8 106 2026-02-21T09:31:53.0015576Z .b8 97 2026-02-21T09:31:53.0015630Z .b8 97 2026-02-21T09:31:53.0015686Z .b8 53 2026-02-21T09:31:53.0015741Z .b8 46 2026-02-21T09:31:53.0015805Z .b8 112 2026-02-21T09:31:53.0015861Z .b8 121 2026-02-21T09:31:53.0015916Z .b8 0 2026-02-21T09:31:53.0016026Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:31:53.0016109Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:31:53.0016166Z .b8 116 2026-02-21T09:31:53.0016221Z .b8 109 2026-02-21T09:31:53.0016285Z .b8 112 2026-02-21T09:31:53.0016339Z .b8 47 2026-02-21T09:31:53.0016393Z .b8 116 2026-02-21T09:31:53.0016455Z .b8 111 2026-02-21T09:31:53.0016511Z .b8 114 2026-02-21T09:31:53.0016566Z .b8 99 2026-02-21T09:31:53.0016622Z .b8 104 2026-02-21T09:31:53.0016689Z .b8 105 2026-02-21T09:31:53.0016746Z .b8 110 2026-02-21T09:31:53.0016802Z .b8 100 2026-02-21T09:31:53.0016858Z .b8 117 2026-02-21T09:31:53.0016956Z .b8 99 2026-02-21T09:31:53.0017038Z .b8 116 2026-02-21T09:31:53.0017096Z .b8 111 2026-02-21T09:31:53.0017164Z .b8 114 2026-02-21T09:31:53.0017218Z .b8 95 2026-02-21T09:31:53.0017273Z .b8 114 2026-02-21T09:31:53.0017327Z .b8 111 2026-02-21T09:31:53.0017391Z .b8 111 2026-02-21T09:31:53.0017445Z .b8 116 2026-02-21T09:31:53.0017500Z .b8 47 2026-02-21T09:31:53.0017560Z .b8 108 2026-02-21T09:31:53.0017615Z .b8 109 2026-02-21T09:31:53.0017670Z .b8 0 2026-02-21T09:31:53.0017725Z } 2026-02-21T09:31:53.0017807Z .section .debug_macinfo { } 2026-02-21T09:31:53.0017812Z 2026-02-21T09:31:53.0017897Z ================================================================ 2026-02-21T09:31:53.0018009Z please share the reproducer above with Triton project. 
2026-02-21T09:31:54.6945422Z 2026-02-21T09:31:54.6948942Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 17.0 configs/s 2026-02-21T09:31:57.2857171Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 388.4 2026-02-21T09:31:57.2857629Z configs/s 2026-02-21T09:31:57.4448958Z [69s] Generation 3 complete: 2026-02-21T09:31:57.4453242Z error=22 2026-02-21T09:31:57.4458674Z ok=66 2026-02-21T09:31:57.4462850Z min=0.0471 2026-02-21T09:31:57.4464437Z mid=0.0790 2026-02-21T09:31:57.4464612Z max=11.4904 2026-02-21T09:31:57.4464841Z best={'block_sizes': [256, 128, 32], 2026-02-21T09:31:57.4465098Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:31:57.4465326Z 'l2_groupings': [64], 2026-02-21T09:31:57.4465490Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:31:57.4465681Z 'loop_orders': [[1, 0]], 2026-02-21T09:31:57.4465829Z 'num_stages': 3, 2026-02-21T09:31:57.4465970Z 'num_warps': 4, 2026-02-21T09:31:57.4466103Z 'pid_type': 'flat', 2026-02-21T09:31:57.4466265Z 'range_flattens': [None, True], 2026-02-21T09:31:57.4466452Z 'range_multi_buffers': [None, True], 2026-02-21T09:31:57.4466647Z 'range_num_stages': [0, 0], 2026-02-21T09:31:57.4466856Z 'range_unroll_factors': [0, 0], 2026-02-21T09:31:57.4467053Z 'range_warp_specializes': [None, None]} 2026-02-21T09:31:57.4471822Z [69s] Fitting surrogate: 381 points, 381 targets 2026-02-21T09:31:58.7103516Z [70s] Generation 4 starting: 79 neighbors, 5 active search path(s) 2026-02-21T09:32:06.8475162Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82/82 22.0 configs/s 2026-02-21T09:32:11.0615861Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 82/82 19.7 configs/s 2026-02-21T09:32:14.3482282Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 306.8 2026-02-21T09:32:14.3486245Z configs/s 2026-02-21T09:32:14.5441471Z [86s] Generation 4 complete: 2026-02-21T09:32:14.5445163Z error=13 2026-02-21T09:32:14.5446743Z ok=72 
2026-02-21T09:32:14.5446966Z min=0.0390 2026-02-21T09:32:14.5447108Z mid=0.0655 2026-02-21T09:32:14.5449447Z max=7.0051 2026-02-21T09:32:14.5449707Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:32:14.5455755Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:32:14.5461089Z 'l2_groupings': [32], 2026-02-21T09:32:14.5466129Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:32:14.5467718Z 'loop_orders': [[1, 0]], 2026-02-21T09:32:14.5467911Z 'num_stages': 3, 2026-02-21T09:32:14.5468062Z 'num_warps': 4, 2026-02-21T09:32:14.5468203Z 'pid_type': 'flat', 2026-02-21T09:32:14.5468369Z 'range_flattens': [None, True], 2026-02-21T09:32:14.5468546Z 'range_multi_buffers': [None, True], 2026-02-21T09:32:14.5468732Z 'range_num_stages': [0, 0], 2026-02-21T09:32:14.5468900Z 'range_unroll_factors': [0, 0], 2026-02-21T09:32:14.5469073Z 'range_warp_specializes': [None, None]} 2026-02-21T09:32:14.5469359Z [86s] Fitting surrogate: 466 points, 466 targets 2026-02-21T09:32:15.8208597Z [87s] Generation 5 starting: 83 neighbors, 5 active search path(s) 2026-02-21T09:32:34.4452433Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 0.8 configs/s 2026-02-21T09:32:38.5232588Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 84/84 20.6 configs/s 2026-02-21T09:32:43.5557814Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 227.5 2026-02-21T09:32:43.5562609Z configs/s 2026-02-21T09:32:43.8093335Z [115s] Generation 5 complete: 2026-02-21T09:32:43.8093570Z error=21 2026-02-21T09:32:43.8093705Z ok=67 2026-02-21T09:32:43.8093949Z min=0.0450 2026-02-21T09:32:43.8094075Z mid=0.0500 2026-02-21T09:32:43.8094200Z max=5.7170 2026-02-21T09:32:43.8094332Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:32:43.8094571Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:32:43.8095071Z 'l2_groupings': [32], 2026-02-21T09:32:43.8095246Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:32:43.8095429Z 
'loop_orders': [[1, 0]], 2026-02-21T09:32:43.8095588Z 'num_stages': 3, 2026-02-21T09:32:43.8095760Z 'num_warps': 4, 2026-02-21T09:32:43.8095895Z 'pid_type': 'flat', 2026-02-21T09:32:43.8096058Z 'range_flattens': [None, True], 2026-02-21T09:32:43.8096243Z 'range_multi_buffers': [None, True], 2026-02-21T09:32:43.8096422Z 'range_num_stages': [0, 0], 2026-02-21T09:32:43.8096578Z 'range_unroll_factors': [0, 0], 2026-02-21T09:32:43.8096756Z 'range_warp_specializes': [None, None]} 2026-02-21T09:32:43.8112205Z [115s] Fitting surrogate: 554 points, 554 targets 2026-02-21T09:32:45.1509195Z [116s] Generation 6 starting: 81 neighbors, 5 active search path(s) 2026-02-21T09:32:56.4171195Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82/82 1.7 configs/s 2026-02-21T09:32:57.7080938Z [129s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:32:57.7081246Z 2026-02-21T09:32:57.7083421Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:32:57.7085276Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:32:57.7085516Z `ptxas` stderr: 2026-02-21T09:32:57.7085924Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 191 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:32:57.7086391Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:32:57.7086539Z 2026-02-21T09:32:57.7086943Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpw5ijpvsv.ptx -o /tmp/tmpw5ijpvsv.ptx.o 2026-02-21T09:32:57.7087374Z 2026-02-21T09:32:57.7087517Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:32:57.7087715Z 2026-02-21T09:32:57.7087797Z ================================================================ 2026-02-21T09:32:57.7088024Z Internal Triton PTX codegen error 2026-02-21T09:32:57.7088195Z `ptxas` stderr: 2026-02-21T09:32:57.7088605Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 191 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:32:57.7089055Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:32:57.7089203Z 2026-02-21T09:32:57.7089577Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpw5ijpvsv.ptx -o /tmp/tmpw5ijpvsv.ptx.o 2026-02-21T09:32:57.7090009Z 2026-02-21T09:32:57.7090012Z 2026-02-21T09:32:57.7090075Z // 2026-02-21T09:32:57.7090209Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:32:57.7090381Z // 2026-02-21T09:32:57.7090525Z 2026-02-21T09:32:57.7090838Z .version 8.7 2026-02-21T09:32:57.7091035Z .target sm_100a 2026-02-21T09:32:57.7091169Z .address_size 64 2026-02-21T09:32:57.7091260Z 2026-02-21T09:32:57.7091379Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:32:57.7091634Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:32:57.7091837Z // @_helion_matmul 2026-02-21T09:32:57.7092037Z .visible .entry _helion_matmul( 2026-02-21T09:32:57.7092245Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:32:57.7092496Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T09:32:57.7092732Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:32:57.7092976Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:32:57.7093240Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:32:57.7093444Z ) 2026-02-21T09:32:57.7093575Z .reqntid 128 2026-02-21T09:32:57.7093705Z .maxnreg 32 2026-02-21T09:32:57.7093834Z { 2026-02-21T09:32:57.7093961Z .reg .pred %p<80>; 2026-02-21T09:32:57.7094118Z .reg .b32 %r<904>; 2026-02-21T09:32:57.7094261Z .reg .b64 %rd<359>; 2026-02-21T09:32:57.7094413Z $L__func_begin0: 2026-02-21T09:32:57.7094496Z 2026-02-21T09:32:57.7094549Z // %bb.0: 2026-02-21T09:32:57.7094846Z .loc 1 19 0 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:19 2026-02-21T09:32:57.7095153Z mov.u32 %r1, %tid.x; 2026-02-21T09:32:57.7095332Z ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T09:32:57.7095545Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:32:57.7095711Z mov.b32 %r52, global_smem; 2026-02-21T09:32:57.7095877Z // begin inline asm 2026-02-21T09:32:57.7096131Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r52], 128; 2026-02-21T09:32:57.7096395Z // end inline asm 2026-02-21T09:32:57.7096569Z ld.param.b64 %rd29, [_helion_matmul_param_3]; 2026-02-21T09:32:57.7096772Z bar.sync 0; 2026-02-21T09:32:57.7096932Z ld.shared.b32 %r896, [global_smem]; 2026-02-21T09:32:57.7097112Z bar.sync 0; 2026-02-21T09:32:57.7097301Z // begin inline asm 2026-02-21T09:32:57.7097550Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:32:57.7097785Z // end inline asm 2026-02-21T09:32:57.7098046Z .loc 1 21 67 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:21:67 2026-02-21T09:32:57.7098370Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:32:57.7098532Z mov.u32 %r61, %ctaid.y; 2026-02-21T09:32:57.7098698Z mov.u32 %r62, %ctaid.z; 2026-02-21T09:32:57.7098862Z mov.u32 %r63, %nctaid.x; 2026-02-21T09:32:57.7099017Z mov.u32 %r64, 
%nctaid.y; 2026-02-21T09:32:57.7099185Z mad.lo.s32 %r65, %r62, %r64, %r61; 2026-02-21T09:32:57.7099361Z mad.lo.s32 %r66, %r65, %r63, %r3; 2026-02-21T09:32:57.7099537Z shl.b32 %r67, %r66, 7; 2026-02-21T09:32:57.7099687Z cvt.s64.s32 %rd30, %r67; 2026-02-21T09:32:57.7099854Z add.s64 %rd26, %rd29, %rd30; 2026-02-21T09:32:57.7100017Z shl.b32 %r68, %r1, 2; 2026-02-21T09:32:57.7100179Z add.s32 %r53, %r52, %r68; 2026-02-21T09:32:57.7100338Z mov.b32 %r70, 0; 2026-02-21T09:32:57.7100480Z // begin inline asm 2026-02-21T09:32:57.7100639Z @%p1 st.shared.b32 [ %r53 + 0 ], %r70; 2026-02-21T09:32:57.7100814Z // end inline asm 2026-02-21T09:32:57.7100961Z bar.warp.sync -1; 2026-02-21T09:32:57.7101110Z setp.eq.b32 %p73, %r1, 0; 2026-02-21T09:32:57.7101272Z cvt.u64.u32 %rd11, %r52; 2026-02-21T09:32:57.7101421Z // begin inline asm 2026-02-21T09:32:57.7101682Z @%p73 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:32:57.7101970Z // end inline asm 2026-02-21T09:32:57.7102112Z // begin inline asm 2026-02-21T09:32:57.7102345Z @%p73 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:32:57.7102606Z // end inline asm 2026-02-21T09:32:57.7102748Z mov.b32 %r55, 32; 2026-02-21T09:32:57.7102889Z // begin inline asm 2026-02-21T09:32:57.7103184Z @%p73 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r55; 2026-02-21T09:32:57.7103447Z // end inline asm 2026-02-21T09:32:57.7103584Z mov.b32 %r56, 128; 2026-02-21T09:32:57.7103712Z // begin inline asm 2026-02-21T09:32:57.7103941Z @%p73 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r56; 2026-02-21T09:32:57.7104204Z // end inline asm 2026-02-21T09:32:57.7104336Z mov.b32 %r57, 1024; 2026-02-21T09:32:57.7104479Z // begin inline asm 2026-02-21T09:32:57.7104749Z @%p73 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r57; 2026-02-21T09:32:57.7105024Z // end inline asm 2026-02-21T09:32:57.7105152Z // 
begin inline asm 2026-02-21T09:32:57.7105393Z @%p73 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r57; 2026-02-21T09:32:57.7105668Z // end inline asm 2026-02-21T09:32:57.7105799Z mov.b64 %rd19, 2048; 2026-02-21T09:32:57.7105944Z // begin inline asm 2026-02-21T09:32:57.7106192Z @%p73 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:32:57.7106486Z // end inline asm 2026-02-21T09:32:57.7106620Z mov.b32 %r59, 1; 2026-02-21T09:32:57.7106766Z // begin inline asm 2026-02-21T09:32:57.7107018Z @%p73 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r59; 2026-02-21T09:32:57.7107311Z // end inline asm 2026-02-21T09:32:57.7107446Z // begin inline asm 2026-02-21T09:32:57.7107690Z @%p73 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r59; 2026-02-21T09:32:57.7107969Z // end inline asm 2026-02-21T09:32:57.7108098Z // begin inline asm 2026-02-21T09:32:57.7108331Z @%p73 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:32:57.7108588Z // end inline asm 2026-02-21T09:32:57.7108726Z // begin inline asm 2026-02-21T09:32:57.7108973Z @%p73 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:32:57.7109246Z // end inline asm 2026-02-21T09:32:57.7109384Z // begin inline asm 2026-02-21T09:32:57.7109614Z @%p73 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x2; 2026-02-21T09:32:57.7109944Z // end inline asm 2026-02-21T09:32:57.7110073Z // begin inline asm 2026-02-21T09:32:57.7110300Z @%p73 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:32:57.7110566Z // end inline asm 2026-02-21T09:32:57.7110696Z // begin inline asm 2026-02-21T09:32:57.7111038Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:32:57.7111412Z // 
end inline asm 2026-02-21T09:32:57.7111547Z // begin inline asm 2026-02-21T09:32:57.7111751Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T09:32:57.7112004Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:32:57.7112189Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:32:57.7112368Z // end inline asm 2026-02-21T09:32:57.7112506Z bar.sync 0; 2026-02-21T09:32:57.7112644Z cvta.global.u64 %rd61, %rd26; 2026-02-21T09:32:57.7112923Z .loc 1 27 97 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:27:97 2026-02-21T09:32:57.7113216Z setp.gt.u32 %p21, %r3, 767; 2026-02-21T09:32:57.7113383Z @%p21 bra $L__BB0_8; 2026-02-21T09:32:57.7113537Z // %bb.1: // %.lr.ph 2026-02-21T09:32:57.7113834Z .loc 1 0 97 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:0:97 2026-02-21T09:32:57.7114144Z ld.param.b64 %rd9, [_helion_matmul_param_0]; 2026-02-21T09:32:57.7114438Z .loc 1 47 48 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:47:48 2026-02-21T09:32:57.7114764Z and.b32 %r252, %r1, 3; 2026-02-21T09:32:57.7114914Z shl.b32 %r253, %r252, 3; 2026-02-21T09:32:57.7115178Z .loc 1 39 45 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:39:45 2026-02-21T09:32:57.7115493Z shl.b32 %r254, %r1, 3; 2026-02-21T09:32:57.7115684Z and.b32 %r255, %r254, 120; 2026-02-21T09:32:57.7115841Z shr.u32 %r256, %r1, 4; 2026-02-21T09:32:57.7115999Z bfe.u32 %r257, %r1, 4, 3; 2026-02-21T09:32:57.7116259Z .loc 1 40 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:40:27 2026-02-21T09:32:57.7116539Z shl.b32 %r258, %r3, 4; 2026-02-21T09:32:57.7116692Z and.b32 %r259, %r258, 16256; 2026-02-21T09:32:57.7116947Z .loc 1 39 45 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:39:45 2026-02-21T09:32:57.7117240Z or.b32 %r260, %r256, %r259; 2026-02-21T09:32:57.7117395Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:32:57.7117554Z shr.u32 %r261, %r1, 5; 2026-02-21T09:32:57.7126332Z shl.b32 %r262, %r1, 4; 2026-02-21T09:32:57.7126536Z and.b32 %r263, 
%r262, 2032; 2026-02-21T09:32:57.7126716Z and.b32 %r264, %r1, 24; 2026-02-21T09:32:57.7126873Z shl.b32 %r265, %r264, 1; 2026-02-21T09:32:57.7127042Z xor.b32 %r5, %r263, %r265; 2026-02-21T09:32:57.7127205Z add.s32 %r211, %r52, %r5; 2026-02-21T09:32:57.7127374Z add.s32 %r213, %r211, 2048; 2026-02-21T09:32:57.7127532Z add.s32 %r215, %r211, 4096; 2026-02-21T09:32:57.7127694Z add.s32 %r217, %r211, 6144; 2026-02-21T09:32:57.7127851Z add.s32 %r224, %r211, 8192; 2026-02-21T09:32:57.7128007Z add.s32 %r226, %r211, 10240; 2026-02-21T09:32:57.7128174Z add.s32 %r228, %r211, 12288; 2026-02-21T09:32:57.7128326Z add.s32 %r230, %r211, 14336; 2026-02-21T09:32:57.7128487Z add.s32 %r237, %r211, 16384; 2026-02-21T09:32:57.7128642Z add.s32 %r239, %r211, 18432; 2026-02-21T09:32:57.7128802Z add.s32 %r241, %r211, 20480; 2026-02-21T09:32:57.7128950Z add.s32 %r243, %r211, 22528; 2026-02-21T09:32:57.7129108Z or.b32 %r6, %r253, 96; 2026-02-21T09:32:57.7129263Z add.s32 %r308, %r211, 24576; 2026-02-21T09:32:57.7129412Z add.s32 %r310, %r211, 26624; 2026-02-21T09:32:57.7129568Z add.s32 %r312, %r211, 28672; 2026-02-21T09:32:57.7129714Z add.s32 %r314, %r211, 30720; 2026-02-21T09:32:57.7129872Z shl.b32 %r267, %r1, 10; 2026-02-21T09:32:57.7130021Z and.b32 %r268, %r267, 6144; 2026-02-21T09:32:57.7130288Z or.b32 %r269, %r268, %r263; 2026-02-21T09:32:57.7130479Z xor.b32 %r270, %r269, 32; 2026-02-21T09:32:57.7130642Z xor.b32 %r271, %r269, 64; 2026-02-21T09:32:57.7130788Z xor.b32 %r272, %r269, 96; 2026-02-21T09:32:57.7130943Z and.b32 %r273, %r1, 96; 2026-02-21T09:32:57.7131094Z shl.b32 %r274, %r273, 6; 2026-02-21T09:32:57.7131244Z shl.b32 %r275, %r252, 5; 2026-02-21T09:32:57.7131399Z shl.b32 %r276, %r264, 4; 2026-02-21T09:32:57.7131544Z and.b32 %r278, %r68, 16; 2026-02-21T09:32:57.7131702Z or.b32 %r279, %r274, %r275; 2026-02-21T09:32:57.7131851Z or.b32 %r280, %r276, %r273; 2026-02-21T09:32:57.7132011Z xor.b32 %r281, %r279, %r280; 2026-02-21T09:32:57.7132160Z add.s32 %r282, %r52, %r278; 
2026-02-21T09:32:57.7132319Z add.s32 %r529, %r282, %r281; 2026-02-21T09:32:57.7132605Z .loc 1 38 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:38:27 2026-02-21T09:32:57.7132914Z shl.b32 %r283, %r3, 7; 2026-02-21T09:32:57.7133076Z and.b32 %r319, %r283, 896; 2026-02-21T09:32:57.7133341Z .loc 1 41 32 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:41:32 2026-02-21T09:32:57.7133632Z or.b32 %r284, %r259, %r4; 2026-02-21T09:32:57.7133784Z or.b32 %r21, %r259, %r257; 2026-02-21T09:32:57.7134055Z .loc 1 51 53 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:53 2026-02-21T09:32:57.7134332Z shl.b32 %r285, %r284, 10; 2026-02-21T09:32:57.7134596Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7134940Z shfl.sync.idx.b32 %r37, %r261, 0, 31, -1; 2026-02-21T09:32:57.7135126Z shl.b32 %r286, %r37, 21; 2026-02-21T09:32:57.7135290Z and.b32 %r287, %r286, 6291456; 2026-02-21T09:32:57.7135451Z add.s32 %r524, %r287, %r896; 2026-02-21T09:32:57.7135615Z mov.pred %p22, -1; 2026-02-21T09:32:57.7135813Z // begin inline asm 2026-02-21T09:32:57.7136214Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 0], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7136589Z // end inline asm 2026-02-21T09:32:57.7136728Z // begin inline asm 2026-02-21T09:32:57.7137074Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 16], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7137463Z // end inline asm 2026-02-21T09:32:57.7137616Z // begin inline asm 2026-02-21T09:32:57.7137965Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 32], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7138360Z // end inline asm 2026-02-21T09:32:57.7138513Z // begin inline asm 2026-02-21T09:32:57.7138864Z @%p22 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 48], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7139258Z // end inline asm 2026-02-21T09:32:57.7139399Z // begin inline asm 2026-02-21T09:32:57.7139746Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 64], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7140121Z // end inline asm 2026-02-21T09:32:57.7140267Z // begin inline asm 2026-02-21T09:32:57.7140606Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 80], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7140987Z // end inline asm 2026-02-21T09:32:57.7141130Z // begin inline asm 2026-02-21T09:32:57.7141466Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 96], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7141846Z // end inline asm 2026-02-21T09:32:57.7141984Z // begin inline asm 2026-02-21T09:32:57.7142341Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 112], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:57.7142786Z // end inline asm 2026-02-21T09:32:57.7142925Z // begin inline asm 2026-02-21T09:32:57.7143098Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:32:57.7143269Z // end inline asm 2026-02-21T09:32:57.7143415Z bar.sync 0; 2026-02-21T09:32:57.7143678Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7143998Z add.s32 %r898, %r52, 65568; 2026-02-21T09:32:57.7144157Z // begin inline asm 2026-02-21T09:32:57.7144342Z @%p73 mbarrier.init.shared::cta.b64 [%r898], 1; 2026-02-21T09:32:57.7144551Z // end inline asm 2026-02-21T09:32:57.7144735Z bar.sync 0; 2026-02-21T09:32:57.7144888Z add.s32 %r206, %r52, 65576; 2026-02-21T09:32:57.7145049Z // begin 
inline asm 2026-02-21T09:32:57.7145233Z @%p73 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T09:32:57.7145431Z // end inline asm 2026-02-21T09:32:57.7145587Z add.s32 %r207, %r52, 65536; 2026-02-21T09:32:57.7145754Z // begin inline asm 2026-02-21T09:32:57.7145928Z @%p73 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T09:32:57.7146118Z // end inline asm 2026-02-21T09:32:57.7146250Z bar.sync 0; 2026-02-21T09:32:57.7146393Z add.s32 %r208, %r52, 65544; 2026-02-21T09:32:57.7146542Z // begin inline asm 2026-02-21T09:32:57.7146711Z @%p73 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T09:32:57.7146896Z // end inline asm 2026-02-21T09:32:57.7147043Z bar.sync 0; 2026-02-21T09:32:57.7147171Z add.s32 %r209, %r52, 65552; 2026-02-21T09:32:57.7147329Z // begin inline asm 2026-02-21T09:32:57.7147487Z @%p73 mbarrier.init.shared::cta.b64 [%r209], 1; 2026-02-21T09:32:57.7147678Z // end inline asm 2026-02-21T09:32:57.7147815Z bar.sync 0; 2026-02-21T09:32:57.7147943Z add.s32 %r316, %r52, 65560; 2026-02-21T09:32:57.7148104Z // begin inline asm 2026-02-21T09:32:57.7148314Z @%p73 mbarrier.init.shared::cta.b64 [%r316], 1; 2026-02-21T09:32:57.7148542Z // end inline asm 2026-02-21T09:32:57.7148790Z .loc 1 51 60 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:60 2026-02-21T09:32:57.7149079Z or.b32 %r288, %r285, %r253; 2026-02-21T09:32:57.7149339Z .loc 1 51 32 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:32 2026-02-21T09:32:57.7149635Z mad.wide.u32 %rd31, %r288, 2, %rd9; 2026-02-21T09:32:57.7149820Z cvt.u64.u32 %rd2, %r285; 2026-02-21T09:32:57.7149979Z add.s64 %rd32, %rd31, 65536; 2026-02-21T09:32:57.7150149Z add.s64 %rd33, %rd31, 131072; 2026-02-21T09:32:57.7150311Z add.s64 %rd34, %rd31, 196608; 2026-02-21T09:32:57.7150472Z mov.b32 %r309, 16; 2026-02-21T09:32:57.7150712Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7151002Z // begin inline asm 2026-02-21T09:32:57.7151200Z 
cp.async.cg.shared.global [ %r211 + 0 ], [ %rd31 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7151432Z // end inline asm 2026-02-21T09:32:57.7151575Z // begin inline asm 2026-02-21T09:32:57.7151769Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd32 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7151998Z // end inline asm 2026-02-21T09:32:57.7152128Z // begin inline asm 2026-02-21T09:32:57.7152320Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd33 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7152535Z // end inline asm 2026-02-21T09:32:57.7152674Z // begin inline asm 2026-02-21T09:32:57.7152859Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd34 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7153082Z // end inline asm 2026-02-21T09:32:57.7153230Z cp.async.commit_group; 2026-02-21T09:32:57.7153488Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7153772Z bar.sync 0; 2026-02-21T09:32:57.7153901Z // begin inline asm 2026-02-21T09:32:57.7154097Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r207], 8192; 2026-02-21T09:32:57.7154311Z // end inline asm 2026-02-21T09:32:57.7154564Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7154940Z bar.sync 0; 2026-02-21T09:32:57.7155091Z elect.sync %r289|%p43, -1; 2026-02-21T09:32:57.7155267Z and.pred %p37, %p1, %p43; 2026-02-21T09:32:57.7155423Z add.s32 %r220, %r52, 32768; 2026-02-21T09:32:57.7155588Z // begin inline asm 2026-02-21T09:32:57.7155908Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r220], [%rd61, {%r70, %r319}], [%r207]; 2026-02-21T09:32:57.7156268Z // end inline asm 2026-02-21T09:32:57.7156515Z .loc 1 51 32 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:32 2026-02-21T09:32:57.7156812Z add.s64 %rd36, %rd31, 64; 2026-02-21T09:32:57.7156978Z or.b32 %r290, %r288, 32; 2026-02-21T09:32:57.7157140Z mad.wide.u32 %rd46, %r290, 2, %rd9; 2026-02-21T09:32:57.7157324Z add.s64 %rd37, %rd46, 65536; 
2026-02-21T09:32:57.7157480Z add.s64 %rd38, %rd46, 131072; 2026-02-21T09:32:57.7157649Z add.s64 %rd39, %rd46, 196608; 2026-02-21T09:32:57.7157911Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7158200Z // begin inline asm 2026-02-21T09:32:57.7158394Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd36 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7158616Z // end inline asm 2026-02-21T09:32:57.7158759Z // begin inline asm 2026-02-21T09:32:57.7158948Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd37 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7159170Z // end inline asm 2026-02-21T09:32:57.7159305Z // begin inline asm 2026-02-21T09:32:57.7159506Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd38 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7159726Z // end inline asm 2026-02-21T09:32:57.7159864Z // begin inline asm 2026-02-21T09:32:57.7160048Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd39 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7160275Z // end inline asm 2026-02-21T09:32:57.7160455Z cp.async.commit_group; 2026-02-21T09:32:57.7160745Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7161033Z bar.sync 0; 2026-02-21T09:32:57.7161165Z // begin inline asm 2026-02-21T09:32:57.7161350Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r208], 8192; 2026-02-21T09:32:57.7161574Z // end inline asm 2026-02-21T09:32:57.7161813Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7162099Z bar.sync 0; 2026-02-21T09:32:57.7162233Z elect.sync %r291|%p44, -1; 2026-02-21T09:32:57.7162399Z and.pred %p39, %p1, %p44; 2026-02-21T09:32:57.7162550Z add.s32 %r233, %r52, 40960; 2026-02-21T09:32:57.7162712Z // begin inline asm 2026-02-21T09:32:57.7163041Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r233], [%rd61, {%r55, %r319}], [%r208]; 2026-02-21T09:32:57.7163391Z // end inline asm 2026-02-21T09:32:57.7163641Z .loc 1 51 32 // 
cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:32 2026-02-21T09:32:57.7163925Z add.s64 %rd41, %rd31, 128; 2026-02-21T09:32:57.7164083Z or.b32 %r292, %r288, 64; 2026-02-21T09:32:57.7164237Z mad.wide.u32 %rd47, %r292, 2, %rd9; 2026-02-21T09:32:57.7164410Z add.s64 %rd42, %rd47, 65536; 2026-02-21T09:32:57.7164561Z add.s64 %rd43, %rd47, 131072; 2026-02-21T09:32:57.7164752Z add.s64 %rd44, %rd47, 196608; 2026-02-21T09:32:57.7165023Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7165309Z // begin inline asm 2026-02-21T09:32:57.7165508Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd41 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7165723Z // end inline asm 2026-02-21T09:32:57.7165859Z // begin inline asm 2026-02-21T09:32:57.7166049Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd42 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7166275Z // end inline asm 2026-02-21T09:32:57.7166406Z // begin inline asm 2026-02-21T09:32:57.7166616Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd43 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7169316Z // end inline asm 2026-02-21T09:32:57.7169455Z // begin inline asm 2026-02-21T09:32:57.7169663Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd44 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7169894Z // end inline asm 2026-02-21T09:32:57.7170060Z cp.async.commit_group; 2026-02-21T09:32:57.7170354Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7170659Z bar.sync 0; 2026-02-21T09:32:57.7170811Z // begin inline asm 2026-02-21T09:32:57.7171013Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r209], 8192; 2026-02-21T09:32:57.7171253Z // end inline asm 2026-02-21T09:32:57.7171514Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7171828Z bar.sync 0; 2026-02-21T09:32:57.7171980Z elect.sync %r293|%p45, -1; 2026-02-21T09:32:57.7172153Z and.pred %p41, %p1, %p45; 2026-02-21T09:32:57.7172326Z add.s32 %r246, %r52, 49152; 
2026-02-21T09:32:57.7172482Z mov.b32 %r247, 64; 2026-02-21T09:32:57.7172629Z // begin inline asm 2026-02-21T09:32:57.7172971Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd61, {%r247, %r319}], [%r209]; 2026-02-21T09:32:57.7173347Z // end inline asm 2026-02-21T09:32:57.7173607Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7173919Z cp.async.wait_group 2; 2026-02-21T09:32:57.7174082Z bar.sync 0; 2026-02-21T09:32:57.7174330Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7174629Z // begin inline asm 2026-02-21T09:32:57.7174805Z 2026-02-21T09:32:57.7174933Z { 2026-02-21T09:32:57.7175060Z .reg .pred complete; 2026-02-21T09:32:57.7175219Z waitLoop: 2026-02-21T09:32:57.7175510Z mbarrier.try_wait.parity.shared.b64 complete, [%r207], %r70; 2026-02-21T09:32:57.7175767Z @!complete bra.uni waitLoop; 2026-02-21T09:32:57.7175937Z } 2026-02-21T09:32:57.7176006Z 2026-02-21T09:32:57.7176064Z // end inline asm 2026-02-21T09:32:57.7176347Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7176631Z setp.ne.b32 %p46, %r37, 0; 2026-02-21T09:32:57.7176791Z @%p46 bra $L__BB0_3; 2026-02-21T09:32:57.7176929Z // %bb.2: 2026-02-21T09:32:57.7177172Z .loc 1 0 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:0:52 2026-02-21T09:32:57.7177454Z add.s32 %r300, %r52, 32800; 2026-02-21T09:32:57.7177616Z bfe.u32 %r301, %r300, 4, 14; 2026-02-21T09:32:57.7177775Z cvt.u64.u32 %rd53, %r301; 2026-02-21T09:32:57.7177940Z or.b64 %rd51, %rd53, -9223371899382267904; 2026-02-21T09:32:57.7178123Z add.s32 %r302, %r52, 32; 2026-02-21T09:32:57.7178270Z bfe.u32 %r303, %r302, 4, 14; 2026-02-21T09:32:57.7178426Z cvt.u64.u32 %rd54, %r303; 2026-02-21T09:32:57.7178587Z or.b64 %rd50, %rd54, -9223371899382267904; 2026-02-21T09:32:57.7178771Z bfe.u32 %r304, %r220, 4, 14; 2026-02-21T09:32:57.7178919Z cvt.u64.u32 
%rd55, %r304; 2026-02-21T09:32:57.7179081Z or.b64 %rd49, %rd55, -9223371899382267904; 2026-02-21T09:32:57.7179259Z bfe.u32 %r305, %r52, 4, 14; 2026-02-21T09:32:57.7179408Z cvt.u64.u32 %rd56, %r305; 2026-02-21T09:32:57.7179568Z or.b64 %rd48, %rd56, -9223371899382267904; 2026-02-21T09:32:57.7179854Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7180150Z elect.sync %r306|%p48, -1; 2026-02-21T09:32:57.7180304Z mov.b32 %r295, 136314896; 2026-02-21T09:32:57.7180459Z mov.pred %p47, 0; 2026-02-21T09:32:57.7180594Z // begin inline asm 2026-02-21T09:32:57.7180825Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd48, %rd49, %r295, %p47; 2026-02-21T09:32:57.7181076Z // end inline asm 2026-02-21T09:32:57.7181208Z // begin inline asm 2026-02-21T09:32:57.7181426Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd50, %rd51, %r295, %p22; 2026-02-21T09:32:57.7181722Z // end inline asm 2026-02-21T09:32:57.7181859Z add.s32 %r307, %r52, 65568; 2026-02-21T09:32:57.7182006Z cvt.u64.u32 %rd52, %r307; 2026-02-21T09:32:57.7182159Z // begin inline asm 2026-02-21T09:32:57.7182357Z @%p48 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd52]; 2026-02-21T09:32:57.7182583Z // end inline asm 2026-02-21T09:32:57.7182718Z $L__BB0_3: 2026-02-21T09:32:57.7182949Z .loc 1 0 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:0:52 2026-02-21T09:32:57.7183265Z ld.param.b64 %rd10, [_helion_matmul_param_2]; 2026-02-21T09:32:57.7183456Z add.s32 %r11, %r52, %r269; 2026-02-21T09:32:57.7183614Z add.s32 %r12, %r52, %r270; 2026-02-21T09:32:57.7183761Z add.s32 %r13, %r52, %r271; 2026-02-21T09:32:57.7183914Z add.s32 %r14, %r52, %r272; 2026-02-21T09:32:57.7184060Z add.s32 %r534, %r529, 512; 2026-02-21T09:32:57.7184214Z add.s32 %r539, %r529, 1024; 2026-02-21T09:32:57.7184371Z add.s32 %r544, %r529, 1536; 2026-02-21T09:32:57.7184518Z or.b32 %r20, %r319, %r255; 2026-02-21T09:32:57.7184701Z or.b32 %r22, %r21, 8; 
2026-02-21T09:32:57.7184844Z or.b32 %r23, %r21, 16; 2026-02-21T09:32:57.7184993Z or.b32 %r24, %r21, 24; 2026-02-21T09:32:57.7185132Z or.b32 %r25, %r21, 32; 2026-02-21T09:32:57.7185275Z or.b32 %r26, %r21, 40; 2026-02-21T09:32:57.7185408Z or.b32 %r27, %r21, 48; 2026-02-21T09:32:57.7185553Z or.b32 %r28, %r260, 56; 2026-02-21T09:32:57.7185693Z or.b32 %r29, %r21, 64; 2026-02-21T09:32:57.7185835Z or.b32 %r30, %r21, 72; 2026-02-21T09:32:57.7185974Z or.b32 %r31, %r21, 80; 2026-02-21T09:32:57.7186109Z or.b32 %r32, %r21, 88; 2026-02-21T09:32:57.7186249Z or.b32 %r33, %r21, 96; 2026-02-21T09:32:57.7186387Z or.b32 %r34, %r21, 104; 2026-02-21T09:32:57.7186535Z or.b32 %r35, %r257, %r258; 2026-02-21T09:32:57.7186682Z or.b32 %r36, %r256, %r258; 2026-02-21T09:32:57.7186996Z .loc 1 51 32 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:32 2026-02-21T09:32:57.7187284Z add.s64 %rd57, %rd31, 192; 2026-02-21T09:32:57.7187445Z cvt.u64.u32 %rd63, %r6; 2026-02-21T09:32:57.7187605Z add.s64 %rd64, %rd2, %rd63; 2026-02-21T09:32:57.7187760Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:32:57.7187920Z add.s64 %rd66, %rd9, %rd65; 2026-02-21T09:32:57.7188076Z add.s64 %rd58, %rd66, 65536; 2026-02-21T09:32:57.7188240Z add.s64 %rd59, %rd66, 131072; 2026-02-21T09:32:57.7188400Z add.s64 %rd60, %rd66, 196608; 2026-02-21T09:32:57.7188681Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7188968Z bar.sync 0; 2026-02-21T09:32:57.7189114Z // begin inline asm 2026-02-21T09:32:57.7189322Z cp.async.cg.shared.global [ %r308 + 0 ], [ %rd57 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7189552Z // end inline asm 2026-02-21T09:32:57.7189694Z // begin inline asm 2026-02-21T09:32:57.7189892Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd58 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7190120Z // end inline asm 2026-02-21T09:32:57.7190254Z // begin inline asm 2026-02-21T09:32:57.7190451Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd59 + 0 ], 0x10, %r309; 
2026-02-21T09:32:57.7190665Z // end inline asm 2026-02-21T09:32:57.7190806Z // begin inline asm 2026-02-21T09:32:57.7190998Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd60 + 0 ], 0x10, %r309; 2026-02-21T09:32:57.7191219Z // end inline asm 2026-02-21T09:32:57.7191367Z cp.async.commit_group; 2026-02-21T09:32:57.7191629Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7191923Z // begin inline asm 2026-02-21T09:32:57.7192113Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r316], 8192; 2026-02-21T09:32:57.7192334Z // end inline asm 2026-02-21T09:32:57.7192581Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7192870Z bar.sync 0; 2026-02-21T09:32:57.7193017Z elect.sync %r326|%p55, -1; 2026-02-21T09:32:57.7193203Z and.pred %p53, %p1, %p55; 2026-02-21T09:32:57.7193388Z add.s32 %r317, %r52, 57344; 2026-02-21T09:32:57.7193533Z mov.b32 %r318, 96; 2026-02-21T09:32:57.7193672Z // begin inline asm 2026-02-21T09:32:57.7193988Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r317], [%rd61, {%r318, %r319}], [%r316]; 2026-02-21T09:32:57.7194357Z // end inline asm 2026-02-21T09:32:57.7194606Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7194949Z mul.wide.u32 %rd67, %r252, 16; 2026-02-21T09:32:57.7195121Z shl.b32 %r328, %r3, 14; 2026-02-21T09:32:57.7195278Z and.b32 %r329, %r328, 16646144; 2026-02-21T09:32:57.7195449Z shl.b32 %r330, %r4, 10; 2026-02-21T09:32:57.7195598Z or.b32 %r331, %r329, %r330; 2026-02-21T09:32:57.7195766Z mul.wide.u32 %rd68, %r331, 2; 2026-02-21T09:32:57.7195930Z or.b64 %rd69, %rd67, %rd68; 2026-02-21T09:32:57.7196096Z add.s64 %rd70, %rd69, %rd9; 2026-02-21T09:32:57.7196258Z add.s64 %rd357, %rd70, 196864; 2026-02-21T09:32:57.7196426Z mov.b32 %r902, 1; 2026-02-21T09:32:57.7196574Z mov.b32 %r901, 3; 2026-02-21T09:32:57.7196712Z mov.b32 %r897, 0; 2026-02-21T09:32:57.7196856Z 
mov.b64 %rd358, 0; 2026-02-21T09:32:57.7196998Z mov.b32 %r899, %r897; 2026-02-21T09:32:57.7197153Z mov.b32 %r900, %r897; 2026-02-21T09:32:57.7197298Z mov.b32 %r903, %r897; 2026-02-21T09:32:57.7197448Z bra.uni $L__BB0_4; 2026-02-21T09:32:57.7197636Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:32:57.7197978Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7198277Z setp.lt.u64 %p65, %rd358, 896; 2026-02-21T09:32:57.7198564Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7198908Z // begin inline asm 2026-02-21T09:32:57.7199088Z 2026-02-21T09:32:57.7199221Z { 2026-02-21T09:32:57.7199350Z .reg .pred complete; 2026-02-21T09:32:57.7199508Z waitLoop: 2026-02-21T09:32:57.7199704Z mbarrier.try_wait.parity.shared.b64 complete, [%r898], %r897; 2026-02-21T09:32:57.7199955Z @!complete bra.uni waitLoop; 2026-02-21T09:32:57.7200115Z } 2026-02-21T09:32:57.7200198Z 2026-02-21T09:32:57.7200257Z // end inline asm 2026-02-21T09:32:57.7200413Z add.s32 %r371, %r902, 1; 2026-02-21T09:32:57.7200572Z setp.gt.s32 %p68, %r371, 1; 2026-02-21T09:32:57.7200745Z selp.b32 %r902, 0, %r371, %p68; 2026-02-21T09:32:57.7200915Z selp.b32 %r372, 1, 0, %p68; 2026-02-21T09:32:57.7201079Z xor.b32 %r50, %r903, %r372; 2026-02-21T09:32:57.7201349Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7201729Z add.s32 %r373, %r901, 1; 2026-02-21T09:32:57.7201876Z setp.gt.s32 %p69, %r373, 3; 2026-02-21T09:32:57.7202038Z selp.b32 %r901, 0, %r373, %p69; 2026-02-21T09:32:57.7202315Z .loc 1 51 32 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:32 2026-02-21T09:32:57.7202600Z add.s64 %rd80, %rd357, -196608; 2026-02-21T09:32:57.7202768Z add.s64 %rd81, %rd357, -131072; 2026-02-21T09:32:57.7202923Z add.s64 %rd82, %rd357, -65536; 2026-02-21T09:32:57.7203194Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 
2026-02-21T09:32:57.7203477Z shl.b32 %r374, %r901, 13; 2026-02-21T09:32:57.7203633Z add.s32 %r376, %r52, %r374; 2026-02-21T09:32:57.7203778Z bar.sync 0; 2026-02-21T09:32:57.7203915Z add.s32 %r358, %r376, %r5; 2026-02-21T09:32:57.7204074Z selp.b32 %r359, 16, 0, %p65; 2026-02-21T09:32:57.7204222Z // begin inline asm 2026-02-21T09:32:57.7204420Z cp.async.cg.shared.global [ %r358 + 0 ], [ %rd80 + 0 ], 0x10, %r359; 2026-02-21T09:32:57.7204639Z // end inline asm 2026-02-21T09:32:57.7204836Z add.s32 %r360, %r358, 2048; 2026-02-21T09:32:57.7204984Z // begin inline asm 2026-02-21T09:32:57.7205185Z cp.async.cg.shared.global [ %r360 + 0 ], [ %rd81 + 0 ], 0x10, %r359; 2026-02-21T09:32:57.7205458Z // end inline asm 2026-02-21T09:32:57.7205597Z add.s32 %r362, %r358, 4096; 2026-02-21T09:32:57.7205748Z // begin inline asm 2026-02-21T09:32:57.7205936Z cp.async.cg.shared.global [ %r362 + 0 ], [ %rd82 + 0 ], 0x10, %r359; 2026-02-21T09:32:57.7206153Z // end inline asm 2026-02-21T09:32:57.7206282Z add.s32 %r364, %r358, 6144; 2026-02-21T09:32:57.7206435Z // begin inline asm 2026-02-21T09:32:57.7206625Z cp.async.cg.shared.global [ %r364 + 0 ], [ %rd357 + 0 ], 0x10, %r359; 2026-02-21T09:32:57.7206846Z // end inline asm 2026-02-21T09:32:57.7206981Z cp.async.commit_group; 2026-02-21T09:32:57.7207239Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7207527Z shl.b32 %r377, %r901, 3; 2026-02-21T09:32:57.7207674Z add.s32 %r378, %r52, %r377; 2026-02-21T09:32:57.7207835Z add.s32 %r370, %r378, 65536; 2026-02-21T09:32:57.7207992Z and.pred %p63, %p73, %p65; 2026-02-21T09:32:57.7208152Z // begin inline asm 2026-02-21T09:32:57.7208335Z @%p63 mbarrier.arrive.expect_tx.shared.b64 _, [%r370], 8192; 2026-02-21T09:32:57.7208557Z // end inline asm 2026-02-21T09:32:57.7208796Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7209098Z add.s32 %r367, %r376, 32768; 2026-02-21T09:32:57.7209252Z bar.sync 0; 
2026-02-21T09:32:57.7209382Z elect.sync %r379|%p70, -1; 2026-02-21T09:32:57.7209544Z and.pred %p71, %p65, %p70; 2026-02-21T09:32:57.7209698Z and.pred %p64, %p1, %p71; 2026-02-21T09:32:57.7209858Z cvt.u32.u64 %r380, %rd358; 2026-02-21T09:32:57.7210004Z add.s32 %r368, %r380, 128; 2026-02-21T09:32:57.7210154Z // begin inline asm 2026-02-21T09:32:57.7210489Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r367], [%rd61, {%r368, %r319}], [%r370]; 2026-02-21T09:32:57.7210892Z // end inline asm 2026-02-21T09:32:57.7211175Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7211458Z add.s64 %rd357, %rd357, 64; 2026-02-21T09:32:57.7211622Z setp.lt.u64 %p72, %rd358, 960; 2026-02-21T09:32:57.7211780Z add.s64 %rd358, %rd358, 32; 2026-02-21T09:32:57.7211933Z mov.b32 %r897, %r903; 2026-02-21T09:32:57.7212069Z mov.b32 %r898, %r381; 2026-02-21T09:32:57.7212213Z mov.b32 %r903, %r50; 2026-02-21T09:32:57.7212352Z @%p72 bra $L__BB0_4; 2026-02-21T09:32:57.7212499Z bra.uni $L__BB0_7; 2026-02-21T09:32:57.7212686Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:32:57.7212898Z add.s32 %r334, %r900, 1; 2026-02-21T09:32:57.7213055Z setp.gt.s32 %p57, %r334, 3; 2026-02-21T09:32:57.7213209Z selp.b32 %r900, 0, %r334, %p57; 2026-02-21T09:32:57.7213377Z selp.b32 %r335, 1, 0, %p57; 2026-02-21T09:32:57.7213526Z xor.b32 %r899, %r899, %r335; 2026-02-21T09:32:57.7213790Z .loc 1 51 85 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:51:85 2026-02-21T09:32:57.7214074Z cp.async.wait_group 2; 2026-02-21T09:32:57.7214226Z bar.sync 0; 2026-02-21T09:32:57.7214468Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7214771Z shl.b32 %r336, %r900, 3; 2026-02-21T09:32:57.7214924Z add.s32 %r338, %r52, %r336; 2026-02-21T09:32:57.7215072Z add.s32 %r332, %r338, 65536; 2026-02-21T09:32:57.7215228Z // begin inline asm 2026-02-21T09:32:57.7215356Z 
2026-02-21T09:32:57.7215470Z { 2026-02-21T09:32:57.7215583Z .reg .pred complete; 2026-02-21T09:32:57.7215718Z waitLoop: 2026-02-21T09:32:57.7215896Z mbarrier.try_wait.parity.shared.b64 complete, [%r332], %r899; 2026-02-21T09:32:57.7216128Z @!complete bra.uni waitLoop; 2026-02-21T09:32:57.7216276Z } 2026-02-21T09:32:57.7216339Z 2026-02-21T09:32:57.7216392Z // end inline asm 2026-02-21T09:32:57.7216532Z shl.b32 %r339, %r902, 3; 2026-02-21T09:32:57.7216677Z add.s32 %r340, %r52, %r339; 2026-02-21T09:32:57.7216851Z add.s32 %r381, %r340, 65568; 2026-02-21T09:32:57.7217157Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7217440Z @%p46 bra $L__BB0_6; 2026-02-21T09:32:57.7217616Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:32:57.7217939Z .loc 1 52 44 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:52:44 2026-02-21T09:32:57.7218220Z shl.b32 %r345, %r900, 13; 2026-02-21T09:32:57.7218365Z add.s32 %r347, %r52, %r345; 2026-02-21T09:32:57.7218520Z add.s32 %r348, %r347, 32768; 2026-02-21T09:32:57.7218770Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7219052Z elect.sync %r349|%p59, -1; 2026-02-21T09:32:57.7219205Z bfe.u32 %r350, %r347, 4, 14; 2026-02-21T09:32:57.7219359Z cvt.u64.u32 %rd76, %r350; 2026-02-21T09:32:57.7219528Z or.b64 %rd71, %rd76, -9223371899382267904; 2026-02-21T09:32:57.7219702Z bfe.u32 %r351, %r348, 4, 14; 2026-02-21T09:32:57.7219852Z cvt.u64.u32 %rd77, %r351; 2026-02-21T09:32:57.7220004Z or.b64 %rd72, %rd77, -9223371899382267904; 2026-02-21T09:32:57.7220175Z mov.b32 %r342, 136314896; 2026-02-21T09:32:57.7220320Z mov.pred %p58, -1; 2026-02-21T09:32:57.7220464Z // begin inline asm 2026-02-21T09:32:57.7220680Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd71, %rd72, %r342, %p58; 2026-02-21T09:32:57.7220930Z // end inline asm 2026-02-21T09:32:57.7221069Z add.s32 %r352, %r347, 32; 2026-02-21T09:32:57.7221215Z bfe.u32 %r353, 
%r352, 4, 14; 2026-02-21T09:32:57.7221370Z cvt.u64.u32 %rd78, %r353; 2026-02-21T09:32:57.7221523Z or.b64 %rd73, %rd78, -9223371899382267904; 2026-02-21T09:32:57.7221701Z add.s32 %r354, %r347, 32800; 2026-02-21T09:32:57.7221848Z bfe.u32 %r355, %r354, 4, 14; 2026-02-21T09:32:57.7221997Z cvt.u64.u32 %rd79, %r355; 2026-02-21T09:32:57.7222215Z or.b64 %rd74, %rd79, -9223371899382267904; 2026-02-21T09:32:57.7222394Z // begin inline asm 2026-02-21T09:32:57.7222607Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd73, %rd74, %r342, %p58; 2026-02-21T09:32:57.7222850Z // end inline asm 2026-02-21T09:32:57.7222989Z cvt.u64.u32 %rd75, %r381; 2026-02-21T09:32:57.7223130Z // begin inline asm 2026-02-21T09:32:57.7223335Z @%p59 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd75]; 2026-02-21T09:32:57.7223554Z // end inline asm 2026-02-21T09:32:57.7223690Z bra.uni $L__BB0_6; 2026-02-21T09:32:57.7223861Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:32:57.7224178Z .loc 1 0 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:0:52 2026-02-21T09:32:57.7224466Z mov.b32 %r382, 1; 2026-02-21T09:32:57.7224759Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7225046Z // begin inline asm 2026-02-21T09:32:57.7225172Z 2026-02-21T09:32:57.7225286Z { 2026-02-21T09:32:57.7225406Z .reg .pred complete; 2026-02-21T09:32:57.7225553Z waitLoop: 2026-02-21T09:32:57.7225729Z mbarrier.try_wait.parity.shared.b64 complete, [%r381], %r382; 2026-02-21T09:32:57.7225961Z @!complete bra.uni waitLoop; 2026-02-21T09:32:57.7226105Z } 2026-02-21T09:32:57.7226173Z 2026-02-21T09:32:57.7226225Z // end inline asm 2026-02-21T09:32:57.7226474Z .loc 1 46 42 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:46:42 2026-02-21T09:32:57.7226754Z cp.async.wait_group 0; 2026-02-21T09:32:57.7226906Z bar.sync 0; 2026-02-21T09:32:57.7227033Z // begin inline asm 2026-02-21T09:32:57.7227201Z @%p73 mbarrier.inval.shared::cta.b64 [%r207]; 
2026-02-21T09:32:57.7227387Z // end inline asm 2026-02-21T09:32:57.7227528Z bar.sync 0; 2026-02-21T09:32:57.7227652Z // begin inline asm 2026-02-21T09:32:57.7227814Z @%p73 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T09:32:57.7227999Z // end inline asm 2026-02-21T09:32:57.7228127Z bar.sync 0; 2026-02-21T09:32:57.7228295Z // begin inline asm 2026-02-21T09:32:57.7228474Z @%p73 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T09:32:57.7228658Z // end inline asm 2026-02-21T09:32:57.7228781Z bar.sync 0; 2026-02-21T09:32:57.7228909Z // begin inline asm 2026-02-21T09:32:57.7229060Z @%p73 mbarrier.inval.shared::cta.b64 [%r316]; 2026-02-21T09:32:57.7229240Z // end inline asm 2026-02-21T09:32:57.7229378Z add.s32 %r387, %r52, 65568; 2026-02-21T09:32:57.7229525Z // begin inline asm 2026-02-21T09:32:57.7229685Z @%p73 mbarrier.inval.shared::cta.b64 [%r387]; 2026-02-21T09:32:57.7229858Z // end inline asm 2026-02-21T09:32:57.7229992Z bar.sync 0; 2026-02-21T09:32:57.7230110Z // begin inline asm 2026-02-21T09:32:57.7230265Z @%p73 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T09:32:57.7230434Z // end inline asm 2026-02-21T09:32:57.7230674Z .loc 1 56 45 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:56:45 2026-02-21T09:32:57.7230959Z shl.b32 %r670, %r21, 10; 2026-02-21T09:32:57.7231108Z shl.b32 %r671, %r22, 10; 2026-02-21T09:32:57.7231258Z shl.b32 %r672, %r23, 10; 2026-02-21T09:32:57.7231398Z shl.b32 %r673, %r24, 10; 2026-02-21T09:32:57.7231542Z shl.b32 %r674, %r25, 10; 2026-02-21T09:32:57.7231679Z shl.b32 %r675, %r26, 10; 2026-02-21T09:32:57.7231822Z shl.b32 %r676, %r27, 10; 2026-02-21T09:32:57.7231960Z shl.b32 %r677, %r28, 10; 2026-02-21T09:32:57.7232101Z shl.b32 %r678, %r29, 10; 2026-02-21T09:32:57.7232239Z shl.b32 %r679, %r30, 10; 2026-02-21T09:32:57.7232382Z shl.b32 %r680, %r31, 10; 2026-02-21T09:32:57.7232526Z shl.b32 %r681, %r32, 10; 2026-02-21T09:32:57.7232664Z shl.b32 %r682, %r33, 10; 2026-02-21T09:32:57.7232806Z shl.b32 %r683, %r34, 10; 
2026-02-21T09:32:57.7232943Z shl.b32 %r684, %r35, 10; 2026-02-21T09:32:57.7233090Z shl.b32 %r685, %r36, 10; 2026-02-21T09:32:57.7233365Z .loc 1 56 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:56:52 2026-02-21T09:32:57.7233691Z or.b32 %r686, %r670, %r20; 2026-02-21T09:32:57.7233846Z or.b32 %r687, %r671, %r20; 2026-02-21T09:32:57.7234000Z or.b32 %r688, %r672, %r20; 2026-02-21T09:32:57.7234144Z or.b32 %r689, %r673, %r20; 2026-02-21T09:32:57.7234295Z or.b32 %r690, %r674, %r20; 2026-02-21T09:32:57.7234443Z or.b32 %r691, %r675, %r20; 2026-02-21T09:32:57.7234584Z or.b32 %r692, %r676, %r20; 2026-02-21T09:32:57.7234765Z or.b32 %r693, %r677, %r20; 2026-02-21T09:32:57.7234908Z or.b32 %r694, %r678, %r20; 2026-02-21T09:32:57.7235059Z or.b32 %r695, %r679, %r20; 2026-02-21T09:32:57.7235203Z or.b32 %r696, %r680, %r20; 2026-02-21T09:32:57.7235359Z or.b32 %r697, %r681, %r20; 2026-02-21T09:32:57.7235505Z or.b32 %r698, %r682, %r20; 2026-02-21T09:32:57.7235658Z or.b32 %r699, %r683, %r20; 2026-02-21T09:32:57.7235808Z or.b32 %r700, %r684, %r20; 2026-02-21T09:32:57.7235953Z or.b32 %r701, %r700, 114688; 2026-02-21T09:32:57.7236106Z or.b32 %r702, %r685, %r20; 2026-02-21T09:32:57.7236253Z or.b32 %r703, %r702, 122880; 2026-02-21T09:32:57.7236522Z .loc 1 56 24 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:56:24 2026-02-21T09:32:57.7236813Z mad.wide.u32 %rd85, %r686, 2, %rd10; 2026-02-21T09:32:57.7236995Z mad.wide.u32 %rd86, %r687, 2, %rd10; 2026-02-21T09:32:57.7237163Z mad.wide.u32 %rd87, %r688, 2, %rd10; 2026-02-21T09:32:57.7237334Z mad.wide.u32 %rd88, %r689, 2, %rd10; 2026-02-21T09:32:57.7237505Z mad.wide.u32 %rd89, %r690, 2, %rd10; 2026-02-21T09:32:57.7237667Z mad.wide.u32 %rd90, %r691, 2, %rd10; 2026-02-21T09:32:57.7237836Z mad.wide.u32 %rd91, %r692, 2, %rd10; 2026-02-21T09:32:57.7237998Z mad.wide.u32 %rd92, %r693, 2, %rd10; 2026-02-21T09:32:57.7238166Z mad.wide.u32 %rd93, %r694, 2, %rd10; 2026-02-21T09:32:57.7238327Z mad.wide.u32 %rd94, %r695, 2, %rd10; 
2026-02-21T09:32:57.7238495Z mad.wide.u32 %rd95, %r696, 2, %rd10; 2026-02-21T09:32:57.7238655Z mad.wide.u32 %rd96, %r697, 2, %rd10; 2026-02-21T09:32:57.7238822Z mad.wide.u32 %rd97, %r698, 2, %rd10; 2026-02-21T09:32:57.7238991Z mad.wide.u32 %rd98, %r699, 2, %rd10; 2026-02-21T09:32:57.7239185Z mad.wide.u32 %rd99, %r701, 2, %rd10; 2026-02-21T09:32:57.7239382Z mad.wide.u32 %rd100, %r703, 2, %rd10; 2026-02-21T09:32:57.7239664Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7239957Z // begin inline asm 2026-02-21T09:32:57.7240325Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404}, [%r524 + 0]; 2026-02-21T09:32:57.7240723Z // end inline asm 2026-02-21T09:32:57.7240865Z // begin inline asm 2026-02-21T09:32:57.7241220Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421}, [%r524 + 16]; 2026-02-21T09:32:57.7241625Z // end inline asm 2026-02-21T09:32:57.7241760Z // begin inline asm 2026-02-21T09:32:57.7242122Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438}, [%r524 + 32]; 2026-02-21T09:32:57.7242511Z // end inline asm 2026-02-21T09:32:57.7242651Z // begin inline asm 2026-02-21T09:32:57.7243001Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455}, [%r524 + 48]; 2026-02-21T09:32:57.7243380Z // end inline asm 2026-02-21T09:32:57.7243518Z // begin inline asm 2026-02-21T09:32:57.7243857Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472}, [%r524 + 64]; 2026-02-21T09:32:57.7244242Z // end inline asm 2026-02-21T09:32:57.7244376Z // 
begin inline asm 2026-02-21T09:32:57.7244914Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489}, [%r524 + 80]; 2026-02-21T09:32:57.7245367Z // end inline asm 2026-02-21T09:32:57.7245503Z // begin inline asm 2026-02-21T09:32:57.7245859Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506}, [%r524 + 96]; 2026-02-21T09:32:57.7246238Z // end inline asm 2026-02-21T09:32:57.7246378Z // begin inline asm 2026-02-21T09:32:57.7246729Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523}, [%r524 + 112]; 2026-02-21T09:32:57.7247115Z // end inline asm 2026-02-21T09:32:57.7247256Z // begin inline asm 2026-02-21T09:32:57.7247410Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:32:57.7247597Z // end inline asm 2026-02-21T09:32:57.7247738Z cvt.u64.u32 %rd101, %r389; 2026-02-21T09:32:57.7247907Z cvt.u64.u32 %rd102, %r390; 2026-02-21T09:32:57.7248065Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:32:57.7248236Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:32:57.7248519Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7248829Z mov.b64 {%r704, %r705}, %rd104; 2026-02-21T09:32:57.7249004Z cvt.rn.f16x2.f32 %r706, %r705, %r704; 2026-02-21T09:32:57.7249280Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7249563Z cvt.u64.u32 %rd105, %r391; 2026-02-21T09:32:57.7249713Z cvt.u64.u32 %rd106, %r392; 2026-02-21T09:32:57.7249869Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:32:57.7250021Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:32:57.7250289Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7250572Z mov.b64 {%r707, %r708}, %rd108; 2026-02-21T09:32:57.7250739Z 
cvt.rn.f16x2.f32 %r709, %r708, %r707; 2026-02-21T09:32:57.7251019Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7251328Z cvt.u64.u32 %rd109, %r393; 2026-02-21T09:32:57.7251509Z cvt.u64.u32 %rd110, %r394; 2026-02-21T09:32:57.7251657Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:32:57.7251812Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:32:57.7252065Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7252343Z mov.b64 {%r710, %r711}, %rd112; 2026-02-21T09:32:57.7252511Z cvt.rn.f16x2.f32 %r712, %r711, %r710; 2026-02-21T09:32:57.7252777Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7253064Z cvt.u64.u32 %rd113, %r395; 2026-02-21T09:32:57.7253210Z cvt.u64.u32 %rd114, %r396; 2026-02-21T09:32:57.7253362Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:32:57.7253509Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:32:57.7253770Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7254057Z mov.b64 {%r713, %r714}, %rd116; 2026-02-21T09:32:57.7254216Z cvt.rn.f16x2.f32 %r715, %r714, %r713; 2026-02-21T09:32:57.7254491Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7254804Z cvt.u64.u32 %rd117, %r397; 2026-02-21T09:32:57.7254955Z cvt.u64.u32 %rd118, %r398; 2026-02-21T09:32:57.7255101Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:32:57.7255261Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:32:57.7255524Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7255817Z mov.b64 {%r716, %r717}, %rd120; 2026-02-21T09:32:57.7255987Z cvt.rn.f16x2.f32 %r718, %r717, %r716; 2026-02-21T09:32:57.7256263Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7256551Z cvt.u64.u32 %rd121, %r399; 2026-02-21T09:32:57.7256726Z cvt.u64.u32 %rd122, %r400; 2026-02-21T09:32:57.7256908Z 
shl.b64 %rd123, %rd122, 32; 2026-02-21T09:32:57.7257060Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:32:57.7257326Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7257616Z mov.b64 {%r719, %r720}, %rd124; 2026-02-21T09:32:57.7257777Z cvt.rn.f16x2.f32 %r721, %r720, %r719; 2026-02-21T09:32:57.7258051Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7258329Z cvt.u64.u32 %rd125, %r401; 2026-02-21T09:32:57.7258485Z cvt.u64.u32 %rd126, %r402; 2026-02-21T09:32:57.7258628Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:32:57.7258784Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:32:57.7259037Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7259315Z mov.b64 {%r722, %r723}, %rd128; 2026-02-21T09:32:57.7259481Z cvt.rn.f16x2.f32 %r724, %r723, %r722; 2026-02-21T09:32:57.7259744Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7260022Z cvt.u64.u32 %rd129, %r403; 2026-02-21T09:32:57.7260168Z cvt.u64.u32 %rd130, %r404; 2026-02-21T09:32:57.7260319Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:32:57.7260469Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:32:57.7260728Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7261005Z mov.b64 {%r725, %r726}, %rd132; 2026-02-21T09:32:57.7261162Z cvt.rn.f16x2.f32 %r727, %r726, %r725; 2026-02-21T09:32:57.7261434Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7261704Z cvt.u64.u32 %rd133, %r406; 2026-02-21T09:32:57.7261858Z cvt.u64.u32 %rd134, %r407; 2026-02-21T09:32:57.7262002Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:32:57.7262160Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:32:57.7262415Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7262768Z mov.b64 {%r728, %r729}, %rd136; 2026-02-21T09:32:57.7262936Z 
cvt.rn.f16x2.f32 %r730, %r729, %r728; 2026-02-21T09:32:57.7263206Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7263496Z cvt.u64.u32 %rd137, %r408; 2026-02-21T09:32:57.7263641Z cvt.u64.u32 %rd138, %r409; 2026-02-21T09:32:57.7263798Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:32:57.7263955Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:32:57.7264240Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7264548Z mov.b64 {%r731, %r732}, %rd140; 2026-02-21T09:32:57.7264751Z cvt.rn.f16x2.f32 %r733, %r732, %r731; 2026-02-21T09:32:57.7265051Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7265351Z cvt.u64.u32 %rd141, %r410; 2026-02-21T09:32:57.7265517Z cvt.u64.u32 %rd142, %r411; 2026-02-21T09:32:57.7265674Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:32:57.7265841Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:32:57.7266116Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7266402Z mov.b64 {%r734, %r735}, %rd144; 2026-02-21T09:32:57.7266569Z cvt.rn.f16x2.f32 %r736, %r735, %r734; 2026-02-21T09:32:57.7266842Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7267133Z cvt.u64.u32 %rd145, %r412; 2026-02-21T09:32:57.7267280Z cvt.u64.u32 %rd146, %r413; 2026-02-21T09:32:57.7267445Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:32:57.7267604Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:32:57.7267890Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7268231Z mov.b64 {%r737, %r738}, %rd148; 2026-02-21T09:32:57.7268442Z cvt.rn.f16x2.f32 %r739, %r738, %r737; 2026-02-21T09:32:57.7268757Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7269059Z cvt.u64.u32 %rd149, %r414; 2026-02-21T09:32:57.7269230Z cvt.u64.u32 %rd150, %r415; 2026-02-21T09:32:57.7269388Z 
shl.b64 %rd151, %rd150, 32; 2026-02-21T09:32:57.7269560Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:32:57.7269849Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7270158Z mov.b64 {%r740, %r741}, %rd152; 2026-02-21T09:32:57.7270337Z cvt.rn.f16x2.f32 %r742, %r741, %r740; 2026-02-21T09:32:57.7270633Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7270939Z cvt.u64.u32 %rd153, %r416; 2026-02-21T09:32:57.7271094Z cvt.u64.u32 %rd154, %r417; 2026-02-21T09:32:57.7271258Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:32:57.7271422Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:32:57.7271709Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7272016Z mov.b64 {%r743, %r744}, %rd156; 2026-02-21T09:32:57.7272187Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:32:57.7272489Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7272789Z cvt.u64.u32 %rd157, %r418; 2026-02-21T09:32:57.7272955Z cvt.u64.u32 %rd158, %r419; 2026-02-21T09:32:57.7273112Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:32:57.7273279Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:32:57.7273563Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7273873Z mov.b64 {%r746, %r747}, %rd160; 2026-02-21T09:32:57.7274049Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:32:57.7274344Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7274739Z cvt.u64.u32 %rd161, %r420; 2026-02-21T09:32:57.7274928Z cvt.u64.u32 %rd162, %r421; 2026-02-21T09:32:57.7275092Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:32:57.7275252Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:32:57.7275530Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7275820Z mov.b64 {%r749, %r750}, %rd164; 2026-02-21T09:32:57.7275981Z 
cvt.rn.f16x2.f32 %r751, %r750, %r749; 2026-02-21T09:32:57.7276255Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7276527Z cvt.u64.u32 %rd165, %r423; 2026-02-21T09:32:57.7276679Z cvt.u64.u32 %rd166, %r424; 2026-02-21T09:32:57.7276825Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:32:57.7276983Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:32:57.7277241Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7277524Z mov.b64 {%r752, %r753}, %rd168; 2026-02-21T09:32:57.7277690Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:32:57.7277955Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7278233Z cvt.u64.u32 %rd169, %r425; 2026-02-21T09:32:57.7278378Z cvt.u64.u32 %rd170, %r426; 2026-02-21T09:32:57.7278528Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:32:57.7278677Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:32:57.7278939Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7279230Z mov.b64 {%r755, %r756}, %rd172; 2026-02-21T09:32:57.7279390Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:32:57.7279662Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7279965Z cvt.u64.u32 %rd173, %r427; 2026-02-21T09:32:57.7280146Z cvt.u64.u32 %rd174, %r428; 2026-02-21T09:32:57.7280296Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:32:57.7280456Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:32:57.7280712Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7281002Z mov.b64 {%r758, %r759}, %rd176; 2026-02-21T09:32:57.7281168Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:32:57.7281434Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7281710Z cvt.u64.u32 %rd177, %r429; 2026-02-21T09:32:57.7281853Z cvt.u64.u32 %rd178, %r430; 2026-02-21T09:32:57.7282006Z 
shl.b64 %rd179, %rd178, 32; 2026-02-21T09:32:57.7282160Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:32:57.7282431Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7282718Z mov.b64 {%r761, %r762}, %rd180; 2026-02-21T09:32:57.7282880Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:32:57.7283156Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7283429Z cvt.u64.u32 %rd181, %r431; 2026-02-21T09:32:57.7283581Z cvt.u64.u32 %rd182, %r432; 2026-02-21T09:32:57.7283725Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:32:57.7283880Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:32:57.7284136Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7284416Z mov.b64 {%r764, %r765}, %rd184; 2026-02-21T09:32:57.7284583Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:32:57.7284883Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7285168Z cvt.u64.u32 %rd185, %r433; 2026-02-21T09:32:57.7285316Z cvt.u64.u32 %rd186, %r434; 2026-02-21T09:32:57.7285472Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:32:57.7285625Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:32:57.7285898Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7286248Z mov.b64 {%r767, %r768}, %rd188; 2026-02-21T09:32:57.7286417Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:32:57.7286701Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7286982Z cvt.u64.u32 %rd189, %r435; 2026-02-21T09:32:57.7287142Z cvt.u64.u32 %rd190, %r436; 2026-02-21T09:32:57.7287296Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:32:57.7287456Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:32:57.7287724Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7288022Z mov.b64 {%r770, %r771}, %rd192; 2026-02-21T09:32:57.7288195Z 
cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T09:32:57.7288478Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7288772Z cvt.u64.u32 %rd193, %r437; 2026-02-21T09:32:57.7288927Z cvt.u64.u32 %rd194, %r438; 2026-02-21T09:32:57.7289089Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:32:57.7289247Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:32:57.7289520Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7289811Z mov.b64 {%r773, %r774}, %rd196; 2026-02-21T09:32:57.7289977Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:32:57.7290262Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7290546Z cvt.u64.u32 %rd197, %r440; 2026-02-21T09:32:57.7290705Z cvt.u64.u32 %rd198, %r441; 2026-02-21T09:32:57.7290858Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:32:57.7291023Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:32:57.7291321Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7291637Z mov.b64 {%r776, %r777}, %rd200; 2026-02-21T09:32:57.7291817Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:32:57.7292085Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7292370Z cvt.u64.u32 %rd201, %r442; 2026-02-21T09:32:57.7292519Z cvt.u64.u32 %rd202, %r443; 2026-02-21T09:32:57.7292678Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:32:57.7292834Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:32:57.7293111Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7293413Z mov.b64 {%r779, %r780}, %rd204; 2026-02-21T09:32:57.7293576Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:32:57.7293867Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7294153Z cvt.u64.u32 %rd205, %r444; 2026-02-21T09:32:57.7294313Z cvt.u64.u32 %rd206, %r445; 2026-02-21T09:32:57.7294467Z 
shl.b64 %rd207, %rd206, 32; 2026-02-21T09:32:57.7294632Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:32:57.7294973Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7295281Z mov.b64 {%r782, %r783}, %rd208; 2026-02-21T09:32:57.7295457Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T09:32:57.7295745Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7296049Z cvt.u64.u32 %rd209, %r446; 2026-02-21T09:32:57.7296204Z cvt.u64.u32 %rd210, %r447; 2026-02-21T09:32:57.7296365Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:32:57.7296522Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:32:57.7296803Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7297104Z mov.b64 {%r785, %r786}, %rd212; 2026-02-21T09:32:57.7297273Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T09:32:57.7297579Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7297936Z cvt.u64.u32 %rd213, %r448; 2026-02-21T09:32:57.7298099Z cvt.u64.u32 %rd214, %r449; 2026-02-21T09:32:57.7298254Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:32:57.7298417Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:32:57.7298691Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7298988Z mov.b64 {%r788, %r789}, %rd216; 2026-02-21T09:32:57.7299163Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T09:32:57.7299445Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7299739Z cvt.u64.u32 %rd217, %r450; 2026-02-21T09:32:57.7299891Z cvt.u64.u32 %rd218, %r451; 2026-02-21T09:32:57.7300050Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:32:57.7300217Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:32:57.7300484Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7300772Z mov.b64 {%r791, %r792}, %rd220; 2026-02-21T09:32:57.7300930Z 
cvt.rn.f16x2.f32 %r793, %r792, %r791; 2026-02-21T09:32:57.7301203Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7301474Z cvt.u64.u32 %rd221, %r452; 2026-02-21T09:32:57.7301627Z cvt.u64.u32 %rd222, %r453; 2026-02-21T09:32:57.7301774Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:32:57.7301930Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:32:57.7302185Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7302470Z mov.b64 {%r794, %r795}, %rd224; 2026-02-21T09:32:57.7302640Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T09:32:57.7302907Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7303222Z cvt.u64.u32 %rd225, %r454; 2026-02-21T09:32:57.7303412Z cvt.u64.u32 %rd226, %r455; 2026-02-21T09:32:57.7303569Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:32:57.7303718Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:32:57.7303979Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7304264Z mov.b64 {%r797, %r798}, %rd228; 2026-02-21T09:32:57.7304420Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T09:32:57.7304716Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7304992Z cvt.u64.u32 %rd229, %r457; 2026-02-21T09:32:57.7305147Z cvt.u64.u32 %rd230, %r458; 2026-02-21T09:32:57.7305295Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:32:57.7305453Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:32:57.7305713Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7305997Z mov.b64 {%r800, %r801}, %rd232; 2026-02-21T09:32:57.7306167Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T09:32:57.7306433Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7306716Z cvt.u64.u32 %rd233, %r459; 2026-02-21T09:32:57.7306862Z cvt.u64.u32 %rd234, %r460; 2026-02-21T09:32:57.7307017Z 
shl.b64 %rd235, %rd234, 32; 2026-02-21T09:32:57.7307166Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:32:57.7307428Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7307717Z mov.b64 {%r803, %r804}, %rd236; 2026-02-21T09:32:57.7307875Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T09:32:57.7308151Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7308429Z cvt.u64.u32 %rd237, %r461; 2026-02-21T09:32:57.7308582Z cvt.u64.u32 %rd238, %r462; 2026-02-21T09:32:57.7308728Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:32:57.7308884Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:32:57.7309168Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7309483Z mov.b64 {%r806, %r807}, %rd240; 2026-02-21T09:32:57.7309544Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T09:32:57.7309718Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7309773Z cvt.u64.u32 %rd241, %r463; 2026-02-21T09:32:57.7309828Z cvt.u64.u32 %rd242, %r464; 2026-02-21T09:32:57.7309888Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:32:57.7309945Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:32:57.7310110Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7310167Z mov.b64 {%r809, %r810}, %rd244; 2026-02-21T09:32:57.7310235Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T09:32:57.7310400Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7310457Z cvt.u64.u32 %rd245, %r465; 2026-02-21T09:32:57.7310523Z cvt.u64.u32 %rd246, %r466; 2026-02-21T09:32:57.7310578Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:32:57.7310634Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:32:57.7310804Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7310860Z mov.b64 {%r812, %r813}, %rd248; 2026-02-21T09:32:57.7310919Z 
cvt.rn.f16x2.f32 %r814, %r813, %r812; 2026-02-21T09:32:57.7311082Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7311145Z cvt.u64.u32 %rd249, %r467; 2026-02-21T09:32:57.7311199Z cvt.u64.u32 %rd250, %r468; 2026-02-21T09:32:57.7311256Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:32:57.7311320Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:32:57.7311533Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7311593Z mov.b64 {%r815, %r816}, %rd252; 2026-02-21T09:32:57.7311662Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T09:32:57.7311821Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7311877Z cvt.u64.u32 %rd253, %r469; 2026-02-21T09:32:57.7311931Z cvt.u64.u32 %rd254, %r470; 2026-02-21T09:32:57.7311994Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:32:57.7312050Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:32:57.7312207Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7312272Z mov.b64 {%r818, %r819}, %rd256; 2026-02-21T09:32:57.7312332Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T09:32:57.7312490Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7312553Z cvt.u64.u32 %rd257, %r471; 2026-02-21T09:32:57.7312610Z cvt.u64.u32 %rd258, %r472; 2026-02-21T09:32:57.7312670Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:32:57.7312730Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:32:57.7312898Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7312957Z mov.b64 {%r821, %r822}, %rd260; 2026-02-21T09:32:57.7313017Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T09:32:57.7313184Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7313240Z cvt.u64.u32 %rd261, %r474; 2026-02-21T09:32:57.7313295Z cvt.u64.u32 %rd262, %r475; 2026-02-21T09:32:57.7313358Z 
shl.b64 %rd263, %rd262, 32; 2026-02-21T09:32:57.7313413Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:32:57.7313572Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7313628Z mov.b64 {%r824, %r825}, %rd264; 2026-02-21T09:32:57.7313695Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T09:32:57.7313853Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7313954Z cvt.u64.u32 %rd265, %r476; 2026-02-21T09:32:57.7314020Z cvt.u64.u32 %rd266, %r477; 2026-02-21T09:32:57.7314074Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:32:57.7314132Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:32:57.7314303Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7314360Z mov.b64 {%r827, %r828}, %rd268; 2026-02-21T09:32:57.7314419Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T09:32:57.7314585Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7314649Z cvt.u64.u32 %rd269, %r478; 2026-02-21T09:32:57.7314739Z cvt.u64.u32 %rd270, %r479; 2026-02-21T09:32:57.7314795Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:32:57.7314862Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:32:57.7315029Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7315088Z mov.b64 {%r830, %r831}, %rd272; 2026-02-21T09:32:57.7315154Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T09:32:57.7315320Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7315376Z cvt.u64.u32 %rd273, %r480; 2026-02-21T09:32:57.7315432Z cvt.u64.u32 %rd274, %r481; 2026-02-21T09:32:57.7315497Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:32:57.7315554Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:32:57.7315717Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7315783Z mov.b64 {%r833, %r834}, %rd276; 2026-02-21T09:32:57.7315844Z 
cvt.rn.f16x2.f32 %r835, %r834, %r833; 2026-02-21T09:32:57.7316039Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7316129Z cvt.u64.u32 %rd277, %r482; 2026-02-21T09:32:57.7316188Z cvt.u64.u32 %rd278, %r483; 2026-02-21T09:32:57.7316245Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:32:57.7316301Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:32:57.7316467Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7316524Z mov.b64 {%r836, %r837}, %rd280; 2026-02-21T09:32:57.7316585Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T09:32:57.7316757Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7316813Z cvt.u64.u32 %rd281, %r484; 2026-02-21T09:32:57.7316868Z cvt.u64.u32 %rd282, %r485; 2026-02-21T09:32:57.7316931Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:32:57.7316988Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:32:57.7317147Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7317206Z mov.b64 {%r839, %r840}, %rd284; 2026-02-21T09:32:57.7317274Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T09:32:57.7317433Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7317490Z cvt.u64.u32 %rd285, %r486; 2026-02-21T09:32:57.7317551Z cvt.u64.u32 %rd286, %r487; 2026-02-21T09:32:57.7317605Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:32:57.7317661Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:32:57.7317824Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7317880Z mov.b64 {%r842, %r843}, %rd288; 2026-02-21T09:32:57.7317940Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T09:32:57.7318105Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7318170Z cvt.u64.u32 %rd289, %r488; 2026-02-21T09:32:57.7318223Z cvt.u64.u32 %rd290, %r489; 2026-02-21T09:32:57.7318279Z 
shl.b64 %rd291, %rd290, 32; 2026-02-21T09:32:57.7318342Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:32:57.7318554Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7318612Z mov.b64 {%r845, %r846}, %rd292; 2026-02-21T09:32:57.7318677Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T09:32:57.7318839Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7318896Z cvt.u64.u32 %rd293, %r491; 2026-02-21T09:32:57.7318951Z cvt.u64.u32 %rd294, %r492; 2026-02-21T09:32:57.7319014Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:32:57.7319071Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:32:57.7319233Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7319296Z mov.b64 {%r848, %r849}, %rd296; 2026-02-21T09:32:57.7319356Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:32:57.7319522Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7319587Z cvt.u64.u32 %rd297, %r493; 2026-02-21T09:32:57.7319642Z cvt.u64.u32 %rd298, %r494; 2026-02-21T09:32:57.7319697Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:32:57.7319753Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:32:57.7319923Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7319978Z mov.b64 {%r851, %r852}, %rd300; 2026-02-21T09:32:57.7320038Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:32:57.7320207Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7320263Z cvt.u64.u32 %rd301, %r495; 2026-02-21T09:32:57.7320319Z cvt.u64.u32 %rd302, %r496; 2026-02-21T09:32:57.7320383Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:32:57.7320439Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:32:57.7320651Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7320711Z mov.b64 {%r854, %r855}, %rd304; 2026-02-21T09:32:57.7320781Z 
cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:32:57.7320948Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7321004Z cvt.u64.u32 %rd305, %r497; 2026-02-21T09:32:57.7321068Z cvt.u64.u32 %rd306, %r498; 2026-02-21T09:32:57.7321124Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:32:57.7321180Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:32:57.7321353Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7321411Z mov.b64 {%r857, %r858}, %rd308; 2026-02-21T09:32:57.7321473Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:32:57.7321637Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7321706Z cvt.u64.u32 %rd309, %r499; 2026-02-21T09:32:57.7321763Z cvt.u64.u32 %rd310, %r500; 2026-02-21T09:32:57.7321820Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:32:57.7321887Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:32:57.7322051Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7322108Z mov.b64 {%r860, %r861}, %rd312; 2026-02-21T09:32:57.7322174Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:32:57.7322338Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7322394Z cvt.u64.u32 %rd313, %r501; 2026-02-21T09:32:57.7322448Z cvt.u64.u32 %rd314, %r502; 2026-02-21T09:32:57.7322512Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:32:57.7322569Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:32:57.7322732Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7322797Z mov.b64 {%r863, %r864}, %rd316; 2026-02-21T09:32:57.7322858Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:32:57.7323045Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7323131Z cvt.u64.u32 %rd317, %r503; 2026-02-21T09:32:57.7323186Z cvt.u64.u32 %rd318, %r504; 2026-02-21T09:32:57.7323242Z 
shl.b64 %rd319, %rd318, 32; 2026-02-21T09:32:57.7323298Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:32:57.7323473Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7323528Z mov.b64 {%r866, %r867}, %rd320; 2026-02-21T09:32:57.7323588Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:32:57.7323757Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7323813Z cvt.u64.u32 %rd321, %r505; 2026-02-21T09:32:57.7323867Z cvt.u64.u32 %rd322, %r506; 2026-02-21T09:32:57.7323928Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:32:57.7323986Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:32:57.7324151Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7324210Z mov.b64 {%r869, %r870}, %rd324; 2026-02-21T09:32:57.7324276Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:32:57.7324442Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7324499Z cvt.u64.u32 %rd325, %r508; 2026-02-21T09:32:57.7324561Z cvt.u64.u32 %rd326, %r509; 2026-02-21T09:32:57.7324616Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:32:57.7324707Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:32:57.7324877Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7324933Z mov.b64 {%r872, %r873}, %rd328; 2026-02-21T09:32:57.7324993Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:32:57.7325219Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7325286Z cvt.u64.u32 %rd329, %r510; 2026-02-21T09:32:57.7325343Z cvt.u64.u32 %rd330, %r511; 2026-02-21T09:32:57.7325399Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:32:57.7325464Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:32:57.7325629Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7325685Z mov.b64 {%r875, %r876}, %rd332; 2026-02-21T09:32:57.7325750Z 
cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:32:57.7325911Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7325966Z cvt.u64.u32 %rd333, %r512; 2026-02-21T09:32:57.7326022Z cvt.u64.u32 %rd334, %r513; 2026-02-21T09:32:57.7326084Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:32:57.7326141Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:32:57.7326308Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7326373Z mov.b64 {%r878, %r879}, %rd336; 2026-02-21T09:32:57.7326434Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:32:57.7326597Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7326658Z cvt.u64.u32 %rd337, %r514; 2026-02-21T09:32:57.7326714Z cvt.u64.u32 %rd338, %r515; 2026-02-21T09:32:57.7326769Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:32:57.7326825Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:32:57.7326997Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7327053Z mov.b64 {%r881, %r882}, %rd340; 2026-02-21T09:32:57.7327113Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:32:57.7327285Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7327341Z cvt.u64.u32 %rd341, %r516; 2026-02-21T09:32:57.7327397Z cvt.u64.u32 %rd342, %r517; 2026-02-21T09:32:57.7327461Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:32:57.7327546Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:32:57.7327737Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7327793Z mov.b64 {%r884, %r885}, %rd344; 2026-02-21T09:32:57.7327859Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:32:57.7328025Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7328080Z cvt.u64.u32 %rd345, %r518; 2026-02-21T09:32:57.7328144Z cvt.u64.u32 %rd346, %r519; 2026-02-21T09:32:57.7328199Z 
shl.b64 %rd347, %rd346, 32; 2026-02-21T09:32:57.7328254Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:32:57.7328423Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7328480Z mov.b64 {%r887, %r888}, %rd348; 2026-02-21T09:32:57.7328541Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:32:57.7328704Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7328769Z cvt.u64.u32 %rd349, %r520; 2026-02-21T09:32:57.7328826Z cvt.u64.u32 %rd350, %r521; 2026-02-21T09:32:57.7328882Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:32:57.7328944Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:32:57.7329106Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7329163Z mov.b64 {%r890, %r891}, %rd352; 2026-02-21T09:32:57.7329229Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:32:57.7329393Z .loc 1 53 52 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:53:52 2026-02-21T09:32:57.7329448Z cvt.u64.u32 %rd353, %r522; 2026-02-21T09:32:57.7329503Z cvt.u64.u32 %rd354, %r523; 2026-02-21T09:32:57.7329567Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:32:57.7329644Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:32:57.7329831Z .loc 1 55 27 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:55:27 2026-02-21T09:32:57.7329901Z mov.b64 {%r893, %r894}, %rd356; 2026-02-21T09:32:57.7329963Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:32:57.7330129Z .loc 1 56 82 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:56:82 2026-02-21T09:32:57.7330232Z st.shared.v4.b32 [%r11], {%r706, %r718, %r730, %r742}; 2026-02-21T09:32:57.7330329Z st.shared.v4.b32 [%r12], {%r754, %r766, %r778, %r790}; 2026-02-21T09:32:57.7330414Z st.shared.v4.b32 [%r13], {%r802, %r814, %r826, %r838}; 2026-02-21T09:32:57.7330498Z st.shared.v4.b32 [%r14], {%r850, %r862, %r874, %r886}; 2026-02-21T09:32:57.7330561Z bar.sync 0; 2026-02-21T09:32:57.7330617Z // begin inline asm 
2026-02-21T09:32:57.7330769Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r605, %r609, %r613, %r617}, [%r529]; 2026-02-21T09:32:57.7330831Z // end inline asm 2026-02-21T09:32:57.7330886Z // begin inline asm 2026-02-21T09:32:57.7331035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r625, %r629, %r633}, [%r534]; 2026-02-21T09:32:57.7331098Z // end inline asm 2026-02-21T09:32:57.7331153Z // begin inline asm 2026-02-21T09:32:57.7331290Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r641, %r645, %r649}, [%r539]; 2026-02-21T09:32:57.7331343Z // end inline asm 2026-02-21T09:32:57.7331404Z // begin inline asm 2026-02-21T09:32:57.7331536Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r657, %r661, %r665}, [%r544]; 2026-02-21T09:32:57.7331588Z // end inline asm 2026-02-21T09:32:57.7331646Z bar.sync 0; 2026-02-21T09:32:57.7331730Z st.shared.v4.b32 [%r11], {%r709, %r721, %r733, %r745}; 2026-02-21T09:32:57.7331813Z st.shared.v4.b32 [%r12], {%r757, %r769, %r781, %r793}; 2026-02-21T09:32:57.7331894Z st.shared.v4.b32 [%r13], {%r805, %r817, %r829, %r841}; 2026-02-21T09:32:57.7331981Z st.shared.v4.b32 [%r14], {%r853, %r865, %r877, %r889}; 2026-02-21T09:32:57.7332033Z bar.sync 0; 2026-02-21T09:32:57.7332088Z // begin inline asm 2026-02-21T09:32:57.7332234Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r606, %r610, %r614, %r618}, [%r529]; 2026-02-21T09:32:57.7332334Z // end inline asm 2026-02-21T09:32:57.7332388Z // begin inline asm 2026-02-21T09:32:57.7332530Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r622, %r626, %r630, %r634}, [%r534]; 2026-02-21T09:32:57.7332584Z // end inline asm 2026-02-21T09:32:57.7332637Z // begin inline asm 2026-02-21T09:32:57.7332770Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r638, %r642, %r646, %r650}, [%r539]; 2026-02-21T09:32:57.7332829Z // end inline asm 2026-02-21T09:32:57.7332883Z // begin inline asm 2026-02-21T09:32:57.7333014Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r654, %r658, %r662, %r666}, [%r544]; 2026-02-21T09:32:57.7333073Z // end 
inline asm 2026-02-21T09:32:57.7333127Z bar.sync 0; 2026-02-21T09:32:57.7333209Z st.shared.v4.b32 [%r11], {%r712, %r724, %r736, %r748}; 2026-02-21T09:32:57.7333292Z st.shared.v4.b32 [%r12], {%r760, %r772, %r784, %r796}; 2026-02-21T09:32:57.7333381Z st.shared.v4.b32 [%r13], {%r808, %r820, %r832, %r844}; 2026-02-21T09:32:57.7333462Z st.shared.v4.b32 [%r14], {%r856, %r868, %r880, %r892}; 2026-02-21T09:32:57.7333514Z bar.sync 0; 2026-02-21T09:32:57.7333574Z // begin inline asm 2026-02-21T09:32:57.7333709Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r607, %r611, %r615, %r619}, [%r529]; 2026-02-21T09:32:57.7333760Z // end inline asm 2026-02-21T09:32:57.7333820Z // begin inline asm 2026-02-21T09:32:57.7333955Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r623, %r627, %r631, %r635}, [%r534]; 2026-02-21T09:32:57.7334007Z // end inline asm 2026-02-21T09:32:57.7334059Z // begin inline asm 2026-02-21T09:32:57.7334202Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r639, %r643, %r647, %r651}, [%r539]; 2026-02-21T09:32:57.7334255Z // end inline asm 2026-02-21T09:32:57.7334308Z // begin inline asm 2026-02-21T09:32:57.7334452Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r655, %r659, %r663, %r667}, [%r544]; 2026-02-21T09:32:57.7334526Z // end inline asm 2026-02-21T09:32:57.7334600Z bar.sync 0; 2026-02-21T09:32:57.7334710Z st.shared.v4.b32 [%r11], {%r715, %r727, %r739, %r751}; 2026-02-21T09:32:57.7334801Z st.shared.v4.b32 [%r12], {%r763, %r775, %r787, %r799}; 2026-02-21T09:32:57.7334883Z st.shared.v4.b32 [%r13], {%r811, %r823, %r835, %r847}; 2026-02-21T09:32:57.7334964Z st.shared.v4.b32 [%r14], {%r859, %r871, %r883, %r895}; 2026-02-21T09:32:57.7335023Z bar.sync 0; 2026-02-21T09:32:57.7335075Z // begin inline asm 2026-02-21T09:32:57.7335215Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r608, %r612, %r616, %r620}, [%r529]; 2026-02-21T09:32:57.7335274Z // end inline asm 2026-02-21T09:32:57.7335326Z // begin inline asm 2026-02-21T09:32:57.7335464Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r624, 
%r628, %r632, %r636}, [%r534]; 2026-02-21T09:32:57.7335514Z // end inline asm 2026-02-21T09:32:57.7335574Z // begin inline asm 2026-02-21T09:32:57.7335712Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r640, %r644, %r648, %r652}, [%r539]; 2026-02-21T09:32:57.7335766Z // end inline asm 2026-02-21T09:32:57.7335829Z // begin inline asm 2026-02-21T09:32:57.7335966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r656, %r660, %r664, %r668}, [%r544]; 2026-02-21T09:32:57.7336017Z // end inline asm 2026-02-21T09:32:57.7336077Z // begin inline asm 2026-02-21T09:32:57.7336176Z st.global.v4.b32 [ %rd85 + 0 ], { %r605, %r606, %r607, %r608 }; 2026-02-21T09:32:57.7336228Z // end inline asm 2026-02-21T09:32:57.7336280Z // begin inline asm 2026-02-21T09:32:57.7336384Z st.global.v4.b32 [ %rd86 + 0 ], { %r609, %r610, %r611, %r612 }; 2026-02-21T09:32:57.7336436Z // end inline asm 2026-02-21T09:32:57.7336490Z // begin inline asm 2026-02-21T09:32:57.7336591Z st.global.v4.b32 [ %rd87 + 0 ], { %r613, %r614, %r615, %r616 }; 2026-02-21T09:32:57.7336643Z // end inline asm 2026-02-21T09:32:57.7336696Z // begin inline asm 2026-02-21T09:32:57.7336787Z st.global.v4.b32 [ %rd88 + 0 ], { %r617, %r618, %r619, %r620 }; 2026-02-21T09:32:57.7336849Z // end inline asm 2026-02-21T09:32:57.7336904Z // begin inline asm 2026-02-21T09:32:57.7337043Z st.global.v4.b32 [ %rd89 + 0 ], { %r621, %r622, %r623, %r624 }; 2026-02-21T09:32:57.7337137Z // end inline asm 2026-02-21T09:32:57.7337194Z // begin inline asm 2026-02-21T09:32:57.7337292Z st.global.v4.b32 [ %rd90 + 0 ], { %r625, %r626, %r627, %r628 }; 2026-02-21T09:32:57.7337347Z // end inline asm 2026-02-21T09:32:57.7337411Z // begin inline asm 2026-02-21T09:32:57.7337504Z st.global.v4.b32 [ %rd91 + 0 ], { %r629, %r630, %r631, %r632 }; 2026-02-21T09:32:57.7337559Z // end inline asm 2026-02-21T09:32:57.7337623Z // begin inline asm 2026-02-21T09:32:57.7337717Z st.global.v4.b32 [ %rd92 + 0 ], { %r633, %r634, %r635, %r636 }; 2026-02-21T09:32:57.7337773Z // end inline asm 
2026-02-21T09:32:57.7337839Z // begin inline asm 2026-02-21T09:32:57.7337936Z st.global.v4.b32 [ %rd93 + 0 ], { %r637, %r638, %r639, %r640 }; 2026-02-21T09:32:57.7337995Z // end inline asm 2026-02-21T09:32:57.7338055Z // begin inline asm 2026-02-21T09:32:57.7338165Z st.global.v4.b32 [ %rd94 + 0 ], { %r641, %r642, %r643, %r644 }; 2026-02-21T09:32:57.7338223Z // end inline asm 2026-02-21T09:32:57.7338279Z // begin inline asm 2026-02-21T09:32:57.7338384Z st.global.v4.b32 [ %rd95 + 0 ], { %r645, %r646, %r647, %r648 }; 2026-02-21T09:32:57.7338444Z // end inline asm 2026-02-21T09:32:57.7338502Z // begin inline asm 2026-02-21T09:32:57.7338599Z st.global.v4.b32 [ %rd96 + 0 ], { %r649, %r650, %r651, %r652 }; 2026-02-21T09:32:57.7338670Z // end inline asm 2026-02-21T09:32:57.7338729Z // begin inline asm 2026-02-21T09:32:57.7338823Z st.global.v4.b32 [ %rd97 + 0 ], { %r653, %r654, %r655, %r656 }; 2026-02-21T09:32:57.7338885Z // end inline asm 2026-02-21T09:32:57.7338941Z // begin inline asm 2026-02-21T09:32:57.7339033Z st.global.v4.b32 [ %rd98 + 0 ], { %r657, %r658, %r659, %r660 }; 2026-02-21T09:32:57.7339087Z // end inline asm 2026-02-21T09:32:57.7339150Z // begin inline asm 2026-02-21T09:32:57.7339277Z st.global.v4.b32 [ %rd99 + 0 ], { %r661, %r662, %r663, %r664 }; 2026-02-21T09:32:57.7339360Z // end inline asm 2026-02-21T09:32:57.7339431Z // begin inline asm 2026-02-21T09:32:57.7339536Z st.global.v4.b32 [ %rd100 + 0 ], { %r665, %r666, %r667, %r668 }; 2026-02-21T09:32:57.7339592Z // end inline asm 2026-02-21T09:32:57.7339684Z $L__BB0_8: // %._crit_edge 2026-02-21T09:32:57.7339869Z .loc 1 27 4 // cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py:27:4 2026-02-21T09:32:57.7339924Z bar.sync 0; 2026-02-21T09:32:57.7339979Z // begin inline asm 2026-02-21T09:32:57.7340107Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r896, 128; 2026-02-21T09:32:57.7340163Z // end inline asm 2026-02-21T09:32:57.7340216Z ret; 2026-02-21T09:32:57.7340277Z $L__tmp0: 
2026-02-21T09:32:57.7340332Z $L__func_end0: 2026-02-21T09:32:57.7340418Z // -- End function 2026-02-21T09:32:57.7340468Z } 2026-02-21T09:32:57.7340698Z .file 1 "/tmp/torchinductor_root/uo/cuoxysrhmfjtdko6yhdwopo7dxy4nol3clmiimuszzbb54dkhk5e.py" 2026-02-21T09:32:57.7340764Z .section .debug_abbrev 2026-02-21T09:32:57.7340815Z { 2026-02-21T09:32:57.7340912Z .b8 1 // Abbreviation Code 2026-02-21T09:32:57.7341001Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:32:57.7341082Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:32:57.7341169Z .b8 37 // DW_AT_producer 2026-02-21T09:32:57.7341245Z .b8 8 // DW_FORM_string 2026-02-21T09:32:57.7341318Z .b8 19 // DW_AT_language 2026-02-21T09:32:57.7341395Z .b8 5 // DW_FORM_data2 2026-02-21T09:32:57.7341477Z .b8 3 // DW_AT_name 2026-02-21T09:32:57.7341551Z .b8 8 // DW_FORM_string 2026-02-21T09:32:57.7341631Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:32:57.7341716Z .b8 6 // DW_FORM_data4 2026-02-21T09:32:57.7341818Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:32:57.7341938Z .b8 8 // DW_FORM_string 2026-02-21T09:32:57.7342017Z .b8 0 // EOM(1) 2026-02-21T09:32:57.7342087Z .b8 0 // EOM(2) 2026-02-21T09:32:57.7342157Z .b8 0 // EOM(3) 2026-02-21T09:32:57.7342208Z } 2026-02-21T09:32:57.7342274Z .section .debug_info 2026-02-21T09:32:57.7342324Z { 2026-02-21T09:32:57.7342406Z .b32 104 // Length of Unit 2026-02-21T09:32:57.7342498Z .b8 2 // DWARF version number 2026-02-21T09:32:57.7342550Z .b8 0 2026-02-21T09:32:57.7342668Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:32:57.7342763Z .b8 8 // Address Size (in bytes) 2026-02-21T09:32:57.7342865Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:32:57.7342947Z .b8 116 // DW_AT_producer 2026-02-21T09:32:57.7342999Z .b8 114 2026-02-21T09:32:57.7343060Z .b8 105 2026-02-21T09:32:57.7343111Z .b8 116 2026-02-21T09:32:57.7343160Z .b8 111 2026-02-21T09:32:57.7343217Z .b8 110 2026-02-21T09:32:57.7343266Z .b8 0 2026-02-21T09:32:57.7343341Z .b8 2 // DW_AT_language 2026-02-21T09:32:57.7343392Z .b8 0 2026-02-21T09:32:57.7343476Z .b8 99 // DW_AT_name 2026-02-21T09:32:57.7343527Z .b8 117 2026-02-21T09:32:57.7343579Z .b8 111 2026-02-21T09:32:57.7343636Z .b8 120 2026-02-21T09:32:57.7343685Z .b8 121 2026-02-21T09:32:57.7343735Z .b8 115 2026-02-21T09:32:57.7343785Z .b8 114 2026-02-21T09:32:57.7343843Z .b8 104 2026-02-21T09:32:57.7343892Z .b8 109 2026-02-21T09:32:57.7343943Z .b8 102 2026-02-21T09:32:57.7344000Z .b8 106 2026-02-21T09:32:57.7344073Z .b8 116 2026-02-21T09:32:57.7344124Z .b8 100 2026-02-21T09:32:57.7344197Z .b8 107 2026-02-21T09:32:57.7344258Z .b8 111 2026-02-21T09:32:57.7344311Z .b8 54 2026-02-21T09:32:57.7344361Z .b8 121 2026-02-21T09:32:57.7344413Z .b8 104 2026-02-21T09:32:57.7344472Z .b8 100 2026-02-21T09:32:57.7344522Z .b8 119 2026-02-21T09:32:57.7344572Z .b8 111 2026-02-21T09:32:57.7344628Z .b8 112 2026-02-21T09:32:57.7344709Z .b8 111 2026-02-21T09:32:57.7344761Z .b8 55 2026-02-21T09:32:57.7344813Z .b8 100 2026-02-21T09:32:57.7344871Z .b8 120 2026-02-21T09:32:57.7344923Z .b8 121 2026-02-21T09:32:57.7344973Z .b8 52 2026-02-21T09:32:57.7345035Z .b8 110 2026-02-21T09:32:57.7345087Z .b8 111 2026-02-21T09:32:57.7345139Z .b8 108 2026-02-21T09:32:57.7345190Z .b8 51 2026-02-21T09:32:57.7345262Z .b8 99 2026-02-21T09:32:57.7345311Z .b8 108 2026-02-21T09:32:57.7345359Z .b8 109 2026-02-21T09:32:57.7345415Z .b8 105 2026-02-21T09:32:57.7345464Z .b8 105 2026-02-21T09:32:57.7345512Z .b8 109 2026-02-21T09:32:57.7345560Z .b8 117 2026-02-21T09:32:57.7345617Z .b8 115 
2026-02-21T09:32:57.7345666Z .b8 122 2026-02-21T09:32:57.7345717Z .b8 122 2026-02-21T09:32:57.7345769Z .b8 98 2026-02-21T09:32:57.7345832Z .b8 98 2026-02-21T09:32:57.7345882Z .b8 53 2026-02-21T09:32:57.7345931Z .b8 52 2026-02-21T09:32:57.7345989Z .b8 100 2026-02-21T09:32:57.7346039Z .b8 107 2026-02-21T09:32:57.7346086Z .b8 104 2026-02-21T09:32:57.7346134Z .b8 107 2026-02-21T09:32:57.7346188Z .b8 53 2026-02-21T09:32:57.7346237Z .b8 101 2026-02-21T09:32:57.7346284Z .b8 46 2026-02-21T09:32:57.7346338Z .b8 112 2026-02-21T09:32:57.7346386Z .b8 121 2026-02-21T09:32:57.7346434Z .b8 0 2026-02-21T09:32:57.7346523Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:32:57.7346604Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:32:57.7346654Z .b8 116 2026-02-21T09:32:57.7346703Z .b8 109 2026-02-21T09:32:57.7346758Z .b8 112 2026-02-21T09:32:57.7346807Z .b8 47 2026-02-21T09:32:57.7346854Z .b8 116 2026-02-21T09:32:57.7346901Z .b8 111 2026-02-21T09:32:57.7346960Z .b8 114 2026-02-21T09:32:57.7347008Z .b8 99 2026-02-21T09:32:57.7347058Z .b8 104 2026-02-21T09:32:57.7347139Z .b8 105 2026-02-21T09:32:57.7347220Z .b8 110 2026-02-21T09:32:57.7347269Z .b8 100 2026-02-21T09:32:57.7347317Z .b8 117 2026-02-21T09:32:57.7347373Z .b8 99 2026-02-21T09:32:57.7347421Z .b8 116 2026-02-21T09:32:57.7347470Z .b8 111 2026-02-21T09:32:57.7347518Z .b8 114 2026-02-21T09:32:57.7347574Z .b8 95 2026-02-21T09:32:57.7347625Z .b8 114 2026-02-21T09:32:57.7347674Z .b8 111 2026-02-21T09:32:57.7347728Z .b8 111 2026-02-21T09:32:57.7347777Z .b8 116 2026-02-21T09:32:57.7347825Z .b8 47 2026-02-21T09:32:57.7347873Z .b8 117 2026-02-21T09:32:57.7347929Z .b8 111 2026-02-21T09:32:57.7347977Z .b8 0 2026-02-21T09:32:57.7348024Z } 2026-02-21T09:32:57.7348094Z .section .debug_macinfo { } 2026-02-21T09:32:57.7348099Z 2026-02-21T09:32:57.7348173Z ================================================================ 2026-02-21T09:32:57.7348272Z please share the reproducer above with Triton project. 
2026-02-21T09:32:58.1716078Z [129s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T09:32:58.1716385Z
2026-02-21T09:32:58.1720152Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True)
2026-02-21T09:32:58.1721246Z
2026-02-21T09:32:58.1721470Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T09:32:58.1721719Z `ptxas` stderr:
2026-02-21T09:32:58.1722140Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 191 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T09:32:58.1722952Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T09:32:58.1723122Z
2026-02-21T09:32:58.1723538Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpldkv1flg.ptx -o /tmp/tmpldkv1flg.ptx.o
2026-02-21T09:32:58.1723987Z
2026-02-21T09:32:58.1724116Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T09:32:58.1724309Z
2026-02-21T09:32:58.1724393Z ================================================================
2026-02-21T09:32:58.1724632Z Internal Triton PTX codegen error
2026-02-21T09:32:58.1725045Z `ptxas` stderr:
2026-02-21T09:32:58.1725454Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 191 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T09:32:58.1725916Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:32:58.1726069Z 2026-02-21T09:32:58.1726435Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpldkv1flg.ptx -o /tmp/tmpldkv1flg.ptx.o 2026-02-21T09:32:58.1726875Z 2026-02-21T09:32:58.1726879Z 2026-02-21T09:32:58.1726950Z // 2026-02-21T09:32:58.1727093Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:32:58.1727270Z // 2026-02-21T09:32:58.1727334Z 2026-02-21T09:32:58.1727388Z .version 8.7 2026-02-21T09:32:58.1727523Z .target sm_100a 2026-02-21T09:32:58.1727650Z .address_size 64 2026-02-21T09:32:58.1727737Z 2026-02-21T09:32:58.1727852Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:32:58.1728102Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:32:58.1728305Z // @_helion_matmul 2026-02-21T09:32:58.1728503Z .visible .entry _helion_matmul( 2026-02-21T09:32:58.1728710Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:32:58.1728974Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:32:58.1729219Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:32:58.1729558Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:32:58.1729853Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:32:58.1730051Z ) 2026-02-21T09:32:58.1730177Z .reqntid 128 2026-02-21T09:32:58.1730302Z .maxnreg 32 2026-02-21T09:32:58.1730427Z { 2026-02-21T09:32:58.1730547Z .reg .pred %p<80>; 2026-02-21T09:32:58.1730701Z .reg .b32 %r<904>; 2026-02-21T09:32:58.1730844Z .reg .b64 %rd<359>; 2026-02-21T09:32:58.1730996Z $L__func_begin0: 2026-02-21T09:32:58.1731079Z 2026-02-21T09:32:58.1731134Z // %bb.0: 2026-02-21T09:32:58.1731392Z .loc 1 19 0 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:19 2026-02-21T09:32:58.1731692Z mov.u32 %r1, %tid.x; 2026-02-21T09:32:58.1731870Z 
ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T09:32:58.1732084Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:32:58.1732252Z mov.b32 %r52, global_smem; 2026-02-21T09:32:58.1732425Z // begin inline asm 2026-02-21T09:32:58.1732677Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r52], 128; 2026-02-21T09:32:58.1732955Z // end inline asm 2026-02-21T09:32:58.1733130Z ld.param.b64 %rd29, [_helion_matmul_param_3]; 2026-02-21T09:32:58.1733311Z bar.sync 0; 2026-02-21T09:32:58.1733457Z ld.shared.b32 %r896, [global_smem]; 2026-02-21T09:32:58.1733629Z bar.sync 0; 2026-02-21T09:32:58.1733752Z // begin inline asm 2026-02-21T09:32:58.1733952Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:32:58.1734165Z // end inline asm 2026-02-21T09:32:58.1734414Z .loc 1 21 67 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:21:67 2026-02-21T09:32:58.1734737Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:32:58.1734894Z mov.u32 %r61, %ctaid.y; 2026-02-21T09:32:58.1735042Z mov.u32 %r62, %ctaid.z; 2026-02-21T09:32:58.1735195Z mov.u32 %r63, %nctaid.x; 2026-02-21T09:32:58.1735401Z mov.u32 %r64, %nctaid.y; 2026-02-21T09:32:58.1735582Z mad.lo.s32 %r65, %r62, %r64, %r61; 2026-02-21T09:32:58.1735766Z mad.lo.s32 %r66, %r65, %r63, %r3; 2026-02-21T09:32:58.1735927Z shl.b32 %r67, %r66, 7; 2026-02-21T09:32:58.1736081Z cvt.s64.s32 %rd30, %r67; 2026-02-21T09:32:58.1736232Z add.s64 %rd26, %rd29, %rd30; 2026-02-21T09:32:58.1736393Z shl.b32 %r68, %r1, 2; 2026-02-21T09:32:58.1736535Z add.s32 %r53, %r52, %r68; 2026-02-21T09:32:58.1736688Z mov.b32 %r70, 0; 2026-02-21T09:32:58.1736819Z // begin inline asm 2026-02-21T09:32:58.1736972Z @%p1 st.shared.b32 [ %r53 + 0 ], %r70; 2026-02-21T09:32:58.1737144Z // end inline asm 2026-02-21T09:32:58.1737277Z bar.warp.sync -1; 2026-02-21T09:32:58.1737421Z setp.eq.b32 %p73, %r1, 0; 2026-02-21T09:32:58.1737569Z cvt.u64.u32 %rd11, %r52; 2026-02-21T09:32:58.1737718Z // begin inline asm 2026-02-21T09:32:58.1737962Z @%p73 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:32:58.1738243Z // end inline asm 2026-02-21T09:32:58.1738372Z // begin inline asm 2026-02-21T09:32:58.1738597Z @%p73 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:32:58.1738851Z // end inline asm 2026-02-21T09:32:58.1738978Z mov.b32 %r55, 32; 2026-02-21T09:32:58.1739112Z // begin inline asm 2026-02-21T09:32:58.1739339Z @%p73 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r55; 2026-02-21T09:32:58.1739601Z // end inline asm 2026-02-21T09:32:58.1739728Z mov.b32 %r56, 128; 2026-02-21T09:32:58.1739865Z // begin inline asm 2026-02-21T09:32:58.1740082Z @%p73 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r56; 2026-02-21T09:32:58.1740339Z // end inline asm 2026-02-21T09:32:58.1740478Z mov.b32 %r57, 1024; 2026-02-21T09:32:58.1740615Z // begin inline asm 2026-02-21T09:32:58.1740854Z @%p73 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r57; 2026-02-21T09:32:58.1741126Z // end inline asm 2026-02-21T09:32:58.1741266Z // begin inline asm 2026-02-21T09:32:58.1741496Z @%p73 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r57; 2026-02-21T09:32:58.1741859Z // end inline asm 2026-02-21T09:32:58.1741989Z mov.b64 %rd19, 2048; 2026-02-21T09:32:58.1742136Z // begin inline asm 2026-02-21T09:32:58.1742386Z @%p73 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:32:58.1742656Z // end inline asm 2026-02-21T09:32:58.1742793Z mov.b32 %r59, 1; 2026-02-21T09:32:58.1742927Z // begin inline asm 2026-02-21T09:32:58.1743184Z @%p73 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r59; 2026-02-21T09:32:58.1743458Z // end inline asm 2026-02-21T09:32:58.1743596Z // begin inline asm 2026-02-21T09:32:58.1743844Z @%p73 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ 
%rd11 + 0 ], 0x1, %r59; 2026-02-21T09:32:58.1744114Z // end inline asm 2026-02-21T09:32:58.1744250Z // begin inline asm 2026-02-21T09:32:58.1744479Z @%p73 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:32:58.1744793Z // end inline asm 2026-02-21T09:32:58.1744924Z // begin inline asm 2026-02-21T09:32:58.1745177Z @%p73 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:32:58.1745458Z // end inline asm 2026-02-21T09:32:58.1745588Z // begin inline asm 2026-02-21T09:32:58.1745822Z @%p73 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x2; 2026-02-21T09:32:58.1746096Z // end inline asm 2026-02-21T09:32:58.1746235Z // begin inline asm 2026-02-21T09:32:58.1746466Z @%p73 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:32:58.1746730Z // end inline asm 2026-02-21T09:32:58.1746862Z // begin inline asm 2026-02-21T09:32:58.1747215Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:32:58.1747628Z // end inline asm 2026-02-21T09:32:58.1747791Z // begin inline asm 2026-02-21T09:32:58.1748007Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T09:32:58.1748256Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:32:58.1748450Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:32:58.1748621Z // end inline asm 2026-02-21T09:32:58.1748755Z bar.sync 0; 2026-02-21T09:32:58.1748892Z cvta.global.u64 %rd61, %rd26; 2026-02-21T09:32:58.1749183Z .loc 1 27 129 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:27:129 2026-02-21T09:32:58.1749488Z setp.gt.u32 %p21, %r3, 767; 2026-02-21T09:32:58.1749649Z @%p21 bra $L__BB0_8; 2026-02-21T09:32:58.1749816Z // %bb.1: // %.lr.ph 2026-02-21T09:32:58.1750121Z .loc 1 0 129 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:0:129 2026-02-21T09:32:58.1750438Z ld.param.b64 
%rd9, [_helion_matmul_param_0]; 2026-02-21T09:32:58.1750744Z .loc 1 47 48 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:47:48 2026-02-21T09:32:58.1751036Z and.b32 %r252, %r1, 3; 2026-02-21T09:32:58.1751200Z shl.b32 %r253, %r252, 3; 2026-02-21T09:32:58.1751447Z .loc 1 39 45 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:39:45 2026-02-21T09:32:58.1751723Z shl.b32 %r254, %r1, 3; 2026-02-21T09:32:58.1751869Z and.b32 %r255, %r254, 120; 2026-02-21T09:32:58.1752025Z shr.u32 %r256, %r1, 4; 2026-02-21T09:32:58.1752173Z bfe.u32 %r257, %r1, 4, 3; 2026-02-21T09:32:58.1752435Z .loc 1 40 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:40:27 2026-02-21T09:32:58.1752718Z shl.b32 %r258, %r3, 4; 2026-02-21T09:32:58.1752865Z and.b32 %r259, %r258, 16256; 2026-02-21T09:32:58.1753130Z .loc 1 39 45 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:39:45 2026-02-21T09:32:58.1753401Z or.b32 %r260, %r256, %r259; 2026-02-21T09:32:58.1753559Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:32:58.1753702Z shr.u32 %r261, %r1, 5; 2026-02-21T09:32:58.1753877Z shl.b32 %r262, %r1, 4; 2026-02-21T09:32:58.1754048Z and.b32 %r263, %r262, 2032; 2026-02-21T09:32:58.1754204Z and.b32 %r264, %r1, 24; 2026-02-21T09:32:58.1754347Z shl.b32 %r265, %r264, 1; 2026-02-21T09:32:58.1754503Z xor.b32 %r5, %r263, %r265; 2026-02-21T09:32:58.1754661Z add.s32 %r211, %r52, %r5; 2026-02-21T09:32:58.1754844Z add.s32 %r213, %r211, 2048; 2026-02-21T09:32:58.1754999Z add.s32 %r215, %r211, 4096; 2026-02-21T09:32:58.1755142Z add.s32 %r217, %r211, 6144; 2026-02-21T09:32:58.1755293Z add.s32 %r224, %r211, 8192; 2026-02-21T09:32:58.1755440Z add.s32 %r226, %r211, 10240; 2026-02-21T09:32:58.1755595Z add.s32 %r228, %r211, 12288; 2026-02-21T09:32:58.1755740Z add.s32 %r230, %r211, 14336; 2026-02-21T09:32:58.1755895Z add.s32 %r237, %r211, 16384; 2026-02-21T09:32:58.1756047Z add.s32 %r239, %r211, 18432; 2026-02-21T09:32:58.1756195Z add.s32 %r241, %r211, 20480; 2026-02-21T09:32:58.1756349Z add.s32 %r243, 
%r211, 22528; 2026-02-21T09:32:58.1756498Z or.b32 %r6, %r253, 96; 2026-02-21T09:32:58.1756650Z add.s32 %r308, %r211, 24576; 2026-02-21T09:32:58.1756794Z add.s32 %r310, %r211, 26624; 2026-02-21T09:32:58.1756946Z add.s32 %r312, %r211, 28672; 2026-02-21T09:32:58.1757089Z add.s32 %r314, %r211, 30720; 2026-02-21T09:32:58.1757241Z shl.b32 %r267, %r1, 10; 2026-02-21T09:32:58.1757381Z and.b32 %r268, %r267, 6144; 2026-02-21T09:32:58.1757534Z or.b32 %r269, %r268, %r263; 2026-02-21T09:32:58.1757686Z xor.b32 %r270, %r269, 32; 2026-02-21T09:32:58.1757829Z xor.b32 %r271, %r269, 64; 2026-02-21T09:32:58.1757978Z xor.b32 %r272, %r269, 96; 2026-02-21T09:32:58.1758119Z and.b32 %r273, %r1, 96; 2026-02-21T09:32:58.1758267Z shl.b32 %r274, %r273, 6; 2026-02-21T09:32:58.1758410Z shl.b32 %r275, %r252, 5; 2026-02-21T09:32:58.1758560Z shl.b32 %r276, %r264, 4; 2026-02-21T09:32:58.1758700Z and.b32 %r278, %r68, 16; 2026-02-21T09:32:58.1758875Z or.b32 %r279, %r274, %r275; 2026-02-21T09:32:58.1759049Z or.b32 %r280, %r276, %r273; 2026-02-21T09:32:58.1759208Z xor.b32 %r281, %r279, %r280; 2026-02-21T09:32:58.1759363Z add.s32 %r282, %r52, %r278; 2026-02-21T09:32:58.1759508Z add.s32 %r529, %r282, %r281; 2026-02-21T09:32:58.1759764Z .loc 1 38 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:38:27 2026-02-21T09:32:58.1760035Z shl.b32 %r283, %r3, 7; 2026-02-21T09:32:58.1760185Z and.b32 %r319, %r283, 896; 2026-02-21T09:32:58.1760434Z .loc 1 41 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:41:32 2026-02-21T09:32:58.1760713Z or.b32 %r284, %r259, %r4; 2026-02-21T09:32:58.1760870Z or.b32 %r21, %r259, %r257; 2026-02-21T09:32:58.1761127Z .loc 1 51 53 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:53 2026-02-21T09:32:58.1761413Z shl.b32 %r285, %r284, 10; 2026-02-21T09:32:58.1761661Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1761954Z shfl.sync.idx.b32 %r37, %r261, 0, 31, -1; 2026-02-21T09:32:58.1762133Z shl.b32 
%r286, %r37, 21; 2026-02-21T09:32:58.1762289Z and.b32 %r287, %r286, 6291456; 2026-02-21T09:32:58.1762443Z add.s32 %r524, %r287, %r896; 2026-02-21T09:32:58.1762605Z mov.pred %p22, -1; 2026-02-21T09:32:58.1762749Z // begin inline asm 2026-02-21T09:32:58.1763085Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 0], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1763445Z // end inline asm 2026-02-21T09:32:58.1763575Z // begin inline asm 2026-02-21T09:32:58.1763902Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 16], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1764258Z // end inline asm 2026-02-21T09:32:58.1764396Z // begin inline asm 2026-02-21T09:32:58.1764744Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 32], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1765168Z // end inline asm 2026-02-21T09:32:58.1765309Z // begin inline asm 2026-02-21T09:32:58.1765619Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 48], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1765971Z // end inline asm 2026-02-21T09:32:58.1766101Z // begin inline asm 2026-02-21T09:32:58.1766419Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 64], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1766770Z // end inline asm 2026-02-21T09:32:58.1766900Z // begin inline asm 2026-02-21T09:32:58.1767218Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 80], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1767561Z // end inline asm 2026-02-21T09:32:58.1767697Z // begin inline asm 2026-02-21T09:32:58.1768008Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 
[%r524 + 96], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1768366Z // end inline asm 2026-02-21T09:32:58.1768500Z // begin inline asm 2026-02-21T09:32:58.1768812Z @%p22 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r524 + 112], {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:32:58.1769163Z // end inline asm 2026-02-21T09:32:58.1769290Z // begin inline asm 2026-02-21T09:32:58.1769444Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:32:58.1769602Z // end inline asm 2026-02-21T09:32:58.1769734Z bar.sync 0; 2026-02-21T09:32:58.1769977Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1770255Z add.s32 %r898, %r52, 65568; 2026-02-21T09:32:58.1770411Z // begin inline asm 2026-02-21T09:32:58.1770620Z @%p73 mbarrier.init.shared::cta.b64 [%r898], 1; 2026-02-21T09:32:58.1770814Z // end inline asm 2026-02-21T09:32:58.1770944Z bar.sync 0; 2026-02-21T09:32:58.1771079Z add.s32 %r206, %r52, 65576; 2026-02-21T09:32:58.1771224Z // begin inline asm 2026-02-21T09:32:58.1771394Z @%p73 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T09:32:58.1771582Z // end inline asm 2026-02-21T09:32:58.1771713Z add.s32 %r207, %r52, 65536; 2026-02-21T09:32:58.1771866Z // begin inline asm 2026-02-21T09:32:58.1772023Z @%p73 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T09:32:58.1772213Z // end inline asm 2026-02-21T09:32:58.1772356Z bar.sync 0; 2026-02-21T09:32:58.1772492Z add.s32 %r208, %r52, 65544; 2026-02-21T09:32:58.1772638Z // begin inline asm 2026-02-21T09:32:58.1772799Z @%p73 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T09:32:58.1772972Z // end inline asm 2026-02-21T09:32:58.1773106Z bar.sync 0; 2026-02-21T09:32:58.1773238Z add.s32 %r209, %r52, 65552; 2026-02-21T09:32:58.1773384Z // begin inline asm 2026-02-21T09:32:58.1773546Z @%p73 mbarrier.init.shared::cta.b64 [%r209], 1; 
2026-02-21T09:32:58.1773724Z // end inline asm 2026-02-21T09:32:58.1773859Z bar.sync 0; 2026-02-21T09:32:58.1774180Z add.s32 %r316, %r52, 65560; 2026-02-21T09:32:58.1774332Z // begin inline asm 2026-02-21T09:32:58.1774487Z @%p73 mbarrier.init.shared::cta.b64 [%r316], 1; 2026-02-21T09:32:58.1774714Z // end inline asm 2026-02-21T09:32:58.1774962Z .loc 1 51 60 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:60 2026-02-21T09:32:58.1775234Z or.b32 %r288, %r285, %r253; 2026-02-21T09:32:58.1775495Z .loc 1 51 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:32 2026-02-21T09:32:58.1775775Z mad.wide.u32 %rd31, %r288, 2, %rd9; 2026-02-21T09:32:58.1775951Z cvt.u64.u32 %rd2, %r285; 2026-02-21T09:32:58.1776102Z add.s64 %rd32, %rd31, 65536; 2026-02-21T09:32:58.1776265Z add.s64 %rd33, %rd31, 131072; 2026-02-21T09:32:58.1776419Z add.s64 %rd34, %rd31, 196608; 2026-02-21T09:32:58.1776575Z mov.b32 %r309, 16; 2026-02-21T09:32:58.1776848Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1777146Z // begin inline asm 2026-02-21T09:32:58.1777350Z cp.async.cg.shared.global [ %r211 + 0 ], [ %rd31 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1777567Z // end inline asm 2026-02-21T09:32:58.1777703Z // begin inline asm 2026-02-21T09:32:58.1777892Z cp.async.cg.shared.global [ %r213 + 0 ], [ %rd32 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1778115Z // end inline asm 2026-02-21T09:32:58.1778243Z // begin inline asm 2026-02-21T09:32:58.1778437Z cp.async.cg.shared.global [ %r215 + 0 ], [ %rd33 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1778658Z // end inline asm 2026-02-21T09:32:58.1778786Z // begin inline asm 2026-02-21T09:32:58.1778977Z cp.async.cg.shared.global [ %r217 + 0 ], [ %rd34 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1779187Z // end inline asm 2026-02-21T09:32:58.1779332Z cp.async.commit_group; 2026-02-21T09:32:58.1779587Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 
2026-02-21T09:32:58.1779874Z bar.sync 0; 2026-02-21T09:32:58.1779997Z // begin inline asm 2026-02-21T09:32:58.1780189Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r207], 8192; 2026-02-21T09:32:58.1780405Z // end inline asm 2026-02-21T09:32:58.1780640Z .loc 1 52 44 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1780920Z bar.sync 0; 2026-02-21T09:32:58.1781057Z elect.sync %r289|%p43, -1; 2026-02-21T09:32:58.1781229Z and.pred %p37, %p1, %p43; 2026-02-21T09:32:58.1781385Z add.s32 %r220, %r52, 32768; 2026-02-21T09:32:58.1781542Z // begin inline asm 2026-02-21T09:32:58.1781867Z @%p37 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r220], [%rd61, {%r70, %r319}], [%r207]; 2026-02-21T09:32:58.1782213Z // end inline asm 2026-02-21T09:32:58.1782507Z .loc 1 51 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:32 2026-02-21T09:32:58.1782782Z add.s64 %rd36, %rd31, 64; 2026-02-21T09:32:58.1782935Z or.b32 %r290, %r288, 32; 2026-02-21T09:32:58.1783088Z mad.wide.u32 %rd46, %r290, 2, %rd9; 2026-02-21T09:32:58.1783261Z add.s64 %rd37, %rd46, 65536; 2026-02-21T09:32:58.1783415Z add.s64 %rd38, %rd46, 131072; 2026-02-21T09:32:58.1783576Z add.s64 %rd39, %rd46, 196608; 2026-02-21T09:32:58.1783832Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1784098Z // begin inline asm 2026-02-21T09:32:58.1784296Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd36 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1784516Z // end inline asm 2026-02-21T09:32:58.1784654Z // begin inline asm 2026-02-21T09:32:58.1784880Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd37 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1785106Z // end inline asm 2026-02-21T09:32:58.1785235Z // begin inline asm 2026-02-21T09:32:58.1785431Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd38 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1785652Z // end inline asm 2026-02-21T09:32:58.1785780Z // begin inline asm 
2026-02-21T09:32:58.1785972Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd39 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1786181Z // end inline asm 2026-02-21T09:32:58.1786321Z cp.async.commit_group; 2026-02-21T09:32:58.1786571Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1786851Z bar.sync 0; 2026-02-21T09:32:58.1786972Z // begin inline asm 2026-02-21T09:32:58.1787161Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r208], 8192; 2026-02-21T09:32:58.1787375Z // end inline asm 2026-02-21T09:32:58.1787605Z .loc 1 52 44 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1787879Z bar.sync 0; 2026-02-21T09:32:58.1788011Z elect.sync %r291|%p44, -1; 2026-02-21T09:32:58.1788180Z and.pred %p39, %p1, %p44; 2026-02-21T09:32:58.1788357Z add.s32 %r233, %r52, 40960; 2026-02-21T09:32:58.1788536Z // begin inline asm 2026-02-21T09:32:58.1788852Z @%p39 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r233], [%rd61, {%r55, %r319}], [%r208]; 2026-02-21T09:32:58.1789190Z // end inline asm 2026-02-21T09:32:58.1789435Z .loc 1 51 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:32 2026-02-21T09:32:58.1789725Z add.s64 %rd41, %rd31, 128; 2026-02-21T09:32:58.1789882Z or.b32 %r292, %r288, 64; 2026-02-21T09:32:58.1790036Z mad.wide.u32 %rd47, %r292, 2, %rd9; 2026-02-21T09:32:58.1790210Z add.s64 %rd42, %rd47, 65536; 2026-02-21T09:32:58.1790361Z add.s64 %rd43, %rd47, 131072; 2026-02-21T09:32:58.1790524Z add.s64 %rd44, %rd47, 196608; 2026-02-21T09:32:58.1790785Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1791060Z // begin inline asm 2026-02-21T09:32:58.1791260Z cp.async.cg.shared.global [ %r237 + 0 ], [ %rd41 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1791478Z // end inline asm 2026-02-21T09:32:58.1791618Z // begin inline asm 2026-02-21T09:32:58.1791807Z cp.async.cg.shared.global [ %r239 + 0 ], [ %rd42 + 0 ], 0x10, 
%r309; 2026-02-21T09:32:58.1792037Z // end inline asm 2026-02-21T09:32:58.1792170Z // begin inline asm 2026-02-21T09:32:58.1792364Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd43 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1792582Z // end inline asm 2026-02-21T09:32:58.1792711Z // begin inline asm 2026-02-21T09:32:58.1792901Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd44 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1793113Z // end inline asm 2026-02-21T09:32:58.1793253Z cp.async.commit_group; 2026-02-21T09:32:58.1793499Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1793781Z bar.sync 0; 2026-02-21T09:32:58.1793934Z // begin inline asm 2026-02-21T09:32:58.1794150Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r209], 8192; 2026-02-21T09:32:58.1794369Z // end inline asm 2026-02-21T09:32:58.1794599Z .loc 1 52 44 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1794912Z bar.sync 0; 2026-02-21T09:32:58.1795045Z elect.sync %r293|%p45, -1; 2026-02-21T09:32:58.1795211Z and.pred %p41, %p1, %p45; 2026-02-21T09:32:58.1795364Z add.s32 %r246, %r52, 49152; 2026-02-21T09:32:58.1795516Z mov.b32 %r247, 64; 2026-02-21T09:32:58.1795647Z // begin inline asm 2026-02-21T09:32:58.1795970Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd61, {%r247, %r319}], [%r209]; 2026-02-21T09:32:58.1796323Z // end inline asm 2026-02-21T09:32:58.1796573Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1796877Z cp.async.wait_group 2; 2026-02-21T09:32:58.1797030Z bar.sync 0; 2026-02-21T09:32:58.1797277Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1797562Z // begin inline asm 2026-02-21T09:32:58.1797703Z 2026-02-21T09:32:58.1797825Z { 2026-02-21T09:32:58.1797949Z .reg .pred complete; 2026-02-21T09:32:58.1798104Z waitLoop: 2026-02-21T09:32:58.1798292Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r207], %r70; 2026-02-21T09:32:58.1798532Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.1798682Z } 2026-02-21T09:32:58.1798755Z 2026-02-21T09:32:58.1798811Z // end inline asm 2026-02-21T09:32:58.1799057Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1799357Z setp.ne.b32 %p46, %r37, 0; 2026-02-21T09:32:58.1799516Z @%p46 bra $L__BB0_3; 2026-02-21T09:32:58.1799662Z // %bb.2: 2026-02-21T09:32:58.1799904Z .loc 1 0 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:0:52 2026-02-21T09:32:58.1800196Z add.s32 %r300, %r52, 32800; 2026-02-21T09:32:58.1800365Z bfe.u32 %r301, %r300, 4, 14; 2026-02-21T09:32:58.1800582Z cvt.u64.u32 %rd53, %r301; 2026-02-21T09:32:58.1800762Z or.b64 %rd51, %rd53, -9223371899382267904; 2026-02-21T09:32:58.1800945Z add.s32 %r302, %r52, 32; 2026-02-21T09:32:58.1801112Z bfe.u32 %r303, %r302, 4, 14; 2026-02-21T09:32:58.1801269Z cvt.u64.u32 %rd54, %r303; 2026-02-21T09:32:58.1801442Z or.b64 %rd50, %rd54, -9223371899382267904; 2026-02-21T09:32:58.1801636Z bfe.u32 %r304, %r220, 4, 14; 2026-02-21T09:32:58.1801794Z cvt.u64.u32 %rd55, %r304; 2026-02-21T09:32:58.1801968Z or.b64 %rd49, %rd55, -9223371899382267904; 2026-02-21T09:32:58.1802152Z bfe.u32 %r305, %r52, 4, 14; 2026-02-21T09:32:58.1802323Z cvt.u64.u32 %rd56, %r305; 2026-02-21T09:32:58.1802491Z or.b64 %rd48, %rd56, -9223371899382267904; 2026-02-21T09:32:58.1802790Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1803084Z elect.sync %r306|%p48, -1; 2026-02-21T09:32:58.1803252Z mov.b32 %r295, 136314896; 2026-02-21T09:32:58.1803411Z mov.pred %p47, 0; 2026-02-21T09:32:58.1803556Z // begin inline asm 2026-02-21T09:32:58.1803790Z @%p48 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd48, %rd49, %r295, %p47; 2026-02-21T09:32:58.1804054Z // end inline asm 2026-02-21T09:32:58.1804199Z // begin inline asm 2026-02-21T09:32:58.1804420Z @%p48 
tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd50, %rd51, %r295, %p22; 2026-02-21T09:32:58.1804709Z // end inline asm 2026-02-21T09:32:58.1804858Z add.s32 %r307, %r52, 65568; 2026-02-21T09:32:58.1805017Z cvt.u64.u32 %rd52, %r307; 2026-02-21T09:32:58.1805176Z // begin inline asm 2026-02-21T09:32:58.1805388Z @%p48 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd52]; 2026-02-21T09:32:58.1805631Z // end inline asm 2026-02-21T09:32:58.1805766Z $L__BB0_3: 2026-02-21T09:32:58.1806039Z .loc 1 0 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:0:52 2026-02-21T09:32:58.1806403Z ld.param.b64 %rd10, [_helion_matmul_param_2]; 2026-02-21T09:32:58.1806615Z add.s32 %r11, %r52, %r269; 2026-02-21T09:32:58.1806773Z add.s32 %r12, %r52, %r270; 2026-02-21T09:32:58.1806920Z add.s32 %r13, %r52, %r271; 2026-02-21T09:32:58.1807072Z add.s32 %r14, %r52, %r272; 2026-02-21T09:32:58.1807217Z add.s32 %r534, %r529, 512; 2026-02-21T09:32:58.1807366Z add.s32 %r539, %r529, 1024; 2026-02-21T09:32:58.1807512Z add.s32 %r544, %r529, 1536; 2026-02-21T09:32:58.1807666Z or.b32 %r20, %r319, %r255; 2026-02-21T09:32:58.1807811Z or.b32 %r22, %r21, 8; 2026-02-21T09:32:58.1807956Z or.b32 %r23, %r21, 16; 2026-02-21T09:32:58.1808097Z or.b32 %r24, %r21, 24; 2026-02-21T09:32:58.1808241Z or.b32 %r25, %r21, 32; 2026-02-21T09:32:58.1808381Z or.b32 %r26, %r21, 40; 2026-02-21T09:32:58.1808515Z or.b32 %r27, %r21, 48; 2026-02-21T09:32:58.1808658Z or.b32 %r28, %r260, 56; 2026-02-21T09:32:58.1808797Z or.b32 %r29, %r21, 64; 2026-02-21T09:32:58.1808941Z or.b32 %r30, %r21, 72; 2026-02-21T09:32:58.1809078Z or.b32 %r31, %r21, 80; 2026-02-21T09:32:58.1809222Z or.b32 %r32, %r21, 88; 2026-02-21T09:32:58.1809359Z or.b32 %r33, %r21, 96; 2026-02-21T09:32:58.1809506Z or.b32 %r34, %r21, 104; 2026-02-21T09:32:58.1809648Z or.b32 %r35, %r257, %r258; 2026-02-21T09:32:58.1809804Z or.b32 %r36, %r256, %r258; 2026-02-21T09:32:58.1810063Z .loc 1 51 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:32 
2026-02-21T09:32:58.1810336Z add.s64 %rd57, %rd31, 192; 2026-02-21T09:32:58.1810493Z cvt.u64.u32 %rd63, %r6; 2026-02-21T09:32:58.1810641Z add.s64 %rd64, %rd2, %rd63; 2026-02-21T09:32:58.1810804Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:32:58.1810961Z add.s64 %rd66, %rd9, %rd65; 2026-02-21T09:32:58.1811118Z add.s64 %rd58, %rd66, 65536; 2026-02-21T09:32:58.1811270Z add.s64 %rd59, %rd66, 131072; 2026-02-21T09:32:58.1811432Z add.s64 %rd60, %rd66, 196608; 2026-02-21T09:32:58.1811694Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1811990Z bar.sync 0; 2026-02-21T09:32:58.1812150Z // begin inline asm 2026-02-21T09:32:58.1812345Z cp.async.cg.shared.global [ %r308 + 0 ], [ %rd57 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1812567Z // end inline asm 2026-02-21T09:32:58.1812697Z // begin inline asm 2026-02-21T09:32:58.1812895Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd58 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1813107Z // end inline asm 2026-02-21T09:32:58.1813244Z // begin inline asm 2026-02-21T09:32:58.1813435Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd59 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1813647Z // end inline asm 2026-02-21T09:32:58.1813783Z // begin inline asm 2026-02-21T09:32:58.1813968Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd60 + 0 ], 0x10, %r309; 2026-02-21T09:32:58.1814185Z // end inline asm 2026-02-21T09:32:58.1814320Z cp.async.commit_group; 2026-02-21T09:32:58.1814575Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1814876Z // begin inline asm 2026-02-21T09:32:58.1815068Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r316], 8192; 2026-02-21T09:32:58.1815284Z // end inline asm 2026-02-21T09:32:58.1815517Z .loc 1 52 44 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1815799Z bar.sync 0; 2026-02-21T09:32:58.1815930Z elect.sync %r326|%p55, -1; 2026-02-21T09:32:58.1816095Z and.pred %p53, %p1, %p55; 
2026-02-21T09:32:58.1816245Z add.s32 %r317, %r52, 57344; 2026-02-21T09:32:58.1816396Z mov.b32 %r318, 96; 2026-02-21T09:32:58.1816524Z // begin inline asm 2026-02-21T09:32:58.1816847Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r317], [%rd61, {%r318, %r319}], [%r316]; 2026-02-21T09:32:58.1817192Z // end inline asm 2026-02-21T09:32:58.1817457Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1817774Z mul.wide.u32 %rd67, %r252, 16; 2026-02-21T09:32:58.1817935Z shl.b32 %r328, %r3, 14; 2026-02-21T09:32:58.1818092Z and.b32 %r329, %r328, 16646144; 2026-02-21T09:32:58.1818246Z shl.b32 %r330, %r4, 10; 2026-02-21T09:32:58.1818398Z or.b32 %r331, %r329, %r330; 2026-02-21T09:32:58.1818549Z mul.wide.u32 %rd68, %r331, 2; 2026-02-21T09:32:58.1818708Z or.b64 %rd69, %rd67, %rd68; 2026-02-21T09:32:58.1818862Z add.s64 %rd70, %rd69, %rd9; 2026-02-21T09:32:58.1819012Z add.s64 %rd357, %rd70, 196864; 2026-02-21T09:32:58.1819169Z mov.b32 %r902, 1; 2026-02-21T09:32:58.1819299Z mov.b32 %r901, 3; 2026-02-21T09:32:58.1819435Z mov.b32 %r897, 0; 2026-02-21T09:32:58.1819562Z mov.b64 %rd358, 0; 2026-02-21T09:32:58.1819703Z mov.b32 %r899, %r897; 2026-02-21T09:32:58.1819844Z mov.b32 %r900, %r897; 2026-02-21T09:32:58.1819994Z mov.b32 %r903, %r897; 2026-02-21T09:32:58.1820132Z bra.uni $L__BB0_4; 2026-02-21T09:32:58.1820325Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:32:58.1820640Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1820921Z setp.lt.u64 %p65, %rd358, 896; 2026-02-21T09:32:58.1821186Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1821452Z // begin inline asm 2026-02-21T09:32:58.1821587Z 2026-02-21T09:32:58.1821696Z { 2026-02-21T09:32:58.1821819Z .reg .pred complete; 2026-02-21T09:32:58.1821957Z waitLoop: 2026-02-21T09:32:58.1822147Z mbarrier.try_wait.parity.shared.b64 complete, [%r898], 
%r897; 2026-02-21T09:32:58.1822379Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.1822522Z } 2026-02-21T09:32:58.1822526Z 2026-02-21T09:32:58.1822578Z // end inline asm 2026-02-21T09:32:58.1822644Z add.s32 %r371, %r902, 1; 2026-02-21T09:32:58.1822703Z setp.gt.s32 %p68, %r371, 1; 2026-02-21T09:32:58.1822763Z selp.b32 %r902, 0, %r371, %p68; 2026-02-21T09:32:58.1822828Z selp.b32 %r372, 1, 0, %p68; 2026-02-21T09:32:58.1822886Z xor.b32 %r50, %r903, %r372; 2026-02-21T09:32:58.1823079Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1823168Z add.s32 %r373, %r901, 1; 2026-02-21T09:32:58.1823227Z setp.gt.s32 %p69, %r373, 3; 2026-02-21T09:32:58.1823287Z selp.b32 %r901, 0, %r373, %p69; 2026-02-21T09:32:58.1823448Z .loc 1 51 32 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:32 2026-02-21T09:32:58.1823518Z add.s64 %rd80, %rd357, -196608; 2026-02-21T09:32:58.1823578Z add.s64 %rd81, %rd357, -131072; 2026-02-21T09:32:58.1823637Z add.s64 %rd82, %rd357, -65536; 2026-02-21T09:32:58.1823804Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1823861Z shl.b32 %r374, %r901, 13; 2026-02-21T09:32:58.1823916Z add.s32 %r376, %r52, %r374; 2026-02-21T09:32:58.1823968Z bar.sync 0; 2026-02-21T09:32:58.1824034Z add.s32 %r358, %r376, %r5; 2026-02-21T09:32:58.1824095Z selp.b32 %r359, 16, 0, %p65; 2026-02-21T09:32:58.1824151Z // begin inline asm 2026-02-21T09:32:58.1824273Z cp.async.cg.shared.global [ %r358 + 0 ], [ %rd80 + 0 ], 0x10, %r359; 2026-02-21T09:32:58.1824328Z // end inline asm 2026-02-21T09:32:58.1824383Z add.s32 %r360, %r358, 2048; 2026-02-21T09:32:58.1824444Z // begin inline asm 2026-02-21T09:32:58.1824553Z cp.async.cg.shared.global [ %r360 + 0 ], [ %rd81 + 0 ], 0x10, %r359; 2026-02-21T09:32:58.1824605Z // end inline asm 2026-02-21T09:32:58.1824659Z add.s32 %r362, %r358, 4096; 2026-02-21T09:32:58.1824748Z // begin inline asm 2026-02-21T09:32:58.1824855Z 
cp.async.cg.shared.global [ %r362 + 0 ], [ %rd82 + 0 ], 0x10, %r359; 2026-02-21T09:32:58.1824907Z // end inline asm 2026-02-21T09:32:58.1824969Z add.s32 %r364, %r358, 6144; 2026-02-21T09:32:58.1825023Z // begin inline asm 2026-02-21T09:32:58.1825162Z cp.async.cg.shared.global [ %r364 + 0 ], [ %rd357 + 0 ], 0x10, %r359; 2026-02-21T09:32:58.1825216Z // end inline asm 2026-02-21T09:32:58.1825308Z cp.async.commit_group; 2026-02-21T09:32:58.1825473Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1825529Z shl.b32 %r377, %r901, 3; 2026-02-21T09:32:58.1825593Z add.s32 %r378, %r52, %r377; 2026-02-21T09:32:58.1825649Z add.s32 %r370, %r378, 65536; 2026-02-21T09:32:58.1825709Z and.pred %p63, %p73, %p65; 2026-02-21T09:32:58.1825768Z // begin inline asm 2026-02-21T09:32:58.1825875Z @%p63 mbarrier.arrive.expect_tx.shared.b64 _, [%r370], 8192; 2026-02-21T09:32:58.1825927Z // end inline asm 2026-02-21T09:32:58.1826083Z .loc 1 52 44 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1826149Z add.s32 %r367, %r376, 32768; 2026-02-21T09:32:58.1826202Z bar.sync 0; 2026-02-21T09:32:58.1826263Z elect.sync %r379|%p70, -1; 2026-02-21T09:32:58.1826329Z and.pred %p71, %p65, %p70; 2026-02-21T09:32:58.1826389Z and.pred %p64, %p1, %p71; 2026-02-21T09:32:58.1826447Z cvt.u32.u64 %r380, %rd358; 2026-02-21T09:32:58.1826504Z add.s32 %r368, %r380, 128; 2026-02-21T09:32:58.1826567Z // begin inline asm 2026-02-21T09:32:58.1826803Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r367], [%rd61, {%r368, %r319}], [%r370]; 2026-02-21T09:32:58.1826857Z // end inline asm 2026-02-21T09:32:58.1827023Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1827079Z add.s64 %rd357, %rd357, 64; 2026-02-21T09:32:58.1827140Z setp.lt.u64 %p72, %rd358, 960; 2026-02-21T09:32:58.1827203Z add.s64 %rd358, %rd358, 32; 2026-02-21T09:32:58.1827257Z mov.b32 %r897, 
%r903; 2026-02-21T09:32:58.1827311Z mov.b32 %r898, %r381; 2026-02-21T09:32:58.1827364Z mov.b32 %r903, %r50; 2026-02-21T09:32:58.1827428Z @%p72 bra $L__BB0_4; 2026-02-21T09:32:58.1827481Z bra.uni $L__BB0_7; 2026-02-21T09:32:58.1827582Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:32:58.1827674Z add.s32 %r334, %r900, 1; 2026-02-21T09:32:58.1827758Z setp.gt.s32 %p57, %r334, 3; 2026-02-21T09:32:58.1827817Z selp.b32 %r900, 0, %r334, %p57; 2026-02-21T09:32:58.1827874Z selp.b32 %r335, 1, 0, %p57; 2026-02-21T09:32:58.1827937Z xor.b32 %r899, %r899, %r335; 2026-02-21T09:32:58.1828102Z .loc 1 51 85 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:51:85 2026-02-21T09:32:58.1828166Z cp.async.wait_group 2; 2026-02-21T09:32:58.1828231Z bar.sync 0; 2026-02-21T09:32:58.1828394Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1828450Z shl.b32 %r336, %r900, 3; 2026-02-21T09:32:58.1828515Z add.s32 %r338, %r52, %r336; 2026-02-21T09:32:58.1828570Z add.s32 %r332, %r338, 65536; 2026-02-21T09:32:58.1828625Z // begin inline asm 2026-02-21T09:32:58.1828673Z 2026-02-21T09:32:58.1828728Z { 2026-02-21T09:32:58.1828788Z .reg .pred complete; 2026-02-21T09:32:58.1828840Z waitLoop: 2026-02-21T09:32:58.1828964Z mbarrier.try_wait.parity.shared.b64 complete, [%r332], %r899; 2026-02-21T09:32:58.1829025Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.1829071Z } 2026-02-21T09:32:58.1829075Z 2026-02-21T09:32:58.1829127Z // end inline asm 2026-02-21T09:32:58.1829187Z shl.b32 %r339, %r902, 3; 2026-02-21T09:32:58.1829241Z add.s32 %r340, %r52, %r339; 2026-02-21T09:32:58.1829295Z add.s32 %r381, %r340, 65568; 2026-02-21T09:32:58.1829463Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1829519Z @%p46 bra $L__BB0_6; 2026-02-21T09:32:58.1829614Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:32:58.1829778Z .loc 1 52 44 // 
capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:52:44 2026-02-21T09:32:58.1829833Z shl.b32 %r345, %r900, 13; 2026-02-21T09:32:58.1829911Z add.s32 %r347, %r52, %r345; 2026-02-21T09:32:58.1829989Z add.s32 %r348, %r347, 32768; 2026-02-21T09:32:58.1830157Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1830220Z elect.sync %r349|%p59, -1; 2026-02-21T09:32:58.1830277Z bfe.u32 %r350, %r347, 4, 14; 2026-02-21T09:32:58.1830342Z cvt.u64.u32 %rd76, %r350; 2026-02-21T09:32:58.1830413Z or.b64 %rd71, %rd76, -9223371899382267904; 2026-02-21T09:32:58.1830469Z bfe.u32 %r351, %r348, 4, 14; 2026-02-21T09:32:58.1830533Z cvt.u64.u32 %rd77, %r351; 2026-02-21T09:32:58.1830601Z or.b64 %rd72, %rd77, -9223371899382267904; 2026-02-21T09:32:58.1830656Z mov.b32 %r342, 136314896; 2026-02-21T09:32:58.1830712Z mov.pred %p58, -1; 2026-02-21T09:32:58.1830774Z // begin inline asm 2026-02-21T09:32:58.1830911Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd71, %rd72, %r342, %p58; 2026-02-21T09:32:58.1830963Z // end inline asm 2026-02-21T09:32:58.1831028Z add.s32 %r352, %r347, 32; 2026-02-21T09:32:58.1831085Z bfe.u32 %r353, %r352, 4, 14; 2026-02-21T09:32:58.1831142Z cvt.u64.u32 %rd78, %r353; 2026-02-21T09:32:58.1831210Z or.b64 %rd73, %rd78, -9223371899382267904; 2026-02-21T09:32:58.1831272Z add.s32 %r354, %r347, 32800; 2026-02-21T09:32:58.1831327Z bfe.u32 %r355, %r354, 4, 14; 2026-02-21T09:32:58.1831381Z cvt.u64.u32 %rd79, %r355; 2026-02-21T09:32:58.1831454Z or.b64 %rd74, %rd79, -9223371899382267904; 2026-02-21T09:32:58.1831507Z // begin inline asm 2026-02-21T09:32:58.1831642Z @%p59 tcgen05.mma.cta_group::1.kind::f16 [ %r896 + 0 ], %rd73, %rd74, %r342, %p58; 2026-02-21T09:32:58.1831702Z // end inline asm 2026-02-21T09:32:58.1831758Z cvt.u64.u32 %rd75, %r381; 2026-02-21T09:32:58.1831811Z // begin inline asm 2026-02-21T09:32:58.1831928Z @%p59 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd75]; 2026-02-21T09:32:58.1831989Z 
// end inline asm 2026-02-21T09:32:58.1832042Z bra.uni $L__BB0_6; 2026-02-21T09:32:58.1832134Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:32:58.1832305Z .loc 1 0 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:0:52 2026-02-21T09:32:58.1832398Z mov.b32 %r382, 1; 2026-02-21T09:32:58.1832553Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1832613Z // begin inline asm 2026-02-21T09:32:58.1832660Z 2026-02-21T09:32:58.1832707Z { 2026-02-21T09:32:58.1832764Z .reg .pred complete; 2026-02-21T09:32:58.1832823Z waitLoop: 2026-02-21T09:32:58.1832934Z mbarrier.try_wait.parity.shared.b64 complete, [%r381], %r382; 2026-02-21T09:32:58.1832995Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.1833049Z } 2026-02-21T09:32:58.1833052Z 2026-02-21T09:32:58.1833105Z // end inline asm 2026-02-21T09:32:58.1833260Z .loc 1 46 42 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:46:42 2026-02-21T09:32:58.1833326Z cp.async.wait_group 0; 2026-02-21T09:32:58.1833378Z bar.sync 0; 2026-02-21T09:32:58.1833432Z // begin inline asm 2026-02-21T09:32:58.1833514Z @%p73 mbarrier.inval.shared::cta.b64 [%r207]; 2026-02-21T09:32:58.1833577Z // end inline asm 2026-02-21T09:32:58.1833626Z bar.sync 0; 2026-02-21T09:32:58.1833680Z // begin inline asm 2026-02-21T09:32:58.1833763Z @%p73 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T09:32:58.1833815Z // end inline asm 2026-02-21T09:32:58.1833866Z bar.sync 0; 2026-02-21T09:32:58.1833919Z // begin inline asm 2026-02-21T09:32:58.1834000Z @%p73 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T09:32:58.1834051Z // end inline asm 2026-02-21T09:32:58.1834102Z bar.sync 0; 2026-02-21T09:32:58.1834163Z // begin inline asm 2026-02-21T09:32:58.1834238Z @%p73 mbarrier.inval.shared::cta.b64 [%r316]; 2026-02-21T09:32:58.1834289Z // end inline asm 2026-02-21T09:32:58.1834345Z add.s32 %r387, %r52, 65568; 2026-02-21T09:32:58.1834405Z // begin inline asm 2026-02-21T09:32:58.1834478Z @%p73 
mbarrier.inval.shared::cta.b64 [%r387]; 2026-02-21T09:32:58.1834561Z // end inline asm 2026-02-21T09:32:58.1834641Z bar.sync 0; 2026-02-21T09:32:58.1834734Z // begin inline asm 2026-02-21T09:32:58.1834811Z @%p73 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T09:32:58.1834870Z // end inline asm 2026-02-21T09:32:58.1835031Z .loc 1 56 45 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:56:45 2026-02-21T09:32:58.1835088Z shl.b32 %r670, %r21, 10; 2026-02-21T09:32:58.1835143Z shl.b32 %r671, %r22, 10; 2026-02-21T09:32:58.1835205Z shl.b32 %r672, %r23, 10; 2026-02-21T09:32:58.1835259Z shl.b32 %r673, %r24, 10; 2026-02-21T09:32:58.1835313Z shl.b32 %r674, %r25, 10; 2026-02-21T09:32:58.1835375Z shl.b32 %r675, %r26, 10; 2026-02-21T09:32:58.1835429Z shl.b32 %r676, %r27, 10; 2026-02-21T09:32:58.1835483Z shl.b32 %r677, %r28, 10; 2026-02-21T09:32:58.1835537Z shl.b32 %r678, %r29, 10; 2026-02-21T09:32:58.1835602Z shl.b32 %r679, %r30, 10; 2026-02-21T09:32:58.1835656Z shl.b32 %r680, %r31, 10; 2026-02-21T09:32:58.1835711Z shl.b32 %r681, %r32, 10; 2026-02-21T09:32:58.1835774Z shl.b32 %r682, %r33, 10; 2026-02-21T09:32:58.1835833Z shl.b32 %r683, %r34, 10; 2026-02-21T09:32:58.1835892Z shl.b32 %r684, %r35, 10; 2026-02-21T09:32:58.1835950Z shl.b32 %r685, %r36, 10; 2026-02-21T09:32:58.1836126Z .loc 1 56 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:56:52 2026-02-21T09:32:58.1836187Z or.b32 %r686, %r670, %r20; 2026-02-21T09:32:58.1836246Z or.b32 %r687, %r671, %r20; 2026-02-21T09:32:58.1836316Z or.b32 %r688, %r672, %r20; 2026-02-21T09:32:58.1836372Z or.b32 %r689, %r673, %r20; 2026-02-21T09:32:58.1836429Z or.b32 %r690, %r674, %r20; 2026-02-21T09:32:58.1836495Z or.b32 %r691, %r675, %r20; 2026-02-21T09:32:58.1836550Z or.b32 %r692, %r676, %r20; 2026-02-21T09:32:58.1836604Z or.b32 %r693, %r677, %r20; 2026-02-21T09:32:58.1836657Z or.b32 %r694, %r678, %r20; 2026-02-21T09:32:58.1836718Z or.b32 %r695, %r679, %r20; 2026-02-21T09:32:58.1836773Z or.b32 %r696, %r680, %r20; 
2026-02-21T09:32:58.1836829Z or.b32 %r697, %r681, %r20; 2026-02-21T09:32:58.1836916Z or.b32 %r698, %r682, %r20; 2026-02-21T09:32:58.1836992Z or.b32 %r699, %r683, %r20; 2026-02-21T09:32:58.1837046Z or.b32 %r700, %r684, %r20; 2026-02-21T09:32:58.1837102Z or.b32 %r701, %r700, 114688; 2026-02-21T09:32:58.1837163Z or.b32 %r702, %r685, %r20; 2026-02-21T09:32:58.1837219Z or.b32 %r703, %r702, 122880; 2026-02-21T09:32:58.1837379Z .loc 1 56 24 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:56:24 2026-02-21T09:32:58.1837453Z mad.wide.u32 %rd85, %r686, 2, %rd10; 2026-02-21T09:32:58.1837517Z mad.wide.u32 %rd86, %r687, 2, %rd10; 2026-02-21T09:32:58.1837578Z mad.wide.u32 %rd87, %r688, 2, %rd10; 2026-02-21T09:32:58.1837638Z mad.wide.u32 %rd88, %r689, 2, %rd10; 2026-02-21T09:32:58.1837703Z mad.wide.u32 %rd89, %r690, 2, %rd10; 2026-02-21T09:32:58.1837761Z mad.wide.u32 %rd90, %r691, 2, %rd10; 2026-02-21T09:32:58.1837821Z mad.wide.u32 %rd91, %r692, 2, %rd10; 2026-02-21T09:32:58.1837889Z mad.wide.u32 %rd92, %r693, 2, %rd10; 2026-02-21T09:32:58.1837950Z mad.wide.u32 %rd93, %r694, 2, %rd10; 2026-02-21T09:32:58.1838010Z mad.wide.u32 %rd94, %r695, 2, %rd10; 2026-02-21T09:32:58.1838076Z mad.wide.u32 %rd95, %r696, 2, %rd10; 2026-02-21T09:32:58.1838134Z mad.wide.u32 %rd96, %r697, 2, %rd10; 2026-02-21T09:32:58.1838193Z mad.wide.u32 %rd97, %r698, 2, %rd10; 2026-02-21T09:32:58.1838252Z mad.wide.u32 %rd98, %r699, 2, %rd10; 2026-02-21T09:32:58.1838319Z mad.wide.u32 %rd99, %r701, 2, %rd10; 2026-02-21T09:32:58.1838382Z mad.wide.u32 %rd100, %r703, 2, %rd10; 2026-02-21T09:32:58.1838542Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1838604Z // begin inline asm 2026-02-21T09:32:58.1838870Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403, %r404}, [%r524 + 0]; 2026-02-21T09:32:58.1838948Z // end inline asm 2026-02-21T09:32:58.1839034Z // begin 
inline asm 2026-02-21T09:32:58.1839302Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420, %r421}, [%r524 + 16]; 2026-02-21T09:32:58.1839357Z // end inline asm 2026-02-21T09:32:58.1839411Z // begin inline asm 2026-02-21T09:32:58.1839689Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437, %r438}, [%r524 + 32]; 2026-02-21T09:32:58.1839744Z // end inline asm 2026-02-21T09:32:58.1839800Z // begin inline asm 2026-02-21T09:32:58.1840074Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454, %r455}, [%r524 + 48]; 2026-02-21T09:32:58.1840129Z // end inline asm 2026-02-21T09:32:58.1840185Z // begin inline asm 2026-02-21T09:32:58.1840459Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471, %r472}, [%r524 + 64]; 2026-02-21T09:32:58.1840516Z // end inline asm 2026-02-21T09:32:58.1840571Z // begin inline asm 2026-02-21T09:32:58.1840847Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489}, [%r524 + 80]; 2026-02-21T09:32:58.1840901Z // end inline asm 2026-02-21T09:32:58.1840957Z // begin inline asm 2026-02-21T09:32:58.1841230Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506}, [%r524 + 96]; 2026-02-21T09:32:58.1841285Z // end inline asm 2026-02-21T09:32:58.1841340Z // begin inline asm 2026-02-21T09:32:58.1841609Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523}, [%r524 + 112]; 2026-02-21T09:32:58.1841673Z // end 
inline asm 2026-02-21T09:32:58.1841730Z // begin inline asm 2026-02-21T09:32:58.1841824Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:32:58.1841905Z // end inline asm 2026-02-21T09:32:58.1841964Z cvt.u64.u32 %rd101, %r389; 2026-02-21T09:32:58.1842024Z cvt.u64.u32 %rd102, %r390; 2026-02-21T09:32:58.1842084Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:32:58.1842151Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:32:58.1842323Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1842386Z mov.b64 {%r704, %r705}, %rd104; 2026-02-21T09:32:58.1842463Z cvt.rn.f16x2.f32 %r706, %r705, %r704; 2026-02-21T09:32:58.1842630Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1842688Z cvt.u64.u32 %rd105, %r391; 2026-02-21T09:32:58.1842752Z cvt.u64.u32 %rd106, %r392; 2026-02-21T09:32:58.1842811Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:32:58.1842873Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:32:58.1843040Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1843112Z mov.b64 {%r707, %r708}, %rd108; 2026-02-21T09:32:58.1843179Z cvt.rn.f16x2.f32 %r709, %r708, %r707; 2026-02-21T09:32:58.1843346Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1843412Z cvt.u64.u32 %rd109, %r393; 2026-02-21T09:32:58.1843469Z cvt.u64.u32 %rd110, %r394; 2026-02-21T09:32:58.1843528Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:32:58.1843594Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:32:58.1843756Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1843816Z mov.b64 {%r710, %r711}, %rd112; 2026-02-21T09:32:58.1843881Z cvt.rn.f16x2.f32 %r712, %r711, %r710; 2026-02-21T09:32:58.1844076Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1844161Z cvt.u64.u32 %rd113, %r395; 2026-02-21T09:32:58.1844224Z cvt.u64.u32 %rd114, %r396; 
2026-02-21T09:32:58.1844293Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:32:58.1844360Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:32:58.1844528Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1844595Z mov.b64 {%r713, %r714}, %rd116; 2026-02-21T09:32:58.1844660Z cvt.rn.f16x2.f32 %r715, %r714, %r713; 2026-02-21T09:32:58.1844870Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1844936Z cvt.u64.u32 %rd117, %r397; 2026-02-21T09:32:58.1844994Z cvt.u64.u32 %rd118, %r398; 2026-02-21T09:32:58.1845052Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:32:58.1845113Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:32:58.1845292Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1845353Z mov.b64 {%r716, %r717}, %rd120; 2026-02-21T09:32:58.1845420Z cvt.rn.f16x2.f32 %r718, %r717, %r716; 2026-02-21T09:32:58.1845595Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1845653Z cvt.u64.u32 %rd121, %r399; 2026-02-21T09:32:58.1845709Z cvt.u64.u32 %rd122, %r400; 2026-02-21T09:32:58.1845766Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:32:58.1845832Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:32:58.1845998Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1846057Z mov.b64 {%r719, %r720}, %rd124; 2026-02-21T09:32:58.1846131Z cvt.rn.f16x2.f32 %r721, %r720, %r719; 2026-02-21T09:32:58.1846296Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1846354Z cvt.u64.u32 %rd125, %r401; 2026-02-21T09:32:58.1846421Z cvt.u64.u32 %rd126, %r402; 2026-02-21T09:32:58.1846482Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:32:58.1846569Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:32:58.1846766Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1846845Z mov.b64 {%r722, %r723}, %rd128; 
2026-02-21T09:32:58.1846909Z cvt.rn.f16x2.f32 %r724, %r723, %r722; 2026-02-21T09:32:58.1847068Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1847133Z cvt.u64.u32 %rd129, %r403; 2026-02-21T09:32:58.1847191Z cvt.u64.u32 %rd130, %r404; 2026-02-21T09:32:58.1847250Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:32:58.1847316Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:32:58.1847479Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1847538Z mov.b64 {%r725, %r726}, %rd132; 2026-02-21T09:32:58.1847603Z cvt.rn.f16x2.f32 %r727, %r726, %r725; 2026-02-21T09:32:58.1847771Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1847833Z cvt.u64.u32 %rd133, %r406; 2026-02-21T09:32:58.1847890Z cvt.u64.u32 %rd134, %r407; 2026-02-21T09:32:58.1847958Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:32:58.1848016Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:32:58.1848176Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1848243Z mov.b64 {%r728, %r729}, %rd136; 2026-02-21T09:32:58.1848306Z cvt.rn.f16x2.f32 %r730, %r729, %r728; 2026-02-21T09:32:58.1848472Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1848531Z cvt.u64.u32 %rd137, %r408; 2026-02-21T09:32:58.1848596Z cvt.u64.u32 %rd138, %r409; 2026-02-21T09:32:58.1848654Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:32:58.1848712Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:32:58.1848929Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1848990Z mov.b64 {%r731, %r732}, %rd140; 2026-02-21T09:32:58.1849050Z cvt.rn.f16x2.f32 %r733, %r732, %r731; 2026-02-21T09:32:58.1849212Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1849266Z cvt.u64.u32 %rd141, %r410; 2026-02-21T09:32:58.1849321Z cvt.u64.u32 %rd142, %r411; 
2026-02-21T09:32:58.1849377Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:32:58.1849439Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:32:58.1849602Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1849657Z mov.b64 {%r734, %r735}, %rd144; 2026-02-21T09:32:58.1849724Z cvt.rn.f16x2.f32 %r736, %r735, %r734; 2026-02-21T09:32:58.1849882Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1849938Z cvt.u64.u32 %rd145, %r412; 2026-02-21T09:32:58.1850000Z cvt.u64.u32 %rd146, %r413; 2026-02-21T09:32:58.1850058Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:32:58.1850113Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:32:58.1850267Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1850329Z mov.b64 {%r737, %r738}, %rd148; 2026-02-21T09:32:58.1850388Z cvt.rn.f16x2.f32 %r739, %r738, %r737; 2026-02-21T09:32:58.1850546Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1850607Z cvt.u64.u32 %rd149, %r414; 2026-02-21T09:32:58.1850661Z cvt.u64.u32 %rd150, %r415; 2026-02-21T09:32:58.1850716Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:32:58.1850778Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:32:58.1850937Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1850995Z mov.b64 {%r740, %r741}, %rd152; 2026-02-21T09:32:58.1851056Z cvt.rn.f16x2.f32 %r742, %r741, %r740; 2026-02-21T09:32:58.1851244Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1851333Z cvt.u64.u32 %rd153, %r416; 2026-02-21T09:32:58.1851388Z cvt.u64.u32 %rd154, %r417; 2026-02-21T09:32:58.1851452Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:32:58.1851508Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:32:58.1851666Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1851729Z mov.b64 {%r743, %r744}, %rd156; 
2026-02-21T09:32:58.1851790Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:32:58.1851947Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1852005Z cvt.u64.u32 %rd157, %r418; 2026-02-21T09:32:58.1852067Z cvt.u64.u32 %rd158, %r419; 2026-02-21T09:32:58.1852124Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:32:58.1852181Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:32:58.1852350Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1852408Z mov.b64 {%r746, %r747}, %rd160; 2026-02-21T09:32:58.1852469Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:32:58.1852635Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1852692Z cvt.u64.u32 %rd161, %r420; 2026-02-21T09:32:58.1852747Z cvt.u64.u32 %rd162, %r421; 2026-02-21T09:32:58.1852804Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:32:58.1852872Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:32:58.1853033Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1853094Z mov.b64 {%r749, %r750}, %rd164; 2026-02-21T09:32:58.1853169Z cvt.rn.f16x2.f32 %r751, %r750, %r749; 2026-02-21T09:32:58.1853378Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1853439Z cvt.u64.u32 %rd165, %r423; 2026-02-21T09:32:58.1853503Z cvt.u64.u32 %rd166, %r424; 2026-02-21T09:32:58.1853561Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:32:58.1853617Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:32:58.1853774Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1853838Z mov.b64 {%r752, %r753}, %rd168; 2026-02-21T09:32:58.1853898Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:32:58.1854054Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1854118Z cvt.u64.u32 %rd169, %r425; 2026-02-21T09:32:58.1854173Z cvt.u64.u32 %rd170, %r426; 
2026-02-21T09:32:58.1854228Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:32:58.1854289Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:32:58.1854450Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1854508Z mov.b64 {%r755, %r756}, %rd172; 2026-02-21T09:32:58.1854570Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:32:58.1854765Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1854821Z cvt.u64.u32 %rd173, %r427; 2026-02-21T09:32:58.1854875Z cvt.u64.u32 %rd174, %r428; 2026-02-21T09:32:58.1854938Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:32:58.1854995Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:32:58.1855153Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1855217Z mov.b64 {%r758, %r759}, %rd176; 2026-02-21T09:32:58.1855278Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:32:58.1855431Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1855486Z cvt.u64.u32 %rd177, %r429; 2026-02-21T09:32:58.1855551Z cvt.u64.u32 %rd178, %r430; 2026-02-21T09:32:58.1855609Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:32:58.1855729Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:32:58.1855896Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1855954Z mov.b64 {%r761, %r762}, %rd180; 2026-02-21T09:32:58.1856015Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:32:58.1856186Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1856242Z cvt.u64.u32 %rd181, %r431; 2026-02-21T09:32:58.1856298Z cvt.u64.u32 %rd182, %r432; 2026-02-21T09:32:58.1856354Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:32:58.1856417Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:32:58.1856573Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1856630Z mov.b64 {%r764, %r765}, %rd184; 
2026-02-21T09:32:58.1856699Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:32:58.1856857Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1856916Z cvt.u64.u32 %rd185, %r433; 2026-02-21T09:32:58.1856978Z cvt.u64.u32 %rd186, %r434; 2026-02-21T09:32:58.1857033Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:32:58.1857090Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:32:58.1857250Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1857314Z mov.b64 {%r767, %r768}, %rd188; 2026-02-21T09:32:58.1857377Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:32:58.1857536Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1857600Z cvt.u64.u32 %rd189, %r435; 2026-02-21T09:32:58.1857657Z cvt.u64.u32 %rd190, %r436; 2026-02-21T09:32:58.1857714Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:32:58.1857799Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:32:58.1857982Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1858043Z mov.b64 {%r770, %r771}, %rd192; 2026-02-21T09:32:58.1858103Z cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T09:32:58.1858269Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1858324Z cvt.u64.u32 %rd193, %r437; 2026-02-21T09:32:58.1858378Z cvt.u64.u32 %rd194, %r438; 2026-02-21T09:32:58.1858439Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:32:58.1858495Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:32:58.1858654Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1858718Z mov.b64 {%r773, %r774}, %rd196; 2026-02-21T09:32:58.1858777Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:32:58.1858938Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1858995Z cvt.u64.u32 %rd197, %r440; 2026-02-21T09:32:58.1859060Z cvt.u64.u32 %rd198, %r441; 
2026-02-21T09:32:58.1859119Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:32:58.1859175Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:32:58.1859339Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1859395Z mov.b64 {%r776, %r777}, %rd200; 2026-02-21T09:32:58.1859455Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:32:58.1859619Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1859675Z cvt.u64.u32 %rd201, %r442; 2026-02-21T09:32:58.1859730Z cvt.u64.u32 %rd202, %r443; 2026-02-21T09:32:58.1859785Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:32:58.1859849Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:32:58.1860008Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1860066Z mov.b64 {%r779, %r780}, %rd204; 2026-02-21T09:32:58.1860154Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:32:58.1860332Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1860389Z cvt.u64.u32 %rd205, %r444; 2026-02-21T09:32:58.1860453Z cvt.u64.u32 %rd206, %r445; 2026-02-21T09:32:58.1860510Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:32:58.1860566Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:32:58.1860724Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1860790Z mov.b64 {%r782, %r783}, %rd208; 2026-02-21T09:32:58.1860850Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T09:32:58.1861007Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1861071Z cvt.u64.u32 %rd209, %r446; 2026-02-21T09:32:58.1861126Z cvt.u64.u32 %rd210, %r447; 2026-02-21T09:32:58.1861187Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:32:58.1861256Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:32:58.1861415Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1861473Z mov.b64 {%r785, %r786}, %rd212; 
2026-02-21T09:32:58.1861533Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T09:32:58.1861702Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1861759Z cvt.u64.u32 %rd213, %r448; 2026-02-21T09:32:58.1861814Z cvt.u64.u32 %rd214, %r449; 2026-02-21T09:32:58.1861877Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:32:58.1861933Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:32:58.1862092Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1862156Z mov.b64 {%r788, %r789}, %rd216; 2026-02-21T09:32:58.1862214Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T09:32:58.1862410Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1862469Z cvt.u64.u32 %rd217, %r450; 2026-02-21T09:32:58.1862532Z cvt.u64.u32 %rd218, %r451; 2026-02-21T09:32:58.1862589Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:32:58.1862643Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:32:58.1862811Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1862867Z mov.b64 {%r791, %r792}, %rd220; 2026-02-21T09:32:58.1862927Z cvt.rn.f16x2.f32 %r793, %r792, %r791; 2026-02-21T09:32:58.1863093Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1863150Z cvt.u64.u32 %rd221, %r452; 2026-02-21T09:32:58.1863205Z cvt.u64.u32 %rd222, %r453; 2026-02-21T09:32:58.1863260Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:32:58.1863324Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:32:58.1863488Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1863546Z mov.b64 {%r794, %r795}, %rd224; 2026-02-21T09:32:58.1863615Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T09:32:58.1863774Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1863829Z cvt.u64.u32 %rd225, %r454; 2026-02-21T09:32:58.1863892Z cvt.u64.u32 %rd226, %r455; 
2026-02-21T09:32:58.1863946Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:32:58.1864003Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:32:58.1864162Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1864227Z mov.b64 {%r797, %r798}, %rd228; 2026-02-21T09:32:58.1864287Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T09:32:58.1864450Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1864515Z cvt.u64.u32 %rd229, %r457; 2026-02-21T09:32:58.1864571Z cvt.u64.u32 %rd230, %r458; 2026-02-21T09:32:58.1864650Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:32:58.1864774Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:32:58.1864935Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1864991Z mov.b64 {%r800, %r801}, %rd232; 2026-02-21T09:32:58.1865052Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T09:32:58.1865218Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1865274Z cvt.u64.u32 %rd233, %r459; 2026-02-21T09:32:58.1865329Z cvt.u64.u32 %rd234, %r460; 2026-02-21T09:32:58.1865391Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:32:58.1865446Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:32:58.1865603Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1865667Z mov.b64 {%r803, %r804}, %rd236; 2026-02-21T09:32:58.1865728Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T09:32:58.1865887Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1865943Z cvt.u64.u32 %rd237, %r461; 2026-02-21T09:32:58.1866007Z cvt.u64.u32 %rd238, %r462; 2026-02-21T09:32:58.1866064Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:32:58.1866120Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:32:58.1866285Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1866343Z mov.b64 {%r806, %r807}, %rd240; 
2026-02-21T09:32:58.1866403Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T09:32:58.1866569Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1866626Z cvt.u64.u32 %rd241, %r463; 2026-02-21T09:32:58.1866683Z cvt.u64.u32 %rd242, %r464; 2026-02-21T09:32:58.1866763Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:32:58.1866852Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:32:58.1867010Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1867068Z mov.b64 {%r809, %r810}, %rd244; 2026-02-21T09:32:58.1867136Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T09:32:58.1867292Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1867347Z cvt.u64.u32 %rd245, %r465; 2026-02-21T09:32:58.1867408Z cvt.u64.u32 %rd246, %r466; 2026-02-21T09:32:58.1867462Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:32:58.1867519Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:32:58.1867678Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1867740Z mov.b64 {%r812, %r813}, %rd248; 2026-02-21T09:32:58.1867799Z cvt.rn.f16x2.f32 %r814, %r813, %r812; 2026-02-21T09:32:58.1867963Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1868027Z cvt.u64.u32 %rd249, %r467; 2026-02-21T09:32:58.1868085Z cvt.u64.u32 %rd250, %r468; 2026-02-21T09:32:58.1868140Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:32:58.1868202Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:32:58.1868357Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1868413Z mov.b64 {%r815, %r816}, %rd252; 2026-02-21T09:32:58.1868474Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T09:32:58.1868638Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1868695Z cvt.u64.u32 %rd253, %r469; 2026-02-21T09:32:58.1868749Z cvt.u64.u32 %rd254, %r470; 
2026-02-21T09:32:58.1868812Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:32:58.1868868Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:32:58.1869024Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1869089Z mov.b64 {%r818, %r819}, %rd256; 2026-02-21T09:32:58.1869207Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T09:32:58.1869365Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1869421Z cvt.u64.u32 %rd257, %r471; 2026-02-21T09:32:58.1869485Z cvt.u64.u32 %rd258, %r472; 2026-02-21T09:32:58.1869541Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:32:58.1869597Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:32:58.1869764Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1869821Z mov.b64 {%r821, %r822}, %rd260; 2026-02-21T09:32:58.1869883Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T09:32:58.1870047Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1870105Z cvt.u64.u32 %rd261, %r474; 2026-02-21T09:32:58.1870164Z cvt.u64.u32 %rd262, %r475; 2026-02-21T09:32:58.1870223Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:32:58.1870291Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:32:58.1870448Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1870504Z mov.b64 {%r824, %r825}, %rd264; 2026-02-21T09:32:58.1870571Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T09:32:58.1870728Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1870784Z cvt.u64.u32 %rd265, %r476; 2026-02-21T09:32:58.1870855Z cvt.u64.u32 %rd266, %r477; 2026-02-21T09:32:58.1870915Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:32:58.1870974Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:32:58.1871133Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1871203Z mov.b64 {%r827, %r828}, %rd268; 
2026-02-21T09:32:58.1871286Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T09:32:58.1871470Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1871541Z cvt.u64.u32 %rd269, %r478; 2026-02-21T09:32:58.1871597Z cvt.u64.u32 %rd270, %r479; 2026-02-21T09:32:58.1871654Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:32:58.1871719Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:32:58.1871879Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1871937Z mov.b64 {%r830, %r831}, %rd272; 2026-02-21T09:32:58.1871998Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T09:32:58.1872163Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1872220Z cvt.u64.u32 %rd273, %r480; 2026-02-21T09:32:58.1872276Z cvt.u64.u32 %rd274, %r481; 2026-02-21T09:32:58.1872338Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:32:58.1872396Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:32:58.1872560Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1872627Z mov.b64 {%r833, %r834}, %rd276; 2026-02-21T09:32:58.1872688Z cvt.rn.f16x2.f32 %r835, %r834, %r833; 2026-02-21T09:32:58.1872844Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1872899Z cvt.u64.u32 %rd277, %r482; 2026-02-21T09:32:58.1872964Z cvt.u64.u32 %rd278, %r483; 2026-02-21T09:32:58.1873021Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:32:58.1873077Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:32:58.1873241Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1873298Z mov.b64 {%r836, %r837}, %rd280; 2026-02-21T09:32:58.1873358Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T09:32:58.1873522Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1873581Z cvt.u64.u32 %rd281, %r484; 2026-02-21T09:32:58.1873657Z cvt.u64.u32 %rd282, %r485; 
2026-02-21T09:32:58.1873732Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:32:58.1873795Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:32:58.1873954Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1874010Z mov.b64 {%r839, %r840}, %rd284; 2026-02-21T09:32:58.1874076Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T09:32:58.1874235Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1874290Z cvt.u64.u32 %rd285, %r486; 2026-02-21T09:32:58.1874353Z cvt.u64.u32 %rd286, %r487; 2026-02-21T09:32:58.1874408Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:32:58.1874462Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:32:58.1874620Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1874713Z mov.b64 {%r842, %r843}, %rd288; 2026-02-21T09:32:58.1874776Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T09:32:58.1874934Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1874995Z cvt.u64.u32 %rd289, %r488; 2026-02-21T09:32:58.1875050Z cvt.u64.u32 %rd290, %r489; 2026-02-21T09:32:58.1875105Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:32:58.1875159Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:32:58.1875325Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1875381Z mov.b64 {%r845, %r846}, %rd292; 2026-02-21T09:32:58.1875439Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T09:32:58.1875605Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1875660Z cvt.u64.u32 %rd293, %r491; 2026-02-21T09:32:58.1875714Z cvt.u64.u32 %rd294, %r492; 2026-02-21T09:32:58.1875803Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:32:58.1875882Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:32:58.1876049Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1876116Z mov.b64 {%r848, %r849}, %rd296; 
2026-02-21T09:32:58.1876177Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:32:58.1876341Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1876395Z cvt.u64.u32 %rd297, %r493; 2026-02-21T09:32:58.1876458Z cvt.u64.u32 %rd298, %r494; 2026-02-21T09:32:58.1876513Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:32:58.1876570Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:32:58.1876737Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1876793Z mov.b64 {%r851, %r852}, %rd300; 2026-02-21T09:32:58.1876853Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:32:58.1877022Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1877082Z cvt.u64.u32 %rd301, %r495; 2026-02-21T09:32:58.1877139Z cvt.u64.u32 %rd302, %r496; 2026-02-21T09:32:58.1877194Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:32:58.1877257Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:32:58.1877417Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1877474Z mov.b64 {%r854, %r855}, %rd304; 2026-02-21T09:32:58.1877541Z cvt.rn.f16x2.f32 %r856, %r855, %r854; 2026-02-21T09:32:58.1877700Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1877755Z cvt.u64.u32 %rd305, %r497; 2026-02-21T09:32:58.1877819Z cvt.u64.u32 %rd306, %r498; 2026-02-21T09:32:58.1877875Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:32:58.1877931Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:32:58.1878092Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1878180Z mov.b64 {%r857, %r858}, %rd308; 2026-02-21T09:32:58.1878264Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:32:58.1878424Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1878488Z cvt.u64.u32 %rd309, %r499; 2026-02-21T09:32:58.1878543Z cvt.u64.u32 %rd310, %r500; 
2026-02-21T09:32:58.1878600Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:32:58.1878658Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:32:58.1878832Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1878889Z mov.b64 {%r860, %r861}, %rd312; 2026-02-21T09:32:58.1878951Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:32:58.1879119Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1879175Z cvt.u64.u32 %rd313, %r501; 2026-02-21T09:32:58.1879232Z cvt.u64.u32 %rd314, %r502; 2026-02-21T09:32:58.1879295Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:32:58.1879352Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:32:58.1879511Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1879575Z mov.b64 {%r863, %r864}, %rd316; 2026-02-21T09:32:58.1879636Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:32:58.1879795Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1879850Z cvt.u64.u32 %rd317, %r503; 2026-02-21T09:32:58.1879912Z cvt.u64.u32 %rd318, %r504; 2026-02-21T09:32:58.1879969Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:32:58.1880024Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:32:58.1880190Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1880248Z mov.b64 {%r866, %r867}, %rd320; 2026-02-21T09:32:58.1880342Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:32:58.1880509Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1880566Z cvt.u64.u32 %rd321, %r505; 2026-02-21T09:32:58.1880622Z cvt.u64.u32 %rd322, %r506; 2026-02-21T09:32:58.1880677Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:32:58.1880740Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:32:58.1880896Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1880954Z mov.b64 {%r869, %r870}, %rd324; 
2026-02-21T09:32:58.1881022Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:32:58.1881181Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1881236Z cvt.u64.u32 %rd325, %r508; 2026-02-21T09:32:58.1881299Z cvt.u64.u32 %rd326, %r509; 2026-02-21T09:32:58.1881355Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:32:58.1881410Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:32:58.1881571Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1881640Z mov.b64 {%r872, %r873}, %rd328; 2026-02-21T09:32:58.1881701Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:32:58.1881858Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1881920Z cvt.u64.u32 %rd329, %r510; 2026-02-21T09:32:58.1881977Z cvt.u64.u32 %rd330, %r511; 2026-02-21T09:32:58.1882032Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:32:58.1882088Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:32:58.1882252Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1882308Z mov.b64 {%r875, %r876}, %rd332; 2026-02-21T09:32:58.1882368Z cvt.rn.f16x2.f32 %r877, %r876, %r875; 2026-02-21T09:32:58.1882535Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1882593Z cvt.u64.u32 %rd333, %r512; 2026-02-21T09:32:58.1882683Z cvt.u64.u32 %rd334, %r513; 2026-02-21T09:32:58.1882745Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:32:58.1882801Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:32:58.1882962Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1883018Z mov.b64 {%r878, %r879}, %rd336; 2026-02-21T09:32:58.1883085Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:32:58.1883246Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1883300Z cvt.u64.u32 %rd337, %r514; 2026-02-21T09:32:58.1883363Z cvt.u64.u32 %rd338, %r515; 
2026-02-21T09:32:58.1883419Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:32:58.1883475Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:32:58.1883640Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1883699Z mov.b64 {%r881, %r882}, %rd340; 2026-02-21T09:32:58.1883764Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:32:58.1883937Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1884001Z cvt.u64.u32 %rd341, %r516; 2026-02-21T09:32:58.1884059Z cvt.u64.u32 %rd342, %r517; 2026-02-21T09:32:58.1884116Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:32:58.1884182Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:32:58.1884351Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1884411Z mov.b64 {%r884, %r885}, %rd344; 2026-02-21T09:32:58.1884482Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:32:58.1884650Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1884745Z cvt.u64.u32 %rd345, %r518; 2026-02-21T09:32:58.1884828Z cvt.u64.u32 %rd346, %r519; 2026-02-21T09:32:58.1884928Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:32:58.1884991Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:32:58.1885160Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1885226Z mov.b64 {%r887, %r888}, %rd348; 2026-02-21T09:32:58.1885290Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:32:58.1885456Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1885520Z cvt.u64.u32 %rd349, %r520; 2026-02-21T09:32:58.1885577Z cvt.u64.u32 %rd350, %r521; 2026-02-21T09:32:58.1885635Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:32:58.1885694Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:32:58.1885867Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1885925Z mov.b64 {%r890, %r891}, %rd352; 
2026-02-21T09:32:58.1885991Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:32:58.1886168Z .loc 1 53 52 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:53:52 2026-02-21T09:32:58.1886230Z cvt.u64.u32 %rd353, %r522; 2026-02-21T09:32:58.1886289Z cvt.u64.u32 %rd354, %r523; 2026-02-21T09:32:58.1886356Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:32:58.1886415Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:32:58.1886579Z .loc 1 55 27 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:55:27 2026-02-21T09:32:58.1886639Z mov.b64 {%r893, %r894}, %rd356; 2026-02-21T09:32:58.1886708Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:32:58.1886876Z .loc 1 56 82 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:56:82 2026-02-21T09:32:58.1886974Z st.shared.v4.b32 [%r11], {%r706, %r718, %r730, %r742}; 2026-02-21T09:32:58.1887078Z st.shared.v4.b32 [%r12], {%r754, %r766, %r778, %r790}; 2026-02-21T09:32:58.1887167Z st.shared.v4.b32 [%r13], {%r802, %r814, %r826, %r838}; 2026-02-21T09:32:58.1887256Z st.shared.v4.b32 [%r14], {%r850, %r862, %r874, %r886}; 2026-02-21T09:32:58.1887363Z bar.sync 0; 2026-02-21T09:32:58.1887458Z // begin inline asm 2026-02-21T09:32:58.1887620Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r605, %r609, %r613, %r617}, [%r529]; 2026-02-21T09:32:58.1887677Z // end inline asm 2026-02-21T09:32:58.1887743Z // begin inline asm 2026-02-21T09:32:58.1887891Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r625, %r629, %r633}, [%r534]; 2026-02-21T09:32:58.1887946Z // end inline asm 2026-02-21T09:32:58.1888008Z // begin inline asm 2026-02-21T09:32:58.1888152Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r641, %r645, %r649}, [%r539]; 2026-02-21T09:32:58.1888208Z // end inline asm 2026-02-21T09:32:58.1888270Z // begin inline asm 2026-02-21T09:32:58.1888414Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r657, %r661, %r665}, [%r544]; 2026-02-21T09:32:58.1888470Z // end inline asm 2026-02-21T09:32:58.1888525Z bar.sync 0; 
2026-02-21T09:32:58.1888622Z st.shared.v4.b32 [%r11], {%r709, %r721, %r733, %r745}; 2026-02-21T09:32:58.1888709Z st.shared.v4.b32 [%r12], {%r757, %r769, %r781, %r793}; 2026-02-21T09:32:58.1888796Z st.shared.v4.b32 [%r13], {%r805, %r817, %r829, %r841}; 2026-02-21T09:32:58.1888887Z st.shared.v4.b32 [%r14], {%r853, %r865, %r877, %r889}; 2026-02-21T09:32:58.1888943Z bar.sync 0; 2026-02-21T09:32:58.1888999Z // begin inline asm 2026-02-21T09:32:58.1889144Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r606, %r610, %r614, %r618}, [%r529]; 2026-02-21T09:32:58.1889209Z // end inline asm 2026-02-21T09:32:58.1889264Z // begin inline asm 2026-02-21T09:32:58.1889408Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r622, %r626, %r630, %r634}, [%r534]; 2026-02-21T09:32:58.1889471Z // end inline asm 2026-02-21T09:32:58.1889526Z // begin inline asm 2026-02-21T09:32:58.1889667Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r638, %r642, %r646, %r650}, [%r539]; 2026-02-21T09:32:58.1889730Z // end inline asm 2026-02-21T09:32:58.1889805Z // begin inline asm 2026-02-21T09:32:58.1889966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r654, %r658, %r662, %r666}, [%r544]; 2026-02-21T09:32:58.1890025Z // end inline asm 2026-02-21T09:32:58.1890088Z bar.sync 0; 2026-02-21T09:32:58.1890176Z st.shared.v4.b32 [%r11], {%r712, %r724, %r736, %r748}; 2026-02-21T09:32:58.1890262Z st.shared.v4.b32 [%r12], {%r760, %r772, %r784, %r796}; 2026-02-21T09:32:58.1890355Z st.shared.v4.b32 [%r13], {%r808, %r820, %r832, %r844}; 2026-02-21T09:32:58.1890439Z st.shared.v4.b32 [%r14], {%r856, %r868, %r880, %r892}; 2026-02-21T09:32:58.1890493Z bar.sync 0; 2026-02-21T09:32:58.1890558Z // begin inline asm 2026-02-21T09:32:58.1890704Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r607, %r611, %r615, %r619}, [%r529]; 2026-02-21T09:32:58.1890760Z // end inline asm 2026-02-21T09:32:58.1890816Z // begin inline asm 2026-02-21T09:32:58.1890965Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r623, %r627, %r631, %r635}, [%r534]; 
2026-02-21T09:32:58.1891021Z // end inline asm 2026-02-21T09:32:58.1891078Z // begin inline asm 2026-02-21T09:32:58.1891231Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r639, %r643, %r647, %r651}, [%r539]; 2026-02-21T09:32:58.1891288Z // end inline asm 2026-02-21T09:32:58.1891344Z // begin inline asm 2026-02-21T09:32:58.1891484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r655, %r659, %r663, %r667}, [%r544]; 2026-02-21T09:32:58.1891547Z // end inline asm 2026-02-21T09:32:58.1891601Z bar.sync 0; 2026-02-21T09:32:58.1891686Z st.shared.v4.b32 [%r11], {%r715, %r727, %r739, %r751}; 2026-02-21T09:32:58.1891775Z st.shared.v4.b32 [%r12], {%r763, %r775, %r787, %r799}; 2026-02-21T09:32:58.1891860Z st.shared.v4.b32 [%r13], {%r811, %r823, %r835, %r847}; 2026-02-21T09:32:58.1891943Z st.shared.v4.b32 [%r14], {%r859, %r871, %r883, %r895}; 2026-02-21T09:32:58.1892006Z bar.sync 0; 2026-02-21T09:32:58.1892062Z // begin inline asm 2026-02-21T09:32:58.1892205Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r608, %r612, %r616, %r620}, [%r529]; 2026-02-21T09:32:58.1892261Z // end inline asm 2026-02-21T09:32:58.1892325Z // begin inline asm 2026-02-21T09:32:58.1892497Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r624, %r628, %r632, %r636}, [%r534]; 2026-02-21T09:32:58.1892568Z // end inline asm 2026-02-21T09:32:58.1892628Z // begin inline asm 2026-02-21T09:32:58.1892763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r640, %r644, %r648, %r652}, [%r539]; 2026-02-21T09:32:58.1892816Z // end inline asm 2026-02-21T09:32:58.1892869Z // begin inline asm 2026-02-21T09:32:58.1893010Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r656, %r660, %r664, %r668}, [%r544]; 2026-02-21T09:32:58.1893063Z // end inline asm 2026-02-21T09:32:58.1893115Z // begin inline asm 2026-02-21T09:32:58.1893222Z st.global.v4.b32 [ %rd85 + 0 ], { %r605, %r606, %r607, %r608 }; 2026-02-21T09:32:58.1893276Z // end inline asm 2026-02-21T09:32:58.1893328Z // begin inline asm 2026-02-21T09:32:58.1893430Z st.global.v4.b32 [ %rd86 + 0 ], { 
%r609, %r610, %r611, %r612 }; 2026-02-21T09:32:58.1893485Z // end inline asm 2026-02-21T09:32:58.1893540Z // begin inline asm 2026-02-21T09:32:58.1893634Z st.global.v4.b32 [ %rd87 + 0 ], { %r613, %r614, %r615, %r616 }; 2026-02-21T09:32:58.1893694Z // end inline asm 2026-02-21T09:32:58.1893746Z // begin inline asm 2026-02-21T09:32:58.1893836Z st.global.v4.b32 [ %rd88 + 0 ], { %r617, %r618, %r619, %r620 }; 2026-02-21T09:32:58.1893896Z // end inline asm 2026-02-21T09:32:58.1893948Z // begin inline asm 2026-02-21T09:32:58.1894039Z st.global.v4.b32 [ %rd89 + 0 ], { %r621, %r622, %r623, %r624 }; 2026-02-21T09:32:58.1894090Z // end inline asm 2026-02-21T09:32:58.1894150Z // begin inline asm 2026-02-21T09:32:58.1894241Z st.global.v4.b32 [ %rd90 + 0 ], { %r625, %r626, %r627, %r628 }; 2026-02-21T09:32:58.1894293Z // end inline asm 2026-02-21T09:32:58.1894355Z // begin inline asm 2026-02-21T09:32:58.1894444Z st.global.v4.b32 [ %rd91 + 0 ], { %r629, %r630, %r631, %r632 }; 2026-02-21T09:32:58.1894497Z // end inline asm 2026-02-21T09:32:58.1894567Z // begin inline asm 2026-02-21T09:32:58.1894738Z st.global.v4.b32 [ %rd92 + 0 ], { %r633, %r634, %r635, %r636 }; 2026-02-21T09:32:58.1894794Z // end inline asm 2026-02-21T09:32:58.1894850Z // begin inline asm 2026-02-21T09:32:58.1894949Z st.global.v4.b32 [ %rd93 + 0 ], { %r637, %r638, %r639, %r640 }; 2026-02-21T09:32:58.1895001Z // end inline asm 2026-02-21T09:32:58.1895054Z // begin inline asm 2026-02-21T09:32:58.1895155Z st.global.v4.b32 [ %rd94 + 0 ], { %r641, %r642, %r643, %r644 }; 2026-02-21T09:32:58.1895209Z // end inline asm 2026-02-21T09:32:58.1895266Z // begin inline asm 2026-02-21T09:32:58.1895357Z st.global.v4.b32 [ %rd95 + 0 ], { %r645, %r646, %r647, %r648 }; 2026-02-21T09:32:58.1895419Z // end inline asm 2026-02-21T09:32:58.1895475Z // begin inline asm 2026-02-21T09:32:58.1895565Z st.global.v4.b32 [ %rd96 + 0 ], { %r649, %r650, %r651, %r652 }; 2026-02-21T09:32:58.1895624Z // end inline asm 2026-02-21T09:32:58.1895680Z // 
begin inline asm 2026-02-21T09:32:58.1895770Z st.global.v4.b32 [ %rd97 + 0 ], { %r653, %r654, %r655, %r656 }; 2026-02-21T09:32:58.1895823Z // end inline asm 2026-02-21T09:32:58.1895885Z // begin inline asm 2026-02-21T09:32:58.1895976Z st.global.v4.b32 [ %rd98 + 0 ], { %r657, %r658, %r659, %r660 }; 2026-02-21T09:32:58.1896029Z // end inline asm 2026-02-21T09:32:58.1896096Z // begin inline asm 2026-02-21T09:32:58.1896189Z st.global.v4.b32 [ %rd99 + 0 ], { %r661, %r662, %r663, %r664 }; 2026-02-21T09:32:58.1896244Z // end inline asm 2026-02-21T09:32:58.1896299Z // begin inline asm 2026-02-21T09:32:58.1896413Z st.global.v4.b32 [ %rd100 + 0 ], { %r665, %r666, %r667, %r668 }; 2026-02-21T09:32:58.1896467Z // end inline asm 2026-02-21T09:32:58.1896547Z $L__BB0_8: // %._crit_edge 2026-02-21T09:32:58.1896715Z .loc 1 27 4 // capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py:27:4 2026-02-21T09:32:58.1896768Z bar.sync 0; 2026-02-21T09:32:58.1896821Z // begin inline asm 2026-02-21T09:32:58.1896944Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r896, 128; 2026-02-21T09:32:58.1896998Z // end inline asm 2026-02-21T09:32:58.1897077Z ret; 2026-02-21T09:32:58.1897159Z $L__tmp0: 2026-02-21T09:32:58.1897221Z $L__func_end0: 2026-02-21T09:32:58.1897301Z // -- End function 2026-02-21T09:32:58.1897349Z } 2026-02-21T09:32:58.1897560Z .file 1 "/tmp/torchinductor_root/ap/capajs43ux4kyzdpdk6xn4j57m6dkhxsueb3wukp564tbxjcbqc7.py" 2026-02-21T09:32:58.1897621Z .section .debug_abbrev 2026-02-21T09:32:58.1897671Z { 2026-02-21T09:32:58.1897762Z .b8 1 // Abbreviation Code 2026-02-21T09:32:58.1897847Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:32:58.1897925Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:32:58.1898001Z .b8 37 // DW_AT_producer 2026-02-21T09:32:58.1898082Z .b8 8 // DW_FORM_string 2026-02-21T09:32:58.1898153Z .b8 19 // DW_AT_language 2026-02-21T09:32:58.1898226Z .b8 5 // DW_FORM_data2 2026-02-21T09:32:58.1898310Z .b8 3 // DW_AT_name 2026-02-21T09:32:58.1898382Z .b8 8 // DW_FORM_string 
2026-02-21T09:32:58.1898459Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:32:58.1898537Z .b8 6 // DW_FORM_data4 2026-02-21T09:32:58.1898609Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:32:58.1898677Z .b8 8 // DW_FORM_string 2026-02-21T09:32:58.1898744Z .b8 0 // EOM(1) 2026-02-21T09:32:58.1898819Z .b8 0 // EOM(2) 2026-02-21T09:32:58.1898883Z .b8 0 // EOM(3) 2026-02-21T09:32:58.1898931Z } 2026-02-21T09:32:58.1898996Z .section .debug_info 2026-02-21T09:32:58.1899043Z { 2026-02-21T09:32:58.1899171Z .b32 104 // Length of Unit 2026-02-21T09:32:58.1899262Z .b8 2 // DWARF version number 2026-02-21T09:32:58.1899315Z .b8 0 2026-02-21T09:32:58.1899426Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:32:58.1899510Z .b8 8 // Address Size (in bytes) 2026-02-21T09:32:58.1899613Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:32:58.1899688Z .b8 116 // DW_AT_producer 2026-02-21T09:32:58.1899739Z .b8 114 2026-02-21T09:32:58.1899795Z .b8 105 2026-02-21T09:32:58.1899844Z .b8 116 2026-02-21T09:32:58.1899891Z .b8 111 2026-02-21T09:32:58.1899939Z .b8 110 2026-02-21T09:32:58.1899994Z .b8 0 2026-02-21T09:32:58.1900065Z .b8 2 // DW_AT_language 2026-02-21T09:32:58.1900112Z .b8 0 2026-02-21T09:32:58.1900190Z .b8 99 // DW_AT_name 2026-02-21T09:32:58.1900241Z .b8 97 2026-02-21T09:32:58.1900291Z .b8 112 2026-02-21T09:32:58.1900341Z .b8 97 2026-02-21T09:32:58.1900398Z .b8 106 2026-02-21T09:32:58.1900447Z .b8 115 2026-02-21T09:32:58.1900495Z .b8 52 2026-02-21T09:32:58.1900548Z .b8 51 2026-02-21T09:32:58.1900596Z .b8 117 2026-02-21T09:32:58.1900645Z .b8 120 2026-02-21T09:32:58.1900691Z .b8 52 2026-02-21T09:32:58.1900746Z .b8 107 2026-02-21T09:32:58.1900794Z .b8 121 2026-02-21T09:32:58.1900842Z .b8 122 2026-02-21T09:32:58.1900897Z .b8 100 2026-02-21T09:32:58.1900945Z .b8 112 2026-02-21T09:32:58.1900993Z .b8 100 2026-02-21T09:32:58.1901040Z .b8 107 2026-02-21T09:32:58.1901094Z .b8 54 2026-02-21T09:32:58.1901142Z .b8 120 2026-02-21T09:32:58.1901189Z .b8 110 
2026-02-21T09:32:58.1901238Z .b8 52 2026-02-21T09:32:58.1901296Z .b8 106 2026-02-21T09:32:58.1901344Z .b8 53 2026-02-21T09:32:58.1901392Z .b8 55 2026-02-21T09:32:58.1901447Z .b8 109 2026-02-21T09:32:58.1901494Z .b8 54 2026-02-21T09:32:58.1901542Z .b8 100 2026-02-21T09:32:58.1901590Z .b8 107 2026-02-21T09:32:58.1901647Z .b8 104 2026-02-21T09:32:58.1901697Z .b8 120 2026-02-21T09:32:58.1901771Z .b8 115 2026-02-21T09:32:58.1901830Z .b8 117 2026-02-21T09:32:58.1901903Z .b8 101 2026-02-21T09:32:58.1901954Z .b8 98 2026-02-21T09:32:58.1902003Z .b8 51 2026-02-21T09:32:58.1902065Z .b8 119 2026-02-21T09:32:58.1902114Z .b8 117 2026-02-21T09:32:58.1902162Z .b8 107 2026-02-21T09:32:58.1902210Z .b8 112 2026-02-21T09:32:58.1902266Z .b8 53 2026-02-21T09:32:58.1902314Z .b8 54 2026-02-21T09:32:58.1902361Z .b8 52 2026-02-21T09:32:58.1902415Z .b8 116 2026-02-21T09:32:58.1902464Z .b8 98 2026-02-21T09:32:58.1902511Z .b8 120 2026-02-21T09:32:58.1902559Z .b8 106 2026-02-21T09:32:58.1902616Z .b8 99 2026-02-21T09:32:58.1902665Z .b8 98 2026-02-21T09:32:58.1902713Z .b8 113 2026-02-21T09:32:58.1902767Z .b8 99 2026-02-21T09:32:58.1902816Z .b8 55 2026-02-21T09:32:58.1902864Z .b8 46 2026-02-21T09:32:58.1902912Z .b8 112 2026-02-21T09:32:58.1902971Z .b8 121 2026-02-21T09:32:58.1903020Z .b8 0 2026-02-21T09:32:58.1903109Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:32:58.1903188Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:32:58.1903239Z .b8 116 2026-02-21T09:32:58.1903288Z .b8 109 2026-02-21T09:32:58.1903336Z .b8 112 2026-02-21T09:32:58.1903391Z .b8 47 2026-02-21T09:32:58.1903438Z .b8 116 2026-02-21T09:32:58.1903487Z .b8 111 2026-02-21T09:32:58.1903541Z .b8 114 2026-02-21T09:32:58.1903588Z .b8 99 2026-02-21T09:32:58.1903636Z .b8 104 2026-02-21T09:32:58.1903683Z .b8 105 2026-02-21T09:32:58.1903740Z .b8 110 2026-02-21T09:32:58.1903788Z .b8 100 2026-02-21T09:32:58.1903836Z .b8 117 2026-02-21T09:32:58.1903882Z .b8 99 2026-02-21T09:32:58.1903939Z .b8 116 2026-02-21T09:32:58.1903987Z .b8 111 
2026-02-21T09:32:58.1904034Z .b8 114 2026-02-21T09:32:58.1904089Z .b8 95 2026-02-21T09:32:58.1904137Z .b8 114 2026-02-21T09:32:58.1904185Z .b8 111 2026-02-21T09:32:58.1904232Z .b8 111 2026-02-21T09:32:58.1904289Z .b8 116 2026-02-21T09:32:58.1904336Z .b8 47 2026-02-21T09:32:58.1904383Z .b8 97 2026-02-21T09:32:58.1904474Z .b8 112 2026-02-21T09:32:58.1904524Z .b8 0 2026-02-21T09:32:58.1904587Z } 2026-02-21T09:32:58.1904654Z .section .debug_macinfo { } 2026-02-21T09:32:58.1904659Z 2026-02-21T09:32:58.1904770Z ================================================================ 2026-02-21T09:32:58.1904868Z please share the reproducer above with Triton project. 2026-02-21T09:32:58.9105690Z 2026-02-21T09:32:58.9110417Z 2026-02-21T09:32:58.9111884Z 2026-02-21T09:32:58.9112380Z ================================================================ 2026-02-21T09:32:58.9112667Z Internal Triton PTX codegen error 2026-02-21T09:32:58.9112839Z `ptxas` stderr: 2026-02-21T09:32:58.9113275Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 140 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:32:58.9113749Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:32:58.9113907Z 2026-02-21T09:32:58.9114322Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmprar_kkzk.ptx -o /tmp/tmprar_kkzk.ptx.o 2026-02-21T09:32:58.9114948Z 2026-02-21T09:32:58.9114960Z 2026-02-21T09:32:58.9115017Z // 2026-02-21T09:32:58.9115156Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:32:58.9115331Z // 2026-02-21T09:32:58.9115396Z 2026-02-21T09:32:58.9115451Z .version 8.7 2026-02-21T09:32:58.9115593Z .target sm_100a 2026-02-21T09:32:58.9115764Z .address_size 64 2026-02-21T09:32:58.9115849Z 2026-02-21T09:32:58.9115983Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:32:58.9116233Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:32:58.9116446Z // @_helion_matmul 2026-02-21T09:32:58.9116640Z .visible .entry _helion_matmul( 2026-02-21T09:32:58.9116862Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:32:58.9117112Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:32:58.9117379Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:32:58.9117869Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:32:58.9118176Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:32:58.9118385Z ) 2026-02-21T09:32:58.9118502Z .reqntid 384 2026-02-21T09:32:58.9118632Z .maxnreg 32 2026-02-21T09:32:58.9118746Z { 2026-02-21T09:32:58.9118870Z .reg .pred %p<123>; 2026-02-21T09:32:58.9119012Z .reg .b32 %r<1227>; 2026-02-21T09:32:58.9119151Z .reg .b64 %rd<401>; 2026-02-21T09:32:58.9119412Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19:0 2026-02-21T09:32:58.9119694Z $L__func_begin0: 2026-02-21T09:32:58.9119936Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19:0 2026-02-21T09:32:58.9120160Z 
2026-02-21T09:32:58.9120213Z // %bb.0: 2026-02-21T09:32:58.9120368Z ld.param.b64 %rd13, [_helion_matmul_param_3]; 2026-02-21T09:32:58.9120557Z $L__tmp0: 2026-02-21T09:32:58.9120794Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19 2026-02-21T09:32:58.9121074Z mov.u32 %r1, %tid.x; 2026-02-21T09:32:58.9121214Z shr.u32 %r2, %r1, 5; 2026-02-21T09:32:58.9121369Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:32:58.9121550Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:32:58.9121703Z @%p3 bra $L__BB0_22; 2026-02-21T09:32:58.9121840Z bra.uni $L__BB0_1; 2026-02-21T09:32:58.9121989Z $L__BB0_22: 2026-02-21T09:32:58.9122565Z [130s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:32:58.9123970Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=3, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:32:58.9125278Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:32:58.9125533Z `ptxas` stderr: 2026-02-21T09:32:58.9125960Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 140 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:32:58.9126448Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:32:58.9126598Z 2026-02-21T09:32:58.9126988Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmprar_kkzk.ptx -o /tmp/tmprar_kkzk.ptx.o 2026-02-21T09:32:58.9127457Z 2026-02-21T09:32:58.9127587Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:32:58.9127956Z .loc 1 0 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:0 2026-02-21T09:32:58.9128276Z ld.param.b64 %rd10, [_helion_matmul_param_0]; 2026-02-21T09:32:58.9128592Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19 2026-02-21T09:32:58.9128901Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T09:32:58.9129109Z setp.lt.u32 %p61, %r1, 32; 2026-02-21T09:32:58.9129287Z mov.b32 %r418, global_smem; 2026-02-21T09:32:58.9129449Z // begin inline asm 2026-02-21T09:32:58.9129711Z @%p61 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r418], 256; 2026-02-21T09:32:58.9129957Z // end inline asm 2026-02-21T09:32:58.9130097Z bar.sync 0, 128; 2026-02-21T09:32:58.9130243Z ld.shared.b32 %r1174, [global_smem]; 2026-02-21T09:32:58.9130419Z bar.sync 0, 128; 2026-02-21T09:32:58.9130549Z // begin inline asm 2026-02-21T09:32:58.9130754Z @%p61 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:32:58.9130982Z // end inline asm 2026-02-21T09:32:58.9131228Z .loc 1 21 68 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:21:68 2026-02-21T09:32:58.9131559Z mov.u32 %r104, %ctaid.x; 2026-02-21T09:32:58.9131750Z mov.u32 %r579, %ctaid.y; 2026-02-21T09:32:58.9131901Z mov.u32 %r580, %ctaid.z; 2026-02-21T09:32:58.9132046Z mov.u32 %r581, %nctaid.x; 2026-02-21T09:32:58.9132200Z mov.u32 %r582, %nctaid.y; 2026-02-21T09:32:58.9132352Z mad.lo.s32 %r583, %r580, %r582, %r579; 2026-02-21T09:32:58.9132530Z mad.lo.s32 %r584, %r583, %r581, %r104; 2026-02-21T09:32:58.9132702Z 
shl.b32 %r585, %r584, 7; 2026-02-21T09:32:58.9132847Z cvt.s64.s32 %rd128, %r585; 2026-02-21T09:32:58.9133007Z add.s64 %rd125, %rd13, %rd128; 2026-02-21T09:32:58.9133162Z shl.b32 %r586, %r1, 2; 2026-02-21T09:32:58.9133317Z add.s32 %r419, %r418, %r586; 2026-02-21T09:32:58.9133467Z mov.b32 %r1224, 0; 2026-02-21T09:32:58.9133606Z // begin inline asm 2026-02-21T09:32:58.9133756Z @%p61 st.shared.b32 [ %r419 + 0 ], %r1224; 2026-02-21T09:32:58.9133934Z // end inline asm 2026-02-21T09:32:58.9134069Z bar.warp.sync -1; 2026-02-21T09:32:58.9134219Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:32:58.9134380Z cvt.u64.u32 %rd110, %r418; 2026-02-21T09:32:58.9134528Z // begin inline asm 2026-02-21T09:32:58.9134815Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd110 + 0 ], %rd10; 2026-02-21T09:32:58.9135103Z // end inline asm 2026-02-21T09:32:58.9135242Z // begin inline asm 2026-02-21T09:32:58.9135468Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x1; 2026-02-21T09:32:58.9135735Z // end inline asm 2026-02-21T09:32:58.9135867Z mov.b32 %r421, 64; 2026-02-21T09:32:58.9136010Z // begin inline asm 2026-02-21T09:32:58.9136249Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x0, %r421; 2026-02-21T09:32:58.9136513Z // end inline asm 2026-02-21T09:32:58.9136655Z mov.b32 %r422, 128; 2026-02-21T09:32:58.9136799Z // begin inline asm 2026-02-21T09:32:58.9137084Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x1, %r422; 2026-02-21T09:32:58.9137374Z // end inline asm 2026-02-21T09:32:58.9137516Z mov.b32 %r423, 1024; 2026-02-21T09:32:58.9137650Z // begin inline asm 2026-02-21T09:32:58.9137896Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x0, %r423; 2026-02-21T09:32:58.9138171Z // end inline asm 2026-02-21T09:32:58.9138302Z mov.b32 %r424, 12288; 2026-02-21T09:32:58.9138448Z // begin inline asm 2026-02-21T09:32:58.9138683Z @%p111 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x1, %r424; 2026-02-21T09:32:58.9138961Z // end inline asm 2026-02-21T09:32:58.9139089Z mov.b64 %rd118, 2048; 2026-02-21T09:32:58.9139235Z // begin inline asm 2026-02-21T09:32:58.9139491Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd110 + 0 ], 0x0, %rd118; 2026-02-21T09:32:58.9139780Z // end inline asm 2026-02-21T09:32:58.9139914Z mov.b32 %r425, 1; 2026-02-21T09:32:58.9140044Z // begin inline asm 2026-02-21T09:32:58.9140302Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x0, %r425; 2026-02-21T09:32:58.9140585Z // end inline asm 2026-02-21T09:32:58.9140721Z // begin inline asm 2026-02-21T09:32:58.9140973Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x1, %r425; 2026-02-21T09:32:58.9141258Z // end inline asm 2026-02-21T09:32:58.9141390Z // begin inline asm 2026-02-21T09:32:58.9141615Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x6; 2026-02-21T09:32:58.9141878Z // end inline asm 2026-02-21T09:32:58.9142004Z // begin inline asm 2026-02-21T09:32:58.9142253Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x0; 2026-02-21T09:32:58.9142527Z // end inline asm 2026-02-21T09:32:58.9142659Z // begin inline asm 2026-02-21T09:32:58.9142893Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x3; 2026-02-21T09:32:58.9143157Z // end inline asm 2026-02-21T09:32:58.9143291Z // begin inline asm 2026-02-21T09:32:58.9143517Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd110 + 0 ], 0x0; 2026-02-21T09:32:58.9143843Z // end inline asm 2026-02-21T09:32:58.9143969Z // begin inline asm 2026-02-21T09:32:58.9144321Z @%p61 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd125 + 0 ], [ %rd110 + 0 ], 0x80; 2026-02-21T09:32:58.9144748Z // end inline asm 
2026-02-21T09:32:58.9144877Z // begin inline asm 2026-02-21T09:32:58.9145091Z @%p61 fence.proxy.tensormap::generic.acquire.gpu [ %rd125 + 0 ], 0x80; 2026-02-21T09:32:58.9145350Z @%p61 cp.async.bulk.commit_group ; 2026-02-21T09:32:58.9145550Z @%p61 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:32:58.9145724Z // end inline asm 2026-02-21T09:32:58.9145860Z bar.sync 0, 128; 2026-02-21T09:32:58.9146114Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9146414Z max.u32 %r589, %r104, 767; 2026-02-21T09:32:58.9146580Z shl.b32 %r590, %r589, 4; 2026-02-21T09:32:58.9146736Z sub.s32 %r122, 12288, %r590; 2026-02-21T09:32:58.9147023Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9147332Z shfl.sync.idx.b32 %r591, %r2, 0, 31, -1; 2026-02-21T09:32:58.9147521Z shl.b32 %r592, %r591, 21; 2026-02-21T09:32:58.9147672Z and.b32 %r593, %r592, 6291456; 2026-02-21T09:32:58.9147838Z add.s32 %r427, %r593, %r1174; 2026-02-21T09:32:58.9147999Z mov.pred %p81, -1; 2026-02-21T09:32:58.9148136Z // begin inline asm 2026-02-21T09:32:58.9148545Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 0], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9148958Z // end inline asm 2026-02-21T09:32:58.9149096Z // begin inline asm 2026-02-21T09:32:58.9149549Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 16], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9149977Z // end inline asm 2026-02-21T09:32:58.9150116Z // begin inline asm 2026-02-21T09:32:58.9150491Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 32], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9150899Z // end inline asm 
2026-02-21T09:32:58.9151028Z // begin inline asm 2026-02-21T09:32:58.9151410Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 48], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9151826Z // end inline asm 2026-02-21T09:32:58.9151954Z // begin inline asm 2026-02-21T09:32:58.9152327Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 64], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9152729Z // end inline asm 2026-02-21T09:32:58.9152869Z // begin inline asm 2026-02-21T09:32:58.9153249Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 80], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9153655Z // end inline asm 2026-02-21T09:32:58.9153788Z // begin inline asm 2026-02-21T09:32:58.9154160Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 96], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9154560Z // end inline asm 2026-02-21T09:32:58.9154747Z // begin inline asm 2026-02-21T09:32:58.9155124Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r427 + 112], {%r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224, %r1224}; 2026-02-21T09:32:58.9155546Z // end inline asm 2026-02-21T09:32:58.9155681Z // begin inline asm 2026-02-21T09:32:58.9155868Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:32:58.9156079Z // end inline asm 2026-02-21T09:32:58.9156211Z bar.sync 0, 128; 2026-02-21T09:32:58.9156454Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9156749Z add.s32 %r563, %r418, 90112; 2026-02-21T09:32:58.9156899Z // begin inline asm 
2026-02-21T09:32:58.9157069Z @%p111 mbarrier.init.shared::cta.b64 [%r563], 1; 2026-02-21T09:32:58.9157254Z // end inline asm 2026-02-21T09:32:58.9157387Z bar.sync 0, 128; 2026-02-21T09:32:58.9157526Z add.s32 %r564, %r418, 90120; 2026-02-21T09:32:58.9157672Z // begin inline asm 2026-02-21T09:32:58.9157838Z @%p111 mbarrier.init.shared::cta.b64 [%r564], 1; 2026-02-21T09:32:58.9158019Z // end inline asm 2026-02-21T09:32:58.9158155Z bar.sync 0, 128; 2026-02-21T09:32:58.9158286Z add.s32 %r565, %r418, 90128; 2026-02-21T09:32:58.9158444Z // begin inline asm 2026-02-21T09:32:58.9158600Z @%p111 mbarrier.init.shared::cta.b64 [%r565], 1; 2026-02-21T09:32:58.9158804Z // end inline asm 2026-02-21T09:32:58.9158944Z add.s32 %r566, %r418, 90144; 2026-02-21T09:32:58.9159089Z // begin inline asm 2026-02-21T09:32:58.9159252Z @%p111 mbarrier.init.shared::cta.b64 [%r566], 1; 2026-02-21T09:32:58.9159427Z // end inline asm 2026-02-21T09:32:58.9159560Z bar.sync 0, 128; 2026-02-21T09:32:58.9159689Z add.s32 %r567, %r418, 90152; 2026-02-21T09:32:58.9159839Z // begin inline asm 2026-02-21T09:32:58.9159996Z @%p111 mbarrier.init.shared::cta.b64 [%r567], 1; 2026-02-21T09:32:58.9160178Z // end inline asm 2026-02-21T09:32:58.9160301Z bar.sync 0, 128; 2026-02-21T09:32:58.9160438Z add.s32 %r568, %r418, 90160; 2026-02-21T09:32:58.9160586Z // begin inline asm 2026-02-21T09:32:58.9160741Z @%p111 mbarrier.init.shared::cta.b64 [%r568], 1; 2026-02-21T09:32:58.9160923Z // end inline asm 2026-02-21T09:32:58.9161220Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9161508Z bar.sync 0, 128; 2026-02-21T09:32:58.9161637Z // begin inline asm 2026-02-21T09:32:58.9161808Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r563]; 2026-02-21T09:32:58.9161995Z // end inline asm 2026-02-21T09:32:58.9162129Z bar.sync 0, 128; 2026-02-21T09:32:58.9162261Z // begin inline asm 2026-02-21T09:32:58.9162425Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r564]; 
2026-02-21T09:32:58.9162620Z // end inline asm 2026-02-21T09:32:58.9162746Z bar.sync 0, 128; 2026-02-21T09:32:58.9162881Z // begin inline asm 2026-02-21T09:32:58.9163041Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r565]; 2026-02-21T09:32:58.9163234Z // end inline asm 2026-02-21T09:32:58.9163472Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9163754Z bar.sync 0, 128; 2026-02-21T09:32:58.9163890Z add.s32 %r572, %r418, 90176; 2026-02-21T09:32:58.9164038Z // begin inline asm 2026-02-21T09:32:58.9164202Z @%p111 mbarrier.init.shared::cta.b64 [%r572], 1; 2026-02-21T09:32:58.9164381Z // end inline asm 2026-02-21T09:32:58.9164515Z bar.sync 0, 128; 2026-02-21T09:32:58.9164643Z add.s32 %r573, %r418, 90184; 2026-02-21T09:32:58.9164829Z // begin inline asm 2026-02-21T09:32:58.9164985Z @%p111 mbarrier.init.shared::cta.b64 [%r573], 1; 2026-02-21T09:32:58.9165171Z // end inline asm 2026-02-21T09:32:58.9165309Z add.s32 %r574, %r418, 90192; 2026-02-21T09:32:58.9165454Z // begin inline asm 2026-02-21T09:32:58.9165622Z @%p111 mbarrier.init.shared::cta.b64 [%r574], 1; 2026-02-21T09:32:58.9165800Z // end inline asm 2026-02-21T09:32:58.9165938Z bar.sync 0, 128; 2026-02-21T09:32:58.9166071Z add.s32 %r575, %r418, 90200; 2026-02-21T09:32:58.9166227Z // begin inline asm 2026-02-21T09:32:58.9166386Z @%p111 mbarrier.init.shared::cta.b64 [%r575], 1; 2026-02-21T09:32:58.9166575Z // end inline asm 2026-02-21T09:32:58.9166830Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9167119Z bar.sync 0, 128; 2026-02-21T09:32:58.9167294Z // begin inline asm 2026-02-21T09:32:58.9167504Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r574]; 2026-02-21T09:32:58.9167706Z // end inline asm 2026-02-21T09:32:58.9167837Z bar.sync 0, 128; 2026-02-21T09:32:58.9167975Z // begin inline asm 2026-02-21T09:32:58.9168142Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r575]; 2026-02-21T09:32:58.9168344Z // end inline asm 
2026-02-21T09:32:58.9168607Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9168947Z st.shared.v2.b32 [global_smem+90208], {0, 33685761}; 2026-02-21T09:32:58.9169182Z st.shared.b32 [global_smem+49152], %r1174; 2026-02-21T09:32:58.9169368Z barrier.sync 1; 2026-02-21T09:32:58.9169516Z barrier.sync 1; 2026-02-21T09:32:58.9169660Z setp.lt.s32 %p104, %r122, 1; 2026-02-21T09:32:58.9169833Z mov.b32 %r1226, %r1224; 2026-02-21T09:32:58.9169991Z @%p104 bra $L__BB0_29; 2026-02-21T09:32:58.9170176Z // %bb.23: // %.lr.ph16 2026-02-21T09:32:58.9170500Z .loc 1 0 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:107 2026-02-21T09:32:58.9170833Z ld.param.b64 %rd12, [_helion_matmul_param_2]; 2026-02-21T09:32:58.9171038Z shl.b32 %r587, %r1, 3; 2026-02-21T09:32:58.9171193Z and.b32 %r105, %r587, 120; 2026-02-21T09:32:58.9171358Z shr.u32 %r588, %r1, 4; 2026-02-21T09:32:58.9171510Z bfe.u32 %r106, %r1, 4, 3; 2026-02-21T09:32:58.9171674Z or.b32 %r107, %r106, 8; 2026-02-21T09:32:58.9171826Z or.b32 %r108, %r106, 16; 2026-02-21T09:32:58.9171983Z or.b32 %r109, %r106, 24; 2026-02-21T09:32:58.9172131Z or.b32 %r110, %r106, 32; 2026-02-21T09:32:58.9172284Z or.b32 %r111, %r106, 40; 2026-02-21T09:32:58.9172436Z or.b32 %r112, %r106, 48; 2026-02-21T09:32:58.9172581Z or.b32 %r113, %r588, 56; 2026-02-21T09:32:58.9172734Z or.b32 %r114, %r106, 64; 2026-02-21T09:32:58.9172912Z or.b32 %r115, %r106, 72; 2026-02-21T09:32:58.9173091Z or.b32 %r116, %r106, 80; 2026-02-21T09:32:58.9173240Z or.b32 %r117, %r106, 88; 2026-02-21T09:32:58.9173394Z or.b32 %r118, %r106, 96; 2026-02-21T09:32:58.9173539Z or.b32 %r119, %r106, 104; 2026-02-21T09:32:58.9173700Z or.b32 %r120, %r106, 112; 2026-02-21T09:32:58.9173849Z or.b32 %r121, %r588, 120; 2026-02-21T09:32:58.9174123Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9174430Z add.s32 %r1222, %r104, -1; 2026-02-21T09:32:58.9174589Z shl.b32 %r596, %r1, 10; 
2026-02-21T09:32:58.9174775Z and.b32 %r597, %r596, 6144; 2026-02-21T09:32:58.9174933Z shl.b32 %r598, %r1, 4; 2026-02-21T09:32:58.9175090Z and.b32 %r599, %r598, 2032; 2026-02-21T09:32:58.9175247Z or.b32 %r600, %r597, %r599; 2026-02-21T09:32:58.9175416Z add.s32 %r602, %r418, 81920; 2026-02-21T09:32:58.9175575Z add.s32 %r125, %r602, %r600; 2026-02-21T09:32:58.9175738Z xor.b32 %r603, %r600, 32; 2026-02-21T09:32:58.9175904Z add.s32 %r126, %r602, %r603; 2026-02-21T09:32:58.9176063Z xor.b32 %r604, %r600, 64; 2026-02-21T09:32:58.9176232Z add.s32 %r127, %r602, %r604; 2026-02-21T09:32:58.9176392Z xor.b32 %r605, %r600, 96; 2026-02-21T09:32:58.9176563Z add.s32 %r128, %r602, %r605; 2026-02-21T09:32:58.9176733Z and.b32 %r606, %r1, 96; 2026-02-21T09:32:58.9176883Z shl.b32 %r607, %r606, 6; 2026-02-21T09:32:58.9177027Z shl.b32 %r608, %r1, 5; 2026-02-21T09:32:58.9177177Z and.b32 %r609, %r608, 96; 2026-02-21T09:32:58.9177322Z and.b32 %r610, %r598, 384; 2026-02-21T09:32:58.9177475Z and.b32 %r612, %r586, 16; 2026-02-21T09:32:58.9177627Z or.b32 %r613, %r607, %r609; 2026-02-21T09:32:58.9177775Z or.b32 %r614, %r610, %r606; 2026-02-21T09:32:58.9177929Z xor.b32 %r615, %r613, %r614; 2026-02-21T09:32:58.9178079Z add.s32 %r616, %r602, %r612; 2026-02-21T09:32:58.9178233Z add.s32 %r761, %r616, %r615; 2026-02-21T09:32:58.9178381Z add.s32 %r766, %r761, 512; 2026-02-21T09:32:58.9178535Z add.s32 %r771, %r761, 1024; 2026-02-21T09:32:58.9178684Z add.s32 %r776, %r761, 1536; 2026-02-21T09:32:58.9178873Z mov.b32 %r1218, -1; 2026-02-21T09:32:58.9179054Z mov.b32 %r1226, %r1224; 2026-02-21T09:32:58.9179195Z mov.b32 %r1221, %r1224; 2026-02-21T09:32:58.9179343Z mov.b32 %r1220, %r1224; 2026-02-21T09:32:58.9179482Z mov.b32 %r1219, %r1224; 2026-02-21T09:32:58.9179627Z bra.uni $L__BB0_24; 2026-02-21T09:32:58.9179809Z $L__BB0_27: // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:32:58.9180027Z shl.b32 %r902, %r1224, 3; 2026-02-21T09:32:58.9180171Z add.s32 %r904, %r418, %r902; 2026-02-21T09:32:58.9180327Z add.s32 
%r619, %r904, 90176; 2026-02-21T09:32:58.9180584Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9180870Z shl.b32 %r905, %r1224, 7; 2026-02-21T09:32:58.9181020Z bar.sync 0, 128; 2026-02-21T09:32:58.9181152Z // begin inline asm 2026-02-21T09:32:58.9181289Z 2026-02-21T09:32:58.9181398Z { 2026-02-21T09:32:58.9181521Z .reg .pred complete; 2026-02-21T09:32:58.9181660Z waitLoop: 2026-02-21T09:32:58.9181852Z mbarrier.try_wait.parity.shared.b64 complete, [%r619], %r1226; 2026-02-21T09:32:58.9182084Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9182235Z } 2026-02-21T09:32:58.9182297Z 2026-02-21T09:32:58.9182356Z // end inline asm 2026-02-21T09:32:58.9182598Z .loc 1 42 32 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:42:32 2026-02-21T09:32:58.9182882Z or.b32 %r906, %r1220, %r105; 2026-02-21T09:32:58.9183139Z .loc 1 44 32 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:44:32 2026-02-21T09:32:58.9183427Z add.s32 %r907, %r1221, %r106; 2026-02-21T09:32:58.9183581Z add.s32 %r908, %r1221, %r107; 2026-02-21T09:32:58.9183739Z add.s32 %r909, %r1221, %r108; 2026-02-21T09:32:58.9183887Z add.s32 %r910, %r1221, %r109; 2026-02-21T09:32:58.9184045Z add.s32 %r911, %r1221, %r110; 2026-02-21T09:32:58.9184228Z add.s32 %r912, %r1221, %r111; 2026-02-21T09:32:58.9184405Z add.s32 %r913, %r1221, %r112; 2026-02-21T09:32:58.9184568Z add.s32 %r914, %r1221, %r113; 2026-02-21T09:32:58.9184743Z add.s32 %r915, %r1221, %r114; 2026-02-21T09:32:58.9184902Z add.s32 %r916, %r1221, %r115; 2026-02-21T09:32:58.9185050Z add.s32 %r917, %r1221, %r116; 2026-02-21T09:32:58.9185205Z add.s32 %r918, %r1221, %r117; 2026-02-21T09:32:58.9185356Z add.s32 %r919, %r1221, %r118; 2026-02-21T09:32:58.9185520Z add.s32 %r920, %r1221, %r119; 2026-02-21T09:32:58.9185682Z add.s32 %r921, %r1221, %r120; 2026-02-21T09:32:58.9185827Z add.s32 %r922, %r1221, %r121; 2026-02-21T09:32:58.9186083Z .loc 1 59 45 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:59:45 2026-02-21T09:32:58.9186361Z shl.b32 %r923, %r907, 10; 2026-02-21T09:32:58.9186517Z shl.b32 %r924, %r908, 10; 2026-02-21T09:32:58.9186660Z shl.b32 %r925, %r909, 10; 2026-02-21T09:32:58.9186810Z shl.b32 %r926, %r910, 10; 2026-02-21T09:32:58.9186953Z shl.b32 %r927, %r911, 10; 2026-02-21T09:32:58.9187102Z shl.b32 %r928, %r912, 10; 2026-02-21T09:32:58.9187245Z shl.b32 %r929, %r913, 10; 2026-02-21T09:32:58.9187392Z shl.b32 %r930, %r914, 10; 2026-02-21T09:32:58.9187538Z shl.b32 %r931, %r915, 10; 2026-02-21T09:32:58.9187677Z shl.b32 %r932, %r916, 10; 2026-02-21T09:32:58.9187826Z shl.b32 %r933, %r917, 10; 2026-02-21T09:32:58.9187965Z shl.b32 %r934, %r918, 10; 2026-02-21T09:32:58.9188112Z shl.b32 %r935, %r919, 10; 2026-02-21T09:32:58.9188252Z shl.b32 %r936, %r920, 10; 2026-02-21T09:32:58.9188398Z shl.b32 %r937, %r921, 10; 2026-02-21T09:32:58.9188537Z shl.b32 %r938, %r922, 10; 2026-02-21T09:32:58.9188786Z .loc 1 59 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:59:52 2026-02-21T09:32:58.9189072Z add.s32 %r939, %r923, %r906; 2026-02-21T09:32:58.9189220Z add.s32 %r940, %r924, %r906; 2026-02-21T09:32:58.9189375Z add.s32 %r941, %r925, %r906; 2026-02-21T09:32:58.9189522Z add.s32 %r942, %r926, %r906; 2026-02-21T09:32:58.9189676Z add.s32 %r943, %r927, %r906; 2026-02-21T09:32:58.9189855Z add.s32 %r944, %r928, %r906; 2026-02-21T09:32:58.9190034Z add.s32 %r945, %r929, %r906; 2026-02-21T09:32:58.9190178Z add.s32 %r946, %r930, %r906; 2026-02-21T09:32:58.9190329Z add.s32 %r947, %r931, %r906; 2026-02-21T09:32:58.9190472Z add.s32 %r948, %r932, %r906; 2026-02-21T09:32:58.9190622Z add.s32 %r949, %r933, %r906; 2026-02-21T09:32:58.9190770Z add.s32 %r950, %r934, %r906; 2026-02-21T09:32:58.9190912Z add.s32 %r951, %r935, %r906; 2026-02-21T09:32:58.9191064Z add.s32 %r952, %r936, %r906; 2026-02-21T09:32:58.9191210Z add.s32 %r953, %r937, %r906; 2026-02-21T09:32:58.9191364Z add.s32 %r954, %r938, %r906; 
2026-02-21T09:32:58.9191606Z .loc 1 59 24 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:59:24 2026-02-21T09:32:58.9191892Z mad.wide.s32 %rd129, %r939, 2, %rd12; 2026-02-21T09:32:58.9192070Z mad.wide.s32 %rd130, %r940, 2, %rd12; 2026-02-21T09:32:58.9192246Z mad.wide.s32 %rd131, %r941, 2, %rd12; 2026-02-21T09:32:58.9192423Z mad.wide.s32 %rd132, %r942, 2, %rd12; 2026-02-21T09:32:58.9192590Z mad.wide.s32 %rd133, %r943, 2, %rd12; 2026-02-21T09:32:58.9192764Z mad.wide.s32 %rd134, %r944, 2, %rd12; 2026-02-21T09:32:58.9192927Z mad.wide.s32 %rd135, %r945, 2, %rd12; 2026-02-21T09:32:58.9193098Z mad.wide.s32 %rd136, %r946, 2, %rd12; 2026-02-21T09:32:58.9193271Z mad.wide.s32 %rd137, %r947, 2, %rd12; 2026-02-21T09:32:58.9193442Z mad.wide.s32 %rd138, %r948, 2, %rd12; 2026-02-21T09:32:58.9193603Z mad.wide.s32 %rd139, %r949, 2, %rd12; 2026-02-21T09:32:58.9193771Z mad.wide.s32 %rd140, %r950, 2, %rd12; 2026-02-21T09:32:58.9193938Z mad.wide.s32 %rd141, %r951, 2, %rd12; 2026-02-21T09:32:58.9194098Z mad.wide.s32 %rd142, %r952, 2, %rd12; 2026-02-21T09:32:58.9194266Z mad.wide.s32 %rd143, %r953, 2, %rd12; 2026-02-21T09:32:58.9194428Z mad.wide.s32 %rd144, %r954, 2, %rd12; 2026-02-21T09:32:58.9194754Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9195065Z add.s32 %r637, %r427, %r905; 2026-02-21T09:32:58.9195224Z // begin inline asm 2026-02-21T09:32:58.9195583Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636}, [%r637 + 0]; 2026-02-21T09:32:58.9195965Z // end inline asm 2026-02-21T09:32:58.9196104Z // begin inline asm 2026-02-21T09:32:58.9196441Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653}, [%r637 + 16]; 2026-02-21T09:32:58.9196819Z // end inline asm 2026-02-21T09:32:58.9196947Z // begin inline asm 
2026-02-21T09:32:58.9197287Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r655, %r656, %r657, %r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670}, [%r637 + 32]; 2026-02-21T09:32:58.9197658Z // end inline asm 2026-02-21T09:32:58.9197787Z // begin inline asm 2026-02-21T09:32:58.9198122Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r672, %r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687}, [%r637 + 48]; 2026-02-21T09:32:58.9198489Z // end inline asm 2026-02-21T09:32:58.9198623Z // begin inline asm 2026-02-21T09:32:58.9198949Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r689, %r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704}, [%r637 + 64]; 2026-02-21T09:32:58.9199314Z // end inline asm 2026-02-21T09:32:58.9199448Z // begin inline asm 2026-02-21T09:32:58.9199787Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r706, %r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721}, [%r637 + 80]; 2026-02-21T09:32:58.9200157Z // end inline asm 2026-02-21T09:32:58.9200282Z // begin inline asm 2026-02-21T09:32:58.9200614Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r723, %r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738}, [%r637 + 96]; 2026-02-21T09:32:58.9201037Z // end inline asm 2026-02-21T09:32:58.9201191Z // begin inline asm 2026-02-21T09:32:58.9201538Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r740, %r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755}, [%r637 + 112]; 2026-02-21T09:32:58.9201915Z // end inline asm 2026-02-21T09:32:58.9202048Z // begin inline asm 2026-02-21T09:32:58.9202194Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:32:58.9202355Z // end inline asm 2026-02-21T09:32:58.9202490Z cvt.u64.u32 %rd145, %r621; 2026-02-21T09:32:58.9202652Z cvt.u64.u32 %rd146, %r622; 
2026-02-21T09:32:58.9202812Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:32:58.9202967Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:32:58.9203235Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9203515Z mov.b64 {%r955, %r956}, %rd148; 2026-02-21T09:32:58.9203690Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T09:32:58.9203962Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9204246Z cvt.u64.u32 %rd149, %r623; 2026-02-21T09:32:58.9204395Z cvt.u64.u32 %rd150, %r624; 2026-02-21T09:32:58.9204552Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:32:58.9204740Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:32:58.9205004Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9205305Z mov.b64 {%r958, %r959}, %rd152; 2026-02-21T09:32:58.9205474Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T09:32:58.9205754Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9206028Z cvt.u64.u32 %rd153, %r625; 2026-02-21T09:32:58.9206189Z cvt.u64.u32 %rd154, %r626; 2026-02-21T09:32:58.9206346Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:32:58.9206531Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:32:58.9206822Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9207102Z mov.b64 {%r961, %r962}, %rd156; 2026-02-21T09:32:58.9207269Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T09:32:58.9207537Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9207817Z cvt.u64.u32 %rd157, %r627; 2026-02-21T09:32:58.9207962Z cvt.u64.u32 %rd158, %r628; 2026-02-21T09:32:58.9208113Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:32:58.9208268Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:32:58.9208526Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9208809Z mov.b64 {%r964, %r965}, %rd160; 
2026-02-21T09:32:58.9208969Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T09:32:58.9209243Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9209516Z cvt.u64.u32 %rd161, %r629; 2026-02-21T09:32:58.9209675Z cvt.u64.u32 %rd162, %r630; 2026-02-21T09:32:58.9209829Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:32:58.9209979Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:32:58.9210239Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9210515Z mov.b64 {%r967, %r968}, %rd164; 2026-02-21T09:32:58.9210680Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T09:32:58.9210946Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9211226Z cvt.u64.u32 %rd165, %r631; 2026-02-21T09:32:58.9211372Z cvt.u64.u32 %rd166, %r632; 2026-02-21T09:32:58.9211527Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:32:58.9211682Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:32:58.9211940Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9212246Z mov.b64 {%r970, %r971}, %rd168; 2026-02-21T09:32:58.9212444Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T09:32:58.9212759Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9213049Z cvt.u64.u32 %rd169, %r633; 2026-02-21T09:32:58.9213210Z cvt.u64.u32 %rd170, %r634; 2026-02-21T09:32:58.9213373Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:32:58.9213529Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:32:58.9213802Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9214091Z mov.b64 {%r973, %r974}, %rd172; 2026-02-21T09:32:58.9214266Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T09:32:58.9214545Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9214877Z cvt.u64.u32 %rd173, %r635; 2026-02-21T09:32:58.9215034Z cvt.u64.u32 %rd174, %r636; 
2026-02-21T09:32:58.9215200Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:32:58.9215368Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:32:58.9215636Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9215939Z mov.b64 {%r976, %r977}, %rd176; 2026-02-21T09:32:58.9216111Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T09:32:58.9216405Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9216691Z cvt.u64.u32 %rd177, %r638; 2026-02-21T09:32:58.9216852Z cvt.u64.u32 %rd178, %r639; 2026-02-21T09:32:58.9217021Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:32:58.9217186Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:32:58.9217465Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9217751Z mov.b64 {%r979, %r980}, %rd180; 2026-02-21T09:32:58.9217925Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T09:32:58.9218261Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9218574Z cvt.u64.u32 %rd181, %r640; 2026-02-21T09:32:58.9218737Z cvt.u64.u32 %rd182, %r641; 2026-02-21T09:32:58.9218808Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:32:58.9218872Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:32:58.9219045Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9219116Z mov.b64 {%r982, %r983}, %rd184; 2026-02-21T09:32:58.9219183Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T09:32:58.9219357Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9219428Z cvt.u64.u32 %rd185, %r642; 2026-02-21T09:32:58.9219490Z cvt.u64.u32 %rd186, %r643; 2026-02-21T09:32:58.9219550Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:32:58.9219613Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:32:58.9219798Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9219863Z mov.b64 {%r985, %r986}, %rd188; 
2026-02-21T09:32:58.9219931Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T09:32:58.9220112Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9220174Z cvt.u64.u32 %rd189, %r644; 2026-02-21T09:32:58.9220234Z cvt.u64.u32 %rd190, %r645; 2026-02-21T09:32:58.9220304Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:32:58.9220366Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:32:58.9220540Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9220602Z mov.b64 {%r988, %r989}, %rd192; 2026-02-21T09:32:58.9220677Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T09:32:58.9220847Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9220909Z cvt.u64.u32 %rd193, %r646; 2026-02-21T09:32:58.9220980Z cvt.u64.u32 %rd194, %r647; 2026-02-21T09:32:58.9221066Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:32:58.9221155Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:32:58.9221331Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9221391Z mov.b64 {%r991, %r992}, %rd196; 2026-02-21T09:32:58.9221463Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T09:32:58.9221629Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9221691Z cvt.u64.u32 %rd197, %r648; 2026-02-21T09:32:58.9221747Z cvt.u64.u32 %rd198, %r649; 2026-02-21T09:32:58.9221801Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:32:58.9221865Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:32:58.9222026Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9222083Z mov.b64 {%r994, %r995}, %rd200; 2026-02-21T09:32:58.9222150Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T09:32:58.9222310Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9222368Z cvt.u64.u32 %rd201, %r650; 2026-02-21T09:32:58.9222422Z cvt.u64.u32 %rd202, %r651; 
2026-02-21T09:32:58.9222484Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:32:58.9222540Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:32:58.9222701Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9222765Z mov.b64 {%r997, %r998}, %rd204; 2026-02-21T09:32:58.9222824Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T09:32:58.9222981Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9223042Z cvt.u64.u32 %rd205, %r652; 2026-02-21T09:32:58.9223096Z cvt.u64.u32 %rd206, %r653; 2026-02-21T09:32:58.9223176Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:32:58.9223254Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:32:58.9223422Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9223485Z mov.b64 {%r1000, %r1001}, %rd208; 2026-02-21T09:32:58.9223552Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T09:32:58.9223719Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9223775Z cvt.u64.u32 %rd209, %r655; 2026-02-21T09:32:58.9223832Z cvt.u64.u32 %rd210, %r656; 2026-02-21T09:32:58.9223893Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:32:58.9223949Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:32:58.9224106Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9224167Z mov.b64 {%r1003, %r1004}, %rd212; 2026-02-21T09:32:58.9224241Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T09:32:58.9224403Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9224461Z cvt.u64.u32 %rd213, %r657; 2026-02-21T09:32:58.9224527Z cvt.u64.u32 %rd214, %r658; 2026-02-21T09:32:58.9224582Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:32:58.9224638Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:32:58.9224837Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9224897Z mov.b64 {%r1006, 
%r1007}, %rd216; 2026-02-21T09:32:58.9224961Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T09:32:58.9225120Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9225183Z cvt.u64.u32 %rd217, %r659; 2026-02-21T09:32:58.9225238Z cvt.u64.u32 %rd218, %r660; 2026-02-21T09:32:58.9225295Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:32:58.9225363Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:32:58.9225529Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9225621Z mov.b64 {%r1009, %r1010}, %rd220; 2026-02-21T09:32:58.9225720Z cvt.rn.f16x2.f32 %r1011, %r1010, %r1009; 2026-02-21T09:32:58.9225880Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9225936Z cvt.u64.u32 %rd221, %r661; 2026-02-21T09:32:58.9225992Z cvt.u64.u32 %rd222, %r662; 2026-02-21T09:32:58.9226055Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:32:58.9226111Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:32:58.9226272Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9226343Z mov.b64 {%r1012, %r1013}, %rd224; 2026-02-21T09:32:58.9226409Z cvt.rn.f16x2.f32 %r1014, %r1013, %r1012; 2026-02-21T09:32:58.9226568Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9226638Z cvt.u64.u32 %rd225, %r663; 2026-02-21T09:32:58.9226696Z cvt.u64.u32 %rd226, %r664; 2026-02-21T09:32:58.9226753Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:32:58.9226809Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:32:58.9226974Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9227031Z mov.b64 {%r1015, %r1016}, %rd228; 2026-02-21T09:32:58.9227095Z cvt.rn.f16x2.f32 %r1017, %r1016, %r1015; 2026-02-21T09:32:58.9227265Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9227322Z cvt.u64.u32 %rd229, %r665; 
2026-02-21T09:32:58.9227378Z cvt.u64.u32 %rd230, %r666; 2026-02-21T09:32:58.9227441Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:32:58.9227496Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:32:58.9227655Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9227712Z mov.b64 {%r1018, %r1019}, %rd232; 2026-02-21T09:32:58.9227843Z cvt.rn.f16x2.f32 %r1020, %r1019, %r1018; 2026-02-21T09:32:58.9228006Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9228064Z cvt.u64.u32 %rd233, %r667; 2026-02-21T09:32:58.9228126Z cvt.u64.u32 %rd234, %r668; 2026-02-21T09:32:58.9228182Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:32:58.9228239Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:32:58.9228406Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9228463Z mov.b64 {%r1021, %r1022}, %rd236; 2026-02-21T09:32:58.9228527Z cvt.rn.f16x2.f32 %r1023, %r1022, %r1021; 2026-02-21T09:32:58.9228689Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9228752Z cvt.u64.u32 %rd237, %r669; 2026-02-21T09:32:58.9228805Z cvt.u64.u32 %rd238, %r670; 2026-02-21T09:32:58.9228861Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:32:58.9228926Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:32:58.9229089Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9229149Z mov.b64 {%r1024, %r1025}, %rd240; 2026-02-21T09:32:58.9229221Z cvt.rn.f16x2.f32 %r1026, %r1025, %r1024; 2026-02-21T09:32:58.9229381Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9229436Z cvt.u64.u32 %rd241, %r672; 2026-02-21T09:32:58.9229492Z cvt.u64.u32 %rd242, %r673; 2026-02-21T09:32:58.9229555Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:32:58.9229612Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:32:58.9229774Z .loc 1 58 27 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9229839Z mov.b64 {%r1027, %r1028}, %rd244; 2026-02-21T09:32:58.9229902Z cvt.rn.f16x2.f32 %r1029, %r1028, %r1027; 2026-02-21T09:32:58.9230070Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9230155Z cvt.u64.u32 %rd245, %r674; 2026-02-21T09:32:58.9230231Z cvt.u64.u32 %rd246, %r675; 2026-02-21T09:32:58.9230286Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:32:58.9230342Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:32:58.9230513Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9230568Z mov.b64 {%r1030, %r1031}, %rd248; 2026-02-21T09:32:58.9230631Z cvt.rn.f16x2.f32 %r1032, %r1031, %r1030; 2026-02-21T09:32:58.9230796Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9230852Z cvt.u64.u32 %rd249, %r676; 2026-02-21T09:32:58.9230906Z cvt.u64.u32 %rd250, %r677; 2026-02-21T09:32:58.9230968Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:32:58.9231023Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:32:58.9231182Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9231240Z mov.b64 {%r1033, %r1034}, %rd252; 2026-02-21T09:32:58.9231314Z cvt.rn.f16x2.f32 %r1035, %r1034, %r1033; 2026-02-21T09:32:58.9231470Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9231526Z cvt.u64.u32 %rd253, %r678; 2026-02-21T09:32:58.9231587Z cvt.u64.u32 %rd254, %r679; 2026-02-21T09:32:58.9231642Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:32:58.9231697Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:32:58.9231864Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9231920Z mov.b64 {%r1036, %r1037}, %rd256; 2026-02-21T09:32:58.9231982Z cvt.rn.f16x2.f32 %r1038, %r1037, %r1036; 2026-02-21T09:32:58.9232147Z .loc 1 56 52 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9232210Z cvt.u64.u32 %rd257, %r680; 2026-02-21T09:32:58.9232284Z cvt.u64.u32 %rd258, %r681; 2026-02-21T09:32:58.9232362Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:32:58.9232430Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:32:58.9232588Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9232645Z mov.b64 {%r1039, %r1040}, %rd260; 2026-02-21T09:32:58.9232716Z cvt.rn.f16x2.f32 %r1041, %r1040, %r1039; 2026-02-21T09:32:58.9232871Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9232926Z cvt.u64.u32 %rd261, %r682; 2026-02-21T09:32:58.9232982Z cvt.u64.u32 %rd262, %r683; 2026-02-21T09:32:58.9233045Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:32:58.9233102Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:32:58.9233261Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9233324Z mov.b64 {%r1042, %r1043}, %rd264; 2026-02-21T09:32:58.9233389Z cvt.rn.f16x2.f32 %r1044, %r1043, %r1042; 2026-02-21T09:32:58.9233547Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9233613Z cvt.u64.u32 %rd265, %r684; 2026-02-21T09:32:58.9233668Z cvt.u64.u32 %rd266, %r685; 2026-02-21T09:32:58.9233725Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:32:58.9233782Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:32:58.9233950Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9234008Z mov.b64 {%r1045, %r1046}, %rd268; 2026-02-21T09:32:58.9234073Z cvt.rn.f16x2.f32 %r1047, %r1046, %r1045; 2026-02-21T09:32:58.9234243Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9234309Z cvt.u64.u32 %rd269, %r686; 2026-02-21T09:32:58.9234365Z cvt.u64.u32 %rd270, %r687; 2026-02-21T09:32:58.9234428Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:32:58.9234486Z or.b64 
%rd272, %rd269, %rd271; 2026-02-21T09:32:58.9234646Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9234790Z mov.b64 {%r1048, %r1049}, %rd272; 2026-02-21T09:32:58.9234863Z cvt.rn.f16x2.f32 %r1050, %r1049, %r1048; 2026-02-21T09:32:58.9235021Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9235076Z cvt.u64.u32 %rd273, %r689; 2026-02-21T09:32:58.9235138Z cvt.u64.u32 %rd274, %r690; 2026-02-21T09:32:58.9235194Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:32:58.9235250Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:32:58.9235415Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9235472Z mov.b64 {%r1051, %r1052}, %rd276; 2026-02-21T09:32:58.9235537Z cvt.rn.f16x2.f32 %r1053, %r1052, %r1051; 2026-02-21T09:32:58.9235697Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9235761Z cvt.u64.u32 %rd277, %r691; 2026-02-21T09:32:58.9235818Z cvt.u64.u32 %rd278, %r692; 2026-02-21T09:32:58.9235876Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:32:58.9235940Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:32:58.9236102Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9236159Z mov.b64 {%r1054, %r1055}, %rd280; 2026-02-21T09:32:58.9236230Z cvt.rn.f16x2.f32 %r1056, %r1055, %r1054; 2026-02-21T09:32:58.9236392Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9236447Z cvt.u64.u32 %rd281, %r693; 2026-02-21T09:32:58.9236502Z cvt.u64.u32 %rd282, %r694; 2026-02-21T09:32:58.9236567Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:32:58.9236623Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:32:58.9236813Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9236905Z mov.b64 {%r1057, %r1058}, %rd284; 2026-02-21T09:32:58.9236971Z cvt.rn.f16x2.f32 %r1059, %r1058, %r1057; 
2026-02-21T09:32:58.9237132Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9237196Z cvt.u64.u32 %rd285, %r695; 2026-02-21T09:32:58.9237251Z cvt.u64.u32 %rd286, %r696; 2026-02-21T09:32:58.9237305Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:32:58.9237361Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:32:58.9237529Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9237587Z mov.b64 {%r1060, %r1061}, %rd288; 2026-02-21T09:32:58.9237650Z cvt.rn.f16x2.f32 %r1062, %r1061, %r1060; 2026-02-21T09:32:58.9237817Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9237872Z cvt.u64.u32 %rd289, %r697; 2026-02-21T09:32:58.9237928Z cvt.u64.u32 %rd290, %r698; 2026-02-21T09:32:58.9237991Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:32:58.9238048Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:32:58.9238209Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9238264Z mov.b64 {%r1063, %r1064}, %rd292; 2026-02-21T09:32:58.9238333Z cvt.rn.f16x2.f32 %r1065, %r1064, %r1063; 2026-02-21T09:32:58.9238491Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9238546Z cvt.u64.u32 %rd293, %r699; 2026-02-21T09:32:58.9238608Z cvt.u64.u32 %rd294, %r700; 2026-02-21T09:32:58.9238663Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:32:58.9238719Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:32:58.9238884Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9238939Z mov.b64 {%r1066, %r1067}, %rd296; 2026-02-21T09:32:58.9239004Z cvt.rn.f16x2.f32 %r1068, %r1067, %r1066; 2026-02-21T09:32:58.9239165Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9239276Z cvt.u64.u32 %rd297, %r701; 2026-02-21T09:32:58.9239332Z cvt.u64.u32 %rd298, %r702; 2026-02-21T09:32:58.9239386Z shl.b64 %rd299, %rd298, 
32; 2026-02-21T09:32:58.9239447Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:32:58.9239606Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9239663Z mov.b64 {%r1069, %r1070}, %rd300; 2026-02-21T09:32:58.9239732Z cvt.rn.f16x2.f32 %r1071, %r1070, %r1069; 2026-02-21T09:32:58.9239895Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9239950Z cvt.u64.u32 %rd301, %r703; 2026-02-21T09:32:58.9240005Z cvt.u64.u32 %rd302, %r704; 2026-02-21T09:32:58.9240066Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:32:58.9240123Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:32:58.9240284Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9240350Z mov.b64 {%r1072, %r1073}, %rd304; 2026-02-21T09:32:58.9240413Z cvt.rn.f16x2.f32 %r1074, %r1073, %r1072; 2026-02-21T09:32:58.9240578Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9240639Z cvt.u64.u32 %rd305, %r706; 2026-02-21T09:32:58.9240694Z cvt.u64.u32 %rd306, %r707; 2026-02-21T09:32:58.9240748Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:32:58.9240802Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:32:58.9240968Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9241024Z mov.b64 {%r1075, %r1076}, %rd308; 2026-02-21T09:32:58.9241087Z cvt.rn.f16x2.f32 %r1077, %r1076, %r1075; 2026-02-21T09:32:58.9241286Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9241362Z cvt.u64.u32 %rd309, %r708; 2026-02-21T09:32:58.9241421Z cvt.u64.u32 %rd310, %r709; 2026-02-21T09:32:58.9241486Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:32:58.9241544Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:32:58.9241703Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9241759Z mov.b64 {%r1078, %r1079}, %rd312; 2026-02-21T09:32:58.9241830Z 
cvt.rn.f16x2.f32 %r1080, %r1079, %r1078; 2026-02-21T09:32:58.9241988Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9242043Z cvt.u64.u32 %rd313, %r710; 2026-02-21T09:32:58.9242105Z cvt.u64.u32 %rd314, %r711; 2026-02-21T09:32:58.9242161Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:32:58.9242218Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:32:58.9242388Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9242447Z mov.b64 {%r1081, %r1082}, %rd316; 2026-02-21T09:32:58.9242515Z cvt.rn.f16x2.f32 %r1083, %r1082, %r1081; 2026-02-21T09:32:58.9242677Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9242743Z cvt.u64.u32 %rd317, %r712; 2026-02-21T09:32:58.9242800Z cvt.u64.u32 %rd318, %r713; 2026-02-21T09:32:58.9242858Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:32:58.9242923Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:32:58.9243080Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9243137Z mov.b64 {%r1084, %r1085}, %rd320; 2026-02-21T09:32:58.9243208Z cvt.rn.f16x2.f32 %r1086, %r1085, %r1084; 2026-02-21T09:32:58.9243373Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9243429Z cvt.u64.u32 %rd321, %r714; 2026-02-21T09:32:58.9243484Z cvt.u64.u32 %rd322, %r715; 2026-02-21T09:32:58.9243548Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:32:58.9243627Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:32:58.9243813Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9243876Z mov.b64 {%r1087, %r1088}, %rd324; 2026-02-21T09:32:58.9243940Z cvt.rn.f16x2.f32 %r1089, %r1088, %r1087; 2026-02-21T09:32:58.9244100Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9244163Z cvt.u64.u32 %rd325, %r716; 2026-02-21T09:32:58.9244219Z cvt.u64.u32 %rd326, %r717; 
2026-02-21T09:32:58.9244275Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:32:58.9244331Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:32:58.9244495Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9244551Z mov.b64 {%r1090, %r1091}, %rd328; 2026-02-21T09:32:58.9244615Z cvt.rn.f16x2.f32 %r1092, %r1091, %r1090; 2026-02-21T09:32:58.9244810Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9244870Z cvt.u64.u32 %rd329, %r718; 2026-02-21T09:32:58.9244925Z cvt.u64.u32 %rd330, %r719; 2026-02-21T09:32:58.9244988Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:32:58.9245043Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:32:58.9245203Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9245261Z mov.b64 {%r1093, %r1094}, %rd332; 2026-02-21T09:32:58.9245331Z cvt.rn.f16x2.f32 %r1095, %r1094, %r1093; 2026-02-21T09:32:58.9245490Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9245546Z cvt.u64.u32 %rd333, %r720; 2026-02-21T09:32:58.9245607Z cvt.u64.u32 %rd334, %r721; 2026-02-21T09:32:58.9245663Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:32:58.9245719Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:32:58.9245958Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9246021Z mov.b64 {%r1096, %r1097}, %rd336; 2026-02-21T09:32:58.9246086Z cvt.rn.f16x2.f32 %r1098, %r1097, %r1096; 2026-02-21T09:32:58.9246246Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9246309Z cvt.u64.u32 %rd337, %r723; 2026-02-21T09:32:58.9246364Z cvt.u64.u32 %rd338, %r724; 2026-02-21T09:32:58.9246420Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:32:58.9246483Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:32:58.9246637Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9246694Z mov.b64 {%r1099, 
%r1100}, %rd340; 2026-02-21T09:32:58.9246765Z cvt.rn.f16x2.f32 %r1101, %r1100, %r1099; 2026-02-21T09:32:58.9246926Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9246983Z cvt.u64.u32 %rd341, %r725; 2026-02-21T09:32:58.9247040Z cvt.u64.u32 %rd342, %r726; 2026-02-21T09:32:58.9247104Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:32:58.9247159Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:32:58.9247314Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9247377Z mov.b64 {%r1102, %r1103}, %rd344; 2026-02-21T09:32:58.9247439Z cvt.rn.f16x2.f32 %r1104, %r1103, %r1102; 2026-02-21T09:32:58.9247594Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9247657Z cvt.u64.u32 %rd345, %r727; 2026-02-21T09:32:58.9247713Z cvt.u64.u32 %rd346, %r728; 2026-02-21T09:32:58.9247768Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:32:58.9247825Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:32:58.9247990Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9248050Z mov.b64 {%r1105, %r1106}, %rd348; 2026-02-21T09:32:58.9248141Z cvt.rn.f16x2.f32 %r1107, %r1106, %r1105; 2026-02-21T09:32:58.9248338Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9248395Z cvt.u64.u32 %rd349, %r729; 2026-02-21T09:32:58.9248449Z cvt.u64.u32 %rd350, %r730; 2026-02-21T09:32:58.9248513Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:32:58.9248568Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:32:58.9248730Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9248787Z mov.b64 {%r1108, %r1109}, %rd352; 2026-02-21T09:32:58.9248856Z cvt.rn.f16x2.f32 %r1110, %r1109, %r1108; 2026-02-21T09:32:58.9249018Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9249074Z cvt.u64.u32 %rd353, %r731; 
2026-02-21T09:32:58.9249136Z cvt.u64.u32 %rd354, %r732; 2026-02-21T09:32:58.9249194Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:32:58.9249251Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:32:58.9249421Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9249478Z mov.b64 {%r1111, %r1112}, %rd356; 2026-02-21T09:32:58.9249542Z cvt.rn.f16x2.f32 %r1113, %r1112, %r1111; 2026-02-21T09:32:58.9249704Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9249767Z cvt.u64.u32 %rd357, %r733; 2026-02-21T09:32:58.9249822Z cvt.u64.u32 %rd358, %r734; 2026-02-21T09:32:58.9249877Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:32:58.9249940Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T09:32:58.9250101Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9250158Z mov.b64 {%r1114, %r1115}, %rd360; 2026-02-21T09:32:58.9250266Z cvt.rn.f16x2.f32 %r1116, %r1115, %r1114; 2026-02-21T09:32:58.9250451Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9250510Z cvt.u64.u32 %rd361, %r735; 2026-02-21T09:32:58.9250564Z cvt.u64.u32 %rd362, %r736; 2026-02-21T09:32:58.9250627Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:32:58.9250683Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:32:58.9250842Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9250908Z mov.b64 {%r1117, %r1118}, %rd364; 2026-02-21T09:32:58.9250971Z cvt.rn.f16x2.f32 %r1119, %r1118, %r1117; 2026-02-21T09:32:58.9251128Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9251193Z cvt.u64.u32 %rd365, %r737; 2026-02-21T09:32:58.9251250Z cvt.u64.u32 %rd366, %r738; 2026-02-21T09:32:58.9251305Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:32:58.9251362Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:32:58.9251531Z .loc 1 58 27 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9251592Z mov.b64 {%r1120, %r1121}, %rd368; 2026-02-21T09:32:58.9251661Z cvt.rn.f16x2.f32 %r1122, %r1121, %r1120; 2026-02-21T09:32:58.9251836Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9251894Z cvt.u64.u32 %rd369, %r740; 2026-02-21T09:32:58.9251953Z cvt.u64.u32 %rd370, %r741; 2026-02-21T09:32:58.9252021Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:32:58.9252077Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:32:58.9252239Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9252296Z mov.b64 {%r1123, %r1124}, %rd372; 2026-02-21T09:32:58.9252367Z cvt.rn.f16x2.f32 %r1125, %r1124, %r1123; 2026-02-21T09:32:58.9252525Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9252582Z cvt.u64.u32 %rd373, %r742; 2026-02-21T09:32:58.9252668Z cvt.u64.u32 %rd374, %r743; 2026-02-21T09:32:58.9252744Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:32:58.9252800Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:32:58.9252966Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9253023Z mov.b64 {%r1126, %r1127}, %rd376; 2026-02-21T09:32:58.9253089Z cvt.rn.f16x2.f32 %r1128, %r1127, %r1126; 2026-02-21T09:32:58.9253250Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9253313Z cvt.u64.u32 %rd377, %r744; 2026-02-21T09:32:58.9253367Z cvt.u64.u32 %rd378, %r745; 2026-02-21T09:32:58.9253420Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:32:58.9253483Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:32:58.9253644Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9253702Z mov.b64 {%r1129, %r1130}, %rd380; 2026-02-21T09:32:58.9253775Z cvt.rn.f16x2.f32 %r1131, %r1130, %r1129; 2026-02-21T09:32:58.9253934Z .loc 1 56 52 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9253990Z cvt.u64.u32 %rd381, %r746; 2026-02-21T09:32:58.9254046Z cvt.u64.u32 %rd382, %r747; 2026-02-21T09:32:58.9254107Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:32:58.9254162Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:32:58.9254326Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9254391Z mov.b64 {%r1132, %r1133}, %rd384; 2026-02-21T09:32:58.9254459Z cvt.rn.f16x2.f32 %r1134, %r1133, %r1132; 2026-02-21T09:32:58.9254625Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9254725Z cvt.u64.u32 %rd385, %r748; 2026-02-21T09:32:58.9254784Z cvt.u64.u32 %rd386, %r749; 2026-02-21T09:32:58.9254870Z shl.b64 %rd387, %rd386, 32; 2026-02-21T09:32:58.9254953Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:32:58.9255125Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9255187Z mov.b64 {%r1135, %r1136}, %rd388; 2026-02-21T09:32:58.9255254Z cvt.rn.f16x2.f32 %r1137, %r1136, %r1135; 2026-02-21T09:32:58.9255425Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9255483Z cvt.u64.u32 %rd389, %r750; 2026-02-21T09:32:58.9255539Z cvt.u64.u32 %rd390, %r751; 2026-02-21T09:32:58.9255605Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:32:58.9255664Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:32:58.9255828Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9255886Z mov.b64 {%r1138, %r1139}, %rd392; 2026-02-21T09:32:58.9255957Z cvt.rn.f16x2.f32 %r1140, %r1139, %r1138; 2026-02-21T09:32:58.9256126Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9256186Z cvt.u64.u32 %rd393, %r752; 2026-02-21T09:32:58.9256252Z cvt.u64.u32 %rd394, %r753; 2026-02-21T09:32:58.9256311Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:32:58.9256370Z or.b64 
%rd396, %rd393, %rd395; 2026-02-21T09:32:58.9256543Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9256605Z mov.b64 {%r1141, %r1142}, %rd396; 2026-02-21T09:32:58.9256672Z cvt.rn.f16x2.f32 %r1143, %r1142, %r1141; 2026-02-21T09:32:58.9256840Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9256904Z cvt.u64.u32 %rd397, %r754; 2026-02-21T09:32:58.9256962Z cvt.u64.u32 %rd398, %r755; 2026-02-21T09:32:58.9257022Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:32:58.9257087Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:32:58.9257257Z .loc 1 58 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:58:27 2026-02-21T09:32:58.9257343Z mov.b64 {%r1144, %r1145}, %rd400; 2026-02-21T09:32:58.9257444Z cvt.rn.f16x2.f32 %r1146, %r1145, %r1144; 2026-02-21T09:32:58.9257608Z .loc 1 59 82 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:59:82 2026-02-21T09:32:58.9257706Z st.shared.v4.b32 [%r125], {%r957, %r969, %r981, %r993}; 2026-02-21T09:32:58.9257817Z st.shared.v4.b32 [%r126], {%r1005, %r1017, %r1029, %r1041}; 2026-02-21T09:32:58.9257912Z st.shared.v4.b32 [%r127], {%r1053, %r1065, %r1077, %r1089}; 2026-02-21T09:32:58.9258005Z st.shared.v4.b32 [%r128], {%r1101, %r1113, %r1125, %r1137}; 2026-02-21T09:32:58.9258062Z bar.sync 0, 128; 2026-02-21T09:32:58.9258129Z // begin inline asm 2026-02-21T09:32:58.9258283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r837, %r841, %r845, %r849}, [%r761]; 2026-02-21T09:32:58.9258341Z // end inline asm 2026-02-21T09:32:58.9258406Z // begin inline asm 2026-02-21T09:32:58.9258558Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r853, %r857, %r861, %r865}, [%r766]; 2026-02-21T09:32:58.9258615Z // end inline asm 2026-02-21T09:32:58.9258672Z // begin inline asm 2026-02-21T09:32:58.9258826Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r869, %r873, %r877, %r881}, [%r771]; 2026-02-21T09:32:58.9258881Z // end inline asm 2026-02-21T09:32:58.9258937Z // begin 
inline asm 2026-02-21T09:32:58.9259088Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r885, %r889, %r893, %r897}, [%r776]; 2026-02-21T09:32:58.9259143Z // end inline asm 2026-02-21T09:32:58.9259198Z bar.sync 0, 128; 2026-02-21T09:32:58.9259297Z st.shared.v4.b32 [%r125], {%r960, %r972, %r984, %r996}; 2026-02-21T09:32:58.9259391Z st.shared.v4.b32 [%r126], {%r1008, %r1020, %r1032, %r1044}; 2026-02-21T09:32:58.9259481Z st.shared.v4.b32 [%r127], {%r1056, %r1068, %r1080, %r1092}; 2026-02-21T09:32:58.9259574Z st.shared.v4.b32 [%r128], {%r1104, %r1116, %r1128, %r1140}; 2026-02-21T09:32:58.9259660Z bar.sync 0, 128; 2026-02-21T09:32:58.9259740Z // begin inline asm 2026-02-21T09:32:58.9259888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r838, %r842, %r846, %r850}, [%r761]; 2026-02-21T09:32:58.9259956Z // end inline asm 2026-02-21T09:32:58.9260013Z // begin inline asm 2026-02-21T09:32:58.9260157Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r854, %r858, %r862, %r866}, [%r766]; 2026-02-21T09:32:58.9260222Z // end inline asm 2026-02-21T09:32:58.9260280Z // begin inline asm 2026-02-21T09:32:58.9260419Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r870, %r874, %r878, %r882}, [%r771]; 2026-02-21T09:32:58.9260477Z // end inline asm 2026-02-21T09:32:58.9260543Z // begin inline asm 2026-02-21T09:32:58.9260684Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r886, %r890, %r894, %r898}, [%r776]; 2026-02-21T09:32:58.9260738Z // end inline asm 2026-02-21T09:32:58.9260800Z bar.sync 0, 128; 2026-02-21T09:32:58.9260891Z st.shared.v4.b32 [%r125], {%r963, %r975, %r987, %r999}; 2026-02-21T09:32:58.9260985Z st.shared.v4.b32 [%r126], {%r1011, %r1023, %r1035, %r1047}; 2026-02-21T09:32:58.9261078Z st.shared.v4.b32 [%r127], {%r1059, %r1071, %r1083, %r1095}; 2026-02-21T09:32:58.9261179Z st.shared.v4.b32 [%r128], {%r1107, %r1119, %r1131, %r1143}; 2026-02-21T09:32:58.9261234Z bar.sync 0, 128; 2026-02-21T09:32:58.9261289Z // begin inline asm 2026-02-21T09:32:58.9261438Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r839, %r843, %r847, %r851}, [%r761]; 2026-02-21T09:32:58.9261493Z // end inline asm 2026-02-21T09:32:58.9261548Z // begin inline asm 2026-02-21T09:32:58.9261700Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r855, %r859, %r863, %r867}, [%r766]; 2026-02-21T09:32:58.9261753Z // end inline asm 2026-02-21T09:32:58.9261808Z // begin inline asm 2026-02-21T09:32:58.9261947Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r871, %r875, %r879, %r883}, [%r771]; 2026-02-21T09:32:58.9262008Z // end inline asm 2026-02-21T09:32:58.9262063Z // begin inline asm 2026-02-21T09:32:58.9262205Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r887, %r891, %r895, %r899}, [%r776]; 2026-02-21T09:32:58.9262267Z // end inline asm 2026-02-21T09:32:58.9262346Z bar.sync 0, 128; 2026-02-21T09:32:58.9262477Z st.shared.v4.b32 [%r125], {%r966, %r978, %r990, %r1002}; 2026-02-21T09:32:58.9262576Z st.shared.v4.b32 [%r126], {%r1014, %r1026, %r1038, %r1050}; 2026-02-21T09:32:58.9262668Z st.shared.v4.b32 [%r127], {%r1062, %r1074, %r1086, %r1098}; 2026-02-21T09:32:58.9262758Z st.shared.v4.b32 [%r128], {%r1110, %r1122, %r1134, %r1146}; 2026-02-21T09:32:58.9262813Z bar.sync 0, 128; 2026-02-21T09:32:58.9262877Z // begin inline asm 2026-02-21T09:32:58.9263022Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r840, %r844, %r848, %r852}, [%r761]; 2026-02-21T09:32:58.9263077Z // end inline asm 2026-02-21T09:32:58.9263140Z // begin inline asm 2026-02-21T09:32:58.9263283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r856, %r860, %r864, %r868}, [%r766]; 2026-02-21T09:32:58.9263337Z // end inline asm 2026-02-21T09:32:58.9263392Z // begin inline asm 2026-02-21T09:32:58.9263544Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r872, %r876, %r880, %r884}, [%r771]; 2026-02-21T09:32:58.9263611Z // end inline asm 2026-02-21T09:32:58.9263666Z // begin inline asm 2026-02-21T09:32:58.9263809Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r888, %r892, %r896, %r900}, [%r776]; 2026-02-21T09:32:58.9263860Z // end inline asm 2026-02-21T09:32:58.9263914Z // 
begin inline asm 2026-02-21T09:32:58.9264022Z st.global.v4.b32 [ %rd129 + 0 ], { %r837, %r838, %r839, %r840 }; 2026-02-21T09:32:58.9264075Z // end inline asm 2026-02-21T09:32:58.9264127Z // begin inline asm 2026-02-21T09:32:58.9264225Z st.global.v4.b32 [ %rd130 + 0 ], { %r841, %r842, %r843, %r844 }; 2026-02-21T09:32:58.9264284Z // end inline asm 2026-02-21T09:32:58.9264337Z // begin inline asm 2026-02-21T09:32:58.9264431Z st.global.v4.b32 [ %rd131 + 0 ], { %r845, %r846, %r847, %r848 }; 2026-02-21T09:32:58.9264488Z // end inline asm 2026-02-21T09:32:58.9264540Z // begin inline asm 2026-02-21T09:32:58.9264656Z st.global.v4.b32 [ %rd132 + 0 ], { %r849, %r850, %r851, %r852 }; 2026-02-21T09:32:58.9264763Z // end inline asm 2026-02-21T09:32:58.9264827Z // begin inline asm 2026-02-21T09:32:58.9264921Z st.global.v4.b32 [ %rd133 + 0 ], { %r853, %r854, %r855, %r856 }; 2026-02-21T09:32:58.9264972Z // end inline asm 2026-02-21T09:32:58.9265030Z // begin inline asm 2026-02-21T09:32:58.9265119Z st.global.v4.b32 [ %rd134 + 0 ], { %r857, %r858, %r859, %r860 }; 2026-02-21T09:32:58.9265171Z // end inline asm 2026-02-21T09:32:58.9265223Z // begin inline asm 2026-02-21T09:32:58.9265319Z st.global.v4.b32 [ %rd135 + 0 ], { %r861, %r862, %r863, %r864 }; 2026-02-21T09:32:58.9265370Z // end inline asm 2026-02-21T09:32:58.9265423Z // begin inline asm 2026-02-21T09:32:58.9265522Z st.global.v4.b32 [ %rd136 + 0 ], { %r865, %r866, %r867, %r868 }; 2026-02-21T09:32:58.9265574Z // end inline asm 2026-02-21T09:32:58.9265627Z // begin inline asm 2026-02-21T09:32:58.9265722Z st.global.v4.b32 [ %rd137 + 0 ], { %r869, %r870, %r871, %r872 }; 2026-02-21T09:32:58.9265775Z // end inline asm 2026-02-21T09:32:58.9265829Z // begin inline asm 2026-02-21T09:32:58.9265920Z st.global.v4.b32 [ %rd138 + 0 ], { %r873, %r874, %r875, %r876 }; 2026-02-21T09:32:58.9265982Z // end inline asm 2026-02-21T09:32:58.9266035Z // begin inline asm 2026-02-21T09:32:58.9266124Z st.global.v4.b32 [ %rd139 + 0 ], { %r877, %r878, 
%r879, %r880 }; 2026-02-21T09:32:58.9266183Z // end inline asm 2026-02-21T09:32:58.9266237Z // begin inline asm 2026-02-21T09:32:58.9266326Z st.global.v4.b32 [ %rd140 + 0 ], { %r881, %r882, %r883, %r884 }; 2026-02-21T09:32:58.9266378Z // end inline asm 2026-02-21T09:32:58.9266439Z // begin inline asm 2026-02-21T09:32:58.9266528Z st.global.v4.b32 [ %rd141 + 0 ], { %r885, %r886, %r887, %r888 }; 2026-02-21T09:32:58.9266579Z // end inline asm 2026-02-21T09:32:58.9266638Z // begin inline asm 2026-02-21T09:32:58.9266727Z st.global.v4.b32 [ %rd142 + 0 ], { %r889, %r890, %r891, %r892 }; 2026-02-21T09:32:58.9266779Z // end inline asm 2026-02-21T09:32:58.9266839Z // begin inline asm 2026-02-21T09:32:58.9266930Z st.global.v4.b32 [ %rd143 + 0 ], { %r893, %r894, %r895, %r896 }; 2026-02-21T09:32:58.9267043Z // end inline asm 2026-02-21T09:32:58.9267096Z // begin inline asm 2026-02-21T09:32:58.9267194Z st.global.v4.b32 [ %rd144 + 0 ], { %r897, %r898, %r899, %r900 }; 2026-02-21T09:32:58.9267248Z // end inline asm 2026-02-21T09:32:58.9267416Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9267483Z add.s32 %r901, %r904, 90192; 2026-02-21T09:32:58.9267645Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9267699Z bar.sync 0, 128; 2026-02-21T09:32:58.9267758Z // begin inline asm 2026-02-21T09:32:58.9267853Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r901]; 2026-02-21T09:32:58.9267907Z // end inline asm 2026-02-21T09:32:58.9267964Z add.s32 %r1147, %r1224, 1; 2026-02-21T09:32:58.9268035Z setp.eq.b32 %p109, %r1147, 2; 2026-02-21T09:32:58.9268102Z selp.b32 %r1224, 0, %r1147, %p109; 2026-02-21T09:32:58.9268171Z selp.b32 %r1223, 1, 0, %p109; 2026-02-21T09:32:58.9268258Z $L__BB0_28: // %.thread23 2026-02-21T09:32:58.9268349Z // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:32:58.9268506Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 
2026-02-21T09:32:58.9268572Z xor.b32 %r1226, %r1226, %r1223; 2026-02-21T09:32:58.9268733Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9268790Z add.s32 %r1219, %r1219, 1; 2026-02-21T09:32:58.9268853Z setp.lt.s32 %p110, %r1219, %r122; 2026-02-21T09:32:58.9268918Z @%p110 bra $L__BB0_24; 2026-02-21T09:32:58.9268973Z bra.uni $L__BB0_29; 2026-02-21T09:32:58.9269073Z $L__BB0_24: // =>This Inner Loop Header: Depth=1 2026-02-21T09:32:58.9269167Z add.s32 %r618, %r1218, 1; 2026-02-21T09:32:58.9269253Z setp.eq.b32 %p105, %r1218, 15; 2026-02-21T09:32:58.9269315Z selp.b32 %r1218, 0, %r618, %p105; 2026-02-21T09:32:58.9269376Z setp.eq.b32 %p106, %r1218, 15; 2026-02-21T09:32:58.9269438Z @%p106 bra $L__BB0_27; 2026-02-21T09:32:58.9269531Z // %bb.25: // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:32:58.9269698Z .loc 1 0 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:107 2026-02-21T09:32:58.9269761Z mov.b32 %r1223, 0; 2026-02-21T09:32:58.9269928Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9269986Z setp.ne.b32 %p107, %r1218, 0; 2026-02-21T09:32:58.9270049Z @%p107 bra $L__BB0_28; 2026-02-21T09:32:58.9270121Z // %bb.26: // %.thread 2026-02-21T09:32:58.9270206Z // in Loop: Header=BB0_24 Depth=1 2026-02-21T09:32:58.9270264Z add.s32 %r1222, %r1222, 1; 2026-02-21T09:32:58.9270428Z .loc 1 36 35 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:36:35 2026-02-21T09:32:58.9270493Z mul.hi.s32 %r1149, %r1222, 715827883; 2026-02-21T09:32:58.9270551Z shr.u32 %r1150, %r1149, 31; 2026-02-21T09:32:58.9270615Z shr.s32 %r1151, %r1149, 10; 2026-02-21T09:32:58.9270674Z add.s32 %r1152, %r1151, %r1150; 2026-02-21T09:32:58.9270834Z .loc 1 37 33 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:37:33 2026-02-21T09:32:58.9270897Z shl.b32 %r1153, %r1152, 6; 2026-02-21T09:32:58.9271056Z .loc 1 38 39 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:39 
2026-02-21T09:32:58.9271111Z sub.s32 %r1154, 8, %r1153; 2026-02-21T09:32:58.9271283Z .loc 1 38 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:52 2026-02-21T09:32:58.9271339Z min.s32 %r1155, %r1154, 64; 2026-02-21T09:32:58.9271501Z .loc 1 39 45 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:45 2026-02-21T09:32:58.9271589Z mul.lo.s32 %r1156, %r1152, 6144; 2026-02-21T09:32:58.9271680Z sub.s32 %r1157, %r1222, %r1156; 2026-02-21T09:32:58.9271841Z .loc 1 40 51 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:40:51 2026-02-21T09:32:58.9271900Z div.s32 %r1158, %r1157, %r1155; 2026-02-21T09:32:58.9272064Z .loc 1 39 64 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:64 2026-02-21T09:32:58.9272124Z mul.lo.s32 %r1159, %r1158, %r1155; 2026-02-21T09:32:58.9272180Z sub.s32 %r1160, %r1157, %r1159; 2026-02-21T09:32:58.9272345Z .loc 1 39 30 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:30 2026-02-21T09:32:58.9272402Z add.s32 %r1161, %r1160, %r1153; 2026-02-21T09:32:58.9272564Z .loc 1 41 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:41:27 2026-02-21T09:32:58.9272622Z shl.b32 %r1220, %r1161, 7; 2026-02-21T09:32:58.9272790Z .loc 1 43 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:43:27 2026-02-21T09:32:58.9272849Z shl.b32 %r1221, %r1158, 7; 2026-02-21T09:32:58.9273015Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9273077Z bra.uni $L__BB0_28; 2026-02-21T09:32:58.9273170Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:32:58.9273334Z .loc 1 0 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:107 2026-02-21T09:32:58.9273425Z ld.param.b64 %rd11, [_helion_matmul_param_1]; 2026-02-21T09:32:58.9273484Z mov.b32 %r156, global_smem; 2026-02-21T09:32:58.9273539Z add.s32 %r157, %r156, %r3; 2026-02-21T09:32:58.9273604Z add.s32 %r233, %r156, 49152; 2026-02-21T09:32:58.9273659Z mov.u32 %r5, %ctaid.x; 
2026-02-21T09:32:58.9273713Z bra.uni $L__BB0_2; 2026-02-21T09:32:58.9273852Z $L__BB0_21: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9274030Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9274088Z barrier.sync 1; 2026-02-21T09:32:58.9274144Z barrier.sync 1; 2026-02-21T09:32:58.9274227Z $L__BB0_2: // %.preheader 2026-02-21T09:32:58.9274315Z // =>This Loop Header: Depth=1 2026-02-21T09:32:58.9274399Z // Child Loop BB0_17 Depth 2 2026-02-21T09:32:58.9274488Z // Child Loop BB0_9 Depth 2 2026-02-21T09:32:58.9274646Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19 2026-02-21T09:32:58.9274750Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:32:58.9274804Z barrier.sync 1; 2026-02-21T09:32:58.9274874Z ld.shared.b8 %r155, [%r157+90204]; 2026-02-21T09:32:58.9274937Z setp.gt.u32 %p4, %r155, 3; 2026-02-21T09:32:58.9274994Z @%p4 bra $L__BB0_4; 2026-02-21T09:32:58.9275081Z // %bb.3: // %.preheader 2026-02-21T09:32:58.9275168Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9275229Z $L_brx_0: .branchtargets 2026-02-21T09:32:58.9275282Z $L__BB0_5, 2026-02-21T09:32:58.9275340Z $L__BB0_15, 2026-02-21T09:32:58.9275391Z $L__BB0_21, 2026-02-21T09:32:58.9275440Z $L__BB0_30; 2026-02-21T09:32:58.9275506Z brx.idx %r155, $L_brx_0; 2026-02-21T09:32:58.9275599Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9275762Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9275843Z ld.shared.b32 %r270, [global_smem+49152]; 2026-02-21T09:32:58.9275898Z barrier.sync 1; 2026-02-21T09:32:58.9275954Z max.u32 %r234, %r5, 767; 2026-02-21T09:32:58.9276011Z shl.b32 %r235, %r234, 4; 2026-02-21T09:32:58.9276076Z sub.s32 %r7, 12288, %r235; 2026-02-21T09:32:58.9276237Z .loc 1 42 45 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:42:45 2026-02-21T09:32:58.9276354Z add.s32 %r236, %r1, -128; 2026-02-21T09:32:58.9276418Z shr.u32 %r237, %r236, 5; 
2026-02-21T09:32:58.9276474Z shr.u32 %r238, %r1, 3; 2026-02-21T09:32:58.9276531Z bfe.u32 %r8, %r1, 3, 4; 2026-02-21T09:32:58.9276588Z or.b32 %r9, %r8, 16; 2026-02-21T09:32:58.9276653Z or.b32 %r10, %r8, 32; 2026-02-21T09:32:58.9276709Z or.b32 %r11, %r8, 48; 2026-02-21T09:32:58.9276764Z or.b32 %r12, %r8, 64; 2026-02-21T09:32:58.9276829Z or.b32 %r13, %r8, 80; 2026-02-21T09:32:58.9276881Z or.b32 %r14, %r8, 96; 2026-02-21T09:32:58.9276937Z or.b32 %r15, %r238, 112; 2026-02-21T09:32:58.9277109Z .loc 1 50 48 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:50:48 2026-02-21T09:32:58.9277165Z shl.b32 %r239, %r1, 3; 2026-02-21T09:32:58.9277219Z and.b32 %r16, %r239, 56; 2026-02-21T09:32:58.9277389Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9277459Z setp.lt.s32 %p16, %r7, 1; 2026-02-21T09:32:58.9277517Z setp.gt.s32 %p15, %r7, 0; 2026-02-21T09:32:58.9277677Z .loc 1 36 35 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:36:35 2026-02-21T09:32:58.9277746Z mul.hi.u32 %r240, %r5, 715827883; 2026-02-21T09:32:58.9277803Z shr.u32 %r241, %r240, 10; 2026-02-21T09:32:58.9277966Z .loc 1 37 33 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:37:33 2026-02-21T09:32:58.9278038Z shl.b32 %r242, %r241, 6; 2026-02-21T09:32:58.9278199Z .loc 1 38 39 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:39 2026-02-21T09:32:58.9278256Z sub.s32 %r243, 8, %r242; 2026-02-21T09:32:58.9278415Z .loc 1 39 45 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:45 2026-02-21T09:32:58.9278513Z mul.lo.s32 %r244, %r241, 6144; 2026-02-21T09:32:58.9278599Z sub.s32 %r245, %r5, %r244; 2026-02-21T09:32:58.9278765Z .loc 1 39 64 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:64 2026-02-21T09:32:58.9278832Z rem.s32 %r246, %r245, %r243; 2026-02-21T09:32:58.9278995Z .loc 1 39 30 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:30 2026-02-21T09:32:58.9279052Z add.s32 %r247, %r246, %r242; 
2026-02-21T09:32:58.9279225Z .loc 1 41 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:41:27 2026-02-21T09:32:58.9279280Z shl.b32 %r248, %r247, 7; 2026-02-21T09:32:58.9279441Z .loc 1 42 32 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:42:32 2026-02-21T09:32:58.9279506Z or.b32 %r1195, %r248, %r8; 2026-02-21T09:32:58.9279560Z or.b32 %r1196, %r248, %r9; 2026-02-21T09:32:58.9279616Z or.b32 %r1197, %r248, %r10; 2026-02-21T09:32:58.9279674Z or.b32 %r1198, %r248, %r11; 2026-02-21T09:32:58.9279739Z or.b32 %r1199, %r248, %r12; 2026-02-21T09:32:58.9279795Z or.b32 %r1200, %r248, %r13; 2026-02-21T09:32:58.9279851Z or.b32 %r1201, %r248, %r14; 2026-02-21T09:32:58.9279914Z or.b32 %r1202, %r248, %r15; 2026-02-21T09:32:58.9280077Z .loc 1 55 80 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:80 2026-02-21T09:32:58.9280134Z shl.b32 %r249, %r1195, 10; 2026-02-21T09:32:58.9280189Z shl.b32 %r250, %r1196, 10; 2026-02-21T09:32:58.9280253Z shl.b32 %r251, %r1197, 10; 2026-02-21T09:32:58.9280308Z shl.b32 %r252, %r1198, 10; 2026-02-21T09:32:58.9280363Z shl.b32 %r253, %r1199, 10; 2026-02-21T09:32:58.9280423Z shl.b32 %r254, %r1200, 10; 2026-02-21T09:32:58.9280476Z shl.b32 %r255, %r1201, 10; 2026-02-21T09:32:58.9280531Z shl.b32 %r256, %r1202, 10; 2026-02-21T09:32:58.9280697Z .loc 1 55 59 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:59 2026-02-21T09:32:58.9280760Z or.b32 %r257, %r249, %r16; 2026-02-21T09:32:58.9280815Z or.b32 %r258, %r250, %r16; 2026-02-21T09:32:58.9280903Z or.b32 %r259, %r251, %r16; 2026-02-21T09:32:58.9280987Z or.b32 %r260, %r252, %r16; 2026-02-21T09:32:58.9281040Z or.b32 %r261, %r253, %r16; 2026-02-21T09:32:58.9281093Z or.b32 %r262, %r254, %r16; 2026-02-21T09:32:58.9281152Z or.b32 %r263, %r255, %r16; 2026-02-21T09:32:58.9281206Z or.b32 %r264, %r256, %r16; 2026-02-21T09:32:58.9281368Z .loc 1 55 34 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:34 2026-02-21T09:32:58.9281434Z mad.wide.s32 %rd17, %r257, 2, 
%rd11; 2026-02-21T09:32:58.9281505Z mad.wide.s32 %rd18, %r258, 2, %rd11; 2026-02-21T09:32:58.9281563Z mad.wide.s32 %rd19, %r259, 2, %rd11; 2026-02-21T09:32:58.9281623Z mad.wide.s32 %rd20, %r260, 2, %rd11; 2026-02-21T09:32:58.9281687Z mad.wide.s32 %rd21, %r261, 2, %rd11; 2026-02-21T09:32:58.9281746Z mad.wide.s32 %rd22, %r262, 2, %rd11; 2026-02-21T09:32:58.9281806Z mad.wide.s32 %rd23, %r263, 2, %rd11; 2026-02-21T09:32:58.9281866Z mad.wide.s32 %rd24, %r264, 2, %rd11; 2026-02-21T09:32:58.9282033Z .loc 1 55 87 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:87 2026-02-21T09:32:58.9282090Z shl.b32 %r265, %r1, 4; 2026-02-21T09:32:58.9282146Z and.b32 %r266, %r265, 2032; 2026-02-21T09:32:58.9282206Z shl.b32 %r267, %r1, 1; 2026-02-21T09:32:58.9282260Z and.b32 %r268, %r267, 112; 2026-02-21T09:32:58.9282313Z xor.b32 %r25, %r266, %r268; 2026-02-21T09:32:58.9282374Z add.s32 %r299, %r233, %r25; 2026-02-21T09:32:58.9282434Z selp.b32 %r199, 16, 0, %p15; 2026-02-21T09:32:58.9282488Z // begin inline asm 2026-02-21T09:32:58.9282605Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd17 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9282667Z // end inline asm 2026-02-21T09:32:58.9282722Z add.s32 %r301, %r299, 2048; 2026-02-21T09:32:58.9282775Z // begin inline asm 2026-02-21T09:32:58.9282894Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd18 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9282969Z // end inline asm 2026-02-21T09:32:58.9283045Z add.s32 %r303, %r299, 4096; 2026-02-21T09:32:58.9283103Z // begin inline asm 2026-02-21T09:32:58.9283218Z cp.async.cg.shared.global [ %r303 + 0 ], [ %rd19 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9283270Z // end inline asm 2026-02-21T09:32:58.9283326Z add.s32 %r305, %r299, 6144; 2026-02-21T09:32:58.9283387Z // begin inline asm 2026-02-21T09:32:58.9283493Z cp.async.cg.shared.global [ %r305 + 0 ], [ %rd20 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9283544Z // end inline asm 2026-02-21T09:32:58.9283606Z add.s32 %r307, %r299, 8192; 2026-02-21T09:32:58.9283659Z 
// begin inline asm 2026-02-21T09:32:58.9283762Z cp.async.cg.shared.global [ %r307 + 0 ], [ %rd21 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9283813Z // end inline asm 2026-02-21T09:32:58.9283879Z add.s32 %r309, %r299, 10240; 2026-02-21T09:32:58.9283932Z // begin inline asm 2026-02-21T09:32:58.9284036Z cp.async.cg.shared.global [ %r309 + 0 ], [ %rd22 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9284098Z // end inline asm 2026-02-21T09:32:58.9284158Z add.s32 %r311, %r299, 12288; 2026-02-21T09:32:58.9284213Z // begin inline asm 2026-02-21T09:32:58.9284317Z cp.async.cg.shared.global [ %r311 + 0 ], [ %rd23 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9284379Z // end inline asm 2026-02-21T09:32:58.9284436Z add.s32 %r313, %r299, 14336; 2026-02-21T09:32:58.9284492Z // begin inline asm 2026-02-21T09:32:58.9284609Z cp.async.cg.shared.global [ %r313 + 0 ], [ %rd24 + 0 ], 0x10, %r199; 2026-02-21T09:32:58.9284661Z // end inline asm 2026-02-21T09:32:58.9284750Z cp.async.commit_group; 2026-02-21T09:32:58.9284924Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9284984Z setp.gt.s32 %p17, %r7, 1; 2026-02-21T09:32:58.9285140Z .loc 1 55 34 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:34 2026-02-21T09:32:58.9285198Z cvt.s64.s32 %rd33, %r249; 2026-02-21T09:32:58.9285263Z cvt.u64.u32 %rd34, %r16; 2026-02-21T09:32:58.9285320Z or.b64 %rd35, %rd33, %rd34; 2026-02-21T09:32:58.9285402Z shl.b64 %rd36, %rd35, 1; 2026-02-21T09:32:58.9285492Z add.s64 %rd1, %rd11, %rd36; 2026-02-21T09:32:58.9285547Z add.s64 %rd25, %rd1, 128; 2026-02-21T09:32:58.9285603Z cvt.s64.s32 %rd37, %r250; 2026-02-21T09:32:58.9285657Z or.b64 %rd38, %rd37, %rd34; 2026-02-21T09:32:58.9285724Z shl.b64 %rd39, %rd38, 1; 2026-02-21T09:32:58.9285781Z add.s64 %rd2, %rd11, %rd39; 2026-02-21T09:32:58.9285835Z add.s64 %rd26, %rd2, 128; 2026-02-21T09:32:58.9285900Z cvt.s64.s32 %rd40, %r251; 2026-02-21T09:32:58.9285955Z or.b64 %rd41, %rd40, %rd34; 2026-02-21T09:32:58.9286009Z 
shl.b64 %rd42, %rd41, 1; 2026-02-21T09:32:58.9286066Z add.s64 %rd3, %rd11, %rd42; 2026-02-21T09:32:58.9286130Z add.s64 %rd27, %rd3, 128; 2026-02-21T09:32:58.9286185Z cvt.s64.s32 %rd43, %r252; 2026-02-21T09:32:58.9286240Z or.b64 %rd44, %rd43, %rd34; 2026-02-21T09:32:58.9286304Z shl.b64 %rd45, %rd44, 1; 2026-02-21T09:32:58.9286359Z add.s64 %rd4, %rd11, %rd45; 2026-02-21T09:32:58.9286416Z add.s64 %rd28, %rd4, 128; 2026-02-21T09:32:58.9286478Z cvt.s64.s32 %rd46, %r253; 2026-02-21T09:32:58.9286534Z or.b64 %rd47, %rd46, %rd34; 2026-02-21T09:32:58.9286588Z shl.b64 %rd48, %rd47, 1; 2026-02-21T09:32:58.9286643Z add.s64 %rd5, %rd11, %rd48; 2026-02-21T09:32:58.9286706Z add.s64 %rd29, %rd5, 128; 2026-02-21T09:32:58.9286762Z cvt.s64.s32 %rd49, %r254; 2026-02-21T09:32:58.9286817Z or.b64 %rd50, %rd49, %rd34; 2026-02-21T09:32:58.9286878Z shl.b64 %rd51, %rd50, 1; 2026-02-21T09:32:58.9286933Z add.s64 %rd6, %rd11, %rd51; 2026-02-21T09:32:58.9286987Z add.s64 %rd30, %rd6, 128; 2026-02-21T09:32:58.9287041Z cvt.s64.s32 %rd52, %r255; 2026-02-21T09:32:58.9287104Z or.b64 %rd53, %rd52, %rd34; 2026-02-21T09:32:58.9287159Z shl.b64 %rd54, %rd53, 1; 2026-02-21T09:32:58.9287214Z add.s64 %rd7, %rd11, %rd54; 2026-02-21T09:32:58.9287274Z add.s64 %rd31, %rd7, 128; 2026-02-21T09:32:58.9287329Z cvt.s64.s32 %rd55, %r256; 2026-02-21T09:32:58.9287414Z or.b64 %rd56, %rd55, %rd34; 2026-02-21T09:32:58.9287494Z shl.b64 %rd57, %rd56, 1; 2026-02-21T09:32:58.9287559Z add.s64 %rd8, %rd11, %rd57; 2026-02-21T09:32:58.9287616Z add.s64 %rd32, %rd8, 128; 2026-02-21T09:32:58.9287779Z .loc 1 55 87 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:87 2026-02-21T09:32:58.9287840Z bar.sync 2, 128; 2026-02-21T09:32:58.9287895Z add.s32 %r269, %r156, %r25; 2026-02-21T09:32:58.9287950Z add.s32 %r214, %r269, 65536; 2026-02-21T09:32:58.9288015Z selp.b32 %r215, 16, 0, %p17; 2026-02-21T09:32:58.9288068Z // begin inline asm 2026-02-21T09:32:58.9288177Z cp.async.cg.shared.global [ %r214 + 0 ], [ %rd25 + 0 ], 0x10, %r215; 
2026-02-21T09:32:58.9288228Z // end inline asm 2026-02-21T09:32:58.9288290Z add.s32 %r216, %r269, 67584; 2026-02-21T09:32:58.9288344Z // begin inline asm 2026-02-21T09:32:58.9288452Z cp.async.cg.shared.global [ %r216 + 0 ], [ %rd26 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9288509Z // end inline asm 2026-02-21T09:32:58.9288566Z add.s32 %r218, %r269, 69632; 2026-02-21T09:32:58.9288621Z // begin inline asm 2026-02-21T09:32:58.9288728Z cp.async.cg.shared.global [ %r218 + 0 ], [ %rd27 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9288785Z // end inline asm 2026-02-21T09:32:58.9288839Z add.s32 %r220, %r269, 71680; 2026-02-21T09:32:58.9288892Z // begin inline asm 2026-02-21T09:32:58.9289004Z cp.async.cg.shared.global [ %r220 + 0 ], [ %rd28 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9289055Z // end inline asm 2026-02-21T09:32:58.9289109Z add.s32 %r222, %r269, 73728; 2026-02-21T09:32:58.9289162Z // begin inline asm 2026-02-21T09:32:58.9289277Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd29 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9289330Z // end inline asm 2026-02-21T09:32:58.9289385Z add.s32 %r224, %r269, 75776; 2026-02-21T09:32:58.9289445Z // begin inline asm 2026-02-21T09:32:58.9289547Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd30 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9289601Z // end inline asm 2026-02-21T09:32:58.9289664Z add.s32 %r226, %r269, 77824; 2026-02-21T09:32:58.9289743Z // begin inline asm 2026-02-21T09:32:58.9289866Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd31 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9289917Z // end inline asm 2026-02-21T09:32:58.9289980Z add.s32 %r228, %r269, 79872; 2026-02-21T09:32:58.9290033Z // begin inline asm 2026-02-21T09:32:58.9290136Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd32 + 0 ], 0x10, %r215; 2026-02-21T09:32:58.9290195Z // end inline asm 2026-02-21T09:32:58.9290256Z cp.async.commit_group; 2026-02-21T09:32:58.9290316Z cp.async.wait_group 1; 2026-02-21T09:32:58.9290369Z bar.sync 2, 128; 2026-02-21T09:32:58.9290537Z .loc 1 54 
31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9290592Z add.s32 %r230, %r156, 90144; 2026-02-21T09:32:58.9290644Z mov.b32 %r231, 0; 2026-02-21T09:32:58.9290704Z // begin inline asm 2026-02-21T09:32:58.9290753Z 2026-02-21T09:32:58.9290801Z { 2026-02-21T09:32:58.9290865Z @!%p15 bra.uni skipWait; 2026-02-21T09:32:58.9290933Z .reg .pred complete; 2026-02-21T09:32:58.9290989Z waitLoop: 2026-02-21T09:32:58.9291108Z mbarrier.try_wait.parity.shared.b64 complete, [%r230], %r231; 2026-02-21T09:32:58.9291181Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9291236Z skipWait: 2026-02-21T09:32:58.9291285Z } 2026-02-21T09:32:58.9291289Z 2026-02-21T09:32:58.9291351Z // end inline asm 2026-02-21T09:32:58.9291512Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9291587Z shfl.sync.idx.b32 %r34, %r237, 0, 31, -1; 2026-02-21T09:32:58.9291647Z setp.ne.b32 %p18, %r34, 0; 2026-02-21T09:32:58.9291714Z or.pred %p19, %p16, %p18; 2026-02-21T09:32:58.9291770Z @%p19 bra $L__BB0_7; 2026-02-21T09:32:58.9291860Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9291929Z elect.sync %r278|%p21, -1; 2026-02-21T09:32:58.9292010Z bfe.u32 %r280, %r156, 4, 14; 2026-02-21T09:32:58.9292088Z cvt.u64.u32 %rd68, %r280; 2026-02-21T09:32:58.9292156Z or.b64 %rd58, %rd68, 4611686293372403712; 2026-02-21T09:32:58.9292220Z bfe.u32 %r282, %r233, 4, 14; 2026-02-21T09:32:58.9292275Z cvt.u64.u32 %rd69, %r282; 2026-02-21T09:32:58.9292335Z or.b64 %rd59, %rd69, 4611686293372403712; 2026-02-21T09:32:58.9292398Z mov.b32 %r271, 136314896; 2026-02-21T09:32:58.9292453Z mov.pred %p20, 0; 2026-02-21T09:32:58.9292506Z // begin inline asm 2026-02-21T09:32:58.9292659Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r270 + 0 ], %rd58, %rd59, %r271, %p20; 2026-02-21T09:32:58.9292712Z // end inline asm 2026-02-21T09:32:58.9292766Z add.s32 %r283, %r156, 32; 2026-02-21T09:32:58.9292822Z bfe.u32 %r284, %r283, 4, 14; 
2026-02-21T09:32:58.9292883Z cvt.u64.u32 %rd70, %r284; 2026-02-21T09:32:58.9292946Z or.b64 %rd60, %rd70, 4611686293372403712; 2026-02-21T09:32:58.9293001Z add.s32 %r285, %r156, 49184; 2026-02-21T09:32:58.9293062Z bfe.u32 %r286, %r285, 4, 14; 2026-02-21T09:32:58.9293118Z cvt.u64.u32 %rd71, %r286; 2026-02-21T09:32:58.9293180Z or.b64 %rd61, %rd71, 4611686293372403712; 2026-02-21T09:32:58.9293240Z mov.pred %p22, -1; 2026-02-21T09:32:58.9293301Z // begin inline asm 2026-02-21T09:32:58.9293435Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r270 + 0 ], %rd60, %rd61, %r271, %p22; 2026-02-21T09:32:58.9293488Z // end inline asm 2026-02-21T09:32:58.9293550Z add.s32 %r287, %r156, 64; 2026-02-21T09:32:58.9293606Z bfe.u32 %r288, %r287, 4, 14; 2026-02-21T09:32:58.9293661Z cvt.u64.u32 %rd72, %r288; 2026-02-21T09:32:58.9293730Z or.b64 %rd62, %rd72, 4611686293372403712; 2026-02-21T09:32:58.9293786Z add.s32 %r289, %r156, 49216; 2026-02-21T09:32:58.9293840Z bfe.u32 %r290, %r289, 4, 14; 2026-02-21T09:32:58.9293895Z cvt.u64.u32 %rd73, %r290; 2026-02-21T09:32:58.9293963Z or.b64 %rd63, %rd73, 4611686293372403712; 2026-02-21T09:32:58.9294018Z // begin inline asm 2026-02-21T09:32:58.9294149Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r270 + 0 ], %rd62, %rd63, %r271, %p22; 2026-02-21T09:32:58.9294210Z // end inline asm 2026-02-21T09:32:58.9294286Z add.s32 %r291, %r156, 96; 2026-02-21T09:32:58.9294370Z bfe.u32 %r292, %r291, 4, 14; 2026-02-21T09:32:58.9294424Z cvt.u64.u32 %rd74, %r292; 2026-02-21T09:32:58.9294492Z or.b64 %rd64, %rd74, 4611686293372403712; 2026-02-21T09:32:58.9294546Z add.s32 %r293, %r156, 49248; 2026-02-21T09:32:58.9294599Z bfe.u32 %r294, %r293, 4, 14; 2026-02-21T09:32:58.9294659Z cvt.u64.u32 %rd75, %r294; 2026-02-21T09:32:58.9294756Z or.b64 %rd65, %rd75, 4611686293372403712; 2026-02-21T09:32:58.9294810Z // begin inline asm 2026-02-21T09:32:58.9294938Z @%p21 tcgen05.mma.cta_group::1.kind::f16 [ %r270 + 0 ], %rd64, %rd65, %r271, %p22; 2026-02-21T09:32:58.9294991Z // end inline 
asm 2026-02-21T09:32:58.9295047Z add.s32 %r295, %r156, 90112; 2026-02-21T09:32:58.9295100Z cvt.u64.u32 %rd66, %r295; 2026-02-21T09:32:58.9295162Z // begin inline asm 2026-02-21T09:32:58.9295278Z @%p21 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd66]; 2026-02-21T09:32:58.9295332Z // end inline asm 2026-02-21T09:32:58.9295395Z add.s32 %r296, %r156, 90176; 2026-02-21T09:32:58.9295451Z cvt.u64.u32 %rd67, %r296; 2026-02-21T09:32:58.9295503Z // begin inline asm 2026-02-21T09:32:58.9295622Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:32:58.9295674Z // end inline asm 2026-02-21T09:32:58.9295766Z $L__BB0_7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9295934Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9296001Z setp.gt.s32 %p31, %r7, 2; 2026-02-21T09:32:58.9296167Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9296223Z add.s32 %r297, %r156, 90192; 2026-02-21T09:32:58.9296286Z mov.pred %p122, 0; 2026-02-21T09:32:58.9296340Z // begin inline asm 2026-02-21T09:32:58.9296389Z 2026-02-21T09:32:58.9296471Z { 2026-02-21T09:32:58.9296574Z @!%p122 bra.uni skipWait; 2026-02-21T09:32:58.9296635Z .reg .pred complete; 2026-02-21T09:32:58.9296689Z waitLoop: 2026-02-21T09:32:58.9296811Z mbarrier.try_wait.parity.shared.b64 complete, [%r297], %r231; 2026-02-21T09:32:58.9296873Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9296925Z skipWait: 2026-02-21T09:32:58.9296973Z } 2026-02-21T09:32:58.9296984Z 2026-02-21T09:32:58.9297037Z // end inline asm 2026-02-21T09:32:58.9297198Z .loc 1 55 34 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:34 2026-02-21T09:32:58.9297255Z add.s64 %rd76, %rd1, 256; 2026-02-21T09:32:58.9297317Z add.s64 %rd77, %rd2, 256; 2026-02-21T09:32:58.9297372Z add.s64 %rd78, %rd3, 256; 2026-02-21T09:32:58.9297424Z add.s64 %rd79, %rd4, 256; 2026-02-21T09:32:58.9297487Z add.s64 %rd80, %rd5, 256; 
2026-02-21T09:32:58.9297542Z add.s64 %rd81, %rd6, 256; 2026-02-21T09:32:58.9297598Z add.s64 %rd82, %rd7, 256; 2026-02-21T09:32:58.9297652Z add.s64 %rd83, %rd8, 256; 2026-02-21T09:32:58.9297824Z .loc 1 55 87 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:87 2026-02-21T09:32:58.9297882Z bar.sync 2, 128; 2026-02-21T09:32:58.9297940Z selp.b32 %r300, 16, 0, %p31; 2026-02-21T09:32:58.9298001Z // begin inline asm 2026-02-21T09:32:58.9298109Z cp.async.cg.shared.global [ %r299 + 0 ], [ %rd76 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9298164Z // end inline asm 2026-02-21T09:32:58.9298229Z // begin inline asm 2026-02-21T09:32:58.9298340Z cp.async.cg.shared.global [ %r301 + 0 ], [ %rd77 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9298394Z // end inline asm 2026-02-21T09:32:58.9298450Z // begin inline asm 2026-02-21T09:32:58.9298584Z cp.async.cg.shared.global [ %r303 + 0 ], [ %rd78 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9298637Z // end inline asm 2026-02-21T09:32:58.9298693Z // begin inline asm 2026-02-21T09:32:58.9298809Z cp.async.cg.shared.global [ %r305 + 0 ], [ %rd79 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9298865Z // end inline asm 2026-02-21T09:32:58.9298922Z // begin inline asm 2026-02-21T09:32:58.9299064Z cp.async.cg.shared.global [ %r307 + 0 ], [ %rd80 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9299152Z // end inline asm 2026-02-21T09:32:58.9299207Z // begin inline asm 2026-02-21T09:32:58.9299316Z cp.async.cg.shared.global [ %r309 + 0 ], [ %rd81 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9299377Z // end inline asm 2026-02-21T09:32:58.9299432Z // begin inline asm 2026-02-21T09:32:58.9299543Z cp.async.cg.shared.global [ %r311 + 0 ], [ %rd82 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9299606Z // end inline asm 2026-02-21T09:32:58.9299660Z // begin inline asm 2026-02-21T09:32:58.9299769Z cp.async.cg.shared.global [ %r313 + 0 ], [ %rd83 + 0 ], 0x10, %r300; 2026-02-21T09:32:58.9299822Z // end inline asm 2026-02-21T09:32:58.9299893Z cp.async.commit_group; 
2026-02-21T09:32:58.9300065Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9300127Z @%p16 bra $L__BB0_14; 2026-02-21T09:32:58.9300215Z // %bb.8: // %.lr.ph13 2026-02-21T09:32:58.9300309Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9300480Z .loc 1 0 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:107 2026-02-21T09:32:58.9300550Z add.s32 %r6, %r235, -12272; 2026-02-21T09:32:58.9300609Z sub.s32 %r35, 13, %r6; 2026-02-21T09:32:58.9300668Z sub.s32 %r36, 15, %r6; 2026-02-21T09:32:58.9300723Z mov.b32 %r1193, 2; 2026-02-21T09:32:58.9300785Z mov.b32 %r1179, 0; 2026-02-21T09:32:58.9300839Z mov.b32 %r1178, 1; 2026-02-21T09:32:58.9300895Z mov.b32 %r1177, 128; 2026-02-21T09:32:58.9300959Z mov.b32 %r1180, %r1179; 2026-02-21T09:32:58.9301017Z mov.b32 %r1181, %r1179; 2026-02-21T09:32:58.9301076Z mov.b32 %r1182, %r1179; 2026-02-21T09:32:58.9301131Z mov.b32 %r1183, %r1179; 2026-02-21T09:32:58.9301194Z mov.b32 %r1203, %r5; 2026-02-21T09:32:58.9301277Z mov.b32 %r1194, %r1179; 2026-02-21T09:32:58.9301356Z bra.uni $L__BB0_9; 2026-02-21T09:32:58.9301467Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:32:58.9301642Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9301708Z setp.eq.b32 %p122, %r1178, 15; 2026-02-21T09:32:58.9301772Z setp.eq.b32 %p54, %r1178, 15; 2026-02-21T09:32:58.9301840Z setp.eq.b32 %p55, %r55, 0; 2026-02-21T09:32:58.9301903Z setp.lt.s32 %p57, %r1194, %r35; 2026-02-21T09:32:58.9302074Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9302140Z add.s32 %r394, %r1181, 1; 2026-02-21T09:32:58.9302201Z setp.eq.b32 %p58, %r394, 2; 2026-02-21T09:32:58.9302263Z selp.b32 %r395, 0, %r394, %p58; 2026-02-21T09:32:58.9302340Z selp.b32 %r1181, %r395, %r1181, %p54; 2026-02-21T09:32:58.9302403Z and.pred %p59, %p54, %p58; 2026-02-21T09:32:58.9302463Z selp.b32 %r396, 1, 0, %p59; 
2026-02-21T09:32:58.9302526Z xor.b32 %r1180, %r1180, %r396; 2026-02-21T09:32:58.9302709Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9302768Z shl.b32 %r397, %r1181, 3; 2026-02-21T09:32:58.9302827Z add.s32 %r399, %r156, %r397; 2026-02-21T09:32:58.9302892Z add.s32 %r376, %r399, 90192; 2026-02-21T09:32:58.9302954Z and.pred %p53, %p36, %p54; 2026-02-21T09:32:58.9303121Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9303183Z // begin inline asm 2026-02-21T09:32:58.9303234Z 2026-02-21T09:32:58.9303283Z { 2026-02-21T09:32:58.9303344Z @!%p53 bra.uni skipWait; 2026-02-21T09:32:58.9303410Z .reg .pred complete; 2026-02-21T09:32:58.9303464Z waitLoop: 2026-02-21T09:32:58.9303584Z mbarrier.try_wait.parity.shared.b64 complete, [%r376], %r1180; 2026-02-21T09:32:58.9303655Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9303710Z skipWait: 2026-02-21T09:32:58.9303760Z } 2026-02-21T09:32:58.9303765Z 2026-02-21T09:32:58.9303843Z // end inline asm 2026-02-21T09:32:58.9304051Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9304112Z add.s32 %r400, %r1177, 64; 2026-02-21T09:32:58.9304177Z selp.b32 %r1177, 0, %r400, %p55; 2026-02-21T09:32:58.9304350Z .loc 1 50 35 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:50:35 2026-02-21T09:32:58.9304408Z add.s32 %r401, %r1177, %r16; 2026-02-21T09:32:58.9304575Z .loc 1 55 80 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:80 2026-02-21T09:32:58.9304640Z shl.b32 %r402, %r1195, 10; 2026-02-21T09:32:58.9304727Z shl.b32 %r403, %r1196, 10; 2026-02-21T09:32:58.9304784Z shl.b32 %r404, %r1197, 10; 2026-02-21T09:32:58.9304841Z shl.b32 %r405, %r1198, 10; 2026-02-21T09:32:58.9304905Z shl.b32 %r406, %r1199, 10; 2026-02-21T09:32:58.9304961Z shl.b32 %r407, %r1200, 10; 2026-02-21T09:32:58.9305018Z shl.b32 %r408, %r1201, 10; 2026-02-21T09:32:58.9305083Z shl.b32 %r409, %r1202, 10; 
2026-02-21T09:32:58.9305252Z .loc 1 55 59 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:59 2026-02-21T09:32:58.9305310Z add.s32 %r410, %r402, %r401; 2026-02-21T09:32:58.9305369Z add.s32 %r411, %r403, %r401; 2026-02-21T09:32:58.9305433Z add.s32 %r412, %r404, %r401; 2026-02-21T09:32:58.9305490Z add.s32 %r413, %r405, %r401; 2026-02-21T09:32:58.9305546Z add.s32 %r414, %r406, %r401; 2026-02-21T09:32:58.9305612Z add.s32 %r415, %r407, %r401; 2026-02-21T09:32:58.9305668Z add.s32 %r416, %r408, %r401; 2026-02-21T09:32:58.9305724Z add.s32 %r417, %r409, %r401; 2026-02-21T09:32:58.9305899Z .loc 1 55 34 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:34 2026-02-21T09:32:58.9305969Z mad.wide.s32 %rd102, %r410, 2, %rd11; 2026-02-21T09:32:58.9306036Z mad.wide.s32 %rd103, %r411, 2, %rd11; 2026-02-21T09:32:58.9306155Z mad.wide.s32 %rd104, %r412, 2, %rd11; 2026-02-21T09:32:58.9306230Z mad.wide.s32 %rd105, %r413, 2, %rd11; 2026-02-21T09:32:58.9306294Z mad.wide.s32 %rd106, %r414, 2, %rd11; 2026-02-21T09:32:58.9306358Z mad.wide.s32 %rd107, %r415, 2, %rd11; 2026-02-21T09:32:58.9306430Z mad.wide.s32 %rd108, %r416, 2, %rd11; 2026-02-21T09:32:58.9306491Z mad.wide.s32 %rd109, %r417, 2, %rd11; 2026-02-21T09:32:58.9306663Z .loc 1 55 87 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:87 2026-02-21T09:32:58.9306727Z bar.sync 2, 128; 2026-02-21T09:32:58.9306786Z add.s32 %r378, %r77, %r25; 2026-02-21T09:32:58.9306844Z selp.b32 %r379, 16, 0, %p57; 2026-02-21T09:32:58.9306901Z // begin inline asm 2026-02-21T09:32:58.9307033Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd102 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9307088Z // end inline asm 2026-02-21T09:32:58.9307145Z add.s32 %r380, %r378, 2048; 2026-02-21T09:32:58.9307208Z // begin inline asm 2026-02-21T09:32:58.9307331Z cp.async.cg.shared.global [ %r380 + 0 ], [ %rd103 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9307387Z // end inline asm 2026-02-21T09:32:58.9307456Z add.s32 %r382, %r378, 4096; 
2026-02-21T09:32:58.9307516Z // begin inline asm 2026-02-21T09:32:58.9307626Z cp.async.cg.shared.global [ %r382 + 0 ], [ %rd104 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9307677Z // end inline asm 2026-02-21T09:32:58.9307737Z add.s32 %r384, %r378, 6144; 2026-02-21T09:32:58.9307790Z // begin inline asm 2026-02-21T09:32:58.9307900Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd105 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9307953Z // end inline asm 2026-02-21T09:32:58.9308016Z add.s32 %r386, %r378, 8192; 2026-02-21T09:32:58.9308070Z // begin inline asm 2026-02-21T09:32:58.9308179Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd106 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9308240Z // end inline asm 2026-02-21T09:32:58.9308296Z add.s32 %r388, %r378, 10240; 2026-02-21T09:32:58.9308349Z // begin inline asm 2026-02-21T09:32:58.9308465Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd107 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9308545Z // end inline asm 2026-02-21T09:32:58.9308626Z add.s32 %r390, %r378, 12288; 2026-02-21T09:32:58.9308680Z // begin inline asm 2026-02-21T09:32:58.9308797Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd108 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9308851Z // end inline asm 2026-02-21T09:32:58.9308907Z add.s32 %r392, %r378, 14336; 2026-02-21T09:32:58.9308968Z // begin inline asm 2026-02-21T09:32:58.9309077Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd109 + 0 ], 0x10, %r379; 2026-02-21T09:32:58.9309129Z // end inline asm 2026-02-21T09:32:58.9309188Z cp.async.commit_group; 2026-02-21T09:32:58.9309365Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9309420Z add.s32 %r1194, %r1194, 1; 2026-02-21T09:32:58.9309481Z setp.ne.b32 %p60, %r7, %r1194; 2026-02-21T09:32:58.9309544Z mov.b32 %r1178, %r1193; 2026-02-21T09:32:58.9309599Z mov.b32 %r1193, %r55; 2026-02-21T09:32:58.9309655Z @%p60 bra $L__BB0_9; 2026-02-21T09:32:58.9309719Z bra.uni $L__BB0_14; 2026-02-21T09:32:58.9309810Z $L__BB0_9: // Parent Loop BB0_2 
Depth=1 2026-02-21T09:32:58.9309899Z // => This Inner Loop Header: Depth=2 2026-02-21T09:32:58.9310069Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9310132Z add.s32 %r320, %r1193, 1; 2026-02-21T09:32:58.9310195Z setp.eq.b32 %p34, %r1193, 15; 2026-02-21T09:32:58.9310254Z selp.b32 %r55, 0, %r320, %p34; 2026-02-21T09:32:58.9310320Z setp.ne.b32 %p35, %r55, 0; 2026-02-21T09:32:58.9310375Z @%p35 bra $L__BB0_11; 2026-02-21T09:32:58.9310466Z // %bb.10: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:32:58.9310528Z add.s32 %r1203, %r1203, 1; 2026-02-21T09:32:58.9310710Z .loc 1 36 35 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:36:35 2026-02-21T09:32:58.9310798Z mul.hi.s32 %r321, %r1203, 715827883; 2026-02-21T09:32:58.9310856Z shr.u32 %r322, %r321, 31; 2026-02-21T09:32:58.9310916Z shr.s32 %r323, %r321, 10; 2026-02-21T09:32:58.9310970Z add.s32 %r324, %r323, %r322; 2026-02-21T09:32:58.9311123Z .loc 1 37 33 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:37:33 2026-02-21T09:32:58.9311185Z shl.b32 %r325, %r324, 6; 2026-02-21T09:32:58.9311339Z .loc 1 38 39 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:39 2026-02-21T09:32:58.9311393Z sub.s32 %r326, 8, %r325; 2026-02-21T09:32:58.9311553Z .loc 1 38 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:52 2026-02-21T09:32:58.9311607Z min.s32 %r327, %r326, 64; 2026-02-21T09:32:58.9311761Z .loc 1 39 45 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:45 2026-02-21T09:32:58.9311820Z mul.lo.s32 %r328, %r324, 6144; 2026-02-21T09:32:58.9311886Z sub.s32 %r329, %r1203, %r328; 2026-02-21T09:32:58.9312041Z .loc 1 39 64 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:64 2026-02-21T09:32:58.9312097Z rem.s32 %r330, %r329, %r327; 2026-02-21T09:32:58.9312257Z .loc 1 39 30 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:30 2026-02-21T09:32:58.9312314Z add.s32 %r331, %r330, %r325; 
2026-02-21T09:32:58.9312466Z .loc 1 41 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:41:27 2026-02-21T09:32:58.9312530Z shl.b32 %r332, %r331, 7; 2026-02-21T09:32:58.9312683Z .loc 1 42 32 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:42:32 2026-02-21T09:32:58.9312739Z or.b32 %r1195, %r332, %r8; 2026-02-21T09:32:58.9312802Z or.b32 %r1196, %r332, %r9; 2026-02-21T09:32:58.9312858Z or.b32 %r1197, %r332, %r10; 2026-02-21T09:32:58.9312913Z or.b32 %r1198, %r332, %r11; 2026-02-21T09:32:58.9312969Z or.b32 %r1199, %r332, %r12; 2026-02-21T09:32:58.9313063Z or.b32 %r1200, %r332, %r13; 2026-02-21T09:32:58.9313141Z or.b32 %r1201, %r332, %r14; 2026-02-21T09:32:58.9313194Z or.b32 %r1202, %r332, %r15; 2026-02-21T09:32:58.9313295Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:32:58.9313461Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9313523Z setp.ge.s32 %p37, %r1194, %r36; 2026-02-21T09:32:58.9313583Z setp.lt.s32 %p36, %r1194, %r36; 2026-02-21T09:32:58.9313746Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9313801Z add.s32 %r335, %r1183, 1; 2026-02-21T09:32:58.9313859Z setp.eq.b32 %p39, %r335, 3; 2026-02-21T09:32:58.9313928Z selp.b32 %r1183, 0, %r335, %p39; 2026-02-21T09:32:58.9313983Z selp.b32 %r336, 1, 0, %p39; 2026-02-21T09:32:58.9314041Z xor.b32 %r1182, %r1182, %r336; 2026-02-21T09:32:58.9314212Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9314271Z add.s32 %r337, %r1179, 1; 2026-02-21T09:32:58.9314330Z setp.gt.s32 %p40, %r337, 1; 2026-02-21T09:32:58.9314392Z selp.b32 %r1179, 0, %r337, %p40; 2026-02-21T09:32:58.9314459Z shl.b32 %r338, %r1183, 3; 2026-02-21T09:32:58.9314515Z add.s32 %r340, %r156, %r338; 2026-02-21T09:32:58.9314569Z add.s32 %r333, %r340, 90144; 2026-02-21T09:32:58.9314765Z .loc 1 55 87 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:55:87 
2026-02-21T09:32:58.9314826Z cp.async.wait_group 1; 2026-02-21T09:32:58.9314879Z bar.sync 2, 128; 2026-02-21T09:32:58.9314941Z shl.b32 %r341, %r1179, 14; 2026-02-21T09:32:58.9314998Z add.s32 %r342, %r156, %r341; 2026-02-21T09:32:58.9315053Z add.s32 %r77, %r342, 49152; 2026-02-21T09:32:58.9315251Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9315339Z // begin inline asm 2026-02-21T09:32:58.9315389Z 2026-02-21T09:32:58.9315436Z { 2026-02-21T09:32:58.9315507Z @!%p36 bra.uni skipWait; 2026-02-21T09:32:58.9315567Z .reg .pred complete; 2026-02-21T09:32:58.9315619Z waitLoop: 2026-02-21T09:32:58.9315739Z mbarrier.try_wait.parity.shared.b64 complete, [%r333], %r1182; 2026-02-21T09:32:58.9315808Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9315859Z skipWait: 2026-02-21T09:32:58.9315906Z } 2026-02-21T09:32:58.9315910Z 2026-02-21T09:32:58.9315972Z // end inline asm 2026-02-21T09:32:58.9316128Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9316188Z or.pred %p41, %p18, %p37; 2026-02-21T09:32:58.9316243Z @%p41 bra $L__BB0_13; 2026-02-21T09:32:58.9316338Z // %bb.12: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:32:58.9316505Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9316566Z setp.eq.b32 %p52, %r1178, 15; 2026-02-21T09:32:58.9316631Z shl.b32 %r351, %r1181, 3; 2026-02-21T09:32:58.9316687Z add.s32 %r353, %r156, %r351; 2026-02-21T09:32:58.9316742Z add.s32 %r354, %r353, 90176; 2026-02-21T09:32:58.9316907Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9316961Z shl.b32 %r355, %r1181, 7; 2026-02-21T09:32:58.9317015Z add.s32 %r343, %r355, %r270; 2026-02-21T09:32:58.9317183Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9317241Z shl.b32 %r356, %r1183, 14; 2026-02-21T09:32:58.9317297Z add.s32 %r357, %r156, %r356; 
2026-02-21T09:32:58.9317461Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9317523Z add.s32 %r360, %r340, 90112; 2026-02-21T09:32:58.9317684Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9317771Z not.pred %p42, %p122; 2026-02-21T09:32:58.9317864Z elect.sync %r361|%p43, -1; 2026-02-21T09:32:58.9317920Z bfe.u32 %r362, %r357, 4, 14; 2026-02-21T09:32:58.9317976Z cvt.u64.u32 %rd94, %r362; 2026-02-21T09:32:58.9318042Z or.b64 %rd84, %rd94, 4611686293372403712; 2026-02-21T09:32:58.9318106Z bfe.u32 %r363, %r77, 4, 14; 2026-02-21T09:32:58.9318161Z cvt.u64.u32 %rd95, %r363; 2026-02-21T09:32:58.9318225Z or.b64 %rd85, %rd95, 4611686293372403712; 2026-02-21T09:32:58.9318287Z mov.b32 %r344, 136314896; 2026-02-21T09:32:58.9318341Z // begin inline asm 2026-02-21T09:32:58.9318475Z @%p43 tcgen05.mma.cta_group::1.kind::f16 [ %r343 + 0 ], %rd84, %rd85, %r344, %p42; 2026-02-21T09:32:58.9318534Z // end inline asm 2026-02-21T09:32:58.9318587Z add.s32 %r364, %r357, 32; 2026-02-21T09:32:58.9318642Z bfe.u32 %r365, %r364, 4, 14; 2026-02-21T09:32:58.9318696Z cvt.u64.u32 %rd96, %r365; 2026-02-21T09:32:58.9318766Z or.b64 %rd86, %rd96, 4611686293372403712; 2026-02-21T09:32:58.9318821Z add.s32 %r366, %r77, 32; 2026-02-21T09:32:58.9318878Z bfe.u32 %r367, %r366, 4, 14; 2026-02-21T09:32:58.9318939Z cvt.u64.u32 %rd97, %r367; 2026-02-21T09:32:58.9318998Z or.b64 %rd87, %rd97, 4611686293372403712; 2026-02-21T09:32:58.9319054Z mov.pred %p44, -1; 2026-02-21T09:32:58.9319107Z // begin inline asm 2026-02-21T09:32:58.9319244Z @%p43 tcgen05.mma.cta_group::1.kind::f16 [ %r343 + 0 ], %rd86, %rd87, %r344, %p44; 2026-02-21T09:32:58.9319296Z // end inline asm 2026-02-21T09:32:58.9319350Z add.s32 %r368, %r357, 64; 2026-02-21T09:32:58.9319414Z bfe.u32 %r369, %r368, 4, 14; 2026-02-21T09:32:58.9319468Z cvt.u64.u32 %rd98, %r369; 2026-02-21T09:32:58.9319527Z or.b64 %rd88, %rd98, 4611686293372403712; 
2026-02-21T09:32:58.9319590Z add.s32 %r370, %r77, 64; 2026-02-21T09:32:58.9319644Z bfe.u32 %r371, %r370, 4, 14; 2026-02-21T09:32:58.9319697Z cvt.u64.u32 %rd99, %r371; 2026-02-21T09:32:58.9319782Z or.b64 %rd89, %rd99, 4611686293372403712; 2026-02-21T09:32:58.9319869Z // begin inline asm 2026-02-21T09:32:58.9319998Z @%p43 tcgen05.mma.cta_group::1.kind::f16 [ %r343 + 0 ], %rd88, %rd89, %r344, %p44; 2026-02-21T09:32:58.9320053Z // end inline asm 2026-02-21T09:32:58.9320114Z add.s32 %r372, %r357, 96; 2026-02-21T09:32:58.9320168Z bfe.u32 %r373, %r372, 4, 14; 2026-02-21T09:32:58.9320225Z cvt.u64.u32 %rd100, %r373; 2026-02-21T09:32:58.9320292Z or.b64 %rd90, %rd100, 4611686293372403712; 2026-02-21T09:32:58.9320356Z add.s32 %r374, %r77, 96; 2026-02-21T09:32:58.9320410Z bfe.u32 %r375, %r374, 4, 14; 2026-02-21T09:32:58.9320467Z cvt.u64.u32 %rd101, %r375; 2026-02-21T09:32:58.9320540Z or.b64 %rd91, %rd101, 4611686293372403712; 2026-02-21T09:32:58.9320594Z // begin inline asm 2026-02-21T09:32:58.9320719Z @%p43 tcgen05.mma.cta_group::1.kind::f16 [ %r343 + 0 ], %rd90, %rd91, %r344, %p44; 2026-02-21T09:32:58.9320778Z // end inline asm 2026-02-21T09:32:58.9320834Z cvt.u64.u32 %rd92, %r360; 2026-02-21T09:32:58.9320888Z // begin inline asm 2026-02-21T09:32:58.9321008Z @%p43 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd92]; 2026-02-21T09:32:58.9321071Z // end inline asm 2026-02-21T09:32:58.9321131Z and.pred %p51, %p52, %p43; 2026-02-21T09:32:58.9321186Z cvt.u64.u32 %rd93, %r354; 2026-02-21T09:32:58.9321247Z // begin inline asm 2026-02-21T09:32:58.9321368Z @%p51 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd93]; 2026-02-21T09:32:58.9321420Z // end inline asm 2026-02-21T09:32:58.9321484Z bra.uni $L__BB0_13; 2026-02-21T09:32:58.9321580Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9321749Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9321805Z barrier.sync 1; 2026-02-21T09:32:58.9321982Z .loc 
1 21 68 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:21:68 2026-02-21T09:32:58.9322044Z mov.u32 %r82, %ctaid.x; 2026-02-21T09:32:58.9322104Z mov.u32 %r158, %ctaid.y; 2026-02-21T09:32:58.9322167Z mov.u32 %r159, %ctaid.z; 2026-02-21T09:32:58.9322244Z mov.u32 %r160, %nctaid.x; 2026-02-21T09:32:58.9322324Z mov.u32 %r161, %nctaid.y; 2026-02-21T09:32:58.9322396Z mad.lo.s32 %r162, %r159, %r161, %r158; 2026-02-21T09:32:58.9322458Z mad.lo.s32 %r163, %r162, %r160, %r82; 2026-02-21T09:32:58.9322514Z shl.b32 %r164, %r163, 7; 2026-02-21T09:32:58.9322569Z cvt.s64.s32 %rd14, %r164; 2026-02-21T09:32:58.9322633Z add.s64 %rd15, %rd13, %rd14; 2026-02-21T09:32:58.9322694Z cvta.global.u64 %rd16, %rd15; 2026-02-21T09:32:58.9322865Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9322926Z max.u32 %r165, %r82, 767; 2026-02-21T09:32:58.9322980Z shl.b32 %r166, %r165, 4; 2026-02-21T09:32:58.9323036Z sub.s32 %r83, 12288, %r166; 2026-02-21T09:32:58.9323094Z setp.lt.s32 %p5, %r83, 1; 2026-02-21T09:32:58.9323156Z @%p5 bra $L__BB0_20; 2026-02-21T09:32:58.9323230Z // %bb.16: // %.lr.ph 2026-02-21T09:32:58.9323316Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9323381Z add.s32 %r1212, %r82, -1; 2026-02-21T09:32:58.9323437Z add.s32 %r85, %r1, -256; 2026-02-21T09:32:58.9323492Z mov.b32 %r1209, -1; 2026-02-21T09:32:58.9323545Z mov.b32 %r1204, 0; 2026-02-21T09:32:58.9323609Z mov.b32 %r1205, %r1204; 2026-02-21T09:32:58.9323662Z mov.b32 %r1211, %r1204; 2026-02-21T09:32:58.9323715Z mov.b32 %r1207, %r1204; 2026-02-21T09:32:58.9323775Z mov.b32 %r1210, %r1204; 2026-02-21T09:32:58.9323830Z bra.uni $L__BB0_17; 2026-02-21T09:32:58.9323925Z $L__BB0_19: // in Loop: Header=BB0_17 Depth=2 2026-02-21T09:32:58.9324095Z .loc 1 0 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0:107 2026-02-21T09:32:58.9324155Z selp.b32 %r188, 0, %r1207, %p8; 2026-02-21T09:32:58.9324216Z setp.lt.u32 %p11, %r85, 32; 2026-02-21T09:32:58.9324298Z 
setp.eq.b32 %p9, %r85, 0; 2026-02-21T09:32:58.9324496Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9324554Z shl.b32 %r191, %r1205, 3; 2026-02-21T09:32:58.9324610Z add.s32 %r193, %r156, %r191; 2026-02-21T09:32:58.9324700Z add.s32 %r184, %r193, 90112; 2026-02-21T09:32:58.9324863Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9324917Z // begin inline asm 2026-02-21T09:32:58.9324972Z 2026-02-21T09:32:58.9325020Z { 2026-02-21T09:32:58.9325080Z .reg .pred complete; 2026-02-21T09:32:58.9325132Z waitLoop: 2026-02-21T09:32:58.9325257Z mbarrier.try_wait.parity.shared.b64 complete, [%r184], %r1204; 2026-02-21T09:32:58.9325319Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9325366Z } 2026-02-21T09:32:58.9325370Z 2026-02-21T09:32:58.9325429Z // end inline asm 2026-02-21T09:32:58.9325593Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9325650Z add.s32 %r190, %r193, 90144; 2026-02-21T09:32:58.9325818Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9325871Z bar.sync 3, 64; 2026-02-21T09:32:58.9325924Z // begin inline asm 2026-02-21T09:32:58.9326031Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r190], 16384; 2026-02-21T09:32:58.9326090Z // end inline asm 2026-02-21T09:32:58.9326147Z shl.b32 %r194, %r1205, 14; 2026-02-21T09:32:58.9326202Z add.s32 %r187, %r156, %r194; 2026-02-21T09:32:58.9326261Z bar.sync 3, 64; 2026-02-21T09:32:58.9326321Z elect.sync %r195|%p12, -1; 2026-02-21T09:32:58.9326378Z and.pred %p10, %p11, %p12; 2026-02-21T09:32:58.9326431Z // begin inline asm 2026-02-21T09:32:58.9326691Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r187], [%rd16, {%r188, %r1211}], [%r190]; 2026-02-21T09:32:58.9326744Z // end inline asm 2026-02-21T09:32:58.9326916Z .loc 1 30 107 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9327033Z add.s32 %r1207, %r188, 64; 2026-02-21T09:32:58.9327197Z .loc 1 54 31 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:54:31 2026-02-21T09:32:58.9327253Z add.s32 %r196, %r1205, 1; 2026-02-21T09:32:58.9327318Z setp.eq.b32 %p13, %r196, 3; 2026-02-21T09:32:58.9327380Z selp.b32 %r1205, 0, %r196, %p13; 2026-02-21T09:32:58.9327438Z selp.b32 %r197, 1, 0, %p13; 2026-02-21T09:32:58.9327497Z xor.b32 %r1204, %r1204, %r197; 2026-02-21T09:32:58.9327674Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9327730Z add.s32 %r1210, %r1210, 1; 2026-02-21T09:32:58.9327791Z setp.lt.s32 %p14, %r1210, %r83; 2026-02-21T09:32:58.9327855Z @%p14 bra $L__BB0_17; 2026-02-21T09:32:58.9327910Z bra.uni $L__BB0_20; 2026-02-21T09:32:58.9328008Z $L__BB0_17: // Parent Loop BB0_2 Depth=1 2026-02-21T09:32:58.9328108Z // => This Inner Loop Header: Depth=2 2026-02-21T09:32:58.9328166Z add.s32 %r169, %r1209, 1; 2026-02-21T09:32:58.9328225Z setp.eq.b32 %p6, %r1209, 15; 2026-02-21T09:32:58.9328285Z selp.b32 %r1209, 0, %r169, %p6; 2026-02-21T09:32:58.9328351Z setp.ne.b32 %p7, %r1209, 0; 2026-02-21T09:32:58.9328409Z setp.eq.b32 %p8, %r1209, 0; 2026-02-21T09:32:58.9328464Z @%p7 bra $L__BB0_19; 2026-02-21T09:32:58.9328564Z // %bb.18: // in Loop: Header=BB0_17 Depth=2 2026-02-21T09:32:58.9328621Z add.s32 %r1212, %r1212, 1; 2026-02-21T09:32:58.9328781Z .loc 1 36 35 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:36:35 2026-02-21T09:32:58.9328862Z mul.hi.s32 %r170, %r1212, -715827883; 2026-02-21T09:32:58.9328917Z shr.u32 %r171, %r170, 31; 2026-02-21T09:32:58.9328972Z shr.s32 %r172, %r170, 10; 2026-02-21T09:32:58.9329054Z add.s32 %r173, %r172, %r171; 2026-02-21T09:32:58.9329247Z .loc 1 37 33 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:37:33 2026-02-21T09:32:58.9329307Z shl.b32 %r174, %r173, 6; 2026-02-21T09:32:58.9329468Z .loc 1 38 39 // 
cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:39 2026-02-21T09:32:58.9329531Z or.b32 %r175, %r174, 8; 2026-02-21T09:32:58.9329694Z .loc 1 38 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:38:52 2026-02-21T09:32:58.9329752Z min.s32 %r176, %r175, 64; 2026-02-21T09:32:58.9329924Z .loc 1 39 45 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:39:45 2026-02-21T09:32:58.9329988Z mul.hi.s32 %r177, %r1212, 715827883; 2026-02-21T09:32:58.9330053Z shr.u32 %r178, %r177, 31; 2026-02-21T09:32:58.9330107Z shr.u32 %r179, %r177, 10; 2026-02-21T09:32:58.9330171Z add.s32 %r180, %r179, %r178; 2026-02-21T09:32:58.9330230Z mul.lo.s32 %r181, %r180, 6144; 2026-02-21T09:32:58.9330290Z sub.s32 %r182, %r1212, %r181; 2026-02-21T09:32:58.9330459Z .loc 1 40 51 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:40:51 2026-02-21T09:32:58.9330517Z div.s32 %r183, %r182, %r176; 2026-02-21T09:32:58.9330677Z .loc 1 43 27 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:43:27 2026-02-21T09:32:58.9330739Z shl.b32 %r1211, %r183, 7; 2026-02-21T09:32:58.9330793Z bra.uni $L__BB0_19; 2026-02-21T09:32:58.9330871Z $L__BB0_20: // %._crit_edge 2026-02-21T09:32:58.9330956Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9331134Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9331190Z barrier.sync 1; 2026-02-21T09:32:58.9331244Z bra.uni $L__BB0_2; 2026-02-21T09:32:58.9331333Z $L__BB0_14: // %._crit_edge14 2026-02-21T09:32:58.9331419Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9331503Z cp.async.wait_group 0; 2026-02-21T09:32:58.9331598Z bar.sync 2, 128; 2026-02-21T09:32:58.9331653Z barrier.sync 1; 2026-02-21T09:32:58.9331707Z bra.uni $L__BB0_2; 2026-02-21T09:32:58.9331798Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:32:58.9331965Z .loc 1 19 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:19 2026-02-21T09:32:58.9332020Z barrier.sync 1; 2026-02-21T09:32:58.9332073Z 
barrier.sync 1; 2026-02-21T09:32:58.9332131Z bra.uni $L__BB0_2; 2026-02-21T09:32:58.9332210Z $L__BB0_29: // %._crit_edge17 2026-02-21T09:32:58.9332372Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9332433Z barrier.sync 1; 2026-02-21T09:32:58.9332491Z shl.b32 %r1175, %r1224, 3; 2026-02-21T09:32:58.9332553Z add.s32 %r1162, %r574, %r1175; 2026-02-21T09:32:58.9332712Z .loc 1 56 52 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:56:52 2026-02-21T09:32:58.9332778Z bar.sync 0, 128; 2026-02-21T09:32:58.9332833Z // begin inline asm 2026-02-21T09:32:58.9332881Z 2026-02-21T09:32:58.9332935Z { 2026-02-21T09:32:58.9332993Z .reg .pred complete; 2026-02-21T09:32:58.9333046Z waitLoop: 2026-02-21T09:32:58.9333166Z mbarrier.try_wait.parity.shared.b64 complete, [%r1162], %r1226; 2026-02-21T09:32:58.9333238Z @!complete bra.uni waitLoop; 2026-02-21T09:32:58.9333285Z } 2026-02-21T09:32:58.9333289Z 2026-02-21T09:32:58.9333342Z // end inline asm 2026-02-21T09:32:58.9333515Z .loc 1 30 107 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:107 2026-02-21T09:32:58.9333569Z bar.sync 0, 128; 2026-02-21T09:32:58.9333623Z // begin inline asm 2026-02-21T09:32:58.9333715Z @%p111 mbarrier.inval.shared::cta.b64 [%r574]; 2026-02-21T09:32:58.9333767Z // end inline asm 2026-02-21T09:32:58.9333844Z bar.sync 0, 128; 2026-02-21T09:32:58.9333920Z // begin inline asm 2026-02-21T09:32:58.9334014Z @%p111 mbarrier.inval.shared::cta.b64 [%r575]; 2026-02-21T09:32:58.9334066Z // end inline asm 2026-02-21T09:32:58.9334118Z // begin inline asm 2026-02-21T09:32:58.9334200Z @%p111 mbarrier.inval.shared::cta.b64 [%r572]; 2026-02-21T09:32:58.9334251Z // end inline asm 2026-02-21T09:32:58.9334301Z bar.sync 0, 128; 2026-02-21T09:32:58.9334353Z // begin inline asm 2026-02-21T09:32:58.9334434Z @%p111 mbarrier.inval.shared::cta.b64 [%r573]; 2026-02-21T09:32:58.9334485Z // end inline asm 2026-02-21T09:32:58.9334538Z // begin inline asm 
2026-02-21T09:32:58.9334618Z @%p111 mbarrier.inval.shared::cta.b64 [%r566]; 2026-02-21T09:32:58.9334703Z // end inline asm 2026-02-21T09:32:58.9334756Z bar.sync 0, 128; 2026-02-21T09:32:58.9334810Z // begin inline asm 2026-02-21T09:32:58.9334891Z @%p111 mbarrier.inval.shared::cta.b64 [%r567]; 2026-02-21T09:32:58.9334942Z // end inline asm 2026-02-21T09:32:58.9334995Z bar.sync 0, 128; 2026-02-21T09:32:58.9335054Z // begin inline asm 2026-02-21T09:32:58.9335129Z @%p111 mbarrier.inval.shared::cta.b64 [%r568]; 2026-02-21T09:32:58.9335184Z // end inline asm 2026-02-21T09:32:58.9335242Z // begin inline asm 2026-02-21T09:32:58.9335315Z @%p111 mbarrier.inval.shared::cta.b64 [%r563]; 2026-02-21T09:32:58.9335366Z // end inline asm 2026-02-21T09:32:58.9335419Z bar.sync 0, 128; 2026-02-21T09:32:58.9335480Z // begin inline asm 2026-02-21T09:32:58.9335553Z @%p111 mbarrier.inval.shared::cta.b64 [%r564]; 2026-02-21T09:32:58.9335604Z // end inline asm 2026-02-21T09:32:58.9335662Z bar.sync 0, 128; 2026-02-21T09:32:58.9335714Z // begin inline asm 2026-02-21T09:32:58.9335787Z @%p111 mbarrier.inval.shared::cta.b64 [%r565]; 2026-02-21T09:32:58.9335837Z // end inline asm 2026-02-21T09:32:58.9336008Z .loc 1 30 4 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:30:4 2026-02-21T09:32:58.9336061Z bar.sync 0, 128; 2026-02-21T09:32:58.9336119Z // begin inline asm 2026-02-21T09:32:58.9336246Z @%p61 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1174, 256; 2026-02-21T09:32:58.9336369Z // end inline asm 2026-02-21T09:32:58.9336470Z st.shared.v2.b32 [global_smem+90208], {50529027, 50529027}; 2026-02-21T09:32:58.9336532Z barrier.sync 1; 2026-02-21T09:32:58.9336611Z $L__BB0_30: // %common.ret 2026-02-21T09:32:58.9336765Z .loc 1 0 0 // cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py:0 2026-02-21T09:32:58.9336814Z ret; 2026-02-21T09:32:58.9336876Z $L__tmp1: 2026-02-21T09:32:58.9336930Z $L__func_end0: 2026-02-21T09:32:58.9337007Z // -- End function 2026-02-21T09:32:58.9337065Z } 
2026-02-21T09:32:58.9337263Z .file 1 "/tmp/torchinductor_root/f5/cf5bj2ybu4uiqkxva2jry4hoox2no6efwobkv6yt2gkkq54ey7p4.py" 2026-02-21T09:32:58.9337325Z .section .debug_abbrev 2026-02-21T09:32:58.9337386Z { 2026-02-21T09:32:58.9337473Z .b8 1 // Abbreviation Code 2026-02-21T09:32:58.9337560Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:32:58.9337639Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:32:58.9337720Z .b8 37 // DW_AT_producer 2026-02-21T09:32:58.9337791Z .b8 8 // DW_FORM_string 2026-02-21T09:32:58.9337862Z .b8 19 // DW_AT_language 2026-02-21T09:32:58.9337949Z .b8 5 // DW_FORM_data2 2026-02-21T09:32:58.9338026Z .b8 3 // DW_AT_name 2026-02-21T09:32:58.9338099Z .b8 8 // DW_FORM_string 2026-02-21T09:32:58.9338189Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:32:58.9338267Z .b8 6 // DW_FORM_data4 2026-02-21T09:32:58.9338342Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:32:58.9338471Z .b8 8 // DW_FORM_string 2026-02-21T09:32:58.9338550Z .b8 0 // EOM(1) 2026-02-21T09:32:58.9338617Z .b8 0 // EOM(2) 2026-02-21T09:32:58.9338680Z .b8 0 // EOM(3) 2026-02-21T09:32:58.9338735Z } 2026-02-21T09:32:58.9338793Z .section .debug_info 2026-02-21T09:32:58.9338840Z { 2026-02-21T09:32:58.9338917Z .b32 104 // Length of Unit 2026-02-21T09:32:58.9339007Z .b8 2 // DWARF version number 2026-02-21T09:32:58.9339058Z .b8 0 2026-02-21T09:32:58.9339168Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:32:58.9339261Z .b8 8 // Address Size (in bytes) 2026-02-21T09:32:58.9339357Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:32:58.9339435Z .b8 116 // DW_AT_producer 2026-02-21T09:32:58.9339495Z .b8 114 2026-02-21T09:32:58.9339548Z .b8 105 2026-02-21T09:32:58.9339597Z .b8 116 2026-02-21T09:32:58.9339648Z .b8 111 2026-02-21T09:32:58.9339706Z .b8 110 2026-02-21T09:32:58.9339756Z .b8 0 2026-02-21T09:32:58.9339826Z .b8 2 // DW_AT_language 2026-02-21T09:32:58.9339882Z .b8 0 2026-02-21T09:32:58.9339953Z .b8 99 // DW_AT_name 2026-02-21T09:32:58.9340004Z .b8 102 2026-02-21T09:32:58.9340052Z .b8 53 2026-02-21T09:32:58.9340110Z .b8 98 2026-02-21T09:32:58.9340158Z .b8 106 2026-02-21T09:32:58.9340209Z .b8 50 2026-02-21T09:32:58.9340264Z .b8 121 2026-02-21T09:32:58.9340313Z .b8 98 2026-02-21T09:32:58.9340361Z .b8 117 2026-02-21T09:32:58.9340409Z .b8 52 2026-02-21T09:32:58.9340464Z .b8 117 2026-02-21T09:32:58.9340512Z .b8 105 2026-02-21T09:32:58.9340560Z .b8 113 2026-02-21T09:32:58.9340615Z .b8 107 2026-02-21T09:32:58.9340662Z .b8 120 2026-02-21T09:32:58.9340708Z .b8 118 2026-02-21T09:32:58.9340756Z .b8 97 2026-02-21T09:32:58.9340812Z .b8 50 2026-02-21T09:32:58.9340861Z .b8 106 2026-02-21T09:32:58.9340931Z .b8 114 2026-02-21T09:32:58.9340981Z .b8 121 2026-02-21T09:32:58.9341056Z .b8 52 2026-02-21T09:32:58.9341104Z .b8 104 2026-02-21T09:32:58.9341151Z .b8 111 2026-02-21T09:32:58.9341204Z .b8 111 2026-02-21T09:32:58.9341251Z .b8 120 2026-02-21T09:32:58.9341298Z .b8 50 2026-02-21T09:32:58.9341344Z .b8 110 2026-02-21T09:32:58.9341399Z .b8 111 2026-02-21T09:32:58.9341446Z .b8 54 2026-02-21T09:32:58.9341494Z .b8 101 2026-02-21T09:32:58.9341548Z .b8 102 2026-02-21T09:32:58.9341595Z .b8 119 2026-02-21T09:32:58.9341643Z .b8 111 2026-02-21T09:32:58.9341690Z .b8 98 2026-02-21T09:32:58.9341744Z .b8 107 2026-02-21T09:32:58.9341791Z .b8 118 2026-02-21T09:32:58.9341837Z .b8 54 2026-02-21T09:32:58.9341885Z .b8 121 2026-02-21T09:32:58.9341940Z .b8 116 
2026-02-21T09:32:58.9341987Z .b8 50 2026-02-21T09:32:58.9342034Z .b8 103 2026-02-21T09:32:58.9342086Z .b8 107 2026-02-21T09:32:58.9342133Z .b8 107 2026-02-21T09:32:58.9342180Z .b8 113 2026-02-21T09:32:58.9342229Z .b8 53 2026-02-21T09:32:58.9342285Z .b8 52 2026-02-21T09:32:58.9342335Z .b8 101 2026-02-21T09:32:58.9342383Z .b8 121 2026-02-21T09:32:58.9342439Z .b8 55 2026-02-21T09:32:58.9342488Z .b8 112 2026-02-21T09:32:58.9342535Z .b8 52 2026-02-21T09:32:58.9342583Z .b8 46 2026-02-21T09:32:58.9342638Z .b8 112 2026-02-21T09:32:58.9342686Z .b8 121 2026-02-21T09:32:58.9342733Z .b8 0 2026-02-21T09:32:58.9342830Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:32:58.9342901Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:32:58.9342950Z .b8 116 2026-02-21T09:32:58.9342998Z .b8 109 2026-02-21T09:32:58.9343054Z .b8 112 2026-02-21T09:32:58.9343116Z .b8 47 2026-02-21T09:32:58.9343169Z .b8 116 2026-02-21T09:32:58.9343229Z .b8 111 2026-02-21T09:32:58.9343281Z .b8 114 2026-02-21T09:32:58.9343333Z .b8 99 2026-02-21T09:32:58.9343384Z .b8 104 2026-02-21T09:32:58.9343446Z .b8 105 2026-02-21T09:32:58.9343497Z .b8 110 2026-02-21T09:32:58.9343548Z .b8 100 2026-02-21T09:32:58.9343620Z .b8 117 2026-02-21T09:32:58.9343703Z .b8 99 2026-02-21T09:32:58.9343757Z .b8 116 2026-02-21T09:32:58.9343807Z .b8 111 2026-02-21T09:32:58.9343865Z .b8 114 2026-02-21T09:32:58.9343915Z .b8 95 2026-02-21T09:32:58.9343966Z .b8 114 2026-02-21T09:32:58.9344015Z .b8 111 2026-02-21T09:32:58.9344072Z .b8 111 2026-02-21T09:32:58.9344123Z .b8 116 2026-02-21T09:32:58.9344172Z .b8 47 2026-02-21T09:32:58.9344229Z .b8 102 2026-02-21T09:32:58.9344278Z .b8 53 2026-02-21T09:32:58.9344329Z .b8 0 2026-02-21T09:32:58.9344378Z } 2026-02-21T09:32:58.9344451Z .section .debug_macinfo { } 2026-02-21T09:32:58.9344455Z 2026-02-21T09:32:58.9344533Z ================================================================ 2026-02-21T09:32:58.9344637Z please share the reproducer above with Triton project. 
2026-02-21T09:33:01.3619589Z 2026-02-21T09:33:01.3624269Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 82/82 16.7 configs/s 2026-02-21T09:33:04.7430012Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 298.8 2026-02-21T09:33:04.7430689Z configs/s 2026-02-21T09:33:04.9572737Z [136s] Generation 6 complete: 2026-02-21T09:33:04.9575850Z error=9 2026-02-21T09:33:04.9580217Z ok=77 2026-02-21T09:33:04.9584441Z min=0.0430 2026-02-21T09:33:04.9586419Z mid=0.0593 2026-02-21T09:33:04.9590679Z max=11.0593 2026-02-21T09:33:04.9596460Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:33:04.9601713Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:33:04.9605854Z 'l2_groupings': [64], 2026-02-21T09:33:04.9610777Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:33:04.9615182Z 'loop_orders': [[0, 1]], 2026-02-21T09:33:04.9616682Z 'num_stages': 3, 2026-02-21T09:33:04.9616852Z 'num_warps': 4, 2026-02-21T09:33:04.9617006Z 'pid_type': 'flat', 2026-02-21T09:33:04.9617184Z 'range_flattens': [None, True], 2026-02-21T09:33:04.9617380Z 'range_multi_buffers': [None, True], 2026-02-21T09:33:04.9617602Z 'range_num_stages': [0, 0], 2026-02-21T09:33:04.9617792Z 'range_unroll_factors': [0, 0], 2026-02-21T09:33:04.9618256Z 'range_warp_specializes': [None, None]} 2026-02-21T09:33:04.9618565Z [136s] Fitting surrogate: 640 points, 640 targets 2026-02-21T09:33:05.9064625Z [137s] Generation 7 starting: 48 neighbors, 3 active search path(s) 2026-02-21T09:33:37.8748407Z [169s] Timeout after 30s compiling Config(block_sizes=[512, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:33:37.8767307Z Generation 7: precompiling 100% 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49/49 0.3 configs/s 2026-02-21T09:33:40.2047971Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 49/49 21.6 configs/s 2026-02-21T09:33:41.8611504Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 606.3 2026-02-21T09:33:41.8612345Z configs/s 2026-02-21T09:33:41.9772368Z [173s] Generation 7 complete: 2026-02-21T09:33:41.9774395Z error=10 2026-02-21T09:33:41.9774616Z timeout=1 2026-02-21T09:33:41.9774867Z ok=40 2026-02-21T09:33:41.9775058Z min=0.0429 2026-02-21T09:33:41.9775236Z mid=0.0614 2026-02-21T09:33:41.9775393Z max=5.6843 2026-02-21T09:33:41.9775601Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:33:41.9775873Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:33:41.9776128Z 'l2_groupings': [64], 2026-02-21T09:33:41.9776312Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:33:41.9776528Z 'loop_orders': [[0, 1]], 2026-02-21T09:33:41.9776701Z 'num_stages': 3, 2026-02-21T09:33:41.9776867Z 'num_warps': 4, 2026-02-21T09:33:41.9777026Z 'pid_type': 'flat', 2026-02-21T09:33:41.9777208Z 'range_flattens': [None, True], 2026-02-21T09:33:41.9777438Z 'range_multi_buffers': [None, True], 2026-02-21T09:33:41.9797667Z 'range_num_stages': [0, 0], 2026-02-21T09:33:41.9797916Z 'range_unroll_factors': [0, 0], 2026-02-21T09:33:41.9798182Z 'range_warp_specializes': [None, None]} 2026-02-21T09:33:41.9798447Z [173s] Fitting surrogate: 691 points, 691 targets 2026-02-21T09:33:42.6938373Z [174s] Generation 8 starting: 32 neighbors, 2 active search path(s) 2026-02-21T09:34:03.3493970Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33/33 0.4 configs/s 2026-02-21T09:34:05.6642889Z 2026-02-21T09:34:05.6644329Z 2026-02-21T09:34:05.6645071Z ================================================================ 2026-02-21T09:34:05.6645413Z Internal Triton PTX codegen error 2026-02-21T09:34:05.6645637Z `ptxas` stderr: 2026-02-21T09:34:05.6646143Z ptxas fatal : (C7602) Insufficient registers (32) to 
compile instruction at line 72 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:34:05.6646773Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:34:05.6646974Z 2026-02-21T09:34:05.6647518Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp7pzebf1f.ptx -o /tmp/tmp7pzebf1f.ptx.o 2026-02-21T09:34:05.6648113Z 2026-02-21T09:34:05.6648118Z 2026-02-21T09:34:05.6648275Z // 2026-02-21T09:34:05.6651527Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:34:05.6656872Z // 2026-02-21T09:34:05.6658193Z 2026-02-21T09:34:05.6658352Z .version 8.7 2026-02-21T09:34:05.6658551Z .target sm_100a 2026-02-21T09:34:05.6658708Z .address_size 64 2026-02-21T09:34:05.6658805Z 2026-02-21T09:34:05.6658971Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:34:05.6659272Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:34:05.6659517Z // @_helion_matmul 2026-02-21T09:34:05.6659741Z .visible .entry _helion_matmul( 2026-02-21T09:34:05.6660004Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:34:05.6660304Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:34:05.6660943Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:34:05.6661240Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:34:05.6661531Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:34:05.6661775Z ) 2026-02-21T09:34:05.6661912Z .reqntid 384 2026-02-21T09:34:05.6662068Z .maxnreg 32 2026-02-21T09:34:05.6662204Z { 2026-02-21T09:34:05.6662350Z .reg .pred %p<98>; 2026-02-21T09:34:05.6662522Z .reg .b32 %r<3364>; 2026-02-21T09:34:05.6662682Z .reg .b64 %rd<1299>; 2026-02-21T09:34:05.6662851Z $L__func_begin0: 2026-02-21T09:34:05.6662945Z 2026-02-21T09:34:05.6663005Z // %bb.0: 2026-02-21T09:34:05.6663298Z .loc 1 14 0 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:14 2026-02-21T09:34:05.6663628Z mov.u32 %r1, %tid.x; 2026-02-21T09:34:05.6663884Z shr.u32 %r2, %r1, 5; 2026-02-21T09:34:05.6664135Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:34:05.6664366Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:34:05.6664554Z @%p3 bra $L__BB0_18; 2026-02-21T09:34:05.6664805Z bra.uni $L__BB0_1; 2026-02-21T09:34:05.6664982Z $L__BB0_18: 2026-02-21T09:34:05.6665163Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T09:34:05.6665403Z setp.lt.u32 %p48, %r1, 32; 2026-02-21T09:34:05.6665595Z mov.b32 %r615, global_smem; 2026-02-21T09:34:05.6665787Z // begin inline asm 2026-02-21T09:34:05.6666073Z @%p48 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r615], 512; 2026-02-21T09:34:05.6666402Z // end inline asm 2026-02-21T09:34:05.6666586Z bar.sync 0, 128; 2026-02-21T09:34:05.6666769Z ld.shared.b32 %r3308, [global_smem]; 2026-02-21T09:34:05.6666986Z bar.sync 0, 128; 2026-02-21T09:34:05.6667135Z // begin inline asm 2026-02-21T09:34:05.6667374Z @%p48 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:34:05.6667638Z // end inline asm 2026-02-21T09:34:05.6667932Z .loc 1 20 46 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:46 2026-02-21T09:34:05.6668289Z mov.u32 %r120, %ctaid.x; 2026-02-21T09:34:05.6668975Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.6669353Z sub.s32 %r1164, 192, %r120; 2026-02-21T09:34:05.6669558Z mul.hi.s32 %r1165, %r1164, -580400985; 2026-02-21T09:34:05.6669785Z add.s32 %r1166, %r1165, %r1164; 2026-02-21T09:34:05.6669990Z shr.u32 %r1167, %r1166, 31; 2026-02-21T09:34:05.6670188Z shr.s32 %r1168, %r1166, 11; 2026-02-21T09:34:05.6670378Z add.s32 %r1169, %r1168, %r1167; 2026-02-21T09:34:05.6670588Z mul.lo.s32 %r1170, %r1169, 2368; 2026-02-21T09:34:05.6670805Z setp.ne.b32 %p85, %r1164, %r1170; 2026-02-21T09:34:05.6671018Z setp.lt.u32 %p86, %r120, 193; 2026-02-21T09:34:05.6671232Z and.pred 
%p87, %p86, %p85; 2026-02-21T09:34:05.6671438Z selp.b32 %r1171, 1, 0, %p87; 2026-02-21T09:34:05.6671648Z add.s32 %r1172, %r1169, %r1171; 2026-02-21T09:34:05.6671851Z shl.b32 %r188, %r1172, 5; 2026-02-21T09:34:05.6672198Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6672591Z shfl.sync.idx.b32 %r1173, %r2, 0, 31, -1; 2026-02-21T09:34:05.6672836Z shl.b32 %r1174, %r1173, 21; 2026-02-21T09:34:05.6673043Z and.b32 %r1175, %r1174, 6291456; 2026-02-21T09:34:05.6673257Z add.s32 %r616, %r1175, %r3308; 2026-02-21T09:34:05.6673473Z mov.pred %p50, -1; 2026-02-21T09:34:05.6673652Z mov.b32 %r3363, 0; 2026-02-21T09:34:05.6673835Z // begin inline asm 2026-02-21T09:34:05.6674356Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 0], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6674980Z // end inline asm 2026-02-21T09:34:05.6675155Z // begin inline asm 2026-02-21T09:34:05.6675667Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 16], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6676309Z // end inline asm 2026-02-21T09:34:05.6676474Z // begin inline asm 2026-02-21T09:34:05.6676980Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 32], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6677499Z // end inline asm 2026-02-21T09:34:05.6677662Z // begin inline asm 2026-02-21T09:34:05.6678164Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 48], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6678701Z // end inline asm 2026-02-21T09:34:05.6678873Z // begin inline asm 2026-02-21T09:34:05.6679433Z @%p50 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 64], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6680013Z // end inline asm 2026-02-21T09:34:05.6680187Z // begin inline asm 2026-02-21T09:34:05.6680662Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 80], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6681190Z // end inline asm 2026-02-21T09:34:05.6681348Z // begin inline asm 2026-02-21T09:34:05.6681831Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 96], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6682359Z // end inline asm 2026-02-21T09:34:05.6682523Z // begin inline asm 2026-02-21T09:34:05.6683022Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 112], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6683564Z // end inline asm 2026-02-21T09:34:05.6683743Z // begin inline asm 2026-02-21T09:34:05.6684218Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 128], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6684780Z // end inline asm 2026-02-21T09:34:05.6684959Z // begin inline asm 2026-02-21T09:34:05.6685442Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 144], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6685985Z // end inline asm 2026-02-21T09:34:05.6686155Z // begin inline asm 2026-02-21T09:34:05.6686661Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 160], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, 
%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6687193Z // end inline asm 2026-02-21T09:34:05.6687364Z // begin inline asm 2026-02-21T09:34:05.6687859Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 176], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6688333Z // end inline asm 2026-02-21T09:34:05.6688493Z // begin inline asm 2026-02-21T09:34:05.6688925Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 192], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6689407Z // end inline asm 2026-02-21T09:34:05.6689566Z // begin inline asm 2026-02-21T09:34:05.6689996Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 208], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6690490Z // end inline asm 2026-02-21T09:34:05.6690643Z // begin inline asm 2026-02-21T09:34:05.6691099Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 224], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6691642Z // end inline asm 2026-02-21T09:34:05.6691793Z // begin inline asm 2026-02-21T09:34:05.6692225Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 240], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6692729Z // end inline asm 2026-02-21T09:34:05.6692883Z // begin inline asm 2026-02-21T09:34:05.6693331Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 256], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6693814Z // end 
inline asm 2026-02-21T09:34:05.6693968Z // begin inline asm 2026-02-21T09:34:05.6694458Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 272], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6695014Z // end inline asm 2026-02-21T09:34:05.6695176Z // begin inline asm 2026-02-21T09:34:05.6695727Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 288], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6696261Z // end inline asm 2026-02-21T09:34:05.6696417Z // begin inline asm 2026-02-21T09:34:05.6696867Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 304], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6697335Z // end inline asm 2026-02-21T09:34:05.6697488Z // begin inline asm 2026-02-21T09:34:05.6697923Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 320], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6698405Z // end inline asm 2026-02-21T09:34:05.6698562Z // begin inline asm 2026-02-21T09:34:05.6698997Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 336], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6699476Z // end inline asm 2026-02-21T09:34:05.6699627Z // begin inline asm 2026-02-21T09:34:05.6700062Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 352], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6700534Z // end inline asm 2026-02-21T09:34:05.6700690Z // begin inline asm 2026-02-21T09:34:05.6701151Z @%p50 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 368], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6701666Z // end inline asm 2026-02-21T09:34:05.6701854Z // begin inline asm 2026-02-21T09:34:05.6702300Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 384], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6702781Z // end inline asm 2026-02-21T09:34:05.6702937Z // begin inline asm 2026-02-21T09:34:05.6703390Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 400], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6703891Z // end inline asm 2026-02-21T09:34:05.6704048Z // begin inline asm 2026-02-21T09:34:05.6704510Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 416], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6705069Z // end inline asm 2026-02-21T09:34:05.6705243Z // begin inline asm 2026-02-21T09:34:05.6705726Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 432], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6706293Z // end inline asm 2026-02-21T09:34:05.6706447Z // begin inline asm 2026-02-21T09:34:05.6706888Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 448], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6707391Z // end inline asm 2026-02-21T09:34:05.6707555Z // begin inline asm 2026-02-21T09:34:05.6708009Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 464], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, 
%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6708514Z // end inline asm 2026-02-21T09:34:05.6708670Z // begin inline asm 2026-02-21T09:34:05.6709210Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 480], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6709738Z // end inline asm 2026-02-21T09:34:05.6709904Z // begin inline asm 2026-02-21T09:34:05.6710375Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r616 + 496], {%r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363, %r3363}; 2026-02-21T09:34:05.6710877Z // end inline asm 2026-02-21T09:34:05.6711039Z // begin inline asm 2026-02-21T09:34:05.6711223Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:34:05.6711424Z // end inline asm 2026-02-21T09:34:05.6711578Z bar.sync 0, 128; 2026-02-21T09:34:05.6711898Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.6712264Z setp.eq.b32 %p94, %r1, 0; 2026-02-21T09:34:05.6712457Z add.s32 %r1160, %r615, 180224; 2026-02-21T09:34:05.6712653Z // begin inline asm 2026-02-21T09:34:05.6712853Z @%p94 mbarrier.init.shared::cta.b64 [%r1160], 1; 2026-02-21T09:34:05.6713086Z // end inline asm 2026-02-21T09:34:05.6713250Z add.s32 %r3304, %r615, 180240; 2026-02-21T09:34:05.6713444Z // begin inline asm 2026-02-21T09:34:05.6713639Z @%p94 mbarrier.init.shared::cta.b64 [%r3304], 1; 2026-02-21T09:34:05.6713871Z // end inline asm 2026-02-21T09:34:05.6714178Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6714525Z bar.sync 0, 128; 2026-02-21T09:34:05.6714745Z // begin inline asm 2026-02-21T09:34:05.6714955Z @%p94 mbarrier.arrive.shared::cta.b64 _, [%r3304]; 2026-02-21T09:34:05.6715202Z // end inline asm 2026-02-21T09:34:05.6715502Z .loc 1 20 130 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.6715913Z st.shared.v2.b32 [global_smem+180248], {0, 50397698}; 2026-02-21T09:34:05.6716176Z st.shared.b32 [global_smem], %r3308; 2026-02-21T09:34:05.6716394Z st.shared.b32 [global_smem+8], %r188; 2026-02-21T09:34:05.6716599Z barrier.sync 1; 2026-02-21T09:34:05.6716757Z barrier.sync 1; 2026-02-21T09:34:05.6716931Z setp.lt.s32 %p88, %r1172, 1; 2026-02-21T09:34:05.6717118Z @%p88 bra $L__BB0_25; 2026-02-21T09:34:05.6717312Z // %bb.19: // %.lr.ph4 2026-02-21T09:34:05.6717651Z .loc 1 0 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:0:130 2026-02-21T09:34:05.6718010Z ld.param.b64 %rd19, [_helion_matmul_param_2]; 2026-02-21T09:34:05.6718227Z and.b32 %r121, %r1, 96; 2026-02-21T09:34:05.6718403Z bfe.u32 %r122, %r1, 5, 2; 2026-02-21T09:34:05.6718583Z or.b32 %r123, %r122, 4; 2026-02-21T09:34:05.6718748Z or.b32 %r124, %r122, 8; 2026-02-21T09:34:05.6718918Z or.b32 %r125, %r122, 12; 2026-02-21T09:34:05.6719083Z or.b32 %r126, %r122, 16; 2026-02-21T09:34:05.6719252Z or.b32 %r127, %r122, 20; 2026-02-21T09:34:05.6719411Z or.b32 %r128, %r122, 24; 2026-02-21T09:34:05.6719579Z or.b32 %r129, %r122, 28; 2026-02-21T09:34:05.6719742Z or.b32 %r130, %r122, 32; 2026-02-21T09:34:05.6719946Z or.b32 %r131, %r122, 36; 2026-02-21T09:34:05.6720142Z or.b32 %r132, %r122, 40; 2026-02-21T09:34:05.6720301Z or.b32 %r133, %r122, 44; 2026-02-21T09:34:05.6720467Z or.b32 %r134, %r122, 48; 2026-02-21T09:34:05.6720626Z or.b32 %r135, %r122, 52; 2026-02-21T09:34:05.6720793Z or.b32 %r136, %r122, 56; 2026-02-21T09:34:05.6720954Z or.b32 %r137, %r122, 60; 2026-02-21T09:34:05.6721121Z or.b32 %r138, %r122, 64; 2026-02-21T09:34:05.6721281Z or.b32 %r139, %r122, 68; 2026-02-21T09:34:05.6721448Z or.b32 %r140, %r122, 72; 2026-02-21T09:34:05.6721607Z or.b32 %r141, %r122, 76; 2026-02-21T09:34:05.6721772Z or.b32 %r142, %r122, 80; 2026-02-21T09:34:05.6721938Z or.b32 %r143, %r122, 84; 2026-02-21T09:34:05.6722097Z or.b32 %r144, 
%r122, 88; 2026-02-21T09:34:05.6722261Z or.b32 %r145, %r122, 92; 2026-02-21T09:34:05.6722420Z or.b32 %r146, %r122, 96; 2026-02-21T09:34:05.6722620Z or.b32 %r147, %r122, 100; 2026-02-21T09:34:05.6722819Z or.b32 %r148, %r122, 104; 2026-02-21T09:34:05.6722997Z or.b32 %r149, %r122, 108; 2026-02-21T09:34:05.6723169Z or.b32 %r150, %r122, 112; 2026-02-21T09:34:05.6723346Z or.b32 %r151, %r122, 116; 2026-02-21T09:34:05.6723499Z or.b32 %r152, %r122, 120; 2026-02-21T09:34:05.6723658Z or.b32 %r153, %r122, 124; 2026-02-21T09:34:05.6723818Z or.b32 %r154, %r122, 128; 2026-02-21T09:34:05.6723972Z or.b32 %r155, %r122, 132; 2026-02-21T09:34:05.6724136Z or.b32 %r156, %r122, 136; 2026-02-21T09:34:05.6724290Z or.b32 %r157, %r122, 140; 2026-02-21T09:34:05.6724451Z or.b32 %r158, %r122, 144; 2026-02-21T09:34:05.6724604Z or.b32 %r159, %r122, 148; 2026-02-21T09:34:05.6724810Z or.b32 %r160, %r122, 152; 2026-02-21T09:34:05.6724989Z or.b32 %r161, %r122, 156; 2026-02-21T09:34:05.6725174Z or.b32 %r162, %r122, 160; 2026-02-21T09:34:05.6725350Z or.b32 %r163, %r122, 164; 2026-02-21T09:34:05.6725534Z or.b32 %r164, %r122, 168; 2026-02-21T09:34:05.6725730Z or.b32 %r165, %r122, 172; 2026-02-21T09:34:05.6725907Z or.b32 %r166, %r122, 176; 2026-02-21T09:34:05.6726093Z or.b32 %r167, %r122, 180; 2026-02-21T09:34:05.6726266Z or.b32 %r168, %r122, 184; 2026-02-21T09:34:05.6726457Z or.b32 %r169, %r122, 188; 2026-02-21T09:34:05.6726615Z or.b32 %r170, %r122, 192; 2026-02-21T09:34:05.6726786Z or.b32 %r171, %r122, 196; 2026-02-21T09:34:05.6726945Z or.b32 %r172, %r122, 200; 2026-02-21T09:34:05.6727114Z or.b32 %r173, %r122, 204; 2026-02-21T09:34:05.6727275Z or.b32 %r174, %r122, 208; 2026-02-21T09:34:05.6727445Z or.b32 %r175, %r122, 212; 2026-02-21T09:34:05.6727617Z or.b32 %r176, %r122, 216; 2026-02-21T09:34:05.6727778Z or.b32 %r177, %r122, 220; 2026-02-21T09:34:05.6727945Z or.b32 %r178, %r122, 224; 2026-02-21T09:34:05.6728104Z or.b32 %r179, %r122, 228; 2026-02-21T09:34:05.6728274Z or.b32 %r180, %r122, 232; 
2026-02-21T09:34:05.6728434Z or.b32 %r181, %r122, 236; 2026-02-21T09:34:05.6728604Z or.b32 %r182, %r122, 240; 2026-02-21T09:34:05.6728775Z or.b32 %r183, %r122, 244; 2026-02-21T09:34:05.6728957Z or.b32 %r184, %r122, 248; 2026-02-21T09:34:05.6729130Z or.b32 %r185, %r122, 252; 2026-02-21T09:34:05.6729314Z shl.b32 %r186, %r1, 3; 2026-02-21T09:34:05.6729501Z and.b32 %r187, %r186, 248; 2026-02-21T09:34:05.6729829Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.6730214Z add.s32 %r3360, %r120, -2368; 2026-02-21T09:34:05.6730404Z and.b32 %r1178, %r1, 7; 2026-02-21T09:34:05.6730590Z shl.b32 %r1179, %r1178, 11; 2026-02-21T09:34:05.6730761Z shl.b32 %r1180, %r1, 4; 2026-02-21T09:34:05.6730935Z and.b32 %r1181, %r1180, 2032; 2026-02-21T09:34:05.6731114Z or.b32 %r1182, %r1179, %r1181; 2026-02-21T09:34:05.6731309Z add.s32 %r1184, %r615, 163840; 2026-02-21T09:34:05.6731503Z add.s32 %r191, %r1184, %r1182; 2026-02-21T09:34:05.6731692Z xor.b32 %r1185, %r1182, 16; 2026-02-21T09:34:05.6731883Z add.s32 %r192, %r1184, %r1185; 2026-02-21T09:34:05.6732069Z xor.b32 %r1186, %r1182, 32; 2026-02-21T09:34:05.6732259Z add.s32 %r193, %r1184, %r1186; 2026-02-21T09:34:05.6732476Z xor.b32 %r1187, %r1182, 48; 2026-02-21T09:34:05.6732693Z add.s32 %r194, %r1184, %r1187; 2026-02-21T09:34:05.6732879Z xor.b32 %r1188, %r1182, 64; 2026-02-21T09:34:05.6733055Z add.s32 %r195, %r1184, %r1188; 2026-02-21T09:34:05.6733227Z xor.b32 %r1189, %r1182, 80; 2026-02-21T09:34:05.6733402Z add.s32 %r196, %r1184, %r1189; 2026-02-21T09:34:05.6733599Z xor.b32 %r1190, %r1182, 96; 2026-02-21T09:34:05.6733769Z add.s32 %r197, %r1184, %r1190; 2026-02-21T09:34:05.6733954Z xor.b32 %r1191, %r1182, 112; 2026-02-21T09:34:05.6734127Z add.s32 %r198, %r1184, %r1191; 2026-02-21T09:34:05.6734309Z shl.b32 %r1192, %r121, 6; 2026-02-21T09:34:05.6734481Z shl.b32 %r1193, %r1178, 4; 2026-02-21T09:34:05.6734714Z shr.u32 %r1194, %r121, 1; 2026-02-21T09:34:05.6734903Z bfe.s32 %r1195, %r1, 3, 1; 
2026-02-21T09:34:05.6735105Z and.b32 %r1196, %r1195, 8256; 2026-02-21T09:34:05.6735346Z and.b32 %r1197, %r186, 128; 2026-02-21T09:34:05.6735574Z or.b32 %r1198, %r1192, %r1193; 2026-02-21T09:34:05.6735780Z or.b32 %r1199, %r1196, %r1194; 2026-02-21T09:34:05.6735965Z xor.b32 %r1200, %r1199, %r1198; 2026-02-21T09:34:05.6736169Z add.s32 %r1201, %r1184, %r1197; 2026-02-21T09:34:05.6736354Z add.s32 %r1755, %r1201, %r1200; 2026-02-21T09:34:05.6736542Z add.s32 %r1760, %r1755, 256; 2026-02-21T09:34:05.6736717Z add.s32 %r1765, %r1755, 512; 2026-02-21T09:34:05.6736897Z add.s32 %r1770, %r1755, 768; 2026-02-21T09:34:05.6737069Z add.s32 %r1775, %r1755, 1024; 2026-02-21T09:34:05.6737249Z add.s32 %r1780, %r1755, 1280; 2026-02-21T09:34:05.6737427Z add.s32 %r1785, %r1755, 1536; 2026-02-21T09:34:05.6737599Z add.s32 %r1790, %r1755, 1792; 2026-02-21T09:34:05.6737780Z max.s32 %r3353, %r188, 1; 2026-02-21T09:34:05.6737949Z mov.b32 %r3358, -1; 2026-02-21T09:34:05.6738115Z mov.b32 %r1176, 0; 2026-02-21T09:34:05.6738271Z mov.b32 %r3363, %r1176; 2026-02-21T09:34:05.6738448Z mov.b32 %r3361, %r1176; 2026-02-21T09:34:05.6738614Z mov.b32 %r3362, %r1176; 2026-02-21T09:34:05.6738786Z bra.uni $L__BB0_20; 2026-02-21T09:34:05.6739005Z $L__BB0_23: // in Loop: Header=BB0_20 Depth=1 2026-02-21T09:34:05.6739383Z .loc 1 32 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:32:32 2026-02-21T09:34:05.6739724Z or.b32 %r2328, %r3362, %r187; 2026-02-21T09:34:05.6740021Z .loc 1 34 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:34:32 2026-02-21T09:34:05.6740352Z add.s32 %r2329, %r3361, %r122; 2026-02-21T09:34:05.6740531Z add.s32 %r2330, %r3361, %r123; 2026-02-21T09:34:05.6740713Z add.s32 %r2331, %r3361, %r124; 2026-02-21T09:34:05.6740886Z add.s32 %r2332, %r3361, %r125; 2026-02-21T09:34:05.6741067Z add.s32 %r2333, %r3361, %r126; 2026-02-21T09:34:05.6741245Z add.s32 %r2334, %r3361, %r127; 2026-02-21T09:34:05.6741418Z add.s32 %r2335, %r3361, %r128; 2026-02-21T09:34:05.6741600Z add.s32 
%r2336, %r3361, %r129; 2026-02-21T09:34:05.6741773Z add.s32 %r2337, %r3361, %r130; 2026-02-21T09:34:05.6741957Z add.s32 %r2338, %r3361, %r131; 2026-02-21T09:34:05.6742132Z add.s32 %r2339, %r3361, %r132; 2026-02-21T09:34:05.6742312Z add.s32 %r2340, %r3361, %r133; 2026-02-21T09:34:05.6742482Z add.s32 %r2341, %r3361, %r134; 2026-02-21T09:34:05.6742663Z add.s32 %r2342, %r3361, %r135; 2026-02-21T09:34:05.6742836Z add.s32 %r2343, %r3361, %r136; 2026-02-21T09:34:05.6743017Z add.s32 %r2344, %r3361, %r137; 2026-02-21T09:34:05.6743198Z add.s32 %r2345, %r3361, %r138; 2026-02-21T09:34:05.6743371Z add.s32 %r2346, %r3361, %r139; 2026-02-21T09:34:05.6743553Z add.s32 %r2347, %r3361, %r140; 2026-02-21T09:34:05.6743726Z add.s32 %r2348, %r3361, %r141; 2026-02-21T09:34:05.6743906Z add.s32 %r2349, %r3361, %r142; 2026-02-21T09:34:05.6744079Z add.s32 %r2350, %r3361, %r143; 2026-02-21T09:34:05.6744260Z add.s32 %r2351, %r3361, %r144; 2026-02-21T09:34:05.6744431Z add.s32 %r2352, %r3361, %r145; 2026-02-21T09:34:05.6744611Z add.s32 %r2353, %r3361, %r146; 2026-02-21T09:34:05.6744842Z add.s32 %r2354, %r3361, %r147; 2026-02-21T09:34:05.6745055Z add.s32 %r2355, %r3361, %r148; 2026-02-21T09:34:05.6758514Z add.s32 %r2356, %r3361, %r149; 2026-02-21T09:34:05.6758828Z add.s32 %r2357, %r3361, %r150; 2026-02-21T09:34:05.6759045Z add.s32 %r2358, %r3361, %r151; 2026-02-21T09:34:05.6759234Z add.s32 %r2359, %r3361, %r152; 2026-02-21T09:34:05.6759433Z add.s32 %r2360, %r3361, %r153; 2026-02-21T09:34:05.6759629Z add.s32 %r2361, %r3361, %r154; 2026-02-21T09:34:05.6759814Z add.s32 %r2362, %r3361, %r155; 2026-02-21T09:34:05.6760014Z add.s32 %r2363, %r3361, %r156; 2026-02-21T09:34:05.6760205Z add.s32 %r2364, %r3361, %r157; 2026-02-21T09:34:05.6760398Z add.s32 %r2365, %r3361, %r158; 2026-02-21T09:34:05.6760584Z add.s32 %r2366, %r3361, %r159; 2026-02-21T09:34:05.6760779Z add.s32 %r2367, %r3361, %r160; 2026-02-21T09:34:05.6760965Z add.s32 %r2368, %r3361, %r161; 2026-02-21T09:34:05.6761280Z add.s32 %r2369, %r3361, 
%r162; 2026-02-21T09:34:05.6761523Z add.s32 %r2370, %r3361, %r163; 2026-02-21T09:34:05.6761729Z add.s32 %r2371, %r3361, %r164; 2026-02-21T09:34:05.6761930Z add.s32 %r2372, %r3361, %r165; 2026-02-21T09:34:05.6762118Z add.s32 %r2373, %r3361, %r166; 2026-02-21T09:34:05.6762317Z add.s32 %r2374, %r3361, %r167; 2026-02-21T09:34:05.6762504Z add.s32 %r2375, %r3361, %r168; 2026-02-21T09:34:05.6762701Z add.s32 %r2376, %r3361, %r169; 2026-02-21T09:34:05.6762888Z add.s32 %r2377, %r3361, %r170; 2026-02-21T09:34:05.6763081Z add.s32 %r2378, %r3361, %r171; 2026-02-21T09:34:05.6763263Z add.s32 %r2379, %r3361, %r172; 2026-02-21T09:34:05.6763454Z add.s32 %r2380, %r3361, %r173; 2026-02-21T09:34:05.6763652Z add.s32 %r2381, %r3361, %r174; 2026-02-21T09:34:05.6763838Z add.s32 %r2382, %r3361, %r175; 2026-02-21T09:34:05.6764029Z add.s32 %r2383, %r3361, %r176; 2026-02-21T09:34:05.6764214Z add.s32 %r2384, %r3361, %r177; 2026-02-21T09:34:05.6764408Z add.s32 %r2385, %r3361, %r178; 2026-02-21T09:34:05.6764595Z add.s32 %r2386, %r3361, %r179; 2026-02-21T09:34:05.6764834Z add.s32 %r2387, %r3361, %r180; 2026-02-21T09:34:05.6765025Z add.s32 %r2388, %r3361, %r181; 2026-02-21T09:34:05.6765227Z add.s32 %r2389, %r3361, %r182; 2026-02-21T09:34:05.6765413Z add.s32 %r2390, %r3361, %r183; 2026-02-21T09:34:05.6765609Z add.s32 %r2391, %r3361, %r184; 2026-02-21T09:34:05.6765806Z add.s32 %r2392, %r3361, %r185; 2026-02-21T09:34:05.6766174Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6766569Z bar.sync 0, 128; 2026-02-21T09:34:05.6766762Z // begin inline asm 2026-02-21T09:34:05.6766947Z 2026-02-21T09:34:05.6767090Z { 2026-02-21T09:34:05.6767255Z .reg .pred complete; 2026-02-21T09:34:05.6767435Z waitLoop: 2026-02-21T09:34:05.6767681Z mbarrier.try_wait.parity.shared.b64 complete, [%r1160], %r3363; 2026-02-21T09:34:05.6767976Z @!complete bra.uni waitLoop; 2026-02-21T09:34:05.6768172Z } 2026-02-21T09:34:05.6768254Z 2026-02-21T09:34:05.6768334Z // end inline asm 
2026-02-21T09:34:05.6768501Z // begin inline asm 2026-02-21T09:34:05.6768988Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1206, %r1207, %r1208, %r1209, %r1210, %r1211, %r1212, %r1213, %r1214, %r1215, %r1216, %r1217, %r1218, %r1219, %r1220, %r1221}, [%r616 + 0]; 2026-02-21T09:34:05.6769494Z // end inline asm 2026-02-21T09:34:05.6769675Z cvt.u64.u32 %rd275, %r1210; 2026-02-21T09:34:05.6769868Z cvt.u64.u32 %rd276, %r1211; 2026-02-21T09:34:05.6770065Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:34:05.6770263Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:34:05.6770458Z cvt.u64.u32 %rd279, %r1212; 2026-02-21T09:34:05.6770651Z cvt.u64.u32 %rd280, %r1213; 2026-02-21T09:34:05.6770837Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:34:05.6771040Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:34:05.6771236Z cvt.u64.u32 %rd283, %r1218; 2026-02-21T09:34:05.6771437Z cvt.u64.u32 %rd284, %r1219; 2026-02-21T09:34:05.6771625Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:34:05.6771829Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:34:05.6772025Z cvt.u64.u32 %rd287, %r1220; 2026-02-21T09:34:05.6772275Z cvt.u64.u32 %rd288, %r1221; 2026-02-21T09:34:05.6772531Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:34:05.6772716Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:34:05.6772913Z // begin inline asm 2026-02-21T09:34:05.6773381Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1223, %r1224, %r1225, %r1226, %r1227, %r1228, %r1229, %r1230, %r1231, %r1232, %r1233, %r1234, %r1235, %r1236, %r1237, %r1238}, [%r616 + 16]; 2026-02-21T09:34:05.6773885Z // end inline asm 2026-02-21T09:34:05.6774054Z cvt.u64.u32 %rd291, %r1227; 2026-02-21T09:34:05.6774248Z cvt.u64.u32 %rd292, %r1228; 2026-02-21T09:34:05.6774440Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:34:05.6774625Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:34:05.6774856Z cvt.u64.u32 %rd295, %r1229; 2026-02-21T09:34:05.6775039Z cvt.u64.u32 %rd296, %r1230; 2026-02-21T09:34:05.6775233Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:34:05.6775450Z or.b64 %rd298, %rd295, 
%rd297; 2026-02-21T09:34:05.6775688Z cvt.u64.u32 %rd299, %r1235; 2026-02-21T09:34:05.6775880Z cvt.u64.u32 %rd300, %r1236; 2026-02-21T09:34:05.6776079Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:34:05.6776279Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:34:05.6776474Z cvt.u64.u32 %rd303, %r1237; 2026-02-21T09:34:05.6776664Z cvt.u64.u32 %rd304, %r1238; 2026-02-21T09:34:05.6776850Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:34:05.6777043Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:34:05.6777227Z // begin inline asm 2026-02-21T09:34:05.6777708Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1240, %r1241, %r1242, %r1243, %r1244, %r1245, %r1246, %r1247, %r1248, %r1249, %r1250, %r1251, %r1252, %r1253, %r1254, %r1255}, [%r616 + 32]; 2026-02-21T09:34:05.6778209Z // end inline asm 2026-02-21T09:34:05.6778384Z cvt.u64.u32 %rd307, %r1244; 2026-02-21T09:34:05.6778569Z cvt.u64.u32 %rd308, %r1245; 2026-02-21T09:34:05.6778764Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:34:05.6778961Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:34:05.6779152Z cvt.u64.u32 %rd311, %r1246; 2026-02-21T09:34:05.6779346Z cvt.u64.u32 %rd312, %r1247; 2026-02-21T09:34:05.6779529Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:34:05.6779718Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:34:05.6779904Z cvt.u64.u32 %rd315, %r1252; 2026-02-21T09:34:05.6780094Z cvt.u64.u32 %rd316, %r1253; 2026-02-21T09:34:05.6780275Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:34:05.6780468Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:34:05.6780666Z cvt.u64.u32 %rd319, %r1254; 2026-02-21T09:34:05.6780845Z cvt.u64.u32 %rd320, %r1255; 2026-02-21T09:34:05.6781033Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:34:05.6781217Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:34:05.6781411Z // begin inline asm 2026-02-21T09:34:05.6781872Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1257, %r1258, %r1259, %r1260, %r1261, %r1262, %r1263, %r1264, %r1265, %r1266, %r1267, %r1268, %r1269, %r1270, %r1271, %r1272}, [%r616 + 48]; 2026-02-21T09:34:05.6782386Z // end 
inline asm 2026-02-21T09:34:05.6782552Z cvt.u64.u32 %rd323, %r1261; 2026-02-21T09:34:05.6782747Z cvt.u64.u32 %rd324, %r1262; 2026-02-21T09:34:05.6782941Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:34:05.6783124Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:34:05.6783323Z cvt.u64.u32 %rd327, %r1263; 2026-02-21T09:34:05.6783505Z cvt.u64.u32 %rd328, %r1264; 2026-02-21T09:34:05.6783695Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:34:05.6783877Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:34:05.6784073Z cvt.u64.u32 %rd331, %r1269; 2026-02-21T09:34:05.6784254Z cvt.u64.u32 %rd332, %r1270; 2026-02-21T09:34:05.6784444Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:34:05.6784624Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:34:05.6784851Z cvt.u64.u32 %rd335, %r1271; 2026-02-21T09:34:05.6785040Z cvt.u64.u32 %rd336, %r1272; 2026-02-21T09:34:05.6785219Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:34:05.6785413Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:34:05.6785601Z // begin inline asm 2026-02-21T09:34:05.6786087Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1274, %r1275, %r1276, %r1277, %r1278, %r1279, %r1280, %r1281, %r1282, %r1283, %r1284, %r1285, %r1286, %r1287, %r1288, %r1289}, [%r616 + 64]; 2026-02-21T09:34:05.6786669Z // end inline asm 2026-02-21T09:34:05.6786848Z cvt.u64.u32 %rd339, %r1278; 2026-02-21T09:34:05.6787043Z cvt.u64.u32 %rd340, %r1279; 2026-02-21T09:34:05.6787233Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:34:05.6787435Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:34:05.6787629Z cvt.u64.u32 %rd343, %r1280; 2026-02-21T09:34:05.6787827Z cvt.u64.u32 %rd344, %r1281; 2026-02-21T09:34:05.6788014Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:34:05.6788215Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:34:05.6788409Z cvt.u64.u32 %rd347, %r1286; 2026-02-21T09:34:05.6788611Z cvt.u64.u32 %rd348, %r1287; 2026-02-21T09:34:05.6788803Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:34:05.6789004Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:34:05.6789239Z cvt.u64.u32 %rd351, %r1288; 
2026-02-21T09:34:05.6789464Z cvt.u64.u32 %rd352, %r1289; 2026-02-21T09:34:05.6789668Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:34:05.6789863Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:34:05.6790065Z // begin inline asm 2026-02-21T09:34:05.6790532Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1291, %r1292, %r1293, %r1294, %r1295, %r1296, %r1297, %r1298, %r1299, %r1300, %r1301, %r1302, %r1303, %r1304, %r1305, %r1306}, [%r616 + 80]; 2026-02-21T09:34:05.6791073Z // end inline asm 2026-02-21T09:34:05.6791241Z cvt.u64.u32 %rd355, %r1295; 2026-02-21T09:34:05.6791439Z cvt.u64.u32 %rd356, %r1296; 2026-02-21T09:34:05.6791637Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:34:05.6791827Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:34:05.6792033Z cvt.u64.u32 %rd359, %r1297; 2026-02-21T09:34:05.6792223Z cvt.u64.u32 %rd360, %r1298; 2026-02-21T09:34:05.6792421Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:34:05.6792611Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:34:05.6792815Z cvt.u64.u32 %rd363, %r1303; 2026-02-21T09:34:05.6793006Z cvt.u64.u32 %rd364, %r1304; 2026-02-21T09:34:05.6793211Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:34:05.6793413Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:34:05.6793607Z cvt.u64.u32 %rd367, %r1305; 2026-02-21T09:34:05.6793807Z cvt.u64.u32 %rd368, %r1306; 2026-02-21T09:34:05.6793996Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:34:05.6794196Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:34:05.6794386Z // begin inline asm 2026-02-21T09:34:05.6794905Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1308, %r1309, %r1310, %r1311, %r1312, %r1313, %r1314, %r1315, %r1316, %r1317, %r1318, %r1319, %r1320, %r1321, %r1322, %r1323}, [%r616 + 96]; 2026-02-21T09:34:05.6795429Z // end inline asm 2026-02-21T09:34:05.6795605Z cvt.u64.u32 %rd371, %r1312; 2026-02-21T09:34:05.6795817Z cvt.u64.u32 %rd372, %r1313; 2026-02-21T09:34:05.6796000Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:34:05.6796194Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:34:05.6796394Z cvt.u64.u32 %rd375, 
%r1314; 2026-02-21T09:34:05.6796583Z cvt.u64.u32 %rd376, %r1315; 2026-02-21T09:34:05.6796785Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:34:05.6796975Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:34:05.6797180Z cvt.u64.u32 %rd379, %r1320; 2026-02-21T09:34:05.6797369Z cvt.u64.u32 %rd380, %r1321; 2026-02-21T09:34:05.6797569Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:34:05.6797763Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:34:05.6797967Z cvt.u64.u32 %rd383, %r1322; 2026-02-21T09:34:05.6798150Z cvt.u64.u32 %rd384, %r1323; 2026-02-21T09:34:05.6798345Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:34:05.6798535Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:34:05.6798722Z // begin inline asm 2026-02-21T09:34:05.6799199Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1325, %r1326, %r1327, %r1328, %r1329, %r1330, %r1331, %r1332, %r1333, %r1334, %r1335, %r1336, %r1337, %r1338, %r1339, %r1340}, [%r616 + 112]; 2026-02-21T09:34:05.6799700Z // end inline asm 2026-02-21T09:34:05.6799877Z cvt.u64.u32 %rd387, %r1329; 2026-02-21T09:34:05.6800064Z cvt.u64.u32 %rd388, %r1330; 2026-02-21T09:34:05.6800321Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:34:05.6800516Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:34:05.6800697Z cvt.u64.u32 %rd391, %r1331; 2026-02-21T09:34:05.6800873Z cvt.u64.u32 %rd392, %r1332; 2026-02-21T09:34:05.6801042Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:34:05.6801224Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:34:05.6801396Z cvt.u64.u32 %rd395, %r1337; 2026-02-21T09:34:05.6801572Z cvt.u64.u32 %rd396, %r1338; 2026-02-21T09:34:05.6801743Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:34:05.6801922Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:34:05.6802097Z cvt.u64.u32 %rd399, %r1339; 2026-02-21T09:34:05.6802274Z cvt.u64.u32 %rd400, %r1340; 2026-02-21T09:34:05.6802452Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:34:05.6802620Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:34:05.6802809Z // begin inline asm 2026-02-21T09:34:05.6803323Z tcgen05.ld.sync.aligned.32x32b.x16.b32 
{%r1342, %r1343, %r1344, %r1345, %r1346, %r1347, %r1348, %r1349, %r1350, %r1351, %r1352, %r1353, %r1354, %r1355, %r1356, %r1357}, [%r616 + 128]; 2026-02-21T09:34:05.6803823Z // end inline asm 2026-02-21T09:34:05.6803984Z cvt.u64.u32 %rd403, %r1346; 2026-02-21T09:34:05.6804155Z cvt.u64.u32 %rd404, %r1347; 2026-02-21T09:34:05.6804331Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:34:05.6804501Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:34:05.6804758Z cvt.u64.u32 %rd407, %r1348; 2026-02-21T09:34:05.6804950Z cvt.u64.u32 %rd408, %r1349; 2026-02-21T09:34:05.6805151Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:34:05.6805344Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:34:05.6805551Z cvt.u64.u32 %rd411, %r1354; 2026-02-21T09:34:05.6805755Z cvt.u64.u32 %rd412, %r1355; 2026-02-21T09:34:05.6805940Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:34:05.6806135Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:34:05.6806324Z cvt.u64.u32 %rd415, %r1356; 2026-02-21T09:34:05.6806518Z cvt.u64.u32 %rd416, %r1357; 2026-02-21T09:34:05.6806691Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:34:05.6806874Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:34:05.6807047Z // begin inline asm 2026-02-21T09:34:05.6807484Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1359, %r1360, %r1361, %r1362, %r1363, %r1364, %r1365, %r1366, %r1367, %r1368, %r1369, %r1370, %r1371, %r1372, %r1373, %r1374}, [%r616 + 144]; 2026-02-21T09:34:05.6807952Z // end inline asm 2026-02-21T09:34:05.6808103Z cvt.u64.u32 %rd419, %r1363; 2026-02-21T09:34:05.6808278Z cvt.u64.u32 %rd420, %r1364; 2026-02-21T09:34:05.6808445Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:34:05.6808623Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:34:05.6808796Z cvt.u64.u32 %rd423, %r1365; 2026-02-21T09:34:05.6808971Z cvt.u64.u32 %rd424, %r1366; 2026-02-21T09:34:05.6809138Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:34:05.6809312Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:34:05.6809491Z cvt.u64.u32 %rd427, %r1371; 2026-02-21T09:34:05.6809660Z cvt.u64.u32 %rd428, 
%r1372; 2026-02-21T09:34:05.6809840Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:34:05.6810016Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:34:05.6810197Z cvt.u64.u32 %rd431, %r1373; 2026-02-21T09:34:05.6810365Z cvt.u64.u32 %rd432, %r1374; 2026-02-21T09:34:05.6810540Z shl.b64 %rd433, %rd432, 32; 2026-02-21T09:34:05.6810710Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:34:05.6810891Z // begin inline asm 2026-02-21T09:34:05.6811312Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1376, %r1377, %r1378, %r1379, %r1380, %r1381, %r1382, %r1383, %r1384, %r1385, %r1386, %r1387, %r1388, %r1389, %r1390, %r1391}, [%r616 + 160]; 2026-02-21T09:34:05.6811787Z // end inline asm 2026-02-21T09:34:05.6811948Z cvt.u64.u32 %rd435, %r1380; 2026-02-21T09:34:05.6812118Z cvt.u64.u32 %rd436, %r1381; 2026-02-21T09:34:05.6812294Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:34:05.6812464Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:34:05.6812647Z cvt.u64.u32 %rd439, %r1382; 2026-02-21T09:34:05.6812818Z cvt.u64.u32 %rd440, %r1383; 2026-02-21T09:34:05.6813053Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:34:05.6813257Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:34:05.6813444Z cvt.u64.u32 %rd443, %r1388; 2026-02-21T09:34:05.6813621Z cvt.u64.u32 %rd444, %r1389; 2026-02-21T09:34:05.6813791Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:34:05.6813970Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:34:05.6814143Z cvt.u64.u32 %rd447, %r1390; 2026-02-21T09:34:05.6814317Z cvt.u64.u32 %rd448, %r1391; 2026-02-21T09:34:05.6814484Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:34:05.6814707Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:34:05.6814904Z // begin inline asm 2026-02-21T09:34:05.6815386Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1393, %r1394, %r1395, %r1396, %r1397, %r1398, %r1399, %r1400, %r1401, %r1402, %r1403, %r1404, %r1405, %r1406, %r1407, %r1408}, [%r616 + 176]; 2026-02-21T09:34:05.6815934Z // end inline asm 2026-02-21T09:34:05.6816127Z cvt.u64.u32 %rd451, %r1397; 2026-02-21T09:34:05.6816352Z cvt.u64.u32 
%rd452, %r1398; 2026-02-21T09:34:05.6816544Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:34:05.6816722Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T09:34:05.6816897Z cvt.u64.u32 %rd455, %r1399; 2026-02-21T09:34:05.6817077Z cvt.u64.u32 %rd456, %r1400; 2026-02-21T09:34:05.6817255Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:34:05.6817448Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:34:05.6817649Z cvt.u64.u32 %rd459, %r1405; 2026-02-21T09:34:05.6817831Z cvt.u64.u32 %rd460, %r1406; 2026-02-21T09:34:05.6818021Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:34:05.6818205Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:34:05.6818400Z cvt.u64.u32 %rd463, %r1407; 2026-02-21T09:34:05.6818583Z cvt.u64.u32 %rd464, %r1408; 2026-02-21T09:34:05.6818769Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:34:05.6818952Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:34:05.6819144Z // begin inline asm 2026-02-21T09:34:05.6819617Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1410, %r1411, %r1412, %r1413, %r1414, %r1415, %r1416, %r1417, %r1418, %r1419, %r1420, %r1421, %r1422, %r1423, %r1424, %r1425}, [%r616 + 192]; 2026-02-21T09:34:05.6820134Z // end inline asm 2026-02-21T09:34:05.6820305Z cvt.u64.u32 %rd467, %r1414; 2026-02-21T09:34:05.6820485Z cvt.u64.u32 %rd468, %r1415; 2026-02-21T09:34:05.6820673Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:34:05.6820859Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:34:05.6821054Z cvt.u64.u32 %rd471, %r1416; 2026-02-21T09:34:05.6821239Z cvt.u64.u32 %rd472, %r1417; 2026-02-21T09:34:05.6821427Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:34:05.6821619Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:34:05.6821815Z cvt.u64.u32 %rd475, %r1422; 2026-02-21T09:34:05.6822004Z cvt.u64.u32 %rd476, %r1423; 2026-02-21T09:34:05.6822186Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:34:05.6822381Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:34:05.6822567Z cvt.u64.u32 %rd479, %r1424; 2026-02-21T09:34:05.6822758Z cvt.u64.u32 %rd480, %r1425; 2026-02-21T09:34:05.6822943Z shl.b64 %rd481, %rd480, 32; 
2026-02-21T09:34:05.6823140Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T09:34:05.6823342Z // begin inline asm 2026-02-21T09:34:05.6823776Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1427, %r1428, %r1429, %r1430, %r1431, %r1432, %r1433, %r1434, %r1435, %r1436, %r1437, %r1438, %r1439, %r1440, %r1441, %r1442}, [%r616 + 208]; 2026-02-21T09:34:05.6824249Z // end inline asm 2026-02-21T09:34:05.6824403Z cvt.u64.u32 %rd483, %r1431; 2026-02-21T09:34:05.6824587Z cvt.u64.u32 %rd484, %r1432; 2026-02-21T09:34:05.6824787Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:34:05.6824968Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:34:05.6825141Z cvt.u64.u32 %rd487, %r1433; 2026-02-21T09:34:05.6825320Z cvt.u64.u32 %rd488, %r1434; 2026-02-21T09:34:05.6825498Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:34:05.6825675Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:34:05.6825872Z cvt.u64.u32 %rd491, %r1439; 2026-02-21T09:34:05.6826056Z cvt.u64.u32 %rd492, %r1440; 2026-02-21T09:34:05.6826249Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:34:05.6826472Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:34:05.6826697Z cvt.u64.u32 %rd495, %r1441; 2026-02-21T09:34:05.6826882Z cvt.u64.u32 %rd496, %r1442; 2026-02-21T09:34:05.6827073Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:34:05.6827259Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:34:05.6827455Z // begin inline asm 2026-02-21T09:34:05.6827927Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1444, %r1445, %r1446, %r1447, %r1448, %r1449, %r1450, %r1451, %r1452, %r1453, %r1454, %r1455, %r1456, %r1457, %r1458, %r1459}, [%r616 + 224]; 2026-02-21T09:34:05.6828455Z // end inline asm 2026-02-21T09:34:05.6828627Z cvt.u64.u32 %rd499, %r1448; 2026-02-21T09:34:05.6828813Z cvt.u64.u32 %rd500, %r1449; 2026-02-21T09:34:05.6829004Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:34:05.6829193Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:34:05.6829393Z cvt.u64.u32 %rd503, %r1450; 2026-02-21T09:34:05.6829606Z cvt.u64.u32 %rd504, %r1451; 2026-02-21T09:34:05.6829829Z shl.b64 %rd505, %rd504, 
32; 2026-02-21T09:34:05.6830026Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:34:05.6830217Z cvt.u64.u32 %rd507, %r1456; 2026-02-21T09:34:05.6830407Z cvt.u64.u32 %rd508, %r1457; 2026-02-21T09:34:05.6830588Z shl.b64 %rd509, %rd508, 32; 2026-02-21T09:34:05.6830776Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:34:05.6830962Z cvt.u64.u32 %rd511, %r1458; 2026-02-21T09:34:05.6831151Z cvt.u64.u32 %rd512, %r1459; 2026-02-21T09:34:05.6831332Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:34:05.6831524Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:34:05.6831717Z // begin inline asm 2026-02-21T09:34:05.6832177Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1461, %r1462, %r1463, %r1464, %r1465, %r1466, %r1467, %r1468, %r1469, %r1470, %r1471, %r1472, %r1473, %r1474, %r1475, %r1476}, [%r616 + 240]; 2026-02-21T09:34:05.6832684Z // end inline asm 2026-02-21T09:34:05.6832849Z cvt.u64.u32 %rd515, %r1465; 2026-02-21T09:34:05.6833039Z cvt.u64.u32 %rd516, %r1466; 2026-02-21T09:34:05.6833225Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:34:05.6833419Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T09:34:05.6833614Z cvt.u64.u32 %rd519, %r1467; 2026-02-21T09:34:05.6833798Z cvt.u64.u32 %rd520, %r1468; 2026-02-21T09:34:05.6833983Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:34:05.6834160Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:34:05.6834350Z cvt.u64.u32 %rd523, %r1473; 2026-02-21T09:34:05.6834527Z cvt.u64.u32 %rd524, %r1474; 2026-02-21T09:34:05.6834755Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:34:05.6834949Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:34:05.6835155Z cvt.u64.u32 %rd527, %r1475; 2026-02-21T09:34:05.6835350Z cvt.u64.u32 %rd528, %r1476; 2026-02-21T09:34:05.6835550Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:34:05.6835741Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:34:05.6835944Z // begin inline asm 2026-02-21T09:34:05.6836450Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1478, %r1479, %r1480, %r1481, %r1482, %r1483, %r1484, %r1485, %r1486, %r1487, %r1488, %r1489, %r1490, %r1491, %r1492, 
%r1493}, [%r616 + 256]; 2026-02-21T09:34:05.6836948Z // end inline asm 2026-02-21T09:34:05.6837106Z cvt.u64.u32 %rd531, %r1482; 2026-02-21T09:34:05.6837275Z cvt.u64.u32 %rd532, %r1483; 2026-02-21T09:34:05.6837452Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:34:05.6837621Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:34:05.6837805Z cvt.u64.u32 %rd535, %r1484; 2026-02-21T09:34:05.6837981Z cvt.u64.u32 %rd536, %r1485; 2026-02-21T09:34:05.6838149Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:34:05.6838324Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:34:05.6838500Z cvt.u64.u32 %rd539, %r1490; 2026-02-21T09:34:05.6838678Z cvt.u64.u32 %rd540, %r1491; 2026-02-21T09:34:05.6838847Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:34:05.6839022Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:34:05.6839197Z cvt.u64.u32 %rd543, %r1492; 2026-02-21T09:34:05.6839373Z cvt.u64.u32 %rd544, %r1493; 2026-02-21T09:34:05.6839543Z shl.b64 %rd545, %rd544, 32; 2026-02-21T09:34:05.6839722Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:34:05.6839936Z // begin inline asm 2026-02-21T09:34:05.6840408Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1495, %r1496, %r1497, %r1498, %r1499, %r1500, %r1501, %r1502, %r1503, %r1504, %r1505, %r1506, %r1507, %r1508, %r1509, %r1510}, [%r616 + 272]; 2026-02-21T09:34:05.6840877Z // end inline asm 2026-02-21T09:34:05.6841026Z cvt.u64.u32 %rd547, %r1499; 2026-02-21T09:34:05.6841200Z cvt.u64.u32 %rd548, %r1500; 2026-02-21T09:34:05.6841368Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:34:05.6841543Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:34:05.6841716Z cvt.u64.u32 %rd551, %r1501; 2026-02-21T09:34:05.6841890Z cvt.u64.u32 %rd552, %r1502; 2026-02-21T09:34:05.6842064Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:34:05.6842233Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:34:05.6842416Z cvt.u64.u32 %rd555, %r1507; 2026-02-21T09:34:05.6842581Z cvt.u64.u32 %rd556, %r1508; 2026-02-21T09:34:05.6842796Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:34:05.6842997Z or.b64 %rd558, %rd555, %rd557; 
2026-02-21T09:34:05.6843184Z cvt.u64.u32 %rd559, %r1509; 2026-02-21T09:34:05.6843356Z cvt.u64.u32 %rd560, %r1510; 2026-02-21T09:34:05.6843533Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:34:05.6843710Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:34:05.6843883Z // begin inline asm 2026-02-21T09:34:05.6844313Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1512, %r1513, %r1514, %r1515, %r1516, %r1517, %r1518, %r1519, %r1520, %r1521, %r1522, %r1523, %r1524, %r1525, %r1526, %r1527}, [%r616 + 288]; 2026-02-21T09:34:05.6844841Z // end inline asm 2026-02-21T09:34:05.6845024Z cvt.u64.u32 %rd563, %r1516; 2026-02-21T09:34:05.6845214Z cvt.u64.u32 %rd564, %r1517; 2026-02-21T09:34:05.6845415Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:34:05.6845609Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:34:05.6845827Z cvt.u64.u32 %rd567, %r1518; 2026-02-21T09:34:05.6846022Z cvt.u64.u32 %rd568, %r1519; 2026-02-21T09:34:05.6846219Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:34:05.6846418Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:34:05.6846623Z cvt.u64.u32 %rd571, %r1524; 2026-02-21T09:34:05.6846824Z cvt.u64.u32 %rd572, %r1525; 2026-02-21T09:34:05.6847010Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:34:05.6847199Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T09:34:05.6847393Z cvt.u64.u32 %rd575, %r1526; 2026-02-21T09:34:05.6847592Z cvt.u64.u32 %rd576, %r1527; 2026-02-21T09:34:05.6847782Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:34:05.6847982Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T09:34:05.6848184Z // begin inline asm 2026-02-21T09:34:05.6848668Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1529, %r1530, %r1531, %r1532, %r1533, %r1534, %r1535, %r1536, %r1537, %r1538, %r1539, %r1540, %r1541, %r1542, %r1543, %r1544}, [%r616 + 304]; 2026-02-21T09:34:05.6849232Z // end inline asm 2026-02-21T09:34:05.6849403Z cvt.u64.u32 %rd579, %r1533; 2026-02-21T09:34:05.6849601Z cvt.u64.u32 %rd580, %r1534; 2026-02-21T09:34:05.6849793Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:34:05.6849997Z or.b64 %rd582, %rd579, 
%rd581; 2026-02-21T09:34:05.6850202Z cvt.u64.u32 %rd583, %r1535; 2026-02-21T09:34:05.6850395Z cvt.u64.u32 %rd584, %r1536; 2026-02-21T09:34:05.6850594Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:34:05.6850786Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:34:05.6850989Z cvt.u64.u32 %rd587, %r1541; 2026-02-21T09:34:05.6851177Z cvt.u64.u32 %rd588, %r1542; 2026-02-21T09:34:05.6851373Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:34:05.6851564Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:34:05.6851766Z cvt.u64.u32 %rd591, %r1543; 2026-02-21T09:34:05.6851953Z cvt.u64.u32 %rd592, %r1544; 2026-02-21T09:34:05.6852149Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:34:05.6852347Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:34:05.6852544Z // begin inline asm 2026-02-21T09:34:05.6853060Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1546, %r1547, %r1548, %r1549, %r1550, %r1551, %r1552, %r1553, %r1554, %r1555, %r1556, %r1557, %r1558, %r1559, %r1560, %r1561}, [%r616 + 320]; 2026-02-21T09:34:05.6853592Z // end inline asm 2026-02-21T09:34:05.6853805Z cvt.u64.u32 %rd595, %r1550; 2026-02-21T09:34:05.6854027Z cvt.u64.u32 %rd596, %r1551; 2026-02-21T09:34:05.6854223Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:34:05.6854413Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:34:05.6854614Z cvt.u64.u32 %rd599, %r1552; 2026-02-21T09:34:05.6854842Z cvt.u64.u32 %rd600, %r1553; 2026-02-21T09:34:05.6855034Z shl.b64 %rd601, %rd600, 32; 2026-02-21T09:34:05.6855238Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:34:05.6855436Z cvt.u64.u32 %rd603, %r1558; 2026-02-21T09:34:05.6855632Z cvt.u64.u32 %rd604, %r1559; 2026-02-21T09:34:05.6855824Z shl.b64 %rd605, %rd604, 32; 2026-02-21T09:34:05.6856026Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T09:34:05.6856223Z cvt.u64.u32 %rd607, %r1560; 2026-02-21T09:34:05.6856423Z cvt.u64.u32 %rd608, %r1561; 2026-02-21T09:34:05.6856626Z shl.b64 %rd609, %rd608, 32; 2026-02-21T09:34:05.6856854Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T09:34:05.6857090Z // begin inline asm 
2026-02-21T09:34:05.6857548Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1563, %r1564, %r1565, %r1566, %r1567, %r1568, %r1569, %r1570, %r1571, %r1572, %r1573, %r1574, %r1575, %r1576, %r1577, %r1578}, [%r616 + 336]; 2026-02-21T09:34:05.6858065Z // end inline asm 2026-02-21T09:34:05.6858226Z cvt.u64.u32 %rd611, %r1567; 2026-02-21T09:34:05.6858410Z cvt.u64.u32 %rd612, %r1568; 2026-02-21T09:34:05.6858591Z shl.b64 %rd613, %rd612, 32; 2026-02-21T09:34:05.6858785Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T09:34:05.6858976Z cvt.u64.u32 %rd615, %r1569; 2026-02-21T09:34:05.6859159Z cvt.u64.u32 %rd616, %r1570; 2026-02-21T09:34:05.6859349Z shl.b64 %rd617, %rd616, 32; 2026-02-21T09:34:05.6859530Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T09:34:05.6859721Z cvt.u64.u32 %rd619, %r1575; 2026-02-21T09:34:05.6859904Z cvt.u64.u32 %rd620, %r1576; 2026-02-21T09:34:05.6860090Z shl.b64 %rd621, %rd620, 32; 2026-02-21T09:34:05.6860271Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T09:34:05.6860464Z cvt.u64.u32 %rd623, %r1577; 2026-02-21T09:34:05.6860643Z cvt.u64.u32 %rd624, %r1578; 2026-02-21T09:34:05.6860830Z shl.b64 %rd625, %rd624, 32; 2026-02-21T09:34:05.6861023Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T09:34:05.6861205Z // begin inline asm 2026-02-21T09:34:05.6861689Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1580, %r1581, %r1582, %r1583, %r1584, %r1585, %r1586, %r1587, %r1588, %r1589, %r1590, %r1591, %r1592, %r1593, %r1594, %r1595}, [%r616 + 352]; 2026-02-21T09:34:05.6862200Z // end inline asm 2026-02-21T09:34:05.6862367Z cvt.u64.u32 %rd627, %r1584; 2026-02-21T09:34:05.6862551Z cvt.u64.u32 %rd628, %r1585; 2026-02-21T09:34:05.6862736Z shl.b64 %rd629, %rd628, 32; 2026-02-21T09:34:05.6862921Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T09:34:05.6863103Z cvt.u64.u32 %rd631, %r1586; 2026-02-21T09:34:05.6863294Z cvt.u64.u32 %rd632, %r1587; 2026-02-21T09:34:05.6863476Z shl.b64 %rd633, %rd632, 32; 2026-02-21T09:34:05.6863662Z or.b64 %rd634, %rd631, %rd633; 2026-02-21T09:34:05.6863844Z cvt.u64.u32 
%rd635, %r1592; 2026-02-21T09:34:05.6864028Z cvt.u64.u32 %rd636, %r1593; 2026-02-21T09:34:05.6864205Z shl.b64 %rd637, %rd636, 32; 2026-02-21T09:34:05.6864389Z or.b64 %rd638, %rd635, %rd637; 2026-02-21T09:34:05.6864572Z cvt.u64.u32 %rd639, %r1594; 2026-02-21T09:34:05.6864797Z cvt.u64.u32 %rd640, %r1595; 2026-02-21T09:34:05.6864985Z shl.b64 %rd641, %rd640, 32; 2026-02-21T09:34:05.6865169Z or.b64 %rd642, %rd639, %rd641; 2026-02-21T09:34:05.6865363Z // begin inline asm 2026-02-21T09:34:05.6865852Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1597, %r1598, %r1599, %r1600, %r1601, %r1602, %r1603, %r1604, %r1605, %r1606, %r1607, %r1608, %r1609, %r1610, %r1611, %r1612}, [%r616 + 368]; 2026-02-21T09:34:05.6866348Z // end inline asm 2026-02-21T09:34:05.6866507Z cvt.u64.u32 %rd643, %r1601; 2026-02-21T09:34:05.6866696Z cvt.u64.u32 %rd644, %r1602; 2026-02-21T09:34:05.6866877Z shl.b64 %rd645, %rd644, 32; 2026-02-21T09:34:05.6867064Z or.b64 %rd646, %rd643, %rd645; 2026-02-21T09:34:05.6867258Z cvt.u64.u32 %rd647, %r1603; 2026-02-21T09:34:05.6867472Z cvt.u64.u32 %rd648, %r1604; 2026-02-21T09:34:05.6867685Z shl.b64 %rd649, %rd648, 32; 2026-02-21T09:34:05.6867865Z or.b64 %rd650, %rd647, %rd649; 2026-02-21T09:34:05.6868059Z cvt.u64.u32 %rd651, %r1609; 2026-02-21T09:34:05.6868242Z cvt.u64.u32 %rd652, %r1610; 2026-02-21T09:34:05.6868433Z shl.b64 %rd653, %rd652, 32; 2026-02-21T09:34:05.6868615Z or.b64 %rd654, %rd651, %rd653; 2026-02-21T09:34:05.6868809Z cvt.u64.u32 %rd655, %r1611; 2026-02-21T09:34:05.6868985Z cvt.u64.u32 %rd656, %r1612; 2026-02-21T09:34:05.6869155Z shl.b64 %rd657, %rd656, 32; 2026-02-21T09:34:05.6869333Z or.b64 %rd658, %rd655, %rd657; 2026-02-21T09:34:05.6869506Z // begin inline asm 2026-02-21T09:34:05.6869952Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1614, %r1615, %r1616, %r1617, %r1618, %r1619, %r1620, %r1621, %r1622, %r1623, %r1624, %r1625, %r1626, %r1627, %r1628, %r1629}, [%r616 + 384]; 2026-02-21T09:34:05.6870436Z // end inline asm 2026-02-21T09:34:05.6870626Z 
cvt.u64.u32 %rd659, %r1618; 2026-02-21T09:34:05.6870800Z cvt.u64.u32 %rd660, %r1619; 2026-02-21T09:34:05.6870981Z shl.b64 %rd661, %rd660, 32; 2026-02-21T09:34:05.6871157Z or.b64 %rd662, %rd659, %rd661; 2026-02-21T09:34:05.6871331Z cvt.u64.u32 %rd663, %r1620; 2026-02-21T09:34:05.6871506Z cvt.u64.u32 %rd664, %r1621; 2026-02-21T09:34:05.6871673Z shl.b64 %rd665, %rd664, 32; 2026-02-21T09:34:05.6871850Z or.b64 %rd666, %rd663, %rd665; 2026-02-21T09:34:05.6872023Z cvt.u64.u32 %rd667, %r1626; 2026-02-21T09:34:05.6872197Z cvt.u64.u32 %rd668, %r1627; 2026-02-21T09:34:05.6872364Z shl.b64 %rd669, %rd668, 32; 2026-02-21T09:34:05.6872542Z or.b64 %rd670, %rd667, %rd669; 2026-02-21T09:34:05.6872714Z cvt.u64.u32 %rd671, %r1628; 2026-02-21T09:34:05.6872890Z cvt.u64.u32 %rd672, %r1629; 2026-02-21T09:34:05.6873062Z shl.b64 %rd673, %rd672, 32; 2026-02-21T09:34:05.6873231Z or.b64 %rd674, %rd671, %rd673; 2026-02-21T09:34:05.6873411Z // begin inline asm 2026-02-21T09:34:05.6873832Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1631, %r1632, %r1633, %r1634, %r1635, %r1636, %r1637, %r1638, %r1639, %r1640, %r1641, %r1642, %r1643, %r1644, %r1645, %r1646}, [%r616 + 400]; 2026-02-21T09:34:05.6874297Z // end inline asm 2026-02-21T09:34:05.6874446Z cvt.u64.u32 %rd675, %r1635; 2026-02-21T09:34:05.6874626Z cvt.u64.u32 %rd676, %r1636; 2026-02-21T09:34:05.6874852Z shl.b64 %rd677, %rd676, 32; 2026-02-21T09:34:05.6875030Z or.b64 %rd678, %rd675, %rd677; 2026-02-21T09:34:05.6875219Z cvt.u64.u32 %rd679, %r1637; 2026-02-21T09:34:05.6875401Z cvt.u64.u32 %rd680, %r1638; 2026-02-21T09:34:05.6875590Z shl.b64 %rd681, %rd680, 32; 2026-02-21T09:34:05.6875775Z or.b64 %rd682, %rd679, %rd681; 2026-02-21T09:34:05.6875969Z cvt.u64.u32 %rd683, %r1643; 2026-02-21T09:34:05.6876153Z cvt.u64.u32 %rd684, %r1644; 2026-02-21T09:34:05.6876330Z shl.b64 %rd685, %rd684, 32; 2026-02-21T09:34:05.6876500Z or.b64 %rd686, %rd683, %rd685; 2026-02-21T09:34:05.6876685Z cvt.u64.u32 %rd687, %r1645; 2026-02-21T09:34:05.6876863Z cvt.u64.u32 
%rd688, %r1646; 2026-02-21T09:34:05.6877035Z shl.b64 %rd689, %rd688, 32; 2026-02-21T09:34:05.6877218Z or.b64 %rd690, %rd687, %rd689; 2026-02-21T09:34:05.6877390Z // begin inline asm 2026-02-21T09:34:05.6877819Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1648, %r1649, %r1650, %r1651, %r1652, %r1653, %r1654, %r1655, %r1656, %r1657, %r1658, %r1659, %r1660, %r1661, %r1662, %r1663}, [%r616 + 416]; 2026-02-21T09:34:05.6878299Z // end inline asm 2026-02-21T09:34:05.6878457Z cvt.u64.u32 %rd691, %r1650; 2026-02-21T09:34:05.6878626Z cvt.u64.u32 %rd692, %r1651; 2026-02-21T09:34:05.6878806Z shl.b64 %rd693, %rd692, 32; 2026-02-21T09:34:05.6878987Z or.b64 %rd694, %rd691, %rd693; 2026-02-21T09:34:05.6879164Z cvt.u64.u32 %rd695, %r1652; 2026-02-21T09:34:05.6879342Z cvt.u64.u32 %rd696, %r1653; 2026-02-21T09:34:05.6879512Z shl.b64 %rd697, %rd696, 32; 2026-02-21T09:34:05.6879689Z or.b64 %rd698, %rd695, %rd697; 2026-02-21T09:34:05.6879862Z cvt.u64.u32 %rd699, %r1654; 2026-02-21T09:34:05.6880040Z cvt.u64.u32 %rd700, %r1655; 2026-02-21T09:34:05.6880242Z shl.b64 %rd701, %rd700, 32; 2026-02-21T09:34:05.6880451Z or.b64 %rd702, %rd699, %rd701; 2026-02-21T09:34:05.6880630Z cvt.u64.u32 %rd703, %r1658; 2026-02-21T09:34:05.6880796Z cvt.u64.u32 %rd704, %r1659; 2026-02-21T09:34:05.6880973Z shl.b64 %rd705, %rd704, 32; 2026-02-21T09:34:05.6881142Z or.b64 %rd706, %rd703, %rd705; 2026-02-21T09:34:05.6881325Z cvt.u64.u32 %rd707, %r1660; 2026-02-21T09:34:05.6881491Z cvt.u64.u32 %rd708, %r1661; 2026-02-21T09:34:05.6881664Z shl.b64 %rd709, %rd708, 32; 2026-02-21T09:34:05.6881835Z or.b64 %rd710, %rd707, %rd709; 2026-02-21T09:34:05.6882015Z cvt.u64.u32 %rd711, %r1662; 2026-02-21T09:34:05.6882182Z cvt.u64.u32 %rd712, %r1663; 2026-02-21T09:34:05.6882359Z shl.b64 %rd713, %rd712, 32; 2026-02-21T09:34:05.6882536Z or.b64 %rd714, %rd711, %rd713; 2026-02-21T09:34:05.6882708Z // begin inline asm 2026-02-21T09:34:05.6883191Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1665, %r1666, %r1667, %r1668, %r1669, %r1670, 
%r1671, %r1672, %r1673, %r1674, %r1675, %r1676, %r1677, %r1678, %r1679, %r1680}, [%r616 + 432]; 2026-02-21T09:34:05.6883660Z // end inline asm 2026-02-21T09:34:05.6883820Z cvt.u64.u32 %rd715, %r1667; 2026-02-21T09:34:05.6883991Z cvt.u64.u32 %rd716, %r1668; 2026-02-21T09:34:05.6884166Z shl.b64 %rd717, %rd716, 32; 2026-02-21T09:34:05.6884343Z or.b64 %rd718, %rd715, %rd717; 2026-02-21T09:34:05.6884516Z cvt.u64.u32 %rd719, %r1669; 2026-02-21T09:34:05.6884727Z cvt.u64.u32 %rd720, %r1670; 2026-02-21T09:34:05.6884897Z shl.b64 %rd721, %rd720, 32; 2026-02-21T09:34:05.6885080Z or.b64 %rd722, %rd719, %rd721; 2026-02-21T09:34:05.6885262Z cvt.u64.u32 %rd723, %r1671; 2026-02-21T09:34:05.6885447Z cvt.u64.u32 %rd724, %r1672; 2026-02-21T09:34:05.6885625Z shl.b64 %rd725, %rd724, 32; 2026-02-21T09:34:05.6885813Z or.b64 %rd726, %rd723, %rd725; 2026-02-21T09:34:05.6885994Z cvt.u64.u32 %rd727, %r1677; 2026-02-21T09:34:05.6886181Z cvt.u64.u32 %rd728, %r1678; 2026-02-21T09:34:05.6886368Z shl.b64 %rd729, %rd728, 32; 2026-02-21T09:34:05.6886552Z or.b64 %rd730, %rd727, %rd729; 2026-02-21T09:34:05.6886747Z cvt.u64.u32 %rd731, %r1679; 2026-02-21T09:34:05.6886925Z cvt.u64.u32 %rd732, %r1680; 2026-02-21T09:34:05.6887112Z shl.b64 %rd733, %rd732, 32; 2026-02-21T09:34:05.6887292Z or.b64 %rd734, %rd731, %rd733; 2026-02-21T09:34:05.6887479Z // begin inline asm 2026-02-21T09:34:05.6887938Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1682, %r1683, %r1684, %r1685, %r1686, %r1687, %r1688, %r1689, %r1690, %r1691, %r1692, %r1693, %r1694, %r1695, %r1696, %r1697}, [%r616 + 448]; 2026-02-21T09:34:05.6888434Z // end inline asm 2026-02-21T09:34:05.6888603Z cvt.u64.u32 %rd735, %r1684; 2026-02-21T09:34:05.6888783Z cvt.u64.u32 %rd736, %r1685; 2026-02-21T09:34:05.6888974Z shl.b64 %rd737, %rd736, 32; 2026-02-21T09:34:05.6889156Z or.b64 %rd738, %rd735, %rd737; 2026-02-21T09:34:05.6889359Z cvt.u64.u32 %rd739, %r1686; 2026-02-21T09:34:05.6889540Z cvt.u64.u32 %rd740, %r1687; 2026-02-21T09:34:05.6889730Z shl.b64 %rd741, 
%rd740, 32; 2026-02-21T09:34:05.6889914Z or.b64 %rd742, %rd739, %rd741; 2026-02-21T09:34:05.6890104Z cvt.u64.u32 %rd743, %r1688; 2026-02-21T09:34:05.6890288Z cvt.u64.u32 %rd744, %r1689; 2026-02-21T09:34:05.6890467Z shl.b64 %rd745, %rd744, 32; 2026-02-21T09:34:05.6890651Z or.b64 %rd746, %rd743, %rd745; 2026-02-21T09:34:05.6890836Z cvt.u64.u32 %rd747, %r1692; 2026-02-21T09:34:05.6891020Z cvt.u64.u32 %rd748, %r1693; 2026-02-21T09:34:05.6891201Z shl.b64 %rd749, %rd748, 32; 2026-02-21T09:34:05.6891389Z or.b64 %rd750, %rd747, %rd749; 2026-02-21T09:34:05.6891571Z cvt.u64.u32 %rd751, %r1694; 2026-02-21T09:34:05.6891753Z cvt.u64.u32 %rd752, %r1695; 2026-02-21T09:34:05.6891933Z shl.b64 %rd753, %rd752, 32; 2026-02-21T09:34:05.6892120Z or.b64 %rd754, %rd751, %rd753; 2026-02-21T09:34:05.6892311Z cvt.u64.u32 %rd755, %r1696; 2026-02-21T09:34:05.6892488Z cvt.u64.u32 %rd756, %r1697; 2026-02-21T09:34:05.6892673Z shl.b64 %rd757, %rd756, 32; 2026-02-21T09:34:05.6892853Z or.b64 %rd758, %rd755, %rd757; 2026-02-21T09:34:05.6893081Z // begin inline asm 2026-02-21T09:34:05.6893565Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1699, %r1700, %r1701, %r1702, %r1703, %r1704, %r1705, %r1706, %r1707, %r1708, %r1709, %r1710, %r1711, %r1712, %r1713, %r1714}, [%r616 + 464]; 2026-02-21T09:34:05.6894089Z // end inline asm 2026-02-21T09:34:05.6894246Z cvt.u64.u32 %rd759, %r1701; 2026-02-21T09:34:05.6894432Z cvt.u64.u32 %rd760, %r1702; 2026-02-21T09:34:05.6894617Z shl.b64 %rd761, %rd760, 32; 2026-02-21T09:34:05.6894840Z or.b64 %rd762, %rd759, %rd761; 2026-02-21T09:34:05.6895034Z cvt.u64.u32 %rd763, %r1703; 2026-02-21T09:34:05.6895215Z cvt.u64.u32 %rd764, %r1704; 2026-02-21T09:34:05.6895404Z shl.b64 %rd765, %rd764, 32; 2026-02-21T09:34:05.6895585Z or.b64 %rd766, %rd763, %rd765; 2026-02-21T09:34:05.6895775Z cvt.u64.u32 %rd767, %r1705; 2026-02-21T09:34:05.6895951Z cvt.u64.u32 %rd768, %r1706; 2026-02-21T09:34:05.6896168Z shl.b64 %rd769, %rd768, 32; 2026-02-21T09:34:05.6896391Z or.b64 %rd770, %rd767, %rd769; 
2026-02-21T09:34:05.6896555Z cvt.u64.u32 %rd771, %r1709; 2026-02-21T09:34:05.6896721Z cvt.u64.u32 %rd772, %r1710; 2026-02-21T09:34:05.6896877Z shl.b64 %rd773, %rd772, 32; 2026-02-21T09:34:05.6897039Z or.b64 %rd774, %rd771, %rd773; 2026-02-21T09:34:05.6897197Z cvt.u64.u32 %rd775, %r1711; 2026-02-21T09:34:05.6897357Z cvt.u64.u32 %rd776, %r1712; 2026-02-21T09:34:05.6897511Z shl.b64 %rd777, %rd776, 32; 2026-02-21T09:34:05.6897674Z or.b64 %rd778, %rd775, %rd777; 2026-02-21T09:34:05.6897832Z cvt.u64.u32 %rd779, %r1713; 2026-02-21T09:34:05.6897994Z cvt.u64.u32 %rd780, %r1714; 2026-02-21T09:34:05.6898156Z shl.b64 %rd781, %rd780, 32; 2026-02-21T09:34:05.6898314Z or.b64 %rd782, %rd779, %rd781; 2026-02-21T09:34:05.6898482Z // begin inline asm 2026-02-21T09:34:05.6898877Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1716, %r1717, %r1718, %r1719, %r1720, %r1721, %r1722, %r1723, %r1724, %r1725, %r1726, %r1727, %r1728, %r1729, %r1730, %r1731}, [%r616 + 480]; 2026-02-21T09:34:05.6899319Z // end inline asm 2026-02-21T09:34:05.6899465Z cvt.u64.u32 %rd783, %r1718; 2026-02-21T09:34:05.6899626Z cvt.u64.u32 %rd784, %r1719; 2026-02-21T09:34:05.6899776Z shl.b64 %rd785, %rd784, 32; 2026-02-21T09:34:05.6899934Z or.b64 %rd786, %rd783, %rd785; 2026-02-21T09:34:05.6900093Z cvt.u64.u32 %rd787, %r1720; 2026-02-21T09:34:05.6900246Z cvt.u64.u32 %rd788, %r1721; 2026-02-21T09:34:05.6900405Z shl.b64 %rd789, %rd788, 32; 2026-02-21T09:34:05.6900555Z or.b64 %rd790, %rd787, %rd789; 2026-02-21T09:34:05.6900716Z cvt.u64.u32 %rd791, %r1722; 2026-02-21T09:34:05.6900867Z cvt.u64.u32 %rd792, %r1723; 2026-02-21T09:34:05.6901023Z shl.b64 %rd793, %rd792, 32; 2026-02-21T09:34:05.6901179Z or.b64 %rd794, %rd791, %rd793; 2026-02-21T09:34:05.6901340Z cvt.u64.u32 %rd795, %r1726; 2026-02-21T09:34:05.6901496Z cvt.u64.u32 %rd796, %r1727; 2026-02-21T09:34:05.6901648Z shl.b64 %rd797, %rd796, 32; 2026-02-21T09:34:05.6901806Z or.b64 %rd798, %rd795, %rd797; 2026-02-21T09:34:05.6901966Z cvt.u64.u32 %rd799, %r1728; 
2026-02-21T09:34:05.6902122Z cvt.u64.u32 %rd800, %r1729; 2026-02-21T09:34:05.6902277Z shl.b64 %rd801, %rd800, 32; 2026-02-21T09:34:05.6902434Z or.b64 %rd802, %rd799, %rd801; 2026-02-21T09:34:05.6902588Z cvt.u64.u32 %rd803, %r1730; 2026-02-21T09:34:05.6902742Z cvt.u64.u32 %rd804, %r1731; 2026-02-21T09:34:05.6902891Z shl.b64 %rd805, %rd804, 32; 2026-02-21T09:34:05.6903046Z or.b64 %rd806, %rd803, %rd805; 2026-02-21T09:34:05.6903208Z // begin inline asm 2026-02-21T09:34:05.6903591Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r1733, %r1734, %r1735, %r1736, %r1737, %r1738, %r1739, %r1740, %r1741, %r1742, %r1743, %r1744, %r1745, %r1746, %r1747, %r1748}, [%r616 + 496]; 2026-02-21T09:34:05.6904022Z // end inline asm 2026-02-21T09:34:05.6904161Z cvt.u64.u32 %rd807, %r1735; 2026-02-21T09:34:05.6904323Z cvt.u64.u32 %rd808, %r1736; 2026-02-21T09:34:05.6904476Z shl.b64 %rd809, %rd808, 32; 2026-02-21T09:34:05.6904640Z or.b64 %rd810, %rd807, %rd809; 2026-02-21T09:34:05.6904774Z cvt.u64.u32 %rd811, %r1737; 2026-02-21T09:34:05.6904906Z cvt.u64.u32 %rd812, %r1738; 2026-02-21T09:34:05.6904966Z shl.b64 %rd813, %rd812, 32; 2026-02-21T09:34:05.6905030Z or.b64 %rd814, %rd811, %rd813; 2026-02-21T09:34:05.6905099Z cvt.u64.u32 %rd815, %r1739; 2026-02-21T09:34:05.6905178Z cvt.u64.u32 %rd816, %r1740; 2026-02-21T09:34:05.6905248Z shl.b64 %rd817, %rd816, 32; 2026-02-21T09:34:05.6905320Z or.b64 %rd818, %rd815, %rd817; 2026-02-21T09:34:05.6905397Z cvt.u64.u32 %rd819, %r1743; 2026-02-21T09:34:05.6905465Z cvt.u64.u32 %rd820, %r1744; 2026-02-21T09:34:05.6905534Z shl.b64 %rd821, %rd820, 32; 2026-02-21T09:34:05.6905606Z or.b64 %rd822, %rd819, %rd821; 2026-02-21T09:34:05.6905687Z cvt.u64.u32 %rd823, %r1745; 2026-02-21T09:34:05.6905756Z cvt.u64.u32 %rd824, %r1746; 2026-02-21T09:34:05.6905827Z shl.b64 %rd825, %rd824, 32; 2026-02-21T09:34:05.6905907Z or.b64 %rd826, %rd823, %rd825; 2026-02-21T09:34:05.6906031Z cvt.u64.u32 %rd827, %r1747; 2026-02-21T09:34:05.6906141Z cvt.u64.u32 %rd828, %r1748; 
2026-02-21T09:34:05.6906214Z shl.b64 %rd829, %rd828, 32; 2026-02-21T09:34:05.6906297Z or.b64 %rd830, %rd827, %rd829; 2026-02-21T09:34:05.6906367Z // begin inline asm 2026-02-21T09:34:05.6906465Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:34:05.6906540Z // end inline asm 2026-02-21T09:34:05.6906607Z bar.sync 0, 128; 2026-02-21T09:34:05.6906684Z // begin inline asm 2026-02-21T09:34:05.6906800Z @%p94 mbarrier.arrive.shared::cta.b64 _, [%r3304]; 2026-02-21T09:34:05.6906874Z // end inline asm 2026-02-21T09:34:05.6906940Z cvt.u64.u32 %rd831, %r1206; 2026-02-21T09:34:05.6907006Z cvt.u64.u32 %rd832, %r1207; 2026-02-21T09:34:05.6907081Z shl.b64 %rd833, %rd832, 32; 2026-02-21T09:34:05.6907150Z or.b64 %rd834, %rd831, %rd833; 2026-02-21T09:34:05.6907378Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6907466Z mov.b64 {%r2394, %r2395}, %rd834; 2026-02-21T09:34:05.6907557Z cvt.rn.f16x2.f32 %r2396, %r2395, %r2394; 2026-02-21T09:34:05.6907777Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6907848Z cvt.u64.u32 %rd835, %r1208; 2026-02-21T09:34:05.6907926Z cvt.u64.u32 %rd836, %r1209; 2026-02-21T09:34:05.6907995Z shl.b64 %rd837, %rd836, 32; 2026-02-21T09:34:05.6908066Z or.b64 %rd838, %rd835, %rd837; 2026-02-21T09:34:05.6908276Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6908347Z mov.b64 {%r2397, %r2398}, %rd838; 2026-02-21T09:34:05.6908427Z cvt.rn.f16x2.f32 %r2399, %r2398, %r2397; 2026-02-21T09:34:05.6908500Z mov.b64 {%r2400, %r2401}, %rd278; 2026-02-21T09:34:05.6908573Z cvt.rn.f16x2.f32 %r2402, %r2401, %r2400; 2026-02-21T09:34:05.6908636Z mov.b64 {%r2403, %r2404}, %rd282; 2026-02-21T09:34:05.6908710Z cvt.rn.f16x2.f32 %r2405, %r2404, %r2403; 2026-02-21T09:34:05.6908911Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6908984Z cvt.u64.u32 %rd839, %r1214; 
2026-02-21T09:34:05.6909048Z cvt.u64.u32 %rd840, %r1215; 2026-02-21T09:34:05.6909121Z shl.b64 %rd841, %rd840, 32; 2026-02-21T09:34:05.6909193Z or.b64 %rd842, %rd839, %rd841; 2026-02-21T09:34:05.6909394Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6909466Z mov.b64 {%r2406, %r2407}, %rd842; 2026-02-21T09:34:05.6909545Z cvt.rn.f16x2.f32 %r2408, %r2407, %r2406; 2026-02-21T09:34:05.6909748Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6909819Z cvt.u64.u32 %rd843, %r1216; 2026-02-21T09:34:05.6909890Z cvt.u64.u32 %rd844, %r1217; 2026-02-21T09:34:05.6909955Z shl.b64 %rd845, %rd844, 32; 2026-02-21T09:34:05.6910023Z or.b64 %rd846, %rd843, %rd845; 2026-02-21T09:34:05.6910222Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6910321Z mov.b64 {%r2409, %r2410}, %rd846; 2026-02-21T09:34:05.6910424Z cvt.rn.f16x2.f32 %r2411, %r2410, %r2409; 2026-02-21T09:34:05.6910498Z mov.b64 {%r2412, %r2413}, %rd286; 2026-02-21T09:34:05.6910575Z cvt.rn.f16x2.f32 %r2414, %r2413, %r2412; 2026-02-21T09:34:05.6910642Z mov.b64 {%r2415, %r2416}, %rd290; 2026-02-21T09:34:05.6910717Z cvt.rn.f16x2.f32 %r2417, %r2416, %r2415; 2026-02-21T09:34:05.6910923Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6910989Z cvt.u64.u32 %rd847, %r1223; 2026-02-21T09:34:05.6911052Z cvt.u64.u32 %rd848, %r1224; 2026-02-21T09:34:05.6911125Z shl.b64 %rd849, %rd848, 32; 2026-02-21T09:34:05.6911194Z or.b64 %rd850, %rd847, %rd849; 2026-02-21T09:34:05.6911393Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6911495Z mov.b64 {%r2418, %r2419}, %rd850; 2026-02-21T09:34:05.6911596Z cvt.rn.f16x2.f32 %r2420, %r2419, %r2418; 2026-02-21T09:34:05.6911794Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6911862Z cvt.u64.u32 %rd851, %r1225; 
2026-02-21T09:34:05.6911929Z cvt.u64.u32 %rd852, %r1226; 2026-02-21T09:34:05.6911992Z shl.b64 %rd853, %rd852, 32; 2026-02-21T09:34:05.6912059Z or.b64 %rd854, %rd851, %rd853; 2026-02-21T09:34:05.6912261Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6912329Z mov.b64 {%r2421, %r2422}, %rd854; 2026-02-21T09:34:05.6912404Z cvt.rn.f16x2.f32 %r2423, %r2422, %r2421; 2026-02-21T09:34:05.6912476Z mov.b64 {%r2424, %r2425}, %rd294; 2026-02-21T09:34:05.6912550Z cvt.rn.f16x2.f32 %r2426, %r2425, %r2424; 2026-02-21T09:34:05.6912615Z mov.b64 {%r2427, %r2428}, %rd298; 2026-02-21T09:34:05.6912693Z cvt.rn.f16x2.f32 %r2429, %r2428, %r2427; 2026-02-21T09:34:05.6912892Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6912959Z cvt.u64.u32 %rd855, %r1231; 2026-02-21T09:34:05.6913024Z cvt.u64.u32 %rd856, %r1232; 2026-02-21T09:34:05.6913094Z shl.b64 %rd857, %rd856, 32; 2026-02-21T09:34:05.6913160Z or.b64 %rd858, %rd855, %rd857; 2026-02-21T09:34:05.6913350Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6913425Z mov.b64 {%r2430, %r2431}, %rd858; 2026-02-21T09:34:05.6913500Z cvt.rn.f16x2.f32 %r2432, %r2431, %r2430; 2026-02-21T09:34:05.6913691Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6913758Z cvt.u64.u32 %rd859, %r1233; 2026-02-21T09:34:05.6913831Z cvt.u64.u32 %rd860, %r1234; 2026-02-21T09:34:05.6913898Z shl.b64 %rd861, %rd860, 32; 2026-02-21T09:34:05.6913968Z or.b64 %rd862, %rd859, %rd861; 2026-02-21T09:34:05.6914169Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6914240Z mov.b64 {%r2433, %r2434}, %rd862; 2026-02-21T09:34:05.6914317Z cvt.rn.f16x2.f32 %r2435, %r2434, %r2433; 2026-02-21T09:34:05.6914391Z mov.b64 {%r2436, %r2437}, %rd302; 2026-02-21T09:34:05.6914467Z cvt.rn.f16x2.f32 %r2438, %r2437, %r2436; 2026-02-21T09:34:05.6914534Z 
mov.b64 {%r2439, %r2440}, %rd306; 2026-02-21T09:34:05.6914608Z cvt.rn.f16x2.f32 %r2441, %r2440, %r2439; 2026-02-21T09:34:05.6914859Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6914927Z cvt.u64.u32 %rd863, %r1240; 2026-02-21T09:34:05.6914993Z cvt.u64.u32 %rd864, %r1241; 2026-02-21T09:34:05.6915066Z shl.b64 %rd865, %rd864, 32; 2026-02-21T09:34:05.6915134Z or.b64 %rd866, %rd863, %rd865; 2026-02-21T09:34:05.6915331Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6915410Z mov.b64 {%r2442, %r2443}, %rd866; 2026-02-21T09:34:05.6915526Z cvt.rn.f16x2.f32 %r2444, %r2443, %r2442; 2026-02-21T09:34:05.6915748Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6915816Z cvt.u64.u32 %rd867, %r1242; 2026-02-21T09:34:05.6915891Z cvt.u64.u32 %rd868, %r1243; 2026-02-21T09:34:05.6915958Z shl.b64 %rd869, %rd868, 32; 2026-02-21T09:34:05.6916028Z or.b64 %rd870, %rd867, %rd869; 2026-02-21T09:34:05.6916232Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6916303Z mov.b64 {%r2445, %r2446}, %rd870; 2026-02-21T09:34:05.6916381Z cvt.rn.f16x2.f32 %r2447, %r2446, %r2445; 2026-02-21T09:34:05.6916460Z mov.b64 {%r2448, %r2449}, %rd310; 2026-02-21T09:34:05.6916537Z cvt.rn.f16x2.f32 %r2450, %r2449, %r2448; 2026-02-21T09:34:05.6916606Z mov.b64 {%r2451, %r2452}, %rd314; 2026-02-21T09:34:05.6916724Z cvt.rn.f16x2.f32 %r2453, %r2452, %r2451; 2026-02-21T09:34:05.6916963Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6917035Z cvt.u64.u32 %rd871, %r1248; 2026-02-21T09:34:05.6917103Z cvt.u64.u32 %rd872, %r1249; 2026-02-21T09:34:05.6917177Z shl.b64 %rd873, %rd872, 32; 2026-02-21T09:34:05.6917247Z or.b64 %rd874, %rd871, %rd873; 2026-02-21T09:34:05.6917447Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6917525Z 
mov.b64 {%r2454, %r2455}, %rd874; 2026-02-21T09:34:05.6917603Z cvt.rn.f16x2.f32 %r2456, %r2455, %r2454; 2026-02-21T09:34:05.6917796Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6917867Z cvt.u64.u32 %rd875, %r1250; 2026-02-21T09:34:05.6917946Z cvt.u64.u32 %rd876, %r1251; 2026-02-21T09:34:05.6918015Z shl.b64 %rd877, %rd876, 32; 2026-02-21T09:34:05.6918086Z or.b64 %rd878, %rd875, %rd877; 2026-02-21T09:34:05.6918289Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6918362Z mov.b64 {%r2457, %r2458}, %rd878; 2026-02-21T09:34:05.6918439Z cvt.rn.f16x2.f32 %r2459, %r2458, %r2457; 2026-02-21T09:34:05.6918511Z mov.b64 {%r2460, %r2461}, %rd318; 2026-02-21T09:34:05.6918585Z cvt.rn.f16x2.f32 %r2462, %r2461, %r2460; 2026-02-21T09:34:05.6918644Z mov.b64 {%r2463, %r2464}, %rd322; 2026-02-21T09:34:05.6918709Z cvt.rn.f16x2.f32 %r2465, %r2464, %r2463; 2026-02-21T09:34:05.6918883Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6918944Z cvt.u64.u32 %rd879, %r1257; 2026-02-21T09:34:05.6919003Z cvt.u64.u32 %rd880, %r1258; 2026-02-21T09:34:05.6919068Z shl.b64 %rd881, %rd880, 32; 2026-02-21T09:34:05.6919130Z or.b64 %rd882, %rd879, %rd881; 2026-02-21T09:34:05.6919302Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6919364Z mov.b64 {%r2466, %r2467}, %rd882; 2026-02-21T09:34:05.6919441Z cvt.rn.f16x2.f32 %r2468, %r2467, %r2466; 2026-02-21T09:34:05.6919612Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6919670Z cvt.u64.u32 %rd883, %r1259; 2026-02-21T09:34:05.6919739Z cvt.u64.u32 %rd884, %r1260; 2026-02-21T09:34:05.6919797Z shl.b64 %rd885, %rd884, 32; 2026-02-21T09:34:05.6919857Z or.b64 %rd886, %rd883, %rd885; 2026-02-21T09:34:05.6920034Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6920094Z 
mov.b64 {%r2469, %r2470}, %rd886; 2026-02-21T09:34:05.6920161Z cvt.rn.f16x2.f32 %r2471, %r2470, %r2469; 2026-02-21T09:34:05.6920221Z mov.b64 {%r2472, %r2473}, %rd326; 2026-02-21T09:34:05.6920295Z cvt.rn.f16x2.f32 %r2474, %r2473, %r2472; 2026-02-21T09:34:05.6920355Z mov.b64 {%r2475, %r2476}, %rd330; 2026-02-21T09:34:05.6920422Z cvt.rn.f16x2.f32 %r2477, %r2476, %r2475; 2026-02-21T09:34:05.6920625Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6920709Z cvt.u64.u32 %rd887, %r1265; 2026-02-21T09:34:05.6920769Z cvt.u64.u32 %rd888, %r1266; 2026-02-21T09:34:05.6920835Z shl.b64 %rd889, %rd888, 32; 2026-02-21T09:34:05.6920898Z or.b64 %rd890, %rd887, %rd889; 2026-02-21T09:34:05.6921070Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6921130Z mov.b64 {%r2478, %r2479}, %rd890; 2026-02-21T09:34:05.6921205Z cvt.rn.f16x2.f32 %r2480, %r2479, %r2478; 2026-02-21T09:34:05.6921378Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6921438Z cvt.u64.u32 %rd891, %r1267; 2026-02-21T09:34:05.6921505Z cvt.u64.u32 %rd892, %r1268; 2026-02-21T09:34:05.6921582Z shl.b64 %rd893, %rd892, 32; 2026-02-21T09:34:05.6921664Z or.b64 %rd894, %rd891, %rd893; 2026-02-21T09:34:05.6921842Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6921904Z mov.b64 {%r2481, %r2482}, %rd894; 2026-02-21T09:34:05.6921972Z cvt.rn.f16x2.f32 %r2483, %r2482, %r2481; 2026-02-21T09:34:05.6922031Z mov.b64 {%r2484, %r2485}, %rd334; 2026-02-21T09:34:05.6922106Z cvt.rn.f16x2.f32 %r2486, %r2485, %r2484; 2026-02-21T09:34:05.6922165Z mov.b64 {%r2487, %r2488}, %rd338; 2026-02-21T09:34:05.6922231Z cvt.rn.f16x2.f32 %r2489, %r2488, %r2487; 2026-02-21T09:34:05.6922410Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6922470Z cvt.u64.u32 %rd895, %r1274; 2026-02-21T09:34:05.6922528Z cvt.u64.u32 
%rd896, %r1275; 2026-02-21T09:34:05.6922594Z shl.b64 %rd897, %rd896, 32; 2026-02-21T09:34:05.6922654Z or.b64 %rd898, %rd895, %rd897; 2026-02-21T09:34:05.6922825Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6922887Z mov.b64 {%r2490, %r2491}, %rd898; 2026-02-21T09:34:05.6922963Z cvt.rn.f16x2.f32 %r2492, %r2491, %r2490; 2026-02-21T09:34:05.6923135Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6923192Z cvt.u64.u32 %rd899, %r1276; 2026-02-21T09:34:05.6923257Z cvt.u64.u32 %rd900, %r1277; 2026-02-21T09:34:05.6923315Z shl.b64 %rd901, %rd900, 32; 2026-02-21T09:34:05.6923376Z or.b64 %rd902, %rd899, %rd901; 2026-02-21T09:34:05.6923552Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6923613Z mov.b64 {%r2493, %r2494}, %rd902; 2026-02-21T09:34:05.6923678Z cvt.rn.f16x2.f32 %r2495, %r2494, %r2493; 2026-02-21T09:34:05.6923738Z mov.b64 {%r2496, %r2497}, %rd342; 2026-02-21T09:34:05.6923813Z cvt.rn.f16x2.f32 %r2498, %r2497, %r2496; 2026-02-21T09:34:05.6923874Z mov.b64 {%r2499, %r2500}, %rd346; 2026-02-21T09:34:05.6923943Z cvt.rn.f16x2.f32 %r2501, %r2500, %r2499; 2026-02-21T09:34:05.6924123Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6924182Z cvt.u64.u32 %rd903, %r1282; 2026-02-21T09:34:05.6924241Z cvt.u64.u32 %rd904, %r1283; 2026-02-21T09:34:05.6924308Z shl.b64 %rd905, %rd904, 32; 2026-02-21T09:34:05.6924368Z or.b64 %rd906, %rd903, %rd905; 2026-02-21T09:34:05.6924539Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6924599Z mov.b64 {%r2502, %r2503}, %rd906; 2026-02-21T09:34:05.6924705Z cvt.rn.f16x2.f32 %r2504, %r2503, %r2502; 2026-02-21T09:34:05.6924881Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6924941Z cvt.u64.u32 %rd907, %r1284; 2026-02-21T09:34:05.6925015Z cvt.u64.u32 
%rd908, %r1285; 2026-02-21T09:34:05.6925083Z shl.b64 %rd909, %rd908, 32; 2026-02-21T09:34:05.6925156Z or.b64 %rd910, %rd907, %rd909; 2026-02-21T09:34:05.6925437Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6925509Z mov.b64 {%r2505, %r2506}, %rd910; 2026-02-21T09:34:05.6925587Z cvt.rn.f16x2.f32 %r2507, %r2506, %r2505; 2026-02-21T09:34:05.6925658Z mov.b64 {%r2508, %r2509}, %rd350; 2026-02-21T09:34:05.6925742Z cvt.rn.f16x2.f32 %r2510, %r2509, %r2508; 2026-02-21T09:34:05.6925810Z mov.b64 {%r2511, %r2512}, %rd354; 2026-02-21T09:34:05.6925887Z cvt.rn.f16x2.f32 %r2513, %r2512, %r2511; 2026-02-21T09:34:05.6926094Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6926163Z cvt.u64.u32 %rd911, %r1291; 2026-02-21T09:34:05.6926231Z cvt.u64.u32 %rd912, %r1292; 2026-02-21T09:34:05.6926306Z shl.b64 %rd913, %rd912, 32; 2026-02-21T09:34:05.6926376Z or.b64 %rd914, %rd911, %rd913; 2026-02-21T09:34:05.6926623Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6926694Z mov.b64 {%r2514, %r2515}, %rd914; 2026-02-21T09:34:05.6926778Z cvt.rn.f16x2.f32 %r2516, %r2515, %r2514; 2026-02-21T09:34:05.6926981Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6927045Z cvt.u64.u32 %rd915, %r1293; 2026-02-21T09:34:05.6927117Z cvt.u64.u32 %rd916, %r1294; 2026-02-21T09:34:05.6927180Z shl.b64 %rd917, %rd916, 32; 2026-02-21T09:34:05.6927246Z or.b64 %rd918, %rd915, %rd917; 2026-02-21T09:34:05.6927438Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6927505Z mov.b64 {%r2517, %r2518}, %rd918; 2026-02-21T09:34:05.6927578Z cvt.rn.f16x2.f32 %r2519, %r2518, %r2517; 2026-02-21T09:34:05.6927641Z mov.b64 {%r2520, %r2521}, %rd358; 2026-02-21T09:34:05.6927723Z cvt.rn.f16x2.f32 %r2522, %r2521, %r2520; 2026-02-21T09:34:05.6927788Z mov.b64 {%r2523, %r2524}, %rd362; 
2026-02-21T09:34:05.6927861Z cvt.rn.f16x2.f32 %r2525, %r2524, %r2523; 2026-02-21T09:34:05.6928057Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6928122Z cvt.u64.u32 %rd919, %r1299; 2026-02-21T09:34:05.6928186Z cvt.u64.u32 %rd920, %r1300; 2026-02-21T09:34:05.6928257Z shl.b64 %rd921, %rd920, 32; 2026-02-21T09:34:05.6928324Z or.b64 %rd922, %rd919, %rd921; 2026-02-21T09:34:05.6928513Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6928576Z mov.b64 {%r2526, %r2527}, %rd922; 2026-02-21T09:34:05.6928656Z cvt.rn.f16x2.f32 %r2528, %r2527, %r2526; 2026-02-21T09:34:05.6928844Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6928909Z cvt.u64.u32 %rd923, %r1301; 2026-02-21T09:34:05.6928981Z cvt.u64.u32 %rd924, %r1302; 2026-02-21T09:34:05.6929047Z shl.b64 %rd925, %rd924, 32; 2026-02-21T09:34:05.6929115Z or.b64 %rd926, %rd923, %rd925; 2026-02-21T09:34:05.6929313Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6929379Z mov.b64 {%r2529, %r2530}, %rd926; 2026-02-21T09:34:05.6929451Z cvt.rn.f16x2.f32 %r2531, %r2530, %r2529; 2026-02-21T09:34:05.6929515Z mov.b64 {%r2532, %r2533}, %rd366; 2026-02-21T09:34:05.6929594Z cvt.rn.f16x2.f32 %r2534, %r2533, %r2532; 2026-02-21T09:34:05.6929658Z mov.b64 {%r2535, %r2536}, %rd370; 2026-02-21T09:34:05.6929730Z cvt.rn.f16x2.f32 %r2537, %r2536, %r2535; 2026-02-21T09:34:05.6929923Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6929989Z cvt.u64.u32 %rd927, %r1308; 2026-02-21T09:34:05.6930053Z cvt.u64.u32 %rd928, %r1309; 2026-02-21T09:34:05.6930124Z shl.b64 %rd929, %rd928, 32; 2026-02-21T09:34:05.6930193Z or.b64 %rd930, %rd927, %rd929; 2026-02-21T09:34:05.6930383Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6930494Z mov.b64 {%r2538, %r2539}, %rd930; 
2026-02-21T09:34:05.6930573Z cvt.rn.f16x2.f32 %r2540, %r2539, %r2538; 2026-02-21T09:34:05.6930763Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6930826Z cvt.u64.u32 %rd931, %r1310; 2026-02-21T09:34:05.6930897Z cvt.u64.u32 %rd932, %r1311; 2026-02-21T09:34:05.6930961Z shl.b64 %rd933, %rd932, 32; 2026-02-21T09:34:05.6931026Z or.b64 %rd934, %rd931, %rd933; 2026-02-21T09:34:05.6931217Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6931282Z mov.b64 {%r2541, %r2542}, %rd934; 2026-02-21T09:34:05.6931355Z cvt.rn.f16x2.f32 %r2543, %r2542, %r2541; 2026-02-21T09:34:05.6931419Z mov.b64 {%r2544, %r2545}, %rd374; 2026-02-21T09:34:05.6931549Z cvt.rn.f16x2.f32 %r2546, %r2545, %r2544; 2026-02-21T09:34:05.6931618Z mov.b64 {%r2547, %r2548}, %rd378; 2026-02-21T09:34:05.6931692Z cvt.rn.f16x2.f32 %r2549, %r2548, %r2547; 2026-02-21T09:34:05.6931885Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6931952Z cvt.u64.u32 %rd935, %r1316; 2026-02-21T09:34:05.6932015Z cvt.u64.u32 %rd936, %r1317; 2026-02-21T09:34:05.6932078Z shl.b64 %rd937, %rd936, 32; 2026-02-21T09:34:05.6932151Z or.b64 %rd938, %rd935, %rd937; 2026-02-21T09:34:05.6932335Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6932400Z mov.b64 {%r2550, %r2551}, %rd938; 2026-02-21T09:34:05.6932480Z cvt.rn.f16x2.f32 %r2552, %r2551, %r2550; 2026-02-21T09:34:05.6932665Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6932731Z cvt.u64.u32 %rd939, %r1318; 2026-02-21T09:34:05.6932805Z cvt.u64.u32 %rd940, %r1319; 2026-02-21T09:34:05.6932869Z shl.b64 %rd941, %rd940, 32; 2026-02-21T09:34:05.6932935Z or.b64 %rd942, %rd939, %rd941; 2026-02-21T09:34:05.6933132Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6933203Z mov.b64 {%r2553, %r2554}, %rd942; 
2026-02-21T09:34:05.6933275Z cvt.rn.f16x2.f32 %r2555, %r2554, %r2553; 2026-02-21T09:34:05.6933338Z mov.b64 {%r2556, %r2557}, %rd382; 2026-02-21T09:34:05.6933418Z cvt.rn.f16x2.f32 %r2558, %r2557, %r2556; 2026-02-21T09:34:05.6933481Z mov.b64 {%r2559, %r2560}, %rd386; 2026-02-21T09:34:05.6933551Z cvt.rn.f16x2.f32 %r2561, %r2560, %r2559; 2026-02-21T09:34:05.6933749Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6933813Z cvt.u64.u32 %rd943, %r1325; 2026-02-21T09:34:05.6933876Z cvt.u64.u32 %rd944, %r1326; 2026-02-21T09:34:05.6933943Z shl.b64 %rd945, %rd944, 32; 2026-02-21T09:34:05.6934018Z or.b64 %rd946, %rd943, %rd945; 2026-02-21T09:34:05.6934208Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6934275Z mov.b64 {%r2562, %r2563}, %rd946; 2026-02-21T09:34:05.6934355Z cvt.rn.f16x2.f32 %r2564, %r2563, %r2562; 2026-02-21T09:34:05.6934543Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6934608Z cvt.u64.u32 %rd947, %r1327; 2026-02-21T09:34:05.6934721Z cvt.u64.u32 %rd948, %r1328; 2026-02-21T09:34:05.6934785Z shl.b64 %rd949, %rd948, 32; 2026-02-21T09:34:05.6934851Z or.b64 %rd950, %rd947, %rd949; 2026-02-21T09:34:05.6935036Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6935112Z mov.b64 {%r2565, %r2566}, %rd950; 2026-02-21T09:34:05.6935185Z cvt.rn.f16x2.f32 %r2567, %r2566, %r2565; 2026-02-21T09:34:05.6935251Z mov.b64 {%r2568, %r2569}, %rd390; 2026-02-21T09:34:05.6935332Z cvt.rn.f16x2.f32 %r2570, %r2569, %r2568; 2026-02-21T09:34:05.6935455Z mov.b64 {%r2571, %r2572}, %rd394; 2026-02-21T09:34:05.6935566Z cvt.rn.f16x2.f32 %r2573, %r2572, %r2571; 2026-02-21T09:34:05.6935774Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6935844Z cvt.u64.u32 %rd951, %r1333; 2026-02-21T09:34:05.6935912Z cvt.u64.u32 %rd952, %r1334; 
2026-02-21T09:34:05.6935979Z shl.b64 %rd953, %rd952, 32; 2026-02-21T09:34:05.6936060Z or.b64 %rd954, %rd951, %rd953; 2026-02-21T09:34:05.6936254Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6936329Z mov.b64 {%r2574, %r2575}, %rd954; 2026-02-21T09:34:05.6936415Z cvt.rn.f16x2.f32 %r2576, %r2575, %r2574; 2026-02-21T09:34:05.6936616Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6936717Z cvt.u64.u32 %rd955, %r1335; 2026-02-21T09:34:05.6936828Z cvt.u64.u32 %rd956, %r1336; 2026-02-21T09:34:05.6936904Z shl.b64 %rd957, %rd956, 32; 2026-02-21T09:34:05.6936981Z or.b64 %rd958, %rd955, %rd957; 2026-02-21T09:34:05.6937189Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6937269Z mov.b64 {%r2577, %r2578}, %rd958; 2026-02-21T09:34:05.6937345Z cvt.rn.f16x2.f32 %r2579, %r2578, %r2577; 2026-02-21T09:34:05.6937411Z mov.b64 {%r2580, %r2581}, %rd398; 2026-02-21T09:34:05.6937499Z cvt.rn.f16x2.f32 %r2582, %r2581, %r2580; 2026-02-21T09:34:05.6937562Z mov.b64 {%r2583, %r2584}, %rd402; 2026-02-21T09:34:05.6937635Z cvt.rn.f16x2.f32 %r2585, %r2584, %r2583; 2026-02-21T09:34:05.6937830Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6937894Z cvt.u64.u32 %rd959, %r1342; 2026-02-21T09:34:05.6937956Z cvt.u64.u32 %rd960, %r1343; 2026-02-21T09:34:05.6938022Z shl.b64 %rd961, %rd960, 32; 2026-02-21T09:34:05.6938096Z or.b64 %rd962, %rd959, %rd961; 2026-02-21T09:34:05.6938286Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6938351Z mov.b64 {%r2586, %r2587}, %rd962; 2026-02-21T09:34:05.6938433Z cvt.rn.f16x2.f32 %r2588, %r2587, %r2586; 2026-02-21T09:34:05.6938625Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6938690Z cvt.u64.u32 %rd963, %r1344; 2026-02-21T09:34:05.6938760Z cvt.u64.u32 %rd964, %r1345; 
2026-02-21T09:34:05.6938824Z shl.b64 %rd965, %rd964, 32; 2026-02-21T09:34:05.6938888Z or.b64 %rd966, %rd963, %rd965; 2026-02-21T09:34:05.6939077Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6939149Z mov.b64 {%r2589, %r2590}, %rd966; 2026-02-21T09:34:05.6939224Z cvt.rn.f16x2.f32 %r2591, %r2590, %r2589; 2026-02-21T09:34:05.6939292Z mov.b64 {%r2592, %r2593}, %rd406; 2026-02-21T09:34:05.6939374Z cvt.rn.f16x2.f32 %r2594, %r2593, %r2592; 2026-02-21T09:34:05.6939440Z mov.b64 {%r2595, %r2596}, %rd410; 2026-02-21T09:34:05.6939513Z cvt.rn.f16x2.f32 %r2597, %r2596, %r2595; 2026-02-21T09:34:05.6939708Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6939773Z cvt.u64.u32 %rd967, %r1350; 2026-02-21T09:34:05.6939837Z cvt.u64.u32 %rd968, %r1351; 2026-02-21T09:34:05.6939899Z shl.b64 %rd969, %rd968, 32; 2026-02-21T09:34:05.6939973Z or.b64 %rd970, %rd967, %rd969; 2026-02-21T09:34:05.6940161Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6940226Z mov.b64 {%r2598, %r2599}, %rd970; 2026-02-21T09:34:05.6940307Z cvt.rn.f16x2.f32 %r2600, %r2599, %r2598; 2026-02-21T09:34:05.6940499Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6940564Z cvt.u64.u32 %rd971, %r1352; 2026-02-21T09:34:05.6940662Z cvt.u64.u32 %rd972, %r1353; 2026-02-21T09:34:05.6940750Z shl.b64 %rd973, %rd972, 32; 2026-02-21T09:34:05.6940815Z or.b64 %rd974, %rd971, %rd973; 2026-02-21T09:34:05.6941001Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6941073Z mov.b64 {%r2601, %r2602}, %rd974; 2026-02-21T09:34:05.6941146Z cvt.rn.f16x2.f32 %r2603, %r2602, %r2601; 2026-02-21T09:34:05.6941210Z mov.b64 {%r2604, %r2605}, %rd414; 2026-02-21T09:34:05.6941290Z cvt.rn.f16x2.f32 %r2606, %r2605, %r2604; 2026-02-21T09:34:05.6941353Z mov.b64 {%r2607, %r2608}, %rd418; 
2026-02-21T09:34:05.6941425Z cvt.rn.f16x2.f32 %r2609, %r2608, %r2607; 2026-02-21T09:34:05.6941621Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6941684Z cvt.u64.u32 %rd975, %r1359; 2026-02-21T09:34:05.6941771Z cvt.u64.u32 %rd976, %r1360; 2026-02-21T09:34:05.6941861Z shl.b64 %rd977, %rd976, 32; 2026-02-21T09:34:05.6941938Z or.b64 %rd978, %rd975, %rd977; 2026-02-21T09:34:05.6942129Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6942195Z mov.b64 {%r2610, %r2611}, %rd978; 2026-02-21T09:34:05.6942277Z cvt.rn.f16x2.f32 %r2612, %r2611, %r2610; 2026-02-21T09:34:05.6942466Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6942530Z cvt.u64.u32 %rd979, %r1361; 2026-02-21T09:34:05.6942602Z cvt.u64.u32 %rd980, %r1362; 2026-02-21T09:34:05.6942665Z shl.b64 %rd981, %rd980, 32; 2026-02-21T09:34:05.6942731Z or.b64 %rd982, %rd979, %rd981; 2026-02-21T09:34:05.6942919Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6942992Z mov.b64 {%r2613, %r2614}, %rd982; 2026-02-21T09:34:05.6943065Z cvt.rn.f16x2.f32 %r2615, %r2614, %r2613; 2026-02-21T09:34:05.6943130Z mov.b64 {%r2616, %r2617}, %rd422; 2026-02-21T09:34:05.6943211Z cvt.rn.f16x2.f32 %r2618, %r2617, %r2616; 2026-02-21T09:34:05.6943277Z mov.b64 {%r2619, %r2620}, %rd426; 2026-02-21T09:34:05.6943348Z cvt.rn.f16x2.f32 %r2621, %r2620, %r2619; 2026-02-21T09:34:05.6943544Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6943609Z cvt.u64.u32 %rd983, %r1367; 2026-02-21T09:34:05.6943672Z cvt.u64.u32 %rd984, %r1368; 2026-02-21T09:34:05.6943736Z shl.b64 %rd985, %rd984, 32; 2026-02-21T09:34:05.6943816Z or.b64 %rd986, %rd983, %rd985; 2026-02-21T09:34:05.6943987Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6944047Z mov.b64 {%r2622, %r2623}, %rd986; 
2026-02-21T09:34:05.6944122Z cvt.rn.f16x2.f32 %r2624, %r2623, %r2622; 2026-02-21T09:34:05.6944303Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6944365Z cvt.u64.u32 %rd987, %r1369; 2026-02-21T09:34:05.6944435Z cvt.u64.u32 %rd988, %r1370; 2026-02-21T09:34:05.6944496Z shl.b64 %rd989, %rd988, 32; 2026-02-21T09:34:05.6944562Z or.b64 %rd990, %rd987, %rd989; 2026-02-21T09:34:05.6944793Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6944867Z mov.b64 {%r2625, %r2626}, %rd990; 2026-02-21T09:34:05.6944941Z cvt.rn.f16x2.f32 %r2627, %r2626, %r2625; 2026-02-21T09:34:05.6945006Z mov.b64 {%r2628, %r2629}, %rd430; 2026-02-21T09:34:05.6945086Z cvt.rn.f16x2.f32 %r2630, %r2629, %r2628; 2026-02-21T09:34:05.6945152Z mov.b64 {%r2631, %r2632}, %rd434; 2026-02-21T09:34:05.6945223Z cvt.rn.f16x2.f32 %r2633, %r2632, %r2631; 2026-02-21T09:34:05.6945417Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6945483Z cvt.u64.u32 %rd991, %r1376; 2026-02-21T09:34:05.6945550Z cvt.u64.u32 %rd992, %r1377; 2026-02-21T09:34:05.6945665Z shl.b64 %rd993, %rd992, 32; 2026-02-21T09:34:05.6945773Z or.b64 %rd994, %rd991, %rd993; 2026-02-21T09:34:05.6945962Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6946029Z mov.b64 {%r2634, %r2635}, %rd994; 2026-02-21T09:34:05.6946116Z cvt.rn.f16x2.f32 %r2636, %r2635, %r2634; 2026-02-21T09:34:05.6946310Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6946372Z cvt.u64.u32 %rd995, %r1378; 2026-02-21T09:34:05.6946447Z cvt.u64.u32 %rd996, %r1379; 2026-02-21T09:34:05.6946509Z shl.b64 %rd997, %rd996, 32; 2026-02-21T09:34:05.6946568Z or.b64 %rd998, %rd995, %rd997; 2026-02-21T09:34:05.6946740Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6946809Z mov.b64 {%r2637, %r2638}, %rd998; 
2026-02-21T09:34:05.6946928Z cvt.rn.f16x2.f32 %r2639, %r2638, %r2637; 2026-02-21T09:34:05.6946992Z mov.b64 {%r2640, %r2641}, %rd438; 2026-02-21T09:34:05.6947066Z cvt.rn.f16x2.f32 %r2642, %r2641, %r2640; 2026-02-21T09:34:05.6947125Z mov.b64 {%r2643, %r2644}, %rd442; 2026-02-21T09:34:05.6947192Z cvt.rn.f16x2.f32 %r2645, %r2644, %r2643; 2026-02-21T09:34:05.6947359Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6947426Z cvt.u64.u32 %rd999, %r1384; 2026-02-21T09:34:05.6947489Z cvt.u64.u32 %rd1000, %r1385; 2026-02-21T09:34:05.6947552Z shl.b64 %rd1001, %rd1000, 32; 2026-02-21T09:34:05.6947626Z or.b64 %rd1002, %rd999, %rd1001; 2026-02-21T09:34:05.6947795Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6947859Z mov.b64 {%r2646, %r2647}, %rd1002; 2026-02-21T09:34:05.6947933Z cvt.rn.f16x2.f32 %r2648, %r2647, %r2646; 2026-02-21T09:34:05.6948101Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6948169Z cvt.u64.u32 %rd1003, %r1386; 2026-02-21T09:34:05.6948230Z cvt.u64.u32 %rd1004, %r1387; 2026-02-21T09:34:05.6948300Z shl.b64 %rd1005, %rd1004, 32; 2026-02-21T09:34:05.6948362Z or.b64 %rd1006, %rd1003, %rd1005; 2026-02-21T09:34:05.6948529Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6948598Z mov.b64 {%r2649, %r2650}, %rd1006; 2026-02-21T09:34:05.6948667Z cvt.rn.f16x2.f32 %r2651, %r2650, %r2649; 2026-02-21T09:34:05.6948726Z mov.b64 {%r2652, %r2653}, %rd446; 2026-02-21T09:34:05.6948801Z cvt.rn.f16x2.f32 %r2654, %r2653, %r2652; 2026-02-21T09:34:05.6948861Z mov.b64 {%r2655, %r2656}, %rd450; 2026-02-21T09:34:05.6948928Z cvt.rn.f16x2.f32 %r2657, %r2656, %r2655; 2026-02-21T09:34:05.6949097Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6949168Z cvt.u64.u32 %rd1007, %r1393; 2026-02-21T09:34:05.6949230Z cvt.u64.u32 %rd1008, %r1394; 
2026-02-21T09:34:05.6949293Z shl.b64 %rd1009, %rd1008, 32; 2026-02-21T09:34:05.6949361Z or.b64 %rd1010, %rd1007, %rd1009; 2026-02-21T09:34:05.6949527Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6949590Z mov.b64 {%r2658, %r2659}, %rd1010; 2026-02-21T09:34:05.6949664Z cvt.rn.f16x2.f32 %r2660, %r2659, %r2658; 2026-02-21T09:34:05.6949830Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6949891Z cvt.u64.u32 %rd1011, %r1395; 2026-02-21T09:34:05.6949950Z cvt.u64.u32 %rd1012, %r1396; 2026-02-21T09:34:05.6950019Z shl.b64 %rd1013, %rd1012, 32; 2026-02-21T09:34:05.6950078Z or.b64 %rd1014, %rd1011, %rd1013; 2026-02-21T09:34:05.6950242Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6950314Z mov.b64 {%r2661, %r2662}, %rd1014; 2026-02-21T09:34:05.6950405Z cvt.rn.f16x2.f32 %r2663, %r2662, %r2661; 2026-02-21T09:34:05.6950486Z mov.b64 {%r2664, %r2665}, %rd454; 2026-02-21T09:34:05.6950559Z cvt.rn.f16x2.f32 %r2666, %r2665, %r2664; 2026-02-21T09:34:05.6950619Z mov.b64 {%r2667, %r2668}, %rd458; 2026-02-21T09:34:05.6950686Z cvt.rn.f16x2.f32 %r2669, %r2668, %r2667; 2026-02-21T09:34:05.6950856Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6950923Z cvt.u64.u32 %rd1015, %r1401; 2026-02-21T09:34:05.6950984Z cvt.u64.u32 %rd1016, %r1402; 2026-02-21T09:34:05.6951044Z shl.b64 %rd1017, %rd1016, 32; 2026-02-21T09:34:05.6951112Z or.b64 %rd1018, %rd1015, %rd1017; 2026-02-21T09:34:05.6951283Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6951343Z mov.b64 {%r2670, %r2671}, %rd1018; 2026-02-21T09:34:05.6951443Z cvt.rn.f16x2.f32 %r2672, %r2671, %r2670; 2026-02-21T09:34:05.6951642Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6951708Z cvt.u64.u32 %rd1019, %r1403; 2026-02-21T09:34:05.6951769Z 
cvt.u64.u32 %rd1020, %r1404; 2026-02-21T09:34:05.6951836Z shl.b64 %rd1021, %rd1020, 32; 2026-02-21T09:34:05.6951897Z or.b64 %rd1022, %rd1019, %rd1021; 2026-02-21T09:34:05.6952070Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6952140Z mov.b64 {%r2673, %r2674}, %rd1022; 2026-02-21T09:34:05.6952208Z cvt.rn.f16x2.f32 %r2675, %r2674, %r2673; 2026-02-21T09:34:05.6952267Z mov.b64 {%r2676, %r2677}, %rd462; 2026-02-21T09:34:05.6952342Z cvt.rn.f16x2.f32 %r2678, %r2677, %r2676; 2026-02-21T09:34:05.6952401Z mov.b64 {%r2679, %r2680}, %rd466; 2026-02-21T09:34:05.6952467Z cvt.rn.f16x2.f32 %r2681, %r2680, %r2679; 2026-02-21T09:34:05.6952643Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6952710Z cvt.u64.u32 %rd1023, %r1410; 2026-02-21T09:34:05.6952771Z cvt.u64.u32 %rd1024, %r1411; 2026-02-21T09:34:05.6952832Z shl.b64 %rd1025, %rd1024, 32; 2026-02-21T09:34:05.6952900Z or.b64 %rd1026, %rd1023, %rd1025; 2026-02-21T09:34:05.6953072Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6953134Z mov.b64 {%r2682, %r2683}, %rd1026; 2026-02-21T09:34:05.6953210Z cvt.rn.f16x2.f32 %r2684, %r2683, %r2682; 2026-02-21T09:34:05.6953382Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6953442Z cvt.u64.u32 %rd1027, %r1412; 2026-02-21T09:34:05.6953502Z cvt.u64.u32 %rd1028, %r1413; 2026-02-21T09:34:05.6953573Z shl.b64 %rd1029, %rd1028, 32; 2026-02-21T09:34:05.6953633Z or.b64 %rd1030, %rd1027, %rd1029; 2026-02-21T09:34:05.6953808Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6953880Z mov.b64 {%r2685, %r2686}, %rd1030; 2026-02-21T09:34:05.6953949Z cvt.rn.f16x2.f32 %r2687, %r2686, %r2685; 2026-02-21T09:34:05.6954009Z mov.b64 {%r2688, %r2689}, %rd470; 2026-02-21T09:34:05.6954085Z cvt.rn.f16x2.f32 %r2690, %r2689, %r2688; 2026-02-21T09:34:05.6954146Z mov.b64 
{%r2691, %r2692}, %rd474; 2026-02-21T09:34:05.6954215Z cvt.rn.f16x2.f32 %r2693, %r2692, %r2691; 2026-02-21T09:34:05.6954389Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6954460Z cvt.u64.u32 %rd1031, %r1418; 2026-02-21T09:34:05.6954521Z cvt.u64.u32 %rd1032, %r1419; 2026-02-21T09:34:05.6954586Z shl.b64 %rd1033, %rd1032, 32; 2026-02-21T09:34:05.6954654Z or.b64 %rd1034, %rd1031, %rd1033; 2026-02-21T09:34:05.6954856Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6954920Z mov.b64 {%r2694, %r2695}, %rd1034; 2026-02-21T09:34:05.6954996Z cvt.rn.f16x2.f32 %r2696, %r2695, %r2694; 2026-02-21T09:34:05.6955216Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6955310Z cvt.u64.u32 %rd1035, %r1420; 2026-02-21T09:34:05.6955374Z cvt.u64.u32 %rd1036, %r1421; 2026-02-21T09:34:05.6955446Z shl.b64 %rd1037, %rd1036, 32; 2026-02-21T09:34:05.6955509Z or.b64 %rd1038, %rd1035, %rd1037; 2026-02-21T09:34:05.6955690Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6955763Z mov.b64 {%r2697, %r2698}, %rd1038; 2026-02-21T09:34:05.6955837Z cvt.rn.f16x2.f32 %r2699, %r2698, %r2697; 2026-02-21T09:34:05.6955901Z mov.b64 {%r2700, %r2701}, %rd478; 2026-02-21T09:34:05.6955979Z cvt.rn.f16x2.f32 %r2702, %r2701, %r2700; 2026-02-21T09:34:05.6956043Z mov.b64 {%r2703, %r2704}, %rd482; 2026-02-21T09:34:05.6956114Z cvt.rn.f16x2.f32 %r2705, %r2704, %r2703; 2026-02-21T09:34:05.6956373Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6956454Z cvt.u64.u32 %rd1039, %r1427; 2026-02-21T09:34:05.6956519Z cvt.u64.u32 %rd1040, %r1428; 2026-02-21T09:34:05.6956585Z shl.b64 %rd1041, %rd1040, 32; 2026-02-21T09:34:05.6956660Z or.b64 %rd1042, %rd1039, %rd1041; 2026-02-21T09:34:05.6956849Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 
2026-02-21T09:34:05.6956916Z mov.b64 {%r2706, %r2707}, %rd1042; 2026-02-21T09:34:05.6956997Z cvt.rn.f16x2.f32 %r2708, %r2707, %r2706; 2026-02-21T09:34:05.6957190Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6957256Z cvt.u64.u32 %rd1043, %r1429; 2026-02-21T09:34:05.6957320Z cvt.u64.u32 %rd1044, %r1430; 2026-02-21T09:34:05.6957394Z shl.b64 %rd1045, %rd1044, 32; 2026-02-21T09:34:05.6957461Z or.b64 %rd1046, %rd1043, %rd1045; 2026-02-21T09:34:05.6957650Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6957729Z mov.b64 {%r2709, %r2710}, %rd1046; 2026-02-21T09:34:05.6957802Z cvt.rn.f16x2.f32 %r2711, %r2710, %r2709; 2026-02-21T09:34:05.6957867Z mov.b64 {%r2712, %r2713}, %rd486; 2026-02-21T09:34:05.6957947Z cvt.rn.f16x2.f32 %r2714, %r2713, %r2712; 2026-02-21T09:34:05.6958013Z mov.b64 {%r2715, %r2716}, %rd490; 2026-02-21T09:34:05.6958085Z cvt.rn.f16x2.f32 %r2717, %r2716, %r2715; 2026-02-21T09:34:05.6958270Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6958346Z cvt.u64.u32 %rd1047, %r1435; 2026-02-21T09:34:05.6958412Z cvt.u64.u32 %rd1048, %r1436; 2026-02-21T09:34:05.6958478Z shl.b64 %rd1049, %rd1048, 32; 2026-02-21T09:34:05.6958554Z or.b64 %rd1050, %rd1047, %rd1049; 2026-02-21T09:34:05.6958742Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6958813Z mov.b64 {%r2718, %r2719}, %rd1050; 2026-02-21T09:34:05.6958897Z cvt.rn.f16x2.f32 %r2720, %r2719, %r2718; 2026-02-21T09:34:05.6959086Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6959151Z cvt.u64.u32 %rd1051, %r1437; 2026-02-21T09:34:05.6959215Z cvt.u64.u32 %rd1052, %r1438; 2026-02-21T09:34:05.6959289Z shl.b64 %rd1053, %rd1052, 32; 2026-02-21T09:34:05.6959355Z or.b64 %rd1054, %rd1051, %rd1053; 2026-02-21T09:34:05.6959540Z .loc 1 48 27 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6959613Z mov.b64 {%r2721, %r2722}, %rd1054; 2026-02-21T09:34:05.6959687Z cvt.rn.f16x2.f32 %r2723, %r2722, %r2721; 2026-02-21T09:34:05.6959751Z mov.b64 {%r2724, %r2725}, %rd494; 2026-02-21T09:34:05.6959829Z cvt.rn.f16x2.f32 %r2726, %r2725, %r2724; 2026-02-21T09:34:05.6959894Z mov.b64 {%r2727, %r2728}, %rd498; 2026-02-21T09:34:05.6959970Z cvt.rn.f16x2.f32 %r2729, %r2728, %r2727; 2026-02-21T09:34:05.6960190Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6960290Z cvt.u64.u32 %rd1055, %r1444; 2026-02-21T09:34:05.6960356Z cvt.u64.u32 %rd1056, %r1445; 2026-02-21T09:34:05.6960421Z shl.b64 %rd1057, %rd1056, 32; 2026-02-21T09:34:05.6960492Z or.b64 %rd1058, %rd1055, %rd1057; 2026-02-21T09:34:05.6960679Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6960746Z mov.b64 {%r2730, %r2731}, %rd1058; 2026-02-21T09:34:05.6960828Z cvt.rn.f16x2.f32 %r2732, %r2731, %r2730; 2026-02-21T09:34:05.6961012Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6961078Z cvt.u64.u32 %rd1059, %r1446; 2026-02-21T09:34:05.6961142Z cvt.u64.u32 %rd1060, %r1447; 2026-02-21T09:34:05.6961215Z shl.b64 %rd1061, %rd1060, 32; 2026-02-21T09:34:05.6961344Z or.b64 %rd1062, %rd1059, %rd1061; 2026-02-21T09:34:05.6961532Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6961607Z mov.b64 {%r2733, %r2734}, %rd1062; 2026-02-21T09:34:05.6961681Z cvt.rn.f16x2.f32 %r2735, %r2734, %r2733; 2026-02-21T09:34:05.6961744Z mov.b64 {%r2736, %r2737}, %rd502; 2026-02-21T09:34:05.6961822Z cvt.rn.f16x2.f32 %r2738, %r2737, %r2736; 2026-02-21T09:34:05.6961887Z mov.b64 {%r2739, %r2740}, %rd506; 2026-02-21T09:34:05.6961958Z cvt.rn.f16x2.f32 %r2741, %r2740, %r2739; 2026-02-21T09:34:05.6962140Z .loc 1 46 52 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6962213Z cvt.u64.u32 %rd1063, %r1452; 2026-02-21T09:34:05.6962276Z cvt.u64.u32 %rd1064, %r1453; 2026-02-21T09:34:05.6962341Z shl.b64 %rd1065, %rd1064, 32; 2026-02-21T09:34:05.6962412Z or.b64 %rd1066, %rd1063, %rd1065; 2026-02-21T09:34:05.6962600Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6962667Z mov.b64 {%r2742, %r2743}, %rd1066; 2026-02-21T09:34:05.6962748Z cvt.rn.f16x2.f32 %r2744, %r2743, %r2742; 2026-02-21T09:34:05.6962933Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6962997Z cvt.u64.u32 %rd1067, %r1454; 2026-02-21T09:34:05.6963062Z cvt.u64.u32 %rd1068, %r1455; 2026-02-21T09:34:05.6963137Z shl.b64 %rd1069, %rd1068, 32; 2026-02-21T09:34:05.6963201Z or.b64 %rd1070, %rd1067, %rd1069; 2026-02-21T09:34:05.6963385Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6963460Z mov.b64 {%r2745, %r2746}, %rd1070; 2026-02-21T09:34:05.6963532Z cvt.rn.f16x2.f32 %r2747, %r2746, %r2745; 2026-02-21T09:34:05.6963596Z mov.b64 {%r2748, %r2749}, %rd510; 2026-02-21T09:34:05.6963680Z cvt.rn.f16x2.f32 %r2750, %r2749, %r2748; 2026-02-21T09:34:05.6963749Z mov.b64 {%r2751, %r2752}, %rd514; 2026-02-21T09:34:05.6963823Z cvt.rn.f16x2.f32 %r2753, %r2752, %r2751; 2026-02-21T09:34:05.6964010Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6964096Z cvt.u64.u32 %rd1071, %r1461; 2026-02-21T09:34:05.6964162Z cvt.u64.u32 %rd1072, %r1462; 2026-02-21T09:34:05.6964228Z shl.b64 %rd1073, %rd1072, 32; 2026-02-21T09:34:05.6964300Z or.b64 %rd1074, %rd1071, %rd1073; 2026-02-21T09:34:05.6964483Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6964548Z mov.b64 {%r2754, %r2755}, %rd1074; 2026-02-21T09:34:05.6964620Z cvt.rn.f16x2.f32 %r2756, %r2755, %r2754; 
2026-02-21T09:34:05.6964887Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6964952Z cvt.u64.u32 %rd1075, %r1463; 2026-02-21T09:34:05.6965017Z cvt.u64.u32 %rd1076, %r1464; 2026-02-21T09:34:05.6965093Z shl.b64 %rd1077, %rd1076, 32; 2026-02-21T09:34:05.6965188Z or.b64 %rd1078, %rd1075, %rd1077; 2026-02-21T09:34:05.6965399Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6965471Z mov.b64 {%r2757, %r2758}, %rd1078; 2026-02-21T09:34:05.6965545Z cvt.rn.f16x2.f32 %r2759, %r2758, %r2757; 2026-02-21T09:34:05.6965608Z mov.b64 {%r2760, %r2761}, %rd518; 2026-02-21T09:34:05.6965681Z cvt.rn.f16x2.f32 %r2762, %r2761, %r2760; 2026-02-21T09:34:05.6965752Z mov.b64 {%r2763, %r2764}, %rd522; 2026-02-21T09:34:05.6965823Z cvt.rn.f16x2.f32 %r2765, %r2764, %r2763; 2026-02-21T09:34:05.6966009Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6966081Z cvt.u64.u32 %rd1079, %r1469; 2026-02-21T09:34:05.6966145Z cvt.u64.u32 %rd1080, %r1470; 2026-02-21T09:34:05.6966210Z shl.b64 %rd1081, %rd1080, 32; 2026-02-21T09:34:05.6966310Z or.b64 %rd1082, %rd1079, %rd1081; 2026-02-21T09:34:05.6966523Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6966593Z mov.b64 {%r2766, %r2767}, %rd1082; 2026-02-21T09:34:05.6966665Z cvt.rn.f16x2.f32 %r2768, %r2767, %r2766; 2026-02-21T09:34:05.6966855Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6966920Z cvt.u64.u32 %rd1083, %r1471; 2026-02-21T09:34:05.6966983Z cvt.u64.u32 %rd1084, %r1472; 2026-02-21T09:34:05.6967055Z shl.b64 %rd1085, %rd1084, 32; 2026-02-21T09:34:05.6967120Z or.b64 %rd1086, %rd1083, %rd1085; 2026-02-21T09:34:05.6967306Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6967379Z mov.b64 {%r2769, %r2770}, %rd1086; 2026-02-21T09:34:05.6967451Z cvt.rn.f16x2.f32 
%r2771, %r2770, %r2769; 2026-02-21T09:34:05.6967514Z mov.b64 {%r2772, %r2773}, %rd526; 2026-02-21T09:34:05.6967591Z cvt.rn.f16x2.f32 %r2774, %r2773, %r2772; 2026-02-21T09:34:05.6967665Z mov.b64 {%r2775, %r2776}, %rd530; 2026-02-21T09:34:05.6967738Z cvt.rn.f16x2.f32 %r2777, %r2776, %r2775; 2026-02-21T09:34:05.6967921Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6967994Z cvt.u64.u32 %rd1087, %r1478; 2026-02-21T09:34:05.6968061Z cvt.u64.u32 %rd1088, %r1479; 2026-02-21T09:34:05.6968127Z shl.b64 %rd1089, %rd1088, 32; 2026-02-21T09:34:05.6968199Z or.b64 %rd1090, %rd1087, %rd1089; 2026-02-21T09:34:05.6968381Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6968446Z mov.b64 {%r2778, %r2779}, %rd1090; 2026-02-21T09:34:05.6968518Z cvt.rn.f16x2.f32 %r2780, %r2779, %r2778; 2026-02-21T09:34:05.6968709Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6968776Z cvt.u64.u32 %rd1091, %r1480; 2026-02-21T09:34:05.6968841Z cvt.u64.u32 %rd1092, %r1481; 2026-02-21T09:34:05.6968916Z shl.b64 %rd1093, %rd1092, 32; 2026-02-21T09:34:05.6968982Z or.b64 %rd1094, %rd1091, %rd1093; 2026-02-21T09:34:05.6969166Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6969237Z mov.b64 {%r2781, %r2782}, %rd1094; 2026-02-21T09:34:05.6969310Z cvt.rn.f16x2.f32 %r2783, %r2782, %r2781; 2026-02-21T09:34:05.6969373Z mov.b64 {%r2784, %r2785}, %rd534; 2026-02-21T09:34:05.6969445Z cvt.rn.f16x2.f32 %r2786, %r2785, %r2784; 2026-02-21T09:34:05.6969515Z mov.b64 {%r2787, %r2788}, %rd538; 2026-02-21T09:34:05.6969585Z cvt.rn.f16x2.f32 %r2789, %r2788, %r2787; 2026-02-21T09:34:05.6969771Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6969843Z cvt.u64.u32 %rd1095, %r1486; 2026-02-21T09:34:05.6969907Z cvt.u64.u32 %rd1096, %r1487; 2026-02-21T09:34:05.6969975Z shl.b64 %rd1097, 
%rd1096, 32; 2026-02-21T09:34:05.6970046Z or.b64 %rd1098, %rd1095, %rd1097; 2026-02-21T09:34:05.6970256Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6970346Z mov.b64 {%r2790, %r2791}, %rd1098; 2026-02-21T09:34:05.6970417Z cvt.rn.f16x2.f32 %r2792, %r2791, %r2790; 2026-02-21T09:34:05.6970609Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6970673Z cvt.u64.u32 %rd1099, %r1488; 2026-02-21T09:34:05.6970736Z cvt.u64.u32 %rd1100, %r1489; 2026-02-21T09:34:05.6970808Z shl.b64 %rd1101, %rd1100, 32; 2026-02-21T09:34:05.6970871Z or.b64 %rd1102, %rd1099, %rd1101; 2026-02-21T09:34:05.6971056Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6971127Z mov.b64 {%r2793, %r2794}, %rd1102; 2026-02-21T09:34:05.6971199Z cvt.rn.f16x2.f32 %r2795, %r2794, %r2793; 2026-02-21T09:34:05.6971312Z mov.b64 {%r2796, %r2797}, %rd542; 2026-02-21T09:34:05.6971388Z cvt.rn.f16x2.f32 %r2798, %r2797, %r2796; 2026-02-21T09:34:05.6971471Z mov.b64 {%r2799, %r2800}, %rd546; 2026-02-21T09:34:05.6971536Z cvt.rn.f16x2.f32 %r2801, %r2800, %r2799; 2026-02-21T09:34:05.6971707Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6971776Z cvt.u64.u32 %rd1103, %r1495; 2026-02-21T09:34:05.6971835Z cvt.u64.u32 %rd1104, %r1496; 2026-02-21T09:34:05.6971895Z shl.b64 %rd1105, %rd1104, 32; 2026-02-21T09:34:05.6971962Z or.b64 %rd1106, %rd1103, %rd1105; 2026-02-21T09:34:05.6972133Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6972193Z mov.b64 {%r2802, %r2803}, %rd1106; 2026-02-21T09:34:05.6972261Z cvt.rn.f16x2.f32 %r2804, %r2803, %r2802; 2026-02-21T09:34:05.6972439Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6972503Z cvt.u64.u32 %rd1107, %r1497; 2026-02-21T09:34:05.6972564Z cvt.u64.u32 %rd1108, %r1498; 
2026-02-21T09:34:05.6972635Z shl.b64 %rd1109, %rd1108, 32; 2026-02-21T09:34:05.6972694Z or.b64 %rd1110, %rd1107, %rd1109; 2026-02-21T09:34:05.6972866Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6972956Z mov.b64 {%r2805, %r2806}, %rd1110; 2026-02-21T09:34:05.6973025Z cvt.rn.f16x2.f32 %r2807, %r2806, %r2805; 2026-02-21T09:34:05.6973084Z mov.b64 {%r2808, %r2809}, %rd550; 2026-02-21T09:34:05.6973153Z cvt.rn.f16x2.f32 %r2810, %r2809, %r2808; 2026-02-21T09:34:05.6973225Z mov.b64 {%r2811, %r2812}, %rd554; 2026-02-21T09:34:05.6973294Z cvt.rn.f16x2.f32 %r2813, %r2812, %r2811; 2026-02-21T09:34:05.6973465Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6973533Z cvt.u64.u32 %rd1111, %r1503; 2026-02-21T09:34:05.6973593Z cvt.u64.u32 %rd1112, %r1504; 2026-02-21T09:34:05.6973655Z shl.b64 %rd1113, %rd1112, 32; 2026-02-21T09:34:05.6973724Z or.b64 %rd1114, %rd1111, %rd1113; 2026-02-21T09:34:05.6973894Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6973955Z mov.b64 {%r2814, %r2815}, %rd1114; 2026-02-21T09:34:05.6974022Z cvt.rn.f16x2.f32 %r2816, %r2815, %r2814; 2026-02-21T09:34:05.6974201Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6974260Z cvt.u64.u32 %rd1115, %r1505; 2026-02-21T09:34:05.6974319Z cvt.u64.u32 %rd1116, %r1506; 2026-02-21T09:34:05.6974387Z shl.b64 %rd1117, %rd1116, 32; 2026-02-21T09:34:05.6974447Z or.b64 %rd1118, %rd1115, %rd1117; 2026-02-21T09:34:05.6974628Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6974818Z mov.b64 {%r2817, %r2818}, %rd1118; 2026-02-21T09:34:05.6974896Z cvt.rn.f16x2.f32 %r2819, %r2818, %r2817; 2026-02-21T09:34:05.6975001Z mov.b64 {%r2820, %r2821}, %rd558; 2026-02-21T09:34:05.6975109Z cvt.rn.f16x2.f32 %r2822, %r2821, %r2820; 2026-02-21T09:34:05.6975181Z mov.b64 {%r2823, %r2824}, %rd562; 
2026-02-21T09:34:05.6975253Z cvt.rn.f16x2.f32 %r2825, %r2824, %r2823; 2026-02-21T09:34:05.6975440Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6975514Z cvt.u64.u32 %rd1119, %r1512; 2026-02-21T09:34:05.6975579Z cvt.u64.u32 %rd1120, %r1513; 2026-02-21T09:34:05.6975644Z shl.b64 %rd1121, %rd1120, 32; 2026-02-21T09:34:05.6975715Z or.b64 %rd1122, %rd1119, %rd1121; 2026-02-21T09:34:05.6975906Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6975972Z mov.b64 {%r2826, %r2827}, %rd1122; 2026-02-21T09:34:05.6976046Z cvt.rn.f16x2.f32 %r2828, %r2827, %r2826; 2026-02-21T09:34:05.6976308Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6976372Z cvt.u64.u32 %rd1123, %r1514; 2026-02-21T09:34:05.6976432Z cvt.u64.u32 %rd1124, %r1515; 2026-02-21T09:34:05.6976502Z shl.b64 %rd1125, %rd1124, 32; 2026-02-21T09:34:05.6976562Z or.b64 %rd1126, %rd1123, %rd1125; 2026-02-21T09:34:05.6976734Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6976802Z mov.b64 {%r2829, %r2830}, %rd1126; 2026-02-21T09:34:05.6976868Z cvt.rn.f16x2.f32 %r2831, %r2830, %r2829; 2026-02-21T09:34:05.6976928Z mov.b64 {%r2832, %r2833}, %rd566; 2026-02-21T09:34:05.6976994Z cvt.rn.f16x2.f32 %r2834, %r2833, %r2832; 2026-02-21T09:34:05.6977060Z mov.b64 {%r2835, %r2836}, %rd570; 2026-02-21T09:34:05.6977126Z cvt.rn.f16x2.f32 %r2837, %r2836, %r2835; 2026-02-21T09:34:05.6977300Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6977372Z cvt.u64.u32 %rd1127, %r1520; 2026-02-21T09:34:05.6977434Z cvt.u64.u32 %rd1128, %r1521; 2026-02-21T09:34:05.6977496Z shl.b64 %rd1129, %rd1128, 32; 2026-02-21T09:34:05.6977563Z or.b64 %rd1130, %rd1127, %rd1129; 2026-02-21T09:34:05.6977732Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6977793Z mov.b64 
{%r2838, %r2839}, %rd1130; 2026-02-21T09:34:05.6977862Z cvt.rn.f16x2.f32 %r2840, %r2839, %r2838; 2026-02-21T09:34:05.6978046Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6978107Z cvt.u64.u32 %rd1131, %r1522; 2026-02-21T09:34:05.6978167Z cvt.u64.u32 %rd1132, %r1523; 2026-02-21T09:34:05.6978234Z shl.b64 %rd1133, %rd1132, 32; 2026-02-21T09:34:05.6978294Z or.b64 %rd1134, %rd1131, %rd1133; 2026-02-21T09:34:05.6978463Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6978532Z mov.b64 {%r2841, %r2842}, %rd1134; 2026-02-21T09:34:05.6978601Z cvt.rn.f16x2.f32 %r2843, %r2842, %r2841; 2026-02-21T09:34:05.6978662Z mov.b64 {%r2844, %r2845}, %rd574; 2026-02-21T09:34:05.6978727Z cvt.rn.f16x2.f32 %r2846, %r2845, %r2844; 2026-02-21T09:34:05.6978793Z mov.b64 {%r2847, %r2848}, %rd578; 2026-02-21T09:34:05.6978858Z cvt.rn.f16x2.f32 %r2849, %r2848, %r2847; 2026-02-21T09:34:05.6979030Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6979099Z cvt.u64.u32 %rd1135, %r1529; 2026-02-21T09:34:05.6979162Z cvt.u64.u32 %rd1136, %r1530; 2026-02-21T09:34:05.6979222Z shl.b64 %rd1137, %rd1136, 32; 2026-02-21T09:34:05.6979289Z or.b64 %rd1138, %rd1135, %rd1137; 2026-02-21T09:34:05.6979462Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6979522Z mov.b64 {%r2850, %r2851}, %rd1138; 2026-02-21T09:34:05.6979589Z cvt.rn.f16x2.f32 %r2852, %r2851, %r2850; 2026-02-21T09:34:05.6979769Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6979876Z cvt.u64.u32 %rd1139, %r1531; 2026-02-21T09:34:05.6979935Z cvt.u64.u32 %rd1140, %r1532; 2026-02-21T09:34:05.6980003Z shl.b64 %rd1141, %rd1140, 32; 2026-02-21T09:34:05.6980063Z or.b64 %rd1142, %rd1139, %rd1141; 2026-02-21T09:34:05.6980237Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 
2026-02-21T09:34:05.6980304Z mov.b64 {%r2853, %r2854}, %rd1142; 2026-02-21T09:34:05.6980370Z cvt.rn.f16x2.f32 %r2855, %r2854, %r2853; 2026-02-21T09:34:05.6980428Z mov.b64 {%r2856, %r2857}, %rd582; 2026-02-21T09:34:05.6980494Z cvt.rn.f16x2.f32 %r2858, %r2857, %r2856; 2026-02-21T09:34:05.6980561Z mov.b64 {%r2859, %r2860}, %rd586; 2026-02-21T09:34:05.6980626Z cvt.rn.f16x2.f32 %r2861, %r2860, %r2859; 2026-02-21T09:34:05.6980854Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6980929Z cvt.u64.u32 %rd1143, %r1537; 2026-02-21T09:34:05.6980990Z cvt.u64.u32 %rd1144, %r1538; 2026-02-21T09:34:05.6981052Z shl.b64 %rd1145, %rd1144, 32; 2026-02-21T09:34:05.6981110Z or.b64 %rd1146, %rd1143, %rd1145; 2026-02-21T09:34:05.6981290Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6981352Z mov.b64 {%r2862, %r2863}, %rd1146; 2026-02-21T09:34:05.6981420Z cvt.rn.f16x2.f32 %r2864, %r2863, %r2862; 2026-02-21T09:34:05.6981600Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6981662Z cvt.u64.u32 %rd1147, %r1539; 2026-02-21T09:34:05.6981723Z cvt.u64.u32 %rd1148, %r1540; 2026-02-21T09:34:05.6981795Z shl.b64 %rd1149, %rd1148, 32; 2026-02-21T09:34:05.6981858Z or.b64 %rd1150, %rd1147, %rd1149; 2026-02-21T09:34:05.6982031Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6982103Z mov.b64 {%r2865, %r2866}, %rd1150; 2026-02-21T09:34:05.6982173Z cvt.rn.f16x2.f32 %r2867, %r2866, %r2865; 2026-02-21T09:34:05.6982234Z mov.b64 {%r2868, %r2869}, %rd590; 2026-02-21T09:34:05.6982305Z cvt.rn.f16x2.f32 %r2870, %r2869, %r2868; 2026-02-21T09:34:05.6982374Z mov.b64 {%r2871, %r2872}, %rd594; 2026-02-21T09:34:05.6982441Z cvt.rn.f16x2.f32 %r2873, %r2872, %r2871; 2026-02-21T09:34:05.6982607Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6982676Z cvt.u64.u32 %rd1151, 
%r1546; 2026-02-21T09:34:05.6982735Z cvt.u64.u32 %rd1152, %r1547; 2026-02-21T09:34:05.6982795Z shl.b64 %rd1153, %rd1152, 32; 2026-02-21T09:34:05.6982855Z or.b64 %rd1154, %rd1151, %rd1153; 2026-02-21T09:34:05.6983033Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6983095Z mov.b64 {%r2874, %r2875}, %rd1154; 2026-02-21T09:34:05.6983165Z cvt.rn.f16x2.f32 %r2876, %r2875, %r2874; 2026-02-21T09:34:05.6983343Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6983406Z cvt.u64.u32 %rd1155, %r1548; 2026-02-21T09:34:05.6983466Z cvt.u64.u32 %rd1156, %r1549; 2026-02-21T09:34:05.6983531Z shl.b64 %rd1157, %rd1156, 32; 2026-02-21T09:34:05.6983591Z or.b64 %rd1158, %rd1155, %rd1157; 2026-02-21T09:34:05.6983764Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6983832Z mov.b64 {%r2877, %r2878}, %rd1158; 2026-02-21T09:34:05.6983899Z cvt.rn.f16x2.f32 %r2879, %r2878, %r2877; 2026-02-21T09:34:05.6983958Z mov.b64 {%r2880, %r2881}, %rd598; 2026-02-21T09:34:05.6984025Z cvt.rn.f16x2.f32 %r2882, %r2881, %r2880; 2026-02-21T09:34:05.6984091Z mov.b64 {%r2883, %r2884}, %rd602; 2026-02-21T09:34:05.6984158Z cvt.rn.f16x2.f32 %r2885, %r2884, %r2883; 2026-02-21T09:34:05.6984332Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6984443Z cvt.u64.u32 %rd1159, %r1554; 2026-02-21T09:34:05.6984503Z cvt.u64.u32 %rd1160, %r1555; 2026-02-21T09:34:05.6984563Z shl.b64 %rd1161, %rd1160, 32; 2026-02-21T09:34:05.6984623Z or.b64 %rd1162, %rd1159, %rd1161; 2026-02-21T09:34:05.6984846Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6984909Z mov.b64 {%r2886, %r2887}, %rd1162; 2026-02-21T09:34:05.6984977Z cvt.rn.f16x2.f32 %r2888, %r2887, %r2886; 2026-02-21T09:34:05.6985158Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 
2026-02-21T09:34:05.6985218Z cvt.u64.u32 %rd1163, %r1556; 2026-02-21T09:34:05.6985277Z cvt.u64.u32 %rd1164, %r1557; 2026-02-21T09:34:05.6985344Z shl.b64 %rd1165, %rd1164, 32; 2026-02-21T09:34:05.6985403Z or.b64 %rd1166, %rd1163, %rd1165; 2026-02-21T09:34:05.6985633Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6985712Z mov.b64 {%r2889, %r2890}, %rd1166; 2026-02-21T09:34:05.6985784Z cvt.rn.f16x2.f32 %r2891, %r2890, %r2889; 2026-02-21T09:34:05.6985848Z mov.b64 {%r2892, %r2893}, %rd606; 2026-02-21T09:34:05.6985922Z cvt.rn.f16x2.f32 %r2894, %r2893, %r2892; 2026-02-21T09:34:05.6985992Z mov.b64 {%r2895, %r2896}, %rd610; 2026-02-21T09:34:05.6986065Z cvt.rn.f16x2.f32 %r2897, %r2896, %r2895; 2026-02-21T09:34:05.6986255Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6986329Z cvt.u64.u32 %rd1167, %r1563; 2026-02-21T09:34:05.6986396Z cvt.u64.u32 %rd1168, %r1564; 2026-02-21T09:34:05.6986463Z shl.b64 %rd1169, %rd1168, 32; 2026-02-21T09:34:05.6986528Z or.b64 %rd1170, %rd1167, %rd1169; 2026-02-21T09:34:05.6986724Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6986790Z mov.b64 {%r2898, %r2899}, %rd1170; 2026-02-21T09:34:05.6986863Z cvt.rn.f16x2.f32 %r2900, %r2899, %r2898; 2026-02-21T09:34:05.6987063Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6987124Z cvt.u64.u32 %rd1171, %r1565; 2026-02-21T09:34:05.6987183Z cvt.u64.u32 %rd1172, %r1566; 2026-02-21T09:34:05.6987248Z shl.b64 %rd1173, %rd1172, 32; 2026-02-21T09:34:05.6987306Z or.b64 %rd1174, %rd1171, %rd1173; 2026-02-21T09:34:05.6987475Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6987535Z mov.b64 {%r2901, %r2902}, %rd1174; 2026-02-21T09:34:05.6987609Z cvt.rn.f16x2.f32 %r2903, %r2902, %r2901; 2026-02-21T09:34:05.6987668Z mov.b64 {%r2904, %r2905}, %rd614; 
2026-02-21T09:34:05.6987735Z cvt.rn.f16x2.f32 %r2906, %r2905, %r2904; 2026-02-21T09:34:05.6987803Z mov.b64 {%r2907, %r2908}, %rd618; 2026-02-21T09:34:05.6987873Z cvt.rn.f16x2.f32 %r2909, %r2908, %r2907; 2026-02-21T09:34:05.6988050Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6988117Z cvt.u64.u32 %rd1175, %r1571; 2026-02-21T09:34:05.6988176Z cvt.u64.u32 %rd1176, %r1572; 2026-02-21T09:34:05.6988237Z shl.b64 %rd1177, %rd1176, 32; 2026-02-21T09:34:05.6988296Z or.b64 %rd1178, %rd1175, %rd1177; 2026-02-21T09:34:05.6988474Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6988534Z mov.b64 {%r2910, %r2911}, %rd1178; 2026-02-21T09:34:05.6988602Z cvt.rn.f16x2.f32 %r2912, %r2911, %r2910; 2026-02-21T09:34:05.6988780Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6988841Z cvt.u64.u32 %rd1179, %r1573; 2026-02-21T09:34:05.6988901Z cvt.u64.u32 %rd1180, %r1574; 2026-02-21T09:34:05.6988969Z shl.b64 %rd1181, %rd1180, 32; 2026-02-21T09:34:05.6989032Z or.b64 %rd1182, %rd1179, %rd1181; 2026-02-21T09:34:05.6989239Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6989327Z mov.b64 {%r2913, %r2914}, %rd1182; 2026-02-21T09:34:05.6989404Z cvt.rn.f16x2.f32 %r2915, %r2914, %r2913; 2026-02-21T09:34:05.6989463Z mov.b64 {%r2916, %r2917}, %rd622; 2026-02-21T09:34:05.6989528Z cvt.rn.f16x2.f32 %r2918, %r2917, %r2916; 2026-02-21T09:34:05.6989595Z mov.b64 {%r2919, %r2920}, %rd626; 2026-02-21T09:34:05.6989663Z cvt.rn.f16x2.f32 %r2921, %r2920, %r2919; 2026-02-21T09:34:05.6989836Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6989903Z cvt.u64.u32 %rd1183, %r1580; 2026-02-21T09:34:05.6989963Z cvt.u64.u32 %rd1184, %r1581; 2026-02-21T09:34:05.6990023Z shl.b64 %rd1185, %rd1184, 32; 2026-02-21T09:34:05.6990083Z or.b64 %rd1186, %rd1183, %rd1185; 
2026-02-21T09:34:05.6990300Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6990363Z mov.b64 {%r2922, %r2923}, %rd1186; 2026-02-21T09:34:05.6990431Z cvt.rn.f16x2.f32 %r2924, %r2923, %r2922; 2026-02-21T09:34:05.6990611Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6990672Z cvt.u64.u32 %rd1187, %r1582; 2026-02-21T09:34:05.6990731Z cvt.u64.u32 %rd1188, %r1583; 2026-02-21T09:34:05.6990803Z shl.b64 %rd1189, %rd1188, 32; 2026-02-21T09:34:05.6990863Z or.b64 %rd1190, %rd1187, %rd1189; 2026-02-21T09:34:05.6991036Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6991099Z mov.b64 {%r2925, %r2926}, %rd1190; 2026-02-21T09:34:05.6991180Z cvt.rn.f16x2.f32 %r2927, %r2926, %r2925; 2026-02-21T09:34:05.6991244Z mov.b64 {%r2928, %r2929}, %rd630; 2026-02-21T09:34:05.6991313Z cvt.rn.f16x2.f32 %r2930, %r2929, %r2928; 2026-02-21T09:34:05.6991388Z mov.b64 {%r2931, %r2932}, %rd634; 2026-02-21T09:34:05.6991456Z cvt.rn.f16x2.f32 %r2933, %r2932, %r2931; 2026-02-21T09:34:05.6991630Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6991697Z cvt.u64.u32 %rd1191, %r1588; 2026-02-21T09:34:05.6991756Z cvt.u64.u32 %rd1192, %r1589; 2026-02-21T09:34:05.6991816Z shl.b64 %rd1193, %rd1192, 32; 2026-02-21T09:34:05.6991875Z or.b64 %rd1194, %rd1191, %rd1193; 2026-02-21T09:34:05.6992054Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6992115Z mov.b64 {%r2934, %r2935}, %rd1194; 2026-02-21T09:34:05.6992181Z cvt.rn.f16x2.f32 %r2936, %r2935, %r2934; 2026-02-21T09:34:05.6992364Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6992424Z cvt.u64.u32 %rd1195, %r1590; 2026-02-21T09:34:05.6992485Z cvt.u64.u32 %rd1196, %r1591; 2026-02-21T09:34:05.6992555Z shl.b64 %rd1197, %rd1196, 32; 2026-02-21T09:34:05.6992618Z or.b64 
%rd1198, %rd1195, %rd1197; 2026-02-21T09:34:05.6992791Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6992852Z mov.b64 {%r2937, %r2938}, %rd1198; 2026-02-21T09:34:05.6992926Z cvt.rn.f16x2.f32 %r2939, %r2938, %r2937; 2026-02-21T09:34:05.6992986Z mov.b64 {%r2940, %r2941}, %rd638; 2026-02-21T09:34:05.6993054Z cvt.rn.f16x2.f32 %r2942, %r2941, %r2940; 2026-02-21T09:34:05.6993119Z mov.b64 {%r2943, %r2944}, %rd642; 2026-02-21T09:34:05.6993186Z cvt.rn.f16x2.f32 %r2945, %r2944, %r2943; 2026-02-21T09:34:05.6993358Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6993427Z cvt.u64.u32 %rd1199, %r1597; 2026-02-21T09:34:05.6993486Z cvt.u64.u32 %rd1200, %r1598; 2026-02-21T09:34:05.6993547Z shl.b64 %rd1201, %rd1200, 32; 2026-02-21T09:34:05.6993609Z or.b64 %rd1202, %rd1199, %rd1201; 2026-02-21T09:34:05.6993790Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6993899Z mov.b64 {%r2946, %r2947}, %rd1202; 2026-02-21T09:34:05.6993967Z cvt.rn.f16x2.f32 %r2948, %r2947, %r2946; 2026-02-21T09:34:05.6994145Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6994205Z cvt.u64.u32 %rd1203, %r1599; 2026-02-21T09:34:05.6994266Z cvt.u64.u32 %rd1204, %r1600; 2026-02-21T09:34:05.6994333Z shl.b64 %rd1205, %rd1204, 32; 2026-02-21T09:34:05.6994393Z or.b64 %rd1206, %rd1203, %rd1205; 2026-02-21T09:34:05.6994564Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6994625Z mov.b64 {%r2949, %r2950}, %rd1206; 2026-02-21T09:34:05.6994741Z cvt.rn.f16x2.f32 %r2951, %r2950, %r2949; 2026-02-21T09:34:05.6994802Z mov.b64 {%r2952, %r2953}, %rd646; 2026-02-21T09:34:05.6994919Z cvt.rn.f16x2.f32 %r2954, %r2953, %r2952; 2026-02-21T09:34:05.6994995Z mov.b64 {%r2955, %r2956}, %rd650; 2026-02-21T09:34:05.6995069Z cvt.rn.f16x2.f32 %r2957, %r2956, %r2955; 
2026-02-21T09:34:05.6995258Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6995330Z cvt.u64.u32 %rd1207, %r1605; 2026-02-21T09:34:05.6995395Z cvt.u64.u32 %rd1208, %r1606; 2026-02-21T09:34:05.6995461Z shl.b64 %rd1209, %rd1208, 32; 2026-02-21T09:34:05.6995523Z or.b64 %rd1210, %rd1207, %rd1209; 2026-02-21T09:34:05.6995718Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6995783Z mov.b64 {%r2958, %r2959}, %rd1210; 2026-02-21T09:34:05.6995855Z cvt.rn.f16x2.f32 %r2960, %r2959, %r2958; 2026-02-21T09:34:05.6996047Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6996112Z cvt.u64.u32 %rd1211, %r1607; 2026-02-21T09:34:05.6996179Z cvt.u64.u32 %rd1212, %r1608; 2026-02-21T09:34:05.6996252Z shl.b64 %rd1213, %rd1212, 32; 2026-02-21T09:34:05.6996319Z or.b64 %rd1214, %rd1211, %rd1213; 2026-02-21T09:34:05.6996506Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6996566Z mov.b64 {%r2961, %r2962}, %rd1214; 2026-02-21T09:34:05.6996642Z cvt.rn.f16x2.f32 %r2963, %r2962, %r2961; 2026-02-21T09:34:05.6996702Z mov.b64 {%r2964, %r2965}, %rd654; 2026-02-21T09:34:05.6996769Z cvt.rn.f16x2.f32 %r2966, %r2965, %r2964; 2026-02-21T09:34:05.6996836Z mov.b64 {%r2967, %r2968}, %rd658; 2026-02-21T09:34:05.6996902Z cvt.rn.f16x2.f32 %r2969, %r2968, %r2967; 2026-02-21T09:34:05.6997077Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.6997148Z cvt.u64.u32 %rd1215, %r1614; 2026-02-21T09:34:05.6997211Z cvt.u64.u32 %rd1216, %r1615; 2026-02-21T09:34:05.6997277Z shl.b64 %rd1217, %rd1216, 32; 2026-02-21T09:34:05.6997344Z or.b64 %rd1218, %rd1215, %rd1217; 2026-02-21T09:34:05.6997544Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.6997611Z mov.b64 {%r2970, %r2971}, %rd1218; 2026-02-21T09:34:05.6997683Z cvt.rn.f16x2.f32 
%r2972, %r2971, %r2970; 2026-02-21T09:34:05.6998308Z [197s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:34:05.6999441Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=6, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:34:05.6999596Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:34:05.6999666Z `ptxas` stderr: 2026-02-21T09:34:05.7000084Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 72 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:34:05.7000230Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:34:05.7000246Z 2026-02-21T09:34:05.7000719Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp7pzebf1f.ptx -o /tmp/tmp7pzebf1f.ptx.o 2026-02-21T09:34:05.7000725Z 2026-02-21T09:34:05.7000875Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:34:05.7001079Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7001147Z cvt.u64.u32 %rd1219, %r1616; 2026-02-21T09:34:05.7001212Z cvt.u64.u32 %rd1220, %r1617; 2026-02-21T09:34:05.7001306Z shl.b64 %rd1221, %rd1220, 32; 2026-02-21T09:34:05.7001407Z or.b64 %rd1222, %rd1219, %rd1221; 2026-02-21T09:34:05.7001603Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7001672Z mov.b64 {%r2973, %r2974}, %rd1222; 2026-02-21T09:34:05.7001759Z cvt.rn.f16x2.f32 %r2975, %r2974, %r2973; 2026-02-21T09:34:05.7001826Z mov.b64 {%r2976, %r2977}, %rd662; 2026-02-21T09:34:05.7001902Z cvt.rn.f16x2.f32 %r2978, %r2977, %r2976; 2026-02-21T09:34:05.7001984Z mov.b64 {%r2979, %r2980}, %rd666; 2026-02-21T09:34:05.7002056Z cvt.rn.f16x2.f32 %r2981, %r2980, %r2979; 2026-02-21T09:34:05.7002241Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7002307Z cvt.u64.u32 %rd1223, %r1622; 2026-02-21T09:34:05.7002381Z cvt.u64.u32 %rd1224, %r1623; 2026-02-21T09:34:05.7002448Z shl.b64 %rd1225, %rd1224, 32; 2026-02-21T09:34:05.7002512Z or.b64 %rd1226, %rd1223, %rd1225; 2026-02-21T09:34:05.7002707Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7002776Z mov.b64 {%r2982, %r2983}, %rd1226; 2026-02-21T09:34:05.7002849Z cvt.rn.f16x2.f32 %r2984, %r2983, %r2982; 2026-02-21T09:34:05.7003041Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7003116Z cvt.u64.u32 %rd1227, %r1624; 2026-02-21T09:34:05.7003175Z cvt.u64.u32 %rd1228, %r1625; 2026-02-21T09:34:05.7003236Z shl.b64 %rd1229, %rd1228, 32; 2026-02-21T09:34:05.7003304Z or.b64 %rd1230, %rd1227, %rd1229; 2026-02-21T09:34:05.7003477Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7003538Z mov.b64 {%r2985, %r2986}, %rd1230; 2026-02-21T09:34:05.7003613Z cvt.rn.f16x2.f32 
%r2987, %r2986, %r2985; 2026-02-21T09:34:05.7003672Z mov.b64 {%r2988, %r2989}, %rd670; 2026-02-21T09:34:05.7003741Z cvt.rn.f16x2.f32 %r2990, %r2989, %r2988; 2026-02-21T09:34:05.7003807Z mov.b64 {%r2991, %r2992}, %rd674; 2026-02-21T09:34:05.7003875Z cvt.rn.f16x2.f32 %r2993, %r2992, %r2991; 2026-02-21T09:34:05.7004047Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7004106Z cvt.u64.u32 %rd1231, %r1631; 2026-02-21T09:34:05.7004175Z cvt.u64.u32 %rd1232, %r1632; 2026-02-21T09:34:05.7004236Z shl.b64 %rd1233, %rd1232, 32; 2026-02-21T09:34:05.7004296Z or.b64 %rd1234, %rd1231, %rd1233; 2026-02-21T09:34:05.7004472Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7004535Z mov.b64 {%r2994, %r2995}, %rd1234; 2026-02-21T09:34:05.7004608Z cvt.rn.f16x2.f32 %r2996, %r2995, %r2994; 2026-02-21T09:34:05.7004829Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7004895Z cvt.u64.u32 %rd1235, %r1633; 2026-02-21T09:34:05.7004961Z cvt.u64.u32 %rd1236, %r1634; 2026-02-21T09:34:05.7005028Z shl.b64 %rd1237, %rd1236, 32; 2026-02-21T09:34:05.7005129Z or.b64 %rd1238, %rd1235, %rd1237; 2026-02-21T09:34:05.7005348Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7005413Z mov.b64 {%r2997, %r2998}, %rd1238; 2026-02-21T09:34:05.7005494Z cvt.rn.f16x2.f32 %r2999, %r2998, %r2997; 2026-02-21T09:34:05.7005558Z mov.b64 {%r3000, %r3001}, %rd678; 2026-02-21T09:34:05.7005631Z cvt.rn.f16x2.f32 %r3002, %r3001, %r3000; 2026-02-21T09:34:05.7005702Z mov.b64 {%r3003, %r3004}, %rd682; 2026-02-21T09:34:05.7005774Z cvt.rn.f16x2.f32 %r3005, %r3004, %r3003; 2026-02-21T09:34:05.7005957Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7006023Z cvt.u64.u32 %rd1239, %r1639; 2026-02-21T09:34:05.7006095Z cvt.u64.u32 %rd1240, %r1640; 2026-02-21T09:34:05.7006160Z shl.b64 %rd1241, 
%rd1240, 32; 2026-02-21T09:34:05.7006273Z or.b64 %rd1242, %rd1239, %rd1241; 2026-02-21T09:34:05.7006465Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7006529Z mov.b64 {%r3006, %r3007}, %rd1242; 2026-02-21T09:34:05.7006596Z cvt.rn.f16x2.f32 %r3008, %r3007, %r3006; 2026-02-21T09:34:05.7006776Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7006836Z cvt.u64.u32 %rd1243, %r1641; 2026-02-21T09:34:05.7006896Z cvt.u64.u32 %rd1244, %r1642; 2026-02-21T09:34:05.7006956Z shl.b64 %rd1245, %rd1244, 32; 2026-02-21T09:34:05.7007023Z or.b64 %rd1246, %rd1243, %rd1245; 2026-02-21T09:34:05.7007194Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7007254Z mov.b64 {%r3009, %r3010}, %rd1246; 2026-02-21T09:34:05.7007329Z cvt.rn.f16x2.f32 %r3011, %r3010, %r3009; 2026-02-21T09:34:05.7007388Z mov.b64 {%r3012, %r3013}, %rd686; 2026-02-21T09:34:05.7007457Z cvt.rn.f16x2.f32 %r3014, %r3013, %r3012; 2026-02-21T09:34:05.7007524Z mov.b64 {%r3015, %r3016}, %rd690; 2026-02-21T09:34:05.7007591Z cvt.rn.f16x2.f32 %r3017, %r3016, %r3015; 2026-02-21T09:34:05.7007763Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7007823Z cvt.u64.u32 %rd1247, %r1648; 2026-02-21T09:34:05.7007892Z cvt.u64.u32 %rd1248, %r1649; 2026-02-21T09:34:05.7007954Z shl.b64 %rd1249, %rd1248, 32; 2026-02-21T09:34:05.7008014Z or.b64 %rd1250, %rd1247, %rd1249; 2026-02-21T09:34:05.7008194Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7008255Z mov.b64 {%r3018, %r3019}, %rd1250; 2026-02-21T09:34:05.7008322Z cvt.rn.f16x2.f32 %r3020, %r3019, %r3018; 2026-02-21T09:34:05.7008390Z mov.b64 {%r3021, %r3022}, %rd694; 2026-02-21T09:34:05.7008456Z cvt.rn.f16x2.f32 %r3023, %r3022, %r3021; 2026-02-21T09:34:05.7008519Z mov.b64 {%r3024, %r3025}, %rd698; 2026-02-21T09:34:05.7008586Z cvt.rn.f16x2.f32 
%r3026, %r3025, %r3024; 2026-02-21T09:34:05.7008657Z mov.b64 {%r3027, %r3028}, %rd702; 2026-02-21T09:34:05.7008724Z cvt.rn.f16x2.f32 %r3029, %r3028, %r3027; 2026-02-21T09:34:05.7008897Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7008964Z cvt.u64.u32 %rd1251, %r1656; 2026-02-21T09:34:05.7009024Z cvt.u64.u32 %rd1252, %r1657; 2026-02-21T09:34:05.7009084Z shl.b64 %rd1253, %rd1252, 32; 2026-02-21T09:34:05.7009144Z or.b64 %rd1254, %rd1251, %rd1253; 2026-02-21T09:34:05.7009327Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7009388Z mov.b64 {%r3030, %r3031}, %rd1254; 2026-02-21T09:34:05.7009453Z cvt.rn.f16x2.f32 %r3032, %r3031, %r3030; 2026-02-21T09:34:05.7009523Z mov.b64 {%r3033, %r3034}, %rd706; 2026-02-21T09:34:05.7009591Z cvt.rn.f16x2.f32 %r3035, %r3034, %r3033; 2026-02-21T09:34:05.7009653Z mov.b64 {%r3036, %r3037}, %rd710; 2026-02-21T09:34:05.7009751Z cvt.rn.f16x2.f32 %r3038, %r3037, %r3036; 2026-02-21T09:34:05.7009834Z mov.b64 {%r3039, %r3040}, %rd714; 2026-02-21T09:34:05.7009901Z cvt.rn.f16x2.f32 %r3041, %r3040, %r3039; 2026-02-21T09:34:05.7010072Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7010140Z cvt.u64.u32 %rd1255, %r1665; 2026-02-21T09:34:05.7010199Z cvt.u64.u32 %rd1256, %r1666; 2026-02-21T09:34:05.7010260Z shl.b64 %rd1257, %rd1256, 32; 2026-02-21T09:34:05.7010330Z or.b64 %rd1258, %rd1255, %rd1257; 2026-02-21T09:34:05.7010504Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7010566Z mov.b64 {%r3042, %r3043}, %rd1258; 2026-02-21T09:34:05.7010641Z cvt.rn.f16x2.f32 %r3044, %r3043, %r3042; 2026-02-21T09:34:05.7010703Z mov.b64 {%r3045, %r3046}, %rd718; 2026-02-21T09:34:05.7010809Z cvt.rn.f16x2.f32 %r3047, %r3046, %r3045; 2026-02-21T09:34:05.7010873Z mov.b64 {%r3048, %r3049}, %rd722; 2026-02-21T09:34:05.7010952Z cvt.rn.f16x2.f32 %r3050, %r3049, %r3048; 
2026-02-21T09:34:05.7011010Z mov.b64 {%r3051, %r3052}, %rd726; 2026-02-21T09:34:05.7011076Z cvt.rn.f16x2.f32 %r3053, %r3052, %r3051; 2026-02-21T09:34:05.7011253Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7011312Z cvt.u64.u32 %rd1259, %r1673; 2026-02-21T09:34:05.7011371Z cvt.u64.u32 %rd1260, %r1674; 2026-02-21T09:34:05.7011438Z shl.b64 %rd1261, %rd1260, 32; 2026-02-21T09:34:05.7011498Z or.b64 %rd1262, %rd1259, %rd1261; 2026-02-21T09:34:05.7011670Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7011729Z mov.b64 {%r3054, %r3055}, %rd1262; 2026-02-21T09:34:05.7011803Z cvt.rn.f16x2.f32 %r3056, %r3055, %r3054; 2026-02-21T09:34:05.7011981Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7012044Z cvt.u64.u32 %rd1263, %r1675; 2026-02-21T09:34:05.7012113Z cvt.u64.u32 %rd1264, %r1676; 2026-02-21T09:34:05.7012174Z shl.b64 %rd1265, %rd1264, 32; 2026-02-21T09:34:05.7012233Z or.b64 %rd1266, %rd1263, %rd1265; 2026-02-21T09:34:05.7012412Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7012472Z mov.b64 {%r3057, %r3058}, %rd1266; 2026-02-21T09:34:05.7012538Z cvt.rn.f16x2.f32 %r3059, %r3058, %r3057; 2026-02-21T09:34:05.7012596Z mov.b64 {%r3060, %r3061}, %rd730; 2026-02-21T09:34:05.7012671Z cvt.rn.f16x2.f32 %r3062, %r3061, %r3060; 2026-02-21T09:34:05.7012731Z mov.b64 {%r3063, %r3064}, %rd734; 2026-02-21T09:34:05.7012798Z cvt.rn.f16x2.f32 %r3065, %r3064, %r3063; 2026-02-21T09:34:05.7012973Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7013036Z cvt.u64.u32 %rd1267, %r1682; 2026-02-21T09:34:05.7013097Z cvt.u64.u32 %rd1268, %r1683; 2026-02-21T09:34:05.7013169Z shl.b64 %rd1269, %rd1268, 32; 2026-02-21T09:34:05.7013228Z or.b64 %rd1270, %rd1267, %rd1269; 2026-02-21T09:34:05.7013402Z .loc 1 48 27 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7013461Z mov.b64 {%r3066, %r3067}, %rd1270; 2026-02-21T09:34:05.7013535Z cvt.rn.f16x2.f32 %r3068, %r3067, %r3066; 2026-02-21T09:34:05.7013595Z mov.b64 {%r3069, %r3070}, %rd738; 2026-02-21T09:34:05.7013661Z cvt.rn.f16x2.f32 %r3071, %r3070, %r3069; 2026-02-21T09:34:05.7013729Z mov.b64 {%r3072, %r3073}, %rd742; 2026-02-21T09:34:05.7013796Z cvt.rn.f16x2.f32 %r3074, %r3073, %r3072; 2026-02-21T09:34:05.7013853Z mov.b64 {%r3075, %r3076}, %rd746; 2026-02-21T09:34:05.7013920Z cvt.rn.f16x2.f32 %r3077, %r3076, %r3075; 2026-02-21T09:34:05.7014099Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7014161Z cvt.u64.u32 %rd1271, %r1690; 2026-02-21T09:34:05.7014245Z cvt.u64.u32 %rd1272, %r1691; 2026-02-21T09:34:05.7014337Z shl.b64 %rd1273, %rd1272, 32; 2026-02-21T09:34:05.7014397Z or.b64 %rd1274, %rd1271, %rd1273; 2026-02-21T09:34:05.7014566Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7014635Z mov.b64 {%r3078, %r3079}, %rd1274; 2026-02-21T09:34:05.7014745Z cvt.rn.f16x2.f32 %r3080, %r3079, %r3078; 2026-02-21T09:34:05.7014805Z mov.b64 {%r3081, %r3082}, %rd750; 2026-02-21T09:34:05.7014872Z cvt.rn.f16x2.f32 %r3083, %r3082, %r3081; 2026-02-21T09:34:05.7014940Z mov.b64 {%r3084, %r3085}, %rd754; 2026-02-21T09:34:05.7015005Z cvt.rn.f16x2.f32 %r3086, %r3085, %r3084; 2026-02-21T09:34:05.7015063Z mov.b64 {%r3087, %r3088}, %rd758; 2026-02-21T09:34:05.7015136Z cvt.rn.f16x2.f32 %r3089, %r3088, %r3087; 2026-02-21T09:34:05.7015362Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7015426Z cvt.u64.u32 %rd1275, %r1699; 2026-02-21T09:34:05.7015493Z cvt.u64.u32 %rd1276, %r1700; 2026-02-21T09:34:05.7015553Z shl.b64 %rd1277, %rd1276, 32; 2026-02-21T09:34:05.7015613Z or.b64 %rd1278, %rd1275, %rd1277; 2026-02-21T09:34:05.7015786Z .loc 1 48 27 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7015855Z mov.b64 {%r3090, %r3091}, %rd1278; 2026-02-21T09:34:05.7015923Z cvt.rn.f16x2.f32 %r3092, %r3091, %r3090; 2026-02-21T09:34:05.7015981Z mov.b64 {%r3093, %r3094}, %rd762; 2026-02-21T09:34:05.7016055Z cvt.rn.f16x2.f32 %r3095, %r3094, %r3093; 2026-02-21T09:34:05.7016113Z mov.b64 {%r3096, %r3097}, %rd766; 2026-02-21T09:34:05.7016180Z cvt.rn.f16x2.f32 %r3098, %r3097, %r3096; 2026-02-21T09:34:05.7016247Z mov.b64 {%r3099, %r3100}, %rd770; 2026-02-21T09:34:05.7016312Z cvt.rn.f16x2.f32 %r3101, %r3100, %r3099; 2026-02-21T09:34:05.7016488Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7016550Z cvt.u64.u32 %rd1279, %r1707; 2026-02-21T09:34:05.7016620Z cvt.u64.u32 %rd1280, %r1708; 2026-02-21T09:34:05.7016681Z shl.b64 %rd1281, %rd1280, 32; 2026-02-21T09:34:05.7016740Z or.b64 %rd1282, %rd1279, %rd1281; 2026-02-21T09:34:05.7016917Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7016978Z mov.b64 {%r3102, %r3103}, %rd1282; 2026-02-21T09:34:05.7017045Z cvt.rn.f16x2.f32 %r3104, %r3103, %r3102; 2026-02-21T09:34:05.7017110Z mov.b64 {%r3105, %r3106}, %rd774; 2026-02-21T09:34:05.7017177Z cvt.rn.f16x2.f32 %r3107, %r3106, %r3105; 2026-02-21T09:34:05.7017235Z mov.b64 {%r3108, %r3109}, %rd778; 2026-02-21T09:34:05.7017300Z cvt.rn.f16x2.f32 %r3110, %r3109, %r3108; 2026-02-21T09:34:05.7017366Z mov.b64 {%r3111, %r3112}, %rd782; 2026-02-21T09:34:05.7017432Z cvt.rn.f16x2.f32 %r3113, %r3112, %r3111; 2026-02-21T09:34:05.7017606Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7017677Z cvt.u64.u32 %rd1283, %r1716; 2026-02-21T09:34:05.7017736Z cvt.u64.u32 %rd1284, %r1717; 2026-02-21T09:34:05.7017798Z shl.b64 %rd1285, %rd1284, 32; 2026-02-21T09:34:05.7017857Z or.b64 %rd1286, %rd1283, %rd1285; 2026-02-21T09:34:05.7018037Z .loc 1 48 27 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7018097Z mov.b64 {%r3114, %r3115}, %rd1286; 2026-02-21T09:34:05.7018164Z cvt.rn.f16x2.f32 %r3116, %r3115, %r3114; 2026-02-21T09:34:05.7018232Z mov.b64 {%r3117, %r3118}, %rd786; 2026-02-21T09:34:05.7018300Z cvt.rn.f16x2.f32 %r3119, %r3118, %r3117; 2026-02-21T09:34:05.7018359Z mov.b64 {%r3120, %r3121}, %rd790; 2026-02-21T09:34:05.7018435Z cvt.rn.f16x2.f32 %r3122, %r3121, %r3120; 2026-02-21T09:34:05.7018494Z mov.b64 {%r3123, %r3124}, %rd794; 2026-02-21T09:34:05.7018564Z cvt.rn.f16x2.f32 %r3125, %r3124, %r3123; 2026-02-21T09:34:05.7018736Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7018878Z cvt.u64.u32 %rd1287, %r1724; 2026-02-21T09:34:05.7018939Z cvt.u64.u32 %rd1288, %r1725; 2026-02-21T09:34:05.7019002Z shl.b64 %rd1289, %rd1288, 32; 2026-02-21T09:34:05.7019072Z or.b64 %rd1290, %rd1287, %rd1289; 2026-02-21T09:34:05.7019243Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7019303Z mov.b64 {%r3126, %r3127}, %rd1290; 2026-02-21T09:34:05.7019377Z cvt.rn.f16x2.f32 %r3128, %r3127, %r3126; 2026-02-21T09:34:05.7019437Z mov.b64 {%r3129, %r3130}, %rd798; 2026-02-21T09:34:05.7019505Z cvt.rn.f16x2.f32 %r3131, %r3130, %r3129; 2026-02-21T09:34:05.7019564Z mov.b64 {%r3132, %r3133}, %rd802; 2026-02-21T09:34:05.7019636Z cvt.rn.f16x2.f32 %r3134, %r3133, %r3132; 2026-02-21T09:34:05.7019695Z mov.b64 {%r3135, %r3136}, %rd806; 2026-02-21T09:34:05.7019804Z cvt.rn.f16x2.f32 %r3137, %r3136, %r3135; 2026-02-21T09:34:05.7019984Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7020046Z cvt.u64.u32 %rd1291, %r1733; 2026-02-21T09:34:05.7020105Z cvt.u64.u32 %rd1292, %r1734; 2026-02-21T09:34:05.7020172Z shl.b64 %rd1293, %rd1292, 32; 2026-02-21T09:34:05.7020232Z or.b64 %rd1294, %rd1291, %rd1293; 2026-02-21T09:34:05.7020402Z .loc 1 48 27 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7020463Z mov.b64 {%r3138, %r3139}, %rd1294; 2026-02-21T09:34:05.7020539Z cvt.rn.f16x2.f32 %r3140, %r3139, %r3138; 2026-02-21T09:34:05.7020598Z mov.b64 {%r3141, %r3142}, %rd810; 2026-02-21T09:34:05.7020666Z cvt.rn.f16x2.f32 %r3143, %r3142, %r3141; 2026-02-21T09:34:05.7020733Z mov.b64 {%r3144, %r3145}, %rd814; 2026-02-21T09:34:05.7020800Z cvt.rn.f16x2.f32 %r3146, %r3145, %r3144; 2026-02-21T09:34:05.7020860Z mov.b64 {%r3147, %r3148}, %rd818; 2026-02-21T09:34:05.7020936Z cvt.rn.f16x2.f32 %r3149, %r3148, %r3147; 2026-02-21T09:34:05.7021108Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7021170Z cvt.u64.u32 %rd1295, %r1741; 2026-02-21T09:34:05.7021230Z cvt.u64.u32 %rd1296, %r1742; 2026-02-21T09:34:05.7021298Z shl.b64 %rd1297, %rd1296, 32; 2026-02-21T09:34:05.7021357Z or.b64 %rd1298, %rd1295, %rd1297; 2026-02-21T09:34:05.7021528Z .loc 1 48 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:48:27 2026-02-21T09:34:05.7021596Z mov.b64 {%r3150, %r3151}, %rd1298; 2026-02-21T09:34:05.7021662Z cvt.rn.f16x2.f32 %r3152, %r3151, %r3150; 2026-02-21T09:34:05.7021720Z mov.b64 {%r3153, %r3154}, %rd822; 2026-02-21T09:34:05.7021793Z cvt.rn.f16x2.f32 %r3155, %r3154, %r3153; 2026-02-21T09:34:05.7021852Z mov.b64 {%r3156, %r3157}, %rd826; 2026-02-21T09:34:05.7021918Z cvt.rn.f16x2.f32 %r3158, %r3157, %r3156; 2026-02-21T09:34:05.7021978Z mov.b64 {%r3159, %r3160}, %rd830; 2026-02-21T09:34:05.7022054Z cvt.rn.f16x2.f32 %r3161, %r3160, %r3159; 2026-02-21T09:34:05.7022226Z .loc 1 49 45 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:49:45 2026-02-21T09:34:05.7022287Z shl.b32 %r3162, %r2329, 10; 2026-02-21T09:34:05.7022357Z shl.b32 %r3163, %r2330, 10; 2026-02-21T09:34:05.7022416Z shl.b32 %r3164, %r2331, 10; 2026-02-21T09:34:05.7022474Z shl.b32 %r3165, %r2332, 10; 2026-02-21T09:34:05.7022531Z shl.b32 %r3166, %r2333, 10; 
2026-02-21T09:34:05.7022597Z shl.b32 %r3167, %r2334, 10; 2026-02-21T09:34:05.7022655Z shl.b32 %r3168, %r2335, 10; 2026-02-21T09:34:05.7022712Z shl.b32 %r3169, %r2336, 10; 2026-02-21T09:34:05.7022778Z shl.b32 %r3170, %r2337, 10; 2026-02-21T09:34:05.7022835Z shl.b32 %r3171, %r2338, 10; 2026-02-21T09:34:05.7022892Z shl.b32 %r3172, %r2339, 10; 2026-02-21T09:34:05.7022955Z shl.b32 %r3173, %r2340, 10; 2026-02-21T09:34:05.7023013Z shl.b32 %r3174, %r2341, 10; 2026-02-21T09:34:05.7023071Z shl.b32 %r3175, %r2342, 10; 2026-02-21T09:34:05.7023153Z shl.b32 %r3176, %r2343, 10; 2026-02-21T09:34:05.7023242Z shl.b32 %r3177, %r2344, 10; 2026-02-21T09:34:05.7023302Z shl.b32 %r3178, %r2345, 10; 2026-02-21T09:34:05.7023360Z shl.b32 %r3179, %r2346, 10; 2026-02-21T09:34:05.7023424Z shl.b32 %r3180, %r2347, 10; 2026-02-21T09:34:05.7023481Z shl.b32 %r3181, %r2348, 10; 2026-02-21T09:34:05.7023538Z shl.b32 %r3182, %r2349, 10; 2026-02-21T09:34:05.7023594Z shl.b32 %r3183, %r2350, 10; 2026-02-21T09:34:05.7023658Z shl.b32 %r3184, %r2351, 10; 2026-02-21T09:34:05.7023716Z shl.b32 %r3185, %r2352, 10; 2026-02-21T09:34:05.7023773Z shl.b32 %r3186, %r2353, 10; 2026-02-21T09:34:05.7023836Z shl.b32 %r3187, %r2354, 10; 2026-02-21T09:34:05.7023892Z shl.b32 %r3188, %r2355, 10; 2026-02-21T09:34:05.7023948Z shl.b32 %r3189, %r2356, 10; 2026-02-21T09:34:05.7024007Z shl.b32 %r3190, %r2357, 10; 2026-02-21T09:34:05.7024073Z shl.b32 %r3191, %r2358, 10; 2026-02-21T09:34:05.7024152Z shl.b32 %r3192, %r2359, 10; 2026-02-21T09:34:05.7024233Z shl.b32 %r3193, %r2360, 10; 2026-02-21T09:34:05.7024303Z shl.b32 %r3194, %r2361, 10; 2026-02-21T09:34:05.7024362Z shl.b32 %r3195, %r2362, 10; 2026-02-21T09:34:05.7024420Z shl.b32 %r3196, %r2363, 10; 2026-02-21T09:34:05.7024478Z shl.b32 %r3197, %r2364, 10; 2026-02-21T09:34:05.7024542Z shl.b32 %r3198, %r2365, 10; 2026-02-21T09:34:05.7024600Z shl.b32 %r3199, %r2366, 10; 2026-02-21T09:34:05.7024657Z shl.b32 %r3200, %r2367, 10; 2026-02-21T09:34:05.7024760Z shl.b32 %r3201, %r2368, 10; 
2026-02-21T09:34:05.7024820Z shl.b32 %r3202, %r2369, 10; 2026-02-21T09:34:05.7024877Z shl.b32 %r3203, %r2370, 10; 2026-02-21T09:34:05.7024944Z shl.b32 %r3204, %r2371, 10; 2026-02-21T09:34:05.7025002Z shl.b32 %r3205, %r2372, 10; 2026-02-21T09:34:05.7025060Z shl.b32 %r3206, %r2373, 10; 2026-02-21T09:34:05.7025116Z shl.b32 %r3207, %r2374, 10; 2026-02-21T09:34:05.7025182Z shl.b32 %r3208, %r2375, 10; 2026-02-21T09:34:05.7025242Z shl.b32 %r3209, %r2376, 10; 2026-02-21T09:34:05.7025301Z shl.b32 %r3210, %r2377, 10; 2026-02-21T09:34:05.7025368Z shl.b32 %r3211, %r2378, 10; 2026-02-21T09:34:05.7025427Z shl.b32 %r3212, %r2379, 10; 2026-02-21T09:34:05.7025484Z shl.b32 %r3213, %r2380, 10; 2026-02-21T09:34:05.7025541Z shl.b32 %r3214, %r2381, 10; 2026-02-21T09:34:05.7025606Z shl.b32 %r3215, %r2382, 10; 2026-02-21T09:34:05.7025665Z shl.b32 %r3216, %r2383, 10; 2026-02-21T09:34:05.7025723Z shl.b32 %r3217, %r2384, 10; 2026-02-21T09:34:05.7025792Z shl.b32 %r3218, %r2385, 10; 2026-02-21T09:34:05.7025851Z shl.b32 %r3219, %r2386, 10; 2026-02-21T09:34:05.7025910Z shl.b32 %r3220, %r2387, 10; 2026-02-21T09:34:05.7025972Z shl.b32 %r3221, %r2388, 10; 2026-02-21T09:34:05.7026043Z shl.b32 %r3222, %r2389, 10; 2026-02-21T09:34:05.7026103Z shl.b32 %r3223, %r2390, 10; 2026-02-21T09:34:05.7026163Z shl.b32 %r3224, %r2391, 10; 2026-02-21T09:34:05.7026234Z shl.b32 %r3225, %r2392, 10; 2026-02-21T09:34:05.7026413Z .loc 1 49 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:49:52 2026-02-21T09:34:05.7026477Z add.s32 %r3226, %r3162, %r2328; 2026-02-21T09:34:05.7026548Z add.s32 %r3227, %r3163, %r2328; 2026-02-21T09:34:05.7026608Z add.s32 %r3228, %r3164, %r2328; 2026-02-21T09:34:05.7026666Z add.s32 %r3229, %r3165, %r2328; 2026-02-21T09:34:05.7026725Z add.s32 %r3230, %r3166, %r2328; 2026-02-21T09:34:05.7026791Z add.s32 %r3231, %r3167, %r2328; 2026-02-21T09:34:05.7026849Z add.s32 %r3232, %r3168, %r2328; 2026-02-21T09:34:05.7026907Z add.s32 %r3233, %r3169, %r2328; 2026-02-21T09:34:05.7026973Z add.s32 
%r3234, %r3170, %r2328; 2026-02-21T09:34:05.7027032Z add.s32 %r3235, %r3171, %r2328; 2026-02-21T09:34:05.7027089Z add.s32 %r3236, %r3172, %r2328; 2026-02-21T09:34:05.7027147Z add.s32 %r3237, %r3173, %r2328; 2026-02-21T09:34:05.7027214Z add.s32 %r3238, %r3174, %r2328; 2026-02-21T09:34:05.7027272Z add.s32 %r3239, %r3175, %r2328; 2026-02-21T09:34:05.7027330Z add.s32 %r3240, %r3176, %r2328; 2026-02-21T09:34:05.7027399Z add.s32 %r3241, %r3177, %r2328; 2026-02-21T09:34:05.7027483Z add.s32 %r3242, %r3178, %r2328; 2026-02-21T09:34:05.7027567Z add.s32 %r3243, %r3179, %r2328; 2026-02-21T09:34:05.7027633Z add.s32 %r3244, %r3180, %r2328; 2026-02-21T09:34:05.7027693Z add.s32 %r3245, %r3181, %r2328; 2026-02-21T09:34:05.7027752Z add.s32 %r3246, %r3182, %r2328; 2026-02-21T09:34:05.7027810Z add.s32 %r3247, %r3183, %r2328; 2026-02-21T09:34:05.7027877Z add.s32 %r3248, %r3184, %r2328; 2026-02-21T09:34:05.7027937Z add.s32 %r3249, %r3185, %r2328; 2026-02-21T09:34:05.7027997Z add.s32 %r3250, %r3186, %r2328; 2026-02-21T09:34:05.7028063Z add.s32 %r3251, %r3187, %r2328; 2026-02-21T09:34:05.7028122Z add.s32 %r3252, %r3188, %r2328; 2026-02-21T09:34:05.7028181Z add.s32 %r3253, %r3189, %r2328; 2026-02-21T09:34:05.7028240Z add.s32 %r3254, %r3190, %r2328; 2026-02-21T09:34:05.7028306Z add.s32 %r3255, %r3191, %r2328; 2026-02-21T09:34:05.7028365Z add.s32 %r3256, %r3192, %r2328; 2026-02-21T09:34:05.7028463Z add.s32 %r3257, %r3193, %r2328; 2026-02-21T09:34:05.7028560Z add.s32 %r3258, %r3194, %r2328; 2026-02-21T09:34:05.7028622Z add.s32 %r3259, %r3195, %r2328; 2026-02-21T09:34:05.7028681Z add.s32 %r3260, %r3196, %r2328; 2026-02-21T09:34:05.7028738Z add.s32 %r3261, %r3197, %r2328; 2026-02-21T09:34:05.7028803Z add.s32 %r3262, %r3198, %r2328; 2026-02-21T09:34:05.7028859Z add.s32 %r3263, %r3199, %r2328; 2026-02-21T09:34:05.7028917Z add.s32 %r3264, %r3200, %r2328; 2026-02-21T09:34:05.7028983Z add.s32 %r3265, %r3201, %r2328; 2026-02-21T09:34:05.7029042Z add.s32 %r3266, %r3202, %r2328; 
2026-02-21T09:34:05.7029099Z add.s32 %r3267, %r3203, %r2328; 2026-02-21T09:34:05.7029165Z add.s32 %r3268, %r3204, %r2328; 2026-02-21T09:34:05.7029223Z add.s32 %r3269, %r3205, %r2328; 2026-02-21T09:34:05.7029282Z add.s32 %r3270, %r3206, %r2328; 2026-02-21T09:34:05.7029340Z add.s32 %r3271, %r3207, %r2328; 2026-02-21T09:34:05.7029405Z add.s32 %r3272, %r3208, %r2328; 2026-02-21T09:34:05.7029464Z add.s32 %r3273, %r3209, %r2328; 2026-02-21T09:34:05.7029525Z add.s32 %r3274, %r3210, %r2328; 2026-02-21T09:34:05.7029592Z add.s32 %r3275, %r3211, %r2328; 2026-02-21T09:34:05.7029652Z add.s32 %r3276, %r3212, %r2328; 2026-02-21T09:34:05.7029709Z add.s32 %r3277, %r3213, %r2328; 2026-02-21T09:34:05.7029767Z add.s32 %r3278, %r3214, %r2328; 2026-02-21T09:34:05.7029833Z add.s32 %r3279, %r3215, %r2328; 2026-02-21T09:34:05.7029891Z add.s32 %r3280, %r3216, %r2328; 2026-02-21T09:34:05.7029950Z add.s32 %r3281, %r3217, %r2328; 2026-02-21T09:34:05.7030016Z add.s32 %r3282, %r3218, %r2328; 2026-02-21T09:34:05.7030075Z add.s32 %r3283, %r3219, %r2328; 2026-02-21T09:34:05.7030135Z add.s32 %r3284, %r3220, %r2328; 2026-02-21T09:34:05.7030201Z add.s32 %r3285, %r3221, %r2328; 2026-02-21T09:34:05.7030262Z add.s32 %r3286, %r3222, %r2328; 2026-02-21T09:34:05.7030319Z add.s32 %r3287, %r3223, %r2328; 2026-02-21T09:34:05.7030378Z add.s32 %r3288, %r3224, %r2328; 2026-02-21T09:34:05.7030443Z add.s32 %r3289, %r3225, %r2328; 2026-02-21T09:34:05.7030620Z .loc 1 49 24 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:49:24 2026-02-21T09:34:05.7030694Z mad.wide.s32 %rd211, %r3226, 2, %rd19; 2026-02-21T09:34:05.7030771Z mad.wide.s32 %rd212, %r3227, 2, %rd19; 2026-02-21T09:34:05.7030838Z mad.wide.s32 %rd213, %r3228, 2, %rd19; 2026-02-21T09:34:05.7030902Z mad.wide.s32 %rd214, %r3229, 2, %rd19; 2026-02-21T09:34:05.7030965Z mad.wide.s32 %rd215, %r3230, 2, %rd19; 2026-02-21T09:34:05.7031037Z mad.wide.s32 %rd216, %r3231, 2, %rd19; 2026-02-21T09:34:05.7031101Z mad.wide.s32 %rd217, %r3232, 2, %rd19; 
2026-02-21T09:34:05.7031165Z mad.wide.s32 %rd218, %r3233, 2, %rd19; 2026-02-21T09:34:05.7031237Z mad.wide.s32 %rd219, %r3234, 2, %rd19; 2026-02-21T09:34:05.7031302Z mad.wide.s32 %rd220, %r3235, 2, %rd19; 2026-02-21T09:34:05.7031366Z mad.wide.s32 %rd221, %r3236, 2, %rd19; 2026-02-21T09:34:05.7031436Z mad.wide.s32 %rd222, %r3237, 2, %rd19; 2026-02-21T09:34:05.7031501Z mad.wide.s32 %rd223, %r3238, 2, %rd19; 2026-02-21T09:34:05.7031566Z mad.wide.s32 %rd224, %r3239, 2, %rd19; 2026-02-21T09:34:05.7031651Z mad.wide.s32 %rd225, %r3240, 2, %rd19; 2026-02-21T09:34:05.7031745Z mad.wide.s32 %rd226, %r3241, 2, %rd19; 2026-02-21T09:34:05.7031808Z mad.wide.s32 %rd227, %r3242, 2, %rd19; 2026-02-21T09:34:05.7031872Z mad.wide.s32 %rd228, %r3243, 2, %rd19; 2026-02-21T09:34:05.7031944Z mad.wide.s32 %rd229, %r3244, 2, %rd19; 2026-02-21T09:34:05.7032007Z mad.wide.s32 %rd230, %r3245, 2, %rd19; 2026-02-21T09:34:05.7032069Z mad.wide.s32 %rd231, %r3246, 2, %rd19; 2026-02-21T09:34:05.7032143Z mad.wide.s32 %rd232, %r3247, 2, %rd19; 2026-02-21T09:34:05.7032208Z mad.wide.s32 %rd233, %r3248, 2, %rd19; 2026-02-21T09:34:05.7032272Z mad.wide.s32 %rd234, %r3249, 2, %rd19; 2026-02-21T09:34:05.7032336Z mad.wide.s32 %rd235, %r3250, 2, %rd19; 2026-02-21T09:34:05.7032411Z mad.wide.s32 %rd236, %r3251, 2, %rd19; 2026-02-21T09:34:05.7032475Z mad.wide.s32 %rd237, %r3252, 2, %rd19; 2026-02-21T09:34:05.7032609Z mad.wide.s32 %rd238, %r3253, 2, %rd19; 2026-02-21T09:34:05.7032684Z mad.wide.s32 %rd239, %r3254, 2, %rd19; 2026-02-21T09:34:05.7032750Z mad.wide.s32 %rd240, %r3255, 2, %rd19; 2026-02-21T09:34:05.7032813Z mad.wide.s32 %rd241, %r3256, 2, %rd19; 2026-02-21T09:34:05.7032876Z mad.wide.s32 %rd242, %r3257, 2, %rd19; 2026-02-21T09:34:05.7032948Z mad.wide.s32 %rd243, %r3258, 2, %rd19; 2026-02-21T09:34:05.7033010Z mad.wide.s32 %rd244, %r3259, 2, %rd19; 2026-02-21T09:34:05.7033074Z mad.wide.s32 %rd245, %r3260, 2, %rd19; 2026-02-21T09:34:05.7033145Z mad.wide.s32 %rd246, %r3261, 2, %rd19; 
2026-02-21T09:34:05.7033207Z mad.wide.s32 %rd247, %r3262, 2, %rd19; 2026-02-21T09:34:05.7033271Z mad.wide.s32 %rd248, %r3263, 2, %rd19; 2026-02-21T09:34:05.7033341Z mad.wide.s32 %rd249, %r3264, 2, %rd19; 2026-02-21T09:34:05.7033404Z mad.wide.s32 %rd250, %r3265, 2, %rd19; 2026-02-21T09:34:05.7033468Z mad.wide.s32 %rd251, %r3266, 2, %rd19; 2026-02-21T09:34:05.7033533Z mad.wide.s32 %rd252, %r3267, 2, %rd19; 2026-02-21T09:34:05.7033607Z mad.wide.s32 %rd253, %r3268, 2, %rd19; 2026-02-21T09:34:05.7033673Z mad.wide.s32 %rd254, %r3269, 2, %rd19; 2026-02-21T09:34:05.7033743Z mad.wide.s32 %rd255, %r3270, 2, %rd19; 2026-02-21T09:34:05.7033813Z mad.wide.s32 %rd256, %r3271, 2, %rd19; 2026-02-21T09:34:05.7033876Z mad.wide.s32 %rd257, %r3272, 2, %rd19; 2026-02-21T09:34:05.7033938Z mad.wide.s32 %rd258, %r3273, 2, %rd19; 2026-02-21T09:34:05.7034002Z mad.wide.s32 %rd259, %r3274, 2, %rd19; 2026-02-21T09:34:05.7034074Z mad.wide.s32 %rd260, %r3275, 2, %rd19; 2026-02-21T09:34:05.7034137Z mad.wide.s32 %rd261, %r3276, 2, %rd19; 2026-02-21T09:34:05.7034201Z mad.wide.s32 %rd262, %r3277, 2, %rd19; 2026-02-21T09:34:05.7034274Z mad.wide.s32 %rd263, %r3278, 2, %rd19; 2026-02-21T09:34:05.7034338Z mad.wide.s32 %rd264, %r3279, 2, %rd19; 2026-02-21T09:34:05.7034404Z mad.wide.s32 %rd265, %r3280, 2, %rd19; 2026-02-21T09:34:05.7034476Z mad.wide.s32 %rd266, %r3281, 2, %rd19; 2026-02-21T09:34:05.7034541Z mad.wide.s32 %rd267, %r3282, 2, %rd19; 2026-02-21T09:34:05.7034608Z mad.wide.s32 %rd268, %r3283, 2, %rd19; 2026-02-21T09:34:05.7034710Z mad.wide.s32 %rd269, %r3284, 2, %rd19; 2026-02-21T09:34:05.7034783Z mad.wide.s32 %rd270, %r3285, 2, %rd19; 2026-02-21T09:34:05.7034846Z mad.wide.s32 %rd271, %r3286, 2, %rd19; 2026-02-21T09:34:05.7034912Z mad.wide.s32 %rd272, %r3287, 2, %rd19; 2026-02-21T09:34:05.7034982Z mad.wide.s32 %rd273, %r3288, 2, %rd19; 2026-02-21T09:34:05.7035046Z mad.wide.s32 %rd274, %r3289, 2, %rd19; 2026-02-21T09:34:05.7035227Z .loc 1 49 82 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:49:82 2026-02-21T09:34:05.7035286Z bar.sync 0, 128; 2026-02-21T09:34:05.7035401Z st.shared.v4.b32 [%r191], {%r2396, %r2408, %r2420, %r2432}; 2026-02-21T09:34:05.7035505Z st.shared.v4.b32 [%r192], {%r2444, %r2456, %r2468, %r2480}; 2026-02-21T09:34:05.7035603Z st.shared.v4.b32 [%r193], {%r2492, %r2504, %r2516, %r2528}; 2026-02-21T09:34:05.7035707Z st.shared.v4.b32 [%r194], {%r2540, %r2552, %r2564, %r2576}; 2026-02-21T09:34:05.7035806Z st.shared.v4.b32 [%r195], {%r2588, %r2600, %r2612, %r2624}; 2026-02-21T09:34:05.7035952Z st.shared.v4.b32 [%r196], {%r2636, %r2648, %r2660, %r2672}; 2026-02-21T09:34:05.7036055Z st.shared.v4.b32 [%r197], {%r2684, %r2696, %r2708, %r2720}; 2026-02-21T09:34:05.7036151Z st.shared.v4.b32 [%r198], {%r2732, %r2744, %r2756, %r2768}; 2026-02-21T09:34:05.7036209Z bar.sync 0, 128; 2026-02-21T09:34:05.7036278Z // begin inline asm 2026-02-21T09:34:05.7036451Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2071, %r2075, %r2079, %r2083}, [%r1755]; 2026-02-21T09:34:05.7036511Z // end inline asm 2026-02-21T09:34:05.7036570Z // begin inline asm 2026-02-21T09:34:05.7036742Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2087, %r2091, %r2095, %r2099}, [%r1760]; 2026-02-21T09:34:05.7036800Z // end inline asm 2026-02-21T09:34:05.7036858Z // begin inline asm 2026-02-21T09:34:05.7037054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2103, %r2107, %r2111, %r2115}, [%r1765]; 2026-02-21T09:34:05.7037139Z // end inline asm 2026-02-21T09:34:05.7037200Z // begin inline asm 2026-02-21T09:34:05.7037358Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2119, %r2123, %r2127, %r2131}, [%r1770]; 2026-02-21T09:34:05.7037421Z // end inline asm 2026-02-21T09:34:05.7037478Z // begin inline asm 2026-02-21T09:34:05.7037637Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2135, %r2139, %r2143, %r2147}, [%r1775]; 2026-02-21T09:34:05.7037699Z // end inline asm 2026-02-21T09:34:05.7037755Z // begin inline asm 2026-02-21T09:34:05.7037907Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2151, %r2155, %r2159, %r2163}, [%r1780]; 2026-02-21T09:34:05.7037970Z // end inline asm 2026-02-21T09:34:05.7038026Z // begin inline asm 2026-02-21T09:34:05.7038178Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2167, %r2171, %r2175, %r2179}, [%r1785]; 2026-02-21T09:34:05.7038233Z // end inline asm 2026-02-21T09:34:05.7038297Z // begin inline asm 2026-02-21T09:34:05.7038482Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2183, %r2187, %r2191, %r2195}, [%r1790]; 2026-02-21T09:34:05.7038544Z // end inline asm 2026-02-21T09:34:05.7038615Z bar.sync 0, 128; 2026-02-21T09:34:05.7038719Z st.shared.v4.b32 [%r191], {%r2780, %r2792, %r2804, %r2816}; 2026-02-21T09:34:05.7038821Z st.shared.v4.b32 [%r192], {%r2828, %r2840, %r2852, %r2864}; 2026-02-21T09:34:05.7038930Z st.shared.v4.b32 [%r193], {%r2876, %r2888, %r2900, %r2912}; 2026-02-21T09:34:05.7039030Z st.shared.v4.b32 [%r194], {%r2924, %r2936, %r2948, %r2960}; 2026-02-21T09:34:05.7039130Z st.shared.v4.b32 [%r195], {%r2972, %r2984, %r2996, %r3008}; 2026-02-21T09:34:05.7039228Z st.shared.v4.b32 [%r196], {%r3020, %r3032, %r3044, %r3056}; 2026-02-21T09:34:05.7039335Z st.shared.v4.b32 [%r197], {%r3068, %r3080, %r3092, %r3104}; 2026-02-21T09:34:05.7039434Z st.shared.v4.b32 [%r198], {%r3116, %r3128, %r3140, %r3152}; 2026-02-21T09:34:05.7039495Z bar.sync 0, 128; 2026-02-21T09:34:05.7039564Z // begin inline asm 2026-02-21T09:34:05.7039739Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2199, %r2203, %r2207, %r2211}, [%r1755]; 2026-02-21T09:34:05.7039801Z // end inline asm 2026-02-21T09:34:05.7039874Z // begin inline asm 2026-02-21T09:34:05.7040045Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2215, %r2219, %r2223, %r2227}, [%r1760]; 2026-02-21T09:34:05.7040105Z // end inline asm 2026-02-21T09:34:05.7040167Z // begin inline asm 2026-02-21T09:34:05.7040340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2231, %r2235, %r2239, %r2243}, [%r1765]; 2026-02-21T09:34:05.7040396Z // end inline asm 
2026-02-21T09:34:05.7040454Z // begin inline asm 2026-02-21T09:34:05.7040620Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2247, %r2251, %r2255, %r2259}, [%r1770]; 2026-02-21T09:34:05.7040677Z // end inline asm 2026-02-21T09:34:05.7040735Z // begin inline asm 2026-02-21T09:34:05.7040889Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2263, %r2267, %r2271, %r2275}, [%r1775]; 2026-02-21T09:34:05.7040962Z // end inline asm 2026-02-21T09:34:05.7041028Z // begin inline asm 2026-02-21T09:34:05.7041198Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2279, %r2283, %r2287, %r2291}, [%r1780]; 2026-02-21T09:34:05.7041318Z // end inline asm 2026-02-21T09:34:05.7041380Z // begin inline asm 2026-02-21T09:34:05.7041552Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2295, %r2299, %r2303, %r2307}, [%r1785]; 2026-02-21T09:34:05.7041622Z // end inline asm 2026-02-21T09:34:05.7041683Z // begin inline asm 2026-02-21T09:34:05.7041849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2311, %r2315, %r2319, %r2323}, [%r1790]; 2026-02-21T09:34:05.7041909Z // end inline asm 2026-02-21T09:34:05.7041977Z bar.sync 0, 128; 2026-02-21T09:34:05.7042081Z st.shared.v4.b32 [%r191], {%r2399, %r2411, %r2423, %r2435}; 2026-02-21T09:34:05.7042185Z st.shared.v4.b32 [%r192], {%r2447, %r2459, %r2471, %r2483}; 2026-02-21T09:34:05.7042295Z st.shared.v4.b32 [%r193], {%r2495, %r2507, %r2519, %r2531}; 2026-02-21T09:34:05.7042396Z st.shared.v4.b32 [%r194], {%r2543, %r2555, %r2567, %r2579}; 2026-02-21T09:34:05.7042543Z st.shared.v4.b32 [%r195], {%r2591, %r2603, %r2615, %r2627}; 2026-02-21T09:34:05.7042658Z st.shared.v4.b32 [%r196], {%r2639, %r2651, %r2663, %r2675}; 2026-02-21T09:34:05.7042762Z st.shared.v4.b32 [%r197], {%r2687, %r2699, %r2711, %r2723}; 2026-02-21T09:34:05.7042863Z st.shared.v4.b32 [%r198], {%r2735, %r2747, %r2759, %r2771}; 2026-02-21T09:34:05.7042925Z bar.sync 0, 128; 2026-02-21T09:34:05.7042996Z // begin inline asm 2026-02-21T09:34:05.7043173Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2072, %r2076, %r2080, 
%r2084}, [%r1755]; 2026-02-21T09:34:05.7043235Z // end inline asm 2026-02-21T09:34:05.7043307Z // begin inline asm 2026-02-21T09:34:05.7043484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2088, %r2092, %r2096, %r2100}, [%r1760]; 2026-02-21T09:34:05.7043545Z // end inline asm 2026-02-21T09:34:05.7043614Z // begin inline asm 2026-02-21T09:34:05.7043785Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2104, %r2108, %r2112, %r2116}, [%r1765]; 2026-02-21T09:34:05.7043847Z // end inline asm 2026-02-21T09:34:05.7043913Z // begin inline asm 2026-02-21T09:34:05.7044093Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2120, %r2124, %r2128, %r2132}, [%r1770]; 2026-02-21T09:34:05.7044156Z // end inline asm 2026-02-21T09:34:05.7044217Z // begin inline asm 2026-02-21T09:34:05.7044395Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2136, %r2140, %r2144, %r2148}, [%r1775]; 2026-02-21T09:34:05.7044455Z // end inline asm 2026-02-21T09:34:05.7044517Z // begin inline asm 2026-02-21T09:34:05.7044716Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2152, %r2156, %r2160, %r2164}, [%r1780]; 2026-02-21T09:34:05.7044786Z // end inline asm 2026-02-21T09:34:05.7044847Z // begin inline asm 2026-02-21T09:34:05.7045019Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2168, %r2172, %r2176, %r2180}, [%r1785]; 2026-02-21T09:34:05.7045088Z // end inline asm 2026-02-21T09:34:05.7045151Z // begin inline asm 2026-02-21T09:34:05.7045322Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2184, %r2188, %r2192, %r2196}, [%r1790]; 2026-02-21T09:34:05.7045391Z // end inline asm 2026-02-21T09:34:05.7045454Z bar.sync 0, 128; 2026-02-21T09:34:05.7045560Z st.shared.v4.b32 [%r191], {%r2783, %r2795, %r2807, %r2819}; 2026-02-21T09:34:05.7045665Z st.shared.v4.b32 [%r192], {%r2831, %r2843, %r2855, %r2867}; 2026-02-21T09:34:05.7045776Z st.shared.v4.b32 [%r193], {%r2879, %r2891, %r2903, %r2915}; 2026-02-21T09:34:05.7045876Z st.shared.v4.b32 [%r194], {%r2927, %r2939, %r2951, %r2963}; 2026-02-21T09:34:05.7045978Z st.shared.v4.b32 [%r195], 
{%r2975, %r2987, %r2999, %r3011}; 2026-02-21T09:34:05.7046085Z st.shared.v4.b32 [%r196], {%r3023, %r3035, %r3047, %r3059}; 2026-02-21T09:34:05.7046183Z st.shared.v4.b32 [%r197], {%r3071, %r3083, %r3095, %r3107}; 2026-02-21T09:34:05.7046283Z st.shared.v4.b32 [%r198], {%r3119, %r3131, %r3143, %r3155}; 2026-02-21T09:34:05.7046350Z bar.sync 0, 128; 2026-02-21T09:34:05.7046412Z // begin inline asm 2026-02-21T09:34:05.7046584Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2200, %r2204, %r2208, %r2212}, [%r1755]; 2026-02-21T09:34:05.7046647Z // end inline asm 2026-02-21T09:34:05.7046748Z // begin inline asm 2026-02-21T09:34:05.7046968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2216, %r2220, %r2224, %r2228}, [%r1760]; 2026-02-21T09:34:05.7047036Z // end inline asm 2026-02-21T09:34:05.7047100Z // begin inline asm 2026-02-21T09:34:05.7047253Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2232, %r2236, %r2240, %r2244}, [%r1765]; 2026-02-21T09:34:05.7047309Z // end inline asm 2026-02-21T09:34:05.7047372Z // begin inline asm 2026-02-21T09:34:05.7047524Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2248, %r2252, %r2256, %r2260}, [%r1770]; 2026-02-21T09:34:05.7047579Z // end inline asm 2026-02-21T09:34:05.7047634Z // begin inline asm 2026-02-21T09:34:05.7047794Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2264, %r2268, %r2272, %r2276}, [%r1775]; 2026-02-21T09:34:05.7047850Z // end inline asm 2026-02-21T09:34:05.7047907Z // begin inline asm 2026-02-21T09:34:05.7048116Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2280, %r2284, %r2288, %r2292}, [%r1780]; 2026-02-21T09:34:05.7048176Z // end inline asm 2026-02-21T09:34:05.7048234Z // begin inline asm 2026-02-21T09:34:05.7048392Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2296, %r2300, %r2304, %r2308}, [%r1785]; 2026-02-21T09:34:05.7048447Z // end inline asm 2026-02-21T09:34:05.7048504Z // begin inline asm 2026-02-21T09:34:05.7048654Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2312, %r2316, %r2320, %r2324}, [%r1790]; 
2026-02-21T09:34:05.7048716Z // end inline asm 2026-02-21T09:34:05.7048771Z bar.sync 0, 128; 2026-02-21T09:34:05.7048866Z st.shared.v4.b32 [%r191], {%r2402, %r2414, %r2426, %r2438}; 2026-02-21T09:34:05.7048969Z st.shared.v4.b32 [%r192], {%r2450, %r2462, %r2474, %r2486}; 2026-02-21T09:34:05.7049064Z st.shared.v4.b32 [%r193], {%r2498, %r2510, %r2522, %r2534}; 2026-02-21T09:34:05.7049159Z st.shared.v4.b32 [%r194], {%r2546, %r2558, %r2570, %r2582}; 2026-02-21T09:34:05.7049261Z st.shared.v4.b32 [%r195], {%r2594, %r2606, %r2618, %r2630}; 2026-02-21T09:34:05.7049356Z st.shared.v4.b32 [%r196], {%r2642, %r2654, %r2666, %r2678}; 2026-02-21T09:34:05.7049449Z st.shared.v4.b32 [%r197], {%r2690, %r2702, %r2714, %r2726}; 2026-02-21T09:34:05.7049542Z st.shared.v4.b32 [%r198], {%r2738, %r2750, %r2762, %r2774}; 2026-02-21T09:34:05.7049605Z bar.sync 0, 128; 2026-02-21T09:34:05.7049662Z // begin inline asm 2026-02-21T09:34:05.7049816Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2073, %r2077, %r2081, %r2085}, [%r1755]; 2026-02-21T09:34:05.7049879Z // end inline asm 2026-02-21T09:34:05.7049936Z // begin inline asm 2026-02-21T09:34:05.7050088Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2089, %r2093, %r2097, %r2101}, [%r1760]; 2026-02-21T09:34:05.7050145Z // end inline asm 2026-02-21T09:34:05.7050210Z // begin inline asm 2026-02-21T09:34:05.7050363Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2105, %r2109, %r2113, %r2117}, [%r1765]; 2026-02-21T09:34:05.7050420Z // end inline asm 2026-02-21T09:34:05.7050491Z // begin inline asm 2026-02-21T09:34:05.7050647Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2121, %r2125, %r2129, %r2133}, [%r1770]; 2026-02-21T09:34:05.7050707Z // end inline asm 2026-02-21T09:34:05.7050775Z // begin inline asm 2026-02-21T09:34:05.7050928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2137, %r2141, %r2145, %r2149}, [%r1775]; 2026-02-21T09:34:05.7050984Z // end inline asm 2026-02-21T09:34:05.7051041Z // begin inline asm 2026-02-21T09:34:05.7051199Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2153, %r2157, %r2161, %r2165}, [%r1780]; 2026-02-21T09:34:05.7051255Z // end inline asm 2026-02-21T09:34:05.7051312Z // begin inline asm 2026-02-21T09:34:05.7051468Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2169, %r2173, %r2177, %r2181}, [%r1785]; 2026-02-21T09:34:05.7051523Z // end inline asm 2026-02-21T09:34:05.7051579Z // begin inline asm 2026-02-21T09:34:05.7051736Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2185, %r2189, %r2193, %r2197}, [%r1790]; 2026-02-21T09:34:05.7051792Z // end inline asm 2026-02-21T09:34:05.7051849Z bar.sync 0, 128; 2026-02-21T09:34:05.7051946Z st.shared.v4.b32 [%r191], {%r2786, %r2798, %r2810, %r2822}; 2026-02-21T09:34:05.7052100Z st.shared.v4.b32 [%r192], {%r2834, %r2846, %r2858, %r2870}; 2026-02-21T09:34:05.7052192Z st.shared.v4.b32 [%r193], {%r2882, %r2894, %r2906, %r2918}; 2026-02-21T09:34:05.7052285Z st.shared.v4.b32 [%r194], {%r2930, %r2942, %r2954, %r2966}; 2026-02-21T09:34:05.7052386Z st.shared.v4.b32 [%r195], {%r2978, %r2990, %r3002, %r3014}; 2026-02-21T09:34:05.7052479Z st.shared.v4.b32 [%r196], {%r3026, %r3038, %r3050, %r3062}; 2026-02-21T09:34:05.7052572Z st.shared.v4.b32 [%r197], {%r3074, %r3086, %r3098, %r3110}; 2026-02-21T09:34:05.7052671Z st.shared.v4.b32 [%r198], {%r3122, %r3134, %r3146, %r3158}; 2026-02-21T09:34:05.7052728Z bar.sync 0, 128; 2026-02-21T09:34:05.7052784Z // begin inline asm 2026-02-21T09:34:05.7052943Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2201, %r2205, %r2209, %r2213}, [%r1755]; 2026-02-21T09:34:05.7053043Z // end inline asm 2026-02-21T09:34:05.7053102Z // begin inline asm 2026-02-21T09:34:05.7053281Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2217, %r2221, %r2225, %r2229}, [%r1760]; 2026-02-21T09:34:05.7053352Z // end inline asm 2026-02-21T09:34:05.7053409Z // begin inline asm 2026-02-21T09:34:05.7053564Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2233, %r2237, %r2241, %r2245}, [%r1765]; 2026-02-21T09:34:05.7053626Z // end inline asm 
2026-02-21T09:34:05.7053683Z // begin inline asm 2026-02-21T09:34:05.7053834Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2249, %r2253, %r2257, %r2261}, [%r1770]; 2026-02-21T09:34:05.7053889Z // end inline asm 2026-02-21T09:34:05.7053954Z // begin inline asm 2026-02-21T09:34:05.7054106Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2265, %r2269, %r2273, %r2277}, [%r1775]; 2026-02-21T09:34:05.7054162Z // end inline asm 2026-02-21T09:34:05.7054225Z // begin inline asm 2026-02-21T09:34:05.7054377Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2281, %r2285, %r2289, %r2293}, [%r1780]; 2026-02-21T09:34:05.7054434Z // end inline asm 2026-02-21T09:34:05.7054493Z // begin inline asm 2026-02-21T09:34:05.7054654Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2297, %r2301, %r2305, %r2309}, [%r1785]; 2026-02-21T09:34:05.7054743Z // end inline asm 2026-02-21T09:34:05.7054801Z // begin inline asm 2026-02-21T09:34:05.7054976Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2313, %r2317, %r2321, %r2325}, [%r1790]; 2026-02-21T09:34:05.7055037Z // end inline asm 2026-02-21T09:34:05.7055096Z bar.sync 0, 128; 2026-02-21T09:34:05.7055206Z st.shared.v4.b32 [%r191], {%r2405, %r2417, %r2429, %r2441}; 2026-02-21T09:34:05.7055308Z st.shared.v4.b32 [%r192], {%r2453, %r2465, %r2477, %r2489}; 2026-02-21T09:34:05.7055409Z st.shared.v4.b32 [%r193], {%r2501, %r2513, %r2525, %r2537}; 2026-02-21T09:34:05.7055521Z st.shared.v4.b32 [%r194], {%r2549, %r2561, %r2573, %r2585}; 2026-02-21T09:34:05.7055620Z st.shared.v4.b32 [%r195], {%r2597, %r2609, %r2621, %r2633}; 2026-02-21T09:34:05.7055713Z st.shared.v4.b32 [%r196], {%r2645, %r2657, %r2669, %r2681}; 2026-02-21T09:34:05.7055808Z st.shared.v4.b32 [%r197], {%r2693, %r2705, %r2717, %r2729}; 2026-02-21T09:34:05.7055910Z st.shared.v4.b32 [%r198], {%r2741, %r2753, %r2765, %r2777}; 2026-02-21T09:34:05.7055965Z bar.sync 0, 128; 2026-02-21T09:34:05.7056021Z // begin inline asm 2026-02-21T09:34:05.7056184Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2074, %r2078, %r2082, 
%r2086}, [%r1755]; 2026-02-21T09:34:05.7056240Z // end inline asm 2026-02-21T09:34:05.7056295Z // begin inline asm 2026-02-21T09:34:05.7056450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2090, %r2094, %r2098, %r2102}, [%r1760]; 2026-02-21T09:34:05.7056513Z // end inline asm 2026-02-21T09:34:05.7056569Z // begin inline asm 2026-02-21T09:34:05.7056724Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2106, %r2110, %r2114, %r2118}, [%r1765]; 2026-02-21T09:34:05.7056788Z // end inline asm 2026-02-21T09:34:05.7056845Z // begin inline asm 2026-02-21T09:34:05.7056998Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2122, %r2126, %r2130, %r2134}, [%r1770]; 2026-02-21T09:34:05.7057062Z // end inline asm 2026-02-21T09:34:05.7057167Z // begin inline asm 2026-02-21T09:34:05.7057350Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2138, %r2142, %r2146, %r2150}, [%r1775]; 2026-02-21T09:34:05.7057406Z // end inline asm 2026-02-21T09:34:05.7057470Z // begin inline asm 2026-02-21T09:34:05.7057624Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2154, %r2158, %r2162, %r2166}, [%r1780]; 2026-02-21T09:34:05.7057679Z // end inline asm 2026-02-21T09:34:05.7057741Z // begin inline asm 2026-02-21T09:34:05.7057895Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2170, %r2174, %r2178, %r2182}, [%r1785]; 2026-02-21T09:34:05.7057950Z // end inline asm 2026-02-21T09:34:05.7058006Z // begin inline asm 2026-02-21T09:34:05.7058169Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2186, %r2190, %r2194, %r2198}, [%r1790]; 2026-02-21T09:34:05.7058224Z // end inline asm 2026-02-21T09:34:05.7058279Z bar.sync 0, 128; 2026-02-21T09:34:05.7058443Z st.shared.v4.b32 [%r191], {%r2789, %r2801, %r2813, %r2825}; 2026-02-21T09:34:05.7058549Z st.shared.v4.b32 [%r192], {%r2837, %r2849, %r2861, %r2873}; 2026-02-21T09:34:05.7058654Z st.shared.v4.b32 [%r193], {%r2885, %r2897, %r2909, %r2921}; 2026-02-21T09:34:05.7058761Z st.shared.v4.b32 [%r194], {%r2933, %r2945, %r2957, %r2969}; 2026-02-21T09:34:05.7058865Z st.shared.v4.b32 [%r195], 
{%r2981, %r2993, %r3005, %r3017}; 2026-02-21T09:34:05.7058964Z st.shared.v4.b32 [%r196], {%r3029, %r3041, %r3053, %r3065}; 2026-02-21T09:34:05.7059065Z st.shared.v4.b32 [%r197], {%r3077, %r3089, %r3101, %r3113}; 2026-02-21T09:34:05.7059180Z st.shared.v4.b32 [%r198], {%r3125, %r3137, %r3149, %r3161}; 2026-02-21T09:34:05.7059236Z bar.sync 0, 128; 2026-02-21T09:34:05.7059293Z // begin inline asm 2026-02-21T09:34:05.7059455Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2202, %r2206, %r2210, %r2214}, [%r1755]; 2026-02-21T09:34:05.7059511Z // end inline asm 2026-02-21T09:34:05.7059567Z // begin inline asm 2026-02-21T09:34:05.7059737Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2218, %r2222, %r2226, %r2230}, [%r1760]; 2026-02-21T09:34:05.7059796Z // end inline asm 2026-02-21T09:34:05.7059855Z // begin inline asm 2026-02-21T09:34:05.7060013Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2234, %r2238, %r2242, %r2246}, [%r1765]; 2026-02-21T09:34:05.7060078Z // end inline asm 2026-02-21T09:34:05.7060139Z // begin inline asm 2026-02-21T09:34:05.7060294Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2250, %r2254, %r2258, %r2262}, [%r1770]; 2026-02-21T09:34:05.7060358Z // end inline asm 2026-02-21T09:34:05.7060415Z // begin inline asm 2026-02-21T09:34:05.7060571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2266, %r2270, %r2274, %r2278}, [%r1775]; 2026-02-21T09:34:05.7060634Z // end inline asm 2026-02-21T09:34:05.7060692Z // begin inline asm 2026-02-21T09:34:05.7060846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2282, %r2286, %r2290, %r2294}, [%r1780]; 2026-02-21T09:34:05.7060901Z // end inline asm 2026-02-21T09:34:05.7060965Z // begin inline asm 2026-02-21T09:34:05.7061119Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2298, %r2302, %r2306, %r2310}, [%r1785]; 2026-02-21T09:34:05.7061178Z // end inline asm 2026-02-21T09:34:05.7061241Z // begin inline asm 2026-02-21T09:34:05.7061393Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2314, %r2318, %r2322, %r2326}, [%r1790]; 
2026-02-21T09:34:05.7061447Z // end inline asm 2026-02-21T09:34:05.7061504Z // begin inline asm 2026-02-21T09:34:05.7061629Z st.global.v4.b32 [ %rd211 + 0 ], { %r2071, %r2072, %r2073, %r2074 }; 2026-02-21T09:34:05.7061686Z // end inline asm 2026-02-21T09:34:05.7061741Z // begin inline asm 2026-02-21T09:34:05.7061863Z st.global.v4.b32 [ %rd212 + 0 ], { %r2075, %r2076, %r2077, %r2078 }; 2026-02-21T09:34:05.7061919Z // end inline asm 2026-02-21T09:34:05.7061975Z // begin inline asm 2026-02-21T09:34:05.7062090Z st.global.v4.b32 [ %rd213 + 0 ], { %r2079, %r2080, %r2081, %r2082 }; 2026-02-21T09:34:05.7062146Z // end inline asm 2026-02-21T09:34:05.7062204Z // begin inline asm 2026-02-21T09:34:05.7062311Z st.global.v4.b32 [ %rd214 + 0 ], { %r2083, %r2084, %r2085, %r2086 }; 2026-02-21T09:34:05.7062417Z // end inline asm 2026-02-21T09:34:05.7062473Z // begin inline asm 2026-02-21T09:34:05.7062577Z st.global.v4.b32 [ %rd215 + 0 ], { %r2087, %r2088, %r2089, %r2090 }; 2026-02-21T09:34:05.7062639Z // end inline asm 2026-02-21T09:34:05.7062695Z // begin inline asm 2026-02-21T09:34:05.7062797Z st.global.v4.b32 [ %rd216 + 0 ], { %r2091, %r2092, %r2093, %r2094 }; 2026-02-21T09:34:05.7062853Z // end inline asm 2026-02-21T09:34:05.7062916Z // begin inline asm 2026-02-21T09:34:05.7063017Z st.global.v4.b32 [ %rd217 + 0 ], { %r2095, %r2096, %r2097, %r2098 }; 2026-02-21T09:34:05.7063072Z // end inline asm 2026-02-21T09:34:05.7063136Z // begin inline asm 2026-02-21T09:34:05.7063237Z st.global.v4.b32 [ %rd218 + 0 ], { %r2099, %r2100, %r2101, %r2102 }; 2026-02-21T09:34:05.7063298Z // end inline asm 2026-02-21T09:34:05.7063366Z // begin inline asm 2026-02-21T09:34:05.7063521Z st.global.v4.b32 [ %rd219 + 0 ], { %r2103, %r2104, %r2105, %r2106 }; 2026-02-21T09:34:05.7063584Z // end inline asm 2026-02-21T09:34:05.7063648Z // begin inline asm 2026-02-21T09:34:05.7063767Z st.global.v4.b32 [ %rd220 + 0 ], { %r2107, %r2108, %r2109, %r2110 }; 2026-02-21T09:34:05.7063826Z // end inline asm 
2026-02-21T09:34:05.7063888Z // begin inline asm 2026-02-21T09:34:05.7064018Z st.global.v4.b32 [ %rd221 + 0 ], { %r2111, %r2112, %r2113, %r2114 }; 2026-02-21T09:34:05.7064075Z // end inline asm 2026-02-21T09:34:05.7064131Z // begin inline asm 2026-02-21T09:34:05.7064231Z st.global.v4.b32 [ %rd222 + 0 ], { %r2115, %r2116, %r2117, %r2118 }; 2026-02-21T09:34:05.7064292Z // end inline asm 2026-02-21T09:34:05.7064348Z // begin inline asm 2026-02-21T09:34:05.7064448Z st.global.v4.b32 [ %rd223 + 0 ], { %r2119, %r2120, %r2121, %r2122 }; 2026-02-21T09:34:05.7064511Z // end inline asm 2026-02-21T09:34:05.7064568Z // begin inline asm 2026-02-21T09:34:05.7064706Z st.global.v4.b32 [ %rd224 + 0 ], { %r2123, %r2124, %r2125, %r2126 }; 2026-02-21T09:34:05.7064776Z // end inline asm 2026-02-21T09:34:05.7064835Z // begin inline asm 2026-02-21T09:34:05.7064937Z st.global.v4.b32 [ %rd225 + 0 ], { %r2127, %r2128, %r2129, %r2130 }; 2026-02-21T09:34:05.7064992Z // end inline asm 2026-02-21T09:34:05.7065055Z // begin inline asm 2026-02-21T09:34:05.7065155Z st.global.v4.b32 [ %rd226 + 0 ], { %r2131, %r2132, %r2133, %r2134 }; 2026-02-21T09:34:05.7065209Z // end inline asm 2026-02-21T09:34:05.7065271Z // begin inline asm 2026-02-21T09:34:05.7065373Z st.global.v4.b32 [ %rd227 + 0 ], { %r2135, %r2136, %r2137, %r2138 }; 2026-02-21T09:34:05.7065428Z // end inline asm 2026-02-21T09:34:05.7065484Z // begin inline asm 2026-02-21T09:34:05.7065592Z st.global.v4.b32 [ %rd228 + 0 ], { %r2139, %r2140, %r2141, %r2142 }; 2026-02-21T09:34:05.7065647Z // end inline asm 2026-02-21T09:34:05.7065704Z // begin inline asm 2026-02-21T09:34:05.7065812Z st.global.v4.b32 [ %rd229 + 0 ], { %r2143, %r2144, %r2145, %r2146 }; 2026-02-21T09:34:05.7065867Z // end inline asm 2026-02-21T09:34:05.7065925Z // begin inline asm 2026-02-21T09:34:05.7066029Z st.global.v4.b32 [ %rd230 + 0 ], { %r2147, %r2148, %r2149, %r2150 }; 2026-02-21T09:34:05.7066093Z // end inline asm 2026-02-21T09:34:05.7066150Z // begin inline asm 
2026-02-21T09:34:05.7066251Z st.global.v4.b32 [ %rd231 + 0 ], { %r2151, %r2152, %r2153, %r2154 }; 2026-02-21T09:34:05.7066315Z // end inline asm 2026-02-21T09:34:05.7066371Z // begin inline asm 2026-02-21T09:34:05.7066471Z st.global.v4.b32 [ %rd232 + 0 ], { %r2155, %r2156, %r2157, %r2158 }; 2026-02-21T09:34:05.7066534Z // end inline asm 2026-02-21T09:34:05.7066591Z // begin inline asm 2026-02-21T09:34:05.7066691Z st.global.v4.b32 [ %rd233 + 0 ], { %r2159, %r2160, %r2161, %r2162 }; 2026-02-21T09:34:05.7066746Z // end inline asm 2026-02-21T09:34:05.7066813Z // begin inline asm 2026-02-21T09:34:05.7066913Z st.global.v4.b32 [ %rd234 + 0 ], { %r2163, %r2164, %r2165, %r2166 }; 2026-02-21T09:34:05.7066970Z // end inline asm 2026-02-21T09:34:05.7067036Z // begin inline asm 2026-02-21T09:34:05.7067167Z st.global.v4.b32 [ %rd235 + 0 ], { %r2167, %r2168, %r2169, %r2170 }; 2026-02-21T09:34:05.7067251Z // end inline asm 2026-02-21T09:34:05.7067307Z // begin inline asm 2026-02-21T09:34:05.7067418Z st.global.v4.b32 [ %rd236 + 0 ], { %r2171, %r2172, %r2173, %r2174 }; 2026-02-21T09:34:05.7067475Z // end inline asm 2026-02-21T09:34:05.7067533Z // begin inline asm 2026-02-21T09:34:05.7067644Z st.global.v4.b32 [ %rd237 + 0 ], { %r2175, %r2176, %r2177, %r2178 }; 2026-02-21T09:34:05.7067699Z // end inline asm 2026-02-21T09:34:05.7067760Z // begin inline asm 2026-02-21T09:34:05.7067871Z st.global.v4.b32 [ %rd238 + 0 ], { %r2179, %r2180, %r2181, %r2182 }; 2026-02-21T09:34:05.7067936Z // end inline asm 2026-02-21T09:34:05.7067993Z // begin inline asm 2026-02-21T09:34:05.7068095Z st.global.v4.b32 [ %rd239 + 0 ], { %r2183, %r2184, %r2185, %r2186 }; 2026-02-21T09:34:05.7068159Z // end inline asm 2026-02-21T09:34:05.7068242Z // begin inline asm 2026-02-21T09:34:05.7068370Z st.global.v4.b32 [ %rd240 + 0 ], { %r2187, %r2188, %r2189, %r2190 }; 2026-02-21T09:34:05.7068435Z // end inline asm 2026-02-21T09:34:05.7068491Z // begin inline asm 2026-02-21T09:34:05.7068594Z st.global.v4.b32 [ %rd241 + 0 
], { %r2191, %r2192, %r2193, %r2194 }; 2026-02-21T09:34:05.7068647Z // end inline asm 2026-02-21T09:34:05.7068712Z // begin inline asm 2026-02-21T09:34:05.7068813Z st.global.v4.b32 [ %rd242 + 0 ], { %r2195, %r2196, %r2197, %r2198 }; 2026-02-21T09:34:05.7068868Z // end inline asm 2026-02-21T09:34:05.7068933Z // begin inline asm 2026-02-21T09:34:05.7069034Z st.global.v4.b32 [ %rd243 + 0 ], { %r2199, %r2200, %r2201, %r2202 }; 2026-02-21T09:34:05.7069089Z // end inline asm 2026-02-21T09:34:05.7069154Z // begin inline asm 2026-02-21T09:34:05.7069255Z st.global.v4.b32 [ %rd244 + 0 ], { %r2203, %r2204, %r2205, %r2206 }; 2026-02-21T09:34:05.7069310Z // end inline asm 2026-02-21T09:34:05.7069369Z // begin inline asm 2026-02-21T09:34:05.7069481Z st.global.v4.b32 [ %rd245 + 0 ], { %r2207, %r2208, %r2209, %r2210 }; 2026-02-21T09:34:05.7069538Z // end inline asm 2026-02-21T09:34:05.7069596Z // begin inline asm 2026-02-21T09:34:05.7069705Z st.global.v4.b32 [ %rd246 + 0 ], { %r2211, %r2212, %r2213, %r2214 }; 2026-02-21T09:34:05.7069760Z // end inline asm 2026-02-21T09:34:05.7069817Z // begin inline asm 2026-02-21T09:34:05.7069917Z st.global.v4.b32 [ %rd247 + 0 ], { %r2215, %r2216, %r2217, %r2218 }; 2026-02-21T09:34:05.7069981Z // end inline asm 2026-02-21T09:34:05.7070038Z // begin inline asm 2026-02-21T09:34:05.7070141Z st.global.v4.b32 [ %rd248 + 0 ], { %r2219, %r2220, %r2221, %r2222 }; 2026-02-21T09:34:05.7070204Z // end inline asm 2026-02-21T09:34:05.7070262Z // begin inline asm 2026-02-21T09:34:05.7070362Z st.global.v4.b32 [ %rd249 + 0 ], { %r2223, %r2224, %r2225, %r2226 }; 2026-02-21T09:34:05.7070425Z // end inline asm 2026-02-21T09:34:05.7070482Z // begin inline asm 2026-02-21T09:34:05.7070583Z st.global.v4.b32 [ %rd250 + 0 ], { %r2227, %r2228, %r2229, %r2230 }; 2026-02-21T09:34:05.7070639Z // end inline asm 2026-02-21T09:34:05.7070707Z // begin inline asm 2026-02-21T09:34:05.7070808Z st.global.v4.b32 [ %rd251 + 0 ], { %r2231, %r2232, %r2233, %r2234 }; 
2026-02-21T09:34:05.7070864Z // end inline asm 2026-02-21T09:34:05.7070927Z // begin inline asm 2026-02-21T09:34:05.7071028Z st.global.v4.b32 [ %rd252 + 0 ], { %r2235, %r2236, %r2237, %r2238 }; 2026-02-21T09:34:05.7071082Z // end inline asm 2026-02-21T09:34:05.7071138Z // begin inline asm 2026-02-21T09:34:05.7071246Z st.global.v4.b32 [ %rd253 + 0 ], { %r2239, %r2240, %r2241, %r2242 }; 2026-02-21T09:34:05.7071300Z // end inline asm 2026-02-21T09:34:05.7071356Z // begin inline asm 2026-02-21T09:34:05.7071465Z st.global.v4.b32 [ %rd254 + 0 ], { %r2243, %r2244, %r2245, %r2246 }; 2026-02-21T09:34:05.7071520Z // end inline asm 2026-02-21T09:34:05.7071576Z // begin inline asm 2026-02-21T09:34:05.7071678Z st.global.v4.b32 [ %rd255 + 0 ], { %r2247, %r2248, %r2249, %r2250 }; 2026-02-21T09:34:05.7071741Z // end inline asm 2026-02-21T09:34:05.7071799Z // begin inline asm 2026-02-21T09:34:05.7071925Z st.global.v4.b32 [ %rd256 + 0 ], { %r2251, %r2252, %r2253, %r2254 }; 2026-02-21T09:34:05.7072022Z // end inline asm 2026-02-21T09:34:05.7072078Z // begin inline asm 2026-02-21T09:34:05.7072179Z st.global.v4.b32 [ %rd257 + 0 ], { %r2255, %r2256, %r2257, %r2258 }; 2026-02-21T09:34:05.7072240Z // end inline asm 2026-02-21T09:34:05.7072296Z // begin inline asm 2026-02-21T09:34:05.7072397Z st.global.v4.b32 [ %rd258 + 0 ], { %r2259, %r2260, %r2261, %r2262 }; 2026-02-21T09:34:05.7072451Z // end inline asm 2026-02-21T09:34:05.7072515Z // begin inline asm 2026-02-21T09:34:05.7072616Z st.global.v4.b32 [ %rd259 + 0 ], { %r2263, %r2264, %r2265, %r2266 }; 2026-02-21T09:34:05.7072671Z // end inline asm 2026-02-21T09:34:05.7072734Z // begin inline asm 2026-02-21T09:34:05.7072835Z st.global.v4.b32 [ %rd260 + 0 ], { %r2267, %r2268, %r2269, %r2270 }; 2026-02-21T09:34:05.7072910Z // end inline asm 2026-02-21T09:34:05.7072991Z // begin inline asm 2026-02-21T09:34:05.7073105Z st.global.v4.b32 [ %rd261 + 0 ], { %r2271, %r2272, %r2273, %r2274 }; 2026-02-21T09:34:05.7073161Z // end inline asm 
2026-02-21T09:34:05.7073219Z // begin inline asm 2026-02-21T09:34:05.7073332Z st.global.v4.b32 [ %rd262 + 0 ], { %r2275, %r2276, %r2277, %r2278 }; 2026-02-21T09:34:05.7073387Z // end inline asm 2026-02-21T09:34:05.7073444Z // begin inline asm 2026-02-21T09:34:05.7073552Z st.global.v4.b32 [ %rd263 + 0 ], { %r2279, %r2280, %r2281, %r2282 }; 2026-02-21T09:34:05.7073607Z // end inline asm 2026-02-21T09:34:05.7073663Z // begin inline asm 2026-02-21T09:34:05.7073765Z st.global.v4.b32 [ %rd264 + 0 ], { %r2283, %r2284, %r2285, %r2286 }; 2026-02-21T09:34:05.7073827Z // end inline asm 2026-02-21T09:34:05.7073884Z // begin inline asm 2026-02-21T09:34:05.7073985Z st.global.v4.b32 [ %rd265 + 0 ], { %r2287, %r2288, %r2289, %r2290 }; 2026-02-21T09:34:05.7074048Z // end inline asm 2026-02-21T09:34:05.7074108Z // begin inline asm 2026-02-21T09:34:05.7074210Z st.global.v4.b32 [ %rd266 + 0 ], { %r2291, %r2292, %r2293, %r2294 }; 2026-02-21T09:34:05.7074269Z // end inline asm 2026-02-21T09:34:05.7074332Z // begin inline asm 2026-02-21T09:34:05.7074435Z st.global.v4.b32 [ %rd267 + 0 ], { %r2295, %r2296, %r2297, %r2298 }; 2026-02-21T09:34:05.7074490Z // end inline asm 2026-02-21T09:34:05.7074555Z // begin inline asm 2026-02-21T09:34:05.7074658Z st.global.v4.b32 [ %rd268 + 0 ], { %r2299, %r2300, %r2301, %r2302 }; 2026-02-21T09:34:05.7074752Z // end inline asm 2026-02-21T09:34:05.7074818Z // begin inline asm 2026-02-21T09:34:05.7074922Z st.global.v4.b32 [ %rd269 + 0 ], { %r2303, %r2304, %r2305, %r2306 }; 2026-02-21T09:34:05.7074978Z // end inline asm 2026-02-21T09:34:05.7075037Z // begin inline asm 2026-02-21T09:34:05.7075151Z st.global.v4.b32 [ %rd270 + 0 ], { %r2307, %r2308, %r2309, %r2310 }; 2026-02-21T09:34:05.7075207Z // end inline asm 2026-02-21T09:34:05.7075266Z // begin inline asm 2026-02-21T09:34:05.7075383Z st.global.v4.b32 [ %rd271 + 0 ], { %r2311, %r2312, %r2313, %r2314 }; 2026-02-21T09:34:05.7075442Z // end inline asm 2026-02-21T09:34:05.7075501Z // begin inline asm 
2026-02-21T09:34:05.7075603Z st.global.v4.b32 [ %rd272 + 0 ], { %r2315, %r2316, %r2317, %r2318 }; 2026-02-21T09:34:05.7075669Z // end inline asm 2026-02-21T09:34:05.7075726Z // begin inline asm 2026-02-21T09:34:05.7075830Z st.global.v4.b32 [ %rd273 + 0 ], { %r2319, %r2320, %r2321, %r2322 }; 2026-02-21T09:34:05.7075893Z // end inline asm 2026-02-21T09:34:05.7075949Z // begin inline asm 2026-02-21T09:34:05.7076052Z st.global.v4.b32 [ %rd274 + 0 ], { %r2323, %r2324, %r2325, %r2326 }; 2026-02-21T09:34:05.7076112Z // end inline asm 2026-02-21T09:34:05.7076170Z mov.b32 %r3359, 1; 2026-02-21T09:34:05.7076283Z $L__BB0_24: // in Loop: Header=BB0_20 Depth=1 2026-02-21T09:34:05.7076473Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7076549Z xor.b32 %r3363, %r3359, %r3363; 2026-02-21T09:34:05.7076735Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7076862Z add.s32 %r3353, %r3353, -1; 2026-02-21T09:34:05.7076939Z setp.ne.b32 %p93, %r3353, 0; 2026-02-21T09:34:05.7077001Z @%p93 bra $L__BB0_20; 2026-02-21T09:34:05.7077061Z bra.uni $L__BB0_25; 2026-02-21T09:34:05.7077177Z $L__BB0_20: // =>This Inner Loop Header: Depth=1 2026-02-21T09:34:05.7077239Z add.s32 %r1203, %r3358, 1; 2026-02-21T09:34:05.7077305Z setp.eq.b32 %p89, %r3358, 31; 2026-02-21T09:34:05.7077373Z selp.b32 %r3358, 0, %r1203, %p89; 2026-02-21T09:34:05.7077444Z setp.eq.b32 %p90, %r3358, 31; 2026-02-21T09:34:05.7077504Z @%p90 bra $L__BB0_23; 2026-02-21T09:34:05.7077607Z // %bb.21: // in Loop: Header=BB0_20 Depth=1 2026-02-21T09:34:05.7077680Z setp.ne.b32 %p91, %r3358, 0; 2026-02-21T09:34:05.7077768Z mov.b32 %r3359, %r1176; 2026-02-21T09:34:05.7077854Z @%p91 bra $L__BB0_24; 2026-02-21T09:34:05.7077938Z // %bb.22: // %.thread 2026-02-21T09:34:05.7078041Z // in Loop: Header=BB0_20 Depth=1 2026-02-21T09:34:05.7078103Z add.s32 %r3360, %r3360, 2368; 2026-02-21T09:34:05.7078285Z .loc 1 26 35 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:26:35 2026-02-21T09:34:05.7078364Z mul.hi.s32 %r3291, %r3360, 715827883; 2026-02-21T09:34:05.7078425Z shr.u32 %r3292, %r3291, 31; 2026-02-21T09:34:05.7078486Z shr.s32 %r3293, %r3291, 6; 2026-02-21T09:34:05.7078555Z add.s32 %r3294, %r3293, %r3292; 2026-02-21T09:34:05.7078731Z .loc 1 27 33 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:27:33 2026-02-21T09:34:05.7078793Z shl.b32 %r3295, %r3294, 3; 2026-02-21T09:34:05.7078966Z .loc 1 28 39 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:28:39 2026-02-21T09:34:05.7079033Z sub.s32 %r3296, 4, %r3295; 2026-02-21T09:34:05.7079206Z .loc 1 28 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:28:52 2026-02-21T09:34:05.7079268Z min.s32 %r3297, %r3296, 8; 2026-02-21T09:34:05.7079447Z .loc 1 29 45 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:45 2026-02-21T09:34:05.7079512Z mul.lo.s32 %r3298, %r3294, 384; 2026-02-21T09:34:05.7079573Z sub.s32 %r3299, %r3360, %r3298; 2026-02-21T09:34:05.7079748Z .loc 1 30 51 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:30:51 2026-02-21T09:34:05.7079811Z div.s32 %r3300, %r3299, %r3297; 2026-02-21T09:34:05.7079981Z .loc 1 29 64 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:64 2026-02-21T09:34:05.7080053Z mul.lo.s32 %r3301, %r3300, %r3297; 2026-02-21T09:34:05.7080113Z sub.s32 %r3302, %r3299, %r3301; 2026-02-21T09:34:05.7080284Z .loc 1 29 30 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:30 2026-02-21T09:34:05.7080346Z add.s32 %r3303, %r3302, %r3295; 2026-02-21T09:34:05.7080522Z .loc 1 31 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:31:27 2026-02-21T09:34:05.7080581Z shl.b32 %r3362, %r3303, 8; 2026-02-21T09:34:05.7080751Z .loc 1 33 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:33:27 2026-02-21T09:34:05.7080818Z shl.b32 %r3361, %r3300, 8; 2026-02-21T09:34:05.7080878Z mov.b32 %r3359, %r1176; 2026-02-21T09:34:05.7081059Z 
.loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7081124Z bra.uni $L__BB0_24; 2026-02-21T09:34:05.7081229Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:34:05.7081403Z .loc 1 0 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:0:130 2026-02-21T09:34:05.7081503Z ld.param.b64 %rd18, [_helion_matmul_param_1]; 2026-02-21T09:34:05.7081592Z ld.param.b64 %rd17, [_helion_matmul_param_0]; 2026-02-21T09:34:05.7081681Z mov.b32 %r226, global_smem; 2026-02-21T09:34:05.7081765Z add.s32 %r227, %r226, %r3; 2026-02-21T09:34:05.7081834Z mov.u32 %r6, %ctaid.x; 2026-02-21T09:34:05.7081898Z add.s32 %r389, %r1, -128; 2026-02-21T09:34:05.7081961Z shr.u32 %r390, %r389, 5; 2026-02-21T09:34:05.7082025Z bra.uni $L__BB0_2; 2026-02-21T09:34:05.7082129Z $L__BB0_16: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7082304Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7082363Z barrier.sync 1; 2026-02-21T09:34:05.7082429Z barrier.sync 1; 2026-02-21T09:34:05.7082515Z $L__BB0_2: // %.preheader 2026-02-21T09:34:05.7082608Z // =>This Loop Header: Depth=1 2026-02-21T09:34:05.7082730Z // Child Loop BB0_9 Depth 2 2026-02-21T09:34:05.7082922Z .loc 1 14 0 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:14 2026-02-21T09:34:05.7083008Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:34:05.7083075Z barrier.sync 1; 2026-02-21T09:34:05.7083143Z ld.shared.b8 %r225, [%r227+180244]; 2026-02-21T09:34:05.7083208Z setp.gt.u32 %p4, %r225, 4; 2026-02-21T09:34:05.7083268Z @%p4 bra $L__BB0_4; 2026-02-21T09:34:05.7083361Z // %bb.3: // %.preheader 2026-02-21T09:34:05.7083454Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7083520Z $L_brx_0: .branchtargets 2026-02-21T09:34:05.7083586Z $L__BB0_5, 2026-02-21T09:34:05.7083642Z $L__BB0_15, 2026-02-21T09:34:05.7083696Z $L__BB0_16, 2026-02-21T09:34:05.7083749Z $L__BB0_17, 2026-02-21T09:34:05.7083810Z $L__BB0_26; 
2026-02-21T09:34:05.7083873Z brx.idx %r225, $L_brx_0; 2026-02-21T09:34:05.7083974Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7084167Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7084241Z ld.shared.b32 %r441, [global_smem]; 2026-02-21T09:34:05.7084312Z ld.shared.b32 %r5, [global_smem+8]; 2026-02-21T09:34:05.7084378Z barrier.sync 1; 2026-02-21T09:34:05.7084551Z .loc 1 32 45 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:32:45 2026-02-21T09:34:05.7084614Z shr.u32 %r391, %r1, 2; 2026-02-21T09:34:05.7084706Z bfe.u32 %r7, %r1, 2, 5; 2026-02-21T09:34:05.7084778Z or.b32 %r8, %r7, 32; 2026-02-21T09:34:05.7084836Z or.b32 %r9, %r7, 64; 2026-02-21T09:34:05.7084896Z or.b32 %r10, %r7, 96; 2026-02-21T09:34:05.7084965Z or.b32 %r11, %r7, 128; 2026-02-21T09:34:05.7085022Z or.b32 %r12, %r7, 160; 2026-02-21T09:34:05.7085078Z or.b32 %r13, %r7, 192; 2026-02-21T09:34:05.7085137Z or.b32 %r14, %r391, 224; 2026-02-21T09:34:05.7085326Z .loc 1 40 48 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:40:48 2026-02-21T09:34:05.7085388Z shl.b32 %r392, %r1, 3; 2026-02-21T09:34:05.7085451Z and.b32 %r15, %r392, 24; 2026-02-21T09:34:05.7085639Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7085702Z setp.lt.s32 %p5, %r5, 1; 2026-02-21T09:34:05.7085763Z setp.gt.s32 %p6, %r5, 0; 2026-02-21T09:34:05.7085945Z .loc 1 26 35 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:26:35 2026-02-21T09:34:05.7086010Z mul.hi.u32 %r393, %r6, 715827883; 2026-02-21T09:34:05.7086069Z shr.u32 %r394, %r393, 6; 2026-02-21T09:34:05.7086243Z .loc 1 27 33 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:27:33 2026-02-21T09:34:05.7086309Z shl.b32 %r395, %r394, 3; 2026-02-21T09:34:05.7086482Z .loc 1 28 39 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:28:39 2026-02-21T09:34:05.7086541Z sub.s32 %r396, 4, %r395; 2026-02-21T09:34:05.7086719Z .loc 1 29 
45 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:45 2026-02-21T09:34:05.7086835Z mul.lo.s32 %r397, %r394, 384; 2026-02-21T09:34:05.7086896Z sub.s32 %r398, %r6, %r397; 2026-02-21T09:34:05.7087073Z .loc 1 30 51 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:30:51 2026-02-21T09:34:05.7087137Z div.s32 %r399, %r398, %r396; 2026-02-21T09:34:05.7087312Z .loc 1 29 64 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:64 2026-02-21T09:34:05.7087383Z mul.lo.s32 %r400, %r399, %r396; 2026-02-21T09:34:05.7087443Z sub.s32 %r401, %r398, %r400; 2026-02-21T09:34:05.7087619Z .loc 1 29 30 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:30 2026-02-21T09:34:05.7087679Z add.s32 %r402, %r401, %r395; 2026-02-21T09:34:05.7087878Z .loc 1 31 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:31:27 2026-02-21T09:34:05.7087963Z shl.b32 %r403, %r402, 8; 2026-02-21T09:34:05.7088137Z .loc 1 32 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:32:32 2026-02-21T09:34:05.7088208Z or.b32 %r3336, %r403, %r7; 2026-02-21T09:34:05.7088268Z or.b32 %r3337, %r403, %r8; 2026-02-21T09:34:05.7088327Z or.b32 %r3338, %r403, %r9; 2026-02-21T09:34:05.7088396Z or.b32 %r3339, %r403, %r10; 2026-02-21T09:34:05.7088458Z or.b32 %r3340, %r403, %r11; 2026-02-21T09:34:05.7088517Z or.b32 %r3341, %r403, %r12; 2026-02-21T09:34:05.7088575Z or.b32 %r3342, %r403, %r13; 2026-02-21T09:34:05.7088641Z or.b32 %r3343, %r403, %r14; 2026-02-21T09:34:05.7088818Z .loc 1 33 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:33:27 2026-02-21T09:34:05.7088877Z shl.b32 %r404, %r399, 8; 2026-02-21T09:34:05.7089057Z .loc 1 34 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:34:32 2026-02-21T09:34:05.7089121Z or.b32 %r3344, %r404, %r7; 2026-02-21T09:34:05.7089182Z or.b32 %r3345, %r404, %r8; 2026-02-21T09:34:05.7089249Z or.b32 %r3346, %r404, %r9; 2026-02-21T09:34:05.7089308Z or.b32 %r3347, %r404, %r10; 2026-02-21T09:34:05.7089367Z or.b32 %r3348, 
%r404, %r11; 2026-02-21T09:34:05.7089424Z or.b32 %r3349, %r404, %r12; 2026-02-21T09:34:05.7089489Z or.b32 %r3350, %r404, %r13; 2026-02-21T09:34:05.7089547Z or.b32 %r3351, %r404, %r14; 2026-02-21T09:34:05.7089720Z .loc 1 44 53 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:53 2026-02-21T09:34:05.7089786Z shl.b32 %r405, %r3344, 10; 2026-02-21T09:34:05.7089845Z shl.b32 %r406, %r3345, 10; 2026-02-21T09:34:05.7089903Z shl.b32 %r407, %r3346, 10; 2026-02-21T09:34:05.7089960Z shl.b32 %r408, %r3347, 10; 2026-02-21T09:34:05.7090027Z shl.b32 %r409, %r3348, 10; 2026-02-21T09:34:05.7090084Z shl.b32 %r410, %r3349, 10; 2026-02-21T09:34:05.7090141Z shl.b32 %r411, %r3350, 10; 2026-02-21T09:34:05.7090207Z shl.b32 %r412, %r3351, 10; 2026-02-21T09:34:05.7090382Z .loc 1 44 60 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:60 2026-02-21T09:34:05.7090443Z or.b32 %r413, %r405, %r15; 2026-02-21T09:34:05.7090508Z or.b32 %r414, %r406, %r15; 2026-02-21T09:34:05.7090567Z or.b32 %r415, %r407, %r15; 2026-02-21T09:34:05.7090624Z or.b32 %r416, %r408, %r15; 2026-02-21T09:34:05.7090680Z or.b32 %r417, %r409, %r15; 2026-02-21T09:34:05.7090745Z or.b32 %r418, %r410, %r15; 2026-02-21T09:34:05.7090802Z or.b32 %r419, %r411, %r15; 2026-02-21T09:34:05.7090860Z or.b32 %r420, %r412, %r15; 2026-02-21T09:34:05.7091043Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7091115Z mad.wide.s32 %rd20, %r413, 2, %rd17; 2026-02-21T09:34:05.7091183Z mad.wide.s32 %rd21, %r414, 2, %rd17; 2026-02-21T09:34:05.7091249Z mad.wide.s32 %rd22, %r415, 2, %rd17; 2026-02-21T09:34:05.7091322Z mad.wide.s32 %rd23, %r416, 2, %rd17; 2026-02-21T09:34:05.7091389Z mad.wide.s32 %rd24, %r417, 2, %rd17; 2026-02-21T09:34:05.7091488Z mad.wide.s32 %rd25, %r418, 2, %rd17; 2026-02-21T09:34:05.7091579Z mad.wide.s32 %rd26, %r419, 2, %rd17; 2026-02-21T09:34:05.7091642Z mad.wide.s32 %rd27, %r420, 2, %rd17; 2026-02-21T09:34:05.7091809Z .loc 1 44 85 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7091876Z shl.b32 %r421, %r1, 4; 2026-02-21T09:34:05.7091937Z and.b32 %r422, %r421, 2032; 2026-02-21T09:34:05.7091996Z shl.b32 %r423, %r1, 1; 2026-02-21T09:34:05.7092057Z and.b32 %r424, %r423, 48; 2026-02-21T09:34:05.7092126Z xor.b32 %r32, %r422, %r424; 2026-02-21T09:34:05.7092184Z add.s32 %r465, %r226, %r32; 2026-02-21T09:34:05.7092246Z selp.b32 %r229, 16, 0, %p6; 2026-02-21T09:34:05.7092312Z // begin inline asm 2026-02-21T09:34:05.7092436Z cp.async.cg.shared.global [ %r465 + 0 ], [ %rd20 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7092494Z // end inline asm 2026-02-21T09:34:05.7092573Z add.s32 %r467, %r465, 2048; 2026-02-21T09:34:05.7092660Z // begin inline asm 2026-02-21T09:34:05.7092785Z cp.async.cg.shared.global [ %r467 + 0 ], [ %rd21 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7092844Z // end inline asm 2026-02-21T09:34:05.7092914Z add.s32 %r469, %r465, 4096; 2026-02-21T09:34:05.7092974Z // begin inline asm 2026-02-21T09:34:05.7093090Z cp.async.cg.shared.global [ %r469 + 0 ], [ %rd22 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7093158Z // end inline asm 2026-02-21T09:34:05.7093217Z add.s32 %r471, %r465, 6144; 2026-02-21T09:34:05.7093274Z // begin inline asm 2026-02-21T09:34:05.7093388Z cp.async.cg.shared.global [ %r471 + 0 ], [ %rd23 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7093452Z // end inline asm 2026-02-21T09:34:05.7093511Z add.s32 %r473, %r465, 8192; 2026-02-21T09:34:05.7093567Z // begin inline asm 2026-02-21T09:34:05.7093686Z cp.async.cg.shared.global [ %r473 + 0 ], [ %rd24 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7093743Z // end inline asm 2026-02-21T09:34:05.7093804Z add.s32 %r475, %r465, 10240; 2026-02-21T09:34:05.7093862Z // begin inline asm 2026-02-21T09:34:05.7093982Z cp.async.cg.shared.global [ %r475 + 0 ], [ %rd25 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7094039Z // end inline asm 2026-02-21T09:34:05.7094100Z add.s32 %r477, %r465, 12288; 2026-02-21T09:34:05.7094164Z // begin 
inline asm 2026-02-21T09:34:05.7094275Z cp.async.cg.shared.global [ %r477 + 0 ], [ %rd26 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7094332Z // end inline asm 2026-02-21T09:34:05.7094391Z add.s32 %r479, %r465, 14336; 2026-02-21T09:34:05.7094455Z // begin inline asm 2026-02-21T09:34:05.7094565Z cp.async.cg.shared.global [ %r479 + 0 ], [ %rd27 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7094621Z // end inline asm 2026-02-21T09:34:05.7094726Z cp.async.commit_group; 2026-02-21T09:34:05.7094900Z .loc 1 45 80 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:80 2026-02-21T09:34:05.7094961Z shl.b32 %r425, %r3336, 10; 2026-02-21T09:34:05.7095030Z shl.b32 %r426, %r3337, 10; 2026-02-21T09:34:05.7095091Z shl.b32 %r427, %r3338, 10; 2026-02-21T09:34:05.7095151Z shl.b32 %r428, %r3339, 10; 2026-02-21T09:34:05.7095210Z shl.b32 %r429, %r3340, 10; 2026-02-21T09:34:05.7095274Z shl.b32 %r430, %r3341, 10; 2026-02-21T09:34:05.7095332Z shl.b32 %r431, %r3342, 10; 2026-02-21T09:34:05.7095389Z shl.b32 %r432, %r3343, 10; 2026-02-21T09:34:05.7095573Z .loc 1 45 59 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:59 2026-02-21T09:34:05.7095632Z or.b32 %r433, %r425, %r15; 2026-02-21T09:34:05.7095691Z or.b32 %r434, %r426, %r15; 2026-02-21T09:34:05.7095749Z or.b32 %r435, %r427, %r15; 2026-02-21T09:34:05.7095815Z or.b32 %r436, %r428, %r15; 2026-02-21T09:34:05.7095873Z or.b32 %r437, %r429, %r15; 2026-02-21T09:34:05.7095930Z or.b32 %r438, %r430, %r15; 2026-02-21T09:34:05.7095995Z or.b32 %r439, %r431, %r15; 2026-02-21T09:34:05.7096052Z or.b32 %r440, %r432, %r15; 2026-02-21T09:34:05.7096227Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7096329Z mad.wide.s32 %rd28, %r433, 2, %rd18; 2026-02-21T09:34:05.7096419Z mad.wide.s32 %rd29, %r434, 2, %rd18; 2026-02-21T09:34:05.7096483Z mad.wide.s32 %rd30, %r435, 2, %rd18; 2026-02-21T09:34:05.7096546Z mad.wide.s32 %rd31, %r436, 2, %rd18; 2026-02-21T09:34:05.7096618Z mad.wide.s32 %rd32, %r437, 2, 
%rd18; 2026-02-21T09:34:05.7096682Z mad.wide.s32 %rd33, %r438, 2, %rd18; 2026-02-21T09:34:05.7096745Z mad.wide.s32 %rd34, %r439, 2, %rd18; 2026-02-21T09:34:05.7096814Z mad.wide.s32 %rd35, %r440, 2, %rd18; 2026-02-21T09:34:05.7096986Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7097044Z add.s32 %r244, %r465, 81920; 2026-02-21T09:34:05.7097108Z // begin inline asm 2026-02-21T09:34:05.7097221Z cp.async.cg.shared.global [ %r244 + 0 ], [ %rd28 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7097278Z // end inline asm 2026-02-21T09:34:05.7097363Z add.s32 %r246, %r465, 83968; 2026-02-21T09:34:05.7097455Z // begin inline asm 2026-02-21T09:34:05.7097572Z cp.async.cg.shared.global [ %r246 + 0 ], [ %rd29 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7097629Z // end inline asm 2026-02-21T09:34:05.7097695Z add.s32 %r248, %r465, 86016; 2026-02-21T09:34:05.7097752Z // begin inline asm 2026-02-21T09:34:05.7097862Z cp.async.cg.shared.global [ %r248 + 0 ], [ %rd30 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7097916Z // end inline asm 2026-02-21T09:34:05.7097984Z add.s32 %r250, %r465, 88064; 2026-02-21T09:34:05.7098040Z // begin inline asm 2026-02-21T09:34:05.7098151Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd31 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7098213Z // end inline asm 2026-02-21T09:34:05.7098271Z add.s32 %r252, %r465, 90112; 2026-02-21T09:34:05.7098331Z // begin inline asm 2026-02-21T09:34:05.7098442Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd32 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7098505Z // end inline asm 2026-02-21T09:34:05.7098564Z add.s32 %r254, %r465, 92160; 2026-02-21T09:34:05.7098624Z // begin inline asm 2026-02-21T09:34:05.7098744Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd33 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7098801Z // end inline asm 2026-02-21T09:34:05.7098858Z add.s32 %r256, %r465, 94208; 2026-02-21T09:34:05.7098922Z // begin inline asm 2026-02-21T09:34:05.7099033Z cp.async.cg.shared.global [ %r256 + 0 
], [ %rd34 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7099088Z // end inline asm 2026-02-21T09:34:05.7099146Z add.s32 %r258, %r465, 96256; 2026-02-21T09:34:05.7099211Z // begin inline asm 2026-02-21T09:34:05.7099322Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd35 + 0 ], 0x10, %r229; 2026-02-21T09:34:05.7099377Z // end inline asm 2026-02-21T09:34:05.7099448Z cp.async.commit_group; 2026-02-21T09:34:05.7099624Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7099688Z setp.gt.s32 %p7, %r5, 1; 2026-02-21T09:34:05.7099865Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7099930Z cvt.s64.s32 %rd100, %r405; 2026-02-21T09:34:05.7099992Z cvt.u64.u32 %rd101, %r15; 2026-02-21T09:34:05.7100057Z or.b64 %rd102, %rd100, %rd101; 2026-02-21T09:34:05.7100127Z shl.b64 %rd103, %rd102, 1; 2026-02-21T09:34:05.7100187Z add.s64 %rd1, %rd17, %rd103; 2026-02-21T09:34:05.7100247Z add.s64 %rd36, %rd1, 64; 2026-02-21T09:34:05.7100314Z cvt.s64.s32 %rd104, %r406; 2026-02-21T09:34:05.7100379Z or.b64 %rd105, %rd104, %rd101; 2026-02-21T09:34:05.7100439Z shl.b64 %rd106, %rd105, 1; 2026-02-21T09:34:05.7100501Z add.s64 %rd2, %rd17, %rd106; 2026-02-21T09:34:05.7100571Z add.s64 %rd37, %rd2, 64; 2026-02-21T09:34:05.7100632Z cvt.s64.s32 %rd107, %r407; 2026-02-21T09:34:05.7100701Z or.b64 %rd108, %rd107, %rd101; 2026-02-21T09:34:05.7100767Z shl.b64 %rd109, %rd108, 1; 2026-02-21T09:34:05.7100827Z add.s64 %rd3, %rd17, %rd109; 2026-02-21T09:34:05.7100886Z add.s64 %rd38, %rd3, 64; 2026-02-21T09:34:05.7100947Z cvt.s64.s32 %rd110, %r408; 2026-02-21T09:34:05.7101040Z or.b64 %rd111, %rd110, %rd101; 2026-02-21T09:34:05.7101127Z shl.b64 %rd112, %rd111, 1; 2026-02-21T09:34:05.7101186Z add.s64 %rd4, %rd17, %rd112; 2026-02-21T09:34:05.7101253Z add.s64 %rd39, %rd4, 64; 2026-02-21T09:34:05.7101311Z cvt.s64.s32 %rd113, %r409; 2026-02-21T09:34:05.7101371Z or.b64 %rd114, %rd113, %rd101; 2026-02-21T09:34:05.7101430Z shl.b64 
%rd115, %rd114, 1; 2026-02-21T09:34:05.7101498Z add.s64 %rd5, %rd17, %rd115; 2026-02-21T09:34:05.7101557Z add.s64 %rd40, %rd5, 64; 2026-02-21T09:34:05.7101617Z cvt.s64.s32 %rd116, %r410; 2026-02-21T09:34:05.7101684Z or.b64 %rd117, %rd116, %rd101; 2026-02-21T09:34:05.7101743Z shl.b64 %rd118, %rd117, 1; 2026-02-21T09:34:05.7101804Z add.s64 %rd6, %rd17, %rd118; 2026-02-21T09:34:05.7101869Z add.s64 %rd41, %rd6, 64; 2026-02-21T09:34:05.7101928Z cvt.s64.s32 %rd119, %r411; 2026-02-21T09:34:05.7102008Z or.b64 %rd120, %rd119, %rd101; 2026-02-21T09:34:05.7102089Z shl.b64 %rd121, %rd120, 1; 2026-02-21T09:34:05.7102160Z add.s64 %rd7, %rd17, %rd121; 2026-02-21T09:34:05.7102220Z add.s64 %rd42, %rd7, 64; 2026-02-21T09:34:05.7102280Z cvt.s64.s32 %rd122, %r412; 2026-02-21T09:34:05.7102348Z or.b64 %rd123, %rd122, %rd101; 2026-02-21T09:34:05.7102407Z shl.b64 %rd124, %rd123, 1; 2026-02-21T09:34:05.7102467Z add.s64 %rd8, %rd17, %rd124; 2026-02-21T09:34:05.7102526Z add.s64 %rd43, %rd8, 64; 2026-02-21T09:34:05.7102709Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7102768Z bar.sync 2, 128; 2026-02-21T09:34:05.7102828Z add.s32 %r260, %r465, 16384; 2026-02-21T09:34:05.7102897Z selp.b32 %r261, 16, 0, %p7; 2026-02-21T09:34:05.7102956Z // begin inline asm 2026-02-21T09:34:05.7103072Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd36 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7103129Z // end inline asm 2026-02-21T09:34:05.7103196Z add.s32 %r262, %r465, 18432; 2026-02-21T09:34:05.7103254Z // begin inline asm 2026-02-21T09:34:05.7103372Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd37 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7103439Z // end inline asm 2026-02-21T09:34:05.7103500Z add.s32 %r264, %r465, 20480; 2026-02-21T09:34:05.7103558Z // begin inline asm 2026-02-21T09:34:05.7103679Z cp.async.cg.shared.global [ %r264 + 0 ], [ %rd38 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7103735Z // end inline asm 2026-02-21T09:34:05.7103794Z add.s32 %r266, %r465, 
22528; 2026-02-21T09:34:05.7103851Z // begin inline asm 2026-02-21T09:34:05.7103971Z cp.async.cg.shared.global [ %r266 + 0 ], [ %rd39 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7104027Z // end inline asm 2026-02-21T09:34:05.7104086Z add.s32 %r268, %r465, 24576; 2026-02-21T09:34:05.7104149Z // begin inline asm 2026-02-21T09:34:05.7104261Z cp.async.cg.shared.global [ %r268 + 0 ], [ %rd40 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7104316Z // end inline asm 2026-02-21T09:34:05.7104374Z add.s32 %r270, %r465, 26624; 2026-02-21T09:34:05.7104443Z // begin inline asm 2026-02-21T09:34:05.7104560Z cp.async.cg.shared.global [ %r270 + 0 ], [ %rd41 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7104617Z // end inline asm 2026-02-21T09:34:05.7104713Z add.s32 %r272, %r465, 28672; 2026-02-21T09:34:05.7104771Z // begin inline asm 2026-02-21T09:34:05.7104884Z cp.async.cg.shared.global [ %r272 + 0 ], [ %rd42 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7104940Z // end inline asm 2026-02-21T09:34:05.7105006Z add.s32 %r274, %r465, 30720; 2026-02-21T09:34:05.7105063Z // begin inline asm 2026-02-21T09:34:05.7105176Z cp.async.cg.shared.global [ %r274 + 0 ], [ %rd43 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7105240Z // end inline asm 2026-02-21T09:34:05.7105303Z cp.async.commit_group; 2026-02-21T09:34:05.7105480Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7105548Z cvt.s64.s32 %rd125, %r425; 2026-02-21T09:34:05.7105611Z or.b64 %rd126, %rd125, %rd101; 2026-02-21T09:34:05.7105672Z shl.b64 %rd127, %rd126, 1; 2026-02-21T09:34:05.7105764Z add.s64 %rd9, %rd18, %rd127; 2026-02-21T09:34:05.7105859Z add.s64 %rd44, %rd9, 64; 2026-02-21T09:34:05.7105920Z cvt.s64.s32 %rd128, %r426; 2026-02-21T09:34:05.7105981Z or.b64 %rd129, %rd128, %rd101; 2026-02-21T09:34:05.7106047Z shl.b64 %rd130, %rd129, 1; 2026-02-21T09:34:05.7106111Z add.s64 %rd10, %rd18, %rd130; 2026-02-21T09:34:05.7106175Z add.s64 %rd45, %rd10, 64; 2026-02-21T09:34:05.7106237Z cvt.s64.s32 %rd131, %r427; 
2026-02-21T09:34:05.7106306Z or.b64 %rd132, %rd131, %rd101; 2026-02-21T09:34:05.7106368Z shl.b64 %rd133, %rd132, 1; 2026-02-21T09:34:05.7106430Z add.s64 %rd11, %rd18, %rd133; 2026-02-21T09:34:05.7106497Z add.s64 %rd46, %rd11, 64; 2026-02-21T09:34:05.7106558Z cvt.s64.s32 %rd134, %r428; 2026-02-21T09:34:05.7106619Z or.b64 %rd135, %rd134, %rd101; 2026-02-21T09:34:05.7106679Z shl.b64 %rd136, %rd135, 1; 2026-02-21T09:34:05.7106748Z add.s64 %rd12, %rd18, %rd136; 2026-02-21T09:34:05.7106834Z add.s64 %rd47, %rd12, 64; 2026-02-21T09:34:05.7106928Z cvt.s64.s32 %rd137, %r429; 2026-02-21T09:34:05.7107001Z or.b64 %rd138, %rd137, %rd101; 2026-02-21T09:34:05.7107062Z shl.b64 %rd139, %rd138, 1; 2026-02-21T09:34:05.7107124Z add.s64 %rd13, %rd18, %rd139; 2026-02-21T09:34:05.7107190Z add.s64 %rd48, %rd13, 64; 2026-02-21T09:34:05.7107251Z cvt.s64.s32 %rd140, %r430; 2026-02-21T09:34:05.7107310Z or.b64 %rd141, %rd140, %rd101; 2026-02-21T09:34:05.7107371Z shl.b64 %rd142, %rd141, 1; 2026-02-21T09:34:05.7107443Z add.s64 %rd14, %rd18, %rd142; 2026-02-21T09:34:05.7107505Z add.s64 %rd49, %rd14, 64; 2026-02-21T09:34:05.7107565Z cvt.s64.s32 %rd143, %r431; 2026-02-21T09:34:05.7107637Z or.b64 %rd144, %rd143, %rd101; 2026-02-21T09:34:05.7107697Z shl.b64 %rd145, %rd144, 1; 2026-02-21T09:34:05.7107759Z add.s64 %rd15, %rd18, %rd145; 2026-02-21T09:34:05.7107817Z add.s64 %rd50, %rd15, 64; 2026-02-21T09:34:05.7107884Z cvt.s64.s32 %rd146, %r432; 2026-02-21T09:34:05.7107946Z or.b64 %rd147, %rd146, %rd101; 2026-02-21T09:34:05.7108005Z shl.b64 %rd148, %rd147, 1; 2026-02-21T09:34:05.7108075Z add.s64 %rd16, %rd18, %rd148; 2026-02-21T09:34:05.7108135Z add.s64 %rd51, %rd16, 64; 2026-02-21T09:34:05.7108310Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7108368Z add.s32 %r276, %r465, 98304; 2026-02-21T09:34:05.7108435Z // begin inline asm 2026-02-21T09:34:05.7108548Z cp.async.cg.shared.global [ %r276 + 0 ], [ %rd44 + 0 ], 0x10, %r261; 
2026-02-21T09:34:05.7108603Z // end inline asm 2026-02-21T09:34:05.7108670Z add.s32 %r278, %r465, 100352; 2026-02-21T09:34:05.7108727Z // begin inline asm 2026-02-21T09:34:05.7108842Z cp.async.cg.shared.global [ %r278 + 0 ], [ %rd45 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7108904Z // end inline asm 2026-02-21T09:34:05.7108965Z add.s32 %r280, %r465, 102400; 2026-02-21T09:34:05.7109023Z // begin inline asm 2026-02-21T09:34:05.7109137Z cp.async.cg.shared.global [ %r280 + 0 ], [ %rd46 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7109203Z // end inline asm 2026-02-21T09:34:05.7109264Z add.s32 %r282, %r465, 104448; 2026-02-21T09:34:05.7109322Z // begin inline asm 2026-02-21T09:34:05.7109442Z cp.async.cg.shared.global [ %r282 + 0 ], [ %rd47 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7109499Z // end inline asm 2026-02-21T09:34:05.7109557Z add.s32 %r284, %r465, 106496; 2026-02-21T09:34:05.7109614Z // begin inline asm 2026-02-21T09:34:05.7109737Z cp.async.cg.shared.global [ %r284 + 0 ], [ %rd48 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7109793Z // end inline asm 2026-02-21T09:34:05.7109851Z add.s32 %r286, %r465, 108544; 2026-02-21T09:34:05.7109918Z // begin inline asm 2026-02-21T09:34:05.7110032Z cp.async.cg.shared.global [ %r286 + 0 ], [ %rd49 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7110087Z // end inline asm 2026-02-21T09:34:05.7110154Z add.s32 %r288, %r465, 110592; 2026-02-21T09:34:05.7110211Z // begin inline asm 2026-02-21T09:34:05.7110327Z cp.async.cg.shared.global [ %r288 + 0 ], [ %rd50 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7110385Z // end inline asm 2026-02-21T09:34:05.7110485Z add.s32 %r290, %r465, 112640; 2026-02-21T09:34:05.7110568Z // begin inline asm 2026-02-21T09:34:05.7110683Z cp.async.cg.shared.global [ %r290 + 0 ], [ %rd51 + 0 ], 0x10, %r261; 2026-02-21T09:34:05.7110746Z // end inline asm 2026-02-21T09:34:05.7110811Z cp.async.commit_group; 2026-02-21T09:34:05.7110993Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 
2026-02-21T09:34:05.7111058Z setp.gt.s32 %p8, %r5, 2; 2026-02-21T09:34:05.7111244Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7111305Z add.s64 %rd52, %rd1, 128; 2026-02-21T09:34:05.7111366Z add.s64 %rd53, %rd2, 128; 2026-02-21T09:34:05.7111433Z add.s64 %rd54, %rd3, 128; 2026-02-21T09:34:05.7111492Z add.s64 %rd55, %rd4, 128; 2026-02-21T09:34:05.7111549Z add.s64 %rd56, %rd5, 128; 2026-02-21T09:34:05.7111636Z add.s64 %rd57, %rd6, 128; 2026-02-21T09:34:05.7111716Z add.s64 %rd58, %rd7, 128; 2026-02-21T09:34:05.7111777Z add.s64 %rd59, %rd8, 128; 2026-02-21T09:34:05.7111949Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7112012Z bar.sync 2, 128; 2026-02-21T09:34:05.7112072Z add.s32 %r292, %r465, 32768; 2026-02-21T09:34:05.7112135Z selp.b32 %r293, 16, 0, %p8; 2026-02-21T09:34:05.7112199Z // begin inline asm 2026-02-21T09:34:05.7112314Z cp.async.cg.shared.global [ %r292 + 0 ], [ %rd52 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7112370Z // end inline asm 2026-02-21T09:34:05.7112428Z add.s32 %r294, %r465, 34816; 2026-02-21T09:34:05.7112492Z // begin inline asm 2026-02-21T09:34:05.7112605Z cp.async.cg.shared.global [ %r294 + 0 ], [ %rd53 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7112659Z // end inline asm 2026-02-21T09:34:05.7112724Z add.s32 %r296, %r465, 36864; 2026-02-21T09:34:05.7112780Z // begin inline asm 2026-02-21T09:34:05.7112895Z cp.async.cg.shared.global [ %r296 + 0 ], [ %rd54 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7112959Z // end inline asm 2026-02-21T09:34:05.7113018Z add.s32 %r298, %r465, 38912; 2026-02-21T09:34:05.7113074Z // begin inline asm 2026-02-21T09:34:05.7113189Z cp.async.cg.shared.global [ %r298 + 0 ], [ %rd55 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7113254Z // end inline asm 2026-02-21T09:34:05.7113313Z add.s32 %r300, %r465, 40960; 2026-02-21T09:34:05.7113369Z // begin inline asm 2026-02-21T09:34:05.7113491Z cp.async.cg.shared.global [ %r300 + 0 ], [ 
%rd56 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7113546Z // end inline asm 2026-02-21T09:34:05.7113603Z add.s32 %r302, %r465, 43008; 2026-02-21T09:34:05.7113661Z // begin inline asm 2026-02-21T09:34:05.7113781Z cp.async.cg.shared.global [ %r302 + 0 ], [ %rd57 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7113838Z // end inline asm 2026-02-21T09:34:05.7113896Z add.s32 %r304, %r465, 45056; 2026-02-21T09:34:05.7113961Z // begin inline asm 2026-02-21T09:34:05.7114076Z cp.async.cg.shared.global [ %r304 + 0 ], [ %rd58 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7114133Z // end inline asm 2026-02-21T09:34:05.7114195Z add.s32 %r306, %r465, 47104; 2026-02-21T09:34:05.7114260Z // begin inline asm 2026-02-21T09:34:05.7114373Z cp.async.cg.shared.global [ %r306 + 0 ], [ %rd59 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7114431Z // end inline asm 2026-02-21T09:34:05.7114504Z cp.async.commit_group; 2026-02-21T09:34:05.7114707Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7114770Z add.s64 %rd60, %rd9, 128; 2026-02-21T09:34:05.7114839Z add.s64 %rd61, %rd10, 128; 2026-02-21T09:34:05.7114899Z add.s64 %rd62, %rd11, 128; 2026-02-21T09:34:05.7114960Z add.s64 %rd63, %rd12, 128; 2026-02-21T09:34:05.7115018Z add.s64 %rd64, %rd13, 128; 2026-02-21T09:34:05.7115087Z add.s64 %rd65, %rd14, 128; 2026-02-21T09:34:05.7115147Z add.s64 %rd66, %rd15, 128; 2026-02-21T09:34:05.7115208Z add.s64 %rd67, %rd16, 128; 2026-02-21T09:34:05.7115395Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7115511Z add.s32 %r308, %r465, 114688; 2026-02-21T09:34:05.7115571Z // begin inline asm 2026-02-21T09:34:05.7115684Z cp.async.cg.shared.global [ %r308 + 0 ], [ %rd60 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7115749Z // end inline asm 2026-02-21T09:34:05.7115808Z add.s32 %r310, %r465, 116736; 2026-02-21T09:34:05.7115867Z // begin inline asm 2026-02-21T09:34:05.7115990Z cp.async.cg.shared.global [ %r310 + 0 ], [ %rd61 + 0 ], 0x10, 
%r293; 2026-02-21T09:34:05.7116045Z // end inline asm 2026-02-21T09:34:05.7116104Z add.s32 %r312, %r465, 118784; 2026-02-21T09:34:05.7116172Z // begin inline asm 2026-02-21T09:34:05.7116288Z cp.async.cg.shared.global [ %r312 + 0 ], [ %rd62 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7116346Z // end inline asm 2026-02-21T09:34:05.7116409Z add.s32 %r314, %r465, 120832; 2026-02-21T09:34:05.7116504Z // begin inline asm 2026-02-21T09:34:05.7116646Z cp.async.cg.shared.global [ %r314 + 0 ], [ %rd63 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7116706Z // end inline asm 2026-02-21T09:34:05.7116773Z add.s32 %r316, %r465, 122880; 2026-02-21T09:34:05.7116830Z // begin inline asm 2026-02-21T09:34:05.7116943Z cp.async.cg.shared.global [ %r316 + 0 ], [ %rd64 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7117000Z // end inline asm 2026-02-21T09:34:05.7117065Z add.s32 %r318, %r465, 124928; 2026-02-21T09:34:05.7117123Z // begin inline asm 2026-02-21T09:34:05.7117235Z cp.async.cg.shared.global [ %r318 + 0 ], [ %rd65 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7117299Z // end inline asm 2026-02-21T09:34:05.7117357Z add.s32 %r320, %r465, 126976; 2026-02-21T09:34:05.7117414Z // begin inline asm 2026-02-21T09:34:05.7117534Z cp.async.cg.shared.global [ %r320 + 0 ], [ %rd66 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7117589Z // end inline asm 2026-02-21T09:34:05.7117649Z add.s32 %r322, %r465, 129024; 2026-02-21T09:34:05.7117709Z // begin inline asm 2026-02-21T09:34:05.7117831Z cp.async.cg.shared.global [ %r322 + 0 ], [ %rd67 + 0 ], 0x10, %r293; 2026-02-21T09:34:05.7117890Z // end inline asm 2026-02-21T09:34:05.7117953Z cp.async.commit_group; 2026-02-21T09:34:05.7118138Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7118202Z setp.gt.s32 %p9, %r5, 3; 2026-02-21T09:34:05.7118373Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7118435Z add.s64 %rd68, %rd1, 192; 2026-02-21T09:34:05.7118502Z add.s64 %rd69, %rd2, 
192; 2026-02-21T09:34:05.7118562Z add.s64 %rd70, %rd3, 192; 2026-02-21T09:34:05.7118620Z add.s64 %rd71, %rd4, 192; 2026-02-21T09:34:05.7118689Z add.s64 %rd72, %rd5, 192; 2026-02-21T09:34:05.7118747Z add.s64 %rd73, %rd6, 192; 2026-02-21T09:34:05.7118804Z add.s64 %rd74, %rd7, 192; 2026-02-21T09:34:05.7118870Z add.s64 %rd75, %rd8, 192; 2026-02-21T09:34:05.7119039Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7119100Z bar.sync 2, 128; 2026-02-21T09:34:05.7119167Z add.s32 %r324, %r465, 49152; 2026-02-21T09:34:05.7119236Z selp.b32 %r325, 16, 0, %p9; 2026-02-21T09:34:05.7119294Z // begin inline asm 2026-02-21T09:34:05.7119407Z cp.async.cg.shared.global [ %r324 + 0 ], [ %rd68 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7119470Z // end inline asm 2026-02-21T09:34:05.7119529Z add.s32 %r326, %r465, 51200; 2026-02-21T09:34:05.7119586Z // begin inline asm 2026-02-21T09:34:05.7119695Z cp.async.cg.shared.global [ %r326 + 0 ], [ %rd69 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7119759Z // end inline asm 2026-02-21T09:34:05.7119817Z add.s32 %r328, %r465, 53248; 2026-02-21T09:34:05.7119874Z // begin inline asm 2026-02-21T09:34:05.7119993Z cp.async.cg.shared.global [ %r328 + 0 ], [ %rd70 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7120051Z // end inline asm 2026-02-21T09:34:05.7120109Z add.s32 %r330, %r465, 55296; 2026-02-21T09:34:05.7120174Z // begin inline asm 2026-02-21T09:34:05.7120311Z cp.async.cg.shared.global [ %r330 + 0 ], [ %rd71 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7120389Z // end inline asm 2026-02-21T09:34:05.7120447Z add.s32 %r332, %r465, 57344; 2026-02-21T09:34:05.7120510Z // begin inline asm 2026-02-21T09:34:05.7120622Z cp.async.cg.shared.global [ %r332 + 0 ], [ %rd72 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7120676Z // end inline asm 2026-02-21T09:34:05.7120743Z add.s32 %r334, %r465, 59392; 2026-02-21T09:34:05.7120804Z // begin inline asm 2026-02-21T09:34:05.7120917Z cp.async.cg.shared.global [ %r334 + 0 ], [ %rd73 + 0 ], 
0x10, %r325; 2026-02-21T09:34:05.7120972Z // end inline asm 2026-02-21T09:34:05.7121038Z add.s32 %r336, %r465, 61440; 2026-02-21T09:34:05.7121096Z // begin inline asm 2026-02-21T09:34:05.7121207Z cp.async.cg.shared.global [ %r336 + 0 ], [ %rd74 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7121271Z // end inline asm 2026-02-21T09:34:05.7121352Z add.s32 %r338, %r465, 63488; 2026-02-21T09:34:05.7121431Z // begin inline asm 2026-02-21T09:34:05.7121549Z cp.async.cg.shared.global [ %r338 + 0 ], [ %rd75 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7121614Z // end inline asm 2026-02-21T09:34:05.7121679Z cp.async.commit_group; 2026-02-21T09:34:05.7121856Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7121925Z add.s64 %rd76, %rd9, 192; 2026-02-21T09:34:05.7121985Z add.s64 %rd77, %rd10, 192; 2026-02-21T09:34:05.7122045Z add.s64 %rd78, %rd11, 192; 2026-02-21T09:34:05.7122110Z add.s64 %rd79, %rd12, 192; 2026-02-21T09:34:05.7122169Z add.s64 %rd80, %rd13, 192; 2026-02-21T09:34:05.7122227Z add.s64 %rd81, %rd14, 192; 2026-02-21T09:34:05.7122286Z add.s64 %rd82, %rd15, 192; 2026-02-21T09:34:05.7122353Z add.s64 %rd83, %rd16, 192; 2026-02-21T09:34:05.7122526Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7122588Z add.s32 %r340, %r465, 131072; 2026-02-21T09:34:05.7122655Z // begin inline asm 2026-02-21T09:34:05.7122772Z cp.async.cg.shared.global [ %r340 + 0 ], [ %rd76 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7122831Z // end inline asm 2026-02-21T09:34:05.7122891Z add.s32 %r342, %r465, 133120; 2026-02-21T09:34:05.7122960Z // begin inline asm 2026-02-21T09:34:05.7123074Z cp.async.cg.shared.global [ %r342 + 0 ], [ %rd77 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7123132Z // end inline asm 2026-02-21T09:34:05.7123202Z add.s32 %r344, %r465, 135168; 2026-02-21T09:34:05.7123261Z // begin inline asm 2026-02-21T09:34:05.7123378Z cp.async.cg.shared.global [ %r344 + 0 ], [ %rd78 + 0 ], 0x10, %r325; 
2026-02-21T09:34:05.7123442Z // end inline asm 2026-02-21T09:34:05.7123503Z add.s32 %r346, %r465, 137216; 2026-02-21T09:34:05.7123562Z // begin inline asm 2026-02-21T09:34:05.7123676Z cp.async.cg.shared.global [ %r346 + 0 ], [ %rd79 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7123741Z // end inline asm 2026-02-21T09:34:05.7123801Z add.s32 %r348, %r465, 139264; 2026-02-21T09:34:05.7123859Z // begin inline asm 2026-02-21T09:34:05.7123982Z cp.async.cg.shared.global [ %r348 + 0 ], [ %rd80 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7124041Z // end inline asm 2026-02-21T09:34:05.7124102Z add.s32 %r350, %r465, 141312; 2026-02-21T09:34:05.7124158Z // begin inline asm 2026-02-21T09:34:05.7124280Z cp.async.cg.shared.global [ %r350 + 0 ], [ %rd81 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7124336Z // end inline asm 2026-02-21T09:34:05.7124395Z add.s32 %r352, %r465, 143360; 2026-02-21T09:34:05.7124460Z // begin inline asm 2026-02-21T09:34:05.7124575Z cp.async.cg.shared.global [ %r352 + 0 ], [ %rd82 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7124631Z // end inline asm 2026-02-21T09:34:05.7124726Z add.s32 %r354, %r465, 145408; 2026-02-21T09:34:05.7124784Z // begin inline asm 2026-02-21T09:34:05.7124897Z cp.async.cg.shared.global [ %r354 + 0 ], [ %rd83 + 0 ], 0x10, %r325; 2026-02-21T09:34:05.7124953Z // end inline asm 2026-02-21T09:34:05.7125025Z cp.async.commit_group; 2026-02-21T09:34:05.7125208Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7125328Z setp.gt.s32 %p10, %r5, 4; 2026-02-21T09:34:05.7125508Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7125570Z add.s64 %rd84, %rd1, 256; 2026-02-21T09:34:05.7125629Z add.s64 %rd85, %rd2, 256; 2026-02-21T09:34:05.7125689Z add.s64 %rd86, %rd3, 256; 2026-02-21T09:34:05.7125756Z add.s64 %rd87, %rd4, 256; 2026-02-21T09:34:05.7125815Z add.s64 %rd88, %rd5, 256; 2026-02-21T09:34:05.7125876Z add.s64 %rd89, %rd6, 256; 2026-02-21T09:34:05.7125942Z 
add.s64 %rd90, %rd7, 256; 2026-02-21T09:34:05.7126001Z add.s64 %rd91, %rd8, 256; 2026-02-21T09:34:05.7126175Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7126240Z bar.sync 2, 128; 2026-02-21T09:34:05.7126343Z add.s32 %r356, %r465, 65536; 2026-02-21T09:34:05.7126434Z selp.b32 %r357, 16, 0, %p10; 2026-02-21T09:34:05.7126495Z // begin inline asm 2026-02-21T09:34:05.7126619Z cp.async.cg.shared.global [ %r356 + 0 ], [ %rd84 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7126676Z // end inline asm 2026-02-21T09:34:05.7126734Z add.s32 %r358, %r465, 67584; 2026-02-21T09:34:05.7126799Z // begin inline asm 2026-02-21T09:34:05.7126912Z cp.async.cg.shared.global [ %r358 + 0 ], [ %rd85 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7126968Z // end inline asm 2026-02-21T09:34:05.7127027Z add.s32 %r360, %r465, 69632; 2026-02-21T09:34:05.7127092Z // begin inline asm 2026-02-21T09:34:05.7127204Z cp.async.cg.shared.global [ %r360 + 0 ], [ %rd86 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7127258Z // end inline asm 2026-02-21T09:34:05.7127323Z add.s32 %r362, %r465, 71680; 2026-02-21T09:34:05.7127382Z // begin inline asm 2026-02-21T09:34:05.7127495Z cp.async.cg.shared.global [ %r362 + 0 ], [ %rd87 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7127559Z // end inline asm 2026-02-21T09:34:05.7127619Z add.s32 %r364, %r465, 73728; 2026-02-21T09:34:05.7127678Z // begin inline asm 2026-02-21T09:34:05.7127791Z cp.async.cg.shared.global [ %r364 + 0 ], [ %rd88 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7127854Z // end inline asm 2026-02-21T09:34:05.7127913Z add.s32 %r366, %r465, 75776; 2026-02-21T09:34:05.7127971Z // begin inline asm 2026-02-21T09:34:05.7128091Z cp.async.cg.shared.global [ %r366 + 0 ], [ %rd89 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7128148Z // end inline asm 2026-02-21T09:34:05.7128208Z add.s32 %r368, %r465, 77824; 2026-02-21T09:34:05.7128266Z // begin inline asm 2026-02-21T09:34:05.7128386Z cp.async.cg.shared.global [ %r368 + 0 ], [ %rd90 + 0 ], 
0x10, %r357; 2026-02-21T09:34:05.7128443Z // end inline asm 2026-02-21T09:34:05.7128502Z add.s32 %r370, %r465, 79872; 2026-02-21T09:34:05.7128567Z // begin inline asm 2026-02-21T09:34:05.7128683Z cp.async.cg.shared.global [ %r370 + 0 ], [ %rd91 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7128738Z // end inline asm 2026-02-21T09:34:05.7128802Z cp.async.commit_group; 2026-02-21T09:34:05.7128985Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7129047Z add.s64 %rd92, %rd9, 256; 2026-02-21T09:34:05.7129109Z add.s64 %rd93, %rd10, 256; 2026-02-21T09:34:05.7129175Z add.s64 %rd94, %rd11, 256; 2026-02-21T09:34:05.7129234Z add.s64 %rd95, %rd12, 256; 2026-02-21T09:34:05.7129291Z add.s64 %rd96, %rd13, 256; 2026-02-21T09:34:05.7129357Z add.s64 %rd97, %rd14, 256; 2026-02-21T09:34:05.7129415Z add.s64 %rd98, %rd15, 256; 2026-02-21T09:34:05.7129474Z add.s64 %rd99, %rd16, 256; 2026-02-21T09:34:05.7129648Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7129716Z add.s32 %r372, %r465, 147456; 2026-02-21T09:34:05.7129774Z // begin inline asm 2026-02-21T09:34:05.7129891Z cp.async.cg.shared.global [ %r372 + 0 ], [ %rd92 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7129958Z // end inline asm 2026-02-21T09:34:05.7130040Z add.s32 %r374, %r465, 149504; 2026-02-21T09:34:05.7130121Z // begin inline asm 2026-02-21T09:34:05.7130235Z cp.async.cg.shared.global [ %r374 + 0 ], [ %rd93 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7130300Z // end inline asm 2026-02-21T09:34:05.7130360Z add.s32 %r376, %r465, 151552; 2026-02-21T09:34:05.7130419Z // begin inline asm 2026-02-21T09:34:05.7130541Z cp.async.cg.shared.global [ %r376 + 0 ], [ %rd94 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7130601Z // end inline asm 2026-02-21T09:34:05.7130659Z add.s32 %r378, %r465, 153600; 2026-02-21T09:34:05.7130726Z // begin inline asm 2026-02-21T09:34:05.7130842Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd95 + 0 ], 0x10, %r357; 
2026-02-21T09:34:05.7130898Z // end inline asm 2026-02-21T09:34:05.7130958Z add.s32 %r380, %r465, 155648; 2026-02-21T09:34:05.7131034Z // begin inline asm 2026-02-21T09:34:05.7131165Z cp.async.cg.shared.global [ %r380 + 0 ], [ %rd96 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7131247Z // end inline asm 2026-02-21T09:34:05.7131316Z add.s32 %r382, %r465, 157696; 2026-02-21T09:34:05.7131376Z // begin inline asm 2026-02-21T09:34:05.7131489Z cp.async.cg.shared.global [ %r382 + 0 ], [ %rd97 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7131545Z // end inline asm 2026-02-21T09:34:05.7131611Z add.s32 %r384, %r465, 159744; 2026-02-21T09:34:05.7131667Z // begin inline asm 2026-02-21T09:34:05.7131780Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd98 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7131844Z // end inline asm 2026-02-21T09:34:05.7131902Z add.s32 %r386, %r465, 161792; 2026-02-21T09:34:05.7131958Z // begin inline asm 2026-02-21T09:34:05.7132079Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd99 + 0 ], 0x10, %r357; 2026-02-21T09:34:05.7132134Z // end inline asm 2026-02-21T09:34:05.7132197Z cp.async.commit_group; 2026-02-21T09:34:05.7132375Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7132453Z cp.async.wait_group 8; 2026-02-21T09:34:05.7132512Z bar.sync 2, 128; 2026-02-21T09:34:05.7132691Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7132783Z shfl.sync.idx.b32 %r49, %r390, 0, 31, -1; 2026-02-21T09:34:05.7132846Z setp.ne.b32 %p11, %r49, 0; 2026-02-21T09:34:05.7132912Z or.pred %p12, %p5, %p11; 2026-02-21T09:34:05.7132973Z @%p12 bra $L__BB0_7; 2026-02-21T09:34:05.7133081Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7133147Z elect.sync %r449|%p14, -1; 2026-02-21T09:34:05.7133209Z bfe.u32 %r451, %r226, 4, 14; 2026-02-21T09:34:05.7133278Z cvt.u64.u32 %rd158, %r451; 2026-02-21T09:34:05.7133355Z or.b64 %rd149, %rd158, -9223371899348713472; 2026-02-21T09:34:05.7133416Z 
add.s32 %r452, %r226, 81920; 2026-02-21T09:34:05.7133481Z bfe.u32 %r453, %r452, 4, 14; 2026-02-21T09:34:05.7133540Z cvt.u64.u32 %rd159, %r453; 2026-02-21T09:34:05.7133617Z or.b64 %rd150, %rd159, -9223371899348713472; 2026-02-21T09:34:05.7133678Z mov.b32 %r442, 138412048; 2026-02-21T09:34:05.7133747Z mov.pred %p13, 0; 2026-02-21T09:34:05.7133806Z // begin inline asm 2026-02-21T09:34:05.7133969Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 0 ], %rd149, %rd150, %r442, %p13; 2026-02-21T09:34:05.7134034Z // end inline asm 2026-02-21T09:34:05.7134094Z add.s32 %r454, %r226, 32; 2026-02-21T09:34:05.7134153Z bfe.u32 %r455, %r454, 4, 14; 2026-02-21T09:34:05.7134213Z cvt.u64.u32 %rd160, %r455; 2026-02-21T09:34:05.7134293Z or.b64 %rd151, %rd160, -9223371899348713472; 2026-02-21T09:34:05.7134352Z add.s32 %r456, %r226, 81952; 2026-02-21T09:34:05.7134410Z bfe.u32 %r457, %r456, 4, 14; 2026-02-21T09:34:05.7134479Z cvt.u64.u32 %rd161, %r457; 2026-02-21T09:34:05.7134551Z or.b64 %rd152, %rd161, -9223371899348713472; 2026-02-21T09:34:05.7134612Z mov.pred %p15, -1; 2026-02-21T09:34:05.7134724Z // begin inline asm 2026-02-21T09:34:05.7134879Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 0 ], %rd151, %rd152, %r442, %p15; 2026-02-21T09:34:05.7134964Z // end inline asm 2026-02-21T09:34:05.7135049Z add.s32 %r458, %r226, 8192; 2026-02-21T09:34:05.7135116Z bfe.u32 %r459, %r458, 4, 14; 2026-02-21T09:34:05.7135175Z cvt.u64.u32 %rd162, %r459; 2026-02-21T09:34:05.7135246Z or.b64 %rd153, %rd162, -9223371899348713472; 2026-02-21T09:34:05.7135309Z // begin inline asm 2026-02-21T09:34:05.7135456Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 256 ], %rd153, %rd150, %r442, %p13; 2026-02-21T09:34:05.7135512Z // end inline asm 2026-02-21T09:34:05.7135578Z add.s32 %r460, %r226, 8224; 2026-02-21T09:34:05.7135636Z bfe.u32 %r461, %r460, 4, 14; 2026-02-21T09:34:05.7135696Z cvt.u64.u32 %rd163, %r461; 2026-02-21T09:34:05.7135766Z or.b64 %rd155, %rd163, -9223371899348713472; 
2026-02-21T09:34:05.7135831Z // begin inline asm 2026-02-21T09:34:05.7135977Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 256 ], %rd155, %rd152, %r442, %p15; 2026-02-21T09:34:05.7136060Z // end inline asm 2026-02-21T09:34:05.7136156Z add.s32 %r462, %r226, 180224; 2026-02-21T09:34:05.7136220Z cvt.u64.u32 %rd157, %r462; 2026-02-21T09:34:05.7136280Z // begin inline asm 2026-02-21T09:34:05.7136412Z @%p13 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd157]; 2026-02-21T09:34:05.7136476Z // end inline asm 2026-02-21T09:34:05.7136579Z $L__BB0_7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7136768Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7136840Z setp.gt.s32 %p23, %r5, 5; 2026-02-21T09:34:05.7137024Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7137083Z add.s32 %r463, %r226, 180240; 2026-02-21T09:34:05.7137147Z mov.b32 %r3315, 0; 2026-02-21T09:34:05.7137207Z mov.pred %p97, 0; 2026-02-21T09:34:05.7137264Z // begin inline asm 2026-02-21T09:34:05.7137319Z 2026-02-21T09:34:05.7137379Z { 2026-02-21T09:34:05.7137449Z @!%p97 bra.uni skipWait; 2026-02-21T09:34:05.7137515Z .reg .pred complete; 2026-02-21T09:34:05.7137584Z waitLoop: 2026-02-21T09:34:05.7137714Z mbarrier.try_wait.parity.shared.b64 complete, [%r463], %r3315; 2026-02-21T09:34:05.7137782Z @!complete bra.uni waitLoop; 2026-02-21T09:34:05.7137839Z skipWait: 2026-02-21T09:34:05.7137898Z } 2026-02-21T09:34:05.7137902Z 2026-02-21T09:34:05.7137957Z // end inline asm 2026-02-21T09:34:05.7138133Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7138203Z add.s64 %rd164, %rd1, 320; 2026-02-21T09:34:05.7138264Z add.s64 %rd165, %rd2, 320; 2026-02-21T09:34:05.7138321Z add.s64 %rd166, %rd3, 320; 2026-02-21T09:34:05.7138386Z add.s64 %rd167, %rd4, 320; 2026-02-21T09:34:05.7138445Z add.s64 %rd168, %rd5, 320; 2026-02-21T09:34:05.7138505Z add.s64 %rd169, 
%rd6, 320; 2026-02-21T09:34:05.7138565Z add.s64 %rd170, %rd7, 320; 2026-02-21T09:34:05.7138634Z add.s64 %rd171, %rd8, 320; 2026-02-21T09:34:05.7138811Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7138874Z bar.sync 2, 128; 2026-02-21T09:34:05.7138951Z selp.b32 %r466, 16, 0, %p23; 2026-02-21T09:34:05.7139009Z // begin inline asm 2026-02-21T09:34:05.7139134Z cp.async.cg.shared.global [ %r465 + 0 ], [ %rd164 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7139198Z // end inline asm 2026-02-21T09:34:05.7139257Z // begin inline asm 2026-02-21T09:34:05.7139378Z cp.async.cg.shared.global [ %r467 + 0 ], [ %rd165 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7139433Z // end inline asm 2026-02-21T09:34:05.7139498Z // begin inline asm 2026-02-21T09:34:05.7139616Z cp.async.cg.shared.global [ %r469 + 0 ], [ %rd166 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7139671Z // end inline asm 2026-02-21T09:34:05.7139737Z // begin inline asm 2026-02-21T09:34:05.7139855Z cp.async.cg.shared.global [ %r471 + 0 ], [ %rd167 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7139914Z // end inline asm 2026-02-21T09:34:05.7139995Z // begin inline asm 2026-02-21T09:34:05.7140145Z cp.async.cg.shared.global [ %r473 + 0 ], [ %rd168 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7140201Z // end inline asm 2026-02-21T09:34:05.7140258Z // begin inline asm 2026-02-21T09:34:05.7140382Z cp.async.cg.shared.global [ %r475 + 0 ], [ %rd169 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7140438Z // end inline asm 2026-02-21T09:34:05.7140497Z // begin inline asm 2026-02-21T09:34:05.7140611Z cp.async.cg.shared.global [ %r477 + 0 ], [ %rd170 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7140674Z // end inline asm 2026-02-21T09:34:05.7140731Z // begin inline asm 2026-02-21T09:34:05.7140845Z cp.async.cg.shared.global [ %r479 + 0 ], [ %rd171 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7140909Z // end inline asm 2026-02-21T09:34:05.7140974Z cp.async.commit_group; 2026-02-21T09:34:05.7141170Z .loc 1 45 34 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7141261Z add.s64 %rd172, %rd9, 320; 2026-02-21T09:34:05.7141325Z add.s64 %rd173, %rd10, 320; 2026-02-21T09:34:05.7141387Z add.s64 %rd174, %rd11, 320; 2026-02-21T09:34:05.7141447Z add.s64 %rd175, %rd12, 320; 2026-02-21T09:34:05.7141513Z add.s64 %rd176, %rd13, 320; 2026-02-21T09:34:05.7141572Z add.s64 %rd177, %rd14, 320; 2026-02-21T09:34:05.7141630Z add.s64 %rd178, %rd15, 320; 2026-02-21T09:34:05.7141695Z add.s64 %rd179, %rd16, 320; 2026-02-21T09:34:05.7141870Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7141928Z // begin inline asm 2026-02-21T09:34:05.7142051Z cp.async.cg.shared.global [ %r244 + 0 ], [ %rd172 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7142106Z // end inline asm 2026-02-21T09:34:05.7142164Z // begin inline asm 2026-02-21T09:34:05.7142279Z cp.async.cg.shared.global [ %r246 + 0 ], [ %rd173 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7142342Z // end inline asm 2026-02-21T09:34:05.7142402Z // begin inline asm 2026-02-21T09:34:05.7142516Z cp.async.cg.shared.global [ %r248 + 0 ], [ %rd174 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7142581Z // end inline asm 2026-02-21T09:34:05.7142641Z // begin inline asm 2026-02-21T09:34:05.7142755Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd175 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7142809Z // end inline asm 2026-02-21T09:34:05.7142872Z // begin inline asm 2026-02-21T09:34:05.7142987Z cp.async.cg.shared.global [ %r252 + 0 ], [ %rd176 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7143042Z // end inline asm 2026-02-21T09:34:05.7143105Z // begin inline asm 2026-02-21T09:34:05.7143218Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd177 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7143272Z // end inline asm 2026-02-21T09:34:05.7143329Z // begin inline asm 2026-02-21T09:34:05.7143449Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd178 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7143505Z // end inline asm 
2026-02-21T09:34:05.7143565Z // begin inline asm 2026-02-21T09:34:05.7143689Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd179 + 0 ], 0x10, %r466; 2026-02-21T09:34:05.7143746Z // end inline asm 2026-02-21T09:34:05.7143812Z cp.async.commit_group; 2026-02-21T09:34:05.7144000Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7144059Z @%p5 bra $L__BB0_14; 2026-02-21T09:34:05.7144136Z // %bb.8: // %.lr.ph 2026-02-21T09:34:05.7144229Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7144412Z .loc 1 0 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:0:130 2026-02-21T09:34:05.7144473Z add.s32 %r50, %r5, -6; 2026-02-21T09:34:05.7144533Z add.s32 %r51, %r5, -1; 2026-02-21T09:34:05.7144596Z mov.b32 %r3334, 5; 2026-02-21T09:34:05.7144651Z mov.b32 %r3314, 1; 2026-02-21T09:34:05.7144759Z mov.b32 %r3313, 2; 2026-02-21T09:34:05.7144824Z mov.b32 %r3312, 3; 2026-02-21T09:34:05.7144881Z mov.b32 %r3311, 4; 2026-02-21T09:34:05.7144942Z mov.b32 %r3310, 160; 2026-02-21T09:34:05.7145032Z mov.b32 %r3316, %r3315; 2026-02-21T09:34:05.7145152Z mov.b32 %r3352, %r6; 2026-02-21T09:34:05.7145213Z mov.b32 %r3335, %r3315; 2026-02-21T09:34:05.7145270Z bra.uni $L__BB0_9; 2026-02-21T09:34:05.7145379Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:34:05.7145565Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7145631Z setp.eq.b32 %p97, %r3314, 31; 2026-02-21T09:34:05.7145697Z setp.eq.b32 %p43, %r3314, 31; 2026-02-21T09:34:05.7145770Z setp.eq.b32 %p44, %r78, 0; 2026-02-21T09:34:05.7145839Z setp.lt.s32 %p45, %r3335, %r51; 2026-02-21T09:34:05.7145908Z setp.lt.s32 %p46, %r3335, %r50; 2026-02-21T09:34:05.7146089Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7146186Z selp.b32 %r579, 1, 0, %p43; 2026-02-21T09:34:05.7146276Z xor.b32 %r3316, %r3316, %r579; 2026-02-21T09:34:05.7146469Z .loc 1 20 130 // 
cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7146536Z and.pred %p42, %p45, %p43; 2026-02-21T09:34:05.7146705Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7146764Z // begin inline asm 2026-02-21T09:34:05.7146823Z 2026-02-21T09:34:05.7146875Z { 2026-02-21T09:34:05.7146940Z @!%p42 bra.uni skipWait; 2026-02-21T09:34:05.7147010Z .reg .pred complete; 2026-02-21T09:34:05.7147068Z waitLoop: 2026-02-21T09:34:05.7147197Z mbarrier.try_wait.parity.shared.b64 complete, [%r463], %r3316; 2026-02-21T09:34:05.7147263Z @!complete bra.uni waitLoop; 2026-02-21T09:34:05.7147330Z skipWait: 2026-02-21T09:34:05.7147382Z } 2026-02-21T09:34:05.7147386Z 2026-02-21T09:34:05.7147444Z // end inline asm 2026-02-21T09:34:05.7147632Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7147695Z add.s32 %r581, %r3310, 32; 2026-02-21T09:34:05.7147764Z selp.b32 %r3310, 0, %r581, %p44; 2026-02-21T09:34:05.7147943Z .loc 1 40 35 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:40:35 2026-02-21T09:34:05.7148005Z add.s32 %r582, %r3310, %r15; 2026-02-21T09:34:05.7148172Z .loc 1 44 53 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:53 2026-02-21T09:34:05.7148238Z shl.b32 %r583, %r3344, 10; 2026-02-21T09:34:05.7148296Z shl.b32 %r584, %r3345, 10; 2026-02-21T09:34:05.7148353Z shl.b32 %r585, %r3346, 10; 2026-02-21T09:34:05.7148411Z shl.b32 %r586, %r3347, 10; 2026-02-21T09:34:05.7148474Z shl.b32 %r587, %r3348, 10; 2026-02-21T09:34:05.7148533Z shl.b32 %r588, %r3349, 10; 2026-02-21T09:34:05.7148589Z shl.b32 %r589, %r3350, 10; 2026-02-21T09:34:05.7148655Z shl.b32 %r590, %r3351, 10; 2026-02-21T09:34:05.7148827Z .loc 1 44 60 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:60 2026-02-21T09:34:05.7148890Z add.s32 %r591, %r583, %r582; 2026-02-21T09:34:05.7148951Z add.s32 %r592, %r584, %r582; 2026-02-21T09:34:05.7149017Z add.s32 %r593, %r585, 
%r582; 2026-02-21T09:34:05.7149075Z add.s32 %r594, %r586, %r582; 2026-02-21T09:34:05.7149134Z add.s32 %r595, %r587, %r582; 2026-02-21T09:34:05.7149197Z add.s32 %r596, %r588, %r582; 2026-02-21T09:34:05.7149255Z add.s32 %r597, %r589, %r582; 2026-02-21T09:34:05.7149313Z add.s32 %r598, %r590, %r582; 2026-02-21T09:34:05.7149481Z .loc 1 44 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:32 2026-02-21T09:34:05.7149559Z mad.wide.s32 %rd195, %r591, 2, %rd17; 2026-02-21T09:34:05.7149629Z mad.wide.s32 %rd196, %r592, 2, %rd17; 2026-02-21T09:34:05.7149693Z mad.wide.s32 %rd197, %r593, 2, %rd17; 2026-02-21T09:34:05.7149764Z mad.wide.s32 %rd198, %r594, 2, %rd17; 2026-02-21T09:34:05.7149828Z mad.wide.s32 %rd199, %r595, 2, %rd17; 2026-02-21T09:34:05.7149891Z mad.wide.s32 %rd200, %r596, 2, %rd17; 2026-02-21T09:34:05.7150010Z mad.wide.s32 %rd201, %r597, 2, %rd17; 2026-02-21T09:34:05.7150097Z mad.wide.s32 %rd202, %r598, 2, %rd17; 2026-02-21T09:34:05.7150272Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7150328Z bar.sync 2, 128; 2026-02-21T09:34:05.7150395Z add.s32 %r547, %r114, %r32; 2026-02-21T09:34:05.7150456Z selp.b32 %r548, 16, 0, %p46; 2026-02-21T09:34:05.7150514Z // begin inline asm 2026-02-21T09:34:05.7150642Z cp.async.cg.shared.global [ %r547 + 0 ], [ %rd195 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7150699Z // end inline asm 2026-02-21T09:34:05.7150758Z add.s32 %r549, %r547, 2048; 2026-02-21T09:34:05.7150823Z // begin inline asm 2026-02-21T09:34:05.7150941Z cp.async.cg.shared.global [ %r549 + 0 ], [ %rd196 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7150996Z // end inline asm 2026-02-21T09:34:05.7151080Z add.s32 %r551, %r547, 4096; 2026-02-21T09:34:05.7151166Z // begin inline asm 2026-02-21T09:34:05.7151288Z cp.async.cg.shared.global [ %r551 + 0 ], [ %rd197 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7151347Z // end inline asm 2026-02-21T09:34:05.7151414Z add.s32 %r553, %r547, 6144; 2026-02-21T09:34:05.7151473Z // 
begin inline asm 2026-02-21T09:34:05.7151589Z cp.async.cg.shared.global [ %r553 + 0 ], [ %rd198 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7151646Z // end inline asm 2026-02-21T09:34:05.7151712Z add.s32 %r555, %r547, 8192; 2026-02-21T09:34:05.7151770Z // begin inline asm 2026-02-21T09:34:05.7151885Z cp.async.cg.shared.global [ %r555 + 0 ], [ %rd199 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7151950Z // end inline asm 2026-02-21T09:34:05.7152009Z add.s32 %r557, %r547, 10240; 2026-02-21T09:34:05.7152066Z // begin inline asm 2026-02-21T09:34:05.7152182Z cp.async.cg.shared.global [ %r557 + 0 ], [ %rd200 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7152244Z // end inline asm 2026-02-21T09:34:05.7152304Z add.s32 %r559, %r547, 12288; 2026-02-21T09:34:05.7152362Z // begin inline asm 2026-02-21T09:34:05.7152485Z cp.async.cg.shared.global [ %r559 + 0 ], [ %rd201 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7152542Z // end inline asm 2026-02-21T09:34:05.7152600Z add.s32 %r561, %r547, 14336; 2026-02-21T09:34:05.7152663Z // begin inline asm 2026-02-21T09:34:05.7152778Z cp.async.cg.shared.global [ %r561 + 0 ], [ %rd202 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7152833Z // end inline asm 2026-02-21T09:34:05.7152896Z cp.async.commit_group; 2026-02-21T09:34:05.7153078Z .loc 1 45 80 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:80 2026-02-21T09:34:05.7153138Z shl.b32 %r599, %r3336, 10; 2026-02-21T09:34:05.7153198Z shl.b32 %r600, %r3337, 10; 2026-02-21T09:34:05.7153262Z shl.b32 %r601, %r3338, 10; 2026-02-21T09:34:05.7153320Z shl.b32 %r602, %r3339, 10; 2026-02-21T09:34:05.7153377Z shl.b32 %r603, %r3340, 10; 2026-02-21T09:34:05.7153435Z shl.b32 %r604, %r3341, 10; 2026-02-21T09:34:05.7153502Z shl.b32 %r605, %r3342, 10; 2026-02-21T09:34:05.7153562Z shl.b32 %r606, %r3343, 10; 2026-02-21T09:34:05.7153739Z .loc 1 45 59 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:59 2026-02-21T09:34:05.7153808Z add.s32 %r607, %r599, %r582; 2026-02-21T09:34:05.7153868Z add.s32 %r608, %r600, 
%r582; 2026-02-21T09:34:05.7153927Z add.s32 %r609, %r601, %r582; 2026-02-21T09:34:05.7153992Z add.s32 %r610, %r602, %r582; 2026-02-21T09:34:05.7154049Z add.s32 %r611, %r603, %r582; 2026-02-21T09:34:05.7154106Z add.s32 %r612, %r604, %r582; 2026-02-21T09:34:05.7154164Z add.s32 %r613, %r605, %r582; 2026-02-21T09:34:05.7154231Z add.s32 %r614, %r606, %r582; 2026-02-21T09:34:05.7154402Z .loc 1 45 34 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:34 2026-02-21T09:34:05.7154467Z mad.wide.s32 %rd203, %r607, 2, %rd18; 2026-02-21T09:34:05.7154544Z mad.wide.s32 %rd204, %r608, 2, %rd18; 2026-02-21T09:34:05.7154617Z mad.wide.s32 %rd205, %r609, 2, %rd18; 2026-02-21T09:34:05.7154721Z mad.wide.s32 %rd206, %r610, 2, %rd18; 2026-02-21T09:34:05.7154830Z mad.wide.s32 %rd207, %r611, 2, %rd18; 2026-02-21T09:34:05.7154937Z mad.wide.s32 %rd208, %r612, 2, %rd18; 2026-02-21T09:34:05.7155007Z mad.wide.s32 %rd209, %r613, 2, %rd18; 2026-02-21T09:34:05.7155077Z mad.wide.s32 %rd210, %r614, 2, %rd18; 2026-02-21T09:34:05.7155276Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7155344Z add.s32 %r563, %r115, %r32; 2026-02-21T09:34:05.7155409Z // begin inline asm 2026-02-21T09:34:05.7155541Z cp.async.cg.shared.global [ %r563 + 0 ], [ %rd203 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7155602Z // end inline asm 2026-02-21T09:34:05.7155666Z add.s32 %r565, %r563, 2048; 2026-02-21T09:34:05.7155730Z // begin inline asm 2026-02-21T09:34:05.7155862Z cp.async.cg.shared.global [ %r565 + 0 ], [ %rd204 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7155931Z // end inline asm 2026-02-21T09:34:05.7156020Z add.s32 %r567, %r563, 4096; 2026-02-21T09:34:05.7156117Z // begin inline asm 2026-02-21T09:34:05.7156240Z cp.async.cg.shared.global [ %r567 + 0 ], [ %rd205 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7156301Z // end inline asm 2026-02-21T09:34:05.7156372Z add.s32 %r569, %r563, 6144; 2026-02-21T09:34:05.7156437Z // begin inline asm 2026-02-21T09:34:05.7156554Z 
cp.async.cg.shared.global [ %r569 + 0 ], [ %rd206 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7156609Z // end inline asm 2026-02-21T09:34:05.7156678Z add.s32 %r571, %r563, 8192; 2026-02-21T09:34:05.7156735Z // begin inline asm 2026-02-21T09:34:05.7156850Z cp.async.cg.shared.global [ %r571 + 0 ], [ %rd207 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7156913Z // end inline asm 2026-02-21T09:34:05.7156972Z add.s32 %r573, %r563, 10240; 2026-02-21T09:34:05.7157029Z // begin inline asm 2026-02-21T09:34:05.7157142Z cp.async.cg.shared.global [ %r573 + 0 ], [ %rd208 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7157206Z // end inline asm 2026-02-21T09:34:05.7157267Z add.s32 %r575, %r563, 12288; 2026-02-21T09:34:05.7157325Z // begin inline asm 2026-02-21T09:34:05.7157448Z cp.async.cg.shared.global [ %r575 + 0 ], [ %rd209 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7157503Z // end inline asm 2026-02-21T09:34:05.7157561Z add.s32 %r577, %r563, 14336; 2026-02-21T09:34:05.7157619Z // begin inline asm 2026-02-21T09:34:05.7157741Z cp.async.cg.shared.global [ %r577 + 0 ], [ %rd210 + 0 ], 0x10, %r548; 2026-02-21T09:34:05.7157796Z // end inline asm 2026-02-21T09:34:05.7157861Z cp.async.commit_group; 2026-02-21T09:34:05.7158052Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7158113Z add.s32 %r3335, %r3335, 1; 2026-02-21T09:34:05.7158180Z setp.ne.b32 %p47, %r5, %r3335; 2026-02-21T09:34:05.7158247Z mov.b32 %r3311, %r3334; 2026-02-21T09:34:05.7158305Z mov.b32 %r3314, %r55; 2026-02-21T09:34:05.7158362Z mov.b32 %r3334, %r78; 2026-02-21T09:34:05.7158423Z @%p47 bra $L__BB0_9; 2026-02-21T09:34:05.7158491Z bra.uni $L__BB0_14; 2026-02-21T09:34:05.7158590Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:34:05.7158690Z // => This Inner Loop Header: Depth=2 2026-02-21T09:34:05.7158872Z .loc 1 0 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:0:130 2026-02-21T09:34:05.7158930Z mov.b32 %r55, %r3313; 2026-02-21T09:34:05.7158988Z mov.b32 %r3313, 
%r3312; 2026-02-21T09:34:05.7159054Z mov.b32 %r3312, %r3311; 2026-02-21T09:34:05.7159232Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7159292Z add.s32 %r505, %r3334, 1; 2026-02-21T09:34:05.7159358Z setp.eq.b32 %p26, %r3334, 31; 2026-02-21T09:34:05.7159431Z selp.b32 %r78, 0, %r505, %p26; 2026-02-21T09:34:05.7159492Z setp.ne.b32 %p27, %r78, 0; 2026-02-21T09:34:05.7159551Z @%p27 bra $L__BB0_11; 2026-02-21T09:34:05.7159657Z // %bb.10: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:34:05.7159746Z add.s32 %r3352, %r3352, 2368; 2026-02-21T09:34:05.7159940Z .loc 1 26 35 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:26:35 2026-02-21T09:34:05.7160013Z mul.hi.s32 %r506, %r3352, 715827883; 2026-02-21T09:34:05.7160073Z shr.u32 %r507, %r506, 31; 2026-02-21T09:34:05.7160133Z shr.s32 %r508, %r506, 6; 2026-02-21T09:34:05.7160194Z add.s32 %r509, %r508, %r507; 2026-02-21T09:34:05.7160372Z .loc 1 27 33 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:27:33 2026-02-21T09:34:05.7160433Z shl.b32 %r510, %r509, 3; 2026-02-21T09:34:05.7160602Z .loc 1 28 39 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:28:39 2026-02-21T09:34:05.7160668Z sub.s32 %r511, 4, %r510; 2026-02-21T09:34:05.7160840Z .loc 1 28 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:28:52 2026-02-21T09:34:05.7160923Z min.s32 %r512, %r511, 8; 2026-02-21T09:34:05.7161118Z .loc 1 29 45 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:45 2026-02-21T09:34:05.7161183Z mul.lo.s32 %r513, %r509, 384; 2026-02-21T09:34:05.7161244Z sub.s32 %r514, %r3352, %r513; 2026-02-21T09:34:05.7161419Z .loc 1 30 51 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:30:51 2026-02-21T09:34:05.7161485Z div.s32 %r515, %r514, %r512; 2026-02-21T09:34:05.7161655Z .loc 1 29 64 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:64 2026-02-21T09:34:05.7161719Z mul.lo.s32 %r516, %r515, %r512; 2026-02-21T09:34:05.7161785Z 
sub.s32 %r517, %r514, %r516; 2026-02-21T09:34:05.7161959Z .loc 1 29 30 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:29:30 2026-02-21T09:34:05.7162019Z add.s32 %r518, %r517, %r510; 2026-02-21T09:34:05.7162202Z .loc 1 31 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:31:27 2026-02-21T09:34:05.7162263Z shl.b32 %r519, %r518, 8; 2026-02-21T09:34:05.7162431Z .loc 1 32 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:32:32 2026-02-21T09:34:05.7162500Z or.b32 %r3336, %r519, %r7; 2026-02-21T09:34:05.7162560Z or.b32 %r3337, %r519, %r8; 2026-02-21T09:34:05.7162618Z or.b32 %r3338, %r519, %r9; 2026-02-21T09:34:05.7162677Z or.b32 %r3339, %r519, %r10; 2026-02-21T09:34:05.7162744Z or.b32 %r3340, %r519, %r11; 2026-02-21T09:34:05.7162801Z or.b32 %r3341, %r519, %r12; 2026-02-21T09:34:05.7162860Z or.b32 %r3342, %r519, %r13; 2026-02-21T09:34:05.7162924Z or.b32 %r3343, %r519, %r14; 2026-02-21T09:34:05.7163097Z .loc 1 33 27 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:33:27 2026-02-21T09:34:05.7163157Z shl.b32 %r520, %r515, 8; 2026-02-21T09:34:05.7163334Z .loc 1 34 32 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:34:32 2026-02-21T09:34:05.7163402Z or.b32 %r3344, %r520, %r7; 2026-02-21T09:34:05.7163461Z or.b32 %r3345, %r520, %r8; 2026-02-21T09:34:05.7163521Z or.b32 %r3346, %r520, %r9; 2026-02-21T09:34:05.7163588Z or.b32 %r3347, %r520, %r10; 2026-02-21T09:34:05.7163647Z or.b32 %r3348, %r520, %r11; 2026-02-21T09:34:05.7163706Z or.b32 %r3349, %r520, %r12; 2026-02-21T09:34:05.7163772Z or.b32 %r3350, %r520, %r13; 2026-02-21T09:34:05.7163832Z or.b32 %r3351, %r520, %r14; 2026-02-21T09:34:05.7163943Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:34:05.7164124Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7164199Z setp.ge.s32 %p28, %r3335, %r51; 2026-02-21T09:34:05.7164258Z add.s32 %r521, %r3315, 1; 2026-02-21T09:34:05.7164321Z setp.gt.s32 %p30, %r521, 4; 
2026-02-21T09:34:05.7164394Z selp.b32 %r3315, 0, %r521, %p30; 2026-02-21T09:34:05.7164569Z .loc 1 44 85 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:44:85 2026-02-21T09:34:05.7164636Z cp.async.wait_group 8; 2026-02-21T09:34:05.7164812Z bar.sync 2, 128; 2026-02-21T09:34:05.7164878Z shl.b32 %r522, %r3315, 14; 2026-02-21T09:34:05.7164941Z add.s32 %r114, %r226, %r522; 2026-02-21T09:34:05.7165124Z .loc 1 45 87 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:45:87 2026-02-21T09:34:05.7165197Z add.s32 %r115, %r114, 81920; 2026-02-21T09:34:05.7165387Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7165455Z or.pred %p31, %p11, %p28; 2026-02-21T09:34:05.7165529Z @%p31 bra $L__BB0_13; 2026-02-21T09:34:05.7165637Z // %bb.12: // in Loop: Header=BB0_9 Depth=2 2026-02-21T09:34:05.7165826Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7165898Z setp.eq.b32 %p41, %r3314, 31; 2026-02-21T09:34:05.7166121Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7166187Z not.pred %p32, %p97; 2026-02-21T09:34:05.7166254Z elect.sync %r532|%p33, -1; 2026-02-21T09:34:05.7166326Z bfe.u32 %r533, %r114, 4, 14; 2026-02-21T09:34:05.7166388Z cvt.u64.u32 %rd189, %r533; 2026-02-21T09:34:05.7166465Z or.b64 %rd180, %rd189, -9223371899348713472; 2026-02-21T09:34:05.7166533Z bfe.u32 %r534, %r115, 4, 14; 2026-02-21T09:34:05.7166592Z cvt.u64.u32 %rd190, %r534; 2026-02-21T09:34:05.7166665Z or.b64 %rd181, %rd190, -9223371899348713472; 2026-02-21T09:34:05.7166725Z mov.b32 %r525, 138412048; 2026-02-21T09:34:05.7166789Z // begin inline asm 2026-02-21T09:34:05.7166943Z @%p33 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 0 ], %rd180, %rd181, %r525, %p32; 2026-02-21T09:34:05.7167000Z // end inline asm 2026-02-21T09:34:05.7167067Z add.s32 %r535, %r114, 32; 2026-02-21T09:34:05.7167126Z bfe.u32 %r536, %r535, 4, 14; 2026-02-21T09:34:05.7167186Z 
cvt.u64.u32 %rd191, %r536; 2026-02-21T09:34:05.7167265Z or.b64 %rd182, %rd191, -9223371899348713472; 2026-02-21T09:34:05.7167326Z add.s32 %r537, %r115, 32; 2026-02-21T09:34:05.7167384Z bfe.u32 %r538, %r537, 4, 14; 2026-02-21T09:34:05.7167444Z cvt.u64.u32 %rd192, %r538; 2026-02-21T09:34:05.7167522Z or.b64 %rd183, %rd192, -9223371899348713472; 2026-02-21T09:34:05.7167582Z mov.pred %p34, -1; 2026-02-21T09:34:05.7167640Z // begin inline asm 2026-02-21T09:34:05.7167792Z @%p33 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 0 ], %rd182, %rd183, %r525, %p34; 2026-02-21T09:34:05.7167849Z // end inline asm 2026-02-21T09:34:05.7167907Z add.s32 %r539, %r114, 8192; 2026-02-21T09:34:05.7167965Z bfe.u32 %r540, %r539, 4, 14; 2026-02-21T09:34:05.7168030Z cvt.u64.u32 %rd193, %r540; 2026-02-21T09:34:05.7168099Z or.b64 %rd184, %rd193, -9223371899348713472; 2026-02-21T09:34:05.7168156Z // begin inline asm 2026-02-21T09:34:05.7168310Z @%p33 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 256 ], %rd184, %rd181, %r525, %p32; 2026-02-21T09:34:05.7168369Z // end inline asm 2026-02-21T09:34:05.7168429Z add.s32 %r541, %r114, 8224; 2026-02-21T09:34:05.7168497Z bfe.u32 %r542, %r541, 4, 14; 2026-02-21T09:34:05.7168557Z cvt.u64.u32 %rd194, %r542; 2026-02-21T09:34:05.7168626Z or.b64 %rd186, %rd194, -9223371899348713472; 2026-02-21T09:34:05.7168683Z // begin inline asm 2026-02-21T09:34:05.7168833Z @%p33 tcgen05.mma.cta_group::1.kind::f16 [ %r441 + 256 ], %rd186, %rd183, %r525, %p34; 2026-02-21T09:34:05.7168888Z // end inline asm 2026-02-21T09:34:05.7168951Z and.pred %p40, %p41, %p33; 2026-02-21T09:34:05.7169017Z add.s32 %r544, %r226, 180224; 2026-02-21T09:34:05.7169075Z cvt.u64.u32 %rd188, %r544; 2026-02-21T09:34:05.7169130Z // begin inline asm 2026-02-21T09:34:05.7169266Z @%p40 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd188]; 2026-02-21T09:34:05.7169322Z // end inline asm 2026-02-21T09:34:05.7169379Z bra.uni $L__BB0_13; 2026-02-21T09:34:05.7169480Z $L__BB0_17: // in Loop: Header=BB0_2 
Depth=1 2026-02-21T09:34:05.7169671Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7169779Z barrier.sync 1; 2026-02-21T09:34:05.7169838Z barrier.sync 1; 2026-02-21T09:34:05.7169903Z bra.uni $L__BB0_2; 2026-02-21T09:34:05.7169990Z $L__BB0_14: // %._crit_edge 2026-02-21T09:34:05.7170081Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7170152Z cp.async.wait_group 0; 2026-02-21T09:34:05.7170208Z bar.sync 2, 128; 2026-02-21T09:34:05.7170265Z barrier.sync 1; 2026-02-21T09:34:05.7170323Z bra.uni $L__BB0_2; 2026-02-21T09:34:05.7170425Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7170602Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7170662Z barrier.sync 1; 2026-02-21T09:34:05.7170748Z barrier.sync 1; 2026-02-21T09:34:05.7170826Z bra.uni $L__BB0_2; 2026-02-21T09:34:05.7170925Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:05.7171095Z .loc 1 14 0 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:14 2026-02-21T09:34:05.7171161Z barrier.sync 1; 2026-02-21T09:34:05.7171217Z barrier.sync 1; 2026-02-21T09:34:05.7171273Z bra.uni $L__BB0_2; 2026-02-21T09:34:05.7171370Z $L__BB0_25: // %._crit_edge5 2026-02-21T09:34:05.7171551Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7171609Z barrier.sync 1; 2026-02-21T09:34:05.7171791Z .loc 1 46 52 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:46:52 2026-02-21T09:34:05.7171850Z bar.sync 0, 128; 2026-02-21T09:34:05.7171907Z // begin inline asm 2026-02-21T09:34:05.7171960Z 2026-02-21T09:34:05.7172021Z { 2026-02-21T09:34:05.7172088Z .reg .pred complete; 2026-02-21T09:34:05.7172147Z waitLoop: 2026-02-21T09:34:05.7172290Z mbarrier.try_wait.parity.shared.b64 complete, [%r3304], %r3363; 2026-02-21T09:34:05.7172360Z @!complete bra.uni waitLoop; 2026-02-21T09:34:05.7172415Z } 2026-02-21T09:34:05.7172419Z 
2026-02-21T09:34:05.7172477Z // end inline asm 2026-02-21T09:34:05.7172664Z .loc 1 20 130 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:130 2026-02-21T09:34:05.7172722Z bar.sync 0, 128; 2026-02-21T09:34:05.7172779Z // begin inline asm 2026-02-21T09:34:05.7172879Z @%p94 mbarrier.inval.shared::cta.b64 [%r3304]; 2026-02-21T09:34:05.7172936Z // end inline asm 2026-02-21T09:34:05.7172992Z // begin inline asm 2026-02-21T09:34:05.7173088Z @%p94 mbarrier.inval.shared::cta.b64 [%r1160]; 2026-02-21T09:34:05.7173142Z // end inline asm 2026-02-21T09:34:05.7173311Z .loc 1 20 4 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:20:4 2026-02-21T09:34:05.7173370Z bar.sync 0, 128; 2026-02-21T09:34:05.7173437Z // begin inline asm 2026-02-21T09:34:05.7173564Z @%p48 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r3308, 512; 2026-02-21T09:34:05.7173621Z // end inline asm 2026-02-21T09:34:05.7173741Z st.shared.v2.b32 [global_smem+180248], {67372036, 67372036}; 2026-02-21T09:34:05.7173800Z barrier.sync 1; 2026-02-21T09:34:05.7173885Z $L__BB0_26: // %common.ret 2026-02-21T09:34:05.7174064Z .loc 1 0 0 // cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py:0 2026-02-21T09:34:05.7174118Z ret; 2026-02-21T09:34:05.7174174Z $L__tmp0: 2026-02-21T09:34:05.7174231Z $L__func_end0: 2026-02-21T09:34:05.7174325Z // -- End function 2026-02-21T09:34:05.7174378Z } 2026-02-21T09:34:05.7174610Z .file 1 "/tmp/torchinductor_root/pj/cpjxa3lxlesnxk2mn3wal334c7453jrfrhjqdamfzwt4fqfkx7hq.py" 2026-02-21T09:34:05.7174717Z .section .debug_abbrev 2026-02-21T09:34:05.7174774Z { 2026-02-21T09:34:05.7174877Z .b8 1 // Abbreviation Code 2026-02-21T09:34:05.7175012Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:34:05.7175129Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:34:05.7175218Z .b8 37 // DW_AT_producer 2026-02-21T09:34:05.7175303Z .b8 8 // DW_FORM_string 2026-02-21T09:34:05.7175395Z .b8 19 // DW_AT_language 2026-02-21T09:34:05.7175481Z .b8 5 // DW_FORM_data2 2026-02-21T09:34:05.7175564Z .b8 3 // 
DW_AT_name 2026-02-21T09:34:05.7175655Z .b8 8 // DW_FORM_string 2026-02-21T09:34:05.7175744Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:34:05.7175831Z .b8 6 // DW_FORM_data4 2026-02-21T09:34:05.7175954Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:34:05.7176060Z .b8 8 // DW_FORM_string 2026-02-21T09:34:05.7176145Z .b8 0 // EOM(1) 2026-02-21T09:34:05.7176226Z .b8 0 // EOM(2) 2026-02-21T09:34:05.7176310Z .b8 0 // EOM(3) 2026-02-21T09:34:05.7176366Z } 2026-02-21T09:34:05.7176434Z .section .debug_info 2026-02-21T09:34:05.7176500Z { 2026-02-21T09:34:05.7176593Z .b32 104 // Length of Unit 2026-02-21T09:34:05.7176688Z .b8 2 // DWARF version number 2026-02-21T09:34:05.7176744Z .b8 0 2026-02-21T09:34:05.7176878Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:34:05.7176976Z .b8 8 // Address Size (in bytes) 2026-02-21T09:34:05.7177087Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:34:05.7177185Z .b8 116 // DW_AT_producer 2026-02-21T09:34:05.7177248Z .b8 114 2026-02-21T09:34:05.7177309Z .b8 105 2026-02-21T09:34:05.7177372Z .b8 116 2026-02-21T09:34:05.7177430Z .b8 111 2026-02-21T09:34:05.7177485Z .b8 110 2026-02-21T09:34:05.7177540Z .b8 0 2026-02-21T09:34:05.7177629Z .b8 2 // DW_AT_language 2026-02-21T09:34:05.7177684Z .b8 0 2026-02-21T09:34:05.7177768Z .b8 99 // DW_AT_name 2026-02-21T09:34:05.7177831Z .b8 112 2026-02-21T09:34:05.7177887Z .b8 106 2026-02-21T09:34:05.7177943Z .b8 120 2026-02-21T09:34:05.7177999Z .b8 97 2026-02-21T09:34:05.7178064Z .b8 51 2026-02-21T09:34:05.7178119Z .b8 108 2026-02-21T09:34:05.7178176Z .b8 120 2026-02-21T09:34:05.7178239Z .b8 108 2026-02-21T09:34:05.7178295Z .b8 101 2026-02-21T09:34:05.7178351Z .b8 115 2026-02-21T09:34:05.7178405Z .b8 110 2026-02-21T09:34:05.7178468Z .b8 120 2026-02-21T09:34:05.7178523Z .b8 107 2026-02-21T09:34:05.7178579Z .b8 50 2026-02-21T09:34:05.7178634Z .b8 109 2026-02-21T09:34:05.7178699Z .b8 110 2026-02-21T09:34:05.7178754Z .b8 51 2026-02-21T09:34:05.7178813Z .b8 119 2026-02-21T09:34:05.7178878Z 
.b8 97 2026-02-21T09:34:05.7178934Z .b8 108 2026-02-21T09:34:05.7178990Z .b8 51 2026-02-21T09:34:05.7179044Z .b8 51 2026-02-21T09:34:05.7179107Z .b8 52 2026-02-21T09:34:05.7179162Z .b8 99 2026-02-21T09:34:05.7179217Z .b8 55 2026-02-21T09:34:05.7179279Z .b8 52 2026-02-21T09:34:05.7179335Z .b8 53 2026-02-21T09:34:05.7179391Z .b8 51 2026-02-21T09:34:05.7179446Z .b8 106 2026-02-21T09:34:05.7179511Z .b8 114 2026-02-21T09:34:05.7179568Z .b8 102 2026-02-21T09:34:05.7179624Z .b8 114 2026-02-21T09:34:05.7179688Z .b8 104 2026-02-21T09:34:05.7179745Z .b8 106 2026-02-21T09:34:05.7179801Z .b8 113 2026-02-21T09:34:05.7179858Z .b8 100 2026-02-21T09:34:05.7179925Z .b8 97 2026-02-21T09:34:05.7179983Z .b8 109 2026-02-21T09:34:05.7180040Z .b8 102 2026-02-21T09:34:05.7180096Z .b8 122 2026-02-21T09:34:05.7180162Z .b8 119 2026-02-21T09:34:05.7180218Z .b8 116 2026-02-21T09:34:05.7180273Z .b8 52 2026-02-21T09:34:05.7180338Z .b8 102 2026-02-21T09:34:05.7180392Z .b8 113 2026-02-21T09:34:05.7180453Z .b8 102 2026-02-21T09:34:05.7180538Z .b8 107 2026-02-21T09:34:05.7180629Z .b8 120 2026-02-21T09:34:05.7180685Z .b8 55 2026-02-21T09:34:05.7180742Z .b8 104 2026-02-21T09:34:05.7180803Z .b8 113 2026-02-21T09:34:05.7180859Z .b8 46 2026-02-21T09:34:05.7180915Z .b8 112 2026-02-21T09:34:05.7180972Z .b8 121 2026-02-21T09:34:05.7181035Z .b8 0 2026-02-21T09:34:05.7181138Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:34:05.7181223Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:34:05.7181288Z .b8 116 2026-02-21T09:34:05.7181343Z .b8 109 2026-02-21T09:34:05.7181399Z .b8 112 2026-02-21T09:34:05.7181454Z .b8 47 2026-02-21T09:34:05.7181518Z .b8 116 2026-02-21T09:34:05.7181574Z .b8 111 2026-02-21T09:34:05.7181630Z .b8 114 2026-02-21T09:34:05.7181684Z .b8 99 2026-02-21T09:34:05.7181749Z .b8 104 2026-02-21T09:34:05.7181805Z .b8 105 2026-02-21T09:34:05.7181860Z .b8 110 2026-02-21T09:34:05.7181952Z .b8 100 2026-02-21T09:34:05.7182012Z .b8 117 2026-02-21T09:34:05.7182104Z .b8 99 2026-02-21T09:34:05.7182174Z .b8 116 
2026-02-21T09:34:05.7182238Z .b8 111 2026-02-21T09:34:05.7182291Z .b8 114 2026-02-21T09:34:05.7182343Z .b8 95 2026-02-21T09:34:05.7182403Z .b8 114 2026-02-21T09:34:05.7182454Z .b8 111 2026-02-21T09:34:05.7182506Z .b8 111 2026-02-21T09:34:05.7182558Z .b8 116 2026-02-21T09:34:05.7182617Z .b8 47 2026-02-21T09:34:05.7182669Z .b8 112 2026-02-21T09:34:05.7182721Z .b8 106 2026-02-21T09:34:05.7182779Z .b8 0 2026-02-21T09:34:05.7182831Z } 2026-02-21T09:34:05.7182899Z .section .debug_macinfo { } 2026-02-21T09:34:05.7182904Z 2026-02-21T09:34:05.7182984Z ================================================================ 2026-02-21T09:34:05.7183098Z please share the reproducer above with Triton project. 2026-02-21T09:34:05.7841224Z 2026-02-21T09:34:05.7842272Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 33/33 13.3 configs/s 2026-02-21T09:34:06.9351970Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 870.0 2026-02-21T09:34:06.9353421Z configs/s 2026-02-21T09:34:07.0178396Z [198s] Generation 8 complete: 2026-02-21T09:34:07.0183242Z error=5 2026-02-21T09:34:07.0187789Z ok=30 2026-02-21T09:34:07.0189705Z min=0.0411 2026-02-21T09:34:07.0189880Z mid=0.0594 2026-02-21T09:34:07.0190014Z max=9.3870 2026-02-21T09:34:07.0190175Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:34:07.0190455Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:34:07.0190710Z 'l2_groupings': [64], 2026-02-21T09:34:07.0190900Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:34:07.0191113Z 'loop_orders': [[0, 1]], 2026-02-21T09:34:07.0191288Z 'num_stages': 3, 2026-02-21T09:34:07.0191439Z 'num_warps': 4, 2026-02-21T09:34:07.0191597Z 'pid_type': 'flat', 2026-02-21T09:34:07.0191764Z 'range_flattens': [None, True], 2026-02-21T09:34:07.0191961Z 'range_multi_buffers': [None, True], 2026-02-21T09:34:07.0192193Z 'range_num_stages': [0, 0], 2026-02-21T09:34:07.0192383Z 'range_unroll_factors': [0, 0], 2026-02-21T09:34:07.0192595Z 'range_warp_specializes': 
[None, None]} 2026-02-21T09:34:07.0208369Z [198s] Fitting surrogate: 726 points, 726 targets 2026-02-21T09:34:07.6665726Z [199s] Generation 9 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:34:15.2604130Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 1.2 configs/s 2026-02-21T09:34:15.9022400Z 2026-02-21T09:34:15.9022416Z 2026-02-21T09:34:15.9022865Z ================================================================ 2026-02-21T09:34:15.9023212Z Internal Triton PTX codegen error 2026-02-21T09:34:15.9023433Z `ptxas` stderr: 2026-02-21T09:34:15.9023985Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 361 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:34:15.9024595Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:34:15.9024908Z 2026-02-21T09:34:15.9025436Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_82wn51n.ptx -o /tmp/tmp_82wn51n.ptx.o 2026-02-21T09:34:15.9026366Z 2026-02-21T09:34:15.9026371Z 2026-02-21T09:34:15.9026462Z // 2026-02-21T09:34:15.9026641Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:34:15.9026873Z // 2026-02-21T09:34:15.9026958Z 2026-02-21T09:34:15.9027032Z .version 8.7 2026-02-21T09:34:15.9027215Z .target sm_100a 2026-02-21T09:34:15.9027386Z .address_size 64 2026-02-21T09:34:15.9027488Z 2026-02-21T09:34:15.9027639Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:34:15.9027966Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:34:15.9028223Z // @_helion_matmul 2026-02-21T09:34:15.9028469Z .visible .entry _helion_matmul( 2026-02-21T09:34:15.9028730Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:34:15.9029198Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:34:15.9029522Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 
2026-02-21T09:34:15.9029834Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:34:15.9030143Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:34:15.9030382Z ) 2026-02-21T09:34:15.9030531Z .reqntid 256 2026-02-21T09:34:15.9030684Z .maxnreg 32 2026-02-21T09:34:15.9030836Z { 2026-02-21T09:34:15.9031431Z [207s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:34:15.9032954Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=2, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:34:15.9034404Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:34:15.9034743Z `ptxas` stderr: 2026-02-21T09:34:15.9035276Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 361 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:34:15.9035861Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:34:15.9036036Z 2026-02-21T09:34:15.9036516Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_82wn51n.ptx -o /tmp/tmp_82wn51n.ptx.o 2026-02-21T09:34:15.9037067Z 2026-02-21T09:34:15.9037219Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:34:15.9037521Z .reg .pred %p<93>; 2026-02-21T09:34:15.9037708Z .reg .b32 %r<1025>; 2026-02-21T09:34:15.9037879Z .reg .b64 %rd<429>; 2026-02-21T09:34:15.9038035Z $L__func_begin0: 2026-02-21T09:34:15.9038131Z 2026-02-21T09:34:15.9038203Z // %bb.0: 2026-02-21T09:34:15.9038469Z .loc 1 19 0 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:19 2026-02-21T09:34:15.9038798Z mov.u32 %r1, %tid.x; 2026-02-21T09:34:15.9038992Z ld.param.b64 %rd31, [_helion_matmul_param_1]; 2026-02-21T09:34:15.9039226Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:34:15.9039415Z mov.b32 %r93, global_smem; 2026-02-21T09:34:15.9039593Z // begin inline asm 2026-02-21T09:34:15.9039880Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r93], 256; 2026-02-21T09:34:15.9040171Z // end inline asm 2026-02-21T09:34:15.9040361Z ld.param.b64 %rd48, [_helion_matmul_param_3]; 2026-02-21T09:34:15.9040574Z bar.sync 0; 2026-02-21T09:34:15.9040746Z ld.shared.b32 %r1017, [global_smem]; 2026-02-21T09:34:15.9040941Z bar.sync 0; 2026-02-21T09:34:15.9041094Z // begin inline asm 2026-02-21T09:34:15.9041331Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:34:15.9041597Z // end inline asm 2026-02-21T09:34:15.9041948Z .loc 1 21 67 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:21:67 2026-02-21T09:34:15.9042305Z mov.u32 %r102, %ctaid.x; 2026-02-21T09:34:15.9042485Z mov.u32 %r103, %ctaid.y; 2026-02-21T09:34:15.9042654Z mov.u32 %r104, %ctaid.z; 2026-02-21T09:34:15.9042840Z mov.u32 %r105, %nctaid.x; 2026-02-21T09:34:15.9043023Z mov.u32 %r106, %nctaid.y; 2026-02-21T09:34:15.9043224Z mad.lo.s32 %r107, %r104, %r106, %r103; 2026-02-21T09:34:15.9043443Z mad.lo.s32 %r108, %r107, %r105, %r102; 2026-02-21T09:34:15.9043638Z shl.b32 %r109, %r108, 7; 2026-02-21T09:34:15.9043818Z cvt.s64.s32 %rd49, %r109; 2026-02-21T09:34:15.9043997Z add.s64 %rd45, %rd48, %rd49; 2026-02-21T09:34:15.9044182Z shl.b32 %r110, %r1, 2; 2026-02-21T09:34:15.9044353Z add.s32 %r94, %r93, %r110; 
2026-02-21T09:34:15.9044529Z mov.b32 %r95, 0; 2026-02-21T09:34:15.9044759Z // begin inline asm 2026-02-21T09:34:15.9044974Z @%p1 st.shared.b32 [ %r94 + 0 ], %r95; 2026-02-21T09:34:15.9045170Z // end inline asm 2026-02-21T09:34:15.9045338Z bar.warp.sync -1; 2026-02-21T09:34:15.9045513Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:34:15.9045686Z cvt.u64.u32 %rd30, %r93; 2026-02-21T09:34:15.9045859Z // begin inline asm 2026-02-21T09:34:15.9046141Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd30 + 0 ], %rd31; 2026-02-21T09:34:15.9046473Z // end inline asm 2026-02-21T09:34:15.9046625Z // begin inline asm 2026-02-21T09:34:15.9046884Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1; 2026-02-21T09:34:15.9047176Z // end inline asm 2026-02-21T09:34:15.9047334Z mov.b32 %r96, 64; 2026-02-21T09:34:15.9047492Z // begin inline asm 2026-02-21T09:34:15.9047757Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r96; 2026-02-21T09:34:15.9048060Z // end inline asm 2026-02-21T09:34:15.9048206Z mov.b32 %r97, 256; 2026-02-21T09:34:15.9048366Z // begin inline asm 2026-02-21T09:34:15.9048622Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r97; 2026-02-21T09:34:15.9048925Z // end inline asm 2026-02-21T09:34:15.9049073Z mov.b32 %r98, 1024; 2026-02-21T09:34:15.9049239Z // begin inline asm 2026-02-21T09:34:15.9049512Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r98; 2026-02-21T09:34:15.9049815Z // end inline asm 2026-02-21T09:34:15.9049971Z // begin inline asm 2026-02-21T09:34:15.9050230Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r98; 2026-02-21T09:34:15.9050541Z // end inline asm 2026-02-21T09:34:15.9050691Z mov.b64 %rd38, 2048; 2026-02-21T09:34:15.9050855Z // begin inline asm 2026-02-21T09:34:15.9051141Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd30 + 0 ], 0x0, %rd38; 
2026-02-21T09:34:15.9051458Z // end inline asm 2026-02-21T09:34:15.9051612Z mov.b32 %r100, 1; 2026-02-21T09:34:15.9051764Z // begin inline asm 2026-02-21T09:34:15.9052058Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0, %r100; 2026-02-21T09:34:15.9052390Z // end inline asm 2026-02-21T09:34:15.9052553Z // begin inline asm 2026-02-21T09:34:15.9052835Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x1, %r100; 2026-02-21T09:34:15.9053166Z // end inline asm 2026-02-21T09:34:15.9053324Z // begin inline asm 2026-02-21T09:34:15.9053579Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x6; 2026-02-21T09:34:15.9053880Z // end inline asm 2026-02-21T09:34:15.9054029Z // begin inline asm 2026-02-21T09:34:15.9054312Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:34:15.9054624Z // end inline asm 2026-02-21T09:34:15.9054824Z // begin inline asm 2026-02-21T09:34:15.9055100Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x3; 2026-02-21T09:34:15.9055414Z // end inline asm 2026-02-21T09:34:15.9055575Z // begin inline asm 2026-02-21T09:34:15.9055871Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd30 + 0 ], 0x0; 2026-02-21T09:34:15.9056197Z // end inline asm 2026-02-21T09:34:15.9056346Z // begin inline asm 2026-02-21T09:34:15.9056740Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd45 + 0 ], [ %rd30 + 0 ], 0x80; 2026-02-21T09:34:15.9057183Z // end inline asm 2026-02-21T09:34:15.9057331Z // begin inline asm 2026-02-21T09:34:15.9057569Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd45 + 0 ], 0x80; 2026-02-21T09:34:15.9057848Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:34:15.9058071Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:34:15.9058270Z // end inline asm 2026-02-21T09:34:15.9058425Z bar.sync 0; 
2026-02-21T09:34:15.9058582Z cvta.global.u64 %rd104, %rd45; 2026-02-21T09:34:15.9058939Z .loc 1 28 35 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:28:35 2026-02-21T09:34:15.9059304Z mul.lo.s32 %r1018, %r102, 3; 2026-02-21T09:34:15.9059602Z .loc 1 29 37 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:29:37 2026-02-21T09:34:15.9059935Z add.s32 %r111, %r1018, 3; 2026-02-21T09:34:15.9060221Z .loc 1 29 49 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:29:49 2026-02-21T09:34:15.9060546Z min.s32 %r4, %r111, 384; 2026-02-21T09:34:15.9060830Z .loc 1 30 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:30:52 2026-02-21T09:34:15.9061167Z setp.ge.s32 %p21, %r1018, %r4; 2026-02-21T09:34:15.9061358Z @%p21 bra $L__BB0_9; 2026-02-21T09:34:15.9061542Z // %bb.1: // %.lr.ph 2026-02-21T09:34:15.9061875Z .loc 1 0 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:0:52 2026-02-21T09:34:15.9062227Z ld.param.b64 %rd29, [_helion_matmul_param_2]; 2026-02-21T09:34:15.9062480Z ld.param.b64 %rd28, [_helion_matmul_param_0]; 2026-02-21T09:34:15.9062693Z shr.u32 %r5, %r1, 5; 2026-02-21T09:34:15.9062864Z and.b32 %r6, %r1, 31; 2026-02-21T09:34:15.9063029Z shl.b32 %r7, %r6, 3; 2026-02-21T09:34:15.9063195Z and.b32 %r8, %r1, 240; 2026-02-21T09:34:15.9063375Z bfe.u32 %r9, %r1, 4, 4; 2026-02-21T09:34:15.9063545Z and.b32 %r10, %r1, 224; 2026-02-21T09:34:15.9063725Z bfe.u32 %r11, %r1, 5, 3; 2026-02-21T09:34:15.9063893Z or.b32 %r12, %r11, 8; 2026-02-21T09:34:15.9064063Z or.b32 %r13, %r11, 16; 2026-02-21T09:34:15.9064230Z or.b32 %r14, %r11, 24; 2026-02-21T09:34:15.9064398Z or.b32 %r15, %r11, 32; 2026-02-21T09:34:15.9064559Z or.b32 %r16, %r11, 40; 2026-02-21T09:34:15.9064807Z or.b32 %r17, %r11, 48; 2026-02-21T09:34:15.9064967Z or.b32 %r18, %r11, 56; 2026-02-21T09:34:15.9065136Z or.b32 %r19, %r11, 64; 2026-02-21T09:34:15.9065302Z or.b32 %r20, %r11, 72; 2026-02-21T09:34:15.9065461Z or.b32 %r21, %r11, 80; 2026-02-21T09:34:15.9065626Z or.b32 %r22, %r11, 
88; 2026-02-21T09:34:15.9065786Z or.b32 %r23, %r11, 96; 2026-02-21T09:34:15.9065953Z or.b32 %r24, %r11, 104; 2026-02-21T09:34:15.9066118Z or.b32 %r25, %r11, 112; 2026-02-21T09:34:15.9066286Z or.b32 %r26, %r11, 120; 2026-02-21T09:34:15.9066448Z shl.b32 %r112, %r1, 3; 2026-02-21T09:34:15.9066622Z and.b32 %r27, %r112, 120; 2026-02-21T09:34:15.9066796Z and.b32 %r113, %r1, 7; 2026-02-21T09:34:15.9066967Z shl.b32 %r114, %r113, 4; 2026-02-21T09:34:15.9067141Z shl.b32 %r115, %r8, 3; 2026-02-21T09:34:15.9067304Z and.b32 %r116, %r1, 112; 2026-02-21T09:34:15.9067475Z shl.b32 %r117, %r1, 11; 2026-02-21T09:34:15.9067641Z and.b32 %r118, %r117, 16384; 2026-02-21T09:34:15.9067828Z or.b32 %r119, %r114, %r115; 2026-02-21T09:34:15.9068006Z xor.b32 %r120, %r119, %r116; 2026-02-21T09:34:15.9068189Z or.b32 %r28, %r120, %r118; 2026-02-21T09:34:15.9068364Z add.s32 %r122, %r93, 131072; 2026-02-21T09:34:15.9068546Z add.s32 %r311, %r122, %r28; 2026-02-21T09:34:15.9068721Z add.s32 %r313, %r311, 2048; 2026-02-21T09:34:15.9068901Z add.s32 %r315, %r311, 4096; 2026-02-21T09:34:15.9069121Z add.s32 %r317, %r311, 6144; 2026-02-21T09:34:15.9069315Z add.s32 %r319, %r311, 8192; 2026-02-21T09:34:15.9069493Z add.s32 %r321, %r311, 10240; 2026-02-21T09:34:15.9069664Z add.s32 %r323, %r311, 12288; 2026-02-21T09:34:15.9069843Z add.s32 %r325, %r311, 14336; 2026-02-21T09:34:15.9070012Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T09:34:15.9070191Z cvt.u64.u32 %rd50, %r123; 2026-02-21T09:34:15.9070379Z or.b64 %rd79, %rd50, 4611686293372403712; 2026-02-21T09:34:15.9070590Z bfe.u32 %r124, %r93, 4, 14; 2026-02-21T09:34:15.9070762Z cvt.u64.u32 %rd51, %r124; 2026-02-21T09:34:15.9070952Z or.b64 %rd80, %rd51, 4611686293439512576; 2026-02-21T09:34:15.9071154Z add.s32 %r125, %r93, 131104; 2026-02-21T09:34:15.9071326Z bfe.u32 %r126, %r125, 4, 14; 2026-02-21T09:34:15.9071503Z cvt.u64.u32 %rd52, %r126; 2026-02-21T09:34:15.9071679Z or.b64 %rd81, %rd52, 4611686293372403712; 2026-02-21T09:34:15.9071941Z add.s32 %r127, %r93, 32; 
2026-02-21T09:34:15.9072143Z bfe.u32 %r128, %r127, 4, 14; 2026-02-21T09:34:15.9072326Z cvt.u64.u32 %rd53, %r128; 2026-02-21T09:34:15.9072505Z or.b64 %rd82, %rd53, 4611686293439512576; 2026-02-21T09:34:15.9072713Z add.s32 %r129, %r93, 131136; 2026-02-21T09:34:15.9072891Z bfe.u32 %r130, %r129, 4, 14; 2026-02-21T09:34:15.9073062Z cvt.u64.u32 %rd54, %r130; 2026-02-21T09:34:15.9073246Z or.b64 %rd83, %rd54, 4611686293372403712; 2026-02-21T09:34:15.9073447Z add.s32 %r131, %r93, 64; 2026-02-21T09:34:15.9073621Z bfe.u32 %r132, %r131, 4, 14; 2026-02-21T09:34:15.9073794Z cvt.u64.u32 %rd55, %r132; 2026-02-21T09:34:15.9073980Z or.b64 %rd84, %rd55, 4611686293439512576; 2026-02-21T09:34:15.9074199Z add.s32 %r133, %r93, 131168; 2026-02-21T09:34:15.9074378Z bfe.u32 %r134, %r133, 4, 14; 2026-02-21T09:34:15.9074547Z cvt.u64.u32 %rd56, %r134; 2026-02-21T09:34:15.9074758Z or.b64 %rd85, %rd56, 4611686293372403712; 2026-02-21T09:34:15.9074971Z add.s32 %r135, %r93, 96; 2026-02-21T09:34:15.9075139Z bfe.u32 %r136, %r135, 4, 14; 2026-02-21T09:34:15.9075320Z cvt.u64.u32 %rd57, %r136; 2026-02-21T09:34:15.9075499Z or.b64 %rd86, %rd57, 4611686293439512576; 2026-02-21T09:34:15.9075699Z add.s32 %r137, %r93, 147456; 2026-02-21T09:34:15.9075877Z bfe.u32 %r138, %r137, 4, 14; 2026-02-21T09:34:15.9076055Z cvt.u64.u32 %rd58, %r138; 2026-02-21T09:34:15.9076230Z or.b64 %rd87, %rd58, 4611686293372403712; 2026-02-21T09:34:15.9076435Z add.s32 %r139, %r93, 32768; 2026-02-21T09:34:15.9076614Z bfe.u32 %r140, %r139, 4, 14; 2026-02-21T09:34:15.9076785Z cvt.u64.u32 %rd59, %r140; 2026-02-21T09:34:15.9076970Z or.b64 %rd88, %rd59, 4611686293439512576; 2026-02-21T09:34:15.9077174Z add.s32 %r141, %r93, 147488; 2026-02-21T09:34:15.9077354Z bfe.u32 %r142, %r141, 4, 14; 2026-02-21T09:34:15.9077526Z cvt.u64.u32 %rd60, %r142; 2026-02-21T09:34:15.9077710Z or.b64 %rd89, %rd60, 4611686293372403712; 2026-02-21T09:34:15.9077911Z add.s32 %r143, %r93, 32800; 2026-02-21T09:34:15.9078092Z bfe.u32 %r144, %r143, 4, 14; 
2026-02-21T09:34:15.9078270Z cvt.u64.u32 %rd61, %r144; 2026-02-21T09:34:15.9078448Z or.b64 %rd90, %rd61, 4611686293439512576; 2026-02-21T09:34:15.9078647Z add.s32 %r145, %r93, 147520; 2026-02-21T09:34:15.9078815Z bfe.u32 %r146, %r145, 4, 14; 2026-02-21T09:34:15.9078991Z cvt.u64.u32 %rd62, %r146; 2026-02-21T09:34:15.9079167Z or.b64 %rd91, %rd62, 4611686293372403712; 2026-02-21T09:34:15.9079364Z add.s32 %r147, %r93, 32832; 2026-02-21T09:34:15.9079533Z bfe.u32 %r148, %r147, 4, 14; 2026-02-21T09:34:15.9079710Z cvt.u64.u32 %rd63, %r148; 2026-02-21T09:34:15.9079886Z or.b64 %rd92, %rd63, 4611686293439512576; 2026-02-21T09:34:15.9080093Z add.s32 %r149, %r93, 147552; 2026-02-21T09:34:15.9080268Z bfe.u32 %r150, %r149, 4, 14; 2026-02-21T09:34:15.9080437Z cvt.u64.u32 %rd64, %r150; 2026-02-21T09:34:15.9080621Z or.b64 %rd93, %rd64, 4611686293372403712; 2026-02-21T09:34:15.9080812Z add.s32 %r151, %r93, 32864; 2026-02-21T09:34:15.9080990Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T09:34:15.9081162Z cvt.u64.u32 %rd65, %r152; 2026-02-21T09:34:15.9081348Z or.b64 %rd94, %rd65, 4611686293439512576; 2026-02-21T09:34:15.9081611Z or.b32 %r37, %r27, 128; 2026-02-21T09:34:15.9081788Z add.s32 %r153, %r93, %r28; 2026-02-21T09:34:15.9081970Z add.s32 %r378, %r153, 163840; 2026-02-21T09:34:15.9082150Z add.s32 %r380, %r153, 165888; 2026-02-21T09:34:15.9082333Z add.s32 %r382, %r153, 167936; 2026-02-21T09:34:15.9082507Z add.s32 %r384, %r153, 169984; 2026-02-21T09:34:15.9082686Z add.s32 %r386, %r153, 172032; 2026-02-21T09:34:15.9082856Z add.s32 %r388, %r153, 174080; 2026-02-21T09:34:15.9083038Z add.s32 %r390, %r153, 176128; 2026-02-21T09:34:15.9083208Z add.s32 %r392, %r153, 178176; 2026-02-21T09:34:15.9083392Z shl.b32 %r154, %r113, 11; 2026-02-21T09:34:15.9083561Z shl.b32 %r155, %r1, 4; 2026-02-21T09:34:15.9083733Z and.b32 %r156, %r155, 2032; 2026-02-21T09:34:15.9083911Z shr.u32 %r157, %r1, 1; 2026-02-21T09:34:15.9084074Z and.b32 %r158, %r157, 64; 2026-02-21T09:34:15.9084280Z xor.b32 %r159, %r156, 
%r158; 2026-02-21T09:34:15.9084482Z or.b32 %r160, %r159, %r154; 2026-02-21T09:34:15.9084667Z add.s32 %r46, %r93, %r160; 2026-02-21T09:34:15.9084873Z xor.b32 %r161, %r160, 16; 2026-02-21T09:34:15.9085046Z add.s32 %r47, %r93, %r161; 2026-02-21T09:34:15.9085216Z xor.b32 %r162, %r160, 32; 2026-02-21T09:34:15.9085391Z add.s32 %r48, %r93, %r162; 2026-02-21T09:34:15.9085560Z xor.b32 %r163, %r160, 48; 2026-02-21T09:34:15.9085733Z add.s32 %r49, %r93, %r163; 2026-02-21T09:34:15.9085910Z shl.b32 %r164, %r10, 6; 2026-02-21T09:34:15.9086077Z shl.b32 %r165, %r6, 4; 2026-02-21T09:34:15.9086248Z shr.u32 %r166, %r10, 1; 2026-02-21T09:34:15.9086414Z or.b32 %r167, %r164, %r165; 2026-02-21T09:34:15.9086595Z xor.b32 %r168, %r167, %r166; 2026-02-21T09:34:15.9086768Z add.s32 %r652, %r93, %r168; 2026-02-21T09:34:15.9086948Z add.s32 %r657, %r652, 512; 2026-02-21T09:34:15.9087116Z add.s32 %r662, %r652, 1024; 2026-02-21T09:34:15.9087294Z add.s32 %r667, %r652, 1536; 2026-02-21T09:34:15.9087599Z .loc 1 30 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:30:52 2026-02-21T09:34:15.9087927Z and.b32 %r169, %r1, 15; 2026-02-21T09:34:15.9088114Z mad.wide.u32 %rd66, %r169, 16, %rd28; 2026-02-21T09:34:15.9088313Z add.s64 %rd18, %rd66, 197120; 2026-02-21T09:34:15.9088492Z shl.b32 %r54, %r9, 10; 2026-02-21T09:34:15.9088772Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9089097Z or.b32 %r170, %r54, %r27; 2026-02-21T09:34:15.9089266Z or.b32 %r55, %r170, 114944; 2026-02-21T09:34:15.9089442Z bra.uni $L__BB0_2; 2026-02-21T09:34:15.9089660Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:15.9090019Z .loc 1 0 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:0:80 2026-02-21T09:34:15.9090349Z mov.b32 %r507, 1; 2026-02-21T09:34:15.9090634Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9090959Z // begin inline asm 2026-02-21T09:34:15.9091112Z 2026-02-21T09:34:15.9091243Z { 
2026-02-21T09:34:15.9091379Z .reg .pred complete; 2026-02-21T09:34:15.9091549Z waitLoop: 2026-02-21T09:34:15.9091769Z mbarrier.try_wait.parity.shared.b64 complete, [%r506], %r507; 2026-02-21T09:34:15.9092045Z @!complete bra.uni waitLoop; 2026-02-21T09:34:15.9092227Z } 2026-02-21T09:34:15.9092300Z 2026-02-21T09:34:15.9092362Z // end inline asm 2026-02-21T09:34:15.9092647Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9092974Z cp.async.wait_group 0; 2026-02-21T09:34:15.9093153Z bar.sync 0; 2026-02-21T09:34:15.9093303Z // begin inline asm 2026-02-21T09:34:15.9093503Z @%p4 mbarrier.inval.shared::cta.b64 [%r309]; 2026-02-21T09:34:15.9093726Z // end inline asm 2026-02-21T09:34:15.9093874Z bar.sync 0; 2026-02-21T09:34:15.9094027Z // begin inline asm 2026-02-21T09:34:15.9094210Z @%p4 mbarrier.inval.shared::cta.b64 [%r394]; 2026-02-21T09:34:15.9094472Z // end inline asm 2026-02-21T09:34:15.9094656Z add.s32 %r510, %r93, 196608; 2026-02-21T09:34:15.9094884Z // begin inline asm 2026-02-21T09:34:15.9095064Z @%p4 mbarrier.inval.shared::cta.b64 [%r510]; 2026-02-21T09:34:15.9095277Z // end inline asm 2026-02-21T09:34:15.9095425Z bar.sync 0; 2026-02-21T09:34:15.9095575Z // begin inline asm 2026-02-21T09:34:15.9095756Z @%p4 mbarrier.inval.shared::cta.b64 [%r308]; 2026-02-21T09:34:15.9095959Z // end inline asm 2026-02-21T09:34:15.9096236Z .loc 1 59 45 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:59:45 2026-02-21T09:34:15.9096564Z shl.b32 %r793, %r60, 10; 2026-02-21T09:34:15.9096742Z shl.b32 %r794, %r61, 10; 2026-02-21T09:34:15.9096908Z shl.b32 %r795, %r62, 10; 2026-02-21T09:34:15.9097082Z shl.b32 %r796, %r63, 10; 2026-02-21T09:34:15.9097243Z shl.b32 %r797, %r64, 10; 2026-02-21T09:34:15.9097443Z shl.b32 %r798, %r65, 10; 2026-02-21T09:34:15.9097638Z shl.b32 %r799, %r66, 10; 2026-02-21T09:34:15.9097806Z shl.b32 %r800, %r67, 10; 2026-02-21T09:34:15.9097979Z shl.b32 %r801, %r68, 10; 2026-02-21T09:34:15.9098141Z shl.b32 
%r802, %r69, 10; 2026-02-21T09:34:15.9098310Z shl.b32 %r803, %r70, 10; 2026-02-21T09:34:15.9098471Z shl.b32 %r804, %r71, 10; 2026-02-21T09:34:15.9098639Z shl.b32 %r805, %r72, 10; 2026-02-21T09:34:15.9098803Z shl.b32 %r806, %r73, 10; 2026-02-21T09:34:15.9098971Z shl.b32 %r807, %r74, 10; 2026-02-21T09:34:15.9099132Z shl.b32 %r808, %r75, 10; 2026-02-21T09:34:15.9099426Z .loc 1 59 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:59:52 2026-02-21T09:34:15.9099751Z add.s32 %r809, %r793, %r59; 2026-02-21T09:34:15.9099926Z add.s32 %r810, %r794, %r59; 2026-02-21T09:34:15.9100103Z add.s32 %r811, %r795, %r59; 2026-02-21T09:34:15.9100271Z add.s32 %r812, %r796, %r59; 2026-02-21T09:34:15.9100447Z add.s32 %r813, %r797, %r59; 2026-02-21T09:34:15.9100618Z add.s32 %r814, %r798, %r59; 2026-02-21T09:34:15.9100794Z add.s32 %r815, %r799, %r59; 2026-02-21T09:34:15.9100965Z add.s32 %r816, %r800, %r59; 2026-02-21T09:34:15.9101141Z add.s32 %r817, %r801, %r59; 2026-02-21T09:34:15.9101317Z add.s32 %r818, %r802, %r59; 2026-02-21T09:34:15.9101486Z add.s32 %r819, %r803, %r59; 2026-02-21T09:34:15.9101664Z add.s32 %r820, %r804, %r59; 2026-02-21T09:34:15.9101833Z add.s32 %r821, %r805, %r59; 2026-02-21T09:34:15.9102009Z add.s32 %r822, %r806, %r59; 2026-02-21T09:34:15.9102177Z add.s32 %r823, %r807, %r59; 2026-02-21T09:34:15.9102353Z add.s32 %r824, %r808, %r59; 2026-02-21T09:34:15.9102651Z .loc 1 59 24 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:59:24 2026-02-21T09:34:15.9102993Z mad.wide.s32 %rd155, %r809, 2, %rd29; 2026-02-21T09:34:15.9103211Z mad.wide.s32 %rd156, %r810, 2, %rd29; 2026-02-21T09:34:15.9103413Z mad.wide.s32 %rd157, %r811, 2, %rd29; 2026-02-21T09:34:15.9103618Z mad.wide.s32 %rd158, %r812, 2, %rd29; 2026-02-21T09:34:15.9103814Z mad.wide.s32 %rd159, %r813, 2, %rd29; 2026-02-21T09:34:15.9104014Z mad.wide.s32 %rd160, %r814, 2, %rd29; 2026-02-21T09:34:15.9104204Z mad.wide.s32 %rd161, %r815, 2, %rd29; 2026-02-21T09:34:15.9104402Z mad.wide.s32 %rd162, %r816, 2, 
%rd29; 2026-02-21T09:34:15.9104593Z mad.wide.s32 %rd163, %r817, 2, %rd29; 2026-02-21T09:34:15.9104849Z mad.wide.s32 %rd164, %r818, 2, %rd29; 2026-02-21T09:34:15.9105048Z mad.wide.s32 %rd165, %r819, 2, %rd29; 2026-02-21T09:34:15.9105237Z mad.wide.s32 %rd166, %r820, 2, %rd29; 2026-02-21T09:34:15.9105435Z mad.wide.s32 %rd167, %r821, 2, %rd29; 2026-02-21T09:34:15.9105624Z mad.wide.s32 %rd168, %r822, 2, %rd29; 2026-02-21T09:34:15.9105822Z mad.wide.s32 %rd169, %r823, 2, %rd29; 2026-02-21T09:34:15.9106014Z mad.wide.s32 %rd170, %r824, 2, %rd29; 2026-02-21T09:34:15.9106329Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9106652Z // begin inline asm 2026-02-21T09:34:15.9107077Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527}, [%r647 + 0]; 2026-02-21T09:34:15.9107581Z // end inline asm 2026-02-21T09:34:15.9107737Z // begin inline asm 2026-02-21T09:34:15.9108147Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r529, %r530, %r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544}, [%r647 + 16]; 2026-02-21T09:34:15.9108584Z // end inline asm 2026-02-21T09:34:15.9108744Z // begin inline asm 2026-02-21T09:34:15.9109149Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561}, [%r647 + 32]; 2026-02-21T09:34:15.9109581Z // end inline asm 2026-02-21T09:34:15.9109737Z // begin inline asm 2026-02-21T09:34:15.9110165Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578}, [%r647 + 48]; 2026-02-21T09:34:15.9110630Z // end inline asm 2026-02-21T09:34:15.9110780Z // begin inline asm 2026-02-21T09:34:15.9111183Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, 
%r589, %r590, %r591, %r592, %r593, %r594, %r595}, [%r647 + 64]; 2026-02-21T09:34:15.9111624Z // end inline asm 2026-02-21T09:34:15.9111773Z // begin inline asm 2026-02-21T09:34:15.9112169Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612}, [%r647 + 80]; 2026-02-21T09:34:15.9112603Z // end inline asm 2026-02-21T09:34:15.9112758Z // begin inline asm 2026-02-21T09:34:15.9113143Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629}, [%r647 + 96]; 2026-02-21T09:34:15.9113577Z // end inline asm 2026-02-21T09:34:15.9113734Z // begin inline asm 2026-02-21T09:34:15.9114124Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646}, [%r647 + 112]; 2026-02-21T09:34:15.9114556Z // end inline asm 2026-02-21T09:34:15.9114743Z // begin inline asm 2026-02-21T09:34:15.9114923Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:34:15.9115106Z // end inline asm 2026-02-21T09:34:15.9115271Z cvt.u64.u32 %rd171, %r512; 2026-02-21T09:34:15.9115451Z cvt.u64.u32 %rd172, %r513; 2026-02-21T09:34:15.9115635Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:34:15.9115826Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:34:15.9116129Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9116455Z mov.b64 {%r825, %r826}, %rd174; 2026-02-21T09:34:15.9116653Z cvt.rn.f16x2.f32 %r827, %r826, %r825; 2026-02-21T09:34:15.9116982Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9117292Z cvt.u64.u32 %rd175, %r514; 2026-02-21T09:34:15.9117479Z cvt.u64.u32 %rd176, %r515; 2026-02-21T09:34:15.9117665Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:34:15.9117843Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:34:15.9118139Z .loc 1 58 27 // 
ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9118447Z mov.b64 {%r828, %r829}, %rd178; 2026-02-21T09:34:15.9118643Z cvt.rn.f16x2.f32 %r830, %r829, %r828; 2026-02-21T09:34:15.9118939Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9119258Z cvt.u64.u32 %rd179, %r516; 2026-02-21T09:34:15.9119429Z cvt.u64.u32 %rd180, %r517; 2026-02-21T09:34:15.9119611Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:34:15.9119794Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:34:15.9120084Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9120402Z mov.b64 {%r831, %r832}, %rd182; 2026-02-21T09:34:15.9120627Z cvt.rn.f16x2.f32 %r833, %r832, %r831; 2026-02-21T09:34:15.9120987Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9121301Z cvt.u64.u32 %rd183, %r518; 2026-02-21T09:34:15.9121488Z cvt.u64.u32 %rd184, %r519; 2026-02-21T09:34:15.9121672Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:34:15.9121855Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:34:15.9122157Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9122477Z mov.b64 {%r834, %r835}, %rd186; 2026-02-21T09:34:15.9122674Z cvt.rn.f16x2.f32 %r836, %r835, %r834; 2026-02-21T09:34:15.9122981Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9123299Z cvt.u64.u32 %rd187, %r520; 2026-02-21T09:34:15.9123478Z cvt.u64.u32 %rd188, %r521; 2026-02-21T09:34:15.9123698Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:34:15.9123907Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:34:15.9124210Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9124530Z mov.b64 {%r837, %r838}, %rd190; 2026-02-21T09:34:15.9124760Z cvt.rn.f16x2.f32 %r839, %r838, %r837; 2026-02-21T09:34:15.9125071Z .loc 1 56 52 // 
ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9125379Z cvt.u64.u32 %rd191, %r522; 2026-02-21T09:34:15.9125560Z cvt.u64.u32 %rd192, %r523; 2026-02-21T09:34:15.9125735Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:34:15.9125907Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:34:15.9126212Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9126532Z mov.b64 {%r840, %r841}, %rd194; 2026-02-21T09:34:15.9126725Z cvt.rn.f16x2.f32 %r842, %r841, %r840; 2026-02-21T09:34:15.9127027Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9127373Z cvt.u64.u32 %rd195, %r524; 2026-02-21T09:34:15.9127542Z cvt.u64.u32 %rd196, %r525; 2026-02-21T09:34:15.9127720Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:34:15.9127900Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:34:15.9128203Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9128528Z mov.b64 {%r843, %r844}, %rd198; 2026-02-21T09:34:15.9128713Z cvt.rn.f16x2.f32 %r845, %r844, %r843; 2026-02-21T09:34:15.9129039Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9129356Z cvt.u64.u32 %rd199, %r526; 2026-02-21T09:34:15.9129539Z cvt.u64.u32 %rd200, %r527; 2026-02-21T09:34:15.9129717Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:34:15.9129890Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:34:15.9130190Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9130503Z mov.b64 {%r846, %r847}, %rd202; 2026-02-21T09:34:15.9130697Z cvt.rn.f16x2.f32 %r848, %r847, %r846; 2026-02-21T09:34:15.9131015Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9131345Z cvt.u64.u32 %rd203, %r529; 2026-02-21T09:34:15.9131515Z cvt.u64.u32 %rd204, %r530; 2026-02-21T09:34:15.9131693Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:34:15.9131876Z or.b64 %rd206, %rd203, 
%rd205; 2026-02-21T09:34:15.9132178Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9132503Z mov.b64 {%r849, %r850}, %rd206; 2026-02-21T09:34:15.9132686Z cvt.rn.f16x2.f32 %r851, %r850, %r849; 2026-02-21T09:34:15.9132992Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9133301Z cvt.u64.u32 %rd207, %r531; 2026-02-21T09:34:15.9133480Z cvt.u64.u32 %rd208, %r532; 2026-02-21T09:34:15.9133695Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:34:15.9133905Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:34:15.9134199Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9134520Z mov.b64 {%r852, %r853}, %rd210; 2026-02-21T09:34:15.9134746Z cvt.rn.f16x2.f32 %r854, %r853, %r852; 2026-02-21T09:34:15.9135050Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9135371Z cvt.u64.u32 %rd211, %r533; 2026-02-21T09:34:15.9135543Z cvt.u64.u32 %rd212, %r534; 2026-02-21T09:34:15.9135722Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:34:15.9135902Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:34:15.9136192Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9136544Z mov.b64 {%r855, %r856}, %rd214; 2026-02-21T09:34:15.9136757Z cvt.rn.f16x2.f32 %r857, %r856, %r855; 2026-02-21T09:34:15.9137068Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9137386Z cvt.u64.u32 %rd215, %r535; 2026-02-21T09:34:15.9137563Z cvt.u64.u32 %rd216, %r536; 2026-02-21T09:34:15.9137740Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:34:15.9137913Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:34:15.9138207Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9138525Z mov.b64 {%r858, %r859}, %rd218; 2026-02-21T09:34:15.9138716Z cvt.rn.f16x2.f32 %r860, %r859, %r858; 2026-02-21T09:34:15.9139017Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9139330Z cvt.u64.u32 %rd219, %r537; 2026-02-21T09:34:15.9139498Z cvt.u64.u32 %rd220, %r538; 2026-02-21T09:34:15.9139677Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:34:15.9139858Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:34:15.9140144Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9140462Z mov.b64 {%r861, %r862}, %rd222; 2026-02-21T09:34:15.9140645Z cvt.rn.f16x2.f32 %r863, %r862, %r861; 2026-02-21T09:34:15.9140952Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9141256Z cvt.u64.u32 %rd223, %r539; 2026-02-21T09:34:15.9141436Z cvt.u64.u32 %rd224, %r540; 2026-02-21T09:34:15.9141617Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:34:15.9141795Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:34:15.9142091Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9142401Z mov.b64 {%r864, %r865}, %rd226; 2026-02-21T09:34:15.9142590Z cvt.rn.f16x2.f32 %r866, %r865, %r864; 2026-02-21T09:34:15.9142893Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9143207Z cvt.u64.u32 %rd227, %r541; 2026-02-21T09:34:15.9143385Z cvt.u64.u32 %rd228, %r542; 2026-02-21T09:34:15.9143555Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:34:15.9143737Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:34:15.9144024Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9144343Z mov.b64 {%r867, %r868}, %rd230; 2026-02-21T09:34:15.9144529Z cvt.rn.f16x2.f32 %r869, %r868, %r867; 2026-02-21T09:34:15.9144889Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9145198Z cvt.u64.u32 %rd231, %r543; 2026-02-21T09:34:15.9145379Z cvt.u64.u32 %rd232, %r544; 2026-02-21T09:34:15.9145556Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:34:15.9145730Z or.b64 %rd234, %rd231, 
%rd233; 2026-02-21T09:34:15.9146032Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9146347Z mov.b64 {%r870, %r871}, %rd234; 2026-02-21T09:34:15.9146576Z cvt.rn.f16x2.f32 %r872, %r871, %r870; 2026-02-21T09:34:15.9146909Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9147233Z cvt.u64.u32 %rd235, %r546; 2026-02-21T09:34:15.9147412Z cvt.u64.u32 %rd236, %r547; 2026-02-21T09:34:15.9147583Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:34:15.9147763Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:34:15.9148052Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9148374Z mov.b64 {%r873, %r874}, %rd238; 2026-02-21T09:34:15.9148558Z cvt.rn.f16x2.f32 %r875, %r874, %r873; 2026-02-21T09:34:15.9148869Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9149181Z cvt.u64.u32 %rd239, %r548; 2026-02-21T09:34:15.9149388Z cvt.u64.u32 %rd240, %r549; 2026-02-21T09:34:15.9149595Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:34:15.9149774Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:34:15.9150070Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9150381Z mov.b64 {%r876, %r877}, %rd242; 2026-02-21T09:34:15.9150570Z cvt.rn.f16x2.f32 %r878, %r877, %r876; 2026-02-21T09:34:15.9150868Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9151192Z cvt.u64.u32 %rd243, %r550; 2026-02-21T09:34:15.9151370Z cvt.u64.u32 %rd244, %r551; 2026-02-21T09:34:15.9151540Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:34:15.9151719Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:34:15.9152010Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9152335Z mov.b64 {%r879, %r880}, %rd246; 2026-02-21T09:34:15.9152521Z cvt.rn.f16x2.f32 %r881, %r880, %r879; 2026-02-21T09:34:15.9152835Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9153148Z cvt.u64.u32 %rd247, %r552; 2026-02-21T09:34:15.9153326Z cvt.u64.u32 %rd248, %r553; 2026-02-21T09:34:15.9153510Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:34:15.9153689Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:34:15.9153991Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9154306Z mov.b64 {%r882, %r883}, %rd250; 2026-02-21T09:34:15.9154499Z cvt.rn.f16x2.f32 %r884, %r883, %r882; 2026-02-21T09:34:15.9154861Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9155184Z cvt.u64.u32 %rd251, %r554; 2026-02-21T09:34:15.9155361Z cvt.u64.u32 %rd252, %r555; 2026-02-21T09:34:15.9155536Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:34:15.9155715Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:34:15.9156011Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9156337Z mov.b64 {%r885, %r886}, %rd254; 2026-02-21T09:34:15.9156522Z cvt.rn.f16x2.f32 %r887, %r886, %r885; 2026-02-21T09:34:15.9156833Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9157149Z cvt.u64.u32 %rd255, %r556; 2026-02-21T09:34:15.9157328Z cvt.u64.u32 %rd256, %r557; 2026-02-21T09:34:15.9157505Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:34:15.9157684Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:34:15.9157986Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9158300Z mov.b64 {%r888, %r889}, %rd258; 2026-02-21T09:34:15.9158492Z cvt.rn.f16x2.f32 %r890, %r889, %r888; 2026-02-21T09:34:15.9158810Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9159134Z cvt.u64.u32 %rd259, %r558; 2026-02-21T09:34:15.9159347Z cvt.u64.u32 %rd260, %r559; 2026-02-21T09:34:15.9159553Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:34:15.9159735Z or.b64 %rd262, %rd259, 
%rd261; 2026-02-21T09:34:15.9160031Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9160352Z mov.b64 {%r891, %r892}, %rd262; 2026-02-21T09:34:15.9160536Z cvt.rn.f16x2.f32 %r893, %r892, %r891; 2026-02-21T09:34:15.9160847Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9161159Z cvt.u64.u32 %rd263, %r560; 2026-02-21T09:34:15.9161339Z cvt.u64.u32 %rd264, %r561; 2026-02-21T09:34:15.9161513Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:34:15.9161688Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:34:15.9161986Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9162331Z mov.b64 {%r894, %r895}, %rd266; 2026-02-21T09:34:15.9162551Z cvt.rn.f16x2.f32 %r896, %r895, %r894; 2026-02-21T09:34:15.9162862Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9163190Z cvt.u64.u32 %rd267, %r563; 2026-02-21T09:34:15.9163366Z cvt.u64.u32 %rd268, %r564; 2026-02-21T09:34:15.9163535Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:34:15.9163716Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:34:15.9164008Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9164335Z mov.b64 {%r897, %r898}, %rd270; 2026-02-21T09:34:15.9164523Z cvt.rn.f16x2.f32 %r899, %r898, %r897; 2026-02-21T09:34:15.9164861Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9165180Z cvt.u64.u32 %rd271, %r565; 2026-02-21T09:34:15.9165361Z cvt.u64.u32 %rd272, %r566; 2026-02-21T09:34:15.9165539Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:34:15.9165716Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:34:15.9166019Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9166335Z mov.b64 {%r900, %r901}, %rd274; 2026-02-21T09:34:15.9166537Z cvt.rn.f16x2.f32 %r902, %r901, %r900; 2026-02-21T09:34:15.9166838Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9167154Z cvt.u64.u32 %rd275, %r567; 2026-02-21T09:34:15.9167331Z cvt.u64.u32 %rd276, %r568; 2026-02-21T09:34:15.9167502Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:34:15.9167682Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:34:15.9167974Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9168299Z mov.b64 {%r903, %r904}, %rd278; 2026-02-21T09:34:15.9168485Z cvt.rn.f16x2.f32 %r905, %r904, %r903; 2026-02-21T09:34:15.9168799Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9169111Z cvt.u64.u32 %rd279, %r569; 2026-02-21T09:34:15.9169291Z cvt.u64.u32 %rd280, %r570; 2026-02-21T09:34:15.9169469Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:34:15.9169644Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:34:15.9169941Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9170253Z mov.b64 {%r906, %r907}, %rd282; 2026-02-21T09:34:15.9170445Z cvt.rn.f16x2.f32 %r908, %r907, %r906; 2026-02-21T09:34:15.9170742Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9171057Z cvt.u64.u32 %rd283, %r571; 2026-02-21T09:34:15.9171235Z cvt.u64.u32 %rd284, %r572; 2026-02-21T09:34:15.9171405Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:34:15.9171586Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:34:15.9171879Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9172260Z mov.b64 {%r909, %r910}, %rd286; 2026-02-21T09:34:15.9172473Z cvt.rn.f16x2.f32 %r911, %r910, %r909; 2026-02-21T09:34:15.9172784Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9173094Z cvt.u64.u32 %rd287, %r573; 2026-02-21T09:34:15.9173272Z cvt.u64.u32 %rd288, %r574; 2026-02-21T09:34:15.9173450Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:34:15.9173625Z or.b64 %rd290, %rd287, 
%rd289; 2026-02-21T09:34:15.9173918Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9174240Z mov.b64 {%r912, %r913}, %rd290; 2026-02-21T09:34:15.9174431Z cvt.rn.f16x2.f32 %r914, %r913, %r912; 2026-02-21T09:34:15.9174756Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9175114Z cvt.u64.u32 %rd291, %r575; 2026-02-21T09:34:15.9175322Z cvt.u64.u32 %rd292, %r576; 2026-02-21T09:34:15.9175496Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:34:15.9175681Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:34:15.9175971Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9176288Z mov.b64 {%r915, %r916}, %rd294; 2026-02-21T09:34:15.9176474Z cvt.rn.f16x2.f32 %r917, %r916, %r915; 2026-02-21T09:34:15.9176782Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9177095Z cvt.u64.u32 %rd295, %r577; 2026-02-21T09:34:15.9177271Z cvt.u64.u32 %rd296, %r578; 2026-02-21T09:34:15.9177447Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:34:15.9177620Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:34:15.9177923Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9178240Z mov.b64 {%r918, %r919}, %rd298; 2026-02-21T09:34:15.9178440Z cvt.rn.f16x2.f32 %r920, %r919, %r918; 2026-02-21T09:34:15.9178754Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9179083Z cvt.u64.u32 %rd299, %r580; 2026-02-21T09:34:15.9179264Z cvt.u64.u32 %rd300, %r581; 2026-02-21T09:34:15.9179436Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:34:15.9179616Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:34:15.9179908Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9180223Z mov.b64 {%r921, %r922}, %rd302; 2026-02-21T09:34:15.9180410Z cvt.rn.f16x2.f32 %r923, %r922, %r921; 2026-02-21T09:34:15.9180717Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9181030Z cvt.u64.u32 %rd303, %r582; 2026-02-21T09:34:15.9181208Z cvt.u64.u32 %rd304, %r583; 2026-02-21T09:34:15.9181385Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:34:15.9181562Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:34:15.9181864Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9182180Z mov.b64 {%r924, %r925}, %rd306; 2026-02-21T09:34:15.9182370Z cvt.rn.f16x2.f32 %r926, %r925, %r924; 2026-02-21T09:34:15.9182674Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9182992Z cvt.u64.u32 %rd307, %r584; 2026-02-21T09:34:15.9183171Z cvt.u64.u32 %rd308, %r585; 2026-02-21T09:34:15.9183341Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:34:15.9183522Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:34:15.9183815Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9184138Z mov.b64 {%r927, %r928}, %rd310; 2026-02-21T09:34:15.9184321Z cvt.rn.f16x2.f32 %r929, %r928, %r927; 2026-02-21T09:34:15.9184630Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9184991Z cvt.u64.u32 %rd311, %r586; 2026-02-21T09:34:15.9185205Z cvt.u64.u32 %rd312, %r587; 2026-02-21T09:34:15.9185412Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:34:15.9185587Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:34:15.9185881Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9186193Z mov.b64 {%r930, %r931}, %rd314; 2026-02-21T09:34:15.9186384Z cvt.rn.f16x2.f32 %r932, %r931, %r930; 2026-02-21T09:34:15.9186689Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9187007Z cvt.u64.u32 %rd315, %r588; 2026-02-21T09:34:15.9187185Z cvt.u64.u32 %rd316, %r589; 2026-02-21T09:34:15.9187355Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:34:15.9187536Z or.b64 %rd318, %rd315, 
%rd317; 2026-02-21T09:34:15.9187859Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9188207Z mov.b64 {%r933, %r934}, %rd318; 2026-02-21T09:34:15.9188394Z cvt.rn.f16x2.f32 %r935, %r934, %r933; 2026-02-21T09:34:15.9188709Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9189030Z cvt.u64.u32 %rd319, %r590; 2026-02-21T09:34:15.9189210Z cvt.u64.u32 %rd320, %r591; 2026-02-21T09:34:15.9189388Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:34:15.9189561Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:34:15.9189866Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9190184Z mov.b64 {%r936, %r937}, %rd322; 2026-02-21T09:34:15.9190376Z cvt.rn.f16x2.f32 %r938, %r937, %r936; 2026-02-21T09:34:15.9190687Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9191012Z cvt.u64.u32 %rd323, %r592; 2026-02-21T09:34:15.9191195Z cvt.u64.u32 %rd324, %r593; 2026-02-21T09:34:15.9191371Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:34:15.9191554Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:34:15.9191847Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9192170Z mov.b64 {%r939, %r940}, %rd326; 2026-02-21T09:34:15.9192354Z cvt.rn.f16x2.f32 %r941, %r940, %r939; 2026-02-21T09:34:15.9192662Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9192975Z cvt.u64.u32 %rd327, %r594; 2026-02-21T09:34:15.9193154Z cvt.u64.u32 %rd328, %r595; 2026-02-21T09:34:15.9193330Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:34:15.9193501Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:34:15.9193808Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9194127Z mov.b64 {%r942, %r943}, %rd330; 2026-02-21T09:34:15.9194327Z cvt.rn.f16x2.f32 %r944, %r943, %r942; 2026-02-21T09:34:15.9194652Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9195037Z cvt.u64.u32 %rd331, %r597; 2026-02-21T09:34:15.9195230Z cvt.u64.u32 %rd332, %r598; 2026-02-21T09:34:15.9195418Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:34:15.9195612Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:34:15.9195927Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9196340Z mov.b64 {%r945, %r946}, %rd334; 2026-02-21T09:34:15.9196530Z cvt.rn.f16x2.f32 %r947, %r946, %r945; 2026-02-21T09:34:15.9196841Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9197158Z cvt.u64.u32 %rd335, %r599; 2026-02-21T09:34:15.9197338Z cvt.u64.u32 %rd336, %r600; 2026-02-21T09:34:15.9197517Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:34:15.9197689Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:34:15.9198005Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9198441Z mov.b64 {%r948, %r949}, %rd338; 2026-02-21T09:34:15.9198648Z cvt.rn.f16x2.f32 %r950, %r949, %r948; 2026-02-21T09:34:15.9198975Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9199326Z cvt.u64.u32 %rd339, %r601; 2026-02-21T09:34:15.9199521Z cvt.u64.u32 %rd340, %r602; 2026-02-21T09:34:15.9199705Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:34:15.9199896Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:34:15.9200212Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9200552Z mov.b64 {%r951, %r952}, %rd342; 2026-02-21T09:34:15.9200747Z cvt.rn.f16x2.f32 %r953, %r952, %r951; 2026-02-21T09:34:15.9201093Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9201457Z cvt.u64.u32 %rd343, %r603; 2026-02-21T09:34:15.9201681Z cvt.u64.u32 %rd344, %r604; 2026-02-21T09:34:15.9201873Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:34:15.9202059Z or.b64 %rd346, %rd343, 
%rd345; 2026-02-21T09:34:15.9202367Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9202704Z mov.b64 {%r954, %r955}, %rd346; 2026-02-21T09:34:15.9202909Z cvt.rn.f16x2.f32 %r956, %r955, %r954; 2026-02-21T09:34:15.9203228Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9203571Z cvt.u64.u32 %rd347, %r605; 2026-02-21T09:34:15.9203760Z cvt.u64.u32 %rd348, %r606; 2026-02-21T09:34:15.9203945Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:34:15.9204140Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:34:15.9204458Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9204829Z mov.b64 {%r957, %r958}, %rd350; 2026-02-21T09:34:15.9205027Z cvt.rn.f16x2.f32 %r959, %r958, %r957; 2026-02-21T09:34:15.9205370Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9205723Z cvt.u64.u32 %rd351, %r607; 2026-02-21T09:34:15.9205910Z cvt.u64.u32 %rd352, %r608; 2026-02-21T09:34:15.9206099Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:34:15.9206282Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:34:15.9206601Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9206920Z mov.b64 {%r960, %r961}, %rd354; 2026-02-21T09:34:15.9207111Z cvt.rn.f16x2.f32 %r962, %r961, %r960; 2026-02-21T09:34:15.9207422Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9207745Z cvt.u64.u32 %rd355, %r609; 2026-02-21T09:34:15.9207926Z cvt.u64.u32 %rd356, %r610; 2026-02-21T09:34:15.9208101Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:34:15.9208287Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:34:15.9208585Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9208904Z mov.b64 {%r963, %r964}, %rd358; 2026-02-21T09:34:15.9209089Z cvt.rn.f16x2.f32 %r965, %r964, %r963; 2026-02-21T09:34:15.9209404Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9209725Z cvt.u64.u32 %rd359, %r611; 2026-02-21T09:34:15.9209904Z cvt.u64.u32 %rd360, %r612; 2026-02-21T09:34:15.9210080Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:34:15.9210254Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:34:15.9210552Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9210872Z mov.b64 {%r966, %r967}, %rd362; 2026-02-21T09:34:15.9211064Z cvt.rn.f16x2.f32 %r968, %r967, %r966; 2026-02-21T09:34:15.9211373Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9211722Z cvt.u64.u32 %rd363, %r614; 2026-02-21T09:34:15.9211931Z cvt.u64.u32 %rd364, %r615; 2026-02-21T09:34:15.9212103Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:34:15.9212284Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:34:15.9212571Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9212899Z mov.b64 {%r969, %r970}, %rd366; 2026-02-21T09:34:15.9213085Z cvt.rn.f16x2.f32 %r971, %r970, %r969; 2026-02-21T09:34:15.9213392Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9213699Z cvt.u64.u32 %rd367, %r616; 2026-02-21T09:34:15.9213876Z cvt.u64.u32 %rd368, %r617; 2026-02-21T09:34:15.9214051Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:34:15.9214226Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:34:15.9214557Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9214960Z mov.b64 {%r972, %r973}, %rd370; 2026-02-21T09:34:15.9215157Z cvt.rn.f16x2.f32 %r974, %r973, %r972; 2026-02-21T09:34:15.9215458Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9215769Z cvt.u64.u32 %rd371, %r618; 2026-02-21T09:34:15.9215945Z cvt.u64.u32 %rd372, %r619; 2026-02-21T09:34:15.9216116Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:34:15.9216300Z or.b64 %rd374, %rd371, 
%rd373; 2026-02-21T09:34:15.9216588Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9216920Z mov.b64 {%r975, %r976}, %rd374; 2026-02-21T09:34:15.9217104Z cvt.rn.f16x2.f32 %r977, %r976, %r975; 2026-02-21T09:34:15.9217410Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9217715Z cvt.u64.u32 %rd375, %r620; 2026-02-21T09:34:15.9217896Z cvt.u64.u32 %rd376, %r621; 2026-02-21T09:34:15.9218074Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:34:15.9218249Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:34:15.9218548Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9218856Z mov.b64 {%r978, %r979}, %rd378; 2026-02-21T09:34:15.9219044Z cvt.rn.f16x2.f32 %r980, %r979, %r978; 2026-02-21T09:34:15.9219349Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9219667Z cvt.u64.u32 %rd379, %r622; 2026-02-21T09:34:15.9219846Z cvt.u64.u32 %rd380, %r623; 2026-02-21T09:34:15.9220013Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:34:15.9220194Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:34:15.9220478Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9220793Z mov.b64 {%r981, %r982}, %rd382; 2026-02-21T09:34:15.9220977Z cvt.rn.f16x2.f32 %r983, %r982, %r981; 2026-02-21T09:34:15.9221285Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9221591Z cvt.u64.u32 %rd383, %r624; 2026-02-21T09:34:15.9221768Z cvt.u64.u32 %rd384, %r625; 2026-02-21T09:34:15.9221944Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:34:15.9222118Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:34:15.9222412Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9222719Z mov.b64 {%r984, %r985}, %rd386; 2026-02-21T09:34:15.9222910Z cvt.rn.f16x2.f32 %r986, %r985, %r984; 2026-02-21T09:34:15.9223208Z .loc 1 56 52 
// ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9223518Z cvt.u64.u32 %rd387, %r626; 2026-02-21T09:34:15.9223692Z cvt.u64.u32 %rd388, %r627; 2026-02-21T09:34:15.9223862Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:34:15.9224042Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:34:15.9224329Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9224771Z mov.b64 {%r987, %r988}, %rd390; 2026-02-21T09:34:15.9224956Z cvt.rn.f16x2.f32 %r989, %r988, %r987; 2026-02-21T09:34:15.9225263Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9225569Z cvt.u64.u32 %rd391, %r628; 2026-02-21T09:34:15.9225747Z cvt.u64.u32 %rd392, %r629; 2026-02-21T09:34:15.9225926Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:34:15.9226101Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:34:15.9226401Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9226716Z mov.b64 {%r990, %r991}, %rd394; 2026-02-21T09:34:15.9226907Z cvt.rn.f16x2.f32 %r992, %r991, %r990; 2026-02-21T09:34:15.9227240Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9227601Z cvt.u64.u32 %rd395, %r631; 2026-02-21T09:34:15.9227783Z cvt.u64.u32 %rd396, %r632; 2026-02-21T09:34:15.9227956Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:34:15.9228141Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:34:15.9228430Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9228762Z mov.b64 {%r993, %r994}, %rd398; 2026-02-21T09:34:15.9228948Z cvt.rn.f16x2.f32 %r995, %r994, %r993; 2026-02-21T09:34:15.9229262Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9229573Z cvt.u64.u32 %rd399, %r633; 2026-02-21T09:34:15.9229750Z cvt.u64.u32 %rd400, %r634; 2026-02-21T09:34:15.9229926Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:34:15.9230102Z or.b64 %rd402, %rd399, 
%rd401; 2026-02-21T09:34:15.9230410Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9230728Z mov.b64 {%r996, %r997}, %rd402; 2026-02-21T09:34:15.9230926Z cvt.rn.f16x2.f32 %r998, %r997, %r996; 2026-02-21T09:34:15.9231227Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9231547Z cvt.u64.u32 %rd403, %r635; 2026-02-21T09:34:15.9231724Z cvt.u64.u32 %rd404, %r636; 2026-02-21T09:34:15.9231895Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:34:15.9232078Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:34:15.9232373Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9232698Z mov.b64 {%r999, %r1000}, %rd406; 2026-02-21T09:34:15.9232898Z cvt.rn.f16x2.f32 %r1001, %r1000, %r999; 2026-02-21T09:34:15.9233220Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9233532Z cvt.u64.u32 %rd407, %r637; 2026-02-21T09:34:15.9233712Z cvt.u64.u32 %rd408, %r638; 2026-02-21T09:34:15.9233892Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:34:15.9234070Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:34:15.9234377Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9234728Z mov.b64 {%r1002, %r1003}, %rd410; 2026-02-21T09:34:15.9234941Z cvt.rn.f16x2.f32 %r1004, %r1003, %r1002; 2026-02-21T09:34:15.9235252Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9235578Z cvt.u64.u32 %rd411, %r639; 2026-02-21T09:34:15.9235755Z cvt.u64.u32 %rd412, %r640; 2026-02-21T09:34:15.9235928Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:34:15.9236112Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:34:15.9236405Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9236727Z mov.b64 {%r1005, %r1006}, %rd414; 2026-02-21T09:34:15.9236927Z cvt.rn.f16x2.f32 %r1007, %r1006, %r1005; 2026-02-21T09:34:15.9237249Z 
.loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9237629Z cvt.u64.u32 %rd415, %r641; 2026-02-21T09:34:15.9237815Z cvt.u64.u32 %rd416, %r642; 2026-02-21T09:34:15.9238000Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:34:15.9238180Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:34:15.9238481Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9238802Z mov.b64 {%r1008, %r1009}, %rd418; 2026-02-21T09:34:15.9239013Z cvt.rn.f16x2.f32 %r1010, %r1009, %r1008; 2026-02-21T09:34:15.9239329Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9239653Z cvt.u64.u32 %rd419, %r643; 2026-02-21T09:34:15.9239838Z cvt.u64.u32 %rd420, %r644; 2026-02-21T09:34:15.9240017Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:34:15.9240210Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:34:15.9240576Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9240906Z mov.b64 {%r1011, %r1012}, %rd422; 2026-02-21T09:34:15.9241102Z cvt.rn.f16x2.f32 %r1013, %r1012, %r1011; 2026-02-21T09:34:15.9241428Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9241752Z cvt.u64.u32 %rd423, %r645; 2026-02-21T09:34:15.9241929Z cvt.u64.u32 %rd424, %r646; 2026-02-21T09:34:15.9242107Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:34:15.9242286Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:34:15.9242597Z .loc 1 58 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:58:27 2026-02-21T09:34:15.9242923Z mov.b64 {%r1014, %r1015}, %rd426; 2026-02-21T09:34:15.9243134Z cvt.rn.f16x2.f32 %r1016, %r1015, %r1014; 2026-02-21T09:34:15.9243450Z .loc 1 59 82 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:59:82 2026-02-21T09:34:15.9243830Z st.shared.v4.b32 [%r46], {%r827, %r839, %r851, %r863}; 2026-02-21T09:34:15.9244108Z st.shared.v4.b32 [%r47], {%r875, %r887, %r899, %r911}; 
2026-02-21T09:34:15.9244364Z st.shared.v4.b32 [%r48], {%r923, %r935, %r947, %r959}; 2026-02-21T09:34:15.9244630Z st.shared.v4.b32 [%r49], {%r971, %r983, %r995, %r1007}; 2026-02-21T09:34:15.9244884Z bar.sync 0; 2026-02-21T09:34:15.9245042Z // begin inline asm 2026-02-21T09:34:15.9245307Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r648, %r649, %r650, %r651}, [%r652]; 2026-02-21T09:34:15.9245616Z // end inline asm 2026-02-21T09:34:15.9245772Z // begin inline asm 2026-02-21T09:34:15.9246040Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r654, %r655, %r656}, [%r657]; 2026-02-21T09:34:15.9246341Z // end inline asm 2026-02-21T09:34:15.9246495Z // begin inline asm 2026-02-21T09:34:15.9246754Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r658, %r659, %r660, %r661}, [%r662]; 2026-02-21T09:34:15.9247046Z // end inline asm 2026-02-21T09:34:15.9247205Z // begin inline asm 2026-02-21T09:34:15.9247454Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r663, %r664, %r665, %r666}, [%r667]; 2026-02-21T09:34:15.9247767Z // end inline asm 2026-02-21T09:34:15.9247919Z bar.sync 0; 2026-02-21T09:34:15.9248115Z st.shared.v4.b32 [%r46], {%r830, %r842, %r854, %r866}; 2026-02-21T09:34:15.9248377Z st.shared.v4.b32 [%r47], {%r878, %r890, %r902, %r914}; 2026-02-21T09:34:15.9248627Z st.shared.v4.b32 [%r48], {%r926, %r938, %r950, %r962}; 2026-02-21T09:34:15.9248888Z st.shared.v4.b32 [%r49], {%r974, %r986, %r998, %r1010}; 2026-02-21T09:34:15.9249107Z bar.sync 0; 2026-02-21T09:34:15.9249257Z // begin inline asm 2026-02-21T09:34:15.9249510Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r668, %r669, %r670, %r671}, [%r652]; 2026-02-21T09:34:15.9249807Z // end inline asm 2026-02-21T09:34:15.9249956Z // begin inline asm 2026-02-21T09:34:15.9250213Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r673, %r674, %r675, %r676}, [%r657]; 2026-02-21T09:34:15.9250505Z // end inline asm 2026-02-21T09:34:15.9250657Z // begin inline asm 2026-02-21T09:34:15.9250946Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r678, %r679, %r680, 
%r681}, [%r662]; 2026-02-21T09:34:15.9251261Z // end inline asm 2026-02-21T09:34:15.9251416Z // begin inline asm 2026-02-21T09:34:15.9251661Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r683, %r684, %r685, %r686}, [%r667]; 2026-02-21T09:34:15.9251954Z // end inline asm 2026-02-21T09:34:15.9252102Z bar.sync 0; 2026-02-21T09:34:15.9252292Z st.shared.v4.b32 [%r46], {%r833, %r845, %r857, %r869}; 2026-02-21T09:34:15.9252556Z st.shared.v4.b32 [%r47], {%r881, %r893, %r905, %r917}; 2026-02-21T09:34:15.9252808Z st.shared.v4.b32 [%r48], {%r929, %r941, %r953, %r965}; 2026-02-21T09:34:15.9253075Z st.shared.v4.b32 [%r49], {%r977, %r989, %r1001, %r1013}; 2026-02-21T09:34:15.9253299Z bar.sync 0; 2026-02-21T09:34:15.9253456Z // begin inline asm 2026-02-21T09:34:15.9253718Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r688, %r689, %r690, %r691}, [%r652]; 2026-02-21T09:34:15.9254057Z // end inline asm 2026-02-21T09:34:15.9254236Z // begin inline asm 2026-02-21T09:34:15.9254495Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r693, %r694, %r695, %r696}, [%r657]; 2026-02-21T09:34:15.9254843Z // end inline asm 2026-02-21T09:34:15.9254992Z // begin inline asm 2026-02-21T09:34:15.9255248Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r698, %r699, %r700, %r701}, [%r662]; 2026-02-21T09:34:15.9255537Z // end inline asm 2026-02-21T09:34:15.9255692Z // begin inline asm 2026-02-21T09:34:15.9255942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r703, %r704, %r705, %r706}, [%r667]; 2026-02-21T09:34:15.9256239Z // end inline asm 2026-02-21T09:34:15.9256386Z bar.sync 0; 2026-02-21T09:34:15.9256574Z st.shared.v4.b32 [%r46], {%r836, %r848, %r860, %r872}; 2026-02-21T09:34:15.9256833Z st.shared.v4.b32 [%r47], {%r884, %r896, %r908, %r920}; 2026-02-21T09:34:15.9257089Z st.shared.v4.b32 [%r48], {%r932, %r944, %r956, %r968}; 2026-02-21T09:34:15.9257358Z st.shared.v4.b32 [%r49], {%r980, %r992, %r1004, %r1016}; 2026-02-21T09:34:15.9257580Z bar.sync 0; 2026-02-21T09:34:15.9257735Z // begin inline asm 
2026-02-21T09:34:15.9257991Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r708, %r709, %r710, %r711}, [%r652]; 2026-02-21T09:34:15.9258294Z // end inline asm 2026-02-21T09:34:15.9258444Z // begin inline asm 2026-02-21T09:34:15.9258707Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r713, %r714, %r715, %r716}, [%r657]; 2026-02-21T09:34:15.9259012Z // end inline asm 2026-02-21T09:34:15.9259163Z // begin inline asm 2026-02-21T09:34:15.9259422Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r718, %r719, %r720, %r721}, [%r662]; 2026-02-21T09:34:15.9259713Z // end inline asm 2026-02-21T09:34:15.9259868Z // begin inline asm 2026-02-21T09:34:15.9260121Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r723, %r724, %r725, %r726}, [%r667]; 2026-02-21T09:34:15.9260425Z // end inline asm 2026-02-21T09:34:15.9260571Z // begin inline asm 2026-02-21T09:34:15.9260790Z st.global.v4.b32 [ %rd155 + 0 ], { %r648, %r668, %r688, %r708 }; 2026-02-21T09:34:15.9261042Z // end inline asm 2026-02-21T09:34:15.9261194Z // begin inline asm 2026-02-21T09:34:15.9261412Z st.global.v4.b32 [ %rd156 + 0 ], { %r649, %r669, %r689, %r709 }; 2026-02-21T09:34:15.9261650Z // end inline asm 2026-02-21T09:34:15.9261806Z // begin inline asm 2026-02-21T09:34:15.9262007Z st.global.v4.b32 [ %rd157 + 0 ], { %r650, %r670, %r690, %r710 }; 2026-02-21T09:34:15.9262245Z // end inline asm 2026-02-21T09:34:15.9262396Z // begin inline asm 2026-02-21T09:34:15.9262604Z st.global.v4.b32 [ %rd158 + 0 ], { %r651, %r671, %r691, %r711 }; 2026-02-21T09:34:15.9262848Z // end inline asm 2026-02-21T09:34:15.9262997Z // begin inline asm 2026-02-21T09:34:15.9263205Z st.global.v4.b32 [ %rd159 + 0 ], { %r653, %r673, %r693, %r713 }; 2026-02-21T09:34:15.9263436Z // end inline asm 2026-02-21T09:34:15.9263593Z // begin inline asm 2026-02-21T09:34:15.9263795Z st.global.v4.b32 [ %rd160 + 0 ], { %r654, %r674, %r694, %r714 }; 2026-02-21T09:34:15.9264039Z // end inline asm 2026-02-21T09:34:15.9264190Z // begin inline asm 2026-02-21T09:34:15.9264432Z 
st.global.v4.b32 [ %rd161 + 0 ], { %r655, %r675, %r695, %r715 }; 2026-02-21T09:34:15.9264740Z // end inline asm 2026-02-21T09:34:15.9264895Z // begin inline asm 2026-02-21T09:34:15.9265109Z st.global.v4.b32 [ %rd162 + 0 ], { %r656, %r676, %r696, %r716 }; 2026-02-21T09:34:15.9265343Z // end inline asm 2026-02-21T09:34:15.9265510Z // begin inline asm 2026-02-21T09:34:15.9265711Z st.global.v4.b32 [ %rd163 + 0 ], { %r658, %r678, %r698, %r718 }; 2026-02-21T09:34:15.9265945Z // end inline asm 2026-02-21T09:34:15.9266094Z // begin inline asm 2026-02-21T09:34:15.9266297Z st.global.v4.b32 [ %rd164 + 0 ], { %r659, %r679, %r699, %r719 }; 2026-02-21T09:34:15.9266537Z // end inline asm 2026-02-21T09:34:15.9266686Z // begin inline asm 2026-02-21T09:34:15.9266887Z st.global.v4.b32 [ %rd165 + 0 ], { %r660, %r680, %r700, %r720 }; 2026-02-21T09:34:15.9267112Z // end inline asm 2026-02-21T09:34:15.9267323Z // begin inline asm 2026-02-21T09:34:15.9267553Z st.global.v4.b32 [ %rd166 + 0 ], { %r661, %r681, %r701, %r721 }; 2026-02-21T09:34:15.9267797Z // end inline asm 2026-02-21T09:34:15.9267948Z // begin inline asm 2026-02-21T09:34:15.9268157Z st.global.v4.b32 [ %rd167 + 0 ], { %r663, %r683, %r703, %r723 }; 2026-02-21T09:34:15.9268385Z // end inline asm 2026-02-21T09:34:15.9268542Z // begin inline asm 2026-02-21T09:34:15.9268750Z st.global.v4.b32 [ %rd168 + 0 ], { %r664, %r684, %r704, %r724 }; 2026-02-21T09:34:15.9268979Z // end inline asm 2026-02-21T09:34:15.9269135Z // begin inline asm 2026-02-21T09:34:15.9269334Z st.global.v4.b32 [ %rd169 + 0 ], { %r665, %r685, %r705, %r725 }; 2026-02-21T09:34:15.9269573Z // end inline asm 2026-02-21T09:34:15.9269723Z // begin inline asm 2026-02-21T09:34:15.9269838Z st.global.v4.b32 [ %rd170 + 0 ], { %r666, %r686, %r706, %r726 }; 2026-02-21T09:34:15.9269899Z // end inline asm 2026-02-21T09:34:15.9270088Z .loc 1 30 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:30:52 2026-02-21T09:34:15.9270165Z add.s32 %r1018, %r1018, 1; 
2026-02-21T09:34:15.9270241Z setp.ne.b32 %p91, %r1018, %r4; 2026-02-21T09:34:15.9270308Z @%p91 bra $L__BB0_2; 2026-02-21T09:34:15.9270371Z bra.uni $L__BB0_9; 2026-02-21T09:34:15.9270491Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:34:15.9270593Z // Child Loop BB0_5 Depth 2 2026-02-21T09:34:15.9270774Z .loc 1 0 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:0:52 2026-02-21T09:34:15.9270852Z setp.lt.u32 %p36, %r1, 64; 2026-02-21T09:34:15.9271031Z .loc 1 36 35 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:36:35 2026-02-21T09:34:15.9271104Z mul.hi.s32 %r334, %r1018, 715827883; 2026-02-21T09:34:15.9271177Z shr.u32 %r335, %r334, 31; 2026-02-21T09:34:15.9271243Z shr.s32 %r336, %r334, 10; 2026-02-21T09:34:15.9271310Z add.s32 %r337, %r336, %r335; 2026-02-21T09:34:15.9271490Z .loc 1 37 33 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:37:33 2026-02-21T09:34:15.9271568Z shl.b32 %r338, %r337, 6; 2026-02-21T09:34:15.9271742Z .loc 1 38 39 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:38:39 2026-02-21T09:34:15.9271807Z sub.s32 %r339, 4, %r338; 2026-02-21T09:34:15.9271987Z .loc 1 38 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:38:52 2026-02-21T09:34:15.9272052Z min.s32 %r340, %r339, 64; 2026-02-21T09:34:15.9272225Z .loc 1 39 45 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:39:45 2026-02-21T09:34:15.9272301Z mul.lo.s32 %r341, %r337, 6144; 2026-02-21T09:34:15.9272368Z sub.s32 %r342, %r1018, %r341; 2026-02-21T09:34:15.9272541Z .loc 1 40 51 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:40:51 2026-02-21T09:34:15.9272608Z div.s32 %r57, %r342, %r340; 2026-02-21T09:34:15.9272792Z .loc 1 39 64 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:39:64 2026-02-21T09:34:15.9272922Z mul.lo.s32 %r343, %r57, %r340; 2026-02-21T09:34:15.9272990Z sub.s32 %r344, %r342, %r343; 2026-02-21T09:34:15.9273183Z .loc 1 39 30 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:39:30 
2026-02-21T09:34:15.9273250Z add.s32 %r345, %r344, %r338; 2026-02-21T09:34:15.9273432Z .loc 1 41 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:41:27 2026-02-21T09:34:15.9273505Z shl.b32 %r397, %r345, 8; 2026-02-21T09:34:15.9273689Z .loc 1 43 27 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:43:27 2026-02-21T09:34:15.9273756Z shl.b32 %r346, %r57, 7; 2026-02-21T09:34:15.9273945Z .loc 1 44 32 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:44:32 2026-02-21T09:34:15.9274010Z or.b32 %r347, %r346, %r9; 2026-02-21T09:34:15.9274237Z .loc 1 54 53 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:53 2026-02-21T09:34:15.9274304Z shl.b32 %r348, %r347, 10; 2026-02-21T09:34:15.9274490Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9274577Z shfl.sync.idx.b32 %r76, %r5, 0, 31, -1; 2026-02-21T09:34:15.9274640Z shl.b32 %r349, %r76, 21; 2026-02-21T09:34:15.9274753Z and.b32 %r350, %r349, 6291456; 2026-02-21T09:34:15.9274824Z add.s32 %r351, %r350, %r1017; 2026-02-21T09:34:15.9274891Z shl.b32 %r352, %r76, 5; 2026-02-21T09:34:15.9274970Z and.b32 %r353, %r352, 128; 2026-02-21T09:34:15.9275036Z add.s32 %r647, %r351, %r353; 2026-02-21T09:34:15.9275106Z mov.pred %p41, -1; 2026-02-21T09:34:15.9275169Z mov.b32 %r1019, 0; 2026-02-21T09:34:15.9275246Z // begin inline asm 2026-02-21T09:34:15.9275631Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 0], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9275696Z // end inline asm 2026-02-21T09:34:15.9275768Z // begin inline asm 2026-02-21T09:34:15.9276141Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 16], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9276205Z // end inline asm 2026-02-21T09:34:15.9276273Z // begin inline asm 
2026-02-21T09:34:15.9276627Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 32], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9276688Z // end inline asm 2026-02-21T09:34:15.9276748Z // begin inline asm 2026-02-21T09:34:15.9277101Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 48], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9277164Z // end inline asm 2026-02-21T09:34:15.9277227Z // begin inline asm 2026-02-21T09:34:15.9277581Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 64], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9277646Z // end inline asm 2026-02-21T09:34:15.9277708Z // begin inline asm 2026-02-21T09:34:15.9278066Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 80], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9278127Z // end inline asm 2026-02-21T09:34:15.9278189Z // begin inline asm 2026-02-21T09:34:15.9278548Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 96], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9278608Z // end inline asm 2026-02-21T09:34:15.9278669Z // begin inline asm 2026-02-21T09:34:15.9279025Z @%p41 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r647 + 112], {%r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019, %r1019}; 2026-02-21T09:34:15.9279138Z // end inline asm 2026-02-21T09:34:15.9279198Z // begin inline asm 2026-02-21T09:34:15.9279279Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:34:15.9279348Z // end inline asm 
2026-02-21T09:34:15.9279407Z bar.sync 0; 2026-02-21T09:34:15.9279590Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9279667Z add.s32 %r1020, %r93, 196608; 2026-02-21T09:34:15.9279730Z // begin inline asm 2026-02-21T09:34:15.9279826Z @%p4 mbarrier.init.shared::cta.b64 [%r1020], 1; 2026-02-21T09:34:15.9279895Z // end inline asm 2026-02-21T09:34:15.9279954Z bar.sync 0; 2026-02-21T09:34:15.9280021Z add.s32 %r308, %r93, 196616; 2026-02-21T09:34:15.9280083Z // begin inline asm 2026-02-21T09:34:15.9280214Z @%p4 mbarrier.init.shared::cta.b64 [%r308], 1; 2026-02-21T09:34:15.9280307Z // end inline asm 2026-02-21T09:34:15.9280378Z add.s32 %r309, %r93, 196624; 2026-02-21T09:34:15.9280449Z // begin inline asm 2026-02-21T09:34:15.9280541Z @%p4 mbarrier.init.shared::cta.b64 [%r309], 1; 2026-02-21T09:34:15.9280601Z // end inline asm 2026-02-21T09:34:15.9280659Z bar.sync 0; 2026-02-21T09:34:15.9280731Z add.s32 %r394, %r93, 196632; 2026-02-21T09:34:15.9280793Z // begin inline asm 2026-02-21T09:34:15.9280881Z @%p4 mbarrier.init.shared::cta.b64 [%r394], 1; 2026-02-21T09:34:15.9280949Z // end inline asm 2026-02-21T09:34:15.9281131Z .loc 1 54 60 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:60 2026-02-21T09:34:15.9281196Z or.b32 %r355, %r348, %r27; 2026-02-21T09:34:15.9281386Z .loc 1 54 32 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:32 2026-02-21T09:34:15.9281463Z mad.wide.s32 %rd67, %r355, 2, %rd28; 2026-02-21T09:34:15.9281531Z cvt.u64.u32 %rd76, %r27; 2026-02-21T09:34:15.9281599Z cvt.s64.s32 %rd19, %r348; 2026-02-21T09:34:15.9281675Z or.b64 %rd77, %rd19, %rd76; 2026-02-21T09:34:15.9281740Z shl.b64 %rd78, %rd77, 1; 2026-02-21T09:34:15.9281805Z add.s64 %rd20, %rd28, %rd78; 2026-02-21T09:34:15.9281875Z add.s64 %rd68, %rd20, 32768; 2026-02-21T09:34:15.9281941Z add.s64 %rd69, %rd20, 65536; 2026-02-21T09:34:15.9282003Z add.s64 %rd70, %rd20, 98304; 2026-02-21T09:34:15.9282068Z add.s64 %rd71, %rd20, 131072; 
2026-02-21T09:34:15.9282141Z add.s64 %rd72, %rd20, 163840; 2026-02-21T09:34:15.9282204Z add.s64 %rd73, %rd20, 196608; 2026-02-21T09:34:15.9282267Z add.s64 %rd74, %rd20, 229376; 2026-02-21T09:34:15.9282334Z mov.b32 %r379, 16; 2026-02-21T09:34:15.9282511Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9282573Z // begin inline asm 2026-02-21T09:34:15.9282716Z cp.async.cg.shared.global [ %r311 + 0 ], [ %rd67 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9282778Z // end inline asm 2026-02-21T09:34:15.9282841Z // begin inline asm 2026-02-21T09:34:15.9282978Z cp.async.cg.shared.global [ %r313 + 0 ], [ %rd68 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9283046Z // end inline asm 2026-02-21T09:34:15.9283107Z // begin inline asm 2026-02-21T09:34:15.9283233Z cp.async.cg.shared.global [ %r315 + 0 ], [ %rd69 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9283300Z // end inline asm 2026-02-21T09:34:15.9283362Z // begin inline asm 2026-02-21T09:34:15.9283485Z cp.async.cg.shared.global [ %r317 + 0 ], [ %rd70 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9283546Z // end inline asm 2026-02-21T09:34:15.9283614Z // begin inline asm 2026-02-21T09:34:15.9283735Z cp.async.cg.shared.global [ %r319 + 0 ], [ %rd71 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9283795Z // end inline asm 2026-02-21T09:34:15.9283865Z // begin inline asm 2026-02-21T09:34:15.9283987Z cp.async.cg.shared.global [ %r321 + 0 ], [ %rd72 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9284048Z // end inline asm 2026-02-21T09:34:15.9284119Z // begin inline asm 2026-02-21T09:34:15.9284267Z cp.async.cg.shared.global [ %r323 + 0 ], [ %rd73 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9284354Z // end inline asm 2026-02-21T09:34:15.9284416Z // begin inline asm 2026-02-21T09:34:15.9284549Z cp.async.cg.shared.global [ %r325 + 0 ], [ %rd74 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9284609Z // end inline asm 2026-02-21T09:34:15.9284714Z cp.async.commit_group; 2026-02-21T09:34:15.9284911Z .loc 1 49 80 // 
ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9284975Z bar.sync 0; 2026-02-21T09:34:15.9285041Z // begin inline asm 2026-02-21T09:34:15.9285167Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r309], 65536; 2026-02-21T09:34:15.9285239Z // end inline asm 2026-02-21T09:34:15.9285429Z .loc 1 55 44 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:55:44 2026-02-21T09:34:15.9285496Z // begin inline asm 2026-02-21T09:34:15.9285617Z fence.proxy.async.shared::cta; 2026-02-21T09:34:15.9285705Z // end inline asm 2026-02-21T09:34:15.9285772Z bar.sync 0; 2026-02-21T09:34:15.9285857Z elect.sync %r356|%p37, -1; 2026-02-21T09:34:15.9285930Z and.pred %p35, %p36, %p37; 2026-02-21T09:34:15.9285997Z and.b32 %r357, %r76, 1; 2026-02-21T09:34:15.9286062Z shl.b32 %r78, %r357, 14; 2026-02-21T09:34:15.9286134Z shl.b32 %r358, %r357, 15; 2026-02-21T09:34:15.9286200Z add.s32 %r328, %r93, %r358; 2026-02-21T09:34:15.9286263Z shl.b32 %r329, %r357, 6; 2026-02-21T09:34:15.9286331Z // begin inline asm 2026-02-21T09:34:15.9286618Z @%p35 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r328], [%rd104, {%r329, %r397}], [%r309]; 2026-02-21T09:34:15.9286681Z // end inline asm 2026-02-21T09:34:15.9286863Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9286935Z cp.async.wait_group 0; 2026-02-21T09:34:15.9286994Z bar.sync 0; 2026-02-21T09:34:15.9287175Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9287248Z // begin inline asm 2026-02-21T09:34:15.9287305Z 2026-02-21T09:34:15.9287360Z { 2026-02-21T09:34:15.9287435Z .reg .pred complete; 2026-02-21T09:34:15.9287498Z waitLoop: 2026-02-21T09:34:15.9287636Z mbarrier.try_wait.parity.shared.b64 complete, [%r309], %r1019; 2026-02-21T09:34:15.9287709Z @!complete bra.uni waitLoop; 2026-02-21T09:34:15.9287771Z } 2026-02-21T09:34:15.9287776Z 2026-02-21T09:34:15.9287837Z // end inline asm 
2026-02-21T09:34:15.9288013Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9288094Z setp.ne.b32 %p38, %r76, 0; 2026-02-21T09:34:15.9288158Z @%p38 bra $L__BB0_4; 2026-02-21T09:34:15.9288272Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:15.9288350Z elect.sync %r375|%p40, -1; 2026-02-21T09:34:15.9288416Z mov.b32 %r360, 138412048; 2026-02-21T09:34:15.9288484Z mov.pred %p39, 0; 2026-02-21T09:34:15.9288548Z // begin inline asm 2026-02-21T09:34:15.9288722Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd79, %rd80, %r360, %p39; 2026-02-21T09:34:15.9288783Z // end inline asm 2026-02-21T09:34:15.9288845Z // begin inline asm 2026-02-21T09:34:15.9289010Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd81, %rd82, %r360, %p41; 2026-02-21T09:34:15.9289073Z // end inline asm 2026-02-21T09:34:15.9289135Z // begin inline asm 2026-02-21T09:34:15.9289290Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd83, %rd84, %r360, %p41; 2026-02-21T09:34:15.9289351Z // end inline asm 2026-02-21T09:34:15.9289412Z // begin inline asm 2026-02-21T09:34:15.9289558Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd85, %rd86, %r360, %p41; 2026-02-21T09:34:15.9289626Z // end inline asm 2026-02-21T09:34:15.9289688Z // begin inline asm 2026-02-21T09:34:15.9289836Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd87, %rd88, %r360, %p41; 2026-02-21T09:34:15.9289936Z // end inline asm 2026-02-21T09:34:15.9289997Z // begin inline asm 2026-02-21T09:34:15.9290186Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd89, %rd90, %r360, %p41; 2026-02-21T09:34:15.9290254Z // end inline asm 2026-02-21T09:34:15.9290317Z // begin inline asm 2026-02-21T09:34:15.9290460Z @%p40 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd91, %rd92, %r360, %p41; 2026-02-21T09:34:15.9290528Z // end inline asm 2026-02-21T09:34:15.9290588Z // begin inline asm 2026-02-21T09:34:15.9290732Z @%p40 
tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd93, %rd94, %r360, %p41; 2026-02-21T09:34:15.9290791Z // end inline asm 2026-02-21T09:34:15.9290866Z add.s32 %r377, %r93, 196608; 2026-02-21T09:34:15.9290931Z cvt.u64.u32 %rd95, %r377; 2026-02-21T09:34:15.9290992Z // begin inline asm 2026-02-21T09:34:15.9291138Z @%p40 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd95]; 2026-02-21T09:34:15.9291220Z // end inline asm 2026-02-21T09:34:15.9291356Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:34:15.9291550Z .loc 1 0 0 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:0 2026-02-21T09:34:15.9291615Z or.b32 %r59, %r397, %r7; 2026-02-21T09:34:15.9291680Z or.b32 %r60, %r346, %r11; 2026-02-21T09:34:15.9291741Z or.b32 %r61, %r346, %r12; 2026-02-21T09:34:15.9291813Z or.b32 %r62, %r346, %r13; 2026-02-21T09:34:15.9291875Z or.b32 %r63, %r346, %r14; 2026-02-21T09:34:15.9291936Z or.b32 %r64, %r346, %r15; 2026-02-21T09:34:15.9292005Z or.b32 %r65, %r346, %r16; 2026-02-21T09:34:15.9292066Z or.b32 %r66, %r346, %r17; 2026-02-21T09:34:15.9292127Z or.b32 %r67, %r346, %r18; 2026-02-21T09:34:15.9292187Z or.b32 %r68, %r346, %r19; 2026-02-21T09:34:15.9292256Z or.b32 %r69, %r346, %r20; 2026-02-21T09:34:15.9292319Z or.b32 %r70, %r346, %r21; 2026-02-21T09:34:15.9292381Z or.b32 %r71, %r346, %r22; 2026-02-21T09:34:15.9292450Z or.b32 %r72, %r346, %r23; 2026-02-21T09:34:15.9292514Z or.b32 %r73, %r346, %r24; 2026-02-21T09:34:15.9292576Z or.b32 %r74, %r346, %r25; 2026-02-21T09:34:15.9292639Z or.b32 %r75, %r346, %r26; 2026-02-21T09:34:15.9292835Z .loc 1 54 32 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:32 2026-02-21T09:34:15.9292900Z add.s64 %rd96, %rd20, 256; 2026-02-21T09:34:15.9292964Z cvt.u64.u32 %rd106, %r37; 2026-02-21T09:34:15.9293043Z add.s64 %rd107, %rd19, %rd106; 2026-02-21T09:34:15.9293108Z shl.b64 %rd108, %rd107, 1; 2026-02-21T09:34:15.9293179Z add.s64 %rd109, %rd28, %rd108; 2026-02-21T09:34:15.9293248Z add.s64 %rd97, %rd109, 32768; 
2026-02-21T09:34:15.9293320Z add.s64 %rd98, %rd109, 65536; 2026-02-21T09:34:15.9293384Z add.s64 %rd99, %rd109, 98304; 2026-02-21T09:34:15.9293456Z add.s64 %rd100, %rd109, 131072; 2026-02-21T09:34:15.9293531Z add.s64 %rd101, %rd109, 163840; 2026-02-21T09:34:15.9293597Z add.s64 %rd102, %rd109, 196608; 2026-02-21T09:34:15.9293663Z add.s64 %rd103, %rd109, 229376; 2026-02-21T09:34:15.9293859Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9293925Z bar.sync 0; 2026-02-21T09:34:15.9293988Z // begin inline asm 2026-02-21T09:34:15.9294123Z cp.async.cg.shared.global [ %r378 + 0 ], [ %rd96 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9294194Z // end inline asm 2026-02-21T09:34:15.9294257Z // begin inline asm 2026-02-21T09:34:15.9294386Z cp.async.cg.shared.global [ %r380 + 0 ], [ %rd97 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9294459Z // end inline asm 2026-02-21T09:34:15.9294522Z // begin inline asm 2026-02-21T09:34:15.9294647Z cp.async.cg.shared.global [ %r382 + 0 ], [ %rd98 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9294748Z // end inline asm 2026-02-21T09:34:15.9294812Z // begin inline asm 2026-02-21T09:34:15.9294934Z cp.async.cg.shared.global [ %r384 + 0 ], [ %rd99 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9294994Z // end inline asm 2026-02-21T09:34:15.9295066Z // begin inline asm 2026-02-21T09:34:15.9295196Z cp.async.cg.shared.global [ %r386 + 0 ], [ %rd100 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9295318Z // end inline asm 2026-02-21T09:34:15.9295387Z // begin inline asm 2026-02-21T09:34:15.9295519Z cp.async.cg.shared.global [ %r388 + 0 ], [ %rd101 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9295580Z // end inline asm 2026-02-21T09:34:15.9295642Z // begin inline asm 2026-02-21T09:34:15.9295775Z cp.async.cg.shared.global [ %r390 + 0 ], [ %rd102 + 0 ], 0x10, %r379; 2026-02-21T09:34:15.9295836Z // end inline asm 2026-02-21T09:34:15.9295898Z // begin inline asm 2026-02-21T09:34:15.9296027Z cp.async.cg.shared.global [ %r392 + 0 ], [ %rd103 
+ 0 ], 0x10, %r379; 2026-02-21T09:34:15.9296088Z // end inline asm 2026-02-21T09:34:15.9296157Z cp.async.commit_group; 2026-02-21T09:34:15.9296341Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9296413Z // begin inline asm 2026-02-21T09:34:15.9296590Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r394], 65536; 2026-02-21T09:34:15.9296654Z // end inline asm 2026-02-21T09:34:15.9296845Z .loc 1 55 44 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:55:44 2026-02-21T09:34:15.9296908Z bar.sync 0; 2026-02-21T09:34:15.9296979Z elect.sync %r403|%p59, -1; 2026-02-21T09:34:15.9297059Z and.pred %p57, %p36, %p59; 2026-02-21T09:34:15.9297126Z shl.b32 %r404, %r78, 1; 2026-02-21T09:34:15.9297193Z add.s32 %r405, %r93, %r404; 2026-02-21T09:34:15.9297258Z add.s32 %r395, %r405, 65536; 2026-02-21T09:34:15.9297331Z or.b32 %r396, %r329, 128; 2026-02-21T09:34:15.9297392Z // begin inline asm 2026-02-21T09:34:15.9297682Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r395], [%rd104, {%r396, %r397}], [%r394]; 2026-02-21T09:34:15.9297755Z // end inline asm 2026-02-21T09:34:15.9297941Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9298009Z shl.b32 %r406, %r57, 17; 2026-02-21T09:34:15.9298081Z or.b32 %r407, %r54, %r406; 2026-02-21T09:34:15.9298160Z mad.wide.s32 %rd427, %r407, 2, %rd18; 2026-02-21T09:34:15.9298231Z mul.wide.u32 %rd110, %r357, 64; 2026-02-21T09:34:15.9298296Z or.b64 %rd22, %rd110, 256; 2026-02-21T09:34:15.9298367Z or.b32 %r409, %r55, %r406; 2026-02-21T09:34:15.9298433Z cvt.u64.u32 %rd23, %r409; 2026-02-21T09:34:15.9298496Z mov.b32 %r1023, 1; 2026-02-21T09:34:15.9298563Z mov.b64 %rd428, 0; 2026-02-21T09:34:15.9298627Z mov.b32 %r1021, %r1019; 2026-02-21T09:34:15.9298691Z mov.b32 %r1022, %r1019; 2026-02-21T09:34:15.9298753Z mov.b32 %r1024, %r1019; 2026-02-21T09:34:15.9298822Z bra.uni $L__BB0_5; 2026-02-21T09:34:15.9298936Z 
$L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:34:15.9299122Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9299203Z setp.lt.u64 %p81, %rd428, 768; 2026-02-21T09:34:15.9299384Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9299450Z // begin inline asm 2026-02-21T09:34:15.9299511Z 2026-02-21T09:34:15.9299566Z { 2026-02-21T09:34:15.9299634Z .reg .pred complete; 2026-02-21T09:34:15.9299697Z waitLoop: 2026-02-21T09:34:15.9299845Z mbarrier.try_wait.parity.shared.b64 complete, [%r1020], %r1019; 2026-02-21T09:34:15.9299917Z @!complete bra.uni waitLoop; 2026-02-21T09:34:15.9299971Z } 2026-02-21T09:34:15.9299976Z 2026-02-21T09:34:15.9300043Z // end inline asm 2026-02-21T09:34:15.9300107Z add.s32 %r495, %r1023, 1; 2026-02-21T09:34:15.9300176Z setp.gt.s32 %p84, %r495, 1; 2026-02-21T09:34:15.9300248Z selp.b32 %r1023, 0, %r495, %p84; 2026-02-21T09:34:15.9300320Z selp.b32 %r496, 1, 0, %p84; 2026-02-21T09:34:15.9300386Z xor.b32 %r91, %r1024, %r496; 2026-02-21T09:34:15.9300455Z add.s64 %rd153, %rd23, %rd428; 2026-02-21T09:34:15.9300642Z .loc 1 54 32 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:32 2026-02-21T09:34:15.9300744Z add.s64 %rd144, %rd427, -196608; 2026-02-21T09:34:15.9300839Z add.s64 %rd145, %rd427, -163840; 2026-02-21T09:34:15.9300912Z add.s64 %rd146, %rd427, -131072; 2026-02-21T09:34:15.9300980Z add.s64 %rd147, %rd427, -98304; 2026-02-21T09:34:15.9301050Z add.s64 %rd148, %rd427, -65536; 2026-02-21T09:34:15.9301114Z add.s64 %rd149, %rd427, -32768; 2026-02-21T09:34:15.9301187Z cvt.u32.u64 %r497, %rd153; 2026-02-21T09:34:15.9301263Z mad.wide.s32 %rd151, %r497, 2, %rd28; 2026-02-21T09:34:15.9301451Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9301525Z shl.b32 %r498, %r1023, 15; 2026-02-21T09:34:15.9301591Z add.s32 %r500, %r93, %r498; 2026-02-21T09:34:15.9301656Z add.s32 %r501, %r500, %r28; 
2026-02-21T09:34:15.9301723Z bar.sync 0; 2026-02-21T09:34:15.9301789Z add.s32 %r474, %r501, 131072; 2026-02-21T09:34:15.9301879Z selp.b32 %r475, 16, 0, %p81; 2026-02-21T09:34:15.9301974Z // begin inline asm 2026-02-21T09:34:15.9302115Z cp.async.cg.shared.global [ %r474 + 0 ], [ %rd144 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9302179Z // end inline asm 2026-02-21T09:34:15.9302246Z add.s32 %r476, %r501, 133120; 2026-02-21T09:34:15.9302317Z // begin inline asm 2026-02-21T09:34:15.9302444Z cp.async.cg.shared.global [ %r476 + 0 ], [ %rd145 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9302505Z // end inline asm 2026-02-21T09:34:15.9302569Z add.s32 %r478, %r501, 135168; 2026-02-21T09:34:15.9302640Z // begin inline asm 2026-02-21T09:34:15.9302768Z cp.async.cg.shared.global [ %r478 + 0 ], [ %rd146 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9302829Z // end inline asm 2026-02-21T09:34:15.9302904Z add.s32 %r480, %r501, 137216; 2026-02-21T09:34:15.9302968Z // begin inline asm 2026-02-21T09:34:15.9303095Z cp.async.cg.shared.global [ %r480 + 0 ], [ %rd147 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9303161Z // end inline asm 2026-02-21T09:34:15.9303235Z add.s32 %r482, %r501, 139264; 2026-02-21T09:34:15.9303300Z // begin inline asm 2026-02-21T09:34:15.9303426Z cp.async.cg.shared.global [ %r482 + 0 ], [ %rd148 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9303496Z // end inline asm 2026-02-21T09:34:15.9303562Z add.s32 %r484, %r501, 141312; 2026-02-21T09:34:15.9303624Z // begin inline asm 2026-02-21T09:34:15.9303757Z cp.async.cg.shared.global [ %r484 + 0 ], [ %rd149 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9303816Z // end inline asm 2026-02-21T09:34:15.9303880Z add.s32 %r486, %r501, 143360; 2026-02-21T09:34:15.9303941Z // begin inline asm 2026-02-21T09:34:15.9304073Z cp.async.cg.shared.global [ %r486 + 0 ], [ %rd427 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9304132Z // end inline asm 2026-02-21T09:34:15.9304197Z add.s32 %r488, %r501, 145408; 2026-02-21T09:34:15.9304265Z // begin inline asm 
2026-02-21T09:34:15.9304390Z cp.async.cg.shared.global [ %r488 + 0 ], [ %rd151 + 0 ], 0x10, %r475; 2026-02-21T09:34:15.9304451Z // end inline asm 2026-02-21T09:34:15.9304522Z cp.async.commit_group; 2026-02-21T09:34:15.9304742Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9304810Z shl.b32 %r502, %r1023, 3; 2026-02-21T09:34:15.9304874Z add.s32 %r503, %r93, %r502; 2026-02-21T09:34:15.9304944Z add.s32 %r494, %r503, 196624; 2026-02-21T09:34:15.9305014Z and.pred %p79, %p4, %p81; 2026-02-21T09:34:15.9305077Z // begin inline asm 2026-02-21T09:34:15.9305214Z @%p79 mbarrier.arrive.expect_tx.shared.b64 _, [%r494], 65536; 2026-02-21T09:34:15.9305279Z // end inline asm 2026-02-21T09:34:15.9305481Z .loc 1 55 44 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:55:44 2026-02-21T09:34:15.9305551Z shl.b32 %r504, %r1023, 16; 2026-02-21T09:34:15.9305624Z bar.sync 0; 2026-02-21T09:34:15.9305697Z elect.sync %r505|%p85, -1; 2026-02-21T09:34:15.9305771Z and.pred %p86, %p81, %p85; 2026-02-21T09:34:15.9305850Z and.pred %p80, %p36, %p86; 2026-02-21T09:34:15.9305933Z add.s32 %r491, %r328, %r504; 2026-02-21T09:34:15.9306042Z add.s64 %rd154, %rd22, %rd428; 2026-02-21T09:34:15.9306157Z cvt.u32.u64 %r492, %rd154; 2026-02-21T09:34:15.9306228Z // begin inline asm 2026-02-21T09:34:15.9306516Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r491], [%rd104, {%r492, %r397}], [%r494]; 2026-02-21T09:34:15.9306578Z // end inline asm 2026-02-21T09:34:15.9306784Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9306855Z add.s64 %rd428, %rd428, 128; 2026-02-21T09:34:15.9306926Z add.s64 %rd427, %rd427, 256; 2026-02-21T09:34:15.9307002Z mov.b32 %r1019, %r1024; 2026-02-21T09:34:15.9307071Z mov.b32 %r1020, %r506; 2026-02-21T09:34:15.9307137Z mov.b32 %r1024, %r91; 2026-02-21T09:34:15.9307206Z @%p81 bra $L__BB0_5; 2026-02-21T09:34:15.9307280Z bra.uni $L__BB0_8; 
2026-02-21T09:34:15.9307437Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:34:15.9307576Z // => This Inner Loop Header: Depth=2 2026-02-21T09:34:15.9307653Z add.s32 %r412, %r1022, 1; 2026-02-21T09:34:15.9307722Z setp.gt.s32 %p61, %r412, 1; 2026-02-21T09:34:15.9307793Z selp.b32 %r1022, 0, %r412, %p61; 2026-02-21T09:34:15.9307867Z selp.b32 %r413, 1, 0, %p61; 2026-02-21T09:34:15.9307935Z xor.b32 %r1021, %r1021, %r413; 2026-02-21T09:34:15.9308124Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9308196Z cp.async.wait_group 0; 2026-02-21T09:34:15.9308263Z bar.sync 0; 2026-02-21T09:34:15.9308449Z .loc 1 49 80 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:49:80 2026-02-21T09:34:15.9308515Z shl.b32 %r414, %r1022, 3; 2026-02-21T09:34:15.9308586Z add.s32 %r416, %r93, %r414; 2026-02-21T09:34:15.9308651Z add.s32 %r410, %r416, 196624; 2026-02-21T09:34:15.9308713Z // begin inline asm 2026-02-21T09:34:15.9308770Z 2026-02-21T09:34:15.9308834Z { 2026-02-21T09:34:15.9308903Z .reg .pred complete; 2026-02-21T09:34:15.9308967Z waitLoop: 2026-02-21T09:34:15.9309113Z mbarrier.try_wait.parity.shared.b64 complete, [%r410], %r1021; 2026-02-21T09:34:15.9309184Z @!complete bra.uni waitLoop; 2026-02-21T09:34:15.9309238Z } 2026-02-21T09:34:15.9309242Z 2026-02-21T09:34:15.9309311Z // end inline asm 2026-02-21T09:34:15.9309373Z shl.b32 %r417, %r1023, 3; 2026-02-21T09:34:15.9309437Z add.s32 %r418, %r93, %r417; 2026-02-21T09:34:15.9309500Z add.s32 %r506, %r418, 196608; 2026-02-21T09:34:15.9309692Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9309758Z @%p38 bra $L__BB0_7; 2026-02-21T09:34:15.9309867Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:34:15.9310050Z .loc 1 55 44 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:55:44 2026-02-21T09:34:15.9310118Z shl.b32 %r435, %r1022, 16; 2026-02-21T09:34:15.9310184Z add.s32 %r437, %r93, %r435; 
2026-02-21T09:34:15.9310375Z .loc 1 54 85 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:54:85 2026-02-21T09:34:15.9310443Z shl.b32 %r438, %r1022, 15; 2026-02-21T09:34:15.9310506Z add.s32 %r439, %r93, %r438; 2026-02-21T09:34:15.9310572Z add.s32 %r440, %r439, 131072; 2026-02-21T09:34:15.9310761Z .loc 1 56 52 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:56:52 2026-02-21T09:34:15.9310832Z elect.sync %r441|%p63, -1; 2026-02-21T09:34:15.9310901Z bfe.u32 %r442, %r440, 4, 14; 2026-02-21T09:34:15.9310974Z cvt.u64.u32 %rd128, %r442; 2026-02-21T09:34:15.9311058Z or.b64 %rd111, %rd128, 4611686293372403712; 2026-02-21T09:34:15.9311124Z bfe.u32 %r443, %r437, 4, 14; 2026-02-21T09:34:15.9311189Z cvt.u64.u32 %rd129, %r443; 2026-02-21T09:34:15.9311278Z or.b64 %rd112, %rd129, 4611686293439512576; 2026-02-21T09:34:15.9311345Z mov.b32 %r420, 138412048; 2026-02-21T09:34:15.9311414Z mov.pred %p62, -1; 2026-02-21T09:34:15.9311526Z // begin inline asm 2026-02-21T09:34:15.9311716Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd111, %rd112, %r420, %p62; 2026-02-21T09:34:15.9311780Z // end inline asm 2026-02-21T09:34:15.9311856Z add.s32 %r444, %r439, 131104; 2026-02-21T09:34:15.9311922Z bfe.u32 %r445, %r444, 4, 14; 2026-02-21T09:34:15.9311987Z cvt.u64.u32 %rd130, %r445; 2026-02-21T09:34:15.9312067Z or.b64 %rd113, %rd130, 4611686293372403712; 2026-02-21T09:34:15.9312142Z add.s32 %r446, %r437, 32; 2026-02-21T09:34:15.9312208Z bfe.u32 %r447, %r446, 4, 14; 2026-02-21T09:34:15.9312273Z cvt.u64.u32 %rd131, %r447; 2026-02-21T09:34:15.9312357Z or.b64 %rd114, %rd131, 4611686293439512576; 2026-02-21T09:34:15.9312420Z // begin inline asm 2026-02-21T09:34:15.9312579Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd113, %rd114, %r420, %p62; 2026-02-21T09:34:15.9312648Z // end inline asm 2026-02-21T09:34:15.9312737Z add.s32 %r448, %r439, 131136; 2026-02-21T09:34:15.9312824Z bfe.u32 %r449, %r448, 4, 14; 2026-02-21T09:34:15.9312893Z cvt.u64.u32 %rd132, 
%r449; 2026-02-21T09:34:15.9312978Z or.b64 %rd115, %rd132, 4611686293372403712; 2026-02-21T09:34:15.9313042Z add.s32 %r450, %r437, 64; 2026-02-21T09:34:15.9313104Z bfe.u32 %r451, %r450, 4, 14; 2026-02-21T09:34:15.9313175Z cvt.u64.u32 %rd133, %r451; 2026-02-21T09:34:15.9313248Z or.b64 %rd116, %rd133, 4611686293439512576; 2026-02-21T09:34:15.9313309Z // begin inline asm 2026-02-21T09:34:15.9313461Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd115, %rd116, %r420, %p62; 2026-02-21T09:34:15.9313531Z // end inline asm 2026-02-21T09:34:15.9313595Z add.s32 %r452, %r439, 131168; 2026-02-21T09:34:15.9313660Z bfe.u32 %r453, %r452, 4, 14; 2026-02-21T09:34:15.9313734Z cvt.u64.u32 %rd134, %r453; 2026-02-21T09:34:15.9313810Z or.b64 %rd117, %rd134, 4611686293372403712; 2026-02-21T09:34:15.9313876Z add.s32 %r454, %r437, 96; 2026-02-21T09:34:15.9313948Z bfe.u32 %r455, %r454, 4, 14; 2026-02-21T09:34:15.9314014Z cvt.u64.u32 %rd135, %r455; 2026-02-21T09:34:15.9314089Z or.b64 %rd118, %rd135, 4611686293439512576; 2026-02-21T09:34:15.9314153Z // begin inline asm 2026-02-21T09:34:15.9314313Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd117, %rd118, %r420, %p62; 2026-02-21T09:34:15.9314375Z // end inline asm 2026-02-21T09:34:15.9314440Z add.s32 %r456, %r439, 147456; 2026-02-21T09:34:15.9314511Z bfe.u32 %r457, %r456, 4, 14; 2026-02-21T09:34:15.9314575Z cvt.u64.u32 %rd136, %r457; 2026-02-21T09:34:15.9314649Z or.b64 %rd119, %rd136, 4611686293372403712; 2026-02-21T09:34:15.9314791Z add.s32 %r458, %r437, 32768; 2026-02-21T09:34:15.9314863Z bfe.u32 %r459, %r458, 4, 14; 2026-02-21T09:34:15.9314928Z cvt.u64.u32 %rd137, %r459; 2026-02-21T09:34:15.9315003Z or.b64 %rd120, %rd137, 4611686293439512576; 2026-02-21T09:34:15.9315075Z // begin inline asm 2026-02-21T09:34:15.9315226Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd119, %rd120, %r420, %p62; 2026-02-21T09:34:15.9315290Z // end inline asm 2026-02-21T09:34:15.9315363Z add.s32 %r460, %r439, 147488; 
2026-02-21T09:34:15.9315427Z bfe.u32 %r461, %r460, 4, 14; 2026-02-21T09:34:15.9315491Z cvt.u64.u32 %rd138, %r461; 2026-02-21T09:34:15.9315564Z or.b64 %rd121, %rd138, 4611686293372403712; 2026-02-21T09:34:15.9315636Z add.s32 %r462, %r437, 32800; 2026-02-21T09:34:15.9315699Z bfe.u32 %r463, %r462, 4, 14; 2026-02-21T09:34:15.9315764Z cvt.u64.u32 %rd139, %r463; 2026-02-21T09:34:15.9315842Z or.b64 %rd122, %rd139, 4611686293439512576; 2026-02-21T09:34:15.9315905Z // begin inline asm 2026-02-21T09:34:15.9316054Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd121, %rd122, %r420, %p62; 2026-02-21T09:34:15.9316122Z // end inline asm 2026-02-21T09:34:15.9316187Z add.s32 %r464, %r439, 147520; 2026-02-21T09:34:15.9316251Z bfe.u32 %r465, %r464, 4, 14; 2026-02-21T09:34:15.9316316Z cvt.u64.u32 %rd140, %r465; 2026-02-21T09:34:15.9316400Z or.b64 %rd123, %rd140, 4611686293372403712; 2026-02-21T09:34:15.9316469Z add.s32 %r466, %r437, 32832; 2026-02-21T09:34:15.9316565Z bfe.u32 %r467, %r466, 4, 14; 2026-02-21T09:34:15.9316665Z cvt.u64.u32 %rd141, %r467; 2026-02-21T09:34:15.9316738Z or.b64 %rd124, %rd141, 4611686293439512576; 2026-02-21T09:34:15.9316801Z // begin inline asm 2026-02-21T09:34:15.9316950Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd123, %rd124, %r420, %p62; 2026-02-21T09:34:15.9317019Z // end inline asm 2026-02-21T09:34:15.9317082Z add.s32 %r468, %r439, 147552; 2026-02-21T09:34:15.9317144Z bfe.u32 %r469, %r468, 4, 14; 2026-02-21T09:34:15.9317216Z cvt.u64.u32 %rd142, %r469; 2026-02-21T09:34:15.9317290Z or.b64 %rd125, %rd142, 4611686293372403712; 2026-02-21T09:34:15.9317353Z add.s32 %r470, %r437, 32864; 2026-02-21T09:34:15.9317417Z bfe.u32 %r471, %r470, 4, 14; 2026-02-21T09:34:15.9317490Z cvt.u64.u32 %rd143, %r471; 2026-02-21T09:34:15.9317562Z or.b64 %rd126, %rd143, 4611686293439512576; 2026-02-21T09:34:15.9317654Z // begin inline asm 2026-02-21T09:34:15.9317843Z @%p63 tcgen05.mma.cta_group::1.kind::f16 [ %r1017 + 0 ], %rd125, %rd126, %r420, %p62; 
2026-02-21T09:34:15.9317908Z // end inline asm 2026-02-21T09:34:15.9317972Z cvt.u64.u32 %rd127, %r506; 2026-02-21T09:34:15.9318040Z // begin inline asm 2026-02-21T09:34:15.9318181Z @%p63 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd127]; 2026-02-21T09:34:15.9318242Z // end inline asm 2026-02-21T09:34:15.9318305Z bra.uni $L__BB0_7; 2026-02-21T09:34:15.9318405Z $L__BB0_9: // %._crit_edge 2026-02-21T09:34:15.9318586Z .loc 1 30 4 // ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py:30:4 2026-02-21T09:34:15.9318646Z bar.sync 0; 2026-02-21T09:34:15.9318717Z // begin inline asm 2026-02-21T09:34:15.9318850Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1017, 256; 2026-02-21T09:34:15.9318912Z // end inline asm 2026-02-21T09:34:15.9318977Z ret; 2026-02-21T09:34:15.9319040Z $L__tmp0: 2026-02-21T09:34:15.9319104Z $L__func_end0: 2026-02-21T09:34:15.9319198Z // -- End function 2026-02-21T09:34:15.9319264Z } 2026-02-21T09:34:15.9319493Z .file 1 "/tmp/torchinductor_root/k6/ck6bq2k7u63dqq7s2u7moi52ewt24bfjsxycsmw25v5q54l42zmb.py" 2026-02-21T09:34:15.9319561Z .section .debug_abbrev 2026-02-21T09:34:15.9319625Z { 2026-02-21T09:34:15.9319724Z .b8 1 // Abbreviation Code 2026-02-21T09:34:15.9319824Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:34:15.9319913Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:34:15.9320012Z .b8 37 // DW_AT_producer 2026-02-21T09:34:15.9320097Z .b8 8 // DW_FORM_string 2026-02-21T09:34:15.9320180Z .b8 19 // DW_AT_language 2026-02-21T09:34:15.9320279Z .b8 5 // DW_FORM_data2 2026-02-21T09:34:15.9320367Z .b8 3 // DW_AT_name 2026-02-21T09:34:15.9320452Z .b8 8 // DW_FORM_string 2026-02-21T09:34:15.9320550Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:34:15.9320635Z .b8 6 // DW_FORM_data4 2026-02-21T09:34:15.9320718Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:34:15.9320807Z .b8 8 // DW_FORM_string 2026-02-21T09:34:15.9320886Z .b8 0 // EOM(1) 2026-02-21T09:34:15.9320962Z .b8 0 // EOM(2) 2026-02-21T09:34:15.9321034Z .b8 0 // EOM(3) 2026-02-21T09:34:15.9321097Z 
} 2026-02-21T09:34:15.9321162Z .section .debug_info 2026-02-21T09:34:15.9321217Z { 2026-02-21T09:34:15.9321313Z .b32 104 // Length of Unit 2026-02-21T09:34:15.9321408Z .b8 2 // DWARF version number 2026-02-21T09:34:15.9321467Z .b8 0 2026-02-21T09:34:15.9321595Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:34:15.9321752Z .b8 8 // Address Size (in bytes) 2026-02-21T09:34:15.9321863Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:34:15.9321952Z .b8 116 // DW_AT_producer 2026-02-21T09:34:15.9322020Z .b8 114 2026-02-21T09:34:15.9322078Z .b8 105 2026-02-21T09:34:15.9322134Z .b8 116 2026-02-21T09:34:15.9322197Z .b8 111 2026-02-21T09:34:15.9322254Z .b8 110 2026-02-21T09:34:15.9322308Z .b8 0 2026-02-21T09:34:15.9322391Z .b8 2 // DW_AT_language 2026-02-21T09:34:15.9322457Z .b8 0 2026-02-21T09:34:15.9322541Z .b8 99 // DW_AT_name 2026-02-21T09:34:15.9322597Z .b8 107 2026-02-21T09:34:15.9322661Z .b8 54 2026-02-21T09:34:15.9322717Z .b8 98 2026-02-21T09:34:15.9322774Z .b8 113 2026-02-21T09:34:15.9322852Z .b8 50 2026-02-21T09:34:15.9322918Z .b8 107 2026-02-21T09:34:15.9322998Z .b8 55 2026-02-21T09:34:15.9323057Z .b8 117 2026-02-21T09:34:15.9323114Z .b8 54 2026-02-21T09:34:15.9323177Z .b8 51 2026-02-21T09:34:15.9323233Z .b8 100 2026-02-21T09:34:15.9323288Z .b8 113 2026-02-21T09:34:15.9323350Z .b8 113 2026-02-21T09:34:15.9323405Z .b8 55 2026-02-21T09:34:15.9323459Z .b8 115 2026-02-21T09:34:15.9323513Z .b8 50 2026-02-21T09:34:15.9323576Z .b8 117 2026-02-21T09:34:15.9323631Z .b8 55 2026-02-21T09:34:15.9323686Z .b8 109 2026-02-21T09:34:15.9323748Z .b8 111 2026-02-21T09:34:15.9323805Z .b8 105 2026-02-21T09:34:15.9323859Z .b8 53 2026-02-21T09:34:15.9323912Z .b8 50 2026-02-21T09:34:15.9323974Z .b8 101 2026-02-21T09:34:15.9324029Z .b8 119 2026-02-21T09:34:15.9324084Z .b8 116 2026-02-21T09:34:15.9324146Z .b8 50 2026-02-21T09:34:15.9324201Z .b8 52 2026-02-21T09:34:15.9324254Z .b8 98 2026-02-21T09:34:15.9324309Z .b8 102 2026-02-21T09:34:15.9324371Z .b8 106 
2026-02-21T09:34:15.9324428Z .b8 115 2026-02-21T09:34:15.9324485Z .b8 120 2026-02-21T09:34:15.9324540Z .b8 121 2026-02-21T09:34:15.9324605Z .b8 99 2026-02-21T09:34:15.9324662Z .b8 115 2026-02-21T09:34:15.9324751Z .b8 109 2026-02-21T09:34:15.9324814Z .b8 119 2026-02-21T09:34:15.9324869Z .b8 50 2026-02-21T09:34:15.9324923Z .b8 53 2026-02-21T09:34:15.9324978Z .b8 118 2026-02-21T09:34:15.9325038Z .b8 53 2026-02-21T09:34:15.9325093Z .b8 113 2026-02-21T09:34:15.9325147Z .b8 53 2026-02-21T09:34:15.9325207Z .b8 52 2026-02-21T09:34:15.9325262Z .b8 108 2026-02-21T09:34:15.9325316Z .b8 52 2026-02-21T09:34:15.9325371Z .b8 50 2026-02-21T09:34:15.9325434Z .b8 122 2026-02-21T09:34:15.9325488Z .b8 109 2026-02-21T09:34:15.9325543Z .b8 98 2026-02-21T09:34:15.9325597Z .b8 46 2026-02-21T09:34:15.9325659Z .b8 112 2026-02-21T09:34:15.9325715Z .b8 121 2026-02-21T09:34:15.9325769Z .b8 0 2026-02-21T09:34:15.9325881Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:34:15.9325967Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:34:15.9326025Z .b8 116 2026-02-21T09:34:15.9326081Z .b8 109 2026-02-21T09:34:15.9326147Z .b8 112 2026-02-21T09:34:15.9326205Z .b8 47 2026-02-21T09:34:15.9326262Z .b8 116 2026-02-21T09:34:15.9326324Z .b8 111 2026-02-21T09:34:15.9326379Z .b8 114 2026-02-21T09:34:15.9326434Z .b8 99 2026-02-21T09:34:15.9326490Z .b8 104 2026-02-21T09:34:15.9326557Z .b8 105 2026-02-21T09:34:15.9326613Z .b8 110 2026-02-21T09:34:15.9326670Z .b8 100 2026-02-21T09:34:15.9326734Z .b8 117 2026-02-21T09:34:15.9326792Z .b8 99 2026-02-21T09:34:15.9326848Z .b8 116 2026-02-21T09:34:15.9326902Z .b8 111 2026-02-21T09:34:15.9326965Z .b8 114 2026-02-21T09:34:15.9327020Z .b8 95 2026-02-21T09:34:15.9327074Z .b8 114 2026-02-21T09:34:15.9327137Z .b8 111 2026-02-21T09:34:15.9327192Z .b8 111 2026-02-21T09:34:15.9327247Z .b8 116 2026-02-21T09:34:15.9327301Z .b8 47 2026-02-21T09:34:15.9327365Z .b8 107 2026-02-21T09:34:15.9327419Z .b8 54 2026-02-21T09:34:15.9327473Z .b8 0 2026-02-21T09:34:15.9327528Z } 
2026-02-21T09:34:15.9327607Z .section .debug_macinfo { } 2026-02-21T09:34:15.9327614Z 2026-02-21T09:34:15.9327704Z ================================================================ 2026-02-21T09:34:15.9327884Z please share the reproducer above with Triton project. 2026-02-21T09:34:16.5651891Z 2026-02-21T09:34:16.5657962Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 19/19 15.0 configs/s 2026-02-21T09:34:16.7286397Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5587.1 2026-02-21T09:34:16.7289102Z configs/s 2026-02-21T09:34:16.7607734Z [208s] Generation 9 complete: 2026-02-21T09:34:16.7612260Z error=3 2026-02-21T09:34:16.7616975Z ok=17 2026-02-21T09:34:16.7618774Z min=0.0429 2026-02-21T09:34:16.7618984Z mid=0.1025 2026-02-21T09:34:16.7619129Z max=10.9199 2026-02-21T09:34:16.7619302Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:34:16.7619581Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:34:16.7619844Z 'l2_groupings': [64], 2026-02-21T09:34:16.7620340Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:34:16.7620679Z 'loop_orders': [[0, 1]], 2026-02-21T09:34:16.7620878Z 'num_stages': 3, 2026-02-21T09:34:16.7621041Z 'num_warps': 4, 2026-02-21T09:34:16.7621216Z 'pid_type': 'flat', 2026-02-21T09:34:16.7621442Z 'range_flattens': [None, True], 2026-02-21T09:34:16.7621669Z 'range_multi_buffers': [None, True], 2026-02-21T09:34:16.7621907Z 'range_num_stages': [0, 0], 2026-02-21T09:34:16.7643627Z 'range_unroll_factors': [0, 0], 2026-02-21T09:34:16.7643927Z 'range_warp_specializes': [None, None]} 2026-02-21T09:34:16.7644218Z [208s] Fitting surrogate: 746 points, 746 targets 2026-02-21T09:34:17.3517647Z [209s] Generation 10 starting: 20 neighbors, 1 active search path(s) 2026-02-21T09:34:47.9223789Z [239s] Timeout after 30s compiling Config(block_sizes=[128, 256, 512], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=64, 
num_sm_multiplier=1, num_stages=2, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:34:47.9249938Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21/21 0.3 configs/s 2026-02-21T09:34:49.2071895Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 21/21 12.7 configs/s 2026-02-21T09:34:49.3620541Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5886.9 2026-02-21T09:34:49.3620902Z configs/s 2026-02-21T09:34:49.3926161Z [241s] Generation 10 complete: 2026-02-21T09:34:49.3930898Z error=9 2026-02-21T09:34:49.3932682Z timeout=1 2026-02-21T09:34:49.3932908Z ok=12 2026-02-21T09:34:49.3938676Z min=0.0429 2026-02-21T09:34:49.3942968Z mid=0.1024 2026-02-21T09:34:49.3947536Z max=3.6332 2026-02-21T09:34:49.3949437Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:34:49.3955250Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:34:49.3960791Z 'l2_groupings': [64], 2026-02-21T09:34:49.3966465Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:34:49.3968294Z 'loop_orders': [[0, 1]], 2026-02-21T09:34:49.3972880Z 'num_stages': 3, 2026-02-21T09:34:49.3977699Z 'num_warps': 4, 2026-02-21T09:34:49.3980060Z 'pid_type': 'flat', 2026-02-21T09:34:49.3980283Z 'range_flattens': [None, True], 2026-02-21T09:34:49.3980481Z 'range_multi_buffers': [None, True], 2026-02-21T09:34:49.3980689Z 'range_num_stages': [0, 0], 2026-02-21T09:34:49.3980862Z 'range_unroll_factors': [0, 0], 2026-02-21T09:34:49.3981059Z 'range_warp_specializes': [None, None]} 2026-02-21T09:34:49.3981385Z [241s] Fitting surrogate: 768 points, 768 targets 2026-02-21T09:34:49.9235648Z [241s] Generation 11 starting: 20 neighbors, 1 active search path(s) 2026-02-21T09:34:58.9784441Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21/21 1.3 configs/s 2026-02-21T09:34:59.8047098Z Generation 
11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 21/21 23.5 configs/s 2026-02-21T09:34:59.9631226Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5757.9 2026-02-21T09:34:59.9631961Z configs/s 2026-02-21T09:34:59.9941010Z [251s] Generation 11 complete: 2026-02-21T09:34:59.9946152Z error=8 2026-02-21T09:34:59.9947589Z ok=14 2026-02-21T09:34:59.9947759Z min=0.0429 2026-02-21T09:34:59.9947905Z mid=0.0819 2026-02-21T09:34:59.9948036Z max=2.2701 2026-02-21T09:34:59.9948190Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:34:59.9948446Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:34:59.9948688Z 'l2_groupings': [64], 2026-02-21T09:34:59.9948867Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:34:59.9949068Z 'loop_orders': [[0, 1]], 2026-02-21T09:34:59.9949229Z 'num_stages': 3, 2026-02-21T09:34:59.9949386Z 'num_warps': 4, 2026-02-21T09:34:59.9949822Z 'pid_type': 'flat', 2026-02-21T09:34:59.9950053Z 'range_flattens': [None, True], 2026-02-21T09:34:59.9950247Z 'range_multi_buffers': [None, True], 2026-02-21T09:34:59.9950430Z 'range_num_stages': [0, 0], 2026-02-21T09:34:59.9950598Z 'range_unroll_factors': [0, 0], 2026-02-21T09:34:59.9950768Z 'range_warp_specializes': [None, None]} 2026-02-21T09:34:59.9975490Z [251s] Fitting surrogate: 790 points, 790 targets 2026-02-21T09:35:00.4968455Z [252s] Generation 12 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:35:31.7124908Z [283s] Timeout after 30s compiling Config(block_sizes=[512, 256, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:35:31.7141035Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.3 configs/s 
2026-02-21T09:35:32.7880396Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 17.5 configs/s 2026-02-21T09:35:32.9449428Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5800.4 2026-02-21T09:35:32.9450797Z configs/s 2026-02-21T09:35:32.9754330Z [284s] Generation 12 complete: 2026-02-21T09:35:32.9756412Z error=2 2026-02-21T09:35:32.9756621Z timeout=1 2026-02-21T09:35:32.9759387Z ok=16 2026-02-21T09:35:32.9759553Z min=0.0429 2026-02-21T09:35:32.9759696Z mid=1.5473 2026-02-21T09:35:32.9759821Z max=6.9151 2026-02-21T09:35:32.9759974Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:35:32.9760224Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:35:32.9760441Z 'l2_groupings': [64], 2026-02-21T09:35:32.9760614Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:35:32.9760802Z 'loop_orders': [[0, 1]], 2026-02-21T09:35:32.9760990Z 'num_stages': 3, 2026-02-21T09:35:32.9761139Z 'num_warps': 4, 2026-02-21T09:35:32.9761612Z 'pid_type': 'flat', 2026-02-21T09:35:32.9761842Z 'range_flattens': [None, True], 2026-02-21T09:35:32.9762029Z 'range_multi_buffers': [None, True], 2026-02-21T09:35:32.9762219Z 'range_num_stages': [0, 0], 2026-02-21T09:35:32.9762385Z 'range_unroll_factors': [0, 0], 2026-02-21T09:35:32.9762570Z 'range_warp_specializes': [None, None]} 2026-02-21T09:35:32.9780168Z [284s] Fitting surrogate: 809 points, 809 targets 2026-02-21T09:35:33.4592492Z [285s] Generation 13 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:35:39.9172021Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 1.2 configs/s 2026-02-21T09:35:40.7868952Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 20.9 configs/s 2026-02-21T09:35:41.0389430Z Generation 13: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3737.0 2026-02-21T09:35:41.0393599Z configs/s 2026-02-21T09:35:41.0740863Z [292s] Generation 13 complete: 2026-02-21T09:35:41.0745976Z error=3 2026-02-21T09:35:41.0750613Z ok=15 
2026-02-21T09:35:41.0754476Z min=0.0429 2026-02-21T09:35:41.0755831Z mid=0.1004 2026-02-21T09:35:41.0756001Z max=2.6849 2026-02-21T09:35:41.0756142Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:35:41.0756390Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:35:41.0756615Z 'l2_groupings': [64], 2026-02-21T09:35:41.0756780Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:35:41.0756973Z 'loop_orders': [[0, 1]], 2026-02-21T09:35:41.0757122Z 'num_stages': 3, 2026-02-21T09:35:41.0757267Z 'num_warps': 4, 2026-02-21T09:35:41.0757405Z 'pid_type': 'flat', 2026-02-21T09:35:41.0757566Z 'range_flattens': [None, True], 2026-02-21T09:35:41.0757739Z 'range_multi_buffers': [None, True], 2026-02-21T09:35:41.0757923Z 'range_num_stages': [0, 0], 2026-02-21T09:35:41.0758084Z 'range_unroll_factors': [0, 0], 2026-02-21T09:35:41.0758627Z 'range_warp_specializes': [None, None]} 2026-02-21T09:35:41.0771255Z [292s] Fitting surrogate: 827 points, 827 targets 2026-02-21T09:35:41.5182673Z [293s] Generation 14 starting: 14 neighbors, 1 active search path(s) 2026-02-21T09:35:52.6549680Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 0.6 configs/s 2026-02-21T09:35:53.4439004Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 18.9 configs/s 2026-02-21T09:35:53.7000334Z Generation 14: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3673.6 2026-02-21T09:35:53.7000671Z configs/s 2026-02-21T09:35:53.7368612Z [305s] Generation 14 complete: 2026-02-21T09:35:53.7373217Z error=3 2026-02-21T09:35:53.7377149Z ok=13 2026-02-21T09:35:53.7381121Z min=0.0429 2026-02-21T09:35:53.7383234Z mid=0.0901 2026-02-21T09:35:53.7383454Z max=6.8618 2026-02-21T09:35:53.7386643Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:35:53.7386945Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:35:53.7387180Z 'l2_groupings': [64], 2026-02-21T09:35:53.7387358Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:35:53.7387547Z 
'loop_orders': [[0, 1]], 2026-02-21T09:35:53.7387706Z 'num_stages': 3, 2026-02-21T09:35:53.7387853Z 'num_warps': 4, 2026-02-21T09:35:53.7387987Z 'pid_type': 'flat', 2026-02-21T09:35:53.7388144Z 'range_flattens': [None, True], 2026-02-21T09:35:53.7388315Z 'range_multi_buffers': [None, True], 2026-02-21T09:35:53.7388498Z 'range_num_stages': [0, 0], 2026-02-21T09:35:53.7388659Z 'range_unroll_factors': [0, 0], 2026-02-21T09:35:53.7388838Z 'range_warp_specializes': [None, None]} 2026-02-21T09:35:53.7400930Z [305s] Fitting surrogate: 843 points, 843 targets 2026-02-21T09:35:54.1150731Z [305s] Generation 15 starting: 15 neighbors, 1 active search path(s) 2026-02-21T09:36:16.1066306Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 0.3 configs/s 2026-02-21T09:36:17.0134610Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 17.4 configs/s 2026-02-21T09:36:17.1698783Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5877.8 2026-02-21T09:36:17.1699890Z configs/s 2026-02-21T09:36:17.2016348Z [328s] Generation 15 complete: 2026-02-21T09:36:17.2020708Z error=3 2026-02-21T09:36:17.2026167Z ok=14 2026-02-21T09:36:17.2028336Z min=0.0429 2026-02-21T09:36:17.2028524Z mid=0.1557 2026-02-21T09:36:17.2028664Z max=11.2901 2026-02-21T09:36:17.2035777Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:36:17.2036086Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:36:17.2036349Z 'l2_groupings': [64], 2026-02-21T09:36:17.2036555Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:36:17.2036772Z 'loop_orders': [[0, 1]], 2026-02-21T09:36:17.2036960Z 'num_stages': 3, 2026-02-21T09:36:17.2037119Z 'num_warps': 4, 2026-02-21T09:36:17.2037289Z 'pid_type': 'flat', 2026-02-21T09:36:17.2037815Z 'range_flattens': [None, True], 2026-02-21T09:36:17.2038107Z 'range_multi_buffers': [None, True], 2026-02-21T09:36:17.2038352Z 'range_num_stages': [0, 0], 2026-02-21T09:36:17.2038549Z 'range_unroll_factors': [0, 0], 
2026-02-21T09:36:17.2038765Z 'range_warp_specializes': [None, None]} 2026-02-21T09:36:17.2052843Z [328s] Fitting surrogate: 860 points, 860 targets 2026-02-21T09:36:17.7644538Z [329s] Generation 16 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:36:49.3166705Z [361s] Timeout after 30s compiling Config(block_sizes=[256, 512, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]) 2026-02-21T09:36:49.3187515Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.2 configs/s 2026-02-21T09:36:50.3533793Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 18.3 configs/s 2026-02-21T09:36:50.6056800Z Generation 16: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 3760.0 2026-02-21T09:36:50.6057712Z configs/s 2026-02-21T09:36:50.6450382Z [362s] Generation 16 complete: 2026-02-21T09:36:50.6454638Z error=3 2026-02-21T09:36:50.6459824Z timeout=1 2026-02-21T09:36:50.6461298Z ok=15 2026-02-21T09:36:50.6461510Z min=0.0428 2026-02-21T09:36:50.6461674Z mid=0.1269 2026-02-21T09:36:50.6461840Z max=11.3715 2026-02-21T09:36:50.6462021Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:36:50.6462322Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:36:50.6462587Z 'l2_groupings': [64], 2026-02-21T09:36:50.6462795Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:36:50.6463017Z 'loop_orders': [[0, 1]], 2026-02-21T09:36:50.6463218Z 'num_stages': 3, 2026-02-21T09:36:50.6463420Z 'num_warps': 4, 2026-02-21T09:36:50.6463593Z 'pid_type': 'flat', 2026-02-21T09:36:50.6463798Z 'range_flattens': [None, True], 2026-02-21T09:36:50.6464017Z 'range_multi_buffers': [None, True], 2026-02-21T09:36:50.6464245Z 'range_num_stages': [0, 0], 
2026-02-21T09:36:50.6464447Z 'range_unroll_factors': [0, 0], 2026-02-21T09:36:50.6464738Z 'range_warp_specializes': [None, None]} 2026-02-21T09:36:50.6483148Z [362s] Fitting surrogate: 879 points, 879 targets 2026-02-21T09:36:51.1792239Z [362s] Generation 17 starting: 14 neighbors, 1 active search path(s) 2026-02-21T09:37:15.6808041Z Generation 17: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 0.4 configs/s 2026-02-21T09:37:16.4939586Z Generation 17: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 19.7 configs/s 2026-02-21T09:37:16.6514489Z Generation 17: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5830.5 2026-02-21T09:37:16.6515151Z configs/s 2026-02-21T09:37:16.6819087Z [388s] Generation 17 complete: 2026-02-21T09:37:16.6820038Z error=4 2026-02-21T09:37:16.6820584Z ok=12 2026-02-21T09:37:16.6820801Z min=0.0429 2026-02-21T09:37:16.6820990Z mid=0.1474 2026-02-21T09:37:16.6823599Z max=11.0802 2026-02-21T09:37:16.6823977Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:37:16.6824550Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:37:16.6825033Z 'l2_groupings': [64], 2026-02-21T09:37:16.6825314Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:37:16.6825658Z 'loop_orders': [[0, 1]], 2026-02-21T09:37:16.6825935Z 'num_stages': 3, 2026-02-21T09:37:16.6826180Z 'num_warps': 4, 2026-02-21T09:37:16.6826408Z 'pid_type': 'flat', 2026-02-21T09:37:16.6826682Z 'range_flattens': [None, True], 2026-02-21T09:37:16.6826993Z 'range_multi_buffers': [None, True], 2026-02-21T09:37:16.6827326Z 'range_num_stages': [0, 0], 2026-02-21T09:37:16.6827612Z 'range_unroll_factors': [0, 0], 2026-02-21T09:37:16.6827929Z 'range_warp_specializes': [None, None]} 2026-02-21T09:37:16.6854268Z [388s] Fitting surrogate: 895 points, 895 targets 2026-02-21T09:37:17.1606649Z [388s] Generation 18 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:37:24.3317949Z Generation 18: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 1.4 configs/s 
2026-02-21T09:37:25.2200244Z Generation 18: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 20.4 configs/s 2026-02-21T09:37:25.3789239Z Generation 18: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5694.7 2026-02-21T09:37:25.3789645Z configs/s 2026-02-21T09:37:25.4086773Z [397s] Generation 18 complete: 2026-02-21T09:37:25.4091992Z error=2 2026-02-21T09:37:25.4094084Z ok=16 2026-02-21T09:37:25.4094305Z min=0.0429 2026-02-21T09:37:25.4094478Z mid=0.1125 2026-02-21T09:37:25.4094648Z max=1.5792 2026-02-21T09:37:25.4094908Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:37:25.4095222Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:37:25.4095485Z 'l2_groupings': [64], 2026-02-21T09:37:25.4095676Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:37:25.4095889Z 'loop_orders': [[0, 1]], 2026-02-21T09:37:25.4096043Z 'num_stages': 3, 2026-02-21T09:37:25.4096193Z 'num_warps': 4, 2026-02-21T09:37:25.4096330Z 'pid_type': 'flat', 2026-02-21T09:37:25.4096492Z 'range_flattens': [None, True], 2026-02-21T09:37:25.4096667Z 'range_multi_buffers': [None, True], 2026-02-21T09:37:25.4096854Z 'range_num_stages': [0, 0], 2026-02-21T09:37:25.4097015Z 'range_unroll_factors': [0, 0], 2026-02-21T09:37:25.4097196Z 'range_warp_specializes': [None, None]} 2026-02-21T09:37:25.4118589Z [397s] Fitting surrogate: 913 points, 913 targets 2026-02-21T09:37:25.8650061Z [397s] Generation 19 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:37:52.0065594Z Generation 19: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.3 configs/s 2026-02-21T09:37:53.1099028Z Generation 19: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 17.0 configs/s 2026-02-21T09:37:53.2682621Z Generation 19: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 5767.8 2026-02-21T09:37:53.2686682Z configs/s 2026-02-21T09:37:53.2986233Z [425s] Generation 19 complete: 2026-02-21T09:37:53.2991352Z error=5 2026-02-21T09:37:53.2992871Z ok=14 2026-02-21T09:37:53.2993035Z min=0.0429 
2026-02-21T09:37:53.2993184Z mid=0.0901 2026-02-21T09:37:53.2993313Z max=13.2260 2026-02-21T09:37:53.2993473Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:37:53.2993717Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:37:53.2993946Z 'l2_groupings': [64], 2026-02-21T09:37:53.2994110Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:37:53.2994300Z 'loop_orders': [[0, 1]], 2026-02-21T09:37:53.2994463Z 'num_stages': 3, 2026-02-21T09:37:53.2994601Z 'num_warps': 4, 2026-02-21T09:37:53.2995002Z 'pid_type': 'flat', 2026-02-21T09:37:53.2995162Z 'range_flattens': [None, True], 2026-02-21T09:37:53.2995380Z 'range_multi_buffers': [None, True], 2026-02-21T09:37:53.2995911Z 'range_num_stages': [0, 0], 2026-02-21T09:37:53.2996158Z 'range_unroll_factors': [0, 0], 2026-02-21T09:37:53.2996337Z 'range_warp_specializes': [None, None]} 2026-02-21T09:37:53.3019827Z [425s] Fitting surrogate: 932 points, 932 targets 2026-02-21T09:37:53.8178988Z [425s] Generation 20 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:38:15.3367214Z Generation 20: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.2 configs/s 2026-02-21T09:38:16.3030052Z Generation 20: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 19.7 configs/s 2026-02-21T09:38:16.7559504Z Generation 20: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 2142.3 2026-02-21T09:38:16.7563061Z configs/s 2026-02-21T09:38:16.8040700Z [448s] Generation 20 complete: 2026-02-21T09:38:16.8041143Z error=2 2026-02-21T09:38:16.8041421Z ok=17 2026-02-21T09:38:16.8042038Z min=0.0411 2026-02-21T09:38:16.8042346Z mid=0.0963 2026-02-21T09:38:16.8042706Z max=3.0628 2026-02-21T09:38:16.8043042Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:38:16.8043584Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:38:16.8044132Z 'l2_groupings': [64], 2026-02-21T09:38:16.8044503Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:38:16.8045350Z 'loop_orders': [[0, 1]], 
2026-02-21T09:38:16.8045718Z 'num_stages': 3, 2026-02-21T09:38:16.8046051Z 'num_warps': 4, 2026-02-21T09:38:16.8046377Z 'pid_type': 'flat', 2026-02-21T09:38:16.8046747Z 'range_flattens': [None, True], 2026-02-21T09:38:16.8047185Z 'range_multi_buffers': [None, True], 2026-02-21T09:38:16.8047628Z 'range_num_stages': [0, 0], 2026-02-21T09:38:16.8048032Z 'range_unroll_factors': [0, 0], 2026-02-21T09:38:16.8048467Z 'range_warp_specializes': [None, None]} 2026-02-21T09:38:16.8078622Z [448s] Fitting surrogate: 951 points, 951 targets 2026-02-21T09:38:17.1073712Z [448s] Autotuning complete in 448.9s after searching 912 configs. 2026-02-21T09:38:17.1074087Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:38:17.1075362Z @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_stages=3, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:38:17.1076312Z 2026-02-21T09:38:17.1076559Z [448s] Code of selected kernel: /tmp/torchinductor_root/wj/cwjfpgd6457urlosqdck2hjrrxhvukcwtj5gi3rrg4vnaidrdclj.py 2026-02-21T09:38:17.1188841Z from __future__ import annotations 2026-02-21T09:38:17.1189093Z 2026-02-21T09:38:17.1189211Z import torch 2026-02-21T09:38:17.1189380Z import helion 2026-02-21T09:38:17.1189542Z import triton 2026-02-21T09:38:17.1189746Z import triton.language as tl 2026-02-21T09:38:17.1190002Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:38:17.1190219Z 2026-02-21T09:38:17.1190295Z _BLOCK_SIZE_0 = tl.constexpr(128) 2026-02-21T09:38:17.1190476Z _BLOCK_SIZE_1 = tl.constexpr(128) 2026-02-21T09:38:17.1190644Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:38:17.1190822Z # src[matmul.py:42]: def matmul( 
2026-02-21T09:38:17.1190989Z # src[matmul.py:43]: x: Tensor, 2026-02-21T09:38:17.1191168Z # src[matmul.py:44]: y: Tensor, 2026-02-21T09:38:17.1191332Z # src[matmul.py:42-68]: ... 2026-02-21T09:38:17.1191513Z helion.runtime.set_triton_allocator() 2026-02-21T09:38:17.1191637Z 2026-02-21T09:38:17.1191698Z @triton.jit 2026-02-21T09:38:17.1191840Z def _helion_matmul(x, y, out): 2026-02-21T09:38:17.1192094Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:38:17.1192462Z x_desc = tl.make_tensor_descriptor(x, [12288, 1024], [1024, 1], [_BLOCK_SIZE_0, _BLOCK_SIZE_2]) 2026-02-21T09:38:17.1192799Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:38:17.1193282Z num_pid_m = tl.cdiv(12288, _BLOCK_SIZE_0) 2026-02-21T09:38:17.1193554Z num_pid_n = tl.cdiv(1024, _BLOCK_SIZE_1) 2026-02-21T09:38:17.1193743Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:38:17.1193919Z num_pid_in_group = 64 * num_pid_n 2026-02-21T09:38:17.1194117Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:38:17.1194306Z first_pid_m = group_id * 64 2026-02-21T09:38:17.1194499Z group_size_m = min(num_pid_m - first_pid_m, 64) 2026-02-21T09:38:17.1194833Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:38:17.1195114Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:38:17.1195326Z offset_0 = pid_0 * _BLOCK_SIZE_0 2026-02-21T09:38:17.1195563Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:38:17.1195799Z offset_1 = pid_1 * _BLOCK_SIZE_1 2026-02-21T09:38:17.1196146Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:38:17.1196446Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:38:17.1196740Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:38:17.1196990Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:38:17.1197257Z # src[matmul.py:66]: acc = 
torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:38:17.1197634Z for offset_2 in tl.range(0, 1024, _BLOCK_SIZE_2, disallow_acc_multi_buffer=False, flatten=True): 2026-02-21T09:38:17.1197979Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:38:17.1198199Z acc_copy = acc 2026-02-21T09:38:17.1198356Z acc_copy_0 = acc_copy 2026-02-21T09:38:17.1198591Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:38:17.1198865Z load = x_desc.load([offset_0, offset_2]) 2026-02-21T09:38:17.1199120Z load_1 = tl.load(y + (indices_2[:, None] * 1 + indices_1[None, :] * 1024), None) 2026-02-21T09:38:17.1199555Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:38:17.1199983Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:38:17.1200222Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:38:17.1200468Z tl.store(out + (indices_0[:, None] * 1024 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:38:17.1200655Z 2026-02-21T09:38:17.1200926Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:38:17.1201318Z """ 2026-02-21T09:38:17.1201553Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:38:17.1201818Z Args: 2026-02-21T09:38:17.1201974Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:38:17.1202180Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:38:17.1202479Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:38:17.1202795Z after the matmul. Defaults to identity (no change). 2026-02-21T09:38:17.1203008Z Returns: 2026-02-21T09:38:17.1203164Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:38:17.1203339Z """ 2026-02-21T09:38:17.1203480Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:38:17.1203650Z m, k = x.size() 2026-02-21T09:38:17.1203807Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:38:17.1203976Z k2, n = y.size() 2026-02-21T09:38:17.1204173Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:38:17.1204416Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:38:17.1204620Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:38:17.1204932Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:38:17.1205213Z # src[matmul.py:62]: ) 2026-02-21T09:38:17.1205538Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:38:17.1205845Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:38:17.1206064Z _BLOCK_SIZE_0 = 128 2026-02-21T09:38:17.1206210Z _BLOCK_SIZE_1 = 128 2026-02-21T09:38:17.1206399Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:38:17.1206674Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:38:17.1206943Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:38:17.1207149Z # src[matmul.py:63-67]: ... 
2026-02-21T09:38:17.1207499Z _launcher(_helion_matmul, (triton.cdiv(12288, _BLOCK_SIZE_0) * triton.cdiv(1024, _BLOCK_SIZE_1),), x, y, out, num_warps=4, num_stages=3) 2026-02-21T09:38:17.1207884Z # src[matmul.py:68]: return out 2026-02-21T09:38:17.1208078Z return out 2026-02-21T09:38:48.5159512Z WARNING:tritonbench.utils.triton_op:Completed input ID 8: 2026-02-21T09:38:48.5159920Z (M, N, K) 2026-02-21T09:38:48.5160177Z ------------------- 2026-02-21T09:38:48.5160374Z (12288, 1024, 1024) 2026-02-21T09:38:48.5160479Z 2026-02-21T09:38:48.5175656Z 75%|███████▌ | 6/8 [40:44<15:08, 454.22s/it]WARNING:tritonbench.utils.triton_op:Running input ID 9: 2026-02-21T09:38:48.5176130Z (M, N, K) 2026-02-21T09:38:48.5176347Z ------------------- 2026-02-21T09:38:48.5182219Z (1024, 12288, 1024) 2026-02-21T09:38:48.5182599Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:39:36.4627767Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:40:13.7385580Z INFO:tritonbench.utils.triton_op:Took 85.81ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:40:53.5726056Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:40:53.5728468Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:40:53.5728714Z 'dtype': 'torch.float16', 2026-02-21T09:40:53.5728925Z 'shape': (1024, 1024), 2026-02-21T09:40:53.5729111Z 'stride': (1024, 1)}, 2026-02-21T09:40:53.5729276Z { 'device': 'cuda:0', 2026-02-21T09:40:53.5729452Z 'dtype': 'torch.float16', 2026-02-21T09:40:53.5729623Z 'shape': (1024, 12288), 2026-02-21T09:40:53.5729801Z 'stride': (1, 1024)}, 2026-02-21T09:40:53.5729966Z None), 2026-02-21T09:40:53.5730100Z 'kwargs': {}} 2026-02-21T09:40:53.5796320Z INFO:tritonbench.utils.triton_op:Took 7.26ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:40:53.6710789Z [0s] Autotune random seed: 2137757931 2026-02-21T09:40:53.7734942Z [0s] Starting LFBOPatternSearch with 
initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:40:59.8526564Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 31.9 configs/s 2026-02-21T09:41:11.3848024Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 8.7 configs/s 2026-02-21T09:41:11.3857945Z [17s] Adaptive compile timeout: 30s (90% percentile=4.3s, bounds=[30.0s, 30s]) 2026-02-21T09:41:11.3863119Z [17s] Initial random population of 100, 5 starting points: 2026-02-21T09:41:11.3868158Z error=15 2026-02-21T09:41:11.3869724Z ok=85 2026-02-21T09:41:11.3869931Z min=0.1127 2026-02-21T09:41:11.3875116Z mid=1.4233 2026-02-21T09:41:11.3876559Z max=206.7180 2026-02-21T09:41:11.3876749Z best={'block_sizes': [64, 512, 32], 2026-02-21T09:41:11.3876995Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:41:11.3877218Z 'l2_groupings': [4], 2026-02-21T09:41:11.3877392Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:41:11.3877579Z 'loop_orders': [[0, 1]], 2026-02-21T09:41:11.3877737Z 'num_stages': 7, 2026-02-21T09:41:11.3877873Z 'num_warps': 16, 2026-02-21T09:41:11.3878015Z 'pid_type': 'flat', 2026-02-21T09:41:11.3878186Z 'range_flattens': [None, False], 2026-02-21T09:41:11.3878383Z 'range_multi_buffers': [None, False], 2026-02-21T09:41:11.3878570Z 'range_num_stages': [0, 0], 2026-02-21T09:41:11.3878747Z 'range_unroll_factors': [0, 0], 2026-02-21T09:41:11.3878933Z 'range_warp_specializes': [None, False]} 2026-02-21T09:41:11.3879231Z [17s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:41:12.6045179Z [18s] Generation 1 starting: 83 neighbors, 5 active search path(s) 2026-02-21T09:41:19.4378911Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 27.3 configs/s 2026-02-21T09:41:19.9754922Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:41:19.9755197Z 2026-02-21T09:41:19.9756589Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:41:19.9757823Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:19.9758058Z `ptxas` stderr: 2026-02-21T09:41:19.9758476Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:19.9758869Z 2026-02-21T09:41:19.9759009Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:19.9759156Z 2026-02-21T09:41:19.9759569Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp0zo43iw4.ptx -o /tmp/tmp0zo43iw4.ptx.o 2026-02-21T09:41:19.9760025Z 2026-02-21T09:41:19.9760156Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:19.9760427Z ================================================================ 2026-02-21T09:41:19.9760637Z Internal Triton PTX codegen error 2026-02-21T09:41:19.9760811Z `ptxas` stderr: 2026-02-21T09:41:19.9761214Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:19.9761694Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:19.9761843Z 2026-02-21T09:41:19.9762215Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp0zo43iw4.ptx -o /tmp/tmp0zo43iw4.ptx.o 2026-02-21T09:41:19.9762640Z 2026-02-21T09:41:19.9762643Z 2026-02-21T09:41:19.9762705Z // 2026-02-21T09:41:19.9762843Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:19.9763018Z // 2026-02-21T09:41:19.9763084Z 2026-02-21T09:41:19.9763139Z .version 8.7 2026-02-21T09:41:19.9763278Z .target sm_100a 2026-02-21T09:41:19.9763411Z .address_size 64 2026-02-21T09:41:19.9763568Z 2026-02-21T09:41:19.9763689Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:19.9763985Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:19.9764199Z // @_helion_matmul 2026-02-21T09:41:19.9764400Z .visible .entry _helion_matmul( 2026-02-21T09:41:19.9764608Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:19.9764895Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:19.9765138Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:19.9765382Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:19.9765621Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:19.9765843Z ) 2026-02-21T09:41:19.9765972Z .reqntid 128 2026-02-21T09:41:19.9766106Z .maxnreg 32 2026-02-21T09:41:19.9766247Z { 2026-02-21T09:41:19.9766378Z .reg .pred %p<145>; 2026-02-21T09:41:19.9766541Z .reg .b32 %r<659>; 2026-02-21T09:41:19.9766681Z .reg .b64 %rd<210>; 2026-02-21T09:41:19.9766950Z .loc 1 19 0 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:19:0 2026-02-21T09:41:19.9767236Z $L__func_begin0: 2026-02-21T09:41:19.9767481Z .loc 1 19 0 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:19:0 2026-02-21T09:41:19.9767716Z 
2026-02-21T09:41:19.9767775Z // %bb.0: 2026-02-21T09:41:19.9767924Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:19.9768115Z $L__tmp0: 2026-02-21T09:41:19.9768337Z .loc 1 19 0 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:19 2026-02-21T09:41:19.9768618Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:19.9768788Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:19.9768991Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:19.9769156Z mov.b32 %r66, global_smem; 2026-02-21T09:41:19.9769340Z // begin inline asm 2026-02-21T09:41:19.9769612Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r66], 128; 2026-02-21T09:41:19.9769854Z // end inline asm 2026-02-21T09:41:19.9770014Z ld.param.b64 %rd50, [_helion_matmul_param_3]; 2026-02-21T09:41:19.9770191Z bar.sync 0; 2026-02-21T09:41:19.9770336Z ld.shared.b32 %r629, [global_smem]; 2026-02-21T09:41:19.9770499Z bar.sync 0; 2026-02-21T09:41:19.9770632Z // begin inline asm 2026-02-21T09:41:19.9770828Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:19.9771056Z // end inline asm 2026-02-21T09:41:19.9771300Z .loc 1 21 67 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:21:67 2026-02-21T09:41:19.9771580Z mov.u32 %r658, %ctaid.x; 2026-02-21T09:41:19.9771733Z mov.u32 %r206, %ctaid.y; 2026-02-21T09:41:19.9771872Z mov.u32 %r207, %ctaid.z; 2026-02-21T09:41:19.9772023Z mov.u32 %r208, %nctaid.x; 2026-02-21T09:41:19.9772170Z mov.u32 %r209, %nctaid.y; 2026-02-21T09:41:19.9772329Z mad.lo.s32 %r210, %r207, %r209, %r206; 2026-02-21T09:41:19.9772710Z mad.lo.s32 %r211, %r210, %r208, %r658; 2026-02-21T09:41:19.9772880Z shl.b32 %r212, %r211, 8; 2026-02-21T09:41:19.9773032Z cvt.s64.s32 %rd51, %r212; 2026-02-21T09:41:19.9773181Z add.s64 %rd19, %rd50, %rd51; 2026-02-21T09:41:19.9773341Z shl.b32 %r213, %r1, 2; 2026-02-21T09:41:19.9773486Z add.s32 %r67, %r66, %r213; 2026-02-21T09:41:19.9773634Z mov.b32 %r76, 0; 2026-02-21T09:41:19.9773761Z // begin inline asm 
2026-02-21T09:41:19.9773913Z @%p4 st.shared.b32 [ %r67 + 0 ], %r76; 2026-02-21T09:41:19.9774079Z // end inline asm 2026-02-21T09:41:19.9774221Z bar.warp.sync -1; 2026-02-21T09:41:19.9774367Z setp.eq.b32 %p135, %r1, 0; 2026-02-21T09:41:19.9774519Z cvt.u64.u32 %rd4, %r66; 2026-02-21T09:41:19.9774696Z // begin inline asm 2026-02-21T09:41:19.9774946Z @%p135 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:19.9775226Z // end inline asm 2026-02-21T09:41:19.9775354Z // begin inline asm 2026-02-21T09:41:19.9775580Z @%p135 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:19.9775886Z // end inline asm 2026-02-21T09:41:19.9776022Z mov.b32 %r69, 32; 2026-02-21T09:41:19.9776161Z // begin inline asm 2026-02-21T09:41:19.9776390Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r69; 2026-02-21T09:41:19.9776655Z // end inline asm 2026-02-21T09:41:19.9776781Z mov.b32 %r70, 64; 2026-02-21T09:41:19.9776917Z // begin inline asm 2026-02-21T09:41:19.9777139Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r70; 2026-02-21T09:41:19.9777403Z // end inline asm 2026-02-21T09:41:19.9777535Z mov.b32 %r71, 1024; 2026-02-21T09:41:19.9777686Z // begin inline asm 2026-02-21T09:41:19.9777934Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r71; 2026-02-21T09:41:19.9778197Z // end inline asm 2026-02-21T09:41:19.9778331Z // begin inline asm 2026-02-21T09:41:19.9778562Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r71; 2026-02-21T09:41:19.9778832Z // end inline asm 2026-02-21T09:41:19.9778959Z mov.b64 %rd12, 2048; 2026-02-21T09:41:19.9779105Z // begin inline asm 2026-02-21T09:41:19.9779357Z @%p135 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:19.9779632Z // end inline asm 2026-02-21T09:41:19.9779767Z mov.b32 %r73, 1; 
2026-02-21T09:41:19.9779894Z // begin inline asm 2026-02-21T09:41:19.9780145Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r73; 2026-02-21T09:41:19.9780419Z // end inline asm 2026-02-21T09:41:19.9780553Z // begin inline asm 2026-02-21T09:41:19.9780793Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:41:19.9781071Z // end inline asm 2026-02-21T09:41:19.9781205Z // begin inline asm 2026-02-21T09:41:19.9781454Z @%p135 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:19.9781757Z // end inline asm 2026-02-21T09:41:19.9781896Z // begin inline asm 2026-02-21T09:41:19.9782159Z @%p135 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:19.9782448Z // end inline asm 2026-02-21T09:41:19.9782587Z // begin inline asm 2026-02-21T09:41:19.9782829Z @%p135 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:19.9783096Z // end inline asm 2026-02-21T09:41:19.9783235Z // begin inline asm 2026-02-21T09:41:19.9783464Z @%p135 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:19.9783733Z // end inline asm 2026-02-21T09:41:19.9783867Z // begin inline asm 2026-02-21T09:41:19.9784231Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:19.9784608Z // end inline asm 2026-02-21T09:41:19.9784769Z // begin inline asm 2026-02-21T09:41:19.9784989Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:19.9785247Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:19.9785446Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:19.9785623Z // end inline asm 2026-02-21T09:41:19.9785764Z bar.sync 0; 2026-02-21T09:41:19.9785902Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:19.9786201Z .loc 1 22 68 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:22:68 2026-02-21T09:41:19.9786514Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:19.9786670Z bar.sync 0; 2026-02-21T09:41:19.9786808Z // begin inline asm 2026-02-21T09:41:19.9786959Z @%p4 st.shared.b32 [ %r67 + 0 ], %r76; 2026-02-21T09:41:19.9787140Z // end inline asm 2026-02-21T09:41:19.9787278Z bar.warp.sync -1; 2026-02-21T09:41:19.9787425Z // begin inline asm 2026-02-21T09:41:19.9787674Z @%p135 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:19.9787964Z // end inline asm 2026-02-21T09:41:19.9788113Z // begin inline asm 2026-02-21T09:41:19.9788372Z @%p135 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:19.9788664Z // end inline asm 2026-02-21T09:41:19.9788798Z // begin inline asm 2026-02-21T09:41:19.9789041Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r69; 2026-02-21T09:41:19.9789309Z // end inline asm 2026-02-21T09:41:19.9789450Z mov.b32 %r78, 128; 2026-02-21T09:41:19.9789592Z // begin inline asm 2026-02-21T09:41:19.9789821Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r78; 2026-02-21T09:41:19.9790081Z // end inline asm 2026-02-21T09:41:19.9790209Z // begin inline asm 2026-02-21T09:41:19.9790448Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r71; 2026-02-21T09:41:19.9790713Z // end inline asm 2026-02-21T09:41:19.9790849Z mov.b32 %r80, 12288; 2026-02-21T09:41:19.9790985Z // begin inline asm 2026-02-21T09:41:19.9791227Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r80; 2026-02-21T09:41:19.9791497Z // end inline asm 2026-02-21T09:41:19.9791627Z // begin inline asm 2026-02-21T09:41:19.9791874Z @%p135 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:19.9792148Z // end inline asm 2026-02-21T09:41:19.9792284Z // begin inline asm 
2026-02-21T09:41:19.9792525Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r73; 2026-02-21T09:41:19.9792816Z // end inline asm 2026-02-21T09:41:19.9792944Z // begin inline asm 2026-02-21T09:41:19.9793191Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:41:19.9793477Z // end inline asm 2026-02-21T09:41:19.9793604Z // begin inline asm 2026-02-21T09:41:19.9793834Z @%p135 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:19.9794116Z // end inline asm 2026-02-21T09:41:19.9794282Z // begin inline asm 2026-02-21T09:41:19.9794525Z @%p135 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:19.9794836Z // end inline asm 2026-02-21T09:41:19.9794971Z // begin inline asm 2026-02-21T09:41:19.9795195Z @%p135 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:19.9795459Z // end inline asm 2026-02-21T09:41:19.9795585Z // begin inline asm 2026-02-21T09:41:19.9795809Z @%p135 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:19.9796055Z // end inline asm 2026-02-21T09:41:19.9796189Z // begin inline asm 2026-02-21T09:41:19.9796529Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:19.9796896Z // end inline asm 2026-02-21T09:41:19.9797032Z // begin inline asm 2026-02-21T09:41:19.9797232Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:19.9797480Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:19.9797664Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:19.9797839Z // end inline asm 2026-02-21T09:41:19.9797968Z bar.sync 0; 2026-02-21T09:41:19.9798109Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:19.9798380Z .loc 1 40 45 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:40:45 2026-02-21T09:41:19.9798669Z shr.u32 %r214, %r1, 5; 2026-02-21T09:41:19.9798936Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9799229Z sub.s32 %r216, 1536, %r658; 2026-02-21T09:41:19.9799400Z mul.hi.s32 %r217, %r216, -580400985; 2026-02-21T09:41:19.9799572Z add.s32 %r218, %r217, %r216; 2026-02-21T09:41:19.9799731Z shr.u32 %r219, %r218, 31; 2026-02-21T09:41:19.9799878Z shr.s32 %r220, %r218, 12; 2026-02-21T09:41:19.9800032Z add.s32 %r221, %r220, %r219; 2026-02-21T09:41:19.9800194Z mul.lo.s32 %r222, %r221, 4736; 2026-02-21T09:41:19.9800357Z setp.ne.b32 %p70, %r216, %r222; 2026-02-21T09:41:19.9800610Z setp.lt.u32 %p71, %r658, 1537; 2026-02-21T09:41:19.9800776Z and.pred %p72, %p71, %p70; 2026-02-21T09:41:19.9800941Z selp.b32 %r223, 1, 0, %p72; 2026-02-21T09:41:19.9801094Z add.s32 %r14, %r221, %r223; 2026-02-21T09:41:19.9801360Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9801662Z shfl.sync.idx.b32 %r16, %r214, 0, 31, -1; 2026-02-21T09:41:19.9801852Z shl.b32 %r224, %r16, 21; 2026-02-21T09:41:19.9802009Z and.b32 %r225, %r224, 6291456; 2026-02-21T09:41:19.9802164Z add.s32 %r83, %r225, %r629; 2026-02-21T09:41:19.9802322Z mov.pred %p42, -1; 2026-02-21T09:41:19.9802462Z // begin inline asm 2026-02-21T09:41:19.9802804Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 0], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:19.9803180Z // end inline asm 2026-02-21T09:41:19.9803322Z // begin inline asm 2026-02-21T09:41:19.9803659Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 16], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:19.9804016Z // end inline asm 2026-02-21T09:41:19.9804152Z // begin inline asm 2026-02-21T09:41:19.9804471Z @%p42 
tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 32], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:19.9804873Z // end inline asm 2026-02-21T09:41:19.9805002Z // begin inline asm 2026-02-21T09:41:19.9805337Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 48], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:19.9805712Z // end inline asm 2026-02-21T09:41:19.9805841Z // begin inline asm 2026-02-21T09:41:19.9806020Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:19.9806179Z // end inline asm 2026-02-21T09:41:19.9806341Z bar.sync 0; 2026-02-21T09:41:19.9806594Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9806892Z add.s32 %r151, %r66, 77872; 2026-02-21T09:41:19.9807039Z // begin inline asm 2026-02-21T09:41:19.9807209Z @%p135 mbarrier.init.shared::cta.b64 [%r151], 1; 2026-02-21T09:41:19.9807401Z // end inline asm 2026-02-21T09:41:19.9807527Z bar.sync 0; 2026-02-21T09:41:19.9807662Z add.s32 %r152, %r66, 77880; 2026-02-21T09:41:19.9807808Z // begin inline asm 2026-02-21T09:41:19.9807974Z @%p135 mbarrier.init.shared::cta.b64 [%r152], 1; 2026-02-21T09:41:19.9808155Z // end inline asm 2026-02-21T09:41:19.9808293Z add.s32 %r153, %r66, 77824; 2026-02-21T09:41:19.9808440Z // begin inline asm 2026-02-21T09:41:19.9808605Z @%p135 mbarrier.init.shared::cta.b64 [%r153], 1; 2026-02-21T09:41:19.9808796Z // end inline asm 2026-02-21T09:41:19.9808924Z bar.sync 0; 2026-02-21T09:41:19.9809062Z add.s32 %r154, %r66, 77832; 2026-02-21T09:41:19.9809213Z // begin inline asm 2026-02-21T09:41:19.9809396Z @%p135 mbarrier.init.shared::cta.b64 [%r154], 1; 2026-02-21T09:41:19.9809573Z // end inline asm 2026-02-21T09:41:19.9809706Z bar.sync 0; 2026-02-21T09:41:19.9809828Z add.s32 %r155, %r66, 77840; 2026-02-21T09:41:19.9809981Z // begin inline asm 2026-02-21T09:41:19.9810136Z @%p135 
mbarrier.init.shared::cta.b64 [%r155], 1; 2026-02-21T09:41:19.9810324Z // end inline asm 2026-02-21T09:41:19.9810455Z bar.sync 0; 2026-02-21T09:41:19.9810577Z add.s32 %r156, %r66, 77848; 2026-02-21T09:41:19.9810730Z // begin inline asm 2026-02-21T09:41:19.9810883Z @%p135 mbarrier.init.shared::cta.b64 [%r156], 1; 2026-02-21T09:41:19.9811067Z // end inline asm 2026-02-21T09:41:19.9811189Z bar.sync 0; 2026-02-21T09:41:19.9811320Z add.s32 %r157, %r66, 77856; 2026-02-21T09:41:19.9811464Z // begin inline asm 2026-02-21T09:41:19.9811625Z @%p135 mbarrier.init.shared::cta.b64 [%r157], 1; 2026-02-21T09:41:19.9811805Z // end inline asm 2026-02-21T09:41:19.9811940Z bar.sync 0; 2026-02-21T09:41:19.9812099Z add.s32 %r259, %r66, 77864; 2026-02-21T09:41:19.9812273Z // begin inline asm 2026-02-21T09:41:19.9812435Z @%p135 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:41:19.9812610Z // end inline asm 2026-02-21T09:41:19.9812750Z setp.lt.s32 %p73, %r14, 1; 2026-02-21T09:41:19.9812905Z setp.gt.s32 %p69, %r14, 0; 2026-02-21T09:41:19.9813172Z .loc 1 35 33 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:35:33 2026-02-21T09:41:19.9813457Z shr.u32 %r226, %r658, 4; 2026-02-21T09:41:19.9813615Z and.b32 %r227, %r226, 134217664; 2026-02-21T09:41:19.9813882Z .loc 1 36 39 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:36:39 2026-02-21T09:41:19.9814157Z sub.s32 %r228, 96, %r227; 2026-02-21T09:41:19.9814413Z .loc 1 36 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:36:52 2026-02-21T09:41:19.9814744Z min.s32 %r229, %r228, 64; 2026-02-21T09:41:19.9815009Z .loc 1 37 45 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:45 2026-02-21T09:41:19.9815285Z and.b32 %r230, %r658, 1023; 2026-02-21T09:41:19.9815547Z .loc 1 38 51 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:38:51 2026-02-21T09:41:19.9815839Z div.s32 %r231, %r230, %r229; 2026-02-21T09:41:19.9816092Z .loc 1 37 64 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:64 
2026-02-21T09:41:19.9816384Z mul.lo.s32 %r232, %r231, %r229; 2026-02-21T09:41:19.9816544Z sub.s32 %r233, %r230, %r232; 2026-02-21T09:41:19.9816804Z .loc 1 37 30 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:30 2026-02-21T09:41:19.9817079Z add.s32 %r234, %r233, %r227; 2026-02-21T09:41:19.9817337Z .loc 1 39 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:39:27 2026-02-21T09:41:19.9817654Z shl.b32 %r635, %r234, 7; 2026-02-21T09:41:19.9817933Z .loc 1 41 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:41:27 2026-02-21T09:41:19.9818218Z shl.b32 %r631, %r231, 6; 2026-02-21T09:41:19.9818471Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9818761Z bar.sync 0; 2026-02-21T09:41:19.9818897Z and.pred %p1, %p135, %p69; 2026-02-21T09:41:19.9819064Z // begin inline asm 2026-02-21T09:41:19.9819264Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r153], 12288; 2026-02-21T09:41:19.9819478Z // end inline asm 2026-02-21T09:41:19.9819725Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9820003Z bar.sync 0; 2026-02-21T09:41:19.9820146Z elect.sync %r235|%p74, -1; 2026-02-21T09:41:19.9820309Z and.pred %p75, %p69, %p74; 2026-02-21T09:41:19.9820473Z and.pred %p55, %p4, %p75; 2026-02-21T09:41:19.9820627Z add.s32 %r160, %r66, 49152; 2026-02-21T09:41:19.9820786Z // begin inline asm 2026-02-21T09:41:19.9821117Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd40, {%r76, %r631}], [%r153]; 2026-02-21T09:41:19.9821477Z // end inline asm 2026-02-21T09:41:19.9821719Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9821994Z // begin inline asm 2026-02-21T09:41:19.9822152Z fence.proxy.async.shared::cta; 2026-02-21T09:41:19.9822311Z // end inline asm 2026-02-21T09:41:19.9822445Z bar.sync 0; 2026-02-21T09:41:19.9822575Z elect.sync %r236|%p76, -1; 
2026-02-21T09:41:19.9822738Z and.pred %p77, %p69, %p76; 2026-02-21T09:41:19.9822898Z and.pred %p56, %p4, %p77; 2026-02-21T09:41:19.9823047Z // begin inline asm 2026-02-21T09:41:19.9823376Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r66], [%rd41, {%r76, %r635}], [%r153]; 2026-02-21T09:41:19.9823729Z // end inline asm 2026-02-21T09:41:19.9823982Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9824315Z bar.sync 0; 2026-02-21T09:41:19.9824448Z // begin inline asm 2026-02-21T09:41:19.9824635Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r154], 12288; 2026-02-21T09:41:19.9824881Z // end inline asm 2026-02-21T09:41:19.9825150Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9825439Z bar.sync 0; 2026-02-21T09:41:19.9825583Z elect.sync %r237|%p78, -1; 2026-02-21T09:41:19.9825745Z and.pred %p79, %p69, %p78; 2026-02-21T09:41:19.9825915Z and.pred %p58, %p4, %p79; 2026-02-21T09:41:19.9826073Z add.s32 %r169, %r66, 53248; 2026-02-21T09:41:19.9826237Z // begin inline asm 2026-02-21T09:41:19.9826581Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r169], [%rd40, {%r69, %r631}], [%r154]; 2026-02-21T09:41:19.9826941Z // end inline asm 2026-02-21T09:41:19.9827208Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9827503Z bar.sync 0; 2026-02-21T09:41:19.9827646Z elect.sync %r238|%p80, -1; 2026-02-21T09:41:19.9827808Z and.pred %p81, %p69, %p80; 2026-02-21T09:41:19.9827977Z and.pred %p59, %p4, %p81; 2026-02-21T09:41:19.9828133Z add.s32 %r173, %r66, 8192; 2026-02-21T09:41:19.9828290Z // begin inline asm 2026-02-21T09:41:19.9828624Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r173], [%rd41, {%r69, %r635}], [%r154]; 2026-02-21T09:41:19.9828992Z // end inline asm 2026-02-21T09:41:19.9829258Z .loc 1 28 108 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9829550Z bar.sync 0; 2026-02-21T09:41:19.9829687Z // begin inline asm 2026-02-21T09:41:19.9829880Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r155], 12288; 2026-02-21T09:41:19.9830144Z // end inline asm 2026-02-21T09:41:19.9830433Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9830721Z bar.sync 0; 2026-02-21T09:41:19.9830863Z elect.sync %r239|%p82, -1; 2026-02-21T09:41:19.9831024Z and.pred %p83, %p69, %p82; 2026-02-21T09:41:19.9831190Z and.pred %p61, %p4, %p83; 2026-02-21T09:41:19.9831348Z add.s32 %r178, %r66, 57344; 2026-02-21T09:41:19.9831507Z // begin inline asm 2026-02-21T09:41:19.9831841Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd40, {%r70, %r631}], [%r155]; 2026-02-21T09:41:19.9832215Z // end inline asm 2026-02-21T09:41:19.9832466Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9832758Z bar.sync 0; 2026-02-21T09:41:19.9832902Z elect.sync %r240|%p84, -1; 2026-02-21T09:41:19.9833069Z and.pred %p85, %p69, %p84; 2026-02-21T09:41:19.9833230Z and.pred %p62, %p4, %p85; 2026-02-21T09:41:19.9833380Z add.s32 %r182, %r66, 16384; 2026-02-21T09:41:19.9833536Z // begin inline asm 2026-02-21T09:41:19.9833854Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r182], [%rd41, {%r70, %r635}], [%r155]; 2026-02-21T09:41:19.9834203Z // end inline asm 2026-02-21T09:41:19.9834459Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9834775Z bar.sync 0; 2026-02-21T09:41:19.9834904Z // begin inline asm 2026-02-21T09:41:19.9835087Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r156], 12288; 2026-02-21T09:41:19.9835302Z // end inline asm 2026-02-21T09:41:19.9835543Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 
2026-02-21T09:41:19.9835820Z bar.sync 0; 2026-02-21T09:41:19.9835956Z elect.sync %r241|%p86, -1; 2026-02-21T09:41:19.9836110Z and.pred %p87, %p69, %p86; 2026-02-21T09:41:19.9836270Z and.pred %p64, %p4, %p87; 2026-02-21T09:41:19.9836420Z add.s32 %r187, %r66, 61440; 2026-02-21T09:41:19.9836599Z mov.b32 %r188, 96; 2026-02-21T09:41:19.9836756Z // begin inline asm 2026-02-21T09:41:19.9837069Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r187], [%rd40, {%r188, %r631}], [%r156]; 2026-02-21T09:41:19.9837428Z // end inline asm 2026-02-21T09:41:19.9837662Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9837951Z bar.sync 0; 2026-02-21T09:41:19.9838079Z elect.sync %r242|%p88, -1; 2026-02-21T09:41:19.9838237Z and.pred %p89, %p69, %p88; 2026-02-21T09:41:19.9838391Z and.pred %p65, %p4, %p89; 2026-02-21T09:41:19.9838547Z add.s32 %r191, %r66, 24576; 2026-02-21T09:41:19.9838694Z // begin inline asm 2026-02-21T09:41:19.9839012Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r191], [%rd41, {%r188, %r635}], [%r156]; 2026-02-21T09:41:19.9839356Z // end inline asm 2026-02-21T09:41:19.9839597Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9839880Z bar.sync 0; 2026-02-21T09:41:19.9840005Z // begin inline asm 2026-02-21T09:41:19.9840195Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r157], 12288; 2026-02-21T09:41:19.9840399Z // end inline asm 2026-02-21T09:41:19.9840641Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9840927Z bar.sync 0; 2026-02-21T09:41:19.9841059Z elect.sync %r243|%p90, -1; 2026-02-21T09:41:19.9841230Z and.pred %p91, %p69, %p90; 2026-02-21T09:41:19.9841382Z and.pred %p67, %p4, %p91; 2026-02-21T09:41:19.9841540Z add.s32 %r196, %r66, 65536; 2026-02-21T09:41:19.9841684Z // begin inline asm 2026-02-21T09:41:19.9841996Z @%p67 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r196], [%rd40, {%r78, %r631}], [%r157]; 2026-02-21T09:41:19.9842366Z // end inline asm 2026-02-21T09:41:19.9842649Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9842926Z bar.sync 0; 2026-02-21T09:41:19.9843057Z elect.sync %r244|%p92, -1; 2026-02-21T09:41:19.9843220Z and.pred %p93, %p69, %p92; 2026-02-21T09:41:19.9843374Z and.pred %p68, %p4, %p93; 2026-02-21T09:41:19.9843528Z add.s32 %r200, %r66, 32768; 2026-02-21T09:41:19.9843675Z // begin inline asm 2026-02-21T09:41:19.9843989Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r200], [%rd41, {%r78, %r635}], [%r157]; 2026-02-21T09:41:19.9844323Z // end inline asm 2026-02-21T09:41:19.9844574Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9844893Z bar.sync 0; 2026-02-21T09:41:19.9845016Z // begin inline asm 2026-02-21T09:41:19.9845155Z 2026-02-21T09:41:19.9845265Z { 2026-02-21T09:41:19.9845396Z @!%p69 bra.uni skipWait; 2026-02-21T09:41:19.9845550Z .reg .pred complete; 2026-02-21T09:41:19.9845697Z waitLoop: 2026-02-21T09:41:19.9845881Z mbarrier.try_wait.parity.shared.b64 complete, [%r153], %r76; 2026-02-21T09:41:19.9846112Z @!complete bra.uni waitLoop; 2026-02-21T09:41:19.9846261Z skipWait: 2026-02-21T09:41:19.9846382Z } 2026-02-21T09:41:19.9846447Z 2026-02-21T09:41:19.9846508Z // end inline asm 2026-02-21T09:41:19.9846749Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9847037Z setp.ne.b32 %p94, %r16, 0; 2026-02-21T09:41:19.9847191Z or.pred %p95, %p73, %p94; 2026-02-21T09:41:19.9847347Z @%p95 bra $L__BB0_2; 2026-02-21T09:41:19.9847484Z // %bb.1: 2026-02-21T09:41:19.9847617Z elect.sync %r249|%p97, -1; 2026-02-21T09:41:19.9847774Z bfe.u32 %r252, %r160, 4, 14; 2026-02-21T09:41:19.9847935Z cvt.u64.u32 %rd57, %r252; 2026-02-21T09:41:19.9848105Z or.b64 
%rd52, %rd57, -9223371899399045120; 2026-02-21T09:41:19.9848280Z bfe.u32 %r253, %r66, 4, 14; 2026-02-21T09:41:19.9848437Z cvt.u64.u32 %rd58, %r253; 2026-02-21T09:41:19.9848598Z or.b64 %rd53, %rd58, -9223371899382267904; 2026-02-21T09:41:19.9848832Z mov.b32 %r246, 69206032; 2026-02-21T09:41:19.9848978Z mov.pred %p96, 0; 2026-02-21T09:41:19.9849122Z // begin inline asm 2026-02-21T09:41:19.9849339Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd52, %rd53, %r246, %p96; 2026-02-21T09:41:19.9849598Z // end inline asm 2026-02-21T09:41:19.9849736Z add.s32 %r254, %r66, 49184; 2026-02-21T09:41:19.9849888Z bfe.u32 %r255, %r254, 4, 14; 2026-02-21T09:41:19.9850048Z cvt.u64.u32 %rd59, %r255; 2026-02-21T09:41:19.9850207Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:19.9850393Z add.s32 %r256, %r66, 32; 2026-02-21T09:41:19.9850545Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T09:41:19.9850708Z cvt.u64.u32 %rd60, %r257; 2026-02-21T09:41:19.9850861Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:19.9851039Z // begin inline asm 2026-02-21T09:41:19.9851261Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd54, %rd55, %r246, %p42; 2026-02-21T09:41:19.9851509Z // end inline asm 2026-02-21T09:41:19.9851649Z add.s32 %r258, %r66, 77872; 2026-02-21T09:41:19.9851798Z cvt.u64.u32 %rd56, %r258; 2026-02-21T09:41:19.9851949Z // begin inline asm 2026-02-21T09:41:19.9852149Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T09:41:19.9852378Z // end inline asm 2026-02-21T09:41:19.9852504Z $L__BB0_2: 2026-02-21T09:41:19.9852749Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9853037Z bar.sync 0; 2026-02-21T09:41:19.9853164Z // begin inline asm 2026-02-21T09:41:19.9853354Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r259], 12288; 2026-02-21T09:41:19.9853561Z // end inline asm 2026-02-21T09:41:19.9853808Z .loc 1 51 31 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9854084Z bar.sync 0; 2026-02-21T09:41:19.9854246Z elect.sync %r269|%p107, -1; 2026-02-21T09:41:19.9854429Z and.pred %p108, %p69, %p107; 2026-02-21T09:41:19.9854598Z and.pred %p102, %p4, %p108; 2026-02-21T09:41:19.9854778Z add.s32 %r260, %r66, 69632; 2026-02-21T09:41:19.9854924Z mov.b32 %r643, 160; 2026-02-21T09:41:19.9855066Z // begin inline asm 2026-02-21T09:41:19.9855394Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r260], [%rd40, {%r643, %r631}], [%r259]; 2026-02-21T09:41:19.9855746Z // end inline asm 2026-02-21T09:41:19.9855989Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9856276Z bar.sync 0; 2026-02-21T09:41:19.9856406Z elect.sync %r270|%p109, -1; 2026-02-21T09:41:19.9856570Z and.pred %p110, %p69, %p109; 2026-02-21T09:41:19.9856732Z and.pred %p103, %p4, %p110; 2026-02-21T09:41:19.9856880Z add.s32 %r264, %r66, 40960; 2026-02-21T09:41:19.9857033Z // begin inline asm 2026-02-21T09:41:19.9857350Z @%p103 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd41, {%r643, %r635}], [%r259]; 2026-02-21T09:41:19.9857699Z // end inline asm 2026-02-21T09:41:19.9857949Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9858241Z @%p73 bra $L__BB0_12; 2026-02-21T09:41:19.9858408Z // %bb.3: // %.lr.ph 2026-02-21T09:41:19.9858701Z .loc 1 0 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:0:108 2026-02-21T09:41:19.9858995Z and.b32 %r4, %r1, 15; 2026-02-21T09:41:19.9859138Z shr.u32 %r215, %r1, 4; 2026-02-21T09:41:19.9859293Z bfe.u32 %r6, %r1, 4, 3; 2026-02-21T09:41:19.9859464Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:19.9859655Z shl.b32 %r5, %r4, 3; 2026-02-21T09:41:19.9859793Z or.b32 %r7, %r6, 8; 2026-02-21T09:41:19.9859935Z or.b32 %r8, %r6, 16; 2026-02-21T09:41:19.9860081Z or.b32 %r9, 
%r6, 24; 2026-02-21T09:41:19.9860219Z or.b32 %r10, %r6, 32; 2026-02-21T09:41:19.9860368Z or.b32 %r11, %r6, 40; 2026-02-21T09:41:19.9860534Z or.b32 %r12, %r6, 48; 2026-02-21T09:41:19.9860700Z or.b32 %r13, %r215, 56; 2026-02-21T09:41:19.9860846Z shl.b32 %r15, %r14, 5; 2026-02-21T09:41:19.9860996Z add.s32 %r20, %r15, -6; 2026-02-21T09:41:19.9861137Z shl.b32 %r279, %r1, 9; 2026-02-21T09:41:19.9861287Z and.b32 %r280, %r279, 3072; 2026-02-21T09:41:19.9861435Z shl.b32 %r281, %r4, 4; 2026-02-21T09:41:19.9861579Z and.b32 %r282, %r1, 96; 2026-02-21T09:41:19.9861729Z shl.b32 %r283, %r282, 3; 2026-02-21T09:41:19.9861873Z and.b32 %r285, %r213, 64; 2026-02-21T09:41:19.9862026Z or.b32 %r286, %r281, %r283; 2026-02-21T09:41:19.9862178Z xor.b32 %r287, %r286, %r285; 2026-02-21T09:41:19.9862332Z or.b32 %r288, %r287, %r280; 2026-02-21T09:41:19.9862477Z add.s32 %r290, %r66, 73728; 2026-02-21T09:41:19.9862631Z add.s32 %r21, %r290, %r288; 2026-02-21T09:41:19.9862775Z xor.b32 %r291, %r288, 32; 2026-02-21T09:41:19.9862927Z add.s32 %r22, %r290, %r291; 2026-02-21T09:41:19.9863076Z shl.b32 %r292, %r1, 5; 2026-02-21T09:41:19.9863224Z and.b32 %r293, %r292, 3168; 2026-02-21T09:41:19.9863381Z shl.b32 %r294, %r1, 4; 2026-02-21T09:41:19.9863522Z and.b32 %r295, %r294, 384; 2026-02-21T09:41:19.9863676Z and.b32 %r296, %r213, 16; 2026-02-21T09:41:19.9863818Z or.b32 %r297, %r293, %r295; 2026-02-21T09:41:19.9863971Z xor.b32 %r298, %r297, %r282; 2026-02-21T09:41:19.9864120Z add.s32 %r299, %r290, %r296; 2026-02-21T09:41:19.9864275Z add.s32 %r438, %r299, %r298; 2026-02-21T09:41:19.9864419Z add.s32 %r443, %r438, 512; 2026-02-21T09:41:19.9864714Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9865000Z max.s32 %r300, %r15, 2; 2026-02-21T09:41:19.9865144Z add.s32 %r25, %r300, -1; 2026-02-21T09:41:19.9865298Z mov.pred %p144, -1; 2026-02-21T09:41:19.9865441Z mov.b32 %r648, 5; 2026-02-21T09:41:19.9865582Z mov.b32 %r644, 0; 2026-02-21T09:41:19.9865740Z mov.b32 
%r642, 1; 2026-02-21T09:41:19.9865897Z mov.b32 %r641, 2; 2026-02-21T09:41:19.9866025Z mov.b32 %r640, 3; 2026-02-21T09:41:19.9866158Z mov.b32 %r639, 4; 2026-02-21T09:41:19.9866285Z mov.b32 %r632, %r631; 2026-02-21T09:41:19.9866432Z mov.b32 %r633, %r631; 2026-02-21T09:41:19.9866574Z mov.b32 %r634, %r631; 2026-02-21T09:41:19.9866708Z mov.b32 %r636, %r635; 2026-02-21T09:41:19.9866848Z mov.b32 %r637, %r635; 2026-02-21T09:41:19.9866981Z mov.b32 %r638, %r635; 2026-02-21T09:41:19.9867121Z mov.b32 %r645, %r151; 2026-02-21T09:41:19.9867252Z mov.b32 %r646, %r644; 2026-02-21T09:41:19.9867393Z mov.b32 %r647, %r644; 2026-02-21T09:41:19.9867524Z mov.b32 %r649, %r642; 2026-02-21T09:41:19.9867667Z mov.b32 %r650, %r644; 2026-02-21T09:41:19.9867801Z mov.b32 %r651, %r631; 2026-02-21T09:41:19.9867946Z mov.b32 %r652, %r635; 2026-02-21T09:41:19.9868081Z mov.b32 %r654, %r648; 2026-02-21T09:41:19.9868222Z mov.b32 %r655, %r644; 2026-02-21T09:41:19.9868364Z mov.b32 %r656, %r652; 2026-02-21T09:41:19.9868499Z mov.b32 %r657, %r651; 2026-02-21T09:41:19.9868640Z bra.uni $L__BB0_4; 2026-02-21T09:41:19.9868824Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9869143Z .loc 1 0 0 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:0 2026-02-21T09:41:19.9869437Z selp.b32 %r649, 0, %r352, %p128; 2026-02-21T09:41:19.9869619Z selp.b32 %r353, 1, 0, %p128; 2026-02-21T09:41:19.9869780Z xor.b32 %r650, %r620, %r353; 2026-02-21T09:41:19.9870065Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9870365Z add.s32 %r655, %r655, 1; 2026-02-21T09:41:19.9870526Z setp.ne.b32 %p134, %r25, %r655; 2026-02-21T09:41:19.9870700Z mov.b32 %r631, %r651; 2026-02-21T09:41:19.9870842Z mov.b32 %r634, %r28; 2026-02-21T09:41:19.9870990Z mov.b32 %r635, %r652; 2026-02-21T09:41:19.9871129Z mov.b32 %r638, %r32; 2026-02-21T09:41:19.9871278Z mov.b32 %r639, %r654; 2026-02-21T09:41:19.9871417Z mov.b32 %r642, %r36; 2026-02-21T09:41:19.9871599Z mov.b32 %r644, %r620; 
2026-02-21T09:41:19.9871774Z mov.b32 %r645, %r619; 2026-02-21T09:41:19.9871913Z mov.b32 %r651, %r657; 2026-02-21T09:41:19.9872061Z mov.b32 %r652, %r656; 2026-02-21T09:41:19.9872200Z mov.b32 %r654, %r51; 2026-02-21T09:41:19.9872349Z @%p134 bra $L__BB0_4; 2026-02-21T09:41:19.9872492Z bra.uni $L__BB0_11; 2026-02-21T09:41:19.9872686Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:19.9873017Z .loc 1 0 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:0:108 2026-02-21T09:41:19.9873314Z mov.b32 %r620, %r650; 2026-02-21T09:41:19.9873461Z mov.b32 %r36, %r641; 2026-02-21T09:41:19.9873602Z mov.b32 %r641, %r640; 2026-02-21T09:41:19.9873749Z mov.b32 %r640, %r639; 2026-02-21T09:41:19.9873886Z mov.b32 %r32, %r637; 2026-02-21T09:41:19.9874032Z mov.b32 %r637, %r636; 2026-02-21T09:41:19.9874171Z mov.b32 %r636, %r635; 2026-02-21T09:41:19.9874317Z mov.b32 %r28, %r633; 2026-02-21T09:41:19.9874456Z mov.b32 %r633, %r632; 2026-02-21T09:41:19.9874602Z mov.b32 %r632, %r631; 2026-02-21T09:41:19.9874893Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9875194Z add.s32 %r301, %r654, 1; 2026-02-21T09:41:19.9875363Z setp.eq.b32 %p112, %r654, 31; 2026-02-21T09:41:19.9875530Z selp.b32 %r51, 0, %r301, %p112; 2026-02-21T09:41:19.9875707Z setp.ne.b32 %p113, %r51, 0; 2026-02-21T09:41:19.9875867Z @%p113 bra $L__BB0_6; 2026-02-21T09:41:19.9876061Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9876277Z add.s32 %r658, %r658, 4736; 2026-02-21T09:41:19.9876554Z .loc 1 34 35 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:34:35 2026-02-21T09:41:19.9876844Z shr.s32 %r302, %r658, 31; 2026-02-21T09:41:19.9877016Z shr.u32 %r303, %r302, 22; 2026-02-21T09:41:19.9877203Z add.s32 %r304, %r658, %r303; 2026-02-21T09:41:19.9877383Z shr.s32 %r305, %r304, 10; 2026-02-21T09:41:19.9877647Z .loc 1 35 33 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:35:33 2026-02-21T09:41:19.9877923Z shl.b32 %r306, 
%r305, 6; 2026-02-21T09:41:19.9878181Z .loc 1 36 39 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:36:39 2026-02-21T09:41:19.9878452Z sub.s32 %r307, 96, %r306; 2026-02-21T09:41:19.9878712Z .loc 1 36 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:36:52 2026-02-21T09:41:19.9879002Z min.s32 %r308, %r307, 64; 2026-02-21T09:41:19.9879251Z .loc 1 37 45 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:45 2026-02-21T09:41:19.9879535Z and.b32 %r309, %r304, -1024; 2026-02-21T09:41:19.9879686Z sub.s32 %r310, %r658, %r309; 2026-02-21T09:41:19.9879952Z .loc 1 38 51 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:38:51 2026-02-21T09:41:19.9880238Z div.s32 %r311, %r310, %r308; 2026-02-21T09:41:19.9880506Z .loc 1 37 64 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:64 2026-02-21T09:41:19.9880788Z mul.lo.s32 %r312, %r311, %r308; 2026-02-21T09:41:19.9880946Z sub.s32 %r313, %r310, %r312; 2026-02-21T09:41:19.9881208Z .loc 1 37 30 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:37:30 2026-02-21T09:41:19.9881481Z add.s32 %r314, %r313, %r306; 2026-02-21T09:41:19.9881744Z .loc 1 39 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:39:27 2026-02-21T09:41:19.9882030Z shl.b32 %r656, %r314, 7; 2026-02-21T09:41:19.9882290Z .loc 1 41 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:41:27 2026-02-21T09:41:19.9882572Z shl.b32 %r657, %r311, 6; 2026-02-21T09:41:19.9882757Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9883080Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9883408Z add.s32 %r317, %r647, 1; 2026-02-21T09:41:19.9883562Z setp.gt.s32 %p115, %r317, 5; 2026-02-21T09:41:19.9883719Z selp.b32 %r647, 0, %r317, %p115; 2026-02-21T09:41:19.9883888Z selp.b32 %r318, 1, 0, %p115; 2026-02-21T09:41:19.9884042Z xor.b32 %r646, %r646, %r318; 2026-02-21T09:41:19.9884187Z shl.b32 %r319, %r647, 3; 2026-02-21T09:41:19.9884337Z add.s32 
%r321, %r66, %r319; 2026-02-21T09:41:19.9884484Z add.s32 %r315, %r321, 77824; 2026-02-21T09:41:19.9884633Z bar.sync 0; 2026-02-21T09:41:19.9884788Z // begin inline asm 2026-02-21T09:41:19.9884926Z 2026-02-21T09:41:19.9885034Z { 2026-02-21T09:41:19.9885160Z .reg .pred complete; 2026-02-21T09:41:19.9885300Z waitLoop: 2026-02-21T09:41:19.9885489Z mbarrier.try_wait.parity.shared.b64 complete, [%r315], %r646; 2026-02-21T09:41:19.9885722Z @!complete bra.uni waitLoop; 2026-02-21T09:41:19.9885879Z } 2026-02-21T09:41:19.9885943Z 2026-02-21T09:41:19.9886007Z // end inline asm 2026-02-21T09:41:19.9886145Z shl.b32 %r322, %r649, 3; 2026-02-21T09:41:19.9886301Z add.s32 %r323, %r66, %r322; 2026-02-21T09:41:19.9886451Z add.s32 %r619, %r323, 77872; 2026-02-21T09:41:19.9886716Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9886996Z @%p94 bra $L__BB0_8; 2026-02-21T09:41:19.9887180Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9887500Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9887776Z shl.b32 %r328, %r647, 12; 2026-02-21T09:41:19.9887938Z add.s32 %r330, %r66, %r328; 2026-02-21T09:41:19.9888090Z add.s32 %r331, %r330, 49152; 2026-02-21T09:41:19.9888352Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 2026-02-21T09:41:19.9888662Z shl.b32 %r332, %r647, 13; 2026-02-21T09:41:19.9888837Z add.s32 %r333, %r66, %r332; 2026-02-21T09:41:19.9889087Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9889369Z elect.sync %r334|%p117, -1; 2026-02-21T09:41:19.9889529Z bfe.u32 %r335, %r331, 4, 14; 2026-02-21T09:41:19.9889676Z cvt.u64.u32 %rd68, %r335; 2026-02-21T09:41:19.9889846Z or.b64 %rd63, %rd68, -9223371899399045120; 2026-02-21T09:41:19.9890023Z bfe.u32 %r336, %r333, 4, 14; 2026-02-21T09:41:19.9890177Z cvt.u64.u32 %rd69, %r336; 2026-02-21T09:41:19.9890337Z or.b64 %rd64, %rd69, 
-9223371899382267904; 2026-02-21T09:41:19.9890521Z mov.b32 %r325, 69206032; 2026-02-21T09:41:19.9890666Z // begin inline asm 2026-02-21T09:41:19.9890895Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd63, %rd64, %r325, %p144; 2026-02-21T09:41:19.9891157Z // end inline asm 2026-02-21T09:41:19.9891289Z add.s32 %r337, %r330, 49184; 2026-02-21T09:41:19.9891447Z bfe.u32 %r338, %r337, 4, 14; 2026-02-21T09:41:19.9891595Z cvt.u64.u32 %rd70, %r338; 2026-02-21T09:41:19.9891759Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:19.9891929Z add.s32 %r339, %r333, 32; 2026-02-21T09:41:19.9892079Z bfe.u32 %r340, %r339, 4, 14; 2026-02-21T09:41:19.9892223Z cvt.u64.u32 %rd71, %r340; 2026-02-21T09:41:19.9892382Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:19.9892559Z mov.pred %p118, -1; 2026-02-21T09:41:19.9892699Z // begin inline asm 2026-02-21T09:41:19.9892921Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd65, %rd66, %r325, %p118; 2026-02-21T09:41:19.9893161Z // end inline asm 2026-02-21T09:41:19.9893296Z cvt.u64.u32 %rd67, %r619; 2026-02-21T09:41:19.9893440Z // begin inline asm 2026-02-21T09:41:19.9893646Z @%p117 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:41:19.9893866Z // end inline asm 2026-02-21T09:41:19.9894042Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9894361Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9894749Z setp.eq.b32 %p124, %r51, 0; 2026-02-21T09:41:19.9894918Z setp.lt.s32 %p125, %r655, %r20; 2026-02-21T09:41:19.9895187Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9895468Z // begin inline asm 2026-02-21T09:41:19.9895596Z 2026-02-21T09:41:19.9895713Z { 2026-02-21T09:41:19.9895828Z .reg .pred complete; 2026-02-21T09:41:19.9895974Z waitLoop: 2026-02-21T09:41:19.9896162Z mbarrier.try_wait.parity.shared.b64 complete, [%r645], %r644; 
2026-02-21T09:41:19.9896386Z @!complete bra.uni waitLoop; 2026-02-21T09:41:19.9896540Z } 2026-02-21T09:41:19.9896604Z 2026-02-21T09:41:19.9896657Z // end inline asm 2026-02-21T09:41:19.9896799Z add.s32 %r352, %r649, 1; 2026-02-21T09:41:19.9896951Z setp.gt.s32 %p128, %r352, 1; 2026-02-21T09:41:19.9897225Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9897515Z add.s32 %r354, %r643, 32; 2026-02-21T09:41:19.9897663Z add.s32 %r355, %r648, 1; 2026-02-21T09:41:19.9897818Z setp.gt.s32 %p129, %r355, 5; 2026-02-21T09:41:19.9897973Z selp.b32 %r648, 0, %r355, %p129; 2026-02-21T09:41:19.9898147Z selp.b32 %r643, 0, %r354, %p124; 2026-02-21T09:41:19.9898304Z shl.b32 %r356, %r648, 3; 2026-02-21T09:41:19.9898455Z add.s32 %r358, %r66, %r356; 2026-02-21T09:41:19.9898601Z add.s32 %r347, %r358, 77824; 2026-02-21T09:41:19.9898753Z bar.sync 0; 2026-02-21T09:41:19.9898887Z and.pred %p121, %p135, %p125; 2026-02-21T09:41:19.9899049Z // begin inline asm 2026-02-21T09:41:19.9899245Z @%p121 mbarrier.arrive.expect_tx.shared.b64 _, [%r347], 12288; 2026-02-21T09:41:19.9899460Z // end inline asm 2026-02-21T09:41:19.9899701Z .loc 1 51 31 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:51:31 2026-02-21T09:41:19.9899992Z shl.b32 %r359, %r648, 12; 2026-02-21T09:41:19.9900171Z add.s32 %r360, %r66, %r359; 2026-02-21T09:41:19.9900324Z add.s32 %r344, %r360, 49152; 2026-02-21T09:41:19.9900475Z bar.sync 0; 2026-02-21T09:41:19.9900607Z elect.sync %r361|%p130, -1; 2026-02-21T09:41:19.9900772Z and.pred %p131, %p125, %p130; 2026-02-21T09:41:19.9900932Z and.pred %p122, %p4, %p131; 2026-02-21T09:41:19.9901081Z // begin inline asm 2026-02-21T09:41:19.9901406Z @%p122 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r344], [%rd40, {%r643, %r657}], [%r347]; 2026-02-21T09:41:19.9901758Z // end inline asm 2026-02-21T09:41:19.9901996Z .loc 1 52 44 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:52:44 
2026-02-21T09:41:19.9902264Z shl.b32 %r362, %r648, 13; 2026-02-21T09:41:19.9902415Z add.s32 %r348, %r66, %r362; 2026-02-21T09:41:19.9902556Z bar.sync 0; 2026-02-21T09:41:19.9902692Z elect.sync %r363|%p132, -1; 2026-02-21T09:41:19.9902852Z and.pred %p133, %p125, %p132; 2026-02-21T09:41:19.9903007Z and.pred %p123, %p4, %p133; 2026-02-21T09:41:19.9903161Z // begin inline asm 2026-02-21T09:41:19.9903472Z @%p123 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r348], [%rd41, {%r643, %r656}], [%r347]; 2026-02-21T09:41:19.9903828Z // end inline asm 2026-02-21T09:41:19.9904065Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9904356Z setp.ne.b32 %p144, %r642, 31; 2026-02-21T09:41:19.9904519Z @%p144 bra $L__BB0_10; 2026-02-21T09:41:19.9904733Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:19.9905044Z .loc 1 40 32 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:40:32 2026-02-21T09:41:19.9905324Z add.s32 %r506, %r638, %r5; 2026-02-21T09:41:19.9905582Z .loc 1 42 32 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:42:32 2026-02-21T09:41:19.9905857Z add.s32 %r507, %r634, %r6; 2026-02-21T09:41:19.9906018Z add.s32 %r508, %r7, %r634; 2026-02-21T09:41:19.9906195Z add.s32 %r509, %r8, %r634; 2026-02-21T09:41:19.9906363Z add.s32 %r510, %r9, %r634; 2026-02-21T09:41:19.9906517Z add.s32 %r511, %r10, %r634; 2026-02-21T09:41:19.9906664Z add.s32 %r512, %r11, %r634; 2026-02-21T09:41:19.9906818Z add.s32 %r513, %r12, %r634; 2026-02-21T09:41:19.9906963Z add.s32 %r514, %r634, %r13; 2026-02-21T09:41:19.9907218Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9907489Z // begin inline asm 2026-02-21T09:41:19.9907629Z 2026-02-21T09:41:19.9907738Z { 2026-02-21T09:41:19.9907861Z .reg .pred complete; 2026-02-21T09:41:19.9908004Z waitLoop: 2026-02-21T09:41:19.9908183Z mbarrier.try_wait.parity.shared.b64 complete, [%r619], %r620; 
2026-02-21T09:41:19.9908412Z @!complete bra.uni waitLoop; 2026-02-21T09:41:19.9908555Z } 2026-02-21T09:41:19.9908624Z 2026-02-21T09:41:19.9908677Z // end inline asm 2026-02-21T09:41:19.9908911Z .loc 1 56 53 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:56:53 2026-02-21T09:41:19.9909214Z mad.lo.s32 %r515, %r507, 12288, %r506; 2026-02-21T09:41:19.9909389Z mad.lo.s32 %r516, %r508, 12288, %r506; 2026-02-21T09:41:19.9909567Z mad.lo.s32 %r517, %r509, 12288, %r506; 2026-02-21T09:41:19.9909738Z mad.lo.s32 %r518, %r510, 12288, %r506; 2026-02-21T09:41:19.9909904Z mad.lo.s32 %r519, %r511, 12288, %r506; 2026-02-21T09:41:19.9910075Z mad.lo.s32 %r520, %r512, 12288, %r506; 2026-02-21T09:41:19.9910239Z mad.lo.s32 %r521, %r513, 12288, %r506; 2026-02-21T09:41:19.9910410Z mad.lo.s32 %r522, %r514, 12288, %r506; 2026-02-21T09:41:19.9910674Z .loc 1 56 24 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:56:24 2026-02-21T09:41:19.9910966Z mad.wide.s32 %rd74, %r515, 2, %rd3; 2026-02-21T09:41:19.9911145Z mad.wide.s32 %rd75, %r516, 2, %rd3; 2026-02-21T09:41:19.9911312Z mad.wide.s32 %rd76, %r517, 2, %rd3; 2026-02-21T09:41:19.9911533Z mad.wide.s32 %rd77, %r518, 2, %rd3; 2026-02-21T09:41:19.9911698Z mad.wide.s32 %rd78, %r519, 2, %rd3; 2026-02-21T09:41:19.9911868Z mad.wide.s32 %rd79, %r520, 2, %rd3; 2026-02-21T09:41:19.9912027Z mad.wide.s32 %rd80, %r521, 2, %rd3; 2026-02-21T09:41:19.9912194Z mad.wide.s32 %rd81, %r522, 2, %rd3; 2026-02-21T09:41:19.9912451Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9912729Z // begin inline asm 2026-02-21T09:41:19.9913115Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381}, [%r83 + 0], 64; 2026-02-21T09:41:19.9913529Z // end inline asm 2026-02-21T09:41:19.9913673Z // begin inline asm 2026-02-21T09:41:19.9914040Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r383, %r384, %r385, %r386, 
%r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398}, [%r83 + 16], 64; 2026-02-21T09:41:19.9914442Z // end inline asm 2026-02-21T09:41:19.9914583Z // begin inline asm 2026-02-21T09:41:19.9914980Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415}, [%r83 + 32], 64; 2026-02-21T09:41:19.9915383Z // end inline asm 2026-02-21T09:41:19.9915523Z // begin inline asm 2026-02-21T09:41:19.9915909Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432}, [%r83 + 48], 64; 2026-02-21T09:41:19.9916321Z // end inline asm 2026-02-21T09:41:19.9916467Z // begin inline asm 2026-02-21T09:41:19.9916628Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:19.9916805Z // end inline asm 2026-02-21T09:41:19.9916956Z cvt.u64.u32 %rd82, %r366; 2026-02-21T09:41:19.9917123Z cvt.u64.u32 %rd83, %r367; 2026-02-21T09:41:19.9917295Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:19.9917458Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:19.9917751Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9918101Z mov.b64 {%r523, %r524}, %rd85; 2026-02-21T09:41:19.9918287Z cvt.rn.f16x2.f32 %r525, %r524, %r523; 2026-02-21T09:41:19.9918585Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9918879Z cvt.u64.u32 %rd86, %r368; 2026-02-21T09:41:19.9919039Z cvt.u64.u32 %rd87, %r369; 2026-02-21T09:41:19.9919189Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:19.9919349Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:19.9919611Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9919919Z mov.b64 {%r526, %r527}, %rd89; 2026-02-21T09:41:19.9920090Z cvt.rn.f16x2.f32 %r528, %r527, %r526; 2026-02-21T09:41:19.9920380Z .loc 1 53 52 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9920671Z cvt.u64.u32 %rd90, %r370; 2026-02-21T09:41:19.9920823Z cvt.u64.u32 %rd91, %r371; 2026-02-21T09:41:19.9920984Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:19.9921134Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:19.9921412Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9921686Z mov.b64 {%r529, %r530}, %rd93; 2026-02-21T09:41:19.9921853Z cvt.rn.f16x2.f32 %r531, %r530, %r529; 2026-02-21T09:41:19.9922118Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9922395Z cvt.u64.u32 %rd94, %r372; 2026-02-21T09:41:19.9922546Z cvt.u64.u32 %rd95, %r373; 2026-02-21T09:41:19.9922688Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:19.9922837Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:19.9923082Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9923390Z mov.b64 {%r532, %r533}, %rd97; 2026-02-21T09:41:19.9923568Z cvt.rn.f16x2.f32 %r534, %r533, %r532; 2026-02-21T09:41:19.9923844Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9924121Z cvt.u64.u32 %rd98, %r374; 2026-02-21T09:41:19.9924264Z cvt.u64.u32 %rd99, %r375; 2026-02-21T09:41:19.9924416Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:19.9924567Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:41:19.9924857Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9925142Z mov.b64 {%r535, %r536}, %rd101; 2026-02-21T09:41:19.9925311Z cvt.rn.f16x2.f32 %r537, %r536, %r535; 2026-02-21T09:41:19.9925576Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9925860Z cvt.u64.u32 %rd102, %r376; 2026-02-21T09:41:19.9926017Z cvt.u64.u32 %rd103, %r377; 2026-02-21T09:41:19.9926164Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:19.9926323Z or.b64 %rd105, %rd102, %rd104; 
2026-02-21T09:41:19.9926579Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9926862Z mov.b64 {%r538, %r539}, %rd105; 2026-02-21T09:41:19.9927021Z cvt.rn.f16x2.f32 %r540, %r539, %r538; 2026-02-21T09:41:19.9927295Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9927583Z cvt.u64.u32 %rd106, %r378; 2026-02-21T09:41:19.9927733Z cvt.u64.u32 %rd107, %r379; 2026-02-21T09:41:19.9927888Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:41:19.9928040Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:19.9928307Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9928582Z mov.b64 {%r541, %r542}, %rd109; 2026-02-21T09:41:19.9928754Z cvt.rn.f16x2.f32 %r543, %r542, %r541; 2026-02-21T09:41:19.9929028Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9929343Z cvt.u64.u32 %rd110, %r380; 2026-02-21T09:41:19.9929537Z cvt.u64.u32 %rd111, %r381; 2026-02-21T09:41:19.9929683Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:19.9929840Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:19.9930092Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9930370Z mov.b64 {%r544, %r545}, %rd113; 2026-02-21T09:41:19.9930528Z cvt.rn.f16x2.f32 %r546, %r545, %r544; 2026-02-21T09:41:19.9930800Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9931091Z cvt.u64.u32 %rd114, %r383; 2026-02-21T09:41:19.9931237Z cvt.u64.u32 %rd115, %r384; 2026-02-21T09:41:19.9931390Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:19.9931539Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:19.9931798Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9932073Z mov.b64 {%r547, %r548}, %rd117; 2026-02-21T09:41:19.9932241Z cvt.rn.f16x2.f32 %r549, %r548, %r547; 2026-02-21T09:41:19.9932509Z .loc 1 53 52 // 
crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9932794Z cvt.u64.u32 %rd118, %r385; 2026-02-21T09:41:19.9932947Z cvt.u64.u32 %rd119, %r386; 2026-02-21T09:41:19.9933092Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:19.9933248Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:19.9933505Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9933782Z mov.b64 {%r550, %r551}, %rd121; 2026-02-21T09:41:19.9933939Z cvt.rn.f16x2.f32 %r552, %r551, %r550; 2026-02-21T09:41:19.9934211Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9934486Z cvt.u64.u32 %rd122, %r387; 2026-02-21T09:41:19.9934654Z cvt.u64.u32 %rd123, %r388; 2026-02-21T09:41:19.9934861Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:19.9935013Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:19.9935274Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9935555Z mov.b64 {%r553, %r554}, %rd125; 2026-02-21T09:41:19.9935719Z cvt.rn.f16x2.f32 %r555, %r554, %r553; 2026-02-21T09:41:19.9935982Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9936258Z cvt.u64.u32 %rd126, %r389; 2026-02-21T09:41:19.9936410Z cvt.u64.u32 %rd127, %r390; 2026-02-21T09:41:19.9936554Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:19.9936707Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:41:19.9936958Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9937238Z mov.b64 {%r556, %r557}, %rd129; 2026-02-21T09:41:19.9937398Z cvt.rn.f16x2.f32 %r558, %r557, %r556; 2026-02-21T09:41:19.9937667Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9937944Z cvt.u64.u32 %rd130, %r391; 2026-02-21T09:41:19.9938094Z cvt.u64.u32 %rd131, %r392; 2026-02-21T09:41:19.9938245Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:19.9938392Z or.b64 %rd133, %rd130, 
%rd132; 2026-02-21T09:41:19.9938652Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9938936Z mov.b64 {%r559, %r560}, %rd133; 2026-02-21T09:41:19.9939101Z cvt.rn.f16x2.f32 %r561, %r560, %r559; 2026-02-21T09:41:19.9939361Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9939643Z cvt.u64.u32 %rd134, %r393; 2026-02-21T09:41:19.9939799Z cvt.u64.u32 %rd135, %r394; 2026-02-21T09:41:19.9939947Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:41:19.9940108Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:19.9940358Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9940691Z mov.b64 {%r562, %r563}, %rd137; 2026-02-21T09:41:19.9940849Z cvt.rn.f16x2.f32 %r564, %r563, %r562; 2026-02-21T09:41:19.9941120Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9941396Z cvt.u64.u32 %rd138, %r395; 2026-02-21T09:41:19.9941543Z cvt.u64.u32 %rd139, %r396; 2026-02-21T09:41:19.9941696Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:19.9941846Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:19.9942104Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9942381Z mov.b64 {%r565, %r566}, %rd141; 2026-02-21T09:41:19.9942544Z cvt.rn.f16x2.f32 %r567, %r566, %r565; 2026-02-21T09:41:19.9942811Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9943093Z cvt.u64.u32 %rd142, %r397; 2026-02-21T09:41:19.9943250Z cvt.u64.u32 %rd143, %r398; 2026-02-21T09:41:19.9943399Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:41:19.9943555Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:41:19.9943817Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9944097Z mov.b64 {%r568, %r569}, %rd145; 2026-02-21T09:41:19.9944256Z cvt.rn.f16x2.f32 %r570, %r569, %r568; 2026-02-21T09:41:19.9944525Z .loc 1 53 52 
// crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9944844Z cvt.u64.u32 %rd146, %r400; 2026-02-21T09:41:19.9944993Z cvt.u64.u32 %rd147, %r401; 2026-02-21T09:41:19.9945146Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:41:19.9945295Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:41:19.9945563Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9945864Z mov.b64 {%r571, %r572}, %rd149; 2026-02-21T09:41:19.9946050Z cvt.rn.f16x2.f32 %r573, %r572, %r571; 2026-02-21T09:41:19.9946318Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9946592Z cvt.u64.u32 %rd150, %r402; 2026-02-21T09:41:19.9946745Z cvt.u64.u32 %rd151, %r403; 2026-02-21T09:41:19.9946889Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:41:19.9947043Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:41:19.9947293Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9947570Z mov.b64 {%r574, %r575}, %rd153; 2026-02-21T09:41:19.9947727Z cvt.rn.f16x2.f32 %r576, %r575, %r574; 2026-02-21T09:41:19.9948005Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9948288Z cvt.u64.u32 %rd154, %r404; 2026-02-21T09:41:19.9948434Z cvt.u64.u32 %rd155, %r405; 2026-02-21T09:41:19.9948588Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:41:19.9948739Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:41:19.9949003Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9949278Z mov.b64 {%r577, %r578}, %rd157; 2026-02-21T09:41:19.9949443Z cvt.rn.f16x2.f32 %r579, %r578, %r577; 2026-02-21T09:41:19.9949706Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9950000Z cvt.u64.u32 %rd158, %r406; 2026-02-21T09:41:19.9950155Z cvt.u64.u32 %rd159, %r407; 2026-02-21T09:41:19.9950301Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:41:19.9950462Z or.b64 %rd161, %rd158, 
%rd160; 2026-02-21T09:41:19.9950716Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9950999Z mov.b64 {%r580, %r581}, %rd161; 2026-02-21T09:41:19.9951157Z cvt.rn.f16x2.f32 %r582, %r581, %r580; 2026-02-21T09:41:19.9951431Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9951765Z cvt.u64.u32 %rd162, %r408; 2026-02-21T09:41:19.9951911Z cvt.u64.u32 %rd163, %r409; 2026-02-21T09:41:19.9952066Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:41:19.9952216Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:41:19.9952481Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9952766Z mov.b64 {%r583, %r584}, %rd165; 2026-02-21T09:41:19.9952933Z cvt.rn.f16x2.f32 %r585, %r584, %r583; 2026-02-21T09:41:19.9953203Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9953481Z cvt.u64.u32 %rd166, %r410; 2026-02-21T09:41:19.9953638Z cvt.u64.u32 %rd167, %r411; 2026-02-21T09:41:19.9953786Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:41:19.9953946Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:41:19.9954203Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9954489Z mov.b64 {%r586, %r587}, %rd169; 2026-02-21T09:41:19.9954649Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T09:41:19.9954954Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9955238Z cvt.u64.u32 %rd170, %r412; 2026-02-21T09:41:19.9955387Z cvt.u64.u32 %rd171, %r413; 2026-02-21T09:41:19.9955543Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:41:19.9955699Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:41:19.9955978Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9956268Z mov.b64 {%r589, %r590}, %rd173; 2026-02-21T09:41:19.9956441Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T09:41:19.9956721Z .loc 1 53 52 
// crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9957045Z cvt.u64.u32 %rd174, %r414; 2026-02-21T09:41:19.9957232Z cvt.u64.u32 %rd175, %r415; 2026-02-21T09:41:19.9957391Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:41:19.9957557Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:41:19.9957830Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9958137Z mov.b64 {%r592, %r593}, %rd177; 2026-02-21T09:41:19.9958305Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T09:41:19.9958601Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9958896Z cvt.u64.u32 %rd178, %r417; 2026-02-21T09:41:19.9959049Z cvt.u64.u32 %rd179, %r418; 2026-02-21T09:41:19.9959209Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:41:19.9959364Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:41:19.9959641Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9959937Z mov.b64 {%r595, %r596}, %rd181; 2026-02-21T09:41:19.9960114Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T09:41:19.9960395Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9960699Z cvt.u64.u32 %rd182, %r419; 2026-02-21T09:41:19.9960861Z cvt.u64.u32 %rd183, %r420; 2026-02-21T09:41:19.9961016Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:41:19.9961180Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:41:19.9961454Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9961756Z mov.b64 {%r598, %r599}, %rd185; 2026-02-21T09:41:19.9961934Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T09:41:19.9962225Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9962521Z cvt.u64.u32 %rd186, %r421; 2026-02-21T09:41:19.9962675Z cvt.u64.u32 %rd187, %r422; 2026-02-21T09:41:19.9962836Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:41:19.9962999Z or.b64 %rd189, %rd186, 
%rd188; 2026-02-21T09:41:19.9963268Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9963587Z mov.b64 {%r601, %r602}, %rd189; 2026-02-21T09:41:19.9963754Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T09:41:19.9964017Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9964295Z cvt.u64.u32 %rd190, %r423; 2026-02-21T09:41:19.9964449Z cvt.u64.u32 %rd191, %r424; 2026-02-21T09:41:19.9964594Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:41:19.9964954Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:41:19.9965220Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9965510Z mov.b64 {%r604, %r605}, %rd193; 2026-02-21T09:41:19.9965671Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 2026-02-21T09:41:19.9965952Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9966241Z cvt.u64.u32 %rd194, %r425; 2026-02-21T09:41:19.9966396Z cvt.u64.u32 %rd195, %r426; 2026-02-21T09:41:19.9966553Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:41:19.9966705Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:41:19.9966977Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9967258Z mov.b64 {%r607, %r608}, %rd197; 2026-02-21T09:41:19.9967425Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T09:41:19.9967699Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9967989Z cvt.u64.u32 %rd198, %r427; 2026-02-21T09:41:19.9968142Z cvt.u64.u32 %rd199, %r428; 2026-02-21T09:41:19.9968287Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:41:19.9968442Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:41:19.9968736Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9969043Z mov.b64 {%r610, %r611}, %rd201; 2026-02-21T09:41:19.9969204Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T09:41:19.9969481Z .loc 1 53 52 
// crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9969767Z cvt.u64.u32 %rd202, %r429; 2026-02-21T09:41:19.9969914Z cvt.u64.u32 %rd203, %r430; 2026-02-21T09:41:19.9969970Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:41:19.9970036Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:41:19.9970196Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9970253Z mov.b64 {%r613, %r614}, %rd205; 2026-02-21T09:41:19.9970321Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T09:41:19.9970478Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9970535Z cvt.u64.u32 %rd206, %r431; 2026-02-21T09:41:19.9970597Z cvt.u64.u32 %rd207, %r432; 2026-02-21T09:41:19.9970655Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:41:19.9970713Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:41:19.9970877Z .loc 1 55 27 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:55:27 2026-02-21T09:41:19.9970940Z mov.b64 {%r616, %r617}, %rd209; 2026-02-21T09:41:19.9971001Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T09:41:19.9971157Z .loc 1 56 83 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:56:83 2026-02-21T09:41:19.9971258Z st.shared.v4.b32 [%r21], {%r525, %r537, %r549, %r561}; 2026-02-21T09:41:19.9971348Z st.shared.v4.b32 [%r22], {%r573, %r585, %r597, %r609}; 2026-02-21T09:41:19.9971402Z bar.sync 0; 2026-02-21T09:41:19.9971467Z // begin inline asm 2026-02-21T09:41:19.9971615Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r474, %r478, %r482, %r486}, [%r438]; 2026-02-21T09:41:19.9971670Z // end inline asm 2026-02-21T09:41:19.9971725Z // begin inline asm 2026-02-21T09:41:19.9971880Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r490, %r494, %r498, %r502}, [%r443]; 2026-02-21T09:41:19.9971973Z // end inline asm 2026-02-21T09:41:19.9972049Z bar.sync 0; 2026-02-21T09:41:19.9972144Z st.shared.v4.b32 [%r21], {%r528, %r540, %r552, %r564}; 2026-02-21T09:41:19.9972229Z st.shared.v4.b32 
[%r22], {%r576, %r588, %r600, %r612}; 2026-02-21T09:41:19.9972284Z bar.sync 0; 2026-02-21T09:41:19.9972339Z // begin inline asm 2026-02-21T09:41:19.9972486Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r475, %r479, %r483, %r487}, [%r438]; 2026-02-21T09:41:19.9972539Z // end inline asm 2026-02-21T09:41:19.9972594Z // begin inline asm 2026-02-21T09:41:19.9972735Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r491, %r495, %r499, %r503}, [%r443]; 2026-02-21T09:41:19.9972788Z // end inline asm 2026-02-21T09:41:19.9972842Z bar.sync 0; 2026-02-21T09:41:19.9972931Z st.shared.v4.b32 [%r21], {%r531, %r543, %r555, %r567}; 2026-02-21T09:41:19.9973012Z st.shared.v4.b32 [%r22], {%r579, %r591, %r603, %r615}; 2026-02-21T09:41:19.9973064Z bar.sync 0; 2026-02-21T09:41:19.9973118Z // begin inline asm 2026-02-21T09:41:19.9973264Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r476, %r480, %r484, %r488}, [%r438]; 2026-02-21T09:41:19.9973319Z // end inline asm 2026-02-21T09:41:19.9973375Z // begin inline asm 2026-02-21T09:41:19.9973521Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r492, %r496, %r500, %r504}, [%r443]; 2026-02-21T09:41:19.9973576Z // end inline asm 2026-02-21T09:41:19.9973627Z bar.sync 0; 2026-02-21T09:41:19.9973710Z st.shared.v4.b32 [%r21], {%r534, %r546, %r558, %r570}; 2026-02-21T09:41:19.9973799Z st.shared.v4.b32 [%r22], {%r582, %r594, %r606, %r618}; 2026-02-21T09:41:19.9973850Z bar.sync 0; 2026-02-21T09:41:19.9973902Z // begin inline asm 2026-02-21T09:41:19.9974040Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r477, %r481, %r485, %r489}, [%r438]; 2026-02-21T09:41:19.9974093Z // end inline asm 2026-02-21T09:41:19.9974145Z // begin inline asm 2026-02-21T09:41:19.9974325Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r493, %r497, %r501, %r505}, [%r443]; 2026-02-21T09:41:19.9974381Z // end inline asm 2026-02-21T09:41:19.9974436Z // begin inline asm 2026-02-21T09:41:19.9974536Z st.global.v4.b32 [ %rd74 + 0 ], { %r474, %r475, %r476, %r477 }; 2026-02-21T09:41:19.9974596Z // end inline asm 
2026-02-21T09:41:19.9974650Z // begin inline asm 2026-02-21T09:41:19.9974774Z st.global.v4.b32 [ %rd75 + 0 ], { %r478, %r479, %r480, %r481 }; 2026-02-21T09:41:19.9974835Z // end inline asm 2026-02-21T09:41:19.9974889Z // begin inline asm 2026-02-21T09:41:19.9974979Z st.global.v4.b32 [ %rd76 + 0 ], { %r482, %r483, %r484, %r485 }; 2026-02-21T09:41:19.9975031Z // end inline asm 2026-02-21T09:41:19.9975093Z // begin inline asm 2026-02-21T09:41:19.9975184Z st.global.v4.b32 [ %rd77 + 0 ], { %r486, %r487, %r488, %r489 }; 2026-02-21T09:41:19.9975237Z // end inline asm 2026-02-21T09:41:19.9975300Z // begin inline asm 2026-02-21T09:41:19.9975390Z st.global.v4.b32 [ %rd78 + 0 ], { %r490, %r491, %r492, %r493 }; 2026-02-21T09:41:19.9975444Z // end inline asm 2026-02-21T09:41:19.9975500Z // begin inline asm 2026-02-21T09:41:19.9975598Z st.global.v4.b32 [ %rd79 + 0 ], { %r494, %r495, %r496, %r497 }; 2026-02-21T09:41:19.9975652Z // end inline asm 2026-02-21T09:41:19.9975706Z // begin inline asm 2026-02-21T09:41:19.9975801Z st.global.v4.b32 [ %rd80 + 0 ], { %r498, %r499, %r500, %r501 }; 2026-02-21T09:41:19.9975854Z // end inline asm 2026-02-21T09:41:19.9975906Z // begin inline asm 2026-02-21T09:41:19.9976002Z st.global.v4.b32 [ %rd81 + 0 ], { %r502, %r503, %r504, %r505 }; 2026-02-21T09:41:19.9976055Z // end inline asm 2026-02-21T09:41:19.9976111Z bra.uni $L__BB0_10; 2026-02-21T09:41:19.9976192Z $L__BB0_11: // %._crit_edge 2026-02-21T09:41:19.9976366Z .loc 1 53 52 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:53:52 2026-02-21T09:41:19.9976420Z // begin inline asm 2026-02-21T09:41:19.9976468Z 2026-02-21T09:41:19.9976523Z { 2026-02-21T09:41:19.9976583Z .reg .pred complete; 2026-02-21T09:41:19.9976637Z waitLoop: 2026-02-21T09:41:19.9976776Z mbarrier.try_wait.parity.shared.b64 complete, [%r619], %r620; 2026-02-21T09:41:19.9976869Z @!complete bra.uni waitLoop; 2026-02-21T09:41:19.9976917Z } 2026-02-21T09:41:19.9976921Z 2026-02-21T09:41:19.9976975Z // end inline asm 
2026-02-21T09:41:19.9977069Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:19.9977242Z .loc 1 28 108 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:108 2026-02-21T09:41:19.9977293Z bar.sync 0; 2026-02-21T09:41:19.9977352Z // begin inline asm 2026-02-21T09:41:19.9977440Z @%p135 mbarrier.inval.shared::cta.b64 [%r153]; 2026-02-21T09:41:19.9977492Z // end inline asm 2026-02-21T09:41:19.9977542Z bar.sync 0; 2026-02-21T09:41:19.9977601Z // begin inline asm 2026-02-21T09:41:19.9977683Z @%p135 mbarrier.inval.shared::cta.b64 [%r154]; 2026-02-21T09:41:19.9977735Z // end inline asm 2026-02-21T09:41:19.9977795Z bar.sync 0; 2026-02-21T09:41:19.9977849Z // begin inline asm 2026-02-21T09:41:19.9977926Z @%p135 mbarrier.inval.shared::cta.b64 [%r155]; 2026-02-21T09:41:19.9977979Z // end inline asm 2026-02-21T09:41:19.9978036Z bar.sync 0; 2026-02-21T09:41:19.9978089Z // begin inline asm 2026-02-21T09:41:19.9978163Z @%p135 mbarrier.inval.shared::cta.b64 [%r156]; 2026-02-21T09:41:19.9978222Z // end inline asm 2026-02-21T09:41:19.9978272Z bar.sync 0; 2026-02-21T09:41:19.9978325Z // begin inline asm 2026-02-21T09:41:19.9978405Z @%p135 mbarrier.inval.shared::cta.b64 [%r157]; 2026-02-21T09:41:19.9978457Z // end inline asm 2026-02-21T09:41:19.9978507Z bar.sync 0; 2026-02-21T09:41:19.9978560Z // begin inline asm 2026-02-21T09:41:19.9978643Z @%p135 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:41:19.9978696Z // end inline asm 2026-02-21T09:41:19.9978748Z // begin inline asm 2026-02-21T09:41:19.9978828Z @%p135 mbarrier.inval.shared::cta.b64 [%r151]; 2026-02-21T09:41:19.9978880Z // end inline asm 2026-02-21T09:41:19.9978954Z bar.sync 0; 2026-02-21T09:41:19.9979030Z // begin inline asm 2026-02-21T09:41:19.9979116Z @%p135 mbarrier.inval.shared::cta.b64 [%r152]; 2026-02-21T09:41:19.9979170Z // end inline asm 2026-02-21T09:41:19.9979334Z .loc 1 28 4 // crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py:28:4 2026-02-21T09:41:19.9979396Z bar.sync 0; 
2026-02-21T09:41:19.9979450Z // begin inline asm 2026-02-21T09:41:19.9979563Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r629, 128; 2026-02-21T09:41:19.9979627Z // end inline asm 2026-02-21T09:41:19.9979678Z ret; 2026-02-21T09:41:19.9979730Z $L__tmp1: 2026-02-21T09:41:19.9979783Z $L__func_end0: 2026-02-21T09:41:19.9979870Z // -- End function 2026-02-21T09:41:19.9979918Z } 2026-02-21T09:41:19.9980124Z .file 1 "/tmp/torchinductor_root/rk/crkgwobcln43ywnp4o4jozhemoa7ufssfyrvr57455cu6boh3uni.py" 2026-02-21T09:41:19.9980191Z .section .debug_abbrev 2026-02-21T09:41:19.9980241Z { 2026-02-21T09:41:19.9980327Z .b8 1 // Abbreviation Code 2026-02-21T09:41:19.9980411Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:19.9980497Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:19.9980572Z .b8 37 // DW_AT_producer 2026-02-21T09:41:19.9980644Z .b8 8 // DW_FORM_string 2026-02-21T09:41:19.9980720Z .b8 19 // DW_AT_language 2026-02-21T09:41:19.9980791Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:19.9980862Z .b8 3 // DW_AT_name 2026-02-21T09:41:19.9980939Z .b8 8 // DW_FORM_string 2026-02-21T09:41:19.9981013Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:19.9981083Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:19.9981157Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:19.9981235Z .b8 8 // DW_FORM_string 2026-02-21T09:41:19.9981347Z .b8 0 // EOM(1) 2026-02-21T09:41:19.9981414Z .b8 0 // EOM(2) 2026-02-21T09:41:19.9981484Z .b8 0 // EOM(3) 2026-02-21T09:41:19.9981534Z } 2026-02-21T09:41:19.9981590Z .section .debug_info 2026-02-21T09:41:19.9981644Z { 2026-02-21T09:41:19.9981722Z .b32 104 // Length of Unit 2026-02-21T09:41:19.9981804Z .b8 2 // DWARF version number 2026-02-21T09:41:19.9981854Z .b8 0 2026-02-21T09:41:19.9981975Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:41:19.9982062Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:19.9982159Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:19.9982243Z .b8 116 // DW_AT_producer 2026-02-21T09:41:19.9982296Z .b8 114 2026-02-21T09:41:19.9982348Z .b8 105 2026-02-21T09:41:19.9982404Z .b8 116 2026-02-21T09:41:19.9982453Z .b8 111 2026-02-21T09:41:19.9982503Z .b8 110 2026-02-21T09:41:19.9982552Z .b8 0 2026-02-21T09:41:19.9982628Z .b8 2 // DW_AT_language 2026-02-21T09:41:19.9982679Z .b8 0 2026-02-21T09:41:19.9982751Z .b8 99 // DW_AT_name 2026-02-21T09:41:19.9982807Z .b8 114 2026-02-21T09:41:19.9982855Z .b8 107 2026-02-21T09:41:19.9982902Z .b8 103 2026-02-21T09:41:19.9982949Z .b8 119 2026-02-21T09:41:19.9983005Z .b8 111 2026-02-21T09:41:19.9983053Z .b8 98 2026-02-21T09:41:19.9983101Z .b8 99 2026-02-21T09:41:19.9983148Z .b8 108 2026-02-21T09:41:19.9983203Z .b8 110 2026-02-21T09:41:19.9983252Z .b8 52 2026-02-21T09:41:19.9983299Z .b8 51 2026-02-21T09:41:19.9983352Z .b8 121 2026-02-21T09:41:19.9983400Z .b8 119 2026-02-21T09:41:19.9983447Z .b8 110 2026-02-21T09:41:19.9983494Z .b8 112 2026-02-21T09:41:19.9983567Z .b8 52 2026-02-21T09:41:19.9983631Z .b8 111 2026-02-21T09:41:19.9983681Z .b8 52 2026-02-21T09:41:19.9983738Z .b8 106 2026-02-21T09:41:19.9983789Z .b8 111 2026-02-21T09:41:19.9983840Z .b8 122 2026-02-21T09:41:19.9983891Z .b8 104 2026-02-21T09:41:19.9983949Z .b8 101 2026-02-21T09:41:19.9984000Z .b8 109 2026-02-21T09:41:19.9984049Z .b8 111 2026-02-21T09:41:19.9984098Z .b8 97 2026-02-21T09:41:19.9984155Z .b8 55 2026-02-21T09:41:19.9984206Z .b8 117 2026-02-21T09:41:19.9984255Z .b8 102 2026-02-21T09:41:19.9984312Z .b8 115 2026-02-21T09:41:19.9984363Z .b8 115 2026-02-21T09:41:19.9984412Z .b8 102 2026-02-21T09:41:19.9984463Z .b8 121 2026-02-21T09:41:19.9984522Z .b8 114 2026-02-21T09:41:19.9984573Z .b8 118 2026-02-21T09:41:19.9984622Z .b8 114 2026-02-21T09:41:19.9984711Z .b8 53 2026-02-21T09:41:19.9984760Z .b8 55 2026-02-21T09:41:19.9984809Z .b8 52 
2026-02-21T09:41:19.9984857Z .b8 53 2026-02-21T09:41:19.9984913Z .b8 53 2026-02-21T09:41:19.9984962Z .b8 99 2026-02-21T09:41:19.9985012Z .b8 117 2026-02-21T09:41:19.9985068Z .b8 54 2026-02-21T09:41:19.9985117Z .b8 98 2026-02-21T09:41:19.9985166Z .b8 111 2026-02-21T09:41:19.9985217Z .b8 104 2026-02-21T09:41:19.9985273Z .b8 51 2026-02-21T09:41:19.9985322Z .b8 117 2026-02-21T09:41:19.9985371Z .b8 110 2026-02-21T09:41:19.9985419Z .b8 105 2026-02-21T09:41:19.9985477Z .b8 46 2026-02-21T09:41:19.9985527Z .b8 112 2026-02-21T09:41:19.9985575Z .b8 121 2026-02-21T09:41:19.9985635Z .b8 0 2026-02-21T09:41:19.9985724Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:19.9985801Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:19.9985853Z .b8 116 2026-02-21T09:41:19.9985914Z .b8 109 2026-02-21T09:41:19.9985967Z .b8 112 2026-02-21T09:41:19.9986019Z .b8 47 2026-02-21T09:41:19.9986076Z .b8 116 2026-02-21T09:41:19.9986125Z .b8 111 2026-02-21T09:41:19.9986172Z .b8 114 2026-02-21T09:41:19.9986221Z .b8 99 2026-02-21T09:41:19.9986276Z .b8 104 2026-02-21T09:41:19.9986324Z .b8 105 2026-02-21T09:41:19.9986371Z .b8 110 2026-02-21T09:41:19.9986427Z .b8 100 2026-02-21T09:41:19.9986477Z .b8 117 2026-02-21T09:41:19.9986552Z .b8 99 2026-02-21T09:41:19.9986602Z .b8 116 2026-02-21T09:41:19.9986680Z .b8 111 2026-02-21T09:41:19.9986729Z .b8 114 2026-02-21T09:41:19.9986777Z .b8 95 2026-02-21T09:41:19.9986825Z .b8 114 2026-02-21T09:41:19.9986882Z .b8 111 2026-02-21T09:41:19.9986931Z .b8 111 2026-02-21T09:41:19.9986979Z .b8 116 2026-02-21T09:41:19.9987036Z .b8 47 2026-02-21T09:41:19.9987084Z .b8 114 2026-02-21T09:41:19.9987134Z .b8 107 2026-02-21T09:41:19.9987183Z .b8 0 2026-02-21T09:41:19.9987238Z } 2026-02-21T09:41:19.9987302Z .section .debug_macinfo { } 2026-02-21T09:41:19.9987307Z 2026-02-21T09:41:19.9987382Z ================================================================ 2026-02-21T09:41:19.9987489Z please share the reproducer above with Triton project. 
2026-02-21T09:41:20.2787759Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:20.2788042Z 2026-02-21T09:41:20.2790392Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:41:20.2791738Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:20.2791992Z `ptxas` stderr: 2026-02-21T09:41:20.2792415Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:20.2792903Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.2793050Z 2026-02-21T09:41:20.2793707Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpboo3ua74.ptx -o /tmp/tmpboo3ua74.ptx.o 2026-02-21T09:41:20.2794172Z 2026-02-21T09:41:20.2794299Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:20.2794494Z 2026-02-21T09:41:20.2794497Z 2026-02-21T09:41:20.2794577Z ================================================================ 2026-02-21T09:41:20.2794854Z Internal Triton PTX codegen error 2026-02-21T09:41:20.2795024Z `ptxas` stderr: 2026-02-21T09:41:20.2795426Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:20.2795893Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.2796043Z 2026-02-21T09:41:20.2796426Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpboo3ua74.ptx -o /tmp/tmpboo3ua74.ptx.o 2026-02-21T09:41:20.2796845Z 2026-02-21T09:41:20.2796849Z 2026-02-21T09:41:20.2796914Z // 2026-02-21T09:41:20.2797050Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:20.2797227Z // 2026-02-21T09:41:20.2797291Z 2026-02-21T09:41:20.2797345Z .version 8.7 2026-02-21T09:41:20.2797493Z .target sm_100a 2026-02-21T09:41:20.2797640Z .address_size 64 2026-02-21T09:41:20.2797726Z 2026-02-21T09:41:20.2797847Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:20.2798106Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:20.2798313Z // @_helion_matmul 2026-02-21T09:41:20.2798524Z .visible .entry _helion_matmul( 2026-02-21T09:41:20.2798733Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:20.2798986Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:20.2799407Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:20.2799645Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:20.2799889Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:20.2800180Z ) 2026-02-21T09:41:20.2800305Z .reqntid 256 2026-02-21T09:41:20.2800429Z .maxnreg 32 2026-02-21T09:41:20.2800555Z { 2026-02-21T09:41:20.2800676Z .reg .pred %p<143>; 2026-02-21T09:41:20.2800830Z .reg .b32 %r<497>; 2026-02-21T09:41:20.2800975Z .reg .b64 %rd<142>; 2026-02-21T09:41:20.2801238Z .loc 1 19 0 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:19:0 2026-02-21T09:41:20.2801527Z $L__func_begin0: 2026-02-21T09:41:20.2801771Z .loc 1 19 0 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:19:0 2026-02-21T09:41:20.2802007Z 
2026-02-21T09:41:20.2802058Z // %bb.0: 2026-02-21T09:41:20.2802206Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:20.2802393Z $L__tmp0: 2026-02-21T09:41:20.2802625Z .loc 1 19 0 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:19 2026-02-21T09:41:20.2802898Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:20.2803072Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:20.2803267Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:20.2803431Z mov.b32 %r60, global_smem; 2026-02-21T09:41:20.2803583Z // begin inline asm 2026-02-21T09:41:20.2803814Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r60], 128; 2026-02-21T09:41:20.2804053Z // end inline asm 2026-02-21T09:41:20.2804213Z ld.param.b64 %rd50, [_helion_matmul_param_3]; 2026-02-21T09:41:20.2804399Z bar.sync 0; 2026-02-21T09:41:20.2804537Z ld.shared.b32 %r467, [global_smem]; 2026-02-21T09:41:20.2804749Z bar.sync 0; 2026-02-21T09:41:20.2804878Z // begin inline asm 2026-02-21T09:41:20.2805081Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:20.2805295Z // end inline asm 2026-02-21T09:41:20.2805536Z .loc 1 21 67 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:21:67 2026-02-21T09:41:20.2805852Z mov.u32 %r496, %ctaid.x; 2026-02-21T09:41:20.2806045Z mov.u32 %r166, %ctaid.y; 2026-02-21T09:41:20.2806204Z mov.u32 %r167, %ctaid.z; 2026-02-21T09:41:20.2806353Z mov.u32 %r168, %nctaid.x; 2026-02-21T09:41:20.2806512Z mov.u32 %r169, %nctaid.y; 2026-02-21T09:41:20.2806673Z mad.lo.s32 %r170, %r167, %r169, %r166; 2026-02-21T09:41:20.2806870Z mad.lo.s32 %r171, %r170, %r168, %r496; 2026-02-21T09:41:20.2807041Z shl.b32 %r172, %r171, 8; 2026-02-21T09:41:20.2807202Z cvt.s64.s32 %rd51, %r172; 2026-02-21T09:41:20.2807357Z add.s64 %rd19, %rd50, %rd51; 2026-02-21T09:41:20.2807522Z shl.b32 %r173, %r1, 2; 2026-02-21T09:41:20.2807670Z add.s32 %r61, %r60, %r173; 2026-02-21T09:41:20.2807823Z mov.b32 %r70, 0; 2026-02-21T09:41:20.2807960Z // begin inline asm 
2026-02-21T09:41:20.2808107Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:20.2808281Z // end inline asm 2026-02-21T09:41:20.2808416Z bar.warp.sync -1; 2026-02-21T09:41:20.2808564Z setp.eq.b32 %p133, %r1, 0; 2026-02-21T09:41:20.2808719Z cvt.u64.u32 %rd4, %r60; 2026-02-21T09:41:20.2808871Z // begin inline asm 2026-02-21T09:41:20.2809114Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:20.2809403Z // end inline asm 2026-02-21T09:41:20.2809537Z // begin inline asm 2026-02-21T09:41:20.2809753Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.2810010Z // end inline asm 2026-02-21T09:41:20.2810138Z mov.b32 %r63, 32; 2026-02-21T09:41:20.2810276Z // begin inline asm 2026-02-21T09:41:20.2810502Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:20.2810768Z // end inline asm 2026-02-21T09:41:20.2810895Z mov.b32 %r64, 64; 2026-02-21T09:41:20.2811029Z // begin inline asm 2026-02-21T09:41:20.2811253Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r64; 2026-02-21T09:41:20.2811509Z // end inline asm 2026-02-21T09:41:20.2811643Z mov.b32 %r65, 1024; 2026-02-21T09:41:20.2811779Z // begin inline asm 2026-02-21T09:41:20.2812052Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:20.2812352Z // end inline asm 2026-02-21T09:41:20.2812488Z // begin inline asm 2026-02-21T09:41:20.2812723Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r65; 2026-02-21T09:41:20.2812994Z // end inline asm 2026-02-21T09:41:20.2813129Z mov.b64 %rd12, 2048; 2026-02-21T09:41:20.2813266Z // begin inline asm 2026-02-21T09:41:20.2813517Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.2813791Z // end inline asm 2026-02-21T09:41:20.2813924Z mov.b32 %r67, 1; 
2026-02-21T09:41:20.2814052Z // begin inline asm 2026-02-21T09:41:20.2814301Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:20.2814581Z // end inline asm 2026-02-21T09:41:20.2814741Z // begin inline asm 2026-02-21T09:41:20.2814991Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:20.2815268Z // end inline asm 2026-02-21T09:41:20.2815407Z // begin inline asm 2026-02-21T09:41:20.2815631Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.2815895Z // end inline asm 2026-02-21T09:41:20.2816024Z // begin inline asm 2026-02-21T09:41:20.2816271Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.2816553Z // end inline asm 2026-02-21T09:41:20.2816681Z // begin inline asm 2026-02-21T09:41:20.2816921Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.2817190Z // end inline asm 2026-02-21T09:41:20.2817335Z // begin inline asm 2026-02-21T09:41:20.2817552Z @%p133 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.2817838Z // end inline asm 2026-02-21T09:41:20.2818006Z // begin inline asm 2026-02-21T09:41:20.2818354Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.2818749Z // end inline asm 2026-02-21T09:41:20.2818885Z // begin inline asm 2026-02-21T09:41:20.2819104Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:20.2819357Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.2819558Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.2819742Z // end inline asm 2026-02-21T09:41:20.2819874Z bar.sync 0; 2026-02-21T09:41:20.2820021Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:20.2820302Z .loc 1 22 68 // 
clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:22:68 2026-02-21T09:41:20.2820599Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:20.2820752Z bar.sync 0; 2026-02-21T09:41:20.2820887Z // begin inline asm 2026-02-21T09:41:20.2821036Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:20.2821217Z // end inline asm 2026-02-21T09:41:20.2821356Z bar.warp.sync -1; 2026-02-21T09:41:20.2821503Z // begin inline asm 2026-02-21T09:41:20.2821761Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:20.2822047Z // end inline asm 2026-02-21T09:41:20.2822187Z // begin inline asm 2026-02-21T09:41:20.2822407Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.2822667Z // end inline asm 2026-02-21T09:41:20.2822800Z // begin inline asm 2026-02-21T09:41:20.2823044Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:20.2823316Z // end inline asm 2026-02-21T09:41:20.2823449Z mov.b32 %r72, 128; 2026-02-21T09:41:20.2823593Z // begin inline asm 2026-02-21T09:41:20.2823824Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r72; 2026-02-21T09:41:20.2824095Z // end inline asm 2026-02-21T09:41:20.2824231Z // begin inline asm 2026-02-21T09:41:20.2824481Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:20.2824844Z // end inline asm 2026-02-21T09:41:20.2824989Z mov.b32 %r74, 12288; 2026-02-21T09:41:20.2825142Z // begin inline asm 2026-02-21T09:41:20.2825392Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r74; 2026-02-21T09:41:20.2825673Z // end inline asm 2026-02-21T09:41:20.2825809Z // begin inline asm 2026-02-21T09:41:20.2826080Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.2826370Z // end inline asm 2026-02-21T09:41:20.2826520Z // begin inline asm 
2026-02-21T09:41:20.2826827Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:20.2827112Z // end inline asm 2026-02-21T09:41:20.2827257Z // begin inline asm 2026-02-21T09:41:20.2827516Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:20.2827799Z // end inline asm 2026-02-21T09:41:20.2827929Z // begin inline asm 2026-02-21T09:41:20.2828160Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.2828424Z // end inline asm 2026-02-21T09:41:20.2828553Z // begin inline asm 2026-02-21T09:41:20.2828803Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.2829086Z // end inline asm 2026-02-21T09:41:20.2829223Z // begin inline asm 2026-02-21T09:41:20.2829450Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.2829725Z // end inline asm 2026-02-21T09:41:20.2829855Z // begin inline asm 2026-02-21T09:41:20.2830083Z @%p133 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.2830349Z // end inline asm 2026-02-21T09:41:20.2830478Z // begin inline asm 2026-02-21T09:41:20.2830880Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.2831255Z // end inline asm 2026-02-21T09:41:20.2831392Z // begin inline asm 2026-02-21T09:41:20.2831595Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:20.2831844Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.2832032Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.2832201Z // end inline asm 2026-02-21T09:41:20.2832336Z bar.sync 0; 2026-02-21T09:41:20.2832467Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:20.2832738Z .loc 1 40 45 // 
clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:40:45 2026-02-21T09:41:20.2833022Z shr.u32 %r174, %r1, 5; 2026-02-21T09:41:20.2833285Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2833572Z sub.s32 %r176, 1536, %r496; 2026-02-21T09:41:20.2833743Z mul.hi.s32 %r177, %r176, -580400985; 2026-02-21T09:41:20.2833920Z add.s32 %r178, %r177, %r176; 2026-02-21T09:41:20.2834074Z shr.u32 %r179, %r178, 31; 2026-02-21T09:41:20.2834229Z shr.s32 %r180, %r178, 13; 2026-02-21T09:41:20.2834379Z add.s32 %r181, %r180, %r179; 2026-02-21T09:41:20.2834536Z mul.lo.s32 %r182, %r181, 9472; 2026-02-21T09:41:20.2834723Z setp.ne.b32 %p68, %r176, %r182; 2026-02-21T09:41:20.2834898Z setp.lt.u32 %p69, %r496, 1537; 2026-02-21T09:41:20.2835063Z and.pred %p70, %p69, %p68; 2026-02-21T09:41:20.2835232Z selp.b32 %r183, 1, 0, %p70; 2026-02-21T09:41:20.2835395Z add.s32 %r10, %r181, %r183; 2026-02-21T09:41:20.2835653Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2835958Z shfl.sync.idx.b32 %r12, %r174, 0, 31, -1; 2026-02-21T09:41:20.2836138Z shl.b32 %r184, %r12, 21; 2026-02-21T09:41:20.2836295Z and.b32 %r185, %r184, 6291456; 2026-02-21T09:41:20.2836452Z add.s32 %r186, %r185, %r467; 2026-02-21T09:41:20.2836619Z shl.b32 %r187, %r12, 4; 2026-02-21T09:41:20.2836823Z and.b32 %r188, %r187, 64; 2026-02-21T09:41:20.2837024Z add.s32 %r77, %r186, %r188; 2026-02-21T09:41:20.2837193Z mov.pred %p42, -1; 2026-02-21T09:41:20.2837338Z // begin inline asm 2026-02-21T09:41:20.2837696Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 0], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:41:20.2838066Z // end inline asm 2026-02-21T09:41:20.2838209Z // begin inline asm 2026-02-21T09:41:20.2838537Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 16], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, 
%r70, %r70}; 2026-02-21T09:41:20.2838917Z // end inline asm 2026-02-21T09:41:20.2839052Z // begin inline asm 2026-02-21T09:41:20.2839198Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:20.2839365Z // end inline asm 2026-02-21T09:41:20.2839491Z bar.sync 0; 2026-02-21T09:41:20.2839738Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2840021Z add.s32 %r111, %r60, 77872; 2026-02-21T09:41:20.2840174Z // begin inline asm 2026-02-21T09:41:20.2840336Z @%p133 mbarrier.init.shared::cta.b64 [%r111], 1; 2026-02-21T09:41:20.2840528Z // end inline asm 2026-02-21T09:41:20.2840653Z bar.sync 0; 2026-02-21T09:41:20.2840785Z add.s32 %r112, %r60, 77880; 2026-02-21T09:41:20.2840936Z // begin inline asm 2026-02-21T09:41:20.2841095Z @%p133 mbarrier.init.shared::cta.b64 [%r112], 1; 2026-02-21T09:41:20.2841283Z // end inline asm 2026-02-21T09:41:20.2841413Z add.s32 %r113, %r60, 77824; 2026-02-21T09:41:20.2841563Z // begin inline asm 2026-02-21T09:41:20.2841717Z @%p133 mbarrier.init.shared::cta.b64 [%r113], 1; 2026-02-21T09:41:20.2841903Z // end inline asm 2026-02-21T09:41:20.2842028Z bar.sync 0; 2026-02-21T09:41:20.2842156Z add.s32 %r114, %r60, 77832; 2026-02-21T09:41:20.2842306Z // begin inline asm 2026-02-21T09:41:20.2842512Z @%p133 mbarrier.init.shared::cta.b64 [%r114], 1; 2026-02-21T09:41:20.2842699Z // end inline asm 2026-02-21T09:41:20.2842825Z bar.sync 0; 2026-02-21T09:41:20.2842955Z add.s32 %r115, %r60, 77840; 2026-02-21T09:41:20.2843098Z // begin inline asm 2026-02-21T09:41:20.2843256Z @%p133 mbarrier.init.shared::cta.b64 [%r115], 1; 2026-02-21T09:41:20.2843431Z // end inline asm 2026-02-21T09:41:20.2843562Z bar.sync 0; 2026-02-21T09:41:20.2843684Z add.s32 %r116, %r60, 77848; 2026-02-21T09:41:20.2843835Z // begin inline asm 2026-02-21T09:41:20.2843994Z @%p133 mbarrier.init.shared::cta.b64 [%r116], 1; 2026-02-21T09:41:20.2844174Z // end inline asm 2026-02-21T09:41:20.2844307Z bar.sync 0; 2026-02-21T09:41:20.2844431Z add.s32 
%r117, %r60, 77856; 2026-02-21T09:41:20.2844581Z // begin inline asm 2026-02-21T09:41:20.2844762Z @%p133 mbarrier.init.shared::cta.b64 [%r117], 1; 2026-02-21T09:41:20.2844948Z // end inline asm 2026-02-21T09:41:20.2845074Z bar.sync 0; 2026-02-21T09:41:20.2845212Z add.s32 %r222, %r60, 77864; 2026-02-21T09:41:20.2845367Z // begin inline asm 2026-02-21T09:41:20.2845528Z @%p133 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T09:41:20.2845715Z // end inline asm 2026-02-21T09:41:20.2845851Z setp.lt.s32 %p71, %r10, 1; 2026-02-21T09:41:20.2846014Z setp.gt.s32 %p67, %r10, 0; 2026-02-21T09:41:20.2846269Z .loc 1 35 33 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:35:33 2026-02-21T09:41:20.2846562Z shr.u32 %r189, %r496, 4; 2026-02-21T09:41:20.2846718Z and.b32 %r190, %r189, 134217664; 2026-02-21T09:41:20.2846995Z .loc 1 36 39 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:36:39 2026-02-21T09:41:20.2847287Z sub.s32 %r191, 96, %r190; 2026-02-21T09:41:20.2847532Z .loc 1 36 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:36:52 2026-02-21T09:41:20.2847812Z min.s32 %r192, %r191, 64; 2026-02-21T09:41:20.2848062Z .loc 1 37 45 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:45 2026-02-21T09:41:20.2848367Z and.b32 %r193, %r496, 1023; 2026-02-21T09:41:20.2848642Z .loc 1 38 51 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:38:51 2026-02-21T09:41:20.2848922Z div.s32 %r194, %r193, %r192; 2026-02-21T09:41:20.2849178Z .loc 1 37 64 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:64 2026-02-21T09:41:20.2849455Z mul.lo.s32 %r195, %r194, %r192; 2026-02-21T09:41:20.2849624Z sub.s32 %r196, %r193, %r195; 2026-02-21T09:41:20.2849871Z .loc 1 37 30 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:30 2026-02-21T09:41:20.2850146Z add.s32 %r197, %r196, %r190; 2026-02-21T09:41:20.2850390Z .loc 1 39 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:39:27 2026-02-21T09:41:20.2850667Z shl.b32 %r473, 
%r197, 7; 2026-02-21T09:41:20.2850920Z .loc 1 41 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:41:27 2026-02-21T09:41:20.2851191Z shl.b32 %r469, %r194, 6; 2026-02-21T09:41:20.2851447Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2851718Z bar.sync 0; 2026-02-21T09:41:20.2851856Z and.pred %p1, %p133, %p67; 2026-02-21T09:41:20.2852006Z // begin inline asm 2026-02-21T09:41:20.2852196Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r113], 12288; 2026-02-21T09:41:20.2852414Z // end inline asm 2026-02-21T09:41:20.2852641Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2852921Z bar.sync 0; 2026-02-21T09:41:20.2853051Z elect.sync %r198|%p72, -1; 2026-02-21T09:41:20.2853214Z and.pred %p73, %p67, %p72; 2026-02-21T09:41:20.2853368Z and.pred %p53, %p4, %p73; 2026-02-21T09:41:20.2853523Z add.s32 %r120, %r60, 49152; 2026-02-21T09:41:20.2853668Z // begin inline asm 2026-02-21T09:41:20.2854052Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r120], [%rd40, {%r70, %r469}], [%r113]; 2026-02-21T09:41:20.2854417Z // end inline asm 2026-02-21T09:41:20.2854655Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2854964Z // begin inline asm 2026-02-21T09:41:20.2855114Z fence.proxy.async.shared::cta; 2026-02-21T09:41:20.2855279Z // end inline asm 2026-02-21T09:41:20.2855406Z bar.sync 0; 2026-02-21T09:41:20.2855545Z elect.sync %r199|%p74, -1; 2026-02-21T09:41:20.2855701Z and.pred %p75, %p67, %p74; 2026-02-21T09:41:20.2855864Z and.pred %p54, %p4, %p75; 2026-02-21T09:41:20.2856023Z // begin inline asm 2026-02-21T09:41:20.2856339Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r60], [%rd41, {%r70, %r473}], [%r113]; 2026-02-21T09:41:20.2856685Z // end inline asm 2026-02-21T09:41:20.2856928Z .loc 1 28 108 // 
clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2857215Z bar.sync 0; 2026-02-21T09:41:20.2857341Z // begin inline asm 2026-02-21T09:41:20.2857533Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r114], 12288; 2026-02-21T09:41:20.2857737Z // end inline asm 2026-02-21T09:41:20.2857978Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2858256Z bar.sync 0; 2026-02-21T09:41:20.2858385Z elect.sync %r200|%p76, -1; 2026-02-21T09:41:20.2858546Z and.pred %p77, %p67, %p76; 2026-02-21T09:41:20.2858700Z and.pred %p56, %p4, %p77; 2026-02-21T09:41:20.2858858Z add.s32 %r129, %r60, 53248; 2026-02-21T09:41:20.2859004Z // begin inline asm 2026-02-21T09:41:20.2859325Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r129], [%rd40, {%r63, %r469}], [%r114]; 2026-02-21T09:41:20.2859674Z // end inline asm 2026-02-21T09:41:20.2859906Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2860182Z bar.sync 0; 2026-02-21T09:41:20.2860343Z elect.sync %r201|%p78, -1; 2026-02-21T09:41:20.2860536Z and.pred %p79, %p67, %p78; 2026-02-21T09:41:20.2860687Z and.pred %p57, %p4, %p79; 2026-02-21T09:41:20.2860848Z add.s32 %r133, %r60, 8192; 2026-02-21T09:41:20.2861000Z // begin inline asm 2026-02-21T09:41:20.2861328Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd41, {%r63, %r473}], [%r114]; 2026-02-21T09:41:20.2861684Z // end inline asm 2026-02-21T09:41:20.2861932Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2862232Z bar.sync 0; 2026-02-21T09:41:20.2862361Z // begin inline asm 2026-02-21T09:41:20.2862557Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r115], 12288; 2026-02-21T09:41:20.2862774Z // end inline asm 2026-02-21T09:41:20.2863022Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 
2026-02-21T09:41:20.2863305Z bar.sync 0; 2026-02-21T09:41:20.2863440Z elect.sync %r202|%p80, -1; 2026-02-21T09:41:20.2863609Z and.pred %p81, %p67, %p80; 2026-02-21T09:41:20.2863770Z and.pred %p59, %p4, %p81; 2026-02-21T09:41:20.2863934Z add.s32 %r138, %r60, 57344; 2026-02-21T09:41:20.2864084Z // begin inline asm 2026-02-21T09:41:20.2864411Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r138], [%rd40, {%r64, %r469}], [%r115]; 2026-02-21T09:41:20.2864802Z // end inline asm 2026-02-21T09:41:20.2865055Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2865350Z bar.sync 0; 2026-02-21T09:41:20.2865486Z elect.sync %r203|%p82, -1; 2026-02-21T09:41:20.2865657Z and.pred %p83, %p67, %p82; 2026-02-21T09:41:20.2865816Z and.pred %p60, %p4, %p83; 2026-02-21T09:41:20.2865981Z add.s32 %r142, %r60, 16384; 2026-02-21T09:41:20.2866164Z // begin inline asm 2026-02-21T09:41:20.2866529Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r142], [%rd41, {%r64, %r473}], [%r115]; 2026-02-21T09:41:20.2866890Z // end inline asm 2026-02-21T09:41:20.2867150Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2867457Z bar.sync 0; 2026-02-21T09:41:20.2867594Z // begin inline asm 2026-02-21T09:41:20.2867800Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r116], 12288; 2026-02-21T09:41:20.2868032Z // end inline asm 2026-02-21T09:41:20.2868292Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2868585Z bar.sync 0; 2026-02-21T09:41:20.2868743Z elect.sync %r204|%p84, -1; 2026-02-21T09:41:20.2868910Z and.pred %p85, %p67, %p84; 2026-02-21T09:41:20.2869067Z and.pred %p62, %p4, %p85; 2026-02-21T09:41:20.2869239Z add.s32 %r147, %r60, 61440; 2026-02-21T09:41:20.2869398Z mov.b32 %r148, 96; 2026-02-21T09:41:20.2869548Z // begin inline asm 2026-02-21T09:41:20.2869866Z @%p62 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd40, {%r148, %r469}], [%r116]; 2026-02-21T09:41:20.2870224Z // end inline asm 2026-02-21T09:41:20.2870462Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2870740Z bar.sync 0; 2026-02-21T09:41:20.2870883Z elect.sync %r205|%p86, -1; 2026-02-21T09:41:20.2871042Z and.pred %p87, %p67, %p86; 2026-02-21T09:41:20.2871209Z and.pred %p63, %p4, %p87; 2026-02-21T09:41:20.2871364Z add.s32 %r151, %r60, 24576; 2026-02-21T09:41:20.2871523Z // begin inline asm 2026-02-21T09:41:20.2871840Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r151], [%rd41, {%r148, %r473}], [%r116]; 2026-02-21T09:41:20.2872189Z // end inline asm 2026-02-21T09:41:20.2872443Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2872730Z bar.sync 0; 2026-02-21T09:41:20.2872895Z // begin inline asm 2026-02-21T09:41:20.2873104Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r117], 12288; 2026-02-21T09:41:20.2873317Z // end inline asm 2026-02-21T09:41:20.2873550Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2873823Z bar.sync 0; 2026-02-21T09:41:20.2873949Z elect.sync %r206|%p88, -1; 2026-02-21T09:41:20.2874109Z and.pred %p89, %p67, %p88; 2026-02-21T09:41:20.2874266Z and.pred %p65, %p4, %p89; 2026-02-21T09:41:20.2874413Z add.s32 %r156, %r60, 65536; 2026-02-21T09:41:20.2874564Z // begin inline asm 2026-02-21T09:41:20.2874902Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r156], [%rd40, {%r72, %r469}], [%r117]; 2026-02-21T09:41:20.2875249Z // end inline asm 2026-02-21T09:41:20.2875483Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2875754Z bar.sync 0; 2026-02-21T09:41:20.2875883Z elect.sync %r207|%p90, -1; 2026-02-21T09:41:20.2876046Z and.pred %p91, %p67, 
%p90; 2026-02-21T09:41:20.2876203Z and.pred %p66, %p4, %p91; 2026-02-21T09:41:20.2876352Z add.s32 %r160, %r60, 32768; 2026-02-21T09:41:20.2876505Z // begin inline asm 2026-02-21T09:41:20.2876815Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd41, {%r72, %r473}], [%r117]; 2026-02-21T09:41:20.2877154Z // end inline asm 2026-02-21T09:41:20.2877393Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2877678Z bar.sync 0; 2026-02-21T09:41:20.2877808Z // begin inline asm 2026-02-21T09:41:20.2877936Z 2026-02-21T09:41:20.2878056Z { 2026-02-21T09:41:20.2878183Z @!%p67 bra.uni skipWait; 2026-02-21T09:41:20.2878345Z .reg .pred complete; 2026-02-21T09:41:20.2878493Z waitLoop: 2026-02-21T09:41:20.2878715Z mbarrier.try_wait.parity.shared.b64 complete, [%r113], %r70; 2026-02-21T09:41:20.2878991Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.2879153Z skipWait: 2026-02-21T09:41:20.2879268Z } 2026-02-21T09:41:20.2879337Z 2026-02-21T09:41:20.2879391Z // end inline asm 2026-02-21T09:41:20.2879632Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2879905Z setp.ne.b32 %p92, %r12, 0; 2026-02-21T09:41:20.2880063Z or.pred %p93, %p71, %p92; 2026-02-21T09:41:20.2880212Z @%p93 bra $L__BB0_2; 2026-02-21T09:41:20.2880354Z // %bb.1: 2026-02-21T09:41:20.2880480Z elect.sync %r212|%p95, -1; 2026-02-21T09:41:20.2880643Z bfe.u32 %r215, %r120, 4, 14; 2026-02-21T09:41:20.2880794Z cvt.u64.u32 %rd57, %r215; 2026-02-21T09:41:20.2880963Z or.b64 %rd52, %rd57, -9223371899399045120; 2026-02-21T09:41:20.2881150Z bfe.u32 %r216, %r60, 4, 14; 2026-02-21T09:41:20.2881297Z cvt.u64.u32 %rd58, %r216; 2026-02-21T09:41:20.2881460Z or.b64 %rd53, %rd58, -9223371899382267904; 2026-02-21T09:41:20.2881634Z mov.b32 %r209, 69206032; 2026-02-21T09:41:20.2881788Z mov.pred %p94, 0; 2026-02-21T09:41:20.2881923Z // begin inline asm 2026-02-21T09:41:20.2882145Z @%p95 
tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd52, %rd53, %r209, %p94; 2026-02-21T09:41:20.2882392Z // end inline asm 2026-02-21T09:41:20.2882528Z add.s32 %r217, %r60, 49184; 2026-02-21T09:41:20.2882677Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T09:41:20.2882831Z cvt.u64.u32 %rd59, %r218; 2026-02-21T09:41:20.2882991Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:20.2883160Z add.s32 %r219, %r60, 32; 2026-02-21T09:41:20.2883311Z bfe.u32 %r220, %r219, 4, 14; 2026-02-21T09:41:20.2883458Z cvt.u64.u32 %rd60, %r220; 2026-02-21T09:41:20.2883616Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:20.2883780Z // begin inline asm 2026-02-21T09:41:20.2883997Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd54, %rd55, %r209, %p42; 2026-02-21T09:41:20.2884234Z // end inline asm 2026-02-21T09:41:20.2884373Z add.s32 %r221, %r60, 77872; 2026-02-21T09:41:20.2884559Z cvt.u64.u32 %rd56, %r221; 2026-02-21T09:41:20.2884756Z // begin inline asm 2026-02-21T09:41:20.2884963Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T09:41:20.2885186Z // end inline asm 2026-02-21T09:41:20.2885318Z $L__BB0_2: 2026-02-21T09:41:20.2885555Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2885843Z bar.sync 0; 2026-02-21T09:41:20.2885969Z // begin inline asm 2026-02-21T09:41:20.2886158Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r222], 12288; 2026-02-21T09:41:20.2886372Z // end inline asm 2026-02-21T09:41:20.2886612Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2886903Z bar.sync 0; 2026-02-21T09:41:20.2887039Z elect.sync %r232|%p105, -1; 2026-02-21T09:41:20.2887214Z and.pred %p106, %p67, %p105; 2026-02-21T09:41:20.2887376Z and.pred %p100, %p4, %p106; 2026-02-21T09:41:20.2887540Z add.s32 %r223, %r60, 69632; 2026-02-21T09:41:20.2887684Z mov.b32 %r481, 160; 2026-02-21T09:41:20.2887828Z // begin inline asm 2026-02-21T09:41:20.2888151Z 
@%p100 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r223], [%rd40, {%r481, %r469}], [%r222]; 2026-02-21T09:41:20.2888496Z // end inline asm 2026-02-21T09:41:20.2888744Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2889017Z bar.sync 0; 2026-02-21T09:41:20.2889153Z elect.sync %r233|%p107, -1; 2026-02-21T09:41:20.2889312Z and.pred %p108, %p67, %p107; 2026-02-21T09:41:20.2889473Z and.pred %p101, %p4, %p108; 2026-02-21T09:41:20.2889631Z add.s32 %r227, %r60, 40960; 2026-02-21T09:41:20.2889774Z // begin inline asm 2026-02-21T09:41:20.2890128Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r227], [%rd41, {%r481, %r473}], [%r222]; 2026-02-21T09:41:20.2890511Z // end inline asm 2026-02-21T09:41:20.2890761Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2891038Z @%p71 bra $L__BB0_12; 2026-02-21T09:41:20.2891207Z // %bb.3: // %.lr.ph 2026-02-21T09:41:20.2891504Z .loc 1 0 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:0:108 2026-02-21T09:41:20.2891784Z and.b32 %r4, %r1, 15; 2026-02-21T09:41:20.2891933Z shr.u32 %r175, %r1, 4; 2026-02-21T09:41:20.2892078Z bfe.u32 %r6, %r1, 4, 4; 2026-02-21T09:41:20.2892253Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:20.2892433Z shl.b32 %r5, %r4, 3; 2026-02-21T09:41:20.2892576Z or.b32 %r7, %r6, 16; 2026-02-21T09:41:20.2892708Z or.b32 %r8, %r6, 32; 2026-02-21T09:41:20.2892849Z or.b32 %r9, %r175, 48; 2026-02-21T09:41:20.2892989Z shl.b32 %r11, %r10, 5; 2026-02-21T09:41:20.2893137Z add.s32 %r16, %r11, -6; 2026-02-21T09:41:20.2893287Z shl.b32 %r242, %r1, 9; 2026-02-21T09:41:20.2893427Z and.b32 %r243, %r242, 3072; 2026-02-21T09:41:20.2893582Z shl.b32 %r244, %r4, 4; 2026-02-21T09:41:20.2893718Z shl.b32 %r245, %r1, 3; 2026-02-21T09:41:20.2893865Z and.b32 %r246, %r245, 768; 2026-02-21T09:41:20.2894012Z shl.b32 %r247, %r1, 1; 
2026-02-21T09:41:20.2894157Z and.b32 %r248, %r247, 32; 2026-02-21T09:41:20.2894298Z shr.u32 %r249, %r1, 1; 2026-02-21T09:41:20.2894443Z and.b32 %r250, %r249, 64; 2026-02-21T09:41:20.2894587Z or.b32 %r251, %r244, %r246; 2026-02-21T09:41:20.2894772Z or.b32 %r252, %r248, %r250; 2026-02-21T09:41:20.2894929Z xor.b32 %r253, %r251, %r252; 2026-02-21T09:41:20.2895077Z add.s32 %r255, %r60, 73728; 2026-02-21T09:41:20.2895232Z add.s32 %r256, %r255, %r243; 2026-02-21T09:41:20.2895288Z add.s32 %r17, %r256, %r253; 2026-02-21T09:41:20.2895343Z shl.b32 %r257, %r1, 5; 2026-02-21T09:41:20.2895406Z and.b32 %r258, %r257, 3936; 2026-02-21T09:41:20.2895463Z and.b32 %r259, %r1, 224; 2026-02-21T09:41:20.2895519Z and.b32 %r261, %r173, 16; 2026-02-21T09:41:20.2895576Z xor.b32 %r262, %r258, %r259; 2026-02-21T09:41:20.2895694Z add.s32 %r263, %r255, %r261; 2026-02-21T09:41:20.2895749Z add.s32 %r368, %r263, %r262; 2026-02-21T09:41:20.2895923Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2895991Z max.s32 %r264, %r11, 2; 2026-02-21T09:41:20.2896050Z add.s32 %r19, %r264, -1; 2026-02-21T09:41:20.2896112Z mov.pred %p142, -1; 2026-02-21T09:41:20.2896166Z mov.b32 %r486, 5; 2026-02-21T09:41:20.2896229Z mov.b32 %r482, 0; 2026-02-21T09:41:20.2896281Z mov.b32 %r480, 1; 2026-02-21T09:41:20.2896334Z mov.b32 %r479, 2; 2026-02-21T09:41:20.2896398Z mov.b32 %r478, 3; 2026-02-21T09:41:20.2896449Z mov.b32 %r477, 4; 2026-02-21T09:41:20.2896503Z mov.b32 %r470, %r469; 2026-02-21T09:41:20.2896558Z mov.b32 %r471, %r469; 2026-02-21T09:41:20.2896619Z mov.b32 %r472, %r469; 2026-02-21T09:41:20.2896673Z mov.b32 %r474, %r473; 2026-02-21T09:41:20.2896728Z mov.b32 %r475, %r473; 2026-02-21T09:41:20.2896790Z mov.b32 %r476, %r473; 2026-02-21T09:41:20.2896844Z mov.b32 %r483, %r111; 2026-02-21T09:41:20.2896896Z mov.b32 %r484, %r482; 2026-02-21T09:41:20.2896948Z mov.b32 %r485, %r482; 2026-02-21T09:41:20.2897008Z mov.b32 %r487, %r480; 2026-02-21T09:41:20.2897059Z mov.b32 %r488, 
%r482; 2026-02-21T09:41:20.2897112Z mov.b32 %r489, %r469; 2026-02-21T09:41:20.2897174Z mov.b32 %r490, %r473; 2026-02-21T09:41:20.2897226Z mov.b32 %r492, %r486; 2026-02-21T09:41:20.2897279Z mov.b32 %r493, %r482; 2026-02-21T09:41:20.2897330Z mov.b32 %r494, %r490; 2026-02-21T09:41:20.2897389Z mov.b32 %r495, %r489; 2026-02-21T09:41:20.2897441Z bra.uni $L__BB0_4; 2026-02-21T09:41:20.2897546Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2897715Z .loc 1 0 0 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:0 2026-02-21T09:41:20.2897805Z selp.b32 %r487, 0, %r316, %p126; 2026-02-21T09:41:20.2897889Z selp.b32 %r317, 1, 0, %p126; 2026-02-21T09:41:20.2897957Z xor.b32 %r488, %r458, %r317; 2026-02-21T09:41:20.2898132Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2898191Z add.s32 %r493, %r493, 1; 2026-02-21T09:41:20.2898256Z setp.ne.b32 %p132, %r19, %r493; 2026-02-21T09:41:20.2898318Z mov.b32 %r469, %r489; 2026-02-21T09:41:20.2898373Z mov.b32 %r472, %r22; 2026-02-21T09:41:20.2898429Z mov.b32 %r473, %r490; 2026-02-21T09:41:20.2898491Z mov.b32 %r476, %r26; 2026-02-21T09:41:20.2898547Z mov.b32 %r477, %r492; 2026-02-21T09:41:20.2898602Z mov.b32 %r480, %r30; 2026-02-21T09:41:20.2898657Z mov.b32 %r482, %r458; 2026-02-21T09:41:20.2898719Z mov.b32 %r483, %r457; 2026-02-21T09:41:20.2898775Z mov.b32 %r489, %r495; 2026-02-21T09:41:20.2898829Z mov.b32 %r490, %r494; 2026-02-21T09:41:20.2898891Z mov.b32 %r492, %r45; 2026-02-21T09:41:20.2898950Z @%p132 bra $L__BB0_4; 2026-02-21T09:41:20.2899007Z bra.uni $L__BB0_11; 2026-02-21T09:41:20.2899119Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:20.2899290Z .loc 1 0 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:0:108 2026-02-21T09:41:20.2899346Z mov.b32 %r458, %r488; 2026-02-21T09:41:20.2899401Z mov.b32 %r30, %r479; 2026-02-21T09:41:20.2899462Z mov.b32 %r479, %r478; 2026-02-21T09:41:20.2899517Z mov.b32 %r478, %r477; 
2026-02-21T09:41:20.2899573Z mov.b32 %r26, %r475; 2026-02-21T09:41:20.2899635Z mov.b32 %r475, %r474; 2026-02-21T09:41:20.2899689Z mov.b32 %r474, %r473; 2026-02-21T09:41:20.2899743Z mov.b32 %r22, %r471; 2026-02-21T09:41:20.2899798Z mov.b32 %r471, %r470; 2026-02-21T09:41:20.2899859Z mov.b32 %r470, %r469; 2026-02-21T09:41:20.2900025Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2900083Z add.s32 %r265, %r492, 1; 2026-02-21T09:41:20.2900157Z setp.eq.b32 %p110, %r492, 31; 2026-02-21T09:41:20.2900222Z selp.b32 %r45, 0, %r265, %p110; 2026-02-21T09:41:20.2900325Z setp.ne.b32 %p111, %r45, 0; 2026-02-21T09:41:20.2900381Z @%p111 bra $L__BB0_6; 2026-02-21T09:41:20.2900482Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2900538Z add.s32 %r496, %r496, 9472; 2026-02-21T09:41:20.2900697Z .loc 1 34 35 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:34:35 2026-02-21T09:41:20.2900759Z shr.s32 %r266, %r496, 31; 2026-02-21T09:41:20.2900812Z shr.u32 %r267, %r266, 22; 2026-02-21T09:41:20.2900868Z add.s32 %r268, %r496, %r267; 2026-02-21T09:41:20.2900928Z shr.s32 %r269, %r268, 10; 2026-02-21T09:41:20.2901089Z .loc 1 35 33 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:35:33 2026-02-21T09:41:20.2901142Z shl.b32 %r270, %r269, 6; 2026-02-21T09:41:20.2901303Z .loc 1 36 39 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:36:39 2026-02-21T09:41:20.2901366Z sub.s32 %r271, 96, %r270; 2026-02-21T09:41:20.2901528Z .loc 1 36 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:36:52 2026-02-21T09:41:20.2901583Z min.s32 %r272, %r271, 64; 2026-02-21T09:41:20.2901747Z .loc 1 37 45 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:45 2026-02-21T09:41:20.2901805Z and.b32 %r273, %r268, -1024; 2026-02-21T09:41:20.2901860Z sub.s32 %r274, %r496, %r273; 2026-02-21T09:41:20.2902023Z .loc 1 38 51 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:38:51 2026-02-21T09:41:20.2902078Z 
div.s32 %r275, %r274, %r272; 2026-02-21T09:41:20.2902233Z .loc 1 37 64 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:64 2026-02-21T09:41:20.2902299Z mul.lo.s32 %r276, %r275, %r272; 2026-02-21T09:41:20.2902354Z sub.s32 %r277, %r274, %r276; 2026-02-21T09:41:20.2902557Z .loc 1 37 30 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:37:30 2026-02-21T09:41:20.2902616Z add.s32 %r278, %r277, %r270; 2026-02-21T09:41:20.2902780Z .loc 1 39 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:39:27 2026-02-21T09:41:20.2902835Z shl.b32 %r494, %r278, 7; 2026-02-21T09:41:20.2902988Z .loc 1 41 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:41:27 2026-02-21T09:41:20.2903052Z shl.b32 %r495, %r275, 6; 2026-02-21T09:41:20.2903148Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2903312Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2903375Z add.s32 %r281, %r485, 1; 2026-02-21T09:41:20.2903433Z setp.gt.s32 %p113, %r281, 5; 2026-02-21T09:41:20.2903495Z selp.b32 %r485, 0, %r281, %p113; 2026-02-21T09:41:20.2903551Z selp.b32 %r282, 1, 0, %p113; 2026-02-21T09:41:20.2903614Z xor.b32 %r484, %r484, %r282; 2026-02-21T09:41:20.2903670Z shl.b32 %r283, %r485, 3; 2026-02-21T09:41:20.2903728Z add.s32 %r285, %r60, %r283; 2026-02-21T09:41:20.2903794Z add.s32 %r279, %r285, 77824; 2026-02-21T09:41:20.2903848Z bar.sync 0; 2026-02-21T09:41:20.2903905Z // begin inline asm 2026-02-21T09:41:20.2903967Z 2026-02-21T09:41:20.2904018Z { 2026-02-21T09:41:20.2904081Z .reg .pred complete; 2026-02-21T09:41:20.2904136Z waitLoop: 2026-02-21T09:41:20.2904268Z mbarrier.try_wait.parity.shared.b64 complete, [%r279], %r484; 2026-02-21T09:41:20.2904334Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.2904386Z } 2026-02-21T09:41:20.2904389Z 2026-02-21T09:41:20.2904451Z // end inline asm 2026-02-21T09:41:20.2904508Z shl.b32 %r286, %r487, 3; 2026-02-21T09:41:20.2904564Z add.s32 %r287, %r60, %r286; 
2026-02-21T09:41:20.2904621Z add.s32 %r457, %r287, 77872; 2026-02-21T09:41:20.2904817Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2904878Z @%p92 bra $L__BB0_8; 2026-02-21T09:41:20.2905001Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2905196Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2905254Z shl.b32 %r292, %r485, 12; 2026-02-21T09:41:20.2905311Z add.s32 %r294, %r60, %r292; 2026-02-21T09:41:20.2905375Z add.s32 %r295, %r294, 49152; 2026-02-21T09:41:20.2905534Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2905591Z shl.b32 %r296, %r485, 13; 2026-02-21T09:41:20.2905648Z add.s32 %r297, %r60, %r296; 2026-02-21T09:41:20.2905820Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2905884Z elect.sync %r298|%p115, -1; 2026-02-21T09:41:20.2905942Z bfe.u32 %r299, %r295, 4, 14; 2026-02-21T09:41:20.2906011Z cvt.u64.u32 %rd68, %r299; 2026-02-21T09:41:20.2906085Z or.b64 %rd63, %rd68, -9223371899399045120; 2026-02-21T09:41:20.2906145Z bfe.u32 %r300, %r297, 4, 14; 2026-02-21T09:41:20.2906215Z cvt.u64.u32 %rd69, %r300; 2026-02-21T09:41:20.2906288Z or.b64 %rd64, %rd69, -9223371899382267904; 2026-02-21T09:41:20.2906347Z mov.b32 %r289, 69206032; 2026-02-21T09:41:20.2906405Z // begin inline asm 2026-02-21T09:41:20.2906563Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd63, %rd64, %r289, %p142; 2026-02-21T09:41:20.2906621Z // end inline asm 2026-02-21T09:41:20.2906678Z add.s32 %r301, %r294, 49184; 2026-02-21T09:41:20.2906744Z bfe.u32 %r302, %r301, 4, 14; 2026-02-21T09:41:20.2906802Z cvt.u64.u32 %rd70, %r302; 2026-02-21T09:41:20.2906871Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:20.2906928Z add.s32 %r303, %r297, 32; 2026-02-21T09:41:20.2906992Z bfe.u32 %r304, %r303, 4, 14; 2026-02-21T09:41:20.2907051Z cvt.u64.u32 %rd71, %r304; 
2026-02-21T09:41:20.2907157Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:20.2907255Z mov.pred %p116, -1; 2026-02-21T09:41:20.2907315Z // begin inline asm 2026-02-21T09:41:20.2907458Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd65, %rd66, %r289, %p116; 2026-02-21T09:41:20.2907519Z // end inline asm 2026-02-21T09:41:20.2907577Z cvt.u64.u32 %rd67, %r457; 2026-02-21T09:41:20.2907633Z // begin inline asm 2026-02-21T09:41:20.2907761Z @%p115 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:41:20.2907824Z // end inline asm 2026-02-21T09:41:20.2907919Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2908098Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2908168Z setp.eq.b32 %p122, %r45, 0; 2026-02-21T09:41:20.2908231Z setp.lt.s32 %p123, %r493, %r16; 2026-02-21T09:41:20.2908399Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2908463Z // begin inline asm 2026-02-21T09:41:20.2908514Z 2026-02-21T09:41:20.2908565Z { 2026-02-21T09:41:20.2908626Z .reg .pred complete; 2026-02-21T09:41:20.2908688Z waitLoop: 2026-02-21T09:41:20.2908809Z mbarrier.try_wait.parity.shared.b64 complete, [%r483], %r482; 2026-02-21T09:41:20.2908873Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.2908929Z } 2026-02-21T09:41:20.2908932Z 2026-02-21T09:41:20.2908985Z // end inline asm 2026-02-21T09:41:20.2909044Z add.s32 %r316, %r487, 1; 2026-02-21T09:41:20.2909105Z setp.gt.s32 %p126, %r316, 1; 2026-02-21T09:41:20.2909281Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2909337Z add.s32 %r318, %r481, 32; 2026-02-21T09:41:20.2909394Z add.s32 %r319, %r486, 1; 2026-02-21T09:41:20.2909462Z setp.gt.s32 %p127, %r319, 5; 2026-02-21T09:41:20.2909526Z selp.b32 %r486, 0, %r319, %p127; 2026-02-21T09:41:20.2909588Z selp.b32 %r481, 0, %r318, %p122; 2026-02-21T09:41:20.2909654Z shl.b32 %r320, 
%r486, 3; 2026-02-21T09:41:20.2909713Z add.s32 %r322, %r60, %r320; 2026-02-21T09:41:20.2909795Z add.s32 %r311, %r322, 77824; 2026-02-21T09:41:20.2909873Z bar.sync 0; 2026-02-21T09:41:20.2909948Z and.pred %p119, %p133, %p123; 2026-02-21T09:41:20.2910005Z // begin inline asm 2026-02-21T09:41:20.2910121Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r311], 12288; 2026-02-21T09:41:20.2910183Z // end inline asm 2026-02-21T09:41:20.2910347Z .loc 1 51 31 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:51:31 2026-02-21T09:41:20.2910405Z shl.b32 %r323, %r486, 12; 2026-02-21T09:41:20.2910461Z add.s32 %r324, %r60, %r323; 2026-02-21T09:41:20.2910525Z add.s32 %r308, %r324, 49152; 2026-02-21T09:41:20.2910579Z bar.sync 0; 2026-02-21T09:41:20.2910641Z elect.sync %r325|%p128, -1; 2026-02-21T09:41:20.2910714Z and.pred %p129, %p123, %p128; 2026-02-21T09:41:20.2910776Z and.pred %p120, %p4, %p129; 2026-02-21T09:41:20.2910833Z // begin inline asm 2026-02-21T09:41:20.2911096Z @%p120 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r308], [%rd40, {%r481, %r495}], [%r311]; 2026-02-21T09:41:20.2911155Z // end inline asm 2026-02-21T09:41:20.2911319Z .loc 1 52 44 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:52:44 2026-02-21T09:41:20.2911378Z shl.b32 %r326, %r486, 13; 2026-02-21T09:41:20.2911444Z add.s32 %r312, %r60, %r326; 2026-02-21T09:41:20.2911498Z bar.sync 0; 2026-02-21T09:41:20.2911561Z elect.sync %r327|%p130, -1; 2026-02-21T09:41:20.2911634Z and.pred %p131, %p123, %p130; 2026-02-21T09:41:20.2911697Z and.pred %p121, %p4, %p131; 2026-02-21T09:41:20.2911754Z // begin inline asm 2026-02-21T09:41:20.2912014Z @%p121 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r312], [%rd41, {%r481, %r494}], [%r311]; 2026-02-21T09:41:20.2912071Z // end inline asm 2026-02-21T09:41:20.2912274Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2912357Z setp.ne.b32 %p142, 
%r480, 31; 2026-02-21T09:41:20.2912425Z @%p142 bra $L__BB0_10; 2026-02-21T09:41:20.2912516Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.2912676Z .loc 1 40 32 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:40:32 2026-02-21T09:41:20.2912740Z add.s32 %r400, %r476, %r5; 2026-02-21T09:41:20.2912904Z .loc 1 42 32 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:42:32 2026-02-21T09:41:20.2912963Z add.s32 %r401, %r472, %r6; 2026-02-21T09:41:20.2913026Z add.s32 %r402, %r7, %r472; 2026-02-21T09:41:20.2913081Z add.s32 %r403, %r8, %r472; 2026-02-21T09:41:20.2913135Z add.s32 %r404, %r472, %r9; 2026-02-21T09:41:20.2913299Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2913359Z // begin inline asm 2026-02-21T09:41:20.2913409Z 2026-02-21T09:41:20.2913457Z { 2026-02-21T09:41:20.2913522Z .reg .pred complete; 2026-02-21T09:41:20.2913574Z waitLoop: 2026-02-21T09:41:20.2913689Z mbarrier.try_wait.parity.shared.b64 complete, [%r457], %r458; 2026-02-21T09:41:20.2913757Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.2913804Z } 2026-02-21T09:41:20.2913808Z 2026-02-21T09:41:20.2913861Z // end inline asm 2026-02-21T09:41:20.2914017Z .loc 1 56 53 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:56:53 2026-02-21T09:41:20.2914090Z mad.lo.s32 %r405, %r401, 12288, %r400; 2026-02-21T09:41:20.2914154Z mad.lo.s32 %r406, %r402, 12288, %r400; 2026-02-21T09:41:20.2914214Z mad.lo.s32 %r407, %r403, 12288, %r400; 2026-02-21T09:41:20.2914281Z mad.lo.s32 %r408, %r404, 12288, %r400; 2026-02-21T09:41:20.2914439Z .loc 1 56 24 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:56:24 2026-02-21T09:41:20.2914503Z mad.wide.s32 %rd74, %r405, 2, %rd3; 2026-02-21T09:41:20.2914573Z mad.wide.s32 %rd75, %r406, 2, %rd3; 2026-02-21T09:41:20.2914633Z mad.wide.s32 %rd76, %r407, 2, %rd3; 2026-02-21T09:41:20.2914764Z mad.wide.s32 %rd77, %r408, 2, %rd3; 2026-02-21T09:41:20.2914925Z .loc 1 53 52 // 
clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2914989Z // begin inline asm 2026-02-21T09:41:20.2915268Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345}, [%r77 + 0], 32; 2026-02-21T09:41:20.2915321Z // end inline asm 2026-02-21T09:41:20.2915383Z // begin inline asm 2026-02-21T09:41:20.2915652Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362}, [%r77 + 16], 32; 2026-02-21T09:41:20.2915707Z // end inline asm 2026-02-21T09:41:20.2915768Z // begin inline asm 2026-02-21T09:41:20.2915838Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:20.2915891Z // end inline asm 2026-02-21T09:41:20.2915949Z cvt.u64.u32 %rd78, %r330; 2026-02-21T09:41:20.2916016Z cvt.u64.u32 %rd79, %r331; 2026-02-21T09:41:20.2916072Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:41:20.2916128Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:41:20.2916294Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2916354Z mov.b64 {%r409, %r410}, %rd81; 2026-02-21T09:41:20.2916421Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:41:20.2916590Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2916647Z cvt.u64.u32 %rd82, %r332; 2026-02-21T09:41:20.2916703Z cvt.u64.u32 %rd83, %r333; 2026-02-21T09:41:20.2916758Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:20.2916822Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:20.2917005Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2917093Z mov.b64 {%r412, %r413}, %rd85; 2026-02-21T09:41:20.2917169Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:41:20.2917329Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2917383Z cvt.u64.u32 %rd86, %r334; 
2026-02-21T09:41:20.2917443Z cvt.u64.u32 %rd87, %r335; 2026-02-21T09:41:20.2917498Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:20.2917553Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:20.2917709Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2917775Z mov.b64 {%r415, %r416}, %rd89; 2026-02-21T09:41:20.2917839Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:41:20.2917994Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2918058Z cvt.u64.u32 %rd90, %r336; 2026-02-21T09:41:20.2918113Z cvt.u64.u32 %rd91, %r337; 2026-02-21T09:41:20.2918168Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:20.2918233Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:20.2918394Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2918450Z mov.b64 {%r418, %r419}, %rd93; 2026-02-21T09:41:20.2918511Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:41:20.2918675Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2918730Z cvt.u64.u32 %rd94, %r338; 2026-02-21T09:41:20.2918784Z cvt.u64.u32 %rd95, %r339; 2026-02-21T09:41:20.2918846Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:20.2918901Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:20.2919059Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2919124Z mov.b64 {%r421, %r422}, %rd97; 2026-02-21T09:41:20.2919184Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:41:20.2919343Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2919423Z cvt.u64.u32 %rd98, %r340; 2026-02-21T09:41:20.2919510Z cvt.u64.u32 %rd99, %r341; 2026-02-21T09:41:20.2919570Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:20.2919629Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:41:20.2919791Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2919850Z 
mov.b64 {%r424, %r425}, %rd101; 2026-02-21T09:41:20.2919910Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:41:20.2920075Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2920134Z cvt.u64.u32 %rd102, %r342; 2026-02-21T09:41:20.2920190Z cvt.u64.u32 %rd103, %r343; 2026-02-21T09:41:20.2920247Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:20.2920316Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:41:20.2920479Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2920541Z mov.b64 {%r427, %r428}, %rd105; 2026-02-21T09:41:20.2920613Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:41:20.2920771Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2920829Z cvt.u64.u32 %rd106, %r344; 2026-02-21T09:41:20.2920891Z cvt.u64.u32 %rd107, %r345; 2026-02-21T09:41:20.2920947Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:41:20.2921005Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:20.2921162Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2921226Z mov.b64 {%r430, %r431}, %rd109; 2026-02-21T09:41:20.2921287Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:41:20.2921443Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2921529Z cvt.u64.u32 %rd110, %r347; 2026-02-21T09:41:20.2921606Z cvt.u64.u32 %rd111, %r348; 2026-02-21T09:41:20.2921665Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:20.2921730Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:20.2921891Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2921947Z mov.b64 {%r433, %r434}, %rd113; 2026-02-21T09:41:20.2922006Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:41:20.2922173Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2922230Z cvt.u64.u32 %rd114, %r349; 
2026-02-21T09:41:20.2922284Z cvt.u64.u32 %rd115, %r350; 2026-02-21T09:41:20.2922346Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:20.2922402Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:20.2922556Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2922618Z mov.b64 {%r436, %r437}, %rd117; 2026-02-21T09:41:20.2922680Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:41:20.2922839Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2922896Z cvt.u64.u32 %rd118, %r351; 2026-02-21T09:41:20.2922959Z cvt.u64.u32 %rd119, %r352; 2026-02-21T09:41:20.2923016Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:20.2923071Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:20.2923235Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2923292Z mov.b64 {%r439, %r440}, %rd121; 2026-02-21T09:41:20.2923353Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:41:20.2923519Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2923576Z cvt.u64.u32 %rd122, %r353; 2026-02-21T09:41:20.2923631Z cvt.u64.u32 %rd123, %r354; 2026-02-21T09:41:20.2923688Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:20.2923754Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:20.2923916Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2924033Z mov.b64 {%r442, %r443}, %rd125; 2026-02-21T09:41:20.2924101Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:41:20.2924262Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2924318Z cvt.u64.u32 %rd126, %r355; 2026-02-21T09:41:20.2924382Z cvt.u64.u32 %rd127, %r356; 2026-02-21T09:41:20.2924438Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:20.2924492Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:41:20.2924646Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 
2026-02-21T09:41:20.2924734Z mov.b64 {%r445, %r446}, %rd129; 2026-02-21T09:41:20.2924797Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:41:20.2924954Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2925020Z cvt.u64.u32 %rd130, %r357; 2026-02-21T09:41:20.2925077Z cvt.u64.u32 %rd131, %r358; 2026-02-21T09:41:20.2925133Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:20.2925188Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:41:20.2925350Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2925407Z mov.b64 {%r448, %r449}, %rd133; 2026-02-21T09:41:20.2925466Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:41:20.2925628Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2925684Z cvt.u64.u32 %rd134, %r359; 2026-02-21T09:41:20.2925740Z cvt.u64.u32 %rd135, %r360; 2026-02-21T09:41:20.2925802Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:41:20.2925859Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:20.2926046Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2926138Z mov.b64 {%r451, %r452}, %rd137; 2026-02-21T09:41:20.2926200Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:41:20.2926355Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2926410Z cvt.u64.u32 %rd138, %r361; 2026-02-21T09:41:20.2926471Z cvt.u64.u32 %rd139, %r362; 2026-02-21T09:41:20.2926527Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:20.2926582Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:20.2926745Z .loc 1 55 27 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:55:27 2026-02-21T09:41:20.2926803Z mov.b64 {%r454, %r455}, %rd141; 2026-02-21T09:41:20.2926863Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:41:20.2927023Z .loc 1 56 83 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:56:83 2026-02-21T09:41:20.2927116Z st.shared.v4.b32 
[%r17], {%r411, %r423, %r435, %r447}; 2026-02-21T09:41:20.2927170Z bar.sync 0; 2026-02-21T09:41:20.2927227Z // begin inline asm 2026-02-21T09:41:20.2927384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r384, %r388, %r392, %r396}, [%r368]; 2026-02-21T09:41:20.2927439Z // end inline asm 2026-02-21T09:41:20.2927491Z bar.sync 0; 2026-02-21T09:41:20.2927586Z st.shared.v4.b32 [%r17], {%r414, %r426, %r438, %r450}; 2026-02-21T09:41:20.2927636Z bar.sync 0; 2026-02-21T09:41:20.2927690Z // begin inline asm 2026-02-21T09:41:20.2927833Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r389, %r393, %r397}, [%r368]; 2026-02-21T09:41:20.2927894Z // end inline asm 2026-02-21T09:41:20.2927945Z bar.sync 0; 2026-02-21T09:41:20.2928031Z st.shared.v4.b32 [%r17], {%r417, %r429, %r441, %r453}; 2026-02-21T09:41:20.2928091Z bar.sync 0; 2026-02-21T09:41:20.2928145Z // begin inline asm 2026-02-21T09:41:20.2928282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r386, %r390, %r394, %r398}, [%r368]; 2026-02-21T09:41:20.2928342Z // end inline asm 2026-02-21T09:41:20.2928394Z bar.sync 0; 2026-02-21T09:41:20.2928480Z st.shared.v4.b32 [%r17], {%r420, %r432, %r444, %r456}; 2026-02-21T09:41:20.2928561Z bar.sync 0; 2026-02-21T09:41:20.2928659Z // begin inline asm 2026-02-21T09:41:20.2928797Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r387, %r391, %r395, %r399}, [%r368]; 2026-02-21T09:41:20.2928852Z // end inline asm 2026-02-21T09:41:20.2928916Z // begin inline asm 2026-02-21T09:41:20.2929019Z st.global.v4.b32 [ %rd74 + 0 ], { %r384, %r385, %r386, %r387 }; 2026-02-21T09:41:20.2929074Z // end inline asm 2026-02-21T09:41:20.2929130Z // begin inline asm 2026-02-21T09:41:20.2929239Z st.global.v4.b32 [ %rd75 + 0 ], { %r388, %r389, %r390, %r391 }; 2026-02-21T09:41:20.2929292Z // end inline asm 2026-02-21T09:41:20.2929345Z // begin inline asm 2026-02-21T09:41:20.2929448Z st.global.v4.b32 [ %rd76 + 0 ], { %r392, %r393, %r394, %r395 }; 2026-02-21T09:41:20.2929500Z // end inline asm 2026-02-21T09:41:20.2929554Z // 
begin inline asm 2026-02-21T09:41:20.2929647Z st.global.v4.b32 [ %rd77 + 0 ], { %r396, %r397, %r398, %r399 }; 2026-02-21T09:41:20.2929708Z // end inline asm 2026-02-21T09:41:20.2929765Z bra.uni $L__BB0_10; 2026-02-21T09:41:20.2929846Z $L__BB0_11: // %._crit_edge 2026-02-21T09:41:20.2930015Z .loc 1 53 52 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:53:52 2026-02-21T09:41:20.2930069Z // begin inline asm 2026-02-21T09:41:20.2930116Z 2026-02-21T09:41:20.2930172Z { 2026-02-21T09:41:20.2930231Z .reg .pred complete; 2026-02-21T09:41:20.2930283Z waitLoop: 2026-02-21T09:41:20.2930397Z mbarrier.try_wait.parity.shared.b64 complete, [%r457], %r458; 2026-02-21T09:41:20.2930466Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.2930514Z } 2026-02-21T09:41:20.2930517Z 2026-02-21T09:41:20.2930569Z // end inline asm 2026-02-21T09:41:20.2930664Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:20.2930851Z .loc 1 28 108 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:108 2026-02-21T09:41:20.2930927Z bar.sync 0; 2026-02-21T09:41:20.2930984Z // begin inline asm 2026-02-21T09:41:20.2931080Z @%p133 mbarrier.inval.shared::cta.b64 [%r113]; 2026-02-21T09:41:20.2931133Z // end inline asm 2026-02-21T09:41:20.2931182Z bar.sync 0; 2026-02-21T09:41:20.2931244Z // begin inline asm 2026-02-21T09:41:20.2931327Z @%p133 mbarrier.inval.shared::cta.b64 [%r114]; 2026-02-21T09:41:20.2931380Z // end inline asm 2026-02-21T09:41:20.2931437Z bar.sync 0; 2026-02-21T09:41:20.2931490Z // begin inline asm 2026-02-21T09:41:20.2931568Z @%p133 mbarrier.inval.shared::cta.b64 [%r115]; 2026-02-21T09:41:20.2931620Z // end inline asm 2026-02-21T09:41:20.2931678Z bar.sync 0; 2026-02-21T09:41:20.2931730Z // begin inline asm 2026-02-21T09:41:20.2931805Z @%p133 mbarrier.inval.shared::cta.b64 [%r116]; 2026-02-21T09:41:20.2931863Z // end inline asm 2026-02-21T09:41:20.2931914Z bar.sync 0; 2026-02-21T09:41:20.2931966Z // begin inline asm 2026-02-21T09:41:20.2932042Z @%p133 
mbarrier.inval.shared::cta.b64 [%r117]; 2026-02-21T09:41:20.2932101Z // end inline asm 2026-02-21T09:41:20.2932153Z bar.sync 0; 2026-02-21T09:41:20.2932207Z // begin inline asm 2026-02-21T09:41:20.2932288Z @%p133 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T09:41:20.2932340Z // end inline asm 2026-02-21T09:41:20.2932392Z // begin inline asm 2026-02-21T09:41:20.2932472Z @%p133 mbarrier.inval.shared::cta.b64 [%r111]; 2026-02-21T09:41:20.2932524Z // end inline asm 2026-02-21T09:41:20.2932573Z bar.sync 0; 2026-02-21T09:41:20.2932624Z // begin inline asm 2026-02-21T09:41:20.2932705Z @%p133 mbarrier.inval.shared::cta.b64 [%r112]; 2026-02-21T09:41:20.2932755Z // end inline asm 2026-02-21T09:41:20.2932910Z .loc 1 28 4 // clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py:28:4 2026-02-21T09:41:20.2932967Z bar.sync 0; 2026-02-21T09:41:20.2933020Z // begin inline asm 2026-02-21T09:41:20.2933131Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r467, 128; 2026-02-21T09:41:20.2933184Z // end inline asm 2026-02-21T09:41:20.2933242Z ret; 2026-02-21T09:41:20.2933317Z $L__tmp1: 2026-02-21T09:41:20.2933392Z $L__func_end0: 2026-02-21T09:41:20.2933479Z // -- End function 2026-02-21T09:41:20.2933527Z } 2026-02-21T09:41:20.2933718Z .file 1 "/tmp/torchinductor_root/lk/clk6z7cvvzljwlmydqjvvsu42cg3sgl7gfg424l46h6c46fb4ogp.py" 2026-02-21T09:41:20.2933777Z .section .debug_abbrev 2026-02-21T09:41:20.2933832Z { 2026-02-21T09:41:20.2933915Z .b8 1 // Abbreviation Code 2026-02-21T09:41:20.2933996Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:20.2934079Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:20.2934155Z .b8 37 // DW_AT_producer 2026-02-21T09:41:20.2934227Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.2934305Z .b8 19 // DW_AT_language 2026-02-21T09:41:20.2934379Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:20.2934452Z .b8 3 // DW_AT_name 2026-02-21T09:41:20.2934531Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.2934605Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:20.2934708Z .b8 6 // 
DW_FORM_data4 2026-02-21T09:41:20.2934782Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:20.2934859Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.2934927Z .b8 0 // EOM(1) 2026-02-21T09:41:20.2934994Z .b8 0 // EOM(2) 2026-02-21T09:41:20.2935064Z .b8 0 // EOM(3) 2026-02-21T09:41:20.2935113Z } 2026-02-21T09:41:20.2935170Z .section .debug_info 2026-02-21T09:41:20.2935218Z { 2026-02-21T09:41:20.2935329Z .b32 104 // Length of Unit 2026-02-21T09:41:20.2935435Z .b8 2 // DWARF version number 2026-02-21T09:41:20.2935488Z .b8 0 2026-02-21T09:41:20.2935611Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:41:20.2935696Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:20.2935791Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:20.2935875Z .b8 116 // DW_AT_producer 2026-02-21T09:41:20.2935929Z .b8 114 2026-02-21T09:41:20.2935981Z .b8 105 2026-02-21T09:41:20.2936032Z .b8 116 2026-02-21T09:41:20.2936093Z .b8 111 2026-02-21T09:41:20.2936143Z .b8 110 2026-02-21T09:41:20.2936195Z .b8 0 2026-02-21T09:41:20.2936280Z .b8 2 // DW_AT_language 2026-02-21T09:41:20.2936333Z .b8 0 2026-02-21T09:41:20.2936407Z .b8 99 // DW_AT_name 2026-02-21T09:41:20.2936456Z .b8 108 2026-02-21T09:41:20.2936515Z .b8 107 2026-02-21T09:41:20.2936565Z .b8 54 2026-02-21T09:41:20.2936615Z .b8 122 2026-02-21T09:41:20.2936672Z .b8 55 2026-02-21T09:41:20.2936722Z .b8 99 2026-02-21T09:41:20.2936772Z .b8 118 2026-02-21T09:41:20.2936820Z .b8 118 2026-02-21T09:41:20.2936877Z .b8 122 2026-02-21T09:41:20.2936926Z .b8 108 2026-02-21T09:41:20.2936974Z .b8 106 2026-02-21T09:41:20.2937031Z .b8 119 2026-02-21T09:41:20.2937080Z .b8 108 2026-02-21T09:41:20.2937127Z .b8 109 2026-02-21T09:41:20.2937175Z .b8 121 2026-02-21T09:41:20.2937230Z .b8 100 2026-02-21T09:41:20.2937280Z .b8 113 2026-02-21T09:41:20.2937329Z .b8 106 2026-02-21T09:41:20.2937376Z .b8 118 2026-02-21T09:41:20.2937431Z .b8 118 2026-02-21T09:41:20.2937479Z .b8 115 2026-02-21T09:41:20.2937527Z .b8 117 2026-02-21T09:41:20.2937581Z .b8 52 
2026-02-21T09:41:20.2937629Z .b8 50 2026-02-21T09:41:20.2937676Z .b8 99 2026-02-21T09:41:20.2937724Z .b8 103 2026-02-21T09:41:20.2937778Z .b8 51 2026-02-21T09:41:20.2937828Z .b8 115 2026-02-21T09:41:20.2937876Z .b8 103 2026-02-21T09:41:20.2937933Z .b8 108 2026-02-21T09:41:20.2937982Z .b8 55 2026-02-21T09:41:20.2938031Z .b8 103 2026-02-21T09:41:20.2938128Z .b8 102 2026-02-21T09:41:20.2938214Z .b8 103 2026-02-21T09:41:20.2938264Z .b8 52 2026-02-21T09:41:20.2938313Z .b8 50 2026-02-21T09:41:20.2938362Z .b8 52 2026-02-21T09:41:20.2938419Z .b8 108 2026-02-21T09:41:20.2938468Z .b8 52 2026-02-21T09:41:20.2938516Z .b8 54 2026-02-21T09:41:20.2938571Z .b8 104 2026-02-21T09:41:20.2938619Z .b8 54 2026-02-21T09:41:20.2938666Z .b8 99 2026-02-21T09:41:20.2938713Z .b8 52 2026-02-21T09:41:20.2938768Z .b8 54 2026-02-21T09:41:20.2938816Z .b8 102 2026-02-21T09:41:20.2938864Z .b8 98 2026-02-21T09:41:20.2938918Z .b8 52 2026-02-21T09:41:20.2938967Z .b8 111 2026-02-21T09:41:20.2939014Z .b8 103 2026-02-21T09:41:20.2939063Z .b8 112 2026-02-21T09:41:20.2939119Z .b8 46 2026-02-21T09:41:20.2939168Z .b8 112 2026-02-21T09:41:20.2939215Z .b8 121 2026-02-21T09:41:20.2939271Z .b8 0 2026-02-21T09:41:20.2939360Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:20.2939432Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:20.2939482Z .b8 116 2026-02-21T09:41:20.2939539Z .b8 109 2026-02-21T09:41:20.2939587Z .b8 112 2026-02-21T09:41:20.2939636Z .b8 47 2026-02-21T09:41:20.2939690Z .b8 116 2026-02-21T09:41:20.2939738Z .b8 111 2026-02-21T09:41:20.2939786Z .b8 114 2026-02-21T09:41:20.2939833Z .b8 99 2026-02-21T09:41:20.2939889Z .b8 104 2026-02-21T09:41:20.2939937Z .b8 105 2026-02-21T09:41:20.2939984Z .b8 110 2026-02-21T09:41:20.2940031Z .b8 100 2026-02-21T09:41:20.2940086Z .b8 117 2026-02-21T09:41:20.2940134Z .b8 99 2026-02-21T09:41:20.2940182Z .b8 116 2026-02-21T09:41:20.2940236Z .b8 111 2026-02-21T09:41:20.2940284Z .b8 114 2026-02-21T09:41:20.2940331Z .b8 95 2026-02-21T09:41:20.2940379Z .b8 114 
2026-02-21T09:41:20.2940433Z .b8 111 2026-02-21T09:41:20.2940481Z .b8 111 2026-02-21T09:41:20.2940528Z .b8 116 2026-02-21T09:41:20.2940582Z .b8 47 2026-02-21T09:41:20.2940630Z .b8 108 2026-02-21T09:41:20.2940678Z .b8 107 2026-02-21T09:41:20.2940747Z .b8 0 2026-02-21T09:41:20.2940806Z } 2026-02-21T09:41:20.2940891Z .section .debug_macinfo { } 2026-02-21T09:41:20.2940896Z 2026-02-21T09:41:20.2940971Z ================================================================ 2026-02-21T09:41:20.2941078Z please share the reproducer above with Triton project. 2026-02-21T09:41:20.5795347Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:20.5795361Z 2026-02-21T09:41:20.5798183Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:41:20.5798341Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:20.5798414Z `ptxas` stderr: 2026-02-21T09:41:20.5798753Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:20.5798853Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.5798860Z 2026-02-21T09:41:20.5799278Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpeqn30kud.ptx -o /tmp/tmpeqn30kud.ptx.o 2026-02-21T09:41:20.5799283Z 2026-02-21T09:41:20.5799410Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:41:20.5799417Z 2026-02-21T09:41:20.5799523Z 2026-02-21T09:41:20.5799635Z ================================================================ 2026-02-21T09:41:20.5799712Z Internal Triton PTX codegen error 2026-02-21T09:41:20.5799806Z `ptxas` stderr: 2026-02-21T09:41:20.5800223Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:20.5800558Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.5800586Z 2026-02-21T09:41:20.5804658Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpeqn30kud.ptx -o /tmp/tmpeqn30kud.ptx.o 2026-02-21T09:41:20.5808074Z 2026-02-21T09:41:20.5812037Z 2026-02-21T09:41:20.5816173Z // 2026-02-21T09:41:20.5817610Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:20.5817678Z // 2026-02-21T09:41:20.5817685Z 2026-02-21T09:41:20.5817759Z .version 8.7 2026-02-21T09:41:20.5817817Z .target sm_100a 2026-02-21T09:41:20.5817872Z .address_size 64 2026-02-21T09:41:20.5817875Z 2026-02-21T09:41:20.5818018Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:20.5818120Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:20.5818205Z // @_helion_matmul 2026-02-21T09:41:20.5818276Z .visible .entry _helion_matmul( 2026-02-21T09:41:20.5818395Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:20.5818488Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:20.5818582Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:20.5818680Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:20.5818775Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:20.5818825Z ) 2026-02-21T09:41:20.5818879Z .reqntid 128 2026-02-21T09:41:20.5818938Z .maxnreg 32 2026-02-21T09:41:20.5818986Z 
{ 2026-02-21T09:41:20.5819049Z .reg .pred %p<154>; 2026-02-21T09:41:20.5819112Z .reg .b32 %r<679>; 2026-02-21T09:41:20.5819165Z .reg .b64 %rd<212>; 2026-02-21T09:41:20.5819535Z .loc 1 19 0 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:19:0 2026-02-21T09:41:20.5819635Z $L__func_begin0: 2026-02-21T09:41:20.5819821Z .loc 1 19 0 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:19:0 2026-02-21T09:41:20.5819827Z 2026-02-21T09:41:20.5819880Z // %bb.0: 2026-02-21T09:41:20.5819967Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:20.5820028Z $L__tmp0: 2026-02-21T09:41:20.5820193Z .loc 1 19 0 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:19 2026-02-21T09:41:20.5820250Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:20.5820341Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:20.5820404Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:20.5820465Z mov.b32 %r69, global_smem; 2026-02-21T09:41:20.5820521Z // begin inline asm 2026-02-21T09:41:20.5820678Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r69], 128; 2026-02-21T09:41:20.5820734Z // end inline asm 2026-02-21T09:41:20.5820817Z ld.param.b64 %rd52, [_helion_matmul_param_3]; 2026-02-21T09:41:20.5820881Z bar.sync 0; 2026-02-21T09:41:20.5820953Z ld.shared.b32 %r646, [global_smem]; 2026-02-21T09:41:20.5821012Z bar.sync 0; 2026-02-21T09:41:20.5821073Z // begin inline asm 2026-02-21T09:41:20.5821192Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:20.5821246Z // end inline asm 2026-02-21T09:41:20.5821420Z .loc 1 21 67 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:21:67 2026-02-21T09:41:20.5821482Z mov.u32 %r678, %ctaid.x; 2026-02-21T09:41:20.5821538Z mov.u32 %r219, %ctaid.y; 2026-02-21T09:41:20.5821591Z mov.u32 %r220, %ctaid.z; 2026-02-21T09:41:20.5821662Z mov.u32 %r221, %nctaid.x; 2026-02-21T09:41:20.5825016Z mov.u32 %r222, %nctaid.y; 2026-02-21T09:41:20.5826663Z mad.lo.s32 %r223, %r220, %r222, %r219; 
2026-02-21T09:41:20.5826794Z mad.lo.s32 %r224, %r223, %r221, %r678; 2026-02-21T09:41:20.5826929Z shl.b32 %r225, %r224, 8; 2026-02-21T09:41:20.5827001Z cvt.s64.s32 %rd53, %r225; 2026-02-21T09:41:20.5827112Z add.s64 %rd19, %rd52, %rd53; 2026-02-21T09:41:20.5827224Z shl.b32 %r226, %r1, 2; 2026-02-21T09:41:20.5827494Z add.s32 %r70, %r69, %r226; 2026-02-21T09:41:20.5831033Z mov.b32 %r79, 0; 2026-02-21T09:41:20.5831116Z // begin inline asm 2026-02-21T09:41:20.5831195Z @%p4 st.shared.b32 [ %r70 + 0 ], %r79; 2026-02-21T09:41:20.5831251Z // end inline asm 2026-02-21T09:41:20.5831323Z bar.warp.sync -1; 2026-02-21T09:41:20.5831389Z setp.eq.b32 %p143, %r1, 0; 2026-02-21T09:41:20.5831451Z cvt.u64.u32 %rd4, %r69; 2026-02-21T09:41:20.5831506Z // begin inline asm 2026-02-21T09:41:20.5831706Z @%p143 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:20.5831763Z // end inline asm 2026-02-21T09:41:20.5831818Z // begin inline asm 2026-02-21T09:41:20.5831975Z @%p143 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.5832028Z // end inline asm 2026-02-21T09:41:20.5832081Z mov.b32 %r72, 32; 2026-02-21T09:41:20.5832151Z // begin inline asm 2026-02-21T09:41:20.5832307Z @%p143 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r72; 2026-02-21T09:41:20.5832365Z // end inline asm 2026-02-21T09:41:20.5832417Z mov.b32 %r73, 64; 2026-02-21T09:41:20.5832479Z // begin inline asm 2026-02-21T09:41:20.5832626Z @%p143 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:41:20.5832679Z // end inline asm 2026-02-21T09:41:20.5832741Z mov.b32 %r74, 1024; 2026-02-21T09:41:20.5832796Z // begin inline asm 2026-02-21T09:41:20.5832957Z @%p143 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r74; 2026-02-21T09:41:20.5833015Z // end inline asm 2026-02-21T09:41:20.5833068Z // begin inline asm 2026-02-21T09:41:20.5833224Z @%p143 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r74; 2026-02-21T09:41:20.5833278Z // end inline asm 2026-02-21T09:41:20.5833365Z mov.b64 %rd12, 2048; 2026-02-21T09:41:20.5833580Z // begin inline asm 2026-02-21T09:41:20.5833800Z @%p143 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.5833871Z // end inline asm 2026-02-21T09:41:20.5833926Z mov.b32 %r76, 1; 2026-02-21T09:41:20.5833984Z // begin inline asm 2026-02-21T09:41:20.5834170Z @%p143 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:41:20.5834226Z // end inline asm 2026-02-21T09:41:20.5834373Z // begin inline asm 2026-02-21T09:41:20.5834545Z @%p143 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r76; 2026-02-21T09:41:20.5834597Z // end inline asm 2026-02-21T09:41:20.5834651Z // begin inline asm 2026-02-21T09:41:20.5834872Z @%p143 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.5834927Z // end inline asm 2026-02-21T09:41:20.5834980Z // begin inline asm 2026-02-21T09:41:20.5835152Z @%p143 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.5835215Z // end inline asm 2026-02-21T09:41:20.5835270Z // begin inline asm 2026-02-21T09:41:20.5835422Z @%p143 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.5835483Z // end inline asm 2026-02-21T09:41:20.5835536Z // begin inline asm 2026-02-21T09:41:20.5835679Z @%p143 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.5835739Z // end inline asm 2026-02-21T09:41:20.5835793Z // begin inline asm 2026-02-21T09:41:20.5836053Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.5836107Z // end inline asm 2026-02-21T09:41:20.5836168Z // begin 
inline asm 2026-02-21T09:41:20.5836293Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:20.5836368Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.5836451Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.5836508Z // end inline asm 2026-02-21T09:41:20.5836604Z bar.sync 0; 2026-02-21T09:41:20.5836674Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:20.5836889Z .loc 1 22 68 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:22:68 2026-02-21T09:41:20.5836950Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:20.5837005Z bar.sync 0; 2026-02-21T09:41:20.5837069Z // begin inline asm 2026-02-21T09:41:20.5837135Z @%p4 st.shared.b32 [ %r70 + 0 ], %r79; 2026-02-21T09:41:20.5837188Z // end inline asm 2026-02-21T09:41:20.5837255Z bar.warp.sync -1; 2026-02-21T09:41:20.5837308Z // begin inline asm 2026-02-21T09:41:20.5837475Z @%p143 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:20.5837534Z // end inline asm 2026-02-21T09:41:20.5837589Z // begin inline asm 2026-02-21T09:41:20.5837725Z @%p143 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.5837780Z // end inline asm 2026-02-21T09:41:20.5837842Z // begin inline asm 2026-02-21T09:41:20.5837989Z @%p143 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r72; 2026-02-21T09:41:20.5838045Z // end inline asm 2026-02-21T09:41:20.5838102Z mov.b32 %r81, 128; 2026-02-21T09:41:20.5838154Z // begin inline asm 2026-02-21T09:41:20.5838296Z @%p143 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r81; 2026-02-21T09:41:20.5838354Z // end inline asm 2026-02-21T09:41:20.5838407Z // begin inline asm 2026-02-21T09:41:20.5838560Z @%p143 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r74; 2026-02-21T09:41:20.5838611Z // end inline asm 2026-02-21T09:41:20.5838672Z mov.b32 %r83, 12288; 2026-02-21T09:41:20.5838724Z // begin inline asm 
2026-02-21T09:41:20.5838876Z @%p143 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r83; 2026-02-21T09:41:20.5838934Z // end inline asm 2026-02-21T09:41:20.5838988Z // begin inline asm 2026-02-21T09:41:20.5839204Z @%p143 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.5839268Z // end inline asm 2026-02-21T09:41:20.5839325Z // begin inline asm 2026-02-21T09:41:20.5839493Z @%p143 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r76; 2026-02-21T09:41:20.5839551Z // end inline asm 2026-02-21T09:41:20.5839616Z // begin inline asm 2026-02-21T09:41:20.5839777Z @%p143 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r76; 2026-02-21T09:41:20.5839832Z // end inline asm 2026-02-21T09:41:20.5839900Z // begin inline asm 2026-02-21T09:41:20.5840045Z @%p143 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.5840096Z // end inline asm 2026-02-21T09:41:20.5840155Z // begin inline asm 2026-02-21T09:41:20.5840322Z @%p143 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.5840375Z // end inline asm 2026-02-21T09:41:20.5840436Z // begin inline asm 2026-02-21T09:41:20.5840587Z @%p143 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.5840643Z // end inline asm 2026-02-21T09:41:20.5840697Z // begin inline asm 2026-02-21T09:41:20.5840850Z @%p143 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.5840904Z // end inline asm 2026-02-21T09:41:20.5840958Z // begin inline asm 2026-02-21T09:41:20.5841222Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.5841276Z // end inline asm 2026-02-21T09:41:20.5841329Z // begin inline asm 2026-02-21T09:41:20.5841457Z @%p4 
fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:20.5841527Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.5841597Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.5841649Z // end inline asm 2026-02-21T09:41:20.5841708Z bar.sync 0; 2026-02-21T09:41:20.5841772Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:20.5841974Z .loc 1 40 45 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:40:45 2026-02-21T09:41:20.5842061Z shr.u32 %r227, %r1, 5; 2026-02-21T09:41:20.5842236Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5842295Z sub.s32 %r229, 1536, %r678; 2026-02-21T09:41:20.5842370Z mul.hi.s32 %r230, %r229, -580400985; 2026-02-21T09:41:20.5842430Z add.s32 %r231, %r230, %r229; 2026-02-21T09:41:20.5842486Z shr.u32 %r232, %r231, 31; 2026-02-21T09:41:20.5842543Z shr.s32 %r233, %r231, 12; 2026-02-21T09:41:20.5842607Z add.s32 %r234, %r233, %r232; 2026-02-21T09:41:20.5842668Z mul.lo.s32 %r235, %r234, 4736; 2026-02-21T09:41:20.5842730Z setp.ne.b32 %p74, %r229, %r235; 2026-02-21T09:41:20.5842798Z setp.lt.u32 %p75, %r678, 1537; 2026-02-21T09:41:20.5842862Z and.pred %p76, %p75, %p74; 2026-02-21T09:41:20.5842923Z selp.b32 %r236, 1, 0, %p76; 2026-02-21T09:41:20.5842982Z add.s32 %r14, %r234, %r236; 2026-02-21T09:41:20.5843160Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5843235Z shfl.sync.idx.b32 %r16, %r227, 0, 31, -1; 2026-02-21T09:41:20.5843293Z shl.b32 %r237, %r16, 21; 2026-02-21T09:41:20.5843358Z and.b32 %r238, %r237, 6291456; 2026-02-21T09:41:20.5843414Z add.s32 %r86, %r238, %r646; 2026-02-21T09:41:20.5843472Z mov.pred %p42, -1; 2026-02-21T09:41:20.5843533Z // begin inline asm 2026-02-21T09:41:20.5843800Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r86 + 0], 64, {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:41:20.5843854Z // end inline asm 
2026-02-21T09:41:20.5843908Z // begin inline asm 2026-02-21T09:41:20.5844189Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r86 + 16], 64, {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:41:20.5844265Z // end inline asm 2026-02-21T09:41:20.5844321Z // begin inline asm 2026-02-21T09:41:20.5844581Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r86 + 32], 64, {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:41:20.5844634Z // end inline asm 2026-02-21T09:41:20.5844719Z // begin inline asm 2026-02-21T09:41:20.5844975Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r86 + 48], 64, {%r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79, %r79}; 2026-02-21T09:41:20.5845028Z // end inline asm 2026-02-21T09:41:20.5845081Z // begin inline asm 2026-02-21T09:41:20.5845158Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:20.5845210Z // end inline asm 2026-02-21T09:41:20.5845261Z bar.sync 0; 2026-02-21T09:41:20.5845437Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5845504Z add.s32 %r154, %r69, 90176; 2026-02-21T09:41:20.5845558Z // begin inline asm 2026-02-21T09:41:20.5845647Z @%p143 mbarrier.init.shared::cta.b64 [%r154], 1; 2026-02-21T09:41:20.5845707Z // end inline asm 2026-02-21T09:41:20.5845757Z bar.sync 0; 2026-02-21T09:41:20.5845813Z add.s32 %r155, %r69, 90184; 2026-02-21T09:41:20.5845866Z // begin inline asm 2026-02-21T09:41:20.5845957Z @%p143 mbarrier.init.shared::cta.b64 [%r155], 1; 2026-02-21T09:41:20.5846010Z // end inline asm 2026-02-21T09:41:20.5846067Z add.s32 %r156, %r69, 90112; 2026-02-21T09:41:20.5846129Z // begin inline asm 2026-02-21T09:41:20.5846209Z @%p143 mbarrier.init.shared::cta.b64 [%r156], 1; 2026-02-21T09:41:20.5846263Z // end inline asm 2026-02-21T09:41:20.5846323Z bar.sync 0; 2026-02-21T09:41:20.5846378Z add.s32 %r157, %r69, 90120; 
2026-02-21T09:41:20.5846433Z // begin inline asm 2026-02-21T09:41:20.5846511Z @%p143 mbarrier.init.shared::cta.b64 [%r157], 1; 2026-02-21T09:41:20.5846569Z // end inline asm 2026-02-21T09:41:20.5846622Z bar.sync 0; 2026-02-21T09:41:20.5846679Z add.s32 %r158, %r69, 90128; 2026-02-21T09:41:20.5846769Z // begin inline asm 2026-02-21T09:41:20.5846873Z @%p143 mbarrier.init.shared::cta.b64 [%r158], 1; 2026-02-21T09:41:20.5846925Z // end inline asm 2026-02-21T09:41:20.5846975Z bar.sync 0; 2026-02-21T09:41:20.5847037Z add.s32 %r159, %r69, 90136; 2026-02-21T09:41:20.5847089Z // begin inline asm 2026-02-21T09:41:20.5847167Z @%p143 mbarrier.init.shared::cta.b64 [%r159], 1; 2026-02-21T09:41:20.5847226Z // end inline asm 2026-02-21T09:41:20.5847275Z bar.sync 0; 2026-02-21T09:41:20.5847329Z add.s32 %r160, %r69, 90144; 2026-02-21T09:41:20.5847381Z // begin inline asm 2026-02-21T09:41:20.5847466Z @%p143 mbarrier.init.shared::cta.b64 [%r160], 1; 2026-02-21T09:41:20.5847519Z // end inline asm 2026-02-21T09:41:20.5847570Z bar.sync 0; 2026-02-21T09:41:20.5847629Z add.s32 %r161, %r69, 90152; 2026-02-21T09:41:20.5847683Z // begin inline asm 2026-02-21T09:41:20.5847761Z @%p143 mbarrier.init.shared::cta.b64 [%r161], 1; 2026-02-21T09:41:20.5847823Z // end inline asm 2026-02-21T09:41:20.5847876Z bar.sync 0; 2026-02-21T09:41:20.5847932Z add.s32 %r274, %r69, 90160; 2026-02-21T09:41:20.5847985Z // begin inline asm 2026-02-21T09:41:20.5848072Z @%p143 mbarrier.init.shared::cta.b64 [%r274], 1; 2026-02-21T09:41:20.5848125Z // end inline asm 2026-02-21T09:41:20.5848189Z setp.lt.s32 %p77, %r14, 1; 2026-02-21T09:41:20.5848253Z setp.gt.s32 %p73, %r14, 0; 2026-02-21T09:41:20.5848419Z .loc 1 35 33 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:35:33 2026-02-21T09:41:20.5848478Z shr.u32 %r239, %r678, 4; 2026-02-21T09:41:20.5848540Z and.b32 %r240, %r239, 134217664; 2026-02-21T09:41:20.5848715Z .loc 1 36 39 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:36:39 
2026-02-21T09:41:20.5848774Z sub.s32 %r241, 96, %r240; 2026-02-21T09:41:20.5848965Z .loc 1 36 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:36:52 2026-02-21T09:41:20.5849076Z min.s32 %r242, %r241, 64; 2026-02-21T09:41:20.5849242Z .loc 1 37 45 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:45 2026-02-21T09:41:20.5849301Z and.b32 %r243, %r678, 1023; 2026-02-21T09:41:20.5849472Z .loc 1 38 51 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:38:51 2026-02-21T09:41:20.5849528Z div.s32 %r244, %r243, %r242; 2026-02-21T09:41:20.5849685Z .loc 1 37 64 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:64 2026-02-21T09:41:20.5849753Z mul.lo.s32 %r245, %r244, %r242; 2026-02-21T09:41:20.5849809Z sub.s32 %r246, %r243, %r245; 2026-02-21T09:41:20.5849972Z .loc 1 37 30 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:30 2026-02-21T09:41:20.5850029Z add.s32 %r247, %r246, %r240; 2026-02-21T09:41:20.5850203Z .loc 1 39 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:39:27 2026-02-21T09:41:20.5850260Z shl.b32 %r653, %r247, 7; 2026-02-21T09:41:20.5850464Z .loc 1 41 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:41:27 2026-02-21T09:41:20.5850529Z shl.b32 %r648, %r244, 6; 2026-02-21T09:41:20.5850701Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5850753Z bar.sync 0; 2026-02-21T09:41:20.5850820Z and.pred %p1, %p143, %p73; 2026-02-21T09:41:20.5850874Z // begin inline asm 2026-02-21T09:41:20.5850988Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r156], 12288; 2026-02-21T09:41:20.5851044Z // end inline asm 2026-02-21T09:41:20.5851220Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5851273Z bar.sync 0; 2026-02-21T09:41:20.5851337Z elect.sync %r248|%p78, -1; 2026-02-21T09:41:20.5851406Z and.pred %p79, %p73, %p78; 2026-02-21T09:41:20.5851469Z and.pred %p56, %p4, %p79; 2026-02-21T09:41:20.5851527Z 
add.s32 %r164, %r69, 57344; 2026-02-21T09:41:20.5851616Z // begin inline asm 2026-02-21T09:41:20.5851890Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r164], [%rd40, {%r79, %r648}], [%r156]; 2026-02-21T09:41:20.5851946Z // end inline asm 2026-02-21T09:41:20.5852112Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5852174Z // begin inline asm 2026-02-21T09:41:20.5852247Z fence.proxy.async.shared::cta; 2026-02-21T09:41:20.5852302Z // end inline asm 2026-02-21T09:41:20.5852364Z bar.sync 0; 2026-02-21T09:41:20.5852426Z elect.sync %r249|%p80, -1; 2026-02-21T09:41:20.5852487Z and.pred %p81, %p73, %p80; 2026-02-21T09:41:20.5852555Z and.pred %p57, %p4, %p81; 2026-02-21T09:41:20.5852612Z // begin inline asm 2026-02-21T09:41:20.5852858Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r69], [%rd41, {%r79, %r653}], [%r156]; 2026-02-21T09:41:20.5852916Z // end inline asm 2026-02-21T09:41:20.5853098Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5853154Z bar.sync 0; 2026-02-21T09:41:20.5853210Z // begin inline asm 2026-02-21T09:41:20.5853327Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r157], 12288; 2026-02-21T09:41:20.5853381Z // end inline asm 2026-02-21T09:41:20.5853551Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5853611Z bar.sync 0; 2026-02-21T09:41:20.5859803Z elect.sync %r250|%p82, -1; 2026-02-21T09:41:20.5859871Z and.pred %p83, %p73, %p82; 2026-02-21T09:41:20.5859934Z and.pred %p59, %p4, %p83; 2026-02-21T09:41:20.5859995Z add.s32 %r173, %r69, 61440; 2026-02-21T09:41:20.5860062Z // begin inline asm 2026-02-21T09:41:20.5860428Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r173], [%rd40, {%r72, %r648}], [%r157]; 2026-02-21T09:41:20.5860525Z // end inline asm 2026-02-21T09:41:20.5860719Z .loc 1 52 44 // 
czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5860775Z bar.sync 0; 2026-02-21T09:41:20.5860840Z elect.sync %r251|%p84, -1; 2026-02-21T09:41:20.5860908Z and.pred %p85, %p73, %p84; 2026-02-21T09:41:20.5860969Z and.pred %p60, %p4, %p85; 2026-02-21T09:41:20.5861027Z add.s32 %r177, %r69, 8192; 2026-02-21T09:41:20.5861082Z // begin inline asm 2026-02-21T09:41:20.5861332Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r177], [%rd41, {%r72, %r653}], [%r157]; 2026-02-21T09:41:20.5861409Z // end inline asm 2026-02-21T09:41:20.5861582Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5861642Z bar.sync 0; 2026-02-21T09:41:20.5861696Z // begin inline asm 2026-02-21T09:41:20.5861809Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r158], 12288; 2026-02-21T09:41:20.5861872Z // end inline asm 2026-02-21T09:41:20.5862043Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5862095Z bar.sync 0; 2026-02-21T09:41:20.5862156Z elect.sync %r252|%p86, -1; 2026-02-21T09:41:20.5862225Z and.pred %p87, %p73, %p86; 2026-02-21T09:41:20.5862286Z and.pred %p62, %p4, %p87; 2026-02-21T09:41:20.5862344Z add.s32 %r182, %r69, 65536; 2026-02-21T09:41:20.5862404Z // begin inline asm 2026-02-21T09:41:20.5862635Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r182], [%rd40, {%r73, %r648}], [%r158]; 2026-02-21T09:41:20.5862691Z // end inline asm 2026-02-21T09:41:20.5862863Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5862914Z bar.sync 0; 2026-02-21T09:41:20.5862973Z elect.sync %r253|%p88, -1; 2026-02-21T09:41:20.5863030Z and.pred %p89, %p73, %p88; 2026-02-21T09:41:20.5863098Z and.pred %p63, %p4, %p89; 2026-02-21T09:41:20.5863156Z add.s32 %r186, %r69, 16384; 2026-02-21T09:41:20.5863247Z // begin inline asm 2026-02-21T09:41:20.5863481Z @%p63 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r186], [%rd41, {%r73, %r653}], [%r158]; 2026-02-21T09:41:20.5863535Z // end inline asm 2026-02-21T09:41:20.5863703Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5863761Z bar.sync 0; 2026-02-21T09:41:20.5863814Z // begin inline asm 2026-02-21T09:41:20.5863922Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r159], 12288; 2026-02-21T09:41:20.5863975Z // end inline asm 2026-02-21T09:41:20.5864146Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5864197Z bar.sync 0; 2026-02-21T09:41:20.5864257Z elect.sync %r254|%p90, -1; 2026-02-21T09:41:20.5864324Z and.pred %p91, %p73, %p90; 2026-02-21T09:41:20.5864386Z and.pred %p65, %p4, %p91; 2026-02-21T09:41:20.5864456Z add.s32 %r191, %r69, 69632; 2026-02-21T09:41:20.5864514Z mov.b32 %r192, 96; 2026-02-21T09:41:20.5864581Z // begin inline asm 2026-02-21T09:41:20.5864871Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r191], [%rd40, {%r192, %r648}], [%r159]; 2026-02-21T09:41:20.5864926Z // end inline asm 2026-02-21T09:41:20.5865091Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5865141Z bar.sync 0; 2026-02-21T09:41:20.5865199Z elect.sync %r255|%p92, -1; 2026-02-21T09:41:20.5865348Z and.pred %p93, %p73, %p92; 2026-02-21T09:41:20.5865408Z and.pred %p66, %p4, %p93; 2026-02-21T09:41:20.5865463Z add.s32 %r195, %r69, 24576; 2026-02-21T09:41:20.5865518Z // begin inline asm 2026-02-21T09:41:20.5865754Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r195], [%rd41, {%r192, %r653}], [%r159]; 2026-02-21T09:41:20.5865836Z // end inline asm 2026-02-21T09:41:20.5866044Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5866106Z bar.sync 0; 2026-02-21T09:41:20.5866160Z // begin inline asm 
2026-02-21T09:41:20.5866265Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r160], 12288; 2026-02-21T09:41:20.5866325Z // end inline asm 2026-02-21T09:41:20.5866485Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5866537Z bar.sync 0; 2026-02-21T09:41:20.5866597Z elect.sync %r256|%p94, -1; 2026-02-21T09:41:20.5866662Z and.pred %p95, %p73, %p94; 2026-02-21T09:41:20.5866720Z and.pred %p68, %p4, %p95; 2026-02-21T09:41:20.5866775Z add.s32 %r200, %r69, 73728; 2026-02-21T09:41:20.5866838Z // begin inline asm 2026-02-21T09:41:20.5867065Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r200], [%rd40, {%r81, %r648}], [%r160]; 2026-02-21T09:41:20.5867119Z // end inline asm 2026-02-21T09:41:20.5867285Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5867337Z bar.sync 0; 2026-02-21T09:41:20.5867396Z elect.sync %r257|%p96, -1; 2026-02-21T09:41:20.5867454Z and.pred %p97, %p73, %p96; 2026-02-21T09:41:20.5867518Z and.pred %p69, %p4, %p97; 2026-02-21T09:41:20.5867574Z add.s32 %r204, %r69, 32768; 2026-02-21T09:41:20.5867628Z // begin inline asm 2026-02-21T09:41:20.5867862Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r204], [%rd41, {%r81, %r653}], [%r160]; 2026-02-21T09:41:20.5867917Z // end inline asm 2026-02-21T09:41:20.5868081Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5868138Z bar.sync 0; 2026-02-21T09:41:20.5868193Z // begin inline asm 2026-02-21T09:41:20.5868292Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r161], 12288; 2026-02-21T09:41:20.5868347Z // end inline asm 2026-02-21T09:41:20.5868512Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5868591Z bar.sync 0; 2026-02-21T09:41:20.5868649Z elect.sync %r258|%p98, -1; 2026-02-21T09:41:20.5868714Z and.pred %p99, %p73, %p98; 
2026-02-21T09:41:20.5868771Z and.pred %p71, %p4, %p99; 2026-02-21T09:41:20.5868826Z add.s32 %r209, %r69, 77824; 2026-02-21T09:41:20.5868886Z mov.b32 %r210, 160; 2026-02-21T09:41:20.5868940Z // begin inline asm 2026-02-21T09:41:20.5869175Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r209], [%rd40, {%r210, %r648}], [%r161]; 2026-02-21T09:41:20.5869229Z // end inline asm 2026-02-21T09:41:20.5869391Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5869441Z bar.sync 0; 2026-02-21T09:41:20.5869500Z elect.sync %r259|%p100, -1; 2026-02-21T09:41:20.5869570Z and.pred %p101, %p73, %p100; 2026-02-21T09:41:20.5869630Z and.pred %p72, %p4, %p101; 2026-02-21T09:41:20.5869686Z add.s32 %r213, %r69, 40960; 2026-02-21T09:41:20.5869745Z // begin inline asm 2026-02-21T09:41:20.5869978Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r213], [%rd41, {%r210, %r653}], [%r161]; 2026-02-21T09:41:20.5870029Z // end inline asm 2026-02-21T09:41:20.5870191Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5870247Z bar.sync 0; 2026-02-21T09:41:20.5870300Z // begin inline asm 2026-02-21T09:41:20.5870382Z 2026-02-21T09:41:20.5870439Z { 2026-02-21T09:41:20.5870503Z @!%p73 bra.uni skipWait; 2026-02-21T09:41:20.5870562Z .reg .pred complete; 2026-02-21T09:41:20.5870616Z waitLoop: 2026-02-21T09:41:20.5870735Z mbarrier.try_wait.parity.shared.b64 complete, [%r156], %r79; 2026-02-21T09:41:20.5870796Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.5870848Z skipWait: 2026-02-21T09:41:20.5870925Z } 2026-02-21T09:41:20.5870931Z 2026-02-21T09:41:20.5871029Z // end inline asm 2026-02-21T09:41:20.5871195Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5871260Z setp.ne.b32 %p102, %r16, 0; 2026-02-21T09:41:20.5871319Z or.pred %p103, %p77, %p102; 2026-02-21T09:41:20.5871376Z @%p103 bra 
$L__BB0_2; 2026-02-21T09:41:20.5871426Z // %bb.1: 2026-02-21T09:41:20.5871494Z elect.sync %r264|%p105, -1; 2026-02-21T09:41:20.5871557Z bfe.u32 %r267, %r164, 4, 14; 2026-02-21T09:41:20.5871615Z cvt.u64.u32 %rd59, %r267; 2026-02-21T09:41:20.5871695Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:20.5871752Z bfe.u32 %r268, %r69, 4, 14; 2026-02-21T09:41:20.5871808Z cvt.u64.u32 %rd60, %r268; 2026-02-21T09:41:20.5871876Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:20.5871940Z mov.b32 %r261, 69206032; 2026-02-21T09:41:20.5871997Z mov.pred %p104, 0; 2026-02-21T09:41:20.5872054Z // begin inline asm 2026-02-21T09:41:20.5872212Z @%p105 tcgen05.mma.cta_group::1.kind::f16 [ %r646 + 0 ], %rd54, %rd55, %r261, %p104; 2026-02-21T09:41:20.5872268Z // end inline asm 2026-02-21T09:41:20.5872322Z add.s32 %r269, %r69, 57376; 2026-02-21T09:41:20.5872388Z bfe.u32 %r270, %r269, 4, 14; 2026-02-21T09:41:20.5872445Z cvt.u64.u32 %rd61, %r270; 2026-02-21T09:41:20.5872512Z or.b64 %rd56, %rd61, -9223371899399045120; 2026-02-21T09:41:20.5872568Z add.s32 %r271, %r69, 32; 2026-02-21T09:41:20.5872634Z bfe.u32 %r272, %r271, 4, 14; 2026-02-21T09:41:20.5872692Z cvt.u64.u32 %rd62, %r272; 2026-02-21T09:41:20.5872762Z or.b64 %rd57, %rd62, -9223371899382267904; 2026-02-21T09:41:20.5872827Z // begin inline asm 2026-02-21T09:41:20.5872962Z @%p105 tcgen05.mma.cta_group::1.kind::f16 [ %r646 + 0 ], %rd56, %rd57, %r261, %p42; 2026-02-21T09:41:20.5873016Z // end inline asm 2026-02-21T09:41:20.5873077Z add.s32 %r273, %r69, 90176; 2026-02-21T09:41:20.5873132Z cvt.u64.u32 %rd58, %r273; 2026-02-21T09:41:20.5873187Z // begin inline asm 2026-02-21T09:41:20.5873313Z @%p105 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd58]; 2026-02-21T09:41:20.5873397Z // end inline asm 2026-02-21T09:41:20.5873448Z $L__BB0_2: 2026-02-21T09:41:20.5873620Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5873678Z bar.sync 0; 
2026-02-21T09:41:20.5873731Z // begin inline asm 2026-02-21T09:41:20.5873833Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r274], 12288; 2026-02-21T09:41:20.5873885Z // end inline asm 2026-02-21T09:41:20.5874054Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5874108Z bar.sync 0; 2026-02-21T09:41:20.5874167Z elect.sync %r284|%p115, -1; 2026-02-21T09:41:20.5874233Z and.pred %p116, %p73, %p115; 2026-02-21T09:41:20.5874291Z and.pred %p110, %p4, %p116; 2026-02-21T09:41:20.5874344Z add.s32 %r275, %r69, 81920; 2026-02-21T09:41:20.5874403Z mov.b32 %r663, 192; 2026-02-21T09:41:20.5874458Z // begin inline asm 2026-02-21T09:41:20.5874727Z @%p110 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r275], [%rd40, {%r663, %r648}], [%r274]; 2026-02-21T09:41:20.5874782Z // end inline asm 2026-02-21T09:41:20.5874948Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5875000Z bar.sync 0; 2026-02-21T09:41:20.5875060Z elect.sync %r285|%p117, -1; 2026-02-21T09:41:20.5875128Z and.pred %p118, %p73, %p117; 2026-02-21T09:41:20.5875187Z and.pred %p111, %p4, %p118; 2026-02-21T09:41:20.5875270Z add.s32 %r279, %r69, 49152; 2026-02-21T09:41:20.5875330Z // begin inline asm 2026-02-21T09:41:20.5875566Z @%p111 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r279], [%rd41, {%r663, %r653}], [%r274]; 2026-02-21T09:41:20.5875619Z // end inline asm 2026-02-21T09:41:20.5875817Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5875908Z @%p77 bra $L__BB0_12; 2026-02-21T09:41:20.5875989Z // %bb.3: // %.lr.ph 2026-02-21T09:41:20.5876158Z .loc 1 0 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:0:108 2026-02-21T09:41:20.5876222Z and.b32 %r4, %r1, 15; 2026-02-21T09:41:20.5876279Z shr.u32 %r228, %r1, 4; 2026-02-21T09:41:20.5876336Z bfe.u32 %r6, %r1, 4, 3; 2026-02-21T09:41:20.5876425Z 
ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:20.5876479Z shl.b32 %r5, %r4, 3; 2026-02-21T09:41:20.5876533Z or.b32 %r7, %r6, 8; 2026-02-21T09:41:20.5876585Z or.b32 %r8, %r6, 16; 2026-02-21T09:41:20.5876645Z or.b32 %r9, %r6, 24; 2026-02-21T09:41:20.5876698Z or.b32 %r10, %r6, 32; 2026-02-21T09:41:20.5876750Z or.b32 %r11, %r6, 40; 2026-02-21T09:41:20.5876809Z or.b32 %r12, %r6, 48; 2026-02-21T09:41:20.5876863Z or.b32 %r13, %r228, 56; 2026-02-21T09:41:20.5876919Z shl.b32 %r15, %r14, 5; 2026-02-21T09:41:20.5876976Z add.s32 %r20, %r15, -7; 2026-02-21T09:41:20.5877038Z shl.b32 %r295, %r1, 9; 2026-02-21T09:41:20.5877092Z and.b32 %r296, %r295, 3072; 2026-02-21T09:41:20.5877145Z shl.b32 %r297, %r4, 4; 2026-02-21T09:41:20.5877204Z and.b32 %r298, %r1, 96; 2026-02-21T09:41:20.5877259Z shl.b32 %r299, %r298, 3; 2026-02-21T09:41:20.5877313Z and.b32 %r301, %r226, 64; 2026-02-21T09:41:20.5877367Z or.b32 %r302, %r297, %r299; 2026-02-21T09:41:20.5877429Z xor.b32 %r303, %r302, %r301; 2026-02-21T09:41:20.5877483Z or.b32 %r304, %r303, %r296; 2026-02-21T09:41:20.5877535Z add.s32 %r306, %r69, 86016; 2026-02-21T09:41:20.5877598Z add.s32 %r21, %r306, %r304; 2026-02-21T09:41:20.5877651Z xor.b32 %r307, %r304, 32; 2026-02-21T09:41:20.5877705Z add.s32 %r22, %r306, %r307; 2026-02-21T09:41:20.5877757Z shl.b32 %r308, %r1, 5; 2026-02-21T09:41:20.5877818Z and.b32 %r309, %r308, 3168; 2026-02-21T09:41:20.5877870Z shl.b32 %r310, %r1, 4; 2026-02-21T09:41:20.5877925Z and.b32 %r311, %r310, 384; 2026-02-21T09:41:20.5877988Z and.b32 %r312, %r226, 16; 2026-02-21T09:41:20.5878070Z or.b32 %r313, %r309, %r311; 2026-02-21T09:41:20.5878125Z xor.b32 %r314, %r313, %r298; 2026-02-21T09:41:20.5878189Z add.s32 %r315, %r306, %r312; 2026-02-21T09:41:20.5878243Z add.s32 %r454, %r315, %r314; 2026-02-21T09:41:20.5878298Z add.s32 %r459, %r454, 512; 2026-02-21T09:41:20.5878470Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5878531Z max.s32 %r316, %r15, 2; 
2026-02-21T09:41:20.5878588Z add.s32 %r25, %r316, -1; 2026-02-21T09:41:20.5878646Z mov.pred %p153, -1; 2026-02-21T09:41:20.5878704Z mov.b32 %r668, 6; 2026-02-21T09:41:20.5878757Z mov.b32 %r664, 0; 2026-02-21T09:41:20.5878808Z mov.b32 %r662, 1; 2026-02-21T09:41:20.5878860Z mov.b32 %r661, 2; 2026-02-21T09:41:20.5878920Z mov.b32 %r660, 3; 2026-02-21T09:41:20.5878971Z mov.b32 %r659, 4; 2026-02-21T09:41:20.5879021Z mov.b32 %r658, 5; 2026-02-21T09:41:20.5879084Z mov.b32 %r649, %r648; 2026-02-21T09:41:20.5879140Z mov.b32 %r650, %r648; 2026-02-21T09:41:20.5879194Z mov.b32 %r651, %r648; 2026-02-21T09:41:20.5879247Z mov.b32 %r652, %r648; 2026-02-21T09:41:20.5879308Z mov.b32 %r654, %r653; 2026-02-21T09:41:20.5879360Z mov.b32 %r655, %r653; 2026-02-21T09:41:20.5879413Z mov.b32 %r656, %r653; 2026-02-21T09:41:20.5879473Z mov.b32 %r657, %r653; 2026-02-21T09:41:20.5879526Z mov.b32 %r665, %r154; 2026-02-21T09:41:20.5879579Z mov.b32 %r666, %r664; 2026-02-21T09:41:20.5879632Z mov.b32 %r667, %r664; 2026-02-21T09:41:20.5879696Z mov.b32 %r669, %r662; 2026-02-21T09:41:20.5879774Z mov.b32 %r670, %r664; 2026-02-21T09:41:20.5879829Z mov.b32 %r671, %r648; 2026-02-21T09:41:20.5879888Z mov.b32 %r672, %r653; 2026-02-21T09:41:20.5879940Z mov.b32 %r674, %r668; 2026-02-21T09:41:20.5880122Z mov.b32 %r675, %r664; 2026-02-21T09:41:20.5880173Z mov.b32 %r676, %r672; 2026-02-21T09:41:20.5880232Z mov.b32 %r677, %r671; 2026-02-21T09:41:20.5880304Z bra.uni $L__BB0_4; 2026-02-21T09:41:20.5880428Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.5880606Z .loc 1 0 0 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:0 2026-02-21T09:41:20.5880673Z selp.b32 %r669, 0, %r368, %p136; 2026-02-21T09:41:20.5880734Z selp.b32 %r369, 1, 0, %p136; 2026-02-21T09:41:20.5880803Z xor.b32 %r670, %r636, %r369; 2026-02-21T09:41:20.5880971Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5881028Z add.s32 %r675, %r675, 1; 2026-02-21T09:41:20.5881092Z 
setp.ne.b32 %p142, %r25, %r675; 2026-02-21T09:41:20.5881152Z mov.b32 %r648, %r671; 2026-02-21T09:41:20.5881205Z mov.b32 %r652, %r29; 2026-02-21T09:41:20.5881257Z mov.b32 %r653, %r672; 2026-02-21T09:41:20.5881317Z mov.b32 %r657, %r34; 2026-02-21T09:41:20.5881370Z mov.b32 %r658, %r674; 2026-02-21T09:41:20.5881423Z mov.b32 %r662, %r39; 2026-02-21T09:41:20.5881477Z mov.b32 %r664, %r636; 2026-02-21T09:41:20.5881537Z mov.b32 %r665, %r635; 2026-02-21T09:41:20.5881592Z mov.b32 %r671, %r677; 2026-02-21T09:41:20.5881643Z mov.b32 %r672, %r676; 2026-02-21T09:41:20.5881702Z mov.b32 %r674, %r54; 2026-02-21T09:41:20.5881757Z @%p142 bra $L__BB0_4; 2026-02-21T09:41:20.5881809Z bra.uni $L__BB0_11; 2026-02-21T09:41:20.5881914Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:20.5882082Z .loc 1 0 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:0:108 2026-02-21T09:41:20.5882135Z mov.b32 %r636, %r670; 2026-02-21T09:41:20.5882191Z mov.b32 %r39, %r661; 2026-02-21T09:41:20.5882252Z mov.b32 %r661, %r660; 2026-02-21T09:41:20.5882303Z mov.b32 %r660, %r659; 2026-02-21T09:41:20.5882355Z mov.b32 %r659, %r658; 2026-02-21T09:41:20.5882415Z mov.b32 %r34, %r656; 2026-02-21T09:41:20.5882467Z mov.b32 %r656, %r655; 2026-02-21T09:41:20.5882519Z mov.b32 %r655, %r654; 2026-02-21T09:41:20.5882573Z mov.b32 %r654, %r653; 2026-02-21T09:41:20.5882634Z mov.b32 %r29, %r651; 2026-02-21T09:41:20.5882708Z mov.b32 %r651, %r650; 2026-02-21T09:41:20.5882761Z mov.b32 %r650, %r649; 2026-02-21T09:41:20.5882819Z mov.b32 %r649, %r648; 2026-02-21T09:41:20.5882989Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5883045Z add.s32 %r317, %r674, 1; 2026-02-21T09:41:20.5883105Z setp.eq.b32 %p120, %r674, 31; 2026-02-21T09:41:20.5883172Z selp.b32 %r54, 0, %r317, %p120; 2026-02-21T09:41:20.5883229Z setp.ne.b32 %p121, %r54, 0; 2026-02-21T09:41:20.5883285Z @%p121 bra $L__BB0_6; 2026-02-21T09:41:20.5883389Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 
2026-02-21T09:41:20.5883444Z add.s32 %r678, %r678, 4736; 2026-02-21T09:41:20.5883612Z .loc 1 34 35 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:34:35 2026-02-21T09:41:20.5883673Z shr.s32 %r318, %r678, 31; 2026-02-21T09:41:20.5883728Z shr.u32 %r319, %r318, 22; 2026-02-21T09:41:20.5883786Z add.s32 %r320, %r678, %r319; 2026-02-21T09:41:20.5883841Z shr.s32 %r321, %r320, 10; 2026-02-21T09:41:20.5884015Z .loc 1 35 33 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:35:33 2026-02-21T09:41:20.5884071Z shl.b32 %r322, %r321, 6; 2026-02-21T09:41:20.5884236Z .loc 1 36 39 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:36:39 2026-02-21T09:41:20.5884300Z sub.s32 %r323, 96, %r322; 2026-02-21T09:41:20.5884464Z .loc 1 36 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:36:52 2026-02-21T09:41:20.5884543Z min.s32 %r324, %r323, 64; 2026-02-21T09:41:20.5884745Z .loc 1 37 45 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:45 2026-02-21T09:41:20.5884803Z and.b32 %r325, %r320, -1024; 2026-02-21T09:41:20.5884858Z sub.s32 %r326, %r678, %r325; 2026-02-21T09:41:20.5885095Z .loc 1 38 51 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:38:51 2026-02-21T09:41:20.5885155Z div.s32 %r327, %r326, %r324; 2026-02-21T09:41:20.5885325Z .loc 1 37 64 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:64 2026-02-21T09:41:20.5885385Z mul.lo.s32 %r328, %r327, %r324; 2026-02-21T09:41:20.5885449Z sub.s32 %r329, %r326, %r328; 2026-02-21T09:41:20.5885619Z .loc 1 37 30 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:37:30 2026-02-21T09:41:20.5885675Z add.s32 %r330, %r329, %r322; 2026-02-21T09:41:20.5885845Z .loc 1 39 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:39:27 2026-02-21T09:41:20.5885902Z shl.b32 %r676, %r330, 7; 2026-02-21T09:41:20.5886065Z .loc 1 41 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:41:27 2026-02-21T09:41:20.5886128Z shl.b32 %r677, %r327, 6; 
2026-02-21T09:41:20.5886224Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.5886397Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5886453Z add.s32 %r333, %r667, 1; 2026-02-21T09:41:20.5886520Z setp.gt.s32 %p123, %r333, 6; 2026-02-21T09:41:20.5886582Z selp.b32 %r667, 0, %r333, %p123; 2026-02-21T09:41:20.5886638Z selp.b32 %r334, 1, 0, %p123; 2026-02-21T09:41:20.5886701Z xor.b32 %r666, %r666, %r334; 2026-02-21T09:41:20.5886756Z shl.b32 %r335, %r667, 3; 2026-02-21T09:41:20.5886812Z add.s32 %r337, %r69, %r335; 2026-02-21T09:41:20.5886876Z add.s32 %r331, %r337, 90112; 2026-02-21T09:41:20.5886932Z bar.sync 0; 2026-02-21T09:41:20.5886988Z // begin inline asm 2026-02-21T09:41:20.5887038Z 2026-02-21T09:41:20.5887095Z { 2026-02-21T09:41:20.5887153Z .reg .pred complete; 2026-02-21T09:41:20.5887208Z waitLoop: 2026-02-21T09:41:20.5887334Z mbarrier.try_wait.parity.shared.b64 complete, [%r331], %r666; 2026-02-21T09:41:20.5887399Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.5887451Z } 2026-02-21T09:41:20.5887482Z 2026-02-21T09:41:20.5887538Z // end inline asm 2026-02-21T09:41:20.5887604Z shl.b32 %r338, %r669, 3; 2026-02-21T09:41:20.5887660Z add.s32 %r339, %r69, %r338; 2026-02-21T09:41:20.5887716Z add.s32 %r635, %r339, 90176; 2026-02-21T09:41:20.5887882Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5887937Z @%p102 bra $L__BB0_8; 2026-02-21T09:41:20.5888026Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.5888192Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5888246Z shl.b32 %r344, %r667, 12; 2026-02-21T09:41:20.5888300Z add.s32 %r346, %r69, %r344; 2026-02-21T09:41:20.5888354Z add.s32 %r347, %r346, 57344; 2026-02-21T09:41:20.5888521Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5888577Z shl.b32 %r348, %r667, 13; 
2026-02-21T09:41:20.5888634Z add.s32 %r349, %r69, %r348; 2026-02-21T09:41:20.5888797Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5888857Z elect.sync %r350|%p125, -1; 2026-02-21T09:41:20.5888912Z bfe.u32 %r351, %r347, 4, 14; 2026-02-21T09:41:20.5888975Z cvt.u64.u32 %rd70, %r351; 2026-02-21T09:41:20.5889045Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:20.5889099Z bfe.u32 %r352, %r349, 4, 14; 2026-02-21T09:41:20.5889154Z cvt.u64.u32 %rd71, %r352; 2026-02-21T09:41:20.5889263Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:20.5889319Z mov.b32 %r341, 69206032; 2026-02-21T09:41:20.5889374Z // begin inline asm 2026-02-21T09:41:20.5889524Z @%p125 tcgen05.mma.cta_group::1.kind::f16 [ %r646 + 0 ], %rd65, %rd66, %r341, %p153; 2026-02-21T09:41:20.5889577Z // end inline asm 2026-02-21T09:41:20.5889667Z add.s32 %r353, %r346, 57376; 2026-02-21T09:41:20.5889745Z bfe.u32 %r354, %r353, 4, 14; 2026-02-21T09:41:20.5889811Z cvt.u64.u32 %rd72, %r354; 2026-02-21T09:41:20.5889877Z or.b64 %rd67, %rd72, -9223371899399045120; 2026-02-21T09:41:20.5889931Z add.s32 %r355, %r349, 32; 2026-02-21T09:41:20.5889993Z bfe.u32 %r356, %r355, 4, 14; 2026-02-21T09:41:20.5890049Z cvt.u64.u32 %rd73, %r356; 2026-02-21T09:41:20.5890112Z or.b64 %rd68, %rd73, -9223371899382267904; 2026-02-21T09:41:20.5890178Z mov.pred %p126, -1; 2026-02-21T09:41:20.5890232Z // begin inline asm 2026-02-21T09:41:20.5890366Z @%p125 tcgen05.mma.cta_group::1.kind::f16 [ %r646 + 0 ], %rd67, %rd68, %r341, %p126; 2026-02-21T09:41:20.5890420Z // end inline asm 2026-02-21T09:41:20.5890483Z cvt.u64.u32 %rd69, %r635; 2026-02-21T09:41:20.5890537Z // begin inline asm 2026-02-21T09:41:20.5890661Z @%p125 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd69]; 2026-02-21T09:41:20.5890720Z // end inline asm 2026-02-21T09:41:20.5890815Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.5890989Z .loc 1 28 108 // 
czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5891056Z setp.eq.b32 %p132, %r54, 0; 2026-02-21T09:41:20.5891118Z setp.lt.s32 %p133, %r675, %r20; 2026-02-21T09:41:20.5891285Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5891339Z // begin inline asm 2026-02-21T09:41:20.5891397Z 2026-02-21T09:41:20.5891445Z { 2026-02-21T09:41:20.5891504Z .reg .pred complete; 2026-02-21T09:41:20.5891564Z waitLoop: 2026-02-21T09:41:20.5891679Z mbarrier.try_wait.parity.shared.b64 complete, [%r665], %r664; 2026-02-21T09:41:20.5891740Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.5891788Z } 2026-02-21T09:41:20.5891792Z 2026-02-21T09:41:20.5891851Z // end inline asm 2026-02-21T09:41:20.5891906Z add.s32 %r368, %r669, 1; 2026-02-21T09:41:20.5891964Z setp.gt.s32 %p136, %r368, 1; 2026-02-21T09:41:20.5892144Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5892222Z add.s32 %r370, %r663, 32; 2026-02-21T09:41:20.5892277Z add.s32 %r371, %r668, 1; 2026-02-21T09:41:20.5892339Z setp.gt.s32 %p137, %r371, 6; 2026-02-21T09:41:20.5892400Z selp.b32 %r668, 0, %r371, %p137; 2026-02-21T09:41:20.5892460Z selp.b32 %r663, 0, %r370, %p132; 2026-02-21T09:41:20.5892514Z shl.b32 %r372, %r668, 3; 2026-02-21T09:41:20.5892574Z add.s32 %r374, %r69, %r372; 2026-02-21T09:41:20.5892628Z add.s32 %r363, %r374, 90112; 2026-02-21T09:41:20.5892682Z bar.sync 0; 2026-02-21T09:41:20.5892750Z and.pred %p129, %p143, %p133; 2026-02-21T09:41:20.5892803Z // begin inline asm 2026-02-21T09:41:20.5892916Z @%p129 mbarrier.arrive.expect_tx.shared.b64 _, [%r363], 12288; 2026-02-21T09:41:20.5892968Z // end inline asm 2026-02-21T09:41:20.5893142Z .loc 1 51 31 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:51:31 2026-02-21T09:41:20.5893198Z shl.b32 %r375, %r668, 12; 2026-02-21T09:41:20.5893254Z add.s32 %r376, %r69, %r375; 2026-02-21T09:41:20.5893316Z add.s32 %r360, %r376, 57344; 
2026-02-21T09:41:20.5893368Z bar.sync 0; 2026-02-21T09:41:20.5893428Z elect.sync %r377|%p138, -1; 2026-02-21T09:41:20.5893496Z and.pred %p139, %p133, %p138; 2026-02-21T09:41:20.5893555Z and.pred %p130, %p4, %p139; 2026-02-21T09:41:20.5893608Z // begin inline asm 2026-02-21T09:41:20.5893849Z @%p130 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r360], [%rd40, {%r663, %r677}], [%r363]; 2026-02-21T09:41:20.5893937Z // end inline asm 2026-02-21T09:41:20.5894105Z .loc 1 52 44 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:52:44 2026-02-21T09:41:20.5894161Z shl.b32 %r378, %r668, 13; 2026-02-21T09:41:20.5894225Z add.s32 %r364, %r69, %r378; 2026-02-21T09:41:20.5894279Z bar.sync 0; 2026-02-21T09:41:20.5894341Z elect.sync %r379|%p140, -1; 2026-02-21T09:41:20.5894430Z and.pred %p141, %p133, %p140; 2026-02-21T09:41:20.5894524Z and.pred %p131, %p4, %p141; 2026-02-21T09:41:20.5894582Z // begin inline asm 2026-02-21T09:41:20.5894865Z @%p131 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r364], [%rd41, {%r663, %r676}], [%r363]; 2026-02-21T09:41:20.5894930Z // end inline asm 2026-02-21T09:41:20.5895103Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5895165Z setp.ne.b32 %p153, %r662, 31; 2026-02-21T09:41:20.5895235Z @%p153 bra $L__BB0_10; 2026-02-21T09:41:20.5895332Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.5895499Z .loc 1 40 32 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:40:32 2026-02-21T09:41:20.5895570Z add.s32 %r522, %r657, %r5; 2026-02-21T09:41:20.5895743Z .loc 1 42 32 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:42:32 2026-02-21T09:41:20.5895813Z add.s32 %r523, %r652, %r6; 2026-02-21T09:41:20.5895874Z add.s32 %r524, %r7, %r652; 2026-02-21T09:41:20.5895939Z add.s32 %r525, %r8, %r652; 2026-02-21T09:41:20.5895994Z add.s32 %r526, %r9, %r652; 2026-02-21T09:41:20.5896052Z add.s32 %r527, %r10, %r652; 
2026-02-21T09:41:20.5896116Z add.s32 %r528, %r11, %r652; 2026-02-21T09:41:20.5896177Z add.s32 %r529, %r12, %r652; 2026-02-21T09:41:20.5896237Z add.s32 %r530, %r652, %r13; 2026-02-21T09:41:20.5896417Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5896478Z // begin inline asm 2026-02-21T09:41:20.5896531Z 2026-02-21T09:41:20.5896583Z { 2026-02-21T09:41:20.5896652Z .reg .pred complete; 2026-02-21T09:41:20.5896706Z waitLoop: 2026-02-21T09:41:20.5896826Z mbarrier.try_wait.parity.shared.b64 complete, [%r635], %r636; 2026-02-21T09:41:20.5896897Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.5896947Z } 2026-02-21T09:41:20.5896951Z 2026-02-21T09:41:20.5897007Z // end inline asm 2026-02-21T09:41:20.5897175Z .loc 1 56 53 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:56:53 2026-02-21T09:41:20.5897286Z mad.lo.s32 %r531, %r523, 12288, %r522; 2026-02-21T09:41:20.5897354Z mad.lo.s32 %r532, %r524, 12288, %r522; 2026-02-21T09:41:20.5897417Z mad.lo.s32 %r533, %r525, 12288, %r522; 2026-02-21T09:41:20.5897488Z mad.lo.s32 %r534, %r526, 12288, %r522; 2026-02-21T09:41:20.5897550Z mad.lo.s32 %r535, %r527, 12288, %r522; 2026-02-21T09:41:20.5897612Z mad.lo.s32 %r536, %r528, 12288, %r522; 2026-02-21T09:41:20.5897682Z mad.lo.s32 %r537, %r529, 12288, %r522; 2026-02-21T09:41:20.5897744Z mad.lo.s32 %r538, %r530, 12288, %r522; 2026-02-21T09:41:20.5897911Z .loc 1 56 24 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:56:24 2026-02-21T09:41:20.5897978Z mad.wide.s32 %rd76, %r531, 2, %rd3; 2026-02-21T09:41:20.5898051Z mad.wide.s32 %rd77, %r532, 2, %rd3; 2026-02-21T09:41:20.5898116Z mad.wide.s32 %rd78, %r533, 2, %rd3; 2026-02-21T09:41:20.5898180Z mad.wide.s32 %rd79, %r534, 2, %rd3; 2026-02-21T09:41:20.5898251Z mad.wide.s32 %rd80, %r535, 2, %rd3; 2026-02-21T09:41:20.5898311Z mad.wide.s32 %rd81, %r536, 2, %rd3; 2026-02-21T09:41:20.5898372Z mad.wide.s32 %rd82, %r537, 2, %rd3; 2026-02-21T09:41:20.5898440Z mad.wide.s32 %rd83, 
%r538, 2, %rd3; 2026-02-21T09:41:20.5898608Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5898665Z // begin inline asm 2026-02-21T09:41:20.5898964Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r382, %r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397}, [%r86 + 0], 64; 2026-02-21T09:41:20.5899058Z // end inline asm 2026-02-21T09:41:20.5899115Z // begin inline asm 2026-02-21T09:41:20.5899439Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r399, %r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414}, [%r86 + 16], 64; 2026-02-21T09:41:20.5899534Z // end inline asm 2026-02-21T09:41:20.5899591Z // begin inline asm 2026-02-21T09:41:20.5899888Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r416, %r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431}, [%r86 + 32], 64; 2026-02-21T09:41:20.5899950Z // end inline asm 2026-02-21T09:41:20.5900007Z // begin inline asm 2026-02-21T09:41:20.5900298Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r433, %r434, %r435, %r436, %r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448}, [%r86 + 48], 64; 2026-02-21T09:41:20.5900360Z // end inline asm 2026-02-21T09:41:20.5900415Z // begin inline asm 2026-02-21T09:41:20.5900485Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:20.5900538Z // end inline asm 2026-02-21T09:41:20.5900606Z cvt.u64.u32 %rd84, %r382; 2026-02-21T09:41:20.5900665Z cvt.u64.u32 %rd85, %r383; 2026-02-21T09:41:20.5900722Z shl.b64 %rd86, %rd85, 32; 2026-02-21T09:41:20.5900787Z or.b64 %rd87, %rd84, %rd86; 2026-02-21T09:41:20.5900966Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5901030Z mov.b64 {%r539, %r540}, %rd87; 2026-02-21T09:41:20.5901099Z cvt.rn.f16x2.f32 %r541, %r540, %r539; 2026-02-21T09:41:20.5901276Z .loc 1 53 52 // 
czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5901333Z cvt.u64.u32 %rd88, %r384; 2026-02-21T09:41:20.5901390Z cvt.u64.u32 %rd89, %r385; 2026-02-21T09:41:20.5901454Z shl.b64 %rd90, %rd89, 32; 2026-02-21T09:41:20.5901514Z or.b64 %rd91, %rd88, %rd90; 2026-02-21T09:41:20.5901683Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5901750Z mov.b64 {%r542, %r543}, %rd91; 2026-02-21T09:41:20.5901817Z cvt.rn.f16x2.f32 %r544, %r543, %r542; 2026-02-21T09:41:20.5901989Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5902054Z cvt.u64.u32 %rd92, %r386; 2026-02-21T09:41:20.5902134Z cvt.u64.u32 %rd93, %r387; 2026-02-21T09:41:20.5902191Z shl.b64 %rd94, %rd93, 32; 2026-02-21T09:41:20.5902248Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T09:41:20.5902425Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5902486Z mov.b64 {%r545, %r546}, %rd95; 2026-02-21T09:41:20.5902560Z cvt.rn.f16x2.f32 %r547, %r546, %r545; 2026-02-21T09:41:20.5902730Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5902786Z cvt.u64.u32 %rd96, %r388; 2026-02-21T09:41:20.5902841Z cvt.u64.u32 %rd97, %r389; 2026-02-21T09:41:20.5902896Z shl.b64 %rd98, %rd97, 32; 2026-02-21T09:41:20.5902957Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T09:41:20.5903118Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5903176Z mov.b64 {%r548, %r549}, %rd99; 2026-02-21T09:41:20.5903247Z cvt.rn.f16x2.f32 %r550, %r549, %r548; 2026-02-21T09:41:20.5903405Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5903463Z cvt.u64.u32 %rd100, %r390; 2026-02-21T09:41:20.5903526Z cvt.u64.u32 %rd101, %r391; 2026-02-21T09:41:20.5903582Z shl.b64 %rd102, %rd101, 32; 2026-02-21T09:41:20.5903638Z or.b64 %rd103, %rd100, %rd102; 
2026-02-21T09:41:20.5903801Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5903889Z mov.b64 {%r551, %r552}, %rd103; 2026-02-21T09:41:20.5903951Z cvt.rn.f16x2.f32 %r553, %r552, %r551; 2026-02-21T09:41:20.5904113Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5904177Z cvt.u64.u32 %rd104, %r392; 2026-02-21T09:41:20.5904257Z cvt.u64.u32 %rd105, %r393; 2026-02-21T09:41:20.5904339Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:41:20.5904409Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T09:41:20.5904572Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5904631Z mov.b64 {%r554, %r555}, %rd107; 2026-02-21T09:41:20.5904727Z cvt.rn.f16x2.f32 %r556, %r555, %r554; 2026-02-21T09:41:20.5904899Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5904956Z cvt.u64.u32 %rd108, %r394; 2026-02-21T09:41:20.5905013Z cvt.u64.u32 %rd109, %r395; 2026-02-21T09:41:20.5905079Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:41:20.5905135Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:41:20.5905302Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5905366Z mov.b64 {%r557, %r558}, %rd111; 2026-02-21T09:41:20.5905426Z cvt.rn.f16x2.f32 %r559, %r558, %r557; 2026-02-21T09:41:20.5905592Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5905650Z cvt.u64.u32 %rd112, %r396; 2026-02-21T09:41:20.5905714Z cvt.u64.u32 %rd113, %r397; 2026-02-21T09:41:20.5905770Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:41:20.5905826Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:41:20.5905998Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5906055Z mov.b64 {%r560, %r561}, %rd115; 2026-02-21T09:41:20.5906115Z cvt.rn.f16x2.f32 %r562, %r561, %r560; 2026-02-21T09:41:20.5906284Z .loc 1 53 52 // 
czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5906341Z cvt.u64.u32 %rd116, %r399; 2026-02-21T09:41:20.5906397Z cvt.u64.u32 %rd117, %r400; 2026-02-21T09:41:20.5906452Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:41:20.5906515Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:41:20.5906681Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5906764Z mov.b64 {%r563, %r564}, %rd119; 2026-02-21T09:41:20.5906833Z cvt.rn.f16x2.f32 %r565, %r564, %r563; 2026-02-21T09:41:20.5906991Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5907050Z cvt.u64.u32 %rd120, %r401; 2026-02-21T09:41:20.5907112Z cvt.u64.u32 %rd121, %r402; 2026-02-21T09:41:20.5907167Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:41:20.5907223Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:41:20.5907384Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5907449Z mov.b64 {%r566, %r567}, %rd123; 2026-02-21T09:41:20.5907509Z cvt.rn.f16x2.f32 %r568, %r567, %r566; 2026-02-21T09:41:20.5907667Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5907733Z cvt.u64.u32 %rd124, %r403; 2026-02-21T09:41:20.5907790Z cvt.u64.u32 %rd125, %r404; 2026-02-21T09:41:20.5907847Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:41:20.5907910Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:41:20.5908069Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5908126Z mov.b64 {%r569, %r570}, %rd127; 2026-02-21T09:41:20.5908186Z cvt.rn.f16x2.f32 %r571, %r570, %r569; 2026-02-21T09:41:20.5908350Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5908451Z cvt.u64.u32 %rd128, %r405; 2026-02-21T09:41:20.5908506Z cvt.u64.u32 %rd129, %r406; 2026-02-21T09:41:20.5908568Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:41:20.5908623Z or.b64 %rd131, %rd128, 
%rd130; 2026-02-21T09:41:20.5908786Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5908871Z mov.b64 {%r572, %r573}, %rd131; 2026-02-21T09:41:20.5908954Z cvt.rn.f16x2.f32 %r574, %r573, %r572; 2026-02-21T09:41:20.5909118Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5909175Z cvt.u64.u32 %rd132, %r407; 2026-02-21T09:41:20.5909238Z cvt.u64.u32 %rd133, %r408; 2026-02-21T09:41:20.5909293Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:41:20.5909347Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:41:20.5909517Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5909573Z mov.b64 {%r575, %r576}, %rd135; 2026-02-21T09:41:20.5909634Z cvt.rn.f16x2.f32 %r577, %r576, %r575; 2026-02-21T09:41:20.5909806Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5909862Z cvt.u64.u32 %rd136, %r409; 2026-02-21T09:41:20.5909917Z cvt.u64.u32 %rd137, %r410; 2026-02-21T09:41:20.5909974Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:41:20.5910039Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:41:20.5910204Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5910261Z mov.b64 {%r578, %r579}, %rd139; 2026-02-21T09:41:20.5910329Z cvt.rn.f16x2.f32 %r580, %r579, %r578; 2026-02-21T09:41:20.5910493Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5910550Z cvt.u64.u32 %rd140, %r411; 2026-02-21T09:41:20.5910611Z cvt.u64.u32 %rd141, %r412; 2026-02-21T09:41:20.5910666Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:41:20.5910722Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:41:20.5910884Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5910946Z mov.b64 {%r581, %r582}, %rd143; 2026-02-21T09:41:20.5911006Z cvt.rn.f16x2.f32 %r583, %r582, %r581; 2026-02-21T09:41:20.5911174Z .loc 1 53 52 
// czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5911261Z cvt.u64.u32 %rd144, %r413; 2026-02-21T09:41:20.5911317Z cvt.u64.u32 %rd145, %r414; 2026-02-21T09:41:20.5911373Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:41:20.5911436Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:41:20.5911593Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5911650Z mov.b64 {%r584, %r585}, %rd147; 2026-02-21T09:41:20.5911710Z cvt.rn.f16x2.f32 %r586, %r585, %r584; 2026-02-21T09:41:20.5911876Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5911934Z cvt.u64.u32 %rd148, %r416; 2026-02-21T09:41:20.5911988Z cvt.u64.u32 %rd149, %r417; 2026-02-21T09:41:20.5912051Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:41:20.5912107Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:41:20.5912264Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5912326Z mov.b64 {%r587, %r588}, %rd151; 2026-02-21T09:41:20.5912389Z cvt.rn.f16x2.f32 %r589, %r588, %r587; 2026-02-21T09:41:20.5912550Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5912607Z cvt.u64.u32 %rd152, %r418; 2026-02-21T09:41:20.5912669Z cvt.u64.u32 %rd153, %r419; 2026-02-21T09:41:20.5912724Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:41:20.5912781Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:41:20.5912948Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5913032Z mov.b64 {%r590, %r591}, %rd155; 2026-02-21T09:41:20.5913094Z cvt.rn.f16x2.f32 %r592, %r591, %r590; 2026-02-21T09:41:20.5913268Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5913325Z cvt.u64.u32 %rd156, %r420; 2026-02-21T09:41:20.5913402Z cvt.u64.u32 %rd157, %r421; 2026-02-21T09:41:20.5913477Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:41:20.5913544Z or.b64 %rd159, %rd156, 
%rd158; 2026-02-21T09:41:20.5913706Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5913762Z mov.b64 {%r593, %r594}, %rd159; 2026-02-21T09:41:20.5913831Z cvt.rn.f16x2.f32 %r595, %r594, %r593; 2026-02-21T09:41:20.5913991Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5914048Z cvt.u64.u32 %rd160, %r422; 2026-02-21T09:41:20.5914113Z cvt.u64.u32 %rd161, %r423; 2026-02-21T09:41:20.5914169Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:41:20.5914227Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:41:20.5914392Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5914455Z mov.b64 {%r596, %r597}, %rd163; 2026-02-21T09:41:20.5914518Z cvt.rn.f16x2.f32 %r598, %r597, %r596; 2026-02-21T09:41:20.5914705Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5914772Z cvt.u64.u32 %rd164, %r424; 2026-02-21T09:41:20.5914828Z cvt.u64.u32 %rd165, %r425; 2026-02-21T09:41:20.5914883Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:41:20.5914949Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:41:20.5915112Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5915168Z mov.b64 {%r599, %r600}, %rd167; 2026-02-21T09:41:20.5915228Z cvt.rn.f16x2.f32 %r601, %r600, %r599; 2026-02-21T09:41:20.5915401Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5915457Z cvt.u64.u32 %rd168, %r426; 2026-02-21T09:41:20.5915512Z cvt.u64.u32 %rd169, %r427; 2026-02-21T09:41:20.5915576Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:41:20.5915633Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:41:20.5915803Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5915895Z mov.b64 {%r602, %r603}, %rd171; 2026-02-21T09:41:20.5915954Z cvt.rn.f16x2.f32 %r604, %r603, %r602; 2026-02-21T09:41:20.5916119Z .loc 1 53 52 
// czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5916174Z cvt.u64.u32 %rd172, %r428; 2026-02-21T09:41:20.5916236Z cvt.u64.u32 %rd173, %r429; 2026-02-21T09:41:20.5916292Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:41:20.5916347Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:41:20.5916520Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5916578Z mov.b64 {%r605, %r606}, %rd175; 2026-02-21T09:41:20.5916638Z cvt.rn.f16x2.f32 %r607, %r606, %r605; 2026-02-21T09:41:20.5916812Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5916871Z cvt.u64.u32 %rd176, %r430; 2026-02-21T09:41:20.5916931Z cvt.u64.u32 %rd177, %r431; 2026-02-21T09:41:20.5916987Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:41:20.5917051Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:41:20.5917216Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5917272Z mov.b64 {%r608, %r609}, %rd179; 2026-02-21T09:41:20.5917341Z cvt.rn.f16x2.f32 %r610, %r609, %r608; 2026-02-21T09:41:20.5917502Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5917584Z cvt.u64.u32 %rd180, %r433; 2026-02-21T09:41:20.5917646Z cvt.u64.u32 %rd181, %r434; 2026-02-21T09:41:20.5917700Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:41:20.5917755Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:41:20.5917920Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5918030Z mov.b64 {%r611, %r612}, %rd183; 2026-02-21T09:41:20.5918091Z cvt.rn.f16x2.f32 %r613, %r612, %r611; 2026-02-21T09:41:20.5918254Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5918317Z cvt.u64.u32 %rd184, %r435; 2026-02-21T09:41:20.5918371Z cvt.u64.u32 %rd185, %r436; 2026-02-21T09:41:20.5918426Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:41:20.5918488Z or.b64 %rd187, %rd184, 
%rd186; 2026-02-21T09:41:20.5918651Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5918709Z mov.b64 {%r614, %r615}, %rd187; 2026-02-21T09:41:20.5918768Z cvt.rn.f16x2.f32 %r616, %r615, %r614; 2026-02-21T09:41:20.5918935Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5918990Z cvt.u64.u32 %rd188, %r437; 2026-02-21T09:41:20.5919045Z cvt.u64.u32 %rd189, %r438; 2026-02-21T09:41:20.5919109Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:41:20.5919165Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:41:20.5919332Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5919394Z mov.b64 {%r617, %r618}, %rd191; 2026-02-21T09:41:20.5919453Z cvt.rn.f16x2.f32 %r619, %r618, %r617; 2026-02-21T09:41:20.5919617Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5919674Z cvt.u64.u32 %rd192, %r439; 2026-02-21T09:41:20.5919736Z cvt.u64.u32 %rd193, %r440; 2026-02-21T09:41:20.5919793Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:41:20.5919850Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:41:20.5920019Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5920075Z mov.b64 {%r620, %r621}, %rd195; 2026-02-21T09:41:20.5920134Z cvt.rn.f16x2.f32 %r622, %r621, %r620; 2026-02-21T09:41:20.5920304Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5920382Z cvt.u64.u32 %rd196, %r441; 2026-02-21T09:41:20.5920436Z cvt.u64.u32 %rd197, %r442; 2026-02-21T09:41:20.5920492Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:41:20.5920555Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:41:20.5920719Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5920777Z mov.b64 {%r623, %r624}, %rd199; 2026-02-21T09:41:20.5920846Z cvt.rn.f16x2.f32 %r625, %r624, %r623; 2026-02-21T09:41:20.5921010Z .loc 1 53 52 
// czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5921066Z cvt.u64.u32 %rd200, %r443; 2026-02-21T09:41:20.5921128Z cvt.u64.u32 %rd201, %r444; 2026-02-21T09:41:20.5921183Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:41:20.5921239Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:41:20.5921405Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5921473Z mov.b64 {%r626, %r627}, %rd203; 2026-02-21T09:41:20.5921534Z cvt.rn.f16x2.f32 %r628, %r627, %r626; 2026-02-21T09:41:20.5921699Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5921767Z cvt.u64.u32 %rd204, %r445; 2026-02-21T09:41:20.5921824Z cvt.u64.u32 %rd205, %r446; 2026-02-21T09:41:20.5921879Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:41:20.5921944Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:41:20.5922109Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5922186Z mov.b64 {%r629, %r630}, %rd207; 2026-02-21T09:41:20.5922246Z cvt.rn.f16x2.f32 %r631, %r630, %r629; 2026-02-21T09:41:20.5922416Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5922502Z cvt.u64.u32 %rd208, %r447; 2026-02-21T09:41:20.5922585Z cvt.u64.u32 %rd209, %r448; 2026-02-21T09:41:20.5922650Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:41:20.5922705Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:41:20.5922867Z .loc 1 55 27 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:55:27 2026-02-21T09:41:20.5922930Z mov.b64 {%r632, %r633}, %rd211; 2026-02-21T09:41:20.5922989Z cvt.rn.f16x2.f32 %r634, %r633, %r632; 2026-02-21T09:41:20.5923150Z .loc 1 56 83 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:56:83 2026-02-21T09:41:20.5923242Z st.shared.v4.b32 [%r21], {%r541, %r553, %r565, %r577}; 2026-02-21T09:41:20.5923337Z st.shared.v4.b32 [%r22], {%r589, %r601, %r613, %r625}; 2026-02-21T09:41:20.5923393Z bar.sync 0; 
2026-02-21T09:41:20.5923447Z // begin inline asm 2026-02-21T09:41:20.5923605Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r490, %r494, %r498, %r502}, [%r454]; 2026-02-21T09:41:20.5923658Z // end inline asm 2026-02-21T09:41:20.5923712Z // begin inline asm 2026-02-21T09:41:20.5923863Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r506, %r510, %r514, %r518}, [%r459]; 2026-02-21T09:41:20.5923919Z // end inline asm 2026-02-21T09:41:20.5923973Z bar.sync 0; 2026-02-21T09:41:20.5924061Z st.shared.v4.b32 [%r21], {%r544, %r556, %r568, %r580}; 2026-02-21T09:41:20.5924155Z st.shared.v4.b32 [%r22], {%r592, %r604, %r616, %r628}; 2026-02-21T09:41:20.5924210Z bar.sync 0; 2026-02-21T09:41:20.5924268Z // begin inline asm 2026-02-21T09:41:20.5924421Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r491, %r495, %r499, %r503}, [%r454]; 2026-02-21T09:41:20.5924478Z // end inline asm 2026-02-21T09:41:20.5924534Z // begin inline asm 2026-02-21T09:41:20.5924723Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r507, %r511, %r515, %r519}, [%r459]; 2026-02-21T09:41:20.5924787Z // end inline asm 2026-02-21T09:41:20.5924840Z bar.sync 0; 2026-02-21T09:41:20.5924924Z st.shared.v4.b32 [%r21], {%r547, %r559, %r571, %r583}; 2026-02-21T09:41:20.5925012Z st.shared.v4.b32 [%r22], {%r595, %r607, %r619, %r631}; 2026-02-21T09:41:20.5925106Z bar.sync 0; 2026-02-21T09:41:20.5925161Z // begin inline asm 2026-02-21T09:41:20.5925309Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r492, %r496, %r500, %r504}, [%r454]; 2026-02-21T09:41:20.5925362Z // end inline asm 2026-02-21T09:41:20.5925416Z // begin inline asm 2026-02-21T09:41:20.5925554Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r508, %r512, %r516, %r520}, [%r459]; 2026-02-21T09:41:20.5925615Z // end inline asm 2026-02-21T09:41:20.5925666Z bar.sync 0; 2026-02-21T09:41:20.5925750Z st.shared.v4.b32 [%r21], {%r550, %r562, %r574, %r586}; 2026-02-21T09:41:20.5925840Z st.shared.v4.b32 [%r22], {%r598, %r610, %r622, %r634}; 2026-02-21T09:41:20.5925892Z bar.sync 0; 
2026-02-21T09:41:20.5925947Z // begin inline asm 2026-02-21T09:41:20.5926086Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r493, %r497, %r501, %r505}, [%r454]; 2026-02-21T09:41:20.5926151Z // end inline asm 2026-02-21T09:41:20.5926209Z // begin inline asm 2026-02-21T09:41:20.5926349Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r509, %r513, %r517, %r521}, [%r459]; 2026-02-21T09:41:20.5926412Z // end inline asm 2026-02-21T09:41:20.5926465Z // begin inline asm 2026-02-21T09:41:20.5926565Z st.global.v4.b32 [ %rd76 + 0 ], { %r490, %r491, %r492, %r493 }; 2026-02-21T09:41:20.5926623Z // end inline asm 2026-02-21T09:41:20.5926678Z // begin inline asm 2026-02-21T09:41:20.5926776Z st.global.v4.b32 [ %rd77 + 0 ], { %r494, %r495, %r496, %r497 }; 2026-02-21T09:41:20.5926828Z // end inline asm 2026-02-21T09:41:20.5926887Z // begin inline asm 2026-02-21T09:41:20.5927006Z st.global.v4.b32 [ %rd78 + 0 ], { %r498, %r499, %r500, %r501 }; 2026-02-21T09:41:20.5927057Z // end inline asm 2026-02-21T09:41:20.5927116Z // begin inline asm 2026-02-21T09:41:20.5927208Z st.global.v4.b32 [ %rd79 + 0 ], { %r502, %r503, %r504, %r505 }; 2026-02-21T09:41:20.5927259Z // end inline asm 2026-02-21T09:41:20.5927313Z // begin inline asm 2026-02-21T09:41:20.5927485Z st.global.v4.b32 [ %rd80 + 0 ], { %r506, %r507, %r508, %r509 }; 2026-02-21T09:41:20.5927542Z // end inline asm 2026-02-21T09:41:20.5927596Z // begin inline asm 2026-02-21T09:41:20.5927693Z st.global.v4.b32 [ %rd81 + 0 ], { %r510, %r511, %r512, %r513 }; 2026-02-21T09:41:20.5927744Z // end inline asm 2026-02-21T09:41:20.5927797Z // begin inline asm 2026-02-21T09:41:20.5927887Z st.global.v4.b32 [ %rd82 + 0 ], { %r514, %r515, %r516, %r517 }; 2026-02-21T09:41:20.5927946Z // end inline asm 2026-02-21T09:41:20.5927998Z // begin inline asm 2026-02-21T09:41:20.5928088Z st.global.v4.b32 [ %rd83 + 0 ], { %r518, %r519, %r520, %r521 }; 2026-02-21T09:41:20.5928149Z // end inline asm 2026-02-21T09:41:20.5928204Z bra.uni $L__BB0_10; 2026-02-21T09:41:20.5928284Z 
$L__BB0_11: // %._crit_edge 2026-02-21T09:41:20.5928455Z .loc 1 53 52 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:53:52 2026-02-21T09:41:20.5928509Z // begin inline asm 2026-02-21T09:41:20.5928559Z 2026-02-21T09:41:20.5928608Z { 2026-02-21T09:41:20.5928677Z .reg .pred complete; 2026-02-21T09:41:20.5928732Z waitLoop: 2026-02-21T09:41:20.5928846Z mbarrier.try_wait.parity.shared.b64 complete, [%r635], %r636; 2026-02-21T09:41:20.5928919Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.5928968Z } 2026-02-21T09:41:20.5928972Z 2026-02-21T09:41:20.5929026Z // end inline asm 2026-02-21T09:41:20.5929115Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:20.5929293Z .loc 1 28 108 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:108 2026-02-21T09:41:20.5929346Z bar.sync 0; 2026-02-21T09:41:20.5929400Z // begin inline asm 2026-02-21T09:41:20.5929496Z @%p143 mbarrier.inval.shared::cta.b64 [%r156]; 2026-02-21T09:41:20.5929548Z // end inline asm 2026-02-21T09:41:20.5929601Z bar.sync 0; 2026-02-21T09:41:20.5929663Z // begin inline asm 2026-02-21T09:41:20.5929746Z @%p143 mbarrier.inval.shared::cta.b64 [%r157]; 2026-02-21T09:41:20.5929801Z // end inline asm 2026-02-21T09:41:20.5929862Z bar.sync 0; 2026-02-21T09:41:20.5929949Z // begin inline asm 2026-02-21T09:41:20.5930027Z @%p143 mbarrier.inval.shared::cta.b64 [%r158]; 2026-02-21T09:41:20.5930079Z // end inline asm 2026-02-21T09:41:20.5930137Z bar.sync 0; 2026-02-21T09:41:20.5930191Z // begin inline asm 2026-02-21T09:41:20.5930267Z @%p143 mbarrier.inval.shared::cta.b64 [%r159]; 2026-02-21T09:41:20.5930318Z // end inline asm 2026-02-21T09:41:20.5930376Z bar.sync 0; 2026-02-21T09:41:20.5930430Z // begin inline asm 2026-02-21T09:41:20.5930506Z @%p143 mbarrier.inval.shared::cta.b64 [%r160]; 2026-02-21T09:41:20.5930567Z // end inline asm 2026-02-21T09:41:20.5930618Z bar.sync 0; 2026-02-21T09:41:20.5930671Z // begin inline asm 2026-02-21T09:41:20.5930753Z @%p143 mbarrier.inval.shared::cta.b64 [%r161]; 
2026-02-21T09:41:20.5930806Z // end inline asm 2026-02-21T09:41:20.5930856Z bar.sync 0; 2026-02-21T09:41:20.5930909Z // begin inline asm 2026-02-21T09:41:20.5930992Z @%p143 mbarrier.inval.shared::cta.b64 [%r274]; 2026-02-21T09:41:20.5931044Z // end inline asm 2026-02-21T09:41:20.5931099Z // begin inline asm 2026-02-21T09:41:20.5931181Z @%p143 mbarrier.inval.shared::cta.b64 [%r154]; 2026-02-21T09:41:20.5931233Z // end inline asm 2026-02-21T09:41:20.5931284Z bar.sync 0; 2026-02-21T09:41:20.5931336Z // begin inline asm 2026-02-21T09:41:20.5931418Z @%p143 mbarrier.inval.shared::cta.b64 [%r155]; 2026-02-21T09:41:20.5931471Z // end inline asm 2026-02-21T09:41:20.5931636Z .loc 1 28 4 // czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py:28:4 2026-02-21T09:41:20.5931721Z bar.sync 0; 2026-02-21T09:41:20.5931776Z // begin inline asm 2026-02-21T09:41:20.5931887Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r646, 128; 2026-02-21T09:41:20.5931946Z // end inline asm 2026-02-21T09:41:20.5931995Z ret; 2026-02-21T09:41:20.5932049Z $L__tmp1: 2026-02-21T09:41:20.5932102Z $L__func_end0: 2026-02-21T09:41:20.5932212Z // -- End function 2026-02-21T09:41:20.5932286Z } 2026-02-21T09:41:20.5932495Z .file 1 "/tmp/torchinductor_root/zu/czuccoqmrt2v2ctie4hsb6x3k6joowzrmdoojxts4koztlhc3w7k.py" 2026-02-21T09:41:20.5932562Z .section .debug_abbrev 2026-02-21T09:41:20.5932612Z { 2026-02-21T09:41:20.5932698Z .b8 1 // Abbreviation Code 2026-02-21T09:41:20.5932779Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:20.5932864Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:20.5932940Z .b8 37 // DW_AT_producer 2026-02-21T09:41:20.5933013Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.5933090Z .b8 19 // DW_AT_language 2026-02-21T09:41:20.5933161Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:20.5933232Z .b8 3 // DW_AT_name 2026-02-21T09:41:20.5933309Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.5933385Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:20.5933456Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:20.5933526Z 
.b8 27 // DW_AT_comp_dir 2026-02-21T09:41:20.5933602Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.5933669Z .b8 0 // EOM(1) 2026-02-21T09:41:20.5933734Z .b8 0 // EOM(2) 2026-02-21T09:41:20.5933803Z .b8 0 // EOM(3) 2026-02-21T09:41:20.5933853Z } 2026-02-21T09:41:20.5933910Z .section .debug_info 2026-02-21T09:41:20.5933963Z { 2026-02-21T09:41:20.5934041Z .b32 104 // Length of Unit 2026-02-21T09:41:20.5934121Z .b8 2 // DWARF version number 2026-02-21T09:41:20.5934170Z .b8 0 2026-02-21T09:41:20.5934290Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:41:20.5934377Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:20.5934497Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:20.5934580Z .b8 116 // DW_AT_producer 2026-02-21T09:41:20.5934632Z .b8 114 2026-02-21T09:41:20.5934721Z .b8 105 2026-02-21T09:41:20.5934771Z .b8 116 2026-02-21T09:41:20.5934827Z .b8 111 2026-02-21T09:41:20.5934874Z .b8 110 2026-02-21T09:41:20.5934922Z .b8 0 2026-02-21T09:41:20.5935000Z .b8 2 // DW_AT_language 2026-02-21T09:41:20.5935051Z .b8 0 2026-02-21T09:41:20.5935124Z .b8 99 // DW_AT_name 2026-02-21T09:41:20.5935171Z .b8 122 2026-02-21T09:41:20.5935226Z .b8 117 2026-02-21T09:41:20.5935275Z .b8 99 2026-02-21T09:41:20.5935323Z .b8 99 2026-02-21T09:41:20.5935377Z .b8 111 2026-02-21T09:41:20.5935424Z .b8 113 2026-02-21T09:41:20.5935473Z .b8 109 2026-02-21T09:41:20.5935522Z .b8 114 2026-02-21T09:41:20.5935580Z .b8 116 2026-02-21T09:41:20.5935629Z .b8 50 2026-02-21T09:41:20.5935678Z .b8 118 2026-02-21T09:41:20.5935734Z .b8 50 2026-02-21T09:41:20.5935783Z .b8 99 2026-02-21T09:41:20.5935831Z .b8 116 2026-02-21T09:41:20.5935878Z .b8 105 2026-02-21T09:41:20.5935935Z .b8 101 2026-02-21T09:41:20.5935984Z .b8 52 2026-02-21T09:41:20.5936033Z .b8 104 2026-02-21T09:41:20.5936092Z .b8 115 2026-02-21T09:41:20.5936142Z .b8 98 2026-02-21T09:41:20.5936193Z .b8 54 2026-02-21T09:41:20.5936243Z .b8 120 2026-02-21T09:41:20.5936302Z .b8 51 2026-02-21T09:41:20.5936352Z .b8 107 
2026-02-21T09:41:20.5936433Z .b8 54 2026-02-21T09:41:20.5936484Z .b8 106 2026-02-21T09:41:20.5936541Z .b8 111 2026-02-21T09:41:20.5936590Z .b8 111 2026-02-21T09:41:20.5936639Z .b8 119 2026-02-21T09:41:20.5936694Z .b8 122 2026-02-21T09:41:20.5936744Z .b8 114 2026-02-21T09:41:20.5936793Z .b8 109 2026-02-21T09:41:20.5936841Z .b8 100 2026-02-21T09:41:20.5936897Z .b8 111 2026-02-21T09:41:20.5936945Z .b8 111 2026-02-21T09:41:20.5937021Z .b8 106 2026-02-21T09:41:20.5937106Z .b8 120 2026-02-21T09:41:20.5937159Z .b8 116 2026-02-21T09:41:20.5937209Z .b8 115 2026-02-21T09:41:20.5937260Z .b8 52 2026-02-21T09:41:20.5937316Z .b8 107 2026-02-21T09:41:20.5937367Z .b8 111 2026-02-21T09:41:20.5937417Z .b8 122 2026-02-21T09:41:20.5937467Z .b8 116 2026-02-21T09:41:20.5937527Z .b8 108 2026-02-21T09:41:20.5937576Z .b8 104 2026-02-21T09:41:20.5937627Z .b8 99 2026-02-21T09:41:20.5937684Z .b8 51 2026-02-21T09:41:20.5937734Z .b8 119 2026-02-21T09:41:20.5937784Z .b8 55 2026-02-21T09:41:20.5937834Z .b8 107 2026-02-21T09:41:20.5937891Z .b8 46 2026-02-21T09:41:20.5937943Z .b8 112 2026-02-21T09:41:20.5937993Z .b8 121 2026-02-21T09:41:20.5938050Z .b8 0 2026-02-21T09:41:20.5938143Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:20.5938218Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:20.5938268Z .b8 116 2026-02-21T09:41:20.5938328Z .b8 109 2026-02-21T09:41:20.5938377Z .b8 112 2026-02-21T09:41:20.5938429Z .b8 47 2026-02-21T09:41:20.5938486Z .b8 116 2026-02-21T09:41:20.5938538Z .b8 111 2026-02-21T09:41:20.5938590Z .b8 114 2026-02-21T09:41:20.5938639Z .b8 99 2026-02-21T09:41:20.5938696Z .b8 104 2026-02-21T09:41:20.5938745Z .b8 105 2026-02-21T09:41:20.5938795Z .b8 110 2026-02-21T09:41:20.5938852Z .b8 100 2026-02-21T09:41:20.5938901Z .b8 117 2026-02-21T09:41:20.5938951Z .b8 99 2026-02-21T09:41:20.5939001Z .b8 116 2026-02-21T09:41:20.5939058Z .b8 111 2026-02-21T09:41:20.5939107Z .b8 114 2026-02-21T09:41:20.5939155Z .b8 95 2026-02-21T09:41:20.5939205Z .b8 114 2026-02-21T09:41:20.5939260Z .b8 111 
2026-02-21T09:41:20.5939311Z .b8 111 2026-02-21T09:41:20.5939360Z .b8 116 2026-02-21T09:41:20.5939417Z .b8 47 2026-02-21T09:41:20.5939466Z .b8 122 2026-02-21T09:41:20.5939514Z .b8 117 2026-02-21T09:41:20.5939563Z .b8 0 2026-02-21T09:41:20.5939620Z } 2026-02-21T09:41:20.5939685Z .section .debug_macinfo { } 2026-02-21T09:41:20.5939690Z 2026-02-21T09:41:20.5939767Z ================================================================ 2026-02-21T09:41:20.5939879Z please share the reproducer above with Triton project. 2026-02-21T09:41:20.7621209Z [26s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:20.7621488Z 2026-02-21T09:41:20.7626088Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:41:20.7627216Z 2026-02-21T09:41:20.7627419Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:20.7627603Z 2026-02-21T09:41:20.7632842Z `ptxas` stderr: 2026-02-21T09:41:20.7633063Z ================================================================ 2026-02-21T09:41:20.7635293Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:20.7635813Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.7635964Z 2026-02-21T09:41:20.7636364Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpn2yzf7l_.ptx -o /tmp/tmpn2yzf7l_.ptx.o 2026-02-21T09:41:20.7636818Z 2026-02-21T09:41:20.7636948Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:20.7637380Z Internal Triton PTX codegen error 2026-02-21T09:41:20.7637550Z `ptxas` stderr: 2026-02-21T09:41:20.7637965Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 201 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:20.7638476Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.7638690Z 2026-02-21T09:41:20.7639078Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpn2yzf7l_.ptx -o /tmp/tmpn2yzf7l_.ptx.o 2026-02-21T09:41:20.7639509Z 2026-02-21T09:41:20.7639512Z 2026-02-21T09:41:20.7639577Z // 2026-02-21T09:41:20.7639716Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:20.7639892Z // 2026-02-21T09:41:20.7639958Z 2026-02-21T09:41:20.7640012Z .version 8.7 2026-02-21T09:41:20.7640150Z .target sm_100a 2026-02-21T09:41:20.7640281Z .address_size 64 2026-02-21T09:41:20.7640373Z 2026-02-21T09:41:20.7640493Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:20.7640748Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:20.7640956Z // @_helion_matmul 2026-02-21T09:41:20.7641156Z .visible .entry _helion_matmul( 2026-02-21T09:41:20.7641367Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:20.7641628Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:20.7641869Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 
2026-02-21T09:41:20.7642116Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:20.7642355Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:20.7642557Z ) 2026-02-21T09:41:20.7642677Z .reqntid 128 2026-02-21T09:41:20.7642800Z .maxnreg 32 2026-02-21T09:41:20.7642924Z { 2026-02-21T09:41:20.7643043Z .reg .pred %p<145>; 2026-02-21T09:41:20.7643195Z .reg .b32 %r<659>; 2026-02-21T09:41:20.7643335Z .reg .b64 %rd<210>; 2026-02-21T09:41:20.7643609Z .loc 1 19 0 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:19:0 2026-02-21T09:41:20.7643895Z $L__func_begin0: 2026-02-21T09:41:20.7644148Z .loc 1 19 0 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:19:0 2026-02-21T09:41:20.7644376Z 2026-02-21T09:41:20.7644433Z // %bb.0: 2026-02-21T09:41:20.7644584Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:20.7644852Z $L__tmp0: 2026-02-21T09:41:20.7645081Z .loc 1 19 0 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:19 2026-02-21T09:41:20.7645361Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:20.7645528Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:20.7645733Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:20.7645894Z mov.b32 %r66, global_smem; 2026-02-21T09:41:20.7646059Z // begin inline asm 2026-02-21T09:41:20.7646293Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r66], 128; 2026-02-21T09:41:20.7646534Z // end inline asm 2026-02-21T09:41:20.7646698Z ld.param.b64 %rd50, [_helion_matmul_param_3]; 2026-02-21T09:41:20.7646878Z bar.sync 0; 2026-02-21T09:41:20.7647023Z ld.shared.b32 %r629, [global_smem]; 2026-02-21T09:41:20.7647189Z bar.sync 0; 2026-02-21T09:41:20.7647320Z // begin inline asm 2026-02-21T09:41:20.7647519Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:20.7647746Z // end inline asm 2026-02-21T09:41:20.7647997Z .loc 1 21 67 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:21:67 2026-02-21T09:41:20.7648278Z mov.u32 
%r658, %ctaid.x; 2026-02-21T09:41:20.7648435Z mov.u32 %r206, %ctaid.y; 2026-02-21T09:41:20.7648577Z mov.u32 %r207, %ctaid.z; 2026-02-21T09:41:20.7648727Z mov.u32 %r208, %nctaid.x; 2026-02-21T09:41:20.7648877Z mov.u32 %r209, %nctaid.y; 2026-02-21T09:41:20.7649039Z mad.lo.s32 %r210, %r207, %r209, %r206; 2026-02-21T09:41:20.7649252Z mad.lo.s32 %r211, %r210, %r208, %r658; 2026-02-21T09:41:20.7649423Z shl.b32 %r212, %r211, 8; 2026-02-21T09:41:20.7649575Z cvt.s64.s32 %rd51, %r212; 2026-02-21T09:41:20.7649724Z add.s64 %rd19, %rd50, %rd51; 2026-02-21T09:41:20.7649885Z shl.b32 %r213, %r1, 2; 2026-02-21T09:41:20.7650033Z add.s32 %r67, %r66, %r213; 2026-02-21T09:41:20.7650216Z mov.b32 %r76, 0; 2026-02-21T09:41:20.7650349Z // begin inline asm 2026-02-21T09:41:20.7650530Z @%p4 st.shared.b32 [ %r67 + 0 ], %r76; 2026-02-21T09:41:20.7650701Z // end inline asm 2026-02-21T09:41:20.7650847Z bar.warp.sync -1; 2026-02-21T09:41:20.7650990Z setp.eq.b32 %p135, %r1, 0; 2026-02-21T09:41:20.7651153Z cvt.u64.u32 %rd4, %r66; 2026-02-21T09:41:20.7651309Z // begin inline asm 2026-02-21T09:41:20.7651549Z @%p135 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:20.7651838Z // end inline asm 2026-02-21T09:41:20.7651973Z // begin inline asm 2026-02-21T09:41:20.7652207Z @%p135 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.7652450Z // end inline asm 2026-02-21T09:41:20.7652769Z mov.b32 %r69, 32; 2026-02-21T09:41:20.7652898Z // begin inline asm 2026-02-21T09:41:20.7653130Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r69; 2026-02-21T09:41:20.7653393Z // end inline asm 2026-02-21T09:41:20.7653520Z mov.b32 %r70, 64; 2026-02-21T09:41:20.7653656Z // begin inline asm 2026-02-21T09:41:20.7653879Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r70; 2026-02-21T09:41:20.7654140Z // end inline asm 2026-02-21T09:41:20.7654270Z mov.b32 %r71, 1024; 
2026-02-21T09:41:20.7654414Z // begin inline asm 2026-02-21T09:41:20.7654652Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r71; 2026-02-21T09:41:20.7654970Z // end inline asm 2026-02-21T09:41:20.7655101Z // begin inline asm 2026-02-21T09:41:20.7655342Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r71; 2026-02-21T09:41:20.7655611Z // end inline asm 2026-02-21T09:41:20.7655740Z mov.b64 %rd12, 2048; 2026-02-21T09:41:20.7655883Z // begin inline asm 2026-02-21T09:41:20.7656126Z @%p135 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.7656410Z // end inline asm 2026-02-21T09:41:20.7656537Z mov.b32 %r73, 1; 2026-02-21T09:41:20.7656672Z // begin inline asm 2026-02-21T09:41:20.7656949Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r73; 2026-02-21T09:41:20.7657239Z // end inline asm 2026-02-21T09:41:20.7657373Z // begin inline asm 2026-02-21T09:41:20.7657613Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:41:20.7657900Z // end inline asm 2026-02-21T09:41:20.7658027Z // begin inline asm 2026-02-21T09:41:20.7658259Z @%p135 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.7658513Z // end inline asm 2026-02-21T09:41:20.7658651Z // begin inline asm 2026-02-21T09:41:20.7658891Z @%p135 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.7659174Z // end inline asm 2026-02-21T09:41:20.7659311Z // begin inline asm 2026-02-21T09:41:20.7659542Z @%p135 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.7659809Z // end inline asm 2026-02-21T09:41:20.7659938Z // begin inline asm 2026-02-21T09:41:20.7660162Z @%p135 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.7660418Z // end 
inline asm 2026-02-21T09:41:20.7660550Z // begin inline asm 2026-02-21T09:41:20.7660888Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.7661247Z // end inline asm 2026-02-21T09:41:20.7661382Z // begin inline asm 2026-02-21T09:41:20.7661582Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:20.7661870Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.7662051Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.7662228Z // end inline asm 2026-02-21T09:41:20.7662361Z bar.sync 0; 2026-02-21T09:41:20.7662494Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:20.7662829Z .loc 1 22 68 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:22:68 2026-02-21T09:41:20.7663120Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:20.7663276Z bar.sync 0; 2026-02-21T09:41:20.7663402Z // begin inline asm 2026-02-21T09:41:20.7663554Z @%p4 st.shared.b32 [ %r67 + 0 ], %r76; 2026-02-21T09:41:20.7663717Z // end inline asm 2026-02-21T09:41:20.7663857Z bar.warp.sync -1; 2026-02-21T09:41:20.7663991Z // begin inline asm 2026-02-21T09:41:20.7664239Z @%p135 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:20.7664515Z // end inline asm 2026-02-21T09:41:20.7664649Z // begin inline asm 2026-02-21T09:41:20.7664907Z @%p135 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:20.7665154Z // end inline asm 2026-02-21T09:41:20.7665290Z // begin inline asm 2026-02-21T09:41:20.7666773Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r69; 2026-02-21T09:41:20.7667068Z // end inline asm 2026-02-21T09:41:20.7667205Z mov.b32 %r78, 128; 2026-02-21T09:41:20.7667355Z // begin inline asm 2026-02-21T09:41:20.7667588Z @%p135 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r78; 2026-02-21T09:41:20.7667862Z // end inline asm 
2026-02-21T09:41:20.7667990Z // begin inline asm 2026-02-21T09:41:20.7668232Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r71; 2026-02-21T09:41:20.7668505Z // end inline asm 2026-02-21T09:41:20.7668632Z mov.b32 %r80, 12288; 2026-02-21T09:41:20.7668777Z // begin inline asm 2026-02-21T09:41:20.7669008Z @%p135 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r80; 2026-02-21T09:41:20.7669276Z // end inline asm 2026-02-21T09:41:20.7669404Z // begin inline asm 2026-02-21T09:41:20.7669654Z @%p135 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:20.7669942Z // end inline asm 2026-02-21T09:41:20.7670089Z // begin inline asm 2026-02-21T09:41:20.7670332Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r73; 2026-02-21T09:41:20.7670652Z // end inline asm 2026-02-21T09:41:20.7670785Z // begin inline asm 2026-02-21T09:41:20.7671026Z @%p135 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r73; 2026-02-21T09:41:20.7671315Z // end inline asm 2026-02-21T09:41:20.7671442Z // begin inline asm 2026-02-21T09:41:20.7671674Z @%p135 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:20.7671931Z // end inline asm 2026-02-21T09:41:20.7672071Z // begin inline asm 2026-02-21T09:41:20.7672316Z @%p135 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.7672593Z // end inline asm 2026-02-21T09:41:20.7672729Z // begin inline asm 2026-02-21T09:41:20.7672956Z @%p135 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:20.7673257Z // end inline asm 2026-02-21T09:41:20.7673386Z // begin inline asm 2026-02-21T09:41:20.7673619Z @%p135 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:20.7673903Z // end inline asm 2026-02-21T09:41:20.7674038Z // begin 
inline asm 2026-02-21T09:41:20.7674405Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:20.7674837Z // end inline asm 2026-02-21T09:41:20.7674983Z // begin inline asm 2026-02-21T09:41:20.7675197Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:20.7675496Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.7675687Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.7675874Z // end inline asm 2026-02-21T09:41:20.7676019Z bar.sync 0; 2026-02-21T09:41:20.7676165Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:20.7676500Z .loc 1 40 45 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:40:45 2026-02-21T09:41:20.7676808Z shr.u32 %r214, %r1, 5; 2026-02-21T09:41:20.7677094Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7677402Z sub.s32 %r216, 1536, %r658; 2026-02-21T09:41:20.7677580Z mul.hi.s32 %r217, %r216, -580400985; 2026-02-21T09:41:20.7677766Z add.s32 %r218, %r217, %r216; 2026-02-21T09:41:20.7677926Z shr.u32 %r219, %r218, 31; 2026-02-21T09:41:20.7678086Z shr.s32 %r220, %r218, 12; 2026-02-21T09:41:20.7678244Z add.s32 %r221, %r220, %r219; 2026-02-21T09:41:20.7678410Z mul.lo.s32 %r222, %r221, 4736; 2026-02-21T09:41:20.7678581Z setp.ne.b32 %p70, %r216, %r222; 2026-02-21T09:41:20.7678763Z setp.lt.u32 %p71, %r658, 1537; 2026-02-21T09:41:20.7678935Z and.pred %p72, %p71, %p70; 2026-02-21T09:41:20.7679110Z selp.b32 %r223, 1, 0, %p72; 2026-02-21T09:41:20.7679271Z add.s32 %r14, %r221, %r223; 2026-02-21T09:41:20.7679630Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7679952Z shfl.sync.idx.b32 %r16, %r214, 0, 31, -1; 2026-02-21T09:41:20.7680141Z shl.b32 %r224, %r16, 21; 2026-02-21T09:41:20.7680304Z and.b32 %r225, %r224, 6291456; 2026-02-21T09:41:20.7680462Z add.s32 %r83, %r225, %r629; 2026-02-21T09:41:20.7680622Z mov.pred %p42, 
-1; 2026-02-21T09:41:20.7680763Z // begin inline asm 2026-02-21T09:41:20.7681122Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 0], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:20.7681526Z // end inline asm 2026-02-21T09:41:20.7681659Z // begin inline asm 2026-02-21T09:41:20.7681992Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 16], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:20.7682348Z // end inline asm 2026-02-21T09:41:20.7682486Z // begin inline asm 2026-02-21T09:41:20.7682805Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 32], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:20.7683193Z // end inline asm 2026-02-21T09:41:20.7683330Z // begin inline asm 2026-02-21T09:41:20.7683649Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r83 + 48], 64, {%r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76, %r76}; 2026-02-21T09:41:20.7684007Z // end inline asm 2026-02-21T09:41:20.7684135Z // begin inline asm 2026-02-21T09:41:20.7684290Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:20.7684448Z // end inline asm 2026-02-21T09:41:20.7684584Z bar.sync 0; 2026-02-21T09:41:20.7684866Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7685174Z add.s32 %r151, %r66, 77872; 2026-02-21T09:41:20.7685330Z // begin inline asm 2026-02-21T09:41:20.7685496Z @%p135 mbarrier.init.shared::cta.b64 [%r151], 1; 2026-02-21T09:41:20.7685692Z // end inline asm 2026-02-21T09:41:20.7685824Z bar.sync 0; 2026-02-21T09:41:20.7685965Z add.s32 %r152, %r66, 77880; 2026-02-21T09:41:20.7686115Z // begin inline asm 2026-02-21T09:41:20.7686282Z @%p135 mbarrier.init.shared::cta.b64 [%r152], 1; 2026-02-21T09:41:20.7686463Z // end inline asm 2026-02-21T09:41:20.7686602Z add.s32 
%r153, %r66, 77824; 2026-02-21T09:41:20.7686753Z // begin inline asm 2026-02-21T09:41:20.7686907Z @%p135 mbarrier.init.shared::cta.b64 [%r153], 1; 2026-02-21T09:41:20.7687095Z // end inline asm 2026-02-21T09:41:20.7687219Z bar.sync 0; 2026-02-21T09:41:20.7687351Z add.s32 %r154, %r66, 77832; 2026-02-21T09:41:20.7687547Z // begin inline asm 2026-02-21T09:41:20.7687715Z @%p135 mbarrier.init.shared::cta.b64 [%r154], 1; 2026-02-21T09:41:20.7687898Z // end inline asm 2026-02-21T09:41:20.7688036Z bar.sync 0; 2026-02-21T09:41:20.7688165Z add.s32 %r155, %r66, 77840; 2026-02-21T09:41:20.7688322Z // begin inline asm 2026-02-21T09:41:20.7688513Z @%p135 mbarrier.init.shared::cta.b64 [%r155], 1; 2026-02-21T09:41:20.7688692Z // end inline asm 2026-02-21T09:41:20.7688827Z bar.sync 0; 2026-02-21T09:41:20.7688952Z add.s32 %r156, %r66, 77848; 2026-02-21T09:41:20.7689107Z // begin inline asm 2026-02-21T09:41:20.7689261Z @%p135 mbarrier.init.shared::cta.b64 [%r156], 1; 2026-02-21T09:41:20.7689444Z // end inline asm 2026-02-21T09:41:20.7689568Z bar.sync 0; 2026-02-21T09:41:20.7689699Z add.s32 %r157, %r66, 77856; 2026-02-21T09:41:20.7689841Z // begin inline asm 2026-02-21T09:41:20.7690000Z @%p135 mbarrier.init.shared::cta.b64 [%r157], 1; 2026-02-21T09:41:20.7690183Z // end inline asm 2026-02-21T09:41:20.7690308Z bar.sync 0; 2026-02-21T09:41:20.7690436Z add.s32 %r259, %r66, 77864; 2026-02-21T09:41:20.7690580Z // begin inline asm 2026-02-21T09:41:20.7690738Z @%p135 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:41:20.7690911Z // end inline asm 2026-02-21T09:41:20.7691052Z setp.lt.s32 %p73, %r14, 1; 2026-02-21T09:41:20.7691241Z setp.gt.s32 %p69, %r14, 0; 2026-02-21T09:41:20.7691514Z .loc 1 35 33 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:35:33 2026-02-21T09:41:20.7691807Z shr.u32 %r226, %r658, 4; 2026-02-21T09:41:20.7691958Z and.b32 %r227, %r226, 134217664; 2026-02-21T09:41:20.7692231Z .loc 1 36 39 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:36:39 
2026-02-21T09:41:20.7692508Z sub.s32 %r228, 96, %r227; 2026-02-21T09:41:20.7692766Z .loc 1 36 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:36:52 2026-02-21T09:41:20.7693051Z min.s32 %r229, %r228, 64; 2026-02-21T09:41:20.7693308Z .loc 1 37 45 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:45 2026-02-21T09:41:20.7693592Z and.b32 %r230, %r658, 1023; 2026-02-21T09:41:20.7693845Z .loc 1 38 51 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:38:51 2026-02-21T09:41:20.7694134Z div.s32 %r231, %r230, %r229; 2026-02-21T09:41:20.7694389Z .loc 1 37 64 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:64 2026-02-21T09:41:20.7694743Z mul.lo.s32 %r232, %r231, %r229; 2026-02-21T09:41:20.7694908Z sub.s32 %r233, %r230, %r232; 2026-02-21T09:41:20.7695175Z .loc 1 37 30 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:30 2026-02-21T09:41:20.7695463Z add.s32 %r234, %r233, %r227; 2026-02-21T09:41:20.7695724Z .loc 1 39 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:39:27 2026-02-21T09:41:20.7696016Z shl.b32 %r635, %r234, 7; 2026-02-21T09:41:20.7696273Z .loc 1 41 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:41:27 2026-02-21T09:41:20.7696561Z shl.b32 %r631, %r231, 6; 2026-02-21T09:41:20.7696822Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7697129Z bar.sync 0; 2026-02-21T09:41:20.7697281Z and.pred %p1, %p135, %p69; 2026-02-21T09:41:20.7697435Z // begin inline asm 2026-02-21T09:41:20.7697632Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r153], 12288; 2026-02-21T09:41:20.7697860Z // end inline asm 2026-02-21T09:41:20.7698114Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7698393Z bar.sync 0; 2026-02-21T09:41:20.7698534Z elect.sync %r235|%p74, -1; 2026-02-21T09:41:20.7698701Z and.pred %p75, %p69, %p74; 2026-02-21T09:41:20.7698855Z and.pred %p55, %p4, %p75; 2026-02-21T09:41:20.7699014Z 
add.s32 %r160, %r66, 49152; 2026-02-21T09:41:20.7699190Z // begin inline asm 2026-02-21T09:41:20.7699521Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd40, {%r76, %r631}], [%r153]; 2026-02-21T09:41:20.7699874Z // end inline asm 2026-02-21T09:41:20.7700118Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7700424Z // begin inline asm 2026-02-21T09:41:20.7700577Z fence.proxy.async.shared::cta; 2026-02-21T09:41:20.7700746Z // end inline asm 2026-02-21T09:41:20.7700873Z bar.sync 0; 2026-02-21T09:41:20.7701015Z elect.sync %r236|%p76, -1; 2026-02-21T09:41:20.7701168Z and.pred %p77, %p69, %p76; 2026-02-21T09:41:20.7701329Z and.pred %p56, %p4, %p77; 2026-02-21T09:41:20.7701478Z // begin inline asm 2026-02-21T09:41:20.7701806Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r66], [%rd41, {%r76, %r635}], [%r153]; 2026-02-21T09:41:20.7702150Z // end inline asm 2026-02-21T09:41:20.7702392Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7702677Z bar.sync 0; 2026-02-21T09:41:20.7702802Z // begin inline asm 2026-02-21T09:41:20.7702993Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r154], 12288; 2026-02-21T09:41:20.7703204Z // end inline asm 2026-02-21T09:41:20.7703478Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7703752Z bar.sync 0; 2026-02-21T09:41:20.7703888Z elect.sync %r237|%p78, -1; 2026-02-21T09:41:20.7704049Z and.pred %p79, %p69, %p78; 2026-02-21T09:41:20.7704200Z and.pred %p58, %p4, %p79; 2026-02-21T09:41:20.7704361Z add.s32 %r169, %r66, 53248; 2026-02-21T09:41:20.7704506Z // begin inline asm 2026-02-21T09:41:20.7704849Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r169], [%rd40, {%r69, %r631}], [%r154]; 2026-02-21T09:41:20.7705189Z // end inline asm 2026-02-21T09:41:20.7705433Z .loc 1 52 44 // 
cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7705717Z bar.sync 0; 2026-02-21T09:41:20.7705845Z elect.sync %r238|%p80, -1; 2026-02-21T09:41:20.7706004Z and.pred %p81, %p69, %p80; 2026-02-21T09:41:20.7706154Z and.pred %p59, %p4, %p81; 2026-02-21T09:41:20.7706312Z add.s32 %r173, %r66, 8192; 2026-02-21T09:41:20.7706457Z // begin inline asm 2026-02-21T09:41:20.7706774Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r173], [%rd41, {%r69, %r635}], [%r154]; 2026-02-21T09:41:20.7707146Z // end inline asm 2026-02-21T09:41:20.7707402Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7707686Z bar.sync 0; 2026-02-21T09:41:20.7707810Z // begin inline asm 2026-02-21T09:41:20.7708003Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r155], 12288; 2026-02-21T09:41:20.7708216Z // end inline asm 2026-02-21T09:41:20.7708470Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7708751Z bar.sync 0; 2026-02-21T09:41:20.7708888Z elect.sync %r239|%p82, -1; 2026-02-21T09:41:20.7709041Z and.pred %p83, %p69, %p82; 2026-02-21T09:41:20.7709199Z and.pred %p61, %p4, %p83; 2026-02-21T09:41:20.7709356Z add.s32 %r178, %r66, 57344; 2026-02-21T09:41:20.7709504Z // begin inline asm 2026-02-21T09:41:20.7709821Z @%p61 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd40, {%r70, %r631}], [%r155]; 2026-02-21T09:41:20.7710158Z // end inline asm 2026-02-21T09:41:20.7710404Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7710678Z bar.sync 0; 2026-02-21T09:41:20.7710814Z elect.sync %r240|%p84, -1; 2026-02-21T09:41:20.7710975Z and.pred %p85, %p69, %p84; 2026-02-21T09:41:20.7711125Z and.pred %p62, %p4, %p85; 2026-02-21T09:41:20.7711280Z add.s32 %r182, %r66, 16384; 2026-02-21T09:41:20.7711457Z // begin inline asm 2026-02-21T09:41:20.7711778Z @%p62 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r182], [%rd41, {%r70, %r635}], [%r155]; 2026-02-21T09:41:20.7712114Z // end inline asm 2026-02-21T09:41:20.7712402Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7712695Z bar.sync 0; 2026-02-21T09:41:20.7712831Z // begin inline asm 2026-02-21T09:41:20.7713028Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r156], 12288; 2026-02-21T09:41:20.7713240Z // end inline asm 2026-02-21T09:41:20.7713486Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7713760Z bar.sync 0; 2026-02-21T09:41:20.7713898Z elect.sync %r241|%p86, -1; 2026-02-21T09:41:20.7714054Z and.pred %p87, %p69, %p86; 2026-02-21T09:41:20.7714216Z and.pred %p64, %p4, %p87; 2026-02-21T09:41:20.7714368Z add.s32 %r187, %r66, 61440; 2026-02-21T09:41:20.7714524Z mov.b32 %r188, 96; 2026-02-21T09:41:20.7714693Z // begin inline asm 2026-02-21T09:41:20.7715015Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r187], [%rd40, {%r188, %r631}], [%r156]; 2026-02-21T09:41:20.7715370Z // end inline asm 2026-02-21T09:41:20.7715640Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7715916Z bar.sync 0; 2026-02-21T09:41:20.7716047Z elect.sync %r242|%p88, -1; 2026-02-21T09:41:20.7716206Z and.pred %p89, %p69, %p88; 2026-02-21T09:41:20.7716359Z and.pred %p65, %p4, %p89; 2026-02-21T09:41:20.7716515Z add.s32 %r191, %r66, 24576; 2026-02-21T09:41:20.7716669Z // begin inline asm 2026-02-21T09:41:20.7716987Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r191], [%rd41, {%r188, %r635}], [%r156]; 2026-02-21T09:41:20.7717342Z // end inline asm 2026-02-21T09:41:20.7717585Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7717873Z bar.sync 0; 2026-02-21T09:41:20.7717995Z // begin inline asm 
2026-02-21T09:41:20.7718183Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r157], 12288; 2026-02-21T09:41:20.7718395Z // end inline asm 2026-02-21T09:41:20.7718630Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7718919Z bar.sync 0; 2026-02-21T09:41:20.7719085Z elect.sync %r243|%p90, -1; 2026-02-21T09:41:20.7719260Z and.pred %p91, %p69, %p90; 2026-02-21T09:41:20.7719420Z and.pred %p67, %p4, %p91; 2026-02-21T09:41:20.7719585Z add.s32 %r196, %r66, 65536; 2026-02-21T09:41:20.7719736Z // begin inline asm 2026-02-21T09:41:20.7720060Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r196], [%rd40, {%r78, %r631}], [%r157]; 2026-02-21T09:41:20.7720417Z // end inline asm 2026-02-21T09:41:20.7720663Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7720958Z bar.sync 0; 2026-02-21T09:41:20.7721094Z elect.sync %r244|%p92, -1; 2026-02-21T09:41:20.7721262Z and.pred %p93, %p69, %p92; 2026-02-21T09:41:20.7721418Z and.pred %p68, %p4, %p93; 2026-02-21T09:41:20.7721581Z add.s32 %r200, %r66, 32768; 2026-02-21T09:41:20.7721735Z // begin inline asm 2026-02-21T09:41:20.7722067Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r200], [%rd41, {%r78, %r635}], [%r157]; 2026-02-21T09:41:20.7722437Z // end inline asm 2026-02-21T09:41:20.7722689Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7722993Z bar.sync 0; 2026-02-21T09:41:20.7723122Z // begin inline asm 2026-02-21T09:41:20.7723265Z 2026-02-21T09:41:20.7723380Z { 2026-02-21T09:41:20.7723512Z @!%p69 bra.uni skipWait; 2026-02-21T09:41:20.7723678Z .reg .pred complete; 2026-02-21T09:41:20.7723851Z waitLoop: 2026-02-21T09:41:20.7724046Z mbarrier.try_wait.parity.shared.b64 complete, [%r153], %r76; 2026-02-21T09:41:20.7724282Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.7724447Z skipWait: 2026-02-21T09:41:20.7724569Z } 
2026-02-21T09:41:20.7724641Z 2026-02-21T09:41:20.7724717Z // end inline asm 2026-02-21T09:41:20.7725000Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7725307Z setp.ne.b32 %p94, %r16, 0; 2026-02-21T09:41:20.7725478Z or.pred %p95, %p73, %p94; 2026-02-21T09:41:20.7725634Z @%p95 bra $L__BB0_2; 2026-02-21T09:41:20.7725781Z // %bb.1: 2026-02-21T09:41:20.7725913Z elect.sync %r249|%p97, -1; 2026-02-21T09:41:20.7726084Z bfe.u32 %r252, %r160, 4, 14; 2026-02-21T09:41:20.7726246Z cvt.u64.u32 %rd57, %r252; 2026-02-21T09:41:20.7726427Z or.b64 %rd52, %rd57, -9223371899399045120; 2026-02-21T09:41:20.7726614Z bfe.u32 %r253, %r66, 4, 14; 2026-02-21T09:41:20.7726777Z cvt.u64.u32 %rd58, %r253; 2026-02-21T09:41:20.7726944Z or.b64 %rd53, %rd58, -9223371899382267904; 2026-02-21T09:41:20.7727134Z mov.b32 %r246, 69206032; 2026-02-21T09:41:20.7727294Z mov.pred %p96, 0; 2026-02-21T09:41:20.7727443Z // begin inline asm 2026-02-21T09:41:20.7727669Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd52, %rd53, %r246, %p96; 2026-02-21T09:41:20.7727941Z // end inline asm 2026-02-21T09:41:20.7728086Z add.s32 %r254, %r66, 49184; 2026-02-21T09:41:20.7728242Z bfe.u32 %r255, %r254, 4, 14; 2026-02-21T09:41:20.7728406Z cvt.u64.u32 %rd59, %r255; 2026-02-21T09:41:20.7728568Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:20.7728757Z add.s32 %r256, %r66, 32; 2026-02-21T09:41:20.7728919Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T09:41:20.7729074Z cvt.u64.u32 %rd60, %r257; 2026-02-21T09:41:20.7729241Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:20.7729414Z // begin inline asm 2026-02-21T09:41:20.7729639Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd54, %rd55, %r246, %p42; 2026-02-21T09:41:20.7729882Z // end inline asm 2026-02-21T09:41:20.7730022Z add.s32 %r258, %r66, 77872; 2026-02-21T09:41:20.7730173Z cvt.u64.u32 %rd56, %r258; 2026-02-21T09:41:20.7730327Z // begin inline asm 2026-02-21T09:41:20.7730537Z 
@%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T09:41:20.7730764Z // end inline asm 2026-02-21T09:41:20.7730903Z $L__BB0_2: 2026-02-21T09:41:20.7731140Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7731491Z bar.sync 0; 2026-02-21T09:41:20.7731615Z // begin inline asm 2026-02-21T09:41:20.7731804Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r259], 12288; 2026-02-21T09:41:20.7732015Z // end inline asm 2026-02-21T09:41:20.7732268Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7732562Z bar.sync 0; 2026-02-21T09:41:20.7732694Z elect.sync %r269|%p107, -1; 2026-02-21T09:41:20.7732862Z and.pred %p108, %p69, %p107; 2026-02-21T09:41:20.7733021Z and.pred %p102, %p4, %p108; 2026-02-21T09:41:20.7733183Z add.s32 %r260, %r66, 69632; 2026-02-21T09:41:20.7733330Z mov.b32 %r643, 160; 2026-02-21T09:41:20.7733475Z // begin inline asm 2026-02-21T09:41:20.7733797Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r260], [%rd40, {%r643, %r631}], [%r259]; 2026-02-21T09:41:20.7734155Z // end inline asm 2026-02-21T09:41:20.7734403Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7734706Z bar.sync 0; 2026-02-21T09:41:20.7734846Z elect.sync %r270|%p109, -1; 2026-02-21T09:41:20.7735003Z and.pred %p110, %p69, %p109; 2026-02-21T09:41:20.7735166Z and.pred %p103, %p4, %p110; 2026-02-21T09:41:20.7735315Z add.s32 %r264, %r66, 40960; 2026-02-21T09:41:20.7735467Z // begin inline asm 2026-02-21T09:41:20.7735781Z @%p103 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd41, {%r643, %r635}], [%r259]; 2026-02-21T09:41:20.7736173Z // end inline asm 2026-02-21T09:41:20.7736433Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7736723Z @%p73 bra $L__BB0_12; 2026-02-21T09:41:20.7736893Z // %bb.3: // 
%.lr.ph 2026-02-21T09:41:20.7737213Z .loc 1 0 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:0:108 2026-02-21T09:41:20.7737510Z and.b32 %r4, %r1, 15; 2026-02-21T09:41:20.7737653Z shr.u32 %r215, %r1, 4; 2026-02-21T09:41:20.7737808Z bfe.u32 %r6, %r1, 4, 3; 2026-02-21T09:41:20.7737987Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:20.7738173Z shl.b32 %r5, %r4, 3; 2026-02-21T09:41:20.7738318Z or.b32 %r7, %r6, 8; 2026-02-21T09:41:20.7738451Z or.b32 %r8, %r6, 16; 2026-02-21T09:41:20.7738592Z or.b32 %r9, %r6, 24; 2026-02-21T09:41:20.7738728Z or.b32 %r10, %r6, 32; 2026-02-21T09:41:20.7738879Z or.b32 %r11, %r6, 40; 2026-02-21T09:41:20.7739026Z or.b32 %r12, %r6, 48; 2026-02-21T09:41:20.7739171Z or.b32 %r13, %r215, 56; 2026-02-21T09:41:20.7739315Z shl.b32 %r15, %r14, 5; 2026-02-21T09:41:20.7739468Z add.s32 %r20, %r15, -6; 2026-02-21T09:41:20.7739619Z shl.b32 %r279, %r1, 9; 2026-02-21T09:41:20.7739786Z and.b32 %r280, %r279, 3072; 2026-02-21T09:41:20.7739946Z shl.b32 %r281, %r4, 4; 2026-02-21T09:41:20.7740083Z and.b32 %r282, %r1, 96; 2026-02-21T09:41:20.7740234Z shl.b32 %r283, %r282, 3; 2026-02-21T09:41:20.7740379Z and.b32 %r285, %r213, 64; 2026-02-21T09:41:20.7740535Z or.b32 %r286, %r281, %r283; 2026-02-21T09:41:20.7740593Z xor.b32 %r287, %r286, %r285; 2026-02-21T09:41:20.7740647Z or.b32 %r288, %r287, %r280; 2026-02-21T09:41:20.7740709Z add.s32 %r290, %r66, 73728; 2026-02-21T09:41:20.7740764Z add.s32 %r21, %r290, %r288; 2026-02-21T09:41:20.7740819Z xor.b32 %r291, %r288, 32; 2026-02-21T09:41:20.7740880Z add.s32 %r22, %r290, %r291; 2026-02-21T09:41:20.7740936Z shl.b32 %r292, %r1, 5; 2026-02-21T09:41:20.7740990Z and.b32 %r293, %r292, 3168; 2026-02-21T09:41:20.7741044Z shl.b32 %r294, %r1, 4; 2026-02-21T09:41:20.7741108Z and.b32 %r295, %r294, 384; 2026-02-21T09:41:20.7741164Z and.b32 %r296, %r213, 16; 2026-02-21T09:41:20.7741217Z or.b32 %r297, %r293, %r295; 2026-02-21T09:41:20.7741281Z xor.b32 %r298, %r297, %r282; 2026-02-21T09:41:20.7741338Z add.s32 
%r299, %r290, %r296; 2026-02-21T09:41:20.7741394Z add.s32 %r438, %r299, %r298; 2026-02-21T09:41:20.7741564Z add.s32 %r443, %r438, 512; 2026-02-21T09:41:20.7741748Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7741805Z max.s32 %r300, %r15, 2; 2026-02-21T09:41:20.7741863Z add.s32 %r25, %r300, -1; 2026-02-21T09:41:20.7741930Z mov.pred %p144, -1; 2026-02-21T09:41:20.7741983Z mov.b32 %r648, 5; 2026-02-21T09:41:20.7742036Z mov.b32 %r644, 0; 2026-02-21T09:41:20.7742096Z mov.b32 %r642, 1; 2026-02-21T09:41:20.7742151Z mov.b32 %r641, 2; 2026-02-21T09:41:20.7742203Z mov.b32 %r640, 3; 2026-02-21T09:41:20.7742254Z mov.b32 %r639, 4; 2026-02-21T09:41:20.7742319Z mov.b32 %r632, %r631; 2026-02-21T09:41:20.7742373Z mov.b32 %r633, %r631; 2026-02-21T09:41:20.7742429Z mov.b32 %r634, %r631; 2026-02-21T09:41:20.7742490Z mov.b32 %r636, %r635; 2026-02-21T09:41:20.7742546Z mov.b32 %r637, %r635; 2026-02-21T09:41:20.7742600Z mov.b32 %r638, %r635; 2026-02-21T09:41:20.7742652Z mov.b32 %r645, %r151; 2026-02-21T09:41:20.7742717Z mov.b32 %r646, %r644; 2026-02-21T09:41:20.7742771Z mov.b32 %r647, %r644; 2026-02-21T09:41:20.7742823Z mov.b32 %r649, %r642; 2026-02-21T09:41:20.7742883Z mov.b32 %r650, %r644; 2026-02-21T09:41:20.7742936Z mov.b32 %r651, %r631; 2026-02-21T09:41:20.7742989Z mov.b32 %r652, %r635; 2026-02-21T09:41:20.7743042Z mov.b32 %r654, %r648; 2026-02-21T09:41:20.7743103Z mov.b32 %r655, %r644; 2026-02-21T09:41:20.7743154Z mov.b32 %r656, %r652; 2026-02-21T09:41:20.7743206Z mov.b32 %r657, %r651; 2026-02-21T09:41:20.7743289Z bra.uni $L__BB0_4; 2026-02-21T09:41:20.7743392Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.7743559Z .loc 1 0 0 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:0 2026-02-21T09:41:20.7743622Z selp.b32 %r649, 0, %r352, %p128; 2026-02-21T09:41:20.7743712Z selp.b32 %r353, 1, 0, %p128; 2026-02-21T09:41:20.7743768Z xor.b32 %r650, %r620, %r353; 2026-02-21T09:41:20.7743944Z .loc 1 28 108 // 
cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7744008Z add.s32 %r655, %r655, 1; 2026-02-21T09:41:20.7744069Z setp.ne.b32 %p134, %r25, %r655; 2026-02-21T09:41:20.7744122Z mov.b32 %r631, %r651; 2026-02-21T09:41:20.7744182Z mov.b32 %r634, %r28; 2026-02-21T09:41:20.7744235Z mov.b32 %r635, %r652; 2026-02-21T09:41:20.7744288Z mov.b32 %r638, %r32; 2026-02-21T09:41:20.7744340Z mov.b32 %r639, %r654; 2026-02-21T09:41:20.7744401Z mov.b32 %r642, %r36; 2026-02-21T09:41:20.7744455Z mov.b32 %r644, %r620; 2026-02-21T09:41:20.7744507Z mov.b32 %r645, %r619; 2026-02-21T09:41:20.7744566Z mov.b32 %r651, %r657; 2026-02-21T09:41:20.7744619Z mov.b32 %r652, %r656; 2026-02-21T09:41:20.7744698Z mov.b32 %r654, %r51; 2026-02-21T09:41:20.7744756Z @%p134 bra $L__BB0_4; 2026-02-21T09:41:20.7744845Z bra.uni $L__BB0_11; 2026-02-21T09:41:20.7744949Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:20.7745114Z .loc 1 0 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:0:108 2026-02-21T09:41:20.7745176Z mov.b32 %r620, %r650; 2026-02-21T09:41:20.7745229Z mov.b32 %r36, %r641; 2026-02-21T09:41:20.7745281Z mov.b32 %r641, %r640; 2026-02-21T09:41:20.7745333Z mov.b32 %r640, %r639; 2026-02-21T09:41:20.7745394Z mov.b32 %r32, %r637; 2026-02-21T09:41:20.7745448Z mov.b32 %r637, %r636; 2026-02-21T09:41:20.7745501Z mov.b32 %r636, %r635; 2026-02-21T09:41:20.7745564Z mov.b32 %r28, %r633; 2026-02-21T09:41:20.7745619Z mov.b32 %r633, %r632; 2026-02-21T09:41:20.7745672Z mov.b32 %r632, %r631; 2026-02-21T09:41:20.7745848Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7745905Z add.s32 %r301, %r654, 1; 2026-02-21T09:41:20.7745972Z setp.eq.b32 %p112, %r654, 31; 2026-02-21T09:41:20.7746035Z selp.b32 %r51, 0, %r301, %p112; 2026-02-21T09:41:20.7746103Z setp.ne.b32 %p113, %r51, 0; 2026-02-21T09:41:20.7746189Z @%p113 bra $L__BB0_6; 2026-02-21T09:41:20.7746289Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 
2026-02-21T09:41:20.7746361Z add.s32 %r658, %r658, 4736; 2026-02-21T09:41:20.7746530Z .loc 1 34 35 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:34:35 2026-02-21T09:41:20.7746588Z shr.s32 %r302, %r658, 31; 2026-02-21T09:41:20.7746642Z shr.u32 %r303, %r302, 22; 2026-02-21T09:41:20.7746703Z add.s32 %r304, %r658, %r303; 2026-02-21T09:41:20.7746757Z shr.s32 %r305, %r304, 10; 2026-02-21T09:41:20.7746921Z .loc 1 35 33 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:35:33 2026-02-21T09:41:20.7746984Z shl.b32 %r306, %r305, 6; 2026-02-21T09:41:20.7747148Z .loc 1 36 39 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:36:39 2026-02-21T09:41:20.7747203Z sub.s32 %r307, 96, %r306; 2026-02-21T09:41:20.7747371Z .loc 1 36 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:36:52 2026-02-21T09:41:20.7747426Z min.s32 %r308, %r307, 64; 2026-02-21T09:41:20.7747587Z .loc 1 37 45 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:45 2026-02-21T09:41:20.7747651Z and.b32 %r309, %r304, -1024; 2026-02-21T09:41:20.7747705Z sub.s32 %r310, %r658, %r309; 2026-02-21T09:41:20.7747867Z .loc 1 38 51 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:38:51 2026-02-21T09:41:20.7747954Z div.s32 %r311, %r310, %r308; 2026-02-21T09:41:20.7748123Z .loc 1 37 64 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:64 2026-02-21T09:41:20.7748184Z mul.lo.s32 %r312, %r311, %r308; 2026-02-21T09:41:20.7748239Z sub.s32 %r313, %r310, %r312; 2026-02-21T09:41:20.7748437Z .loc 1 37 30 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:37:30 2026-02-21T09:41:20.7748493Z add.s32 %r314, %r313, %r306; 2026-02-21T09:41:20.7748654Z .loc 1 39 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:39:27 2026-02-21T09:41:20.7748718Z shl.b32 %r656, %r314, 7; 2026-02-21T09:41:20.7748880Z .loc 1 41 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:41:27 2026-02-21T09:41:20.7748934Z shl.b32 %r657, %r311, 6; 
2026-02-21T09:41:20.7749028Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.7749203Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7749260Z add.s32 %r317, %r647, 1; 2026-02-21T09:41:20.7749319Z setp.gt.s32 %p115, %r317, 5; 2026-02-21T09:41:20.7749388Z selp.b32 %r647, 0, %r317, %p115; 2026-02-21T09:41:20.7749445Z selp.b32 %r318, 1, 0, %p115; 2026-02-21T09:41:20.7749519Z xor.b32 %r646, %r646, %r318; 2026-02-21T09:41:20.7749582Z shl.b32 %r319, %r647, 3; 2026-02-21T09:41:20.7749638Z add.s32 %r321, %r66, %r319; 2026-02-21T09:41:20.7749694Z add.s32 %r315, %r321, 77824; 2026-02-21T09:41:20.7749746Z bar.sync 0; 2026-02-21T09:41:20.7749808Z // begin inline asm 2026-02-21T09:41:20.7749856Z 2026-02-21T09:41:20.7749904Z { 2026-02-21T09:41:20.7749969Z .reg .pred complete; 2026-02-21T09:41:20.7750022Z waitLoop: 2026-02-21T09:41:20.7750141Z mbarrier.try_wait.parity.shared.b64 complete, [%r315], %r646; 2026-02-21T09:41:20.7750202Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.7750256Z } 2026-02-21T09:41:20.7750260Z 2026-02-21T09:41:20.7750315Z // end inline asm 2026-02-21T09:41:20.7750370Z shl.b32 %r322, %r649, 3; 2026-02-21T09:41:20.7750433Z add.s32 %r323, %r66, %r322; 2026-02-21T09:41:20.7750486Z add.s32 %r619, %r323, 77872; 2026-02-21T09:41:20.7750644Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7750705Z @%p94 bra $L__BB0_8; 2026-02-21T09:41:20.7750798Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.7750987Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7751041Z shl.b32 %r328, %r647, 12; 2026-02-21T09:41:20.7751103Z add.s32 %r330, %r66, %r328; 2026-02-21T09:41:20.7751156Z add.s32 %r331, %r330, 49152; 2026-02-21T09:41:20.7751319Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7751381Z shl.b32 %r332, %r647, 13; 
2026-02-21T09:41:20.7751435Z add.s32 %r333, %r66, %r332; 2026-02-21T09:41:20.7751604Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7751672Z elect.sync %r334|%p117, -1; 2026-02-21T09:41:20.7751729Z bfe.u32 %r335, %r331, 4, 14; 2026-02-21T09:41:20.7751786Z cvt.u64.u32 %rd68, %r335; 2026-02-21T09:41:20.7751858Z or.b64 %rd63, %rd68, -9223371899399045120; 2026-02-21T09:41:20.7751921Z bfe.u32 %r336, %r333, 4, 14; 2026-02-21T09:41:20.7751979Z cvt.u64.u32 %rd69, %r336; 2026-02-21T09:41:20.7752047Z or.b64 %rd64, %rd69, -9223371899382267904; 2026-02-21T09:41:20.7752109Z mov.b32 %r325, 69206032; 2026-02-21T09:41:20.7752163Z // begin inline asm 2026-02-21T09:41:20.7752302Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd63, %rd64, %r325, %p144; 2026-02-21T09:41:20.7752362Z // end inline asm 2026-02-21T09:41:20.7752417Z add.s32 %r337, %r330, 49184; 2026-02-21T09:41:20.7752470Z bfe.u32 %r338, %r337, 4, 14; 2026-02-21T09:41:20.7752525Z cvt.u64.u32 %rd70, %r338; 2026-02-21T09:41:20.7752621Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:20.7752675Z add.s32 %r339, %r333, 32; 2026-02-21T09:41:20.7752731Z bfe.u32 %r340, %r339, 4, 14; 2026-02-21T09:41:20.7752792Z cvt.u64.u32 %rd71, %r340; 2026-02-21T09:41:20.7752854Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:20.7752935Z mov.pred %p118, -1; 2026-02-21T09:41:20.7752992Z // begin inline asm 2026-02-21T09:41:20.7753134Z @%p117 tcgen05.mma.cta_group::1.kind::f16 [ %r629 + 0 ], %rd65, %rd66, %r325, %p118; 2026-02-21T09:41:20.7753190Z // end inline asm 2026-02-21T09:41:20.7753246Z cvt.u64.u32 %rd67, %r619; 2026-02-21T09:41:20.7753307Z // begin inline asm 2026-02-21T09:41:20.7753429Z @%p117 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:41:20.7753481Z // end inline asm 2026-02-21T09:41:20.7753584Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.7753757Z .loc 1 28 108 // 
cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7753818Z setp.eq.b32 %p124, %r51, 0; 2026-02-21T09:41:20.7753880Z setp.lt.s32 %p125, %r655, %r20; 2026-02-21T09:41:20.7754058Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7754147Z // begin inline asm 2026-02-21T09:41:20.7754200Z 2026-02-21T09:41:20.7754258Z { 2026-02-21T09:41:20.7754316Z .reg .pred complete; 2026-02-21T09:41:20.7754372Z waitLoop: 2026-02-21T09:41:20.7754488Z mbarrier.try_wait.parity.shared.b64 complete, [%r645], %r644; 2026-02-21T09:41:20.7754556Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.7754605Z } 2026-02-21T09:41:20.7754608Z 2026-02-21T09:41:20.7754660Z // end inline asm 2026-02-21T09:41:20.7754750Z add.s32 %r352, %r649, 1; 2026-02-21T09:41:20.7754809Z setp.gt.s32 %p128, %r352, 1; 2026-02-21T09:41:20.7754973Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7755036Z add.s32 %r354, %r643, 32; 2026-02-21T09:41:20.7755090Z add.s32 %r355, %r648, 1; 2026-02-21T09:41:20.7755147Z setp.gt.s32 %p129, %r355, 5; 2026-02-21T09:41:20.7755206Z selp.b32 %r648, 0, %r355, %p129; 2026-02-21T09:41:20.7755273Z selp.b32 %r643, 0, %r354, %p124; 2026-02-21T09:41:20.7755329Z shl.b32 %r356, %r648, 3; 2026-02-21T09:41:20.7755385Z add.s32 %r358, %r66, %r356; 2026-02-21T09:41:20.7755449Z add.s32 %r347, %r358, 77824; 2026-02-21T09:41:20.7755529Z bar.sync 0; 2026-02-21T09:41:20.7755592Z and.pred %p121, %p135, %p125; 2026-02-21T09:41:20.7755647Z // begin inline asm 2026-02-21T09:41:20.7755768Z @%p121 mbarrier.arrive.expect_tx.shared.b64 _, [%r347], 12288; 2026-02-21T09:41:20.7755822Z // end inline asm 2026-02-21T09:41:20.7755985Z .loc 1 51 31 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:51:31 2026-02-21T09:41:20.7756048Z shl.b32 %r359, %r648, 12; 2026-02-21T09:41:20.7756103Z add.s32 %r360, %r66, %r359; 2026-02-21T09:41:20.7756161Z add.s32 %r344, %r360, 49152; 
2026-02-21T09:41:20.7756219Z bar.sync 0; 2026-02-21T09:41:20.7756279Z elect.sync %r361|%p130, -1; 2026-02-21T09:41:20.7756342Z and.pred %p131, %p125, %p130; 2026-02-21T09:41:20.7756401Z and.pred %p122, %p4, %p131; 2026-02-21T09:41:20.7756463Z // begin inline asm 2026-02-21T09:41:20.7756705Z @%p122 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r344], [%rd40, {%r643, %r657}], [%r347]; 2026-02-21T09:41:20.7756760Z // end inline asm 2026-02-21T09:41:20.7756930Z .loc 1 52 44 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:52:44 2026-02-21T09:41:20.7756986Z shl.b32 %r362, %r648, 13; 2026-02-21T09:41:20.7757041Z add.s32 %r348, %r66, %r362; 2026-02-21T09:41:20.7757099Z bar.sync 0; 2026-02-21T09:41:20.7757157Z elect.sync %r363|%p132, -1; 2026-02-21T09:41:20.7757217Z and.pred %p133, %p125, %p132; 2026-02-21T09:41:20.7757275Z and.pred %p123, %p4, %p133; 2026-02-21T09:41:20.7757361Z // begin inline asm 2026-02-21T09:41:20.7757600Z @%p123 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r348], [%rd41, {%r643, %r656}], [%r347]; 2026-02-21T09:41:20.7757653Z // end inline asm 2026-02-21T09:41:20.7757830Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7757913Z setp.ne.b32 %p144, %r642, 31; 2026-02-21T09:41:20.7757972Z @%p144 bra $L__BB0_10; 2026-02-21T09:41:20.7758070Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:20.7758236Z .loc 1 40 32 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:40:32 2026-02-21T09:41:20.7758295Z add.s32 %r506, %r638, %r5; 2026-02-21T09:41:20.7758463Z .loc 1 42 32 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:42:32 2026-02-21T09:41:20.7758527Z add.s32 %r507, %r634, %r6; 2026-02-21T09:41:20.7758583Z add.s32 %r508, %r7, %r634; 2026-02-21T09:41:20.7758638Z add.s32 %r509, %r8, %r634; 2026-02-21T09:41:20.7758699Z add.s32 %r510, %r9, %r634; 2026-02-21T09:41:20.7758753Z add.s32 %r511, %r10, %r634; 
2026-02-21T09:41:20.7758807Z add.s32 %r512, %r11, %r634; 2026-02-21T09:41:20.7758869Z add.s32 %r513, %r12, %r634; 2026-02-21T09:41:20.7758948Z add.s32 %r514, %r634, %r13; 2026-02-21T09:41:20.7759112Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7759167Z // begin inline asm 2026-02-21T09:41:20.7759222Z 2026-02-21T09:41:20.7759271Z { 2026-02-21T09:41:20.7759328Z .reg .pred complete; 2026-02-21T09:41:20.7759385Z waitLoop: 2026-02-21T09:41:20.7759499Z mbarrier.try_wait.parity.shared.b64 complete, [%r619], %r620; 2026-02-21T09:41:20.7759560Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.7759607Z } 2026-02-21T09:41:20.7759611Z 2026-02-21T09:41:20.7759671Z // end inline asm 2026-02-21T09:41:20.7759837Z .loc 1 56 53 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:56:53 2026-02-21T09:41:20.7759906Z mad.lo.s32 %r515, %r507, 12288, %r506; 2026-02-21T09:41:20.7759978Z mad.lo.s32 %r516, %r508, 12288, %r506; 2026-02-21T09:41:20.7760039Z mad.lo.s32 %r517, %r509, 12288, %r506; 2026-02-21T09:41:20.7760098Z mad.lo.s32 %r518, %r510, 12288, %r506; 2026-02-21T09:41:20.7760167Z mad.lo.s32 %r519, %r511, 12288, %r506; 2026-02-21T09:41:20.7760226Z mad.lo.s32 %r520, %r512, 12288, %r506; 2026-02-21T09:41:20.7760306Z mad.lo.s32 %r521, %r513, 12288, %r506; 2026-02-21T09:41:20.7760363Z mad.lo.s32 %r522, %r514, 12288, %r506; 2026-02-21T09:41:20.7760539Z .loc 1 56 24 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:56:24 2026-02-21T09:41:20.7760604Z mad.wide.s32 %rd74, %r515, 2, %rd3; 2026-02-21T09:41:20.7760667Z mad.wide.s32 %rd75, %r516, 2, %rd3; 2026-02-21T09:41:20.7760735Z mad.wide.s32 %rd76, %r517, 2, %rd3; 2026-02-21T09:41:20.7760793Z mad.wide.s32 %rd77, %r518, 2, %rd3; 2026-02-21T09:41:20.7760855Z mad.wide.s32 %rd78, %r519, 2, %rd3; 2026-02-21T09:41:20.7760920Z mad.wide.s32 %rd79, %r520, 2, %rd3; 2026-02-21T09:41:20.7760978Z mad.wide.s32 %rd80, %r521, 2, %rd3; 2026-02-21T09:41:20.7761036Z mad.wide.s32 %rd81, 
%r522, 2, %rd3; 2026-02-21T09:41:20.7761201Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7761264Z // begin inline asm 2026-02-21T09:41:20.7761547Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r366, %r367, %r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381}, [%r83 + 0], 64; 2026-02-21T09:41:20.7761601Z // end inline asm 2026-02-21T09:41:20.7761667Z // begin inline asm 2026-02-21T09:41:20.7761963Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r383, %r384, %r385, %r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398}, [%r83 + 16], 64; 2026-02-21T09:41:20.7762020Z // end inline asm 2026-02-21T09:41:20.7762111Z // begin inline asm 2026-02-21T09:41:20.7762405Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r400, %r401, %r402, %r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415}, [%r83 + 32], 64; 2026-02-21T09:41:20.7762463Z // end inline asm 2026-02-21T09:41:20.7762530Z // begin inline asm 2026-02-21T09:41:20.7762846Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r417, %r418, %r419, %r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432}, [%r83 + 48], 64; 2026-02-21T09:41:20.7762905Z // end inline asm 2026-02-21T09:41:20.7762961Z // begin inline asm 2026-02-21T09:41:20.7763039Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:20.7763093Z // end inline asm 2026-02-21T09:41:20.7763152Z cvt.u64.u32 %rd82, %r366; 2026-02-21T09:41:20.7763219Z cvt.u64.u32 %rd83, %r367; 2026-02-21T09:41:20.7763279Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:20.7763337Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:20.7763519Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7763584Z mov.b64 {%r523, %r524}, %rd85; 2026-02-21T09:41:20.7763651Z cvt.rn.f16x2.f32 %r525, %r524, %r523; 2026-02-21T09:41:20.7763817Z .loc 1 53 52 // 
cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7763905Z cvt.u64.u32 %rd86, %r368; 2026-02-21T09:41:20.7763966Z cvt.u64.u32 %rd87, %r369; 2026-02-21T09:41:20.7764025Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:20.7764090Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:20.7764260Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7764322Z mov.b64 {%r526, %r527}, %rd89; 2026-02-21T09:41:20.7764388Z cvt.rn.f16x2.f32 %r528, %r527, %r526; 2026-02-21T09:41:20.7764567Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7764626Z cvt.u64.u32 %rd90, %r370; 2026-02-21T09:41:20.7764718Z cvt.u64.u32 %rd91, %r371; 2026-02-21T09:41:20.7764789Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:20.7764846Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:20.7765018Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7765087Z mov.b64 {%r529, %r530}, %rd93; 2026-02-21T09:41:20.7765157Z cvt.rn.f16x2.f32 %r531, %r530, %r529; 2026-02-21T09:41:20.7765327Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7765418Z cvt.u64.u32 %rd94, %r372; 2026-02-21T09:41:20.7765475Z cvt.u64.u32 %rd95, %r373; 2026-02-21T09:41:20.7765532Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:20.7765589Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:20.7765769Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7765830Z mov.b64 {%r532, %r533}, %rd97; 2026-02-21T09:41:20.7765894Z cvt.rn.f16x2.f32 %r534, %r533, %r532; 2026-02-21T09:41:20.7766070Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7766128Z cvt.u64.u32 %rd98, %r374; 2026-02-21T09:41:20.7766184Z cvt.u64.u32 %rd99, %r375; 2026-02-21T09:41:20.7766244Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:20.7766315Z or.b64 %rd101, %rd98, %rd100; 
2026-02-21T09:41:20.7766480Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7766544Z mov.b64 {%r535, %r536}, %rd101; 2026-02-21T09:41:20.7766616Z cvt.rn.f16x2.f32 %r537, %r536, %r535; 2026-02-21T09:41:20.7766786Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7766847Z cvt.u64.u32 %rd102, %r376; 2026-02-21T09:41:20.7766914Z cvt.u64.u32 %rd103, %r377; 2026-02-21T09:41:20.7766974Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:20.7767034Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:41:20.7767229Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7767297Z mov.b64 {%r538, %r539}, %rd105; 2026-02-21T09:41:20.7767360Z cvt.rn.f16x2.f32 %r540, %r539, %r538; 2026-02-21T09:41:20.7767558Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7767626Z cvt.u64.u32 %rd106, %r378; 2026-02-21T09:41:20.7767686Z cvt.u64.u32 %rd107, %r379; 2026-02-21T09:41:20.7767744Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:41:20.7767810Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:20.7767980Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7768039Z mov.b64 {%r541, %r542}, %rd109; 2026-02-21T09:41:20.7768102Z cvt.rn.f16x2.f32 %r543, %r542, %r541; 2026-02-21T09:41:20.7768277Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7768338Z cvt.u64.u32 %rd110, %r380; 2026-02-21T09:41:20.7768395Z cvt.u64.u32 %rd111, %r381; 2026-02-21T09:41:20.7768460Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:20.7768518Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:20.7768714Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7768781Z mov.b64 {%r544, %r545}, %rd113; 2026-02-21T09:41:20.7768845Z cvt.rn.f16x2.f32 %r546, %r545, %r544; 2026-02-21T09:41:20.7769020Z .loc 1 53 52 // 
cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7769079Z cvt.u64.u32 %rd114, %r383; 2026-02-21T09:41:20.7769143Z cvt.u64.u32 %rd115, %r384; 2026-02-21T09:41:20.7769201Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:20.7769259Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:20.7769438Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7769500Z mov.b64 {%r547, %r548}, %rd117; 2026-02-21T09:41:20.7769563Z cvt.rn.f16x2.f32 %r549, %r548, %r547; 2026-02-21T09:41:20.7769744Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7769801Z cvt.u64.u32 %rd118, %r385; 2026-02-21T09:41:20.7769860Z cvt.u64.u32 %rd119, %r386; 2026-02-21T09:41:20.7769919Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:20.7769985Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:20.7770186Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7770243Z mov.b64 {%r550, %r551}, %rd121; 2026-02-21T09:41:20.7770312Z cvt.rn.f16x2.f32 %r552, %r551, %r550; 2026-02-21T09:41:20.7770475Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7770531Z cvt.u64.u32 %rd122, %r387; 2026-02-21T09:41:20.7770594Z cvt.u64.u32 %rd123, %r388; 2026-02-21T09:41:20.7770651Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:20.7770708Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:20.7770868Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7770933Z mov.b64 {%r553, %r554}, %rd125; 2026-02-21T09:41:20.7770994Z cvt.rn.f16x2.f32 %r555, %r554, %r553; 2026-02-21T09:41:20.7771159Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7771228Z cvt.u64.u32 %rd126, %r389; 2026-02-21T09:41:20.7771283Z cvt.u64.u32 %rd127, %r390; 2026-02-21T09:41:20.7771340Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:20.7771415Z or.b64 %rd129, %rd126, 
%rd128; 2026-02-21T09:41:20.7771578Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7771634Z mov.b64 {%r556, %r557}, %rd129; 2026-02-21T09:41:20.7771694Z cvt.rn.f16x2.f32 %r558, %r557, %r556; 2026-02-21T09:41:20.7771885Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7771940Z cvt.u64.u32 %rd130, %r391; 2026-02-21T09:41:20.7771996Z cvt.u64.u32 %rd131, %r392; 2026-02-21T09:41:20.7772060Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:20.7772116Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:41:20.7772316Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7772383Z mov.b64 {%r559, %r560}, %rd133; 2026-02-21T09:41:20.7772442Z cvt.rn.f16x2.f32 %r561, %r560, %r559; 2026-02-21T09:41:20.7772608Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7772662Z cvt.u64.u32 %rd134, %r393; 2026-02-21T09:41:20.7772725Z cvt.u64.u32 %rd135, %r394; 2026-02-21T09:41:20.7772779Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:41:20.7772834Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:20.7773005Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7773062Z mov.b64 {%r562, %r563}, %rd137; 2026-02-21T09:41:20.7773122Z cvt.rn.f16x2.f32 %r564, %r563, %r562; 2026-02-21T09:41:20.7773294Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7773370Z cvt.u64.u32 %rd138, %r395; 2026-02-21T09:41:20.7773427Z cvt.u64.u32 %rd139, %r396; 2026-02-21T09:41:20.7773484Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:20.7773547Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:20.7773707Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7773764Z mov.b64 {%r565, %r566}, %rd141; 2026-02-21T09:41:20.7773830Z cvt.rn.f16x2.f32 %r567, %r566, %r565; 2026-02-21T09:41:20.7773994Z .loc 1 53 52 
// cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7774050Z cvt.u64.u32 %rd142, %r397; 2026-02-21T09:41:20.7774114Z cvt.u64.u32 %rd143, %r398; 2026-02-21T09:41:20.7774170Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:41:20.7774225Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:41:20.7774388Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7774452Z mov.b64 {%r568, %r569}, %rd145; 2026-02-21T09:41:20.7774514Z cvt.rn.f16x2.f32 %r570, %r569, %r568; 2026-02-21T09:41:20.7774728Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7774795Z cvt.u64.u32 %rd146, %r400; 2026-02-21T09:41:20.7774852Z cvt.u64.u32 %rd147, %r401; 2026-02-21T09:41:20.7774907Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:41:20.7774970Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:41:20.7775137Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7775193Z mov.b64 {%r571, %r572}, %rd149; 2026-02-21T09:41:20.7775255Z cvt.rn.f16x2.f32 %r573, %r572, %r571; 2026-02-21T09:41:20.7775434Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7775489Z cvt.u64.u32 %rd150, %r402; 2026-02-21T09:41:20.7775545Z cvt.u64.u32 %rd151, %r403; 2026-02-21T09:41:20.7775606Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:41:20.7775663Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:41:20.7775831Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7775894Z mov.b64 {%r574, %r575}, %rd153; 2026-02-21T09:41:20.7775953Z cvt.rn.f16x2.f32 %r576, %r575, %r574; 2026-02-21T09:41:20.7776116Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7776171Z cvt.u64.u32 %rd154, %r404; 2026-02-21T09:41:20.7776233Z cvt.u64.u32 %rd155, %r405; 2026-02-21T09:41:20.7776288Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:41:20.7776381Z or.b64 %rd157, %rd154, 
%rd156; 2026-02-21T09:41:20.7776550Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7776606Z mov.b64 {%r577, %r578}, %rd157; 2026-02-21T09:41:20.7776666Z cvt.rn.f16x2.f32 %r579, %r578, %r577; 2026-02-21T09:41:20.7776866Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7776922Z cvt.u64.u32 %rd158, %r406; 2026-02-21T09:41:20.7776978Z cvt.u64.u32 %rd159, %r407; 2026-02-21T09:41:20.7777033Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:41:20.7777096Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:41:20.7777258Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7777314Z mov.b64 {%r580, %r581}, %rd161; 2026-02-21T09:41:20.7777381Z cvt.rn.f16x2.f32 %r582, %r581, %r580; 2026-02-21T09:41:20.7777544Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7777603Z cvt.u64.u32 %rd162, %r408; 2026-02-21T09:41:20.7777667Z cvt.u64.u32 %rd163, %r409; 2026-02-21T09:41:20.7777721Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:41:20.7777776Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:41:20.7777965Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7778031Z mov.b64 {%r583, %r584}, %rd165; 2026-02-21T09:41:20.7778091Z cvt.rn.f16x2.f32 %r585, %r584, %r583; 2026-02-21T09:41:20.7778253Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7778316Z cvt.u64.u32 %rd166, %r410; 2026-02-21T09:41:20.7778372Z cvt.u64.u32 %rd167, %r411; 2026-02-21T09:41:20.7778427Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:41:20.7778490Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:41:20.7778651Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7778709Z mov.b64 {%r586, %r587}, %rd169; 2026-02-21T09:41:20.7778769Z cvt.rn.f16x2.f32 %r588, %r587, %r586; 2026-02-21T09:41:20.7778936Z .loc 1 53 52 
// cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7778992Z cvt.u64.u32 %rd170, %r412; 2026-02-21T09:41:20.7779051Z cvt.u64.u32 %rd171, %r413; 2026-02-21T09:41:20.7779115Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:41:20.7779201Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:41:20.7779368Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7779432Z mov.b64 {%r589, %r590}, %rd173; 2026-02-21T09:41:20.7779491Z cvt.rn.f16x2.f32 %r591, %r590, %r589; 2026-02-21T09:41:20.7779652Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7779707Z cvt.u64.u32 %rd174, %r414; 2026-02-21T09:41:20.7779771Z cvt.u64.u32 %rd175, %r415; 2026-02-21T09:41:20.7779829Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:41:20.7779886Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:41:20.7780054Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7780112Z mov.b64 {%r592, %r593}, %rd177; 2026-02-21T09:41:20.7780179Z cvt.rn.f16x2.f32 %r594, %r593, %r592; 2026-02-21T09:41:20.7780347Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7780404Z cvt.u64.u32 %rd178, %r417; 2026-02-21T09:41:20.7780460Z cvt.u64.u32 %rd179, %r418; 2026-02-21T09:41:20.7780516Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:41:20.7780580Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:41:20.7780744Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7780800Z mov.b64 {%r595, %r596}, %rd181; 2026-02-21T09:41:20.7780867Z cvt.rn.f16x2.f32 %r597, %r596, %r595; 2026-02-21T09:41:20.7781056Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7781115Z cvt.u64.u32 %rd182, %r419; 2026-02-21T09:41:20.7781177Z cvt.u64.u32 %rd183, %r420; 2026-02-21T09:41:20.7781232Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:41:20.7781290Z or.b64 %rd185, %rd182, 
%rd184; 2026-02-21T09:41:20.7781474Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7781540Z mov.b64 {%r598, %r599}, %rd185; 2026-02-21T09:41:20.7781600Z cvt.rn.f16x2.f32 %r600, %r599, %r598; 2026-02-21T09:41:20.7781768Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7781833Z cvt.u64.u32 %rd186, %r421; 2026-02-21T09:41:20.7781888Z cvt.u64.u32 %rd187, %r422; 2026-02-21T09:41:20.7781944Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:41:20.7782009Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:41:20.7782173Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7782231Z mov.b64 {%r601, %r602}, %rd189; 2026-02-21T09:41:20.7782291Z cvt.rn.f16x2.f32 %r603, %r602, %r601; 2026-02-21T09:41:20.7782487Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7782548Z cvt.u64.u32 %rd190, %r423; 2026-02-21T09:41:20.7782603Z cvt.u64.u32 %rd191, %r424; 2026-02-21T09:41:20.7782669Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:41:20.7782724Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:41:20.7782891Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7782954Z mov.b64 {%r604, %r605}, %rd193; 2026-02-21T09:41:20.7783015Z cvt.rn.f16x2.f32 %r606, %r605, %r604; 2026-02-21T09:41:20.7783183Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7783241Z cvt.u64.u32 %rd194, %r425; 2026-02-21T09:41:20.7783305Z cvt.u64.u32 %rd195, %r426; 2026-02-21T09:41:20.7783361Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:41:20.7783417Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:41:20.7783587Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7783644Z mov.b64 {%r607, %r608}, %rd197; 2026-02-21T09:41:20.7783705Z cvt.rn.f16x2.f32 %r609, %r608, %r607; 2026-02-21T09:41:20.7783919Z .loc 1 53 52 
// cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7783974Z cvt.u64.u32 %rd198, %r427; 2026-02-21T09:41:20.7784030Z cvt.u64.u32 %rd199, %r428; 2026-02-21T09:41:20.7784086Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:41:20.7784151Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:41:20.7784315Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7784374Z mov.b64 {%r610, %r611}, %rd201; 2026-02-21T09:41:20.7784441Z cvt.rn.f16x2.f32 %r612, %r611, %r610; 2026-02-21T09:41:20.7784606Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7784661Z cvt.u64.u32 %rd202, %r429; 2026-02-21T09:41:20.7784751Z cvt.u64.u32 %rd203, %r430; 2026-02-21T09:41:20.7784811Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:41:20.7784866Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:41:20.7785032Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7785094Z mov.b64 {%r613, %r614}, %rd205; 2026-02-21T09:41:20.7785153Z cvt.rn.f16x2.f32 %r615, %r614, %r613; 2026-02-21T09:41:20.7785318Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7785380Z cvt.u64.u32 %rd206, %r431; 2026-02-21T09:41:20.7785434Z cvt.u64.u32 %rd207, %r432; 2026-02-21T09:41:20.7785488Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:41:20.7785581Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:41:20.7785746Z .loc 1 55 27 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:55:27 2026-02-21T09:41:20.7785802Z mov.b64 {%r616, %r617}, %rd209; 2026-02-21T09:41:20.7785862Z cvt.rn.f16x2.f32 %r618, %r617, %r616; 2026-02-21T09:41:20.7786061Z .loc 1 56 83 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:56:83 2026-02-21T09:41:20.7786160Z st.shared.v4.b32 [%r21], {%r525, %r537, %r549, %r561}; 2026-02-21T09:41:20.7786248Z st.shared.v4.b32 [%r22], {%r573, %r585, %r597, %r609}; 2026-02-21T09:41:20.7786310Z bar.sync 0; 
2026-02-21T09:41:20.7786365Z // begin inline asm 2026-02-21T09:41:20.7786516Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r474, %r478, %r482, %r486}, [%r438]; 2026-02-21T09:41:20.7786575Z // end inline asm 2026-02-21T09:41:20.7786628Z // begin inline asm 2026-02-21T09:41:20.7786771Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r490, %r494, %r498, %r502}, [%r443]; 2026-02-21T09:41:20.7786825Z // end inline asm 2026-02-21T09:41:20.7786886Z bar.sync 0; 2026-02-21T09:41:20.7786972Z st.shared.v4.b32 [%r21], {%r528, %r540, %r552, %r564}; 2026-02-21T09:41:20.7787055Z st.shared.v4.b32 [%r22], {%r576, %r588, %r600, %r612}; 2026-02-21T09:41:20.7787113Z bar.sync 0; 2026-02-21T09:41:20.7787195Z // begin inline asm 2026-02-21T09:41:20.7787339Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r475, %r479, %r483, %r487}, [%r438]; 2026-02-21T09:41:20.7787394Z // end inline asm 2026-02-21T09:41:20.7787454Z // begin inline asm 2026-02-21T09:41:20.7787594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r491, %r495, %r499, %r503}, [%r443]; 2026-02-21T09:41:20.7787647Z // end inline asm 2026-02-21T09:41:20.7787707Z bar.sync 0; 2026-02-21T09:41:20.7787791Z st.shared.v4.b32 [%r21], {%r531, %r543, %r555, %r567}; 2026-02-21T09:41:20.7787872Z st.shared.v4.b32 [%r22], {%r579, %r591, %r603, %r615}; 2026-02-21T09:41:20.7787932Z bar.sync 0; 2026-02-21T09:41:20.7787987Z // begin inline asm 2026-02-21T09:41:20.7788127Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r476, %r480, %r484, %r488}, [%r438]; 2026-02-21T09:41:20.7788179Z // end inline asm 2026-02-21T09:41:20.7788244Z // begin inline asm 2026-02-21T09:41:20.7788383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r492, %r496, %r500, %r504}, [%r443]; 2026-02-21T09:41:20.7788439Z // end inline asm 2026-02-21T09:41:20.7788501Z bar.sync 0; 2026-02-21T09:41:20.7788611Z st.shared.v4.b32 [%r21], {%r534, %r546, %r558, %r570}; 2026-02-21T09:41:20.7788695Z st.shared.v4.b32 [%r22], {%r582, %r594, %r606, %r618}; 2026-02-21T09:41:20.7788750Z bar.sync 0; 
2026-02-21T09:41:20.7788813Z // begin inline asm 2026-02-21T09:41:20.7788952Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r477, %r481, %r485, %r489}, [%r438]; 2026-02-21T09:41:20.7789004Z // end inline asm 2026-02-21T09:41:20.7789066Z // begin inline asm 2026-02-21T09:41:20.7789203Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r493, %r497, %r501, %r505}, [%r443]; 2026-02-21T09:41:20.7789257Z // end inline asm 2026-02-21T09:41:20.7789317Z // begin inline asm 2026-02-21T09:41:20.7789419Z st.global.v4.b32 [ %rd74 + 0 ], { %r474, %r475, %r476, %r477 }; 2026-02-21T09:41:20.7789472Z // end inline asm 2026-02-21T09:41:20.7789524Z // begin inline asm 2026-02-21T09:41:20.7789631Z st.global.v4.b32 [ %rd75 + 0 ], { %r478, %r479, %r480, %r481 }; 2026-02-21T09:41:20.7789683Z // end inline asm 2026-02-21T09:41:20.7789736Z // begin inline asm 2026-02-21T09:41:20.7789840Z st.global.v4.b32 [ %rd76 + 0 ], { %r482, %r483, %r484, %r485 }; 2026-02-21T09:41:20.7789890Z // end inline asm 2026-02-21T09:41:20.7789943Z // begin inline asm 2026-02-21T09:41:20.7790034Z st.global.v4.b32 [ %rd77 + 0 ], { %r486, %r487, %r488, %r489 }; 2026-02-21T09:41:20.7790093Z // end inline asm 2026-02-21T09:41:20.7790147Z // begin inline asm 2026-02-21T09:41:20.7790236Z st.global.v4.b32 [ %rd78 + 0 ], { %r490, %r491, %r492, %r493 }; 2026-02-21T09:41:20.7790296Z // end inline asm 2026-02-21T09:41:20.7790397Z // begin inline asm 2026-02-21T09:41:20.7790488Z st.global.v4.b32 [ %rd79 + 0 ], { %r494, %r495, %r496, %r497 }; 2026-02-21T09:41:20.7790540Z // end inline asm 2026-02-21T09:41:20.7790601Z // begin inline asm 2026-02-21T09:41:20.7790690Z st.global.v4.b32 [ %rd80 + 0 ], { %r498, %r499, %r500, %r501 }; 2026-02-21T09:41:20.7790743Z // end inline asm 2026-02-21T09:41:20.7790826Z // begin inline asm 2026-02-21T09:41:20.7790918Z st.global.v4.b32 [ %rd81 + 0 ], { %r502, %r503, %r504, %r505 }; 2026-02-21T09:41:20.7790973Z // end inline asm 2026-02-21T09:41:20.7791036Z bra.uni $L__BB0_10; 2026-02-21T09:41:20.7791116Z 
$L__BB0_11: // %._crit_edge 2026-02-21T09:41:20.7791286Z .loc 1 53 52 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:53:52 2026-02-21T09:41:20.7791339Z // begin inline asm 2026-02-21T09:41:20.7791396Z 2026-02-21T09:41:20.7791445Z { 2026-02-21T09:41:20.7791503Z .reg .pred complete; 2026-02-21T09:41:20.7791564Z waitLoop: 2026-02-21T09:41:20.7791679Z mbarrier.try_wait.parity.shared.b64 complete, [%r619], %r620; 2026-02-21T09:41:20.7791743Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.7791792Z } 2026-02-21T09:41:20.7791796Z 2026-02-21T09:41:20.7791856Z // end inline asm 2026-02-21T09:41:20.7791966Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:20.7792142Z .loc 1 28 108 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:108 2026-02-21T09:41:20.7792203Z bar.sync 0; 2026-02-21T09:41:20.7792256Z // begin inline asm 2026-02-21T09:41:20.7792345Z @%p135 mbarrier.inval.shared::cta.b64 [%r153]; 2026-02-21T09:41:20.7792403Z // end inline asm 2026-02-21T09:41:20.7792453Z bar.sync 0; 2026-02-21T09:41:20.7792506Z // begin inline asm 2026-02-21T09:41:20.7792588Z @%p135 mbarrier.inval.shared::cta.b64 [%r154]; 2026-02-21T09:41:20.7792647Z // end inline asm 2026-02-21T09:41:20.7792697Z bar.sync 0; 2026-02-21T09:41:20.7792749Z // begin inline asm 2026-02-21T09:41:20.7792834Z @%p135 mbarrier.inval.shared::cta.b64 [%r155]; 2026-02-21T09:41:20.7792886Z // end inline asm 2026-02-21T09:41:20.7792936Z bar.sync 0; 2026-02-21T09:41:20.7792989Z // begin inline asm 2026-02-21T09:41:20.7793072Z @%p135 mbarrier.inval.shared::cta.b64 [%r156]; 2026-02-21T09:41:20.7793123Z // end inline asm 2026-02-21T09:41:20.7793175Z bar.sync 0; 2026-02-21T09:41:20.7793234Z // begin inline asm 2026-02-21T09:41:20.7793307Z @%p135 mbarrier.inval.shared::cta.b64 [%r157]; 2026-02-21T09:41:20.7793386Z // end inline asm 2026-02-21T09:41:20.7793442Z bar.sync 0; 2026-02-21T09:41:20.7793495Z // begin inline asm 2026-02-21T09:41:20.7793569Z @%p135 mbarrier.inval.shared::cta.b64 [%r259]; 
2026-02-21T09:41:20.7793620Z // end inline asm 2026-02-21T09:41:20.7793681Z // begin inline asm 2026-02-21T09:41:20.7793755Z @%p135 mbarrier.inval.shared::cta.b64 [%r151]; 2026-02-21T09:41:20.7793806Z // end inline asm 2026-02-21T09:41:20.7793864Z bar.sync 0; 2026-02-21T09:41:20.7793919Z // begin inline asm 2026-02-21T09:41:20.7793992Z @%p135 mbarrier.inval.shared::cta.b64 [%r152]; 2026-02-21T09:41:20.7794044Z // end inline asm 2026-02-21T09:41:20.7794214Z .loc 1 28 4 // cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py:28:4 2026-02-21T09:41:20.7794264Z bar.sync 0; 2026-02-21T09:41:20.7794318Z // begin inline asm 2026-02-21T09:41:20.7794440Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r629, 128; 2026-02-21T09:41:20.7794493Z // end inline asm 2026-02-21T09:41:20.7794544Z ret; 2026-02-21T09:41:20.7794603Z $L__tmp1: 2026-02-21T09:41:20.7794656Z $L__func_end0: 2026-02-21T09:41:20.7794790Z // -- End function 2026-02-21T09:41:20.7794840Z } 2026-02-21T09:41:20.7795054Z .file 1 "/tmp/torchinductor_root/un/cunhqcmoip62gabende5smcci5zp3nypwse2nn2xfkluqy3uu7wr.py" 2026-02-21T09:41:20.7795114Z .section .debug_abbrev 2026-02-21T09:41:20.7795164Z { 2026-02-21T09:41:20.7795255Z .b8 1 // Abbreviation Code 2026-02-21T09:41:20.7795368Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:20.7795447Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:20.7795526Z .b8 37 // DW_AT_producer 2026-02-21T09:41:20.7795609Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.7795704Z .b8 19 // DW_AT_language 2026-02-21T09:41:20.7795779Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:20.7795861Z .b8 3 // DW_AT_name 2026-02-21T09:41:20.7795932Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.7796009Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:20.7796093Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:20.7796169Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:20.7796243Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.7796326Z .b8 0 // EOM(1) 2026-02-21T09:41:20.7796396Z .b8 0 // EOM(2) 2026-02-21T09:41:20.7796462Z .b8 0 // EOM(3) 
2026-02-21T09:41:20.7796511Z } 2026-02-21T09:41:20.7796601Z .section .debug_info 2026-02-21T09:41:20.7796652Z { 2026-02-21T09:41:20.7796732Z .b32 104 // Length of Unit 2026-02-21T09:41:20.7796819Z .b8 2 // DWARF version number 2026-02-21T09:41:20.7796871Z .b8 0 2026-02-21T09:41:20.7796983Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:41:20.7797069Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:20.7797169Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:20.7797245Z .b8 116 // DW_AT_producer 2026-02-21T09:41:20.7797297Z .b8 114 2026-02-21T09:41:20.7797356Z .b8 105 2026-02-21T09:41:20.7797404Z .b8 116 2026-02-21T09:41:20.7797452Z .b8 111 2026-02-21T09:41:20.7797502Z .b8 110 2026-02-21T09:41:20.7797558Z .b8 0 2026-02-21T09:41:20.7797628Z .b8 2 // DW_AT_language 2026-02-21T09:41:20.7797676Z .b8 0 2026-02-21T09:41:20.7797758Z .b8 99 // DW_AT_name 2026-02-21T09:41:20.7797806Z .b8 117 2026-02-21T09:41:20.7797881Z .b8 110 2026-02-21T09:41:20.7797930Z .b8 104 2026-02-21T09:41:20.7797986Z .b8 113 2026-02-21T09:41:20.7798035Z .b8 99 2026-02-21T09:41:20.7798082Z .b8 109 2026-02-21T09:41:20.7798137Z .b8 111 2026-02-21T09:41:20.7798185Z .b8 105 2026-02-21T09:41:20.7798233Z .b8 112 2026-02-21T09:41:20.7798283Z .b8 54 2026-02-21T09:41:20.7798339Z .b8 50 2026-02-21T09:41:20.7798387Z .b8 103 2026-02-21T09:41:20.7798435Z .b8 97 2026-02-21T09:41:20.7798492Z .b8 98 2026-02-21T09:41:20.7798540Z .b8 101 2026-02-21T09:41:20.7798589Z .b8 110 2026-02-21T09:41:20.7798639Z .b8 100 2026-02-21T09:41:20.7798696Z .b8 101 2026-02-21T09:41:20.7798746Z .b8 53 2026-02-21T09:41:20.7798795Z .b8 115 2026-02-21T09:41:20.7798849Z .b8 109 2026-02-21T09:41:20.7798897Z .b8 99 2026-02-21T09:41:20.7798945Z .b8 99 2026-02-21T09:41:20.7798993Z .b8 105 2026-02-21T09:41:20.7799048Z .b8 53 2026-02-21T09:41:20.7799097Z .b8 122 2026-02-21T09:41:20.7799146Z .b8 112 2026-02-21T09:41:20.7799195Z .b8 51 2026-02-21T09:41:20.7799250Z .b8 110 2026-02-21T09:41:20.7799300Z .b8 121 
2026-02-21T09:41:20.7799347Z .b8 112 2026-02-21T09:41:20.7799400Z .b8 119 2026-02-21T09:41:20.7799448Z .b8 115 2026-02-21T09:41:20.7799495Z .b8 101 2026-02-21T09:41:20.7799542Z .b8 50 2026-02-21T09:41:20.7799597Z .b8 110 2026-02-21T09:41:20.7799645Z .b8 110 2026-02-21T09:41:20.7799693Z .b8 50 2026-02-21T09:41:20.7799748Z .b8 120 2026-02-21T09:41:20.7799795Z .b8 102 2026-02-21T09:41:20.7799841Z .b8 107 2026-02-21T09:41:20.7799888Z .b8 108 2026-02-21T09:41:20.7799943Z .b8 117 2026-02-21T09:41:20.7799990Z .b8 113 2026-02-21T09:41:20.7800059Z .b8 121 2026-02-21T09:41:20.7800106Z .b8 51 2026-02-21T09:41:20.7800162Z .b8 117 2026-02-21T09:41:20.7800209Z .b8 117 2026-02-21T09:41:20.7800256Z .b8 55 2026-02-21T09:41:20.7800311Z .b8 119 2026-02-21T09:41:20.7800359Z .b8 114 2026-02-21T09:41:20.7800405Z .b8 46 2026-02-21T09:41:20.7800452Z .b8 112 2026-02-21T09:41:20.7800509Z .b8 121 2026-02-21T09:41:20.7800558Z .b8 0 2026-02-21T09:41:20.7800668Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:20.7800750Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:20.7800799Z .b8 116 2026-02-21T09:41:20.7800847Z .b8 109 2026-02-21T09:41:20.7800894Z .b8 112 2026-02-21T09:41:20.7800949Z .b8 47 2026-02-21T09:41:20.7800998Z .b8 116 2026-02-21T09:41:20.7801046Z .b8 111 2026-02-21T09:41:20.7801102Z .b8 114 2026-02-21T09:41:20.7801152Z .b8 99 2026-02-21T09:41:20.7801199Z .b8 104 2026-02-21T09:41:20.7801246Z .b8 105 2026-02-21T09:41:20.7801303Z .b8 110 2026-02-21T09:41:20.7801351Z .b8 100 2026-02-21T09:41:20.7801402Z .b8 117 2026-02-21T09:41:20.7801460Z .b8 99 2026-02-21T09:41:20.7801509Z .b8 116 2026-02-21T09:41:20.7801561Z .b8 111 2026-02-21T09:41:20.7801609Z .b8 114 2026-02-21T09:41:20.7811949Z .b8 95 2026-02-21T09:41:20.7812056Z .b8 114 2026-02-21T09:41:20.7812113Z .b8 111 2026-02-21T09:41:20.7812181Z .b8 111 2026-02-21T09:41:20.7812235Z .b8 116 2026-02-21T09:41:20.7812377Z .b8 47 2026-02-21T09:41:20.7812440Z .b8 117 2026-02-21T09:41:20.7812506Z .b8 110 2026-02-21T09:41:20.7812562Z .b8 0 
2026-02-21T09:41:20.7812621Z } 2026-02-21T09:41:20.7812707Z .section .debug_macinfo { } 2026-02-21T09:41:20.7812714Z 2026-02-21T09:41:20.7812806Z ================================================================ 2026-02-21T09:41:20.7812930Z please share the reproducer above with Triton project. 2026-02-21T09:41:20.9383186Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:20.9383201Z 2026-02-21T09:41:20.9388103Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:41:20.9388124Z 2026-02-21T09:41:20.9388346Z 2026-02-21T09:41:20.9388469Z ================================================================ 2026-02-21T09:41:20.9388551Z Internal Triton PTX codegen error 2026-02-21T09:41:20.9388617Z `ptxas` stderr: 2026-02-21T09:41:20.9388966Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 269 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:20.9389065Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.9389069Z 2026-02-21T09:41:20.9389482Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpml261tm1.ptx -o /tmp/tmpml261tm1.ptx.o 2026-02-21T09:41:20.9389487Z 2026-02-21T09:41:20.9389491Z 2026-02-21T09:41:20.9389552Z // 2026-02-21T09:41:20.9389626Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:20.9389677Z // 2026-02-21T09:41:20.9389682Z 2026-02-21T09:41:20.9389747Z .version 8.7 2026-02-21T09:41:20.9389805Z .target sm_100a 2026-02-21T09:41:20.9389859Z .address_size 64 2026-02-21T09:41:20.9389863Z 2026-02-21T09:41:20.9389992Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:20.9390073Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:20.9390153Z // @_helion_matmul 2026-02-21T09:41:20.9390220Z .visible .entry _helion_matmul( 2026-02-21T09:41:20.9390333Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:20.9390425Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:20.9390572Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:20.9390673Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:20.9390769Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:20.9390823Z ) 2026-02-21T09:41:20.9390889Z .reqntid 256 2026-02-21T09:41:20.9390988Z .maxnreg 32 2026-02-21T09:41:20.9391040Z { 2026-02-21T09:41:20.9391104Z .reg .pred %p<138>; 2026-02-21T09:41:20.9391172Z .reg .b32 %r<547>; 2026-02-21T09:41:20.9391229Z .reg .b64 %rd<210>; 2026-02-21T09:41:20.9391412Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19:0 2026-02-21T09:41:20.9391494Z $L__func_begin0: 2026-02-21T09:41:20.9391659Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19:0 2026-02-21T09:41:20.9391663Z 
2026-02-21T09:41:20.9391717Z // %bb.0: 2026-02-21T09:41:20.9391810Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:41:20.9391866Z $L__tmp0: 2026-02-21T09:41:20.9392031Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19 2026-02-21T09:41:20.9392091Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:20.9392162Z shr.u32 %r2, %r1, 5; 2026-02-21T09:41:20.9392270Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:41:20.9392339Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:41:20.9392408Z @%p3 bra $L__BB0_16; 2026-02-21T09:41:20.9392467Z bra.uni $L__BB0_1; 2026-02-21T09:41:20.9392522Z $L__BB0_16: 2026-02-21T09:41:20.9392687Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:0 2026-02-21T09:41:20.9392778Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:41:20.9392857Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:41:20.9392933Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:41:20.9393104Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19 2026-02-21T09:41:20.9393187Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:41:20.9393253Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:41:20.9393325Z mov.b32 %r153, global_smem; 2026-02-21T09:41:20.9393384Z // begin inline asm 2026-02-21T09:41:20.9393535Z @%p29 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r153], 128; 2026-02-21T09:41:20.9393592Z // end inline asm 2026-02-21T09:41:20.9393658Z bar.sync 0, 128; 2026-02-21T09:41:20.9393762Z ld.shared.b32 %r518, [global_smem]; 2026-02-21T09:41:20.9393815Z bar.sync 0, 128; 2026-02-21T09:41:20.9393881Z // begin inline asm 2026-02-21T09:41:20.9394005Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:20.9394060Z // end inline asm 2026-02-21T09:41:20.9394239Z .loc 1 21 67 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:21:67 2026-02-21T09:41:20.9394300Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:41:20.9394358Z mov.u32 %r268, %ctaid.y; 
2026-02-21T09:41:20.9394416Z mov.u32 %r269, %ctaid.z; 2026-02-21T09:41:20.9394484Z mov.u32 %r270, %nctaid.x; 2026-02-21T09:41:20.9394542Z mov.u32 %r271, %nctaid.y; 2026-02-21T09:41:20.9394612Z mad.lo.s32 %r272, %r269, %r271, %r268; 2026-02-21T09:41:20.9394736Z mad.lo.s32 %r273, %r272, %r270, %r41; 2026-02-21T09:41:20.9394800Z mul.lo.s32 %r274, %r273, 384; 2026-02-21T09:41:20.9394861Z cvt.s64.s32 %rd78, %r274; 2026-02-21T09:41:20.9394923Z add.s64 %rd39, %rd7, %rd78; 2026-02-21T09:41:20.9394994Z shl.b32 %r275, %r1, 2; 2026-02-21T09:41:20.9395057Z add.s32 %r154, %r153, %r275; 2026-02-21T09:41:20.9395111Z mov.b32 %r546, 0; 2026-02-21T09:41:20.9395179Z // begin inline asm 2026-02-21T09:41:20.9395254Z @%p29 st.shared.b32 [ %r154 + 0 ], %r546; 2026-02-21T09:41:20.9395310Z // end inline asm 2026-02-21T09:41:20.9395373Z bar.warp.sync -1; 2026-02-21T09:41:20.9395446Z setp.eq.b32 %p122, %r1, 0; 2026-02-21T09:41:20.9395506Z cvt.u64.u32 %rd24, %r153; 2026-02-21T09:41:20.9395599Z // begin inline asm 2026-02-21T09:41:20.9395782Z @%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd24 + 0 ], %rd4; 2026-02-21T09:41:20.9395837Z // end inline asm 2026-02-21T09:41:20.9395895Z // begin inline asm 2026-02-21T09:41:20.9396048Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1; 2026-02-21T09:41:20.9396104Z // end inline asm 2026-02-21T09:41:20.9396186Z mov.b32 %r156, 32; 2026-02-21T09:41:20.9396243Z // begin inline asm 2026-02-21T09:41:20.9396411Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r156; 2026-02-21T09:41:20.9396467Z // end inline asm 2026-02-21T09:41:20.9396521Z mov.b32 %r157, 64; 2026-02-21T09:41:20.9396586Z // begin inline asm 2026-02-21T09:41:20.9396737Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r157; 2026-02-21T09:41:20.9396790Z // end inline asm 2026-02-21T09:41:20.9396856Z mov.b32 %r158, 1024; 2026-02-21T09:41:20.9397111Z Error: PTXASError: PTXAS error: 
Internal Triton PTX codegen error 2026-02-21T09:41:20.9397168Z `ptxas` stderr: 2026-02-21T09:41:20.9397505Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 269 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:20.9397622Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:20.9397629Z 2026-02-21T09:41:20.9398023Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpml261tm1.ptx -o /tmp/tmpml261tm1.ptx.o 2026-02-21T09:41:20.9398029Z 2026-02-21T09:41:20.9398185Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:20.9398253Z // begin inline asm 2026-02-21T09:41:20.9398422Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r158; 2026-02-21T09:41:20.9398483Z // end inline asm 2026-02-21T09:41:20.9398538Z // begin inline asm 2026-02-21T09:41:20.9398698Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r158; 2026-02-21T09:41:20.9398753Z // end inline asm 2026-02-21T09:41:20.9398817Z mov.b64 %rd32, 2048; 2026-02-21T09:41:20.9398870Z // begin inline asm 2026-02-21T09:41:20.9399045Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd24 + 0 ], 0x0, %rd32; 2026-02-21T09:41:20.9399110Z // end inline asm 2026-02-21T09:41:20.9399164Z mov.b32 %r160, 1; 2026-02-21T09:41:20.9399250Z // begin inline asm 2026-02-21T09:41:20.9399434Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r160; 2026-02-21T09:41:20.9399489Z // end inline asm 2026-02-21T09:41:20.9399545Z // begin inline asm 2026-02-21T09:41:20.9399718Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r160; 2026-02-21T09:41:20.9399781Z // end inline asm 2026-02-21T09:41:20.9399836Z // begin inline asm 2026-02-21T09:41:20.9399989Z @%p122 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x6; 2026-02-21T09:41:20.9400054Z // end inline asm 2026-02-21T09:41:20.9400108Z // begin inline asm 2026-02-21T09:41:20.9400275Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9400337Z // end inline asm 2026-02-21T09:41:20.9400396Z // begin inline asm 2026-02-21T09:41:20.9400551Z @%p122 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x2; 2026-02-21T09:41:20.9400616Z // end inline asm 2026-02-21T09:41:20.9400671Z // begin inline asm 2026-02-21T09:41:20.9400816Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9400870Z // end inline asm 2026-02-21T09:41:20.9400935Z // begin inline asm 2026-02-21T09:41:20.9401199Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd39 + 0 ], [ %rd24 + 0 ], 0x80; 2026-02-21T09:41:20.9401253Z // end inline asm 2026-02-21T09:41:20.9401344Z // begin inline asm 2026-02-21T09:41:20.9401470Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd39 + 0 ], 0x80; 2026-02-21T09:41:20.9401544Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.9401627Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.9401681Z // end inline asm 2026-02-21T09:41:20.9401737Z bar.sync 0, 128; 2026-02-21T09:41:20.9401931Z .loc 1 22 68 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:22:68 2026-02-21T09:41:20.9402007Z add.s32 %r276, %r274, 128; 2026-02-21T09:41:20.9402070Z cvt.s64.s32 %rd79, %r276; 2026-02-21T09:41:20.9402130Z add.s64 %rd57, %rd7, %rd79; 2026-02-21T09:41:20.9402195Z bar.sync 0, 128; 2026-02-21T09:41:20.9402252Z // begin inline asm 2026-02-21T09:41:20.9402322Z @%p29 st.shared.b32 [ %r154 + 0 ], %r546; 2026-02-21T09:41:20.9402378Z // end inline asm 2026-02-21T09:41:20.9402446Z bar.warp.sync -1; 2026-02-21T09:41:20.9402500Z // begin inline asm 2026-02-21T09:41:20.9402660Z 
@%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd24 + 0 ], %rd5; 2026-02-21T09:41:20.9402724Z // end inline asm 2026-02-21T09:41:20.9402777Z // begin inline asm 2026-02-21T09:41:20.9402916Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1; 2026-02-21T09:41:20.9402977Z // end inline asm 2026-02-21T09:41:20.9403057Z // begin inline asm 2026-02-21T09:41:20.9403215Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r156; 2026-02-21T09:41:20.9403270Z // end inline asm 2026-02-21T09:41:20.9403335Z mov.b32 %r165, 128; 2026-02-21T09:41:20.9403390Z // begin inline asm 2026-02-21T09:41:20.9403539Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r165; 2026-02-21T09:41:20.9403603Z // end inline asm 2026-02-21T09:41:20.9403657Z // begin inline asm 2026-02-21T09:41:20.9403817Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r158; 2026-02-21T09:41:20.9403880Z // end inline asm 2026-02-21T09:41:20.9403939Z mov.b32 %r167, 12288; 2026-02-21T09:41:20.9403994Z // begin inline asm 2026-02-21T09:41:20.9404161Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r167; 2026-02-21T09:41:20.9404216Z // end inline asm 2026-02-21T09:41:20.9404270Z // begin inline asm 2026-02-21T09:41:20.9404440Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd24 + 0 ], 0x0, %rd32; 2026-02-21T09:41:20.9404505Z // end inline asm 2026-02-21T09:41:20.9404583Z // begin inline asm 2026-02-21T09:41:20.9404790Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r160; 2026-02-21T09:41:20.9404855Z // end inline asm 2026-02-21T09:41:20.9404910Z // begin inline asm 2026-02-21T09:41:20.9405080Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r160; 2026-02-21T09:41:20.9405141Z // end inline asm 2026-02-21T09:41:20.9405195Z // begin inline asm 
2026-02-21T09:41:20.9405345Z @%p122 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x6; 2026-02-21T09:41:20.9405402Z // end inline asm 2026-02-21T09:41:20.9405467Z // begin inline asm 2026-02-21T09:41:20.9405632Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9405687Z // end inline asm 2026-02-21T09:41:20.9405752Z // begin inline asm 2026-02-21T09:41:20.9405907Z @%p122 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x2; 2026-02-21T09:41:20.9405963Z // end inline asm 2026-02-21T09:41:20.9406026Z // begin inline asm 2026-02-21T09:41:20.9406172Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9406226Z // end inline asm 2026-02-21T09:41:20.9406282Z // begin inline asm 2026-02-21T09:41:20.9406548Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd57 + 0 ], [ %rd24 + 0 ], 0x80; 2026-02-21T09:41:20.9406603Z // end inline asm 2026-02-21T09:41:20.9406685Z // begin inline asm 2026-02-21T09:41:20.9406821Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd57 + 0 ], 0x80; 2026-02-21T09:41:20.9406891Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.9406965Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.9407027Z // end inline asm 2026-02-21T09:41:20.9407081Z bar.sync 0, 128; 2026-02-21T09:41:20.9407291Z .loc 1 24 73 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:24:73 2026-02-21T09:41:20.9407363Z add.s32 %r277, %r274, 256; 2026-02-21T09:41:20.9407421Z cvt.s64.s32 %rd80, %r277; 2026-02-21T09:41:20.9407482Z add.s64 %rd75, %rd7, %rd80; 2026-02-21T09:41:20.9407534Z bar.sync 0, 128; 2026-02-21T09:41:20.9407599Z // begin inline asm 2026-02-21T09:41:20.9407668Z @%p29 st.shared.b32 [ %r154 + 0 ], %r546; 2026-02-21T09:41:20.9407721Z // end inline asm 2026-02-21T09:41:20.9407788Z bar.warp.sync -1; 2026-02-21T09:41:20.9407842Z // begin inline 
asm 2026-02-21T09:41:20.9408002Z @%p122 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd24 + 0 ], %rd6; 2026-02-21T09:41:20.9408058Z // end inline asm 2026-02-21T09:41:20.9408120Z // begin inline asm 2026-02-21T09:41:20.9408255Z @%p122 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1; 2026-02-21T09:41:20.9408311Z // end inline asm 2026-02-21T09:41:20.9408402Z // begin inline asm 2026-02-21T09:41:20.9408553Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r157; 2026-02-21T09:41:20.9408610Z // end inline asm 2026-02-21T09:41:20.9408673Z // begin inline asm 2026-02-21T09:41:20.9408820Z @%p122 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r157; 2026-02-21T09:41:20.9408875Z // end inline asm 2026-02-21T09:41:20.9408928Z // begin inline asm 2026-02-21T09:41:20.9409095Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r167; 2026-02-21T09:41:20.9409149Z // end inline asm 2026-02-21T09:41:20.9409204Z // begin inline asm 2026-02-21T09:41:20.9409375Z @%p122 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r158; 2026-02-21T09:41:20.9409429Z // end inline asm 2026-02-21T09:41:20.9409487Z mov.b64 %rd68, 24576; 2026-02-21T09:41:20.9409551Z // begin inline asm 2026-02-21T09:41:20.9409724Z @%p122 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd24 + 0 ], 0x0, %rd68; 2026-02-21T09:41:20.9409777Z // end inline asm 2026-02-21T09:41:20.9409869Z // begin inline asm 2026-02-21T09:41:20.9410041Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r160; 2026-02-21T09:41:20.9410094Z // end inline asm 2026-02-21T09:41:20.9410149Z // begin inline asm 2026-02-21T09:41:20.9410328Z @%p122 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r160; 2026-02-21T09:41:20.9410384Z // end inline asm 2026-02-21T09:41:20.9410438Z // begin inline asm 
2026-02-21T09:41:20.9410600Z @%p122 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x6; 2026-02-21T09:41:20.9410657Z // end inline asm 2026-02-21T09:41:20.9410712Z // begin inline asm 2026-02-21T09:41:20.9410889Z @%p122 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9410945Z // end inline asm 2026-02-21T09:41:20.9411000Z // begin inline asm 2026-02-21T09:41:20.9411152Z @%p122 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x3; 2026-02-21T09:41:20.9411216Z // end inline asm 2026-02-21T09:41:20.9411272Z // begin inline asm 2026-02-21T09:41:20.9411415Z @%p122 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:41:20.9411479Z // end inline asm 2026-02-21T09:41:20.9411534Z // begin inline asm 2026-02-21T09:41:20.9411797Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd75 + 0 ], [ %rd24 + 0 ], 0x80; 2026-02-21T09:41:20.9411861Z // end inline asm 2026-02-21T09:41:20.9411915Z // begin inline asm 2026-02-21T09:41:20.9412060Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd75 + 0 ], 0x80; 2026-02-21T09:41:20.9412137Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:41:20.9412211Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:20.9412265Z // end inline asm 2026-02-21T09:41:20.9412319Z bar.sync 0, 128; 2026-02-21T09:41:20.9412416Z cvta.global.u64 %rd81, %rd75; 2026-02-21T09:41:20.9412597Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9412660Z sub.s32 %r278, 1536, %r41; 2026-02-21T09:41:20.9412734Z mul.hi.s32 %r279, %r278, -580400985; 2026-02-21T09:41:20.9412793Z add.s32 %r280, %r279, %r278; 2026-02-21T09:41:20.9412851Z shr.u32 %r281, %r280, 31; 2026-02-21T09:41:20.9412907Z shr.s32 %r282, %r280, 12; 2026-02-21T09:41:20.9412973Z add.s32 %r283, %r282, %r281; 2026-02-21T09:41:20.9413035Z mul.lo.s32 %r284, %r283, 4736; 
2026-02-21T09:41:20.9413099Z setp.ne.b32 %p110, %r278, %r284; 2026-02-21T09:41:20.9413173Z setp.lt.u32 %p111, %r41, 1537; 2026-02-21T09:41:20.9413238Z and.pred %p112, %p111, %p110; 2026-02-21T09:41:20.9413296Z selp.b32 %r285, 1, 0, %p112; 2026-02-21T09:41:20.9413361Z add.s32 %r286, %r283, %r285; 2026-02-21T09:41:20.9413421Z shl.b32 %r42, %r286, 5; 2026-02-21T09:41:20.9413610Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9413683Z shfl.sync.idx.b32 %r43, %r2, 0, 31, -1; 2026-02-21T09:41:20.9413753Z shl.b32 %r287, %r43, 21; 2026-02-21T09:41:20.9413812Z and.b32 %r288, %r287, 6291456; 2026-02-21T09:41:20.9413868Z add.s32 %r178, %r288, %r518; 2026-02-21T09:41:20.9413936Z mov.pred %p85, -1; 2026-02-21T09:41:20.9413992Z // begin inline asm 2026-02-21T09:41:20.9414288Z @%p85 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r178 + 0], 64, {%r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546}; 2026-02-21T09:41:20.9414353Z // end inline asm 2026-02-21T09:41:20.9414410Z // begin inline asm 2026-02-21T09:41:20.9414717Z @%p85 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r178 + 16], 64, {%r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546}; 2026-02-21T09:41:20.9414774Z // end inline asm 2026-02-21T09:41:20.9414840Z // begin inline asm 2026-02-21T09:41:20.9415128Z @%p85 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r178 + 32], 64, {%r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546}; 2026-02-21T09:41:20.9415217Z // end inline asm 2026-02-21T09:41:20.9415270Z // begin inline asm 2026-02-21T09:41:20.9415544Z @%p85 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r178 + 48], 64, {%r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546, %r546}; 2026-02-21T09:41:20.9415605Z // end inline asm 2026-02-21T09:41:20.9415658Z // begin 
inline asm 2026-02-21T09:41:20.9415727Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:20.9415781Z // end inline asm 2026-02-21T09:41:20.9415846Z bar.sync 0, 128; 2026-02-21T09:41:20.9416025Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9416084Z add.s32 %r246, %r153, 90112; 2026-02-21T09:41:20.9416150Z // begin inline asm 2026-02-21T09:41:20.9416241Z @%p122 mbarrier.init.shared::cta.b64 [%r246], 1; 2026-02-21T09:41:20.9416298Z // end inline asm 2026-02-21T09:41:20.9416363Z bar.sync 0, 128; 2026-02-21T09:41:20.9416423Z add.s32 %r247, %r153, 90120; 2026-02-21T09:41:20.9416481Z // begin inline asm 2026-02-21T09:41:20.9416568Z @%p122 mbarrier.init.shared::cta.b64 [%r247], 1; 2026-02-21T09:41:20.9416635Z // end inline asm 2026-02-21T09:41:20.9416688Z bar.sync 0, 128; 2026-02-21T09:41:20.9416743Z add.s32 %r248, %r153, 90128; 2026-02-21T09:41:20.9416801Z // begin inline asm 2026-02-21T09:41:20.9416881Z @%p122 mbarrier.init.shared::cta.b64 [%r248], 1; 2026-02-21T09:41:20.9416959Z // end inline asm 2026-02-21T09:41:20.9417012Z bar.sync 0, 128; 2026-02-21T09:41:20.9417073Z add.s32 %r249, %r153, 90136; 2026-02-21T09:41:20.9417126Z // begin inline asm 2026-02-21T09:41:20.9417203Z @%p122 mbarrier.init.shared::cta.b64 [%r249], 1; 2026-02-21T09:41:20.9417264Z // end inline asm 2026-02-21T09:41:20.9417319Z bar.sync 0, 128; 2026-02-21T09:41:20.9417401Z add.s32 %r250, %r153, 90144; 2026-02-21T09:41:20.9417464Z // begin inline asm 2026-02-21T09:41:20.9417543Z @%p122 mbarrier.init.shared::cta.b64 [%r250], 1; 2026-02-21T09:41:20.9417596Z // end inline asm 2026-02-21T09:41:20.9417646Z bar.sync 0, 128; 2026-02-21T09:41:20.9417709Z add.s32 %r251, %r153, 90152; 2026-02-21T09:41:20.9417764Z // begin inline asm 2026-02-21T09:41:20.9417841Z @%p122 mbarrier.init.shared::cta.b64 [%r251], 1; 2026-02-21T09:41:20.9417900Z // end inline asm 2026-02-21T09:41:20.9417955Z add.s32 %r252, %r153, 90160; 2026-02-21T09:41:20.9418007Z // begin inline 
asm 2026-02-21T09:41:20.9418086Z @%p122 mbarrier.init.shared::cta.b64 [%r252], 1; 2026-02-21T09:41:20.9418146Z // end inline asm 2026-02-21T09:41:20.9418202Z bar.sync 0, 128; 2026-02-21T09:41:20.9418258Z add.s32 %r253, %r153, 90168; 2026-02-21T09:41:20.9418321Z // begin inline asm 2026-02-21T09:41:20.9418400Z @%p122 mbarrier.init.shared::cta.b64 [%r253], 1; 2026-02-21T09:41:20.9418482Z // end inline asm 2026-02-21T09:41:20.9418547Z bar.sync 0, 128; 2026-02-21T09:41:20.9418606Z add.s32 %r254, %r153, 90176; 2026-02-21T09:41:20.9418663Z // begin inline asm 2026-02-21T09:41:20.9418743Z @%p122 mbarrier.init.shared::cta.b64 [%r254], 1; 2026-02-21T09:41:20.9418807Z // end inline asm 2026-02-21T09:41:20.9418862Z bar.sync 0, 128; 2026-02-21T09:41:20.9418920Z add.s32 %r255, %r153, 90184; 2026-02-21T09:41:20.9418982Z // begin inline asm 2026-02-21T09:41:20.9419064Z @%p122 mbarrier.init.shared::cta.b64 [%r255], 1; 2026-02-21T09:41:20.9419118Z // end inline asm 2026-02-21T09:41:20.9419172Z bar.sync 0, 128; 2026-02-21T09:41:20.9419239Z add.s32 %r256, %r153, 90192; 2026-02-21T09:41:20.9419295Z // begin inline asm 2026-02-21T09:41:20.9419375Z @%p122 mbarrier.init.shared::cta.b64 [%r256], 1; 2026-02-21T09:41:20.9419437Z // end inline asm 2026-02-21T09:41:20.9419492Z bar.sync 0, 128; 2026-02-21T09:41:20.9419550Z add.s32 %r257, %r153, 90200; 2026-02-21T09:41:20.9419606Z // begin inline asm 2026-02-21T09:41:20.9419695Z @%p122 mbarrier.init.shared::cta.b64 [%r257], 1; 2026-02-21T09:41:20.9419787Z // end inline asm 2026-02-21T09:41:20.9419967Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9420023Z bar.sync 0, 128; 2026-02-21T09:41:20.9420079Z // begin inline asm 2026-02-21T09:41:20.9420178Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r246]; 2026-02-21T09:41:20.9420233Z // end inline asm 2026-02-21T09:41:20.9420287Z bar.sync 0, 128; 2026-02-21T09:41:20.9420342Z // begin inline asm 2026-02-21T09:41:20.9420440Z @%p122 
mbarrier.arrive.shared::cta.b64 _, [%r247]; 2026-02-21T09:41:20.9420496Z // end inline asm 2026-02-21T09:41:20.9420550Z bar.sync 0, 128; 2026-02-21T09:41:20.9420612Z // begin inline asm 2026-02-21T09:41:20.9420698Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r248]; 2026-02-21T09:41:20.9420752Z // end inline asm 2026-02-21T09:41:20.9420805Z bar.sync 0, 128; 2026-02-21T09:41:20.9420869Z // begin inline asm 2026-02-21T09:41:20.9420953Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r249]; 2026-02-21T09:41:20.9421009Z // end inline asm 2026-02-21T09:41:20.9421071Z bar.sync 0, 128; 2026-02-21T09:41:20.9421127Z // begin inline asm 2026-02-21T09:41:20.9421211Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r250]; 2026-02-21T09:41:20.9421271Z // end inline asm 2026-02-21T09:41:20.9421326Z bar.sync 0, 128; 2026-02-21T09:41:20.9421381Z // begin inline asm 2026-02-21T09:41:20.9421464Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r251]; 2026-02-21T09:41:20.9421527Z // end inline asm 2026-02-21T09:41:20.9421705Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9421796Z bar.sync 0, 128; 2026-02-21T09:41:20.9421862Z add.s32 %r264, %r153, 90208; 2026-02-21T09:41:20.9421919Z // begin inline asm 2026-02-21T09:41:20.9422002Z @%p122 mbarrier.init.shared::cta.b64 [%r264], 1; 2026-02-21T09:41:20.9422057Z // end inline asm 2026-02-21T09:41:20.9422148Z add.s32 %r502, %r153, 90224; 2026-02-21T09:41:20.9422207Z // begin inline asm 2026-02-21T09:41:20.9422290Z @%p122 mbarrier.init.shared::cta.b64 [%r502], 1; 2026-02-21T09:41:20.9422353Z // end inline asm 2026-02-21T09:41:20.9422522Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9422577Z bar.sync 0, 128; 2026-02-21T09:41:20.9422640Z // begin inline asm 2026-02-21T09:41:20.9422722Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r502]; 2026-02-21T09:41:20.9422777Z // end inline asm 2026-02-21T09:41:20.9422950Z .loc 1 30 108 // 
cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9423042Z st.shared.b32 [global_smem+90232], 33554689; 2026-02-21T09:41:20.9423119Z st.shared.b32 [global_smem+73728], %r518; 2026-02-21T09:41:20.9423192Z st.shared.b32 [global_smem+73736], %r42; 2026-02-21T09:41:20.9423257Z barrier.sync 1; 2026-02-21T09:41:20.9423362Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:41:20.9423420Z barrier.sync 1; 2026-02-21T09:41:20.9423490Z setp.lt.s32 %p113, %r286, 1; 2026-02-21T09:41:20.9423553Z @%p113 bra $L__BB0_23; 2026-02-21T09:41:20.9423636Z // %bb.17: // %.lr.ph11 2026-02-21T09:41:20.9423697Z add.s32 %r543, %r41, -4736; 2026-02-21T09:41:20.9423762Z shl.b32 %r291, %r1, 7; 2026-02-21T09:41:20.9423820Z and.b32 %r292, %r291, 1920; 2026-02-21T09:41:20.9423876Z shl.b32 %r293, %r1, 6; 2026-02-21T09:41:20.9423941Z and.b32 %r294, %r293, 6144; 2026-02-21T09:41:20.9423996Z shl.b32 %r295, %r1, 4; 2026-02-21T09:41:20.9424055Z and.b32 %r296, %r295, 112; 2026-02-21T09:41:20.9424113Z shl.b32 %r297, %r1, 9; 2026-02-21T09:41:20.9424177Z and.b32 %r298, %r297, 8192; 2026-02-21T09:41:20.9424234Z or.b32 %r299, %r292, %r294; 2026-02-21T09:41:20.9424291Z or.b32 %r300, %r299, %r298; 2026-02-21T09:41:20.9424354Z or.b32 %r301, %r300, %r296; 2026-02-21T09:41:20.9424413Z add.s32 %r303, %r153, 73728; 2026-02-21T09:41:20.9424472Z add.s32 %r46, %r303, %r301; 2026-02-21T09:41:20.9424530Z xor.b32 %r304, %r301, 16; 2026-02-21T09:41:20.9424622Z add.s32 %r47, %r303, %r304; 2026-02-21T09:41:20.9424713Z xor.b32 %r305, %r301, 32; 2026-02-21T09:41:20.9424773Z add.s32 %r48, %r303, %r305; 2026-02-21T09:41:20.9424838Z xor.b32 %r306, %r301, 48; 2026-02-21T09:41:20.9424894Z add.s32 %r49, %r303, %r306; 2026-02-21T09:41:20.9424951Z xor.b32 %r307, %r301, 64; 2026-02-21T09:41:20.9425007Z add.s32 %r50, %r303, %r307; 2026-02-21T09:41:20.9425071Z xor.b32 %r308, %r301, 80; 2026-02-21T09:41:20.9425128Z add.s32 %r51, %r303, %r308; 2026-02-21T09:41:20.9425188Z xor.b32 %r309, %r301, 96; 
2026-02-21T09:41:20.9425252Z add.s32 %r52, %r303, %r309; 2026-02-21T09:41:20.9425312Z xor.b32 %r310, %r301, 112; 2026-02-21T09:41:20.9425369Z add.s32 %r53, %r303, %r310; 2026-02-21T09:41:20.9425428Z and.b32 %r311, %r43, 1; 2026-02-21T09:41:20.9425491Z shl.b32 %r312, %r311, 13; 2026-02-21T09:41:20.9425551Z add.s32 %r388, %r303, %r312; 2026-02-21T09:41:20.9425611Z shl.b32 %r55, %r311, 6; 2026-02-21T09:41:20.9425677Z max.s32 %r536, %r42, 1; 2026-02-21T09:41:20.9425734Z mov.b32 %r541, -1; 2026-02-21T09:41:20.9425791Z mov.b32 %r544, %r546; 2026-02-21T09:41:20.9425857Z mov.b32 %r545, %r546; 2026-02-21T09:41:20.9425915Z bra.uni $L__BB0_18; 2026-02-21T09:41:20.9426025Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:41:20.9426203Z .loc 1 0 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:108 2026-02-21T09:41:20.9426274Z setp.lt.u32 %p119, %r1, 64; 2026-02-21T09:41:20.9426502Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9426556Z bar.sync 0, 128; 2026-02-21T09:41:20.9426619Z // begin inline asm 2026-02-21T09:41:20.9426670Z 2026-02-21T09:41:20.9426721Z { 2026-02-21T09:41:20.9426781Z .reg .pred complete; 2026-02-21T09:41:20.9426846Z waitLoop: 2026-02-21T09:41:20.9426995Z mbarrier.try_wait.parity.shared.b64 complete, [%r264], %r546; 2026-02-21T09:41:20.9427062Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.9427119Z } 2026-02-21T09:41:20.9427123Z 2026-02-21T09:41:20.9427177Z // end inline asm 2026-02-21T09:41:20.9427231Z // begin inline asm 2026-02-21T09:41:20.9427514Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332}, [%r178 + 0], 64; 2026-02-21T09:41:20.9427567Z // end inline asm 2026-02-21T09:41:20.9427621Z // begin inline asm 2026-02-21T09:41:20.9427895Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346, 
%r347, %r348, %r349}, [%r178 + 16], 64; 2026-02-21T09:41:20.9427964Z // end inline asm 2026-02-21T09:41:20.9428022Z // begin inline asm 2026-02-21T09:41:20.9428325Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363, %r364, %r365, %r366}, [%r178 + 32], 64; 2026-02-21T09:41:20.9428388Z // end inline asm 2026-02-21T09:41:20.9428443Z // begin inline asm 2026-02-21T09:41:20.9428709Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r368, %r369, %r370, %r371, %r372, %r373, %r374, %r375, %r376, %r377, %r378, %r379, %r380, %r381, %r382, %r383}, [%r178 + 48], 64; 2026-02-21T09:41:20.9428768Z // end inline asm 2026-02-21T09:41:20.9428820Z // begin inline asm 2026-02-21T09:41:20.9428887Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:20.9428947Z // end inline asm 2026-02-21T09:41:20.9429000Z bar.sync 0, 128; 2026-02-21T09:41:20.9429056Z // begin inline asm 2026-02-21T09:41:20.9429142Z @%p122 mbarrier.arrive.shared::cta.b64 _, [%r502]; 2026-02-21T09:41:20.9429201Z // end inline asm 2026-02-21T09:41:20.9429257Z cvt.u64.u32 %rd82, %r317; 2026-02-21T09:41:20.9429313Z cvt.u64.u32 %rd83, %r318; 2026-02-21T09:41:20.9429375Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:20.9429432Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:20.9429598Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9429690Z mov.b64 {%r391, %r392}, %rd85; 2026-02-21T09:41:20.9429765Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T09:41:20.9429931Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9429987Z cvt.u64.u32 %rd86, %r319; 2026-02-21T09:41:20.9430049Z cvt.u64.u32 %rd87, %r320; 2026-02-21T09:41:20.9430105Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:20.9430159Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:20.9430330Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9430389Z mov.b64 {%r394, 
%r395}, %rd89; 2026-02-21T09:41:20.9430456Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T09:41:20.9430616Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9430679Z cvt.u64.u32 %rd90, %r321; 2026-02-21T09:41:20.9430734Z cvt.u64.u32 %rd91, %r322; 2026-02-21T09:41:20.9430791Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:20.9430854Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:20.9431011Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9431069Z mov.b64 {%r397, %r398}, %rd93; 2026-02-21T09:41:20.9431137Z cvt.rn.f16x2.f32 %r399, %r398, %r397; 2026-02-21T09:41:20.9431301Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9431378Z cvt.u64.u32 %rd94, %r323; 2026-02-21T09:41:20.9431433Z cvt.u64.u32 %rd95, %r324; 2026-02-21T09:41:20.9431495Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:20.9431550Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:20.9431708Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9431795Z mov.b64 {%r400, %r401}, %rd97; 2026-02-21T09:41:20.9431857Z cvt.rn.f16x2.f32 %r402, %r401, %r400; 2026-02-21T09:41:20.9432023Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9432082Z cvt.u64.u32 %rd98, %r325; 2026-02-21T09:41:20.9432136Z cvt.u64.u32 %rd99, %r326; 2026-02-21T09:41:20.9432193Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:20.9432251Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:41:20.9432420Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9432480Z mov.b64 {%r403, %r404}, %rd101; 2026-02-21T09:41:20.9432542Z cvt.rn.f16x2.f32 %r405, %r404, %r403; 2026-02-21T09:41:20.9432710Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9432767Z cvt.u64.u32 %rd102, %r327; 2026-02-21T09:41:20.9432823Z cvt.u64.u32 %rd103, %r328; 
2026-02-21T09:41:20.9432907Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:20.9432966Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:41:20.9433124Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9433183Z mov.b64 {%r406, %r407}, %rd105; 2026-02-21T09:41:20.9433251Z cvt.rn.f16x2.f32 %r408, %r407, %r406; 2026-02-21T09:41:20.9433409Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9433465Z cvt.u64.u32 %rd106, %r329; 2026-02-21T09:41:20.9433528Z cvt.u64.u32 %rd107, %r330; 2026-02-21T09:41:20.9433583Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:41:20.9433643Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:20.9433806Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9433863Z mov.b64 {%r409, %r410}, %rd109; 2026-02-21T09:41:20.9433923Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:41:20.9434085Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9434169Z cvt.u64.u32 %rd110, %r331; 2026-02-21T09:41:20.9434228Z cvt.u64.u32 %rd111, %r332; 2026-02-21T09:41:20.9434284Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:20.9434349Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:20.9434511Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9434568Z mov.b64 {%r412, %r413}, %rd113; 2026-02-21T09:41:20.9434637Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:41:20.9434831Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9434890Z cvt.u64.u32 %rd114, %r334; 2026-02-21T09:41:20.9434944Z cvt.u64.u32 %rd115, %r335; 2026-02-21T09:41:20.9435008Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:20.9435065Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:20.9435221Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9435287Z mov.b64 {%r415, %r416}, %rd117; 
2026-02-21T09:41:20.9435350Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:41:20.9435505Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9435570Z cvt.u64.u32 %rd118, %r336; 2026-02-21T09:41:20.9435628Z cvt.u64.u32 %rd119, %r337; 2026-02-21T09:41:20.9435686Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:20.9435744Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:20.9435917Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9436001Z mov.b64 {%r418, %r419}, %rd121; 2026-02-21T09:41:20.9436062Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:41:20.9436235Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9436296Z cvt.u64.u32 %rd122, %r338; 2026-02-21T09:41:20.9436380Z cvt.u64.u32 %rd123, %r339; 2026-02-21T09:41:20.9436451Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:20.9436511Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:20.9436674Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9436732Z mov.b64 {%r421, %r422}, %rd125; 2026-02-21T09:41:20.9436799Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:41:20.9436961Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9437016Z cvt.u64.u32 %rd126, %r340; 2026-02-21T09:41:20.9437080Z cvt.u64.u32 %rd127, %r341; 2026-02-21T09:41:20.9437136Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:20.9437192Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:41:20.9437361Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9437418Z mov.b64 {%r424, %r425}, %rd129; 2026-02-21T09:41:20.9437505Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:41:20.9437668Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9437732Z cvt.u64.u32 %rd130, %r342; 2026-02-21T09:41:20.9437787Z cvt.u64.u32 %rd131, %r343; 
2026-02-21T09:41:20.9437843Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:20.9437907Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:41:20.9438066Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9438124Z mov.b64 {%r427, %r428}, %rd133; 2026-02-21T09:41:20.9438192Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:41:20.9438353Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9438409Z cvt.u64.u32 %rd134, %r344; 2026-02-21T09:41:20.9438465Z cvt.u64.u32 %rd135, %r345; 2026-02-21T09:41:20.9438529Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:41:20.9438588Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:20.9438749Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9438842Z mov.b64 {%r430, %r431}, %rd137; 2026-02-21T09:41:20.9438903Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:41:20.9439064Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9439126Z cvt.u64.u32 %rd138, %r346; 2026-02-21T09:41:20.9439182Z cvt.u64.u32 %rd139, %r347; 2026-02-21T09:41:20.9439238Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:20.9439294Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:20.9439468Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9439525Z mov.b64 {%r433, %r434}, %rd141; 2026-02-21T09:41:20.9439584Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:41:20.9439753Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9439809Z cvt.u64.u32 %rd142, %r348; 2026-02-21T09:41:20.9439866Z cvt.u64.u32 %rd143, %r349; 2026-02-21T09:41:20.9439928Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:41:20.9439983Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:41:20.9440143Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9440199Z mov.b64 {%r436, %r437}, %rd145; 
2026-02-21T09:41:20.9440264Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:41:20.9440424Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9440502Z cvt.u64.u32 %rd146, %r351; 2026-02-21T09:41:20.9440565Z cvt.u64.u32 %rd147, %r352; 2026-02-21T09:41:20.9440621Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:41:20.9440678Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:41:20.9440843Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9440921Z mov.b64 {%r439, %r440}, %rd149; 2026-02-21T09:41:20.9440985Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:41:20.9441143Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9441206Z cvt.u64.u32 %rd150, %r353; 2026-02-21T09:41:20.9441261Z cvt.u64.u32 %rd151, %r354; 2026-02-21T09:41:20.9441316Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:41:20.9441378Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:41:20.9441538Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9441597Z mov.b64 {%r442, %r443}, %rd153; 2026-02-21T09:41:20.9441664Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:41:20.9441827Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9441890Z cvt.u64.u32 %rd154, %r355; 2026-02-21T09:41:20.9441991Z cvt.u64.u32 %rd155, %r356; 2026-02-21T09:41:20.9442082Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:41:20.9442151Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:41:20.9442372Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9442449Z mov.b64 {%r445, %r446}, %rd157; 2026-02-21T09:41:20.9442523Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:41:20.9442729Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9442804Z cvt.u64.u32 %rd158, %r357; 2026-02-21T09:41:20.9442872Z cvt.u64.u32 %rd159, %r358; 
2026-02-21T09:41:20.9442931Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:41:20.9443018Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:41:20.9443234Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9443299Z mov.b64 {%r448, %r449}, %rd161; 2026-02-21T09:41:20.9443383Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:41:20.9443598Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9443718Z cvt.u64.u32 %rd162, %r359; 2026-02-21T09:41:20.9443796Z cvt.u64.u32 %rd163, %r360; 2026-02-21T09:41:20.9443870Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:41:20.9443937Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:41:20.9444145Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9444214Z mov.b64 {%r451, %r452}, %rd165; 2026-02-21T09:41:20.9444296Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:41:20.9444505Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9444573Z cvt.u64.u32 %rd166, %r361; 2026-02-21T09:41:20.9444651Z cvt.u64.u32 %rd167, %r362; 2026-02-21T09:41:20.9444750Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:41:20.9444844Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:41:20.9445061Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9445124Z mov.b64 {%r454, %r455}, %rd169; 2026-02-21T09:41:20.9445223Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:41:20.9445422Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9445499Z cvt.u64.u32 %rd170, %r363; 2026-02-21T09:41:20.9445566Z cvt.u64.u32 %rd171, %r364; 2026-02-21T09:41:20.9445635Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:41:20.9445711Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:41:20.9445911Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9446011Z mov.b64 {%r457, %r458}, %rd173; 
2026-02-21T09:41:20.9446092Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:41:20.9446287Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9446405Z cvt.u64.u32 %rd174, %r365; 2026-02-21T09:41:20.9446475Z cvt.u64.u32 %rd175, %r366; 2026-02-21T09:41:20.9446552Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:41:20.9446620Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:41:20.9446828Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9446906Z mov.b64 {%r460, %r461}, %rd177; 2026-02-21T09:41:20.9446979Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:41:20.9447167Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9447244Z cvt.u64.u32 %rd178, %r368; 2026-02-21T09:41:20.9447315Z cvt.u64.u32 %rd179, %r369; 2026-02-21T09:41:20.9447372Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:41:20.9447427Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:41:20.9447596Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9447679Z mov.b64 {%r463, %r464}, %rd181; 2026-02-21T09:41:20.9447743Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:41:20.9447911Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9447966Z cvt.u64.u32 %rd182, %r370; 2026-02-21T09:41:20.9448021Z cvt.u64.u32 %rd183, %r371; 2026-02-21T09:41:20.9448086Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:41:20.9448141Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:41:20.9448306Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9448362Z mov.b64 {%r466, %r467}, %rd185; 2026-02-21T09:41:20.9448433Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T09:41:20.9448595Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9448650Z cvt.u64.u32 %rd186, %r372; 2026-02-21T09:41:20.9448714Z cvt.u64.u32 %rd187, %r373; 
2026-02-21T09:41:20.9448771Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:41:20.9448828Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:41:20.9448999Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9449084Z mov.b64 {%r469, %r470}, %rd189; 2026-02-21T09:41:20.9449144Z cvt.rn.f16x2.f32 %r471, %r470, %r469; 2026-02-21T09:41:20.9449305Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9449370Z cvt.u64.u32 %rd190, %r374; 2026-02-21T09:41:20.9449426Z cvt.u64.u32 %rd191, %r375; 2026-02-21T09:41:20.9449481Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:41:20.9449546Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:41:20.9449707Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9449762Z mov.b64 {%r472, %r473}, %rd193; 2026-02-21T09:41:20.9449830Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T09:41:20.9449994Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9450049Z cvt.u64.u32 %rd194, %r376; 2026-02-21T09:41:20.9450106Z cvt.u64.u32 %rd195, %r377; 2026-02-21T09:41:20.9450168Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:41:20.9450224Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:41:20.9450384Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9450447Z mov.b64 {%r475, %r476}, %rd197; 2026-02-21T09:41:20.9450508Z cvt.rn.f16x2.f32 %r477, %r476, %r475; 2026-02-21T09:41:20.9450670Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9450753Z cvt.u64.u32 %rd198, %r378; 2026-02-21T09:41:20.9450808Z cvt.u64.u32 %rd199, %r379; 2026-02-21T09:41:20.9450864Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:41:20.9450919Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:41:20.9451119Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9451177Z mov.b64 {%r478, %r479}, %rd201; 
2026-02-21T09:41:20.9451238Z cvt.rn.f16x2.f32 %r480, %r479, %r478; 2026-02-21T09:41:20.9451406Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9451463Z cvt.u64.u32 %rd202, %r380; 2026-02-21T09:41:20.9451518Z cvt.u64.u32 %rd203, %r381; 2026-02-21T09:41:20.9451580Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:41:20.9451636Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:41:20.9451798Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9451857Z mov.b64 {%r481, %r482}, %rd205; 2026-02-21T09:41:20.9451923Z cvt.rn.f16x2.f32 %r483, %r482, %r481; 2026-02-21T09:41:20.9452081Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9452136Z cvt.u64.u32 %rd206, %r382; 2026-02-21T09:41:20.9452243Z cvt.u64.u32 %rd207, %r383; 2026-02-21T09:41:20.9452304Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:41:20.9452361Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:41:20.9452530Z .loc 1 55 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:55:27 2026-02-21T09:41:20.9452587Z mov.b64 {%r484, %r485}, %rd209; 2026-02-21T09:41:20.9452647Z cvt.rn.f16x2.f32 %r486, %r485, %r484; 2026-02-21T09:41:20.9452809Z .loc 1 56 45 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:56:45 2026-02-21T09:41:20.9452887Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:41:20.9452942Z bar.sync 0, 128; 2026-02-21T09:41:20.9453039Z st.shared.v4.b32 [%r46], {%r393, %r396, %r399, %r402}; 2026-02-21T09:41:20.9453131Z st.shared.v4.b32 [%r47], {%r405, %r408, %r411, %r414}; 2026-02-21T09:41:20.9453213Z st.shared.v4.b32 [%r48], {%r417, %r420, %r423, %r426}; 2026-02-21T09:41:20.9453294Z st.shared.v4.b32 [%r49], {%r429, %r432, %r435, %r438}; 2026-02-21T09:41:20.9453384Z st.shared.v4.b32 [%r50], {%r441, %r444, %r447, %r450}; 2026-02-21T09:41:20.9453463Z st.shared.v4.b32 [%r51], {%r453, %r456, %r459, %r462}; 2026-02-21T09:41:20.9453571Z st.shared.v4.b32 [%r52], {%r465, %r468, 
%r471, %r474}; 2026-02-21T09:41:20.9453651Z st.shared.v4.b32 [%r53], {%r477, %r480, %r483, %r486}; 2026-02-21T09:41:20.9453717Z // begin inline asm 2026-02-21T09:41:20.9453792Z fence.proxy.async.shared::cta; 2026-02-21T09:41:20.9453845Z // end inline asm 2026-02-21T09:41:20.9453907Z bar.sync 0, 128; 2026-02-21T09:41:20.9453972Z elect.sync %r487|%p120, -1; 2026-02-21T09:41:20.9454036Z and.pred %p118, %p119, %p120; 2026-02-21T09:41:20.9454100Z add.s32 %r386, %r545, %r55; 2026-02-21T09:41:20.9454163Z // begin inline asm 2026-02-21T09:41:20.9454343Z @%p118 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd81, {%r386, %r544}], [%r388]; 2026-02-21T09:41:20.9454396Z // end inline asm 2026-02-21T09:41:20.9454468Z cp.async.bulk.commit_group; 2026-02-21T09:41:20.9454522Z mov.b32 %r542, 1; 2026-02-21T09:41:20.9454624Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:41:20.9454827Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9454886Z xor.b32 %r546, %r542, %r546; 2026-02-21T09:41:20.9455048Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9455105Z add.s32 %r536, %r536, -1; 2026-02-21T09:41:20.9455171Z setp.ne.b32 %p121, %r536, 0; 2026-02-21T09:41:20.9455229Z @%p121 bra $L__BB0_18; 2026-02-21T09:41:20.9455284Z bra.uni $L__BB0_23; 2026-02-21T09:41:20.9455426Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:20.9455484Z add.s32 %r314, %r541, 1; 2026-02-21T09:41:20.9455543Z setp.eq.b32 %p114, %r541, 31; 2026-02-21T09:41:20.9455614Z selp.b32 %r541, 0, %r314, %p114; 2026-02-21T09:41:20.9455674Z setp.eq.b32 %p115, %r541, 31; 2026-02-21T09:41:20.9455756Z @%p115 bra $L__BB0_21; 2026-02-21T09:41:20.9455853Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:41:20.9456028Z .loc 1 0 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:108 2026-02-21T09:41:20.9456082Z mov.b32 %r542, 0; 2026-02-21T09:41:20.9456249Z .loc 1 30 108 // 
cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9456316Z setp.ne.b32 %p116, %r541, 0; 2026-02-21T09:41:20.9456371Z @%p116 bra $L__BB0_22; 2026-02-21T09:41:20.9456446Z // %bb.20: // %.thread 2026-02-21T09:41:20.9456543Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:41:20.9456600Z add.s32 %r543, %r543, 4736; 2026-02-21T09:41:20.9456762Z .loc 1 36 35 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:36:35 2026-02-21T09:41:20.9456845Z shr.s32 %r489, %r543, 31; 2026-02-21T09:41:20.9456913Z shr.u32 %r490, %r489, 22; 2026-02-21T09:41:20.9456970Z add.s32 %r491, %r543, %r490; 2026-02-21T09:41:20.9457026Z shr.s32 %r492, %r491, 10; 2026-02-21T09:41:20.9457200Z .loc 1 37 33 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:37:33 2026-02-21T09:41:20.9457256Z shl.b32 %r493, %r492, 6; 2026-02-21T09:41:20.9457418Z .loc 1 38 39 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:38:39 2026-02-21T09:41:20.9457480Z sub.s32 %r494, 96, %r493; 2026-02-21T09:41:20.9457641Z .loc 1 38 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:38:52 2026-02-21T09:41:20.9457697Z min.s32 %r495, %r494, 64; 2026-02-21T09:41:20.9457855Z .loc 1 39 45 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:45 2026-02-21T09:41:20.9457920Z and.b32 %r496, %r491, -1024; 2026-02-21T09:41:20.9457974Z sub.s32 %r497, %r543, %r496; 2026-02-21T09:41:20.9458139Z .loc 1 40 51 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:40:51 2026-02-21T09:41:20.9458227Z div.s32 %r498, %r497, %r495; 2026-02-21T09:41:20.9458387Z .loc 1 39 64 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:64 2026-02-21T09:41:20.9458445Z mul.lo.s32 %r499, %r498, %r495; 2026-02-21T09:41:20.9458507Z sub.s32 %r500, %r497, %r499; 2026-02-21T09:41:20.9458667Z .loc 1 39 30 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:30 2026-02-21T09:41:20.9458721Z add.s32 %r501, %r500, %r493; 2026-02-21T09:41:20.9458890Z .loc 1 41 27 // 
cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:41:27 2026-02-21T09:41:20.9458949Z shl.b32 %r545, %r501, 7; 2026-02-21T09:41:20.9459107Z .loc 1 42 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:42:27 2026-02-21T09:41:20.9459163Z shl.b32 %r544, %r498, 6; 2026-02-21T09:41:20.9459342Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9459398Z bra.uni $L__BB0_22; 2026-02-21T09:41:20.9459493Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:41:20.9459666Z .loc 1 0 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:108 2026-02-21T09:41:20.9459727Z mov.b32 %r75, global_smem; 2026-02-21T09:41:20.9459782Z add.s32 %r76, %r75, %r3; 2026-02-21T09:41:20.9459841Z bra.uni $L__BB0_2; 2026-02-21T09:41:20.9459936Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9460127Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9460206Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9460268Z barrier.sync 1; 2026-02-21T09:41:20.9460321Z barrier.sync 1; 2026-02-21T09:41:20.9460396Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9460504Z $L__BB0_2: // %.preheader 2026-02-21T09:41:20.9460592Z // =>This Loop Header: Depth=1 2026-02-21T09:41:20.9460679Z // Child Loop BB0_11 Depth 2 2026-02-21T09:41:20.9460770Z // Child Loop BB0_7 Depth 2 2026-02-21T09:41:20.9460927Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19 2026-02-21T09:41:20.9461000Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:41:20.9461060Z barrier.sync 1; 2026-02-21T09:41:20.9461123Z ld.shared.b8 %r74, [%r76+90228]; 2026-02-21T09:41:20.9461183Z setp.gt.u32 %p4, %r74, 3; 2026-02-21T09:41:20.9461238Z @%p4 bra $L__BB0_4; 2026-02-21T09:41:20.9461323Z // %bb.3: // %.preheader 2026-02-21T09:41:20.9461408Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9461469Z $L_brx_0: .branchtargets 2026-02-21T09:41:20.9461560Z $L__BB0_5, 
2026-02-21T09:41:20.9461613Z $L__BB0_9, 2026-02-21T09:41:20.9461663Z $L__BB0_15, 2026-02-21T09:41:20.9461715Z $L__BB0_24; 2026-02-21T09:41:20.9461782Z brx.idx %r74, $L_brx_0; 2026-02-21T09:41:20.9461873Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9462039Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9462117Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9462193Z ld.shared.b32 %r126, [global_smem+73728]; 2026-02-21T09:41:20.9462264Z ld.shared.b32 %r520, [global_smem+73736]; 2026-02-21T09:41:20.9462328Z barrier.sync 1; 2026-02-21T09:41:20.9462387Z setp.lt.s32 %p17, %r520, 1; 2026-02-21T09:41:20.9462444Z @%p17 bra $L__BB0_8; 2026-02-21T09:41:20.9462518Z // %bb.6: // %.lr.ph8 2026-02-21T09:41:20.9462607Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9462770Z .loc 1 0 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:108 2026-02-21T09:41:20.9462848Z mov.b32 %r524, -1; 2026-02-21T09:41:20.9462917Z mov.pred %p137, 0; 2026-02-21T09:41:20.9462972Z mov.b32 %r521, 0; 2026-02-21T09:41:20.9463027Z mov.b32 %r522, %r521; 2026-02-21T09:41:20.9463082Z mov.b32 %r523, %r521; 2026-02-21T09:41:20.9463184Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:41:20.9463271Z // => This Inner Loop Header: Depth=2 2026-02-21T09:41:20.9463437Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9463502Z add.s32 %r132, %r524, 1; 2026-02-21T09:41:20.9463562Z setp.eq.b32 %p26, %r524, 31; 2026-02-21T09:41:20.9463623Z selp.b32 %r524, 0, %r132, %p26; 2026-02-21T09:41:20.9463684Z shl.b32 %r133, %r523, 3; 2026-02-21T09:41:20.9463739Z add.s32 %r135, %r75, %r133; 2026-02-21T09:41:20.9463795Z add.s32 %r136, %r135, 90112; 2026-02-21T09:41:20.9463850Z add.s32 %r124, %r135, 90160; 2026-02-21T09:41:20.9464019Z .loc 1 51 31 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:51:31 2026-02-21T09:41:20.9464075Z shl.b32 %r137, %r523, 12; 
2026-02-21T09:41:20.9464130Z add.s32 %r138, %r75, %r137; 2026-02-21T09:41:20.9464192Z add.s32 %r139, %r138, 49152; 2026-02-21T09:41:20.9464357Z .loc 1 52 44 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:52:44 2026-02-21T09:41:20.9464414Z shl.b32 %r140, %r523, 13; 2026-02-21T09:41:20.9464478Z add.s32 %r141, %r75, %r140; 2026-02-21T09:41:20.9464704Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9464769Z bar.warp.sync -1; 2026-02-21T09:41:20.9464826Z // begin inline asm 2026-02-21T09:41:20.9464888Z 2026-02-21T09:41:20.9464938Z { 2026-02-21T09:41:20.9465000Z .reg .pred complete; 2026-02-21T09:41:20.9465065Z waitLoop: 2026-02-21T09:41:20.9465217Z mbarrier.try_wait.parity.shared.b64 complete, [%r124], %r522; 2026-02-21T09:41:20.9465285Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.9465336Z } 2026-02-21T09:41:20.9465340Z 2026-02-21T09:41:20.9465404Z // end inline asm 2026-02-21T09:41:20.9465577Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9465638Z setp.eq.b32 %p25, %r524, 31; 2026-02-21T09:41:20.9465814Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9465880Z elect.sync %r142|%p20, -1; 2026-02-21T09:41:20.9465944Z bfe.u32 %r143, %r139, 4, 14; 2026-02-21T09:41:20.9466011Z cvt.u64.u32 %rd20, %r143; 2026-02-21T09:41:20.9466085Z or.b64 %rd14, %rd20, -9223371899399045120; 2026-02-21T09:41:20.9466146Z bfe.u32 %r144, %r141, 4, 14; 2026-02-21T09:41:20.9466206Z cvt.u64.u32 %rd21, %r144; 2026-02-21T09:41:20.9466313Z or.b64 %rd15, %rd21, -9223371899382267904; 2026-02-21T09:41:20.9466375Z mov.b32 %r127, 69206032; 2026-02-21T09:41:20.9466431Z // begin inline asm 2026-02-21T09:41:20.9466586Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 0 ], %rd14, %rd15, %r127, %p137; 2026-02-21T09:41:20.9466644Z // end inline asm 2026-02-21T09:41:20.9466700Z add.s32 %r145, %r138, 49184; 2026-02-21T09:41:20.9466764Z bfe.u32 
%r146, %r145, 4, 14; 2026-02-21T09:41:20.9466822Z cvt.u64.u32 %rd22, %r146; 2026-02-21T09:41:20.9466889Z or.b64 %rd16, %rd22, -9223371899399045120; 2026-02-21T09:41:20.9466946Z add.s32 %r147, %r141, 32; 2026-02-21T09:41:20.9467010Z bfe.u32 %r148, %r147, 4, 14; 2026-02-21T09:41:20.9467067Z cvt.u64.u32 %rd23, %r148; 2026-02-21T09:41:20.9467132Z or.b64 %rd17, %rd23, -9223371899382267904; 2026-02-21T09:41:20.9467198Z mov.pred %p21, -1; 2026-02-21T09:41:20.9467254Z // begin inline asm 2026-02-21T09:41:20.9467396Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r126 + 0 ], %rd16, %rd17, %r127, %p21; 2026-02-21T09:41:20.9467453Z // end inline asm 2026-02-21T09:41:20.9467518Z cvt.u64.u32 %rd18, %r136; 2026-02-21T09:41:20.9467574Z // begin inline asm 2026-02-21T09:41:20.9467732Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd18]; 2026-02-21T09:41:20.9467794Z // end inline asm 2026-02-21T09:41:20.9467858Z and.pred %p24, %p25, %p20; 2026-02-21T09:41:20.9467915Z add.s32 %r149, %r75, 90208; 2026-02-21T09:41:20.9467980Z cvt.u64.u32 %rd19, %r149; 2026-02-21T09:41:20.9468035Z // begin inline asm 2026-02-21T09:41:20.9468158Z @%p24 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd19]; 2026-02-21T09:41:20.9468212Z // end inline asm 2026-02-21T09:41:20.9468387Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9468450Z setp.ne.b32 %p137, %r524, 31; 2026-02-21T09:41:20.9468622Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9468690Z selp.b32 %r150, 1, 0, %p25; 2026-02-21T09:41:20.9468751Z xor.b32 %r521, %r521, %r150; 2026-02-21T09:41:20.9468808Z add.s32 %r130, %r75, 90224; 2026-02-21T09:41:20.9468872Z // begin inline asm 2026-02-21T09:41:20.9468922Z 2026-02-21T09:41:20.9468972Z { 2026-02-21T09:41:20.9469035Z @!%p25 bra.uni skipWait; 2026-02-21T09:41:20.9469169Z .reg .pred complete; 2026-02-21T09:41:20.9469223Z waitLoop: 2026-02-21T09:41:20.9469343Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r130], %r521; 2026-02-21T09:41:20.9469414Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.9469469Z skipWait: 2026-02-21T09:41:20.9469520Z } 2026-02-21T09:41:20.9469524Z 2026-02-21T09:41:20.9469609Z // end inline asm 2026-02-21T09:41:20.9469781Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9469840Z add.s32 %r151, %r523, 1; 2026-02-21T09:41:20.9469901Z setp.eq.b32 %p27, %r151, 6; 2026-02-21T09:41:20.9469973Z selp.b32 %r523, 0, %r151, %p27; 2026-02-21T09:41:20.9470033Z selp.b32 %r152, 1, 0, %p27; 2026-02-21T09:41:20.9470117Z xor.b32 %r522, %r522, %r152; 2026-02-21T09:41:20.9470304Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9470366Z add.s32 %r520, %r520, -1; 2026-02-21T09:41:20.9470427Z setp.ne.b32 %p28, %r520, 0; 2026-02-21T09:41:20.9470487Z @%p28 bra $L__BB0_7; 2026-02-21T09:41:20.9470580Z $L__BB0_8: // %._crit_edge9 2026-02-21T09:41:20.9470671Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9470730Z barrier.sync 1; 2026-02-21T09:41:20.9470819Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9470879Z bra.uni $L__BB0_2; 2026-02-21T09:41:20.9470976Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9471160Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9471267Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9471346Z ld.shared.b32 %r525, [global_smem+73736]; 2026-02-21T09:41:20.9471403Z barrier.sync 1; 2026-02-21T09:41:20.9471582Z .loc 1 21 67 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:21:67 2026-02-21T09:41:20.9471640Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:41:20.9471699Z mov.u32 %r77, %ctaid.y; 2026-02-21T09:41:20.9471762Z mov.u32 %r78, %ctaid.z; 2026-02-21T09:41:20.9471820Z mov.u32 %r79, %nctaid.x; 2026-02-21T09:41:20.9471878Z mov.u32 %r80, %nctaid.y; 2026-02-21T09:41:20.9471943Z mad.lo.s32 %r81, %r78, %r80, 
%r77; 2026-02-21T09:41:20.9472013Z mad.lo.s32 %r82, %r81, %r79, %r17; 2026-02-21T09:41:20.9472073Z mul.lo.s32 %r83, %r82, 384; 2026-02-21T09:41:20.9472249Z .loc 1 22 68 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:22:68 2026-02-21T09:41:20.9472311Z add.s32 %r84, %r83, 128; 2026-02-21T09:41:20.9472367Z cvt.s64.s32 %rd8, %r84; 2026-02-21T09:41:20.9472424Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:41:20.9472493Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:41:20.9472651Z .loc 1 21 67 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:21:67 2026-02-21T09:41:20.9472732Z cvt.s64.s32 %rd10, %r83; 2026-02-21T09:41:20.9472788Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:41:20.9472856Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:41:20.9473025Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9473084Z setp.lt.s32 %p5, %r525, 1; 2026-02-21T09:41:20.9473147Z @%p5 bra $L__BB0_14; 2026-02-21T09:41:20.9473220Z // %bb.10: // %.lr.ph 2026-02-21T09:41:20.9473306Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9473369Z add.s32 %r535, %r17, -4736; 2026-02-21T09:41:20.9473426Z add.s32 %r19, %r1, -128; 2026-02-21T09:41:20.9473479Z mov.b32 %r532, -1; 2026-02-21T09:41:20.9473533Z mov.b32 %r526, 0; 2026-02-21T09:41:20.9473598Z mov.b32 %r527, %r526; 2026-02-21T09:41:20.9473652Z mov.b32 %r534, %r526; 2026-02-21T09:41:20.9473707Z mov.b32 %r533, %r526; 2026-02-21T09:41:20.9473767Z mov.b32 %r530, %r526; 2026-02-21T09:41:20.9473822Z bra.uni $L__BB0_11; 2026-02-21T09:41:20.9473916Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:41:20.9474084Z .loc 1 0 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0:108 2026-02-21T09:41:20.9474152Z selp.b32 %r105, 0, %r530, %p8; 2026-02-21T09:41:20.9474211Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:41:20.9474268Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:41:20.9474464Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 
2026-02-21T09:41:20.9474519Z shl.b32 %r112, %r527, 3; 2026-02-21T09:41:20.9474573Z add.s32 %r114, %r75, %r112; 2026-02-21T09:41:20.9474635Z add.s32 %r101, %r114, 90112; 2026-02-21T09:41:20.9474854Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9474912Z // begin inline asm 2026-02-21T09:41:20.9474960Z 2026-02-21T09:41:20.9475015Z { 2026-02-21T09:41:20.9475073Z .reg .pred complete; 2026-02-21T09:41:20.9475124Z waitLoop: 2026-02-21T09:41:20.9475247Z mbarrier.try_wait.parity.shared.b64 complete, [%r101], %r526; 2026-02-21T09:41:20.9475307Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.9475354Z } 2026-02-21T09:41:20.9475358Z 2026-02-21T09:41:20.9475410Z // end inline asm 2026-02-21T09:41:20.9475583Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9475641Z add.s32 %r107, %r114, 90160; 2026-02-21T09:41:20.9475797Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9475857Z bar.sync 3, 64; 2026-02-21T09:41:20.9475912Z // begin inline asm 2026-02-21T09:41:20.9476052Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r107], 12288; 2026-02-21T09:41:20.9476115Z // end inline asm 2026-02-21T09:41:20.9476276Z .loc 1 51 31 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:51:31 2026-02-21T09:41:20.9476333Z shl.b32 %r115, %r527, 12; 2026-02-21T09:41:20.9476389Z add.s32 %r116, %r75, %r115; 2026-02-21T09:41:20.9476452Z add.s32 %r104, %r116, 49152; 2026-02-21T09:41:20.9476607Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9476660Z bar.sync 3, 64; 2026-02-21T09:41:20.9476729Z elect.sync %r117|%p13, -1; 2026-02-21T09:41:20.9476790Z and.pred %p10, %p12, %p13; 2026-02-21T09:41:20.9476848Z // begin inline asm 2026-02-21T09:41:20.9477123Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r104], [%rd12, {%r105, %r534}], [%r107]; 2026-02-21T09:41:20.9477177Z // end 
inline asm 2026-02-21T09:41:20.9477360Z .loc 1 52 44 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:52:44 2026-02-21T09:41:20.9477425Z shl.b32 %r118, %r527, 13; 2026-02-21T09:41:20.9477509Z add.s32 %r108, %r75, %r118; 2026-02-21T09:41:20.9477693Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9477748Z bar.sync 3, 64; 2026-02-21T09:41:20.9477817Z elect.sync %r119|%p14, -1; 2026-02-21T09:41:20.9477879Z and.pred %p11, %p12, %p14; 2026-02-21T09:41:20.9477933Z // begin inline asm 2026-02-21T09:41:20.9478196Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r108], [%rd13, {%r105, %r533}], [%r107]; 2026-02-21T09:41:20.9478253Z // end inline asm 2026-02-21T09:41:20.9478442Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9478504Z add.s32 %r530, %r105, 32; 2026-02-21T09:41:20.9478680Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9478739Z add.s32 %r120, %r527, 1; 2026-02-21T09:41:20.9478797Z setp.eq.b32 %p15, %r120, 6; 2026-02-21T09:41:20.9478867Z selp.b32 %r527, 0, %r120, %p15; 2026-02-21T09:41:20.9478924Z selp.b32 %r121, 1, 0, %p15; 2026-02-21T09:41:20.9478981Z xor.b32 %r526, %r526, %r121; 2026-02-21T09:41:20.9479173Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9479232Z add.s32 %r525, %r525, -1; 2026-02-21T09:41:20.9479290Z setp.ne.b32 %p16, %r525, 0; 2026-02-21T09:41:20.9479355Z @%p16 bra $L__BB0_11; 2026-02-21T09:41:20.9479412Z bra.uni $L__BB0_14; 2026-02-21T09:41:20.9479541Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:41:20.9479636Z // => This Inner Loop Header: Depth=2 2026-02-21T09:41:20.9479702Z add.s32 %r87, %r532, 1; 2026-02-21T09:41:20.9479761Z setp.eq.b32 %p6, %r532, 31; 2026-02-21T09:41:20.9479824Z selp.b32 %r532, 0, %r87, %p6; 2026-02-21T09:41:20.9479926Z setp.ne.b32 %p7, %r532, 0; 
2026-02-21T09:41:20.9479987Z setp.eq.b32 %p8, %r532, 0; 2026-02-21T09:41:20.9480045Z @%p7 bra $L__BB0_13; 2026-02-21T09:41:20.9480138Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:41:20.9480200Z add.s32 %r535, %r535, 4736; 2026-02-21T09:41:20.9480366Z .loc 1 36 35 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:36:35 2026-02-21T09:41:20.9480424Z shr.s32 %r88, %r535, 31; 2026-02-21T09:41:20.9480486Z shr.u32 %r89, %r88, 22; 2026-02-21T09:41:20.9480542Z add.s32 %r90, %r535, %r89; 2026-02-21T09:41:20.9480598Z shr.s32 %r91, %r90, 10; 2026-02-21T09:41:20.9480772Z .loc 1 37 33 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:37:33 2026-02-21T09:41:20.9480828Z shl.b32 %r92, %r91, 6; 2026-02-21T09:41:20.9481013Z .loc 1 38 39 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:38:39 2026-02-21T09:41:20.9481092Z sub.s32 %r93, 96, %r92; 2026-02-21T09:41:20.9481265Z .loc 1 38 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:38:52 2026-02-21T09:41:20.9481321Z min.s32 %r94, %r93, 64; 2026-02-21T09:41:20.9481481Z .loc 1 39 45 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:45 2026-02-21T09:41:20.9481546Z and.b32 %r95, %r90, -1024; 2026-02-21T09:41:20.9481602Z sub.s32 %r96, %r535, %r95; 2026-02-21T09:41:20.9481763Z .loc 1 40 51 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:40:51 2026-02-21T09:41:20.9481826Z div.s32 %r97, %r96, %r94; 2026-02-21T09:41:20.9481994Z .loc 1 39 64 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:64 2026-02-21T09:41:20.9482054Z mul.lo.s32 %r98, %r97, %r94; 2026-02-21T09:41:20.9482117Z sub.s32 %r99, %r96, %r98; 2026-02-21T09:41:20.9482283Z .loc 1 39 30 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:39:30 2026-02-21T09:41:20.9482341Z add.s32 %r100, %r99, %r92; 2026-02-21T09:41:20.9482509Z .loc 1 41 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:41:27 2026-02-21T09:41:20.9482610Z shl.b32 %r533, %r100, 7; 2026-02-21T09:41:20.9482776Z .loc 1 
42 27 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:42:27 2026-02-21T09:41:20.9482834Z shl.b32 %r534, %r97, 6; 2026-02-21T09:41:20.9482897Z bra.uni $L__BB0_13; 2026-02-21T09:41:20.9482979Z $L__BB0_14: // %._crit_edge 2026-02-21T09:41:20.9483065Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9483245Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9483300Z barrier.sync 1; 2026-02-21T09:41:20.9483377Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:41:20.9483432Z bra.uni $L__BB0_2; 2026-02-21T09:41:20.9483535Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:41:20.9483699Z .loc 1 19 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:19 2026-02-21T09:41:20.9483756Z barrier.sync 1; 2026-02-21T09:41:20.9483815Z barrier.sync 1; 2026-02-21T09:41:20.9483868Z bra.uni $L__BB0_2; 2026-02-21T09:41:20.9483952Z $L__BB0_23: // %._crit_edge12 2026-02-21T09:41:20.9484128Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9484198Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:41:20.9484277Z bar.sync 0, 128; 2026-02-21T09:41:20.9484331Z barrier.sync 1; 2026-02-21T09:41:20.9484413Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:41:20.9484572Z .loc 1 53 52 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:53:52 2026-02-21T09:41:20.9484628Z // begin inline asm 2026-02-21T09:41:20.9484711Z 2026-02-21T09:41:20.9484765Z { 2026-02-21T09:41:20.9484850Z .reg .pred complete; 2026-02-21T09:41:20.9484904Z waitLoop: 2026-02-21T09:41:20.9485029Z mbarrier.try_wait.parity.shared.b64 complete, [%r502], %r546; 2026-02-21T09:41:20.9485092Z @!complete bra.uni waitLoop; 2026-02-21T09:41:20.9485138Z } 2026-02-21T09:41:20.9485142Z 2026-02-21T09:41:20.9485204Z // end inline asm 2026-02-21T09:41:20.9485373Z .loc 1 30 108 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:108 2026-02-21T09:41:20.9485426Z bar.sync 0, 128; 
2026-02-21T09:41:20.9485486Z // begin inline asm 2026-02-21T09:41:20.9485574Z @%p122 mbarrier.inval.shared::cta.b64 [%r502]; 2026-02-21T09:41:20.9485628Z // end inline asm 2026-02-21T09:41:20.9485682Z // begin inline asm 2026-02-21T09:41:20.9485770Z @%p122 mbarrier.inval.shared::cta.b64 [%r264]; 2026-02-21T09:41:20.9485823Z // end inline asm 2026-02-21T09:41:20.9485875Z // begin inline asm 2026-02-21T09:41:20.9485985Z @%p122 mbarrier.inval.shared::cta.b64 [%r252]; 2026-02-21T09:41:20.9486042Z // end inline asm 2026-02-21T09:41:20.9486097Z bar.sync 0, 128; 2026-02-21T09:41:20.9486151Z // begin inline asm 2026-02-21T09:41:20.9486238Z @%p122 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T09:41:20.9486289Z // end inline asm 2026-02-21T09:41:20.9486341Z bar.sync 0, 128; 2026-02-21T09:41:20.9486402Z // begin inline asm 2026-02-21T09:41:20.9486477Z @%p122 mbarrier.inval.shared::cta.b64 [%r254]; 2026-02-21T09:41:20.9486528Z // end inline asm 2026-02-21T09:41:20.9486589Z bar.sync 0, 128; 2026-02-21T09:41:20.9486641Z // begin inline asm 2026-02-21T09:41:20.9486715Z @%p122 mbarrier.inval.shared::cta.b64 [%r255]; 2026-02-21T09:41:20.9486767Z // end inline asm 2026-02-21T09:41:20.9486829Z bar.sync 0, 128; 2026-02-21T09:41:20.9486892Z // begin inline asm 2026-02-21T09:41:20.9486964Z @%p122 mbarrier.inval.shared::cta.b64 [%r256]; 2026-02-21T09:41:20.9487022Z // end inline asm 2026-02-21T09:41:20.9487073Z bar.sync 0, 128; 2026-02-21T09:41:20.9487127Z // begin inline asm 2026-02-21T09:41:20.9487202Z @%p122 mbarrier.inval.shared::cta.b64 [%r257]; 2026-02-21T09:41:20.9487290Z // end inline asm 2026-02-21T09:41:20.9487344Z // begin inline asm 2026-02-21T09:41:20.9487419Z @%p122 mbarrier.inval.shared::cta.b64 [%r246]; 2026-02-21T09:41:20.9487479Z // end inline asm 2026-02-21T09:41:20.9487531Z bar.sync 0, 128; 2026-02-21T09:41:20.9487590Z // begin inline asm 2026-02-21T09:41:20.9487671Z @%p122 mbarrier.inval.shared::cta.b64 [%r247]; 2026-02-21T09:41:20.9487723Z // end inline asm 
2026-02-21T09:41:20.9487776Z bar.sync 0, 128; 2026-02-21T09:41:20.9487829Z // begin inline asm 2026-02-21T09:41:20.9487912Z @%p122 mbarrier.inval.shared::cta.b64 [%r248]; 2026-02-21T09:41:20.9487963Z // end inline asm 2026-02-21T09:41:20.9488016Z bar.sync 0, 128; 2026-02-21T09:41:20.9488076Z // begin inline asm 2026-02-21T09:41:20.9488149Z @%p122 mbarrier.inval.shared::cta.b64 [%r249]; 2026-02-21T09:41:20.9488201Z // end inline asm 2026-02-21T09:41:20.9488254Z bar.sync 0, 128; 2026-02-21T09:41:20.9488315Z // begin inline asm 2026-02-21T09:41:20.9488387Z @%p122 mbarrier.inval.shared::cta.b64 [%r250]; 2026-02-21T09:41:20.9488441Z // end inline asm 2026-02-21T09:41:20.9488501Z bar.sync 0, 128; 2026-02-21T09:41:20.9488554Z // begin inline asm 2026-02-21T09:41:20.9488626Z @%p122 mbarrier.inval.shared::cta.b64 [%r251]; 2026-02-21T09:41:20.9488676Z // end inline asm 2026-02-21T09:41:20.9488843Z .loc 1 30 4 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:30:4 2026-02-21T09:41:20.9488895Z bar.sync 0, 128; 2026-02-21T09:41:20.9488946Z // begin inline asm 2026-02-21T09:41:20.9489095Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r518, 128; 2026-02-21T09:41:20.9489147Z // end inline asm 2026-02-21T09:41:20.9489223Z st.shared.b32 [global_smem+90232], 50529027; 2026-02-21T09:41:20.9489284Z barrier.sync 1; 2026-02-21T09:41:20.9489363Z $L__BB0_24: // %common.ret 2026-02-21T09:41:20.9489546Z .loc 1 0 0 // cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py:0 2026-02-21T09:41:20.9489597Z ret; 2026-02-21T09:41:20.9489657Z $L__tmp1: 2026-02-21T09:41:20.9489710Z $L__func_end0: 2026-02-21T09:41:20.9489791Z // -- End function 2026-02-21T09:41:20.9489847Z } 2026-02-21T09:41:20.9490045Z .file 1 "/tmp/torchinductor_root/un/cunjshpwml2ioynj2avl3mn5ssbyps6rz4i2dwq5qxl7zgth3x24.py" 2026-02-21T09:41:20.9490106Z .section .debug_abbrev 2026-02-21T09:41:20.9490162Z { 2026-02-21T09:41:20.9490249Z .b8 1 // Abbreviation Code 2026-02-21T09:41:20.9490334Z .b8 17 // 
DW_TAG_compile_unit 2026-02-21T09:41:20.9490414Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:20.9490496Z .b8 37 // DW_AT_producer 2026-02-21T09:41:20.9490568Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.9490661Z .b8 19 // DW_AT_language 2026-02-21T09:41:20.9490744Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:20.9490817Z .b8 3 // DW_AT_name 2026-02-21T09:41:20.9490887Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.9490968Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:20.9491041Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:20.9491112Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:20.9491182Z .b8 8 // DW_FORM_string 2026-02-21T09:41:20.9491256Z .b8 0 // EOM(1) 2026-02-21T09:41:20.9491323Z .b8 0 // EOM(2) 2026-02-21T09:41:20.9491387Z .b8 0 // EOM(3) 2026-02-21T09:41:20.9491442Z } 2026-02-21T09:41:20.9491498Z .section .debug_info 2026-02-21T09:41:20.9491547Z { 2026-02-21T09:41:20.9491632Z .b32 104 // Length of Unit 2026-02-21T09:41:20.9491716Z .b8 2 // DWARF version number 2026-02-21T09:41:20.9491791Z .b8 0 2026-02-21T09:41:20.9491903Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:41:20.9491996Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:20.9492092Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:20.9492170Z .b8 116 // DW_AT_producer 2026-02-21T09:41:20.9492228Z .b8 114 2026-02-21T09:41:20.9492279Z .b8 105 2026-02-21T09:41:20.9492330Z .b8 116 2026-02-21T09:41:20.9492378Z .b8 111 2026-02-21T09:41:20.9492434Z .b8 110 2026-02-21T09:41:20.9492482Z .b8 0 2026-02-21T09:41:20.9492553Z .b8 2 // DW_AT_language 2026-02-21T09:41:20.9492608Z .b8 0 2026-02-21T09:41:20.9492681Z .b8 99 // DW_AT_name 2026-02-21T09:41:20.9492731Z .b8 117 2026-02-21T09:41:20.9492780Z .b8 110 2026-02-21T09:41:20.9492834Z .b8 106 2026-02-21T09:41:20.9492884Z .b8 115 2026-02-21T09:41:20.9492931Z .b8 104 2026-02-21T09:41:20.9492987Z .b8 112 2026-02-21T09:41:20.9493035Z .b8 119 2026-02-21T09:41:20.9493082Z .b8 109 2026-02-21T09:41:20.9493129Z .b8 108 2026-02-21T09:41:20.9493185Z .b8 50 2026-02-21T09:41:20.9493233Z .b8 105 2026-02-21T09:41:20.9493281Z .b8 111 2026-02-21T09:41:20.9493336Z .b8 121 2026-02-21T09:41:20.9493384Z .b8 110 2026-02-21T09:41:20.9493431Z .b8 106 2026-02-21T09:41:20.9493479Z .b8 50 2026-02-21T09:41:20.9493534Z .b8 97 2026-02-21T09:41:20.9493581Z .b8 118 2026-02-21T09:41:20.9493652Z .b8 108 2026-02-21T09:41:20.9493699Z .b8 51 2026-02-21T09:41:20.9493753Z .b8 109 2026-02-21T09:41:20.9493802Z .b8 110 2026-02-21T09:41:20.9493851Z .b8 53 2026-02-21T09:41:20.9493908Z .b8 115 2026-02-21T09:41:20.9493957Z .b8 115 2026-02-21T09:41:20.9494006Z .b8 98 2026-02-21T09:41:20.9494055Z .b8 121 2026-02-21T09:41:20.9494114Z .b8 112 2026-02-21T09:41:20.9494163Z .b8 115 2026-02-21T09:41:20.9494231Z .b8 54 2026-02-21T09:41:20.9494287Z .b8 114 2026-02-21T09:41:20.9494336Z .b8 122 2026-02-21T09:41:20.9494384Z .b8 52 2026-02-21T09:41:20.9494432Z .b8 105 2026-02-21T09:41:20.9494488Z .b8 50 2026-02-21T09:41:20.9494535Z .b8 100 2026-02-21T09:41:20.9494583Z .b8 119 2026-02-21T09:41:20.9494637Z .b8 113 2026-02-21T09:41:20.9494713Z .b8 53 
2026-02-21T09:41:20.9494764Z .b8 113 2026-02-21T09:41:20.9494812Z .b8 120 2026-02-21T09:41:20.9494867Z .b8 108 2026-02-21T09:41:20.9494917Z .b8 55 2026-02-21T09:41:20.9494964Z .b8 122 2026-02-21T09:41:20.9495012Z .b8 103 2026-02-21T09:41:20.9495066Z .b8 116 2026-02-21T09:41:20.9495116Z .b8 104 2026-02-21T09:41:20.9495164Z .b8 51 2026-02-21T09:41:20.9495220Z .b8 120 2026-02-21T09:41:20.9495268Z .b8 50 2026-02-21T09:41:20.9495317Z .b8 52 2026-02-21T09:41:20.9495364Z .b8 46 2026-02-21T09:41:20.9495420Z .b8 112 2026-02-21T09:41:20.9495467Z .b8 121 2026-02-21T09:41:20.9495516Z .b8 0 2026-02-21T09:41:20.9495642Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:20.9495718Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:20.9495769Z .b8 116 2026-02-21T09:41:20.9495818Z .b8 109 2026-02-21T09:41:20.9495873Z .b8 112 2026-02-21T09:41:20.9495920Z .b8 47 2026-02-21T09:41:20.9495968Z .b8 116 2026-02-21T09:41:20.9496023Z .b8 111 2026-02-21T09:41:20.9496070Z .b8 114 2026-02-21T09:41:20.9496118Z .b8 99 2026-02-21T09:41:20.9496165Z .b8 104 2026-02-21T09:41:20.9496222Z .b8 105 2026-02-21T09:41:20.9496270Z .b8 110 2026-02-21T09:41:20.9496319Z .b8 100 2026-02-21T09:41:20.9496365Z .b8 117 2026-02-21T09:41:20.9496421Z .b8 99 2026-02-21T09:41:20.9496470Z .b8 116 2026-02-21T09:41:20.9496518Z .b8 111 2026-02-21T09:41:20.9496572Z .b8 114 2026-02-21T09:41:20.9496620Z .b8 95 2026-02-21T09:41:20.9496667Z .b8 114 2026-02-21T09:41:20.9496715Z .b8 111 2026-02-21T09:41:20.9496771Z .b8 111 2026-02-21T09:41:20.9496820Z .b8 116 2026-02-21T09:41:20.9496867Z .b8 47 2026-02-21T09:41:20.9496922Z .b8 117 2026-02-21T09:41:20.9496971Z .b8 110 2026-02-21T09:41:20.9497021Z .b8 0 2026-02-21T09:41:20.9497069Z } 2026-02-21T09:41:20.9497164Z .section .debug_macinfo { } 2026-02-21T09:41:20.9497168Z 2026-02-21T09:41:20.9497244Z ================================================================ 2026-02-21T09:41:20.9497343Z please share the reproducer above with Triton project. 
2026-02-21T09:41:21.1102439Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:21.1102453Z 2026-02-21T09:41:21.1107945Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:41:21.1107964Z 2026-02-21T09:41:21.1108481Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:21.1108564Z `ptxas` stderr: 2026-02-21T09:41:21.1108928Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:21.1109026Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:21.1109032Z 2026-02-21T09:41:21.1109436Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpqx8_r7xw.ptx -o /tmp/tmpqx8_r7xw.ptx.o 2026-02-21T09:41:21.1109657Z 2026-02-21T09:41:21.1109809Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:21.1109820Z 2026-02-21T09:41:21.1109903Z ================================================================ 2026-02-21T09:41:21.1109984Z Internal Triton PTX codegen error 2026-02-21T09:41:21.1110050Z `ptxas` stderr: 2026-02-21T09:41:21.1110412Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:21.1110506Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:21.1110517Z 2026-02-21T09:41:21.1110905Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpqx8_r7xw.ptx -o /tmp/tmpqx8_r7xw.ptx.o 2026-02-21T09:41:21.1110909Z 2026-02-21T09:41:21.1110913Z 2026-02-21T09:41:21.1110966Z // 2026-02-21T09:41:21.1111048Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:21.1111099Z // 2026-02-21T09:41:21.1111301Z 2026-02-21T09:41:21.1111359Z .version 8.7 2026-02-21T09:41:21.1111413Z .target sm_100a 2026-02-21T09:41:21.1111479Z .address_size 64 2026-02-21T09:41:21.1111482Z 2026-02-21T09:41:21.1111639Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:21.1111725Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:21.1111814Z // @_helion_matmul 2026-02-21T09:41:21.1111882Z .visible .entry _helion_matmul( 2026-02-21T09:41:21.1111986Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:21.1112085Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:21.1112178Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:21.1112275Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:21.1112375Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:21.1112425Z ) 2026-02-21T09:41:21.1112479Z .reqntid 256 2026-02-21T09:41:21.1112530Z .maxnreg 32 2026-02-21T09:41:21.1112586Z { 2026-02-21T09:41:21.1112647Z .reg .pred %p<143>; 2026-02-21T09:41:21.1112702Z .reg .b32 %r<497>; 2026-02-21T09:41:21.1112762Z .reg .b64 %rd<142>; 2026-02-21T09:41:21.1112947Z .loc 1 19 0 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:19:0 2026-02-21T09:41:21.1113003Z $L__func_begin0: 2026-02-21T09:41:21.1113218Z .loc 1 19 0 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:19:0 2026-02-21T09:41:21.1113221Z 
2026-02-21T09:41:21.1113281Z // %bb.0: 2026-02-21T09:41:21.1113364Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:21.1113417Z $L__tmp0: 2026-02-21T09:41:21.1113584Z .loc 1 19 0 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:19 2026-02-21T09:41:21.1113641Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:21.1113723Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:21.1113793Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:21.1113854Z mov.b32 %r60, global_smem; 2026-02-21T09:41:21.1113909Z // begin inline asm 2026-02-21T09:41:21.1114053Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r60], 128; 2026-02-21T09:41:21.1114116Z // end inline asm 2026-02-21T09:41:21.1114198Z ld.param.b64 %rd50, [_helion_matmul_param_3]; 2026-02-21T09:41:21.1114252Z bar.sync 0; 2026-02-21T09:41:21.1114332Z ld.shared.b32 %r467, [global_smem]; 2026-02-21T09:41:21.1114385Z bar.sync 0; 2026-02-21T09:41:21.1114439Z // begin inline asm 2026-02-21T09:41:21.1114556Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:21.1114617Z // end inline asm 2026-02-21T09:41:21.1114955Z .loc 1 21 67 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:21:67 2026-02-21T09:41:21.1115017Z mov.u32 %r496, %ctaid.x; 2026-02-21T09:41:21.1115084Z mov.u32 %r166, %ctaid.y; 2026-02-21T09:41:21.1115182Z mov.u32 %r167, %ctaid.z; 2026-02-21T09:41:21.1115239Z mov.u32 %r168, %nctaid.x; 2026-02-21T09:41:21.1115303Z mov.u32 %r169, %nctaid.y; 2026-02-21T09:41:21.1115370Z mad.lo.s32 %r170, %r167, %r169, %r166; 2026-02-21T09:41:21.1115432Z mad.lo.s32 %r171, %r170, %r168, %r496; 2026-02-21T09:41:21.1115487Z shl.b32 %r172, %r171, 8; 2026-02-21T09:41:21.1115585Z cvt.s64.s32 %rd51, %r172; 2026-02-21T09:41:21.1115649Z add.s64 %rd19, %rd50, %rd51; 2026-02-21T09:41:21.1115707Z shl.b32 %r173, %r1, 2; 2026-02-21T09:41:21.1115773Z add.s32 %r61, %r60, %r173; 2026-02-21T09:41:21.1115827Z mov.b32 %r70, 0; 2026-02-21T09:41:21.1115882Z // begin inline asm 
2026-02-21T09:41:21.1115951Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:21.1116012Z // end inline asm 2026-02-21T09:41:21.1116072Z bar.warp.sync -1; 2026-02-21T09:41:21.1116131Z setp.eq.b32 %p133, %r1, 0; 2026-02-21T09:41:21.1116194Z cvt.u64.u32 %rd4, %r60; 2026-02-21T09:41:21.1116248Z // begin inline asm 2026-02-21T09:41:21.1116418Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:21.1116477Z // end inline asm 2026-02-21T09:41:21.1116530Z // begin inline asm 2026-02-21T09:41:21.1116669Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:21.1116722Z // end inline asm 2026-02-21T09:41:21.1116815Z mov.b32 %r63, 32; 2026-02-21T09:41:21.1116870Z // begin inline asm 2026-02-21T09:41:21.1117017Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:21.1117077Z // end inline asm 2026-02-21T09:41:21.1117130Z mov.b32 %r64, 64; 2026-02-21T09:41:21.1117182Z // begin inline asm 2026-02-21T09:41:21.1117335Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r64; 2026-02-21T09:41:21.1117387Z // end inline asm 2026-02-21T09:41:21.1117440Z mov.b32 %r65, 1024; 2026-02-21T09:41:21.1117493Z // begin inline asm 2026-02-21T09:41:21.1117654Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:21.1117707Z // end inline asm 2026-02-21T09:41:21.1117759Z // begin inline asm 2026-02-21T09:41:21.1117920Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r65; 2026-02-21T09:41:21.1117971Z // end inline asm 2026-02-21T09:41:21.1118027Z mov.b64 %rd12, 2048; 2026-02-21T09:41:21.1118080Z // begin inline asm 2026-02-21T09:41:21.1118253Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:21.1118338Z // end inline asm 2026-02-21T09:41:21.1118391Z mov.b32 %r67, 1; 
2026-02-21T09:41:21.1118452Z // begin inline asm 2026-02-21T09:41:21.1118618Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:21.1118670Z // end inline asm 2026-02-21T09:41:21.1118729Z // begin inline asm 2026-02-21T09:41:21.1118897Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:21.1118951Z // end inline asm 2026-02-21T09:41:21.1119011Z // begin inline asm 2026-02-21T09:41:21.1119160Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:21.1119214Z // end inline asm 2026-02-21T09:41:21.1119268Z // begin inline asm 2026-02-21T09:41:21.1119442Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.1119497Z // end inline asm 2026-02-21T09:41:21.1119553Z // begin inline asm 2026-02-21T09:41:21.1119715Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:21.1119767Z // end inline asm 2026-02-21T09:41:21.1119820Z // begin inline asm 2026-02-21T09:41:21.1119971Z @%p133 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.1120022Z // end inline asm 2026-02-21T09:41:21.1120074Z // begin inline asm 2026-02-21T09:41:21.1120354Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:21.1120431Z // end inline asm 2026-02-21T09:41:21.1120485Z // begin inline asm 2026-02-21T09:41:21.1120607Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:21.1120685Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:21.1120779Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:21.1120833Z // end inline asm 2026-02-21T09:41:21.1120894Z bar.sync 0; 2026-02-21T09:41:21.1120956Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:21.1121123Z .loc 1 22 68 // 
czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:22:68 2026-02-21T09:41:21.1121181Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:21.1121239Z bar.sync 0; 2026-02-21T09:41:21.1121292Z // begin inline asm 2026-02-21T09:41:21.1121358Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:21.1121417Z // end inline asm 2026-02-21T09:41:21.1121476Z bar.warp.sync -1; 2026-02-21T09:41:21.1121530Z // begin inline asm 2026-02-21T09:41:21.1121692Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:21.1121745Z // end inline asm 2026-02-21T09:41:21.1121797Z // begin inline asm 2026-02-21T09:41:21.1121952Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:21.1122014Z // end inline asm 2026-02-21T09:41:21.1122067Z // begin inline asm 2026-02-21T09:41:21.1122212Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:21.1122273Z // end inline asm 2026-02-21T09:41:21.1122324Z mov.b32 %r72, 128; 2026-02-21T09:41:21.1122378Z // begin inline asm 2026-02-21T09:41:21.1122526Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r72; 2026-02-21T09:41:21.1122580Z // end inline asm 2026-02-21T09:41:21.1122633Z // begin inline asm 2026-02-21T09:41:21.1122784Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:21.1122846Z // end inline asm 2026-02-21T09:41:21.1122899Z mov.b32 %r74, 12288; 2026-02-21T09:41:21.1122952Z // begin inline asm 2026-02-21T09:41:21.1123110Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r74; 2026-02-21T09:41:21.1123162Z // end inline asm 2026-02-21T09:41:21.1123217Z // begin inline asm 2026-02-21T09:41:21.1123386Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:21.1123460Z // end inline asm 2026-02-21T09:41:21.1123513Z // begin inline asm 
2026-02-21T09:41:21.1123680Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:21.1123738Z // end inline asm 2026-02-21T09:41:21.1123789Z // begin inline asm 2026-02-21T09:41:21.1123948Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:21.1124007Z // end inline asm 2026-02-21T09:41:21.1124060Z // begin inline asm 2026-02-21T09:41:21.1124202Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:21.1124260Z // end inline asm 2026-02-21T09:41:21.1124312Z // begin inline asm 2026-02-21T09:41:21.1124475Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.1124534Z // end inline asm 2026-02-21T09:41:21.1124586Z // begin inline asm 2026-02-21T09:41:21.1124772Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:21.1124826Z // end inline asm 2026-02-21T09:41:21.1124885Z // begin inline asm 2026-02-21T09:41:21.1125026Z @%p133 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.1125078Z // end inline asm 2026-02-21T09:41:21.1125137Z // begin inline asm 2026-02-21T09:41:21.1125391Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:21.1125471Z // end inline asm 2026-02-21T09:41:21.1125531Z // begin inline asm 2026-02-21T09:41:21.1125654Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:21.1125722Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:21.1125791Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:21.1125882Z // end inline asm 2026-02-21T09:41:21.1125936Z bar.sync 0; 2026-02-21T09:41:21.1125999Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:21.1126177Z .loc 1 40 45 // 
czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:40:45 2026-02-21T09:41:21.1126234Z shr.u32 %r174, %r1, 5; 2026-02-21T09:41:21.1126407Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1126474Z sub.s32 %r176, 1536, %r496; 2026-02-21T09:41:21.1126540Z mul.hi.s32 %r177, %r176, -580400985; 2026-02-21T09:41:21.1126599Z add.s32 %r178, %r177, %r176; 2026-02-21T09:41:21.1126656Z shr.u32 %r179, %r178, 31; 2026-02-21T09:41:21.1126719Z shr.s32 %r180, %r178, 12; 2026-02-21T09:41:21.1126775Z add.s32 %r181, %r180, %r179; 2026-02-21T09:41:21.1126837Z mul.lo.s32 %r182, %r181, 4736; 2026-02-21T09:41:21.1126904Z setp.ne.b32 %p68, %r176, %r182; 2026-02-21T09:41:21.1126965Z setp.lt.u32 %p69, %r496, 1537; 2026-02-21T09:41:21.1127059Z and.pred %p70, %p69, %p68; 2026-02-21T09:41:21.1127120Z selp.b32 %r183, 1, 0, %p70; 2026-02-21T09:41:21.1127185Z add.s32 %r10, %r181, %r183; 2026-02-21T09:41:21.1127356Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1127428Z shfl.sync.idx.b32 %r12, %r174, 0, 31, -1; 2026-02-21T09:41:21.1127494Z shl.b32 %r184, %r12, 21; 2026-02-21T09:41:21.1127551Z and.b32 %r185, %r184, 6291456; 2026-02-21T09:41:21.1127607Z add.s32 %r186, %r185, %r467; 2026-02-21T09:41:21.1127672Z shl.b32 %r187, %r12, 4; 2026-02-21T09:41:21.1127728Z and.b32 %r188, %r187, 64; 2026-02-21T09:41:21.1127787Z add.s32 %r77, %r186, %r188; 2026-02-21T09:41:21.1127846Z mov.pred %p42, -1; 2026-02-21T09:41:21.1127921Z // begin inline asm 2026-02-21T09:41:21.1128194Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 0], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:41:21.1128253Z // end inline asm 2026-02-21T09:41:21.1128324Z // begin inline asm 2026-02-21T09:41:21.1128586Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 16], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, 
%r70, %r70}; 2026-02-21T09:41:21.1128683Z // end inline asm 2026-02-21T09:41:21.1128744Z // begin inline asm 2026-02-21T09:41:21.1128812Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:21.1128863Z // end inline asm 2026-02-21T09:41:21.1128915Z bar.sync 0; 2026-02-21T09:41:21.1129103Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1129162Z add.s32 %r111, %r60, 77872; 2026-02-21T09:41:21.1129214Z // begin inline asm 2026-02-21T09:41:21.1129306Z @%p133 mbarrier.init.shared::cta.b64 [%r111], 1; 2026-02-21T09:41:21.1129358Z // end inline asm 2026-02-21T09:41:21.1129408Z bar.sync 0; 2026-02-21T09:41:21.1129470Z add.s32 %r112, %r60, 77880; 2026-02-21T09:41:21.1129525Z // begin inline asm 2026-02-21T09:41:21.1129608Z @%p133 mbarrier.init.shared::cta.b64 [%r112], 1; 2026-02-21T09:41:21.1129661Z // end inline asm 2026-02-21T09:41:21.1129724Z add.s32 %r113, %r60, 77824; 2026-02-21T09:41:21.1129776Z // begin inline asm 2026-02-21T09:41:21.1129854Z @%p133 mbarrier.init.shared::cta.b64 [%r113], 1; 2026-02-21T09:41:21.1129912Z // end inline asm 2026-02-21T09:41:21.1129962Z bar.sync 0; 2026-02-21T09:41:21.1130017Z add.s32 %r114, %r60, 77832; 2026-02-21T09:41:21.1130069Z // begin inline asm 2026-02-21T09:41:21.1130153Z @%p133 mbarrier.init.shared::cta.b64 [%r114], 1; 2026-02-21T09:41:21.1130206Z // end inline asm 2026-02-21T09:41:21.1130295Z bar.sync 0; 2026-02-21T09:41:21.1130356Z add.s32 %r115, %r60, 77840; 2026-02-21T09:41:21.1130411Z // begin inline asm 2026-02-21T09:41:21.1130488Z @%p133 mbarrier.init.shared::cta.b64 [%r115], 1; 2026-02-21T09:41:21.1130539Z // end inline asm 2026-02-21T09:41:21.1130598Z bar.sync 0; 2026-02-21T09:41:21.1130654Z add.s32 %r116, %r60, 77848; 2026-02-21T09:41:21.1130728Z // begin inline asm 2026-02-21T09:41:21.1130813Z @%p133 mbarrier.init.shared::cta.b64 [%r116], 1; 2026-02-21T09:41:21.1130867Z // end inline asm 2026-02-21T09:41:21.1130918Z bar.sync 0; 2026-02-21T09:41:21.1130980Z add.s32 
%r117, %r60, 77856; 2026-02-21T09:41:21.1131034Z // begin inline asm 2026-02-21T09:41:21.1131111Z @%p133 mbarrier.init.shared::cta.b64 [%r117], 1; 2026-02-21T09:41:21.1131163Z // end inline asm 2026-02-21T09:41:21.1131223Z bar.sync 0; 2026-02-21T09:41:21.1131278Z add.s32 %r222, %r60, 77864; 2026-02-21T09:41:21.1131331Z // begin inline asm 2026-02-21T09:41:21.1131415Z @%p133 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T09:41:21.1131469Z // end inline asm 2026-02-21T09:41:21.1131529Z setp.lt.s32 %p71, %r10, 1; 2026-02-21T09:41:21.1131588Z setp.gt.s32 %p67, %r10, 0; 2026-02-21T09:41:21.1131764Z .loc 1 35 33 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:35:33 2026-02-21T09:41:21.1131847Z shr.u32 %r189, %r496, 4; 2026-02-21T09:41:21.1131910Z and.b32 %r190, %r189, 134217664; 2026-02-21T09:41:21.1132082Z .loc 1 36 39 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:36:39 2026-02-21T09:41:21.1132137Z sub.s32 %r191, 96, %r190; 2026-02-21T09:41:21.1132304Z .loc 1 36 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:36:52 2026-02-21T09:41:21.1132364Z min.s32 %r192, %r191, 64; 2026-02-21T09:41:21.1132524Z .loc 1 37 45 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:45 2026-02-21T09:41:21.1132579Z and.b32 %r193, %r496, 1023; 2026-02-21T09:41:21.1132739Z .loc 1 38 51 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:38:51 2026-02-21T09:41:21.1132802Z div.s32 %r194, %r193, %r192; 2026-02-21T09:41:21.1132966Z .loc 1 37 64 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:64 2026-02-21T09:41:21.1133027Z mul.lo.s32 %r195, %r194, %r192; 2026-02-21T09:41:21.1133092Z sub.s32 %r196, %r193, %r195; 2026-02-21T09:41:21.1133283Z .loc 1 37 30 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:30 2026-02-21T09:41:21.1133339Z add.s32 %r197, %r196, %r190; 2026-02-21T09:41:21.1133510Z .loc 1 39 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:39:27 2026-02-21T09:41:21.1133565Z shl.b32 %r473, 
%r197, 7; 2026-02-21T09:41:21.1133723Z .loc 1 41 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:41:27 2026-02-21T09:41:21.1133784Z shl.b32 %r469, %r194, 6; 2026-02-21T09:41:21.1133953Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1134004Z bar.sync 0; 2026-02-21T09:41:21.1134066Z and.pred %p1, %p133, %p67; 2026-02-21T09:41:21.1134126Z // begin inline asm 2026-02-21T09:41:21.1134236Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r113], 12288; 2026-02-21T09:41:21.1134291Z // end inline asm 2026-02-21T09:41:21.1134461Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1134514Z bar.sync 0; 2026-02-21T09:41:21.1134577Z elect.sync %r198|%p72, -1; 2026-02-21T09:41:21.1134645Z and.pred %p73, %p67, %p72; 2026-02-21T09:41:21.1134732Z and.pred %p53, %p4, %p73; 2026-02-21T09:41:21.1134790Z add.s32 %r120, %r60, 49152; 2026-02-21T09:41:21.1134844Z // begin inline asm 2026-02-21T09:41:21.1135094Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r120], [%rd40, {%r70, %r469}], [%r113]; 2026-02-21T09:41:21.1135176Z // end inline asm 2026-02-21T09:41:21.1135336Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1135398Z // begin inline asm 2026-02-21T09:41:21.1135469Z fence.proxy.async.shared::cta; 2026-02-21T09:41:21.1135522Z // end inline asm 2026-02-21T09:41:21.1135580Z bar.sync 0; 2026-02-21T09:41:21.1135669Z elect.sync %r199|%p74, -1; 2026-02-21T09:41:21.1135731Z and.pred %p75, %p67, %p74; 2026-02-21T09:41:21.1135791Z and.pred %p54, %p4, %p75; 2026-02-21T09:41:21.1135856Z // begin inline asm 2026-02-21T09:41:21.1136093Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r60], [%rd41, {%r70, %r473}], [%r113]; 2026-02-21T09:41:21.1136147Z // end inline asm 2026-02-21T09:41:21.1136333Z .loc 1 28 108 // 
czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1136387Z bar.sync 0; 2026-02-21T09:41:21.1136441Z // begin inline asm 2026-02-21T09:41:21.1136558Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r114], 12288; 2026-02-21T09:41:21.1136611Z // end inline asm 2026-02-21T09:41:21.1136774Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1136824Z bar.sync 0; 2026-02-21T09:41:21.1136917Z elect.sync %r200|%p76, -1; 2026-02-21T09:41:21.1136978Z and.pred %p77, %p67, %p76; 2026-02-21T09:41:21.1137036Z and.pred %p56, %p4, %p77; 2026-02-21T09:41:21.1137100Z add.s32 %r129, %r60, 53248; 2026-02-21T09:41:21.1137154Z // begin inline asm 2026-02-21T09:41:21.1137389Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r129], [%rd40, {%r63, %r469}], [%r114]; 2026-02-21T09:41:21.1137450Z // end inline asm 2026-02-21T09:41:21.1137610Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1137661Z bar.sync 0; 2026-02-21T09:41:21.1137719Z elect.sync %r201|%p78, -1; 2026-02-21T09:41:21.1137785Z and.pred %p79, %p67, %p78; 2026-02-21T09:41:21.1137842Z and.pred %p57, %p4, %p79; 2026-02-21T09:41:21.1137897Z add.s32 %r133, %r60, 8192; 2026-02-21T09:41:21.1137956Z // begin inline asm 2026-02-21T09:41:21.1138186Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd41, {%r63, %r473}], [%r114]; 2026-02-21T09:41:21.1138239Z // end inline asm 2026-02-21T09:41:21.1138416Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1138505Z bar.sync 0; 2026-02-21T09:41:21.1138559Z // begin inline asm 2026-02-21T09:41:21.1138664Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r115], 12288; 2026-02-21T09:41:21.1138724Z // end inline asm 2026-02-21T09:41:21.1138883Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 
2026-02-21T09:41:21.1138934Z bar.sync 0; 2026-02-21T09:41:21.1138996Z elect.sync %r202|%p80, -1; 2026-02-21T09:41:21.1139055Z and.pred %p81, %p67, %p80; 2026-02-21T09:41:21.1139113Z and.pred %p59, %p4, %p81; 2026-02-21T09:41:21.1139169Z add.s32 %r138, %r60, 57344; 2026-02-21T09:41:21.1139230Z // begin inline asm 2026-02-21T09:41:21.1139498Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r138], [%rd40, {%r64, %r469}], [%r115]; 2026-02-21T09:41:21.1139552Z // end inline asm 2026-02-21T09:41:21.1139720Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1139772Z bar.sync 0; 2026-02-21T09:41:21.1139830Z elect.sync %r203|%p82, -1; 2026-02-21T09:41:21.1139895Z and.pred %p83, %p67, %p82; 2026-02-21T09:41:21.1139952Z and.pred %p60, %p4, %p83; 2026-02-21T09:41:21.1140010Z add.s32 %r142, %r60, 16384; 2026-02-21T09:41:21.1140066Z // begin inline asm 2026-02-21T09:41:21.1140315Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r142], [%rd41, {%r64, %r473}], [%r115]; 2026-02-21T09:41:21.1140400Z // end inline asm 2026-02-21T09:41:21.1140576Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1140638Z bar.sync 0; 2026-02-21T09:41:21.1140694Z // begin inline asm 2026-02-21T09:41:21.1140820Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r116], 12288; 2026-02-21T09:41:21.1140882Z // end inline asm 2026-02-21T09:41:21.1141049Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1141104Z bar.sync 0; 2026-02-21T09:41:21.1141165Z elect.sync %r204|%p84, -1; 2026-02-21T09:41:21.1141233Z and.pred %p85, %p67, %p84; 2026-02-21T09:41:21.1141292Z and.pred %p62, %p4, %p85; 2026-02-21T09:41:21.1141350Z add.s32 %r147, %r60, 61440; 2026-02-21T09:41:21.1141412Z mov.b32 %r148, 96; 2026-02-21T09:41:21.1141467Z // begin inline asm 2026-02-21T09:41:21.1141713Z @%p62 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd40, {%r148, %r469}], [%r116]; 2026-02-21T09:41:21.1141776Z // end inline asm 2026-02-21T09:41:21.1141943Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1141995Z bar.sync 0; 2026-02-21T09:41:21.1142079Z elect.sync %r205|%p86, -1; 2026-02-21T09:41:21.1142149Z and.pred %p87, %p67, %p86; 2026-02-21T09:41:21.1142210Z and.pred %p63, %p4, %p87; 2026-02-21T09:41:21.1142271Z add.s32 %r151, %r60, 24576; 2026-02-21T09:41:21.1142333Z // begin inline asm 2026-02-21T09:41:21.1142579Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r151], [%rd41, {%r148, %r473}], [%r116]; 2026-02-21T09:41:21.1142634Z // end inline asm 2026-02-21T09:41:21.1142820Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1142872Z bar.sync 0; 2026-02-21T09:41:21.1142927Z // begin inline asm 2026-02-21T09:41:21.1143035Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r117], 12288; 2026-02-21T09:41:21.1143095Z // end inline asm 2026-02-21T09:41:21.1143263Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1143316Z bar.sync 0; 2026-02-21T09:41:21.1143386Z elect.sync %r206|%p88, -1; 2026-02-21T09:41:21.1143448Z and.pred %p89, %p67, %p88; 2026-02-21T09:41:21.1143509Z and.pred %p65, %p4, %p89; 2026-02-21T09:41:21.1143600Z add.s32 %r156, %r60, 65536; 2026-02-21T09:41:21.1143657Z // begin inline asm 2026-02-21T09:41:21.1143903Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r156], [%rd40, {%r72, %r469}], [%r117]; 2026-02-21T09:41:21.1143958Z // end inline asm 2026-02-21T09:41:21.1144139Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1144193Z bar.sync 0; 2026-02-21T09:41:21.1144253Z elect.sync %r207|%p90, -1; 2026-02-21T09:41:21.1144324Z and.pred %p91, %p67, 
%p90; 2026-02-21T09:41:21.1144384Z and.pred %p66, %p4, %p91; 2026-02-21T09:41:21.1144442Z add.s32 %r160, %r60, 32768; 2026-02-21T09:41:21.1144505Z // begin inline asm 2026-02-21T09:41:21.1144776Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd41, {%r72, %r473}], [%r117]; 2026-02-21T09:41:21.1144834Z // end inline asm 2026-02-21T09:41:21.1145005Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1145068Z bar.sync 0; 2026-02-21T09:41:21.1145125Z // begin inline asm 2026-02-21T09:41:21.1145177Z 2026-02-21T09:41:21.1145239Z { 2026-02-21T09:41:21.1145304Z @!%p67 bra.uni skipWait; 2026-02-21T09:41:21.1145366Z .reg .pred complete; 2026-02-21T09:41:21.1145424Z waitLoop: 2026-02-21T09:41:21.1145550Z mbarrier.try_wait.parity.shared.b64 complete, [%r113], %r70; 2026-02-21T09:41:21.1145615Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.1145698Z skipWait: 2026-02-21T09:41:21.1145757Z } 2026-02-21T09:41:21.1145761Z 2026-02-21T09:41:21.1145815Z // end inline asm 2026-02-21T09:41:21.1145988Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1146064Z setp.ne.b32 %p92, %r12, 0; 2026-02-21T09:41:21.1146128Z or.pred %p93, %p71, %p92; 2026-02-21T09:41:21.1146218Z @%p93 bra $L__BB0_2; 2026-02-21T09:41:21.1146273Z // %bb.1: 2026-02-21T09:41:21.1146345Z elect.sync %r212|%p95, -1; 2026-02-21T09:41:21.1146409Z bfe.u32 %r215, %r120, 4, 14; 2026-02-21T09:41:21.1146467Z cvt.u64.u32 %rd57, %r215; 2026-02-21T09:41:21.1146549Z or.b64 %rd52, %rd57, -9223371899399045120; 2026-02-21T09:41:21.1146608Z bfe.u32 %r216, %r60, 4, 14; 2026-02-21T09:41:21.1146668Z cvt.u64.u32 %rd58, %r216; 2026-02-21T09:41:21.1146741Z or.b64 %rd53, %rd58, -9223371899382267904; 2026-02-21T09:41:21.1146809Z mov.b32 %r209, 69206032; 2026-02-21T09:41:21.1146866Z mov.pred %p94, 0; 2026-02-21T09:41:21.1146925Z // begin inline asm 2026-02-21T09:41:21.1147081Z @%p95 
tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd52, %rd53, %r209, %p94; 2026-02-21T09:41:21.1147137Z // end inline asm 2026-02-21T09:41:21.1147195Z add.s32 %r217, %r60, 49184; 2026-02-21T09:41:21.1147263Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T09:41:21.1147368Z cvt.u64.u32 %rd59, %r218; 2026-02-21T09:41:21.1147442Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:21.1147503Z add.s32 %r219, %r60, 32; 2026-02-21T09:41:21.1147570Z bfe.u32 %r220, %r219, 4, 14; 2026-02-21T09:41:21.1147629Z cvt.u64.u32 %rd60, %r220; 2026-02-21T09:41:21.1147707Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:21.1147769Z // begin inline asm 2026-02-21T09:41:21.1147903Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd54, %rd55, %r209, %p42; 2026-02-21T09:41:21.1147955Z // end inline asm 2026-02-21T09:41:21.1148010Z add.s32 %r221, %r60, 77872; 2026-02-21T09:41:21.1148074Z cvt.u64.u32 %rd56, %r221; 2026-02-21T09:41:21.1148129Z // begin inline asm 2026-02-21T09:41:21.1148249Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 2026-02-21T09:41:21.1148309Z // end inline asm 2026-02-21T09:41:21.1148360Z $L__BB0_2: 2026-02-21T09:41:21.1148536Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1148596Z bar.sync 0; 2026-02-21T09:41:21.1148650Z // begin inline asm 2026-02-21T09:41:21.1148781Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r222], 12288; 2026-02-21T09:41:21.1148835Z // end inline asm 2026-02-21T09:41:21.1149012Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1149065Z bar.sync 0; 2026-02-21T09:41:21.1149125Z elect.sync %r232|%p105, -1; 2026-02-21T09:41:21.1149192Z and.pred %p106, %p67, %p105; 2026-02-21T09:41:21.1149251Z and.pred %p100, %p4, %p106; 2026-02-21T09:41:21.1149306Z add.s32 %r223, %r60, 69632; 2026-02-21T09:41:21.1149367Z mov.b32 %r481, 160; 2026-02-21T09:41:21.1149421Z // begin inline asm 2026-02-21T09:41:21.1149661Z 
@%p100 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r223], [%rd40, {%r481, %r469}], [%r222]; 2026-02-21T09:41:21.1149714Z // end inline asm 2026-02-21T09:41:21.1149888Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1149940Z bar.sync 0; 2026-02-21T09:41:21.1150000Z elect.sync %r233|%p107, -1; 2026-02-21T09:41:21.1150067Z and.pred %p108, %p67, %p107; 2026-02-21T09:41:21.1150125Z and.pred %p101, %p4, %p108; 2026-02-21T09:41:21.1150179Z add.s32 %r227, %r60, 40960; 2026-02-21T09:41:21.1150238Z // begin inline asm 2026-02-21T09:41:21.1150477Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r227], [%rd41, {%r481, %r473}], [%r222]; 2026-02-21T09:41:21.1150530Z // end inline asm 2026-02-21T09:41:21.1150698Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1150785Z @%p71 bra $L__BB0_12; 2026-02-21T09:41:21.1150861Z // %bb.3: // %.lr.ph 2026-02-21T09:41:21.1151030Z .loc 1 0 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:0:108 2026-02-21T09:41:21.1151095Z and.b32 %r4, %r1, 15; 2026-02-21T09:41:21.1151172Z shr.u32 %r175, %r1, 4; 2026-02-21T09:41:21.1151231Z bfe.u32 %r6, %r1, 4, 4; 2026-02-21T09:41:21.1151318Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:21.1151374Z shl.b32 %r5, %r4, 3; 2026-02-21T09:41:21.1151426Z or.b32 %r7, %r6, 16; 2026-02-21T09:41:21.1151480Z or.b32 %r8, %r6, 32; 2026-02-21T09:41:21.1151542Z or.b32 %r9, %r175, 48; 2026-02-21T09:41:21.1151596Z shl.b32 %r11, %r10, 5; 2026-02-21T09:41:21.1151652Z add.s32 %r16, %r11, -6; 2026-02-21T09:41:21.1151714Z shl.b32 %r242, %r1, 9; 2026-02-21T09:41:21.1151768Z and.b32 %r243, %r242, 3072; 2026-02-21T09:41:21.1151822Z shl.b32 %r244, %r4, 4; 2026-02-21T09:41:21.1151876Z shl.b32 %r245, %r1, 3; 2026-02-21T09:41:21.1151940Z and.b32 %r246, %r245, 768; 2026-02-21T09:41:21.1151993Z shl.b32 %r247, %r1, 1; 
2026-02-21T09:41:21.1152048Z and.b32 %r248, %r247, 32; 2026-02-21T09:41:21.1152108Z shr.u32 %r249, %r1, 1; 2026-02-21T09:41:21.1152184Z and.b32 %r250, %r249, 64; 2026-02-21T09:41:21.1152241Z or.b32 %r251, %r244, %r246; 2026-02-21T09:41:21.1152296Z or.b32 %r252, %r248, %r250; 2026-02-21T09:41:21.1152361Z xor.b32 %r253, %r251, %r252; 2026-02-21T09:41:21.1152414Z add.s32 %r255, %r60, 73728; 2026-02-21T09:41:21.1152471Z add.s32 %r256, %r255, %r243; 2026-02-21T09:41:21.1152533Z add.s32 %r17, %r256, %r253; 2026-02-21T09:41:21.1152587Z shl.b32 %r257, %r1, 5; 2026-02-21T09:41:21.1152641Z and.b32 %r258, %r257, 3936; 2026-02-21T09:41:21.1152697Z and.b32 %r259, %r1, 224; 2026-02-21T09:41:21.1152761Z and.b32 %r261, %r173, 16; 2026-02-21T09:41:21.1152816Z xor.b32 %r262, %r258, %r259; 2026-02-21T09:41:21.1152873Z add.s32 %r263, %r255, %r261; 2026-02-21T09:41:21.1152939Z add.s32 %r368, %r263, %r262; 2026-02-21T09:41:21.1153105Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1153160Z max.s32 %r264, %r11, 2; 2026-02-21T09:41:21.1153224Z add.s32 %r19, %r264, -1; 2026-02-21T09:41:21.1153284Z mov.pred %p142, -1; 2026-02-21T09:41:21.1153337Z mov.b32 %r486, 5; 2026-02-21T09:41:21.1153414Z mov.b32 %r482, 0; 2026-02-21T09:41:21.1153472Z mov.b32 %r480, 1; 2026-02-21T09:41:21.1153523Z mov.b32 %r479, 2; 2026-02-21T09:41:21.1153574Z mov.b32 %r478, 3; 2026-02-21T09:41:21.1153629Z mov.b32 %r477, 4; 2026-02-21T09:41:21.1153683Z mov.b32 %r470, %r469; 2026-02-21T09:41:21.1153736Z mov.b32 %r471, %r469; 2026-02-21T09:41:21.1153789Z mov.b32 %r472, %r469; 2026-02-21T09:41:21.1153850Z mov.b32 %r474, %r473; 2026-02-21T09:41:21.1153903Z mov.b32 %r475, %r473; 2026-02-21T09:41:21.1153955Z mov.b32 %r476, %r473; 2026-02-21T09:41:21.1154015Z mov.b32 %r483, %r111; 2026-02-21T09:41:21.1154068Z mov.b32 %r484, %r482; 2026-02-21T09:41:21.1154120Z mov.b32 %r485, %r482; 2026-02-21T09:41:21.1154171Z mov.b32 %r487, %r480; 2026-02-21T09:41:21.1154232Z mov.b32 %r488, 
%r482; 2026-02-21T09:41:21.1154284Z mov.b32 %r489, %r469; 2026-02-21T09:41:21.1154337Z mov.b32 %r490, %r473; 2026-02-21T09:41:21.1154398Z mov.b32 %r492, %r486; 2026-02-21T09:41:21.1154450Z mov.b32 %r493, %r482; 2026-02-21T09:41:21.1154503Z mov.b32 %r494, %r490; 2026-02-21T09:41:21.1154554Z mov.b32 %r495, %r489; 2026-02-21T09:41:21.1154616Z bra.uni $L__BB0_4; 2026-02-21T09:41:21.1154747Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1154913Z .loc 1 0 0 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:0 2026-02-21T09:41:21.1154983Z selp.b32 %r487, 0, %r316, %p126; 2026-02-21T09:41:21.1155042Z selp.b32 %r317, 1, 0, %p126; 2026-02-21T09:41:21.1155096Z xor.b32 %r488, %r458, %r317; 2026-02-21T09:41:21.1155312Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1155369Z add.s32 %r493, %r493, 1; 2026-02-21T09:41:21.1155433Z setp.ne.b32 %p132, %r19, %r493; 2026-02-21T09:41:21.1155487Z mov.b32 %r469, %r489; 2026-02-21T09:41:21.1155549Z mov.b32 %r472, %r22; 2026-02-21T09:41:21.1155629Z mov.b32 %r473, %r490; 2026-02-21T09:41:21.1155685Z mov.b32 %r476, %r26; 2026-02-21T09:41:21.1155745Z mov.b32 %r477, %r492; 2026-02-21T09:41:21.1155799Z mov.b32 %r480, %r30; 2026-02-21T09:41:21.1155852Z mov.b32 %r482, %r458; 2026-02-21T09:41:21.1155905Z mov.b32 %r483, %r457; 2026-02-21T09:41:21.1155965Z mov.b32 %r489, %r495; 2026-02-21T09:41:21.1156017Z mov.b32 %r490, %r494; 2026-02-21T09:41:21.1156068Z mov.b32 %r492, %r45; 2026-02-21T09:41:21.1156129Z @%p132 bra $L__BB0_4; 2026-02-21T09:41:21.1156184Z bra.uni $L__BB0_11; 2026-02-21T09:41:21.1156283Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:21.1156452Z .loc 1 0 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:0:108 2026-02-21T09:41:21.1156510Z mov.b32 %r458, %r488; 2026-02-21T09:41:21.1156561Z mov.b32 %r30, %r479; 2026-02-21T09:41:21.1156613Z mov.b32 %r479, %r478; 2026-02-21T09:41:21.1156696Z mov.b32 %r478, %r477; 
2026-02-21T09:41:21.1156751Z mov.b32 %r26, %r475; 2026-02-21T09:41:21.1156803Z mov.b32 %r475, %r474; 2026-02-21T09:41:21.1156863Z mov.b32 %r474, %r473; 2026-02-21T09:41:21.1156915Z mov.b32 %r22, %r471; 2026-02-21T09:41:21.1156967Z mov.b32 %r471, %r470; 2026-02-21T09:41:21.1157018Z mov.b32 %r470, %r469; 2026-02-21T09:41:21.1157193Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1157249Z add.s32 %r265, %r492, 1; 2026-02-21T09:41:21.1157310Z setp.eq.b32 %p110, %r492, 31; 2026-02-21T09:41:21.1157376Z selp.b32 %r45, 0, %r265, %p110; 2026-02-21T09:41:21.1157436Z setp.ne.b32 %p111, %r45, 0; 2026-02-21T09:41:21.1157491Z @%p111 bra $L__BB0_6; 2026-02-21T09:41:21.1157585Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1157647Z add.s32 %r496, %r496, 4736; 2026-02-21T09:41:21.1157814Z .loc 1 34 35 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:34:35 2026-02-21T09:41:21.1157871Z shr.s32 %r266, %r496, 31; 2026-02-21T09:41:21.1157934Z shr.u32 %r267, %r266, 22; 2026-02-21T09:41:21.1158017Z add.s32 %r268, %r496, %r267; 2026-02-21T09:41:21.1158072Z shr.s32 %r269, %r268, 10; 2026-02-21T09:41:21.1158247Z .loc 1 35 33 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:35:33 2026-02-21T09:41:21.1158301Z shl.b32 %r270, %r269, 6; 2026-02-21T09:41:21.1158468Z .loc 1 36 39 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:36:39 2026-02-21T09:41:21.1158522Z sub.s32 %r271, 96, %r270; 2026-02-21T09:41:21.1158697Z .loc 1 36 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:36:52 2026-02-21T09:41:21.1158751Z min.s32 %r272, %r271, 64; 2026-02-21T09:41:21.1158914Z .loc 1 37 45 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:45 2026-02-21T09:41:21.1158978Z and.b32 %r273, %r268, -1024; 2026-02-21T09:41:21.1159035Z sub.s32 %r274, %r496, %r273; 2026-02-21T09:41:21.1159200Z .loc 1 38 51 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:38:51 2026-02-21T09:41:21.1159264Z 
div.s32 %r275, %r274, %r272; 2026-02-21T09:41:21.1159428Z .loc 1 37 64 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:64 2026-02-21T09:41:21.1159487Z mul.lo.s32 %r276, %r275, %r272; 2026-02-21T09:41:21.1159549Z sub.s32 %r277, %r274, %r276; 2026-02-21T09:41:21.1159716Z .loc 1 37 30 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:37:30 2026-02-21T09:41:21.1159795Z add.s32 %r278, %r277, %r270; 2026-02-21T09:41:21.1159958Z .loc 1 39 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:39:27 2026-02-21T09:41:21.1160021Z shl.b32 %r494, %r278, 7; 2026-02-21T09:41:21.1160185Z .loc 1 41 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:41:27 2026-02-21T09:41:21.1160265Z shl.b32 %r495, %r275, 6; 2026-02-21T09:41:21.1160374Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1160545Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1160601Z add.s32 %r281, %r485, 1; 2026-02-21T09:41:21.1160673Z setp.gt.s32 %p113, %r281, 5; 2026-02-21T09:41:21.1160734Z selp.b32 %r485, 0, %r281, %p113; 2026-02-21T09:41:21.1160792Z selp.b32 %r282, 1, 0, %p113; 2026-02-21T09:41:21.1160846Z xor.b32 %r484, %r484, %r282; 2026-02-21T09:41:21.1160908Z shl.b32 %r283, %r485, 3; 2026-02-21T09:41:21.1160965Z add.s32 %r285, %r60, %r283; 2026-02-21T09:41:21.1161021Z add.s32 %r279, %r285, 77824; 2026-02-21T09:41:21.1161080Z bar.sync 0; 2026-02-21T09:41:21.1161135Z // begin inline asm 2026-02-21T09:41:21.1161185Z 2026-02-21T09:41:21.1161233Z { 2026-02-21T09:41:21.1161298Z .reg .pred complete; 2026-02-21T09:41:21.1161350Z waitLoop: 2026-02-21T09:41:21.1161488Z mbarrier.try_wait.parity.shared.b64 complete, [%r279], %r484; 2026-02-21T09:41:21.1161558Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.1161608Z } 2026-02-21T09:41:21.1161612Z 2026-02-21T09:41:21.1161665Z // end inline asm 2026-02-21T09:41:21.1161727Z shl.b32 %r286, %r487, 3; 2026-02-21T09:41:21.1161782Z add.s32 %r287, %r60, %r286; 
2026-02-21T09:41:21.1161836Z add.s32 %r457, %r287, 77872; 2026-02-21T09:41:21.1162003Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1162065Z @%p92 bra $L__BB0_8; 2026-02-21T09:41:21.1162154Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1162319Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1162381Z shl.b32 %r292, %r485, 12; 2026-02-21T09:41:21.1162435Z add.s32 %r294, %r60, %r292; 2026-02-21T09:41:21.1162489Z add.s32 %r295, %r294, 49152; 2026-02-21T09:41:21.1162661Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1162742Z shl.b32 %r296, %r485, 13; 2026-02-21T09:41:21.1162798Z add.s32 %r297, %r60, %r296; 2026-02-21T09:41:21.1162964Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1163031Z elect.sync %r298|%p115, -1; 2026-02-21T09:41:21.1163086Z bfe.u32 %r299, %r295, 4, 14; 2026-02-21T09:41:21.1163141Z cvt.u64.u32 %rd68, %r299; 2026-02-21T09:41:21.1163215Z or.b64 %rd63, %rd68, -9223371899399045120; 2026-02-21T09:41:21.1163270Z bfe.u32 %r300, %r297, 4, 14; 2026-02-21T09:41:21.1163327Z cvt.u64.u32 %rd69, %r300; 2026-02-21T09:41:21.1163394Z or.b64 %rd64, %rd69, -9223371899382267904; 2026-02-21T09:41:21.1163456Z mov.b32 %r289, 69206032; 2026-02-21T09:41:21.1163510Z // begin inline asm 2026-02-21T09:41:21.1163652Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd63, %rd64, %r289, %p142; 2026-02-21T09:41:21.1163714Z // end inline asm 2026-02-21T09:41:21.1163769Z add.s32 %r301, %r294, 49184; 2026-02-21T09:41:21.1163825Z bfe.u32 %r302, %r301, 4, 14; 2026-02-21T09:41:21.1163887Z cvt.u64.u32 %rd70, %r302; 2026-02-21T09:41:21.1163952Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:21.1164006Z add.s32 %r303, %r297, 32; 2026-02-21T09:41:21.1164059Z bfe.u32 %r304, %r303, 4, 14; 2026-02-21T09:41:21.1164120Z cvt.u64.u32 %rd71, %r304; 
2026-02-21T09:41:21.1164185Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:21.1164242Z mov.pred %p116, -1; 2026-02-21T09:41:21.1164301Z // begin inline asm 2026-02-21T09:41:21.1164457Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r467 + 0 ], %rd65, %rd66, %r289, %p116; 2026-02-21T09:41:21.1164510Z // end inline asm 2026-02-21T09:41:21.1164563Z cvt.u64.u32 %rd67, %r457; 2026-02-21T09:41:21.1164623Z // begin inline asm 2026-02-21T09:41:21.1164779Z @%p115 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:41:21.1164876Z // end inline asm 2026-02-21T09:41:21.1164976Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1165152Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1165211Z setp.eq.b32 %p122, %r45, 0; 2026-02-21T09:41:21.1165278Z setp.lt.s32 %p123, %r493, %r16; 2026-02-21T09:41:21.1165443Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1165498Z // begin inline asm 2026-02-21T09:41:21.1165552Z 2026-02-21T09:41:21.1165600Z { 2026-02-21T09:41:21.1165659Z .reg .pred complete; 2026-02-21T09:41:21.1165713Z waitLoop: 2026-02-21T09:41:21.1165836Z mbarrier.try_wait.parity.shared.b64 complete, [%r483], %r482; 2026-02-21T09:41:21.1165897Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.1165945Z } 2026-02-21T09:41:21.1165948Z 2026-02-21T09:41:21.1166007Z // end inline asm 2026-02-21T09:41:21.1166089Z add.s32 %r316, %r487, 1; 2026-02-21T09:41:21.1166150Z setp.gt.s32 %p126, %r316, 1; 2026-02-21T09:41:21.1166324Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1166386Z add.s32 %r318, %r481, 32; 2026-02-21T09:41:21.1166441Z add.s32 %r319, %r486, 1; 2026-02-21T09:41:21.1166499Z setp.gt.s32 %p127, %r319, 5; 2026-02-21T09:41:21.1166567Z selp.b32 %r486, 0, %r319, %p127; 2026-02-21T09:41:21.1166627Z selp.b32 %r481, 0, %r318, %p122; 2026-02-21T09:41:21.1166681Z shl.b32 %r320, 
%r486, 3; 2026-02-21T09:41:21.1166736Z add.s32 %r322, %r60, %r320; 2026-02-21T09:41:21.1166800Z add.s32 %r311, %r322, 77824; 2026-02-21T09:41:21.1166852Z bar.sync 0; 2026-02-21T09:41:21.1166915Z and.pred %p119, %p133, %p123; 2026-02-21T09:41:21.1166978Z // begin inline asm 2026-02-21T09:41:21.1167090Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r311], 12288; 2026-02-21T09:41:21.1167143Z // end inline asm 2026-02-21T09:41:21.1167319Z .loc 1 51 31 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:51:31 2026-02-21T09:41:21.1167402Z shl.b32 %r323, %r486, 12; 2026-02-21T09:41:21.1167457Z add.s32 %r324, %r60, %r323; 2026-02-21T09:41:21.1167512Z add.s32 %r308, %r324, 49152; 2026-02-21T09:41:21.1167573Z bar.sync 0; 2026-02-21T09:41:21.1167634Z elect.sync %r325|%p128, -1; 2026-02-21T09:41:21.1167697Z and.pred %p129, %p123, %p128; 2026-02-21T09:41:21.1167766Z and.pred %p120, %p4, %p129; 2026-02-21T09:41:21.1167822Z // begin inline asm 2026-02-21T09:41:21.1168067Z @%p120 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r308], [%rd40, {%r481, %r495}], [%r311]; 2026-02-21T09:41:21.1168133Z // end inline asm 2026-02-21T09:41:21.1168297Z .loc 1 52 44 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:52:44 2026-02-21T09:41:21.1168354Z shl.b32 %r326, %r486, 13; 2026-02-21T09:41:21.1168409Z add.s32 %r312, %r60, %r326; 2026-02-21T09:41:21.1168470Z bar.sync 0; 2026-02-21T09:41:21.1168529Z elect.sync %r327|%p130, -1; 2026-02-21T09:41:21.1168591Z and.pred %p131, %p123, %p130; 2026-02-21T09:41:21.1168660Z and.pred %p121, %p4, %p131; 2026-02-21T09:41:21.1168716Z // begin inline asm 2026-02-21T09:41:21.1168956Z @%p121 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r312], [%rd41, {%r481, %r494}], [%r311]; 2026-02-21T09:41:21.1169015Z // end inline asm 2026-02-21T09:41:21.1169180Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1169240Z setp.ne.b32 %p142, 
%r480, 31; 2026-02-21T09:41:21.1169323Z @%p142 bra $L__BB0_10; 2026-02-21T09:41:21.1169422Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.1169591Z .loc 1 40 32 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:40:32 2026-02-21T09:41:21.1169650Z add.s32 %r400, %r476, %r5; 2026-02-21T09:41:21.1169850Z .loc 1 42 32 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:42:32 2026-02-21T09:41:21.1169912Z add.s32 %r401, %r472, %r6; 2026-02-21T09:41:21.1169967Z add.s32 %r402, %r7, %r472; 2026-02-21T09:41:21.1170029Z add.s32 %r403, %r8, %r472; 2026-02-21T09:41:21.1170082Z add.s32 %r404, %r472, %r9; 2026-02-21T09:41:21.1170248Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1170302Z // begin inline asm 2026-02-21T09:41:21.1170357Z 2026-02-21T09:41:21.1170405Z { 2026-02-21T09:41:21.1170462Z .reg .pred complete; 2026-02-21T09:41:21.1170523Z waitLoop: 2026-02-21T09:41:21.1170637Z mbarrier.try_wait.parity.shared.b64 complete, [%r457], %r458; 2026-02-21T09:41:21.1170698Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.1170745Z } 2026-02-21T09:41:21.1170748Z 2026-02-21T09:41:21.1170808Z // end inline asm 2026-02-21T09:41:21.1171001Z .loc 1 56 53 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:56:53 2026-02-21T09:41:21.1171070Z mad.lo.s32 %r405, %r401, 12288, %r400; 2026-02-21T09:41:21.1171143Z mad.lo.s32 %r406, %r402, 12288, %r400; 2026-02-21T09:41:21.1171204Z mad.lo.s32 %r407, %r403, 12288, %r400; 2026-02-21T09:41:21.1171265Z mad.lo.s32 %r408, %r404, 12288, %r400; 2026-02-21T09:41:21.1171439Z .loc 1 56 24 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:56:24 2026-02-21T09:41:21.1171505Z mad.wide.s32 %rd74, %r405, 2, %rd3; 2026-02-21T09:41:21.1171567Z mad.wide.s32 %rd75, %r406, 2, %rd3; 2026-02-21T09:41:21.1171627Z mad.wide.s32 %rd76, %r407, 2, %rd3; 2026-02-21T09:41:21.1171695Z mad.wide.s32 %rd77, %r408, 2, %rd3; 2026-02-21T09:41:21.1171863Z .loc 1 53 52 // 
czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1171917Z // begin inline asm 2026-02-21T09:41:21.1172204Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345}, [%r77 + 0], 32; 2026-02-21T09:41:21.1172258Z // end inline asm 2026-02-21T09:41:21.1172335Z // begin inline asm 2026-02-21T09:41:21.1172624Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r347, %r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362}, [%r77 + 16], 32; 2026-02-21T09:41:21.1172676Z // end inline asm 2026-02-21T09:41:21.1172729Z // begin inline asm 2026-02-21T09:41:21.1172799Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:21.1172850Z // end inline asm 2026-02-21T09:41:21.1172907Z cvt.u64.u32 %rd78, %r330; 2026-02-21T09:41:21.1172996Z cvt.u64.u32 %rd79, %r331; 2026-02-21T09:41:21.1173071Z shl.b64 %rd80, %rd79, 32; 2026-02-21T09:41:21.1173138Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:41:21.1173347Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1173422Z mov.b64 {%r409, %r410}, %rd81; 2026-02-21T09:41:21.1173491Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:41:21.1173685Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1173759Z cvt.u64.u32 %rd82, %r332; 2026-02-21T09:41:21.1173814Z cvt.u64.u32 %rd83, %r333; 2026-02-21T09:41:21.1173869Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:21.1173925Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:21.1174123Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1174182Z mov.b64 {%r412, %r413}, %rd85; 2026-02-21T09:41:21.1174248Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:41:21.1174456Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1174513Z cvt.u64.u32 %rd86, %r334; 
2026-02-21T09:41:21.1174568Z cvt.u64.u32 %rd87, %r335; 2026-02-21T09:41:21.1174622Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:21.1174722Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:21.1174936Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1174998Z mov.b64 {%r415, %r416}, %rd89; 2026-02-21T09:41:21.1175068Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:41:21.1175257Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1175312Z cvt.u64.u32 %rd90, %r336; 2026-02-21T09:41:21.1175375Z cvt.u64.u32 %rd91, %r337; 2026-02-21T09:41:21.1175430Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:21.1175487Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:21.1175670Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1175734Z mov.b64 {%r418, %r419}, %rd93; 2026-02-21T09:41:21.1175795Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:41:21.1176003Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1176070Z cvt.u64.u32 %rd94, %r338; 2026-02-21T09:41:21.1176127Z cvt.u64.u32 %rd95, %r339; 2026-02-21T09:41:21.1176186Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:21.1176250Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:21.1176441Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1176500Z mov.b64 {%r421, %r422}, %rd97; 2026-02-21T09:41:21.1176561Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:41:21.1176757Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1176815Z cvt.u64.u32 %rd98, %r340; 2026-02-21T09:41:21.1176872Z cvt.u64.u32 %rd99, %r341; 2026-02-21T09:41:21.1176943Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:21.1177003Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:41:21.1177192Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1177265Z 
mov.b64 {%r424, %r425}, %rd101; 2026-02-21T09:41:21.1177330Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:41:21.1177542Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1177603Z cvt.u64.u32 %rd102, %r342; 2026-02-21T09:41:21.1177667Z cvt.u64.u32 %rd103, %r343; 2026-02-21T09:41:21.1177725Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:21.1177786Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:41:21.1177962Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1178022Z mov.b64 {%r427, %r428}, %rd105; 2026-02-21T09:41:21.1178085Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:41:21.1178273Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1178331Z cvt.u64.u32 %rd106, %r344; 2026-02-21T09:41:21.1178388Z cvt.u64.u32 %rd107, %r345; 2026-02-21T09:41:21.1178448Z shl.b64 %rd108, %rd107, 32; 2026-02-21T09:41:21.1178526Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:21.1178693Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1178751Z mov.b64 {%r430, %r431}, %rd109; 2026-02-21T09:41:21.1178818Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:41:21.1178979Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1179035Z cvt.u64.u32 %rd110, %r347; 2026-02-21T09:41:21.1179099Z cvt.u64.u32 %rd111, %r348; 2026-02-21T09:41:21.1179155Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:21.1179237Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:21.1179403Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1179466Z mov.b64 {%r433, %r434}, %rd113; 2026-02-21T09:41:21.1179525Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:41:21.1179717Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1179785Z cvt.u64.u32 %rd114, %r349; 
2026-02-21T09:41:21.1179841Z cvt.u64.u32 %rd115, %r350; 2026-02-21T09:41:21.1179897Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:21.1179960Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:21.1180129Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1180186Z mov.b64 {%r436, %r437}, %rd117; 2026-02-21T09:41:21.1180246Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:41:21.1180417Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1180475Z cvt.u64.u32 %rd118, %r351; 2026-02-21T09:41:21.1180529Z cvt.u64.u32 %rd119, %r352; 2026-02-21T09:41:21.1180591Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:21.1180647Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:21.1180836Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1180901Z mov.b64 {%r439, %r440}, %rd121; 2026-02-21T09:41:21.1180962Z cvt.rn.f16x2.f32 %r441, %r440, %r439; 2026-02-21T09:41:21.1181128Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1181184Z cvt.u64.u32 %rd122, %r353; 2026-02-21T09:41:21.1181246Z cvt.u64.u32 %rd123, %r354; 2026-02-21T09:41:21.1181302Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:21.1181358Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:21.1181528Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1181587Z mov.b64 {%r442, %r443}, %rd125; 2026-02-21T09:41:21.1181647Z cvt.rn.f16x2.f32 %r444, %r443, %r442; 2026-02-21T09:41:21.1181815Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1181872Z cvt.u64.u32 %rd126, %r355; 2026-02-21T09:41:21.1181929Z cvt.u64.u32 %rd127, %r356; 2026-02-21T09:41:21.1181985Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:21.1182069Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:41:21.1182234Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 
2026-02-21T09:41:21.1182290Z mov.b64 {%r445, %r446}, %rd129; 2026-02-21T09:41:21.1182358Z cvt.rn.f16x2.f32 %r447, %r446, %r445; 2026-02-21T09:41:21.1182526Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1182581Z cvt.u64.u32 %rd130, %r357; 2026-02-21T09:41:21.1182643Z cvt.u64.u32 %rd131, %r358; 2026-02-21T09:41:21.1182698Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:21.1182754Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:41:21.1182918Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1182981Z mov.b64 {%r448, %r449}, %rd133; 2026-02-21T09:41:21.1183042Z cvt.rn.f16x2.f32 %r450, %r449, %r448; 2026-02-21T09:41:21.1183205Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1183270Z cvt.u64.u32 %rd134, %r359; 2026-02-21T09:41:21.1183324Z cvt.u64.u32 %rd135, %r360; 2026-02-21T09:41:21.1183379Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:41:21.1183441Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:21.1183603Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1183659Z mov.b64 {%r451, %r452}, %rd137; 2026-02-21T09:41:21.1183718Z cvt.rn.f16x2.f32 %r453, %r452, %r451; 2026-02-21T09:41:21.1183938Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1183994Z cvt.u64.u32 %rd138, %r361; 2026-02-21T09:41:21.1184049Z cvt.u64.u32 %rd139, %r362; 2026-02-21T09:41:21.1184112Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:21.1184193Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:21.1184357Z .loc 1 55 27 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:55:27 2026-02-21T09:41:21.1184420Z mov.b64 {%r454, %r455}, %rd141; 2026-02-21T09:41:21.1184479Z cvt.rn.f16x2.f32 %r456, %r455, %r454; 2026-02-21T09:41:21.1184643Z .loc 1 56 83 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:56:83 2026-02-21T09:41:21.1184776Z st.shared.v4.b32 
[%r17], {%r411, %r423, %r435, %r447}; 2026-02-21T09:41:21.1184839Z bar.sync 0; 2026-02-21T09:41:21.1184895Z // begin inline asm 2026-02-21T09:41:21.1185041Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r384, %r388, %r392, %r396}, [%r368]; 2026-02-21T09:41:21.1185104Z // end inline asm 2026-02-21T09:41:21.1185155Z bar.sync 0; 2026-02-21T09:41:21.1185245Z st.shared.v4.b32 [%r17], {%r414, %r426, %r438, %r450}; 2026-02-21T09:41:21.1185306Z bar.sync 0; 2026-02-21T09:41:21.1185362Z // begin inline asm 2026-02-21T09:41:21.1185531Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r389, %r393, %r397}, [%r368]; 2026-02-21T09:41:21.1185584Z // end inline asm 2026-02-21T09:41:21.1185647Z bar.sync 0; 2026-02-21T09:41:21.1185733Z st.shared.v4.b32 [%r17], {%r417, %r429, %r441, %r453}; 2026-02-21T09:41:21.1185786Z bar.sync 0; 2026-02-21T09:41:21.1185848Z // begin inline asm 2026-02-21T09:41:21.1185989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r386, %r390, %r394, %r398}, [%r368]; 2026-02-21T09:41:21.1186042Z // end inline asm 2026-02-21T09:41:21.1186094Z bar.sync 0; 2026-02-21T09:41:21.1186190Z st.shared.v4.b32 [%r17], {%r420, %r432, %r444, %r456}; 2026-02-21T09:41:21.1186248Z bar.sync 0; 2026-02-21T09:41:21.1186307Z // begin inline asm 2026-02-21T09:41:21.1186456Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r387, %r391, %r395, %r399}, [%r368]; 2026-02-21T09:41:21.1186511Z // end inline asm 2026-02-21T09:41:21.1186567Z // begin inline asm 2026-02-21T09:41:21.1186667Z st.global.v4.b32 [ %rd74 + 0 ], { %r384, %r385, %r386, %r387 }; 2026-02-21T09:41:21.1186728Z // end inline asm 2026-02-21T09:41:21.1186780Z // begin inline asm 2026-02-21T09:41:21.1186905Z st.global.v4.b32 [ %rd75 + 0 ], { %r388, %r389, %r390, %r391 }; 2026-02-21T09:41:21.1186965Z // end inline asm 2026-02-21T09:41:21.1187018Z // begin inline asm 2026-02-21T09:41:21.1187110Z st.global.v4.b32 [ %rd76 + 0 ], { %r392, %r393, %r394, %r395 }; 2026-02-21T09:41:21.1187168Z // end inline asm 2026-02-21T09:41:21.1187220Z // 
begin inline asm 2026-02-21T09:41:21.1187310Z st.global.v4.b32 [ %rd77 + 0 ], { %r396, %r397, %r398, %r399 }; 2026-02-21T09:41:21.1187363Z // end inline asm 2026-02-21T09:41:21.1187425Z bra.uni $L__BB0_10; 2026-02-21T09:41:21.1187506Z $L__BB0_11: // %._crit_edge 2026-02-21T09:41:21.1187674Z .loc 1 53 52 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:53:52 2026-02-21T09:41:21.1187734Z // begin inline asm 2026-02-21T09:41:21.1187783Z 2026-02-21T09:41:21.1187833Z { 2026-02-21T09:41:21.1187893Z .reg .pred complete; 2026-02-21T09:41:21.1187951Z waitLoop: 2026-02-21T09:41:21.1188066Z mbarrier.try_wait.parity.shared.b64 complete, [%r457], %r458; 2026-02-21T09:41:21.1188131Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.1188186Z } 2026-02-21T09:41:21.1188189Z 2026-02-21T09:41:21.1188241Z // end inline asm 2026-02-21T09:41:21.1188328Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:21.1188507Z .loc 1 28 108 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:108 2026-02-21T09:41:21.1188558Z bar.sync 0; 2026-02-21T09:41:21.1188612Z // begin inline asm 2026-02-21T09:41:21.1188730Z @%p133 mbarrier.inval.shared::cta.b64 [%r113]; 2026-02-21T09:41:21.1188790Z // end inline asm 2026-02-21T09:41:21.1188840Z bar.sync 0; 2026-02-21T09:41:21.1188894Z // begin inline asm 2026-02-21T09:41:21.1188982Z @%p133 mbarrier.inval.shared::cta.b64 [%r114]; 2026-02-21T09:41:21.1189036Z // end inline asm 2026-02-21T09:41:21.1189088Z bar.sync 0; 2026-02-21T09:41:21.1189174Z // begin inline asm 2026-02-21T09:41:21.1189261Z @%p133 mbarrier.inval.shared::cta.b64 [%r115]; 2026-02-21T09:41:21.1189314Z // end inline asm 2026-02-21T09:41:21.1189363Z bar.sync 0; 2026-02-21T09:41:21.1189422Z // begin inline asm 2026-02-21T09:41:21.1189496Z @%p133 mbarrier.inval.shared::cta.b64 [%r116]; 2026-02-21T09:41:21.1189546Z // end inline asm 2026-02-21T09:41:21.1189601Z bar.sync 0; 2026-02-21T09:41:21.1189653Z // begin inline asm 2026-02-21T09:41:21.1189727Z @%p133 
mbarrier.inval.shared::cta.b64 [%r117]; 2026-02-21T09:41:21.1189777Z // end inline asm 2026-02-21T09:41:21.1189834Z bar.sync 0; 2026-02-21T09:41:21.1189888Z // begin inline asm 2026-02-21T09:41:21.1189963Z @%p133 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T09:41:21.1190021Z // end inline asm 2026-02-21T09:41:21.1190072Z // begin inline asm 2026-02-21T09:41:21.1190144Z @%p133 mbarrier.inval.shared::cta.b64 [%r111]; 2026-02-21T09:41:21.1190194Z // end inline asm 2026-02-21T09:41:21.1190274Z bar.sync 0; 2026-02-21T09:41:21.1190329Z // begin inline asm 2026-02-21T09:41:21.1190405Z @%p133 mbarrier.inval.shared::cta.b64 [%r112]; 2026-02-21T09:41:21.1190464Z // end inline asm 2026-02-21T09:41:21.1190629Z .loc 1 28 4 // czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py:28:4 2026-02-21T09:41:21.1190679Z bar.sync 0; 2026-02-21T09:41:21.1190738Z // begin inline asm 2026-02-21T09:41:21.1190849Z @%p4 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r467, 128; 2026-02-21T09:41:21.1190901Z // end inline asm 2026-02-21T09:41:21.1190950Z ret; 2026-02-21T09:41:21.1191007Z $L__tmp1: 2026-02-21T09:41:21.1191061Z $L__func_end0: 2026-02-21T09:41:21.1191140Z // -- End function 2026-02-21T09:41:21.1191195Z } 2026-02-21T09:41:21.1191401Z .file 1 "/tmp/torchinductor_root/zd/czdyhkqn2mzkjcqh3etdi4saffmjb2ez6tq3izsqg6xpwrfqsjym.py" 2026-02-21T09:41:21.1191461Z .section .debug_abbrev 2026-02-21T09:41:21.1191511Z { 2026-02-21T09:41:21.1191606Z .b8 1 // Abbreviation Code 2026-02-21T09:41:21.1191712Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:21.1191789Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:21.1191875Z .b8 37 // DW_AT_producer 2026-02-21T09:41:21.1191945Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.1192017Z .b8 19 // DW_AT_language 2026-02-21T09:41:21.1192097Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:21.1192170Z .b8 3 // DW_AT_name 2026-02-21T09:41:21.1192241Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.1192317Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:21.1192395Z .b8 6 // 
DW_FORM_data4 2026-02-21T09:41:21.1192469Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:21.1192540Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.1192618Z .b8 0 // EOM(1) 2026-02-21T09:41:21.1192685Z .b8 0 // EOM(2) 2026-02-21T09:41:21.1192748Z .b8 0 // EOM(3) 2026-02-21T09:41:21.1192803Z } 2026-02-21T09:41:21.1192861Z .section .debug_info 2026-02-21T09:41:21.1192910Z { 2026-02-21T09:41:21.1192991Z .b32 104 // Length of Unit 2026-02-21T09:41:21.1193080Z .b8 2 // DWARF version number 2026-02-21T09:41:21.1193156Z .b8 0 2026-02-21T09:41:21.1193276Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:41:21.1193369Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:21.1193466Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:21.1193567Z .b8 116 // DW_AT_producer 2026-02-21T09:41:21.1193627Z .b8 114 2026-02-21T09:41:21.1193678Z .b8 105 2026-02-21T09:41:21.1193727Z .b8 116 2026-02-21T09:41:21.1193775Z .b8 111 2026-02-21T09:41:21.1193830Z .b8 110 2026-02-21T09:41:21.1193878Z .b8 0 2026-02-21T09:41:21.1193947Z .b8 2 // DW_AT_language 2026-02-21T09:41:21.1194002Z .b8 0 2026-02-21T09:41:21.1194075Z .b8 99 // DW_AT_name 2026-02-21T09:41:21.1194123Z .b8 122 2026-02-21T09:41:21.1194169Z .b8 100 2026-02-21T09:41:21.1194223Z .b8 121 2026-02-21T09:41:21.1194269Z .b8 104 2026-02-21T09:41:21.1194314Z .b8 107 2026-02-21T09:41:21.1194371Z .b8 113 2026-02-21T09:41:21.1194418Z .b8 110 2026-02-21T09:41:21.1194466Z .b8 50 2026-02-21T09:41:21.1194514Z .b8 109 2026-02-21T09:41:21.1194568Z .b8 122 2026-02-21T09:41:21.1194615Z .b8 107 2026-02-21T09:41:21.1194661Z .b8 106 2026-02-21T09:41:21.1194742Z .b8 99 2026-02-21T09:41:21.1194798Z .b8 113 2026-02-21T09:41:21.1194871Z .b8 104 2026-02-21T09:41:21.1194925Z .b8 51 2026-02-21T09:41:21.1194982Z .b8 101 2026-02-21T09:41:21.1195033Z .b8 116 2026-02-21T09:41:21.1195084Z .b8 100 2026-02-21T09:41:21.1195134Z .b8 105 2026-02-21T09:41:21.1195191Z .b8 52 2026-02-21T09:41:21.1195242Z .b8 115 2026-02-21T09:41:21.1195291Z .b8 97 
2026-02-21T09:41:21.1195349Z .b8 102 2026-02-21T09:41:21.1195399Z .b8 102 2026-02-21T09:41:21.1195450Z .b8 109 2026-02-21T09:41:21.1195500Z .b8 106 2026-02-21T09:41:21.1195558Z .b8 98 2026-02-21T09:41:21.1195608Z .b8 50 2026-02-21T09:41:21.1195659Z .b8 101 2026-02-21T09:41:21.1195708Z .b8 122 2026-02-21T09:41:21.1195764Z .b8 54 2026-02-21T09:41:21.1195816Z .b8 116 2026-02-21T09:41:21.1195867Z .b8 113 2026-02-21T09:41:21.1195924Z .b8 51 2026-02-21T09:41:21.1195975Z .b8 105 2026-02-21T09:41:21.1196025Z .b8 122 2026-02-21T09:41:21.1196075Z .b8 115 2026-02-21T09:41:21.1196129Z .b8 113 2026-02-21T09:41:21.1196177Z .b8 103 2026-02-21T09:41:21.1196225Z .b8 54 2026-02-21T09:41:21.1196279Z .b8 120 2026-02-21T09:41:21.1196327Z .b8 112 2026-02-21T09:41:21.1196378Z .b8 119 2026-02-21T09:41:21.1196428Z .b8 114 2026-02-21T09:41:21.1196481Z .b8 102 2026-02-21T09:41:21.1196557Z .b8 113 2026-02-21T09:41:21.1196605Z .b8 115 2026-02-21T09:41:21.1196658Z .b8 106 2026-02-21T09:41:21.1196706Z .b8 121 2026-02-21T09:41:21.1196753Z .b8 109 2026-02-21T09:41:21.1196800Z .b8 46 2026-02-21T09:41:21.1196856Z .b8 112 2026-02-21T09:41:21.1196904Z .b8 121 2026-02-21T09:41:21.1196954Z .b8 0 2026-02-21T09:41:21.1197043Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:21.1197123Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:21.1197172Z .b8 116 2026-02-21T09:41:21.1197222Z .b8 109 2026-02-21T09:41:21.1197277Z .b8 112 2026-02-21T09:41:21.1197326Z .b8 47 2026-02-21T09:41:21.1197376Z .b8 116 2026-02-21T09:41:21.1197423Z .b8 111 2026-02-21T09:41:21.1197478Z .b8 114 2026-02-21T09:41:21.1197526Z .b8 99 2026-02-21T09:41:21.1197575Z .b8 104 2026-02-21T09:41:21.1197627Z .b8 105 2026-02-21T09:41:21.1197677Z .b8 110 2026-02-21T09:41:21.1197728Z .b8 100 2026-02-21T09:41:21.1197778Z .b8 117 2026-02-21T09:41:21.1197831Z .b8 99 2026-02-21T09:41:21.1197881Z .b8 116 2026-02-21T09:41:21.1197929Z .b8 111 2026-02-21T09:41:21.1197982Z .b8 114 2026-02-21T09:41:21.1198032Z .b8 95 2026-02-21T09:41:21.1198080Z .b8 114 
2026-02-21T09:41:21.1198128Z .b8 111 2026-02-21T09:41:21.1198182Z .b8 111 2026-02-21T09:41:21.1198230Z .b8 116 2026-02-21T09:41:21.1198280Z .b8 47 2026-02-21T09:41:21.1198329Z .b8 122 2026-02-21T09:41:21.1198385Z .b8 100 2026-02-21T09:41:21.1198433Z .b8 0 2026-02-21T09:41:21.1198483Z } 2026-02-21T09:41:21.1198556Z .section .debug_macinfo { } 2026-02-21T09:41:21.1198587Z 2026-02-21T09:41:21.1198664Z ================================================================ 2026-02-21T09:41:21.1198772Z please share the reproducer above with Triton project. 2026-02-21T09:41:21.2929521Z [27s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:21.2929794Z 2026-02-21T09:41:21.2935766Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:41:21.2936895Z 2026-02-21T09:41:21.2937212Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:21.2937443Z `ptxas` stderr: 2026-02-21T09:41:21.2937854Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:41:21.2938343Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:21.2938487Z 2026-02-21T09:41:21.2938946Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppxa0yd5c.ptx -o /tmp/tmppxa0yd5c.ptx.o 2026-02-21T09:41:21.2939392Z 2026-02-21T09:41:21.2939520Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:41:21.2939704Z 2026-02-21T09:41:21.2939781Z ================================================================ 2026-02-21T09:41:21.2939989Z Internal Triton PTX codegen error 2026-02-21T09:41:21.2940156Z `ptxas` stderr: 2026-02-21T09:41:21.2940559Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 204 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:21.2941018Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:21.2941159Z 2026-02-21T09:41:21.2941521Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppxa0yd5c.ptx -o /tmp/tmppxa0yd5c.ptx.o 2026-02-21T09:41:21.2941941Z 2026-02-21T09:41:21.2941944Z 2026-02-21T09:41:21.2941996Z // 2026-02-21T09:41:21.2942179Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:21.2942341Z // 2026-02-21T09:41:21.2942407Z 2026-02-21T09:41:21.2942461Z .version 8.7 2026-02-21T09:41:21.2942589Z .target sm_100a 2026-02-21T09:41:21.2942723Z .address_size 64 2026-02-21T09:41:21.2942806Z 2026-02-21T09:41:21.2942924Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:21.2943172Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:21.2943390Z // @_helion_matmul 2026-02-21T09:41:21.2943587Z .visible .entry _helion_matmul( 2026-02-21T09:41:21.2943809Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:21.2944051Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T09:41:21.2944296Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:21.2944549Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:21.2944853Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:21.2945050Z ) 2026-02-21T09:41:21.2945160Z .reqntid 256 2026-02-21T09:41:21.2945289Z .maxnreg 32 2026-02-21T09:41:21.2945400Z { 2026-02-21T09:41:21.2945525Z .reg .pred %p<143>; 2026-02-21T09:41:21.2945664Z .reg .b32 %r<498>; 2026-02-21T09:41:21.2945799Z .reg .b64 %rd<142>; 2026-02-21T09:41:21.2946054Z .loc 1 19 0 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:19:0 2026-02-21T09:41:21.2946342Z $L__func_begin0: 2026-02-21T09:41:21.2946581Z .loc 1 19 0 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:19:0 2026-02-21T09:41:21.2946845Z 2026-02-21T09:41:21.2946897Z // %bb.0: 2026-02-21T09:41:21.2947045Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:41:21.2947226Z $L__tmp0: 2026-02-21T09:41:21.2947458Z .loc 1 19 0 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:19 2026-02-21T09:41:21.2947762Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:21.2947943Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:41:21.2948136Z setp.lt.u32 %p4, %r1, 32; 2026-02-21T09:41:21.2948302Z mov.b32 %r60, global_smem; 2026-02-21T09:41:21.2948460Z // begin inline asm 2026-02-21T09:41:21.2948686Z @%p4 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r60], 128; 2026-02-21T09:41:21.2948931Z // end inline asm 2026-02-21T09:41:21.2949089Z ld.param.b64 %rd50, [_helion_matmul_param_3]; 2026-02-21T09:41:21.2949278Z bar.sync 0; 2026-02-21T09:41:21.2949416Z ld.shared.b32 %r468, [global_smem]; 2026-02-21T09:41:21.2949588Z bar.sync 0; 2026-02-21T09:41:21.2949712Z // begin inline asm 2026-02-21T09:41:21.2949916Z @%p4 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:21.2950141Z // end inline asm 2026-02-21T09:41:21.2950411Z .loc 1 21 
67 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:21:67 2026-02-21T09:41:21.2950709Z mov.u32 %r497, %ctaid.x; 2026-02-21T09:41:21.2950858Z mov.u32 %r166, %ctaid.y; 2026-02-21T09:41:21.2951010Z mov.u32 %r167, %ctaid.z; 2026-02-21T09:41:21.2951157Z mov.u32 %r168, %nctaid.x; 2026-02-21T09:41:21.2951312Z mov.u32 %r169, %nctaid.y; 2026-02-21T09:41:21.2951469Z mad.lo.s32 %r170, %r167, %r169, %r166; 2026-02-21T09:41:21.2951652Z mad.lo.s32 %r171, %r170, %r168, %r497; 2026-02-21T09:41:21.2951825Z shl.b32 %r172, %r171, 8; 2026-02-21T09:41:21.2951973Z cvt.s64.s32 %rd51, %r172; 2026-02-21T09:41:21.2952137Z add.s64 %rd19, %rd50, %rd51; 2026-02-21T09:41:21.2952295Z shl.b32 %r173, %r1, 2; 2026-02-21T09:41:21.2952460Z add.s32 %r61, %r60, %r173; 2026-02-21T09:41:21.2952609Z mov.b32 %r70, 0; 2026-02-21T09:41:21.2952747Z // begin inline asm 2026-02-21T09:41:21.2952892Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:21.2953063Z // end inline asm 2026-02-21T09:41:21.2953197Z bar.warp.sync -1; 2026-02-21T09:41:21.2953348Z setp.eq.b32 %p133, %r1, 0; 2026-02-21T09:41:21.2953527Z cvt.u64.u32 %rd4, %r60; 2026-02-21T09:41:21.2953673Z // begin inline asm 2026-02-21T09:41:21.2953961Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:41:21.2954242Z // end inline asm 2026-02-21T09:41:21.2954380Z // begin inline asm 2026-02-21T09:41:21.2954598Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:21.2954888Z // end inline asm 2026-02-21T09:41:21.2955016Z mov.b32 %r63, 32; 2026-02-21T09:41:21.2955155Z // begin inline asm 2026-02-21T09:41:21.2955388Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:21.2955645Z // end inline asm 2026-02-21T09:41:21.2955780Z mov.b32 %r64, 64; 2026-02-21T09:41:21.2955908Z // begin inline asm 2026-02-21T09:41:21.2956135Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r64; 
2026-02-21T09:41:21.2956389Z // end inline asm 2026-02-21T09:41:21.2956528Z mov.b32 %r65, 1024; 2026-02-21T09:41:21.2956668Z // begin inline asm 2026-02-21T09:41:21.2956908Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:21.2957176Z // end inline asm 2026-02-21T09:41:21.2957303Z // begin inline asm 2026-02-21T09:41:21.2957536Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r65; 2026-02-21T09:41:21.2957798Z // end inline asm 2026-02-21T09:41:21.2957932Z mov.b64 %rd12, 2048; 2026-02-21T09:41:21.2958066Z // begin inline asm 2026-02-21T09:41:21.2958314Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:21.2958627Z // end inline asm 2026-02-21T09:41:21.2958752Z mov.b32 %r67, 1; 2026-02-21T09:41:21.2958886Z // begin inline asm 2026-02-21T09:41:21.2959132Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:21.2959421Z // end inline asm 2026-02-21T09:41:21.2959548Z // begin inline asm 2026-02-21T09:41:21.2959827Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:21.2960104Z // end inline asm 2026-02-21T09:41:21.2960235Z // begin inline asm 2026-02-21T09:41:21.2960464Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:21.2960718Z // end inline asm 2026-02-21T09:41:21.2960854Z // begin inline asm 2026-02-21T09:41:21.2961096Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.2961377Z // end inline asm 2026-02-21T09:41:21.2961506Z // begin inline asm 2026-02-21T09:41:21.2961745Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:21.2962021Z // end inline asm 2026-02-21T09:41:21.2962152Z // begin inline asm 2026-02-21T09:41:21.2962390Z @%p133 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.2962703Z // end inline asm 2026-02-21T09:41:21.2962844Z // begin inline asm 2026-02-21T09:41:21.2963188Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:21.2963556Z // end inline asm 2026-02-21T09:41:21.2963689Z // begin inline asm 2026-02-21T09:41:21.2963897Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:41:21.2964163Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:21.2964347Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:21.2964525Z // end inline asm 2026-02-21T09:41:21.2964652Z bar.sync 0; 2026-02-21T09:41:21.2964834Z cvta.global.u64 %rd40, %rd19; 2026-02-21T09:41:21.2965109Z .loc 1 22 68 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:22:68 2026-02-21T09:41:21.2965413Z add.s64 %rd37, %rd19, 128; 2026-02-21T09:41:21.2965568Z bar.sync 0; 2026-02-21T09:41:21.2965694Z // begin inline asm 2026-02-21T09:41:21.2965848Z @%p4 st.shared.b32 [ %r61 + 0 ], %r70; 2026-02-21T09:41:21.2966013Z // end inline asm 2026-02-21T09:41:21.2966152Z bar.warp.sync -1; 2026-02-21T09:41:21.2966318Z // begin inline asm 2026-02-21T09:41:21.2966569Z @%p133 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:41:21.2966839Z // end inline asm 2026-02-21T09:41:21.2966973Z // begin inline asm 2026-02-21T09:41:21.2967191Z @%p133 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:41:21.2967433Z // end inline asm 2026-02-21T09:41:21.2967567Z // begin inline asm 2026-02-21T09:41:21.2967789Z @%p133 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:41:21.2968052Z // end inline asm 2026-02-21T09:41:21.2968179Z mov.b32 %r72, 128; 2026-02-21T09:41:21.2968317Z // begin inline asm 2026-02-21T09:41:21.2968540Z @%p133 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r72; 2026-02-21T09:41:21.2968807Z // end inline asm 2026-02-21T09:41:21.2968948Z // begin inline asm 2026-02-21T09:41:21.2969183Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r65; 2026-02-21T09:41:21.2969455Z // end inline asm 2026-02-21T09:41:21.2969584Z mov.b32 %r74, 12288; 2026-02-21T09:41:21.2969730Z // begin inline asm 2026-02-21T09:41:21.2969960Z @%p133 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r74; 2026-02-21T09:41:21.2970232Z // end inline asm 2026-02-21T09:41:21.2970367Z // begin inline asm 2026-02-21T09:41:21.2970611Z @%p133 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:41:21.2970923Z // end inline asm 2026-02-21T09:41:21.2971051Z // begin inline asm 2026-02-21T09:41:21.2971299Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r67; 2026-02-21T09:41:21.2971583Z // end inline asm 2026-02-21T09:41:21.2971718Z // begin inline asm 2026-02-21T09:41:21.2971994Z @%p133 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r67; 2026-02-21T09:41:21.2972272Z // end inline asm 2026-02-21T09:41:21.2972422Z // begin inline asm 2026-02-21T09:41:21.2972656Z @%p133 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:41:21.2972920Z // end inline asm 2026-02-21T09:41:21.2973051Z // begin inline asm 2026-02-21T09:41:21.2973305Z @%p133 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.2973583Z // end inline asm 2026-02-21T09:41:21.2973723Z // begin inline asm 2026-02-21T09:41:21.2973958Z @%p133 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x2; 2026-02-21T09:41:21.2974218Z // end inline asm 2026-02-21T09:41:21.2974358Z // begin inline asm 2026-02-21T09:41:21.2974581Z @%p133 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:41:21.2974885Z // end inline asm 2026-02-21T09:41:21.2975047Z // begin inline asm 2026-02-21T09:41:21.2975400Z @%p4 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:41:21.2975795Z // end inline asm 2026-02-21T09:41:21.2975933Z // begin inline asm 2026-02-21T09:41:21.2976150Z @%p4 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:41:21.2976407Z @%p4 cp.async.bulk.commit_group ; 2026-02-21T09:41:21.2976608Z @%p4 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:21.2976785Z // end inline asm 2026-02-21T09:41:21.2976926Z bar.sync 0; 2026-02-21T09:41:21.2977067Z cvta.global.u64 %rd41, %rd37; 2026-02-21T09:41:21.2977356Z .loc 1 40 45 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:40:45 2026-02-21T09:41:21.2977653Z shr.u32 %r174, %r1, 5; 2026-02-21T09:41:21.2977926Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.2978233Z sub.s32 %r176, 1536, %r497; 2026-02-21T09:41:21.2978408Z mul.hi.s32 %r177, %r176, -580400985; 2026-02-21T09:41:21.2978594Z add.s32 %r178, %r177, %r176; 2026-02-21T09:41:21.2978780Z shr.u32 %r179, %r178, 31; 2026-02-21T09:41:21.2978938Z shr.s32 %r180, %r178, 12; 2026-02-21T09:41:21.2979101Z add.s32 %r181, %r180, %r179; 2026-02-21T09:41:21.2979260Z mul.lo.s32 %r182, %r181, 4736; 2026-02-21T09:41:21.2979435Z setp.ne.b32 %p68, %r176, %r182; 2026-02-21T09:41:21.2979606Z setp.lt.u32 %p69, %r497, 1537; 2026-02-21T09:41:21.2979783Z and.pred %p70, %p69, %p68; 2026-02-21T09:41:21.2979945Z selp.b32 %r183, 1, 0, %p70; 2026-02-21T09:41:21.2980111Z add.s32 %r10, %r181, %r183; 2026-02-21T09:41:21.2980382Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.2980695Z shfl.sync.idx.b32 %r12, %r174, 0, 31, -1; 2026-02-21T09:41:21.2980886Z shl.b32 %r184, %r12, 21; 
2026-02-21T09:41:21.2981044Z and.b32 %r185, %r184, 6291456; 2026-02-21T09:41:21.2981215Z add.s32 %r186, %r185, %r468; 2026-02-21T09:41:21.2981372Z shl.b32 %r187, %r12, 4; 2026-02-21T09:41:21.2981531Z and.b32 %r188, %r187, 64; 2026-02-21T09:41:21.2981683Z add.s32 %r77, %r186, %r188; 2026-02-21T09:41:21.2981849Z mov.pred %p42, -1; 2026-02-21T09:41:21.2981993Z // begin inline asm 2026-02-21T09:41:21.2982354Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 0], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:41:21.2982741Z // end inline asm 2026-02-21T09:41:21.2982876Z // begin inline asm 2026-02-21T09:41:21.2983213Z @%p42 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r77 + 16], 32, {%r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70, %r70}; 2026-02-21T09:41:21.2983617Z // end inline asm 2026-02-21T09:41:21.2983771Z // begin inline asm 2026-02-21T09:41:21.2983920Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:21.2984087Z // end inline asm 2026-02-21T09:41:21.2984215Z bar.sync 0; 2026-02-21T09:41:21.2984492Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.2984817Z add.s32 %r111, %r60, 77872; 2026-02-21T09:41:21.2984964Z // begin inline asm 2026-02-21T09:41:21.2985136Z @%p133 mbarrier.init.shared::cta.b64 [%r111], 1; 2026-02-21T09:41:21.2985320Z // end inline asm 2026-02-21T09:41:21.2985454Z bar.sync 0; 2026-02-21T09:41:21.2985581Z add.s32 %r112, %r60, 77880; 2026-02-21T09:41:21.2985733Z // begin inline asm 2026-02-21T09:41:21.2985893Z @%p133 mbarrier.init.shared::cta.b64 [%r112], 1; 2026-02-21T09:41:21.2986095Z // end inline asm 2026-02-21T09:41:21.2986237Z add.s32 %r113, %r60, 77824; 2026-02-21T09:41:21.2986381Z // begin inline asm 2026-02-21T09:41:21.2986543Z @%p133 mbarrier.init.shared::cta.b64 [%r113], 1; 2026-02-21T09:41:21.2986722Z // end inline asm 2026-02-21T09:41:21.2986856Z bar.sync 0; 
2026-02-21T09:41:21.2986982Z add.s32 %r114, %r60, 77832; 2026-02-21T09:41:21.2987157Z // begin inline asm 2026-02-21T09:41:21.2987317Z @%p133 mbarrier.init.shared::cta.b64 [%r114], 1; 2026-02-21T09:41:21.2987503Z // end inline asm 2026-02-21T09:41:21.2987632Z bar.sync 0; 2026-02-21T09:41:21.2987764Z add.s32 %r115, %r60, 77840; 2026-02-21T09:41:21.2987914Z // begin inline asm 2026-02-21T09:41:21.2988070Z @%p133 mbarrier.init.shared::cta.b64 [%r115], 1; 2026-02-21T09:41:21.2988253Z // end inline asm 2026-02-21T09:41:21.2988381Z bar.sync 0; 2026-02-21T09:41:21.2988511Z add.s32 %r116, %r60, 77848; 2026-02-21T09:41:21.2988654Z // begin inline asm 2026-02-21T09:41:21.2988816Z @%p133 mbarrier.init.shared::cta.b64 [%r116], 1; 2026-02-21T09:41:21.2988993Z // end inline asm 2026-02-21T09:41:21.2989125Z bar.sync 0; 2026-02-21T09:41:21.2989248Z add.s32 %r117, %r60, 77856; 2026-02-21T09:41:21.2989399Z // begin inline asm 2026-02-21T09:41:21.2989562Z @%p133 mbarrier.init.shared::cta.b64 [%r117], 1; 2026-02-21T09:41:21.2989739Z // end inline asm 2026-02-21T09:41:21.2989872Z bar.sync 0; 2026-02-21T09:41:21.2989998Z add.s32 %r223, %r60, 77864; 2026-02-21T09:41:21.2990150Z // begin inline asm 2026-02-21T09:41:21.2990341Z @%p133 mbarrier.init.shared::cta.b64 [%r223], 1; 2026-02-21T09:41:21.2990527Z // end inline asm 2026-02-21T09:41:21.2990659Z setp.lt.s32 %p71, %r10, 1; 2026-02-21T09:41:21.2990828Z setp.gt.s32 %p67, %r10, 0; 2026-02-21T09:41:21.2991099Z .loc 1 34 35 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:34:35 2026-02-21T09:41:21.2991396Z mul.hi.u32 %r189, %r497, 715827883; 2026-02-21T09:41:21.2991571Z shr.u32 %r190, %r189, 10; 2026-02-21T09:41:21.2991826Z .loc 1 35 33 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:35:33 2026-02-21T09:41:21.2992114Z shl.b32 %r191, %r190, 6; 2026-02-21T09:41:21.2992364Z .loc 1 36 39 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:36:39 2026-02-21T09:41:21.2992654Z sub.s32 %r192, 16, %r191; 
2026-02-21T09:41:21.2992929Z .loc 1 37 45 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:45 2026-02-21T09:41:21.2993207Z mul.lo.s32 %r193, %r190, 6144; 2026-02-21T09:41:21.2993374Z sub.s32 %r194, %r497, %r193; 2026-02-21T09:41:21.2993627Z .loc 1 38 51 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:38:51 2026-02-21T09:41:21.2993908Z div.s32 %r195, %r194, %r192; 2026-02-21T09:41:21.2994160Z .loc 1 37 64 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:64 2026-02-21T09:41:21.2994444Z mul.lo.s32 %r196, %r195, %r192; 2026-02-21T09:41:21.2994610Z sub.s32 %r197, %r194, %r196; 2026-02-21T09:41:21.2994932Z .loc 1 37 30 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:30 2026-02-21T09:41:21.2995221Z add.s32 %r198, %r197, %r191; 2026-02-21T09:41:21.2995476Z .loc 1 39 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:39:27 2026-02-21T09:41:21.2995759Z shl.b32 %r474, %r198, 6; 2026-02-21T09:41:21.2996040Z .loc 1 41 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:41:27 2026-02-21T09:41:21.2996337Z shl.b32 %r470, %r195, 7; 2026-02-21T09:41:21.2996610Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.2996891Z bar.sync 0; 2026-02-21T09:41:21.2997032Z and.pred %p1, %p133, %p67; 2026-02-21T09:41:21.2997186Z // begin inline asm 2026-02-21T09:41:21.2997379Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r113], 12288; 2026-02-21T09:41:21.2997592Z // end inline asm 2026-02-21T09:41:21.2997841Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.2998116Z bar.sync 0; 2026-02-21T09:41:21.2998257Z elect.sync %r199|%p72, -1; 2026-02-21T09:41:21.2998422Z and.pred %p73, %p67, %p72; 2026-02-21T09:41:21.2998576Z and.pred %p53, %p4, %p73; 2026-02-21T09:41:21.2998776Z add.s32 %r120, %r60, 49152; 2026-02-21T09:41:21.2998925Z // begin inline asm 2026-02-21T09:41:21.2999250Z @%p53 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r120], [%rd40, {%r70, %r474}], [%r113]; 2026-02-21T09:41:21.2999604Z // end inline asm 2026-02-21T09:41:21.2999851Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3000129Z // begin inline asm 2026-02-21T09:41:21.3000279Z fence.proxy.async.shared::cta; 2026-02-21T09:41:21.3000444Z // end inline asm 2026-02-21T09:41:21.3000570Z bar.sync 0; 2026-02-21T09:41:21.3000707Z elect.sync %r200|%p74, -1; 2026-02-21T09:41:21.3000863Z and.pred %p75, %p67, %p74; 2026-02-21T09:41:21.3001025Z and.pred %p54, %p4, %p75; 2026-02-21T09:41:21.3001175Z // begin inline asm 2026-02-21T09:41:21.3001495Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r60], [%rd41, {%r70, %r470}], [%r113]; 2026-02-21T09:41:21.3001837Z // end inline asm 2026-02-21T09:41:21.3002084Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3002401Z bar.sync 0; 2026-02-21T09:41:21.3002528Z // begin inline asm 2026-02-21T09:41:21.3002732Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r114], 12288; 2026-02-21T09:41:21.3002944Z // end inline asm 2026-02-21T09:41:21.3003190Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3003470Z bar.sync 0; 2026-02-21T09:41:21.3003611Z elect.sync %r201|%p76, -1; 2026-02-21T09:41:21.3003773Z and.pred %p77, %p67, %p76; 2026-02-21T09:41:21.3003925Z and.pred %p56, %p4, %p77; 2026-02-21T09:41:21.3004084Z add.s32 %r129, %r60, 53248; 2026-02-21T09:41:21.3004231Z // begin inline asm 2026-02-21T09:41:21.3004546Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r129], [%rd40, {%r63, %r474}], [%r114]; 2026-02-21T09:41:21.3004924Z // end inline asm 2026-02-21T09:41:21.3005179Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3005460Z bar.sync 0; 
2026-02-21T09:41:21.3005591Z elect.sync %r202|%p78, -1; 2026-02-21T09:41:21.3005752Z and.pred %p79, %p67, %p78; 2026-02-21T09:41:21.3005907Z and.pred %p57, %p4, %p79; 2026-02-21T09:41:21.3006075Z add.s32 %r133, %r60, 8192; 2026-02-21T09:41:21.3006219Z // begin inline asm 2026-02-21T09:41:21.3006535Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd41, {%r63, %r470}], [%r114]; 2026-02-21T09:41:21.3006880Z // end inline asm 2026-02-21T09:41:21.3007164Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3007457Z bar.sync 0; 2026-02-21T09:41:21.3007581Z // begin inline asm 2026-02-21T09:41:21.3007769Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r115], 12288; 2026-02-21T09:41:21.3007977Z // end inline asm 2026-02-21T09:41:21.3008268Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3008543Z bar.sync 0; 2026-02-21T09:41:21.3008678Z elect.sync %r203|%p80, -1; 2026-02-21T09:41:21.3008831Z and.pred %p81, %p67, %p80; 2026-02-21T09:41:21.3008991Z and.pred %p59, %p4, %p81; 2026-02-21T09:41:21.3009147Z add.s32 %r138, %r60, 57344; 2026-02-21T09:41:21.3009295Z // begin inline asm 2026-02-21T09:41:21.3009620Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r138], [%rd40, {%r64, %r474}], [%r115]; 2026-02-21T09:41:21.3009956Z // end inline asm 2026-02-21T09:41:21.3010204Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3010476Z bar.sync 0; 2026-02-21T09:41:21.3010614Z elect.sync %r204|%p82, -1; 2026-02-21T09:41:21.3010777Z and.pred %p83, %p67, %p82; 2026-02-21T09:41:21.3010929Z and.pred %p60, %p4, %p83; 2026-02-21T09:41:21.3011110Z add.s32 %r142, %r60, 16384; 2026-02-21T09:41:21.3011265Z // begin inline asm 2026-02-21T09:41:21.3011583Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r142], [%rd41, {%r64, %r470}], 
[%r115]; 2026-02-21T09:41:21.3011939Z // end inline asm 2026-02-21T09:41:21.3012199Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3012500Z bar.sync 0; 2026-02-21T09:41:21.3012632Z // begin inline asm 2026-02-21T09:41:21.3012825Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r116], 12288; 2026-02-21T09:41:21.3013039Z // end inline asm 2026-02-21T09:41:21.3013288Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3013573Z bar.sync 0; 2026-02-21T09:41:21.3013713Z elect.sync %r205|%p84, -1; 2026-02-21T09:41:21.3013872Z and.pred %p85, %p67, %p84; 2026-02-21T09:41:21.3014037Z and.pred %p62, %p4, %p85; 2026-02-21T09:41:21.3014193Z add.s32 %r147, %r60, 61440; 2026-02-21T09:41:21.3014351Z mov.b32 %r148, 96; 2026-02-21T09:41:21.3014492Z // begin inline asm 2026-02-21T09:41:21.3014856Z @%p62 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd40, {%r148, %r474}], [%r116]; 2026-02-21T09:41:21.3015202Z // end inline asm 2026-02-21T09:41:21.3015441Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3015726Z bar.sync 0; 2026-02-21T09:41:21.3015859Z elect.sync %r206|%p86, -1; 2026-02-21T09:41:21.3016021Z and.pred %p87, %p67, %p86; 2026-02-21T09:41:21.3016181Z and.pred %p63, %p4, %p87; 2026-02-21T09:41:21.3016331Z add.s32 %r151, %r60, 24576; 2026-02-21T09:41:21.3016484Z // begin inline asm 2026-02-21T09:41:21.3016798Z @%p63 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r151], [%rd41, {%r148, %r470}], [%r116]; 2026-02-21T09:41:21.3017155Z // end inline asm 2026-02-21T09:41:21.3017407Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3017692Z bar.sync 0; 2026-02-21T09:41:21.3017821Z // begin inline asm 2026-02-21T09:41:21.3018001Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r117], 12288; 
2026-02-21T09:41:21.3018221Z // end inline asm 2026-02-21T09:41:21.3018468Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3018755Z bar.sync 0; 2026-02-21T09:41:21.3018888Z elect.sync %r207|%p88, -1; 2026-02-21T09:41:21.3019057Z and.pred %p89, %p67, %p88; 2026-02-21T09:41:21.3019215Z and.pred %p65, %p4, %p89; 2026-02-21T09:41:21.3019413Z add.s32 %r156, %r60, 65536; 2026-02-21T09:41:21.3019566Z // begin inline asm 2026-02-21T09:41:21.3019896Z @%p65 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r156], [%rd40, {%r72, %r474}], [%r117]; 2026-02-21T09:41:21.3020264Z // end inline asm 2026-02-21T09:41:21.3020537Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3020839Z bar.sync 0; 2026-02-21T09:41:21.3020972Z elect.sync %r208|%p90, -1; 2026-02-21T09:41:21.3021142Z and.pred %p91, %p67, %p90; 2026-02-21T09:41:21.3021302Z and.pred %p66, %p4, %p91; 2026-02-21T09:41:21.3021466Z add.s32 %r160, %r60, 32768; 2026-02-21T09:41:21.3021625Z // begin inline asm 2026-02-21T09:41:21.3021959Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd41, {%r72, %r470}], [%r117]; 2026-02-21T09:41:21.3022332Z // end inline asm 2026-02-21T09:41:21.3022585Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3022894Z bar.sync 0; 2026-02-21T09:41:21.3023028Z // begin inline asm 2026-02-21T09:41:21.3023175Z 2026-02-21T09:41:21.3023289Z { 2026-02-21T09:41:21.3023424Z @!%p67 bra.uni skipWait; 2026-02-21T09:41:21.3023592Z .reg .pred complete; 2026-02-21T09:41:21.3023767Z waitLoop: 2026-02-21T09:41:21.3023964Z mbarrier.try_wait.parity.shared.b64 complete, [%r113], %r70; 2026-02-21T09:41:21.3024202Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.3024364Z skipWait: 2026-02-21T09:41:21.3024486Z } 2026-02-21T09:41:21.3024560Z 2026-02-21T09:41:21.3024615Z // end inline asm 
2026-02-21T09:41:21.3024892Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3025202Z setp.ne.b32 %p92, %r12, 0; 2026-02-21T09:41:21.3025371Z or.pred %p93, %p71, %p92; 2026-02-21T09:41:21.3025526Z @%p93 bra $L__BB0_2; 2026-02-21T09:41:21.3025676Z // %bb.1: 2026-02-21T09:41:21.3025807Z elect.sync %r213|%p95, -1; 2026-02-21T09:41:21.3025979Z bfe.u32 %r216, %r120, 4, 14; 2026-02-21T09:41:21.3026139Z cvt.u64.u32 %rd57, %r216; 2026-02-21T09:41:21.3026315Z or.b64 %rd52, %rd57, -9223371899399045120; 2026-02-21T09:41:21.3026502Z bfe.u32 %r217, %r60, 4, 14; 2026-02-21T09:41:21.3026677Z cvt.u64.u32 %rd58, %r217; 2026-02-21T09:41:21.3026834Z or.b64 %rd53, %rd58, -9223371899382267904; 2026-02-21T09:41:21.3027013Z mov.b32 %r210, 69206032; 2026-02-21T09:41:21.3027192Z mov.pred %p94, 0; 2026-02-21T09:41:21.3027327Z // begin inline asm 2026-02-21T09:41:21.3027557Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r468 + 0 ], %rd52, %rd53, %r210, %p94; 2026-02-21T09:41:21.3027798Z // end inline asm 2026-02-21T09:41:21.3027937Z add.s32 %r218, %r60, 49184; 2026-02-21T09:41:21.3028085Z bfe.u32 %r219, %r218, 4, 14; 2026-02-21T09:41:21.3028240Z cvt.u64.u32 %rd59, %r219; 2026-02-21T09:41:21.3028396Z or.b64 %rd54, %rd59, -9223371899399045120; 2026-02-21T09:41:21.3028573Z add.s32 %r220, %r60, 32; 2026-02-21T09:41:21.3028724Z bfe.u32 %r221, %r220, 4, 14; 2026-02-21T09:41:21.3028869Z cvt.u64.u32 %rd60, %r221; 2026-02-21T09:41:21.3029028Z or.b64 %rd55, %rd60, -9223371899382267904; 2026-02-21T09:41:21.3029195Z // begin inline asm 2026-02-21T09:41:21.3029415Z @%p95 tcgen05.mma.cta_group::1.kind::f16 [ %r468 + 0 ], %rd54, %rd55, %r210, %p42; 2026-02-21T09:41:21.3029653Z // end inline asm 2026-02-21T09:41:21.3029794Z add.s32 %r222, %r60, 77872; 2026-02-21T09:41:21.3029945Z cvt.u64.u32 %rd56, %r222; 2026-02-21T09:41:21.3030097Z // begin inline asm 2026-02-21T09:41:21.3030303Z @%p95 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd56]; 
2026-02-21T09:41:21.3030526Z // end inline asm 2026-02-21T09:41:21.3030663Z $L__BB0_2: 2026-02-21T09:41:21.3030907Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3031206Z bar.sync 0; 2026-02-21T09:41:21.3031334Z // begin inline asm 2026-02-21T09:41:21.3031558Z @%p1 mbarrier.arrive.expect_tx.shared.b64 _, [%r223], 12288; 2026-02-21T09:41:21.3031768Z // end inline asm 2026-02-21T09:41:21.3032012Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3032299Z bar.sync 0; 2026-02-21T09:41:21.3032432Z elect.sync %r233|%p105, -1; 2026-02-21T09:41:21.3032628Z and.pred %p106, %p67, %p105; 2026-02-21T09:41:21.3032787Z and.pred %p100, %p4, %p106; 2026-02-21T09:41:21.3032946Z add.s32 %r224, %r60, 69632; 2026-02-21T09:41:21.3033090Z mov.b32 %r482, 160; 2026-02-21T09:41:21.3033233Z // begin inline asm 2026-02-21T09:41:21.3033557Z @%p100 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r224], [%rd40, {%r482, %r474}], [%r223]; 2026-02-21T09:41:21.3033920Z // end inline asm 2026-02-21T09:41:21.3034170Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3034460Z bar.sync 0; 2026-02-21T09:41:21.3034602Z elect.sync %r234|%p107, -1; 2026-02-21T09:41:21.3034789Z and.pred %p108, %p67, %p107; 2026-02-21T09:41:21.3034962Z and.pred %p101, %p4, %p108; 2026-02-21T09:41:21.3035118Z add.s32 %r228, %r60, 40960; 2026-02-21T09:41:21.3035280Z // begin inline asm 2026-02-21T09:41:21.3035648Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r228], [%rd41, {%r482, %r470}], [%r223]; 2026-02-21T09:41:21.3036001Z // end inline asm 2026-02-21T09:41:21.3036259Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3036549Z @%p71 bra $L__BB0_12; 2026-02-21T09:41:21.3036723Z // %bb.3: // %.lr.ph 2026-02-21T09:41:21.3037015Z .loc 1 0 108 // 
ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:0:108 2026-02-21T09:41:21.3037302Z shr.u32 %r175, %r1, 4; 2026-02-21T09:41:21.3037457Z bfe.u32 %r4, %r1, 4, 4; 2026-02-21T09:41:21.3037602Z and.b32 %r8, %r1, 15; 2026-02-21T09:41:21.3037771Z ld.param.b64 %rd3, [_helion_matmul_param_2]; 2026-02-21T09:41:21.3037951Z or.b32 %r5, %r4, 16; 2026-02-21T09:41:21.3038094Z or.b32 %r6, %r4, 32; 2026-02-21T09:41:21.3038232Z or.b32 %r7, %r175, 48; 2026-02-21T09:41:21.3038380Z shl.b32 %r9, %r8, 3; 2026-02-21T09:41:21.3038517Z shl.b32 %r11, %r10, 5; 2026-02-21T09:41:21.3038671Z add.s32 %r16, %r11, -6; 2026-02-21T09:41:21.3038814Z shl.b32 %r243, %r1, 9; 2026-02-21T09:41:21.3038994Z and.b32 %r244, %r243, 3072; 2026-02-21T09:41:21.3039152Z shl.b32 %r245, %r8, 4; 2026-02-21T09:41:21.3039289Z shl.b32 %r246, %r1, 3; 2026-02-21T09:41:21.3039438Z and.b32 %r247, %r246, 768; 2026-02-21T09:41:21.3039586Z shl.b32 %r248, %r1, 1; 2026-02-21T09:41:21.3039734Z and.b32 %r249, %r248, 32; 2026-02-21T09:41:21.3039879Z shr.u32 %r250, %r1, 1; 2026-02-21T09:41:21.3040029Z and.b32 %r251, %r250, 64; 2026-02-21T09:41:21.3040177Z or.b32 %r252, %r245, %r247; 2026-02-21T09:41:21.3040335Z or.b32 %r253, %r249, %r251; 2026-02-21T09:41:21.3040485Z xor.b32 %r254, %r252, %r253; 2026-02-21T09:41:21.3040644Z add.s32 %r256, %r60, 73728; 2026-02-21T09:41:21.3040702Z add.s32 %r257, %r256, %r244; 2026-02-21T09:41:21.3040768Z add.s32 %r17, %r257, %r254; 2026-02-21T09:41:21.3040824Z shl.b32 %r258, %r1, 5; 2026-02-21T09:41:21.3040882Z and.b32 %r259, %r258, 3936; 2026-02-21T09:41:21.3040944Z and.b32 %r260, %r1, 224; 2026-02-21T09:41:21.3041008Z and.b32 %r262, %r173, 16; 2026-02-21T09:41:21.3041065Z xor.b32 %r263, %r259, %r260; 2026-02-21T09:41:21.3041120Z add.s32 %r264, %r256, %r262; 2026-02-21T09:41:21.3041189Z add.s32 %r369, %r264, %r263; 2026-02-21T09:41:21.3041361Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3041417Z max.s32 %r265, %r11, 2; 
2026-02-21T09:41:21.3041487Z add.s32 %r19, %r265, -1; 2026-02-21T09:41:21.3041544Z mov.pred %p142, -1; 2026-02-21T09:41:21.3041596Z mov.b32 %r487, 5; 2026-02-21T09:41:21.3041677Z mov.b32 %r483, 0; 2026-02-21T09:41:21.3041734Z mov.b32 %r481, 1; 2026-02-21T09:41:21.3041785Z mov.b32 %r480, 2; 2026-02-21T09:41:21.3041834Z mov.b32 %r479, 3; 2026-02-21T09:41:21.3041885Z mov.b32 %r478, 4; 2026-02-21T09:41:21.3041938Z mov.b32 %r471, %r470; 2026-02-21T09:41:21.3041990Z mov.b32 %r472, %r470; 2026-02-21T09:41:21.3042072Z mov.b32 %r473, %r470; 2026-02-21T09:41:21.3042135Z mov.b32 %r475, %r474; 2026-02-21T09:41:21.3042187Z mov.b32 %r476, %r474; 2026-02-21T09:41:21.3042242Z mov.b32 %r477, %r474; 2026-02-21T09:41:21.3042300Z mov.b32 %r484, %r111; 2026-02-21T09:41:21.3042352Z mov.b32 %r485, %r483; 2026-02-21T09:41:21.3042404Z mov.b32 %r486, %r483; 2026-02-21T09:41:21.3042455Z mov.b32 %r488, %r481; 2026-02-21T09:41:21.3042513Z mov.b32 %r489, %r483; 2026-02-21T09:41:21.3042565Z mov.b32 %r490, %r470; 2026-02-21T09:41:21.3042617Z mov.b32 %r491, %r474; 2026-02-21T09:41:21.3042675Z mov.b32 %r493, %r487; 2026-02-21T09:41:21.3042726Z mov.b32 %r494, %r483; 2026-02-21T09:41:21.3042780Z mov.b32 %r495, %r491; 2026-02-21T09:41:21.3042831Z mov.b32 %r496, %r490; 2026-02-21T09:41:21.3042892Z bra.uni $L__BB0_4; 2026-02-21T09:41:21.3042994Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3043179Z .loc 1 0 0 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:0 2026-02-21T09:41:21.3043253Z selp.b32 %r488, 0, %r317, %p126; 2026-02-21T09:41:21.3043311Z selp.b32 %r318, 1, 0, %p126; 2026-02-21T09:41:21.3043366Z xor.b32 %r489, %r459, %r318; 2026-02-21T09:41:21.3043544Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3043598Z add.s32 %r494, %r494, 1; 2026-02-21T09:41:21.3043658Z setp.ne.b32 %p132, %r19, %r494; 2026-02-21T09:41:21.3043710Z mov.b32 %r470, %r490; 2026-02-21T09:41:21.3043768Z mov.b32 %r473, %r22; 
2026-02-21T09:41:21.3043818Z mov.b32 %r474, %r491; 2026-02-21T09:41:21.3043868Z mov.b32 %r477, %r26; 2026-02-21T09:41:21.3043926Z mov.b32 %r478, %r493; 2026-02-21T09:41:21.3043978Z mov.b32 %r481, %r30; 2026-02-21T09:41:21.3044029Z mov.b32 %r483, %r459; 2026-02-21T09:41:21.3044080Z mov.b32 %r484, %r458; 2026-02-21T09:41:21.3044138Z mov.b32 %r490, %r496; 2026-02-21T09:41:21.3044189Z mov.b32 %r491, %r495; 2026-02-21T09:41:21.3044242Z mov.b32 %r493, %r45; 2026-02-21T09:41:21.3044305Z @%p132 bra $L__BB0_4; 2026-02-21T09:41:21.3044358Z bra.uni $L__BB0_11; 2026-02-21T09:41:21.3044482Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:21.3044655Z .loc 1 0 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:0:108 2026-02-21T09:41:21.3044739Z mov.b32 %r459, %r489; 2026-02-21T09:41:21.3044793Z mov.b32 %r30, %r480; 2026-02-21T09:41:21.3044846Z mov.b32 %r480, %r479; 2026-02-21T09:41:21.3044904Z mov.b32 %r479, %r478; 2026-02-21T09:41:21.3044955Z mov.b32 %r26, %r476; 2026-02-21T09:41:21.3045005Z mov.b32 %r476, %r475; 2026-02-21T09:41:21.3045062Z mov.b32 %r475, %r474; 2026-02-21T09:41:21.3045111Z mov.b32 %r22, %r472; 2026-02-21T09:41:21.3045164Z mov.b32 %r472, %r471; 2026-02-21T09:41:21.3045214Z mov.b32 %r471, %r470; 2026-02-21T09:41:21.3045391Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3045447Z add.s32 %r266, %r493, 1; 2026-02-21T09:41:21.3045509Z setp.eq.b32 %p110, %r493, 31; 2026-02-21T09:41:21.3045576Z selp.b32 %r45, 0, %r266, %p110; 2026-02-21T09:41:21.3045635Z setp.ne.b32 %p111, %r45, 0; 2026-02-21T09:41:21.3045686Z @%p111 bra $L__BB0_6; 2026-02-21T09:41:21.3045778Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3045840Z add.s32 %r497, %r497, 4736; 2026-02-21T09:41:21.3046005Z .loc 1 34 35 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:34:35 2026-02-21T09:41:21.3046069Z mul.hi.s32 %r267, %r497, 715827883; 2026-02-21T09:41:21.3046171Z shr.u32 %r268, %r267, 31; 
2026-02-21T09:41:21.3046227Z shr.s32 %r269, %r267, 10; 2026-02-21T09:41:21.3046282Z add.s32 %r270, %r269, %r268; 2026-02-21T09:41:21.3046452Z .loc 1 35 33 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:35:33 2026-02-21T09:41:21.3046507Z shl.b32 %r271, %r270, 6; 2026-02-21T09:41:21.3046699Z .loc 1 36 39 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:36:39 2026-02-21T09:41:21.3046765Z sub.s32 %r272, 16, %r271; 2026-02-21T09:41:21.3046926Z .loc 1 36 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:36:52 2026-02-21T09:41:21.3046980Z min.s32 %r273, %r272, 64; 2026-02-21T09:41:21.3047144Z .loc 1 37 45 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:45 2026-02-21T09:41:21.3047211Z mul.lo.s32 %r274, %r270, 6144; 2026-02-21T09:41:21.3047267Z sub.s32 %r275, %r497, %r274; 2026-02-21T09:41:21.3047434Z .loc 1 38 51 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:38:51 2026-02-21T09:41:21.3047500Z div.s32 %r276, %r275, %r273; 2026-02-21T09:41:21.3047662Z .loc 1 37 64 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:64 2026-02-21T09:41:21.3047721Z mul.lo.s32 %r277, %r276, %r273; 2026-02-21T09:41:21.3047810Z sub.s32 %r278, %r275, %r277; 2026-02-21T09:41:21.3047979Z .loc 1 37 30 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:37:30 2026-02-21T09:41:21.3048037Z add.s32 %r279, %r278, %r271; 2026-02-21T09:41:21.3048200Z .loc 1 39 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:39:27 2026-02-21T09:41:21.3048262Z shl.b32 %r495, %r279, 6; 2026-02-21T09:41:21.3048430Z .loc 1 41 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:41:27 2026-02-21T09:41:21.3048485Z shl.b32 %r496, %r276, 7; 2026-02-21T09:41:21.3048589Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3048771Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3048827Z add.s32 %r282, %r486, 1; 2026-02-21T09:41:21.3048899Z setp.gt.s32 %p113, %r282, 5; 
2026-02-21T09:41:21.3048963Z selp.b32 %r486, 0, %r282, %p113; 2026-02-21T09:41:21.3049024Z selp.b32 %r283, 1, 0, %p113; 2026-02-21T09:41:21.3049074Z xor.b32 %r485, %r485, %r283; 2026-02-21T09:41:21.3049158Z shl.b32 %r284, %r486, 3; 2026-02-21T09:41:21.3049214Z add.s32 %r286, %r60, %r284; 2026-02-21T09:41:21.3049265Z add.s32 %r280, %r286, 77824; 2026-02-21T09:41:21.3049322Z bar.sync 0; 2026-02-21T09:41:21.3049376Z // begin inline asm 2026-02-21T09:41:21.3049424Z 2026-02-21T09:41:21.3049469Z { 2026-02-21T09:41:21.3049530Z .reg .pred complete; 2026-02-21T09:41:21.3049579Z waitLoop: 2026-02-21T09:41:21.3049695Z mbarrier.try_wait.parity.shared.b64 complete, [%r280], %r485; 2026-02-21T09:41:21.3049762Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.3049809Z } 2026-02-21T09:41:21.3049813Z 2026-02-21T09:41:21.3049866Z // end inline asm 2026-02-21T09:41:21.3049926Z shl.b32 %r287, %r488, 3; 2026-02-21T09:41:21.3049981Z add.s32 %r288, %r60, %r287; 2026-02-21T09:41:21.3050036Z add.s32 %r458, %r288, 77872; 2026-02-21T09:41:21.3050201Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3050264Z @%p92 bra $L__BB0_8; 2026-02-21T09:41:21.3050357Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3050522Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3050587Z shl.b32 %r293, %r486, 12; 2026-02-21T09:41:21.3050641Z add.s32 %r295, %r60, %r293; 2026-02-21T09:41:21.3050694Z add.s32 %r296, %r295, 49152; 2026-02-21T09:41:21.3050861Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3050939Z shl.b32 %r297, %r486, 13; 2026-02-21T09:41:21.3050994Z add.s32 %r298, %r60, %r297; 2026-02-21T09:41:21.3051154Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3051220Z elect.sync %r299|%p115, -1; 2026-02-21T09:41:21.3051298Z bfe.u32 %r300, %r296, 4, 14; 2026-02-21T09:41:21.3051356Z cvt.u64.u32 
%rd68, %r300; 2026-02-21T09:41:21.3051436Z or.b64 %rd63, %rd68, -9223371899399045120; 2026-02-21T09:41:21.3051490Z bfe.u32 %r301, %r298, 4, 14; 2026-02-21T09:41:21.3051547Z cvt.u64.u32 %rd69, %r301; 2026-02-21T09:41:21.3051623Z or.b64 %rd64, %rd69, -9223371899382267904; 2026-02-21T09:41:21.3051678Z mov.b32 %r290, 69206032; 2026-02-21T09:41:21.3051731Z // begin inline asm 2026-02-21T09:41:21.3051872Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r468 + 0 ], %rd63, %rd64, %r290, %p142; 2026-02-21T09:41:21.3051933Z // end inline asm 2026-02-21T09:41:21.3051988Z add.s32 %r302, %r295, 49184; 2026-02-21T09:41:21.3052045Z bfe.u32 %r303, %r302, 4, 14; 2026-02-21T09:41:21.3052108Z cvt.u64.u32 %rd70, %r303; 2026-02-21T09:41:21.3052174Z or.b64 %rd65, %rd70, -9223371899399045120; 2026-02-21T09:41:21.3052228Z add.s32 %r304, %r298, 32; 2026-02-21T09:41:21.3052281Z bfe.u32 %r305, %r304, 4, 14; 2026-02-21T09:41:21.3052363Z cvt.u64.u32 %rd71, %r305; 2026-02-21T09:41:21.3052430Z or.b64 %rd66, %rd71, -9223371899382267904; 2026-02-21T09:41:21.3052489Z mov.pred %p116, -1; 2026-02-21T09:41:21.3052551Z // begin inline asm 2026-02-21T09:41:21.3052684Z @%p115 tcgen05.mma.cta_group::1.kind::f16 [ %r468 + 0 ], %rd65, %rd66, %r290, %p116; 2026-02-21T09:41:21.3052742Z // end inline asm 2026-02-21T09:41:21.3052803Z cvt.u64.u32 %rd67, %r458; 2026-02-21T09:41:21.3052857Z // begin inline asm 2026-02-21T09:41:21.3052977Z @%p115 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd67]; 2026-02-21T09:41:21.3053028Z // end inline asm 2026-02-21T09:41:21.3053127Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3053297Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3053354Z setp.eq.b32 %p122, %r45, 0; 2026-02-21T09:41:21.3053421Z setp.lt.s32 %p123, %r494, %r16; 2026-02-21T09:41:21.3053587Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3053641Z // begin inline asm 
2026-02-21T09:41:21.3053717Z 2026-02-21T09:41:21.3053766Z { 2026-02-21T09:41:21.3053822Z .reg .pred complete; 2026-02-21T09:41:21.3053871Z waitLoop: 2026-02-21T09:41:21.3053989Z mbarrier.try_wait.parity.shared.b64 complete, [%r484], %r483; 2026-02-21T09:41:21.3054046Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.3054094Z } 2026-02-21T09:41:21.3054098Z 2026-02-21T09:41:21.3054150Z // end inline asm 2026-02-21T09:41:21.3054202Z add.s32 %r317, %r488, 1; 2026-02-21T09:41:21.3054259Z setp.gt.s32 %p126, %r317, 1; 2026-02-21T09:41:21.3054422Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3054481Z add.s32 %r319, %r482, 32; 2026-02-21T09:41:21.3054535Z add.s32 %r320, %r487, 1; 2026-02-21T09:41:21.3054588Z setp.gt.s32 %p127, %r320, 5; 2026-02-21T09:41:21.3054655Z selp.b32 %r487, 0, %r320, %p127; 2026-02-21T09:41:21.3054740Z selp.b32 %r482, 0, %r319, %p122; 2026-02-21T09:41:21.3054793Z shl.b32 %r321, %r487, 3; 2026-02-21T09:41:21.3054850Z add.s32 %r323, %r60, %r321; 2026-02-21T09:41:21.3054903Z add.s32 %r312, %r323, 77824; 2026-02-21T09:41:21.3054955Z bar.sync 0; 2026-02-21T09:41:21.3055018Z and.pred %p119, %p133, %p123; 2026-02-21T09:41:21.3055079Z // begin inline asm 2026-02-21T09:41:21.3055187Z @%p119 mbarrier.arrive.expect_tx.shared.b64 _, [%r312], 12288; 2026-02-21T09:41:21.3055240Z // end inline asm 2026-02-21T09:41:21.3055410Z .loc 1 51 31 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:51:31 2026-02-21T09:41:21.3055492Z shl.b32 %r324, %r487, 12; 2026-02-21T09:41:21.3055547Z add.s32 %r325, %r60, %r324; 2026-02-21T09:41:21.3055602Z add.s32 %r309, %r325, 49152; 2026-02-21T09:41:21.3055661Z bar.sync 0; 2026-02-21T09:41:21.3055721Z elect.sync %r326|%p128, -1; 2026-02-21T09:41:21.3055782Z and.pred %p129, %p123, %p128; 2026-02-21T09:41:21.3055878Z and.pred %p120, %p4, %p129; 2026-02-21T09:41:21.3055934Z // begin inline asm 2026-02-21T09:41:21.3056189Z @%p120 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r309], [%rd40, {%r482, %r495}], [%r312]; 2026-02-21T09:41:21.3056254Z // end inline asm 2026-02-21T09:41:21.3056421Z .loc 1 52 44 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:52:44 2026-02-21T09:41:21.3056477Z shl.b32 %r327, %r487, 13; 2026-02-21T09:41:21.3056533Z add.s32 %r313, %r60, %r327; 2026-02-21T09:41:21.3056593Z bar.sync 0; 2026-02-21T09:41:21.3056653Z elect.sync %r328|%p130, -1; 2026-02-21T09:41:21.3056715Z and.pred %p131, %p123, %p130; 2026-02-21T09:41:21.3056782Z and.pred %p121, %p4, %p131; 2026-02-21T09:41:21.3056834Z // begin inline asm 2026-02-21T09:41:21.3057071Z @%p121 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r313], [%rd41, {%r482, %r496}], [%r312]; 2026-02-21T09:41:21.3057155Z // end inline asm 2026-02-21T09:41:21.3057327Z .loc 1 28 108 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3057386Z setp.ne.b32 %p142, %r481, 31; 2026-02-21T09:41:21.3057442Z @%p142 bra $L__BB0_10; 2026-02-21T09:41:21.3057541Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:21.3057706Z .loc 1 40 32 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:40:32 2026-02-21T09:41:21.3057764Z add.s32 %r401, %r477, %r4; 2026-02-21T09:41:21.3057827Z add.s32 %r402, %r5, %r477; 2026-02-21T09:41:21.3057881Z add.s32 %r403, %r6, %r477; 2026-02-21T09:41:21.3057936Z add.s32 %r404, %r477, %r7; 2026-02-21T09:41:21.3058106Z .loc 1 42 32 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:42:32 2026-02-21T09:41:21.3058160Z add.s32 %r405, %r473, %r9; 2026-02-21T09:41:21.3058328Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3058387Z // begin inline asm 2026-02-21T09:41:21.3058435Z 2026-02-21T09:41:21.3058507Z { 2026-02-21T09:41:21.3058563Z .reg .pred complete; 2026-02-21T09:41:21.3058623Z waitLoop: 2026-02-21T09:41:21.3058734Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r458], %r459; 2026-02-21T09:41:21.3058792Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.3058846Z } 2026-02-21T09:41:21.3058850Z 2026-02-21T09:41:21.3058902Z // end inline asm 2026-02-21T09:41:21.3059059Z .loc 1 56 53 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:56:53 2026-02-21T09:41:21.3059125Z mad.lo.s32 %r406, %r401, 12288, %r405; 2026-02-21T09:41:21.3059196Z mad.lo.s32 %r407, %r402, 12288, %r405; 2026-02-21T09:41:21.3059257Z mad.lo.s32 %r408, %r403, 12288, %r405; 2026-02-21T09:41:21.3059319Z mad.lo.s32 %r409, %r404, 12288, %r405; 2026-02-21T09:41:21.3059489Z .loc 1 56 24 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:56:24 2026-02-21T09:41:21.3059555Z mad.wide.s32 %rd74, %r406, 2, %rd3; 2026-02-21T09:41:21.3059617Z mad.wide.s32 %rd75, %r407, 2, %rd3; 2026-02-21T09:41:21.3059684Z mad.wide.s32 %rd76, %r408, 2, %rd3; 2026-02-21T09:41:21.3059743Z mad.wide.s32 %rd77, %r409, 2, %rd3; 2026-02-21T09:41:21.3059907Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3059960Z // begin inline asm 2026-02-21T09:41:21.3060255Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346}, [%r77 + 0], 32; 2026-02-21T09:41:21.3060333Z // end inline asm 2026-02-21T09:41:21.3060390Z // begin inline asm 2026-02-21T09:41:21.3060685Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r348, %r349, %r350, %r351, %r352, %r353, %r354, %r355, %r356, %r357, %r358, %r359, %r360, %r361, %r362, %r363}, [%r77 + 16], 32; 2026-02-21T09:41:21.3060739Z // end inline asm 2026-02-21T09:41:21.3060796Z // begin inline asm 2026-02-21T09:41:21.3060895Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:21.3060951Z // end inline asm 2026-02-21T09:41:21.3061011Z cvt.u64.u32 %rd78, %r331; 2026-02-21T09:41:21.3061068Z cvt.u64.u32 %rd79, %r332; 2026-02-21T09:41:21.3061132Z shl.b64 %rd80, %rd79, 32; 
2026-02-21T09:41:21.3061191Z or.b64 %rd81, %rd78, %rd80; 2026-02-21T09:41:21.3061359Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3061427Z mov.b64 {%r410, %r411}, %rd81; 2026-02-21T09:41:21.3061494Z cvt.rn.f16x2.f32 %r412, %r411, %r410; 2026-02-21T09:41:21.3061660Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3061726Z cvt.u64.u32 %rd82, %r333; 2026-02-21T09:41:21.3061782Z cvt.u64.u32 %rd83, %r334; 2026-02-21T09:41:21.3061837Z shl.b64 %rd84, %rd83, 32; 2026-02-21T09:41:21.3061893Z or.b64 %rd85, %rd82, %rd84; 2026-02-21T09:41:21.3062088Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3062153Z mov.b64 {%r413, %r414}, %rd85; 2026-02-21T09:41:21.3062219Z cvt.rn.f16x2.f32 %r415, %r414, %r413; 2026-02-21T09:41:21.3062392Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3062448Z cvt.u64.u32 %rd86, %r335; 2026-02-21T09:41:21.3062505Z cvt.u64.u32 %rd87, %r336; 2026-02-21T09:41:21.3062567Z shl.b64 %rd88, %rd87, 32; 2026-02-21T09:41:21.3062623Z or.b64 %rd89, %rd86, %rd88; 2026-02-21T09:41:21.3062792Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3062852Z mov.b64 {%r416, %r417}, %rd89; 2026-02-21T09:41:21.3062921Z cvt.rn.f16x2.f32 %r418, %r417, %r416; 2026-02-21T09:41:21.3063088Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3063146Z cvt.u64.u32 %rd90, %r337; 2026-02-21T09:41:21.3063210Z cvt.u64.u32 %rd91, %r338; 2026-02-21T09:41:21.3063267Z shl.b64 %rd92, %rd91, 32; 2026-02-21T09:41:21.3063358Z or.b64 %rd93, %rd90, %rd92; 2026-02-21T09:41:21.3063534Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3063591Z mov.b64 {%r419, %r420}, %rd93; 2026-02-21T09:41:21.3063656Z cvt.rn.f16x2.f32 %r421, %r420, %r419; 
2026-02-21T09:41:21.3063824Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3063885Z cvt.u64.u32 %rd94, %r339; 2026-02-21T09:41:21.3063943Z cvt.u64.u32 %rd95, %r340; 2026-02-21T09:41:21.3064000Z shl.b64 %rd96, %rd95, 32; 2026-02-21T09:41:21.3064063Z or.b64 %rd97, %rd94, %rd96; 2026-02-21T09:41:21.3064236Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3064293Z mov.b64 {%r422, %r423}, %rd97; 2026-02-21T09:41:21.3064363Z cvt.rn.f16x2.f32 %r424, %r423, %r422; 2026-02-21T09:41:21.3064536Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3064594Z cvt.u64.u32 %rd98, %r341; 2026-02-21T09:41:21.3064650Z cvt.u64.u32 %rd99, %r342; 2026-02-21T09:41:21.3064752Z shl.b64 %rd100, %rd99, 32; 2026-02-21T09:41:21.3064815Z or.b64 %rd101, %rd98, %rd100; 2026-02-21T09:41:21.3064977Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3065044Z mov.b64 {%r425, %r426}, %rd101; 2026-02-21T09:41:21.3065107Z cvt.rn.f16x2.f32 %r427, %r426, %r425; 2026-02-21T09:41:21.3065307Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3065372Z cvt.u64.u32 %rd102, %r343; 2026-02-21T09:41:21.3065431Z cvt.u64.u32 %rd103, %r344; 2026-02-21T09:41:21.3065489Z shl.b64 %rd104, %rd103, 32; 2026-02-21T09:41:21.3065550Z or.b64 %rd105, %rd102, %rd104; 2026-02-21T09:41:21.3065763Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3065829Z mov.b64 {%r428, %r429}, %rd105; 2026-02-21T09:41:21.3065893Z cvt.rn.f16x2.f32 %r430, %r429, %r428; 2026-02-21T09:41:21.3066074Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3066133Z cvt.u64.u32 %rd106, %r345; 2026-02-21T09:41:21.3066190Z cvt.u64.u32 %rd107, %r346; 2026-02-21T09:41:21.3066254Z shl.b64 %rd108, %rd107, 32; 
2026-02-21T09:41:21.3066313Z or.b64 %rd109, %rd106, %rd108; 2026-02-21T09:41:21.3066492Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3066550Z mov.b64 {%r431, %r432}, %rd109; 2026-02-21T09:41:21.3066619Z cvt.rn.f16x2.f32 %r433, %r432, %r431; 2026-02-21T09:41:21.3066821Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3066881Z cvt.u64.u32 %rd110, %r348; 2026-02-21T09:41:21.3066945Z cvt.u64.u32 %rd111, %r349; 2026-02-21T09:41:21.3067001Z shl.b64 %rd112, %rd111, 32; 2026-02-21T09:41:21.3067058Z or.b64 %rd113, %rd110, %rd112; 2026-02-21T09:41:21.3067233Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3067292Z mov.b64 {%r434, %r435}, %rd113; 2026-02-21T09:41:21.3067355Z cvt.rn.f16x2.f32 %r436, %r435, %r434; 2026-02-21T09:41:21.3067522Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3067586Z cvt.u64.u32 %rd114, %r350; 2026-02-21T09:41:21.3067642Z cvt.u64.u32 %rd115, %r351; 2026-02-21T09:41:21.3067700Z shl.b64 %rd116, %rd115, 32; 2026-02-21T09:41:21.3067764Z or.b64 %rd117, %rd114, %rd116; 2026-02-21T09:41:21.3067935Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3067992Z mov.b64 {%r437, %r438}, %rd117; 2026-02-21T09:41:21.3068057Z cvt.rn.f16x2.f32 %r439, %r438, %r437; 2026-02-21T09:41:21.3068257Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3068312Z cvt.u64.u32 %rd118, %r352; 2026-02-21T09:41:21.3068365Z cvt.u64.u32 %rd119, %r353; 2026-02-21T09:41:21.3068426Z shl.b64 %rd120, %rd119, 32; 2026-02-21T09:41:21.3068482Z or.b64 %rd121, %rd118, %rd120; 2026-02-21T09:41:21.3068645Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3068709Z mov.b64 {%r440, %r441}, %rd121; 2026-02-21T09:41:21.3068768Z cvt.rn.f16x2.f32 %r442, 
%r441, %r440; 2026-02-21T09:41:21.3068932Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3068993Z cvt.u64.u32 %rd122, %r354; 2026-02-21T09:41:21.3069049Z cvt.u64.u32 %rd123, %r355; 2026-02-21T09:41:21.3069104Z shl.b64 %rd124, %rd123, 32; 2026-02-21T09:41:21.3069160Z or.b64 %rd125, %rd122, %rd124; 2026-02-21T09:41:21.3069333Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3069389Z mov.b64 {%r443, %r444}, %rd125; 2026-02-21T09:41:21.3069450Z cvt.rn.f16x2.f32 %r445, %r444, %r443; 2026-02-21T09:41:21.3069618Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3069673Z cvt.u64.u32 %rd126, %r356; 2026-02-21T09:41:21.3069729Z cvt.u64.u32 %rd127, %r357; 2026-02-21T09:41:21.3069812Z shl.b64 %rd128, %rd127, 32; 2026-02-21T09:41:21.3069869Z or.b64 %rd129, %rd126, %rd128; 2026-02-21T09:41:21.3070032Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3070088Z mov.b64 {%r446, %r447}, %rd129; 2026-02-21T09:41:21.3070155Z cvt.rn.f16x2.f32 %r448, %r447, %r446; 2026-02-21T09:41:21.3070342Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3070399Z cvt.u64.u32 %rd130, %r358; 2026-02-21T09:41:21.3070460Z cvt.u64.u32 %rd131, %r359; 2026-02-21T09:41:21.3070514Z shl.b64 %rd132, %rd131, 32; 2026-02-21T09:41:21.3070568Z or.b64 %rd133, %rd130, %rd132; 2026-02-21T09:41:21.3070737Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3070794Z mov.b64 {%r449, %r450}, %rd133; 2026-02-21T09:41:21.3070853Z cvt.rn.f16x2.f32 %r451, %r450, %r449; 2026-02-21T09:41:21.3071019Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3071084Z cvt.u64.u32 %rd134, %r360; 2026-02-21T09:41:21.3071139Z cvt.u64.u32 %rd135, %r361; 2026-02-21T09:41:21.3071193Z shl.b64 %rd136, %rd135, 
32; 2026-02-21T09:41:21.3071257Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:41:21.3071449Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3071508Z mov.b64 {%r452, %r453}, %rd137; 2026-02-21T09:41:21.3071574Z cvt.rn.f16x2.f32 %r454, %r453, %r452; 2026-02-21T09:41:21.3071733Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3071788Z cvt.u64.u32 %rd138, %r362; 2026-02-21T09:41:21.3071842Z cvt.u64.u32 %rd139, %r363; 2026-02-21T09:41:21.3071905Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:41:21.3071960Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:41:21.3072120Z .loc 1 55 27 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:55:27 2026-02-21T09:41:21.3072184Z mov.b64 {%r455, %r456}, %rd141; 2026-02-21T09:41:21.3072245Z cvt.rn.f16x2.f32 %r457, %r456, %r455; 2026-02-21T09:41:21.3072402Z .loc 1 56 83 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:56:83 2026-02-21T09:41:21.3072503Z st.shared.v4.b32 [%r17], {%r412, %r424, %r436, %r448}; 2026-02-21T09:41:21.3072556Z bar.sync 0; 2026-02-21T09:41:21.3072642Z // begin inline asm 2026-02-21T09:41:21.3072790Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r389, %r393, %r397}, [%r369]; 2026-02-21T09:41:21.3072854Z // end inline asm 2026-02-21T09:41:21.3072905Z bar.sync 0; 2026-02-21T09:41:21.3072995Z st.shared.v4.b32 [%r17], {%r415, %r427, %r439, %r451}; 2026-02-21T09:41:21.3073053Z bar.sync 0; 2026-02-21T09:41:21.3073110Z // begin inline asm 2026-02-21T09:41:21.3073255Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r386, %r390, %r394, %r398}, [%r369]; 2026-02-21T09:41:21.3073311Z // end inline asm 2026-02-21T09:41:21.3073374Z bar.sync 0; 2026-02-21T09:41:21.3073461Z st.shared.v4.b32 [%r17], {%r418, %r430, %r442, %r454}; 2026-02-21T09:41:21.3073521Z bar.sync 0; 2026-02-21T09:41:21.3073582Z // begin inline asm 2026-02-21T09:41:21.3073723Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r387, %r391, %r395, %r399}, 
[%r369]; 2026-02-21T09:41:21.3073778Z // end inline asm 2026-02-21T09:41:21.3073837Z bar.sync 0; 2026-02-21T09:41:21.3073925Z st.shared.v4.b32 [%r17], {%r421, %r433, %r445, %r457}; 2026-02-21T09:41:21.3073980Z bar.sync 0; 2026-02-21T09:41:21.3074037Z // begin inline asm 2026-02-21T09:41:21.3074187Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r388, %r392, %r396, %r400}, [%r369]; 2026-02-21T09:41:21.3074243Z // end inline asm 2026-02-21T09:41:21.3074299Z // begin inline asm 2026-02-21T09:41:21.3074405Z st.global.v4.b32 [ %rd74 + 0 ], { %r385, %r386, %r387, %r388 }; 2026-02-21T09:41:21.3074457Z // end inline asm 2026-02-21T09:41:21.3074510Z // begin inline asm 2026-02-21T09:41:21.3074635Z st.global.v4.b32 [ %rd75 + 0 ], { %r389, %r390, %r391, %r392 }; 2026-02-21T09:41:21.3074726Z // end inline asm 2026-02-21T09:41:21.3074781Z // begin inline asm 2026-02-21T09:41:21.3074874Z st.global.v4.b32 [ %rd76 + 0 ], { %r393, %r394, %r395, %r396 }; 2026-02-21T09:41:21.3074935Z // end inline asm 2026-02-21T09:41:21.3075016Z // begin inline asm 2026-02-21T09:41:21.3075111Z st.global.v4.b32 [ %rd77 + 0 ], { %r397, %r398, %r399, %r400 }; 2026-02-21T09:41:21.3075171Z // end inline asm 2026-02-21T09:41:21.3075226Z bra.uni $L__BB0_10; 2026-02-21T09:41:21.3075306Z $L__BB0_11: // %._crit_edge 2026-02-21T09:41:21.3075475Z .loc 1 53 52 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:53:52 2026-02-21T09:41:21.3075536Z // begin inline asm 2026-02-21T09:41:21.3075585Z 2026-02-21T09:41:21.3075635Z { 2026-02-21T09:41:21.3075699Z .reg .pred complete; 2026-02-21T09:41:21.3075750Z waitLoop: 2026-02-21T09:41:21.3075867Z mbarrier.try_wait.parity.shared.b64 complete, [%r458], %r459; 2026-02-21T09:41:21.3075932Z @!complete bra.uni waitLoop; 2026-02-21T09:41:21.3075987Z } 2026-02-21T09:41:21.3075990Z 2026-02-21T09:41:21.3076043Z // end inline asm 2026-02-21T09:41:21.3076132Z $L__BB0_12: // %._crit_edge.thread 2026-02-21T09:41:21.3076342Z .loc 1 28 108 // 
ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:108 2026-02-21T09:41:21.3076400Z bar.sync 0; 2026-02-21T09:41:21.3076455Z // begin inline asm 2026-02-21T09:41:21.3076547Z @%p133 mbarrier.inval.shared::cta.b64 [%r113]; 2026-02-21T09:41:21.3076599Z // end inline asm 2026-02-21T09:41:21.3076656Z bar.sync 0; 2026-02-21T09:41:21.3076708Z // begin inline asm 2026-02-21T09:41:21.3076796Z @%p133 mbarrier.inval.shared::cta.b64 [%r114]; 2026-02-21T09:41:21.3076847Z // end inline asm 2026-02-21T09:41:21.3076897Z bar.sync 0; 2026-02-21T09:41:21.3076955Z // begin inline asm 2026-02-21T09:41:21.3077034Z @%p133 mbarrier.inval.shared::cta.b64 [%r115]; 2026-02-21T09:41:21.3077085Z // end inline asm 2026-02-21T09:41:21.3077135Z bar.sync 0; 2026-02-21T09:41:21.3077195Z // begin inline asm 2026-02-21T09:41:21.3077270Z @%p133 mbarrier.inval.shared::cta.b64 [%r116]; 2026-02-21T09:41:21.3077322Z // end inline asm 2026-02-21T09:41:21.3077381Z bar.sync 0; 2026-02-21T09:41:21.3077435Z // begin inline asm 2026-02-21T09:41:21.3077508Z @%p133 mbarrier.inval.shared::cta.b64 [%r117]; 2026-02-21T09:41:21.3077586Z // end inline asm 2026-02-21T09:41:21.3077643Z bar.sync 0; 2026-02-21T09:41:21.3077696Z // begin inline asm 2026-02-21T09:41:21.3077773Z @%p133 mbarrier.inval.shared::cta.b64 [%r223]; 2026-02-21T09:41:21.3077832Z // end inline asm 2026-02-21T09:41:21.3077883Z // begin inline asm 2026-02-21T09:41:21.3077958Z @%p133 mbarrier.inval.shared::cta.b64 [%r111]; 2026-02-21T09:41:21.3078015Z // end inline asm 2026-02-21T09:41:21.3078066Z bar.sync 0; 2026-02-21T09:41:21.3078118Z // begin inline asm 2026-02-21T09:41:21.3078195Z @%p133 mbarrier.inval.shared::cta.b64 [%r112]; 2026-02-21T09:41:21.3078252Z // end inline asm 2026-02-21T09:41:21.3078418Z .loc 1 28 4 // ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py:28:4 2026-02-21T09:41:21.3078470Z bar.sync 0; 2026-02-21T09:41:21.3078530Z // begin inline asm 2026-02-21T09:41:21.3078647Z @%p4 
tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r468, 128; 2026-02-21T09:41:21.3078699Z // end inline asm 2026-02-21T09:41:21.3078751Z ret; 2026-02-21T09:41:21.3078811Z $L__tmp1: 2026-02-21T09:41:21.3078864Z $L__func_end0: 2026-02-21T09:41:21.3078947Z // -- End function 2026-02-21T09:41:21.3079003Z } 2026-02-21T09:41:21.3079208Z .file 1 "/tmp/torchinductor_root/ks/ckscx4g6o62usnwjrsx7rm6kbblexdepklqaoraktv3ffafyomx3.py" 2026-02-21T09:41:21.3079267Z .section .debug_abbrev 2026-02-21T09:41:21.3079322Z { 2026-02-21T09:41:21.3079407Z .b8 1 // Abbreviation Code 2026-02-21T09:41:21.3079516Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:21.3079593Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:21.3079677Z .b8 37 // DW_AT_producer 2026-02-21T09:41:21.3079750Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.3079842Z .b8 19 // DW_AT_language 2026-02-21T09:41:21.3079924Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:21.3079998Z .b8 3 // DW_AT_name 2026-02-21T09:41:21.3080069Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.3080151Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:21.3080224Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:21.3080295Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:21.3080367Z .b8 8 // DW_FORM_string 2026-02-21T09:41:21.3080445Z .b8 0 // EOM(1) 2026-02-21T09:41:21.3080513Z .b8 0 // EOM(2) 2026-02-21T09:41:21.3080579Z .b8 0 // EOM(3) 2026-02-21T09:41:21.3080638Z } 2026-02-21T09:41:21.3080695Z .section .debug_info 2026-02-21T09:41:21.3080780Z { 2026-02-21T09:41:21.3080869Z .b32 104 // Length of Unit 2026-02-21T09:41:21.3080952Z .b8 2 // DWARF version number 2026-02-21T09:41:21.3081001Z .b8 0 2026-02-21T09:41:21.3081113Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:41:21.3081204Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:21.3081301Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:21.3081378Z .b8 116 // DW_AT_producer 2026-02-21T09:41:21.3081438Z .b8 114 2026-02-21T09:41:21.3081491Z .b8 105 2026-02-21T09:41:21.3081540Z .b8 116 2026-02-21T09:41:21.3081588Z .b8 111 2026-02-21T09:41:21.3081643Z .b8 110 2026-02-21T09:41:21.3081691Z .b8 0 2026-02-21T09:41:21.3081762Z .b8 2 // DW_AT_language 2026-02-21T09:41:21.3081818Z .b8 0 2026-02-21T09:41:21.3081891Z .b8 99 // DW_AT_name 2026-02-21T09:41:21.3081942Z .b8 107 2026-02-21T09:41:21.3081990Z .b8 115 2026-02-21T09:41:21.3082075Z .b8 99 2026-02-21T09:41:21.3082124Z .b8 120 2026-02-21T09:41:21.3082172Z .b8 52 2026-02-21T09:41:21.3082231Z .b8 103 2026-02-21T09:41:21.3082280Z .b8 54 2026-02-21T09:41:21.3082329Z .b8 111 2026-02-21T09:41:21.3082377Z .b8 54 2026-02-21T09:41:21.3082434Z .b8 50 2026-02-21T09:41:21.3082482Z .b8 117 2026-02-21T09:41:21.3082529Z .b8 115 2026-02-21T09:41:21.3082584Z .b8 110 2026-02-21T09:41:21.3082631Z .b8 119 2026-02-21T09:41:21.3082680Z .b8 106 2026-02-21T09:41:21.3082727Z .b8 114 2026-02-21T09:41:21.3082782Z .b8 115 2026-02-21T09:41:21.3082832Z .b8 120 2026-02-21T09:41:21.3082880Z .b8 55 2026-02-21T09:41:21.3082928Z .b8 114 2026-02-21T09:41:21.3082984Z .b8 109 2026-02-21T09:41:21.3083031Z .b8 54 2026-02-21T09:41:21.3083078Z .b8 107 2026-02-21T09:41:21.3083135Z .b8 98 2026-02-21T09:41:21.3083183Z .b8 98 2026-02-21T09:41:21.3083231Z .b8 108 2026-02-21T09:41:21.3083279Z .b8 101 2026-02-21T09:41:21.3083334Z .b8 120 2026-02-21T09:41:21.3083383Z .b8 100 2026-02-21T09:41:21.3083430Z .b8 101 2026-02-21T09:41:21.3083486Z .b8 112 2026-02-21T09:41:21.3083533Z .b8 107 2026-02-21T09:41:21.3083582Z .b8 108 2026-02-21T09:41:21.3083630Z .b8 113 2026-02-21T09:41:21.3083685Z .b8 97 2026-02-21T09:41:21.3083733Z .b8 111 2026-02-21T09:41:21.3083782Z .b8 114 2026-02-21T09:41:21.3083836Z .b8 97 2026-02-21T09:41:21.3083882Z .b8 107 
2026-02-21T09:41:21.3083932Z .b8 116 2026-02-21T09:41:21.3083979Z .b8 118 2026-02-21T09:41:21.3084033Z .b8 51 2026-02-21T09:41:21.3084080Z .b8 102 2026-02-21T09:41:21.3084127Z .b8 102 2026-02-21T09:41:21.3084175Z .b8 97 2026-02-21T09:41:21.3084256Z .b8 102 2026-02-21T09:41:21.3084304Z .b8 121 2026-02-21T09:41:21.3084352Z .b8 111 2026-02-21T09:41:21.3084406Z .b8 109 2026-02-21T09:41:21.3084453Z .b8 120 2026-02-21T09:41:21.3084500Z .b8 51 2026-02-21T09:41:21.3084547Z .b8 46 2026-02-21T09:41:21.3084604Z .b8 112 2026-02-21T09:41:21.3084651Z .b8 121 2026-02-21T09:41:21.3084726Z .b8 0 2026-02-21T09:41:21.3084850Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:21.3084924Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:21.3084974Z .b8 116 2026-02-21T09:41:21.3085021Z .b8 109 2026-02-21T09:41:21.3085076Z .b8 112 2026-02-21T09:41:21.3085123Z .b8 47 2026-02-21T09:41:21.3085171Z .b8 116 2026-02-21T09:41:21.3085225Z .b8 111 2026-02-21T09:41:21.3085273Z .b8 114 2026-02-21T09:41:21.3085319Z .b8 99 2026-02-21T09:41:21.3085367Z .b8 104 2026-02-21T09:41:21.3085424Z .b8 105 2026-02-21T09:41:21.3085472Z .b8 110 2026-02-21T09:41:21.3085519Z .b8 100 2026-02-21T09:41:21.3085575Z .b8 117 2026-02-21T09:41:21.3085624Z .b8 99 2026-02-21T09:41:21.3085672Z .b8 116 2026-02-21T09:41:21.3085720Z .b8 111 2026-02-21T09:41:21.3085779Z .b8 114 2026-02-21T09:41:21.3085828Z .b8 95 2026-02-21T09:41:21.3085879Z .b8 114 2026-02-21T09:41:21.3085928Z .b8 111 2026-02-21T09:41:21.3085987Z .b8 111 2026-02-21T09:41:21.3086037Z .b8 116 2026-02-21T09:41:21.3086086Z .b8 47 2026-02-21T09:41:21.3086169Z .b8 107 2026-02-21T09:41:21.3086221Z .b8 115 2026-02-21T09:41:21.3086269Z .b8 0 2026-02-21T09:41:21.3086317Z } 2026-02-21T09:41:21.3086387Z .section .debug_macinfo { } 2026-02-21T09:41:21.3086391Z 2026-02-21T09:41:21.3086464Z ================================================================ 2026-02-21T09:41:21.3086566Z please share the reproducer above with Triton project. 
2026-02-21T09:41:24.3723096Z 2026-02-21T09:41:24.3727692Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 17.2 configs/s 2026-02-21T09:41:24.3737031Z [30s] Generation 1 complete: 2026-02-21T09:41:24.3740837Z error=19 2026-02-21T09:41:24.3742272Z ok=70 2026-02-21T09:41:24.3742438Z min=0.0635 2026-02-21T09:41:24.3742579Z mid=0.2806 2026-02-21T09:41:24.3742702Z max=2.7176 2026-02-21T09:41:24.3742847Z best={'block_sizes': [64, 256, 32], 2026-02-21T09:41:24.3743087Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:41:24.3743316Z 'l2_groupings': [4], 2026-02-21T09:41:24.3743505Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:41:24.3743714Z 'loop_orders': [[0, 1]], 2026-02-21T09:41:24.3744147Z 'maxnreg': 128, 2026-02-21T09:41:24.3744302Z 'num_sm_multiplier': 2, 2026-02-21T09:41:24.3744463Z 'num_stages': 7, 2026-02-21T09:41:24.3744599Z 'num_warps': 8, 2026-02-21T09:41:24.3744866Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:41:24.3745061Z 'range_flattens': [True, False], 2026-02-21T09:41:24.3745252Z 'range_multi_buffers': [False, False], 2026-02-21T09:41:24.3745446Z 'range_num_stages': [0, 0], 2026-02-21T09:41:24.3745615Z 'range_unroll_factors': [0, 0], 2026-02-21T09:41:24.3745794Z 'range_warp_specializes': [False, False]} 2026-02-21T09:41:24.3754989Z [30s] Fitting surrogate: 189 points, 189 targets 2026-02-21T09:41:25.6328694Z [31s] Generation 2 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:41:35.0181200Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86/86 2.5 configs/s 2026-02-21T09:41:35.6501294Z 2026-02-21T09:41:35.6504510Z 2026-02-21T09:41:35.6505089Z ================================================================ 2026-02-21T09:41:35.6505388Z Internal Triton PTX codegen error 2026-02-21T09:41:35.6505560Z `ptxas` stderr: 2026-02-21T09:41:35.6505994Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 146 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T09:41:35.6506466Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:35.6506610Z 2026-02-21T09:41:35.6507011Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpuy44dvwe.ptx -o /tmp/tmpuy44dvwe.ptx.o 2026-02-21T09:41:35.6507746Z 2026-02-21T09:41:35.6507751Z 2026-02-21T09:41:35.6507810Z // 2026-02-21T09:41:35.6507950Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:41:35.6508127Z // 2026-02-21T09:41:35.6508193Z 2026-02-21T09:41:35.6508256Z .version 8.7 2026-02-21T09:41:35.6508410Z .target sm_100a 2026-02-21T09:41:35.6508628Z .address_size 64 2026-02-21T09:41:35.6508714Z 2026-02-21T09:41:35.6508836Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:41:35.6509102Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:41:35.6509313Z // @_helion_matmul 2026-02-21T09:41:35.6509527Z .visible .entry _helion_matmul( 2026-02-21T09:41:35.6509739Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:41:35.6510002Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:41:35.6510257Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:41:35.6510496Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:41:35.6510750Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:41:35.6510948Z ) 2026-02-21T09:41:35.6511078Z .reqntid 128 2026-02-21T09:41:35.6511205Z .maxnreg 32 2026-02-21T09:41:35.6511352Z { 2026-02-21T09:41:35.6511534Z .reg .pred %p<137>; 2026-02-21T09:41:35.6511681Z .reg .b32 %r<1720>; 2026-02-21T09:41:35.6511822Z .reg .b64 %rd<631>; 2026-02-21T09:41:35.6512077Z .loc 1 19 0 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:19:0 2026-02-21T09:41:35.6512373Z $L__func_begin0: 2026-02-21T09:41:35.6512613Z .loc 1 19 0 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:19:0 2026-02-21T09:41:35.6512848Z 2026-02-21T09:41:35.6512898Z // %bb.0: 2026-02-21T09:41:35.6513055Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:41:35.6513262Z $L__tmp0: 2026-02-21T09:41:35.6513909Z [41s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:41:35.6515154Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 512, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, False]), static_shapes=True) 2026-02-21T09:41:35.6516388Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:41:35.6516623Z `ptxas` stderr: 2026-02-21T09:41:35.6517017Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 146 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:41:35.6517486Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:41:35.6517628Z 2026-02-21T09:41:35.6518014Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpuy44dvwe.ptx -o /tmp/tmpuy44dvwe.ptx.o 2026-02-21T09:41:35.6518433Z 2026-02-21T09:41:35.6518561Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:41:35.6518907Z .loc 1 19 0 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:19 2026-02-21T09:41:35.6519214Z mov.u32 %r1, %tid.x; 2026-02-21T09:41:35.6519387Z ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T09:41:35.6519595Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:41:35.6519760Z mov.b32 %r123, global_smem; 2026-02-21T09:41:35.6519928Z // begin inline asm 2026-02-21T09:41:35.6520177Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r123], 256; 2026-02-21T09:41:35.6520433Z // end inline asm 2026-02-21T09:41:35.6520600Z ld.param.b64 %rd42, [_helion_matmul_param_3]; 2026-02-21T09:41:35.6520796Z bar.sync 0; 2026-02-21T09:41:35.6520978Z ld.shared.b32 %r1681, [global_smem]; 2026-02-21T09:41:35.6521154Z bar.sync 0; 2026-02-21T09:41:35.6521288Z // begin inline asm 2026-02-21T09:41:35.6521490Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:41:35.6521723Z // end inline asm 2026-02-21T09:41:35.6522008Z .loc 1 21 68 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:21:68 2026-02-21T09:41:35.6522317Z mov.u32 %r469, %ctaid.x; 2026-02-21T09:41:35.6522474Z mov.u32 %r470, %ctaid.y; 2026-02-21T09:41:35.6522633Z mov.u32 %r471, %ctaid.z; 2026-02-21T09:41:35.6522783Z mov.u32 %r472, %nctaid.x; 2026-02-21T09:41:35.6522945Z mov.u32 %r473, %nctaid.y; 2026-02-21T09:41:35.6523114Z mad.lo.s32 %r474, %r471, %r473, %r470; 2026-02-21T09:41:35.6523299Z mad.lo.s32 %r475, %r474, %r472, %r469; 2026-02-21T09:41:35.6523478Z shl.b32 %r476, %r475, 7; 2026-02-21T09:41:35.6523628Z cvt.s64.s32 %rd43, %r476; 2026-02-21T09:41:35.6523794Z add.s64 %rd21, %rd42, %rd43; 2026-02-21T09:41:35.6523957Z shl.b32 %r477, %r1, 2; 2026-02-21T09:41:35.6524118Z add.s32 %r124, %r123, %r477; 2026-02-21T09:41:35.6524273Z mov.b32 %r1719, 0; 2026-02-21T09:41:35.6524421Z // begin inline asm 2026-02-21T09:41:35.6524583Z @%p3 st.shared.b32 [ %r124 + 0 ], %r1719; 2026-02-21T09:41:35.6524823Z // end inline asm 2026-02-21T09:41:35.6525012Z 
bar.warp.sync -1; 2026-02-21T09:41:35.6525168Z setp.eq.b32 %p126, %r1, 0; 2026-02-21T09:41:35.6525333Z cvt.u64.u32 %rd6, %r123; 2026-02-21T09:41:35.6525486Z // begin inline asm 2026-02-21T09:41:35.6525746Z @%p126 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd6 + 0 ], %rd7; 2026-02-21T09:41:35.6526040Z // end inline asm 2026-02-21T09:41:35.6526181Z // begin inline asm 2026-02-21T09:41:35.6526414Z @%p126 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1; 2026-02-21T09:41:35.6526670Z // end inline asm 2026-02-21T09:41:35.6526818Z mov.b32 %r126, 32; 2026-02-21T09:41:35.6526946Z // begin inline asm 2026-02-21T09:41:35.6527185Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r126; 2026-02-21T09:41:35.6527443Z // end inline asm 2026-02-21T09:41:35.6527579Z mov.b32 %r127, 256; 2026-02-21T09:41:35.6527711Z // begin inline asm 2026-02-21T09:41:35.6527940Z @%p126 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r127; 2026-02-21T09:41:35.6528200Z // end inline asm 2026-02-21T09:41:35.6528328Z mov.b32 %r128, 1024; 2026-02-21T09:41:35.6528511Z // begin inline asm 2026-02-21T09:41:35.6528743Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r128; 2026-02-21T09:41:35.6529022Z // end inline asm 2026-02-21T09:41:35.6529147Z mov.b32 %r129, 12288; 2026-02-21T09:41:35.6529291Z // begin inline asm 2026-02-21T09:41:35.6529521Z @%p126 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r129; 2026-02-21T09:41:35.6529791Z // end inline asm 2026-02-21T09:41:35.6529924Z mov.b64 %rd14, 2048; 2026-02-21T09:41:35.6530062Z // begin inline asm 2026-02-21T09:41:35.6530308Z @%p126 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd6 + 0 ], 0x0, %rd14; 2026-02-21T09:41:35.6530583Z // end inline asm 2026-02-21T09:41:35.6530716Z mov.b32 %r130, 1; 2026-02-21T09:41:35.6530845Z // begin inline asm 2026-02-21T09:41:35.6531101Z @%p126 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0, %r130; 2026-02-21T09:41:35.6531382Z // end inline asm 2026-02-21T09:41:35.6531511Z // begin inline asm 2026-02-21T09:41:35.6531759Z @%p126 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x1, %r130; 2026-02-21T09:41:35.6532032Z // end inline asm 2026-02-21T09:41:35.6532165Z // begin inline asm 2026-02-21T09:41:35.6532424Z @%p126 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x6; 2026-02-21T09:41:35.6532677Z // end inline asm 2026-02-21T09:41:35.6532812Z // begin inline asm 2026-02-21T09:41:35.6533050Z @%p126 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:41:35.6533385Z // end inline asm 2026-02-21T09:41:35.6533512Z // begin inline asm 2026-02-21T09:41:35.6533751Z @%p126 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x2; 2026-02-21T09:41:35.6534017Z // end inline asm 2026-02-21T09:41:35.6534146Z // begin inline asm 2026-02-21T09:41:35.6534405Z @%p126 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd6 + 0 ], 0x0; 2026-02-21T09:41:35.6534655Z // end inline asm 2026-02-21T09:41:35.6534843Z // begin inline asm 2026-02-21T09:41:35.6535183Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd21 + 0 ], [ %rd6 + 0 ], 0x80; 2026-02-21T09:41:35.6535562Z // end inline asm 2026-02-21T09:41:35.6535697Z // begin inline asm 2026-02-21T09:41:35.6535897Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd21 + 0 ], 0x80; 2026-02-21T09:41:35.6536149Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:41:35.6536331Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:41:35.6536508Z // end inline asm 2026-02-21T09:41:35.6536636Z bar.sync 0; 2026-02-21T09:41:35.6536780Z cvta.global.u64 %rd26, %rd21; 2026-02-21T09:41:35.6537060Z .loc 1 28 35 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:28:35 2026-02-21T09:41:35.6537379Z 
shl.b32 %r1715, %r469, 1; 2026-02-21T09:41:35.6537636Z .loc 1 29 37 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:29:37 2026-02-21T09:41:35.6537919Z add.s32 %r478, %r1715, 2; 2026-02-21T09:41:35.6538174Z .loc 1 29 49 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:29:49 2026-02-21T09:41:35.6538450Z min.s32 %r479, %r478, 384; 2026-02-21T09:41:35.6538705Z .loc 1 42 45 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:42:45 2026-02-21T09:41:35.6538988Z shr.u32 %r480, %r1, 5; 2026-02-21T09:41:35.6539133Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:41:35.6539285Z or.b32 %r5, %r4, 32; 2026-02-21T09:41:35.6539527Z .loc 1 44 45 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:44:45 2026-02-21T09:41:35.6539807Z shl.b32 %r39, %r1, 3; 2026-02-21T09:41:35.6540058Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6540361Z sub.s32 %r481, %r479, %r1715; 2026-02-21T09:41:35.6540517Z shl.b32 %r41, %r481, 5; 2026-02-21T09:41:35.6540864Z .loc 1 50 48 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:50:48 2026-02-21T09:41:35.6541298Z and.b32 %r42, %r39, 24; 2026-02-21T09:41:35.6541686Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6542162Z shfl.sync.idx.b32 %r43, %r480, 0, 31, -1; 2026-02-21T09:41:35.6542434Z shl.b32 %r482, %r43, 21; 2026-02-21T09:41:35.6542669Z and.b32 %r483, %r482, 6291456; 2026-02-21T09:41:35.6542905Z add.s32 %r132, %r483, %r1681; 2026-02-21T09:41:35.6543146Z mov.pred %p84, -1; 2026-02-21T09:41:35.6543350Z // begin inline asm 2026-02-21T09:41:35.6544013Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 0], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6544786Z // end inline asm 2026-02-21T09:41:35.6544989Z // begin inline asm 2026-02-21T09:41:35.6545610Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 
[%r132 + 16], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6546313Z // end inline asm 2026-02-21T09:41:35.6546518Z // begin inline asm 2026-02-21T09:41:35.6547158Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 32], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6547850Z // end inline asm 2026-02-21T09:41:35.6548180Z // begin inline asm 2026-02-21T09:41:35.6548822Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 48], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6549530Z // end inline asm 2026-02-21T09:41:35.6549726Z // begin inline asm 2026-02-21T09:41:35.6550418Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 64], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6551121Z // end inline asm 2026-02-21T09:41:35.6551320Z // begin inline asm 2026-02-21T09:41:35.6551979Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 80], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6552682Z // end inline asm 2026-02-21T09:41:35.6552888Z // begin inline asm 2026-02-21T09:41:35.6553537Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 96], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6554245Z // end inline asm 2026-02-21T09:41:35.6554456Z // begin inline asm 2026-02-21T09:41:35.6555274Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 112], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, 
%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6555993Z // end inline asm 2026-02-21T09:41:35.6556196Z // begin inline asm 2026-02-21T09:41:35.6556888Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048576], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6557608Z // end inline asm 2026-02-21T09:41:35.6557804Z // begin inline asm 2026-02-21T09:41:35.6558472Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048592], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6559095Z // end inline asm 2026-02-21T09:41:35.6559299Z // begin inline asm 2026-02-21T09:41:35.6559982Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048608], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6560783Z // end inline asm 2026-02-21T09:41:35.6560992Z // begin inline asm 2026-02-21T09:41:35.6561661Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048624], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6562409Z // end inline asm 2026-02-21T09:41:35.6562614Z // begin inline asm 2026-02-21T09:41:35.6563315Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048640], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6564079Z // end inline asm 2026-02-21T09:41:35.6564284Z // begin inline asm 2026-02-21T09:41:35.6565057Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048656], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, 
%r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6565792Z // end inline asm 2026-02-21T09:41:35.6565998Z // begin inline asm 2026-02-21T09:41:35.6566632Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048672], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6567294Z // end inline asm 2026-02-21T09:41:35.6567504Z // begin inline asm 2026-02-21T09:41:35.6568146Z @%p84 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r132 + 1048688], 128, {%r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719, %r1719}; 2026-02-21T09:41:35.6569031Z // end inline asm 2026-02-21T09:41:35.6569246Z // begin inline asm 2026-02-21T09:41:35.6569485Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:41:35.6569743Z // end inline asm 2026-02-21T09:41:35.6569941Z bar.sync 0; 2026-02-21T09:41:35.6570422Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6570911Z add.s32 %r1718, %r123, 274496; 2026-02-21T09:41:35.6571163Z // begin inline asm 2026-02-21T09:41:35.6571422Z @%p126 mbarrier.init.shared::cta.b64 [%r1718], 1; 2026-02-21T09:41:35.6571726Z // end inline asm 2026-02-21T09:41:35.6571924Z bar.sync 0; 2026-02-21T09:41:35.6572116Z add.s32 %r405, %r123, 274504; 2026-02-21T09:41:35.6572353Z // begin inline asm 2026-02-21T09:41:35.6572599Z @%p126 mbarrier.init.shared::cta.b64 [%r405], 1; 2026-02-21T09:41:35.6572898Z // end inline asm 2026-02-21T09:41:35.6573100Z add.s32 %r406, %r123, 274432; 2026-02-21T09:41:35.6573334Z // begin inline asm 2026-02-21T09:41:35.6573577Z @%p126 mbarrier.init.shared::cta.b64 [%r406], 1; 2026-02-21T09:41:35.6573894Z // end inline asm 2026-02-21T09:41:35.6574084Z bar.sync 0; 2026-02-21T09:41:35.6574284Z add.s32 %r407, %r123, 274440; 2026-02-21T09:41:35.6574596Z // begin inline asm 2026-02-21T09:41:35.6574908Z @%p126 
mbarrier.init.shared::cta.b64 [%r407], 1; 2026-02-21T09:41:35.6575207Z // end inline asm 2026-02-21T09:41:35.6575399Z bar.sync 0; 2026-02-21T09:41:35.6575597Z add.s32 %r408, %r123, 274448; 2026-02-21T09:41:35.6575827Z // begin inline asm 2026-02-21T09:41:35.6576077Z @%p126 mbarrier.init.shared::cta.b64 [%r408], 1; 2026-02-21T09:41:35.6576364Z // end inline asm 2026-02-21T09:41:35.6576562Z bar.sync 0; 2026-02-21T09:41:35.6576759Z add.s32 %r409, %r123, 274456; 2026-02-21T09:41:35.6576989Z // begin inline asm 2026-02-21T09:41:35.6577238Z @%p126 mbarrier.init.shared::cta.b64 [%r409], 1; 2026-02-21T09:41:35.6577526Z // end inline asm 2026-02-21T09:41:35.6577723Z bar.sync 0; 2026-02-21T09:41:35.6577910Z add.s32 %r410, %r123, 274464; 2026-02-21T09:41:35.6578144Z // begin inline asm 2026-02-21T09:41:35.6578387Z @%p126 mbarrier.init.shared::cta.b64 [%r410], 1; 2026-02-21T09:41:35.6578682Z // end inline asm 2026-02-21T09:41:35.6578877Z bar.sync 0; 2026-02-21T09:41:35.6579080Z add.s32 %r411, %r123, 274472; 2026-02-21T09:41:35.6579319Z // begin inline asm 2026-02-21T09:41:35.6579659Z @%p126 mbarrier.init.shared::cta.b64 [%r411], 1; 2026-02-21T09:41:35.6579955Z // end inline asm 2026-02-21T09:41:35.6580148Z bar.sync 0; 2026-02-21T09:41:35.6580348Z add.s32 %r538, %r123, 274480; 2026-02-21T09:41:35.6580585Z // begin inline asm 2026-02-21T09:41:35.6580840Z @%p126 mbarrier.init.shared::cta.b64 [%r538], 1; 2026-02-21T09:41:35.6581129Z // end inline asm 2026-02-21T09:41:35.6581343Z setp.lt.s32 %p61, %r41, 1; 2026-02-21T09:41:35.6581585Z setp.gt.s32 %p60, %r41, 0; 2026-02-21T09:41:35.6582036Z .loc 1 36 35 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:36:35 2026-02-21T09:41:35.6582527Z mul.hi.s32 %r484, %r1715, 715827883; 2026-02-21T09:41:35.6582790Z shr.u32 %r485, %r484, 31; 2026-02-21T09:41:35.6583029Z shr.s32 %r486, %r484, 4; 2026-02-21T09:41:35.6583259Z add.s32 %r487, %r486, %r485; 2026-02-21T09:41:35.6583699Z .loc 1 37 33 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:37:33 2026-02-21T09:41:35.6584161Z shl.b32 %r488, %r487, 2; 2026-02-21T09:41:35.6584571Z .loc 1 38 39 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:38:39 2026-02-21T09:41:35.6585048Z sub.s32 %r489, 16, %r488; 2026-02-21T09:41:35.6585438Z .loc 1 38 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:38:52 2026-02-21T09:41:35.6585859Z min.s32 %r490, %r489, 4; 2026-02-21T09:41:35.6586225Z .loc 1 39 45 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:45 2026-02-21T09:41:35.6586766Z mul.lo.s32 %r491, %r487, 96; 2026-02-21T09:41:35.6586992Z sub.s32 %r492, %r1715, %r491; 2026-02-21T09:41:35.6587379Z .loc 1 40 51 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:40:51 2026-02-21T09:41:35.6587794Z div.s32 %r493, %r492, %r490; 2026-02-21T09:41:35.6588244Z .loc 1 39 64 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:64 2026-02-21T09:41:35.6588672Z mul.lo.s32 %r494, %r493, %r490; 2026-02-21T09:41:35.6588905Z sub.s32 %r495, %r492, %r494; 2026-02-21T09:41:35.6589294Z .loc 1 39 30 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:30 2026-02-21T09:41:35.6589722Z add.s32 %r496, %r495, %r488; 2026-02-21T09:41:35.6590155Z .loc 1 41 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:41:27 2026-02-21T09:41:35.6590629Z shl.b32 %r1688, %r496, 6; 2026-02-21T09:41:35.6591051Z .loc 1 42 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:42:32 2026-02-21T09:41:35.6591532Z or.b32 %r1716, %r1688, %r4; 2026-02-21T09:41:35.6591765Z or.b32 %r1717, %r1688, %r5; 2026-02-21T09:41:35.6592188Z .loc 1 43 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:43:27 2026-02-21T09:41:35.6592725Z shl.b32 %r1683, %r493, 9; 2026-02-21T09:41:35.6593150Z .loc 1 54 53 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:53 2026-02-21T09:41:35.6593618Z shl.b32 %r497, %r1716, 10; 2026-02-21T09:41:35.6593839Z shl.b32 %r498, %r1717, 10; 
2026-02-21T09:41:35.6594290Z .loc 1 54 60 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:60 2026-02-21T09:41:35.6594866Z or.b32 %r499, %r497, %r42; 2026-02-21T09:41:35.6595119Z or.b32 %r500, %r498, %r42; 2026-02-21T09:41:35.6595563Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6595988Z mad.wide.s32 %rd24, %r499, 2, %rd4; 2026-02-21T09:41:35.6596232Z mad.wide.s32 %rd25, %r500, 2, %rd4; 2026-02-21T09:41:35.6596604Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6596997Z shl.b32 %r49, %r1, 4; 2026-02-21T09:41:35.6597189Z and.b32 %r501, %r49, 2032; 2026-02-21T09:41:35.6597400Z and.b32 %r50, %r1, 24; 2026-02-21T09:41:35.6597588Z shl.b32 %r502, %r50, 1; 2026-02-21T09:41:35.6597926Z xor.b32 %r51, %r501, %r502; 2026-02-21T09:41:35.6598130Z add.s32 %r503, %r123, %r51; 2026-02-21T09:41:35.6598343Z add.s32 %r413, %r503, 229376; 2026-02-21T09:41:35.6598559Z selp.b32 %r414, 16, 0, %p60; 2026-02-21T09:41:35.6598772Z // begin inline asm 2026-02-21T09:41:35.6599073Z cp.async.cg.shared.global [ %r413 + 0 ], [ %rd24 + 0 ], 0x10, %r414; 2026-02-21T09:41:35.6599406Z // end inline asm 2026-02-21T09:41:35.6599601Z add.s32 %r415, %r503, 231424; 2026-02-21T09:41:35.6599809Z // begin inline asm 2026-02-21T09:41:35.6600087Z cp.async.cg.shared.global [ %r415 + 0 ], [ %rd25 + 0 ], 0x10, %r414; 2026-02-21T09:41:35.6600399Z // end inline asm 2026-02-21T09:41:35.6600602Z cp.async.commit_group; 2026-02-21T09:41:35.6600976Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6601374Z bar.sync 0; 2026-02-21T09:41:35.6601569Z and.pred %p48, %p126, %p60; 2026-02-21T09:41:35.6601821Z // begin inline asm 2026-02-21T09:41:35.6602126Z @%p48 mbarrier.arrive.expect_tx.shared.b64 _, [%r406], 32768; 2026-02-21T09:41:35.6602468Z // end inline asm 2026-02-21T09:41:35.6602862Z .loc 1 55 44 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6603312Z // begin inline asm 2026-02-21T09:41:35.6603552Z fence.proxy.async.shared::cta; 2026-02-21T09:41:35.6603802Z // end inline asm 2026-02-21T09:41:35.6603994Z bar.sync 0; 2026-02-21T09:41:35.6604201Z elect.sync %r504|%p62, -1; 2026-02-21T09:41:35.6604535Z and.pred %p63, %p60, %p62; 2026-02-21T09:41:35.6604850Z setp.lt.u32 %p64, %r1, 64; 2026-02-21T09:41:35.6605090Z and.pred %p49, %p64, %p63; 2026-02-21T09:41:35.6605338Z and.b32 %r505, %r43, 1; 2026-02-21T09:41:35.6605563Z shl.b32 %r52, %r505, 13; 2026-02-21T09:41:35.6605803Z shl.b32 %r506, %r505, 14; 2026-02-21T09:41:35.6606093Z add.s32 %r418, %r123, %r506; 2026-02-21T09:41:35.6606335Z shl.b32 %r54, %r505, 8; 2026-02-21T09:41:35.6606571Z or.b32 %r55, %r1683, %r54; 2026-02-21T09:41:35.6606784Z // begin inline asm 2026-02-21T09:41:35.6607322Z @%p49 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r418], [%rd26, {%r1719, %r55}], [%r406]; 2026-02-21T09:41:35.6607890Z // end inline asm 2026-02-21T09:41:35.6608306Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6608790Z setp.gt.s32 %p65, %r41, 1; 2026-02-21T09:41:35.6609240Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6609730Z cvt.s64.s32 %rd44, %r497; 2026-02-21T09:41:35.6609976Z cvt.u64.u32 %rd45, %r42; 2026-02-21T09:41:35.6610224Z or.b64 %rd46, %rd44, %rd45; 2026-02-21T09:41:35.6610471Z shl.b64 %rd47, %rd46, 1; 2026-02-21T09:41:35.6610719Z add.s64 %rd2, %rd4, %rd47; 2026-02-21T09:41:35.6611008Z add.s64 %rd27, %rd2, 64; 2026-02-21T09:41:35.6611254Z cvt.s64.s32 %rd48, %r498; 2026-02-21T09:41:35.6611493Z or.b64 %rd49, %rd48, %rd45; 2026-02-21T09:41:35.6611749Z shl.b64 %rd50, %rd49, 1; 2026-02-21T09:41:35.6611987Z add.s64 %rd3, %rd4, %rd50; 2026-02-21T09:41:35.6612231Z add.s64 %rd28, %rd3, 64; 2026-02-21T09:41:35.6612670Z .loc 1 54 85 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6613162Z add.s32 %r422, %r503, 233472; 2026-02-21T09:41:35.6613428Z selp.b32 %r423, 16, 0, %p65; 2026-02-21T09:41:35.6613678Z // begin inline asm 2026-02-21T09:41:35.6614013Z cp.async.cg.shared.global [ %r422 + 0 ], [ %rd27 + 0 ], 0x10, %r423; 2026-02-21T09:41:35.6614378Z // end inline asm 2026-02-21T09:41:35.6614601Z add.s32 %r424, %r503, 235520; 2026-02-21T09:41:35.6614897Z // begin inline asm 2026-02-21T09:41:35.6615227Z cp.async.cg.shared.global [ %r424 + 0 ], [ %rd28 + 0 ], 0x10, %r423; 2026-02-21T09:41:35.6615607Z // end inline asm 2026-02-21T09:41:35.6615833Z cp.async.commit_group; 2026-02-21T09:41:35.6616286Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6616819Z bar.sync 0; 2026-02-21T09:41:35.6617046Z and.pred %p50, %p126, %p65; 2026-02-21T09:41:35.6617303Z // begin inline asm 2026-02-21T09:41:35.6617635Z @%p50 mbarrier.arrive.expect_tx.shared.b64 _, [%r407], 32768; 2026-02-21T09:41:35.6618000Z // end inline asm 2026-02-21T09:41:35.6618425Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6618904Z bar.sync 0; 2026-02-21T09:41:35.6619124Z elect.sync %r507|%p66, -1; 2026-02-21T09:41:35.6619374Z and.pred %p67, %p65, %p66; 2026-02-21T09:41:35.6619613Z and.pred %p51, %p64, %p67; 2026-02-21T09:41:35.6619860Z add.s32 %r427, %r418, 32768; 2026-02-21T09:41:35.6620095Z // begin inline asm 2026-02-21T09:41:35.6620615Z @%p51 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r427], [%rd26, {%r126, %r55}], [%r407]; 2026-02-21T09:41:35.6621174Z // end inline asm 2026-02-21T09:41:35.6621571Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6622050Z setp.gt.s32 %p68, %r41, 2; 2026-02-21T09:41:35.6622463Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6622918Z 
add.s64 %rd30, %rd2, 128; 2026-02-21T09:41:35.6623147Z add.s64 %rd31, %rd3, 128; 2026-02-21T09:41:35.6623565Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6624080Z add.s32 %r431, %r503, 237568; 2026-02-21T09:41:35.6624322Z selp.b32 %r432, 16, 0, %p68; 2026-02-21T09:41:35.6624565Z // begin inline asm 2026-02-21T09:41:35.6624918Z cp.async.cg.shared.global [ %r431 + 0 ], [ %rd30 + 0 ], 0x10, %r432; 2026-02-21T09:41:35.6625284Z // end inline asm 2026-02-21T09:41:35.6625558Z add.s32 %r433, %r503, 239616; 2026-02-21T09:41:35.6625805Z // begin inline asm 2026-02-21T09:41:35.6626102Z cp.async.cg.shared.global [ %r433 + 0 ], [ %rd31 + 0 ], 0x10, %r432; 2026-02-21T09:41:35.6626443Z // end inline asm 2026-02-21T09:41:35.6626650Z cp.async.commit_group; 2026-02-21T09:41:35.6627079Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6627543Z bar.sync 0; 2026-02-21T09:41:35.6627747Z and.pred %p52, %p126, %p68; 2026-02-21T09:41:35.6627997Z // begin inline asm 2026-02-21T09:41:35.6628288Z @%p52 mbarrier.arrive.expect_tx.shared.b64 _, [%r408], 32768; 2026-02-21T09:41:35.6628636Z // end inline asm 2026-02-21T09:41:35.6629012Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6629458Z bar.sync 0; 2026-02-21T09:41:35.6629656Z elect.sync %r508|%p69, -1; 2026-02-21T09:41:35.6629951Z and.pred %p70, %p68, %p69; 2026-02-21T09:41:35.6630202Z and.pred %p53, %p64, %p70; 2026-02-21T09:41:35.6630441Z add.s32 %r436, %r418, 65536; 2026-02-21T09:41:35.6630686Z mov.b32 %r437, 64; 2026-02-21T09:41:35.6630892Z // begin inline asm 2026-02-21T09:41:35.6631414Z @%p53 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r436], [%rd26, {%r437, %r55}], [%r408]; 2026-02-21T09:41:35.6631982Z // end inline asm 2026-02-21T09:41:35.6632387Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 
2026-02-21T09:41:35.6632859Z setp.gt.s32 %p71, %r41, 3; 2026-02-21T09:41:35.6633277Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6633738Z add.s64 %rd33, %rd2, 192; 2026-02-21T09:41:35.6633974Z add.s64 %rd34, %rd3, 192; 2026-02-21T09:41:35.6634396Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6634901Z add.s32 %r440, %r503, 241664; 2026-02-21T09:41:35.6635160Z selp.b32 %r441, 16, 0, %p71; 2026-02-21T09:41:35.6635404Z // begin inline asm 2026-02-21T09:41:35.6635787Z cp.async.cg.shared.global [ %r440 + 0 ], [ %rd33 + 0 ], 0x10, %r441; 2026-02-21T09:41:35.6636142Z // end inline asm 2026-02-21T09:41:35.6636343Z add.s32 %r442, %r503, 243712; 2026-02-21T09:41:35.6636576Z // begin inline asm 2026-02-21T09:41:35.6636859Z cp.async.cg.shared.global [ %r442 + 0 ], [ %rd34 + 0 ], 0x10, %r441; 2026-02-21T09:41:35.6637194Z // end inline asm 2026-02-21T09:41:35.6637388Z cp.async.commit_group; 2026-02-21T09:41:35.6637816Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6638283Z bar.sync 0; 2026-02-21T09:41:35.6638487Z and.pred %p54, %p126, %p71; 2026-02-21T09:41:35.6638736Z // begin inline asm 2026-02-21T09:41:35.6639027Z @%p54 mbarrier.arrive.expect_tx.shared.b64 _, [%r409], 32768; 2026-02-21T09:41:35.6639377Z // end inline asm 2026-02-21T09:41:35.6639770Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6640225Z bar.sync 0; 2026-02-21T09:41:35.6640430Z elect.sync %r509|%p72, -1; 2026-02-21T09:41:35.6640662Z and.pred %p73, %p71, %p72; 2026-02-21T09:41:35.6640881Z and.pred %p55, %p64, %p73; 2026-02-21T09:41:35.6641105Z add.s32 %r445, %r418, 98304; 2026-02-21T09:41:35.6641323Z mov.b32 %r446, 96; 2026-02-21T09:41:35.6641516Z // begin inline asm 2026-02-21T09:41:35.6642010Z @%p55 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r445], [%rd26, {%r446, 
%r55}], [%r409]; 2026-02-21T09:41:35.6642586Z // end inline asm 2026-02-21T09:41:35.6642995Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6643466Z setp.gt.s32 %p74, %r41, 4; 2026-02-21T09:41:35.6643897Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6644401Z add.s64 %rd36, %rd2, 256; 2026-02-21T09:41:35.6644629Z add.s64 %rd37, %rd3, 256; 2026-02-21T09:41:35.6645100Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6645563Z add.s32 %r449, %r503, 245760; 2026-02-21T09:41:35.6645817Z selp.b32 %r450, 16, 0, %p74; 2026-02-21T09:41:35.6646056Z // begin inline asm 2026-02-21T09:41:35.6646380Z cp.async.cg.shared.global [ %r449 + 0 ], [ %rd36 + 0 ], 0x10, %r450; 2026-02-21T09:41:35.6646741Z // end inline asm 2026-02-21T09:41:35.6646949Z add.s32 %r451, %r503, 247808; 2026-02-21T09:41:35.6647190Z // begin inline asm 2026-02-21T09:41:35.6647497Z cp.async.cg.shared.global [ %r451 + 0 ], [ %rd37 + 0 ], 0x10, %r450; 2026-02-21T09:41:35.6647844Z // end inline asm 2026-02-21T09:41:35.6648052Z cp.async.commit_group; 2026-02-21T09:41:35.6648478Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6648990Z bar.sync 0; 2026-02-21T09:41:35.6649201Z and.pred %p56, %p126, %p74; 2026-02-21T09:41:35.6649442Z // begin inline asm 2026-02-21T09:41:35.6649744Z @%p56 mbarrier.arrive.expect_tx.shared.b64 _, [%r410], 32768; 2026-02-21T09:41:35.6650093Z // end inline asm 2026-02-21T09:41:35.6650479Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6650954Z bar.sync 0; 2026-02-21T09:41:35.6651166Z elect.sync %r510|%p75, -1; 2026-02-21T09:41:35.6651431Z and.pred %p76, %p74, %p75; 2026-02-21T09:41:35.6651685Z and.pred %p57, %p64, %p76; 2026-02-21T09:41:35.6651941Z add.s32 %r454, %r418, 131072; 2026-02-21T09:41:35.6652184Z mov.b32 %r455, 128; 
2026-02-21T09:41:35.6652411Z // begin inline asm 2026-02-21T09:41:35.6652960Z @%p57 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r454], [%rd26, {%r455, %r55}], [%r410]; 2026-02-21T09:41:35.6653545Z // end inline asm 2026-02-21T09:41:35.6653975Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6654535Z setp.gt.s32 %p77, %r41, 5; 2026-02-21T09:41:35.6655018Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6655491Z add.s64 %rd39, %rd2, 320; 2026-02-21T09:41:35.6655737Z add.s64 %rd40, %rd3, 320; 2026-02-21T09:41:35.6656171Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6656655Z add.s32 %r458, %r503, 249856; 2026-02-21T09:41:35.6656919Z selp.b32 %r459, 16, 0, %p77; 2026-02-21T09:41:35.6657169Z // begin inline asm 2026-02-21T09:41:35.6657502Z cp.async.cg.shared.global [ %r458 + 0 ], [ %rd39 + 0 ], 0x10, %r459; 2026-02-21T09:41:35.6657869Z // end inline asm 2026-02-21T09:41:35.6658092Z add.s32 %r460, %r503, 251904; 2026-02-21T09:41:35.6658333Z // begin inline asm 2026-02-21T09:41:35.6658660Z cp.async.cg.shared.global [ %r460 + 0 ], [ %rd40 + 0 ], 0x10, %r459; 2026-02-21T09:41:35.6659036Z // end inline asm 2026-02-21T09:41:35.6659258Z cp.async.commit_group; 2026-02-21T09:41:35.6659716Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6660168Z bar.sync 0; 2026-02-21T09:41:35.6660377Z and.pred %p58, %p126, %p77; 2026-02-21T09:41:35.6660614Z // begin inline asm 2026-02-21T09:41:35.6660918Z @%p58 mbarrier.arrive.expect_tx.shared.b64 _, [%r411], 32768; 2026-02-21T09:41:35.6661250Z // end inline asm 2026-02-21T09:41:35.6661636Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6662148Z bar.sync 0; 2026-02-21T09:41:35.6662348Z elect.sync %r511|%p78, -1; 2026-02-21T09:41:35.6662602Z and.pred %p79, 
%p77, %p78; 2026-02-21T09:41:35.6662842Z and.pred %p59, %p64, %p79; 2026-02-21T09:41:35.6663086Z add.s32 %r463, %r418, 163840; 2026-02-21T09:41:35.6663320Z mov.b32 %r464, 160; 2026-02-21T09:41:35.6663587Z // begin inline asm 2026-02-21T09:41:35.6664098Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r463], [%rd26, {%r464, %r55}], [%r411]; 2026-02-21T09:41:35.6664735Z // end inline asm 2026-02-21T09:41:35.6665132Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6665590Z cp.async.wait_group 5; 2026-02-21T09:41:35.6665826Z bar.sync 0; 2026-02-21T09:41:35.6666215Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6666677Z // begin inline asm 2026-02-21T09:41:35.6666874Z 2026-02-21T09:41:35.6667040Z { 2026-02-21T09:41:35.6667222Z @!%p60 bra.uni skipWait; 2026-02-21T09:41:35.6667459Z .reg .pred complete; 2026-02-21T09:41:35.6667683Z waitLoop: 2026-02-21T09:41:35.6667988Z mbarrier.try_wait.parity.shared.b64 complete, [%r406], %r1719; 2026-02-21T09:41:35.6668437Z @!complete bra.uni waitLoop; 2026-02-21T09:41:35.6668677Z skipWait: 2026-02-21T09:41:35.6668860Z } 2026-02-21T09:41:35.6668958Z 2026-02-21T09:41:35.6669043Z // end inline asm 2026-02-21T09:41:35.6669446Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6669908Z setp.ne.b32 %p80, %r43, 0; 2026-02-21T09:41:35.6670145Z or.pred %p81, %p61, %p80; 2026-02-21T09:41:35.6670389Z @%p81 bra $L__BB0_2; 2026-02-21T09:41:35.6670593Z // %bb.1: 2026-02-21T09:41:35.6670793Z elect.sync %r520|%p83, -1; 2026-02-21T09:41:35.6671029Z add.s32 %r522, %r123, 229376; 2026-02-21T09:41:35.6671281Z bfe.u32 %r523, %r522, 4, 14; 2026-02-21T09:41:35.6671522Z cvt.u64.u32 %rd60, %r523; 2026-02-21T09:41:35.6671786Z or.b64 %rd51, %rd60, -9223371899399045120; 2026-02-21T09:41:35.6672071Z bfe.u32 %r524, %r123, 4, 14; 2026-02-21T09:41:35.6672318Z cvt.u64.u32 %rd61, 
%r524; 2026-02-21T09:41:35.6672574Z or.b64 %rd52, %rd61, -9223371899281604608; 2026-02-21T09:41:35.6672845Z mov.b32 %r513, 71303184; 2026-02-21T09:41:35.6673086Z mov.pred %p82, 0; 2026-02-21T09:41:35.6673291Z // begin inline asm 2026-02-21T09:41:35.6673726Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 0 ], %rd51, %rd52, %r513, %p82; 2026-02-21T09:41:35.6674132Z // end inline asm 2026-02-21T09:41:35.6674351Z add.s32 %r525, %r123, 229408; 2026-02-21T09:41:35.6674589Z bfe.u32 %r526, %r525, 4, 14; 2026-02-21T09:41:35.6674880Z cvt.u64.u32 %rd62, %r526; 2026-02-21T09:41:35.6675132Z or.b64 %rd53, %rd62, -9223371899399045120; 2026-02-21T09:41:35.6675406Z add.s32 %r527, %r123, 32; 2026-02-21T09:41:35.6675640Z bfe.u32 %r528, %r527, 4, 14; 2026-02-21T09:41:35.6675870Z cvt.u64.u32 %rd63, %r528; 2026-02-21T09:41:35.6676116Z or.b64 %rd54, %rd63, -9223371899281604608; 2026-02-21T09:41:35.6676380Z // begin inline asm 2026-02-21T09:41:35.6676719Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 0 ], %rd53, %rd54, %r513, %p84; 2026-02-21T09:41:35.6677098Z // end inline asm 2026-02-21T09:41:35.6677306Z add.s32 %r529, %r123, 16384; 2026-02-21T09:41:35.6677536Z bfe.u32 %r530, %r529, 4, 14; 2026-02-21T09:41:35.6677768Z cvt.u64.u32 %rd64, %r530; 2026-02-21T09:41:35.6678022Z or.b64 %rd56, %rd64, -9223371899281604608; 2026-02-21T09:41:35.6678292Z // begin inline asm 2026-02-21T09:41:35.6678661Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 1048576 ], %rd51, %rd56, %r513, %p82; 2026-02-21T09:41:35.6679070Z // end inline asm 2026-02-21T09:41:35.6679278Z add.s32 %r531, %r123, 16416; 2026-02-21T09:41:35.6679510Z bfe.u32 %r532, %r531, 4, 14; 2026-02-21T09:41:35.6679751Z cvt.u64.u32 %rd65, %r532; 2026-02-21T09:41:35.6679999Z or.b64 %rd58, %rd65, -9223371899281604608; 2026-02-21T09:41:35.6680333Z // begin inline asm 2026-02-21T09:41:35.6680697Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 1048576 ], %rd53, %rd58, %r513, %p84; 2026-02-21T09:41:35.6681109Z // end inline asm 
2026-02-21T09:41:35.6681328Z add.s32 %r533, %r123, 274496; 2026-02-21T09:41:35.6681572Z cvt.u64.u32 %rd59, %r533; 2026-02-21T09:41:35.6681866Z // begin inline asm 2026-02-21T09:41:35.6682190Z @%p83 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd59]; 2026-02-21T09:41:35.6682569Z // end inline asm 2026-02-21T09:41:35.6682775Z $L__BB0_2: 2026-02-21T09:41:35.6683156Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6683639Z setp.gt.s32 %p93, %r41, 6; 2026-02-21T09:41:35.6684061Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6684531Z add.s64 %rd66, %rd2, 384; 2026-02-21T09:41:35.6684811Z add.s64 %rd67, %rd3, 384; 2026-02-21T09:41:35.6685240Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6685706Z bar.sync 0; 2026-02-21T09:41:35.6685904Z add.s32 %r534, %r503, 253952; 2026-02-21T09:41:35.6686157Z selp.b32 %r535, 16, 0, %p93; 2026-02-21T09:41:35.6686435Z // begin inline asm 2026-02-21T09:41:35.6686757Z cp.async.cg.shared.global [ %r534 + 0 ], [ %rd66 + 0 ], 0x10, %r535; 2026-02-21T09:41:35.6687108Z // end inline asm 2026-02-21T09:41:35.6687321Z add.s32 %r536, %r503, 256000; 2026-02-21T09:41:35.6687552Z // begin inline asm 2026-02-21T09:41:35.6687860Z cp.async.cg.shared.global [ %r536 + 0 ], [ %rd67 + 0 ], 0x10, %r535; 2026-02-21T09:41:35.6688197Z // end inline asm 2026-02-21T09:41:35.6688415Z cp.async.commit_group; 2026-02-21T09:41:35.6688846Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6689319Z and.pred %p91, %p126, %p93; 2026-02-21T09:41:35.6689567Z // begin inline asm 2026-02-21T09:41:35.6689856Z @%p91 mbarrier.arrive.expect_tx.shared.b64 _, [%r538], 32768; 2026-02-21T09:41:35.6690202Z // end inline asm 2026-02-21T09:41:35.6690581Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6691038Z bar.sync 0; 
2026-02-21T09:41:35.6691245Z elect.sync %r547|%p96, -1; 2026-02-21T09:41:35.6691496Z and.pred %p97, %p93, %p96; 2026-02-21T09:41:35.6691795Z and.pred %p92, %p64, %p97; 2026-02-21T09:41:35.6692033Z shl.b32 %r548, %r52, 1; 2026-02-21T09:41:35.6692267Z add.s32 %r549, %r123, %r548; 2026-02-21T09:41:35.6692504Z add.s32 %r539, %r549, 196608; 2026-02-21T09:41:35.6692743Z mov.b32 %r1698, 192; 2026-02-21T09:41:35.6692954Z // begin inline asm 2026-02-21T09:41:35.6693473Z @%p92 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r539], [%rd26, {%r1698, %r55}], [%r538]; 2026-02-21T09:41:35.6694053Z // end inline asm 2026-02-21T09:41:35.6694457Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6694979Z add.s32 %r56, %r41, -1; 2026-02-21T09:41:35.6695212Z setp.lt.s32 %p98, %r56, 1; 2026-02-21T09:41:35.6695455Z @%p98 bra $L__BB0_11; 2026-02-21T09:41:35.6695703Z // %bb.3: // %.lr.ph 2026-02-21T09:41:35.6696179Z .loc 1 0 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:0:107 2026-02-21T09:41:35.6696662Z bfe.u32 %r7, %r1, 6, 1; 2026-02-21T09:41:35.6696947Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:41:35.6697263Z and.b32 %r6, %r1, 64; 2026-02-21T09:41:35.6697490Z or.b32 %r8, %r7, 2; 2026-02-21T09:41:35.6697715Z or.b32 %r9, %r7, 4; 2026-02-21T09:41:35.6697932Z or.b32 %r10, %r7, 6; 2026-02-21T09:41:35.6698164Z or.b32 %r11, %r7, 8; 2026-02-21T09:41:35.6698378Z or.b32 %r12, %r7, 10; 2026-02-21T09:41:35.6698603Z or.b32 %r13, %r7, 12; 2026-02-21T09:41:35.6698818Z or.b32 %r14, %r7, 14; 2026-02-21T09:41:35.6699128Z or.b32 %r15, %r7, 16; 2026-02-21T09:41:35.6699346Z or.b32 %r16, %r7, 18; 2026-02-21T09:41:35.6699570Z or.b32 %r17, %r7, 20; 2026-02-21T09:41:35.6699783Z or.b32 %r18, %r7, 22; 2026-02-21T09:41:35.6699980Z or.b32 %r19, %r7, 24; 2026-02-21T09:41:35.6700196Z or.b32 %r20, %r7, 26; 2026-02-21T09:41:35.6700407Z or.b32 %r21, %r7, 28; 2026-02-21T09:41:35.6700670Z or.b32 %r22, %r7, 
30; 2026-02-21T09:41:35.6700881Z or.b32 %r23, %r7, 32; 2026-02-21T09:41:35.6701093Z or.b32 %r24, %r7, 34; 2026-02-21T09:41:35.6701304Z or.b32 %r25, %r7, 36; 2026-02-21T09:41:35.6701521Z or.b32 %r26, %r7, 38; 2026-02-21T09:41:35.6701725Z or.b32 %r27, %r7, 40; 2026-02-21T09:41:35.6701943Z or.b32 %r28, %r7, 42; 2026-02-21T09:41:35.6702156Z or.b32 %r29, %r7, 44; 2026-02-21T09:41:35.6702357Z or.b32 %r30, %r7, 46; 2026-02-21T09:41:35.6702566Z or.b32 %r31, %r7, 48; 2026-02-21T09:41:35.6702775Z or.b32 %r32, %r7, 50; 2026-02-21T09:41:35.6702988Z or.b32 %r33, %r7, 52; 2026-02-21T09:41:35.6703193Z or.b32 %r34, %r7, 54; 2026-02-21T09:41:35.6703405Z or.b32 %r35, %r7, 56; 2026-02-21T09:41:35.6703620Z or.b32 %r36, %r7, 58; 2026-02-21T09:41:35.6703842Z or.b32 %r37, %r7, 60; 2026-02-21T09:41:35.6704058Z or.b32 %r38, %r7, 62; 2026-02-21T09:41:35.6704297Z and.b32 %r40, %r39, 504; 2026-02-21T09:41:35.6704832Z .loc 1 42 45 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:42:45 2026-02-21T09:41:35.6705304Z setp.eq.b32 %p100, %r6, 0; 2026-02-21T09:41:35.6705559Z add.s32 %r57, %r41, -7; 2026-02-21T09:41:35.6705781Z and.b32 %r559, %r1, 7; 2026-02-21T09:41:35.6706008Z shl.b32 %r560, %r559, 11; 2026-02-21T09:41:35.6706240Z and.b32 %r561, %r49, 240; 2026-02-21T09:41:35.6706477Z and.b32 %r562, %r39, 768; 2026-02-21T09:41:35.6706700Z and.b32 %r564, %r477, 64; 2026-02-21T09:41:35.6706939Z or.b32 %r565, %r561, %r562; 2026-02-21T09:41:35.6707181Z xor.b32 %r566, %r565, %r564; 2026-02-21T09:41:35.6707428Z or.b32 %r567, %r566, %r560; 2026-02-21T09:41:35.6707675Z add.s32 %r569, %r123, 258048; 2026-02-21T09:41:35.6707915Z add.s32 %r58, %r569, %r567; 2026-02-21T09:41:35.6708155Z xor.b32 %r570, %r567, 16; 2026-02-21T09:41:35.6708382Z add.s32 %r59, %r569, %r570; 2026-02-21T09:41:35.6708618Z xor.b32 %r571, %r567, 32; 2026-02-21T09:41:35.6708844Z add.s32 %r60, %r569, %r571; 2026-02-21T09:41:35.6709084Z xor.b32 %r572, %r567, 48; 2026-02-21T09:41:35.6709296Z add.s32 %r61, %r569, %r572; 
2026-02-21T09:41:35.6709586Z shl.b32 %r573, %r50, 9; 2026-02-21T09:41:35.6709813Z shl.b32 %r574, %r559, 4; 2026-02-21T09:41:35.6710054Z shl.b32 %r575, %r50, 2; 2026-02-21T09:41:35.6710290Z shl.b32 %r576, %r1, 5; 2026-02-21T09:41:35.6710516Z and.b32 %r577, %r576, 1024; 2026-02-21T09:41:35.6710770Z selp.b32 %r578, 0, 2064, %p100; 2026-02-21T09:41:35.6711021Z or.b32 %r579, %r573, %r574; 2026-02-21T09:41:35.6711259Z or.b32 %r580, %r578, %r575; 2026-02-21T09:41:35.6711491Z xor.b32 %r581, %r580, %r579; 2026-02-21T09:41:35.6711731Z add.s32 %r582, %r569, %r577; 2026-02-21T09:41:35.6711959Z add.s32 %r937, %r582, %r581; 2026-02-21T09:41:35.6712203Z add.s32 %r942, %r937, 128; 2026-02-21T09:41:35.6712446Z add.s32 %r947, %r937, 256; 2026-02-21T09:41:35.6712675Z add.s32 %r952, %r937, 384; 2026-02-21T09:41:35.6712912Z add.s32 %r957, %r937, 512; 2026-02-21T09:41:35.6713142Z add.s32 %r962, %r937, 640; 2026-02-21T09:41:35.6713382Z add.s32 %r967, %r937, 768; 2026-02-21T09:41:35.6713608Z add.s32 %r972, %r937, 896; 2026-02-21T09:41:35.6713851Z add.s32 %r1700, %r123, 274496; 2026-02-21T09:41:35.6714098Z mov.pred %p136, -1; 2026-02-21T09:41:35.6714324Z mov.b32 %r1703, 6; 2026-02-21T09:41:35.6714531Z mov.b32 %r1699, 0; 2026-02-21T09:41:35.6714785Z mov.b32 %r1697, 1; 2026-02-21T09:41:35.6714994Z mov.b32 %r1696, 2; 2026-02-21T09:41:35.6715193Z mov.b32 %r1695, 3; 2026-02-21T09:41:35.6715386Z mov.b32 %r1694, 4; 2026-02-21T09:41:35.6715581Z mov.b32 %r1693, 5; 2026-02-21T09:41:35.6715791Z mov.b32 %r1684, %r1683; 2026-02-21T09:41:35.6716061Z mov.b32 %r1685, %r1683; 2026-02-21T09:41:35.6716292Z mov.b32 %r1686, %r1683; 2026-02-21T09:41:35.6716511Z mov.b32 %r1687, %r1683; 2026-02-21T09:41:35.6716736Z mov.b32 %r1689, %r1688; 2026-02-21T09:41:35.6716954Z mov.b32 %r1690, %r1688; 2026-02-21T09:41:35.6717184Z mov.b32 %r1691, %r1688; 2026-02-21T09:41:35.6717399Z mov.b32 %r1692, %r1688; 2026-02-21T09:41:35.6717665Z mov.b32 %r1701, %r1699; 2026-02-21T09:41:35.6717891Z mov.b32 %r1702, %r1699; 
2026-02-21T09:41:35.6718111Z mov.b32 %r1704, %r1697; 2026-02-21T09:41:35.6718335Z mov.b32 %r1705, %r1699; 2026-02-21T09:41:35.6718552Z mov.b32 %r1706, %r1683; 2026-02-21T09:41:35.6718779Z mov.b32 %r1707, %r1688; 2026-02-21T09:41:35.6718998Z mov.b32 %r1709, %r1703; 2026-02-21T09:41:35.6719226Z mov.b32 %r1710, %r1699; 2026-02-21T09:41:35.6719444Z mov.b32 %r1713, %r1707; 2026-02-21T09:41:35.6719667Z mov.b32 %r1714, %r1706; 2026-02-21T09:41:35.6719886Z bra.uni $L__BB0_4; 2026-02-21T09:41:35.6720164Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6720679Z .loc 1 0 0 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:0 2026-02-21T09:41:35.6721118Z selp.b32 %r1704, 0, %r642, %p120; 2026-02-21T09:41:35.6721387Z selp.b32 %r643, 1, 0, %p120; 2026-02-21T09:41:35.6721670Z xor.b32 %r1705, %r1719, %r643; 2026-02-21T09:41:35.6722124Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6722603Z add.s32 %r1710, %r1710, 1; 2026-02-21T09:41:35.6722857Z setp.ne.b32 %p124, %r56, %r1710; 2026-02-21T09:41:35.6723118Z mov.b32 %r1683, %r1706; 2026-02-21T09:41:35.6723339Z mov.b32 %r1687, %r75; 2026-02-21T09:41:35.6723563Z mov.b32 %r1688, %r1707; 2026-02-21T09:41:35.6723782Z mov.b32 %r1692, %r80; 2026-02-21T09:41:35.6724006Z mov.b32 %r1693, %r1709; 2026-02-21T09:41:35.6724223Z mov.b32 %r1697, %r85; 2026-02-21T09:41:35.6724443Z mov.b32 %r1699, %r1719; 2026-02-21T09:41:35.6724660Z mov.b32 %r1700, %r1718; 2026-02-21T09:41:35.6724960Z mov.b32 %r1706, %r1714; 2026-02-21T09:41:35.6725178Z mov.b32 %r1707, %r1713; 2026-02-21T09:41:35.6725404Z mov.b32 %r1709, %r102; 2026-02-21T09:41:35.6725634Z @%p124 bra $L__BB0_4; 2026-02-21T09:41:35.6725846Z bra.uni $L__BB0_11; 2026-02-21T09:41:35.6726138Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:41:35.6726659Z .loc 1 0 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:0:107 2026-02-21T09:41:35.6727174Z mov.b32 %r1719, %r1705; 2026-02-21T09:41:35.6727394Z 
mov.b32 %r85, %r1696; 2026-02-21T09:41:35.6727616Z mov.b32 %r1696, %r1695; 2026-02-21T09:41:35.6727829Z mov.b32 %r1695, %r1694; 2026-02-21T09:41:35.6728051Z mov.b32 %r1694, %r1693; 2026-02-21T09:41:35.6728273Z mov.b32 %r80, %r1691; 2026-02-21T09:41:35.6728483Z mov.b32 %r1691, %r1690; 2026-02-21T09:41:35.6728709Z mov.b32 %r1690, %r1689; 2026-02-21T09:41:35.6728924Z mov.b32 %r1689, %r1688; 2026-02-21T09:41:35.6729147Z mov.b32 %r75, %r1686; 2026-02-21T09:41:35.6729350Z mov.b32 %r1686, %r1685; 2026-02-21T09:41:35.6729570Z mov.b32 %r1685, %r1684; 2026-02-21T09:41:35.6729780Z mov.b32 %r1684, %r1683; 2026-02-21T09:41:35.6730199Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6730677Z add.s32 %r583, %r1709, 1; 2026-02-21T09:41:35.6730916Z setp.eq.b32 %p101, %r1709, 31; 2026-02-21T09:41:35.6731181Z selp.b32 %r102, 0, %r583, %p101; 2026-02-21T09:41:35.6731441Z setp.ne.b32 %p102, %r102, 0; 2026-02-21T09:41:35.6731690Z @%p102 bra $L__BB0_6; 2026-02-21T09:41:35.6731963Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6732297Z add.s32 %r1715, %r1715, 1; 2026-02-21T09:41:35.6732707Z .loc 1 36 35 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:36:35 2026-02-21T09:41:35.6733185Z mul.hi.s32 %r584, %r1715, 715827883; 2026-02-21T09:41:35.6733501Z shr.u32 %r585, %r584, 31; 2026-02-21T09:41:35.6733732Z shr.s32 %r586, %r584, 4; 2026-02-21T09:41:35.6733973Z add.s32 %r587, %r586, %r585; 2026-02-21T09:41:35.6734390Z .loc 1 37 33 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:37:33 2026-02-21T09:41:35.6734896Z shl.b32 %r588, %r587, 2; 2026-02-21T09:41:35.6735335Z .loc 1 38 39 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:38:39 2026-02-21T09:41:35.6735792Z sub.s32 %r589, 16, %r588; 2026-02-21T09:41:35.6736202Z .loc 1 38 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:38:52 2026-02-21T09:41:35.6736665Z min.s32 %r590, %r589, 4; 2026-02-21T09:41:35.6737087Z .loc 1 39 45 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:45 2026-02-21T09:41:35.6737543Z mul.lo.s32 %r591, %r587, 96; 2026-02-21T09:41:35.6737797Z sub.s32 %r592, %r1715, %r591; 2026-02-21T09:41:35.6738212Z .loc 1 40 51 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:40:51 2026-02-21T09:41:35.6738683Z div.s32 %r593, %r592, %r590; 2026-02-21T09:41:35.6739105Z .loc 1 39 64 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:64 2026-02-21T09:41:35.6739585Z mul.lo.s32 %r594, %r593, %r590; 2026-02-21T09:41:35.6739895Z sub.s32 %r595, %r592, %r594; 2026-02-21T09:41:35.6740328Z .loc 1 39 30 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:39:30 2026-02-21T09:41:35.6740799Z add.s32 %r596, %r595, %r588; 2026-02-21T09:41:35.6741234Z .loc 1 41 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:41:27 2026-02-21T09:41:35.6741723Z shl.b32 %r1713, %r596, 6; 2026-02-21T09:41:35.6742146Z .loc 1 42 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:42:32 2026-02-21T09:41:35.6742626Z or.b32 %r1716, %r1713, %r4; 2026-02-21T09:41:35.6742882Z or.b32 %r1717, %r1713, %r5; 2026-02-21T09:41:35.6743316Z .loc 1 43 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:43:27 2026-02-21T09:41:35.6743788Z shl.b32 %r1714, %r593, 9; 2026-02-21T09:41:35.6744095Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6744643Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6745175Z add.s32 %r599, %r1702, 1; 2026-02-21T09:41:35.6745468Z setp.gt.s32 %p104, %r599, 6; 2026-02-21T09:41:35.6745741Z selp.b32 %r1702, 0, %r599, %p104; 2026-02-21T09:41:35.6746014Z selp.b32 %r600, 1, 0, %p104; 2026-02-21T09:41:35.6746274Z xor.b32 %r1701, %r1701, %r600; 2026-02-21T09:41:35.6746720Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6747220Z cp.async.wait_group 5; 2026-02-21T09:41:35.6747459Z bar.sync 0; 2026-02-21T09:41:35.6747871Z .loc 1 30 107 
// camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6748360Z shl.b32 %r601, %r1702, 3; 2026-02-21T09:41:35.6748587Z add.s32 %r603, %r123, %r601; 2026-02-21T09:41:35.6748828Z add.s32 %r597, %r603, 274432; 2026-02-21T09:41:35.6749065Z // begin inline asm 2026-02-21T09:41:35.6749268Z 2026-02-21T09:41:35.6749425Z { 2026-02-21T09:41:35.6749611Z .reg .pred complete; 2026-02-21T09:41:35.6749821Z waitLoop: 2026-02-21T09:41:35.6750114Z mbarrier.try_wait.parity.shared.b64 complete, [%r597], %r1701; 2026-02-21T09:41:35.6750478Z @!complete bra.uni waitLoop; 2026-02-21T09:41:35.6750718Z } 2026-02-21T09:41:35.6750815Z 2026-02-21T09:41:35.6750906Z // end inline asm 2026-02-21T09:41:35.6751112Z shl.b32 %r604, %r1704, 3; 2026-02-21T09:41:35.6751351Z add.s32 %r605, %r123, %r604; 2026-02-21T09:41:35.6751587Z add.s32 %r1718, %r605, 274496; 2026-02-21T09:41:35.6752022Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6752533Z @%p80 bra $L__BB0_8; 2026-02-21T09:41:35.6752817Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6753324Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6753795Z shl.b32 %r614, %r1702, 15; 2026-02-21T09:41:35.6754081Z add.s32 %r616, %r123, %r614; 2026-02-21T09:41:35.6754499Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6755023Z shl.b32 %r617, %r1702, 12; 2026-02-21T09:41:35.6755263Z add.s32 %r618, %r123, %r617; 2026-02-21T09:41:35.6755525Z add.s32 %r619, %r618, 229376; 2026-02-21T09:41:35.6755946Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6756418Z elect.sync %r620|%p106, -1; 2026-02-21T09:41:35.6756672Z bfe.u32 %r621, %r619, 4, 14; 2026-02-21T09:41:35.6756911Z cvt.u64.u32 %rd78, %r621; 2026-02-21T09:41:35.6757176Z or.b64 %rd69, %rd78, -9223371899399045120; 2026-02-21T09:41:35.6757460Z bfe.u32 %r622, %r616, 4, 
14; 2026-02-21T09:41:35.6757702Z cvt.u64.u32 %rd79, %r622; 2026-02-21T09:41:35.6757951Z or.b64 %rd70, %rd79, -9223371899281604608; 2026-02-21T09:41:35.6758238Z mov.b32 %r607, 71303184; 2026-02-21T09:41:35.6758504Z // begin inline asm 2026-02-21T09:41:35.6758878Z @%p106 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 0 ], %rd69, %rd70, %r607, %p136; 2026-02-21T09:41:35.6759294Z // end inline asm 2026-02-21T09:41:35.6759505Z add.s32 %r623, %r618, 229408; 2026-02-21T09:41:35.6759757Z bfe.u32 %r624, %r623, 4, 14; 2026-02-21T09:41:35.6759995Z cvt.u64.u32 %rd80, %r624; 2026-02-21T09:41:35.6760253Z or.b64 %rd71, %rd80, -9223371899399045120; 2026-02-21T09:41:35.6760527Z add.s32 %r625, %r616, 32; 2026-02-21T09:41:35.6760767Z bfe.u32 %r626, %r625, 4, 14; 2026-02-21T09:41:35.6761002Z cvt.u64.u32 %rd81, %r626; 2026-02-21T09:41:35.6761254Z or.b64 %rd72, %rd81, -9223371899281604608; 2026-02-21T09:41:35.6761534Z mov.pred %p107, -1; 2026-02-21T09:41:35.6761746Z // begin inline asm 2026-02-21T09:41:35.6762089Z @%p106 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 0 ], %rd71, %rd72, %r607, %p107; 2026-02-21T09:41:35.6762491Z // end inline asm 2026-02-21T09:41:35.6762703Z add.s32 %r627, %r616, 16384; 2026-02-21T09:41:35.6762940Z bfe.u32 %r628, %r627, 4, 14; 2026-02-21T09:41:35.6763183Z cvt.u64.u32 %rd82, %r628; 2026-02-21T09:41:35.6763498Z or.b64 %rd74, %rd82, -9223371899281604608; 2026-02-21T09:41:35.6763776Z // begin inline asm 2026-02-21T09:41:35.6764146Z @%p106 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 1048576 ], %rd69, %rd74, %r607, %p136; 2026-02-21T09:41:35.6764551Z // end inline asm 2026-02-21T09:41:35.6764809Z add.s32 %r629, %r616, 16416; 2026-02-21T09:41:35.6765044Z bfe.u32 %r630, %r629, 4, 14; 2026-02-21T09:41:35.6765287Z cvt.u64.u32 %rd83, %r630; 2026-02-21T09:41:35.6765537Z or.b64 %rd76, %rd83, -9223371899281604608; 2026-02-21T09:41:35.6765822Z // begin inline asm 2026-02-21T09:41:35.6766184Z @%p106 tcgen05.mma.cta_group::1.kind::f16 [ %r1681 + 1048576 ], %rd71, %rd76, 
%r607, %p107; 2026-02-21T09:41:35.6766606Z // end inline asm 2026-02-21T09:41:35.6766826Z cvt.u64.u32 %rd77, %r1718; 2026-02-21T09:41:35.6767058Z // begin inline asm 2026-02-21T09:41:35.6767389Z @%p106 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd77]; 2026-02-21T09:41:35.6767754Z // end inline asm 2026-02-21T09:41:35.6768032Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6768561Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6769037Z setp.eq.b32 %p116, %r102, 0; 2026-02-21T09:41:35.6769293Z setp.lt.s32 %p117, %r1710, %r57; 2026-02-21T09:41:35.6769733Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6770192Z // begin inline asm 2026-02-21T09:41:35.6770449Z 2026-02-21T09:41:35.6770621Z { 2026-02-21T09:41:35.6770803Z .reg .pred complete; 2026-02-21T09:41:35.6771032Z waitLoop: 2026-02-21T09:41:35.6771326Z mbarrier.try_wait.parity.shared.b64 complete, [%r1700], %r1699; 2026-02-21T09:41:35.6771706Z @!complete bra.uni waitLoop; 2026-02-21T09:41:35.6771937Z } 2026-02-21T09:41:35.6772038Z 2026-02-21T09:41:35.6772169Z // end inline asm 2026-02-21T09:41:35.6772377Z add.s32 %r642, %r1704, 1; 2026-02-21T09:41:35.6772612Z setp.gt.s32 %p120, %r642, 1; 2026-02-21T09:41:35.6773038Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6773499Z add.s32 %r644, %r1698, 32; 2026-02-21T09:41:35.6773733Z add.s32 %r645, %r1703, 1; 2026-02-21T09:41:35.6773962Z setp.gt.s32 %p121, %r645, 6; 2026-02-21T09:41:35.6774217Z selp.b32 %r1703, 0, %r645, %p121; 2026-02-21T09:41:35.6774483Z selp.b32 %r1698, 0, %r644, %p116; 2026-02-21T09:41:35.6774989Z .loc 1 50 35 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:50:35 2026-02-21T09:41:35.6775457Z add.s32 %r646, %r1698, %r42; 2026-02-21T09:41:35.6775871Z .loc 1 54 53 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:53 2026-02-21T09:41:35.6776340Z 
shl.b32 %r647, %r1716, 10; 2026-02-21T09:41:35.6776641Z shl.b32 %r648, %r1717, 10; 2026-02-21T09:41:35.6777065Z .loc 1 54 60 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:60 2026-02-21T09:41:35.6777527Z add.s32 %r649, %r647, %r646; 2026-02-21T09:41:35.6777772Z add.s32 %r650, %r648, %r646; 2026-02-21T09:41:35.6778193Z .loc 1 54 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:32 2026-02-21T09:41:35.6778660Z mad.wide.s32 %rd84, %r649, 2, %rd4; 2026-02-21T09:41:35.6778945Z mad.wide.s32 %rd85, %r650, 2, %rd4; 2026-02-21T09:41:35.6779386Z .loc 1 54 85 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:54:85 2026-02-21T09:41:35.6779851Z shl.b32 %r651, %r1703, 12; 2026-02-21T09:41:35.6780084Z add.s32 %r653, %r123, %r651; 2026-02-21T09:41:35.6780325Z add.s32 %r654, %r653, %r51; 2026-02-21T09:41:35.6780561Z bar.sync 0; 2026-02-21T09:41:35.6780759Z add.s32 %r633, %r654, 229376; 2026-02-21T09:41:35.6781010Z selp.b32 %r634, 16, 0, %p117; 2026-02-21T09:41:35.6781250Z // begin inline asm 2026-02-21T09:41:35.6781580Z cp.async.cg.shared.global [ %r633 + 0 ], [ %rd84 + 0 ], 0x10, %r634; 2026-02-21T09:41:35.6781992Z // end inline asm 2026-02-21T09:41:35.6782222Z add.s32 %r635, %r654, 231424; 2026-02-21T09:41:35.6782479Z // begin inline asm 2026-02-21T09:41:35.6782815Z cp.async.cg.shared.global [ %r635 + 0 ], [ %rd85 + 0 ], 0x10, %r634; 2026-02-21T09:41:35.6783193Z // end inline asm 2026-02-21T09:41:35.6783428Z cp.async.commit_group; 2026-02-21T09:41:35.6783885Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6784383Z shl.b32 %r655, %r1703, 3; 2026-02-21T09:41:35.6784643Z add.s32 %r656, %r123, %r655; 2026-02-21T09:41:35.6784967Z add.s32 %r641, %r656, 274432; 2026-02-21T09:41:35.6785233Z and.pred %p114, %p126, %p117; 2026-02-21T09:41:35.6785485Z // begin inline asm 2026-02-21T09:41:35.6785800Z @%p114 mbarrier.arrive.expect_tx.shared.b64 _, [%r641], 32768; 2026-02-21T09:41:35.6786170Z // end 
inline asm 2026-02-21T09:41:35.6786603Z .loc 1 55 44 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:55:44 2026-02-21T09:41:35.6787093Z shl.b32 %r657, %r1703, 15; 2026-02-21T09:41:35.6787335Z bar.sync 0; 2026-02-21T09:41:35.6787560Z elect.sync %r658|%p122, -1; 2026-02-21T09:41:35.6787825Z and.pred %p123, %p117, %p122; 2026-02-21T09:41:35.6788099Z and.pred %p115, %p64, %p123; 2026-02-21T09:41:35.6788348Z add.s32 %r638, %r418, %r657; 2026-02-21T09:41:35.6788598Z or.b32 %r640, %r1714, %r54; 2026-02-21T09:41:35.6788837Z // begin inline asm 2026-02-21T09:41:35.6789404Z @%p115 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r638], [%rd26, {%r1698, %r640}], [%r641]; 2026-02-21T09:41:35.6790697Z // end inline asm 2026-02-21T09:41:35.6791100Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6791575Z setp.ne.b32 %p136, %r1697, 31; 2026-02-21T09:41:35.6791902Z @%p136 bra $L__BB0_10; 2026-02-21T09:41:35.6792202Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:41:35.6792711Z .loc 1 42 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:42:32 2026-02-21T09:41:35.6793169Z add.s32 %r1221, %r1692, %r7; 2026-02-21T09:41:35.6793406Z add.s32 %r1222, %r8, %r1692; 2026-02-21T09:41:35.6793638Z add.s32 %r1223, %r9, %r1692; 2026-02-21T09:41:35.6793878Z add.s32 %r1224, %r10, %r1692; 2026-02-21T09:41:35.6794114Z add.s32 %r1225, %r11, %r1692; 2026-02-21T09:41:35.6794356Z add.s32 %r1226, %r12, %r1692; 2026-02-21T09:41:35.6794586Z add.s32 %r1227, %r13, %r1692; 2026-02-21T09:41:35.6794898Z add.s32 %r1228, %r14, %r1692; 2026-02-21T09:41:35.6795131Z add.s32 %r1229, %r15, %r1692; 2026-02-21T09:41:35.6795376Z add.s32 %r1230, %r16, %r1692; 2026-02-21T09:41:35.6795616Z add.s32 %r1231, %r17, %r1692; 2026-02-21T09:41:35.6795852Z add.s32 %r1232, %r18, %r1692; 2026-02-21T09:41:35.6796139Z add.s32 %r1233, %r19, %r1692; 2026-02-21T09:41:35.6796366Z add.s32 %r1234, %r20, %r1692; 
2026-02-21T09:41:35.6796605Z add.s32 %r1235, %r21, %r1692; 2026-02-21T09:41:35.6796834Z add.s32 %r1236, %r22, %r1692; 2026-02-21T09:41:35.6797074Z add.s32 %r1237, %r23, %r1692; 2026-02-21T09:41:35.6797302Z add.s32 %r1238, %r24, %r1692; 2026-02-21T09:41:35.6797537Z add.s32 %r1239, %r25, %r1692; 2026-02-21T09:41:35.6797771Z add.s32 %r1240, %r26, %r1692; 2026-02-21T09:41:35.6798008Z add.s32 %r1241, %r27, %r1692; 2026-02-21T09:41:35.6798247Z add.s32 %r1242, %r28, %r1692; 2026-02-21T09:41:35.6798478Z add.s32 %r1243, %r29, %r1692; 2026-02-21T09:41:35.6798723Z add.s32 %r1244, %r30, %r1692; 2026-02-21T09:41:35.6798955Z add.s32 %r1245, %r31, %r1692; 2026-02-21T09:41:35.6799194Z add.s32 %r1246, %r32, %r1692; 2026-02-21T09:41:35.6799429Z add.s32 %r1247, %r33, %r1692; 2026-02-21T09:41:35.6799670Z add.s32 %r1248, %r34, %r1692; 2026-02-21T09:41:35.6799902Z add.s32 %r1249, %r35, %r1692; 2026-02-21T09:41:35.6800149Z add.s32 %r1250, %r36, %r1692; 2026-02-21T09:41:35.6800391Z add.s32 %r1251, %r37, %r1692; 2026-02-21T09:41:35.6800685Z add.s32 %r1252, %r38, %r1692; 2026-02-21T09:41:35.6801118Z .loc 1 44 32 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:44:32 2026-02-21T09:41:35.6801582Z add.s32 %r1253, %r1687, %r40; 2026-02-21T09:41:35.6802014Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6802465Z // begin inline asm 2026-02-21T09:41:35.6802665Z 2026-02-21T09:41:35.6802821Z { 2026-02-21T09:41:35.6803000Z .reg .pred complete; 2026-02-21T09:41:35.6803219Z waitLoop: 2026-02-21T09:41:35.6803502Z mbarrier.try_wait.parity.shared.b64 complete, [%r1718], %r1719; 2026-02-21T09:41:35.6803887Z @!complete bra.uni waitLoop; 2026-02-21T09:41:35.6804119Z } 2026-02-21T09:41:35.6804216Z 2026-02-21T09:41:35.6804305Z // end inline asm 2026-02-21T09:41:35.6804739Z .loc 1 59 53 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:59:53 2026-02-21T09:41:35.6805230Z mad.lo.s32 %r1254, %r1221, 12288, %r1253; 2026-02-21T09:41:35.6805521Z 
mad.lo.s32 %r1255, %r1222, 12288, %r1253; 2026-02-21T09:41:35.6805816Z mad.lo.s32 %r1256, %r1223, 12288, %r1253; 2026-02-21T09:41:35.6806106Z mad.lo.s32 %r1257, %r1224, 12288, %r1253; 2026-02-21T09:41:35.6806381Z mad.lo.s32 %r1258, %r1225, 12288, %r1253; 2026-02-21T09:41:35.6806664Z mad.lo.s32 %r1259, %r1226, 12288, %r1253; 2026-02-21T09:41:35.6806937Z mad.lo.s32 %r1260, %r1227, 12288, %r1253; 2026-02-21T09:41:35.6807221Z mad.lo.s32 %r1261, %r1228, 12288, %r1253; 2026-02-21T09:41:35.6807546Z mad.lo.s32 %r1262, %r1229, 12288, %r1253; 2026-02-21T09:41:35.6807831Z mad.lo.s32 %r1263, %r1230, 12288, %r1253; 2026-02-21T09:41:35.6808106Z mad.lo.s32 %r1264, %r1231, 12288, %r1253; 2026-02-21T09:41:35.6808393Z mad.lo.s32 %r1265, %r1232, 12288, %r1253; 2026-02-21T09:41:35.6808680Z mad.lo.s32 %r1266, %r1233, 12288, %r1253; 2026-02-21T09:41:35.6809004Z mad.lo.s32 %r1267, %r1234, 12288, %r1253; 2026-02-21T09:41:35.6809281Z mad.lo.s32 %r1268, %r1235, 12288, %r1253; 2026-02-21T09:41:35.6809558Z mad.lo.s32 %r1269, %r1236, 12288, %r1253; 2026-02-21T09:41:35.6809841Z mad.lo.s32 %r1270, %r1237, 12288, %r1253; 2026-02-21T09:41:35.6810117Z mad.lo.s32 %r1271, %r1238, 12288, %r1253; 2026-02-21T09:41:35.6810402Z mad.lo.s32 %r1272, %r1239, 12288, %r1253; 2026-02-21T09:41:35.6810689Z mad.lo.s32 %r1273, %r1240, 12288, %r1253; 2026-02-21T09:41:35.6810962Z mad.lo.s32 %r1274, %r1241, 12288, %r1253; 2026-02-21T09:41:35.6811247Z mad.lo.s32 %r1275, %r1242, 12288, %r1253; 2026-02-21T09:41:35.6811524Z mad.lo.s32 %r1276, %r1243, 12288, %r1253; 2026-02-21T09:41:35.6811801Z mad.lo.s32 %r1277, %r1244, 12288, %r1253; 2026-02-21T09:41:35.6812069Z mad.lo.s32 %r1278, %r1245, 12288, %r1253; 2026-02-21T09:41:35.6812346Z mad.lo.s32 %r1279, %r1246, 12288, %r1253; 2026-02-21T09:41:35.6812616Z mad.lo.s32 %r1280, %r1247, 12288, %r1253; 2026-02-21T09:41:35.6812942Z mad.lo.s32 %r1281, %r1248, 12288, %r1253; 2026-02-21T09:41:35.6813233Z mad.lo.s32 %r1282, %r1249, 12288, %r1253; 2026-02-21T09:41:35.6813519Z mad.lo.s32 
%r1283, %r1250, 12288, %r1253; 2026-02-21T09:41:35.6813812Z mad.lo.s32 %r1284, %r1251, 12288, %r1253; 2026-02-21T09:41:35.6814095Z mad.lo.s32 %r1285, %r1252, 12288, %r1253; 2026-02-21T09:41:35.6814571Z .loc 1 59 24 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:59:24 2026-02-21T09:41:35.6815095Z mad.wide.s32 %rd87, %r1254, 2, %rd5; 2026-02-21T09:41:35.6815377Z mad.wide.s32 %rd88, %r1255, 2, %rd5; 2026-02-21T09:41:35.6815645Z mad.wide.s32 %rd89, %r1256, 2, %rd5; 2026-02-21T09:41:35.6815926Z mad.wide.s32 %rd90, %r1257, 2, %rd5; 2026-02-21T09:41:35.6816199Z mad.wide.s32 %rd91, %r1258, 2, %rd5; 2026-02-21T09:41:35.6816468Z mad.wide.s32 %rd92, %r1259, 2, %rd5; 2026-02-21T09:41:35.6816741Z mad.wide.s32 %rd93, %r1260, 2, %rd5; 2026-02-21T09:41:35.6817004Z mad.wide.s32 %rd94, %r1261, 2, %rd5; 2026-02-21T09:41:35.6817276Z mad.wide.s32 %rd95, %r1262, 2, %rd5; 2026-02-21T09:41:35.6817538Z mad.wide.s32 %rd96, %r1263, 2, %rd5; 2026-02-21T09:41:35.6817853Z mad.wide.s32 %rd97, %r1264, 2, %rd5; 2026-02-21T09:41:35.6818118Z mad.wide.s32 %rd98, %r1265, 2, %rd5; 2026-02-21T09:41:35.6818392Z mad.wide.s32 %rd99, %r1266, 2, %rd5; 2026-02-21T09:41:35.6818666Z mad.wide.s32 %rd100, %r1267, 2, %rd5; 2026-02-21T09:41:35.6818940Z mad.wide.s32 %rd101, %r1268, 2, %rd5; 2026-02-21T09:41:35.6819210Z mad.wide.s32 %rd102, %r1269, 2, %rd5; 2026-02-21T09:41:35.6819478Z mad.wide.s32 %rd103, %r1270, 2, %rd5; 2026-02-21T09:41:35.6819752Z mad.wide.s32 %rd104, %r1271, 2, %rd5; 2026-02-21T09:41:35.6820017Z mad.wide.s32 %rd105, %r1272, 2, %rd5; 2026-02-21T09:41:35.6820286Z mad.wide.s32 %rd106, %r1273, 2, %rd5; 2026-02-21T09:41:35.6820545Z mad.wide.s32 %rd107, %r1274, 2, %rd5; 2026-02-21T09:41:35.6820813Z mad.wide.s32 %rd108, %r1275, 2, %rd5; 2026-02-21T09:41:35.6820908Z mad.wide.s32 %rd109, %r1276, 2, %rd5; 2026-02-21T09:41:35.6821010Z mad.wide.s32 %rd110, %r1277, 2, %rd5; 2026-02-21T09:41:35.6821101Z mad.wide.s32 %rd111, %r1278, 2, %rd5; 2026-02-21T09:41:35.6821194Z mad.wide.s32 %rd112, %r1279, 
2, %rd5; 2026-02-21T09:41:35.6821285Z mad.wide.s32 %rd113, %r1280, 2, %rd5; 2026-02-21T09:41:35.6821385Z mad.wide.s32 %rd114, %r1281, 2, %rd5; 2026-02-21T09:41:35.6821478Z mad.wide.s32 %rd115, %r1282, 2, %rd5; 2026-02-21T09:41:35.6821569Z mad.wide.s32 %rd116, %r1283, 2, %rd5; 2026-02-21T09:41:35.6821672Z mad.wide.s32 %rd117, %r1284, 2, %rd5; 2026-02-21T09:41:35.6821763Z mad.wide.s32 %rd118, %r1285, 2, %rd5; 2026-02-21T09:41:35.6822090Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6822182Z // begin inline asm 2026-02-21T09:41:35.6822639Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676}, [%r132 + 0], 128; 2026-02-21T09:41:35.6822749Z // end inline asm 2026-02-21T09:41:35.6822830Z // begin inline asm 2026-02-21T09:41:35.6823237Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693}, [%r132 + 16], 128; 2026-02-21T09:41:35.6823304Z // end inline asm 2026-02-21T09:41:35.6823371Z // begin inline asm 2026-02-21T09:41:35.6823776Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710}, [%r132 + 32], 128; 2026-02-21T09:41:35.6823846Z // end inline asm 2026-02-21T09:41:35.6823930Z // begin inline asm 2026-02-21T09:41:35.6824398Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727}, [%r132 + 48], 128; 2026-02-21T09:41:35.6824480Z // end inline asm 2026-02-21T09:41:35.6824564Z // begin inline asm 2026-02-21T09:41:35.6825206Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744}, [%r132 + 64], 128; 2026-02-21T09:41:35.6825303Z // 
end inline asm 2026-02-21T09:41:35.6825387Z // begin inline asm 2026-02-21T09:41:35.6825881Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761}, [%r132 + 80], 128; 2026-02-21T09:41:35.6825967Z // end inline asm 2026-02-21T09:41:35.6826058Z // begin inline asm 2026-02-21T09:41:35.6826541Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778}, [%r132 + 96], 128; 2026-02-21T09:41:35.6826631Z // end inline asm 2026-02-21T09:41:35.6826713Z // begin inline asm 2026-02-21T09:41:35.6827200Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795}, [%r132 + 112], 128; 2026-02-21T09:41:35.6827291Z // end inline asm 2026-02-21T09:41:35.6827427Z // begin inline asm 2026-02-21T09:41:35.6827928Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812}, [%r132 + 1048576], 128; 2026-02-21T09:41:35.6828021Z // end inline asm 2026-02-21T09:41:35.6828105Z // begin inline asm 2026-02-21T09:41:35.6828610Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829}, [%r132 + 1048592], 128; 2026-02-21T09:41:35.6828704Z // end inline asm 2026-02-21T09:41:35.6828788Z // begin inline asm 2026-02-21T09:41:35.6829285Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846}, [%r132 + 1048608], 128; 2026-02-21T09:41:35.6829379Z // end inline asm 2026-02-21T09:41:35.6829467Z // begin inline asm 2026-02-21T09:41:35.6829966Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r848, %r849, %r850, %r851, %r852, %r853, %r854, 
%r855, %r856, %r857, %r858, %r859, %r860, %r861, %r862, %r863}, [%r132 + 1048624], 128; 2026-02-21T09:41:35.6830051Z // end inline asm 2026-02-21T09:41:35.6830148Z // begin inline asm 2026-02-21T09:41:35.6830644Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r865, %r866, %r867, %r868, %r869, %r870, %r871, %r872, %r873, %r874, %r875, %r876, %r877, %r878, %r879, %r880}, [%r132 + 1048640], 128; 2026-02-21T09:41:35.6830727Z // end inline asm 2026-02-21T09:41:35.6830820Z // begin inline asm 2026-02-21T09:41:35.6831359Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r882, %r883, %r884, %r885, %r886, %r887, %r888, %r889, %r890, %r891, %r892, %r893, %r894, %r895, %r896, %r897}, [%r132 + 1048656], 128; 2026-02-21T09:41:35.6831441Z // end inline asm 2026-02-21T09:41:35.6831537Z // begin inline asm 2026-02-21T09:41:35.6832075Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r899, %r900, %r901, %r902, %r903, %r904, %r905, %r906, %r907, %r908, %r909, %r910, %r911, %r912, %r913, %r914}, [%r132 + 1048672], 128; 2026-02-21T09:41:35.6832160Z // end inline asm 2026-02-21T09:41:35.6832254Z // begin inline asm 2026-02-21T09:41:35.6832758Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r916, %r917, %r918, %r919, %r920, %r921, %r922, %r923, %r924, %r925, %r926, %r927, %r928, %r929, %r930, %r931}, [%r132 + 1048688], 128; 2026-02-21T09:41:35.6832838Z // end inline asm 2026-02-21T09:41:35.6832917Z // begin inline asm 2026-02-21T09:41:35.6833036Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:41:35.6833113Z // end inline asm 2026-02-21T09:41:35.6833207Z cvt.u64.u32 %rd119, %r661; 2026-02-21T09:41:35.6833303Z cvt.u64.u32 %rd120, %r662; 2026-02-21T09:41:35.6833390Z shl.b64 %rd121, %rd120, 32; 2026-02-21T09:41:35.6833479Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T09:41:35.6833799Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6833892Z mov.b64 {%r1286, %r1287}, %rd122; 2026-02-21T09:41:35.6833998Z cvt.rn.f16x2.f32 %r1288, %r1287, %r1286; 
2026-02-21T09:41:35.6834270Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6834363Z cvt.u64.u32 %rd123, %r663; 2026-02-21T09:41:35.6834448Z cvt.u64.u32 %rd124, %r664; 2026-02-21T09:41:35.6834534Z shl.b64 %rd125, %rd124, 32; 2026-02-21T09:41:35.6834622Z or.b64 %rd126, %rd123, %rd125; 2026-02-21T09:41:35.6834941Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6835035Z mov.b64 {%r1289, %r1290}, %rd126; 2026-02-21T09:41:35.6835149Z cvt.rn.f16x2.f32 %r1291, %r1290, %r1289; 2026-02-21T09:41:35.6835412Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6835499Z cvt.u64.u32 %rd127, %r665; 2026-02-21T09:41:35.6835583Z cvt.u64.u32 %rd128, %r666; 2026-02-21T09:41:35.6835679Z shl.b64 %rd129, %rd128, 32; 2026-02-21T09:41:35.6835766Z or.b64 %rd130, %rd127, %rd129; 2026-02-21T09:41:35.6836076Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6836173Z mov.b64 {%r1292, %r1293}, %rd130; 2026-02-21T09:41:35.6836274Z cvt.rn.f16x2.f32 %r1294, %r1293, %r1292; 2026-02-21T09:41:35.6836539Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6836631Z cvt.u64.u32 %rd131, %r667; 2026-02-21T09:41:35.6836718Z cvt.u64.u32 %rd132, %r668; 2026-02-21T09:41:35.6836807Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:41:35.6836895Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:41:35.6837169Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6837256Z mov.b64 {%r1295, %r1296}, %rd134; 2026-02-21T09:41:35.6837357Z cvt.rn.f16x2.f32 %r1297, %r1296, %r1295; 2026-02-21T09:41:35.6837632Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6837715Z cvt.u64.u32 %rd135, %r669; 2026-02-21T09:41:35.6837799Z cvt.u64.u32 %rd136, %r670; 2026-02-21T09:41:35.6837893Z shl.b64 %rd137, %rd136, 
32; 2026-02-21T09:41:35.6837979Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:41:35.6838242Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6838330Z mov.b64 {%r1298, %r1299}, %rd138; 2026-02-21T09:41:35.6838438Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T09:41:35.6838737Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6838822Z cvt.u64.u32 %rd139, %r671; 2026-02-21T09:41:35.6838917Z cvt.u64.u32 %rd140, %r672; 2026-02-21T09:41:35.6839004Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:41:35.6839093Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:41:35.6839409Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6839499Z mov.b64 {%r1301, %r1302}, %rd142; 2026-02-21T09:41:35.6839600Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T09:41:35.6839868Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6839963Z cvt.u64.u32 %rd143, %r673; 2026-02-21T09:41:35.6840047Z cvt.u64.u32 %rd144, %r674; 2026-02-21T09:41:35.6840134Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:41:35.6840228Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:41:35.6840487Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6840567Z mov.b64 {%r1304, %r1305}, %rd146; 2026-02-21T09:41:35.6840670Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 2026-02-21T09:41:35.6840971Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6841054Z cvt.u64.u32 %rd147, %r675; 2026-02-21T09:41:35.6841132Z cvt.u64.u32 %rd148, %r676; 2026-02-21T09:41:35.6841226Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:41:35.6841309Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:41:35.6841568Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6841658Z mov.b64 {%r1307, %r1308}, %rd150; 2026-02-21T09:41:35.6841750Z 
cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T09:41:35.6842013Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6842104Z cvt.u64.u32 %rd151, %r678; 2026-02-21T09:41:35.6842183Z cvt.u64.u32 %rd152, %r679; 2026-02-21T09:41:35.6842260Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:41:35.6842340Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:41:35.6842608Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6842696Z mov.b64 {%r1310, %r1311}, %rd154; 2026-02-21T09:41:35.6842790Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T09:41:35.6843105Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6843186Z cvt.u64.u32 %rd155, %r680; 2026-02-21T09:41:35.6843265Z cvt.u64.u32 %rd156, %r681; 2026-02-21T09:41:35.6843350Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:41:35.6843434Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:41:35.6843664Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6843745Z mov.b64 {%r1313, %r1314}, %rd158; 2026-02-21T09:41:35.6843846Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 2026-02-21T09:41:35.6844101Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6844184Z cvt.u64.u32 %rd159, %r682; 2026-02-21T09:41:35.6844281Z cvt.u64.u32 %rd160, %r683; 2026-02-21T09:41:35.6844368Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:41:35.6844451Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:41:35.6844779Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6844867Z mov.b64 {%r1316, %r1317}, %rd162; 2026-02-21T09:41:35.6844969Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T09:41:35.6845233Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6845325Z cvt.u64.u32 %rd163, %r684; 2026-02-21T09:41:35.6845411Z cvt.u64.u32 %rd164, %r685; 
2026-02-21T09:41:35.6845565Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:41:35.6845660Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:41:35.6845923Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6846008Z mov.b64 {%r1319, %r1320}, %rd166; 2026-02-21T09:41:35.6846115Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T09:41:35.6846419Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6846507Z cvt.u64.u32 %rd167, %r686; 2026-02-21T09:41:35.6846592Z cvt.u64.u32 %rd168, %r687; 2026-02-21T09:41:35.6846689Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:41:35.6846778Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:41:35.6847047Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6847150Z mov.b64 {%r1322, %r1323}, %rd170; 2026-02-21T09:41:35.6847248Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T09:41:35.6847514Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6847606Z cvt.u64.u32 %rd171, %r688; 2026-02-21T09:41:35.6847690Z cvt.u64.u32 %rd172, %r689; 2026-02-21T09:41:35.6847773Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:41:35.6847856Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:41:35.6848166Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6848253Z mov.b64 {%r1325, %r1326}, %rd174; 2026-02-21T09:41:35.6848350Z cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T09:41:35.6848623Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6848707Z cvt.u64.u32 %rd175, %r690; 2026-02-21T09:41:35.6848793Z cvt.u64.u32 %rd176, %r691; 2026-02-21T09:41:35.6848890Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:41:35.6848975Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:41:35.6849245Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6849372Z mov.b64 {%r1328, 
%r1329}, %rd178; 2026-02-21T09:41:35.6849484Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T09:41:35.6849759Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6849843Z cvt.u64.u32 %rd179, %r692; 2026-02-21T09:41:35.6849932Z cvt.u64.u32 %rd180, %r693; 2026-02-21T09:41:35.6850061Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:41:35.6850144Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:41:35.6850409Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6850494Z mov.b64 {%r1331, %r1332}, %rd182; 2026-02-21T09:41:35.6850593Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 2026-02-21T09:41:35.6850852Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6850933Z cvt.u64.u32 %rd183, %r695; 2026-02-21T09:41:35.6851001Z cvt.u64.u32 %rd184, %r696; 2026-02-21T09:41:35.6851071Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:41:35.6851149Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:41:35.6851369Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6851446Z mov.b64 {%r1334, %r1335}, %rd186; 2026-02-21T09:41:35.6851534Z cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T09:41:35.6851761Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6851832Z cvt.u64.u32 %rd187, %r697; 2026-02-21T09:41:35.6851914Z cvt.u64.u32 %rd188, %r698; 2026-02-21T09:41:35.6852001Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:41:35.6852079Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:41:35.6852328Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6852415Z mov.b64 {%r1337, %r1338}, %rd190; 2026-02-21T09:41:35.6852567Z cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T09:41:35.6852823Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6852912Z cvt.u64.u32 %rd191, %r699; 
2026-02-21T09:41:35.6852994Z cvt.u64.u32 %rd192, %r700; 2026-02-21T09:41:35.6853149Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:41:35.6853236Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:41:35.6853511Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6853595Z mov.b64 {%r1340, %r1341}, %rd194; 2026-02-21T09:41:35.6853690Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T09:41:35.6853952Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6854033Z cvt.u64.u32 %rd195, %r701; 2026-02-21T09:41:35.6854114Z cvt.u64.u32 %rd196, %r702; 2026-02-21T09:41:35.6854201Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:41:35.6854284Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:41:35.6854542Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6854624Z mov.b64 {%r1343, %r1344}, %rd198; 2026-02-21T09:41:35.6854779Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T09:41:35.6855088Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6855173Z cvt.u64.u32 %rd199, %r703; 2026-02-21T09:41:35.6855264Z cvt.u64.u32 %rd200, %r704; 2026-02-21T09:41:35.6855346Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:41:35.6855433Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:41:35.6855698Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6855779Z mov.b64 {%r1346, %r1347}, %rd202; 2026-02-21T09:41:35.6855872Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T09:41:35.6856126Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6856218Z cvt.u64.u32 %rd203, %r705; 2026-02-21T09:41:35.6856301Z cvt.u64.u32 %rd204, %r706; 2026-02-21T09:41:35.6856384Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:41:35.6856475Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:41:35.6856728Z .loc 1 58 27 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6856854Z mov.b64 {%r1349, %r1350}, %rd206; 2026-02-21T09:41:35.6856959Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 2026-02-21T09:41:35.6857207Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6857290Z cvt.u64.u32 %rd207, %r707; 2026-02-21T09:41:35.6857374Z cvt.u64.u32 %rd208, %r708; 2026-02-21T09:41:35.6857467Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:41:35.6857551Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:41:35.6857813Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6857908Z mov.b64 {%r1352, %r1353}, %rd210; 2026-02-21T09:41:35.6858006Z cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T09:41:35.6858267Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6858361Z cvt.u64.u32 %rd211, %r709; 2026-02-21T09:41:35.6858444Z cvt.u64.u32 %rd212, %r710; 2026-02-21T09:41:35.6858528Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:41:35.6858611Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:41:35.6858874Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6858959Z mov.b64 {%r1355, %r1356}, %rd214; 2026-02-21T09:41:35.6859055Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T09:41:35.6859321Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6859406Z cvt.u64.u32 %rd215, %r712; 2026-02-21T09:41:35.6859539Z cvt.u64.u32 %rd216, %r713; 2026-02-21T09:41:35.6859631Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:41:35.6859716Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:41:35.6859972Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6860097Z mov.b64 {%r1358, %r1359}, %rd218; 2026-02-21T09:41:35.6860202Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T09:41:35.6860470Z .loc 1 56 52 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6860556Z cvt.u64.u32 %rd219, %r714; 2026-02-21T09:41:35.6860653Z cvt.u64.u32 %rd220, %r715; 2026-02-21T09:41:35.6860738Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:41:35.6860826Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:41:35.6861106Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6861191Z mov.b64 {%r1361, %r1362}, %rd222; 2026-02-21T09:41:35.6861288Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T09:41:35.6861560Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6861650Z cvt.u64.u32 %rd223, %r716; 2026-02-21T09:41:35.6861734Z cvt.u64.u32 %rd224, %r717; 2026-02-21T09:41:35.6861851Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:41:35.6861938Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:41:35.6862159Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6862231Z mov.b64 {%r1364, %r1365}, %rd226; 2026-02-21T09:41:35.6862319Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T09:41:35.6862537Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6862607Z cvt.u64.u32 %rd227, %r718; 2026-02-21T09:41:35.6862677Z cvt.u64.u32 %rd228, %r719; 2026-02-21T09:41:35.6862755Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:41:35.6862828Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:41:35.6863065Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6863156Z mov.b64 {%r1367, %r1368}, %rd230; 2026-02-21T09:41:35.6863248Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T09:41:35.6863506Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6863667Z cvt.u64.u32 %rd231, %r720; 2026-02-21T09:41:35.6863751Z cvt.u64.u32 %rd232, %r721; 2026-02-21T09:41:35.6863837Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:41:35.6863922Z or.b64 
%rd234, %rd231, %rd233; 2026-02-21T09:41:35.6864197Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6864283Z mov.b64 {%r1370, %r1371}, %rd234; 2026-02-21T09:41:35.6864382Z cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T09:41:35.6864651Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6864795Z cvt.u64.u32 %rd235, %r722; 2026-02-21T09:41:35.6864884Z cvt.u64.u32 %rd236, %r723; 2026-02-21T09:41:35.6864979Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:41:35.6865065Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:41:35.6865335Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6865425Z mov.b64 {%r1373, %r1374}, %rd238; 2026-02-21T09:41:35.6865535Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T09:41:35.6865798Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6865881Z cvt.u64.u32 %rd239, %r724; 2026-02-21T09:41:35.6865973Z cvt.u64.u32 %rd240, %r725; 2026-02-21T09:41:35.6866058Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:41:35.6866145Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:41:35.6866419Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6866557Z mov.b64 {%r1376, %r1377}, %rd242; 2026-02-21T09:41:35.6866656Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T09:41:35.6866921Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6867058Z cvt.u64.u32 %rd243, %r726; 2026-02-21T09:41:35.6867145Z cvt.u64.u32 %rd244, %r727; 2026-02-21T09:41:35.6867247Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:41:35.6867356Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:41:35.6867635Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6867726Z mov.b64 {%r1379, %r1380}, %rd246; 2026-02-21T09:41:35.6867835Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 
2026-02-21T09:41:35.6868107Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6868196Z cvt.u64.u32 %rd247, %r729; 2026-02-21T09:41:35.6868287Z cvt.u64.u32 %rd248, %r730; 2026-02-21T09:41:35.6868386Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:41:35.6868476Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:41:35.6868758Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6868896Z mov.b64 {%r1382, %r1383}, %rd250; 2026-02-21T09:41:35.6869006Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T09:41:35.6869284Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6869380Z cvt.u64.u32 %rd251, %r731; 2026-02-21T09:41:35.6869469Z cvt.u64.u32 %rd252, %r732; 2026-02-21T09:41:35.6869558Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:41:35.6869648Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:41:35.6869933Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6870024Z mov.b64 {%r1385, %r1386}, %rd254; 2026-02-21T09:41:35.6870134Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T09:41:35.6870418Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6870507Z cvt.u64.u32 %rd255, %r733; 2026-02-21T09:41:35.6870596Z cvt.u64.u32 %rd256, %r734; 2026-02-21T09:41:35.6870693Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:41:35.6870786Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:41:35.6871066Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6871195Z mov.b64 {%r1388, %r1389}, %rd258; 2026-02-21T09:41:35.6871310Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T09:41:35.6871588Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6871678Z cvt.u64.u32 %rd259, %r735; 2026-02-21T09:41:35.6871777Z cvt.u64.u32 %rd260, %r736; 2026-02-21T09:41:35.6871867Z shl.b64 %rd261, %rd260, 
32; 2026-02-21T09:41:35.6871960Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:41:35.6872250Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6872340Z mov.b64 {%r1391, %r1392}, %rd262; 2026-02-21T09:41:35.6872444Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T09:41:35.6872723Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6872823Z cvt.u64.u32 %rd263, %r737; 2026-02-21T09:41:35.6872911Z cvt.u64.u32 %rd264, %r738; 2026-02-21T09:41:35.6873000Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:41:35.6873099Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:41:35.6873370Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6873457Z mov.b64 {%r1394, %r1395}, %rd266; 2026-02-21T09:41:35.6873570Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T09:41:35.6873846Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6873963Z cvt.u64.u32 %rd267, %r739; 2026-02-21T09:41:35.6874054Z cvt.u64.u32 %rd268, %r740; 2026-02-21T09:41:35.6874154Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:41:35.6874244Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:41:35.6874554Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6874652Z mov.b64 {%r1397, %r1398}, %rd270; 2026-02-21T09:41:35.6874817Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 2026-02-21T09:41:35.6875108Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6875207Z cvt.u64.u32 %rd271, %r741; 2026-02-21T09:41:35.6875296Z cvt.u64.u32 %rd272, %r742; 2026-02-21T09:41:35.6875384Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:41:35.6875471Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:41:35.6875745Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6875833Z mov.b64 {%r1400, %r1401}, %rd274; 2026-02-21T09:41:35.6875932Z 
cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T09:41:35.6876200Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6876323Z cvt.u64.u32 %rd275, %r743; 2026-02-21T09:41:35.6876409Z cvt.u64.u32 %rd276, %r744; 2026-02-21T09:41:35.6876503Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:41:35.6876590Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:41:35.6876857Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6876945Z mov.b64 {%r1403, %r1404}, %rd278; 2026-02-21T09:41:35.6877053Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T09:41:35.6877317Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6877400Z cvt.u64.u32 %rd279, %r746; 2026-02-21T09:41:35.6877493Z cvt.u64.u32 %rd280, %r747; 2026-02-21T09:41:35.6877579Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:41:35.6877666Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:41:35.6877941Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6878028Z mov.b64 {%r1406, %r1407}, %rd282; 2026-02-21T09:41:35.6878129Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 2026-02-21T09:41:35.6878390Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6878523Z cvt.u64.u32 %rd283, %r748; 2026-02-21T09:41:35.6878605Z cvt.u64.u32 %rd284, %r749; 2026-02-21T09:41:35.6878689Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:41:35.6878782Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:41:35.6879041Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6879118Z mov.b64 {%r1409, %r1410}, %rd286; 2026-02-21T09:41:35.6879222Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 2026-02-21T09:41:35.6879482Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6879562Z cvt.u64.u32 %rd287, %r750; 2026-02-21T09:41:35.6879638Z cvt.u64.u32 %rd288, %r751; 
2026-02-21T09:41:35.6879728Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:41:35.6879810Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:41:35.6880079Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6880173Z mov.b64 {%r1412, %r1413}, %rd290; 2026-02-21T09:41:35.6880267Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T09:41:35.6880532Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6880620Z cvt.u64.u32 %rd291, %r752; 2026-02-21T09:41:35.6880698Z cvt.u64.u32 %rd292, %r753; 2026-02-21T09:41:35.6880779Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:41:35.6880951Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:41:35.6881217Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6881298Z mov.b64 {%r1415, %r1416}, %rd294; 2026-02-21T09:41:35.6881393Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T09:41:35.6881713Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6881797Z cvt.u64.u32 %rd295, %r754; 2026-02-21T09:41:35.6881880Z cvt.u64.u32 %rd296, %r755; 2026-02-21T09:41:35.6881967Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:41:35.6882042Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:41:35.6882286Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6882361Z mov.b64 {%r1418, %r1419}, %rd298; 2026-02-21T09:41:35.6882458Z cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T09:41:35.6882710Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6882786Z cvt.u64.u32 %rd299, %r756; 2026-02-21T09:41:35.6882869Z cvt.u64.u32 %rd300, %r757; 2026-02-21T09:41:35.6882944Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:41:35.6883022Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:41:35.6883312Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6883382Z mov.b64 {%r1421, 
%r1422}, %rd302; 2026-02-21T09:41:35.6883451Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T09:41:35.6883612Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6883676Z cvt.u64.u32 %rd303, %r758; 2026-02-21T09:41:35.6883732Z cvt.u64.u32 %rd304, %r759; 2026-02-21T09:41:35.6883788Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:41:35.6883853Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:41:35.6884013Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6884071Z mov.b64 {%r1424, %r1425}, %rd306; 2026-02-21T09:41:35.6884144Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 2026-02-21T09:41:35.6884310Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6884369Z cvt.u64.u32 %rd307, %r760; 2026-02-21T09:41:35.6884427Z cvt.u64.u32 %rd308, %r761; 2026-02-21T09:41:35.6884493Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:41:35.6884583Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:41:35.6884794Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6884863Z mov.b64 {%r1427, %r1428}, %rd310; 2026-02-21T09:41:35.6884927Z cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T09:41:35.6885092Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6885155Z cvt.u64.u32 %rd311, %r763; 2026-02-21T09:41:35.6885214Z cvt.u64.u32 %rd312, %r764; 2026-02-21T09:41:35.6885271Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:41:35.6885328Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:41:35.6885498Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6885556Z mov.b64 {%r1430, %r1431}, %rd314; 2026-02-21T09:41:35.6885621Z cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T09:41:35.6885790Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6885849Z cvt.u64.u32 %rd315, %r765; 
2026-02-21T09:41:35.6885906Z cvt.u64.u32 %rd316, %r766; 2026-02-21T09:41:35.6885972Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:41:35.6886033Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:41:35.6886193Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6886250Z mov.b64 {%r1433, %r1434}, %rd318; 2026-02-21T09:41:35.6886323Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T09:41:35.6886518Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6886574Z cvt.u64.u32 %rd319, %r767; 2026-02-21T09:41:35.6886638Z cvt.u64.u32 %rd320, %r768; 2026-02-21T09:41:35.6886693Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:41:35.6886780Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:41:35.6886948Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6887006Z mov.b64 {%r1436, %r1437}, %rd322; 2026-02-21T09:41:35.6887071Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T09:41:35.6887231Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6887298Z cvt.u64.u32 %rd323, %r769; 2026-02-21T09:41:35.6887355Z cvt.u64.u32 %rd324, %r770; 2026-02-21T09:41:35.6887412Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:41:35.6887477Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:41:35.6887637Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6887695Z mov.b64 {%r1439, %r1440}, %rd326; 2026-02-21T09:41:35.6887764Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T09:41:35.6887952Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6888010Z cvt.u64.u32 %rd327, %r771; 2026-02-21T09:41:35.6888067Z cvt.u64.u32 %rd328, %r772; 2026-02-21T09:41:35.6888132Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:41:35.6888189Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:41:35.6888345Z .loc 1 58 27 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6888411Z mov.b64 {%r1442, %r1443}, %rd330; 2026-02-21T09:41:35.6888473Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 2026-02-21T09:41:35.6888629Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6888693Z cvt.u64.u32 %rd331, %r773; 2026-02-21T09:41:35.6888746Z cvt.u64.u32 %rd332, %r774; 2026-02-21T09:41:35.6888801Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:41:35.6888856Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:41:35.6889020Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6889077Z mov.b64 {%r1445, %r1446}, %rd334; 2026-02-21T09:41:35.6889172Z cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T09:41:35.6889341Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6889396Z cvt.u64.u32 %rd335, %r775; 2026-02-21T09:41:35.6889451Z cvt.u64.u32 %rd336, %r776; 2026-02-21T09:41:35.6889514Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:41:35.6889570Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:41:35.6889729Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6889788Z mov.b64 {%r1448, %r1449}, %rd338; 2026-02-21T09:41:35.6889857Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T09:41:35.6890017Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6890072Z cvt.u64.u32 %rd339, %r777; 2026-02-21T09:41:35.6890137Z cvt.u64.u32 %rd340, %r778; 2026-02-21T09:41:35.6890192Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:41:35.6890249Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:41:35.6890418Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6890473Z mov.b64 {%r1451, %r1452}, %rd342; 2026-02-21T09:41:35.6890536Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T09:41:35.6890694Z .loc 1 56 52 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6890756Z cvt.u64.u32 %rd343, %r780; 2026-02-21T09:41:35.6890813Z cvt.u64.u32 %rd344, %r781; 2026-02-21T09:41:35.6890891Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:41:35.6890954Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:41:35.6891113Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6891169Z mov.b64 {%r1454, %r1455}, %rd346; 2026-02-21T09:41:35.6891261Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T09:41:35.6891423Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6891480Z cvt.u64.u32 %rd347, %r782; 2026-02-21T09:41:35.6891536Z cvt.u64.u32 %rd348, %r783; 2026-02-21T09:41:35.6891599Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:41:35.6891656Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:41:35.6891813Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6891878Z mov.b64 {%r1457, %r1458}, %rd350; 2026-02-21T09:41:35.6891941Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T09:41:35.6892103Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6892164Z cvt.u64.u32 %rd351, %r784; 2026-02-21T09:41:35.6892219Z cvt.u64.u32 %rd352, %r785; 2026-02-21T09:41:35.6892277Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:41:35.6892355Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:41:35.6892524Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6892583Z mov.b64 {%r1460, %r1461}, %rd354; 2026-02-21T09:41:35.6892646Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T09:41:35.6892810Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6892867Z cvt.u64.u32 %rd355, %r786; 2026-02-21T09:41:35.6892921Z cvt.u64.u32 %rd356, %r787; 2026-02-21T09:41:35.6892984Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:41:35.6893040Z or.b64 
%rd358, %rd355, %rd357; 2026-02-21T09:41:35.6893202Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6893257Z mov.b64 {%r1463, %r1464}, %rd358; 2026-02-21T09:41:35.6893327Z cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T09:41:35.6893486Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6893540Z cvt.u64.u32 %rd359, %r788; 2026-02-21T09:41:35.6893625Z cvt.u64.u32 %rd360, %r789; 2026-02-21T09:41:35.6893681Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:41:35.6893737Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:41:35.6893905Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6893963Z mov.b64 {%r1466, %r1467}, %rd362; 2026-02-21T09:41:35.6894027Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T09:41:35.6894188Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6894253Z cvt.u64.u32 %rd363, %r790; 2026-02-21T09:41:35.6894308Z cvt.u64.u32 %rd364, %r791; 2026-02-21T09:41:35.6894363Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:41:35.6894429Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:41:35.6894592Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6894650Z mov.b64 {%r1469, %r1470}, %rd366; 2026-02-21T09:41:35.6894772Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T09:41:35.6894935Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6894990Z cvt.u64.u32 %rd367, %r792; 2026-02-21T09:41:35.6895046Z cvt.u64.u32 %rd368, %r793; 2026-02-21T09:41:35.6895109Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:41:35.6895164Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:41:35.6895323Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6895413Z mov.b64 {%r1472, %r1473}, %rd370; 2026-02-21T09:41:35.6895476Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 
2026-02-21T09:41:35.6895638Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6895704Z cvt.u64.u32 %rd371, %r794; 2026-02-21T09:41:35.6895763Z cvt.u64.u32 %rd372, %r795; 2026-02-21T09:41:35.6895847Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:41:35.6895905Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:41:35.6896077Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6896132Z mov.b64 {%r1475, %r1476}, %rd374; 2026-02-21T09:41:35.6896195Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T09:41:35.6896361Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6896415Z cvt.u64.u32 %rd375, %r797; 2026-02-21T09:41:35.6896470Z cvt.u64.u32 %rd376, %r798; 2026-02-21T09:41:35.6896536Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:41:35.6896593Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:41:35.6896759Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6896816Z mov.b64 {%r1478, %r1479}, %rd378; 2026-02-21T09:41:35.6896913Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T09:41:35.6897076Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6897134Z cvt.u64.u32 %rd379, %r799; 2026-02-21T09:41:35.6897197Z cvt.u64.u32 %rd380, %r800; 2026-02-21T09:41:35.6897252Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:41:35.6897307Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:41:35.6897475Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6897531Z mov.b64 {%r1481, %r1482}, %rd382; 2026-02-21T09:41:35.6897594Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T09:41:35.6897756Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6897818Z cvt.u64.u32 %rd383, %r801; 2026-02-21T09:41:35.6897872Z cvt.u64.u32 %rd384, %r802; 2026-02-21T09:41:35.6897925Z shl.b64 %rd385, %rd384, 
32; 2026-02-21T09:41:35.6897990Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:41:35.6898154Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6898250Z mov.b64 {%r1484, %r1485}, %rd386; 2026-02-21T09:41:35.6898321Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T09:41:35.6898483Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6898538Z cvt.u64.u32 %rd387, %r803; 2026-02-21T09:41:35.6898592Z cvt.u64.u32 %rd388, %r804; 2026-02-21T09:41:35.6898654Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:41:35.6898709Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:41:35.6898872Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6898934Z mov.b64 {%r1487, %r1488}, %rd390; 2026-02-21T09:41:35.6898997Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T09:41:35.6899160Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6899222Z cvt.u64.u32 %rd391, %r805; 2026-02-21T09:41:35.6899278Z cvt.u64.u32 %rd392, %r806; 2026-02-21T09:41:35.6899334Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:41:35.6899389Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:41:35.6899556Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6899613Z mov.b64 {%r1490, %r1491}, %rd394; 2026-02-21T09:41:35.6899676Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 2026-02-21T09:41:35.6899843Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6899922Z cvt.u64.u32 %rd395, %r807; 2026-02-21T09:41:35.6899978Z cvt.u64.u32 %rd396, %r808; 2026-02-21T09:41:35.6900040Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:41:35.6900096Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:41:35.6900313Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6900415Z mov.b64 {%r1493, %r1494}, %rd398; 2026-02-21T09:41:35.6900512Z 
cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T09:41:35.6900766Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6900839Z cvt.u64.u32 %rd399, %r809; 2026-02-21T09:41:35.6900917Z cvt.u64.u32 %rd400, %r810; 2026-02-21T09:41:35.6900987Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:41:35.6901058Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:41:35.6901302Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6901377Z mov.b64 {%r1496, %r1497}, %rd402; 2026-02-21T09:41:35.6901467Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T09:41:35.6901719Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6901799Z cvt.u64.u32 %rd403, %r811; 2026-02-21T09:41:35.6901895Z cvt.u64.u32 %rd404, %r812; 2026-02-21T09:41:35.6901970Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:41:35.6902050Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:41:35.6902291Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6902369Z mov.b64 {%r1499, %r1500}, %rd406; 2026-02-21T09:41:35.6902464Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 2026-02-21T09:41:35.6902716Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6902789Z cvt.u64.u32 %rd407, %r814; 2026-02-21T09:41:35.6902860Z cvt.u64.u32 %rd408, %r815; 2026-02-21T09:41:35.6902944Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:41:35.6903016Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:41:35.6903265Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6903344Z mov.b64 {%r1502, %r1503}, %rd410; 2026-02-21T09:41:35.6903428Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T09:41:35.6903668Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6903781Z cvt.u64.u32 %rd411, %r816; 2026-02-21T09:41:35.6903855Z cvt.u64.u32 %rd412, %r817; 
2026-02-21T09:41:35.6903927Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:41:35.6904006Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:41:35.6904261Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6904332Z mov.b64 {%r1505, %r1506}, %rd414; 2026-02-21T09:41:35.6904415Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T09:41:35.6904735Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6904815Z cvt.u64.u32 %rd415, %r818; 2026-02-21T09:41:35.6904890Z cvt.u64.u32 %rd416, %r819; 2026-02-21T09:41:35.6904971Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:41:35.6905046Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:41:35.6905286Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6905365Z mov.b64 {%r1508, %r1509}, %rd418; 2026-02-21T09:41:35.6905464Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T09:41:35.6905708Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6905777Z cvt.u64.u32 %rd419, %r820; 2026-02-21T09:41:35.6905855Z cvt.u64.u32 %rd420, %r821; 2026-02-21T09:41:35.6905936Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:41:35.6906014Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:41:35.6906354Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6906443Z mov.b64 {%r1511, %r1512}, %rd422; 2026-02-21T09:41:35.6906541Z cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T09:41:35.6906795Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6906933Z cvt.u64.u32 %rd423, %r822; 2026-02-21T09:41:35.6907018Z cvt.u64.u32 %rd424, %r823; 2026-02-21T09:41:35.6907103Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:41:35.6907193Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:41:35.6907451Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6907532Z mov.b64 {%r1514, 
%r1515}, %rd426; 2026-02-21T09:41:35.6907634Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T09:41:35.6907883Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6907970Z cvt.u64.u32 %rd427, %r824; 2026-02-21T09:41:35.6908055Z cvt.u64.u32 %rd428, %r825; 2026-02-21T09:41:35.6908148Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:41:35.6908235Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:41:35.6908496Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6908626Z mov.b64 {%r1517, %r1518}, %rd430; 2026-02-21T09:41:35.6908723Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 2026-02-21T09:41:35.6908979Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6909069Z cvt.u64.u32 %rd431, %r826; 2026-02-21T09:41:35.6909152Z cvt.u64.u32 %rd432, %r827; 2026-02-21T09:41:35.6909235Z shl.b64 %rd433, %rd432, 32; 2026-02-21T09:41:35.6909316Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:41:35.6909579Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6909666Z mov.b64 {%r1520, %r1521}, %rd434; 2026-02-21T09:41:35.6909762Z cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T09:41:35.6910031Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6910114Z cvt.u64.u32 %rd435, %r828; 2026-02-21T09:41:35.6910197Z cvt.u64.u32 %rd436, %r829; 2026-02-21T09:41:35.6910292Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:41:35.6910373Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:41:35.6910669Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6910754Z mov.b64 {%r1523, %r1524}, %rd438; 2026-02-21T09:41:35.6910862Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T09:41:35.6911138Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6911228Z cvt.u64.u32 %rd439, %r831; 
2026-02-21T09:41:35.6911324Z cvt.u64.u32 %rd440, %r832; 2026-02-21T09:41:35.6911417Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:41:35.6911508Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:41:35.6911793Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6911883Z mov.b64 {%r1526, %r1527}, %rd442; 2026-02-21T09:41:35.6911989Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T09:41:35.6912263Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6912360Z cvt.u64.u32 %rd443, %r833; 2026-02-21T09:41:35.6912446Z cvt.u64.u32 %rd444, %r834; 2026-02-21T09:41:35.6912532Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:41:35.6912628Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:41:35.6912894Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6912985Z mov.b64 {%r1529, %r1530}, %rd446; 2026-02-21T09:41:35.6913094Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T09:41:35.6913395Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6913485Z cvt.u64.u32 %rd447, %r835; 2026-02-21T09:41:35.6913571Z cvt.u64.u32 %rd448, %r836; 2026-02-21T09:41:35.6913671Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:41:35.6913761Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:41:35.6914068Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6914169Z mov.b64 {%r1532, %r1533}, %rd450; 2026-02-21T09:41:35.6914271Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T09:41:35.6914543Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6914638Z cvt.u64.u32 %rd451, %r837; 2026-02-21T09:41:35.6914778Z cvt.u64.u32 %rd452, %r838; 2026-02-21T09:41:35.6914866Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:41:35.6914956Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T09:41:35.6915236Z .loc 1 58 27 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6915325Z mov.b64 {%r1535, %r1536}, %rd454; 2026-02-21T09:41:35.6915425Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 2026-02-21T09:41:35.6915706Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6915838Z cvt.u64.u32 %rd455, %r839; 2026-02-21T09:41:35.6915926Z cvt.u64.u32 %rd456, %r840; 2026-02-21T09:41:35.6916022Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:41:35.6916108Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:41:35.6916378Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6916466Z mov.b64 {%r1538, %r1539}, %rd458; 2026-02-21T09:41:35.6916575Z cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T09:41:35.6916848Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6916937Z cvt.u64.u32 %rd459, %r841; 2026-02-21T09:41:35.6917034Z cvt.u64.u32 %rd460, %r842; 2026-02-21T09:41:35.6917123Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:41:35.6917212Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:41:35.6917490Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6917582Z mov.b64 {%r1541, %r1542}, %rd462; 2026-02-21T09:41:35.6917682Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T09:41:35.6917985Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6918082Z cvt.u64.u32 %rd463, %r843; 2026-02-21T09:41:35.6918169Z cvt.u64.u32 %rd464, %r844; 2026-02-21T09:41:35.6918259Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:41:35.6918356Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:41:35.6918630Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6918721Z mov.b64 {%r1544, %r1545}, %rd466; 2026-02-21T09:41:35.6918840Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T09:41:35.6919120Z .loc 1 56 52 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6919203Z cvt.u64.u32 %rd467, %r845; 2026-02-21T09:41:35.6919286Z cvt.u64.u32 %rd468, %r846; 2026-02-21T09:41:35.6919384Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:41:35.6919470Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:41:35.6919734Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6919829Z mov.b64 {%r1547, %r1548}, %rd470; 2026-02-21T09:41:35.6919928Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T09:41:35.6920189Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6920281Z cvt.u64.u32 %rd471, %r848; 2026-02-21T09:41:35.6920364Z cvt.u64.u32 %rd472, %r849; 2026-02-21T09:41:35.6920449Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:41:35.6920568Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:41:35.6920839Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6920926Z mov.b64 {%r1550, %r1551}, %rd474; 2026-02-21T09:41:35.6921026Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T09:41:35.6921340Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6921429Z cvt.u64.u32 %rd475, %r850; 2026-02-21T09:41:35.6921513Z cvt.u64.u32 %rd476, %r851; 2026-02-21T09:41:35.6921605Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:41:35.6921693Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:41:35.6921959Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6922047Z mov.b64 {%r1553, %r1554}, %rd478; 2026-02-21T09:41:35.6922155Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T09:41:35.6922420Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6922507Z cvt.u64.u32 %rd479, %r852; 2026-02-21T09:41:35.6922599Z cvt.u64.u32 %rd480, %r853; 2026-02-21T09:41:35.6922681Z shl.b64 %rd481, %rd480, 32; 2026-02-21T09:41:35.6922765Z or.b64 
%rd482, %rd479, %rd481; 2026-02-21T09:41:35.6923086Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6923173Z mov.b64 {%r1556, %r1557}, %rd482; 2026-02-21T09:41:35.6923270Z cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T09:41:35.6923528Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6923621Z cvt.u64.u32 %rd483, %r854; 2026-02-21T09:41:35.6923704Z cvt.u64.u32 %rd484, %r855; 2026-02-21T09:41:35.6923790Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:41:35.6923884Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:41:35.6924145Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6924234Z mov.b64 {%r1559, %r1560}, %rd486; 2026-02-21T09:41:35.6924340Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T09:41:35.6924604Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6924742Z cvt.u64.u32 %rd487, %r856; 2026-02-21T09:41:35.6924827Z cvt.u64.u32 %rd488, %r857; 2026-02-21T09:41:35.6924958Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:41:35.6925046Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:41:35.6925311Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6925406Z mov.b64 {%r1562, %r1563}, %rd490; 2026-02-21T09:41:35.6925508Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T09:41:35.6925767Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6925858Z cvt.u64.u32 %rd491, %r858; 2026-02-21T09:41:35.6925945Z cvt.u64.u32 %rd492, %r859; 2026-02-21T09:41:35.6926030Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:41:35.6926115Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:41:35.6926383Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6926468Z mov.b64 {%r1565, %r1566}, %rd494; 2026-02-21T09:41:35.6926564Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 
2026-02-21T09:41:35.6926821Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6926902Z cvt.u64.u32 %rd495, %r860; 2026-02-21T09:41:35.6926984Z cvt.u64.u32 %rd496, %r861; 2026-02-21T09:41:35.6927076Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:41:35.6927159Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:41:35.6927415Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6927492Z mov.b64 {%r1568, %r1569}, %rd498; 2026-02-21T09:41:35.6927638Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T09:41:35.6927900Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6927983Z cvt.u64.u32 %rd499, %r862; 2026-02-21T09:41:35.6928075Z cvt.u64.u32 %rd500, %r863; 2026-02-21T09:41:35.6928160Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:41:35.6928280Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:41:35.6928549Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6928637Z mov.b64 {%r1571, %r1572}, %rd502; 2026-02-21T09:41:35.6928733Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T09:41:35.6928992Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6929085Z cvt.u64.u32 %rd503, %r865; 2026-02-21T09:41:35.6929168Z cvt.u64.u32 %rd504, %r866; 2026-02-21T09:41:35.6929252Z shl.b64 %rd505, %rd504, 32; 2026-02-21T09:41:35.6929346Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:41:35.6929605Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6929689Z mov.b64 {%r1574, %r1575}, %rd506; 2026-02-21T09:41:35.6929793Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T09:41:35.6930093Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6930178Z cvt.u64.u32 %rd507, %r867; 2026-02-21T09:41:35.6930260Z cvt.u64.u32 %rd508, %r868; 2026-02-21T09:41:35.6930355Z shl.b64 %rd509, %rd508, 
32; 2026-02-21T09:41:35.6930440Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:41:35.6930699Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6930793Z mov.b64 {%r1577, %r1578}, %rd510; 2026-02-21T09:41:35.6930890Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T09:41:35.6931149Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6931243Z cvt.u64.u32 %rd511, %r869; 2026-02-21T09:41:35.6931328Z cvt.u64.u32 %rd512, %r870; 2026-02-21T09:41:35.6931411Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:41:35.6931493Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:41:35.6931770Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6931857Z mov.b64 {%r1580, %r1581}, %rd514; 2026-02-21T09:41:35.6931985Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T09:41:35.6932256Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6932339Z cvt.u64.u32 %rd515, %r871; 2026-02-21T09:41:35.6932423Z cvt.u64.u32 %rd516, %r872; 2026-02-21T09:41:35.6932521Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:41:35.6932608Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T09:41:35.6932870Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6932960Z mov.b64 {%r1583, %r1584}, %rd518; 2026-02-21T09:41:35.6933070Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T09:41:35.6933335Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6933419Z cvt.u64.u32 %rd519, %r873; 2026-02-21T09:41:35.6933512Z cvt.u64.u32 %rd520, %r874; 2026-02-21T09:41:35.6933596Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:41:35.6933684Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:41:35.6933953Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6934038Z mov.b64 {%r1586, %r1587}, %rd522; 2026-02-21T09:41:35.6934135Z 
cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T09:41:35.6934394Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6934487Z cvt.u64.u32 %rd523, %r875; 2026-02-21T09:41:35.6934602Z cvt.u64.u32 %rd524, %r876; 2026-02-21T09:41:35.6934733Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:41:35.6934828Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:41:35.6935087Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6935172Z mov.b64 {%r1589, %r1590}, %rd526; 2026-02-21T09:41:35.6935314Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T09:41:35.6935574Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6935659Z cvt.u64.u32 %rd527, %r877; 2026-02-21T09:41:35.6935743Z cvt.u64.u32 %rd528, %r878; 2026-02-21T09:41:35.6935835Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:41:35.6935919Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:41:35.6936179Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6936270Z mov.b64 {%r1592, %r1593}, %rd530; 2026-02-21T09:41:35.6936367Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T09:41:35.6936630Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6936720Z cvt.u64.u32 %rd531, %r879; 2026-02-21T09:41:35.6936803Z cvt.u64.u32 %rd532, %r880; 2026-02-21T09:41:35.6936924Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:41:35.6937012Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:41:35.6937280Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6937366Z mov.b64 {%r1595, %r1596}, %rd534; 2026-02-21T09:41:35.6937462Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T09:41:35.6937728Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6937812Z cvt.u64.u32 %rd535, %r882; 2026-02-21T09:41:35.6937896Z cvt.u64.u32 %rd536, %r883; 
2026-02-21T09:41:35.6937989Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:41:35.6938074Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:41:35.6938337Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6938420Z mov.b64 {%r1598, %r1599}, %rd538; 2026-02-21T09:41:35.6938524Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T09:41:35.6938790Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6938876Z cvt.u64.u32 %rd539, %r884; 2026-02-21T09:41:35.6939003Z cvt.u64.u32 %rd540, %r885; 2026-02-21T09:41:35.6939089Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:41:35.6939178Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:41:35.6939446Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6939531Z mov.b64 {%r1601, %r1602}, %rd542; 2026-02-21T09:41:35.6939631Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T09:41:35.6939896Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6939993Z cvt.u64.u32 %rd543, %r886; 2026-02-21T09:41:35.6940080Z cvt.u64.u32 %rd544, %r887; 2026-02-21T09:41:35.6940167Z shl.b64 %rd545, %rd544, 32; 2026-02-21T09:41:35.6940261Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:41:35.6940527Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6940612Z mov.b64 {%r1604, %r1605}, %rd546; 2026-02-21T09:41:35.6940720Z cvt.rn.f16x2.f32 %r1606, %r1605, %r1604; 2026-02-21T09:41:35.6940980Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6941066Z cvt.u64.u32 %rd547, %r888; 2026-02-21T09:41:35.6941149Z cvt.u64.u32 %rd548, %r889; 2026-02-21T09:41:35.6941241Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:41:35.6941327Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:41:35.6941586Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6941726Z mov.b64 {%r1607, 
%r1608}, %rd550; 2026-02-21T09:41:35.6941822Z cvt.rn.f16x2.f32 %r1609, %r1608, %r1607; 2026-02-21T09:41:35.6942079Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6942171Z cvt.u64.u32 %rd551, %r890; 2026-02-21T09:41:35.6942283Z cvt.u64.u32 %rd552, %r891; 2026-02-21T09:41:35.6942367Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:41:35.6942454Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:41:35.6942723Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6942807Z mov.b64 {%r1610, %r1611}, %rd554; 2026-02-21T09:41:35.6942902Z cvt.rn.f16x2.f32 %r1612, %r1611, %r1610; 2026-02-21T09:41:35.6943163Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6943243Z cvt.u64.u32 %rd555, %r892; 2026-02-21T09:41:35.6943327Z cvt.u64.u32 %rd556, %r893; 2026-02-21T09:41:35.6943417Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:41:35.6943499Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T09:41:35.6943755Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6943841Z mov.b64 {%r1613, %r1614}, %rd558; 2026-02-21T09:41:35.6943972Z cvt.rn.f16x2.f32 %r1615, %r1614, %r1613; 2026-02-21T09:41:35.6944226Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6944309Z cvt.u64.u32 %rd559, %r894; 2026-02-21T09:41:35.6944395Z cvt.u64.u32 %rd560, %r895; 2026-02-21T09:41:35.6944467Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:41:35.6944540Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:41:35.6944847Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6944925Z mov.b64 {%r1616, %r1617}, %rd562; 2026-02-21T09:41:35.6945022Z cvt.rn.f16x2.f32 %r1618, %r1617, %r1616; 2026-02-21T09:41:35.6945239Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6945323Z cvt.u64.u32 %rd563, %r896; 
2026-02-21T09:41:35.6945399Z cvt.u64.u32 %rd564, %r897; 2026-02-21T09:41:35.6945475Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:41:35.6945566Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:41:35.6945787Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6945912Z mov.b64 {%r1619, %r1620}, %rd566; 2026-02-21T09:41:35.6946004Z cvt.rn.f16x2.f32 %r1621, %r1620, %r1619; 2026-02-21T09:41:35.6946207Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6946265Z cvt.u64.u32 %rd567, %r899; 2026-02-21T09:41:35.6946321Z cvt.u64.u32 %rd568, %r900; 2026-02-21T09:41:35.6946385Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:41:35.6946442Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:41:35.6946605Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6946670Z mov.b64 {%r1622, %r1623}, %rd570; 2026-02-21T09:41:35.6946734Z cvt.rn.f16x2.f32 %r1624, %r1623, %r1622; 2026-02-21T09:41:35.6946896Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6946958Z cvt.u64.u32 %rd571, %r901; 2026-02-21T09:41:35.6947014Z cvt.u64.u32 %rd572, %r902; 2026-02-21T09:41:35.6947069Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:41:35.6947125Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T09:41:35.6947294Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6947351Z mov.b64 {%r1625, %r1626}, %rd574; 2026-02-21T09:41:35.6947414Z cvt.rn.f16x2.f32 %r1627, %r1626, %r1625; 2026-02-21T09:41:35.6947582Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6947672Z cvt.u64.u32 %rd575, %r903; 2026-02-21T09:41:35.6947726Z cvt.u64.u32 %rd576, %r904; 2026-02-21T09:41:35.6947789Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:41:35.6947846Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T09:41:35.6948057Z .loc 1 58 27 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6948116Z mov.b64 {%r1628, %r1629}, %rd578; 2026-02-21T09:41:35.6948189Z cvt.rn.f16x2.f32 %r1630, %r1629, %r1628; 2026-02-21T09:41:35.6948353Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6948410Z cvt.u64.u32 %rd579, %r905; 2026-02-21T09:41:35.6948472Z cvt.u64.u32 %rd580, %r906; 2026-02-21T09:41:35.6948529Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:41:35.6948584Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T09:41:35.6948751Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6948810Z mov.b64 {%r1631, %r1632}, %rd582; 2026-02-21T09:41:35.6948873Z cvt.rn.f16x2.f32 %r1633, %r1632, %r1631; 2026-02-21T09:41:35.6949036Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6949100Z cvt.u64.u32 %rd583, %r907; 2026-02-21T09:41:35.6949182Z cvt.u64.u32 %rd584, %r908; 2026-02-21T09:41:35.6949241Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:41:35.6949306Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:41:35.6949466Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6949522Z mov.b64 {%r1634, %r1635}, %rd586; 2026-02-21T09:41:35.6949591Z cvt.rn.f16x2.f32 %r1636, %r1635, %r1634; 2026-02-21T09:41:35.6949748Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6949802Z cvt.u64.u32 %rd587, %r909; 2026-02-21T09:41:35.6949857Z cvt.u64.u32 %rd588, %r910; 2026-02-21T09:41:35.6949921Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:41:35.6949977Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:41:35.6950135Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6950197Z mov.b64 {%r1637, %r1638}, %rd590; 2026-02-21T09:41:35.6950261Z cvt.rn.f16x2.f32 %r1639, %r1638, %r1637; 2026-02-21T09:41:35.6950425Z .loc 1 56 52 // 
camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6950510Z cvt.u64.u32 %rd591, %r911; 2026-02-21T09:41:35.6950566Z cvt.u64.u32 %rd592, %r912; 2026-02-21T09:41:35.6950621Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:41:35.6950677Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:41:35.6950842Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6950900Z mov.b64 {%r1640, %r1641}, %rd594; 2026-02-21T09:41:35.6950963Z cvt.rn.f16x2.f32 %r1642, %r1641, %r1640; 2026-02-21T09:41:35.6951129Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6951185Z cvt.u64.u32 %rd595, %r913; 2026-02-21T09:41:35.6951240Z cvt.u64.u32 %rd596, %r914; 2026-02-21T09:41:35.6951302Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:41:35.6951360Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:41:35.6951514Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6951573Z mov.b64 {%r1643, %r1644}, %rd598; 2026-02-21T09:41:35.6951644Z cvt.rn.f16x2.f32 %r1645, %r1644, %r1643; 2026-02-21T09:41:35.6951801Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6951856Z cvt.u64.u32 %rd599, %r916; 2026-02-21T09:41:35.6951918Z cvt.u64.u32 %rd600, %r917; 2026-02-21T09:41:35.6951974Z shl.b64 %rd601, %rd600, 32; 2026-02-21T09:41:35.6952030Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:41:35.6952221Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6952280Z mov.b64 {%r1646, %r1647}, %rd602; 2026-02-21T09:41:35.6952343Z cvt.rn.f16x2.f32 %r1648, %r1647, %r1646; 2026-02-21T09:41:35.6952521Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6952585Z cvt.u64.u32 %rd603, %r918; 2026-02-21T09:41:35.6952642Z cvt.u64.u32 %rd604, %r919; 2026-02-21T09:41:35.6952699Z shl.b64 %rd605, %rd604, 32; 2026-02-21T09:41:35.6952762Z or.b64 
%rd606, %rd603, %rd605; 2026-02-21T09:41:35.6952923Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6952981Z mov.b64 {%r1649, %r1650}, %rd606; 2026-02-21T09:41:35.6953052Z cvt.rn.f16x2.f32 %r1651, %r1650, %r1649; 2026-02-21T09:41:35.6953219Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6953279Z cvt.u64.u32 %rd607, %r920; 2026-02-21T09:41:35.6953337Z cvt.u64.u32 %rd608, %r921; 2026-02-21T09:41:35.6953404Z shl.b64 %rd609, %rd608, 32; 2026-02-21T09:41:35.6953463Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T09:41:35.6953657Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6953728Z mov.b64 {%r1652, %r1653}, %rd610; 2026-02-21T09:41:35.6953798Z cvt.rn.f16x2.f32 %r1654, %r1653, %r1652; 2026-02-21T09:41:35.6953965Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6954033Z cvt.u64.u32 %rd611, %r922; 2026-02-21T09:41:35.6954090Z cvt.u64.u32 %rd612, %r923; 2026-02-21T09:41:35.6954149Z shl.b64 %rd613, %rd612, 32; 2026-02-21T09:41:35.6954208Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T09:41:35.6954385Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6954445Z mov.b64 {%r1655, %r1656}, %rd614; 2026-02-21T09:41:35.6954510Z cvt.rn.f16x2.f32 %r1657, %r1656, %r1655; 2026-02-21T09:41:35.6954759Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6954822Z cvt.u64.u32 %rd615, %r924; 2026-02-21T09:41:35.6954883Z cvt.u64.u32 %rd616, %r925; 2026-02-21T09:41:35.6954948Z shl.b64 %rd617, %rd616, 32; 2026-02-21T09:41:35.6955036Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T09:41:35.6955207Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6955268Z mov.b64 {%r1658, %r1659}, %rd618; 2026-02-21T09:41:35.6955344Z cvt.rn.f16x2.f32 %r1660, %r1659, %r1658; 
2026-02-21T09:41:35.6955511Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6955571Z cvt.u64.u32 %rd619, %r926; 2026-02-21T09:41:35.6955637Z cvt.u64.u32 %rd620, %r927; 2026-02-21T09:41:35.6955697Z shl.b64 %rd621, %rd620, 32; 2026-02-21T09:41:35.6955757Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T09:41:35.6955934Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6955994Z mov.b64 {%r1661, %r1662}, %rd622; 2026-02-21T09:41:35.6956062Z cvt.rn.f16x2.f32 %r1663, %r1662, %r1661; 2026-02-21T09:41:35.6956232Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6956300Z cvt.u64.u32 %rd623, %r928; 2026-02-21T09:41:35.6956358Z cvt.u64.u32 %rd624, %r929; 2026-02-21T09:41:35.6956416Z shl.b64 %rd625, %rd624, 32; 2026-02-21T09:41:35.6956485Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T09:41:35.6956656Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6956715Z mov.b64 {%r1664, %r1665}, %rd626; 2026-02-21T09:41:35.6956788Z cvt.rn.f16x2.f32 %r1666, %r1665, %r1664; 2026-02-21T09:41:35.6956988Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6957047Z cvt.u64.u32 %rd627, %r930; 2026-02-21T09:41:35.6957105Z cvt.u64.u32 %rd628, %r931; 2026-02-21T09:41:35.6957171Z shl.b64 %rd629, %rd628, 32; 2026-02-21T09:41:35.6957232Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T09:41:35.6957433Z .loc 1 58 27 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:58:27 2026-02-21T09:41:35.6957506Z mov.b64 {%r1667, %r1668}, %rd630; 2026-02-21T09:41:35.6957572Z cvt.rn.f16x2.f32 %r1669, %r1668, %r1667; 2026-02-21T09:41:35.6957737Z .loc 1 59 83 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:59:83 2026-02-21T09:41:35.6957849Z st.shared.v4.b32 [%r58], {%r1288, %r1300, %r1312, %r1324}; 2026-02-21T09:41:35.6957961Z st.shared.v4.b32 [%r58+1024], {%r1480, %r1492, 
%r1504, %r1516}; 2026-02-21T09:41:35.6958061Z st.shared.v4.b32 [%r59], {%r1336, %r1348, %r1360, %r1372}; 2026-02-21T09:41:35.6958165Z st.shared.v4.b32 [%r59+1024], {%r1528, %r1540, %r1552, %r1564}; 2026-02-21T09:41:35.6958265Z st.shared.v4.b32 [%r60], {%r1384, %r1396, %r1408, %r1420}; 2026-02-21T09:41:35.6958364Z st.shared.v4.b32 [%r60+1024], {%r1576, %r1588, %r1600, %r1612}; 2026-02-21T09:41:35.6958482Z st.shared.v4.b32 [%r61], {%r1432, %r1444, %r1456, %r1468}; 2026-02-21T09:41:35.6958589Z st.shared.v4.b32 [%r61+1024], {%r1624, %r1636, %r1648, %r1660}; 2026-02-21T09:41:35.6958647Z bar.sync 0; 2026-02-21T09:41:35.6958710Z // begin inline asm 2026-02-21T09:41:35.6958880Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1093, %r1097, %r1101, %r1105}, [%r937]; 2026-02-21T09:41:35.6958938Z // end inline asm 2026-02-21T09:41:35.6958997Z // begin inline asm 2026-02-21T09:41:35.6959154Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1109, %r1113, %r1117, %r1121}, [%r942]; 2026-02-21T09:41:35.6959217Z // end inline asm 2026-02-21T09:41:35.6959274Z // begin inline asm 2026-02-21T09:41:35.6959447Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1125, %r1129, %r1133, %r1137}, [%r947]; 2026-02-21T09:41:35.6959528Z // end inline asm 2026-02-21T09:41:35.6959605Z // begin inline asm 2026-02-21T09:41:35.6959844Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1141, %r1145, %r1149, %r1153}, [%r952]; 2026-02-21T09:41:35.6959925Z // end inline asm 2026-02-21T09:41:35.6960002Z // begin inline asm 2026-02-21T09:41:35.6960239Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1157, %r1161, %r1165, %r1169}, [%r957]; 2026-02-21T09:41:35.6960347Z // end inline asm 2026-02-21T09:41:35.6960434Z // begin inline asm 2026-02-21T09:41:35.6960665Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1173, %r1177, %r1181, %r1185}, [%r962]; 2026-02-21T09:41:35.6960739Z // end inline asm 2026-02-21T09:41:35.6960828Z // begin inline asm 2026-02-21T09:41:35.6961067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1189, %r1193, %r1197, 
%r1201}, [%r967]; 2026-02-21T09:41:35.6961137Z // end inline asm 2026-02-21T09:41:35.6961219Z // begin inline asm 2026-02-21T09:41:35.6961468Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1205, %r1209, %r1213, %r1217}, [%r972]; 2026-02-21T09:41:35.6961546Z // end inline asm 2026-02-21T09:41:35.6961623Z bar.sync 0; 2026-02-21T09:41:35.6961778Z st.shared.v4.b32 [%r58], {%r1291, %r1303, %r1315, %r1327}; 2026-02-21T09:41:35.6961935Z st.shared.v4.b32 [%r58+1024], {%r1483, %r1495, %r1507, %r1519}; 2026-02-21T09:41:35.6962074Z st.shared.v4.b32 [%r59], {%r1339, %r1351, %r1363, %r1375}; 2026-02-21T09:41:35.6962237Z st.shared.v4.b32 [%r59+1024], {%r1531, %r1543, %r1555, %r1567}; 2026-02-21T09:41:35.6962378Z st.shared.v4.b32 [%r60], {%r1387, %r1399, %r1411, %r1423}; 2026-02-21T09:41:35.6962525Z st.shared.v4.b32 [%r60+1024], {%r1579, %r1591, %r1603, %r1615}; 2026-02-21T09:41:35.6962665Z st.shared.v4.b32 [%r61], {%r1435, %r1447, %r1459, %r1471}; 2026-02-21T09:41:35.6962812Z st.shared.v4.b32 [%r61+1024], {%r1627, %r1639, %r1651, %r1663}; 2026-02-21T09:41:35.6962888Z bar.sync 0; 2026-02-21T09:41:35.6963021Z // begin inline asm 2026-02-21T09:41:35.6963270Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1094, %r1098, %r1102, %r1106}, [%r937]; 2026-02-21T09:41:35.6963350Z // end inline asm 2026-02-21T09:41:35.6963432Z // begin inline asm 2026-02-21T09:41:35.6963675Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1110, %r1114, %r1118, %r1122}, [%r942]; 2026-02-21T09:41:35.6963789Z // end inline asm 2026-02-21T09:41:35.6963870Z // begin inline asm 2026-02-21T09:41:35.6964107Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1126, %r1130, %r1134, %r1138}, [%r947]; 2026-02-21T09:41:35.6964192Z // end inline asm 2026-02-21T09:41:35.6964276Z // begin inline asm 2026-02-21T09:41:35.6964512Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1142, %r1146, %r1150, %r1154}, [%r952]; 2026-02-21T09:41:35.6964599Z // end inline asm 2026-02-21T09:41:35.6964738Z // begin inline asm 2026-02-21T09:41:35.6964972Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1158, %r1162, %r1166, %r1170}, [%r957]; 2026-02-21T09:41:35.6965059Z // end inline asm 2026-02-21T09:41:35.6965143Z // begin inline asm 2026-02-21T09:41:35.6965382Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1174, %r1178, %r1182, %r1186}, [%r962]; 2026-02-21T09:41:35.6965463Z // end inline asm 2026-02-21T09:41:35.6965557Z // begin inline asm 2026-02-21T09:41:35.6965837Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1190, %r1194, %r1198, %r1202}, [%r967]; 2026-02-21T09:41:35.6965919Z // end inline asm 2026-02-21T09:41:35.6966016Z // begin inline asm 2026-02-21T09:41:35.6966252Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1206, %r1210, %r1214, %r1218}, [%r972]; 2026-02-21T09:41:35.6966328Z // end inline asm 2026-02-21T09:41:35.6966406Z bar.sync 0; 2026-02-21T09:41:35.6966558Z st.shared.v4.b32 [%r58], {%r1294, %r1306, %r1318, %r1330}; 2026-02-21T09:41:35.6966711Z st.shared.v4.b32 [%r58+1024], {%r1486, %r1498, %r1510, %r1522}; 2026-02-21T09:41:35.6966845Z st.shared.v4.b32 [%r59], {%r1342, %r1354, %r1366, %r1378}; 2026-02-21T09:41:35.6966981Z st.shared.v4.b32 [%r59+1024], {%r1534, %r1546, %r1558, %r1570}; 2026-02-21T09:41:35.6967100Z st.shared.v4.b32 [%r60], {%r1390, %r1402, %r1414, %r1426}; 2026-02-21T09:41:35.6967226Z st.shared.v4.b32 [%r60+1024], {%r1582, %r1594, %r1606, %r1618}; 2026-02-21T09:41:35.6967358Z st.shared.v4.b32 [%r61], {%r1438, %r1450, %r1462, %r1474}; 2026-02-21T09:41:35.6967510Z st.shared.v4.b32 [%r61+1024], {%r1630, %r1642, %r1654, %r1666}; 2026-02-21T09:41:35.6967586Z bar.sync 0; 2026-02-21T09:41:35.6967733Z // begin inline asm 2026-02-21T09:41:35.6967972Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1095, %r1099, %r1103, %r1107}, [%r937]; 2026-02-21T09:41:35.6968048Z // end inline asm 2026-02-21T09:41:35.6968127Z // begin inline asm 2026-02-21T09:41:35.6968374Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1111, %r1115, %r1119, %r1123}, [%r942]; 2026-02-21T09:41:35.6968452Z // end inline asm 
2026-02-21T09:41:35.6968532Z // begin inline asm 2026-02-21T09:41:35.6968775Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1127, %r1131, %r1135, %r1139}, [%r947]; 2026-02-21T09:41:35.6968854Z // end inline asm 2026-02-21T09:41:35.6968930Z // begin inline asm 2026-02-21T09:41:35.6969158Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1143, %r1147, %r1151, %r1155}, [%r952]; 2026-02-21T09:41:35.6969241Z // end inline asm 2026-02-21T09:41:35.6969319Z // begin inline asm 2026-02-21T09:41:35.6969553Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1159, %r1163, %r1167, %r1171}, [%r957]; 2026-02-21T09:41:35.6969638Z // end inline asm 2026-02-21T09:41:35.6969715Z // begin inline asm 2026-02-21T09:41:35.6969941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1175, %r1179, %r1183, %r1187}, [%r962]; 2026-02-21T09:41:35.6970023Z // end inline asm 2026-02-21T09:41:35.6970099Z // begin inline asm 2026-02-21T09:41:35.6970332Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1191, %r1195, %r1199, %r1203}, [%r967]; 2026-02-21T09:41:35.6970408Z // end inline asm 2026-02-21T09:41:35.6970494Z // begin inline asm 2026-02-21T09:41:35.6970726Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1207, %r1211, %r1215, %r1219}, [%r972]; 2026-02-21T09:41:35.6970882Z // end inline asm 2026-02-21T09:41:35.6970961Z bar.sync 0; 2026-02-21T09:41:35.6971098Z st.shared.v4.b32 [%r58], {%r1297, %r1309, %r1321, %r1333}; 2026-02-21T09:41:35.6971248Z st.shared.v4.b32 [%r58+1024], {%r1489, %r1501, %r1513, %r1525}; 2026-02-21T09:41:35.6971426Z st.shared.v4.b32 [%r59], {%r1345, %r1357, %r1369, %r1381}; 2026-02-21T09:41:35.6971580Z st.shared.v4.b32 [%r59+1024], {%r1537, %r1549, %r1561, %r1573}; 2026-02-21T09:41:35.6971718Z st.shared.v4.b32 [%r60], {%r1393, %r1405, %r1417, %r1429}; 2026-02-21T09:41:35.6971862Z st.shared.v4.b32 [%r60+1024], {%r1585, %r1597, %r1609, %r1621}; 2026-02-21T09:41:35.6972004Z st.shared.v4.b32 [%r61], {%r1441, %r1453, %r1465, %r1477}; 2026-02-21T09:41:35.6972145Z st.shared.v4.b32 [%r61+1024], {%r1633, 
%r1645, %r1657, %r1669}; 2026-02-21T09:41:35.6972217Z bar.sync 0; 2026-02-21T09:41:35.6972302Z // begin inline asm 2026-02-21T09:41:35.6972536Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1096, %r1100, %r1104, %r1108}, [%r937]; 2026-02-21T09:41:35.6972613Z // end inline asm 2026-02-21T09:41:35.6972691Z // begin inline asm 2026-02-21T09:41:35.6972932Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1112, %r1116, %r1120, %r1124}, [%r942]; 2026-02-21T09:41:35.6973006Z // end inline asm 2026-02-21T09:41:35.6973116Z // begin inline asm 2026-02-21T09:41:35.6973351Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1128, %r1132, %r1136, %r1140}, [%r947]; 2026-02-21T09:41:35.6973431Z // end inline asm 2026-02-21T09:41:35.6973506Z // begin inline asm 2026-02-21T09:41:35.6973730Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1144, %r1148, %r1152, %r1156}, [%r952]; 2026-02-21T09:41:35.6973805Z // end inline asm 2026-02-21T09:41:35.6973885Z // begin inline asm 2026-02-21T09:41:35.6974117Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1160, %r1164, %r1168, %r1172}, [%r957]; 2026-02-21T09:41:35.6974199Z // end inline asm 2026-02-21T09:41:35.6974275Z // begin inline asm 2026-02-21T09:41:35.6974509Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1176, %r1180, %r1184, %r1188}, [%r962]; 2026-02-21T09:41:35.6974590Z // end inline asm 2026-02-21T09:41:35.6974740Z // begin inline asm 2026-02-21T09:41:35.6974962Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1192, %r1196, %r1200, %r1204}, [%r967]; 2026-02-21T09:41:35.6975048Z // end inline asm 2026-02-21T09:41:35.6975125Z // begin inline asm 2026-02-21T09:41:35.6975333Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1208, %r1212, %r1216, %r1220}, [%r972]; 2026-02-21T09:41:35.6975442Z // end inline asm 2026-02-21T09:41:35.6975521Z // begin inline asm 2026-02-21T09:41:35.6975659Z st.global.v4.b32 [ %rd87 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:41:35.6975723Z // end inline asm 2026-02-21T09:41:35.6975812Z // begin inline asm 
2026-02-21T09:41:35.6975969Z st.global.v4.b32 [ %rd88 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:41:35.6976046Z // end inline asm 2026-02-21T09:41:35.6976126Z // begin inline asm 2026-02-21T09:41:35.6976288Z st.global.v4.b32 [ %rd89 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:41:35.6976366Z // end inline asm 2026-02-21T09:41:35.6976445Z // begin inline asm 2026-02-21T09:41:35.6976604Z st.global.v4.b32 [ %rd90 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T09:41:35.6976681Z // end inline asm 2026-02-21T09:41:35.6976763Z // begin inline asm 2026-02-21T09:41:35.6976920Z st.global.v4.b32 [ %rd91 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T09:41:35.6976998Z // end inline asm 2026-02-21T09:41:35.6977078Z // begin inline asm 2026-02-21T09:41:35.6977229Z st.global.v4.b32 [ %rd92 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T09:41:35.6977315Z // end inline asm 2026-02-21T09:41:35.6977395Z // begin inline asm 2026-02-21T09:41:35.6977543Z st.global.v4.b32 [ %rd93 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T09:41:35.6977627Z // end inline asm 2026-02-21T09:41:35.6977704Z // begin inline asm 2026-02-21T09:41:35.6977855Z st.global.v4.b32 [ %rd94 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T09:41:35.6977983Z // end inline asm 2026-02-21T09:41:35.6978070Z // begin inline asm 2026-02-21T09:41:35.6978224Z st.global.v4.b32 [ %rd95 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T09:41:35.6978304Z // end inline asm 2026-02-21T09:41:35.6978398Z // begin inline asm 2026-02-21T09:41:35.6978591Z st.global.v4.b32 [ %rd96 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T09:41:35.6978673Z // end inline asm 2026-02-21T09:41:35.6978752Z // begin inline asm 2026-02-21T09:41:35.6978914Z st.global.v4.b32 [ %rd97 + 0 ], { %r1133, %r1134, %r1135, %r1136 }; 2026-02-21T09:41:35.6978993Z // end inline asm 2026-02-21T09:41:35.6979074Z // begin inline asm 2026-02-21T09:41:35.6979235Z st.global.v4.b32 [ %rd98 + 0 ], { 
%r1137, %r1138, %r1139, %r1140 }; 2026-02-21T09:41:35.6979313Z // end inline asm 2026-02-21T09:41:35.6979392Z // begin inline asm 2026-02-21T09:41:35.6979553Z st.global.v4.b32 [ %rd99 + 0 ], { %r1141, %r1142, %r1143, %r1144 }; 2026-02-21T09:41:35.6979634Z // end inline asm 2026-02-21T09:41:35.6979713Z // begin inline asm 2026-02-21T09:41:35.6979873Z st.global.v4.b32 [ %rd100 + 0 ], { %r1145, %r1146, %r1147, %r1148 }; 2026-02-21T09:41:35.6979961Z // end inline asm 2026-02-21T09:41:35.6980040Z // begin inline asm 2026-02-21T09:41:35.6980239Z st.global.v4.b32 [ %rd101 + 0 ], { %r1149, %r1150, %r1151, %r1152 }; 2026-02-21T09:41:35.6980327Z // end inline asm 2026-02-21T09:41:35.6980408Z // begin inline asm 2026-02-21T09:41:35.6980565Z st.global.v4.b32 [ %rd102 + 0 ], { %r1153, %r1154, %r1155, %r1156 }; 2026-02-21T09:41:35.6980643Z // end inline asm 2026-02-21T09:41:35.6980734Z // begin inline asm 2026-02-21T09:41:35.6980891Z st.global.v4.b32 [ %rd103 + 0 ], { %r1157, %r1158, %r1159, %r1160 }; 2026-02-21T09:41:35.6980968Z // end inline asm 2026-02-21T09:41:35.6981055Z // begin inline asm 2026-02-21T09:41:35.6981212Z st.global.v4.b32 [ %rd104 + 0 ], { %r1161, %r1162, %r1163, %r1164 }; 2026-02-21T09:41:35.6981292Z // end inline asm 2026-02-21T09:41:35.6981376Z // begin inline asm 2026-02-21T09:41:35.6981530Z st.global.v4.b32 [ %rd105 + 0 ], { %r1165, %r1166, %r1167, %r1168 }; 2026-02-21T09:41:35.6981606Z // end inline asm 2026-02-21T09:41:35.6981684Z // begin inline asm 2026-02-21T09:41:35.6981849Z st.global.v4.b32 [ %rd106 + 0 ], { %r1169, %r1170, %r1171, %r1172 }; 2026-02-21T09:41:35.6981927Z // end inline asm 2026-02-21T09:41:35.6982007Z // begin inline asm 2026-02-21T09:41:35.6982202Z st.global.v4.b32 [ %rd107 + 0 ], { %r1173, %r1174, %r1175, %r1176 }; 2026-02-21T09:41:35.6982276Z // end inline asm 2026-02-21T09:41:35.6982355Z // begin inline asm 2026-02-21T09:41:35.6982508Z st.global.v4.b32 [ %rd108 + 0 ], { %r1177, %r1178, %r1179, %r1180 }; 
2026-02-21T09:41:35.6982593Z // end inline asm 2026-02-21T09:41:35.6982672Z // begin inline asm 2026-02-21T09:41:35.6982822Z st.global.v4.b32 [ %rd109 + 0 ], { %r1181, %r1182, %r1183, %r1184 }; 2026-02-21T09:41:35.6982907Z // end inline asm 2026-02-21T09:41:35.6982988Z // begin inline asm 2026-02-21T09:41:35.6983135Z st.global.v4.b32 [ %rd110 + 0 ], { %r1185, %r1186, %r1187, %r1188 }; 2026-02-21T09:41:35.6983217Z // end inline asm 2026-02-21T09:41:35.6983295Z // begin inline asm 2026-02-21T09:41:35.6983447Z st.global.v4.b32 [ %rd111 + 0 ], { %r1189, %r1190, %r1191, %r1192 }; 2026-02-21T09:41:35.6983525Z // end inline asm 2026-02-21T09:41:35.6983614Z // begin inline asm 2026-02-21T09:41:35.6983768Z st.global.v4.b32 [ %rd112 + 0 ], { %r1193, %r1194, %r1195, %r1196 }; 2026-02-21T09:41:35.6983846Z // end inline asm 2026-02-21T09:41:35.6983934Z // begin inline asm 2026-02-21T09:41:35.6984086Z st.global.v4.b32 [ %rd113 + 0 ], { %r1197, %r1198, %r1199, %r1200 }; 2026-02-21T09:41:35.6984163Z // end inline asm 2026-02-21T09:41:35.6984242Z // begin inline asm 2026-02-21T09:41:35.6984400Z st.global.v4.b32 [ %rd114 + 0 ], { %r1201, %r1202, %r1203, %r1204 }; 2026-02-21T09:41:35.6984476Z // end inline asm 2026-02-21T09:41:35.6984553Z // begin inline asm 2026-02-21T09:41:35.6984807Z st.global.v4.b32 [ %rd115 + 0 ], { %r1205, %r1206, %r1207, %r1208 }; 2026-02-21T09:41:35.6984883Z // end inline asm 2026-02-21T09:41:35.6984959Z // begin inline asm 2026-02-21T09:41:35.6985113Z st.global.v4.b32 [ %rd116 + 0 ], { %r1209, %r1210, %r1211, %r1212 }; 2026-02-21T09:41:35.6985199Z // end inline asm 2026-02-21T09:41:35.6985279Z // begin inline asm 2026-02-21T09:41:35.6985488Z st.global.v4.b32 [ %rd117 + 0 ], { %r1213, %r1214, %r1215, %r1216 }; 2026-02-21T09:41:35.6985577Z // end inline asm 2026-02-21T09:41:35.6985658Z // begin inline asm 2026-02-21T09:41:35.6985812Z st.global.v4.b32 [ %rd118 + 0 ], { %r1217, %r1218, %r1219, %r1220 }; 2026-02-21T09:41:35.6985899Z // end inline asm 
2026-02-21T09:41:35.6985983Z bra.uni $L__BB0_10; 2026-02-21T09:41:35.6986106Z $L__BB0_11: // %._crit_edge 2026-02-21T09:41:35.6986404Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6986502Z @%p61 bra $L__BB0_13; 2026-02-21T09:41:35.6986576Z // %bb.12: 2026-02-21T09:41:35.6986851Z .loc 1 56 52 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:56:52 2026-02-21T09:41:35.6986942Z // begin inline asm 2026-02-21T09:41:35.6987015Z 2026-02-21T09:41:35.6987086Z { 2026-02-21T09:41:35.6987213Z .reg .pred complete; 2026-02-21T09:41:35.6987305Z waitLoop: 2026-02-21T09:41:35.6987498Z mbarrier.try_wait.parity.shared.b64 complete, [%r1718], %r1719; 2026-02-21T09:41:35.6987600Z @!complete bra.uni waitLoop; 2026-02-21T09:41:35.6987676Z } 2026-02-21T09:41:35.6987684Z 2026-02-21T09:41:35.6987765Z // end inline asm 2026-02-21T09:41:35.6987842Z $L__BB0_13: 2026-02-21T09:41:35.6988130Z .loc 1 30 107 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:107 2026-02-21T09:41:35.6988217Z cp.async.wait_group 0; 2026-02-21T09:41:35.6988289Z bar.sync 0; 2026-02-21T09:41:35.6988367Z // begin inline asm 2026-02-21T09:41:35.6988508Z @%p126 mbarrier.inval.shared::cta.b64 [%r406]; 2026-02-21T09:41:35.6988586Z // end inline asm 2026-02-21T09:41:35.6988660Z bar.sync 0; 2026-02-21T09:41:35.6988747Z // begin inline asm 2026-02-21T09:41:35.6988889Z @%p126 mbarrier.inval.shared::cta.b64 [%r407]; 2026-02-21T09:41:35.6988962Z // end inline asm 2026-02-21T09:41:35.7004800Z bar.sync 0; 2026-02-21T09:41:35.7004930Z // begin inline asm 2026-02-21T09:41:35.7005049Z @%p126 mbarrier.inval.shared::cta.b64 [%r408]; 2026-02-21T09:41:35.7005244Z // end inline asm 2026-02-21T09:41:35.7005302Z bar.sync 0; 2026-02-21T09:41:35.7005360Z // begin inline asm 2026-02-21T09:41:35.7005447Z @%p126 mbarrier.inval.shared::cta.b64 [%r409]; 2026-02-21T09:41:35.7005512Z // end inline asm 2026-02-21T09:41:35.7005567Z bar.sync 0; 2026-02-21T09:41:35.7005622Z // 
begin inline asm 2026-02-21T09:41:35.7005713Z @%p126 mbarrier.inval.shared::cta.b64 [%r410]; 2026-02-21T09:41:35.7005770Z // end inline asm 2026-02-21T09:41:35.7005825Z bar.sync 0; 2026-02-21T09:41:35.7005889Z // begin inline asm 2026-02-21T09:41:35.7005974Z @%p126 mbarrier.inval.shared::cta.b64 [%r411]; 2026-02-21T09:41:35.7006030Z // end inline asm 2026-02-21T09:41:35.7006081Z bar.sync 0; 2026-02-21T09:41:35.7006145Z // begin inline asm 2026-02-21T09:41:35.7006224Z @%p126 mbarrier.inval.shared::cta.b64 [%r538]; 2026-02-21T09:41:35.7006278Z // end inline asm 2026-02-21T09:41:35.7006347Z add.s32 %r1679, %r123, 274496; 2026-02-21T09:41:35.7006411Z // begin inline asm 2026-02-21T09:41:35.7006496Z @%p126 mbarrier.inval.shared::cta.b64 [%r1679]; 2026-02-21T09:41:35.7006550Z // end inline asm 2026-02-21T09:41:35.7006610Z bar.sync 0; 2026-02-21T09:41:35.7006664Z // begin inline asm 2026-02-21T09:41:35.7006742Z @%p126 mbarrier.inval.shared::cta.b64 [%r405]; 2026-02-21T09:41:35.7006803Z // end inline asm 2026-02-21T09:41:35.7006993Z .loc 1 30 4 // camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py:30:4 2026-02-21T09:41:35.7007047Z bar.sync 0; 2026-02-21T09:41:35.7007142Z // begin inline asm 2026-02-21T09:41:35.7007272Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1681, 256; 2026-02-21T09:41:35.7007328Z // end inline asm 2026-02-21T09:41:35.7007380Z ret; 2026-02-21T09:41:35.7007444Z $L__tmp1: 2026-02-21T09:41:35.7007500Z $L__func_end0: 2026-02-21T09:41:35.7007587Z // -- End function 2026-02-21T09:41:35.7007675Z } 2026-02-21T09:41:35.7007893Z .file 1 "/tmp/torchinductor_root/am/camsj74xbmt7bea5c6prekqwp6dl6a7hmrlf2uagzmfteyzmdyig.py" 2026-02-21T09:41:35.7007957Z .section .debug_abbrev 2026-02-21T09:41:35.7008007Z { 2026-02-21T09:41:35.7008104Z .b8 1 // Abbreviation Code 2026-02-21T09:41:35.7008191Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:41:35.7008273Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:41:35.7008357Z .b8 37 // DW_AT_producer 2026-02-21T09:41:35.7008432Z 
.b8 8 // DW_FORM_string 2026-02-21T09:41:35.7008507Z .b8 19 // DW_AT_language 2026-02-21T09:41:35.7008583Z .b8 5 // DW_FORM_data2 2026-02-21T09:41:35.7008663Z .b8 3 // DW_AT_name 2026-02-21T09:41:35.7008785Z .b8 8 // DW_FORM_string 2026-02-21T09:41:35.7008866Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:41:35.7008952Z .b8 6 // DW_FORM_data4 2026-02-21T09:41:35.7009028Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:41:35.7009099Z .b8 8 // DW_FORM_string 2026-02-21T09:41:35.7009176Z .b8 0 // EOM(1) 2026-02-21T09:41:35.7009245Z .b8 0 // EOM(2) 2026-02-21T09:41:35.7009310Z .b8 0 // EOM(3) 2026-02-21T09:41:35.7009360Z } 2026-02-21T09:41:35.7009429Z .section .debug_info 2026-02-21T09:41:35.7009480Z { 2026-02-21T09:41:35.7009562Z .b32 104 // Length of Unit 2026-02-21T09:41:35.7009655Z .b8 2 // DWARF version number 2026-02-21T09:41:35.7009708Z .b8 0 2026-02-21T09:41:35.7009827Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:41:35.7009927Z .b8 8 // Address Size (in bytes) 2026-02-21T09:41:35.7010057Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:41:35.7010137Z .b8 116 // DW_AT_producer 2026-02-21T09:41:35.7010189Z .b8 114 2026-02-21T09:41:35.7010248Z .b8 105 2026-02-21T09:41:35.7010299Z .b8 116 2026-02-21T09:41:35.7010347Z .b8 111 2026-02-21T09:41:35.7010405Z .b8 110 2026-02-21T09:41:35.7010454Z .b8 0 2026-02-21T09:41:35.7010524Z .b8 2 // DW_AT_language 2026-02-21T09:41:35.7010573Z .b8 0 2026-02-21T09:41:35.7010658Z .b8 99 // DW_AT_name 2026-02-21T09:41:35.7010710Z .b8 97 2026-02-21T09:41:35.7010758Z .b8 109 2026-02-21T09:41:35.7010815Z .b8 115 2026-02-21T09:41:35.7010863Z .b8 106 2026-02-21T09:41:35.7010912Z .b8 55 2026-02-21T09:41:35.7010959Z .b8 52 2026-02-21T09:41:35.7011017Z .b8 120 2026-02-21T09:41:35.7011068Z .b8 98 2026-02-21T09:41:35.7011120Z .b8 109 2026-02-21T09:41:35.7011177Z .b8 116 2026-02-21T09:41:35.7011227Z .b8 55 2026-02-21T09:41:35.7011284Z .b8 98 2026-02-21T09:41:35.7011334Z .b8 101 2026-02-21T09:41:35.7011395Z .b8 97 
2026-02-21T09:41:35.7011446Z .b8 53 2026-02-21T09:41:35.7011494Z .b8 99 2026-02-21T09:41:35.7011542Z .b8 54 2026-02-21T09:41:35.7011601Z .b8 112 2026-02-21T09:41:35.7011650Z .b8 114 2026-02-21T09:41:35.7011698Z .b8 101 2026-02-21T09:41:35.7011753Z .b8 107 2026-02-21T09:41:35.7011803Z .b8 113 2026-02-21T09:41:35.7011852Z .b8 119 2026-02-21T09:41:35.7011902Z .b8 112 2026-02-21T09:41:35.7011960Z .b8 54 2026-02-21T09:41:35.7012009Z .b8 100 2026-02-21T09:41:35.7012084Z .b8 108 2026-02-21T09:41:35.7012143Z .b8 54 2026-02-21T09:41:35.7012193Z .b8 97 2026-02-21T09:41:35.7012243Z .b8 55 2026-02-21T09:41:35.7012294Z .b8 104 2026-02-21T09:41:35.7012352Z .b8 109 2026-02-21T09:41:35.7012400Z .b8 114 2026-02-21T09:41:35.7012450Z .b8 108 2026-02-21T09:41:35.7012502Z .b8 102 2026-02-21T09:41:35.7012560Z .b8 50 2026-02-21T09:41:35.7012635Z .b8 117 2026-02-21T09:41:35.7012686Z .b8 97 2026-02-21T09:41:35.7012745Z .b8 103 2026-02-21T09:41:35.7012795Z .b8 122 2026-02-21T09:41:35.7012844Z .b8 109 2026-02-21T09:41:35.7012894Z .b8 102 2026-02-21T09:41:35.7012953Z .b8 116 2026-02-21T09:41:35.7013001Z .b8 101 2026-02-21T09:41:35.7013051Z .b8 121 2026-02-21T09:41:35.7013108Z .b8 122 2026-02-21T09:41:35.7013157Z .b8 109 2026-02-21T09:41:35.7013205Z .b8 100 2026-02-21T09:41:35.7013253Z .b8 121 2026-02-21T09:41:35.7013310Z .b8 105 2026-02-21T09:41:35.7013357Z .b8 103 2026-02-21T09:41:35.7013406Z .b8 46 2026-02-21T09:41:35.7013465Z .b8 112 2026-02-21T09:41:35.7013513Z .b8 121 2026-02-21T09:41:35.7013563Z .b8 0 2026-02-21T09:41:35.7013652Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:41:35.7013736Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:41:35.7013784Z .b8 116 2026-02-21T09:41:35.7013831Z .b8 109 2026-02-21T09:41:35.7013887Z .b8 112 2026-02-21T09:41:35.7013936Z .b8 47 2026-02-21T09:41:35.7014009Z .b8 116 2026-02-21T09:41:35.7014061Z .b8 111 2026-02-21T09:41:35.7014119Z .b8 114 2026-02-21T09:41:35.7014170Z .b8 99 2026-02-21T09:41:35.7014219Z .b8 104 2026-02-21T09:41:35.7014268Z .b8 105 
2026-02-21T09:41:35.7014326Z .b8 110 2026-02-21T09:41:35.7014378Z .b8 100 2026-02-21T09:41:35.7014426Z .b8 117 2026-02-21T09:41:35.7014481Z .b8 99 2026-02-21T09:41:35.7014530Z .b8 116 2026-02-21T09:41:35.7014580Z .b8 111 2026-02-21T09:41:35.7014627Z .b8 114 2026-02-21T09:41:35.7014734Z .b8 95 2026-02-21T09:41:35.7014783Z .b8 114 2026-02-21T09:41:35.7014832Z .b8 111 2026-02-21T09:41:35.7014890Z .b8 111 2026-02-21T09:41:35.7014952Z .b8 116 2026-02-21T09:41:35.7015015Z .b8 47 2026-02-21T09:41:35.7015068Z .b8 97 2026-02-21T09:41:35.7015135Z .b8 109 2026-02-21T09:41:35.7015196Z .b8 0 2026-02-21T09:41:35.7015257Z } 2026-02-21T09:41:35.7015354Z .section .debug_macinfo { } 2026-02-21T09:41:35.7015362Z 2026-02-21T09:41:35.7015466Z ================================================================ 2026-02-21T09:41:35.7015615Z please share the reproducer above with Triton project. 2026-02-21T09:41:39.7378624Z 2026-02-21T09:41:39.7381333Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 86/86 18.2 configs/s 2026-02-21T09:41:41.0081555Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 780.4 2026-02-21T09:41:41.0086822Z configs/s 2026-02-21T09:41:41.0900054Z [47s] Generation 2 complete: 2026-02-21T09:41:41.0904458Z error=12 2026-02-21T09:41:41.0908465Z ok=79 2026-02-21T09:41:41.0913675Z min=0.0634 2026-02-21T09:41:41.0915384Z mid=0.1433 2026-02-21T09:41:41.0915585Z max=10.5073 2026-02-21T09:41:41.0915734Z best={'block_sizes': [64, 256, 32], 2026-02-21T09:41:41.0915987Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:41:41.0916210Z 'l2_groupings': [4], 2026-02-21T09:41:41.0916394Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:41:41.0916602Z 'loop_orders': [[0, 1]], 2026-02-21T09:41:41.0916773Z 'maxnreg': 128, 2026-02-21T09:41:41.0916917Z 'num_sm_multiplier': 2, 2026-02-21T09:41:41.0917089Z 'num_stages': 7, 2026-02-21T09:41:41.0917237Z 'num_warps': 4, 2026-02-21T09:41:41.0917392Z 'pid_type': 'persistent_interleaved', 
2026-02-21T09:41:41.0917594Z 'range_flattens': [True, False], 2026-02-21T09:41:41.0917773Z 'range_multi_buffers': [False, False], 2026-02-21T09:41:41.0917963Z 'range_num_stages': [0, 0], 2026-02-21T09:41:41.0918128Z 'range_unroll_factors': [0, 0], 2026-02-21T09:41:41.0918314Z 'range_warp_specializes': [False, False]} 2026-02-21T09:41:41.0919407Z [47s] Fitting surrogate: 280 points, 280 targets 2026-02-21T09:41:42.4294139Z [48s] Generation 3 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:41:54.0340411Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 2.2 configs/s 2026-02-21T09:41:58.9707176Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 17.8 configs/s 2026-02-21T09:42:00.6212571Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 605.9 2026-02-21T09:42:00.6216505Z configs/s 2026-02-21T09:42:00.7143133Z [66s] Generation 3 complete: 2026-02-21T09:42:00.7148880Z error=5 2026-02-21T09:42:00.7153421Z ok=86 2026-02-21T09:42:00.7155021Z min=0.0492 2026-02-21T09:42:00.7155201Z mid=0.1147 2026-02-21T09:42:00.7155334Z max=5.8399 2026-02-21T09:42:00.7155495Z best={'block_sizes': [256, 128, 32], 2026-02-21T09:42:00.7155754Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T09:42:00.7155982Z 'l2_groupings': [64], 2026-02-21T09:42:00.7156166Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:42:00.7156377Z 'loop_orders': [[1, 0]], 2026-02-21T09:42:00.7156545Z 'num_stages': 6, 2026-02-21T09:42:00.7156691Z 'num_warps': 4, 2026-02-21T09:42:00.7156843Z 'pid_type': 'flat', 2026-02-21T09:42:00.7157006Z 'range_flattens': [None, True], 2026-02-21T09:42:00.7157428Z 'range_multi_buffers': [None, True], 2026-02-21T09:42:00.7157639Z 'range_num_stages': [0, 0], 2026-02-21T09:42:00.7157821Z 'range_unroll_factors': [0, 0], 2026-02-21T09:42:00.7158018Z 'range_warp_specializes': [None, None]} 2026-02-21T09:42:00.7162044Z [66s] Fitting surrogate: 371 points, 371 targets 2026-02-21T09:42:01.9941044Z [68s] 
Generation 4 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:42:16.8141078Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86/86 1.2 configs/s 2026-02-21T09:42:19.6522696Z [85s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:42:19.6522992Z 2026-02-21T09:42:19.6522996Z 2026-02-21T09:42:19.6523118Z ================================================================ 2026-02-21T09:42:19.6523347Z Internal Triton PTX codegen error 2026-02-21T09:42:19.6523521Z `ptxas` stderr: 2026-02-21T09:42:19.6523951Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 232 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:42:19.6524436Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:42:19.6525156Z 2026-02-21T09:42:19.6525551Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2dq0j00o.ptx -o /tmp/tmp2dq0j00o.ptx.o 2026-02-21T09:42:19.6525974Z 2026-02-21T09:42:19.6525988Z 2026-02-21T09:42:19.6526043Z // 2026-02-21T09:42:19.6526179Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:42:19.6526351Z // 2026-02-21T09:42:19.6526415Z 2026-02-21T09:42:19.6526470Z .version 8.7 2026-02-21T09:42:19.6526605Z .target sm_100a 2026-02-21T09:42:19.6526765Z .address_size 64 2026-02-21T09:42:19.6526861Z 2026-02-21T09:42:19.6526983Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:42:19.6527247Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:42:19.6527459Z // @_helion_matmul 2026-02-21T09:42:19.6527752Z .visible .entry _helion_matmul( 2026-02-21T09:42:19.6527964Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:42:19.6528215Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:42:19.6528459Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 
2026-02-21T09:42:19.6528692Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:42:19.6530107Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=2, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:42:19.6531396Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:42:19.6531633Z `ptxas` stderr: 2026-02-21T09:42:19.6532071Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 232 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:42:19.6532572Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:42:19.6532721Z 2026-02-21T09:42:19.6533107Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp2dq0j00o.ptx -o /tmp/tmp2dq0j00o.ptx.o 2026-02-21T09:42:19.6533568Z 2026-02-21T09:42:19.6533699Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:42:19.6533996Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:42:19.6534200Z ) 2026-02-21T09:42:19.6534329Z .reqntid 128 2026-02-21T09:42:19.6534458Z .maxnreg 32 2026-02-21T09:42:19.6534587Z { 2026-02-21T09:42:19.6534758Z .reg .pred %p<85>; 2026-02-21T09:42:19.6534912Z .reg .b32 %r<447>; 2026-02-21T09:42:19.6535108Z .reg .b64 %rd<222>; 2026-02-21T09:42:19.6535265Z $L__func_begin0: 2026-02-21T09:42:19.6535349Z 2026-02-21T09:42:19.6535405Z // %bb.0: 2026-02-21T09:42:19.6535661Z .loc 1 19 0 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:19 2026-02-21T09:42:19.6535958Z mov.u32 %r1, %tid.x; 2026-02-21T09:42:19.6536133Z ld.param.b64 %rd12, [_helion_matmul_param_1]; 2026-02-21T09:42:19.6536344Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:42:19.6536527Z ld.param.b64 %rd30, [_helion_matmul_param_2]; 2026-02-21T09:42:19.6536730Z mov.b32 %r34, global_smem; 2026-02-21T09:42:19.6536895Z // begin inline asm 2026-02-21T09:42:19.6537138Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r34], 64; 2026-02-21T09:42:19.6537378Z // end inline asm 2026-02-21T09:42:19.6537540Z ld.param.b64 %rd47, [_helion_matmul_param_3]; 2026-02-21T09:42:19.6537724Z bar.sync 0; 2026-02-21T09:42:19.6537863Z ld.shared.b32 %r440, [global_smem]; 2026-02-21T09:42:19.6538035Z bar.sync 0; 2026-02-21T09:42:19.6538164Z // begin inline asm 2026-02-21T09:42:19.6538368Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:42:19.6538628Z // end inline asm 2026-02-21T09:42:19.6538876Z .loc 1 21 68 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:21:68 2026-02-21T09:42:19.6539164Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:42:19.6539322Z mov.u32 %r51, %ctaid.y; 2026-02-21T09:42:19.6539480Z mov.u32 %r52, %ctaid.z; 2026-02-21T09:42:19.6539626Z mov.u32 %r53, %nctaid.x; 2026-02-21T09:42:19.6539785Z mov.u32 %r54, %nctaid.y; 2026-02-21T09:42:19.6539936Z mad.lo.s32 %r55, %r52, %r54, %r51; 2026-02-21T09:42:19.6540118Z mad.lo.s32 %r56, 
%r55, %r53, %r3; 2026-02-21T09:42:19.6540280Z shl.b32 %r57, %r56, 8; 2026-02-21T09:42:19.6540430Z cvt.s64.s32 %rd48, %r57; 2026-02-21T09:42:19.6540580Z add.s64 %rd26, %rd47, %rd48; 2026-02-21T09:42:19.6540740Z shl.b32 %r58, %r1, 2; 2026-02-21T09:42:19.6540883Z add.s32 %r35, %r34, %r58; 2026-02-21T09:42:19.6541035Z mov.b32 %r44, 0; 2026-02-21T09:42:19.6541207Z // begin inline asm 2026-02-21T09:42:19.6541354Z @%p1 st.shared.b32 [ %r35 + 0 ], %r44; 2026-02-21T09:42:19.6541529Z // end inline asm 2026-02-21T09:42:19.6541661Z bar.warp.sync -1; 2026-02-21T09:42:19.6541809Z setp.eq.b32 %p77, %r1, 0; 2026-02-21T09:42:19.6541960Z cvt.u64.u32 %rd11, %r34; 2026-02-21T09:42:19.6542109Z // begin inline asm 2026-02-21T09:42:19.6542352Z @%p77 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:42:19.6542635Z // end inline asm 2026-02-21T09:42:19.6542772Z // begin inline asm 2026-02-21T09:42:19.6542988Z @%p77 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:42:19.6543283Z // end inline asm 2026-02-21T09:42:19.6543411Z mov.b32 %r37, 32; 2026-02-21T09:42:19.6543547Z // begin inline asm 2026-02-21T09:42:19.6543772Z @%p77 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r37; 2026-02-21T09:42:19.6544040Z // end inline asm 2026-02-21T09:42:19.6544166Z mov.b32 %r38, 64; 2026-02-21T09:42:19.6544301Z // begin inline asm 2026-02-21T09:42:19.6544527Z @%p77 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r38; 2026-02-21T09:42:19.6544814Z // end inline asm 2026-02-21T09:42:19.6544951Z mov.b32 %r39, 1024; 2026-02-21T09:42:19.6545086Z // begin inline asm 2026-02-21T09:42:19.6545326Z @%p77 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r39; 2026-02-21T09:42:19.6545594Z // end inline asm 2026-02-21T09:42:19.6545729Z mov.b32 %r40, 12288; 2026-02-21T09:42:19.6545867Z // begin inline asm 2026-02-21T09:42:19.6546106Z @%p77 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r40; 2026-02-21T09:42:19.6546379Z // end inline asm 2026-02-21T09:42:19.6546507Z mov.b64 %rd19, 2048; 2026-02-21T09:42:19.6546650Z // begin inline asm 2026-02-21T09:42:19.6546928Z @%p77 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:42:19.6547215Z // end inline asm 2026-02-21T09:42:19.6547342Z mov.b32 %r41, 1; 2026-02-21T09:42:19.6547481Z // begin inline asm 2026-02-21T09:42:19.6547738Z @%p77 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r41; 2026-02-21T09:42:19.6548012Z // end inline asm 2026-02-21T09:42:19.6548146Z // begin inline asm 2026-02-21T09:42:19.6548391Z @%p77 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r41; 2026-02-21T09:42:19.6548672Z // end inline asm 2026-02-21T09:42:19.6548803Z // begin inline asm 2026-02-21T09:42:19.6549042Z @%p77 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:42:19.6549300Z // end inline asm 2026-02-21T09:42:19.6549436Z // begin inline asm 2026-02-21T09:42:19.6549684Z @%p77 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:42:19.6549955Z // end inline asm 2026-02-21T09:42:19.6550092Z // begin inline asm 2026-02-21T09:42:19.6550320Z @%p77 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x2; 2026-02-21T09:42:19.6550627Z // end inline asm 2026-02-21T09:42:19.6550754Z // begin inline asm 2026-02-21T09:42:19.6550985Z @%p77 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:42:19.6551246Z // end inline asm 2026-02-21T09:42:19.6551375Z // begin inline asm 2026-02-21T09:42:19.6551717Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd26 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:42:19.6552094Z // end inline asm 2026-02-21T09:42:19.6552229Z // 
begin inline asm 2026-02-21T09:42:19.6552430Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd26 + 0 ], 0x80; 2026-02-21T09:42:19.6552683Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:42:19.6552874Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:42:19.6553045Z // end inline asm 2026-02-21T09:42:19.6553186Z bar.sync 0; 2026-02-21T09:42:19.6553353Z cvta.global.u64 %rd67, %rd26; 2026-02-21T09:42:19.6553639Z .loc 1 23 73 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:23:73 2026-02-21T09:42:19.6553924Z add.s64 %rd44, %rd26, 128; 2026-02-21T09:42:19.6554080Z bar.sync 0; 2026-02-21T09:42:19.6554205Z // begin inline asm 2026-02-21T09:42:19.6554355Z @%p1 st.shared.b32 [ %r35 + 0 ], %r44; 2026-02-21T09:42:19.6554527Z // end inline asm 2026-02-21T09:42:19.6554660Z bar.warp.sync -1; 2026-02-21T09:42:19.6554853Z // begin inline asm 2026-02-21T09:42:19.6555095Z @%p77 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd30; 2026-02-21T09:42:19.6555405Z // end inline asm 2026-02-21T09:42:19.6555532Z // begin inline asm 2026-02-21T09:42:19.6555754Z @%p77 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:42:19.6555995Z // end inline asm 2026-02-21T09:42:19.6556131Z // begin inline asm 2026-02-21T09:42:19.6556372Z @%p77 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r38; 2026-02-21T09:42:19.6556633Z // end inline asm 2026-02-21T09:42:19.6556771Z mov.b32 %r46, 128; 2026-02-21T09:42:19.6556902Z // begin inline asm 2026-02-21T09:42:19.6557134Z @%p77 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r46; 2026-02-21T09:42:19.6557391Z // end inline asm 2026-02-21T09:42:19.6557528Z // begin inline asm 2026-02-21T09:42:19.6557760Z @%p77 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r40; 2026-02-21T09:42:19.6558036Z // end inline asm 2026-02-21T09:42:19.6558172Z // begin inline asm 2026-02-21T09:42:19.6558405Z @%p77 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r39; 2026-02-21T09:42:19.6558680Z // end inline asm 2026-02-21T09:42:19.6558813Z mov.b64 %rd37, 24576; 2026-02-21T09:42:19.6558963Z // begin inline asm 2026-02-21T09:42:19.6559213Z @%p77 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd37; 2026-02-21T09:42:19.6559601Z // end inline asm 2026-02-21T09:42:19.6559782Z // begin inline asm 2026-02-21T09:42:19.6560073Z @%p77 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r41; 2026-02-21T09:42:19.6560392Z // end inline asm 2026-02-21T09:42:19.6560550Z // begin inline asm 2026-02-21T09:42:19.6560842Z @%p77 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r41; 2026-02-21T09:42:19.6561146Z // end inline asm 2026-02-21T09:42:19.6561296Z // begin inline asm 2026-02-21T09:42:19.6561541Z @%p77 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x6; 2026-02-21T09:42:19.6561823Z // end inline asm 2026-02-21T09:42:19.6561963Z // begin inline asm 2026-02-21T09:42:19.6562235Z @%p77 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:42:19.6562534Z // end inline asm 2026-02-21T09:42:19.6562664Z // begin inline asm 2026-02-21T09:42:19.6562923Z @%p77 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x3; 2026-02-21T09:42:19.6563203Z // end inline asm 2026-02-21T09:42:19.6563370Z // begin inline asm 2026-02-21T09:42:19.6563609Z @%p77 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:42:19.6563891Z // end inline asm 2026-02-21T09:42:19.6564027Z // begin inline asm 2026-02-21T09:42:19.6564370Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd44 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:42:19.6564786Z // end inline asm 2026-02-21T09:42:19.6564916Z // begin inline asm 2026-02-21T09:42:19.6565141Z 
@%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd44 + 0 ], 0x80; 2026-02-21T09:42:19.6565414Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:42:19.6565597Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:42:19.6565787Z // end inline asm 2026-02-21T09:42:19.6565914Z bar.sync 0; 2026-02-21T09:42:19.6566057Z cvta.global.u64 %rd91, %rd44; 2026-02-21T09:42:19.6566373Z .loc 1 32 107 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:32:107 2026-02-21T09:42:19.6566702Z setp.gt.u32 %p39, %r3, 1535; 2026-02-21T09:42:19.6566857Z @%p39 bra $L__BB0_8; 2026-02-21T09:42:19.6567017Z // %bb.1: // %.lr.ph 2026-02-21T09:42:19.6567311Z .loc 1 0 107 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:0:107 2026-02-21T09:42:19.6567609Z ld.param.b64 %rd10, [_helion_matmul_param_0]; 2026-02-21T09:42:19.6567909Z .loc 1 51 48 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:51:48 2026-02-21T09:42:19.6568241Z shl.b32 %r146, %r1, 3; 2026-02-21T09:42:19.6568394Z and.b32 %r147, %r146, 24; 2026-02-21T09:42:19.6568647Z .loc 1 44 45 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:44:45 2026-02-21T09:42:19.6568930Z bfe.u32 %r4, %r1, 2, 5; 2026-02-21T09:42:19.6569082Z shr.u32 %r148, %r1, 5; 2026-02-21T09:42:19.6569225Z and.b32 %r149, %r1, 127; 2026-02-21T09:42:19.6569379Z shl.b32 %r150, %r149, 4; 2026-02-21T09:42:19.6569521Z shl.b32 %r151, %r1, 1; 2026-02-21T09:42:19.6569672Z and.b32 %r152, %r151, 48; 2026-02-21T09:42:19.6569823Z xor.b32 %r5, %r150, %r152; 2026-02-21T09:42:19.6569980Z or.b32 %r6, %r147, 32; 2026-02-21T09:42:19.6570121Z add.s32 %r154, %r34, %r5; 2026-02-21T09:42:19.6570281Z add.s32 %r189, %r154, 24576; 2026-02-21T09:42:19.6570441Z add.s32 %r191, %r154, 26624; 2026-02-21T09:42:19.6570633Z add.s32 %r193, %r154, 28672; 2026-02-21T09:42:19.6570789Z add.s32 %r195, %r154, 30720; 2026-02-21T09:42:19.6570934Z shl.b32 %r155, %r149, 7; 2026-02-21T09:42:19.6571086Z shl.b32 %r156, %r1, 4; 2026-02-21T09:42:19.6571227Z and.b32 %r157, %r156, 112; 
2026-02-21T09:42:19.6571382Z or.b32 %r158, %r155, %r157; 2026-02-21T09:42:19.6571531Z xor.b32 %r159, %r158, 16; 2026-02-21T09:42:19.6571680Z xor.b32 %r160, %r158, 32; 2026-02-21T09:42:19.6571821Z xor.b32 %r161, %r158, 48; 2026-02-21T09:42:19.6572019Z xor.b32 %r162, %r158, 64; 2026-02-21T09:42:19.6572164Z xor.b32 %r163, %r158, 80; 2026-02-21T09:42:19.6572313Z xor.b32 %r164, %r158, 96; 2026-02-21T09:42:19.6572464Z xor.b32 %r165, %r158, 112; 2026-02-21T09:42:19.6572612Z add.s32 %r131, %r154, 16384; 2026-02-21T09:42:19.6572766Z add.s32 %r137, %r154, 22528; 2026-02-21T09:42:19.6572913Z add.s32 %r135, %r154, 20480; 2026-02-21T09:42:19.6573069Z add.s32 %r133, %r154, 18432; 2026-02-21T09:42:19.6573322Z .loc 1 43 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:43:27 2026-02-21T09:42:19.6573615Z shl.b32 %r166, %r3, 7; 2026-02-21T09:42:19.6573760Z and.b32 %r341, %r166, 896; 2026-02-21T09:42:19.6574023Z .loc 1 44 32 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:44:32 2026-02-21T09:42:19.6574311Z or.b32 %r167, %r341, %r4; 2026-02-21T09:42:19.6574561Z .loc 1 45 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:45:27 2026-02-21T09:42:19.6574892Z shl.b32 %r168, %r3, 3; 2026-02-21T09:42:19.6575035Z and.b32 %r340, %r168, 16320; 2026-02-21T09:42:19.6575316Z .loc 1 55 53 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:53 2026-02-21T09:42:19.6575590Z shl.b32 %r169, %r167, 10; 2026-02-21T09:42:19.6575840Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6576146Z shfl.sync.idx.b32 %r21, %r148, 0, 31, -1; 2026-02-21T09:42:19.6576322Z shl.b32 %r170, %r21, 21; 2026-02-21T09:42:19.6576478Z and.b32 %r171, %r170, 6291456; 2026-02-21T09:42:19.6576633Z add.s32 %r339, %r171, %r440; 2026-02-21T09:42:19.6576791Z mov.pred %p40, -1; 2026-02-21T09:42:19.6576930Z // begin inline asm 2026-02-21T09:42:19.6577266Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r339 + 0], {%r44, %r44, %r44, %r44, %r44, 
%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T09:42:19.6577638Z // end inline asm 2026-02-21T09:42:19.6577769Z // begin inline asm 2026-02-21T09:42:19.6578124Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r339 + 16], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T09:42:19.6578470Z // end inline asm 2026-02-21T09:42:19.6578610Z // begin inline asm 2026-02-21T09:42:19.6578924Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r339 + 32], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T09:42:19.6579285Z // end inline asm 2026-02-21T09:42:19.6579421Z // begin inline asm 2026-02-21T09:42:19.6579734Z @%p40 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r339 + 48], {%r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44, %r44}; 2026-02-21T09:42:19.6580116Z // end inline asm 2026-02-21T09:42:19.6580250Z // begin inline asm 2026-02-21T09:42:19.6580413Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:42:19.6580576Z // end inline asm 2026-02-21T09:42:19.6580721Z bar.sync 0; 2026-02-21T09:42:19.6580963Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6581257Z add.s32 %r442, %r34, 40960; 2026-02-21T09:42:19.6581419Z // begin inline asm 2026-02-21T09:42:19.6581583Z @%p77 mbarrier.init.shared::cta.b64 [%r442], 1; 2026-02-21T09:42:19.6581775Z // end inline asm 2026-02-21T09:42:19.6581903Z bar.sync 0; 2026-02-21T09:42:19.6582040Z add.s32 %r128, %r34, 40968; 2026-02-21T09:42:19.6582193Z // begin inline asm 2026-02-21T09:42:19.6582360Z @%p77 mbarrier.init.shared::cta.b64 [%r128], 1; 2026-02-21T09:42:19.6582543Z // end inline asm 2026-02-21T09:42:19.6582686Z add.s32 %r129, %r34, 40976; 2026-02-21T09:42:19.6582844Z // begin inline asm 2026-02-21T09:42:19.6583004Z @%p77 mbarrier.init.shared::cta.b64 [%r129], 1; 2026-02-21T09:42:19.6583194Z // end inline asm 
2026-02-21T09:42:19.6583328Z bar.sync 0; 2026-02-21T09:42:19.6583469Z add.s32 %r197, %r34, 40984; 2026-02-21T09:42:19.6583654Z // begin inline asm 2026-02-21T09:42:19.6583825Z @%p77 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T09:42:19.6584010Z // end inline asm 2026-02-21T09:42:19.6584264Z .loc 1 55 60 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:60 2026-02-21T09:42:19.6584556Z or.b32 %r172, %r169, %r147; 2026-02-21T09:42:19.6584850Z .loc 1 55 32 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:32 2026-02-21T09:42:19.6585154Z mad.wide.u32 %rd49, %r172, 2, %rd10; 2026-02-21T09:42:19.6585337Z cvt.u64.u32 %rd3, %r169; 2026-02-21T09:42:19.6585502Z add.s64 %rd50, %rd49, 65536; 2026-02-21T09:42:19.6585665Z add.s64 %rd51, %rd49, 131072; 2026-02-21T09:42:19.6585833Z add.s64 %rd52, %rd49, 196608; 2026-02-21T09:42:19.6585990Z mov.b32 %r190, 16; 2026-02-21T09:42:19.6586241Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6586528Z // begin inline asm 2026-02-21T09:42:19.6586737Z cp.async.cg.shared.global [ %r131 + 0 ], [ %rd49 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6586980Z // end inline asm 2026-02-21T09:42:19.6587149Z // begin inline asm 2026-02-21T09:42:19.6587353Z cp.async.cg.shared.global [ %r133 + 0 ], [ %rd50 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6587575Z // end inline asm 2026-02-21T09:42:19.6587714Z // begin inline asm 2026-02-21T09:42:19.6587907Z cp.async.cg.shared.global [ %r135 + 0 ], [ %rd51 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6588134Z // end inline asm 2026-02-21T09:42:19.6588272Z // begin inline asm 2026-02-21T09:42:19.6588462Z cp.async.cg.shared.global [ %r137 + 0 ], [ %rd52 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6588691Z // end inline asm 2026-02-21T09:42:19.6588831Z cp.async.commit_group; 2026-02-21T09:42:19.6589101Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6589385Z bar.sync 0; 2026-02-21T09:42:19.6589522Z 
// begin inline asm 2026-02-21T09:42:19.6589742Z @%p77 mbarrier.arrive.expect_tx.shared.b64 _, [%r129], 4096; 2026-02-21T09:42:19.6589976Z // end inline asm 2026-02-21T09:42:19.6590233Z .loc 1 56 44 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:56:44 2026-02-21T09:42:19.6590520Z bar.sync 0; 2026-02-21T09:42:19.6590673Z elect.sync %r173|%p51, -1; 2026-02-21T09:42:19.6590846Z and.pred %p49, %p1, %p51; 2026-02-21T09:42:19.6591020Z add.s32 %r140, %r34, 32768; 2026-02-21T09:42:19.6591186Z // begin inline asm 2026-02-21T09:42:19.6591520Z @%p49 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r140], [%rd67, {%r44, %r340}], [%r129]; 2026-02-21T09:42:19.6591890Z // end inline asm 2026-02-21T09:42:19.6592122Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6592408Z cp.async.wait_group 0; 2026-02-21T09:42:19.6592554Z bar.sync 0; 2026-02-21T09:42:19.6592788Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6593054Z // begin inline asm 2026-02-21T09:42:19.6593192Z 2026-02-21T09:42:19.6593304Z { 2026-02-21T09:42:19.6593430Z .reg .pred complete; 2026-02-21T09:42:19.6593570Z waitLoop: 2026-02-21T09:42:19.6593758Z mbarrier.try_wait.parity.shared.b64 complete, [%r129], %r44; 2026-02-21T09:42:19.6593992Z @!complete bra.uni waitLoop; 2026-02-21T09:42:19.6594136Z } 2026-02-21T09:42:19.6594198Z 2026-02-21T09:42:19.6594258Z // end inline asm 2026-02-21T09:42:19.6594489Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6594807Z setp.ne.b32 %p52, %r21, 0; 2026-02-21T09:42:19.6594959Z @%p52 bra $L__BB0_3; 2026-02-21T09:42:19.6595104Z // %bb.2: 2026-02-21T09:42:19.6595330Z .loc 1 0 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:0:52 2026-02-21T09:42:19.6595615Z add.s32 %r180, %r34, 32800; 2026-02-21T09:42:19.6595802Z bfe.u32 %r181, %r180, 4, 14; 2026-02-21T09:42:19.6595956Z cvt.u64.u32 
%rd59, %r181; 2026-02-21T09:42:19.6596125Z or.b64 %rd57, %rd59, -9223371899399045120; 2026-02-21T09:42:19.6596299Z add.s32 %r182, %r34, 16384; 2026-02-21T09:42:19.6596453Z add.s32 %r183, %r34, 16416; 2026-02-21T09:42:19.6596599Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T09:42:19.6596758Z cvt.u64.u32 %rd60, %r184; 2026-02-21T09:42:19.6596917Z or.b64 %rd56, %rd60, -9223371899382267904; 2026-02-21T09:42:19.6597098Z bfe.u32 %r185, %r140, 4, 14; 2026-02-21T09:42:19.6597251Z cvt.u64.u32 %rd61, %r185; 2026-02-21T09:42:19.6597406Z or.b64 %rd55, %rd61, -9223371899399045120; 2026-02-21T09:42:19.6597583Z bfe.u32 %r186, %r182, 4, 14; 2026-02-21T09:42:19.6597727Z cvt.u64.u32 %rd62, %r186; 2026-02-21T09:42:19.6597885Z or.b64 %rd54, %rd62, -9223371899382267904; 2026-02-21T09:42:19.6598155Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6598438Z elect.sync %r187|%p54, -1; 2026-02-21T09:42:19.6598590Z mov.b32 %r175, 135266320; 2026-02-21T09:42:19.6598742Z mov.pred %p53, 0; 2026-02-21T09:42:19.6598915Z // begin inline asm 2026-02-21T09:42:19.6599136Z @%p54 tcgen05.mma.cta_group::1.kind::f16 [ %r440 + 0 ], %rd54, %rd55, %r175, %p53; 2026-02-21T09:42:19.6599399Z // end inline asm 2026-02-21T09:42:19.6599530Z // begin inline asm 2026-02-21T09:42:19.6599750Z @%p54 tcgen05.mma.cta_group::1.kind::f16 [ %r440 + 0 ], %rd56, %rd57, %r175, %p40; 2026-02-21T09:42:19.6599995Z // end inline asm 2026-02-21T09:42:19.6600138Z add.s32 %r188, %r34, 40960; 2026-02-21T09:42:19.6600290Z cvt.u64.u32 %rd58, %r188; 2026-02-21T09:42:19.6600446Z // begin inline asm 2026-02-21T09:42:19.6600658Z @%p54 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd58]; 2026-02-21T09:42:19.6600878Z // end inline asm 2026-02-21T09:42:19.6601013Z $L__BB0_3: 2026-02-21T09:42:19.6601238Z .loc 1 0 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:0:52 2026-02-21T09:42:19.6601554Z add.s32 %r11, %r34, %r158; 2026-02-21T09:42:19.6601706Z add.s32 %r12, %r34, %r159; 
2026-02-21T09:42:19.6601862Z add.s32 %r13, %r34, %r160; 2026-02-21T09:42:19.6602006Z add.s32 %r14, %r34, %r161; 2026-02-21T09:42:19.6602160Z add.s32 %r15, %r34, %r162; 2026-02-21T09:42:19.6602309Z add.s32 %r16, %r34, %r163; 2026-02-21T09:42:19.6602454Z add.s32 %r17, %r34, %r164; 2026-02-21T09:42:19.6602607Z add.s32 %r18, %r34, %r165; 2026-02-21T09:42:19.6602852Z .loc 1 55 32 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:32 2026-02-21T09:42:19.6603135Z add.s64 %rd63, %rd49, 64; 2026-02-21T09:42:19.6603334Z cvt.u64.u32 %rd69, %r6; 2026-02-21T09:42:19.6603497Z add.s64 %rd70, %rd3, %rd69; 2026-02-21T09:42:19.6603653Z shl.b64 %rd71, %rd70, 1; 2026-02-21T09:42:19.6603815Z add.s64 %rd72, %rd10, %rd71; 2026-02-21T09:42:19.6603979Z add.s64 %rd64, %rd72, 65536; 2026-02-21T09:42:19.6604138Z add.s64 %rd65, %rd72, 131072; 2026-02-21T09:42:19.6604310Z add.s64 %rd66, %rd72, 196608; 2026-02-21T09:42:19.6604581Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6604898Z bar.sync 0; 2026-02-21T09:42:19.6605023Z // begin inline asm 2026-02-21T09:42:19.6605222Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd63 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6605439Z // end inline asm 2026-02-21T09:42:19.6605574Z // begin inline asm 2026-02-21T09:42:19.6605770Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd64 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6605984Z // end inline asm 2026-02-21T09:42:19.6606120Z // begin inline asm 2026-02-21T09:42:19.6606309Z cp.async.cg.shared.global [ %r193 + 0 ], [ %rd65 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6606527Z // end inline asm 2026-02-21T09:42:19.6606655Z // begin inline asm 2026-02-21T09:42:19.6606851Z cp.async.cg.shared.global [ %r195 + 0 ], [ %rd66 + 0 ], 0x10, %r190; 2026-02-21T09:42:19.6607063Z // end inline asm 2026-02-21T09:42:19.6607234Z cp.async.commit_group; 2026-02-21T09:42:19.6607495Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 
2026-02-21T09:42:19.6607776Z // begin inline asm 2026-02-21T09:42:19.6607964Z @%p77 mbarrier.arrive.expect_tx.shared.b64 _, [%r197], 4096; 2026-02-21T09:42:19.6608171Z // end inline asm 2026-02-21T09:42:19.6608412Z .loc 1 56 44 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:56:44 2026-02-21T09:42:19.6608693Z bar.sync 0; 2026-02-21T09:42:19.6608836Z elect.sync %r206|%p61, -1; 2026-02-21T09:42:19.6608993Z and.pred %p59, %p1, %p61; 2026-02-21T09:42:19.6609153Z add.s32 %r198, %r34, 36864; 2026-02-21T09:42:19.6609307Z mov.b32 %r199, 32; 2026-02-21T09:42:19.6609440Z // begin inline asm 2026-02-21T09:42:19.6609769Z @%p59 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r198], [%rd67, {%r199, %r340}], [%r197]; 2026-02-21T09:42:19.6610122Z // end inline asm 2026-02-21T09:42:19.6610372Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6610682Z and.b32 %r207, %r1, 3; 2026-02-21T09:42:19.6610841Z mul.wide.u32 %rd73, %r207, 16; 2026-02-21T09:42:19.6611007Z and.b32 %r208, %r3, 7; 2026-02-21T09:42:19.6611150Z shl.b32 %r209, %r208, 17; 2026-02-21T09:42:19.6611304Z shl.b32 %r210, %r4, 10; 2026-02-21T09:42:19.6611448Z or.b32 %r211, %r209, %r210; 2026-02-21T09:42:19.6611609Z mul.wide.u32 %rd74, %r211, 2; 2026-02-21T09:42:19.6611764Z or.b64 %rd75, %rd73, %rd74; 2026-02-21T09:42:19.6611920Z add.s64 %rd76, %rd75, %rd10; 2026-02-21T09:42:19.6612075Z add.s64 %rd221, %rd76, 196736; 2026-02-21T09:42:19.6612239Z mov.b32 %r445, 1; 2026-02-21T09:42:19.6612368Z mov.b32 %r441, 0; 2026-02-21T09:42:19.6612508Z mov.b64 %rd220, -32; 2026-02-21T09:42:19.6612658Z mov.b32 %r443, %r441; 2026-02-21T09:42:19.6612800Z mov.b32 %r444, %r441; 2026-02-21T09:42:19.6612946Z mov.b32 %r446, %r441; 2026-02-21T09:42:19.6613082Z bra.uni $L__BB0_4; 2026-02-21T09:42:19.6613321Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:19.6613635Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 
2026-02-21T09:42:19.6613916Z add.s64 %rd8, %rd220, 32; 2026-02-21T09:42:19.6614069Z setp.lt.u64 %p71, %rd8, 960; 2026-02-21T09:42:19.6614330Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6614608Z // begin inline asm 2026-02-21T09:42:19.6614765Z 2026-02-21T09:42:19.6614882Z { 2026-02-21T09:42:19.6615001Z .reg .pred complete; 2026-02-21T09:42:19.6615177Z waitLoop: 2026-02-21T09:42:19.6615358Z mbarrier.try_wait.parity.shared.b64 complete, [%r442], %r441; 2026-02-21T09:42:19.6615589Z @!complete bra.uni waitLoop; 2026-02-21T09:42:19.6615732Z } 2026-02-21T09:42:19.6615801Z 2026-02-21T09:42:19.6615855Z // end inline asm 2026-02-21T09:42:19.6615988Z add.s32 %r254, %r445, 1; 2026-02-21T09:42:19.6616147Z setp.gt.s32 %p74, %r254, 1; 2026-02-21T09:42:19.6616311Z selp.b32 %r445, 0, %r254, %p74; 2026-02-21T09:42:19.6616472Z selp.b32 %r255, 1, 0, %p74; 2026-02-21T09:42:19.6616627Z xor.b32 %r33, %r446, %r255; 2026-02-21T09:42:19.6616879Z .loc 1 55 32 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:32 2026-02-21T09:42:19.6617161Z add.s64 %rd86, %rd221, -196608; 2026-02-21T09:42:19.6617321Z add.s64 %rd87, %rd221, -131072; 2026-02-21T09:42:19.6617486Z add.s64 %rd88, %rd221, -65536; 2026-02-21T09:42:19.6617751Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6618035Z shl.b32 %r256, %r445, 13; 2026-02-21T09:42:19.6618190Z add.s32 %r258, %r34, %r256; 2026-02-21T09:42:19.6618339Z add.s32 %r259, %r258, %r5; 2026-02-21T09:42:19.6618493Z bar.sync 0; 2026-02-21T09:42:19.6618622Z add.s32 %r241, %r259, 16384; 2026-02-21T09:42:19.6618813Z selp.b32 %r242, 16, 0, %p71; 2026-02-21T09:42:19.6618972Z // begin inline asm 2026-02-21T09:42:19.6619177Z cp.async.cg.shared.global [ %r241 + 0 ], [ %rd86 + 0 ], 0x10, %r242; 2026-02-21T09:42:19.6619400Z // end inline asm 2026-02-21T09:42:19.6619544Z add.s32 %r243, %r259, 18432; 2026-02-21T09:42:19.6619699Z // begin inline asm 
2026-02-21T09:42:19.6619890Z cp.async.cg.shared.global [ %r243 + 0 ], [ %rd87 + 0 ], 0x10, %r242; 2026-02-21T09:42:19.6620114Z // end inline asm 2026-02-21T09:42:19.6620244Z add.s32 %r245, %r259, 20480; 2026-02-21T09:42:19.6620397Z // begin inline asm 2026-02-21T09:42:19.6620585Z cp.async.cg.shared.global [ %r245 + 0 ], [ %rd88 + 0 ], 0x10, %r242; 2026-02-21T09:42:19.6620809Z // end inline asm 2026-02-21T09:42:19.6620939Z add.s32 %r247, %r259, 22528; 2026-02-21T09:42:19.6621092Z // begin inline asm 2026-02-21T09:42:19.6621293Z cp.async.cg.shared.global [ %r247 + 0 ], [ %rd221 + 0 ], 0x10, %r242; 2026-02-21T09:42:19.6621508Z // end inline asm 2026-02-21T09:42:19.6621650Z cp.async.commit_group; 2026-02-21T09:42:19.6621902Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6622218Z shl.b32 %r260, %r445, 3; 2026-02-21T09:42:19.6622364Z add.s32 %r261, %r34, %r260; 2026-02-21T09:42:19.6622523Z add.s32 %r253, %r261, 40976; 2026-02-21T09:42:19.6622677Z and.pred %p69, %p77, %p71; 2026-02-21T09:42:19.6622836Z // begin inline asm 2026-02-21T09:42:19.6623024Z @%p69 mbarrier.arrive.expect_tx.shared.b64 _, [%r253], 4096; 2026-02-21T09:42:19.6623233Z // end inline asm 2026-02-21T09:42:19.6623479Z .loc 1 56 44 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:56:44 2026-02-21T09:42:19.6623753Z shl.b32 %r262, %r445, 12; 2026-02-21T09:42:19.6623906Z add.s32 %r263, %r34, %r262; 2026-02-21T09:42:19.6624054Z add.s32 %r250, %r263, 32768; 2026-02-21T09:42:19.6624203Z bar.sync 0; 2026-02-21T09:42:19.6624333Z elect.sync %r264|%p75, -1; 2026-02-21T09:42:19.6624497Z and.pred %p76, %p71, %p75; 2026-02-21T09:42:19.6624718Z and.pred %p70, %p1, %p76; 2026-02-21T09:42:19.6624872Z cvt.u32.u64 %r265, %rd220; 2026-02-21T09:42:19.6625024Z add.s32 %r251, %r265, 96; 2026-02-21T09:42:19.6625168Z // begin inline asm 2026-02-21T09:42:19.6625491Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r250], [%rd67, 
{%r251, %r340}], [%r253]; 2026-02-21T09:42:19.6625832Z // end inline asm 2026-02-21T09:42:19.6626079Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6626352Z add.s64 %rd221, %rd221, 64; 2026-02-21T09:42:19.6626510Z mov.b64 %rd220, %rd8; 2026-02-21T09:42:19.6626685Z mov.b32 %r441, %r446; 2026-02-21T09:42:19.6626821Z mov.b32 %r442, %r266; 2026-02-21T09:42:19.6626964Z mov.b32 %r446, %r33; 2026-02-21T09:42:19.6627103Z @%p71 bra $L__BB0_4; 2026-02-21T09:42:19.6627246Z bra.uni $L__BB0_7; 2026-02-21T09:42:19.6627430Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:42:19.6627649Z add.s32 %r214, %r444, 1; 2026-02-21T09:42:19.6627798Z setp.gt.s32 %p63, %r214, 1; 2026-02-21T09:42:19.6627965Z selp.b32 %r444, 0, %r214, %p63; 2026-02-21T09:42:19.6628133Z selp.b32 %r215, 1, 0, %p63; 2026-02-21T09:42:19.6628285Z xor.b32 %r443, %r443, %r215; 2026-02-21T09:42:19.6628548Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6628852Z cp.async.wait_group 0; 2026-02-21T09:42:19.6629019Z bar.sync 0; 2026-02-21T09:42:19.6629259Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6629554Z shl.b32 %r216, %r444, 3; 2026-02-21T09:42:19.6629706Z add.s32 %r218, %r34, %r216; 2026-02-21T09:42:19.6629868Z add.s32 %r212, %r218, 40976; 2026-02-21T09:42:19.6630027Z // begin inline asm 2026-02-21T09:42:19.6630160Z 2026-02-21T09:42:19.6630277Z { 2026-02-21T09:42:19.6630396Z .reg .pred complete; 2026-02-21T09:42:19.6630578Z waitLoop: 2026-02-21T09:42:19.6630771Z mbarrier.try_wait.parity.shared.b64 complete, [%r212], %r443; 2026-02-21T09:42:19.6631017Z @!complete bra.uni waitLoop; 2026-02-21T09:42:19.6631169Z } 2026-02-21T09:42:19.6631243Z 2026-02-21T09:42:19.6631297Z // end inline asm 2026-02-21T09:42:19.6631436Z shl.b32 %r219, %r445, 3; 2026-02-21T09:42:19.6631596Z add.s32 %r220, %r34, %r219; 2026-02-21T09:42:19.6631756Z add.s32 %r266, %r220, 
40960; 2026-02-21T09:42:19.6632016Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6632319Z @%p52 bra $L__BB0_6; 2026-02-21T09:42:19.6632512Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:19.6632841Z .loc 1 56 44 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:56:44 2026-02-21T09:42:19.6633127Z shl.b32 %r225, %r444, 12; 2026-02-21T09:42:19.6633287Z add.s32 %r227, %r34, %r225; 2026-02-21T09:42:19.6633450Z add.s32 %r228, %r227, 32768; 2026-02-21T09:42:19.6633715Z .loc 1 55 85 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:55:85 2026-02-21T09:42:19.6634032Z shl.b32 %r229, %r444, 13; 2026-02-21T09:42:19.6634183Z add.s32 %r230, %r34, %r229; 2026-02-21T09:42:19.6634345Z add.s32 %r231, %r230, 16384; 2026-02-21T09:42:19.6634609Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6634933Z elect.sync %r232|%p65, -1; 2026-02-21T09:42:19.6635092Z bfe.u32 %r233, %r231, 4, 14; 2026-02-21T09:42:19.6635252Z cvt.u64.u32 %rd82, %r233; 2026-02-21T09:42:19.6635428Z or.b64 %rd77, %rd82, -9223371899382267904; 2026-02-21T09:42:19.6635611Z bfe.u32 %r234, %r228, 4, 14; 2026-02-21T09:42:19.6635772Z cvt.u64.u32 %rd83, %r234; 2026-02-21T09:42:19.6635937Z or.b64 %rd78, %rd83, -9223371899399045120; 2026-02-21T09:42:19.6636123Z mov.b32 %r222, 135266320; 2026-02-21T09:42:19.6636275Z mov.pred %p64, -1; 2026-02-21T09:42:19.6636459Z // begin inline asm 2026-02-21T09:42:19.6636698Z @%p65 tcgen05.mma.cta_group::1.kind::f16 [ %r440 + 0 ], %rd77, %rd78, %r222, %p64; 2026-02-21T09:42:19.6636973Z // end inline asm 2026-02-21T09:42:19.6637125Z add.s32 %r235, %r230, 16416; 2026-02-21T09:42:19.6637286Z bfe.u32 %r236, %r235, 4, 14; 2026-02-21T09:42:19.6637451Z cvt.u64.u32 %rd84, %r236; 2026-02-21T09:42:19.6637620Z or.b64 %rd79, %rd84, -9223371899382267904; 2026-02-21T09:42:19.6637812Z add.s32 %r237, %r227, 32800; 2026-02-21T09:42:19.6637972Z bfe.u32 %r238, %r237, 4, 14; 
2026-02-21T09:42:19.6638150Z cvt.u64.u32 %rd85, %r238; 2026-02-21T09:42:19.6638335Z or.b64 %rd80, %rd85, -9223371899399045120; 2026-02-21T09:42:19.6638514Z // begin inline asm 2026-02-21T09:42:19.6638737Z @%p65 tcgen05.mma.cta_group::1.kind::f16 [ %r440 + 0 ], %rd79, %rd80, %r222, %p64; 2026-02-21T09:42:19.6638979Z // end inline asm 2026-02-21T09:42:19.6639118Z cvt.u64.u32 %rd81, %r266; 2026-02-21T09:42:19.6639261Z // begin inline asm 2026-02-21T09:42:19.6639466Z @%p65 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd81]; 2026-02-21T09:42:19.6639686Z // end inline asm 2026-02-21T09:42:19.6639822Z bra.uni $L__BB0_6; 2026-02-21T09:42:19.6639991Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:42:19.6640297Z .loc 1 0 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:0:52 2026-02-21T09:42:19.6640581Z mov.b32 %r267, 1; 2026-02-21T09:42:19.6640815Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6641089Z // begin inline asm 2026-02-21T09:42:19.6641218Z 2026-02-21T09:42:19.6641331Z { 2026-02-21T09:42:19.6641445Z .reg .pred complete; 2026-02-21T09:42:19.6641592Z waitLoop: 2026-02-21T09:42:19.6641767Z mbarrier.try_wait.parity.shared.b64 complete, [%r266], %r267; 2026-02-21T09:42:19.6641997Z @!complete bra.uni waitLoop; 2026-02-21T09:42:19.6642151Z } 2026-02-21T09:42:19.6642238Z 2026-02-21T09:42:19.6642295Z // end inline asm 2026-02-21T09:42:19.6642535Z .loc 1 50 57 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:50:57 2026-02-21T09:42:19.6642813Z cp.async.wait_group 0; 2026-02-21T09:42:19.6642964Z bar.sync 0; 2026-02-21T09:42:19.6643087Z // begin inline asm 2026-02-21T09:42:19.6643253Z @%p77 mbarrier.inval.shared::cta.b64 [%r129]; 2026-02-21T09:42:19.6643431Z // end inline asm 2026-02-21T09:42:19.6643564Z bar.sync 0; 2026-02-21T09:42:19.6643691Z // begin inline asm 2026-02-21T09:42:19.6643844Z @%p77 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T09:42:19.6644028Z // end inline asm 
2026-02-21T09:42:19.6644162Z add.s32 %r270, %r34, 40960; 2026-02-21T09:42:19.6644314Z // begin inline asm 2026-02-21T09:42:19.6644464Z @%p77 mbarrier.inval.shared::cta.b64 [%r270]; 2026-02-21T09:42:19.6644644Z // end inline asm 2026-02-21T09:42:19.6644822Z bar.sync 0; 2026-02-21T09:42:19.6644950Z // begin inline asm 2026-02-21T09:42:19.6645103Z @%p77 mbarrier.inval.shared::cta.b64 [%r128]; 2026-02-21T09:42:19.6645285Z // end inline asm 2026-02-21T09:42:19.6645559Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6645835Z // begin inline asm 2026-02-21T09:42:19.6646194Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287}, [%r339 + 0]; 2026-02-21T09:42:19.6646563Z // end inline asm 2026-02-21T09:42:19.6646699Z // begin inline asm 2026-02-21T09:42:19.6647033Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297, %r298, %r299, %r300, %r301, %r302, %r303, %r304}, [%r339 + 16]; 2026-02-21T09:42:19.6647415Z // end inline asm 2026-02-21T09:42:19.6647551Z // begin inline asm 2026-02-21T09:42:19.6647916Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321}, [%r339 + 32]; 2026-02-21T09:42:19.6648291Z // end inline asm 2026-02-21T09:42:19.6648423Z // begin inline asm 2026-02-21T09:42:19.6648774Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338}, [%r339 + 48]; 2026-02-21T09:42:19.6649145Z // end inline asm 2026-02-21T09:42:19.6649274Z // begin inline asm 2026-02-21T09:42:19.6649424Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:42:19.6649581Z // end inline asm 2026-02-21T09:42:19.6649718Z cvt.u64.u32 %rd92, %r272; 2026-02-21T09:42:19.6649866Z cvt.u64.u32 %rd93, %r273; 
2026-02-21T09:42:19.6650050Z shl.b64 %rd94, %rd93, 32; 2026-02-21T09:42:19.6650195Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T09:42:19.6650460Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6650743Z mov.b64 {%r343, %r344}, %rd95; 2026-02-21T09:42:19.6650909Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T09:42:19.6651186Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6651459Z cvt.u64.u32 %rd96, %r274; 2026-02-21T09:42:19.6651608Z cvt.u64.u32 %rd97, %r275; 2026-02-21T09:42:19.6651750Z shl.b64 %rd98, %rd97, 32; 2026-02-21T09:42:19.6651901Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T09:42:19.6652148Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6652430Z mov.b64 {%r346, %r347}, %rd99; 2026-02-21T09:42:19.6652597Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T09:42:19.6652868Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6653146Z cvt.u64.u32 %rd100, %r276; 2026-02-21T09:42:19.6653295Z cvt.u64.u32 %rd101, %r277; 2026-02-21T09:42:19.6653448Z shl.b64 %rd102, %rd101, 32; 2026-02-21T09:42:19.6653597Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T09:42:19.6653885Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6654177Z mov.b64 {%r349, %r350}, %rd103; 2026-02-21T09:42:19.6654340Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T09:42:19.6654612Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6654913Z cvt.u64.u32 %rd104, %r278; 2026-02-21T09:42:19.6655070Z cvt.u64.u32 %rd105, %r279; 2026-02-21T09:42:19.6655217Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:42:19.6655374Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T09:42:19.6655637Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6655928Z mov.b64 {%r352, %r353}, %rd107; 
2026-02-21T09:42:19.6656098Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T09:42:19.6656370Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6656655Z cvt.u64.u32 %rd108, %r280; 2026-02-21T09:42:19.6656803Z cvt.u64.u32 %rd109, %r281; 2026-02-21T09:42:19.6657006Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:42:19.6657161Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:42:19.6657424Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6657716Z mov.b64 {%r355, %r356}, %rd111; 2026-02-21T09:42:19.6657884Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T09:42:19.6658157Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6658431Z cvt.u64.u32 %rd112, %r282; 2026-02-21T09:42:19.6658591Z cvt.u64.u32 %rd113, %r283; 2026-02-21T09:42:19.6658744Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:42:19.6658910Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:42:19.6659166Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6659458Z mov.b64 {%r358, %r359}, %rd115; 2026-02-21T09:42:19.6659667Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T09:42:19.6659944Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6660241Z cvt.u64.u32 %rd116, %r284; 2026-02-21T09:42:19.6660387Z cvt.u64.u32 %rd117, %r285; 2026-02-21T09:42:19.6660538Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:42:19.6660686Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:42:19.6660952Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6661242Z mov.b64 {%r361, %r362}, %rd119; 2026-02-21T09:42:19.6661432Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T09:42:19.6661709Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6661996Z cvt.u64.u32 %rd120, %r286; 2026-02-21T09:42:19.6662153Z cvt.u64.u32 %rd121, %r287; 
2026-02-21T09:42:19.6662304Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:42:19.6662467Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:42:19.6662725Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6663015Z mov.b64 {%r364, %r365}, %rd123; 2026-02-21T09:42:19.6663186Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T09:42:19.6663457Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6663744Z cvt.u64.u32 %rd124, %r289; 2026-02-21T09:42:19.6663894Z cvt.u64.u32 %rd125, %r290; 2026-02-21T09:42:19.6664052Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:42:19.6664207Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:42:19.6664472Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6664820Z mov.b64 {%r367, %r368}, %rd127; 2026-02-21T09:42:19.6664980Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T09:42:19.6665288Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6665560Z cvt.u64.u32 %rd128, %r291; 2026-02-21T09:42:19.6665716Z cvt.u64.u32 %rd129, %r292; 2026-02-21T09:42:19.6665860Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:42:19.6666017Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:42:19.6666265Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6666541Z mov.b64 {%r370, %r371}, %rd131; 2026-02-21T09:42:19.6666707Z cvt.rn.f16x2.f32 %r372, %r371, %r370; 2026-02-21T09:42:19.6666965Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6667242Z cvt.u64.u32 %rd132, %r293; 2026-02-21T09:42:19.6667387Z cvt.u64.u32 %rd133, %r294; 2026-02-21T09:42:19.6667538Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:42:19.6667686Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:42:19.6667947Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6668226Z mov.b64 {%r373, %r374}, %rd135; 
2026-02-21T09:42:19.6668408Z cvt.rn.f16x2.f32 %r375, %r374, %r373; 2026-02-21T09:42:19.6668680Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6668950Z cvt.u64.u32 %rd136, %r295; 2026-02-21T09:42:19.6669105Z cvt.u64.u32 %rd137, %r296; 2026-02-21T09:42:19.6669250Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:42:19.6669407Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:42:19.6669658Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6669949Z mov.b64 {%r376, %r377}, %rd139; 2026-02-21T09:42:19.6670118Z cvt.rn.f16x2.f32 %r378, %r377, %r376; 2026-02-21T09:42:19.6670385Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6670661Z cvt.u64.u32 %rd140, %r297; 2026-02-21T09:42:19.6670836Z cvt.u64.u32 %rd141, %r298; 2026-02-21T09:42:19.6670990Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:42:19.6671140Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:42:19.6671399Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6671675Z mov.b64 {%r379, %r380}, %rd143; 2026-02-21T09:42:19.6671841Z cvt.rn.f16x2.f32 %r381, %r380, %r379; 2026-02-21T09:42:19.6672123Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6672406Z cvt.u64.u32 %rd144, %r299; 2026-02-21T09:42:19.6672567Z cvt.u64.u32 %rd145, %r300; 2026-02-21T09:42:19.6672747Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:42:19.6672912Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:42:19.6673175Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6673470Z mov.b64 {%r382, %r383}, %rd147; 2026-02-21T09:42:19.6673646Z cvt.rn.f16x2.f32 %r384, %r383, %r382; 2026-02-21T09:42:19.6673924Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6674218Z cvt.u64.u32 %rd148, %r301; 2026-02-21T09:42:19.6674370Z cvt.u64.u32 %rd149, %r302; 
2026-02-21T09:42:19.6674529Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:42:19.6674720Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:42:19.6674995Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6675285Z mov.b64 {%r385, %r386}, %rd151; 2026-02-21T09:42:19.6675452Z cvt.rn.f16x2.f32 %r387, %r386, %r385; 2026-02-21T09:42:19.6675732Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6676015Z cvt.u64.u32 %rd152, %r303; 2026-02-21T09:42:19.6676174Z cvt.u64.u32 %rd153, %r304; 2026-02-21T09:42:19.6676326Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:42:19.6676515Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:42:19.6676785Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6677073Z mov.b64 {%r388, %r389}, %rd155; 2026-02-21T09:42:19.6677245Z cvt.rn.f16x2.f32 %r390, %r389, %r388; 2026-02-21T09:42:19.6677524Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6677815Z cvt.u64.u32 %rd156, %r306; 2026-02-21T09:42:19.6677964Z cvt.u64.u32 %rd157, %r307; 2026-02-21T09:42:19.6678120Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:42:19.6678273Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:42:19.6678544Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6678842Z mov.b64 {%r391, %r392}, %rd159; 2026-02-21T09:42:19.6679007Z cvt.rn.f16x2.f32 %r393, %r392, %r391; 2026-02-21T09:42:19.6679290Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6679571Z cvt.u64.u32 %rd160, %r308; 2026-02-21T09:42:19.6679731Z cvt.u64.u32 %rd161, %r309; 2026-02-21T09:42:19.6679917Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:42:19.6680081Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:42:19.6680341Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6680622Z mov.b64 {%r394, %r395}, %rd163; 
2026-02-21T09:42:19.6680791Z cvt.rn.f16x2.f32 %r396, %r395, %r394; 2026-02-21T09:42:19.6681058Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6681337Z cvt.u64.u32 %rd164, %r310; 2026-02-21T09:42:19.6681488Z cvt.u64.u32 %rd165, %r311; 2026-02-21T09:42:19.6681640Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:42:19.6681789Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:42:19.6682048Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6682349Z mov.b64 {%r397, %r398}, %rd167; 2026-02-21T09:42:19.6682509Z cvt.rn.f16x2.f32 %r399, %r398, %r397; 2026-02-21T09:42:19.6682778Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6683046Z cvt.u64.u32 %rd168, %r312; 2026-02-21T09:42:19.6683200Z cvt.u64.u32 %rd169, %r313; 2026-02-21T09:42:19.6683347Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:42:19.6683504Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:42:19.6683752Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6684031Z mov.b64 {%r400, %r401}, %rd171; 2026-02-21T09:42:19.6684222Z cvt.rn.f16x2.f32 %r402, %r401, %r400; 2026-02-21T09:42:19.6684496Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6684802Z cvt.u64.u32 %rd172, %r314; 2026-02-21T09:42:19.6684948Z cvt.u64.u32 %rd173, %r315; 2026-02-21T09:42:19.6685105Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:42:19.6685254Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:42:19.6685515Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6685795Z mov.b64 {%r403, %r404}, %rd175; 2026-02-21T09:42:19.6685953Z cvt.rn.f16x2.f32 %r405, %r404, %r403; 2026-02-21T09:42:19.6686224Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6686494Z cvt.u64.u32 %rd176, %r316; 2026-02-21T09:42:19.6686646Z cvt.u64.u32 %rd177, %r317; 
2026-02-21T09:42:19.6686791Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:42:19.6686947Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:42:19.6687202Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6687484Z mov.b64 {%r406, %r407}, %rd179; 2026-02-21T09:42:19.6687648Z cvt.rn.f16x2.f32 %r408, %r407, %r406; 2026-02-21T09:42:19.6687940Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6688220Z cvt.u64.u32 %rd180, %r318; 2026-02-21T09:42:19.6688365Z cvt.u64.u32 %rd181, %r319; 2026-02-21T09:42:19.6688518Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:42:19.6688666Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:42:19.6688923Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6689208Z mov.b64 {%r409, %r410}, %rd183; 2026-02-21T09:42:19.6689364Z cvt.rn.f16x2.f32 %r411, %r410, %r409; 2026-02-21T09:42:19.6689630Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6689900Z cvt.u64.u32 %rd184, %r320; 2026-02-21T09:42:19.6690054Z cvt.u64.u32 %rd185, %r321; 2026-02-21T09:42:19.6690199Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:42:19.6690354Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:42:19.6690607Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6690918Z mov.b64 {%r412, %r413}, %rd187; 2026-02-21T09:42:19.6691085Z cvt.rn.f16x2.f32 %r414, %r413, %r412; 2026-02-21T09:42:19.6691350Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6691631Z cvt.u64.u32 %rd188, %r323; 2026-02-21T09:42:19.6691778Z cvt.u64.u32 %rd189, %r324; 2026-02-21T09:42:19.6691941Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:42:19.6692092Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:42:19.6692352Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6692630Z mov.b64 {%r415, %r416}, %rd191; 
2026-02-21T09:42:19.6692790Z cvt.rn.f16x2.f32 %r417, %r416, %r415; 2026-02-21T09:42:19.6693060Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6693331Z cvt.u64.u32 %rd192, %r325; 2026-02-21T09:42:19.6693509Z cvt.u64.u32 %rd193, %r326; 2026-02-21T09:42:19.6693657Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:42:19.6693814Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:42:19.6694066Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6694344Z mov.b64 {%r418, %r419}, %rd195; 2026-02-21T09:42:19.6694509Z cvt.rn.f16x2.f32 %r420, %r419, %r418; 2026-02-21T09:42:19.6694802Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6695090Z cvt.u64.u32 %rd196, %r327; 2026-02-21T09:42:19.6695265Z cvt.u64.u32 %rd197, %r328; 2026-02-21T09:42:19.6695417Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:42:19.6695566Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:42:19.6695824Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6696101Z mov.b64 {%r421, %r422}, %rd199; 2026-02-21T09:42:19.6696261Z cvt.rn.f16x2.f32 %r423, %r422, %r421; 2026-02-21T09:42:19.6696531Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6696799Z cvt.u64.u32 %rd200, %r329; 2026-02-21T09:42:19.6696949Z cvt.u64.u32 %rd201, %r330; 2026-02-21T09:42:19.6697093Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:42:19.6697249Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:42:19.6697498Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6697775Z mov.b64 {%r424, %r425}, %rd203; 2026-02-21T09:42:19.6697943Z cvt.rn.f16x2.f32 %r426, %r425, %r424; 2026-02-21T09:42:19.6698204Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6698478Z cvt.u64.u32 %rd204, %r331; 2026-02-21T09:42:19.6698623Z cvt.u64.u32 %rd205, %r332; 
2026-02-21T09:42:19.6698777Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:42:19.6698969Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:42:19.6699231Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6699509Z mov.b64 {%r427, %r428}, %rd207; 2026-02-21T09:42:19.6699667Z cvt.rn.f16x2.f32 %r429, %r428, %r427; 2026-02-21T09:42:19.6699935Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6700204Z cvt.u64.u32 %rd208, %r333; 2026-02-21T09:42:19.6700357Z cvt.u64.u32 %rd209, %r334; 2026-02-21T09:42:19.6700504Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:42:19.6700659Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:42:19.6700913Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6701195Z mov.b64 {%r430, %r431}, %rd211; 2026-02-21T09:42:19.6701362Z cvt.rn.f16x2.f32 %r432, %r431, %r430; 2026-02-21T09:42:19.6701629Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6701909Z cvt.u64.u32 %rd212, %r335; 2026-02-21T09:42:19.6702083Z cvt.u64.u32 %rd213, %r336; 2026-02-21T09:42:19.6702242Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:42:19.6702392Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:42:19.6702652Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6702942Z mov.b64 {%r433, %r434}, %rd215; 2026-02-21T09:42:19.6703101Z cvt.rn.f16x2.f32 %r435, %r434, %r433; 2026-02-21T09:42:19.6703367Z .loc 1 57 52 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:57:52 2026-02-21T09:42:19.6703633Z cvt.u64.u32 %rd216, %r337; 2026-02-21T09:42:19.6703784Z cvt.u64.u32 %rd217, %r338; 2026-02-21T09:42:19.6703926Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:42:19.6704081Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:42:19.6704354Z .loc 1 59 27 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:59:27 2026-02-21T09:42:19.6704631Z mov.b64 {%r436, %r437}, %rd219; 
2026-02-21T09:42:19.6704841Z cvt.rn.f16x2.f32 %r438, %r437, %r436; 2026-02-21T09:42:19.6705108Z .loc 1 60 45 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:60:45 2026-02-21T09:42:19.6705408Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:42:19.6705575Z bar.sync 0; 2026-02-21T09:42:19.6705746Z st.shared.v4.b32 [%r11], {%r345, %r348, %r351, %r354}; 2026-02-21T09:42:19.6705978Z st.shared.v4.b32 [%r12], {%r357, %r360, %r363, %r366}; 2026-02-21T09:42:19.6706206Z st.shared.v4.b32 [%r13], {%r369, %r372, %r375, %r378}; 2026-02-21T09:42:19.6706480Z st.shared.v4.b32 [%r14], {%r381, %r384, %r387, %r390}; 2026-02-21T09:42:19.6706700Z st.shared.v4.b32 [%r15], {%r393, %r396, %r399, %r402}; 2026-02-21T09:42:19.6706926Z st.shared.v4.b32 [%r16], {%r405, %r408, %r411, %r414}; 2026-02-21T09:42:19.6707141Z st.shared.v4.b32 [%r17], {%r417, %r420, %r423, %r426}; 2026-02-21T09:42:19.6707364Z st.shared.v4.b32 [%r18], {%r429, %r432, %r435, %r438}; 2026-02-21T09:42:19.6707556Z // begin inline asm 2026-02-21T09:42:19.6707727Z fence.proxy.async.shared::cta; 2026-02-21T09:42:19.6707896Z // end inline asm 2026-02-21T09:42:19.6708041Z bar.sync 0; 2026-02-21T09:42:19.6708191Z elect.sync %r439|%p83, -1; 2026-02-21T09:42:19.6708353Z and.pred %p81, %p1, %p83; 2026-02-21T09:42:19.6708517Z // begin inline asm 2026-02-21T09:42:19.6708778Z @%p81 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd91, {%r340, %r341}], [%r34]; 2026-02-21T09:42:19.6709071Z // end inline asm 2026-02-21T09:42:19.6709217Z cp.async.bulk.commit_group; 2026-02-21T09:42:19.6709411Z $L__BB0_8: // %._crit_edge 2026-02-21T09:42:19.6709717Z .loc 1 32 107 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:32:107 2026-02-21T09:42:19.6710033Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:42:19.6710209Z bar.sync 0; 2026-02-21T09:42:19.6710481Z .loc 1 32 4 // c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py:32:4 2026-02-21T09:42:19.6710768Z bar.sync 0; 2026-02-21T09:42:19.6710892Z // begin inline asm 
2026-02-21T09:42:19.6711091Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r440, 64; 2026-02-21T09:42:19.6711304Z // end inline asm 2026-02-21T09:42:19.6711435Z ret; 2026-02-21T09:42:19.6711553Z $L__tmp0: 2026-02-21T09:42:19.6711681Z $L__func_end0: 2026-02-21T09:42:19.6711838Z // -- End function 2026-02-21T09:42:19.6712014Z } 2026-02-21T09:42:19.6712280Z .file 1 "/tmp/torchinductor_root/47/c47s3bmr3bb33soxmhpjwtnjp4da4c7qfctc2uneqfpfomq7esoi.py" 2026-02-21T09:42:19.6712591Z .section .debug_abbrev 2026-02-21T09:42:19.6712739Z { 2026-02-21T09:42:19.6712887Z .b8 1 // Abbreviation Code 2026-02-21T09:42:19.6713113Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:42:19.6713325Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:42:19.6713541Z .b8 37 // DW_AT_producer 2026-02-21T09:42:19.6713776Z .b8 8 // DW_FORM_string 2026-02-21T09:42:19.6713977Z .b8 19 // DW_AT_language 2026-02-21T09:42:19.6714188Z .b8 5 // DW_FORM_data2 2026-02-21T09:42:19.6714390Z .b8 3 // DW_AT_name 2026-02-21T09:42:19.6714596Z .b8 8 // DW_FORM_string 2026-02-21T09:42:19.6714825Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:42:19.6715033Z .b8 6 // DW_FORM_data4 2026-02-21T09:42:19.6715236Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:42:19.6715432Z .b8 8 // DW_FORM_string 2026-02-21T09:42:19.6715634Z .b8 0 // EOM(1) 2026-02-21T09:42:19.6715822Z .b8 0 // EOM(2) 2026-02-21T09:42:19.6716037Z .b8 0 // EOM(3) 2026-02-21T09:42:19.6716202Z } 2026-02-21T09:42:19.6716328Z .section .debug_info 2026-02-21T09:42:19.6716461Z { 2026-02-21T09:42:19.6716609Z .b32 104 // Length of Unit 2026-02-21T09:42:19.6716829Z .b8 2 // DWARF version number 2026-02-21T09:42:19.6717013Z .b8 0 2026-02-21T09:42:19.6717195Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:42:19.6717468Z .b8 8 // Address Size (in bytes) 2026-02-21T09:42:19.6717715Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:42:19.6717986Z .b8 116 // DW_AT_producer 2026-02-21T09:42:19.6718178Z .b8 114 2026-02-21T09:42:19.6718299Z .b8 105 2026-02-21T09:42:19.6718421Z .b8 116 2026-02-21T09:42:19.6718542Z .b8 111 2026-02-21T09:42:19.6718654Z .b8 110 2026-02-21T09:42:19.6718776Z .b8 0 2026-02-21T09:42:19.6718916Z .b8 2 // DW_AT_language 2026-02-21T09:42:19.6719105Z .b8 0 2026-02-21T09:42:19.6719245Z .b8 99 // DW_AT_name 2026-02-21T09:42:19.6719429Z .b8 52 2026-02-21T09:42:19.6719548Z .b8 55 2026-02-21T09:42:19.6719671Z .b8 115 2026-02-21T09:42:19.6719784Z .b8 51 2026-02-21T09:42:19.6719907Z .b8 98 2026-02-21T09:42:19.6720020Z .b8 109 2026-02-21T09:42:19.6720141Z .b8 114 2026-02-21T09:42:19.6720262Z .b8 51 2026-02-21T09:42:19.6720374Z .b8 98 2026-02-21T09:42:19.6720495Z .b8 98 2026-02-21T09:42:19.6720610Z .b8 51 2026-02-21T09:42:19.6720730Z .b8 51 2026-02-21T09:42:19.6720846Z .b8 115 2026-02-21T09:42:19.6720967Z .b8 111 2026-02-21T09:42:19.6721081Z .b8 120 2026-02-21T09:42:19.6721206Z .b8 109 2026-02-21T09:42:19.6721322Z .b8 104 2026-02-21T09:42:19.6721446Z .b8 112 2026-02-21T09:42:19.6721558Z .b8 106 2026-02-21T09:42:19.6721680Z .b8 119 2026-02-21T09:42:19.6721791Z .b8 116 2026-02-21T09:42:19.6721942Z .b8 110 2026-02-21T09:42:19.6722063Z .b8 106 2026-02-21T09:42:19.6722174Z .b8 112 2026-02-21T09:42:19.6722294Z .b8 52 2026-02-21T09:42:19.6722409Z .b8 100 2026-02-21T09:42:19.6722529Z .b8 97 2026-02-21T09:42:19.6722641Z .b8 52 2026-02-21T09:42:19.6722761Z .b8 99 2026-02-21T09:42:19.6722873Z .b8 55 2026-02-21T09:42:19.6722992Z .b8 113 2026-02-21T09:42:19.6723104Z .b8 102 2026-02-21T09:42:19.6723225Z .b8 99 2026-02-21T09:42:19.6723340Z .b8 116 2026-02-21T09:42:19.6723462Z .b8 99 2026-02-21T09:42:19.6723575Z .b8 50 2026-02-21T09:42:19.6723697Z .b8 117 2026-02-21T09:42:19.6723809Z .b8 110 2026-02-21T09:42:19.6723929Z .b8 101 
2026-02-21T09:42:19.6724047Z .b8 113 2026-02-21T09:42:19.6724160Z .b8 102 2026-02-21T09:42:19.6724279Z .b8 112 2026-02-21T09:42:19.6724391Z .b8 102 2026-02-21T09:42:19.6724510Z .b8 111 2026-02-21T09:42:19.6724621Z .b8 109 2026-02-21T09:42:19.6724766Z .b8 113 2026-02-21T09:42:19.6724878Z .b8 55 2026-02-21T09:42:19.6724997Z .b8 101 2026-02-21T09:42:19.6725108Z .b8 115 2026-02-21T09:42:19.6725230Z .b8 111 2026-02-21T09:42:19.6725343Z .b8 105 2026-02-21T09:42:19.6725461Z .b8 46 2026-02-21T09:42:19.6725573Z .b8 112 2026-02-21T09:42:19.6725731Z .b8 121 2026-02-21T09:42:19.6725851Z .b8 0 2026-02-21T09:42:19.6726013Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:42:19.6726245Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:42:19.6726428Z .b8 116 2026-02-21T09:42:19.6726549Z .b8 109 2026-02-21T09:42:19.6726661Z .b8 112 2026-02-21T09:42:19.6726784Z .b8 47 2026-02-21T09:42:19.6726898Z .b8 116 2026-02-21T09:42:19.6727023Z .b8 111 2026-02-21T09:42:19.6727145Z .b8 114 2026-02-21T09:42:19.6727264Z .b8 99 2026-02-21T09:42:19.6727376Z .b8 104 2026-02-21T09:42:19.6727503Z .b8 105 2026-02-21T09:42:19.6727619Z .b8 110 2026-02-21T09:42:19.6727726Z .b8 100 2026-02-21T09:42:19.6727840Z .b8 117 2026-02-21T09:42:19.6727947Z .b8 99 2026-02-21T09:42:19.6728063Z .b8 116 2026-02-21T09:42:19.6728170Z .b8 111 2026-02-21T09:42:19.6728284Z .b8 114 2026-02-21T09:42:19.6728390Z .b8 95 2026-02-21T09:42:19.6728537Z .b8 114 2026-02-21T09:42:19.6728648Z .b8 111 2026-02-21T09:42:19.6728763Z .b8 111 2026-02-21T09:42:19.6728872Z .b8 116 2026-02-21T09:42:19.6728989Z .b8 47 2026-02-21T09:42:19.6729098Z .b8 52 2026-02-21T09:42:19.6729215Z .b8 55 2026-02-21T09:42:19.6729322Z .b8 0 2026-02-21T09:42:19.6729437Z } 2026-02-21T09:42:19.6729568Z .section .debug_macinfo { } 2026-02-21T09:42:19.6729573Z 2026-02-21T09:42:19.6729650Z ================================================================ 2026-02-21T09:42:19.6729753Z please share the reproducer above with Triton project. 
2026-02-21T09:42:21.2901187Z
2026-02-21T09:42:21.2902060Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 86/86 19.4 configs/s
2026-02-21T09:42:23.8011162Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 400.3
2026-02-21T09:42:23.8012225Z configs/s
2026-02-21T09:42:23.9443043Z [90s] Generation 4 complete:
2026-02-21T09:42:23.9447495Z error=14
2026-02-21T09:42:23.9449487Z ok=76
2026-02-21T09:42:23.9449697Z min=0.0451
2026-02-21T09:42:23.9449853Z mid=0.0839
2026-02-21T09:42:23.9450005Z max=6.7266
2026-02-21T09:42:23.9450175Z best={'block_sizes': [256, 256, 32],
2026-02-21T09:42:23.9450422Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'],
2026-02-21T09:42:23.9450663Z 'l2_groupings': [64],
2026-02-21T09:42:23.9450840Z 'load_eviction_policies': ['last', ''],
2026-02-21T09:42:23.9451039Z 'loop_orders': [[1, 0]],
2026-02-21T09:42:23.9451203Z 'num_stages': 6,
2026-02-21T09:42:23.9451364Z 'num_warps': 4,
2026-02-21T09:42:23.9451500Z 'pid_type': 'flat',
2026-02-21T09:42:23.9451695Z 'range_flattens': [None, True],
2026-02-21T09:42:23.9451869Z 'range_multi_buffers': [None, True],
2026-02-21T09:42:23.9452059Z 'range_num_stages': [0, 0],
2026-02-21T09:42:23.9452223Z 'range_unroll_factors': [0, 0],
2026-02-21T09:42:23.9452421Z 'range_warp_specializes': [None, None]}
2026-02-21T09:42:23.9473692Z [90s] Fitting surrogate: 461 points, 461 targets
2026-02-21T09:42:25.0572999Z [91s] Generation 5 starting: 77 neighbors, 5 active search path(s)
2026-02-21T09:42:32.8050143Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77/77 1.8 configs/s
2026-02-21T09:42:36.7338068Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 77/77 19.9 configs/s
2026-02-21T09:42:39.0714354Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 430.1
2026-02-21T09:42:39.0718594Z configs/s
2026-02-21T09:42:39.2085822Z [105s] Generation 5 complete:
2026-02-21T09:42:39.2089667Z error=10
2026-02-21T09:42:39.2091593Z ok=72
2026-02-21T09:42:39.2091808Z min=0.0409
2026-02-21T09:42:39.2091953Z mid=0.0636
2026-02-21T09:42:39.2092086Z max=4.8030
2026-02-21T09:42:39.2096667Z best={'block_sizes': [128, 128, 64],
2026-02-21T09:42:39.2100675Z 'indexing': ['pointer', 'pointer', 'pointer'],
2026-02-21T09:42:39.2104046Z 'l2_groupings': [64],
2026-02-21T09:42:39.2108806Z 'load_eviction_policies': ['last', ''],
2026-02-21T09:42:39.2109122Z 'loop_orders': [[1, 0]],
2026-02-21T09:42:39.2109596Z 'num_stages': 3,
2026-02-21T09:42:39.2114651Z 'num_warps': 4,
2026-02-21T09:42:39.2119180Z 'pid_type': 'flat',
2026-02-21T09:42:39.2123165Z 'range_flattens': [None, False],
2026-02-21T09:42:39.2127792Z 'range_multi_buffers': [None, True],
2026-02-21T09:42:39.2131786Z 'range_num_stages': [0, 0],
2026-02-21T09:42:39.2135004Z 'range_unroll_factors': [0, 0],
2026-02-21T09:42:39.2138852Z 'range_warp_specializes': [None, None]}
2026-02-21T09:42:39.2140265Z [105s] Fitting surrogate: 543 points, 543 targets
2026-02-21T09:42:40.5138524Z [106s] Generation 6 starting: 80 neighbors, 5 active search path(s)
2026-02-21T09:42:47.6145677Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 17.8 configs/s
2026-02-21T09:42:48.8263536Z
2026-02-21T09:42:48.8268206Z
2026-02-21T09:42:48.8273548Z ================================================================
2026-02-21T09:42:48.8278842Z Internal Triton PTX codegen error
2026-02-21T09:42:48.8279852Z `ptxas` stderr:
2026-02-21T09:42:48.8280431Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 147 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T09:42:48.8281072Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:42:48.8281251Z 2026-02-21T09:42:48.8281799Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy5axpv50.ptx -o /tmp/tmpy5axpv50.ptx.o 2026-02-21T09:42:48.8282347Z 2026-02-21T09:42:48.8282450Z 2026-02-21T09:42:48.8282525Z // 2026-02-21T09:42:48.8282697Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:42:48.8282877Z // 2026-02-21T09:42:48.8282956Z 2026-02-21T09:42:48.8283016Z .version 8.7 2026-02-21T09:42:48.8283166Z .target sm_100a 2026-02-21T09:42:48.8283306Z .address_size 64 2026-02-21T09:42:48.8283394Z 2026-02-21T09:42:48.8283535Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:42:48.8283804Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:42:48.8284044Z // @_helion_matmul 2026-02-21T09:42:48.8284263Z .visible .entry _helion_matmul( 2026-02-21T09:42:48.8284507Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:42:48.8284904Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:42:48.8285194Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:42:48.8285508Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:42:48.8285800Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:42:48.8286048Z ) 2026-02-21T09:42:48.8286190Z .reqntid 256 2026-02-21T09:42:48.8286353Z .maxnreg 32 2026-02-21T09:42:48.8286500Z { 2026-02-21T09:42:48.8286657Z .reg .pred %p<128>; 2026-02-21T09:42:48.8286833Z .reg .b32 %r<1732>; 2026-02-21T09:42:48.8287334Z .reg .b64 %rd<650>; 2026-02-21T09:42:48.8287640Z .loc 1 19 0 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:19:0 2026-02-21T09:42:48.8287953Z $L__func_begin0: 2026-02-21T09:42:48.8288214Z .loc 1 19 0 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:19:0 2026-02-21T09:42:48.8288456Z 
2026-02-21T09:42:48.8288511Z // %bb.0: 2026-02-21T09:42:48.8288681Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T09:42:48.8288885Z $L__tmp0: 2026-02-21T09:42:48.8289125Z .loc 1 19 0 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:19 2026-02-21T09:42:48.8289422Z mov.u32 %r1, %tid.x; 2026-02-21T09:42:48.8289609Z ld.param.b64 %rd9, [_helion_matmul_param_1]; 2026-02-21T09:42:48.8289841Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:42:48.8290023Z mov.b32 %r128, global_smem; 2026-02-21T09:42:48.8290209Z // begin inline asm 2026-02-21T09:42:48.8290482Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r128], 512; 2026-02-21T09:42:48.8290770Z // end inline asm 2026-02-21T09:42:48.8290958Z ld.param.b64 %rd51, [_helion_matmul_param_3]; 2026-02-21T09:42:48.8291263Z bar.sync 0; 2026-02-21T09:42:48.8291434Z ld.shared.b32 %r1692, [global_smem]; 2026-02-21T09:42:48.8291628Z bar.sync 0; 2026-02-21T09:42:48.8291792Z // begin inline asm 2026-02-21T09:42:48.8292008Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:42:48.8292252Z // end inline asm 2026-02-21T09:42:48.8292518Z .loc 1 21 68 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:21:68 2026-02-21T09:42:48.8292829Z mov.u32 %r1729, %ctaid.x; 2026-02-21T09:42:48.8293003Z mov.u32 %r484, %ctaid.y; 2026-02-21T09:42:48.8293161Z mov.u32 %r485, %ctaid.z; 2026-02-21T09:42:48.8293324Z mov.u32 %r486, %nctaid.x; 2026-02-21T09:42:48.8293481Z mov.u32 %r487, %nctaid.y; 2026-02-21T09:42:48.8293654Z mad.lo.s32 %r488, %r485, %r487, %r484; 2026-02-21T09:42:48.8293845Z mad.lo.s32 %r489, %r488, %r486, %r1729; 2026-02-21T09:42:48.8294083Z shl.b32 %r490, %r489, 7; 2026-02-21T09:42:48.8294242Z cvt.s64.s32 %rd52, %r490; 2026-02-21T09:42:48.8294412Z add.s64 %rd23, %rd51, %rd52; 2026-02-21T09:42:48.8294583Z shl.b32 %r491, %r1, 2; 2026-02-21T09:42:48.8294818Z add.s32 %r129, %r128, %r491; 2026-02-21T09:42:48.8294997Z mov.b32 %r1731, 0; 2026-02-21T09:42:48.8295152Z // begin inline asm 
2026-02-21T09:42:48.8295341Z @%p3 st.shared.b32 [ %r129 + 0 ], %r1731; 2026-02-21T09:42:48.8295550Z // end inline asm 2026-02-21T09:42:48.8295729Z bar.warp.sync -1; 2026-02-21T09:42:48.8295924Z setp.eq.b32 %p118, %r1, 0; 2026-02-21T09:42:48.8296121Z cvt.u64.u32 %rd8, %r128; 2026-02-21T09:42:48.8296347Z // begin inline asm 2026-02-21T09:42:48.8296635Z @%p118 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T09:42:48.8296971Z // end inline asm 2026-02-21T09:42:48.8297122Z // begin inline asm 2026-02-21T09:42:48.8297384Z @%p118 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T09:42:48.8297672Z // end inline asm 2026-02-21T09:42:48.8297829Z mov.b32 %r131, 32; 2026-02-21T09:42:48.8297991Z // begin inline asm 2026-02-21T09:42:48.8298256Z @%p118 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r131; 2026-02-21T09:42:48.8298568Z // end inline asm 2026-02-21T09:42:48.8298718Z mov.b32 %r132, 256; 2026-02-21T09:42:48.8298885Z // begin inline asm 2026-02-21T09:42:48.8299148Z @%p118 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r132; 2026-02-21T09:42:48.8299464Z // end inline asm 2026-02-21T09:42:48.8299614Z mov.b32 %r133, 1024; 2026-02-21T09:42:48.8299782Z // begin inline asm 2026-02-21T09:42:48.8300060Z @%p118 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r133; 2026-02-21T09:42:48.8300381Z // end inline asm 2026-02-21T09:42:48.8300540Z mov.b32 %r134, 12288; 2026-02-21T09:42:48.8300702Z // begin inline asm 2026-02-21T09:42:48.8301016Z @%p118 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r134; 2026-02-21T09:42:48.8301335Z // end inline asm 2026-02-21T09:42:48.8301494Z mov.b64 %rd16, 2048; 2026-02-21T09:42:48.8301651Z // begin inline asm 2026-02-21T09:42:48.8301942Z @%p118 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T09:42:48.8302278Z // end inline 
asm 2026-02-21T09:42:48.8302425Z mov.b32 %r135, 1; 2026-02-21T09:42:48.8302584Z // begin inline asm 2026-02-21T09:42:48.8302870Z @%p118 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r135; 2026-02-21T09:42:48.8303200Z // end inline asm 2026-02-21T09:42:48.8303351Z // begin inline asm 2026-02-21T09:42:48.8303648Z @%p118 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r135; 2026-02-21T09:42:48.8303981Z // end inline asm 2026-02-21T09:42:48.8304130Z // begin inline asm 2026-02-21T09:42:48.8304404Z @%p118 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x6; 2026-02-21T09:42:48.8304790Z // end inline asm 2026-02-21T09:42:48.8304946Z // begin inline asm 2026-02-21T09:42:48.8305269Z @%p118 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T09:42:48.8305595Z // end inline asm 2026-02-21T09:42:48.8305742Z // begin inline asm 2026-02-21T09:42:48.8306012Z @%p118 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x2; 2026-02-21T09:42:48.8306323Z // end inline asm 2026-02-21T09:42:48.8306471Z // begin inline asm 2026-02-21T09:42:48.8306740Z @%p118 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T09:42:48.8307033Z // end inline asm 2026-02-21T09:42:48.8307194Z // begin inline asm 2026-02-21T09:42:48.8307596Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T09:42:48.8308043Z // end inline asm 2026-02-21T09:42:48.8308201Z // begin inline asm 2026-02-21T09:42:48.8308472Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T09:42:48.8308771Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:42:48.8308986Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:42:48.8309189Z // end inline asm 2026-02-21T09:42:48.8309337Z bar.sync 0; 2026-02-21T09:42:48.8309502Z cvta.global.u64 %rd30, %rd23; 
2026-02-21T09:42:48.8309818Z .loc 1 42 45 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:42:45 2026-02-21T09:42:48.8310153Z shr.u32 %r492, %r1, 5; 2026-02-21T09:42:48.8310332Z shr.u32 %r493, %r1, 2; 2026-02-21T09:42:48.8310501Z bfe.u32 %r4, %r1, 2, 6; 2026-02-21T09:42:48.8310722Z or.b32 %r5, %r4, 64; 2026-02-21T09:42:48.8310882Z or.b32 %r6, %r4, 128; 2026-02-21T09:42:48.8311053Z or.b32 %r7, %r493, 192; 2026-02-21T09:42:48.8311342Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8311668Z max.u32 %r494, %r1729, 191; 2026-02-21T09:42:48.8311850Z shl.b32 %r495, %r494, 5; 2026-02-21T09:42:48.8312033Z add.s32 %r43, %r495, -6112; 2026-02-21T09:42:48.8312221Z sub.s32 %r44, 6144, %r495; 2026-02-21T09:42:48.8312512Z .loc 1 50 48 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:50:48 2026-02-21T09:42:48.8312813Z shl.b32 %r496, %r1, 3; 2026-02-21T09:42:48.8312967Z and.b32 %r45, %r496, 24; 2026-02-21T09:42:48.8313239Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8313544Z shfl.sync.idx.b32 %r46, %r492, 0, 31, -1; 2026-02-21T09:42:48.8313739Z shl.b32 %r497, %r46, 21; 2026-02-21T09:42:48.8313905Z and.b32 %r498, %r497, 6291456; 2026-02-21T09:42:48.8314075Z add.s32 %r499, %r498, %r1692; 2026-02-21T09:42:48.8314245Z shl.b32 %r500, %r46, 6; 2026-02-21T09:42:48.8314398Z and.b32 %r501, %r500, 256; 2026-02-21T09:42:48.8314573Z add.s32 %r137, %r499, %r501; 2026-02-21T09:42:48.8314802Z mov.pred %p77, -1; 2026-02-21T09:42:48.8315003Z // begin inline asm 2026-02-21T09:42:48.8315460Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 0], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8315949Z // end inline asm 2026-02-21T09:42:48.8316109Z // begin inline asm 2026-02-21T09:42:48.8316559Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 16], {%r1731, %r1731, 
%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8317001Z // end inline asm 2026-02-21T09:42:48.8317139Z // begin inline asm 2026-02-21T09:42:48.8317545Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 32], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8317987Z // end inline asm 2026-02-21T09:42:48.8318127Z // begin inline asm 2026-02-21T09:42:48.8318532Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 48], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8318990Z // end inline asm 2026-02-21T09:42:48.8319135Z // begin inline asm 2026-02-21T09:42:48.8319530Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 64], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8319975Z // end inline asm 2026-02-21T09:42:48.8320124Z // begin inline asm 2026-02-21T09:42:48.8320521Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 80], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8320969Z // end inline asm 2026-02-21T09:42:48.8321110Z // begin inline asm 2026-02-21T09:42:48.8321549Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 96], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8321997Z // end inline asm 2026-02-21T09:42:48.8322143Z // begin inline asm 2026-02-21T09:42:48.8322554Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 112], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 
2026-02-21T09:42:48.8323008Z // end inline asm 2026-02-21T09:42:48.8323156Z // begin inline asm 2026-02-21T09:42:48.8323584Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 128], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8324114Z // end inline asm 2026-02-21T09:42:48.8324261Z // begin inline asm 2026-02-21T09:42:48.8324727Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 144], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8325211Z // end inline asm 2026-02-21T09:42:48.8325372Z // begin inline asm 2026-02-21T09:42:48.8325910Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 160], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8326382Z // end inline asm 2026-02-21T09:42:48.8326543Z // begin inline asm 2026-02-21T09:42:48.8326987Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 176], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8327479Z // end inline asm 2026-02-21T09:42:48.8327636Z // begin inline asm 2026-02-21T09:42:48.8328068Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 192], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8328578Z // end inline asm 2026-02-21T09:42:48.8328732Z // begin inline asm 2026-02-21T09:42:48.8329159Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 208], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8329622Z // end inline asm 2026-02-21T09:42:48.8329761Z // begin inline asm 
2026-02-21T09:42:48.8330193Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 224], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8330643Z // end inline asm 2026-02-21T09:42:48.8330793Z // begin inline asm 2026-02-21T09:42:48.8331236Z @%p77 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r137 + 240], {%r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731, %r1731}; 2026-02-21T09:42:48.8331687Z // end inline asm 2026-02-21T09:42:48.8331832Z // begin inline asm 2026-02-21T09:42:48.8331990Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:42:48.8332169Z // end inline asm 2026-02-21T09:42:48.8332335Z bar.sync 0; 2026-02-21T09:42:48.8332598Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8332926Z add.s32 %r1730, %r128, 229424; 2026-02-21T09:42:48.8333106Z // begin inline asm 2026-02-21T09:42:48.8333295Z @%p118 mbarrier.init.shared::cta.b64 [%r1730], 1; 2026-02-21T09:42:48.8333514Z // end inline asm 2026-02-21T09:42:48.8333665Z bar.sync 0; 2026-02-21T09:42:48.8333815Z add.s32 %r410, %r128, 229432; 2026-02-21T09:42:48.8333995Z // begin inline asm 2026-02-21T09:42:48.8334181Z @%p118 mbarrier.init.shared::cta.b64 [%r410], 1; 2026-02-21T09:42:48.8334403Z // end inline asm 2026-02-21T09:42:48.8334554Z add.s32 %r411, %r128, 229376; 2026-02-21T09:42:48.8334808Z // begin inline asm 2026-02-21T09:42:48.8334988Z @%p118 mbarrier.init.shared::cta.b64 [%r411], 1; 2026-02-21T09:42:48.8335189Z // end inline asm 2026-02-21T09:42:48.8335368Z bar.sync 0; 2026-02-21T09:42:48.8335506Z add.s32 %r412, %r128, 229384; 2026-02-21T09:42:48.8335677Z // begin inline asm 2026-02-21T09:42:48.8335845Z @%p118 mbarrier.init.shared::cta.b64 [%r412], 1; 2026-02-21T09:42:48.8336045Z // end inline asm 2026-02-21T09:42:48.8336179Z bar.sync 0; 2026-02-21T09:42:48.8336331Z add.s32 %r413, %r128, 
229392; 2026-02-21T09:42:48.8336509Z // begin inline asm 2026-02-21T09:42:48.8336686Z @%p118 mbarrier.init.shared::cta.b64 [%r413], 1; 2026-02-21T09:42:48.8336898Z // end inline asm 2026-02-21T09:42:48.8337040Z bar.sync 0; 2026-02-21T09:42:48.8337193Z add.s32 %r414, %r128, 229400; 2026-02-21T09:42:48.8337393Z // begin inline asm 2026-02-21T09:42:48.8337582Z @%p118 mbarrier.init.shared::cta.b64 [%r414], 1; 2026-02-21T09:42:48.8337790Z // end inline asm 2026-02-21T09:42:48.8337942Z bar.sync 0; 2026-02-21T09:42:48.8338088Z add.s32 %r415, %r128, 229408; 2026-02-21T09:42:48.8338267Z // begin inline asm 2026-02-21T09:42:48.8338461Z @%p118 mbarrier.init.shared::cta.b64 [%r415], 1; 2026-02-21T09:42:48.8338654Z // end inline asm 2026-02-21T09:42:48.8338801Z bar.sync 0; 2026-02-21T09:42:48.8338936Z add.s32 %r556, %r128, 229416; 2026-02-21T09:42:48.8339101Z // begin inline asm 2026-02-21T09:42:48.8339270Z @%p118 mbarrier.init.shared::cta.b64 [%r556], 1; 2026-02-21T09:42:48.8339469Z // end inline asm 2026-02-21T09:42:48.8339614Z setp.lt.s32 %p58, %r44, 1; 2026-02-21T09:42:48.8339803Z setp.gt.s32 %p57, %r44, 0; 2026-02-21T09:42:48.8340090Z .loc 1 37 33 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:37:33 2026-02-21T09:42:48.8340388Z shr.u32 %r502, %r1729, 2; 2026-02-21T09:42:48.8340562Z and.b32 %r503, %r502, 536870848; 2026-02-21T09:42:48.8340841Z .loc 1 38 39 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:38:39 2026-02-21T09:42:48.8341146Z sub.s32 %r504, 48, %r503; 2026-02-21T09:42:48.8341445Z .loc 1 39 45 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:45 2026-02-21T09:42:48.8341757Z and.b32 %r505, %r1729, 255; 2026-02-21T09:42:48.8342066Z .loc 1 40 51 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:40:51 2026-02-21T09:42:48.8342384Z div.s32 %r506, %r505, %r504; 2026-02-21T09:42:48.8342675Z .loc 1 39 64 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:64 2026-02-21T09:42:48.8342971Z mul.lo.s32 %r507, %r506, %r504; 
2026-02-21T09:42:48.8343151Z sub.s32 %r508, %r505, %r507; 2026-02-21T09:42:48.8343425Z .loc 1 39 30 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:30 2026-02-21T09:42:48.8343731Z add.s32 %r509, %r508, %r503; 2026-02-21T09:42:48.8344008Z .loc 1 41 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:41:27 2026-02-21T09:42:48.8344301Z shl.b32 %r1698, %r509, 8; 2026-02-21T09:42:48.8344580Z .loc 1 43 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:43:27 2026-02-21T09:42:48.8344940Z shl.b32 %r1694, %r506, 8; 2026-02-21T09:42:48.8345282Z .loc 1 44 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:44:32 2026-02-21T09:42:48.8345598Z or.b32 %r1725, %r1694, %r4; 2026-02-21T09:42:48.8345782Z or.b32 %r1726, %r1694, %r5; 2026-02-21T09:42:48.8345963Z or.b32 %r1727, %r1694, %r6; 2026-02-21T09:42:48.8346134Z or.b32 %r1728, %r1694, %r7; 2026-02-21T09:42:48.8346408Z .loc 1 54 53 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:53 2026-02-21T09:42:48.8346699Z shl.b32 %r510, %r1725, 10; 2026-02-21T09:42:48.8346878Z shl.b32 %r511, %r1726, 10; 2026-02-21T09:42:48.8347037Z shl.b32 %r512, %r1727, 10; 2026-02-21T09:42:48.8347210Z shl.b32 %r513, %r1728, 10; 2026-02-21T09:42:48.8347481Z .loc 1 54 60 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:60 2026-02-21T09:42:48.8347787Z or.b32 %r514, %r510, %r45; 2026-02-21T09:42:48.8347984Z or.b32 %r515, %r511, %r45; 2026-02-21T09:42:48.8348142Z or.b32 %r516, %r512, %r45; 2026-02-21T09:42:48.8348305Z or.b32 %r517, %r513, %r45; 2026-02-21T09:42:48.8348567Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8348879Z mad.wide.s32 %rd26, %r514, 2, %rd6; 2026-02-21T09:42:48.8349068Z mad.wide.s32 %rd27, %r515, 2, %rd6; 2026-02-21T09:42:48.8349259Z mad.wide.s32 %rd28, %r516, 2, %rd6; 2026-02-21T09:42:48.8349435Z mad.wide.s32 %rd29, %r517, 2, %rd6; 2026-02-21T09:42:48.8349724Z .loc 1 54 85 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8350064Z shl.b32 %r518, %r1, 4; 2026-02-21T09:42:48.8350220Z and.b32 %r54, %r518, 4080; 2026-02-21T09:42:48.8350388Z shl.b32 %r519, %r1, 1; 2026-02-21T09:42:48.8350543Z and.b32 %r520, %r519, 48; 2026-02-21T09:42:48.8350707Z xor.b32 %r55, %r54, %r520; 2026-02-21T09:42:48.8350865Z add.s32 %r417, %r128, %r55; 2026-02-21T09:42:48.8351033Z selp.b32 %r418, 16, 0, %p57; 2026-02-21T09:42:48.8351194Z // begin inline asm 2026-02-21T09:42:48.8351409Z cp.async.cg.shared.global [ %r417 + 0 ], [ %rd26 + 0 ], 0x10, %r418; 2026-02-21T09:42:48.8351650Z // end inline asm 2026-02-21T09:42:48.8351791Z add.s32 %r419, %r417, 4096; 2026-02-21T09:42:48.8351953Z // begin inline asm 2026-02-21T09:42:48.8352153Z cp.async.cg.shared.global [ %r419 + 0 ], [ %rd27 + 0 ], 0x10, %r418; 2026-02-21T09:42:48.8352389Z // end inline asm 2026-02-21T09:42:48.8352526Z add.s32 %r421, %r417, 8192; 2026-02-21T09:42:48.8352688Z // begin inline asm 2026-02-21T09:42:48.8352887Z cp.async.cg.shared.global [ %r421 + 0 ], [ %rd28 + 0 ], 0x10, %r418; 2026-02-21T09:42:48.8353127Z // end inline asm 2026-02-21T09:42:48.8353276Z add.s32 %r423, %r417, 12288; 2026-02-21T09:42:48.8353434Z // begin inline asm 2026-02-21T09:42:48.8353638Z cp.async.cg.shared.global [ %r423 + 0 ], [ %rd29 + 0 ], 0x10, %r418; 2026-02-21T09:42:48.8353890Z // end inline asm 2026-02-21T09:42:48.8354047Z cp.async.commit_group; 2026-02-21T09:42:48.8354316Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8354707Z bar.sync 0; 2026-02-21T09:42:48.8354863Z and.pred %p47, %p118, %p57; 2026-02-21T09:42:48.8355047Z // begin inline asm 2026-02-21T09:42:48.8355265Z @%p47 mbarrier.arrive.expect_tx.shared.b64 _, [%r411], 16384; 2026-02-21T09:42:48.8355524Z // end inline asm 2026-02-21T09:42:48.8355805Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8356125Z bar.sync 0; 
2026-02-21T09:42:48.8356279Z elect.sync %r521|%p59, -1; 2026-02-21T09:42:48.8356447Z and.pred %p60, %p57, %p59; 2026-02-21T09:42:48.8356623Z and.pred %p48, %p3, %p60; 2026-02-21T09:42:48.8356790Z add.s32 %r426, %r128, 98304; 2026-02-21T09:42:48.8356961Z // begin inline asm 2026-02-21T09:42:48.8357336Z @%p48 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r426], [%rd30, {%r1731, %r1698}], [%r411]; 2026-02-21T09:42:48.8357755Z // end inline asm 2026-02-21T09:42:48.8358013Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8358309Z setp.gt.s32 %p61, %r44, 1; 2026-02-21T09:42:48.8358585Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8358874Z cvt.s64.s32 %rd53, %r510; 2026-02-21T09:42:48.8359045Z cvt.u64.u32 %rd54, %r45; 2026-02-21T09:42:48.8359210Z or.b64 %rd55, %rd53, %rd54; 2026-02-21T09:42:48.8359374Z shl.b64 %rd56, %rd55, 1; 2026-02-21T09:42:48.8359538Z add.s64 %rd2, %rd6, %rd56; 2026-02-21T09:42:48.8359695Z add.s64 %rd31, %rd2, 64; 2026-02-21T09:42:48.8359857Z cvt.s64.s32 %rd57, %r511; 2026-02-21T09:42:48.8360011Z or.b64 %rd58, %rd57, %rd54; 2026-02-21T09:42:48.8360178Z shl.b64 %rd59, %rd58, 1; 2026-02-21T09:42:48.8360332Z add.s64 %rd3, %rd6, %rd59; 2026-02-21T09:42:48.8360528Z add.s64 %rd32, %rd3, 64; 2026-02-21T09:42:48.8360686Z cvt.s64.s32 %rd60, %r512; 2026-02-21T09:42:48.8360852Z or.b64 %rd61, %rd60, %rd54; 2026-02-21T09:42:48.8361017Z shl.b64 %rd62, %rd61, 1; 2026-02-21T09:42:48.8361171Z add.s64 %rd4, %rd6, %rd62; 2026-02-21T09:42:48.8361335Z add.s64 %rd33, %rd4, 64; 2026-02-21T09:42:48.8361489Z cvt.s64.s32 %rd63, %r513; 2026-02-21T09:42:48.8361650Z or.b64 %rd64, %rd63, %rd54; 2026-02-21T09:42:48.8361806Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:42:48.8361964Z add.s64 %rd5, %rd6, %rd65; 2026-02-21T09:42:48.8362118Z add.s64 %rd34, %rd5, 64; 2026-02-21T09:42:48.8362426Z .loc 1 54 85 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8362721Z add.s32 %r430, %r417, 16384; 2026-02-21T09:42:48.8362893Z selp.b32 %r431, 16, 0, %p61; 2026-02-21T09:42:48.8363059Z // begin inline asm 2026-02-21T09:42:48.8363268Z cp.async.cg.shared.global [ %r430 + 0 ], [ %rd31 + 0 ], 0x10, %r431; 2026-02-21T09:42:48.8363509Z // end inline asm 2026-02-21T09:42:48.8363649Z add.s32 %r432, %r417, 20480; 2026-02-21T09:42:48.8363814Z // begin inline asm 2026-02-21T09:42:48.8364016Z cp.async.cg.shared.global [ %r432 + 0 ], [ %rd32 + 0 ], 0x10, %r431; 2026-02-21T09:42:48.8364254Z // end inline asm 2026-02-21T09:42:48.8364396Z add.s32 %r434, %r417, 24576; 2026-02-21T09:42:48.8364560Z // begin inline asm 2026-02-21T09:42:48.8364802Z cp.async.cg.shared.global [ %r434 + 0 ], [ %rd33 + 0 ], 0x10, %r431; 2026-02-21T09:42:48.8365035Z // end inline asm 2026-02-21T09:42:48.8365185Z add.s32 %r436, %r417, 28672; 2026-02-21T09:42:48.8365344Z // begin inline asm 2026-02-21T09:42:48.8365557Z cp.async.cg.shared.global [ %r436 + 0 ], [ %rd34 + 0 ], 0x10, %r431; 2026-02-21T09:42:48.8365809Z // end inline asm 2026-02-21T09:42:48.8365974Z cp.async.commit_group; 2026-02-21T09:42:48.8366297Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8366627Z bar.sync 0; 2026-02-21T09:42:48.8366791Z and.pred %p49, %p118, %p61; 2026-02-21T09:42:48.8366981Z // begin inline asm 2026-02-21T09:42:48.8367190Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r412], 16384; 2026-02-21T09:42:48.8367423Z // end inline asm 2026-02-21T09:42:48.8367681Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8367978Z bar.sync 0; 2026-02-21T09:42:48.8368128Z elect.sync %r522|%p62, -1; 2026-02-21T09:42:48.8368295Z and.pred %p63, %p61, %p62; 2026-02-21T09:42:48.8368468Z and.pred %p50, %p3, %p63; 2026-02-21T09:42:48.8368640Z add.s32 %r439, %r128, 114688; 2026-02-21T09:42:48.8368802Z // begin inline asm 
2026-02-21T09:42:48.8369161Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r439], [%rd30, {%r131, %r1698}], [%r412]; 2026-02-21T09:42:48.8369537Z // end inline asm 2026-02-21T09:42:48.8369804Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8370109Z setp.gt.s32 %p64, %r44, 2; 2026-02-21T09:42:48.8370420Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8370726Z add.s64 %rd36, %rd2, 128; 2026-02-21T09:42:48.8370884Z add.s64 %rd37, %rd3, 128; 2026-02-21T09:42:48.8371047Z add.s64 %rd38, %rd4, 128; 2026-02-21T09:42:48.8371201Z add.s64 %rd39, %rd5, 128; 2026-02-21T09:42:48.8371473Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8371767Z add.s32 %r443, %r417, 32768; 2026-02-21T09:42:48.8371941Z selp.b32 %r444, 16, 0, %p64; 2026-02-21T09:42:48.8372101Z // begin inline asm 2026-02-21T09:42:48.8372311Z cp.async.cg.shared.global [ %r443 + 0 ], [ %rd36 + 0 ], 0x10, %r444; 2026-02-21T09:42:48.8372548Z // end inline asm 2026-02-21T09:42:48.8372691Z add.s32 %r445, %r417, 36864; 2026-02-21T09:42:48.8372856Z // begin inline asm 2026-02-21T09:42:48.8373110Z cp.async.cg.shared.global [ %r445 + 0 ], [ %rd37 + 0 ], 0x10, %r444; 2026-02-21T09:42:48.8373354Z // end inline asm 2026-02-21T09:42:48.8373494Z add.s32 %r447, %r417, 40960; 2026-02-21T09:42:48.8373658Z // begin inline asm 2026-02-21T09:42:48.8373856Z cp.async.cg.shared.global [ %r447 + 0 ], [ %rd38 + 0 ], 0x10, %r444; 2026-02-21T09:42:48.8374092Z // end inline asm 2026-02-21T09:42:48.8374239Z add.s32 %r449, %r417, 45056; 2026-02-21T09:42:48.8374393Z // begin inline asm 2026-02-21T09:42:48.8374599Z cp.async.cg.shared.global [ %r449 + 0 ], [ %rd39 + 0 ], 0x10, %r444; 2026-02-21T09:42:48.8374887Z // end inline asm 2026-02-21T09:42:48.8375048Z cp.async.commit_group; 2026-02-21T09:42:48.8375342Z .loc 1 30 84 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8375662Z bar.sync 0; 2026-02-21T09:42:48.8375820Z and.pred %p51, %p118, %p64; 2026-02-21T09:42:48.8375992Z // begin inline asm 2026-02-21T09:42:48.8376196Z @%p51 mbarrier.arrive.expect_tx.shared.b64 _, [%r413], 16384; 2026-02-21T09:42:48.8376424Z // end inline asm 2026-02-21T09:42:48.8376704Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8377022Z bar.sync 0; 2026-02-21T09:42:48.8377185Z elect.sync %r523|%p65, -1; 2026-02-21T09:42:48.8377370Z and.pred %p66, %p64, %p65; 2026-02-21T09:42:48.8377558Z and.pred %p52, %p3, %p66; 2026-02-21T09:42:48.8377737Z add.s32 %r452, %r128, 131072; 2026-02-21T09:42:48.8377926Z mov.b32 %r453, 64; 2026-02-21T09:42:48.8378082Z // begin inline asm 2026-02-21T09:42:48.8378466Z @%p52 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r452], [%rd30, {%r453, %r1698}], [%r413]; 2026-02-21T09:42:48.8378876Z // end inline asm 2026-02-21T09:42:48.8379150Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8379509Z setp.gt.s32 %p67, %r44, 3; 2026-02-21T09:42:48.8379809Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8380139Z add.s64 %rd41, %rd2, 192; 2026-02-21T09:42:48.8380320Z add.s64 %rd42, %rd3, 192; 2026-02-21T09:42:48.8380489Z add.s64 %rd43, %rd4, 192; 2026-02-21T09:42:48.8380662Z add.s64 %rd44, %rd5, 192; 2026-02-21T09:42:48.8380948Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8381270Z add.s32 %r456, %r417, 49152; 2026-02-21T09:42:48.8381447Z selp.b32 %r457, 16, 0, %p67; 2026-02-21T09:42:48.8381626Z // begin inline asm 2026-02-21T09:42:48.8381852Z cp.async.cg.shared.global [ %r456 + 0 ], [ %rd41 + 0 ], 0x10, %r457; 2026-02-21T09:42:48.8382112Z // end inline asm 2026-02-21T09:42:48.8382274Z add.s32 %r458, %r417, 53248; 
2026-02-21T09:42:48.8382447Z // begin inline asm 2026-02-21T09:42:48.8382657Z cp.async.cg.shared.global [ %r458 + 0 ], [ %rd42 + 0 ], 0x10, %r457; 2026-02-21T09:42:48.8382890Z // end inline asm 2026-02-21T09:42:48.8383039Z add.s32 %r460, %r417, 57344; 2026-02-21T09:42:48.8383225Z // begin inline asm 2026-02-21T09:42:48.8383429Z cp.async.cg.shared.global [ %r460 + 0 ], [ %rd43 + 0 ], 0x10, %r457; 2026-02-21T09:42:48.8383656Z // end inline asm 2026-02-21T09:42:48.8383802Z add.s32 %r462, %r417, 61440; 2026-02-21T09:42:48.8383957Z // begin inline asm 2026-02-21T09:42:48.8384160Z cp.async.cg.shared.global [ %r462 + 0 ], [ %rd44 + 0 ], 0x10, %r457; 2026-02-21T09:42:48.8384391Z // end inline asm 2026-02-21T09:42:48.8384539Z cp.async.commit_group; 2026-02-21T09:42:48.8384862Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8385187Z bar.sync 0; 2026-02-21T09:42:48.8385347Z and.pred %p53, %p118, %p67; 2026-02-21T09:42:48.8385524Z // begin inline asm 2026-02-21T09:42:48.8385742Z @%p53 mbarrier.arrive.expect_tx.shared.b64 _, [%r414], 16384; 2026-02-21T09:42:48.8385985Z // end inline asm 2026-02-21T09:42:48.8386295Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8386597Z bar.sync 0; 2026-02-21T09:42:48.8386744Z elect.sync %r524|%p68, -1; 2026-02-21T09:42:48.8386930Z and.pred %p69, %p67, %p68; 2026-02-21T09:42:48.8387102Z and.pred %p54, %p3, %p69; 2026-02-21T09:42:48.8387279Z add.s32 %r465, %r128, 147456; 2026-02-21T09:42:48.8387445Z mov.b32 %r466, 96; 2026-02-21T09:42:48.8387601Z // begin inline asm 2026-02-21T09:42:48.8387961Z @%p54 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r465], [%rd30, {%r466, %r1698}], [%r414]; 2026-02-21T09:42:48.8388381Z // end inline asm 2026-02-21T09:42:48.8388667Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8388987Z setp.gt.s32 %p70, %r44, 4; 
2026-02-21T09:42:48.8389294Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8389624Z add.s64 %rd46, %rd2, 256; 2026-02-21T09:42:48.8389809Z add.s64 %rd47, %rd3, 256; 2026-02-21T09:42:48.8389985Z add.s64 %rd48, %rd4, 256; 2026-02-21T09:42:48.8390171Z add.s64 %rd49, %rd5, 256; 2026-02-21T09:42:48.8390467Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8390781Z add.s32 %r469, %r417, 65536; 2026-02-21T09:42:48.8390966Z selp.b32 %r470, 16, 0, %p70; 2026-02-21T09:42:48.8391141Z // begin inline asm 2026-02-21T09:42:48.8391370Z cp.async.cg.shared.global [ %r469 + 0 ], [ %rd46 + 0 ], 0x10, %r470; 2026-02-21T09:42:48.8391617Z // end inline asm 2026-02-21T09:42:48.8391779Z add.s32 %r471, %r417, 69632; 2026-02-21T09:42:48.8391950Z // begin inline asm 2026-02-21T09:42:48.8392176Z cp.async.cg.shared.global [ %r471 + 0 ], [ %rd47 + 0 ], 0x10, %r470; 2026-02-21T09:42:48.8392433Z // end inline asm 2026-02-21T09:42:48.8392616Z add.s32 %r473, %r417, 73728; 2026-02-21T09:42:48.8392790Z // begin inline asm 2026-02-21T09:42:48.8393000Z cp.async.cg.shared.global [ %r473 + 0 ], [ %rd48 + 0 ], 0x10, %r470; 2026-02-21T09:42:48.8393249Z // end inline asm 2026-02-21T09:42:48.8393396Z add.s32 %r475, %r417, 77824; 2026-02-21T09:42:48.8393567Z // begin inline asm 2026-02-21T09:42:48.8393775Z cp.async.cg.shared.global [ %r475 + 0 ], [ %rd49 + 0 ], 0x10, %r470; 2026-02-21T09:42:48.8394022Z // end inline asm 2026-02-21T09:42:48.8394182Z cp.async.commit_group; 2026-02-21T09:42:48.8394467Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8394824Z bar.sync 0; 2026-02-21T09:42:48.8394973Z and.pred %p55, %p118, %p70; 2026-02-21T09:42:48.8395151Z // begin inline asm 2026-02-21T09:42:48.8395363Z @%p55 mbarrier.arrive.expect_tx.shared.b64 _, [%r415], 16384; 2026-02-21T09:42:48.8395607Z // end inline asm 2026-02-21T09:42:48.8395871Z .loc 1 55 44 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8396239Z bar.sync 0; 2026-02-21T09:42:48.8396429Z elect.sync %r525|%p71, -1; 2026-02-21T09:42:48.8396609Z and.pred %p72, %p70, %p71; 2026-02-21T09:42:48.8396794Z and.pred %p56, %p3, %p72; 2026-02-21T09:42:48.8396968Z add.s32 %r478, %r128, 163840; 2026-02-21T09:42:48.8397153Z mov.b32 %r479, 128; 2026-02-21T09:42:48.8397313Z // begin inline asm 2026-02-21T09:42:48.8397693Z @%p56 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r478], [%rd30, {%r479, %r1698}], [%r415]; 2026-02-21T09:42:48.8398090Z // end inline asm 2026-02-21T09:42:48.8398370Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8398704Z cp.async.wait_group 4; 2026-02-21T09:42:48.8398872Z bar.sync 0; 2026-02-21T09:42:48.8399141Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8399452Z // begin inline asm 2026-02-21T09:42:48.8399637Z 2026-02-21T09:42:48.8399767Z { 2026-02-21T09:42:48.8399918Z @!%p57 bra.uni skipWait; 2026-02-21T09:42:48.8400096Z .reg .pred complete; 2026-02-21T09:42:48.8400267Z waitLoop: 2026-02-21T09:42:48.8400492Z mbarrier.try_wait.parity.shared.b64 complete, [%r411], %r1731; 2026-02-21T09:42:48.8400761Z @!complete bra.uni waitLoop; 2026-02-21T09:42:48.8400952Z skipWait: 2026-02-21T09:42:48.8401091Z } 2026-02-21T09:42:48.8401167Z 2026-02-21T09:42:48.8401238Z // end inline asm 2026-02-21T09:42:48.8401511Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8401880Z setp.ne.b32 %p73, %r46, 0; 2026-02-21T09:42:48.8402064Z or.pred %p74, %p58, %p73; 2026-02-21T09:42:48.8402253Z @%p74 bra $L__BB0_2; 2026-02-21T09:42:48.8402425Z // %bb.1: 2026-02-21T09:42:48.8402579Z elect.sync %r534|%p76, -1; 2026-02-21T09:42:48.8402777Z bfe.u32 %r536, %r128, 4, 14; 2026-02-21T09:42:48.8402960Z cvt.u64.u32 %rd75, %r536; 2026-02-21T09:42:48.8403167Z or.b64 
%rd66, %rd75, -9223371899348713472; 2026-02-21T09:42:48.8403383Z bfe.u32 %r538, %r426, 4, 14; 2026-02-21T09:42:48.8403570Z cvt.u64.u32 %rd76, %r538; 2026-02-21T09:42:48.8403759Z or.b64 %rd67, %rd76, -9223371899348713472; 2026-02-21T09:42:48.8403974Z mov.b32 %r527, 138412048; 2026-02-21T09:42:48.8404147Z mov.pred %p75, 0; 2026-02-21T09:42:48.8404321Z // begin inline asm 2026-02-21T09:42:48.8404595Z @%p76 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 0 ], %rd66, %rd67, %r527, %p75; 2026-02-21T09:42:48.8404942Z // end inline asm 2026-02-21T09:42:48.8405103Z add.s32 %r539, %r128, 32; 2026-02-21T09:42:48.8405272Z bfe.u32 %r540, %r539, 4, 14; 2026-02-21T09:42:48.8405453Z cvt.u64.u32 %rd77, %r540; 2026-02-21T09:42:48.8405633Z or.b64 %rd68, %rd77, -9223371899348713472; 2026-02-21T09:42:48.8405841Z add.s32 %r541, %r128, 98336; 2026-02-21T09:42:48.8406019Z bfe.u32 %r542, %r541, 4, 14; 2026-02-21T09:42:48.8406214Z cvt.u64.u32 %rd78, %r542; 2026-02-21T09:42:48.8406392Z or.b64 %rd69, %rd78, -9223371899348713472; 2026-02-21T09:42:48.8406577Z // begin inline asm 2026-02-21T09:42:48.8406818Z @%p76 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 0 ], %rd68, %rd69, %r527, %p77; 2026-02-21T09:42:48.8407077Z // end inline asm 2026-02-21T09:42:48.8407228Z add.s32 %r543, %r128, 8192; 2026-02-21T09:42:48.8407388Z bfe.u32 %r544, %r543, 4, 14; 2026-02-21T09:42:48.8407553Z cvt.u64.u32 %rd79, %r544; 2026-02-21T09:42:48.8407718Z or.b64 %rd70, %rd79, -9223371899348713472; 2026-02-21T09:42:48.8407906Z // begin inline asm 2026-02-21T09:42:48.8408141Z @%p76 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 256 ], %rd70, %rd67, %r527, %p75; 2026-02-21T09:42:48.8408408Z // end inline asm 2026-02-21T09:42:48.8408558Z add.s32 %r545, %r128, 8224; 2026-02-21T09:42:48.8408717Z bfe.u32 %r546, %r545, 4, 14; 2026-02-21T09:42:48.8408883Z cvt.u64.u32 %rd80, %r546; 2026-02-21T09:42:48.8409048Z or.b64 %rd72, %rd80, -9223371899348713472; 2026-02-21T09:42:48.8409237Z // begin inline asm 2026-02-21T09:42:48.8409467Z 
@%p76 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 256 ], %rd72, %rd69, %r527, %p77; 2026-02-21T09:42:48.8409768Z // end inline asm 2026-02-21T09:42:48.8409919Z add.s32 %r547, %r128, 229424; 2026-02-21T09:42:48.8410082Z cvt.u64.u32 %rd74, %r547; 2026-02-21T09:42:48.8410244Z // begin inline asm 2026-02-21T09:42:48.8410458Z @%p76 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd74]; 2026-02-21T09:42:48.8410708Z // end inline asm 2026-02-21T09:42:48.8410847Z $L__BB0_2: 2026-02-21T09:42:48.8411108Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8411421Z setp.gt.s32 %p86, %r44, 5; 2026-02-21T09:42:48.8411692Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8411993Z add.s64 %rd81, %rd2, 320; 2026-02-21T09:42:48.8412150Z add.s64 %rd82, %rd3, 320; 2026-02-21T09:42:48.8412315Z add.s64 %rd83, %rd4, 320; 2026-02-21T09:42:48.8412500Z add.s64 %rd84, %rd5, 320; 2026-02-21T09:42:48.8412771Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8413064Z bar.sync 0; 2026-02-21T09:42:48.8413209Z add.s32 %r548, %r417, 81920; 2026-02-21T09:42:48.8413378Z selp.b32 %r549, 16, 0, %p86; 2026-02-21T09:42:48.8413536Z // begin inline asm 2026-02-21T09:42:48.8413753Z cp.async.cg.shared.global [ %r548 + 0 ], [ %rd81 + 0 ], 0x10, %r549; 2026-02-21T09:42:48.8413982Z // end inline asm 2026-02-21T09:42:48.8414133Z add.s32 %r550, %r417, 86016; 2026-02-21T09:42:48.8414289Z // begin inline asm 2026-02-21T09:42:48.8414534Z cp.async.cg.shared.global [ %r550 + 0 ], [ %rd82 + 0 ], 0x10, %r549; 2026-02-21T09:42:48.8414816Z // end inline asm 2026-02-21T09:42:48.8414967Z add.s32 %r552, %r417, 90112; 2026-02-21T09:42:48.8415124Z // begin inline asm 2026-02-21T09:42:48.8415336Z cp.async.cg.shared.global [ %r552 + 0 ], [ %rd83 + 0 ], 0x10, %r549; 2026-02-21T09:42:48.8415572Z // end inline asm 2026-02-21T09:42:48.8415722Z add.s32 %r554, %r417, 94208; 
2026-02-21T09:42:48.8415902Z // begin inline asm 2026-02-21T09:42:48.8416120Z cp.async.cg.shared.global [ %r554 + 0 ], [ %rd84 + 0 ], 0x10, %r549; 2026-02-21T09:42:48.8416374Z // end inline asm 2026-02-21T09:42:48.8416530Z cp.async.commit_group; 2026-02-21T09:42:48.8416830Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8417176Z and.pred %p84, %p118, %p86; 2026-02-21T09:42:48.8417352Z // begin inline asm 2026-02-21T09:42:48.8417572Z @%p84 mbarrier.arrive.expect_tx.shared.b64 _, [%r556], 16384; 2026-02-21T09:42:48.8417818Z // end inline asm 2026-02-21T09:42:48.8418093Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8418405Z bar.sync 0; 2026-02-21T09:42:48.8418564Z elect.sync %r565|%p89, -1; 2026-02-21T09:42:48.8418774Z and.pred %p90, %p86, %p89; 2026-02-21T09:42:48.8418964Z and.pred %p85, %p3, %p90; 2026-02-21T09:42:48.8419140Z add.s32 %r557, %r128, 180224; 2026-02-21T09:42:48.8419321Z mov.b32 %r1706, 160; 2026-02-21T09:42:48.8419487Z // begin inline asm 2026-02-21T09:42:48.8419864Z @%p85 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r557], [%rd30, {%r1706, %r1698}], [%r556]; 2026-02-21T09:42:48.8420292Z // end inline asm 2026-02-21T09:42:48.8420566Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8420899Z sub.s32 %r56, 31, %r43; 2026-02-21T09:42:48.8421072Z setp.lt.s32 %p91, %r56, 1; 2026-02-21T09:42:48.8421257Z @%p91 bra $L__BB0_11; 2026-02-21T09:42:48.8421449Z // %bb.3: // %.lr.ph 2026-02-21T09:42:48.8421787Z .loc 1 0 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:0:84 2026-02-21T09:42:48.8422126Z bfe.u32 %r9, %r1, 5, 3; 2026-02-21T09:42:48.8422300Z and.b32 %r41, %r1, 31; 2026-02-21T09:42:48.8422513Z ld.param.b64 %rd7, [_helion_matmul_param_2]; 2026-02-21T09:42:48.8422794Z and.b32 %r8, %r1, 224; 2026-02-21T09:42:48.8422966Z or.b32 %r10, %r9, 8; 
2026-02-21T09:42:48.8423125Z or.b32 %r11, %r9, 16; 2026-02-21T09:42:48.8423290Z or.b32 %r12, %r9, 24; 2026-02-21T09:42:48.8423454Z or.b32 %r13, %r9, 32; 2026-02-21T09:42:48.8423608Z or.b32 %r14, %r9, 40; 2026-02-21T09:42:48.8423770Z or.b32 %r15, %r9, 48; 2026-02-21T09:42:48.8423924Z or.b32 %r16, %r9, 56; 2026-02-21T09:42:48.8424087Z or.b32 %r17, %r9, 64; 2026-02-21T09:42:48.8424240Z or.b32 %r18, %r9, 72; 2026-02-21T09:42:48.8424404Z or.b32 %r19, %r9, 80; 2026-02-21T09:42:48.8424558Z or.b32 %r20, %r9, 88; 2026-02-21T09:42:48.8424768Z or.b32 %r21, %r9, 96; 2026-02-21T09:42:48.8424926Z or.b32 %r22, %r9, 104; 2026-02-21T09:42:48.8425109Z or.b32 %r23, %r9, 112; 2026-02-21T09:42:48.8425267Z or.b32 %r24, %r9, 120; 2026-02-21T09:42:48.8425415Z or.b32 %r25, %r9, 128; 2026-02-21T09:42:48.8425600Z or.b32 %r26, %r9, 136; 2026-02-21T09:42:48.8425749Z or.b32 %r27, %r9, 144; 2026-02-21T09:42:48.8425905Z or.b32 %r28, %r9, 152; 2026-02-21T09:42:48.8426050Z or.b32 %r29, %r9, 160; 2026-02-21T09:42:48.8426202Z or.b32 %r30, %r9, 168; 2026-02-21T09:42:48.8426345Z or.b32 %r31, %r9, 176; 2026-02-21T09:42:48.8426498Z or.b32 %r32, %r9, 184; 2026-02-21T09:42:48.8426643Z or.b32 %r33, %r9, 192; 2026-02-21T09:42:48.8426793Z or.b32 %r34, %r9, 200; 2026-02-21T09:42:48.8426938Z or.b32 %r35, %r9, 208; 2026-02-21T09:42:48.8427089Z or.b32 %r36, %r9, 216; 2026-02-21T09:42:48.8427239Z or.b32 %r37, %r9, 224; 2026-02-21T09:42:48.8427420Z or.b32 %r38, %r9, 232; 2026-02-21T09:42:48.8427572Z or.b32 %r39, %r9, 240; 2026-02-21T09:42:48.8427716Z or.b32 %r40, %r9, 248; 2026-02-21T09:42:48.8427866Z shl.b32 %r42, %r41, 3; 2026-02-21T09:42:48.8428013Z sub.s32 %r57, 26, %r43; 2026-02-21T09:42:48.8428175Z shl.b32 %r574, %r1, 12; 2026-02-21T09:42:48.8428331Z and.b32 %r575, %r574, 28672; 2026-02-21T09:42:48.8428506Z or.b32 %r576, %r575, %r54; 2026-02-21T09:42:48.8428667Z add.s32 %r578, %r128, 196608; 2026-02-21T09:42:48.8428843Z add.s32 %r58, %r578, %r576; 2026-02-21T09:42:48.8429013Z xor.b32 %r579, %r576, 16; 
2026-02-21T09:42:48.8429173Z add.s32 %r59, %r578, %r579; 2026-02-21T09:42:48.8429342Z xor.b32 %r580, %r576, 32; 2026-02-21T09:42:48.8429499Z add.s32 %r60, %r578, %r580; 2026-02-21T09:42:48.8429666Z xor.b32 %r581, %r576, 48; 2026-02-21T09:42:48.8429818Z add.s32 %r61, %r578, %r581; 2026-02-21T09:42:48.8429982Z xor.b32 %r582, %r576, 64; 2026-02-21T09:42:48.8430134Z add.s32 %r62, %r578, %r582; 2026-02-21T09:42:48.8430299Z xor.b32 %r583, %r576, 80; 2026-02-21T09:42:48.8430464Z add.s32 %r63, %r578, %r583; 2026-02-21T09:42:48.8430630Z xor.b32 %r584, %r576, 96; 2026-02-21T09:42:48.8430790Z add.s32 %r64, %r578, %r584; 2026-02-21T09:42:48.8430950Z xor.b32 %r585, %r576, 112; 2026-02-21T09:42:48.8431113Z add.s32 %r65, %r578, %r585; 2026-02-21T09:42:48.8431302Z shl.b32 %r586, %r8, 7; 2026-02-21T09:42:48.8431467Z shl.b32 %r587, %r41, 4; 2026-02-21T09:42:48.8431617Z shr.u32 %r588, %r8, 1; 2026-02-21T09:42:48.8431777Z or.b32 %r589, %r586, %r587; 2026-02-21T09:42:48.8431936Z xor.b32 %r590, %r589, %r588; 2026-02-21T09:42:48.8432104Z add.s32 %r949, %r578, %r590; 2026-02-21T09:42:48.8432269Z add.s32 %r954, %r949, 512; 2026-02-21T09:42:48.8432426Z add.s32 %r959, %r949, 1024; 2026-02-21T09:42:48.8432589Z add.s32 %r964, %r949, 1536; 2026-02-21T09:42:48.8432743Z add.s32 %r969, %r949, 2048; 2026-02-21T09:42:48.8432905Z add.s32 %r974, %r949, 2560; 2026-02-21T09:42:48.8433057Z add.s32 %r979, %r949, 3072; 2026-02-21T09:42:48.8433220Z add.s32 %r984, %r949, 3584; 2026-02-21T09:42:48.8433382Z add.s32 %r1708, %r128, 229424; 2026-02-21T09:42:48.8433560Z mov.pred %p127, -1; 2026-02-21T09:42:48.8433712Z mov.b32 %r1711, 5; 2026-02-21T09:42:48.8433863Z mov.b32 %r1707, 0; 2026-02-21T09:42:48.8434010Z mov.b32 %r1705, 1; 2026-02-21T09:42:48.8434149Z mov.b32 %r1704, 2; 2026-02-21T09:42:48.8434294Z mov.b32 %r1703, 3; 2026-02-21T09:42:48.8434428Z mov.b32 %r1702, 4; 2026-02-21T09:42:48.8434623Z mov.b32 %r1695, %r1694; 2026-02-21T09:42:48.8434831Z mov.b32 %r1696, %r1694; 2026-02-21T09:42:48.8435002Z mov.b32 
%r1697, %r1694; 2026-02-21T09:42:48.8435164Z mov.b32 %r1699, %r1698; 2026-02-21T09:42:48.8435332Z mov.b32 %r1700, %r1698; 2026-02-21T09:42:48.8435493Z mov.b32 %r1701, %r1698; 2026-02-21T09:42:48.8435660Z mov.b32 %r1709, %r1707; 2026-02-21T09:42:48.8435830Z mov.b32 %r1710, %r1707; 2026-02-21T09:42:48.8435993Z mov.b32 %r1712, %r1705; 2026-02-21T09:42:48.8436161Z mov.b32 %r1713, %r1707; 2026-02-21T09:42:48.8436332Z mov.b32 %r1718, %r1694; 2026-02-21T09:42:48.8436490Z mov.b32 %r1719, %r1698; 2026-02-21T09:42:48.8436640Z mov.b32 %r1721, %r1711; 2026-02-21T09:42:48.8436799Z mov.b32 %r1722, %r1707; 2026-02-21T09:42:48.8436948Z mov.b32 %r1723, %r1719; 2026-02-21T09:42:48.8437109Z mov.b32 %r1724, %r1718; 2026-02-21T09:42:48.8437258Z bra.uni $L__BB0_4; 2026-02-21T09:42:48.8437493Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8437743Z selp.b32 %r1712, 0, %r652, %p112; 2026-02-21T09:42:48.8437924Z selp.b32 %r653, 1, 0, %p112; 2026-02-21T09:42:48.8438101Z xor.b32 %r1713, %r1731, %r653; 2026-02-21T09:42:48.8438383Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8438693Z add.s32 %r1722, %r1722, 1; 2026-02-21T09:42:48.8438863Z setp.lt.s32 %p116, %r1722, %r56; 2026-02-21T09:42:48.8439051Z mov.b32 %r1694, %r1718; 2026-02-21T09:42:48.8439206Z mov.b32 %r1697, %r76; 2026-02-21T09:42:48.8439404Z mov.b32 %r1698, %r1719; 2026-02-21T09:42:48.8439572Z mov.b32 %r1701, %r80; 2026-02-21T09:42:48.8439719Z mov.b32 %r1702, %r1721; 2026-02-21T09:42:48.8439874Z mov.b32 %r1705, %r84; 2026-02-21T09:42:48.8440019Z mov.b32 %r1707, %r1731; 2026-02-21T09:42:48.8440174Z mov.b32 %r1708, %r1730; 2026-02-21T09:42:48.8440324Z mov.b32 %r1718, %r1724; 2026-02-21T09:42:48.8440477Z mov.b32 %r1719, %r1723; 2026-02-21T09:42:48.8440625Z mov.b32 %r1721, %r103; 2026-02-21T09:42:48.8440784Z @%p116 bra $L__BB0_4; 2026-02-21T09:42:48.8440932Z bra.uni $L__BB0_11; 2026-02-21T09:42:48.8441137Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 
2026-02-21T09:42:48.8441480Z .loc 1 0 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:0:84 2026-02-21T09:42:48.8441773Z mov.b32 %r1731, %r1713; 2026-02-21T09:42:48.8441932Z mov.b32 %r84, %r1704; 2026-02-21T09:42:48.8442080Z mov.b32 %r1704, %r1703; 2026-02-21T09:42:48.8442237Z mov.b32 %r1703, %r1702; 2026-02-21T09:42:48.8442385Z mov.b32 %r80, %r1700; 2026-02-21T09:42:48.8442536Z mov.b32 %r1700, %r1699; 2026-02-21T09:42:48.8442684Z mov.b32 %r1699, %r1698; 2026-02-21T09:42:48.8442838Z mov.b32 %r76, %r1696; 2026-02-21T09:42:48.8442991Z mov.b32 %r1696, %r1695; 2026-02-21T09:42:48.8443139Z mov.b32 %r1695, %r1694; 2026-02-21T09:42:48.8443431Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8443737Z add.s32 %r591, %r1721, 1; 2026-02-21T09:42:48.8443908Z setp.eq.b32 %p93, %r1721, 31; 2026-02-21T09:42:48.8444082Z selp.b32 %r103, 0, %r591, %p93; 2026-02-21T09:42:48.8444264Z setp.ne.b32 %p94, %r103, 0; 2026-02-21T09:42:48.8444428Z @%p94 bra $L__BB0_6; 2026-02-21T09:42:48.8444637Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8444917Z add.s32 %r1729, %r1729, 1; 2026-02-21T09:42:48.8445213Z .loc 1 36 35 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:36:35 2026-02-21T09:42:48.8445541Z shr.s32 %r592, %r1729, 31; 2026-02-21T09:42:48.8445711Z shr.u32 %r593, %r592, 24; 2026-02-21T09:42:48.8445888Z add.s32 %r594, %r1729, %r593; 2026-02-21T09:42:48.8446066Z shr.s32 %r595, %r594, 8; 2026-02-21T09:42:48.8446372Z .loc 1 37 33 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:37:33 2026-02-21T09:42:48.8446666Z shl.b32 %r596, %r595, 6; 2026-02-21T09:42:48.8446983Z .loc 1 38 39 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:38:39 2026-02-21T09:42:48.8447286Z sub.s32 %r597, 48, %r596; 2026-02-21T09:42:48.8447556Z .loc 1 38 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:38:52 2026-02-21T09:42:48.8447854Z min.s32 %r598, %r597, 64; 2026-02-21T09:42:48.8448121Z .loc 1 
39 45 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:45 2026-02-21T09:42:48.8448434Z and.b32 %r599, %r594, -256; 2026-02-21T09:42:48.8448598Z sub.s32 %r600, %r1729, %r599; 2026-02-21T09:42:48.8448884Z .loc 1 40 51 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:40:51 2026-02-21T09:42:48.8449188Z div.s32 %r601, %r600, %r598; 2026-02-21T09:42:48.8449496Z .loc 1 39 64 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:64 2026-02-21T09:42:48.8449815Z mul.lo.s32 %r602, %r601, %r598; 2026-02-21T09:42:48.8449992Z sub.s32 %r603, %r600, %r602; 2026-02-21T09:42:48.8450269Z .loc 1 39 30 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:39:30 2026-02-21T09:42:48.8450562Z add.s32 %r604, %r603, %r596; 2026-02-21T09:42:48.8450839Z .loc 1 41 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:41:27 2026-02-21T09:42:48.8451133Z shl.b32 %r1723, %r604, 8; 2026-02-21T09:42:48.8451398Z .loc 1 43 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:43:27 2026-02-21T09:42:48.8451722Z shl.b32 %r1724, %r601, 8; 2026-02-21T09:42:48.8451988Z .loc 1 44 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:44:32 2026-02-21T09:42:48.8452289Z or.b32 %r1725, %r1724, %r4; 2026-02-21T09:42:48.8452449Z or.b32 %r1726, %r1724, %r5; 2026-02-21T09:42:48.8452617Z or.b32 %r1727, %r1724, %r6; 2026-02-21T09:42:48.8452785Z or.b32 %r1728, %r1724, %r7; 2026-02-21T09:42:48.8452986Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8453327Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8453619Z add.s32 %r607, %r1710, 1; 2026-02-21T09:42:48.8453786Z setp.gt.s32 %p96, %r607, 5; 2026-02-21T09:42:48.8453956Z selp.b32 %r1710, 0, %r607, %p96; 2026-02-21T09:42:48.8454140Z selp.b32 %r608, 1, 0, %p96; 2026-02-21T09:42:48.8454303Z xor.b32 %r1709, %r1709, %r608; 2026-02-21T09:42:48.8454591Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 
2026-02-21T09:42:48.8454978Z cp.async.wait_group 4; 2026-02-21T09:42:48.8455148Z bar.sync 0; 2026-02-21T09:42:48.8455420Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8455771Z shl.b32 %r609, %r1710, 3; 2026-02-21T09:42:48.8455951Z add.s32 %r611, %r128, %r609; 2026-02-21T09:42:48.8456126Z add.s32 %r605, %r611, 229376; 2026-02-21T09:42:48.8456308Z // begin inline asm 2026-02-21T09:42:48.8456459Z 2026-02-21T09:42:48.8456582Z { 2026-02-21T09:42:48.8456712Z .reg .pred complete; 2026-02-21T09:42:48.8456860Z waitLoop: 2026-02-21T09:42:48.8457065Z mbarrier.try_wait.parity.shared.b64 complete, [%r605], %r1709; 2026-02-21T09:42:48.8457313Z @!complete bra.uni waitLoop; 2026-02-21T09:42:48.8457476Z } 2026-02-21T09:42:48.8457543Z 2026-02-21T09:42:48.8457602Z // end inline asm 2026-02-21T09:42:48.8457750Z shl.b32 %r612, %r1712, 3; 2026-02-21T09:42:48.8457908Z add.s32 %r613, %r128, %r612; 2026-02-21T09:42:48.8458078Z add.s32 %r1730, %r613, 229424; 2026-02-21T09:42:48.8458362Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8458652Z @%p73 bra $L__BB0_8; 2026-02-21T09:42:48.8458851Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8459181Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8459510Z shl.b32 %r622, %r1710, 14; 2026-02-21T09:42:48.8459671Z add.s32 %r624, %r128, %r622; 2026-02-21T09:42:48.8459838Z add.s32 %r625, %r624, 98304; 2026-02-21T09:42:48.8460108Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8460403Z elect.sync %r626|%p98, -1; 2026-02-21T09:42:48.8460580Z bfe.u32 %r627, %r624, 4, 14; 2026-02-21T09:42:48.8460743Z cvt.u64.u32 %rd95, %r627; 2026-02-21T09:42:48.8460932Z or.b64 %rd86, %rd95, -9223371899348713472; 2026-02-21T09:42:48.8461124Z bfe.u32 %r628, %r625, 4, 14; 2026-02-21T09:42:48.8461292Z cvt.u64.u32 %rd96, %r628; 2026-02-21T09:42:48.8461462Z 
or.b64 %rd87, %rd96, -9223371899348713472; 2026-02-21T09:42:48.8461655Z mov.b32 %r615, 138412048; 2026-02-21T09:42:48.8461807Z // begin inline asm 2026-02-21T09:42:48.8462081Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 0 ], %rd86, %rd87, %r615, %p127; 2026-02-21T09:42:48.8462358Z // end inline asm 2026-02-21T09:42:48.8462500Z add.s32 %r629, %r624, 32; 2026-02-21T09:42:48.8462663Z bfe.u32 %r630, %r629, 4, 14; 2026-02-21T09:42:48.8462822Z cvt.u64.u32 %rd97, %r630; 2026-02-21T09:42:48.8462996Z or.b64 %rd88, %rd97, -9223371899348713472; 2026-02-21T09:42:48.8463180Z add.s32 %r631, %r624, 98336; 2026-02-21T09:42:48.8463346Z bfe.u32 %r632, %r631, 4, 14; 2026-02-21T09:42:48.8463506Z cvt.u64.u32 %rd98, %r632; 2026-02-21T09:42:48.8463680Z or.b64 %rd89, %rd98, -9223371899348713472; 2026-02-21T09:42:48.8463902Z mov.pred %p99, -1; 2026-02-21T09:42:48.8464050Z // begin inline asm 2026-02-21T09:42:48.8464288Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 0 ], %rd88, %rd89, %r615, %p99; 2026-02-21T09:42:48.8464551Z // end inline asm 2026-02-21T09:42:48.8464736Z add.s32 %r633, %r624, 8192; 2026-02-21T09:42:48.8464899Z bfe.u32 %r634, %r633, 4, 14; 2026-02-21T09:42:48.8465069Z cvt.u64.u32 %rd99, %r634; 2026-02-21T09:42:48.8465245Z or.b64 %rd90, %rd99, -9223371899348713472; 2026-02-21T09:42:48.8465451Z // begin inline asm 2026-02-21T09:42:48.8465709Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 256 ], %rd90, %rd87, %r615, %p127; 2026-02-21T09:42:48.8465999Z // end inline asm 2026-02-21T09:42:48.8466159Z add.s32 %r635, %r624, 8224; 2026-02-21T09:42:48.8466330Z bfe.u32 %r636, %r635, 4, 14; 2026-02-21T09:42:48.8466510Z cvt.u64.u32 %rd100, %r636; 2026-02-21T09:42:48.8466696Z or.b64 %rd92, %rd100, -9223371899348713472; 2026-02-21T09:42:48.8466908Z // begin inline asm 2026-02-21T09:42:48.8467156Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r1692 + 256 ], %rd92, %rd89, %r615, %p99; 2026-02-21T09:42:48.8467449Z // end inline asm 2026-02-21T09:42:48.8467611Z cvt.u64.u32 
%rd94, %r1730; 2026-02-21T09:42:48.8467780Z // begin inline asm 2026-02-21T09:42:48.8468077Z @%p98 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd94]; 2026-02-21T09:42:48.8468346Z // end inline asm 2026-02-21T09:42:48.8468554Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8468916Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8469263Z setp.eq.b32 %p108, %r103, 0; 2026-02-21T09:42:48.8469459Z setp.lt.s32 %p109, %r1722, %r57; 2026-02-21T09:42:48.8469771Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8470098Z // begin inline asm 2026-02-21T09:42:48.8470250Z 2026-02-21T09:42:48.8470389Z { 2026-02-21T09:42:48.8470530Z .reg .pred complete; 2026-02-21T09:42:48.8470700Z waitLoop: 2026-02-21T09:42:48.8470919Z mbarrier.try_wait.parity.shared.b64 complete, [%r1708], %r1707; 2026-02-21T09:42:48.8471211Z @!complete bra.uni waitLoop; 2026-02-21T09:42:48.8471393Z } 2026-02-21T09:42:48.8471474Z 2026-02-21T09:42:48.8471536Z // end inline asm 2026-02-21T09:42:48.8471702Z add.s32 %r652, %r1712, 1; 2026-02-21T09:42:48.8471878Z setp.gt.s32 %p112, %r652, 1; 2026-02-21T09:42:48.8472219Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8472536Z add.s32 %r654, %r1706, 32; 2026-02-21T09:42:48.8472715Z add.s32 %r655, %r1711, 1; 2026-02-21T09:42:48.8472888Z setp.gt.s32 %p113, %r655, 5; 2026-02-21T09:42:48.8473075Z selp.b32 %r1711, 0, %r655, %p113; 2026-02-21T09:42:48.8473277Z selp.b32 %r1706, 0, %r654, %p108; 2026-02-21T09:42:48.8473587Z .loc 1 50 35 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:50:35 2026-02-21T09:42:48.8473910Z add.s32 %r656, %r1706, %r45; 2026-02-21T09:42:48.8474201Z .loc 1 54 53 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:53 2026-02-21T09:42:48.8474529Z shl.b32 %r657, %r1725, 10; 2026-02-21T09:42:48.8474735Z shl.b32 %r658, %r1726, 10; 2026-02-21T09:42:48.8474916Z 
shl.b32 %r659, %r1727, 10; 2026-02-21T09:42:48.8475121Z shl.b32 %r660, %r1728, 10; 2026-02-21T09:42:48.8475402Z .loc 1 54 60 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:60 2026-02-21T09:42:48.8475717Z add.s32 %r661, %r657, %r656; 2026-02-21T09:42:48.8475891Z add.s32 %r662, %r658, %r656; 2026-02-21T09:42:48.8476070Z add.s32 %r663, %r659, %r656; 2026-02-21T09:42:48.8476239Z add.s32 %r664, %r660, %r656; 2026-02-21T09:42:48.8476537Z .loc 1 54 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:32 2026-02-21T09:42:48.8476871Z mad.wide.s32 %rd101, %r661, 2, %rd6; 2026-02-21T09:42:48.8477113Z mad.wide.s32 %rd102, %r662, 2, %rd6; 2026-02-21T09:42:48.8477317Z mad.wide.s32 %rd103, %r663, 2, %rd6; 2026-02-21T09:42:48.8477511Z mad.wide.s32 %rd104, %r664, 2, %rd6; 2026-02-21T09:42:48.8477823Z .loc 1 54 85 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:54:85 2026-02-21T09:42:48.8478112Z shl.b32 %r665, %r1711, 14; 2026-02-21T09:42:48.8478275Z add.s32 %r667, %r128, %r665; 2026-02-21T09:42:48.8478429Z bar.sync 0; 2026-02-21T09:42:48.8478574Z add.s32 %r639, %r667, %r55; 2026-02-21T09:42:48.8478734Z selp.b32 %r640, 16, 0, %p109; 2026-02-21T09:42:48.8478903Z // begin inline asm 2026-02-21T09:42:48.8479117Z cp.async.cg.shared.global [ %r639 + 0 ], [ %rd101 + 0 ], 0x10, %r640; 2026-02-21T09:42:48.8479350Z // end inline asm 2026-02-21T09:42:48.8479500Z add.s32 %r641, %r639, 4096; 2026-02-21T09:42:48.8479656Z // begin inline asm 2026-02-21T09:42:48.8479864Z cp.async.cg.shared.global [ %r641 + 0 ], [ %rd102 + 0 ], 0x10, %r640; 2026-02-21T09:42:48.8480100Z // end inline asm 2026-02-21T09:42:48.8480246Z add.s32 %r643, %r639, 8192; 2026-02-21T09:42:48.8480400Z // begin inline asm 2026-02-21T09:42:48.8480607Z cp.async.cg.shared.global [ %r643 + 0 ], [ %rd103 + 0 ], 0x10, %r640; 2026-02-21T09:42:48.8480843Z // end inline asm 2026-02-21T09:42:48.8480985Z add.s32 %r645, %r639, 12288; 2026-02-21T09:42:48.8481178Z // begin inline asm 
2026-02-21T09:42:48.8481385Z cp.async.cg.shared.global [ %r645 + 0 ], [ %rd104 + 0 ], 0x10, %r640; 2026-02-21T09:42:48.8481627Z // end inline asm 2026-02-21T09:42:48.8481774Z cp.async.commit_group; 2026-02-21T09:42:48.8482050Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8482353Z shl.b32 %r668, %r1711, 3; 2026-02-21T09:42:48.8482537Z add.s32 %r669, %r128, %r668; 2026-02-21T09:42:48.8482711Z add.s32 %r651, %r669, 229376; 2026-02-21T09:42:48.8482905Z and.pred %p106, %p118, %p109; 2026-02-21T09:42:48.8483091Z // begin inline asm 2026-02-21T09:42:48.8483310Z @%p106 mbarrier.arrive.expect_tx.shared.b64 _, [%r651], 16384; 2026-02-21T09:42:48.8483571Z // end inline asm 2026-02-21T09:42:48.8483842Z .loc 1 55 44 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:55:44 2026-02-21T09:42:48.8484165Z add.s32 %r648, %r667, 98304; 2026-02-21T09:42:48.8484335Z bar.sync 0; 2026-02-21T09:42:48.8484500Z elect.sync %r670|%p114, -1; 2026-02-21T09:42:48.8484750Z and.pred %p115, %p109, %p114; 2026-02-21T09:42:48.8484979Z and.pred %p107, %p3, %p115; 2026-02-21T09:42:48.8485158Z // begin inline asm 2026-02-21T09:42:48.8485542Z @%p107 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r648], [%rd30, {%r1706, %r1723}], [%r651]; 2026-02-21T09:42:48.8485958Z // end inline asm 2026-02-21T09:42:48.8486207Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8486516Z setp.ne.b32 %p127, %r1705, 31; 2026-02-21T09:42:48.8486691Z @%p127 bra $L__BB0_10; 2026-02-21T09:42:48.8486895Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:42:48.8487229Z .loc 1 42 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:42:32 2026-02-21T09:42:48.8487524Z add.s32 %r1233, %r1701, %r42; 2026-02-21T09:42:48.8487834Z .loc 1 44 32 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:44:32 2026-02-21T09:42:48.8488126Z add.s32 %r1234, %r1697, %r9; 
2026-02-21T09:42:48.8488295Z add.s32 %r1235, %r10, %r1697; 2026-02-21T09:42:48.8488455Z add.s32 %r1236, %r11, %r1697; 2026-02-21T09:42:48.8488619Z add.s32 %r1237, %r12, %r1697; 2026-02-21T09:42:48.8488782Z add.s32 %r1238, %r13, %r1697; 2026-02-21T09:42:48.8488940Z add.s32 %r1239, %r14, %r1697; 2026-02-21T09:42:48.8489105Z add.s32 %r1240, %r15, %r1697; 2026-02-21T09:42:48.8489260Z add.s32 %r1241, %r16, %r1697; 2026-02-21T09:42:48.8489423Z add.s32 %r1242, %r17, %r1697; 2026-02-21T09:42:48.8489607Z add.s32 %r1243, %r18, %r1697; 2026-02-21T09:42:48.8489770Z add.s32 %r1244, %r19, %r1697; 2026-02-21T09:42:48.8489928Z add.s32 %r1245, %r20, %r1697; 2026-02-21T09:42:48.8490090Z add.s32 %r1246, %r21, %r1697; 2026-02-21T09:42:48.8490247Z add.s32 %r1247, %r22, %r1697; 2026-02-21T09:42:48.8490413Z add.s32 %r1248, %r23, %r1697; 2026-02-21T09:42:48.8490579Z add.s32 %r1249, %r24, %r1697; 2026-02-21T09:42:48.8490735Z add.s32 %r1250, %r25, %r1697; 2026-02-21T09:42:48.8490902Z add.s32 %r1251, %r26, %r1697; 2026-02-21T09:42:48.8491060Z add.s32 %r1252, %r27, %r1697; 2026-02-21T09:42:48.8491225Z add.s32 %r1253, %r28, %r1697; 2026-02-21T09:42:48.8491383Z add.s32 %r1254, %r29, %r1697; 2026-02-21T09:42:48.8491549Z add.s32 %r1255, %r30, %r1697; 2026-02-21T09:42:48.8491706Z add.s32 %r1256, %r31, %r1697; 2026-02-21T09:42:48.8491872Z add.s32 %r1257, %r32, %r1697; 2026-02-21T09:42:48.8492036Z add.s32 %r1258, %r33, %r1697; 2026-02-21T09:42:48.8492195Z add.s32 %r1259, %r34, %r1697; 2026-02-21T09:42:48.8492367Z add.s32 %r1260, %r35, %r1697; 2026-02-21T09:42:48.8492527Z add.s32 %r1261, %r36, %r1697; 2026-02-21T09:42:48.8492693Z add.s32 %r1262, %r37, %r1697; 2026-02-21T09:42:48.8492850Z add.s32 %r1263, %r38, %r1697; 2026-02-21T09:42:48.8493017Z add.s32 %r1264, %r39, %r1697; 2026-02-21T09:42:48.8493198Z add.s32 %r1265, %r40, %r1697; 2026-02-21T09:42:48.8493476Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8493776Z // begin inline asm 
2026-02-21T09:42:48.8493915Z 2026-02-21T09:42:48.8494037Z { 2026-02-21T09:42:48.8494161Z .reg .pred complete; 2026-02-21T09:42:48.8494319Z waitLoop: 2026-02-21T09:42:48.8494522Z mbarrier.try_wait.parity.shared.b64 complete, [%r1730], %r1731; 2026-02-21T09:42:48.8494830Z @!complete bra.uni waitLoop; 2026-02-21T09:42:48.8495003Z } 2026-02-21T09:42:48.8495086Z 2026-02-21T09:42:48.8495148Z // end inline asm 2026-02-21T09:42:48.8495429Z .loc 1 59 53 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:59:53 2026-02-21T09:42:48.8495784Z mad.lo.s32 %r1266, %r1234, 12288, %r1233; 2026-02-21T09:42:48.8496008Z mad.lo.s32 %r1267, %r1235, 12288, %r1233; 2026-02-21T09:42:48.8496225Z mad.lo.s32 %r1268, %r1236, 12288, %r1233; 2026-02-21T09:42:48.8496422Z mad.lo.s32 %r1269, %r1237, 12288, %r1233; 2026-02-21T09:42:48.8496613Z mad.lo.s32 %r1270, %r1238, 12288, %r1233; 2026-02-21T09:42:48.8496805Z mad.lo.s32 %r1271, %r1239, 12288, %r1233; 2026-02-21T09:42:48.8497039Z mad.lo.s32 %r1272, %r1240, 12288, %r1233; 2026-02-21T09:42:48.8497253Z mad.lo.s32 %r1273, %r1241, 12288, %r1233; 2026-02-21T09:42:48.8497463Z mad.lo.s32 %r1274, %r1242, 12288, %r1233; 2026-02-21T09:42:48.8497668Z mad.lo.s32 %r1275, %r1243, 12288, %r1233; 2026-02-21T09:42:48.8497882Z mad.lo.s32 %r1276, %r1244, 12288, %r1233; 2026-02-21T09:42:48.8498089Z mad.lo.s32 %r1277, %r1245, 12288, %r1233; 2026-02-21T09:42:48.8498302Z mad.lo.s32 %r1278, %r1246, 12288, %r1233; 2026-02-21T09:42:48.8498508Z mad.lo.s32 %r1279, %r1247, 12288, %r1233; 2026-02-21T09:42:48.8498717Z mad.lo.s32 %r1280, %r1248, 12288, %r1233; 2026-02-21T09:42:48.8498920Z mad.lo.s32 %r1281, %r1249, 12288, %r1233; 2026-02-21T09:42:48.8499132Z mad.lo.s32 %r1282, %r1250, 12288, %r1233; 2026-02-21T09:42:48.8499342Z mad.lo.s32 %r1283, %r1251, 12288, %r1233; 2026-02-21T09:42:48.8499580Z mad.lo.s32 %r1284, %r1252, 12288, %r1233; 2026-02-21T09:42:48.8499809Z mad.lo.s32 %r1285, %r1253, 12288, %r1233; 2026-02-21T09:42:48.8500013Z mad.lo.s32 %r1286, %r1254, 12288, 
%r1233; 2026-02-21T09:42:48.8500237Z mad.lo.s32 %r1287, %r1255, 12288, %r1233; 2026-02-21T09:42:48.8500453Z mad.lo.s32 %r1288, %r1256, 12288, %r1233; 2026-02-21T09:42:48.8500674Z mad.lo.s32 %r1289, %r1257, 12288, %r1233; 2026-02-21T09:42:48.8500896Z mad.lo.s32 %r1290, %r1258, 12288, %r1233; 2026-02-21T09:42:48.8501124Z mad.lo.s32 %r1291, %r1259, 12288, %r1233; 2026-02-21T09:42:48.8501353Z mad.lo.s32 %r1292, %r1260, 12288, %r1233; 2026-02-21T09:42:48.8501605Z mad.lo.s32 %r1293, %r1261, 12288, %r1233; 2026-02-21T09:42:48.8501836Z mad.lo.s32 %r1294, %r1262, 12288, %r1233; 2026-02-21T09:42:48.8502058Z mad.lo.s32 %r1295, %r1263, 12288, %r1233; 2026-02-21T09:42:48.8502284Z mad.lo.s32 %r1296, %r1264, 12288, %r1233; 2026-02-21T09:42:48.8502508Z mad.lo.s32 %r1297, %r1265, 12288, %r1233; 2026-02-21T09:42:48.8502853Z .loc 1 59 24 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:59:24 2026-02-21T09:42:48.8503202Z mad.wide.s32 %rd106, %r1266, 2, %rd7; 2026-02-21T09:42:48.8503403Z mad.wide.s32 %rd107, %r1267, 2, %rd7; 2026-02-21T09:42:48.8503597Z mad.wide.s32 %rd108, %r1268, 2, %rd7; 2026-02-21T09:42:48.8503775Z mad.wide.s32 %rd109, %r1269, 2, %rd7; 2026-02-21T09:42:48.8503961Z mad.wide.s32 %rd110, %r1270, 2, %rd7; 2026-02-21T09:42:48.8504135Z mad.wide.s32 %rd111, %r1271, 2, %rd7; 2026-02-21T09:42:48.8504315Z mad.wide.s32 %rd112, %r1272, 2, %rd7; 2026-02-21T09:42:48.8504486Z mad.wide.s32 %rd113, %r1273, 2, %rd7; 2026-02-21T09:42:48.8504710Z mad.wide.s32 %rd114, %r1274, 2, %rd7; 2026-02-21T09:42:48.8504903Z mad.wide.s32 %rd115, %r1275, 2, %rd7; 2026-02-21T09:42:48.8505104Z mad.wide.s32 %rd116, %r1276, 2, %rd7; 2026-02-21T09:42:48.8505302Z mad.wide.s32 %rd117, %r1277, 2, %rd7; 2026-02-21T09:42:48.8505401Z mad.wide.s32 %rd118, %r1278, 2, %rd7; 2026-02-21T09:42:48.8505471Z mad.wide.s32 %rd119, %r1279, 2, %rd7; 2026-02-21T09:42:48.8505540Z mad.wide.s32 %rd120, %r1280, 2, %rd7; 2026-02-21T09:42:48.8505617Z mad.wide.s32 %rd121, %r1281, 2, %rd7; 2026-02-21T09:42:48.8505685Z 
mad.wide.s32 %rd122, %r1282, 2, %rd7; 2026-02-21T09:42:48.8505753Z mad.wide.s32 %rd123, %r1283, 2, %rd7; 2026-02-21T09:42:48.8505829Z mad.wide.s32 %rd124, %r1284, 2, %rd7; 2026-02-21T09:42:48.8505897Z mad.wide.s32 %rd125, %r1285, 2, %rd7; 2026-02-21T09:42:48.8505965Z mad.wide.s32 %rd126, %r1286, 2, %rd7; 2026-02-21T09:42:48.8506041Z mad.wide.s32 %rd127, %r1287, 2, %rd7; 2026-02-21T09:42:48.8506110Z mad.wide.s32 %rd128, %r1288, 2, %rd7; 2026-02-21T09:42:48.8506187Z mad.wide.s32 %rd129, %r1289, 2, %rd7; 2026-02-21T09:42:48.8506250Z mad.wide.s32 %rd130, %r1290, 2, %rd7; 2026-02-21T09:42:48.8506319Z mad.wide.s32 %rd131, %r1291, 2, %rd7; 2026-02-21T09:42:48.8506381Z mad.wide.s32 %rd132, %r1292, 2, %rd7; 2026-02-21T09:42:48.8506445Z mad.wide.s32 %rd133, %r1293, 2, %rd7; 2026-02-21T09:42:48.8506516Z mad.wide.s32 %rd134, %r1294, 2, %rd7; 2026-02-21T09:42:48.8506580Z mad.wide.s32 %rd135, %r1295, 2, %rd7; 2026-02-21T09:42:48.8506675Z mad.wide.s32 %rd136, %r1296, 2, %rd7; 2026-02-21T09:42:48.8506739Z mad.wide.s32 %rd137, %r1297, 2, %rd7; 2026-02-21T09:42:48.8506922Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8506981Z // begin inline asm 2026-02-21T09:42:48.8507287Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r673, %r674, %r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688}, [%r137 + 0]; 2026-02-21T09:42:48.8507354Z // end inline asm 2026-02-21T09:42:48.8507413Z // begin inline asm 2026-02-21T09:42:48.8507706Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r690, %r691, %r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705}, [%r137 + 16]; 2026-02-21T09:42:48.8507771Z // end inline asm 2026-02-21T09:42:48.8507859Z // begin inline asm 2026-02-21T09:42:48.8508143Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r707, %r708, %r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722}, [%r137 + 32]; 
2026-02-21T09:42:48.8508208Z // end inline asm 2026-02-21T09:42:48.8508264Z // begin inline asm 2026-02-21T09:42:48.8508549Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r724, %r725, %r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739}, [%r137 + 48]; 2026-02-21T09:42:48.8508614Z // end inline asm 2026-02-21T09:42:48.8508672Z // begin inline asm 2026-02-21T09:42:48.8508954Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r741, %r742, %r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756}, [%r137 + 64]; 2026-02-21T09:42:48.8509056Z // end inline asm 2026-02-21T09:42:48.8509121Z // begin inline asm 2026-02-21T09:42:48.8509398Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r758, %r759, %r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773}, [%r137 + 80]; 2026-02-21T09:42:48.8509457Z // end inline asm 2026-02-21T09:42:48.8509523Z // begin inline asm 2026-02-21T09:42:48.8509801Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r775, %r776, %r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790}, [%r137 + 96]; 2026-02-21T09:42:48.8509857Z // end inline asm 2026-02-21T09:42:48.8509919Z // begin inline asm 2026-02-21T09:42:48.8510196Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r792, %r793, %r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807}, [%r137 + 112]; 2026-02-21T09:42:48.8510253Z // end inline asm 2026-02-21T09:42:48.8510317Z // begin inline asm 2026-02-21T09:42:48.8510595Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r809, %r810, %r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824}, [%r137 + 128]; 2026-02-21T09:42:48.8510675Z // end inline asm 2026-02-21T09:42:48.8510734Z // begin inline asm 2026-02-21T09:42:48.8511024Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r826, %r827, %r828, %r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, 
%r837, %r838, %r839, %r840, %r841}, [%r137 + 144]; 2026-02-21T09:42:48.8511082Z // end inline asm 2026-02-21T09:42:48.8511139Z // begin inline asm 2026-02-21T09:42:48.8511436Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r843, %r844, %r845, %r846, %r847, %r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858}, [%r137 + 160]; 2026-02-21T09:42:48.8511492Z // end inline asm 2026-02-21T09:42:48.8511549Z // begin inline asm 2026-02-21T09:42:48.8511838Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r860, %r861, %r862, %r863, %r864, %r865, %r866, %r867, %r868, %r869, %r870, %r871, %r872, %r873, %r874, %r875}, [%r137 + 176]; 2026-02-21T09:42:48.8511893Z // end inline asm 2026-02-21T09:42:48.8511951Z // begin inline asm 2026-02-21T09:42:48.8512255Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r877, %r878, %r879, %r880, %r881, %r882, %r883, %r884, %r885, %r886, %r887, %r888, %r889, %r890, %r891, %r892}, [%r137 + 192]; 2026-02-21T09:42:48.8512335Z // end inline asm 2026-02-21T09:42:48.8512392Z // begin inline asm 2026-02-21T09:42:48.8512663Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r894, %r895, %r896, %r897, %r898, %r899, %r900, %r901, %r902, %r903, %r904, %r905, %r906, %r907, %r908, %r909}, [%r137 + 208]; 2026-02-21T09:42:48.8512726Z // end inline asm 2026-02-21T09:42:48.8512782Z // begin inline asm 2026-02-21T09:42:48.8513058Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r911, %r912, %r913, %r914, %r915, %r916, %r917, %r918, %r919, %r920, %r921, %r922, %r923, %r924, %r925, %r926}, [%r137 + 224]; 2026-02-21T09:42:48.8513124Z // end inline asm 2026-02-21T09:42:48.8513180Z // begin inline asm 2026-02-21T09:42:48.8513456Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r928, %r929, %r930, %r931, %r932, %r933, %r934, %r935, %r936, %r937, %r938, %r939, %r940, %r941, %r942, %r943}, [%r137 + 240]; 2026-02-21T09:42:48.8513522Z // end inline asm 2026-02-21T09:42:48.8513602Z // begin inline asm 2026-02-21T09:42:48.8513679Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:42:48.8513745Z // 
end inline asm 2026-02-21T09:42:48.8513809Z cvt.u64.u32 %rd138, %r673; 2026-02-21T09:42:48.8513871Z cvt.u64.u32 %rd139, %r674; 2026-02-21T09:42:48.8513935Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:42:48.8514011Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:42:48.8514196Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8514263Z mov.b64 {%r1298, %r1299}, %rd141; 2026-02-21T09:42:48.8514355Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T09:42:48.8514559Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8514625Z cvt.u64.u32 %rd142, %r675; 2026-02-21T09:42:48.8514725Z cvt.u64.u32 %rd143, %r676; 2026-02-21T09:42:48.8514792Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:42:48.8514862Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:42:48.8515049Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8515129Z mov.b64 {%r1301, %r1302}, %rd145; 2026-02-21T09:42:48.8515209Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T09:42:48.8515392Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8515465Z cvt.u64.u32 %rd146, %r677; 2026-02-21T09:42:48.8515530Z cvt.u64.u32 %rd147, %r678; 2026-02-21T09:42:48.8515595Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:42:48.8515661Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:42:48.8515852Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8515921Z mov.b64 {%r1304, %r1305}, %rd149; 2026-02-21T09:42:48.8515996Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 2026-02-21T09:42:48.8516236Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8516301Z cvt.u64.u32 %rd150, %r679; 2026-02-21T09:42:48.8516363Z cvt.u64.u32 %rd151, %r680; 2026-02-21T09:42:48.8516432Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:42:48.8516493Z or.b64 %rd153, %rd150, %rd152; 
2026-02-21T09:42:48.8516662Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8516730Z mov.b64 {%r1307, %r1308}, %rd153; 2026-02-21T09:42:48.8516800Z cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T09:42:48.8516971Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8517032Z cvt.u64.u32 %rd154, %r681; 2026-02-21T09:42:48.8517101Z cvt.u64.u32 %rd155, %r682; 2026-02-21T09:42:48.8517161Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:42:48.8517223Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:42:48.8517404Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8517466Z mov.b64 {%r1310, %r1311}, %rd157; 2026-02-21T09:42:48.8517563Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T09:42:48.8517742Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8517803Z cvt.u64.u32 %rd158, %r683; 2026-02-21T09:42:48.8517863Z cvt.u64.u32 %rd159, %r684; 2026-02-21T09:42:48.8517923Z shl.b64 %rd160, %rd159, 32; 2026-02-21T09:42:48.8517991Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:42:48.8518162Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8518224Z mov.b64 {%r1313, %r1314}, %rd161; 2026-02-21T09:42:48.8518301Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 2026-02-21T09:42:48.8518469Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8518529Z cvt.u64.u32 %rd162, %r685; 2026-02-21T09:42:48.8518624Z cvt.u64.u32 %rd163, %r686; 2026-02-21T09:42:48.8518687Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:42:48.8518749Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:42:48.8518918Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8518986Z mov.b64 {%r1316, %r1317}, %rd165; 2026-02-21T09:42:48.8519054Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T09:42:48.8519226Z 
.loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8519295Z cvt.u64.u32 %rd166, %r687; 2026-02-21T09:42:48.8519352Z cvt.u64.u32 %rd167, %r688; 2026-02-21T09:42:48.8519438Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:42:48.8519498Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:42:48.8519674Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8519734Z mov.b64 {%r1319, %r1320}, %rd169; 2026-02-21T09:42:48.8519803Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T09:42:48.8519978Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8520040Z cvt.u64.u32 %rd170, %r690; 2026-02-21T09:42:48.8520100Z cvt.u64.u32 %rd171, %r691; 2026-02-21T09:42:48.8520165Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:42:48.8520226Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:42:48.8520395Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8520461Z mov.b64 {%r1322, %r1323}, %rd173; 2026-02-21T09:42:48.8520528Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T09:42:48.8520701Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8520761Z cvt.u64.u32 %rd174, %r692; 2026-02-21T09:42:48.8520830Z cvt.u64.u32 %rd175, %r693; 2026-02-21T09:42:48.8520889Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:42:48.8520975Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:42:48.8521155Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8521218Z mov.b64 {%r1325, %r1326}, %rd177; 2026-02-21T09:42:48.8521287Z cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T09:42:48.8521460Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8521520Z cvt.u64.u32 %rd178, %r694; 2026-02-21T09:42:48.8521580Z cvt.u64.u32 %rd179, %r695; 2026-02-21T09:42:48.8521640Z shl.b64 %rd180, %rd179, 32; 
2026-02-21T09:42:48.8521707Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:42:48.8521881Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8521942Z mov.b64 {%r1328, %r1329}, %rd181; 2026-02-21T09:42:48.8522017Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T09:42:48.8522187Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8522246Z cvt.u64.u32 %rd182, %r696; 2026-02-21T09:42:48.8522336Z cvt.u64.u32 %rd183, %r697; 2026-02-21T09:42:48.8522396Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:42:48.8522456Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:42:48.8522627Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8522697Z mov.b64 {%r1331, %r1332}, %rd185; 2026-02-21T09:42:48.8522766Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 2026-02-21T09:42:48.8522938Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8523008Z cvt.u64.u32 %rd186, %r698; 2026-02-21T09:42:48.8523069Z cvt.u64.u32 %rd187, %r699; 2026-02-21T09:42:48.8523129Z shl.b64 %rd188, %rd187, 32; 2026-02-21T09:42:48.8523190Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:42:48.8523373Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8523459Z mov.b64 {%r1334, %r1335}, %rd189; 2026-02-21T09:42:48.8523531Z cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T09:42:48.8523711Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8523772Z cvt.u64.u32 %rd190, %r700; 2026-02-21T09:42:48.8523832Z cvt.u64.u32 %rd191, %r701; 2026-02-21T09:42:48.8523900Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:42:48.8523961Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:42:48.8524130Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8524222Z mov.b64 {%r1337, %r1338}, %rd193; 2026-02-21T09:42:48.8524290Z 
cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T09:42:48.8524456Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8524518Z cvt.u64.u32 %rd194, %r702; 2026-02-21T09:42:48.8524592Z cvt.u64.u32 %rd195, %r703; 2026-02-21T09:42:48.8524658Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:42:48.8524771Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:42:48.8524968Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8525035Z mov.b64 {%r1340, %r1341}, %rd197; 2026-02-21T09:42:48.8525106Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T09:42:48.8525297Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8525360Z cvt.u64.u32 %rd198, %r704; 2026-02-21T09:42:48.8525423Z cvt.u64.u32 %rd199, %r705; 2026-02-21T09:42:48.8525490Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:42:48.8525562Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:42:48.8525743Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8525807Z mov.b64 {%r1343, %r1344}, %rd201; 2026-02-21T09:42:48.8525914Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T09:42:48.8526099Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8526165Z cvt.u64.u32 %rd202, %r707; 2026-02-21T09:42:48.8526243Z cvt.u64.u32 %rd203, %r708; 2026-02-21T09:42:48.8526303Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:42:48.8526364Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:42:48.8526534Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8526602Z mov.b64 {%r1346, %r1347}, %rd205; 2026-02-21T09:42:48.8526670Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T09:42:48.8526837Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8526907Z cvt.u64.u32 %rd206, %r709; 2026-02-21T09:42:48.8526973Z cvt.u64.u32 %rd207, %r710; 
2026-02-21T09:42:48.8527037Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:42:48.8527103Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:42:48.8527294Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8527391Z mov.b64 {%r1349, %r1350}, %rd209; 2026-02-21T09:42:48.8527463Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 2026-02-21T09:42:48.8527651Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8527715Z cvt.u64.u32 %rd210, %r711; 2026-02-21T09:42:48.8527779Z cvt.u64.u32 %rd211, %r712; 2026-02-21T09:42:48.8527849Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:42:48.8527913Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:42:48.8528097Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8528168Z mov.b64 {%r1352, %r1353}, %rd213; 2026-02-21T09:42:48.8528241Z cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T09:42:48.8528427Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8528532Z cvt.u64.u32 %rd214, %r713; 2026-02-21T09:42:48.8528607Z cvt.u64.u32 %rd215, %r714; 2026-02-21T09:42:48.8528672Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:42:48.8528737Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:42:48.8528929Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8528995Z mov.b64 {%r1355, %r1356}, %rd217; 2026-02-21T09:42:48.8529069Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T09:42:48.8529259Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8529352Z cvt.u64.u32 %rd218, %r715; 2026-02-21T09:42:48.8529416Z cvt.u64.u32 %rd219, %r716; 2026-02-21T09:42:48.8529480Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:42:48.8529552Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:42:48.8529742Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8529809Z mov.b64 {%r1358, 
%r1359}, %rd221; 2026-02-21T09:42:48.8529890Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T09:42:48.8530076Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8530139Z cvt.u64.u32 %rd222, %r717; 2026-02-21T09:42:48.8530209Z cvt.u64.u32 %rd223, %r718; 2026-02-21T09:42:48.8530273Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:42:48.8530338Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:42:48.8530524Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8530598Z mov.b64 {%r1361, %r1362}, %rd225; 2026-02-21T09:42:48.8530671Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T09:42:48.8530857Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8530930Z cvt.u64.u32 %rd226, %r719; 2026-02-21T09:42:48.8531017Z cvt.u64.u32 %rd227, %r720; 2026-02-21T09:42:48.8531084Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:42:48.8531149Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:42:48.8531341Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8531407Z mov.b64 {%r1364, %r1365}, %rd229; 2026-02-21T09:42:48.8531480Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T09:42:48.8531680Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8531743Z cvt.u64.u32 %rd230, %r721; 2026-02-21T09:42:48.8531805Z cvt.u64.u32 %rd231, %r722; 2026-02-21T09:42:48.8531879Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:42:48.8531945Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:42:48.8532131Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8532206Z mov.b64 {%r1367, %r1368}, %rd233; 2026-02-21T09:42:48.8532281Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T09:42:48.8532467Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8532577Z cvt.u64.u32 %rd234, %r724; 
2026-02-21T09:42:48.8532648Z cvt.u64.u32 %rd235, %r725; 2026-02-21T09:42:48.8532713Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:42:48.8532778Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:42:48.8532975Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8533037Z mov.b64 {%r1370, %r1371}, %rd237; 2026-02-21T09:42:48.8533106Z cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T09:42:48.8533288Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8533349Z cvt.u64.u32 %rd238, %r726; 2026-02-21T09:42:48.8533409Z cvt.u64.u32 %rd239, %r727; 2026-02-21T09:42:48.8533472Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:42:48.8533541Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:42:48.8533739Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8533803Z mov.b64 {%r1373, %r1374}, %rd241; 2026-02-21T09:42:48.8533879Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T09:42:48.8534049Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8534108Z cvt.u64.u32 %rd242, %r728; 2026-02-21T09:42:48.8534177Z cvt.u64.u32 %rd243, %r729; 2026-02-21T09:42:48.8534240Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:42:48.8534303Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:42:48.8534475Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8534573Z mov.b64 {%r1376, %r1377}, %rd245; 2026-02-21T09:42:48.8534646Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T09:42:48.8534861Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8534938Z cvt.u64.u32 %rd246, %r730; 2026-02-21T09:42:48.8535002Z cvt.u64.u32 %rd247, %r731; 2026-02-21T09:42:48.8535071Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:42:48.8535139Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:42:48.8535331Z .loc 1 58 27 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8535399Z mov.b64 {%r1379, %r1380}, %rd249; 2026-02-21T09:42:48.8535474Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 2026-02-21T09:42:48.8535666Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8535735Z cvt.u64.u32 %rd250, %r732; 2026-02-21T09:42:48.8535802Z cvt.u64.u32 %rd251, %r733; 2026-02-21T09:42:48.8535876Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:42:48.8535943Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:42:48.8536124Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8536245Z mov.b64 {%r1382, %r1383}, %rd253; 2026-02-21T09:42:48.8536314Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T09:42:48.8536487Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8536547Z cvt.u64.u32 %rd254, %r734; 2026-02-21T09:42:48.8536614Z cvt.u64.u32 %rd255, %r735; 2026-02-21T09:42:48.8536675Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:42:48.8536734Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:42:48.8536913Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8536973Z mov.b64 {%r1385, %r1386}, %rd257; 2026-02-21T09:42:48.8537043Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T09:42:48.8537225Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8537286Z cvt.u64.u32 %rd258, %r736; 2026-02-21T09:42:48.8537345Z cvt.u64.u32 %rd259, %r737; 2026-02-21T09:42:48.8537407Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:42:48.8537476Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:42:48.8537672Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8537733Z mov.b64 {%r1388, %r1389}, %rd261; 2026-02-21T09:42:48.8537808Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T09:42:48.8537980Z .loc 1 56 52 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8538041Z cvt.u64.u32 %rd262, %r738; 2026-02-21T09:42:48.8538107Z cvt.u64.u32 %rd263, %r739; 2026-02-21T09:42:48.8538167Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:42:48.8538228Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:42:48.8538399Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8538465Z mov.b64 {%r1391, %r1392}, %rd265; 2026-02-21T09:42:48.8538533Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T09:42:48.8538732Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8538803Z cvt.u64.u32 %rd266, %r741; 2026-02-21T09:42:48.8538862Z cvt.u64.u32 %rd267, %r742; 2026-02-21T09:42:48.8538922Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:42:48.8538983Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:42:48.8539161Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8539220Z mov.b64 {%r1394, %r1395}, %rd269; 2026-02-21T09:42:48.8539299Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T09:42:48.8539471Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8539559Z cvt.u64.u32 %rd270, %r743; 2026-02-21T09:42:48.8539618Z cvt.u64.u32 %rd271, %r744; 2026-02-21T09:42:48.8539683Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:42:48.8539743Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:42:48.8539914Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8539984Z mov.b64 {%r1397, %r1398}, %rd273; 2026-02-21T09:42:48.8540050Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 2026-02-21T09:42:48.8540217Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8540276Z cvt.u64.u32 %rd274, %r745; 2026-02-21T09:42:48.8540340Z cvt.u64.u32 %rd275, %r746; 2026-02-21T09:42:48.8540401Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:42:48.8540461Z or.b64 
%rd277, %rd274, %rd276; 2026-02-21T09:42:48.8540635Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8540699Z mov.b64 {%r1400, %r1401}, %rd277; 2026-02-21T09:42:48.8540766Z cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T09:42:48.8540940Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8541025Z cvt.u64.u32 %rd278, %r747; 2026-02-21T09:42:48.8541089Z cvt.u64.u32 %rd279, %r748; 2026-02-21T09:42:48.8541151Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:42:48.8541219Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:42:48.8541390Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8541450Z mov.b64 {%r1403, %r1404}, %rd281; 2026-02-21T09:42:48.8541525Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T09:42:48.8541695Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8541753Z cvt.u64.u32 %rd282, %r749; 2026-02-21T09:42:48.8541823Z cvt.u64.u32 %rd283, %r750; 2026-02-21T09:42:48.8541883Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:42:48.8541945Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:42:48.8542114Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8542186Z mov.b64 {%r1406, %r1407}, %rd285; 2026-02-21T09:42:48.8542254Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 2026-02-21T09:42:48.8542445Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8542515Z cvt.u64.u32 %rd286, %r751; 2026-02-21T09:42:48.8542575Z cvt.u64.u32 %rd287, %r752; 2026-02-21T09:42:48.8542637Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:42:48.8542698Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:42:48.8542883Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8542946Z mov.b64 {%r1409, %r1410}, %rd289; 2026-02-21T09:42:48.8543017Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 
2026-02-21T09:42:48.8543198Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8543257Z cvt.u64.u32 %rd290, %r753; 2026-02-21T09:42:48.8543316Z cvt.u64.u32 %rd291, %r754; 2026-02-21T09:42:48.8543407Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:42:48.8543472Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:42:48.8543643Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8543708Z mov.b64 {%r1412, %r1413}, %rd293; 2026-02-21T09:42:48.8543776Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T09:42:48.8543944Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8544005Z cvt.u64.u32 %rd294, %r755; 2026-02-21T09:42:48.8544071Z cvt.u64.u32 %rd295, %r756; 2026-02-21T09:42:48.8544130Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:42:48.8544214Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:42:48.8544395Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8544455Z mov.b64 {%r1415, %r1416}, %rd297; 2026-02-21T09:42:48.8544522Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T09:42:48.8544743Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8544811Z cvt.u64.u32 %rd298, %r758; 2026-02-21T09:42:48.8544875Z cvt.u64.u32 %rd299, %r759; 2026-02-21T09:42:48.8544941Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:42:48.8545014Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:42:48.8545200Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8545265Z mov.b64 {%r1418, %r1419}, %rd301; 2026-02-21T09:42:48.8545346Z cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T09:42:48.8545531Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8545598Z cvt.u64.u32 %rd302, %r760; 2026-02-21T09:42:48.8545671Z cvt.u64.u32 %rd303, %r761; 2026-02-21T09:42:48.8545735Z shl.b64 %rd304, %rd303, 
32; 2026-02-21T09:42:48.8545800Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:42:48.8546020Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8546098Z mov.b64 {%r1421, %r1422}, %rd305; 2026-02-21T09:42:48.8546172Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T09:42:48.8546360Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8546434Z cvt.u64.u32 %rd306, %r762; 2026-02-21T09:42:48.8546499Z cvt.u64.u32 %rd307, %r763; 2026-02-21T09:42:48.8546565Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:42:48.8546630Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:42:48.8546823Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8546890Z mov.b64 {%r1424, %r1425}, %rd309; 2026-02-21T09:42:48.8546964Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 2026-02-21T09:42:48.8547159Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8547227Z cvt.u64.u32 %rd310, %r764; 2026-02-21T09:42:48.8547291Z cvt.u64.u32 %rd311, %r765; 2026-02-21T09:42:48.8547407Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:42:48.8547473Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:42:48.8547655Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8547727Z mov.b64 {%r1427, %r1428}, %rd313; 2026-02-21T09:42:48.8547800Z cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T09:42:48.8547987Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8548050Z cvt.u64.u32 %rd314, %r766; 2026-02-21T09:42:48.8548125Z cvt.u64.u32 %rd315, %r767; 2026-02-21T09:42:48.8548189Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:42:48.8548254Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:42:48.8548445Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8548513Z mov.b64 {%r1430, %r1431}, %rd317; 2026-02-21T09:42:48.8548617Z 
cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T09:42:48.8548805Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8548870Z cvt.u64.u32 %rd318, %r768; 2026-02-21T09:42:48.8548935Z cvt.u64.u32 %rd319, %r769; 2026-02-21T09:42:48.8548999Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:42:48.8549072Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:42:48.8549255Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8549321Z mov.b64 {%r1433, %r1434}, %rd321; 2026-02-21T09:42:48.8549431Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T09:42:48.8549610Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8549674Z cvt.u64.u32 %rd322, %r770; 2026-02-21T09:42:48.8549743Z cvt.u64.u32 %rd323, %r771; 2026-02-21T09:42:48.8549808Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:42:48.8549874Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:42:48.8550052Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8550125Z mov.b64 {%r1436, %r1437}, %rd325; 2026-02-21T09:42:48.8550197Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T09:42:48.8550376Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8550449Z cvt.u64.u32 %rd326, %r772; 2026-02-21T09:42:48.8550513Z cvt.u64.u32 %rd327, %r773; 2026-02-21T09:42:48.8550578Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:42:48.8550644Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:42:48.8550830Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8550894Z mov.b64 {%r1439, %r1440}, %rd329; 2026-02-21T09:42:48.8550966Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T09:42:48.8551178Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8551246Z cvt.u64.u32 %rd330, %r775; 2026-02-21T09:42:48.8551312Z cvt.u64.u32 %rd331, %r776; 
2026-02-21T09:42:48.8551388Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:42:48.8551454Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:42:48.8551640Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8551712Z mov.b64 {%r1442, %r1443}, %rd333; 2026-02-21T09:42:48.8551785Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 2026-02-21T09:42:48.8551978Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8552045Z cvt.u64.u32 %rd334, %r777; 2026-02-21T09:42:48.8552118Z cvt.u64.u32 %rd335, %r778; 2026-02-21T09:42:48.8552184Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:42:48.8552251Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:42:48.8552449Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8552517Z mov.b64 {%r1445, %r1446}, %rd337; 2026-02-21T09:42:48.8552627Z cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T09:42:48.8552822Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8552888Z cvt.u64.u32 %rd338, %r779; 2026-02-21T09:42:48.8552952Z cvt.u64.u32 %rd339, %r780; 2026-02-21T09:42:48.8553017Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:42:48.8553089Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:42:48.8553272Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8553344Z mov.b64 {%r1448, %r1449}, %rd341; 2026-02-21T09:42:48.8553424Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T09:42:48.8553611Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8553678Z cvt.u64.u32 %rd342, %r781; 2026-02-21T09:42:48.8553773Z cvt.u64.u32 %rd343, %r782; 2026-02-21T09:42:48.8553839Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:42:48.8553907Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:42:48.8554090Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8554163Z mov.b64 {%r1451, 
%r1452}, %rd345; 2026-02-21T09:42:48.8554236Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T09:42:48.8554423Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8554495Z cvt.u64.u32 %rd346, %r783; 2026-02-21T09:42:48.8554585Z cvt.u64.u32 %rd347, %r784; 2026-02-21T09:42:48.8554650Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:42:48.8554747Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:42:48.8554943Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8555010Z mov.b64 {%r1454, %r1455}, %rd349; 2026-02-21T09:42:48.8555086Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T09:42:48.8555279Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8555346Z cvt.u64.u32 %rd350, %r785; 2026-02-21T09:42:48.8555411Z cvt.u64.u32 %rd351, %r786; 2026-02-21T09:42:48.8555486Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:42:48.8555551Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:42:48.8555736Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8555807Z mov.b64 {%r1457, %r1458}, %rd353; 2026-02-21T09:42:48.8555883Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T09:42:48.8556069Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8556134Z cvt.u64.u32 %rd354, %r787; 2026-02-21T09:42:48.8556207Z cvt.u64.u32 %rd355, %r788; 2026-02-21T09:42:48.8556307Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:42:48.8556377Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:42:48.8556569Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8556638Z mov.b64 {%r1460, %r1461}, %rd357; 2026-02-21T09:42:48.8556713Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T09:42:48.8556907Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8556972Z cvt.u64.u32 %rd358, %r789; 
2026-02-21T09:42:48.8557035Z cvt.u64.u32 %rd359, %r790; 2026-02-21T09:42:48.8557099Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:42:48.8557174Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:42:48.8557362Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8557427Z mov.b64 {%r1463, %r1464}, %rd361; 2026-02-21T09:42:48.8557507Z cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T09:42:48.8557697Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8557761Z cvt.u64.u32 %rd362, %r792; 2026-02-21T09:42:48.8557862Z cvt.u64.u32 %rd363, %r793; 2026-02-21T09:42:48.8557927Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:42:48.8557993Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:42:48.8558182Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8558254Z mov.b64 {%r1466, %r1467}, %rd365; 2026-02-21T09:42:48.8558326Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T09:42:48.8558510Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8558584Z cvt.u64.u32 %rd366, %r794; 2026-02-21T09:42:48.8558646Z cvt.u64.u32 %rd367, %r795; 2026-02-21T09:42:48.8558710Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:42:48.8558774Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:42:48.8558994Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8559060Z mov.b64 {%r1469, %r1470}, %rd369; 2026-02-21T09:42:48.8559135Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T09:42:48.8559327Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8559391Z cvt.u64.u32 %rd370, %r796; 2026-02-21T09:42:48.8559455Z cvt.u64.u32 %rd371, %r797; 2026-02-21T09:42:48.8559527Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:42:48.8559591Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:42:48.8559778Z .loc 1 58 27 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8559880Z mov.b64 {%r1472, %r1473}, %rd373; 2026-02-21T09:42:48.8559955Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 2026-02-21T09:42:48.8560141Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8560208Z cvt.u64.u32 %rd374, %r798; 2026-02-21T09:42:48.8560281Z cvt.u64.u32 %rd375, %r799; 2026-02-21T09:42:48.8560345Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:42:48.8560413Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:42:48.8560608Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8560673Z mov.b64 {%r1475, %r1476}, %rd377; 2026-02-21T09:42:48.8560747Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T09:42:48.8560940Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8561004Z cvt.u64.u32 %rd378, %r800; 2026-02-21T09:42:48.8561070Z cvt.u64.u32 %rd379, %r801; 2026-02-21T09:42:48.8561134Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:42:48.8561207Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:42:48.8561392Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8561456Z mov.b64 {%r1478, %r1479}, %rd381; 2026-02-21T09:42:48.8561573Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T09:42:48.8561747Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8561811Z cvt.u64.u32 %rd382, %r802; 2026-02-21T09:42:48.8561881Z cvt.u64.u32 %rd383, %r803; 2026-02-21T09:42:48.8561943Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:42:48.8562007Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:42:48.8562177Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8562250Z mov.b64 {%r1481, %r1482}, %rd385; 2026-02-21T09:42:48.8562321Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T09:42:48.8562496Z .loc 1 56 52 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8562573Z cvt.u64.u32 %rd386, %r804; 2026-02-21T09:42:48.8562634Z cvt.u64.u32 %rd387, %r805; 2026-02-21T09:42:48.8562694Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:42:48.8562759Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:42:48.8562939Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8563023Z mov.b64 {%r1484, %r1485}, %rd389; 2026-02-21T09:42:48.8563091Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T09:42:48.8563270Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8563328Z cvt.u64.u32 %rd390, %r806; 2026-02-21T09:42:48.8563387Z cvt.u64.u32 %rd391, %r807; 2026-02-21T09:42:48.8563454Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:42:48.8563513Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:42:48.8563685Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8563753Z mov.b64 {%r1487, %r1488}, %rd393; 2026-02-21T09:42:48.8563819Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T09:42:48.8564012Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8564073Z cvt.u64.u32 %rd394, %r809; 2026-02-21T09:42:48.8564141Z cvt.u64.u32 %rd395, %r810; 2026-02-21T09:42:48.8564201Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:42:48.8564261Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:42:48.8564438Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8564499Z mov.b64 {%r1490, %r1491}, %rd397; 2026-02-21T09:42:48.8564573Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 2026-02-21T09:42:48.8564803Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8564898Z cvt.u64.u32 %rd398, %r811; 2026-02-21T09:42:48.8564963Z cvt.u64.u32 %rd399, %r812; 2026-02-21T09:42:48.8565027Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:42:48.8565099Z or.b64 
%rd401, %rd398, %rd400; 2026-02-21T09:42:48.8565287Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8565353Z mov.b64 {%r1493, %r1494}, %rd401; 2026-02-21T09:42:48.8565436Z cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T09:42:48.8565619Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8565685Z cvt.u64.u32 %rd402, %r813; 2026-02-21T09:42:48.8566085Z [115s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:42:48.8567179Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 256, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, False]), static_shapes=True) 2026-02-21T09:42:48.8567359Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:42:48.8567430Z `ptxas` stderr: 2026-02-21T09:42:48.8567788Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 147 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:42:48.8567885Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:42:48.8567890Z 2026-02-21T09:42:48.8568318Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpy5axpv50.ptx -o /tmp/tmpy5axpv50.ptx.o 2026-02-21T09:42:48.8568323Z 2026-02-21T09:42:48.8568461Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
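Note on the failure above: the skipped config sets maxnreg=32, and ptxas reports that the instruction at PTX line 147 of _helion_matmul needs a register target of 34 or higher. The sketch below pulls those numbers out of the ptxas stderr text so that a failing config can be triaged without reading the PTX dump. It assumes the C7602 message keeps the wording shown in this log. The function name `parse_insufficient_registers` is hypothetical and is not part of Helion, Triton, or ptxas.

```python
import re

# Matches the ptxas C7602 diagnostic as worded in this log.
# Assumption: the wording is stable across ptxas versions.
C7602 = re.compile(
    r"\(C7602\) Insufficient registers \((?P<limit>\d+)\) to compile "
    r"instruction at line (?P<line>\d+) in function (?P<func>\w+)\. "
    r"Try to compile with register target of (?P<needed>\d+) or higher"
)


def parse_insufficient_registers(stderr: str):
    """Return the register limit and suggested minimum, or None if absent.

    Hypothetical helper for log triage; not an API of any library.
    """
    m = C7602.search(stderr)
    if m is None:
        return None
    return {
        "limit": int(m["limit"]),          # register cap the compile ran with
        "ptx_line": int(m["line"]),        # line in the generated PTX file
        "function": m["func"],             # kernel entry point name
        "suggested_min": int(m["needed"]), # smallest target ptxas suggests
    }


# The message copied from the log entry above.
sample = (
    "ptxas fatal : (C7602) Insufficient registers (32) to compile "
    "instruction at line 147 in function _helion_matmul. "
    "Try to compile with register target of 34 or higher."
)
info = parse_insufficient_registers(sample)
print(info)
```

For this log entry the parsed limit matches the config's maxnreg value, which suggests the cap itself, rather than the kernel source, is why this config was rejected.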
2026-02-21T09:42:48.8568524Z cvt.u64.u32 %rd403, %r814; 2026-02-21T09:42:48.8568593Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:42:48.8568657Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:42:48.8568830Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8568902Z mov.b64 {%r1496, %r1497}, %rd405; 2026-02-21T09:42:48.8569001Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T09:42:48.8569173Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8569239Z cvt.u64.u32 %rd406, %r815; 2026-02-21T09:42:48.8569299Z cvt.u64.u32 %rd407, %r816; 2026-02-21T09:42:48.8569360Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:42:48.8569421Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:42:48.8569600Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8569663Z mov.b64 {%r1499, %r1500}, %rd409; 2026-02-21T09:42:48.8569730Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 2026-02-21T09:42:48.8569907Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8569966Z cvt.u64.u32 %rd410, %r817; 2026-02-21T09:42:48.8570051Z cvt.u64.u32 %rd411, %r818; 2026-02-21T09:42:48.8570120Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:42:48.8570183Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:42:48.8570352Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8570412Z mov.b64 {%r1502, %r1503}, %rd413; 2026-02-21T09:42:48.8570489Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T09:42:48.8570659Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8570719Z cvt.u64.u32 %rd414, %r819; 2026-02-21T09:42:48.8570786Z cvt.u64.u32 %rd415, %r820; 2026-02-21T09:42:48.8570885Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:42:48.8570946Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:42:48.8571124Z .loc 1 58 27 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8571184Z mov.b64 {%r1505, %r1506}, %rd417; 2026-02-21T09:42:48.8571254Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T09:42:48.8571421Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8571491Z cvt.u64.u32 %rd418, %r821; 2026-02-21T09:42:48.8571551Z cvt.u64.u32 %rd419, %r822; 2026-02-21T09:42:48.8571611Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:42:48.8571677Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:42:48.8571847Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8571908Z mov.b64 {%r1508, %r1509}, %rd421; 2026-02-21T09:42:48.8571984Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T09:42:48.8572156Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8572215Z cvt.u64.u32 %rd422, %r823; 2026-02-21T09:42:48.8572275Z cvt.u64.u32 %rd423, %r824; 2026-02-21T09:42:48.8572342Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:42:48.8572434Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:42:48.8572609Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8572684Z mov.b64 {%r1511, %r1512}, %rd425; 2026-02-21T09:42:48.8572751Z cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T09:42:48.8572920Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8572988Z cvt.u64.u32 %rd426, %r826; 2026-02-21T09:42:48.8573048Z cvt.u64.u32 %rd427, %r827; 2026-02-21T09:42:48.8573110Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:42:48.8573172Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:42:48.8573349Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8573412Z mov.b64 {%r1514, %r1515}, %rd429; 2026-02-21T09:42:48.8573480Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T09:42:48.8573664Z .loc 1 56 52 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8573727Z cvt.u64.u32 %rd430, %r828; 2026-02-21T09:42:48.8573812Z cvt.u64.u32 %rd431, %r829; 2026-02-21T09:42:48.8573880Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:42:48.8573942Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:42:48.8574112Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8574174Z mov.b64 {%r1517, %r1518}, %rd433; 2026-02-21T09:42:48.8574254Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 2026-02-21T09:42:48.8574427Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8574494Z cvt.u64.u32 %rd434, %r830; 2026-02-21T09:42:48.8574571Z cvt.u64.u32 %rd435, %r831; 2026-02-21T09:42:48.8574639Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:42:48.8574760Z or.b64 %rd437, %rd434, %rd436; 2026-02-21T09:42:48.8574955Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8575050Z mov.b64 {%r1520, %r1521}, %rd437; 2026-02-21T09:42:48.8575128Z cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T09:42:48.8575313Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8575386Z cvt.u64.u32 %rd438, %r832; 2026-02-21T09:42:48.8575449Z cvt.u64.u32 %rd439, %r833; 2026-02-21T09:42:48.8575515Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:42:48.8575587Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:42:48.8575774Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8575872Z mov.b64 {%r1523, %r1524}, %rd441; 2026-02-21T09:42:48.8575952Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T09:42:48.8576141Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8576205Z cvt.u64.u32 %rd442, %r834; 2026-02-21T09:42:48.8576270Z cvt.u64.u32 %rd443, %r835; 2026-02-21T09:42:48.8576346Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:42:48.8576411Z or.b64 
%rd445, %rd442, %rd444; 2026-02-21T09:42:48.8576599Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8576677Z mov.b64 {%r1526, %r1527}, %rd445; 2026-02-21T09:42:48.8576750Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T09:42:48.8576937Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8577015Z cvt.u64.u32 %rd446, %r836; 2026-02-21T09:42:48.8577074Z cvt.u64.u32 %rd447, %r837; 2026-02-21T09:42:48.8577134Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:42:48.8577194Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:42:48.8577375Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8577435Z mov.b64 {%r1529, %r1530}, %rd449; 2026-02-21T09:42:48.8577537Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T09:42:48.8577720Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8577782Z cvt.u64.u32 %rd450, %r838; 2026-02-21T09:42:48.8577841Z cvt.u64.u32 %rd451, %r839; 2026-02-21T09:42:48.8577908Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:42:48.8577967Z or.b64 %rd453, %rd450, %rd452; 2026-02-21T09:42:48.8578136Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8578196Z mov.b64 {%r1532, %r1533}, %rd453; 2026-02-21T09:42:48.8578269Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T09:42:48.8578438Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8578498Z cvt.u64.u32 %rd454, %r840; 2026-02-21T09:42:48.8578565Z cvt.u64.u32 %rd455, %r841; 2026-02-21T09:42:48.8578626Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:42:48.8578687Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:42:48.8578864Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8578956Z mov.b64 {%r1535, %r1536}, %rd457; 2026-02-21T09:42:48.8579022Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 
2026-02-21T09:42:48.8579189Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8579255Z cvt.u64.u32 %rd458, %r843; 2026-02-21T09:42:48.8579313Z cvt.u64.u32 %rd459, %r844; 2026-02-21T09:42:48.8579371Z shl.b64 %rd460, %rd459, 32; 2026-02-21T09:42:48.8579437Z or.b64 %rd461, %rd458, %rd460; 2026-02-21T09:42:48.8579603Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8579664Z mov.b64 {%r1538, %r1539}, %rd461; 2026-02-21T09:42:48.8579738Z cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T09:42:48.8579906Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8579990Z cvt.u64.u32 %rd462, %r845; 2026-02-21T09:42:48.8580050Z cvt.u64.u32 %rd463, %r846; 2026-02-21T09:42:48.8580117Z shl.b64 %rd464, %rd463, 32; 2026-02-21T09:42:48.8580178Z or.b64 %rd465, %rd462, %rd464; 2026-02-21T09:42:48.8580348Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8580416Z mov.b64 {%r1541, %r1542}, %rd465; 2026-02-21T09:42:48.8580484Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T09:42:48.8580651Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8580759Z cvt.u64.u32 %rd466, %r847; 2026-02-21T09:42:48.8580819Z cvt.u64.u32 %rd467, %r848; 2026-02-21T09:42:48.8580879Z shl.b64 %rd468, %rd467, 32; 2026-02-21T09:42:48.8580939Z or.b64 %rd469, %rd466, %rd468; 2026-02-21T09:42:48.8581120Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8581184Z mov.b64 {%r1544, %r1545}, %rd469; 2026-02-21T09:42:48.8581252Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T09:42:48.8581430Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8581489Z cvt.u64.u32 %rd470, %r849; 2026-02-21T09:42:48.8581550Z cvt.u64.u32 %rd471, %r850; 2026-02-21T09:42:48.8581617Z shl.b64 %rd472, %rd471, 
32; 2026-02-21T09:42:48.8581678Z or.b64 %rd473, %rd470, %rd472; 2026-02-21T09:42:48.8581847Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8581910Z mov.b64 {%r1547, %r1548}, %rd473; 2026-02-21T09:42:48.8581986Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T09:42:48.8582155Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8582215Z cvt.u64.u32 %rd474, %r851; 2026-02-21T09:42:48.8582306Z cvt.u64.u32 %rd475, %r852; 2026-02-21T09:42:48.8582371Z shl.b64 %rd476, %rd475, 32; 2026-02-21T09:42:48.8582433Z or.b64 %rd477, %rd474, %rd476; 2026-02-21T09:42:48.8582619Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8582682Z mov.b64 {%r1550, %r1551}, %rd477; 2026-02-21T09:42:48.8582752Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T09:42:48.8582923Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8582995Z cvt.u64.u32 %rd478, %r853; 2026-02-21T09:42:48.8583056Z cvt.u64.u32 %rd479, %r854; 2026-02-21T09:42:48.8583120Z shl.b64 %rd480, %rd479, 32; 2026-02-21T09:42:48.8583191Z or.b64 %rd481, %rd478, %rd480; 2026-02-21T09:42:48.8583361Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8583422Z mov.b64 {%r1553, %r1554}, %rd481; 2026-02-21T09:42:48.8583498Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T09:42:48.8583670Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8583754Z cvt.u64.u32 %rd482, %r855; 2026-02-21T09:42:48.8583813Z cvt.u64.u32 %rd483, %r856; 2026-02-21T09:42:48.8583879Z shl.b64 %rd484, %rd483, 32; 2026-02-21T09:42:48.8583942Z or.b64 %rd485, %rd482, %rd484; 2026-02-21T09:42:48.8584111Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8584181Z mov.b64 {%r1556, %r1557}, %rd485; 2026-02-21T09:42:48.8584249Z 
cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T09:42:48.8584420Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8584489Z cvt.u64.u32 %rd486, %r857; 2026-02-21T09:42:48.8584547Z cvt.u64.u32 %rd487, %r858; 2026-02-21T09:42:48.8584607Z shl.b64 %rd488, %rd487, 32; 2026-02-21T09:42:48.8584724Z or.b64 %rd489, %rd486, %rd488; 2026-02-21T09:42:48.8584931Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8584996Z mov.b64 {%r1559, %r1560}, %rd489; 2026-02-21T09:42:48.8585068Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T09:42:48.8585257Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8585321Z cvt.u64.u32 %rd490, %r860; 2026-02-21T09:42:48.8585385Z cvt.u64.u32 %rd491, %r861; 2026-02-21T09:42:48.8585456Z shl.b64 %rd492, %rd491, 32; 2026-02-21T09:42:48.8585522Z or.b64 %rd493, %rd490, %rd492; 2026-02-21T09:42:48.8585753Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8585820Z mov.b64 {%r1562, %r1563}, %rd493; 2026-02-21T09:42:48.8585900Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T09:42:48.8586085Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8586151Z cvt.u64.u32 %rd494, %r862; 2026-02-21T09:42:48.8586224Z cvt.u64.u32 %rd495, %r863; 2026-02-21T09:42:48.8586291Z shl.b64 %rd496, %rd495, 32; 2026-02-21T09:42:48.8586359Z or.b64 %rd497, %rd494, %rd496; 2026-02-21T09:42:48.8586557Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8586620Z mov.b64 {%r1565, %r1566}, %rd497; 2026-02-21T09:42:48.8586687Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 2026-02-21T09:42:48.8586860Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8586930Z cvt.u64.u32 %rd498, %r864; 2026-02-21T09:42:48.8586990Z cvt.u64.u32 %rd499, %r865; 
2026-02-21T09:42:48.8587051Z shl.b64 %rd500, %rd499, 32; 2026-02-21T09:42:48.8587118Z or.b64 %rd501, %rd498, %rd500; 2026-02-21T09:42:48.8587298Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8587396Z mov.b64 {%r1568, %r1569}, %rd501; 2026-02-21T09:42:48.8587480Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T09:42:48.8587665Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8587729Z cvt.u64.u32 %rd502, %r866; 2026-02-21T09:42:48.8587794Z cvt.u64.u32 %rd503, %r867; 2026-02-21T09:42:48.8587867Z shl.b64 %rd504, %rd503, 32; 2026-02-21T09:42:48.8587933Z or.b64 %rd505, %rd502, %rd504; 2026-02-21T09:42:48.8588116Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8588188Z mov.b64 {%r1571, %r1572}, %rd505; 2026-02-21T09:42:48.8588260Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T09:42:48.8588442Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8588512Z cvt.u64.u32 %rd506, %r868; 2026-02-21T09:42:48.8588576Z cvt.u64.u32 %rd507, %r869; 2026-02-21T09:42:48.8588643Z shl.b64 %rd508, %rd507, 32; 2026-02-21T09:42:48.8588708Z or.b64 %rd509, %rd506, %rd508; 2026-02-21T09:42:48.8588925Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8588989Z mov.b64 {%r1574, %r1575}, %rd509; 2026-02-21T09:42:48.8589063Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T09:42:48.8589254Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8589319Z cvt.u64.u32 %rd510, %r870; 2026-02-21T09:42:48.8589382Z cvt.u64.u32 %rd511, %r871; 2026-02-21T09:42:48.8589454Z shl.b64 %rd512, %rd511, 32; 2026-02-21T09:42:48.8589518Z or.b64 %rd513, %rd510, %rd512; 2026-02-21T09:42:48.8589698Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8589763Z mov.b64 {%r1577, 
%r1578}, %rd513; 2026-02-21T09:42:48.8589844Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T09:42:48.8590050Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8590119Z cvt.u64.u32 %rd514, %r872; 2026-02-21T09:42:48.8590190Z cvt.u64.u32 %rd515, %r873; 2026-02-21T09:42:48.8590256Z shl.b64 %rd516, %rd515, 32; 2026-02-21T09:42:48.8590320Z or.b64 %rd517, %rd514, %rd516; 2026-02-21T09:42:48.8590510Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8590577Z mov.b64 {%r1580, %r1581}, %rd517; 2026-02-21T09:42:48.8590649Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T09:42:48.8590856Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8590931Z cvt.u64.u32 %rd518, %r874; 2026-02-21T09:42:48.8590994Z cvt.u64.u32 %rd519, %r875; 2026-02-21T09:42:48.8591057Z shl.b64 %rd520, %rd519, 32; 2026-02-21T09:42:48.8591128Z or.b64 %rd521, %rd518, %rd520; 2026-02-21T09:42:48.8591317Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8591384Z mov.b64 {%r1583, %r1584}, %rd521; 2026-02-21T09:42:48.8591464Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T09:42:48.8591666Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8591731Z cvt.u64.u32 %rd522, %r877; 2026-02-21T09:42:48.8591795Z cvt.u64.u32 %rd523, %r878; 2026-02-21T09:42:48.8591869Z shl.b64 %rd524, %rd523, 32; 2026-02-21T09:42:48.8591936Z or.b64 %rd525, %rd522, %rd524; 2026-02-21T09:42:48.8592145Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8592222Z mov.b64 {%r1586, %r1587}, %rd525; 2026-02-21T09:42:48.8592297Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T09:42:48.8592504Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8592608Z cvt.u64.u32 %rd526, %r879; 
2026-02-21T09:42:48.8592679Z cvt.u64.u32 %rd527, %r880; 2026-02-21T09:42:48.8592749Z shl.b64 %rd528, %rd527, 32; 2026-02-21T09:42:48.8592818Z or.b64 %rd529, %rd526, %rd528; 2026-02-21T09:42:48.8593048Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8593117Z mov.b64 {%r1589, %r1590}, %rd529; 2026-02-21T09:42:48.8593198Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T09:42:48.8593377Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8593439Z cvt.u64.u32 %rd530, %r881; 2026-02-21T09:42:48.8593498Z cvt.u64.u32 %rd531, %r882; 2026-02-21T09:42:48.8593563Z shl.b64 %rd532, %rd531, 32; 2026-02-21T09:42:48.8593625Z or.b64 %rd533, %rd530, %rd532; 2026-02-21T09:42:48.8593796Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8593860Z mov.b64 {%r1592, %r1593}, %rd533; 2026-02-21T09:42:48.8593934Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T09:42:48.8594129Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8594188Z cvt.u64.u32 %rd534, %r883; 2026-02-21T09:42:48.8594255Z cvt.u64.u32 %rd535, %r884; 2026-02-21T09:42:48.8594314Z shl.b64 %rd536, %rd535, 32; 2026-02-21T09:42:48.8594375Z or.b64 %rd537, %rd534, %rd536; 2026-02-21T09:42:48.8594555Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8594622Z mov.b64 {%r1595, %r1596}, %rd537; 2026-02-21T09:42:48.8594744Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T09:42:48.8594933Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8595005Z cvt.u64.u32 %rd538, %r885; 2026-02-21T09:42:48.8595068Z cvt.u64.u32 %rd539, %r886; 2026-02-21T09:42:48.8595158Z shl.b64 %rd540, %rd539, 32; 2026-02-21T09:42:48.8595233Z or.b64 %rd541, %rd538, %rd540; 2026-02-21T09:42:48.8595417Z .loc 1 58 27 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8595483Z mov.b64 {%r1598, %r1599}, %rd541; 2026-02-21T09:42:48.8595564Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T09:42:48.8595749Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8595813Z cvt.u64.u32 %rd542, %r887; 2026-02-21T09:42:48.8595876Z cvt.u64.u32 %rd543, %r888; 2026-02-21T09:42:48.8595947Z shl.b64 %rd544, %rd543, 32; 2026-02-21T09:42:48.8596043Z or.b64 %rd545, %rd542, %rd544; 2026-02-21T09:42:48.8596235Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8596303Z mov.b64 {%r1601, %r1602}, %rd545; 2026-02-21T09:42:48.8596370Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T09:42:48.8596545Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8596615Z cvt.u64.u32 %rd546, %r889; 2026-02-21T09:42:48.8596674Z cvt.u64.u32 %rd547, %r890; 2026-02-21T09:42:48.8596733Z shl.b64 %rd548, %rd547, 32; 2026-02-21T09:42:48.8596793Z or.b64 %rd549, %rd546, %rd548; 2026-02-21T09:42:48.8596971Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8597032Z mov.b64 {%r1604, %r1605}, %rd549; 2026-02-21T09:42:48.8597100Z cvt.rn.f16x2.f32 %r1606, %r1605, %r1604; 2026-02-21T09:42:48.8597275Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8597336Z cvt.u64.u32 %rd550, %r891; 2026-02-21T09:42:48.8597394Z cvt.u64.u32 %rd551, %r892; 2026-02-21T09:42:48.8597461Z shl.b64 %rd552, %rd551, 32; 2026-02-21T09:42:48.8597521Z or.b64 %rd553, %rd550, %rd552; 2026-02-21T09:42:48.8597721Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8597785Z mov.b64 {%r1607, %r1608}, %rd553; 2026-02-21T09:42:48.8597861Z cvt.rn.f16x2.f32 %r1609, %r1608, %r1607; 2026-02-21T09:42:48.8598031Z .loc 1 56 52 // 
c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8598090Z cvt.u64.u32 %rd554, %r894; 2026-02-21T09:42:48.8598157Z cvt.u64.u32 %rd555, %r895; 2026-02-21T09:42:48.8598216Z shl.b64 %rd556, %rd555, 32; 2026-02-21T09:42:48.8598276Z or.b64 %rd557, %rd554, %rd556; 2026-02-21T09:42:48.8598450Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8598513Z mov.b64 {%r1610, %r1611}, %rd557; 2026-02-21T09:42:48.8598581Z cvt.rn.f16x2.f32 %r1612, %r1611, %r1610; 2026-02-21T09:42:48.8598750Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8598820Z cvt.u64.u32 %rd558, %r896; 2026-02-21T09:42:48.8598879Z cvt.u64.u32 %rd559, %r897; 2026-02-21T09:42:48.8598967Z shl.b64 %rd560, %rd559, 32; 2026-02-21T09:42:48.8599035Z or.b64 %rd561, %rd558, %rd560; 2026-02-21T09:42:48.8599208Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8599269Z mov.b64 {%r1613, %r1614}, %rd561; 2026-02-21T09:42:48.8599342Z cvt.rn.f16x2.f32 %r1615, %r1614, %r1613; 2026-02-21T09:42:48.8599513Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8599572Z cvt.u64.u32 %rd562, %r898; 2026-02-21T09:42:48.8599634Z cvt.u64.u32 %rd563, %r899; 2026-02-21T09:42:48.8599700Z shl.b64 %rd564, %rd563, 32; 2026-02-21T09:42:48.8599760Z or.b64 %rd565, %rd562, %rd564; 2026-02-21T09:42:48.8599937Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8600005Z mov.b64 {%r1616, %r1617}, %rd565; 2026-02-21T09:42:48.8600095Z cvt.rn.f16x2.f32 %r1618, %r1617, %r1616; 2026-02-21T09:42:48.8600269Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8600337Z cvt.u64.u32 %rd566, %r900; 2026-02-21T09:42:48.8600396Z cvt.u64.u32 %rd567, %r901; 2026-02-21T09:42:48.8600455Z shl.b64 %rd568, %rd567, 32; 2026-02-21T09:42:48.8600517Z or.b64 
%rd569, %rd566, %rd568; 2026-02-21T09:42:48.8600694Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8600757Z mov.b64 {%r1619, %r1620}, %rd569; 2026-02-21T09:42:48.8600849Z cvt.rn.f16x2.f32 %r1621, %r1620, %r1619; 2026-02-21T09:42:48.8601022Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8601081Z cvt.u64.u32 %rd570, %r902; 2026-02-21T09:42:48.8601140Z cvt.u64.u32 %rd571, %r903; 2026-02-21T09:42:48.8601207Z shl.b64 %rd572, %rd571, 32; 2026-02-21T09:42:48.8601268Z or.b64 %rd573, %rd570, %rd572; 2026-02-21T09:42:48.8601434Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8601497Z mov.b64 {%r1622, %r1623}, %rd573; 2026-02-21T09:42:48.8601571Z cvt.rn.f16x2.f32 %r1624, %r1623, %r1622; 2026-02-21T09:42:48.8601735Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8601794Z cvt.u64.u32 %rd574, %r904; 2026-02-21T09:42:48.8601861Z cvt.u64.u32 %rd575, %r905; 2026-02-21T09:42:48.8601922Z shl.b64 %rd576, %rd575, 32; 2026-02-21T09:42:48.8601984Z or.b64 %rd577, %rd574, %rd576; 2026-02-21T09:42:48.8602161Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8602223Z mov.b64 {%r1625, %r1626}, %rd577; 2026-02-21T09:42:48.8602293Z cvt.rn.f16x2.f32 %r1627, %r1626, %r1625; 2026-02-21T09:42:48.8602485Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8602556Z cvt.u64.u32 %rd578, %r906; 2026-02-21T09:42:48.8602616Z cvt.u64.u32 %rd579, %r907; 2026-02-21T09:42:48.8602677Z shl.b64 %rd580, %rd579, 32; 2026-02-21T09:42:48.8602744Z or.b64 %rd581, %rd578, %rd580; 2026-02-21T09:42:48.8602914Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8602975Z mov.b64 {%r1628, %r1629}, %rd581; 2026-02-21T09:42:48.8603049Z cvt.rn.f16x2.f32 %r1630, %r1629, %r1628; 
2026-02-21T09:42:48.8603217Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8603278Z cvt.u64.u32 %rd582, %r908; 2026-02-21T09:42:48.8603336Z cvt.u64.u32 %rd583, %r909; 2026-02-21T09:42:48.8603402Z shl.b64 %rd584, %rd583, 32; 2026-02-21T09:42:48.8603463Z or.b64 %rd585, %rd582, %rd584; 2026-02-21T09:42:48.8603637Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8603706Z mov.b64 {%r1631, %r1632}, %rd585; 2026-02-21T09:42:48.8603798Z cvt.rn.f16x2.f32 %r1633, %r1632, %r1631; 2026-02-21T09:42:48.8603965Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8604033Z cvt.u64.u32 %rd586, %r911; 2026-02-21T09:42:48.8604092Z cvt.u64.u32 %rd587, %r912; 2026-02-21T09:42:48.8604153Z shl.b64 %rd588, %rd587, 32; 2026-02-21T09:42:48.8604212Z or.b64 %rd589, %rd586, %rd588; 2026-02-21T09:42:48.8604386Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8604448Z mov.b64 {%r1634, %r1635}, %rd589; 2026-02-21T09:42:48.8604518Z cvt.rn.f16x2.f32 %r1636, %r1635, %r1634; 2026-02-21T09:42:48.8604776Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8604842Z cvt.u64.u32 %rd590, %r913; 2026-02-21T09:42:48.8604954Z cvt.u64.u32 %rd591, %r914; 2026-02-21T09:42:48.8605027Z shl.b64 %rd592, %rd591, 32; 2026-02-21T09:42:48.8605094Z or.b64 %rd593, %rd590, %rd592; 2026-02-21T09:42:48.8605278Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8605345Z mov.b64 {%r1637, %r1638}, %rd593; 2026-02-21T09:42:48.8605426Z cvt.rn.f16x2.f32 %r1639, %r1638, %r1637; 2026-02-21T09:42:48.8605609Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8605674Z cvt.u64.u32 %rd594, %r915; 2026-02-21T09:42:48.8605778Z cvt.u64.u32 %rd595, %r916; 2026-02-21T09:42:48.8605843Z shl.b64 %rd596, %rd595, 
32; 2026-02-21T09:42:48.8605909Z or.b64 %rd597, %rd594, %rd596; 2026-02-21T09:42:48.8606100Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8606168Z mov.b64 {%r1640, %r1641}, %rd597; 2026-02-21T09:42:48.8606243Z cvt.rn.f16x2.f32 %r1642, %r1641, %r1640; 2026-02-21T09:42:48.8606427Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8606496Z cvt.u64.u32 %rd598, %r917; 2026-02-21T09:42:48.8606556Z cvt.u64.u32 %rd599, %r918; 2026-02-21T09:42:48.8606617Z shl.b64 %rd600, %rd599, 32; 2026-02-21T09:42:48.8606685Z or.b64 %rd601, %rd598, %rd600; 2026-02-21T09:42:48.8606856Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8606917Z mov.b64 {%r1643, %r1644}, %rd601; 2026-02-21T09:42:48.8606994Z cvt.rn.f16x2.f32 %r1645, %r1644, %r1643; 2026-02-21T09:42:48.8607164Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8607223Z cvt.u64.u32 %rd602, %r919; 2026-02-21T09:42:48.8607282Z cvt.u64.u32 %rd603, %r920; 2026-02-21T09:42:48.8607374Z shl.b64 %rd604, %rd603, 32; 2026-02-21T09:42:48.8607437Z or.b64 %rd605, %rd602, %rd604; 2026-02-21T09:42:48.8607606Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8607676Z mov.b64 {%r1646, %r1647}, %rd605; 2026-02-21T09:42:48.8607743Z cvt.rn.f16x2.f32 %r1648, %r1647, %r1646; 2026-02-21T09:42:48.8607912Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8607978Z cvt.u64.u32 %rd606, %r921; 2026-02-21T09:42:48.8608036Z cvt.u64.u32 %rd607, %r922; 2026-02-21T09:42:48.8608095Z shl.b64 %rd608, %rd607, 32; 2026-02-21T09:42:48.8608155Z or.b64 %rd609, %rd606, %rd608; 2026-02-21T09:42:48.8608331Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8608391Z mov.b64 {%r1649, %r1650}, %rd609; 2026-02-21T09:42:48.8608458Z 
cvt.rn.f16x2.f32 %r1651, %r1650, %r1649; 2026-02-21T09:42:48.8608633Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8608693Z cvt.u64.u32 %rd610, %r923; 2026-02-21T09:42:48.8608782Z cvt.u64.u32 %rd611, %r924; 2026-02-21T09:42:48.8608848Z shl.b64 %rd612, %rd611, 32; 2026-02-21T09:42:48.8608908Z or.b64 %rd613, %rd610, %rd612; 2026-02-21T09:42:48.8609081Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8609141Z mov.b64 {%r1652, %r1653}, %rd613; 2026-02-21T09:42:48.8609216Z cvt.rn.f16x2.f32 %r1654, %r1653, %r1652; 2026-02-21T09:42:48.8609385Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8609446Z cvt.u64.u32 %rd614, %r925; 2026-02-21T09:42:48.8609511Z cvt.u64.u32 %rd615, %r926; 2026-02-21T09:42:48.8609570Z shl.b64 %rd616, %rd615, 32; 2026-02-21T09:42:48.8609630Z or.b64 %rd617, %rd614, %rd616; 2026-02-21T09:42:48.8609828Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8609890Z mov.b64 {%r1655, %r1656}, %rd617; 2026-02-21T09:42:48.8609961Z cvt.rn.f16x2.f32 %r1657, %r1656, %r1655; 2026-02-21T09:42:48.8610131Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8610197Z cvt.u64.u32 %rd618, %r928; 2026-02-21T09:42:48.8610257Z cvt.u64.u32 %rd619, %r929; 2026-02-21T09:42:48.8610316Z shl.b64 %rd620, %rd619, 32; 2026-02-21T09:42:48.8610385Z or.b64 %rd621, %rd618, %rd620; 2026-02-21T09:42:48.8610555Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8610640Z mov.b64 {%r1658, %r1659}, %rd621; 2026-02-21T09:42:48.8610715Z cvt.rn.f16x2.f32 %r1660, %r1659, %r1658; 2026-02-21T09:42:48.8610882Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8610942Z cvt.u64.u32 %rd622, %r930; 2026-02-21T09:42:48.8611002Z cvt.u64.u32 %rd623, %r931; 
2026-02-21T09:42:48.8611069Z shl.b64 %rd624, %rd623, 32; 2026-02-21T09:42:48.8611133Z or.b64 %rd625, %rd622, %rd624; 2026-02-21T09:42:48.8611304Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8611375Z mov.b64 {%r1661, %r1662}, %rd625; 2026-02-21T09:42:48.8611442Z cvt.rn.f16x2.f32 %r1663, %r1662, %r1661; 2026-02-21T09:42:48.8611615Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8611685Z cvt.u64.u32 %rd626, %r932; 2026-02-21T09:42:48.8611747Z cvt.u64.u32 %rd627, %r933; 2026-02-21T09:42:48.8611807Z shl.b64 %rd628, %rd627, 32; 2026-02-21T09:42:48.8611867Z or.b64 %rd629, %rd626, %rd628; 2026-02-21T09:42:48.8612047Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8612107Z mov.b64 {%r1664, %r1665}, %rd629; 2026-02-21T09:42:48.8612201Z cvt.rn.f16x2.f32 %r1666, %r1665, %r1664; 2026-02-21T09:42:48.8612378Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8612439Z cvt.u64.u32 %rd630, %r934; 2026-02-21T09:42:48.8612499Z cvt.u64.u32 %rd631, %r935; 2026-02-21T09:42:48.8612565Z shl.b64 %rd632, %rd631, 32; 2026-02-21T09:42:48.8612627Z or.b64 %rd633, %rd630, %rd632; 2026-02-21T09:42:48.8612798Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8612858Z mov.b64 {%r1667, %r1668}, %rd633; 2026-02-21T09:42:48.8612934Z cvt.rn.f16x2.f32 %r1669, %r1668, %r1667; 2026-02-21T09:42:48.8613104Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8613163Z cvt.u64.u32 %rd634, %r936; 2026-02-21T09:42:48.8613230Z cvt.u64.u32 %rd635, %r937; 2026-02-21T09:42:48.8613289Z shl.b64 %rd636, %rd635, 32; 2026-02-21T09:42:48.8613354Z or.b64 %rd637, %rd634, %rd636; 2026-02-21T09:42:48.8613530Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8613617Z mov.b64 {%r1670, 
%r1671}, %rd637; 2026-02-21T09:42:48.8613685Z cvt.rn.f16x2.f32 %r1672, %r1671, %r1670; 2026-02-21T09:42:48.8613854Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8613922Z cvt.u64.u32 %rd638, %r938; 2026-02-21T09:42:48.8613981Z cvt.u64.u32 %rd639, %r939; 2026-02-21T09:42:48.8614039Z shl.b64 %rd640, %rd639, 32; 2026-02-21T09:42:48.8614106Z or.b64 %rd641, %rd638, %rd640; 2026-02-21T09:42:48.8614272Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8614335Z mov.b64 {%r1673, %r1674}, %rd641; 2026-02-21T09:42:48.8614411Z cvt.rn.f16x2.f32 %r1675, %r1674, %r1673; 2026-02-21T09:42:48.8614601Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8614662Z cvt.u64.u32 %rd642, %r940; 2026-02-21T09:42:48.8614772Z cvt.u64.u32 %rd643, %r941; 2026-02-21T09:42:48.8614839Z shl.b64 %rd644, %rd643, 32; 2026-02-21T09:42:48.8614903Z or.b64 %rd645, %rd642, %rd644; 2026-02-21T09:42:48.8615087Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8615161Z mov.b64 {%r1676, %r1677}, %rd645; 2026-02-21T09:42:48.8615235Z cvt.rn.f16x2.f32 %r1678, %r1677, %r1676; 2026-02-21T09:42:48.8615415Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8615527Z cvt.u64.u32 %rd646, %r942; 2026-02-21T09:42:48.8615590Z cvt.u64.u32 %rd647, %r943; 2026-02-21T09:42:48.8615654Z shl.b64 %rd648, %rd647, 32; 2026-02-21T09:42:48.8615718Z or.b64 %rd649, %rd646, %rd648; 2026-02-21T09:42:48.8615914Z .loc 1 58 27 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:58:27 2026-02-21T09:42:48.8615979Z mov.b64 {%r1679, %r1680}, %rd649; 2026-02-21T09:42:48.8616053Z cvt.rn.f16x2.f32 %r1681, %r1680, %r1679; 2026-02-21T09:42:48.8616247Z .loc 1 59 83 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:59:83 2026-02-21T09:42:48.8616361Z st.shared.v4.b32 [%r58], {%r1300, %r1312, 
%r1324, %r1336}; 2026-02-21T09:42:48.8616471Z st.shared.v4.b32 [%r59], {%r1348, %r1360, %r1372, %r1384}; 2026-02-21T09:42:48.8616582Z st.shared.v4.b32 [%r60], {%r1396, %r1408, %r1420, %r1432}; 2026-02-21T09:42:48.8616682Z st.shared.v4.b32 [%r61], {%r1444, %r1456, %r1468, %r1480}; 2026-02-21T09:42:48.8616783Z st.shared.v4.b32 [%r62], {%r1492, %r1504, %r1516, %r1528}; 2026-02-21T09:42:48.8616881Z st.shared.v4.b32 [%r63], {%r1540, %r1552, %r1564, %r1576}; 2026-02-21T09:42:48.8616988Z st.shared.v4.b32 [%r64], {%r1588, %r1600, %r1612, %r1624}; 2026-02-21T09:42:48.8617114Z st.shared.v4.b32 [%r65], {%r1636, %r1648, %r1660, %r1672}; 2026-02-21T09:42:48.8617180Z bar.sync 0; 2026-02-21T09:42:48.8617253Z // begin inline asm 2026-02-21T09:42:48.8617434Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1105, %r1109, %r1113, %r1117}, [%r949]; 2026-02-21T09:42:48.8617498Z // end inline asm 2026-02-21T09:42:48.8617568Z // begin inline asm 2026-02-21T09:42:48.8617742Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1121, %r1125, %r1129, %r1133}, [%r954]; 2026-02-21T09:42:48.8617805Z // end inline asm 2026-02-21T09:42:48.8617869Z // begin inline asm 2026-02-21T09:42:48.8618046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1137, %r1141, %r1145, %r1149}, [%r959]; 2026-02-21T09:42:48.8618109Z // end inline asm 2026-02-21T09:42:48.8618171Z // begin inline asm 2026-02-21T09:42:48.8618341Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1153, %r1157, %r1161, %r1165}, [%r964]; 2026-02-21T09:42:48.8618402Z // end inline asm 2026-02-21T09:42:48.8618463Z // begin inline asm 2026-02-21T09:42:48.8618642Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1169, %r1173, %r1177, %r1181}, [%r969]; 2026-02-21T09:42:48.8618703Z // end inline asm 2026-02-21T09:42:48.8618796Z // begin inline asm 2026-02-21T09:42:48.8618960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1185, %r1189, %r1193, %r1197}, [%r974]; 2026-02-21T09:42:48.8619052Z // end inline asm 2026-02-21T09:42:48.8619113Z // begin inline asm 
2026-02-21T09:42:48.8619276Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1201, %r1205, %r1209, %r1213}, [%r979]; 2026-02-21T09:42:48.8619344Z // end inline asm 2026-02-21T09:42:48.8619405Z // begin inline asm 2026-02-21T09:42:48.8619565Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1217, %r1221, %r1225, %r1229}, [%r984]; 2026-02-21T09:42:48.8619627Z // end inline asm 2026-02-21T09:42:48.8619699Z bar.sync 0; 2026-02-21T09:42:48.8619802Z st.shared.v4.b32 [%r58], {%r1303, %r1315, %r1327, %r1339}; 2026-02-21T09:42:48.8619903Z st.shared.v4.b32 [%r59], {%r1351, %r1363, %r1375, %r1387}; 2026-02-21T09:42:48.8620013Z st.shared.v4.b32 [%r60], {%r1399, %r1411, %r1423, %r1435}; 2026-02-21T09:42:48.8620141Z st.shared.v4.b32 [%r61], {%r1447, %r1459, %r1471, %r1483}; 2026-02-21T09:42:48.8620245Z st.shared.v4.b32 [%r62], {%r1495, %r1507, %r1519, %r1531}; 2026-02-21T09:42:48.8620353Z st.shared.v4.b32 [%r63], {%r1543, %r1555, %r1567, %r1579}; 2026-02-21T09:42:48.8620453Z st.shared.v4.b32 [%r64], {%r1591, %r1603, %r1615, %r1627}; 2026-02-21T09:42:48.8620553Z st.shared.v4.b32 [%r65], {%r1639, %r1651, %r1663, %r1675}; 2026-02-21T09:42:48.8620613Z bar.sync 0; 2026-02-21T09:42:48.8620683Z // begin inline asm 2026-02-21T09:42:48.8620853Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r949]; 2026-02-21T09:42:48.8620940Z // end inline asm 2026-02-21T09:42:48.8621011Z // begin inline asm 2026-02-21T09:42:48.8621178Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1122, %r1126, %r1130, %r1134}, [%r954]; 2026-02-21T09:42:48.8621239Z // end inline asm 2026-02-21T09:42:48.8621312Z // begin inline asm 2026-02-21T09:42:48.8621479Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1138, %r1142, %r1146, %r1150}, [%r959]; 2026-02-21T09:42:48.8621542Z // end inline asm 2026-02-21T09:42:48.8621607Z // begin inline asm 2026-02-21T09:42:48.8621777Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1154, %r1158, %r1162, %r1166}, [%r964]; 2026-02-21T09:42:48.8621837Z // end inline asm 
2026-02-21T09:42:48.8621898Z // begin inline asm 2026-02-21T09:42:48.8622067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1170, %r1174, %r1178, %r1182}, [%r969]; 2026-02-21T09:42:48.8622128Z // end inline asm 2026-02-21T09:42:48.8622188Z // begin inline asm 2026-02-21T09:42:48.8622347Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1186, %r1190, %r1194, %r1198}, [%r974]; 2026-02-21T09:42:48.8622416Z // end inline asm 2026-02-21T09:42:48.8622476Z // begin inline asm 2026-02-21T09:42:48.8622638Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1202, %r1206, %r1210, %r1214}, [%r979]; 2026-02-21T09:42:48.8622706Z // end inline asm 2026-02-21T09:42:48.8622791Z // begin inline asm 2026-02-21T09:42:48.8622961Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1218, %r1222, %r1226, %r1230}, [%r984]; 2026-02-21T09:42:48.8623031Z // end inline asm 2026-02-21T09:42:48.8623092Z bar.sync 0; 2026-02-21T09:42:48.8623196Z st.shared.v4.b32 [%r58], {%r1306, %r1318, %r1330, %r1342}; 2026-02-21T09:42:48.8623297Z st.shared.v4.b32 [%r59], {%r1354, %r1366, %r1378, %r1390}; 2026-02-21T09:42:48.8623404Z st.shared.v4.b32 [%r60], {%r1402, %r1414, %r1426, %r1438}; 2026-02-21T09:42:48.8623504Z st.shared.v4.b32 [%r61], {%r1450, %r1462, %r1474, %r1486}; 2026-02-21T09:42:48.8623604Z st.shared.v4.b32 [%r62], {%r1498, %r1510, %r1522, %r1534}; 2026-02-21T09:42:48.8623713Z st.shared.v4.b32 [%r63], {%r1546, %r1558, %r1570, %r1582}; 2026-02-21T09:42:48.8623810Z st.shared.v4.b32 [%r64], {%r1594, %r1606, %r1618, %r1630}; 2026-02-21T09:42:48.8623910Z st.shared.v4.b32 [%r65], {%r1642, %r1654, %r1666, %r1678}; 2026-02-21T09:42:48.8623975Z bar.sync 0; 2026-02-21T09:42:48.8624038Z // begin inline asm 2026-02-21T09:42:48.8624211Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1107, %r1111, %r1115, %r1119}, [%r949]; 2026-02-21T09:42:48.8624310Z // end inline asm 2026-02-21T09:42:48.8624380Z // begin inline asm 2026-02-21T09:42:48.8624548Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1123, %r1127, %r1131, %r1135}, [%r954]; 
2026-02-21T09:42:48.8624608Z // end inline asm 2026-02-21T09:42:48.8624705Z // begin inline asm 2026-02-21T09:42:48.8624876Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1139, %r1143, %r1147, %r1151}, [%r959]; 2026-02-21T09:42:48.8624936Z // end inline asm 2026-02-21T09:42:48.8624997Z // begin inline asm 2026-02-21T09:42:48.8625169Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1155, %r1159, %r1163, %r1167}, [%r964]; 2026-02-21T09:42:48.8625231Z // end inline asm 2026-02-21T09:42:48.8625292Z // begin inline asm 2026-02-21T09:42:48.8625466Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1171, %r1175, %r1179, %r1183}, [%r969]; 2026-02-21T09:42:48.8625526Z // end inline asm 2026-02-21T09:42:48.8625589Z // begin inline asm 2026-02-21T09:42:48.8625790Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1187, %r1191, %r1195, %r1199}, [%r974]; 2026-02-21T09:42:48.8625854Z // end inline asm 2026-02-21T09:42:48.8625914Z // begin inline asm 2026-02-21T09:42:48.8626078Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1203, %r1207, %r1211, %r1215}, [%r979]; 2026-02-21T09:42:48.8626144Z // end inline asm 2026-02-21T09:42:48.8626206Z // begin inline asm 2026-02-21T09:42:48.8626369Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1219, %r1223, %r1227, %r1231}, [%r984]; 2026-02-21T09:42:48.8626438Z // end inline asm 2026-02-21T09:42:48.8626497Z bar.sync 0; 2026-02-21T09:42:48.8626631Z st.shared.v4.b32 [%r58], {%r1309, %r1321, %r1333, %r1345}; 2026-02-21T09:42:48.8626740Z st.shared.v4.b32 [%r59], {%r1357, %r1369, %r1381, %r1393}; 2026-02-21T09:42:48.8626842Z st.shared.v4.b32 [%r60], {%r1405, %r1417, %r1429, %r1441}; 2026-02-21T09:42:48.8626944Z st.shared.v4.b32 [%r61], {%r1453, %r1465, %r1477, %r1489}; 2026-02-21T09:42:48.8627046Z st.shared.v4.b32 [%r62], {%r1501, %r1513, %r1525, %r1537}; 2026-02-21T09:42:48.8627152Z st.shared.v4.b32 [%r63], {%r1549, %r1561, %r1573, %r1585}; 2026-02-21T09:42:48.8627251Z st.shared.v4.b32 [%r64], {%r1597, %r1609, %r1621, %r1633}; 2026-02-21T09:42:48.8627348Z 
st.shared.v4.b32 [%r65], {%r1645, %r1657, %r1669, %r1681}; 2026-02-21T09:42:48.8627416Z bar.sync 0; 2026-02-21T09:42:48.8627479Z // begin inline asm 2026-02-21T09:42:48.8627647Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1108, %r1112, %r1116, %r1120}, [%r949]; 2026-02-21T09:42:48.8627714Z // end inline asm 2026-02-21T09:42:48.8627775Z // begin inline asm 2026-02-21T09:42:48.8627943Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1124, %r1128, %r1132, %r1136}, [%r954]; 2026-02-21T09:42:48.8628002Z // end inline asm 2026-02-21T09:42:48.8628070Z // begin inline asm 2026-02-21T09:42:48.8628237Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1140, %r1144, %r1148, %r1152}, [%r959]; 2026-02-21T09:42:48.8628326Z // end inline asm 2026-02-21T09:42:48.8628397Z // begin inline asm 2026-02-21T09:42:48.8628566Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1156, %r1160, %r1164, %r1168}, [%r964]; 2026-02-21T09:42:48.8628628Z // end inline asm 2026-02-21T09:42:48.8628689Z // begin inline asm 2026-02-21T09:42:48.8628862Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1172, %r1176, %r1180, %r1184}, [%r969]; 2026-02-21T09:42:48.8628923Z // end inline asm 2026-02-21T09:42:48.8628983Z // begin inline asm 2026-02-21T09:42:48.8629159Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1188, %r1192, %r1196, %r1200}, [%r974]; 2026-02-21T09:42:48.8629215Z // end inline asm 2026-02-21T09:42:48.8629273Z // begin inline asm 2026-02-21T09:42:48.8629436Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1204, %r1208, %r1212, %r1216}, [%r979]; 2026-02-21T09:42:48.8629490Z // end inline asm 2026-02-21T09:42:48.8629546Z // begin inline asm 2026-02-21T09:42:48.8629698Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1220, %r1224, %r1228, %r1232}, [%r984]; 2026-02-21T09:42:48.8629765Z // end inline asm 2026-02-21T09:42:48.8629822Z // begin inline asm 2026-02-21T09:42:48.8629970Z st.global.v4.b32 [ %rd106 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T09:42:48.8630034Z // end inline asm 2026-02-21T09:42:48.8630091Z // 
begin inline asm 2026-02-21T09:42:48.8630205Z st.global.v4.b32 [ %rd107 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T09:42:48.8630260Z // end inline asm 2026-02-21T09:42:48.8630324Z // begin inline asm 2026-02-21T09:42:48.8630432Z st.global.v4.b32 [ %rd108 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T09:42:48.8630489Z // end inline asm 2026-02-21T09:42:48.8630556Z // begin inline asm 2026-02-21T09:42:48.8630664Z st.global.v4.b32 [ %rd109 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T09:42:48.8630721Z // end inline asm 2026-02-21T09:42:48.8630787Z // begin inline asm 2026-02-21T09:42:48.8630894Z st.global.v4.b32 [ %rd110 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T09:42:48.8630962Z // end inline asm 2026-02-21T09:42:48.8631053Z // begin inline asm 2026-02-21T09:42:48.8631166Z st.global.v4.b32 [ %rd111 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T09:42:48.8631223Z // end inline asm 2026-02-21T09:42:48.8631281Z // begin inline asm 2026-02-21T09:42:48.8631391Z st.global.v4.b32 [ %rd112 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T09:42:48.8631447Z // end inline asm 2026-02-21T09:42:48.8631503Z // begin inline asm 2026-02-21T09:42:48.8631604Z st.global.v4.b32 [ %rd113 + 0 ], { %r1133, %r1134, %r1135, %r1136 }; 2026-02-21T09:42:48.8631668Z // end inline asm 2026-02-21T09:42:48.8631725Z // begin inline asm 2026-02-21T09:42:48.8631851Z st.global.v4.b32 [ %rd114 + 0 ], { %r1137, %r1138, %r1139, %r1140 }; 2026-02-21T09:42:48.8631914Z // end inline asm 2026-02-21T09:42:48.8631971Z // begin inline asm 2026-02-21T09:42:48.8632072Z st.global.v4.b32 [ %rd115 + 0 ], { %r1141, %r1142, %r1143, %r1144 }; 2026-02-21T09:42:48.8632135Z // end inline asm 2026-02-21T09:42:48.8632193Z // begin inline asm 2026-02-21T09:42:48.8632296Z st.global.v4.b32 [ %rd116 + 0 ], { %r1145, %r1146, %r1147, %r1148 }; 2026-02-21T09:42:48.8632353Z // end inline asm 2026-02-21T09:42:48.8632419Z // begin inline asm 2026-02-21T09:42:48.8632521Z 
st.global.v4.b32 [ %rd117 + 0 ], { %r1149, %r1150, %r1151, %r1152 }; 2026-02-21T09:42:48.8632577Z // end inline asm 2026-02-21T09:42:48.8632640Z // begin inline asm 2026-02-21T09:42:48.8632742Z st.global.v4.b32 [ %rd118 + 0 ], { %r1153, %r1154, %r1155, %r1156 }; 2026-02-21T09:42:48.8632799Z // end inline asm 2026-02-21T09:42:48.8632855Z // begin inline asm 2026-02-21T09:42:48.8632963Z st.global.v4.b32 [ %rd119 + 0 ], { %r1157, %r1158, %r1159, %r1160 }; 2026-02-21T09:42:48.8633020Z // end inline asm 2026-02-21T09:42:48.8633077Z // begin inline asm 2026-02-21T09:42:48.8633187Z st.global.v4.b32 [ %rd120 + 0 ], { %r1161, %r1162, %r1163, %r1164 }; 2026-02-21T09:42:48.8633242Z // end inline asm 2026-02-21T09:42:48.8633299Z // begin inline asm 2026-02-21T09:42:48.8633432Z st.global.v4.b32 [ %rd121 + 0 ], { %r1165, %r1166, %r1167, %r1168 }; 2026-02-21T09:42:48.8633492Z // end inline asm 2026-02-21T09:42:48.8633550Z // begin inline asm 2026-02-21T09:42:48.8633652Z st.global.v4.b32 [ %rd122 + 0 ], { %r1169, %r1170, %r1171, %r1172 }; 2026-02-21T09:42:48.8633716Z // end inline asm 2026-02-21T09:42:48.8633772Z // begin inline asm 2026-02-21T09:42:48.8633872Z st.global.v4.b32 [ %rd123 + 0 ], { %r1173, %r1174, %r1175, %r1176 }; 2026-02-21T09:42:48.8633935Z // end inline asm 2026-02-21T09:42:48.8633993Z // begin inline asm 2026-02-21T09:42:48.8634093Z st.global.v4.b32 [ %rd124 + 0 ], { %r1177, %r1178, %r1179, %r1180 }; 2026-02-21T09:42:48.8634150Z // end inline asm 2026-02-21T09:42:48.8634215Z // begin inline asm 2026-02-21T09:42:48.8634316Z st.global.v4.b32 [ %rd125 + 0 ], { %r1181, %r1182, %r1183, %r1184 }; 2026-02-21T09:42:48.8634373Z // end inline asm 2026-02-21T09:42:48.8634436Z // begin inline asm 2026-02-21T09:42:48.8634543Z st.global.v4.b32 [ %rd126 + 0 ], { %r1185, %r1186, %r1187, %r1188 }; 2026-02-21T09:42:48.8634603Z // end inline asm 2026-02-21T09:42:48.8634664Z // begin inline asm 2026-02-21T09:42:48.8634859Z st.global.v4.b32 [ %rd127 + 0 ], { %r1189, %r1190, %r1191, 
%r1192 }; 2026-02-21T09:42:48.8634919Z // end inline asm 2026-02-21T09:42:48.8634981Z // begin inline asm 2026-02-21T09:42:48.8635101Z st.global.v4.b32 [ %rd128 + 0 ], { %r1193, %r1194, %r1195, %r1196 }; 2026-02-21T09:42:48.8635160Z // end inline asm 2026-02-21T09:42:48.8635221Z // begin inline asm 2026-02-21T09:42:48.8635337Z st.global.v4.b32 [ %rd129 + 0 ], { %r1197, %r1198, %r1199, %r1200 }; 2026-02-21T09:42:48.8635398Z // end inline asm 2026-02-21T09:42:48.8635461Z // begin inline asm 2026-02-21T09:42:48.8635570Z st.global.v4.b32 [ %rd130 + 0 ], { %r1201, %r1202, %r1203, %r1204 }; 2026-02-21T09:42:48.8635637Z // end inline asm 2026-02-21T09:42:48.8635698Z // begin inline asm 2026-02-21T09:42:48.8635811Z st.global.v4.b32 [ %rd131 + 0 ], { %r1205, %r1206, %r1207, %r1208 }; 2026-02-21T09:42:48.8635910Z // end inline asm 2026-02-21T09:42:48.8635973Z // begin inline asm 2026-02-21T09:42:48.8636085Z st.global.v4.b32 [ %rd132 + 0 ], { %r1209, %r1210, %r1211, %r1212 }; 2026-02-21T09:42:48.8636144Z // end inline asm 2026-02-21T09:42:48.8636213Z // begin inline asm 2026-02-21T09:42:48.8636327Z st.global.v4.b32 [ %rd133 + 0 ], { %r1213, %r1214, %r1215, %r1216 }; 2026-02-21T09:42:48.8636382Z // end inline asm 2026-02-21T09:42:48.8636447Z // begin inline asm 2026-02-21T09:42:48.8636546Z st.global.v4.b32 [ %rd134 + 0 ], { %r1217, %r1218, %r1219, %r1220 }; 2026-02-21T09:42:48.8636601Z // end inline asm 2026-02-21T09:42:48.8636693Z // begin inline asm 2026-02-21T09:42:48.8636794Z st.global.v4.b32 [ %rd135 + 0 ], { %r1221, %r1222, %r1223, %r1224 }; 2026-02-21T09:42:48.8636850Z // end inline asm 2026-02-21T09:42:48.8636907Z // begin inline asm 2026-02-21T09:42:48.8637016Z st.global.v4.b32 [ %rd136 + 0 ], { %r1225, %r1226, %r1227, %r1228 }; 2026-02-21T09:42:48.8637073Z // end inline asm 2026-02-21T09:42:48.8637132Z // begin inline asm 2026-02-21T09:42:48.8637239Z st.global.v4.b32 [ %rd137 + 0 ], { %r1229, %r1230, %r1231, %r1232 }; 2026-02-21T09:42:48.8637297Z // end inline asm 
2026-02-21T09:42:48.8637357Z bra.uni $L__BB0_10; 2026-02-21T09:42:48.8637447Z $L__BB0_11: // %._crit_edge 2026-02-21T09:42:48.8637640Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8637702Z @%p58 bra $L__BB0_13; 2026-02-21T09:42:48.8637756Z // %bb.12: 2026-02-21T09:42:48.8637939Z .loc 1 56 52 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:56:52 2026-02-21T09:42:48.8637999Z // begin inline asm 2026-02-21T09:42:48.8638052Z 2026-02-21T09:42:48.8638112Z { 2026-02-21T09:42:48.8638174Z .reg .pred complete; 2026-02-21T09:42:48.8638230Z waitLoop: 2026-02-21T09:42:48.8638360Z mbarrier.try_wait.parity.shared.b64 complete, [%r1730], %r1731; 2026-02-21T09:42:48.8638466Z @!complete bra.uni waitLoop; 2026-02-21T09:42:48.8638523Z } 2026-02-21T09:42:48.8638528Z 2026-02-21T09:42:48.8638588Z // end inline asm 2026-02-21T09:42:48.8638655Z $L__BB0_13: 2026-02-21T09:42:48.8638830Z .loc 1 30 84 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:84 2026-02-21T09:42:48.8638899Z cp.async.wait_group 0; 2026-02-21T09:42:48.8638954Z bar.sync 0; 2026-02-21T09:42:48.8639020Z // begin inline asm 2026-02-21T09:42:48.8639116Z @%p118 mbarrier.inval.shared::cta.b64 [%r411]; 2026-02-21T09:42:48.8639174Z // end inline asm 2026-02-21T09:42:48.8639238Z bar.sync 0; 2026-02-21T09:42:48.8639295Z // begin inline asm 2026-02-21T09:42:48.8639388Z @%p118 mbarrier.inval.shared::cta.b64 [%r412]; 2026-02-21T09:42:48.8639450Z // end inline asm 2026-02-21T09:42:48.8639505Z bar.sync 0; 2026-02-21T09:42:48.8639561Z // begin inline asm 2026-02-21T09:42:48.8639645Z @%p118 mbarrier.inval.shared::cta.b64 [%r413]; 2026-02-21T09:42:48.8639710Z // end inline asm 2026-02-21T09:42:48.8639768Z bar.sync 0; 2026-02-21T09:42:48.8639826Z // begin inline asm 2026-02-21T09:42:48.8639941Z @%p118 mbarrier.inval.shared::cta.b64 [%r414]; 2026-02-21T09:42:48.8639999Z // end inline asm 2026-02-21T09:42:48.8640053Z bar.sync 0; 2026-02-21T09:42:48.8640109Z // begin 
inline asm 2026-02-21T09:42:48.8640197Z @%p118 mbarrier.inval.shared::cta.b64 [%r415]; 2026-02-21T09:42:48.8640252Z // end inline asm 2026-02-21T09:42:48.8640307Z bar.sync 0; 2026-02-21T09:42:48.8640371Z // begin inline asm 2026-02-21T09:42:48.8640451Z @%p118 mbarrier.inval.shared::cta.b64 [%r556]; 2026-02-21T09:42:48.8640508Z // end inline asm 2026-02-21T09:42:48.8640574Z add.s32 %r1690, %r128, 229424; 2026-02-21T09:42:48.8640639Z // begin inline asm 2026-02-21T09:42:48.8640726Z @%p118 mbarrier.inval.shared::cta.b64 [%r1690]; 2026-02-21T09:42:48.8640781Z // end inline asm 2026-02-21T09:42:48.8640844Z bar.sync 0; 2026-02-21T09:42:48.8640901Z // begin inline asm 2026-02-21T09:42:48.8641009Z @%p118 mbarrier.inval.shared::cta.b64 [%r410]; 2026-02-21T09:42:48.8641073Z // end inline asm 2026-02-21T09:42:48.8641244Z .loc 1 30 4 // c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py:30:4 2026-02-21T09:42:48.8641303Z bar.sync 0; 2026-02-21T09:42:48.8641360Z // begin inline asm 2026-02-21T09:42:48.8641487Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1692, 512; 2026-02-21T09:42:48.8641545Z // end inline asm 2026-02-21T09:42:48.8641598Z ret; 2026-02-21T09:42:48.8641660Z $L__tmp1: 2026-02-21T09:42:48.8641717Z $L__func_end0: 2026-02-21T09:42:48.8641807Z // -- End function 2026-02-21T09:42:48.8641883Z } 2026-02-21T09:42:48.8642111Z .file 1 "/tmp/torchinductor_root/7q/c7q65hxat7sgcp6e643ojvyummdvs6ds3vxu47wqv2waguyvrcqp.py" 2026-02-21T09:42:48.8642175Z .section .debug_abbrev 2026-02-21T09:42:48.8642227Z { 2026-02-21T09:42:48.8642325Z .b8 1 // Abbreviation Code 2026-02-21T09:42:48.8642418Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:42:48.8642501Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:42:48.8642593Z .b8 37 // DW_AT_producer 2026-02-21T09:42:48.8642670Z .b8 8 // DW_FORM_string 2026-02-21T09:42:48.8642746Z .b8 19 // DW_AT_language 2026-02-21T09:42:48.8642826Z .b8 5 // DW_FORM_data2 2026-02-21T09:42:48.8642911Z .b8 3 // DW_AT_name 2026-02-21T09:42:48.8642984Z .b8 8 // 
DW_FORM_string 2026-02-21T09:42:48.8643066Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:42:48.8643154Z .b8 6 // DW_FORM_data4 2026-02-21T09:42:48.8643231Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:42:48.8643306Z .b8 8 // DW_FORM_string 2026-02-21T09:42:48.8643423Z .b8 0 // EOM(1) 2026-02-21T09:42:48.8643496Z .b8 0 // EOM(2) 2026-02-21T09:42:48.8643565Z .b8 0 // EOM(3) 2026-02-21T09:42:48.8643619Z } 2026-02-21T09:42:48.8643687Z .section .debug_info 2026-02-21T09:42:48.8643739Z { 2026-02-21T09:42:48.8643826Z .b32 104 // Length of Unit 2026-02-21T09:42:48.8643922Z .b8 2 // DWARF version number 2026-02-21T09:42:48.8643975Z .b8 0 2026-02-21T09:42:48.8644093Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:42:48.8644193Z .b8 8 // Address Size (in bytes) 2026-02-21T09:42:48.8644297Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:42:48.8644379Z .b8 116 // DW_AT_producer 2026-02-21T09:42:48.8644434Z .b8 114 2026-02-21T09:42:48.8644495Z .b8 105 2026-02-21T09:42:48.8644548Z .b8 116 2026-02-21T09:42:48.8644601Z .b8 111 2026-02-21T09:42:48.8644661Z .b8 110 2026-02-21T09:42:48.8644803Z .b8 0 2026-02-21T09:42:48.8644881Z .b8 2 // DW_AT_language 2026-02-21T09:42:48.8644932Z .b8 0 2026-02-21T09:42:48.8645018Z .b8 99 // DW_AT_name 2026-02-21T09:42:48.8645072Z .b8 55 2026-02-21T09:42:48.8645124Z .b8 113 2026-02-21T09:42:48.8645184Z .b8 54 2026-02-21T09:42:48.8645240Z .b8 53 2026-02-21T09:42:48.8645297Z .b8 104 2026-02-21T09:42:48.8645354Z .b8 120 2026-02-21T09:42:48.8645420Z .b8 97 2026-02-21T09:42:48.8645475Z .b8 116 2026-02-21T09:42:48.8645533Z .b8 55 2026-02-21T09:42:48.8645595Z .b8 115 2026-02-21T09:42:48.8645652Z .b8 103 2026-02-21T09:42:48.8645707Z .b8 99 2026-02-21T09:42:48.8645764Z .b8 112 2026-02-21T09:42:48.8645834Z .b8 54 2026-02-21T09:42:48.8645890Z .b8 101 2026-02-21T09:42:48.8645948Z .b8 54 2026-02-21T09:42:48.8646006Z .b8 52 2026-02-21T09:42:48.8646074Z .b8 51 2026-02-21T09:42:48.8646135Z .b8 111 2026-02-21T09:42:48.8646225Z .b8 106 
2026-02-21T09:42:48.8646292Z .b8 118 2026-02-21T09:42:48.8646349Z .b8 121 2026-02-21T09:42:48.8646408Z .b8 117 2026-02-21T09:42:48.8646463Z .b8 109 2026-02-21T09:42:48.8646530Z .b8 109 2026-02-21T09:42:48.8646587Z .b8 100 2026-02-21T09:42:48.8646643Z .b8 118 2026-02-21T09:42:48.8646706Z .b8 115 2026-02-21T09:42:48.8646763Z .b8 54 2026-02-21T09:42:48.8646819Z .b8 100 2026-02-21T09:42:48.8646874Z .b8 115 2026-02-21T09:42:48.8646938Z .b8 51 2026-02-21T09:42:48.8646996Z .b8 118 2026-02-21T09:42:48.8647053Z .b8 120 2026-02-21T09:42:48.8647118Z .b8 117 2026-02-21T09:42:48.8647175Z .b8 52 2026-02-21T09:42:48.8647262Z .b8 55 2026-02-21T09:42:48.8647318Z .b8 119 2026-02-21T09:42:48.8647382Z .b8 113 2026-02-21T09:42:48.8647440Z .b8 118 2026-02-21T09:42:48.8647496Z .b8 50 2026-02-21T09:42:48.8647552Z .b8 119 2026-02-21T09:42:48.8647617Z .b8 97 2026-02-21T09:42:48.8647674Z .b8 103 2026-02-21T09:42:48.8647730Z .b8 117 2026-02-21T09:42:48.8647793Z .b8 121 2026-02-21T09:42:48.8647851Z .b8 118 2026-02-21T09:42:48.8647909Z .b8 114 2026-02-21T09:42:48.8647967Z .b8 99 2026-02-21T09:42:48.8648033Z .b8 113 2026-02-21T09:42:48.8648090Z .b8 112 2026-02-21T09:42:48.8648146Z .b8 46 2026-02-21T09:42:48.8648209Z .b8 112 2026-02-21T09:42:48.8648266Z .b8 121 2026-02-21T09:42:48.8648323Z .b8 0 2026-02-21T09:42:48.8648427Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:42:48.8648521Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:42:48.8648577Z .b8 116 2026-02-21T09:42:48.8648634Z .b8 109 2026-02-21T09:42:48.8648695Z .b8 112 2026-02-21T09:42:48.8648750Z .b8 47 2026-02-21T09:42:48.8648805Z .b8 116 2026-02-21T09:42:48.8648864Z .b8 111 2026-02-21T09:42:48.8648927Z .b8 114 2026-02-21T09:42:48.8648982Z .b8 99 2026-02-21T09:42:48.8649038Z .b8 104 2026-02-21T09:42:48.8649092Z .b8 105 2026-02-21T09:42:48.8649155Z .b8 110 2026-02-21T09:42:48.8649210Z .b8 100 2026-02-21T09:42:48.8649266Z .b8 117 2026-02-21T09:42:48.8649328Z .b8 99 2026-02-21T09:42:48.8649417Z .b8 116 2026-02-21T09:42:48.8649475Z .b8 111 
2026-02-21T09:42:48.8649530Z .b8 114 2026-02-21T09:42:48.8649594Z .b8 95 2026-02-21T09:42:48.8649652Z .b8 114 2026-02-21T09:42:48.8649708Z .b8 111 2026-02-21T09:42:48.8649771Z .b8 111 2026-02-21T09:42:48.8649826Z .b8 116 2026-02-21T09:42:48.8649883Z .b8 47 2026-02-21T09:42:48.8649939Z .b8 55 2026-02-21T09:42:48.8650004Z .b8 113 2026-02-21T09:42:48.8650060Z .b8 0 2026-02-21T09:42:48.8650116Z } 2026-02-21T09:42:48.8650195Z .section .debug_macinfo { } 2026-02-21T09:42:48.8650200Z 2026-02-21T09:42:48.8650286Z ================================================================ 2026-02-21T09:42:48.8650402Z please share the reproducer above with Triton project. 2026-02-21T09:42:52.0293382Z 2026-02-21T09:42:52.0296045Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 18.3 configs/s 2026-02-21T09:42:54.4343946Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 419.1 2026-02-21T09:42:54.4344340Z configs/s 2026-02-21T09:42:54.5806684Z [120s] Generation 6 complete: 2026-02-21T09:42:54.5811228Z error=11 2026-02-21T09:42:54.5815828Z ok=74 2026-02-21T09:42:54.5817180Z min=0.0388 2026-02-21T09:42:54.5817347Z mid=0.0656 2026-02-21T09:42:54.5817489Z max=3.1632 2026-02-21T09:42:54.5817636Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:42:54.5817889Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:42:54.5818112Z 'l2_groupings': [32], 2026-02-21T09:42:54.5818287Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:42:54.5818492Z 'loop_orders': [[1, 0]], 2026-02-21T09:42:54.5818657Z 'num_stages': 3, 2026-02-21T09:42:54.5818829Z 'num_warps': 4, 2026-02-21T09:42:54.5818975Z 'pid_type': 'flat', 2026-02-21T09:42:54.5819147Z 'range_flattens': [None, False], 2026-02-21T09:42:54.5819343Z 'range_multi_buffers': [None, True], 2026-02-21T09:42:54.5819542Z 'range_num_stages': [0, 0], 2026-02-21T09:42:54.5819721Z 'range_unroll_factors': [0, 0], 2026-02-21T09:42:54.5820143Z 'range_warp_specializes': [None, None]} 2026-02-21T09:42:54.5836336Z [120s] 
Fitting surrogate: 628 points, 628 targets 2026-02-21T09:42:56.2385159Z [122s] Generation 7 starting: 62 neighbors, 4 active search path(s) 2026-02-21T09:43:04.3324607Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62/62 1.4 configs/s 2026-02-21T09:43:07.2616707Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 62/62 20.2 configs/s 2026-02-21T09:43:09.9286291Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 378.0 2026-02-21T09:43:09.9290398Z configs/s 2026-02-21T09:43:10.0836487Z [136s] Generation 7 complete: 2026-02-21T09:43:10.0836759Z error=14 2026-02-21T09:43:10.0836920Z ok=52 2026-02-21T09:43:10.0837080Z min=0.0388 2026-02-21T09:43:10.0837250Z mid=0.0513 2026-02-21T09:43:10.0837418Z max=11.8103 2026-02-21T09:43:10.0837560Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:10.0837828Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:10.0838045Z 'l2_groupings': [32], 2026-02-21T09:43:10.0838223Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:10.0838421Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:10.0838577Z 'num_stages': 3, 2026-02-21T09:43:10.0838729Z 'num_warps': 4, 2026-02-21T09:43:10.0838869Z 'pid_type': 'flat', 2026-02-21T09:43:10.0839032Z 'range_flattens': [None, False], 2026-02-21T09:43:10.0839212Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:10.0839400Z 'range_num_stages': [0, 0], 2026-02-21T09:43:10.0839568Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:10.0839758Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:10.0859071Z [136s] Fitting surrogate: 694 points, 694 targets 2026-02-21T09:43:10.9725916Z [137s] Generation 8 starting: 48 neighbors, 3 active search path(s) 2026-02-21T09:43:14.7975665Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48/48 19.3 configs/s 2026-02-21T09:43:16.8087620Z 2026-02-21T09:43:16.8088012Z 2026-02-21T09:43:16.8088463Z ================================================================ 2026-02-21T09:43:16.8088805Z Internal 
Triton PTX codegen error 2026-02-21T09:43:16.8089009Z `ptxas` stderr: 2026-02-21T09:43:16.8089517Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 200 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:43:16.8090083Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:43:16.8090266Z 2026-02-21T09:43:16.8090742Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpwdw09qol.ptx -o /tmp/tmpwdw09qol.ptx.o 2026-02-21T09:43:16.8091322Z 2026-02-21T09:43:16.8091327Z 2026-02-21T09:43:16.8091394Z // 2026-02-21T09:43:16.8091555Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:43:16.8091771Z // 2026-02-21T09:43:16.8091855Z 2026-02-21T09:43:16.8091928Z .version 8.7 2026-02-21T09:43:16.8092110Z .target sm_100a 2026-02-21T09:43:16.8092306Z .address_size 64 2026-02-21T09:43:16.8092425Z 2026-02-21T09:43:16.8092574Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:43:16.8093143Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:43:16.8093828Z [143s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:43:16.8095556Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=3, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:43:16.8097011Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:43:16.8097303Z `ptxas` stderr: 2026-02-21T09:43:16.8097920Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 200 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:43:16.8098501Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:43:16.8098676Z 2026-02-21T09:43:16.8099148Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpwdw09qol.ptx -o /tmp/tmpwdw09qol.ptx.o 2026-02-21T09:43:16.8099713Z 2026-02-21T09:43:16.8099879Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:43:16.8100336Z // @_helion_matmul 2026-02-21T09:43:16.8100620Z .visible .entry _helion_matmul( 2026-02-21T09:43:16.8100885Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:43:16.8101190Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:43:16.8101487Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:43:16.8101784Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:43:16.8102081Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:43:16.8102330Z ) 2026-02-21T09:43:16.8102462Z .reqntid 128 2026-02-21T09:43:16.8102616Z .maxnreg 32 2026-02-21T09:43:16.8102749Z { 2026-02-21T09:43:16.8102894Z .reg .pred %p<42>; 2026-02-21T09:43:16.8103059Z .reg .b16 %rs<4>; 2026-02-21T09:43:16.8103224Z .reg .b32 %r<1016>; 2026-02-21T09:43:16.8103391Z .reg .b64 %rd<475>; 2026-02-21T09:43:16.8103545Z $L__func_begin0: 2026-02-21T09:43:16.8103638Z 2026-02-21T09:43:16.8103703Z // %bb.0: 2026-02-21T09:43:16.8103992Z .loc 1 14 0 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:14 2026-02-21T09:43:16.8104337Z mov.u32 %r1, %tid.x; 2026-02-21T09:43:16.8104510Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:43:16.8104754Z mov.b32 %r63, global_smem; 2026-02-21T09:43:16.8104938Z // begin inline asm 2026-02-21T09:43:16.8105321Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r63], 128; 2026-02-21T09:43:16.8105623Z // end inline asm 2026-02-21T09:43:16.8105778Z bar.sync 0; 2026-02-21T09:43:16.8105948Z ld.shared.b32 %r1009, [global_smem]; 2026-02-21T09:43:16.8106146Z bar.sync 0; 2026-02-21T09:43:16.8106300Z // begin inline asm 2026-02-21T09:43:16.8106532Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:43:16.8106803Z // end inline asm 2026-02-21T09:43:16.8107098Z .loc 1 21 30 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:21:30 2026-02-21T09:43:16.8107462Z mov.u32 %r3, %ctaid.x; 2026-02-21T09:43:16.8107805Z .loc 1 23 74 // 
cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:23:74 2026-02-21T09:43:16.8108195Z setp.gt.u32 %p3, %r3, 767; 2026-02-21T09:43:16.8108411Z @%p3 bra $L__BB0_8; 2026-02-21T09:43:16.8108618Z // %bb.1: // %.lr.ph 2026-02-21T09:43:16.8108993Z .loc 1 0 74 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:0:74 2026-02-21T09:43:16.8109453Z ld.param.b64 %rd69, [_helion_matmul_param_1]; 2026-02-21T09:43:16.8109717Z ld.param.b64 %rd68, [_helion_matmul_param_0]; 2026-02-21T09:43:16.8110088Z .loc 1 35 45 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:35:45 2026-02-21T09:43:16.8110431Z shl.b32 %r266, %r1, 3; 2026-02-21T09:43:16.8110745Z .loc 1 43 48 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:43:48 2026-02-21T09:43:16.8111089Z and.b32 %r267, %r266, 56; 2026-02-21T09:43:16.8111407Z .loc 1 35 45 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:35:45 2026-02-21T09:43:16.8111753Z and.b32 %r268, %r266, 120; 2026-02-21T09:43:16.8111935Z shr.u32 %r269, %r1, 4; 2026-02-21T09:43:16.8112246Z .loc 1 36 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:36:27 2026-02-21T09:43:16.8112580Z shl.b32 %r270, %r3, 2; 2026-02-21T09:43:16.8112825Z and.b32 %r33, %r270, 896; 2026-02-21T09:43:16.8113140Z .loc 1 35 45 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:35:45 2026-02-21T09:43:16.8113485Z or.b32 %r271, %r269, %r33; 2026-02-21T09:43:16.8113666Z bfe.u32 %r272, %r1, 4, 3; 2026-02-21T09:43:16.8113845Z shr.u32 %r273, %r1, 3; 2026-02-21T09:43:16.8114013Z or.b32 %r4, %r273, 112; 2026-02-21T09:43:16.8114192Z bfe.u32 %r5, %r1, 3, 4; 2026-02-21T09:43:16.8114362Z or.b32 %r274, %r5, 96; 2026-02-21T09:43:16.8114524Z or.b32 %r275, %r5, 80; 2026-02-21T09:43:16.8114734Z or.b32 %r276, %r5, 64; 2026-02-21T09:43:16.8114965Z or.b32 %r277, %r5, 48; 2026-02-21T09:43:16.8115130Z or.b32 %r278, %r5, 32; 2026-02-21T09:43:16.8115288Z or.b32 %r279, %r5, 16; 2026-02-21T09:43:16.8115463Z shr.u32 %r280, %r1, 5; 2026-02-21T09:43:16.8115649Z 
setp.eq.b32 %p39, %r1, 0; 2026-02-21T09:43:16.8115845Z shl.b32 %r281, %r1, 4; 2026-02-21T09:43:16.8116025Z and.b32 %r282, %r281, 2032; 2026-02-21T09:43:16.8116233Z shl.b32 %r283, %r1, 1; 2026-02-21T09:43:16.8116428Z and.b32 %r284, %r283, 112; 2026-02-21T09:43:16.8116642Z xor.b32 %r6, %r282, %r284; 2026-02-21T09:43:16.8116836Z add.s32 %r202, %r63, %r6; 2026-02-21T09:43:16.8117009Z add.s32 %r383, %r202, 32768; 2026-02-21T09:43:16.8117196Z add.s32 %r385, %r202, 34816; 2026-02-21T09:43:16.8117368Z add.s32 %r387, %r202, 36864; 2026-02-21T09:43:16.8117548Z add.s32 %r389, %r202, 38912; 2026-02-21T09:43:16.8117719Z add.s32 %r391, %r202, 40960; 2026-02-21T09:43:16.8117902Z add.s32 %r393, %r202, 43008; 2026-02-21T09:43:16.8118072Z add.s32 %r395, %r202, 45056; 2026-02-21T09:43:16.8118256Z add.s32 %r397, %r202, 47104; 2026-02-21T09:43:16.8118447Z add.s32 %r399, %r202, 81920; 2026-02-21T09:43:16.8118627Z add.s32 %r401, %r202, 83968; 2026-02-21T09:43:16.8118816Z add.s32 %r403, %r202, 86016; 2026-02-21T09:43:16.8118997Z add.s32 %r405, %r202, 88064; 2026-02-21T09:43:16.8119268Z add.s32 %r407, %r202, 90112; 2026-02-21T09:43:16.8119460Z add.s32 %r409, %r202, 92160; 2026-02-21T09:43:16.8119650Z add.s32 %r411, %r202, 94208; 2026-02-21T09:43:16.8119834Z add.s32 %r413, %r202, 96256; 2026-02-21T09:43:16.8120025Z shl.b32 %r286, %r1, 10; 2026-02-21T09:43:16.8120210Z and.b32 %r287, %r286, 6144; 2026-02-21T09:43:16.8120394Z or.b32 %r288, %r287, %r282; 2026-02-21T09:43:16.8120583Z xor.b32 %r289, %r288, 32; 2026-02-21T09:43:16.8120762Z xor.b32 %r290, %r288, 64; 2026-02-21T09:43:16.8120946Z xor.b32 %r291, %r288, 96; 2026-02-21T09:43:16.8121123Z and.b32 %r292, %r1, 96; 2026-02-21T09:43:16.8121309Z shl.b32 %r293, %r292, 6; 2026-02-21T09:43:16.8121490Z shl.b32 %r294, %r1, 5; 2026-02-21T09:43:16.8121668Z and.b32 %r295, %r294, 96; 2026-02-21T09:43:16.8121847Z and.b32 %r296, %r281, 384; 2026-02-21T09:43:16.8122039Z shl.b32 %r297, %r1, 2; 2026-02-21T09:43:16.8122219Z and.b32 %r298, %r297, 16; 
2026-02-21T09:43:16.8122395Z or.b32 %r299, %r293, %r295; 2026-02-21T09:43:16.8122587Z or.b32 %r300, %r296, %r292; 2026-02-21T09:43:16.8122772Z xor.b32 %r301, %r299, %r300; 2026-02-21T09:43:16.8122963Z add.s32 %r302, %r63, %r298; 2026-02-21T09:43:16.8123219Z add.s32 %r660, %r302, %r301; 2026-02-21T09:43:16.8123412Z add.s32 %r250, %r202, 65536; 2026-02-21T09:43:16.8123600Z add.s32 %r264, %r202, 79872; 2026-02-21T09:43:16.8123805Z add.s32 %r262, %r202, 77824; 2026-02-21T09:43:16.8124003Z add.s32 %r260, %r202, 75776; 2026-02-21T09:43:16.8124213Z add.s32 %r258, %r202, 73728; 2026-02-21T09:43:16.8124426Z add.s32 %r256, %r202, 71680; 2026-02-21T09:43:16.8124637Z add.s32 %r254, %r202, 69632; 2026-02-21T09:43:16.8124911Z add.s32 %r252, %r202, 67584; 2026-02-21T09:43:16.8125116Z add.s32 %r234, %r202, 16384; 2026-02-21T09:43:16.8125305Z add.s32 %r248, %r202, 30720; 2026-02-21T09:43:16.8125493Z add.s32 %r246, %r202, 28672; 2026-02-21T09:43:16.8125686Z add.s32 %r244, %r202, 26624; 2026-02-21T09:43:16.8125868Z add.s32 %r242, %r202, 24576; 2026-02-21T09:43:16.8126059Z add.s32 %r240, %r202, 22528; 2026-02-21T09:43:16.8126315Z add.s32 %r238, %r202, 20480; 2026-02-21T09:43:16.8126517Z add.s32 %r236, %r202, 18432; 2026-02-21T09:43:16.8126704Z add.s32 %r218, %r202, 49152; 2026-02-21T09:43:16.8126874Z add.s32 %r232, %r202, 63488; 2026-02-21T09:43:16.8127055Z add.s32 %r230, %r202, 61440; 2026-02-21T09:43:16.8127227Z add.s32 %r228, %r202, 59392; 2026-02-21T09:43:16.8127405Z add.s32 %r226, %r202, 57344; 2026-02-21T09:43:16.8127576Z add.s32 %r224, %r202, 55296; 2026-02-21T09:43:16.8127759Z add.s32 %r222, %r202, 53248; 2026-02-21T09:43:16.8127933Z add.s32 %r220, %r202, 51200; 2026-02-21T09:43:16.8128112Z add.s32 %r216, %r202, 14336; 2026-02-21T09:43:16.8128351Z add.s32 %r214, %r202, 12288; 2026-02-21T09:43:16.8128526Z add.s32 %r212, %r202, 10240; 2026-02-21T09:43:16.8128712Z add.s32 %r210, %r202, 8192; 2026-02-21T09:43:16.8128886Z add.s32 %r208, %r202, 6144; 2026-02-21T09:43:16.8129065Z 
add.s32 %r206, %r202, 4096; 2026-02-21T09:43:16.8129241Z add.s32 %r204, %r202, 2048; 2026-02-21T09:43:16.8129568Z .loc 1 30 33 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:30:33 2026-02-21T09:43:16.8129909Z shr.u32 %r303, %r3, 3; 2026-02-21T09:43:16.8130079Z and.b32 %r304, %r303, 96; 2026-02-21T09:43:16.8130393Z .loc 1 32 64 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:32:64 2026-02-21T09:43:16.8130756Z and.b32 %r31, %r3, 31; 2026-02-21T09:43:16.8131069Z .loc 1 32 30 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:32:30 2026-02-21T09:43:16.8131406Z or.b32 %r305, %r304, %r31; 2026-02-21T09:43:16.8131733Z .loc 1 34 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:34:27 2026-02-21T09:43:16.8132109Z shl.b32 %r306, %r305, 7; 2026-02-21T09:43:16.8132471Z .loc 1 35 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:35:32 2026-02-21T09:43:16.8132895Z or.b32 %r307, %r306, %r5; 2026-02-21T09:43:16.8133165Z or.b32 %r308, %r306, %r279; 2026-02-21T09:43:16.8133357Z or.b32 %r309, %r306, %r278; 2026-02-21T09:43:16.8133534Z or.b32 %r310, %r306, %r277; 2026-02-21T09:43:16.8133715Z or.b32 %r311, %r306, %r276; 2026-02-21T09:43:16.8133886Z or.b32 %r312, %r306, %r275; 2026-02-21T09:43:16.8134064Z or.b32 %r313, %r306, %r274; 2026-02-21T09:43:16.8134232Z or.b32 %r314, %r306, %r4; 2026-02-21T09:43:16.8134554Z .loc 1 37 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:37:32 2026-02-21T09:43:16.8134953Z or.b32 %r315, %r33, %r5; 2026-02-21T09:43:16.8135128Z or.b32 %r316, %r33, %r279; 2026-02-21T09:43:16.8135309Z or.b32 %r317, %r33, %r278; 2026-02-21T09:43:16.8135488Z or.b32 %r318, %r33, %r277; 2026-02-21T09:43:16.8135667Z or.b32 %r319, %r33, %r276; 2026-02-21T09:43:16.8135838Z or.b32 %r320, %r33, %r275; 2026-02-21T09:43:16.8136015Z or.b32 %r321, %r33, %r274; 2026-02-21T09:43:16.8136186Z or.b32 %r322, %r33, %r4; 2026-02-21T09:43:16.8136367Z or.b32 %r34, %r33, %r272; 2026-02-21T09:43:16.8136675Z .loc 1 47 53 // 
cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:53 2026-02-21T09:43:16.8137104Z shl.b32 %r323, %r315, 10; 2026-02-21T09:43:16.8137280Z shl.b32 %r324, %r316, 10; 2026-02-21T09:43:16.8137448Z shl.b32 %r325, %r317, 10; 2026-02-21T09:43:16.8137622Z shl.b32 %r326, %r318, 10; 2026-02-21T09:43:16.8137788Z shl.b32 %r327, %r319, 10; 2026-02-21T09:43:16.8137960Z shl.b32 %r328, %r320, 10; 2026-02-21T09:43:16.8138122Z shl.b32 %r329, %r321, 10; 2026-02-21T09:43:16.8138297Z shl.b32 %r330, %r322, 10; 2026-02-21T09:43:16.8138596Z .loc 1 48 80 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:80 2026-02-21T09:43:16.8138933Z shl.b32 %r331, %r307, 10; 2026-02-21T09:43:16.8139106Z shl.b32 %r332, %r308, 10; 2026-02-21T09:43:16.8139272Z shl.b32 %r333, %r309, 10; 2026-02-21T09:43:16.8139447Z shl.b32 %r334, %r310, 10; 2026-02-21T09:43:16.8139617Z shl.b32 %r335, %r311, 10; 2026-02-21T09:43:16.8139902Z shl.b32 %r336, %r312, 10; 2026-02-21T09:43:16.8140091Z shl.b32 %r337, %r313, 10; 2026-02-21T09:43:16.8140290Z shl.b32 %r338, %r314, 10; 2026-02-21T09:43:16.8140657Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8141071Z shfl.sync.idx.b32 %r50, %r280, 0, 31, -1; 2026-02-21T09:43:16.8141295Z shl.b32 %r339, %r50, 21; 2026-02-21T09:43:16.8141471Z and.b32 %r340, %r339, 6291456; 2026-02-21T09:43:16.8141670Z add.s32 %r655, %r340, %r1009; 2026-02-21T09:43:16.8141854Z mov.pred %p17, -1; 2026-02-21T09:43:16.8142026Z mov.b32 %r1010, 0; 2026-02-21T09:43:16.8142243Z // begin inline asm 2026-02-21T09:43:16.8142737Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 0], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8143257Z // end inline asm 2026-02-21T09:43:16.8143410Z // begin inline asm 2026-02-21T09:43:16.8143921Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 16], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, 
%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8144430Z // end inline asm 2026-02-21T09:43:16.8144589Z // begin inline asm 2026-02-21T09:43:16.8145100Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 32], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8145604Z // end inline asm 2026-02-21T09:43:16.8145760Z // begin inline asm 2026-02-21T09:43:16.8146216Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 48], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8146722Z // end inline asm 2026-02-21T09:43:16.8146872Z // begin inline asm 2026-02-21T09:43:16.8147401Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 64], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8147987Z // end inline asm 2026-02-21T09:43:16.8148170Z // begin inline asm 2026-02-21T09:43:16.8148686Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 80], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8149190Z // end inline asm 2026-02-21T09:43:16.8149346Z // begin inline asm 2026-02-21T09:43:16.8149801Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 96], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8150305Z // end inline asm 2026-02-21T09:43:16.8150466Z // begin inline asm 2026-02-21T09:43:16.8150932Z @%p17 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r655 + 112], {%r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010, %r1010}; 2026-02-21T09:43:16.8151435Z // 
end inline asm 2026-02-21T09:43:16.8151655Z // begin inline asm 2026-02-21T09:43:16.8151840Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:43:16.8152024Z // end inline asm 2026-02-21T09:43:16.8152180Z bar.sync 0; 2026-02-21T09:43:16.8152473Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8152823Z add.s32 %r1011, %r63, 98304; 2026-02-21T09:43:16.8153003Z // begin inline asm 2026-02-21T09:43:16.8153194Z @%p39 mbarrier.init.shared::cta.b64 [%r1011], 1; 2026-02-21T09:43:16.8153425Z // end inline asm 2026-02-21T09:43:16.8153571Z bar.sync 0; 2026-02-21T09:43:16.8153729Z add.s32 %r201, %r63, 98312; 2026-02-21T09:43:16.8153905Z // begin inline asm 2026-02-21T09:43:16.8154098Z @%p39 mbarrier.init.shared::cta.b64 [%r201], 1; 2026-02-21T09:43:16.8154317Z // end inline asm 2026-02-21T09:43:16.8154718Z .loc 1 47 60 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:60 2026-02-21T09:43:16.8155087Z or.b32 %r341, %r323, %r267; 2026-02-21T09:43:16.8155284Z or.b32 %r342, %r324, %r267; 2026-02-21T09:43:16.8155483Z or.b32 %r343, %r325, %r267; 2026-02-21T09:43:16.8155668Z or.b32 %r344, %r326, %r267; 2026-02-21T09:43:16.8155871Z or.b32 %r345, %r327, %r267; 2026-02-21T09:43:16.8156065Z or.b32 %r346, %r328, %r267; 2026-02-21T09:43:16.8156273Z or.b32 %r347, %r329, %r267; 2026-02-21T09:43:16.8156454Z or.b32 %r348, %r330, %r267; 2026-02-21T09:43:16.8156774Z .loc 1 47 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:32 2026-02-21T09:43:16.8157196Z mad.wide.u32 %rd71, %r341, 2, %rd68; 2026-02-21T09:43:16.8157407Z mad.wide.u32 %rd72, %r342, 2, %rd68; 2026-02-21T09:43:16.8157623Z mad.wide.u32 %rd73, %r343, 2, %rd68; 2026-02-21T09:43:16.8157826Z mad.wide.u32 %rd74, %r344, 2, %rd68; 2026-02-21T09:43:16.8158032Z mad.wide.u32 %rd75, %r345, 2, %rd68; 2026-02-21T09:43:16.8158236Z mad.wide.u32 %rd76, %r346, 2, %rd68; 2026-02-21T09:43:16.8158438Z mad.wide.u32 %rd77, %r347, 2, %rd68; 2026-02-21T09:43:16.8158638Z mad.wide.u32 
%rd78, %r348, 2, %rd68; 2026-02-21T09:43:16.8158834Z mov.b32 %r384, 16; 2026-02-21T09:43:16.8159143Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8159481Z // begin inline asm 2026-02-21T09:43:16.8159724Z cp.async.cg.shared.global [ %r202 + 0 ], [ %rd71 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8159989Z // end inline asm 2026-02-21T09:43:16.8160149Z // begin inline asm 2026-02-21T09:43:16.8160375Z cp.async.cg.shared.global [ %r204 + 0 ], [ %rd72 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8160648Z // end inline asm 2026-02-21T09:43:16.8160798Z // begin inline asm 2026-02-21T09:43:16.8161026Z cp.async.cg.shared.global [ %r206 + 0 ], [ %rd73 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8161291Z // end inline asm 2026-02-21T09:43:16.8161441Z // begin inline asm 2026-02-21T09:43:16.8161730Z cp.async.cg.shared.global [ %r208 + 0 ], [ %rd74 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8161992Z // end inline asm 2026-02-21T09:43:16.8162157Z // begin inline asm 2026-02-21T09:43:16.8162386Z cp.async.cg.shared.global [ %r210 + 0 ], [ %rd75 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8162660Z // end inline asm 2026-02-21T09:43:16.8162818Z // begin inline asm 2026-02-21T09:43:16.8163082Z cp.async.cg.shared.global [ %r212 + 0 ], [ %rd76 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8163386Z // end inline asm 2026-02-21T09:43:16.8163554Z // begin inline asm 2026-02-21T09:43:16.8163832Z cp.async.cg.shared.global [ %r214 + 0 ], [ %rd77 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8164169Z // end inline asm 2026-02-21T09:43:16.8164335Z // begin inline asm 2026-02-21T09:43:16.8164567Z cp.async.cg.shared.global [ %r216 + 0 ], [ %rd78 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8164931Z // end inline asm 2026-02-21T09:43:16.8165103Z cp.async.commit_group; 2026-02-21T09:43:16.8165451Z .loc 1 48 59 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:59 2026-02-21T09:43:16.8165827Z or.b32 %r349, %r331, %r267; 2026-02-21T09:43:16.8166084Z or.b32 %r350, %r332, 
%r267; 2026-02-21T09:43:16.8166278Z or.b32 %r351, %r333, %r267; 2026-02-21T09:43:16.8166459Z or.b32 %r352, %r334, %r267; 2026-02-21T09:43:16.8166646Z or.b32 %r353, %r335, %r267; 2026-02-21T09:43:16.8166826Z or.b32 %r354, %r336, %r267; 2026-02-21T09:43:16.8167012Z or.b32 %r355, %r337, %r267; 2026-02-21T09:43:16.8167190Z or.b32 %r356, %r338, %r267; 2026-02-21T09:43:16.8167519Z .loc 1 48 34 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:34 2026-02-21T09:43:16.8167889Z mad.wide.u32 %rd79, %r349, 2, %rd69; 2026-02-21T09:43:16.8168098Z mad.wide.u32 %rd80, %r350, 2, %rd69; 2026-02-21T09:43:16.8168313Z mad.wide.u32 %rd81, %r351, 2, %rd69; 2026-02-21T09:43:16.8168513Z mad.wide.u32 %rd82, %r352, 2, %rd69; 2026-02-21T09:43:16.8168723Z mad.wide.u32 %rd83, %r353, 2, %rd69; 2026-02-21T09:43:16.8168979Z mad.wide.u32 %rd84, %r354, 2, %rd69; 2026-02-21T09:43:16.8169194Z mad.wide.u32 %rd85, %r355, 2, %rd69; 2026-02-21T09:43:16.8169404Z mad.wide.u32 %rd86, %r356, 2, %rd69; 2026-02-21T09:43:16.8169759Z .loc 1 48 87 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:87 2026-02-21T09:43:16.8170129Z // begin inline asm 2026-02-21T09:43:16.8170356Z cp.async.cg.shared.global [ %r218 + 0 ], [ %rd79 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8170627Z // end inline asm 2026-02-21T09:43:16.8170777Z // begin inline asm 2026-02-21T09:43:16.8171008Z cp.async.cg.shared.global [ %r220 + 0 ], [ %rd80 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8171340Z // end inline asm 2026-02-21T09:43:16.8171518Z // begin inline asm 2026-02-21T09:43:16.8171765Z cp.async.cg.shared.global [ %r222 + 0 ], [ %rd81 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8172067Z // end inline asm 2026-02-21T09:43:16.8172249Z // begin inline asm 2026-02-21T09:43:16.8172498Z cp.async.cg.shared.global [ %r224 + 0 ], [ %rd82 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8172762Z // end inline asm 2026-02-21T09:43:16.8172913Z // begin inline asm 2026-02-21T09:43:16.8173139Z cp.async.cg.shared.global [ %r226 + 0 ], [ %rd83 + 0 
], 0x10, %r384; 2026-02-21T09:43:16.8173395Z // end inline asm 2026-02-21T09:43:16.8173550Z // begin inline asm 2026-02-21T09:43:16.8173770Z cp.async.cg.shared.global [ %r228 + 0 ], [ %rd84 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8174034Z // end inline asm 2026-02-21T09:43:16.8174187Z // begin inline asm 2026-02-21T09:43:16.8174405Z cp.async.cg.shared.global [ %r230 + 0 ], [ %rd85 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8174713Z // end inline asm 2026-02-21T09:43:16.8174871Z // begin inline asm 2026-02-21T09:43:16.8175102Z cp.async.cg.shared.global [ %r232 + 0 ], [ %rd86 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8175354Z // end inline asm 2026-02-21T09:43:16.8175521Z cp.async.commit_group; 2026-02-21T09:43:16.8175905Z .loc 1 47 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:32 2026-02-21T09:43:16.8176261Z add.s64 %rd87, %rd71, 128; 2026-02-21T09:43:16.8176451Z add.s64 %rd88, %rd72, 128; 2026-02-21T09:43:16.8176626Z add.s64 %rd89, %rd73, 128; 2026-02-21T09:43:16.8176807Z add.s64 %rd90, %rd74, 128; 2026-02-21T09:43:16.8176977Z add.s64 %rd91, %rd75, 128; 2026-02-21T09:43:16.8177157Z add.s64 %rd92, %rd76, 128; 2026-02-21T09:43:16.8177327Z add.s64 %rd93, %rd77, 128; 2026-02-21T09:43:16.8177507Z add.s64 %rd94, %rd78, 128; 2026-02-21T09:43:16.8177810Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8178149Z bar.sync 0; 2026-02-21T09:43:16.8178304Z // begin inline asm 2026-02-21T09:43:16.8178523Z cp.async.cg.shared.global [ %r234 + 0 ], [ %rd87 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8178785Z // end inline asm 2026-02-21T09:43:16.8178936Z // begin inline asm 2026-02-21T09:43:16.8179180Z cp.async.cg.shared.global [ %r236 + 0 ], [ %rd88 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8179462Z // end inline asm 2026-02-21T09:43:16.8179626Z // begin inline asm 2026-02-21T09:43:16.8179947Z cp.async.cg.shared.global [ %r238 + 0 ], [ %rd89 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8180269Z // end inline asm 
2026-02-21T09:43:16.8180437Z // begin inline asm 2026-02-21T09:43:16.8180656Z cp.async.cg.shared.global [ %r240 + 0 ], [ %rd90 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8180921Z // end inline asm 2026-02-21T09:43:16.8181069Z // begin inline asm 2026-02-21T09:43:16.8181317Z cp.async.cg.shared.global [ %r242 + 0 ], [ %rd91 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8181574Z // end inline asm 2026-02-21T09:43:16.8181733Z // begin inline asm 2026-02-21T09:43:16.8181954Z cp.async.cg.shared.global [ %r244 + 0 ], [ %rd92 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8182222Z // end inline asm 2026-02-21T09:43:16.8182371Z // begin inline asm 2026-02-21T09:43:16.8182597Z cp.async.cg.shared.global [ %r246 + 0 ], [ %rd93 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8182865Z // end inline asm 2026-02-21T09:43:16.8183085Z // begin inline asm 2026-02-21T09:43:16.8183319Z cp.async.cg.shared.global [ %r248 + 0 ], [ %rd94 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8183580Z // end inline asm 2026-02-21T09:43:16.8183744Z cp.async.commit_group; 2026-02-21T09:43:16.8184060Z .loc 1 48 34 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:34 2026-02-21T09:43:16.8184416Z add.s64 %rd95, %rd79, 128; 2026-02-21T09:43:16.8184604Z add.s64 %rd96, %rd80, 128; 2026-02-21T09:43:16.8184825Z add.s64 %rd97, %rd81, 128; 2026-02-21T09:43:16.8185010Z add.s64 %rd98, %rd82, 128; 2026-02-21T09:43:16.8185237Z add.s64 %rd99, %rd83, 128; 2026-02-21T09:43:16.8185423Z add.s64 %rd100, %rd84, 128; 2026-02-21T09:43:16.8185604Z add.s64 %rd101, %rd85, 128; 2026-02-21T09:43:16.8185787Z add.s64 %rd102, %rd86, 128; 2026-02-21T09:43:16.8186108Z .loc 1 48 87 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:87 2026-02-21T09:43:16.8186458Z // begin inline asm 2026-02-21T09:43:16.8186700Z cp.async.cg.shared.global [ %r250 + 0 ], [ %rd95 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8186994Z // end inline asm 2026-02-21T09:43:16.8187166Z // begin inline asm 2026-02-21T09:43:16.8187428Z cp.async.cg.shared.global [ %r252 + 0 ], 
[ %rd96 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8187747Z // end inline asm 2026-02-21T09:43:16.8187902Z // begin inline asm 2026-02-21T09:43:16.8188132Z cp.async.cg.shared.global [ %r254 + 0 ], [ %rd97 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8188390Z // end inline asm 2026-02-21T09:43:16.8188546Z // begin inline asm 2026-02-21T09:43:16.8188767Z cp.async.cg.shared.global [ %r256 + 0 ], [ %rd98 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8189034Z // end inline asm 2026-02-21T09:43:16.8189191Z // begin inline asm 2026-02-21T09:43:16.8189410Z cp.async.cg.shared.global [ %r258 + 0 ], [ %rd99 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8189671Z // end inline asm 2026-02-21T09:43:16.8189821Z // begin inline asm 2026-02-21T09:43:16.8190127Z cp.async.cg.shared.global [ %r260 + 0 ], [ %rd100 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8190398Z // end inline asm 2026-02-21T09:43:16.8190553Z // begin inline asm 2026-02-21T09:43:16.8190774Z cp.async.cg.shared.global [ %r262 + 0 ], [ %rd101 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8191042Z // end inline asm 2026-02-21T09:43:16.8191198Z // begin inline asm 2026-02-21T09:43:16.8191418Z cp.async.cg.shared.global [ %r264 + 0 ], [ %rd102 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8191680Z // end inline asm 2026-02-21T09:43:16.8191832Z cp.async.commit_group; 2026-02-21T09:43:16.8192142Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8192487Z cp.async.wait_group 2; 2026-02-21T09:43:16.8192667Z bar.sync 0; 2026-02-21T09:43:16.8192941Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8193288Z setp.ne.b32 %p14, %r50, 0; 2026-02-21T09:43:16.8193474Z @%p14 bra $L__BB0_3; 2026-02-21T09:43:16.8193634Z // %bb.2: 2026-02-21T09:43:16.8193911Z .loc 1 0 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:0:52 2026-02-21T09:43:16.8194333Z add.s32 %r366, %r63, 49152; 2026-02-21T09:43:16.8194519Z add.s32 %r367, %r63, 49248; 2026-02-21T09:43:16.8194760Z 
bfe.u32 %r368, %r367, 4, 14; 2026-02-21T09:43:16.8194967Z cvt.u64.u32 %rd112, %r368; 2026-02-21T09:43:16.8195177Z or.b64 %rd110, %rd112, 4611686293372403712; 2026-02-21T09:43:16.8195428Z add.s32 %r369, %r63, 96; 2026-02-21T09:43:16.8195640Z bfe.u32 %r370, %r369, 4, 14; 2026-02-21T09:43:16.8195856Z cvt.u64.u32 %rd113, %r370; 2026-02-21T09:43:16.8196057Z or.b64 %rd109, %rd113, 4611686293372403712; 2026-02-21T09:43:16.8196265Z add.s32 %r371, %r63, 49216; 2026-02-21T09:43:16.8196446Z bfe.u32 %r372, %r371, 4, 14; 2026-02-21T09:43:16.8196621Z cvt.u64.u32 %rd114, %r372; 2026-02-21T09:43:16.8196819Z or.b64 %rd108, %rd114, 4611686293372403712; 2026-02-21T09:43:16.8197093Z add.s32 %r373, %r63, 64; 2026-02-21T09:43:16.8197280Z bfe.u32 %r374, %r373, 4, 14; 2026-02-21T09:43:16.8197465Z cvt.u64.u32 %rd115, %r374; 2026-02-21T09:43:16.8197654Z or.b64 %rd107, %rd115, 4611686293372403712; 2026-02-21T09:43:16.8197863Z add.s32 %r375, %r63, 49184; 2026-02-21T09:43:16.8198038Z bfe.u32 %r376, %r375, 4, 14; 2026-02-21T09:43:16.8198220Z cvt.u64.u32 %rd116, %r376; 2026-02-21T09:43:16.8198405Z or.b64 %rd106, %rd116, 4611686293372403712; 2026-02-21T09:43:16.8198615Z add.s32 %r377, %r63, 32; 2026-02-21T09:43:16.8198785Z bfe.u32 %r378, %r377, 4, 14; 2026-02-21T09:43:16.8198968Z cvt.u64.u32 %rd117, %r378; 2026-02-21T09:43:16.8199219Z or.b64 %rd105, %rd117, 4611686293372403712; 2026-02-21T09:43:16.8199431Z bfe.u32 %r379, %r366, 4, 14; 2026-02-21T09:43:16.8199613Z cvt.u64.u32 %rd118, %r379; 2026-02-21T09:43:16.8199794Z or.b64 %rd104, %rd118, 4611686293372403712; 2026-02-21T09:43:16.8200009Z bfe.u32 %r380, %r63, 4, 14; 2026-02-21T09:43:16.8200188Z cvt.u64.u32 %rd119, %r380; 2026-02-21T09:43:16.8200378Z or.b64 %rd103, %rd119, 4611686293372403712; 2026-02-21T09:43:16.8200723Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8201081Z elect.sync %r381|%p16, -1; 2026-02-21T09:43:16.8201262Z mov.b32 %r358, 136314896; 2026-02-21T09:43:16.8201443Z mov.pred 
%p15, 0; 2026-02-21T09:43:16.8201608Z // begin inline asm 2026-02-21T09:43:16.8201875Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd103, %rd104, %r358, %p15; 2026-02-21T09:43:16.8202187Z // end inline asm 2026-02-21T09:43:16.8202338Z // begin inline asm 2026-02-21T09:43:16.8202615Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd105, %rd106, %r358, %p17; 2026-02-21T09:43:16.8202947Z // end inline asm 2026-02-21T09:43:16.8203122Z // begin inline asm 2026-02-21T09:43:16.8203420Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd107, %rd108, %r358, %p17; 2026-02-21T09:43:16.8203842Z // end inline asm 2026-02-21T09:43:16.8204012Z // begin inline asm 2026-02-21T09:43:16.8204259Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd109, %rd110, %r358, %p17; 2026-02-21T09:43:16.8204563Z // end inline asm 2026-02-21T09:43:16.8204765Z add.s32 %r382, %r63, 98304; 2026-02-21T09:43:16.8204961Z cvt.u64.u32 %rd111, %r382; 2026-02-21T09:43:16.8205137Z // begin inline asm 2026-02-21T09:43:16.8205384Z @%p16 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd111]; 2026-02-21T09:43:16.8205665Z // end inline asm 2026-02-21T09:43:16.8205811Z $L__BB0_3: 2026-02-21T09:43:16.8206116Z .loc 1 0 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:0:52 2026-02-21T09:43:16.8206518Z ld.param.b64 %rd70, [_helion_matmul_param_2]; 2026-02-21T09:43:16.8206769Z add.s32 %r23, %r63, %r288; 2026-02-21T09:43:16.8206955Z add.s32 %r24, %r63, %r289; 2026-02-21T09:43:16.8207146Z add.s32 %r25, %r63, %r290; 2026-02-21T09:43:16.8207329Z add.s32 %r26, %r63, %r291; 2026-02-21T09:43:16.8207521Z add.s32 %r665, %r660, 512; 2026-02-21T09:43:16.8207711Z add.s32 %r670, %r660, 1024; 2026-02-21T09:43:16.8207963Z add.s32 %r675, %r660, 1536; 2026-02-21T09:43:16.8208155Z or.b32 %r32, %r306, %r268; 2026-02-21T09:43:16.8208335Z or.b32 %r35, %r34, 8; 2026-02-21T09:43:16.8208516Z or.b32 %r36, %r34, 16; 2026-02-21T09:43:16.8208691Z or.b32 %r37, %r34, 24; 
2026-02-21T09:43:16.8208867Z or.b32 %r38, %r34, 32; 2026-02-21T09:43:16.8209036Z or.b32 %r39, %r34, 40; 2026-02-21T09:43:16.8209212Z or.b32 %r40, %r34, 48; 2026-02-21T09:43:16.8209383Z or.b32 %r41, %r271, 56; 2026-02-21T09:43:16.8209565Z or.b32 %r42, %r34, 64; 2026-02-21T09:43:16.8209740Z or.b32 %r43, %r34, 72; 2026-02-21T09:43:16.8209904Z or.b32 %r44, %r34, 80; 2026-02-21T09:43:16.8210077Z or.b32 %r45, %r34, 88; 2026-02-21T09:43:16.8210246Z or.b32 %r46, %r34, 96; 2026-02-21T09:43:16.8210430Z or.b32 %r47, %r34, 104; 2026-02-21T09:43:16.8210627Z or.b32 %r48, %r34, 112; 2026-02-21T09:43:16.8210836Z or.b32 %r49, %r271, 120; 2026-02-21T09:43:16.8211295Z .loc 1 47 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:32 2026-02-21T09:43:16.8211708Z add.s64 %rd120, %rd71, 256; 2026-02-21T09:43:16.8211904Z add.s64 %rd121, %rd72, 256; 2026-02-21T09:43:16.8212088Z add.s64 %rd122, %rd73, 256; 2026-02-21T09:43:16.8212278Z add.s64 %rd123, %rd74, 256; 2026-02-21T09:43:16.8212460Z add.s64 %rd124, %rd75, 256; 2026-02-21T09:43:16.8212653Z add.s64 %rd125, %rd76, 256; 2026-02-21T09:43:16.8212837Z add.s64 %rd126, %rd77, 256; 2026-02-21T09:43:16.8213026Z add.s64 %rd127, %rd78, 256; 2026-02-21T09:43:16.8213421Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8213789Z bar.sync 0; 2026-02-21T09:43:16.8213941Z // begin inline asm 2026-02-21T09:43:16.8214197Z cp.async.cg.shared.global [ %r383 + 0 ], [ %rd120 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8214486Z // end inline asm 2026-02-21T09:43:16.8214640Z // begin inline asm 2026-02-21T09:43:16.8214929Z cp.async.cg.shared.global [ %r385 + 0 ], [ %rd121 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8215193Z // end inline asm 2026-02-21T09:43:16.8215355Z // begin inline asm 2026-02-21T09:43:16.8215579Z cp.async.cg.shared.global [ %r387 + 0 ], [ %rd122 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8215843Z // end inline asm 2026-02-21T09:43:16.8215990Z // begin inline asm 
2026-02-21T09:43:16.8216221Z cp.async.cg.shared.global [ %r389 + 0 ], [ %rd123 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8216485Z // end inline asm 2026-02-21T09:43:16.8216631Z // begin inline asm 2026-02-21T09:43:16.8216860Z cp.async.cg.shared.global [ %r391 + 0 ], [ %rd124 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8217118Z // end inline asm 2026-02-21T09:43:16.8217271Z // begin inline asm 2026-02-21T09:43:16.8217494Z cp.async.cg.shared.global [ %r393 + 0 ], [ %rd125 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8217762Z // end inline asm 2026-02-21T09:43:16.8217959Z // begin inline asm 2026-02-21T09:43:16.8218209Z cp.async.cg.shared.global [ %r395 + 0 ], [ %rd126 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8218502Z // end inline asm 2026-02-21T09:43:16.8218663Z // begin inline asm 2026-02-21T09:43:16.8218923Z cp.async.cg.shared.global [ %r397 + 0 ], [ %rd127 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8219241Z // end inline asm 2026-02-21T09:43:16.8219412Z cp.async.commit_group; 2026-02-21T09:43:16.8219720Z .loc 1 48 34 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:34 2026-02-21T09:43:16.8220072Z add.s64 %rd128, %rd79, 256; 2026-02-21T09:43:16.8220251Z add.s64 %rd129, %rd80, 256; 2026-02-21T09:43:16.8220441Z add.s64 %rd130, %rd81, 256; 2026-02-21T09:43:16.8220621Z add.s64 %rd131, %rd82, 256; 2026-02-21T09:43:16.8220796Z add.s64 %rd132, %rd83, 256; 2026-02-21T09:43:16.8220978Z add.s64 %rd133, %rd84, 256; 2026-02-21T09:43:16.8221152Z add.s64 %rd134, %rd85, 256; 2026-02-21T09:43:16.8221334Z add.s64 %rd135, %rd86, 256; 2026-02-21T09:43:16.8221650Z .loc 1 48 87 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:87 2026-02-21T09:43:16.8222062Z // begin inline asm 2026-02-21T09:43:16.8222288Z cp.async.cg.shared.global [ %r399 + 0 ], [ %rd128 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8222555Z // end inline asm 2026-02-21T09:43:16.8222713Z // begin inline asm 2026-02-21T09:43:16.8222940Z cp.async.cg.shared.global [ %r401 + 0 ], [ %rd129 + 0 ], 0x10, %r384; 
2026-02-21T09:43:16.8223208Z // end inline asm 2026-02-21T09:43:16.8223358Z // begin inline asm 2026-02-21T09:43:16.8223587Z cp.async.cg.shared.global [ %r403 + 0 ], [ %rd130 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8223846Z // end inline asm 2026-02-21T09:43:16.8224002Z // begin inline asm 2026-02-21T09:43:16.8224222Z cp.async.cg.shared.global [ %r405 + 0 ], [ %rd131 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8224481Z // end inline asm 2026-02-21T09:43:16.8224636Z // begin inline asm 2026-02-21T09:43:16.8224916Z cp.async.cg.shared.global [ %r407 + 0 ], [ %rd132 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8225239Z // end inline asm 2026-02-21T09:43:16.8225393Z // begin inline asm 2026-02-21T09:43:16.8225623Z cp.async.cg.shared.global [ %r409 + 0 ], [ %rd133 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8225878Z // end inline asm 2026-02-21T09:43:16.8226030Z // begin inline asm 2026-02-21T09:43:16.8226263Z cp.async.cg.shared.global [ %r411 + 0 ], [ %rd134 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8226560Z // end inline asm 2026-02-21T09:43:16.8226728Z // begin inline asm 2026-02-21T09:43:16.8226980Z cp.async.cg.shared.global [ %r413 + 0 ], [ %rd135 + 0 ], 0x10, %r384; 2026-02-21T09:43:16.8227300Z // end inline asm 2026-02-21T09:43:16.8227535Z cp.async.commit_group; 2026-02-21T09:43:16.8227855Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8228201Z and.b32 %r419, %r1, 7; 2026-02-21T09:43:16.8228388Z mul.wide.u32 %rd17, %r419, 16; 2026-02-21T09:43:16.8228580Z cvt.u16.u32 %rs1, %r3; 2026-02-21T09:43:16.8228762Z shr.u16 %rs2, %rs1, 8; 2026-02-21T09:43:16.8228926Z and.b16 %rs3, %rs2, 3; 2026-02-21T09:43:16.8229110Z mul.wide.u16 %r420, %rs3, 4096; 2026-02-21T09:43:16.8229307Z shl.b32 %r421, %r31, 7; 2026-02-21T09:43:16.8229474Z or.b32 %r422, %r420, %r421; 2026-02-21T09:43:16.8229663Z or.b32 %r423, %r422, %r4; 2026-02-21T09:43:16.8229860Z mad.wide.u32 %rd137, %r423, 2048, %rd69; 2026-02-21T09:43:16.8230101Z add.s64 %rd474, %rd137, 384; 
2026-02-21T09:43:16.8230300Z or.b32 %r424, %r422, %r5; 2026-02-21T09:43:16.8230500Z or.b32 %r425, %r424, 96; 2026-02-21T09:43:16.8230698Z mad.wide.u32 %rd138, %r425, 2048, %rd69; 2026-02-21T09:43:16.8230938Z add.s64 %rd473, %rd138, 384; 2026-02-21T09:43:16.8231141Z or.b32 %r426, %r424, 80; 2026-02-21T09:43:16.8231340Z mad.wide.u32 %rd139, %r426, 2048, %rd69; 2026-02-21T09:43:16.8231575Z add.s64 %rd472, %rd139, 384; 2026-02-21T09:43:16.8231767Z or.b32 %r427, %r424, 64; 2026-02-21T09:43:16.8232042Z mad.wide.u32 %rd140, %r427, 2048, %rd69; 2026-02-21T09:43:16.8232287Z add.s64 %rd471, %rd140, 384; 2026-02-21T09:43:16.8232500Z or.b32 %r428, %r424, 48; 2026-02-21T09:43:16.8232714Z mad.wide.u32 %rd141, %r428, 2048, %rd69; 2026-02-21T09:43:16.8232948Z add.s64 %rd470, %rd141, 384; 2026-02-21T09:43:16.8233143Z or.b32 %r429, %r424, 32; 2026-02-21T09:43:16.8233345Z mad.wide.u32 %rd142, %r429, 2048, %rd69; 2026-02-21T09:43:16.8233572Z add.s64 %rd469, %rd142, 384; 2026-02-21T09:43:16.8233766Z or.b32 %r430, %r424, 16; 2026-02-21T09:43:16.8233969Z mad.wide.u32 %rd143, %r430, 2048, %rd69; 2026-02-21T09:43:16.8234190Z add.s64 %rd468, %rd143, 384; 2026-02-21T09:43:16.8234400Z mad.wide.u32 %rd144, %r424, 2048, %rd69; 2026-02-21T09:43:16.8234621Z add.s64 %rd467, %rd144, 384; 2026-02-21T09:43:16.8234864Z shl.b32 %r431, %r3, 12; 2026-02-21T09:43:16.8235049Z and.b32 %r432, %r431, 917504; 2026-02-21T09:43:16.8235254Z shl.b32 %r433, %r4, 10; 2026-02-21T09:43:16.8235447Z or.b32 %r434, %r432, %r433; 2026-02-21T09:43:16.8235651Z mad.wide.u32 %rd145, %r434, 2, %rd68; 2026-02-21T09:43:16.8235878Z add.s64 %rd466, %rd145, 384; 2026-02-21T09:43:16.8236143Z add.s32 %r435, %r33, %r5; 2026-02-21T09:43:16.8236344Z add.s32 %r436, %r435, 96; 2026-02-21T09:43:16.8236548Z mad.wide.u32 %rd146, %r436, 2048, %rd68; 2026-02-21T09:43:16.8236790Z add.s64 %rd465, %rd146, 384; 2026-02-21T09:43:16.8237000Z add.s32 %r437, %r435, 80; 2026-02-21T09:43:16.8237223Z mad.wide.u32 %rd147, %r437, 2048, %rd68; 
2026-02-21T09:43:16.8237456Z add.s64 %rd464, %rd147, 384; 2026-02-21T09:43:16.8237645Z add.s32 %r438, %r435, 64; 2026-02-21T09:43:16.8237854Z mad.wide.u32 %rd148, %r438, 2048, %rd68; 2026-02-21T09:43:16.8238079Z add.s64 %rd463, %rd148, 384; 2026-02-21T09:43:16.8238276Z add.s32 %r439, %r435, 48; 2026-02-21T09:43:16.8238472Z mad.wide.u32 %rd149, %r439, 2048, %rd68; 2026-02-21T09:43:16.8238701Z add.s64 %rd462, %rd149, 384; 2026-02-21T09:43:16.8238895Z add.s32 %r440, %r435, 32; 2026-02-21T09:43:16.8239100Z mad.wide.u32 %rd150, %r440, 2048, %rd68; 2026-02-21T09:43:16.8239388Z add.s64 %rd461, %rd150, 384; 2026-02-21T09:43:16.8239587Z add.s32 %r441, %r435, 16; 2026-02-21T09:43:16.8239782Z mad.wide.u32 %rd151, %r441, 2048, %rd68; 2026-02-21T09:43:16.8240002Z add.s64 %rd460, %rd151, 384; 2026-02-21T09:43:16.8240200Z shl.b32 %r442, %r5, 10; 2026-02-21T09:43:16.8240382Z or.b32 %r443, %r432, %r442; 2026-02-21T09:43:16.8240592Z mad.wide.u32 %rd152, %r443, 2, %rd68; 2026-02-21T09:43:16.8240807Z add.s64 %rd459, %rd152, 384; 2026-02-21T09:43:16.8241006Z mov.b32 %r1014, 1; 2026-02-21T09:43:16.8241180Z mov.b32 %r1013, 2; 2026-02-21T09:43:16.8241372Z mov.b64 %rd458, -64; 2026-02-21T09:43:16.8241629Z mov.b32 %r1012, %r1010; 2026-02-21T09:43:16.8241818Z mov.b32 %r1015, %r1010; 2026-02-21T09:43:16.8241987Z bra.uni $L__BB0_4; 2026-02-21T09:43:16.8242197Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:43:16.8242596Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8242940Z add.s64 %rd458, %rd458, 64; 2026-02-21T09:43:16.8243137Z setp.lt.u64 %p35, %rd458, 832; 2026-02-21T09:43:16.8243458Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8243801Z // begin inline asm 2026-02-21T09:43:16.8243960Z 2026-02-21T09:43:16.8244083Z { 2026-02-21T09:43:16.8244225Z .reg .pred complete; 2026-02-21T09:43:16.8244389Z waitLoop: 2026-02-21T09:43:16.8244618Z mbarrier.try_wait.parity.shared.b64 
complete, [%r1011], %r1010; 2026-02-21T09:43:16.8244957Z @!complete bra.uni waitLoop; 2026-02-21T09:43:16.8245142Z } 2026-02-21T09:43:16.8245214Z 2026-02-21T09:43:16.8245278Z // end inline asm 2026-02-21T09:43:16.8245440Z add.s32 %r510, %r1014, 1; 2026-02-21T09:43:16.8245619Z setp.gt.s32 %p36, %r510, 1; 2026-02-21T09:43:16.8245819Z selp.b32 %r1014, 0, %r510, %p36; 2026-02-21T09:43:16.8246022Z selp.b32 %r511, 1, 0, %p36; 2026-02-21T09:43:16.8246283Z xor.b32 %r61, %r1015, %r511; 2026-02-21T09:43:16.8246615Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8246956Z add.s32 %r512, %r1013, 1; 2026-02-21T09:43:16.8247141Z setp.gt.s32 %p37, %r512, 2; 2026-02-21T09:43:16.8247324Z selp.b32 %r1013, 0, %r512, %p37; 2026-02-21T09:43:16.8247658Z .loc 1 47 32 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:32 2026-02-21T09:43:16.8248009Z add.s64 %rd170, %rd459, %rd17; 2026-02-21T09:43:16.8248195Z add.s64 %rd171, %rd460, %rd17; 2026-02-21T09:43:16.8248400Z add.s64 %rd172, %rd461, %rd17; 2026-02-21T09:43:16.8248603Z add.s64 %rd173, %rd462, %rd17; 2026-02-21T09:43:16.8248812Z add.s64 %rd174, %rd463, %rd17; 2026-02-21T09:43:16.8249015Z add.s64 %rd175, %rd464, %rd17; 2026-02-21T09:43:16.8249233Z add.s64 %rd176, %rd465, %rd17; 2026-02-21T09:43:16.8249614Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8249980Z add.s64 %rd177, %rd466, %rd17; 2026-02-21T09:43:16.8250180Z shl.b32 %r513, %r1013, 14; 2026-02-21T09:43:16.8250438Z add.s32 %r515, %r63, %r513; 2026-02-21T09:43:16.8250627Z bar.sync 0; 2026-02-21T09:43:16.8250785Z add.s32 %r478, %r515, %r6; 2026-02-21T09:43:16.8250982Z selp.b32 %r479, 16, 0, %p35; 2026-02-21T09:43:16.8251170Z // begin inline asm 2026-02-21T09:43:16.8251430Z cp.async.cg.shared.global [ %r478 + 0 ], [ %rd170 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8251717Z // end inline asm 2026-02-21T09:43:16.8251893Z add.s32 %r480, %r478, 2048; 
2026-02-21T09:43:16.8252076Z // begin inline asm 2026-02-21T09:43:16.8252330Z cp.async.cg.shared.global [ %r480 + 0 ], [ %rd171 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8252618Z // end inline asm 2026-02-21T09:43:16.8252778Z add.s32 %r482, %r478, 4096; 2026-02-21T09:43:16.8252969Z // begin inline asm 2026-02-21T09:43:16.8253210Z cp.async.cg.shared.global [ %r482 + 0 ], [ %rd172 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8253556Z // end inline asm 2026-02-21T09:43:16.8253724Z add.s32 %r484, %r478, 6144; 2026-02-21T09:43:16.8253924Z // begin inline asm 2026-02-21T09:43:16.8254172Z cp.async.cg.shared.global [ %r484 + 0 ], [ %rd173 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8254458Z // end inline asm 2026-02-21T09:43:16.8254627Z add.s32 %r486, %r478, 8192; 2026-02-21T09:43:16.8254853Z // begin inline asm 2026-02-21T09:43:16.8255102Z cp.async.cg.shared.global [ %r486 + 0 ], [ %rd174 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8255378Z // end inline asm 2026-02-21T09:43:16.8255551Z add.s32 %r488, %r478, 10240; 2026-02-21T09:43:16.8255736Z // begin inline asm 2026-02-21T09:43:16.8256049Z cp.async.cg.shared.global [ %r488 + 0 ], [ %rd175 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8256358Z // end inline asm 2026-02-21T09:43:16.8256540Z add.s32 %r490, %r478, 12288; 2026-02-21T09:43:16.8256749Z // begin inline asm 2026-02-21T09:43:16.8257043Z cp.async.cg.shared.global [ %r490 + 0 ], [ %rd176 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8257343Z // end inline asm 2026-02-21T09:43:16.8257503Z add.s32 %r492, %r478, 14336; 2026-02-21T09:43:16.8257701Z // begin inline asm 2026-02-21T09:43:16.8257942Z cp.async.cg.shared.global [ %r492 + 0 ], [ %rd177 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8258237Z // end inline asm 2026-02-21T09:43:16.8258395Z cp.async.commit_group; 2026-02-21T09:43:16.8258722Z .loc 1 48 34 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:34 2026-02-21T09:43:16.8259081Z add.s64 %rd178, %rd467, %rd17; 2026-02-21T09:43:16.8259268Z add.s64 %rd179, %rd468, %rd17; 
2026-02-21T09:43:16.8259462Z add.s64 %rd180, %rd469, %rd17; 2026-02-21T09:43:16.8259650Z add.s64 %rd181, %rd470, %rd17; 2026-02-21T09:43:16.8259841Z add.s64 %rd182, %rd471, %rd17; 2026-02-21T09:43:16.8260026Z add.s64 %rd183, %rd472, %rd17; 2026-02-21T09:43:16.8260216Z add.s64 %rd184, %rd473, %rd17; 2026-02-21T09:43:16.8260598Z .loc 1 48 87 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:87 2026-02-21T09:43:16.8260964Z add.s64 %rd185, %rd474, %rd17; 2026-02-21T09:43:16.8261146Z add.s32 %r494, %r478, 49152; 2026-02-21T09:43:16.8261332Z // begin inline asm 2026-02-21T09:43:16.8261578Z cp.async.cg.shared.global [ %r494 + 0 ], [ %rd178 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8261837Z // end inline asm 2026-02-21T09:43:16.8261999Z add.s32 %r496, %r478, 51200; 2026-02-21T09:43:16.8262173Z // begin inline asm 2026-02-21T09:43:16.8262407Z cp.async.cg.shared.global [ %r496 + 0 ], [ %rd179 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8262682Z // end inline asm 2026-02-21T09:43:16.8262858Z add.s32 %r498, %r478, 53248; 2026-02-21T09:43:16.8263051Z // begin inline asm 2026-02-21T09:43:16.8263321Z cp.async.cg.shared.global [ %r498 + 0 ], [ %rd180 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8263642Z // end inline asm 2026-02-21T09:43:16.8263808Z add.s32 %r500, %r478, 55296; 2026-02-21T09:43:16.8263992Z // begin inline asm 2026-02-21T09:43:16.8264222Z cp.async.cg.shared.global [ %r500 + 0 ], [ %rd181 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8264490Z // end inline asm 2026-02-21T09:43:16.8264769Z add.s32 %r502, %r478, 57344; 2026-02-21T09:43:16.8264956Z // begin inline asm 2026-02-21T09:43:16.8265182Z cp.async.cg.shared.global [ %r502 + 0 ], [ %rd182 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8265451Z // end inline asm 2026-02-21T09:43:16.8265610Z add.s32 %r504, %r478, 59392; 2026-02-21T09:43:16.8265784Z // begin inline asm 2026-02-21T09:43:16.8266018Z cp.async.cg.shared.global [ %r504 + 0 ], [ %rd183 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8266279Z // end inline asm 
2026-02-21T09:43:16.8266443Z add.s32 %r506, %r478, 61440; 2026-02-21T09:43:16.8266617Z // begin inline asm 2026-02-21T09:43:16.8266847Z cp.async.cg.shared.global [ %r506 + 0 ], [ %rd184 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8267109Z // end inline asm 2026-02-21T09:43:16.8267268Z add.s32 %r508, %r478, 63488; 2026-02-21T09:43:16.8267450Z // begin inline asm 2026-02-21T09:43:16.8267736Z cp.async.cg.shared.global [ %r508 + 0 ], [ %rd185 + 0 ], 0x10, %r479; 2026-02-21T09:43:16.8268026Z // end inline asm 2026-02-21T09:43:16.8268194Z cp.async.commit_group; 2026-02-21T09:43:16.8268526Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8268884Z add.s64 %rd474, %rd474, 128; 2026-02-21T09:43:16.8269079Z add.s64 %rd473, %rd473, 128; 2026-02-21T09:43:16.8269263Z add.s64 %rd472, %rd472, 128; 2026-02-21T09:43:16.8269452Z add.s64 %rd471, %rd471, 128; 2026-02-21T09:43:16.8269639Z add.s64 %rd470, %rd470, 128; 2026-02-21T09:43:16.8269818Z add.s64 %rd469, %rd469, 128; 2026-02-21T09:43:16.8270061Z add.s64 %rd468, %rd468, 128; 2026-02-21T09:43:16.8270247Z add.s64 %rd467, %rd467, 128; 2026-02-21T09:43:16.8270445Z add.s64 %rd466, %rd466, 128; 2026-02-21T09:43:16.8270637Z add.s64 %rd465, %rd465, 128; 2026-02-21T09:43:16.8270850Z add.s64 %rd464, %rd464, 128; 2026-02-21T09:43:16.8271050Z add.s64 %rd463, %rd463, 128; 2026-02-21T09:43:16.8271250Z add.s64 %rd462, %rd462, 128; 2026-02-21T09:43:16.8283309Z add.s64 %rd461, %rd461, 128; 2026-02-21T09:43:16.8283626Z add.s64 %rd460, %rd460, 128; 2026-02-21T09:43:16.8283828Z add.s64 %rd459, %rd459, 128; 2026-02-21T09:43:16.8284024Z setp.lt.u64 %p38, %rd458, 896; 2026-02-21T09:43:16.8284228Z mov.b32 %r1010, %r1015; 2026-02-21T09:43:16.8284404Z mov.b32 %r1011, %r516; 2026-02-21T09:43:16.8284588Z mov.b32 %r1015, %r61; 2026-02-21T09:43:16.8284802Z @%p38 bra $L__BB0_4; 2026-02-21T09:43:16.8284984Z bra.uni $L__BB0_7; 2026-02-21T09:43:16.8285253Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 
2026-02-21T09:43:16.8285551Z add.s32 %r445, %r1012, 1; 2026-02-21T09:43:16.8285774Z setp.gt.s32 %p25, %r445, 2; 2026-02-21T09:43:16.8285986Z selp.b32 %r1012, 0, %r445, %p25; 2026-02-21T09:43:16.8286389Z .loc 1 47 85 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:47:85 2026-02-21T09:43:16.8286938Z cp.async.wait_group 2; 2026-02-21T09:43:16.8287146Z bar.sync 0; 2026-02-21T09:43:16.8287459Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8287831Z shl.b32 %r446, %r1014, 3; 2026-02-21T09:43:16.8288024Z add.s32 %r448, %r63, %r446; 2026-02-21T09:43:16.8288212Z add.s32 %r516, %r448, 98304; 2026-02-21T09:43:16.8288553Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8288902Z @%p14 bra $L__BB0_6; 2026-02-21T09:43:16.8289132Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:43:16.8289524Z .loc 1 48 87 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:48:87 2026-02-21T09:43:16.8289877Z shl.b32 %r457, %r1012, 14; 2026-02-21T09:43:16.8290069Z add.s32 %r459, %r63, %r457; 2026-02-21T09:43:16.8290255Z add.s32 %r460, %r459, 49152; 2026-02-21T09:43:16.8290583Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8290938Z elect.sync %r461|%p27, -1; 2026-02-21T09:43:16.8291227Z bfe.u32 %r462, %r459, 4, 14; 2026-02-21T09:43:16.8291410Z cvt.u64.u32 %rd162, %r462; 2026-02-21T09:43:16.8291622Z or.b64 %rd153, %rd162, 4611686293372403712; 2026-02-21T09:43:16.8291835Z bfe.u32 %r463, %r460, 4, 14; 2026-02-21T09:43:16.8292028Z cvt.u64.u32 %rd163, %r463; 2026-02-21T09:43:16.8292249Z or.b64 %rd154, %rd163, 4611686293372403712; 2026-02-21T09:43:16.8292468Z mov.b32 %r450, 136314896; 2026-02-21T09:43:16.8292667Z mov.pred %p26, -1; 2026-02-21T09:43:16.8292845Z // begin inline asm 2026-02-21T09:43:16.8293157Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd153, %rd154, %r450, %p26; 2026-02-21T09:43:16.8293509Z // end 
inline asm 2026-02-21T09:43:16.8293701Z add.s32 %r464, %r459, 32; 2026-02-21T09:43:16.8293907Z bfe.u32 %r465, %r464, 4, 14; 2026-02-21T09:43:16.8294137Z cvt.u64.u32 %rd164, %r465; 2026-02-21T09:43:16.8294472Z or.b64 %rd155, %rd164, 4611686293372403712; 2026-02-21T09:43:16.8294754Z add.s32 %r466, %r459, 49184; 2026-02-21T09:43:16.8294965Z bfe.u32 %r467, %r466, 4, 14; 2026-02-21T09:43:16.8295154Z cvt.u64.u32 %rd165, %r467; 2026-02-21T09:43:16.8295363Z or.b64 %rd156, %rd165, 4611686293372403712; 2026-02-21T09:43:16.8295583Z // begin inline asm 2026-02-21T09:43:16.8295876Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd155, %rd156, %r450, %p26; 2026-02-21T09:43:16.8296202Z // end inline asm 2026-02-21T09:43:16.8296379Z add.s32 %r468, %r459, 64; 2026-02-21T09:43:16.8296573Z bfe.u32 %r469, %r468, 4, 14; 2026-02-21T09:43:16.8296825Z cvt.u64.u32 %rd166, %r469; 2026-02-21T09:43:16.8297040Z or.b64 %rd157, %rd166, 4611686293372403712; 2026-02-21T09:43:16.8297258Z add.s32 %r470, %r459, 49216; 2026-02-21T09:43:16.8297453Z bfe.u32 %r471, %r470, 4, 14; 2026-02-21T09:43:16.8297645Z cvt.u64.u32 %rd167, %r471; 2026-02-21T09:43:16.8297858Z or.b64 %rd158, %rd167, 4611686293372403712; 2026-02-21T09:43:16.8298077Z // begin inline asm 2026-02-21T09:43:16.8298364Z @%p27 tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd157, %rd158, %r450, %p26; 2026-02-21T09:43:16.8298696Z // end inline asm 2026-02-21T09:43:16.8298861Z add.s32 %r472, %r459, 96; 2026-02-21T09:43:16.8299052Z bfe.u32 %r473, %r472, 4, 14; 2026-02-21T09:43:16.8299244Z cvt.u64.u32 %rd168, %r473; 2026-02-21T09:43:16.8299452Z or.b64 %rd159, %rd168, 4611686293372403712; 2026-02-21T09:43:16.8299671Z add.s32 %r474, %r459, 49248; 2026-02-21T09:43:16.8299870Z bfe.u32 %r475, %r474, 4, 14; 2026-02-21T09:43:16.8300054Z cvt.u64.u32 %rd169, %r475; 2026-02-21T09:43:16.8300257Z or.b64 %rd160, %rd169, 4611686293372403712; 2026-02-21T09:43:16.8300465Z // begin inline asm 2026-02-21T09:43:16.8300722Z @%p27 
tcgen05.mma.cta_group::1.kind::f16 [ %r1009 + 0 ], %rd159, %rd160, %r450, %p26; 2026-02-21T09:43:16.8301023Z // end inline asm 2026-02-21T09:43:16.8301182Z cvt.u64.u32 %rd161, %r516; 2026-02-21T09:43:16.8301575Z // begin inline asm 2026-02-21T09:43:16.8301854Z @%p27 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd161]; 2026-02-21T09:43:16.8302181Z // end inline asm 2026-02-21T09:43:16.8302359Z bra.uni $L__BB0_6; 2026-02-21T09:43:16.8302607Z $L__BB0_7: // %._crit_edge.loopexit 2026-02-21T09:43:16.8303009Z .loc 1 0 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:0:52 2026-02-21T09:43:16.8303356Z mov.b32 %r517, 1; 2026-02-21T09:43:16.8303663Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8304005Z // begin inline asm 2026-02-21T09:43:16.8304172Z 2026-02-21T09:43:16.8304304Z { 2026-02-21T09:43:16.8304454Z .reg .pred complete; 2026-02-21T09:43:16.8304619Z waitLoop: 2026-02-21T09:43:16.8304899Z mbarrier.try_wait.parity.shared.b64 complete, [%r516], %r517; 2026-02-21T09:43:16.8305193Z @!complete bra.uni waitLoop; 2026-02-21T09:43:16.8305367Z } 2026-02-21T09:43:16.8305446Z 2026-02-21T09:43:16.8305520Z // end inline asm 2026-02-21T09:43:16.8305813Z .loc 1 42 90 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:42:90 2026-02-21T09:43:16.8306245Z cp.async.wait_group 0; 2026-02-21T09:43:16.8306419Z bar.sync 0; 2026-02-21T09:43:16.8306580Z add.s32 %r518, %r63, 98304; 2026-02-21T09:43:16.8306758Z // begin inline asm 2026-02-21T09:43:16.8306963Z @%p39 mbarrier.inval.shared::cta.b64 [%r518]; 2026-02-21T09:43:16.8307193Z // end inline asm 2026-02-21T09:43:16.8307347Z bar.sync 0; 2026-02-21T09:43:16.8307504Z // begin inline asm 2026-02-21T09:43:16.8307690Z @%p39 mbarrier.inval.shared::cta.b64 [%r201]; 2026-02-21T09:43:16.8307916Z // end inline asm 2026-02-21T09:43:16.8308210Z .loc 1 52 53 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:52:53 2026-02-21T09:43:16.8308566Z mad.lo.s32 %r801, 
%r34, 12288, %r32; 2026-02-21T09:43:16.8308771Z mad.lo.s32 %r802, %r35, 12288, %r32; 2026-02-21T09:43:16.8309036Z mad.lo.s32 %r803, %r36, 12288, %r32; 2026-02-21T09:43:16.8309249Z mad.lo.s32 %r804, %r37, 12288, %r32; 2026-02-21T09:43:16.8309448Z mad.lo.s32 %r805, %r38, 12288, %r32; 2026-02-21T09:43:16.8309670Z mad.lo.s32 %r806, %r39, 12288, %r32; 2026-02-21T09:43:16.8309883Z mad.lo.s32 %r807, %r40, 12288, %r32; 2026-02-21T09:43:16.8310109Z mad.lo.s32 %r808, %r41, 12288, %r32; 2026-02-21T09:43:16.8310332Z mad.lo.s32 %r809, %r42, 12288, %r32; 2026-02-21T09:43:16.8310575Z mad.lo.s32 %r810, %r43, 12288, %r32; 2026-02-21T09:43:16.8310807Z mad.lo.s32 %r811, %r44, 12288, %r32; 2026-02-21T09:43:16.8311032Z mad.lo.s32 %r812, %r45, 12288, %r32; 2026-02-21T09:43:16.8311296Z mad.lo.s32 %r813, %r46, 12288, %r32; 2026-02-21T09:43:16.8311492Z mad.lo.s32 %r814, %r47, 12288, %r32; 2026-02-21T09:43:16.8311691Z mad.lo.s32 %r815, %r48, 12288, %r32; 2026-02-21T09:43:16.8311885Z mad.lo.s32 %r816, %r49, 12288, %r32; 2026-02-21T09:43:16.8312227Z .loc 1 52 24 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:52:24 2026-02-21T09:43:16.8312580Z mad.wide.u32 %rd186, %r801, 2, %rd70; 2026-02-21T09:43:16.8312801Z mad.wide.u32 %rd187, %r802, 2, %rd70; 2026-02-21T09:43:16.8313005Z mad.wide.u32 %rd188, %r803, 2, %rd70; 2026-02-21T09:43:16.8313212Z mad.wide.u32 %rd189, %r804, 2, %rd70; 2026-02-21T09:43:16.8313421Z mad.wide.u32 %rd190, %r805, 2, %rd70; 2026-02-21T09:43:16.8313618Z mad.wide.u32 %rd191, %r806, 2, %rd70; 2026-02-21T09:43:16.8313828Z mad.wide.u32 %rd192, %r807, 2, %rd70; 2026-02-21T09:43:16.8314025Z mad.wide.u32 %rd193, %r808, 2, %rd70; 2026-02-21T09:43:16.8314230Z mad.wide.u32 %rd194, %r809, 2, %rd70; 2026-02-21T09:43:16.8314429Z mad.wide.u32 %rd195, %r810, 2, %rd70; 2026-02-21T09:43:16.8314639Z mad.wide.u32 %rd196, %r811, 2, %rd70; 2026-02-21T09:43:16.8314890Z mad.wide.u32 %rd197, %r812, 2, %rd70; 2026-02-21T09:43:16.8315100Z mad.wide.u32 %rd198, %r813, 2, %rd70; 
2026-02-21T09:43:16.8315308Z mad.wide.u32 %rd199, %r814, 2, %rd70; 2026-02-21T09:43:16.8315567Z mad.wide.u32 %rd200, %r815, 2, %rd70; 2026-02-21T09:43:16.8315784Z mad.wide.u32 %rd201, %r816, 2, %rd70; 2026-02-21T09:43:16.8316115Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8316465Z // begin inline asm 2026-02-21T09:43:16.8316906Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534, %r535}, [%r655 + 0]; 2026-02-21T09:43:16.8317385Z // end inline asm 2026-02-21T09:43:16.8317558Z // begin inline asm 2026-02-21T09:43:16.8318027Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552}, [%r655 + 16]; 2026-02-21T09:43:16.8318598Z // end inline asm 2026-02-21T09:43:16.8318777Z // begin inline asm 2026-02-21T09:43:16.8319228Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569}, [%r655 + 32]; 2026-02-21T09:43:16.8319744Z // end inline asm 2026-02-21T09:43:16.8319995Z // begin inline asm 2026-02-21T09:43:16.8320464Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586}, [%r655 + 48]; 2026-02-21T09:43:16.8320973Z // end inline asm 2026-02-21T09:43:16.8321148Z // begin inline asm 2026-02-21T09:43:16.8321604Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r588, %r589, %r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603}, [%r655 + 64]; 2026-02-21T09:43:16.8322133Z // end inline asm 2026-02-21T09:43:16.8322298Z // begin inline asm 2026-02-21T09:43:16.8322793Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r605, %r606, %r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620}, 
[%r655 + 80]; 2026-02-21T09:43:16.8323359Z // end inline asm 2026-02-21T09:43:16.8323525Z // begin inline asm 2026-02-21T09:43:16.8324048Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r622, %r623, %r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637}, [%r655 + 96]; 2026-02-21T09:43:16.8324564Z // end inline asm 2026-02-21T09:43:16.8324781Z // begin inline asm 2026-02-21T09:43:16.8325231Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r639, %r640, %r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654}, [%r655 + 112]; 2026-02-21T09:43:16.8325722Z // end inline asm 2026-02-21T09:43:16.8325895Z // begin inline asm 2026-02-21T09:43:16.8326082Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:43:16.8326348Z // end inline asm 2026-02-21T09:43:16.8326521Z cvt.u64.u32 %rd202, %r520; 2026-02-21T09:43:16.8326728Z cvt.u64.u32 %rd203, %r521; 2026-02-21T09:43:16.8326929Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:43:16.8327125Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:43:16.8327522Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8327942Z mov.b64 {%r817, %r818}, %rd205; 2026-02-21T09:43:16.8328165Z cvt.rn.f16x2.f32 %r819, %r818, %r817; 2026-02-21T09:43:16.8328545Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8328914Z cvt.u64.u32 %rd206, %r522; 2026-02-21T09:43:16.8329121Z cvt.u64.u32 %rd207, %r523; 2026-02-21T09:43:16.8329311Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:43:16.8329515Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:43:16.8329872Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8330253Z mov.b64 {%r820, %r821}, %rd209; 2026-02-21T09:43:16.8330462Z cvt.rn.f16x2.f32 %r822, %r821, %r820; 2026-02-21T09:43:16.8330830Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8331256Z cvt.u64.u32 %rd210, %r524; 
2026-02-21T09:43:16.8331450Z cvt.u64.u32 %rd211, %r525; 2026-02-21T09:43:16.8331650Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:43:16.8331858Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:43:16.8332251Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8332629Z mov.b64 {%r823, %r824}, %rd213; 2026-02-21T09:43:16.8332830Z cvt.rn.f16x2.f32 %r825, %r824, %r823; 2026-02-21T09:43:16.8333170Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8333511Z cvt.u64.u32 %rd214, %r526; 2026-02-21T09:43:16.8333697Z cvt.u64.u32 %rd215, %r527; 2026-02-21T09:43:16.8333874Z shl.b64 %rd216, %rd215, 32; 2026-02-21T09:43:16.8334063Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:43:16.8334386Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8334800Z mov.b64 {%r826, %r827}, %rd217; 2026-02-21T09:43:16.8335001Z cvt.rn.f16x2.f32 %r828, %r827, %r826; 2026-02-21T09:43:16.8335348Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8335773Z cvt.u64.u32 %rd218, %r528; 2026-02-21T09:43:16.8335950Z cvt.u64.u32 %rd219, %r529; 2026-02-21T09:43:16.8336134Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:43:16.8336324Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:43:16.8336646Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8336999Z mov.b64 {%r829, %r830}, %rd221; 2026-02-21T09:43:16.8337198Z cvt.rn.f16x2.f32 %r831, %r830, %r829; 2026-02-21T09:43:16.8337566Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8337930Z cvt.u64.u32 %rd222, %r530; 2026-02-21T09:43:16.8338132Z cvt.u64.u32 %rd223, %r531; 2026-02-21T09:43:16.8338343Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:43:16.8338553Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:43:16.8339043Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8339459Z mov.b64 {%r832, %r833}, %rd225; 2026-02-21T09:43:16.8339668Z cvt.rn.f16x2.f32 %r834, %r833, %r832; 2026-02-21T09:43:16.8340020Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8340387Z cvt.u64.u32 %rd226, %r532; 2026-02-21T09:43:16.8340575Z cvt.u64.u32 %rd227, %r533; 2026-02-21T09:43:16.8340769Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:43:16.8340965Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:43:16.8341368Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8341742Z mov.b64 {%r835, %r836}, %rd229; 2026-02-21T09:43:16.8341946Z cvt.rn.f16x2.f32 %r837, %r836, %r835; 2026-02-21T09:43:16.8342313Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8342677Z cvt.u64.u32 %rd230, %r534; 2026-02-21T09:43:16.8342873Z cvt.u64.u32 %rd231, %r535; 2026-02-21T09:43:16.8343065Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:43:16.8343254Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:43:16.8343601Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8343970Z mov.b64 {%r838, %r839}, %rd233; 2026-02-21T09:43:16.8344181Z cvt.rn.f16x2.f32 %r840, %r839, %r838; 2026-02-21T09:43:16.8344532Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8344979Z cvt.u64.u32 %rd234, %r537; 2026-02-21T09:43:16.8345172Z cvt.u64.u32 %rd235, %r538; 2026-02-21T09:43:16.8345370Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:43:16.8345586Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:43:16.8346032Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8346401Z mov.b64 {%r841, %r842}, %rd237; 2026-02-21T09:43:16.8346598Z cvt.rn.f16x2.f32 %r843, %r842, %r841; 2026-02-21T09:43:16.8346944Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8347285Z cvt.u64.u32 %rd238, 
%r539; 2026-02-21T09:43:16.8347466Z cvt.u64.u32 %rd239, %r540; 2026-02-21T09:43:16.8347649Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:43:16.8347830Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:43:16.8348159Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8348504Z mov.b64 {%r844, %r845}, %rd241; 2026-02-21T09:43:16.8348704Z cvt.rn.f16x2.f32 %r846, %r845, %r844; 2026-02-21T09:43:16.8349038Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8349391Z cvt.u64.u32 %rd242, %r541; 2026-02-21T09:43:16.8349567Z cvt.u64.u32 %rd243, %r542; 2026-02-21T09:43:16.8349748Z shl.b64 %rd244, %rd243, 32; 2026-02-21T09:43:16.8349935Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:43:16.8350316Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8350677Z mov.b64 {%r847, %r848}, %rd245; 2026-02-21T09:43:16.8350870Z cvt.rn.f16x2.f32 %r849, %r848, %r847; 2026-02-21T09:43:16.8351215Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8351569Z cvt.u64.u32 %rd246, %r543; 2026-02-21T09:43:16.8351768Z cvt.u64.u32 %rd247, %r544; 2026-02-21T09:43:16.8351973Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:43:16.8352179Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:43:16.8352570Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8352924Z mov.b64 {%r850, %r851}, %rd249; 2026-02-21T09:43:16.8353125Z cvt.rn.f16x2.f32 %r852, %r851, %r850; 2026-02-21T09:43:16.8353526Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8353897Z cvt.u64.u32 %rd250, %r545; 2026-02-21T09:43:16.8354074Z cvt.u64.u32 %rd251, %r546; 2026-02-21T09:43:16.8354259Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:43:16.8354445Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:43:16.8354810Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8355167Z mov.b64 {%r853, %r854}, %rd253; 2026-02-21T09:43:16.8355358Z cvt.rn.f16x2.f32 %r855, %r854, %r853; 2026-02-21T09:43:16.8355694Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8356085Z cvt.u64.u32 %rd254, %r547; 2026-02-21T09:43:16.8356268Z cvt.u64.u32 %rd255, %r548; 2026-02-21T09:43:16.8356450Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:43:16.8356516Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:43:16.8356718Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8356784Z mov.b64 {%r856, %r857}, %rd257; 2026-02-21T09:43:16.8356864Z cvt.rn.f16x2.f32 %r858, %r857, %r856; 2026-02-21T09:43:16.8357061Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8357126Z cvt.u64.u32 %rd258, %r549; 2026-02-21T09:43:16.8357197Z cvt.u64.u32 %rd259, %r550; 2026-02-21T09:43:16.8357261Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:43:16.8357326Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:43:16.8357531Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8357598Z mov.b64 {%r859, %r860}, %rd261; 2026-02-21T09:43:16.8357667Z cvt.rn.f16x2.f32 %r861, %r860, %r859; 2026-02-21T09:43:16.8357864Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8357982Z cvt.u64.u32 %rd262, %r551; 2026-02-21T09:43:16.8358049Z cvt.u64.u32 %rd263, %r552; 2026-02-21T09:43:16.8358116Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:43:16.8358191Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:43:16.8358391Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8358457Z mov.b64 {%r862, %r863}, %rd265; 2026-02-21T09:43:16.8358534Z cvt.rn.f16x2.f32 %r864, %r863, %r862; 2026-02-21T09:43:16.8358733Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8358798Z cvt.u64.u32 %rd266, 
%r554; 2026-02-21T09:43:16.8358863Z cvt.u64.u32 %rd267, %r555; 2026-02-21T09:43:16.8358934Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:43:16.8358998Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:43:16.8359205Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8359284Z mov.b64 {%r865, %r866}, %rd269; 2026-02-21T09:43:16.8359361Z cvt.rn.f16x2.f32 %r867, %r866, %r865; 2026-02-21T09:43:16.8359629Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8359712Z cvt.u64.u32 %rd270, %r556; 2026-02-21T09:43:16.8359785Z cvt.u64.u32 %rd271, %r557; 2026-02-21T09:43:16.8359859Z shl.b64 %rd272, %rd271, 32; 2026-02-21T09:43:16.8359933Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:43:16.8360181Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8360255Z mov.b64 {%r868, %r869}, %rd273; 2026-02-21T09:43:16.8360339Z cvt.rn.f16x2.f32 %r870, %r869, %r868; 2026-02-21T09:43:16.8360559Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8360622Z cvt.u64.u32 %rd274, %r558; 2026-02-21T09:43:16.8360687Z cvt.u64.u32 %rd275, %r559; 2026-02-21T09:43:16.8360752Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:43:16.8360870Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:43:16.8361072Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8361140Z mov.b64 {%r871, %r872}, %rd277; 2026-02-21T09:43:16.8361218Z cvt.rn.f16x2.f32 %r873, %r872, %r871; 2026-02-21T09:43:16.8361419Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8361483Z cvt.u64.u32 %rd278, %r560; 2026-02-21T09:43:16.8361556Z cvt.u64.u32 %rd279, %r561; 2026-02-21T09:43:16.8361620Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:43:16.8361715Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:43:16.8361915Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8361988Z mov.b64 {%r874, %r875}, %rd281; 2026-02-21T09:43:16.8362056Z cvt.rn.f16x2.f32 %r876, %r875, %r874; 2026-02-21T09:43:16.8362256Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8362329Z cvt.u64.u32 %rd282, %r562; 2026-02-21T09:43:16.8362395Z cvt.u64.u32 %rd283, %r563; 2026-02-21T09:43:16.8362461Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:43:16.8362532Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:43:16.8362730Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8362795Z mov.b64 {%r877, %r878}, %rd285; 2026-02-21T09:43:16.8362866Z cvt.rn.f16x2.f32 %r879, %r878, %r877; 2026-02-21T09:43:16.8363073Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8363139Z cvt.u64.u32 %rd286, %r564; 2026-02-21T09:43:16.8363203Z cvt.u64.u32 %rd287, %r565; 2026-02-21T09:43:16.8363275Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:43:16.8363339Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:43:16.8363571Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8363649Z mov.b64 {%r880, %r881}, %rd289; 2026-02-21T09:43:16.8363720Z cvt.rn.f16x2.f32 %r882, %r881, %r880; 2026-02-21T09:43:16.8363918Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8363991Z cvt.u64.u32 %rd290, %r566; 2026-02-21T09:43:16.8364055Z cvt.u64.u32 %rd291, %r567; 2026-02-21T09:43:16.8364120Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:43:16.8364184Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:43:16.8364388Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8364454Z mov.b64 {%r883, %r884}, %rd293; 2026-02-21T09:43:16.8364524Z cvt.rn.f16x2.f32 %r885, %r884, %r883; 2026-02-21T09:43:16.8364777Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8364845Z cvt.u64.u32 %rd294, 
%r568; 2026-02-21T09:43:16.8364912Z cvt.u64.u32 %rd295, %r569; 2026-02-21T09:43:16.8364977Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:43:16.8365091Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:43:16.8365291Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8365356Z mov.b64 {%r886, %r887}, %rd297; 2026-02-21T09:43:16.8365433Z cvt.rn.f16x2.f32 %r888, %r887, %r886; 2026-02-21T09:43:16.8365629Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8365692Z cvt.u64.u32 %rd298, %r571; 2026-02-21T09:43:16.8365762Z cvt.u64.u32 %rd299, %r572; 2026-02-21T09:43:16.8365828Z shl.b64 %rd300, %rd299, 32; 2026-02-21T09:43:16.8365891Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:43:16.8366088Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8366160Z mov.b64 {%r889, %r890}, %rd301; 2026-02-21T09:43:16.8366272Z cvt.rn.f16x2.f32 %r891, %r890, %r889; 2026-02-21T09:43:16.8366472Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8366548Z cvt.u64.u32 %rd302, %r573; 2026-02-21T09:43:16.8366613Z cvt.u64.u32 %rd303, %r574; 2026-02-21T09:43:16.8366680Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:43:16.8366750Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:43:16.8366949Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8367015Z mov.b64 {%r892, %r893}, %rd305; 2026-02-21T09:43:16.8367085Z cvt.rn.f16x2.f32 %r894, %r893, %r892; 2026-02-21T09:43:16.8367328Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8367398Z cvt.u64.u32 %rd306, %r575; 2026-02-21T09:43:16.8367470Z cvt.u64.u32 %rd307, %r576; 2026-02-21T09:43:16.8367548Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:43:16.8367618Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:43:16.8367838Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8367918Z mov.b64 {%r895, %r896}, %rd309; 2026-02-21T09:43:16.8367993Z cvt.rn.f16x2.f32 %r897, %r896, %r895; 2026-02-21T09:43:16.8368220Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8368293Z cvt.u64.u32 %rd310, %r577; 2026-02-21T09:43:16.8368374Z cvt.u64.u32 %rd311, %r578; 2026-02-21T09:43:16.8368448Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:43:16.8368520Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:43:16.8368773Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8368838Z mov.b64 {%r898, %r899}, %rd313; 2026-02-21T09:43:16.8368908Z cvt.rn.f16x2.f32 %r900, %r899, %r898; 2026-02-21T09:43:16.8369163Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8369234Z cvt.u64.u32 %rd314, %r579; 2026-02-21T09:43:16.8369299Z cvt.u64.u32 %rd315, %r580; 2026-02-21T09:43:16.8369366Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:43:16.8369438Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:43:16.8369640Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8369703Z mov.b64 {%r901, %r902}, %rd317; 2026-02-21T09:43:16.8369780Z cvt.rn.f16x2.f32 %r903, %r902, %r901; 2026-02-21T09:43:16.8369981Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8370047Z cvt.u64.u32 %rd318, %r581; 2026-02-21T09:43:16.8370119Z cvt.u64.u32 %rd319, %r582; 2026-02-21T09:43:16.8370186Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:43:16.8370252Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:43:16.8370453Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8370527Z mov.b64 {%r904, %r905}, %rd321; 2026-02-21T09:43:16.8370596Z cvt.rn.f16x2.f32 %r906, %r905, %r904; 2026-02-21T09:43:16.8370860Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8370936Z cvt.u64.u32 %rd322, 
%r583; 2026-02-21T09:43:16.8371001Z cvt.u64.u32 %rd323, %r584; 2026-02-21T09:43:16.8371066Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:43:16.8371139Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:43:16.8371340Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8371405Z mov.b64 {%r907, %r908}, %rd325; 2026-02-21T09:43:16.8371477Z cvt.rn.f16x2.f32 %r909, %r908, %r907; 2026-02-21T09:43:16.8371685Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8371748Z cvt.u64.u32 %rd326, %r585; 2026-02-21T09:43:16.8371813Z cvt.u64.u32 %rd327, %r586; 2026-02-21T09:43:16.8371923Z shl.b64 %rd328, %rd327, 32; 2026-02-21T09:43:16.8371990Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:43:16.8372194Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8372269Z mov.b64 {%r910, %r911}, %rd329; 2026-02-21T09:43:16.8372338Z cvt.rn.f16x2.f32 %r912, %r911, %r910; 2026-02-21T09:43:16.8372539Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8372601Z cvt.u64.u32 %rd330, %r588; 2026-02-21T09:43:16.8372671Z cvt.u64.u32 %rd331, %r589; 2026-02-21T09:43:16.8372733Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:43:16.8372830Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:43:16.8373044Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8373111Z mov.b64 {%r913, %r914}, %rd333; 2026-02-21T09:43:16.8373181Z cvt.rn.f16x2.f32 %r915, %r914, %r913; 2026-02-21T09:43:16.8373396Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8373463Z cvt.u64.u32 %rd334, %r590; 2026-02-21T09:43:16.8373526Z cvt.u64.u32 %rd335, %r591; 2026-02-21T09:43:16.8373591Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:43:16.8373663Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:43:16.8373863Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8373928Z mov.b64 {%r916, %r917}, %rd337; 2026-02-21T09:43:16.8374004Z cvt.rn.f16x2.f32 %r918, %r917, %r916; 2026-02-21T09:43:16.8374203Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8374268Z cvt.u64.u32 %rd338, %r592; 2026-02-21T09:43:16.8374340Z cvt.u64.u32 %rd339, %r593; 2026-02-21T09:43:16.8374404Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:43:16.8374469Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:43:16.8374751Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8374832Z mov.b64 {%r919, %r920}, %rd341; 2026-02-21T09:43:16.8374902Z cvt.rn.f16x2.f32 %r921, %r920, %r919; 2026-02-21T09:43:16.8375101Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8375172Z cvt.u64.u32 %rd342, %r594; 2026-02-21T09:43:16.8375236Z cvt.u64.u32 %rd343, %r595; 2026-02-21T09:43:16.8375301Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:43:16.8375373Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:43:16.8375572Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8375644Z mov.b64 {%r922, %r923}, %rd345; 2026-02-21T09:43:16.8375718Z cvt.rn.f16x2.f32 %r924, %r923, %r922; 2026-02-21T09:43:16.8375942Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8376011Z cvt.u64.u32 %rd346, %r596; 2026-02-21T09:43:16.8376080Z cvt.u64.u32 %rd347, %r597; 2026-02-21T09:43:16.8376159Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:43:16.8376271Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:43:16.8376501Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8376585Z mov.b64 {%r925, %r926}, %rd349; 2026-02-21T09:43:16.8376665Z cvt.rn.f16x2.f32 %r927, %r926, %r925; 2026-02-21T09:43:16.8376909Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8376980Z cvt.u64.u32 %rd350, 
%r598; 2026-02-21T09:43:16.8377054Z cvt.u64.u32 %rd351, %r599; 2026-02-21T09:43:16.8377120Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:43:16.8377184Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:43:16.8377393Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8377459Z mov.b64 {%r928, %r929}, %rd353; 2026-02-21T09:43:16.8377572Z cvt.rn.f16x2.f32 %r930, %r929, %r928; 2026-02-21T09:43:16.8377784Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8377850Z cvt.u64.u32 %rd354, %r600; 2026-02-21T09:43:16.8377915Z cvt.u64.u32 %rd355, %r601; 2026-02-21T09:43:16.8377981Z shl.b64 %rd356, %rd355, 32; 2026-02-21T09:43:16.8378054Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:43:16.8378254Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8378320Z mov.b64 {%r931, %r932}, %rd357; 2026-02-21T09:43:16.8378436Z cvt.rn.f16x2.f32 %r933, %r932, %r931; 2026-02-21T09:43:16.8378644Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8378710Z cvt.u64.u32 %rd358, %r602; 2026-02-21T09:43:16.8378780Z cvt.u64.u32 %rd359, %r603; 2026-02-21T09:43:16.8378845Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:43:16.8378913Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:43:16.8379116Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8379190Z mov.b64 {%r934, %r935}, %rd361; 2026-02-21T09:43:16.8379260Z cvt.rn.f16x2.f32 %r936, %r935, %r934; 2026-02-21T09:43:16.8379463Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8379541Z cvt.u64.u32 %rd362, %r605; 2026-02-21T09:43:16.8379606Z cvt.u64.u32 %rd363, %r606; 2026-02-21T09:43:16.8379671Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:43:16.8379743Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:43:16.8379951Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8380016Z mov.b64 {%r937, %r938}, %rd365; 2026-02-21T09:43:16.8380086Z cvt.rn.f16x2.f32 %r939, %r938, %r937; 2026-02-21T09:43:16.8380325Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8380391Z cvt.u64.u32 %rd366, %r607; 2026-02-21T09:43:16.8380456Z cvt.u64.u32 %rd367, %r608; 2026-02-21T09:43:16.8380533Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:43:16.8380610Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:43:16.8380834Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8380912Z mov.b64 {%r940, %r941}, %rd369; 2026-02-21T09:43:16.8380984Z cvt.rn.f16x2.f32 %r942, %r941, %r940; 2026-02-21T09:43:16.8381197Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8381267Z cvt.u64.u32 %rd370, %r609; 2026-02-21T09:43:16.8381342Z cvt.u64.u32 %rd371, %r610; 2026-02-21T09:43:16.8381409Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:43:16.8381476Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:43:16.8381697Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8381768Z mov.b64 {%r943, %r944}, %rd373; 2026-02-21T09:43:16.8381876Z cvt.rn.f16x2.f32 %r945, %r944, %r943; 2026-02-21T09:43:16.8382096Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8382162Z cvt.u64.u32 %rd374, %r611; 2026-02-21T09:43:16.8382228Z cvt.u64.u32 %rd375, %r612; 2026-02-21T09:43:16.8382297Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:43:16.8382372Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:43:16.8382581Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8382652Z mov.b64 {%r946, %r947}, %rd377; 2026-02-21T09:43:16.8382735Z cvt.rn.f16x2.f32 %r948, %r947, %r946; 2026-02-21T09:43:16.8382944Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8383012Z cvt.u64.u32 %rd378, 
%r613; 2026-02-21T09:43:16.8383088Z cvt.u64.u32 %rd379, %r614; 2026-02-21T09:43:16.8383186Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:43:16.8383257Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:43:16.8383469Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8383546Z mov.b64 {%r949, %r950}, %rd381; 2026-02-21T09:43:16.8383629Z cvt.rn.f16x2.f32 %r951, %r950, %r949; 2026-02-21T09:43:16.8383842Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8383925Z cvt.u64.u32 %rd382, %r615; 2026-02-21T09:43:16.8383997Z cvt.u64.u32 %rd383, %r616; 2026-02-21T09:43:16.8384110Z shl.b64 %rd384, %rd383, 32; 2026-02-21T09:43:16.8384198Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:43:16.8384438Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8384510Z mov.b64 {%r952, %r953}, %rd385; 2026-02-21T09:43:16.8384591Z cvt.rn.f16x2.f32 %r954, %r953, %r952; 2026-02-21T09:43:16.8384901Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8384982Z cvt.u64.u32 %rd386, %r617; 2026-02-21T09:43:16.8385058Z cvt.u64.u32 %rd387, %r618; 2026-02-21T09:43:16.8385144Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:43:16.8385222Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:43:16.8385455Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8385531Z mov.b64 {%r955, %r956}, %rd389; 2026-02-21T09:43:16.8385603Z cvt.rn.f16x2.f32 %r957, %r956, %r955; 2026-02-21T09:43:16.8385814Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8385882Z cvt.u64.u32 %rd390, %r619; 2026-02-21T09:43:16.8385957Z cvt.u64.u32 %rd391, %r620; 2026-02-21T09:43:16.8386023Z shl.b64 %rd392, %rd391, 32; 2026-02-21T09:43:16.8386090Z or.b64 %rd393, %rd390, %rd392; 2026-02-21T09:43:16.8386363Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8386439Z mov.b64 {%r958, %r959}, %rd393; 2026-02-21T09:43:16.8386515Z cvt.rn.f16x2.f32 %r960, %r959, %r958; 2026-02-21T09:43:16.8386736Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8386803Z cvt.u64.u32 %rd394, %r622; 2026-02-21T09:43:16.8386870Z cvt.u64.u32 %rd395, %r623; 2026-02-21T09:43:16.8386939Z shl.b64 %rd396, %rd395, 32; 2026-02-21T09:43:16.8387015Z or.b64 %rd397, %rd394, %rd396; 2026-02-21T09:43:16.8387230Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8387300Z mov.b64 {%r961, %r962}, %rd397; 2026-02-21T09:43:16.8387380Z cvt.rn.f16x2.f32 %r963, %r962, %r961; 2026-02-21T09:43:16.8387594Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8387663Z cvt.u64.u32 %rd398, %r624; 2026-02-21T09:43:16.8387738Z cvt.u64.u32 %rd399, %r625; 2026-02-21T09:43:16.8387849Z shl.b64 %rd400, %rd399, 32; 2026-02-21T09:43:16.8387918Z or.b64 %rd401, %rd398, %rd400; 2026-02-21T09:43:16.8388135Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8388212Z mov.b64 {%r964, %r965}, %rd401; 2026-02-21T09:43:16.8388286Z cvt.rn.f16x2.f32 %r966, %r965, %r964; 2026-02-21T09:43:16.8388493Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8388569Z cvt.u64.u32 %rd402, %r626; 2026-02-21T09:43:16.8388637Z cvt.u64.u32 %rd403, %r627; 2026-02-21T09:43:16.8388704Z shl.b64 %rd404, %rd403, 32; 2026-02-21T09:43:16.8388781Z or.b64 %rd405, %rd402, %rd404; 2026-02-21T09:43:16.8388993Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8389094Z mov.b64 {%r967, %r968}, %rd405; 2026-02-21T09:43:16.8389166Z cvt.rn.f16x2.f32 %r969, %r968, %r967; 2026-02-21T09:43:16.8389373Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8389436Z cvt.u64.u32 %rd406, 
%r628; 2026-02-21T09:43:16.8389501Z cvt.u64.u32 %rd407, %r629; 2026-02-21T09:43:16.8389573Z shl.b64 %rd408, %rd407, 32; 2026-02-21T09:43:16.8389637Z or.b64 %rd409, %rd406, %rd408; 2026-02-21T09:43:16.8389835Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8389906Z mov.b64 {%r970, %r971}, %rd409; 2026-02-21T09:43:16.8390012Z cvt.rn.f16x2.f32 %r972, %r971, %r970; 2026-02-21T09:43:16.8390208Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8390272Z cvt.u64.u32 %rd410, %r630; 2026-02-21T09:43:16.8390341Z cvt.u64.u32 %rd411, %r631; 2026-02-21T09:43:16.8390407Z shl.b64 %rd412, %rd411, 32; 2026-02-21T09:43:16.8390473Z or.b64 %rd413, %rd410, %rd412; 2026-02-21T09:43:16.8390674Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8390739Z mov.b64 {%r973, %r974}, %rd413; 2026-02-21T09:43:16.8390808Z cvt.rn.f16x2.f32 %r975, %r974, %r973; 2026-02-21T09:43:16.8391011Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8391074Z cvt.u64.u32 %rd414, %r632; 2026-02-21T09:43:16.8391137Z cvt.u64.u32 %rd415, %r633; 2026-02-21T09:43:16.8391201Z shl.b64 %rd416, %rd415, 32; 2026-02-21T09:43:16.8391274Z or.b64 %rd417, %rd414, %rd416; 2026-02-21T09:43:16.8391476Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8391539Z mov.b64 {%r976, %r977}, %rd417; 2026-02-21T09:43:16.8391615Z cvt.rn.f16x2.f32 %r978, %r977, %r976; 2026-02-21T09:43:16.8391857Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8391925Z cvt.u64.u32 %rd418, %r634; 2026-02-21T09:43:16.8391997Z cvt.u64.u32 %rd419, %r635; 2026-02-21T09:43:16.8392061Z shl.b64 %rd420, %rd419, 32; 2026-02-21T09:43:16.8392125Z or.b64 %rd421, %rd418, %rd420; 2026-02-21T09:43:16.8392337Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8392414Z mov.b64 {%r979, %r980}, %rd421; 2026-02-21T09:43:16.8392488Z cvt.rn.f16x2.f32 %r981, %r980, %r979; 2026-02-21T09:43:16.8392708Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8392789Z cvt.u64.u32 %rd422, %r636; 2026-02-21T09:43:16.8392860Z cvt.u64.u32 %rd423, %r637; 2026-02-21T09:43:16.8392931Z shl.b64 %rd424, %rd423, 32; 2026-02-21T09:43:16.8393010Z or.b64 %rd425, %rd422, %rd424; 2026-02-21T09:43:16.8393250Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8393324Z mov.b64 {%r982, %r983}, %rd425; 2026-02-21T09:43:16.8393450Z cvt.rn.f16x2.f32 %r984, %r983, %r982; 2026-02-21T09:43:16.8393681Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8393745Z cvt.u64.u32 %rd426, %r639; 2026-02-21T09:43:16.8393807Z cvt.u64.u32 %rd427, %r640; 2026-02-21T09:43:16.8393878Z shl.b64 %rd428, %rd427, 32; 2026-02-21T09:43:16.8393941Z or.b64 %rd429, %rd426, %rd428; 2026-02-21T09:43:16.8394139Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8394212Z mov.b64 {%r985, %r986}, %rd429; 2026-02-21T09:43:16.8394281Z cvt.rn.f16x2.f32 %r987, %r986, %r985; 2026-02-21T09:43:16.8394480Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8394546Z cvt.u64.u32 %rd430, %r641; 2026-02-21T09:43:16.8394652Z cvt.u64.u32 %rd431, %r642; 2026-02-21T09:43:16.8394779Z shl.b64 %rd432, %rd431, 32; 2026-02-21T09:43:16.8394848Z or.b64 %rd433, %rd430, %rd432; 2026-02-21T09:43:16.8395054Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8395118Z mov.b64 {%r988, %r989}, %rd433; 2026-02-21T09:43:16.8395188Z cvt.rn.f16x2.f32 %r990, %r989, %r988; 2026-02-21T09:43:16.8395391Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8395457Z cvt.u64.u32 %rd434, 
%r643; 2026-02-21T09:43:16.8395564Z cvt.u64.u32 %rd435, %r644; 2026-02-21T09:43:16.8395632Z shl.b64 %rd436, %rd435, 32; 2026-02-21T09:43:16.8395704Z or.b64 %rd437, %rd434, %rd436; 2026-02-21T09:43:16.8395903Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8395967Z mov.b64 {%r991, %r992}, %rd437; 2026-02-21T09:43:16.8396047Z cvt.rn.f16x2.f32 %r993, %r992, %r991; 2026-02-21T09:43:16.8396245Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8396310Z cvt.u64.u32 %rd438, %r645; 2026-02-21T09:43:16.8396381Z cvt.u64.u32 %rd439, %r646; 2026-02-21T09:43:16.8396445Z shl.b64 %rd440, %rd439, 32; 2026-02-21T09:43:16.8396509Z or.b64 %rd441, %rd438, %rd440; 2026-02-21T09:43:16.8396706Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8396779Z mov.b64 {%r994, %r995}, %rd441; 2026-02-21T09:43:16.8396849Z cvt.rn.f16x2.f32 %r996, %r995, %r994; 2026-02-21T09:43:16.8397050Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8397123Z cvt.u64.u32 %rd442, %r647; 2026-02-21T09:43:16.8397187Z cvt.u64.u32 %rd443, %r648; 2026-02-21T09:43:16.8397252Z shl.b64 %rd444, %rd443, 32; 2026-02-21T09:43:16.8397358Z or.b64 %rd445, %rd442, %rd444; 2026-02-21T09:43:16.8397561Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8397629Z mov.b64 {%r997, %r998}, %rd445; 2026-02-21T09:43:16.8397697Z cvt.rn.f16x2.f32 %r999, %r998, %r997; 2026-02-21T09:43:16.8397908Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8397972Z cvt.u64.u32 %rd446, %r649; 2026-02-21T09:43:16.8398038Z cvt.u64.u32 %rd447, %r650; 2026-02-21T09:43:16.8398111Z shl.b64 %rd448, %rd447, 32; 2026-02-21T09:43:16.8398176Z or.b64 %rd449, %rd446, %rd448; 2026-02-21T09:43:16.8398375Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 
2026-02-21T09:43:16.8398455Z mov.b64 {%r1000, %r1001}, %rd449; 2026-02-21T09:43:16.8398535Z cvt.rn.f16x2.f32 %r1002, %r1001, %r1000; 2026-02-21T09:43:16.8398742Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8398807Z cvt.u64.u32 %rd450, %r651; 2026-02-21T09:43:16.8398914Z cvt.u64.u32 %rd451, %r652; 2026-02-21T09:43:16.8398980Z shl.b64 %rd452, %rd451, 32; 2026-02-21T09:43:16.8399047Z or.b64 %rd453, %rd450, %rd452; 2026-02-21T09:43:16.8399259Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8399330Z mov.b64 {%r1003, %r1004}, %rd453; 2026-02-21T09:43:16.8399411Z cvt.rn.f16x2.f32 %r1005, %r1004, %r1003; 2026-02-21T09:43:16.8399619Z .loc 1 49 52 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:49:52 2026-02-21T09:43:16.8399684Z cvt.u64.u32 %rd454, %r653; 2026-02-21T09:43:16.8399750Z cvt.u64.u32 %rd455, %r654; 2026-02-21T09:43:16.8399820Z shl.b64 %rd456, %rd455, 32; 2026-02-21T09:43:16.8399894Z or.b64 %rd457, %rd454, %rd456; 2026-02-21T09:43:16.8400117Z .loc 1 51 27 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:51:27 2026-02-21T09:43:16.8400229Z mov.b64 {%r1006, %r1007}, %rd457; 2026-02-21T09:43:16.8400323Z cvt.rn.f16x2.f32 %r1008, %r1007, %r1006; 2026-02-21T09:43:16.8400556Z .loc 1 52 83 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:52:83 2026-02-21T09:43:16.8400684Z st.shared.v4.b32 [%r23], {%r819, %r831, %r843, %r855}; 2026-02-21T09:43:16.8400816Z st.shared.v4.b32 [%r24], {%r867, %r879, %r891, %r903}; 2026-02-21T09:43:16.8400932Z st.shared.v4.b32 [%r25], {%r915, %r927, %r939, %r951}; 2026-02-21T09:43:16.8401038Z st.shared.v4.b32 [%r26], {%r963, %r975, %r987, %r999}; 2026-02-21T09:43:16.8401132Z bar.sync 0; 2026-02-21T09:43:16.8401205Z // begin inline asm 2026-02-21T09:43:16.8401390Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r736, %r740, %r744, %r748}, [%r660]; 2026-02-21T09:43:16.8401454Z // end inline asm 
2026-02-21T09:43:16.8401525Z // begin inline asm 2026-02-21T09:43:16.8401707Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r752, %r756, %r760, %r764}, [%r665]; 2026-02-21T09:43:16.8401771Z // end inline asm 2026-02-21T09:43:16.8401835Z // begin inline asm 2026-02-21T09:43:16.8402018Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r768, %r772, %r776, %r780}, [%r670]; 2026-02-21T09:43:16.8402079Z // end inline asm 2026-02-21T09:43:16.8402141Z // begin inline asm 2026-02-21T09:43:16.8402320Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r784, %r788, %r792, %r796}, [%r675]; 2026-02-21T09:43:16.8402379Z // end inline asm 2026-02-21T09:43:16.8402442Z bar.sync 0; 2026-02-21T09:43:16.8402551Z st.shared.v4.b32 [%r23], {%r822, %r834, %r846, %r858}; 2026-02-21T09:43:16.8402651Z st.shared.v4.b32 [%r24], {%r870, %r882, %r894, %r906}; 2026-02-21T09:43:16.8402751Z st.shared.v4.b32 [%r25], {%r918, %r930, %r942, %r954}; 2026-02-21T09:43:16.8402857Z st.shared.v4.b32 [%r26], {%r966, %r978, %r990, %r1002}; 2026-02-21T09:43:16.8402924Z bar.sync 0; 2026-02-21T09:43:16.8402986Z // begin inline asm 2026-02-21T09:43:16.8403193Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r737, %r741, %r745, %r749}, [%r660]; 2026-02-21T09:43:16.8403265Z // end inline asm 2026-02-21T09:43:16.8403331Z // begin inline asm 2026-02-21T09:43:16.8403504Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r753, %r757, %r761, %r765}, [%r665]; 2026-02-21T09:43:16.8403575Z // end inline asm 2026-02-21T09:43:16.8403638Z // begin inline asm 2026-02-21T09:43:16.8403809Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r769, %r773, %r777, %r781}, [%r670]; 2026-02-21T09:43:16.8403871Z // end inline asm 2026-02-21T09:43:16.8403942Z // begin inline asm 2026-02-21T09:43:16.8404111Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r785, %r789, %r793, %r797}, [%r675]; 2026-02-21T09:43:16.8404174Z // end inline asm 2026-02-21T09:43:16.8404242Z bar.sync 0; 2026-02-21T09:43:16.8404343Z st.shared.v4.b32 [%r23], {%r825, %r837, %r849, %r861}; 
2026-02-21T09:43:16.8404440Z st.shared.v4.b32 [%r24], {%r873, %r885, %r897, %r909}; 2026-02-21T09:43:16.8404539Z st.shared.v4.b32 [%r25], {%r921, %r933, %r945, %r957}; 2026-02-21T09:43:16.8404651Z st.shared.v4.b32 [%r26], {%r969, %r981, %r993, %r1005}; 2026-02-21T09:43:16.8404784Z bar.sync 0; 2026-02-21T09:43:16.8404851Z // begin inline asm 2026-02-21T09:43:16.8405040Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r738, %r742, %r746, %r750}, [%r660]; 2026-02-21T09:43:16.8405102Z // end inline asm 2026-02-21T09:43:16.8405165Z // begin inline asm 2026-02-21T09:43:16.8405347Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r754, %r758, %r762, %r766}, [%r665]; 2026-02-21T09:43:16.8405408Z // end inline asm 2026-02-21T09:43:16.8405469Z // begin inline asm 2026-02-21T09:43:16.8405639Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r770, %r774, %r778, %r782}, [%r670]; 2026-02-21T09:43:16.8405707Z // end inline asm 2026-02-21T09:43:16.8405769Z // begin inline asm 2026-02-21T09:43:16.8405939Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r786, %r790, %r794, %r798}, [%r675]; 2026-02-21T09:43:16.8406005Z // end inline asm 2026-02-21T09:43:16.8406065Z bar.sync 0; 2026-02-21T09:43:16.8406205Z st.shared.v4.b32 [%r23], {%r828, %r840, %r852, %r864}; 2026-02-21T09:43:16.8406306Z st.shared.v4.b32 [%r24], {%r876, %r888, %r900, %r912}; 2026-02-21T09:43:16.8406414Z st.shared.v4.b32 [%r25], {%r924, %r936, %r948, %r960}; 2026-02-21T09:43:16.8406518Z st.shared.v4.b32 [%r26], {%r972, %r984, %r996, %r1008}; 2026-02-21T09:43:16.8406580Z bar.sync 0; 2026-02-21T09:43:16.8406652Z // begin inline asm 2026-02-21T09:43:16.8406833Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r739, %r743, %r747, %r751}, [%r660]; 2026-02-21T09:43:16.8406895Z // end inline asm 2026-02-21T09:43:16.8406968Z // begin inline asm 2026-02-21T09:43:16.8407158Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r755, %r759, %r763, %r767}, [%r665]; 2026-02-21T09:43:16.8407260Z // end inline asm 2026-02-21T09:43:16.8407326Z // begin inline asm 
2026-02-21T09:43:16.8407538Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r771, %r775, %r779, %r783}, [%r670]; 2026-02-21T09:43:16.8407602Z // end inline asm 2026-02-21T09:43:16.8407674Z // begin inline asm 2026-02-21T09:43:16.8407868Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r787, %r791, %r795, %r799}, [%r675]; 2026-02-21T09:43:16.8407928Z // end inline asm 2026-02-21T09:43:16.8407989Z // begin inline asm 2026-02-21T09:43:16.8408118Z st.global.v4.b32 [ %rd186 + 0 ], { %r736, %r737, %r738, %r739 }; 2026-02-21T09:43:16.8408177Z // end inline asm 2026-02-21T09:43:16.8408239Z // begin inline asm 2026-02-21T09:43:16.8408357Z st.global.v4.b32 [ %rd187 + 0 ], { %r740, %r741, %r742, %r743 }; 2026-02-21T09:43:16.8408423Z // end inline asm 2026-02-21T09:43:16.8408485Z // begin inline asm 2026-02-21T09:43:16.8408598Z st.global.v4.b32 [ %rd188 + 0 ], { %r744, %r745, %r746, %r747 }; 2026-02-21T09:43:16.8408668Z // end inline asm 2026-02-21T09:43:16.8408730Z // begin inline asm 2026-02-21T09:43:16.8408841Z st.global.v4.b32 [ %rd189 + 0 ], { %r748, %r749, %r750, %r751 }; 2026-02-21T09:43:16.8408900Z // end inline asm 2026-02-21T09:43:16.8408968Z // begin inline asm 2026-02-21T09:43:16.8409121Z st.global.v4.b32 [ %rd190 + 0 ], { %r752, %r753, %r754, %r755 }; 2026-02-21T09:43:16.8409186Z // end inline asm 2026-02-21T09:43:16.8409259Z // begin inline asm 2026-02-21T09:43:16.8409372Z st.global.v4.b32 [ %rd191 + 0 ], { %r756, %r757, %r758, %r759 }; 2026-02-21T09:43:16.8409434Z // end inline asm 2026-02-21T09:43:16.8409496Z // begin inline asm 2026-02-21T09:43:16.8409614Z st.global.v4.b32 [ %rd192 + 0 ], { %r760, %r761, %r762, %r763 }; 2026-02-21T09:43:16.8409674Z // end inline asm 2026-02-21T09:43:16.8409735Z // begin inline asm 2026-02-21T09:43:16.8409852Z st.global.v4.b32 [ %rd193 + 0 ], { %r764, %r765, %r766, %r767 }; 2026-02-21T09:43:16.8409913Z // end inline asm 2026-02-21T09:43:16.8409975Z // begin inline asm 2026-02-21T09:43:16.8410092Z st.global.v4.b32 [ %rd194 + 0 ], { 
%r768, %r769, %r770, %r771 }; 2026-02-21T09:43:16.8410151Z // end inline asm 2026-02-21T09:43:16.8410211Z // begin inline asm 2026-02-21T09:43:16.8410320Z st.global.v4.b32 [ %rd195 + 0 ], { %r772, %r773, %r774, %r775 }; 2026-02-21T09:43:16.8410389Z // end inline asm 2026-02-21T09:43:16.8410451Z // begin inline asm 2026-02-21T09:43:16.8410600Z st.global.v4.b32 [ %rd196 + 0 ], { %r776, %r777, %r778, %r779 }; 2026-02-21T09:43:16.8410669Z // end inline asm 2026-02-21T09:43:16.8410730Z // begin inline asm 2026-02-21T09:43:16.8410841Z st.global.v4.b32 [ %rd197 + 0 ], { %r780, %r781, %r782, %r783 }; 2026-02-21T09:43:16.8410903Z // end inline asm 2026-02-21T09:43:16.8410971Z // begin inline asm 2026-02-21T09:43:16.8411080Z st.global.v4.b32 [ %rd198 + 0 ], { %r784, %r785, %r786, %r787 }; 2026-02-21T09:43:16.8411139Z // end inline asm 2026-02-21T09:43:16.8411211Z // begin inline asm 2026-02-21T09:43:16.8411323Z st.global.v4.b32 [ %rd199 + 0 ], { %r788, %r789, %r790, %r791 }; 2026-02-21T09:43:16.8411382Z // end inline asm 2026-02-21T09:43:16.8411443Z // begin inline asm 2026-02-21T09:43:16.8411562Z st.global.v4.b32 [ %rd200 + 0 ], { %r792, %r793, %r794, %r795 }; 2026-02-21T09:43:16.8411624Z // end inline asm 2026-02-21T09:43:16.8411717Z // begin inline asm 2026-02-21T09:43:16.8411836Z st.global.v4.b32 [ %rd201 + 0 ], { %r796, %r797, %r798, %r799 }; 2026-02-21T09:43:16.8411897Z // end inline asm 2026-02-21T09:43:16.8411988Z $L__BB0_8: // %._crit_edge 2026-02-21T09:43:16.8412203Z .loc 1 23 4 // cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py:23:4 2026-02-21T09:43:16.8412262Z bar.sync 0; 2026-02-21T09:43:16.8412324Z // begin inline asm 2026-02-21T09:43:16.8412464Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1009, 128; 2026-02-21T09:43:16.8412531Z // end inline asm 2026-02-21T09:43:16.8412632Z ret; 2026-02-21T09:43:16.8412694Z $L__tmp0: 2026-02-21T09:43:16.8412762Z $L__func_end0: 2026-02-21T09:43:16.8412860Z // -- End function 2026-02-21T09:43:16.8412916Z } 
2026-02-21T09:43:16.8413173Z .file 1 "/tmp/torchinductor_root/ge/cgey4oqu7zncjwhzwqcbkbazyklarskgmc5bboo7z2wbcdinbjsx.py" 2026-02-21T09:43:16.8413245Z .section .debug_abbrev 2026-02-21T09:43:16.8413303Z { 2026-02-21T09:43:16.8413403Z .b8 1 // Abbreviation Code 2026-02-21T09:43:16.8413518Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:43:16.8413610Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:43:16.8413702Z .b8 37 // DW_AT_producer 2026-02-21T09:43:16.8413795Z .b8 8 // DW_FORM_string 2026-02-21T09:43:16.8413883Z .b8 19 // DW_AT_language 2026-02-21T09:43:16.8413975Z .b8 5 // DW_FORM_data2 2026-02-21T09:43:16.8414075Z .b8 3 // DW_AT_name 2026-02-21T09:43:16.8414170Z .b8 8 // DW_FORM_string 2026-02-21T09:43:16.8414274Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:43:16.8414408Z .b8 6 // DW_FORM_data4 2026-02-21T09:43:16.8414516Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:43:16.8414601Z .b8 8 // DW_FORM_string 2026-02-21T09:43:16.8414726Z .b8 0 // EOM(1) 2026-02-21T09:43:16.8414817Z .b8 0 // EOM(2) 2026-02-21T09:43:16.8414896Z .b8 0 // EOM(3) 2026-02-21T09:43:16.8414951Z } 2026-02-21T09:43:16.8415026Z .section .debug_info 2026-02-21T09:43:16.8415079Z { 2026-02-21T09:43:16.8415172Z .b32 104 // Length of Unit 2026-02-21T09:43:16.8415268Z .b8 2 // DWARF version number 2026-02-21T09:43:16.8415333Z .b8 0 2026-02-21T09:43:16.8415471Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:43:16.8415569Z .b8 8 // Address Size (in bytes) 2026-02-21T09:43:16.8415694Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:43:16.8415786Z .b8 116 // DW_AT_producer 2026-02-21T09:43:16.8415890Z .b8 114 2026-02-21T09:43:16.8415949Z .b8 105 2026-02-21T09:43:16.8416013Z .b8 116 2026-02-21T09:43:16.8416069Z .b8 111 2026-02-21T09:43:16.8416123Z .b8 110 2026-02-21T09:43:16.8416187Z .b8 0 2026-02-21T09:43:16.8416269Z .b8 2 // DW_AT_language 2026-02-21T09:43:16.8416324Z .b8 0 2026-02-21T09:43:16.8416408Z .b8 99 // DW_AT_name 2026-02-21T09:43:16.8416472Z .b8 103 2026-02-21T09:43:16.8416526Z .b8 101 2026-02-21T09:43:16.8416580Z .b8 121 2026-02-21T09:43:16.8416643Z .b8 52 2026-02-21T09:43:16.8416698Z .b8 111 2026-02-21T09:43:16.8416751Z .b8 113 2026-02-21T09:43:16.8416806Z .b8 117 2026-02-21T09:43:16.8416868Z .b8 55 2026-02-21T09:43:16.8416923Z .b8 122 2026-02-21T09:43:16.8416977Z .b8 110 2026-02-21T09:43:16.8417035Z .b8 99 2026-02-21T09:43:16.8417088Z .b8 106 2026-02-21T09:43:16.8417142Z .b8 119 2026-02-21T09:43:16.8417198Z .b8 104 2026-02-21T09:43:16.8417300Z .b8 122 2026-02-21T09:43:16.8417358Z .b8 119 2026-02-21T09:43:16.8417416Z .b8 113 2026-02-21T09:43:16.8417473Z .b8 99 2026-02-21T09:43:16.8417534Z .b8 98 2026-02-21T09:43:16.8417590Z .b8 107 2026-02-21T09:43:16.8417643Z .b8 98 2026-02-21T09:43:16.8417702Z .b8 97 2026-02-21T09:43:16.8417757Z .b8 122 2026-02-21T09:43:16.8417813Z .b8 121 2026-02-21T09:43:16.8417868Z .b8 107 2026-02-21T09:43:16.8417931Z .b8 108 2026-02-21T09:43:16.8417986Z .b8 97 2026-02-21T09:43:16.8418041Z .b8 114 2026-02-21T09:43:16.8418102Z .b8 115 2026-02-21T09:43:16.8418156Z .b8 107 2026-02-21T09:43:16.8418211Z .b8 103 2026-02-21T09:43:16.8418305Z .b8 109 2026-02-21T09:43:16.8418367Z .b8 99 2026-02-21T09:43:16.8418420Z .b8 53 2026-02-21T09:43:16.8418474Z .b8 98 2026-02-21T09:43:16.8418537Z .b8 98 2026-02-21T09:43:16.8418590Z .b8 111 2026-02-21T09:43:16.8418645Z .b8 111 2026-02-21T09:43:16.8418700Z .b8 55 
2026-02-21T09:43:16.8418761Z .b8 122 2026-02-21T09:43:16.8418816Z .b8 50 2026-02-21T09:43:16.8418875Z .b8 119 2026-02-21T09:43:16.8418930Z .b8 98 2026-02-21T09:43:16.8418990Z .b8 99 2026-02-21T09:43:16.8419046Z .b8 100 2026-02-21T09:43:16.8419101Z .b8 105 2026-02-21T09:43:16.8419163Z .b8 110 2026-02-21T09:43:16.8419217Z .b8 98 2026-02-21T09:43:16.8419271Z .b8 106 2026-02-21T09:43:16.8419325Z .b8 115 2026-02-21T09:43:16.8419388Z .b8 120 2026-02-21T09:43:16.8419443Z .b8 46 2026-02-21T09:43:16.8419497Z .b8 112 2026-02-21T09:43:16.8419560Z .b8 121 2026-02-21T09:43:16.8419614Z .b8 0 2026-02-21T09:43:16.8419719Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:43:16.8419803Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:43:16.8419866Z .b8 116 2026-02-21T09:43:16.8419920Z .b8 109 2026-02-21T09:43:16.8419974Z .b8 112 2026-02-21T09:43:16.8420036Z .b8 47 2026-02-21T09:43:16.8420089Z .b8 116 2026-02-21T09:43:16.8420144Z .b8 111 2026-02-21T09:43:16.8420200Z .b8 114 2026-02-21T09:43:16.8420263Z .b8 99 2026-02-21T09:43:16.8420317Z .b8 104 2026-02-21T09:43:16.8420427Z .b8 105 2026-02-21T09:43:16.8420488Z .b8 110 2026-02-21T09:43:16.8420552Z .b8 100 2026-02-21T09:43:16.8420609Z .b8 117 2026-02-21T09:43:16.8420664Z .b8 99 2026-02-21T09:43:16.8420728Z .b8 116 2026-02-21T09:43:16.8420794Z .b8 111 2026-02-21T09:43:16.8420855Z .b8 114 2026-02-21T09:43:16.8420913Z .b8 95 2026-02-21T09:43:16.8420982Z .b8 114 2026-02-21T09:43:16.8421041Z .b8 111 2026-02-21T09:43:16.8421099Z .b8 111 2026-02-21T09:43:16.8421166Z .b8 116 2026-02-21T09:43:16.8421227Z .b8 47 2026-02-21T09:43:16.8421289Z .b8 103 2026-02-21T09:43:16.8421352Z .b8 101 2026-02-21T09:43:16.8421422Z .b8 0 2026-02-21T09:43:16.8421480Z } 2026-02-21T09:43:16.8421558Z .section .debug_macinfo { } 2026-02-21T09:43:16.8421565Z 2026-02-21T09:43:16.8421662Z ================================================================ 2026-02-21T09:43:16.8421780Z please share the reproducer above with Triton project. 
2026-02-21T09:43:17.7375364Z 2026-02-21T09:43:17.7381354Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 48/48 16.6 configs/s 2026-02-21T09:43:19.8496908Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 476.4 2026-02-21T09:43:19.8497767Z configs/s 2026-02-21T09:43:19.9806765Z [146s] Generation 8 complete: 2026-02-21T09:43:19.9809109Z error=2 2026-02-21T09:43:19.9809298Z ok=49 2026-02-21T09:43:19.9809462Z min=0.0389 2026-02-21T09:43:19.9812389Z mid=0.0614 2026-02-21T09:43:19.9812543Z max=6.4911 2026-02-21T09:43:19.9812695Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:19.9812919Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:19.9813148Z 'l2_groupings': [32], 2026-02-21T09:43:19.9813310Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:19.9813500Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:19.9813650Z 'num_stages': 3, 2026-02-21T09:43:19.9813791Z 'num_warps': 4, 2026-02-21T09:43:19.9813928Z 'pid_type': 'flat', 2026-02-21T09:43:19.9814087Z 'range_flattens': [None, False], 2026-02-21T09:43:19.9814518Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:19.9814791Z 'range_num_stages': [0, 0], 2026-02-21T09:43:19.9814975Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:19.9815155Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:19.9829775Z [146s] Fitting surrogate: 745 points, 745 targets 2026-02-21T09:43:20.6976422Z [146s] Generation 9 starting: 38 neighbors, 2 active search path(s) 2026-02-21T09:43:24.3501999Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38/38 21.1 configs/s 2026-02-21T09:43:26.2643712Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 38/38 20.4 configs/s 2026-02-21T09:43:28.0315119Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 568.2 2026-02-21T09:43:28.0315846Z configs/s 2026-02-21T09:43:28.1516127Z [154s] Generation 9 complete: 2026-02-21T09:43:28.1520327Z error=6 2026-02-21T09:43:28.1521662Z ok=35 2026-02-21T09:43:28.1522163Z 
min=0.0388 2026-02-21T09:43:28.1522330Z mid=0.0512 2026-02-21T09:43:28.1522451Z max=2.4966 2026-02-21T09:43:28.1522607Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:28.1522836Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:28.1523038Z 'l2_groupings': [32], 2026-02-21T09:43:28.1523210Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:28.1523393Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:28.1523551Z 'num_stages': 3, 2026-02-21T09:43:28.1523688Z 'num_warps': 4, 2026-02-21T09:43:28.1523833Z 'pid_type': 'flat', 2026-02-21T09:43:28.1523989Z 'range_flattens': [None, False], 2026-02-21T09:43:28.1524178Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:28.1524364Z 'range_num_stages': [0, 0], 2026-02-21T09:43:28.1524538Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:28.1524931Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:28.1538499Z [154s] Fitting surrogate: 786 points, 786 targets 2026-02-21T09:43:28.6327667Z [154s] Generation 10 starting: 15 neighbors, 1 active search path(s) 2026-02-21T09:43:29.9624591Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 15/15 26.9 configs/s 2026-02-21T09:43:30.5578826Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 15/15 26.4 configs/s 2026-02-21T09:43:31.2331723Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1460.8 2026-02-21T09:43:31.2336679Z configs/s 2026-02-21T09:43:31.2905062Z [157s] Generation 10 complete: 2026-02-21T09:43:31.2908882Z error=5 2026-02-21T09:43:31.2910397Z ok=12 2026-02-21T09:43:31.2910567Z min=0.0378 2026-02-21T09:43:31.2910726Z mid=0.0471 2026-02-21T09:43:31.2910846Z max=0.1722 2026-02-21T09:43:31.2910993Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:31.2911218Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:31.2911425Z 'l2_groupings': [32], 2026-02-21T09:43:31.2911586Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:31.2911789Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:31.2911953Z 
'num_stages': 3, 2026-02-21T09:43:31.2912098Z 'num_warps': 4, 2026-02-21T09:43:31.2912240Z 'pid_type': 'flat', 2026-02-21T09:43:31.2912400Z 'range_flattens': [None, False], 2026-02-21T09:43:31.2912586Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:31.2912767Z 'range_num_stages': [0, 0], 2026-02-21T09:43:31.2912937Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:31.2913111Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:31.2937445Z [157s] Fitting surrogate: 803 points, 803 targets 2026-02-21T09:43:31.7265206Z [157s] Generation 11 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:43:33.4106700Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 8.8 configs/s 2026-02-21T09:43:34.2562984Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 17.5 configs/s 2026-02-21T09:43:35.1780584Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1078.0 2026-02-21T09:43:35.1781427Z configs/s 2026-02-21T09:43:35.2526180Z [161s] Generation 11 complete: 2026-02-21T09:43:35.2527452Z error=2 2026-02-21T09:43:35.2527616Z ok=16 2026-02-21T09:43:35.2527744Z min=0.0389 2026-02-21T09:43:35.2527879Z mid=0.0450 2026-02-21T09:43:35.2528004Z max=2.9943 2026-02-21T09:43:35.2528152Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:35.2528375Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:35.2528588Z 'l2_groupings': [32], 2026-02-21T09:43:35.2528752Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:35.2528941Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:35.2529118Z 'num_stages': 3, 2026-02-21T09:43:35.2529254Z 'num_warps': 4, 2026-02-21T09:43:35.2529397Z 'pid_type': 'flat', 2026-02-21T09:43:35.2529548Z 'range_flattens': [None, False], 2026-02-21T09:43:35.2529732Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:35.2529912Z 'range_num_stages': [0, 0], 2026-02-21T09:43:35.2530319Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:35.2530516Z 'range_warp_specializes': [None, None]} 
2026-02-21T09:43:35.2551106Z [161s] Fitting surrogate: 821 points, 821 targets 2026-02-21T09:43:35.6962690Z [161s] Generation 12 starting: 15 neighbors, 1 active search path(s) 2026-02-21T09:43:38.1339417Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 6.2 configs/s 2026-02-21T09:43:38.9754392Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 20.3 configs/s 2026-02-21T09:43:39.9732037Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 997.5 2026-02-21T09:43:39.9733752Z configs/s 2026-02-21T09:43:40.0499235Z [166s] Generation 12 complete: 2026-02-21T09:43:40.0499527Z error=2 2026-02-21T09:43:40.0499678Z ok=15 2026-02-21T09:43:40.0499804Z min=0.0389 2026-02-21T09:43:40.0499989Z mid=0.0411 2026-02-21T09:43:40.0500126Z max=1.8946 2026-02-21T09:43:40.0500310Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:40.0500935Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:40.0501198Z 'l2_groupings': [32], 2026-02-21T09:43:40.0501671Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:40.0501857Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:40.0502020Z 'num_stages': 3, 2026-02-21T09:43:40.0502157Z 'num_warps': 4, 2026-02-21T09:43:40.0502299Z 'pid_type': 'flat', 2026-02-21T09:43:40.0502450Z 'range_flattens': [None, False], 2026-02-21T09:43:40.0502633Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:40.0502815Z 'range_num_stages': [0, 0], 2026-02-21T09:43:40.0502980Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:40.0503164Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:40.0524603Z [166s] Fitting surrogate: 838 points, 838 targets 2026-02-21T09:43:40.5418054Z [166s] Generation 13 starting: 16 neighbors, 1 active search path(s) 2026-02-21T09:43:45.1491737Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 1.5 configs/s 2026-02-21T09:43:46.0741260Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 16/16 18.2 configs/s 2026-02-21T09:43:46.9998938Z Generation 
13: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1072.7 2026-02-21T09:43:46.9999292Z configs/s 2026-02-21T09:43:47.0744380Z [173s] Generation 13 complete: 2026-02-21T09:43:47.0744665Z error=1 2026-02-21T09:43:47.0746974Z ok=17 2026-02-21T09:43:47.0747107Z min=0.0389 2026-02-21T09:43:47.0747235Z mid=0.0430 2026-02-21T09:43:47.0747360Z max=5.0371 2026-02-21T09:43:47.0747491Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:43:47.0748022Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:43:47.0748223Z 'l2_groupings': [32], 2026-02-21T09:43:47.0748396Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:43:47.0748591Z 'loop_orders': [[1, 0]], 2026-02-21T09:43:47.0748747Z 'num_stages': 3, 2026-02-21T09:43:47.0748893Z 'num_warps': 4, 2026-02-21T09:43:47.0749062Z 'pid_type': 'flat', 2026-02-21T09:43:47.0749223Z 'range_flattens': [None, False], 2026-02-21T09:43:47.0749405Z 'range_multi_buffers': [None, True], 2026-02-21T09:43:47.0749591Z 'range_num_stages': [0, 0], 2026-02-21T09:43:47.0749758Z 'range_unroll_factors': [0, 0], 2026-02-21T09:43:47.0749937Z 'range_warp_specializes': [None, None]} 2026-02-21T09:43:47.0772066Z [173s] Fitting surrogate: 856 points, 856 targets 2026-02-21T09:43:47.3620887Z [173s] Autotuning complete in 173.6s after searching 816 configs. 
2026-02-21T09:43:47.3621225Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:43:47.3622509Z @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=3, num_warps=4, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:43:47.3623555Z 2026-02-21T09:43:47.3623809Z [173s] Code of selected kernel: /tmp/torchinductor_root/eu/ceuoawsrupehsvftxoq6b6z332e5onnpvkjuyzm4zuujw5xpzwwp.py 2026-02-21T09:43:47.3741188Z from __future__ import annotations 2026-02-21T09:43:47.3745708Z 2026-02-21T09:43:47.3747975Z import torch 2026-02-21T09:43:47.3748200Z import triton 2026-02-21T09:43:47.3753050Z import triton.language as tl 2026-02-21T09:43:47.3755062Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:43:47.3755320Z 2026-02-21T09:43:47.3759882Z _BLOCK_SIZE_1 = tl.constexpr(128) 2026-02-21T09:43:47.3764014Z _BLOCK_SIZE_0 = tl.constexpr(128) 2026-02-21T09:43:47.3765530Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:43:47.3765688Z 2026-02-21T09:43:47.3765760Z @triton.jit 2026-02-21T09:43:47.3765917Z def _helion_matmul(x, y, out): 2026-02-21T09:43:47.3766151Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:43:47.3766412Z num_pid_m = tl.cdiv(12288, _BLOCK_SIZE_1) 2026-02-21T09:43:47.3766632Z num_pid_n = tl.cdiv(1024, _BLOCK_SIZE_0) 2026-02-21T09:43:47.3766832Z inner_2d_pid = tl.program_id(0) 2026-02-21T09:43:47.3767261Z num_pid_in_group = 32 * num_pid_n 2026-02-21T09:43:47.3767472Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:43:47.3767676Z first_pid_m = group_id * 32 2026-02-21T09:43:47.3767881Z group_size_m = min(num_pid_m - first_pid_m, 32) 2026-02-21T09:43:47.3768158Z pid_0 = first_pid_m + inner_2d_pid % 
num_pid_in_group % group_size_m 2026-02-21T09:43:47.3768450Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:43:47.3768678Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:43:47.3768919Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:43:47.3769166Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:43:47.3769384Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:43:47.3769774Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:43:47.3770076Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:43:47.3770334Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:43:47.3770611Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:43:47.3770998Z for offset_2 in tl.range(0, 1024, _BLOCK_SIZE_2, disallow_acc_multi_buffer=False, flatten=False): 2026-02-21T09:43:47.3771373Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:43:47.3771605Z acc_copy = acc 2026-02-21T09:43:47.3771768Z acc_copy_0 = acc_copy 2026-02-21T09:43:47.3772069Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:43:47.3772459Z load = tl.load(x + (indices_0[:, None] * 1024 + indices_2[None, :] * 1), None, eviction_policy='evict_last') 2026-02-21T09:43:47.3772843Z load_1 = tl.load(y + (indices_2[:, None] * 1 + indices_1[None, :] * 1024), None) 2026-02-21T09:43:47.3773292Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:43:47.3773725Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:43:47.3773970Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:43:47.3774216Z tl.store(out + (indices_0[:, None] * 12288 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:43:47.3774405Z 2026-02-21T09:43:47.3774751Z def matmul(x: 
Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:43:47.3775150Z """ 2026-02-21T09:43:47.3775403Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:43:47.3775692Z Args: 2026-02-21T09:43:47.3775869Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:43:47.3776123Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:43:47.3776413Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:43:47.3776736Z after the matmul. Defaults to identity (no change). 2026-02-21T09:43:47.3776937Z Returns: 2026-02-21T09:43:47.3777095Z Tensor: Resulting matrix of shape [m, n]. 2026-02-21T09:43:47.3777272Z """ 2026-02-21T09:43:47.3777410Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:43:47.3777579Z m, k = x.size() 2026-02-21T09:43:47.3777732Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:43:47.3777910Z k2, n = y.size() 2026-02-21T09:43:47.3778102Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:43:47.3778349Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:43:47.3778547Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:43:47.3778831Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:43:47.3779104Z # src[matmul.py:62]: ) 2026-02-21T09:43:47.3779358Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:43:47.3779707Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:43:47.3779916Z _BLOCK_SIZE_1 = 128 2026-02-21T09:43:47.3780067Z _BLOCK_SIZE_0 = 128 2026-02-21T09:43:47.3780249Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:43:47.3780527Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:43:47.3780792Z # src[matmul.py:65]: for tile_k in hl.tile(k): 
2026-02-21T09:43:47.3780998Z # src[matmul.py:63-67]: ... 2026-02-21T09:43:47.3781354Z _launcher(_helion_matmul, (triton.cdiv(12288, _BLOCK_SIZE_1) * triton.cdiv(1024, _BLOCK_SIZE_0),), x, y, out, num_warps=4, num_stages=3) 2026-02-21T09:43:47.3781739Z # src[matmul.py:68]: return out 2026-02-21T09:43:47.3781907Z return out 2026-02-21T09:44:18.0373369Z WARNING:tritonbench.utils.triton_op:Completed input ID 9: 2026-02-21T09:44:18.0377692Z (M, N, K) 2026-02-21T09:44:18.0377847Z ------------------- 2026-02-21T09:44:18.0378026Z (1024, 12288, 1024) 2026-02-21T09:44:18.0378113Z 2026-02-21T09:44:18.0388110Z 88%|████████▊ | 7/8 [46:14<06:53, 413.45s/it]WARNING:tritonbench.utils.triton_op:Running input ID 11: 2026-02-21T09:44:18.0388463Z (M, N, K) 2026-02-21T09:44:18.0392281Z ------------------- 2026-02-21T09:44:18.0396080Z (2048, 12288, 2048) 2026-02-21T09:44:18.0400116Z INFO:tritonbench.utils.triton_op:Took 0.00ms to get benchmark function for aten_matmul 2026-02-21T09:45:04.8358617Z INFO:tritonbench.utils.triton_op:Took 0.01ms to get benchmark function for triton_tutorial_matmul 2026-02-21T09:45:41.2562681Z INFO:tritonbench.utils.triton_op:Took 86.09ms to get benchmark function for pt2_triton_matmul 2026-02-21T09:46:22.8173996Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:46:22.8177840Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:46:22.8182218Z 'dtype': 'torch.float16', 2026-02-21T09:46:22.8186190Z 'shape': (2048, 2048), 2026-02-21T09:46:22.8190565Z 'stride': (2048, 1)}, 2026-02-21T09:46:22.8192045Z { 'device': 'cuda:0', 2026-02-21T09:46:22.8192272Z 'dtype': 'torch.float16', 2026-02-21T09:46:22.8192470Z 'shape': (2048, 12288), 2026-02-21T09:46:22.8192644Z 'stride': (1, 2048)}, 2026-02-21T09:46:22.8192814Z None), 2026-02-21T09:46:22.8192950Z 'kwargs': {}} 2026-02-21T09:46:22.8221549Z INFO:tritonbench.utils.triton_op:Took 5.26ms to get benchmark function for helion_matmul_tritonbench 2026-02-21T09:46:22.9131983Z [0s] Autotune random seed: 2137757931 
2026-02-21T09:46:23.0425670Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:46:33.1917857Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━ 100/100 42.3 configs/s 2026-02-21T09:47:03.3909439Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 3.3 configs/s 2026-02-21T09:47:03.3918292Z [40s] Adaptive compile timeout: 30s (90% percentile=8.0s, bounds=[30.0s, 30s]) 2026-02-21T09:47:04.0366191Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━━ 182/182 186.2 configs/s 2026-02-21T09:47:04.3858751Z [41s] Initial random population of 100, 5 starting points: 2026-02-21T09:47:04.3860193Z error=19 2026-02-21T09:47:04.3860359Z ok=81 2026-02-21T09:47:04.3860484Z min=1.0967 2026-02-21T09:47:04.3860623Z mid=6.8290 2026-02-21T09:47:04.3860752Z max=679.7250 2026-02-21T09:47:04.3860902Z best={'block_sizes': [256, 16, 16], 2026-02-21T09:47:04.3861230Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:47:04.3862734Z 'l2_groupings': [4], 2026-02-21T09:47:04.3862935Z 'load_eviction_policies': ['', ''], 2026-02-21T09:47:04.3863130Z 'loop_orders': [[1, 0]], 2026-02-21T09:47:04.3863285Z 'maxnreg': 32, 2026-02-21T09:47:04.3863436Z 'num_sm_multiplier': 16, 2026-02-21T09:47:04.3863605Z 'num_stages': 4, 2026-02-21T09:47:04.3863751Z 'num_warps': 16, 2026-02-21T09:47:04.3863909Z 'pid_type': 'persistent_blocked', 2026-02-21T09:47:04.3864314Z 'range_flattens': [True, None], 2026-02-21T09:47:04.3864498Z 'range_multi_buffers': [False, None], 2026-02-21T09:47:04.3864813Z 'range_num_stages': [0, 0], 2026-02-21T09:47:04.3864991Z 'range_unroll_factors': [0, 0], 2026-02-21T09:47:04.3865168Z 'range_warp_specializes': [True, None]} 2026-02-21T09:47:04.3873566Z [41s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:47:05.7380275Z [42s] Generation 1 starting: 92 neighbors, 5 active search path(s) 2026-02-21T09:47:11.8314555Z Generation 1: 
precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 95/95 20.5 configs/s 2026-02-21T09:47:17.4361055Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 95/95 17.1 configs/s 2026-02-21T09:47:18.4130475Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━ 610/610 563.2 configs/s 2026-02-21T09:47:18.5379282Z [55s] Generation 1 complete: 2026-02-21T09:47:18.5381057Z error=16 2026-02-21T09:47:18.5381605Z ok=81 2026-02-21T09:47:18.5381753Z min=0.3277 2026-02-21T09:47:18.5381900Z mid=1.3794 2026-02-21T09:47:18.5382021Z max=12.0494 2026-02-21T09:47:18.5382170Z best={'block_sizes': [64, 128, 16], 2026-02-21T09:47:18.5382429Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:47:18.5382684Z 'l2_groupings': [32], 2026-02-21T09:47:18.5382851Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:47:18.5383045Z 'loop_orders': [[1, 0]], 2026-02-21T09:47:18.5383199Z 'num_stages': 7, 2026-02-21T09:47:18.5383341Z 'num_warps': 4, 2026-02-21T09:47:18.5383478Z 'pid_type': 'flat', 2026-02-21T09:47:18.5383727Z 'range_flattens': [None, None], 2026-02-21T09:47:18.5383915Z 'range_multi_buffers': [None, False], 2026-02-21T09:47:18.5384100Z 'range_num_stages': [0, 0], 2026-02-21T09:47:18.5384272Z 'range_unroll_factors': [0, 0], 2026-02-21T09:47:18.5384448Z 'range_warp_specializes': [None, None]} 2026-02-21T09:47:18.5397402Z [55s] Fitting surrogate: 197 points, 197 targets 2026-02-21T09:47:19.7954311Z [56s] Generation 2 starting: 88 neighbors, 5 active search path(s) 2026-02-21T09:47:26.7239550Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91/91 24.7 configs/s 2026-02-21T09:47:26.8838743Z [63s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:47:26.8840033Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:26.8841382Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:26.8841569Z 2026-02-21T09:47:26.8841929Z `ptxas` stderr: 2026-02-21T09:47:26.8842375Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 313 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:26.8842866Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:26.8843024Z 2026-02-21T09:47:26.8843437Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp0l1e5xdy.ptx -o /tmp/tmp0l1e5xdy.ptx.o 2026-02-21T09:47:26.8843888Z 2026-02-21T09:47:26.8844025Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:26.8844216Z 2026-02-21T09:47:26.8844360Z ================================================================ 2026-02-21T09:47:26.8848115Z Internal Triton PTX codegen error 2026-02-21T09:47:26.8851998Z `ptxas` stderr: 2026-02-21T09:47:26.8855564Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 313 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:26.8860563Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:26.8863608Z 2026-02-21T09:47:26.8868071Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp0l1e5xdy.ptx -o /tmp/tmp0l1e5xdy.ptx.o 2026-02-21T09:47:26.8868607Z 2026-02-21T09:47:26.8868685Z 2026-02-21T09:47:26.8868753Z // 2026-02-21T09:47:26.8868940Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:26.8869122Z // 2026-02-21T09:47:26.8869225Z 2026-02-21T09:47:26.8869315Z .version 8.7 2026-02-21T09:47:26.8869462Z .target sm_100a 2026-02-21T09:47:26.8869625Z .address_size 64 2026-02-21T09:47:26.8873447Z 2026-02-21T09:47:26.8879446Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:26.8883765Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:26.8886971Z // @_helion_matmul 2026-02-21T09:47:26.8891640Z .visible .entry _helion_matmul( 2026-02-21T09:47:26.8895380Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:26.8899868Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:26.8903298Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:26.8907050Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:26.8909125Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:26.8909364Z ) 2026-02-21T09:47:26.8909496Z .reqntid 256 2026-02-21T09:47:26.8909629Z .maxnreg 32 2026-02-21T09:47:26.8909956Z { 2026-02-21T09:47:26.8910079Z .reg .pred %p<124>; 2026-02-21T09:47:26.8910235Z .reg .b16 %rs<11>; 2026-02-21T09:47:26.8910373Z .reg .b32 %r<282>; 2026-02-21T09:47:26.8910513Z .reg .b64 %rd<124>; 2026-02-21T09:47:26.8910770Z .loc 1 19 0 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:19:0 2026-02-21T09:47:26.8911075Z $L__func_begin0: 2026-02-21T09:47:26.8911322Z .loc 1 19 0 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:19:0 
2026-02-21T09:47:26.8911544Z 2026-02-21T09:47:26.8911598Z // %bb.0: 2026-02-21T09:47:26.8911760Z ld.param.b64 %rd8, [_helion_matmul_param_0]; 2026-02-21T09:47:26.8911941Z $L__tmp0: 2026-02-21T09:47:26.8912173Z .loc 1 19 0 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:19 2026-02-21T09:47:26.8912441Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:26.8912621Z ld.param.b64 %rd26, [_helion_matmul_param_1]; 2026-02-21T09:47:26.8912821Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:47:26.8913000Z ld.param.b64 %rd44, [_helion_matmul_param_2]; 2026-02-21T09:47:26.8913199Z mov.b32 %r29, global_smem; 2026-02-21T09:47:26.8913360Z // begin inline asm 2026-02-21T09:47:26.8913620Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r29], 32; 2026-02-21T09:47:26.8913862Z // end inline asm 2026-02-21T09:47:26.8914083Z ld.param.b64 %rd61, [_helion_matmul_param_3]; 2026-02-21T09:47:26.8914275Z bar.sync 0; 2026-02-21T09:47:26.8914437Z ld.shared.b32 %r272, [global_smem]; 2026-02-21T09:47:26.8914621Z bar.sync 0; 2026-02-21T09:47:26.8914814Z // begin inline asm 2026-02-21T09:47:26.8915042Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:26.8915271Z // end inline asm 2026-02-21T09:47:26.8915542Z .loc 1 21 67 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:21:67 2026-02-21T09:47:26.8915842Z mov.u32 %r54, %ctaid.x; 2026-02-21T09:47:26.8916008Z mov.u32 %r55, %ctaid.y; 2026-02-21T09:47:26.8916161Z mov.u32 %r56, %ctaid.z; 2026-02-21T09:47:26.8916325Z mov.u32 %r57, %nctaid.x; 2026-02-21T09:47:26.8916490Z mov.u32 %r58, %nctaid.y; 2026-02-21T09:47:26.8916650Z mad.lo.s32 %r59, %r56, %r58, %r55; 2026-02-21T09:47:26.8916841Z mad.lo.s32 %r60, %r59, %r57, %r54; 2026-02-21T09:47:26.8917017Z mul.lo.s32 %r61, %r60, 384; 2026-02-21T09:47:26.8917191Z cvt.s64.s32 %rd62, %r61; 2026-02-21T09:47:26.8917351Z add.s64 %rd22, %rd61, %rd62; 2026-02-21T09:47:26.8917521Z shl.b32 %r62, %r1, 2; 2026-02-21T09:47:26.8917723Z add.s32 %r30, %r29, %r62; 
2026-02-21T09:47:26.8918033Z mov.b32 %r39, 0; 2026-02-21T09:47:26.8918177Z // begin inline asm 2026-02-21T09:47:26.8918348Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:26.8918538Z // end inline asm 2026-02-21T09:47:26.8918687Z bar.warp.sync -1; 2026-02-21T09:47:26.8918848Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:47:26.8919013Z cvt.u64.u32 %rd7, %r29; 2026-02-21T09:47:26.8919174Z // begin inline asm 2026-02-21T09:47:26.8919439Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd7 + 0 ], %rd8; 2026-02-21T09:47:26.8919736Z // end inline asm 2026-02-21T09:47:26.8919879Z // begin inline asm 2026-02-21T09:47:26.8920115Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8920379Z // end inline asm 2026-02-21T09:47:26.8920521Z mov.b32 %r32, 16; 2026-02-21T09:47:26.8920712Z // begin inline asm 2026-02-21T09:47:26.8920950Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r32; 2026-02-21T09:47:26.8921231Z // end inline asm 2026-02-21T09:47:26.8921364Z mov.b32 %r33, 256; 2026-02-21T09:47:26.8921509Z // begin inline asm 2026-02-21T09:47:26.8921739Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r33; 2026-02-21T09:47:26.8922007Z // end inline asm 2026-02-21T09:47:26.8922150Z mov.b32 %r34, 2048; 2026-02-21T09:47:26.8922296Z // begin inline asm 2026-02-21T09:47:26.8922549Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r34; 2026-02-21T09:47:26.8922863Z // end inline asm 2026-02-21T09:47:26.8923010Z // begin inline asm 2026-02-21T09:47:26.8923254Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r34; 2026-02-21T09:47:26.8923520Z // end inline asm 2026-02-21T09:47:26.8923657Z mov.b64 %rd15, 4096; 2026-02-21T09:47:26.8923797Z // begin inline asm 2026-02-21T09:47:26.8924041Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd7 + 0 ], 0x0, %rd15; 
2026-02-21T09:47:26.8924313Z // end inline asm 2026-02-21T09:47:26.8924447Z mov.b32 %r36, 1; 2026-02-21T09:47:26.8924573Z // begin inline asm 2026-02-21T09:47:26.8924854Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r36; 2026-02-21T09:47:26.8925135Z // end inline asm 2026-02-21T09:47:26.8925271Z // begin inline asm 2026-02-21T09:47:26.8925522Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r36; 2026-02-21T09:47:26.8925796Z // end inline asm 2026-02-21T09:47:26.8925933Z // begin inline asm 2026-02-21T09:47:26.8926160Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x6; 2026-02-21T09:47:26.8926429Z // end inline asm 2026-02-21T09:47:26.8926559Z // begin inline asm 2026-02-21T09:47:26.8926846Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8927124Z // end inline asm 2026-02-21T09:47:26.8927251Z // begin inline asm 2026-02-21T09:47:26.8927485Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8927733Z // end inline asm 2026-02-21T09:47:26.8927868Z // begin inline asm 2026-02-21T09:47:26.8928084Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8928333Z // end inline asm 2026-02-21T09:47:26.8928459Z // begin inline asm 2026-02-21T09:47:26.8928801Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd22 + 0 ], [ %rd7 + 0 ], 0x80; 2026-02-21T09:47:26.8929167Z // end inline asm 2026-02-21T09:47:26.8929293Z // begin inline asm 2026-02-21T09:47:26.8929499Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd22 + 0 ], 0x80; 2026-02-21T09:47:26.8929737Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:26.8929928Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:26.8930100Z // end inline asm 2026-02-21T09:47:26.8930233Z bar.sync 0; 2026-02-21T09:47:26.8930410Z 
cvta.global.u64 %rd79, %rd22; 2026-02-21T09:47:26.8930676Z .loc 1 22 68 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:22:68 2026-02-21T09:47:26.8930969Z add.s32 %r63, %r61, 128; 2026-02-21T09:47:26.8931116Z cvt.s64.s32 %rd63, %r63; 2026-02-21T09:47:26.8931274Z add.s64 %rd40, %rd61, %rd63; 2026-02-21T09:47:26.8931423Z bar.sync 0; 2026-02-21T09:47:26.8931558Z // begin inline asm 2026-02-21T09:47:26.8931702Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:26.8931874Z // end inline asm 2026-02-21T09:47:26.8932012Z bar.warp.sync -1; 2026-02-21T09:47:26.8932155Z // begin inline asm 2026-02-21T09:47:26.8932396Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd7 + 0 ], %rd26; 2026-02-21T09:47:26.8932659Z // end inline asm 2026-02-21T09:47:26.8932800Z // begin inline asm 2026-02-21T09:47:26.8933045Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8933301Z // end inline asm 2026-02-21T09:47:26.8933434Z // begin inline asm 2026-02-21T09:47:26.8933666Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r32; 2026-02-21T09:47:26.8933928Z // end inline asm 2026-02-21T09:47:26.8934058Z // begin inline asm 2026-02-21T09:47:26.8934310Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r32; 2026-02-21T09:47:26.8934561Z // end inline asm 2026-02-21T09:47:26.8934731Z // begin inline asm 2026-02-21T09:47:26.8934965Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r34; 2026-02-21T09:47:26.8935260Z // end inline asm 2026-02-21T09:47:26.8935397Z mov.b32 %r43, 12288; 2026-02-21T09:47:26.8935535Z // begin inline asm 2026-02-21T09:47:26.8935767Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r43; 2026-02-21T09:47:26.8936024Z // end inline asm 2026-02-21T09:47:26.8936163Z // begin inline asm 2026-02-21T09:47:26.8936401Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd7 + 0 ], 0x0, %rd15; 2026-02-21T09:47:26.8936676Z // end inline asm 2026-02-21T09:47:26.8936804Z // begin inline asm 2026-02-21T09:47:26.8937058Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r36; 2026-02-21T09:47:26.8937336Z // end inline asm 2026-02-21T09:47:26.8937466Z // begin inline asm 2026-02-21T09:47:26.8937708Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r36; 2026-02-21T09:47:26.8937977Z // end inline asm 2026-02-21T09:47:26.8938115Z // begin inline asm 2026-02-21T09:47:26.8938334Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x6; 2026-02-21T09:47:26.8938593Z // end inline asm 2026-02-21T09:47:26.8938727Z // begin inline asm 2026-02-21T09:47:26.8938990Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8939266Z // end inline asm 2026-02-21T09:47:26.8939393Z // begin inline asm 2026-02-21T09:47:26.8939619Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8939865Z // end inline asm 2026-02-21T09:47:26.8939998Z // begin inline asm 2026-02-21T09:47:26.8940216Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8940457Z // end inline asm 2026-02-21T09:47:26.8940592Z // begin inline asm 2026-02-21T09:47:26.8940923Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd40 + 0 ], [ %rd7 + 0 ], 0x80; 2026-02-21T09:47:26.8941288Z // end inline asm 2026-02-21T09:47:26.8941416Z // begin inline asm 2026-02-21T09:47:26.8941625Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd40 + 0 ], 0x80; 2026-02-21T09:47:26.8941870Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:26.8942049Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:26.8942229Z // end inline asm 2026-02-21T09:47:26.8942358Z bar.sync 0; 2026-02-21T09:47:26.8942498Z cvta.global.u64 %rd80, 
%rd40; 2026-02-21T09:47:26.8942793Z .loc 1 24 73 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:24:73 2026-02-21T09:47:26.8943080Z add.s32 %r64, %r61, 256; 2026-02-21T09:47:26.8943227Z cvt.s64.s32 %rd64, %r64; 2026-02-21T09:47:26.8943384Z add.s64 %rd58, %rd61, %rd64; 2026-02-21T09:47:26.8943532Z bar.sync 0; 2026-02-21T09:47:26.8943662Z // begin inline asm 2026-02-21T09:47:26.8943813Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:26.8943978Z // end inline asm 2026-02-21T09:47:26.8944122Z bar.warp.sync -1; 2026-02-21T09:47:26.8944259Z // begin inline asm 2026-02-21T09:47:26.8944506Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd7 + 0 ], %rd44; 2026-02-21T09:47:26.8944797Z // end inline asm 2026-02-21T09:47:26.8944933Z // begin inline asm 2026-02-21T09:47:26.8945171Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8945415Z // end inline asm 2026-02-21T09:47:26.8945550Z // begin inline asm 2026-02-21T09:47:26.8945772Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r32; 2026-02-21T09:47:26.8946030Z // end inline asm 2026-02-21T09:47:26.8946159Z // begin inline asm 2026-02-21T09:47:26.8946382Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r33; 2026-02-21T09:47:26.8946626Z // end inline asm 2026-02-21T09:47:26.8946763Z // begin inline asm 2026-02-21T09:47:26.8946994Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r43; 2026-02-21T09:47:26.8947289Z // end inline asm 2026-02-21T09:47:26.8947424Z // begin inline asm 2026-02-21T09:47:26.8947650Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r34; 2026-02-21T09:47:26.8947911Z // end inline asm 2026-02-21T09:47:26.8948042Z mov.b64 %rd51, 24576; 2026-02-21T09:47:26.8948191Z // begin inline asm 2026-02-21T09:47:26.8948427Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd7 + 0 ], 0x0, %rd51; 
2026-02-21T09:47:26.8948700Z // end inline asm 2026-02-21T09:47:26.8948835Z // begin inline asm 2026-02-21T09:47:26.8949071Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0, %r36; 2026-02-21T09:47:26.8949344Z // end inline asm 2026-02-21T09:47:26.8949472Z // begin inline asm 2026-02-21T09:47:26.8949710Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1, %r36; 2026-02-21T09:47:26.8949973Z // end inline asm 2026-02-21T09:47:26.8950103Z // begin inline asm 2026-02-21T09:47:26.8950329Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x6; 2026-02-21T09:47:26.8950578Z // end inline asm 2026-02-21T09:47:26.8950709Z // begin inline asm 2026-02-21T09:47:26.8950942Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8951260Z // end inline asm 2026-02-21T09:47:26.8951390Z // begin inline asm 2026-02-21T09:47:26.8951617Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x1; 2026-02-21T09:47:26.8951873Z // end inline asm 2026-02-21T09:47:26.8952000Z // begin inline asm 2026-02-21T09:47:26.8952217Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd7 + 0 ], 0x0; 2026-02-21T09:47:26.8952457Z // end inline asm 2026-02-21T09:47:26.8952593Z // begin inline asm 2026-02-21T09:47:26.8952916Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd58 + 0 ], [ %rd7 + 0 ], 0x80; 2026-02-21T09:47:26.8953282Z // end inline asm 2026-02-21T09:47:26.8953410Z // begin inline asm 2026-02-21T09:47:26.8953619Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd58 + 0 ], 0x80; 2026-02-21T09:47:26.8953865Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:26.8954044Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:26.8954220Z // end inline asm 2026-02-21T09:47:26.8954351Z bar.sync 0; 2026-02-21T09:47:26.8954495Z cvta.global.u64 %rd91, %rd58; 
2026-02-21T09:47:26.8954831Z .loc 1 31 35 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:31:35 2026-02-21T09:47:26.8955137Z mul.lo.s32 %r273, %r54, 3; 2026-02-21T09:47:26.8955406Z .loc 1 32 37 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:32:37 2026-02-21T09:47:26.8955685Z add.s32 %r65, %r273, 3; 2026-02-21T09:47:26.8955944Z .loc 1 32 49 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:32:49 2026-02-21T09:47:26.8956218Z min.s32 %r4, %r65, 6144; 2026-02-21T09:47:26.8956476Z .loc 1 33 84 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:33:84 2026-02-21T09:47:26.8956760Z setp.ge.s32 %p57, %r273, %r4; 2026-02-21T09:47:26.8956929Z @%p57 bra $L__BB0_9; 2026-02-21T09:47:26.8957095Z // %bb.1: // %.lr.ph 2026-02-21T09:47:26.8957409Z .loc 1 0 84 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:0:84 2026-02-21T09:47:26.8957691Z shr.u32 %r5, %r1, 5; 2026-02-21T09:47:26.8957839Z bfe.u32 %r67, %r29, 4, 14; 2026-02-21T09:47:26.8957998Z cvt.u64.u32 %rd65, %r67; 2026-02-21T09:47:26.8958163Z or.b64 %rd74, %rd65, -4611685949674356736; 2026-02-21T09:47:26.8958348Z add.s32 %r68, %r29, 40960; 2026-02-21T09:47:26.8958498Z bfe.u32 %r69, %r68, 4, 14; 2026-02-21T09:47:26.8958655Z cvt.u64.u32 %rd66, %r69; 2026-02-21T09:47:26.8958814Z or.b64 %rd75, %rd66, -4611685949705814016; 2026-02-21T09:47:26.8958995Z add.s32 %r70, %r29, 4096; 2026-02-21T09:47:26.8959191Z bfe.u32 %r71, %r70, 4, 14; 2026-02-21T09:47:26.8959346Z cvt.u64.u32 %rd67, %r71; 2026-02-21T09:47:26.8959513Z or.b64 %rd76, %rd67, -4611685949674356736; 2026-02-21T09:47:26.8959691Z shl.b32 %r72, %r1, 5; 2026-02-21T09:47:26.8959848Z and.b32 %r73, %r72, 8032; 2026-02-21T09:47:26.8960001Z bfe.s32 %r74, %r1, 2, 1; 2026-02-21T09:47:26.8960158Z and.b32 %r75, %r74, 144; 2026-02-21T09:47:26.8960305Z or.b32 %r76, %r75, %r73; 2026-02-21T09:47:26.8960460Z add.s32 %r77, %r29, 32768; 2026-02-21T09:47:26.8960619Z add.s32 %r6, %r77, %r76; 2026-02-21T09:47:26.8960768Z xor.b32 %r78, %r76, 16; 
2026-02-21T09:47:26.8960923Z add.s32 %r7, %r77, %r78; 2026-02-21T09:47:26.8961066Z bra.uni $L__BB0_2; 2026-02-21T09:47:26.8961259Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:26.8961589Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8961880Z // begin inline asm 2026-02-21T09:47:26.8962017Z 2026-02-21T09:47:26.8962139Z { 2026-02-21T09:47:26.8962264Z .reg .pred complete; 2026-02-21T09:47:26.8962418Z waitLoop: 2026-02-21T09:47:26.8962619Z mbarrier.try_wait.parity.shared.b64 complete, [%r218], %r219; 2026-02-21T09:47:26.8962864Z @!complete bra.uni waitLoop; 2026-02-21T09:47:26.8963027Z } 2026-02-21T09:47:26.8963094Z 2026-02-21T09:47:26.8963178Z // end inline asm 2026-02-21T09:47:26.8963448Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.8963740Z bar.sync 0; 2026-02-21T09:47:26.8963885Z // begin inline asm 2026-02-21T09:47:26.8964061Z @%p4 mbarrier.inval.shared::cta.b64 [%r98]; 2026-02-21T09:47:26.8964266Z // end inline asm 2026-02-21T09:47:26.8964418Z bar.sync 0; 2026-02-21T09:47:26.8964560Z // begin inline asm 2026-02-21T09:47:26.8964781Z @%p4 mbarrier.inval.shared::cta.b64 [%r99]; 2026-02-21T09:47:26.8964969Z // end inline asm 2026-02-21T09:47:26.8965107Z bar.sync 0; 2026-02-21T09:47:26.8965237Z // begin inline asm 2026-02-21T09:47:26.8965405Z @%p4 mbarrier.inval.shared::cta.b64 [%r100]; 2026-02-21T09:47:26.8965588Z // end inline asm 2026-02-21T09:47:26.8965727Z bar.sync 0; 2026-02-21T09:47:26.8965853Z // begin inline asm 2026-02-21T09:47:26.8966018Z @%p4 mbarrier.inval.shared::cta.b64 [%r156]; 2026-02-21T09:47:26.8966211Z // end inline asm 2026-02-21T09:47:26.8966351Z add.s32 %r224, %r29, 43040; 2026-02-21T09:47:26.8966521Z // begin inline asm 2026-02-21T09:47:26.8966733Z @%p4 mbarrier.inval.shared::cta.b64 [%r224]; 2026-02-21T09:47:26.8966922Z // end inline asm 2026-02-21T09:47:26.8967053Z bar.sync 0; 2026-02-21T09:47:26.8967187Z // begin inline 
asm 2026-02-21T09:47:26.8967344Z @%p4 mbarrier.inval.shared::cta.b64 [%r97]; 2026-02-21T09:47:26.8967531Z // end inline asm 2026-02-21T09:47:26.8967781Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8968063Z // begin inline asm 2026-02-21T09:47:26.8968441Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r226, %r227, %r228, %r229, %r230, %r231, %r232, %r233, %r234, %r235, %r236, %r237, %r238, %r239, %r240, %r241}, [%r242 + 0]; 2026-02-21T09:47:26.8968821Z // end inline asm 2026-02-21T09:47:26.8968958Z // begin inline asm 2026-02-21T09:47:26.8969106Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:26.8969273Z // end inline asm 2026-02-21T09:47:26.8969440Z cvt.u64.u32 %rd92, %r226; 2026-02-21T09:47:26.8969593Z cvt.u64.u32 %rd93, %r227; 2026-02-21T09:47:26.8969748Z shl.b64 %rd94, %rd93, 32; 2026-02-21T09:47:26.8969895Z or.b64 %rd95, %rd92, %rd94; 2026-02-21T09:47:26.8970160Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8970437Z mov.b64 {%r247, %r248}, %rd95; 2026-02-21T09:47:26.8970608Z cvt.rn.f16x2.f32 %r249, %r248, %r247; 2026-02-21T09:47:26.8970878Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8971191Z cvt.u64.u32 %rd96, %r228; 2026-02-21T09:47:26.8971343Z cvt.u64.u32 %rd97, %r229; 2026-02-21T09:47:26.8971488Z shl.b64 %rd98, %rd97, 32; 2026-02-21T09:47:26.8971641Z or.b64 %rd99, %rd96, %rd98; 2026-02-21T09:47:26.8971903Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8972198Z mov.b64 {%r250, %r251}, %rd99; 2026-02-21T09:47:26.8972362Z cvt.rn.f16x2.f32 %r252, %r251, %r250; 2026-02-21T09:47:26.8972651Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8972940Z cvt.u64.u32 %rd100, %r230; 2026-02-21T09:47:26.8973093Z cvt.u64.u32 %rd101, %r231; 2026-02-21T09:47:26.8973249Z shl.b64 %rd102, %rd101, 32; 
2026-02-21T09:47:26.8973403Z or.b64 %rd103, %rd100, %rd102; 2026-02-21T09:47:26.8973669Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8973954Z mov.b64 {%r253, %r254}, %rd103; 2026-02-21T09:47:26.8974130Z cvt.rn.f16x2.f32 %r255, %r254, %r253; 2026-02-21T09:47:26.8974402Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8974731Z cvt.u64.u32 %rd104, %r232; 2026-02-21T09:47:26.8974895Z cvt.u64.u32 %rd105, %r233; 2026-02-21T09:47:26.8975076Z shl.b64 %rd106, %rd105, 32; 2026-02-21T09:47:26.8975238Z or.b64 %rd107, %rd104, %rd106; 2026-02-21T09:47:26.8975493Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8975781Z mov.b64 {%r256, %r257}, %rd107; 2026-02-21T09:47:26.8975943Z cvt.rn.f16x2.f32 %r258, %r257, %r256; 2026-02-21T09:47:26.8976219Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8976495Z cvt.u64.u32 %rd108, %r234; 2026-02-21T09:47:26.8976645Z cvt.u64.u32 %rd109, %r235; 2026-02-21T09:47:26.8976802Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:47:26.8976953Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:47:26.8977212Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8977482Z mov.b64 {%r259, %r260}, %rd111; 2026-02-21T09:47:26.8977650Z cvt.rn.f16x2.f32 %r261, %r260, %r259; 2026-02-21T09:47:26.8977918Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8978191Z cvt.u64.u32 %rd112, %r236; 2026-02-21T09:47:26.8978378Z cvt.u64.u32 %rd113, %r237; 2026-02-21T09:47:26.8978531Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:47:26.8978694Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:47:26.8978952Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8979243Z mov.b64 {%r262, %r263}, %rd115; 2026-02-21T09:47:26.8979406Z cvt.rn.f16x2.f32 %r264, 
%r263, %r262; 2026-02-21T09:47:26.8979680Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8979961Z cvt.u64.u32 %rd116, %r238; 2026-02-21T09:47:26.8980113Z cvt.u64.u32 %rd117, %r239; 2026-02-21T09:47:26.8980269Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:47:26.8980422Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:47:26.8980709Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8980989Z mov.b64 {%r265, %r266}, %rd119; 2026-02-21T09:47:26.8981154Z cvt.rn.f16x2.f32 %r267, %r266, %r265; 2026-02-21T09:47:26.8981417Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8981698Z cvt.u64.u32 %rd120, %r240; 2026-02-21T09:47:26.8981849Z cvt.u64.u32 %rd121, %r241; 2026-02-21T09:47:26.8981996Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:47:26.8982153Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:47:26.8982405Z .loc 1 58 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:58:27 2026-02-21T09:47:26.8982709Z mov.b64 {%r268, %r269}, %rd123; 2026-02-21T09:47:26.8982868Z cvt.rn.f16x2.f32 %r270, %r269, %r268; 2026-02-21T09:47:26.8983136Z .loc 1 59 45 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:59:45 2026-02-21T09:47:26.8983426Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:26.8983594Z bar.sync 0; 2026-02-21T09:47:26.8983767Z st.shared.v4.b32 [%r6], {%r249, %r252, %r255, %r258}; 2026-02-21T09:47:26.8983998Z st.shared.v4.b32 [%r7], {%r261, %r264, %r267, %r270}; 2026-02-21T09:47:26.8984197Z // begin inline asm 2026-02-21T09:47:26.8984351Z fence.proxy.async.shared::cta; 2026-02-21T09:47:26.8984517Z // end inline asm 2026-02-21T09:47:26.8984647Z bar.sync 0; 2026-02-21T09:47:26.8984824Z elect.sync %r271|%p121, -1; 2026-02-21T09:47:26.8984985Z and.pred %p119, %p1, %p121; 2026-02-21T09:47:26.8985149Z // begin inline asm 2026-02-21T09:47:26.8985417Z @%p119 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd91, {%r243, 
%r244}], [%r77]; 2026-02-21T09:47:26.8985702Z // end inline asm 2026-02-21T09:47:26.8985849Z cp.async.bulk.commit_group; 2026-02-21T09:47:26.8986114Z .loc 1 33 84 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:33:84 2026-02-21T09:47:26.8986405Z add.s32 %r273, %r273, 1; 2026-02-21T09:47:26.8986587Z setp.ne.b32 %p122, %r273, %r4; 2026-02-21T09:47:26.8986762Z @%p122 bra $L__BB0_2; 2026-02-21T09:47:26.8986910Z bra.uni $L__BB0_9; 2026-02-21T09:47:26.8987090Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:47:26.8987327Z // Child Loop BB0_5 Depth 2 2026-02-21T09:47:26.8987627Z .loc 1 39 35 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:39:35 2026-02-21T09:47:26.8987907Z shr.s32 %r131, %r273, 31; 2026-02-21T09:47:26.8988055Z shr.u32 %r132, %r131, 27; 2026-02-21T09:47:26.8988212Z add.s32 %r133, %r273, %r132; 2026-02-21T09:47:26.8988467Z .loc 1 42 45 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:42:45 2026-02-21T09:47:26.8988746Z and.b32 %r134, %r133, 65504; 2026-02-21T09:47:26.8988904Z sub.s32 %r135, %r273, %r134; 2026-02-21T09:47:26.8989155Z .loc 1 42 64 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:42:64 2026-02-21T09:47:26.8989438Z cvt.u16.u32 %rs1, %r135; 2026-02-21T09:47:26.8989588Z cvt.s8.s32 %rs2, %r135; 2026-02-21T09:47:26.8989881Z .loc 1 43 51 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:43:51 2026-02-21T09:47:26.8990150Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:26.8990303Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:26.8990457Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:26.8990604Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:26.8990753Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:26.8990992Z .loc 1 42 64 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:42:64 2026-02-21T09:47:26.8991271Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:47:26.8991416Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:26.8991666Z .loc 1 44 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:44:27 
2026-02-21T09:47:26.8991933Z shl.b32 %r136, %r133, 1; 2026-02-21T09:47:26.8992088Z and.b32 %r137, %r136, -64; 2026-02-21T09:47:26.8992273Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:26.8992427Z mad.wide.s16 %r243, %rs10, 16, %r137; 2026-02-21T09:47:26.8992702Z .loc 1 45 27 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:45:27 2026-02-21T09:47:26.8992981Z mul.wide.s16 %r244, %rs7, 256; 2026-02-21T09:47:26.8993243Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.8993527Z shfl.sync.idx.b32 %r11, %r5, 0, 31, -1; 2026-02-21T09:47:26.8993713Z shl.b32 %r138, %r11, 21; 2026-02-21T09:47:26.8993866Z and.b32 %r139, %r138, 6291456; 2026-02-21T09:47:26.8994075Z add.s32 %r140, %r139, %r272; 2026-02-21T09:47:26.8994235Z shl.b32 %r141, %r11, 2; 2026-02-21T09:47:26.8994381Z and.b32 %r142, %r141, 16; 2026-02-21T09:47:26.8994541Z add.s32 %r242, %r140, %r142; 2026-02-21T09:47:26.8994719Z mov.pred %p58, -1; 2026-02-21T09:47:26.8994866Z mov.b32 %r274, 0; 2026-02-21T09:47:26.8995002Z // begin inline asm 2026-02-21T09:47:26.8995381Z @%p58 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r242 + 0], {%r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274, %r274}; 2026-02-21T09:47:26.8995778Z // end inline asm 2026-02-21T09:47:26.8995916Z // begin inline asm 2026-02-21T09:47:26.8996072Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:26.8996230Z // end inline asm 2026-02-21T09:47:26.8996367Z bar.sync 0; 2026-02-21T09:47:26.8996602Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.8996886Z add.s32 %r275, %r29, 43040; 2026-02-21T09:47:26.8997037Z // begin inline asm 2026-02-21T09:47:26.8997208Z @%p4 mbarrier.init.shared::cta.b64 [%r275], 1; 2026-02-21T09:47:26.8997400Z // end inline asm 2026-02-21T09:47:26.8997528Z bar.sync 0; 2026-02-21T09:47:26.8997662Z add.s32 %r97, %r29, 43048; 2026-02-21T09:47:26.8997809Z // begin inline asm 
2026-02-21T09:47:26.8998005Z @%p4 mbarrier.init.shared::cta.b64 [%r97], 1; 2026-02-21T09:47:26.8998190Z // end inline asm 2026-02-21T09:47:26.8998328Z add.s32 %r98, %r29, 43008; 2026-02-21T09:47:26.8998478Z // begin inline asm 2026-02-21T09:47:26.8998641Z @%p4 mbarrier.init.shared::cta.b64 [%r98], 1; 2026-02-21T09:47:26.8998816Z // end inline asm 2026-02-21T09:47:26.8998951Z bar.sync 0; 2026-02-21T09:47:26.8999086Z add.s32 %r99, %r29, 43016; 2026-02-21T09:47:26.8999232Z // begin inline asm 2026-02-21T09:47:26.8999390Z @%p4 mbarrier.init.shared::cta.b64 [%r99], 1; 2026-02-21T09:47:26.8999563Z // end inline asm 2026-02-21T09:47:26.8999695Z bar.sync 0; 2026-02-21T09:47:26.8999822Z add.s32 %r100, %r29, 43024; 2026-02-21T09:47:26.8999980Z // begin inline asm 2026-02-21T09:47:26.9000137Z @%p4 mbarrier.init.shared::cta.b64 [%r100], 1; 2026-02-21T09:47:26.9000323Z // end inline asm 2026-02-21T09:47:26.9000448Z bar.sync 0; 2026-02-21T09:47:26.9000579Z add.s32 %r156, %r29, 43032; 2026-02-21T09:47:26.9000731Z // begin inline asm 2026-02-21T09:47:26.9000885Z @%p4 mbarrier.init.shared::cta.b64 [%r156], 1; 2026-02-21T09:47:26.9001070Z // end inline asm 2026-02-21T09:47:26.9001231Z bar.sync 0; 2026-02-21T09:47:26.9001359Z // begin inline asm 2026-02-21T09:47:26.9001538Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r98], 8704; 2026-02-21T09:47:26.9001764Z // end inline asm 2026-02-21T09:47:26.9002011Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9002304Z // begin inline asm 2026-02-21T09:47:26.9002463Z fence.proxy.async.shared::cta; 2026-02-21T09:47:26.9002628Z // end inline asm 2026-02-21T09:47:26.9002766Z bar.sync 0; 2026-02-21T09:47:26.9002905Z elect.sync %r143|%p75, -1; 2026-02-21T09:47:26.9003076Z and.pred %p66, %p1, %p75; 2026-02-21T09:47:26.9003231Z // begin inline asm 2026-02-21T09:47:26.9003572Z @%p66 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29], [%rd79, {%r274, %r244}], [%r98]; 
2026-02-21T09:47:26.9003940Z // end inline asm 2026-02-21T09:47:26.9004227Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 2026-02-21T09:47:26.9004536Z bar.sync 0; 2026-02-21T09:47:26.9004707Z elect.sync %r144|%p76, -1; 2026-02-21T09:47:26.9004887Z and.pred %p67, %p1, %p76; 2026-02-21T09:47:26.9005042Z // begin inline asm 2026-02-21T09:47:26.9005373Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r68], [%rd80, {%r274, %r243}], [%r98]; 2026-02-21T09:47:26.9005731Z // end inline asm 2026-02-21T09:47:26.9005986Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9006321Z bar.sync 0; 2026-02-21T09:47:26.9006459Z // begin inline asm 2026-02-21T09:47:26.9006660Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r99], 8704; 2026-02-21T09:47:26.9006875Z // end inline asm 2026-02-21T09:47:26.9007129Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9007421Z bar.sync 0; 2026-02-21T09:47:26.9007564Z elect.sync %r145|%p77, -1; 2026-02-21T09:47:26.9007729Z and.pred %p69, %p1, %p77; 2026-02-21T09:47:26.9007894Z add.s32 %r112, %r29, 8192; 2026-02-21T09:47:26.9008054Z mov.b32 %r113, 16; 2026-02-21T09:47:26.9008193Z // begin inline asm 2026-02-21T09:47:26.9008528Z @%p69 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r112], [%rd79, {%r113, %r244}], [%r99]; 2026-02-21T09:47:26.9008884Z // end inline asm 2026-02-21T09:47:26.9009140Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 2026-02-21T09:47:26.9009423Z bar.sync 0; 2026-02-21T09:47:26.9009567Z elect.sync %r146|%p78, -1; 2026-02-21T09:47:26.9009737Z and.pred %p70, %p1, %p78; 2026-02-21T09:47:26.9009895Z add.s32 %r116, %r29, 41472; 2026-02-21T09:47:26.9010046Z // begin inline asm 2026-02-21T09:47:26.9010382Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r116], [%rd80, {%r113, 
%r243}], [%r99]; 2026-02-21T09:47:26.9010739Z // end inline asm 2026-02-21T09:47:26.9010978Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9011252Z bar.sync 0; 2026-02-21T09:47:26.9011374Z // begin inline asm 2026-02-21T09:47:26.9011560Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r100], 8704; 2026-02-21T09:47:26.9011771Z // end inline asm 2026-02-21T09:47:26.9012006Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9012279Z bar.sync 0; 2026-02-21T09:47:26.9012409Z elect.sync %r147|%p79, -1; 2026-02-21T09:47:26.9012567Z and.pred %p72, %p1, %p79; 2026-02-21T09:47:26.9012718Z add.s32 %r121, %r29, 16384; 2026-02-21T09:47:26.9012871Z mov.b32 %r122, 32; 2026-02-21T09:47:26.9013003Z // begin inline asm 2026-02-21T09:47:26.9013323Z @%p72 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r121], [%rd79, {%r122, %r244}], [%r100]; 2026-02-21T09:47:26.9013680Z // end inline asm 2026-02-21T09:47:26.9013941Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 2026-02-21T09:47:26.9014212Z bar.sync 0; 2026-02-21T09:47:26.9014341Z elect.sync %r148|%p80, -1; 2026-02-21T09:47:26.9014503Z and.pred %p73, %p1, %p80; 2026-02-21T09:47:26.9014653Z add.s32 %r125, %r29, 41984; 2026-02-21T09:47:26.9014838Z // begin inline asm 2026-02-21T09:47:26.9015160Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r125], [%rd80, {%r122, %r243}], [%r100]; 2026-02-21T09:47:26.9015498Z // end inline asm 2026-02-21T09:47:26.9015743Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9016017Z bar.sync 0; 2026-02-21T09:47:26.9016162Z // begin inline asm 2026-02-21T09:47:26.9016293Z 2026-02-21T09:47:26.9016409Z { 2026-02-21T09:47:26.9016530Z .reg .pred complete; 2026-02-21T09:47:26.9016724Z waitLoop: 2026-02-21T09:47:26.9016906Z mbarrier.try_wait.parity.shared.b64 complete, 
[%r98], %r274; 2026-02-21T09:47:26.9017136Z @!complete bra.uni waitLoop; 2026-02-21T09:47:26.9017290Z } 2026-02-21T09:47:26.9017355Z 2026-02-21T09:47:26.9017408Z // end inline asm 2026-02-21T09:47:26.9017647Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.9017931Z setp.ne.b32 %p81, %r11, 0; 2026-02-21T09:47:26.9018088Z @%p81 bra $L__BB0_4; 2026-02-21T09:47:26.9018271Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:26.9018524Z elect.sync %r153|%p83, -1; 2026-02-21T09:47:26.9018684Z mov.b32 %r150, 134479888; 2026-02-21T09:47:26.9018831Z mov.pred %p82, 0; 2026-02-21T09:47:26.9018973Z // begin inline asm 2026-02-21T09:47:26.9019192Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r272 + 0 ], %rd74, %rd75, %r150, %p82; 2026-02-21T09:47:26.9019448Z // end inline asm 2026-02-21T09:47:26.9019581Z // begin inline asm 2026-02-21T09:47:26.9019800Z @%p83 tcgen05.mma.cta_group::1.kind::f16 [ %r272 + 16 ], %rd76, %rd75, %r150, %p82; 2026-02-21T09:47:26.9020043Z // end inline asm 2026-02-21T09:47:26.9020182Z add.s32 %r155, %r29, 43040; 2026-02-21T09:47:26.9020342Z cvt.u64.u32 %rd78, %r155; 2026-02-21T09:47:26.9020488Z // begin inline asm 2026-02-21T09:47:26.9020699Z @%p83 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd78]; 2026-02-21T09:47:26.9020921Z // end inline asm 2026-02-21T09:47:26.9021098Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:26.9021410Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9021689Z bar.sync 0; 2026-02-21T09:47:26.9021811Z // begin inline asm 2026-02-21T09:47:26.9021999Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r156], 8704; 2026-02-21T09:47:26.9022212Z // end inline asm 2026-02-21T09:47:26.9022476Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9022752Z bar.sync 0; 2026-02-21T09:47:26.9022882Z elect.sync %r170|%p91, -1; 2026-02-21T09:47:26.9023046Z and.pred %p88, 
%p1, %p91; 2026-02-21T09:47:26.9023196Z add.s32 %r157, %r29, 24576; 2026-02-21T09:47:26.9023350Z mov.b32 %r158, 48; 2026-02-21T09:47:26.9023482Z // begin inline asm 2026-02-21T09:47:26.9023801Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r157], [%rd79, {%r158, %r244}], [%r156]; 2026-02-21T09:47:26.9024160Z // end inline asm 2026-02-21T09:47:26.9024392Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 2026-02-21T09:47:26.9024708Z bar.sync 0; 2026-02-21T09:47:26.9024840Z elect.sync %r171|%p92, -1; 2026-02-21T09:47:26.9025005Z and.pred %p89, %p1, %p92; 2026-02-21T09:47:26.9025156Z add.s32 %r161, %r29, 42496; 2026-02-21T09:47:26.9025316Z // begin inline asm 2026-02-21T09:47:26.9025650Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r161], [%rd80, {%r158, %r243}], [%r156]; 2026-02-21T09:47:26.9026021Z // end inline asm 2026-02-21T09:47:26.9026159Z mov.b32 %r279, 1; 2026-02-21T09:47:26.9026289Z mov.b32 %r278, 3; 2026-02-21T09:47:26.9026439Z mov.b32 %r276, %r274; 2026-02-21T09:47:26.9026590Z mov.b32 %r277, %r274; 2026-02-21T09:47:26.9026748Z mov.b32 %r280, %r274; 2026-02-21T09:47:26.9026884Z mov.b32 %r281, %r274; 2026-02-21T09:47:26.9027047Z bra.uni $L__BB0_5; 2026-02-21T09:47:26.9027231Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:47:26.9027539Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9027823Z setp.lt.u32 %p103, %r281, 1984; 2026-02-21T09:47:26.9028084Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.9028360Z // begin inline asm 2026-02-21T09:47:26.9028516Z 2026-02-21T09:47:26.9028635Z { 2026-02-21T09:47:26.9028752Z .reg .pred complete; 2026-02-21T09:47:26.9028900Z waitLoop: 2026-02-21T09:47:26.9029090Z mbarrier.try_wait.parity.shared.b64 complete, [%r275], %r274; 2026-02-21T09:47:26.9029321Z @!complete bra.uni waitLoop; 
2026-02-21T09:47:26.9029477Z } 2026-02-21T09:47:26.9029540Z 2026-02-21T09:47:26.9029593Z // end inline asm 2026-02-21T09:47:26.9029733Z add.s32 %r207, %r279, 1; 2026-02-21T09:47:26.9029886Z setp.gt.s32 %p106, %r207, 1; 2026-02-21T09:47:26.9030053Z selp.b32 %r279, 0, %r207, %p106; 2026-02-21T09:47:26.9030215Z selp.b32 %r208, 1, 0, %p106; 2026-02-21T09:47:26.9030406Z xor.b32 %r280, %r219, %r208; 2026-02-21T09:47:26.9030658Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9030938Z add.s32 %r209, %r278, 1; 2026-02-21T09:47:26.9031093Z setp.gt.s32 %p107, %r209, 3; 2026-02-21T09:47:26.9031251Z selp.b32 %r278, 0, %r209, %p107; 2026-02-21T09:47:26.9031414Z shl.b32 %r210, %r278, 3; 2026-02-21T09:47:26.9031557Z add.s32 %r212, %r29, %r210; 2026-02-21T09:47:26.9031712Z add.s32 %r202, %r212, 43008; 2026-02-21T09:47:26.9031854Z bar.sync 0; 2026-02-21T09:47:26.9031991Z and.pred %p100, %p4, %p103; 2026-02-21T09:47:26.9032138Z // begin inline asm 2026-02-21T09:47:26.9032331Z @%p100 mbarrier.arrive.expect_tx.shared.b64 _, [%r202], 8704; 2026-02-21T09:47:26.9032545Z // end inline asm 2026-02-21T09:47:26.9032779Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9033065Z shl.b32 %r213, %r278, 13; 2026-02-21T09:47:26.9033213Z add.s32 %r199, %r29, %r213; 2026-02-21T09:47:26.9033364Z bar.sync 0; 2026-02-21T09:47:26.9033493Z elect.sync %r214|%p108, -1; 2026-02-21T09:47:26.9033661Z and.pred %p109, %p103, %p108; 2026-02-21T09:47:26.9033820Z and.pred %p101, %p1, %p109; 2026-02-21T09:47:26.9033978Z add.s32 %r200, %r281, 64; 2026-02-21T09:47:26.9034159Z // begin inline asm 2026-02-21T09:47:26.9034481Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r199], [%rd79, {%r200, %r244}], [%r202]; 2026-02-21T09:47:26.9034866Z // end inline asm 2026-02-21T09:47:26.9035108Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 
2026-02-21T09:47:26.9035404Z shl.b32 %r215, %r278, 9; 2026-02-21T09:47:26.9035553Z add.s32 %r216, %r29, %r215; 2026-02-21T09:47:26.9035712Z add.s32 %r203, %r216, 40960; 2026-02-21T09:47:26.9035866Z bar.sync 0; 2026-02-21T09:47:26.9035995Z elect.sync %r217|%p110, -1; 2026-02-21T09:47:26.9036163Z and.pred %p111, %p103, %p110; 2026-02-21T09:47:26.9036325Z and.pred %p102, %p1, %p111; 2026-02-21T09:47:26.9036484Z // begin inline asm 2026-02-21T09:47:26.9036811Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r203], [%rd80, {%r200, %r243}], [%r202]; 2026-02-21T09:47:26.9037177Z // end inline asm 2026-02-21T09:47:26.9037419Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9037748Z setp.lt.u32 %p112, %r281, 2016; 2026-02-21T09:47:26.9037917Z add.s32 %r281, %r281, 16; 2026-02-21T09:47:26.9038063Z mov.b32 %r274, %r219; 2026-02-21T09:47:26.9038212Z mov.b32 %r275, %r218; 2026-02-21T09:47:26.9038351Z @%p112 bra $L__BB0_5; 2026-02-21T09:47:26.9038495Z bra.uni $L__BB0_8; 2026-02-21T09:47:26.9038672Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:26.9038915Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:26.9039228Z .loc 1 0 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:0:42 2026-02-21T09:47:26.9039493Z mov.b32 %r219, %r280; 2026-02-21T09:47:26.9039741Z .loc 1 50 42 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:50:42 2026-02-21T09:47:26.9040008Z add.s32 %r174, %r277, 1; 2026-02-21T09:47:26.9040208Z setp.gt.s32 %p94, %r174, 3; 2026-02-21T09:47:26.9040372Z selp.b32 %r277, 0, %r174, %p94; 2026-02-21T09:47:26.9040546Z selp.b32 %r175, 1, 0, %p94; 2026-02-21T09:47:26.9040702Z xor.b32 %r276, %r276, %r175; 2026-02-21T09:47:26.9040863Z shl.b32 %r176, %r277, 3; 2026-02-21T09:47:26.9041019Z add.s32 %r178, %r29, %r176; 2026-02-21T09:47:26.9041169Z add.s32 %r172, %r178, 43008; 2026-02-21T09:47:26.9041324Z bar.sync 0; 2026-02-21T09:47:26.9041455Z // begin inline 
asm 2026-02-21T09:47:26.9041595Z 2026-02-21T09:47:26.9041709Z { 2026-02-21T09:47:26.9041840Z .reg .pred complete; 2026-02-21T09:47:26.9042006Z waitLoop: 2026-02-21T09:47:26.9042192Z mbarrier.try_wait.parity.shared.b64 complete, [%r172], %r276; 2026-02-21T09:47:26.9042413Z @!complete bra.uni waitLoop; 2026-02-21T09:47:26.9042565Z } 2026-02-21T09:47:26.9042628Z 2026-02-21T09:47:26.9042693Z // end inline asm 2026-02-21T09:47:26.9042828Z shl.b32 %r179, %r279, 3; 2026-02-21T09:47:26.9042985Z add.s32 %r180, %r29, %r179; 2026-02-21T09:47:26.9043137Z add.s32 %r218, %r180, 43040; 2026-02-21T09:47:26.9043401Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.9043673Z @%p81 bra $L__BB0_7; 2026-02-21T09:47:26.9043872Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:47:26.9044196Z .loc 1 54 31 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:54:31 2026-02-21T09:47:26.9044491Z shl.b32 %r185, %r277, 13; 2026-02-21T09:47:26.9044662Z add.s32 %r187, %r29, %r185; 2026-02-21T09:47:26.9044965Z .loc 1 55 44 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:55:44 2026-02-21T09:47:26.9045261Z shl.b32 %r188, %r277, 9; 2026-02-21T09:47:26.9045416Z add.s32 %r189, %r29, %r188; 2026-02-21T09:47:26.9045578Z add.s32 %r190, %r189, 40960; 2026-02-21T09:47:26.9045870Z .loc 1 56 52 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:56:52 2026-02-21T09:47:26.9046171Z elect.sync %r191|%p96, -1; 2026-02-21T09:47:26.9046344Z bfe.u32 %r192, %r187, 4, 14; 2026-02-21T09:47:26.9046507Z cvt.u64.u32 %rd86, %r192; 2026-02-21T09:47:26.9046696Z or.b64 %rd81, %rd86, -4611685949674356736; 2026-02-21T09:47:26.9046883Z bfe.u32 %r193, %r190, 4, 14; 2026-02-21T09:47:26.9047047Z cvt.u64.u32 %rd87, %r193; 2026-02-21T09:47:26.9047213Z or.b64 %rd82, %rd87, -4611685949705814016; 2026-02-21T09:47:26.9047401Z mov.b32 %r182, 134479888; 2026-02-21T09:47:26.9047554Z mov.pred %p95, -1; 2026-02-21T09:47:26.9047704Z // begin inline asm 
2026-02-21T09:47:26.9047935Z @%p96 tcgen05.mma.cta_group::1.kind::f16 [ %r272 + 0 ], %rd81, %rd82, %r182, %p95; 2026-02-21T09:47:26.9048195Z // end inline asm 2026-02-21T09:47:26.9048342Z add.s32 %r194, %r187, 4096; 2026-02-21T09:47:26.9048498Z bfe.u32 %r195, %r194, 4, 14; 2026-02-21T09:47:26.9048659Z cvt.u64.u32 %rd88, %r195; 2026-02-21T09:47:26.9048825Z or.b64 %rd83, %rd88, -4611685949674356736; 2026-02-21T09:47:26.9049013Z // begin inline asm 2026-02-21T09:47:26.9049236Z @%p96 tcgen05.mma.cta_group::1.kind::f16 [ %r272 + 16 ], %rd83, %rd82, %r182, %p95; 2026-02-21T09:47:26.9049533Z // end inline asm 2026-02-21T09:47:26.9049678Z cvt.u64.u32 %rd85, %r218; 2026-02-21T09:47:26.9049828Z // begin inline asm 2026-02-21T09:47:26.9050046Z @%p96 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd85]; 2026-02-21T09:47:26.9050274Z // end inline asm 2026-02-21T09:47:26.9050415Z bra.uni $L__BB0_7; 2026-02-21T09:47:26.9050575Z $L__BB0_9: // %._crit_edge 2026-02-21T09:47:26.9050883Z .loc 1 33 84 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:33:84 2026-02-21T09:47:26.9051181Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:26.9051359Z bar.sync 0; 2026-02-21T09:47:26.9051607Z .loc 1 33 4 // chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py:33:4 2026-02-21T09:47:26.9051887Z bar.sync 0; 2026-02-21T09:47:26.9052049Z // begin inline asm 2026-02-21T09:47:26.9052248Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r272, 32; 2026-02-21T09:47:26.9052479Z // end inline asm 2026-02-21T09:47:26.9052605Z ret; 2026-02-21T09:47:26.9052727Z $L__tmp1: 2026-02-21T09:47:26.9052847Z $L__func_end0: 2026-02-21T09:47:26.9053007Z // -- End function 2026-02-21T09:47:26.9053188Z } 2026-02-21T09:47:26.9053442Z .file 1 "/tmp/torchinductor_root/hw/chw2wyli7nm3agoz6cexd4qgp5fbjok5qfahubh25c674rwwe6p7.py" 2026-02-21T09:47:26.9053761Z .section .debug_abbrev 2026-02-21T09:47:26.9053897Z { 2026-02-21T09:47:26.9054093Z .b8 1 // Abbreviation Code 2026-02-21T09:47:26.9054308Z .b8 17 
// DW_TAG_compile_unit 2026-02-21T09:47:26.9054527Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:26.9054761Z .b8 37 // DW_AT_producer 2026-02-21T09:47:26.9054977Z .b8 8 // DW_FORM_string 2026-02-21T09:47:26.9055194Z .b8 19 // DW_AT_language 2026-02-21T09:47:26.9055400Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:26.9055604Z .b8 3 // DW_AT_name 2026-02-21T09:47:26.9055793Z .b8 8 // DW_FORM_string 2026-02-21T09:47:26.9055996Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:26.9056193Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:26.9056394Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:26.9056593Z .b8 8 // DW_FORM_string 2026-02-21T09:47:26.9056779Z .b8 0 // EOM(1) 2026-02-21T09:47:26.9056971Z .b8 0 // EOM(2) 2026-02-21T09:47:26.9057150Z .b8 0 // EOM(3) 2026-02-21T09:47:26.9057344Z } 2026-02-21T09:47:26.9057464Z .section .debug_info 2026-02-21T09:47:26.9057607Z { 2026-02-21T09:47:26.9057749Z .b32 104 // Length of Unit 2026-02-21T09:47:26.9057969Z .b8 2 // DWARF version number 2026-02-21T09:47:26.9058160Z .b8 0 2026-02-21T09:47:26.9058335Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:26.9058585Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:26.9058813Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:26.9059048Z .b8 116 // DW_AT_producer 2026-02-21T09:47:26.9059226Z .b8 114 2026-02-21T09:47:26.9059349Z .b8 105 2026-02-21T09:47:26.9059461Z .b8 116 2026-02-21T09:47:26.9059578Z .b8 111 2026-02-21T09:47:26.9059685Z .b8 110 2026-02-21T09:47:26.9059799Z .b8 0 2026-02-21T09:47:26.9059938Z .b8 2 // DW_AT_language 2026-02-21T09:47:26.9060106Z .b8 0 2026-02-21T09:47:26.9060247Z .b8 99 // DW_AT_name 2026-02-21T09:47:26.9060445Z .b8 104 2026-02-21T09:47:26.9060559Z .b8 119 2026-02-21T09:47:26.9060665Z .b8 50 2026-02-21T09:47:26.9060781Z .b8 119 2026-02-21T09:47:26.9060887Z .b8 121 2026-02-21T09:47:26.9061000Z .b8 108 2026-02-21T09:47:26.9061108Z .b8 105 2026-02-21T09:47:26.9061222Z .b8 55 2026-02-21T09:47:26.9061331Z .b8 110 2026-02-21T09:47:26.9061443Z .b8 109 2026-02-21T09:47:26.9061557Z .b8 51 2026-02-21T09:47:26.9061664Z .b8 97 2026-02-21T09:47:26.9061777Z .b8 103 2026-02-21T09:47:26.9061884Z .b8 111 2026-02-21T09:47:26.9061998Z .b8 122 2026-02-21T09:47:26.9062107Z .b8 54 2026-02-21T09:47:26.9062221Z .b8 99 2026-02-21T09:47:26.9062331Z .b8 101 2026-02-21T09:47:26.9062451Z .b8 120 2026-02-21T09:47:26.9062560Z .b8 100 2026-02-21T09:47:26.9062680Z .b8 52 2026-02-21T09:47:26.9062787Z .b8 113 2026-02-21T09:47:26.9062904Z .b8 103 2026-02-21T09:47:26.9063010Z .b8 112 2026-02-21T09:47:26.9063123Z .b8 53 2026-02-21T09:47:26.9063265Z .b8 102 2026-02-21T09:47:26.9063375Z .b8 98 2026-02-21T09:47:26.9063489Z .b8 106 2026-02-21T09:47:26.9063599Z .b8 111 2026-02-21T09:47:26.9063712Z .b8 107 2026-02-21T09:47:26.9063819Z .b8 53 2026-02-21T09:47:26.9063933Z .b8 113 2026-02-21T09:47:26.9064040Z .b8 102 2026-02-21T09:47:26.9064153Z .b8 97 2026-02-21T09:47:26.9064259Z .b8 104 2026-02-21T09:47:26.9064372Z .b8 117 2026-02-21T09:47:26.9064481Z .b8 98 2026-02-21T09:47:26.9064595Z .b8 104 2026-02-21T09:47:26.9064734Z .b8 50 
2026-02-21T09:47:26.9064851Z .b8 53 2026-02-21T09:47:26.9064957Z .b8 99 2026-02-21T09:47:26.9065072Z .b8 54 2026-02-21T09:47:26.9065185Z .b8 55 2026-02-21T09:47:26.9065329Z .b8 52 2026-02-21T09:47:26.9065447Z .b8 114 2026-02-21T09:47:26.9065555Z .b8 119 2026-02-21T09:47:26.9065667Z .b8 119 2026-02-21T09:47:26.9065774Z .b8 101 2026-02-21T09:47:26.9065889Z .b8 54 2026-02-21T09:47:26.9065995Z .b8 112 2026-02-21T09:47:26.9066108Z .b8 55 2026-02-21T09:47:26.9066214Z .b8 46 2026-02-21T09:47:26.9066329Z .b8 112 2026-02-21T09:47:26.9066436Z .b8 121 2026-02-21T09:47:26.9066551Z .b8 0 2026-02-21T09:47:26.9066704Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:26.9066929Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:26.9067111Z .b8 116 2026-02-21T09:47:26.9067219Z .b8 109 2026-02-21T09:47:26.9067335Z .b8 112 2026-02-21T09:47:26.9067444Z .b8 47 2026-02-21T09:47:26.9067558Z .b8 116 2026-02-21T09:47:26.9067667Z .b8 111 2026-02-21T09:47:26.9067782Z .b8 114 2026-02-21T09:47:26.9067888Z .b8 99 2026-02-21T09:47:26.9068004Z .b8 104 2026-02-21T09:47:26.9068110Z .b8 105 2026-02-21T09:47:26.9068226Z .b8 110 2026-02-21T09:47:26.9068336Z .b8 100 2026-02-21T09:47:26.9068456Z .b8 117 2026-02-21T09:47:26.9068577Z .b8 99 2026-02-21T09:47:26.9068684Z .b8 116 2026-02-21T09:47:26.9068799Z .b8 111 2026-02-21T09:47:26.9068907Z .b8 114 2026-02-21T09:47:26.9069020Z .b8 95 2026-02-21T09:47:26.9069127Z .b8 114 2026-02-21T09:47:26.9069241Z .b8 111 2026-02-21T09:47:26.9069348Z .b8 111 2026-02-21T09:47:26.9069492Z .b8 116 2026-02-21T09:47:26.9069602Z .b8 47 2026-02-21T09:47:26.9069716Z .b8 104 2026-02-21T09:47:26.9069824Z .b8 119 2026-02-21T09:47:26.9069938Z .b8 0 2026-02-21T09:47:26.9070044Z } 2026-02-21T09:47:26.9070174Z .section .debug_macinfo { } 2026-02-21T09:47:26.9070277Z 2026-02-21T09:47:26.9070359Z ================================================================ 2026-02-21T09:47:26.9070584Z please share the reproducer above with Triton project. 
2026-02-21T09:47:27.3036076Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:27.3036345Z 2026-02-21T09:47:27.3041468Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:27.3042877Z 2026-02-21T09:47:27.3043021Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:27.3043212Z 2026-02-21T09:47:27.3048238Z `ptxas` stderr: 2026-02-21T09:47:27.3048458Z ================================================================ 2026-02-21T09:47:27.3052963Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 351 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:27.3053450Z Internal Triton PTX codegen error 2026-02-21T09:47:27.3055238Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.3055497Z `ptxas` stderr: 2026-02-21T09:47:27.3057732Z 2026-02-21T09:47:27.3058282Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplb6gwvt1.ptx -o /tmp/tmplb6gwvt1.ptx.o 2026-02-21T09:47:27.3058955Z 2026-02-21T09:47:27.3059113Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:27.3059662Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 351 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:27.3060152Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.3060300Z 2026-02-21T09:47:27.3060681Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplb6gwvt1.ptx -o /tmp/tmplb6gwvt1.ptx.o 2026-02-21T09:47:27.3061171Z 2026-02-21T09:47:27.3061174Z 2026-02-21T09:47:27.3061230Z // 2026-02-21T09:47:27.3061367Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:27.3061546Z // 2026-02-21T09:47:27.3061614Z 2026-02-21T09:47:27.3061681Z .version 8.7 2026-02-21T09:47:27.3061819Z .target sm_100a 2026-02-21T09:47:27.3061973Z .address_size 64 2026-02-21T09:47:27.3062061Z 2026-02-21T09:47:27.3062187Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:27.3062455Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:27.3062660Z // @_helion_matmul 2026-02-21T09:47:27.3062858Z .visible .entry _helion_matmul( 2026-02-21T09:47:27.3063074Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:27.3063324Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:27.3063578Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:27.3063819Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:27.3064065Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:27.3064258Z ) 2026-02-21T09:47:27.3064378Z .reqntid 128 2026-02-21T09:47:27.3064500Z .maxnreg 32 2026-02-21T09:47:27.3064625Z { 2026-02-21T09:47:27.3064952Z .reg .pred %p<149>; 2026-02-21T09:47:27.3065152Z .reg .b16 %rs<10>; 2026-02-21T09:47:27.3065300Z .reg .b32 %r<397>; 2026-02-21T09:47:27.3065435Z .reg .b64 %rd<207>; 2026-02-21T09:47:27.3065705Z .loc 1 19 0 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:19:0 2026-02-21T09:47:27.3065985Z $L__func_begin0: 2026-02-21T09:47:27.3066231Z .loc 1 19 0 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:19:0 
2026-02-21T09:47:27.3066454Z 2026-02-21T09:47:27.3066507Z // %bb.0: 2026-02-21T09:47:27.3066669Z ld.param.b64 %rd17, [_helion_matmul_param_0]; 2026-02-21T09:47:27.3066865Z $L__tmp0: 2026-02-21T09:47:27.3067251Z .loc 1 19 0 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:19 2026-02-21T09:47:27.3067537Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:27.3067703Z ld.param.b64 %rd35, [_helion_matmul_param_1]; 2026-02-21T09:47:27.3067905Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:47:27.3068077Z ld.param.b64 %rd53, [_helion_matmul_param_2]; 2026-02-21T09:47:27.3068275Z mov.b32 %r29, global_smem; 2026-02-21T09:47:27.3068427Z // begin inline asm 2026-02-21T09:47:27.3068659Z @%p1 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r29], 32; 2026-02-21T09:47:27.3068945Z // end inline asm 2026-02-21T09:47:27.3069098Z ld.param.b64 %rd70, [_helion_matmul_param_3]; 2026-02-21T09:47:27.3069282Z bar.sync 0; 2026-02-21T09:47:27.3069422Z ld.shared.b32 %r387, [global_smem]; 2026-02-21T09:47:27.3069597Z bar.sync 0; 2026-02-21T09:47:27.3069722Z // begin inline asm 2026-02-21T09:47:27.3069927Z @%p1 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:27.3070146Z // end inline asm 2026-02-21T09:47:27.3070402Z .loc 1 21 67 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:21:67 2026-02-21T09:47:27.3070699Z mov.u32 %r54, %ctaid.x; 2026-02-21T09:47:27.3070866Z mov.u32 %r55, %ctaid.y; 2026-02-21T09:47:27.3071008Z mov.u32 %r56, %ctaid.z; 2026-02-21T09:47:27.3071160Z mov.u32 %r57, %nctaid.x; 2026-02-21T09:47:27.3071308Z mov.u32 %r58, %nctaid.y; 2026-02-21T09:47:27.3071494Z mad.lo.s32 %r59, %r56, %r58, %r55; 2026-02-21T09:47:27.3071671Z mad.lo.s32 %r60, %r59, %r57, %r54; 2026-02-21T09:47:27.3071836Z mul.lo.s32 %r61, %r60, 384; 2026-02-21T09:47:27.3071998Z cvt.s64.s32 %rd71, %r61; 2026-02-21T09:47:27.3072148Z add.s64 %rd31, %rd70, %rd71; 2026-02-21T09:47:27.3072306Z shl.b32 %r62, %r1, 2; 2026-02-21T09:47:27.3072447Z add.s32 %r30, %r29, %r62; 
2026-02-21T09:47:27.3072599Z mov.b32 %r39, 0; 2026-02-21T09:47:27.3072729Z // begin inline asm 2026-02-21T09:47:27.3072883Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:27.3073055Z // end inline asm 2026-02-21T09:47:27.3073219Z bar.warp.sync -1; 2026-02-21T09:47:27.3073367Z setp.eq.b32 %p4, %r1, 0; 2026-02-21T09:47:27.3073514Z cvt.u64.u32 %rd16, %r29; 2026-02-21T09:47:27.3073661Z // begin inline asm 2026-02-21T09:47:27.3073899Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd16 + 0 ], %rd17; 2026-02-21T09:47:27.3074178Z // end inline asm 2026-02-21T09:47:27.3074309Z // begin inline asm 2026-02-21T09:47:27.3074529Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:47:27.3074825Z // end inline asm 2026-02-21T09:47:27.3074952Z mov.b32 %r32, 64; 2026-02-21T09:47:27.3075086Z // begin inline asm 2026-02-21T09:47:27.3075314Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r32; 2026-02-21T09:47:27.3075576Z // end inline asm 2026-02-21T09:47:27.3075703Z mov.b32 %r33, 256; 2026-02-21T09:47:27.3075842Z // begin inline asm 2026-02-21T09:47:27.3076062Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r33; 2026-02-21T09:47:27.3076321Z // end inline asm 2026-02-21T09:47:27.3076458Z mov.b32 %r34, 2048; 2026-02-21T09:47:27.3076593Z // begin inline asm 2026-02-21T09:47:27.3076835Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r34; 2026-02-21T09:47:27.3077108Z // end inline asm 2026-02-21T09:47:27.3077280Z // begin inline asm 2026-02-21T09:47:27.3077519Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r34; 2026-02-21T09:47:27.3077800Z // end inline asm 2026-02-21T09:47:27.3077935Z mov.b64 %rd24, 4096; 2026-02-21T09:47:27.3078085Z // begin inline asm 2026-02-21T09:47:27.3078346Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd16 + 0 ], 0x0, %rd24; 
2026-02-21T09:47:27.3078631Z // end inline asm 2026-02-21T09:47:27.3078770Z mov.b32 %r36, 1; 2026-02-21T09:47:27.3078903Z // begin inline asm 2026-02-21T09:47:27.3079164Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r36; 2026-02-21T09:47:27.3079448Z // end inline asm 2026-02-21T09:47:27.3079592Z // begin inline asm 2026-02-21T09:47:27.3079853Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r36; 2026-02-21T09:47:27.3080138Z // end inline asm 2026-02-21T09:47:27.3080289Z // begin inline asm 2026-02-21T09:47:27.3080525Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x6; 2026-02-21T09:47:27.3080830Z // end inline asm 2026-02-21T09:47:27.3080972Z // begin inline asm 2026-02-21T09:47:27.3081236Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3081527Z // end inline asm 2026-02-21T09:47:27.3081675Z // begin inline asm 2026-02-21T09:47:27.3081925Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x3; 2026-02-21T09:47:27.3082199Z // end inline asm 2026-02-21T09:47:27.3082344Z // begin inline asm 2026-02-21T09:47:27.3082579Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3082852Z // end inline asm 2026-02-21T09:47:27.3082993Z // begin inline asm 2026-02-21T09:47:27.3083354Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd31 + 0 ], [ %rd16 + 0 ], 0x80; 2026-02-21T09:47:27.3083757Z // end inline asm 2026-02-21T09:47:27.3083926Z // begin inline asm 2026-02-21T09:47:27.3084144Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd31 + 0 ], 0x80; 2026-02-21T09:47:27.3084408Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.3084599Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.3084812Z // end inline asm 2026-02-21T09:47:27.3084948Z bar.sync 0; 
2026-02-21T09:47:27.3085083Z cvta.global.u64 %rd109, %rd31; 2026-02-21T09:47:27.3085366Z .loc 1 22 68 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:22:68 2026-02-21T09:47:27.3085665Z add.s32 %r63, %r61, 128; 2026-02-21T09:47:27.3085814Z cvt.s64.s32 %rd72, %r63; 2026-02-21T09:47:27.3086002Z add.s64 %rd49, %rd70, %rd72; 2026-02-21T09:47:27.3086149Z bar.sync 0; 2026-02-21T09:47:27.3086277Z // begin inline asm 2026-02-21T09:47:27.3086417Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:27.3086587Z // end inline asm 2026-02-21T09:47:27.3086717Z bar.warp.sync -1; 2026-02-21T09:47:27.3086856Z // begin inline asm 2026-02-21T09:47:27.3087097Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd16 + 0 ], %rd35; 2026-02-21T09:47:27.3087361Z // end inline asm 2026-02-21T09:47:27.3087492Z // begin inline asm 2026-02-21T09:47:27.3087701Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:47:27.3087946Z // end inline asm 2026-02-21T09:47:27.3088073Z // begin inline asm 2026-02-21T09:47:27.3088300Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r32; 2026-02-21T09:47:27.3088556Z // end inline asm 2026-02-21T09:47:27.3088681Z mov.b32 %r41, 16; 2026-02-21T09:47:27.3088820Z // begin inline asm 2026-02-21T09:47:27.3089036Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r41; 2026-02-21T09:47:27.3089294Z // end inline asm 2026-02-21T09:47:27.3089421Z // begin inline asm 2026-02-21T09:47:27.3089681Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r34; 2026-02-21T09:47:27.3089944Z // end inline asm 2026-02-21T09:47:27.3090085Z mov.b32 %r43, 12288; 2026-02-21T09:47:27.3090234Z // begin inline asm 2026-02-21T09:47:27.3090467Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r43; 2026-02-21T09:47:27.3090740Z // end inline asm 2026-02-21T09:47:27.3090870Z // begin inline asm 
2026-02-21T09:47:27.3091119Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd16 + 0 ], 0x0, %rd24; 2026-02-21T09:47:27.3091386Z // end inline asm 2026-02-21T09:47:27.3091521Z // begin inline asm 2026-02-21T09:47:27.3091761Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r36; 2026-02-21T09:47:27.3092035Z // end inline asm 2026-02-21T09:47:27.3092170Z // begin inline asm 2026-02-21T09:47:27.3092407Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r36; 2026-02-21T09:47:27.3092679Z // end inline asm 2026-02-21T09:47:27.3092805Z // begin inline asm 2026-02-21T09:47:27.3093034Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x6; 2026-02-21T09:47:27.3093313Z // end inline asm 2026-02-21T09:47:27.3093448Z // begin inline asm 2026-02-21T09:47:27.3093689Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3093958Z // end inline asm 2026-02-21T09:47:27.3094094Z // begin inline asm 2026-02-21T09:47:27.3094317Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x3; 2026-02-21T09:47:27.3094577Z // end inline asm 2026-02-21T09:47:27.3094733Z // begin inline asm 2026-02-21T09:47:27.3094962Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3095255Z // end inline asm 2026-02-21T09:47:27.3095423Z // begin inline asm 2026-02-21T09:47:27.3095762Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd49 + 0 ], [ %rd16 + 0 ], 0x80; 2026-02-21T09:47:27.3096173Z // end inline asm 2026-02-21T09:47:27.3096309Z // begin inline asm 2026-02-21T09:47:27.3096507Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd49 + 0 ], 0x80; 2026-02-21T09:47:27.3096755Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.3096943Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.3097110Z // end 
inline asm 2026-02-21T09:47:27.3097243Z bar.sync 0; 2026-02-21T09:47:27.3097377Z cvta.global.u64 %rd110, %rd49; 2026-02-21T09:47:27.3097656Z .loc 1 24 73 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:24:73 2026-02-21T09:47:27.3097943Z add.s32 %r64, %r61, 256; 2026-02-21T09:47:27.3098134Z cvt.s64.s32 %rd73, %r64; 2026-02-21T09:47:27.3098285Z add.s64 %rd67, %rd70, %rd73; 2026-02-21T09:47:27.3098441Z bar.sync 0; 2026-02-21T09:47:27.3098566Z // begin inline asm 2026-02-21T09:47:27.3098716Z @%p1 st.shared.b32 [ %r30 + 0 ], %r39; 2026-02-21T09:47:27.3098889Z // end inline asm 2026-02-21T09:47:27.3099026Z bar.warp.sync -1; 2026-02-21T09:47:27.3099169Z // begin inline asm 2026-02-21T09:47:27.3099406Z @%p4 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd16 + 0 ], %rd53; 2026-02-21T09:47:27.3099676Z // end inline asm 2026-02-21T09:47:27.3099803Z // begin inline asm 2026-02-21T09:47:27.3100020Z @%p4 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:47:27.3100254Z // end inline asm 2026-02-21T09:47:27.3100386Z // begin inline asm 2026-02-21T09:47:27.3100614Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r41; 2026-02-21T09:47:27.3100864Z // end inline asm 2026-02-21T09:47:27.3101002Z // begin inline asm 2026-02-21T09:47:27.3101219Z @%p4 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r33; 2026-02-21T09:47:27.3101472Z // end inline asm 2026-02-21T09:47:27.3101600Z // begin inline asm 2026-02-21T09:47:27.3101834Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r43; 2026-02-21T09:47:27.3102126Z // end inline asm 2026-02-21T09:47:27.3102258Z // begin inline asm 2026-02-21T09:47:27.3102496Z @%p4 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r34; 2026-02-21T09:47:27.3102754Z // end inline asm 2026-02-21T09:47:27.3102890Z mov.b64 %rd60, 24576; 2026-02-21T09:47:27.3103029Z // begin inline asm 
2026-02-21T09:47:27.3103275Z @%p4 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd16 + 0 ], 0x0, %rd60; 2026-02-21T09:47:27.3103544Z // end inline asm 2026-02-21T09:47:27.3103676Z // begin inline asm 2026-02-21T09:47:27.3103926Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0, %r36; 2026-02-21T09:47:27.3104200Z // end inline asm 2026-02-21T09:47:27.3104337Z // begin inline asm 2026-02-21T09:47:27.3104574Z @%p4 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1, %r36; 2026-02-21T09:47:27.3104877Z // end inline asm 2026-02-21T09:47:27.3105001Z // begin inline asm 2026-02-21T09:47:27.3105232Z @%p4 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x6; 2026-02-21T09:47:27.3105509Z // end inline asm 2026-02-21T09:47:27.3105634Z // begin inline asm 2026-02-21T09:47:27.3105872Z @%p4 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3106141Z // end inline asm 2026-02-21T09:47:27.3106275Z // begin inline asm 2026-02-21T09:47:27.3106498Z @%p4 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x1; 2026-02-21T09:47:27.3106764Z // end inline asm 2026-02-21T09:47:27.3106897Z // begin inline asm 2026-02-21T09:47:27.3107109Z @%p4 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd16 + 0 ], 0x0; 2026-02-21T09:47:27.3107360Z // end inline asm 2026-02-21T09:47:27.3107486Z // begin inline asm 2026-02-21T09:47:27.3107815Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd67 + 0 ], [ %rd16 + 0 ], 0x80; 2026-02-21T09:47:27.3108174Z // end inline asm 2026-02-21T09:47:27.3108339Z // begin inline asm 2026-02-21T09:47:27.3108543Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd67 + 0 ], 0x80; 2026-02-21T09:47:27.3108780Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.3108966Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.3109133Z // end 
inline asm 2026-02-21T09:47:27.3109267Z bar.sync 0; 2026-02-21T09:47:27.3109399Z cvta.global.u64 %rd142, %rd67; 2026-02-21T09:47:27.3109671Z .loc 1 31 35 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:31:35 2026-02-21T09:47:27.3109950Z mul.lo.s32 %r388, %r54, 3; 2026-02-21T09:47:27.3110254Z .loc 1 32 37 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:32:37 2026-02-21T09:47:27.3110551Z add.s32 %r65, %r388, 3; 2026-02-21T09:47:27.3110800Z .loc 1 32 49 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:32:49 2026-02-21T09:47:27.3111087Z min.s32 %r4, %r65, 6144; 2026-02-21T09:47:27.3111340Z .loc 1 33 84 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:33:84 2026-02-21T09:47:27.3111624Z setp.ge.s32 %p57, %r388, %r4; 2026-02-21T09:47:27.3111779Z @%p57 bra $L__BB0_9; 2026-02-21T09:47:27.3111956Z // %bb.1: // %.lr.ph 2026-02-21T09:47:27.3112258Z .loc 1 0 84 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:0:84 2026-02-21T09:47:27.3112541Z shr.u32 %r5, %r1, 5; 2026-02-21T09:47:27.3112693Z bfe.u32 %r67, %r29, 4, 14; 2026-02-21T09:47:27.3112846Z cvt.u64.u32 %rd74, %r67; 2026-02-21T09:47:27.3113010Z or.b64 %rd92, %rd74, 4611686293439512576; 2026-02-21T09:47:27.3113183Z add.s32 %r68, %r29, 139264; 2026-02-21T09:47:27.3113340Z bfe.u32 %r69, %r68, 4, 14; 2026-02-21T09:47:27.3113490Z cvt.u64.u32 %rd75, %r69; 2026-02-21T09:47:27.3113650Z or.b64 %rd93, %rd75, 4611686293313683456; 2026-02-21T09:47:27.3113825Z add.s32 %r70, %r29, 32; 2026-02-21T09:47:27.3113994Z bfe.u32 %r71, %r70, 4, 14; 2026-02-21T09:47:27.3114149Z cvt.u64.u32 %rd76, %r71; 2026-02-21T09:47:27.3114297Z or.b64 %rd94, %rd76, 4611686293439512576; 2026-02-21T09:47:27.3114475Z add.s32 %r72, %r29, 139296; 2026-02-21T09:47:27.3114621Z bfe.u32 %r73, %r72, 4, 14; 2026-02-21T09:47:27.3114803Z cvt.u64.u32 %rd77, %r73; 2026-02-21T09:47:27.3114954Z or.b64 %rd95, %rd77, 4611686293313683456; 2026-02-21T09:47:27.3115124Z add.s32 %r74, %r29, 64; 2026-02-21T09:47:27.3115265Z bfe.u32 
%r75, %r74, 4, 14; 2026-02-21T09:47:27.3115416Z cvt.u64.u32 %rd78, %r75; 2026-02-21T09:47:27.3115573Z or.b64 %rd96, %rd78, 4611686293439512576; 2026-02-21T09:47:27.3115741Z add.s32 %r76, %r29, 139328; 2026-02-21T09:47:27.3115895Z bfe.u32 %r77, %r76, 4, 14; 2026-02-21T09:47:27.3116038Z cvt.u64.u32 %rd79, %r77; 2026-02-21T09:47:27.3116194Z or.b64 %rd97, %rd79, 4611686293313683456; 2026-02-21T09:47:27.3116358Z add.s32 %r78, %r29, 96; 2026-02-21T09:47:27.3116504Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T09:47:27.3116648Z cvt.u64.u32 %rd80, %r79; 2026-02-21T09:47:27.3116805Z or.b64 %rd98, %rd80, 4611686293439512576; 2026-02-21T09:47:27.3117001Z add.s32 %r80, %r29, 139360; 2026-02-21T09:47:27.3117155Z bfe.u32 %r81, %r80, 4, 14; 2026-02-21T09:47:27.3117306Z cvt.u64.u32 %rd81, %r81; 2026-02-21T09:47:27.3117455Z or.b64 %rd99, %rd81, 4611686293313683456; 2026-02-21T09:47:27.3117627Z add.s32 %r82, %r29, 16384; 2026-02-21T09:47:27.3117773Z bfe.u32 %r83, %r82, 4, 14; 2026-02-21T09:47:27.3117924Z cvt.u64.u32 %rd82, %r83; 2026-02-21T09:47:27.3118078Z or.b64 %rd100, %rd82, 4611686293439512576; 2026-02-21T09:47:27.3118256Z add.s32 %r84, %r29, 16416; 2026-02-21T09:47:27.3118400Z bfe.u32 %r85, %r84, 4, 14; 2026-02-21T09:47:27.3118554Z cvt.u64.u32 %rd83, %r85; 2026-02-21T09:47:27.3118717Z or.b64 %rd102, %rd83, 4611686293439512576; 2026-02-21T09:47:27.3118885Z add.s32 %r86, %r29, 16448; 2026-02-21T09:47:27.3119037Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:47:27.3119179Z cvt.u64.u32 %rd84, %r87; 2026-02-21T09:47:27.3119365Z or.b64 %rd104, %rd84, 4611686293439512576; 2026-02-21T09:47:27.3119533Z add.s32 %r88, %r29, 16480; 2026-02-21T09:47:27.3119684Z bfe.u32 %r89, %r88, 4, 14; 2026-02-21T09:47:27.3119825Z cvt.u64.u32 %rd85, %r89; 2026-02-21T09:47:27.3119983Z or.b64 %rd106, %rd85, 4611686293439512576; 2026-02-21T09:47:27.3120146Z shl.b32 %r90, %r1, 5; 2026-02-21T09:47:27.3120296Z and.b32 %r91, %r90, 3936; 2026-02-21T09:47:27.3120448Z bfe.s32 %r92, %r1, 2, 1; 2026-02-21T09:47:27.3120594Z 
and.b32 %r93, %r92, 144; 2026-02-21T09:47:27.3120750Z or.b32 %r94, %r93, %r91; 2026-02-21T09:47:27.3120899Z add.s32 %r95, %r29, 131072; 2026-02-21T09:47:27.3121087Z add.s32 %r6, %r95, %r94; 2026-02-21T09:47:27.3121235Z xor.b32 %r96, %r94, 16; 2026-02-21T09:47:27.3121389Z add.s32 %r7, %r95, %r96; 2026-02-21T09:47:27.3121536Z bra.uni $L__BB0_2; 2026-02-21T09:47:27.3121730Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:27.3121949Z mov.b32 %r293, 1; 2026-02-21T09:47:27.3122205Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3122501Z // begin inline asm 2026-02-21T09:47:27.3122635Z 2026-02-21T09:47:27.3122755Z { 2026-02-21T09:47:27.3122877Z .reg .pred complete; 2026-02-21T09:47:27.3123027Z waitLoop: 2026-02-21T09:47:27.3123216Z mbarrier.try_wait.parity.shared.b64 complete, [%r292], %r293; 2026-02-21T09:47:27.3123462Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.3123612Z } 2026-02-21T09:47:27.3123683Z 2026-02-21T09:47:27.3123738Z // end inline asm 2026-02-21T09:47:27.3123990Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3124271Z bar.sync 0; 2026-02-21T09:47:27.3124406Z // begin inline asm 2026-02-21T09:47:27.3124569Z @%p4 mbarrier.inval.shared::cta.b64 [%r133]; 2026-02-21T09:47:27.3124805Z // end inline asm 2026-02-21T09:47:27.3124937Z bar.sync 0; 2026-02-21T09:47:27.3125098Z // begin inline asm 2026-02-21T09:47:27.3125262Z @%p4 mbarrier.inval.shared::cta.b64 [%r134]; 2026-02-21T09:47:27.3125455Z // end inline asm 2026-02-21T09:47:27.3125595Z bar.sync 0; 2026-02-21T09:47:27.3125723Z // begin inline asm 2026-02-21T09:47:27.3125890Z @%p4 mbarrier.inval.shared::cta.b64 [%r135]; 2026-02-21T09:47:27.3126072Z // end inline asm 2026-02-21T09:47:27.3126214Z bar.sync 0; 2026-02-21T09:47:27.3126341Z // begin inline asm 2026-02-21T09:47:27.3126504Z @%p4 mbarrier.inval.shared::cta.b64 [%r200]; 2026-02-21T09:47:27.3126685Z // end inline asm 
2026-02-21T09:47:27.3126832Z add.s32 %r298, %r29, 147488; 2026-02-21T09:47:27.3126989Z // begin inline asm 2026-02-21T09:47:27.3127158Z @%p4 mbarrier.inval.shared::cta.b64 [%r298]; 2026-02-21T09:47:27.3127343Z // end inline asm 2026-02-21T09:47:27.3127475Z bar.sync 0; 2026-02-21T09:47:27.3127610Z // begin inline asm 2026-02-21T09:47:27.3127765Z @%p4 mbarrier.inval.shared::cta.b64 [%r132]; 2026-02-21T09:47:27.3127953Z // end inline asm 2026-02-21T09:47:27.3128206Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3128543Z // begin inline asm 2026-02-21T09:47:27.3128927Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r300, %r301, %r302, %r303, %r304, %r305, %r306, %r307, %r308, %r309, %r310, %r311, %r312, %r313, %r314, %r315}, [%r333 + 0]; 2026-02-21T09:47:27.3129331Z // end inline asm 2026-02-21T09:47:27.3129465Z // begin inline asm 2026-02-21T09:47:27.3129812Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329, %r330, %r331, %r332}, [%r333 + 16]; 2026-02-21T09:47:27.3130191Z // end inline asm 2026-02-21T09:47:27.3130319Z // begin inline asm 2026-02-21T09:47:27.3130471Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:27.3130627Z // end inline asm 2026-02-21T09:47:27.3130766Z cvt.u64.u32 %rd143, %r300; 2026-02-21T09:47:27.3130925Z cvt.u64.u32 %rd144, %r301; 2026-02-21T09:47:27.3131078Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:47:27.3131264Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:47:27.3131531Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3131817Z mov.b64 {%r338, %r339}, %rd146; 2026-02-21T09:47:27.3131986Z cvt.rn.f16x2.f32 %r340, %r339, %r338; 2026-02-21T09:47:27.3132266Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3132544Z cvt.u64.u32 %rd147, %r302; 2026-02-21T09:47:27.3132691Z cvt.u64.u32 %rd148, %r303; 2026-02-21T09:47:27.3132846Z 
shl.b64 %rd149, %rd148, 32; 2026-02-21T09:47:27.3133029Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:47:27.3133290Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3133563Z mov.b64 {%r341, %r342}, %rd150; 2026-02-21T09:47:27.3133733Z cvt.rn.f16x2.f32 %r343, %r342, %r341; 2026-02-21T09:47:27.3134005Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3134287Z cvt.u64.u32 %rd151, %r304; 2026-02-21T09:47:27.3134439Z cvt.u64.u32 %rd152, %r305; 2026-02-21T09:47:27.3134584Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:47:27.3134777Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:47:27.3135041Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3135332Z mov.b64 {%r344, %r345}, %rd154; 2026-02-21T09:47:27.3135494Z cvt.rn.f16x2.f32 %r346, %r345, %r344; 2026-02-21T09:47:27.3135774Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3136058Z cvt.u64.u32 %rd155, %r306; 2026-02-21T09:47:27.3136206Z cvt.u64.u32 %rd156, %r307; 2026-02-21T09:47:27.3136359Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:47:27.3136511Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:47:27.3136811Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3137084Z mov.b64 {%r347, %r348}, %rd158; 2026-02-21T09:47:27.3137253Z cvt.rn.f16x2.f32 %r349, %r348, %r347; 2026-02-21T09:47:27.3137518Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3137798Z cvt.u64.u32 %rd159, %r308; 2026-02-21T09:47:27.3137951Z cvt.u64.u32 %rd160, %r309; 2026-02-21T09:47:27.3138096Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:47:27.3138252Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:47:27.3138503Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3138777Z mov.b64 {%r350, %r351}, %rd162; 2026-02-21T09:47:27.3138934Z 
cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T09:47:27.3139203Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3139488Z cvt.u64.u32 %rd163, %r310; 2026-02-21T09:47:27.3139635Z cvt.u64.u32 %rd164, %r311; 2026-02-21T09:47:27.3139784Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:47:27.3139984Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:47:27.3140241Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3140511Z mov.b64 {%r353, %r354}, %rd166; 2026-02-21T09:47:27.3140674Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T09:47:27.3140934Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3141211Z cvt.u64.u32 %rd167, %r312; 2026-02-21T09:47:27.3141364Z cvt.u64.u32 %rd168, %r313; 2026-02-21T09:47:27.3141508Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:47:27.3141663Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:47:27.3141913Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3142201Z mov.b64 {%r356, %r357}, %rd170; 2026-02-21T09:47:27.3142392Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T09:47:27.3142662Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3142938Z cvt.u64.u32 %rd171, %r314; 2026-02-21T09:47:27.3143084Z cvt.u64.u32 %rd172, %r315; 2026-02-21T09:47:27.3143236Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:47:27.3143382Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:47:27.3143639Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3143911Z mov.b64 {%r359, %r360}, %rd174; 2026-02-21T09:47:27.3144073Z cvt.rn.f16x2.f32 %r361, %r360, %r359; 2026-02-21T09:47:27.3144374Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3144650Z cvt.u64.u32 %rd175, %r317; 2026-02-21T09:47:27.3144836Z cvt.u64.u32 %rd176, %r318; 2026-02-21T09:47:27.3144981Z 
shl.b64 %rd177, %rd176, 32; 2026-02-21T09:47:27.3145140Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:47:27.3145393Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3145670Z mov.b64 {%r362, %r363}, %rd178; 2026-02-21T09:47:27.3145828Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T09:47:27.3146102Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3146379Z cvt.u64.u32 %rd179, %r319; 2026-02-21T09:47:27.3146526Z cvt.u64.u32 %rd180, %r320; 2026-02-21T09:47:27.3146682Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:47:27.3146833Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:47:27.3147097Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3147372Z mov.b64 {%r365, %r366}, %rd182; 2026-02-21T09:47:27.3147535Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T09:47:27.3147833Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3148117Z cvt.u64.u32 %rd183, %r321; 2026-02-21T09:47:27.3148272Z cvt.u64.u32 %rd184, %r322; 2026-02-21T09:47:27.3148421Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:47:27.3148580Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:47:27.3148840Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3149117Z mov.b64 {%r368, %r369}, %rd186; 2026-02-21T09:47:27.3149276Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T09:47:27.3149550Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3149834Z cvt.u64.u32 %rd187, %r323; 2026-02-21T09:47:27.3149980Z cvt.u64.u32 %rd188, %r324; 2026-02-21T09:47:27.3150133Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:47:27.3150283Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:47:27.3150561Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3150836Z mov.b64 {%r371, %r372}, %rd190; 2026-02-21T09:47:27.3151002Z 
cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T09:47:27.3151297Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3151579Z cvt.u64.u32 %rd191, %r325; 2026-02-21T09:47:27.3151733Z cvt.u64.u32 %rd192, %r326; 2026-02-21T09:47:27.3151876Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:47:27.3152032Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:47:27.3152287Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3152578Z mov.b64 {%r374, %r375}, %rd194; 2026-02-21T09:47:27.3152734Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T09:47:27.3153009Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3153294Z cvt.u64.u32 %rd195, %r327; 2026-02-21T09:47:27.3153438Z cvt.u64.u32 %rd196, %r328; 2026-02-21T09:47:27.3153618Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:47:27.3153771Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:47:27.3154031Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3154307Z mov.b64 {%r377, %r378}, %rd198; 2026-02-21T09:47:27.3154469Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T09:47:27.3154781Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3155063Z cvt.u64.u32 %rd199, %r329; 2026-02-21T09:47:27.3155212Z cvt.u64.u32 %rd200, %r330; 2026-02-21T09:47:27.3155355Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:47:27.3155542Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:47:27.3155795Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3156075Z mov.b64 {%r380, %r381}, %rd202; 2026-02-21T09:47:27.3156232Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T09:47:27.3156509Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3156789Z cvt.u64.u32 %rd203, %r331; 2026-02-21T09:47:27.3156936Z cvt.u64.u32 %rd204, %r332; 2026-02-21T09:47:27.3157090Z 
shl.b64 %rd205, %rd204, 32; 2026-02-21T09:47:27.3157237Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:47:27.3157496Z .loc 1 58 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:58:27 2026-02-21T09:47:27.3157775Z mov.b64 {%r383, %r384}, %rd206; 2026-02-21T09:47:27.3157939Z cvt.rn.f16x2.f32 %r385, %r384, %r383; 2026-02-21T09:47:27.3158200Z .loc 1 59 45 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:59:45 2026-02-21T09:47:27.3158489Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.3158658Z bar.sync 0; 2026-02-21T09:47:27.3158820Z st.shared.v4.b32 [%r6], {%r340, %r343, %r346, %r349}; 2026-02-21T09:47:27.3159081Z st.shared.v4.b32 [%r6+4096], {%r364, %r367, %r370, %r373}; 2026-02-21T09:47:27.3159316Z st.shared.v4.b32 [%r7], {%r352, %r355, %r358, %r361}; 2026-02-21T09:47:27.3159545Z st.shared.v4.b32 [%r7+4096], {%r376, %r379, %r382, %r385}; 2026-02-21T09:47:27.3159738Z // begin inline asm 2026-02-21T09:47:27.3159898Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.3160063Z // end inline asm 2026-02-21T09:47:27.3160191Z bar.sync 0; 2026-02-21T09:47:27.3160329Z elect.sync %r386|%p146, -1; 2026-02-21T09:47:27.3160487Z and.pred %p144, %p1, %p146; 2026-02-21T09:47:27.3160642Z // begin inline asm 2026-02-21T09:47:27.3160896Z @%p144 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd142, {%r334, %r335}], [%r95]; 2026-02-21T09:47:27.3161186Z // end inline asm 2026-02-21T09:47:27.3161323Z cp.async.bulk.commit_group; 2026-02-21T09:47:27.3161585Z .loc 1 33 84 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:33:84 2026-02-21T09:47:27.3161872Z add.s32 %r388, %r388, 1; 2026-02-21T09:47:27.3162025Z setp.ne.b32 %p147, %r388, %r4; 2026-02-21T09:47:27.3162191Z @%p147 bra $L__BB0_2; 2026-02-21T09:47:27.3162330Z bra.uni $L__BB0_9; 2026-02-21T09:47:27.3162544Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:47:27.3162774Z // Child Loop BB0_5 Depth 2 2026-02-21T09:47:27.3163082Z .loc 1 39 35 // 
cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:39:35 2026-02-21T09:47:27.3163359Z shr.s32 %r166, %r388, 31; 2026-02-21T09:47:27.3163511Z shr.u32 %r167, %r166, 28; 2026-02-21T09:47:27.3163669Z add.s32 %r168, %r388, %r167; 2026-02-21T09:47:27.3163925Z .loc 1 42 45 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:42:45 2026-02-21T09:47:27.3164209Z and.b32 %r169, %r168, 65520; 2026-02-21T09:47:27.3164358Z sub.s32 %r170, %r388, %r169; 2026-02-21T09:47:27.3164623Z .loc 1 42 64 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:42:64 2026-02-21T09:47:27.3164992Z cvt.u16.u32 %rs1, %r170; 2026-02-21T09:47:27.3165157Z and.b16 %rs2, %rs1, 128; 2026-02-21T09:47:27.3165426Z .loc 1 43 51 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:43:51 2026-02-21T09:47:27.3165708Z shr.u16 %rs3, %rs2, 7; 2026-02-21T09:47:27.3165865Z add.s16 %rs4, %rs1, %rs3; 2026-02-21T09:47:27.3166020Z cvt.s16.s8 %rs5, %rs4; 2026-02-21T09:47:27.3166181Z shr.s16 %rs6, %rs5, 1; 2026-02-21T09:47:27.3166437Z .loc 1 42 64 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:42:64 2026-02-21T09:47:27.3166728Z and.b16 %rs7, %rs4, 254; 2026-02-21T09:47:27.3166912Z sub.s16 %rs8, %rs1, %rs7; 2026-02-21T09:47:27.3167186Z .loc 1 44 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:44:27 2026-02-21T09:47:27.3167480Z shl.b32 %r171, %r168, 1; 2026-02-21T09:47:27.3167629Z and.b32 %r172, %r171, -32; 2026-02-21T09:47:27.3167794Z cvt.s16.s8 %rs9, %rs8; 2026-02-21T09:47:27.3167957Z mad.wide.s16 %r334, %rs9, 16, %r172; 2026-02-21T09:47:27.3168253Z .loc 1 45 27 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:45:27 2026-02-21T09:47:27.3168545Z mul.wide.s16 %r335, %rs6, 256; 2026-02-21T09:47:27.3168828Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3169135Z shfl.sync.idx.b32 %r11, %r5, 0, 31, -1; 2026-02-21T09:47:27.3169316Z shl.b32 %r173, %r11, 21; 2026-02-21T09:47:27.3169477Z and.b32 %r174, %r173, 6291456; 
2026-02-21T09:47:27.3169638Z add.s32 %r333, %r174, %r387; 2026-02-21T09:47:27.3169805Z mov.pred %p85, -1; 2026-02-21T09:47:27.3169950Z mov.b32 %r389, 0; 2026-02-21T09:47:27.3170094Z // begin inline asm 2026-02-21T09:47:27.3170466Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r333 + 0], {%r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389}; 2026-02-21T09:47:27.3170873Z // end inline asm 2026-02-21T09:47:27.3171046Z // begin inline asm 2026-02-21T09:47:27.3171424Z @%p85 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r333 + 16], {%r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389, %r389}; 2026-02-21T09:47:27.3171837Z // end inline asm 2026-02-21T09:47:27.3171972Z // begin inline asm 2026-02-21T09:47:27.3172130Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:27.3172294Z // end inline asm 2026-02-21T09:47:27.3172433Z bar.sync 0; 2026-02-21T09:47:27.3172679Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3172978Z add.s32 %r390, %r29, 147488; 2026-02-21T09:47:27.3173139Z // begin inline asm 2026-02-21T09:47:27.3173307Z @%p4 mbarrier.init.shared::cta.b64 [%r390], 1; 2026-02-21T09:47:27.3173504Z // end inline asm 2026-02-21T09:47:27.3173635Z bar.sync 0; 2026-02-21T09:47:27.3173780Z add.s32 %r132, %r29, 147496; 2026-02-21T09:47:27.3173925Z // begin inline asm 2026-02-21T09:47:27.3174090Z @%p4 mbarrier.init.shared::cta.b64 [%r132], 1; 2026-02-21T09:47:27.3174268Z // end inline asm 2026-02-21T09:47:27.3174430Z add.s32 %r133, %r29, 147456; 2026-02-21T09:47:27.3174584Z // begin inline asm 2026-02-21T09:47:27.3174769Z @%p4 mbarrier.init.shared::cta.b64 [%r133], 1; 2026-02-21T09:47:27.3174950Z // end inline asm 2026-02-21T09:47:27.3175072Z bar.sync 0; 2026-02-21T09:47:27.3175200Z add.s32 %r134, %r29, 147464; 2026-02-21T09:47:27.3175342Z // begin inline asm 2026-02-21T09:47:27.3175500Z @%p4 mbarrier.init.shared::cta.b64 
[%r134], 1; 2026-02-21T09:47:27.3175674Z // end inline asm 2026-02-21T09:47:27.3175803Z bar.sync 0; 2026-02-21T09:47:27.3175926Z add.s32 %r135, %r29, 147472; 2026-02-21T09:47:27.3176077Z // begin inline asm 2026-02-21T09:47:27.3176232Z @%p4 mbarrier.init.shared::cta.b64 [%r135], 1; 2026-02-21T09:47:27.3176408Z // end inline asm 2026-02-21T09:47:27.3176540Z bar.sync 0; 2026-02-21T09:47:27.3176662Z add.s32 %r200, %r29, 147480; 2026-02-21T09:47:27.3176810Z // begin inline asm 2026-02-21T09:47:27.3177002Z @%p4 mbarrier.init.shared::cta.b64 [%r200], 1; 2026-02-21T09:47:27.3177186Z // end inline asm 2026-02-21T09:47:27.3177309Z bar.sync 0; 2026-02-21T09:47:27.3177434Z // begin inline asm 2026-02-21T09:47:27.3177622Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r133], 34816; 2026-02-21T09:47:27.3177827Z // end inline asm 2026-02-21T09:47:27.3178076Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 2026-02-21T09:47:27.3178354Z // begin inline asm 2026-02-21T09:47:27.3178512Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.3178670Z // end inline asm 2026-02-21T09:47:27.3178841Z bar.sync 0; 2026-02-21T09:47:27.3178974Z elect.sync %r175|%p76, -1; 2026-02-21T09:47:27.3179138Z and.pred %p67, %p1, %p76; 2026-02-21T09:47:27.3179289Z // begin inline asm 2026-02-21T09:47:27.3179624Z @%p67 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29], [%rd109, {%r389, %r335}], [%r133]; 2026-02-21T09:47:27.3179981Z // end inline asm 2026-02-21T09:47:27.3180218Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3180493Z bar.sync 0; 2026-02-21T09:47:27.3180624Z elect.sync %r176|%p77, -1; 2026-02-21T09:47:27.3180793Z and.pred %p68, %p1, %p77; 2026-02-21T09:47:27.3180943Z // begin inline asm 2026-02-21T09:47:27.3181268Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r68], [%rd110, {%r389, %r334}], [%r133]; 2026-02-21T09:47:27.3181628Z // end inline 
asm 2026-02-21T09:47:27.3181863Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3182152Z bar.sync 0; 2026-02-21T09:47:27.3182273Z // begin inline asm 2026-02-21T09:47:27.3182462Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r134], 34816; 2026-02-21T09:47:27.3182669Z // end inline asm 2026-02-21T09:47:27.3182959Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 2026-02-21T09:47:27.3183240Z bar.sync 0; 2026-02-21T09:47:27.3183371Z elect.sync %r177|%p78, -1; 2026-02-21T09:47:27.3183536Z and.pred %p70, %p1, %p78; 2026-02-21T09:47:27.3183686Z add.s32 %r147, %r29, 32768; 2026-02-21T09:47:27.3183837Z mov.b32 %r148, 64; 2026-02-21T09:47:27.3183966Z // begin inline asm 2026-02-21T09:47:27.3184284Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd109, {%r148, %r335}], [%r134]; 2026-02-21T09:47:27.3184623Z // end inline asm 2026-02-21T09:47:27.3184907Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3185190Z bar.sync 0; 2026-02-21T09:47:27.3185319Z elect.sync %r178|%p79, -1; 2026-02-21T09:47:27.3185483Z and.pred %p71, %p1, %p79; 2026-02-21T09:47:27.3185632Z add.s32 %r151, %r29, 141312; 2026-02-21T09:47:27.3185783Z // begin inline asm 2026-02-21T09:47:27.3186106Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r151], [%rd110, {%r148, %r334}], [%r134]; 2026-02-21T09:47:27.3186505Z // end inline asm 2026-02-21T09:47:27.3186760Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3187039Z bar.sync 0; 2026-02-21T09:47:27.3187169Z // begin inline asm 2026-02-21T09:47:27.3187350Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r135], 34816; 2026-02-21T09:47:27.3187565Z // end inline asm 2026-02-21T09:47:27.3187803Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 2026-02-21T09:47:27.3188084Z bar.sync 
0; 2026-02-21T09:47:27.3188213Z elect.sync %r179|%p80, -1; 2026-02-21T09:47:27.3188376Z and.pred %p73, %p1, %p80; 2026-02-21T09:47:27.3188532Z add.s32 %r156, %r29, 65536; 2026-02-21T09:47:27.3188682Z mov.b32 %r157, 128; 2026-02-21T09:47:27.3188827Z // begin inline asm 2026-02-21T09:47:27.3189166Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r156], [%rd109, {%r157, %r335}], [%r135]; 2026-02-21T09:47:27.3189526Z // end inline asm 2026-02-21T09:47:27.3189764Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3190046Z bar.sync 0; 2026-02-21T09:47:27.3190184Z elect.sync %r180|%p81, -1; 2026-02-21T09:47:27.3190336Z and.pred %p74, %p1, %p81; 2026-02-21T09:47:27.3190496Z add.s32 %r160, %r29, 143360; 2026-02-21T09:47:27.3190644Z // begin inline asm 2026-02-21T09:47:27.3190967Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd110, {%r157, %r334}], [%r135]; 2026-02-21T09:47:27.3191345Z // end inline asm 2026-02-21T09:47:27.3191588Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3191865Z bar.sync 0; 2026-02-21T09:47:27.3191985Z // begin inline asm 2026-02-21T09:47:27.3192034Z 2026-02-21T09:47:27.3192090Z { 2026-02-21T09:47:27.3192147Z .reg .pred complete; 2026-02-21T09:47:27.3192213Z waitLoop: 2026-02-21T09:47:27.3192326Z mbarrier.try_wait.parity.shared.b64 complete, [%r133], %r389; 2026-02-21T09:47:27.3192397Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.3192449Z } 2026-02-21T09:47:27.3192453Z 2026-02-21T09:47:27.3192505Z // end inline asm 2026-02-21T09:47:27.3192674Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3192735Z setp.ne.b32 %p82, %r11, 0; 2026-02-21T09:47:27.3192789Z @%p82 bra $L__BB0_4; 2026-02-21T09:47:27.3192893Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:27.3192955Z elect.sync %r197|%p84, -1; 
2026-02-21T09:47:27.3193010Z mov.b32 %r182, 134479888; 2026-02-21T09:47:27.3193066Z mov.pred %p83, 0; 2026-02-21T09:47:27.3193127Z // begin inline asm 2026-02-21T09:47:27.3193363Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd92, %rd93, %r182, %p83; 2026-02-21T09:47:27.3193421Z // end inline asm 2026-02-21T09:47:27.3193484Z // begin inline asm 2026-02-21T09:47:27.3193616Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd94, %rd95, %r182, %p85; 2026-02-21T09:47:27.3193670Z // end inline asm 2026-02-21T09:47:27.3193729Z // begin inline asm 2026-02-21T09:47:27.3193853Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd96, %rd97, %r182, %p85; 2026-02-21T09:47:27.3193907Z // end inline asm 2026-02-21T09:47:27.3193959Z // begin inline asm 2026-02-21T09:47:27.3194091Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd98, %rd99, %r182, %p85; 2026-02-21T09:47:27.3194143Z // end inline asm 2026-02-21T09:47:27.3194196Z // begin inline asm 2026-02-21T09:47:27.3194335Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd100, %rd93, %r182, %p83; 2026-02-21T09:47:27.3194389Z // end inline asm 2026-02-21T09:47:27.3194442Z // begin inline asm 2026-02-21T09:47:27.3194581Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd102, %rd95, %r182, %p85; 2026-02-21T09:47:27.3194635Z // end inline asm 2026-02-21T09:47:27.3194742Z // begin inline asm 2026-02-21T09:47:27.3194875Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd104, %rd97, %r182, %p85; 2026-02-21T09:47:27.3194929Z // end inline asm 2026-02-21T09:47:27.3194981Z // begin inline asm 2026-02-21T09:47:27.3195105Z @%p84 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd106, %rd99, %r182, %p85; 2026-02-21T09:47:27.3195164Z // end inline asm 2026-02-21T09:47:27.3195220Z add.s32 %r199, %r29, 147488; 2026-02-21T09:47:27.3195279Z cvt.u64.u32 %rd108, %r199; 2026-02-21T09:47:27.3195339Z // begin inline asm 2026-02-21T09:47:27.3195461Z @%p84 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd108]; 2026-02-21T09:47:27.3195512Z // end inline asm 2026-02-21T09:47:27.3195613Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:27.3195804Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3195859Z bar.sync 0; 2026-02-21T09:47:27.3195913Z // begin inline asm 2026-02-21T09:47:27.3196027Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r200], 34816; 2026-02-21T09:47:27.3196079Z // end inline asm 2026-02-21T09:47:27.3196242Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 2026-02-21T09:47:27.3196300Z bar.sync 0; 2026-02-21T09:47:27.3196362Z elect.sync %r214|%p104, -1; 2026-02-21T09:47:27.3196421Z and.pred %p101, %p1, %p104; 2026-02-21T09:47:27.3196476Z add.s32 %r201, %r29, 98304; 2026-02-21T09:47:27.3196561Z mov.b32 %r202, 192; 2026-02-21T09:47:27.3196614Z // begin inline asm 2026-02-21T09:47:27.3196856Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r201], [%rd109, {%r202, %r335}], [%r200]; 2026-02-21T09:47:27.3196914Z // end inline asm 2026-02-21T09:47:27.3197081Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3197133Z bar.sync 0; 2026-02-21T09:47:27.3197201Z elect.sync %r215|%p105, -1; 2026-02-21T09:47:27.3197259Z and.pred %p102, %p1, %p105; 2026-02-21T09:47:27.3197315Z add.s32 %r205, %r29, 145408; 2026-02-21T09:47:27.3197369Z // begin inline asm 2026-02-21T09:47:27.3197609Z @%p102 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r205], [%rd110, {%r202, %r334}], [%r200]; 2026-02-21T09:47:27.3197662Z // end inline asm 2026-02-21T09:47:27.3197713Z mov.b32 %r394, 1; 2026-02-21T09:47:27.3197772Z mov.b32 %r393, 3; 2026-02-21T09:47:27.3197827Z mov.b32 %r391, %r389; 2026-02-21T09:47:27.3197881Z mov.b32 %r392, %r389; 2026-02-21T09:47:27.3197940Z mov.b32 %r395, %r389; 2026-02-21T09:47:27.3197993Z mov.b32 
%r396, %r389; 2026-02-21T09:47:27.3198046Z bra.uni $L__BB0_5; 2026-02-21T09:47:27.3198140Z $L__BB0_7: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:47:27.3198337Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3198405Z setp.lt.u32 %p128, %r396, 1792; 2026-02-21T09:47:27.3198569Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3198630Z // begin inline asm 2026-02-21T09:47:27.3198679Z 2026-02-21T09:47:27.3198727Z { 2026-02-21T09:47:27.3198794Z .reg .pred complete; 2026-02-21T09:47:27.3198848Z waitLoop: 2026-02-21T09:47:27.3198961Z mbarrier.try_wait.parity.shared.b64 complete, [%r390], %r389; 2026-02-21T09:47:27.3199022Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.3199079Z } 2026-02-21T09:47:27.3199083Z 2026-02-21T09:47:27.3199135Z // end inline asm 2026-02-21T09:47:27.3199192Z add.s32 %r281, %r394, 1; 2026-02-21T09:47:27.3199259Z setp.gt.s32 %p131, %r281, 1; 2026-02-21T09:47:27.3199320Z selp.b32 %r394, 0, %r281, %p131; 2026-02-21T09:47:27.3199378Z selp.b32 %r282, 1, 0, %p131; 2026-02-21T09:47:27.3199435Z xor.b32 %r25, %r395, %r282; 2026-02-21T09:47:27.3199606Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3199692Z add.s32 %r283, %r393, 1; 2026-02-21T09:47:27.3199750Z setp.gt.s32 %p132, %r283, 3; 2026-02-21T09:47:27.3199820Z selp.b32 %r393, 0, %r283, %p132; 2026-02-21T09:47:27.3199874Z shl.b32 %r284, %r393, 3; 2026-02-21T09:47:27.3199928Z add.s32 %r286, %r29, %r284; 2026-02-21T09:47:27.3199993Z add.s32 %r276, %r286, 147456; 2026-02-21T09:47:27.3200043Z bar.sync 0; 2026-02-21T09:47:27.3200102Z and.pred %p125, %p4, %p128; 2026-02-21T09:47:27.3200157Z // begin inline asm 2026-02-21T09:47:27.3200275Z @%p125 mbarrier.arrive.expect_tx.shared.b64 _, [%r276], 34816; 2026-02-21T09:47:27.3200327Z // end inline asm 2026-02-21T09:47:27.3200486Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 
2026-02-21T09:47:27.3200551Z shl.b32 %r287, %r393, 15; 2026-02-21T09:47:27.3200631Z add.s32 %r273, %r29, %r287; 2026-02-21T09:47:27.3200685Z bar.sync 0; 2026-02-21T09:47:27.3200745Z elect.sync %r288|%p133, -1; 2026-02-21T09:47:27.3200814Z and.pred %p134, %p128, %p133; 2026-02-21T09:47:27.3200873Z and.pred %p126, %p1, %p134; 2026-02-21T09:47:27.3200929Z add.s32 %r274, %r396, 256; 2026-02-21T09:47:27.3200991Z // begin inline asm 2026-02-21T09:47:27.3201228Z @%p126 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r273], [%rd109, {%r274, %r335}], [%r276]; 2026-02-21T09:47:27.3201280Z // end inline asm 2026-02-21T09:47:27.3201449Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3201528Z shl.b32 %r289, %r393, 11; 2026-02-21T09:47:27.3201584Z add.s32 %r290, %r29, %r289; 2026-02-21T09:47:27.3201642Z add.s32 %r277, %r290, 139264; 2026-02-21T09:47:27.3201699Z bar.sync 0; 2026-02-21T09:47:27.3201757Z elect.sync %r291|%p135, -1; 2026-02-21T09:47:27.3201818Z and.pred %p136, %p128, %p135; 2026-02-21T09:47:27.3201884Z and.pred %p127, %p1, %p136; 2026-02-21T09:47:27.3201938Z // begin inline asm 2026-02-21T09:47:27.3202172Z @%p127 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r277], [%rd110, {%r274, %r334}], [%r276]; 2026-02-21T09:47:27.3202231Z // end inline asm 2026-02-21T09:47:27.3202395Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3202457Z setp.lt.u32 %p137, %r396, 1920; 2026-02-21T09:47:27.3202513Z add.s32 %r396, %r396, 64; 2026-02-21T09:47:27.3202572Z mov.b32 %r389, %r395; 2026-02-21T09:47:27.3202627Z mov.b32 %r390, %r292; 2026-02-21T09:47:27.3202679Z mov.b32 %r395, %r25; 2026-02-21T09:47:27.3202741Z @%p137 bra $L__BB0_5; 2026-02-21T09:47:27.3202794Z bra.uni $L__BB0_8; 2026-02-21T09:47:27.3202887Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:27.3203005Z // => This Inner Loop Header: Depth=2 
2026-02-21T09:47:27.3203168Z .loc 1 50 42 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:50:42 2026-02-21T09:47:27.3203225Z add.s32 %r218, %r392, 1; 2026-02-21T09:47:27.3203286Z setp.gt.s32 %p107, %r218, 3; 2026-02-21T09:47:27.3203353Z selp.b32 %r392, 0, %r218, %p107; 2026-02-21T09:47:27.3203410Z selp.b32 %r219, 1, 0, %p107; 2026-02-21T09:47:27.3203466Z xor.b32 %r391, %r391, %r219; 2026-02-21T09:47:27.3203527Z shl.b32 %r220, %r392, 3; 2026-02-21T09:47:27.3203584Z add.s32 %r222, %r29, %r220; 2026-02-21T09:47:27.3203640Z add.s32 %r216, %r222, 147456; 2026-02-21T09:47:27.3203691Z bar.sync 0; 2026-02-21T09:47:27.3203751Z // begin inline asm 2026-02-21T09:47:27.3203798Z 2026-02-21T09:47:27.3203846Z { 2026-02-21T09:47:27.3203910Z .reg .pred complete; 2026-02-21T09:47:27.3203962Z waitLoop: 2026-02-21T09:47:27.3204075Z mbarrier.try_wait.parity.shared.b64 complete, [%r216], %r391; 2026-02-21T09:47:27.3204139Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.3204193Z } 2026-02-21T09:47:27.3204196Z 2026-02-21T09:47:27.3204273Z // end inline asm 2026-02-21T09:47:27.3204328Z shl.b32 %r223, %r394, 3; 2026-02-21T09:47:27.3204388Z add.s32 %r224, %r29, %r223; 2026-02-21T09:47:27.3204443Z add.s32 %r292, %r224, 147488; 2026-02-21T09:47:27.3204598Z .loc 1 56 52 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3204659Z @%p82 bra $L__BB0_7; 2026-02-21T09:47:27.3204784Z // %bb.6: // in Loop: Header=BB0_5 Depth=2 2026-02-21T09:47:27.3204950Z .loc 1 54 31 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:54:31 2026-02-21T09:47:27.3205007Z shl.b32 %r241, %r392, 15; 2026-02-21T09:47:27.3205069Z add.s32 %r243, %r29, %r241; 2026-02-21T09:47:27.3205233Z .loc 1 55 44 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:55:44 2026-02-21T09:47:27.3205289Z shl.b32 %r244, %r392, 11; 2026-02-21T09:47:27.3205382Z add.s32 %r245, %r29, %r244; 2026-02-21T09:47:27.3205441Z add.s32 %r246, %r245, 139264; 2026-02-21T09:47:27.3205604Z .loc 1 56 52 
// cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:56:52 2026-02-21T09:47:27.3205669Z elect.sync %r247|%p109, -1; 2026-02-21T09:47:27.3205726Z bfe.u32 %r248, %r243, 4, 14; 2026-02-21T09:47:27.3205784Z cvt.u64.u32 %rd128, %r248; 2026-02-21T09:47:27.3205854Z or.b64 %rd111, %rd128, 4611686293439512576; 2026-02-21T09:47:27.3205920Z bfe.u32 %r249, %r246, 4, 14; 2026-02-21T09:47:27.3205978Z cvt.u64.u32 %rd129, %r249; 2026-02-21T09:47:27.3206082Z or.b64 %rd112, %rd129, 4611686293313683456; 2026-02-21T09:47:27.3206146Z mov.b32 %r226, 134479888; 2026-02-21T09:47:27.3206204Z mov.pred %p108, -1; 2026-02-21T09:47:27.3206259Z // begin inline asm 2026-02-21T09:47:27.3206407Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd111, %rd112, %r226, %p108; 2026-02-21T09:47:27.3206463Z // end inline asm 2026-02-21T09:47:27.3206519Z add.s32 %r250, %r243, 32; 2026-02-21T09:47:27.3206575Z bfe.u32 %r251, %r250, 4, 14; 2026-02-21T09:47:27.3206640Z cvt.u64.u32 %rd130, %r251; 2026-02-21T09:47:27.3206706Z or.b64 %rd113, %rd130, 4611686293439512576; 2026-02-21T09:47:27.3206762Z add.s32 %r252, %r245, 139296; 2026-02-21T09:47:27.3206825Z bfe.u32 %r253, %r252, 4, 14; 2026-02-21T09:47:27.3206882Z cvt.u64.u32 %rd131, %r253; 2026-02-21T09:47:27.3206946Z or.b64 %rd114, %rd131, 4611686293313683456; 2026-02-21T09:47:27.3207001Z // begin inline asm 2026-02-21T09:47:27.3207145Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd113, %rd114, %r226, %p108; 2026-02-21T09:47:27.3207199Z // end inline asm 2026-02-21T09:47:27.3207253Z add.s32 %r254, %r243, 64; 2026-02-21T09:47:27.3207316Z bfe.u32 %r255, %r254, 4, 14; 2026-02-21T09:47:27.3207371Z cvt.u64.u32 %rd132, %r255; 2026-02-21T09:47:27.3207433Z or.b64 %rd115, %rd132, 4611686293439512576; 2026-02-21T09:47:27.3207521Z add.s32 %r256, %r245, 139328; 2026-02-21T09:47:27.3207577Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T09:47:27.3207632Z cvt.u64.u32 %rd133, %r257; 2026-02-21T09:47:27.3207697Z or.b64 %rd116, %rd133, 4611686293313683456; 
2026-02-21T09:47:27.3207757Z // begin inline asm 2026-02-21T09:47:27.3207887Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd115, %rd116, %r226, %p108; 2026-02-21T09:47:27.3207939Z // end inline asm 2026-02-21T09:47:27.3208002Z add.s32 %r258, %r243, 96; 2026-02-21T09:47:27.3208055Z bfe.u32 %r259, %r258, 4, 14; 2026-02-21T09:47:27.3208111Z cvt.u64.u32 %rd134, %r259; 2026-02-21T09:47:27.3208176Z or.b64 %rd117, %rd134, 4611686293439512576; 2026-02-21T09:47:27.3208239Z add.s32 %r260, %r245, 139360; 2026-02-21T09:47:27.3208293Z bfe.u32 %r261, %r260, 4, 14; 2026-02-21T09:47:27.3208349Z cvt.u64.u32 %rd135, %r261; 2026-02-21T09:47:27.3208419Z or.b64 %rd118, %rd135, 4611686293313683456; 2026-02-21T09:47:27.3208473Z // begin inline asm 2026-02-21T09:47:27.3208610Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 0 ], %rd117, %rd118, %r226, %p108; 2026-02-21T09:47:27.3208672Z // end inline asm 2026-02-21T09:47:27.3208754Z add.s32 %r262, %r243, 16384; 2026-02-21T09:47:27.3208811Z bfe.u32 %r263, %r262, 4, 14; 2026-02-21T09:47:27.3208870Z cvt.u64.u32 %rd136, %r263; 2026-02-21T09:47:27.3208946Z or.b64 %rd119, %rd136, 4611686293439512576; 2026-02-21T09:47:27.3209003Z // begin inline asm 2026-02-21T09:47:27.3209144Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd119, %rd112, %r226, %p108; 2026-02-21T09:47:27.3209209Z // end inline asm 2026-02-21T09:47:27.3209265Z add.s32 %r264, %r243, 16416; 2026-02-21T09:47:27.3209321Z bfe.u32 %r265, %r264, 4, 14; 2026-02-21T09:47:27.3209385Z cvt.u64.u32 %rd137, %r265; 2026-02-21T09:47:27.3209452Z or.b64 %rd121, %rd137, 4611686293439512576; 2026-02-21T09:47:27.3209507Z // begin inline asm 2026-02-21T09:47:27.3209646Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd121, %rd114, %r226, %p108; 2026-02-21T09:47:27.3209737Z // end inline asm 2026-02-21T09:47:27.3209794Z add.s32 %r266, %r243, 16448; 2026-02-21T09:47:27.3209853Z bfe.u32 %r267, %r266, 4, 14; 2026-02-21T09:47:27.3209916Z cvt.u64.u32 %rd138, %r267; 
2026-02-21T09:47:27.3209982Z or.b64 %rd123, %rd138, 4611686293439512576; 2026-02-21T09:47:27.3210037Z // begin inline asm 2026-02-21T09:47:27.3210179Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd123, %rd116, %r226, %p108; 2026-02-21T09:47:27.3210234Z // end inline asm 2026-02-21T09:47:27.3210289Z add.s32 %r268, %r243, 16480; 2026-02-21T09:47:27.3210344Z bfe.u32 %r269, %r268, 4, 14; 2026-02-21T09:47:27.3210410Z cvt.u64.u32 %rd139, %r269; 2026-02-21T09:47:27.3210509Z or.b64 %rd125, %rd139, 4611686293439512576; 2026-02-21T09:47:27.3210565Z // begin inline asm 2026-02-21T09:47:27.3210711Z @%p109 tcgen05.mma.cta_group::1.kind::f16 [ %r387 + 16 ], %rd125, %rd118, %r226, %p108; 2026-02-21T09:47:27.3210764Z // end inline asm 2026-02-21T09:47:27.3210823Z cvt.u64.u32 %rd127, %r292; 2026-02-21T09:47:27.3210879Z // begin inline asm 2026-02-21T09:47:27.3211015Z @%p109 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd127]; 2026-02-21T09:47:27.3211071Z // end inline asm 2026-02-21T09:47:27.3211126Z bra.uni $L__BB0_7; 2026-02-21T09:47:27.3211216Z $L__BB0_9: // %._crit_edge 2026-02-21T09:47:27.3211394Z .loc 1 33 84 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:33:84 2026-02-21T09:47:27.3211466Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.3211525Z bar.sync 0; 2026-02-21T09:47:27.3211696Z .loc 1 33 4 // cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py:33:4 2026-02-21T09:47:27.3211751Z bar.sync 0; 2026-02-21T09:47:27.3211807Z // begin inline asm 2026-02-21T09:47:27.3211930Z @%p1 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r387, 32; 2026-02-21T09:47:27.3211985Z // end inline asm 2026-02-21T09:47:27.3212039Z ret; 2026-02-21T09:47:27.3212123Z $L__tmp1: 2026-02-21T09:47:27.3212182Z $L__func_end0: 2026-02-21T09:47:27.3212268Z // -- End function 2026-02-21T09:47:27.3212329Z } 2026-02-21T09:47:27.3212540Z .file 1 "/tmp/torchinductor_root/do/cdofjiklnp6iru2ulg3vgss6droczb6c2htl7v7uzqqvd4bsvrml.py" 2026-02-21T09:47:27.3212604Z .section 
.debug_abbrev 2026-02-21T09:47:27.3212658Z { 2026-02-21T09:47:27.3212753Z .b8 1 // Abbreviation Code 2026-02-21T09:47:27.3212842Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:27.3212925Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:27.3213012Z .b8 37 // DW_AT_producer 2026-02-21T09:47:27.3213088Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.3213163Z .b8 19 // DW_AT_language 2026-02-21T09:47:27.3213246Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:27.3213323Z .b8 3 // DW_AT_name 2026-02-21T09:47:27.3213398Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.3213501Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:27.3213585Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:27.3213661Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:27.3213734Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.3213811Z .b8 0 // EOM(1) 2026-02-21T09:47:27.3213880Z .b8 0 // EOM(2) 2026-02-21T09:47:27.3213948Z .b8 0 // EOM(3) 2026-02-21T09:47:27.3214007Z } 2026-02-21T09:47:27.3214066Z .section .debug_info 2026-02-21T09:47:27.3214118Z { 2026-02-21T09:47:27.3214201Z .b32 104 // Length of Unit 2026-02-21T09:47:27.3214296Z .b8 2 // DWARF version number 2026-02-21T09:47:27.3214350Z .b8 0 2026-02-21T09:47:27.3214490Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:27.3214589Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:27.3214722Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:27.3214805Z .b8 116 // DW_AT_producer 2026-02-21T09:47:27.3214866Z .b8 114 2026-02-21T09:47:27.3214920Z .b8 105 2026-02-21T09:47:27.3214970Z .b8 116 2026-02-21T09:47:27.3215019Z .b8 111 2026-02-21T09:47:27.3215076Z .b8 110 2026-02-21T09:47:27.3215125Z .b8 0 2026-02-21T09:47:27.3215227Z .b8 2 // DW_AT_language 2026-02-21T09:47:27.3215278Z .b8 0 2026-02-21T09:47:27.3215363Z .b8 99 // DW_AT_name 2026-02-21T09:47:27.3215414Z .b8 100 2026-02-21T09:47:27.3215464Z .b8 111 2026-02-21T09:47:27.3215520Z .b8 102 2026-02-21T09:47:27.3215570Z .b8 106 2026-02-21T09:47:27.3215621Z .b8 105 2026-02-21T09:47:27.3215673Z .b8 107 2026-02-21T09:47:27.3215732Z .b8 108 2026-02-21T09:47:27.3215782Z .b8 110 2026-02-21T09:47:27.3215834Z .b8 112 2026-02-21T09:47:27.3215907Z .b8 54 2026-02-21T09:47:27.3215958Z .b8 105 2026-02-21T09:47:27.3216008Z .b8 114 2026-02-21T09:47:27.3216057Z .b8 117 2026-02-21T09:47:27.3216115Z .b8 50 2026-02-21T09:47:27.3216164Z .b8 117 2026-02-21T09:47:27.3216214Z .b8 108 2026-02-21T09:47:27.3216271Z .b8 103 2026-02-21T09:47:27.3216321Z .b8 51 2026-02-21T09:47:27.3216371Z .b8 118 2026-02-21T09:47:27.3216420Z .b8 103 2026-02-21T09:47:27.3216479Z .b8 115 2026-02-21T09:47:27.3216528Z .b8 115 2026-02-21T09:47:27.3216577Z .b8 54 2026-02-21T09:47:27.3216627Z .b8 100 2026-02-21T09:47:27.3216695Z .b8 114 2026-02-21T09:47:27.3216742Z .b8 111 2026-02-21T09:47:27.3216790Z .b8 99 2026-02-21T09:47:27.3216844Z .b8 122 2026-02-21T09:47:27.3216891Z .b8 98 2026-02-21T09:47:27.3216937Z .b8 54 2026-02-21T09:47:27.3216983Z .b8 99 2026-02-21T09:47:27.3217038Z .b8 50 2026-02-21T09:47:27.3217111Z .b8 104 2026-02-21T09:47:27.3217162Z .b8 116 2026-02-21T09:47:27.3217217Z .b8 108 2026-02-21T09:47:27.3217266Z .b8 55 2026-02-21T09:47:27.3217315Z .b8 118 2026-02-21T09:47:27.3217362Z .b8 55 2026-02-21T09:47:27.3217418Z .b8 117 
2026-02-21T09:47:27.3217466Z .b8 122 2026-02-21T09:47:27.3217513Z .b8 113 2026-02-21T09:47:27.3217568Z .b8 113 2026-02-21T09:47:27.3217615Z .b8 118 2026-02-21T09:47:27.3217663Z .b8 100 2026-02-21T09:47:27.3217711Z .b8 52 2026-02-21T09:47:27.3217764Z .b8 98 2026-02-21T09:47:27.3217811Z .b8 115 2026-02-21T09:47:27.3217858Z .b8 118 2026-02-21T09:47:27.3217904Z .b8 114 2026-02-21T09:47:27.3217959Z .b8 109 2026-02-21T09:47:27.3218008Z .b8 108 2026-02-21T09:47:27.3218055Z .b8 46 2026-02-21T09:47:27.3218108Z .b8 112 2026-02-21T09:47:27.3218156Z .b8 121 2026-02-21T09:47:27.3218202Z .b8 0 2026-02-21T09:47:27.3218287Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:27.3218366Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:27.3218415Z .b8 116 2026-02-21T09:47:27.3218463Z .b8 109 2026-02-21T09:47:27.3218516Z .b8 112 2026-02-21T09:47:27.3218563Z .b8 47 2026-02-21T09:47:27.3218637Z .b8 116 2026-02-21T09:47:27.3218685Z .b8 111 2026-02-21T09:47:27.3218741Z .b8 114 2026-02-21T09:47:27.3218790Z .b8 99 2026-02-21T09:47:27.3218838Z .b8 104 2026-02-21T09:47:27.3218892Z .b8 105 2026-02-21T09:47:27.3218940Z .b8 110 2026-02-21T09:47:27.3218987Z .b8 100 2026-02-21T09:47:27.3219034Z .b8 117 2026-02-21T09:47:27.3219089Z .b8 99 2026-02-21T09:47:27.3219136Z .b8 116 2026-02-21T09:47:27.3219184Z .b8 111 2026-02-21T09:47:27.3219233Z .b8 114 2026-02-21T09:47:27.3219286Z .b8 95 2026-02-21T09:47:27.3219334Z .b8 114 2026-02-21T09:47:27.3219383Z .b8 111 2026-02-21T09:47:27.3219437Z .b8 111 2026-02-21T09:47:27.3219486Z .b8 116 2026-02-21T09:47:27.3219532Z .b8 47 2026-02-21T09:47:27.3219580Z .b8 100 2026-02-21T09:47:27.3219634Z .b8 111 2026-02-21T09:47:27.3219682Z .b8 0 2026-02-21T09:47:27.3219730Z } 2026-02-21T09:47:27.3219798Z .section .debug_macinfo { } 2026-02-21T09:47:27.3219803Z 2026-02-21T09:47:27.3219902Z ================================================================ 2026-02-21T09:47:27.3220008Z please share the reproducer above with Triton project. 
2026-02-21T09:47:27.4286816Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:27.4286830Z 2026-02-21T09:47:27.4289611Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:27.4289993Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:27.4290057Z `ptxas` stderr: 2026-02-21T09:47:27.4290412Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:27.4290519Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.4290526Z 2026-02-21T09:47:27.4290930Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjh4sje24.ptx -o /tmp/tmpjh4sje24.ptx.o 2026-02-21T09:47:27.4290936Z 2026-02-21T09:47:27.4291070Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:27.4291083Z 2026-02-21T09:47:27.4295038Z 2026-02-21T09:47:27.4296631Z ================================================================ 2026-02-21T09:47:27.4296740Z Internal Triton PTX codegen error 2026-02-21T09:47:27.4296817Z `ptxas` stderr: 2026-02-21T09:47:27.4297366Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:27.4297486Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.4297497Z 2026-02-21T09:47:27.4297917Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpjh4sje24.ptx -o /tmp/tmpjh4sje24.ptx.o 2026-02-21T09:47:27.4297923Z 2026-02-21T09:47:27.4297926Z 2026-02-21T09:47:27.4297984Z // 2026-02-21T09:47:27.4298060Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:27.4298126Z // 2026-02-21T09:47:27.4298129Z 2026-02-21T09:47:27.4298188Z .version 8.7 2026-02-21T09:47:27.4298249Z .target sm_100a 2026-02-21T09:47:27.4298306Z .address_size 64 2026-02-21T09:47:27.4298310Z 2026-02-21T09:47:27.4298441Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:27.4298521Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:27.4298603Z // @_helion_matmul 2026-02-21T09:47:27.4298681Z .visible .entry _helion_matmul( 2026-02-21T09:47:27.4298789Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:27.4298952Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:27.4299054Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:27.4299146Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:27.4299242Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:27.4299292Z ) 2026-02-21T09:47:27.4299354Z .reqntid 256 2026-02-21T09:47:27.4299408Z .maxnreg 32 2026-02-21T09:47:27.4299459Z { 2026-02-21T09:47:27.4299530Z .reg .pred %p<205>; 2026-02-21T09:47:27.4299586Z .reg .b32 %r<683>; 2026-02-21T09:47:27.4299641Z .reg .b64 %rd<315>; 2026-02-21T09:47:27.4299822Z .loc 1 19 0 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:19:0 2026-02-21T09:47:27.4299884Z $L__func_begin0: 2026-02-21T09:47:27.4300095Z .loc 1 19 0 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:19:0 2026-02-21T09:47:27.4300100Z 
2026-02-21T09:47:27.4300154Z // %bb.0: 2026-02-21T09:47:27.4300249Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:47:27.4300303Z $L__tmp0: 2026-02-21T09:47:27.4300462Z .loc 1 19 0 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:19 2026-02-21T09:47:27.4300529Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:27.4300616Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:47:27.4300682Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:47:27.4300782Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:47:27.4300924Z mov.b32 %r61, global_smem; 2026-02-21T09:47:27.4300983Z // begin inline asm 2026-02-21T09:47:27.4301168Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r61], 128; 2026-02-21T09:47:27.4301225Z // end inline asm 2026-02-21T09:47:27.4301303Z ld.param.b64 %rd64, [_helion_matmul_param_3]; 2026-02-21T09:47:27.4301365Z bar.sync 0; 2026-02-21T09:47:27.4301438Z ld.shared.b32 %r657, [global_smem]; 2026-02-21T09:47:27.4301493Z bar.sync 0; 2026-02-21T09:47:27.4301551Z // begin inline asm 2026-02-21T09:47:27.4301681Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:27.4301738Z // end inline asm 2026-02-21T09:47:27.4301914Z .loc 1 21 67 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:21:67 2026-02-21T09:47:27.4301983Z mov.u32 %r680, %ctaid.x; 2026-02-21T09:47:27.4302039Z mov.u32 %r189, %ctaid.y; 2026-02-21T09:47:27.4302094Z mov.u32 %r190, %ctaid.z; 2026-02-21T09:47:27.4302151Z mov.u32 %r191, %nctaid.x; 2026-02-21T09:47:27.4302215Z mov.u32 %r192, %nctaid.y; 2026-02-21T09:47:27.4302282Z mad.lo.s32 %r193, %r190, %r192, %r189; 2026-02-21T09:47:27.4302344Z mad.lo.s32 %r194, %r193, %r191, %r680; 2026-02-21T09:47:27.4302410Z mul.lo.s32 %r195, %r194, 384; 2026-02-21T09:47:27.4302467Z cvt.s64.s32 %rd65, %r195; 2026-02-21T09:47:27.4302552Z add.s64 %rd19, %rd64, %rd65; 2026-02-21T09:47:27.4302620Z shl.b32 %r196, %r1, 2; 2026-02-21T09:47:27.4302678Z add.s32 %r62, %r61, %r196; 2026-02-21T09:47:27.4302732Z 
mov.b32 %r682, 0; 2026-02-21T09:47:27.4302787Z // begin inline asm 2026-02-21T09:47:27.4302861Z @%p3 st.shared.b32 [ %r62 + 0 ], %r682; 2026-02-21T09:47:27.4302915Z // end inline asm 2026-02-21T09:47:27.4302974Z bar.warp.sync -1; 2026-02-21T09:47:27.4303039Z setp.eq.b32 %p197, %r1, 0; 2026-02-21T09:47:27.4303096Z cvt.u64.u32 %rd4, %r61; 2026-02-21T09:47:27.4303150Z // begin inline asm 2026-02-21T09:47:27.4303315Z @%p197 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:47:27.4303378Z // end inline asm 2026-02-21T09:47:27.4303431Z // begin inline asm 2026-02-21T09:47:27.4303571Z @%p197 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.4303634Z // end inline asm 2026-02-21T09:47:27.4303687Z mov.b32 %r64, 64; 2026-02-21T09:47:27.4303742Z // begin inline asm 2026-02-21T09:47:27.4303901Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r64; 2026-02-21T09:47:27.4303981Z // end inline asm 2026-02-21T09:47:27.4304035Z mov.b32 %r65, 256; 2026-02-21T09:47:27.4304089Z // begin inline asm 2026-02-21T09:47:27.4304317Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r65; 2026-02-21T09:47:27.4308609Z // end inline asm 2026-02-21T09:47:27.4310315Z mov.b32 %r66, 2048; 2026-02-21T09:47:27.4310422Z // begin inline asm 2026-02-21T09:47:27.4313951Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r66; 2026-02-21T09:47:27.4314045Z // end inline asm 2026-02-21T09:47:27.4314116Z // begin inline asm 2026-02-21T09:47:27.4314335Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r66; 2026-02-21T09:47:27.4316751Z // end inline asm 2026-02-21T09:47:27.4316846Z mov.b64 %rd12, 4096; 2026-02-21T09:47:27.4316909Z // begin inline asm 2026-02-21T09:47:27.4317342Z @%p197 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:47:27.4317425Z // end 
inline asm 2026-02-21T09:47:27.4317496Z mov.b32 %r68, 1; 2026-02-21T09:47:27.4317555Z // begin inline asm 2026-02-21T09:47:27.4317745Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r68; 2026-02-21T09:47:27.4317819Z // end inline asm 2026-02-21T09:47:27.4317874Z // begin inline asm 2026-02-21T09:47:27.4318041Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r68; 2026-02-21T09:47:27.4318102Z // end inline asm 2026-02-21T09:47:27.4318213Z // begin inline asm 2026-02-21T09:47:27.4318600Z @%p197 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.4318663Z // end inline asm 2026-02-21T09:47:27.4318719Z // begin inline asm 2026-02-21T09:47:27.4318894Z @%p197 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4318949Z // end inline asm 2026-02-21T09:47:27.4319013Z // begin inline asm 2026-02-21T09:47:27.4319169Z @%p197 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.4319221Z // end inline asm 2026-02-21T09:47:27.4319283Z // begin inline asm 2026-02-21T09:47:27.4319429Z @%p197 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4319482Z // end inline asm 2026-02-21T09:47:27.4319541Z // begin inline asm 2026-02-21T09:47:27.4319808Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.4319862Z // end inline asm 2026-02-21T09:47:27.4319922Z // begin inline asm 2026-02-21T09:47:27.4320048Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:47:27.4320121Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.4320196Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.4320299Z // end inline asm 2026-02-21T09:47:27.4320357Z bar.sync 0; 2026-02-21T09:47:27.4320423Z cvta.global.u64 %rd58, 
%rd19; 2026-02-21T09:47:27.4320614Z .loc 1 22 68 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:22:68 2026-02-21T09:47:27.4320677Z add.s32 %r197, %r195, 128; 2026-02-21T09:47:27.4320737Z cvt.s64.s32 %rd66, %r197; 2026-02-21T09:47:27.4320803Z add.s64 %rd37, %rd64, %rd66; 2026-02-21T09:47:27.4320856Z bar.sync 0; 2026-02-21T09:47:27.4320910Z // begin inline asm 2026-02-21T09:47:27.4320979Z @%p3 st.shared.b32 [ %r62 + 0 ], %r682; 2026-02-21T09:47:27.4321040Z // end inline asm 2026-02-21T09:47:27.4321105Z bar.warp.sync -1; 2026-02-21T09:47:27.4321157Z // begin inline asm 2026-02-21T09:47:27.4321326Z @%p197 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:47:27.4321378Z // end inline asm 2026-02-21T09:47:27.4321431Z // begin inline asm 2026-02-21T09:47:27.4321576Z @%p197 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.4321641Z // end inline asm 2026-02-21T09:47:27.4321745Z // begin inline asm 2026-02-21T09:47:27.4321899Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r64; 2026-02-21T09:47:27.4321961Z // end inline asm 2026-02-21T09:47:27.4322017Z // begin inline asm 2026-02-21T09:47:27.4322165Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r64; 2026-02-21T09:47:27.4322228Z // end inline asm 2026-02-21T09:47:27.4322284Z // begin inline asm 2026-02-21T09:47:27.4322442Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r66; 2026-02-21T09:47:27.4322499Z // end inline asm 2026-02-21T09:47:27.4322565Z mov.b32 %r75, 12288; 2026-02-21T09:47:27.4322621Z // begin inline asm 2026-02-21T09:47:27.4322779Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r75; 2026-02-21T09:47:27.4322844Z // end inline asm 2026-02-21T09:47:27.4322932Z // begin inline asm 2026-02-21T09:47:27.4323104Z @%p197 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 
0 ], 0x0, %rd12; 2026-02-21T09:47:27.4323169Z // end inline asm 2026-02-21T09:47:27.4323226Z // begin inline asm 2026-02-21T09:47:27.4323394Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r68; 2026-02-21T09:47:27.4323457Z // end inline asm 2026-02-21T09:47:27.4323514Z // begin inline asm 2026-02-21T09:47:27.4323681Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r68; 2026-02-21T09:47:27.4323737Z // end inline asm 2026-02-21T09:47:27.4323825Z // begin inline asm 2026-02-21T09:47:27.4323973Z @%p197 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.4324027Z // end inline asm 2026-02-21T09:47:27.4324089Z // begin inline asm 2026-02-21T09:47:27.4324256Z @%p197 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4324312Z // end inline asm 2026-02-21T09:47:27.4324377Z // begin inline asm 2026-02-21T09:47:27.4324528Z @%p197 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.4324583Z // end inline asm 2026-02-21T09:47:27.4324638Z // begin inline asm 2026-02-21T09:47:27.4324863Z @%p197 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4324917Z // end inline asm 2026-02-21T09:47:27.4324971Z // begin inline asm 2026-02-21T09:47:27.4325247Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.4325306Z // end inline asm 2026-02-21T09:47:27.4325363Z // begin inline asm 2026-02-21T09:47:27.4325503Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:47:27.4325576Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.4325650Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.4325751Z // end inline asm 2026-02-21T09:47:27.4325812Z bar.sync 0; 2026-02-21T09:47:27.4325879Z cvta.global.u64 
%rd59, %rd37; 2026-02-21T09:47:27.4326053Z .loc 1 24 73 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:24:73 2026-02-21T09:47:27.4326122Z add.s32 %r198, %r195, 256; 2026-02-21T09:47:27.4326181Z cvt.s64.s32 %rd67, %r198; 2026-02-21T09:47:27.4326240Z add.s64 %rd55, %rd64, %rd67; 2026-02-21T09:47:27.4326299Z bar.sync 0; 2026-02-21T09:47:27.4326353Z // begin inline asm 2026-02-21T09:47:27.4326421Z @%p3 st.shared.b32 [ %r62 + 0 ], %r682; 2026-02-21T09:47:27.4326474Z // end inline asm 2026-02-21T09:47:27.4326541Z bar.warp.sync -1; 2026-02-21T09:47:27.4326596Z // begin inline asm 2026-02-21T09:47:27.4326755Z @%p197 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:47:27.4326817Z // end inline asm 2026-02-21T09:47:27.4326869Z // begin inline asm 2026-02-21T09:47:27.4327004Z @%p197 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.4327063Z // end inline asm 2026-02-21T09:47:27.4327146Z // begin inline asm 2026-02-21T09:47:27.4327296Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r64; 2026-02-21T09:47:27.4327349Z // end inline asm 2026-02-21T09:47:27.4327411Z // begin inline asm 2026-02-21T09:47:27.4327556Z @%p197 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r65; 2026-02-21T09:47:27.4327609Z // end inline asm 2026-02-21T09:47:27.4327669Z // begin inline asm 2026-02-21T09:47:27.4327823Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r75; 2026-02-21T09:47:27.4327878Z // end inline asm 2026-02-21T09:47:27.4327939Z // begin inline asm 2026-02-21T09:47:27.4328091Z @%p197 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r66; 2026-02-21T09:47:27.4328144Z // end inline asm 2026-02-21T09:47:27.4328200Z mov.b64 %rd48, 24576; 2026-02-21T09:47:27.4328291Z // begin inline asm 2026-02-21T09:47:27.4328456Z @%p197 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd4 + 0 ], 0x0, %rd48; 2026-02-21T09:47:27.4328511Z // end inline asm 2026-02-21T09:47:27.4328572Z // begin inline asm 2026-02-21T09:47:27.4328735Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r68; 2026-02-21T09:47:27.4328787Z // end inline asm 2026-02-21T09:47:27.4328846Z // begin inline asm 2026-02-21T09:47:27.4329009Z @%p197 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r68; 2026-02-21T09:47:27.4329060Z // end inline asm 2026-02-21T09:47:27.4329142Z // begin inline asm 2026-02-21T09:47:27.4329297Z @%p197 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.4329349Z // end inline asm 2026-02-21T09:47:27.4329401Z // begin inline asm 2026-02-21T09:47:27.4329572Z @%p197 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4329625Z // end inline asm 2026-02-21T09:47:27.4329678Z // begin inline asm 2026-02-21T09:47:27.4329833Z @%p197 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.4329886Z // end inline asm 2026-02-21T09:47:27.4329938Z // begin inline asm 2026-02-21T09:47:27.4330086Z @%p197 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.4330138Z // end inline asm 2026-02-21T09:47:27.4330189Z // begin inline asm 2026-02-21T09:47:27.4330447Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.4330507Z // end inline asm 2026-02-21T09:47:27.4330560Z // begin inline asm 2026-02-21T09:47:27.4330683Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:47:27.4330760Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.4330830Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.4330911Z // end inline asm 2026-02-21T09:47:27.4330974Z bar.sync 0; 2026-02-21T09:47:27.4331037Z 
cvta.global.u64 %rd186, %rd55; 2026-02-21T09:47:27.4331202Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4331261Z max.u32 %r199, %r680, 1535; 2026-02-21T09:47:27.4331325Z shl.b32 %r200, %r199, 4; 2026-02-21T09:47:27.4331386Z add.s32 %r4, %r200, -24560; 2026-02-21T09:47:27.4331442Z sub.s32 %r5, 24576, %r200; 2026-02-21T09:47:27.4331612Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4331668Z shr.u32 %r201, %r1, 5; 2026-02-21T09:47:27.4331741Z shfl.sync.idx.b32 %r6, %r201, 0, 31, -1; 2026-02-21T09:47:27.4331805Z shl.b32 %r202, %r6, 21; 2026-02-21T09:47:27.4331866Z and.b32 %r203, %r202, 6291456; 2026-02-21T09:47:27.4331922Z add.s32 %r204, %r203, %r657; 2026-02-21T09:47:27.4331978Z shl.b32 %r205, %r6, 4; 2026-02-21T09:47:27.4332044Z and.b32 %r206, %r205, 64; 2026-02-21T09:47:27.4332102Z add.s32 %r86, %r204, %r206; 2026-02-21T09:47:27.4332160Z mov.pred %p99, -1; 2026-02-21T09:47:27.4332242Z // begin inline asm 2026-02-21T09:47:27.4332531Z @%p99 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r86 + 0], {%r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682}; 2026-02-21T09:47:27.4332586Z // end inline asm 2026-02-21T09:47:27.4332643Z // begin inline asm 2026-02-21T09:47:27.4332935Z @%p99 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r86 + 16], {%r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682}; 2026-02-21T09:47:27.4332992Z // end inline asm 2026-02-21T09:47:27.4333046Z // begin inline asm 2026-02-21T09:47:27.4333329Z @%p99 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r86 + 32], {%r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682}; 2026-02-21T09:47:27.4333385Z // end inline asm 2026-02-21T09:47:27.4333461Z // begin inline asm 2026-02-21T09:47:27.4333732Z @%p99 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r86 + 48], 
{%r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682, %r682}; 2026-02-21T09:47:27.4333791Z // end inline asm 2026-02-21T09:47:27.4333845Z // begin inline asm 2026-02-21T09:47:27.4333921Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:27.4333975Z // end inline asm 2026-02-21T09:47:27.4334026Z bar.sync 0; 2026-02-21T09:47:27.4334194Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4334302Z add.s32 %r681, %r61, 360480; 2026-02-21T09:47:27.4334356Z // begin inline asm 2026-02-21T09:47:27.4334441Z @%p197 mbarrier.init.shared::cta.b64 [%r681], 1; 2026-02-21T09:47:27.4334501Z // end inline asm 2026-02-21T09:47:27.4334552Z bar.sync 0; 2026-02-21T09:47:27.4334610Z add.s32 %r155, %r61, 360488; 2026-02-21T09:47:27.4334664Z // begin inline asm 2026-02-21T09:47:27.4334790Z @%p197 mbarrier.init.shared::cta.b64 [%r155], 1; 2026-02-21T09:47:27.4334845Z // end inline asm 2026-02-21T09:47:27.4334902Z add.s32 %r156, %r61, 360448; 2026-02-21T09:47:27.4334966Z // begin inline asm 2026-02-21T09:47:27.4335044Z @%p197 mbarrier.init.shared::cta.b64 [%r156], 1; 2026-02-21T09:47:27.4335096Z // end inline asm 2026-02-21T09:47:27.4335155Z bar.sync 0; 2026-02-21T09:47:27.4335209Z add.s32 %r157, %r61, 360456; 2026-02-21T09:47:27.4335262Z // begin inline asm 2026-02-21T09:47:27.4335339Z @%p197 mbarrier.init.shared::cta.b64 [%r157], 1; 2026-02-21T09:47:27.4335400Z // end inline asm 2026-02-21T09:47:27.4335454Z bar.sync 0; 2026-02-21T09:47:27.4335509Z add.s32 %r158, %r61, 360464; 2026-02-21T09:47:27.4335572Z // begin inline asm 2026-02-21T09:47:27.4335652Z @%p197 mbarrier.init.shared::cta.b64 [%r158], 1; 2026-02-21T09:47:27.4335706Z // end inline asm 2026-02-21T09:47:27.4335759Z bar.sync 0; 2026-02-21T09:47:27.4335851Z add.s32 %r310, %r61, 360472; 2026-02-21T09:47:27.4335909Z // begin inline asm 2026-02-21T09:47:27.4335986Z @%p197 mbarrier.init.shared::cta.b64 [%r310], 1; 
2026-02-21T09:47:27.4336049Z // end inline asm 2026-02-21T09:47:27.4336111Z setp.lt.s32 %p79, %r5, 1; 2026-02-21T09:47:27.4336169Z setp.gt.s32 %p78, %r5, 0; 2026-02-21T09:47:27.4336344Z .loc 1 39 35 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:39:35 2026-02-21T09:47:27.4336412Z mul.hi.u32 %r207, %r680, 715827883; 2026-02-21T09:47:27.4336471Z shr.u32 %r208, %r207, 7; 2026-02-21T09:47:27.4336634Z .loc 1 40 33 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:40:33 2026-02-21T09:47:27.4336702Z shl.b32 %r209, %r208, 2; 2026-02-21T09:47:27.4336864Z .loc 1 41 39 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:41:39 2026-02-21T09:47:27.4336921Z sub.s32 %r210, 8, %r209; 2026-02-21T09:47:27.4337095Z .loc 1 41 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:41:52 2026-02-21T09:47:27.4337150Z min.s32 %r211, %r210, 4; 2026-02-21T09:47:27.4337338Z .loc 1 42 45 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:45 2026-02-21T09:47:27.4337406Z mul.lo.s32 %r212, %r208, 768; 2026-02-21T09:47:27.4337462Z sub.s32 %r213, %r680, %r212; 2026-02-21T09:47:27.4337622Z .loc 1 43 51 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:43:51 2026-02-21T09:47:27.4337679Z div.s32 %r214, %r213, %r211; 2026-02-21T09:47:27.4337843Z .loc 1 42 64 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:64 2026-02-21T09:47:27.4337906Z mul.lo.s32 %r215, %r214, %r211; 2026-02-21T09:47:27.4337962Z sub.s32 %r216, %r213, %r215; 2026-02-21T09:47:27.4338128Z .loc 1 42 30 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:30 2026-02-21T09:47:27.4338183Z add.s32 %r217, %r216, %r209; 2026-02-21T09:47:27.4338370Z .loc 1 44 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:44:27 2026-02-21T09:47:27.4338436Z shl.b32 %r661, %r217, 8; 2026-02-21T09:47:27.4338594Z .loc 1 45 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:45:27 2026-02-21T09:47:27.4338649Z shl.b32 %r659, %r214, 6; 2026-02-21T09:47:27.4338816Z 
.loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4338869Z bar.sync 0; 2026-02-21T09:47:27.4338931Z and.pred %p69, %p197, %p78; 2026-02-21T09:47:27.4338985Z // begin inline asm 2026-02-21T09:47:27.4339102Z @%p69 mbarrier.arrive.expect_tx.shared.b64 _, [%r156], 81920; 2026-02-21T09:47:27.4339188Z // end inline asm 2026-02-21T09:47:27.4339342Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4339404Z // begin inline asm 2026-02-21T09:47:27.4339474Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.4339529Z // end inline asm 2026-02-21T09:47:27.4339581Z bar.sync 0; 2026-02-21T09:47:27.4339650Z elect.sync %r218|%p80, -1; 2026-02-21T09:47:27.4339713Z and.pred %p81, %p78, %p80; 2026-02-21T09:47:27.4339771Z setp.lt.u32 %p82, %r1, 64; 2026-02-21T09:47:27.4339838Z and.pred %p70, %p82, %p81; 2026-02-21T09:47:27.4339892Z and.b32 %r219, %r6, 1; 2026-02-21T09:47:27.4339949Z shl.b32 %r10, %r219, 14; 2026-02-21T09:47:27.4340012Z shl.b32 %r220, %r219, 15; 2026-02-21T09:47:27.4340067Z add.s32 %r161, %r61, %r220; 2026-02-21T09:47:27.4340122Z shl.b32 %r162, %r219, 6; 2026-02-21T09:47:27.4340175Z // begin inline asm 2026-02-21T09:47:27.4340431Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r161], [%rd58, {%r162, %r661}], [%r156]; 2026-02-21T09:47:27.4340485Z // end inline asm 2026-02-21T09:47:27.4340639Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4340699Z bar.sync 0; 2026-02-21T09:47:27.4340781Z elect.sync %r221|%p83, -1; 2026-02-21T09:47:27.4340840Z and.pred %p84, %p78, %p83; 2026-02-21T09:47:27.4340906Z and.pred %p71, %p82, %p84; 2026-02-21T09:47:27.4340962Z shl.b32 %r13, %r219, 12; 2026-02-21T09:47:27.4341016Z shl.b32 %r222, %r219, 13; 2026-02-21T09:47:27.4341073Z add.s32 %r223, %r61, %r222; 2026-02-21T09:47:27.4341140Z add.s32 %r165, %r223, 262144; 2026-02-21T09:47:27.4341194Z // begin inline 
asm 2026-02-21T09:47:27.4341429Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r165], [%rd59, {%r162, %r659}], [%r156]; 2026-02-21T09:47:27.4341492Z // end inline asm 2026-02-21T09:47:27.4341659Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4341721Z setp.gt.s32 %p85, %r5, 1; 2026-02-21T09:47:27.4341781Z bar.sync 0; 2026-02-21T09:47:27.4341842Z and.pred %p72, %p197, %p85; 2026-02-21T09:47:27.4341911Z // begin inline asm 2026-02-21T09:47:27.4342025Z @%p72 mbarrier.arrive.expect_tx.shared.b64 _, [%r157], 81920; 2026-02-21T09:47:27.4342088Z // end inline asm 2026-02-21T09:47:27.4342270Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4342323Z bar.sync 0; 2026-02-21T09:47:27.4342388Z elect.sync %r224|%p86, -1; 2026-02-21T09:47:27.4342446Z and.pred %p87, %p85, %p86; 2026-02-21T09:47:27.4342503Z and.pred %p73, %p82, %p87; 2026-02-21T09:47:27.4342558Z add.s32 %r170, %r161, 65536; 2026-02-21T09:47:27.4342619Z or.b32 %r171, %r162, 128; 2026-02-21T09:47:27.4342673Z // begin inline asm 2026-02-21T09:47:27.4342903Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r170], [%rd58, {%r171, %r661}], [%r157]; 2026-02-21T09:47:27.4342965Z // end inline asm 2026-02-21T09:47:27.4343124Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4343174Z bar.sync 0; 2026-02-21T09:47:27.4343262Z elect.sync %r225|%p88, -1; 2026-02-21T09:47:27.4343321Z and.pred %p89, %p85, %p88; 2026-02-21T09:47:27.4343380Z and.pred %p74, %p82, %p89; 2026-02-21T09:47:27.4343437Z add.s32 %r174, %r223, 278528; 2026-02-21T09:47:27.4343498Z // begin inline asm 2026-02-21T09:47:27.4343723Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r174], [%rd59, {%r171, %r659}], [%r157]; 2026-02-21T09:47:27.4343776Z // end inline asm 2026-02-21T09:47:27.4343945Z .loc 1 33 84 
// cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4344002Z setp.gt.s32 %p90, %r5, 2; 2026-02-21T09:47:27.4344076Z bar.sync 0; 2026-02-21T09:47:27.4344142Z and.pred %p75, %p197, %p90; 2026-02-21T09:47:27.4344197Z // begin inline asm 2026-02-21T09:47:27.4344304Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r158], 81920; 2026-02-21T09:47:27.4344357Z // end inline asm 2026-02-21T09:47:27.4344524Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4344578Z bar.sync 0; 2026-02-21T09:47:27.4344638Z elect.sync %r226|%p91, -1; 2026-02-21T09:47:27.4344746Z and.pred %p92, %p90, %p91; 2026-02-21T09:47:27.4344806Z and.pred %p76, %p82, %p92; 2026-02-21T09:47:27.4344866Z add.s32 %r179, %r161, 131072; 2026-02-21T09:47:27.4344924Z or.b32 %r180, %r162, 256; 2026-02-21T09:47:27.4344988Z // begin inline asm 2026-02-21T09:47:27.4345237Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r179], [%rd58, {%r180, %r661}], [%r158]; 2026-02-21T09:47:27.4345292Z // end inline asm 2026-02-21T09:47:27.4345473Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4345528Z bar.sync 0; 2026-02-21T09:47:27.4345588Z elect.sync %r227|%p93, -1; 2026-02-21T09:47:27.4345656Z and.pred %p94, %p90, %p93; 2026-02-21T09:47:27.4345716Z and.pred %p77, %p82, %p94; 2026-02-21T09:47:27.4345803Z add.s32 %r183, %r223, 294912; 2026-02-21T09:47:27.4345864Z // begin inline asm 2026-02-21T09:47:27.4346125Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r183], [%rd59, {%r180, %r659}], [%r158]; 2026-02-21T09:47:27.4346184Z // end inline asm 2026-02-21T09:47:27.4346353Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4346418Z bar.sync 0; 2026-02-21T09:47:27.4346477Z // begin inline asm 2026-02-21T09:47:27.4346531Z 2026-02-21T09:47:27.4346592Z { 2026-02-21T09:47:27.4346658Z 
@!%p78 bra.uni skipWait; 2026-02-21T09:47:27.4346720Z .reg .pred complete; 2026-02-21T09:47:27.4346777Z waitLoop: 2026-02-21T09:47:27.4346907Z mbarrier.try_wait.parity.shared.b64 complete, [%r156], %r682; 2026-02-21T09:47:27.4346973Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.4347029Z skipWait: 2026-02-21T09:47:27.4347088Z } 2026-02-21T09:47:27.4347094Z 2026-02-21T09:47:27.4347150Z // end inline asm 2026-02-21T09:47:27.4347322Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4347417Z setp.ne.b32 %p95, %r6, 0; 2026-02-21T09:47:27.4347478Z or.pred %p96, %p79, %p95; 2026-02-21T09:47:27.4347536Z @%p96 bra $L__BB0_2; 2026-02-21T09:47:27.4347590Z // %bb.1: 2026-02-21T09:47:27.4347661Z elect.sync %r260|%p98, -1; 2026-02-21T09:47:27.4347724Z bfe.u32 %r262, %r61, 4, 14; 2026-02-21T09:47:27.4347784Z cvt.u64.u32 %rd101, %r262; 2026-02-21T09:47:27.4347865Z or.b64 %rd68, %rd101, 4611686293439512576; 2026-02-21T09:47:27.4347927Z add.s32 %r263, %r61, 262144; 2026-02-21T09:47:27.4347989Z bfe.u32 %r264, %r263, 4, 14; 2026-02-21T09:47:27.4348047Z cvt.u64.u32 %rd102, %r264; 2026-02-21T09:47:27.4348126Z or.b64 %rd69, %rd102, 4611686293338849280; 2026-02-21T09:47:27.4348185Z mov.b32 %r229, 135266320; 2026-02-21T09:47:27.4348244Z mov.pred %p97, 0; 2026-02-21T09:47:27.4348307Z // begin inline asm 2026-02-21T09:47:27.4348480Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd68, %rd69, %r229, %p97; 2026-02-21T09:47:27.4348537Z // end inline asm 2026-02-21T09:47:27.4348603Z add.s32 %r265, %r61, 32; 2026-02-21T09:47:27.4348664Z bfe.u32 %r266, %r265, 4, 14; 2026-02-21T09:47:27.4348723Z cvt.u64.u32 %rd103, %r266; 2026-02-21T09:47:27.4348793Z or.b64 %rd70, %rd103, 4611686293439512576; 2026-02-21T09:47:27.4348861Z add.s32 %r267, %r61, 262176; 2026-02-21T09:47:27.4348919Z bfe.u32 %r268, %r267, 4, 14; 2026-02-21T09:47:27.4348978Z cvt.u64.u32 %rd104, %r268; 2026-02-21T09:47:27.4349053Z or.b64 %rd71, %rd104, 4611686293338849280; 
2026-02-21T09:47:27.4349110Z // begin inline asm 2026-02-21T09:47:27.4349284Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd70, %rd71, %r229, %p99; 2026-02-21T09:47:27.4349340Z // end inline asm 2026-02-21T09:47:27.4349407Z add.s32 %r269, %r61, 64; 2026-02-21T09:47:27.4349465Z bfe.u32 %r270, %r269, 4, 14; 2026-02-21T09:47:27.4349524Z cvt.u64.u32 %rd105, %r270; 2026-02-21T09:47:27.4349603Z or.b64 %rd72, %rd105, 4611686293439512576; 2026-02-21T09:47:27.4349662Z add.s32 %r271, %r61, 262208; 2026-02-21T09:47:27.4349722Z bfe.u32 %r272, %r271, 4, 14; 2026-02-21T09:47:27.4349781Z cvt.u64.u32 %rd106, %r272; 2026-02-21T09:47:27.4349860Z or.b64 %rd73, %rd106, 4611686293338849280; 2026-02-21T09:47:27.4349919Z // begin inline asm 2026-02-21T09:47:27.4350056Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd72, %rd73, %r229, %p99; 2026-02-21T09:47:27.4350119Z // end inline asm 2026-02-21T09:47:27.4350178Z add.s32 %r273, %r61, 96; 2026-02-21T09:47:27.4350235Z bfe.u32 %r274, %r273, 4, 14; 2026-02-21T09:47:27.4350304Z cvt.u64.u32 %rd107, %r274; 2026-02-21T09:47:27.4350372Z or.b64 %rd74, %rd107, 4611686293439512576; 2026-02-21T09:47:27.4350429Z add.s32 %r275, %r61, 262240; 2026-02-21T09:47:27.4350486Z bfe.u32 %r276, %r275, 4, 14; 2026-02-21T09:47:27.4350552Z cvt.u64.u32 %rd108, %r276; 2026-02-21T09:47:27.4350620Z or.b64 %rd75, %rd108, 4611686293338849280; 2026-02-21T09:47:27.4350701Z // begin inline asm 2026-02-21T09:47:27.4350842Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd74, %rd75, %r229, %p99; 2026-02-21T09:47:27.4350900Z // end inline asm 2026-02-21T09:47:27.4350959Z add.s32 %r277, %r61, 32768; 2026-02-21T09:47:27.4351024Z bfe.u32 %r278, %r277, 4, 14; 2026-02-21T09:47:27.4351082Z cvt.u64.u32 %rd109, %r278; 2026-02-21T09:47:27.4351148Z or.b64 %rd76, %rd109, 4611686293439512576; 2026-02-21T09:47:27.4351204Z add.s32 %r279, %r61, 270336; 2026-02-21T09:47:27.4351269Z bfe.u32 %r280, %r279, 4, 14; 2026-02-21T09:47:27.4351326Z cvt.u64.u32 %rd110, 
%r280; 2026-02-21T09:47:27.4351394Z or.b64 %rd77, %rd110, 4611686293338849280; 2026-02-21T09:47:27.4351456Z // begin inline asm 2026-02-21T09:47:27.4351587Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd76, %rd77, %r229, %p99; 2026-02-21T09:47:27.4351643Z // end inline asm 2026-02-21T09:47:27.4351700Z add.s32 %r281, %r61, 32800; 2026-02-21T09:47:27.4351766Z bfe.u32 %r282, %r281, 4, 14; 2026-02-21T09:47:27.4351825Z cvt.u64.u32 %rd111, %r282; 2026-02-21T09:47:27.4351891Z or.b64 %rd78, %rd111, 4611686293439512576; 2026-02-21T09:47:27.4351978Z add.s32 %r283, %r61, 270368; 2026-02-21T09:47:27.4352037Z bfe.u32 %r284, %r283, 4, 14; 2026-02-21T09:47:27.4352096Z cvt.u64.u32 %rd112, %r284; 2026-02-21T09:47:27.4352171Z or.b64 %rd79, %rd112, 4611686293338849280; 2026-02-21T09:47:27.4352228Z // begin inline asm 2026-02-21T09:47:27.4352359Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd78, %rd79, %r229, %p99; 2026-02-21T09:47:27.4352413Z // end inline asm 2026-02-21T09:47:27.4352479Z add.s32 %r285, %r61, 32832; 2026-02-21T09:47:27.4352538Z bfe.u32 %r286, %r285, 4, 14; 2026-02-21T09:47:27.4352596Z cvt.u64.u32 %rd113, %r286; 2026-02-21T09:47:27.4352670Z or.b64 %rd80, %rd113, 4611686293439512576; 2026-02-21T09:47:27.4352727Z add.s32 %r287, %r61, 270400; 2026-02-21T09:47:27.4352784Z bfe.u32 %r288, %r287, 4, 14; 2026-02-21T09:47:27.4352843Z cvt.u64.u32 %rd114, %r288; 2026-02-21T09:47:27.4352952Z or.b64 %rd81, %rd114, 4611686293338849280; 2026-02-21T09:47:27.4353009Z // begin inline asm 2026-02-21T09:47:27.4353152Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd80, %rd81, %r229, %p99; 2026-02-21T09:47:27.4353212Z // end inline asm 2026-02-21T09:47:27.4353268Z add.s32 %r289, %r61, 32864; 2026-02-21T09:47:27.4353322Z bfe.u32 %r290, %r289, 4, 14; 2026-02-21T09:47:27.4353383Z cvt.u64.u32 %rd115, %r290; 2026-02-21T09:47:27.4353447Z or.b64 %rd82, %rd115, 4611686293439512576; 2026-02-21T09:47:27.4353501Z add.s32 %r291, %r61, 270432; 
2026-02-21T09:47:27.4353554Z bfe.u32 %r292, %r291, 4, 14; 2026-02-21T09:47:27.4353641Z cvt.u64.u32 %rd116, %r292; 2026-02-21T09:47:27.4353706Z or.b64 %rd83, %rd116, 4611686293338849280; 2026-02-21T09:47:27.4353759Z // begin inline asm 2026-02-21T09:47:27.4353891Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd82, %rd83, %r229, %p99; 2026-02-21T09:47:27.4353943Z // end inline asm 2026-02-21T09:47:27.4354000Z add.s32 %r293, %r61, 16384; 2026-02-21T09:47:27.4354055Z bfe.u32 %r294, %r293, 4, 14; 2026-02-21T09:47:27.4354118Z cvt.u64.u32 %rd117, %r294; 2026-02-21T09:47:27.4354182Z or.b64 %rd84, %rd117, 4611686293439512576; 2026-02-21T09:47:27.4354235Z // begin inline asm 2026-02-21T09:47:27.4354371Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd84, %rd69, %r229, %p97; 2026-02-21T09:47:27.4354423Z // end inline asm 2026-02-21T09:47:27.4354480Z add.s32 %r295, %r61, 16416; 2026-02-21T09:47:27.4354541Z bfe.u32 %r296, %r295, 4, 14; 2026-02-21T09:47:27.4354597Z cvt.u64.u32 %rd118, %r296; 2026-02-21T09:47:27.4354660Z or.b64 %rd86, %rd118, 4611686293439512576; 2026-02-21T09:47:27.4354752Z // begin inline asm 2026-02-21T09:47:27.4354887Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd86, %rd71, %r229, %p99; 2026-02-21T09:47:27.4354940Z // end inline asm 2026-02-21T09:47:27.4354996Z add.s32 %r297, %r61, 16448; 2026-02-21T09:47:27.4355082Z bfe.u32 %r298, %r297, 4, 14; 2026-02-21T09:47:27.4355141Z cvt.u64.u32 %rd119, %r298; 2026-02-21T09:47:27.4355206Z or.b64 %rd88, %rd119, 4611686293439512576; 2026-02-21T09:47:27.4355268Z // begin inline asm 2026-02-21T09:47:27.4355395Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd88, %rd73, %r229, %p99; 2026-02-21T09:47:27.4355447Z // end inline asm 2026-02-21T09:47:27.4355502Z add.s32 %r299, %r61, 16480; 2026-02-21T09:47:27.4355564Z bfe.u32 %r300, %r299, 4, 14; 2026-02-21T09:47:27.4355619Z cvt.u64.u32 %rd120, %r300; 2026-02-21T09:47:27.4355682Z or.b64 %rd90, %rd120, 4611686293439512576; 
2026-02-21T09:47:27.4355744Z // begin inline asm 2026-02-21T09:47:27.4355871Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd90, %rd75, %r229, %p99; 2026-02-21T09:47:27.4355925Z // end inline asm 2026-02-21T09:47:27.4355988Z add.s32 %r301, %r61, 49152; 2026-02-21T09:47:27.4356045Z bfe.u32 %r302, %r301, 4, 14; 2026-02-21T09:47:27.4356101Z cvt.u64.u32 %rd121, %r302; 2026-02-21T09:47:27.4356168Z or.b64 %rd92, %rd121, 4611686293439512576; 2026-02-21T09:47:27.4356232Z // begin inline asm 2026-02-21T09:47:27.4356382Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd92, %rd77, %r229, %p99; 2026-02-21T09:47:27.4356435Z // end inline asm 2026-02-21T09:47:27.4356498Z add.s32 %r303, %r61, 49184; 2026-02-21T09:47:27.4356554Z bfe.u32 %r304, %r303, 4, 14; 2026-02-21T09:47:27.4356611Z cvt.u64.u32 %rd122, %r304; 2026-02-21T09:47:27.4356676Z or.b64 %rd94, %rd122, 4611686293439512576; 2026-02-21T09:47:27.4356741Z // begin inline asm 2026-02-21T09:47:27.4356868Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd94, %rd79, %r229, %p99; 2026-02-21T09:47:27.4356926Z // end inline asm 2026-02-21T09:47:27.4356990Z add.s32 %r305, %r61, 49216; 2026-02-21T09:47:27.4357044Z bfe.u32 %r306, %r305, 4, 14; 2026-02-21T09:47:27.4357100Z cvt.u64.u32 %rd123, %r306; 2026-02-21T09:47:27.4357172Z or.b64 %rd96, %rd123, 4611686293439512576; 2026-02-21T09:47:27.4357227Z // begin inline asm 2026-02-21T09:47:27.4357376Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd96, %rd81, %r229, %p99; 2026-02-21T09:47:27.4357432Z // end inline asm 2026-02-21T09:47:27.4357493Z add.s32 %r307, %r61, 49248; 2026-02-21T09:47:27.4357548Z bfe.u32 %r308, %r307, 4, 14; 2026-02-21T09:47:27.4357603Z cvt.u64.u32 %rd124, %r308; 2026-02-21T09:47:27.4357672Z or.b64 %rd98, %rd124, 4611686293439512576; 2026-02-21T09:47:27.4357726Z // begin inline asm 2026-02-21T09:47:27.4357849Z @%p98 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd98, %rd83, %r229, %p99; 2026-02-21T09:47:27.4357908Z 
// end inline asm 2026-02-21T09:47:27.4357962Z add.s32 %r309, %r61, 360480; 2026-02-21T09:47:27.4358042Z cvt.u64.u32 %rd100, %r309; 2026-02-21T09:47:27.4358097Z // begin inline asm 2026-02-21T09:47:27.4358230Z @%p98 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd100]; 2026-02-21T09:47:27.4358284Z // end inline asm 2026-02-21T09:47:27.4358335Z $L__BB0_2: 2026-02-21T09:47:27.4358515Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4358574Z setp.gt.s32 %p133, %r5, 3; 2026-02-21T09:47:27.4358627Z bar.sync 0; 2026-02-21T09:47:27.4358689Z and.pred %p130, %p197, %p133; 2026-02-21T09:47:27.4358751Z // begin inline asm 2026-02-21T09:47:27.4358863Z @%p130 mbarrier.arrive.expect_tx.shared.b64 _, [%r310], 81920; 2026-02-21T09:47:27.4358916Z // end inline asm 2026-02-21T09:47:27.4359086Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4359138Z bar.sync 0; 2026-02-21T09:47:27.4359200Z elect.sync %r322|%p136, -1; 2026-02-21T09:47:27.4359270Z and.pred %p137, %p133, %p136; 2026-02-21T09:47:27.4359330Z and.pred %p131, %p82, %p137; 2026-02-21T09:47:27.4359387Z shl.b32 %r323, %r10, 1; 2026-02-21T09:47:27.4359441Z add.s32 %r324, %r61, %r323; 2026-02-21T09:47:27.4359505Z add.s32 %r311, %r324, 196608; 2026-02-21T09:47:27.4359582Z or.b32 %r312, %r162, 384; 2026-02-21T09:47:27.4359640Z // begin inline asm 2026-02-21T09:47:27.4359887Z @%p131 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r311], [%rd58, {%r312, %r661}], [%r310]; 2026-02-21T09:47:27.4359942Z // end inline asm 2026-02-21T09:47:27.4360107Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4360166Z bar.sync 0; 2026-02-21T09:47:27.4360226Z elect.sync %r325|%p138, -1; 2026-02-21T09:47:27.4360284Z and.pred %p139, %p133, %p138; 2026-02-21T09:47:27.4360343Z and.pred %p132, %p82, %p139; 2026-02-21T09:47:27.4360408Z shl.b32 %r326, %r13, 1; 
2026-02-21T09:47:27.4360464Z add.s32 %r327, %r61, %r326; 2026-02-21T09:47:27.4360520Z add.s32 %r315, %r327, 311296; 2026-02-21T09:47:27.4360582Z // begin inline asm 2026-02-21T09:47:27.4360816Z @%p132 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r315], [%rd59, {%r312, %r659}], [%r310]; 2026-02-21T09:47:27.4360869Z // end inline asm 2026-02-21T09:47:27.4361039Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4361119Z sub.s32 %r15, 15, %r4; 2026-02-21T09:47:27.4361178Z setp.lt.s32 %p140, %r15, 1; 2026-02-21T09:47:27.4361235Z @%p140 bra $L__BB0_11; 2026-02-21T09:47:27.4361316Z // %bb.3: // %.lr.ph 2026-02-21T09:47:27.4361481Z .loc 1 0 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:0:84 2026-02-21T09:47:27.4361536Z sub.s32 %r16, 12, %r4; 2026-02-21T09:47:27.4361597Z shl.b32 %r334, %r1, 7; 2026-02-21T09:47:27.4361653Z and.b32 %r335, %r334, 32640; 2026-02-21T09:47:27.4361707Z shl.b32 %r336, %r1, 4; 2026-02-21T09:47:27.4361762Z and.b32 %r337, %r336, 112; 2026-02-21T09:47:27.4361826Z or.b32 %r338, %r335, %r337; 2026-02-21T09:47:27.4361881Z add.s32 %r340, %r61, 327680; 2026-02-21T09:47:27.4361936Z add.s32 %r17, %r340, %r338; 2026-02-21T09:47:27.4361998Z xor.b32 %r341, %r338, 16; 2026-02-21T09:47:27.4362073Z add.s32 %r18, %r340, %r341; 2026-02-21T09:47:27.4362129Z xor.b32 %r342, %r338, 32; 2026-02-21T09:47:27.4362192Z add.s32 %r19, %r340, %r342; 2026-02-21T09:47:27.4362248Z xor.b32 %r343, %r338, 48; 2026-02-21T09:47:27.4362303Z add.s32 %r20, %r340, %r343; 2026-02-21T09:47:27.4362357Z xor.b32 %r344, %r338, 64; 2026-02-21T09:47:27.4362418Z add.s32 %r21, %r340, %r344; 2026-02-21T09:47:27.4362472Z xor.b32 %r345, %r338, 80; 2026-02-21T09:47:27.4362526Z add.s32 %r22, %r340, %r345; 2026-02-21T09:47:27.4362587Z xor.b32 %r346, %r338, 96; 2026-02-21T09:47:27.4362641Z add.s32 %r23, %r340, %r346; 2026-02-21T09:47:27.4362718Z xor.b32 %r347, %r338, 112; 2026-02-21T09:47:27.4362773Z add.s32 
%r24, %r340, %r347; 2026-02-21T09:47:27.4362837Z add.s32 %r667, %r61, 360480; 2026-02-21T09:47:27.4362895Z mov.pred %p204, -1; 2026-02-21T09:47:27.4362948Z mov.b32 %r670, 3; 2026-02-21T09:47:27.4363007Z mov.b32 %r666, 0; 2026-02-21T09:47:27.4363063Z mov.b32 %r665, 384; 2026-02-21T09:47:27.4363118Z mov.b32 %r664, 1; 2026-02-21T09:47:27.4363171Z mov.b32 %r663, 2; 2026-02-21T09:47:27.4363235Z mov.b32 %r660, %r659; 2026-02-21T09:47:27.4363289Z mov.b32 %r662, %r661; 2026-02-21T09:47:27.4363342Z mov.b32 %r668, %r666; 2026-02-21T09:47:27.4363403Z mov.b32 %r669, %r666; 2026-02-21T09:47:27.4363457Z mov.b32 %r671, %r664; 2026-02-21T09:47:27.4363509Z mov.b32 %r672, %r666; 2026-02-21T09:47:27.4363563Z mov.b32 %r673, %r659; 2026-02-21T09:47:27.4363624Z mov.b32 %r674, %r661; 2026-02-21T09:47:27.4363676Z mov.b32 %r676, %r670; 2026-02-21T09:47:27.4363730Z mov.b32 %r677, %r666; 2026-02-21T09:47:27.4363796Z mov.b32 %r678, %r674; 2026-02-21T09:47:27.4363849Z mov.b32 %r679, %r673; 2026-02-21T09:47:27.4363905Z bra.uni $L__BB0_4; 2026-02-21T09:47:27.4364012Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4364085Z selp.b32 %r671, 0, %r467, %p186; 2026-02-21T09:47:27.4364143Z selp.b32 %r468, 1, 0, %p186; 2026-02-21T09:47:27.4364221Z xor.b32 %r672, %r682, %r468; 2026-02-21T09:47:27.4364392Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4364450Z add.s32 %r677, %r677, 1; 2026-02-21T09:47:27.4364514Z setp.lt.s32 %p195, %r677, %r15; 2026-02-21T09:47:27.4364575Z mov.b32 %r659, %r673; 2026-02-21T09:47:27.4364628Z mov.b32 %r660, %r25; 2026-02-21T09:47:27.4364722Z mov.b32 %r661, %r674; 2026-02-21T09:47:27.4364777Z mov.b32 %r662, %r27; 2026-02-21T09:47:27.4364837Z mov.b32 %r663, %r676; 2026-02-21T09:47:27.4364889Z mov.b32 %r664, %r29; 2026-02-21T09:47:27.4364941Z mov.b32 %r666, %r682; 2026-02-21T09:47:27.4365005Z mov.b32 %r667, %r681; 2026-02-21T09:47:27.4365057Z mov.b32 %r673, %r679; 2026-02-21T09:47:27.4365110Z mov.b32 %r674, 
%r678; 2026-02-21T09:47:27.4365163Z mov.b32 %r676, %r44; 2026-02-21T09:47:27.4365228Z @%p195 bra $L__BB0_4; 2026-02-21T09:47:27.4365282Z bra.uni $L__BB0_11; 2026-02-21T09:47:27.4365385Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:27.4365555Z .loc 1 0 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:0:84 2026-02-21T09:47:27.4365648Z mov.b32 %r682, %r672; 2026-02-21T09:47:27.4365702Z mov.b32 %r29, %r663; 2026-02-21T09:47:27.4365756Z mov.b32 %r27, %r661; 2026-02-21T09:47:27.4365820Z mov.b32 %r25, %r659; 2026-02-21T09:47:27.4365985Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4366041Z add.s32 %r348, %r676, 1; 2026-02-21T09:47:27.4366109Z setp.eq.b32 %p142, %r676, 15; 2026-02-21T09:47:27.4366172Z selp.b32 %r44, 0, %r348, %p142; 2026-02-21T09:47:27.4366232Z setp.ne.b32 %p143, %r44, 0; 2026-02-21T09:47:27.4366296Z @%p143 bra $L__BB0_6; 2026-02-21T09:47:27.4366392Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4366449Z add.s32 %r680, %r680, 1; 2026-02-21T09:47:27.4366639Z .loc 1 39 35 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:39:35 2026-02-21T09:47:27.4366715Z mul.hi.s32 %r349, %r680, 715827883; 2026-02-21T09:47:27.4366772Z shr.u32 %r350, %r349, 31; 2026-02-21T09:47:27.4366827Z shr.s32 %r351, %r349, 7; 2026-02-21T09:47:27.4366893Z add.s32 %r352, %r351, %r350; 2026-02-21T09:47:27.4367055Z .loc 1 40 33 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:40:33 2026-02-21T09:47:27.4367111Z shl.b32 %r353, %r352, 2; 2026-02-21T09:47:27.4367283Z .loc 1 41 39 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:41:39 2026-02-21T09:47:27.4367371Z sub.s32 %r354, 8, %r353; 2026-02-21T09:47:27.4367530Z .loc 1 41 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:41:52 2026-02-21T09:47:27.4367585Z min.s32 %r355, %r354, 4; 2026-02-21T09:47:27.4367755Z .loc 1 42 45 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:45 
2026-02-21T09:47:27.4367816Z mul.lo.s32 %r356, %r352, 768; 2026-02-21T09:47:27.4367873Z sub.s32 %r357, %r680, %r356; 2026-02-21T09:47:27.4368043Z .loc 1 43 51 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:43:51 2026-02-21T09:47:27.4368098Z div.s32 %r358, %r357, %r355; 2026-02-21T09:47:27.4368258Z .loc 1 42 64 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:64 2026-02-21T09:47:27.4368323Z mul.lo.s32 %r359, %r358, %r355; 2026-02-21T09:47:27.4368377Z sub.s32 %r360, %r357, %r359; 2026-02-21T09:47:27.4368540Z .loc 1 42 30 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:42:30 2026-02-21T09:47:27.4368603Z add.s32 %r361, %r360, %r353; 2026-02-21T09:47:27.4368764Z .loc 1 44 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:44:27 2026-02-21T09:47:27.4368819Z shl.b32 %r678, %r361, 8; 2026-02-21T09:47:27.4369005Z .loc 1 45 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:45:27 2026-02-21T09:47:27.4369070Z shl.b32 %r679, %r358, 6; 2026-02-21T09:47:27.4369166Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4369330Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4369392Z add.s32 %r364, %r669, 1; 2026-02-21T09:47:27.4369449Z setp.gt.s32 %p145, %r364, 3; 2026-02-21T09:47:27.4369508Z selp.b32 %r669, 0, %r364, %p145; 2026-02-21T09:47:27.4369572Z selp.b32 %r365, 1, 0, %p145; 2026-02-21T09:47:27.4369627Z xor.b32 %r668, %r668, %r365; 2026-02-21T09:47:27.4369681Z shl.b32 %r366, %r669, 3; 2026-02-21T09:47:27.4369738Z add.s32 %r368, %r61, %r366; 2026-02-21T09:47:27.4369800Z add.s32 %r362, %r368, 360448; 2026-02-21T09:47:27.4369853Z bar.sync 0; 2026-02-21T09:47:27.4369907Z // begin inline asm 2026-02-21T09:47:27.4369963Z 2026-02-21T09:47:27.4370010Z { 2026-02-21T09:47:27.4370070Z .reg .pred complete; 2026-02-21T09:47:27.4370124Z waitLoop: 2026-02-21T09:47:27.4370249Z mbarrier.try_wait.parity.shared.b64 complete, [%r362], %r668; 2026-02-21T09:47:27.4370356Z 
@!complete bra.uni waitLoop; 2026-02-21T09:47:27.4370404Z } 2026-02-21T09:47:27.4370409Z 2026-02-21T09:47:27.4370467Z // end inline asm 2026-02-21T09:47:27.4370522Z shl.b32 %r369, %r671, 3; 2026-02-21T09:47:27.4370577Z add.s32 %r370, %r61, %r369; 2026-02-21T09:47:27.4370631Z add.s32 %r681, %r370, 360480; 2026-02-21T09:47:27.4370805Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4370860Z @%p95 bra $L__BB0_8; 2026-02-21T09:47:27.4370953Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4371125Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4371181Z shl.b32 %r403, %r669, 16; 2026-02-21T09:47:27.4371237Z add.s32 %r405, %r61, %r403; 2026-02-21T09:47:27.4371426Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4371484Z shl.b32 %r406, %r669, 14; 2026-02-21T09:47:27.4371540Z add.s32 %r407, %r61, %r406; 2026-02-21T09:47:27.4371596Z add.s32 %r408, %r407, 262144; 2026-02-21T09:47:27.4371767Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4371831Z elect.sync %r409|%p147, -1; 2026-02-21T09:47:27.4371888Z bfe.u32 %r410, %r405, 4, 14; 2026-02-21T09:47:27.4371958Z cvt.u64.u32 %rd160, %r410; 2026-02-21T09:47:27.4372031Z or.b64 %rd127, %rd160, 4611686293439512576; 2026-02-21T09:47:27.4372112Z bfe.u32 %r411, %r408, 4, 14; 2026-02-21T09:47:27.4372187Z cvt.u64.u32 %rd161, %r411; 2026-02-21T09:47:27.4372257Z or.b64 %rd128, %rd161, 4611686293338849280; 2026-02-21T09:47:27.4372313Z mov.b32 %r372, 135266320; 2026-02-21T09:47:27.4372367Z // begin inline asm 2026-02-21T09:47:27.4372518Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd127, %rd128, %r372, %p204; 2026-02-21T09:47:27.4372572Z // end inline asm 2026-02-21T09:47:27.4372626Z add.s32 %r412, %r405, 32; 2026-02-21T09:47:27.4372690Z bfe.u32 %r413, %r412, 4, 14; 2026-02-21T09:47:27.4372744Z cvt.u64.u32 %rd162, 
%r413; 2026-02-21T09:47:27.4372810Z or.b64 %rd129, %rd162, 4611686293439512576; 2026-02-21T09:47:27.4372872Z add.s32 %r414, %r407, 262176; 2026-02-21T09:47:27.4372927Z bfe.u32 %r415, %r414, 4, 14; 2026-02-21T09:47:27.4372983Z cvt.u64.u32 %rd163, %r415; 2026-02-21T09:47:27.4373046Z or.b64 %rd130, %rd163, 4611686293338849280; 2026-02-21T09:47:27.4373112Z mov.pred %p148, -1; 2026-02-21T09:47:27.4373166Z // begin inline asm 2026-02-21T09:47:27.4373305Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd129, %rd130, %r372, %p148; 2026-02-21T09:47:27.4373363Z // end inline asm 2026-02-21T09:47:27.4373416Z add.s32 %r416, %r405, 64; 2026-02-21T09:47:27.4373493Z bfe.u32 %r417, %r416, 4, 14; 2026-02-21T09:47:27.4373551Z cvt.u64.u32 %rd164, %r417; 2026-02-21T09:47:27.4373623Z or.b64 %rd131, %rd164, 4611686293439512576; 2026-02-21T09:47:27.4373681Z add.s32 %r418, %r407, 262208; 2026-02-21T09:47:27.4373735Z bfe.u32 %r419, %r418, 4, 14; 2026-02-21T09:47:27.4373798Z cvt.u64.u32 %rd165, %r419; 2026-02-21T09:47:27.4373861Z or.b64 %rd132, %rd165, 4611686293338849280; 2026-02-21T09:47:27.4373915Z // begin inline asm 2026-02-21T09:47:27.4374053Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd131, %rd132, %r372, %p148; 2026-02-21T09:47:27.4374106Z // end inline asm 2026-02-21T09:47:27.4374161Z add.s32 %r420, %r405, 96; 2026-02-21T09:47:27.4374217Z bfe.u32 %r421, %r420, 4, 14; 2026-02-21T09:47:27.4374281Z cvt.u64.u32 %rd166, %r421; 2026-02-21T09:47:27.4374345Z or.b64 %rd133, %rd166, 4611686293439512576; 2026-02-21T09:47:27.4374399Z add.s32 %r422, %r407, 262240; 2026-02-21T09:47:27.4374460Z bfe.u32 %r423, %r422, 4, 14; 2026-02-21T09:47:27.4374516Z cvt.u64.u32 %rd167, %r423; 2026-02-21T09:47:27.4374582Z or.b64 %rd134, %rd167, 4611686293338849280; 2026-02-21T09:47:27.4374657Z // begin inline asm 2026-02-21T09:47:27.4374830Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd133, %rd134, %r372, %p148; 2026-02-21T09:47:27.4374884Z // end inline asm 
2026-02-21T09:47:27.4374939Z add.s32 %r424, %r405, 32768; 2026-02-21T09:47:27.4375000Z bfe.u32 %r425, %r424, 4, 14; 2026-02-21T09:47:27.4375056Z cvt.u64.u32 %rd168, %r425; 2026-02-21T09:47:27.4375120Z or.b64 %rd135, %rd168, 4611686293439512576; 2026-02-21T09:47:27.4375184Z add.s32 %r426, %r407, 270336; 2026-02-21T09:47:27.4375239Z bfe.u32 %r427, %r426, 4, 14; 2026-02-21T09:47:27.4375295Z cvt.u64.u32 %rd169, %r427; 2026-02-21T09:47:27.4375357Z or.b64 %rd136, %rd169, 4611686293338849280; 2026-02-21T09:47:27.4375419Z // begin inline asm 2026-02-21T09:47:27.4375548Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd135, %rd136, %r372, %p148; 2026-02-21T09:47:27.4375601Z // end inline asm 2026-02-21T09:47:27.4375689Z add.s32 %r428, %r405, 32800; 2026-02-21T09:47:27.4375745Z bfe.u32 %r429, %r428, 4, 14; 2026-02-21T09:47:27.4375804Z cvt.u64.u32 %rd170, %r429; 2026-02-21T09:47:27.4375875Z or.b64 %rd137, %rd170, 4611686293439512576; 2026-02-21T09:47:27.4375930Z add.s32 %r430, %r407, 270368; 2026-02-21T09:47:27.4375985Z bfe.u32 %r431, %r430, 4, 14; 2026-02-21T09:47:27.4376041Z cvt.u64.u32 %rd171, %r431; 2026-02-21T09:47:27.4376112Z or.b64 %rd138, %rd171, 4611686293338849280; 2026-02-21T09:47:27.4376166Z // begin inline asm 2026-02-21T09:47:27.4376297Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd137, %rd138, %r372, %p148; 2026-02-21T09:47:27.4376382Z // end inline asm 2026-02-21T09:47:27.4376438Z add.s32 %r432, %r405, 32832; 2026-02-21T09:47:27.4376493Z bfe.u32 %r433, %r432, 4, 14; 2026-02-21T09:47:27.4376549Z cvt.u64.u32 %rd172, %r433; 2026-02-21T09:47:27.4376620Z or.b64 %rd139, %rd172, 4611686293439512576; 2026-02-21T09:47:27.4376677Z add.s32 %r434, %r407, 270400; 2026-02-21T09:47:27.4376733Z bfe.u32 %r435, %r434, 4, 14; 2026-02-21T09:47:27.4376796Z cvt.u64.u32 %rd173, %r435; 2026-02-21T09:47:27.4376861Z or.b64 %rd140, %rd173, 4611686293338849280; 2026-02-21T09:47:27.4376916Z // begin inline asm 2026-02-21T09:47:27.4377054Z @%p147 
tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd139, %rd140, %r372, %p148; 2026-02-21T09:47:27.4377106Z // end inline asm 2026-02-21T09:47:27.4377160Z add.s32 %r436, %r405, 32864; 2026-02-21T09:47:27.4377214Z bfe.u32 %r437, %r436, 4, 14; 2026-02-21T09:47:27.4377278Z cvt.u64.u32 %rd174, %r437; 2026-02-21T09:47:27.4377341Z or.b64 %rd141, %rd174, 4611686293439512576; 2026-02-21T09:47:27.4377399Z add.s32 %r438, %r407, 270432; 2026-02-21T09:47:27.4377459Z bfe.u32 %r439, %r438, 4, 14; 2026-02-21T09:47:27.4377514Z cvt.u64.u32 %rd175, %r439; 2026-02-21T09:47:27.4377579Z or.b64 %rd142, %rd175, 4611686293338849280; 2026-02-21T09:47:27.4377632Z // begin inline asm 2026-02-21T09:47:27.4377796Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 0 ], %rd141, %rd142, %r372, %p148; 2026-02-21T09:47:27.4377850Z // end inline asm 2026-02-21T09:47:27.4377907Z add.s32 %r440, %r405, 16384; 2026-02-21T09:47:27.4377969Z bfe.u32 %r441, %r440, 4, 14; 2026-02-21T09:47:27.4378024Z cvt.u64.u32 %rd176, %r441; 2026-02-21T09:47:27.4378087Z or.b64 %rd143, %rd176, 4611686293439512576; 2026-02-21T09:47:27.4378149Z // begin inline asm 2026-02-21T09:47:27.4378283Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd143, %rd128, %r372, %p204; 2026-02-21T09:47:27.4378336Z // end inline asm 2026-02-21T09:47:27.4378391Z add.s32 %r442, %r405, 16416; 2026-02-21T09:47:27.4378455Z bfe.u32 %r443, %r442, 4, 14; 2026-02-21T09:47:27.4378512Z cvt.u64.u32 %rd177, %r443; 2026-02-21T09:47:27.4378577Z or.b64 %rd145, %rd177, 4611686293439512576; 2026-02-21T09:47:27.4378643Z // begin inline asm 2026-02-21T09:47:27.4378779Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd145, %rd130, %r372, %p148; 2026-02-21T09:47:27.4378835Z // end inline asm 2026-02-21T09:47:27.4378900Z add.s32 %r444, %r405, 16448; 2026-02-21T09:47:27.4378980Z bfe.u32 %r445, %r444, 4, 14; 2026-02-21T09:47:27.4379035Z cvt.u64.u32 %rd178, %r445; 2026-02-21T09:47:27.4379099Z or.b64 %rd147, %rd178, 4611686293439512576; 
2026-02-21T09:47:27.4379162Z // begin inline asm 2026-02-21T09:47:27.4379296Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd147, %rd132, %r372, %p148; 2026-02-21T09:47:27.4379349Z // end inline asm 2026-02-21T09:47:27.4379410Z add.s32 %r446, %r405, 16480; 2026-02-21T09:47:27.4379464Z bfe.u32 %r447, %r446, 4, 14; 2026-02-21T09:47:27.4379518Z cvt.u64.u32 %rd179, %r447; 2026-02-21T09:47:27.4379583Z or.b64 %rd149, %rd179, 4611686293439512576; 2026-02-21T09:47:27.4379644Z // begin inline asm 2026-02-21T09:47:27.4379777Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd149, %rd134, %r372, %p148; 2026-02-21T09:47:27.4379830Z // end inline asm 2026-02-21T09:47:27.4379894Z add.s32 %r448, %r405, 49152; 2026-02-21T09:47:27.4379971Z bfe.u32 %r449, %r448, 4, 14; 2026-02-21T09:47:27.4380029Z cvt.u64.u32 %rd180, %r449; 2026-02-21T09:47:27.4380101Z or.b64 %rd151, %rd180, 4611686293439512576; 2026-02-21T09:47:27.4380157Z // begin inline asm 2026-02-21T09:47:27.4380288Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd151, %rd136, %r372, %p148; 2026-02-21T09:47:27.4380340Z // end inline asm 2026-02-21T09:47:27.4380401Z add.s32 %r450, %r405, 49184; 2026-02-21T09:47:27.4380455Z bfe.u32 %r451, %r450, 4, 14; 2026-02-21T09:47:27.4380511Z cvt.u64.u32 %rd181, %r451; 2026-02-21T09:47:27.4380581Z or.b64 %rd153, %rd181, 4611686293439512576; 2026-02-21T09:47:27.4380658Z // begin inline asm 2026-02-21T09:47:27.4380789Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd153, %rd138, %r372, %p148; 2026-02-21T09:47:27.4380849Z // end inline asm 2026-02-21T09:47:27.4380903Z add.s32 %r452, %r405, 49216; 2026-02-21T09:47:27.4380957Z bfe.u32 %r453, %r452, 4, 14; 2026-02-21T09:47:27.4381015Z cvt.u64.u32 %rd182, %r453; 2026-02-21T09:47:27.4381087Z or.b64 %rd155, %rd182, 4611686293439512576; 2026-02-21T09:47:27.4381143Z // begin inline asm 2026-02-21T09:47:27.4381275Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd155, %rd140, %r372, %p148; 
2026-02-21T09:47:27.4381336Z // end inline asm 2026-02-21T09:47:27.4381389Z add.s32 %r454, %r405, 49248; 2026-02-21T09:47:27.4381442Z bfe.u32 %r455, %r454, 4, 14; 2026-02-21T09:47:27.4381504Z cvt.u64.u32 %rd183, %r455; 2026-02-21T09:47:27.4381568Z or.b64 %rd157, %rd183, 4611686293439512576; 2026-02-21T09:47:27.4381623Z // begin inline asm 2026-02-21T09:47:27.4381756Z @%p147 tcgen05.mma.cta_group::1.kind::f16 [ %r657 + 64 ], %rd157, %rd142, %r372, %p148; 2026-02-21T09:47:27.4381818Z // end inline asm 2026-02-21T09:47:27.4381873Z cvt.u64.u32 %rd159, %r681; 2026-02-21T09:47:27.4381927Z // begin inline asm 2026-02-21T09:47:27.4382058Z @%p147 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd159]; 2026-02-21T09:47:27.4382131Z // end inline asm 2026-02-21T09:47:27.4382226Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4382400Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4382460Z setp.eq.b32 %p182, %r44, 0; 2026-02-21T09:47:27.4382520Z setp.lt.s32 %p183, %r677, %r16; 2026-02-21T09:47:27.4382682Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4382744Z // begin inline asm 2026-02-21T09:47:27.4382792Z 2026-02-21T09:47:27.4382838Z { 2026-02-21T09:47:27.4382902Z .reg .pred complete; 2026-02-21T09:47:27.4382955Z waitLoop: 2026-02-21T09:47:27.4383070Z mbarrier.try_wait.parity.shared.b64 complete, [%r667], %r666; 2026-02-21T09:47:27.4383132Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.4383186Z } 2026-02-21T09:47:27.4383190Z 2026-02-21T09:47:27.4383241Z // end inline asm 2026-02-21T09:47:27.4383299Z add.s32 %r467, %r671, 1; 2026-02-21T09:47:27.4383365Z setp.gt.s32 %p186, %r467, 1; 2026-02-21T09:47:27.4383528Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4383606Z add.s32 %r469, %r665, 128; 2026-02-21T09:47:27.4383667Z add.s32 %r470, %r670, 1; 2026-02-21T09:47:27.4383724Z setp.gt.s32 %p187, %r470, 
3; 2026-02-21T09:47:27.4383784Z selp.b32 %r670, 0, %r470, %p187; 2026-02-21T09:47:27.4383843Z selp.b32 %r665, 0, %r469, %p182; 2026-02-21T09:47:27.4383905Z shl.b32 %r471, %r670, 3; 2026-02-21T09:47:27.4383961Z add.s32 %r473, %r61, %r471; 2026-02-21T09:47:27.4384016Z add.s32 %r462, %r473, 360448; 2026-02-21T09:47:27.4384076Z bar.sync 0; 2026-02-21T09:47:27.4384136Z and.pred %p179, %p197, %p183; 2026-02-21T09:47:27.4384189Z // begin inline asm 2026-02-21T09:47:27.4384302Z @%p179 mbarrier.arrive.expect_tx.shared.b64 _, [%r462], 81920; 2026-02-21T09:47:27.4384362Z // end inline asm 2026-02-21T09:47:27.4384543Z .loc 1 54 31 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:54:31 2026-02-21T09:47:27.4384603Z shl.b32 %r474, %r670, 16; 2026-02-21T09:47:27.4384664Z bar.sync 0; 2026-02-21T09:47:27.4384766Z elect.sync %r475|%p188, -1; 2026-02-21T09:47:27.4384827Z and.pred %p189, %p183, %p188; 2026-02-21T09:47:27.4384893Z and.pred %p180, %p82, %p189; 2026-02-21T09:47:27.4384948Z add.s32 %r459, %r161, %r474; 2026-02-21T09:47:27.4385002Z add.s32 %r460, %r665, %r162; 2026-02-21T09:47:27.4385056Z // begin inline asm 2026-02-21T09:47:27.4385308Z @%p180 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r459], [%rd58, {%r460, %r678}], [%r462]; 2026-02-21T09:47:27.4385386Z // end inline asm 2026-02-21T09:47:27.4385546Z .loc 1 55 44 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:55:44 2026-02-21T09:47:27.4385610Z shl.b32 %r476, %r670, 14; 2026-02-21T09:47:27.4385662Z bar.sync 0; 2026-02-21T09:47:27.4385724Z elect.sync %r477|%p190, -1; 2026-02-21T09:47:27.4385794Z and.pred %p191, %p183, %p190; 2026-02-21T09:47:27.4385854Z and.pred %p181, %p82, %p191; 2026-02-21T09:47:27.4385911Z add.s32 %r463, %r165, %r476; 2026-02-21T09:47:27.4385966Z // begin inline asm 2026-02-21T09:47:27.4386211Z @%p181 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r463], [%rd59, {%r460, %r679}], [%r462]; 2026-02-21T09:47:27.4386267Z // 
end inline asm 2026-02-21T09:47:27.4386426Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4386498Z setp.ne.b32 %p204, %r664, 15; 2026-02-21T09:47:27.4386557Z @%p204 bra $L__BB0_10; 2026-02-21T09:47:27.4386649Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.4386817Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4386872Z // begin inline asm 2026-02-21T09:47:27.4386923Z 2026-02-21T09:47:27.4387009Z { 2026-02-21T09:47:27.4387077Z .reg .pred complete; 2026-02-21T09:47:27.4387133Z waitLoop: 2026-02-21T09:47:27.4387255Z mbarrier.try_wait.parity.shared.b64 complete, [%r681], %r682; 2026-02-21T09:47:27.4387327Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.4387376Z } 2026-02-21T09:47:27.4387380Z 2026-02-21T09:47:27.4387436Z // end inline asm 2026-02-21T09:47:27.4387498Z // begin inline asm 2026-02-21T09:47:27.4387780Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495}, [%r86 + 0]; 2026-02-21T09:47:27.4387836Z // end inline asm 2026-02-21T09:47:27.4387893Z // begin inline asm 2026-02-21T09:47:27.4388187Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512}, [%r86 + 16]; 2026-02-21T09:47:27.4388243Z // end inline asm 2026-02-21T09:47:27.4388299Z // begin inline asm 2026-02-21T09:47:27.4388581Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529}, [%r86 + 32]; 2026-02-21T09:47:27.4388665Z // end inline asm 2026-02-21T09:47:27.4388720Z // begin inline asm 2026-02-21T09:47:27.4389005Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r531, %r532, %r533, %r534, %r535, %r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546}, [%r86 + 48]; 
2026-02-21T09:47:27.4389060Z // end inline asm 2026-02-21T09:47:27.4389115Z // begin inline asm 2026-02-21T09:47:27.4389192Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:27.4389247Z // end inline asm 2026-02-21T09:47:27.4389309Z cvt.u64.u32 %rd187, %r480; 2026-02-21T09:47:27.4389368Z cvt.u64.u32 %rd188, %r481; 2026-02-21T09:47:27.4389438Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:47:27.4389501Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:47:27.4389672Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4389770Z mov.b64 {%r551, %r552}, %rd190; 2026-02-21T09:47:27.4389843Z cvt.rn.f16x2.f32 %r553, %r552, %r551; 2026-02-21T09:47:27.4390011Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4390070Z cvt.u64.u32 %rd191, %r482; 2026-02-21T09:47:27.4390136Z cvt.u64.u32 %rd192, %r483; 2026-02-21T09:47:27.4390195Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:47:27.4390256Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:47:27.4390428Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4390513Z mov.b64 {%r554, %r555}, %rd194; 2026-02-21T09:47:27.4390582Z cvt.rn.f16x2.f32 %r556, %r555, %r554; 2026-02-21T09:47:27.4390756Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4390815Z cvt.u64.u32 %rd195, %r484; 2026-02-21T09:47:27.4390876Z cvt.u64.u32 %rd196, %r485; 2026-02-21T09:47:27.4390935Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:47:27.4391002Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:47:27.4391171Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4391230Z mov.b64 {%r557, %r558}, %rd198; 2026-02-21T09:47:27.4391302Z cvt.rn.f16x2.f32 %r559, %r558, %r557; 2026-02-21T09:47:27.4391472Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4391530Z cvt.u64.u32 %rd199, %r486; 
2026-02-21T09:47:27.4391594Z cvt.u64.u32 %rd200, %r487; 2026-02-21T09:47:27.4391654Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:47:27.4391713Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:47:27.4391882Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4391946Z mov.b64 {%r560, %r561}, %rd202; 2026-02-21T09:47:27.4392121Z cvt.rn.f16x2.f32 %r562, %r561, %r560; 2026-02-21T09:47:27.4392292Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4392362Z cvt.u64.u32 %rd203, %r488; 2026-02-21T09:47:27.4392421Z cvt.u64.u32 %rd204, %r489; 2026-02-21T09:47:27.4392481Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:47:27.4392547Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:47:27.4392717Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4392777Z mov.b64 {%r563, %r564}, %rd206; 2026-02-21T09:47:27.4392842Z cvt.rn.f16x2.f32 %r565, %r564, %r563; 2026-02-21T09:47:27.4393020Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4393080Z cvt.u64.u32 %rd207, %r490; 2026-02-21T09:47:27.4393139Z cvt.u64.u32 %rd208, %r491; 2026-02-21T09:47:27.4393206Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:47:27.4393266Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:47:27.4393440Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4393531Z mov.b64 {%r566, %r567}, %rd210; 2026-02-21T09:47:27.4393595Z cvt.rn.f16x2.f32 %r568, %r567, %r566; 2026-02-21T09:47:27.4393766Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4393825Z cvt.u64.u32 %rd211, %r492; 2026-02-21T09:47:27.4393893Z cvt.u64.u32 %rd212, %r493; 2026-02-21T09:47:27.4393952Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:47:27.4394011Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:47:27.4394186Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 
2026-02-21T09:47:27.4394245Z mov.b64 {%r569, %r570}, %rd214; 2026-02-21T09:47:27.4394309Z cvt.rn.f16x2.f32 %r571, %r570, %r569; 2026-02-21T09:47:27.4394478Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4394559Z cvt.u64.u32 %rd215, %r494; 2026-02-21T09:47:27.4394619Z cvt.u64.u32 %rd216, %r495; 2026-02-21T09:47:27.4394705Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:47:27.4394775Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:47:27.4394943Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4395001Z mov.b64 {%r572, %r573}, %rd218; 2026-02-21T09:47:27.4395073Z cvt.rn.f16x2.f32 %r574, %r573, %r572; 2026-02-21T09:47:27.4395244Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4395304Z cvt.u64.u32 %rd219, %r497; 2026-02-21T09:47:27.4395398Z cvt.u64.u32 %rd220, %r498; 2026-02-21T09:47:27.4395459Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:47:27.4395518Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:47:27.4395689Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4395761Z mov.b64 {%r575, %r576}, %rd222; 2026-02-21T09:47:27.4395827Z cvt.rn.f16x2.f32 %r577, %r576, %r575; 2026-02-21T09:47:27.4395998Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4396066Z cvt.u64.u32 %rd223, %r499; 2026-02-21T09:47:27.4396125Z cvt.u64.u32 %rd224, %r500; 2026-02-21T09:47:27.4396183Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:47:27.4396250Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:47:27.4396427Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4396483Z mov.b64 {%r578, %r579}, %rd226; 2026-02-21T09:47:27.4396545Z cvt.rn.f16x2.f32 %r580, %r579, %r578; 2026-02-21T09:47:27.4396719Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4396780Z cvt.u64.u32 %rd227, 
%r501; 2026-02-21T09:47:27.4396839Z cvt.u64.u32 %rd228, %r502; 2026-02-21T09:47:27.4396931Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:47:27.4396992Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:47:27.4397148Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4397214Z mov.b64 {%r581, %r582}, %rd230; 2026-02-21T09:47:27.4397274Z cvt.rn.f16x2.f32 %r583, %r582, %r581; 2026-02-21T09:47:27.4397429Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4397485Z cvt.u64.u32 %rd231, %r503; 2026-02-21T09:47:27.4397547Z cvt.u64.u32 %rd232, %r504; 2026-02-21T09:47:27.4397602Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:47:27.4397658Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:47:27.4397821Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4397877Z mov.b64 {%r584, %r585}, %rd234; 2026-02-21T09:47:27.4397936Z cvt.rn.f16x2.f32 %r586, %r585, %r584; 2026-02-21T09:47:27.4398103Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4398160Z cvt.u64.u32 %rd235, %r505; 2026-02-21T09:47:27.4398239Z cvt.u64.u32 %rd236, %r506; 2026-02-21T09:47:27.4398294Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:47:27.4398360Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:47:27.4398525Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4398582Z mov.b64 {%r587, %r588}, %rd238; 2026-02-21T09:47:27.4398651Z cvt.rn.f16x2.f32 %r589, %r588, %r587; 2026-02-21T09:47:27.4398814Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4398872Z cvt.u64.u32 %rd239, %r507; 2026-02-21T09:47:27.4398936Z cvt.u64.u32 %rd240, %r508; 2026-02-21T09:47:27.4398991Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:47:27.4399047Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:47:27.4399236Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 
2026-02-21T09:47:27.4399304Z mov.b64 {%r590, %r591}, %rd242; 2026-02-21T09:47:27.4399366Z cvt.rn.f16x2.f32 %r592, %r591, %r590; 2026-02-21T09:47:27.4399524Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4399589Z cvt.u64.u32 %rd243, %r509; 2026-02-21T09:47:27.4399645Z cvt.u64.u32 %rd244, %r510; 2026-02-21T09:47:27.4399700Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:47:27.4399765Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:47:27.4399926Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4400004Z mov.b64 {%r593, %r594}, %rd246; 2026-02-21T09:47:27.4400065Z cvt.rn.f16x2.f32 %r595, %r594, %r593; 2026-02-21T09:47:27.4400234Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4400289Z cvt.u64.u32 %rd247, %r511; 2026-02-21T09:47:27.4400346Z cvt.u64.u32 %rd248, %r512; 2026-02-21T09:47:27.4400408Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:47:27.4400465Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:47:27.4400629Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4400692Z mov.b64 {%r596, %r597}, %rd250; 2026-02-21T09:47:27.4400752Z cvt.rn.f16x2.f32 %r598, %r597, %r596; 2026-02-21T09:47:27.4400915Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4400971Z cvt.u64.u32 %rd251, %r514; 2026-02-21T09:47:27.4401032Z cvt.u64.u32 %rd252, %r515; 2026-02-21T09:47:27.4401089Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:47:27.4401146Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:47:27.4401316Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4401372Z mov.b64 {%r599, %r600}, %rd254; 2026-02-21T09:47:27.4401452Z cvt.rn.f16x2.f32 %r601, %r600, %r599; 2026-02-21T09:47:27.4401613Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4401672Z cvt.u64.u32 %rd255, 
%r516; 2026-02-21T09:47:27.4401727Z cvt.u64.u32 %rd256, %r517; 2026-02-21T09:47:27.4401781Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:47:27.4401844Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:47:27.4402001Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4402056Z mov.b64 {%r602, %r603}, %rd258; 2026-02-21T09:47:27.4402122Z cvt.rn.f16x2.f32 %r604, %r603, %r602; 2026-02-21T09:47:27.4402278Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4402335Z cvt.u64.u32 %rd259, %r518; 2026-02-21T09:47:27.4402395Z cvt.u64.u32 %rd260, %r519; 2026-02-21T09:47:27.4402451Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:47:27.4402509Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:47:27.4402666Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4402752Z mov.b64 {%r605, %r606}, %rd262; 2026-02-21T09:47:27.4402812Z cvt.rn.f16x2.f32 %r607, %r606, %r605; 2026-02-21T09:47:27.4402975Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4403040Z cvt.u64.u32 %rd263, %r520; 2026-02-21T09:47:27.4403096Z cvt.u64.u32 %rd264, %r521; 2026-02-21T09:47:27.4403153Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:47:27.4403217Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:47:27.4403379Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4403436Z mov.b64 {%r608, %r609}, %rd266; 2026-02-21T09:47:27.4403496Z cvt.rn.f16x2.f32 %r610, %r609, %r608; 2026-02-21T09:47:27.4403669Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4403746Z cvt.u64.u32 %rd267, %r522; 2026-02-21T09:47:27.4403804Z cvt.u64.u32 %rd268, %r523; 2026-02-21T09:47:27.4403870Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:47:27.4403926Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:47:27.4404090Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 
2026-02-21T09:47:27.4404157Z mov.b64 {%r611, %r612}, %rd270; 2026-02-21T09:47:27.4404219Z cvt.rn.f16x2.f32 %r613, %r612, %r611; 2026-02-21T09:47:27.4404378Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4404472Z cvt.u64.u32 %rd271, %r524; 2026-02-21T09:47:27.4404542Z cvt.u64.u32 %rd272, %r525; 2026-02-21T09:47:27.4404598Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:47:27.4404654Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:47:27.4404853Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4404911Z mov.b64 {%r614, %r615}, %rd274; 2026-02-21T09:47:27.4404970Z cvt.rn.f16x2.f32 %r616, %r615, %r614; 2026-02-21T09:47:27.4405133Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4405188Z cvt.u64.u32 %rd275, %r526; 2026-02-21T09:47:27.4405244Z cvt.u64.u32 %rd276, %r527; 2026-02-21T09:47:27.4405298Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:47:27.4405363Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:47:27.4405515Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4405573Z mov.b64 {%r617, %r618}, %rd278; 2026-02-21T09:47:27.4405645Z cvt.rn.f16x2.f32 %r619, %r618, %r617; 2026-02-21T09:47:27.4405802Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4405857Z cvt.u64.u32 %rd279, %r528; 2026-02-21T09:47:27.4405920Z cvt.u64.u32 %rd280, %r529; 2026-02-21T09:47:27.4406010Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:47:27.4406069Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:47:27.4406229Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4406297Z mov.b64 {%r620, %r621}, %rd282; 2026-02-21T09:47:27.4406357Z cvt.rn.f16x2.f32 %r622, %r621, %r620; 2026-02-21T09:47:27.4406513Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4406577Z cvt.u64.u32 %rd283, 
%r531; 2026-02-21T09:47:27.4406632Z cvt.u64.u32 %rd284, %r532; 2026-02-21T09:47:27.4406688Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:47:27.4406755Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:47:27.4406917Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4406973Z mov.b64 {%r623, %r624}, %rd286; 2026-02-21T09:47:27.4407033Z cvt.rn.f16x2.f32 %r625, %r624, %r623; 2026-02-21T09:47:27.4407206Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4407289Z cvt.u64.u32 %rd287, %r533; 2026-02-21T09:47:27.4407345Z cvt.u64.u32 %rd288, %r534; 2026-02-21T09:47:27.4407410Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:47:27.4407466Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:47:27.4407629Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4407693Z mov.b64 {%r626, %r627}, %rd290; 2026-02-21T09:47:27.4407752Z cvt.rn.f16x2.f32 %r628, %r627, %r626; 2026-02-21T09:47:27.4407916Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4407974Z cvt.u64.u32 %rd291, %r535; 2026-02-21T09:47:27.4408037Z cvt.u64.u32 %rd292, %r536; 2026-02-21T09:47:27.4408093Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:47:27.4408148Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:47:27.4408343Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4408401Z mov.b64 {%r629, %r630}, %rd294; 2026-02-21T09:47:27.4408461Z cvt.rn.f16x2.f32 %r631, %r630, %r629; 2026-02-21T09:47:27.4408631Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4408687Z cvt.u64.u32 %rd295, %r537; 2026-02-21T09:47:27.4408742Z cvt.u64.u32 %rd296, %r538; 2026-02-21T09:47:27.4408797Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:47:27.4408860Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:47:27.4409024Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 
2026-02-21T09:47:27.4409105Z mov.b64 {%r632, %r633}, %rd298; 2026-02-21T09:47:27.4409173Z cvt.rn.f16x2.f32 %r634, %r633, %r632; 2026-02-21T09:47:27.4409338Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4409395Z cvt.u64.u32 %rd299, %r539; 2026-02-21T09:47:27.4409457Z cvt.u64.u32 %rd300, %r540; 2026-02-21T09:47:27.4409513Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:47:27.4409571Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:47:27.4409729Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4409793Z mov.b64 {%r635, %r636}, %rd302; 2026-02-21T09:47:27.4409853Z cvt.rn.f16x2.f32 %r637, %r636, %r635; 2026-02-21T09:47:27.4410015Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4410076Z cvt.u64.u32 %rd303, %r541; 2026-02-21T09:47:27.4410134Z cvt.u64.u32 %rd304, %r542; 2026-02-21T09:47:27.4410188Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:47:27.4410250Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:47:27.4410407Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4410485Z mov.b64 {%r638, %r639}, %rd306; 2026-02-21T09:47:27.4410547Z cvt.rn.f16x2.f32 %r640, %r639, %r638; 2026-02-21T09:47:27.4410711Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4410768Z cvt.u64.u32 %rd307, %r543; 2026-02-21T09:47:27.4410822Z cvt.u64.u32 %rd308, %r544; 2026-02-21T09:47:27.4410884Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:47:27.4410940Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:47:27.4411096Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4411160Z mov.b64 {%r641, %r642}, %rd310; 2026-02-21T09:47:27.4411222Z cvt.rn.f16x2.f32 %r643, %r642, %r641; 2026-02-21T09:47:27.4411379Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4411435Z cvt.u64.u32 %rd311, 
%r545; 2026-02-21T09:47:27.4411498Z cvt.u64.u32 %rd312, %r546; 2026-02-21T09:47:27.4411553Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:47:27.4411611Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:47:27.4411773Z .loc 1 58 27 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:58:27 2026-02-21T09:47:27.4411854Z mov.b64 {%r644, %r645}, %rd314; 2026-02-21T09:47:27.4411915Z cvt.rn.f16x2.f32 %r646, %r645, %r644; 2026-02-21T09:47:27.4412083Z .loc 1 59 45 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:59:45 2026-02-21T09:47:27.4412154Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.4412205Z bar.sync 0; 2026-02-21T09:47:27.4412300Z st.shared.v4.b32 [%r17], {%r553, %r556, %r559, %r562}; 2026-02-21T09:47:27.4412397Z st.shared.v4.b32 [%r18], {%r565, %r568, %r571, %r574}; 2026-02-21T09:47:27.4412481Z st.shared.v4.b32 [%r19], {%r577, %r580, %r583, %r586}; 2026-02-21T09:47:27.4412565Z st.shared.v4.b32 [%r20], {%r589, %r592, %r595, %r598}; 2026-02-21T09:47:27.4412654Z st.shared.v4.b32 [%r21], {%r601, %r604, %r607, %r610}; 2026-02-21T09:47:27.4412756Z st.shared.v4.b32 [%r22], {%r613, %r616, %r619, %r622}; 2026-02-21T09:47:27.4412841Z st.shared.v4.b32 [%r23], {%r625, %r628, %r631, %r634}; 2026-02-21T09:47:27.4412932Z st.shared.v4.b32 [%r24], {%r637, %r640, %r643, %r646}; 2026-02-21T09:47:27.4412993Z // begin inline asm 2026-02-21T09:47:27.4413066Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.4413118Z // end inline asm 2026-02-21T09:47:27.4413178Z bar.sync 0; 2026-02-21T09:47:27.4413242Z elect.sync %r647|%p194, -1; 2026-02-21T09:47:27.4413302Z and.pred %p192, %p3, %p194; 2026-02-21T09:47:27.4413362Z // begin inline asm 2026-02-21T09:47:27.4413546Z @%p192 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd186, {%r660, %r662}], [%r340]; 2026-02-21T09:47:27.4413623Z // end inline asm 2026-02-21T09:47:27.4413687Z cp.async.bulk.commit_group; 2026-02-21T09:47:27.4413751Z bra.uni $L__BB0_10; 2026-02-21T09:47:27.4413830Z $L__BB0_11: // %._crit_edge 
2026-02-21T09:47:27.4413988Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4414063Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.4414115Z bar.sync 0; 2026-02-21T09:47:27.4414171Z @%p79 bra $L__BB0_13; 2026-02-21T09:47:27.4414229Z // %bb.12: 2026-02-21T09:47:27.4414384Z .loc 1 56 52 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:56:52 2026-02-21T09:47:27.4414438Z // begin inline asm 2026-02-21T09:47:27.4414486Z 2026-02-21T09:47:27.4414541Z { 2026-02-21T09:47:27.4414598Z .reg .pred complete; 2026-02-21T09:47:27.4414651Z waitLoop: 2026-02-21T09:47:27.4414803Z mbarrier.try_wait.parity.shared.b64 complete, [%r681], %r682; 2026-02-21T09:47:27.4414867Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.4414916Z } 2026-02-21T09:47:27.4414920Z 2026-02-21T09:47:27.4414974Z // end inline asm 2026-02-21T09:47:27.4415036Z $L__BB0_13: 2026-02-21T09:47:27.4415215Z .loc 1 33 84 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:84 2026-02-21T09:47:27.4415273Z // begin inline asm 2026-02-21T09:47:27.4415366Z @%p197 mbarrier.inval.shared::cta.b64 [%r156]; 2026-02-21T09:47:27.4415421Z // end inline asm 2026-02-21T09:47:27.4415473Z bar.sync 0; 2026-02-21T09:47:27.4415535Z // begin inline asm 2026-02-21T09:47:27.4415619Z @%p197 mbarrier.inval.shared::cta.b64 [%r157]; 2026-02-21T09:47:27.4415672Z // end inline asm 2026-02-21T09:47:27.4415724Z bar.sync 0; 2026-02-21T09:47:27.4415786Z // begin inline asm 2026-02-21T09:47:27.4415866Z @%p197 mbarrier.inval.shared::cta.b64 [%r158]; 2026-02-21T09:47:27.4415918Z // end inline asm 2026-02-21T09:47:27.4415974Z bar.sync 0; 2026-02-21T09:47:27.4416031Z // begin inline asm 2026-02-21T09:47:27.4416107Z @%p197 mbarrier.inval.shared::cta.b64 [%r310]; 2026-02-21T09:47:27.4416159Z // end inline asm 2026-02-21T09:47:27.4416227Z add.s32 %r655, %r61, 360480; 2026-02-21T09:47:27.4416281Z // begin inline asm 2026-02-21T09:47:27.4416358Z @%p197 mbarrier.inval.shared::cta.b64 
[%r655]; 2026-02-21T09:47:27.4416419Z // end inline asm 2026-02-21T09:47:27.4416470Z bar.sync 0; 2026-02-21T09:47:27.4416551Z // begin inline asm 2026-02-21T09:47:27.4416634Z @%p197 mbarrier.inval.shared::cta.b64 [%r155]; 2026-02-21T09:47:27.4416687Z // end inline asm 2026-02-21T09:47:27.4416846Z .loc 1 33 4 // cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py:33:4 2026-02-21T09:47:27.4416897Z bar.sync 0; 2026-02-21T09:47:27.4416957Z // begin inline asm 2026-02-21T09:47:27.4417069Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r657, 128; 2026-02-21T09:47:27.4417121Z // end inline asm 2026-02-21T09:47:27.4417177Z ret; 2026-02-21T09:47:27.4417230Z $L__tmp1: 2026-02-21T09:47:27.4417282Z $L__func_end0: 2026-02-21T09:47:27.4417362Z // -- End function 2026-02-21T09:47:27.4417419Z } 2026-02-21T09:47:27.4417620Z .file 1 "/tmp/torchinductor_root/c3/cc3m7thbucux3ok5z6fqpd5vtn66fzkkkoyidp5ahxyn2adxhsro.py" 2026-02-21T09:47:27.4417680Z .section .debug_abbrev 2026-02-21T09:47:27.4417762Z { 2026-02-21T09:47:27.4417849Z .b8 1 // Abbreviation Code 2026-02-21T09:47:27.4417936Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:27.4418022Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:27.4418100Z .b8 37 // DW_AT_producer 2026-02-21T09:47:27.4418174Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.4418245Z .b8 19 // DW_AT_language 2026-02-21T09:47:27.4418323Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:27.4418419Z .b8 3 // DW_AT_name 2026-02-21T09:47:27.4418491Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.4418574Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:27.4418646Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:27.4418720Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:27.4418797Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.4418868Z .b8 0 // EOM(1) 2026-02-21T09:47:27.4418933Z .b8 0 // EOM(2) 2026-02-21T09:47:27.4418997Z .b8 0 // EOM(3) 2026-02-21T09:47:27.4419053Z } 2026-02-21T09:47:27.4419111Z .section .debug_info 2026-02-21T09:47:27.4419159Z { 2026-02-21T09:47:27.4419247Z .b32 104 // Length of 
Unit 2026-02-21T09:47:27.4419331Z .b8 2 // DWARF version number 2026-02-21T09:47:27.4419383Z .b8 0 2026-02-21T09:47:27.4419496Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:47:27.4419588Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:27.4419708Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:27.4419788Z .b8 116 // DW_AT_producer 2026-02-21T09:47:27.4419850Z .b8 114 2026-02-21T09:47:27.4419901Z .b8 105 2026-02-21T09:47:27.4419950Z .b8 116 2026-02-21T09:47:27.4420008Z .b8 111 2026-02-21T09:47:27.4420058Z .b8 110 2026-02-21T09:47:27.4420107Z .b8 0 2026-02-21T09:47:27.4420179Z .b8 2 // DW_AT_language 2026-02-21T09:47:27.4420238Z .b8 0 2026-02-21T09:47:27.4420312Z .b8 99 // DW_AT_name 2026-02-21T09:47:27.4420363Z .b8 99 2026-02-21T09:47:27.4420418Z .b8 51 2026-02-21T09:47:27.4420467Z .b8 109 2026-02-21T09:47:27.4420515Z .b8 55 2026-02-21T09:47:27.4420563Z .b8 116 2026-02-21T09:47:27.4420620Z .b8 104 2026-02-21T09:47:27.4420667Z .b8 98 2026-02-21T09:47:27.4420715Z .b8 117 2026-02-21T09:47:27.4420770Z .b8 99 2026-02-21T09:47:27.4420819Z .b8 117 2026-02-21T09:47:27.4420867Z .b8 120 2026-02-21T09:47:27.4420913Z .b8 51 2026-02-21T09:47:27.4420969Z .b8 111 2026-02-21T09:47:27.4421019Z .b8 107 2026-02-21T09:47:27.4421067Z .b8 53 2026-02-21T09:47:27.4421115Z .b8 122 2026-02-21T09:47:27.4421193Z .b8 54 2026-02-21T09:47:27.4421241Z .b8 102 2026-02-21T09:47:27.4421289Z .b8 113 2026-02-21T09:47:27.4421343Z .b8 112 2026-02-21T09:47:27.4421391Z .b8 100 2026-02-21T09:47:27.4421438Z .b8 53 2026-02-21T09:47:27.4421485Z .b8 118 2026-02-21T09:47:27.4421541Z .b8 116 2026-02-21T09:47:27.4421590Z .b8 110 2026-02-21T09:47:27.4421638Z .b8 54 2026-02-21T09:47:27.4421693Z .b8 54 2026-02-21T09:47:27.4421739Z .b8 102 2026-02-21T09:47:27.4421788Z .b8 122 2026-02-21T09:47:27.4421836Z .b8 107 2026-02-21T09:47:27.4421892Z .b8 107 2026-02-21T09:47:27.4421942Z .b8 107 2026-02-21T09:47:27.4421990Z .b8 111 2026-02-21T09:47:27.4422037Z .b8 121 
2026-02-21T09:47:27.4422093Z .b8 105 2026-02-21T09:47:27.4422140Z .b8 100 2026-02-21T09:47:27.4422187Z .b8 112 2026-02-21T09:47:27.4422242Z .b8 53 2026-02-21T09:47:27.4422290Z .b8 97 2026-02-21T09:47:27.4422338Z .b8 104 2026-02-21T09:47:27.4422385Z .b8 120 2026-02-21T09:47:27.4422442Z .b8 121 2026-02-21T09:47:27.4422522Z .b8 110 2026-02-21T09:47:27.4422571Z .b8 50 2026-02-21T09:47:27.4422628Z .b8 97 2026-02-21T09:47:27.4422678Z .b8 100 2026-02-21T09:47:27.4422727Z .b8 120 2026-02-21T09:47:27.4422775Z .b8 104 2026-02-21T09:47:27.4422832Z .b8 115 2026-02-21T09:47:27.4422880Z .b8 114 2026-02-21T09:47:27.4422928Z .b8 111 2026-02-21T09:47:27.4422981Z .b8 46 2026-02-21T09:47:27.4423029Z .b8 112 2026-02-21T09:47:27.4423080Z .b8 121 2026-02-21T09:47:27.4423128Z .b8 0 2026-02-21T09:47:27.4423224Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:27.4423295Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:27.4423365Z .b8 116 2026-02-21T09:47:27.4423419Z .b8 109 2026-02-21T09:47:27.4423468Z .b8 112 2026-02-21T09:47:27.4423517Z .b8 47 2026-02-21T09:47:27.4423564Z .b8 116 2026-02-21T09:47:27.4423620Z .b8 111 2026-02-21T09:47:27.4423669Z .b8 114 2026-02-21T09:47:27.4423718Z .b8 99 2026-02-21T09:47:27.4423766Z .b8 104 2026-02-21T09:47:27.4423822Z .b8 105 2026-02-21T09:47:27.4423871Z .b8 110 2026-02-21T09:47:27.4423919Z .b8 100 2026-02-21T09:47:27.4423974Z .b8 117 2026-02-21T09:47:27.4424022Z .b8 99 2026-02-21T09:47:27.4424070Z .b8 116 2026-02-21T09:47:27.4424116Z .b8 111 2026-02-21T09:47:27.4424171Z .b8 114 2026-02-21T09:47:27.4424218Z .b8 95 2026-02-21T09:47:27.4424266Z .b8 114 2026-02-21T09:47:27.4424321Z .b8 111 2026-02-21T09:47:27.4424368Z .b8 111 2026-02-21T09:47:27.4424416Z .b8 116 2026-02-21T09:47:27.4424463Z .b8 47 2026-02-21T09:47:27.4424519Z .b8 99 2026-02-21T09:47:27.4424568Z .b8 51 2026-02-21T09:47:27.4424614Z .b8 0 2026-02-21T09:47:27.4424661Z } 2026-02-21T09:47:27.4424781Z .section .debug_macinfo { } 2026-02-21T09:47:27.4424784Z 2026-02-21T09:47:27.4424861Z 
================================================================ 2026-02-21T09:47:27.4424962Z please share the reproducer above with Triton project. 2026-02-21T09:47:27.6655649Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:27.6656933Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:27.6658067Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:27.6658295Z `ptxas` stderr: 2026-02-21T09:47:27.6658718Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:27.6659193Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.6659347Z 2026-02-21T09:47:27.6659728Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_7_32wvl.ptx -o /tmp/tmp_7_32wvl.ptx.o 2026-02-21T09:47:27.6660231Z 2026-02-21T09:47:27.6660365Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:27.6660553Z 2026-02-21T09:47:27.6660641Z 2026-02-21T09:47:27.6660644Z 2026-02-21T09:47:27.6660730Z ================================================================ 2026-02-21T09:47:27.6660942Z Internal Triton PTX codegen error 2026-02-21T09:47:27.6661107Z `ptxas` stderr: 2026-02-21T09:47:27.6661515Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. 
Try to compile with register target of 34 or higher. 2026-02-21T09:47:27.6661975Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.6662126Z 2026-02-21T09:47:27.6662521Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_7_32wvl.ptx -o /tmp/tmp_7_32wvl.ptx.o 2026-02-21T09:47:27.6662959Z 2026-02-21T09:47:27.6662970Z 2026-02-21T09:47:27.6663023Z // 2026-02-21T09:47:27.6663161Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:27.6663334Z // 2026-02-21T09:47:27.6663400Z 2026-02-21T09:47:27.6663454Z .version 8.7 2026-02-21T09:47:27.6663593Z .target sm_100a 2026-02-21T09:47:27.6663732Z .address_size 64 2026-02-21T09:47:27.6663815Z 2026-02-21T09:47:27.6663933Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:27.6664195Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:27.6664403Z // @_helion_matmul 2026-02-21T09:47:27.6664639Z .visible .entry _helion_matmul( 2026-02-21T09:47:27.6664928Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:27.6665198Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:27.6665459Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:27.6665709Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:27.6665960Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:27.6666157Z ) 2026-02-21T09:47:27.6666280Z .reqntid 256 2026-02-21T09:47:27.6666404Z .maxnreg 32 2026-02-21T09:47:27.6666530Z { 2026-02-21T09:47:27.6666650Z .reg .pred %p<172>; 2026-02-21T09:47:27.6666813Z .reg .b32 %r<589>; 2026-02-21T09:47:27.6666958Z .reg .b64 %rd<259>; 2026-02-21T09:47:27.6667213Z .loc 1 19 0 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19:0 2026-02-21T09:47:27.6667503Z $L__func_begin0: 2026-02-21T09:47:27.6667742Z .loc 1 19 0 // 
cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19:0 2026-02-21T09:47:27.6667976Z 2026-02-21T09:47:27.6668026Z // %bb.0: 2026-02-21T09:47:27.6668183Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:47:27.6668366Z $L__tmp0: 2026-02-21T09:47:27.6668634Z .loc 1 19 0 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19 2026-02-21T09:47:27.6668911Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:27.6669087Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:47:27.6669280Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:47:27.6669461Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:47:27.6669649Z mov.b32 %r56, global_smem; 2026-02-21T09:47:27.6669811Z // begin inline asm 2026-02-21T09:47:27.6670046Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r56], 128; 2026-02-21T09:47:27.6670285Z // end inline asm 2026-02-21T09:47:27.6670450Z ld.param.b64 %rd64, [_helion_matmul_param_3]; 2026-02-21T09:47:27.6670633Z bar.sync 0; 2026-02-21T09:47:27.6670780Z ld.shared.b32 %r563, [global_smem]; 2026-02-21T09:47:27.6670946Z bar.sync 0; 2026-02-21T09:47:27.6671077Z // begin inline asm 2026-02-21T09:47:27.6671279Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:27.6671506Z // end inline asm 2026-02-21T09:47:27.6671757Z .loc 1 21 67 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:21:67 2026-02-21T09:47:27.6672075Z mov.u32 %r586, %ctaid.x; 2026-02-21T09:47:27.6672230Z mov.u32 %r184, %ctaid.y; 2026-02-21T09:47:27.6672373Z mov.u32 %r185, %ctaid.z; 2026-02-21T09:47:27.6672522Z mov.u32 %r186, %nctaid.x; 2026-02-21T09:47:27.6672671Z mov.u32 %r187, %nctaid.y; 2026-02-21T09:47:27.6672835Z mad.lo.s32 %r188, %r185, %r187, %r184; 2026-02-21T09:47:27.6673011Z mad.lo.s32 %r189, %r188, %r186, %r586; 2026-02-21T09:47:27.6673187Z mul.lo.s32 %r190, %r189, 384; 2026-02-21T09:47:27.6673348Z cvt.s64.s32 %rd65, %r190; 2026-02-21T09:47:27.6673504Z add.s64 %rd19, %rd64, %rd65; 2026-02-21T09:47:27.6673665Z shl.b32 %r191, 
%r1, 2; 2026-02-21T09:47:27.6673814Z add.s32 %r57, %r56, %r191; 2026-02-21T09:47:27.6673969Z mov.b32 %r588, 0; 2026-02-21T09:47:27.6674103Z // begin inline asm 2026-02-21T09:47:27.6674257Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:27.6674451Z // end inline asm 2026-02-21T09:47:27.6674598Z bar.warp.sync -1; 2026-02-21T09:47:27.6674781Z setp.eq.b32 %p164, %r1, 0; 2026-02-21T09:47:27.6674946Z cvt.u64.u32 %rd4, %r56; 2026-02-21T09:47:27.6675100Z // begin inline asm 2026-02-21T09:47:27.6675345Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:47:27.6675629Z // end inline asm 2026-02-21T09:47:27.6675760Z // begin inline asm 2026-02-21T09:47:27.6675990Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.6676239Z // end inline asm 2026-02-21T09:47:27.6676382Z mov.b32 %r59, 64; 2026-02-21T09:47:27.6676553Z // begin inline asm 2026-02-21T09:47:27.6676791Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.6677057Z // end inline asm 2026-02-21T09:47:27.6677186Z mov.b32 %r60, 256; 2026-02-21T09:47:27.6677323Z // begin inline asm 2026-02-21T09:47:27.6677547Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:27.6677807Z // end inline asm 2026-02-21T09:47:27.6677939Z mov.b32 %r61, 2048; 2026-02-21T09:47:27.6678081Z // begin inline asm 2026-02-21T09:47:27.6678316Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:27.6678591Z // end inline asm 2026-02-21T09:47:27.6678726Z // begin inline asm 2026-02-21T09:47:27.6678956Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:27.6679226Z // end inline asm 2026-02-21T09:47:27.6679358Z mov.b64 %rd12, 4096; 2026-02-21T09:47:27.6679505Z // begin inline asm 2026-02-21T09:47:27.6679759Z @%p164 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:47:27.6680058Z // end inline asm 2026-02-21T09:47:27.6680199Z mov.b32 %r63, 1; 2026-02-21T09:47:27.6680333Z // begin inline asm 2026-02-21T09:47:27.6680623Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.6680920Z // end inline asm 2026-02-21T09:47:27.6681062Z // begin inline asm 2026-02-21T09:47:27.6681315Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.6681606Z // end inline asm 2026-02-21T09:47:27.6681739Z // begin inline asm 2026-02-21T09:47:27.6681982Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.6682257Z // end inline asm 2026-02-21T09:47:27.6682389Z // begin inline asm 2026-02-21T09:47:27.6682650Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6682937Z // end inline asm 2026-02-21T09:47:27.6683078Z // begin inline asm 2026-02-21T09:47:27.6683314Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.6683592Z // end inline asm 2026-02-21T09:47:27.6683734Z // begin inline asm 2026-02-21T09:47:27.6683967Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6684264Z // end inline asm 2026-02-21T09:47:27.6684398Z // begin inline asm 2026-02-21T09:47:27.6684803Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.6685190Z // end inline asm 2026-02-21T09:47:27.6685336Z // begin inline asm 2026-02-21T09:47:27.6685558Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:47:27.6685815Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.6686017Z @%p3 cp.async.bulk.wait_group.read 0 ; 
2026-02-21T09:47:27.6686198Z // end inline asm 2026-02-21T09:47:27.6686344Z bar.sync 0; 2026-02-21T09:47:27.6686487Z cvta.global.u64 %rd58, %rd19; 2026-02-21T09:47:27.6686784Z .loc 1 22 68 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:22:68 2026-02-21T09:47:27.6687088Z add.s32 %r192, %r190, 128; 2026-02-21T09:47:27.6687300Z cvt.s64.s32 %rd66, %r192; 2026-02-21T09:47:27.6687476Z add.s64 %rd37, %rd64, %rd66; 2026-02-21T09:47:27.6687639Z bar.sync 0; 2026-02-21T09:47:27.6687795Z // begin inline asm 2026-02-21T09:47:27.6687945Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:27.6688124Z // end inline asm 2026-02-21T09:47:27.6688260Z bar.warp.sync -1; 2026-02-21T09:47:27.6688406Z // begin inline asm 2026-02-21T09:47:27.6688656Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:47:27.6688937Z // end inline asm 2026-02-21T09:47:27.6689076Z // begin inline asm 2026-02-21T09:47:27.6689324Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.6689577Z // end inline asm 2026-02-21T09:47:27.6689706Z // begin inline asm 2026-02-21T09:47:27.6689941Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.6690203Z // end inline asm 2026-02-21T09:47:27.6690342Z // begin inline asm 2026-02-21T09:47:27.6690566Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r59; 2026-02-21T09:47:27.6690830Z // end inline asm 2026-02-21T09:47:27.6690967Z // begin inline asm 2026-02-21T09:47:27.6691202Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:27.6691474Z // end inline asm 2026-02-21T09:47:27.6691604Z mov.b32 %r70, 12288; 2026-02-21T09:47:27.6691746Z // begin inline asm 2026-02-21T09:47:27.6691979Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r70; 2026-02-21T09:47:27.6692247Z // end inline asm 
2026-02-21T09:47:27.6692382Z // begin inline asm 2026-02-21T09:47:27.6692627Z @%p164 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:47:27.6692905Z // end inline asm 2026-02-21T09:47:27.6693032Z // begin inline asm 2026-02-21T09:47:27.6693311Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.6693589Z // end inline asm 2026-02-21T09:47:27.6693726Z // begin inline asm 2026-02-21T09:47:27.6693966Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.6694249Z // end inline asm 2026-02-21T09:47:27.6694386Z // begin inline asm 2026-02-21T09:47:27.6694618Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.6694917Z // end inline asm 2026-02-21T09:47:27.6695048Z // begin inline asm 2026-02-21T09:47:27.6695298Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6695571Z // end inline asm 2026-02-21T09:47:27.6695707Z // begin inline asm 2026-02-21T09:47:27.6695945Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.6696201Z // end inline asm 2026-02-21T09:47:27.6696336Z // begin inline asm 2026-02-21T09:47:27.6696560Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6696866Z // end inline asm 2026-02-21T09:47:27.6696997Z // begin inline asm 2026-02-21T09:47:27.6697339Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.6697713Z // end inline asm 2026-02-21T09:47:27.6697846Z // begin inline asm 2026-02-21T09:47:27.6698245Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:47:27.6698485Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.6698673Z @%p3 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.6698844Z // end inline asm 2026-02-21T09:47:27.6698989Z bar.sync 0; 2026-02-21T09:47:27.6699124Z cvta.global.u64 %rd59, %rd37; 2026-02-21T09:47:27.6699399Z .loc 1 24 73 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:24:73 2026-02-21T09:47:27.6699693Z add.s32 %r193, %r190, 256; 2026-02-21T09:47:27.6699876Z cvt.s64.s32 %rd67, %r193; 2026-02-21T09:47:27.6700042Z add.s64 %rd55, %rd64, %rd67; 2026-02-21T09:47:27.6700193Z bar.sync 0; 2026-02-21T09:47:27.6700324Z // begin inline asm 2026-02-21T09:47:27.6700468Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:27.6700640Z // end inline asm 2026-02-21T09:47:27.6700771Z bar.warp.sync -1; 2026-02-21T09:47:27.6700912Z // begin inline asm 2026-02-21T09:47:27.6701152Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:47:27.6701420Z // end inline asm 2026-02-21T09:47:27.6701557Z // begin inline asm 2026-02-21T09:47:27.6701793Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.6702041Z // end inline asm 2026-02-21T09:47:27.6702168Z // begin inline asm 2026-02-21T09:47:27.6702398Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.6702658Z // end inline asm 2026-02-21T09:47:27.6702788Z // begin inline asm 2026-02-21T09:47:27.6703014Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:27.6703267Z // end inline asm 2026-02-21T09:47:27.6703400Z // begin inline asm 2026-02-21T09:47:27.6703632Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r70; 2026-02-21T09:47:27.6703903Z // end inline asm 2026-02-21T09:47:27.6704030Z // begin inline asm 2026-02-21T09:47:27.6704265Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:27.6704530Z // end inline asm 2026-02-21T09:47:27.6704660Z 
mov.b64 %rd48, 24576; 2026-02-21T09:47:27.6704834Z // begin inline asm 2026-02-21T09:47:27.6705075Z @%p164 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd48; 2026-02-21T09:47:27.6705355Z // end inline asm 2026-02-21T09:47:27.6705483Z // begin inline asm 2026-02-21T09:47:27.6705758Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.6706042Z // end inline asm 2026-02-21T09:47:27.6706172Z // begin inline asm 2026-02-21T09:47:27.6706423Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.6706696Z // end inline asm 2026-02-21T09:47:27.6706831Z // begin inline asm 2026-02-21T09:47:27.6707057Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.6707321Z // end inline asm 2026-02-21T09:47:27.6707452Z // begin inline asm 2026-02-21T09:47:27.6707714Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6707999Z // end inline asm 2026-02-21T09:47:27.6708126Z // begin inline asm 2026-02-21T09:47:27.6708359Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.6708618Z // end inline asm 2026-02-21T09:47:27.6708751Z // begin inline asm 2026-02-21T09:47:27.6708970Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.6709257Z // end inline asm 2026-02-21T09:47:27.6709391Z // begin inline asm 2026-02-21T09:47:27.6709724Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.6710096Z // end inline asm 2026-02-21T09:47:27.6710223Z // begin inline asm 2026-02-21T09:47:27.6710430Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:47:27.6710670Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.6710860Z 
@%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.6711043Z // end inline asm 2026-02-21T09:47:27.6711169Z bar.sync 0; 2026-02-21T09:47:27.6711313Z cvta.global.u64 %rd130, %rd55; 2026-02-21T09:47:27.6711582Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6711873Z max.u32 %r194, %r586, 1535; 2026-02-21T09:47:27.6712057Z shl.b32 %r195, %r194, 5; 2026-02-21T09:47:27.6712216Z add.s32 %r4, %r195, -49120; 2026-02-21T09:47:27.6712371Z sub.s32 %r5, 49152, %r195; 2026-02-21T09:47:27.6712632Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6712924Z shr.u32 %r196, %r1, 5; 2026-02-21T09:47:27.6713084Z shfl.sync.idx.b32 %r6, %r196, 0, 31, -1; 2026-02-21T09:47:27.6713266Z shl.b32 %r197, %r6, 21; 2026-02-21T09:47:27.6713414Z and.b32 %r198, %r197, 6291456; 2026-02-21T09:47:27.6713577Z add.s32 %r199, %r198, %r563; 2026-02-21T09:47:27.6713727Z shl.b32 %r200, %r6, 4; 2026-02-21T09:47:27.6713910Z and.b32 %r201, %r200, 64; 2026-02-21T09:47:27.6714058Z add.s32 %r81, %r199, %r201; 2026-02-21T09:47:27.6714217Z mov.pred %p98, -1; 2026-02-21T09:47:27.6714353Z // begin inline asm 2026-02-21T09:47:27.6714747Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 0], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:27.6715150Z // end inline asm 2026-02-21T09:47:27.6715282Z // begin inline asm 2026-02-21T09:47:27.6715630Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 16], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:27.6716021Z // end inline asm 2026-02-21T09:47:27.6716160Z // begin inline asm 2026-02-21T09:47:27.6716507Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 32], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 
2026-02-21T09:47:27.6716879Z // end inline asm 2026-02-21T09:47:27.6717016Z // begin inline asm 2026-02-21T09:47:27.6717351Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 48], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:27.6717728Z // end inline asm 2026-02-21T09:47:27.6717886Z // begin inline asm 2026-02-21T09:47:27.6718044Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:27.6718216Z // end inline asm 2026-02-21T09:47:27.6718344Z bar.sync 0; 2026-02-21T09:47:27.6718596Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6718888Z add.s32 %r587, %r56, 196640; 2026-02-21T09:47:27.6719049Z // begin inline asm 2026-02-21T09:47:27.6719210Z @%p164 mbarrier.init.shared::cta.b64 [%r587], 1; 2026-02-21T09:47:27.6719403Z // end inline asm 2026-02-21T09:47:27.6719528Z bar.sync 0; 2026-02-21T09:47:27.6719664Z add.s32 %r150, %r56, 196648; 2026-02-21T09:47:27.6719813Z // begin inline asm 2026-02-21T09:47:27.6719977Z @%p164 mbarrier.init.shared::cta.b64 [%r150], 1; 2026-02-21T09:47:27.6720167Z // end inline asm 2026-02-21T09:47:27.6720297Z add.s32 %r151, %r56, 196608; 2026-02-21T09:47:27.6720451Z // begin inline asm 2026-02-21T09:47:27.6720609Z @%p164 mbarrier.init.shared::cta.b64 [%r151], 1; 2026-02-21T09:47:27.6720800Z // end inline asm 2026-02-21T09:47:27.6720925Z bar.sync 0; 2026-02-21T09:47:27.6721107Z add.s32 %r152, %r56, 196616; 2026-02-21T09:47:27.6721259Z // begin inline asm 2026-02-21T09:47:27.6721430Z @%p164 mbarrier.init.shared::cta.b64 [%r152], 1; 2026-02-21T09:47:27.6721623Z // end inline asm 2026-02-21T09:47:27.6721756Z bar.sync 0; 2026-02-21T09:47:27.6721898Z add.s32 %r153, %r56, 196624; 2026-02-21T09:47:27.6722052Z // begin inline asm 2026-02-21T09:47:27.6722220Z @%p164 mbarrier.init.shared::cta.b64 [%r153], 1; 2026-02-21T09:47:27.6722402Z // end inline asm 2026-02-21T09:47:27.6722541Z bar.sync 0; 
2026-02-21T09:47:27.6722674Z add.s32 %r259, %r56, 196632; 2026-02-21T09:47:27.6722834Z // begin inline asm 2026-02-21T09:47:27.6722995Z @%p164 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:47:27.6723185Z // end inline asm 2026-02-21T09:47:27.6723334Z setp.lt.s32 %p79, %r5, 1; 2026-02-21T09:47:27.6723494Z setp.gt.s32 %p78, %r5, 0; 2026-02-21T09:47:27.6723812Z .loc 1 40 33 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:40:33 2026-02-21T09:47:27.6724116Z shr.u32 %r202, %r586, 3; 2026-02-21T09:47:27.6724283Z and.b32 %r203, %r202, 268435452; 2026-02-21T09:47:27.6724564Z .loc 1 41 39 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:39 2026-02-21T09:47:27.6724918Z sub.s32 %r204, 192, %r203; 2026-02-21T09:47:27.6725197Z .loc 1 41 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:52 2026-02-21T09:47:27.6725486Z min.s32 %r205, %r204, 4; 2026-02-21T09:47:27.6725762Z .loc 1 42 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:45 2026-02-21T09:47:27.6726079Z and.b32 %r206, %r586, 31; 2026-02-21T09:47:27.6726351Z .loc 1 43 51 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:43:51 2026-02-21T09:47:27.6726640Z div.s32 %r207, %r206, %r205; 2026-02-21T09:47:27.6726917Z .loc 1 42 64 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:64 2026-02-21T09:47:27.6727217Z mul.lo.s32 %r208, %r207, %r205; 2026-02-21T09:47:27.6727384Z sub.s32 %r209, %r206, %r208; 2026-02-21T09:47:27.6727660Z .loc 1 42 30 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:30 2026-02-21T09:47:27.6727949Z add.s32 %r210, %r209, %r203; 2026-02-21T09:47:27.6728226Z .loc 1 44 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:44:27 2026-02-21T09:47:27.6728521Z shl.b32 %r565, %r210, 6; 2026-02-21T09:47:27.6728794Z .loc 1 45 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:45:27 2026-02-21T09:47:27.6729104Z shl.b32 %r567, %r207, 8; 2026-02-21T09:47:27.6729370Z .loc 1 33 84 // 
cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6729675Z bar.sync 0; 2026-02-21T09:47:27.6729842Z and.pred %p69, %p164, %p78; 2026-02-21T09:47:27.6730013Z // begin inline asm 2026-02-21T09:47:27.6730208Z @%p69 mbarrier.arrive.expect_tx.shared.b64 _, [%r151], 40960; 2026-02-21T09:47:27.6730438Z // end inline asm 2026-02-21T09:47:27.6730684Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6730980Z // begin inline asm 2026-02-21T09:47:27.6731144Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.6731311Z // end inline asm 2026-02-21T09:47:27.6731462Z bar.sync 0; 2026-02-21T09:47:27.6731594Z elect.sync %r211|%p80, -1; 2026-02-21T09:47:27.6731761Z and.pred %p81, %p78, %p80; 2026-02-21T09:47:27.6731919Z and.pred %p70, %p3, %p81; 2026-02-21T09:47:27.6732077Z // begin inline asm 2026-02-21T09:47:27.6732389Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r56], [%rd58, {%r588, %r567}], [%r151]; 2026-02-21T09:47:27.6732737Z // end inline asm 2026-02-21T09:47:27.6732984Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6733260Z bar.sync 0; 2026-02-21T09:47:27.6733426Z elect.sync %r212|%p82, -1; 2026-02-21T09:47:27.6733582Z and.pred %p83, %p78, %p82; 2026-02-21T09:47:27.6733741Z and.pred %p71, %p3, %p83; 2026-02-21T09:47:27.6733891Z add.s32 %r160, %r56, 131072; 2026-02-21T09:47:27.6734045Z // begin inline asm 2026-02-21T09:47:27.6734373Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd59, {%r588, %r565}], [%r151]; 2026-02-21T09:47:27.6734770Z // end inline asm 2026-02-21T09:47:27.6735014Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6735293Z setp.gt.s32 %p84, %r5, 1; 2026-02-21T09:47:27.6735444Z bar.sync 0; 2026-02-21T09:47:27.6735572Z and.pred %p72, %p164, %p84; 2026-02-21T09:47:27.6735733Z // begin inline asm 
2026-02-21T09:47:27.6735917Z @%p72 mbarrier.arrive.expect_tx.shared.b64 _, [%r152], 40960; 2026-02-21T09:47:27.6736170Z // end inline asm 2026-02-21T09:47:27.6736417Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6736693Z bar.sync 0; 2026-02-21T09:47:27.6736830Z elect.sync %r213|%p85, -1; 2026-02-21T09:47:27.6736983Z and.pred %p86, %p84, %p85; 2026-02-21T09:47:27.6737141Z and.pred %p73, %p3, %p86; 2026-02-21T09:47:27.6737290Z add.s32 %r165, %r56, 32768; 2026-02-21T09:47:27.6737443Z // begin inline asm 2026-02-21T09:47:27.6737761Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r165], [%rd58, {%r59, %r567}], [%r152]; 2026-02-21T09:47:27.6738132Z // end inline asm 2026-02-21T09:47:27.6738388Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6738663Z bar.sync 0; 2026-02-21T09:47:27.6738803Z elect.sync %r214|%p87, -1; 2026-02-21T09:47:27.6738957Z and.pred %p88, %p84, %p87; 2026-02-21T09:47:27.6739123Z and.pred %p74, %p3, %p88; 2026-02-21T09:47:27.6739277Z add.s32 %r169, %r56, 139264; 2026-02-21T09:47:27.6739438Z // begin inline asm 2026-02-21T09:47:27.6739764Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r169], [%rd59, {%r59, %r565}], [%r152]; 2026-02-21T09:47:27.6740122Z // end inline asm 2026-02-21T09:47:27.6740367Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6740647Z setp.gt.s32 %p89, %r5, 2; 2026-02-21T09:47:27.6740799Z bar.sync 0; 2026-02-21T09:47:27.6740928Z and.pred %p75, %p164, %p89; 2026-02-21T09:47:27.6741087Z // begin inline asm 2026-02-21T09:47:27.6741275Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r153], 40960; 2026-02-21T09:47:27.6741493Z // end inline asm 2026-02-21T09:47:27.6741739Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6742012Z bar.sync 0; 
2026-02-21T09:47:27.6742193Z elect.sync %r215|%p90, -1; 2026-02-21T09:47:27.6742353Z and.pred %p91, %p89, %p90; 2026-02-21T09:47:27.6742513Z and.pred %p76, %p3, %p91; 2026-02-21T09:47:27.6742664Z add.s32 %r174, %r56, 65536; 2026-02-21T09:47:27.6742817Z mov.b32 %r175, 128; 2026-02-21T09:47:27.6742956Z // begin inline asm 2026-02-21T09:47:27.6743271Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r174], [%rd58, {%r175, %r567}], [%r153]; 2026-02-21T09:47:27.6743632Z // end inline asm 2026-02-21T09:47:27.6743872Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6744164Z bar.sync 0; 2026-02-21T09:47:27.6744295Z elect.sync %r216|%p92, -1; 2026-02-21T09:47:27.6744460Z and.pred %p93, %p89, %p92; 2026-02-21T09:47:27.6744615Z and.pred %p77, %p3, %p93; 2026-02-21T09:47:27.6744803Z add.s32 %r178, %r56, 147456; 2026-02-21T09:47:27.6744949Z // begin inline asm 2026-02-21T09:47:27.6745265Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd59, {%r175, %r565}], [%r153]; 2026-02-21T09:47:27.6745645Z // end inline asm 2026-02-21T09:47:27.6745880Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6746160Z bar.sync 0; 2026-02-21T09:47:27.6746281Z // begin inline asm 2026-02-21T09:47:27.6746417Z 2026-02-21T09:47:27.6746526Z { 2026-02-21T09:47:27.6746657Z @!%p78 bra.uni skipWait; 2026-02-21T09:47:27.6746810Z .reg .pred complete; 2026-02-21T09:47:27.6746958Z waitLoop: 2026-02-21T09:47:27.6747149Z mbarrier.try_wait.parity.shared.b64 complete, [%r151], %r588; 2026-02-21T09:47:27.6747376Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.6747534Z skipWait: 2026-02-21T09:47:27.6747648Z } 2026-02-21T09:47:27.6747719Z 2026-02-21T09:47:27.6747772Z // end inline asm 2026-02-21T09:47:27.6748008Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6748320Z setp.ne.b32 %p94, %r6, 
0; 2026-02-21T09:47:27.6748479Z or.pred %p95, %p79, %p94; 2026-02-21T09:47:27.6748645Z @%p95 bra $L__BB0_2; 2026-02-21T09:47:27.6748797Z // %bb.1: 2026-02-21T09:47:27.6748931Z elect.sync %r233|%p97, -1; 2026-02-21T09:47:27.6749103Z bfe.u32 %r235, %r56, 4, 14; 2026-02-21T09:47:27.6749263Z cvt.u64.u32 %rd85, %r235; 2026-02-21T09:47:27.6749437Z or.b64 %rd68, %rd85, 4611686293439512576; 2026-02-21T09:47:27.6759924Z bfe.u32 %r237, %r160, 4, 14; 2026-02-21T09:47:27.6760190Z cvt.u64.u32 %rd86, %r237; 2026-02-21T09:47:27.6760379Z or.b64 %rd69, %rd86, 4611686293338849280; 2026-02-21T09:47:27.6760564Z mov.b32 %r218, 135266320; 2026-02-21T09:47:27.6760836Z mov.pred %p96, 0; 2026-02-21T09:47:27.6760982Z // begin inline asm 2026-02-21T09:47:27.6761234Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd68, %rd69, %r218, %p96; 2026-02-21T09:47:27.6761494Z // end inline asm 2026-02-21T09:47:27.6761649Z add.s32 %r238, %r56, 32; 2026-02-21T09:47:27.6761813Z bfe.u32 %r239, %r238, 4, 14; 2026-02-21T09:47:27.6761983Z cvt.u64.u32 %rd87, %r239; 2026-02-21T09:47:27.6762160Z or.b64 %rd70, %rd87, 4611686293439512576; 2026-02-21T09:47:27.6762339Z add.s32 %r240, %r56, 131104; 2026-02-21T09:47:27.6762499Z bfe.u32 %r241, %r240, 4, 14; 2026-02-21T09:47:27.6762653Z cvt.u64.u32 %rd88, %r241; 2026-02-21T09:47:27.6762821Z or.b64 %rd71, %rd88, 4611686293338849280; 2026-02-21T09:47:27.6762991Z // begin inline asm 2026-02-21T09:47:27.6763218Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd70, %rd71, %r218, %p98; 2026-02-21T09:47:27.6763461Z // end inline asm 2026-02-21T09:47:27.6763606Z add.s32 %r242, %r56, 64; 2026-02-21T09:47:27.6763768Z bfe.u32 %r243, %r242, 4, 14; 2026-02-21T09:47:27.6763919Z cvt.u64.u32 %rd89, %r243; 2026-02-21T09:47:27.6764082Z or.b64 %rd72, %rd89, 4611686293439512576; 2026-02-21T09:47:27.6764255Z add.s32 %r244, %r56, 131136; 2026-02-21T09:47:27.6764413Z bfe.u32 %r245, %r244, 4, 14; 2026-02-21T09:47:27.6764599Z cvt.u64.u32 %rd90, %r245; 
2026-02-21T09:47:27.6764810Z or.b64 %rd73, %rd90, 4611686293338849280; 2026-02-21T09:47:27.6764984Z // begin inline asm 2026-02-21T09:47:27.6765210Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd72, %rd73, %r218, %p98; 2026-02-21T09:47:27.6765464Z // end inline asm 2026-02-21T09:47:27.6765603Z add.s32 %r246, %r56, 96; 2026-02-21T09:47:27.6765767Z bfe.u32 %r247, %r246, 4, 14; 2026-02-21T09:47:27.6765921Z cvt.u64.u32 %rd91, %r247; 2026-02-21T09:47:27.6766094Z or.b64 %rd74, %rd91, 4611686293439512576; 2026-02-21T09:47:27.6766267Z add.s32 %r248, %r56, 131168; 2026-02-21T09:47:27.6766428Z bfe.u32 %r249, %r248, 4, 14; 2026-02-21T09:47:27.6766583Z cvt.u64.u32 %rd92, %r249; 2026-02-21T09:47:27.6766751Z or.b64 %rd75, %rd92, 4611686293338849280; 2026-02-21T09:47:27.6766921Z // begin inline asm 2026-02-21T09:47:27.6767144Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd74, %rd75, %r218, %p98; 2026-02-21T09:47:27.6767413Z // end inline asm 2026-02-21T09:47:27.6767563Z add.s32 %r250, %r56, 16384; 2026-02-21T09:47:27.6767741Z bfe.u32 %r251, %r250, 4, 14; 2026-02-21T09:47:27.6767939Z cvt.u64.u32 %rd93, %r251; 2026-02-21T09:47:27.6768110Z or.b64 %rd76, %rd93, 4611686293439512576; 2026-02-21T09:47:27.6768284Z // begin inline asm 2026-02-21T09:47:27.6768519Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd76, %rd69, %r218, %p96; 2026-02-21T09:47:27.6768784Z // end inline asm 2026-02-21T09:47:27.6768927Z add.s32 %r252, %r56, 16416; 2026-02-21T09:47:27.6769095Z bfe.u32 %r253, %r252, 4, 14; 2026-02-21T09:47:27.6769253Z cvt.u64.u32 %rd94, %r253; 2026-02-21T09:47:27.6769425Z or.b64 %rd78, %rd94, 4611686293439512576; 2026-02-21T09:47:27.6769603Z // begin inline asm 2026-02-21T09:47:27.6769837Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd78, %rd71, %r218, %p98; 2026-02-21T09:47:27.6770093Z // end inline asm 2026-02-21T09:47:27.6770245Z add.s32 %r254, %r56, 16448; 2026-02-21T09:47:27.6770414Z bfe.u32 %r255, %r254, 4, 14; 
2026-02-21T09:47:27.6770606Z cvt.u64.u32 %rd95, %r255; 2026-02-21T09:47:27.6770785Z or.b64 %rd80, %rd95, 4611686293439512576; 2026-02-21T09:47:27.6770964Z // begin inline asm 2026-02-21T09:47:27.6771196Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd80, %rd73, %r218, %p98; 2026-02-21T09:47:27.6771449Z // end inline asm 2026-02-21T09:47:27.6771603Z add.s32 %r256, %r56, 16480; 2026-02-21T09:47:27.6771763Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T09:47:27.6771929Z cvt.u64.u32 %rd96, %r257; 2026-02-21T09:47:27.6772101Z or.b64 %rd82, %rd96, 4611686293439512576; 2026-02-21T09:47:27.6772278Z // begin inline asm 2026-02-21T09:47:27.6772541Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd82, %rd75, %r218, %p98; 2026-02-21T09:47:27.6772797Z // end inline asm 2026-02-21T09:47:27.6772947Z add.s32 %r258, %r56, 196640; 2026-02-21T09:47:27.6773104Z cvt.u64.u32 %rd84, %r258; 2026-02-21T09:47:27.6773170Z // begin inline asm 2026-02-21T09:47:27.6773304Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd84]; 2026-02-21T09:47:27.6773362Z // end inline asm 2026-02-21T09:47:27.6773421Z $L__BB0_2: 2026-02-21T09:47:27.6773617Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6773686Z setp.gt.s32 %p116, %r5, 3; 2026-02-21T09:47:27.6773742Z bar.sync 0; 2026-02-21T09:47:27.6773820Z and.pred %p113, %p164, %p116; 2026-02-21T09:47:27.6773878Z // begin inline asm 2026-02-21T09:47:27.6774000Z @%p113 mbarrier.arrive.expect_tx.shared.b64 _, [%r259], 40960; 2026-02-21T09:47:27.6774069Z // end inline asm 2026-02-21T09:47:27.6774249Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6774307Z bar.sync 0; 2026-02-21T09:47:27.6774378Z elect.sync %r271|%p119, -1; 2026-02-21T09:47:27.6774456Z and.pred %p120, %p116, %p119; 2026-02-21T09:47:27.6774519Z and.pred %p114, %p3, %p120; 2026-02-21T09:47:27.6774607Z add.s32 %r260, %r56, 98304; 2026-02-21T09:47:27.6774709Z mov.b32 
%r571, 192; 2026-02-21T09:47:27.6774775Z // begin inline asm 2026-02-21T09:47:27.6775026Z @%p114 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r260], [%rd58, {%r571, %r567}], [%r259]; 2026-02-21T09:47:27.6775093Z // end inline asm 2026-02-21T09:47:27.6775261Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6775317Z bar.sync 0; 2026-02-21T09:47:27.6775378Z elect.sync %r272|%p121, -1; 2026-02-21T09:47:27.6775451Z and.pred %p122, %p116, %p121; 2026-02-21T09:47:27.6775513Z and.pred %p115, %p3, %p122; 2026-02-21T09:47:27.6775573Z add.s32 %r264, %r56, 155648; 2026-02-21T09:47:27.6775638Z // begin inline asm 2026-02-21T09:47:27.6775885Z @%p115 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd59, {%r571, %r565}], [%r259]; 2026-02-21T09:47:27.6775939Z // end inline asm 2026-02-21T09:47:27.6776118Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6776209Z sub.s32 %r10, 31, %r4; 2026-02-21T09:47:27.6776270Z setp.lt.s32 %p123, %r10, 1; 2026-02-21T09:47:27.6776330Z @%p123 bra $L__BB0_11; 2026-02-21T09:47:27.6776417Z // %bb.3: // %.lr.ph 2026-02-21T09:47:27.6776584Z .loc 1 0 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:0:84 2026-02-21T09:47:27.6776642Z sub.s32 %r11, 28, %r4; 2026-02-21T09:47:27.6776711Z shl.b32 %r279, %r1, 7; 2026-02-21T09:47:27.6776771Z and.b32 %r280, %r279, 32640; 2026-02-21T09:47:27.6776833Z shl.b32 %r281, %r1, 4; 2026-02-21T09:47:27.6776908Z and.b32 %r282, %r281, 112; 2026-02-21T09:47:27.6776968Z or.b32 %r283, %r280, %r282; 2026-02-21T09:47:27.6777029Z add.s32 %r285, %r56, 163840; 2026-02-21T09:47:27.6777087Z add.s32 %r12, %r285, %r283; 2026-02-21T09:47:27.6777155Z xor.b32 %r286, %r283, 16; 2026-02-21T09:47:27.6777213Z add.s32 %r13, %r285, %r286; 2026-02-21T09:47:27.6777292Z xor.b32 %r287, %r283, 32; 2026-02-21T09:47:27.6777360Z add.s32 %r14, %r285, %r287; 
2026-02-21T09:47:27.6777419Z xor.b32 %r288, %r283, 48; 2026-02-21T09:47:27.6777475Z add.s32 %r15, %r285, %r288; 2026-02-21T09:47:27.6777532Z xor.b32 %r289, %r283, 64; 2026-02-21T09:47:27.6777597Z add.s32 %r16, %r285, %r289; 2026-02-21T09:47:27.6777652Z xor.b32 %r290, %r283, 80; 2026-02-21T09:47:27.6777709Z add.s32 %r17, %r285, %r290; 2026-02-21T09:47:27.6777774Z xor.b32 %r291, %r283, 96; 2026-02-21T09:47:27.6777829Z add.s32 %r18, %r285, %r291; 2026-02-21T09:47:27.6777888Z xor.b32 %r292, %r283, 112; 2026-02-21T09:47:27.6777971Z add.s32 %r19, %r285, %r292; 2026-02-21T09:47:27.6778038Z add.s32 %r573, %r56, 196640; 2026-02-21T09:47:27.6778098Z mov.pred %p171, -1; 2026-02-21T09:47:27.6778153Z mov.b32 %r576, 3; 2026-02-21T09:47:27.6778215Z mov.b32 %r572, 0; 2026-02-21T09:47:27.6778269Z mov.b32 %r570, 1; 2026-02-21T09:47:27.6778323Z mov.b32 %r569, 2; 2026-02-21T09:47:27.6778381Z mov.b32 %r566, %r565; 2026-02-21T09:47:27.6778448Z mov.b32 %r568, %r567; 2026-02-21T09:47:27.6778505Z mov.b32 %r574, %r572; 2026-02-21T09:47:27.6778561Z mov.b32 %r575, %r572; 2026-02-21T09:47:27.6778626Z mov.b32 %r577, %r570; 2026-02-21T09:47:27.6778681Z mov.b32 %r578, %r572; 2026-02-21T09:47:27.6778736Z mov.b32 %r579, %r567; 2026-02-21T09:47:27.6778788Z mov.b32 %r580, %r565; 2026-02-21T09:47:27.6778854Z mov.b32 %r582, %r576; 2026-02-21T09:47:27.6778908Z mov.b32 %r583, %r572; 2026-02-21T09:47:27.6778961Z mov.b32 %r584, %r580; 2026-02-21T09:47:27.6779027Z mov.b32 %r585, %r579; 2026-02-21T09:47:27.6779086Z bra.uni $L__BB0_4; 2026-02-21T09:47:27.6779190Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6779256Z selp.b32 %r577, 0, %r372, %p153; 2026-02-21T09:47:27.6779328Z selp.b32 %r373, 1, 0, %p153; 2026-02-21T09:47:27.6779384Z xor.b32 %r578, %r588, %r373; 2026-02-21T09:47:27.6779582Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6779653Z add.s32 %r583, %r583, 1; 2026-02-21T09:47:27.6779720Z setp.lt.s32 %p162, %r583, %r10; 
2026-02-21T09:47:27.6779776Z mov.b32 %r565, %r580; 2026-02-21T09:47:27.6779842Z mov.b32 %r566, %r20; 2026-02-21T09:47:27.6779898Z mov.b32 %r567, %r579; 2026-02-21T09:47:27.6779953Z mov.b32 %r568, %r22; 2026-02-21T09:47:27.6780006Z mov.b32 %r569, %r582; 2026-02-21T09:47:27.6780069Z mov.b32 %r570, %r24; 2026-02-21T09:47:27.6780125Z mov.b32 %r572, %r588; 2026-02-21T09:47:27.6780177Z mov.b32 %r573, %r587; 2026-02-21T09:47:27.6780239Z mov.b32 %r579, %r585; 2026-02-21T09:47:27.6780295Z mov.b32 %r580, %r584; 2026-02-21T09:47:27.6780349Z mov.b32 %r582, %r39; 2026-02-21T09:47:27.6780404Z @%p162 bra $L__BB0_4; 2026-02-21T09:47:27.6780468Z bra.uni $L__BB0_11; 2026-02-21T09:47:27.6780570Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:27.6780739Z .loc 1 0 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:0:84 2026-02-21T09:47:27.6780824Z mov.b32 %r588, %r578; 2026-02-21T09:47:27.6780878Z mov.b32 %r24, %r569; 2026-02-21T09:47:27.6780933Z mov.b32 %r22, %r567; 2026-02-21T09:47:27.6780995Z mov.b32 %r20, %r565; 2026-02-21T09:47:27.6781159Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6781220Z add.s32 %r293, %r582, 1; 2026-02-21T09:47:27.6781281Z setp.eq.b32 %p125, %r582, 31; 2026-02-21T09:47:27.6781353Z selp.b32 %r39, 0, %r293, %p125; 2026-02-21T09:47:27.6781415Z setp.ne.b32 %p126, %r39, 0; 2026-02-21T09:47:27.6781472Z @%p126 bra $L__BB0_6; 2026-02-21T09:47:27.6781575Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6781632Z add.s32 %r586, %r586, 1; 2026-02-21T09:47:27.6781802Z .loc 1 39 35 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:39:35 2026-02-21T09:47:27.6781879Z shr.s32 %r294, %r586, 31; 2026-02-21T09:47:27.6781947Z shr.u32 %r295, %r294, 27; 2026-02-21T09:47:27.6782004Z add.s32 %r296, %r586, %r295; 2026-02-21T09:47:27.6782061Z shr.s32 %r297, %r296, 5; 2026-02-21T09:47:27.6782230Z .loc 1 40 33 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:40:33 
2026-02-21T09:47:27.6782284Z shl.b32 %r298, %r297, 2; 2026-02-21T09:47:27.6782442Z .loc 1 41 39 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:39 2026-02-21T09:47:27.6782512Z sub.s32 %r299, 192, %r298; 2026-02-21T09:47:27.6782673Z .loc 1 41 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:52 2026-02-21T09:47:27.6782754Z min.s32 %r300, %r299, 4; 2026-02-21T09:47:27.6782928Z .loc 1 42 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:45 2026-02-21T09:47:27.6782987Z and.b32 %r301, %r296, -32; 2026-02-21T09:47:27.6783046Z sub.s32 %r302, %r586, %r301; 2026-02-21T09:47:27.6783212Z .loc 1 43 51 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:43:51 2026-02-21T09:47:27.6783282Z div.s32 %r303, %r302, %r300; 2026-02-21T09:47:27.6783448Z .loc 1 42 64 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:64 2026-02-21T09:47:27.6783509Z mul.lo.s32 %r304, %r303, %r300; 2026-02-21T09:47:27.6783575Z sub.s32 %r305, %r302, %r304; 2026-02-21T09:47:27.6783742Z .loc 1 42 30 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:30 2026-02-21T09:47:27.6783800Z add.s32 %r306, %r305, %r298; 2026-02-21T09:47:27.6783974Z .loc 1 44 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:44:27 2026-02-21T09:47:27.6784032Z shl.b32 %r584, %r306, 6; 2026-02-21T09:47:27.6784197Z .loc 1 45 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:45:27 2026-02-21T09:47:27.6784278Z shl.b32 %r585, %r303, 8; 2026-02-21T09:47:27.6784387Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6784551Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6784609Z add.s32 %r309, %r575, 1; 2026-02-21T09:47:27.6784705Z setp.gt.s32 %p128, %r309, 3; 2026-02-21T09:47:27.6784770Z selp.b32 %r575, 0, %r309, %p128; 2026-02-21T09:47:27.6784833Z selp.b32 %r310, 1, 0, %p128; 2026-02-21T09:47:27.6784897Z xor.b32 %r574, %r574, %r310; 2026-02-21T09:47:27.6784954Z shl.b32 %r311, %r575, 
3; 2026-02-21T09:47:27.6785012Z add.s32 %r313, %r56, %r311; 2026-02-21T09:47:27.6785072Z add.s32 %r307, %r313, 196608; 2026-02-21T09:47:27.6785137Z bar.sync 0; 2026-02-21T09:47:27.6785194Z // begin inline asm 2026-02-21T09:47:27.6785246Z 2026-02-21T09:47:27.6785306Z { 2026-02-21T09:47:27.6785368Z .reg .pred complete; 2026-02-21T09:47:27.6785421Z waitLoop: 2026-02-21T09:47:27.6785547Z mbarrier.try_wait.parity.shared.b64 complete, [%r307], %r574; 2026-02-21T09:47:27.6785623Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.6785714Z } 2026-02-21T09:47:27.6785719Z 2026-02-21T09:47:27.6785779Z // end inline asm 2026-02-21T09:47:27.6785850Z shl.b32 %r314, %r577, 3; 2026-02-21T09:47:27.6785908Z add.s32 %r315, %r56, %r314; 2026-02-21T09:47:27.6785968Z add.s32 %r587, %r315, 196640; 2026-02-21T09:47:27.6786143Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6786200Z @%p94 bra $L__BB0_8; 2026-02-21T09:47:27.6786292Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6786461Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6786531Z shl.b32 %r332, %r575, 15; 2026-02-21T09:47:27.6786589Z add.s32 %r334, %r56, %r332; 2026-02-21T09:47:27.6786751Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6786847Z shl.b32 %r335, %r575, 13; 2026-02-21T09:47:27.6786907Z add.s32 %r336, %r56, %r335; 2026-02-21T09:47:27.6786967Z add.s32 %r337, %r336, 131072; 2026-02-21T09:47:27.6787144Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6787208Z elect.sync %r338|%p130, -1; 2026-02-21T09:47:27.6787267Z bfe.u32 %r339, %r334, 4, 14; 2026-02-21T09:47:27.6787328Z cvt.u64.u32 %rd116, %r339; 2026-02-21T09:47:27.6787416Z or.b64 %rd99, %rd116, 4611686293439512576; 2026-02-21T09:47:27.6787473Z bfe.u32 %r340, %r337, 4, 14; 2026-02-21T09:47:27.6787564Z cvt.u64.u32 %rd117, %r340; 
2026-02-21T09:47:27.6787647Z or.b64 %rd100, %rd117, 4611686293338849280; 2026-02-21T09:47:27.6787704Z mov.b32 %r317, 135266320; 2026-02-21T09:47:27.6787765Z // begin inline asm 2026-02-21T09:47:27.6787918Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd99, %rd100, %r317, %p171; 2026-02-21T09:47:27.6787975Z // end inline asm 2026-02-21T09:47:27.6788037Z add.s32 %r341, %r334, 32; 2026-02-21T09:47:27.6788096Z bfe.u32 %r342, %r341, 4, 14; 2026-02-21T09:47:27.6788167Z cvt.u64.u32 %rd118, %r342; 2026-02-21T09:47:27.6788238Z or.b64 %rd101, %rd118, 4611686293439512576; 2026-02-21T09:47:27.6788295Z add.s32 %r343, %r336, 131104; 2026-02-21T09:47:27.6788360Z bfe.u32 %r344, %r343, 4, 14; 2026-02-21T09:47:27.6788418Z cvt.u64.u32 %rd119, %r344; 2026-02-21T09:47:27.6788484Z or.b64 %rd102, %rd119, 4611686293338849280; 2026-02-21T09:47:27.6788551Z mov.pred %p131, -1; 2026-02-21T09:47:27.6788608Z // begin inline asm 2026-02-21T09:47:27.6788748Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd101, %rd102, %r317, %p131; 2026-02-21T09:47:27.6788811Z // end inline asm 2026-02-21T09:47:27.6788865Z add.s32 %r345, %r334, 64; 2026-02-21T09:47:27.6788921Z bfe.u32 %r346, %r345, 4, 14; 2026-02-21T09:47:27.6788985Z cvt.u64.u32 %rd120, %r346; 2026-02-21T09:47:27.6789086Z or.b64 %rd103, %rd120, 4611686293439512576; 2026-02-21T09:47:27.6789148Z add.s32 %r347, %r336, 131136; 2026-02-21T09:47:27.6789206Z bfe.u32 %r348, %r347, 4, 14; 2026-02-21T09:47:27.6789277Z cvt.u64.u32 %rd121, %r348; 2026-02-21T09:47:27.6789342Z or.b64 %rd104, %rd121, 4611686293338849280; 2026-02-21T09:47:27.6789400Z // begin inline asm 2026-02-21T09:47:27.6789548Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd103, %rd104, %r317, %p131; 2026-02-21T09:47:27.6789602Z // end inline asm 2026-02-21T09:47:27.6789657Z add.s32 %r349, %r334, 96; 2026-02-21T09:47:27.6789712Z bfe.u32 %r350, %r349, 4, 14; 2026-02-21T09:47:27.6789777Z cvt.u64.u32 %rd122, %r350; 2026-02-21T09:47:27.6789840Z or.b64 
%rd105, %rd122, 4611686293439512576; 2026-02-21T09:47:27.6789897Z add.s32 %r351, %r336, 131168; 2026-02-21T09:47:27.6789958Z bfe.u32 %r352, %r351, 4, 14; 2026-02-21T09:47:27.6790013Z cvt.u64.u32 %rd123, %r352; 2026-02-21T09:47:27.6790077Z or.b64 %rd106, %rd123, 4611686293338849280; 2026-02-21T09:47:27.6790139Z // begin inline asm 2026-02-21T09:47:27.6790276Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd105, %rd106, %r317, %p131; 2026-02-21T09:47:27.6790356Z // end inline asm 2026-02-21T09:47:27.6790412Z add.s32 %r353, %r334, 16384; 2026-02-21T09:47:27.6790473Z bfe.u32 %r354, %r353, 4, 14; 2026-02-21T09:47:27.6790528Z cvt.u64.u32 %rd124, %r354; 2026-02-21T09:47:27.6790592Z or.b64 %rd107, %rd124, 4611686293439512576; 2026-02-21T09:47:27.6790653Z // begin inline asm 2026-02-21T09:47:27.6790788Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd107, %rd100, %r317, %p171; 2026-02-21T09:47:27.6790840Z // end inline asm 2026-02-21T09:47:27.6790907Z add.s32 %r355, %r334, 16416; 2026-02-21T09:47:27.6790961Z bfe.u32 %r356, %r355, 4, 14; 2026-02-21T09:47:27.6791017Z cvt.u64.u32 %rd125, %r356; 2026-02-21T09:47:27.6791082Z or.b64 %rd109, %rd125, 4611686293439512576; 2026-02-21T09:47:27.6791144Z // begin inline asm 2026-02-21T09:47:27.6791303Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd109, %rd102, %r317, %p131; 2026-02-21T09:47:27.6791358Z // end inline asm 2026-02-21T09:47:27.6791422Z add.s32 %r357, %r334, 16448; 2026-02-21T09:47:27.6791477Z bfe.u32 %r358, %r357, 4, 14; 2026-02-21T09:47:27.6791534Z cvt.u64.u32 %rd126, %r358; 2026-02-21T09:47:27.6791598Z or.b64 %rd111, %rd126, 4611686293439512576; 2026-02-21T09:47:27.6791660Z // begin inline asm 2026-02-21T09:47:27.6791795Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd111, %rd104, %r317, %p131; 2026-02-21T09:47:27.6791847Z // end inline asm 2026-02-21T09:47:27.6791910Z add.s32 %r359, %r334, 16480; 2026-02-21T09:47:27.6791986Z bfe.u32 %r360, %r359, 4, 14; 
2026-02-21T09:47:27.6792041Z cvt.u64.u32 %rd127, %r360; 2026-02-21T09:47:27.6792114Z or.b64 %rd113, %rd127, 4611686293439512576; 2026-02-21T09:47:27.6792167Z // begin inline asm 2026-02-21T09:47:27.6792299Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd113, %rd106, %r317, %p131; 2026-02-21T09:47:27.6792354Z // end inline asm 2026-02-21T09:47:27.6792418Z cvt.u64.u32 %rd115, %r587; 2026-02-21T09:47:27.6792471Z // begin inline asm 2026-02-21T09:47:27.6792598Z @%p130 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd115]; 2026-02-21T09:47:27.6792658Z // end inline asm 2026-02-21T09:47:27.6792750Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6792916Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6792982Z setp.eq.b32 %p149, %r39, 0; 2026-02-21T09:47:27.6793043Z setp.lt.s32 %p150, %r583, %r11; 2026-02-21T09:47:27.6793206Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6793260Z // begin inline asm 2026-02-21T09:47:27.6793314Z 2026-02-21T09:47:27.6793368Z { 2026-02-21T09:47:27.6793435Z .reg .pred complete; 2026-02-21T09:47:27.6793488Z waitLoop: 2026-02-21T09:47:27.6793625Z mbarrier.try_wait.parity.shared.b64 complete, [%r573], %r572; 2026-02-21T09:47:27.6793690Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.6793748Z } 2026-02-21T09:47:27.6793753Z 2026-02-21T09:47:27.6793806Z // end inline asm 2026-02-21T09:47:27.6793861Z add.s32 %r372, %r577, 1; 2026-02-21T09:47:27.6793927Z setp.gt.s32 %p153, %r372, 1; 2026-02-21T09:47:27.6794089Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6794145Z add.s32 %r374, %r571, 64; 2026-02-21T09:47:27.6794206Z add.s32 %r375, %r576, 1; 2026-02-21T09:47:27.6794264Z setp.gt.s32 %p154, %r375, 3; 2026-02-21T09:47:27.6794324Z selp.b32 %r576, 0, %r375, %p154; 2026-02-21T09:47:27.6794385Z selp.b32 %r571, 0, %r374, %p149; 2026-02-21T09:47:27.6794449Z shl.b32 
%r376, %r576, 3; 2026-02-21T09:47:27.6794504Z add.s32 %r378, %r56, %r376; 2026-02-21T09:47:27.6794560Z add.s32 %r367, %r378, 196608; 2026-02-21T09:47:27.6794619Z bar.sync 0; 2026-02-21T09:47:27.6794714Z and.pred %p146, %p164, %p150; 2026-02-21T09:47:27.6794773Z // begin inline asm 2026-02-21T09:47:27.6794885Z @%p146 mbarrier.arrive.expect_tx.shared.b64 _, [%r367], 40960; 2026-02-21T09:47:27.6794984Z // end inline asm 2026-02-21T09:47:27.6795150Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:27.6795205Z shl.b32 %r379, %r576, 15; 2026-02-21T09:47:27.6795268Z add.s32 %r364, %r56, %r379; 2026-02-21T09:47:27.6795321Z bar.sync 0; 2026-02-21T09:47:27.6795382Z elect.sync %r380|%p155, -1; 2026-02-21T09:47:27.6795452Z and.pred %p156, %p150, %p155; 2026-02-21T09:47:27.6795512Z and.pred %p147, %p3, %p156; 2026-02-21T09:47:27.6795568Z // begin inline asm 2026-02-21T09:47:27.6795811Z @%p147 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r364], [%rd58, {%r571, %r585}], [%r367]; 2026-02-21T09:47:27.6795873Z // end inline asm 2026-02-21T09:47:27.6796071Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:27.6796129Z shl.b32 %r381, %r576, 13; 2026-02-21T09:47:27.6796195Z add.s32 %r382, %r56, %r381; 2026-02-21T09:47:27.6796252Z add.s32 %r368, %r382, 131072; 2026-02-21T09:47:27.6796304Z bar.sync 0; 2026-02-21T09:47:27.6796372Z elect.sync %r383|%p157, -1; 2026-02-21T09:47:27.6796431Z and.pred %p158, %p150, %p157; 2026-02-21T09:47:27.6796490Z and.pred %p148, %p3, %p158; 2026-02-21T09:47:27.6796544Z // begin inline asm 2026-02-21T09:47:27.6796790Z @%p148 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r368], [%rd59, {%r571, %r584}], [%r367]; 2026-02-21T09:47:27.6796870Z // end inline asm 2026-02-21T09:47:27.6797036Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6797102Z setp.ne.b32 
%p171, %r570, 31; 2026-02-21T09:47:27.6797161Z @%p171 bra $L__BB0_10; 2026-02-21T09:47:27.6797252Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.6797417Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6797473Z // begin inline asm 2026-02-21T09:47:27.6797521Z 2026-02-21T09:47:27.6797568Z { 2026-02-21T09:47:27.6797631Z .reg .pred complete; 2026-02-21T09:47:27.6797682Z waitLoop: 2026-02-21T09:47:27.6797793Z mbarrier.try_wait.parity.shared.b64 complete, [%r587], %r588; 2026-02-21T09:47:27.6797861Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.6797908Z } 2026-02-21T09:47:27.6797911Z 2026-02-21T09:47:27.6797963Z // end inline asm 2026-02-21T09:47:27.6798022Z // begin inline asm 2026-02-21T09:47:27.6798300Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401}, [%r81 + 0]; 2026-02-21T09:47:27.6798352Z // end inline asm 2026-02-21T09:47:27.6798404Z // begin inline asm 2026-02-21T09:47:27.6798704Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418}, [%r81 + 16]; 2026-02-21T09:47:27.6798761Z // end inline asm 2026-02-21T09:47:27.6798814Z // begin inline asm 2026-02-21T09:47:27.6799075Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435}, [%r81 + 32]; 2026-02-21T09:47:27.6799127Z // end inline asm 2026-02-21T09:47:27.6799180Z // begin inline asm 2026-02-21T09:47:27.6799437Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452}, [%r81 + 48]; 2026-02-21T09:47:27.6799492Z // end inline asm 2026-02-21T09:47:27.6799545Z // begin inline asm 2026-02-21T09:47:27.6799618Z tcgen05.wait::ld.sync.aligned; 
2026-02-21T09:47:27.6799669Z // end inline asm 2026-02-21T09:47:27.6799729Z cvt.u64.u32 %rd131, %r386; 2026-02-21T09:47:27.6799786Z cvt.u64.u32 %rd132, %r387; 2026-02-21T09:47:27.6799853Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:47:27.6799913Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:47:27.6800110Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6800177Z mov.b64 {%r457, %r458}, %rd134; 2026-02-21T09:47:27.6800246Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:47:27.6800414Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6800472Z cvt.u64.u32 %rd135, %r388; 2026-02-21T09:47:27.6800535Z cvt.u64.u32 %rd136, %r389; 2026-02-21T09:47:27.6800592Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:47:27.6800650Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:47:27.6800821Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6800880Z mov.b64 {%r460, %r461}, %rd138; 2026-02-21T09:47:27.6800944Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:47:27.6801138Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6801198Z cvt.u64.u32 %rd139, %r390; 2026-02-21T09:47:27.6801255Z cvt.u64.u32 %rd140, %r391; 2026-02-21T09:47:27.6801311Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:47:27.6801377Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:47:27.6801537Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6801595Z mov.b64 {%r463, %r464}, %rd142; 2026-02-21T09:47:27.6801667Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:47:27.6801855Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6801914Z cvt.u64.u32 %rd143, %r392; 2026-02-21T09:47:27.6801980Z cvt.u64.u32 %rd144, %r393; 2026-02-21T09:47:27.6802038Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:47:27.6802096Z or.b64 %rd146, %rd143, %rd145; 
2026-02-21T09:47:27.6802258Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6802325Z mov.b64 {%r466, %r467}, %rd146; 2026-02-21T09:47:27.6802386Z cvt.rn.f16x2.f32 %r468, %r467, %r466; 2026-02-21T09:47:27.6802549Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6802615Z cvt.u64.u32 %rd147, %r394; 2026-02-21T09:47:27.6802671Z cvt.u64.u32 %rd148, %r395; 2026-02-21T09:47:27.6802726Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:47:27.6802790Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:47:27.6802953Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6803012Z mov.b64 {%r469, %r470}, %rd150; 2026-02-21T09:47:27.6803073Z cvt.rn.f16x2.f32 %r471, %r470, %r469; 2026-02-21T09:47:27.6803240Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6803333Z cvt.u64.u32 %rd151, %r396; 2026-02-21T09:47:27.6803390Z cvt.u64.u32 %rd152, %r397; 2026-02-21T09:47:27.6803453Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:47:27.6803510Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:47:27.6803670Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6803733Z mov.b64 {%r472, %r473}, %rd154; 2026-02-21T09:47:27.6803794Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T09:47:27.6803956Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6804011Z cvt.u64.u32 %rd155, %r398; 2026-02-21T09:47:27.6804073Z cvt.u64.u32 %rd156, %r399; 2026-02-21T09:47:27.6804128Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:47:27.6804184Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:47:27.6804354Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6804412Z mov.b64 {%r475, %r476}, %rd158; 2026-02-21T09:47:27.6804474Z cvt.rn.f16x2.f32 %r477, %r476, %r475; 2026-02-21T09:47:27.6804702Z .loc 1 56 52 // 
cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6804760Z cvt.u64.u32 %rd159, %r400; 2026-02-21T09:47:27.6804815Z cvt.u64.u32 %rd160, %r401; 2026-02-21T09:47:27.6804871Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:47:27.6804935Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:47:27.6805099Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6805156Z mov.b64 {%r478, %r479}, %rd162; 2026-02-21T09:47:27.6805228Z cvt.rn.f16x2.f32 %r480, %r479, %r478; 2026-02-21T09:47:27.6805387Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6805444Z cvt.u64.u32 %rd163, %r403; 2026-02-21T09:47:27.6805507Z cvt.u64.u32 %rd164, %r404; 2026-02-21T09:47:27.6805564Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:47:27.6805647Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:47:27.6805813Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6805880Z mov.b64 {%r481, %r482}, %rd166; 2026-02-21T09:47:27.6805942Z cvt.rn.f16x2.f32 %r483, %r482, %r481; 2026-02-21T09:47:27.6806106Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6806173Z cvt.u64.u32 %rd167, %r405; 2026-02-21T09:47:27.6806229Z cvt.u64.u32 %rd168, %r406; 2026-02-21T09:47:27.6806286Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:47:27.6806393Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:47:27.6806553Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6806610Z mov.b64 {%r484, %r485}, %rd170; 2026-02-21T09:47:27.6806670Z cvt.rn.f16x2.f32 %r486, %r485, %r484; 2026-02-21T09:47:27.6806844Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6806902Z cvt.u64.u32 %rd171, %r407; 2026-02-21T09:47:27.6806958Z cvt.u64.u32 %rd172, %r408; 2026-02-21T09:47:27.6807019Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:47:27.6807075Z or.b64 %rd174, %rd171, 
%rd173; 2026-02-21T09:47:27.6807235Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6807298Z mov.b64 {%r487, %r488}, %rd174; 2026-02-21T09:47:27.6807358Z cvt.rn.f16x2.f32 %r489, %r488, %r487; 2026-02-21T09:47:27.6807519Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6807575Z cvt.u64.u32 %rd175, %r409; 2026-02-21T09:47:27.6807638Z cvt.u64.u32 %rd176, %r410; 2026-02-21T09:47:27.6807694Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:47:27.6807749Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:47:27.6807942Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6808001Z mov.b64 {%r490, %r491}, %rd178; 2026-02-21T09:47:27.6808062Z cvt.rn.f16x2.f32 %r492, %r491, %r490; 2026-02-21T09:47:27.6808230Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6808286Z cvt.u64.u32 %rd179, %r411; 2026-02-21T09:47:27.6808340Z cvt.u64.u32 %rd180, %r412; 2026-02-21T09:47:27.6808396Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:47:27.6808460Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:47:27.6808625Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6808685Z mov.b64 {%r493, %r494}, %rd182; 2026-02-21T09:47:27.6808753Z cvt.rn.f16x2.f32 %r495, %r494, %r493; 2026-02-21T09:47:27.6808917Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6808973Z cvt.u64.u32 %rd183, %r413; 2026-02-21T09:47:27.6809037Z cvt.u64.u32 %rd184, %r414; 2026-02-21T09:47:27.6809094Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:47:27.6809180Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:47:27.6809341Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6809406Z mov.b64 {%r496, %r497}, %rd186; 2026-02-21T09:47:27.6809467Z cvt.rn.f16x2.f32 %r498, %r497, %r496; 2026-02-21T09:47:27.6809630Z .loc 1 56 52 
// cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6809693Z cvt.u64.u32 %rd187, %r415; 2026-02-21T09:47:27.6809748Z cvt.u64.u32 %rd188, %r416; 2026-02-21T09:47:27.6809808Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:47:27.6809873Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:47:27.6810038Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6810094Z mov.b64 {%r499, %r500}, %rd190; 2026-02-21T09:47:27.6810176Z cvt.rn.f16x2.f32 %r501, %r500, %r499; 2026-02-21T09:47:27.6810349Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6810408Z cvt.u64.u32 %rd191, %r417; 2026-02-21T09:47:27.6810465Z cvt.u64.u32 %rd192, %r418; 2026-02-21T09:47:27.6810533Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:47:27.6810591Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:47:27.6810752Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6810816Z mov.b64 {%r502, %r503}, %rd194; 2026-02-21T09:47:27.6810876Z cvt.rn.f16x2.f32 %r504, %r503, %r502; 2026-02-21T09:47:27.6811067Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6811123Z cvt.u64.u32 %rd195, %r420; 2026-02-21T09:47:27.6811186Z cvt.u64.u32 %rd196, %r421; 2026-02-21T09:47:27.6811244Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:47:27.6811301Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:47:27.6811477Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6811538Z mov.b64 {%r505, %r506}, %rd198; 2026-02-21T09:47:27.6811600Z cvt.rn.f16x2.f32 %r507, %r506, %r505; 2026-02-21T09:47:27.6811778Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6811836Z cvt.u64.u32 %rd199, %r422; 2026-02-21T09:47:27.6811893Z cvt.u64.u32 %rd200, %r423; 2026-02-21T09:47:27.6811949Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:47:27.6812017Z or.b64 %rd202, %rd199, 
%rd201; 2026-02-21T09:47:27.6812188Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6812247Z mov.b64 {%r508, %r509}, %rd202; 2026-02-21T09:47:27.6812319Z cvt.rn.f16x2.f32 %r510, %r509, %r508; 2026-02-21T09:47:27.6812515Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6812575Z cvt.u64.u32 %rd203, %r424; 2026-02-21T09:47:27.6812641Z cvt.u64.u32 %rd204, %r425; 2026-02-21T09:47:27.6812700Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:47:27.6812758Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:47:27.6812928Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6812995Z mov.b64 {%r511, %r512}, %rd206; 2026-02-21T09:47:27.6813057Z cvt.rn.f16x2.f32 %r513, %r512, %r511; 2026-02-21T09:47:27.6813226Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6813295Z cvt.u64.u32 %rd207, %r426; 2026-02-21T09:47:27.6813352Z cvt.u64.u32 %rd208, %r427; 2026-02-21T09:47:27.6813410Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:47:27.6813478Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:47:27.6813651Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6813712Z mov.b64 {%r514, %r515}, %rd210; 2026-02-21T09:47:27.6813774Z cvt.rn.f16x2.f32 %r516, %r515, %r514; 2026-02-21T09:47:27.6813970Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6814029Z cvt.u64.u32 %rd211, %r428; 2026-02-21T09:47:27.6814086Z cvt.u64.u32 %rd212, %r429; 2026-02-21T09:47:27.6814152Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:47:27.6814211Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:47:27.6814380Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6814448Z mov.b64 {%r517, %r518}, %rd214; 2026-02-21T09:47:27.6814511Z cvt.rn.f16x2.f32 %r519, %r518, %r517; 2026-02-21T09:47:27.6814711Z .loc 1 56 52 
// cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6814771Z cvt.u64.u32 %rd215, %r430; 2026-02-21T09:47:27.6814838Z cvt.u64.u32 %rd216, %r431; 2026-02-21T09:47:27.6814924Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:47:27.6814988Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:47:27.6815166Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6815225Z mov.b64 {%r520, %r521}, %rd218; 2026-02-21T09:47:27.6815287Z cvt.rn.f16x2.f32 %r522, %r521, %r520; 2026-02-21T09:47:27.6815464Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6815522Z cvt.u64.u32 %rd219, %r432; 2026-02-21T09:47:27.6815579Z cvt.u64.u32 %rd220, %r433; 2026-02-21T09:47:27.6815636Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:47:27.6815728Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:47:27.6815899Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6815957Z mov.b64 {%r523, %r524}, %rd222; 2026-02-21T09:47:27.6816027Z cvt.rn.f16x2.f32 %r525, %r524, %r523; 2026-02-21T09:47:27.6816200Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6816260Z cvt.u64.u32 %rd223, %r434; 2026-02-21T09:47:27.6816324Z cvt.u64.u32 %rd224, %r435; 2026-02-21T09:47:27.6816382Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:47:27.6816440Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:47:27.6816606Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6816672Z mov.b64 {%r526, %r527}, %rd226; 2026-02-21T09:47:27.6816734Z cvt.rn.f16x2.f32 %r528, %r527, %r526; 2026-02-21T09:47:27.6816903Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6816972Z cvt.u64.u32 %rd227, %r437; 2026-02-21T09:47:27.6817030Z cvt.u64.u32 %rd228, %r438; 2026-02-21T09:47:27.6817089Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:47:27.6817155Z or.b64 %rd230, %rd227, 
%rd229; 2026-02-21T09:47:27.6817349Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6817411Z mov.b64 {%r529, %r530}, %rd230; 2026-02-21T09:47:27.6817475Z cvt.rn.f16x2.f32 %r531, %r530, %r529; 2026-02-21T09:47:27.6817651Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6817711Z cvt.u64.u32 %rd231, %r439; 2026-02-21T09:47:27.6817768Z cvt.u64.u32 %rd232, %r440; 2026-02-21T09:47:27.6817835Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:47:27.6817893Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:47:27.6818060Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6818128Z mov.b64 {%r532, %r533}, %rd234; 2026-02-21T09:47:27.6818192Z cvt.rn.f16x2.f32 %r534, %r533, %r532; 2026-02-21T09:47:27.6818360Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6818420Z cvt.u64.u32 %rd235, %r441; 2026-02-21T09:47:27.6818487Z cvt.u64.u32 %rd236, %r442; 2026-02-21T09:47:27.6818545Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:47:27.6818631Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:47:27.6818810Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6818870Z mov.b64 {%r535, %r536}, %rd238; 2026-02-21T09:47:27.6818932Z cvt.rn.f16x2.f32 %r537, %r536, %r535; 2026-02-21T09:47:27.6819119Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6819181Z cvt.u64.u32 %rd239, %r443; 2026-02-21T09:47:27.6819251Z cvt.u64.u32 %rd240, %r444; 2026-02-21T09:47:27.6819309Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:47:27.6819376Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:47:27.6819541Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6819600Z mov.b64 {%r538, %r539}, %rd242; 2026-02-21T09:47:27.6819693Z cvt.rn.f16x2.f32 %r540, %r539, %r538; 2026-02-21T09:47:27.6819860Z .loc 1 56 52 
// cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6819917Z cvt.u64.u32 %rd243, %r445; 2026-02-21T09:47:27.6819980Z cvt.u64.u32 %rd244, %r446; 2026-02-21T09:47:27.6820037Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:47:27.6820093Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:47:27.6820255Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6820320Z mov.b64 {%r541, %r542}, %rd246; 2026-02-21T09:47:27.6820410Z cvt.rn.f16x2.f32 %r543, %r542, %r541; 2026-02-21T09:47:27.6820572Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6820636Z cvt.u64.u32 %rd247, %r447; 2026-02-21T09:47:27.6820690Z cvt.u64.u32 %rd248, %r448; 2026-02-21T09:47:27.6820745Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:47:27.6820804Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:47:27.6820970Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6821026Z mov.b64 {%r544, %r545}, %rd250; 2026-02-21T09:47:27.6821087Z cvt.rn.f16x2.f32 %r546, %r545, %r544; 2026-02-21T09:47:27.6821255Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6821311Z cvt.u64.u32 %rd251, %r449; 2026-02-21T09:47:27.6821365Z cvt.u64.u32 %rd252, %r450; 2026-02-21T09:47:27.6821426Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:47:27.6821481Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:47:27.6821640Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6821704Z mov.b64 {%r547, %r548}, %rd254; 2026-02-21T09:47:27.6821764Z cvt.rn.f16x2.f32 %r549, %r548, %r547; 2026-02-21T09:47:27.6821943Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6822001Z cvt.u64.u32 %rd255, %r451; 2026-02-21T09:47:27.6822067Z cvt.u64.u32 %rd256, %r452; 2026-02-21T09:47:27.6822122Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:47:27.6822179Z or.b64 %rd258, %rd255, 
%rd257; 2026-02-21T09:47:27.6822347Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:27.6822403Z mov.b64 {%r550, %r551}, %rd258; 2026-02-21T09:47:27.6822462Z cvt.rn.f16x2.f32 %r552, %r551, %r550; 2026-02-21T09:47:27.6822629Z .loc 1 59 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:59:45 2026-02-21T09:47:27.6822702Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.6822756Z bar.sync 0; 2026-02-21T09:47:27.6822851Z st.shared.v4.b32 [%r12], {%r459, %r462, %r465, %r468}; 2026-02-21T09:47:27.6822950Z st.shared.v4.b32 [%r13], {%r471, %r474, %r477, %r480}; 2026-02-21T09:47:27.6823037Z st.shared.v4.b32 [%r14], {%r483, %r486, %r489, %r492}; 2026-02-21T09:47:27.6823120Z st.shared.v4.b32 [%r15], {%r495, %r498, %r501, %r504}; 2026-02-21T09:47:27.6823229Z st.shared.v4.b32 [%r16], {%r507, %r510, %r513, %r516}; 2026-02-21T09:47:27.6823310Z st.shared.v4.b32 [%r17], {%r519, %r522, %r525, %r528}; 2026-02-21T09:47:27.6823391Z st.shared.v4.b32 [%r18], {%r531, %r534, %r537, %r540}; 2026-02-21T09:47:27.6823477Z st.shared.v4.b32 [%r19], {%r543, %r546, %r549, %r552}; 2026-02-21T09:47:27.6823532Z // begin inline asm 2026-02-21T09:47:27.6823603Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.6823656Z // end inline asm 2026-02-21T09:47:27.6823716Z bar.sync 0; 2026-02-21T09:47:27.6823778Z elect.sync %r553|%p161, -1; 2026-02-21T09:47:27.6823837Z and.pred %p159, %p3, %p161; 2026-02-21T09:47:27.6823897Z // begin inline asm 2026-02-21T09:47:27.6824075Z @%p159 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd130, {%r566, %r568}], [%r285]; 2026-02-21T09:47:27.6824129Z // end inline asm 2026-02-21T09:47:27.6824215Z cp.async.bulk.commit_group; 2026-02-21T09:47:27.6824279Z bra.uni $L__BB0_10; 2026-02-21T09:47:27.6824364Z $L__BB0_11: // %._crit_edge 2026-02-21T09:47:27.6824530Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6824608Z cp.async.bulk.wait_group.read 0; 
2026-02-21T09:47:27.6824661Z bar.sync 0; 2026-02-21T09:47:27.6824750Z @%p79 bra $L__BB0_13; 2026-02-21T09:47:27.6824809Z // %bb.12: 2026-02-21T09:47:27.6824967Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:27.6825049Z // begin inline asm 2026-02-21T09:47:27.6825097Z 2026-02-21T09:47:27.6825154Z { 2026-02-21T09:47:27.6825211Z .reg .pred complete; 2026-02-21T09:47:27.6825263Z waitLoop: 2026-02-21T09:47:27.6825386Z mbarrier.try_wait.parity.shared.b64 complete, [%r587], %r588; 2026-02-21T09:47:27.6825448Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.6825497Z } 2026-02-21T09:47:27.6825502Z 2026-02-21T09:47:27.6825554Z // end inline asm 2026-02-21T09:47:27.6825616Z $L__BB0_13: 2026-02-21T09:47:27.6825783Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:27.6825836Z // begin inline asm 2026-02-21T09:47:27.6825929Z @%p164 mbarrier.inval.shared::cta.b64 [%r151]; 2026-02-21T09:47:27.6825982Z // end inline asm 2026-02-21T09:47:27.6826033Z bar.sync 0; 2026-02-21T09:47:27.6826093Z // begin inline asm 2026-02-21T09:47:27.6826176Z @%p164 mbarrier.inval.shared::cta.b64 [%r152]; 2026-02-21T09:47:27.6826229Z // end inline asm 2026-02-21T09:47:27.6826282Z bar.sync 0; 2026-02-21T09:47:27.6826344Z // begin inline asm 2026-02-21T09:47:27.6826421Z @%p164 mbarrier.inval.shared::cta.b64 [%r153]; 2026-02-21T09:47:27.6826473Z // end inline asm 2026-02-21T09:47:27.6826531Z bar.sync 0; 2026-02-21T09:47:27.6826585Z // begin inline asm 2026-02-21T09:47:27.6826688Z @%p164 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:47:27.6826744Z // end inline asm 2026-02-21T09:47:27.6826811Z add.s32 %r561, %r56, 196640; 2026-02-21T09:47:27.6826866Z // begin inline asm 2026-02-21T09:47:27.6826942Z @%p164 mbarrier.inval.shared::cta.b64 [%r561]; 2026-02-21T09:47:27.6827001Z // end inline asm 2026-02-21T09:47:27.6827053Z bar.sync 0; 2026-02-21T09:47:27.6827108Z // begin inline asm 
2026-02-21T09:47:27.6827193Z @%p164 mbarrier.inval.shared::cta.b64 [%r150]; 2026-02-21T09:47:27.6827245Z // end inline asm 2026-02-21T09:47:27.6827404Z .loc 1 33 4 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:4 2026-02-21T09:47:27.6827460Z bar.sync 0; 2026-02-21T09:47:27.6827529Z // begin inline asm 2026-02-21T09:47:27.6827644Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r563, 128; 2026-02-21T09:47:27.6827695Z // end inline asm 2026-02-21T09:47:27.6827751Z ret; 2026-02-21T09:47:27.6827802Z $L__tmp1: 2026-02-21T09:47:27.6827854Z $L__func_end0: 2026-02-21T09:47:27.6827935Z // -- End function 2026-02-21T09:47:27.6827991Z } 2026-02-21T09:47:27.6828222Z .file 1 "/tmp/torchinductor_root/y7/cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py" 2026-02-21T09:47:27.6828281Z .section .debug_abbrev 2026-02-21T09:47:27.6828337Z { 2026-02-21T09:47:27.6828421Z .b8 1 // Abbreviation Code 2026-02-21T09:47:27.6828506Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:27.6828589Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:27.6828666Z .b8 37 // DW_AT_producer 2026-02-21T09:47:27.6828740Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.6828812Z .b8 19 // DW_AT_language 2026-02-21T09:47:27.6828891Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:27.6828962Z .b8 3 // DW_AT_name 2026-02-21T09:47:27.6829058Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.6829141Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:27.6829213Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:27.6829284Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:27.6829360Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.6829430Z .b8 0 // EOM(1) 2026-02-21T09:47:27.6829497Z .b8 0 // EOM(2) 2026-02-21T09:47:27.6829559Z .b8 0 // EOM(3) 2026-02-21T09:47:27.6829641Z } 2026-02-21T09:47:27.6829699Z .section .debug_info 2026-02-21T09:47:27.6829747Z { 2026-02-21T09:47:27.6829836Z .b32 104 // Length of Unit 2026-02-21T09:47:27.6829917Z .b8 2 // DWARF version number 2026-02-21T09:47:27.6829965Z .b8 0 2026-02-21T09:47:27.6830081Z .b32 .debug_abbrev 
// Offset Into Abbrev. Section 2026-02-21T09:47:27.6830173Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:27.6830271Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:27.6830347Z .b8 116 // DW_AT_producer 2026-02-21T09:47:27.6830406Z .b8 114 2026-02-21T09:47:27.6830456Z .b8 105 2026-02-21T09:47:27.6830506Z .b8 116 2026-02-21T09:47:27.6830561Z .b8 111 2026-02-21T09:47:27.6830610Z .b8 110 2026-02-21T09:47:27.6830658Z .b8 0 2026-02-21T09:47:27.6830730Z .b8 2 // DW_AT_language 2026-02-21T09:47:27.6830786Z .b8 0 2026-02-21T09:47:27.6830857Z .b8 99 // DW_AT_name 2026-02-21T09:47:27.6830906Z .b8 121 2026-02-21T09:47:27.6830963Z .b8 55 2026-02-21T09:47:27.6831012Z .b8 122 2026-02-21T09:47:27.6831060Z .b8 99 2026-02-21T09:47:27.6831109Z .b8 110 2026-02-21T09:47:27.6831165Z .b8 116 2026-02-21T09:47:27.6831235Z .b8 99 2026-02-21T09:47:27.6831286Z .b8 106 2026-02-21T09:47:27.6831342Z .b8 100 2026-02-21T09:47:27.6831391Z .b8 114 2026-02-21T09:47:27.6831440Z .b8 104 2026-02-21T09:47:27.6831488Z .b8 104 2026-02-21T09:47:27.6831540Z .b8 115 2026-02-21T09:47:27.6831588Z .b8 99 2026-02-21T09:47:27.6831637Z .b8 121 2026-02-21T09:47:27.6831684Z .b8 107 2026-02-21T09:47:27.6831741Z .b8 55 2026-02-21T09:47:27.6831789Z .b8 50 2026-02-21T09:47:27.6831838Z .b8 52 2026-02-21T09:47:27.6831892Z .b8 118 2026-02-21T09:47:27.6831939Z .b8 105 2026-02-21T09:47:27.6831988Z .b8 54 2026-02-21T09:47:27.6832034Z .b8 105 2026-02-21T09:47:27.6832088Z .b8 50 2026-02-21T09:47:27.6832135Z .b8 112 2026-02-21T09:47:27.6832184Z .b8 101 2026-02-21T09:47:27.6832238Z .b8 119 2026-02-21T09:47:27.6832285Z .b8 104 2026-02-21T09:47:27.6832333Z .b8 103 2026-02-21T09:47:27.6832381Z .b8 102 2026-02-21T09:47:27.6832435Z .b8 116 2026-02-21T09:47:27.6832483Z .b8 51 2026-02-21T09:47:27.6832530Z .b8 115 2026-02-21T09:47:27.6832579Z .b8 103 2026-02-21T09:47:27.6832634Z .b8 111 2026-02-21T09:47:27.6832685Z .b8 108 2026-02-21T09:47:27.6832734Z .b8 105 2026-02-21T09:47:27.6832788Z .b8 121 
2026-02-21T09:47:27.6832858Z .b8 119 2026-02-21T09:47:27.6832906Z .b8 114 2026-02-21T09:47:27.6832955Z .b8 99 2026-02-21T09:47:27.6833012Z .b8 111 2026-02-21T09:47:27.6833061Z .b8 52 2026-02-21T09:47:27.6833111Z .b8 121 2026-02-21T09:47:27.6833167Z .b8 118 2026-02-21T09:47:27.6833216Z .b8 100 2026-02-21T09:47:27.6833263Z .b8 117 2026-02-21T09:47:27.6833313Z .b8 107 2026-02-21T09:47:27.6833372Z .b8 115 2026-02-21T09:47:27.6833420Z .b8 98 2026-02-21T09:47:27.6833469Z .b8 54 2026-02-21T09:47:27.6833529Z .b8 46 2026-02-21T09:47:27.6833580Z .b8 112 2026-02-21T09:47:27.6833629Z .b8 121 2026-02-21T09:47:27.6833677Z .b8 0 2026-02-21T09:47:27.6833773Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:27.6833848Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:27.6833896Z .b8 116 2026-02-21T09:47:27.6833952Z .b8 109 2026-02-21T09:47:27.6834001Z .b8 112 2026-02-21T09:47:27.6834068Z .b8 47 2026-02-21T09:47:27.6834117Z .b8 116 2026-02-21T09:47:27.6834172Z .b8 111 2026-02-21T09:47:27.6834221Z .b8 114 2026-02-21T09:47:27.6834268Z .b8 99 2026-02-21T09:47:27.6834316Z .b8 104 2026-02-21T09:47:27.6834373Z .b8 105 2026-02-21T09:47:27.6834421Z .b8 110 2026-02-21T09:47:27.6834469Z .b8 100 2026-02-21T09:47:27.6834523Z .b8 117 2026-02-21T09:47:27.6834570Z .b8 99 2026-02-21T09:47:27.6834617Z .b8 116 2026-02-21T09:47:27.6834664Z .b8 111 2026-02-21T09:47:27.6834749Z .b8 114 2026-02-21T09:47:27.6834798Z .b8 95 2026-02-21T09:47:27.6834845Z .b8 114 2026-02-21T09:47:27.6834899Z .b8 111 2026-02-21T09:47:27.6834947Z .b8 111 2026-02-21T09:47:27.6835020Z .b8 116 2026-02-21T09:47:27.6835069Z .b8 47 2026-02-21T09:47:27.6835125Z .b8 121 2026-02-21T09:47:27.6835172Z .b8 55 2026-02-21T09:47:27.6835222Z .b8 0 2026-02-21T09:47:27.6835271Z } 2026-02-21T09:47:27.6835342Z .section .debug_macinfo { } 2026-02-21T09:47:27.6835346Z 2026-02-21T09:47:27.6835426Z ================================================================ 2026-02-21T09:47:27.6835529Z please share the reproducer above with Triton project. 
2026-02-21T09:47:27.8841322Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:27.8841338Z 2026-02-21T09:47:27.8846101Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:27.8846128Z 2026-02-21T09:47:27.8847811Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:27.8847825Z 2026-02-21T09:47:27.8852896Z `ptxas` stderr: 2026-02-21T09:47:27.8853188Z ================================================================ 2026-02-21T09:47:27.8854629Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 253 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:27.8854819Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.8854827Z 2026-02-21T09:47:27.8855225Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpe1u2qo1v.ptx -o /tmp/tmpe1u2qo1v.ptx.o 2026-02-21T09:47:27.8855231Z 2026-02-21T09:47:27.8855361Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:27.8855447Z Internal Triton PTX codegen error 2026-02-21T09:47:27.8855507Z `ptxas` stderr: 2026-02-21T09:47:27.8855831Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 253 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:27.8855933Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:27.8855938Z 2026-02-21T09:47:27.8856530Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpe1u2qo1v.ptx -o /tmp/tmpe1u2qo1v.ptx.o 2026-02-21T09:47:27.8856535Z 2026-02-21T09:47:27.8856538Z 2026-02-21T09:47:27.8856591Z // 2026-02-21T09:47:27.8856672Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:27.8856721Z // 2026-02-21T09:47:27.8856725Z 2026-02-21T09:47:27.8856779Z .version 8.7 2026-02-21T09:47:27.8856843Z .target sm_100a 2026-02-21T09:47:27.8856899Z .address_size 64 2026-02-21T09:47:27.8856904Z 2026-02-21T09:47:27.8857024Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:27.8857110Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:27.8857189Z // @_helion_matmul 2026-02-21T09:47:27.8857255Z .visible .entry _helion_matmul( 2026-02-21T09:47:27.8857416Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:27.8857521Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:27.8857618Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:27.8857711Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:27.8857819Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:27.8857870Z ) 2026-02-21T09:47:27.8857927Z .reqntid 128 2026-02-21T09:47:27.8857991Z .maxnreg 32 2026-02-21T09:47:27.8858042Z { 2026-02-21T09:47:27.8858105Z .reg .pred %p<176>; 2026-02-21T09:47:27.8858162Z .reg .b32 %r<818>; 2026-02-21T09:47:27.8858266Z .reg .b64 %rd<387>; 2026-02-21T09:47:27.8858452Z .loc 1 19 0 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:19:0 2026-02-21T09:47:27.8858506Z $L__func_begin0: 2026-02-21T09:47:27.8858680Z .loc 1 19 0 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:19:0 2026-02-21T09:47:27.8858686Z 
2026-02-21T09:47:27.8858738Z // %bb.0: 2026-02-21T09:47:27.8858822Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:47:27.8858884Z $L__tmp0: 2026-02-21T09:47:27.8859048Z .loc 1 19 0 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:19 2026-02-21T09:47:27.8859105Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:27.8859189Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:47:27.8859260Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:47:27.8859373Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:47:27.8859435Z mov.b32 %r56, global_smem; 2026-02-21T09:47:27.8859500Z // begin inline asm 2026-02-21T09:47:27.8859649Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r56], 128; 2026-02-21T09:47:27.8859706Z // end inline asm 2026-02-21T09:47:27.8859794Z ld.param.b64 %rd64, [_helion_matmul_param_3]; 2026-02-21T09:47:27.8859848Z bar.sync 0; 2026-02-21T09:47:27.8859954Z ld.shared.b32 %r792, [global_smem]; 2026-02-21T09:47:27.8860019Z bar.sync 0; 2026-02-21T09:47:27.8860077Z // begin inline asm 2026-02-21T09:47:27.8860199Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:27.8860252Z // end inline asm 2026-02-21T09:47:27.8860433Z .loc 1 21 67 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:21:67 2026-02-21T09:47:27.8860491Z mov.u32 %r815, %ctaid.x; 2026-02-21T09:47:27.8860549Z mov.u32 %r252, %ctaid.y; 2026-02-21T09:47:27.8860611Z mov.u32 %r253, %ctaid.z; 2026-02-21T09:47:27.8860667Z mov.u32 %r254, %nctaid.x; 2026-02-21T09:47:27.8860722Z mov.u32 %r255, %nctaid.y; 2026-02-21T09:47:27.8860791Z mad.lo.s32 %r256, %r253, %r255, %r252; 2026-02-21T09:47:27.8860860Z mad.lo.s32 %r257, %r256, %r254, %r815; 2026-02-21T09:47:27.8860918Z mul.lo.s32 %r258, %r257, 384; 2026-02-21T09:47:27.8860976Z cvt.s64.s32 %rd65, %r258; 2026-02-21T09:47:27.8861043Z add.s64 %rd19, %rd64, %rd65; 2026-02-21T09:47:27.8861100Z shl.b32 %r259, %r1, 2; 2026-02-21T09:47:27.8861161Z add.s32 %r57, %r56, %r259; 2026-02-21T09:47:27.8861223Z 
mov.b32 %r817, 0; 2026-02-21T09:47:27.8861304Z // begin inline asm 2026-02-21T09:47:27.8861372Z @%p3 st.shared.b32 [ %r57 + 0 ], %r817; 2026-02-21T09:47:27.8861424Z // end inline asm 2026-02-21T09:47:27.8861490Z bar.warp.sync -1; 2026-02-21T09:47:27.8861549Z setp.eq.b32 %p168, %r1, 0; 2026-02-21T09:47:27.8861606Z cvt.u64.u32 %rd4, %r56; 2026-02-21T09:47:27.8861668Z // begin inline asm 2026-02-21T09:47:27.8861836Z @%p168 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:47:27.8861891Z // end inline asm 2026-02-21T09:47:27.8861947Z // begin inline asm 2026-02-21T09:47:27.8862095Z @%p168 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.8862148Z // end inline asm 2026-02-21T09:47:27.8862201Z mov.b32 %r59, 64; 2026-02-21T09:47:27.8862262Z // begin inline asm 2026-02-21T09:47:27.8862436Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.8862490Z // end inline asm 2026-02-21T09:47:27.8862554Z mov.b32 %r60, 256; 2026-02-21T09:47:27.8862606Z // begin inline asm 2026-02-21T09:47:27.8862752Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:27.8862806Z // end inline asm 2026-02-21T09:47:27.8862867Z mov.b32 %r61, 2048; 2026-02-21T09:47:27.8862920Z // begin inline asm 2026-02-21T09:47:27.8863077Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:27.8863138Z // end inline asm 2026-02-21T09:47:27.8863191Z // begin inline asm 2026-02-21T09:47:27.8863377Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:27.8863444Z // end inline asm 2026-02-21T09:47:27.8863501Z mov.b64 %rd12, 4096; 2026-02-21T09:47:27.8863557Z // begin inline asm 2026-02-21T09:47:27.8863820Z @%p168 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:47:27.8863873Z // end 
inline asm 2026-02-21T09:47:27.8863925Z mov.b32 %r63, 1; 2026-02-21T09:47:27.8863977Z // begin inline asm 2026-02-21T09:47:27.8864149Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.8864201Z // end inline asm 2026-02-21T09:47:27.8864253Z // begin inline asm 2026-02-21T09:47:27.8864428Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.8864479Z // end inline asm 2026-02-21T09:47:27.8864532Z // begin inline asm 2026-02-21T09:47:27.8864740Z @%p168 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.8864797Z // end inline asm 2026-02-21T09:47:27.8864851Z // begin inline asm 2026-02-21T09:47:27.8865020Z @%p168 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8865073Z // end inline asm 2026-02-21T09:47:27.8865161Z // begin inline asm 2026-02-21T09:47:27.8865316Z @%p168 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.8865380Z // end inline asm 2026-02-21T09:47:27.8865434Z // begin inline asm 2026-02-21T09:47:27.8865577Z @%p168 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8865639Z // end inline asm 2026-02-21T09:47:27.8865692Z // begin inline asm 2026-02-21T09:47:27.8865958Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.8866018Z // end inline asm 2026-02-21T09:47:27.8866072Z // begin inline asm 2026-02-21T09:47:27.8866196Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:47:27.8866266Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.8866345Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.8866396Z // end inline asm 2026-02-21T09:47:27.8866449Z bar.sync 0; 2026-02-21T09:47:27.8866520Z cvta.global.u64 %rd58, 
%rd19; 2026-02-21T09:47:27.8866702Z .loc 1 22 68 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:22:68 2026-02-21T09:47:27.8866791Z add.s32 %r260, %r258, 128; 2026-02-21T09:47:27.8866857Z cvt.s64.s32 %rd66, %r260; 2026-02-21T09:47:27.8866916Z add.s64 %rd37, %rd64, %rd66; 2026-02-21T09:47:27.8866968Z bar.sync 0; 2026-02-21T09:47:27.8867022Z // begin inline asm 2026-02-21T09:47:27.8867095Z @%p3 st.shared.b32 [ %r57 + 0 ], %r817; 2026-02-21T09:47:27.8867148Z // end inline asm 2026-02-21T09:47:27.8867205Z bar.warp.sync -1; 2026-02-21T09:47:27.8867269Z // begin inline asm 2026-02-21T09:47:27.8867430Z @%p168 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:47:27.8867482Z // end inline asm 2026-02-21T09:47:27.8867537Z // begin inline asm 2026-02-21T09:47:27.8867680Z @%p168 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.8867735Z // end inline asm 2026-02-21T09:47:27.8867815Z // begin inline asm 2026-02-21T09:47:27.8867976Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.8868035Z // end inline asm 2026-02-21T09:47:27.8868094Z // begin inline asm 2026-02-21T09:47:27.8868252Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r59; 2026-02-21T09:47:27.8868307Z // end inline asm 2026-02-21T09:47:27.8868361Z // begin inline asm 2026-02-21T09:47:27.8868532Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:27.8868587Z // end inline asm 2026-02-21T09:47:27.8868670Z mov.b32 %r70, 12288; 2026-02-21T09:47:27.8868726Z // begin inline asm 2026-02-21T09:47:27.8868889Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r70; 2026-02-21T09:47:27.8868941Z // end inline asm 2026-02-21T09:47:27.8868995Z // begin inline asm 2026-02-21T09:47:27.8869169Z @%p168 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 
0 ], 0x0, %rd12; 2026-02-21T09:47:27.8869222Z // end inline asm 2026-02-21T09:47:27.8869276Z // begin inline asm 2026-02-21T09:47:27.8869444Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.8869496Z // end inline asm 2026-02-21T09:47:27.8869548Z // begin inline asm 2026-02-21T09:47:27.8869708Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.8869767Z // end inline asm 2026-02-21T09:47:27.8869819Z // begin inline asm 2026-02-21T09:47:27.8869966Z @%p168 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.8870026Z // end inline asm 2026-02-21T09:47:27.8870079Z // begin inline asm 2026-02-21T09:47:27.8870241Z @%p168 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8870299Z // end inline asm 2026-02-21T09:47:27.8870373Z // begin inline asm 2026-02-21T09:47:27.8870525Z @%p168 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.8870585Z // end inline asm 2026-02-21T09:47:27.8870637Z // begin inline asm 2026-02-21T09:47:27.8870777Z @%p168 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8870828Z // end inline asm 2026-02-21T09:47:27.8870886Z // begin inline asm 2026-02-21T09:47:27.8871141Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.8871192Z // end inline asm 2026-02-21T09:47:27.8871253Z // begin inline asm 2026-02-21T09:47:27.8871374Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:47:27.8871442Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.8871519Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.8871571Z // end inline asm 2026-02-21T09:47:27.8871623Z bar.sync 0; 2026-02-21T09:47:27.8871687Z cvta.global.u64 
%rd59, %rd37; 2026-02-21T09:47:27.8871869Z .loc 1 24 73 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:24:73 2026-02-21T09:47:27.8871949Z add.s32 %r261, %r258, 256; 2026-02-21T09:47:27.8872004Z cvt.s64.s32 %rd67, %r261; 2026-02-21T09:47:27.8872067Z add.s64 %rd55, %rd64, %rd67; 2026-02-21T09:47:27.8872117Z bar.sync 0; 2026-02-21T09:47:27.8872169Z // begin inline asm 2026-02-21T09:47:27.8872232Z @%p3 st.shared.b32 [ %r57 + 0 ], %r817; 2026-02-21T09:47:27.8872291Z // end inline asm 2026-02-21T09:47:27.8872347Z bar.warp.sync -1; 2026-02-21T09:47:27.8872399Z // begin inline asm 2026-02-21T09:47:27.8872563Z @%p168 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:47:27.8872616Z // end inline asm 2026-02-21T09:47:27.8872667Z // begin inline asm 2026-02-21T09:47:27.8872808Z @%p168 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:27.8872861Z // end inline asm 2026-02-21T09:47:27.8872937Z // begin inline asm 2026-02-21T09:47:27.8873084Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:27.8873144Z // end inline asm 2026-02-21T09:47:27.8873198Z // begin inline asm 2026-02-21T09:47:27.8873344Z @%p168 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:27.8873404Z // end inline asm 2026-02-21T09:47:27.8873456Z // begin inline asm 2026-02-21T09:47:27.8873610Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r70; 2026-02-21T09:47:27.8873666Z // end inline asm 2026-02-21T09:47:27.8873763Z // begin inline asm 2026-02-21T09:47:27.8873915Z @%p168 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:27.8873973Z // end inline asm 2026-02-21T09:47:27.8874030Z mov.b64 %rd48, 24576; 2026-02-21T09:47:27.8874085Z // begin inline asm 2026-02-21T09:47:27.8874257Z @%p168 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd4 + 0 ], 0x0, %rd48; 2026-02-21T09:47:27.8874321Z // end inline asm 2026-02-21T09:47:27.8874379Z // begin inline asm 2026-02-21T09:47:27.8874554Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:27.8874616Z // end inline asm 2026-02-21T09:47:27.8874704Z // begin inline asm 2026-02-21T09:47:27.8874894Z @%p168 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:27.8874957Z // end inline asm 2026-02-21T09:47:27.8875015Z // begin inline asm 2026-02-21T09:47:27.8875171Z @%p168 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:27.8875227Z // end inline asm 2026-02-21T09:47:27.8875289Z // begin inline asm 2026-02-21T09:47:27.8875459Z @%p168 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8875514Z // end inline asm 2026-02-21T09:47:27.8875607Z // begin inline asm 2026-02-21T09:47:27.8875766Z @%p168 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:27.8875826Z // end inline asm 2026-02-21T09:47:27.8875896Z // begin inline asm 2026-02-21T09:47:27.8876044Z @%p168 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:27.8876101Z // end inline asm 2026-02-21T09:47:27.8876158Z // begin inline asm 2026-02-21T09:47:27.8876427Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:27.8876482Z // end inline asm 2026-02-21T09:47:27.8876540Z // begin inline asm 2026-02-21T09:47:27.8876673Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:47:27.8876744Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:27.8876819Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:27.8876882Z // end inline asm 2026-02-21T09:47:27.8876936Z bar.sync 0; 2026-02-21T09:47:27.8877008Z 
cvta.global.u64 %rd130, %rd55; 2026-02-21T09:47:27.8877187Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8877287Z max.u32 %r262, %r815, 1535; 2026-02-21T09:47:27.8877347Z shl.b32 %r263, %r262, 5; 2026-02-21T09:47:27.8877411Z add.s32 %r4, %r263, -49120; 2026-02-21T09:47:27.8877477Z sub.s32 %r5, 49152, %r263; 2026-02-21T09:47:27.8877648Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8877707Z shr.u32 %r264, %r1, 5; 2026-02-21T09:47:27.8877790Z shfl.sync.idx.b32 %r6, %r264, 0, 31, -1; 2026-02-21T09:47:27.8877851Z shl.b32 %r265, %r6, 21; 2026-02-21T09:47:27.8877914Z and.b32 %r266, %r265, 6291456; 2026-02-21T09:47:27.8877973Z add.s32 %r81, %r266, %r792; 2026-02-21T09:47:27.8878042Z mov.pred %p102, -1; 2026-02-21T09:47:27.8878098Z // begin inline asm 2026-02-21T09:47:27.8878413Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 0], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8878481Z // end inline asm 2026-02-21T09:47:27.8878538Z // begin inline asm 2026-02-21T09:47:27.8878823Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 16], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8878886Z // end inline asm 2026-02-21T09:47:27.8878941Z // begin inline asm 2026-02-21T09:47:27.8879230Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 32], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8879317Z // end inline asm 2026-02-21T09:47:27.8879371Z // begin inline asm 2026-02-21T09:47:27.8879647Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 48], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8879704Z // end inline 
asm 2026-02-21T09:47:27.8879768Z // begin inline asm 2026-02-21T09:47:27.8880052Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 64], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8880106Z // end inline asm 2026-02-21T09:47:27.8880169Z // begin inline asm 2026-02-21T09:47:27.8880439Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 80], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8880492Z // end inline asm 2026-02-21T09:47:27.8880555Z // begin inline asm 2026-02-21T09:47:27.8880826Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 96], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8880881Z // end inline asm 2026-02-21T09:47:27.8880943Z // begin inline asm 2026-02-21T09:47:27.8881249Z @%p102 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 112], {%r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817, %r817}; 2026-02-21T09:47:27.8881307Z // end inline asm 2026-02-21T09:47:27.8881362Z // begin inline asm 2026-02-21T09:47:27.8881438Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:27.8881492Z // end inline asm 2026-02-21T09:47:27.8881546Z bar.sync 0; 2026-02-21T09:47:27.8881733Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8881794Z add.s32 %r816, %r56, 196640; 2026-02-21T09:47:27.8881852Z // begin inline asm 2026-02-21T09:47:27.8881947Z @%p168 mbarrier.init.shared::cta.b64 [%r816], 1; 2026-02-21T09:47:27.8882002Z // end inline asm 2026-02-21T09:47:27.8882054Z bar.sync 0; 2026-02-21T09:47:27.8882126Z add.s32 %r218, %r56, 196648; 2026-02-21T09:47:27.8882186Z // begin inline asm 2026-02-21T09:47:27.8882267Z @%p168 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:47:27.8882321Z // end 
inline asm 2026-02-21T09:47:27.8882384Z add.s32 %r219, %r56, 196608; 2026-02-21T09:47:27.8882461Z // begin inline asm 2026-02-21T09:47:27.8882541Z @%p168 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T09:47:27.8882594Z // end inline asm 2026-02-21T09:47:27.8882653Z bar.sync 0; 2026-02-21T09:47:27.8882710Z add.s32 %r220, %r56, 196616; 2026-02-21T09:47:27.8882764Z // begin inline asm 2026-02-21T09:47:27.8882847Z @%p168 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T09:47:27.8882899Z // end inline asm 2026-02-21T09:47:27.8882951Z bar.sync 0; 2026-02-21T09:47:27.8883013Z add.s32 %r221, %r56, 196624; 2026-02-21T09:47:27.8883068Z // begin inline asm 2026-02-21T09:47:27.8883146Z @%p168 mbarrier.init.shared::cta.b64 [%r221], 1; 2026-02-21T09:47:27.8883197Z // end inline asm 2026-02-21T09:47:27.8883257Z bar.sync 0; 2026-02-21T09:47:27.8883313Z add.s32 %r324, %r56, 196632; 2026-02-21T09:47:27.8883367Z // begin inline asm 2026-02-21T09:47:27.8883476Z @%p168 mbarrier.init.shared::cta.b64 [%r324], 1; 2026-02-21T09:47:27.8883529Z // end inline asm 2026-02-21T09:47:27.8883593Z setp.lt.s32 %p83, %r5, 1; 2026-02-21T09:47:27.8883652Z setp.gt.s32 %p82, %r5, 0; 2026-02-21T09:47:27.8883830Z .loc 1 40 33 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:40:33 2026-02-21T09:47:27.8883888Z shr.u32 %r267, %r815, 3; 2026-02-21T09:47:27.8883951Z and.b32 %r268, %r267, 268435452; 2026-02-21T09:47:27.8884123Z .loc 1 41 39 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:41:39 2026-02-21T09:47:27.8884179Z sub.s32 %r269, 192, %r268; 2026-02-21T09:47:27.8884364Z .loc 1 41 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:41:52 2026-02-21T09:47:27.8884426Z min.s32 %r270, %r269, 4; 2026-02-21T09:47:27.8884591Z .loc 1 42 45 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:45 2026-02-21T09:47:27.8884648Z and.b32 %r271, %r815, 31; 2026-02-21T09:47:27.8884862Z .loc 1 43 51 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:43:51 
2026-02-21T09:47:27.8884929Z div.s32 %r272, %r271, %r270; 2026-02-21T09:47:27.8885090Z .loc 1 42 64 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:64 2026-02-21T09:47:27.8885152Z mul.lo.s32 %r273, %r272, %r270; 2026-02-21T09:47:27.8885216Z sub.s32 %r274, %r271, %r273; 2026-02-21T09:47:27.8885380Z .loc 1 42 30 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:30 2026-02-21T09:47:27.8885435Z add.s32 %r275, %r274, %r268; 2026-02-21T09:47:27.8885608Z .loc 1 44 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:44:27 2026-02-21T09:47:27.8885663Z shl.b32 %r794, %r275, 6; 2026-02-21T09:47:27.8885823Z .loc 1 45 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:45:27 2026-02-21T09:47:27.8885913Z shl.b32 %r796, %r272, 8; 2026-02-21T09:47:27.8886082Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8886135Z bar.sync 0; 2026-02-21T09:47:27.8886198Z and.pred %p73, %p168, %p82; 2026-02-21T09:47:27.8886263Z // begin inline asm 2026-02-21T09:47:27.8886375Z @%p73 mbarrier.arrive.expect_tx.shared.b64 _, [%r219], 40960; 2026-02-21T09:47:27.8886429Z // end inline asm 2026-02-21T09:47:27.8886598Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8886653Z // begin inline asm 2026-02-21T09:47:27.8886724Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.8886786Z // end inline asm 2026-02-21T09:47:27.8886839Z bar.sync 0; 2026-02-21T09:47:27.8886903Z elect.sync %r276|%p84, -1; 2026-02-21T09:47:27.8886963Z and.pred %p85, %p82, %p84; 2026-02-21T09:47:27.8887029Z and.pred %p74, %p3, %p85; 2026-02-21T09:47:27.8887084Z // begin inline asm 2026-02-21T09:47:27.8887326Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r56], [%rd58, {%r817, %r796}], [%r219]; 2026-02-21T09:47:27.8887413Z // end inline asm 2026-02-21T09:47:27.8887575Z .loc 1 55 44 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 
2026-02-21T09:47:27.8887628Z bar.sync 0; 2026-02-21T09:47:27.8887696Z elect.sync %r277|%p86, -1; 2026-02-21T09:47:27.8887754Z and.pred %p87, %p82, %p86; 2026-02-21T09:47:27.8887812Z and.pred %p75, %p3, %p87; 2026-02-21T09:47:27.8887868Z add.s32 %r228, %r56, 131072; 2026-02-21T09:47:27.8887930Z // begin inline asm 2026-02-21T09:47:27.8888172Z @%p75 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r228], [%rd59, {%r817, %r794}], [%r219]; 2026-02-21T09:47:27.8888225Z // end inline asm 2026-02-21T09:47:27.8888392Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8888452Z setp.gt.s32 %p88, %r5, 1; 2026-02-21T09:47:27.8888527Z bar.sync 0; 2026-02-21T09:47:27.8888595Z and.pred %p76, %p168, %p88; 2026-02-21T09:47:27.8888651Z // begin inline asm 2026-02-21T09:47:27.8888760Z @%p76 mbarrier.arrive.expect_tx.shared.b64 _, [%r220], 40960; 2026-02-21T09:47:27.8888812Z // end inline asm 2026-02-21T09:47:27.8888985Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8889035Z bar.sync 0; 2026-02-21T09:47:27.8889093Z elect.sync %r278|%p89, -1; 2026-02-21T09:47:27.8889158Z and.pred %p90, %p88, %p89; 2026-02-21T09:47:27.8889214Z and.pred %p77, %p3, %p90; 2026-02-21T09:47:27.8889296Z add.s32 %r233, %r56, 32768; 2026-02-21T09:47:27.8889352Z // begin inline asm 2026-02-21T09:47:27.8889591Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r233], [%rd58, {%r59, %r796}], [%r220]; 2026-02-21T09:47:27.8889644Z // end inline asm 2026-02-21T09:47:27.8889813Z .loc 1 55 44 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 2026-02-21T09:47:27.8889873Z bar.sync 0; 2026-02-21T09:47:27.8889933Z elect.sync %r279|%p91, -1; 2026-02-21T09:47:27.8889991Z and.pred %p92, %p88, %p91; 2026-02-21T09:47:27.8890053Z and.pred %p78, %p3, %p92; 2026-02-21T09:47:27.8890108Z add.s32 %r237, %r56, 139264; 2026-02-21T09:47:27.8890162Z // begin 
inline asm 2026-02-21T09:47:27.8890387Z @%p78 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r237], [%rd59, {%r59, %r794}], [%r220]; 2026-02-21T09:47:27.8890449Z // end inline asm 2026-02-21T09:47:27.8890617Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8890676Z setp.gt.s32 %p93, %r5, 2; 2026-02-21T09:47:27.8890735Z bar.sync 0; 2026-02-21T09:47:27.8890794Z and.pred %p79, %p168, %p93; 2026-02-21T09:47:27.8890847Z // begin inline asm 2026-02-21T09:47:27.8890982Z @%p79 mbarrier.arrive.expect_tx.shared.b64 _, [%r221], 40960; 2026-02-21T09:47:27.8891040Z // end inline asm 2026-02-21T09:47:27.8891201Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8891254Z bar.sync 0; 2026-02-21T09:47:27.8891318Z elect.sync %r280|%p94, -1; 2026-02-21T09:47:27.8891376Z and.pred %p95, %p93, %p94; 2026-02-21T09:47:27.8891433Z and.pred %p80, %p3, %p95; 2026-02-21T09:47:27.8891495Z add.s32 %r242, %r56, 65536; 2026-02-21T09:47:27.8891549Z mov.b32 %r243, 128; 2026-02-21T09:47:27.8891602Z // begin inline asm 2026-02-21T09:47:27.8891835Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r242], [%rd58, {%r243, %r796}], [%r221]; 2026-02-21T09:47:27.8891890Z // end inline asm 2026-02-21T09:47:27.8892052Z .loc 1 55 44 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 2026-02-21T09:47:27.8892103Z bar.sync 0; 2026-02-21T09:47:27.8892169Z elect.sync %r281|%p96, -1; 2026-02-21T09:47:27.8892228Z and.pred %p97, %p93, %p96; 2026-02-21T09:47:27.8892289Z and.pred %p81, %p3, %p97; 2026-02-21T09:47:27.8892354Z add.s32 %r246, %r56, 147456; 2026-02-21T09:47:27.8892432Z // begin inline asm 2026-02-21T09:47:27.8892657Z @%p81 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r246], [%rd59, {%r243, %r794}], [%r221]; 2026-02-21T09:47:27.8892719Z // end inline asm 2026-02-21T09:47:27.8892883Z .loc 1 33 84 // 
cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8892935Z bar.sync 0; 2026-02-21T09:47:27.8892990Z // begin inline asm 2026-02-21T09:47:27.8893049Z 2026-02-21T09:47:27.8893101Z { 2026-02-21T09:47:27.8893163Z @!%p82 bra.uni skipWait; 2026-02-21T09:47:27.8893230Z .reg .pred complete; 2026-02-21T09:47:27.8893283Z waitLoop: 2026-02-21T09:47:27.8893397Z mbarrier.try_wait.parity.shared.b64 complete, [%r219], %r817; 2026-02-21T09:47:27.8893459Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.8893520Z skipWait: 2026-02-21T09:47:27.8893568Z } 2026-02-21T09:47:27.8893608Z 2026-02-21T09:47:27.8893662Z // end inline asm 2026-02-21T09:47:27.8893832Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8893889Z setp.ne.b32 %p98, %r6, 0; 2026-02-21T09:47:27.8893946Z or.pred %p99, %p83, %p98; 2026-02-21T09:47:27.8894009Z @%p99 bra $L__BB0_2; 2026-02-21T09:47:27.8894061Z // %bb.1: 2026-02-21T09:47:27.8894122Z elect.sync %r298|%p101, -1; 2026-02-21T09:47:27.8894181Z bfe.u32 %r300, %r56, 4, 14; 2026-02-21T09:47:27.8894244Z cvt.u64.u32 %rd85, %r300; 2026-02-21T09:47:27.8894312Z or.b64 %rd68, %rd85, 4611686293439512576; 2026-02-21T09:47:27.8894396Z bfe.u32 %r302, %r228, 4, 14; 2026-02-21T09:47:27.8894458Z cvt.u64.u32 %rd86, %r302; 2026-02-21T09:47:27.8894524Z or.b64 %rd69, %rd86, 4611686293338849280; 2026-02-21T09:47:27.8894579Z mov.b32 %r283, 135266320; 2026-02-21T09:47:27.8894634Z mov.pred %p100, 0; 2026-02-21T09:47:27.8894727Z // begin inline asm 2026-02-21T09:47:27.8894869Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd68, %rd69, %r283, %p100; 2026-02-21T09:47:27.8894924Z // end inline asm 2026-02-21T09:47:27.8894987Z add.s32 %r303, %r56, 32; 2026-02-21T09:47:27.8895043Z bfe.u32 %r304, %r303, 4, 14; 2026-02-21T09:47:27.8895097Z cvt.u64.u32 %rd87, %r304; 2026-02-21T09:47:27.8895168Z or.b64 %rd70, %rd87, 4611686293439512576; 2026-02-21T09:47:27.8895223Z add.s32 %r305, %r56, 131104; 
2026-02-21T09:47:27.8895279Z bfe.u32 %r306, %r305, 4, 14; 2026-02-21T09:47:27.8895333Z cvt.u64.u32 %rd88, %r306; 2026-02-21T09:47:27.8895403Z or.b64 %rd71, %rd88, 4611686293338849280; 2026-02-21T09:47:27.8895459Z // begin inline asm 2026-02-21T09:47:27.8895589Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd70, %rd71, %r283, %p102; 2026-02-21T09:47:27.8895651Z // end inline asm 2026-02-21T09:47:27.8895706Z add.s32 %r307, %r56, 64; 2026-02-21T09:47:27.8895760Z bfe.u32 %r308, %r307, 4, 14; 2026-02-21T09:47:27.8895845Z cvt.u64.u32 %rd89, %r308; 2026-02-21T09:47:27.8895919Z or.b64 %rd72, %rd89, 4611686293439512576; 2026-02-21T09:47:27.8895977Z add.s32 %r309, %r56, 131136; 2026-02-21T09:47:27.8896033Z bfe.u32 %r310, %r309, 4, 14; 2026-02-21T09:47:27.8896097Z cvt.u64.u32 %rd90, %r310; 2026-02-21T09:47:27.8896158Z or.b64 %rd73, %rd90, 4611686293338849280; 2026-02-21T09:47:27.8896213Z // begin inline asm 2026-02-21T09:47:27.8896351Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd72, %rd73, %r283, %p102; 2026-02-21T09:47:27.8896403Z // end inline asm 2026-02-21T09:47:27.8896457Z add.s32 %r311, %r56, 96; 2026-02-21T09:47:27.8896511Z bfe.u32 %r312, %r311, 4, 14; 2026-02-21T09:47:27.8896574Z cvt.u64.u32 %rd91, %r312; 2026-02-21T09:47:27.8896634Z or.b64 %rd74, %rd91, 4611686293439512576; 2026-02-21T09:47:27.8896688Z add.s32 %r313, %r56, 131168; 2026-02-21T09:47:27.8896750Z bfe.u32 %r314, %r313, 4, 14; 2026-02-21T09:47:27.8896804Z cvt.u64.u32 %rd92, %r314; 2026-02-21T09:47:27.8896867Z or.b64 %rd75, %rd92, 4611686293338849280; 2026-02-21T09:47:27.8896922Z // begin inline asm 2026-02-21T09:47:27.8897094Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd74, %rd75, %r283, %p102; 2026-02-21T09:47:27.8897147Z // end inline asm 2026-02-21T09:47:27.8897202Z add.s32 %r315, %r56, 16384; 2026-02-21T09:47:27.8897262Z bfe.u32 %r316, %r315, 4, 14; 2026-02-21T09:47:27.8897315Z cvt.u64.u32 %rd93, %r316; 2026-02-21T09:47:27.8897375Z or.b64 %rd76, %rd93, 
4611686293439512576; 2026-02-21T09:47:27.8897434Z // begin inline asm 2026-02-21T09:47:27.8897567Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd76, %rd69, %r283, %p100; 2026-02-21T09:47:27.8897621Z // end inline asm 2026-02-21T09:47:27.8897677Z add.s32 %r317, %r56, 16416; 2026-02-21T09:47:27.8897738Z bfe.u32 %r318, %r317, 4, 14; 2026-02-21T09:47:27.8897792Z cvt.u64.u32 %rd94, %r318; 2026-02-21T09:47:27.8897852Z or.b64 %rd78, %rd94, 4611686293439512576; 2026-02-21T09:47:27.8897913Z // begin inline asm 2026-02-21T09:47:27.8898075Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd78, %rd71, %r283, %p102; 2026-02-21T09:47:27.8898130Z // end inline asm 2026-02-21T09:47:27.8898190Z add.s32 %r319, %r56, 16448; 2026-02-21T09:47:27.8898245Z bfe.u32 %r320, %r319, 4, 14; 2026-02-21T09:47:27.8898301Z cvt.u64.u32 %rd95, %r320; 2026-02-21T09:47:27.8898361Z or.b64 %rd80, %rd95, 4611686293439512576; 2026-02-21T09:47:27.8898424Z // begin inline asm 2026-02-21T09:47:27.8898553Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd80, %rd73, %r283, %p102; 2026-02-21T09:47:27.8898605Z // end inline asm 2026-02-21T09:47:27.8898665Z add.s32 %r321, %r56, 16480; 2026-02-21T09:47:27.8898919Z bfe.u32 %r322, %r321, 4, 14; 2026-02-21T09:47:27.8898975Z cvt.u64.u32 %rd96, %r322; 2026-02-21T09:47:27.8899042Z or.b64 %rd82, %rd96, 4611686293439512576; 2026-02-21T09:47:27.8899097Z // begin inline asm 2026-02-21T09:47:27.8899226Z @%p101 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd82, %rd75, %r283, %p102; 2026-02-21T09:47:27.8899279Z // end inline asm 2026-02-21T09:47:27.8899342Z add.s32 %r323, %r56, 196640; 2026-02-21T09:47:27.8899400Z cvt.u64.u32 %rd84, %r323; 2026-02-21T09:47:27.8899454Z // begin inline asm 2026-02-21T09:47:27.8899586Z @%p101 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd84]; 2026-02-21T09:47:27.8899640Z // end inline asm 2026-02-21T09:47:27.8899694Z $L__BB0_2: 2026-02-21T09:47:27.8899878Z .loc 1 33 84 // 
cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8899939Z setp.gt.s32 %p120, %r5, 3; 2026-02-21T09:47:27.8899993Z bar.sync 0; 2026-02-21T09:47:27.8900060Z and.pred %p117, %p168, %p120; 2026-02-21T09:47:27.8900130Z // begin inline asm 2026-02-21T09:47:27.8900245Z @%p117 mbarrier.arrive.expect_tx.shared.b64 _, [%r324], 40960; 2026-02-21T09:47:27.8900299Z // end inline asm 2026-02-21T09:47:27.8900496Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8900550Z bar.sync 0; 2026-02-21T09:47:27.8900612Z elect.sync %r336|%p123, -1; 2026-02-21T09:47:27.8900675Z and.pred %p124, %p120, %p123; 2026-02-21T09:47:27.8900741Z and.pred %p118, %p3, %p124; 2026-02-21T09:47:27.8900795Z add.s32 %r325, %r56, 98304; 2026-02-21T09:47:27.8900848Z mov.b32 %r800, 192; 2026-02-21T09:47:27.8900909Z // begin inline asm 2026-02-21T09:47:27.8901143Z @%p118 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r325], [%rd58, {%r800, %r796}], [%r324]; 2026-02-21T09:47:27.8901197Z // end inline asm 2026-02-21T09:47:27.8901367Z .loc 1 55 44 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 2026-02-21T09:47:27.8901420Z bar.sync 0; 2026-02-21T09:47:27.8901479Z elect.sync %r337|%p125, -1; 2026-02-21T09:47:27.8901539Z and.pred %p126, %p120, %p125; 2026-02-21T09:47:27.8901604Z and.pred %p119, %p3, %p126; 2026-02-21T09:47:27.8901659Z add.s32 %r329, %r56, 155648; 2026-02-21T09:47:27.8901715Z // begin inline asm 2026-02-21T09:47:27.8901957Z @%p119 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r329], [%rd59, {%r800, %r794}], [%r324]; 2026-02-21T09:47:27.8902035Z // end inline asm 2026-02-21T09:47:27.8902195Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8902257Z sub.s32 %r10, 31, %r4; 2026-02-21T09:47:27.8902315Z setp.lt.s32 %p127, %r10, 1; 2026-02-21T09:47:27.8902373Z @%p127 bra $L__BB0_11; 
2026-02-21T09:47:27.8902449Z // %bb.3: // %.lr.ph 2026-02-21T09:47:27.8902618Z .loc 1 0 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:0:84 2026-02-21T09:47:27.8902674Z sub.s32 %r11, 28, %r4; 2026-02-21T09:47:27.8902727Z shl.b32 %r344, %r1, 7; 2026-02-21T09:47:27.8902792Z and.b32 %r345, %r344, 16256; 2026-02-21T09:47:27.8902846Z shl.b32 %r346, %r1, 4; 2026-02-21T09:47:27.8902904Z and.b32 %r347, %r346, 112; 2026-02-21T09:47:27.8902987Z or.b32 %r348, %r345, %r347; 2026-02-21T09:47:27.8903044Z add.s32 %r350, %r56, 163840; 2026-02-21T09:47:27.8903100Z add.s32 %r12, %r350, %r348; 2026-02-21T09:47:27.8903154Z xor.b32 %r351, %r348, 16; 2026-02-21T09:47:27.8903216Z add.s32 %r13, %r350, %r351; 2026-02-21T09:47:27.8903270Z xor.b32 %r352, %r348, 32; 2026-02-21T09:47:27.8903323Z add.s32 %r14, %r350, %r352; 2026-02-21T09:47:27.8903385Z xor.b32 %r353, %r348, 48; 2026-02-21T09:47:27.8903438Z add.s32 %r15, %r350, %r353; 2026-02-21T09:47:27.8903491Z xor.b32 %r354, %r348, 64; 2026-02-21T09:47:27.8903544Z add.s32 %r16, %r350, %r354; 2026-02-21T09:47:27.8903629Z xor.b32 %r355, %r348, 80; 2026-02-21T09:47:27.8903683Z add.s32 %r17, %r350, %r355; 2026-02-21T09:47:27.8903737Z xor.b32 %r356, %r348, 96; 2026-02-21T09:47:27.8903797Z add.s32 %r18, %r350, %r356; 2026-02-21T09:47:27.8903852Z xor.b32 %r357, %r348, 112; 2026-02-21T09:47:27.8903906Z add.s32 %r19, %r350, %r357; 2026-02-21T09:47:27.8903962Z add.s32 %r802, %r56, 196640; 2026-02-21T09:47:27.8904025Z mov.pred %p175, -1; 2026-02-21T09:47:27.8904078Z mov.b32 %r805, 3; 2026-02-21T09:47:27.8904130Z mov.b32 %r801, 0; 2026-02-21T09:47:27.8904185Z mov.b32 %r799, 1; 2026-02-21T09:47:27.8904236Z mov.b32 %r798, 2; 2026-02-21T09:47:27.8904290Z mov.b32 %r795, %r794; 2026-02-21T09:47:27.8904344Z mov.b32 %r797, %r796; 2026-02-21T09:47:27.8904403Z mov.b32 %r803, %r801; 2026-02-21T09:47:27.8904456Z mov.b32 %r804, %r801; 2026-02-21T09:47:27.8904508Z mov.b32 %r806, %r799; 2026-02-21T09:47:27.8904568Z mov.b32 %r807, %r801; 
2026-02-21T09:47:27.8904620Z mov.b32 %r808, %r796; 2026-02-21T09:47:27.8904727Z mov.b32 %r809, %r794; 2026-02-21T09:47:27.8904781Z mov.b32 %r811, %r805; 2026-02-21T09:47:27.8904840Z mov.b32 %r812, %r801; 2026-02-21T09:47:27.8904893Z mov.b32 %r813, %r809; 2026-02-21T09:47:27.8904945Z mov.b32 %r814, %r808; 2026-02-21T09:47:27.8905006Z bra.uni $L__BB0_4; 2026-02-21T09:47:27.8905138Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8905205Z selp.b32 %r806, 0, %r437, %p157; 2026-02-21T09:47:27.8905264Z selp.b32 %r438, 1, 0, %p157; 2026-02-21T09:47:27.8905327Z xor.b32 %r807, %r817, %r438; 2026-02-21T09:47:27.8905493Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8905549Z add.s32 %r812, %r812, 1; 2026-02-21T09:47:27.8905619Z setp.lt.s32 %p166, %r812, %r10; 2026-02-21T09:47:27.8905673Z mov.b32 %r794, %r809; 2026-02-21T09:47:27.8905727Z mov.b32 %r795, %r20; 2026-02-21T09:47:27.8905786Z mov.b32 %r796, %r808; 2026-02-21T09:47:27.8905840Z mov.b32 %r797, %r22; 2026-02-21T09:47:27.8905893Z mov.b32 %r798, %r811; 2026-02-21T09:47:27.8905946Z mov.b32 %r799, %r24; 2026-02-21T09:47:27.8906007Z mov.b32 %r801, %r817; 2026-02-21T09:47:27.8906060Z mov.b32 %r802, %r816; 2026-02-21T09:47:27.8906114Z mov.b32 %r808, %r814; 2026-02-21T09:47:27.8906174Z mov.b32 %r809, %r813; 2026-02-21T09:47:27.8906227Z mov.b32 %r811, %r39; 2026-02-21T09:47:27.8906282Z @%p166 bra $L__BB0_4; 2026-02-21T09:47:27.8906368Z bra.uni $L__BB0_11; 2026-02-21T09:47:27.8906477Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:27.8906646Z .loc 1 0 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:0:84 2026-02-21T09:47:27.8906699Z mov.b32 %r817, %r807; 2026-02-21T09:47:27.8906760Z mov.b32 %r24, %r798; 2026-02-21T09:47:27.8906814Z mov.b32 %r22, %r796; 2026-02-21T09:47:27.8906869Z mov.b32 %r20, %r794; 2026-02-21T09:47:27.8907043Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 
2026-02-21T09:47:27.8907104Z add.s32 %r358, %r811, 1; 2026-02-21T09:47:27.8907174Z setp.eq.b32 %p129, %r811, 31; 2026-02-21T09:47:27.8907236Z selp.b32 %r39, 0, %r358, %p129; 2026-02-21T09:47:27.8907302Z setp.ne.b32 %p130, %r39, 0; 2026-02-21T09:47:27.8907357Z @%p130 bra $L__BB0_6; 2026-02-21T09:47:27.8907491Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8907558Z add.s32 %r815, %r815, 1; 2026-02-21T09:47:27.8907726Z .loc 1 39 35 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:39:35 2026-02-21T09:47:27.8907781Z shr.s32 %r359, %r815, 31; 2026-02-21T09:47:27.8907835Z shr.u32 %r360, %r359, 27; 2026-02-21T09:47:27.8907899Z add.s32 %r361, %r815, %r360; 2026-02-21T09:47:27.8907954Z shr.s32 %r362, %r361, 5; 2026-02-21T09:47:27.8908124Z .loc 1 40 33 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:40:33 2026-02-21T09:47:27.8908216Z shl.b32 %r363, %r362, 2; 2026-02-21T09:47:27.8908385Z .loc 1 41 39 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:41:39 2026-02-21T09:47:27.8908441Z sub.s32 %r364, 192, %r363; 2026-02-21T09:47:27.8908612Z .loc 1 41 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:41:52 2026-02-21T09:47:27.8908668Z min.s32 %r365, %r364, 4; 2026-02-21T09:47:27.8908832Z .loc 1 42 45 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:45 2026-02-21T09:47:27.8908898Z and.b32 %r366, %r361, -32; 2026-02-21T09:47:27.8908954Z sub.s32 %r367, %r815, %r366; 2026-02-21T09:47:27.8909123Z .loc 1 43 51 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:43:51 2026-02-21T09:47:27.8909178Z div.s32 %r368, %r367, %r365; 2026-02-21T09:47:27.8909353Z .loc 1 42 64 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:64 2026-02-21T09:47:27.8909414Z mul.lo.s32 %r369, %r368, %r365; 2026-02-21T09:47:27.8909472Z sub.s32 %r370, %r367, %r369; 2026-02-21T09:47:27.8909650Z .loc 1 42 30 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:42:30 2026-02-21T09:47:27.8909707Z add.s32 %r371, %r370, %r363; 
2026-02-21T09:47:27.8909899Z .loc 1 44 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:44:27 2026-02-21T09:47:27.8909965Z shl.b32 %r813, %r371, 6; 2026-02-21T09:47:27.8910129Z .loc 1 45 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:45:27 2026-02-21T09:47:27.8910184Z shl.b32 %r814, %r368, 8; 2026-02-21T09:47:27.8910280Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8910455Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8910508Z add.s32 %r374, %r804, 1; 2026-02-21T09:47:27.8910566Z setp.gt.s32 %p132, %r374, 3; 2026-02-21T09:47:27.8910636Z selp.b32 %r804, 0, %r374, %p132; 2026-02-21T09:47:27.8910692Z selp.b32 %r375, 1, 0, %p132; 2026-02-21T09:47:27.8910745Z xor.b32 %r803, %r803, %r375; 2026-02-21T09:47:27.8910806Z shl.b32 %r376, %r804, 3; 2026-02-21T09:47:27.8910861Z add.s32 %r378, %r56, %r376; 2026-02-21T09:47:27.8910916Z add.s32 %r372, %r378, 196608; 2026-02-21T09:47:27.8910969Z bar.sync 0; 2026-02-21T09:47:27.8911032Z // begin inline asm 2026-02-21T09:47:27.8911081Z 2026-02-21T09:47:27.8911167Z { 2026-02-21T09:47:27.8911233Z .reg .pred complete; 2026-02-21T09:47:27.8911286Z waitLoop: 2026-02-21T09:47:27.8911404Z mbarrier.try_wait.parity.shared.b64 complete, [%r372], %r803; 2026-02-21T09:47:27.8911465Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.8911520Z } 2026-02-21T09:47:27.8911524Z 2026-02-21T09:47:27.8911578Z // end inline asm 2026-02-21T09:47:27.8911631Z shl.b32 %r379, %r806, 3; 2026-02-21T09:47:27.8911693Z add.s32 %r380, %r56, %r379; 2026-02-21T09:47:27.8911748Z add.s32 %r816, %r380, 196640; 2026-02-21T09:47:27.8911912Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8911973Z @%p98 bra $L__BB0_8; 2026-02-21T09:47:27.8912061Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8912249Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8912304Z shl.b32 
%r397, %r804, 15; 2026-02-21T09:47:27.8912366Z add.s32 %r399, %r56, %r397; 2026-02-21T09:47:27.8912529Z .loc 1 55 44 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 2026-02-21T09:47:27.8912583Z shl.b32 %r400, %r804, 13; 2026-02-21T09:47:27.8912644Z add.s32 %r401, %r56, %r400; 2026-02-21T09:47:27.8912700Z add.s32 %r402, %r401, 131072; 2026-02-21T09:47:27.8912859Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8912927Z elect.sync %r403|%p134, -1; 2026-02-21T09:47:27.8913008Z bfe.u32 %r404, %r399, 4, 14; 2026-02-21T09:47:27.8913064Z cvt.u64.u32 %rd116, %r404; 2026-02-21T09:47:27.8913133Z or.b64 %rd99, %rd116, 4611686293439512576; 2026-02-21T09:47:27.8913198Z bfe.u32 %r405, %r402, 4, 14; 2026-02-21T09:47:27.8913255Z cvt.u64.u32 %rd117, %r405; 2026-02-21T09:47:27.8913326Z or.b64 %rd100, %rd117, 4611686293338849280; 2026-02-21T09:47:27.8913386Z mov.b32 %r382, 135266320; 2026-02-21T09:47:27.8913441Z // begin inline asm 2026-02-21T09:47:27.8913579Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd99, %rd100, %r382, %p175; 2026-02-21T09:47:27.8913638Z // end inline asm 2026-02-21T09:47:27.8913693Z add.s32 %r406, %r399, 32; 2026-02-21T09:47:27.8913748Z bfe.u32 %r407, %r406, 4, 14; 2026-02-21T09:47:27.8913804Z cvt.u64.u32 %rd118, %r407; 2026-02-21T09:47:27.8913879Z or.b64 %rd101, %rd118, 4611686293439512576; 2026-02-21T09:47:27.8913935Z add.s32 %r408, %r401, 131104; 2026-02-21T09:47:27.8913990Z bfe.u32 %r409, %r408, 4, 14; 2026-02-21T09:47:27.8914054Z cvt.u64.u32 %rd119, %r409; 2026-02-21T09:47:27.8914117Z or.b64 %rd102, %rd119, 4611686293338849280; 2026-02-21T09:47:27.8914174Z mov.pred %p135, -1; 2026-02-21T09:47:27.8914229Z // begin inline asm 2026-02-21T09:47:27.8914396Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd101, %rd102, %r382, %p135; 2026-02-21T09:47:27.8914451Z // end inline asm 2026-02-21T09:47:27.8914508Z add.s32 %r410, %r399, 64; 2026-02-21T09:47:27.8914571Z bfe.u32 
%r411, %r410, 4, 14; 2026-02-21T09:47:27.8914626Z cvt.u64.u32 %rd120, %r411; 2026-02-21T09:47:27.8914717Z or.b64 %rd103, %rd120, 4611686293439512576; 2026-02-21T09:47:27.8914783Z add.s32 %r412, %r401, 131136; 2026-02-21T09:47:27.8914840Z bfe.u32 %r413, %r412, 4, 14; 2026-02-21T09:47:27.8914897Z cvt.u64.u32 %rd121, %r413; 2026-02-21T09:47:27.8914962Z or.b64 %rd104, %rd121, 4611686293338849280; 2026-02-21T09:47:27.8915028Z // begin inline asm 2026-02-21T09:47:27.8915165Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd103, %rd104, %r382, %p135; 2026-02-21T09:47:27.8915221Z // end inline asm 2026-02-21T09:47:27.8915282Z add.s32 %r414, %r399, 96; 2026-02-21T09:47:27.8915339Z bfe.u32 %r415, %r414, 4, 14; 2026-02-21T09:47:27.8915394Z cvt.u64.u32 %rd122, %r415; 2026-02-21T09:47:27.8915459Z or.b64 %rd105, %rd122, 4611686293439512576; 2026-02-21T09:47:27.8915522Z add.s32 %r416, %r401, 131168; 2026-02-21T09:47:27.8915576Z bfe.u32 %r417, %r416, 4, 14; 2026-02-21T09:47:27.8915659Z cvt.u64.u32 %rd123, %r417; 2026-02-21T09:47:27.8915729Z or.b64 %rd106, %rd123, 4611686293338849280; 2026-02-21T09:47:27.8915782Z // begin inline asm 2026-02-21T09:47:27.8915911Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 0 ], %rd105, %rd106, %r382, %p135; 2026-02-21T09:47:27.8915969Z // end inline asm 2026-02-21T09:47:27.8916027Z add.s32 %r418, %r399, 16384; 2026-02-21T09:47:27.8916080Z bfe.u32 %r419, %r418, 4, 14; 2026-02-21T09:47:27.8916135Z cvt.u64.u32 %rd124, %r419; 2026-02-21T09:47:27.8916209Z or.b64 %rd107, %rd124, 4611686293439512576; 2026-02-21T09:47:27.8916261Z // begin inline asm 2026-02-21T09:47:27.8916396Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd107, %rd100, %r382, %p175; 2026-02-21T09:47:27.8916455Z // end inline asm 2026-02-21T09:47:27.8916509Z add.s32 %r420, %r399, 16416; 2026-02-21T09:47:27.8916565Z bfe.u32 %r421, %r420, 4, 14; 2026-02-21T09:47:27.8916646Z cvt.u64.u32 %rd125, %r421; 2026-02-21T09:47:27.8916720Z or.b64 %rd109, %rd125, 
4611686293439512576; 2026-02-21T09:47:27.8916772Z // begin inline asm 2026-02-21T09:47:27.8916909Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd109, %rd102, %r382, %p135; 2026-02-21T09:47:27.8916969Z // end inline asm 2026-02-21T09:47:27.8917022Z add.s32 %r422, %r399, 16448; 2026-02-21T09:47:27.8917076Z bfe.u32 %r423, %r422, 4, 14; 2026-02-21T09:47:27.8917136Z cvt.u64.u32 %rd126, %r423; 2026-02-21T09:47:27.8917201Z or.b64 %rd111, %rd126, 4611686293439512576; 2026-02-21T09:47:27.8917300Z // begin inline asm 2026-02-21T09:47:27.8917441Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd111, %rd104, %r382, %p135; 2026-02-21T09:47:27.8917502Z // end inline asm 2026-02-21T09:47:27.8917557Z add.s32 %r424, %r399, 16480; 2026-02-21T09:47:27.8917614Z bfe.u32 %r425, %r424, 4, 14; 2026-02-21T09:47:27.8917681Z cvt.u64.u32 %rd127, %r425; 2026-02-21T09:47:27.8917749Z or.b64 %rd113, %rd127, 4611686293439512576; 2026-02-21T09:47:27.8917804Z // begin inline asm 2026-02-21T09:47:27.8917949Z @%p134 tcgen05.mma.cta_group::1.kind::f16 [ %r792 + 64 ], %rd113, %rd106, %r382, %p135; 2026-02-21T09:47:27.8918003Z // end inline asm 2026-02-21T09:47:27.8918057Z cvt.u64.u32 %rd115, %r816; 2026-02-21T09:47:27.8918112Z // begin inline asm 2026-02-21T09:47:27.8918251Z @%p134 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd115]; 2026-02-21T09:47:27.8918305Z // end inline asm 2026-02-21T09:47:27.8918403Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8918591Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8918653Z setp.eq.b32 %p153, %r39, 0; 2026-02-21T09:47:27.8918715Z setp.lt.s32 %p154, %r812, %r11; 2026-02-21T09:47:27.8918922Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8918980Z // begin inline asm 2026-02-21T09:47:27.8919029Z 2026-02-21T09:47:27.8919081Z { 2026-02-21T09:47:27.8919146Z .reg .pred complete; 2026-02-21T09:47:27.8919200Z 
waitLoop: 2026-02-21T09:47:27.8919319Z mbarrier.try_wait.parity.shared.b64 complete, [%r802], %r801; 2026-02-21T09:47:27.8919387Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.8919437Z } 2026-02-21T09:47:27.8919440Z 2026-02-21T09:47:27.8919493Z // end inline asm 2026-02-21T09:47:27.8919557Z add.s32 %r437, %r806, 1; 2026-02-21T09:47:27.8919615Z setp.gt.s32 %p157, %r437, 1; 2026-02-21T09:47:27.8919790Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8919848Z add.s32 %r439, %r800, 64; 2026-02-21T09:47:27.8919910Z add.s32 %r440, %r805, 1; 2026-02-21T09:47:27.8919968Z setp.gt.s32 %p158, %r440, 3; 2026-02-21T09:47:27.8920029Z selp.b32 %r805, 0, %r440, %p158; 2026-02-21T09:47:27.8920097Z selp.b32 %r800, 0, %r439, %p153; 2026-02-21T09:47:27.8920154Z shl.b32 %r441, %r805, 3; 2026-02-21T09:47:27.8920210Z add.s32 %r443, %r56, %r441; 2026-02-21T09:47:27.8920293Z add.s32 %r432, %r443, 196608; 2026-02-21T09:47:27.8920351Z bar.sync 0; 2026-02-21T09:47:27.8920413Z and.pred %p150, %p168, %p154; 2026-02-21T09:47:27.8920467Z // begin inline asm 2026-02-21T09:47:27.8920586Z @%p150 mbarrier.arrive.expect_tx.shared.b64 _, [%r432], 40960; 2026-02-21T09:47:27.8920641Z // end inline asm 2026-02-21T09:47:27.8920815Z .loc 1 54 31 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:54:31 2026-02-21T09:47:27.8920875Z shl.b32 %r444, %r805, 15; 2026-02-21T09:47:27.8920932Z add.s32 %r429, %r56, %r444; 2026-02-21T09:47:27.8920985Z bar.sync 0; 2026-02-21T09:47:27.8921047Z elect.sync %r445|%p159, -1; 2026-02-21T09:47:27.8921113Z and.pred %p160, %p154, %p159; 2026-02-21T09:47:27.8921172Z and.pred %p151, %p3, %p160; 2026-02-21T09:47:27.8921226Z // begin inline asm 2026-02-21T09:47:27.8921517Z @%p151 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r429], [%rd58, {%r800, %r814}], [%r432]; 2026-02-21T09:47:27.8921575Z // end inline asm 2026-02-21T09:47:27.8921744Z .loc 1 55 44 // 
cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:55:44 2026-02-21T09:47:27.8921803Z shl.b32 %r446, %r805, 13; 2026-02-21T09:47:27.8921857Z add.s32 %r447, %r56, %r446; 2026-02-21T09:47:27.8921913Z add.s32 %r433, %r447, 131072; 2026-02-21T09:47:27.8921965Z bar.sync 0; 2026-02-21T09:47:27.8922030Z elect.sync %r448|%p161, -1; 2026-02-21T09:47:27.8922089Z and.pred %p162, %p154, %p161; 2026-02-21T09:47:27.8922171Z and.pred %p152, %p3, %p162; 2026-02-21T09:47:27.8922232Z // begin inline asm 2026-02-21T09:47:27.8922486Z @%p152 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r433], [%rd59, {%r800, %r813}], [%r432]; 2026-02-21T09:47:27.8922540Z // end inline asm 2026-02-21T09:47:27.8922717Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8922782Z setp.ne.b32 %p175, %r799, 31; 2026-02-21T09:47:27.8922843Z @%p175 bra $L__BB0_10; 2026-02-21T09:47:27.8922936Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:27.8923116Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8923171Z // begin inline asm 2026-02-21T09:47:27.8923218Z 2026-02-21T09:47:27.8923272Z { 2026-02-21T09:47:27.8923330Z .reg .pred complete; 2026-02-21T09:47:27.8923382Z waitLoop: 2026-02-21T09:47:27.8923500Z mbarrier.try_wait.parity.shared.b64 complete, [%r816], %r817; 2026-02-21T09:47:27.8923568Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.8923615Z } 2026-02-21T09:47:27.8923618Z 2026-02-21T09:47:27.8923670Z // end inline asm 2026-02-21T09:47:27.8923728Z // begin inline asm 2026-02-21T09:47:27.8924029Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r451, %r452, %r453, %r454, %r455, %r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466}, [%r81 + 0]; 2026-02-21T09:47:27.8924088Z // end inline asm 2026-02-21T09:47:27.8924146Z // begin inline asm 2026-02-21T09:47:27.8924429Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r468, %r469, %r470, %r471, %r472, %r473, 
%r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483}, [%r81 + 16]; 2026-02-21T09:47:27.8924483Z // end inline asm 2026-02-21T09:47:27.8924535Z // begin inline asm 2026-02-21T09:47:27.8924840Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r485, %r486, %r487, %r488, %r489, %r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500}, [%r81 + 32]; 2026-02-21T09:47:27.8924895Z // end inline asm 2026-02-21T09:47:27.8924948Z // begin inline asm 2026-02-21T09:47:27.8925217Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r502, %r503, %r504, %r505, %r506, %r507, %r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517}, [%r81 + 48]; 2026-02-21T09:47:27.8925280Z // end inline asm 2026-02-21T09:47:27.8925332Z // begin inline asm 2026-02-21T09:47:27.8925584Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r519, %r520, %r521, %r522, %r523, %r524, %r525, %r526, %r527, %r528, %r529, %r530, %r531, %r532, %r533, %r534}, [%r81 + 64]; 2026-02-21T09:47:27.8925659Z // end inline asm 2026-02-21T09:47:27.8925710Z // begin inline asm 2026-02-21T09:47:27.8925979Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r536, %r537, %r538, %r539, %r540, %r541, %r542, %r543, %r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551}, [%r81 + 80]; 2026-02-21T09:47:27.8926029Z // end inline asm 2026-02-21T09:47:27.8926079Z // begin inline asm 2026-02-21T09:47:27.8926343Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r553, %r554, %r555, %r556, %r557, %r558, %r559, %r560, %r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568}, [%r81 + 96]; 2026-02-21T09:47:27.8926393Z // end inline asm 2026-02-21T09:47:27.8926443Z // begin inline asm 2026-02-21T09:47:27.8926734Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r570, %r571, %r572, %r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585}, [%r81 + 112]; 2026-02-21T09:47:27.8926790Z // end inline asm 2026-02-21T09:47:27.8926841Z // begin inline asm 2026-02-21T09:47:27.8926905Z tcgen05.wait::ld.sync.aligned; 
2026-02-21T09:47:27.8926960Z // end inline asm 2026-02-21T09:47:27.8927015Z cvt.u64.u32 %rd131, %r451; 2026-02-21T09:47:27.8927068Z cvt.u64.u32 %rd132, %r452; 2026-02-21T09:47:27.8927123Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:47:27.8927186Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:47:27.8927350Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8927433Z mov.b64 {%r590, %r591}, %rd134; 2026-02-21T09:47:27.8927502Z cvt.rn.f16x2.f32 %r592, %r591, %r590; 2026-02-21T09:47:27.8927663Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8927716Z cvt.u64.u32 %rd135, %r453; 2026-02-21T09:47:27.8927774Z cvt.u64.u32 %rd136, %r454; 2026-02-21T09:47:27.8927827Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:47:27.8927886Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:47:27.8928043Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8928103Z mov.b64 {%r593, %r594}, %rd138; 2026-02-21T09:47:27.8928164Z cvt.rn.f16x2.f32 %r595, %r594, %r593; 2026-02-21T09:47:27.8928326Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8928384Z cvt.u64.u32 %rd139, %r455; 2026-02-21T09:47:27.8928435Z cvt.u64.u32 %rd140, %r456; 2026-02-21T09:47:27.8928491Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:47:27.8928549Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:47:27.8928703Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8928756Z mov.b64 {%r596, %r597}, %rd142; 2026-02-21T09:47:27.8928851Z cvt.rn.f16x2.f32 %r598, %r597, %r596; 2026-02-21T09:47:27.8929020Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8929075Z cvt.u64.u32 %rd143, %r457; 2026-02-21T09:47:27.8929125Z cvt.u64.u32 %rd144, %r458; 2026-02-21T09:47:27.8929183Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:47:27.8929237Z or.b64 %rd146, %rd143, %rd145; 
2026-02-21T09:47:27.8929398Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8929454Z mov.b64 {%r599, %r600}, %rd146; 2026-02-21T09:47:27.8929511Z cvt.rn.f16x2.f32 %r601, %r600, %r599; 2026-02-21T09:47:27.8929674Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8929732Z cvt.u64.u32 %rd147, %r459; 2026-02-21T09:47:27.8929784Z cvt.u64.u32 %rd148, %r460; 2026-02-21T09:47:27.8929836Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:47:27.8929889Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:47:27.8930056Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8930132Z mov.b64 {%r602, %r603}, %rd150; 2026-02-21T09:47:27.8930189Z cvt.rn.f16x2.f32 %r604, %r603, %r602; 2026-02-21T09:47:27.8930356Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8930409Z cvt.u64.u32 %rd151, %r461; 2026-02-21T09:47:27.8930461Z cvt.u64.u32 %rd152, %r462; 2026-02-21T09:47:27.8930512Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:47:27.8930571Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:47:27.8930736Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8930789Z mov.b64 {%r605, %r606}, %rd154; 2026-02-21T09:47:27.8930851Z cvt.rn.f16x2.f32 %r607, %r606, %r605; 2026-02-21T09:47:27.8931014Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8931089Z cvt.u64.u32 %rd155, %r463; 2026-02-21T09:47:27.8931148Z cvt.u64.u32 %rd156, %r464; 2026-02-21T09:47:27.8931202Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:47:27.8931257Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:47:27.8931418Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8931475Z mov.b64 {%r608, %r609}, %rd158; 2026-02-21T09:47:27.8931533Z cvt.rn.f16x2.f32 %r610, %r609, %r608; 2026-02-21T09:47:27.8931691Z .loc 1 56 52 // 
cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8931819Z cvt.u64.u32 %rd159, %r465; 2026-02-21T09:47:27.8931874Z cvt.u64.u32 %rd160, %r466; 2026-02-21T09:47:27.8931928Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:47:27.8931987Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:47:27.8932154Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8932211Z mov.b64 {%r611, %r612}, %rd162; 2026-02-21T09:47:27.8932272Z cvt.rn.f16x2.f32 %r613, %r612, %r611; 2026-02-21T09:47:27.8932441Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8932494Z cvt.u64.u32 %rd163, %r468; 2026-02-21T09:47:27.8932546Z cvt.u64.u32 %rd164, %r469; 2026-02-21T09:47:27.8932612Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:47:27.8932669Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:47:27.8932832Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8932897Z mov.b64 {%r614, %r615}, %rd166; 2026-02-21T09:47:27.8932959Z cvt.rn.f16x2.f32 %r616, %r615, %r614; 2026-02-21T09:47:27.8933130Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8933188Z cvt.u64.u32 %rd167, %r470; 2026-02-21T09:47:27.8933258Z cvt.u64.u32 %rd168, %r471; 2026-02-21T09:47:27.8933351Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:47:27.8933410Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:47:27.8933586Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8933643Z mov.b64 {%r617, %r618}, %rd170; 2026-02-21T09:47:27.8933703Z cvt.rn.f16x2.f32 %r619, %r618, %r617; 2026-02-21T09:47:27.8933880Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8933937Z cvt.u64.u32 %rd171, %r472; 2026-02-21T09:47:27.8933992Z cvt.u64.u32 %rd172, %r473; 2026-02-21T09:47:27.8934047Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:47:27.8934111Z or.b64 %rd174, %rd171, 
%rd173; 2026-02-21T09:47:27.8934280Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8934334Z mov.b64 {%r620, %r621}, %rd174; 2026-02-21T09:47:27.8934397Z cvt.rn.f16x2.f32 %r622, %r621, %r620; 2026-02-21T09:47:27.8934558Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8934637Z cvt.u64.u32 %rd175, %r474; 2026-02-21T09:47:27.8934723Z cvt.u64.u32 %rd176, %r475; 2026-02-21T09:47:27.8934781Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:47:27.8934838Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:47:27.8935010Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8935077Z mov.b64 {%r623, %r624}, %rd178; 2026-02-21T09:47:27.8935138Z cvt.rn.f16x2.f32 %r625, %r624, %r623; 2026-02-21T09:47:27.8935304Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8935372Z cvt.u64.u32 %rd179, %r476; 2026-02-21T09:47:27.8935427Z cvt.u64.u32 %rd180, %r477; 2026-02-21T09:47:27.8935483Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:47:27.8935547Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:47:27.8935737Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8935797Z mov.b64 {%r626, %r627}, %rd182; 2026-02-21T09:47:27.8935858Z cvt.rn.f16x2.f32 %r628, %r627, %r626; 2026-02-21T09:47:27.8936032Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8936088Z cvt.u64.u32 %rd183, %r478; 2026-02-21T09:47:27.8936142Z cvt.u64.u32 %rd184, %r479; 2026-02-21T09:47:27.8936204Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:47:27.8936263Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:47:27.8936430Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8936521Z mov.b64 {%r629, %r630}, %rd186; 2026-02-21T09:47:27.8936582Z cvt.rn.f16x2.f32 %r631, %r630, %r629; 2026-02-21T09:47:27.8936746Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8936802Z cvt.u64.u32 %rd187, %r480; 2026-02-21T09:47:27.8936865Z cvt.u64.u32 %rd188, %r481; 2026-02-21T09:47:27.8936921Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:47:27.8936977Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:47:27.8937146Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8937201Z mov.b64 {%r632, %r633}, %rd190; 2026-02-21T09:47:27.8937261Z cvt.rn.f16x2.f32 %r634, %r633, %r632; 2026-02-21T09:47:27.8937434Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8937490Z cvt.u64.u32 %rd191, %r482; 2026-02-21T09:47:27.8937547Z cvt.u64.u32 %rd192, %r483; 2026-02-21T09:47:27.8937600Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:47:27.8937660Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:47:27.8937822Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8937905Z mov.b64 {%r635, %r636}, %rd194; 2026-02-21T09:47:27.8937972Z cvt.rn.f16x2.f32 %r637, %r636, %r635; 2026-02-21T09:47:27.8938132Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8938186Z cvt.u64.u32 %rd195, %r485; 2026-02-21T09:47:27.8938244Z cvt.u64.u32 %rd196, %r486; 2026-02-21T09:47:27.8938296Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:47:27.8938348Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:47:27.8938508Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8938567Z mov.b64 {%r638, %r639}, %rd198; 2026-02-21T09:47:27.8938627Z cvt.rn.f16x2.f32 %r640, %r639, %r638; 2026-02-21T09:47:27.8938788Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8938844Z cvt.u64.u32 %rd199, %r487; 2026-02-21T09:47:27.8938899Z cvt.u64.u32 %rd200, %r488; 2026-02-21T09:47:27.8938950Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:47:27.8939007Z or.b64 %rd202, %rd199, 
%rd201; 2026-02-21T09:47:27.8939171Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8939253Z mov.b64 {%r641, %r642}, %rd202; 2026-02-21T09:47:27.8939309Z cvt.rn.f16x2.f32 %r643, %r642, %r641; 2026-02-21T09:47:27.8939471Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8939526Z cvt.u64.u32 %rd203, %r489; 2026-02-21T09:47:27.8939577Z cvt.u64.u32 %rd204, %r490; 2026-02-21T09:47:27.8939636Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:47:27.8939692Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:47:27.8939857Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8939922Z mov.b64 {%r644, %r645}, %rd206; 2026-02-21T09:47:27.8939982Z cvt.rn.f16x2.f32 %r646, %r645, %r644; 2026-02-21T09:47:27.8940169Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8940228Z cvt.u64.u32 %rd207, %r491; 2026-02-21T09:47:27.8940291Z cvt.u64.u32 %rd208, %r492; 2026-02-21T09:47:27.8940348Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:47:27.8940404Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:47:27.8940580Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8940637Z mov.b64 {%r647, %r648}, %rd210; 2026-02-21T09:47:27.8940700Z cvt.rn.f16x2.f32 %r649, %r648, %r647; 2026-02-21T09:47:27.8940881Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8940958Z cvt.u64.u32 %rd211, %r493; 2026-02-21T09:47:27.8941013Z cvt.u64.u32 %rd212, %r494; 2026-02-21T09:47:27.8941069Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:47:27.8941132Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:47:27.8941297Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8941353Z mov.b64 {%r650, %r651}, %rd214; 2026-02-21T09:47:27.8941422Z cvt.rn.f16x2.f32 %r652, %r651, %r650; 2026-02-21T09:47:27.8941583Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8941638Z cvt.u64.u32 %rd215, %r495; 2026-02-21T09:47:27.8941700Z cvt.u64.u32 %rd216, %r496; 2026-02-21T09:47:27.8941755Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:47:27.8941810Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:47:27.8941974Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8942038Z mov.b64 {%r653, %r654}, %rd218; 2026-02-21T09:47:27.8942097Z cvt.rn.f16x2.f32 %r655, %r654, %r653; 2026-02-21T09:47:27.8942261Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8942323Z cvt.u64.u32 %rd219, %r497; 2026-02-21T09:47:27.8942405Z cvt.u64.u32 %rd220, %r498; 2026-02-21T09:47:27.8942464Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:47:27.8942526Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:47:27.8942692Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8942747Z mov.b64 {%r656, %r657}, %rd222; 2026-02-21T09:47:27.8942807Z cvt.rn.f16x2.f32 %r658, %r657, %r656; 2026-02-21T09:47:27.8942979Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8943035Z cvt.u64.u32 %rd223, %r499; 2026-02-21T09:47:27.8943089Z cvt.u64.u32 %rd224, %r500; 2026-02-21T09:47:27.8943155Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:47:27.8943211Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:47:27.8943374Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8943437Z mov.b64 {%r659, %r660}, %rd226; 2026-02-21T09:47:27.8943498Z cvt.rn.f16x2.f32 %r661, %r660, %r659; 2026-02-21T09:47:27.8943664Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8943741Z cvt.u64.u32 %rd227, %r502; 2026-02-21T09:47:27.8943804Z cvt.u64.u32 %rd228, %r503; 2026-02-21T09:47:27.8943859Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:47:27.8943915Z or.b64 %rd230, %rd227, 
%rd229; 2026-02-21T09:47:27.8944089Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8944146Z mov.b64 {%r662, %r663}, %rd230; 2026-02-21T09:47:27.8944208Z cvt.rn.f16x2.f32 %r664, %r663, %r662; 2026-02-21T09:47:27.8944381Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8944437Z cvt.u64.u32 %rd231, %r504; 2026-02-21T09:47:27.8944492Z cvt.u64.u32 %rd232, %r505; 2026-02-21T09:47:27.8944547Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:47:27.8944610Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:47:27.8944830Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8944888Z mov.b64 {%r665, %r666}, %rd234; 2026-02-21T09:47:27.8944956Z cvt.rn.f16x2.f32 %r667, %r666, %r665; 2026-02-21T09:47:27.8945121Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8945177Z cvt.u64.u32 %rd235, %r506; 2026-02-21T09:47:27.8945238Z cvt.u64.u32 %rd236, %r507; 2026-02-21T09:47:27.8945294Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:47:27.8945348Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:47:27.8945549Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8945612Z mov.b64 {%r668, %r669}, %rd238; 2026-02-21T09:47:27.8945673Z cvt.rn.f16x2.f32 %r670, %r669, %r668; 2026-02-21T09:47:27.8945839Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8945901Z cvt.u64.u32 %rd239, %r508; 2026-02-21T09:47:27.8945956Z cvt.u64.u32 %rd240, %r509; 2026-02-21T09:47:27.8946013Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:47:27.8946075Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:47:27.8946241Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8946297Z mov.b64 {%r671, %r672}, %rd242; 2026-02-21T09:47:27.8946357Z cvt.rn.f16x2.f32 %r673, %r672, %r671; 2026-02-21T09:47:27.8946532Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8946589Z cvt.u64.u32 %rd243, %r510; 2026-02-21T09:47:27.8946644Z cvt.u64.u32 %rd244, %r511; 2026-02-21T09:47:27.8946709Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:47:27.8946766Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:47:27.8946932Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8947019Z mov.b64 {%r674, %r675}, %rd246; 2026-02-21T09:47:27.8947079Z cvt.rn.f16x2.f32 %r676, %r675, %r674; 2026-02-21T09:47:27.8947238Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8947293Z cvt.u64.u32 %rd247, %r512; 2026-02-21T09:47:27.8947354Z cvt.u64.u32 %rd248, %r513; 2026-02-21T09:47:27.8947409Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:47:27.8947465Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:47:27.8947635Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8947691Z mov.b64 {%r677, %r678}, %rd250; 2026-02-21T09:47:27.8947752Z cvt.rn.f16x2.f32 %r679, %r678, %r677; 2026-02-21T09:47:27.8947916Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8947972Z cvt.u64.u32 %rd251, %r514; 2026-02-21T09:47:27.8948026Z cvt.u64.u32 %rd252, %r515; 2026-02-21T09:47:27.8948084Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:47:27.8948150Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:47:27.8948345Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8948402Z mov.b64 {%r680, %r681}, %rd254; 2026-02-21T09:47:27.8948468Z cvt.rn.f16x2.f32 %r682, %r681, %r680; 2026-02-21T09:47:27.8948632Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8948687Z cvt.u64.u32 %rd255, %r516; 2026-02-21T09:47:27.8948749Z cvt.u64.u32 %rd256, %r517; 2026-02-21T09:47:27.8948806Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:47:27.8948865Z or.b64 %rd258, %rd255, 
%rd257; 2026-02-21T09:47:27.8949030Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8949096Z mov.b64 {%r683, %r684}, %rd258; 2026-02-21T09:47:27.8949156Z cvt.rn.f16x2.f32 %r685, %r684, %r683; 2026-02-21T09:47:27.8949347Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8949418Z cvt.u64.u32 %rd259, %r519; 2026-02-21T09:47:27.8949474Z cvt.u64.u32 %rd260, %r520; 2026-02-21T09:47:27.8949530Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:47:27.8949593Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:47:27.8949756Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8949813Z mov.b64 {%r686, %r687}, %rd262; 2026-02-21T09:47:27.8949875Z cvt.rn.f16x2.f32 %r688, %r687, %r686; 2026-02-21T09:47:27.8950046Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8950123Z cvt.u64.u32 %rd263, %r521; 2026-02-21T09:47:27.8950179Z cvt.u64.u32 %rd264, %r522; 2026-02-21T09:47:27.8950242Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:47:27.8950298Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:47:27.8950470Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8950533Z mov.b64 {%r689, %r690}, %rd266; 2026-02-21T09:47:27.8950595Z cvt.rn.f16x2.f32 %r691, %r690, %r689; 2026-02-21T09:47:27.8950766Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8950821Z cvt.u64.u32 %rd267, %r523; 2026-02-21T09:47:27.8950883Z cvt.u64.u32 %rd268, %r524; 2026-02-21T09:47:27.8950938Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:47:27.8950994Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:47:27.8951165Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8951223Z mov.b64 {%r692, %r693}, %rd270; 2026-02-21T09:47:27.8951284Z cvt.rn.f16x2.f32 %r694, %r693, %r692; 2026-02-21T09:47:27.8951461Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8951539Z cvt.u64.u32 %rd271, %r525; 2026-02-21T09:47:27.8951598Z cvt.u64.u32 %rd272, %r526; 2026-02-21T09:47:27.8951654Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:47:27.8951719Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:47:27.8951886Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8951941Z mov.b64 {%r695, %r696}, %rd274; 2026-02-21T09:47:27.8952007Z cvt.rn.f16x2.f32 %r697, %r696, %r695; 2026-02-21T09:47:27.8952172Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8952228Z cvt.u64.u32 %rd275, %r527; 2026-02-21T09:47:27.8952290Z cvt.u64.u32 %rd276, %r528; 2026-02-21T09:47:27.8952346Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:47:27.8952401Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:47:27.8952566Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8952630Z mov.b64 {%r698, %r699}, %rd278; 2026-02-21T09:47:27.8952692Z cvt.rn.f16x2.f32 %r700, %r699, %r698; 2026-02-21T09:47:27.8952856Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8952943Z cvt.u64.u32 %rd279, %r529; 2026-02-21T09:47:27.8952997Z cvt.u64.u32 %rd280, %r530; 2026-02-21T09:47:27.8953053Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:47:27.8953117Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:47:27.8953280Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8953337Z mov.b64 {%r701, %r702}, %rd282; 2026-02-21T09:47:27.8953398Z cvt.rn.f16x2.f32 %r703, %r702, %r701; 2026-02-21T09:47:27.8953576Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8953632Z cvt.u64.u32 %rd283, %r531; 2026-02-21T09:47:27.8953688Z cvt.u64.u32 %rd284, %r532; 2026-02-21T09:47:27.8953749Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:47:27.8953824Z or.b64 %rd286, %rd283, 
%rd285; 2026-02-21T09:47:27.8953992Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8954054Z mov.b64 {%r704, %r705}, %rd286; 2026-02-21T09:47:27.8954112Z cvt.rn.f16x2.f32 %r706, %r705, %r704; 2026-02-21T09:47:27.8954278Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8954333Z cvt.u64.u32 %rd287, %r533; 2026-02-21T09:47:27.8954393Z cvt.u64.u32 %rd288, %r534; 2026-02-21T09:47:27.8954449Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:47:27.8954503Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:47:27.8954732Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8954790Z mov.b64 {%r707, %r708}, %rd290; 2026-02-21T09:47:27.8954849Z cvt.rn.f16x2.f32 %r709, %r708, %r707; 2026-02-21T09:47:27.8955027Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8955083Z cvt.u64.u32 %rd291, %r536; 2026-02-21T09:47:27.8955138Z cvt.u64.u32 %rd292, %r537; 2026-02-21T09:47:27.8955194Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:47:27.8955257Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:47:27.8955423Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8955479Z mov.b64 {%r710, %r711}, %rd294; 2026-02-21T09:47:27.8955547Z cvt.rn.f16x2.f32 %r712, %r711, %r710; 2026-02-21T09:47:27.8955714Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8955771Z cvt.u64.u32 %rd295, %r538; 2026-02-21T09:47:27.8955833Z cvt.u64.u32 %rd296, %r539; 2026-02-21T09:47:27.8955890Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:47:27.8955946Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:47:27.8956140Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8956207Z mov.b64 {%r713, %r714}, %rd298; 2026-02-21T09:47:27.8956268Z cvt.rn.f16x2.f32 %r715, %r714, %r713; 2026-02-21T09:47:27.8956435Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8956500Z cvt.u64.u32 %rd299, %r540; 2026-02-21T09:47:27.8956555Z cvt.u64.u32 %rd300, %r541; 2026-02-21T09:47:27.8956612Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:47:27.8956677Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:47:27.8956843Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8956901Z mov.b64 {%r716, %r717}, %rd302; 2026-02-21T09:47:27.8956959Z cvt.rn.f16x2.f32 %r718, %r717, %r716; 2026-02-21T09:47:27.8957133Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8957190Z cvt.u64.u32 %rd303, %r542; 2026-02-21T09:47:27.8957246Z cvt.u64.u32 %rd304, %r543; 2026-02-21T09:47:27.8957312Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:47:27.8957369Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:47:27.8957564Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8957630Z mov.b64 {%r719, %r720}, %rd306; 2026-02-21T09:47:27.8957693Z cvt.rn.f16x2.f32 %r721, %r720, %r719; 2026-02-21T09:47:27.8957860Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8957918Z cvt.u64.u32 %rd307, %r544; 2026-02-21T09:47:27.8957985Z cvt.u64.u32 %rd308, %r545; 2026-02-21T09:47:27.8958043Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:47:27.8958100Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:47:27.8958276Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8958334Z mov.b64 {%r722, %r723}, %rd310; 2026-02-21T09:47:27.8958394Z cvt.rn.f16x2.f32 %r724, %r723, %r722; 2026-02-21T09:47:27.8958590Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8958648Z cvt.u64.u32 %rd311, %r546; 2026-02-21T09:47:27.8958704Z cvt.u64.u32 %rd312, %r547; 2026-02-21T09:47:27.8958759Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:47:27.8958822Z or.b64 %rd314, %rd311, 
%rd313; 2026-02-21T09:47:27.8958989Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8959044Z mov.b64 {%r725, %r726}, %rd314; 2026-02-21T09:47:27.8959112Z cvt.rn.f16x2.f32 %r727, %r726, %r725; 2026-02-21T09:47:27.8959305Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8959364Z cvt.u64.u32 %rd315, %r548; 2026-02-21T09:47:27.8959428Z cvt.u64.u32 %rd316, %r549; 2026-02-21T09:47:27.8959484Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:47:27.8959541Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:47:27.8959709Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8959776Z mov.b64 {%r728, %r729}, %rd318; 2026-02-21T09:47:27.8959838Z cvt.rn.f16x2.f32 %r730, %r729, %r728; 2026-02-21T09:47:27.8960005Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8960070Z cvt.u64.u32 %rd319, %r550; 2026-02-21T09:47:27.8960125Z cvt.u64.u32 %rd320, %r551; 2026-02-21T09:47:27.8960181Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:47:27.8960237Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:47:27.8960410Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8960470Z mov.b64 {%r731, %r732}, %rd322; 2026-02-21T09:47:27.8960530Z cvt.rn.f16x2.f32 %r733, %r732, %r731; 2026-02-21T09:47:27.8960704Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8960782Z cvt.u64.u32 %rd323, %r553; 2026-02-21T09:47:27.8960838Z cvt.u64.u32 %rd324, %r554; 2026-02-21T09:47:27.8960902Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:47:27.8960957Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:47:27.8961151Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8961216Z mov.b64 {%r734, %r735}, %rd326; 2026-02-21T09:47:27.8961279Z cvt.rn.f16x2.f32 %r736, %r735, %r734; 2026-02-21T09:47:27.8961453Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8961511Z cvt.u64.u32 %rd327, %r555; 2026-02-21T09:47:27.8961577Z cvt.u64.u32 %rd328, %r556; 2026-02-21T09:47:27.8961635Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:47:27.8961692Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:47:27.8961872Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8961932Z mov.b64 {%r737, %r738}, %rd330; 2026-02-21T09:47:27.8961995Z cvt.rn.f16x2.f32 %r739, %r738, %r737; 2026-02-21T09:47:27.8962200Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8962260Z cvt.u64.u32 %rd331, %r557; 2026-02-21T09:47:27.8962318Z cvt.u64.u32 %rd332, %r558; 2026-02-21T09:47:27.8962376Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:47:27.8962442Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:47:27.8962617Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8962676Z mov.b64 {%r740, %r741}, %rd334; 2026-02-21T09:47:27.8962747Z cvt.rn.f16x2.f32 %r742, %r741, %r740; 2026-02-21T09:47:27.8962924Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8962982Z cvt.u64.u32 %rd335, %r559; 2026-02-21T09:47:27.8963047Z cvt.u64.u32 %rd336, %r560; 2026-02-21T09:47:27.8963107Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:47:27.8963200Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:47:27.8963375Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8963444Z mov.b64 {%r743, %r744}, %rd338; 2026-02-21T09:47:27.8963505Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:47:27.8963679Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8963742Z cvt.u64.u32 %rd339, %r561; 2026-02-21T09:47:27.8963800Z cvt.u64.u32 %rd340, %r562; 2026-02-21T09:47:27.8963857Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:47:27.8963937Z or.b64 %rd342, %rd339, 
%rd341; 2026-02-21T09:47:27.8964116Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8964174Z mov.b64 {%r746, %r747}, %rd342; 2026-02-21T09:47:27.8964236Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:47:27.8964414Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8964472Z cvt.u64.u32 %rd343, %r563; 2026-02-21T09:47:27.8964531Z cvt.u64.u32 %rd344, %r564; 2026-02-21T09:47:27.8964594Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:47:27.8964652Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:47:27.8964858Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8964922Z mov.b64 {%r749, %r750}, %rd346; 2026-02-21T09:47:27.8964985Z cvt.rn.f16x2.f32 %r751, %r750, %r749; 2026-02-21T09:47:27.8965159Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8965218Z cvt.u64.u32 %rd347, %r565; 2026-02-21T09:47:27.8965283Z cvt.u64.u32 %rd348, %r566; 2026-02-21T09:47:27.8965342Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:47:27.8965400Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:47:27.8965606Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8965668Z mov.b64 {%r752, %r753}, %rd350; 2026-02-21T09:47:27.8965732Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:47:27.8965913Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8965971Z cvt.u64.u32 %rd351, %r567; 2026-02-21T09:47:27.8966029Z cvt.u64.u32 %rd352, %r568; 2026-02-21T09:47:27.8966086Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:47:27.8966152Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:47:27.8966326Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8966387Z mov.b64 {%r755, %r756}, %rd354; 2026-02-21T09:47:27.8966460Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:47:27.8966635Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8966696Z cvt.u64.u32 %rd355, %r570; 2026-02-21T09:47:27.8966766Z cvt.u64.u32 %rd356, %r571; 2026-02-21T09:47:27.8966826Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:47:27.8966913Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:47:27.8967089Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8967160Z mov.b64 {%r758, %r759}, %rd358; 2026-02-21T09:47:27.8967223Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:47:27.8967396Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8967462Z cvt.u64.u32 %rd359, %r572; 2026-02-21T09:47:27.8967520Z cvt.u64.u32 %rd360, %r573; 2026-02-21T09:47:27.8967580Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:47:27.8967638Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:47:27.8967818Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8967876Z mov.b64 {%r761, %r762}, %rd362; 2026-02-21T09:47:27.8967963Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:47:27.8968145Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8968204Z cvt.u64.u32 %rd363, %r574; 2026-02-21T09:47:27.8968261Z cvt.u64.u32 %rd364, %r575; 2026-02-21T09:47:27.8968327Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:47:27.8968385Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:47:27.8968556Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8968615Z mov.b64 {%r764, %r765}, %rd366; 2026-02-21T09:47:27.8968685Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:47:27.8968888Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8968948Z cvt.u64.u32 %rd367, %r576; 2026-02-21T09:47:27.8969024Z cvt.u64.u32 %rd368, %r577; 2026-02-21T09:47:27.8969080Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:47:27.8969139Z or.b64 %rd370, %rd367, 
%rd369; 2026-02-21T09:47:27.8969310Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8969368Z mov.b64 {%r767, %r768}, %rd370; 2026-02-21T09:47:27.8969430Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:47:27.8969591Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8969655Z cvt.u64.u32 %rd371, %r578; 2026-02-21T09:47:27.8969712Z cvt.u64.u32 %rd372, %r579; 2026-02-21T09:47:27.8969768Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:47:27.8969832Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:47:27.8969995Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8970052Z mov.b64 {%r770, %r771}, %rd374; 2026-02-21T09:47:27.8970119Z cvt.rn.f16x2.f32 %r772, %r771, %r770; 2026-02-21T09:47:27.8970304Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8970363Z cvt.u64.u32 %rd375, %r580; 2026-02-21T09:47:27.8970418Z cvt.u64.u32 %rd376, %r581; 2026-02-21T09:47:27.8970483Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:47:27.8970538Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:47:27.8970706Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8970769Z mov.b64 {%r773, %r774}, %rd378; 2026-02-21T09:47:27.8970829Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:47:27.8970999Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8971064Z cvt.u64.u32 %rd379, %r582; 2026-02-21T09:47:27.8971119Z cvt.u64.u32 %rd380, %r583; 2026-02-21T09:47:27.8971174Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:47:27.8971230Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:47:27.8971402Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8971459Z mov.b64 {%r776, %r777}, %rd382; 2026-02-21T09:47:27.8971518Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:47:27.8971714Z .loc 1 56 52 
// cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8971768Z cvt.u64.u32 %rd383, %r584; 2026-02-21T09:47:27.8971823Z cvt.u64.u32 %rd384, %r585; 2026-02-21T09:47:27.8971884Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:47:27.8971941Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:47:27.8972106Z .loc 1 58 27 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:58:27 2026-02-21T09:47:27.8972164Z mov.b64 {%r779, %r780}, %rd386; 2026-02-21T09:47:27.8972230Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:47:27.8972394Z .loc 1 59 45 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:59:45 2026-02-21T09:47:27.8972462Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.8972521Z bar.sync 0; 2026-02-21T09:47:27.8972644Z st.shared.v4.b32 [%r12], {%r592, %r595, %r598, %r601}; 2026-02-21T09:47:27.8972748Z st.shared.v4.b32 [%r12+16384], {%r688, %r691, %r694, %r697}; 2026-02-21T09:47:27.8972844Z st.shared.v4.b32 [%r13], {%r604, %r607, %r610, %r613}; 2026-02-21T09:47:27.8972935Z st.shared.v4.b32 [%r13+16384], {%r700, %r703, %r706, %r709}; 2026-02-21T09:47:27.8973019Z st.shared.v4.b32 [%r14], {%r616, %r619, %r622, %r625}; 2026-02-21T09:47:27.8973115Z st.shared.v4.b32 [%r14+16384], {%r712, %r715, %r718, %r721}; 2026-02-21T09:47:27.8973198Z st.shared.v4.b32 [%r15], {%r628, %r631, %r634, %r637}; 2026-02-21T09:47:27.8973287Z st.shared.v4.b32 [%r15+16384], {%r724, %r727, %r730, %r733}; 2026-02-21T09:47:27.8973388Z st.shared.v4.b32 [%r16], {%r640, %r643, %r646, %r649}; 2026-02-21T09:47:27.8973482Z st.shared.v4.b32 [%r16+16384], {%r736, %r739, %r742, %r745}; 2026-02-21T09:47:27.8973564Z st.shared.v4.b32 [%r17], {%r652, %r655, %r658, %r661}; 2026-02-21T09:47:27.8973653Z st.shared.v4.b32 [%r17+16384], {%r748, %r751, %r754, %r757}; 2026-02-21T09:47:27.8973744Z st.shared.v4.b32 [%r18], {%r664, %r667, %r670, %r673}; 2026-02-21T09:47:27.8973835Z st.shared.v4.b32 [%r18+16384], {%r760, %r763, %r766, %r769}; 2026-02-21T09:47:27.8973916Z 
st.shared.v4.b32 [%r19], {%r676, %r679, %r682, %r685}; 2026-02-21T09:47:27.8974012Z st.shared.v4.b32 [%r19+16384], {%r772, %r775, %r778, %r781}; 2026-02-21T09:47:27.8974069Z // begin inline asm 2026-02-21T09:47:27.8974140Z fence.proxy.async.shared::cta; 2026-02-21T09:47:27.8974194Z // end inline asm 2026-02-21T09:47:27.8974254Z bar.sync 0; 2026-02-21T09:47:27.8974318Z elect.sync %r782|%p165, -1; 2026-02-21T09:47:27.8974380Z and.pred %p163, %p3, %p165; 2026-02-21T09:47:27.8974442Z // begin inline asm 2026-02-21T09:47:27.8974625Z @%p163 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd130, {%r795, %r797}], [%r350]; 2026-02-21T09:47:27.8974702Z // end inline asm 2026-02-21T09:47:27.8974770Z cp.async.bulk.commit_group; 2026-02-21T09:47:27.8974865Z bra.uni $L__BB0_10; 2026-02-21T09:47:27.8974950Z $L__BB0_11: // %._crit_edge 2026-02-21T09:47:27.8975125Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8975200Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:27.8975252Z bar.sync 0; 2026-02-21T09:47:27.8975309Z @%p83 bra $L__BB0_13; 2026-02-21T09:47:27.8975369Z // %bb.12: 2026-02-21T09:47:27.8975538Z .loc 1 56 52 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:56:52 2026-02-21T09:47:27.8975594Z // begin inline asm 2026-02-21T09:47:27.8975643Z 2026-02-21T09:47:27.8975704Z { 2026-02-21T09:47:27.8975770Z .reg .pred complete; 2026-02-21T09:47:27.8975823Z waitLoop: 2026-02-21T09:47:27.8975946Z mbarrier.try_wait.parity.shared.b64 complete, [%r816], %r817; 2026-02-21T09:47:27.8976008Z @!complete bra.uni waitLoop; 2026-02-21T09:47:27.8976056Z } 2026-02-21T09:47:27.8976060Z 2026-02-21T09:47:27.8976112Z // end inline asm 2026-02-21T09:47:27.8976179Z $L__BB0_13: 2026-02-21T09:47:27.8976351Z .loc 1 33 84 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:84 2026-02-21T09:47:27.8976436Z // begin inline asm 2026-02-21T09:47:27.8976528Z @%p168 mbarrier.inval.shared::cta.b64 [%r219]; 
2026-02-21T09:47:27.8976580Z // end inline asm 2026-02-21T09:47:27.8976633Z bar.sync 0; 2026-02-21T09:47:27.8976693Z // begin inline asm 2026-02-21T09:47:27.8976776Z @%p168 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T09:47:27.8976829Z // end inline asm 2026-02-21T09:47:27.8976880Z bar.sync 0; 2026-02-21T09:47:27.8976942Z // begin inline asm 2026-02-21T09:47:27.8977021Z @%p168 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T09:47:27.8977072Z // end inline asm 2026-02-21T09:47:27.8977130Z bar.sync 0; 2026-02-21T09:47:27.8977184Z // begin inline asm 2026-02-21T09:47:27.8977258Z @%p168 mbarrier.inval.shared::cta.b64 [%r324]; 2026-02-21T09:47:27.8977311Z // end inline asm 2026-02-21T09:47:27.8977402Z add.s32 %r790, %r56, 196640; 2026-02-21T09:47:27.8977459Z // begin inline asm 2026-02-21T09:47:27.8977533Z @%p168 mbarrier.inval.shared::cta.b64 [%r790]; 2026-02-21T09:47:27.8977594Z // end inline asm 2026-02-21T09:47:27.8977645Z bar.sync 0; 2026-02-21T09:47:27.8977699Z // begin inline asm 2026-02-21T09:47:27.8977780Z @%p168 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:47:27.8977834Z // end inline asm 2026-02-21T09:47:27.8977996Z .loc 1 33 4 // cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py:33:4 2026-02-21T09:47:27.8978047Z bar.sync 0; 2026-02-21T09:47:27.8978107Z // begin inline asm 2026-02-21T09:47:27.8978247Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r792, 128; 2026-02-21T09:47:27.8978300Z // end inline asm 2026-02-21T09:47:27.8978358Z ret; 2026-02-21T09:47:27.8978413Z $L__tmp1: 2026-02-21T09:47:27.8978467Z $L__func_end0: 2026-02-21T09:47:27.8978548Z // -- End function 2026-02-21T09:47:27.8978603Z } 2026-02-21T09:47:27.8978814Z .file 1 "/tmp/torchinductor_root/bp/cbpkqccskhgahbhlz36fcqilym5fxjr6iyikqhmfhecormtupz65.py" 2026-02-21T09:47:27.8978876Z .section .debug_abbrev 2026-02-21T09:47:27.8978932Z { 2026-02-21T09:47:27.8979016Z .b8 1 // Abbreviation Code 2026-02-21T09:47:27.8979097Z .b8 17 // DW_TAG_compile_unit 
2026-02-21T09:47:27.8979179Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:27.8979254Z .b8 37 // DW_AT_producer 2026-02-21T09:47:27.8979326Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.8979399Z .b8 19 // DW_AT_language 2026-02-21T09:47:27.8979477Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:27.8979549Z .b8 3 // DW_AT_name 2026-02-21T09:47:27.8979618Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.8979722Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:27.8979795Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:27.8979868Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:27.8979945Z .b8 8 // DW_FORM_string 2026-02-21T09:47:27.8980012Z .b8 0 // EOM(1) 2026-02-21T09:47:27.8980079Z .b8 0 // EOM(2) 2026-02-21T09:47:27.8980142Z .b8 0 // EOM(3) 2026-02-21T09:47:27.8980198Z } 2026-02-21T09:47:27.8980255Z .section .debug_info 2026-02-21T09:47:27.8980304Z { 2026-02-21T09:47:27.8980390Z .b32 104 // Length of Unit 2026-02-21T09:47:27.8980482Z .b8 2 // DWARF version number 2026-02-21T09:47:27.8980548Z .b8 0 2026-02-21T09:47:27.8980699Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:27.8980820Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:27.8980920Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:27.8981040Z .b8 116 // DW_AT_producer 2026-02-21T09:47:27.8981099Z .b8 114 2026-02-21T09:47:27.8981150Z .b8 105 2026-02-21T09:47:27.8981200Z .b8 116 2026-02-21T09:47:27.8981255Z .b8 111 2026-02-21T09:47:27.8981305Z .b8 110 2026-02-21T09:47:27.8981353Z .b8 0 2026-02-21T09:47:27.8981423Z .b8 2 // DW_AT_language 2026-02-21T09:47:27.8981480Z .b8 0 2026-02-21T09:47:27.8981553Z .b8 99 // DW_AT_name 2026-02-21T09:47:27.8981605Z .b8 98 2026-02-21T09:47:27.8981660Z .b8 112 2026-02-21T09:47:27.8981709Z .b8 107 2026-02-21T09:47:27.8981758Z .b8 113 2026-02-21T09:47:27.8981809Z .b8 99 2026-02-21T09:47:27.8981867Z .b8 99 2026-02-21T09:47:27.8981917Z .b8 115 2026-02-21T09:47:27.8981965Z .b8 107 2026-02-21T09:47:27.8982019Z .b8 104 2026-02-21T09:47:27.8982067Z .b8 103 2026-02-21T09:47:27.8982136Z .b8 97 2026-02-21T09:47:27.8982186Z .b8 104 2026-02-21T09:47:27.8982247Z .b8 98 2026-02-21T09:47:27.8982299Z .b8 104 2026-02-21T09:47:27.8982350Z .b8 108 2026-02-21T09:47:27.8982400Z .b8 122 2026-02-21T09:47:27.8982456Z .b8 51 2026-02-21T09:47:27.8982505Z .b8 54 2026-02-21T09:47:27.8982555Z .b8 102 2026-02-21T09:47:27.8982611Z .b8 99 2026-02-21T09:47:27.8982661Z .b8 113 2026-02-21T09:47:27.8982712Z .b8 105 2026-02-21T09:47:27.8982763Z .b8 108 2026-02-21T09:47:27.8982820Z .b8 121 2026-02-21T09:47:27.8982871Z .b8 109 2026-02-21T09:47:27.8982921Z .b8 53 2026-02-21T09:47:27.8982978Z .b8 102 2026-02-21T09:47:27.8983048Z .b8 120 2026-02-21T09:47:27.8983097Z .b8 106 2026-02-21T09:47:27.8983144Z .b8 114 2026-02-21T09:47:27.8983200Z .b8 54 2026-02-21T09:47:27.8983246Z .b8 105 2026-02-21T09:47:27.8983293Z .b8 121 2026-02-21T09:47:27.8983346Z .b8 105 2026-02-21T09:47:27.8983395Z .b8 107 2026-02-21T09:47:27.8983442Z .b8 113 2026-02-21T09:47:27.8983490Z .b8 104 2026-02-21T09:47:27.8983547Z .b8 109 2026-02-21T09:47:27.8983594Z .b8 102 
2026-02-21T09:47:27.8983642Z .b8 104 2026-02-21T09:47:27.8983692Z .b8 101 2026-02-21T09:47:27.8983746Z .b8 99 2026-02-21T09:47:27.8983794Z .b8 111 2026-02-21T09:47:27.8983840Z .b8 114 2026-02-21T09:47:27.8983893Z .b8 109 2026-02-21T09:47:27.8983940Z .b8 116 2026-02-21T09:47:27.8983987Z .b8 117 2026-02-21T09:47:27.8984033Z .b8 112 2026-02-21T09:47:27.8984088Z .b8 122 2026-02-21T09:47:27.8984135Z .b8 54 2026-02-21T09:47:27.8984181Z .b8 53 2026-02-21T09:47:27.8984233Z .b8 46 2026-02-21T09:47:27.8984280Z .b8 112 2026-02-21T09:47:27.8984326Z .b8 121 2026-02-21T09:47:27.8984372Z .b8 0 2026-02-21T09:47:27.8984467Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:27.8984537Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:27.8984584Z .b8 116 2026-02-21T09:47:27.8984639Z .b8 109 2026-02-21T09:47:27.8984718Z .b8 112 2026-02-21T09:47:27.8984766Z .b8 47 2026-02-21T09:47:27.8984813Z .b8 116 2026-02-21T09:47:27.8984895Z .b8 111 2026-02-21T09:47:27.8984945Z .b8 114 2026-02-21T09:47:27.8984994Z .b8 99 2026-02-21T09:47:27.8985043Z .b8 104 2026-02-21T09:47:27.8985098Z .b8 105 2026-02-21T09:47:27.8985144Z .b8 110 2026-02-21T09:47:27.8985191Z .b8 100 2026-02-21T09:47:27.8985246Z .b8 117 2026-02-21T09:47:27.8985294Z .b8 99 2026-02-21T09:47:27.8985342Z .b8 116 2026-02-21T09:47:27.8985389Z .b8 111 2026-02-21T09:47:27.8985447Z .b8 114 2026-02-21T09:47:27.8985495Z .b8 95 2026-02-21T09:47:27.8985543Z .b8 114 2026-02-21T09:47:27.8985594Z .b8 111 2026-02-21T09:47:27.8985642Z .b8 111 2026-02-21T09:47:27.8985689Z .b8 116 2026-02-21T09:47:27.8985736Z .b8 47 2026-02-21T09:47:27.8985792Z .b8 98 2026-02-21T09:47:27.8985840Z .b8 112 2026-02-21T09:47:27.8985889Z .b8 0 2026-02-21T09:47:27.8985947Z } 2026-02-21T09:47:27.8986009Z .section .debug_macinfo { } 2026-02-21T09:47:27.8986013Z 2026-02-21T09:47:27.8986090Z ================================================================ 2026-02-21T09:47:27.8986191Z please share the reproducer above with Triton project. 
2026-02-21T09:47:28.0592315Z [65s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:28.0592805Z 2026-02-21T09:47:28.0597975Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]), static_shapes=True) 2026-02-21T09:47:28.0599152Z 2026-02-21T09:47:28.0599345Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:28.0599580Z `ptxas` stderr: 2026-02-21T09:47:28.0600175Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:28.0600665Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:28.0600821Z 2026-02-21T09:47:28.0601220Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp34g020w7.ptx -o /tmp/tmp34g020w7.ptx.o 2026-02-21T09:47:28.0601673Z 2026-02-21T09:47:28.0601814Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:28.0602011Z 2026-02-21T09:47:28.0602096Z ================================================================ 2026-02-21T09:47:28.0602319Z Internal Triton PTX codegen error 2026-02-21T09:47:28.0602547Z `ptxas` stderr: 2026-02-21T09:47:28.0602968Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:28.0603450Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:28.0603603Z 2026-02-21T09:47:28.0604005Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp34g020w7.ptx -o /tmp/tmp34g020w7.ptx.o 2026-02-21T09:47:28.0604482Z 2026-02-21T09:47:28.0604485Z 2026-02-21T09:47:28.0604542Z // 2026-02-21T09:47:28.0604767Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:28.0604941Z // 2026-02-21T09:47:28.0605014Z 2026-02-21T09:47:28.0605080Z .version 8.7 2026-02-21T09:47:28.0605218Z .target sm_100a 2026-02-21T09:47:28.0605371Z .address_size 64 2026-02-21T09:47:28.0605460Z 2026-02-21T09:47:28.0605587Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:28.0605864Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:28.0606094Z // @_helion_matmul 2026-02-21T09:47:28.0606307Z .visible .entry _helion_matmul( 2026-02-21T09:47:28.0606538Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:28.0606834Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:28.0607090Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:28.0607338Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:28.0607597Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:28.0607811Z ) 2026-02-21T09:47:28.0607933Z .reqntid 256 2026-02-21T09:47:28.0608072Z .maxnreg 32 2026-02-21T09:47:28.0608196Z { 2026-02-21T09:47:28.0608332Z .reg .pred %p<172>; 2026-02-21T09:47:28.0608486Z .reg .b32 %r<589>; 2026-02-21T09:47:28.0608640Z .reg .b64 %rd<259>; 2026-02-21T09:47:28.0608915Z .loc 1 19 0 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19:0 2026-02-21T09:47:28.0609232Z $L__func_begin0: 2026-02-21T09:47:28.0609485Z .loc 1 19 0 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19:0 2026-02-21T09:47:28.0609741Z 
2026-02-21T09:47:28.0609794Z // %bb.0: 2026-02-21T09:47:28.0609968Z ld.param.b64 %rd5, [_helion_matmul_param_0]; 2026-02-21T09:47:28.0610148Z $L__tmp0: 2026-02-21T09:47:28.0610434Z .loc 1 19 0 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:19 2026-02-21T09:47:28.0610710Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:28.0610881Z ld.param.b64 %rd23, [_helion_matmul_param_1]; 2026-02-21T09:47:28.0611073Z setp.lt.u32 %p3, %r1, 32; 2026-02-21T09:47:28.0611255Z ld.param.b64 %rd41, [_helion_matmul_param_2]; 2026-02-21T09:47:28.0611445Z mov.b32 %r56, global_smem; 2026-02-21T09:47:28.0611606Z // begin inline asm 2026-02-21T09:47:28.0611839Z @%p3 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r56], 128; 2026-02-21T09:47:28.0612079Z // end inline asm 2026-02-21T09:47:28.0612241Z ld.param.b64 %rd64, [_helion_matmul_param_3]; 2026-02-21T09:47:28.0612418Z bar.sync 0; 2026-02-21T09:47:28.0612562Z ld.shared.b32 %r563, [global_smem]; 2026-02-21T09:47:28.0612729Z bar.sync 0; 2026-02-21T09:47:28.0612865Z // begin inline asm 2026-02-21T09:47:28.0613090Z @%p3 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:28.0613323Z // end inline asm 2026-02-21T09:47:28.0613572Z .loc 1 21 67 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:21:67 2026-02-21T09:47:28.0613851Z mov.u32 %r586, %ctaid.x; 2026-02-21T09:47:28.0614005Z mov.u32 %r184, %ctaid.y; 2026-02-21T09:47:28.0614147Z mov.u32 %r185, %ctaid.z; 2026-02-21T09:47:28.0614298Z mov.u32 %r186, %nctaid.x; 2026-02-21T09:47:28.0614442Z mov.u32 %r187, %nctaid.y; 2026-02-21T09:47:28.0614604Z mad.lo.s32 %r188, %r185, %r187, %r184; 2026-02-21T09:47:28.0614853Z mad.lo.s32 %r189, %r188, %r186, %r586; 2026-02-21T09:47:28.0615032Z mul.lo.s32 %r190, %r189, 384; 2026-02-21T09:47:28.0615197Z cvt.s64.s32 %rd65, %r190; 2026-02-21T09:47:28.0615349Z add.s64 %rd19, %rd64, %rd65; 2026-02-21T09:47:28.0615510Z shl.b32 %r191, %r1, 2; 2026-02-21T09:47:28.0615659Z add.s32 %r57, %r56, %r191; 2026-02-21T09:47:28.0615817Z 
mov.b32 %r588, 0; 2026-02-21T09:47:28.0615955Z // begin inline asm 2026-02-21T09:47:28.0616112Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:28.0616281Z // end inline asm 2026-02-21T09:47:28.0616427Z bar.warp.sync -1; 2026-02-21T09:47:28.0616569Z setp.eq.b32 %p164, %r1, 0; 2026-02-21T09:47:28.0616731Z cvt.u64.u32 %rd4, %r56; 2026-02-21T09:47:28.0616884Z // begin inline asm 2026-02-21T09:47:28.0617127Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd5; 2026-02-21T09:47:28.0617413Z // end inline asm 2026-02-21T09:47:28.0617542Z // begin inline asm 2026-02-21T09:47:28.0617768Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:28.0618019Z // end inline asm 2026-02-21T09:47:28.0618158Z mov.b32 %r59, 64; 2026-02-21T09:47:28.0618290Z // begin inline asm 2026-02-21T09:47:28.0618528Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:28.0618837Z // end inline asm 2026-02-21T09:47:28.0618968Z mov.b32 %r60, 256; 2026-02-21T09:47:28.0619107Z // begin inline asm 2026-02-21T09:47:28.0619327Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:28.0619589Z // end inline asm 2026-02-21T09:47:28.0619716Z mov.b32 %r61, 2048; 2026-02-21T09:47:28.0619866Z // begin inline asm 2026-02-21T09:47:28.0620103Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:28.0620364Z // end inline asm 2026-02-21T09:47:28.0620505Z // begin inline asm 2026-02-21T09:47:28.0620742Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:28.0621012Z // end inline asm 2026-02-21T09:47:28.0621138Z mov.b64 %rd12, 4096; 2026-02-21T09:47:28.0621279Z // begin inline asm 2026-02-21T09:47:28.0621524Z @%p164 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 0 ], 0x0, %rd12; 2026-02-21T09:47:28.0621799Z // end 
inline asm 2026-02-21T09:47:28.0621935Z mov.b32 %r63, 1; 2026-02-21T09:47:28.0622058Z // begin inline asm 2026-02-21T09:47:28.0622343Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:28.0622616Z // end inline asm 2026-02-21T09:47:28.0622745Z // begin inline asm 2026-02-21T09:47:28.0622978Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:28.0623252Z // end inline asm 2026-02-21T09:47:28.0623382Z // begin inline asm 2026-02-21T09:47:28.0623603Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:28.0623860Z // end inline asm 2026-02-21T09:47:28.0623982Z // begin inline asm 2026-02-21T09:47:28.0624225Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0624493Z // end inline asm 2026-02-21T09:47:28.0624623Z // begin inline asm 2026-02-21T09:47:28.0624914Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:28.0625172Z // end inline asm 2026-02-21T09:47:28.0625307Z // begin inline asm 2026-02-21T09:47:28.0625531Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0625790Z // end inline asm 2026-02-21T09:47:28.0625918Z // begin inline asm 2026-02-21T09:47:28.0626268Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd19 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:28.0626642Z // end inline asm 2026-02-21T09:47:28.0626768Z // begin inline asm 2026-02-21T09:47:28.0626979Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd19 + 0 ], 0x80; 2026-02-21T09:47:28.0627251Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:28.0627436Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:28.0627598Z // end inline asm 2026-02-21T09:47:28.0627725Z bar.sync 0; 2026-02-21T09:47:28.0627857Z cvta.global.u64 %rd58, 
%rd19; 2026-02-21T09:47:28.0628135Z .loc 1 22 68 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:22:68 2026-02-21T09:47:28.0628420Z add.s32 %r192, %r190, 128; 2026-02-21T09:47:28.0628565Z cvt.s64.s32 %rd66, %r192; 2026-02-21T09:47:28.0628717Z add.s64 %rd37, %rd64, %rd66; 2026-02-21T09:47:28.0628860Z bar.sync 0; 2026-02-21T09:47:28.0628985Z // begin inline asm 2026-02-21T09:47:28.0629128Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:28.0629296Z // end inline asm 2026-02-21T09:47:28.0629424Z bar.warp.sync -1; 2026-02-21T09:47:28.0629566Z // begin inline asm 2026-02-21T09:47:28.0629817Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd23; 2026-02-21T09:47:28.0630082Z // end inline asm 2026-02-21T09:47:28.0630213Z // begin inline asm 2026-02-21T09:47:28.0630422Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:28.0630662Z // end inline asm 2026-02-21T09:47:28.0630781Z // begin inline asm 2026-02-21T09:47:28.0631051Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:28.0631312Z // end inline asm 2026-02-21T09:47:28.0631440Z // begin inline asm 2026-02-21T09:47:28.0631662Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r59; 2026-02-21T09:47:28.0631917Z // end inline asm 2026-02-21T09:47:28.0632050Z // begin inline asm 2026-02-21T09:47:28.0632281Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r61; 2026-02-21T09:47:28.0632556Z // end inline asm 2026-02-21T09:47:28.0632686Z mov.b32 %r70, 12288; 2026-02-21T09:47:28.0632829Z // begin inline asm 2026-02-21T09:47:28.0633066Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r70; 2026-02-21T09:47:28.0633327Z // end inline asm 2026-02-21T09:47:28.0633465Z // begin inline asm 2026-02-21T09:47:28.0633710Z @%p164 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd4 + 
0 ], 0x0, %rd12; 2026-02-21T09:47:28.0633996Z // end inline asm 2026-02-21T09:47:28.0634124Z // begin inline asm 2026-02-21T09:47:28.0634401Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:28.0634711Z // end inline asm 2026-02-21T09:47:28.0634849Z // begin inline asm 2026-02-21T09:47:28.0635097Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:28.0635369Z // end inline asm 2026-02-21T09:47:28.0635502Z // begin inline asm 2026-02-21T09:47:28.0635724Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:28.0635989Z // end inline asm 2026-02-21T09:47:28.0636117Z // begin inline asm 2026-02-21T09:47:28.0636368Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0636649Z // end inline asm 2026-02-21T09:47:28.0636778Z // begin inline asm 2026-02-21T09:47:28.0637050Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:28.0637314Z // end inline asm 2026-02-21T09:47:28.0637449Z // begin inline asm 2026-02-21T09:47:28.0637846Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0638107Z // end inline asm 2026-02-21T09:47:28.0638241Z // begin inline asm 2026-02-21T09:47:28.0638579Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd37 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:28.0638958Z // end inline asm 2026-02-21T09:47:28.0639087Z // begin inline asm 2026-02-21T09:47:28.0639299Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd37 + 0 ], 0x80; 2026-02-21T09:47:28.0639576Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:28.0639769Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:28.0639950Z // end inline asm 2026-02-21T09:47:28.0640080Z bar.sync 0; 2026-02-21T09:47:28.0640222Z cvta.global.u64 
%rd59, %rd37; 2026-02-21T09:47:28.0640493Z .loc 1 24 73 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:24:73 2026-02-21T09:47:28.0640789Z add.s32 %r193, %r190, 256; 2026-02-21T09:47:28.0640944Z cvt.s64.s32 %rd67, %r193; 2026-02-21T09:47:28.0641103Z add.s64 %rd55, %rd64, %rd67; 2026-02-21T09:47:28.0641250Z bar.sync 0; 2026-02-21T09:47:28.0641380Z // begin inline asm 2026-02-21T09:47:28.0641524Z @%p3 st.shared.b32 [ %r57 + 0 ], %r588; 2026-02-21T09:47:28.0641699Z // end inline asm 2026-02-21T09:47:28.0641839Z bar.warp.sync -1; 2026-02-21T09:47:28.0641975Z // begin inline asm 2026-02-21T09:47:28.0642217Z @%p164 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd4 + 0 ], %rd41; 2026-02-21T09:47:28.0642487Z // end inline asm 2026-02-21T09:47:28.0642619Z // begin inline asm 2026-02-21T09:47:28.0642834Z @%p164 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1; 2026-02-21T09:47:28.0643088Z // end inline asm 2026-02-21T09:47:28.0643216Z // begin inline asm 2026-02-21T09:47:28.0643485Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r59; 2026-02-21T09:47:28.0643748Z // end inline asm 2026-02-21T09:47:28.0643876Z // begin inline asm 2026-02-21T09:47:28.0644103Z @%p164 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r60; 2026-02-21T09:47:28.0644361Z // end inline asm 2026-02-21T09:47:28.0644493Z // begin inline asm 2026-02-21T09:47:28.0644775Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r70; 2026-02-21T09:47:28.0645050Z // end inline asm 2026-02-21T09:47:28.0645189Z // begin inline asm 2026-02-21T09:47:28.0645433Z @%p164 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r61; 2026-02-21T09:47:28.0645724Z // end inline asm 2026-02-21T09:47:28.0645860Z mov.b64 %rd48, 24576; 2026-02-21T09:47:28.0646012Z // begin inline asm 2026-02-21T09:47:28.0646268Z @%p164 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ 
%rd4 + 0 ], 0x0, %rd48; 2026-02-21T09:47:28.0646564Z // end inline asm 2026-02-21T09:47:28.0646698Z // begin inline asm 2026-02-21T09:47:28.0646986Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0, %r63; 2026-02-21T09:47:28.0647279Z // end inline asm 2026-02-21T09:47:28.0647412Z // begin inline asm 2026-02-21T09:47:28.0647668Z @%p164 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x1, %r63; 2026-02-21T09:47:28.0647949Z // end inline asm 2026-02-21T09:47:28.0648086Z // begin inline asm 2026-02-21T09:47:28.0648316Z @%p164 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x6; 2026-02-21T09:47:28.0648588Z // end inline asm 2026-02-21T09:47:28.0648728Z // begin inline asm 2026-02-21T09:47:28.0648978Z @%p164 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0649270Z // end inline asm 2026-02-21T09:47:28.0649403Z // begin inline asm 2026-02-21T09:47:28.0649807Z @%p164 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x3; 2026-02-21T09:47:28.0650083Z // end inline asm 2026-02-21T09:47:28.0650233Z // begin inline asm 2026-02-21T09:47:28.0650483Z @%p164 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd4 + 0 ], 0x0; 2026-02-21T09:47:28.0650756Z // end inline asm 2026-02-21T09:47:28.0650910Z // begin inline asm 2026-02-21T09:47:28.0651266Z @%p3 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd55 + 0 ], [ %rd4 + 0 ], 0x80; 2026-02-21T09:47:28.0651679Z // end inline asm 2026-02-21T09:47:28.0651827Z // begin inline asm 2026-02-21T09:47:28.0652061Z @%p3 fence.proxy.tensormap::generic.acquire.gpu [ %rd55 + 0 ], 0x80; 2026-02-21T09:47:28.0652363Z @%p3 cp.async.bulk.commit_group ; 2026-02-21T09:47:28.0652565Z @%p3 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:28.0652752Z // end inline asm 2026-02-21T09:47:28.0652877Z bar.sync 0; 2026-02-21T09:47:28.0653020Z 
cvta.global.u64 %rd130, %rd55; 2026-02-21T09:47:28.0653293Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0653588Z max.u32 %r194, %r586, 1535; 2026-02-21T09:47:28.0653740Z shl.b32 %r195, %r194, 5; 2026-02-21T09:47:28.0653898Z add.s32 %r4, %r195, -49120; 2026-02-21T09:47:28.0654057Z sub.s32 %r5, 49152, %r195; 2026-02-21T09:47:28.0654313Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0654603Z shr.u32 %r196, %r1, 5; 2026-02-21T09:47:28.0654796Z shfl.sync.idx.b32 %r6, %r196, 0, 31, -1; 2026-02-21T09:47:28.0654982Z shl.b32 %r197, %r6, 21; 2026-02-21T09:47:28.0655134Z and.b32 %r198, %r197, 6291456; 2026-02-21T09:47:28.0655301Z add.s32 %r199, %r198, %r563; 2026-02-21T09:47:28.0655452Z shl.b32 %r200, %r6, 4; 2026-02-21T09:47:28.0655604Z and.b32 %r201, %r200, 64; 2026-02-21T09:47:28.0655754Z add.s32 %r81, %r199, %r201; 2026-02-21T09:47:28.0655913Z mov.pred %p98, -1; 2026-02-21T09:47:28.0656086Z // begin inline asm 2026-02-21T09:47:28.0656446Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 0], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:28.0656831Z // end inline asm 2026-02-21T09:47:28.0656960Z // begin inline asm 2026-02-21T09:47:28.0657311Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 16], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:28.0657687Z // end inline asm 2026-02-21T09:47:28.0657816Z // begin inline asm 2026-02-21T09:47:28.0658157Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 32], {%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:28.0658526Z // end inline asm 2026-02-21T09:47:28.0658662Z // begin inline asm 2026-02-21T09:47:28.0659004Z @%p98 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r81 + 48], 
{%r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588, %r588}; 2026-02-21T09:47:28.0659410Z // end inline asm 2026-02-21T09:47:28.0659547Z // begin inline asm 2026-02-21T09:47:28.0659690Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:28.0659855Z // end inline asm 2026-02-21T09:47:28.0659981Z bar.sync 0; 2026-02-21T09:47:28.0660225Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0660509Z add.s32 %r587, %r56, 196640; 2026-02-21T09:47:28.0660667Z // begin inline asm 2026-02-21T09:47:28.0660828Z @%p164 mbarrier.init.shared::cta.b64 [%r587], 1; 2026-02-21T09:47:28.0661021Z // end inline asm 2026-02-21T09:47:28.0661154Z bar.sync 0; 2026-02-21T09:47:28.0661283Z add.s32 %r150, %r56, 196648; 2026-02-21T09:47:28.0661442Z // begin inline asm 2026-02-21T09:47:28.0661603Z @%p164 mbarrier.init.shared::cta.b64 [%r150], 1; 2026-02-21T09:47:28.0661803Z // end inline asm 2026-02-21T09:47:28.0661961Z add.s32 %r151, %r56, 196608; 2026-02-21T09:47:28.0662115Z // begin inline asm 2026-02-21T09:47:28.0662272Z @%p164 mbarrier.init.shared::cta.b64 [%r151], 1; 2026-02-21T09:47:28.0662465Z // end inline asm 2026-02-21T09:47:28.0662589Z bar.sync 0; 2026-02-21T09:47:28.0662722Z add.s32 %r152, %r56, 196616; 2026-02-21T09:47:28.0662875Z // begin inline asm 2026-02-21T09:47:28.0663031Z @%p164 mbarrier.init.shared::cta.b64 [%r152], 1; 2026-02-21T09:47:28.0663216Z // end inline asm 2026-02-21T09:47:28.0663340Z bar.sync 0; 2026-02-21T09:47:28.0663473Z add.s32 %r153, %r56, 196624; 2026-02-21T09:47:28.0663617Z // begin inline asm 2026-02-21T09:47:28.0663814Z @%p164 mbarrier.init.shared::cta.b64 [%r153], 1; 2026-02-21T09:47:28.0663990Z // end inline asm 2026-02-21T09:47:28.0664123Z bar.sync 0; 2026-02-21T09:47:28.0664248Z add.s32 %r259, %r56, 196632; 2026-02-21T09:47:28.0664399Z // begin inline asm 2026-02-21T09:47:28.0664558Z @%p164 mbarrier.init.shared::cta.b64 [%r259], 1; 
2026-02-21T09:47:28.0664761Z // end inline asm 2026-02-21T09:47:28.0664905Z setp.lt.s32 %p79, %r5, 1; 2026-02-21T09:47:28.0665059Z setp.gt.s32 %p78, %r5, 0; 2026-02-21T09:47:28.0665326Z .loc 1 40 33 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:40:33 2026-02-21T09:47:28.0665607Z shr.u32 %r202, %r586, 3; 2026-02-21T09:47:28.0665766Z and.b32 %r203, %r202, 268435452; 2026-02-21T09:47:28.0666041Z .loc 1 41 39 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:39 2026-02-21T09:47:28.0666317Z sub.s32 %r204, 192, %r203; 2026-02-21T09:47:28.0666578Z .loc 1 41 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:52 2026-02-21T09:47:28.0666856Z min.s32 %r205, %r204, 4; 2026-02-21T09:47:28.0667111Z .loc 1 42 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:45 2026-02-21T09:47:28.0667388Z and.b32 %r206, %r586, 31; 2026-02-21T09:47:28.0667679Z .loc 1 43 51 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:43:51 2026-02-21T09:47:28.0667960Z div.s32 %r207, %r206, %r205; 2026-02-21T09:47:28.0668214Z .loc 1 42 64 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:64 2026-02-21T09:47:28.0668498Z mul.lo.s32 %r208, %r207, %r205; 2026-02-21T09:47:28.0668656Z sub.s32 %r209, %r206, %r208; 2026-02-21T09:47:28.0668913Z .loc 1 42 30 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:30 2026-02-21T09:47:28.0669185Z add.s32 %r210, %r209, %r203; 2026-02-21T09:47:28.0669442Z .loc 1 44 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:44:27 2026-02-21T09:47:28.0669724Z shl.b32 %r565, %r210, 6; 2026-02-21T09:47:28.0669970Z .loc 1 45 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:45:27 2026-02-21T09:47:28.0670250Z shl.b32 %r567, %r207, 8; 2026-02-21T09:47:28.0670500Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0670777Z bar.sync 0; 2026-02-21T09:47:28.0670956Z and.pred %p69, %p164, %p78; 2026-02-21T09:47:28.0671123Z // begin inline asm 
2026-02-21T09:47:28.0671316Z @%p69 mbarrier.arrive.expect_tx.shared.b64 _, [%r151], 40960; 2026-02-21T09:47:28.0671543Z // end inline asm 2026-02-21T09:47:28.0671796Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0672076Z // begin inline asm 2026-02-21T09:47:28.0672241Z fence.proxy.async.shared::cta; 2026-02-21T09:47:28.0672406Z // end inline asm 2026-02-21T09:47:28.0672547Z bar.sync 0; 2026-02-21T09:47:28.0672684Z elect.sync %r211|%p80, -1; 2026-02-21T09:47:28.0672863Z and.pred %p81, %p78, %p80; 2026-02-21T09:47:28.0673026Z and.pred %p70, %p3, %p81; 2026-02-21T09:47:28.0673191Z // begin inline asm 2026-02-21T09:47:28.0673545Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r56], [%rd58, {%r588, %r567}], [%r151]; 2026-02-21T09:47:28.0673899Z // end inline asm 2026-02-21T09:47:28.0674145Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0674414Z bar.sync 0; 2026-02-21T09:47:28.0674552Z elect.sync %r212|%p82, -1; 2026-02-21T09:47:28.0674734Z and.pred %p83, %p78, %p82; 2026-02-21T09:47:28.0674897Z and.pred %p71, %p3, %p83; 2026-02-21T09:47:28.0675057Z add.s32 %r160, %r56, 131072; 2026-02-21T09:47:28.0675206Z // begin inline asm 2026-02-21T09:47:28.0675532Z @%p71 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r160], [%rd59, {%r588, %r565}], [%r151]; 2026-02-21T09:47:28.0675927Z // end inline asm 2026-02-21T09:47:28.0676175Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0676455Z setp.gt.s32 %p84, %r5, 1; 2026-02-21T09:47:28.0676608Z bar.sync 0; 2026-02-21T09:47:28.0676741Z and.pred %p72, %p164, %p84; 2026-02-21T09:47:28.0676904Z // begin inline asm 2026-02-21T09:47:28.0677093Z @%p72 mbarrier.arrive.expect_tx.shared.b64 _, [%r152], 40960; 2026-02-21T09:47:28.0677305Z // end inline asm 2026-02-21T09:47:28.0677549Z .loc 1 54 31 // 
cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0677816Z bar.sync 0; 2026-02-21T09:47:28.0677954Z elect.sync %r213|%p85, -1; 2026-02-21T09:47:28.0678108Z and.pred %p86, %p84, %p85; 2026-02-21T09:47:28.0678267Z and.pred %p73, %p3, %p86; 2026-02-21T09:47:28.0678414Z add.s32 %r165, %r56, 32768; 2026-02-21T09:47:28.0678568Z // begin inline asm 2026-02-21T09:47:28.0678886Z @%p73 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r165], [%rd58, {%r59, %r567}], [%r152]; 2026-02-21T09:47:28.0679224Z // end inline asm 2026-02-21T09:47:28.0679468Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0679771Z bar.sync 0; 2026-02-21T09:47:28.0679912Z elect.sync %r214|%p87, -1; 2026-02-21T09:47:28.0680064Z and.pred %p88, %p84, %p87; 2026-02-21T09:47:28.0680223Z and.pred %p74, %p3, %p88; 2026-02-21T09:47:28.0680379Z add.s32 %r169, %r56, 139264; 2026-02-21T09:47:28.0680524Z // begin inline asm 2026-02-21T09:47:28.0680837Z @%p74 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r169], [%rd59, {%r59, %r565}], [%r152]; 2026-02-21T09:47:28.0681184Z // end inline asm 2026-02-21T09:47:28.0681428Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0681713Z setp.gt.s32 %p89, %r5, 2; 2026-02-21T09:47:28.0681864Z bar.sync 0; 2026-02-21T09:47:28.0681992Z and.pred %p75, %p164, %p89; 2026-02-21T09:47:28.0682151Z // begin inline asm 2026-02-21T09:47:28.0682340Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r153], 40960; 2026-02-21T09:47:28.0682554Z // end inline asm 2026-02-21T09:47:28.0682804Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0683104Z bar.sync 0; 2026-02-21T09:47:28.0683240Z elect.sync %r215|%p90, -1; 2026-02-21T09:47:28.0683393Z and.pred %p91, %p89, %p90; 2026-02-21T09:47:28.0683552Z and.pred %p76, %p3, %p91; 2026-02-21T09:47:28.0683699Z add.s32 
%r174, %r56, 65536; 2026-02-21T09:47:28.0683852Z mov.b32 %r175, 128; 2026-02-21T09:47:28.0683994Z // begin inline asm 2026-02-21T09:47:28.0684314Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r174], [%rd58, {%r175, %r567}], [%r153]; 2026-02-21T09:47:28.0684657Z // end inline asm 2026-02-21T09:47:28.0684937Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0685223Z bar.sync 0; 2026-02-21T09:47:28.0685352Z elect.sync %r216|%p92, -1; 2026-02-21T09:47:28.0685508Z and.pred %p93, %p89, %p92; 2026-02-21T09:47:28.0685665Z and.pred %p77, %p3, %p93; 2026-02-21T09:47:28.0685844Z add.s32 %r178, %r56, 147456; 2026-02-21T09:47:28.0686001Z // begin inline asm 2026-02-21T09:47:28.0686311Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r178], [%rd59, {%r175, %r565}], [%r153]; 2026-02-21T09:47:28.0686648Z // end inline asm 2026-02-21T09:47:28.0686884Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0687153Z bar.sync 0; 2026-02-21T09:47:28.0687272Z // begin inline asm 2026-02-21T09:47:28.0687405Z 2026-02-21T09:47:28.0687519Z { 2026-02-21T09:47:28.0687639Z @!%p78 bra.uni skipWait; 2026-02-21T09:47:28.0687825Z .reg .pred complete; 2026-02-21T09:47:28.0687961Z waitLoop: 2026-02-21T09:47:28.0688146Z mbarrier.try_wait.parity.shared.b64 complete, [%r151], %r588; 2026-02-21T09:47:28.0688374Z @!complete bra.uni waitLoop; 2026-02-21T09:47:28.0688527Z skipWait: 2026-02-21T09:47:28.0688637Z } 2026-02-21T09:47:28.0688706Z 2026-02-21T09:47:28.0688759Z // end inline asm 2026-02-21T09:47:28.0688994Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0689285Z setp.ne.b32 %p94, %r6, 0; 2026-02-21T09:47:28.0689442Z or.pred %p95, %p79, %p94; 2026-02-21T09:47:28.0689596Z @%p95 bra $L__BB0_2; 2026-02-21T09:47:28.0689743Z // %bb.1: 2026-02-21T09:47:28.0689873Z elect.sync %r233|%p97, -1; 
2026-02-21T09:47:28.0690035Z bfe.u32 %r235, %r56, 4, 14; 2026-02-21T09:47:28.0690186Z cvt.u64.u32 %rd85, %r235; 2026-02-21T09:47:28.0690356Z or.b64 %rd68, %rd85, 4611686293439512576; 2026-02-21T09:47:28.0690535Z bfe.u32 %r237, %r160, 4, 14; 2026-02-21T09:47:28.0690698Z cvt.u64.u32 %rd86, %r237; 2026-02-21T09:47:28.0690862Z or.b64 %rd69, %rd86, 4611686293338849280; 2026-02-21T09:47:28.0691036Z mov.b32 %r218, 135266320; 2026-02-21T09:47:28.0691193Z mov.pred %p96, 0; 2026-02-21T09:47:28.0691334Z // begin inline asm 2026-02-21T09:47:28.0691602Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd68, %rd69, %r218, %p96; 2026-02-21T09:47:28.0691868Z // end inline asm 2026-02-21T09:47:28.0692015Z add.s32 %r238, %r56, 32; 2026-02-21T09:47:28.0692175Z bfe.u32 %r239, %r238, 4, 14; 2026-02-21T09:47:28.0692335Z cvt.u64.u32 %rd87, %r239; 2026-02-21T09:47:28.0692497Z or.b64 %rd70, %rd87, 4611686293439512576; 2026-02-21T09:47:28.0692679Z add.s32 %r240, %r56, 131104; 2026-02-21T09:47:28.0692836Z bfe.u32 %r241, %r240, 4, 14; 2026-02-21T09:47:28.0692986Z cvt.u64.u32 %rd88, %r241; 2026-02-21T09:47:28.0693148Z or.b64 %rd71, %rd88, 4611686293338849280; 2026-02-21T09:47:28.0693321Z // begin inline asm 2026-02-21T09:47:28.0693546Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd70, %rd71, %r218, %p98; 2026-02-21T09:47:28.0693798Z // end inline asm 2026-02-21T09:47:28.0693940Z add.s32 %r242, %r56, 64; 2026-02-21T09:47:28.0694092Z bfe.u32 %r243, %r242, 4, 14; 2026-02-21T09:47:28.0694243Z cvt.u64.u32 %rd89, %r243; 2026-02-21T09:47:28.0694406Z or.b64 %rd72, %rd89, 4611686293439512576; 2026-02-21T09:47:28.0694578Z add.s32 %r244, %r56, 131136; 2026-02-21T09:47:28.0694782Z bfe.u32 %r245, %r244, 4, 14; 2026-02-21T09:47:28.0694965Z cvt.u64.u32 %rd90, %r245; 2026-02-21T09:47:28.0695125Z or.b64 %rd73, %rd90, 4611686293338849280; 2026-02-21T09:47:28.0695294Z // begin inline asm 2026-02-21T09:47:28.0695510Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd72, %rd73, %r218, 
%p98; 2026-02-21T09:47:28.0695761Z // end inline asm 2026-02-21T09:47:28.0695892Z add.s32 %r246, %r56, 96; 2026-02-21T09:47:28.0696043Z bfe.u32 %r247, %r246, 4, 14; 2026-02-21T09:47:28.0696191Z cvt.u64.u32 %rd91, %r247; 2026-02-21T09:47:28.0696352Z or.b64 %rd74, %rd91, 4611686293439512576; 2026-02-21T09:47:28.0696521Z add.s32 %r248, %r56, 131168; 2026-02-21T09:47:28.0696674Z bfe.u32 %r249, %r248, 4, 14; 2026-02-21T09:47:28.0696820Z cvt.u64.u32 %rd92, %r249; 2026-02-21T09:47:28.0696981Z or.b64 %rd75, %rd92, 4611686293338849280; 2026-02-21T09:47:28.0697150Z // begin inline asm 2026-02-21T09:47:28.0697400Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd74, %rd75, %r218, %p98; 2026-02-21T09:47:28.0697652Z // end inline asm 2026-02-21T09:47:28.0697790Z add.s32 %r250, %r56, 16384; 2026-02-21T09:47:28.0697954Z bfe.u32 %r251, %r250, 4, 14; 2026-02-21T09:47:28.0698109Z cvt.u64.u32 %rd93, %r251; 2026-02-21T09:47:28.0698272Z or.b64 %rd76, %rd93, 4611686293439512576; 2026-02-21T09:47:28.0698443Z // begin inline asm 2026-02-21T09:47:28.0698679Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd76, %rd69, %r218, %p96; 2026-02-21T09:47:28.0698919Z // end inline asm 2026-02-21T09:47:28.0699058Z add.s32 %r252, %r56, 16416; 2026-02-21T09:47:28.0699238Z bfe.u32 %r253, %r252, 4, 14; 2026-02-21T09:47:28.0699383Z cvt.u64.u32 %rd94, %r253; 2026-02-21T09:47:28.0699538Z or.b64 %rd78, %rd94, 4611686293439512576; 2026-02-21T09:47:28.0699700Z // begin inline asm 2026-02-21T09:47:28.0699913Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd78, %rd71, %r218, %p98; 2026-02-21T09:47:28.0700149Z // end inline asm 2026-02-21T09:47:28.0700286Z add.s32 %r254, %r56, 16448; 2026-02-21T09:47:28.0700434Z bfe.u32 %r255, %r254, 4, 14; 2026-02-21T09:47:28.0700584Z cvt.u64.u32 %rd95, %r255; 2026-02-21T09:47:28.0700750Z or.b64 %rd80, %rd95, 4611686293439512576; 2026-02-21T09:47:28.0700917Z // begin inline asm 2026-02-21T09:47:28.0701135Z @%p97 
tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd80, %rd73, %r218, %p98; 2026-02-21T09:47:28.0701375Z // end inline asm 2026-02-21T09:47:28.0701512Z add.s32 %r256, %r56, 16480; 2026-02-21T09:47:28.0701660Z bfe.u32 %r257, %r256, 4, 14; 2026-02-21T09:47:28.0701817Z cvt.u64.u32 %rd96, %r257; 2026-02-21T09:47:28.0701968Z or.b64 %rd82, %rd96, 4611686293439512576; 2026-02-21T09:47:28.0702138Z // begin inline asm 2026-02-21T09:47:28.0702350Z @%p97 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd82, %rd75, %r218, %p98; 2026-02-21T09:47:28.0702588Z // end inline asm 2026-02-21T09:47:28.0702754Z add.s32 %r258, %r56, 196640; 2026-02-21T09:47:28.0702906Z cvt.u64.u32 %rd84, %r258; 2026-02-21T09:47:28.0702967Z // begin inline asm 2026-02-21T09:47:28.0703092Z @%p97 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd84]; 2026-02-21T09:47:28.0703145Z // end inline asm 2026-02-21T09:47:28.0703197Z $L__BB0_2: 2026-02-21T09:47:28.0703374Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0703434Z setp.gt.s32 %p116, %r5, 3; 2026-02-21T09:47:28.0703486Z bar.sync 0; 2026-02-21T09:47:28.0703559Z and.pred %p113, %p164, %p116; 2026-02-21T09:47:28.0703613Z // begin inline asm 2026-02-21T09:47:28.0703726Z @%p113 mbarrier.arrive.expect_tx.shared.b64 _, [%r259], 40960; 2026-02-21T09:47:28.0703786Z // end inline asm 2026-02-21T09:47:28.0703950Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0704003Z bar.sync 0; 2026-02-21T09:47:28.0704066Z elect.sync %r271|%p119, -1; 2026-02-21T09:47:28.0704138Z and.pred %p120, %p116, %p119; 2026-02-21T09:47:28.0704199Z and.pred %p114, %p3, %p120; 2026-02-21T09:47:28.0704279Z add.s32 %r260, %r56, 98304; 2026-02-21T09:47:28.0704340Z mov.b32 %r571, 192; 2026-02-21T09:47:28.0704392Z // begin inline asm 2026-02-21T09:47:28.0704638Z @%p114 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r260], [%rd58, {%r571, %r567}], 
[%r259]; 2026-02-21T09:47:28.0704727Z // end inline asm 2026-02-21T09:47:28.0704895Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0704946Z bar.sync 0; 2026-02-21T09:47:28.0705009Z elect.sync %r272|%p121, -1; 2026-02-21T09:47:28.0705078Z and.pred %p122, %p116, %p121; 2026-02-21T09:47:28.0705136Z and.pred %p115, %p3, %p122; 2026-02-21T09:47:28.0705192Z add.s32 %r264, %r56, 155648; 2026-02-21T09:47:28.0705252Z // begin inline asm 2026-02-21T09:47:28.0705521Z @%p115 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r264], [%rd59, {%r571, %r565}], [%r259]; 2026-02-21T09:47:28.0705575Z // end inline asm 2026-02-21T09:47:28.0705745Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0705801Z sub.s32 %r10, 31, %r4; 2026-02-21T09:47:28.0705859Z setp.lt.s32 %p123, %r10, 1; 2026-02-21T09:47:28.0705916Z @%p123 bra $L__BB0_11; 2026-02-21T09:47:28.0705995Z // %bb.3: // %.lr.ph 2026-02-21T09:47:28.0706158Z .loc 1 0 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:0:84 2026-02-21T09:47:28.0706241Z sub.s32 %r11, 28, %r4; 2026-02-21T09:47:28.0706302Z shl.b32 %r279, %r1, 7; 2026-02-21T09:47:28.0706359Z and.b32 %r280, %r279, 32640; 2026-02-21T09:47:28.0706413Z shl.b32 %r281, %r1, 4; 2026-02-21T09:47:28.0706470Z and.b32 %r282, %r281, 112; 2026-02-21T09:47:28.0706533Z or.b32 %r283, %r280, %r282; 2026-02-21T09:47:28.0706588Z add.s32 %r285, %r56, 163840; 2026-02-21T09:47:28.0706645Z add.s32 %r12, %r285, %r283; 2026-02-21T09:47:28.0706708Z xor.b32 %r286, %r283, 16; 2026-02-21T09:47:28.0706765Z add.s32 %r13, %r285, %r286; 2026-02-21T09:47:28.0706818Z xor.b32 %r287, %r283, 32; 2026-02-21T09:47:28.0706879Z add.s32 %r14, %r285, %r287; 2026-02-21T09:47:28.0706933Z xor.b32 %r288, %r283, 48; 2026-02-21T09:47:28.0706987Z add.s32 %r15, %r285, %r288; 2026-02-21T09:47:28.0707040Z xor.b32 %r289, %r283, 64; 2026-02-21T09:47:28.0707103Z add.s32 %r16, %r285, %r289; 
2026-02-21T09:47:28.0707156Z xor.b32 %r290, %r283, 80; 2026-02-21T09:47:28.0707211Z add.s32 %r17, %r285, %r290; 2026-02-21T09:47:28.0707272Z xor.b32 %r291, %r283, 96; 2026-02-21T09:47:28.0707326Z add.s32 %r18, %r285, %r291; 2026-02-21T09:47:28.0707382Z xor.b32 %r292, %r283, 112; 2026-02-21T09:47:28.0707435Z add.s32 %r19, %r285, %r292; 2026-02-21T09:47:28.0707497Z add.s32 %r573, %r56, 196640; 2026-02-21T09:47:28.0707555Z mov.pred %p171, -1; 2026-02-21T09:47:28.0707631Z mov.b32 %r576, 3; 2026-02-21T09:47:28.0707692Z mov.b32 %r572, 0; 2026-02-21T09:47:28.0707745Z mov.b32 %r570, 1; 2026-02-21T09:47:28.0707798Z mov.b32 %r569, 2; 2026-02-21T09:47:28.0707852Z mov.b32 %r566, %r565; 2026-02-21T09:47:28.0707914Z mov.b32 %r568, %r567; 2026-02-21T09:47:28.0707967Z mov.b32 %r574, %r572; 2026-02-21T09:47:28.0708020Z mov.b32 %r575, %r572; 2026-02-21T09:47:28.0708082Z mov.b32 %r577, %r570; 2026-02-21T09:47:28.0708136Z mov.b32 %r578, %r572; 2026-02-21T09:47:28.0708189Z mov.b32 %r579, %r567; 2026-02-21T09:47:28.0708243Z mov.b32 %r580, %r565; 2026-02-21T09:47:28.0708304Z mov.b32 %r582, %r576; 2026-02-21T09:47:28.0708359Z mov.b32 %r583, %r572; 2026-02-21T09:47:28.0708413Z mov.b32 %r584, %r580; 2026-02-21T09:47:28.0708473Z mov.b32 %r585, %r579; 2026-02-21T09:47:28.0708527Z bra.uni $L__BB0_4; 2026-02-21T09:47:28.0708629Z $L__BB0_10: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0708693Z selp.b32 %r577, 0, %r372, %p153; 2026-02-21T09:47:28.0708759Z selp.b32 %r373, 1, 0, %p153; 2026-02-21T09:47:28.0708814Z xor.b32 %r578, %r588, %r373; 2026-02-21T09:47:28.0709002Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0709065Z add.s32 %r583, %r583, 1; 2026-02-21T09:47:28.0709128Z setp.lt.s32 %p162, %r583, %r10; 2026-02-21T09:47:28.0709181Z mov.b32 %r565, %r580; 2026-02-21T09:47:28.0709242Z mov.b32 %r566, %r20; 2026-02-21T09:47:28.0709296Z mov.b32 %r567, %r579; 2026-02-21T09:47:28.0709350Z mov.b32 %r568, %r22; 2026-02-21T09:47:28.0709403Z 
mov.b32 %r569, %r582; 2026-02-21T09:47:28.0709463Z mov.b32 %r570, %r24; 2026-02-21T09:47:28.0709517Z mov.b32 %r572, %r588; 2026-02-21T09:47:28.0709570Z mov.b32 %r573, %r587; 2026-02-21T09:47:28.0709628Z mov.b32 %r579, %r585; 2026-02-21T09:47:28.0709680Z mov.b32 %r580, %r584; 2026-02-21T09:47:28.0709732Z mov.b32 %r582, %r39; 2026-02-21T09:47:28.0709786Z @%p162 bra $L__BB0_4; 2026-02-21T09:47:28.0709847Z bra.uni $L__BB0_11; 2026-02-21T09:47:28.0709980Z $L__BB0_4: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:28.0710148Z .loc 1 0 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:0:84 2026-02-21T09:47:28.0710208Z mov.b32 %r588, %r578; 2026-02-21T09:47:28.0710262Z mov.b32 %r24, %r569; 2026-02-21T09:47:28.0710314Z mov.b32 %r22, %r567; 2026-02-21T09:47:28.0710366Z mov.b32 %r20, %r565; 2026-02-21T09:47:28.0710539Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0710595Z add.s32 %r293, %r582, 1; 2026-02-21T09:47:28.0710677Z setp.eq.b32 %p125, %r582, 31; 2026-02-21T09:47:28.0710746Z selp.b32 %r39, 0, %r293, %p125; 2026-02-21T09:47:28.0710805Z setp.ne.b32 %p126, %r39, 0; 2026-02-21T09:47:28.0710861Z @%p126 bra $L__BB0_6; 2026-02-21T09:47:28.0710963Z // %bb.5: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0711022Z add.s32 %r586, %r586, 1; 2026-02-21T09:47:28.0711186Z .loc 1 39 35 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:39:35 2026-02-21T09:47:28.0711242Z shr.s32 %r294, %r586, 31; 2026-02-21T09:47:28.0711305Z shr.u32 %r295, %r294, 27; 2026-02-21T09:47:28.0711359Z add.s32 %r296, %r586, %r295; 2026-02-21T09:47:28.0711413Z shr.s32 %r297, %r296, 5; 2026-02-21T09:47:28.0711580Z .loc 1 40 33 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:40:33 2026-02-21T09:47:28.0711634Z shl.b32 %r298, %r297, 2; 2026-02-21T09:47:28.0711796Z .loc 1 41 39 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:39 2026-02-21T09:47:28.0711860Z sub.s32 %r299, 192, %r298; 
2026-02-21T09:47:28.0712020Z .loc 1 41 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:41:52 2026-02-21T09:47:28.0712075Z min.s32 %r300, %r299, 4; 2026-02-21T09:47:28.0712255Z .loc 1 42 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:45 2026-02-21T09:47:28.0712320Z and.b32 %r301, %r296, -32; 2026-02-21T09:47:28.0712376Z sub.s32 %r302, %r586, %r301; 2026-02-21T09:47:28.0712539Z .loc 1 43 51 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:43:51 2026-02-21T09:47:28.0712600Z div.s32 %r303, %r302, %r300; 2026-02-21T09:47:28.0712762Z .loc 1 42 64 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:64 2026-02-21T09:47:28.0712821Z mul.lo.s32 %r304, %r303, %r300; 2026-02-21T09:47:28.0712883Z sub.s32 %r305, %r302, %r304; 2026-02-21T09:47:28.0713045Z .loc 1 42 30 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:42:30 2026-02-21T09:47:28.0713100Z add.s32 %r306, %r305, %r298; 2026-02-21T09:47:28.0713266Z .loc 1 44 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:44:27 2026-02-21T09:47:28.0713320Z shl.b32 %r584, %r306, 6; 2026-02-21T09:47:28.0713485Z .loc 1 45 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:45:27 2026-02-21T09:47:28.0713560Z shl.b32 %r585, %r303, 8; 2026-02-21T09:47:28.0713662Z $L__BB0_6: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0713821Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0713875Z add.s32 %r309, %r575, 1; 2026-02-21T09:47:28.0713939Z setp.gt.s32 %p128, %r309, 3; 2026-02-21T09:47:28.0714001Z selp.b32 %r575, 0, %r309, %p128; 2026-02-21T09:47:28.0714058Z selp.b32 %r310, 1, 0, %p128; 2026-02-21T09:47:28.0714120Z xor.b32 %r574, %r574, %r310; 2026-02-21T09:47:28.0714173Z shl.b32 %r311, %r575, 3; 2026-02-21T09:47:28.0714229Z add.s32 %r313, %r56, %r311; 2026-02-21T09:47:28.0714286Z add.s32 %r307, %r313, 196608; 2026-02-21T09:47:28.0714346Z bar.sync 0; 2026-02-21T09:47:28.0714400Z // begin inline asm 
2026-02-21T09:47:28.0714449Z 2026-02-21T09:47:28.0714505Z { 2026-02-21T09:47:28.0714583Z .reg .pred complete; 2026-02-21T09:47:28.0714639Z waitLoop: 2026-02-21T09:47:28.0714788Z mbarrier.try_wait.parity.shared.b64 complete, [%r307], %r574; 2026-02-21T09:47:28.0714859Z @!complete bra.uni waitLoop; 2026-02-21T09:47:28.0714907Z } 2026-02-21T09:47:28.0714912Z 2026-02-21T09:47:28.0714965Z // end inline asm 2026-02-21T09:47:28.0715025Z shl.b32 %r314, %r577, 3; 2026-02-21T09:47:28.0715079Z add.s32 %r315, %r56, %r314; 2026-02-21T09:47:28.0715136Z add.s32 %r587, %r315, 196640; 2026-02-21T09:47:28.0715295Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0715386Z @%p94 bra $L__BB0_8; 2026-02-21T09:47:28.0715476Z // %bb.7: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0715644Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0715710Z shl.b32 %r332, %r575, 15; 2026-02-21T09:47:28.0715768Z add.s32 %r334, %r56, %r332; 2026-02-21T09:47:28.0715934Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0716001Z shl.b32 %r335, %r575, 13; 2026-02-21T09:47:28.0716057Z add.s32 %r336, %r56, %r335; 2026-02-21T09:47:28.0716114Z add.s32 %r337, %r336, 131072; 2026-02-21T09:47:28.0716292Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0716355Z elect.sync %r338|%p130, -1; 2026-02-21T09:47:28.0716413Z bfe.u32 %r339, %r334, 4, 14; 2026-02-21T09:47:28.0716473Z cvt.u64.u32 %rd116, %r339; 2026-02-21T09:47:28.0716562Z or.b64 %rd99, %rd116, 4611686293439512576; 2026-02-21T09:47:28.0716620Z bfe.u32 %r340, %r337, 4, 14; 2026-02-21T09:47:28.0716677Z cvt.u64.u32 %rd117, %r340; 2026-02-21T09:47:28.0716755Z or.b64 %rd100, %rd117, 4611686293338849280; 2026-02-21T09:47:28.0716810Z mov.b32 %r317, 135266320; 2026-02-21T09:47:28.0716897Z // begin inline asm 2026-02-21T09:47:28.0717041Z @%p130 
tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd99, %rd100, %r317, %p171; 2026-02-21T09:47:28.0717103Z // end inline asm 2026-02-21T09:47:28.0717157Z add.s32 %r341, %r334, 32; 2026-02-21T09:47:28.0717211Z bfe.u32 %r342, %r341, 4, 14; 2026-02-21T09:47:28.0717273Z cvt.u64.u32 %rd118, %r342; 2026-02-21T09:47:28.0717340Z or.b64 %rd101, %rd118, 4611686293439512576; 2026-02-21T09:47:28.0717396Z add.s32 %r343, %r336, 131104; 2026-02-21T09:47:28.0717460Z bfe.u32 %r344, %r343, 4, 14; 2026-02-21T09:47:28.0717515Z cvt.u64.u32 %rd119, %r344; 2026-02-21T09:47:28.0717579Z or.b64 %rd102, %rd119, 4611686293338849280; 2026-02-21T09:47:28.0717638Z mov.pred %p131, -1; 2026-02-21T09:47:28.0717701Z // begin inline asm 2026-02-21T09:47:28.0717836Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd101, %rd102, %r317, %p131; 2026-02-21T09:47:28.0717891Z // end inline asm 2026-02-21T09:47:28.0717953Z add.s32 %r345, %r334, 64; 2026-02-21T09:47:28.0718011Z bfe.u32 %r346, %r345, 4, 14; 2026-02-21T09:47:28.0718067Z cvt.u64.u32 %rd120, %r346; 2026-02-21T09:47:28.0718168Z or.b64 %rd103, %rd120, 4611686293439512576; 2026-02-21T09:47:28.0718231Z add.s32 %r347, %r336, 131136; 2026-02-21T09:47:28.0718286Z bfe.u32 %r348, %r347, 4, 14; 2026-02-21T09:47:28.0718342Z cvt.u64.u32 %rd121, %r348; 2026-02-21T09:47:28.0718413Z or.b64 %rd104, %rd121, 4611686293338849280; 2026-02-21T09:47:28.0718466Z // begin inline asm 2026-02-21T09:47:28.0718598Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd103, %rd104, %r317, %p131; 2026-02-21T09:47:28.0718657Z // end inline asm 2026-02-21T09:47:28.0718711Z add.s32 %r349, %r334, 96; 2026-02-21T09:47:28.0718766Z bfe.u32 %r350, %r349, 4, 14; 2026-02-21T09:47:28.0718822Z cvt.u64.u32 %rd122, %r350; 2026-02-21T09:47:28.0718895Z or.b64 %rd105, %rd122, 4611686293439512576; 2026-02-21T09:47:28.0718950Z add.s32 %r351, %r336, 131168; 2026-02-21T09:47:28.0719007Z bfe.u32 %r352, %r351, 4, 14; 2026-02-21T09:47:28.0719095Z cvt.u64.u32 %rd123, %r352; 
2026-02-21T09:47:28.0719161Z or.b64 %rd106, %rd123, 4611686293338849280; 2026-02-21T09:47:28.0719217Z // begin inline asm 2026-02-21T09:47:28.0719355Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 0 ], %rd105, %rd106, %r317, %p131; 2026-02-21T09:47:28.0719408Z // end inline asm 2026-02-21T09:47:28.0719462Z add.s32 %r353, %r334, 16384; 2026-02-21T09:47:28.0719518Z bfe.u32 %r354, %r353, 4, 14; 2026-02-21T09:47:28.0719582Z cvt.u64.u32 %rd124, %r354; 2026-02-21T09:47:28.0719644Z or.b64 %rd107, %rd124, 4611686293439512576; 2026-02-21T09:47:28.0719698Z // begin inline asm 2026-02-21T09:47:28.0719864Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd107, %rd100, %r317, %p171; 2026-02-21T09:47:28.0719917Z // end inline asm 2026-02-21T09:47:28.0719973Z add.s32 %r355, %r334, 16416; 2026-02-21T09:47:28.0720027Z bfe.u32 %r356, %r355, 4, 14; 2026-02-21T09:47:28.0720087Z cvt.u64.u32 %rd125, %r356; 2026-02-21T09:47:28.0720155Z or.b64 %rd109, %rd125, 4611686293439512576; 2026-02-21T09:47:28.0720209Z // begin inline asm 2026-02-21T09:47:28.0720351Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd109, %rd102, %r317, %p131; 2026-02-21T09:47:28.0720405Z // end inline asm 2026-02-21T09:47:28.0720459Z add.s32 %r357, %r334, 16448; 2026-02-21T09:47:28.0720519Z bfe.u32 %r358, %r357, 4, 14; 2026-02-21T09:47:28.0720574Z cvt.u64.u32 %rd126, %r358; 2026-02-21T09:47:28.0720637Z or.b64 %rd111, %rd126, 4611686293439512576; 2026-02-21T09:47:28.0720691Z // begin inline asm 2026-02-21T09:47:28.0720831Z @%p130 tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd111, %rd104, %r317, %p131; 2026-02-21T09:47:28.0720885Z // end inline asm 2026-02-21T09:47:28.0720939Z add.s32 %r359, %r334, 16480; 2026-02-21T09:47:28.0720999Z bfe.u32 %r360, %r359, 4, 14; 2026-02-21T09:47:28.0721053Z cvt.u64.u32 %rd127, %r360; 2026-02-21T09:47:28.0721116Z or.b64 %rd113, %rd127, 4611686293439512576; 2026-02-21T09:47:28.0721196Z // begin inline asm 2026-02-21T09:47:28.0721337Z @%p130 
tcgen05.mma.cta_group::1.kind::f16 [ %r563 + 64 ], %rd113, %rd106, %r317, %p131; 2026-02-21T09:47:28.0721391Z // end inline asm 2026-02-21T09:47:28.0721448Z cvt.u64.u32 %rd115, %r587; 2026-02-21T09:47:28.0721508Z // begin inline asm 2026-02-21T09:47:28.0721630Z @%p130 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd115]; 2026-02-21T09:47:28.0721682Z // end inline asm 2026-02-21T09:47:28.0721782Z $L__BB0_8: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0721950Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0722011Z setp.eq.b32 %p149, %r39, 0; 2026-02-21T09:47:28.0722078Z setp.lt.s32 %p150, %r583, %r11; 2026-02-21T09:47:28.0722242Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0722298Z // begin inline asm 2026-02-21T09:47:28.0722345Z 2026-02-21T09:47:28.0722402Z { 2026-02-21T09:47:28.0722463Z .reg .pred complete; 2026-02-21T09:47:28.0722517Z waitLoop: 2026-02-21T09:47:28.0722662Z mbarrier.try_wait.parity.shared.b64 complete, [%r573], %r572; 2026-02-21T09:47:28.0722725Z @!complete bra.uni waitLoop; 2026-02-21T09:47:28.0722773Z } 2026-02-21T09:47:28.0722776Z 2026-02-21T09:47:28.0722828Z // end inline asm 2026-02-21T09:47:28.0722891Z add.s32 %r372, %r577, 1; 2026-02-21T09:47:28.0722950Z setp.gt.s32 %p153, %r372, 1; 2026-02-21T09:47:28.0723116Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0723180Z add.s32 %r374, %r571, 64; 2026-02-21T09:47:28.0723237Z add.s32 %r375, %r576, 1; 2026-02-21T09:47:28.0723295Z setp.gt.s32 %p154, %r375, 3; 2026-02-21T09:47:28.0723366Z selp.b32 %r576, 0, %r375, %p154; 2026-02-21T09:47:28.0723427Z selp.b32 %r571, 0, %r374, %p149; 2026-02-21T09:47:28.0723483Z shl.b32 %r376, %r576, 3; 2026-02-21T09:47:28.0723541Z add.s32 %r378, %r56, %r376; 2026-02-21T09:47:28.0723628Z add.s32 %r367, %r378, 196608; 2026-02-21T09:47:28.0723684Z bar.sync 0; 2026-02-21T09:47:28.0723749Z and.pred 
%p146, %p164, %p150; 2026-02-21T09:47:28.0723811Z // begin inline asm 2026-02-21T09:47:28.0723921Z @%p146 mbarrier.arrive.expect_tx.shared.b64 _, [%r367], 40960; 2026-02-21T09:47:28.0723974Z // end inline asm 2026-02-21T09:47:28.0724135Z .loc 1 54 31 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:54:31 2026-02-21T09:47:28.0724198Z shl.b32 %r379, %r576, 15; 2026-02-21T09:47:28.0724252Z add.s32 %r364, %r56, %r379; 2026-02-21T09:47:28.0724303Z bar.sync 0; 2026-02-21T09:47:28.0724392Z elect.sync %r380|%p155, -1; 2026-02-21T09:47:28.0724453Z and.pred %p156, %p150, %p155; 2026-02-21T09:47:28.0724513Z and.pred %p147, %p3, %p156; 2026-02-21T09:47:28.0724566Z // begin inline asm 2026-02-21T09:47:28.0724853Z @%p147 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r364], [%rd58, {%r571, %r585}], [%r367]; 2026-02-21T09:47:28.0724911Z // end inline asm 2026-02-21T09:47:28.0725067Z .loc 1 55 44 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:55:44 2026-02-21T09:47:28.0725131Z shl.b32 %r381, %r576, 13; 2026-02-21T09:47:28.0725185Z add.s32 %r382, %r56, %r381; 2026-02-21T09:47:28.0725240Z add.s32 %r368, %r382, 131072; 2026-02-21T09:47:28.0725300Z bar.sync 0; 2026-02-21T09:47:28.0725360Z elect.sync %r383|%p157, -1; 2026-02-21T09:47:28.0725418Z and.pred %p158, %p150, %p157; 2026-02-21T09:47:28.0725485Z and.pred %p148, %p3, %p158; 2026-02-21T09:47:28.0725538Z // begin inline asm 2026-02-21T09:47:28.0725776Z @%p148 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r368], [%rd59, {%r571, %r584}], [%r367]; 2026-02-21T09:47:28.0725829Z // end inline asm 2026-02-21T09:47:28.0725998Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0726089Z setp.ne.b32 %p171, %r570, 31; 2026-02-21T09:47:28.0726152Z @%p171 bra $L__BB0_10; 2026-02-21T09:47:28.0726252Z // %bb.9: // in Loop: Header=BB0_4 Depth=1 2026-02-21T09:47:28.0726421Z .loc 1 56 52 // 
cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0726476Z // begin inline asm 2026-02-21T09:47:28.0726533Z 2026-02-21T09:47:28.0726583Z { 2026-02-21T09:47:28.0726641Z .reg .pred complete; 2026-02-21T09:47:28.0726695Z waitLoop: 2026-02-21T09:47:28.0726815Z mbarrier.try_wait.parity.shared.b64 complete, [%r587], %r588; 2026-02-21T09:47:28.0726876Z @!complete bra.uni waitLoop; 2026-02-21T09:47:28.0726925Z } 2026-02-21T09:47:28.0726929Z 2026-02-21T09:47:28.0726989Z // end inline asm 2026-02-21T09:47:28.0727043Z // begin inline asm 2026-02-21T09:47:28.0727312Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r386, %r387, %r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401}, [%r81 + 0]; 2026-02-21T09:47:28.0727375Z // end inline asm 2026-02-21T09:47:28.0727428Z // begin inline asm 2026-02-21T09:47:28.0727750Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r403, %r404, %r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418}, [%r81 + 16]; 2026-02-21T09:47:28.0727802Z // end inline asm 2026-02-21T09:47:28.0727865Z // begin inline asm 2026-02-21T09:47:28.0728132Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r420, %r421, %r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435}, [%r81 + 32]; 2026-02-21T09:47:28.0728185Z // end inline asm 2026-02-21T09:47:28.0728244Z // begin inline asm 2026-02-21T09:47:28.0728507Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r437, %r438, %r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452}, [%r81 + 48]; 2026-02-21T09:47:28.0728559Z // end inline asm 2026-02-21T09:47:28.0728620Z // begin inline asm 2026-02-21T09:47:28.0728687Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:28.0728770Z // end inline asm 2026-02-21T09:47:28.0728829Z cvt.u64.u32 %rd131, %r386; 2026-02-21T09:47:28.0728893Z cvt.u64.u32 %rd132, %r387; 2026-02-21T09:47:28.0728948Z shl.b64 %rd133, %rd132, 
32; 2026-02-21T09:47:28.0729007Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:47:28.0729178Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0729237Z mov.b64 {%r457, %r458}, %rd134; 2026-02-21T09:47:28.0729303Z cvt.rn.f16x2.f32 %r459, %r458, %r457; 2026-02-21T09:47:28.0729472Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0729556Z cvt.u64.u32 %rd135, %r388; 2026-02-21T09:47:28.0729611Z cvt.u64.u32 %rd136, %r389; 2026-02-21T09:47:28.0729666Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:47:28.0729732Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:47:28.0729898Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0729957Z mov.b64 {%r460, %r461}, %rd138; 2026-02-21T09:47:28.0730031Z cvt.rn.f16x2.f32 %r462, %r461, %r460; 2026-02-21T09:47:28.0730191Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0730247Z cvt.u64.u32 %rd139, %r390; 2026-02-21T09:47:28.0730308Z cvt.u64.u32 %rd140, %r391; 2026-02-21T09:47:28.0730363Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:47:28.0730419Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:47:28.0730578Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0730644Z mov.b64 {%r463, %r464}, %rd142; 2026-02-21T09:47:28.0730708Z cvt.rn.f16x2.f32 %r465, %r464, %r463; 2026-02-21T09:47:28.0730870Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0730933Z cvt.u64.u32 %rd143, %r392; 2026-02-21T09:47:28.0731011Z cvt.u64.u32 %rd144, %r393; 2026-02-21T09:47:28.0731069Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:47:28.0731133Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:47:28.0731298Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0731355Z mov.b64 {%r466, %r467}, %rd146; 2026-02-21T09:47:28.0731417Z cvt.rn.f16x2.f32 %r468, 
%r467, %r466; 2026-02-21T09:47:28.0731590Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0731646Z cvt.u64.u32 %rd147, %r394; 2026-02-21T09:47:28.0731699Z cvt.u64.u32 %rd148, %r395; 2026-02-21T09:47:28.0731765Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:47:28.0731822Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:47:28.0731985Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0732047Z mov.b64 {%r469, %r470}, %rd150; 2026-02-21T09:47:28.0732111Z cvt.rn.f16x2.f32 %r471, %r470, %r469; 2026-02-21T09:47:28.0732275Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0732361Z cvt.u64.u32 %rd151, %r396; 2026-02-21T09:47:28.0732427Z cvt.u64.u32 %rd152, %r397; 2026-02-21T09:47:28.0732489Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:47:28.0732545Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:47:28.0732714Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0732770Z mov.b64 {%r472, %r473}, %rd154; 2026-02-21T09:47:28.0732830Z cvt.rn.f16x2.f32 %r474, %r473, %r472; 2026-02-21T09:47:28.0733001Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0733058Z cvt.u64.u32 %rd155, %r398; 2026-02-21T09:47:28.0733112Z cvt.u64.u32 %rd156, %r399; 2026-02-21T09:47:28.0733166Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:47:28.0733230Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:47:28.0733420Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0733480Z mov.b64 {%r475, %r476}, %rd158; 2026-02-21T09:47:28.0733550Z cvt.rn.f16x2.f32 %r477, %r476, %r475; 2026-02-21T09:47:28.0733712Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0733767Z cvt.u64.u32 %rd159, %r400; 2026-02-21T09:47:28.0733830Z cvt.u64.u32 %rd160, %r401; 2026-02-21T09:47:28.0733890Z shl.b64 %rd161, %rd160, 
32; 2026-02-21T09:47:28.0733950Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:47:28.0734121Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0734212Z mov.b64 {%r478, %r479}, %rd162; 2026-02-21T09:47:28.0734275Z cvt.rn.f16x2.f32 %r480, %r479, %r478; 2026-02-21T09:47:28.0734444Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0734512Z cvt.u64.u32 %rd163, %r403; 2026-02-21T09:47:28.0734570Z cvt.u64.u32 %rd164, %r404; 2026-02-21T09:47:28.0734630Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:47:28.0734723Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:47:28.0734898Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0734958Z mov.b64 {%r481, %r482}, %rd166; 2026-02-21T09:47:28.0735020Z cvt.rn.f16x2.f32 %r483, %r482, %r481; 2026-02-21T09:47:28.0735198Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0735258Z cvt.u64.u32 %rd167, %r405; 2026-02-21T09:47:28.0735317Z cvt.u64.u32 %rd168, %r406; 2026-02-21T09:47:28.0735384Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:47:28.0735444Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:47:28.0735616Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0735711Z mov.b64 {%r484, %r485}, %rd170; 2026-02-21T09:47:28.0735776Z cvt.rn.f16x2.f32 %r486, %r485, %r484; 2026-02-21T09:47:28.0735953Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0736011Z cvt.u64.u32 %rd171, %r407; 2026-02-21T09:47:28.0736075Z cvt.u64.u32 %rd172, %r408; 2026-02-21T09:47:28.0736133Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:47:28.0736191Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:47:28.0736368Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0736428Z mov.b64 {%r487, %r488}, %rd174; 2026-02-21T09:47:28.0736492Z cvt.rn.f16x2.f32 %r489, 
%r488, %r487; 2026-02-21T09:47:28.0736672Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0736730Z cvt.u64.u32 %rd175, %r409; 2026-02-21T09:47:28.0736789Z cvt.u64.u32 %rd176, %r410; 2026-02-21T09:47:28.0736846Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:47:28.0736914Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:47:28.0737086Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0737176Z mov.b64 {%r490, %r491}, %rd178; 2026-02-21T09:47:28.0737248Z cvt.rn.f16x2.f32 %r492, %r491, %r490; 2026-02-21T09:47:28.0737419Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0737476Z cvt.u64.u32 %rd179, %r411; 2026-02-21T09:47:28.0737539Z cvt.u64.u32 %rd180, %r412; 2026-02-21T09:47:28.0737597Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:47:28.0737657Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:47:28.0737829Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0737894Z mov.b64 {%r493, %r494}, %rd182; 2026-02-21T09:47:28.0737957Z cvt.rn.f16x2.f32 %r495, %r494, %r493; 2026-02-21T09:47:28.0738151Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0738219Z cvt.u64.u32 %rd183, %r413; 2026-02-21T09:47:28.0738277Z cvt.u64.u32 %rd184, %r414; 2026-02-21T09:47:28.0738335Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:47:28.0738400Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:47:28.0738572Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0738630Z mov.b64 {%r496, %r497}, %rd186; 2026-02-21T09:47:28.0738693Z cvt.rn.f16x2.f32 %r498, %r497, %r496; 2026-02-21T09:47:28.0738870Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0738955Z cvt.u64.u32 %rd187, %r415; 2026-02-21T09:47:28.0739013Z cvt.u64.u32 %rd188, %r416; 2026-02-21T09:47:28.0739078Z shl.b64 %rd189, %rd188, 
32; 2026-02-21T09:47:28.0739136Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:47:28.0739309Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0739374Z mov.b64 {%r499, %r500}, %rd190; 2026-02-21T09:47:28.0739440Z cvt.rn.f16x2.f32 %r501, %r500, %r499; 2026-02-21T09:47:28.0739609Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0739668Z cvt.u64.u32 %rd191, %r417; 2026-02-21T09:47:28.0739733Z cvt.u64.u32 %rd192, %r418; 2026-02-21T09:47:28.0739791Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:47:28.0739849Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:47:28.0740027Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0740088Z mov.b64 {%r502, %r503}, %rd194; 2026-02-21T09:47:28.0740151Z cvt.rn.f16x2.f32 %r504, %r503, %r502; 2026-02-21T09:47:28.0740331Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0740411Z cvt.u64.u32 %rd195, %r420; 2026-02-21T09:47:28.0740473Z cvt.u64.u32 %rd196, %r421; 2026-02-21T09:47:28.0740531Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:47:28.0740600Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:47:28.0740777Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0740836Z mov.b64 {%r505, %r506}, %rd198; 2026-02-21T09:47:28.0740908Z cvt.rn.f16x2.f32 %r507, %r506, %r505; 2026-02-21T09:47:28.0741082Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0741141Z cvt.u64.u32 %rd199, %r422; 2026-02-21T09:47:28.0741210Z cvt.u64.u32 %rd200, %r423; 2026-02-21T09:47:28.0741272Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:47:28.0741333Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:47:28.0741515Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0741579Z mov.b64 {%r508, %r509}, %rd202; 2026-02-21T09:47:28.0741641Z cvt.rn.f16x2.f32 %r510, 
%r509, %r508; 2026-02-21T09:47:28.0741804Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0741890Z cvt.u64.u32 %rd203, %r424; 2026-02-21T09:47:28.0741945Z cvt.u64.u32 %rd204, %r425; 2026-02-21T09:47:28.0742002Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:47:28.0742058Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:47:28.0742229Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0742285Z mov.b64 {%r511, %r512}, %rd206; 2026-02-21T09:47:28.0742345Z cvt.rn.f16x2.f32 %r513, %r512, %r511; 2026-02-21T09:47:28.0742517Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0742573Z cvt.u64.u32 %rd207, %r426; 2026-02-21T09:47:28.0742628Z cvt.u64.u32 %rd208, %r427; 2026-02-21T09:47:28.0742690Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:47:28.0742749Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:47:28.0742937Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0743002Z mov.b64 {%r514, %r515}, %rd210; 2026-02-21T09:47:28.0743062Z cvt.rn.f16x2.f32 %r516, %r515, %r514; 2026-02-21T09:47:28.0743227Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0743284Z cvt.u64.u32 %rd211, %r428; 2026-02-21T09:47:28.0743345Z cvt.u64.u32 %rd212, %r429; 2026-02-21T09:47:28.0743401Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:47:28.0743458Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:47:28.0743648Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0743704Z mov.b64 {%r517, %r518}, %rd214; 2026-02-21T09:47:28.0743764Z cvt.rn.f16x2.f32 %r519, %r518, %r517; 2026-02-21T09:47:28.0743938Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0743994Z cvt.u64.u32 %rd215, %r430; 2026-02-21T09:47:28.0744051Z cvt.u64.u32 %rd216, %r431; 2026-02-21T09:47:28.0744106Z shl.b64 %rd217, %rd216, 
32; 2026-02-21T09:47:28.0744170Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:47:28.0744333Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0744389Z mov.b64 {%r520, %r521}, %rd218; 2026-02-21T09:47:28.0744457Z cvt.rn.f16x2.f32 %r522, %r521, %r520; 2026-02-21T09:47:28.0744622Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0744703Z cvt.u64.u32 %rd219, %r432; 2026-02-21T09:47:28.0744768Z cvt.u64.u32 %rd220, %r433; 2026-02-21T09:47:28.0744823Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:47:28.0744877Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:47:28.0745078Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0745145Z mov.b64 {%r523, %r524}, %rd222; 2026-02-21T09:47:28.0745206Z cvt.rn.f16x2.f32 %r525, %r524, %r523; 2026-02-21T09:47:28.0745370Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0745433Z cvt.u64.u32 %rd223, %r434; 2026-02-21T09:47:28.0745488Z cvt.u64.u32 %rd224, %r435; 2026-02-21T09:47:28.0745544Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:47:28.0745600Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:47:28.0745772Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0745829Z mov.b64 {%r526, %r527}, %rd226; 2026-02-21T09:47:28.0745888Z cvt.rn.f16x2.f32 %r528, %r527, %r526; 2026-02-21T09:47:28.0746060Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0746116Z cvt.u64.u32 %rd227, %r437; 2026-02-21T09:47:28.0746171Z cvt.u64.u32 %rd228, %r438; 2026-02-21T09:47:28.0746234Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:47:28.0746290Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:47:28.0746473Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0746535Z mov.b64 {%r529, %r530}, %rd230; 2026-02-21T09:47:28.0746595Z cvt.rn.f16x2.f32 %r531, 
%r530, %r529; 2026-02-21T09:47:28.0746754Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0746808Z cvt.u64.u32 %rd231, %r439; 2026-02-21T09:47:28.0746869Z cvt.u64.u32 %rd232, %r440; 2026-02-21T09:47:28.0746926Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:47:28.0746981Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:47:28.0747148Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0747205Z mov.b64 {%r532, %r533}, %rd234; 2026-02-21T09:47:28.0747264Z cvt.rn.f16x2.f32 %r534, %r533, %r532; 2026-02-21T09:47:28.0747457Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0747515Z cvt.u64.u32 %rd235, %r441; 2026-02-21T09:47:28.0747570Z cvt.u64.u32 %rd236, %r442; 2026-02-21T09:47:28.0747626Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:47:28.0747690Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:47:28.0747856Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0747911Z mov.b64 {%r535, %r536}, %rd238; 2026-02-21T09:47:28.0747977Z cvt.rn.f16x2.f32 %r537, %r536, %r535; 2026-02-21T09:47:28.0748137Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0748220Z cvt.u64.u32 %rd239, %r443; 2026-02-21T09:47:28.0748286Z cvt.u64.u32 %rd240, %r444; 2026-02-21T09:47:28.0748343Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:47:28.0748398Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:47:28.0748563Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0748627Z mov.b64 {%r538, %r539}, %rd242; 2026-02-21T09:47:28.0748686Z cvt.rn.f16x2.f32 %r540, %r539, %r538; 2026-02-21T09:47:28.0748848Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0748911Z cvt.u64.u32 %rd243, %r445; 2026-02-21T09:47:28.0748966Z cvt.u64.u32 %rd244, %r446; 2026-02-21T09:47:28.0749022Z shl.b64 %rd245, %rd244, 
32; 2026-02-21T09:47:28.0749078Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:47:28.0749243Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0749301Z mov.b64 {%r541, %r542}, %rd246; 2026-02-21T09:47:28.0749361Z cvt.rn.f16x2.f32 %r543, %r542, %r541; 2026-02-21T09:47:28.0749531Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0749610Z cvt.u64.u32 %rd247, %r447; 2026-02-21T09:47:28.0749669Z cvt.u64.u32 %rd248, %r448; 2026-02-21T09:47:28.0749735Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:47:28.0749792Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:47:28.0749958Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0750017Z mov.b64 {%r544, %r545}, %rd250; 2026-02-21T09:47:28.0750091Z cvt.rn.f16x2.f32 %r546, %r545, %r544; 2026-02-21T09:47:28.0750253Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0750309Z cvt.u64.u32 %rd251, %r449; 2026-02-21T09:47:28.0750374Z cvt.u64.u32 %rd252, %r450; 2026-02-21T09:47:28.0750429Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:47:28.0750486Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:47:28.0750656Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0750713Z mov.b64 {%r547, %r548}, %rd254; 2026-02-21T09:47:28.0750774Z cvt.rn.f16x2.f32 %r549, %r548, %r547; 2026-02-21T09:47:28.0750960Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 2026-02-21T09:47:28.0751023Z cvt.u64.u32 %rd255, %r451; 2026-02-21T09:47:28.0751078Z cvt.u64.u32 %rd256, %r452; 2026-02-21T09:47:28.0751132Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:47:28.0751195Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:47:28.0751356Z .loc 1 58 27 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:58:27 2026-02-21T09:47:28.0751414Z mov.b64 {%r550, %r551}, %rd258; 2026-02-21T09:47:28.0751482Z cvt.rn.f16x2.f32 %r552, 
%r551, %r550; 2026-02-21T09:47:28.0751643Z .loc 1 59 45 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:59:45 2026-02-21T09:47:28.0751713Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:28.0751765Z bar.sync 0; 2026-02-21T09:47:28.0751867Z st.shared.v4.b32 [%r12], {%r459, %r462, %r465, %r468}; 2026-02-21T09:47:28.0751979Z st.shared.v4.b32 [%r13], {%r471, %r474, %r477, %r480}; 2026-02-21T09:47:28.0752065Z st.shared.v4.b32 [%r14], {%r483, %r486, %r489, %r492}; 2026-02-21T09:47:28.0752154Z st.shared.v4.b32 [%r15], {%r495, %r498, %r501, %r504}; 2026-02-21T09:47:28.0752233Z st.shared.v4.b32 [%r16], {%r507, %r510, %r513, %r516}; 2026-02-21T09:47:28.0752311Z st.shared.v4.b32 [%r17], {%r519, %r522, %r525, %r528}; 2026-02-21T09:47:28.0752396Z st.shared.v4.b32 [%r18], {%r531, %r534, %r537, %r540}; 2026-02-21T09:47:28.0752474Z st.shared.v4.b32 [%r19], {%r543, %r546, %r549, %r552}; 2026-02-21T09:47:28.0752530Z // begin inline asm 2026-02-21T09:47:28.0752623Z fence.proxy.async.shared::cta; 2026-02-21T09:47:28.0752683Z // end inline asm 2026-02-21T09:47:28.0752736Z bar.sync 0; 2026-02-21T09:47:28.0752798Z elect.sync %r553|%p161, -1; 2026-02-21T09:47:28.0752865Z and.pred %p159, %p3, %p161; 2026-02-21T09:47:28.0752918Z // begin inline asm 2026-02-21T09:47:28.0753097Z @%p159 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd130, {%r566, %r568}], [%r285]; 2026-02-21T09:47:28.0753158Z // end inline asm 2026-02-21T09:47:28.0753222Z cp.async.bulk.commit_group; 2026-02-21T09:47:28.0753278Z bra.uni $L__BB0_10; 2026-02-21T09:47:28.0753358Z $L__BB0_11: // %._crit_edge 2026-02-21T09:47:28.0753529Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0753598Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:28.0753651Z bar.sync 0; 2026-02-21T09:47:28.0753715Z @%p79 bra $L__BB0_13; 2026-02-21T09:47:28.0753767Z // %bb.12: 2026-02-21T09:47:28.0753930Z .loc 1 56 52 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:56:52 
2026-02-21T09:47:28.0753993Z // begin inline asm 2026-02-21T09:47:28.0754042Z 2026-02-21T09:47:28.0754090Z { 2026-02-21T09:47:28.0754146Z .reg .pred complete; 2026-02-21T09:47:28.0754205Z waitLoop: 2026-02-21T09:47:28.0754339Z mbarrier.try_wait.parity.shared.b64 complete, [%r587], %r588; 2026-02-21T09:47:28.0754403Z @!complete bra.uni waitLoop; 2026-02-21T09:47:28.0754460Z } 2026-02-21T09:47:28.0754463Z 2026-02-21T09:47:28.0754516Z // end inline asm 2026-02-21T09:47:28.0754568Z $L__BB0_13: 2026-02-21T09:47:28.0754759Z .loc 1 33 84 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:84 2026-02-21T09:47:28.0754821Z // begin inline asm 2026-02-21T09:47:28.0754903Z @%p164 mbarrier.inval.shared::cta.b64 [%r151]; 2026-02-21T09:47:28.0754954Z // end inline asm 2026-02-21T09:47:28.0755014Z bar.sync 0; 2026-02-21T09:47:28.0755068Z // begin inline asm 2026-02-21T09:47:28.0755149Z @%p164 mbarrier.inval.shared::cta.b64 [%r152]; 2026-02-21T09:47:28.0755208Z // end inline asm 2026-02-21T09:47:28.0755259Z bar.sync 0; 2026-02-21T09:47:28.0755311Z // begin inline asm 2026-02-21T09:47:28.0755388Z @%p164 mbarrier.inval.shared::cta.b64 [%r153]; 2026-02-21T09:47:28.0755449Z // end inline asm 2026-02-21T09:47:28.0755503Z bar.sync 0; 2026-02-21T09:47:28.0755556Z // begin inline asm 2026-02-21T09:47:28.0755668Z @%p164 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:47:28.0755720Z // end inline asm 2026-02-21T09:47:28.0755779Z add.s32 %r561, %r56, 196640; 2026-02-21T09:47:28.0755833Z // begin inline asm 2026-02-21T09:47:28.0755917Z @%p164 mbarrier.inval.shared::cta.b64 [%r561]; 2026-02-21T09:47:28.0755970Z // end inline asm 2026-02-21T09:47:28.0756020Z bar.sync 0; 2026-02-21T09:47:28.0756081Z // begin inline asm 2026-02-21T09:47:28.0756157Z @%p164 mbarrier.inval.shared::cta.b64 [%r150]; 2026-02-21T09:47:28.0756210Z // end inline asm 2026-02-21T09:47:28.0756376Z .loc 1 33 4 // cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py:33:4 2026-02-21T09:47:28.0756434Z bar.sync 
0; 2026-02-21T09:47:28.0756487Z // begin inline asm 2026-02-21T09:47:28.0756601Z @%p3 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r563, 128; 2026-02-21T09:47:28.0756662Z // end inline asm 2026-02-21T09:47:28.0756737Z ret; 2026-02-21T09:47:28.0756791Z $L__tmp1: 2026-02-21T09:47:28.0756855Z $L__func_end0: 2026-02-21T09:47:28.0756935Z // -- End function 2026-02-21T09:47:28.0756984Z } 2026-02-21T09:47:28.0757185Z .file 1 "/tmp/torchinductor_root/y7/cy7zcntcjdrhhscyk724vi6i2pewhgft3sgoliywrco4yvduksb6.py" 2026-02-21T09:47:28.0757254Z .section .debug_abbrev 2026-02-21T09:47:28.0757302Z { 2026-02-21T09:47:28.0757387Z .b8 1 // Abbreviation Code 2026-02-21T09:47:28.0757481Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:28.0757590Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:28.0757669Z .b8 37 // DW_AT_producer 2026-02-21T09:47:28.0757753Z .b8 8 // DW_FORM_string 2026-02-21T09:47:28.0757827Z .b8 19 // DW_AT_language 2026-02-21T09:47:28.0757905Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:28.0757979Z .b8 3 // DW_AT_name 2026-02-21T09:47:28.0758059Z .b8 8 // DW_FORM_string 2026-02-21T09:47:28.0758136Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:28.0758207Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:28.0758288Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:28.0758358Z .b8 8 // DW_FORM_string 2026-02-21T09:47:28.0758427Z .b8 0 // EOM(1) 2026-02-21T09:47:28.0758502Z .b8 0 // EOM(2) 2026-02-21T09:47:28.0758566Z .b8 0 // EOM(3) 2026-02-21T09:47:28.0758615Z } 2026-02-21T09:47:28.0758672Z .section .debug_info 2026-02-21T09:47:28.0758728Z { 2026-02-21T09:47:28.0758809Z .b32 104 // Length of Unit 2026-02-21T09:47:28.0758921Z .b8 2 // DWARF version number 2026-02-21T09:47:28.0758978Z .b8 0 2026-02-21T09:47:28.0759092Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:28.0759178Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:28.0759273Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:28.0759356Z .b8 116 // DW_AT_producer 2026-02-21T09:47:28.0759407Z .b8 114 2026-02-21T09:47:28.0759457Z .b8 105 2026-02-21T09:47:28.0759512Z .b8 116 2026-02-21T09:47:28.0759559Z .b8 111 2026-02-21T09:47:28.0759610Z .b8 110 2026-02-21T09:47:28.0759659Z .b8 0 2026-02-21T09:47:28.0759735Z .b8 2 // DW_AT_language 2026-02-21T09:47:28.0759784Z .b8 0 2026-02-21T09:47:28.0759855Z .b8 99 // DW_AT_name 2026-02-21T09:47:28.0759913Z .b8 121 2026-02-21T09:47:28.0759964Z .b8 55 2026-02-21T09:47:28.0760012Z .b8 122 2026-02-21T09:47:28.0760063Z .b8 99 2026-02-21T09:47:28.0760120Z .b8 110 2026-02-21T09:47:28.0760190Z .b8 116 2026-02-21T09:47:28.0760238Z .b8 99 2026-02-21T09:47:28.0760295Z .b8 106 2026-02-21T09:47:28.0760345Z .b8 100 2026-02-21T09:47:28.0760393Z .b8 114 2026-02-21T09:47:28.0760441Z .b8 104 2026-02-21T09:47:28.0760495Z .b8 104 2026-02-21T09:47:28.0760543Z .b8 115 2026-02-21T09:47:28.0760591Z .b8 99 2026-02-21T09:47:28.0760646Z .b8 121 2026-02-21T09:47:28.0760693Z .b8 107 2026-02-21T09:47:28.0760741Z .b8 55 2026-02-21T09:47:28.0760789Z .b8 50 2026-02-21T09:47:28.0760844Z .b8 52 2026-02-21T09:47:28.0760892Z .b8 118 2026-02-21T09:47:28.0760942Z .b8 105 2026-02-21T09:47:28.0760996Z .b8 54 2026-02-21T09:47:28.0761044Z .b8 105 2026-02-21T09:47:28.0761091Z .b8 50 2026-02-21T09:47:28.0761137Z .b8 112 2026-02-21T09:47:28.0761193Z .b8 101 2026-02-21T09:47:28.0761241Z .b8 119 2026-02-21T09:47:28.0761288Z .b8 104 2026-02-21T09:47:28.0761336Z .b8 103 2026-02-21T09:47:28.0761392Z .b8 102 2026-02-21T09:47:28.0761440Z .b8 116 2026-02-21T09:47:28.0761508Z .b8 51 2026-02-21T09:47:28.0761563Z .b8 115 2026-02-21T09:47:28.0761613Z .b8 103 2026-02-21T09:47:28.0761659Z .b8 111 2026-02-21T09:47:28.0761706Z .b8 108 2026-02-21T09:47:28.0761762Z .b8 105 2026-02-21T09:47:28.0761809Z .b8 121 2026-02-21T09:47:28.0761857Z .b8 119 
2026-02-21T09:47:28.0761911Z .b8 114 2026-02-21T09:47:28.0761958Z .b8 99 2026-02-21T09:47:28.0762006Z .b8 111 2026-02-21T09:47:28.0762054Z .b8 52 2026-02-21T09:47:28.0762109Z .b8 121 2026-02-21T09:47:28.0762156Z .b8 118 2026-02-21T09:47:28.0762204Z .b8 100 2026-02-21T09:47:28.0762251Z .b8 117 2026-02-21T09:47:28.0762304Z .b8 107 2026-02-21T09:47:28.0762389Z .b8 115 2026-02-21T09:47:28.0762438Z .b8 98 2026-02-21T09:47:28.0762495Z .b8 54 2026-02-21T09:47:28.0762544Z .b8 46 2026-02-21T09:47:28.0762592Z .b8 112 2026-02-21T09:47:28.0762641Z .b8 121 2026-02-21T09:47:28.0762697Z .b8 0 2026-02-21T09:47:28.0762784Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:28.0762859Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:28.0762916Z .b8 116 2026-02-21T09:47:28.0762966Z .b8 109 2026-02-21T09:47:28.0763014Z .b8 112 2026-02-21T09:47:28.0763062Z .b8 47 2026-02-21T09:47:28.0763118Z .b8 116 2026-02-21T09:47:28.0763167Z .b8 111 2026-02-21T09:47:28.0763215Z .b8 114 2026-02-21T09:47:28.0763270Z .b8 99 2026-02-21T09:47:28.0763318Z .b8 104 2026-02-21T09:47:28.0763369Z .b8 105 2026-02-21T09:47:28.0763417Z .b8 110 2026-02-21T09:47:28.0763479Z .b8 100 2026-02-21T09:47:28.0763528Z .b8 117 2026-02-21T09:47:28.0763582Z .b8 99 2026-02-21T09:47:28.0763637Z .b8 116 2026-02-21T09:47:28.0763686Z .b8 111 2026-02-21T09:47:28.0763737Z .b8 114 2026-02-21T09:47:28.0763786Z .b8 95 2026-02-21T09:47:28.0763842Z .b8 114 2026-02-21T09:47:28.0763892Z .b8 111 2026-02-21T09:47:28.0763941Z .b8 111 2026-02-21T09:47:28.0763990Z .b8 116 2026-02-21T09:47:28.0764047Z .b8 47 2026-02-21T09:47:28.0764096Z .b8 121 2026-02-21T09:47:28.0764143Z .b8 55 2026-02-21T09:47:28.0764197Z .b8 0 2026-02-21T09:47:28.0764266Z } 2026-02-21T09:47:28.0764331Z .section .debug_macinfo { } 2026-02-21T09:47:28.0764336Z 2026-02-21T09:47:28.0764414Z ================================================================ 2026-02-21T09:47:28.0764520Z please share the reproducer above with Triton project. 
2026-02-21T09:47:31.9201175Z 2026-02-21T09:47:31.9205626Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 91/91 17.7 configs/s 2026-02-21T09:47:32.3699362Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 1940.7 2026-02-21T09:47:32.3703033Z configs/s 2026-02-21T09:47:32.4427719Z [69s] Generation 2 complete: 2026-02-21T09:47:32.4431622Z error=17 2026-02-21T09:47:32.4435958Z ok=77 2026-02-21T09:47:32.4438069Z min=0.1701 2026-02-21T09:47:32.4438240Z mid=0.4710 2026-02-21T09:47:32.4438366Z max=7.2238 2026-02-21T09:47:32.4438517Z best={'block_sizes': [256, 128, 32], 2026-02-21T09:47:32.4438779Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:47:32.4439028Z 'l2_groupings': [8], 2026-02-21T09:47:32.4439201Z 'load_eviction_policies': ['', ''], 2026-02-21T09:47:32.4439636Z 'loop_orders': [[0, 1]], 2026-02-21T09:47:32.4439803Z 'maxnreg': 64, 2026-02-21T09:47:32.4439951Z 'num_sm_multiplier': 8, 2026-02-21T09:47:32.4440118Z 'num_stages': 2, 2026-02-21T09:47:32.4440258Z 'num_warps': 1, 2026-02-21T09:47:32.4440419Z 'pid_type': 'persistent_blocked', 2026-02-21T09:47:32.4440607Z 'range_flattens': [False, False], 2026-02-21T09:47:32.4440798Z 'range_multi_buffers': [False, False], 2026-02-21T09:47:32.4440984Z 'range_num_stages': [0, 0], 2026-02-21T09:47:32.4441162Z 'range_unroll_factors': [0, 0], 2026-02-21T09:47:32.4441350Z 'range_warp_specializes': [None, True]} 2026-02-21T09:47:32.4447615Z [69s] Fitting surrogate: 291 points, 291 targets 2026-02-21T09:47:33.7287215Z [70s] Generation 3 starting: 84 neighbors, 5 active search path(s) 2026-02-21T09:47:41.2794512Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88/88 6.6 configs/s 2026-02-21T09:47:41.4016272Z [78s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:47:41.4016627Z 2026-02-21T09:47:41.4022186Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 32, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:47:41.4023577Z 2026-02-21T09:47:41.4023789Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:41.4024048Z `ptxas` stderr: 2026-02-21T09:47:41.4024500Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:41.4025112Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:41.4025278Z 2026-02-21T09:47:41.4025691Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpawx1lq_3.ptx -o /tmp/tmpawx1lq_3.ptx.o 2026-02-21T09:47:41.4026137Z 2026-02-21T09:47:41.4026277Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:41.4026567Z ================================================================ 2026-02-21T09:47:41.4026782Z Internal Triton PTX codegen error 2026-02-21T09:47:41.4026962Z `ptxas` stderr: 2026-02-21T09:47:41.4027359Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:41.4027827Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:41.4027973Z 2026-02-21T09:47:41.4028409Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpawx1lq_3.ptx -o /tmp/tmpawx1lq_3.ptx.o 2026-02-21T09:47:41.4028842Z 2026-02-21T09:47:41.4028846Z 2026-02-21T09:47:41.4028902Z // 2026-02-21T09:47:41.4029044Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:41.4029235Z // 2026-02-21T09:47:41.4029300Z 2026-02-21T09:47:41.4029369Z .version 8.7 2026-02-21T09:47:41.4029502Z .target sm_100a 2026-02-21T09:47:41.4029648Z .address_size 64 2026-02-21T09:47:41.4029736Z 2026-02-21T09:47:41.4029856Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:41.4030118Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:41.4030320Z // @_helion_matmul 2026-02-21T09:47:41.4030521Z .visible .entry _helion_matmul( 2026-02-21T09:47:41.4030727Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:41.4030980Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:41.4031226Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:41.4031548Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:41.4031799Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:41.4031996Z ) 2026-02-21T09:47:41.4032121Z .reqntid 256 2026-02-21T09:47:41.4032250Z .maxnreg 32 2026-02-21T09:47:41.4032383Z { 2026-02-21T09:47:41.4032510Z .reg .pred %p<99>; 2026-02-21T09:47:41.4032670Z .reg .b16 %rs<11>; 2026-02-21T09:47:41.4032807Z .reg .b32 %r<378>; 2026-02-21T09:47:41.4032957Z .reg .b64 %rd<153>; 2026-02-21T09:47:41.4033238Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19:0 2026-02-21T09:47:41.4033522Z $L__func_begin0: 2026-02-21T09:47:41.4033773Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19:0 
2026-02-21T09:47:41.4034012Z 2026-02-21T09:47:41.4034065Z // %bb.0: 2026-02-21T09:47:41.4034267Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:47:41.4034452Z $L__tmp0: 2026-02-21T09:47:41.4034721Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19 2026-02-21T09:47:41.4035010Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:41.4035151Z shr.u32 %r2, %r1, 5; 2026-02-21T09:47:41.4035313Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:47:41.4035494Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:47:41.4035649Z @%p1 bra $L__BB0_12; 2026-02-21T09:47:41.4035785Z bra.uni $L__BB0_1; 2026-02-21T09:47:41.4035924Z $L__BB0_12: 2026-02-21T09:47:41.4036160Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0:0 2026-02-21T09:47:41.4036505Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:47:41.4036714Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:47:41.4036997Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19 2026-02-21T09:47:41.4037295Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.4037481Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:47:41.4037650Z mov.b32 %r134, global_smem; 2026-02-21T09:47:41.4037804Z // begin inline asm 2026-02-21T09:47:41.4038050Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r134], 32; 2026-02-21T09:47:41.4038294Z // end inline asm 2026-02-21T09:47:41.4038427Z bar.sync 0, 128; 2026-02-21T09:47:41.4038578Z ld.shared.b32 %r370, [global_smem]; 2026-02-21T09:47:41.4038745Z bar.sync 0, 128; 2026-02-21T09:47:41.4038885Z // begin inline asm 2026-02-21T09:47:41.4039083Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:41.4039307Z // end inline asm 2026-02-21T09:47:41.4039548Z .loc 1 21 67 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:21:67 2026-02-21T09:47:41.4039836Z mov.u32 %r151, %ctaid.x; 2026-02-21T09:47:41.4039987Z mov.u32 %r152, %ctaid.y; 2026-02-21T09:47:41.4040166Z mov.u32 %r153, %ctaid.z; 
2026-02-21T09:47:41.4040321Z mov.u32 %r154, %nctaid.x; 2026-02-21T09:47:41.4040471Z mov.u32 %r155, %nctaid.y; 2026-02-21T09:47:41.4040634Z mad.lo.s32 %r156, %r153, %r155, %r152; 2026-02-21T09:47:41.4040810Z mad.lo.s32 %r157, %r156, %r154, %r151; 2026-02-21T09:47:41.4040980Z shl.b32 %r158, %r157, 8; 2026-02-21T09:47:41.4041126Z cvt.s64.s32 %rd84, %r158; 2026-02-21T09:47:41.4041283Z add.s64 %rd63, %rd6, %rd84; 2026-02-21T09:47:41.4041435Z shl.b32 %r159, %r1, 2; 2026-02-21T09:47:41.4041591Z add.s32 %r135, %r134, %r159; 2026-02-21T09:47:41.4041749Z mov.b32 %r144, 0; 2026-02-21T09:47:41.4041879Z // begin inline asm 2026-02-21T09:47:41.4042040Z @%p34 st.shared.b32 [ %r135 + 0 ], %r144; 2026-02-21T09:47:41.4042209Z // end inline asm 2026-02-21T09:47:41.4042349Z bar.warp.sync -1; 2026-02-21T09:47:41.4042487Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:47:41.4042642Z cvt.u64.u32 %rd48, %r134; 2026-02-21T09:47:41.4042783Z // begin inline asm 2026-02-21T09:47:41.4043032Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd3; 2026-02-21T09:47:41.4043310Z // end inline asm 2026-02-21T09:47:41.4043479Z // begin inline asm 2026-02-21T09:47:41.4043702Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T09:47:41.4043951Z // end inline asm 2026-02-21T09:47:41.4044093Z mov.b32 %r137, 64; 2026-02-21T09:47:41.4044228Z // begin inline asm 2026-02-21T09:47:41.4044470Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.4044767Z // end inline asm 2026-02-21T09:47:41.4044912Z mov.b32 %r138, 128; 2026-02-21T09:47:41.4045064Z // begin inline asm 2026-02-21T09:47:41.4045303Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r138; 2026-02-21T09:47:41.4045585Z // end inline asm 2026-02-21T09:47:41.4045720Z mov.b32 %r139, 2048; 2026-02-21T09:47:41.4045868Z // begin inline asm 2026-02-21T09:47:41.4046155Z @%p37 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.4046454Z // end inline asm 2026-02-21T09:47:41.4046589Z // begin inline asm 2026-02-21T09:47:41.4046844Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r139; 2026-02-21T09:47:41.4047126Z // end inline asm 2026-02-21T09:47:41.4047261Z mov.b64 %rd56, 4096; 2026-02-21T09:47:41.4047409Z // begin inline asm 2026-02-21T09:47:41.4047667Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T09:47:41.4047961Z // end inline asm 2026-02-21T09:47:41.4048093Z mov.b32 %r141, 1; 2026-02-21T09:47:41.4048275Z // begin inline asm 2026-02-21T09:47:41.4048542Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r141; 2026-02-21T09:47:41.4048833Z // end inline asm 2026-02-21T09:47:41.4048973Z // begin inline asm 2026-02-21T09:47:41.4049233Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r141; 2026-02-21T09:47:41.4049529Z // end inline asm 2026-02-21T09:47:41.4049663Z // begin inline asm 2026-02-21T09:47:41.4049909Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T09:47:41.4050175Z // end inline asm 2026-02-21T09:47:41.4050317Z // begin inline asm 2026-02-21T09:47:41.4050580Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.4050905Z // end inline asm 2026-02-21T09:47:41.4051040Z // begin inline asm 2026-02-21T09:47:41.4051287Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T09:47:41.4051567Z // end inline asm 2026-02-21T09:47:41.4051706Z // begin inline asm 2026-02-21T09:47:41.4051943Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.4052206Z // end inline asm 2026-02-21T09:47:41.4052340Z // begin inline asm 
2026-02-21T09:47:41.4052713Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T09:47:41.4053095Z // end inline asm 2026-02-21T09:47:41.4053224Z // begin inline asm 2026-02-21T09:47:41.4053436Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T09:47:41.4053685Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.4053879Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.4054056Z // end inline asm 2026-02-21T09:47:41.4054183Z bar.sync 0, 128; 2026-02-21T09:47:41.4054431Z .loc 1 22 68 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:22:68 2026-02-21T09:47:41.4054756Z add.s64 %rd81, %rd63, 128; 2026-02-21T09:47:41.4054913Z bar.sync 0, 128; 2026-02-21T09:47:41.4055041Z // begin inline asm 2026-02-21T09:47:41.4055192Z @%p34 st.shared.b32 [ %r135 + 0 ], %r144; 2026-02-21T09:47:41.4055363Z // end inline asm 2026-02-21T09:47:41.4055504Z bar.warp.sync -1; 2026-02-21T09:47:41.4055638Z // begin inline asm 2026-02-21T09:47:41.4055880Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd4; 2026-02-21T09:47:41.4056191Z // end inline asm 2026-02-21T09:47:41.4056324Z // begin inline asm 2026-02-21T09:47:41.4056554Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T09:47:41.4056799Z // end inline asm 2026-02-21T09:47:41.4056936Z // begin inline asm 2026-02-21T09:47:41.4057165Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.4057439Z // end inline asm 2026-02-21T09:47:41.4057577Z mov.b32 %r146, 32; 2026-02-21T09:47:41.4057711Z // begin inline asm 2026-02-21T09:47:41.4057943Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r146; 2026-02-21T09:47:41.4058200Z // end inline asm 2026-02-21T09:47:41.4058338Z // begin inline asm 2026-02-21T09:47:41.4058577Z @%p37 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.4058886Z // end inline asm 2026-02-21T09:47:41.4059017Z mov.b32 %r148, 12288; 2026-02-21T09:47:41.4059166Z // begin inline asm 2026-02-21T09:47:41.4059401Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r148; 2026-02-21T09:47:41.4059662Z // end inline asm 2026-02-21T09:47:41.4059797Z // begin inline asm 2026-02-21T09:47:41.4060042Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T09:47:41.4060320Z // end inline asm 2026-02-21T09:47:41.4060448Z // begin inline asm 2026-02-21T09:47:41.4060705Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r141; 2026-02-21T09:47:41.4061023Z // end inline asm 2026-02-21T09:47:41.4061153Z // begin inline asm 2026-02-21T09:47:41.4061408Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r141; 2026-02-21T09:47:41.4061685Z // end inline asm 2026-02-21T09:47:41.4061819Z // begin inline asm 2026-02-21T09:47:41.4062041Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T09:47:41.4062301Z // end inline asm 2026-02-21T09:47:41.4062427Z // begin inline asm 2026-02-21T09:47:41.4062672Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.4062951Z // end inline asm 2026-02-21T09:47:41.4063076Z // begin inline asm 2026-02-21T09:47:41.4063310Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T09:47:41.4063565Z // end inline asm 2026-02-21T09:47:41.4063697Z // begin inline asm 2026-02-21T09:47:41.4063922Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.4064184Z // end inline asm 2026-02-21T09:47:41.4064316Z // begin inline asm 2026-02-21T09:47:41.4064643Z @%p34 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd81 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T09:47:41.4065081Z // end inline asm 2026-02-21T09:47:41.4065211Z // begin inline asm 2026-02-21T09:47:41.4065420Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd81 + 0 ], 0x80; 2026-02-21T09:47:41.4065661Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.4065852Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.4066027Z // end inline asm 2026-02-21T09:47:41.4066154Z bar.sync 0, 128; 2026-02-21T09:47:41.4066402Z .loc 1 29 35 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:29:35 2026-02-21T09:47:41.4066689Z mul.lo.s32 %r377, %r151, 3; 2026-02-21T09:47:41.4066957Z .loc 1 30 37 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:30:37 2026-02-21T09:47:41.4067243Z add.s32 %r160, %r377, 3; 2026-02-21T09:47:41.4067499Z .loc 1 30 49 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:30:49 2026-02-21T09:47:41.4067787Z min.s32 %r22, %r160, 6144; 2026-02-21T09:47:41.4068047Z .loc 1 31 84 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:31:84 2026-02-21T09:47:41.4068393Z setp.ge.s32 %p72, %r377, %r22; 2026-02-21T09:47:41.4068552Z @%p72 bra $L__BB0_15; 2026-02-21T09:47:41.4068718Z // %bb.13: // %.lr.ph 2026-02-21T09:47:41.4069004Z .loc 1 0 84 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0:84 2026-02-21T09:47:41.4069306Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:47:41.4069494Z shl.b32 %r161, %r1, 3; 2026-02-21T09:47:41.4069639Z and.b32 %r23, %r161, 24; 2026-02-21T09:47:41.4069796Z bfe.u32 %r24, %r1, 2, 5; 2026-02-21T09:47:41.4069941Z or.b32 %r25, %r24, 32; 2026-02-21T09:47:41.4070087Z or.b32 %r26, %r24, 64; 2026-02-21T09:47:41.4070223Z or.b32 %r27, %r24, 96; 2026-02-21T09:47:41.4070367Z shl.b32 %r162, %r1, 4; 2026-02-21T09:47:41.4070509Z and.b32 %r163, %r162, 2032; 2026-02-21T09:47:41.4070670Z add.s32 %r28, %r134, %r163; 2026-02-21T09:47:41.4070819Z shl.b32 
%r165, %r1, 6; 2026-02-21T09:47:41.4070992Z and.b32 %r166, %r165, 1536; 2026-02-21T09:47:41.4071152Z and.b32 %r167, %r162, 112; 2026-02-21T09:47:41.4071305Z and.b32 %r169, %r159, 384; 2026-02-21T09:47:41.4071467Z add.s32 %r170, %r134, %r166; 2026-02-21T09:47:41.4071624Z add.s32 %r171, %r170, %r167; 2026-02-21T09:47:41.4071787Z add.s32 %r268, %r171, %r169; 2026-02-21T09:47:41.4071983Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:41.4072310Z .loc 1 37 35 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:37:35 2026-02-21T09:47:41.4072589Z shr.s32 %r300, %r377, 31; 2026-02-21T09:47:41.4072779Z shr.u32 %r301, %r300, 26; 2026-02-21T09:47:41.4072931Z add.s32 %r302, %r377, %r301; 2026-02-21T09:47:41.4073185Z .loc 1 40 64 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:40:64 2026-02-21T09:47:41.4073475Z and.b32 %r303, %r302, 65472; 2026-02-21T09:47:41.4073625Z sub.s32 %r304, %r377, %r303; 2026-02-21T09:47:41.4073783Z cvt.u16.u32 %rs1, %r304; 2026-02-21T09:47:41.4073932Z cvt.s8.s32 %rs2, %r304; 2026-02-21T09:47:41.4074195Z .loc 1 41 51 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:41:51 2026-02-21T09:47:41.4074473Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:41.4074625Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:41.4074804Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:41.4074956Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:41.4075107Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:41.4075360Z .loc 1 40 64 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:40:64 2026-02-21T09:47:41.4075647Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:47:41.4075795Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:41.4076061Z .loc 1 42 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:42:27 2026-02-21T09:47:41.4076340Z shl.b32 %r305, %r302, 1; 2026-02-21T09:47:41.4076521Z and.b32 %r306, %r305, -128; 2026-02-21T09:47:41.4076686Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:41.4076839Z mad.wide.s16 %r307, %rs10, 32, %r306; 2026-02-21T09:47:41.4077122Z .loc 
1 43 32 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:43:32 2026-02-21T09:47:41.4077402Z or.b32 %r308, %r307, %r23; 2026-02-21T09:47:41.4077666Z .loc 1 44 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:44:27 2026-02-21T09:47:41.4077948Z mul.wide.s16 %r309, %rs7, 128; 2026-02-21T09:47:41.4078220Z .loc 1 45 32 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:45:32 2026-02-21T09:47:41.4078508Z or.b32 %r310, %r309, %r24; 2026-02-21T09:47:41.4078656Z or.b32 %r311, %r309, %r25; 2026-02-21T09:47:41.4078810Z or.b32 %r312, %r309, %r26; 2026-02-21T09:47:41.4078952Z or.b32 %r313, %r309, %r27; 2026-02-21T09:47:41.4079209Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4079502Z shfl.sync.idx.b32 %r314, %r2, 0, 31, -1; 2026-02-21T09:47:41.4079686Z shl.b32 %r315, %r314, 21; 2026-02-21T09:47:41.4079866Z and.b32 %r316, %r315, 6291456; 2026-02-21T09:47:41.4080028Z add.s32 %r172, %r316, %r370; 2026-02-21T09:47:41.4080184Z mov.pred %p73, -1; 2026-02-21T09:47:41.4080322Z mov.b32 %r173, 0; 2026-02-21T09:47:41.4080461Z // begin inline asm 2026-02-21T09:47:41.4080824Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r172 + 0], {%r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173}; 2026-02-21T09:47:41.4081212Z // end inline asm 2026-02-21T09:47:41.4081347Z // begin inline asm 2026-02-21T09:47:41.4081701Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r172 + 16], {%r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173, %r173}; 2026-02-21T09:47:41.4082083Z // end inline asm 2026-02-21T09:47:41.4082213Z // begin inline asm 2026-02-21T09:47:41.4082373Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:41.4082562Z // end inline asm 2026-02-21T09:47:41.4082706Z bar.sync 0, 128; 2026-02-21T09:47:41.4082949Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 
2026-02-21T09:47:41.4083237Z add.s32 %r206, %r134, 82048; 2026-02-21T09:47:41.4083384Z // begin inline asm 2026-02-21T09:47:41.4083547Z @%p37 mbarrier.init.shared::cta.b64 [%r206], 1; 2026-02-21T09:47:41.4083736Z // end inline asm 2026-02-21T09:47:41.4083865Z bar.sync 0, 128; 2026-02-21T09:47:41.4084002Z add.s32 %r207, %r134, 82056; 2026-02-21T09:47:41.4084148Z // begin inline asm 2026-02-21T09:47:41.4084340Z @%p37 mbarrier.init.shared::cta.b64 [%r207], 1; 2026-02-21T09:47:41.4084518Z // end inline asm 2026-02-21T09:47:41.4084651Z bar.sync 0, 128; 2026-02-21T09:47:41.4084818Z add.s32 %r208, %r134, 82064; 2026-02-21T09:47:41.4084973Z // begin inline asm 2026-02-21T09:47:41.4085138Z @%p37 mbarrier.init.shared::cta.b64 [%r208], 1; 2026-02-21T09:47:41.4085319Z // end inline asm 2026-02-21T09:47:41.4085452Z bar.sync 0, 128; 2026-02-21T09:47:41.4085583Z add.s32 %r209, %r134, 82072; 2026-02-21T09:47:41.4085738Z // begin inline asm 2026-02-21T09:47:41.4085893Z @%p37 mbarrier.init.shared::cta.b64 [%r209], 1; 2026-02-21T09:47:41.4086074Z // end inline asm 2026-02-21T09:47:41.4086204Z add.s32 %r210, %r134, 82080; 2026-02-21T09:47:41.4086360Z // begin inline asm 2026-02-21T09:47:41.4086515Z @%p37 mbarrier.init.shared::cta.b64 [%r210], 1; 2026-02-21T09:47:41.4086697Z // end inline asm 2026-02-21T09:47:41.4086831Z bar.sync 0, 128; 2026-02-21T09:47:41.4086961Z add.s32 %r211, %r134, 82088; 2026-02-21T09:47:41.4087111Z // begin inline asm 2026-02-21T09:47:41.4087264Z @%p37 mbarrier.init.shared::cta.b64 [%r211], 1; 2026-02-21T09:47:41.4087442Z // end inline asm 2026-02-21T09:47:41.4087565Z bar.sync 0, 128; 2026-02-21T09:47:41.4087700Z add.s32 %r212, %r134, 82096; 2026-02-21T09:47:41.4087846Z // begin inline asm 2026-02-21T09:47:41.4088036Z @%p37 mbarrier.init.shared::cta.b64 [%r212], 1; 2026-02-21T09:47:41.4088220Z // end inline asm 2026-02-21T09:47:41.4088350Z bar.sync 0, 128; 2026-02-21T09:47:41.4088492Z add.s32 %r213, %r134, 82104; 2026-02-21T09:47:41.4088645Z // begin inline asm 
2026-02-21T09:47:41.4088814Z @%p37 mbarrier.init.shared::cta.b64 [%r213], 1; 2026-02-21T09:47:41.4089115Z // end inline asm 2026-02-21T09:47:41.4089373Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4089660Z bar.sync 0, 128; 2026-02-21T09:47:41.4089800Z // begin inline asm 2026-02-21T09:47:41.4089981Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r206]; 2026-02-21T09:47:41.4090182Z // end inline asm 2026-02-21T09:47:41.4090324Z bar.sync 0, 128; 2026-02-21T09:47:41.4090461Z // begin inline asm 2026-02-21T09:47:41.4090639Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r207]; 2026-02-21T09:47:41.4090833Z // end inline asm 2026-02-21T09:47:41.4090988Z bar.sync 0, 128; 2026-02-21T09:47:41.4091124Z // begin inline asm 2026-02-21T09:47:41.4091295Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r208]; 2026-02-21T09:47:41.4091540Z // end inline asm 2026-02-21T09:47:41.4091677Z bar.sync 0, 128; 2026-02-21T09:47:41.4091816Z // begin inline asm 2026-02-21T09:47:41.4091977Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r209]; 2026-02-21T09:47:41.4092173Z // end inline asm 2026-02-21T09:47:41.4092428Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4092729Z bar.sync 0, 128; 2026-02-21T09:47:41.4092865Z add.s32 %r218, %r134, 82112; 2026-02-21T09:47:41.4093024Z // begin inline asm 2026-02-21T09:47:41.4093189Z @%p37 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:47:41.4093384Z // end inline asm 2026-02-21T09:47:41.4093547Z st.shared.b32 [global_smem+82120], 33554689; 2026-02-21T09:47:41.4093752Z st.shared.b32 [global_smem+81920], %r370; 2026-02-21T09:47:41.4093985Z st.shared.v2.b32 [global_smem+81928], {%r309, %r307}; 2026-02-21T09:47:41.4094216Z barrier.sync 1; 2026-02-21T09:47:41.4094390Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.4094579Z barrier.sync 1; 2026-02-21T09:47:41.4094783Z barrier.sync 1; 2026-02-21T09:47:41.4094943Z setmaxnreg.inc.sync.aligned.u32 40; 
2026-02-21T09:47:41.4095256Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4095557Z bar.sync 0, 128; 2026-02-21T09:47:41.4095692Z // begin inline asm 2026-02-21T09:47:41.4095837Z 2026-02-21T09:47:41.4095953Z { 2026-02-21T09:47:41.4096087Z .reg .pred complete; 2026-02-21T09:47:41.4096268Z waitLoop: 2026-02-21T09:47:41.4096469Z mbarrier.try_wait.parity.shared.b64 complete, [%r218], %r173; 2026-02-21T09:47:41.4096725Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.4096881Z } 2026-02-21T09:47:41.4096943Z 2026-02-21T09:47:41.4097001Z // end inline asm 2026-02-21T09:47:41.4097250Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4097532Z bar.sync 0, 128; 2026-02-21T09:47:41.4097661Z // begin inline asm 2026-02-21T09:47:41.4097828Z @%p37 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:47:41.4098006Z // end inline asm 2026-02-21T09:47:41.4098140Z // begin inline asm 2026-02-21T09:47:41.4098294Z @%p37 mbarrier.inval.shared::cta.b64 [%r210]; 2026-02-21T09:47:41.4098482Z // end inline asm 2026-02-21T09:47:41.4098618Z bar.sync 0, 128; 2026-02-21T09:47:41.4098746Z // begin inline asm 2026-02-21T09:47:41.4098909Z @%p37 mbarrier.inval.shared::cta.b64 [%r211]; 2026-02-21T09:47:41.4099085Z // end inline asm 2026-02-21T09:47:41.4099220Z bar.sync 0, 128; 2026-02-21T09:47:41.4099347Z // begin inline asm 2026-02-21T09:47:41.4099511Z @%p37 mbarrier.inval.shared::cta.b64 [%r212]; 2026-02-21T09:47:41.4099688Z // end inline asm 2026-02-21T09:47:41.4099824Z bar.sync 0, 128; 2026-02-21T09:47:41.4099953Z // begin inline asm 2026-02-21T09:47:41.4100141Z @%p37 mbarrier.inval.shared::cta.b64 [%r213]; 2026-02-21T09:47:41.4100325Z // end inline asm 2026-02-21T09:47:41.4100451Z // begin inline asm 2026-02-21T09:47:41.4100612Z @%p37 mbarrier.inval.shared::cta.b64 [%r206]; 2026-02-21T09:47:41.4100786Z // end inline asm 2026-02-21T09:47:41.4100918Z bar.sync 0, 128; 2026-02-21T09:47:41.4101045Z 
// begin inline asm 2026-02-21T09:47:41.4101202Z @%p37 mbarrier.inval.shared::cta.b64 [%r207]; 2026-02-21T09:47:41.4101375Z // end inline asm 2026-02-21T09:47:41.4101509Z bar.sync 0, 128; 2026-02-21T09:47:41.4101635Z // begin inline asm 2026-02-21T09:47:41.4101797Z @%p37 mbarrier.inval.shared::cta.b64 [%r208]; 2026-02-21T09:47:41.4101977Z // end inline asm 2026-02-21T09:47:41.4102102Z bar.sync 0, 128; 2026-02-21T09:47:41.4102238Z // begin inline asm 2026-02-21T09:47:41.4102392Z @%p37 mbarrier.inval.shared::cta.b64 [%r209]; 2026-02-21T09:47:41.4102577Z // end inline asm 2026-02-21T09:47:41.4102814Z .loc 1 59 53 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:59:53 2026-02-21T09:47:41.4103107Z mad.lo.s32 %r318, %r310, 12288, %r308; 2026-02-21T09:47:41.4103313Z mad.lo.s32 %r319, %r311, 12288, %r308; 2026-02-21T09:47:41.4103489Z mad.lo.s32 %r320, %r312, 12288, %r308; 2026-02-21T09:47:41.4103663Z mad.lo.s32 %r321, %r313, 12288, %r308; 2026-02-21T09:47:41.4103932Z .loc 1 59 24 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:59:24 2026-02-21T09:47:41.4104219Z mad.wide.s32 %rd85, %r318, 2, %rd5; 2026-02-21T09:47:41.4104389Z mad.wide.s32 %rd86, %r319, 2, %rd5; 2026-02-21T09:47:41.4104560Z mad.wide.s32 %rd87, %r320, 2, %rd5; 2026-02-21T09:47:41.4104750Z mad.wide.s32 %rd88, %r321, 2, %rd5; 2026-02-21T09:47:41.4105039Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4105323Z // begin inline asm 2026-02-21T09:47:41.4105707Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r230, %r231, %r232, %r233, %r234, %r235, %r236, %r237, %r238, %r239, %r240, %r241, %r242, %r243, %r244, %r245}, [%r172 + 0]; 2026-02-21T09:47:41.4106081Z // end inline asm 2026-02-21T09:47:41.4106212Z // begin inline asm 2026-02-21T09:47:41.4106558Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254, %r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262}, [%r172 + 16]; 2026-02-21T09:47:41.4106930Z // 
end inline asm 2026-02-21T09:47:41.4107067Z // begin inline asm 2026-02-21T09:47:41.4107218Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:41.4107374Z // end inline asm 2026-02-21T09:47:41.4107514Z cvt.u64.u32 %rd89, %r230; 2026-02-21T09:47:41.4107663Z cvt.u64.u32 %rd90, %r231; 2026-02-21T09:47:41.4107854Z shl.b64 %rd91, %rd90, 32; 2026-02-21T09:47:41.4107999Z or.b64 %rd92, %rd89, %rd91; 2026-02-21T09:47:41.4108267Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4108549Z mov.b64 {%r322, %r323}, %rd92; 2026-02-21T09:47:41.4108724Z cvt.rn.f16x2.f32 %r324, %r323, %r322; 2026-02-21T09:47:41.4109004Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4109289Z cvt.u64.u32 %rd93, %r232; 2026-02-21T09:47:41.4109443Z cvt.u64.u32 %rd94, %r233; 2026-02-21T09:47:41.4109588Z shl.b64 %rd95, %rd94, 32; 2026-02-21T09:47:41.4109744Z or.b64 %rd96, %rd93, %rd95; 2026-02-21T09:47:41.4110001Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4110286Z mov.b64 {%r325, %r326}, %rd96; 2026-02-21T09:47:41.4110453Z cvt.rn.f16x2.f32 %r327, %r326, %r325; 2026-02-21T09:47:41.4110725Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4111016Z cvt.u64.u32 %rd97, %r234; 2026-02-21T09:47:41.4111157Z cvt.u64.u32 %rd98, %r235; 2026-02-21T09:47:41.4111304Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:47:41.4111448Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:47:41.4111754Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4112051Z mov.b64 {%r328, %r329}, %rd100; 2026-02-21T09:47:41.4112222Z cvt.rn.f16x2.f32 %r330, %r329, %r328; 2026-02-21T09:47:41.4112502Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4112788Z cvt.u64.u32 %rd101, %r236; 2026-02-21T09:47:41.4112947Z cvt.u64.u32 %rd102, %r237; 
2026-02-21T09:47:41.4113097Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:47:41.4113258Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:47:41.4113521Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4113819Z mov.b64 {%r331, %r332}, %rd104; 2026-02-21T09:47:41.4113985Z cvt.rn.f16x2.f32 %r333, %r332, %r331; 2026-02-21T09:47:41.4114262Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4114550Z cvt.u64.u32 %rd105, %r238; 2026-02-21T09:47:41.4114720Z cvt.u64.u32 %rd106, %r239; 2026-02-21T09:47:41.4114905Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:47:41.4115055Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:47:41.4115319Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4115601Z mov.b64 {%r334, %r335}, %rd108; 2026-02-21T09:47:41.4115767Z cvt.rn.f16x2.f32 %r336, %r335, %r334; 2026-02-21T09:47:41.4116037Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4116310Z cvt.u64.u32 %rd109, %r240; 2026-02-21T09:47:41.4116467Z cvt.u64.u32 %rd110, %r241; 2026-02-21T09:47:41.4116614Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:47:41.4116771Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:47:41.4117030Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4117339Z mov.b64 {%r337, %r338}, %rd112; 2026-02-21T09:47:41.4117508Z cvt.rn.f16x2.f32 %r339, %r338, %r337; 2026-02-21T09:47:41.4117776Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4118054Z cvt.u64.u32 %rd113, %r242; 2026-02-21T09:47:41.4118200Z cvt.u64.u32 %rd114, %r243; 2026-02-21T09:47:41.4118353Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:47:41.4118504Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:47:41.4118766Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4119044Z mov.b64 {%r340, %r341}, %rd116; 
2026-02-21T09:47:41.4119241Z cvt.rn.f16x2.f32 %r342, %r341, %r340; 2026-02-21T09:47:41.4119519Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4119799Z cvt.u64.u32 %rd117, %r244; 2026-02-21T09:47:41.4119956Z cvt.u64.u32 %rd118, %r245; 2026-02-21T09:47:41.4120105Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:47:41.4120269Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:47:41.4120528Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4120815Z mov.b64 {%r343, %r344}, %rd120; 2026-02-21T09:47:41.4120986Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T09:47:41.4121255Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4121540Z cvt.u64.u32 %rd121, %r247; 2026-02-21T09:47:41.4121688Z cvt.u64.u32 %rd122, %r248; 2026-02-21T09:47:41.4121838Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:47:41.4121989Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:47:41.4122255Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4122533Z mov.b64 {%r346, %r347}, %rd124; 2026-02-21T09:47:41.4122700Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T09:47:41.4123006Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4123284Z cvt.u64.u32 %rd125, %r249; 2026-02-21T09:47:41.4123440Z cvt.u64.u32 %rd126, %r250; 2026-02-21T09:47:41.4123586Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:47:41.4123745Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:47:41.4124001Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4124283Z mov.b64 {%r349, %r350}, %rd128; 2026-02-21T09:47:41.4124451Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T09:47:41.4124755Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4125043Z cvt.u64.u32 %rd129, %r251; 2026-02-21T09:47:41.4125191Z cvt.u64.u32 %rd130, %r252; 
2026-02-21T09:47:41.4125345Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:47:41.4125494Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:47:41.4125766Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4126048Z mov.b64 {%r352, %r353}, %rd132; 2026-02-21T09:47:41.4126243Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T09:47:41.4126520Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4126795Z cvt.u64.u32 %rd133, %r253; 2026-02-21T09:47:41.4126947Z cvt.u64.u32 %rd134, %r254; 2026-02-21T09:47:41.4127091Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:47:41.4127245Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:47:41.4127504Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4127790Z mov.b64 {%r355, %r356}, %rd136; 2026-02-21T09:47:41.4127956Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T09:47:41.4128226Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4128513Z cvt.u64.u32 %rd137, %r255; 2026-02-21T09:47:41.4128687Z cvt.u64.u32 %rd138, %r256; 2026-02-21T09:47:41.4128842Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:47:41.4128994Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:47:41.4129261Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4129549Z mov.b64 {%r358, %r359}, %rd140; 2026-02-21T09:47:41.4129715Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T09:47:41.4129993Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4130269Z cvt.u64.u32 %rd141, %r257; 2026-02-21T09:47:41.4130457Z cvt.u64.u32 %rd142, %r258; 2026-02-21T09:47:41.4130608Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:47:41.4130769Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:47:41.4131028Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4131332Z mov.b64 {%r361, %r362}, %rd144; 
2026-02-21T09:47:41.4131519Z cvt.rn.f16x2.f32 %r363, %r362, %r361; 2026-02-21T09:47:41.4131813Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4132116Z cvt.u64.u32 %rd145, %r259; 2026-02-21T09:47:41.4132275Z cvt.u64.u32 %rd146, %r260; 2026-02-21T09:47:41.4132439Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:47:41.4132600Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:47:41.4132876Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4133172Z mov.b64 {%r364, %r365}, %rd148; 2026-02-21T09:47:41.4133349Z cvt.rn.f16x2.f32 %r366, %r365, %r364; 2026-02-21T09:47:41.4133640Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4133931Z cvt.u64.u32 %rd149, %r261; 2026-02-21T09:47:41.4134098Z cvt.u64.u32 %rd150, %r262; 2026-02-21T09:47:41.4134256Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:47:41.4134456Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:47:41.4134754Z .loc 1 58 27 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:58:27 2026-02-21T09:47:41.4135055Z mov.b64 {%r367, %r368}, %rd152; 2026-02-21T09:47:41.4135229Z cvt.rn.f16x2.f32 %r369, %r368, %r367; 2026-02-21T09:47:41.4135507Z .loc 1 59 83 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:59:83 2026-02-21T09:47:41.4135844Z st.shared.v4.b32 [%r28], {%r324, %r336, %r348, %r360}; 2026-02-21T09:47:41.4136052Z bar.sync 0, 128; 2026-02-21T09:47:41.4136199Z // begin inline asm 2026-02-21T09:47:41.4136436Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r264, %r265, %r266, %r267}, [%r268]; 2026-02-21T09:47:41.4136717Z // end inline asm 2026-02-21T09:47:41.4136854Z bar.sync 0, 128; 2026-02-21T09:47:41.4137034Z st.shared.v4.b32 [%r28], {%r327, %r339, %r351, %r363}; 2026-02-21T09:47:41.4137239Z bar.sync 0, 128; 2026-02-21T09:47:41.4137377Z // begin inline asm 2026-02-21T09:47:41.4137619Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r269, %r270, %r271, %r272}, [%r268]; 
2026-02-21T09:47:41.4137905Z // end inline asm 2026-02-21T09:47:41.4138042Z bar.sync 0, 128; 2026-02-21T09:47:41.4138208Z st.shared.v4.b32 [%r28], {%r330, %r342, %r354, %r366}; 2026-02-21T09:47:41.4138410Z bar.sync 0, 128; 2026-02-21T09:47:41.4138541Z // begin inline asm 2026-02-21T09:47:41.4138780Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r274, %r275, %r276, %r277}, [%r268]; 2026-02-21T09:47:41.4139051Z // end inline asm 2026-02-21T09:47:41.4139177Z bar.sync 0, 128; 2026-02-21T09:47:41.4139341Z st.shared.v4.b32 [%r28], {%r333, %r345, %r357, %r369}; 2026-02-21T09:47:41.4139527Z bar.sync 0, 128; 2026-02-21T09:47:41.4139662Z // begin inline asm 2026-02-21T09:47:41.4139874Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r279, %r280, %r281, %r282}, [%r268]; 2026-02-21T09:47:41.4140129Z // end inline asm 2026-02-21T09:47:41.4140256Z // begin inline asm 2026-02-21T09:47:41.4140468Z st.global.v4.b32 [ %rd85 + 0 ], { %r264, %r269, %r274, %r279 }; 2026-02-21T09:47:41.4140682Z // end inline asm 2026-02-21T09:47:41.4140811Z // begin inline asm 2026-02-21T09:47:41.4140991Z st.global.v4.b32 [ %rd86 + 0 ], { %r265, %r270, %r275, %r280 }; 2026-02-21T09:47:41.4141189Z // end inline asm 2026-02-21T09:47:41.4141324Z // begin inline asm 2026-02-21T09:47:41.4141495Z st.global.v4.b32 [ %rd87 + 0 ], { %r266, %r271, %r276, %r281 }; 2026-02-21T09:47:41.4141702Z // end inline asm 2026-02-21T09:47:41.4141833Z // begin inline asm 2026-02-21T09:47:41.4142011Z st.global.v4.b32 [ %rd88 + 0 ], { %r267, %r272, %r277, %r282 }; 2026-02-21T09:47:41.4142207Z // end inline asm 2026-02-21T09:47:41.4142486Z .loc 1 31 84 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:31:84 2026-02-21T09:47:41.4142771Z add.s32 %r377, %r377, 1; 2026-02-21T09:47:41.4142924Z setp.ne.b32 %p97, %r22, %r377; 2026-02-21T09:47:41.4143087Z @%p97 bra $L__BB0_14; 2026-02-21T09:47:41.4143255Z $L__BB0_15: // %._crit_edge 2026-02-21T09:47:41.4143556Z .loc 1 31 4 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:31:4 
2026-02-21T09:47:41.4143830Z bar.sync 0, 128; 2026-02-21T09:47:41.4143965Z // begin inline asm 2026-02-21T09:47:41.4144165Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r370, 32; 2026-02-21T09:47:41.4144379Z // end inline asm 2026-02-21T09:47:41.4144534Z st.shared.b32 [global_smem+82120], 50529027; 2026-02-21T09:47:41.4144737Z barrier.sync 1; 2026-02-21T09:47:41.4144901Z $L__BB0_16: // %common.ret 2026-02-21T09:47:41.4145188Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4145473Z ret; 2026-02-21T09:47:41.4145629Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:47:41.4145839Z mov.b32 %r33, global_smem; 2026-02-21T09:47:41.4146003Z add.s32 %r34, %r33, %r3; 2026-02-21T09:47:41.4146181Z add.s32 %r66, %r33, 82080; 2026-02-21T09:47:41.4146343Z bfe.u32 %r80, %r33, 4, 14; 2026-02-21T09:47:41.4146494Z cvt.u64.u32 %rd22, %r80; 2026-02-21T09:47:41.4146665Z or.b64 %rd12, %rd22, 4611686293372403712; 2026-02-21T09:47:41.4146835Z add.s32 %r81, %r33, 65536; 2026-02-21T09:47:41.4146984Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T09:47:41.4147128Z cvt.u64.u32 %rd23, %r82; 2026-02-21T09:47:41.4147290Z or.b64 %rd13, %rd23, 4611686293322072064; 2026-02-21T09:47:41.4147458Z add.s32 %r83, %r33, 32; 2026-02-21T09:47:41.4147608Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T09:47:41.4147758Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.4147933Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4148255Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4148551Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4148731Z barrier.sync 1; 2026-02-21T09:47:41.4148861Z barrier.sync 1; 2026-02-21T09:47:41.4149021Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4149213Z $L__BB0_2: // %.preheader 2026-02-21T09:47:41.4149460Z // =>This Loop Header: Depth=1 2026-02-21T09:47:41.4149684Z // Child Loop BB0_9 Depth 2 2026-02-21T09:47:41.4149900Z // Child Loop BB0_6 Depth 2 
2026-02-21T09:47:41.4150200Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19 2026-02-21T09:47:41.4150490Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:47:41.4150669Z barrier.sync 1; 2026-02-21T09:47:41.4150811Z ld.shared.b8 %r32, [%r34+82116]; 2026-02-21T09:47:41.4150985Z setp.gt.u32 %p2, %r32, 3; 2026-02-21T09:47:41.4151144Z @%p2 bra $L__BB0_4; 2026-02-21T09:47:41.4151304Z // %bb.3: // %.preheader 2026-02-21T09:47:41.4151527Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4151753Z $L_brx_0: .branchtargets 2026-02-21T09:47:41.4151916Z $L__BB0_5, 2026-02-21T09:47:41.4152046Z $L__BB0_8, 2026-02-21T09:47:41.4152179Z $L__BB0_11, 2026-02-21T09:47:41.4152304Z $L__BB0_16; 2026-02-21T09:47:41.4152443Z brx.idx %r32, $L_brx_0; 2026-02-21T09:47:41.4152618Z $L__BB0_5: // %.peel.next 2026-02-21T09:47:41.4152832Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4153154Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4153462Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4153711Z ld.shared.b32 %r68, [global_smem+81920]; 2026-02-21T09:47:41.4153887Z barrier.sync 1; 2026-02-21T09:47:41.4154135Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4154424Z bar.warp.sync -1; 2026-02-21T09:47:41.4154566Z mov.b32 %r371, 0; 2026-02-21T09:47:41.4154730Z // begin inline asm 2026-02-21T09:47:41.4154864Z 2026-02-21T09:47:41.4154982Z { 2026-02-21T09:47:41.4155103Z .reg .pred complete; 2026-02-21T09:47:41.4155248Z waitLoop: 2026-02-21T09:47:41.4155428Z mbarrier.try_wait.parity.shared.b64 complete, [%r66], %r371; 2026-02-21T09:47:41.4155659Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.4155809Z } 2026-02-21T09:47:41.4155879Z 2026-02-21T09:47:41.4155933Z // end inline asm 2026-02-21T09:47:41.4156185Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4156469Z elect.sync %r79|%p12, -1; 
2026-02-21T09:47:41.4156633Z mov.b32 %r69, 134742032; 2026-02-21T09:47:41.4156778Z mov.pred %p11, 0; 2026-02-21T09:47:41.4156917Z // begin inline asm 2026-02-21T09:47:41.4157137Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd12, %rd13, %r69, %p11; 2026-02-21T09:47:41.4157385Z // end inline asm 2026-02-21T09:47:41.4157519Z cvt.u64.u32 %rd24, %r84; 2026-02-21T09:47:41.4157712Z or.b64 %rd14, %rd24, 4611686293372403712; 2026-02-21T09:47:41.4157892Z add.s32 %r85, %r33, 65568; 2026-02-21T09:47:41.4158047Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T09:47:41.4158205Z cvt.u64.u32 %rd25, %r86; 2026-02-21T09:47:41.4158363Z or.b64 %rd15, %rd25, 4611686293322072064; 2026-02-21T09:47:41.4158545Z mov.pred %p13, -1; 2026-02-21T09:47:41.4158685Z // begin inline asm 2026-02-21T09:47:41.4158905Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd14, %rd15, %r69, %p13; 2026-02-21T09:47:41.4159144Z // end inline asm 2026-02-21T09:47:41.4159290Z add.s32 %r87, %r33, 64; 2026-02-21T09:47:41.4159437Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T09:47:41.4159598Z cvt.u64.u32 %rd26, %r88; 2026-02-21T09:47:41.4159763Z or.b64 %rd16, %rd26, 4611686293372403712; 2026-02-21T09:47:41.4159934Z add.s32 %r89, %r33, 65600; 2026-02-21T09:47:41.4160093Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T09:47:41.4160242Z cvt.u64.u32 %rd27, %r90; 2026-02-21T09:47:41.4160408Z or.b64 %rd17, %rd27, 4611686293322072064; 2026-02-21T09:47:41.4160581Z // begin inline asm 2026-02-21T09:47:41.4160802Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd16, %rd17, %r69, %p13; 2026-02-21T09:47:41.4161069Z // end inline asm 2026-02-21T09:47:41.4161209Z add.s32 %r91, %r33, 96; 2026-02-21T09:47:41.4161359Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T09:47:41.4161505Z cvt.u64.u32 %rd28, %r92; 2026-02-21T09:47:41.4161664Z or.b64 %rd18, %rd28, 4611686293372403712; 2026-02-21T09:47:41.4161831Z add.s32 %r93, %r33, 65632; 2026-02-21T09:47:41.4161984Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T09:47:41.4162132Z cvt.u64.u32 %rd29, %r94; 
2026-02-21T09:47:41.4162292Z or.b64 %rd19, %rd29, 4611686293322072064; 2026-02-21T09:47:41.4162457Z // begin inline asm 2026-02-21T09:47:41.4162672Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd18, %rd19, %r69, %p13; 2026-02-21T09:47:41.4162915Z // end inline asm 2026-02-21T09:47:41.4163049Z add.s32 %r95, %r33, 82048; 2026-02-21T09:47:41.4163206Z cvt.u64.u32 %rd20, %r95; 2026-02-21T09:47:41.4163375Z // begin inline asm 2026-02-21T09:47:41.4163586Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:47:41.4163811Z // end inline asm 2026-02-21T09:47:41.4163950Z add.s32 %r96, %r33, 82112; 2026-02-21T09:47:41.4164096Z cvt.u64.u32 %rd21, %r96; 2026-02-21T09:47:41.4164243Z // begin inline asm 2026-02-21T09:47:41.4164445Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:47:41.4164701Z // end inline asm 2026-02-21T09:47:41.4164840Z mov.b32 %r373, 1; 2026-02-21T09:47:41.4164972Z mov.b32 %r372, %r371; 2026-02-21T09:47:41.4165188Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.4165426Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.4165752Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4166051Z shl.b32 %r107, %r373, 3; 2026-02-21T09:47:41.4166201Z add.s32 %r109, %r33, %r107; 2026-02-21T09:47:41.4166366Z add.s32 %r110, %r109, 82048; 2026-02-21T09:47:41.4166523Z add.s32 %r97, %r109, 82080; 2026-02-21T09:47:41.4166792Z .loc 1 54 31 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:54:31 2026-02-21T09:47:41.4167079Z shl.b32 %r111, %r373, 14; 2026-02-21T09:47:41.4167239Z add.s32 %r112, %r33, %r111; 2026-02-21T09:47:41.4167496Z .loc 1 55 44 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:55:44 2026-02-21T09:47:41.4167787Z shl.b32 %r113, %r373, 12; 2026-02-21T09:47:41.4167946Z add.s32 %r114, %r33, %r113; 2026-02-21T09:47:41.4168104Z add.s32 %r115, %r114, 65536; 2026-02-21T09:47:41.4168365Z .loc 1 0 0 // 
c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4168640Z bar.warp.sync -1; 2026-02-21T09:47:41.4168789Z // begin inline asm 2026-02-21T09:47:41.4168925Z 2026-02-21T09:47:41.4169071Z { 2026-02-21T09:47:41.4169192Z .reg .pred complete; 2026-02-21T09:47:41.4169340Z waitLoop: 2026-02-21T09:47:41.4169517Z mbarrier.try_wait.parity.shared.b64 complete, [%r97], %r372; 2026-02-21T09:47:41.4169747Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.4169897Z } 2026-02-21T09:47:41.4169960Z 2026-02-21T09:47:41.4170014Z // end inline asm 2026-02-21T09:47:41.4170264Z .loc 1 56 52 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:56:52 2026-02-21T09:47:41.4170560Z setp.eq.b32 %p31, %r371, 1920; 2026-02-21T09:47:41.4170729Z elect.sync %r116|%p22, -1; 2026-02-21T09:47:41.4170883Z bfe.u32 %r117, %r112, 4, 14; 2026-02-21T09:47:41.4171042Z cvt.u64.u32 %rd40, %r117; 2026-02-21T09:47:41.4171198Z or.b64 %rd30, %rd40, 4611686293372403712; 2026-02-21T09:47:41.4171379Z bfe.u32 %r118, %r115, 4, 14; 2026-02-21T09:47:41.4171533Z cvt.u64.u32 %rd41, %r118; 2026-02-21T09:47:41.4171687Z or.b64 %rd31, %rd41, 4611686293322072064; 2026-02-21T09:47:41.4171864Z // begin inline asm 2026-02-21T09:47:41.4172073Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd30, %rd31, %r69, %p13; 2026-02-21T09:47:41.4172343Z // end inline asm 2026-02-21T09:47:41.4172474Z add.s32 %r119, %r112, 32; 2026-02-21T09:47:41.4172626Z bfe.u32 %r120, %r119, 4, 14; 2026-02-21T09:47:41.4172774Z cvt.u64.u32 %rd42, %r120; 2026-02-21T09:47:41.4172933Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T09:47:41.4173108Z add.s32 %r121, %r114, 65568; 2026-02-21T09:47:41.4173254Z bfe.u32 %r122, %r121, 4, 14; 2026-02-21T09:47:41.4173407Z cvt.u64.u32 %rd43, %r122; 2026-02-21T09:47:41.4173558Z or.b64 %rd33, %rd43, 4611686293322072064; 2026-02-21T09:47:41.4173732Z // begin inline asm 2026-02-21T09:47:41.4173942Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd32, %rd33, %r69, %p13; 
2026-02-21T09:47:41.4174185Z // end inline asm 2026-02-21T09:47:41.4174315Z add.s32 %r123, %r112, 64; 2026-02-21T09:47:41.4174464Z bfe.u32 %r124, %r123, 4, 14; 2026-02-21T09:47:41.4174618Z cvt.u64.u32 %rd44, %r124; 2026-02-21T09:47:41.4174823Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T09:47:41.4175007Z add.s32 %r125, %r114, 65600; 2026-02-21T09:47:41.4175158Z bfe.u32 %r126, %r125, 4, 14; 2026-02-21T09:47:41.4175315Z cvt.u64.u32 %rd45, %r126; 2026-02-21T09:47:41.4175471Z or.b64 %rd35, %rd45, 4611686293322072064; 2026-02-21T09:47:41.4175648Z // begin inline asm 2026-02-21T09:47:41.4175860Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd34, %rd35, %r69, %p13; 2026-02-21T09:47:41.4176109Z // end inline asm 2026-02-21T09:47:41.4176249Z add.s32 %r127, %r112, 96; 2026-02-21T09:47:41.4176397Z bfe.u32 %r128, %r127, 4, 14; 2026-02-21T09:47:41.4176599Z cvt.u64.u32 %rd46, %r128; 2026-02-21T09:47:41.4176757Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T09:47:41.4176935Z add.s32 %r129, %r114, 65632; 2026-02-21T09:47:41.4177086Z bfe.u32 %r130, %r129, 4, 14; 2026-02-21T09:47:41.4177245Z cvt.u64.u32 %rd47, %r130; 2026-02-21T09:47:41.4177404Z or.b64 %rd37, %rd47, 4611686293322072064; 2026-02-21T09:47:41.4177582Z // begin inline asm 2026-02-21T09:47:41.4177802Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd36, %rd37, %r69, %p13; 2026-02-21T09:47:41.4178047Z // end inline asm 2026-02-21T09:47:41.4178191Z cvt.u64.u32 %rd38, %r110; 2026-02-21T09:47:41.4178338Z // begin inline asm 2026-02-21T09:47:41.4178553Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd38]; 2026-02-21T09:47:41.4178789Z // end inline asm 2026-02-21T09:47:41.4178947Z and.pred %p30, %p31, %p22; 2026-02-21T09:47:41.4179107Z // begin inline asm 2026-02-21T09:47:41.4179326Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:47:41.4179564Z // end inline asm 2026-02-21T09:47:41.4179806Z .loc 1 0 0 // 
c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4180099Z add.s32 %r132, %r373, 1; 2026-02-21T09:47:41.4180253Z setp.eq.b32 %p32, %r132, 4; 2026-02-21T09:47:41.4180450Z selp.b32 %r373, 0, %r132, %p32; 2026-02-21T09:47:41.4180625Z selp.b32 %r133, 1, 0, %p32; 2026-02-21T09:47:41.4180792Z xor.b32 %r372, %r372, %r133; 2026-02-21T09:47:41.4181062Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4181362Z add.s32 %r371, %r371, 64; 2026-02-21T09:47:41.4181524Z setp.lt.u32 %p33, %r371, 1984; 2026-02-21T09:47:41.4181688Z @%p33 bra $L__BB0_6; 2026-02-21T09:47:41.4181862Z // %bb.7: // %.loopexit 2026-02-21T09:47:41.4182082Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4182296Z barrier.sync 1; 2026-02-21T09:47:41.4182455Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4182646Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.4182828Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4183170Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4183480Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4183721Z ld.shared.v2.b32 {%r50, %r54}, [global_smem+81928]; 2026-02-21T09:47:41.4183919Z barrier.sync 1; 2026-02-21T09:47:41.4184159Z .loc 1 21 67 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:21:67 2026-02-21T09:47:41.4184451Z mov.u32 %r37, %ctaid.x; 2026-02-21T09:47:41.4184596Z mov.u32 %r38, %ctaid.y; 2026-02-21T09:47:41.4184766Z mov.u32 %r39, %ctaid.z; 2026-02-21T09:47:41.4184918Z mov.u32 %r40, %nctaid.x; 2026-02-21T09:47:41.4185066Z mov.u32 %r41, %nctaid.y; 2026-02-21T09:47:41.4185223Z mad.lo.s32 %r42, %r39, %r41, %r38; 2026-02-21T09:47:41.4185393Z mad.lo.s32 %r43, %r42, %r40, %r37; 2026-02-21T09:47:41.4185562Z shl.b32 %r44, %r43, 8; 2026-02-21T09:47:41.4185819Z .loc 1 22 68 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:22:68 2026-02-21T09:47:41.4186113Z cvt.s64.s32 %rd7, %r44; 
2026-02-21T09:47:41.4186303Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:47:41.4186458Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:47:41.4186615Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:47:41.4186878Z .loc 1 21 67 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:21:67 2026-02-21T09:47:41.4187170Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:47:41.4187325Z add.s32 %r13, %r1, -128; 2026-02-21T09:47:41.4187474Z mov.b32 %r375, 0; 2026-02-21T09:47:41.4187608Z mov.b32 %r374, -64; 2026-02-21T09:47:41.4187754Z mov.b32 %r376, %r375; 2026-02-21T09:47:41.4187936Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.4188211Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.4188525Z .loc 1 0 67 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0:67 2026-02-21T09:47:41.4188807Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:47:41.4188970Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:47:41.4189226Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4189520Z add.s32 %r374, %r374, 64; 2026-02-21T09:47:41.4189668Z shl.b32 %r56, %r376, 3; 2026-02-21T09:47:41.4189820Z add.s32 %r58, %r33, %r56; 2026-02-21T09:47:41.4189975Z add.s32 %r45, %r58, 82048; 2026-02-21T09:47:41.4190224Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4190497Z // begin inline asm 2026-02-21T09:47:41.4190627Z 2026-02-21T09:47:41.4190743Z { 2026-02-21T09:47:41.4190859Z .reg .pred complete; 2026-02-21T09:47:41.4191004Z waitLoop: 2026-02-21T09:47:41.4191185Z mbarrier.try_wait.parity.shared.b64 complete, [%r45], %r375; 2026-02-21T09:47:41.4191418Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.4191565Z } 2026-02-21T09:47:41.4191635Z 2026-02-21T09:47:41.4191690Z // end inline asm 2026-02-21T09:47:41.4191964Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4192248Z add.s32 %r51, %r58, 82080; 2026-02-21T09:47:41.4192502Z .loc 1 0 0 // 
c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4192765Z bar.sync 3, 64; 2026-02-21T09:47:41.4192903Z // begin inline asm 2026-02-21T09:47:41.4193087Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r51], 20480; 2026-02-21T09:47:41.4193301Z // end inline asm 2026-02-21T09:47:41.4193546Z .loc 1 54 31 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:54:31 2026-02-21T09:47:41.4193827Z shl.b32 %r59, %r376, 14; 2026-02-21T09:47:41.4193977Z add.s32 %r48, %r33, %r59; 2026-02-21T09:47:41.4194218Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4194486Z bar.sync 3, 64; 2026-02-21T09:47:41.4194620Z elect.sync %r60|%p7, -1; 2026-02-21T09:47:41.4194800Z and.pred %p4, %p6, %p7; 2026-02-21T09:47:41.4194949Z // begin inline asm 2026-02-21T09:47:41.4195278Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r48], [%rd10, {%r374, %r50}], [%r51]; 2026-02-21T09:47:41.4195657Z // end inline asm 2026-02-21T09:47:41.4195896Z .loc 1 55 44 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:55:44 2026-02-21T09:47:41.4196175Z shl.b32 %r61, %r376, 12; 2026-02-21T09:47:41.4196317Z add.s32 %r62, %r33, %r61; 2026-02-21T09:47:41.4196472Z add.s32 %r52, %r62, 65536; 2026-02-21T09:47:41.4196713Z .loc 1 0 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:0 2026-02-21T09:47:41.4196993Z bar.sync 3, 64; 2026-02-21T09:47:41.4197135Z elect.sync %r63|%p8, -1; 2026-02-21T09:47:41.4197285Z and.pred %p5, %p6, %p8; 2026-02-21T09:47:41.4197435Z // begin inline asm 2026-02-21T09:47:41.4197741Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r52], [%rd11, {%r374, %r54}], [%r51]; 2026-02-21T09:47:41.4198081Z // end inline asm 2026-02-21T09:47:41.4198257Z add.s32 %r64, %r376, 1; 2026-02-21T09:47:41.4198410Z setp.eq.b32 %p9, %r64, 4; 2026-02-21T09:47:41.4198566Z selp.b32 %r376, 0, %r64, %p9; 2026-02-21T09:47:41.4198730Z selp.b32 %r65, 1, 0, %p9; 
2026-02-21T09:47:41.4198888Z xor.b32 %r375, %r375, %r65; 2026-02-21T09:47:41.4199147Z .loc 1 50 57 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:50:57 2026-02-21T09:47:41.4199444Z setp.lt.u32 %p10, %r374, 1984; 2026-02-21T09:47:41.4199603Z @%p10 bra $L__BB0_9; 2026-02-21T09:47:41.4199797Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4200043Z barrier.sync 1; 2026-02-21T09:47:41.4200208Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.4200382Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.4200563Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.4200881Z .loc 1 19 0 // c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py:19 2026-02-21T09:47:41.4201150Z barrier.sync 1; 2026-02-21T09:47:41.4201288Z barrier.sync 1; 2026-02-21T09:47:41.4201419Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.4201558Z $L__tmp1: 2026-02-21T09:47:41.4201678Z $L__func_end0: 2026-02-21T09:47:41.4201832Z // -- End function 2026-02-21T09:47:41.4202009Z } 2026-02-21T09:47:41.4202269Z .file 1 "/tmp/torchinductor_root/5q/c5qfkvmvzhvqzjvccg2ezoethsqbmj6b6pny6h6seebjp2hucvle.py" 2026-02-21T09:47:41.4202587Z .section .debug_abbrev 2026-02-21T09:47:41.4202725Z { 2026-02-21T09:47:41.4202878Z .b8 1 // Abbreviation Code 2026-02-21T09:47:41.4203098Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:41.4203314Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:41.4203515Z .b8 37 // DW_AT_producer 2026-02-21T09:47:41.4203749Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.4203945Z .b8 19 // DW_AT_language 2026-02-21T09:47:41.4204149Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:41.4204350Z .b8 3 // DW_AT_name 2026-02-21T09:47:41.4204539Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.4204764Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:41.4204958Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:41.4205160Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:41.4205353Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.4205547Z .b8 0 // EOM(1) 2026-02-21T09:47:41.4205735Z .b8 0 // EOM(2) 
2026-02-21T09:47:41.4205913Z .b8 0 // EOM(3) 2026-02-21T09:47:41.4206082Z } 2026-02-21T09:47:41.4206201Z .section .debug_info 2026-02-21T09:47:41.4206341Z { 2026-02-21T09:47:41.4206478Z .b32 104 // Length of Unit 2026-02-21T09:47:41.4206732Z .b8 2 // DWARF version number 2026-02-21T09:47:41.4206913Z .b8 0 2026-02-21T09:47:41.4207092Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:47:41.4207342Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:41.4207566Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:41.4207800Z .b8 116 // DW_AT_producer 2026-02-21T09:47:41.4207981Z .b8 114 2026-02-21T09:47:41.4208104Z .b8 105 2026-02-21T09:47:41.4208214Z .b8 116 2026-02-21T09:47:41.4208331Z .b8 111 2026-02-21T09:47:41.4208439Z .b8 110 2026-02-21T09:47:41.4208558Z .b8 0 2026-02-21T09:47:41.4208700Z .b8 2 // DW_AT_language 2026-02-21T09:47:41.4208878Z .b8 0 2026-02-21T09:47:41.4209057Z .b8 99 // DW_AT_name 2026-02-21T09:47:41.4209243Z .b8 53 2026-02-21T09:47:41.4209370Z .b8 113 2026-02-21T09:47:41.4209486Z .b8 102 2026-02-21T09:47:41.4209606Z .b8 107 2026-02-21T09:47:41.4209717Z .b8 118 2026-02-21T09:47:41.4209837Z .b8 109 2026-02-21T09:47:41.4209951Z .b8 118 2026-02-21T09:47:41.4210071Z .b8 122 2026-02-21T09:47:41.4210184Z .b8 104 2026-02-21T09:47:41.4210306Z .b8 118 2026-02-21T09:47:41.4210418Z .b8 113 2026-02-21T09:47:41.4210537Z .b8 122 2026-02-21T09:47:41.4210657Z .b8 106 2026-02-21T09:47:41.4210771Z .b8 118 2026-02-21T09:47:41.4210889Z .b8 99 2026-02-21T09:47:41.4211004Z .b8 99 2026-02-21T09:47:41.4211151Z .b8 103 2026-02-21T09:47:41.4211259Z .b8 50 2026-02-21T09:47:41.4211375Z .b8 101 2026-02-21T09:47:41.4211482Z .b8 122 2026-02-21T09:47:41.4211594Z .b8 111 2026-02-21T09:47:41.4211701Z .b8 101 2026-02-21T09:47:41.4211815Z .b8 116 2026-02-21T09:47:41.4211922Z .b8 104 2026-02-21T09:47:41.4212036Z .b8 115 2026-02-21T09:47:41.4212143Z .b8 113 2026-02-21T09:47:41.4212257Z .b8 98 2026-02-21T09:47:41.4212371Z .b8 109 2026-02-21T09:47:41.4212479Z 
.b8 106 2026-02-21T09:47:41.4212592Z .b8 54 2026-02-21T09:47:41.4212699Z .b8 98 2026-02-21T09:47:41.4212812Z .b8 54 2026-02-21T09:47:41.4212918Z .b8 112 2026-02-21T09:47:41.4213033Z .b8 110 2026-02-21T09:47:41.4213139Z .b8 121 2026-02-21T09:47:41.4213251Z .b8 54 2026-02-21T09:47:41.4213357Z .b8 104 2026-02-21T09:47:41.4213468Z .b8 54 2026-02-21T09:47:41.4213575Z .b8 115 2026-02-21T09:47:41.4213691Z .b8 101 2026-02-21T09:47:41.4213798Z .b8 101 2026-02-21T09:47:41.4213914Z .b8 98 2026-02-21T09:47:41.4214021Z .b8 106 2026-02-21T09:47:41.4214136Z .b8 112 2026-02-21T09:47:41.4214251Z .b8 50 2026-02-21T09:47:41.4214359Z .b8 104 2026-02-21T09:47:41.4214472Z .b8 117 2026-02-21T09:47:41.4214579Z .b8 99 2026-02-21T09:47:41.4214720Z .b8 118 2026-02-21T09:47:41.4214832Z .b8 108 2026-02-21T09:47:41.4214953Z .b8 101 2026-02-21T09:47:41.4215061Z .b8 46 2026-02-21T09:47:41.4215175Z .b8 112 2026-02-21T09:47:41.4215282Z .b8 121 2026-02-21T09:47:41.4215431Z .b8 0 2026-02-21T09:47:41.4215585Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:41.4215812Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:41.4216000Z .b8 116 2026-02-21T09:47:41.4216110Z .b8 109 2026-02-21T09:47:41.4216225Z .b8 112 2026-02-21T09:47:41.4216333Z .b8 47 2026-02-21T09:47:41.4216449Z .b8 116 2026-02-21T09:47:41.4216556Z .b8 111 2026-02-21T09:47:41.4216670Z .b8 114 2026-02-21T09:47:41.4216775Z .b8 99 2026-02-21T09:47:41.4216889Z .b8 104 2026-02-21T09:47:41.4216995Z .b8 105 2026-02-21T09:47:41.4217125Z .b8 110 2026-02-21T09:47:41.4217237Z .b8 100 2026-02-21T09:47:41.4217356Z .b8 117 2026-02-21T09:47:41.4217469Z .b8 99 2026-02-21T09:47:41.4217590Z .b8 116 2026-02-21T09:47:41.4217708Z .b8 111 2026-02-21T09:47:41.4217819Z .b8 114 2026-02-21T09:47:41.4217937Z .b8 95 2026-02-21T09:47:41.4218049Z .b8 114 2026-02-21T09:47:41.4218170Z .b8 111 2026-02-21T09:47:41.4218280Z .b8 111 2026-02-21T09:47:41.4218399Z .b8 116 2026-02-21T09:47:41.4218511Z .b8 47 2026-02-21T09:47:41.4218631Z .b8 53 2026-02-21T09:47:41.4218741Z .b8 113 
2026-02-21T09:47:41.4218860Z .b8 0 2026-02-21T09:47:41.4219013Z } 2026-02-21T09:47:41.4219146Z .section .debug_macinfo { } 2026-02-21T09:47:41.4219254Z 2026-02-21T09:47:41.4219338Z ================================================================ 2026-02-21T09:47:41.4219576Z please share the reproducer above with Triton project. 2026-02-21T09:47:41.6466802Z [78s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:41.6467072Z 2026-02-21T09:47:41.6472428Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:47:41.6473580Z 2026-02-21T09:47:41.6473718Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:41.6473909Z 2026-02-21T09:47:41.6479429Z `ptxas` stderr: 2026-02-21T09:47:41.6479638Z ================================================================ 2026-02-21T09:47:41.6484128Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:41.6484613Z Internal Triton PTX codegen error 2026-02-21T09:47:41.6485032Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:41.6485403Z 2026-02-21T09:47:41.6485815Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpg8wu66a1.ptx -o /tmp/tmpg8wu66a1.ptx.o 2026-02-21T09:47:41.6486265Z 2026-02-21T09:47:41.6486406Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:47:41.6486648Z `ptxas` stderr: 2026-02-21T09:47:41.6491177Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 256 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:41.6495454Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:41.6495669Z 2026-02-21T09:47:41.6500420Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpg8wu66a1.ptx -o /tmp/tmpg8wu66a1.ptx.o 2026-02-21T09:47:41.6504406Z 2026-02-21T09:47:41.6508445Z 2026-02-21T09:47:41.6510455Z // 2026-02-21T09:47:41.6510647Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:41.6510838Z // 2026-02-21T09:47:41.6510909Z 2026-02-21T09:47:41.6510976Z .version 8.7 2026-02-21T09:47:41.6511113Z .target sm_100a 2026-02-21T09:47:41.6511256Z .address_size 64 2026-02-21T09:47:41.6511337Z 2026-02-21T09:47:41.6511668Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:41.6511943Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:41.6512154Z // @_helion_matmul 2026-02-21T09:47:41.6512361Z .visible .entry _helion_matmul( 2026-02-21T09:47:41.6512582Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:41.6512834Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:41.6513081Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:41.6513317Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:41.6513566Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:41.6513771Z ) 2026-02-21T09:47:41.6513912Z .reqntid 256 2026-02-21T09:47:41.6514054Z .maxnreg 32 2026-02-21T09:47:41.6514185Z { 2026-02-21T09:47:41.6514318Z .reg .pred %p<98>; 2026-02-21T09:47:41.6514459Z .reg .b16 %rs<11>; 2026-02-21T09:47:41.6514595Z .reg .b32 %r<289>; 2026-02-21T09:47:41.6514794Z .reg .b64 %rd<119>; 
2026-02-21T09:47:41.6515069Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19:0 2026-02-21T09:47:41.6515413Z $L__func_begin0: 2026-02-21T09:47:41.6515659Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19:0 2026-02-21T09:47:41.6515881Z 2026-02-21T09:47:41.6515940Z // %bb.0: 2026-02-21T09:47:41.6516089Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:47:41.6516282Z $L__tmp0: 2026-02-21T09:47:41.6516502Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19 2026-02-21T09:47:41.6516780Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:41.6516921Z shr.u32 %r2, %r1, 5; 2026-02-21T09:47:41.6517081Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:47:41.6517260Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:47:41.6517415Z @%p1 bra $L__BB0_12; 2026-02-21T09:47:41.6517557Z bra.uni $L__BB0_1; 2026-02-21T09:47:41.6517689Z $L__BB0_12: 2026-02-21T09:47:41.6518100Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0:0 2026-02-21T09:47:41.6518395Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:47:41.6518606Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:47:41.6518878Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19 2026-02-21T09:47:41.6519170Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.6519364Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:47:41.6519523Z mov.b32 %r134, global_smem; 2026-02-21T09:47:41.6519683Z // begin inline asm 2026-02-21T09:47:41.6519949Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r134], 32; 2026-02-21T09:47:41.6520200Z // end inline asm 2026-02-21T09:47:41.6520332Z bar.sync 0, 128; 2026-02-21T09:47:41.6520482Z ld.shared.b32 %r281, [global_smem]; 2026-02-21T09:47:41.6520646Z bar.sync 0, 128; 2026-02-21T09:47:41.6520786Z // begin inline asm 2026-02-21T09:47:41.6520989Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:41.6521224Z // end inline asm 
2026-02-21T09:47:41.6521481Z .loc 1 21 67 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:21:67 2026-02-21T09:47:41.6521766Z mov.u32 %r151, %ctaid.x; 2026-02-21T09:47:41.6521920Z mov.u32 %r152, %ctaid.y; 2026-02-21T09:47:41.6522063Z mov.u32 %r153, %ctaid.z; 2026-02-21T09:47:41.6522215Z mov.u32 %r154, %nctaid.x; 2026-02-21T09:47:41.6522364Z mov.u32 %r155, %nctaid.y; 2026-02-21T09:47:41.6522527Z mad.lo.s32 %r156, %r153, %r155, %r152; 2026-02-21T09:47:41.6522702Z mad.lo.s32 %r157, %r156, %r154, %r151; 2026-02-21T09:47:41.6522878Z shl.b32 %r158, %r157, 8; 2026-02-21T09:47:41.6523029Z cvt.s64.s32 %rd84, %r158; 2026-02-21T09:47:41.6523180Z add.s64 %rd63, %rd6, %rd84; 2026-02-21T09:47:41.6523339Z shl.b32 %r159, %r1, 2; 2026-02-21T09:47:41.6523489Z add.s32 %r135, %r134, %r159; 2026-02-21T09:47:41.6523673Z mov.b32 %r144, 0; 2026-02-21T09:47:41.6523809Z // begin inline asm 2026-02-21T09:47:41.6523965Z @%p34 st.shared.b32 [ %r135 + 0 ], %r144; 2026-02-21T09:47:41.6524135Z // end inline asm 2026-02-21T09:47:41.6524277Z bar.warp.sync -1; 2026-02-21T09:47:41.6524426Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:47:41.6524575Z cvt.u64.u32 %rd48, %r134; 2026-02-21T09:47:41.6524772Z // begin inline asm 2026-02-21T09:47:41.6525020Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd3; 2026-02-21T09:47:41.6525304Z // end inline asm 2026-02-21T09:47:41.6525435Z // begin inline asm 2026-02-21T09:47:41.6525664Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T09:47:41.6525938Z // end inline asm 2026-02-21T09:47:41.6526088Z mov.b32 %r137, 64; 2026-02-21T09:47:41.6526228Z // begin inline asm 2026-02-21T09:47:41.6526479Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.6526756Z // end inline asm 2026-02-21T09:47:41.6526899Z mov.b32 %r138, 128; 2026-02-21T09:47:41.6527041Z // begin inline asm 2026-02-21T09:47:41.6527296Z @%p37 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r138; 2026-02-21T09:47:41.6527560Z // end inline asm 2026-02-21T09:47:41.6527689Z mov.b32 %r139, 2048; 2026-02-21T09:47:41.6527831Z // begin inline asm 2026-02-21T09:47:41.6528066Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.6528340Z // end inline asm 2026-02-21T09:47:41.6528469Z // begin inline asm 2026-02-21T09:47:41.6528708Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r139; 2026-02-21T09:47:41.6528978Z // end inline asm 2026-02-21T09:47:41.6529107Z mov.b64 %rd56, 4096; 2026-02-21T09:47:41.6529251Z // begin inline asm 2026-02-21T09:47:41.6529493Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T09:47:41.6529773Z // end inline asm 2026-02-21T09:47:41.6529929Z mov.b32 %r141, 1; 2026-02-21T09:47:41.6530069Z // begin inline asm 2026-02-21T09:47:41.6530323Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r141; 2026-02-21T09:47:41.6530598Z // end inline asm 2026-02-21T09:47:41.6530733Z // begin inline asm 2026-02-21T09:47:41.6530975Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r141; 2026-02-21T09:47:41.6531251Z // end inline asm 2026-02-21T09:47:41.6531378Z // begin inline asm 2026-02-21T09:47:41.6531608Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T09:47:41.6531901Z // end inline asm 2026-02-21T09:47:41.6532039Z // begin inline asm 2026-02-21T09:47:41.6532288Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.6532562Z // end inline asm 2026-02-21T09:47:41.6532703Z // begin inline asm 2026-02-21T09:47:41.6532934Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T09:47:41.6533201Z // end inline asm 
2026-02-21T09:47:41.6533336Z // begin inline asm 2026-02-21T09:47:41.6533567Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.6533832Z // end inline asm 2026-02-21T09:47:41.6533966Z // begin inline asm 2026-02-21T09:47:41.6534318Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd63 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T09:47:41.6534727Z // end inline asm 2026-02-21T09:47:41.6534867Z // begin inline asm 2026-02-21T09:47:41.6535074Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd63 + 0 ], 0x80; 2026-02-21T09:47:41.6535334Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.6535533Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.6535705Z // end inline asm 2026-02-21T09:47:41.6535845Z bar.sync 0, 128; 2026-02-21T09:47:41.6536122Z .loc 1 22 68 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:22:68 2026-02-21T09:47:41.6536418Z add.s64 %rd81, %rd63, 128; 2026-02-21T09:47:41.6536573Z bar.sync 0, 128; 2026-02-21T09:47:41.6536711Z // begin inline asm 2026-02-21T09:47:41.6536859Z @%p34 st.shared.b32 [ %r135 + 0 ], %r144; 2026-02-21T09:47:41.6537033Z // end inline asm 2026-02-21T09:47:41.6537163Z bar.warp.sync -1; 2026-02-21T09:47:41.6537303Z // begin inline asm 2026-02-21T09:47:41.6537541Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd48 + 0 ], %rd4; 2026-02-21T09:47:41.6537803Z // end inline asm 2026-02-21T09:47:41.6537938Z // begin inline asm 2026-02-21T09:47:41.6538152Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1; 2026-02-21T09:47:41.6538401Z // end inline asm 2026-02-21T09:47:41.6538528Z // begin inline asm 2026-02-21T09:47:41.6538760Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.6539021Z // end inline asm 2026-02-21T09:47:41.6539151Z mov.b32 %r146, 16; 2026-02-21T09:47:41.6539288Z // begin inline asm 
2026-02-21T09:47:41.6539510Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r146; 2026-02-21T09:47:41.6539802Z // end inline asm 2026-02-21T09:47:41.6539931Z // begin inline asm 2026-02-21T09:47:41.6540171Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.6540434Z // end inline asm 2026-02-21T09:47:41.6540570Z mov.b32 %r148, 12288; 2026-02-21T09:47:41.6540715Z // begin inline asm 2026-02-21T09:47:41.6540945Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r148; 2026-02-21T09:47:41.6541217Z // end inline asm 2026-02-21T09:47:41.6541343Z // begin inline asm 2026-02-21T09:47:41.6541588Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd48 + 0 ], 0x0, %rd56; 2026-02-21T09:47:41.6541861Z // end inline asm 2026-02-21T09:47:41.6541997Z // begin inline asm 2026-02-21T09:47:41.6542279Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0, %r141; 2026-02-21T09:47:41.6542566Z // end inline asm 2026-02-21T09:47:41.6542700Z // begin inline asm 2026-02-21T09:47:41.6542942Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x1, %r141; 2026-02-21T09:47:41.6543232Z // end inline asm 2026-02-21T09:47:41.6543360Z // begin inline asm 2026-02-21T09:47:41.6543587Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x6; 2026-02-21T09:47:41.6543846Z // end inline asm 2026-02-21T09:47:41.6543974Z // begin inline asm 2026-02-21T09:47:41.6544217Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.6544523Z // end inline asm 2026-02-21T09:47:41.6544658Z // begin inline asm 2026-02-21T09:47:41.6544936Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x3; 2026-02-21T09:47:41.6545208Z // end inline asm 2026-02-21T09:47:41.6545341Z // begin inline asm 2026-02-21T09:47:41.6545571Z 
@%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd48 + 0 ], 0x0; 2026-02-21T09:47:41.6545830Z // end inline asm 2026-02-21T09:47:41.6545961Z // begin inline asm 2026-02-21T09:47:41.6546312Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd81 + 0 ], [ %rd48 + 0 ], 0x80; 2026-02-21T09:47:41.6546683Z // end inline asm 2026-02-21T09:47:41.6546827Z // begin inline asm 2026-02-21T09:47:41.6547039Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd81 + 0 ], 0x80; 2026-02-21T09:47:41.6547308Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.6547513Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.6547683Z // end inline asm 2026-02-21T09:47:41.6547820Z bar.sync 0, 128; 2026-02-21T09:47:41.6548062Z .loc 1 29 35 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:29:35 2026-02-21T09:47:41.6548388Z mul.lo.s32 %r288, %r151, 6; 2026-02-21T09:47:41.6548650Z .loc 1 30 37 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:30:37 2026-02-21T09:47:41.6548941Z add.s32 %r160, %r288, 6; 2026-02-21T09:47:41.6549201Z .loc 1 30 49 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:30:49 2026-02-21T09:47:41.6549477Z min.s32 %r22, %r160, 12288; 2026-02-21T09:47:41.6549734Z .loc 1 31 84 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:31:84 2026-02-21T09:47:41.6550012Z setp.ge.s32 %p72, %r288, %r22; 2026-02-21T09:47:41.6550180Z @%p72 bra $L__BB0_15; 2026-02-21T09:47:41.6550340Z // %bb.13: // %.lr.ph 2026-02-21T09:47:41.6550633Z .loc 1 0 84 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0:84 2026-02-21T09:47:41.6550935Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:47:41.6551119Z and.b32 %r23, %r1, 1; 2026-02-21T09:47:41.6551270Z shl.b32 %r24, %r23, 3; 2026-02-21T09:47:41.6551423Z bfe.u32 %r25, %r1, 1, 6; 2026-02-21T09:47:41.6551579Z or.b32 %r26, %r25, 64; 2026-02-21T09:47:41.6551754Z shl.b32 %r161, %r1, 4; 2026-02-21T09:47:41.6551904Z and.b32 %r162, 
%r161, 1968; 2026-02-21T09:47:41.6552053Z bfe.s32 %r163, %r1, 2, 1; 2026-02-21T09:47:41.6552208Z and.b32 %r164, %r163, 2112; 2026-02-21T09:47:41.6552356Z or.b32 %r165, %r164, %r162; 2026-02-21T09:47:41.6552508Z add.s32 %r27, %r134, %r165; 2026-02-21T09:47:41.6552658Z xor.b32 %r167, %r165, 64; 2026-02-21T09:47:41.6552802Z add.s32 %r28, %r134, %r167; 2026-02-21T09:47:41.6552954Z shl.b32 %r168, %r1, 3; 2026-02-21T09:47:41.6553096Z and.b32 %r169, %r168, 944; 2026-02-21T09:47:41.6553255Z shl.b32 %r170, %r23, 6; 2026-02-21T09:47:41.6553401Z bfe.s32 %r171, %r1, 3, 1; 2026-02-21T09:47:41.6553559Z and.b32 %r172, %r171, 2112; 2026-02-21T09:47:41.6553711Z or.b32 %r173, %r169, %r170; 2026-02-21T09:47:41.6553873Z xor.b32 %r174, %r173, %r172; 2026-02-21T09:47:41.6554032Z add.s32 %r29, %r134, %r174; 2026-02-21T09:47:41.6554289Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:41.6554633Z .loc 1 37 35 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:37:35 2026-02-21T09:47:41.6554958Z shr.s32 %r241, %r288, 31; 2026-02-21T09:47:41.6555122Z shr.u32 %r242, %r241, 26; 2026-02-21T09:47:41.6555280Z add.s32 %r243, %r288, %r242; 2026-02-21T09:47:41.6555554Z .loc 1 40 64 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:40:64 2026-02-21T09:47:41.6555849Z and.b32 %r244, %r243, -64; 2026-02-21T09:47:41.6556018Z sub.s32 %r245, %r288, %r244; 2026-02-21T09:47:41.6556218Z cvt.u16.u32 %rs1, %r245; 2026-02-21T09:47:41.6556378Z cvt.s8.s32 %rs2, %r245; 2026-02-21T09:47:41.6556655Z .loc 1 41 51 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:41:51 2026-02-21T09:47:41.6556957Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:41.6557124Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:41.6557279Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:41.6557445Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:41.6557597Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:41.6557856Z .loc 1 40 64 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:40:64 2026-02-21T09:47:41.6558149Z and.b16 %rs8, 
%rs5, 252; 2026-02-21T09:47:41.6558299Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:41.6558566Z .loc 1 42 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:42:27 2026-02-21T09:47:41.6558856Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:41.6559024Z mad.wide.s16 %r246, %rs10, 16, %r244; 2026-02-21T09:47:41.6559309Z .loc 1 43 32 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:43:32 2026-02-21T09:47:41.6559607Z or.b32 %r247, %r246, %r24; 2026-02-21T09:47:41.6559876Z .loc 1 44 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:44:27 2026-02-21T09:47:41.6560196Z mul.wide.s16 %r248, %rs7, 128; 2026-02-21T09:47:41.6560479Z .loc 1 45 32 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:45:32 2026-02-21T09:47:41.6560768Z or.b32 %r249, %r248, %r25; 2026-02-21T09:47:41.6560933Z or.b32 %r250, %r248, %r26; 2026-02-21T09:47:41.6561199Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6561495Z shfl.sync.idx.b32 %r251, %r2, 0, 31, -1; 2026-02-21T09:47:41.6561672Z shl.b32 %r252, %r251, 21; 2026-02-21T09:47:41.6561829Z and.b32 %r253, %r252, 6291456; 2026-02-21T09:47:41.6561991Z add.s32 %r175, %r253, %r281; 2026-02-21T09:47:41.6562144Z mov.pred %p73, -1; 2026-02-21T09:47:41.6562291Z mov.b32 %r176, 0; 2026-02-21T09:47:41.6562423Z // begin inline asm 2026-02-21T09:47:41.6562800Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r175 + 0], {%r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176, %r176}; 2026-02-21T09:47:41.6563183Z // end inline asm 2026-02-21T09:47:41.6563323Z // begin inline asm 2026-02-21T09:47:41.6563477Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:41.6563658Z // end inline asm 2026-02-21T09:47:41.6563792Z bar.sync 0, 128; 2026-02-21T09:47:41.6564028Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6564307Z add.s32 %r192, %r134, 73856; 2026-02-21T09:47:41.6564458Z // begin inline asm 
2026-02-21T09:47:41.6564625Z @%p37 mbarrier.init.shared::cta.b64 [%r192], 1; 2026-02-21T09:47:41.6564834Z // end inline asm 2026-02-21T09:47:41.6564968Z bar.sync 0, 128; 2026-02-21T09:47:41.6565100Z add.s32 %r193, %r134, 73864; 2026-02-21T09:47:41.6565252Z // begin inline asm 2026-02-21T09:47:41.6565417Z @%p37 mbarrier.init.shared::cta.b64 [%r193], 1; 2026-02-21T09:47:41.6565594Z // end inline asm 2026-02-21T09:47:41.6565727Z bar.sync 0, 128; 2026-02-21T09:47:41.6565855Z add.s32 %r194, %r134, 73872; 2026-02-21T09:47:41.6566011Z // begin inline asm 2026-02-21T09:47:41.6566192Z @%p37 mbarrier.init.shared::cta.b64 [%r194], 1; 2026-02-21T09:47:41.6566389Z // end inline asm 2026-02-21T09:47:41.6566515Z bar.sync 0, 128; 2026-02-21T09:47:41.6566653Z add.s32 %r195, %r134, 73880; 2026-02-21T09:47:41.6566812Z // begin inline asm 2026-02-21T09:47:41.6566968Z @%p37 mbarrier.init.shared::cta.b64 [%r195], 1; 2026-02-21T09:47:41.6567162Z // end inline asm 2026-02-21T09:47:41.6567294Z add.s32 %r196, %r134, 73888; 2026-02-21T09:47:41.6567449Z // begin inline asm 2026-02-21T09:47:41.6567603Z @%p37 mbarrier.init.shared::cta.b64 [%r196], 1; 2026-02-21T09:47:41.6567787Z // end inline asm 2026-02-21T09:47:41.6567942Z bar.sync 0, 128; 2026-02-21T09:47:41.6568078Z add.s32 %r197, %r134, 73896; 2026-02-21T09:47:41.6568222Z // begin inline asm 2026-02-21T09:47:41.6568384Z @%p37 mbarrier.init.shared::cta.b64 [%r197], 1; 2026-02-21T09:47:41.6568564Z // end inline asm 2026-02-21T09:47:41.6568690Z bar.sync 0, 128; 2026-02-21T09:47:41.6568829Z add.s32 %r198, %r134, 73904; 2026-02-21T09:47:41.6568975Z // begin inline asm 2026-02-21T09:47:41.6569136Z @%p37 mbarrier.init.shared::cta.b64 [%r198], 1; 2026-02-21T09:47:41.6569311Z // end inline asm 2026-02-21T09:47:41.6569446Z bar.sync 0, 128; 2026-02-21T09:47:41.6569575Z add.s32 %r199, %r134, 73912; 2026-02-21T09:47:41.6569728Z // begin inline asm 2026-02-21T09:47:41.6569888Z @%p37 mbarrier.init.shared::cta.b64 [%r199], 1; 2026-02-21T09:47:41.6570060Z 
// end inline asm 2026-02-21T09:47:41.6570299Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6570559Z bar.sync 0, 128; 2026-02-21T09:47:41.6570697Z // begin inline asm 2026-02-21T09:47:41.6570858Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r192]; 2026-02-21T09:47:41.6571051Z // end inline asm 2026-02-21T09:47:41.6571176Z bar.sync 0, 128; 2026-02-21T09:47:41.6571309Z // begin inline asm 2026-02-21T09:47:41.6571497Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r193]; 2026-02-21T09:47:41.6571682Z // end inline asm 2026-02-21T09:47:41.6571814Z bar.sync 0, 128; 2026-02-21T09:47:41.6571942Z // begin inline asm 2026-02-21T09:47:41.6572105Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r194]; 2026-02-21T09:47:41.6572287Z // end inline asm 2026-02-21T09:47:41.6572417Z bar.sync 0, 128; 2026-02-21T09:47:41.6572541Z // begin inline asm 2026-02-21T09:47:41.6572704Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r195]; 2026-02-21T09:47:41.6572884Z // end inline asm 2026-02-21T09:47:41.6573126Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6573402Z bar.sync 0, 128; 2026-02-21T09:47:41.6573532Z add.s32 %r204, %r134, 73920; 2026-02-21T09:47:41.6573690Z // begin inline asm 2026-02-21T09:47:41.6573845Z @%p37 mbarrier.init.shared::cta.b64 [%r204], 1; 2026-02-21T09:47:41.6574031Z // end inline asm 2026-02-21T09:47:41.6574178Z st.shared.b32 [global_smem+73928], 33554689; 2026-02-21T09:47:41.6574383Z st.shared.b32 [global_smem+73728], %r281; 2026-02-21T09:47:41.6574602Z st.shared.v2.b32 [global_smem+73736], {%r248, %r246}; 2026-02-21T09:47:41.6574857Z barrier.sync 1; 2026-02-21T09:47:41.6575021Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.6575199Z barrier.sync 1; 2026-02-21T09:47:41.6575341Z barrier.sync 1; 2026-02-21T09:47:41.6575491Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.6575775Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 
2026-02-21T09:47:41.6576042Z bar.sync 0, 128; 2026-02-21T09:47:41.6576178Z // begin inline asm 2026-02-21T09:47:41.6576313Z 2026-02-21T09:47:41.6576441Z { 2026-02-21T09:47:41.6576578Z .reg .pred complete; 2026-02-21T09:47:41.6576726Z waitLoop: 2026-02-21T09:47:41.6576920Z mbarrier.try_wait.parity.shared.b64 complete, [%r204], %r176; 2026-02-21T09:47:41.6577152Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.6577308Z } 2026-02-21T09:47:41.6577371Z 2026-02-21T09:47:41.6577426Z // end inline asm 2026-02-21T09:47:41.6577691Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6577966Z bar.sync 0, 128; 2026-02-21T09:47:41.6578102Z // begin inline asm 2026-02-21T09:47:41.6578268Z @%p37 mbarrier.inval.shared::cta.b64 [%r204]; 2026-02-21T09:47:41.6578448Z // end inline asm 2026-02-21T09:47:41.6578585Z // begin inline asm 2026-02-21T09:47:41.6578741Z @%p37 mbarrier.inval.shared::cta.b64 [%r196]; 2026-02-21T09:47:41.6578929Z // end inline asm 2026-02-21T09:47:41.6579055Z bar.sync 0, 128; 2026-02-21T09:47:41.6579188Z // begin inline asm 2026-02-21T09:47:41.6579370Z @%p37 mbarrier.inval.shared::cta.b64 [%r197]; 2026-02-21T09:47:41.6579550Z // end inline asm 2026-02-21T09:47:41.6579675Z bar.sync 0, 128; 2026-02-21T09:47:41.6579809Z // begin inline asm 2026-02-21T09:47:41.6579965Z @%p37 mbarrier.inval.shared::cta.b64 [%r198]; 2026-02-21T09:47:41.6580136Z // end inline asm 2026-02-21T09:47:41.6580268Z bar.sync 0, 128; 2026-02-21T09:47:41.6580396Z // begin inline asm 2026-02-21T09:47:41.6580552Z @%p37 mbarrier.inval.shared::cta.b64 [%r199]; 2026-02-21T09:47:41.6580725Z // end inline asm 2026-02-21T09:47:41.6580859Z // begin inline asm 2026-02-21T09:47:41.6581010Z @%p37 mbarrier.inval.shared::cta.b64 [%r192]; 2026-02-21T09:47:41.6581190Z // end inline asm 2026-02-21T09:47:41.6581321Z bar.sync 0, 128; 2026-02-21T09:47:41.6581448Z // begin inline asm 2026-02-21T09:47:41.6581606Z @%p37 mbarrier.inval.shared::cta.b64 [%r193]; 
2026-02-21T09:47:41.6581780Z // end inline asm 2026-02-21T09:47:41.6581912Z bar.sync 0, 128; 2026-02-21T09:47:41.6582039Z // begin inline asm 2026-02-21T09:47:41.6582198Z @%p37 mbarrier.inval.shared::cta.b64 [%r194]; 2026-02-21T09:47:41.6582368Z // end inline asm 2026-02-21T09:47:41.6582501Z bar.sync 0, 128; 2026-02-21T09:47:41.6582628Z // begin inline asm 2026-02-21T09:47:41.6582786Z @%p37 mbarrier.inval.shared::cta.b64 [%r195]; 2026-02-21T09:47:41.6582995Z // end inline asm 2026-02-21T09:47:41.6583232Z .loc 1 59 53 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:59:53 2026-02-21T09:47:41.6583527Z mad.lo.s32 %r255, %r249, 12288, %r247; 2026-02-21T09:47:41.6583706Z mad.lo.s32 %r256, %r250, 12288, %r247; 2026-02-21T09:47:41.6583985Z .loc 1 59 24 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:59:24 2026-02-21T09:47:41.6584264Z mad.wide.s32 %rd85, %r255, 2, %rd5; 2026-02-21T09:47:41.6584442Z mad.wide.s32 %rd86, %r256, 2, %rd5; 2026-02-21T09:47:41.6584742Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6585022Z // begin inline asm 2026-02-21T09:47:41.6585382Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r216, %r217, %r218, %r219, %r220, %r221, %r222, %r223, %r224, %r225, %r226, %r227, %r228, %r229, %r230, %r231}, [%r175 + 0]; 2026-02-21T09:47:41.6585767Z // end inline asm 2026-02-21T09:47:41.6585905Z // begin inline asm 2026-02-21T09:47:41.6586053Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:41.6586220Z // end inline asm 2026-02-21T09:47:41.6586388Z cvt.u64.u32 %rd87, %r216; 2026-02-21T09:47:41.6586541Z cvt.u64.u32 %rd88, %r217; 2026-02-21T09:47:41.6586697Z shl.b64 %rd89, %rd88, 32; 2026-02-21T09:47:41.6586846Z or.b64 %rd90, %rd87, %rd89; 2026-02-21T09:47:41.6587107Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6587384Z mov.b64 {%r257, %r258}, %rd90; 2026-02-21T09:47:41.6587559Z cvt.rn.f16x2.f32 %r259, %r258, %r257; 2026-02-21T09:47:41.6587831Z 
.loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6588113Z cvt.u64.u32 %rd91, %r218; 2026-02-21T09:47:41.6588268Z cvt.u64.u32 %rd92, %r219; 2026-02-21T09:47:41.6588414Z shl.b64 %rd93, %rd92, 32; 2026-02-21T09:47:41.6588569Z or.b64 %rd94, %rd91, %rd93; 2026-02-21T09:47:41.6588845Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6589136Z mov.b64 {%r260, %r261}, %rd94; 2026-02-21T09:47:41.6589299Z cvt.rn.f16x2.f32 %r262, %r261, %r260; 2026-02-21T09:47:41.6589573Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6589851Z cvt.u64.u32 %rd95, %r220; 2026-02-21T09:47:41.6589995Z cvt.u64.u32 %rd96, %r221; 2026-02-21T09:47:41.6590145Z shl.b64 %rd97, %rd96, 32; 2026-02-21T09:47:41.6590292Z or.b64 %rd98, %rd95, %rd97; 2026-02-21T09:47:41.6590552Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6590870Z mov.b64 {%r263, %r264}, %rd98; 2026-02-21T09:47:41.6591039Z cvt.rn.f16x2.f32 %r265, %r264, %r263; 2026-02-21T09:47:41.6591339Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6591647Z cvt.u64.u32 %rd99, %r222; 2026-02-21T09:47:41.6591843Z cvt.u64.u32 %rd100, %r223; 2026-02-21T09:47:41.6592011Z shl.b64 %rd101, %rd100, 32; 2026-02-21T09:47:41.6592200Z or.b64 %rd102, %rd99, %rd101; 2026-02-21T09:47:41.6592475Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6592797Z mov.b64 {%r266, %r267}, %rd102; 2026-02-21T09:47:41.6592996Z cvt.rn.f16x2.f32 %r268, %r267, %r266; 2026-02-21T09:47:41.6593303Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6593614Z cvt.u64.u32 %rd103, %r224; 2026-02-21T09:47:41.6593794Z cvt.u64.u32 %rd104, %r225; 2026-02-21T09:47:41.6593978Z shl.b64 %rd105, %rd104, 32; 2026-02-21T09:47:41.6594145Z or.b64 %rd106, %rd103, 
%rd105; 2026-02-21T09:47:41.6594405Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6594718Z mov.b64 {%r269, %r270}, %rd106; 2026-02-21T09:47:41.6594924Z cvt.rn.f16x2.f32 %r271, %r270, %r269; 2026-02-21T09:47:41.6595195Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6595480Z cvt.u64.u32 %rd107, %r226; 2026-02-21T09:47:41.6595642Z cvt.u64.u32 %rd108, %r227; 2026-02-21T09:47:41.6595789Z shl.b64 %rd109, %rd108, 32; 2026-02-21T09:47:41.6595949Z or.b64 %rd110, %rd107, %rd109; 2026-02-21T09:47:41.6596202Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6596491Z mov.b64 {%r272, %r273}, %rd110; 2026-02-21T09:47:41.6596650Z cvt.rn.f16x2.f32 %r274, %r273, %r272; 2026-02-21T09:47:41.6596920Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6597197Z cvt.u64.u32 %rd111, %r228; 2026-02-21T09:47:41.6597345Z cvt.u64.u32 %rd112, %r229; 2026-02-21T09:47:41.6597500Z shl.b64 %rd113, %rd112, 32; 2026-02-21T09:47:41.6597660Z or.b64 %rd114, %rd111, %rd113; 2026-02-21T09:47:41.6597932Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6598269Z mov.b64 {%r275, %r276}, %rd114; 2026-02-21T09:47:41.6598445Z cvt.rn.f16x2.f32 %r277, %r276, %r275; 2026-02-21T09:47:41.6598726Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6599018Z cvt.u64.u32 %rd115, %r230; 2026-02-21T09:47:41.6599182Z cvt.u64.u32 %rd116, %r231; 2026-02-21T09:47:41.6599338Z shl.b64 %rd117, %rd116, 32; 2026-02-21T09:47:41.6599503Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T09:47:41.6599773Z .loc 1 58 27 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:58:27 2026-02-21T09:47:41.6600065Z mov.b64 {%r278, %r279}, %rd118; 2026-02-21T09:47:41.6600234Z cvt.rn.f16x2.f32 %r280, %r279, %r278; 2026-02-21T09:47:41.6600529Z .loc 1 59 83 
// co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:59:83 2026-02-21T09:47:41.6600910Z st.shared.v4.b32 [%r27], {%r259, %r262, %r265, %r268}; 2026-02-21T09:47:41.6601157Z st.shared.v4.b32 [%r28], {%r271, %r274, %r277, %r280}; 2026-02-21T09:47:41.6601365Z bar.sync 0, 128; 2026-02-21T09:47:41.6601544Z ld.shared.v4.b32 {%r237, %r238, %r239, %r240}, [%r29+1024]; 2026-02-21T09:47:41.6601792Z ld.shared.v4.b32 {%r233, %r234, %r235, %r236}, [%r29]; 2026-02-21T09:47:41.6601990Z // begin inline asm 2026-02-21T09:47:41.6602183Z st.global.v4.b32 [ %rd85 + 0 ], { %r233, %r234, %r235, %r236 }; 2026-02-21T09:47:41.6602393Z // end inline asm 2026-02-21T09:47:41.6602540Z // begin inline asm 2026-02-21T09:47:41.6602760Z st.global.v4.b32 [ %rd86 + 0 ], { %r237, %r238, %r239, %r240 }; 2026-02-21T09:47:41.6602970Z // end inline asm 2026-02-21T09:47:41.6603222Z .loc 1 31 84 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:31:84 2026-02-21T09:47:41.6603509Z add.s32 %r288, %r288, 1; 2026-02-21T09:47:41.6603676Z setp.ne.b32 %p96, %r22, %r288; 2026-02-21T09:47:41.6603840Z @%p96 bra $L__BB0_14; 2026-02-21T09:47:41.6604022Z $L__BB0_15: // %._crit_edge 2026-02-21T09:47:41.6604325Z .loc 1 31 4 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:31:4 2026-02-21T09:47:41.6604614Z bar.sync 0, 128; 2026-02-21T09:47:41.6604787Z // begin inline asm 2026-02-21T09:47:41.6604992Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r281, 32; 2026-02-21T09:47:41.6605233Z // end inline asm 2026-02-21T09:47:41.6605387Z st.shared.b32 [global_smem+73928], 50529027; 2026-02-21T09:47:41.6605576Z barrier.sync 1; 2026-02-21T09:47:41.6605738Z $L__BB0_16: // %common.ret 2026-02-21T09:47:41.6606042Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6606316Z ret; 2026-02-21T09:47:41.6606482Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:47:41.6606726Z mov.b32 %r33, global_smem; 2026-02-21T09:47:41.6606885Z add.s32 %r34, %r33, %r3; 
2026-02-21T09:47:41.6607041Z add.s32 %r66, %r33, 73888; 2026-02-21T09:47:41.6607189Z bfe.u32 %r80, %r33, 4, 14; 2026-02-21T09:47:41.6607344Z cvt.u64.u32 %rd22, %r80; 2026-02-21T09:47:41.6607500Z or.b64 %rd12, %rd22, 4611686293372403712; 2026-02-21T09:47:41.6607678Z add.s32 %r81, %r33, 65536; 2026-02-21T09:47:41.6607823Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T09:47:41.6607975Z cvt.u64.u32 %rd23, %r82; 2026-02-21T09:47:41.6608136Z or.b64 %rd13, %rd23, 4611686293313683456; 2026-02-21T09:47:41.6608305Z add.s32 %r83, %r33, 32; 2026-02-21T09:47:41.6608458Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T09:47:41.6608601Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.6608787Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6609098Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6609396Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6609573Z barrier.sync 1; 2026-02-21T09:47:41.6609712Z barrier.sync 1; 2026-02-21T09:47:41.6609898Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6610093Z $L__BB0_2: // %.preheader 2026-02-21T09:47:41.6610313Z // =>This Loop Header: Depth=1 2026-02-21T09:47:41.6610537Z // Child Loop BB0_9 Depth 2 2026-02-21T09:47:41.6610758Z // Child Loop BB0_6 Depth 2 2026-02-21T09:47:41.6611043Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19 2026-02-21T09:47:41.6611340Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:47:41.6611395Z barrier.sync 1; 2026-02-21T09:47:41.6611465Z ld.shared.b8 %r32, [%r34+73924]; 2026-02-21T09:47:41.6611526Z setp.gt.u32 %p2, %r32, 3; 2026-02-21T09:47:41.6611581Z @%p2 bra $L__BB0_4; 2026-02-21T09:47:41.6611685Z // %bb.3: // %.preheader 2026-02-21T09:47:41.6611783Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6611845Z $L_brx_0: .branchtargets 2026-02-21T09:47:41.6611899Z $L__BB0_5, 2026-02-21T09:47:41.6611958Z $L__BB0_8, 2026-02-21T09:47:41.6612009Z $L__BB0_11, 2026-02-21T09:47:41.6612059Z $L__BB0_16; 
2026-02-21T09:47:41.6612117Z brx.idx %r32, $L_brx_0; 2026-02-21T09:47:41.6612199Z $L__BB0_5: // %.peel.next 2026-02-21T09:47:41.6612283Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6612444Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6612553Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6612626Z ld.shared.b32 %r68, [global_smem+73728]; 2026-02-21T09:47:41.6612680Z barrier.sync 1; 2026-02-21T09:47:41.6612841Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6612903Z bar.warp.sync -1; 2026-02-21T09:47:41.6612958Z mov.b32 %r282, 0; 2026-02-21T09:47:41.6613012Z // begin inline asm 2026-02-21T09:47:41.6613067Z 2026-02-21T09:47:41.6613116Z { 2026-02-21T09:47:41.6613174Z .reg .pred complete; 2026-02-21T09:47:41.6613233Z waitLoop: 2026-02-21T09:47:41.6613346Z mbarrier.try_wait.parity.shared.b64 complete, [%r66], %r282; 2026-02-21T09:47:41.6613409Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.6613457Z } 2026-02-21T09:47:41.6613468Z 2026-02-21T09:47:41.6613521Z // end inline asm 2026-02-21T09:47:41.6613683Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6613747Z elect.sync %r79|%p12, -1; 2026-02-21T09:47:41.6613811Z mov.b32 %r69, 134479888; 2026-02-21T09:47:41.6613868Z mov.pred %p11, 0; 2026-02-21T09:47:41.6613923Z // begin inline asm 2026-02-21T09:47:41.6614087Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd12, %rd13, %r69, %p11; 2026-02-21T09:47:41.6614145Z // end inline asm 2026-02-21T09:47:41.6614202Z cvt.u64.u32 %rd24, %r84; 2026-02-21T09:47:41.6614269Z or.b64 %rd14, %rd24, 4611686293372403712; 2026-02-21T09:47:41.6614335Z add.s32 %r85, %r33, 65568; 2026-02-21T09:47:41.6614391Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T09:47:41.6614448Z cvt.u64.u32 %rd25, %r86; 2026-02-21T09:47:41.6614523Z or.b64 %rd15, %rd25, 4611686293313683456; 2026-02-21T09:47:41.6614582Z mov.pred %p13, -1; 
2026-02-21T09:47:41.6614637Z // begin inline asm 2026-02-21T09:47:41.6614809Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd14, %rd15, %r69, %p13; 2026-02-21T09:47:41.6614868Z // end inline asm 2026-02-21T09:47:41.6614925Z add.s32 %r87, %r33, 64; 2026-02-21T09:47:41.6614981Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T09:47:41.6615043Z cvt.u64.u32 %rd26, %r88; 2026-02-21T09:47:41.6615105Z or.b64 %rd16, %rd26, 4611686293372403712; 2026-02-21T09:47:41.6615160Z add.s32 %r89, %r33, 65600; 2026-02-21T09:47:41.6615224Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T09:47:41.6615279Z cvt.u64.u32 %rd27, %r90; 2026-02-21T09:47:41.6615371Z or.b64 %rd17, %rd27, 4611686293313683456; 2026-02-21T09:47:41.6615425Z // begin inline asm 2026-02-21T09:47:41.6615561Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd16, %rd17, %r69, %p13; 2026-02-21T09:47:41.6615614Z // end inline asm 2026-02-21T09:47:41.6615669Z add.s32 %r91, %r33, 96; 2026-02-21T09:47:41.6615730Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T09:47:41.6615785Z cvt.u64.u32 %rd28, %r92; 2026-02-21T09:47:41.6615849Z or.b64 %rd18, %rd28, 4611686293372403712; 2026-02-21T09:47:41.6615913Z add.s32 %r93, %r33, 65632; 2026-02-21T09:47:41.6615968Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T09:47:41.6616023Z cvt.u64.u32 %rd29, %r94; 2026-02-21T09:47:41.6616085Z or.b64 %rd19, %rd29, 4611686293313683456; 2026-02-21T09:47:41.6616148Z // begin inline asm 2026-02-21T09:47:41.6616274Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd18, %rd19, %r69, %p13; 2026-02-21T09:47:41.6616359Z // end inline asm 2026-02-21T09:47:41.6616423Z add.s32 %r95, %r33, 73856; 2026-02-21T09:47:41.6616480Z cvt.u64.u32 %rd20, %r95; 2026-02-21T09:47:41.6616536Z // begin inline asm 2026-02-21T09:47:41.6616663Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:47:41.6616716Z // end inline asm 2026-02-21T09:47:41.6616770Z add.s32 %r96, %r33, 73920; 2026-02-21T09:47:41.6616825Z cvt.u64.u32 %rd21, %r96; 
2026-02-21T09:47:41.6616886Z // begin inline asm 2026-02-21T09:47:41.6617004Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:47:41.6617058Z // end inline asm 2026-02-21T09:47:41.6617147Z mov.b32 %r284, 1; 2026-02-21T09:47:41.6617205Z mov.b32 %r283, %r282; 2026-02-21T09:47:41.6617298Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.6617388Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.6617555Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6617612Z shl.b32 %r107, %r284, 3; 2026-02-21T09:47:41.6617670Z add.s32 %r109, %r33, %r107; 2026-02-21T09:47:41.6617735Z add.s32 %r110, %r109, 73856; 2026-02-21T09:47:41.6617791Z add.s32 %r97, %r109, 73888; 2026-02-21T09:47:41.6617948Z .loc 1 54 31 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:54:31 2026-02-21T09:47:41.6618012Z shl.b32 %r111, %r284, 14; 2026-02-21T09:47:41.6618067Z add.s32 %r112, %r33, %r111; 2026-02-21T09:47:41.6618221Z .loc 1 55 44 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:55:44 2026-02-21T09:47:41.6618278Z shl.b32 %r113, %r284, 11; 2026-02-21T09:47:41.6618340Z add.s32 %r114, %r33, %r113; 2026-02-21T09:47:41.6618398Z add.s32 %r115, %r114, 65536; 2026-02-21T09:47:41.6618548Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6618643Z bar.warp.sync -1; 2026-02-21T09:47:41.6618701Z // begin inline asm 2026-02-21T09:47:41.6618751Z 2026-02-21T09:47:41.6618805Z { 2026-02-21T09:47:41.6618862Z .reg .pred complete; 2026-02-21T09:47:41.6618915Z waitLoop: 2026-02-21T09:47:41.6619029Z mbarrier.try_wait.parity.shared.b64 complete, [%r97], %r283; 2026-02-21T09:47:41.6619097Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.6619145Z } 2026-02-21T09:47:41.6619148Z 2026-02-21T09:47:41.6619201Z // end inline asm 2026-02-21T09:47:41.6619362Z .loc 1 56 52 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:56:52 2026-02-21T09:47:41.6619426Z 
setp.eq.b32 %p31, %r282, 1920; 2026-02-21T09:47:41.6619487Z elect.sync %r116|%p22, -1; 2026-02-21T09:47:41.6619544Z bfe.u32 %r117, %r112, 4, 14; 2026-02-21T09:47:41.6619608Z cvt.u64.u32 %rd40, %r117; 2026-02-21T09:47:41.6619670Z or.b64 %rd30, %rd40, 4611686293372403712; 2026-02-21T09:47:41.6619726Z bfe.u32 %r118, %r115, 4, 14; 2026-02-21T09:47:41.6619792Z cvt.u64.u32 %rd41, %r118; 2026-02-21T09:47:41.6619855Z or.b64 %rd31, %rd41, 4611686293313683456; 2026-02-21T09:47:41.6619933Z // begin inline asm 2026-02-21T09:47:41.6620068Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd30, %rd31, %r69, %p13; 2026-02-21T09:47:41.6620122Z // end inline asm 2026-02-21T09:47:41.6620175Z add.s32 %r119, %r112, 32; 2026-02-21T09:47:41.6620229Z bfe.u32 %r120, %r119, 4, 14; 2026-02-21T09:47:41.6620292Z cvt.u64.u32 %rd42, %r120; 2026-02-21T09:47:41.6620354Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T09:47:41.6620410Z add.s32 %r121, %r114, 65568; 2026-02-21T09:47:41.6620470Z bfe.u32 %r122, %r121, 4, 14; 2026-02-21T09:47:41.6620527Z cvt.u64.u32 %rd43, %r122; 2026-02-21T09:47:41.6620587Z or.b64 %rd33, %rd43, 4611686293313683456; 2026-02-21T09:47:41.6620642Z // begin inline asm 2026-02-21T09:47:41.6620775Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd32, %rd33, %r69, %p13; 2026-02-21T09:47:41.6620830Z // end inline asm 2026-02-21T09:47:41.6620906Z add.s32 %r123, %r112, 64; 2026-02-21T09:47:41.6620971Z bfe.u32 %r124, %r123, 4, 14; 2026-02-21T09:47:41.6621028Z cvt.u64.u32 %rd44, %r124; 2026-02-21T09:47:41.6621091Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T09:47:41.6621153Z add.s32 %r125, %r114, 65600; 2026-02-21T09:47:41.6621208Z bfe.u32 %r126, %r125, 4, 14; 2026-02-21T09:47:41.6621265Z cvt.u64.u32 %rd45, %r126; 2026-02-21T09:47:41.6621327Z or.b64 %rd35, %rd45, 4611686293313683456; 2026-02-21T09:47:41.6621389Z // begin inline asm 2026-02-21T09:47:41.6621514Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd34, %rd35, %r69, %p13; 
2026-02-21T09:47:41.6621588Z // end inline asm 2026-02-21T09:47:41.6621650Z add.s32 %r127, %r112, 96; 2026-02-21T09:47:41.6621706Z bfe.u32 %r128, %r127, 4, 14; 2026-02-21T09:47:41.6621762Z cvt.u64.u32 %rd46, %r128; 2026-02-21T09:47:41.6621824Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T09:47:41.6621889Z add.s32 %r129, %r114, 65632; 2026-02-21T09:47:41.6621946Z bfe.u32 %r130, %r129, 4, 14; 2026-02-21T09:47:41.6622004Z cvt.u64.u32 %rd47, %r130; 2026-02-21T09:47:41.6622075Z or.b64 %rd37, %rd47, 4611686293313683456; 2026-02-21T09:47:41.6622130Z // begin inline asm 2026-02-21T09:47:41.6622251Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r68 + 0 ], %rd36, %rd37, %r69, %p13; 2026-02-21T09:47:41.6622310Z // end inline asm 2026-02-21T09:47:41.6622364Z cvt.u64.u32 %rd38, %r110; 2026-02-21T09:47:41.6622418Z // begin inline asm 2026-02-21T09:47:41.6622536Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd38]; 2026-02-21T09:47:41.6622597Z // end inline asm 2026-02-21T09:47:41.6622659Z and.pred %p30, %p31, %p22; 2026-02-21T09:47:41.6622712Z // begin inline asm 2026-02-21T09:47:41.6622837Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:47:41.6622891Z // end inline asm 2026-02-21T09:47:41.6623076Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6623143Z add.s32 %r132, %r284, 1; 2026-02-21T09:47:41.6623203Z setp.eq.b32 %p32, %r132, 4; 2026-02-21T09:47:41.6623266Z selp.b32 %r284, 0, %r132, %p32; 2026-02-21T09:47:41.6623323Z selp.b32 %r133, 1, 0, %p32; 2026-02-21T09:47:41.6623387Z xor.b32 %r283, %r283, %r133; 2026-02-21T09:47:41.6623551Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6623607Z add.s32 %r282, %r282, 64; 2026-02-21T09:47:41.6623675Z setp.lt.u32 %p33, %r282, 1984; 2026-02-21T09:47:41.6623730Z @%p33 bra $L__BB0_6; 2026-02-21T09:47:41.6623809Z // %bb.7: // %.loopexit 2026-02-21T09:47:41.6623905Z // in Loop: Header=BB0_2 
Depth=1 2026-02-21T09:47:41.6623960Z barrier.sync 1; 2026-02-21T09:47:41.6624035Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6624088Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.6624193Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6624356Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6624453Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6624554Z ld.shared.v2.b32 {%r50, %r54}, [global_smem+73736]; 2026-02-21T09:47:41.6624611Z barrier.sync 1; 2026-02-21T09:47:41.6624803Z .loc 1 21 67 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:21:67 2026-02-21T09:47:41.6624870Z mov.u32 %r37, %ctaid.x; 2026-02-21T09:47:41.6624929Z mov.u32 %r38, %ctaid.y; 2026-02-21T09:47:41.6624984Z mov.u32 %r39, %ctaid.z; 2026-02-21T09:47:41.6625043Z mov.u32 %r40, %nctaid.x; 2026-02-21T09:47:41.6625109Z mov.u32 %r41, %nctaid.y; 2026-02-21T09:47:41.6625170Z mad.lo.s32 %r42, %r39, %r41, %r38; 2026-02-21T09:47:41.6625229Z mad.lo.s32 %r43, %r42, %r40, %r37; 2026-02-21T09:47:41.6625292Z shl.b32 %r44, %r43, 8; 2026-02-21T09:47:41.6625479Z .loc 1 22 68 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:22:68 2026-02-21T09:47:41.6625537Z cvt.s64.s32 %rd7, %r44; 2026-02-21T09:47:41.6625595Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:47:41.6625660Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:47:41.6625720Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:47:41.6625880Z .loc 1 21 67 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:21:67 2026-02-21T09:47:41.6625949Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:47:41.6626007Z add.s32 %r13, %r1, -128; 2026-02-21T09:47:41.6626060Z mov.b32 %r286, 0; 2026-02-21T09:47:41.6626123Z mov.b32 %r285, -64; 2026-02-21T09:47:41.6626209Z mov.b32 %r287, %r286; 2026-02-21T09:47:41.6626306Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.6626397Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.6626569Z .loc 1 0 67 // 
co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0:67 2026-02-21T09:47:41.6626632Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:47:41.6626691Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:47:41.6626859Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6626914Z add.s32 %r285, %r285, 64; 2026-02-21T09:47:41.6626971Z shl.b32 %r56, %r287, 3; 2026-02-21T09:47:41.6627033Z add.s32 %r58, %r33, %r56; 2026-02-21T09:47:41.6627088Z add.s32 %r45, %r58, 73856; 2026-02-21T09:47:41.6627243Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6627298Z // begin inline asm 2026-02-21T09:47:41.6627357Z 2026-02-21T09:47:41.6627419Z { 2026-02-21T09:47:41.6627501Z .reg .pred complete; 2026-02-21T09:47:41.6627574Z waitLoop: 2026-02-21T09:47:41.6627702Z mbarrier.try_wait.parity.shared.b64 complete, [%r45], %r286; 2026-02-21T09:47:41.6627782Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.6627847Z } 2026-02-21T09:47:41.6627883Z 2026-02-21T09:47:41.6627967Z // end inline asm 2026-02-21T09:47:41.6628156Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6628226Z add.s32 %r51, %r58, 73888; 2026-02-21T09:47:41.6628408Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6628472Z bar.sync 3, 64; 2026-02-21T09:47:41.6628536Z // begin inline asm 2026-02-21T09:47:41.6628663Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r51], 18432; 2026-02-21T09:47:41.6628730Z // end inline asm 2026-02-21T09:47:41.6628918Z .loc 1 54 31 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:54:31 2026-02-21T09:47:41.6628989Z shl.b32 %r59, %r287, 14; 2026-02-21T09:47:41.6629065Z add.s32 %r48, %r33, %r59; 2026-02-21T09:47:41.6629242Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6629310Z bar.sync 3, 64; 2026-02-21T09:47:41.6629395Z elect.sync %r60|%p7, -1; 2026-02-21T09:47:41.6629466Z and.pred 
%p4, %p6, %p7; 2026-02-21T09:47:41.6629574Z // begin inline asm 2026-02-21T09:47:41.6629871Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r48], [%rd10, {%r285, %r50}], [%r51]; 2026-02-21T09:47:41.6629938Z // end inline asm 2026-02-21T09:47:41.6630130Z .loc 1 55 44 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:55:44 2026-02-21T09:47:41.6630198Z shl.b32 %r61, %r287, 11; 2026-02-21T09:47:41.6630262Z add.s32 %r62, %r33, %r61; 2026-02-21T09:47:41.6630329Z add.s32 %r52, %r62, 65536; 2026-02-21T09:47:41.6630487Z .loc 1 0 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:0 2026-02-21T09:47:41.6630552Z bar.sync 3, 64; 2026-02-21T09:47:41.6630613Z elect.sync %r63|%p8, -1; 2026-02-21T09:47:41.6630674Z and.pred %p5, %p6, %p8; 2026-02-21T09:47:41.6630739Z // begin inline asm 2026-02-21T09:47:41.6630996Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r52], [%rd11, {%r285, %r54}], [%r51]; 2026-02-21T09:47:41.6631054Z // end inline asm 2026-02-21T09:47:41.6631109Z add.s32 %r64, %r287, 1; 2026-02-21T09:47:41.6631175Z setp.eq.b32 %p9, %r64, 4; 2026-02-21T09:47:41.6631237Z selp.b32 %r287, 0, %r64, %p9; 2026-02-21T09:47:41.6631295Z selp.b32 %r65, 1, 0, %p9; 2026-02-21T09:47:41.6631358Z xor.b32 %r286, %r286, %r65; 2026-02-21T09:47:41.6631524Z .loc 1 50 57 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:50:57 2026-02-21T09:47:41.6631585Z setp.lt.u32 %p10, %r285, 1984; 2026-02-21T09:47:41.6631669Z @%p10 bra $L__BB0_9; 2026-02-21T09:47:41.6631765Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6631820Z barrier.sync 1; 2026-02-21T09:47:41.6631894Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.6631954Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.6632051Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.6632208Z .loc 1 19 0 // co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py:19 2026-02-21T09:47:41.6632272Z barrier.sync 1; 
2026-02-21T09:47:41.6632327Z barrier.sync 1; 2026-02-21T09:47:41.6632382Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.6632434Z $L__tmp1: 2026-02-21T09:47:41.6632492Z $L__func_end0: 2026-02-21T09:47:41.6632571Z // -- End function 2026-02-21T09:47:41.6632619Z } 2026-02-21T09:47:41.6632830Z .file 1 "/tmp/torchinductor_root/o4/co4jo3bbdayb3grh334gez575glccdtgvqlxrv6nektzaj34sayd.py" 2026-02-21T09:47:41.6632891Z .section .debug_abbrev 2026-02-21T09:47:41.6632940Z { 2026-02-21T09:47:41.6633033Z .b8 1 // Abbreviation Code 2026-02-21T09:47:41.6633118Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:41.6633197Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:41.6633295Z .b8 37 // DW_AT_producer 2026-02-21T09:47:41.6633376Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.6633448Z .b8 19 // DW_AT_language 2026-02-21T09:47:41.6633521Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:41.6633600Z .b8 3 // DW_AT_name 2026-02-21T09:47:41.6633670Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.6633745Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:41.6633825Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:41.6633898Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:41.6633967Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.6634034Z .b8 0 // EOM(1) 2026-02-21T09:47:41.6634107Z .b8 0 // EOM(2) 2026-02-21T09:47:41.6634174Z .b8 0 // EOM(3) 2026-02-21T09:47:41.6634223Z } 2026-02-21T09:47:41.6634313Z .section .debug_info 2026-02-21T09:47:41.6634363Z { 2026-02-21T09:47:41.6634444Z .b32 104 // Length of Unit 2026-02-21T09:47:41.6634533Z .b8 2 // DWARF version number 2026-02-21T09:47:41.6634583Z .b8 0 2026-02-21T09:47:41.6634722Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:41.6634809Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:41.6634913Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:41.6634991Z .b8 116 // DW_AT_producer 2026-02-21T09:47:41.6635042Z .b8 114 2026-02-21T09:47:41.6635099Z .b8 105 2026-02-21T09:47:41.6635149Z .b8 116 2026-02-21T09:47:41.6635197Z .b8 111 2026-02-21T09:47:41.6635246Z .b8 110 2026-02-21T09:47:41.6635304Z .b8 0 2026-02-21T09:47:41.6635412Z .b8 2 // DW_AT_language 2026-02-21T09:47:41.6635464Z .b8 0 2026-02-21T09:47:41.6635545Z .b8 99 // DW_AT_name 2026-02-21T09:47:41.6635595Z .b8 111 2026-02-21T09:47:41.6635644Z .b8 52 2026-02-21T09:47:41.6635694Z .b8 106 2026-02-21T09:47:41.6635752Z .b8 111 2026-02-21T09:47:41.6635804Z .b8 51 2026-02-21T09:47:41.6635852Z .b8 98 2026-02-21T09:47:41.6635908Z .b8 98 2026-02-21T09:47:41.6635956Z .b8 100 2026-02-21T09:47:41.6636005Z .b8 97 2026-02-21T09:47:41.6636053Z .b8 121 2026-02-21T09:47:41.6636108Z .b8 98 2026-02-21T09:47:41.6636156Z .b8 51 2026-02-21T09:47:41.6636205Z .b8 103 2026-02-21T09:47:41.6636288Z .b8 114 2026-02-21T09:47:41.6636338Z .b8 104 2026-02-21T09:47:41.6636386Z .b8 51 2026-02-21T09:47:41.6636437Z .b8 51 2026-02-21T09:47:41.6636493Z .b8 52 2026-02-21T09:47:41.6636543Z .b8 103 2026-02-21T09:47:41.6636592Z .b8 101 2026-02-21T09:47:41.6636641Z .b8 122 2026-02-21T09:47:41.6636699Z .b8 53 2026-02-21T09:47:41.6636751Z .b8 55 2026-02-21T09:47:41.6636803Z .b8 53 2026-02-21T09:47:41.6636859Z .b8 103 2026-02-21T09:47:41.6636908Z .b8 108 2026-02-21T09:47:41.6636957Z .b8 99 2026-02-21T09:47:41.6637005Z .b8 99 2026-02-21T09:47:41.6637061Z .b8 100 2026-02-21T09:47:41.6637111Z .b8 116 2026-02-21T09:47:41.6637159Z .b8 103 2026-02-21T09:47:41.6637214Z .b8 118 2026-02-21T09:47:41.6637266Z .b8 113 2026-02-21T09:47:41.6637315Z .b8 108 2026-02-21T09:47:41.6637368Z .b8 120 2026-02-21T09:47:41.6637427Z .b8 114 2026-02-21T09:47:41.6637478Z .b8 118 2026-02-21T09:47:41.6637532Z .b8 54 2026-02-21T09:47:41.6647566Z .b8 110 
2026-02-21T09:47:41.6647644Z .b8 101 2026-02-21T09:47:41.6647702Z .b8 107 2026-02-21T09:47:41.6647763Z .b8 116 2026-02-21T09:47:41.6647829Z .b8 122 2026-02-21T09:47:41.6647884Z .b8 97 2026-02-21T09:47:41.6647938Z .b8 106 2026-02-21T09:47:41.6648001Z .b8 51 2026-02-21T09:47:41.6648056Z .b8 52 2026-02-21T09:47:41.6648109Z .b8 115 2026-02-21T09:47:41.6648165Z .b8 97 2026-02-21T09:47:41.6648230Z .b8 121 2026-02-21T09:47:41.6648369Z .b8 100 2026-02-21T09:47:41.6648430Z .b8 46 2026-02-21T09:47:41.6648496Z .b8 112 2026-02-21T09:47:41.6648550Z .b8 121 2026-02-21T09:47:41.6648609Z .b8 0 2026-02-21T09:47:41.6648730Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:41.6648835Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:41.6648890Z .b8 116 2026-02-21T09:47:41.6648944Z .b8 109 2026-02-21T09:47:41.6649007Z .b8 112 2026-02-21T09:47:41.6649063Z .b8 47 2026-02-21T09:47:41.6649115Z .b8 116 2026-02-21T09:47:41.6649168Z .b8 111 2026-02-21T09:47:41.6649233Z .b8 114 2026-02-21T09:47:41.6649285Z .b8 99 2026-02-21T09:47:41.6649339Z .b8 104 2026-02-21T09:47:41.6649404Z .b8 105 2026-02-21T09:47:41.6649456Z .b8 110 2026-02-21T09:47:41.6649509Z .b8 100 2026-02-21T09:47:41.6649562Z .b8 117 2026-02-21T09:47:41.6649623Z .b8 99 2026-02-21T09:47:41.6649676Z .b8 116 2026-02-21T09:47:41.6649727Z .b8 111 2026-02-21T09:47:41.6649780Z .b8 114 2026-02-21T09:47:41.6649840Z .b8 95 2026-02-21T09:47:41.6649893Z .b8 114 2026-02-21T09:47:41.6649950Z .b8 111 2026-02-21T09:47:41.6650011Z .b8 111 2026-02-21T09:47:41.6650064Z .b8 116 2026-02-21T09:47:41.6650158Z .b8 47 2026-02-21T09:47:41.6650209Z .b8 111 2026-02-21T09:47:41.6650270Z .b8 52 2026-02-21T09:47:41.6650323Z .b8 0 2026-02-21T09:47:41.6650377Z } 2026-02-21T09:47:41.6650460Z .section .debug_macinfo { } 2026-02-21T09:47:41.6650466Z 2026-02-21T09:47:41.6650555Z ================================================================ 2026-02-21T09:47:41.6650670Z please share the reproducer above with Triton project. 
2026-02-21T09:47:41.8945690Z [78s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:41.8945717Z 2026-02-21T09:47:41.8945778Z 2026-02-21T09:47:41.8951385Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 32, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:47:41.8951411Z 2026-02-21T09:47:41.8951770Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:41.8951908Z `ptxas` stderr: 2026-02-21T09:47:41.8951999Z ================================================================ 2026-02-21T09:47:41.8952080Z Internal Triton PTX codegen error 2026-02-21T09:47:41.8952139Z `ptxas` stderr: 2026-02-21T09:47:41.8952518Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 307 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:41.8952679Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:41.8952684Z 2026-02-21T09:47:41.8953132Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm05zurcr.ptx -o /tmp/tmpm05zurcr.ptx.o 2026-02-21T09:47:41.8953139Z 2026-02-21T09:47:41.8953142Z 2026-02-21T09:47:41.8953206Z // 2026-02-21T09:47:41.8953290Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:41.8953359Z // 2026-02-21T09:47:41.8953363Z 2026-02-21T09:47:41.8953427Z .version 8.7 2026-02-21T09:47:41.8953496Z .target sm_100a 2026-02-21T09:47:41.8953559Z .address_size 64 2026-02-21T09:47:41.8953563Z 2026-02-21T09:47:41.8953734Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:41.8953821Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:41.8953919Z // @_helion_matmul 2026-02-21T09:47:41.8954019Z .visible .entry _helion_matmul( 2026-02-21T09:47:41.8954133Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:41.8954237Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:41.8954401Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:41.8954502Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:41.8954606Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:41.8957093Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 307 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:41.8957148Z ) 2026-02-21T09:47:41.8957206Z .reqntid 256 2026-02-21T09:47:41.8957263Z .maxnreg 32 2026-02-21T09:47:41.8957328Z { 2026-02-21T09:47:41.8957395Z .reg .pred %p<120>; 2026-02-21T09:47:41.8957456Z .reg .b16 %rs<11>; 2026-02-21T09:47:41.8957526Z .reg .b32 %r<342>; 2026-02-21T09:47:41.8957587Z .reg .b64 %rd<172>; 2026-02-21T09:47:41.8957780Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19:0 2026-02-21T09:47:41.8957845Z $L__func_begin0: 2026-02-21T09:47:41.8958021Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19:0 2026-02-21T09:47:41.8958025Z 2026-02-21T09:47:41.8958078Z // %bb.0: 2026-02-21T09:47:41.8958224Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:47:41.8958288Z $L__tmp0: 2026-02-21T09:47:41.8958457Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19 2026-02-21T09:47:41.8958516Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:41.8958583Z shr.u32 %r2, %r1, 5; 2026-02-21T09:47:41.8958657Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:47:41.8958724Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:47:41.8958792Z @%p1 bra $L__BB0_12; 2026-02-21T09:47:41.8958850Z bra.uni $L__BB0_1; 2026-02-21T09:47:41.8958907Z $L__BB0_12: 2026-02-21T09:47:41.8959081Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0:0 2026-02-21T09:47:41.8959174Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:47:41.8959254Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:47:41.8959377Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:47:41.8959560Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19 2026-02-21T09:47:41.8959646Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.8959711Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:47:41.8959775Z mov.b32 %r132, global_smem; 2026-02-21T09:47:41.8959840Z // begin inline asm 2026-02-21T09:47:41.8959993Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r132], 32; 
2026-02-21T09:47:41.8960050Z // end inline asm 2026-02-21T09:47:41.8960116Z bar.sync 0, 128; 2026-02-21T09:47:41.8960183Z ld.shared.b32 %r334, [global_smem]; 2026-02-21T09:47:41.8960268Z bar.sync 0, 128; 2026-02-21T09:47:41.8960336Z // begin inline asm 2026-02-21T09:47:41.8960458Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:41.8960513Z // end inline asm 2026-02-21T09:47:41.8960687Z .loc 1 21 67 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:21:67 2026-02-21T09:47:41.8960755Z mov.u32 %r157, %ctaid.x; 2026-02-21T09:47:41.8960815Z mov.u32 %r158, %ctaid.y; 2026-02-21T09:47:41.8960871Z mov.u32 %r159, %ctaid.z; 2026-02-21T09:47:41.8960939Z mov.u32 %r160, %nctaid.x; 2026-02-21T09:47:41.8960999Z mov.u32 %r161, %nctaid.y; 2026-02-21T09:47:41.8961065Z mad.lo.s32 %r162, %r159, %r161, %r158; 2026-02-21T09:47:41.8961137Z mad.lo.s32 %r163, %r162, %r160, %r157; 2026-02-21T09:47:41.8961199Z mul.lo.s32 %r164, %r163, 384; 2026-02-21T09:47:41.8961260Z cvt.s64.s32 %rd104, %r164; 2026-02-21T09:47:41.8961321Z add.s64 %rd65, %rd7, %rd104; 2026-02-21T09:47:41.8961389Z shl.b32 %r165, %r1, 2; 2026-02-21T09:47:41.8961446Z add.s32 %r133, %r132, %r165; 2026-02-21T09:47:41.8961500Z mov.b32 %r142, 0; 2026-02-21T09:47:41.8961564Z // begin inline asm 2026-02-21T09:47:41.8961634Z @%p34 st.shared.b32 [ %r133 + 0 ], %r142; 2026-02-21T09:47:41.8961692Z // end inline asm 2026-02-21T09:47:41.8961779Z bar.warp.sync -1; 2026-02-21T09:47:41.8961853Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:47:41.8961912Z cvt.u64.u32 %rd50, %r132; 2026-02-21T09:47:41.8961969Z // begin inline asm 2026-02-21T09:47:41.8962151Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd4; 2026-02-21T09:47:41.8962209Z // end inline asm 2026-02-21T09:47:41.8962264Z // begin inline asm 2026-02-21T09:47:41.8962409Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:41.8962471Z // end inline asm 
2026-02-21T09:47:41.8962524Z mov.b32 %r135, 64; 2026-02-21T09:47:41.8962580Z // begin inline asm 2026-02-21T09:47:41.8962749Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r135; 2026-02-21T09:47:41.8962804Z // end inline asm 2026-02-21T09:47:41.8962860Z mov.b32 %r136, 128; 2026-02-21T09:47:41.8962924Z // begin inline asm 2026-02-21T09:47:41.8963074Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:47:41.8963130Z // end inline asm 2026-02-21T09:47:41.8963185Z mov.b32 %r137, 2048; 2026-02-21T09:47:41.8963274Z // begin inline asm 2026-02-21T09:47:41.8963437Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.8963493Z // end inline asm 2026-02-21T09:47:41.8963558Z // begin inline asm 2026-02-21T09:47:41.8963720Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r137; 2026-02-21T09:47:41.8963774Z // end inline asm 2026-02-21T09:47:41.8963840Z mov.b64 %rd58, 4096; 2026-02-21T09:47:41.8963896Z // begin inline asm 2026-02-21T09:47:41.8964064Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:47:41.8964130Z // end inline asm 2026-02-21T09:47:41.8964185Z mov.b32 %r139, 1; 2026-02-21T09:47:41.8964240Z // begin inline asm 2026-02-21T09:47:41.8964416Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.8964510Z // end inline asm 2026-02-21T09:47:41.8964568Z // begin inline asm 2026-02-21T09:47:41.8964785Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r139; 2026-02-21T09:47:41.8964855Z // end inline asm 2026-02-21T09:47:41.8964914Z // begin inline asm 2026-02-21T09:47:41.8965069Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:41.8965138Z // end inline asm 2026-02-21T09:47:41.8965196Z // begin inline 
asm 2026-02-21T09:47:41.8965367Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8965452Z // end inline asm 2026-02-21T09:47:41.8965524Z // begin inline asm 2026-02-21T09:47:41.8965679Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:47:41.8965735Z // end inline asm 2026-02-21T09:47:41.8965802Z // begin inline asm 2026-02-21T09:47:41.8965951Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8966005Z // end inline asm 2026-02-21T09:47:41.8966071Z // begin inline asm 2026-02-21T09:47:41.8966331Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd65 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:41.8966386Z // end inline asm 2026-02-21T09:47:41.8966442Z // begin inline asm 2026-02-21T09:47:41.8966578Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd65 + 0 ], 0x80; 2026-02-21T09:47:41.8966650Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.8966725Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.8966790Z // end inline asm 2026-02-21T09:47:41.8966846Z bar.sync 0, 128; 2026-02-21T09:47:41.8967021Z .loc 1 22 68 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:22:68 2026-02-21T09:47:41.8967091Z add.s32 %r166, %r164, 128; 2026-02-21T09:47:41.8967151Z cvt.s64.s32 %rd105, %r166; 2026-02-21T09:47:41.8967236Z add.s64 %rd83, %rd7, %rd105; 2026-02-21T09:47:41.8967295Z bar.sync 0, 128; 2026-02-21T09:47:41.8967360Z // begin inline asm 2026-02-21T09:47:41.8967431Z @%p34 st.shared.b32 [ %r133 + 0 ], %r142; 2026-02-21T09:47:41.8967485Z // end inline asm 2026-02-21T09:47:41.8967550Z bar.warp.sync -1; 2026-02-21T09:47:41.8967607Z // begin inline asm 2026-02-21T09:47:41.8967762Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd5; 2026-02-21T09:47:41.8967825Z // end inline asm 2026-02-21T09:47:41.8967882Z // begin 
inline asm 2026-02-21T09:47:41.8968020Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:41.8968076Z // end inline asm 2026-02-21T09:47:41.8968138Z // begin inline asm 2026-02-21T09:47:41.8968285Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r135; 2026-02-21T09:47:41.8968338Z // end inline asm 2026-02-21T09:47:41.8968400Z mov.b32 %r144, 32; 2026-02-21T09:47:41.8968455Z // begin inline asm 2026-02-21T09:47:41.8968606Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r144; 2026-02-21T09:47:41.8968694Z // end inline asm 2026-02-21T09:47:41.8968748Z // begin inline asm 2026-02-21T09:47:41.8968906Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r137; 2026-02-21T09:47:41.8968960Z // end inline asm 2026-02-21T09:47:41.8969024Z mov.b32 %r146, 12288; 2026-02-21T09:47:41.8969079Z // begin inline asm 2026-02-21T09:47:41.8969234Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r146; 2026-02-21T09:47:41.8969296Z // end inline asm 2026-02-21T09:47:41.8969353Z // begin inline asm 2026-02-21T09:47:41.8969518Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:47:41.8969579Z // end inline asm 2026-02-21T09:47:41.8969632Z // begin inline asm 2026-02-21T09:47:41.8969804Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.8969886Z // end inline asm 2026-02-21T09:47:41.8969950Z // begin inline asm 2026-02-21T09:47:41.8970119Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r139; 2026-02-21T09:47:41.8970173Z // end inline asm 2026-02-21T09:47:41.8970236Z // begin inline asm 2026-02-21T09:47:41.8970382Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:41.8970436Z // end inline asm 2026-02-21T09:47:41.8970500Z 
// begin inline asm 2026-02-21T09:47:41.8970667Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8970745Z // end inline asm 2026-02-21T09:47:41.8970799Z // begin inline asm 2026-02-21T09:47:41.8970957Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:47:41.8971010Z // end inline asm 2026-02-21T09:47:41.8971065Z // begin inline asm 2026-02-21T09:47:41.8971218Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8971273Z // end inline asm 2026-02-21T09:47:41.8971327Z // begin inline asm 2026-02-21T09:47:41.8971592Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd83 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:41.8971647Z // end inline asm 2026-02-21T09:47:41.8971702Z // begin inline asm 2026-02-21T09:47:41.8971834Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd83 + 0 ], 0x80; 2026-02-21T09:47:41.8971903Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.8971976Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.8972033Z // end inline asm 2026-02-21T09:47:41.8972097Z bar.sync 0, 128; 2026-02-21T09:47:41.8972265Z .loc 1 24 73 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:24:73 2026-02-21T09:47:41.8972326Z add.s32 %r167, %r164, 256; 2026-02-21T09:47:41.8972394Z cvt.s64.s32 %rd106, %r167; 2026-02-21T09:47:41.8972494Z add.s64 %rd101, %rd7, %rd106; 2026-02-21T09:47:41.8972555Z bar.sync 0, 128; 2026-02-21T09:47:41.8972620Z // begin inline asm 2026-02-21T09:47:41.8972692Z @%p34 st.shared.b32 [ %r133 + 0 ], %r142; 2026-02-21T09:47:41.8972747Z // end inline asm 2026-02-21T09:47:41.8972805Z bar.warp.sync -1; 2026-02-21T09:47:41.8972872Z // begin inline asm 2026-02-21T09:47:41.8973028Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd6; 2026-02-21T09:47:41.8973084Z // end inline asm 
2026-02-21T09:47:41.8973147Z // begin inline asm 2026-02-21T09:47:41.8973287Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:41.8973344Z // end inline asm 2026-02-21T09:47:41.8973399Z // begin inline asm 2026-02-21T09:47:41.8973557Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r144; 2026-02-21T09:47:41.8973611Z // end inline asm 2026-02-21T09:47:41.8973666Z // begin inline asm 2026-02-21T09:47:41.8973823Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:47:41.8973877Z // end inline asm 2026-02-21T09:47:41.8973955Z // begin inline asm 2026-02-21T09:47:41.8974120Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r146; 2026-02-21T09:47:41.8974175Z // end inline asm 2026-02-21T09:47:41.8974229Z // begin inline asm 2026-02-21T09:47:41.8974396Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r137; 2026-02-21T09:47:41.8974449Z // end inline asm 2026-02-21T09:47:41.8974507Z mov.b64 %rd94, 24576; 2026-02-21T09:47:41.8974561Z // begin inline asm 2026-02-21T09:47:41.8974767Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd94; 2026-02-21T09:47:41.8974823Z // end inline asm 2026-02-21T09:47:41.8974876Z // begin inline asm 2026-02-21T09:47:41.8975052Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r139; 2026-02-21T09:47:41.8975109Z // end inline asm 2026-02-21T09:47:41.8975188Z // begin inline asm 2026-02-21T09:47:41.8975370Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r139; 2026-02-21T09:47:41.8975427Z // end inline asm 2026-02-21T09:47:41.8975482Z // begin inline asm 2026-02-21T09:47:41.8975633Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:41.8975697Z // end inline asm 2026-02-21T09:47:41.8975752Z // begin 
inline asm 2026-02-21T09:47:41.8975916Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8975981Z // end inline asm 2026-02-21T09:47:41.8976063Z // begin inline asm 2026-02-21T09:47:41.8976214Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x2; 2026-02-21T09:47:41.8976280Z // end inline asm 2026-02-21T09:47:41.8976335Z // begin inline asm 2026-02-21T09:47:41.8976481Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:41.8976546Z // end inline asm 2026-02-21T09:47:41.8976599Z // begin inline asm 2026-02-21T09:47:41.8976869Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:41.8976922Z // end inline asm 2026-02-21T09:47:41.8976986Z // begin inline asm 2026-02-21T09:47:41.8977117Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T09:47:41.8977185Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:41.8977265Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:41.8977320Z // end inline asm 2026-02-21T09:47:41.8977376Z bar.sync 0, 128; 2026-02-21T09:47:41.8977445Z cvta.global.u64 %rd107, %rd101; 2026-02-21T09:47:41.8977622Z .loc 1 31 35 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:31:35 2026-02-21T09:47:41.8977683Z mul.lo.s32 %r341, %r157, 3; 2026-02-21T09:47:41.8977893Z .loc 1 32 37 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:32:37 2026-02-21T09:47:41.8977953Z add.s32 %r168, %r341, 3; 2026-02-21T09:47:41.8978121Z .loc 1 32 49 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:32:49 2026-02-21T09:47:41.8978188Z min.s32 %r22, %r168, 6144; 2026-02-21T09:47:41.8978355Z .loc 1 33 84 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:33:84 2026-02-21T09:47:41.8978420Z setp.ge.s32 %p90, %r341, %r22; 2026-02-21T09:47:41.8978485Z @%p90 bra $L__BB0_15; 
2026-02-21T09:47:41.8978563Z // %bb.13: // %.lr.ph 2026-02-21T09:47:41.8978732Z .loc 1 0 84 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0:84 2026-02-21T09:47:41.8978792Z shl.b32 %r169, %r1, 6; 2026-02-21T09:47:41.8978861Z and.b32 %r170, %r169, 8128; 2026-02-21T09:47:41.8978921Z shl.b32 %r171, %r1, 3; 2026-02-21T09:47:41.8978979Z and.b32 %r172, %r171, 48; 2026-02-21T09:47:41.8979046Z or.b32 %r173, %r170, %r172; 2026-02-21T09:47:41.8979105Z add.s32 %r175, %r132, 81920; 2026-02-21T09:47:41.8979161Z add.s32 %r23, %r175, %r173; 2026-02-21T09:47:41.8979250Z xor.b32 %r176, %r173, 16; 2026-02-21T09:47:41.8979305Z add.s32 %r24, %r175, %r176; 2026-02-21T09:47:41.8979359Z xor.b32 %r177, %r173, 32; 2026-02-21T09:47:41.8979414Z add.s32 %r25, %r175, %r177; 2026-02-21T09:47:41.8979474Z xor.b32 %r178, %r173, 48; 2026-02-21T09:47:41.8979529Z add.s32 %r26, %r175, %r178; 2026-02-21T09:47:41.8979632Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:41.8979806Z .loc 1 39 35 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:39:35 2026-02-21T09:47:41.8979862Z shr.s32 %r274, %r341, 31; 2026-02-21T09:47:41.8979915Z shr.u32 %r275, %r274, 26; 2026-02-21T09:47:41.8979971Z add.s32 %r276, %r341, %r275; 2026-02-21T09:47:41.8980144Z .loc 1 42 64 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:42:64 2026-02-21T09:47:41.8980222Z and.b32 %r277, %r276, 65472; 2026-02-21T09:47:41.8980281Z sub.s32 %r278, %r341, %r277; 2026-02-21T09:47:41.8980348Z cvt.u16.u32 %rs1, %r278; 2026-02-21T09:47:41.8980407Z cvt.s8.s32 %rs2, %r278; 2026-02-21T09:47:41.8980577Z .loc 1 43 51 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:43:51 2026-02-21T09:47:41.8980641Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:41.8980697Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:41.8980754Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:41.8980812Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:41.8980874Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:41.8981065Z .loc 1 42 64 // 
cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:42:64 2026-02-21T09:47:41.8981121Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:47:41.8981185Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:41.8981355Z .loc 1 44 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:44:27 2026-02-21T09:47:41.8981414Z shl.b32 %r279, %r276, 1; 2026-02-21T09:47:41.8981479Z and.b32 %r280, %r279, -128; 2026-02-21T09:47:41.8981541Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:41.8981610Z mad.wide.s16 %r271, %rs10, 32, %r280; 2026-02-21T09:47:41.8981778Z .loc 1 45 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:45:27 2026-02-21T09:47:41.8981849Z mul.wide.s16 %r272, %rs7, 128; 2026-02-21T09:47:41.8982017Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8982088Z shfl.sync.idx.b32 %r281, %r2, 0, 31, -1; 2026-02-21T09:47:41.8982155Z shl.b32 %r282, %r281, 21; 2026-02-21T09:47:41.8982215Z and.b32 %r283, %r282, 6291456; 2026-02-21T09:47:41.8982273Z add.s32 %r179, %r283, %r334; 2026-02-21T09:47:41.8982340Z mov.pred %p91, -1; 2026-02-21T09:47:41.8982395Z mov.b32 %r180, 0; 2026-02-21T09:47:41.8982451Z // begin inline asm 2026-02-21T09:47:41.8982756Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r179 + 0], {%r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180}; 2026-02-21T09:47:41.8982821Z // end inline asm 2026-02-21T09:47:41.8982875Z // begin inline asm 2026-02-21T09:47:41.8983147Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r179 + 16], {%r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180, %r180}; 2026-02-21T09:47:41.8983211Z // end inline asm 2026-02-21T09:47:41.8983264Z // begin inline asm 2026-02-21T09:47:41.8983332Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:41.8983392Z // end inline asm 2026-02-21T09:47:41.8983444Z bar.sync 0, 128; 2026-02-21T09:47:41.8983621Z .loc 1 50 79 // 
cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.8983687Z add.s32 %r213, %r132, 90240; 2026-02-21T09:47:41.8983741Z // begin inline asm 2026-02-21T09:47:41.8983823Z @%p37 mbarrier.init.shared::cta.b64 [%r213], 1; 2026-02-21T09:47:41.8983887Z // end inline asm 2026-02-21T09:47:41.8983940Z bar.sync 0, 128; 2026-02-21T09:47:41.8983995Z add.s32 %r214, %r132, 90248; 2026-02-21T09:47:41.8984071Z // begin inline asm 2026-02-21T09:47:41.8984161Z @%p37 mbarrier.init.shared::cta.b64 [%r214], 1; 2026-02-21T09:47:41.8984215Z // end inline asm 2026-02-21T09:47:41.8984268Z bar.sync 0, 128; 2026-02-21T09:47:41.8984331Z add.s32 %r215, %r132, 90256; 2026-02-21T09:47:41.8984384Z // begin inline asm 2026-02-21T09:47:41.8984461Z @%p37 mbarrier.init.shared::cta.b64 [%r215], 1; 2026-02-21T09:47:41.8984521Z // end inline asm 2026-02-21T09:47:41.8984573Z bar.sync 0, 128; 2026-02-21T09:47:41.8984630Z add.s32 %r216, %r132, 90264; 2026-02-21T09:47:41.8984717Z // begin inline asm 2026-02-21T09:47:41.8984804Z @%p37 mbarrier.init.shared::cta.b64 [%r216], 1; 2026-02-21T09:47:41.8984856Z // end inline asm 2026-02-21T09:47:41.8984912Z add.s32 %r217, %r132, 90272; 2026-02-21T09:47:41.8984973Z // begin inline asm 2026-02-21T09:47:41.8985073Z @%p37 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:47:41.8985129Z // end inline asm 2026-02-21T09:47:41.8985183Z bar.sync 0, 128; 2026-02-21T09:47:41.8985246Z add.s32 %r218, %r132, 90280; 2026-02-21T09:47:41.8985302Z // begin inline asm 2026-02-21T09:47:41.8985378Z @%p37 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:47:41.8985437Z // end inline asm 2026-02-21T09:47:41.8985490Z bar.sync 0, 128; 2026-02-21T09:47:41.8985545Z add.s32 %r219, %r132, 90288; 2026-02-21T09:47:41.8985599Z // begin inline asm 2026-02-21T09:47:41.8985682Z @%p37 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T09:47:41.8985738Z // end inline asm 2026-02-21T09:47:41.8985816Z bar.sync 0, 128; 2026-02-21T09:47:41.8985880Z add.s32 %r220, 
%r132, 90296; 2026-02-21T09:47:41.8985934Z // begin inline asm 2026-02-21T09:47:41.8986010Z @%p37 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T09:47:41.8986069Z // end inline asm 2026-02-21T09:47:41.8986235Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.8986289Z bar.sync 0, 128; 2026-02-21T09:47:41.8986343Z // begin inline asm 2026-02-21T09:47:41.8986437Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r213]; 2026-02-21T09:47:41.8986490Z // end inline asm 2026-02-21T09:47:41.8986542Z bar.sync 0, 128; 2026-02-21T09:47:41.8986601Z // begin inline asm 2026-02-21T09:47:41.8986684Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r214]; 2026-02-21T09:47:41.8986736Z // end inline asm 2026-02-21T09:47:41.8986787Z bar.sync 0, 128; 2026-02-21T09:47:41.8986847Z // begin inline asm 2026-02-21T09:47:41.8986928Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r215]; 2026-02-21T09:47:41.8986982Z // end inline asm 2026-02-21T09:47:41.8987039Z bar.sync 0, 128; 2026-02-21T09:47:41.8987092Z // begin inline asm 2026-02-21T09:47:41.8987172Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r216]; 2026-02-21T09:47:41.8987231Z // end inline asm 2026-02-21T09:47:41.8987424Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.8987480Z bar.sync 0, 128; 2026-02-21T09:47:41.8987537Z add.s32 %r225, %r132, 90304; 2026-02-21T09:47:41.8987597Z // begin inline asm 2026-02-21T09:47:41.8987672Z @%p37 mbarrier.init.shared::cta.b64 [%r225], 1; 2026-02-21T09:47:41.8987724Z // end inline asm 2026-02-21T09:47:41.8987807Z st.shared.b32 [global_smem+90312], 33554689; 2026-02-21T09:47:41.8987878Z st.shared.b32 [global_smem+90112], %r334; 2026-02-21T09:47:41.8987969Z st.shared.v2.b32 [global_smem+90120], {%r272, %r271}; 2026-02-21T09:47:41.8988024Z barrier.sync 1; 2026-02-21T09:47:41.8988108Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.8988162Z barrier.sync 1; 2026-02-21T09:47:41.8988215Z barrier.sync 1; 
2026-02-21T09:47:41.8988297Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:41.8988466Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8988521Z bar.sync 0, 128; 2026-02-21T09:47:41.8988584Z // begin inline asm 2026-02-21T09:47:41.8988633Z 2026-02-21T09:47:41.8988705Z { 2026-02-21T09:47:41.8988766Z .reg .pred complete; 2026-02-21T09:47:41.8988830Z waitLoop: 2026-02-21T09:47:41.8988947Z mbarrier.try_wait.parity.shared.b64 complete, [%r225], %r180; 2026-02-21T09:47:41.8989013Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.8989070Z } 2026-02-21T09:47:41.8989074Z 2026-02-21T09:47:41.8989128Z // end inline asm 2026-02-21T09:47:41.8989295Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.8989349Z bar.sync 0, 128; 2026-02-21T09:47:41.8989412Z // begin inline asm 2026-02-21T09:47:41.8989490Z @%p37 mbarrier.inval.shared::cta.b64 [%r225]; 2026-02-21T09:47:41.8989542Z // end inline asm 2026-02-21T09:47:41.8989604Z // begin inline asm 2026-02-21T09:47:41.8989681Z @%p37 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:47:41.8989733Z // end inline asm 2026-02-21T09:47:41.8989810Z bar.sync 0, 128; 2026-02-21T09:47:41.8989865Z // begin inline asm 2026-02-21T09:47:41.8989941Z @%p37 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:47:41.8989995Z // end inline asm 2026-02-21T09:47:41.8990058Z bar.sync 0, 128; 2026-02-21T09:47:41.8990112Z // begin inline asm 2026-02-21T09:47:41.8990187Z @%p37 mbarrier.inval.shared::cta.b64 [%r219]; 2026-02-21T09:47:41.8990250Z // end inline asm 2026-02-21T09:47:41.8990303Z bar.sync 0, 128; 2026-02-21T09:47:41.8990361Z // begin inline asm 2026-02-21T09:47:41.8990436Z @%p37 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T09:47:41.8990497Z // end inline asm 2026-02-21T09:47:41.8990585Z // begin inline asm 2026-02-21T09:47:41.8990662Z @%p37 mbarrier.inval.shared::cta.b64 [%r213]; 2026-02-21T09:47:41.8990726Z // end inline asm 
2026-02-21T09:47:41.8990782Z bar.sync 0, 128; 2026-02-21T09:47:41.8990837Z // begin inline asm 2026-02-21T09:47:41.8990914Z @%p37 mbarrier.inval.shared::cta.b64 [%r214]; 2026-02-21T09:47:41.8990979Z // end inline asm 2026-02-21T09:47:41.8991036Z bar.sync 0, 128; 2026-02-21T09:47:41.8991091Z // begin inline asm 2026-02-21T09:47:41.8991180Z @%p37 mbarrier.inval.shared::cta.b64 [%r215]; 2026-02-21T09:47:41.8991233Z // end inline asm 2026-02-21T09:47:41.8991288Z bar.sync 0, 128; 2026-02-21T09:47:41.8991350Z // begin inline asm 2026-02-21T09:47:41.8991427Z @%p37 mbarrier.inval.shared::cta.b64 [%r216]; 2026-02-21T09:47:41.8991482Z // end inline asm 2026-02-21T09:47:41.8991657Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8991722Z // begin inline asm 2026-02-21T09:47:41.8992006Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r237, %r238, %r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252}, [%r179 + 0]; 2026-02-21T09:47:41.8992061Z // end inline asm 2026-02-21T09:47:41.8992123Z // begin inline asm 2026-02-21T09:47:41.8992415Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r254, %r255, %r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269}, [%r179 + 16]; 2026-02-21T09:47:41.8992476Z // end inline asm 2026-02-21T09:47:41.8992540Z // begin inline asm 2026-02-21T09:47:41.8992612Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:41.8992667Z // end inline asm 2026-02-21T09:47:41.8992729Z cvt.u64.u32 %rd108, %r237; 2026-02-21T09:47:41.8992797Z cvt.u64.u32 %rd109, %r238; 2026-02-21T09:47:41.8992859Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:47:41.8992923Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:47:41.8993110Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8993176Z mov.b64 {%r285, %r286}, %rd111; 2026-02-21T09:47:41.8993247Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 2026-02-21T09:47:41.8993429Z .loc 1 56 
52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8993490Z cvt.u64.u32 %rd112, %r239; 2026-02-21T09:47:41.8993553Z cvt.u64.u32 %rd113, %r240; 2026-02-21T09:47:41.8993613Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:47:41.8993709Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:47:41.8993885Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8993949Z mov.b64 {%r288, %r289}, %rd115; 2026-02-21T09:47:41.8994024Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:47:41.8994199Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8994259Z cvt.u64.u32 %rd116, %r241; 2026-02-21T09:47:41.8994324Z cvt.u64.u32 %rd117, %r242; 2026-02-21T09:47:41.8994383Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:47:41.8994444Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:47:41.8994620Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8994719Z mov.b64 {%r291, %r292}, %rd119; 2026-02-21T09:47:41.8994811Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:47:41.8994987Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8995055Z cvt.u64.u32 %rd120, %r243; 2026-02-21T09:47:41.8995113Z cvt.u64.u32 %rd121, %r244; 2026-02-21T09:47:41.8995171Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:47:41.8995239Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:47:41.8995411Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8995472Z mov.b64 {%r294, %r295}, %rd123; 2026-02-21T09:47:41.8995536Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:47:41.8995748Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8995807Z cvt.u64.u32 %rd124, %r245; 2026-02-21T09:47:41.8995865Z cvt.u64.u32 %rd125, %r246; 2026-02-21T09:47:41.8995932Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:47:41.8995992Z or.b64 %rd127, 
%rd124, %rd126; 2026-02-21T09:47:41.8996170Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8996239Z mov.b64 {%r297, %r298}, %rd127; 2026-02-21T09:47:41.8996304Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 2026-02-21T09:47:41.8996481Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8996540Z cvt.u64.u32 %rd128, %r247; 2026-02-21T09:47:41.8996606Z cvt.u64.u32 %rd129, %r248; 2026-02-21T09:47:41.8996664Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:47:41.8996725Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:47:41.8996910Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8996969Z mov.b64 {%r300, %r301}, %rd131; 2026-02-21T09:47:41.8997033Z cvt.rn.f16x2.f32 %r302, %r301, %r300; 2026-02-21T09:47:41.8997238Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8997300Z cvt.u64.u32 %rd132, %r249; 2026-02-21T09:47:41.8997359Z cvt.u64.u32 %rd133, %r250; 2026-02-21T09:47:41.8997420Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:47:41.8997487Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:47:41.8997663Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8997722Z mov.b64 {%r303, %r304}, %rd135; 2026-02-21T09:47:41.8997794Z cvt.rn.f16x2.f32 %r305, %r304, %r303; 2026-02-21T09:47:41.8997969Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8998030Z cvt.u64.u32 %rd136, %r251; 2026-02-21T09:47:41.8998096Z cvt.u64.u32 %rd137, %r252; 2026-02-21T09:47:41.8998155Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:47:41.8998215Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:47:41.8998386Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8998455Z mov.b64 {%r306, %r307}, %rd139; 2026-02-21T09:47:41.8998518Z cvt.rn.f16x2.f32 %r308, %r307, %r306; 2026-02-21T09:47:41.8998724Z .loc 1 
56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8998791Z cvt.u64.u32 %rd140, %r254; 2026-02-21T09:47:41.8998850Z cvt.u64.u32 %rd141, %r255; 2026-02-21T09:47:41.8998910Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:47:41.8998979Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:47:41.8999162Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8999223Z mov.b64 {%r309, %r310}, %rd143; 2026-02-21T09:47:41.8999286Z cvt.rn.f16x2.f32 %r311, %r310, %r309; 2026-02-21T09:47:41.8999465Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.8999521Z cvt.u64.u32 %rd144, %r256; 2026-02-21T09:47:41.8999577Z cvt.u64.u32 %rd145, %r257; 2026-02-21T09:47:41.8999665Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:47:41.8999723Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:47:41.8999895Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.8999958Z mov.b64 {%r312, %r313}, %rd147; 2026-02-21T09:47:41.9000017Z cvt.rn.f16x2.f32 %r314, %r313, %r312; 2026-02-21T09:47:41.9000184Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9000240Z cvt.u64.u32 %rd148, %r258; 2026-02-21T09:47:41.9000303Z cvt.u64.u32 %rd149, %r259; 2026-02-21T09:47:41.9000359Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:47:41.9000438Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:47:41.9000611Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9000668Z mov.b64 {%r315, %r316}, %rd151; 2026-02-21T09:47:41.9000727Z cvt.rn.f16x2.f32 %r317, %r316, %r315; 2026-02-21T09:47:41.9000909Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9000966Z cvt.u64.u32 %rd152, %r260; 2026-02-21T09:47:41.9001022Z cvt.u64.u32 %rd153, %r261; 2026-02-21T09:47:41.9001078Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:47:41.9001143Z or.b64 %rd155, 
%rd152, %rd154; 2026-02-21T09:47:41.9001311Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9001367Z mov.b64 {%r318, %r319}, %rd155; 2026-02-21T09:47:41.9001434Z cvt.rn.f16x2.f32 %r320, %r319, %r318; 2026-02-21T09:47:41.9001598Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9001656Z cvt.u64.u32 %rd156, %r262; 2026-02-21T09:47:41.9001718Z cvt.u64.u32 %rd157, %r263; 2026-02-21T09:47:41.9001775Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:47:41.9001832Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:47:41.9002018Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9002087Z mov.b64 {%r321, %r322}, %rd159; 2026-02-21T09:47:41.9002147Z cvt.rn.f16x2.f32 %r323, %r322, %r321; 2026-02-21T09:47:41.9002313Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9002379Z cvt.u64.u32 %rd160, %r264; 2026-02-21T09:47:41.9002435Z cvt.u64.u32 %rd161, %r265; 2026-02-21T09:47:41.9002492Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:47:41.9002555Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:47:41.9002722Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9002781Z mov.b64 {%r324, %r325}, %rd163; 2026-02-21T09:47:41.9002840Z cvt.rn.f16x2.f32 %r326, %r325, %r324; 2026-02-21T09:47:41.9003013Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9003072Z cvt.u64.u32 %rd164, %r266; 2026-02-21T09:47:41.9003130Z cvt.u64.u32 %rd165, %r267; 2026-02-21T09:47:41.9003192Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:47:41.9003268Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:47:41.9003435Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9003498Z mov.b64 {%r327, %r328}, %rd167; 2026-02-21T09:47:41.9003558Z cvt.rn.f16x2.f32 %r329, %r328, %r327; 2026-02-21T09:47:41.9003720Z .loc 1 
56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9003774Z cvt.u64.u32 %rd168, %r268; 2026-02-21T09:47:41.9003837Z cvt.u64.u32 %rd169, %r269; 2026-02-21T09:47:41.9003893Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:47:41.9003947Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:47:41.9004120Z .loc 1 58 27 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:58:27 2026-02-21T09:47:41.9004177Z mov.b64 {%r330, %r331}, %rd171; 2026-02-21T09:47:41.9004254Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T09:47:41.9004427Z .loc 1 59 45 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:59:45 2026-02-21T09:47:41.9004498Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:41.9004550Z bar.sync 0, 128; 2026-02-21T09:47:41.9004642Z st.shared.v4.b32 [%r23], {%r287, %r290, %r293, %r296}; 2026-02-21T09:47:41.9004768Z st.shared.v4.b32 [%r24], {%r299, %r302, %r305, %r308}; 2026-02-21T09:47:41.9004854Z st.shared.v4.b32 [%r25], {%r311, %r314, %r317, %r320}; 2026-02-21T09:47:41.9004938Z st.shared.v4.b32 [%r26], {%r323, %r326, %r329, %r332}; 2026-02-21T09:47:41.9005028Z // begin inline asm 2026-02-21T09:47:41.9005099Z fence.proxy.async.shared::cta; 2026-02-21T09:47:41.9005151Z // end inline asm 2026-02-21T09:47:41.9005204Z bar.sync 0, 128; 2026-02-21T09:47:41.9005274Z elect.sync %r333|%p117, -1; 2026-02-21T09:47:41.9005336Z and.pred %p115, %p34, %p117; 2026-02-21T09:47:41.9005392Z // begin inline asm 2026-02-21T09:47:41.9005584Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd107, {%r271, %r272}], [%r175]; 2026-02-21T09:47:41.9005639Z // end inline asm 2026-02-21T09:47:41.9005702Z cp.async.bulk.commit_group; 2026-02-21T09:47:41.9005879Z .loc 1 33 84 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:33:84 2026-02-21T09:47:41.9005936Z add.s32 %r341, %r341, 1; 2026-02-21T09:47:41.9005997Z setp.ne.b32 %p118, %r22, %r341; 2026-02-21T09:47:41.9006055Z @%p118 bra $L__BB0_14; 2026-02-21T09:47:41.9006143Z $L__BB0_15: // 
%._crit_edge 2026-02-21T09:47:41.9006215Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:41.9006269Z bar.sync 0, 128; 2026-02-21T09:47:41.9006442Z .loc 1 33 4 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:33:4 2026-02-21T09:47:41.9006495Z bar.sync 0, 128; 2026-02-21T09:47:41.9006548Z // begin inline asm 2026-02-21T09:47:41.9006696Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r334, 32; 2026-02-21T09:47:41.9006751Z // end inline asm 2026-02-21T09:47:41.9006829Z st.shared.b32 [global_smem+90312], 50529027; 2026-02-21T09:47:41.9006884Z barrier.sync 1; 2026-02-21T09:47:41.9006970Z $L__BB0_16: // %common.ret 2026-02-21T09:47:41.9007132Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9007182Z ret; 2026-02-21T09:47:41.9007284Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:47:41.9007344Z mov.b32 %r30, global_smem; 2026-02-21T09:47:41.9007402Z add.s32 %r31, %r30, %r3; 2026-02-21T09:47:41.9007469Z add.s32 %r64, %r30, 90272; 2026-02-21T09:47:41.9007529Z bfe.u32 %r78, %r30, 4, 14; 2026-02-21T09:47:41.9007588Z cvt.u64.u32 %rd24, %r78; 2026-02-21T09:47:41.9007656Z or.b64 %rd14, %rd24, 4611686293372403712; 2026-02-21T09:47:41.9007720Z add.s32 %r79, %r30, 65536; 2026-02-21T09:47:41.9007778Z bfe.u32 %r80, %r79, 4, 14; 2026-02-21T09:47:41.9007838Z cvt.u64.u32 %rd25, %r80; 2026-02-21T09:47:41.9007911Z or.b64 %rd15, %rd25, 4611686293322072064; 2026-02-21T09:47:41.9007991Z add.s32 %r81, %r30, 32; 2026-02-21T09:47:41.9008046Z bfe.u32 %r82, %r81, 4, 14; 2026-02-21T09:47:41.9008102Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.9008209Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9008382Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9008459Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.9008521Z barrier.sync 1; 2026-02-21T09:47:41.9008577Z barrier.sync 1; 2026-02-21T09:47:41.9008650Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T09:47:41.9008732Z $L__BB0_2: // %.preheader 2026-02-21T09:47:41.9008820Z // =>This Loop Header: Depth=1 2026-02-21T09:47:41.9008945Z // Child Loop BB0_9 Depth 2 2026-02-21T09:47:41.9009032Z // Child Loop BB0_6 Depth 2 2026-02-21T09:47:41.9009207Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19 2026-02-21T09:47:41.9009282Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:47:41.9009335Z barrier.sync 1; 2026-02-21T09:47:41.9009406Z ld.shared.b8 %r29, [%r31+90308]; 2026-02-21T09:47:41.9009468Z setp.gt.u32 %p2, %r29, 3; 2026-02-21T09:47:41.9009523Z @%p2 bra $L__BB0_4; 2026-02-21T09:47:41.9009607Z // %bb.3: // %.preheader 2026-02-21T09:47:41.9009713Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9009773Z $L_brx_0: .branchtargets 2026-02-21T09:47:41.9009826Z $L__BB0_5, 2026-02-21T09:47:41.9009886Z $L__BB0_8, 2026-02-21T09:47:41.9009937Z $L__BB0_11, 2026-02-21T09:47:41.9009986Z $L__BB0_16; 2026-02-21T09:47:41.9010052Z brx.idx %r29, $L_brx_0; 2026-02-21T09:47:41.9010130Z $L__BB0_5: // %.peel.next 2026-02-21T09:47:41.9010214Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9010386Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9010468Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.9010541Z ld.shared.b32 %r66, [global_smem+90112]; 2026-02-21T09:47:41.9010597Z barrier.sync 1; 2026-02-21T09:47:41.9010768Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9010828Z bar.warp.sync -1; 2026-02-21T09:47:41.9010883Z mov.b32 %r335, 0; 2026-02-21T09:47:41.9010945Z // begin inline asm 2026-02-21T09:47:41.9010995Z 2026-02-21T09:47:41.9011044Z { 2026-02-21T09:47:41.9011102Z .reg .pred complete; 2026-02-21T09:47:41.9011164Z waitLoop: 2026-02-21T09:47:41.9011279Z mbarrier.try_wait.parity.shared.b64 complete, [%r64], %r335; 2026-02-21T09:47:41.9011361Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.9011417Z } 2026-02-21T09:47:41.9011421Z 
2026-02-21T09:47:41.9011477Z // end inline asm 2026-02-21T09:47:41.9011641Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9011705Z elect.sync %r77|%p12, -1; 2026-02-21T09:47:41.9011769Z mov.b32 %r67, 134742032; 2026-02-21T09:47:41.9011824Z mov.pred %p11, 0; 2026-02-21T09:47:41.9011879Z // begin inline asm 2026-02-21T09:47:41.9012022Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd14, %rd15, %r67, %p11; 2026-02-21T09:47:41.9012075Z // end inline asm 2026-02-21T09:47:41.9012132Z cvt.u64.u32 %rd26, %r82; 2026-02-21T09:47:41.9012203Z or.b64 %rd16, %rd26, 4611686293372403712; 2026-02-21T09:47:41.9012259Z add.s32 %r83, %r30, 65568; 2026-02-21T09:47:41.9012312Z bfe.u32 %r84, %r83, 4, 14; 2026-02-21T09:47:41.9012367Z cvt.u64.u32 %rd27, %r84; 2026-02-21T09:47:41.9012440Z or.b64 %rd17, %rd27, 4611686293322072064; 2026-02-21T09:47:41.9012497Z mov.pred %p13, -1; 2026-02-21T09:47:41.9012551Z // begin inline asm 2026-02-21T09:47:41.9012708Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd16, %rd17, %r67, %p13; 2026-02-21T09:47:41.9012761Z // end inline asm 2026-02-21T09:47:41.9012816Z add.s32 %r85, %r30, 64; 2026-02-21T09:47:41.9012870Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T09:47:41.9012933Z cvt.u64.u32 %rd28, %r86; 2026-02-21T09:47:41.9012996Z or.b64 %rd18, %rd28, 4611686293372403712; 2026-02-21T09:47:41.9013049Z add.s32 %r87, %r30, 65600; 2026-02-21T09:47:41.9013109Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T09:47:41.9013167Z cvt.u64.u32 %rd29, %r88; 2026-02-21T09:47:41.9013229Z or.b64 %rd19, %rd29, 4611686293322072064; 2026-02-21T09:47:41.9013290Z // begin inline asm 2026-02-21T09:47:41.9013416Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd18, %rd19, %r67, %p13; 2026-02-21T09:47:41.9013468Z // end inline asm 2026-02-21T09:47:41.9013524Z add.s32 %r89, %r30, 96; 2026-02-21T09:47:41.9013604Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T09:47:41.9013660Z cvt.u64.u32 %rd30, %r90; 
2026-02-21T09:47:41.9013725Z or.b64 %rd20, %rd30, 4611686293372403712; 2026-02-21T09:47:41.9013787Z add.s32 %r91, %r30, 65632; 2026-02-21T09:47:41.9013842Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T09:47:41.9013897Z cvt.u64.u32 %rd31, %r92; 2026-02-21T09:47:41.9013959Z or.b64 %rd21, %rd31, 4611686293322072064; 2026-02-21T09:47:41.9014021Z // begin inline asm 2026-02-21T09:47:41.9014145Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd20, %rd21, %r67, %p13; 2026-02-21T09:47:41.9014199Z // end inline asm 2026-02-21T09:47:41.9014282Z add.s32 %r93, %r30, 90240; 2026-02-21T09:47:41.9014337Z cvt.u64.u32 %rd22, %r93; 2026-02-21T09:47:41.9014390Z // begin inline asm 2026-02-21T09:47:41.9014520Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:47:41.9014573Z // end inline asm 2026-02-21T09:47:41.9014628Z add.s32 %r94, %r30, 90304; 2026-02-21T09:47:41.9014719Z cvt.u64.u32 %rd23, %r94; 2026-02-21T09:47:41.9014784Z // begin inline asm 2026-02-21T09:47:41.9014903Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:47:41.9014957Z // end inline asm 2026-02-21T09:47:41.9015019Z mov.b32 %r337, 1; 2026-02-21T09:47:41.9015079Z mov.b32 %r336, %r335; 2026-02-21T09:47:41.9015175Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.9015274Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.9015442Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9015500Z shl.b32 %r105, %r337, 3; 2026-02-21T09:47:41.9015560Z add.s32 %r107, %r30, %r105; 2026-02-21T09:47:41.9015631Z add.s32 %r108, %r107, 90240; 2026-02-21T09:47:41.9015692Z add.s32 %r95, %r107, 90272; 2026-02-21T09:47:41.9015886Z .loc 1 54 31 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:54:31 2026-02-21T09:47:41.9015955Z shl.b32 %r109, %r337, 14; 2026-02-21T09:47:41.9016012Z add.s32 %r110, %r30, %r109; 2026-02-21T09:47:41.9016179Z .loc 1 55 44 // 
cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:55:44 2026-02-21T09:47:41.9016244Z shl.b32 %r111, %r337, 12; 2026-02-21T09:47:41.9016302Z add.s32 %r112, %r30, %r111; 2026-02-21T09:47:41.9016357Z add.s32 %r113, %r112, 65536; 2026-02-21T09:47:41.9016517Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9016582Z bar.warp.sync -1; 2026-02-21T09:47:41.9016637Z // begin inline asm 2026-02-21T09:47:41.9016687Z 2026-02-21T09:47:41.9016742Z { 2026-02-21T09:47:41.9016799Z .reg .pred complete; 2026-02-21T09:47:41.9016852Z waitLoop: 2026-02-21T09:47:41.9016966Z mbarrier.try_wait.parity.shared.b64 complete, [%r95], %r336; 2026-02-21T09:47:41.9017036Z @!complete bra.uni waitLoop; 2026-02-21T09:47:41.9017086Z } 2026-02-21T09:47:41.9017091Z 2026-02-21T09:47:41.9017146Z // end inline asm 2026-02-21T09:47:41.9017318Z .loc 1 56 52 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:56:52 2026-02-21T09:47:41.9017404Z setp.eq.b32 %p31, %r335, 1920; 2026-02-21T09:47:41.9017465Z elect.sync %r114|%p22, -1; 2026-02-21T09:47:41.9017531Z bfe.u32 %r115, %r110, 4, 14; 2026-02-21T09:47:41.9017589Z cvt.u64.u32 %rd42, %r115; 2026-02-21T09:47:41.9017652Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T09:47:41.9017710Z bfe.u32 %r116, %r113, 4, 14; 2026-02-21T09:47:41.9017775Z cvt.u64.u32 %rd43, %r116; 2026-02-21T09:47:41.9017839Z or.b64 %rd33, %rd43, 4611686293322072064; 2026-02-21T09:47:41.9017895Z // begin inline asm 2026-02-21T09:47:41.9018029Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd32, %rd33, %r67, %p13; 2026-02-21T09:47:41.9018082Z // end inline asm 2026-02-21T09:47:41.9018136Z add.s32 %r117, %r110, 32; 2026-02-21T09:47:41.9018194Z bfe.u32 %r118, %r117, 4, 14; 2026-02-21T09:47:41.9018279Z cvt.u64.u32 %rd44, %r118; 2026-02-21T09:47:41.9018343Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T09:47:41.9018399Z add.s32 %r119, %r112, 65568; 2026-02-21T09:47:41.9018463Z bfe.u32 %r120, %r119, 4, 14; 
2026-02-21T09:47:41.9018518Z cvt.u64.u32 %rd45, %r120; 2026-02-21T09:47:41.9018580Z or.b64 %rd35, %rd45, 4611686293322072064; 2026-02-21T09:47:41.9018640Z // begin inline asm 2026-02-21T09:47:41.9018767Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd34, %rd35, %r67, %p13; 2026-02-21T09:47:41.9018820Z // end inline asm 2026-02-21T09:47:41.9018874Z add.s32 %r121, %r110, 64; 2026-02-21T09:47:41.9018963Z bfe.u32 %r122, %r121, 4, 14; 2026-02-21T09:47:41.9019018Z cvt.u64.u32 %rd46, %r122; 2026-02-21T09:47:41.9019080Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T09:47:41.9019141Z add.s32 %r123, %r112, 65600; 2026-02-21T09:47:41.9019197Z bfe.u32 %r124, %r123, 4, 14; 2026-02-21T09:47:41.9019252Z cvt.u64.u32 %rd47, %r124; 2026-02-21T09:47:41.9019315Z or.b64 %rd37, %rd47, 4611686293322072064; 2026-02-21T09:47:41.9019376Z // begin inline asm 2026-02-21T09:47:41.9019498Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd36, %rd37, %r67, %p13; 2026-02-21T09:47:41.9019553Z // end inline asm 2026-02-21T09:47:41.9019613Z add.s32 %r125, %r110, 96; 2026-02-21T09:47:41.9019668Z bfe.u32 %r126, %r125, 4, 14; 2026-02-21T09:47:41.9019722Z cvt.u64.u32 %rd48, %r126; 2026-02-21T09:47:41.9019790Z or.b64 %rd38, %rd48, 4611686293372403712; 2026-02-21T09:47:41.9019845Z add.s32 %r127, %r112, 65632; 2026-02-21T09:47:41.9019899Z bfe.u32 %r128, %r127, 4, 14; 2026-02-21T09:47:41.9019956Z cvt.u64.u32 %rd49, %r128; 2026-02-21T09:47:41.9020025Z or.b64 %rd39, %rd49, 4611686293322072064; 2026-02-21T09:47:41.9020079Z // begin inline asm 2026-02-21T09:47:41.9020200Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r66 + 0 ], %rd38, %rd39, %r67, %p13; 2026-02-21T09:47:41.9020260Z // end inline asm 2026-02-21T09:47:41.9020331Z cvt.u64.u32 %rd40, %r108; 2026-02-21T09:47:41.9020388Z // begin inline asm 2026-02-21T09:47:41.9020515Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:47:41.9020570Z // end inline asm 2026-02-21T09:47:41.9020633Z and.pred %p30, 
%p31, %p22; 2026-02-21T09:47:41.9020686Z // begin inline asm 2026-02-21T09:47:41.9020813Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:47:41.9020866Z // end inline asm 2026-02-21T09:47:41.9021031Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9021094Z add.s32 %r130, %r337, 1; 2026-02-21T09:47:41.9021155Z setp.eq.b32 %p32, %r130, 4; 2026-02-21T09:47:41.9021216Z selp.b32 %r337, 0, %r130, %p32; 2026-02-21T09:47:41.9021273Z selp.b32 %r131, 1, 0, %p32; 2026-02-21T09:47:41.9021335Z xor.b32 %r336, %r336, %r131; 2026-02-21T09:47:41.9021502Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9021559Z add.s32 %r335, %r335, 64; 2026-02-21T09:47:41.9021627Z setp.lt.u32 %p33, %r335, 1984; 2026-02-21T09:47:41.9021705Z @%p33 bra $L__BB0_6; 2026-02-21T09:47:41.9021785Z // %bb.7: // %.loopexit 2026-02-21T09:47:41.9021879Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9021934Z barrier.sync 1; 2026-02-21T09:47:41.9022009Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.9022064Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.9022169Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9022336Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9022412Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.9022514Z ld.shared.v2.b32 {%r48, %r52}, [global_smem+90120]; 2026-02-21T09:47:41.9022571Z barrier.sync 1; 2026-02-21T09:47:41.9022756Z .loc 1 21 67 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:21:67 2026-02-21T09:47:41.9022823Z mov.u32 %r34, %ctaid.x; 2026-02-21T09:47:41.9022880Z mov.u32 %r35, %ctaid.y; 2026-02-21T09:47:41.9022935Z mov.u32 %r36, %ctaid.z; 2026-02-21T09:47:41.9022992Z mov.u32 %r37, %nctaid.x; 2026-02-21T09:47:41.9023056Z mov.u32 %r38, %nctaid.y; 2026-02-21T09:47:41.9023118Z mad.lo.s32 %r39, %r36, %r38, %r35; 2026-02-21T09:47:41.9023178Z mad.lo.s32 %r40, %r39, 
%r37, %r34; 2026-02-21T09:47:41.9023241Z mul.lo.s32 %r41, %r40, 384; 2026-02-21T09:47:41.9023407Z .loc 1 22 68 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:22:68 2026-02-21T09:47:41.9023481Z add.s32 %r42, %r41, 128; 2026-02-21T09:47:41.9023545Z cvt.s64.s32 %rd8, %r42; 2026-02-21T09:47:41.9023602Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:47:41.9023663Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:47:41.9023831Z .loc 1 21 67 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:21:67 2026-02-21T09:47:41.9023896Z cvt.s64.s32 %rd10, %r41; 2026-02-21T09:47:41.9023954Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:47:41.9024017Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:47:41.9024080Z add.s32 %r13, %r1, -128; 2026-02-21T09:47:41.9024132Z mov.b32 %r339, 0; 2026-02-21T09:47:41.9024188Z mov.b32 %r338, -64; 2026-02-21T09:47:41.9024242Z mov.b32 %r340, %r339; 2026-02-21T09:47:41.9024344Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:41.9024435Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:41.9024604Z .loc 1 0 67 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0:67 2026-02-21T09:47:41.9024708Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:47:41.9024769Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:47:41.9024936Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9024999Z add.s32 %r338, %r338, 64; 2026-02-21T09:47:41.9025080Z shl.b32 %r54, %r340, 3; 2026-02-21T09:47:41.9025137Z add.s32 %r56, %r30, %r54; 2026-02-21T09:47:41.9025195Z add.s32 %r43, %r56, 90240; 2026-02-21T09:47:41.9025359Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9025415Z // begin inline asm 2026-02-21T09:47:41.9025464Z 2026-02-21T09:47:41.9025517Z { 2026-02-21T09:47:41.9025574Z .reg .pred complete; 2026-02-21T09:47:41.9025626Z waitLoop: 2026-02-21T09:47:41.9025741Z mbarrier.try_wait.parity.shared.b64 complete, [%r43], %r339; 2026-02-21T09:47:41.9025810Z @!complete bra.uni waitLoop; 
2026-02-21T09:47:41.9025861Z } 2026-02-21T09:47:41.9025864Z 2026-02-21T09:47:41.9025917Z // end inline asm 2026-02-21T09:47:41.9026089Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9026146Z add.s32 %r49, %r56, 90272; 2026-02-21T09:47:41.9026307Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9026369Z bar.sync 3, 64; 2026-02-21T09:47:41.9026461Z // begin inline asm 2026-02-21T09:47:41.9026569Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r49], 20480; 2026-02-21T09:47:41.9026622Z // end inline asm 2026-02-21T09:47:41.9026797Z .loc 1 54 31 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:54:31 2026-02-21T09:47:41.9026853Z shl.b32 %r57, %r340, 14; 2026-02-21T09:47:41.9026908Z add.s32 %r46, %r30, %r57; 2026-02-21T09:47:41.9027074Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9027129Z bar.sync 3, 64; 2026-02-21T09:47:41.9027189Z elect.sync %r58|%p7, -1; 2026-02-21T09:47:41.9027255Z and.pred %p4, %p6, %p7; 2026-02-21T09:47:41.9027309Z // begin inline asm 2026-02-21T09:47:41.9027552Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r46], [%rd12, {%r338, %r48}], [%r49]; 2026-02-21T09:47:41.9027629Z // end inline asm 2026-02-21T09:47:41.9027803Z .loc 1 55 44 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:55:44 2026-02-21T09:47:41.9027860Z shl.b32 %r59, %r340, 12; 2026-02-21T09:47:41.9027916Z add.s32 %r60, %r30, %r59; 2026-02-21T09:47:41.9027977Z add.s32 %r50, %r60, 65536; 2026-02-21T09:47:41.9028135Z .loc 1 0 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:0 2026-02-21T09:47:41.9028188Z bar.sync 3, 64; 2026-02-21T09:47:41.9028253Z elect.sync %r61|%p8, -1; 2026-02-21T09:47:41.9028312Z and.pred %p5, %p6, %p8; 2026-02-21T09:47:41.9028367Z // begin inline asm 2026-02-21T09:47:41.9028630Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes 
[%r50], [%rd13, {%r338, %r52}], [%r49]; 2026-02-21T09:47:41.9028684Z // end inline asm 2026-02-21T09:47:41.9028740Z add.s32 %r62, %r340, 1; 2026-02-21T09:47:41.9028798Z setp.eq.b32 %p9, %r62, 4; 2026-02-21T09:47:41.9028866Z selp.b32 %r340, 0, %r62, %p9; 2026-02-21T09:47:41.9028923Z selp.b32 %r63, 1, 0, %p9; 2026-02-21T09:47:41.9028979Z xor.b32 %r339, %r339, %r63; 2026-02-21T09:47:41.9029156Z .loc 1 50 79 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:50:79 2026-02-21T09:47:41.9029217Z setp.lt.u32 %p10, %r338, 1984; 2026-02-21T09:47:41.9029273Z @%p10 bra $L__BB0_9; 2026-02-21T09:47:41.9029367Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9029430Z barrier.sync 1; 2026-02-21T09:47:41.9029505Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:41.9029560Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.9029660Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:41.9029825Z .loc 1 19 0 // cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py:19 2026-02-21T09:47:41.9029879Z barrier.sync 1; 2026-02-21T09:47:41.9029941Z barrier.sync 1; 2026-02-21T09:47:41.9030020Z bra.uni $L__BB0_2; 2026-02-21T09:47:41.9030075Z $L__tmp1: 2026-02-21T09:47:41.9030127Z $L__func_end0: 2026-02-21T09:47:41.9030215Z // -- End function 2026-02-21T09:47:41.9030265Z } 2026-02-21T09:47:41.9030469Z .file 1 "/tmp/torchinductor_root/fd/cfdoycskmqy7anqzz2vejwfisrilyi2vdxdtn7nce7ei34kteqds.py" 2026-02-21T09:47:41.9030535Z .section .debug_abbrev 2026-02-21T09:47:41.9030586Z { 2026-02-21T09:47:41.9030673Z .b8 1 // Abbreviation Code 2026-02-21T09:47:41.9030759Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:41.9030848Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:41.9030927Z .b8 37 // DW_AT_producer 2026-02-21T09:47:41.9031003Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.9031083Z .b8 19 // DW_AT_language 2026-02-21T09:47:41.9031159Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:41.9031234Z .b8 3 // DW_AT_name 2026-02-21T09:47:41.9031334Z .b8 8 // DW_FORM_string 
2026-02-21T09:47:41.9031410Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:41.9031485Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:41.9031572Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:41.9031649Z .b8 8 // DW_FORM_string 2026-02-21T09:47:41.9031722Z .b8 0 // EOM(1) 2026-02-21T09:47:41.9031793Z .b8 0 // EOM(2) 2026-02-21T09:47:41.9031882Z .b8 0 // EOM(3) 2026-02-21T09:47:41.9031933Z } 2026-02-21T09:47:41.9031990Z .section .debug_info 2026-02-21T09:47:41.9032044Z { 2026-02-21T09:47:41.9032122Z .b32 104 // Length of Unit 2026-02-21T09:47:41.9032222Z .b8 2 // DWARF version number 2026-02-21T09:47:41.9032273Z .b8 0 2026-02-21T09:47:41.9032390Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:47:41.9032477Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:41.9032573Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:41.9032657Z .b8 116 // DW_AT_producer 2026-02-21T09:47:41.9032710Z .b8 114 2026-02-21T09:47:41.9032760Z .b8 105 2026-02-21T09:47:41.9032816Z .b8 116 2026-02-21T09:47:41.9032866Z .b8 111 2026-02-21T09:47:41.9032915Z .b8 110 2026-02-21T09:47:41.9032984Z .b8 0 2026-02-21T09:47:41.9033064Z .b8 2 // DW_AT_language 2026-02-21T09:47:41.9033114Z .b8 0 2026-02-21T09:47:41.9033184Z .b8 99 // DW_AT_name 2026-02-21T09:47:41.9033241Z .b8 102 2026-02-21T09:47:41.9033289Z .b8 100 2026-02-21T09:47:41.9033338Z .b8 111 2026-02-21T09:47:41.9033388Z .b8 121 2026-02-21T09:47:41.9033446Z .b8 99 2026-02-21T09:47:41.9033495Z .b8 115 2026-02-21T09:47:41.9033547Z .b8 107 2026-02-21T09:47:41.9033595Z .b8 109 2026-02-21T09:47:41.9033652Z .b8 113 2026-02-21T09:47:41.9033699Z .b8 121 2026-02-21T09:47:41.9033749Z .b8 55 2026-02-21T09:47:41.9033803Z .b8 97 2026-02-21T09:47:41.9033851Z .b8 110 2026-02-21T09:47:41.9033899Z .b8 113 2026-02-21T09:47:41.9033947Z .b8 122 2026-02-21T09:47:41.9034002Z .b8 122 2026-02-21T09:47:41.9034051Z .b8 50 2026-02-21T09:47:41.9034099Z .b8 118 2026-02-21T09:47:41.9034154Z .b8 101 2026-02-21T09:47:41.9034203Z .b8 106 
2026-02-21T09:47:41.9034252Z .b8 119 2026-02-21T09:47:41.9034303Z .b8 102 2026-02-21T09:47:41.9034359Z .b8 105 2026-02-21T09:47:41.9034407Z .b8 115 2026-02-21T09:47:41.9034457Z .b8 114 2026-02-21T09:47:41.9034512Z .b8 105 2026-02-21T09:47:41.9034560Z .b8 108 2026-02-21T09:47:41.9034609Z .b8 121 2026-02-21T09:47:41.9034658Z .b8 105 2026-02-21T09:47:41.9034742Z .b8 50 2026-02-21T09:47:41.9034804Z .b8 118 2026-02-21T09:47:41.9034878Z .b8 100 2026-02-21T09:47:41.9034932Z .b8 120 2026-02-21T09:47:41.9034990Z .b8 100 2026-02-21T09:47:41.9035042Z .b8 116 2026-02-21T09:47:41.9035093Z .b8 110 2026-02-21T09:47:41.9035150Z .b8 55 2026-02-21T09:47:41.9035201Z .b8 110 2026-02-21T09:47:41.9035251Z .b8 99 2026-02-21T09:47:41.9035302Z .b8 101 2026-02-21T09:47:41.9035359Z .b8 55 2026-02-21T09:47:41.9035410Z .b8 101 2026-02-21T09:47:41.9035460Z .b8 105 2026-02-21T09:47:41.9035516Z .b8 51 2026-02-21T09:47:41.9035566Z .b8 52 2026-02-21T09:47:41.9035616Z .b8 107 2026-02-21T09:47:41.9035666Z .b8 116 2026-02-21T09:47:41.9035724Z .b8 101 2026-02-21T09:47:41.9035775Z .b8 113 2026-02-21T09:47:41.9035828Z .b8 100 2026-02-21T09:47:41.9035878Z .b8 115 2026-02-21T09:47:41.9035937Z .b8 46 2026-02-21T09:47:41.9035988Z .b8 112 2026-02-21T09:47:41.9036039Z .b8 121 2026-02-21T09:47:41.9036094Z .b8 0 2026-02-21T09:47:41.9036187Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:41.9036267Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:41.9036320Z .b8 116 2026-02-21T09:47:41.9036379Z .b8 109 2026-02-21T09:47:41.9036455Z .b8 112 2026-02-21T09:47:41.9036506Z .b8 47 2026-02-21T09:47:41.9036565Z .b8 116 2026-02-21T09:47:41.9036616Z .b8 111 2026-02-21T09:47:41.9036670Z .b8 114 2026-02-21T09:47:41.9036722Z .b8 99 2026-02-21T09:47:41.9036782Z .b8 104 2026-02-21T09:47:41.9036834Z .b8 105 2026-02-21T09:47:41.9036885Z .b8 110 2026-02-21T09:47:41.9036946Z .b8 100 2026-02-21T09:47:41.9036996Z .b8 117 2026-02-21T09:47:41.9037046Z .b8 99 2026-02-21T09:47:41.9037097Z .b8 116 2026-02-21T09:47:41.9037155Z .b8 111 
2026-02-21T09:47:41.9037205Z .b8 114 2026-02-21T09:47:41.9037257Z .b8 95 2026-02-21T09:47:41.9037315Z .b8 114 2026-02-21T09:47:41.9037366Z .b8 111 2026-02-21T09:47:41.9037416Z .b8 111 2026-02-21T09:47:41.9037467Z .b8 116 2026-02-21T09:47:41.9037524Z .b8 47 2026-02-21T09:47:41.9037576Z .b8 102 2026-02-21T09:47:41.9037625Z .b8 100 2026-02-21T09:47:41.9037676Z .b8 0 2026-02-21T09:47:41.9037732Z } 2026-02-21T09:47:41.9037822Z .section .debug_macinfo { } 2026-02-21T09:47:41.9037826Z 2026-02-21T09:47:41.9037907Z ================================================================ 2026-02-21T09:47:41.9038021Z please share the reproducer above with Triton project. 2026-02-21T09:47:42.0624923Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:42.0625142Z 2026-02-21T09:47:42.0625587Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm05zurcr.ptx -o /tmp/tmpm05zurcr.ptx.o 2026-02-21T09:47:42.0626060Z 2026-02-21T09:47:42.0626206Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:42.0626776Z [79s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:47:42.0627022Z 2026-02-21T09:47:42.0628102Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:47:42.0629359Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:42.0629608Z `ptxas` stderr: 2026-02-21T09:47:42.0630025Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 314 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:42.0630513Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:42.0630661Z 2026-02-21T09:47:42.0631065Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_1j738ug.ptx -o /tmp/tmp_1j738ug.ptx.o 2026-02-21T09:47:42.0631559Z 2026-02-21T09:47:42.0631688Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:42.0631879Z 2026-02-21T09:47:42.0631882Z 2026-02-21T09:47:42.0631965Z ================================================================ 2026-02-21T09:47:42.0632172Z Internal Triton PTX codegen error 2026-02-21T09:47:42.0632344Z `ptxas` stderr: 2026-02-21T09:47:42.0632740Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 314 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:42.0633219Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:42.0633372Z 2026-02-21T09:47:42.0633937Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp_1j738ug.ptx -o /tmp/tmp_1j738ug.ptx.o 2026-02-21T09:47:42.0634352Z 2026-02-21T09:47:42.0634355Z 2026-02-21T09:47:42.0634413Z // 2026-02-21T09:47:42.0634564Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:42.0634776Z // 2026-02-21T09:47:42.0634841Z 2026-02-21T09:47:42.0634894Z .version 8.7 2026-02-21T09:47:42.0635074Z .target sm_100a 2026-02-21T09:47:42.0635202Z .address_size 64 2026-02-21T09:47:42.0635290Z 2026-02-21T09:47:42.0635405Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:42.0635653Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:42.0635856Z // @_helion_matmul 2026-02-21T09:47:42.0636054Z .visible .entry _helion_matmul( 2026-02-21T09:47:42.0636261Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:42.0636510Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:42.0636753Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:42.0636998Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:42.0637235Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:42.0637435Z ) 2026-02-21T09:47:42.0637592Z .reqntid 256 2026-02-21T09:47:42.0637717Z .maxnreg 32 2026-02-21T09:47:42.0637845Z { 2026-02-21T09:47:42.0637965Z .reg .pred %p<122>; 2026-02-21T09:47:42.0638114Z .reg .b16 %rs<11>; 2026-02-21T09:47:42.0638248Z .reg .b32 %r<467>; 2026-02-21T09:47:42.0638387Z .reg .b64 %rd<236>; 2026-02-21T09:47:42.0638642Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19:0 2026-02-21T09:47:42.0638934Z $L__func_begin0: 2026-02-21T09:47:42.0639168Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19:0 
2026-02-21T09:47:42.0639398Z 2026-02-21T09:47:42.0639482Z // %bb.0: 2026-02-21T09:47:42.0639636Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:47:42.0639819Z $L__tmp0: 2026-02-21T09:47:42.0640045Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19 2026-02-21T09:47:42.0640320Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:42.0640467Z shr.u32 %r2, %r1, 5; 2026-02-21T09:47:42.0640619Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:47:42.0640811Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:47:42.0640967Z @%p1 bra $L__BB0_12; 2026-02-21T09:47:42.0641103Z bra.uni $L__BB0_1; 2026-02-21T09:47:42.0641240Z $L__BB0_12: 2026-02-21T09:47:42.0641463Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0:0 2026-02-21T09:47:42.0641765Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:47:42.0641971Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:47:42.0642175Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:47:42.0642456Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19 2026-02-21T09:47:42.0642758Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.0642948Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:47:42.0643106Z mov.b32 %r137, global_smem; 2026-02-21T09:47:42.0643267Z // begin inline asm 2026-02-21T09:47:42.0643564Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r137], 64; 2026-02-21T09:47:42.0643821Z // end inline asm 2026-02-21T09:47:42.0643952Z bar.sync 0, 128; 2026-02-21T09:47:42.0644108Z ld.shared.b32 %r459, [global_smem]; 2026-02-21T09:47:42.0644274Z bar.sync 0, 128; 2026-02-21T09:47:42.0644420Z // begin inline asm 2026-02-21T09:47:42.0644637Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:42.0644900Z // end inline asm 2026-02-21T09:47:42.0645151Z .loc 1 21 67 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:21:67 2026-02-21T09:47:42.0645434Z mov.u32 %r162, %ctaid.x; 2026-02-21T09:47:42.0645590Z mov.u32 
%r163, %ctaid.y; 2026-02-21T09:47:42.0645735Z mov.u32 %r164, %ctaid.z; 2026-02-21T09:47:42.0645888Z mov.u32 %r165, %nctaid.x; 2026-02-21T09:47:42.0646037Z mov.u32 %r166, %nctaid.y; 2026-02-21T09:47:42.0646202Z mad.lo.s32 %r167, %r164, %r166, %r163; 2026-02-21T09:47:42.0646387Z mad.lo.s32 %r168, %r167, %r165, %r162; 2026-02-21T09:47:42.0646557Z mul.lo.s32 %r169, %r168, 384; 2026-02-21T09:47:42.0646756Z cvt.s64.s32 %rd104, %r169; 2026-02-21T09:47:42.0646912Z add.s64 %rd65, %rd7, %rd104; 2026-02-21T09:47:42.0647075Z shl.b32 %r170, %r1, 2; 2026-02-21T09:47:42.0647223Z add.s32 %r138, %r137, %r170; 2026-02-21T09:47:42.0647378Z mov.b32 %r147, 0; 2026-02-21T09:47:42.0647512Z // begin inline asm 2026-02-21T09:47:42.0647669Z @%p34 st.shared.b32 [ %r138 + 0 ], %r147; 2026-02-21T09:47:42.0647837Z // end inline asm 2026-02-21T09:47:42.0647978Z bar.warp.sync -1; 2026-02-21T09:47:42.0648124Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:47:42.0648275Z cvt.u64.u32 %rd50, %r137; 2026-02-21T09:47:42.0648424Z // begin inline asm 2026-02-21T09:47:42.0648662Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd4; 2026-02-21T09:47:42.0648938Z // end inline asm 2026-02-21T09:47:42.0649065Z // begin inline asm 2026-02-21T09:47:42.0649317Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:42.0649567Z // end inline asm 2026-02-21T09:47:42.0649709Z mov.b32 %r140, 64; 2026-02-21T09:47:42.0649848Z // begin inline asm 2026-02-21T09:47:42.0650079Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r140; 2026-02-21T09:47:42.0650352Z // end inline asm 2026-02-21T09:47:42.0650484Z mov.b32 %r141, 128; 2026-02-21T09:47:42.0650627Z // begin inline asm 2026-02-21T09:47:42.0650853Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r141; 2026-02-21T09:47:42.0651133Z // end inline asm 2026-02-21T09:47:42.0651265Z mov.b32 %r142, 2048; 2026-02-21T09:47:42.0651443Z // begin inline 
asm 2026-02-21T09:47:42.0651684Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r142; 2026-02-21T09:47:42.0651954Z // end inline asm 2026-02-21T09:47:42.0652089Z // begin inline asm 2026-02-21T09:47:42.0652325Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r142; 2026-02-21T09:47:42.0652596Z // end inline asm 2026-02-21T09:47:42.0652728Z mov.b64 %rd58, 4096; 2026-02-21T09:47:42.0652873Z // begin inline asm 2026-02-21T09:47:42.0653126Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:47:42.0653403Z // end inline asm 2026-02-21T09:47:42.0653543Z mov.b32 %r144, 1; 2026-02-21T09:47:42.0653673Z // begin inline asm 2026-02-21T09:47:42.0653935Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r144; 2026-02-21T09:47:42.0654218Z // end inline asm 2026-02-21T09:47:42.0654359Z // begin inline asm 2026-02-21T09:47:42.0654608Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r144; 2026-02-21T09:47:42.0654925Z // end inline asm 2026-02-21T09:47:42.0655062Z // begin inline asm 2026-02-21T09:47:42.0655289Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:42.0655578Z // end inline asm 2026-02-21T09:47:42.0655708Z // begin inline asm 2026-02-21T09:47:42.0655959Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0656236Z // end inline asm 2026-02-21T09:47:42.0656373Z // begin inline asm 2026-02-21T09:47:42.0656608Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:47:42.0656868Z // end inline asm 2026-02-21T09:47:42.0657005Z // begin inline asm 2026-02-21T09:47:42.0657224Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0657482Z // end inline asm 2026-02-21T09:47:42.0657613Z // 
begin inline asm 2026-02-21T09:47:42.0657957Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd65 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:42.0658327Z // end inline asm 2026-02-21T09:47:42.0658455Z // begin inline asm 2026-02-21T09:47:42.0658669Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd65 + 0 ], 0x80; 2026-02-21T09:47:42.0658914Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.0659140Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.0659311Z // end inline asm 2026-02-21T09:47:42.0659444Z bar.sync 0, 128; 2026-02-21T09:47:42.0659680Z .loc 1 22 68 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:22:68 2026-02-21T09:47:42.0659968Z add.s32 %r171, %r169, 128; 2026-02-21T09:47:42.0660127Z cvt.s64.s32 %rd105, %r171; 2026-02-21T09:47:42.0660280Z add.s64 %rd83, %rd7, %rd105; 2026-02-21T09:47:42.0660439Z bar.sync 0, 128; 2026-02-21T09:47:42.0660570Z // begin inline asm 2026-02-21T09:47:42.0660725Z @%p34 st.shared.b32 [ %r138 + 0 ], %r147; 2026-02-21T09:47:42.0660894Z // end inline asm 2026-02-21T09:47:42.0661032Z bar.warp.sync -1; 2026-02-21T09:47:42.0661165Z // begin inline asm 2026-02-21T09:47:42.0661407Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd5; 2026-02-21T09:47:42.0661709Z // end inline asm 2026-02-21T09:47:42.0661840Z // begin inline asm 2026-02-21T09:47:42.0662066Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:42.0662305Z // end inline asm 2026-02-21T09:47:42.0662442Z // begin inline asm 2026-02-21T09:47:42.0662666Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r140; 2026-02-21T09:47:42.0662929Z // end inline asm 2026-02-21T09:47:42.0663058Z // begin inline asm 2026-02-21T09:47:42.0663290Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r140; 2026-02-21T09:47:42.0663581Z // end inline asm 2026-02-21T09:47:42.0663713Z // 
begin inline asm 2026-02-21T09:47:42.0663961Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r142; 2026-02-21T09:47:42.0664227Z // end inline asm 2026-02-21T09:47:42.0664370Z mov.b32 %r151, 12288; 2026-02-21T09:47:42.0664513Z // begin inline asm 2026-02-21T09:47:42.0664807Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r151; 2026-02-21T09:47:42.0665095Z // end inline asm 2026-02-21T09:47:42.0665230Z // begin inline asm 2026-02-21T09:47:42.0665495Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:47:42.0665781Z // end inline asm 2026-02-21T09:47:42.0665923Z // begin inline asm 2026-02-21T09:47:42.0666180Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r144; 2026-02-21T09:47:42.0666483Z // end inline asm 2026-02-21T09:47:42.0666626Z // begin inline asm 2026-02-21T09:47:42.0666886Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r144; 2026-02-21T09:47:42.0667183Z // end inline asm 2026-02-21T09:47:42.0667318Z // begin inline asm 2026-02-21T09:47:42.0667559Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:42.0667832Z // end inline asm 2026-02-21T09:47:42.0668006Z // begin inline asm 2026-02-21T09:47:42.0668262Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0668551Z // end inline asm 2026-02-21T09:47:42.0668693Z // begin inline asm 2026-02-21T09:47:42.0668930Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:47:42.0669214Z // end inline asm 2026-02-21T09:47:42.0669347Z // begin inline asm 2026-02-21T09:47:42.0669580Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0669838Z // end inline asm 2026-02-21T09:47:42.0669983Z // begin inline asm 
2026-02-21T09:47:42.0670333Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd83 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:42.0670727Z // end inline asm 2026-02-21T09:47:42.0670869Z // begin inline asm 2026-02-21T09:47:42.0671079Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd83 + 0 ], 0x80; 2026-02-21T09:47:42.0671342Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.0671533Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.0671745Z // end inline asm 2026-02-21T09:47:42.0671884Z bar.sync 0, 128; 2026-02-21T09:47:42.0672144Z .loc 1 24 73 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:24:73 2026-02-21T09:47:42.0672438Z add.s32 %r172, %r169, 256; 2026-02-21T09:47:42.0672591Z cvt.s64.s32 %rd106, %r172; 2026-02-21T09:47:42.0672753Z add.s64 %rd101, %rd7, %rd106; 2026-02-21T09:47:42.0672905Z bar.sync 0, 128; 2026-02-21T09:47:42.0673043Z // begin inline asm 2026-02-21T09:47:42.0673192Z @%p34 st.shared.b32 [ %r138 + 0 ], %r147; 2026-02-21T09:47:42.0673368Z // end inline asm 2026-02-21T09:47:42.0673499Z bar.warp.sync -1; 2026-02-21T09:47:42.0673642Z // begin inline asm 2026-02-21T09:47:42.0673884Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd6; 2026-02-21T09:47:42.0674151Z // end inline asm 2026-02-21T09:47:42.0674315Z // begin inline asm 2026-02-21T09:47:42.0674534Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:47:42.0674822Z // end inline asm 2026-02-21T09:47:42.0674959Z // begin inline asm 2026-02-21T09:47:42.0675196Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r140; 2026-02-21T09:47:42.0675456Z // end inline asm 2026-02-21T09:47:42.0675583Z // begin inline asm 2026-02-21T09:47:42.0675813Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r141; 2026-02-21T09:47:42.0676067Z // end inline asm 2026-02-21T09:47:42.0676230Z // begin inline 
asm 2026-02-21T09:47:42.0676462Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r151; 2026-02-21T09:47:42.0676731Z // end inline asm 2026-02-21T09:47:42.0676857Z // begin inline asm 2026-02-21T09:47:42.0677099Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r142; 2026-02-21T09:47:42.0677380Z // end inline asm 2026-02-21T09:47:42.0677509Z mov.b64 %rd94, 24576; 2026-02-21T09:47:42.0677658Z // begin inline asm 2026-02-21T09:47:42.0677901Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd94; 2026-02-21T09:47:42.0678197Z // end inline asm 2026-02-21T09:47:42.0678328Z // begin inline asm 2026-02-21T09:47:42.0678605Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r144; 2026-02-21T09:47:42.0678909Z // end inline asm 2026-02-21T09:47:42.0679039Z // begin inline asm 2026-02-21T09:47:42.0679316Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r144; 2026-02-21T09:47:42.0679618Z // end inline asm 2026-02-21T09:47:42.0679754Z // begin inline asm 2026-02-21T09:47:42.0679980Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:47:42.0680244Z // end inline asm 2026-02-21T09:47:42.0680407Z // begin inline asm 2026-02-21T09:47:42.0680675Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0680960Z // end inline asm 2026-02-21T09:47:42.0681091Z // begin inline asm 2026-02-21T09:47:42.0681327Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:47:42.0681606Z // end inline asm 2026-02-21T09:47:42.0681744Z // begin inline asm 2026-02-21T09:47:42.0681968Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:47:42.0682227Z // end inline asm 2026-02-21T09:47:42.0682365Z // begin inline asm 2026-02-21T09:47:42.0682716Z 
@%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:47:42.0683107Z // end inline asm 2026-02-21T09:47:42.0683241Z // begin inline asm 2026-02-21T09:47:42.0683456Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T09:47:42.0683708Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.0683903Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.0684117Z // end inline asm 2026-02-21T09:47:42.0684254Z bar.sync 0, 128; 2026-02-21T09:47:42.0684410Z cvta.global.u64 %rd107, %rd101; 2026-02-21T09:47:42.0684721Z .loc 1 31 35 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:31:35 2026-02-21T09:47:42.0685023Z shl.b32 %r466, %r162, 1; 2026-02-21T09:47:42.0685284Z .loc 1 32 37 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:32:37 2026-02-21T09:47:42.0685574Z add.s32 %r173, %r466, 2; 2026-02-21T09:47:42.0685841Z .loc 1 32 49 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:32:49 2026-02-21T09:47:42.0686141Z min.s32 %r22, %r173, 3072; 2026-02-21T09:47:42.0686411Z .loc 1 33 84 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:33:84 2026-02-21T09:47:42.0686714Z setp.ge.s32 %p90, %r466, %r22; 2026-02-21T09:47:42.0686924Z @%p90 bra $L__BB0_15; 2026-02-21T09:47:42.0687093Z // %bb.13: // %.lr.ph 2026-02-21T09:47:42.0687395Z .loc 1 0 84 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0:84 2026-02-21T09:47:42.0687681Z shl.b32 %r174, %r1, 7; 2026-02-21T09:47:42.0687829Z and.b32 %r175, %r174, 16256; 2026-02-21T09:47:42.0687991Z shl.b32 %r176, %r1, 4; 2026-02-21T09:47:42.0688135Z and.b32 %r177, %r176, 112; 2026-02-21T09:47:42.0688290Z or.b32 %r178, %r175, %r177; 2026-02-21T09:47:42.0688442Z add.s32 %r180, %r137, 98304; 2026-02-21T09:47:42.0688600Z add.s32 %r23, %r180, %r178; 2026-02-21T09:47:42.0688802Z xor.b32 %r181, %r178, 16; 2026-02-21T09:47:42.0688964Z add.s32 %r24, %r180, %r181; 
2026-02-21T09:47:42.0689117Z xor.b32 %r182, %r178, 32; 2026-02-21T09:47:42.0689276Z add.s32 %r25, %r180, %r182; 2026-02-21T09:47:42.0689434Z xor.b32 %r183, %r178, 48; 2026-02-21T09:47:42.0689584Z add.s32 %r26, %r180, %r183; 2026-02-21T09:47:42.0689745Z xor.b32 %r184, %r178, 64; 2026-02-21T09:47:42.0689892Z add.s32 %r27, %r180, %r184; 2026-02-21T09:47:42.0690052Z xor.b32 %r185, %r178, 80; 2026-02-21T09:47:42.0690201Z add.s32 %r28, %r180, %r185; 2026-02-21T09:47:42.0690361Z xor.b32 %r186, %r178, 96; 2026-02-21T09:47:42.0690508Z add.s32 %r29, %r180, %r186; 2026-02-21T09:47:42.0690669Z xor.b32 %r187, %r178, 112; 2026-02-21T09:47:42.0690822Z add.s32 %r30, %r180, %r187; 2026-02-21T09:47:42.0691032Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:42.0691362Z .loc 1 39 35 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:39:35 2026-02-21T09:47:42.0691644Z shr.s32 %r351, %r466, 31; 2026-02-21T09:47:42.0691800Z shr.u32 %r352, %r351, 26; 2026-02-21T09:47:42.0691950Z add.s32 %r353, %r466, %r352; 2026-02-21T09:47:42.0692214Z .loc 1 42 64 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:42:64 2026-02-21T09:47:42.0692525Z and.b32 %r354, %r353, 65472; 2026-02-21T09:47:42.0692684Z sub.s32 %r355, %r466, %r354; 2026-02-21T09:47:42.0692842Z cvt.u16.u32 %rs1, %r355; 2026-02-21T09:47:42.0692991Z cvt.s8.s32 %rs2, %r355; 2026-02-21T09:47:42.0693243Z .loc 1 43 51 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:43:51 2026-02-21T09:47:42.0693510Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:42.0693663Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:42.0693805Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:42.0693960Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:42.0694104Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:42.0694352Z .loc 1 42 64 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:42:64 2026-02-21T09:47:42.0694631Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:47:42.0694801Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:42.0695062Z .loc 1 44 27 // 
ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:44:27 2026-02-21T09:47:42.0695337Z shl.b32 %r356, %r353, 2; 2026-02-21T09:47:42.0695497Z and.b32 %r357, %r356, -256; 2026-02-21T09:47:42.0695676Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:42.0695841Z mad.wide.s16 %r348, %rs10, 64, %r357; 2026-02-21T09:47:42.0696110Z .loc 1 45 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:45:27 2026-02-21T09:47:42.0696392Z mul.wide.s16 %r349, %rs7, 128; 2026-02-21T09:47:42.0696653Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0696932Z shfl.sync.idx.b32 %r358, %r2, 0, 31, -1; 2026-02-21T09:47:42.0697115Z shl.b32 %r359, %r358, 21; 2026-02-21T09:47:42.0697263Z and.b32 %r360, %r359, 6291456; 2026-02-21T09:47:42.0697422Z add.s32 %r188, %r360, %r459; 2026-02-21T09:47:42.0697573Z mov.pred %p91, -1; 2026-02-21T09:47:42.0697717Z mov.b32 %r189, 0; 2026-02-21T09:47:42.0697849Z // begin inline asm 2026-02-21T09:47:42.0698243Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 0], {%r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189}; 2026-02-21T09:47:42.0698652Z // end inline asm 2026-02-21T09:47:42.0698784Z // begin inline asm 2026-02-21T09:47:42.0699136Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 16], {%r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189}; 2026-02-21T09:47:42.0699514Z // end inline asm 2026-02-21T09:47:42.0699651Z // begin inline asm 2026-02-21T09:47:42.0700000Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 32], {%r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189}; 2026-02-21T09:47:42.0700399Z // end inline asm 2026-02-21T09:47:42.0700539Z // begin inline asm 2026-02-21T09:47:42.0700881Z @%p91 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r188 + 48], {%r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, %r189, 
%r189, %r189, %r189, %r189, %r189, %r189, %r189}; 2026-02-21T09:47:42.0701265Z // end inline asm 2026-02-21T09:47:42.0701395Z // begin inline asm 2026-02-21T09:47:42.0701550Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:42.0701713Z // end inline asm 2026-02-21T09:47:42.0701842Z bar.sync 0, 128; 2026-02-21T09:47:42.0702091Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0702368Z add.s32 %r256, %r137, 114816; 2026-02-21T09:47:42.0702523Z // begin inline asm 2026-02-21T09:47:42.0702682Z @%p37 mbarrier.init.shared::cta.b64 [%r256], 1; 2026-02-21T09:47:42.0702871Z // end inline asm 2026-02-21T09:47:42.0702999Z bar.sync 0, 128; 2026-02-21T09:47:42.0703138Z add.s32 %r257, %r137, 114824; 2026-02-21T09:47:42.0703292Z // begin inline asm 2026-02-21T09:47:42.0703451Z @%p37 mbarrier.init.shared::cta.b64 [%r257], 1; 2026-02-21T09:47:42.0703638Z // end inline asm 2026-02-21T09:47:42.0703763Z bar.sync 0, 128; 2026-02-21T09:47:42.0703901Z add.s32 %r258, %r137, 114832; 2026-02-21T09:47:42.0704073Z // begin inline asm 2026-02-21T09:47:42.0704238Z @%p37 mbarrier.init.shared::cta.b64 [%r258], 1; 2026-02-21T09:47:42.0704418Z // end inline asm 2026-02-21T09:47:42.0704550Z bar.sync 0, 128; 2026-02-21T09:47:42.0704708Z add.s32 %r259, %r137, 114840; 2026-02-21T09:47:42.0704868Z // begin inline asm 2026-02-21T09:47:42.0705030Z @%p37 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:47:42.0705206Z // end inline asm 2026-02-21T09:47:42.0705347Z add.s32 %r260, %r137, 114848; 2026-02-21T09:47:42.0705495Z // begin inline asm 2026-02-21T09:47:42.0705659Z @%p37 mbarrier.init.shared::cta.b64 [%r260], 1; 2026-02-21T09:47:42.0705840Z // end inline asm 2026-02-21T09:47:42.0705977Z bar.sync 0, 128; 2026-02-21T09:47:42.0706109Z add.s32 %r261, %r137, 114856; 2026-02-21T09:47:42.0706265Z // begin inline asm 2026-02-21T09:47:42.0706425Z @%p37 mbarrier.init.shared::cta.b64 [%r261], 1; 2026-02-21T09:47:42.0706599Z // end inline asm 
2026-02-21T09:47:42.0706732Z bar.sync 0, 128; 2026-02-21T09:47:42.0706862Z add.s32 %r262, %r137, 114864; 2026-02-21T09:47:42.0707018Z // begin inline asm 2026-02-21T09:47:42.0707197Z @%p37 mbarrier.init.shared::cta.b64 [%r262], 1; 2026-02-21T09:47:42.0707381Z // end inline asm 2026-02-21T09:47:42.0707508Z bar.sync 0, 128; 2026-02-21T09:47:42.0707646Z add.s32 %r263, %r137, 114872; 2026-02-21T09:47:42.0707793Z // begin inline asm 2026-02-21T09:47:42.0707960Z @%p37 mbarrier.init.shared::cta.b64 [%r263], 1; 2026-02-21T09:47:42.0708151Z // end inline asm 2026-02-21T09:47:42.0708396Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0708688Z bar.sync 0, 128; 2026-02-21T09:47:42.0708821Z // begin inline asm 2026-02-21T09:47:42.0708997Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r256]; 2026-02-21T09:47:42.0709192Z // end inline asm 2026-02-21T09:47:42.0709332Z bar.sync 0, 128; 2026-02-21T09:47:42.0709466Z // begin inline asm 2026-02-21T09:47:42.0709639Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r257]; 2026-02-21T09:47:42.0709871Z // end inline asm 2026-02-21T09:47:42.0710005Z bar.sync 0, 128; 2026-02-21T09:47:42.0710147Z // begin inline asm 2026-02-21T09:47:42.0710307Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r258]; 2026-02-21T09:47:42.0710503Z // end inline asm 2026-02-21T09:47:42.0710631Z bar.sync 0, 128; 2026-02-21T09:47:42.0710769Z // begin inline asm 2026-02-21T09:47:42.0710931Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r259]; 2026-02-21T09:47:42.0711125Z // end inline asm 2026-02-21T09:47:42.0711379Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0711693Z bar.sync 0, 128; 2026-02-21T09:47:42.0711839Z add.s32 %r268, %r137, 114880; 2026-02-21T09:47:42.0711994Z // begin inline asm 2026-02-21T09:47:42.0712160Z @%p37 mbarrier.init.shared::cta.b64 [%r268], 1; 2026-02-21T09:47:42.0712345Z // end inline asm 2026-02-21T09:47:42.0712511Z st.shared.b32 [global_smem+114888], 
33554689; 2026-02-21T09:47:42.0712721Z st.shared.b32 [global_smem+114688], %r459; 2026-02-21T09:47:42.0712953Z st.shared.v2.b32 [global_smem+114696], {%r349, %r348}; 2026-02-21T09:47:42.0713165Z barrier.sync 1; 2026-02-21T09:47:42.0713327Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.0713518Z barrier.sync 1; 2026-02-21T09:47:42.0713655Z barrier.sync 1; 2026-02-21T09:47:42.0713821Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.0714114Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0714413Z bar.sync 0, 128; 2026-02-21T09:47:42.0714548Z // begin inline asm 2026-02-21T09:47:42.0714727Z 2026-02-21T09:47:42.0714847Z { 2026-02-21T09:47:42.0714993Z .reg .pred complete; 2026-02-21T09:47:42.0715151Z waitLoop: 2026-02-21T09:47:42.0715345Z mbarrier.try_wait.parity.shared.b64 complete, [%r268], %r189; 2026-02-21T09:47:42.0715598Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.0715755Z } 2026-02-21T09:47:42.0715848Z 2026-02-21T09:47:42.0715915Z // end inline asm 2026-02-21T09:47:42.0716164Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0716465Z bar.sync 0, 128; 2026-02-21T09:47:42.0716601Z // begin inline asm 2026-02-21T09:47:42.0716780Z @%p37 mbarrier.inval.shared::cta.b64 [%r268]; 2026-02-21T09:47:42.0716966Z // end inline asm 2026-02-21T09:47:42.0717094Z // begin inline asm 2026-02-21T09:47:42.0717255Z @%p37 mbarrier.inval.shared::cta.b64 [%r260]; 2026-02-21T09:47:42.0717434Z // end inline asm 2026-02-21T09:47:42.0717567Z bar.sync 0, 128; 2026-02-21T09:47:42.0717694Z // begin inline asm 2026-02-21T09:47:42.0717857Z @%p37 mbarrier.inval.shared::cta.b64 [%r261]; 2026-02-21T09:47:42.0718031Z // end inline asm 2026-02-21T09:47:42.0718164Z bar.sync 0, 128; 2026-02-21T09:47:42.0718289Z // begin inline asm 2026-02-21T09:47:42.0718449Z @%p37 mbarrier.inval.shared::cta.b64 [%r262]; 2026-02-21T09:47:42.0718630Z // end inline asm 2026-02-21T09:47:42.0718755Z 
bar.sync 0, 128; 2026-02-21T09:47:42.0718888Z // begin inline asm 2026-02-21T09:47:42.0719066Z @%p37 mbarrier.inval.shared::cta.b64 [%r263]; 2026-02-21T09:47:42.0719246Z // end inline asm 2026-02-21T09:47:42.0719371Z // begin inline asm 2026-02-21T09:47:42.0719527Z @%p37 mbarrier.inval.shared::cta.b64 [%r256]; 2026-02-21T09:47:42.0719697Z // end inline asm 2026-02-21T09:47:42.0719830Z bar.sync 0, 128; 2026-02-21T09:47:42.0719963Z // begin inline asm 2026-02-21T09:47:42.0720113Z @%p37 mbarrier.inval.shared::cta.b64 [%r257]; 2026-02-21T09:47:42.0720290Z // end inline asm 2026-02-21T09:47:42.0720414Z bar.sync 0, 128; 2026-02-21T09:47:42.0720550Z // begin inline asm 2026-02-21T09:47:42.0720701Z @%p37 mbarrier.inval.shared::cta.b64 [%r258]; 2026-02-21T09:47:42.0720881Z // end inline asm 2026-02-21T09:47:42.0721008Z bar.sync 0, 128; 2026-02-21T09:47:42.0721140Z // begin inline asm 2026-02-21T09:47:42.0721292Z @%p37 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:47:42.0721498Z // end inline asm 2026-02-21T09:47:42.0721745Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0722029Z // begin inline asm 2026-02-21T09:47:42.0722383Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295}, [%r188 + 0]; 2026-02-21T09:47:42.0722767Z // end inline asm 2026-02-21T09:47:42.0722902Z // begin inline asm 2026-02-21T09:47:42.0723248Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r297, %r298, %r299, %r300, %r301, %r302, %r303, %r304, %r305, %r306, %r307, %r308, %r309, %r310, %r311, %r312}, [%r188 + 16]; 2026-02-21T09:47:42.0723650Z // end inline asm 2026-02-21T09:47:42.0723790Z // begin inline asm 2026-02-21T09:47:42.0724127Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r314, %r315, %r316, %r317, %r318, %r319, %r320, %r321, %r322, %r323, %r324, %r325, %r326, %r327, %r328, %r329}, [%r188 + 32]; 2026-02-21T09:47:42.0724503Z // end inline asm 
2026-02-21T09:47:42.0724632Z // begin inline asm 2026-02-21T09:47:42.0725018Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r331, %r332, %r333, %r334, %r335, %r336, %r337, %r338, %r339, %r340, %r341, %r342, %r343, %r344, %r345, %r346}, [%r188 + 48]; 2026-02-21T09:47:42.0725389Z // end inline asm 2026-02-21T09:47:42.0725515Z // begin inline asm 2026-02-21T09:47:42.0725668Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:42.0725823Z // end inline asm 2026-02-21T09:47:42.0725963Z cvt.u64.u32 %rd108, %r280; 2026-02-21T09:47:42.0726119Z cvt.u64.u32 %rd109, %r281; 2026-02-21T09:47:42.0726279Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:47:42.0726437Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:47:42.0726714Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0727012Z mov.b64 {%r362, %r363}, %rd111; 2026-02-21T09:47:42.0727184Z cvt.rn.f16x2.f32 %r364, %r363, %r362; 2026-02-21T09:47:42.0727495Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0727774Z cvt.u64.u32 %rd112, %r282; 2026-02-21T09:47:42.0727928Z cvt.u64.u32 %rd113, %r283; 2026-02-21T09:47:42.0728075Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:47:42.0728234Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:47:42.0728490Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0728768Z mov.b64 {%r365, %r366}, %rd115; 2026-02-21T09:47:42.0728936Z cvt.rn.f16x2.f32 %r367, %r366, %r365; 2026-02-21T09:47:42.0729200Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0729478Z cvt.u64.u32 %rd116, %r284; 2026-02-21T09:47:42.0729625Z cvt.u64.u32 %rd117, %r285; 2026-02-21T09:47:42.0729779Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:47:42.0729930Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:47:42.0730191Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0730466Z mov.b64 {%r368, %r369}, %rd119; 
2026-02-21T09:47:42.0730654Z cvt.rn.f16x2.f32 %r370, %r369, %r368; 2026-02-21T09:47:42.0730925Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0731205Z cvt.u64.u32 %rd120, %r286; 2026-02-21T09:47:42.0731356Z cvt.u64.u32 %rd121, %r287; 2026-02-21T09:47:42.0731500Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:47:42.0731658Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:47:42.0731818Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0731882Z mov.b64 {%r371, %r372}, %rd123; 2026-02-21T09:47:42.0731943Z cvt.rn.f16x2.f32 %r373, %r372, %r371; 2026-02-21T09:47:42.0732099Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0732155Z cvt.u64.u32 %rd124, %r288; 2026-02-21T09:47:42.0732264Z cvt.u64.u32 %rd125, %r289; 2026-02-21T09:47:42.0732322Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:47:42.0732381Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:47:42.0732549Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0732605Z mov.b64 {%r374, %r375}, %rd127; 2026-02-21T09:47:42.0732665Z cvt.rn.f16x2.f32 %r376, %r375, %r374; 2026-02-21T09:47:42.0732835Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0732891Z cvt.u64.u32 %rd128, %r290; 2026-02-21T09:47:42.0732972Z cvt.u64.u32 %rd129, %r291; 2026-02-21T09:47:42.0733029Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:47:42.0733094Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:47:42.0733254Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0733311Z mov.b64 {%r377, %r378}, %rd131; 2026-02-21T09:47:42.0733380Z cvt.rn.f16x2.f32 %r379, %r378, %r377; 2026-02-21T09:47:42.0733539Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0733596Z cvt.u64.u32 %rd132, %r292; 2026-02-21T09:47:42.0733660Z cvt.u64.u32 %rd133, %r293; 
2026-02-21T09:47:42.0733718Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:47:42.0733773Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:47:42.0733933Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0733996Z mov.b64 {%r380, %r381}, %rd135; 2026-02-21T09:47:42.0734057Z cvt.rn.f16x2.f32 %r382, %r381, %r380; 2026-02-21T09:47:42.0734216Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0734280Z cvt.u64.u32 %rd136, %r294; 2026-02-21T09:47:42.0734335Z cvt.u64.u32 %rd137, %r295; 2026-02-21T09:47:42.0734390Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:47:42.0734468Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:47:42.0734639Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0734731Z mov.b64 {%r383, %r384}, %rd139; 2026-02-21T09:47:42.0734792Z cvt.rn.f16x2.f32 %r385, %r384, %r383; 2026-02-21T09:47:42.0734963Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0735018Z cvt.u64.u32 %rd140, %r297; 2026-02-21T09:47:42.0735074Z cvt.u64.u32 %rd141, %r298; 2026-02-21T09:47:42.0735136Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:47:42.0735192Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:47:42.0735354Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0735417Z mov.b64 {%r386, %r387}, %rd143; 2026-02-21T09:47:42.0735477Z cvt.rn.f16x2.f32 %r388, %r387, %r386; 2026-02-21T09:47:42.0735642Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0735698Z cvt.u64.u32 %rd144, %r299; 2026-02-21T09:47:42.0735783Z cvt.u64.u32 %rd145, %r300; 2026-02-21T09:47:42.0735838Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:47:42.0735896Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:47:42.0736064Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0736121Z mov.b64 {%r389, %r390}, %rd147; 
2026-02-21T09:47:42.0736181Z cvt.rn.f16x2.f32 %r391, %r390, %r389; 2026-02-21T09:47:42.0736351Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0736408Z cvt.u64.u32 %rd148, %r301; 2026-02-21T09:47:42.0736463Z cvt.u64.u32 %rd149, %r302; 2026-02-21T09:47:42.0736517Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:47:42.0736576Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:47:42.0736739Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0736828Z mov.b64 {%r392, %r393}, %rd151; 2026-02-21T09:47:42.0736899Z cvt.rn.f16x2.f32 %r394, %r393, %r392; 2026-02-21T09:47:42.0737059Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0737115Z cvt.u64.u32 %rd152, %r303; 2026-02-21T09:47:42.0737176Z cvt.u64.u32 %rd153, %r304; 2026-02-21T09:47:42.0737232Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:47:42.0737288Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:47:42.0737446Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0737532Z mov.b64 {%r395, %r396}, %rd155; 2026-02-21T09:47:42.0737594Z cvt.rn.f16x2.f32 %r397, %r396, %r395; 2026-02-21T09:47:42.0737756Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0737821Z cvt.u64.u32 %rd156, %r305; 2026-02-21T09:47:42.0737879Z cvt.u64.u32 %rd157, %r306; 2026-02-21T09:47:42.0737937Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:47:42.0737994Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:47:42.0738162Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0738219Z mov.b64 {%r398, %r399}, %rd159; 2026-02-21T09:47:42.0738280Z cvt.rn.f16x2.f32 %r400, %r399, %r398; 2026-02-21T09:47:42.0738451Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0738506Z cvt.u64.u32 %rd160, %r307; 2026-02-21T09:47:42.0738562Z cvt.u64.u32 %rd161, %r308; 
2026-02-21T09:47:42.0738626Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:47:42.0738684Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:47:42.0738844Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0738901Z mov.b64 {%r401, %r402}, %rd163; 2026-02-21T09:47:42.0738993Z cvt.rn.f16x2.f32 %r403, %r402, %r401; 2026-02-21T09:47:42.0739153Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0739211Z cvt.u64.u32 %rd164, %r309; 2026-02-21T09:47:42.0739270Z cvt.u64.u32 %rd165, %r310; 2026-02-21T09:47:42.0739323Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:47:42.0739377Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:47:42.0739539Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0739595Z mov.b64 {%r404, %r405}, %rd167; 2026-02-21T09:47:42.0739653Z cvt.rn.f16x2.f32 %r406, %r405, %r404; 2026-02-21T09:47:42.0739818Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0739872Z cvt.u64.u32 %rd168, %r311; 2026-02-21T09:47:42.0739927Z cvt.u64.u32 %rd169, %r312; 2026-02-21T09:47:42.0739981Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:47:42.0740043Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:47:42.0740204Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0740283Z mov.b64 {%r407, %r408}, %rd171; 2026-02-21T09:47:42.0740348Z cvt.rn.f16x2.f32 %r409, %r408, %r407; 2026-02-21T09:47:42.0740507Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0740561Z cvt.u64.u32 %rd172, %r314; 2026-02-21T09:47:42.0740616Z cvt.u64.u32 %rd173, %r315; 2026-02-21T09:47:42.0740677Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:47:42.0740731Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:47:42.0740889Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0740954Z mov.b64 {%r410, %r411}, %rd175; 
2026-02-21T09:47:42.0741014Z cvt.rn.f16x2.f32 %r412, %r411, %r410; 2026-02-21T09:47:42.0741175Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0741261Z cvt.u64.u32 %rd176, %r316; 2026-02-21T09:47:42.0741320Z cvt.u64.u32 %rd177, %r317; 2026-02-21T09:47:42.0741377Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:47:42.0741432Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:47:42.0741603Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0741659Z mov.b64 {%r413, %r414}, %rd179; 2026-02-21T09:47:42.0741719Z cvt.rn.f16x2.f32 %r415, %r414, %r413; 2026-02-21T09:47:42.0741883Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0741939Z cvt.u64.u32 %rd180, %r318; 2026-02-21T09:47:42.0742013Z cvt.u64.u32 %rd181, %r319; 2026-02-21T09:47:42.0742078Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:47:42.0742134Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:47:42.0742299Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0742358Z mov.b64 {%r416, %r417}, %rd183; 2026-02-21T09:47:42.0742429Z cvt.rn.f16x2.f32 %r418, %r417, %r416; 2026-02-21T09:47:42.0742592Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0742651Z cvt.u64.u32 %rd184, %r320; 2026-02-21T09:47:42.0742719Z cvt.u64.u32 %rd185, %r321; 2026-02-21T09:47:42.0742776Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:47:42.0742838Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:47:42.0743010Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0743067Z mov.b64 {%r419, %r420}, %rd187; 2026-02-21T09:47:42.0743129Z cvt.rn.f16x2.f32 %r421, %r420, %r419; 2026-02-21T09:47:42.0743288Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0743353Z cvt.u64.u32 %rd188, %r322; 2026-02-21T09:47:42.0743408Z cvt.u64.u32 %rd189, %r323; 
2026-02-21T09:47:42.0743487Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:47:42.0743556Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:47:42.0743710Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0743768Z mov.b64 {%r422, %r423}, %rd191; 2026-02-21T09:47:42.0743837Z cvt.rn.f16x2.f32 %r424, %r423, %r422; 2026-02-21T09:47:42.0743990Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0744046Z cvt.u64.u32 %rd192, %r324; 2026-02-21T09:47:42.0744101Z cvt.u64.u32 %rd193, %r325; 2026-02-21T09:47:42.0744164Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:47:42.0744222Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:47:42.0744375Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0744439Z mov.b64 {%r425, %r426}, %rd195; 2026-02-21T09:47:42.0744500Z cvt.rn.f16x2.f32 %r427, %r426, %r425; 2026-02-21T09:47:42.0744655Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0744743Z cvt.u64.u32 %rd196, %r326; 2026-02-21T09:47:42.0744835Z cvt.u64.u32 %rd197, %r327; 2026-02-21T09:47:42.0744890Z shl.b64 %rd198, %rd197, 32; 2026-02-21T09:47:42.0744947Z or.b64 %rd199, %rd196, %rd198; 2026-02-21T09:47:42.0745119Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0745175Z mov.b64 {%r428, %r429}, %rd199; 2026-02-21T09:47:42.0745235Z cvt.rn.f16x2.f32 %r430, %r429, %r428; 2026-02-21T09:47:42.0745404Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0745462Z cvt.u64.u32 %rd200, %r328; 2026-02-21T09:47:42.0745518Z cvt.u64.u32 %rd201, %r329; 2026-02-21T09:47:42.0745580Z shl.b64 %rd202, %rd201, 32; 2026-02-21T09:47:42.0745636Z or.b64 %rd203, %rd200, %rd202; 2026-02-21T09:47:42.0745832Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0745890Z mov.b64 {%r431, %r432}, %rd203; 
2026-02-21T09:47:42.0745959Z cvt.rn.f16x2.f32 %r433, %r432, %r431; 2026-02-21T09:47:42.0746117Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0746173Z cvt.u64.u32 %rd204, %r331; 2026-02-21T09:47:42.0746238Z cvt.u64.u32 %rd205, %r332; 2026-02-21T09:47:42.0746293Z shl.b64 %rd206, %rd205, 32; 2026-02-21T09:47:42.0746350Z or.b64 %rd207, %rd204, %rd206; 2026-02-21T09:47:42.0746521Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0746605Z mov.b64 {%r434, %r435}, %rd207; 2026-02-21T09:47:42.0746664Z cvt.rn.f16x2.f32 %r436, %r435, %r434; 2026-02-21T09:47:42.0746827Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0746889Z cvt.u64.u32 %rd208, %r333; 2026-02-21T09:47:42.0746947Z cvt.u64.u32 %rd209, %r334; 2026-02-21T09:47:42.0747003Z shl.b64 %rd210, %rd209, 32; 2026-02-21T09:47:42.0747068Z or.b64 %rd211, %rd208, %rd210; 2026-02-21T09:47:42.0747225Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0747282Z mov.b64 {%r437, %r438}, %rd211; 2026-02-21T09:47:42.0747349Z cvt.rn.f16x2.f32 %r439, %r438, %r437; 2026-02-21T09:47:42.0747505Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0747560Z cvt.u64.u32 %rd212, %r335; 2026-02-21T09:47:42.0747615Z cvt.u64.u32 %rd213, %r336; 2026-02-21T09:47:42.0747678Z shl.b64 %rd214, %rd213, 32; 2026-02-21T09:47:42.0747735Z or.b64 %rd215, %rd212, %rd214; 2026-02-21T09:47:42.0747891Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0747954Z mov.b64 {%r440, %r441}, %rd215; 2026-02-21T09:47:42.0748038Z cvt.rn.f16x2.f32 %r442, %r441, %r440; 2026-02-21T09:47:42.0748199Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0748263Z cvt.u64.u32 %rd216, %r337; 2026-02-21T09:47:42.0748319Z cvt.u64.u32 %rd217, %r338; 
2026-02-21T09:47:42.0748374Z shl.b64 %rd218, %rd217, 32; 2026-02-21T09:47:42.0748429Z or.b64 %rd219, %rd216, %rd218; 2026-02-21T09:47:42.0748599Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0748655Z mov.b64 {%r443, %r444}, %rd219; 2026-02-21T09:47:42.0748715Z cvt.rn.f16x2.f32 %r445, %r444, %r443; 2026-02-21T09:47:42.0748882Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0748939Z cvt.u64.u32 %rd220, %r339; 2026-02-21T09:47:42.0748994Z cvt.u64.u32 %rd221, %r340; 2026-02-21T09:47:42.0749056Z shl.b64 %rd222, %rd221, 32; 2026-02-21T09:47:42.0749114Z or.b64 %rd223, %rd220, %rd222; 2026-02-21T09:47:42.0749276Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0749370Z mov.b64 {%r446, %r447}, %rd223; 2026-02-21T09:47:42.0749437Z cvt.rn.f16x2.f32 %r448, %r447, %r446; 2026-02-21T09:47:42.0749601Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0749657Z cvt.u64.u32 %rd224, %r341; 2026-02-21T09:47:42.0749719Z cvt.u64.u32 %rd225, %r342; 2026-02-21T09:47:42.0749774Z shl.b64 %rd226, %rd225, 32; 2026-02-21T09:47:42.0749829Z or.b64 %rd227, %rd224, %rd226; 2026-02-21T09:47:42.0749999Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0750056Z mov.b64 {%r449, %r450}, %rd227; 2026-02-21T09:47:42.0750117Z cvt.rn.f16x2.f32 %r451, %r450, %r449; 2026-02-21T09:47:42.0750282Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0750369Z cvt.u64.u32 %rd228, %r343; 2026-02-21T09:47:42.0750426Z cvt.u64.u32 %rd229, %r344; 2026-02-21T09:47:42.0750485Z shl.b64 %rd230, %rd229, 32; 2026-02-21T09:47:42.0750548Z or.b64 %rd231, %rd228, %rd230; 2026-02-21T09:47:42.0750712Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0750769Z mov.b64 {%r452, %r453}, %rd231; 
2026-02-21T09:47:42.0750835Z cvt.rn.f16x2.f32 %r454, %r453, %r452; 2026-02-21T09:47:42.0750992Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0751068Z cvt.u64.u32 %rd232, %r345; 2026-02-21T09:47:42.0751124Z cvt.u64.u32 %rd233, %r346; 2026-02-21T09:47:42.0751190Z shl.b64 %rd234, %rd233, 32; 2026-02-21T09:47:42.0751247Z or.b64 %rd235, %rd232, %rd234; 2026-02-21T09:47:42.0751413Z .loc 1 58 27 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:58:27 2026-02-21T09:47:42.0751489Z mov.b64 {%r455, %r456}, %rd235; 2026-02-21T09:47:42.0751551Z cvt.rn.f16x2.f32 %r457, %r456, %r455; 2026-02-21T09:47:42.0751712Z .loc 1 59 45 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:59:45 2026-02-21T09:47:42.0751793Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:42.0751848Z bar.sync 0, 128; 2026-02-21T09:47:42.0751944Z st.shared.v4.b32 [%r23], {%r364, %r367, %r370, %r373}; 2026-02-21T09:47:42.0752037Z st.shared.v4.b32 [%r24], {%r376, %r379, %r382, %r385}; 2026-02-21T09:47:42.0752131Z st.shared.v4.b32 [%r25], {%r388, %r391, %r394, %r397}; 2026-02-21T09:47:42.0752216Z st.shared.v4.b32 [%r26], {%r400, %r403, %r406, %r409}; 2026-02-21T09:47:42.0752301Z st.shared.v4.b32 [%r27], {%r412, %r415, %r418, %r421}; 2026-02-21T09:47:42.0752391Z st.shared.v4.b32 [%r28], {%r424, %r427, %r430, %r433}; 2026-02-21T09:47:42.0752473Z st.shared.v4.b32 [%r29], {%r436, %r439, %r442, %r445}; 2026-02-21T09:47:42.0752594Z st.shared.v4.b32 [%r30], {%r448, %r451, %r454, %r457}; 2026-02-21T09:47:42.0752663Z // begin inline asm 2026-02-21T09:47:42.0752741Z fence.proxy.async.shared::cta; 2026-02-21T09:47:42.0752798Z // end inline asm 2026-02-21T09:47:42.0752855Z bar.sync 0, 128; 2026-02-21T09:47:42.0752930Z elect.sync %r458|%p119, -1; 2026-02-21T09:47:42.0752996Z and.pred %p117, %p34, %p119; 2026-02-21T09:47:42.0753055Z // begin inline asm 2026-02-21T09:47:42.0753254Z @%p117 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd107, {%r348, 
%r349}], [%r180]; 2026-02-21T09:47:42.0753309Z // end inline asm 2026-02-21T09:47:42.0753375Z cp.async.bulk.commit_group; 2026-02-21T09:47:42.0753553Z .loc 1 33 84 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:33:84 2026-02-21T09:47:42.0753614Z add.s32 %r466, %r466, 1; 2026-02-21T09:47:42.0753678Z setp.ne.b32 %p120, %r22, %r466; 2026-02-21T09:47:42.0753739Z @%p120 bra $L__BB0_14; 2026-02-21T09:47:42.0753834Z $L__BB0_15: // %._crit_edge 2026-02-21T09:47:42.0753908Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:42.0753964Z bar.sync 0, 128; 2026-02-21T09:47:42.0754165Z .loc 1 33 4 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:33:4 2026-02-21T09:47:42.0754224Z bar.sync 0, 128; 2026-02-21T09:47:42.0754282Z // begin inline asm 2026-02-21T09:47:42.0754410Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r459, 64; 2026-02-21T09:47:42.0754466Z // end inline asm 2026-02-21T09:47:42.0754549Z st.shared.b32 [global_smem+114888], 50529027; 2026-02-21T09:47:42.0754606Z barrier.sync 1; 2026-02-21T09:47:42.0754726Z $L__BB0_16: // %common.ret 2026-02-21T09:47:42.0754898Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0754952Z ret; 2026-02-21T09:47:42.0755056Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:47:42.0755117Z mov.b32 %r35, global_smem; 2026-02-21T09:47:42.0755206Z add.s32 %r36, %r35, %r3; 2026-02-21T09:47:42.0755268Z add.s32 %r69, %r35, 114848; 2026-02-21T09:47:42.0755337Z bfe.u32 %r83, %r35, 4, 14; 2026-02-21T09:47:42.0755398Z cvt.u64.u32 %rd24, %r83; 2026-02-21T09:47:42.0755468Z or.b64 %rd14, %rd24, 4611686293372403712; 2026-02-21T09:47:42.0755533Z add.s32 %r84, %r35, 65536; 2026-02-21T09:47:42.0755590Z bfe.u32 %r85, %r84, 4, 14; 2026-02-21T09:47:42.0755649Z cvt.u64.u32 %rd25, %r85; 2026-02-21T09:47:42.0755722Z or.b64 %rd15, %rd25, 4611686293338849280; 2026-02-21T09:47:42.0755780Z add.s32 %r86, %r35, 32; 2026-02-21T09:47:42.0755836Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:47:42.0755919Z 
bra.uni $L__BB0_2; 2026-02-21T09:47:42.0756033Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0756203Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0756285Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0756351Z barrier.sync 1; 2026-02-21T09:47:42.0756410Z barrier.sync 1; 2026-02-21T09:47:42.0756490Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0756571Z $L__BB0_2: // %.preheader 2026-02-21T09:47:42.0756669Z // =>This Loop Header: Depth=1 2026-02-21T09:47:42.0756759Z // Child Loop BB0_9 Depth 2 2026-02-21T09:47:42.0756846Z // Child Loop BB0_6 Depth 2 2026-02-21T09:47:42.0757023Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19 2026-02-21T09:47:42.0757103Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:47:42.0757160Z barrier.sync 1; 2026-02-21T09:47:42.0757235Z ld.shared.b8 %r34, [%r36+114884]; 2026-02-21T09:47:42.0757300Z setp.gt.u32 %p2, %r34, 3; 2026-02-21T09:47:42.0757359Z @%p2 bra $L__BB0_4; 2026-02-21T09:47:42.0757467Z // %bb.3: // %.preheader 2026-02-21T09:47:42.0757569Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0757634Z $L_brx_0: .branchtargets 2026-02-21T09:47:42.0757690Z $L__BB0_5, 2026-02-21T09:47:42.0757750Z $L__BB0_8, 2026-02-21T09:47:42.0757804Z $L__BB0_11, 2026-02-21T09:47:42.0757856Z $L__BB0_16; 2026-02-21T09:47:42.0757918Z brx.idx %r34, $L_brx_0; 2026-02-21T09:47:42.0758004Z $L__BB0_5: // %.peel.next 2026-02-21T09:47:42.0758091Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0758260Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0758346Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0758423Z ld.shared.b32 %r71, [global_smem+114688]; 2026-02-21T09:47:42.0758480Z barrier.sync 1; 2026-02-21T09:47:42.0758659Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0758723Z bar.warp.sync -1; 2026-02-21T09:47:42.0758779Z 
mov.b32 %r460, 0; 2026-02-21T09:47:42.0758866Z // begin inline asm 2026-02-21T09:47:42.0758926Z 2026-02-21T09:47:42.0758976Z { 2026-02-21T09:47:42.0759038Z .reg .pred complete; 2026-02-21T09:47:42.0759102Z waitLoop: 2026-02-21T09:47:42.0759223Z mbarrier.try_wait.parity.shared.b64 complete, [%r69], %r460; 2026-02-21T09:47:42.0759291Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.0759354Z } 2026-02-21T09:47:42.0759366Z 2026-02-21T09:47:42.0759422Z // end inline asm 2026-02-21T09:47:42.0759580Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0759646Z elect.sync %r82|%p12, -1; 2026-02-21T09:47:42.0759714Z mov.b32 %r72, 135266320; 2026-02-21T09:47:42.0759772Z mov.pred %p11, 0; 2026-02-21T09:47:42.0759827Z // begin inline asm 2026-02-21T09:47:42.0759971Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd14, %rd15, %r72, %p11; 2026-02-21T09:47:42.0760045Z // end inline asm 2026-02-21T09:47:42.0760105Z cvt.u64.u32 %rd26, %r87; 2026-02-21T09:47:42.0760171Z or.b64 %rd16, %rd26, 4611686293372403712; 2026-02-21T09:47:42.0760236Z add.s32 %r88, %r35, 65568; 2026-02-21T09:47:42.0760291Z bfe.u32 %r89, %r88, 4, 14; 2026-02-21T09:47:42.0760347Z cvt.u64.u32 %rd27, %r89; 2026-02-21T09:47:42.0760417Z or.b64 %rd17, %rd27, 4611686293338849280; 2026-02-21T09:47:42.0760477Z mov.pred %p13, -1; 2026-02-21T09:47:42.0760532Z // begin inline asm 2026-02-21T09:47:42.0760672Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd16, %rd17, %r72, %p13; 2026-02-21T09:47:42.0760725Z // end inline asm 2026-02-21T09:47:42.0760806Z add.s32 %r90, %r35, 64; 2026-02-21T09:47:42.0760862Z bfe.u32 %r91, %r90, 4, 14; 2026-02-21T09:47:42.0760925Z cvt.u64.u32 %rd28, %r91; 2026-02-21T09:47:42.0760989Z or.b64 %rd18, %rd28, 4611686293372403712; 2026-02-21T09:47:42.0761043Z add.s32 %r92, %r35, 65600; 2026-02-21T09:47:42.0761106Z bfe.u32 %r93, %r92, 4, 14; 2026-02-21T09:47:42.0761162Z cvt.u64.u32 %rd29, %r93; 2026-02-21T09:47:42.0761223Z or.b64 %rd19, 
%rd29, 4611686293338849280; 2026-02-21T09:47:42.0761279Z // begin inline asm 2026-02-21T09:47:42.0761413Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd18, %rd19, %r72, %p13; 2026-02-21T09:47:42.0761468Z // end inline asm 2026-02-21T09:47:42.0761524Z add.s32 %r94, %r35, 96; 2026-02-21T09:47:42.0761586Z bfe.u32 %r95, %r94, 4, 14; 2026-02-21T09:47:42.0761642Z cvt.u64.u32 %rd30, %r95; 2026-02-21T09:47:42.0761703Z or.b64 %rd20, %rd30, 4611686293372403712; 2026-02-21T09:47:42.0761764Z add.s32 %r96, %r35, 65632; 2026-02-21T09:47:42.0761819Z bfe.u32 %r97, %r96, 4, 14; 2026-02-21T09:47:42.0761875Z cvt.u64.u32 %rd31, %r97; 2026-02-21T09:47:42.0761935Z or.b64 %rd21, %rd31, 4611686293338849280; 2026-02-21T09:47:42.0761997Z // begin inline asm 2026-02-21T09:47:42.0762121Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd20, %rd21, %r72, %p13; 2026-02-21T09:47:42.0762196Z // end inline asm 2026-02-21T09:47:42.0762267Z add.s32 %r98, %r35, 114816; 2026-02-21T09:47:42.0762322Z cvt.u64.u32 %rd22, %r98; 2026-02-21T09:47:42.0762378Z // begin inline asm 2026-02-21T09:47:42.0762509Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:47:42.0762562Z // end inline asm 2026-02-21T09:47:42.0762618Z add.s32 %r99, %r35, 114880; 2026-02-21T09:47:42.0762672Z cvt.u64.u32 %rd23, %r99; 2026-02-21T09:47:42.0762734Z // begin inline asm 2026-02-21T09:47:42.0762853Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:47:42.0762907Z // end inline asm 2026-02-21T09:47:42.0762968Z mov.b32 %r462, 1; 2026-02-21T09:47:42.0763024Z mov.b32 %r461, %r460; 2026-02-21T09:47:42.0763118Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:42.0763205Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:42.0763386Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0763441Z shl.b32 %r110, %r462, 3; 2026-02-21T09:47:42.0763516Z add.s32 %r112, %r35, %r110; 
2026-02-21T09:47:42.0763583Z add.s32 %r113, %r112, 114816; 2026-02-21T09:47:42.0763639Z add.s32 %r100, %r112, 114848; 2026-02-21T09:47:42.0763799Z .loc 1 54 31 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:54:31 2026-02-21T09:47:42.0763861Z shl.b32 %r114, %r462, 14; 2026-02-21T09:47:42.0763916Z add.s32 %r115, %r35, %r114; 2026-02-21T09:47:42.0764079Z .loc 1 55 44 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:55:44 2026-02-21T09:47:42.0764137Z shl.b32 %r116, %r462, 13; 2026-02-21T09:47:42.0764198Z add.s32 %r117, %r35, %r116; 2026-02-21T09:47:42.0764254Z add.s32 %r118, %r117, 65536; 2026-02-21T09:47:42.0764413Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0764480Z bar.warp.sync -1; 2026-02-21T09:47:42.0764555Z // begin inline asm 2026-02-21T09:47:42.0764603Z 2026-02-21T09:47:42.0764658Z { 2026-02-21T09:47:42.0764748Z .reg .pred complete; 2026-02-21T09:47:42.0764801Z waitLoop: 2026-02-21T09:47:42.0764918Z mbarrier.try_wait.parity.shared.b64 complete, [%r100], %r461; 2026-02-21T09:47:42.0764987Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.0765036Z } 2026-02-21T09:47:42.0765039Z 2026-02-21T09:47:42.0765093Z // end inline asm 2026-02-21T09:47:42.0765260Z .loc 1 56 52 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:56:52 2026-02-21T09:47:42.0765323Z setp.eq.b32 %p31, %r460, 1920; 2026-02-21T09:47:42.0765413Z elect.sync %r119|%p22, -1; 2026-02-21T09:47:42.0765471Z bfe.u32 %r120, %r115, 4, 14; 2026-02-21T09:47:42.0765535Z cvt.u64.u32 %rd42, %r120; 2026-02-21T09:47:42.0765599Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T09:47:42.0765657Z bfe.u32 %r121, %r118, 4, 14; 2026-02-21T09:47:42.0765722Z cvt.u64.u32 %rd43, %r121; 2026-02-21T09:47:42.0765785Z or.b64 %rd33, %rd43, 4611686293338849280; 2026-02-21T09:47:42.0765840Z // begin inline asm 2026-02-21T09:47:42.0765978Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd32, %rd33, %r72, %p13; 2026-02-21T09:47:42.0766033Z // 
end inline asm 2026-02-21T09:47:42.0766089Z add.s32 %r122, %r115, 32; 2026-02-21T09:47:42.0766145Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T09:47:42.0766209Z cvt.u64.u32 %rd44, %r123; 2026-02-21T09:47:42.0766269Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T09:47:42.0766326Z add.s32 %r124, %r117, 65568; 2026-02-21T09:47:42.0766388Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T09:47:42.0766447Z cvt.u64.u32 %rd45, %r125; 2026-02-21T09:47:42.0766509Z or.b64 %rd35, %rd45, 4611686293338849280; 2026-02-21T09:47:42.0766564Z // begin inline asm 2026-02-21T09:47:42.0766700Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd34, %rd35, %r72, %p13; 2026-02-21T09:47:42.0766755Z // end inline asm 2026-02-21T09:47:42.0766861Z add.s32 %r126, %r115, 64; 2026-02-21T09:47:42.0766935Z bfe.u32 %r127, %r126, 4, 14; 2026-02-21T09:47:42.0766992Z cvt.u64.u32 %rd46, %r127; 2026-02-21T09:47:42.0767056Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T09:47:42.0767119Z add.s32 %r128, %r117, 65600; 2026-02-21T09:47:42.0767174Z bfe.u32 %r129, %r128, 4, 14; 2026-02-21T09:47:42.0767229Z cvt.u64.u32 %rd47, %r129; 2026-02-21T09:47:42.0767292Z or.b64 %rd37, %rd47, 4611686293338849280; 2026-02-21T09:47:42.0767354Z // begin inline asm 2026-02-21T09:47:42.0767479Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 ], %rd36, %rd37, %r72, %p13; 2026-02-21T09:47:42.0767531Z // end inline asm 2026-02-21T09:47:42.0767594Z add.s32 %r130, %r115, 96; 2026-02-21T09:47:42.0767650Z bfe.u32 %r131, %r130, 4, 14; 2026-02-21T09:47:42.0767706Z cvt.u64.u32 %rd48, %r131; 2026-02-21T09:47:42.0767768Z or.b64 %rd38, %rd48, 4611686293372403712; 2026-02-21T09:47:42.0767831Z add.s32 %r132, %r117, 65632; 2026-02-21T09:47:42.0767886Z bfe.u32 %r133, %r132, 4, 14; 2026-02-21T09:47:42.0767943Z cvt.u64.u32 %rd49, %r133; 2026-02-21T09:47:42.0768015Z or.b64 %rd39, %rd49, 4611686293338849280; 2026-02-21T09:47:42.0768101Z // begin inline asm 2026-02-21T09:47:42.0768222Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r71 + 0 
], %rd38, %rd39, %r72, %p13; 2026-02-21T09:47:42.0768283Z // end inline asm 2026-02-21T09:47:42.0768338Z cvt.u64.u32 %rd40, %r113; 2026-02-21T09:47:42.0768391Z // begin inline asm 2026-02-21T09:47:42.0768510Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:47:42.0768569Z // end inline asm 2026-02-21T09:47:42.0768631Z and.pred %p30, %p31, %p22; 2026-02-21T09:47:42.0768686Z // begin inline asm 2026-02-21T09:47:42.0768811Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:47:42.0768864Z // end inline asm 2026-02-21T09:47:42.0769024Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0769089Z add.s32 %r135, %r462, 1; 2026-02-21T09:47:42.0769176Z setp.eq.b32 %p32, %r135, 4; 2026-02-21T09:47:42.0769241Z selp.b32 %r462, 0, %r135, %p32; 2026-02-21T09:47:42.0769300Z selp.b32 %r136, 1, 0, %p32; 2026-02-21T09:47:42.0769364Z xor.b32 %r461, %r461, %r136; 2026-02-21T09:47:42.0769532Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0769587Z add.s32 %r460, %r460, 64; 2026-02-21T09:47:42.0769655Z setp.lt.u32 %p33, %r460, 1984; 2026-02-21T09:47:42.0769712Z @%p33 bra $L__BB0_6; 2026-02-21T09:47:42.0769790Z // %bb.7: // %.loopexit 2026-02-21T09:47:42.0769906Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0769962Z barrier.sync 1; 2026-02-21T09:47:42.0770037Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0770091Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.0770192Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0770356Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0770431Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0770531Z ld.shared.v2.b32 {%r53, %r57}, [global_smem+114696]; 2026-02-21T09:47:42.0770587Z barrier.sync 1; 2026-02-21T09:47:42.0770745Z .loc 1 21 67 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:21:67 
2026-02-21T09:47:42.0770808Z mov.u32 %r39, %ctaid.x; 2026-02-21T09:47:42.0770865Z mov.u32 %r40, %ctaid.y; 2026-02-21T09:47:42.0770920Z mov.u32 %r41, %ctaid.z; 2026-02-21T09:47:42.0770976Z mov.u32 %r42, %nctaid.x; 2026-02-21T09:47:42.0771041Z mov.u32 %r43, %nctaid.y; 2026-02-21T09:47:42.0771101Z mad.lo.s32 %r44, %r41, %r43, %r40; 2026-02-21T09:47:42.0771160Z mad.lo.s32 %r45, %r44, %r42, %r39; 2026-02-21T09:47:42.0771222Z mul.lo.s32 %r46, %r45, 384; 2026-02-21T09:47:42.0771404Z .loc 1 22 68 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:22:68 2026-02-21T09:47:42.0771461Z add.s32 %r47, %r46, 128; 2026-02-21T09:47:42.0771521Z cvt.s64.s32 %rd8, %r47; 2026-02-21T09:47:42.0771586Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:47:42.0771647Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:47:42.0771811Z .loc 1 21 67 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:21:67 2026-02-21T09:47:42.0771873Z cvt.s64.s32 %rd10, %r46; 2026-02-21T09:47:42.0771931Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:47:42.0771993Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:47:42.0772055Z add.s32 %r13, %r1, -128; 2026-02-21T09:47:42.0772106Z mov.b32 %r464, 0; 2026-02-21T09:47:42.0772162Z mov.b32 %r463, -64; 2026-02-21T09:47:42.0772215Z mov.b32 %r465, %r464; 2026-02-21T09:47:42.0772319Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:42.0772409Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:42.0772572Z .loc 1 0 67 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0:67 2026-02-21T09:47:42.0772661Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:47:42.0772720Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:47:42.0772886Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0772948Z add.s32 %r463, %r463, 64; 2026-02-21T09:47:42.0773003Z shl.b32 %r59, %r465, 3; 2026-02-21T09:47:42.0773058Z add.s32 %r61, %r35, %r59; 2026-02-21T09:47:42.0773112Z add.s32 %r48, %r61, 114816; 2026-02-21T09:47:42.0773279Z .loc 1 0 0 // 
ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0773336Z // begin inline asm 2026-02-21T09:47:42.0773383Z 2026-02-21T09:47:42.0773438Z { 2026-02-21T09:47:42.0773496Z .reg .pred complete; 2026-02-21T09:47:42.0773548Z waitLoop: 2026-02-21T09:47:42.0773662Z mbarrier.try_wait.parity.shared.b64 complete, [%r48], %r464; 2026-02-21T09:47:42.0773733Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.0773804Z } 2026-02-21T09:47:42.0773809Z 2026-02-21T09:47:42.0773864Z // end inline asm 2026-02-21T09:47:42.0774029Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0774085Z add.s32 %r54, %r61, 114848; 2026-02-21T09:47:42.0774235Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0774300Z bar.sync 3, 64; 2026-02-21T09:47:42.0774357Z // begin inline asm 2026-02-21T09:47:42.0774465Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r54], 24576; 2026-02-21T09:47:42.0774544Z // end inline asm 2026-02-21T09:47:42.0774738Z .loc 1 54 31 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:54:31 2026-02-21T09:47:42.0774798Z shl.b32 %r62, %r465, 14; 2026-02-21T09:47:42.0774858Z add.s32 %r51, %r35, %r62; 2026-02-21T09:47:42.0775025Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0775080Z bar.sync 3, 64; 2026-02-21T09:47:42.0775142Z elect.sync %r63|%p7, -1; 2026-02-21T09:47:42.0775213Z and.pred %p4, %p6, %p7; 2026-02-21T09:47:42.0775267Z // begin inline asm 2026-02-21T09:47:42.0775511Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r51], [%rd12, {%r463, %r53}], [%r54]; 2026-02-21T09:47:42.0775565Z // end inline asm 2026-02-21T09:47:42.0775733Z .loc 1 55 44 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:55:44 2026-02-21T09:47:42.0775788Z shl.b32 %r64, %r465, 13; 2026-02-21T09:47:42.0775846Z add.s32 %r65, %r35, %r64; 2026-02-21T09:47:42.0775910Z add.s32 %r55, %r65, 65536; 
2026-02-21T09:47:42.0776059Z .loc 1 0 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:0 2026-02-21T09:47:42.0776115Z bar.sync 3, 64; 2026-02-21T09:47:42.0776181Z elect.sync %r66|%p8, -1; 2026-02-21T09:47:42.0776268Z and.pred %p5, %p6, %p8; 2026-02-21T09:47:42.0776327Z // begin inline asm 2026-02-21T09:47:42.0776561Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r55], [%rd13, {%r463, %r57}], [%r54]; 2026-02-21T09:47:42.0776625Z // end inline asm 2026-02-21T09:47:42.0776680Z add.s32 %r67, %r465, 1; 2026-02-21T09:47:42.0776739Z setp.eq.b32 %p9, %r67, 4; 2026-02-21T09:47:42.0776810Z selp.b32 %r465, 0, %r67, %p9; 2026-02-21T09:47:42.0776866Z selp.b32 %r68, 1, 0, %p9; 2026-02-21T09:47:42.0776922Z xor.b32 %r464, %r464, %r68; 2026-02-21T09:47:42.0777097Z .loc 1 50 57 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:50:57 2026-02-21T09:47:42.0777160Z setp.lt.u32 %p10, %r463, 1984; 2026-02-21T09:47:42.0777216Z @%p10 bra $L__BB0_9; 2026-02-21T09:47:42.0777311Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0777375Z barrier.sync 1; 2026-02-21T09:47:42.0777452Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.0777511Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.0777612Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.0777801Z .loc 1 19 0 // ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py:19 2026-02-21T09:47:42.0777857Z barrier.sync 1; 2026-02-21T09:47:42.0777920Z barrier.sync 1; 2026-02-21T09:47:42.0777976Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.0778028Z $L__tmp1: 2026-02-21T09:47:42.0778081Z $L__func_end0: 2026-02-21T09:47:42.0778168Z // -- End function 2026-02-21T09:47:42.0778219Z } 2026-02-21T09:47:42.0778421Z .file 1 "/tmp/torchinductor_root/ky/ckyo5xjavckz4j3qsyvkcyyqo2vb3wmr2l5o34kmtwe43jwe3c3t.py" 2026-02-21T09:47:42.0778490Z .section .debug_abbrev 2026-02-21T09:47:42.0778541Z { 2026-02-21T09:47:42.0778626Z .b8 1 // Abbreviation Code 2026-02-21T09:47:42.0778710Z .b8 17 
// DW_TAG_compile_unit 2026-02-21T09:47:42.0778823Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:42.0778902Z .b8 37 // DW_AT_producer 2026-02-21T09:47:42.0778978Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.0779056Z .b8 19 // DW_AT_language 2026-02-21T09:47:42.0779130Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:42.0779203Z .b8 3 // DW_AT_name 2026-02-21T09:47:42.0779280Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.0779354Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:42.0779451Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:42.0779524Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:42.0779601Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.0779668Z .b8 0 // EOM(1) 2026-02-21T09:47:42.0779737Z .b8 0 // EOM(2) 2026-02-21T09:47:42.0779810Z .b8 0 // EOM(3) 2026-02-21T09:47:42.0779859Z } 2026-02-21T09:47:42.0779914Z .section .debug_info 2026-02-21T09:47:42.0779969Z { 2026-02-21T09:47:42.0780049Z .b32 104 // Length of Unit 2026-02-21T09:47:42.0780130Z .b8 2 // DWARF version number 2026-02-21T09:47:42.0780180Z .b8 0 2026-02-21T09:47:42.0780297Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:47:42.0780380Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:42.0780477Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:42.0780564Z .b8 116 // DW_AT_producer 2026-02-21T09:47:42.0780616Z .b8 114 2026-02-21T09:47:42.0780667Z .b8 105 2026-02-21T09:47:42.0780717Z .b8 116 2026-02-21T09:47:42.0780792Z .b8 111 2026-02-21T09:47:42.0780844Z .b8 110 2026-02-21T09:47:42.0780893Z .b8 0 2026-02-21T09:47:42.0780972Z .b8 2 // DW_AT_language 2026-02-21T09:47:42.0781022Z .b8 0 2026-02-21T09:47:42.0781096Z .b8 99 // DW_AT_name 2026-02-21T09:47:42.0781145Z .b8 107 2026-02-21T09:47:42.0781201Z .b8 121 2026-02-21T09:47:42.0781249Z .b8 111 2026-02-21T09:47:42.0781298Z .b8 53 2026-02-21T09:47:42.0781352Z .b8 120 2026-02-21T09:47:42.0781401Z .b8 106 2026-02-21T09:47:42.0781450Z .b8 97 2026-02-21T09:47:42.0781498Z .b8 118 2026-02-21T09:47:42.0781555Z .b8 99 2026-02-21T09:47:42.0781604Z .b8 107 2026-02-21T09:47:42.0781654Z .b8 122 2026-02-21T09:47:42.0781708Z .b8 52 2026-02-21T09:47:42.0781757Z .b8 106 2026-02-21T09:47:42.0781805Z .b8 51 2026-02-21T09:47:42.0781854Z .b8 113 2026-02-21T09:47:42.0781915Z .b8 115 2026-02-21T09:47:42.0781964Z .b8 121 2026-02-21T09:47:42.0782015Z .b8 118 2026-02-21T09:47:42.0782072Z .b8 107 2026-02-21T09:47:42.0782126Z .b8 99 2026-02-21T09:47:42.0782176Z .b8 121 2026-02-21T09:47:42.0782225Z .b8 121 2026-02-21T09:47:42.0782284Z .b8 113 2026-02-21T09:47:42.0782355Z .b8 111 2026-02-21T09:47:42.0782405Z .b8 50 2026-02-21T09:47:42.0782453Z .b8 118 2026-02-21T09:47:42.0782508Z .b8 98 2026-02-21T09:47:42.0782557Z .b8 51 2026-02-21T09:47:42.0782605Z .b8 119 2026-02-21T09:47:42.0782661Z .b8 109 2026-02-21T09:47:42.0782710Z .b8 114 2026-02-21T09:47:42.0782758Z .b8 50 2026-02-21T09:47:42.0782807Z .b8 108 2026-02-21T09:47:42.0782863Z .b8 53 2026-02-21T09:47:42.0782912Z .b8 111 2026-02-21T09:47:42.0782960Z .b8 51 2026-02-21T09:47:42.0783014Z .b8 52 2026-02-21T09:47:42.0783061Z .b8 107 2026-02-21T09:47:42.0783112Z .b8 109 
2026-02-21T09:47:42.0783161Z .b8 116 2026-02-21T09:47:42.0783216Z .b8 119 2026-02-21T09:47:42.0783266Z .b8 101 2026-02-21T09:47:42.0783313Z .b8 52 2026-02-21T09:47:42.0783362Z .b8 51 2026-02-21T09:47:42.0783419Z .b8 106 2026-02-21T09:47:42.0783467Z .b8 119 2026-02-21T09:47:42.0783516Z .b8 101 2026-02-21T09:47:42.0783573Z .b8 51 2026-02-21T09:47:42.0783623Z .b8 99 2026-02-21T09:47:42.0783695Z .b8 51 2026-02-21T09:47:42.0783744Z .b8 116 2026-02-21T09:47:42.0783803Z .b8 46 2026-02-21T09:47:42.0783851Z .b8 112 2026-02-21T09:47:42.0783900Z .b8 121 2026-02-21T09:47:42.0783955Z .b8 0 2026-02-21T09:47:42.0784044Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:42.0784118Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:42.0784168Z .b8 116 2026-02-21T09:47:42.0784223Z .b8 109 2026-02-21T09:47:42.0784272Z .b8 112 2026-02-21T09:47:42.0784320Z .b8 47 2026-02-21T09:47:42.0784375Z .b8 116 2026-02-21T09:47:42.0784424Z .b8 111 2026-02-21T09:47:42.0784512Z .b8 114 2026-02-21T09:47:42.0784561Z .b8 99 2026-02-21T09:47:42.0784619Z .b8 104 2026-02-21T09:47:42.0784694Z .b8 105 2026-02-21T09:47:42.0784744Z .b8 110 2026-02-21T09:47:42.0784799Z .b8 100 2026-02-21T09:47:42.0784848Z .b8 117 2026-02-21T09:47:42.0784895Z .b8 99 2026-02-21T09:47:42.0784943Z .b8 116 2026-02-21T09:47:42.0785001Z .b8 111 2026-02-21T09:47:42.0785051Z .b8 114 2026-02-21T09:47:42.0785104Z .b8 95 2026-02-21T09:47:42.0785154Z .b8 114 2026-02-21T09:47:42.0785213Z .b8 111 2026-02-21T09:47:42.0785263Z .b8 111 2026-02-21T09:47:42.0785310Z .b8 116 2026-02-21T09:47:42.0785365Z .b8 47 2026-02-21T09:47:42.0785412Z .b8 107 2026-02-21T09:47:42.0785460Z .b8 121 2026-02-21T09:47:42.0785510Z .b8 0 2026-02-21T09:47:42.0785566Z } 2026-02-21T09:47:42.0785630Z .section .debug_macinfo { } 2026-02-21T09:47:42.0785633Z 2026-02-21T09:47:42.0785709Z ================================================================ 2026-02-21T09:47:42.0785816Z please share the reproducer above with Triton project. 
2026-02-21T09:47:42.2987751Z [79s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:47:42.2987765Z 2026-02-21T09:47:42.2990341Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 16, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:47:42.2990520Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:47:42.2990582Z `ptxas` stderr: 2026-02-21T09:47:42.2990930Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 301 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:47:42.2991031Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:42.2991041Z 2026-02-21T09:47:42.2991445Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplx5wy2wf.ptx -o /tmp/tmplx5wy2wf.ptx.o 2026-02-21T09:47:42.2991450Z 2026-02-21T09:47:42.2991580Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:47:42.2991586Z 2026-02-21T09:47:42.2991589Z 2026-02-21T09:47:42.2991746Z ================================================================ 2026-02-21T09:47:42.2991818Z Internal Triton PTX codegen error 2026-02-21T09:47:42.2991876Z `ptxas` stderr: 2026-02-21T09:47:42.2992204Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 301 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:47:42.2992295Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:47:42.2992298Z 2026-02-21T09:47:42.2992682Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmplx5wy2wf.ptx -o /tmp/tmplx5wy2wf.ptx.o 2026-02-21T09:47:42.2992688Z 2026-02-21T09:47:42.2992691Z 2026-02-21T09:47:42.2992751Z // 2026-02-21T09:47:42.2992824Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:47:42.2992874Z // 2026-02-21T09:47:42.2992879Z 2026-02-21T09:47:42.2993005Z .version 8.7 2026-02-21T09:47:42.2993062Z .target sm_100a 2026-02-21T09:47:42.2993118Z .address_size 64 2026-02-21T09:47:42.2993122Z 2026-02-21T09:47:42.2993249Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:47:42.2993329Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:47:42.2993412Z // @_helion_matmul 2026-02-21T09:47:42.2993484Z .visible .entry _helion_matmul( 2026-02-21T09:47:42.2993602Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:47:42.2993699Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:47:42.2993835Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:47:42.2993943Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:47:42.2994040Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:47:42.2994096Z ) 2026-02-21T09:47:42.2994164Z .reqntid 256 2026-02-21T09:47:42.2994222Z .maxnreg 32 2026-02-21T09:47:42.2994274Z { 2026-02-21T09:47:42.2994337Z .reg .pred %p<139>; 2026-02-21T09:47:42.2994401Z .reg .b16 %rs<11>; 2026-02-21T09:47:42.2994453Z .reg .b32 %r<340>; 2026-02-21T09:47:42.2994506Z .reg .b64 %rd<172>; 2026-02-21T09:47:42.2994886Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19:0 2026-02-21T09:47:42.2994943Z $L__func_begin0: 2026-02-21T09:47:42.2995113Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19:0 
2026-02-21T09:47:42.2995116Z 2026-02-21T09:47:42.2995177Z // %bb.0: 2026-02-21T09:47:42.2995261Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:47:42.2995314Z $L__tmp0: 2026-02-21T09:47:42.2995471Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19 2026-02-21T09:47:42.2995537Z mov.u32 %r1, %tid.x; 2026-02-21T09:47:42.2995591Z shr.u32 %r2, %r1, 5; 2026-02-21T09:47:42.2995698Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:47:42.2995771Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T09:47:42.2995830Z @%p2 bra $L__BB0_12; 2026-02-21T09:47:42.2995885Z bra.uni $L__BB0_1; 2026-02-21T09:47:42.2995938Z $L__BB0_12: 2026-02-21T09:47:42.2996108Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0:0 2026-02-21T09:47:42.2996188Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:47:42.2996263Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:47:42.2996344Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:47:42.2996502Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19 2026-02-21T09:47:42.2996584Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.2996653Z setp.lt.u32 %p53, %r1, 32; 2026-02-21T09:47:42.2996713Z mov.b32 %r192, global_smem; 2026-02-21T09:47:42.2996769Z // begin inline asm 2026-02-21T09:47:42.2996920Z @%p53 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r192], 32; 2026-02-21T09:47:42.2996984Z // end inline asm 2026-02-21T09:47:42.2997072Z bar.sync 0, 128; 2026-02-21T09:47:42.2997139Z ld.shared.b32 %r332, [global_smem]; 2026-02-21T09:47:42.2997197Z bar.sync 0, 128; 2026-02-21T09:47:42.2997252Z // begin inline asm 2026-02-21T09:47:42.2997374Z @%p53 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:47:42.2997433Z // end inline asm 2026-02-21T09:47:42.2997602Z .loc 1 21 67 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:21:67 2026-02-21T09:47:42.2997821Z mov.u32 %r217, %ctaid.x; 2026-02-21T09:47:42.2997880Z mov.u32 
%r218, %ctaid.y; 2026-02-21T09:47:42.2997941Z mov.u32 %r219, %ctaid.z; 2026-02-21T09:47:42.2997998Z mov.u32 %r220, %nctaid.x; 2026-02-21T09:47:42.2998055Z mov.u32 %r221, %nctaid.y; 2026-02-21T09:47:42.2998126Z mad.lo.s32 %r222, %r219, %r221, %r218; 2026-02-21T09:47:42.2998187Z mad.lo.s32 %r223, %r222, %r220, %r217; 2026-02-21T09:47:42.2998278Z mul.lo.s32 %r224, %r223, 384; 2026-02-21T09:47:42.2998341Z cvt.s64.s32 %rd136, %r224; 2026-02-21T09:47:42.2998411Z add.s64 %rd97, %rd7, %rd136; 2026-02-21T09:47:42.2998469Z shl.b32 %r225, %r1, 2; 2026-02-21T09:47:42.2998526Z add.s32 %r193, %r192, %r225; 2026-02-21T09:47:42.2998586Z mov.b32 %r202, 0; 2026-02-21T09:47:42.2998640Z // begin inline asm 2026-02-21T09:47:42.2998711Z @%p53 st.shared.b32 [ %r193 + 0 ], %r202; 2026-02-21T09:47:42.2998771Z // end inline asm 2026-02-21T09:47:42.2998829Z bar.warp.sync -1; 2026-02-21T09:47:42.2998887Z setp.eq.b32 %p56, %r1, 0; 2026-02-21T09:47:42.2998944Z cvt.u64.u32 %rd82, %r192; 2026-02-21T09:47:42.2999040Z // begin inline asm 2026-02-21T09:47:42.2999207Z @%p56 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd4; 2026-02-21T09:47:42.2999261Z // end inline asm 2026-02-21T09:47:42.2999321Z // begin inline asm 2026-02-21T09:47:42.2999464Z @%p56 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T09:47:42.2999517Z // end inline asm 2026-02-21T09:47:42.2999571Z mov.b32 %r195, 64; 2026-02-21T09:47:42.2999633Z // begin inline asm 2026-02-21T09:47:42.2999785Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r195; 2026-02-21T09:47:42.2999838Z // end inline asm 2026-02-21T09:47:42.2999899Z mov.b32 %r196, 128; 2026-02-21T09:47:42.2999952Z // begin inline asm 2026-02-21T09:47:42.3000102Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r196; 2026-02-21T09:47:42.3000161Z // end inline asm 2026-02-21T09:47:42.3000216Z mov.b32 %r197, 2048; 2026-02-21T09:47:42.3000269Z // begin inline 
asm 2026-02-21T09:47:42.3000435Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r197; 2026-02-21T09:47:42.3000497Z // end inline asm 2026-02-21T09:47:42.3000551Z // begin inline asm 2026-02-21T09:47:42.3000710Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r197; 2026-02-21T09:47:42.3000798Z // end inline asm 2026-02-21T09:47:42.3000857Z mov.b64 %rd90, 4096; 2026-02-21T09:47:42.3000913Z // begin inline asm 2026-02-21T09:47:42.3001087Z @%p56 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T09:47:42.3001140Z // end inline asm 2026-02-21T09:47:42.3001192Z mov.b32 %r199, 1; 2026-02-21T09:47:42.3001245Z // begin inline asm 2026-02-21T09:47:42.3001421Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r199; 2026-02-21T09:47:42.3001474Z // end inline asm 2026-02-21T09:47:42.3001527Z // begin inline asm 2026-02-21T09:47:42.3001702Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r199; 2026-02-21T09:47:42.3001754Z // end inline asm 2026-02-21T09:47:42.3001806Z // begin inline asm 2026-02-21T09:47:42.3001962Z @%p56 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T09:47:42.3002016Z // end inline asm 2026-02-21T09:47:42.3002071Z // begin inline asm 2026-02-21T09:47:42.3002244Z @%p56 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3002321Z // end inline asm 2026-02-21T09:47:42.3002373Z // begin inline asm 2026-02-21T09:47:42.3002521Z @%p56 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T09:47:42.3002580Z // end inline asm 2026-02-21T09:47:42.3002632Z // begin inline asm 2026-02-21T09:47:42.3002773Z @%p56 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3002834Z // end inline asm 2026-02-21T09:47:42.3002888Z // 
begin inline asm 2026-02-21T09:47:42.3003152Z @%p53 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd97 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T09:47:42.3003210Z // end inline asm 2026-02-21T09:47:42.3003262Z // begin inline asm 2026-02-21T09:47:42.3003389Z @%p53 fence.proxy.tensormap::generic.acquire.gpu [ %rd97 + 0 ], 0x80; 2026-02-21T09:47:42.3003488Z @%p53 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.3003564Z @%p53 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.3003618Z // end inline asm 2026-02-21T09:47:42.3003672Z bar.sync 0, 128; 2026-02-21T09:47:42.3003854Z .loc 1 22 68 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:22:68 2026-02-21T09:47:42.3003913Z add.s32 %r226, %r224, 128; 2026-02-21T09:47:42.3003970Z cvt.s64.s32 %rd137, %r226; 2026-02-21T09:47:42.3004035Z add.s64 %rd115, %rd7, %rd137; 2026-02-21T09:47:42.3004089Z bar.sync 0, 128; 2026-02-21T09:47:42.3004142Z // begin inline asm 2026-02-21T09:47:42.3004232Z @%p53 st.shared.b32 [ %r193 + 0 ], %r202; 2026-02-21T09:47:42.3004294Z // end inline asm 2026-02-21T09:47:42.3004349Z bar.warp.sync -1; 2026-02-21T09:47:42.3004403Z // begin inline asm 2026-02-21T09:47:42.3004566Z @%p56 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd5; 2026-02-21T09:47:42.3004619Z // end inline asm 2026-02-21T09:47:42.3004719Z // begin inline asm 2026-02-21T09:47:42.3004865Z @%p56 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T09:47:42.3004919Z // end inline asm 2026-02-21T09:47:42.3004971Z // begin inline asm 2026-02-21T09:47:42.3005119Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r195; 2026-02-21T09:47:42.3005179Z // end inline asm 2026-02-21T09:47:42.3005231Z mov.b32 %r204, 16; 2026-02-21T09:47:42.3005283Z // begin inline asm 2026-02-21T09:47:42.3005435Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r204; 2026-02-21T09:47:42.3005488Z 
// end inline asm 2026-02-21T09:47:42.3005540Z // begin inline asm 2026-02-21T09:47:42.3005703Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r197; 2026-02-21T09:47:42.3005755Z // end inline asm 2026-02-21T09:47:42.3005809Z mov.b32 %r206, 12288; 2026-02-21T09:47:42.3005862Z // begin inline asm 2026-02-21T09:47:42.3006056Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r206; 2026-02-21T09:47:42.3006112Z // end inline asm 2026-02-21T09:47:42.3006166Z // begin inline asm 2026-02-21T09:47:42.3006336Z @%p56 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd90; 2026-02-21T09:47:42.3006387Z // end inline asm 2026-02-21T09:47:42.3006440Z // begin inline asm 2026-02-21T09:47:42.3006611Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r199; 2026-02-21T09:47:42.3006663Z // end inline asm 2026-02-21T09:47:42.3006715Z // begin inline asm 2026-02-21T09:47:42.3006879Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r199; 2026-02-21T09:47:42.3006939Z // end inline asm 2026-02-21T09:47:42.3006991Z // begin inline asm 2026-02-21T09:47:42.3007138Z @%p56 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T09:47:42.3007199Z // end inline asm 2026-02-21T09:47:42.3007253Z // begin inline asm 2026-02-21T09:47:42.3007415Z @%p56 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3007502Z // end inline asm 2026-02-21T09:47:42.3007557Z // begin inline asm 2026-02-21T09:47:42.3007705Z @%p56 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x3; 2026-02-21T09:47:42.3007765Z // end inline asm 2026-02-21T09:47:42.3007818Z // begin inline asm 2026-02-21T09:47:42.3007958Z @%p56 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3008010Z // end inline asm 
2026-02-21T09:47:42.3008073Z // begin inline asm 2026-02-21T09:47:42.3008336Z @%p53 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd115 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T09:47:42.3008391Z // end inline asm 2026-02-21T09:47:42.3008452Z // begin inline asm 2026-02-21T09:47:42.3008581Z @%p53 fence.proxy.tensormap::generic.acquire.gpu [ %rd115 + 0 ], 0x80; 2026-02-21T09:47:42.3008696Z @%p53 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.3008776Z @%p53 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.3008830Z // end inline asm 2026-02-21T09:47:42.3008883Z bar.sync 0, 128; 2026-02-21T09:47:42.3009048Z .loc 1 24 73 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:24:73 2026-02-21T09:47:42.3009113Z add.s32 %r227, %r224, 256; 2026-02-21T09:47:42.3009171Z cvt.s64.s32 %rd138, %r227; 2026-02-21T09:47:42.3009232Z add.s64 %rd133, %rd7, %rd138; 2026-02-21T09:47:42.3009294Z bar.sync 0, 128; 2026-02-21T09:47:42.3009349Z // begin inline asm 2026-02-21T09:47:42.3009445Z @%p53 st.shared.b32 [ %r193 + 0 ], %r202; 2026-02-21T09:47:42.3009499Z // end inline asm 2026-02-21T09:47:42.3009567Z bar.warp.sync -1; 2026-02-21T09:47:42.3009622Z // begin inline asm 2026-02-21T09:47:42.3009775Z @%p56 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd82 + 0 ], %rd6; 2026-02-21T09:47:42.3009837Z // end inline asm 2026-02-21T09:47:42.3009891Z // begin inline asm 2026-02-21T09:47:42.3010026Z @%p56 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T09:47:42.3010087Z // end inline asm 2026-02-21T09:47:42.3010141Z // begin inline asm 2026-02-21T09:47:42.3010291Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r204; 2026-02-21T09:47:42.3010357Z // end inline asm 2026-02-21T09:47:42.3010413Z // begin inline asm 2026-02-21T09:47:42.3010561Z @%p56 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r196; 2026-02-21T09:47:42.3010614Z // end inline 
asm 2026-02-21T09:47:42.3010675Z // begin inline asm 2026-02-21T09:47:42.3010841Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r206; 2026-02-21T09:47:42.3010896Z // end inline asm 2026-02-21T09:47:42.3010960Z // begin inline asm 2026-02-21T09:47:42.3011147Z @%p56 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r197; 2026-02-21T09:47:42.3011206Z // end inline asm 2026-02-21T09:47:42.3011272Z mov.b64 %rd126, 24576; 2026-02-21T09:47:42.3011328Z // begin inline asm 2026-02-21T09:47:42.3011509Z @%p56 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd82 + 0 ], 0x0, %rd126; 2026-02-21T09:47:42.3011564Z // end inline asm 2026-02-21T09:47:42.3011629Z // begin inline asm 2026-02-21T09:47:42.3011805Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0, %r199; 2026-02-21T09:47:42.3011860Z // end inline asm 2026-02-21T09:47:42.3011923Z // begin inline asm 2026-02-21T09:47:42.3012098Z @%p56 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1, %r199; 2026-02-21T09:47:42.3012156Z // end inline asm 2026-02-21T09:47:42.3012219Z // begin inline asm 2026-02-21T09:47:42.3012373Z @%p56 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x6; 2026-02-21T09:47:42.3012429Z // end inline asm 2026-02-21T09:47:42.3012486Z // begin inline asm 2026-02-21T09:47:42.3012668Z @%p56 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3012748Z // end inline asm 2026-02-21T09:47:42.3012805Z // begin inline asm 2026-02-21T09:47:42.3012965Z @%p56 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x1; 2026-02-21T09:47:42.3013021Z // end inline asm 2026-02-21T09:47:42.3013077Z // begin inline asm 2026-02-21T09:47:42.3013230Z @%p56 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd82 + 0 ], 0x0; 2026-02-21T09:47:42.3013285Z // end inline asm 
2026-02-21T09:47:42.3013340Z // begin inline asm 2026-02-21T09:47:42.3013615Z @%p53 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd133 + 0 ], [ %rd82 + 0 ], 0x80; 2026-02-21T09:47:42.3013671Z // end inline asm 2026-02-21T09:47:42.3013726Z // begin inline asm 2026-02-21T09:47:42.3013857Z @%p53 fence.proxy.tensormap::generic.acquire.gpu [ %rd133 + 0 ], 0x80; 2026-02-21T09:47:42.3013958Z @%p53 cp.async.bulk.commit_group ; 2026-02-21T09:47:42.3014036Z @%p53 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:47:42.3014094Z // end inline asm 2026-02-21T09:47:42.3014155Z bar.sync 0, 128; 2026-02-21T09:47:42.3014224Z cvta.global.u64 %rd139, %rd133; 2026-02-21T09:47:42.3014405Z .loc 1 31 35 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:31:35 2026-02-21T09:47:42.3014473Z mul.lo.s32 %r339, %r217, 6; 2026-02-21T09:47:42.3014651Z .loc 1 32 37 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:32:37 2026-02-21T09:47:42.3014737Z add.s32 %r228, %r339, 6; 2026-02-21T09:47:42.3014944Z .loc 1 32 49 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:32:49 2026-02-21T09:47:42.3015010Z min.s32 %r23, %r228, 12288; 2026-02-21T09:47:42.3015190Z .loc 1 33 84 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:33:84 2026-02-21T09:47:42.3015257Z setp.ge.s32 %p109, %r339, %r23; 2026-02-21T09:47:42.3015329Z @%p109 bra $L__BB0_15; 2026-02-21T09:47:42.3015409Z // %bb.13: // %.lr.ph 2026-02-21T09:47:42.3015587Z .loc 1 0 84 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0:84 2026-02-21T09:47:42.3015652Z shl.b32 %r229, %r1, 5; 2026-02-21T09:47:42.3015711Z and.b32 %r230, %r229, 3936; 2026-02-21T09:47:42.3015773Z bfe.s32 %r231, %r1, 2, 1; 2026-02-21T09:47:42.3015832Z and.b32 %r232, %r231, 144; 2026-02-21T09:47:42.3015897Z or.b32 %r233, %r232, %r230; 2026-02-21T09:47:42.3015957Z add.s32 %r235, %r192, 294912; 2026-02-21T09:47:42.3016018Z add.s32 %r24, %r235, %r233; 2026-02-21T09:47:42.3016083Z xor.b32 %r236, %r233, 
16; 2026-02-21T09:47:42.3016139Z add.s32 %r25, %r235, %r236; 2026-02-21T09:47:42.3016246Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:47:42.3016457Z .loc 1 39 35 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:39:35 2026-02-21T09:47:42.3016517Z shr.s32 %r298, %r339, 31; 2026-02-21T09:47:42.3016574Z shr.u32 %r299, %r298, 26; 2026-02-21T09:47:42.3016638Z add.s32 %r300, %r339, %r299; 2026-02-21T09:47:42.3016820Z .loc 1 42 64 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:42:64 2026-02-21T09:47:42.3016880Z and.b32 %r301, %r300, -64; 2026-02-21T09:47:42.3016939Z sub.s32 %r302, %r339, %r301; 2026-02-21T09:47:42.3017007Z cvt.u16.u32 %rs1, %r302; 2026-02-21T09:47:42.3017068Z cvt.s8.s32 %rs2, %r302; 2026-02-21T09:47:42.3017240Z .loc 1 43 51 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:43:51 2026-02-21T09:47:42.3017309Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:47:42.3017367Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:47:42.3017425Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:47:42.3017485Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:47:42.3017549Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:47:42.3017728Z .loc 1 42 64 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:42:64 2026-02-21T09:47:42.3017787Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:47:42.3017881Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:47:42.3018059Z .loc 1 44 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:44:27 2026-02-21T09:47:42.3018120Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:47:42.3018191Z mad.wide.s16 %r295, %rs10, 16, %r301; 2026-02-21T09:47:42.3018378Z .loc 1 45 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:45:27 2026-02-21T09:47:42.3018444Z mul.wide.s16 %r296, %rs7, 128; 2026-02-21T09:47:42.3018625Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3018709Z shfl.sync.idx.b32 %r303, %r2, 0, 31, -1; 2026-02-21T09:47:42.3018769Z shl.b32 %r304, %r303, 21; 2026-02-21T09:47:42.3018844Z and.b32 %r305, 
%r304, 6291456; 2026-02-21T09:47:42.3018921Z add.s32 %r237, %r305, %r332; 2026-02-21T09:47:42.3019008Z mov.pred %p110, -1; 2026-02-21T09:47:42.3019062Z mov.b32 %r238, 0; 2026-02-21T09:47:42.3019119Z // begin inline asm 2026-02-21T09:47:42.3019404Z @%p110 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r237 + 0], {%r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238, %r238}; 2026-02-21T09:47:42.3019458Z // end inline asm 2026-02-21T09:47:42.3019513Z // begin inline asm 2026-02-21T09:47:42.3019588Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:47:42.3019641Z // end inline asm 2026-02-21T09:47:42.3019694Z bar.sync 0, 128; 2026-02-21T09:47:42.3019862Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3019947Z add.s32 %r254, %r192, 299136; 2026-02-21T09:47:42.3020002Z // begin inline asm 2026-02-21T09:47:42.3020085Z @%p56 mbarrier.init.shared::cta.b64 [%r254], 1; 2026-02-21T09:47:42.3020146Z // end inline asm 2026-02-21T09:47:42.3020198Z bar.sync 0, 128; 2026-02-21T09:47:42.3020255Z add.s32 %r255, %r192, 299144; 2026-02-21T09:47:42.3020316Z // begin inline asm 2026-02-21T09:47:42.3020399Z @%p56 mbarrier.init.shared::cta.b64 [%r255], 1; 2026-02-21T09:47:42.3020451Z // end inline asm 2026-02-21T09:47:42.3020501Z bar.sync 0, 128; 2026-02-21T09:47:42.3020565Z add.s32 %r256, %r192, 299152; 2026-02-21T09:47:42.3020618Z // begin inline asm 2026-02-21T09:47:42.3020694Z @%p56 mbarrier.init.shared::cta.b64 [%r256], 1; 2026-02-21T09:47:42.3020754Z // end inline asm 2026-02-21T09:47:42.3020806Z bar.sync 0, 128; 2026-02-21T09:47:42.3020862Z add.s32 %r257, %r192, 299160; 2026-02-21T09:47:42.3020921Z // begin inline asm 2026-02-21T09:47:42.3020998Z @%p56 mbarrier.init.shared::cta.b64 [%r257], 1; 2026-02-21T09:47:42.3021049Z // end inline asm 2026-02-21T09:47:42.3021105Z add.s32 %r258, %r192, 299168; 2026-02-21T09:47:42.3021167Z // begin inline asm 2026-02-21T09:47:42.3021242Z @%p56 
mbarrier.init.shared::cta.b64 [%r258], 1; 2026-02-21T09:47:42.3021316Z // end inline asm 2026-02-21T09:47:42.3021377Z bar.sync 0, 128; 2026-02-21T09:47:42.3021433Z add.s32 %r259, %r192, 299176; 2026-02-21T09:47:42.3021488Z // begin inline asm 2026-02-21T09:47:42.3021562Z @%p56 mbarrier.init.shared::cta.b64 [%r259], 1; 2026-02-21T09:47:42.3021623Z // end inline asm 2026-02-21T09:47:42.3021675Z bar.sync 0, 128; 2026-02-21T09:47:42.3021729Z add.s32 %r260, %r192, 299184; 2026-02-21T09:47:42.3021788Z // begin inline asm 2026-02-21T09:47:42.3021862Z @%p56 mbarrier.init.shared::cta.b64 [%r260], 1; 2026-02-21T09:47:42.3021915Z // end inline asm 2026-02-21T09:47:42.3021972Z bar.sync 0, 128; 2026-02-21T09:47:42.3022028Z add.s32 %r261, %r192, 299192; 2026-02-21T09:47:42.3022080Z // begin inline asm 2026-02-21T09:47:42.3022155Z @%p56 mbarrier.init.shared::cta.b64 [%r261], 1; 2026-02-21T09:47:42.3022214Z // end inline asm 2026-02-21T09:47:42.3022373Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3022426Z bar.sync 0, 128; 2026-02-21T09:47:42.3022487Z // begin inline asm 2026-02-21T09:47:42.3022572Z @%p56 mbarrier.arrive.shared::cta.b64 _, [%r254]; 2026-02-21T09:47:42.3022650Z // end inline asm 2026-02-21T09:47:42.3022701Z bar.sync 0, 128; 2026-02-21T09:47:42.3022761Z // begin inline asm 2026-02-21T09:47:42.3022843Z @%p56 mbarrier.arrive.shared::cta.b64 _, [%r255]; 2026-02-21T09:47:42.3022895Z // end inline asm 2026-02-21T09:47:42.3022953Z bar.sync 0, 128; 2026-02-21T09:47:42.3023005Z // begin inline asm 2026-02-21T09:47:42.3023084Z @%p56 mbarrier.arrive.shared::cta.b64 _, [%r256]; 2026-02-21T09:47:42.3023143Z // end inline asm 2026-02-21T09:47:42.3023195Z bar.sync 0, 128; 2026-02-21T09:47:42.3023247Z // begin inline asm 2026-02-21T09:47:42.3023325Z @%p56 mbarrier.arrive.shared::cta.b64 _, [%r257]; 2026-02-21T09:47:42.3023384Z // end inline asm 2026-02-21T09:47:42.3023546Z .loc 1 50 79 // 
cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3023621Z bar.sync 0, 128; 2026-02-21T09:47:42.3023685Z add.s32 %r266, %r192, 299200; 2026-02-21T09:47:42.3023740Z // begin inline asm 2026-02-21T09:47:42.3023815Z @%p56 mbarrier.init.shared::cta.b64 [%r266], 1; 2026-02-21T09:47:42.3023866Z // end inline asm 2026-02-21T09:47:42.3023954Z st.shared.b32 [global_smem+299208], 33554689; 2026-02-21T09:47:42.3024028Z st.shared.b32 [global_smem+299008], %r332; 2026-02-21T09:47:42.3024121Z st.shared.v2.b32 [global_smem+299016], {%r296, %r295}; 2026-02-21T09:47:42.3024184Z barrier.sync 1; 2026-02-21T09:47:42.3024261Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.3024316Z barrier.sync 1; 2026-02-21T09:47:42.3024400Z barrier.sync 1; 2026-02-21T09:47:42.3024475Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:47:42.3024640Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3024719Z bar.sync 0, 128; 2026-02-21T09:47:42.3024782Z // begin inline asm 2026-02-21T09:47:42.3024834Z 2026-02-21T09:47:42.3024884Z { 2026-02-21T09:47:42.3024953Z .reg .pred complete; 2026-02-21T09:47:42.3025008Z waitLoop: 2026-02-21T09:47:42.3025129Z mbarrier.try_wait.parity.shared.b64 complete, [%r266], %r238; 2026-02-21T09:47:42.3025192Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.3025250Z } 2026-02-21T09:47:42.3025254Z 2026-02-21T09:47:42.3025307Z // end inline asm 2026-02-21T09:47:42.3025472Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3025534Z bar.sync 0, 128; 2026-02-21T09:47:42.3025589Z // begin inline asm 2026-02-21T09:47:42.3025670Z @%p56 mbarrier.inval.shared::cta.b64 [%r266]; 2026-02-21T09:47:42.3025733Z // end inline asm 2026-02-21T09:47:42.3025786Z // begin inline asm 2026-02-21T09:47:42.3025866Z @%p56 mbarrier.inval.shared::cta.b64 [%r258]; 2026-02-21T09:47:42.3025920Z // end inline asm 2026-02-21T09:47:42.3025981Z bar.sync 0, 128; 
2026-02-21T09:47:42.3026061Z // begin inline asm 2026-02-21T09:47:42.3026143Z @%p56 mbarrier.inval.shared::cta.b64 [%r259]; 2026-02-21T09:47:42.3026203Z // end inline asm 2026-02-21T09:47:42.3026257Z bar.sync 0, 128; 2026-02-21T09:47:42.3026311Z // begin inline asm 2026-02-21T09:47:42.3026384Z @%p56 mbarrier.inval.shared::cta.b64 [%r260]; 2026-02-21T09:47:42.3026444Z // end inline asm 2026-02-21T09:47:42.3026494Z bar.sync 0, 128; 2026-02-21T09:47:42.3026548Z // begin inline asm 2026-02-21T09:47:42.3026629Z @%p56 mbarrier.inval.shared::cta.b64 [%r261]; 2026-02-21T09:47:42.3026681Z // end inline asm 2026-02-21T09:47:42.3026735Z // begin inline asm 2026-02-21T09:47:42.3026809Z @%p56 mbarrier.inval.shared::cta.b64 [%r254]; 2026-02-21T09:47:42.3026869Z // end inline asm 2026-02-21T09:47:42.3026920Z bar.sync 0, 128; 2026-02-21T09:47:42.3026973Z // begin inline asm 2026-02-21T09:47:42.3027054Z @%p56 mbarrier.inval.shared::cta.b64 [%r255]; 2026-02-21T09:47:42.3027106Z // end inline asm 2026-02-21T09:47:42.3027156Z bar.sync 0, 128; 2026-02-21T09:47:42.3027219Z // begin inline asm 2026-02-21T09:47:42.3027291Z @%p56 mbarrier.inval.shared::cta.b64 [%r256]; 2026-02-21T09:47:42.3027388Z // end inline asm 2026-02-21T09:47:42.3027440Z bar.sync 0, 128; 2026-02-21T09:47:42.3027501Z // begin inline asm 2026-02-21T09:47:42.3027576Z @%p56 mbarrier.inval.shared::cta.b64 [%r257]; 2026-02-21T09:47:42.3027627Z // end inline asm 2026-02-21T09:47:42.3027792Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3027846Z // begin inline asm 2026-02-21T09:47:42.3028118Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r278, %r279, %r280, %r281, %r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293}, [%r237 + 0]; 2026-02-21T09:47:42.3028181Z // end inline asm 2026-02-21T09:47:42.3028235Z // begin inline asm 2026-02-21T09:47:42.3028301Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:47:42.3028352Z // end inline asm 
2026-02-21T09:47:42.3028419Z cvt.u64.u32 %rd140, %r278; 2026-02-21T09:47:42.3028503Z cvt.u64.u32 %rd141, %r279; 2026-02-21T09:47:42.3028562Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:47:42.3028630Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:47:42.3028792Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3028853Z mov.b64 {%r307, %r308}, %rd143; 2026-02-21T09:47:42.3028918Z cvt.rn.f16x2.f32 %r309, %r308, %r307; 2026-02-21T09:47:42.3029083Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3029138Z cvt.u64.u32 %rd144, %r280; 2026-02-21T09:47:42.3029237Z cvt.u64.u32 %rd145, %r281; 2026-02-21T09:47:42.3029303Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:47:42.3029362Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:47:42.3029526Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3029591Z mov.b64 {%r310, %r311}, %rd147; 2026-02-21T09:47:42.3029658Z cvt.rn.f16x2.f32 %r312, %r311, %r310; 2026-02-21T09:47:42.3029821Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3029878Z cvt.u64.u32 %rd148, %r282; 2026-02-21T09:47:42.3029939Z cvt.u64.u32 %rd149, %r283; 2026-02-21T09:47:42.3029995Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:47:42.3030052Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:47:42.3030220Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3030278Z mov.b64 {%r313, %r314}, %rd151; 2026-02-21T09:47:42.3030338Z cvt.rn.f16x2.f32 %r315, %r314, %r313; 2026-02-21T09:47:42.3030511Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3030565Z cvt.u64.u32 %rd152, %r284; 2026-02-21T09:47:42.3030621Z cvt.u64.u32 %rd153, %r285; 2026-02-21T09:47:42.3030675Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:47:42.3030767Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:47:42.3030935Z .loc 1 58 27 // 
cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3030995Z mov.b64 {%r316, %r317}, %rd155; 2026-02-21T09:47:42.3031066Z cvt.rn.f16x2.f32 %r318, %r317, %r316; 2026-02-21T09:47:42.3031229Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3031286Z cvt.u64.u32 %rd156, %r286; 2026-02-21T09:47:42.3031349Z cvt.u64.u32 %rd157, %r287; 2026-02-21T09:47:42.3031406Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:47:42.3031464Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:47:42.3031631Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3031696Z mov.b64 {%r319, %r320}, %rd159; 2026-02-21T09:47:42.3031757Z cvt.rn.f16x2.f32 %r321, %r320, %r319; 2026-02-21T09:47:42.3031923Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3031988Z cvt.u64.u32 %rd160, %r288; 2026-02-21T09:47:42.3032067Z cvt.u64.u32 %rd161, %r289; 2026-02-21T09:47:42.3032123Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:47:42.3032186Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:47:42.3032353Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3032410Z mov.b64 {%r322, %r323}, %rd163; 2026-02-21T09:47:42.3032470Z cvt.rn.f16x2.f32 %r324, %r323, %r322; 2026-02-21T09:47:42.3032640Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3032697Z cvt.u64.u32 %rd164, %r290; 2026-02-21T09:47:42.3032753Z cvt.u64.u32 %rd165, %r291; 2026-02-21T09:47:42.3032817Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:47:42.3032873Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:47:42.3033038Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3033124Z mov.b64 {%r325, %r326}, %rd167; 2026-02-21T09:47:42.3033187Z cvt.rn.f16x2.f32 %r327, %r326, %r325; 2026-02-21T09:47:42.3033354Z .loc 1 56 52 // 
cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3033410Z cvt.u64.u32 %rd168, %r292; 2026-02-21T09:47:42.3033476Z cvt.u64.u32 %rd169, %r293; 2026-02-21T09:47:42.3033535Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:47:42.3033593Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:47:42.3033766Z .loc 1 58 27 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:58:27 2026-02-21T09:47:42.3033845Z mov.b64 {%r328, %r329}, %rd171; 2026-02-21T09:47:42.3033908Z cvt.rn.f16x2.f32 %r330, %r329, %r328; 2026-02-21T09:47:42.3034085Z .loc 1 59 45 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:59:45 2026-02-21T09:47:42.3034155Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:42.3034210Z bar.sync 0, 128; 2026-02-21T09:47:42.3034303Z st.shared.v4.b32 [%r24], {%r309, %r312, %r315, %r318}; 2026-02-21T09:47:42.3034400Z st.shared.v4.b32 [%r25], {%r321, %r324, %r327, %r330}; 2026-02-21T09:47:42.3034454Z // begin inline asm 2026-02-21T09:47:42.3034522Z fence.proxy.async.shared::cta; 2026-02-21T09:47:42.3034584Z // end inline asm 2026-02-21T09:47:42.3034636Z bar.sync 0, 128; 2026-02-21T09:47:42.3034739Z elect.sync %r331|%p135, -1; 2026-02-21T09:47:42.3034813Z and.pred %p133, %p53, %p135; 2026-02-21T09:47:42.3034868Z // begin inline asm 2026-02-21T09:47:42.3035046Z @%p133 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd139, {%r295, %r296}], [%r235]; 2026-02-21T09:47:42.3035099Z // end inline asm 2026-02-21T09:47:42.3035171Z cp.async.bulk.commit_group; 2026-02-21T09:47:42.3035336Z .loc 1 33 84 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:33:84 2026-02-21T09:47:42.3035393Z add.s32 %r339, %r339, 1; 2026-02-21T09:47:42.3035489Z setp.ne.b32 %p136, %r23, %r339; 2026-02-21T09:47:42.3035548Z @%p136 bra $L__BB0_14; 2026-02-21T09:47:42.3035630Z $L__BB0_15: // %._crit_edge 2026-02-21T09:47:42.3035707Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:47:42.3035758Z bar.sync 0, 128; 2026-02-21T09:47:42.3035924Z .loc 1 33 4 // 
cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:33:4 2026-02-21T09:47:42.3035976Z bar.sync 0, 128; 2026-02-21T09:47:42.3036037Z // begin inline asm 2026-02-21T09:47:42.3036151Z @%p53 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r332, 32; 2026-02-21T09:47:42.3036204Z // end inline asm 2026-02-21T09:47:42.3036294Z st.shared.b32 [global_smem+299208], 50529027; 2026-02-21T09:47:42.3036348Z barrier.sync 1; 2026-02-21T09:47:42.3036428Z $L__BB0_16: // %common.ret 2026-02-21T09:47:42.3036592Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3036652Z ret; 2026-02-21T09:47:42.3036748Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:47:42.3036808Z mov.b32 %r29, global_smem; 2026-02-21T09:47:42.3036907Z add.s32 %r30, %r29, %r3; 2026-02-21T09:47:42.3036963Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.3037062Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3037232Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3037308Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3037363Z barrier.sync 1; 2026-02-21T09:47:42.3037416Z barrier.sync 1; 2026-02-21T09:47:42.3037499Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3037577Z $L__BB0_2: // %.preheader 2026-02-21T09:47:42.3037665Z // =>This Loop Header: Depth=1 2026-02-21T09:47:42.3037758Z // Child Loop BB0_9 Depth 2 2026-02-21T09:47:42.3037868Z // Child Loop BB0_6 Depth 2 2026-02-21T09:47:42.3038029Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19 2026-02-21T09:47:42.3038112Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:47:42.3038167Z barrier.sync 1; 2026-02-21T09:47:42.3038230Z ld.shared.b8 %r28, [%r30+299204]; 2026-02-21T09:47:42.3038290Z setp.gt.u32 %p3, %r28, 3; 2026-02-21T09:47:42.3038353Z @%p3 bra $L__BB0_4; 2026-02-21T09:47:42.3038429Z // %bb.3: // %.preheader 2026-02-21T09:47:42.3038516Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3038612Z $L_brx_0: 
.branchtargets 2026-02-21T09:47:42.3038666Z $L__BB0_5, 2026-02-21T09:47:42.3038717Z $L__BB0_8, 2026-02-21T09:47:42.3038777Z $L__BB0_11, 2026-02-21T09:47:42.3038827Z $L__BB0_16; 2026-02-21T09:47:42.3038884Z brx.idx %r28, $L_brx_0; 2026-02-21T09:47:42.3038977Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3039154Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3039230Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3039304Z ld.shared.b32 %r85, [global_smem+299008]; 2026-02-21T09:47:42.3039364Z barrier.sync 1; 2026-02-21T09:47:42.3039422Z mov.pred %p138, 0; 2026-02-21T09:47:42.3039473Z mov.b32 %r334, 0; 2026-02-21T09:47:42.3039530Z mov.b32 %r333, -256; 2026-02-21T09:47:42.3039593Z mov.b32 %r335, %r334; 2026-02-21T09:47:42.3039686Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:42.3039776Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:42.3039842Z shl.b32 %r117, %r335, 3; 2026-02-21T09:47:42.3039900Z add.s32 %r119, %r29, %r117; 2026-02-21T09:47:42.3039958Z add.s32 %r120, %r119, 299136; 2026-02-21T09:47:42.3040023Z add.s32 %r83, %r119, 299168; 2026-02-21T09:47:42.3040210Z .loc 1 54 31 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:54:31 2026-02-21T09:47:42.3040269Z shl.b32 %r121, %r335, 16; 2026-02-21T09:47:42.3040327Z add.s32 %r122, %r29, %r121; 2026-02-21T09:47:42.3040501Z .loc 1 55 44 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:55:44 2026-02-21T09:47:42.3040558Z shl.b32 %r123, %r335, 13; 2026-02-21T09:47:42.3040615Z add.s32 %r124, %r29, %r123; 2026-02-21T09:47:42.3040681Z add.s32 %r125, %r124, 262144; 2026-02-21T09:47:42.3040838Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3040899Z bar.warp.sync -1; 2026-02-21T09:47:42.3040963Z // begin inline asm 2026-02-21T09:47:42.3041012Z 2026-02-21T09:47:42.3041060Z { 2026-02-21T09:47:42.3041118Z .reg .pred complete; 2026-02-21T09:47:42.3041180Z waitLoop: 
2026-02-21T09:47:42.3041295Z mbarrier.try_wait.parity.shared.b64 complete, [%r83], %r334; 2026-02-21T09:47:42.3041360Z @!complete bra.uni waitLoop; 2026-02-21T09:47:42.3041417Z } 2026-02-21T09:47:42.3041421Z 2026-02-21T09:47:42.3041498Z // end inline asm 2026-02-21T09:47:42.3041658Z .loc 1 56 52 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:56:52 2026-02-21T09:47:42.3041725Z setp.eq.b32 %p50, %r333, 1536; 2026-02-21T09:47:42.3041795Z elect.sync %r126|%p16, -1; 2026-02-21T09:47:42.3041856Z bfe.u32 %r127, %r122, 4, 14; 2026-02-21T09:47:42.3041916Z cvt.u64.u32 %rd50, %r127; 2026-02-21T09:47:42.3041994Z or.b64 %rd16, %rd50, 4611686293372403712; 2026-02-21T09:47:42.3042053Z bfe.u32 %r128, %r125, 4, 14; 2026-02-21T09:47:42.3042112Z cvt.u64.u32 %rd51, %r128; 2026-02-21T09:47:42.3042178Z or.b64 %rd17, %rd51, 4611686293313683456; 2026-02-21T09:47:42.3042243Z mov.b32 %r86, 134479888; 2026-02-21T09:47:42.3042298Z // begin inline asm 2026-02-21T09:47:42.3042434Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd16, %rd17, %r86, %p138; 2026-02-21T09:47:42.3042496Z // end inline asm 2026-02-21T09:47:42.3042573Z add.s32 %r129, %r122, 32; 2026-02-21T09:47:42.3042630Z bfe.u32 %r130, %r129, 4, 14; 2026-02-21T09:47:42.3042696Z cvt.u64.u32 %rd52, %r130; 2026-02-21T09:47:42.3042758Z or.b64 %rd18, %rd52, 4611686293372403712; 2026-02-21T09:47:42.3042815Z add.s32 %r131, %r124, 262176; 2026-02-21T09:47:42.3042871Z bfe.u32 %r132, %r131, 4, 14; 2026-02-21T09:47:42.3042934Z cvt.u64.u32 %rd53, %r132; 2026-02-21T09:47:42.3042996Z or.b64 %rd19, %rd53, 4611686293313683456; 2026-02-21T09:47:42.3043054Z mov.pred %p138, -1; 2026-02-21T09:47:42.3043115Z // begin inline asm 2026-02-21T09:47:42.3043248Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd18, %rd19, %r86, %p138; 2026-02-21T09:47:42.3043327Z // end inline asm 2026-02-21T09:47:42.3043389Z add.s32 %r133, %r122, 64; 2026-02-21T09:47:42.3043445Z bfe.u32 %r134, %r133, 4, 14; 2026-02-21T09:47:42.3043501Z cvt.u64.u32 
%rd54, %r134; 2026-02-21T09:47:42.3043564Z or.b64 %rd20, %rd54, 4611686293372403712; 2026-02-21T09:47:42.3043631Z add.s32 %r135, %r124, 262208; 2026-02-21T09:47:42.3043687Z bfe.u32 %r136, %r135, 4, 14; 2026-02-21T09:47:42.3043744Z cvt.u64.u32 %rd55, %r136; 2026-02-21T09:47:42.3043814Z or.b64 %rd21, %rd55, 4611686293313683456; 2026-02-21T09:47:42.3043870Z // begin inline asm 2026-02-21T09:47:42.3043995Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd20, %rd21, %r86, %p138; 2026-02-21T09:47:42.3044049Z // end inline asm 2026-02-21T09:47:42.3044111Z add.s32 %r137, %r122, 96; 2026-02-21T09:47:42.3044166Z bfe.u32 %r138, %r137, 4, 14; 2026-02-21T09:47:42.3044220Z cvt.u64.u32 %rd56, %r138; 2026-02-21T09:47:42.3044293Z or.b64 %rd22, %rd56, 4611686293372403712; 2026-02-21T09:47:42.3044350Z add.s32 %r139, %r124, 262240; 2026-02-21T09:47:42.3044405Z bfe.u32 %r140, %r139, 4, 14; 2026-02-21T09:47:42.3044460Z cvt.u64.u32 %rd57, %r140; 2026-02-21T09:47:42.3044529Z or.b64 %rd23, %rd57, 4611686293313683456; 2026-02-21T09:47:42.3044583Z // begin inline asm 2026-02-21T09:47:42.3044779Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd22, %rd23, %r86, %p138; 2026-02-21T09:47:42.3044843Z // end inline asm 2026-02-21T09:47:42.3044899Z add.s32 %r141, %r122, 16384; 2026-02-21T09:47:42.3044954Z bfe.u32 %r142, %r141, 4, 14; 2026-02-21T09:47:42.3045017Z cvt.u64.u32 %rd58, %r142; 2026-02-21T09:47:42.3045078Z or.b64 %rd24, %rd58, 4611686293372403712; 2026-02-21T09:47:42.3045136Z add.s32 %r143, %r124, 264192; 2026-02-21T09:47:42.3045191Z bfe.u32 %r144, %r143, 4, 14; 2026-02-21T09:47:42.3045254Z cvt.u64.u32 %rd59, %r144; 2026-02-21T09:47:42.3045316Z or.b64 %rd25, %rd59, 4611686293313683456; 2026-02-21T09:47:42.3045371Z // begin inline asm 2026-02-21T09:47:42.3045504Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd24, %rd25, %r86, %p138; 2026-02-21T09:47:42.3045559Z // end inline asm 2026-02-21T09:47:42.3045615Z add.s32 %r145, %r122, 16416; 
2026-02-21T09:47:42.3045678Z bfe.u32 %r146, %r145, 4, 14; 2026-02-21T09:47:42.3045736Z cvt.u64.u32 %rd60, %r146; 2026-02-21T09:47:42.3045799Z or.b64 %rd26, %rd60, 4611686293372403712; 2026-02-21T09:47:42.3045856Z add.s32 %r147, %r124, 264224; 2026-02-21T09:47:42.3045950Z bfe.u32 %r148, %r147, 4, 14; 2026-02-21T09:47:42.3046007Z cvt.u64.u32 %rd61, %r148; 2026-02-21T09:47:42.3046070Z or.b64 %rd27, %rd61, 4611686293313683456; 2026-02-21T09:47:42.3046131Z // begin inline asm 2026-02-21T09:47:42.3046254Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd26, %rd27, %r86, %p138; 2026-02-21T09:47:42.3046306Z // end inline asm 2026-02-21T09:47:42.3046361Z add.s32 %r149, %r122, 16448; 2026-02-21T09:47:42.3046423Z bfe.u32 %r150, %r149, 4, 14; 2026-02-21T09:47:42.3046479Z cvt.u64.u32 %rd62, %r150; 2026-02-21T09:47:42.3046541Z or.b64 %rd28, %rd62, 4611686293372403712; 2026-02-21T09:47:42.3046604Z add.s32 %r151, %r124, 264256; 2026-02-21T09:47:42.3046659Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T09:47:42.3046714Z cvt.u64.u32 %rd63, %r152; 2026-02-21T09:47:42.3046783Z or.b64 %rd29, %rd63, 4611686293313683456; 2026-02-21T09:47:42.3046862Z // begin inline asm 2026-02-21T09:47:42.3046988Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd28, %rd29, %r86, %p138; 2026-02-21T09:47:42.3047042Z // end inline asm 2026-02-21T09:47:42.3047104Z add.s32 %r153, %r122, 16480; 2026-02-21T09:47:42.3047158Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T09:47:42.3047212Z cvt.u64.u32 %rd64, %r154; 2026-02-21T09:47:42.3047280Z or.b64 %rd30, %rd64, 4611686293372403712; 2026-02-21T09:47:42.3047337Z add.s32 %r155, %r124, 264288; 2026-02-21T09:47:42.3047392Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:47:42.3047447Z cvt.u64.u32 %rd65, %r156; 2026-02-21T09:47:42.3047544Z or.b64 %rd31, %rd65, 4611686293313683456; 2026-02-21T09:47:42.3047600Z // begin inline asm 2026-02-21T09:47:42.3047723Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd30, %rd31, %r86, %p138; 2026-02-21T09:47:42.3047784Z 
// end inline asm 2026-02-21T09:47:42.3047840Z add.s32 %r157, %r122, 32768; 2026-02-21T09:47:42.3047896Z bfe.u32 %r158, %r157, 4, 14; 2026-02-21T09:47:42.3047960Z cvt.u64.u32 %rd66, %r158; 2026-02-21T09:47:42.3048023Z or.b64 %rd32, %rd66, 4611686293372403712; 2026-02-21T09:47:42.3048078Z add.s32 %r159, %r124, 266240; 2026-02-21T09:47:42.3048133Z bfe.u32 %r160, %r159, 4, 14; 2026-02-21T09:47:42.3048196Z cvt.u64.u32 %rd67, %r160; 2026-02-21T09:47:42.3048258Z or.b64 %rd33, %rd67, 4611686293313683456; 2026-02-21T09:47:42.3048314Z // begin inline asm 2026-02-21T09:47:42.3048446Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd32, %rd33, %r86, %p138; 2026-02-21T09:47:42.3048500Z // end inline asm 2026-02-21T09:47:42.3048555Z add.s32 %r161, %r122, 32800; 2026-02-21T09:47:42.3048614Z bfe.u32 %r162, %r161, 4, 14; 2026-02-21T09:47:42.3048687Z cvt.u64.u32 %rd68, %r162; 2026-02-21T09:47:42.3048748Z or.b64 %rd34, %rd68, 4611686293372403712; 2026-02-21T09:47:42.3048805Z add.s32 %r163, %r124, 266272; 2026-02-21T09:47:42.3048869Z bfe.u32 %r164, %r163, 4, 14; 2026-02-21T09:47:42.3048950Z cvt.u64.u32 %rd69, %r164; 2026-02-21T09:47:42.3049014Z or.b64 %rd35, %rd69, 4611686293313683456; 2026-02-21T09:47:42.3049077Z // begin inline asm 2026-02-21T09:47:42.3049201Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd34, %rd35, %r86, %p138; 2026-02-21T09:47:42.3049253Z // end inline asm 2026-02-21T09:47:42.3049307Z add.s32 %r165, %r122, 32832; 2026-02-21T09:47:42.3049369Z bfe.u32 %r166, %r165, 4, 14; 2026-02-21T09:47:42.3049423Z cvt.u64.u32 %rd70, %r166; 2026-02-21T09:47:42.3049484Z or.b64 %rd36, %rd70, 4611686293372403712; 2026-02-21T09:47:42.3049546Z add.s32 %r167, %r124, 266304; 2026-02-21T09:47:42.3049600Z bfe.u32 %r168, %r167, 4, 14; 2026-02-21T09:47:42.3049658Z cvt.u64.u32 %rd71, %r168; 2026-02-21T09:47:42.3049720Z or.b64 %rd37, %rd71, 4611686293313683456; 2026-02-21T09:47:42.3049782Z // begin inline asm 2026-02-21T09:47:42.3049906Z @%p16 
tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd36, %rd37, %r86, %p138; 2026-02-21T09:47:42.3049961Z // end inline asm 2026-02-21T09:47:42.3050024Z add.s32 %r169, %r122, 32864; 2026-02-21T09:47:42.3050079Z bfe.u32 %r170, %r169, 4, 14; 2026-02-21T09:47:42.3050156Z cvt.u64.u32 %rd72, %r170; 2026-02-21T09:47:42.3050224Z or.b64 %rd38, %rd72, 4611686293372403712; 2026-02-21T09:47:42.3050280Z add.s32 %r171, %r124, 266336; 2026-02-21T09:47:42.3050334Z bfe.u32 %r172, %r171, 4, 14; 2026-02-21T09:47:42.3050389Z cvt.u64.u32 %rd73, %r172; 2026-02-21T09:47:42.3050458Z or.b64 %rd39, %rd73, 4611686293313683456; 2026-02-21T09:47:42.3050511Z // begin inline asm 2026-02-21T09:47:42.3050633Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd38, %rd39, %r86, %p138; 2026-02-21T09:47:42.3050694Z // end inline asm 2026-02-21T09:47:42.3050749Z add.s32 %r173, %r122, 49152; 2026-02-21T09:47:42.3050803Z bfe.u32 %r174, %r173, 4, 14; 2026-02-21T09:47:42.3050865Z cvt.u64.u32 %rd74, %r174; 2026-02-21T09:47:42.3050926Z or.b64 %rd40, %rd74, 4611686293372403712; 2026-02-21T09:47:42.3050981Z add.s32 %r175, %r124, 268288; 2026-02-21T09:47:42.3051060Z bfe.u32 %r176, %r175, 4, 14; 2026-02-21T09:47:42.3051126Z cvt.u64.u32 %rd75, %r176; 2026-02-21T09:47:42.3051189Z or.b64 %rd41, %rd75, 4611686293313683456; 2026-02-21T09:47:42.3051243Z // begin inline asm 2026-02-21T09:47:42.3051374Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd40, %rd41, %r86, %p138; 2026-02-21T09:47:42.3051427Z // end inline asm 2026-02-21T09:47:42.3051482Z add.s32 %r177, %r122, 49184; 2026-02-21T09:47:42.3051536Z bfe.u32 %r178, %r177, 4, 14; 2026-02-21T09:47:42.3051598Z cvt.u64.u32 %rd76, %r178; 2026-02-21T09:47:42.3051658Z or.b64 %rd42, %rd76, 4611686293372403712; 2026-02-21T09:47:42.3051737Z add.s32 %r179, %r124, 268320; 2026-02-21T09:47:42.3051800Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T09:47:42.3051854Z cvt.u64.u32 %rd77, %r180; 2026-02-21T09:47:42.3051916Z or.b64 %rd43, %rd77, 4611686293313683456; 
2026-02-21T09:47:42.3051976Z // begin inline asm 2026-02-21T09:47:42.3052103Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd42, %rd43, %r86, %p138; 2026-02-21T09:47:42.3052155Z // end inline asm 2026-02-21T09:47:42.3052209Z add.s32 %r181, %r122, 49216; 2026-02-21T09:47:42.3052271Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T09:47:42.3052326Z cvt.u64.u32 %rd78, %r182; 2026-02-21T09:47:42.3052388Z or.b64 %rd44, %rd78, 4611686293372403712; 2026-02-21T09:47:42.3052449Z add.s32 %r183, %r124, 268352; 2026-02-21T09:47:42.3052504Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T09:47:42.3052559Z cvt.u64.u32 %rd79, %r184; 2026-02-21T09:47:42.3052620Z or.b64 %rd45, %rd79, 4611686293313683456; 2026-02-21T09:47:42.3052682Z // begin inline asm 2026-02-21T09:47:42.3052807Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd44, %rd45, %r86, %p138; 2026-02-21T09:47:42.3052859Z // end inline asm 2026-02-21T09:47:42.3052922Z add.s32 %r185, %r122, 49248; 2026-02-21T09:47:42.3052976Z bfe.u32 %r186, %r185, 4, 14; 2026-02-21T09:47:42.3053031Z cvt.u64.u32 %rd80, %r186; 2026-02-21T09:47:42.3053121Z or.b64 %rd46, %rd80, 4611686293372403712; 2026-02-21T09:47:42.3053180Z add.s32 %r187, %r124, 268384; 2026-02-21T09:47:42.3053235Z bfe.u32 %r188, %r187, 4, 14; 2026-02-21T09:47:42.3053292Z cvt.u64.u32 %rd81, %r188; 2026-02-21T09:47:42.3053360Z or.b64 %rd47, %rd81, 4611686293313683456; 2026-02-21T09:47:42.3053415Z // begin inline asm 2026-02-21T09:47:42.3053536Z @%p16 tcgen05.mma.cta_group::1.kind::f16 [ %r85 + 0 ], %rd46, %rd47, %r86, %p138; 2026-02-21T09:47:42.3053596Z // end inline asm 2026-02-21T09:47:42.3053651Z cvt.u64.u32 %rd48, %r120; 2026-02-21T09:47:42.3053706Z // begin inline asm 2026-02-21T09:47:42.3053836Z @%p16 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd48]; 2026-02-21T09:47:42.3053890Z // end inline asm 2026-02-21T09:47:42.3053953Z and.pred %p48, %p50, %p16; 2026-02-21T09:47:42.3054009Z add.s32 %r189, %r29, 299200; 2026-02-21T09:47:42.3054073Z cvt.u64.u32 
%rd49, %r189; 2026-02-21T09:47:42.3054127Z // begin inline asm 2026-02-21T09:47:42.3054247Z @%p48 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd49]; 2026-02-21T09:47:42.3054308Z // end inline asm 2026-02-21T09:47:42.3054473Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3054554Z add.s32 %r190, %r335, 1; 2026-02-21T09:47:42.3054615Z setp.eq.b32 %p51, %r190, 4; 2026-02-21T09:47:42.3054715Z selp.b32 %r335, 0, %r190, %p51; 2026-02-21T09:47:42.3054786Z selp.b32 %r191, 1, 0, %p51; 2026-02-21T09:47:42.3054856Z xor.b32 %r334, %r334, %r191; 2026-02-21T09:47:42.3055047Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3055110Z add.s32 %r333, %r333, 256; 2026-02-21T09:47:42.3055176Z setp.lt.u32 %p52, %r333, 1792; 2026-02-21T09:47:42.3055247Z @%p52 bra $L__BB0_6; 2026-02-21T09:47:42.3055348Z // %bb.7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3055408Z barrier.sync 1; 2026-02-21T09:47:42.3055490Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3055592Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.3055692Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3055868Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3055955Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3056051Z ld.shared.v2.b32 {%r46, %r54}, [global_smem+299016]; 2026-02-21T09:47:42.3056110Z barrier.sync 1; 2026-02-21T09:47:42.3056297Z .loc 1 21 67 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:21:67 2026-02-21T09:47:42.3056361Z mov.u32 %r32, %ctaid.x; 2026-02-21T09:47:42.3056450Z mov.u32 %r33, %ctaid.y; 2026-02-21T09:47:42.3056507Z mov.u32 %r34, %ctaid.z; 2026-02-21T09:47:42.3056575Z mov.u32 %r35, %nctaid.x; 2026-02-21T09:47:42.3056634Z mov.u32 %r36, %nctaid.y; 2026-02-21T09:47:42.3056699Z mad.lo.s32 %r37, %r34, %r36, %r33; 2026-02-21T09:47:42.3056769Z mad.lo.s32 %r38, %r37, %r35, %r32; 
2026-02-21T09:47:42.3056829Z mul.lo.s32 %r39, %r38, 384; 2026-02-21T09:47:42.3057006Z .loc 1 22 68 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:22:68 2026-02-21T09:47:42.3057066Z add.s32 %r40, %r39, 128; 2026-02-21T09:47:42.3057134Z cvt.s64.s32 %rd8, %r40; 2026-02-21T09:47:42.3057194Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:47:42.3057256Z cvta.global.u64 %rd14, %rd9; 2026-02-21T09:47:42.3057439Z .loc 1 21 67 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:21:67 2026-02-21T09:47:42.3057498Z cvt.s64.s32 %rd10, %r39; 2026-02-21T09:47:42.3057558Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:47:42.3057632Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:47:42.3057691Z add.s32 %r13, %r1, -128; 2026-02-21T09:47:42.3057749Z shr.u32 %r14, %r13, 5; 2026-02-21T09:47:42.3057805Z mov.b32 %r336, 0; 2026-02-21T09:47:42.3057870Z mov.b32 %r337, %r336; 2026-02-21T09:47:42.3057927Z mov.b32 %r338, %r336; 2026-02-21T09:47:42.3058064Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:47:42.3058168Z // => This Inner Loop Header: Depth=2 2026-02-21T09:47:42.3058347Z .loc 1 0 67 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0:67 2026-02-21T09:47:42.3058411Z setp.lt.u32 %p9, %r13, 64; 2026-02-21T09:47:42.3058480Z setp.eq.b32 %p4, %r13, 0; 2026-02-21T09:47:42.3058654Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3058714Z shl.b32 %r60, %r337, 3; 2026-02-21T09:47:42.3058773Z add.s32 %r62, %r29, %r60; 2026-02-21T09:47:42.3058842Z add.s32 %r41, %r62, 299136; 2026-02-21T09:47:42.3059008Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3059067Z // begin inline asm 2026-02-21T09:47:42.3059127Z 2026-02-21T09:47:42.3059180Z { 2026-02-21T09:47:42.3059243Z .reg .pred complete; 2026-02-21T09:47:42.3059300Z waitLoop: 2026-02-21T09:47:42.3059426Z mbarrier.try_wait.parity.shared.b64 complete, [%r41], %r336; 2026-02-21T09:47:42.3059529Z @!complete bra.uni waitLoop; 
2026-02-21T09:47:42.3059579Z } 2026-02-21T09:47:42.3059583Z 2026-02-21T09:47:42.3059646Z // end inline asm 2026-02-21T09:47:42.3059819Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3059878Z add.s32 %r47, %r62, 299168; 2026-02-21T09:47:42.3060051Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3060108Z bar.sync 3, 64; 2026-02-21T09:47:42.3060166Z // begin inline asm 2026-02-21T09:47:42.3060280Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r47], 73728; 2026-02-21T09:47:42.3060344Z // end inline asm 2026-02-21T09:47:42.3060518Z .loc 1 54 31 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:54:31 2026-02-21T09:47:42.3060577Z shl.b32 %r63, %r337, 16; 2026-02-21T09:47:42.3060666Z add.s32 %r64, %r29, %r63; 2026-02-21T09:47:42.3060832Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3060890Z bar.sync 3, 64; 2026-02-21T09:47:42.3060974Z shfl.sync.idx.b32 %r65, %r14, 0, 31, -1; 2026-02-21T09:47:42.3061039Z elect.sync %r66|%p10, -1; 2026-02-21T09:47:42.3061101Z and.pred %p5, %p9, %p10; 2026-02-21T09:47:42.3061159Z and.b32 %r67, %r65, 3; 2026-02-21T09:47:42.3061225Z shl.b32 %r68, %r67, 14; 2026-02-21T09:47:42.3061284Z add.s32 %r44, %r64, %r68; 2026-02-21T09:47:42.3061340Z shl.b32 %r69, %r67, 6; 2026-02-21T09:47:42.3061445Z add.s32 %r45, %r338, %r69; 2026-02-21T09:47:42.3061503Z // begin inline asm 2026-02-21T09:47:42.3061757Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r44], [%rd12, {%r45, %r46}], [%r47]; 2026-02-21T09:47:42.3061821Z // end inline asm 2026-02-21T09:47:42.3061889Z xor.b32 %r70, %r67, 2; 2026-02-21T09:47:42.3061946Z shl.b32 %r71, %r70, 14; 2026-02-21T09:47:42.3062001Z add.s32 %r48, %r64, %r71; 2026-02-21T09:47:42.3062064Z shl.b32 %r72, %r70, 6; 2026-02-21T09:47:42.3062120Z add.s32 %r49, %r338, %r72; 2026-02-21T09:47:42.3062175Z // begin inline asm 
2026-02-21T09:47:42.3062409Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r48], [%rd12, {%r49, %r46}], [%r47]; 2026-02-21T09:47:42.3062463Z // end inline asm 2026-02-21T09:47:42.3062627Z .loc 1 55 44 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:55:44 2026-02-21T09:47:42.3062690Z shl.b32 %r73, %r337, 13; 2026-02-21T09:47:42.3062747Z add.s32 %r74, %r29, %r73; 2026-02-21T09:47:42.3062802Z add.s32 %r75, %r74, 262144; 2026-02-21T09:47:42.3062962Z .loc 1 0 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:0 2026-02-21T09:47:42.3063024Z bar.sync 3, 64; 2026-02-21T09:47:42.3063082Z elect.sync %r76|%p11, -1; 2026-02-21T09:47:42.3063185Z and.pred %p7, %p9, %p11; 2026-02-21T09:47:42.3063250Z shl.b32 %r77, %r67, 11; 2026-02-21T09:47:42.3063306Z add.s32 %r52, %r75, %r77; 2026-02-21T09:47:42.3063363Z // begin inline asm 2026-02-21T09:47:42.3063592Z @%p7 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r52], [%rd14, {%r45, %r54}], [%r47]; 2026-02-21T09:47:42.3063653Z // end inline asm 2026-02-21T09:47:42.3063707Z shl.b32 %r78, %r70, 11; 2026-02-21T09:47:42.3063763Z add.s32 %r56, %r75, %r78; 2026-02-21T09:47:42.3063826Z // begin inline asm 2026-02-21T09:47:42.3064055Z @%p7 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r56], [%rd14, {%r49, %r54}], [%r47]; 2026-02-21T09:47:42.3064113Z // end inline asm 2026-02-21T09:47:42.3064177Z add.s32 %r79, %r337, 1; 2026-02-21T09:47:42.3064240Z setp.eq.b32 %p12, %r79, 4; 2026-02-21T09:47:42.3064304Z selp.b32 %r337, 0, %r79, %p12; 2026-02-21T09:47:42.3064361Z selp.b32 %r80, 1, 0, %p12; 2026-02-21T09:47:42.3064428Z xor.b32 %r336, %r336, %r80; 2026-02-21T09:47:42.3064599Z .loc 1 50 79 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:50:79 2026-02-21T09:47:42.3064714Z add.s32 %r20, %r338, 256; 2026-02-21T09:47:42.3064785Z setp.lt.u32 %p13, %r338, 1792; 2026-02-21T09:47:42.3064841Z mov.b32 %r338, %r20; 
2026-02-21T09:47:42.3064896Z @%p13 bra $L__BB0_9; 2026-02-21T09:47:42.3064995Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3065049Z barrier.sync 1; 2026-02-21T09:47:42.3065125Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:47:42.3065178Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.3065278Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:47:42.3065436Z .loc 1 19 0 // cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py:19 2026-02-21T09:47:42.3065491Z barrier.sync 1; 2026-02-21T09:47:42.3065551Z barrier.sync 1; 2026-02-21T09:47:42.3065605Z bra.uni $L__BB0_2; 2026-02-21T09:47:42.3065684Z $L__tmp1: 2026-02-21T09:47:42.3065739Z $L__func_end0: 2026-02-21T09:47:42.3065827Z // -- End function 2026-02-21T09:47:42.3065877Z } 2026-02-21T09:47:42.3066074Z .file 1 "/tmp/torchinductor_root/vn/cvnwbbsu6uegfqyuwr2ybjydxukez5laqvhinjoi56mvckyv7hxl.py" 2026-02-21T09:47:42.3066142Z .section .debug_abbrev 2026-02-21T09:47:42.3066192Z { 2026-02-21T09:47:42.3066277Z .b8 1 // Abbreviation Code 2026-02-21T09:47:42.3066371Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:47:42.3066451Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:47:42.3066559Z .b8 37 // DW_AT_producer 2026-02-21T09:47:42.3066633Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.3066712Z .b8 19 // DW_AT_language 2026-02-21T09:47:42.3066787Z .b8 5 // DW_FORM_data2 2026-02-21T09:47:42.3066861Z .b8 3 // DW_AT_name 2026-02-21T09:47:42.3066941Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.3067018Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:47:42.3067091Z .b8 6 // DW_FORM_data4 2026-02-21T09:47:42.3067170Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:47:42.3067240Z .b8 8 // DW_FORM_string 2026-02-21T09:47:42.3067310Z .b8 0 // EOM(1) 2026-02-21T09:47:42.3067378Z .b8 0 // EOM(2) 2026-02-21T09:47:42.3067450Z .b8 0 // EOM(3) 2026-02-21T09:47:42.3067499Z } 2026-02-21T09:47:42.3067557Z .section .debug_info 2026-02-21T09:47:42.3067614Z { 2026-02-21T09:47:42.3067696Z .b32 104 // Length of Unit 2026-02-21T09:47:42.3067807Z .b8 2 // 
DWARF version number 2026-02-21T09:47:42.3067866Z .b8 0 2026-02-21T09:47:42.3067978Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:47:42.3068066Z .b8 8 // Address Size (in bytes) 2026-02-21T09:47:42.3068160Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:47:42.3068245Z .b8 116 // DW_AT_producer 2026-02-21T09:47:42.3068295Z .b8 114 2026-02-21T09:47:42.3068346Z .b8 105 2026-02-21T09:47:42.3068399Z .b8 116 2026-02-21T09:47:42.3068448Z .b8 111 2026-02-21T09:47:42.3068495Z .b8 110 2026-02-21T09:47:42.3068546Z .b8 0 2026-02-21T09:47:42.3068622Z .b8 2 // DW_AT_language 2026-02-21T09:47:42.3068670Z .b8 0 2026-02-21T09:47:42.3068740Z .b8 99 // DW_AT_name 2026-02-21T09:47:42.3068795Z .b8 118 2026-02-21T09:47:42.3068844Z .b8 110 2026-02-21T09:47:42.3068893Z .b8 119 2026-02-21T09:47:42.3068942Z .b8 98 2026-02-21T09:47:42.3069000Z .b8 98 2026-02-21T09:47:42.3069048Z .b8 115 2026-02-21T09:47:42.3069096Z .b8 117 2026-02-21T09:47:42.3069185Z .b8 54 2026-02-21T09:47:42.3069234Z .b8 117 2026-02-21T09:47:42.3069282Z .b8 101 2026-02-21T09:47:42.3069330Z .b8 103 2026-02-21T09:47:42.3069387Z .b8 102 2026-02-21T09:47:42.3069435Z .b8 113 2026-02-21T09:47:42.3069483Z .b8 121 2026-02-21T09:47:42.3069531Z .b8 117 2026-02-21T09:47:42.3069587Z .b8 119 2026-02-21T09:47:42.3069636Z .b8 114 2026-02-21T09:47:42.3069683Z .b8 50 2026-02-21T09:47:42.3069740Z .b8 121 2026-02-21T09:47:42.3069788Z .b8 98 2026-02-21T09:47:42.3069839Z .b8 106 2026-02-21T09:47:42.3069887Z .b8 121 2026-02-21T09:47:42.3069946Z .b8 100 2026-02-21T09:47:42.3069993Z .b8 120 2026-02-21T09:47:42.3070042Z .b8 117 2026-02-21T09:47:42.3070097Z .b8 107 2026-02-21T09:47:42.3070145Z .b8 101 2026-02-21T09:47:42.3070194Z .b8 122 2026-02-21T09:47:42.3070242Z .b8 53 2026-02-21T09:47:42.3070299Z .b8 108 2026-02-21T09:47:42.3070348Z .b8 97 2026-02-21T09:47:42.3070397Z .b8 113 2026-02-21T09:47:42.3070478Z .b8 118 2026-02-21T09:47:42.3070527Z .b8 104 2026-02-21T09:47:42.3070576Z .b8 105 
2026-02-21T09:47:42.3070627Z .b8 110 2026-02-21T09:47:42.3070687Z .b8 106 2026-02-21T09:47:42.3070734Z .b8 111 2026-02-21T09:47:42.3070783Z .b8 105 2026-02-21T09:47:42.3070832Z .b8 53 2026-02-21T09:47:42.3070891Z .b8 54 2026-02-21T09:47:42.3070940Z .b8 109 2026-02-21T09:47:42.3070988Z .b8 118 2026-02-21T09:47:42.3071043Z .b8 99 2026-02-21T09:47:42.3071092Z .b8 107 2026-02-21T09:47:42.3071140Z .b8 121 2026-02-21T09:47:42.3071187Z .b8 118 2026-02-21T09:47:42.3071243Z .b8 55 2026-02-21T09:47:42.3071290Z .b8 104 2026-02-21T09:47:42.3071364Z .b8 120 2026-02-21T09:47:42.3071419Z .b8 108 2026-02-21T09:47:42.3071468Z .b8 46 2026-02-21T09:47:42.3071518Z .b8 112 2026-02-21T09:47:42.3071566Z .b8 121 2026-02-21T09:47:42.3071624Z .b8 0 2026-02-21T09:47:42.3071713Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:47:42.3071786Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:47:42.3071843Z .b8 116 2026-02-21T09:47:42.3071891Z .b8 109 2026-02-21T09:47:42.3071939Z .b8 112 2026-02-21T09:47:42.3071988Z .b8 47 2026-02-21T09:47:42.3072044Z .b8 116 2026-02-21T09:47:42.3072093Z .b8 111 2026-02-21T09:47:42.3072141Z .b8 114 2026-02-21T09:47:42.3072189Z .b8 99 2026-02-21T09:47:42.3072246Z .b8 104 2026-02-21T09:47:42.3072295Z .b8 105 2026-02-21T09:47:42.3072344Z .b8 110 2026-02-21T09:47:42.3072398Z .b8 100 2026-02-21T09:47:42.3072446Z .b8 117 2026-02-21T09:47:42.3072494Z .b8 99 2026-02-21T09:47:42.3072543Z .b8 116 2026-02-21T09:47:42.3072599Z .b8 111 2026-02-21T09:47:42.3072648Z .b8 114 2026-02-21T09:47:42.3072696Z .b8 95 2026-02-21T09:47:42.3072753Z .b8 114 2026-02-21T09:47:42.3072801Z .b8 111 2026-02-21T09:47:42.3072849Z .b8 111 2026-02-21T09:47:42.3072897Z .b8 116 2026-02-21T09:47:42.3072953Z .b8 47 2026-02-21T09:47:42.3073001Z .b8 118 2026-02-21T09:47:42.3073049Z .b8 110 2026-02-21T09:47:42.3073104Z .b8 0 2026-02-21T09:47:42.3073153Z } 2026-02-21T09:47:42.3073238Z .section .debug_macinfo { } 2026-02-21T09:47:42.3073244Z 2026-02-21T09:47:42.3073320Z 
================================================================ 2026-02-21T09:47:42.3073428Z please share the reproducer above with Triton project. 2026-02-21T09:47:46.3399662Z 2026-02-21T09:47:46.3401445Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 88/88 17.5 configs/s 2026-02-21T09:47:52.0561103Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 174.1 2026-02-21T09:47:52.0565608Z configs/s 2026-02-21T09:47:52.2415634Z [89s] Generation 3 complete: 2026-02-21T09:47:52.2417563Z error=18 2026-02-21T09:47:52.2417732Z ok=72 2026-02-21T09:47:52.2417861Z min=0.1353 2026-02-21T09:47:52.2418001Z mid=0.2733 2026-02-21T09:47:52.2418127Z max=22.0365 2026-02-21T09:47:52.2418284Z best={'block_sizes': [256, 128, 64], 2026-02-21T09:47:52.2418534Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:47:52.2418785Z 'l2_groupings': [8], 2026-02-21T09:47:52.2418962Z 'load_eviction_policies': ['', ''], 2026-02-21T09:47:52.2419483Z 'loop_orders': [[0, 1]], 2026-02-21T09:47:52.2419640Z 'maxnreg': 64, 2026-02-21T09:47:52.2419806Z 'num_sm_multiplier': 8, 2026-02-21T09:47:52.2419965Z 'num_stages': 2, 2026-02-21T09:47:52.2420097Z 'num_warps': 1, 2026-02-21T09:47:52.2420252Z 'pid_type': 'persistent_blocked', 2026-02-21T09:47:52.2420428Z 'range_flattens': [False, False], 2026-02-21T09:47:52.2420610Z 'range_multi_buffers': [False, False], 2026-02-21T09:47:52.2420787Z 'range_num_stages': [0, 0], 2026-02-21T09:47:52.2420953Z 'range_unroll_factors': [0, 0], 2026-02-21T09:47:52.2421130Z 'range_warp_specializes': [None, True]} 2026-02-21T09:47:52.2438182Z [89s] Fitting surrogate: 381 points, 381 targets 2026-02-21T09:47:53.4051936Z [90s] Generation 4 starting: 86 neighbors, 5 active search path(s) 2026-02-21T09:48:16.9836298Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 0.5 configs/s 2026-02-21T09:48:17.3518320Z [114s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:48:17.3518703Z 2026-02-21T09:48:17.3523920Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:48:17.3525763Z 2026-02-21T09:48:17.3525929Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:48:17.3526246Z ================================================================ 2026-02-21T09:48:17.3528297Z `ptxas` stderr: 2026-02-21T09:48:17.3529172Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:17.3529840Z Internal Triton PTX codegen error 2026-02-21T09:48:17.3530088Z `ptxas` stderr: 2026-02-21T09:48:17.3530653Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:48:17.3531282Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.3531496Z 2026-02-21T09:48:17.3532055Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp9m99robr.ptx -o /tmp/tmp9m99robr.ptx.o 2026-02-21T09:48:17.3532640Z 2026-02-21T09:48:17.3532645Z 2026-02-21T09:48:17.3532724Z // 2026-02-21T09:48:17.3532930Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:48:17.3533186Z // 2026-02-21T09:48:17.3533293Z 2026-02-21T09:48:17.3533372Z .version 8.7 2026-02-21T09:48:17.3533554Z .target sm_100a 2026-02-21T09:48:17.3533756Z .address_size 64 2026-02-21T09:48:17.3533855Z 2026-02-21T09:48:17.3534029Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:48:17.3534486Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:48:17.3534863Z // @_helion_matmul 2026-02-21T09:48:17.3535118Z .visible .entry _helion_matmul( 2026-02-21T09:48:17.3535385Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:48:17.3535731Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:48:17.3536059Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:48:17.3536394Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:48:17.3536675Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:48:17.3536914Z ) 2026-02-21T09:48:17.3537048Z .reqntid 256 2026-02-21T09:48:17.3537202Z .maxnreg 32 2026-02-21T09:48:17.3537341Z { 2026-02-21T09:48:17.3537492Z .reg .pred %p<120>; 2026-02-21T09:48:17.3537665Z .reg .b16 %rs<15>; 2026-02-21T09:48:17.3537829Z .reg .b32 %r<342>; 2026-02-21T09:48:17.3537984Z .reg .b64 %rd<172>; 2026-02-21T09:48:17.3538290Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19:0 2026-02-21T09:48:17.3538626Z $L__func_begin0: 2026-02-21T09:48:17.3538906Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19:0 
2026-02-21T09:48:17.3539167Z 2026-02-21T09:48:17.3539235Z // %bb.0: 2026-02-21T09:48:17.3539406Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:48:17.3539632Z $L__tmp0: 2026-02-21T09:48:17.3539935Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19 2026-02-21T09:48:17.3540254Z mov.u32 %r1, %tid.x; 2026-02-21T09:48:17.3540413Z shr.u32 %r2, %r1, 5; 2026-02-21T09:48:17.3540594Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:48:17.3540808Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:48:17.3541036Z @%p1 bra $L__BB0_12; 2026-02-21T09:48:17.3541206Z bra.uni $L__BB0_1; 2026-02-21T09:48:17.3541361Z $L__BB0_12: 2026-02-21T09:48:17.3541630Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0:0 2026-02-21T09:48:17.3541968Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:48:17.3542211Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:48:17.3542443Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:48:17.3542777Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19 2026-02-21T09:48:17.3543125Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.3543343Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:48:17.3543540Z mov.b32 %r129, global_smem; 2026-02-21T09:48:17.3543716Z // begin inline asm 2026-02-21T09:48:17.3543995Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r129], 64; 2026-02-21T09:48:17.3544266Z // end inline asm 2026-02-21T09:48:17.3544462Z bar.sync 0, 128; 2026-02-21T09:48:17.3544637Z ld.shared.b32 %r334, [global_smem]; 2026-02-21T09:48:17.3544881Z bar.sync 0, 128; 2026-02-21T09:48:17.3545037Z // begin inline asm 2026-02-21T09:48:17.3545263Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:48:17.3545524Z // end inline asm 2026-02-21T09:48:17.3545797Z .loc 1 21 67 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:21:67 2026-02-21T09:48:17.3546124Z mov.u32 %r21, %ctaid.x; 2026-02-21T09:48:17.3546293Z mov.u32 
%r154, %ctaid.y; 2026-02-21T09:48:17.3546467Z mov.u32 %r155, %ctaid.z; 2026-02-21T09:48:17.3546646Z mov.u32 %r156, %nctaid.x; 2026-02-21T09:48:17.3546819Z mov.u32 %r157, %nctaid.y; 2026-02-21T09:48:17.3547005Z mad.lo.s32 %r158, %r155, %r157, %r154; 2026-02-21T09:48:17.3547207Z mad.lo.s32 %r159, %r158, %r156, %r21; 2026-02-21T09:48:17.3547411Z mul.lo.s32 %r160, %r159, 384; 2026-02-21T09:48:17.3547591Z cvt.s64.s32 %rd104, %r160; 2026-02-21T09:48:17.3547778Z add.s64 %rd65, %rd7, %rd104; 2026-02-21T09:48:17.3547951Z shl.b32 %r161, %r1, 2; 2026-02-21T09:48:17.3548172Z add.s32 %r130, %r129, %r161; 2026-02-21T09:48:17.3548340Z mov.b32 %r139, 0; 2026-02-21T09:48:17.3548495Z // begin inline asm 2026-02-21T09:48:17.3548671Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:17.3548868Z // end inline asm 2026-02-21T09:48:17.3549022Z bar.warp.sync -1; 2026-02-21T09:48:17.3549180Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:48:17.3549359Z cvt.u64.u32 %rd50, %r129; 2026-02-21T09:48:17.3549524Z // begin inline asm 2026-02-21T09:48:17.3549805Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd4; 2026-02-21T09:48:17.3550123Z // end inline asm 2026-02-21T09:48:17.3550278Z // begin inline asm 2026-02-21T09:48:17.3550529Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:17.3550805Z // end inline asm 2026-02-21T09:48:17.3550957Z mov.b32 %r132, 64; 2026-02-21T09:48:17.3551107Z // begin inline asm 2026-02-21T09:48:17.3551372Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:17.3551678Z // end inline asm 2026-02-21T09:48:17.3551833Z // begin inline asm 2026-02-21T09:48:17.3552084Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:17.3552383Z // end inline asm 2026-02-21T09:48:17.3552537Z mov.b32 %r134, 2048; 2026-02-21T09:48:17.3552693Z // begin inline asm 2026-02-21T09:48:17.3552970Z @%p37 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:17.3553282Z // end inline asm 2026-02-21T09:48:17.3553473Z // begin inline asm 2026-02-21T09:48:17.3553742Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:17.3554059Z // end inline asm 2026-02-21T09:48:17.3554215Z mov.b64 %rd58, 4096; 2026-02-21T09:48:17.3554371Z // begin inline asm 2026-02-21T09:48:17.3554832Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.3555001Z 2026-02-21T09:48:17.3555444Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp9m99robr.ptx -o /tmp/tmp9m99robr.ptx.o 2026-02-21T09:48:17.3555971Z 2026-02-21T09:48:17.3556117Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:48:17.3556528Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:17.3556844Z // end inline asm 2026-02-21T09:48:17.3556997Z mov.b32 %r136, 1; 2026-02-21T09:48:17.3557146Z // begin inline asm 2026-02-21T09:48:17.3557439Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:17.3557755Z // end inline asm 2026-02-21T09:48:17.3557912Z // begin inline asm 2026-02-21T09:48:17.3558241Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:17.3558569Z // end inline asm 2026-02-21T09:48:17.3558723Z // begin inline asm 2026-02-21T09:48:17.3558980Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:17.3559284Z // end inline asm 2026-02-21T09:48:17.3559428Z // begin inline asm 2026-02-21T09:48:17.3559710Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3560017Z // end inline asm 2026-02-21T09:48:17.3560169Z // begin 
inline asm 2026-02-21T09:48:17.3560431Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:17.3560721Z // end inline asm 2026-02-21T09:48:17.3560874Z // begin inline asm 2026-02-21T09:48:17.3561122Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3561413Z // end inline asm 2026-02-21T09:48:17.3561557Z // begin inline asm 2026-02-21T09:48:17.3561943Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd65 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:17.3562431Z // end inline asm 2026-02-21T09:48:17.3562576Z // begin inline asm 2026-02-21T09:48:17.3562811Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd65 + 0 ], 0x80; 2026-02-21T09:48:17.3563092Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.3563310Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.3563502Z // end inline asm 2026-02-21T09:48:17.3563654Z bar.sync 0, 128; 2026-02-21T09:48:17.3563933Z .loc 1 22 68 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:22:68 2026-02-21T09:48:17.3564255Z add.s32 %r162, %r160, 128; 2026-02-21T09:48:17.3564438Z cvt.s64.s32 %rd105, %r162; 2026-02-21T09:48:17.3564617Z add.s64 %rd83, %rd7, %rd105; 2026-02-21T09:48:17.3564839Z bar.sync 0, 128; 2026-02-21T09:48:17.3564986Z // begin inline asm 2026-02-21T09:48:17.3565160Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:17.3565353Z // end inline asm 2026-02-21T09:48:17.3565512Z bar.warp.sync -1; 2026-02-21T09:48:17.3565666Z // begin inline asm 2026-02-21T09:48:17.3565947Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd5; 2026-02-21T09:48:17.3566263Z // end inline asm 2026-02-21T09:48:17.3566410Z // begin inline asm 2026-02-21T09:48:17.3566705Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:17.3566989Z // end inline asm 2026-02-21T09:48:17.3567148Z // begin inline 
asm 2026-02-21T09:48:17.3567404Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:17.3567736Z // end inline asm 2026-02-21T09:48:17.3567888Z // begin inline asm 2026-02-21T09:48:17.3568140Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:17.3568438Z // end inline asm 2026-02-21T09:48:17.3568582Z // begin inline asm 2026-02-21T09:48:17.3568885Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:17.3569193Z // end inline asm 2026-02-21T09:48:17.3569351Z mov.b32 %r143, 12288; 2026-02-21T09:48:17.3569510Z // begin inline asm 2026-02-21T09:48:17.3569786Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r143; 2026-02-21T09:48:17.3570099Z // end inline asm 2026-02-21T09:48:17.3570244Z // begin inline asm 2026-02-21T09:48:17.3570526Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:17.3570844Z // end inline asm 2026-02-21T09:48:17.3570998Z // begin inline asm 2026-02-21T09:48:17.3571277Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:17.3571602Z // end inline asm 2026-02-21T09:48:17.3571757Z // begin inline asm 2026-02-21T09:48:17.3572037Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:17.3572403Z // end inline asm 2026-02-21T09:48:17.3572551Z // begin inline asm 2026-02-21T09:48:17.3572804Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:17.3573088Z // end inline asm 2026-02-21T09:48:17.3573239Z // begin inline asm 2026-02-21T09:48:17.3573512Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3573814Z // end inline asm 2026-02-21T09:48:17.3573964Z // begin inline asm 
2026-02-21T09:48:17.3574216Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:17.3574512Z // end inline asm 2026-02-21T09:48:17.3574658Z // begin inline asm 2026-02-21T09:48:17.3574956Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3575234Z // end inline asm 2026-02-21T09:48:17.3575386Z // begin inline asm 2026-02-21T09:48:17.3575778Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd83 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:17.3576196Z // end inline asm 2026-02-21T09:48:17.3576383Z // begin inline asm 2026-02-21T09:48:17.3576611Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd83 + 0 ], 0x80; 2026-02-21T09:48:17.3576894Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.3577102Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.3577304Z // end inline asm 2026-02-21T09:48:17.3577456Z bar.sync 0, 128; 2026-02-21T09:48:17.3577732Z .loc 1 24 73 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:24:73 2026-02-21T09:48:17.3578057Z add.s32 %r163, %r160, 256; 2026-02-21T09:48:17.3578234Z cvt.s64.s32 %rd106, %r163; 2026-02-21T09:48:17.3578421Z add.s64 %rd101, %rd7, %rd106; 2026-02-21T09:48:17.3578597Z bar.sync 0, 128; 2026-02-21T09:48:17.3578759Z // begin inline asm 2026-02-21T09:48:17.3578930Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:17.3579142Z // end inline asm 2026-02-21T09:48:17.3579298Z bar.warp.sync -1; 2026-02-21T09:48:17.3579451Z // begin inline asm 2026-02-21T09:48:17.3579727Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd6; 2026-02-21T09:48:17.3580029Z // end inline asm 2026-02-21T09:48:17.3580181Z // begin inline asm 2026-02-21T09:48:17.3580424Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:17.3580715Z // end inline asm 2026-02-21T09:48:17.3580858Z // begin inline asm 
2026-02-21T09:48:17.3581121Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:17.3581420Z // end inline asm 2026-02-21T09:48:17.3581624Z // begin inline asm 2026-02-21T09:48:17.3581889Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:17.3582186Z // end inline asm 2026-02-21T09:48:17.3582339Z // begin inline asm 2026-02-21T09:48:17.3582608Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r143; 2026-02-21T09:48:17.3582946Z // end inline asm 2026-02-21T09:48:17.3583104Z // begin inline asm 2026-02-21T09:48:17.3583376Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:17.3583702Z // end inline asm 2026-02-21T09:48:17.3583856Z mov.b64 %rd94, 24576; 2026-02-21T09:48:17.3584031Z // begin inline asm 2026-02-21T09:48:17.3584313Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd94; 2026-02-21T09:48:17.3584637Z // end inline asm 2026-02-21T09:48:17.3584830Z // begin inline asm 2026-02-21T09:48:17.3585116Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:17.3585437Z // end inline asm 2026-02-21T09:48:17.3585580Z // begin inline asm 2026-02-21T09:48:17.3585860Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:17.3586180Z // end inline asm 2026-02-21T09:48:17.3586370Z // begin inline asm 2026-02-21T09:48:17.3586627Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:17.3586920Z // end inline asm 2026-02-21T09:48:17.3587072Z // begin inline asm 2026-02-21T09:48:17.3587343Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3587658Z // end inline asm 2026-02-21T09:48:17.3587802Z // begin inline asm 
2026-02-21T09:48:17.3588064Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:17.3588353Z // end inline asm 2026-02-21T09:48:17.3588505Z // begin inline asm 2026-02-21T09:48:17.3588763Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:17.3589049Z // end inline asm 2026-02-21T09:48:17.3589205Z // begin inline asm 2026-02-21T09:48:17.3589583Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:17.3590023Z // end inline asm 2026-02-21T09:48:17.3590171Z // begin inline asm 2026-02-21T09:48:17.3590453Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T09:48:17.3590742Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.3590952Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.3591151Z // end inline asm 2026-02-21T09:48:17.3591296Z bar.sync 0, 128; 2026-02-21T09:48:17.3591460Z cvta.global.u64 %rd107, %rd101; 2026-02-21T09:48:17.3591766Z .loc 1 31 35 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:31:35 2026-02-21T09:48:17.3592088Z mul.lo.s32 %r341, %r21, 3; 2026-02-21T09:48:17.3592378Z .loc 1 32 37 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:32:37 2026-02-21T09:48:17.3592691Z add.s32 %r164, %r341, 3; 2026-02-21T09:48:17.3592978Z .loc 1 32 49 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:32:49 2026-02-21T09:48:17.3593289Z min.s32 %r23, %r164, 6144; 2026-02-21T09:48:17.3593581Z .loc 1 33 84 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:33:84 2026-02-21T09:48:17.3593895Z setp.ge.s32 %p90, %r341, %r23; 2026-02-21T09:48:17.3594086Z @%p90 bra $L__BB0_15; 2026-02-21T09:48:17.3594267Z // %bb.13: // %.lr.ph 2026-02-21T09:48:17.3594598Z .loc 1 0 84 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0:84 2026-02-21T09:48:17.3595001Z shl.b32 %r165, %r1, 7; 2026-02-21T09:48:17.3595169Z 
and.b32 %r166, %r165, 1920; 2026-02-21T09:48:17.3595352Z shl.b32 %r167, %r1, 6; 2026-02-21T09:48:17.3595555Z and.b32 %r168, %r167, 6144; 2026-02-21T09:48:17.3595734Z shl.b32 %r169, %r1, 4; 2026-02-21T09:48:17.3595894Z and.b32 %r170, %r169, 112; 2026-02-21T09:48:17.3596066Z and.b32 %r172, %r161, 64; 2026-02-21T09:48:17.3596232Z or.b32 %r173, %r168, %r170; 2026-02-21T09:48:17.3596412Z xor.b32 %r174, %r173, %r172; 2026-02-21T09:48:17.3596611Z or.b32 %r175, %r174, %r166; 2026-02-21T09:48:17.3596792Z add.s32 %r177, %r129, 65536; 2026-02-21T09:48:17.3596971Z add.s32 %r24, %r177, %r175; 2026-02-21T09:48:17.3597137Z xor.b32 %r178, %r175, 16; 2026-02-21T09:48:17.3597310Z add.s32 %r25, %r177, %r178; 2026-02-21T09:48:17.3597477Z xor.b32 %r179, %r175, 32; 2026-02-21T09:48:17.3597651Z add.s32 %r26, %r177, %r179; 2026-02-21T09:48:17.3597815Z xor.b32 %r180, %r175, 48; 2026-02-21T09:48:17.3597988Z add.s32 %r27, %r177, %r180; 2026-02-21T09:48:17.3598278Z .loc 1 33 84 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:33:84 2026-02-21T09:48:17.3598608Z cvt.u16.u32 %rs4, %r21; 2026-02-21T09:48:17.3598785Z mul.lo.s16 %rs14, %rs4, 3; 2026-02-21T09:48:17.3599010Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:48:17.3599370Z .loc 1 39 35 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:39:35 2026-02-21T09:48:17.3599726Z mul.hi.s32 %r276, %r341, 715827883; 2026-02-21T09:48:17.3599922Z shr.u32 %r277, %r276, 31; 2026-02-21T09:48:17.3600091Z shr.s32 %r278, %r276, 7; 2026-02-21T09:48:17.3600265Z add.s32 %r279, %r278, %r277; 2026-02-21T09:48:17.3600558Z .loc 1 43 51 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:43:51 2026-02-21T09:48:17.3600882Z cvt.u16.u32 %rs5, %r279; 2026-02-21T09:48:17.3601081Z mad.lo.s16 %rs6, %rs5, -768, %rs14; 2026-02-21T09:48:17.3601274Z shr.s16 %rs7, %rs6, 15; 2026-02-21T09:48:17.3601452Z shr.u16 %rs8, %rs7, 14; 2026-02-21T09:48:17.3601618Z add.s16 %rs9, %rs6, %rs8; 2026-02-21T09:48:17.3601797Z shr.s16 %rs10, %rs9, 2; 
2026-02-21T09:48:17.3602084Z .loc 1 42 64 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:42:64 2026-02-21T09:48:17.3602419Z and.b16 %rs11, %rs9, -4; 2026-02-21T09:48:17.3602616Z mad.lo.s16 %rs12, %rs5, 768, %rs11; 2026-02-21T09:48:17.3602808Z sub.s16 %rs13, %rs14, %rs12; 2026-02-21T09:48:17.3603114Z .loc 1 44 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:44:27 2026-02-21T09:48:17.3603457Z shl.b32 %r280, %r279, 8; 2026-02-21T09:48:17.3603639Z mad.wide.s16 %r274, %rs13, 64, %r280; 2026-02-21T09:48:17.3603950Z .loc 1 45 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:45:27 2026-02-21T09:48:17.3604269Z mul.wide.s16 %r273, %rs10, 64; 2026-02-21T09:48:17.3604564Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3604931Z shfl.sync.idx.b32 %r281, %r2, 0, 31, -1; 2026-02-21T09:48:17.3605142Z shl.b32 %r282, %r281, 21; 2026-02-21T09:48:17.3605312Z and.b32 %r283, %r282, 6291456; 2026-02-21T09:48:17.3605498Z add.s32 %r181, %r283, %r334; 2026-02-21T09:48:17.3605677Z mov.pred %p91, -1; 2026-02-21T09:48:17.3605848Z mov.b32 %r182, 0; 2026-02-21T09:48:17.3606005Z // begin inline asm 2026-02-21T09:48:17.3606453Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 0], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:17.3606909Z // end inline asm 2026-02-21T09:48:17.3607062Z // begin inline asm 2026-02-21T09:48:17.3607486Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 16], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:17.3607933Z // end inline asm 2026-02-21T09:48:17.3608092Z // begin inline asm 2026-02-21T09:48:17.3608259Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:48:17.3608449Z // end inline asm 2026-02-21T09:48:17.3608627Z bar.sync 0, 128; 2026-02-21T09:48:17.3608907Z .loc 1 50 79 // 
cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3609232Z add.s32 %r215, %r129, 73856; 2026-02-21T09:48:17.3609400Z // begin inline asm 2026-02-21T09:48:17.3609593Z @%p37 mbarrier.init.shared::cta.b64 [%r215], 1; 2026-02-21T09:48:17.3609832Z // end inline asm 2026-02-21T09:48:17.3609986Z bar.sync 0, 128; 2026-02-21T09:48:17.3610141Z add.s32 %r216, %r129, 73864; 2026-02-21T09:48:17.3610324Z // begin inline asm 2026-02-21T09:48:17.3610507Z @%p37 mbarrier.init.shared::cta.b64 [%r216], 1; 2026-02-21T09:48:17.3610730Z // end inline asm 2026-02-21T09:48:17.3610886Z bar.sync 0, 128; 2026-02-21T09:48:17.3611041Z add.s32 %r217, %r129, 73872; 2026-02-21T09:48:17.3611224Z // begin inline asm 2026-02-21T09:48:17.3611407Z @%p37 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:48:17.3611625Z // end inline asm 2026-02-21T09:48:17.3611776Z bar.sync 0, 128; 2026-02-21T09:48:17.3611939Z add.s32 %r218, %r129, 73880; 2026-02-21T09:48:17.3612113Z // begin inline asm 2026-02-21T09:48:17.3612302Z @%p37 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:48:17.3612516Z // end inline asm 2026-02-21T09:48:17.3612672Z add.s32 %r219, %r129, 73888; 2026-02-21T09:48:17.3612888Z // begin inline asm 2026-02-21T09:48:17.3613102Z @%p37 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T09:48:17.3613354Z // end inline asm 2026-02-21T09:48:17.3613548Z bar.sync 0, 128; 2026-02-21T09:48:17.3613748Z add.s32 %r220, %r129, 73896; 2026-02-21T09:48:17.3613944Z // begin inline asm 2026-02-21T09:48:17.3614157Z @%p37 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T09:48:17.3614395Z // end inline asm 2026-02-21T09:48:17.3614576Z bar.sync 0, 128; 2026-02-21T09:48:17.3614801Z add.s32 %r221, %r129, 73904; 2026-02-21T09:48:17.3615007Z // begin inline asm 2026-02-21T09:48:17.3615189Z @%p37 mbarrier.init.shared::cta.b64 [%r221], 1; 2026-02-21T09:48:17.3615394Z // end inline asm 2026-02-21T09:48:17.3615544Z bar.sync 0, 128; 2026-02-21T09:48:17.3615695Z add.s32 %r222, 
%r129, 73912; 2026-02-21T09:48:17.3615870Z // begin inline asm 2026-02-21T09:48:17.3616044Z @%p37 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T09:48:17.3616255Z // end inline asm 2026-02-21T09:48:17.3616537Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3616849Z bar.sync 0, 128; 2026-02-21T09:48:17.3617039Z // begin inline asm 2026-02-21T09:48:17.3617224Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r215]; 2026-02-21T09:48:17.3617443Z // end inline asm 2026-02-21T09:48:17.3617586Z bar.sync 0, 128; 2026-02-21T09:48:17.3617736Z // begin inline asm 2026-02-21T09:48:17.3617916Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r216]; 2026-02-21T09:48:17.3618134Z // end inline asm 2026-02-21T09:48:17.3618286Z bar.sync 0, 128; 2026-02-21T09:48:17.3618429Z // begin inline asm 2026-02-21T09:48:17.3618613Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r217]; 2026-02-21T09:48:17.3618819Z // end inline asm 2026-02-21T09:48:17.3618965Z bar.sync 0, 128; 2026-02-21T09:48:17.3619108Z // begin inline asm 2026-02-21T09:48:17.3619288Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r218]; 2026-02-21T09:48:17.3619494Z // end inline asm 2026-02-21T09:48:17.3619774Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3620090Z bar.sync 0, 128; 2026-02-21T09:48:17.3620239Z add.s32 %r227, %r129, 73920; 2026-02-21T09:48:17.3620414Z // begin inline asm 2026-02-21T09:48:17.3620590Z @%p37 mbarrier.init.shared::cta.b64 [%r227], 1; 2026-02-21T09:48:17.3620798Z // end inline asm 2026-02-21T09:48:17.3620966Z st.shared.b32 [global_smem+73928], 33554689; 2026-02-21T09:48:17.3621197Z st.shared.b32 [global_smem+73728], %r334; 2026-02-21T09:48:17.3621438Z st.shared.v2.b32 [global_smem+73736], {%r274, %r273}; 2026-02-21T09:48:17.3621667Z barrier.sync 1; 2026-02-21T09:48:17.3621886Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.3622094Z barrier.sync 1; 2026-02-21T09:48:17.3622250Z barrier.sync 1; 
2026-02-21T09:48:17.3622420Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.3622748Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3623108Z bar.sync 0, 128; 2026-02-21T09:48:17.3623272Z // begin inline asm 2026-02-21T09:48:17.3623423Z 2026-02-21T09:48:17.3623559Z { 2026-02-21T09:48:17.3623694Z .reg .pred complete; 2026-02-21T09:48:17.3623863Z waitLoop: 2026-02-21T09:48:17.3624071Z mbarrier.try_wait.parity.shared.b64 complete, [%r227], %r182; 2026-02-21T09:48:17.3624335Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.3624510Z } 2026-02-21T09:48:17.3624582Z 2026-02-21T09:48:17.3624641Z // end inline asm 2026-02-21T09:48:17.3624969Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3625280Z bar.sync 0, 128; 2026-02-21T09:48:17.3625437Z // begin inline asm 2026-02-21T09:48:17.3625619Z @%p37 mbarrier.inval.shared::cta.b64 [%r227]; 2026-02-21T09:48:17.3625831Z // end inline asm 2026-02-21T09:48:17.3625983Z // begin inline asm 2026-02-21T09:48:17.3626161Z @%p37 mbarrier.inval.shared::cta.b64 [%r219]; 2026-02-21T09:48:17.3626414Z // end inline asm 2026-02-21T09:48:17.3626563Z bar.sync 0, 128; 2026-02-21T09:48:17.3626716Z // begin inline asm 2026-02-21T09:48:17.3626890Z @%p37 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T09:48:17.3627096Z // end inline asm 2026-02-21T09:48:17.3627240Z bar.sync 0, 128; 2026-02-21T09:48:17.3627390Z // begin inline asm 2026-02-21T09:48:17.3627562Z @%p37 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T09:48:17.3627765Z // end inline asm 2026-02-21T09:48:17.3627914Z bar.sync 0, 128; 2026-02-21T09:48:17.3628057Z // begin inline asm 2026-02-21T09:48:17.3628237Z @%p37 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T09:48:17.3628434Z // end inline asm 2026-02-21T09:48:17.3628591Z // begin inline asm 2026-02-21T09:48:17.3628762Z @%p37 mbarrier.inval.shared::cta.b64 [%r215]; 2026-02-21T09:48:17.3628962Z // end inline asm 
2026-02-21T09:48:17.3629102Z bar.sync 0, 128; 2026-02-21T09:48:17.3629253Z // begin inline asm 2026-02-21T09:48:17.3629433Z @%p37 mbarrier.inval.shared::cta.b64 [%r216]; 2026-02-21T09:48:17.3629630Z // end inline asm 2026-02-21T09:48:17.3629779Z bar.sync 0, 128; 2026-02-21T09:48:17.3629988Z // begin inline asm 2026-02-21T09:48:17.3630168Z @%p37 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:48:17.3630361Z // end inline asm 2026-02-21T09:48:17.3630510Z bar.sync 0, 128; 2026-02-21T09:48:17.3630653Z // begin inline asm 2026-02-21T09:48:17.3630830Z @%p37 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:48:17.3631024Z // end inline asm 2026-02-21T09:48:17.3631300Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3631621Z // begin inline asm 2026-02-21T09:48:17.3632077Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254}, [%r181 + 0], 32; 2026-02-21T09:48:17.3632669Z // end inline asm 2026-02-21T09:48:17.3632838Z // begin inline asm 2026-02-21T09:48:17.3633318Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271}, [%r181 + 16], 32; 2026-02-21T09:48:17.3633847Z // end inline asm 2026-02-21T09:48:17.3633999Z // begin inline asm 2026-02-21T09:48:17.3634176Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:48:17.3634355Z // end inline asm 2026-02-21T09:48:17.3634515Z cvt.u64.u32 %rd108, %r239; 2026-02-21T09:48:17.3634768Z cvt.u64.u32 %rd109, %r240; 2026-02-21T09:48:17.3634950Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:48:17.3635127Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:48:17.3635436Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3635796Z mov.b64 {%r285, %r286}, %rd111; 2026-02-21T09:48:17.3635987Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 
2026-02-21T09:48:17.3636300Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3636612Z cvt.u64.u32 %rd112, %r241; 2026-02-21T09:48:17.3636828Z cvt.u64.u32 %rd113, %r242; 2026-02-21T09:48:17.3637000Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:48:17.3637186Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:48:17.3637475Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3637795Z mov.b64 {%r288, %r289}, %rd115; 2026-02-21T09:48:17.3637989Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:48:17.3638292Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3638608Z cvt.u64.u32 %rd116, %r243; 2026-02-21T09:48:17.3638779Z cvt.u64.u32 %rd117, %r244; 2026-02-21T09:48:17.3638952Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:48:17.3639123Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:48:17.3639421Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3639765Z mov.b64 {%r291, %r292}, %rd119; 2026-02-21T09:48:17.3639952Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:48:17.3640260Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3640568Z cvt.u64.u32 %rd120, %r245; 2026-02-21T09:48:17.3640739Z cvt.u64.u32 %rd121, %r246; 2026-02-21T09:48:17.3640903Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:48:17.3641079Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:48:17.3641362Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3641686Z mov.b64 {%r294, %r295}, %rd123; 2026-02-21T09:48:17.3641875Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:48:17.3642171Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3642481Z cvt.u64.u32 %rd124, %r247; 2026-02-21T09:48:17.3642645Z cvt.u64.u32 %rd125, %r248; 2026-02-21T09:48:17.3642819Z shl.b64 %rd126, %rd125, 32; 
2026-02-21T09:48:17.3642989Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:48:17.3643282Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3643638Z mov.b64 {%r297, %r298}, %rd127; 2026-02-21T09:48:17.3643825Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 2026-02-21T09:48:17.3644140Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3644466Z cvt.u64.u32 %rd128, %r249; 2026-02-21T09:48:17.3644651Z cvt.u64.u32 %rd129, %r250; 2026-02-21T09:48:17.3644849Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:48:17.3645033Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:48:17.3645341Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3645681Z mov.b64 {%r300, %r301}, %rd131; 2026-02-21T09:48:17.3645874Z cvt.rn.f16x2.f32 %r302, %r301, %r300; 2026-02-21T09:48:17.3646183Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3646510Z cvt.u64.u32 %rd132, %r251; 2026-02-21T09:48:17.3646680Z cvt.u64.u32 %rd133, %r252; 2026-02-21T09:48:17.3646856Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:48:17.3647024Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:48:17.3647323Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3647643Z mov.b64 {%r303, %r304}, %rd135; 2026-02-21T09:48:17.3647821Z cvt.rn.f16x2.f32 %r305, %r304, %r303; 2026-02-21T09:48:17.3648132Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3648487Z cvt.u64.u32 %rd136, %r253; 2026-02-21T09:48:17.3648662Z cvt.u64.u32 %rd137, %r254; 2026-02-21T09:48:17.3648825Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:48:17.3649003Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:48:17.3649329Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3649649Z mov.b64 {%r306, %r307}, %rd139; 2026-02-21T09:48:17.3649837Z cvt.rn.f16x2.f32 %r308, 
%r307, %r306; 2026-02-21T09:48:17.3650137Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3650455Z cvt.u64.u32 %rd140, %r256; 2026-02-21T09:48:17.3650623Z cvt.u64.u32 %rd141, %r257; 2026-02-21T09:48:17.3650798Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:48:17.3650967Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:48:17.3651264Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3651581Z mov.b64 {%r309, %r310}, %rd143; 2026-02-21T09:48:17.3651762Z cvt.rn.f16x2.f32 %r311, %r310, %r309; 2026-02-21T09:48:17.3652072Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3652383Z cvt.u64.u32 %rd144, %r258; 2026-02-21T09:48:17.3652609Z cvt.u64.u32 %rd145, %r259; 2026-02-21T09:48:17.3652780Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:48:17.3652956Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:48:17.3653246Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3653575Z mov.b64 {%r312, %r313}, %rd147; 2026-02-21T09:48:17.3653760Z cvt.rn.f16x2.f32 %r314, %r313, %r312; 2026-02-21T09:48:17.3654066Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3654396Z cvt.u64.u32 %rd148, %r260; 2026-02-21T09:48:17.3654561Z cvt.u64.u32 %rd149, %r261; 2026-02-21T09:48:17.3654771Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:48:17.3654945Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:48:17.3655250Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3655585Z mov.b64 {%r315, %r316}, %rd151; 2026-02-21T09:48:17.3655769Z cvt.rn.f16x2.f32 %r317, %r316, %r315; 2026-02-21T09:48:17.3656090Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3656441Z cvt.u64.u32 %rd152, %r262; 2026-02-21T09:48:17.3656615Z cvt.u64.u32 %rd153, %r263; 2026-02-21T09:48:17.3656782Z shl.b64 %rd154, %rd153, 
32; 2026-02-21T09:48:17.3656960Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:48:17.3657252Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3657573Z mov.b64 {%r318, %r319}, %rd155; 2026-02-21T09:48:17.3657762Z cvt.rn.f16x2.f32 %r320, %r319, %r318; 2026-02-21T09:48:17.3658076Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3658400Z cvt.u64.u32 %rd156, %r264; 2026-02-21T09:48:17.3658585Z cvt.u64.u32 %rd157, %r265; 2026-02-21T09:48:17.3658759Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:48:17.3658930Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:48:17.3659232Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3659561Z mov.b64 {%r321, %r322}, %rd159; 2026-02-21T09:48:17.3659742Z cvt.rn.f16x2.f32 %r323, %r322, %r321; 2026-02-21T09:48:17.3660059Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3660369Z cvt.u64.u32 %rd160, %r266; 2026-02-21T09:48:17.3660542Z cvt.u64.u32 %rd161, %r267; 2026-02-21T09:48:17.3660706Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:48:17.3660881Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:48:17.3661204Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3661529Z mov.b64 {%r324, %r325}, %rd163; 2026-02-21T09:48:17.3661718Z cvt.rn.f16x2.f32 %r326, %r325, %r324; 2026-02-21T09:48:17.3662019Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3662361Z cvt.u64.u32 %rd164, %r268; 2026-02-21T09:48:17.3662530Z cvt.u64.u32 %rd165, %r269; 2026-02-21T09:48:17.3662702Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:48:17.3662872Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:48:17.3663168Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3663484Z mov.b64 {%r327, %r328}, %rd167; 2026-02-21T09:48:17.3663663Z cvt.rn.f16x2.f32 %r329, 
%r328, %r327; 2026-02-21T09:48:17.3663968Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3664276Z cvt.u64.u32 %rd168, %r270; 2026-02-21T09:48:17.3664447Z cvt.u64.u32 %rd169, %r271; 2026-02-21T09:48:17.3664611Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:48:17.3664818Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:48:17.3665113Z .loc 1 58 27 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:58:27 2026-02-21T09:48:17.3665478Z mov.b64 {%r330, %r331}, %rd171; 2026-02-21T09:48:17.3665670Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T09:48:17.3665980Z .loc 1 59 45 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:59:45 2026-02-21T09:48:17.3666321Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:17.3666507Z bar.sync 0, 128; 2026-02-21T09:48:17.3666705Z st.shared.v4.b32 [%r24], {%r287, %r290, %r293, %r296}; 2026-02-21T09:48:17.3666962Z st.shared.v4.b32 [%r25], {%r299, %r302, %r305, %r308}; 2026-02-21T09:48:17.3667215Z st.shared.v4.b32 [%r26], {%r311, %r314, %r317, %r320}; 2026-02-21T09:48:17.3667466Z st.shared.v4.b32 [%r27], {%r323, %r326, %r329, %r332}; 2026-02-21T09:48:17.3667681Z // begin inline asm 2026-02-21T09:48:17.3667867Z fence.proxy.async.shared::cta; 2026-02-21T09:48:17.3668045Z // end inline asm 2026-02-21T09:48:17.3668202Z bar.sync 0, 128; 2026-02-21T09:48:17.3668361Z elect.sync %r333|%p117, -1; 2026-02-21T09:48:17.3668555Z and.pred %p115, %p34, %p117; 2026-02-21T09:48:17.3668729Z // begin inline asm 2026-02-21T09:48:17.3669034Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd107, {%r273, %r274}], [%r177]; 2026-02-21T09:48:17.3669413Z // end inline asm 2026-02-21T09:48:17.3669572Z cp.async.bulk.commit_group; 2026-02-21T09:48:17.3669884Z .loc 1 33 84 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:33:84 2026-02-21T09:48:17.3670201Z add.s32 %r341, %r341, 1; 2026-02-21T09:48:17.3670381Z add.s16 %rs14, %rs14, 1; 2026-02-21T09:48:17.3670558Z setp.ne.b32 %p118, %r23, 
%r341; 2026-02-21T09:48:17.3670749Z @%p118 bra $L__BB0_14; 2026-02-21T09:48:17.3670942Z $L__BB0_15: // %._crit_edge 2026-02-21T09:48:17.3671173Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:17.3671368Z bar.sync 0, 128; 2026-02-21T09:48:17.3671645Z .loc 1 33 4 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:33:4 2026-02-21T09:48:17.3671960Z bar.sync 0, 128; 2026-02-21T09:48:17.3672109Z // begin inline asm 2026-02-21T09:48:17.3672337Z @%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r334, 64; 2026-02-21T09:48:17.3672579Z // end inline asm 2026-02-21T09:48:17.3672758Z st.shared.b32 [global_smem+73928], 50529027; 2026-02-21T09:48:17.3672963Z barrier.sync 1; 2026-02-21T09:48:17.3673147Z $L__BB0_16: // %common.ret 2026-02-21T09:48:17.3673476Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3673782Z ret; 2026-02-21T09:48:17.3673969Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:48:17.3674234Z mov.b32 %r31, global_smem; 2026-02-21T09:48:17.3674415Z add.s32 %r32, %r31, %r3; 2026-02-21T09:48:17.3674583Z add.s32 %r63, %r31, 73888; 2026-02-21T09:48:17.3674808Z bfe.u32 %r77, %r31, 4, 14; 2026-02-21T09:48:17.3674979Z cvt.u64.u32 %rd24, %r77; 2026-02-21T09:48:17.3675170Z or.b64 %rd14, %rd24, 4611686293338849280; 2026-02-21T09:48:17.3675408Z add.s32 %r78, %r31, 32768; 2026-02-21T09:48:17.3675576Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T09:48:17.3675753Z cvt.u64.u32 %rd25, %r79; 2026-02-21T09:48:17.3675930Z or.b64 %rd15, %rd25, 4611686293338849280; 2026-02-21T09:48:17.3676135Z add.s32 %r80, %r31, 32; 2026-02-21T09:48:17.3676301Z bfe.u32 %r81, %r80, 4, 14; 2026-02-21T09:48:17.3676471Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.3676673Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3677040Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3677394Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3677591Z barrier.sync 1; 2026-02-21T09:48:17.3677749Z 
barrier.sync 1; 2026-02-21T09:48:17.3677920Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3678152Z $L__BB0_2: // %.preheader 2026-02-21T09:48:17.3678442Z // =>This Loop Header: Depth=1 2026-02-21T09:48:17.3678708Z // Child Loop BB0_9 Depth 2 2026-02-21T09:48:17.3678955Z // Child Loop BB0_6 Depth 2 2026-02-21T09:48:17.3679290Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19 2026-02-21T09:48:17.3679623Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:48:17.3679819Z barrier.sync 1; 2026-02-21T09:48:17.3679986Z ld.shared.b8 %r30, [%r32+73924]; 2026-02-21T09:48:17.3680178Z setp.gt.u32 %p2, %r30, 3; 2026-02-21T09:48:17.3680360Z @%p2 bra $L__BB0_4; 2026-02-21T09:48:17.3680547Z // %bb.3: // %.preheader 2026-02-21T09:48:17.3680800Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3681041Z $L_brx_0: .branchtargets 2026-02-21T09:48:17.3681208Z $L__BB0_5, 2026-02-21T09:48:17.3681359Z $L__BB0_8, 2026-02-21T09:48:17.3681497Z $L__BB0_11, 2026-02-21T09:48:17.3681643Z $L__BB0_16; 2026-02-21T09:48:17.3681790Z brx.idx %r30, $L_brx_0; 2026-02-21T09:48:17.3681983Z $L__BB0_5: // %.peel.next 2026-02-21T09:48:17.3682274Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3682623Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3682965Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3683180Z ld.shared.b32 %r65, [global_smem+73728]; 2026-02-21T09:48:17.3683387Z barrier.sync 1; 2026-02-21T09:48:17.3683649Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3683965Z bar.warp.sync -1; 2026-02-21T09:48:17.3684119Z mov.b32 %r335, 0; 2026-02-21T09:48:17.3684279Z // begin inline asm 2026-02-21T09:48:17.3684429Z 2026-02-21T09:48:17.3684565Z { 2026-02-21T09:48:17.3684769Z .reg .pred complete; 2026-02-21T09:48:17.3684939Z waitLoop: 2026-02-21T09:48:17.3685158Z mbarrier.try_wait.parity.shared.b64 complete, [%r63], %r335; 2026-02-21T09:48:17.3685416Z 
@!complete bra.uni waitLoop; 2026-02-21T09:48:17.3685594Z } 2026-02-21T09:48:17.3685666Z 2026-02-21T09:48:17.3685727Z // end inline asm 2026-02-21T09:48:17.3686014Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3686342Z elect.sync %r76|%p12, -1; 2026-02-21T09:48:17.3686528Z mov.b32 %r66, 68157456; 2026-02-21T09:48:17.3686696Z mov.pred %p11, 0; 2026-02-21T09:48:17.3686862Z // begin inline asm 2026-02-21T09:48:17.3687116Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd14, %rd15, %r66, %p11; 2026-02-21T09:48:17.3687433Z // end inline asm 2026-02-21T09:48:17.3687591Z cvt.u64.u32 %rd26, %r81; 2026-02-21T09:48:17.3687770Z or.b64 %rd16, %rd26, 4611686293338849280; 2026-02-21T09:48:17.3687975Z add.s32 %r82, %r31, 32800; 2026-02-21T09:48:17.3688144Z bfe.u32 %r83, %r82, 4, 14; 2026-02-21T09:48:17.3688347Z cvt.u64.u32 %rd27, %r83; 2026-02-21T09:48:17.3688526Z or.b64 %rd17, %rd27, 4611686293338849280; 2026-02-21T09:48:17.3688728Z mov.pred %p13, -1; 2026-02-21T09:48:17.3688887Z // begin inline asm 2026-02-21T09:48:17.3689122Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd16, %rd17, %r66, %p13; 2026-02-21T09:48:17.3689400Z // end inline asm 2026-02-21T09:48:17.3689551Z add.s32 %r84, %r31, 64; 2026-02-21T09:48:17.3689726Z bfe.u32 %r85, %r84, 4, 14; 2026-02-21T09:48:17.3689893Z cvt.u64.u32 %rd28, %r85; 2026-02-21T09:48:17.3690074Z or.b64 %rd18, %rd28, 4611686293338849280; 2026-02-21T09:48:17.3690266Z add.s32 %r86, %r31, 32832; 2026-02-21T09:48:17.3690442Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:48:17.3690609Z cvt.u64.u32 %rd29, %r87; 2026-02-21T09:48:17.3690788Z or.b64 %rd19, %rd29, 4611686293338849280; 2026-02-21T09:48:17.3690985Z // begin inline asm 2026-02-21T09:48:17.3691220Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd18, %rd19, %r66, %p13; 2026-02-21T09:48:17.3691538Z // end inline asm 2026-02-21T09:48:17.3691693Z add.s32 %r88, %r31, 96; 2026-02-21T09:48:17.3691871Z bfe.u32 %r89, %r88, 
4, 14; 2026-02-21T09:48:17.3692036Z cvt.u64.u32 %rd30, %r89; 2026-02-21T09:48:17.3692215Z or.b64 %rd20, %rd30, 4611686293338849280; 2026-02-21T09:48:17.3692404Z add.s32 %r90, %r31, 32864; 2026-02-21T09:48:17.3692576Z bfe.u32 %r91, %r90, 4, 14; 2026-02-21T09:48:17.3692747Z cvt.u64.u32 %rd31, %r91; 2026-02-21T09:48:17.3692917Z or.b64 %rd21, %rd31, 4611686293338849280; 2026-02-21T09:48:17.3693111Z // begin inline asm 2026-02-21T09:48:17.3693343Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd20, %rd21, %r66, %p13; 2026-02-21T09:48:17.3693618Z // end inline asm 2026-02-21T09:48:17.3693767Z add.s32 %r92, %r31, 73856; 2026-02-21T09:48:17.3693944Z cvt.u64.u32 %rd22, %r92; 2026-02-21T09:48:17.3694104Z // begin inline asm 2026-02-21T09:48:17.3694339Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:48:17.3694599Z // end inline asm 2026-02-21T09:48:17.3694790Z add.s32 %r93, %r31, 73920; 2026-02-21T09:48:17.3694967Z cvt.u64.u32 %rd23, %r93; 2026-02-21T09:48:17.3695164Z // begin inline asm 2026-02-21T09:48:17.3695397Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:17.3695643Z // end inline asm 2026-02-21T09:48:17.3695798Z mov.b32 %r337, 1; 2026-02-21T09:48:17.3695948Z mov.b32 %r336, %r335; 2026-02-21T09:48:17.3696160Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:17.3696430Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:17.3696787Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3697107Z shl.b32 %r104, %r337, 3; 2026-02-21T09:48:17.3697274Z add.s32 %r106, %r31, %r104; 2026-02-21T09:48:17.3697459Z add.s32 %r107, %r106, 73856; 2026-02-21T09:48:17.3697629Z add.s32 %r94, %r106, 73888; 2026-02-21T09:48:17.3697929Z .loc 1 54 31 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:54:31 2026-02-21T09:48:17.3698251Z shl.b32 %r108, %r337, 13; 2026-02-21T09:48:17.3698418Z add.s32 %r109, %r31, %r108; 2026-02-21T09:48:17.3698710Z 
.loc 1 55 44 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:55:44 2026-02-21T09:48:17.3699020Z add.s32 %r110, %r109, 32768; 2026-02-21T09:48:17.3699312Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3699619Z bar.warp.sync -1; 2026-02-21T09:48:17.3699783Z // begin inline asm 2026-02-21T09:48:17.3699978Z 2026-02-21T09:48:17.3700109Z { 2026-02-21T09:48:17.3700251Z .reg .pred complete; 2026-02-21T09:48:17.3700410Z waitLoop: 2026-02-21T09:48:17.3700620Z mbarrier.try_wait.parity.shared.b64 complete, [%r94], %r336; 2026-02-21T09:48:17.3700876Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.3701049Z } 2026-02-21T09:48:17.3701119Z 2026-02-21T09:48:17.3701213Z // end inline asm 2026-02-21T09:48:17.3701490Z .loc 1 56 52 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:56:52 2026-02-21T09:48:17.3701803Z setp.eq.b32 %p31, %r335, 1920; 2026-02-21T09:48:17.3702002Z elect.sync %r111|%p22, -1; 2026-02-21T09:48:17.3702188Z bfe.u32 %r112, %r109, 4, 14; 2026-02-21T09:48:17.3702364Z cvt.u64.u32 %rd42, %r112; 2026-02-21T09:48:17.3702566Z or.b64 %rd32, %rd42, 4611686293338849280; 2026-02-21T09:48:17.3702760Z bfe.u32 %r113, %r110, 4, 14; 2026-02-21T09:48:17.3702937Z cvt.u64.u32 %rd43, %r113; 2026-02-21T09:48:17.3703111Z or.b64 %rd33, %rd43, 4611686293338849280; 2026-02-21T09:48:17.3703308Z // begin inline asm 2026-02-21T09:48:17.3703544Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd32, %rd33, %r66, %p13; 2026-02-21T09:48:17.3703825Z // end inline asm 2026-02-21T09:48:17.3703979Z add.s32 %r114, %r109, 32; 2026-02-21T09:48:17.3704143Z bfe.u32 %r115, %r114, 4, 14; 2026-02-21T09:48:17.3704359Z cvt.u64.u32 %rd44, %r115; 2026-02-21T09:48:17.3704534Z or.b64 %rd34, %rd44, 4611686293338849280; 2026-02-21T09:48:17.3704788Z add.s32 %r116, %r109, 32800; 2026-02-21T09:48:17.3704959Z bfe.u32 %r117, %r116, 4, 14; 2026-02-21T09:48:17.3705132Z cvt.u64.u32 %rd45, %r117; 2026-02-21T09:48:17.3705304Z or.b64 %rd35, %rd45, 
4611686293338849280; 2026-02-21T09:48:17.3705507Z // begin inline asm 2026-02-21T09:48:17.3705753Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd34, %rd35, %r66, %p13; 2026-02-21T09:48:17.3706024Z // end inline asm 2026-02-21T09:48:17.3706183Z add.s32 %r118, %r109, 64; 2026-02-21T09:48:17.3706347Z bfe.u32 %r119, %r118, 4, 14; 2026-02-21T09:48:17.3706529Z cvt.u64.u32 %rd46, %r119; 2026-02-21T09:48:17.3706701Z or.b64 %rd36, %rd46, 4611686293338849280; 2026-02-21T09:48:17.3706898Z add.s32 %r120, %r109, 32832; 2026-02-21T09:48:17.3707063Z bfe.u32 %r121, %r120, 4, 14; 2026-02-21T09:48:17.3707237Z cvt.u64.u32 %rd47, %r121; 2026-02-21T09:48:17.3707422Z or.b64 %rd37, %rd47, 4611686293338849280; 2026-02-21T09:48:17.3707672Z // begin inline asm 2026-02-21T09:48:17.3707970Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd36, %rd37, %r66, %p13; 2026-02-21T09:48:17.3708349Z // end inline asm 2026-02-21T09:48:17.3708529Z add.s32 %r122, %r109, 96; 2026-02-21T09:48:17.3708732Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T09:48:17.3708936Z cvt.u64.u32 %rd48, %r123; 2026-02-21T09:48:17.3709138Z or.b64 %rd38, %rd48, 4611686293338849280; 2026-02-21T09:48:17.3709377Z add.s32 %r124, %r109, 32864; 2026-02-21T09:48:17.3709541Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T09:48:17.3709716Z cvt.u64.u32 %rd49, %r125; 2026-02-21T09:48:17.3709896Z or.b64 %rd39, %rd49, 4611686293338849280; 2026-02-21T09:48:17.3710084Z // begin inline asm 2026-02-21T09:48:17.3710324Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd38, %rd39, %r66, %p13; 2026-02-21T09:48:17.3710588Z // end inline asm 2026-02-21T09:48:17.3710744Z cvt.u64.u32 %rd40, %r107; 2026-02-21T09:48:17.3710907Z // begin inline asm 2026-02-21T09:48:17.3711140Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:48:17.3711401Z // end inline asm 2026-02-21T09:48:17.3711566Z and.pred %p30, %p31, %p22; 2026-02-21T09:48:17.3711744Z // begin inline asm 2026-02-21T09:48:17.3711964Z @%p30 
tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:17.3712225Z // end inline asm 2026-02-21T09:48:17.3712486Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3712802Z add.s32 %r127, %r337, 1; 2026-02-21T09:48:17.3712973Z setp.eq.b32 %p32, %r127, 4; 2026-02-21T09:48:17.3713198Z selp.b32 %r337, 0, %r127, %p32; 2026-02-21T09:48:17.3713392Z selp.b32 %r128, 1, 0, %p32; 2026-02-21T09:48:17.3713565Z xor.b32 %r336, %r336, %r128; 2026-02-21T09:48:17.3713866Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3714178Z add.s32 %r335, %r335, 64; 2026-02-21T09:48:17.3714389Z setp.lt.u32 %p33, %r335, 1984; 2026-02-21T09:48:17.3714566Z @%p33 bra $L__BB0_6; 2026-02-21T09:48:17.3714826Z // %bb.7: // %.loopexit 2026-02-21T09:48:17.3715069Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3715302Z barrier.sync 1; 2026-02-21T09:48:17.3715486Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3715687Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.3715892Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3716242Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3716590Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3716828Z ld.shared.v2.b32 {%r49, %r53}, [global_smem+73736]; 2026-02-21T09:48:17.3717058Z barrier.sync 1; 2026-02-21T09:48:17.3717370Z .loc 1 21 67 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:21:67 2026-02-21T09:48:17.3717703Z mov.u32 %r35, %ctaid.x; 2026-02-21T09:48:17.3717881Z mov.u32 %r36, %ctaid.y; 2026-02-21T09:48:17.3718045Z mov.u32 %r37, %ctaid.z; 2026-02-21T09:48:17.3718219Z mov.u32 %r38, %nctaid.x; 2026-02-21T09:48:17.3718387Z mov.u32 %r39, %nctaid.y; 2026-02-21T09:48:17.3718566Z mad.lo.s32 %r40, %r37, %r39, %r36; 2026-02-21T09:48:17.3718757Z mad.lo.s32 %r41, %r40, %r38, %r35; 2026-02-21T09:48:17.3718950Z mul.lo.s32 %r42, %r41, 384; 
2026-02-21T09:48:17.3719240Z .loc 1 22 68 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:22:68 2026-02-21T09:48:17.3719556Z add.s32 %r43, %r42, 128; 2026-02-21T09:48:17.3719727Z cvt.s64.s32 %rd8, %r43; 2026-02-21T09:48:17.3719891Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:48:17.3720072Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:48:17.3720368Z .loc 1 21 67 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:21:67 2026-02-21T09:48:17.3720689Z cvt.s64.s32 %rd10, %r42; 2026-02-21T09:48:17.3720863Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:48:17.3721050Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:48:17.3721266Z add.s32 %r13, %r1, -128; 2026-02-21T09:48:17.3721436Z mov.b32 %r339, 0; 2026-02-21T09:48:17.3721604Z mov.b32 %r338, -64; 2026-02-21T09:48:17.3721762Z mov.b32 %r340, %r339; 2026-02-21T09:48:17.3721974Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:17.3722238Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:17.3722591Z .loc 1 0 67 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0:67 2026-02-21T09:48:17.3722916Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:48:17.3723102Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:48:17.3723396Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3723707Z add.s32 %r338, %r338, 64; 2026-02-21T09:48:17.3723889Z shl.b32 %r55, %r340, 3; 2026-02-21T09:48:17.3724056Z add.s32 %r57, %r31, %r55; 2026-02-21T09:48:17.3724237Z add.s32 %r44, %r57, 73856; 2026-02-21T09:48:17.3724516Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3724855Z // begin inline asm 2026-02-21T09:48:17.3725005Z 2026-02-21T09:48:17.3725138Z { 2026-02-21T09:48:17.3725270Z .reg .pred complete; 2026-02-21T09:48:17.3725435Z waitLoop: 2026-02-21T09:48:17.3725655Z mbarrier.try_wait.parity.shared.b64 complete, [%r44], %r339; 2026-02-21T09:48:17.3725925Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.3726116Z } 2026-02-21T09:48:17.3726224Z 
2026-02-21T09:48:17.3726285Z // end inline asm 2026-02-21T09:48:17.3726567Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3726887Z add.s32 %r50, %r57, 73888; 2026-02-21T09:48:17.3727181Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3727541Z bar.sync 3, 64; 2026-02-21T09:48:17.3727693Z // begin inline asm 2026-02-21T09:48:17.3727916Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r50], 16384; 2026-02-21T09:48:17.3728152Z // end inline asm 2026-02-21T09:48:17.3728429Z .loc 1 54 31 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:54:31 2026-02-21T09:48:17.3728741Z shl.b32 %r58, %r340, 13; 2026-02-21T09:48:17.3728914Z add.s32 %r47, %r31, %r58; 2026-02-21T09:48:17.3729189Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3729491Z bar.sync 3, 64; 2026-02-21T09:48:17.3729655Z elect.sync %r59|%p7, -1; 2026-02-21T09:48:17.3729831Z and.pred %p4, %p6, %p7; 2026-02-21T09:48:17.3730008Z // begin inline asm 2026-02-21T09:48:17.3730368Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r47], [%rd12, {%r338, %r49}], [%r50]; 2026-02-21T09:48:17.3730809Z // end inline asm 2026-02-21T09:48:17.3731088Z .loc 1 55 44 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:55:44 2026-02-21T09:48:17.3731421Z add.s32 %r51, %r47, 32768; 2026-02-21T09:48:17.3731709Z .loc 1 0 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:0 2026-02-21T09:48:17.3732004Z bar.sync 3, 64; 2026-02-21T09:48:17.3732173Z elect.sync %r60|%p8, -1; 2026-02-21T09:48:17.3732352Z and.pred %p5, %p6, %p8; 2026-02-21T09:48:17.3732529Z // begin inline asm 2026-02-21T09:48:17.3732889Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r51], [%rd13, {%r338, %r53}], [%r50]; 2026-02-21T09:48:17.3733285Z // end inline asm 2026-02-21T09:48:17.3733445Z add.s32 %r61, %r340, 1; 2026-02-21T09:48:17.3733615Z 
setp.eq.b32 %p9, %r61, 4; 2026-02-21T09:48:17.3733802Z selp.b32 %r340, 0, %r61, %p9; 2026-02-21T09:48:17.3733985Z selp.b32 %r62, 1, 0, %p9; 2026-02-21T09:48:17.3734169Z xor.b32 %r339, %r339, %r62; 2026-02-21T09:48:17.3734466Z .loc 1 50 79 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:50:79 2026-02-21T09:48:17.3734918Z setp.lt.u32 %p10, %r338, 1984; 2026-02-21T09:48:17.3735098Z @%p10 bra $L__BB0_9; 2026-02-21T09:48:17.3735311Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3735547Z barrier.sync 1; 2026-02-21T09:48:17.3735720Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.3735934Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.3736139Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.3736494Z .loc 1 19 0 // cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py:19 2026-02-21T09:48:17.3736806Z barrier.sync 1; 2026-02-21T09:48:17.3736972Z barrier.sync 1; 2026-02-21T09:48:17.3737132Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.3737301Z $L__tmp1: 2026-02-21T09:48:17.3737440Z $L__func_end0: 2026-02-21T09:48:17.3737618Z // -- End function 2026-02-21T09:48:17.3737821Z } 2026-02-21T09:48:17.3738110Z .file 1 "/tmp/torchinductor_root/fg/cfg5b2ujuyexhcizrqo2acrwaw56kou26vxuzxz37asl7yl4xikq.py" 2026-02-21T09:48:17.3738493Z .section .debug_abbrev 2026-02-21T09:48:17.3738671Z { 2026-02-21T09:48:17.3738866Z .b8 1 // Abbreviation Code 2026-02-21T09:48:17.3739192Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:48:17.3739476Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:48:17.3739747Z .b8 37 // DW_AT_producer 2026-02-21T09:48:17.3740056Z .b8 8 // DW_FORM_string 2026-02-21T09:48:17.3740331Z .b8 19 // DW_AT_language 2026-02-21T09:48:17.3740558Z .b8 5 // DW_FORM_data2 2026-02-21T09:48:17.3740788Z .b8 3 // DW_AT_name 2026-02-21T09:48:17.3741038Z .b8 8 // DW_FORM_string 2026-02-21T09:48:17.3741274Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:48:17.3741502Z .b8 6 // DW_FORM_data4 2026-02-21T09:48:17.3741731Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:48:17.3741959Z 
.b8 8 // DW_FORM_string 2026-02-21T09:48:17.3742176Z .b8 0 // EOM(1) 2026-02-21T09:48:17.3742392Z .b8 0 // EOM(2) 2026-02-21T09:48:17.3742596Z .b8 0 // EOM(3) 2026-02-21T09:48:17.3742789Z } 2026-02-21T09:48:17.3742924Z .section .debug_info 2026-02-21T09:48:17.3743081Z { 2026-02-21T09:48:17.3743237Z .b32 104 // Length of Unit 2026-02-21T09:48:17.3743490Z .b8 2 // DWARF version number 2026-02-21T09:48:17.3743706Z .b8 0 2026-02-21T09:48:17.3743943Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:48:17.3744236Z .b8 8 // Address Size (in bytes) 2026-02-21T09:48:17.3744499Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:48:17.3744841Z .b8 116 // DW_AT_producer 2026-02-21T09:48:17.3745043Z .b8 114 2026-02-21T09:48:17.3745182Z .b8 105 2026-02-21T09:48:17.3745316Z .b8 116 2026-02-21T09:48:17.3745440Z .b8 111 2026-02-21T09:48:17.3745572Z .b8 110 2026-02-21T09:48:17.3745699Z .b8 0 2026-02-21T09:48:17.3745863Z .b8 2 // DW_AT_language 2026-02-21T09:48:17.3746064Z .b8 0 2026-02-21T09:48:17.3746231Z .b8 99 // DW_AT_name 2026-02-21T09:48:17.3746432Z .b8 102 2026-02-21T09:48:17.3746567Z .b8 103 2026-02-21T09:48:17.3746696Z .b8 53 2026-02-21T09:48:17.3746845Z .b8 98 2026-02-21T09:48:17.3747003Z .b8 50 2026-02-21T09:48:17.3747137Z .b8 117 2026-02-21T09:48:17.3747260Z .b8 106 2026-02-21T09:48:17.3747391Z .b8 117 2026-02-21T09:48:17.3747522Z .b8 121 2026-02-21T09:48:17.3747646Z .b8 101 2026-02-21T09:48:17.3747826Z .b8 120 2026-02-21T09:48:17.3747951Z .b8 104 2026-02-21T09:48:17.3748085Z .b8 99 2026-02-21T09:48:17.3748211Z .b8 105 2026-02-21T09:48:17.3748342Z .b8 122 2026-02-21T09:48:17.3748467Z .b8 114 2026-02-21T09:48:17.3748601Z .b8 113 2026-02-21T09:48:17.3748725Z .b8 111 2026-02-21T09:48:17.3748861Z .b8 50 2026-02-21T09:48:17.3748988Z .b8 97 2026-02-21T09:48:17.3749134Z .b8 99 2026-02-21T09:48:17.3749258Z .b8 114 2026-02-21T09:48:17.3749395Z .b8 119 2026-02-21T09:48:17.3749529Z .b8 97 2026-02-21T09:48:17.3749655Z .b8 119 2026-02-21T09:48:17.3749789Z 
.b8 53 2026-02-21T09:48:17.3749914Z .b8 54 2026-02-21T09:48:17.3750046Z .b8 107 2026-02-21T09:48:17.3750171Z .b8 111 2026-02-21T09:48:17.3750302Z .b8 117 2026-02-21T09:48:17.3750425Z .b8 50 2026-02-21T09:48:17.3750556Z .b8 54 2026-02-21T09:48:17.3750679Z .b8 118 2026-02-21T09:48:17.3750810Z .b8 120 2026-02-21T09:48:17.3750934Z .b8 117 2026-02-21T09:48:17.3751072Z .b8 122 2026-02-21T09:48:17.3751198Z .b8 120 2026-02-21T09:48:17.3751331Z .b8 122 2026-02-21T09:48:17.3751462Z .b8 51 2026-02-21T09:48:17.3751585Z .b8 55 2026-02-21T09:48:17.3751716Z .b8 97 2026-02-21T09:48:17.3751840Z .b8 115 2026-02-21T09:48:17.3751973Z .b8 108 2026-02-21T09:48:17.3752096Z .b8 55 2026-02-21T09:48:17.3752229Z .b8 121 2026-02-21T09:48:17.3752353Z .b8 108 2026-02-21T09:48:17.3752483Z .b8 52 2026-02-21T09:48:17.3752609Z .b8 120 2026-02-21T09:48:17.3752743Z .b8 105 2026-02-21T09:48:17.3752868Z .b8 107 2026-02-21T09:48:17.3753004Z .b8 113 2026-02-21T09:48:17.3753127Z .b8 46 2026-02-21T09:48:17.3753282Z .b8 112 2026-02-21T09:48:17.3753466Z .b8 121 2026-02-21T09:48:17.3753604Z .b8 0 2026-02-21T09:48:17.3753790Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:48:17.3754046Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:48:17.3754269Z .b8 116 2026-02-21T09:48:17.3754394Z .b8 109 2026-02-21T09:48:17.3754529Z .b8 112 2026-02-21T09:48:17.3754654Z .b8 47 2026-02-21T09:48:17.3754891Z .b8 116 2026-02-21T09:48:17.3755018Z .b8 111 2026-02-21T09:48:17.3755152Z .b8 114 2026-02-21T09:48:17.3755276Z .b8 99 2026-02-21T09:48:17.3755410Z .b8 104 2026-02-21T09:48:17.3755534Z .b8 105 2026-02-21T09:48:17.3755671Z .b8 110 2026-02-21T09:48:17.3755805Z .b8 100 2026-02-21T09:48:17.3755929Z .b8 117 2026-02-21T09:48:17.3756058Z .b8 99 2026-02-21T09:48:17.3756182Z .b8 116 2026-02-21T09:48:17.3756313Z .b8 111 2026-02-21T09:48:17.3756437Z .b8 114 2026-02-21T09:48:17.3756572Z .b8 95 2026-02-21T09:48:17.3756697Z .b8 114 2026-02-21T09:48:17.3756826Z .b8 111 2026-02-21T09:48:17.3756950Z .b8 111 2026-02-21T09:48:17.3757018Z .b8 116 
2026-02-21T09:48:17.3757073Z .b8 47 2026-02-21T09:48:17.3757128Z .b8 102 2026-02-21T09:48:17.3757184Z .b8 103 2026-02-21T09:48:17.3757247Z .b8 0 2026-02-21T09:48:17.3757303Z } 2026-02-21T09:48:17.3757375Z .section .debug_macinfo { } 2026-02-21T09:48:17.3757380Z 2026-02-21T09:48:17.3757509Z ================================================================ 2026-02-21T09:48:17.3757627Z please share the reproducer above with Triton project. 2026-02-21T09:48:17.7508939Z [114s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:48:17.7509298Z 2026-02-21T09:48:17.7512908Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:48:17.7514282Z 2026-02-21T09:48:17.7514288Z 2026-02-21T09:48:17.7514405Z ================================================================ 2026-02-21T09:48:17.7514658Z Internal Triton PTX codegen error 2026-02-21T09:48:17.7514989Z `ptxas` stderr: 2026-02-21T09:48:17.7515493Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 303 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:48:17.7516402Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.7516579Z 2026-02-21T09:48:17.7517040Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpv3peb0nw.ptx -o /tmp/tmpv3peb0nw.ptx.o 2026-02-21T09:48:17.7517610Z 2026-02-21T09:48:17.7517614Z 2026-02-21T09:48:17.7517688Z // 2026-02-21T09:48:17.7517916Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:48:17.7518172Z // 2026-02-21T09:48:17.7518283Z 2026-02-21T09:48:17.7518359Z .version 8.7 2026-02-21T09:48:17.7518518Z .target sm_100a 2026-02-21T09:48:17.7518677Z .address_size 64 2026-02-21T09:48:17.7518771Z 2026-02-21T09:48:17.7518908Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:48:17.7519220Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:48:17.7519468Z // @_helion_matmul 2026-02-21T09:48:17.7519693Z .visible .entry _helion_matmul( 2026-02-21T09:48:17.7519938Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:48:17.7520223Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:48:17.7520521Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:48:17.7520802Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:48:17.7521107Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:48:17.7521461Z ) 2026-02-21T09:48:17.7521596Z .reqntid 256 2026-02-21T09:48:17.7521751Z .maxnreg 32 2026-02-21T09:48:17.7521890Z { 2026-02-21T09:48:17.7522039Z .reg .pred %p<136>; 2026-02-21T09:48:17.7522208Z .reg .b16 %rs<11>; 2026-02-21T09:48:17.7522385Z .reg .b32 %r<370>; 2026-02-21T09:48:17.7522541Z .reg .b64 %rd<196>; 2026-02-21T09:48:17.7522944Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19:0 2026-02-21T09:48:17.7523257Z $L__func_begin0: 2026-02-21T09:48:17.7523532Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19:0 
2026-02-21T09:48:17.7523780Z 2026-02-21T09:48:17.7523846Z // %bb.0: 2026-02-21T09:48:17.7524012Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:48:17.7524225Z $L__tmp0: 2026-02-21T09:48:17.7524479Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19 2026-02-21T09:48:17.7524862Z mov.u32 %r1, %tid.x; 2026-02-21T09:48:17.7525025Z shr.u32 %r2, %r1, 5; 2026-02-21T09:48:17.7525204Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:48:17.7525417Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:48:17.7525590Z @%p1 bra $L__BB0_12; 2026-02-21T09:48:17.7525755Z bra.uni $L__BB0_1; 2026-02-21T09:48:17.7525907Z $L__BB0_12: 2026-02-21T09:48:17.7526241Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0:0 2026-02-21T09:48:17.7526590Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:48:17.7526832Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:48:17.7527058Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:48:17.7527390Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19 2026-02-21T09:48:17.7527745Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.7527960Z setp.lt.u32 %p50, %r1, 32; 2026-02-21T09:48:17.7528151Z mov.b32 %r162, global_smem; 2026-02-21T09:48:17.7528327Z // begin inline asm 2026-02-21T09:48:17.7528612Z @%p50 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r162], 32; 2026-02-21T09:48:17.7528893Z // end inline asm 2026-02-21T09:48:17.7529057Z bar.sync 0, 128; 2026-02-21T09:48:17.7529221Z ld.shared.b32 %r362, [global_smem]; 2026-02-21T09:48:17.7529423Z bar.sync 0, 128; 2026-02-21T09:48:17.7529580Z // begin inline asm 2026-02-21T09:48:17.7529813Z @%p50 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:48:17.7530117Z // end inline asm 2026-02-21T09:48:17.7530396Z .loc 1 21 67 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:21:67 2026-02-21T09:48:17.7530735Z mov.u32 %r187, %ctaid.x; 2026-02-21T09:48:17.7530904Z mov.u32 
%r188, %ctaid.y; 2026-02-21T09:48:17.7531076Z mov.u32 %r189, %ctaid.z; 2026-02-21T09:48:17.7531245Z mov.u32 %r190, %nctaid.x; 2026-02-21T09:48:17.7531425Z mov.u32 %r191, %nctaid.y; 2026-02-21T09:48:17.7531608Z mad.lo.s32 %r192, %r189, %r191, %r188; 2026-02-21T09:48:17.7531809Z mad.lo.s32 %r193, %r192, %r190, %r187; 2026-02-21T09:48:17.7532018Z mul.lo.s32 %r194, %r193, 384; 2026-02-21T09:48:17.7532200Z cvt.s64.s32 %rd128, %r194; 2026-02-21T09:48:17.7532388Z add.s64 %rd89, %rd7, %rd128; 2026-02-21T09:48:17.7532565Z shl.b32 %r195, %r1, 2; 2026-02-21T09:48:17.7532740Z add.s32 %r163, %r162, %r195; 2026-02-21T09:48:17.7532908Z mov.b32 %r172, 0; 2026-02-21T09:48:17.7533069Z // begin inline asm 2026-02-21T09:48:17.7533238Z @%p50 st.shared.b32 [ %r163 + 0 ], %r172; 2026-02-21T09:48:17.7533436Z // end inline asm 2026-02-21T09:48:17.7533598Z bar.warp.sync -1; 2026-02-21T09:48:17.7533758Z setp.eq.b32 %p53, %r1, 0; 2026-02-21T09:48:17.7533935Z cvt.u64.u32 %rd74, %r162; 2026-02-21T09:48:17.7534100Z // begin inline asm 2026-02-21T09:48:17.7534382Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd4; 2026-02-21T09:48:17.7534741Z // end inline asm 2026-02-21T09:48:17.7534898Z // begin inline asm 2026-02-21T09:48:17.7535144Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:17.7535466Z // end inline asm 2026-02-21T09:48:17.7535616Z mov.b32 %r165, 64; 2026-02-21T09:48:17.7535763Z // begin inline asm 2026-02-21T09:48:17.7536027Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r165; 2026-02-21T09:48:17.7536328Z // end inline asm 2026-02-21T09:48:17.7536515Z mov.b32 %r166, 256; 2026-02-21T09:48:17.7536676Z // begin inline asm 2026-02-21T09:48:17.7536946Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r166; 2026-02-21T09:48:17.7537252Z // end inline asm 2026-02-21T09:48:17.7537405Z mov.b32 %r167, 2048; 2026-02-21T09:48:17.7537570Z // begin inline 
asm 2026-02-21T09:48:17.7537977Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:48:17.7538247Z `ptxas` stderr: 2026-02-21T09:48:17.7538722Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 303 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:17.7539286Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.7539463Z 2026-02-21T09:48:17.7539937Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpv3peb0nw.ptx -o /tmp/tmpv3peb0nw.ptx.o 2026-02-21T09:48:17.7540446Z 2026-02-21T09:48:17.7540597Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:48:17.7540991Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r167; 2026-02-21T09:48:17.7541314Z // end inline asm 2026-02-21T09:48:17.7541464Z // begin inline asm 2026-02-21T09:48:17.7541744Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r167; 2026-02-21T09:48:17.7542054Z // end inline asm 2026-02-21T09:48:17.7542204Z mov.b64 %rd82, 4096; 2026-02-21T09:48:17.7542369Z // begin inline asm 2026-02-21T09:48:17.7542645Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd82; 2026-02-21T09:48:17.7542966Z // end inline asm 2026-02-21T09:48:17.7543114Z mov.b32 %r169, 1; 2026-02-21T09:48:17.7543273Z // begin inline asm 2026-02-21T09:48:17.7543565Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r169; 2026-02-21T09:48:17.7543883Z // end inline asm 2026-02-21T09:48:17.7544040Z // begin inline asm 2026-02-21T09:48:17.7544350Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r169; 2026-02-21T09:48:17.7544720Z // end inline asm 2026-02-21T09:48:17.7544868Z // begin inline asm 2026-02-21T09:48:17.7545134Z @%p53 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:17.7545434Z // end inline asm 2026-02-21T09:48:17.7545579Z // begin inline asm 2026-02-21T09:48:17.7545861Z @%p53 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7546181Z // end inline asm 2026-02-21T09:48:17.7546337Z // begin inline asm 2026-02-21T09:48:17.7546597Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x3; 2026-02-21T09:48:17.7546901Z // end inline asm 2026-02-21T09:48:17.7547046Z // begin inline asm 2026-02-21T09:48:17.7547307Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7547603Z // end inline asm 2026-02-21T09:48:17.7547748Z // begin inline asm 2026-02-21T09:48:17.7548150Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd89 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:17.7548581Z // end inline asm 2026-02-21T09:48:17.7548733Z // begin inline asm 2026-02-21T09:48:17.7548961Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd89 + 0 ], 0x80; 2026-02-21T09:48:17.7549251Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.7549469Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.7549663Z // end inline asm 2026-02-21T09:48:17.7549859Z bar.sync 0, 128; 2026-02-21T09:48:17.7550135Z .loc 1 22 68 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:22:68 2026-02-21T09:48:17.7550468Z add.s32 %r196, %r194, 128; 2026-02-21T09:48:17.7550647Z cvt.s64.s32 %rd129, %r196; 2026-02-21T09:48:17.7550836Z add.s64 %rd107, %rd7, %rd129; 2026-02-21T09:48:17.7551040Z bar.sync 0, 128; 2026-02-21T09:48:17.7551199Z // begin inline asm 2026-02-21T09:48:17.7551379Z @%p50 st.shared.b32 [ %r163 + 0 ], %r172; 2026-02-21T09:48:17.7551576Z // end inline asm 2026-02-21T09:48:17.7551740Z bar.warp.sync -1; 2026-02-21T09:48:17.7551897Z // begin inline asm 2026-02-21T09:48:17.7552174Z @%p53 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd5; 2026-02-21T09:48:17.7552477Z // end inline asm 2026-02-21T09:48:17.7552628Z // begin inline asm 2026-02-21T09:48:17.7552870Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:17.7553155Z // end inline asm 2026-02-21T09:48:17.7553307Z // begin inline asm 2026-02-21T09:48:17.7553566Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r165; 2026-02-21T09:48:17.7553877Z // end inline asm 2026-02-21T09:48:17.7554022Z mov.b32 %r174, 16; 2026-02-21T09:48:17.7554178Z // begin inline asm 2026-02-21T09:48:17.7554463Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r174; 2026-02-21T09:48:17.7554826Z // end inline asm 2026-02-21T09:48:17.7554972Z // begin inline asm 2026-02-21T09:48:17.7555247Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r167; 2026-02-21T09:48:17.7555557Z // end inline asm 2026-02-21T09:48:17.7555706Z mov.b32 %r176, 12288; 2026-02-21T09:48:17.7555873Z // begin inline asm 2026-02-21T09:48:17.7556136Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r176; 2026-02-21T09:48:17.7556444Z // end inline asm 2026-02-21T09:48:17.7556588Z // begin inline asm 2026-02-21T09:48:17.7556872Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd82; 2026-02-21T09:48:17.7557198Z // end inline asm 2026-02-21T09:48:17.7557344Z // begin inline asm 2026-02-21T09:48:17.7557633Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r169; 2026-02-21T09:48:17.7557947Z // end inline asm 2026-02-21T09:48:17.7558098Z // begin inline asm 2026-02-21T09:48:17.7558417Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r169; 2026-02-21T09:48:17.7558732Z // end inline asm 2026-02-21T09:48:17.7558887Z // begin inline asm 
2026-02-21T09:48:17.7559140Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:17.7559438Z // end inline asm 2026-02-21T09:48:17.7559584Z // begin inline asm 2026-02-21T09:48:17.7559864Z @%p53 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7560234Z // end inline asm 2026-02-21T09:48:17.7560436Z // begin inline asm 2026-02-21T09:48:17.7560786Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x3; 2026-02-21T09:48:17.7561143Z // end inline asm 2026-02-21T09:48:17.7561296Z // begin inline asm 2026-02-21T09:48:17.7561549Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7561838Z // end inline asm 2026-02-21T09:48:17.7561986Z // begin inline asm 2026-02-21T09:48:17.7562384Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd107 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:17.7562814Z // end inline asm 2026-02-21T09:48:17.7562967Z // begin inline asm 2026-02-21T09:48:17.7563212Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd107 + 0 ], 0x80; 2026-02-21T09:48:17.7563499Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.7563723Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.7563963Z // end inline asm 2026-02-21T09:48:17.7564121Z bar.sync 0, 128; 2026-02-21T09:48:17.7564400Z .loc 1 24 73 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:24:73 2026-02-21T09:48:17.7564794Z add.s32 %r197, %r194, 256; 2026-02-21T09:48:17.7564974Z cvt.s64.s32 %rd130, %r197; 2026-02-21T09:48:17.7565162Z add.s64 %rd125, %rd7, %rd130; 2026-02-21T09:48:17.7565408Z bar.sync 0, 128; 2026-02-21T09:48:17.7565558Z // begin inline asm 2026-02-21T09:48:17.7565736Z @%p50 st.shared.b32 [ %r163 + 0 ], %r172; 2026-02-21T09:48:17.7565928Z // end inline asm 2026-02-21T09:48:17.7566085Z bar.warp.sync -1; 2026-02-21T09:48:17.7566237Z // begin inline 
asm 2026-02-21T09:48:17.7566510Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd6; 2026-02-21T09:48:17.7566816Z // end inline asm 2026-02-21T09:48:17.7566972Z // begin inline asm 2026-02-21T09:48:17.7567227Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:17.7567509Z // end inline asm 2026-02-21T09:48:17.7567665Z // begin inline asm 2026-02-21T09:48:17.7567924Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r174; 2026-02-21T09:48:17.7568228Z // end inline asm 2026-02-21T09:48:17.7568371Z // begin inline asm 2026-02-21T09:48:17.7568674Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r166; 2026-02-21T09:48:17.7568968Z // end inline asm 2026-02-21T09:48:17.7569113Z // begin inline asm 2026-02-21T09:48:17.7569386Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r176; 2026-02-21T09:48:17.7569684Z // end inline asm 2026-02-21T09:48:17.7569836Z // begin inline asm 2026-02-21T09:48:17.7570095Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r167; 2026-02-21T09:48:17.7570396Z // end inline asm 2026-02-21T09:48:17.7570544Z mov.b64 %rd118, 24576; 2026-02-21T09:48:17.7570713Z // begin inline asm 2026-02-21T09:48:17.7570998Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd118; 2026-02-21T09:48:17.7571309Z // end inline asm 2026-02-21T09:48:17.7571464Z // begin inline asm 2026-02-21T09:48:17.7571744Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r169; 2026-02-21T09:48:17.7572062Z // end inline asm 2026-02-21T09:48:17.7572207Z // begin inline asm 2026-02-21T09:48:17.7572487Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r169; 2026-02-21T09:48:17.7572847Z // end inline asm 2026-02-21T09:48:17.7572994Z // begin inline asm 
2026-02-21T09:48:17.7573255Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:17.7573542Z // end inline asm 2026-02-21T09:48:17.7573693Z // begin inline asm 2026-02-21T09:48:17.7573962Z @%p53 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7574284Z // end inline asm 2026-02-21T09:48:17.7574440Z // begin inline asm 2026-02-21T09:48:17.7574734Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:17.7575041Z // end inline asm 2026-02-21T09:48:17.7575187Z // begin inline asm 2026-02-21T09:48:17.7575453Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:17.7575749Z // end inline asm 2026-02-21T09:48:17.7575914Z // begin inline asm 2026-02-21T09:48:17.7576324Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd125 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:17.7576754Z // end inline asm 2026-02-21T09:48:17.7576909Z // begin inline asm 2026-02-21T09:48:17.7577142Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd125 + 0 ], 0x80; 2026-02-21T09:48:17.7577431Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:17.7577641Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:17.7577844Z // end inline asm 2026-02-21T09:48:17.7578029Z bar.sync 0, 128; 2026-02-21T09:48:17.7578198Z cvta.global.u64 %rd131, %rd125; 2026-02-21T09:48:17.7578510Z .loc 1 31 35 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:31:35 2026-02-21T09:48:17.7578828Z mul.lo.s32 %r369, %r187, 3; 2026-02-21T09:48:17.7579163Z .loc 1 32 37 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:32:37 2026-02-21T09:48:17.7579481Z add.s32 %r198, %r369, 3; 2026-02-21T09:48:17.7579780Z .loc 1 32 49 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:32:49 2026-02-21T09:48:17.7580101Z min.s32 %r22, %r198, 6144; 2026-02-21T09:48:17.7580403Z .loc 1 
33 84 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:33:84 2026-02-21T09:48:17.7580738Z setp.ge.s32 %p106, %r369, %r22; 2026-02-21T09:48:17.7580925Z @%p106 bra $L__BB0_15; 2026-02-21T09:48:17.7581117Z // %bb.13: // %.lr.ph 2026-02-21T09:48:17.7581447Z .loc 1 0 84 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0:84 2026-02-21T09:48:17.7581773Z shl.b32 %r199, %r1, 5; 2026-02-21T09:48:17.7581939Z and.b32 %r200, %r199, 3936; 2026-02-21T09:48:17.7582123Z bfe.s32 %r201, %r1, 2, 1; 2026-02-21T09:48:17.7582290Z and.b32 %r202, %r201, 144; 2026-02-21T09:48:17.7582503Z or.b32 %r203, %r202, %r200; 2026-02-21T09:48:17.7582685Z add.s32 %r205, %r162, 131072; 2026-02-21T09:48:17.7582859Z add.s32 %r23, %r205, %r203; 2026-02-21T09:48:17.7583035Z xor.b32 %r206, %r203, 16; 2026-02-21T09:48:17.7583198Z add.s32 %r24, %r205, %r206; 2026-02-21T09:48:17.7583422Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:48:17.7583785Z .loc 1 39 35 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:39:35 2026-02-21T09:48:17.7584115Z shr.s32 %r302, %r369, 31; 2026-02-21T09:48:17.7584280Z shr.u32 %r303, %r302, 27; 2026-02-21T09:48:17.7584454Z add.s32 %r304, %r369, %r303; 2026-02-21T09:48:17.7584790Z .loc 1 42 64 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:42:64 2026-02-21T09:48:17.7585102Z and.b32 %r305, %r304, 65504; 2026-02-21T09:48:17.7585312Z sub.s32 %r306, %r369, %r305; 2026-02-21T09:48:17.7585561Z cvt.u16.u32 %rs1, %r306; 2026-02-21T09:48:17.7585786Z cvt.s8.s32 %rs2, %r306; 2026-02-21T09:48:17.7586147Z .loc 1 43 51 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:43:51 2026-02-21T09:48:17.7586618Z shr.u16 %rs3, %rs2, 13; 2026-02-21T09:48:17.7586844Z and.b16 %rs4, %rs3, 3; 2026-02-21T09:48:17.7587053Z add.s16 %rs5, %rs1, %rs4; 2026-02-21T09:48:17.7587251Z cvt.s16.s8 %rs6, %rs5; 2026-02-21T09:48:17.7587411Z shr.s16 %rs7, %rs6, 2; 2026-02-21T09:48:17.7587696Z .loc 1 42 64 // 
crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:42:64 2026-02-21T09:48:17.7588004Z and.b16 %rs8, %rs5, 252; 2026-02-21T09:48:17.7588177Z sub.s16 %rs9, %rs1, %rs8; 2026-02-21T09:48:17.7588463Z .loc 1 44 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:44:27 2026-02-21T09:48:17.7588785Z shl.b32 %r307, %r304, 1; 2026-02-21T09:48:17.7588978Z and.b32 %r308, %r307, -64; 2026-02-21T09:48:17.7589149Z cvt.s16.s8 %rs10, %rs9; 2026-02-21T09:48:17.7589332Z mad.wide.s16 %r299, %rs10, 16, %r308; 2026-02-21T09:48:17.7589640Z .loc 1 45 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:45:27 2026-02-21T09:48:17.7589963Z mul.wide.s16 %r300, %rs7, 256; 2026-02-21T09:48:17.7590256Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7590590Z shfl.sync.idx.b32 %r309, %r2, 0, 31, -1; 2026-02-21T09:48:17.7590790Z shl.b32 %r310, %r309, 21; 2026-02-21T09:48:17.7590965Z and.b32 %r311, %r310, 6291456; 2026-02-21T09:48:17.7591148Z add.s32 %r207, %r311, %r362; 2026-02-21T09:48:17.7591320Z mov.pred %p107, -1; 2026-02-21T09:48:17.7591532Z mov.b32 %r208, 0; 2026-02-21T09:48:17.7591682Z // begin inline asm 2026-02-21T09:48:17.7592109Z @%p107 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 0], {%r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208}; 2026-02-21T09:48:17.7592547Z // end inline asm 2026-02-21T09:48:17.7592707Z // begin inline asm 2026-02-21T09:48:17.7593160Z @%p107 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 16], {%r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208}; 2026-02-21T09:48:17.7593615Z // end inline asm 2026-02-21T09:48:17.7593772Z // begin inline asm 2026-02-21T09:48:17.7593940Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:48:17.7594126Z // end inline asm 2026-02-21T09:48:17.7594271Z bar.sync 0, 128; 2026-02-21T09:48:17.7594560Z .loc 1 50 57 // 
crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7594916Z add.s32 %r241, %r162, 147584; 2026-02-21T09:48:17.7595097Z // begin inline asm 2026-02-21T09:48:17.7595289Z @%p53 mbarrier.init.shared::cta.b64 [%r241], 1; 2026-02-21T09:48:17.7595499Z // end inline asm 2026-02-21T09:48:17.7595656Z bar.sync 0, 128; 2026-02-21T09:48:17.7595812Z add.s32 %r242, %r162, 147592; 2026-02-21T09:48:17.7595993Z // begin inline asm 2026-02-21T09:48:17.7596206Z @%p53 mbarrier.init.shared::cta.b64 [%r242], 1; 2026-02-21T09:48:17.7596422Z // end inline asm 2026-02-21T09:48:17.7596567Z bar.sync 0, 128; 2026-02-21T09:48:17.7596723Z add.s32 %r243, %r162, 147600; 2026-02-21T09:48:17.7596896Z // begin inline asm 2026-02-21T09:48:17.7597075Z @%p53 mbarrier.init.shared::cta.b64 [%r243], 1; 2026-02-21T09:48:17.7597285Z // end inline asm 2026-02-21T09:48:17.7597429Z bar.sync 0, 128; 2026-02-21T09:48:17.7597584Z add.s32 %r244, %r162, 147608; 2026-02-21T09:48:17.7597752Z // begin inline asm 2026-02-21T09:48:17.7597937Z @%p53 mbarrier.init.shared::cta.b64 [%r244], 1; 2026-02-21T09:48:17.7598138Z // end inline asm 2026-02-21T09:48:17.7598294Z add.s32 %r245, %r162, 147616; 2026-02-21T09:48:17.7598460Z // begin inline asm 2026-02-21T09:48:17.7598644Z @%p53 mbarrier.init.shared::cta.b64 [%r245], 1; 2026-02-21T09:48:17.7598849Z // end inline asm 2026-02-21T09:48:17.7598991Z bar.sync 0, 128; 2026-02-21T09:48:17.7599152Z add.s32 %r246, %r162, 147624; 2026-02-21T09:48:17.7599324Z // begin inline asm 2026-02-21T09:48:17.7599512Z @%p53 mbarrier.init.shared::cta.b64 [%r246], 1; 2026-02-21T09:48:17.7599759Z // end inline asm 2026-02-21T09:48:17.7599909Z bar.sync 0, 128; 2026-02-21T09:48:17.7600056Z add.s32 %r247, %r162, 147632; 2026-02-21T09:48:17.7600229Z // begin inline asm 2026-02-21T09:48:17.7600410Z @%p53 mbarrier.init.shared::cta.b64 [%r247], 1; 2026-02-21T09:48:17.7600608Z // end inline asm 2026-02-21T09:48:17.7600756Z bar.sync 0, 128; 2026-02-21T09:48:17.7600904Z add.s32 
%r248, %r162, 147640; 2026-02-21T09:48:17.7601077Z // begin inline asm 2026-02-21T09:48:17.7601252Z @%p53 mbarrier.init.shared::cta.b64 [%r248], 1; 2026-02-21T09:48:17.7601458Z // end inline asm 2026-02-21T09:48:17.7601725Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7602045Z bar.sync 0, 128; 2026-02-21T09:48:17.7602191Z // begin inline asm 2026-02-21T09:48:17.7602386Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r241]; 2026-02-21T09:48:17.7602611Z // end inline asm 2026-02-21T09:48:17.7602754Z bar.sync 0, 128; 2026-02-21T09:48:17.7602908Z // begin inline asm 2026-02-21T09:48:17.7603087Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r242]; 2026-02-21T09:48:17.7603305Z // end inline asm 2026-02-21T09:48:17.7603446Z bar.sync 0, 128; 2026-02-21T09:48:17.7603599Z // begin inline asm 2026-02-21T09:48:17.7603777Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r243]; 2026-02-21T09:48:17.7603989Z // end inline asm 2026-02-21T09:48:17.7604135Z bar.sync 0, 128; 2026-02-21T09:48:17.7604279Z // begin inline asm 2026-02-21T09:48:17.7604465Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r244]; 2026-02-21T09:48:17.7604758Z // end inline asm 2026-02-21T09:48:17.7605035Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7605356Z bar.sync 0, 128; 2026-02-21T09:48:17.7605513Z add.s32 %r253, %r162, 147648; 2026-02-21T09:48:17.7605685Z // begin inline asm 2026-02-21T09:48:17.7605931Z @%p53 mbarrier.init.shared::cta.b64 [%r253], 1; 2026-02-21T09:48:17.7606212Z // end inline asm 2026-02-21T09:48:17.7606437Z st.shared.b32 [global_smem+147656], 33554689; 2026-02-21T09:48:17.7606777Z st.shared.b32 [global_smem+147456], %r362; 2026-02-21T09:48:17.7607111Z st.shared.v2.b32 [global_smem+147464], {%r300, %r299}; 2026-02-21T09:48:17.7607344Z barrier.sync 1; 2026-02-21T09:48:17.7607523Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.7607727Z barrier.sync 1; 2026-02-21T09:48:17.7607876Z barrier.sync 1; 
2026-02-21T09:48:17.7608055Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:17.7608384Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7608692Z bar.sync 0, 128; 2026-02-21T09:48:17.7608848Z // begin inline asm 2026-02-21T09:48:17.7608998Z 2026-02-21T09:48:17.7609135Z { 2026-02-21T09:48:17.7609272Z .reg .pred complete; 2026-02-21T09:48:17.7609487Z waitLoop: 2026-02-21T09:48:17.7609706Z mbarrier.try_wait.parity.shared.b64 complete, [%r253], %r208; 2026-02-21T09:48:17.7609988Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.7610158Z } 2026-02-21T09:48:17.7610238Z 2026-02-21T09:48:17.7610300Z // end inline asm 2026-02-21T09:48:17.7610579Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7610899Z bar.sync 0, 128; 2026-02-21T09:48:17.7611052Z // begin inline asm 2026-02-21T09:48:17.7611235Z @%p53 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T09:48:17.7611449Z // end inline asm 2026-02-21T09:48:17.7611595Z // begin inline asm 2026-02-21T09:48:17.7611787Z @%p53 mbarrier.inval.shared::cta.b64 [%r245]; 2026-02-21T09:48:17.7611991Z // end inline asm 2026-02-21T09:48:17.7612145Z bar.sync 0, 128; 2026-02-21T09:48:17.7612297Z // begin inline asm 2026-02-21T09:48:17.7612474Z @%p53 mbarrier.inval.shared::cta.b64 [%r246]; 2026-02-21T09:48:17.7612686Z // end inline asm 2026-02-21T09:48:17.7612832Z bar.sync 0, 128; 2026-02-21T09:48:17.7612984Z // begin inline asm 2026-02-21T09:48:17.7613193Z @%p53 mbarrier.inval.shared::cta.b64 [%r247]; 2026-02-21T09:48:17.7613402Z // end inline asm 2026-02-21T09:48:17.7613545Z bar.sync 0, 128; 2026-02-21T09:48:17.7613696Z // begin inline asm 2026-02-21T09:48:17.7613876Z @%p53 mbarrier.inval.shared::cta.b64 [%r248]; 2026-02-21T09:48:17.7614076Z // end inline asm 2026-02-21T09:48:17.7614227Z // begin inline asm 2026-02-21T09:48:17.7614398Z @%p53 mbarrier.inval.shared::cta.b64 [%r241]; 2026-02-21T09:48:17.7614602Z // end inline asm 
2026-02-21T09:48:17.7614775Z bar.sync 0, 128; 2026-02-21T09:48:17.7614927Z // begin inline asm 2026-02-21T09:48:17.7615100Z @%p53 mbarrier.inval.shared::cta.b64 [%r242]; 2026-02-21T09:48:17.7615308Z // end inline asm 2026-02-21T09:48:17.7615450Z bar.sync 0, 128; 2026-02-21T09:48:17.7615602Z // begin inline asm 2026-02-21T09:48:17.7615781Z @%p53 mbarrier.inval.shared::cta.b64 [%r243]; 2026-02-21T09:48:17.7615983Z // end inline asm 2026-02-21T09:48:17.7616132Z bar.sync 0, 128; 2026-02-21T09:48:17.7616276Z // begin inline asm 2026-02-21T09:48:17.7616461Z @%p53 mbarrier.inval.shared::cta.b64 [%r244]; 2026-02-21T09:48:17.7616659Z // end inline asm 2026-02-21T09:48:17.7616937Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7617251Z // begin inline asm 2026-02-21T09:48:17.7617653Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r265, %r266, %r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280}, [%r207 + 0]; 2026-02-21T09:48:17.7618148Z // end inline asm 2026-02-21T09:48:17.7618293Z // begin inline asm 2026-02-21T09:48:17.7618689Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297}, [%r207 + 16]; 2026-02-21T09:48:17.7619122Z // end inline asm 2026-02-21T09:48:17.7619278Z // begin inline asm 2026-02-21T09:48:17.7619477Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:48:17.7619675Z // end inline asm 2026-02-21T09:48:17.7619841Z cvt.u64.u32 %rd132, %r265; 2026-02-21T09:48:17.7620020Z cvt.u64.u32 %rd133, %r266; 2026-02-21T09:48:17.7620207Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:48:17.7620387Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:48:17.7620705Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7621028Z mov.b64 {%r313, %r314}, %rd135; 2026-02-21T09:48:17.7621229Z cvt.rn.f16x2.f32 %r315, %r314, %r313; 2026-02-21T09:48:17.7621550Z .loc 1 56 
52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7621885Z cvt.u64.u32 %rd136, %r267; 2026-02-21T09:48:17.7622072Z cvt.u64.u32 %rd137, %r268; 2026-02-21T09:48:17.7622243Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:48:17.7622428Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:48:17.7622775Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7623104Z mov.b64 {%r316, %r317}, %rd139; 2026-02-21T09:48:17.7623293Z cvt.rn.f16x2.f32 %r318, %r317, %r316; 2026-02-21T09:48:17.7623607Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7623930Z cvt.u64.u32 %rd140, %r269; 2026-02-21T09:48:17.7624097Z cvt.u64.u32 %rd141, %r270; 2026-02-21T09:48:17.7624274Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:48:17.7624445Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:48:17.7624798Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7625115Z mov.b64 {%r319, %r320}, %rd143; 2026-02-21T09:48:17.7625309Z cvt.rn.f16x2.f32 %r321, %r320, %r319; 2026-02-21T09:48:17.7625613Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7625933Z cvt.u64.u32 %rd144, %r271; 2026-02-21T09:48:17.7626110Z cvt.u64.u32 %rd145, %r272; 2026-02-21T09:48:17.7626277Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:48:17.7626509Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:48:17.7626805Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7627124Z mov.b64 {%r322, %r323}, %rd147; 2026-02-21T09:48:17.7627305Z cvt.rn.f16x2.f32 %r324, %r323, %r322; 2026-02-21T09:48:17.7627615Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7627930Z cvt.u64.u32 %rd148, %r273; 2026-02-21T09:48:17.7628099Z cvt.u64.u32 %rd149, %r274; 2026-02-21T09:48:17.7628270Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:48:17.7628441Z or.b64 %rd151, 
%rd148, %rd150; 2026-02-21T09:48:17.7628738Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7629055Z mov.b64 {%r325, %r326}, %rd151; 2026-02-21T09:48:17.7629244Z cvt.rn.f16x2.f32 %r327, %r326, %r325; 2026-02-21T09:48:17.7629552Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7629871Z cvt.u64.u32 %rd152, %r275; 2026-02-21T09:48:17.7630046Z cvt.u64.u32 %rd153, %r276; 2026-02-21T09:48:17.7630214Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:48:17.7630391Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:48:17.7630682Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7631002Z mov.b64 {%r328, %r329}, %rd155; 2026-02-21T09:48:17.7631229Z cvt.rn.f16x2.f32 %r330, %r329, %r328; 2026-02-21T09:48:17.7631540Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7631856Z cvt.u64.u32 %rd156, %r277; 2026-02-21T09:48:17.7632026Z cvt.u64.u32 %rd157, %r278; 2026-02-21T09:48:17.7632206Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:48:17.7632422Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:48:17.7632723Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7633033Z mov.b64 {%r331, %r332}, %rd159; 2026-02-21T09:48:17.7633111Z cvt.rn.f16x2.f32 %r333, %r332, %r331; 2026-02-21T09:48:17.7633295Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7633360Z cvt.u64.u32 %rd160, %r279; 2026-02-21T09:48:17.7633432Z cvt.u64.u32 %rd161, %r280; 2026-02-21T09:48:17.7633497Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:48:17.7633561Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:48:17.7633743Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7633817Z mov.b64 {%r334, %r335}, %rd163; 2026-02-21T09:48:17.7633886Z cvt.rn.f16x2.f32 %r336, %r335, %r334; 2026-02-21T09:48:17.7634098Z .loc 1 
56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7634175Z cvt.u64.u32 %rd164, %r282; 2026-02-21T09:48:17.7634239Z cvt.u64.u32 %rd165, %r283; 2026-02-21T09:48:17.7634303Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:48:17.7634375Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:48:17.7634559Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7634625Z mov.b64 {%r337, %r338}, %rd167; 2026-02-21T09:48:17.7634746Z cvt.rn.f16x2.f32 %r339, %r338, %r337; 2026-02-21T09:48:17.7634938Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7635004Z cvt.u64.u32 %rd168, %r284; 2026-02-21T09:48:17.7635068Z cvt.u64.u32 %rd169, %r285; 2026-02-21T09:48:17.7635138Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:48:17.7635201Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:48:17.7635387Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7635458Z mov.b64 {%r340, %r341}, %rd171; 2026-02-21T09:48:17.7635575Z cvt.rn.f16x2.f32 %r342, %r341, %r340; 2026-02-21T09:48:17.7635756Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7635820Z cvt.u64.u32 %rd172, %r286; 2026-02-21T09:48:17.7635891Z cvt.u64.u32 %rd173, %r287; 2026-02-21T09:48:17.7635955Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:48:17.7636018Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:48:17.7636208Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7636275Z mov.b64 {%r343, %r344}, %rd175; 2026-02-21T09:48:17.7636345Z cvt.rn.f16x2.f32 %r345, %r344, %r343; 2026-02-21T09:48:17.7636536Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7636600Z cvt.u64.u32 %rd176, %r288; 2026-02-21T09:48:17.7636664Z cvt.u64.u32 %rd177, %r289; 2026-02-21T09:48:17.7636729Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:48:17.7636803Z or.b64 %rd179, 
%rd176, %rd178; 2026-02-21T09:48:17.7636987Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7637048Z mov.b64 {%r346, %r347}, %rd179; 2026-02-21T09:48:17.7637126Z cvt.rn.f16x2.f32 %r348, %r347, %r346; 2026-02-21T09:48:17.7637303Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7637366Z cvt.u64.u32 %rd180, %r290; 2026-02-21T09:48:17.7637435Z cvt.u64.u32 %rd181, %r291; 2026-02-21T09:48:17.7637540Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:48:17.7637606Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:48:17.7637790Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7637863Z mov.b64 {%r349, %r350}, %rd183; 2026-02-21T09:48:17.7637933Z cvt.rn.f16x2.f32 %r351, %r350, %r349; 2026-02-21T09:48:17.7638151Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7638226Z cvt.u64.u32 %rd184, %r292; 2026-02-21T09:48:17.7638289Z cvt.u64.u32 %rd185, %r293; 2026-02-21T09:48:17.7638351Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:48:17.7638421Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:48:17.7638604Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7638668Z mov.b64 {%r352, %r353}, %rd187; 2026-02-21T09:48:17.7638735Z cvt.rn.f16x2.f32 %r354, %r353, %r352; 2026-02-21T09:48:17.7638928Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7638991Z cvt.u64.u32 %rd188, %r294; 2026-02-21T09:48:17.7639052Z cvt.u64.u32 %rd189, %r295; 2026-02-21T09:48:17.7639122Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:48:17.7639283Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:48:17.7639468Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7639541Z mov.b64 {%r355, %r356}, %rd191; 2026-02-21T09:48:17.7639610Z cvt.rn.f16x2.f32 %r357, %r356, %r355; 2026-02-21T09:48:17.7639790Z .loc 1 
56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7639853Z cvt.u64.u32 %rd192, %r296; 2026-02-21T09:48:17.7639925Z cvt.u64.u32 %rd193, %r297; 2026-02-21T09:48:17.7639988Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:48:17.7640053Z or.b64 %rd195, %rd192, %rd194; 2026-02-21T09:48:17.7640242Z .loc 1 58 27 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:58:27 2026-02-21T09:48:17.7640307Z mov.b64 {%r358, %r359}, %rd195; 2026-02-21T09:48:17.7640375Z cvt.rn.f16x2.f32 %r360, %r359, %r358; 2026-02-21T09:48:17.7640565Z .loc 1 59 45 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:59:45 2026-02-21T09:48:17.7640648Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:17.7640711Z bar.sync 0, 128; 2026-02-21T09:48:17.7640842Z st.shared.v4.b32 [%r23], {%r315, %r318, %r321, %r324}; 2026-02-21T09:48:17.7640966Z st.shared.v4.b32 [%r23+4096], {%r339, %r342, %r345, %r348}; 2026-02-21T09:48:17.7641066Z st.shared.v4.b32 [%r24], {%r327, %r330, %r333, %r336}; 2026-02-21T09:48:17.7641169Z st.shared.v4.b32 [%r24+4096], {%r351, %r354, %r357, %r360}; 2026-02-21T09:48:17.7641244Z // begin inline asm 2026-02-21T09:48:17.7641327Z fence.proxy.async.shared::cta; 2026-02-21T09:48:17.7641388Z // end inline asm 2026-02-21T09:48:17.7641456Z bar.sync 0, 128; 2026-02-21T09:48:17.7641531Z elect.sync %r361|%p133, -1; 2026-02-21T09:48:17.7641604Z and.pred %p131, %p50, %p133; 2026-02-21T09:48:17.7641669Z // begin inline asm 2026-02-21T09:48:17.7641887Z @%p131 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd131, {%r299, %r300}], [%r205]; 2026-02-21T09:48:17.7641952Z // end inline asm 2026-02-21T09:48:17.7642032Z cp.async.bulk.commit_group; 2026-02-21T09:48:17.7642230Z .loc 1 33 84 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:33:84 2026-02-21T09:48:17.7642299Z add.s32 %r369, %r369, 1; 2026-02-21T09:48:17.7642370Z setp.ne.b32 %p134, %r22, %r369; 2026-02-21T09:48:17.7642440Z @%p134 bra $L__BB0_14; 2026-02-21T09:48:17.7642542Z 
$L__BB0_15: // %._crit_edge 2026-02-21T09:48:17.7642620Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:17.7642681Z bar.sync 0, 128; 2026-02-21T09:48:17.7642875Z .loc 1 33 4 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:33:4 2026-02-21T09:48:17.7642965Z bar.sync 0, 128; 2026-02-21T09:48:17.7643029Z // begin inline asm 2026-02-21T09:48:17.7643166Z @%p50 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r362, 32; 2026-02-21T09:48:17.7643225Z // end inline asm 2026-02-21T09:48:17.7643317Z st.shared.b32 [global_smem+147656], 50529027; 2026-02-21T09:48:17.7643380Z barrier.sync 1; 2026-02-21T09:48:17.7643502Z $L__BB0_16: // %common.ret 2026-02-21T09:48:17.7643688Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7643745Z ret; 2026-02-21T09:48:17.7643859Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:48:17.7643927Z mov.b32 %r28, global_smem; 2026-02-21T09:48:17.7643994Z add.s32 %r29, %r28, %r3; 2026-02-21T09:48:17.7644067Z add.s32 %r62, %r28, 147616; 2026-02-21T09:48:17.7644134Z bfe.u32 %r84, %r28, 4, 14; 2026-02-21T09:48:17.7644200Z cvt.u64.u32 %rd32, %r84; 2026-02-21T09:48:17.7644278Z or.b64 %rd14, %rd32, 4611686293439512576; 2026-02-21T09:48:17.7644350Z add.s32 %r85, %r28, 139264; 2026-02-21T09:48:17.7644413Z bfe.u32 %r86, %r85, 4, 14; 2026-02-21T09:48:17.7644477Z cvt.u64.u32 %rd33, %r86; 2026-02-21T09:48:17.7644557Z or.b64 %rd23, %rd33, 4611686293313683456; 2026-02-21T09:48:17.7644645Z add.s32 %r87, %r28, 32; 2026-02-21T09:48:17.7644795Z bfe.u32 %r88, %r87, 4, 14; 2026-02-21T09:48:17.7644859Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.7644984Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7645181Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7645269Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.7645339Z barrier.sync 1; 2026-02-21T09:48:17.7645401Z barrier.sync 1; 2026-02-21T09:48:17.7645486Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T09:48:17.7645583Z $L__BB0_2: // %.preheader 2026-02-21T09:48:17.7645686Z // =>This Loop Header: Depth=1 2026-02-21T09:48:17.7645782Z // Child Loop BB0_9 Depth 2 2026-02-21T09:48:17.7645874Z // Child Loop BB0_6 Depth 2 2026-02-21T09:48:17.7646065Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19 2026-02-21T09:48:17.7646150Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:48:17.7646284Z barrier.sync 1; 2026-02-21T09:48:17.7646370Z ld.shared.b8 %r27, [%r29+147652]; 2026-02-21T09:48:17.7646441Z setp.gt.u32 %p2, %r27, 3; 2026-02-21T09:48:17.7646506Z @%p2 bra $L__BB0_4; 2026-02-21T09:48:17.7646605Z // %bb.3: // %.preheader 2026-02-21T09:48:17.7646702Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7646771Z $L_brx_0: .branchtargets 2026-02-21T09:48:17.7646831Z $L__BB0_5, 2026-02-21T09:48:17.7646902Z $L__BB0_8, 2026-02-21T09:48:17.7646961Z $L__BB0_11, 2026-02-21T09:48:17.7647018Z $L__BB0_16; 2026-02-21T09:48:17.7647092Z brx.idx %r27, $L_brx_0; 2026-02-21T09:48:17.7647176Z $L__BB0_5: // %.peel.next 2026-02-21T09:48:17.7647270Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7647457Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7647548Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.7647633Z ld.shared.b32 %r64, [global_smem+147456]; 2026-02-21T09:48:17.7647695Z barrier.sync 1; 2026-02-21T09:48:17.7647879Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7647948Z bar.warp.sync -1; 2026-02-21T09:48:17.7648009Z mov.b32 %r363, 0; 2026-02-21T09:48:17.7648081Z // begin inline asm 2026-02-21T09:48:17.7648138Z 2026-02-21T09:48:17.7648193Z { 2026-02-21T09:48:17.7648315Z .reg .pred complete; 2026-02-21T09:48:17.7648386Z waitLoop: 2026-02-21T09:48:17.7648515Z mbarrier.try_wait.parity.shared.b64 complete, [%r62], %r363; 2026-02-21T09:48:17.7648589Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.7648651Z } 2026-02-21T09:48:17.7648655Z 
2026-02-21T09:48:17.7648719Z // end inline asm 2026-02-21T09:48:17.7648953Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7649036Z elect.sync %r83|%p12, -1; 2026-02-21T09:48:17.7649101Z mov.b32 %r65, 134479888; 2026-02-21T09:48:17.7649166Z mov.pred %p11, 0; 2026-02-21T09:48:17.7649228Z // begin inline asm 2026-02-21T09:48:17.7649394Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd14, %rd23, %r65, %p11; 2026-02-21T09:48:17.7649458Z // end inline asm 2026-02-21T09:48:17.7649524Z cvt.u64.u32 %rd34, %r88; 2026-02-21T09:48:17.7649605Z or.b64 %rd16, %rd34, 4611686293439512576; 2026-02-21T09:48:17.7649670Z add.s32 %r89, %r28, 139296; 2026-02-21T09:48:17.7649738Z bfe.u32 %r90, %r89, 4, 14; 2026-02-21T09:48:17.7649805Z cvt.u64.u32 %rd35, %r90; 2026-02-21T09:48:17.7649886Z or.b64 %rd17, %rd35, 4611686293313683456; 2026-02-21T09:48:17.7649954Z mov.pred %p13, -1; 2026-02-21T09:48:17.7650016Z // begin inline asm 2026-02-21T09:48:17.7650216Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd16, %rd17, %r65, %p13; 2026-02-21T09:48:17.7650279Z // end inline asm 2026-02-21T09:48:17.7650343Z add.s32 %r91, %r28, 64; 2026-02-21T09:48:17.7650416Z bfe.u32 %r92, %r91, 4, 14; 2026-02-21T09:48:17.7650480Z cvt.u64.u32 %rd36, %r92; 2026-02-21T09:48:17.7650551Z or.b64 %rd18, %rd36, 4611686293439512576; 2026-02-21T09:48:17.7650615Z add.s32 %r93, %r28, 139328; 2026-02-21T09:48:17.7650686Z bfe.u32 %r94, %r93, 4, 14; 2026-02-21T09:48:17.7650752Z cvt.u64.u32 %rd37, %r94; 2026-02-21T09:48:17.7650823Z or.b64 %rd19, %rd37, 4611686293313683456; 2026-02-21T09:48:17.7650898Z // begin inline asm 2026-02-21T09:48:17.7651042Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd18, %rd19, %r65, %p13; 2026-02-21T09:48:17.7651106Z // end inline asm 2026-02-21T09:48:17.7651169Z add.s32 %r95, %r28, 96; 2026-02-21T09:48:17.7651243Z bfe.u32 %r96, %r95, 4, 14; 2026-02-21T09:48:17.7651306Z cvt.u64.u32 %rd38, %r96; 
2026-02-21T09:48:17.7651376Z or.b64 %rd20, %rd38, 4611686293439512576; 2026-02-21T09:48:17.7651452Z add.s32 %r97, %r28, 139360; 2026-02-21T09:48:17.7651515Z bfe.u32 %r98, %r97, 4, 14; 2026-02-21T09:48:17.7651604Z cvt.u64.u32 %rd39, %r98; 2026-02-21T09:48:17.7651674Z or.b64 %rd21, %rd39, 4611686293313683456; 2026-02-21T09:48:17.7651745Z // begin inline asm 2026-02-21T09:48:17.7651885Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd20, %rd21, %r65, %p13; 2026-02-21T09:48:17.7651945Z // end inline asm 2026-02-21T09:48:17.7652016Z add.s32 %r99, %r28, 16384; 2026-02-21T09:48:17.7652080Z bfe.u32 %r100, %r99, 4, 14; 2026-02-21T09:48:17.7652144Z cvt.u64.u32 %rd40, %r100; 2026-02-21T09:48:17.7652222Z or.b64 %rd22, %rd40, 4611686293439512576; 2026-02-21T09:48:17.7652286Z // begin inline asm 2026-02-21T09:48:17.7652431Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd22, %rd23, %r65, %p11; 2026-02-21T09:48:17.7652491Z // end inline asm 2026-02-21T09:48:17.7652560Z add.s32 %r101, %r28, 16416; 2026-02-21T09:48:17.7652627Z bfe.u32 %r102, %r101, 4, 14; 2026-02-21T09:48:17.7652694Z cvt.u64.u32 %rd41, %r102; 2026-02-21T09:48:17.7652772Z or.b64 %rd24, %rd41, 4611686293439512576; 2026-02-21T09:48:17.7652835Z // begin inline asm 2026-02-21T09:48:17.7652978Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd24, %rd17, %r65, %p13; 2026-02-21T09:48:17.7653046Z // end inline asm 2026-02-21T09:48:17.7653110Z add.s32 %r103, %r28, 16448; 2026-02-21T09:48:17.7653176Z bfe.u32 %r104, %r103, 4, 14; 2026-02-21T09:48:17.7653240Z cvt.u64.u32 %rd42, %r104; 2026-02-21T09:48:17.7653319Z or.b64 %rd26, %rd42, 4611686293439512576; 2026-02-21T09:48:17.7653381Z // begin inline asm 2026-02-21T09:48:17.7653553Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd26, %rd19, %r65, %p13; 2026-02-21T09:48:17.7653621Z // end inline asm 2026-02-21T09:48:17.7653683Z add.s32 %r105, %r28, 16480; 2026-02-21T09:48:17.7653746Z bfe.u32 %r106, %r105, 4, 14; 
2026-02-21T09:48:17.7653809Z cvt.u64.u32 %rd43, %r106; 2026-02-21T09:48:17.7653889Z or.b64 %rd28, %rd43, 4611686293439512576; 2026-02-21T09:48:17.7653975Z // begin inline asm 2026-02-21T09:48:17.7654117Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd28, %rd21, %r65, %p13; 2026-02-21T09:48:17.7654187Z // end inline asm 2026-02-21T09:48:17.7654251Z add.s32 %r107, %r28, 147584; 2026-02-21T09:48:17.7654314Z cvt.u64.u32 %rd30, %r107; 2026-02-21T09:48:17.7654382Z // begin inline asm 2026-02-21T09:48:17.7654520Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd30]; 2026-02-21T09:48:17.7654580Z // end inline asm 2026-02-21T09:48:17.7654643Z add.s32 %r108, %r28, 147648; 2026-02-21T09:48:17.7654748Z cvt.u64.u32 %rd31, %r108; 2026-02-21T09:48:17.7654814Z // begin inline asm 2026-02-21T09:48:17.7654953Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd31]; 2026-02-21T09:48:17.7655021Z // end inline asm 2026-02-21T09:48:17.7655080Z mov.b32 %r365, 1; 2026-02-21T09:48:17.7655141Z mov.b32 %r364, %r363; 2026-02-21T09:48:17.7655297Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:17.7655401Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:17.7655593Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7655655Z shl.b32 %r127, %r365, 3; 2026-02-21T09:48:17.7655726Z add.s32 %r129, %r28, %r127; 2026-02-21T09:48:17.7655792Z add.s32 %r130, %r129, 147584; 2026-02-21T09:48:17.7655857Z add.s32 %r109, %r129, 147616; 2026-02-21T09:48:17.7656047Z .loc 1 54 31 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:54:31 2026-02-21T09:48:17.7656111Z shl.b32 %r131, %r365, 15; 2026-02-21T09:48:17.7656172Z add.s32 %r132, %r28, %r131; 2026-02-21T09:48:17.7656359Z .loc 1 55 44 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:55:44 2026-02-21T09:48:17.7656424Z shl.b32 %r133, %r365, 11; 2026-02-21T09:48:17.7656488Z add.s32 %r134, %r28, %r133; 2026-02-21T09:48:17.7656557Z add.s32 %r135, 
%r134, 139264; 2026-02-21T09:48:17.7656749Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7656863Z bar.warp.sync -1; 2026-02-21T09:48:17.7656926Z // begin inline asm 2026-02-21T09:48:17.7656989Z 2026-02-21T09:48:17.7657045Z { 2026-02-21T09:48:17.7657111Z .reg .pred complete; 2026-02-21T09:48:17.7657170Z waitLoop: 2026-02-21T09:48:17.7657312Z mbarrier.try_wait.parity.shared.b64 complete, [%r109], %r364; 2026-02-21T09:48:17.7657383Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.7657437Z } 2026-02-21T09:48:17.7657441Z 2026-02-21T09:48:17.7657513Z // end inline asm 2026-02-21T09:48:17.7657697Z .loc 1 56 52 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:56:52 2026-02-21T09:48:17.7657768Z setp.eq.b32 %p47, %r363, 1920; 2026-02-21T09:48:17.7657844Z elect.sync %r136|%p30, -1; 2026-02-21T09:48:17.7657906Z bfe.u32 %r137, %r132, 4, 14; 2026-02-21T09:48:17.7657975Z cvt.u64.u32 %rd62, %r137; 2026-02-21T09:48:17.7658046Z or.b64 %rd44, %rd62, 4611686293439512576; 2026-02-21T09:48:17.7658118Z bfe.u32 %r138, %r135, 4, 14; 2026-02-21T09:48:17.7658181Z cvt.u64.u32 %rd63, %r138; 2026-02-21T09:48:17.7658254Z or.b64 %rd45, %rd63, 4611686293313683456; 2026-02-21T09:48:17.7658324Z // begin inline asm 2026-02-21T09:48:17.7658469Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd44, %rd45, %r65, %p13; 2026-02-21T09:48:17.7658530Z // end inline asm 2026-02-21T09:48:17.7658599Z add.s32 %r139, %r132, 32; 2026-02-21T09:48:17.7658663Z bfe.u32 %r140, %r139, 4, 14; 2026-02-21T09:48:17.7658725Z cvt.u64.u32 %rd64, %r140; 2026-02-21T09:48:17.7658827Z or.b64 %rd46, %rd64, 4611686293439512576; 2026-02-21T09:48:17.7658903Z add.s32 %r141, %r134, 139296; 2026-02-21T09:48:17.7658965Z bfe.u32 %r142, %r141, 4, 14; 2026-02-21T09:48:17.7659028Z cvt.u64.u32 %rd65, %r142; 2026-02-21T09:48:17.7659107Z or.b64 %rd47, %rd65, 4611686293313683456; 2026-02-21T09:48:17.7659201Z // begin inline asm 2026-02-21T09:48:17.7659345Z @%p30 
tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd46, %rd47, %r65, %p13; 2026-02-21T09:48:17.7659409Z // end inline asm 2026-02-21T09:48:17.7659483Z add.s32 %r143, %r132, 64; 2026-02-21T09:48:17.7659549Z bfe.u32 %r144, %r143, 4, 14; 2026-02-21T09:48:17.7659613Z cvt.u64.u32 %rd66, %r144; 2026-02-21T09:48:17.7659692Z or.b64 %rd48, %rd66, 4611686293439512576; 2026-02-21T09:48:17.7659755Z add.s32 %r145, %r134, 139328; 2026-02-21T09:48:17.7659817Z bfe.u32 %r146, %r145, 4, 14; 2026-02-21T09:48:17.7659887Z cvt.u64.u32 %rd67, %r146; 2026-02-21T09:48:17.7659957Z or.b64 %rd49, %rd67, 4611686293313683456; 2026-02-21T09:48:17.7660019Z // begin inline asm 2026-02-21T09:48:17.7660160Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd48, %rd49, %r65, %p13; 2026-02-21T09:48:17.7660231Z // end inline asm 2026-02-21T09:48:17.7660292Z add.s32 %r147, %r132, 96; 2026-02-21T09:48:17.7660354Z bfe.u32 %r148, %r147, 4, 14; 2026-02-21T09:48:17.7660449Z cvt.u64.u32 %rd68, %r148; 2026-02-21T09:48:17.7660522Z or.b64 %rd50, %rd68, 4611686293439512576; 2026-02-21T09:48:17.7660587Z add.s32 %r149, %r134, 139360; 2026-02-21T09:48:17.7660651Z bfe.u32 %r150, %r149, 4, 14; 2026-02-21T09:48:17.7660723Z cvt.u64.u32 %rd69, %r150; 2026-02-21T09:48:17.7660794Z or.b64 %rd51, %rd69, 4611686293313683456; 2026-02-21T09:48:17.7660855Z // begin inline asm 2026-02-21T09:48:17.7661002Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 0 ], %rd50, %rd51, %r65, %p13; 2026-02-21T09:48:17.7661062Z // end inline asm 2026-02-21T09:48:17.7661124Z add.s32 %r151, %r132, 16384; 2026-02-21T09:48:17.7661195Z bfe.u32 %r152, %r151, 4, 14; 2026-02-21T09:48:17.7661256Z cvt.u64.u32 %rd70, %r152; 2026-02-21T09:48:17.7661325Z or.b64 %rd52, %rd70, 4611686293439512576; 2026-02-21T09:48:17.7661387Z // begin inline asm 2026-02-21T09:48:17.7661536Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd52, %rd45, %r65, %p13; 2026-02-21T09:48:17.7661598Z // end inline asm 2026-02-21T09:48:17.7661663Z add.s32 %r153, %r132, 
16416; 2026-02-21T09:48:17.7661733Z bfe.u32 %r154, %r153, 4, 14; 2026-02-21T09:48:17.7661824Z cvt.u64.u32 %rd71, %r154; 2026-02-21T09:48:17.7661894Z or.b64 %rd54, %rd71, 4611686293439512576; 2026-02-21T09:48:17.7661956Z // begin inline asm 2026-02-21T09:48:17.7662104Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd54, %rd47, %r65, %p13; 2026-02-21T09:48:17.7662165Z // end inline asm 2026-02-21T09:48:17.7662227Z add.s32 %r155, %r132, 16448; 2026-02-21T09:48:17.7662297Z bfe.u32 %r156, %r155, 4, 14; 2026-02-21T09:48:17.7662360Z cvt.u64.u32 %rd72, %r156; 2026-02-21T09:48:17.7662431Z or.b64 %rd56, %rd72, 4611686293439512576; 2026-02-21T09:48:17.7662502Z // begin inline asm 2026-02-21T09:48:17.7662640Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd56, %rd49, %r65, %p13; 2026-02-21T09:48:17.7662700Z // end inline asm 2026-02-21T09:48:17.7662763Z add.s32 %r157, %r132, 16480; 2026-02-21T09:48:17.7662836Z bfe.u32 %r158, %r157, 4, 14; 2026-02-21T09:48:17.7662898Z cvt.u64.u32 %rd73, %r158; 2026-02-21T09:48:17.7662971Z or.b64 %rd58, %rd73, 4611686293439512576; 2026-02-21T09:48:17.7663040Z // begin inline asm 2026-02-21T09:48:17.7663178Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r64 + 16 ], %rd58, %rd51, %r65, %p13; 2026-02-21T09:48:17.7663238Z // end inline asm 2026-02-21T09:48:17.7663308Z cvt.u64.u32 %rd60, %r130; 2026-02-21T09:48:17.7663370Z // begin inline asm 2026-02-21T09:48:17.7663505Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd60]; 2026-02-21T09:48:17.7663564Z // end inline asm 2026-02-21T09:48:17.7663641Z and.pred %p46, %p47, %p30; 2026-02-21T09:48:17.7663732Z // begin inline asm 2026-02-21T09:48:17.7663868Z @%p46 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd31]; 2026-02-21T09:48:17.7663936Z // end inline asm 2026-02-21T09:48:17.7664116Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7664208Z add.s32 %r160, %r365, 1; 2026-02-21T09:48:17.7664285Z setp.eq.b32 %p48, 
%r160, 4; 2026-02-21T09:48:17.7664355Z selp.b32 %r365, 0, %r160, %p48; 2026-02-21T09:48:17.7664420Z selp.b32 %r161, 1, 0, %p48; 2026-02-21T09:48:17.7664484Z xor.b32 %r364, %r364, %r161; 2026-02-21T09:48:17.7664730Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7664794Z add.s32 %r363, %r363, 64; 2026-02-21T09:48:17.7664864Z setp.lt.u32 %p49, %r363, 1984; 2026-02-21T09:48:17.7664936Z @%p49 bra $L__BB0_6; 2026-02-21T09:48:17.7665025Z // %bb.7: // %.loopexit 2026-02-21T09:48:17.7665124Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7665188Z barrier.sync 1; 2026-02-21T09:48:17.7665279Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.7665343Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.7665505Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7665702Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7665789Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.7665894Z ld.shared.v2.b32 {%r46, %r50}, [global_smem+147464]; 2026-02-21T09:48:17.7665967Z barrier.sync 1; 2026-02-21T09:48:17.7666151Z .loc 1 21 67 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:21:67 2026-02-21T09:48:17.7666218Z mov.u32 %r32, %ctaid.x; 2026-02-21T09:48:17.7666282Z mov.u32 %r33, %ctaid.y; 2026-02-21T09:48:17.7666358Z mov.u32 %r34, %ctaid.z; 2026-02-21T09:48:17.7666425Z mov.u32 %r35, %nctaid.x; 2026-02-21T09:48:17.7666489Z mov.u32 %r36, %nctaid.y; 2026-02-21T09:48:17.7666568Z mad.lo.s32 %r37, %r34, %r36, %r33; 2026-02-21T09:48:17.7666638Z mad.lo.s32 %r38, %r37, %r35, %r32; 2026-02-21T09:48:17.7666704Z mul.lo.s32 %r39, %r38, 384; 2026-02-21T09:48:17.7666894Z .loc 1 22 68 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:22:68 2026-02-21T09:48:17.7666958Z add.s32 %r40, %r39, 128; 2026-02-21T09:48:17.7667054Z cvt.s64.s32 %rd8, %r40; 2026-02-21T09:48:17.7667120Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:48:17.7667199Z cvta.global.u64 %rd13, %rd9; 
2026-02-21T09:48:17.7667383Z .loc 1 21 67 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:21:67 2026-02-21T09:48:17.7667449Z cvt.s64.s32 %rd10, %r39; 2026-02-21T09:48:17.7667524Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:48:17.7667596Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:48:17.7667662Z add.s32 %r13, %r1, -128; 2026-02-21T09:48:17.7667725Z mov.b32 %r367, 0; 2026-02-21T09:48:17.7667807Z mov.b32 %r366, -64; 2026-02-21T09:48:17.7667870Z mov.b32 %r368, %r367; 2026-02-21T09:48:17.7667978Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:17.7668090Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:17.7668279Z .loc 1 0 67 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0:67 2026-02-21T09:48:17.7668350Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:48:17.7668424Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:48:17.7668607Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7668670Z add.s32 %r366, %r366, 64; 2026-02-21T09:48:17.7668732Z shl.b32 %r52, %r368, 3; 2026-02-21T09:48:17.7668802Z add.s32 %r54, %r28, %r52; 2026-02-21T09:48:17.7668866Z add.s32 %r41, %r54, 147584; 2026-02-21T09:48:17.7669045Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7669151Z // begin inline asm 2026-02-21T09:48:17.7669207Z 2026-02-21T09:48:17.7669263Z { 2026-02-21T09:48:17.7669338Z .reg .pred complete; 2026-02-21T09:48:17.7669398Z waitLoop: 2026-02-21T09:48:17.7669525Z mbarrier.try_wait.parity.shared.b64 complete, [%r41], %r367; 2026-02-21T09:48:17.7669635Z @!complete bra.uni waitLoop; 2026-02-21T09:48:17.7669701Z } 2026-02-21T09:48:17.7669705Z 2026-02-21T09:48:17.7669768Z // end inline asm 2026-02-21T09:48:17.7669945Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7670016Z add.s32 %r47, %r54, 147616; 2026-02-21T09:48:17.7670194Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 
2026-02-21T09:48:17.7670257Z bar.sync 3, 64; 2026-02-21T09:48:17.7670329Z // begin inline asm 2026-02-21T09:48:17.7670449Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r47], 34816; 2026-02-21T09:48:17.7670512Z // end inline asm 2026-02-21T09:48:17.7670692Z .loc 1 54 31 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:54:31 2026-02-21T09:48:17.7670767Z shl.b32 %r55, %r368, 15; 2026-02-21T09:48:17.7670830Z add.s32 %r44, %r28, %r55; 2026-02-21T09:48:17.7671037Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7671111Z bar.sync 3, 64; 2026-02-21T09:48:17.7671182Z elect.sync %r56|%p7, -1; 2026-02-21T09:48:17.7671252Z and.pred %p4, %p6, %p7; 2026-02-21T09:48:17.7671314Z // begin inline asm 2026-02-21T09:48:17.7671588Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r44], [%rd12, {%r366, %r46}], [%r47]; 2026-02-21T09:48:17.7671649Z // end inline asm 2026-02-21T09:48:17.7671828Z .loc 1 55 44 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:55:44 2026-02-21T09:48:17.7671901Z shl.b32 %r57, %r368, 11; 2026-02-21T09:48:17.7671966Z add.s32 %r58, %r28, %r57; 2026-02-21T09:48:17.7672029Z add.s32 %r48, %r58, 139264; 2026-02-21T09:48:17.7672211Z .loc 1 0 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:0 2026-02-21T09:48:17.7672272Z bar.sync 3, 64; 2026-02-21T09:48:17.7672340Z elect.sync %r59|%p8, -1; 2026-02-21T09:48:17.7672416Z and.pred %p5, %p6, %p8; 2026-02-21T09:48:17.7672477Z // begin inline asm 2026-02-21T09:48:17.7672736Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r48], [%rd13, {%r366, %r50}], [%r47]; 2026-02-21T09:48:17.7672829Z // end inline asm 2026-02-21T09:48:17.7672899Z add.s32 %r60, %r368, 1; 2026-02-21T09:48:17.7672965Z setp.eq.b32 %p9, %r60, 4; 2026-02-21T09:48:17.7673034Z selp.b32 %r368, 0, %r60, %p9; 2026-02-21T09:48:17.7673107Z selp.b32 %r61, 1, 0, %p9; 2026-02-21T09:48:17.7673171Z xor.b32 %r367, %r367, %r61; 
2026-02-21T09:48:17.7673351Z .loc 1 50 57 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:50:57 2026-02-21T09:48:17.7673429Z setp.lt.u32 %p10, %r366, 1984; 2026-02-21T09:48:17.7673492Z @%p10 bra $L__BB0_9; 2026-02-21T09:48:17.7673598Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7673660Z barrier.sync 1; 2026-02-21T09:48:17.7673753Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:17.7673817Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.7673923Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:17.7674113Z .loc 1 19 0 // crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py:19 2026-02-21T09:48:17.7674175Z barrier.sync 1; 2026-02-21T09:48:17.7674235Z barrier.sync 1; 2026-02-21T09:48:17.7674296Z bra.uni $L__BB0_2; 2026-02-21T09:48:17.7674361Z $L__tmp1: 2026-02-21T09:48:17.7674421Z $L__func_end0: 2026-02-21T09:48:17.7674510Z // -- End function 2026-02-21T09:48:17.7674574Z } 2026-02-21T09:48:17.7674875Z .file 1 "/tmp/torchinductor_root/rh/crhjjjiqvxj5o2bk73uswazcc2j2ojfwtol3rmh22wnybxnefccb.py" 2026-02-21T09:48:17.7674943Z .section .debug_abbrev 2026-02-21T09:48:17.7675006Z { 2026-02-21T09:48:17.7675103Z .b8 1 // Abbreviation Code 2026-02-21T09:48:17.7675199Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:48:17.7675324Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:48:17.7675424Z .b8 37 // DW_AT_producer 2026-02-21T09:48:17.7675506Z .b8 8 // DW_FORM_string 2026-02-21T09:48:17.7675587Z .b8 19 // DW_AT_language 2026-02-21T09:48:17.7675680Z .b8 5 // DW_FORM_data2 2026-02-21T09:48:17.7675761Z .b8 3 // DW_AT_name 2026-02-21T09:48:17.7675841Z .b8 8 // DW_FORM_string 2026-02-21T09:48:17.7675933Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:48:17.7676017Z .b8 6 // DW_FORM_data4 2026-02-21T09:48:17.7676099Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:48:17.7676178Z .b8 8 // DW_FORM_string 2026-02-21T09:48:17.7676298Z .b8 0 // EOM(1) 2026-02-21T09:48:17.7676381Z .b8 0 // EOM(2) 2026-02-21T09:48:17.7676456Z .b8 0 // EOM(3) 2026-02-21T09:48:17.7676522Z } 
2026-02-21T09:48:17.7676590Z .section .debug_info 2026-02-21T09:48:17.7676645Z { 2026-02-21T09:48:17.7676747Z .b32 104 // Length of Unit 2026-02-21T09:48:17.7676845Z .b8 2 // DWARF version number 2026-02-21T09:48:17.7676904Z .b8 0 2026-02-21T09:48:17.7677034Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:48:17.7677144Z .b8 8 // Address Size (in bytes) 2026-02-21T09:48:17.7677258Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:48:17.7677348Z .b8 116 // DW_AT_producer 2026-02-21T09:48:17.7677416Z .b8 114 2026-02-21T09:48:17.7677475Z .b8 105 2026-02-21T09:48:17.7677530Z .b8 116 2026-02-21T09:48:17.7677586Z .b8 111 2026-02-21T09:48:17.7677652Z .b8 110 2026-02-21T09:48:17.7677708Z .b8 0 2026-02-21T09:48:17.7677834Z .b8 2 // DW_AT_language 2026-02-21T09:48:17.7677898Z .b8 0 2026-02-21T09:48:17.7677980Z .b8 99 // DW_AT_name 2026-02-21T09:48:17.7678036Z .b8 114 2026-02-21T09:48:17.7678092Z .b8 104 2026-02-21T09:48:17.7678156Z .b8 106 2026-02-21T09:48:17.7678211Z .b8 106 2026-02-21T09:48:17.7678266Z .b8 106 2026-02-21T09:48:17.7678329Z .b8 105 2026-02-21T09:48:17.7678384Z .b8 113 2026-02-21T09:48:17.7678439Z .b8 118 2026-02-21T09:48:17.7678493Z .b8 120 2026-02-21T09:48:17.7678557Z .b8 106 2026-02-21T09:48:17.7678616Z .b8 53 2026-02-21T09:48:17.7678670Z .b8 111 2026-02-21T09:48:17.7678731Z .b8 50 2026-02-21T09:48:17.7678785Z .b8 98 2026-02-21T09:48:17.7678841Z .b8 107 2026-02-21T09:48:17.7678896Z .b8 55 2026-02-21T09:48:17.7678960Z .b8 51 2026-02-21T09:48:17.7679017Z .b8 117 2026-02-21T09:48:17.7679075Z .b8 115 2026-02-21T09:48:17.7679135Z .b8 119 2026-02-21T09:48:17.7679201Z .b8 97 2026-02-21T09:48:17.7679259Z .b8 122 2026-02-21T09:48:17.7679314Z .b8 99 2026-02-21T09:48:17.7679380Z .b8 99 2026-02-21T09:48:17.7679434Z .b8 50 2026-02-21T09:48:17.7679490Z .b8 106 2026-02-21T09:48:17.7679544Z .b8 50 2026-02-21T09:48:17.7679609Z .b8 111 2026-02-21T09:48:17.7679664Z .b8 106 2026-02-21T09:48:17.7679720Z .b8 102 2026-02-21T09:48:17.7679783Z .b8 119 
2026-02-21T09:48:17.7679837Z .b8 116 2026-02-21T09:48:17.7679893Z .b8 111 2026-02-21T09:48:17.7679948Z .b8 108 2026-02-21T09:48:17.7680012Z .b8 51 2026-02-21T09:48:17.7680066Z .b8 114 2026-02-21T09:48:17.7680121Z .b8 109 2026-02-21T09:48:17.7680224Z .b8 104 2026-02-21T09:48:17.7680279Z .b8 50 2026-02-21T09:48:17.7680333Z .b8 50 2026-02-21T09:48:17.7680387Z .b8 119 2026-02-21T09:48:17.7680449Z .b8 110 2026-02-21T09:48:17.7680503Z .b8 121 2026-02-21T09:48:17.7680557Z .b8 98 2026-02-21T09:48:17.7680610Z .b8 120 2026-02-21T09:48:17.7680674Z .b8 110 2026-02-21T09:48:17.7680730Z .b8 101 2026-02-21T09:48:17.7680785Z .b8 102 2026-02-21T09:48:17.7680873Z .b8 99 2026-02-21T09:48:17.7680929Z .b8 99 2026-02-21T09:48:17.7680985Z .b8 98 2026-02-21T09:48:17.7681041Z .b8 46 2026-02-21T09:48:17.7681102Z .b8 112 2026-02-21T09:48:17.7681155Z .b8 121 2026-02-21T09:48:17.7681211Z .b8 0 2026-02-21T09:48:17.7681319Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:48:17.7681401Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:48:17.7681456Z .b8 116 2026-02-21T09:48:17.7681510Z .b8 109 2026-02-21T09:48:17.7681572Z .b8 112 2026-02-21T09:48:17.7681627Z .b8 47 2026-02-21T09:48:17.7681681Z .b8 116 2026-02-21T09:48:17.7681743Z .b8 111 2026-02-21T09:48:17.7681801Z .b8 114 2026-02-21T09:48:17.7681857Z .b8 99 2026-02-21T09:48:17.7681913Z .b8 104 2026-02-21T09:48:17.7681976Z .b8 105 2026-02-21T09:48:17.7682032Z .b8 110 2026-02-21T09:48:17.7682089Z .b8 100 2026-02-21T09:48:17.7682143Z .b8 117 2026-02-21T09:48:17.7682206Z .b8 99 2026-02-21T09:48:17.7682261Z .b8 116 2026-02-21T09:48:17.7682340Z .b8 111 2026-02-21T09:48:17.7682406Z .b8 114 2026-02-21T09:48:17.7682461Z .b8 95 2026-02-21T09:48:17.7682517Z .b8 114 2026-02-21T09:48:17.7682575Z .b8 111 2026-02-21T09:48:17.7682639Z .b8 111 2026-02-21T09:48:17.7682696Z .b8 116 2026-02-21T09:48:17.7682752Z .b8 47 2026-02-21T09:48:17.7682815Z .b8 114 2026-02-21T09:48:17.7682871Z .b8 104 2026-02-21T09:48:17.7682927Z .b8 0 2026-02-21T09:48:17.7682982Z } 
2026-02-21T09:48:17.7683064Z .section .debug_macinfo { } 2026-02-21T09:48:17.7683069Z 2026-02-21T09:48:17.7683157Z ================================================================ 2026-02-21T09:48:17.7683272Z please share the reproducer above with Triton project. 2026-02-21T09:48:17.9984113Z [114s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:48:17.9984442Z 2026-02-21T09:48:17.9990012Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:48:17.9991770Z 2026-02-21T09:48:17.9992142Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:48:17.9992429Z `ptxas` stderr: 2026-02-21T09:48:17.9992912Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 302 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:17.9993487Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.9993658Z 2026-02-21T09:48:17.9994132Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp11hld0t2.ptx -o /tmp/tmp11hld0t2.ptx.o 2026-02-21T09:48:17.9994658Z 2026-02-21T09:48:17.9995039Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:48:17.9995272Z 2026-02-21T09:48:17.9995367Z ================================================================ 2026-02-21T09:48:17.9995615Z Internal Triton PTX codegen error 2026-02-21T09:48:17.9995809Z `ptxas` stderr: 2026-02-21T09:48:17.9996283Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 302 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:17.9996817Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:17.9996992Z 2026-02-21T09:48:17.9997544Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp11hld0t2.ptx -o /tmp/tmp11hld0t2.ptx.o 2026-02-21T09:48:17.9998037Z 2026-02-21T09:48:17.9998041Z 2026-02-21T09:48:17.9998113Z // 2026-02-21T09:48:17.9998285Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:48:17.9998510Z // 2026-02-21T09:48:17.9998667Z 2026-02-21T09:48:17.9998730Z .version 8.7 2026-02-21T09:48:17.9998893Z .target sm_100a 2026-02-21T09:48:17.9999043Z .address_size 64 2026-02-21T09:48:17.9999145Z 2026-02-21T09:48:17.9999284Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:48:17.9999607Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:48:17.9999848Z // @_helion_matmul 2026-02-21T09:48:18.0000082Z .visible .entry _helion_matmul( 2026-02-21T09:48:18.0000328Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:48:18.0000622Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:48:18.0000905Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:48:18.0001191Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:48:18.0001478Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:48:18.0001704Z ) 2026-02-21T09:48:18.0001916Z .reqntid 256 2026-02-21T09:48:18.0002064Z .maxnreg 32 2026-02-21T09:48:18.0002205Z { 
2026-02-21T09:48:18.0002342Z .reg .pred %p<136>; 2026-02-21T09:48:18.0002511Z .reg .b16 %rs<15>; 2026-02-21T09:48:18.0002667Z .reg .b32 %r<368>; 2026-02-21T09:48:18.0002829Z .reg .b64 %rd<196>; 2026-02-21T09:48:18.0003125Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19:0 2026-02-21T09:48:18.0003468Z $L__func_begin0: 2026-02-21T09:48:18.0003758Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19:0 2026-02-21T09:48:18.0004024Z 2026-02-21T09:48:18.0004087Z // %bb.0: 2026-02-21T09:48:18.0004270Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:48:18.0004486Z $L__tmp0: 2026-02-21T09:48:18.0004810Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19 2026-02-21T09:48:18.0005141Z mov.u32 %r1, %tid.x; 2026-02-21T09:48:18.0005322Z shr.u32 %r2, %r1, 5; 2026-02-21T09:48:18.0005499Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:48:18.0005718Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:48:18.0005973Z @%p1 bra $L__BB0_12; 2026-02-21T09:48:18.0006130Z bra.uni $L__BB0_1; 2026-02-21T09:48:18.0006292Z $L__BB0_12: 2026-02-21T09:48:18.0006560Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0:0 2026-02-21T09:48:18.0006920Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:48:18.0007159Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:48:18.0007399Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:48:18.0007725Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19 2026-02-21T09:48:18.0008085Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.0008311Z setp.lt.u32 %p50, %r1, 32; 2026-02-21T09:48:18.0008498Z mov.b32 %r163, global_smem; 2026-02-21T09:48:18.0008691Z // begin inline asm 2026-02-21T09:48:18.0008977Z @%p50 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r163], 32; 2026-02-21T09:48:18.0009270Z // end inline asm 2026-02-21T09:48:18.0009425Z bar.sync 0, 128; 2026-02-21T09:48:18.0009613Z ld.shared.b32 %r360, 
[global_smem]; 2026-02-21T09:48:18.0009811Z bar.sync 0, 128; 2026-02-21T09:48:18.0009964Z // begin inline asm 2026-02-21T09:48:18.0010203Z @%p50 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:48:18.0010460Z // end inline asm 2026-02-21T09:48:18.0010749Z .loc 1 21 67 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:21:67 2026-02-21T09:48:18.0011080Z mov.u32 %r21, %ctaid.x; 2026-02-21T09:48:18.0011309Z mov.u32 %r188, %ctaid.y; 2026-02-21T09:48:18.0011479Z mov.u32 %r189, %ctaid.z; 2026-02-21T09:48:18.0011655Z mov.u32 %r190, %nctaid.x; 2026-02-21T09:48:18.0011837Z mov.u32 %r191, %nctaid.y; 2026-02-21T09:48:18.0012017Z mad.lo.s32 %r192, %r189, %r191, %r188; 2026-02-21T09:48:18.0012253Z mad.lo.s32 %r193, %r192, %r190, %r21; 2026-02-21T09:48:18.0023932Z mul.lo.s32 %r194, %r193, 384; 2026-02-21T09:48:18.0024168Z cvt.s64.s32 %rd128, %r194; 2026-02-21T09:48:18.0024387Z add.s64 %rd89, %rd7, %rd128; 2026-02-21T09:48:18.0024574Z shl.b32 %r195, %r1, 2; 2026-02-21T09:48:18.0024814Z add.s32 %r164, %r163, %r195; 2026-02-21T09:48:18.0024993Z mov.b32 %r173, 0; 2026-02-21T09:48:18.0025169Z // begin inline asm 2026-02-21T09:48:18.0025360Z @%p50 st.shared.b32 [ %r164 + 0 ], %r173; 2026-02-21T09:48:18.0025589Z // end inline asm 2026-02-21T09:48:18.0025774Z bar.warp.sync -1; 2026-02-21T09:48:18.0025952Z setp.eq.b32 %p53, %r1, 0; 2026-02-21T09:48:18.0026151Z cvt.u64.u32 %rd74, %r163; 2026-02-21T09:48:18.0026336Z // begin inline asm 2026-02-21T09:48:18.0026661Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd4; 2026-02-21T09:48:18.0026990Z // end inline asm 2026-02-21T09:48:18.0027161Z // begin inline asm 2026-02-21T09:48:18.0027493Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:18.0027806Z // end inline asm 2026-02-21T09:48:18.0027970Z mov.b32 %r166, 64; 2026-02-21T09:48:18.0028131Z // begin inline asm 2026-02-21T09:48:18.0028417Z @%p53 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r166; 2026-02-21T09:48:18.0028730Z // end inline asm 2026-02-21T09:48:18.0028897Z mov.b32 %r167, 256; 2026-02-21T09:48:18.0029062Z // begin inline asm 2026-02-21T09:48:18.0029339Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r167; 2026-02-21T09:48:18.0029645Z // end inline asm 2026-02-21T09:48:18.0029811Z mov.b32 %r168, 2048; 2026-02-21T09:48:18.0029988Z // begin inline asm 2026-02-21T09:48:18.0030272Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r168; 2026-02-21T09:48:18.0030611Z // end inline asm 2026-02-21T09:48:18.0030766Z // begin inline asm 2026-02-21T09:48:18.0031054Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r168; 2026-02-21T09:48:18.0031381Z // end inline asm 2026-02-21T09:48:18.0031549Z mov.b64 %rd82, 4096; 2026-02-21T09:48:18.0031776Z // begin inline asm 2026-02-21T09:48:18.0032064Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd82; 2026-02-21T09:48:18.0032396Z // end inline asm 2026-02-21T09:48:18.0032549Z mov.b32 %r170, 1; 2026-02-21T09:48:18.0032713Z // begin inline asm 2026-02-21T09:48:18.0033011Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r170; 2026-02-21T09:48:18.0033359Z // end inline asm 2026-02-21T09:48:18.0033516Z // begin inline asm 2026-02-21T09:48:18.0033821Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r170; 2026-02-21T09:48:18.0034169Z // end inline asm 2026-02-21T09:48:18.0034328Z // begin inline asm 2026-02-21T09:48:18.0034613Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:18.0034979Z // end inline asm 2026-02-21T09:48:18.0035145Z // begin inline asm 2026-02-21T09:48:18.0035425Z @%p53 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 
2026-02-21T09:48:18.0035754Z // end inline asm 2026-02-21T09:48:18.0035916Z // begin inline asm 2026-02-21T09:48:18.0036201Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x3; 2026-02-21T09:48:18.0036533Z // end inline asm 2026-02-21T09:48:18.0036685Z // begin inline asm 2026-02-21T09:48:18.0036957Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:18.0037252Z // end inline asm 2026-02-21T09:48:18.0037472Z // begin inline asm 2026-02-21T09:48:18.0037891Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd89 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:18.0038342Z // end inline asm 2026-02-21T09:48:18.0038504Z // begin inline asm 2026-02-21T09:48:18.0038741Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd89 + 0 ], 0x80; 2026-02-21T09:48:18.0039090Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.0039313Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.0039526Z // end inline asm 2026-02-21T09:48:18.0039679Z bar.sync 0, 128; 2026-02-21T09:48:18.0039981Z .loc 1 22 68 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:22:68 2026-02-21T09:48:18.0040329Z add.s32 %r196, %r194, 128; 2026-02-21T09:48:18.0040512Z cvt.s64.s32 %rd129, %r196; 2026-02-21T09:48:18.0040703Z add.s64 %rd107, %rd7, %rd129; 2026-02-21T09:48:18.0040879Z bar.sync 0, 128; 2026-02-21T09:48:18.0041039Z // begin inline asm 2026-02-21T09:48:18.0041212Z @%p50 st.shared.b32 [ %r164 + 0 ], %r173; 2026-02-21T09:48:18.0041420Z // end inline asm 2026-02-21T09:48:18.0041573Z bar.warp.sync -1; 2026-02-21T09:48:18.0041743Z // begin inline asm 2026-02-21T09:48:18.0042027Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd5; 2026-02-21T09:48:18.0042380Z // end inline asm 2026-02-21T09:48:18.0042543Z // begin inline asm 2026-02-21T09:48:18.0042788Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 
2026-02-21T09:48:18.0043080Z // end inline asm 2026-02-21T09:48:18.0043229Z // begin inline asm 2026-02-21T09:48:18.0043494Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r166; 2026-02-21T09:48:18.0043789Z // end inline asm 2026-02-21T09:48:18.0043945Z mov.b32 %r175, 16; 2026-02-21T09:48:18.0044103Z // begin inline asm 2026-02-21T09:48:18.0044362Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r175; 2026-02-21T09:48:18.0044711Z // end inline asm 2026-02-21T09:48:18.0044867Z // begin inline asm 2026-02-21T09:48:18.0045151Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r168; 2026-02-21T09:48:18.0045466Z // end inline asm 2026-02-21T09:48:18.0045640Z mov.b32 %r177, 12288; 2026-02-21T09:48:18.0045821Z // begin inline asm 2026-02-21T09:48:18.0046115Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r177; 2026-02-21T09:48:18.0046510Z // end inline asm 2026-02-21T09:48:18.0046666Z // begin inline asm 2026-02-21T09:48:18.0046983Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd82; 2026-02-21T09:48:18.0047322Z // end inline asm 2026-02-21T09:48:18.0047488Z // begin inline asm 2026-02-21T09:48:18.0047795Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r170; 2026-02-21T09:48:18.0048151Z // end inline asm 2026-02-21T09:48:18.0048317Z // begin inline asm 2026-02-21T09:48:18.0048628Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r170; 2026-02-21T09:48:18.0048982Z // end inline asm 2026-02-21T09:48:18.0049139Z // begin inline asm 2026-02-21T09:48:18.0049434Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:18.0049766Z // end inline asm 2026-02-21T09:48:18.0049929Z // begin inline asm 2026-02-21T09:48:18.0050239Z @%p53 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:18.0050576Z // end inline asm 2026-02-21T09:48:18.0050742Z // begin inline asm 2026-02-21T09:48:18.0051026Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x3; 2026-02-21T09:48:18.0051356Z // end inline asm 2026-02-21T09:48:18.0051512Z // begin inline asm 2026-02-21T09:48:18.0051799Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:18.0052123Z // end inline asm 2026-02-21T09:48:18.0052322Z // begin inline asm 2026-02-21T09:48:18.0052759Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd107 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:18.0053237Z // end inline asm 2026-02-21T09:48:18.0053399Z // begin inline asm 2026-02-21T09:48:18.0053676Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd107 + 0 ], 0x80; 2026-02-21T09:48:18.0053995Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.0054233Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.0054429Z // end inline asm 2026-02-21T09:48:18.0054585Z bar.sync 0, 128; 2026-02-21T09:48:18.0054919Z .loc 1 24 73 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:24:73 2026-02-21T09:48:18.0055281Z add.s32 %r197, %r194, 256; 2026-02-21T09:48:18.0055467Z cvt.s64.s32 %rd130, %r197; 2026-02-21T09:48:18.0055663Z add.s64 %rd125, %rd7, %rd130; 2026-02-21T09:48:18.0055844Z bar.sync 0, 128; 2026-02-21T09:48:18.0056010Z // begin inline asm 2026-02-21T09:48:18.0056187Z @%p50 st.shared.b32 [ %r164 + 0 ], %r173; 2026-02-21T09:48:18.0056398Z // end inline asm 2026-02-21T09:48:18.0056565Z bar.warp.sync -1; 2026-02-21T09:48:18.0056725Z // begin inline asm 2026-02-21T09:48:18.0057024Z @%p53 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd74 + 0 ], %rd6; 2026-02-21T09:48:18.0057394Z // end inline asm 2026-02-21T09:48:18.0057560Z // begin inline asm 
2026-02-21T09:48:18.0057815Z @%p53 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:18.0058112Z // end inline asm 2026-02-21T09:48:18.0058267Z // begin inline asm 2026-02-21T09:48:18.0058554Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r175; 2026-02-21T09:48:18.0058866Z // end inline asm 2026-02-21T09:48:18.0059019Z // begin inline asm 2026-02-21T09:48:18.0059293Z @%p53 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r167; 2026-02-21T09:48:18.0059591Z // end inline asm 2026-02-21T09:48:18.0059758Z // begin inline asm 2026-02-21T09:48:18.0060036Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r177; 2026-02-21T09:48:18.0060363Z // end inline asm 2026-02-21T09:48:18.0060524Z // begin inline asm 2026-02-21T09:48:18.0060802Z @%p53 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r168; 2026-02-21T09:48:18.0061121Z // end inline asm 2026-02-21T09:48:18.0061279Z mov.b64 %rd118, 24576; 2026-02-21T09:48:18.0061495Z // begin inline asm 2026-02-21T09:48:18.0061788Z @%p53 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd74 + 0 ], 0x0, %rd118; 2026-02-21T09:48:18.0062123Z // end inline asm 2026-02-21T09:48:18.0062286Z // begin inline asm 2026-02-21T09:48:18.0062584Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0, %r170; 2026-02-21T09:48:18.0062935Z // end inline asm 2026-02-21T09:48:18.0063089Z // begin inline asm 2026-02-21T09:48:18.0063387Z @%p53 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1, %r170; 2026-02-21T09:48:18.0063725Z // end inline asm 2026-02-21T09:48:18.0063891Z // begin inline asm 2026-02-21T09:48:18.0064158Z @%p53 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x6; 2026-02-21T09:48:18.0064477Z // end inline asm 2026-02-21T09:48:18.0064641Z // begin inline asm 2026-02-21T09:48:18.0065025Z @%p53 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:18.0065362Z // end inline asm 2026-02-21T09:48:18.0065518Z // begin inline asm 2026-02-21T09:48:18.0065785Z @%p53 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x1; 2026-02-21T09:48:18.0066096Z // end inline asm 2026-02-21T09:48:18.0066248Z // begin inline asm 2026-02-21T09:48:18.0066517Z @%p53 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd74 + 0 ], 0x0; 2026-02-21T09:48:18.0066808Z // end inline asm 2026-02-21T09:48:18.0066966Z // begin inline asm 2026-02-21T09:48:18.0067416Z @%p50 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd125 + 0 ], [ %rd74 + 0 ], 0x80; 2026-02-21T09:48:18.0067841Z // end inline asm 2026-02-21T09:48:18.0068000Z // begin inline asm 2026-02-21T09:48:18.0068237Z @%p50 fence.proxy.tensormap::generic.acquire.gpu [ %rd125 + 0 ], 0x80; 2026-02-21T09:48:18.0068582Z @%p50 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.0068792Z @%p50 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.0068995Z // end inline asm 2026-02-21T09:48:18.0069146Z bar.sync 0, 128; 2026-02-21T09:48:18.0069305Z cvta.global.u64 %rd131, %rd125; 2026-02-21T09:48:18.0069630Z .loc 1 31 35 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:31:35 2026-02-21T09:48:18.0069968Z mul.lo.s32 %r367, %r21, 3; 2026-02-21T09:48:18.0070280Z .loc 1 32 37 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:32:37 2026-02-21T09:48:18.0070602Z add.s32 %r198, %r367, 3; 2026-02-21T09:48:18.0070908Z .loc 1 32 49 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:32:49 2026-02-21T09:48:18.0071232Z min.s32 %r23, %r198, 6144; 2026-02-21T09:48:18.0071527Z .loc 1 33 84 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:33:84 2026-02-21T09:48:18.0071908Z setp.ge.s32 %p106, %r367, %r23; 2026-02-21T09:48:18.0072102Z @%p106 bra $L__BB0_15; 2026-02-21T09:48:18.0072303Z // %bb.13: // %.lr.ph 
2026-02-21T09:48:18.0072645Z .loc 1 0 84 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0:84 2026-02-21T09:48:18.0072982Z shl.b32 %r199, %r1, 5; 2026-02-21T09:48:18.0073153Z and.b32 %r200, %r199, 3936; 2026-02-21T09:48:18.0073347Z bfe.s32 %r201, %r1, 2, 1; 2026-02-21T09:48:18.0073522Z and.b32 %r202, %r201, 144; 2026-02-21T09:48:18.0073704Z or.b32 %r203, %r202, %r200; 2026-02-21T09:48:18.0073892Z add.s32 %r205, %r163, 131072; 2026-02-21T09:48:18.0074071Z add.s32 %r24, %r205, %r203; 2026-02-21T09:48:18.0074249Z xor.b32 %r206, %r203, 16; 2026-02-21T09:48:18.0074425Z add.s32 %r25, %r205, %r206; 2026-02-21T09:48:18.0074768Z .loc 1 33 84 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:33:84 2026-02-21T09:48:18.0075108Z cvt.u16.u32 %rs4, %r21; 2026-02-21T09:48:18.0075284Z mul.lo.s16 %rs14, %rs4, 3; 2026-02-21T09:48:18.0075520Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:48:18.0075942Z .loc 1 39 35 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:39:35 2026-02-21T09:48:18.0076280Z mul.hi.s32 %r302, %r367, 715827883; 2026-02-21T09:48:18.0076479Z shr.u32 %r303, %r302, 31; 2026-02-21T09:48:18.0076650Z shr.s32 %r304, %r302, 9; 2026-02-21T09:48:18.0076831Z add.s32 %r305, %r304, %r303; 2026-02-21T09:48:18.0077131Z .loc 1 43 51 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:43:51 2026-02-21T09:48:18.0077485Z cvt.u16.u32 %rs5, %r305; 2026-02-21T09:48:18.0077670Z mad.lo.s16 %rs6, %rs5, -3072, %rs14; 2026-02-21T09:48:18.0077877Z shr.s16 %rs7, %rs6, 15; 2026-02-21T09:48:18.0078048Z shr.u16 %rs8, %rs7, 14; 2026-02-21T09:48:18.0078227Z add.s16 %rs9, %rs6, %rs8; 2026-02-21T09:48:18.0078406Z shr.s16 %rs10, %rs9, 2; 2026-02-21T09:48:18.0078714Z .loc 1 42 64 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:42:64 2026-02-21T09:48:18.0079047Z and.b16 %rs11, %rs9, -4; 2026-02-21T09:48:18.0079224Z mad.lo.s16 %rs12, %rs5, 3072, %rs11; 2026-02-21T09:48:18.0079425Z sub.s16 %rs13, %rs14, %rs12; 2026-02-21T09:48:18.0079726Z 
.loc 1 44 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:44:27 2026-02-21T09:48:18.0080080Z shl.b32 %r306, %r305, 10; 2026-02-21T09:48:18.0080261Z mad.wide.s16 %r300, %rs13, 256, %r306; 2026-02-21T09:48:18.0080588Z .loc 1 45 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:45:27 2026-02-21T09:48:18.0080998Z mul.wide.s16 %r299, %rs10, 16; 2026-02-21T09:48:18.0081300Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0081672Z shfl.sync.idx.b32 %r307, %r2, 0, 31, -1; 2026-02-21T09:48:18.0081878Z shl.b32 %r308, %r307, 21; 2026-02-21T09:48:18.0082103Z and.b32 %r309, %r308, 6291456; 2026-02-21T09:48:18.0082284Z add.s32 %r207, %r309, %r360; 2026-02-21T09:48:18.0082472Z mov.pred %p107, -1; 2026-02-21T09:48:18.0082642Z mov.b32 %r208, 0; 2026-02-21T09:48:18.0082797Z // begin inline asm 2026-02-21T09:48:18.0083261Z @%p107 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 0], {%r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208}; 2026-02-21T09:48:18.0083716Z // end inline asm 2026-02-21T09:48:18.0083882Z // begin inline asm 2026-02-21T09:48:18.0084309Z @%p107 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 16], {%r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208, %r208}; 2026-02-21T09:48:18.0084847Z // end inline asm 2026-02-21T09:48:18.0085007Z // begin inline asm 2026-02-21T09:48:18.0085181Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:48:18.0085379Z // end inline asm 2026-02-21T09:48:18.0085527Z bar.sync 0, 128; 2026-02-21T09:48:18.0085868Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0086218Z add.s32 %r241, %r163, 147584; 2026-02-21T09:48:18.0086407Z // begin inline asm 2026-02-21T09:48:18.0086603Z @%p53 mbarrier.init.shared::cta.b64 [%r241], 1; 2026-02-21T09:48:18.0086839Z // end inline asm 2026-02-21T09:48:18.0086999Z bar.sync 0, 128; 
2026-02-21T09:48:18.0087154Z add.s32 %r242, %r163, 147592; 2026-02-21T09:48:18.0087338Z // begin inline asm 2026-02-21T09:48:18.0087523Z @%p53 mbarrier.init.shared::cta.b64 [%r242], 1; 2026-02-21T09:48:18.0087741Z // end inline asm 2026-02-21T09:48:18.0087890Z bar.sync 0, 128; 2026-02-21T09:48:18.0088048Z add.s32 %r243, %r163, 147600; 2026-02-21T09:48:18.0088230Z // begin inline asm 2026-02-21T09:48:18.0088416Z @%p53 mbarrier.init.shared::cta.b64 [%r243], 1; 2026-02-21T09:48:18.0088617Z // end inline asm 2026-02-21T09:48:18.0088771Z bar.sync 0, 128; 2026-02-21T09:48:18.0088925Z add.s32 %r244, %r163, 147608; 2026-02-21T09:48:18.0089093Z // begin inline asm 2026-02-21T09:48:18.0089276Z @%p53 mbarrier.init.shared::cta.b64 [%r244], 1; 2026-02-21T09:48:18.0089511Z // end inline asm 2026-02-21T09:48:18.0089666Z add.s32 %r245, %r163, 147616; 2026-02-21T09:48:18.0089834Z // begin inline asm 2026-02-21T09:48:18.0090020Z @%p53 mbarrier.init.shared::cta.b64 [%r245], 1; 2026-02-21T09:48:18.0090219Z // end inline asm 2026-02-21T09:48:18.0090368Z bar.sync 0, 128; 2026-02-21T09:48:18.0090524Z add.s32 %r246, %r163, 147624; 2026-02-21T09:48:18.0090691Z // begin inline asm 2026-02-21T09:48:18.0090872Z @%p53 mbarrier.init.shared::cta.b64 [%r246], 1; 2026-02-21T09:48:18.0091070Z // end inline asm 2026-02-21T09:48:18.0091222Z bar.sync 0, 128; 2026-02-21T09:48:18.0091369Z add.s32 %r247, %r163, 147632; 2026-02-21T09:48:18.0091541Z // begin inline asm 2026-02-21T09:48:18.0091718Z @%p53 mbarrier.init.shared::cta.b64 [%r247], 1; 2026-02-21T09:48:18.0091924Z // end inline asm 2026-02-21T09:48:18.0092067Z bar.sync 0, 128; 2026-02-21T09:48:18.0092223Z add.s32 %r248, %r163, 147640; 2026-02-21T09:48:18.0092397Z // begin inline asm 2026-02-21T09:48:18.0092573Z @%p53 mbarrier.init.shared::cta.b64 [%r248], 1; 2026-02-21T09:48:18.0092780Z // end inline asm 2026-02-21T09:48:18.0093046Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0093358Z bar.sync 0, 
128; 2026-02-21T09:48:18.0093502Z // begin inline asm 2026-02-21T09:48:18.0093693Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r241]; 2026-02-21T09:48:18.0093905Z // end inline asm 2026-02-21T09:48:18.0094060Z bar.sync 0, 128; 2026-02-21T09:48:18.0094261Z // begin inline asm 2026-02-21T09:48:18.0094440Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r242]; 2026-02-21T09:48:18.0094658Z // end inline asm 2026-02-21T09:48:18.0094875Z bar.sync 0, 128; 2026-02-21T09:48:18.0095033Z // begin inline asm 2026-02-21T09:48:18.0095219Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r243]; 2026-02-21T09:48:18.0095490Z // end inline asm 2026-02-21T09:48:18.0095647Z bar.sync 0, 128; 2026-02-21T09:48:18.0095811Z // begin inline asm 2026-02-21T09:48:18.0096011Z @%p53 mbarrier.arrive.shared::cta.b64 _, [%r244]; 2026-02-21T09:48:18.0096228Z // end inline asm 2026-02-21T09:48:18.0096523Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0096853Z bar.sync 0, 128; 2026-02-21T09:48:18.0097021Z add.s32 %r253, %r163, 147648; 2026-02-21T09:48:18.0097188Z // begin inline asm 2026-02-21T09:48:18.0097369Z @%p53 mbarrier.init.shared::cta.b64 [%r253], 1; 2026-02-21T09:48:18.0097571Z // end inline asm 2026-02-21T09:48:18.0097752Z st.shared.b32 [global_smem+147656], 33554689; 2026-02-21T09:48:18.0097985Z st.shared.b32 [global_smem+147456], %r360; 2026-02-21T09:48:18.0098227Z st.shared.v2.b32 [global_smem+147464], {%r300, %r299}; 2026-02-21T09:48:18.0098453Z barrier.sync 1; 2026-02-21T09:48:18.0098674Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.0098890Z barrier.sync 1; 2026-02-21T09:48:18.0099040Z barrier.sync 1; 2026-02-21T09:48:18.0099221Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.0099545Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0099875Z bar.sync 0, 128; 2026-02-21T09:48:18.0100029Z // begin inline asm 2026-02-21T09:48:18.0100177Z 2026-02-21T09:48:18.0100312Z { 
2026-02-21T09:48:18.0100447Z .reg .pred complete; 2026-02-21T09:48:18.0100614Z waitLoop: 2026-02-21T09:48:18.0100822Z mbarrier.try_wait.parity.shared.b64 complete, [%r253], %r208; 2026-02-21T09:48:18.0101089Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.0101256Z } 2026-02-21T09:48:18.0101336Z 2026-02-21T09:48:18.0101397Z // end inline asm 2026-02-21T09:48:18.0101666Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0101984Z bar.sync 0, 128; 2026-02-21T09:48:18.0102139Z // begin inline asm 2026-02-21T09:48:18.0102323Z @%p53 mbarrier.inval.shared::cta.b64 [%r253]; 2026-02-21T09:48:18.0102583Z // end inline asm 2026-02-21T09:48:18.0102735Z // begin inline asm 2026-02-21T09:48:18.0102928Z @%p53 mbarrier.inval.shared::cta.b64 [%r245]; 2026-02-21T09:48:18.0103136Z // end inline asm 2026-02-21T09:48:18.0103297Z bar.sync 0, 128; 2026-02-21T09:48:18.0103445Z // begin inline asm 2026-02-21T09:48:18.0103635Z @%p53 mbarrier.inval.shared::cta.b64 [%r246]; 2026-02-21T09:48:18.0103847Z // end inline asm 2026-02-21T09:48:18.0103997Z bar.sync 0, 128; 2026-02-21T09:48:18.0104156Z // begin inline asm 2026-02-21T09:48:18.0104339Z @%p53 mbarrier.inval.shared::cta.b64 [%r247]; 2026-02-21T09:48:18.0104552Z // end inline asm 2026-02-21T09:48:18.0104739Z bar.sync 0, 128; 2026-02-21T09:48:18.0104895Z // begin inline asm 2026-02-21T09:48:18.0105075Z @%p53 mbarrier.inval.shared::cta.b64 [%r248]; 2026-02-21T09:48:18.0105288Z // end inline asm 2026-02-21T09:48:18.0105442Z // begin inline asm 2026-02-21T09:48:18.0105635Z @%p53 mbarrier.inval.shared::cta.b64 [%r241]; 2026-02-21T09:48:18.0105851Z // end inline asm 2026-02-21T09:48:18.0106001Z bar.sync 0, 128; 2026-02-21T09:48:18.0106160Z // begin inline asm 2026-02-21T09:48:18.0106337Z @%p53 mbarrier.inval.shared::cta.b64 [%r242]; 2026-02-21T09:48:18.0106548Z // end inline asm 2026-02-21T09:48:18.0106695Z bar.sync 0, 128; 2026-02-21T09:48:18.0106858Z // begin inline asm 
2026-02-21T09:48:18.0107041Z @%p53 mbarrier.inval.shared::cta.b64 [%r243]; 2026-02-21T09:48:18.0107253Z // end inline asm 2026-02-21T09:48:18.0107408Z bar.sync 0, 128; 2026-02-21T09:48:18.0107556Z // begin inline asm 2026-02-21T09:48:18.0107813Z @%p53 mbarrier.inval.shared::cta.b64 [%r244]; 2026-02-21T09:48:18.0108055Z // end inline asm 2026-02-21T09:48:18.0108387Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0108769Z // begin inline asm 2026-02-21T09:48:18.0109311Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r265, %r266, %r267, %r268, %r269, %r270, %r271, %r272, %r273, %r274, %r275, %r276, %r277, %r278, %r279, %r280}, [%r207 + 0]; 2026-02-21T09:48:18.0109849Z // end inline asm 2026-02-21T09:48:18.0110010Z // begin inline asm 2026-02-21T09:48:18.0110427Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r282, %r283, %r284, %r285, %r286, %r287, %r288, %r289, %r290, %r291, %r292, %r293, %r294, %r295, %r296, %r297}, [%r207 + 16]; 2026-02-21T09:48:18.0110873Z // end inline asm 2026-02-21T09:48:18.0111033Z // begin inline asm 2026-02-21T09:48:18.0111204Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:48:18.0111396Z // end inline asm 2026-02-21T09:48:18.0111554Z cvt.u64.u32 %rd132, %r265; 2026-02-21T09:48:18.0111745Z cvt.u64.u32 %rd133, %r266; 2026-02-21T09:48:18.0111924Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:48:18.0112117Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:48:18.0112504Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0112847Z mov.b64 {%r311, %r312}, %rd135; 2026-02-21T09:48:18.0113052Z cvt.rn.f16x2.f32 %r313, %r312, %r311; 2026-02-21T09:48:18.0113382Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0113717Z cvt.u64.u32 %rd136, %r267; 2026-02-21T09:48:18.0113891Z cvt.u64.u32 %rd137, %r268; 2026-02-21T09:48:18.0114070Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:48:18.0114257Z or.b64 %rd139, %rd136, %rd138; 
2026-02-21T09:48:18.0114580Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0114951Z mov.b64 {%r314, %r315}, %rd139; 2026-02-21T09:48:18.0115145Z cvt.rn.f16x2.f32 %r316, %r315, %r314; 2026-02-21T09:48:18.0115471Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0115798Z cvt.u64.u32 %rd140, %r269; 2026-02-21T09:48:18.0115986Z cvt.u64.u32 %rd141, %r270; 2026-02-21T09:48:18.0116157Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:48:18.0116326Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:48:18.0116673Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0116983Z mov.b64 {%r317, %r318}, %rd143; 2026-02-21T09:48:18.0117171Z cvt.rn.f16x2.f32 %r319, %r318, %r317; 2026-02-21T09:48:18.0117472Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0117784Z cvt.u64.u32 %rd144, %r271; 2026-02-21T09:48:18.0117949Z cvt.u64.u32 %rd145, %r272; 2026-02-21T09:48:18.0118124Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:48:18.0118313Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:48:18.0118605Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0118932Z mov.b64 {%r320, %r321}, %rd147; 2026-02-21T09:48:18.0119114Z cvt.rn.f16x2.f32 %r322, %r321, %r320; 2026-02-21T09:48:18.0119423Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0119730Z cvt.u64.u32 %rd148, %r273; 2026-02-21T09:48:18.0119900Z cvt.u64.u32 %rd149, %r274; 2026-02-21T09:48:18.0120068Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:48:18.0120234Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:48:18.0120527Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0120832Z mov.b64 {%r323, %r324}, %rd151; 2026-02-21T09:48:18.0121017Z cvt.rn.f16x2.f32 %r325, %r324, %r323; 2026-02-21T09:48:18.0121314Z .loc 1 56 52 // 
cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0121678Z cvt.u64.u32 %rd152, %r275; 2026-02-21T09:48:18.0121843Z cvt.u64.u32 %rd153, %r276; 2026-02-21T09:48:18.0122016Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:48:18.0122193Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:48:18.0122521Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0122843Z mov.b64 {%r326, %r327}, %rd155; 2026-02-21T09:48:18.0123020Z cvt.rn.f16x2.f32 %r328, %r327, %r326; 2026-02-21T09:48:18.0123324Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0123630Z cvt.u64.u32 %rd156, %r277; 2026-02-21T09:48:18.0123808Z cvt.u64.u32 %rd157, %r278; 2026-02-21T09:48:18.0123978Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:48:18.0124146Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:48:18.0124435Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0124799Z mov.b64 {%r329, %r330}, %rd159; 2026-02-21T09:48:18.0124987Z cvt.rn.f16x2.f32 %r331, %r330, %r329; 2026-02-21T09:48:18.0125172Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0125296Z cvt.u64.u32 %rd160, %r279; 2026-02-21T09:48:18.0125372Z cvt.u64.u32 %rd161, %r280; 2026-02-21T09:48:18.0125436Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:48:18.0125499Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:48:18.0125688Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0125756Z mov.b64 {%r332, %r333}, %rd163; 2026-02-21T09:48:18.0125828Z cvt.rn.f16x2.f32 %r334, %r333, %r332; 2026-02-21T09:48:18.0126018Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0126090Z cvt.u64.u32 %rd164, %r282; 2026-02-21T09:48:18.0126156Z cvt.u64.u32 %rd165, %r283; 2026-02-21T09:48:18.0126222Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:48:18.0126297Z or.b64 %rd167, %rd164, 
%rd166; 2026-02-21T09:48:18.0126486Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0126554Z mov.b64 {%r335, %r336}, %rd167; 2026-02-21T09:48:18.0126635Z cvt.rn.f16x2.f32 %r337, %r336, %r335; 2026-02-21T09:48:18.0126824Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0126931Z cvt.u64.u32 %rd168, %r284; 2026-02-21T09:48:18.0126997Z cvt.u64.u32 %rd169, %r285; 2026-02-21T09:48:18.0127072Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:48:18.0127137Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:48:18.0127325Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0127400Z mov.b64 {%r338, %r339}, %rd171; 2026-02-21T09:48:18.0127472Z cvt.rn.f16x2.f32 %r340, %r339, %r338; 2026-02-21T09:48:18.0127660Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0127733Z cvt.u64.u32 %rd172, %r286; 2026-02-21T09:48:18.0127799Z cvt.u64.u32 %rd173, %r287; 2026-02-21T09:48:18.0127866Z shl.b64 %rd174, %rd173, 32; 2026-02-21T09:48:18.0127934Z or.b64 %rd175, %rd172, %rd174; 2026-02-21T09:48:18.0128133Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0128202Z mov.b64 {%r341, %r342}, %rd175; 2026-02-21T09:48:18.0128273Z cvt.rn.f16x2.f32 %r343, %r342, %r341; 2026-02-21T09:48:18.0128474Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0128541Z cvt.u64.u32 %rd176, %r288; 2026-02-21T09:48:18.0128608Z cvt.u64.u32 %rd177, %r289; 2026-02-21T09:48:18.0128681Z shl.b64 %rd178, %rd177, 32; 2026-02-21T09:48:18.0128746Z or.b64 %rd179, %rd176, %rd178; 2026-02-21T09:48:18.0128988Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0129056Z mov.b64 {%r344, %r345}, %rd179; 2026-02-21T09:48:18.0129136Z cvt.rn.f16x2.f32 %r346, %r345, %r344; 2026-02-21T09:48:18.0129364Z .loc 1 56 52 
// cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0129432Z cvt.u64.u32 %rd180, %r290; 2026-02-21T09:48:18.0129511Z cvt.u64.u32 %rd181, %r291; 2026-02-21T09:48:18.0129577Z shl.b64 %rd182, %rd181, 32; 2026-02-21T09:48:18.0129643Z or.b64 %rd183, %rd180, %rd182; 2026-02-21T09:48:18.0129842Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0129909Z mov.b64 {%r347, %r348}, %rd183; 2026-02-21T09:48:18.0129978Z cvt.rn.f16x2.f32 %r349, %r348, %r347; 2026-02-21T09:48:18.0130168Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0130242Z cvt.u64.u32 %rd184, %r292; 2026-02-21T09:48:18.0130306Z cvt.u64.u32 %rd185, %r293; 2026-02-21T09:48:18.0130370Z shl.b64 %rd186, %rd185, 32; 2026-02-21T09:48:18.0130443Z or.b64 %rd187, %rd184, %rd186; 2026-02-21T09:48:18.0130655Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0130724Z mov.b64 {%r350, %r351}, %rd187; 2026-02-21T09:48:18.0130802Z cvt.rn.f16x2.f32 %r352, %r351, %r350; 2026-02-21T09:48:18.0130993Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0131059Z cvt.u64.u32 %rd188, %r294; 2026-02-21T09:48:18.0131124Z cvt.u64.u32 %rd189, %r295; 2026-02-21T09:48:18.0131198Z shl.b64 %rd190, %rd189, 32; 2026-02-21T09:48:18.0131265Z or.b64 %rd191, %rd188, %rd190; 2026-02-21T09:48:18.0131454Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0131532Z mov.b64 {%r353, %r354}, %rd191; 2026-02-21T09:48:18.0131602Z cvt.rn.f16x2.f32 %r355, %r354, %r353; 2026-02-21T09:48:18.0131794Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0131867Z cvt.u64.u32 %rd192, %r296; 2026-02-21T09:48:18.0131935Z cvt.u64.u32 %rd193, %r297; 2026-02-21T09:48:18.0132002Z shl.b64 %rd194, %rd193, 32; 2026-02-21T09:48:18.0132095Z or.b64 %rd195, %rd192, 
%rd194; 2026-02-21T09:48:18.0132297Z .loc 1 58 27 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:58:27 2026-02-21T09:48:18.0132363Z mov.b64 {%r356, %r357}, %rd195; 2026-02-21T09:48:18.0132436Z cvt.rn.f16x2.f32 %r358, %r357, %r356; 2026-02-21T09:48:18.0132632Z .loc 1 59 45 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:59:45 2026-02-21T09:48:18.0132714Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.0132776Z bar.sync 0, 128; 2026-02-21T09:48:18.0132896Z st.shared.v4.b32 [%r24], {%r313, %r316, %r319, %r322}; 2026-02-21T09:48:18.0133009Z st.shared.v4.b32 [%r24+4096], {%r337, %r340, %r343, %r346}; 2026-02-21T09:48:18.0133111Z st.shared.v4.b32 [%r25], {%r325, %r328, %r331, %r334}; 2026-02-21T09:48:18.0133220Z st.shared.v4.b32 [%r25+4096], {%r349, %r352, %r355, %r358}; 2026-02-21T09:48:18.0133296Z // begin inline asm 2026-02-21T09:48:18.0133380Z fence.proxy.async.shared::cta; 2026-02-21T09:48:18.0133445Z // end inline asm 2026-02-21T09:48:18.0133520Z bar.sync 0, 128; 2026-02-21T09:48:18.0133594Z elect.sync %r359|%p133, -1; 2026-02-21T09:48:18.0133667Z and.pred %p131, %p50, %p133; 2026-02-21T09:48:18.0133731Z // begin inline asm 2026-02-21T09:48:18.0133952Z @%p131 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd131, {%r299, %r300}], [%r205]; 2026-02-21T09:48:18.0134014Z // end inline asm 2026-02-21T09:48:18.0134088Z cp.async.bulk.commit_group; 2026-02-21T09:48:18.0134289Z .loc 1 33 84 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:33:84 2026-02-21T09:48:18.0134384Z add.s32 %r367, %r367, 1; 2026-02-21T09:48:18.0134450Z add.s16 %rs14, %rs14, 1; 2026-02-21T09:48:18.0134530Z setp.ne.b32 %p134, %r23, %r367; 2026-02-21T09:48:18.0134597Z @%p134 bra $L__BB0_14; 2026-02-21T09:48:18.0134742Z $L__BB0_15: // %._crit_edge 2026-02-21T09:48:18.0134865Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.0134935Z bar.sync 0, 128; 2026-02-21T09:48:18.0135128Z .loc 1 33 4 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:33:4 
2026-02-21T09:48:18.0135189Z bar.sync 0, 128; 2026-02-21T09:48:18.0135261Z // begin inline asm 2026-02-21T09:48:18.0135395Z @%p50 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r360, 32; 2026-02-21T09:48:18.0135457Z // end inline asm 2026-02-21T09:48:18.0135556Z st.shared.b32 [global_smem+147656], 50529027; 2026-02-21T09:48:18.0135620Z barrier.sync 1; 2026-02-21T09:48:18.0135713Z $L__BB0_16: // %common.ret 2026-02-21T09:48:18.0135904Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0135969Z ret; 2026-02-21T09:48:18.0136077Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:48:18.0136191Z mov.b32 %r29, global_smem; 2026-02-21T09:48:18.0136268Z add.s32 %r30, %r29, %r3; 2026-02-21T09:48:18.0136334Z add.s32 %r63, %r29, 147616; 2026-02-21T09:48:18.0136402Z bfe.u32 %r85, %r29, 4, 14; 2026-02-21T09:48:18.0136475Z cvt.u64.u32 %rd32, %r85; 2026-02-21T09:48:18.0136553Z or.b64 %rd14, %rd32, 4611686293439512576; 2026-02-21T09:48:18.0136617Z add.s32 %r86, %r29, 139264; 2026-02-21T09:48:18.0136681Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:48:18.0136760Z cvt.u64.u32 %rd33, %r87; 2026-02-21T09:48:18.0136836Z or.b64 %rd23, %rd33, 4611686293313683456; 2026-02-21T09:48:18.0136903Z add.s32 %r88, %r29, 32; 2026-02-21T09:48:18.0136974Z bfe.u32 %r89, %r88, 4, 14; 2026-02-21T09:48:18.0137038Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.0137155Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0137377Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0137475Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0137543Z barrier.sync 1; 2026-02-21T09:48:18.0137606Z barrier.sync 1; 2026-02-21T09:48:18.0137703Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0137867Z $L__BB0_2: // %.preheader 2026-02-21T09:48:18.0137976Z // =>This Loop Header: Depth=1 2026-02-21T09:48:18.0138082Z // Child Loop BB0_9 Depth 2 2026-02-21T09:48:18.0138178Z // Child Loop BB0_6 Depth 2 
2026-02-21T09:48:18.0138365Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19 2026-02-21T09:48:18.0138456Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:48:18.0138528Z barrier.sync 1; 2026-02-21T09:48:18.0138603Z ld.shared.b8 %r28, [%r30+147652]; 2026-02-21T09:48:18.0138676Z setp.gt.u32 %p2, %r28, 3; 2026-02-21T09:48:18.0138753Z @%p2 bra $L__BB0_4; 2026-02-21T09:48:18.0138848Z // %bb.3: // %.preheader 2026-02-21T09:48:18.0138957Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0139043Z $L_brx_0: .branchtargets 2026-02-21T09:48:18.0139107Z $L__BB0_5, 2026-02-21T09:48:18.0139171Z $L__BB0_8, 2026-02-21T09:48:18.0139233Z $L__BB0_11, 2026-02-21T09:48:18.0139307Z $L__BB0_16; 2026-02-21T09:48:18.0139378Z brx.idx %r28, $L_brx_0; 2026-02-21T09:48:18.0139468Z $L__BB0_5: // %.peel.next 2026-02-21T09:48:18.0139585Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0139799Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0139926Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0140013Z ld.shared.b32 %r65, [global_smem+147456]; 2026-02-21T09:48:18.0140083Z barrier.sync 1; 2026-02-21T09:48:18.0140273Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0140370Z bar.warp.sync -1; 2026-02-21T09:48:18.0140440Z mov.b32 %r361, 0; 2026-02-21T09:48:18.0140504Z // begin inline asm 2026-02-21T09:48:18.0140562Z 2026-02-21T09:48:18.0140628Z { 2026-02-21T09:48:18.0140695Z .reg .pred complete; 2026-02-21T09:48:18.0140755Z waitLoop: 2026-02-21T09:48:18.0140885Z mbarrier.try_wait.parity.shared.b64 complete, [%r63], %r361; 2026-02-21T09:48:18.0140965Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.0141021Z } 2026-02-21T09:48:18.0141027Z 2026-02-21T09:48:18.0141091Z // end inline asm 2026-02-21T09:48:18.0141291Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0141364Z elect.sync %r84|%p12, -1; 
2026-02-21T09:48:18.0141427Z mov.b32 %r66, 134479888; 2026-02-21T09:48:18.0141490Z mov.pred %p11, 0; 2026-02-21T09:48:18.0141557Z // begin inline asm 2026-02-21T09:48:18.0141739Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd14, %rd23, %r66, %p11; 2026-02-21T09:48:18.0141804Z // end inline asm 2026-02-21T09:48:18.0141873Z cvt.u64.u32 %rd34, %r89; 2026-02-21T09:48:18.0141952Z or.b64 %rd16, %rd34, 4611686293439512576; 2026-02-21T09:48:18.0142014Z add.s32 %r90, %r29, 139296; 2026-02-21T09:48:18.0142080Z bfe.u32 %r91, %r90, 4, 14; 2026-02-21T09:48:18.0142141Z cvt.u64.u32 %rd35, %r91; 2026-02-21T09:48:18.0142214Z or.b64 %rd17, %rd35, 4611686293313683456; 2026-02-21T09:48:18.0142279Z mov.pred %p13, -1; 2026-02-21T09:48:18.0142344Z // begin inline asm 2026-02-21T09:48:18.0142491Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd16, %rd17, %r66, %p13; 2026-02-21T09:48:18.0142552Z // end inline asm 2026-02-21T09:48:18.0142619Z add.s32 %r92, %r29, 64; 2026-02-21T09:48:18.0142680Z bfe.u32 %r93, %r92, 4, 14; 2026-02-21T09:48:18.0142741Z cvt.u64.u32 %rd36, %r93; 2026-02-21T09:48:18.0142810Z or.b64 %rd18, %rd36, 4611686293439512576; 2026-02-21T09:48:18.0142875Z add.s32 %r94, %r29, 139328; 2026-02-21T09:48:18.0142937Z bfe.u32 %r95, %r94, 4, 14; 2026-02-21T09:48:18.0143004Z cvt.u64.u32 %rd37, %r95; 2026-02-21T09:48:18.0143080Z or.b64 %rd19, %rd37, 4611686293313683456; 2026-02-21T09:48:18.0143169Z // begin inline asm 2026-02-21T09:48:18.0143316Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd18, %rd19, %r66, %p13; 2026-02-21T09:48:18.0143386Z // end inline asm 2026-02-21T09:48:18.0143450Z add.s32 %r96, %r29, 96; 2026-02-21T09:48:18.0143511Z bfe.u32 %r97, %r96, 4, 14; 2026-02-21T09:48:18.0143576Z cvt.u64.u32 %rd38, %r97; 2026-02-21T09:48:18.0143663Z or.b64 %rd20, %rd38, 4611686293439512576; 2026-02-21T09:48:18.0143723Z add.s32 %r98, %r29, 139360; 2026-02-21T09:48:18.0143784Z bfe.u32 %r99, %r98, 4, 14; 2026-02-21T09:48:18.0143852Z cvt.u64.u32 %rd39, 
%r99; 2026-02-21T09:48:18.0143919Z or.b64 %rd21, %rd39, 4611686293313683456; 2026-02-21T09:48:18.0143979Z // begin inline asm 2026-02-21T09:48:18.0144115Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd20, %rd21, %r66, %p13; 2026-02-21T09:48:18.0144181Z // end inline asm 2026-02-21T09:48:18.0144245Z add.s32 %r100, %r29, 16384; 2026-02-21T09:48:18.0144312Z bfe.u32 %r101, %r100, 4, 14; 2026-02-21T09:48:18.0144383Z cvt.u64.u32 %rd40, %r101; 2026-02-21T09:48:18.0144450Z or.b64 %rd22, %rd40, 4611686293439512576; 2026-02-21T09:48:18.0144509Z // begin inline asm 2026-02-21T09:48:18.0144657Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd22, %rd23, %r66, %p11; 2026-02-21T09:48:18.0144743Z // end inline asm 2026-02-21T09:48:18.0144806Z add.s32 %r102, %r29, 16416; 2026-02-21T09:48:18.0144869Z bfe.u32 %r103, %r102, 4, 14; 2026-02-21T09:48:18.0144940Z cvt.u64.u32 %rd41, %r103; 2026-02-21T09:48:18.0145054Z or.b64 %rd24, %rd41, 4611686293439512576; 2026-02-21T09:48:18.0145113Z // begin inline asm 2026-02-21T09:48:18.0145258Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd24, %rd17, %r66, %p13; 2026-02-21T09:48:18.0145316Z // end inline asm 2026-02-21T09:48:18.0145377Z add.s32 %r104, %r29, 16448; 2026-02-21T09:48:18.0145499Z bfe.u32 %r105, %r104, 4, 14; 2026-02-21T09:48:18.0145563Z cvt.u64.u32 %rd42, %r105; 2026-02-21T09:48:18.0145636Z or.b64 %rd26, %rd42, 4611686293439512576; 2026-02-21T09:48:18.0145701Z // begin inline asm 2026-02-21T09:48:18.0145859Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd26, %rd19, %r66, %p13; 2026-02-21T09:48:18.0145921Z // end inline asm 2026-02-21T09:48:18.0145986Z add.s32 %r106, %r29, 16480; 2026-02-21T09:48:18.0146059Z bfe.u32 %r107, %r106, 4, 14; 2026-02-21T09:48:18.0146124Z cvt.u64.u32 %rd43, %r107; 2026-02-21T09:48:18.0146198Z or.b64 %rd28, %rd43, 4611686293439512576; 2026-02-21T09:48:18.0146260Z // begin inline asm 2026-02-21T09:48:18.0146411Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 
], %rd28, %rd21, %r66, %p13; 2026-02-21T09:48:18.0146471Z // end inline asm 2026-02-21T09:48:18.0146538Z add.s32 %r108, %r29, 147584; 2026-02-21T09:48:18.0146614Z cvt.u64.u32 %rd30, %r108; 2026-02-21T09:48:18.0146677Z // begin inline asm 2026-02-21T09:48:18.0146860Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd30]; 2026-02-21T09:48:18.0146932Z // end inline asm 2026-02-21T09:48:18.0147002Z add.s32 %r109, %r29, 147648; 2026-02-21T09:48:18.0147068Z cvt.u64.u32 %rd31, %r109; 2026-02-21T09:48:18.0147131Z // begin inline asm 2026-02-21T09:48:18.0147280Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd31]; 2026-02-21T09:48:18.0147344Z // end inline asm 2026-02-21T09:48:18.0147407Z mov.b32 %r363, 1; 2026-02-21T09:48:18.0147481Z mov.b32 %r362, %r361; 2026-02-21T09:48:18.0147592Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.0147698Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.0147897Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0147961Z shl.b32 %r128, %r363, 3; 2026-02-21T09:48:18.0148025Z add.s32 %r130, %r29, %r128; 2026-02-21T09:48:18.0148095Z add.s32 %r131, %r130, 147584; 2026-02-21T09:48:18.0148171Z add.s32 %r110, %r130, 147616; 2026-02-21T09:48:18.0148409Z .loc 1 54 31 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:54:31 2026-02-21T09:48:18.0148473Z shl.b32 %r132, %r363, 15; 2026-02-21T09:48:18.0148545Z add.s32 %r133, %r29, %r132; 2026-02-21T09:48:18.0148739Z .loc 1 55 44 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:55:44 2026-02-21T09:48:18.0148802Z shl.b32 %r134, %r363, 11; 2026-02-21T09:48:18.0148873Z add.s32 %r135, %r29, %r134; 2026-02-21T09:48:18.0148938Z add.s32 %r136, %r135, 139264; 2026-02-21T09:48:18.0149127Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0149196Z bar.warp.sync -1; 2026-02-21T09:48:18.0149268Z // begin inline asm 2026-02-21T09:48:18.0149324Z 
2026-02-21T09:48:18.0149379Z { 2026-02-21T09:48:18.0149454Z .reg .pred complete; 2026-02-21T09:48:18.0149516Z waitLoop: 2026-02-21T09:48:18.0149657Z mbarrier.try_wait.parity.shared.b64 complete, [%r110], %r362; 2026-02-21T09:48:18.0149729Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.0149793Z } 2026-02-21T09:48:18.0149797Z 2026-02-21T09:48:18.0149859Z // end inline asm 2026-02-21T09:48:18.0150057Z .loc 1 56 52 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:56:52 2026-02-21T09:48:18.0150140Z setp.eq.b32 %p47, %r361, 1920; 2026-02-21T09:48:18.0150211Z elect.sync %r137|%p30, -1; 2026-02-21T09:48:18.0150276Z bfe.u32 %r138, %r133, 4, 14; 2026-02-21T09:48:18.0150347Z cvt.u64.u32 %rd62, %r138; 2026-02-21T09:48:18.0150451Z or.b64 %rd44, %rd62, 4611686293439512576; 2026-02-21T09:48:18.0150517Z bfe.u32 %r139, %r136, 4, 14; 2026-02-21T09:48:18.0150580Z cvt.u64.u32 %rd63, %r139; 2026-02-21T09:48:18.0150662Z or.b64 %rd45, %rd63, 4611686293313683456; 2026-02-21T09:48:18.0150724Z // begin inline asm 2026-02-21T09:48:18.0150901Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd44, %rd45, %r66, %p13; 2026-02-21T09:48:18.0150972Z // end inline asm 2026-02-21T09:48:18.0151036Z add.s32 %r140, %r133, 32; 2026-02-21T09:48:18.0151100Z bfe.u32 %r141, %r140, 4, 14; 2026-02-21T09:48:18.0151164Z cvt.u64.u32 %rd64, %r141; 2026-02-21T09:48:18.0151244Z or.b64 %rd46, %rd64, 4611686293439512576; 2026-02-21T09:48:18.0151309Z add.s32 %r142, %r135, 139296; 2026-02-21T09:48:18.0151371Z bfe.u32 %r143, %r142, 4, 14; 2026-02-21T09:48:18.0151442Z cvt.u64.u32 %rd65, %r143; 2026-02-21T09:48:18.0151513Z or.b64 %rd47, %rd65, 4611686293313683456; 2026-02-21T09:48:18.0151577Z // begin inline asm 2026-02-21T09:48:18.0151729Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd46, %rd47, %r66, %p13; 2026-02-21T09:48:18.0151791Z // end inline asm 2026-02-21T09:48:18.0151853Z add.s32 %r144, %r133, 64; 2026-02-21T09:48:18.0151918Z bfe.u32 %r145, %r144, 4, 14; 
2026-02-21T09:48:18.0151991Z cvt.u64.u32 %rd66, %r145; 2026-02-21T09:48:18.0152087Z or.b64 %rd48, %rd66, 4611686293439512576; 2026-02-21T09:48:18.0152156Z add.s32 %r146, %r135, 139328; 2026-02-21T09:48:18.0152226Z bfe.u32 %r147, %r146, 4, 14; 2026-02-21T09:48:18.0152292Z cvt.u64.u32 %rd67, %r147; 2026-02-21T09:48:18.0152365Z or.b64 %rd49, %rd67, 4611686293313683456; 2026-02-21T09:48:18.0152428Z // begin inline asm 2026-02-21T09:48:18.0152575Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd48, %rd49, %r66, %p13; 2026-02-21T09:48:18.0152635Z // end inline asm 2026-02-21T09:48:18.0152698Z add.s32 %r148, %r133, 96; 2026-02-21T09:48:18.0152767Z bfe.u32 %r149, %r148, 4, 14; 2026-02-21T09:48:18.0152830Z cvt.u64.u32 %rd68, %r149; 2026-02-21T09:48:18.0152903Z or.b64 %rd50, %rd68, 4611686293439512576; 2026-02-21T09:48:18.0152973Z add.s32 %r150, %r135, 139360; 2026-02-21T09:48:18.0153036Z bfe.u32 %r151, %r150, 4, 14; 2026-02-21T09:48:18.0153099Z cvt.u64.u32 %rd69, %r151; 2026-02-21T09:48:18.0153169Z or.b64 %rd51, %rd69, 4611686293313683456; 2026-02-21T09:48:18.0153241Z // begin inline asm 2026-02-21T09:48:18.0153383Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd50, %rd51, %r66, %p13; 2026-02-21T09:48:18.0153471Z // end inline asm 2026-02-21T09:48:18.0153543Z add.s32 %r152, %r133, 16384; 2026-02-21T09:48:18.0153607Z bfe.u32 %r153, %r152, 4, 14; 2026-02-21T09:48:18.0153670Z cvt.u64.u32 %rd70, %r153; 2026-02-21T09:48:18.0153749Z or.b64 %rd52, %rd70, 4611686293439512576; 2026-02-21T09:48:18.0153812Z // begin inline asm 2026-02-21T09:48:18.0153959Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd52, %rd45, %r66, %p13; 2026-02-21T09:48:18.0154021Z // end inline asm 2026-02-21T09:48:18.0154093Z add.s32 %r154, %r133, 16416; 2026-02-21T09:48:18.0154159Z bfe.u32 %r155, %r154, 4, 14; 2026-02-21T09:48:18.0154224Z cvt.u64.u32 %rd71, %r155; 2026-02-21T09:48:18.0154303Z or.b64 %rd54, %rd71, 4611686293439512576; 2026-02-21T09:48:18.0154366Z // begin inline 
asm 2026-02-21T09:48:18.0154511Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd54, %rd47, %r66, %p13; 2026-02-21T09:48:18.0154574Z // end inline asm 2026-02-21T09:48:18.0154647Z add.s32 %r156, %r133, 16448; 2026-02-21T09:48:18.0154738Z bfe.u32 %r157, %r156, 4, 14; 2026-02-21T09:48:18.0154804Z cvt.u64.u32 %rd72, %r157; 2026-02-21T09:48:18.0154883Z or.b64 %rd56, %rd72, 4611686293439512576; 2026-02-21T09:48:18.0154945Z // begin inline asm 2026-02-21T09:48:18.0155089Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd56, %rd49, %r66, %p13; 2026-02-21T09:48:18.0155158Z // end inline asm 2026-02-21T09:48:18.0155222Z add.s32 %r158, %r133, 16480; 2026-02-21T09:48:18.0155287Z bfe.u32 %r159, %r158, 4, 14; 2026-02-21T09:48:18.0155396Z cvt.u64.u32 %rd73, %r159; 2026-02-21T09:48:18.0155480Z or.b64 %rd58, %rd73, 4611686293439512576; 2026-02-21T09:48:18.0155545Z // begin inline asm 2026-02-21T09:48:18.0155697Z @%p30 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 16 ], %rd58, %rd51, %r66, %p13; 2026-02-21T09:48:18.0155774Z // end inline asm 2026-02-21T09:48:18.0155844Z cvt.u64.u32 %rd60, %r131; 2026-02-21T09:48:18.0155947Z // begin inline asm 2026-02-21T09:48:18.0156099Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd60]; 2026-02-21T09:48:18.0156163Z // end inline asm 2026-02-21T09:48:18.0156235Z and.pred %p46, %p47, %p30; 2026-02-21T09:48:18.0156298Z // begin inline asm 2026-02-21T09:48:18.0156442Z @%p46 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd31]; 2026-02-21T09:48:18.0156503Z // end inline asm 2026-02-21T09:48:18.0156697Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0156771Z add.s32 %r161, %r363, 1; 2026-02-21T09:48:18.0156841Z setp.eq.b32 %p48, %r161, 4; 2026-02-21T09:48:18.0156911Z selp.b32 %r363, 0, %r161, %p48; 2026-02-21T09:48:18.0156988Z selp.b32 %r162, 1, 0, %p48; 2026-02-21T09:48:18.0157052Z xor.b32 %r362, %r362, %r162; 2026-02-21T09:48:18.0157250Z .loc 1 50 79 // 
cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0157355Z add.s32 %r361, %r361, 64; 2026-02-21T09:48:18.0157438Z setp.lt.u32 %p49, %r361, 1984; 2026-02-21T09:48:18.0157507Z @%p49 bra $L__BB0_6; 2026-02-21T09:48:18.0157600Z // %bb.7: // %.loopexit 2026-02-21T09:48:18.0157708Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0157773Z barrier.sync 1; 2026-02-21T09:48:18.0157861Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0157927Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.0158045Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0158238Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0158329Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0158448Z ld.shared.v2.b32 {%r47, %r51}, [global_smem+147464]; 2026-02-21T09:48:18.0158512Z barrier.sync 1; 2026-02-21T09:48:18.0158710Z .loc 1 21 67 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:21:67 2026-02-21T09:48:18.0158786Z mov.u32 %r33, %ctaid.x; 2026-02-21T09:48:18.0158893Z mov.u32 %r34, %ctaid.y; 2026-02-21T09:48:18.0158957Z mov.u32 %r35, %ctaid.z; 2026-02-21T09:48:18.0159023Z mov.u32 %r36, %nctaid.x; 2026-02-21T09:48:18.0159096Z mov.u32 %r37, %nctaid.y; 2026-02-21T09:48:18.0159168Z mad.lo.s32 %r38, %r35, %r37, %r34; 2026-02-21T09:48:18.0159241Z mad.lo.s32 %r39, %r38, %r36, %r33; 2026-02-21T09:48:18.0159316Z mul.lo.s32 %r40, %r39, 384; 2026-02-21T09:48:18.0159550Z .loc 1 22 68 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:22:68 2026-02-21T09:48:18.0159615Z add.s32 %r41, %r40, 128; 2026-02-21T09:48:18.0159689Z cvt.s64.s32 %rd8, %r41; 2026-02-21T09:48:18.0159755Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:48:18.0159825Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:48:18.0160021Z .loc 1 21 67 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:21:67 2026-02-21T09:48:18.0160098Z cvt.s64.s32 %rd10, %r40; 2026-02-21T09:48:18.0160165Z add.s64 %rd11, %rd7, %rd10; 
2026-02-21T09:48:18.0160237Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:48:18.0160309Z add.s32 %r13, %r1, -128; 2026-02-21T09:48:18.0160369Z mov.b32 %r365, 0; 2026-02-21T09:48:18.0160433Z mov.b32 %r364, -64; 2026-02-21T09:48:18.0160518Z mov.b32 %r366, %r365; 2026-02-21T09:48:18.0160635Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.0160738Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.0160958Z .loc 1 0 67 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0:67 2026-02-21T09:48:18.0161079Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:48:18.0161157Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:48:18.0161383Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0161481Z add.s32 %r364, %r364, 64; 2026-02-21T09:48:18.0161611Z shl.b32 %r53, %r366, 3; 2026-02-21T09:48:18.0161700Z add.s32 %r55, %r29, %r53; 2026-02-21T09:48:18.0161786Z add.s32 %r42, %r55, 147584; 2026-02-21T09:48:18.0162008Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0162096Z // begin inline asm 2026-02-21T09:48:18.0162167Z 2026-02-21T09:48:18.0162240Z { 2026-02-21T09:48:18.0162332Z .reg .pred complete; 2026-02-21T09:48:18.0162413Z waitLoop: 2026-02-21T09:48:18.0162577Z mbarrier.try_wait.parity.shared.b64 complete, [%r42], %r365; 2026-02-21T09:48:18.0162661Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.0162723Z } 2026-02-21T09:48:18.0162735Z 2026-02-21T09:48:18.0162809Z // end inline asm 2026-02-21T09:48:18.0163035Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0163111Z add.s32 %r48, %r55, 147616; 2026-02-21T09:48:18.0163376Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0163461Z bar.sync 3, 64; 2026-02-21T09:48:18.0163535Z // begin inline asm 2026-02-21T09:48:18.0163691Z @%p3 mbarrier.arrive.expect_tx.shared.b64 _, [%r48], 34816; 2026-02-21T09:48:18.0163774Z // end inline asm 
2026-02-21T09:48:18.0163982Z .loc 1 54 31 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:54:31 2026-02-21T09:48:18.0164057Z shl.b32 %r56, %r366, 15; 2026-02-21T09:48:18.0164155Z add.s32 %r45, %r29, %r56; 2026-02-21T09:48:18.0164375Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0164442Z bar.sync 3, 64; 2026-02-21T09:48:18.0164546Z elect.sync %r57|%p7, -1; 2026-02-21T09:48:18.0164646Z and.pred %p4, %p6, %p7; 2026-02-21T09:48:18.0164757Z // begin inline asm 2026-02-21T09:48:18.0165074Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r45], [%rd12, {%r364, %r47}], [%r48]; 2026-02-21T09:48:18.0165161Z // end inline asm 2026-02-21T09:48:18.0165409Z .loc 1 55 44 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:55:44 2026-02-21T09:48:18.0165537Z shl.b32 %r58, %r366, 11; 2026-02-21T09:48:18.0165635Z add.s32 %r59, %r29, %r58; 2026-02-21T09:48:18.0165750Z add.s32 %r49, %r59, 139264; 2026-02-21T09:48:18.0165956Z .loc 1 0 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:0 2026-02-21T09:48:18.0166022Z bar.sync 3, 64; 2026-02-21T09:48:18.0166104Z elect.sync %r60|%p8, -1; 2026-02-21T09:48:18.0166177Z and.pred %p5, %p6, %p8; 2026-02-21T09:48:18.0166248Z // begin inline asm 2026-02-21T09:48:18.0166545Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r49], [%rd13, {%r364, %r51}], [%r48]; 2026-02-21T09:48:18.0166612Z // end inline asm 2026-02-21T09:48:18.0166677Z add.s32 %r61, %r366, 1; 2026-02-21T09:48:18.0166745Z setp.eq.b32 %p9, %r61, 4; 2026-02-21T09:48:18.0166831Z selp.b32 %r366, 0, %r61, %p9; 2026-02-21T09:48:18.0166900Z selp.b32 %r62, 1, 0, %p9; 2026-02-21T09:48:18.0166970Z xor.b32 %r365, %r365, %r62; 2026-02-21T09:48:18.0167183Z .loc 1 50 79 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:50:79 2026-02-21T09:48:18.0167257Z setp.lt.u32 %p10, %r364, 1984; 2026-02-21T09:48:18.0167324Z @%p10 bra $L__BB0_9; 
2026-02-21T09:48:18.0167443Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0167509Z barrier.sync 1; 2026-02-21T09:48:18.0167597Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.0167662Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.0167840Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.0168048Z .loc 1 19 0 // cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py:19 2026-02-21T09:48:18.0168123Z barrier.sync 1; 2026-02-21T09:48:18.0168194Z barrier.sync 1; 2026-02-21T09:48:18.0168259Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.0168363Z $L__tmp1: 2026-02-21T09:48:18.0168429Z $L__func_end0: 2026-02-21T09:48:18.0168534Z // -- End function 2026-02-21T09:48:18.0168591Z } 2026-02-21T09:48:18.0168833Z .file 1 "/tmp/torchinductor_root/ks/cks4lefckfh7cssz7kmfb5xefvur3pk5qgscxxourupz4lglzseo.py" 2026-02-21T09:48:18.0168910Z .section .debug_abbrev 2026-02-21T09:48:18.0168967Z { 2026-02-21T09:48:18.0169068Z .b8 1 // Abbreviation Code 2026-02-21T09:48:18.0169175Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:48:18.0169266Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:48:18.0169359Z .b8 37 // DW_AT_producer 2026-02-21T09:48:18.0169442Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.0169537Z .b8 19 // DW_AT_language 2026-02-21T09:48:18.0169664Z .b8 5 // DW_FORM_data2 2026-02-21T09:48:18.0169751Z .b8 3 // DW_AT_name 2026-02-21T09:48:18.0169845Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.0169935Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:48:18.0170019Z .b8 6 // DW_FORM_data4 2026-02-21T09:48:18.0170112Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:48:18.0170195Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.0170275Z .b8 0 // EOM(1) 2026-02-21T09:48:18.0170353Z .b8 0 // EOM(2) 2026-02-21T09:48:18.0170437Z .b8 0 // EOM(3) 2026-02-21T09:48:18.0170496Z } 2026-02-21T09:48:18.0170562Z .section .debug_info 2026-02-21T09:48:18.0170626Z { 2026-02-21T09:48:18.0170719Z .b32 104 // Length of Unit 2026-02-21T09:48:18.0170818Z .b8 2 // DWARF version number 2026-02-21T09:48:18.0170876Z 
.b8 0 2026-02-21T09:48:18.0171016Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:48:18.0171148Z .b8 8 // Address Size (in bytes) 2026-02-21T09:48:18.0171258Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:48:18.0171355Z .b8 116 // DW_AT_producer 2026-02-21T09:48:18.0171415Z .b8 114 2026-02-21T09:48:18.0171475Z .b8 105 2026-02-21T09:48:18.0171540Z .b8 116 2026-02-21T09:48:18.0171596Z .b8 111 2026-02-21T09:48:18.0171654Z .b8 110 2026-02-21T09:48:18.0171712Z .b8 0 2026-02-21T09:48:18.0171803Z .b8 2 // DW_AT_language 2026-02-21T09:48:18.0171860Z .b8 0 2026-02-21T09:48:18.0171945Z .b8 99 // DW_AT_name 2026-02-21T09:48:18.0172011Z .b8 107 2026-02-21T09:48:18.0172066Z .b8 115 2026-02-21T09:48:18.0172123Z .b8 52 2026-02-21T09:48:18.0172181Z .b8 108 2026-02-21T09:48:18.0172246Z .b8 101 2026-02-21T09:48:18.0172302Z .b8 102 2026-02-21T09:48:18.0172361Z .b8 99 2026-02-21T09:48:18.0172427Z .b8 107 2026-02-21T09:48:18.0172484Z .b8 102 2026-02-21T09:48:18.0172541Z .b8 104 2026-02-21T09:48:18.0172596Z .b8 55 2026-02-21T09:48:18.0172659Z .b8 99 2026-02-21T09:48:18.0172714Z .b8 115 2026-02-21T09:48:18.0172771Z .b8 115 2026-02-21T09:48:18.0172827Z .b8 122 2026-02-21T09:48:18.0172890Z .b8 55 2026-02-21T09:48:18.0172945Z .b8 107 2026-02-21T09:48:18.0173002Z .b8 109 2026-02-21T09:48:18.0173064Z .b8 102 2026-02-21T09:48:18.0173121Z .b8 98 2026-02-21T09:48:18.0173177Z .b8 53 2026-02-21T09:48:18.0173262Z .b8 120 2026-02-21T09:48:18.0173324Z .b8 101 2026-02-21T09:48:18.0173382Z .b8 102 2026-02-21T09:48:18.0173438Z .b8 118 2026-02-21T09:48:18.0173502Z .b8 117 2026-02-21T09:48:18.0173558Z .b8 114 2026-02-21T09:48:18.0173615Z .b8 51 2026-02-21T09:48:18.0173672Z .b8 112 2026-02-21T09:48:18.0173737Z .b8 107 2026-02-21T09:48:18.0173793Z .b8 53 2026-02-21T09:48:18.0173851Z .b8 113 2026-02-21T09:48:18.0173951Z .b8 103 2026-02-21T09:48:18.0174009Z .b8 115 2026-02-21T09:48:18.0174069Z .b8 99 2026-02-21T09:48:18.0174127Z .b8 120 2026-02-21T09:48:18.0174195Z .b8 120 
2026-02-21T09:48:18.0174253Z .b8 111 2026-02-21T09:48:18.0174311Z .b8 117 2026-02-21T09:48:18.0174367Z .b8 114 2026-02-21T09:48:18.0174435Z .b8 117 2026-02-21T09:48:18.0174491Z .b8 112 2026-02-21T09:48:18.0174546Z .b8 122 2026-02-21T09:48:18.0174610Z .b8 52 2026-02-21T09:48:18.0174665Z .b8 108 2026-02-21T09:48:18.0174767Z .b8 103 2026-02-21T09:48:18.0174824Z .b8 108 2026-02-21T09:48:18.0174888Z .b8 122 2026-02-21T09:48:18.0174943Z .b8 115 2026-02-21T09:48:18.0175000Z .b8 101 2026-02-21T09:48:18.0175064Z .b8 111 2026-02-21T09:48:18.0175118Z .b8 46 2026-02-21T09:48:18.0175175Z .b8 112 2026-02-21T09:48:18.0175229Z .b8 121 2026-02-21T09:48:18.0175294Z .b8 0 2026-02-21T09:48:18.0175397Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:48:18.0175533Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:48:18.0175599Z .b8 116 2026-02-21T09:48:18.0175654Z .b8 109 2026-02-21T09:48:18.0175711Z .b8 112 2026-02-21T09:48:18.0175767Z .b8 47 2026-02-21T09:48:18.0175834Z .b8 116 2026-02-21T09:48:18.0175890Z .b8 111 2026-02-21T09:48:18.0175947Z .b8 114 2026-02-21T09:48:18.0176003Z .b8 99 2026-02-21T09:48:18.0176070Z .b8 104 2026-02-21T09:48:18.0176126Z .b8 105 2026-02-21T09:48:18.0176182Z .b8 110 2026-02-21T09:48:18.0176244Z .b8 100 2026-02-21T09:48:18.0176302Z .b8 117 2026-02-21T09:48:18.0176358Z .b8 99 2026-02-21T09:48:18.0176415Z .b8 116 2026-02-21T09:48:18.0176479Z .b8 111 2026-02-21T09:48:18.0176535Z .b8 114 2026-02-21T09:48:18.0176592Z .b8 95 2026-02-21T09:48:18.0176656Z .b8 114 2026-02-21T09:48:18.0176712Z .b8 111 2026-02-21T09:48:18.0176767Z .b8 111 2026-02-21T09:48:18.0176823Z .b8 116 2026-02-21T09:48:18.0176887Z .b8 47 2026-02-21T09:48:18.0176943Z .b8 107 2026-02-21T09:48:18.0176999Z .b8 115 2026-02-21T09:48:18.0177061Z .b8 0 2026-02-21T09:48:18.0177116Z } 2026-02-21T09:48:18.0177192Z .section .debug_macinfo { } 2026-02-21T09:48:18.0177197Z 2026-02-21T09:48:18.0177286Z ================================================================ 2026-02-21T09:48:18.0177450Z please share the 
reproducer above with Triton project. 2026-02-21T09:48:18.1134598Z [115s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:48:18.1134625Z 2026-02-21T09:48:18.1137619Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:48:18.1137665Z 2026-02-21T09:48:18.1139773Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:48:18.1139876Z `ptxas` stderr: 2026-02-21T09:48:18.1140281Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:18.1140397Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:18.1140405Z 2026-02-21T09:48:18.1140872Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpz_lnuuch.ptx -o /tmp/tmpz_lnuuch.ptx.o 2026-02-21T09:48:18.1140877Z 2026-02-21T09:48:18.1141025Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:48:18.1141340Z 2026-02-21T09:48:18.1141476Z ================================================================ 2026-02-21T09:48:18.1143077Z Internal Triton PTX codegen error 2026-02-21T09:48:18.1148744Z `ptxas` stderr: 2026-02-21T09:48:18.1151425Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:48:18.1151595Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:18.1151604Z 2026-02-21T09:48:18.1152112Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpz_lnuuch.ptx -o /tmp/tmpz_lnuuch.ptx.o 2026-02-21T09:48:18.1152119Z 2026-02-21T09:48:18.1152124Z 2026-02-21T09:48:18.1152200Z // 2026-02-21T09:48:18.1152296Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:48:18.1152357Z // 2026-02-21T09:48:18.1152362Z 2026-02-21T09:48:18.1152434Z .version 8.7 2026-02-21T09:48:18.1152509Z .target sm_100a 2026-02-21T09:48:18.1152576Z .address_size 64 2026-02-21T09:48:18.1152580Z 2026-02-21T09:48:18.1152742Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:48:18.1152852Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:48:18.1153027Z // @_helion_matmul 2026-02-21T09:48:18.1153113Z .visible .entry _helion_matmul( 2026-02-21T09:48:18.1153256Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:48:18.1153375Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:48:18.1153490Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:48:18.1153603Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:48:18.1153729Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:48:18.1153790Z ) 2026-02-21T09:48:18.1153855Z .reqntid 256 2026-02-21T09:48:18.1153932Z .maxnreg 32 2026-02-21T09:48:18.1153995Z { 2026-02-21T09:48:18.1154071Z .reg .pred %p<120>; 2026-02-21T09:48:18.1154141Z .reg .b16 %rs<15>; 2026-02-21T09:48:18.1154215Z .reg .b32 %r<342>; 2026-02-21T09:48:18.1154283Z .reg .b64 %rd<172>; 2026-02-21T09:48:18.1154561Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19:0 2026-02-21T09:48:18.1154639Z $L__func_begin0: 2026-02-21T09:48:18.1154910Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19:0 
2026-02-21T09:48:18.1154993Z 2026-02-21T09:48:18.1155059Z // %bb.0: 2026-02-21T09:48:18.1155175Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:48:18.1155239Z $L__tmp0: 2026-02-21T09:48:18.1155427Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19 2026-02-21T09:48:18.1155496Z mov.u32 %r1, %tid.x; 2026-02-21T09:48:18.1155574Z shr.u32 %r2, %r1, 5; 2026-02-21T09:48:18.1155658Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:48:18.1155735Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:48:18.1155807Z @%p1 bra $L__BB0_12; 2026-02-21T09:48:18.1156044Z bra.uni $L__BB0_1; 2026-02-21T09:48:18.1156104Z $L__BB0_12: 2026-02-21T09:48:18.1156298Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0:0 2026-02-21T09:48:18.1156395Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:48:18.1156484Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:48:18.1156574Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:48:18.1156775Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19 2026-02-21T09:48:18.1156869Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.1156943Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:48:18.1157022Z mov.b32 %r129, global_smem; 2026-02-21T09:48:18.1157087Z // begin inline asm 2026-02-21T09:48:18.1157280Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r129], 64; 2026-02-21T09:48:18.1157409Z // end inline asm 2026-02-21T09:48:18.1157472Z bar.sync 0, 128; 2026-02-21T09:48:18.1157552Z ld.shared.b32 %r334, [global_smem]; 2026-02-21T09:48:18.1157615Z bar.sync 0, 128; 2026-02-21T09:48:18.1157687Z // begin inline asm 2026-02-21T09:48:18.1157832Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:48:18.1157936Z // end inline asm 2026-02-21T09:48:18.1158139Z .loc 1 21 67 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:21:67 2026-02-21T09:48:18.1158209Z mov.u32 %r21, %ctaid.x; 2026-02-21T09:48:18.1158276Z mov.u32 
%r154, %ctaid.y; 2026-02-21T09:48:18.1158353Z mov.u32 %r155, %ctaid.z; 2026-02-21T09:48:18.1158420Z mov.u32 %r156, %nctaid.x; 2026-02-21T09:48:18.1158486Z mov.u32 %r157, %nctaid.y; 2026-02-21T09:48:18.1158562Z mad.lo.s32 %r158, %r155, %r157, %r154; 2026-02-21T09:48:18.1158647Z mad.lo.s32 %r159, %r158, %r156, %r21; 2026-02-21T09:48:18.1158716Z mul.lo.s32 %r160, %r159, 384; 2026-02-21T09:48:18.1158787Z cvt.s64.s32 %rd104, %r160; 2026-02-21T09:48:18.1158867Z add.s64 %rd65, %rd7, %rd104; 2026-02-21T09:48:18.1158934Z shl.b32 %r161, %r1, 2; 2026-02-21T09:48:18.1158998Z add.s32 %r130, %r129, %r161; 2026-02-21T09:48:18.1159059Z mov.b32 %r139, 0; 2026-02-21T09:48:18.1159130Z // begin inline asm 2026-02-21T09:48:18.1159252Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.1159316Z // end inline asm 2026-02-21T09:48:18.1159404Z bar.warp.sync -1; 2026-02-21T09:48:18.1159470Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:48:18.1159534Z cvt.u64.u32 %rd50, %r129; 2026-02-21T09:48:18.1159595Z // begin inline asm 2026-02-21T09:48:18.1159792Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd4; 2026-02-21T09:48:18.1159852Z // end inline asm 2026-02-21T09:48:18.1159913Z // begin inline asm 2026-02-21T09:48:18.1160081Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.1160142Z // end inline asm 2026-02-21T09:48:18.1160200Z mov.b32 %r132, 64; 2026-02-21T09:48:18.1160268Z // begin inline asm 2026-02-21T09:48:18.1160438Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.1160498Z // end inline asm 2026-02-21T09:48:18.1160557Z // begin inline asm 2026-02-21T09:48:18.1160733Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.1160793Z // end inline asm 2026-02-21T09:48:18.1160902Z mov.b32 %r134, 2048; 2026-02-21T09:48:18.1160970Z // begin inline asm 2026-02-21T09:48:18.1161156Z @%p37 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:18.1161216Z // end inline asm 2026-02-21T09:48:18.1161282Z // begin inline asm 2026-02-21T09:48:18.1161463Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:18.1161522Z // end inline asm 2026-02-21T09:48:18.1161583Z mov.b64 %rd58, 4096; 2026-02-21T09:48:18.1161652Z // begin inline asm 2026-02-21T09:48:18.1161845Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:18.1161905Z // end inline asm 2026-02-21T09:48:18.1161973Z mov.b32 %r136, 1; 2026-02-21T09:48:18.1162033Z // begin inline asm 2026-02-21T09:48:18.1162228Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.1162300Z // end inline asm 2026-02-21T09:48:18.1162359Z // begin inline asm 2026-02-21T09:48:18.1162548Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.1162615Z // end inline asm 2026-02-21T09:48:18.1162677Z // begin inline asm 2026-02-21T09:48:18.1162846Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.1162909Z // end inline asm 2026-02-21T09:48:18.1162982Z // begin inline asm 2026-02-21T09:48:18.1163168Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1163261Z // end inline asm 2026-02-21T09:48:18.1163333Z // begin inline asm 2026-02-21T09:48:18.1163503Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.1163586Z // end inline asm 2026-02-21T09:48:18.1163656Z // begin inline asm 2026-02-21T09:48:18.1163841Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1163902Z // end inline asm 2026-02-21T09:48:18.1163962Z // begin inline asm 
2026-02-21T09:48:18.1164266Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd65 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.1164327Z // end inline asm 2026-02-21T09:48:18.1164387Z // begin inline asm 2026-02-21T09:48:18.1164539Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd65 + 0 ], 0x80; 2026-02-21T09:48:18.1164617Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.1164772Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.1164840Z // end inline asm 2026-02-21T09:48:18.1164902Z bar.sync 0, 128; 2026-02-21T09:48:18.1165092Z .loc 1 22 68 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:22:68 2026-02-21T09:48:18.1165166Z add.s32 %r162, %r160, 128; 2026-02-21T09:48:18.1165263Z cvt.s64.s32 %rd105, %r162; 2026-02-21T09:48:18.1165332Z add.s64 %rd83, %rd7, %rd105; 2026-02-21T09:48:18.1165396Z bar.sync 0, 128; 2026-02-21T09:48:18.1165466Z // begin inline asm 2026-02-21T09:48:18.1165545Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.1165606Z // end inline asm 2026-02-21T09:48:18.1165681Z bar.warp.sync -1; 2026-02-21T09:48:18.1165743Z // begin inline asm 2026-02-21T09:48:18.1165924Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd5; 2026-02-21T09:48:18.1165986Z // end inline asm 2026-02-21T09:48:18.1166056Z // begin inline asm 2026-02-21T09:48:18.1166212Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.1166272Z // end inline asm 2026-02-21T09:48:18.1166341Z // begin inline asm 2026-02-21T09:48:18.1166507Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.1166567Z // end inline asm 2026-02-21T09:48:18.1166636Z // begin inline asm 2026-02-21T09:48:18.1166802Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.1166900Z // end inline asm 2026-02-21T09:48:18.1166961Z // begin inline asm 
2026-02-21T09:48:18.1167146Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:18.1167206Z // end inline asm 2026-02-21T09:48:18.1167269Z mov.b32 %r143, 12288; 2026-02-21T09:48:18.1167337Z // begin inline asm 2026-02-21T09:48:18.1167509Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r143; 2026-02-21T09:48:18.1167570Z // end inline asm 2026-02-21T09:48:18.1167638Z // begin inline asm 2026-02-21T09:48:18.1167822Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:18.1167882Z // end inline asm 2026-02-21T09:48:18.1167941Z // begin inline asm 2026-02-21T09:48:18.1168136Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.1168196Z // end inline asm 2026-02-21T09:48:18.1168259Z // begin inline asm 2026-02-21T09:48:18.1168450Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.1168507Z // end inline asm 2026-02-21T09:48:18.1168567Z // begin inline asm 2026-02-21T09:48:18.1168737Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.1168796Z // end inline asm 2026-02-21T09:48:18.1168857Z // begin inline asm 2026-02-21T09:48:18.1169043Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1169136Z // end inline asm 2026-02-21T09:48:18.1169197Z // begin inline asm 2026-02-21T09:48:18.1169370Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.1169437Z // end inline asm 2026-02-21T09:48:18.1169497Z // begin inline asm 2026-02-21T09:48:18.1169691Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1169763Z // end inline asm 2026-02-21T09:48:18.1169823Z // begin inline asm 2026-02-21T09:48:18.1170111Z 
@%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd83 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.1170178Z // end inline asm 2026-02-21T09:48:18.1170238Z // begin inline asm 2026-02-21T09:48:18.1170376Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd83 + 0 ], 0x80; 2026-02-21T09:48:18.1170459Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.1170539Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.1170599Z // end inline asm 2026-02-21T09:48:18.1170657Z bar.sync 0, 128; 2026-02-21T09:48:18.1170855Z .loc 1 24 73 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:24:73 2026-02-21T09:48:18.1170921Z add.s32 %r163, %r160, 256; 2026-02-21T09:48:18.1171014Z cvt.s64.s32 %rd106, %r163; 2026-02-21T09:48:18.1171093Z add.s64 %rd101, %rd7, %rd106; 2026-02-21T09:48:18.1171154Z bar.sync 0, 128; 2026-02-21T09:48:18.1171216Z // begin inline asm 2026-02-21T09:48:18.1171292Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.1171360Z // end inline asm 2026-02-21T09:48:18.1171423Z bar.warp.sync -1; 2026-02-21T09:48:18.1171483Z // begin inline asm 2026-02-21T09:48:18.1171666Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd6; 2026-02-21T09:48:18.1171725Z // end inline asm 2026-02-21T09:48:18.1171784Z // begin inline asm 2026-02-21T09:48:18.1171942Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.1172004Z // end inline asm 2026-02-21T09:48:18.1172064Z // begin inline asm 2026-02-21T09:48:18.1172227Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.1172296Z // end inline asm 2026-02-21T09:48:18.1172358Z // begin inline asm 2026-02-21T09:48:18.1172522Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.1172639Z // end inline asm 2026-02-21T09:48:18.1172702Z // begin inline asm 
2026-02-21T09:48:18.1172880Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r143; 2026-02-21T09:48:18.1172949Z // end inline asm 2026-02-21T09:48:18.1173011Z // begin inline asm 2026-02-21T09:48:18.1173187Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:18.1173248Z // end inline asm 2026-02-21T09:48:18.1173322Z mov.b64 %rd94, 24576; 2026-02-21T09:48:18.1173385Z // begin inline asm 2026-02-21T09:48:18.1173572Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd94; 2026-02-21T09:48:18.1173640Z // end inline asm 2026-02-21T09:48:18.1173701Z // begin inline asm 2026-02-21T09:48:18.1173895Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.1173964Z // end inline asm 2026-02-21T09:48:18.1174024Z // begin inline asm 2026-02-21T09:48:18.1174214Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.1174280Z // end inline asm 2026-02-21T09:48:18.1174339Z // begin inline asm 2026-02-21T09:48:18.1174505Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.1174565Z // end inline asm 2026-02-21T09:48:18.1174636Z // begin inline asm 2026-02-21T09:48:18.1174871Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1174964Z // end inline asm 2026-02-21T09:48:18.1175033Z // begin inline asm 2026-02-21T09:48:18.1175202Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.1175262Z // end inline asm 2026-02-21T09:48:18.1175331Z // begin inline asm 2026-02-21T09:48:18.1175531Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.1175593Z // end inline asm 2026-02-21T09:48:18.1175656Z // begin inline asm 2026-02-21T09:48:18.1175956Z 
@%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.1176016Z // end inline asm 2026-02-21T09:48:18.1176076Z // begin inline asm 2026-02-21T09:48:18.1176229Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T09:48:18.1176307Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.1176388Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.1176456Z // end inline asm 2026-02-21T09:48:18.1176518Z bar.sync 0, 128; 2026-02-21T09:48:18.1176594Z cvta.global.u64 %rd107, %rd101; 2026-02-21T09:48:18.1176781Z .loc 1 31 35 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:31:35 2026-02-21T09:48:18.1176854Z mul.lo.s32 %r341, %r21, 3; 2026-02-21T09:48:18.1177069Z .loc 1 32 37 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:32:37 2026-02-21T09:48:18.1177139Z add.s32 %r164, %r341, 3; 2026-02-21T09:48:18.1177334Z .loc 1 32 49 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:32:49 2026-02-21T09:48:18.1177400Z min.s32 %r23, %r164, 6144; 2026-02-21T09:48:18.1177588Z .loc 1 33 84 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:33:84 2026-02-21T09:48:18.1177668Z setp.ge.s32 %p90, %r341, %r23; 2026-02-21T09:48:18.1177732Z @%p90 bra $L__BB0_15; 2026-02-21T09:48:18.1177819Z // %bb.13: // %.lr.ph 2026-02-21T09:48:18.1178016Z .loc 1 0 84 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0:84 2026-02-21T09:48:18.1178079Z shl.b32 %r165, %r1, 7; 2026-02-21T09:48:18.1178145Z and.b32 %r166, %r165, 1920; 2026-02-21T09:48:18.1178207Z shl.b32 %r167, %r1, 6; 2026-02-21T09:48:18.1178281Z and.b32 %r168, %r167, 6144; 2026-02-21T09:48:18.1178343Z shl.b32 %r169, %r1, 4; 2026-02-21T09:48:18.1178404Z and.b32 %r170, %r169, 112; 2026-02-21T09:48:18.1178504Z and.b32 %r172, %r161, 64; 2026-02-21T09:48:18.1178566Z or.b32 %r173, %r168, %r170; 2026-02-21T09:48:18.1178631Z xor.b32 %r174, %r173, %r172; 2026-02-21T09:48:18.1178692Z 
or.b32 %r175, %r174, %r166; 2026-02-21T09:48:18.1178763Z add.s32 %r177, %r129, 65536; 2026-02-21T09:48:18.1178824Z add.s32 %r24, %r177, %r175; 2026-02-21T09:48:18.1178886Z xor.b32 %r178, %r175, 16; 2026-02-21T09:48:18.1178953Z add.s32 %r25, %r177, %r178; 2026-02-21T09:48:18.1179014Z xor.b32 %r179, %r175, 32; 2026-02-21T09:48:18.1179077Z add.s32 %r26, %r177, %r179; 2026-02-21T09:48:18.1179137Z xor.b32 %r180, %r175, 48; 2026-02-21T09:48:18.1179205Z add.s32 %r27, %r177, %r180; 2026-02-21T09:48:18.1179389Z .loc 1 33 84 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:33:84 2026-02-21T09:48:18.1179454Z cvt.u16.u32 %rs4, %r21; 2026-02-21T09:48:18.1179527Z mul.lo.s16 %rs14, %rs4, 3; 2026-02-21T09:48:18.1179645Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:48:18.1179830Z .loc 1 39 35 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:39:35 2026-02-21T09:48:18.1179908Z mul.hi.s32 %r276, %r341, 715827883; 2026-02-21T09:48:18.1179970Z shr.u32 %r277, %r276, 31; 2026-02-21T09:48:18.1180033Z shr.s32 %r278, %r276, 7; 2026-02-21T09:48:18.1180096Z add.s32 %r279, %r278, %r277; 2026-02-21T09:48:18.1180288Z .loc 1 43 51 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:43:51 2026-02-21T09:48:18.1180382Z cvt.u16.u32 %rs5, %r279; 2026-02-21T09:48:18.1180456Z mad.lo.s16 %rs6, %rs5, -768, %rs14; 2026-02-21T09:48:18.1180529Z shr.s16 %rs7, %rs6, 15; 2026-02-21T09:48:18.1180592Z shr.u16 %rs8, %rs7, 14; 2026-02-21T09:48:18.1180656Z add.s16 %rs9, %rs6, %rs8; 2026-02-21T09:48:18.1180725Z shr.s16 %rs10, %rs9, 2; 2026-02-21T09:48:18.1180937Z .loc 1 42 64 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:42:64 2026-02-21T09:48:18.1181006Z and.b16 %rs11, %rs9, -4; 2026-02-21T09:48:18.1181076Z mad.lo.s16 %rs12, %rs5, 768, %rs11; 2026-02-21T09:48:18.1181149Z sub.s16 %rs13, %rs14, %rs12; 2026-02-21T09:48:18.1181332Z .loc 1 44 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:44:27 2026-02-21T09:48:18.1181395Z shl.b32 %r280, %r279, 8; 
2026-02-21T09:48:18.1181475Z mad.wide.s16 %r274, %rs13, 64, %r280; 2026-02-21T09:48:18.1181657Z .loc 1 45 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:45:27 2026-02-21T09:48:18.1181729Z mul.wide.s16 %r273, %rs10, 64; 2026-02-21T09:48:18.1181917Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1181998Z shfl.sync.idx.b32 %r281, %r2, 0, 31, -1; 2026-02-21T09:48:18.1182061Z shl.b32 %r282, %r281, 21; 2026-02-21T09:48:18.1182152Z and.b32 %r283, %r282, 6291456; 2026-02-21T09:48:18.1182228Z add.s32 %r181, %r283, %r334; 2026-02-21T09:48:18.1182298Z mov.pred %p91, -1; 2026-02-21T09:48:18.1182357Z mov.b32 %r182, 0; 2026-02-21T09:48:18.1182429Z // begin inline asm 2026-02-21T09:48:18.1182773Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 0], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:18.1182835Z // end inline asm 2026-02-21T09:48:18.1182907Z // begin inline asm 2026-02-21T09:48:18.1183241Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 16], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:18.1183306Z // end inline asm 2026-02-21T09:48:18.1183366Z // begin inline asm 2026-02-21T09:48:18.1183453Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:48:18.1183514Z // end inline asm 2026-02-21T09:48:18.1183574Z bar.sync 0, 128; 2026-02-21T09:48:18.1183768Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1183862Z add.s32 %r215, %r129, 73856; 2026-02-21T09:48:18.1183922Z // begin inline asm 2026-02-21T09:48:18.1184025Z @%p37 mbarrier.init.shared::cta.b64 [%r215], 1; 2026-02-21T09:48:18.1184085Z // end inline asm 2026-02-21T09:48:18.1184145Z bar.sync 0, 128; 2026-02-21T09:48:18.1184208Z add.s32 %r216, %r129, 73864; 2026-02-21T09:48:18.1184277Z // begin inline asm 
2026-02-21T09:48:18.1184369Z @%p37 mbarrier.init.shared::cta.b64 [%r216], 1; 2026-02-21T09:48:18.1184429Z // end inline asm 2026-02-21T09:48:18.1184498Z bar.sync 0, 128; 2026-02-21T09:48:18.1184562Z add.s32 %r217, %r129, 73872; 2026-02-21T09:48:18.1184622Z // begin inline asm 2026-02-21T09:48:18.1184775Z @%p37 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:48:18.1184847Z // end inline asm 2026-02-21T09:48:18.1184908Z bar.sync 0, 128; 2026-02-21T09:48:18.1184973Z add.s32 %r218, %r129, 73880; 2026-02-21T09:48:18.1185042Z // begin inline asm 2026-02-21T09:48:18.1185128Z @%p37 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:48:18.1185190Z // end inline asm 2026-02-21T09:48:18.1185253Z add.s32 %r219, %r129, 73888; 2026-02-21T09:48:18.1185321Z // begin inline asm 2026-02-21T09:48:18.1185407Z @%p37 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T09:48:18.1185465Z // end inline asm 2026-02-21T09:48:18.1185531Z bar.sync 0, 128; 2026-02-21T09:48:18.1185593Z add.s32 %r220, %r129, 73896; 2026-02-21T09:48:18.1185654Z // begin inline asm 2026-02-21T09:48:18.1185745Z @%p37 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T09:48:18.1185844Z // end inline asm 2026-02-21T09:48:18.1185903Z bar.sync 0, 128; 2026-02-21T09:48:18.1185965Z add.s32 %r221, %r129, 73904; 2026-02-21T09:48:18.1186036Z // begin inline asm 2026-02-21T09:48:18.1186122Z @%p37 mbarrier.init.shared::cta.b64 [%r221], 1; 2026-02-21T09:48:18.1186182Z // end inline asm 2026-02-21T09:48:18.1186251Z bar.sync 0, 128; 2026-02-21T09:48:18.1186340Z add.s32 %r222, %r129, 73912; 2026-02-21T09:48:18.1186402Z // begin inline asm 2026-02-21T09:48:18.1186491Z @%p37 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T09:48:18.1186559Z // end inline asm 2026-02-21T09:48:18.1186742Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1186803Z bar.sync 0, 128; 2026-02-21T09:48:18.1186869Z // begin inline asm 2026-02-21T09:48:18.1186966Z @%p37 
mbarrier.arrive.shared::cta.b64 _, [%r215]; 2026-02-21T09:48:18.1187024Z // end inline asm 2026-02-21T09:48:18.1187090Z bar.sync 0, 128; 2026-02-21T09:48:18.1187152Z // begin inline asm 2026-02-21T09:48:18.1187245Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r216]; 2026-02-21T09:48:18.1187304Z // end inline asm 2026-02-21T09:48:18.1187373Z bar.sync 0, 128; 2026-02-21T09:48:18.1187433Z // begin inline asm 2026-02-21T09:48:18.1187527Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r217]; 2026-02-21T09:48:18.1187627Z // end inline asm 2026-02-21T09:48:18.1187689Z bar.sync 0, 128; 2026-02-21T09:48:18.1187748Z // begin inline asm 2026-02-21T09:48:18.1187839Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r218]; 2026-02-21T09:48:18.1187905Z // end inline asm 2026-02-21T09:48:18.1188083Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1188143Z bar.sync 0, 128; 2026-02-21T09:48:18.1188213Z add.s32 %r227, %r129, 73920; 2026-02-21T09:48:18.1188273Z // begin inline asm 2026-02-21T09:48:18.1188359Z @%p37 mbarrier.init.shared::cta.b64 [%r227], 1; 2026-02-21T09:48:18.1188425Z // end inline asm 2026-02-21T09:48:18.1190682Z st.shared.b32 [global_smem+73928], 33554689; 2026-02-21T09:48:18.1190766Z st.shared.b32 [global_smem+73728], %r334; 2026-02-21T09:48:18.1190875Z st.shared.v2.b32 [global_smem+73736], {%r274, %r273}; 2026-02-21T09:48:18.1190947Z barrier.sync 1; 2026-02-21T09:48:18.1191037Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.1191103Z barrier.sync 1; 2026-02-21T09:48:18.1191165Z barrier.sync 1; 2026-02-21T09:48:18.1191307Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.1191489Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1191551Z bar.sync 0, 128; 2026-02-21T09:48:18.1191620Z // begin inline asm 2026-02-21T09:48:18.1191678Z 2026-02-21T09:48:18.1191732Z { 2026-02-21T09:48:18.1191800Z .reg .pred complete; 2026-02-21T09:48:18.1191869Z waitLoop: 
2026-02-21T09:48:18.1192004Z mbarrier.try_wait.parity.shared.b64 complete, [%r227], %r182; 2026-02-21T09:48:18.1192076Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.1192159Z } 2026-02-21T09:48:18.1192164Z 2026-02-21T09:48:18.1192226Z // end inline asm 2026-02-21T09:48:18.1192419Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1192481Z bar.sync 0, 128; 2026-02-21T09:48:18.1192543Z // begin inline asm 2026-02-21T09:48:18.1192646Z @%p37 mbarrier.inval.shared::cta.b64 [%r227]; 2026-02-21T09:48:18.1192704Z // end inline asm 2026-02-21T09:48:18.1192765Z // begin inline asm 2026-02-21T09:48:18.1192860Z @%p37 mbarrier.inval.shared::cta.b64 [%r219]; 2026-02-21T09:48:18.1192919Z // end inline asm 2026-02-21T09:48:18.1192978Z bar.sync 0, 128; 2026-02-21T09:48:18.1193039Z // begin inline asm 2026-02-21T09:48:18.1193130Z @%p37 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T09:48:18.1193189Z // end inline asm 2026-02-21T09:48:18.1193248Z bar.sync 0, 128; 2026-02-21T09:48:18.1193312Z // begin inline asm 2026-02-21T09:48:18.1193396Z @%p37 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T09:48:18.1193458Z // end inline asm 2026-02-21T09:48:18.1193517Z bar.sync 0, 128; 2026-02-21T09:48:18.1193585Z // begin inline asm 2026-02-21T09:48:18.1193670Z @%p37 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T09:48:18.1193728Z // end inline asm 2026-02-21T09:48:18.1193798Z // begin inline asm 2026-02-21T09:48:18.1193910Z @%p37 mbarrier.inval.shared::cta.b64 [%r215]; 2026-02-21T09:48:18.1193971Z // end inline asm 2026-02-21T09:48:18.1194039Z bar.sync 0, 128; 2026-02-21T09:48:18.1194098Z // begin inline asm 2026-02-21T09:48:18.1194182Z @%p37 mbarrier.inval.shared::cta.b64 [%r216]; 2026-02-21T09:48:18.1194240Z // end inline asm 2026-02-21T09:48:18.1194307Z bar.sync 0, 128; 2026-02-21T09:48:18.1194368Z // begin inline asm 2026-02-21T09:48:18.1194451Z @%p37 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:48:18.1194518Z // end 
inline asm 2026-02-21T09:48:18.1194577Z bar.sync 0, 128; 2026-02-21T09:48:18.1194637Z // begin inline asm 2026-02-21T09:48:18.1194779Z @%p37 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:48:18.1194852Z // end inline asm 2026-02-21T09:48:18.1195036Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1195101Z // begin inline asm 2026-02-21T09:48:18.1195486Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254}, [%r181 + 0], 32; 2026-02-21T09:48:18.1195556Z // end inline asm 2026-02-21T09:48:18.1195618Z // begin inline asm 2026-02-21T09:48:18.1195939Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271}, [%r181 + 16], 32; 2026-02-21T09:48:18.1196001Z // end inline asm 2026-02-21T09:48:18.1196065Z // begin inline asm 2026-02-21T09:48:18.1196147Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:48:18.1196219Z // end inline asm 2026-02-21T09:48:18.1196289Z cvt.u64.u32 %rd108, %r239; 2026-02-21T09:48:18.1196450Z cvt.u64.u32 %rd109, %r240; 2026-02-21T09:48:18.1196527Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:48:18.1196596Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:48:18.1196788Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1196867Z mov.b64 {%r285, %r286}, %rd111; 2026-02-21T09:48:18.1196944Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 2026-02-21T09:48:18.1197165Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1197231Z cvt.u64.u32 %rd112, %r241; 2026-02-21T09:48:18.1197308Z cvt.u64.u32 %rd113, %r242; 2026-02-21T09:48:18.1197375Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:48:18.1197442Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:48:18.1197637Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 
2026-02-21T09:48:18.1197708Z mov.b64 {%r288, %r289}, %rd115; 2026-02-21T09:48:18.1197783Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:48:18.1197977Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1198041Z cvt.u64.u32 %rd116, %r243; 2026-02-21T09:48:18.1198105Z cvt.u64.u32 %rd117, %r244; 2026-02-21T09:48:18.1198174Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:48:18.1198250Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:48:18.1198436Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1198502Z mov.b64 {%r291, %r292}, %rd119; 2026-02-21T09:48:18.1198584Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:48:18.1198766Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1198831Z cvt.u64.u32 %rd120, %r245; 2026-02-21T09:48:18.1198902Z cvt.u64.u32 %rd121, %r246; 2026-02-21T09:48:18.1198966Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:48:18.1199031Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:48:18.1199214Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1199288Z mov.b64 {%r294, %r295}, %rd123; 2026-02-21T09:48:18.1199358Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:48:18.1199586Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1199662Z cvt.u64.u32 %rd124, %r247; 2026-02-21T09:48:18.1199726Z cvt.u64.u32 %rd125, %r248; 2026-02-21T09:48:18.1199790Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:48:18.1199862Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:48:18.1200044Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1200110Z mov.b64 {%r297, %r298}, %rd127; 2026-02-21T09:48:18.1200178Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 2026-02-21T09:48:18.1200369Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1200434Z cvt.u64.u32 %rd128, 
%r249; 2026-02-21T09:48:18.1200498Z cvt.u64.u32 %rd129, %r250; 2026-02-21T09:48:18.1200569Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:48:18.1200633Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:48:18.1200849Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1200923Z mov.b64 {%r300, %r301}, %rd131; 2026-02-21T09:48:18.1200990Z cvt.rn.f16x2.f32 %r302, %r301, %r300; 2026-02-21T09:48:18.1201170Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1201234Z cvt.u64.u32 %rd132, %r251; 2026-02-21T09:48:18.1201303Z cvt.u64.u32 %rd133, %r252; 2026-02-21T09:48:18.1201367Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:48:18.1201433Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:48:18.1201623Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1201732Z mov.b64 {%r303, %r304}, %rd135; 2026-02-21T09:48:18.1201802Z cvt.rn.f16x2.f32 %r305, %r304, %r303; 2026-02-21T09:48:18.1201994Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1202058Z cvt.u64.u32 %rd136, %r253; 2026-02-21T09:48:18.1202122Z cvt.u64.u32 %rd137, %r254; 2026-02-21T09:48:18.1202185Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:48:18.1202285Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:48:18.1202465Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1202531Z mov.b64 {%r306, %r307}, %rd139; 2026-02-21T09:48:18.1202608Z cvt.rn.f16x2.f32 %r308, %r307, %r306; 2026-02-21T09:48:18.1202793Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1202882Z cvt.u64.u32 %rd140, %r256; 2026-02-21T09:48:18.1202952Z cvt.u64.u32 %rd141, %r257; 2026-02-21T09:48:18.1203025Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:48:18.1203093Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:48:18.1203291Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 
2026-02-21T09:48:18.1203361Z mov.b64 {%r309, %r310}, %rd143; 2026-02-21T09:48:18.1203435Z cvt.rn.f16x2.f32 %r311, %r310, %r309; 2026-02-21T09:48:18.1203626Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1203693Z cvt.u64.u32 %rd144, %r258; 2026-02-21T09:48:18.1203757Z cvt.u64.u32 %rd145, %r259; 2026-02-21T09:48:18.1203827Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:48:18.1203892Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:48:18.1204074Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1204140Z mov.b64 {%r312, %r313}, %rd147; 2026-02-21T09:48:18.1204217Z cvt.rn.f16x2.f32 %r314, %r313, %r312; 2026-02-21T09:48:18.1204404Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1204468Z cvt.u64.u32 %rd148, %r260; 2026-02-21T09:48:18.1204540Z cvt.u64.u32 %rd149, %r261; 2026-02-21T09:48:18.1204604Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:48:18.1204786Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:48:18.1204981Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1205047Z mov.b64 {%r315, %r316}, %rd151; 2026-02-21T09:48:18.1205115Z cvt.rn.f16x2.f32 %r317, %r316, %r315; 2026-02-21T09:48:18.1205296Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1205367Z cvt.u64.u32 %rd152, %r262; 2026-02-21T09:48:18.1205429Z cvt.u64.u32 %rd153, %r263; 2026-02-21T09:48:18.1205492Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:48:18.1205564Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:48:18.1205754Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1205821Z mov.b64 {%r318, %r319}, %rd155; 2026-02-21T09:48:18.1205899Z cvt.rn.f16x2.f32 %r320, %r319, %r318; 2026-02-21T09:48:18.1206126Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1206193Z cvt.u64.u32 %rd156, 
%r264; 2026-02-21T09:48:18.1206259Z cvt.u64.u32 %rd157, %r265; 2026-02-21T09:48:18.1206332Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:48:18.1206398Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:48:18.1206583Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1206658Z mov.b64 {%r321, %r322}, %rd159; 2026-02-21T09:48:18.1206728Z cvt.rn.f16x2.f32 %r323, %r322, %r321; 2026-02-21T09:48:18.1206912Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1207016Z cvt.u64.u32 %rd160, %r266; 2026-02-21T09:48:18.1207080Z cvt.u64.u32 %rd161, %r267; 2026-02-21T09:48:18.1207145Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:48:18.1207209Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:48:18.1207403Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1207467Z mov.b64 {%r324, %r325}, %rd163; 2026-02-21T09:48:18.1207568Z cvt.rn.f16x2.f32 %r326, %r325, %r324; 2026-02-21T09:48:18.1207755Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1207818Z cvt.u64.u32 %rd164, %r268; 2026-02-21T09:48:18.1207881Z cvt.u64.u32 %rd165, %r269; 2026-02-21T09:48:18.1207950Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:48:18.1208014Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:48:18.1208198Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 2026-02-21T09:48:18.1208262Z mov.b64 {%r327, %r328}, %rd167; 2026-02-21T09:48:18.1208339Z cvt.rn.f16x2.f32 %r329, %r328, %r327; 2026-02-21T09:48:18.1208520Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1208584Z cvt.u64.u32 %rd168, %r270; 2026-02-21T09:48:18.1208655Z cvt.u64.u32 %rd169, %r271; 2026-02-21T09:48:18.1208722Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:48:18.1208789Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:48:18.1208977Z .loc 1 58 27 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:58:27 
2026-02-21T09:48:18.1209040Z mov.b64 {%r330, %r331}, %rd171; 2026-02-21T09:48:18.1209109Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T09:48:18.1209294Z .loc 1 59 45 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:59:45 2026-02-21T09:48:18.1209379Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.1209440Z bar.sync 0, 128; 2026-02-21T09:48:18.1209545Z st.shared.v4.b32 [%r24], {%r287, %r290, %r293, %r296}; 2026-02-21T09:48:18.1209653Z st.shared.v4.b32 [%r25], {%r299, %r302, %r305, %r308}; 2026-02-21T09:48:18.1209748Z st.shared.v4.b32 [%r26], {%r311, %r314, %r317, %r320}; 2026-02-21T09:48:18.1209843Z st.shared.v4.b32 [%r27], {%r323, %r326, %r329, %r332}; 2026-02-21T09:48:18.1209915Z // begin inline asm 2026-02-21T09:48:18.1210034Z fence.proxy.async.shared::cta; 2026-02-21T09:48:18.1210101Z // end inline asm 2026-02-21T09:48:18.1210161Z bar.sync 0, 128; 2026-02-21T09:48:18.1210242Z elect.sync %r333|%p117, -1; 2026-02-21T09:48:18.1210313Z and.pred %p115, %p34, %p117; 2026-02-21T09:48:18.1210375Z // begin inline asm 2026-02-21T09:48:18.1210589Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd107, {%r273, %r274}], [%r177]; 2026-02-21T09:48:18.1210650Z // end inline asm 2026-02-21T09:48:18.1210722Z cp.async.bulk.commit_group; 2026-02-21T09:48:18.1210909Z .loc 1 33 84 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:33:84 2026-02-21T09:48:18.1210976Z add.s32 %r341, %r341, 1; 2026-02-21T09:48:18.1211041Z add.s16 %rs14, %rs14, 1; 2026-02-21T09:48:18.1211112Z setp.ne.b32 %p118, %r23, %r341; 2026-02-21T09:48:18.1211188Z @%p118 bra $L__BB0_14; 2026-02-21T09:48:18.1211312Z $L__BB0_15: // %._crit_edge 2026-02-21T09:48:18.1211395Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.1211463Z bar.sync 0, 128; 2026-02-21T09:48:18.1211649Z .loc 1 33 4 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:33:4 2026-02-21T09:48:18.1211712Z bar.sync 0, 128; 2026-02-21T09:48:18.1211772Z // begin inline asm 2026-02-21T09:48:18.1211912Z 
@%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r334, 64; 2026-02-21T09:48:18.1211975Z // end inline asm 2026-02-21T09:48:18.1212062Z st.shared.b32 [global_smem+73928], 50529027; 2026-02-21T09:48:18.1212139Z barrier.sync 1; 2026-02-21T09:48:18.1212230Z $L__BB0_16: // %common.ret 2026-02-21T09:48:18.1212440Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1212511Z ret; 2026-02-21T09:48:18.1212620Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:48:18.1212688Z mov.b32 %r31, global_smem; 2026-02-21T09:48:18.1212758Z add.s32 %r32, %r31, %r3; 2026-02-21T09:48:18.1212830Z add.s32 %r63, %r31, 73888; 2026-02-21T09:48:18.1212926Z bfe.u32 %r77, %r31, 4, 14; 2026-02-21T09:48:18.1212990Z cvt.u64.u32 %rd24, %r77; 2026-02-21T09:48:18.1213074Z or.b64 %rd14, %rd24, 4611686293338849280; 2026-02-21T09:48:18.1213136Z add.s32 %r78, %r31, 32768; 2026-02-21T09:48:18.1213199Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T09:48:18.1213271Z cvt.u64.u32 %rd25, %r79; 2026-02-21T09:48:18.1213344Z or.b64 %rd15, %rd25, 4611686293338849280; 2026-02-21T09:48:18.1213407Z add.s32 %r80, %r31, 32; 2026-02-21T09:48:18.1213470Z bfe.u32 %r81, %r80, 4, 14; 2026-02-21T09:48:18.1213541Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.1213656Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1213844Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1213938Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1214001Z barrier.sync 1; 2026-02-21T09:48:18.1214064Z barrier.sync 1; 2026-02-21T09:48:18.1214149Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1214247Z $L__BB0_2: // %.preheader 2026-02-21T09:48:18.1214352Z // =>This Loop Header: Depth=1 2026-02-21T09:48:18.1214450Z // Child Loop BB0_9 Depth 2 2026-02-21T09:48:18.1214554Z // Child Loop BB0_6 Depth 2 2026-02-21T09:48:18.1214784Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19 2026-02-21T09:48:18.1214871Z 
setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:48:18.1214943Z barrier.sync 1; 2026-02-21T09:48:18.1215017Z ld.shared.b8 %r30, [%r32+73924]; 2026-02-21T09:48:18.1215087Z setp.gt.u32 %p2, %r30, 3; 2026-02-21T09:48:18.1215151Z @%p2 bra $L__BB0_4; 2026-02-21T09:48:18.1215247Z // %bb.3: // %.preheader 2026-02-21T09:48:18.1215395Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1215467Z $L_brx_0: .branchtargets 2026-02-21T09:48:18.1215536Z $L__BB0_5, 2026-02-21T09:48:18.1215595Z $L__BB0_8, 2026-02-21T09:48:18.1215654Z $L__BB0_11, 2026-02-21T09:48:18.1215711Z $L__BB0_16; 2026-02-21T09:48:18.1215786Z brx.idx %r30, $L_brx_0; 2026-02-21T09:48:18.1215870Z $L__BB0_5: // %.peel.next 2026-02-21T09:48:18.1215968Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1216170Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1216256Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1216341Z ld.shared.b32 %r65, [global_smem+73728]; 2026-02-21T09:48:18.1216412Z barrier.sync 1; 2026-02-21T09:48:18.1216593Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1216695Z bar.warp.sync -1; 2026-02-21T09:48:18.1216759Z mov.b32 %r335, 0; 2026-02-21T09:48:18.1216829Z // begin inline asm 2026-02-21T09:48:18.1216886Z 2026-02-21T09:48:18.1216942Z { 2026-02-21T09:48:18.1217016Z .reg .pred complete; 2026-02-21T09:48:18.1217078Z waitLoop: 2026-02-21T09:48:18.1217209Z mbarrier.try_wait.parity.shared.b64 complete, [%r63], %r335; 2026-02-21T09:48:18.1217281Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.1217344Z } 2026-02-21T09:48:18.1217348Z 2026-02-21T09:48:18.1217410Z // end inline asm 2026-02-21T09:48:18.1217594Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1217703Z elect.sync %r76|%p12, -1; 2026-02-21T09:48:18.1217767Z mov.b32 %r66, 68157456; 2026-02-21T09:48:18.1217831Z mov.pred %p11, 0; 2026-02-21T09:48:18.1217900Z // begin inline asm 
2026-02-21T09:48:18.1218055Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd14, %rd15, %r66, %p11; 2026-02-21T09:48:18.1218117Z // end inline asm 2026-02-21T09:48:18.1218184Z cvt.u64.u32 %rd26, %r81; 2026-02-21T09:48:18.1218268Z or.b64 %rd16, %rd26, 4611686293338849280; 2026-02-21T09:48:18.1218364Z add.s32 %r82, %r31, 32800; 2026-02-21T09:48:18.1218429Z bfe.u32 %r83, %r82, 4, 14; 2026-02-21T09:48:18.1218502Z cvt.u64.u32 %rd27, %r83; 2026-02-21T09:48:18.1218575Z or.b64 %rd17, %rd27, 4611686293338849280; 2026-02-21T09:48:18.1218641Z mov.pred %p13, -1; 2026-02-21T09:48:18.1218710Z // begin inline asm 2026-02-21T09:48:18.1218861Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd16, %rd17, %r66, %p13; 2026-02-21T09:48:18.1218923Z // end inline asm 2026-02-21T09:48:18.1218985Z add.s32 %r84, %r31, 64; 2026-02-21T09:48:18.1219058Z bfe.u32 %r85, %r84, 4, 14; 2026-02-21T09:48:18.1219122Z cvt.u64.u32 %rd28, %r85; 2026-02-21T09:48:18.1219193Z or.b64 %rd18, %rd28, 4611686293338849280; 2026-02-21T09:48:18.1219262Z add.s32 %r86, %r31, 32832; 2026-02-21T09:48:18.1219332Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:48:18.1219409Z cvt.u64.u32 %rd29, %r87; 2026-02-21T09:48:18.1219493Z or.b64 %rd19, %rd29, 4611686293338849280; 2026-02-21T09:48:18.1219578Z // begin inline asm 2026-02-21T09:48:18.1219813Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd18, %rd19, %r66, %p13; 2026-02-21T09:48:18.1219885Z // end inline asm 2026-02-21T09:48:18.1219966Z add.s32 %r88, %r31, 96; 2026-02-21T09:48:18.1220038Z bfe.u32 %r89, %r88, 4, 14; 2026-02-21T09:48:18.1220112Z cvt.u64.u32 %rd30, %r89; 2026-02-21T09:48:18.1220202Z or.b64 %rd20, %rd30, 4611686293338849280; 2026-02-21T09:48:18.1220278Z add.s32 %r90, %r31, 32864; 2026-02-21T09:48:18.1220339Z bfe.u32 %r91, %r90, 4, 14; 2026-02-21T09:48:18.1220403Z cvt.u64.u32 %rd31, %r91; 2026-02-21T09:48:18.1220481Z or.b64 %rd21, %rd31, 4611686293338849280; 2026-02-21T09:48:18.1220542Z // begin inline asm 2026-02-21T09:48:18.1220684Z 
@%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd20, %rd21, %r66, %p13; 2026-02-21T09:48:18.1220756Z // end inline asm 2026-02-21T09:48:18.1220857Z add.s32 %r92, %r31, 73856; 2026-02-21T09:48:18.1220925Z cvt.u64.u32 %rd22, %r92; 2026-02-21T09:48:18.1220990Z // begin inline asm 2026-02-21T09:48:18.1221143Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:48:18.1221204Z // end inline asm 2026-02-21T09:48:18.1221266Z add.s32 %r93, %r31, 73920; 2026-02-21T09:48:18.1221339Z cvt.u64.u32 %rd23, %r93; 2026-02-21T09:48:18.1221401Z // begin inline asm 2026-02-21T09:48:18.1221539Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:18.1221608Z // end inline asm 2026-02-21T09:48:18.1221667Z mov.b32 %r337, 1; 2026-02-21T09:48:18.1221731Z mov.b32 %r336, %r335; 2026-02-21T09:48:18.1221840Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.1221949Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.1222168Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1222235Z shl.b32 %r104, %r337, 3; 2026-02-21T09:48:18.1222311Z add.s32 %r106, %r31, %r104; 2026-02-21T09:48:18.1222381Z add.s32 %r107, %r106, 73856; 2026-02-21T09:48:18.1222456Z add.s32 %r94, %r106, 73888; 2026-02-21T09:48:18.1222648Z .loc 1 54 31 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:54:31 2026-02-21T09:48:18.1222714Z shl.b32 %r108, %r337, 13; 2026-02-21T09:48:18.1222777Z add.s32 %r109, %r31, %r108; 2026-02-21T09:48:18.1222964Z .loc 1 55 44 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:55:44 2026-02-21T09:48:18.1223038Z add.s32 %r110, %r109, 32768; 2026-02-21T09:48:18.1223247Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1223316Z bar.warp.sync -1; 2026-02-21T09:48:18.1223389Z // begin inline asm 2026-02-21T09:48:18.1223445Z 2026-02-21T09:48:18.1223500Z { 2026-02-21T09:48:18.1223575Z .reg .pred complete; 
2026-02-21T09:48:18.1223634Z waitLoop: 2026-02-21T09:48:18.1223765Z mbarrier.try_wait.parity.shared.b64 complete, [%r94], %r336; 2026-02-21T09:48:18.1223864Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.1223930Z } 2026-02-21T09:48:18.1223934Z 2026-02-21T09:48:18.1223998Z // end inline asm 2026-02-21T09:48:18.1224185Z .loc 1 56 52 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:56:52 2026-02-21T09:48:18.1224264Z setp.eq.b32 %p31, %r335, 1920; 2026-02-21T09:48:18.1224333Z elect.sync %r111|%p22, -1; 2026-02-21T09:48:18.1224398Z bfe.u32 %r112, %r109, 4, 14; 2026-02-21T09:48:18.1224461Z cvt.u64.u32 %rd42, %r112; 2026-02-21T09:48:18.1224546Z or.b64 %rd32, %rd42, 4611686293338849280; 2026-02-21T09:48:18.1224612Z bfe.u32 %r113, %r110, 4, 14; 2026-02-21T09:48:18.1224707Z cvt.u64.u32 %rd43, %r113; 2026-02-21T09:48:18.1224788Z or.b64 %rd33, %rd43, 4611686293338849280; 2026-02-21T09:48:18.1224852Z // begin inline asm 2026-02-21T09:48:18.1224998Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd32, %rd33, %r66, %p13; 2026-02-21T09:48:18.1225067Z // end inline asm 2026-02-21T09:48:18.1225129Z add.s32 %r114, %r109, 32; 2026-02-21T09:48:18.1225191Z bfe.u32 %r115, %r114, 4, 14; 2026-02-21T09:48:18.1225253Z cvt.u64.u32 %rd44, %r115; 2026-02-21T09:48:18.1225333Z or.b64 %rd34, %rd44, 4611686293338849280; 2026-02-21T09:48:18.1225395Z add.s32 %r116, %r109, 32800; 2026-02-21T09:48:18.1225455Z bfe.u32 %r117, %r116, 4, 14; 2026-02-21T09:48:18.1225525Z cvt.u64.u32 %rd45, %r117; 2026-02-21T09:48:18.1225595Z or.b64 %rd35, %rd45, 4611686293338849280; 2026-02-21T09:48:18.1225655Z // begin inline asm 2026-02-21T09:48:18.1225799Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd34, %rd35, %r66, %p13; 2026-02-21T09:48:18.1225867Z // end inline asm 2026-02-21T09:48:18.1225928Z add.s32 %r118, %r109, 64; 2026-02-21T09:48:18.1225989Z bfe.u32 %r119, %r118, 4, 14; 2026-02-21T09:48:18.1226060Z cvt.u64.u32 %rd46, %r119; 2026-02-21T09:48:18.1226176Z or.b64 %rd36, %rd46, 
4611686293338849280; 2026-02-21T09:48:18.1226240Z add.s32 %r120, %r109, 32832; 2026-02-21T09:48:18.1226311Z bfe.u32 %r121, %r120, 4, 14; 2026-02-21T09:48:18.1226372Z cvt.u64.u32 %rd47, %r121; 2026-02-21T09:48:18.1226443Z or.b64 %rd37, %rd47, 4611686293338849280; 2026-02-21T09:48:18.1226505Z // begin inline asm 2026-02-21T09:48:18.1226653Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd36, %rd37, %r66, %p13; 2026-02-21T09:48:18.1226713Z // end inline asm 2026-02-21T09:48:18.1226775Z add.s32 %r122, %r109, 96; 2026-02-21T09:48:18.1226847Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T09:48:18.1226909Z cvt.u64.u32 %rd48, %r123; 2026-02-21T09:48:18.1226980Z or.b64 %rd38, %rd48, 4611686293338849280; 2026-02-21T09:48:18.1227044Z add.s32 %r124, %r109, 32864; 2026-02-21T09:48:18.1227113Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T09:48:18.1227175Z cvt.u64.u32 %rd49, %r125; 2026-02-21T09:48:18.1227277Z or.b64 %rd39, %rd49, 4611686293338849280; 2026-02-21T09:48:18.1227349Z // begin inline asm 2026-02-21T09:48:18.1227491Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd38, %rd39, %r66, %p13; 2026-02-21T09:48:18.1227552Z // end inline asm 2026-02-21T09:48:18.1227621Z cvt.u64.u32 %rd40, %r107; 2026-02-21T09:48:18.1227682Z // begin inline asm 2026-02-21T09:48:18.1227816Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:48:18.1227875Z // end inline asm 2026-02-21T09:48:18.1227954Z and.pred %p30, %p31, %p22; 2026-02-21T09:48:18.1228014Z // begin inline asm 2026-02-21T09:48:18.1228149Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:18.1228254Z // end inline asm 2026-02-21T09:48:18.1228440Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1228503Z add.s32 %r127, %r337, 1; 2026-02-21T09:48:18.1228578Z setp.eq.b32 %p32, %r127, 4; 2026-02-21T09:48:18.1228651Z selp.b32 %r337, 0, %r127, %p32; 2026-02-21T09:48:18.1228719Z selp.b32 %r128, 1, 0, %p32; 
2026-02-21T09:48:18.1228783Z xor.b32 %r336, %r336, %r128; 2026-02-21T09:48:18.1229006Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1229070Z add.s32 %r335, %r335, 64; 2026-02-21T09:48:18.1229141Z setp.lt.u32 %p33, %r335, 1984; 2026-02-21T09:48:18.1229218Z @%p33 bra $L__BB0_6; 2026-02-21T09:48:18.1229308Z // %bb.7: // %.loopexit 2026-02-21T09:48:18.1229410Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1229482Z barrier.sync 1; 2026-02-21T09:48:18.1229567Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1229633Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.1229742Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1229943Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1230029Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1230132Z ld.shared.v2.b32 {%r49, %r53}, [global_smem+73736]; 2026-02-21T09:48:18.1230204Z barrier.sync 1; 2026-02-21T09:48:18.1230384Z .loc 1 21 67 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:21:67 2026-02-21T09:48:18.1230448Z mov.u32 %r35, %ctaid.x; 2026-02-21T09:48:18.1230520Z mov.u32 %r36, %ctaid.y; 2026-02-21T09:48:18.1230582Z mov.u32 %r37, %ctaid.z; 2026-02-21T09:48:18.1230647Z mov.u32 %r38, %nctaid.x; 2026-02-21T09:48:18.1230709Z mov.u32 %r39, %nctaid.y; 2026-02-21T09:48:18.1230789Z mad.lo.s32 %r40, %r37, %r39, %r36; 2026-02-21T09:48:18.1230858Z mad.lo.s32 %r41, %r40, %r38, %r35; 2026-02-21T09:48:18.1230923Z mul.lo.s32 %r42, %r41, 384; 2026-02-21T09:48:18.1231115Z .loc 1 22 68 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:22:68 2026-02-21T09:48:18.1231177Z add.s32 %r43, %r42, 128; 2026-02-21T09:48:18.1231242Z cvt.s64.s32 %rd8, %r43; 2026-02-21T09:48:18.1231334Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:48:18.1231412Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:48:18.1231592Z .loc 1 21 67 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:21:67 2026-02-21T09:48:18.1231655Z 
cvt.s64.s32 %rd10, %r42; 2026-02-21T09:48:18.1231729Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:48:18.1231800Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:48:18.1231864Z add.s32 %r13, %r1, -128; 2026-02-21T09:48:18.1231931Z mov.b32 %r339, 0; 2026-02-21T09:48:18.1231993Z mov.b32 %r338, -64; 2026-02-21T09:48:18.1232055Z mov.b32 %r340, %r339; 2026-02-21T09:48:18.1232162Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.1232275Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.1232455Z .loc 1 0 67 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0:67 2026-02-21T09:48:18.1232552Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:48:18.1232629Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:48:18.1232813Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1232881Z add.s32 %r338, %r338, 64; 2026-02-21T09:48:18.1232951Z shl.b32 %r55, %r340, 3; 2026-02-21T09:48:18.1233017Z add.s32 %r57, %r31, %r55; 2026-02-21T09:48:18.1233081Z add.s32 %r44, %r57, 73856; 2026-02-21T09:48:18.1233260Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1233330Z // begin inline asm 2026-02-21T09:48:18.1233385Z 2026-02-21T09:48:18.1233440Z { 2026-02-21T09:48:18.1233541Z .reg .pred complete; 2026-02-21T09:48:18.1233601Z waitLoop: 2026-02-21T09:48:18.1233729Z mbarrier.try_wait.parity.shared.b64 complete, [%r44], %r339; 2026-02-21T09:48:18.1233800Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.1233863Z } 2026-02-21T09:48:18.1233866Z 2026-02-21T09:48:18.1233927Z // end inline asm 2026-02-21T09:48:18.1234111Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1234221Z add.s32 %r50, %r57, 73888; 2026-02-21T09:48:18.1234392Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1234454Z bar.sync 3, 64; 2026-02-21T09:48:18.1234522Z // begin inline asm 2026-02-21T09:48:18.1234641Z @%p3 
mbarrier.arrive.expect_tx.shared.b64 _, [%r50], 16384; 2026-02-21T09:48:18.1234750Z // end inline asm 2026-02-21T09:48:18.1234933Z .loc 1 54 31 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:54:31 2026-02-21T09:48:18.1235005Z shl.b32 %r58, %r340, 13; 2026-02-21T09:48:18.1235068Z add.s32 %r47, %r31, %r58; 2026-02-21T09:48:18.1235247Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1235316Z bar.sync 3, 64; 2026-02-21T09:48:18.1235387Z elect.sync %r59|%p7, -1; 2026-02-21T09:48:18.1235458Z and.pred %p4, %p6, %p7; 2026-02-21T09:48:18.1235527Z // begin inline asm 2026-02-21T09:48:18.1235799Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r47], [%rd12, {%r338, %r49}], [%r50]; 2026-02-21T09:48:18.1235858Z // end inline asm 2026-02-21T09:48:18.1236042Z .loc 1 55 44 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:55:44 2026-02-21T09:48:18.1236113Z add.s32 %r51, %r47, 32768; 2026-02-21T09:48:18.1236287Z .loc 1 0 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:0 2026-02-21T09:48:18.1236348Z bar.sync 3, 64; 2026-02-21T09:48:18.1236426Z elect.sync %r60|%p8, -1; 2026-02-21T09:48:18.1236494Z and.pred %p5, %p6, %p8; 2026-02-21T09:48:18.1236556Z // begin inline asm 2026-02-21T09:48:18.1236833Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r51], [%rd13, {%r338, %r53}], [%r50]; 2026-02-21T09:48:18.1236894Z // end inline asm 2026-02-21T09:48:18.1236997Z add.s32 %r61, %r340, 1; 2026-02-21T09:48:18.1237068Z setp.eq.b32 %p9, %r61, 4; 2026-02-21T09:48:18.1237147Z selp.b32 %r340, 0, %r61, %p9; 2026-02-21T09:48:18.1237211Z selp.b32 %r62, 1, 0, %p9; 2026-02-21T09:48:18.1237274Z xor.b32 %r339, %r339, %r62; 2026-02-21T09:48:18.1237465Z .loc 1 50 79 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:50:79 2026-02-21T09:48:18.1237534Z setp.lt.u32 %p10, %r338, 1984; 2026-02-21T09:48:18.1237597Z @%p10 bra $L__BB0_9; 
2026-02-21T09:48:18.1237711Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1237773Z barrier.sync 1; 2026-02-21T09:48:18.1237860Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.1237921Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.1238038Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.1238248Z .loc 1 19 0 // cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py:19 2026-02-21T09:48:18.1238313Z barrier.sync 1; 2026-02-21T09:48:18.1238384Z barrier.sync 1; 2026-02-21T09:48:18.1238446Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.1238504Z $L__tmp1: 2026-02-21T09:48:18.1238564Z $L__func_end0: 2026-02-21T09:48:18.1238663Z // -- End function 2026-02-21T09:48:18.1238719Z } 2026-02-21T09:48:18.1238938Z .file 1 "/tmp/torchinductor_root/q2/cq2cxa47wwryrxhsn4es6knyas4gip7rt3jyxfbqrblcqzb6243s.py" 2026-02-21T09:48:18.1239018Z .section .debug_abbrev 2026-02-21T09:48:18.1239075Z { 2026-02-21T09:48:18.1239181Z .b8 1 // Abbreviation Code 2026-02-21T09:48:18.1239286Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:48:18.1239422Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:48:18.1239511Z .b8 37 // DW_AT_producer 2026-02-21T09:48:18.1239600Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.1239694Z .b8 19 // DW_AT_language 2026-02-21T09:48:18.1239782Z .b8 5 // DW_FORM_data2 2026-02-21T09:48:18.1239894Z .b8 3 // DW_AT_name 2026-02-21T09:48:18.1239984Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.1240074Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:48:18.1240158Z .b8 6 // DW_FORM_data4 2026-02-21T09:48:18.1240261Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:48:18.1240341Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.1240421Z .b8 0 // EOM(1) 2026-02-21T09:48:18.1240503Z .b8 0 // EOM(2) 2026-02-21T09:48:18.1240576Z .b8 0 // EOM(3) 2026-02-21T09:48:18.1240633Z } 2026-02-21T09:48:18.1240698Z .section .debug_info 2026-02-21T09:48:18.1240763Z { 2026-02-21T09:48:18.1240855Z .b32 104 // Length of Unit 2026-02-21T09:48:18.1240952Z .b8 2 // DWARF version number 2026-02-21T09:48:18.1241018Z 
.b8 0 2026-02-21T09:48:18.1241150Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:48:18.1241252Z .b8 8 // Address Size (in bytes) 2026-02-21T09:48:18.1241364Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:48:18.1241461Z .b8 116 // DW_AT_producer 2026-02-21T09:48:18.1241522Z .b8 114 2026-02-21T09:48:18.1241582Z .b8 105 2026-02-21T09:48:18.1241652Z .b8 116 2026-02-21T09:48:18.1241708Z .b8 111 2026-02-21T09:48:18.1241765Z .b8 110 2026-02-21T09:48:18.1241821Z .b8 0 2026-02-21T09:48:18.1241913Z .b8 2 // DW_AT_language 2026-02-21T09:48:18.1241971Z .b8 0 2026-02-21T09:48:18.1242055Z .b8 99 // DW_AT_name 2026-02-21T09:48:18.1242147Z .b8 113 2026-02-21T09:48:18.1242206Z .b8 50 2026-02-21T09:48:18.1242267Z .b8 99 2026-02-21T09:48:18.1242325Z .b8 120 2026-02-21T09:48:18.1242391Z .b8 97 2026-02-21T09:48:18.1242447Z .b8 52 2026-02-21T09:48:18.1242503Z .b8 55 2026-02-21T09:48:18.1242567Z .b8 119 2026-02-21T09:48:18.1242623Z .b8 119 2026-02-21T09:48:18.1242679Z .b8 114 2026-02-21T09:48:18.1242735Z .b8 121 2026-02-21T09:48:18.1242799Z .b8 114 2026-02-21T09:48:18.1242856Z .b8 120 2026-02-21T09:48:18.1242912Z .b8 104 2026-02-21T09:48:18.1242976Z .b8 115 2026-02-21T09:48:18.1243031Z .b8 110 2026-02-21T09:48:18.1243087Z .b8 52 2026-02-21T09:48:18.1243142Z .b8 101 2026-02-21T09:48:18.1243207Z .b8 115 2026-02-21T09:48:18.1243263Z .b8 54 2026-02-21T09:48:18.1243317Z .b8 107 2026-02-21T09:48:18.1243371Z .b8 110 2026-02-21T09:48:18.1243435Z .b8 121 2026-02-21T09:48:18.1243489Z .b8 97 2026-02-21T09:48:18.1243546Z .b8 115 2026-02-21T09:48:18.1243608Z .b8 52 2026-02-21T09:48:18.1243664Z .b8 103 2026-02-21T09:48:18.1243751Z .b8 105 2026-02-21T09:48:18.1243810Z .b8 112 2026-02-21T09:48:18.1243873Z .b8 55 2026-02-21T09:48:18.1243931Z .b8 114 2026-02-21T09:48:18.1243987Z .b8 116 2026-02-21T09:48:18.1244049Z .b8 51 2026-02-21T09:48:18.1244105Z .b8 106 2026-02-21T09:48:18.1244160Z .b8 121 2026-02-21T09:48:18.1244216Z .b8 120 2026-02-21T09:48:18.1244279Z .b8 102 
2026-02-21T09:48:18.1244335Z .b8 98 2026-02-21T09:48:18.1244389Z .b8 113 2026-02-21T09:48:18.1244451Z .b8 114 2026-02-21T09:48:18.1244506Z .b8 98 2026-02-21T09:48:18.1244562Z .b8 108 2026-02-21T09:48:18.1244617Z .b8 99 2026-02-21T09:48:18.1244744Z .b8 113 2026-02-21T09:48:18.1244804Z .b8 122 2026-02-21T09:48:18.1244859Z .b8 98 2026-02-21T09:48:18.1244952Z .b8 54 2026-02-21T09:48:18.1245018Z .b8 50 2026-02-21T09:48:18.1245075Z .b8 52 2026-02-21T09:48:18.1245130Z .b8 51 2026-02-21T09:48:18.1245195Z .b8 115 2026-02-21T09:48:18.1245251Z .b8 46 2026-02-21T09:48:18.1245307Z .b8 112 2026-02-21T09:48:18.1245363Z .b8 121 2026-02-21T09:48:18.1245429Z .b8 0 2026-02-21T09:48:18.1245536Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:48:18.1245620Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:48:18.1245726Z .b8 116 2026-02-21T09:48:18.1245805Z .b8 109 2026-02-21T09:48:18.1245864Z .b8 112 2026-02-21T09:48:18.1245923Z .b8 47 2026-02-21T09:48:18.1245989Z .b8 116 2026-02-21T09:48:18.1246049Z .b8 111 2026-02-21T09:48:18.1246105Z .b8 114 2026-02-21T09:48:18.1246172Z .b8 99 2026-02-21T09:48:18.1246231Z .b8 104 2026-02-21T09:48:18.1246286Z .b8 105 2026-02-21T09:48:18.1246342Z .b8 110 2026-02-21T09:48:18.1246407Z .b8 100 2026-02-21T09:48:18.1246463Z .b8 117 2026-02-21T09:48:18.1246519Z .b8 99 2026-02-21T09:48:18.1246576Z .b8 116 2026-02-21T09:48:18.1246639Z .b8 111 2026-02-21T09:48:18.1246694Z .b8 114 2026-02-21T09:48:18.1246749Z .b8 95 2026-02-21T09:48:18.1246813Z .b8 114 2026-02-21T09:48:18.1246868Z .b8 111 2026-02-21T09:48:18.1246923Z .b8 111 2026-02-21T09:48:18.1246978Z .b8 116 2026-02-21T09:48:18.1247040Z .b8 47 2026-02-21T09:48:18.1247099Z .b8 113 2026-02-21T09:48:18.1247157Z .b8 50 2026-02-21T09:48:18.1247221Z .b8 0 2026-02-21T09:48:18.1247277Z } 2026-02-21T09:48:18.1247352Z .section .debug_macinfo { } 2026-02-21T09:48:18.1247356Z 2026-02-21T09:48:18.1247444Z ================================================================ 2026-02-21T09:48:18.1247566Z please share the reproducer 
above with Triton project. 2026-02-21T09:48:18.3576255Z [115s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:48:18.3576281Z 2026-02-21T09:48:18.3579370Z Config: @helion.kernel(config=helion.Config(block_sizes=[64, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:48:18.3579583Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:48:18.3579674Z `ptxas` stderr: 2026-02-21T09:48:18.3580161Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:48:18.3580282Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:18.3580291Z 2026-02-21T09:48:18.3580815Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppiu9q7is.ptx -o /tmp/tmppiu9q7is.ptx.o 2026-02-21T09:48:18.3580824Z 2026-02-21T09:48:18.3580980Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:48:18.3580986Z 2026-02-21T09:48:18.3580990Z 2026-02-21T09:48:18.3581088Z ================================================================ 2026-02-21T09:48:18.3581263Z Internal Triton PTX codegen error 2026-02-21T09:48:18.3581343Z `ptxas` stderr: 2026-02-21T09:48:18.3581748Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 309 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:48:18.3581856Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:48:18.3581861Z 2026-02-21T09:48:18.3582338Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmppiu9q7is.ptx -o /tmp/tmppiu9q7is.ptx.o 2026-02-21T09:48:18.3582343Z 2026-02-21T09:48:18.3582346Z 2026-02-21T09:48:18.3582413Z // 2026-02-21T09:48:18.3582501Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:48:18.3582668Z // 2026-02-21T09:48:18.3582673Z 2026-02-21T09:48:18.3582745Z .version 8.7 2026-02-21T09:48:18.3582813Z .target sm_100a 2026-02-21T09:48:18.3582882Z .address_size 64 2026-02-21T09:48:18.3582886Z 2026-02-21T09:48:18.3583079Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:48:18.3583176Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:48:18.3583369Z // @_helion_matmul 2026-02-21T09:48:18.3583456Z .visible .entry _helion_matmul( 2026-02-21T09:48:18.3583580Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:48:18.3583692Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:48:18.3583808Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:48:18.3583918Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:48:18.3584030Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:48:18.3584098Z ) 2026-02-21T09:48:18.3584162Z .reqntid 256 2026-02-21T09:48:18.3584226Z .maxnreg 32 2026-02-21T09:48:18.3584284Z { 2026-02-21T09:48:18.3584366Z .reg .pred %p<120>; 2026-02-21T09:48:18.3584434Z .reg .b16 %rs<15>; 2026-02-21T09:48:18.3584496Z .reg .b32 %r<342>; 2026-02-21T09:48:18.3584561Z .reg .b64 %rd<172>; 2026-02-21T09:48:18.3584922Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19:0 2026-02-21T09:48:18.3584993Z $L__func_begin0: 2026-02-21T09:48:18.3585228Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19:0 
2026-02-21T09:48:18.3585233Z 2026-02-21T09:48:18.3585306Z // %bb.0: 2026-02-21T09:48:18.3585410Z ld.param.b64 %rd7, [_helion_matmul_param_3]; 2026-02-21T09:48:18.3585473Z $L__tmp0: 2026-02-21T09:48:18.3585691Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19 2026-02-21T09:48:18.3585761Z mov.u32 %r1, %tid.x; 2026-02-21T09:48:18.3585831Z shr.u32 %r2, %r1, 5; 2026-02-21T09:48:18.3585928Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:48:18.3586225Z setp.lt.u32 %p1, %r3, 4; 2026-02-21T09:48:18.3586294Z @%p1 bra $L__BB0_12; 2026-02-21T09:48:18.3586371Z bra.uni $L__BB0_1; 2026-02-21T09:48:18.3586444Z $L__BB0_12: 2026-02-21T09:48:18.3586749Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0:0 2026-02-21T09:48:18.3586851Z ld.param.b64 %rd6, [_helion_matmul_param_2]; 2026-02-21T09:48:18.3586955Z ld.param.b64 %rd5, [_helion_matmul_param_1]; 2026-02-21T09:48:18.3587044Z ld.param.b64 %rd4, [_helion_matmul_param_0]; 2026-02-21T09:48:18.3587254Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19 2026-02-21T09:48:18.3587358Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.3587436Z setp.lt.u32 %p34, %r1, 32; 2026-02-21T09:48:18.3587508Z mov.b32 %r129, global_smem; 2026-02-21T09:48:18.3587573Z // begin inline asm 2026-02-21T09:48:18.3587781Z @%p34 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r129], 64; 2026-02-21T09:48:18.3587851Z // end inline asm 2026-02-21T09:48:18.3587920Z bar.sync 0, 128; 2026-02-21T09:48:18.3588015Z ld.shared.b32 %r334, [global_smem]; 2026-02-21T09:48:18.3588128Z bar.sync 0, 128; 2026-02-21T09:48:18.3588199Z // begin inline asm 2026-02-21T09:48:18.3588343Z @%p34 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:48:18.3588420Z // end inline asm 2026-02-21T09:48:18.3588645Z .loc 1 21 67 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:21:67 2026-02-21T09:48:18.3588717Z mov.u32 %r21, %ctaid.x; 2026-02-21T09:48:18.3588797Z mov.u32 
%r154, %ctaid.y; 2026-02-21T09:48:18.3588865Z mov.u32 %r155, %ctaid.z; 2026-02-21T09:48:18.3588936Z mov.u32 %r156, %nctaid.x; 2026-02-21T09:48:18.3589012Z mov.u32 %r157, %nctaid.y; 2026-02-21T09:48:18.3589092Z mad.lo.s32 %r158, %r155, %r157, %r154; 2026-02-21T09:48:18.3589254Z mad.lo.s32 %r159, %r158, %r156, %r21; 2026-02-21T09:48:18.3589327Z mul.lo.s32 %r160, %r159, 384; 2026-02-21T09:48:18.3589410Z cvt.s64.s32 %rd104, %r160; 2026-02-21T09:48:18.3589496Z add.s64 %rd65, %rd7, %rd104; 2026-02-21T09:48:18.3589559Z shl.b32 %r161, %r1, 2; 2026-02-21T09:48:18.3589633Z add.s32 %r130, %r129, %r161; 2026-02-21T09:48:18.3589694Z mov.b32 %r139, 0; 2026-02-21T09:48:18.3589756Z // begin inline asm 2026-02-21T09:48:18.3589892Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.3589963Z // end inline asm 2026-02-21T09:48:18.3590031Z bar.warp.sync -1; 2026-02-21T09:48:18.3590099Z setp.eq.b32 %p37, %r1, 0; 2026-02-21T09:48:18.3590173Z cvt.u64.u32 %rd50, %r129; 2026-02-21T09:48:18.3590236Z // begin inline asm 2026-02-21T09:48:18.3590423Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd4; 2026-02-21T09:48:18.3590494Z // end inline asm 2026-02-21T09:48:18.3590557Z // begin inline asm 2026-02-21T09:48:18.3590727Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.3590790Z // end inline asm 2026-02-21T09:48:18.3590861Z mov.b32 %r132, 64; 2026-02-21T09:48:18.3590923Z // begin inline asm 2026-02-21T09:48:18.3591093Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.3591163Z // end inline asm 2026-02-21T09:48:18.3591223Z // begin inline asm 2026-02-21T09:48:18.3591391Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.3591459Z // end inline asm 2026-02-21T09:48:18.3591521Z mov.b32 %r134, 2048; 2026-02-21T09:48:18.3591581Z // begin inline asm 2026-02-21T09:48:18.3591759Z @%p37 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:18.3591827Z // end inline asm 2026-02-21T09:48:18.3591889Z // begin inline asm 2026-02-21T09:48:18.3592064Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:18.3592139Z // end inline asm 2026-02-21T09:48:18.3592202Z mov.b64 %rd58, 4096; 2026-02-21T09:48:18.3592264Z // begin inline asm 2026-02-21T09:48:18.3592476Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:18.3592537Z // end inline asm 2026-02-21T09:48:18.3592634Z mov.b32 %r136, 1; 2026-02-21T09:48:18.3592695Z // begin inline asm 2026-02-21T09:48:18.3592899Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.3592961Z // end inline asm 2026-02-21T09:48:18.3593022Z // begin inline asm 2026-02-21T09:48:18.3593219Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.3593279Z // end inline asm 2026-02-21T09:48:18.3593338Z // begin inline asm 2026-02-21T09:48:18.3593512Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.3593573Z // end inline asm 2026-02-21T09:48:18.3593637Z // begin inline asm 2026-02-21T09:48:18.3593823Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3593890Z // end inline asm 2026-02-21T09:48:18.3593949Z // begin inline asm 2026-02-21T09:48:18.3594187Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.3594257Z // end inline asm 2026-02-21T09:48:18.3594319Z // begin inline asm 2026-02-21T09:48:18.3594483Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3594552Z // end inline asm 2026-02-21T09:48:18.3594613Z // begin inline asm 
2026-02-21T09:48:18.3595010Z @%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd65 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.3595080Z // end inline asm 2026-02-21T09:48:18.3595144Z // begin inline asm 2026-02-21T09:48:18.3595288Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd65 + 0 ], 0x80; 2026-02-21T09:48:18.3595414Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.3595508Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.3595568Z // end inline asm 2026-02-21T09:48:18.3595631Z bar.sync 0, 128; 2026-02-21T09:48:18.3595851Z .loc 1 22 68 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:22:68 2026-02-21T09:48:18.3595920Z add.s32 %r162, %r160, 128; 2026-02-21T09:48:18.3596031Z cvt.s64.s32 %rd105, %r162; 2026-02-21T09:48:18.3596109Z add.s64 %rd83, %rd7, %rd105; 2026-02-21T09:48:18.3596172Z bar.sync 0, 128; 2026-02-21T09:48:18.3596246Z // begin inline asm 2026-02-21T09:48:18.3596326Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.3596397Z // end inline asm 2026-02-21T09:48:18.3596465Z bar.warp.sync -1; 2026-02-21T09:48:18.3596528Z // begin inline asm 2026-02-21T09:48:18.3596716Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd5; 2026-02-21T09:48:18.3596779Z // end inline asm 2026-02-21T09:48:18.3596840Z // begin inline asm 2026-02-21T09:48:18.3596997Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.3597068Z // end inline asm 2026-02-21T09:48:18.3597131Z // begin inline asm 2026-02-21T09:48:18.3597302Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.3597373Z // end inline asm 2026-02-21T09:48:18.3597435Z // begin inline asm 2026-02-21T09:48:18.3597599Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.3597664Z // end inline asm 2026-02-21T09:48:18.3597725Z // begin inline asm 
2026-02-21T09:48:18.3597902Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r134; 2026-02-21T09:48:18.3597968Z // end inline asm 2026-02-21T09:48:18.3598031Z mov.b32 %r143, 12288; 2026-02-21T09:48:18.3598091Z // begin inline asm 2026-02-21T09:48:18.3598285Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r143; 2026-02-21T09:48:18.3598354Z // end inline asm 2026-02-21T09:48:18.3598414Z // begin inline asm 2026-02-21T09:48:18.3598618Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd58; 2026-02-21T09:48:18.3598687Z // end inline asm 2026-02-21T09:48:18.3598790Z // begin inline asm 2026-02-21T09:48:18.3598982Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.3599051Z // end inline asm 2026-02-21T09:48:18.3599110Z // begin inline asm 2026-02-21T09:48:18.3599316Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.3599377Z // end inline asm 2026-02-21T09:48:18.3599444Z // begin inline asm 2026-02-21T09:48:18.3599610Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.3599669Z // end inline asm 2026-02-21T09:48:18.3599737Z // begin inline asm 2026-02-21T09:48:18.3599922Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3599981Z // end inline asm 2026-02-21T09:48:18.3600049Z // begin inline asm 2026-02-21T09:48:18.3600256Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.3600318Z // end inline asm 2026-02-21T09:48:18.3600378Z // begin inline asm 2026-02-21T09:48:18.3600551Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3600612Z // end inline asm 2026-02-21T09:48:18.3600673Z // begin inline asm 2026-02-21T09:48:18.3600992Z 
@%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd83 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.3601052Z // end inline asm 2026-02-21T09:48:18.3601112Z // begin inline asm 2026-02-21T09:48:18.3601260Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd83 + 0 ], 0x80; 2026-02-21T09:48:18.3601365Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.3601446Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.3601514Z // end inline asm 2026-02-21T09:48:18.3601574Z bar.sync 0, 128; 2026-02-21T09:48:18.3601772Z .loc 1 24 73 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:24:73 2026-02-21T09:48:18.3601849Z add.s32 %r163, %r160, 256; 2026-02-21T09:48:18.3601921Z cvt.s64.s32 %rd106, %r163; 2026-02-21T09:48:18.3602016Z add.s64 %rd101, %rd7, %rd106; 2026-02-21T09:48:18.3602074Z bar.sync 0, 128; 2026-02-21T09:48:18.3602140Z // begin inline asm 2026-02-21T09:48:18.3602215Z @%p34 st.shared.b32 [ %r130 + 0 ], %r139; 2026-02-21T09:48:18.3602271Z // end inline asm 2026-02-21T09:48:18.3602334Z bar.warp.sync -1; 2026-02-21T09:48:18.3602402Z // begin inline asm 2026-02-21T09:48:18.3602587Z @%p37 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd50 + 0 ], %rd6; 2026-02-21T09:48:18.3602646Z // end inline asm 2026-02-21T09:48:18.3602715Z // begin inline asm 2026-02-21T09:48:18.3602863Z @%p37 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1; 2026-02-21T09:48:18.3602921Z // end inline asm 2026-02-21T09:48:18.3602989Z // begin inline asm 2026-02-21T09:48:18.3603151Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r132; 2026-02-21T09:48:18.3603209Z // end inline asm 2026-02-21T09:48:18.3603267Z // begin inline asm 2026-02-21T09:48:18.3603436Z @%p37 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r132; 2026-02-21T09:48:18.3603495Z // end inline asm 2026-02-21T09:48:18.3603553Z // begin inline asm 
2026-02-21T09:48:18.3603732Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r143; 2026-02-21T09:48:18.3603792Z // end inline asm 2026-02-21T09:48:18.3603851Z // begin inline asm 2026-02-21T09:48:18.3604032Z @%p37 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r134; 2026-02-21T09:48:18.3604095Z // end inline asm 2026-02-21T09:48:18.3604157Z mov.b64 %rd94, 24576; 2026-02-21T09:48:18.3604215Z // begin inline asm 2026-02-21T09:48:18.3604402Z @%p37 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd50 + 0 ], 0x0, %rd94; 2026-02-21T09:48:18.3604461Z // end inline asm 2026-02-21T09:48:18.3604522Z // begin inline asm 2026-02-21T09:48:18.3604795Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0, %r136; 2026-02-21T09:48:18.3604857Z // end inline asm 2026-02-21T09:48:18.3604916Z // begin inline asm 2026-02-21T09:48:18.3605105Z @%p37 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x1, %r136; 2026-02-21T09:48:18.3605165Z // end inline asm 2026-02-21T09:48:18.3605223Z // begin inline asm 2026-02-21T09:48:18.3605388Z @%p37 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x6; 2026-02-21T09:48:18.3605446Z // end inline asm 2026-02-21T09:48:18.3605508Z // begin inline asm 2026-02-21T09:48:18.3605693Z @%p37 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3605767Z // end inline asm 2026-02-21T09:48:18.3605829Z // begin inline asm 2026-02-21T09:48:18.3606000Z @%p37 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x3; 2026-02-21T09:48:18.3606106Z // end inline asm 2026-02-21T09:48:18.3606182Z // begin inline asm 2026-02-21T09:48:18.3606340Z @%p37 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd50 + 0 ], 0x0; 2026-02-21T09:48:18.3606409Z // end inline asm 2026-02-21T09:48:18.3606469Z // begin inline asm 2026-02-21T09:48:18.3606756Z 
@%p34 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd50 + 0 ], 0x80; 2026-02-21T09:48:18.3606827Z // end inline asm 2026-02-21T09:48:18.3606889Z // begin inline asm 2026-02-21T09:48:18.3607029Z @%p34 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T09:48:18.3607104Z @%p34 cp.async.bulk.commit_group ; 2026-02-21T09:48:18.3607248Z @%p34 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:48:18.3607307Z // end inline asm 2026-02-21T09:48:18.3607368Z bar.sync 0, 128; 2026-02-21T09:48:18.3607448Z cvta.global.u64 %rd107, %rd101; 2026-02-21T09:48:18.3607647Z .loc 1 31 35 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:31:35 2026-02-21T09:48:18.3607712Z mul.lo.s32 %r341, %r21, 3; 2026-02-21T09:48:18.3607937Z .loc 1 32 37 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:32:37 2026-02-21T09:48:18.3608002Z add.s32 %r164, %r341, 3; 2026-02-21T09:48:18.3608186Z .loc 1 32 49 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:32:49 2026-02-21T09:48:18.3608249Z min.s32 %r23, %r164, 6144; 2026-02-21T09:48:18.3608441Z .loc 1 33 84 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:33:84 2026-02-21T09:48:18.3608513Z setp.ge.s32 %p90, %r341, %r23; 2026-02-21T09:48:18.3608579Z @%p90 bra $L__BB0_15; 2026-02-21T09:48:18.3608669Z // %bb.13: // %.lr.ph 2026-02-21T09:48:18.3608855Z .loc 1 0 84 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0:84 2026-02-21T09:48:18.3608918Z shl.b32 %r165, %r1, 7; 2026-02-21T09:48:18.3608991Z and.b32 %r166, %r165, 1920; 2026-02-21T09:48:18.3609053Z shl.b32 %r167, %r1, 6; 2026-02-21T09:48:18.3609116Z and.b32 %r168, %r167, 6144; 2026-02-21T09:48:18.3609177Z shl.b32 %r169, %r1, 4; 2026-02-21T09:48:18.3609246Z and.b32 %r170, %r169, 112; 2026-02-21T09:48:18.3609308Z and.b32 %r172, %r161, 64; 2026-02-21T09:48:18.3609369Z or.b32 %r173, %r168, %r170; 2026-02-21T09:48:18.3609441Z xor.b32 %r174, %r173, %r172; 2026-02-21T09:48:18.3609500Z 
or.b32 %r175, %r174, %r166; 2026-02-21T09:48:18.3609563Z add.s32 %r177, %r129, 65536; 2026-02-21T09:48:18.3609623Z add.s32 %r24, %r177, %r175; 2026-02-21T09:48:18.3609693Z xor.b32 %r178, %r175, 16; 2026-02-21T09:48:18.3609752Z add.s32 %r25, %r177, %r178; 2026-02-21T09:48:18.3609815Z xor.b32 %r179, %r175, 32; 2026-02-21T09:48:18.3609883Z add.s32 %r26, %r177, %r179; 2026-02-21T09:48:18.3609944Z xor.b32 %r180, %r175, 48; 2026-02-21T09:48:18.3610005Z add.s32 %r27, %r177, %r180; 2026-02-21T09:48:18.3610220Z .loc 1 33 84 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:33:84 2026-02-21T09:48:18.3610295Z cvt.u16.u32 %rs4, %r21; 2026-02-21T09:48:18.3610360Z mul.lo.s16 %rs14, %rs4, 3; 2026-02-21T09:48:18.3610473Z $L__BB0_14: // =>This Inner Loop Header: Depth=1 2026-02-21T09:48:18.3610667Z .loc 1 39 35 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:39:35 2026-02-21T09:48:18.3610737Z mul.hi.s32 %r276, %r341, 715827883; 2026-02-21T09:48:18.3610797Z shr.u32 %r277, %r276, 31; 2026-02-21T09:48:18.3610866Z shr.s32 %r278, %r276, 7; 2026-02-21T09:48:18.3610929Z add.s32 %r279, %r278, %r277; 2026-02-21T09:48:18.3611112Z .loc 1 43 51 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:43:51 2026-02-21T09:48:18.3611177Z cvt.u16.u32 %rs5, %r279; 2026-02-21T09:48:18.3611255Z mad.lo.s16 %rs6, %rs5, -768, %rs14; 2026-02-21T09:48:18.3611318Z shr.s16 %rs7, %rs6, 15; 2026-02-21T09:48:18.3611421Z shr.u16 %rs8, %rs7, 14; 2026-02-21T09:48:18.3611493Z add.s16 %rs9, %rs6, %rs8; 2026-02-21T09:48:18.3611555Z shr.s16 %rs10, %rs9, 2; 2026-02-21T09:48:18.3611739Z .loc 1 42 64 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:42:64 2026-02-21T09:48:18.3611812Z and.b16 %rs11, %rs9, -4; 2026-02-21T09:48:18.3611880Z mad.lo.s16 %rs12, %rs5, 768, %rs11; 2026-02-21T09:48:18.3611941Z sub.s16 %rs13, %rs14, %rs12; 2026-02-21T09:48:18.3612122Z .loc 1 44 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:44:27 2026-02-21T09:48:18.3612189Z shl.b32 %r280, %r279, 8; 
2026-02-21T09:48:18.3612261Z mad.wide.s16 %r274, %rs13, 64, %r280; 2026-02-21T09:48:18.3612468Z .loc 1 45 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:45:27 2026-02-21T09:48:18.3612545Z mul.wide.s16 %r273, %rs10, 64; 2026-02-21T09:48:18.3612732Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3612812Z shfl.sync.idx.b32 %r281, %r2, 0, 31, -1; 2026-02-21T09:48:18.3612880Z shl.b32 %r282, %r281, 21; 2026-02-21T09:48:18.3612971Z and.b32 %r283, %r282, 6291456; 2026-02-21T09:48:18.3613036Z add.s32 %r181, %r283, %r334; 2026-02-21T09:48:18.3613101Z mov.pred %p91, -1; 2026-02-21T09:48:18.3613167Z mov.b32 %r182, 0; 2026-02-21T09:48:18.3613227Z // begin inline asm 2026-02-21T09:48:18.3613564Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 0], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:18.3613632Z // end inline asm 2026-02-21T09:48:18.3613692Z // begin inline asm 2026-02-21T09:48:18.3614023Z @%p91 tcgen05.st.sync.aligned.16x32bx2.x16.b32 [%r181 + 16], 32, {%r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182, %r182}; 2026-02-21T09:48:18.3614092Z // end inline asm 2026-02-21T09:48:18.3614151Z // begin inline asm 2026-02-21T09:48:18.3614228Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:48:18.3614286Z // end inline asm 2026-02-21T09:48:18.3614354Z bar.sync 0, 128; 2026-02-21T09:48:18.3614542Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3614609Z add.s32 %r215, %r129, 73856; 2026-02-21T09:48:18.3614738Z // begin inline asm 2026-02-21T09:48:18.3614838Z @%p37 mbarrier.init.shared::cta.b64 [%r215], 1; 2026-02-21T09:48:18.3614899Z // end inline asm 2026-02-21T09:48:18.3614968Z bar.sync 0, 128; 2026-02-21T09:48:18.3615031Z add.s32 %r216, %r129, 73864; 2026-02-21T09:48:18.3615092Z // begin inline asm 
2026-02-21T09:48:18.3615186Z @%p37 mbarrier.init.shared::cta.b64 [%r216], 1; 2026-02-21T09:48:18.3615258Z // end inline asm 2026-02-21T09:48:18.3615321Z bar.sync 0, 128; 2026-02-21T09:48:18.3615386Z add.s32 %r217, %r129, 73872; 2026-02-21T09:48:18.3615457Z // begin inline asm 2026-02-21T09:48:18.3615548Z @%p37 mbarrier.init.shared::cta.b64 [%r217], 1; 2026-02-21T09:48:18.3615643Z // end inline asm 2026-02-21T09:48:18.3615707Z bar.sync 0, 128; 2026-02-21T09:48:18.3615783Z add.s32 %r218, %r129, 73880; 2026-02-21T09:48:18.3615846Z // begin inline asm 2026-02-21T09:48:18.3615936Z @%p37 mbarrier.init.shared::cta.b64 [%r218], 1; 2026-02-21T09:48:18.3616010Z // end inline asm 2026-02-21T09:48:18.3616078Z add.s32 %r219, %r129, 73888; 2026-02-21T09:48:18.3616142Z // begin inline asm 2026-02-21T09:48:18.3616242Z @%p37 mbarrier.init.shared::cta.b64 [%r219], 1; 2026-02-21T09:48:18.3616308Z // end inline asm 2026-02-21T09:48:18.3616368Z bar.sync 0, 128; 2026-02-21T09:48:18.3616432Z add.s32 %r220, %r129, 73896; 2026-02-21T09:48:18.3616502Z // begin inline asm 2026-02-21T09:48:18.3616592Z @%p37 mbarrier.init.shared::cta.b64 [%r220], 1; 2026-02-21T09:48:18.3616652Z // end inline asm 2026-02-21T09:48:18.3616721Z bar.sync 0, 128; 2026-02-21T09:48:18.3616784Z add.s32 %r221, %r129, 73904; 2026-02-21T09:48:18.3616846Z // begin inline asm 2026-02-21T09:48:18.3616977Z @%p37 mbarrier.init.shared::cta.b64 [%r221], 1; 2026-02-21T09:48:18.3617049Z // end inline asm 2026-02-21T09:48:18.3617111Z bar.sync 0, 128; 2026-02-21T09:48:18.3617175Z add.s32 %r222, %r129, 73912; 2026-02-21T09:48:18.3617246Z // begin inline asm 2026-02-21T09:48:18.3617338Z @%p37 mbarrier.init.shared::cta.b64 [%r222], 1; 2026-02-21T09:48:18.3617396Z // end inline asm 2026-02-21T09:48:18.3617575Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3617642Z bar.sync 0, 128; 2026-02-21T09:48:18.3617703Z // begin inline asm 2026-02-21T09:48:18.3617797Z @%p37 
mbarrier.arrive.shared::cta.b64 _, [%r215]; 2026-02-21T09:48:18.3617896Z // end inline asm 2026-02-21T09:48:18.3617954Z bar.sync 0, 128; 2026-02-21T09:48:18.3618014Z // begin inline asm 2026-02-21T09:48:18.3618113Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r216]; 2026-02-21T09:48:18.3618172Z // end inline asm 2026-02-21T09:48:18.3618231Z bar.sync 0, 128; 2026-02-21T09:48:18.3618291Z // begin inline asm 2026-02-21T09:48:18.3618387Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r217]; 2026-02-21T09:48:18.3618475Z // end inline asm 2026-02-21T09:48:18.3618535Z bar.sync 0, 128; 2026-02-21T09:48:18.3618601Z // begin inline asm 2026-02-21T09:48:18.3618691Z @%p37 mbarrier.arrive.shared::cta.b64 _, [%r218]; 2026-02-21T09:48:18.3618748Z // end inline asm 2026-02-21T09:48:18.3618936Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3619003Z bar.sync 0, 128; 2026-02-21T09:48:18.3619064Z add.s32 %r227, %r129, 73920; 2026-02-21T09:48:18.3619121Z // begin inline asm 2026-02-21T09:48:18.3619219Z @%p37 mbarrier.init.shared::cta.b64 [%r227], 1; 2026-02-21T09:48:18.3619276Z // end inline asm 2026-02-21T09:48:18.3619361Z st.shared.b32 [global_smem+73928], 33554689; 2026-02-21T09:48:18.3619451Z st.shared.b32 [global_smem+73728], %r334; 2026-02-21T09:48:18.3619556Z st.shared.v2.b32 [global_smem+73736], {%r274, %r273}; 2026-02-21T09:48:18.3619618Z barrier.sync 1; 2026-02-21T09:48:18.3619702Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.3619773Z barrier.sync 1; 2026-02-21T09:48:18.3619832Z barrier.sync 1; 2026-02-21T09:48:18.3619914Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:48:18.3620106Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3620165Z bar.sync 0, 128; 2026-02-21T09:48:18.3620223Z // begin inline asm 2026-02-21T09:48:18.3620279Z 2026-02-21T09:48:18.3620340Z { 2026-02-21T09:48:18.3620407Z .reg .pred complete; 2026-02-21T09:48:18.3620464Z waitLoop: 
2026-02-21T09:48:18.3620605Z mbarrier.try_wait.parity.shared.b64 complete, [%r227], %r182; 2026-02-21T09:48:18.3620675Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.3620729Z } 2026-02-21T09:48:18.3620733Z 2026-02-21T09:48:18.3620798Z // end inline asm 2026-02-21T09:48:18.3621014Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3621075Z bar.sync 0, 128; 2026-02-21T09:48:18.3621135Z // begin inline asm 2026-02-21T09:48:18.3621229Z @%p37 mbarrier.inval.shared::cta.b64 [%r227]; 2026-02-21T09:48:18.3621286Z // end inline asm 2026-02-21T09:48:18.3621345Z // begin inline asm 2026-02-21T09:48:18.3621438Z @%p37 mbarrier.inval.shared::cta.b64 [%r219]; 2026-02-21T09:48:18.3621495Z // end inline asm 2026-02-21T09:48:18.3621552Z bar.sync 0, 128; 2026-02-21T09:48:18.3621610Z // begin inline asm 2026-02-21T09:48:18.3621701Z @%p37 mbarrier.inval.shared::cta.b64 [%r220]; 2026-02-21T09:48:18.3621758Z // end inline asm 2026-02-21T09:48:18.3621815Z bar.sync 0, 128; 2026-02-21T09:48:18.3621884Z // begin inline asm 2026-02-21T09:48:18.3621966Z @%p37 mbarrier.inval.shared::cta.b64 [%r221]; 2026-02-21T09:48:18.3622022Z // end inline asm 2026-02-21T09:48:18.3622087Z bar.sync 0, 128; 2026-02-21T09:48:18.3622145Z // begin inline asm 2026-02-21T09:48:18.3622252Z @%p37 mbarrier.inval.shared::cta.b64 [%r222]; 2026-02-21T09:48:18.3622311Z // end inline asm 2026-02-21T09:48:18.3622380Z // begin inline asm 2026-02-21T09:48:18.3622461Z @%p37 mbarrier.inval.shared::cta.b64 [%r215]; 2026-02-21T09:48:18.3622518Z // end inline asm 2026-02-21T09:48:18.3622581Z bar.sync 0, 128; 2026-02-21T09:48:18.3622639Z // begin inline asm 2026-02-21T09:48:18.3622718Z @%p37 mbarrier.inval.shared::cta.b64 [%r216]; 2026-02-21T09:48:18.3622773Z // end inline asm 2026-02-21T09:48:18.3622840Z bar.sync 0, 128; 2026-02-21T09:48:18.3622899Z // begin inline asm 2026-02-21T09:48:18.3622980Z @%p37 mbarrier.inval.shared::cta.b64 [%r217]; 2026-02-21T09:48:18.3623045Z // end 
inline asm 2026-02-21T09:48:18.3623131Z bar.sync 0, 128; 2026-02-21T09:48:18.3623191Z // begin inline asm 2026-02-21T09:48:18.3623272Z @%p37 mbarrier.inval.shared::cta.b64 [%r218]; 2026-02-21T09:48:18.3623337Z // end inline asm 2026-02-21T09:48:18.3623521Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3623581Z // begin inline asm 2026-02-21T09:48:18.3623931Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r239, %r240, %r241, %r242, %r243, %r244, %r245, %r246, %r247, %r248, %r249, %r250, %r251, %r252, %r253, %r254}, [%r181 + 0], 32; 2026-02-21T09:48:18.3623992Z // end inline asm 2026-02-21T09:48:18.3624054Z // begin inline asm 2026-02-21T09:48:18.3624380Z tcgen05.ld.sync.aligned.16x32bx2.x16.b32 {%r256, %r257, %r258, %r259, %r260, %r261, %r262, %r263, %r264, %r265, %r266, %r267, %r268, %r269, %r270, %r271}, [%r181 + 16], 32; 2026-02-21T09:48:18.3624441Z // end inline asm 2026-02-21T09:48:18.3624499Z // begin inline asm 2026-02-21T09:48:18.3624585Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:48:18.3624644Z // end inline asm 2026-02-21T09:48:18.3624748Z cvt.u64.u32 %rd108, %r239; 2026-02-21T09:48:18.3624815Z cvt.u64.u32 %rd109, %r240; 2026-02-21T09:48:18.3624889Z shl.b64 %rd110, %rd109, 32; 2026-02-21T09:48:18.3624958Z or.b64 %rd111, %rd108, %rd110; 2026-02-21T09:48:18.3625153Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3625230Z mov.b64 {%r285, %r286}, %rd111; 2026-02-21T09:48:18.3625303Z cvt.rn.f16x2.f32 %r287, %r286, %r285; 2026-02-21T09:48:18.3625491Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3625562Z cvt.u64.u32 %rd112, %r241; 2026-02-21T09:48:18.3625624Z cvt.u64.u32 %rd113, %r242; 2026-02-21T09:48:18.3625687Z shl.b64 %rd114, %rd113, 32; 2026-02-21T09:48:18.3625755Z or.b64 %rd115, %rd112, %rd114; 2026-02-21T09:48:18.3625957Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 
2026-02-21T09:48:18.3626031Z mov.b64 {%r288, %r289}, %rd115; 2026-02-21T09:48:18.3626105Z cvt.rn.f16x2.f32 %r290, %r289, %r288; 2026-02-21T09:48:18.3626307Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3626420Z cvt.u64.u32 %rd116, %r243; 2026-02-21T09:48:18.3626488Z cvt.u64.u32 %rd117, %r244; 2026-02-21T09:48:18.3626564Z shl.b64 %rd118, %rd117, 32; 2026-02-21T09:48:18.3626631Z or.b64 %rd119, %rd116, %rd118; 2026-02-21T09:48:18.3626822Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3626888Z mov.b64 {%r291, %r292}, %rd119; 2026-02-21T09:48:18.3626967Z cvt.rn.f16x2.f32 %r293, %r292, %r291; 2026-02-21T09:48:18.3627162Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3627227Z cvt.u64.u32 %rd120, %r245; 2026-02-21T09:48:18.3627301Z cvt.u64.u32 %rd121, %r246; 2026-02-21T09:48:18.3627366Z shl.b64 %rd122, %rd121, 32; 2026-02-21T09:48:18.3627431Z or.b64 %rd123, %rd120, %rd122; 2026-02-21T09:48:18.3627692Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3627763Z mov.b64 {%r294, %r295}, %rd123; 2026-02-21T09:48:18.3627833Z cvt.rn.f16x2.f32 %r296, %r295, %r294; 2026-02-21T09:48:18.3628030Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3628104Z cvt.u64.u32 %rd124, %r247; 2026-02-21T09:48:18.3628168Z cvt.u64.u32 %rd125, %r248; 2026-02-21T09:48:18.3628233Z shl.b64 %rd126, %rd125, 32; 2026-02-21T09:48:18.3628306Z or.b64 %rd127, %rd124, %rd126; 2026-02-21T09:48:18.3628499Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3628564Z mov.b64 {%r297, %r298}, %rd127; 2026-02-21T09:48:18.3628692Z cvt.rn.f16x2.f32 %r299, %r298, %r297; 2026-02-21T09:48:18.3628883Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3628949Z cvt.u64.u32 %rd128, 
%r249; 2026-02-21T09:48:18.3629012Z cvt.u64.u32 %rd129, %r250; 2026-02-21T09:48:18.3629085Z shl.b64 %rd130, %rd129, 32; 2026-02-21T09:48:18.3629153Z or.b64 %rd131, %rd128, %rd130; 2026-02-21T09:48:18.3629418Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3629491Z mov.b64 {%r300, %r301}, %rd131; 2026-02-21T09:48:18.3629562Z cvt.rn.f16x2.f32 %r302, %r301, %r300; 2026-02-21T09:48:18.3629754Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3629826Z cvt.u64.u32 %rd132, %r251; 2026-02-21T09:48:18.3629890Z cvt.u64.u32 %rd133, %r252; 2026-02-21T09:48:18.3629955Z shl.b64 %rd134, %rd133, 32; 2026-02-21T09:48:18.3630022Z or.b64 %rd135, %rd132, %rd134; 2026-02-21T09:48:18.3630223Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3630290Z mov.b64 {%r303, %r304}, %rd135; 2026-02-21T09:48:18.3630361Z cvt.rn.f16x2.f32 %r305, %r304, %r303; 2026-02-21T09:48:18.3632595Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3632682Z cvt.u64.u32 %rd136, %r253; 2026-02-21T09:48:18.3632749Z cvt.u64.u32 %rd137, %r254; 2026-02-21T09:48:18.3632814Z shl.b64 %rd138, %rd137, 32; 2026-02-21T09:48:18.3632881Z or.b64 %rd139, %rd136, %rd138; 2026-02-21T09:48:18.3633090Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3633156Z mov.b64 {%r306, %r307}, %rd139; 2026-02-21T09:48:18.3633230Z cvt.rn.f16x2.f32 %r308, %r307, %r306; 2026-02-21T09:48:18.3633435Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3633503Z cvt.u64.u32 %rd140, %r256; 2026-02-21T09:48:18.3633568Z cvt.u64.u32 %rd141, %r257; 2026-02-21T09:48:18.3633643Z shl.b64 %rd142, %rd141, 32; 2026-02-21T09:48:18.3633710Z or.b64 %rd143, %rd140, %rd142; 2026-02-21T09:48:18.3634025Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 
2026-02-21T09:48:18.3634122Z mov.b64 {%r309, %r310}, %rd143; 2026-02-21T09:48:18.3634193Z cvt.rn.f16x2.f32 %r311, %r310, %r309; 2026-02-21T09:48:18.3634386Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3634460Z cvt.u64.u32 %rd144, %r258; 2026-02-21T09:48:18.3634525Z cvt.u64.u32 %rd145, %r259; 2026-02-21T09:48:18.3634589Z shl.b64 %rd146, %rd145, 32; 2026-02-21T09:48:18.3634665Z or.b64 %rd147, %rd144, %rd146; 2026-02-21T09:48:18.3635521Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3635593Z mov.b64 {%r312, %r313}, %rd147; 2026-02-21T09:48:18.3635673Z cvt.rn.f16x2.f32 %r314, %r313, %r312; 2026-02-21T09:48:18.3635878Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3635944Z cvt.u64.u32 %rd148, %r260; 2026-02-21T09:48:18.3636012Z cvt.u64.u32 %rd149, %r261; 2026-02-21T09:48:18.3636093Z shl.b64 %rd150, %rd149, 32; 2026-02-21T09:48:18.3636161Z or.b64 %rd151, %rd148, %rd150; 2026-02-21T09:48:18.3636375Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3636454Z mov.b64 {%r315, %r316}, %rd151; 2026-02-21T09:48:18.3636526Z cvt.rn.f16x2.f32 %r317, %r316, %r315; 2026-02-21T09:48:18.3636729Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3636809Z cvt.u64.u32 %rd152, %r262; 2026-02-21T09:48:18.3636877Z cvt.u64.u32 %rd153, %r263; 2026-02-21T09:48:18.3637012Z shl.b64 %rd154, %rd153, 32; 2026-02-21T09:48:18.3637083Z or.b64 %rd155, %rd152, %rd154; 2026-02-21T09:48:18.3637298Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3637366Z mov.b64 {%r318, %r319}, %rd155; 2026-02-21T09:48:18.3637441Z cvt.rn.f16x2.f32 %r320, %r319, %r318; 2026-02-21T09:48:18.3637644Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3637770Z cvt.u64.u32 %rd156, 
%r264; 2026-02-21T09:48:18.3637839Z cvt.u64.u32 %rd157, %r265; 2026-02-21T09:48:18.3637921Z shl.b64 %rd158, %rd157, 32; 2026-02-21T09:48:18.3637991Z or.b64 %rd159, %rd156, %rd158; 2026-02-21T09:48:18.3638209Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3638278Z mov.b64 {%r321, %r322}, %rd159; 2026-02-21T09:48:18.3638360Z cvt.rn.f16x2.f32 %r323, %r322, %r321; 2026-02-21T09:48:18.3638556Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3638621Z cvt.u64.u32 %rd160, %r266; 2026-02-21T09:48:18.3638692Z cvt.u64.u32 %rd161, %r267; 2026-02-21T09:48:18.3638756Z shl.b64 %rd162, %rd161, 32; 2026-02-21T09:48:18.3638824Z or.b64 %rd163, %rd160, %rd162; 2026-02-21T09:48:18.3639117Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3639186Z mov.b64 {%r324, %r325}, %rd163; 2026-02-21T09:48:18.3639256Z cvt.rn.f16x2.f32 %r326, %r325, %r324; 2026-02-21T09:48:18.3639466Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3639541Z cvt.u64.u32 %rd164, %r268; 2026-02-21T09:48:18.3639604Z cvt.u64.u32 %rd165, %r269; 2026-02-21T09:48:18.3639668Z shl.b64 %rd166, %rd165, 32; 2026-02-21T09:48:18.3639742Z or.b64 %rd167, %rd164, %rd166; 2026-02-21T09:48:18.3639935Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 2026-02-21T09:48:18.3640002Z mov.b64 {%r327, %r328}, %rd167; 2026-02-21T09:48:18.3640081Z cvt.rn.f16x2.f32 %r329, %r328, %r327; 2026-02-21T09:48:18.3640333Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3640403Z cvt.u64.u32 %rd168, %r270; 2026-02-21T09:48:18.3640470Z cvt.u64.u32 %rd169, %r271; 2026-02-21T09:48:18.3640544Z shl.b64 %rd170, %rd169, 32; 2026-02-21T09:48:18.3640611Z or.b64 %rd171, %rd168, %rd170; 2026-02-21T09:48:18.3640826Z .loc 1 58 27 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:58:27 
2026-02-21T09:48:18.3640901Z mov.b64 {%r330, %r331}, %rd171; 2026-02-21T09:48:18.3640972Z cvt.rn.f16x2.f32 %r332, %r331, %r330; 2026-02-21T09:48:18.3641186Z .loc 1 59 45 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:59:45 2026-02-21T09:48:18.3641275Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.3641337Z bar.sync 0, 128; 2026-02-21T09:48:18.3641445Z st.shared.v4.b32 [%r24], {%r287, %r290, %r293, %r296}; 2026-02-21T09:48:18.3641549Z st.shared.v4.b32 [%r25], {%r299, %r302, %r305, %r308}; 2026-02-21T09:48:18.3641657Z st.shared.v4.b32 [%r26], {%r311, %r314, %r317, %r320}; 2026-02-21T09:48:18.3641755Z st.shared.v4.b32 [%r27], {%r323, %r326, %r329, %r332}; 2026-02-21T09:48:18.3641822Z // begin inline asm 2026-02-21T09:48:18.3641916Z fence.proxy.async.shared::cta; 2026-02-21T09:48:18.3641978Z // end inline asm 2026-02-21T09:48:18.3642039Z bar.sync 0, 128; 2026-02-21T09:48:18.3642130Z elect.sync %r333|%p117, -1; 2026-02-21T09:48:18.3642200Z and.pred %p115, %p34, %p117; 2026-02-21T09:48:18.3642263Z // begin inline asm 2026-02-21T09:48:18.3642469Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd107, {%r273, %r274}], [%r177]; 2026-02-21T09:48:18.3642536Z // end inline asm 2026-02-21T09:48:18.3642609Z cp.async.bulk.commit_group; 2026-02-21T09:48:18.3642838Z .loc 1 33 84 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:33:84 2026-02-21T09:48:18.3642911Z add.s32 %r341, %r341, 1; 2026-02-21T09:48:18.3642976Z add.s16 %rs14, %rs14, 1; 2026-02-21T09:48:18.3643045Z setp.ne.b32 %p118, %r23, %r341; 2026-02-21T09:48:18.3643115Z @%p118 bra $L__BB0_14; 2026-02-21T09:48:18.3643208Z $L__BB0_15: // %._crit_edge 2026-02-21T09:48:18.3643317Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:48:18.3643378Z bar.sync 0, 128; 2026-02-21T09:48:18.3643577Z .loc 1 33 4 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:33:4 2026-02-21T09:48:18.3643636Z bar.sync 0, 128; 2026-02-21T09:48:18.3643696Z // begin inline asm 2026-02-21T09:48:18.3643835Z 
@%p34 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r334, 64; 2026-02-21T09:48:18.3643896Z // end inline asm 2026-02-21T09:48:18.3643982Z st.shared.b32 [global_smem+73928], 50529027; 2026-02-21T09:48:18.3644045Z barrier.sync 1; 2026-02-21T09:48:18.3644142Z $L__BB0_16: // %common.ret 2026-02-21T09:48:18.3644332Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3644391Z ret; 2026-02-21T09:48:18.3644507Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:48:18.3644611Z mov.b32 %r31, global_smem; 2026-02-21T09:48:18.3644723Z add.s32 %r32, %r31, %r3; 2026-02-21T09:48:18.3644796Z add.s32 %r63, %r31, 73888; 2026-02-21T09:48:18.3644864Z bfe.u32 %r77, %r31, 4, 14; 2026-02-21T09:48:18.3644928Z cvt.u64.u32 %rd24, %r77; 2026-02-21T09:48:18.3645002Z or.b64 %rd14, %rd24, 4611686293338849280; 2026-02-21T09:48:18.3645071Z add.s32 %r78, %r31, 32768; 2026-02-21T09:48:18.3645133Z bfe.u32 %r79, %r78, 4, 14; 2026-02-21T09:48:18.3645196Z cvt.u64.u32 %rd25, %r79; 2026-02-21T09:48:18.3645276Z or.b64 %rd15, %rd25, 4611686293338849280; 2026-02-21T09:48:18.3645342Z add.s32 %r80, %r31, 32; 2026-02-21T09:48:18.3645404Z bfe.u32 %r81, %r80, 4, 14; 2026-02-21T09:48:18.3645466Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.3645591Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3645833Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3645927Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3646001Z barrier.sync 1; 2026-02-21T09:48:18.3646064Z barrier.sync 1; 2026-02-21T09:48:18.3646162Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3646258Z $L__BB0_2: // %.preheader 2026-02-21T09:48:18.3646360Z // =>This Loop Header: Depth=1 2026-02-21T09:48:18.3646457Z // Child Loop BB0_9 Depth 2 2026-02-21T09:48:18.3646552Z // Child Loop BB0_6 Depth 2 2026-02-21T09:48:18.3646743Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19 2026-02-21T09:48:18.3646830Z 
setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:48:18.3646894Z barrier.sync 1; 2026-02-21T09:48:18.3646974Z ld.shared.b8 %r30, [%r32+73924]; 2026-02-21T09:48:18.3647060Z setp.gt.u32 %p2, %r30, 3; 2026-02-21T09:48:18.3647134Z @%p2 bra $L__BB0_4; 2026-02-21T09:48:18.3647222Z // %bb.3: // %.preheader 2026-02-21T09:48:18.3647322Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3647399Z $L_brx_0: .branchtargets 2026-02-21T09:48:18.3647459Z $L__BB0_5, 2026-02-21T09:48:18.3647516Z $L__BB0_8, 2026-02-21T09:48:18.3647581Z $L__BB0_11, 2026-02-21T09:48:18.3647638Z $L__BB0_16; 2026-02-21T09:48:18.3647702Z brx.idx %r30, $L_brx_0; 2026-02-21T09:48:18.3647788Z $L__BB0_5: // %.peel.next 2026-02-21T09:48:18.3647890Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3648123Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3648211Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3648303Z ld.shared.b32 %r65, [global_smem+73728]; 2026-02-21T09:48:18.3648366Z barrier.sync 1; 2026-02-21T09:48:18.3648556Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3648665Z bar.warp.sync -1; 2026-02-21T09:48:18.3648734Z mov.b32 %r335, 0; 2026-02-21T09:48:18.3648807Z // begin inline asm 2026-02-21T09:48:18.3648865Z 2026-02-21T09:48:18.3648930Z { 2026-02-21T09:48:18.3648997Z .reg .pred complete; 2026-02-21T09:48:18.3649058Z waitLoop: 2026-02-21T09:48:18.3649189Z mbarrier.try_wait.parity.shared.b64 complete, [%r63], %r335; 2026-02-21T09:48:18.3649268Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.3649324Z } 2026-02-21T09:48:18.3649329Z 2026-02-21T09:48:18.3649390Z // end inline asm 2026-02-21T09:48:18.3649592Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3649663Z elect.sync %r76|%p12, -1; 2026-02-21T09:48:18.3649726Z mov.b32 %r66, 68157456; 2026-02-21T09:48:18.3649797Z mov.pred %p11, 0; 2026-02-21T09:48:18.3649857Z // begin inline asm 
2026-02-21T09:48:18.3650048Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd14, %rd15, %r66, %p11; 2026-02-21T09:48:18.3650112Z // end inline asm 2026-02-21T09:48:18.3650185Z cvt.u64.u32 %rd26, %r81; 2026-02-21T09:48:18.3650259Z or.b64 %rd16, %rd26, 4611686293338849280; 2026-02-21T09:48:18.3650323Z add.s32 %r82, %r31, 32800; 2026-02-21T09:48:18.3650393Z bfe.u32 %r83, %r82, 4, 14; 2026-02-21T09:48:18.3650456Z cvt.u64.u32 %rd27, %r83; 2026-02-21T09:48:18.3650528Z or.b64 %rd17, %rd27, 4611686293338849280; 2026-02-21T09:48:18.3650601Z mov.pred %p13, -1; 2026-02-21T09:48:18.3650661Z // begin inline asm 2026-02-21T09:48:18.3650810Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd16, %rd17, %r66, %p13; 2026-02-21T09:48:18.3650874Z // end inline asm 2026-02-21T09:48:18.3650945Z add.s32 %r84, %r31, 64; 2026-02-21T09:48:18.3651006Z bfe.u32 %r85, %r84, 4, 14; 2026-02-21T09:48:18.3651068Z cvt.u64.u32 %rd28, %r85; 2026-02-21T09:48:18.3651145Z or.b64 %rd18, %rd28, 4611686293338849280; 2026-02-21T09:48:18.3651237Z add.s32 %r86, %r31, 32832; 2026-02-21T09:48:18.3651302Z bfe.u32 %r87, %r86, 4, 14; 2026-02-21T09:48:18.3651367Z cvt.u64.u32 %rd29, %r87; 2026-02-21T09:48:18.3651444Z or.b64 %rd19, %rd29, 4611686293338849280; 2026-02-21T09:48:18.3651504Z // begin inline asm 2026-02-21T09:48:18.3651645Z @%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd18, %rd19, %r66, %p13; 2026-02-21T09:48:18.3651712Z // end inline asm 2026-02-21T09:48:18.3651774Z add.s32 %r88, %r31, 96; 2026-02-21T09:48:18.3651835Z bfe.u32 %r89, %r88, 4, 14; 2026-02-21T09:48:18.3651903Z cvt.u64.u32 %rd30, %r89; 2026-02-21T09:48:18.3651973Z or.b64 %rd20, %rd30, 4611686293338849280; 2026-02-21T09:48:18.3652037Z add.s32 %r90, %r31, 32864; 2026-02-21T09:48:18.3652098Z bfe.u32 %r91, %r90, 4, 14; 2026-02-21T09:48:18.3652167Z cvt.u64.u32 %rd31, %r91; 2026-02-21T09:48:18.3652236Z or.b64 %rd21, %rd31, 4611686293338849280; 2026-02-21T09:48:18.3652296Z // begin inline asm 2026-02-21T09:48:18.3652445Z 
@%p12 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd20, %rd21, %r66, %p13; 2026-02-21T09:48:18.3652508Z // end inline asm 2026-02-21T09:48:18.3652569Z add.s32 %r92, %r31, 73856; 2026-02-21T09:48:18.3652631Z cvt.u64.u32 %rd22, %r92; 2026-02-21T09:48:18.3652699Z // begin inline asm 2026-02-21T09:48:18.3652835Z @%p12 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd22]; 2026-02-21T09:48:18.3652895Z // end inline asm 2026-02-21T09:48:18.3652965Z add.s32 %r93, %r31, 73920; 2026-02-21T09:48:18.3653026Z cvt.u64.u32 %rd23, %r93; 2026-02-21T09:48:18.3653087Z // begin inline asm 2026-02-21T09:48:18.3653225Z @%p11 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:18.3653317Z // end inline asm 2026-02-21T09:48:18.3653375Z mov.b32 %r337, 1; 2026-02-21T09:48:18.3653437Z mov.b32 %r336, %r335; 2026-02-21T09:48:18.3653554Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.3653655Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.3653845Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3653962Z shl.b32 %r104, %r337, 3; 2026-02-21T09:48:18.3654027Z add.s32 %r106, %r31, %r104; 2026-02-21T09:48:18.3654092Z add.s32 %r107, %r106, 73856; 2026-02-21T09:48:18.3654163Z add.s32 %r94, %r106, 73888; 2026-02-21T09:48:18.3654358Z .loc 1 54 31 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:54:31 2026-02-21T09:48:18.3654422Z shl.b32 %r108, %r337, 13; 2026-02-21T09:48:18.3654486Z add.s32 %r109, %r31, %r108; 2026-02-21T09:48:18.3654730Z .loc 1 55 44 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:55:44 2026-02-21T09:48:18.3654799Z add.s32 %r110, %r109, 32768; 2026-02-21T09:48:18.3654989Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3655067Z bar.warp.sync -1; 2026-02-21T09:48:18.3655133Z // begin inline asm 2026-02-21T09:48:18.3655191Z 2026-02-21T09:48:18.3655302Z { 2026-02-21T09:48:18.3655374Z .reg .pred complete; 
2026-02-21T09:48:18.3655440Z waitLoop: 2026-02-21T09:48:18.3655577Z mbarrier.try_wait.parity.shared.b64 complete, [%r94], %r336; 2026-02-21T09:48:18.3655662Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.3655718Z } 2026-02-21T09:48:18.3655723Z 2026-02-21T09:48:18.3655786Z // end inline asm 2026-02-21T09:48:18.3655989Z .loc 1 56 52 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:56:52 2026-02-21T09:48:18.3656062Z setp.eq.b32 %p31, %r335, 1920; 2026-02-21T09:48:18.3656135Z elect.sync %r111|%p22, -1; 2026-02-21T09:48:18.3656208Z bfe.u32 %r112, %r109, 4, 14; 2026-02-21T09:48:18.3656291Z cvt.u64.u32 %rd42, %r112; 2026-02-21T09:48:18.3656369Z or.b64 %rd32, %rd42, 4611686293338849280; 2026-02-21T09:48:18.3656439Z bfe.u32 %r113, %r110, 4, 14; 2026-02-21T09:48:18.3656522Z cvt.u64.u32 %rd43, %r113; 2026-02-21T09:48:18.3656627Z or.b64 %rd33, %rd43, 4611686293338849280; 2026-02-21T09:48:18.3656695Z // begin inline asm 2026-02-21T09:48:18.3656860Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd32, %rd33, %r66, %p13; 2026-02-21T09:48:18.3656921Z // end inline asm 2026-02-21T09:48:18.3656986Z add.s32 %r114, %r109, 32; 2026-02-21T09:48:18.3657049Z bfe.u32 %r115, %r114, 4, 14; 2026-02-21T09:48:18.3657119Z cvt.u64.u32 %rd44, %r115; 2026-02-21T09:48:18.3657189Z or.b64 %rd34, %rd44, 4611686293338849280; 2026-02-21T09:48:18.3657251Z add.s32 %r116, %r109, 32800; 2026-02-21T09:48:18.3657321Z bfe.u32 %r117, %r116, 4, 14; 2026-02-21T09:48:18.3657384Z cvt.u64.u32 %rd45, %r117; 2026-02-21T09:48:18.3657456Z or.b64 %rd35, %rd45, 4611686293338849280; 2026-02-21T09:48:18.3657517Z // begin inline asm 2026-02-21T09:48:18.3657668Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd34, %rd35, %r66, %p13; 2026-02-21T09:48:18.3657728Z // end inline asm 2026-02-21T09:48:18.3657792Z add.s32 %r118, %r109, 64; 2026-02-21T09:48:18.3657865Z bfe.u32 %r119, %r118, 4, 14; 2026-02-21T09:48:18.3657932Z cvt.u64.u32 %rd46, %r119; 2026-02-21T09:48:18.3658002Z or.b64 %rd36, %rd46, 
4611686293338849280; 2026-02-21T09:48:18.3658073Z add.s32 %r120, %r109, 32832; 2026-02-21T09:48:18.3658134Z bfe.u32 %r121, %r120, 4, 14; 2026-02-21T09:48:18.3658197Z cvt.u64.u32 %rd47, %r121; 2026-02-21T09:48:18.3658267Z or.b64 %rd37, %rd47, 4611686293338849280; 2026-02-21T09:48:18.3658339Z // begin inline asm 2026-02-21T09:48:18.3658480Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd36, %rd37, %r66, %p13; 2026-02-21T09:48:18.3658539Z // end inline asm 2026-02-21T09:48:18.3658654Z add.s32 %r122, %r109, 96; 2026-02-21T09:48:18.3658718Z bfe.u32 %r123, %r122, 4, 14; 2026-02-21T09:48:18.3658780Z cvt.u64.u32 %rd48, %r123; 2026-02-21T09:48:18.3658849Z or.b64 %rd38, %rd48, 4611686293338849280; 2026-02-21T09:48:18.3658918Z add.s32 %r124, %r109, 32864; 2026-02-21T09:48:18.3658980Z bfe.u32 %r125, %r124, 4, 14; 2026-02-21T09:48:18.3659044Z cvt.u64.u32 %rd49, %r125; 2026-02-21T09:48:18.3659124Z or.b64 %rd39, %rd49, 4611686293338849280; 2026-02-21T09:48:18.3659240Z // begin inline asm 2026-02-21T09:48:18.3659383Z @%p22 tcgen05.mma.cta_group::1.kind::f16 [ %r65 + 0 ], %rd38, %rd39, %r66, %p13; 2026-02-21T09:48:18.3659451Z // end inline asm 2026-02-21T09:48:18.3659513Z cvt.u64.u32 %rd40, %r107; 2026-02-21T09:48:18.3659573Z // begin inline asm 2026-02-21T09:48:18.3659708Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd40]; 2026-02-21T09:48:18.3659778Z // end inline asm 2026-02-21T09:48:18.3659846Z and.pred %p30, %p31, %p22; 2026-02-21T09:48:18.3659909Z // begin inline asm 2026-02-21T09:48:18.3660048Z @%p30 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd23]; 2026-02-21T09:48:18.3660108Z // end inline asm 2026-02-21T09:48:18.3660292Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3660364Z add.s32 %r127, %r337, 1; 2026-02-21T09:48:18.3660432Z setp.eq.b32 %p32, %r127, 4; 2026-02-21T09:48:18.3660531Z selp.b32 %r337, 0, %r127, %p32; 2026-02-21T09:48:18.3660599Z selp.b32 %r128, 1, 0, %p32; 
2026-02-21T09:48:18.3660669Z xor.b32 %r336, %r336, %r128; 2026-02-21T09:48:18.3660865Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3660927Z add.s32 %r335, %r335, 64; 2026-02-21T09:48:18.3661005Z setp.lt.u32 %p33, %r335, 1984; 2026-02-21T09:48:18.3661069Z @%p33 bra $L__BB0_6; 2026-02-21T09:48:18.3661157Z // %bb.7: // %.loopexit 2026-02-21T09:48:18.3661262Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3661327Z barrier.sync 1; 2026-02-21T09:48:18.3661411Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3661473Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.3661585Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3661809Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3661896Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3662011Z ld.shared.v2.b32 {%r49, %r53}, [global_smem+73736]; 2026-02-21T09:48:18.3662072Z barrier.sync 1; 2026-02-21T09:48:18.3662264Z .loc 1 21 67 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:21:67 2026-02-21T09:48:18.3662339Z mov.u32 %r35, %ctaid.x; 2026-02-21T09:48:18.3662406Z mov.u32 %r36, %ctaid.y; 2026-02-21T09:48:18.3662469Z mov.u32 %r37, %ctaid.z; 2026-02-21T09:48:18.3662532Z mov.u32 %r38, %nctaid.x; 2026-02-21T09:48:18.3662607Z mov.u32 %r39, %nctaid.y; 2026-02-21T09:48:18.3662678Z mad.lo.s32 %r40, %r37, %r39, %r36; 2026-02-21T09:48:18.3662745Z mad.lo.s32 %r41, %r40, %r38, %r35; 2026-02-21T09:48:18.3662818Z mul.lo.s32 %r42, %r41, 384; 2026-02-21T09:48:18.3663017Z .loc 1 22 68 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:22:68 2026-02-21T09:48:18.3663080Z add.s32 %r43, %r42, 128; 2026-02-21T09:48:18.3663147Z cvt.s64.s32 %rd8, %r43; 2026-02-21T09:48:18.3663221Z add.s64 %rd9, %rd7, %rd8; 2026-02-21T09:48:18.3663291Z cvta.global.u64 %rd13, %rd9; 2026-02-21T09:48:18.3663486Z .loc 1 21 67 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:21:67 2026-02-21T09:48:18.3663560Z 
cvt.s64.s32 %rd10, %r42; 2026-02-21T09:48:18.3663624Z add.s64 %rd11, %rd7, %rd10; 2026-02-21T09:48:18.3663698Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:48:18.3663774Z add.s32 %r13, %r1, -128; 2026-02-21T09:48:18.3663836Z mov.b32 %r339, 0; 2026-02-21T09:48:18.3663931Z mov.b32 %r338, -64; 2026-02-21T09:48:18.3663995Z mov.b32 %r340, %r339; 2026-02-21T09:48:18.3664116Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:48:18.3664218Z // => This Inner Loop Header: Depth=2 2026-02-21T09:48:18.3664413Z .loc 1 0 67 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0:67 2026-02-21T09:48:18.3664521Z setp.lt.u32 %p6, %r13, 32; 2026-02-21T09:48:18.3664588Z setp.eq.b32 %p3, %r13, 0; 2026-02-21T09:48:18.3664853Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3664923Z add.s32 %r338, %r338, 64; 2026-02-21T09:48:18.3664986Z shl.b32 %r55, %r340, 3; 2026-02-21T09:48:18.3665048Z add.s32 %r57, %r31, %r55; 2026-02-21T09:48:18.3665111Z add.s32 %r44, %r57, 73856; 2026-02-21T09:48:18.3665302Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3665367Z // begin inline asm 2026-02-21T09:48:18.3665421Z 2026-02-21T09:48:18.3665482Z { 2026-02-21T09:48:18.3665548Z .reg .pred complete; 2026-02-21T09:48:18.3665609Z waitLoop: 2026-02-21T09:48:18.3665734Z mbarrier.try_wait.parity.shared.b64 complete, [%r44], %r339; 2026-02-21T09:48:18.3665819Z @!complete bra.uni waitLoop; 2026-02-21T09:48:18.3665875Z } 2026-02-21T09:48:18.3665879Z 2026-02-21T09:48:18.3665980Z // end inline asm 2026-02-21T09:48:18.3666177Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3666242Z add.s32 %r50, %r57, 73888; 2026-02-21T09:48:18.3666427Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3666499Z bar.sync 3, 64; 2026-02-21T09:48:18.3666562Z // begin inline asm 2026-02-21T09:48:18.3666686Z @%p3 
mbarrier.arrive.expect_tx.shared.b64 _, [%r50], 16384; 2026-02-21T09:48:18.3666747Z // end inline asm 2026-02-21T09:48:18.3666947Z .loc 1 54 31 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:54:31 2026-02-21T09:48:18.3667010Z shl.b32 %r58, %r340, 13; 2026-02-21T09:48:18.3667073Z add.s32 %r47, %r31, %r58; 2026-02-21T09:48:18.3667292Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3667355Z bar.sync 3, 64; 2026-02-21T09:48:18.3667427Z elect.sync %r59|%p7, -1; 2026-02-21T09:48:18.3667505Z and.pred %p4, %p6, %p7; 2026-02-21T09:48:18.3667567Z // begin inline asm 2026-02-21T09:48:18.3667841Z @%p4 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r47], [%rd12, {%r338, %r49}], [%r50]; 2026-02-21T09:48:18.3667902Z // end inline asm 2026-02-21T09:48:18.3668096Z .loc 1 55 44 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:55:44 2026-02-21T09:48:18.3668160Z add.s32 %r51, %r47, 32768; 2026-02-21T09:48:18.3668333Z .loc 1 0 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:0 2026-02-21T09:48:18.3668403Z bar.sync 3, 64; 2026-02-21T09:48:18.3668470Z elect.sync %r60|%p8, -1; 2026-02-21T09:48:18.3668538Z and.pred %p5, %p6, %p8; 2026-02-21T09:48:18.3668606Z // begin inline asm 2026-02-21T09:48:18.3668866Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r51], [%rd13, {%r338, %r53}], [%r50]; 2026-02-21T09:48:18.3668929Z // end inline asm 2026-02-21T09:48:18.3668991Z add.s32 %r61, %r340, 1; 2026-02-21T09:48:18.3669065Z setp.eq.b32 %p9, %r61, 4; 2026-02-21T09:48:18.3669136Z selp.b32 %r340, 0, %r61, %p9; 2026-02-21T09:48:18.3669200Z selp.b32 %r62, 1, 0, %p9; 2026-02-21T09:48:18.3669273Z xor.b32 %r339, %r339, %r62; 2026-02-21T09:48:18.3669457Z .loc 1 50 57 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:50:57 2026-02-21T09:48:18.3669526Z setp.lt.u32 %p10, %r338, 1984; 2026-02-21T09:48:18.3669595Z @%p10 bra $L__BB0_9; 
2026-02-21T09:48:18.3669738Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3669799Z barrier.sync 1; 2026-02-21T09:48:18.3669884Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:48:18.3669953Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.3670059Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:48:18.3670250Z .loc 1 19 0 // csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py:19 2026-02-21T09:48:18.3670358Z barrier.sync 1; 2026-02-21T09:48:18.3670417Z barrier.sync 1; 2026-02-21T09:48:18.3670480Z bra.uni $L__BB0_2; 2026-02-21T09:48:18.3670538Z $L__tmp1: 2026-02-21T09:48:18.3670606Z $L__func_end0: 2026-02-21T09:48:18.3670697Z // -- End function 2026-02-21T09:48:18.3670752Z } 2026-02-21T09:48:18.3670989Z .file 1 "/tmp/torchinductor_root/sa/csadz6thcfoswmbbriox3s4chcxhmya6zrw2dzflpkcblocrspzv.py" 2026-02-21T09:48:18.3671058Z .section .debug_abbrev 2026-02-21T09:48:18.3671115Z { 2026-02-21T09:48:18.3671219Z .b8 1 // Abbreviation Code 2026-02-21T09:48:18.3671315Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:48:18.3671404Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:48:18.3671493Z .b8 37 // DW_AT_producer 2026-02-21T09:48:18.3671609Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.3671692Z .b8 19 // DW_AT_language 2026-02-21T09:48:18.3671776Z .b8 5 // DW_FORM_data2 2026-02-21T09:48:18.3671865Z .b8 3 // DW_AT_name 2026-02-21T09:48:18.3671945Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.3672030Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:48:18.3672123Z .b8 6 // DW_FORM_data4 2026-02-21T09:48:18.3672203Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:48:18.3672284Z .b8 8 // DW_FORM_string 2026-02-21T09:48:18.3672363Z .b8 0 // EOM(1) 2026-02-21T09:48:18.3672447Z .b8 0 // EOM(2) 2026-02-21T09:48:18.3672545Z .b8 0 // EOM(3) 2026-02-21T09:48:18.3672602Z } 2026-02-21T09:48:18.3672677Z .section .debug_info 2026-02-21T09:48:18.3672737Z { 2026-02-21T09:48:18.3672827Z .b32 104 // Length of Unit 2026-02-21T09:48:18.3672930Z .b8 2 // DWARF version number 2026-02-21T09:48:18.3672988Z 
.b8 0 2026-02-21T09:48:18.3673114Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:48:18.3673210Z .b8 8 // Address Size (in bytes) 2026-02-21T09:48:18.3673328Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:48:18.3673416Z .b8 116 // DW_AT_producer 2026-02-21T09:48:18.3673476Z .b8 114 2026-02-21T09:48:18.3673543Z .b8 105 2026-02-21T09:48:18.3673602Z .b8 116 2026-02-21T09:48:18.3673659Z .b8 111 2026-02-21T09:48:18.3673717Z .b8 110 2026-02-21T09:48:18.3673782Z .b8 0 2026-02-21T09:48:18.3673865Z .b8 2 // DW_AT_language 2026-02-21T09:48:18.3673920Z .b8 0 2026-02-21T09:48:18.3674015Z .b8 99 // DW_AT_name 2026-02-21T09:48:18.3674070Z .b8 115 2026-02-21T09:48:18.3674128Z .b8 97 2026-02-21T09:48:18.3674184Z .b8 100 2026-02-21T09:48:18.3674247Z .b8 122 2026-02-21T09:48:18.3674302Z .b8 54 2026-02-21T09:48:18.3674357Z .b8 116 2026-02-21T09:48:18.3674420Z .b8 104 2026-02-21T09:48:18.3674476Z .b8 99 2026-02-21T09:48:18.3674530Z .b8 102 2026-02-21T09:48:18.3674584Z .b8 111 2026-02-21T09:48:18.3674647Z .b8 115 2026-02-21T09:48:18.3674769Z .b8 119 2026-02-21T09:48:18.3674826Z .b8 109 2026-02-21T09:48:18.3674888Z .b8 98 2026-02-21T09:48:18.3674994Z .b8 98 2026-02-21T09:48:18.3675050Z .b8 114 2026-02-21T09:48:18.3675105Z .b8 105 2026-02-21T09:48:18.3675167Z .b8 111 2026-02-21T09:48:18.3675223Z .b8 120 2026-02-21T09:48:18.3675278Z .b8 51 2026-02-21T09:48:18.3675333Z .b8 115 2026-02-21T09:48:18.3675396Z .b8 52 2026-02-21T09:48:18.3675451Z .b8 99 2026-02-21T09:48:18.3675505Z .b8 104 2026-02-21T09:48:18.3675573Z .b8 99 2026-02-21T09:48:18.3675630Z .b8 120 2026-02-21T09:48:18.3675722Z .b8 104 2026-02-21T09:48:18.3675780Z .b8 109 2026-02-21T09:48:18.3675848Z .b8 121 2026-02-21T09:48:18.3675906Z .b8 97 2026-02-21T09:48:18.3675966Z .b8 54 2026-02-21T09:48:18.3676030Z .b8 122 2026-02-21T09:48:18.3676087Z .b8 114 2026-02-21T09:48:18.3676145Z .b8 119 2026-02-21T09:48:18.3676203Z .b8 50 2026-02-21T09:48:18.3676277Z .b8 100 2026-02-21T09:48:18.3676331Z .b8 122 
2026-02-21T09:48:18.3676386Z .b8 102 2026-02-21T09:48:18.3676449Z .b8 108 2026-02-21T09:48:18.3676503Z .b8 112 2026-02-21T09:48:18.3676558Z .b8 107 2026-02-21T09:48:18.3676612Z .b8 99 2026-02-21T09:48:18.3676677Z .b8 98 2026-02-21T09:48:18.3676731Z .b8 108 2026-02-21T09:48:18.3676786Z .b8 111 2026-02-21T09:48:18.3676840Z .b8 99 2026-02-21T09:48:18.3676901Z .b8 114 2026-02-21T09:48:18.3676955Z .b8 115 2026-02-21T09:48:18.3677010Z .b8 112 2026-02-21T09:48:18.3677071Z .b8 122 2026-02-21T09:48:18.3677126Z .b8 118 2026-02-21T09:48:18.3677182Z .b8 46 2026-02-21T09:48:18.3677238Z .b8 112 2026-02-21T09:48:18.3677299Z .b8 121 2026-02-21T09:48:18.3677390Z .b8 0 2026-02-21T09:48:18.3677494Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:48:18.3677585Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:48:18.3677639Z .b8 116 2026-02-21T09:48:18.3677695Z .b8 109 2026-02-21T09:48:18.3677750Z .b8 112 2026-02-21T09:48:18.3677812Z .b8 47 2026-02-21T09:48:18.3677865Z .b8 116 2026-02-21T09:48:18.3677921Z .b8 111 2026-02-21T09:48:18.3677983Z .b8 114 2026-02-21T09:48:18.3678040Z .b8 99 2026-02-21T09:48:18.3678093Z .b8 104 2026-02-21T09:48:18.3678147Z .b8 105 2026-02-21T09:48:18.3678215Z .b8 110 2026-02-21T09:48:18.3678269Z .b8 100 2026-02-21T09:48:18.3678323Z .b8 117 2026-02-21T09:48:18.3678377Z .b8 99 2026-02-21T09:48:18.3678440Z .b8 116 2026-02-21T09:48:18.3678493Z .b8 111 2026-02-21T09:48:18.3678548Z .b8 114 2026-02-21T09:48:18.3678611Z .b8 95 2026-02-21T09:48:18.3678666Z .b8 114 2026-02-21T09:48:18.3678720Z .b8 111 2026-02-21T09:48:18.3678807Z .b8 111 2026-02-21T09:48:18.3678874Z .b8 116 2026-02-21T09:48:18.3678934Z .b8 47 2026-02-21T09:48:18.3678991Z .b8 115 2026-02-21T09:48:18.3679053Z .b8 97 2026-02-21T09:48:18.3679108Z .b8 0 2026-02-21T09:48:18.3679163Z } 2026-02-21T09:48:18.3679234Z .section .debug_macinfo { } 2026-02-21T09:48:18.3679239Z 2026-02-21T09:48:18.3679335Z ================================================================ 2026-02-21T09:48:18.3679448Z please share the 
reproducer above with Triton project. 2026-02-21T09:48:21.9773441Z 2026-02-21T09:48:21.9778258Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 18.0 configs/s 2026-02-21T09:48:28.3960528Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 155.8 2026-02-21T09:48:28.3962833Z configs/s 2026-02-21T09:48:28.6144185Z [125s] Generation 4 complete: 2026-02-21T09:48:28.6149694Z error=28 2026-02-21T09:48:28.6151389Z ok=64 2026-02-21T09:48:28.6151685Z min=0.1311 2026-02-21T09:48:28.6151860Z mid=0.1966 2026-02-21T09:48:28.6152027Z max=57.4126 2026-02-21T09:48:28.6152181Z best={'block_sizes': [256, 256, 32], 2026-02-21T09:48:28.6152505Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:48:28.6157685Z 'l2_groupings': [2], 2026-02-21T09:48:28.6161616Z 'load_eviction_policies': ['', 'first'], 2026-02-21T09:48:28.6166355Z 'loop_orders': [[0, 1]], 2026-02-21T09:48:28.6168541Z 'maxnreg': 128, 2026-02-21T09:48:28.6168771Z 'num_sm_multiplier': 8, 2026-02-21T09:48:28.6168965Z 'num_stages': 5, 2026-02-21T09:48:28.6169140Z 'num_warps': 2, 2026-02-21T09:48:28.6169330Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:48:28.6169971Z 'range_flattens': [True, None], 2026-02-21T09:48:28.6170182Z 'range_multi_buffers': [None, False], 2026-02-21T09:48:28.6170411Z 'range_num_stages': [0, 0], 2026-02-21T09:48:28.6170609Z 'range_unroll_factors': [0, 0], 2026-02-21T09:48:28.6170840Z 'range_warp_specializes': [True, None]} 2026-02-21T09:48:28.6171095Z [125s] Fitting surrogate: 473 points, 473 targets 2026-02-21T09:48:30.0613573Z [127s] Generation 5 starting: 90 neighbors, 5 active search path(s) 2026-02-21T09:49:10.2915837Z [167s] Timeout after 30s compiling Config(block_sizes=[1024, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=2, num_stages=7, num_warps=1, pid_type='persistent_blocked', 
range_flattens=[None, False], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, None]) 2026-02-21T09:49:10.2933165Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93/93 0.3 configs/s 2026-02-21T09:49:14.4251789Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 93/93 22.8 configs/s 2026-02-21T09:49:21.9178604Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 133.5 2026-02-21T09:49:21.9179581Z configs/s 2026-02-21T09:49:22.1605379Z [179s] Generation 5 complete: 2026-02-21T09:49:22.1607160Z error=24 2026-02-21T09:49:22.1607349Z timeout=1 2026-02-21T09:49:22.1607489Z ok=70 2026-02-21T09:49:22.1607616Z min=0.1198 2026-02-21T09:49:22.1607757Z mid=0.1649 2026-02-21T09:49:22.1607883Z max=13.5680 2026-02-21T09:49:22.1608040Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:49:22.1608298Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:49:22.1608543Z 'l2_groupings': [8], 2026-02-21T09:49:22.1608711Z 'load_eviction_policies': ['', ''], 2026-02-21T09:49:22.1608916Z 'loop_orders': [[1, 0]], 2026-02-21T09:49:22.1609074Z 'maxnreg': 64, 2026-02-21T09:49:22.1609214Z 'num_sm_multiplier': 8, 2026-02-21T09:49:22.1609372Z 'num_stages': 2, 2026-02-21T09:49:22.1609504Z 'num_warps': 1, 2026-02-21T09:49:22.1609661Z 'pid_type': 'persistent_blocked', 2026-02-21T09:49:22.1610117Z 'range_flattens': [False, False], 2026-02-21T09:49:22.1610317Z 'range_multi_buffers': [False, False], 2026-02-21T09:49:22.1610516Z 'range_num_stages': [0, 0], 2026-02-21T09:49:22.1610689Z 'range_unroll_factors': [0, 0], 2026-02-21T09:49:22.1610875Z 'range_warp_specializes': [None, True]} 2026-02-21T09:49:22.1627943Z [179s] Fitting surrogate: 568 points, 568 targets 2026-02-21T09:49:23.7502636Z [180s] Generation 6 starting: 100 neighbors, 5 active search path(s) 2026-02-21T09:49:36.1005956Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 103/103 5.6 configs/s 
2026-02-21T09:49:40.0247411Z 2026-02-21T09:49:40.0252552Z 2026-02-21T09:49:40.0254566Z ================================================================ 2026-02-21T09:49:40.0255343Z Internal Triton PTX codegen error 2026-02-21T09:49:40.0260099Z `ptxas` stderr: 2026-02-21T09:49:40.0262165Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:49:40.0262877Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:49:40.0266871Z 2026-02-21T09:49:40.0267520Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbo3pk_u9.ptx -o /tmp/tmpbo3pk_u9.ptx.o 2026-02-21T09:49:40.0268079Z 2026-02-21T09:49:40.0268083Z 2026-02-21T09:49:40.0268158Z // 2026-02-21T09:49:40.0268326Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:49:40.0268535Z // 2026-02-21T09:49:40.0268612Z 2026-02-21T09:49:40.0268677Z .version 8.7 2026-02-21T09:49:40.0268849Z .target sm_100a 2026-02-21T09:49:40.0269010Z .address_size 64 2026-02-21T09:49:40.0269420Z 2026-02-21T09:49:40.0269569Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:49:40.0269880Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:49:40.0270404Z // @_helion_matmul 2026-02-21T09:49:40.0270628Z .visible .entry _helion_matmul( 2026-02-21T09:49:40.0270878Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:49:40.0271276Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:49:40.0271563Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:49:40.0271845Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:49:40.0272123Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:49:40.0272362Z ) 2026-02-21T09:49:40.0272495Z .reqntid 256 2026-02-21T09:49:40.0272646Z .maxnreg 32 2026-02-21T09:49:40.0272782Z 
{ 2026-02-21T09:49:40.0272930Z .reg .pred %p<143>; 2026-02-21T09:49:40.0273098Z .reg .b32 %r<1656>; 2026-02-21T09:49:40.0273263Z .reg .b64 %rd<609>; 2026-02-21T09:49:40.0273560Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19:0 2026-02-21T09:49:40.0273887Z $L__func_begin0: 2026-02-21T09:49:40.0274172Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19:0 2026-02-21T09:49:40.0274448Z 2026-02-21T09:49:40.0274508Z // %bb.0: 2026-02-21T09:49:40.0274822Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:49:40.0275037Z $L__tmp0: 2026-02-21T09:49:40.0275309Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19 2026-02-21T09:49:40.0275634Z mov.u32 %r1, %tid.x; 2026-02-21T09:49:40.0275795Z shr.u32 %r2, %r1, 5; 2026-02-21T09:49:40.0275977Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:49:40.0276188Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:49:40.0276372Z @%p3 bra $L__BB0_16; 2026-02-21T09:49:40.0276531Z bra.uni $L__BB0_1; 2026-02-21T09:49:40.0276697Z $L__BB0_16: 2026-02-21T09:49:40.0276963Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:0 2026-02-21T09:49:40.0277321Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:49:40.0277562Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:49:40.0277974Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19 2026-02-21T09:49:40.0278345Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.0278578Z setp.lt.u32 %p35, %r1, 32; 2026-02-21T09:49:40.0278787Z mov.b32 %r212, global_smem; 2026-02-21T09:49:40.0278982Z // begin inline asm 2026-02-21T09:49:40.0279283Z @%p35 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r212], 512; 2026-02-21T09:49:40.0279575Z // end inline asm 2026-02-21T09:49:40.0279741Z bar.sync 0, 128; 2026-02-21T09:49:40.0279919Z ld.shared.b32 %r1622, [global_smem]; 2026-02-21T09:49:40.0280115Z bar.sync 0, 128; 2026-02-21T09:49:40.0280277Z // begin inline asm 
2026-02-21T09:49:40.0280514Z @%p35 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:49:40.0280776Z // end inline asm 2026-02-21T09:49:40.0281060Z .loc 1 21 67 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:21:67 2026-02-21T09:49:40.0281396Z mov.u32 %r529, %ctaid.x; 2026-02-21T09:49:40.0281570Z mov.u32 %r530, %ctaid.y; 2026-02-21T09:49:40.0281747Z mov.u32 %r531, %ctaid.z; 2026-02-21T09:49:40.0281926Z mov.u32 %r532, %nctaid.x; 2026-02-21T09:49:40.0282100Z mov.u32 %r533, %nctaid.y; 2026-02-21T09:49:40.0282287Z mad.lo.s32 %r534, %r531, %r533, %r530; 2026-02-21T09:49:40.0282490Z mad.lo.s32 %r535, %r534, %r532, %r529; 2026-02-21T09:49:40.0282690Z shl.b32 %r536, %r535, 8; 2026-02-21T09:49:40.0282857Z cvt.s64.s32 %rd64, %r536; 2026-02-21T09:49:40.0283038Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T09:49:40.0283213Z shl.b32 %r537, %r1, 2; 2026-02-21T09:49:40.0283391Z add.s32 %r213, %r212, %r537; 2026-02-21T09:49:40.0283568Z mov.b32 %r1653, 0; 2026-02-21T09:49:40.0283767Z // begin inline asm 2026-02-21T09:49:40.0283945Z @%p35 st.shared.b32 [ %r213 + 0 ], %r1653; 2026-02-21T09:49:40.0284145Z // end inline asm 2026-02-21T09:49:40.0284305Z bar.warp.sync -1; 2026-02-21T09:49:40.0284469Z setp.eq.b32 %p123, %r1, 0; 2026-02-21T09:49:40.0284652Z cvt.u64.u32 %rd28, %r212; 2026-02-21T09:49:40.0284870Z // begin inline asm 2026-02-21T09:49:40.0285166Z @%p123 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T09:49:40.0285546Z // end inline asm 2026-02-21T09:49:40.0285710Z // begin inline asm 2026-02-21T09:49:40.0285985Z @%p123 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:49:40.0286285Z // end inline asm 2026-02-21T09:49:40.0286450Z mov.b32 %r215, 32; 2026-02-21T09:49:40.0286609Z // begin inline asm 2026-02-21T09:49:40.0286895Z @%p123 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r215; 2026-02-21T09:49:40.0287220Z // end inline asm 
2026-02-21T09:49:40.0287388Z mov.b32 %r216, 256; 2026-02-21T09:49:40.0287551Z // begin inline asm 2026-02-21T09:49:40.0287836Z @%p123 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r216; 2026-02-21T09:49:40.0288168Z // end inline asm 2026-02-21T09:49:40.0288331Z mov.b32 %r217, 2048; 2026-02-21T09:49:40.0288505Z // begin inline asm 2026-02-21T09:49:40.0288826Z @%p123 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r217; 2026-02-21T09:49:40.0289161Z // end inline asm 2026-02-21T09:49:40.0289307Z // begin inline asm 2026-02-21T09:49:40.0289583Z @%p123 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r217; 2026-02-21T09:49:40.0289907Z // end inline asm 2026-02-21T09:49:40.0290056Z mov.b64 %rd36, 4096; 2026-02-21T09:49:40.0290221Z // begin inline asm 2026-02-21T09:49:40.0290508Z @%p123 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:49:40.0290835Z // end inline asm 2026-02-21T09:49:40.0290981Z mov.b32 %r219, 1; 2026-02-21T09:49:40.0291141Z // begin inline asm 2026-02-21T09:49:40.0291430Z @%p123 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r219; 2026-02-21T09:49:40.0291760Z // end inline asm 2026-02-21T09:49:40.0291917Z // begin inline asm 2026-02-21T09:49:40.0292247Z @%p123 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r219; 2026-02-21T09:49:40.0292579Z // end inline asm 2026-02-21T09:49:40.0292726Z // begin inline asm 2026-02-21T09:49:40.0292999Z @%p123 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:49:40.0293300Z // end inline asm 2026-02-21T09:49:40.0293458Z // begin inline asm 2026-02-21T09:49:40.0293750Z @%p123 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.0294076Z // end inline asm 2026-02-21T09:49:40.0294233Z // begin inline asm 2026-02-21T09:49:40.0294500Z @%p123 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:49:40.0294845Z // end inline asm 2026-02-21T09:49:40.0294994Z // begin inline asm 2026-02-21T09:49:40.0295258Z @%p123 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.0295562Z // end inline asm 2026-02-21T09:49:40.0295712Z // begin inline asm 2026-02-21T09:49:40.0296113Z @%p35 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:49:40.0296550Z // end inline asm 2026-02-21T09:49:40.0296705Z // begin inline asm 2026-02-21T09:49:40.0296940Z @%p35 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:49:40.0297232Z @%p35 cp.async.bulk.commit_group ; 2026-02-21T09:49:40.0297459Z @%p35 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:49:40.0297657Z // end inline asm 2026-02-21T09:49:40.0297812Z bar.sync 0, 128; 2026-02-21T09:49:40.0298096Z .loc 1 22 68 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:22:68 2026-02-21T09:49:40.0298489Z add.s64 %rd61, %rd43, 128; 2026-02-21T09:49:40.0298662Z bar.sync 0, 128; 2026-02-21T09:49:40.0298817Z // begin inline asm 2026-02-21T09:49:40.0298988Z @%p35 st.shared.b32 [ %r213 + 0 ], %r1653; 2026-02-21T09:49:40.0299199Z // end inline asm 2026-02-21T09:49:40.0299353Z bar.warp.sync -1; 2026-02-21T09:49:40.0299557Z // begin inline asm 2026-02-21T09:49:40.0299852Z @%p123 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T09:49:40.0300173Z // end inline asm 2026-02-21T09:49:40.0300335Z // begin inline asm 2026-02-21T09:49:40.0300617Z @%p123 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:49:40.0300912Z // end inline asm 2026-02-21T09:49:40.0301061Z // begin inline asm 2026-02-21T09:49:40.0301334Z @%p123 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r215; 2026-02-21T09:49:40.0301649Z // end 
inline asm 2026-02-21T09:49:40.0301798Z mov.b32 %r224, 128; 2026-02-21T09:49:40.0301963Z // begin inline asm 2026-02-21T09:49:40.0302227Z @%p123 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r224; 2026-02-21T09:49:40.0302541Z // end inline asm 2026-02-21T09:49:40.0302689Z // begin inline asm 2026-02-21T09:49:40.0303010Z @%p123 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r217; 2026-02-21T09:49:40.0303325Z // end inline asm 2026-02-21T09:49:40.0303483Z mov.b32 %r226, 12288; 2026-02-21T09:49:40.0303653Z // begin inline asm 2026-02-21T09:49:40.0303934Z @%p123 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r226; 2026-02-21T09:49:40.0304253Z // end inline asm 2026-02-21T09:49:40.0304399Z // begin inline asm 2026-02-21T09:49:40.0304715Z @%p123 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:49:40.0305036Z // end inline asm 2026-02-21T09:49:40.0305193Z // begin inline asm 2026-02-21T09:49:40.0305485Z @%p123 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r219; 2026-02-21T09:49:40.0305831Z // end inline asm 2026-02-21T09:49:40.0305985Z // begin inline asm 2026-02-21T09:49:40.0306300Z @%p123 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r219; 2026-02-21T09:49:40.0306630Z // end inline asm 2026-02-21T09:49:40.0306782Z // begin inline asm 2026-02-21T09:49:40.0307054Z @%p123 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:49:40.0307355Z // end inline asm 2026-02-21T09:49:40.0307502Z // begin inline asm 2026-02-21T09:49:40.0307787Z @%p123 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.0308104Z // end inline asm 2026-02-21T09:49:40.0308256Z // begin inline asm 2026-02-21T09:49:40.0308519Z @%p123 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 
2026-02-21T09:49:40.0308836Z // end inline asm 2026-02-21T09:49:40.0308984Z // begin inline asm 2026-02-21T09:49:40.0309249Z @%p123 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.0309544Z // end inline asm 2026-02-21T09:49:40.0309690Z // begin inline asm 2026-02-21T09:49:40.0310089Z @%p35 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:49:40.0310513Z // end inline asm 2026-02-21T09:49:40.0310672Z // begin inline asm 2026-02-21T09:49:40.0310907Z @%p35 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:49:40.0311196Z @%p35 cp.async.bulk.commit_group ; 2026-02-21T09:49:40.0311417Z @%p35 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:49:40.0311617Z // end inline asm 2026-02-21T09:49:40.0311774Z bar.sync 0, 128; 2026-02-21T09:49:40.0312050Z .loc 1 29 35 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:29:35 2026-02-21T09:49:40.0312417Z mul.lo.s32 %r43, %r529, 3; 2026-02-21T09:49:40.0312721Z .loc 1 30 37 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:30:37 2026-02-21T09:49:40.0313051Z add.s32 %r538, %r43, 3; 2026-02-21T09:49:40.0313349Z .loc 1 30 49 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:30:49 2026-02-21T09:49:40.0313667Z min.s32 %r539, %r538, 768; 2026-02-21T09:49:40.0314014Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0314350Z sub.s32 %r542, %r539, %r43; 2026-02-21T09:49:40.0314534Z shl.b32 %r1642, %r542, 6; 2026-02-21T09:49:40.0314867Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0315219Z shfl.sync.idx.b32 %r543, %r2, 0, 31, -1; 2026-02-21T09:49:40.0315432Z shl.b32 %r544, %r543, 21; 2026-02-21T09:49:40.0315609Z and.b32 %r545, %r544, 6291456; 2026-02-21T09:49:40.0315801Z add.s32 %r229, %r545, %r1622; 2026-02-21T09:49:40.0315980Z mov.pred %p73, -1; 
2026-02-21T09:49:40.0316148Z // begin inline asm 2026-02-21T09:49:40.0316605Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 0], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0317115Z // end inline asm 2026-02-21T09:49:40.0317337Z // begin inline asm 2026-02-21T09:49:40.0317896Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 16], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0318502Z // end inline asm 2026-02-21T09:49:40.0318669Z // begin inline asm 2026-02-21T09:49:40.0319110Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 32], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0319598Z // end inline asm 2026-02-21T09:49:40.0319758Z // begin inline asm 2026-02-21T09:49:40.0320221Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 48], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0320697Z // end inline asm 2026-02-21T09:49:40.0320885Z // begin inline asm 2026-02-21T09:49:40.0321341Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 64], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0321841Z // end inline asm 2026-02-21T09:49:40.0321991Z // begin inline asm 2026-02-21T09:49:40.0322447Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 80], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0322946Z // end inline asm 2026-02-21T09:49:40.0323092Z // begin inline asm 2026-02-21T09:49:40.0323531Z @%p73 
tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 96], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0323998Z // end inline asm 2026-02-21T09:49:40.0324150Z // begin inline asm 2026-02-21T09:49:40.0324597Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 112], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0325110Z // end inline asm 2026-02-21T09:49:40.0325268Z // begin inline asm 2026-02-21T09:49:40.0325720Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 128], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0326199Z // end inline asm 2026-02-21T09:49:40.0326347Z // begin inline asm 2026-02-21T09:49:40.0326812Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 144], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0327340Z // end inline asm 2026-02-21T09:49:40.0327493Z // begin inline asm 2026-02-21T09:49:40.0327947Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 160], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0328500Z // end inline asm 2026-02-21T09:49:40.0328662Z // begin inline asm 2026-02-21T09:49:40.0329125Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 176], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0329639Z // end inline asm 2026-02-21T09:49:40.0329795Z // begin inline asm 2026-02-21T09:49:40.0330216Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 192], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, 
%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0330687Z // end inline asm 2026-02-21T09:49:40.0330833Z // begin inline asm 2026-02-21T09:49:40.0331263Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 208], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0331756Z // end inline asm 2026-02-21T09:49:40.0331906Z // begin inline asm 2026-02-21T09:49:40.0332329Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 224], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0332783Z // end inline asm 2026-02-21T09:49:40.0332935Z // begin inline asm 2026-02-21T09:49:40.0333355Z @%p73 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r229 + 240], {%r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653, %r1653}; 2026-02-21T09:49:40.0333819Z // end inline asm 2026-02-21T09:49:40.0333973Z // begin inline asm 2026-02-21T09:49:40.0334137Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:49:40.0334323Z // end inline asm 2026-02-21T09:49:40.0334467Z bar.sync 0, 128; 2026-02-21T09:49:40.0334821Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0335161Z add.s32 %r501, %r212, 180224; 2026-02-21T09:49:40.0335347Z // begin inline asm 2026-02-21T09:49:40.0335544Z @%p123 mbarrier.init.shared::cta.b64 [%r501], 1; 2026-02-21T09:49:40.0335763Z // end inline asm 2026-02-21T09:49:40.0335920Z bar.sync 0, 128; 2026-02-21T09:49:40.0336072Z add.s32 %r502, %r212, 180232; 2026-02-21T09:49:40.0336252Z // begin inline asm 2026-02-21T09:49:40.0336438Z @%p123 mbarrier.init.shared::cta.b64 [%r502], 1; 2026-02-21T09:49:40.0336658Z // end inline asm 2026-02-21T09:49:40.0336803Z bar.sync 0, 128; 2026-02-21T09:49:40.0336965Z add.s32 %r503, %r212, 180240; 
2026-02-21T09:49:40.0337139Z // begin inline asm 2026-02-21T09:49:40.0337331Z @%p123 mbarrier.init.shared::cta.b64 [%r503], 1; 2026-02-21T09:49:40.0337544Z // end inline asm 2026-02-21T09:49:40.0337692Z bar.sync 0, 128; 2026-02-21T09:49:40.0337847Z add.s32 %r504, %r212, 180248; 2026-02-21T09:49:40.0338016Z // begin inline asm 2026-02-21T09:49:40.0338201Z @%p123 mbarrier.init.shared::cta.b64 [%r504], 1; 2026-02-21T09:49:40.0338413Z // end inline asm 2026-02-21T09:49:40.0338565Z bar.sync 0, 128; 2026-02-21T09:49:40.0338715Z add.s32 %r505, %r212, 180256; 2026-02-21T09:49:40.0338893Z // begin inline asm 2026-02-21T09:49:40.0339079Z @%p123 mbarrier.init.shared::cta.b64 [%r505], 1; 2026-02-21T09:49:40.0339287Z // end inline asm 2026-02-21T09:49:40.0339440Z bar.sync 0, 128; 2026-02-21T09:49:40.0339588Z add.s32 %r506, %r212, 180264; 2026-02-21T09:49:40.0339765Z // begin inline asm 2026-02-21T09:49:40.0339943Z @%p123 mbarrier.init.shared::cta.b64 [%r506], 1; 2026-02-21T09:49:40.0340156Z // end inline asm 2026-02-21T09:49:40.0340335Z bar.sync 0, 128; 2026-02-21T09:49:40.0340493Z add.s32 %r507, %r212, 180272; 2026-02-21T09:49:40.0340664Z // begin inline asm 2026-02-21T09:49:40.0340852Z @%p123 mbarrier.init.shared::cta.b64 [%r507], 1; 2026-02-21T09:49:40.0341066Z // end inline asm 2026-02-21T09:49:40.0341222Z add.s32 %r508, %r212, 180288; 2026-02-21T09:49:40.0341405Z // begin inline asm 2026-02-21T09:49:40.0341587Z @%p123 mbarrier.init.shared::cta.b64 [%r508], 1; 2026-02-21T09:49:40.0341834Z // end inline asm 2026-02-21T09:49:40.0341981Z bar.sync 0, 128; 2026-02-21T09:49:40.0342139Z add.s32 %r509, %r212, 180296; 2026-02-21T09:49:40.0342306Z // begin inline asm 2026-02-21T09:49:40.0342495Z @%p123 mbarrier.init.shared::cta.b64 [%r509], 1; 2026-02-21T09:49:40.0342709Z // end inline asm 2026-02-21T09:49:40.0342855Z bar.sync 0, 128; 2026-02-21T09:49:40.0343014Z add.s32 %r510, %r212, 180304; 2026-02-21T09:49:40.0343183Z // begin inline asm 2026-02-21T09:49:40.0343372Z @%p123 
mbarrier.init.shared::cta.b64 [%r510], 1; 2026-02-21T09:49:40.0343580Z // end inline asm 2026-02-21T09:49:40.0343731Z bar.sync 0, 128; 2026-02-21T09:49:40.0343881Z add.s32 %r511, %r212, 180312; 2026-02-21T09:49:40.0344060Z // begin inline asm 2026-02-21T09:49:40.0344240Z @%p123 mbarrier.init.shared::cta.b64 [%r511], 1; 2026-02-21T09:49:40.0344456Z // end inline asm 2026-02-21T09:49:40.0344609Z bar.sync 0, 128; 2026-02-21T09:49:40.0344825Z add.s32 %r512, %r212, 180320; 2026-02-21T09:49:40.0345010Z // begin inline asm 2026-02-21T09:49:40.0345199Z @%p123 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T09:49:40.0345418Z // end inline asm 2026-02-21T09:49:40.0345569Z bar.sync 0, 128; 2026-02-21T09:49:40.0345732Z add.s32 %r513, %r212, 180328; 2026-02-21T09:49:40.0345907Z // begin inline asm 2026-02-21T09:49:40.0346102Z @%p123 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T09:49:40.0346319Z // end inline asm 2026-02-21T09:49:40.0346469Z bar.sync 0, 128; 2026-02-21T09:49:40.0346640Z add.s32 %r514, %r212, 180336; 2026-02-21T09:49:40.0346818Z // begin inline asm 2026-02-21T09:49:40.0347011Z @%p123 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T09:49:40.0347223Z // end inline asm 2026-02-21T09:49:40.0347509Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0347868Z bar.sync 0, 128; 2026-02-21T09:49:40.0348029Z // begin inline asm 2026-02-21T09:49:40.0348232Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r501]; 2026-02-21T09:49:40.0348457Z // end inline asm 2026-02-21T09:49:40.0348612Z bar.sync 0, 128; 2026-02-21T09:49:40.0348760Z // begin inline asm 2026-02-21T09:49:40.0348956Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r502]; 2026-02-21T09:49:40.0349173Z // end inline asm 2026-02-21T09:49:40.0349327Z bar.sync 0, 128; 2026-02-21T09:49:40.0349474Z // begin inline asm 2026-02-21T09:49:40.0349668Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r503]; 2026-02-21T09:49:40.0349881Z // end inline asm 
2026-02-21T09:49:40.0350041Z bar.sync 0, 128; 2026-02-21T09:49:40.0350197Z // begin inline asm 2026-02-21T09:49:40.0350386Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r504]; 2026-02-21T09:49:40.0350608Z // end inline asm 2026-02-21T09:49:40.0350769Z bar.sync 0, 128; 2026-02-21T09:49:40.0350922Z // begin inline asm 2026-02-21T09:49:40.0351109Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r505]; 2026-02-21T09:49:40.0351331Z // end inline asm 2026-02-21T09:49:40.0351478Z bar.sync 0, 128; 2026-02-21T09:49:40.0351632Z // begin inline asm 2026-02-21T09:49:40.0351823Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r506]; 2026-02-21T09:49:40.0352036Z // end inline asm 2026-02-21T09:49:40.0352189Z bar.sync 0, 128; 2026-02-21T09:49:40.0352336Z // begin inline asm 2026-02-21T09:49:40.0352528Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r507]; 2026-02-21T09:49:40.0352737Z // end inline asm 2026-02-21T09:49:40.0353034Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0353403Z bar.sync 0, 128; 2026-02-21T09:49:40.0353565Z add.s32 %r522, %r212, 180352; 2026-02-21T09:49:40.0353743Z // begin inline asm 2026-02-21T09:49:40.0353924Z @%p123 mbarrier.init.shared::cta.b64 [%r522], 1; 2026-02-21T09:49:40.0354140Z // end inline asm 2026-02-21T09:49:40.0354285Z bar.sync 0, 128; 2026-02-21T09:49:40.0354445Z add.s32 %r523, %r212, 180360; 2026-02-21T09:49:40.0354619Z // begin inline asm 2026-02-21T09:49:40.0354879Z @%p123 mbarrier.init.shared::cta.b64 [%r523], 1; 2026-02-21T09:49:40.0355092Z // end inline asm 2026-02-21T09:49:40.0355247Z add.s32 %r524, %r212, 180368; 2026-02-21T09:49:40.0355419Z // begin inline asm 2026-02-21T09:49:40.0355606Z @%p123 mbarrier.init.shared::cta.b64 [%r524], 1; 2026-02-21T09:49:40.0355818Z // end inline asm 2026-02-21T09:49:40.0355962Z bar.sync 0, 128; 2026-02-21T09:49:40.0356121Z add.s32 %r525, %r212, 180376; 2026-02-21T09:49:40.0356291Z // begin inline asm 2026-02-21T09:49:40.0356483Z @%p123 
mbarrier.init.shared::cta.b64 [%r525], 1; 2026-02-21T09:49:40.0356690Z // end inline asm 2026-02-21T09:49:40.0356979Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0357397Z bar.sync 0, 128; 2026-02-21T09:49:40.0357582Z // begin inline asm 2026-02-21T09:49:40.0357859Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r524]; 2026-02-21T09:49:40.0358115Z // end inline asm 2026-02-21T09:49:40.0358350Z bar.sync 0, 128; 2026-02-21T09:49:40.0358518Z // begin inline asm 2026-02-21T09:49:40.0358734Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r525]; 2026-02-21T09:49:40.0358986Z // end inline asm 2026-02-21T09:49:40.0359284Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0359624Z st.shared.b32 [global_smem+180384], 33554689; 2026-02-21T09:49:40.0359851Z st.shared.b32 [global_smem+172032], %r1622; 2026-02-21T09:49:40.0360054Z barrier.sync 1; 2026-02-21T09:49:40.0360224Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.0360427Z barrier.sync 1; 2026-02-21T09:49:40.0360580Z setp.lt.s32 %p116, %r1642, 1; 2026-02-21T09:49:40.0360766Z mov.b32 %r1655, %r1653; 2026-02-21T09:49:40.0360934Z @%p116 bra $L__BB0_23; 2026-02-21T09:49:40.0361125Z // %bb.17: // %.lr.ph12 2026-02-21T09:49:40.0361476Z .loc 1 0 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:107 2026-02-21T09:49:40.0361823Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:49:40.0362034Z shl.b32 %r540, %r1, 3; 2026-02-21T09:49:40.0362192Z and.b32 %r44, %r540, 120; 2026-02-21T09:49:40.0362360Z shr.u32 %r541, %r1, 4; 2026-02-21T09:49:40.0362521Z bfe.u32 %r45, %r1, 4, 3; 2026-02-21T09:49:40.0362686Z or.b32 %r46, %r45, 8; 2026-02-21T09:49:40.0362838Z or.b32 %r47, %r45, 16; 2026-02-21T09:49:40.0362998Z or.b32 %r48, %r45, 24; 2026-02-21T09:49:40.0363149Z or.b32 %r49, %r45, 32; 2026-02-21T09:49:40.0363307Z or.b32 %r50, %r45, 40; 2026-02-21T09:49:40.0363462Z or.b32 %r51, %r45, 48; 
2026-02-21T09:49:40.0363614Z or.b32 %r52, %r541, 56; 2026-02-21T09:49:40.0363777Z or.b32 %r53, %r45, 64; 2026-02-21T09:49:40.0363927Z or.b32 %r54, %r45, 72; 2026-02-21T09:49:40.0364084Z or.b32 %r55, %r45, 80; 2026-02-21T09:49:40.0364234Z or.b32 %r56, %r45, 88; 2026-02-21T09:49:40.0364393Z or.b32 %r57, %r45, 96; 2026-02-21T09:49:40.0364551Z or.b32 %r58, %r45, 104; 2026-02-21T09:49:40.0364763Z or.b32 %r59, %r45, 112; 2026-02-21T09:49:40.0364931Z or.b32 %r60, %r541, 120; 2026-02-21T09:49:40.0365104Z or.b32 %r61, %r45, 128; 2026-02-21T09:49:40.0365270Z or.b32 %r62, %r45, 136; 2026-02-21T09:49:40.0365428Z or.b32 %r63, %r45, 144; 2026-02-21T09:49:40.0365594Z or.b32 %r64, %r45, 152; 2026-02-21T09:49:40.0365751Z or.b32 %r65, %r45, 160; 2026-02-21T09:49:40.0365916Z or.b32 %r66, %r45, 168; 2026-02-21T09:49:40.0366073Z or.b32 %r67, %r45, 176; 2026-02-21T09:49:40.0366241Z or.b32 %r68, %r541, 184; 2026-02-21T09:49:40.0366403Z or.b32 %r69, %r45, 192; 2026-02-21T09:49:40.0366603Z or.b32 %r70, %r45, 200; 2026-02-21T09:49:40.0366762Z or.b32 %r71, %r45, 208; 2026-02-21T09:49:40.0366925Z or.b32 %r72, %r45, 216; 2026-02-21T09:49:40.0367089Z or.b32 %r73, %r45, 224; 2026-02-21T09:49:40.0367246Z or.b32 %r74, %r45, 232; 2026-02-21T09:49:40.0367413Z or.b32 %r75, %r45, 240; 2026-02-21T09:49:40.0367575Z or.b32 %r76, %r541, 248; 2026-02-21T09:49:40.0367941Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0368269Z add.s32 %r1651, %r43, -1; 2026-02-21T09:49:40.0368449Z shl.b32 %r548, %r1, 10; 2026-02-21T09:49:40.0368615Z and.b32 %r549, %r548, 6144; 2026-02-21T09:49:40.0368798Z shl.b32 %r550, %r1, 4; 2026-02-21T09:49:40.0368964Z and.b32 %r551, %r550, 2032; 2026-02-21T09:49:40.0369146Z or.b32 %r552, %r549, %r551; 2026-02-21T09:49:40.0369327Z add.s32 %r554, %r212, 172032; 2026-02-21T09:49:40.0369502Z add.s32 %r80, %r554, %r552; 2026-02-21T09:49:40.0369688Z xor.b32 %r555, %r552, 32; 2026-02-21T09:49:40.0369864Z add.s32 %r81, %r554, %r555; 
2026-02-21T09:49:40.0370049Z xor.b32 %r556, %r552, 64; 2026-02-21T09:49:40.0370221Z add.s32 %r82, %r554, %r556; 2026-02-21T09:49:40.0370398Z xor.b32 %r557, %r552, 96; 2026-02-21T09:49:40.0370565Z add.s32 %r83, %r554, %r557; 2026-02-21T09:49:40.0370743Z and.b32 %r558, %r1, 96; 2026-02-21T09:49:40.0370935Z shl.b32 %r559, %r558, 6; 2026-02-21T09:49:40.0371111Z shl.b32 %r560, %r1, 5; 2026-02-21T09:49:40.0371280Z and.b32 %r561, %r560, 96; 2026-02-21T09:49:40.0371449Z and.b32 %r562, %r550, 384; 2026-02-21T09:49:40.0371627Z and.b32 %r564, %r537, 16; 2026-02-21T09:49:40.0371792Z or.b32 %r565, %r559, %r561; 2026-02-21T09:49:40.0371968Z or.b32 %r566, %r562, %r558; 2026-02-21T09:49:40.0372140Z xor.b32 %r567, %r565, %r566; 2026-02-21T09:49:40.0372323Z add.s32 %r568, %r554, %r564; 2026-02-21T09:49:40.0372494Z add.s32 %r849, %r568, %r567; 2026-02-21T09:49:40.0372675Z add.s32 %r854, %r849, 512; 2026-02-21T09:49:40.0372854Z add.s32 %r859, %r849, 1024; 2026-02-21T09:49:40.0373023Z add.s32 %r864, %r849, 1536; 2026-02-21T09:49:40.0373200Z mov.b32 %r1648, -1; 2026-02-21T09:49:40.0373361Z mov.b32 %r1655, %r1653; 2026-02-21T09:49:40.0373533Z mov.b32 %r1650, %r1653; 2026-02-21T09:49:40.0373693Z mov.b32 %r1649, %r1653; 2026-02-21T09:49:40.0373893Z bra.uni $L__BB0_18; 2026-02-21T09:49:40.0374116Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.0374374Z shl.b32 %r1134, %r1653, 3; 2026-02-21T09:49:40.0374552Z add.s32 %r1136, %r212, %r1134; 2026-02-21T09:49:40.0374771Z add.s32 %r571, %r1136, 180352; 2026-02-21T09:49:40.0375086Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0375410Z shl.b32 %r1137, %r1653, 8; 2026-02-21T09:49:40.0375586Z bar.sync 0, 128; 2026-02-21T09:49:40.0375739Z // begin inline asm 2026-02-21T09:49:40.0375897Z 2026-02-21T09:49:40.0376020Z { 2026-02-21T09:49:40.0376163Z .reg .pred complete; 2026-02-21T09:49:40.0376323Z waitLoop: 2026-02-21T09:49:40.0376545Z mbarrier.try_wait.parity.shared.b64 complete, 
[%r571], %r1655; 2026-02-21T09:49:40.0376812Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.0376990Z } 2026-02-21T09:49:40.0377063Z 2026-02-21T09:49:40.0377134Z // end inline asm 2026-02-21T09:49:40.0377419Z .loc 1 43 32 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:43:32 2026-02-21T09:49:40.0377780Z or.b32 %r1138, %r1649, %r44; 2026-02-21T09:49:40.0378084Z .loc 1 45 32 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:45:32 2026-02-21T09:49:40.0378432Z add.s32 %r1139, %r1650, %r45; 2026-02-21T09:49:40.0378617Z add.s32 %r1140, %r1650, %r46; 2026-02-21T09:49:40.0378804Z add.s32 %r1141, %r1650, %r47; 2026-02-21T09:49:40.0378989Z add.s32 %r1142, %r1650, %r48; 2026-02-21T09:49:40.0379167Z add.s32 %r1143, %r1650, %r49; 2026-02-21T09:49:40.0379351Z add.s32 %r1144, %r1650, %r50; 2026-02-21T09:49:40.0379561Z add.s32 %r1145, %r1650, %r51; 2026-02-21T09:49:40.0379746Z add.s32 %r1146, %r1650, %r52; 2026-02-21T09:49:40.0379923Z add.s32 %r1147, %r1650, %r53; 2026-02-21T09:49:40.0380110Z add.s32 %r1148, %r1650, %r54; 2026-02-21T09:49:40.0380285Z add.s32 %r1149, %r1650, %r55; 2026-02-21T09:49:40.0380470Z add.s32 %r1150, %r1650, %r56; 2026-02-21T09:49:40.0380650Z add.s32 %r1151, %r1650, %r57; 2026-02-21T09:49:40.0380854Z add.s32 %r1152, %r1650, %r58; 2026-02-21T09:49:40.0381034Z add.s32 %r1153, %r1650, %r59; 2026-02-21T09:49:40.0381208Z add.s32 %r1154, %r1650, %r60; 2026-02-21T09:49:40.0381391Z add.s32 %r1155, %r1650, %r61; 2026-02-21T09:49:40.0381566Z add.s32 %r1156, %r1650, %r62; 2026-02-21T09:49:40.0381749Z add.s32 %r1157, %r1650, %r63; 2026-02-21T09:49:40.0381922Z add.s32 %r1158, %r1650, %r64; 2026-02-21T09:49:40.0382103Z add.s32 %r1159, %r1650, %r65; 2026-02-21T09:49:40.0382276Z add.s32 %r1160, %r1650, %r66; 2026-02-21T09:49:40.0382457Z add.s32 %r1161, %r1650, %r67; 2026-02-21T09:49:40.0382640Z add.s32 %r1162, %r1650, %r68; 2026-02-21T09:49:40.0382814Z add.s32 %r1163, %r1650, %r69; 2026-02-21T09:49:40.0382994Z add.s32 %r1164, %r1650, %r70; 
2026-02-21T09:49:40.0383167Z add.s32 %r1165, %r1650, %r71; 2026-02-21T09:49:40.0383358Z add.s32 %r1166, %r1650, %r72; 2026-02-21T09:49:40.0383521Z add.s32 %r1167, %r1650, %r73; 2026-02-21T09:49:40.0383717Z add.s32 %r1168, %r1650, %r74; 2026-02-21T09:49:40.0383881Z add.s32 %r1169, %r1650, %r75; 2026-02-21T09:49:40.0384047Z add.s32 %r1170, %r1650, %r76; 2026-02-21T09:49:40.0384331Z .loc 1 59 53 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:59:53 2026-02-21T09:49:40.0384647Z mad.lo.s32 %r1171, %r1139, 12288, %r1138; 2026-02-21T09:49:40.0384886Z mad.lo.s32 %r1172, %r1140, 12288, %r1138; 2026-02-21T09:49:40.0385080Z mad.lo.s32 %r1173, %r1141, 12288, %r1138; 2026-02-21T09:49:40.0385276Z mad.lo.s32 %r1174, %r1142, 12288, %r1138; 2026-02-21T09:49:40.0385478Z mad.lo.s32 %r1175, %r1143, 12288, %r1138; 2026-02-21T09:49:40.0385682Z mad.lo.s32 %r1176, %r1144, 12288, %r1138; 2026-02-21T09:49:40.0385880Z mad.lo.s32 %r1177, %r1145, 12288, %r1138; 2026-02-21T09:49:40.0386083Z mad.lo.s32 %r1178, %r1146, 12288, %r1138; 2026-02-21T09:49:40.0386285Z mad.lo.s32 %r1179, %r1147, 12288, %r1138; 2026-02-21T09:49:40.0386520Z mad.lo.s32 %r1180, %r1148, 12288, %r1138; 2026-02-21T09:49:40.0386738Z mad.lo.s32 %r1181, %r1149, 12288, %r1138; 2026-02-21T09:49:40.0386927Z mad.lo.s32 %r1182, %r1150, 12288, %r1138; 2026-02-21T09:49:40.0387120Z mad.lo.s32 %r1183, %r1151, 12288, %r1138; 2026-02-21T09:49:40.0387306Z mad.lo.s32 %r1184, %r1152, 12288, %r1138; 2026-02-21T09:49:40.0387499Z mad.lo.s32 %r1185, %r1153, 12288, %r1138; 2026-02-21T09:49:40.0387685Z mad.lo.s32 %r1186, %r1154, 12288, %r1138; 2026-02-21T09:49:40.0387880Z mad.lo.s32 %r1187, %r1155, 12288, %r1138; 2026-02-21T09:49:40.0388075Z mad.lo.s32 %r1188, %r1156, 12288, %r1138; 2026-02-21T09:49:40.0388266Z mad.lo.s32 %r1189, %r1157, 12288, %r1138; 2026-02-21T09:49:40.0388463Z mad.lo.s32 %r1190, %r1158, 12288, %r1138; 2026-02-21T09:49:40.0388656Z mad.lo.s32 %r1191, %r1159, 12288, %r1138; 2026-02-21T09:49:40.0388849Z mad.lo.s32 %r1192, 
%r1160, 12288, %r1138; 2026-02-21T09:49:40.0389035Z mad.lo.s32 %r1193, %r1161, 12288, %r1138; 2026-02-21T09:49:40.0389231Z mad.lo.s32 %r1194, %r1162, 12288, %r1138; 2026-02-21T09:49:40.0389424Z mad.lo.s32 %r1195, %r1163, 12288, %r1138; 2026-02-21T09:49:40.0389624Z mad.lo.s32 %r1196, %r1164, 12288, %r1138; 2026-02-21T09:49:40.0389831Z mad.lo.s32 %r1197, %r1165, 12288, %r1138; 2026-02-21T09:49:40.0390020Z mad.lo.s32 %r1198, %r1166, 12288, %r1138; 2026-02-21T09:49:40.0390214Z mad.lo.s32 %r1199, %r1167, 12288, %r1138; 2026-02-21T09:49:40.0390403Z mad.lo.s32 %r1200, %r1168, 12288, %r1138; 2026-02-21T09:49:40.0390594Z mad.lo.s32 %r1201, %r1169, 12288, %r1138; 2026-02-21T09:49:40.0390781Z mad.lo.s32 %r1202, %r1170, 12288, %r1138; 2026-02-21T09:49:40.0391084Z .loc 1 59 24 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:59:24 2026-02-21T09:49:40.0391457Z mad.wide.s32 %rd65, %r1171, 2, %rd5; 2026-02-21T09:49:40.0391648Z mad.wide.s32 %rd66, %r1172, 2, %rd5; 2026-02-21T09:49:40.0391840Z mad.wide.s32 %rd67, %r1173, 2, %rd5; 2026-02-21T09:49:40.0392025Z mad.wide.s32 %rd68, %r1174, 2, %rd5; 2026-02-21T09:49:40.0392214Z mad.wide.s32 %rd69, %r1175, 2, %rd5; 2026-02-21T09:49:40.0392434Z mad.wide.s32 %rd70, %r1176, 2, %rd5; 2026-02-21T09:49:40.0392630Z mad.wide.s32 %rd71, %r1177, 2, %rd5; 2026-02-21T09:49:40.0392819Z mad.wide.s32 %rd72, %r1178, 2, %rd5; 2026-02-21T09:49:40.0393014Z mad.wide.s32 %rd73, %r1179, 2, %rd5; 2026-02-21T09:49:40.0393210Z mad.wide.s32 %rd74, %r1180, 2, %rd5; 2026-02-21T09:49:40.0393397Z mad.wide.s32 %rd75, %r1181, 2, %rd5; 2026-02-21T09:49:40.0393592Z mad.wide.s32 %rd76, %r1182, 2, %rd5; 2026-02-21T09:49:40.0393780Z mad.wide.s32 %rd77, %r1183, 2, %rd5; 2026-02-21T09:49:40.0393981Z mad.wide.s32 %rd78, %r1184, 2, %rd5; 2026-02-21T09:49:40.0394161Z mad.wide.s32 %rd79, %r1185, 2, %rd5; 2026-02-21T09:49:40.0394348Z mad.wide.s32 %rd80, %r1186, 2, %rd5; 2026-02-21T09:49:40.0394526Z mad.wide.s32 %rd81, %r1187, 2, %rd5; 2026-02-21T09:49:40.0394743Z 
mad.wide.s32 %rd82, %r1188, 2, %rd5; 2026-02-21T09:49:40.0394934Z mad.wide.s32 %rd83, %r1189, 2, %rd5; 2026-02-21T09:49:40.0395145Z mad.wide.s32 %rd84, %r1190, 2, %rd5; 2026-02-21T09:49:40.0395333Z mad.wide.s32 %rd85, %r1191, 2, %rd5; 2026-02-21T09:49:40.0395511Z mad.wide.s32 %rd86, %r1192, 2, %rd5; 2026-02-21T09:49:40.0395695Z mad.wide.s32 %rd87, %r1193, 2, %rd5; 2026-02-21T09:49:40.0395873Z mad.wide.s32 %rd88, %r1194, 2, %rd5; 2026-02-21T09:49:40.0396058Z mad.wide.s32 %rd89, %r1195, 2, %rd5; 2026-02-21T09:49:40.0396234Z mad.wide.s32 %rd90, %r1196, 2, %rd5; 2026-02-21T09:49:40.0396420Z mad.wide.s32 %rd91, %r1197, 2, %rd5; 2026-02-21T09:49:40.0396607Z mad.wide.s32 %rd92, %r1198, 2, %rd5; 2026-02-21T09:49:40.0396786Z mad.wide.s32 %rd93, %r1199, 2, %rd5; 2026-02-21T09:49:40.0396976Z mad.wide.s32 %rd94, %r1200, 2, %rd5; 2026-02-21T09:49:40.0397155Z mad.wide.s32 %rd95, %r1201, 2, %rd5; 2026-02-21T09:49:40.0397342Z mad.wide.s32 %rd96, %r1202, 2, %rd5; 2026-02-21T09:49:40.0397663Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0397991Z add.s32 %r589, %r229, %r1137; 2026-02-21T09:49:40.0398173Z // begin inline asm 2026-02-21T09:49:40.0398577Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r573, %r574, %r575, %r576, %r577, %r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588}, [%r589 + 0]; 2026-02-21T09:49:40.0399000Z // end inline asm 2026-02-21T09:49:40.0399153Z // begin inline asm 2026-02-21T09:49:40.0399554Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r590, %r591, %r592, %r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605}, [%r589 + 16]; 2026-02-21T09:49:40.0399965Z // end inline asm 2026-02-21T09:49:40.0400132Z // begin inline asm 2026-02-21T09:49:40.0400523Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r607, %r608, %r609, %r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622}, [%r589 + 32]; 2026-02-21T09:49:40.0400932Z // end inline 
asm 2026-02-21T09:49:40.0401090Z // begin inline asm 2026-02-21T09:49:40.0401468Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r624, %r625, %r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639}, [%r589 + 48]; 2026-02-21T09:49:40.0401878Z // end inline asm 2026-02-21T09:49:40.0402027Z // begin inline asm 2026-02-21T09:49:40.0402404Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r641, %r642, %r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656}, [%r589 + 64]; 2026-02-21T09:49:40.0402815Z // end inline asm 2026-02-21T09:49:40.0402964Z // begin inline asm 2026-02-21T09:49:40.0403344Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r658, %r659, %r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673}, [%r589 + 80]; 2026-02-21T09:49:40.0403793Z // end inline asm 2026-02-21T09:49:40.0403942Z // begin inline asm 2026-02-21T09:49:40.0404309Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r675, %r676, %r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690}, [%r589 + 96]; 2026-02-21T09:49:40.0404795Z // end inline asm 2026-02-21T09:49:40.0404976Z // begin inline asm 2026-02-21T09:49:40.0405370Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r692, %r693, %r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707}, [%r589 + 112]; 2026-02-21T09:49:40.0405822Z // end inline asm 2026-02-21T09:49:40.0405971Z // begin inline asm 2026-02-21T09:49:40.0406384Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r709, %r710, %r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724}, [%r589 + 128]; 2026-02-21T09:49:40.0406827Z // end inline asm 2026-02-21T09:49:40.0406977Z // begin inline asm 2026-02-21T09:49:40.0407402Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r726, %r727, %r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741}, [%r589 + 
144]; 2026-02-21T09:49:40.0407870Z // end inline asm 2026-02-21T09:49:40.0408038Z // begin inline asm 2026-02-21T09:49:40.0408475Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r743, %r744, %r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758}, [%r589 + 160]; 2026-02-21T09:49:40.0408941Z // end inline asm 2026-02-21T09:49:40.0409090Z // begin inline asm 2026-02-21T09:49:40.0409511Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r760, %r761, %r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775}, [%r589 + 176]; 2026-02-21T09:49:40.0409971Z // end inline asm 2026-02-21T09:49:40.0410126Z // begin inline asm 2026-02-21T09:49:40.0410532Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r777, %r778, %r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792}, [%r589 + 192]; 2026-02-21T09:49:40.0410970Z // end inline asm 2026-02-21T09:49:40.0411133Z // begin inline asm 2026-02-21T09:49:40.0411568Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r794, %r795, %r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809}, [%r589 + 208]; 2026-02-21T09:49:40.0412009Z // end inline asm 2026-02-21T09:49:40.0412166Z // begin inline asm 2026-02-21T09:49:40.0412565Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r811, %r812, %r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826}, [%r589 + 224]; 2026-02-21T09:49:40.0413006Z // end inline asm 2026-02-21T09:49:40.0413153Z // begin inline asm 2026-02-21T09:49:40.0413560Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r828, %r829, %r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843}, [%r589 + 240]; 2026-02-21T09:49:40.0414007Z // end inline asm 2026-02-21T09:49:40.0414154Z // begin inline asm 2026-02-21T09:49:40.0414331Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:49:40.0414512Z // end inline asm 2026-02-21T09:49:40.0414695Z 
cvt.u64.u32 %rd97, %r573; 2026-02-21T09:49:40.0414874Z cvt.u64.u32 %rd98, %r574; 2026-02-21T09:49:40.0415056Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:49:40.0415233Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:49:40.0415551Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0415894Z mov.b64 {%r1203, %r1204}, %rd100; 2026-02-21T09:49:40.0416102Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T09:49:40.0416448Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0416790Z cvt.u64.u32 %rd101, %r575; 2026-02-21T09:49:40.0416977Z cvt.u64.u32 %rd102, %r576; 2026-02-21T09:49:40.0417154Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:49:40.0417410Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:49:40.0417724Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0418071Z mov.b64 {%r1206, %r1207}, %rd104; 2026-02-21T09:49:40.0418280Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T09:49:40.0418601Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0418973Z cvt.u64.u32 %rd105, %r577; 2026-02-21T09:49:40.0419146Z cvt.u64.u32 %rd106, %r578; 2026-02-21T09:49:40.0419328Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:49:40.0419509Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:49:40.0419824Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0420148Z mov.b64 {%r1209, %r1210}, %rd108; 2026-02-21T09:49:40.0420337Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T09:49:40.0420644Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0420961Z cvt.u64.u32 %rd109, %r579; 2026-02-21T09:49:40.0421131Z cvt.u64.u32 %rd110, %r580; 2026-02-21T09:49:40.0421293Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:49:40.0421470Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:49:40.0421759Z .loc 1 58 27 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0422102Z mov.b64 {%r1212, %r1213}, %rd112; 2026-02-21T09:49:40.0422298Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T09:49:40.0422597Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0422900Z cvt.u64.u32 %rd113, %r581; 2026-02-21T09:49:40.0423062Z cvt.u64.u32 %rd114, %r582; 2026-02-21T09:49:40.0423229Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:49:40.0423395Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:49:40.0423683Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0423999Z mov.b64 {%r1215, %r1216}, %rd116; 2026-02-21T09:49:40.0424183Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T09:49:40.0424489Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0424867Z cvt.u64.u32 %rd117, %r583; 2026-02-21T09:49:40.0425051Z cvt.u64.u32 %rd118, %r584; 2026-02-21T09:49:40.0425231Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:49:40.0425420Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:49:40.0425727Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0426050Z mov.b64 {%r1218, %r1219}, %rd120; 2026-02-21T09:49:40.0426248Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T09:49:40.0426555Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0426864Z cvt.u64.u32 %rd121, %r585; 2026-02-21T09:49:40.0427036Z cvt.u64.u32 %rd122, %r586; 2026-02-21T09:49:40.0427211Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:49:40.0427382Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:49:40.0427678Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0427995Z mov.b64 {%r1221, %r1222}, %rd124; 2026-02-21T09:49:40.0428186Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T09:49:40.0428505Z .loc 1 56 52 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0428811Z cvt.u64.u32 %rd125, %r587; 2026-02-21T09:49:40.0428991Z cvt.u64.u32 %rd126, %r588; 2026-02-21T09:49:40.0429158Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:49:40.0429335Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:49:40.0429625Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0429927Z mov.b64 {%r1224, %r1225}, %rd128; 2026-02-21T09:49:40.0430123Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T09:49:40.0430446Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0430754Z cvt.u64.u32 %rd129, %r590; 2026-02-21T09:49:40.0430914Z cvt.u64.u32 %rd130, %r591; 2026-02-21T09:49:40.0431083Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:49:40.0431247Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:49:40.0431570Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0431883Z mov.b64 {%r1227, %r1228}, %rd132; 2026-02-21T09:49:40.0432066Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T09:49:40.0432366Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0432664Z cvt.u64.u32 %rd133, %r592; 2026-02-21T09:49:40.0432832Z cvt.u64.u32 %rd134, %r593; 2026-02-21T09:49:40.0432992Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:49:40.0433163Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:49:40.0433450Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0433749Z mov.b64 {%r1230, %r1231}, %rd136; 2026-02-21T09:49:40.0433938Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T09:49:40.0434231Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0434559Z cvt.u64.u32 %rd137, %r594; 2026-02-21T09:49:40.0434754Z cvt.u64.u32 %rd138, %r595; 2026-02-21T09:49:40.0434925Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:49:40.0435090Z or.b64 
%rd140, %rd137, %rd139; 2026-02-21T09:49:40.0435375Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0435702Z mov.b64 {%r1233, %r1234}, %rd140; 2026-02-21T09:49:40.0435894Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 2026-02-21T09:49:40.0436210Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0436521Z cvt.u64.u32 %rd141, %r596; 2026-02-21T09:49:40.0436689Z cvt.u64.u32 %rd142, %r597; 2026-02-21T09:49:40.0436850Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:49:40.0437023Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:49:40.0437339Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0437649Z mov.b64 {%r1236, %r1237}, %rd144; 2026-02-21T09:49:40.0437840Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T09:49:40.0438139Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0438448Z cvt.u64.u32 %rd145, %r598; 2026-02-21T09:49:40.0438610Z cvt.u64.u32 %rd146, %r599; 2026-02-21T09:49:40.0438780Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:49:40.0438943Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:49:40.0439227Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0439533Z mov.b64 {%r1239, %r1240}, %rd148; 2026-02-21T09:49:40.0439717Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T09:49:40.0440020Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0440327Z cvt.u64.u32 %rd149, %r600; 2026-02-21T09:49:40.0440499Z cvt.u64.u32 %rd150, %r601; 2026-02-21T09:49:40.0440664Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:49:40.0440851Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:49:40.0441137Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0441446Z mov.b64 {%r1242, %r1243}, %rd152; 2026-02-21T09:49:40.0441636Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 
2026-02-21T09:49:40.0441929Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0442229Z cvt.u64.u32 %rd153, %r602; 2026-02-21T09:49:40.0442388Z cvt.u64.u32 %rd154, %r603; 2026-02-21T09:49:40.0442589Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:49:40.0442754Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:49:40.0443046Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0443364Z mov.b64 {%r1245, %r1246}, %rd156; 2026-02-21T09:49:40.0443548Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T09:49:40.0443885Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0444189Z cvt.u64.u32 %rd157, %r604; 2026-02-21T09:49:40.0444357Z cvt.u64.u32 %rd158, %r605; 2026-02-21T09:49:40.0444518Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:49:40.0444723Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:49:40.0445021Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0445338Z mov.b64 {%r1248, %r1249}, %rd160; 2026-02-21T09:49:40.0445530Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T09:49:40.0445833Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0446138Z cvt.u64.u32 %rd161, %r607; 2026-02-21T09:49:40.0446300Z cvt.u64.u32 %rd162, %r608; 2026-02-21T09:49:40.0446470Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:49:40.0446642Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:49:40.0446962Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0447280Z mov.b64 {%r1251, %r1252}, %rd164; 2026-02-21T09:49:40.0447461Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T09:49:40.0447763Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0448067Z cvt.u64.u32 %rd165, %r609; 2026-02-21T09:49:40.0448234Z cvt.u64.u32 %rd166, %r610; 2026-02-21T09:49:40.0448393Z shl.b64 %rd167, %rd166, 
32; 2026-02-21T09:49:40.0448565Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:49:40.0448851Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0449152Z mov.b64 {%r1254, %r1255}, %rd168; 2026-02-21T09:49:40.0449342Z cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T09:49:40.0449666Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0449976Z cvt.u64.u32 %rd169, %r611; 2026-02-21T09:49:40.0450141Z cvt.u64.u32 %rd170, %r612; 2026-02-21T09:49:40.0450310Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:49:40.0450485Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:49:40.0450766Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0451075Z mov.b64 {%r1257, %r1258}, %rd172; 2026-02-21T09:49:40.0451256Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T09:49:40.0451561Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0451861Z cvt.u64.u32 %rd173, %r613; 2026-02-21T09:49:40.0452033Z cvt.u64.u32 %rd174, %r614; 2026-02-21T09:49:40.0452196Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:49:40.0452374Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:49:40.0452667Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0452971Z mov.b64 {%r1260, %r1261}, %rd176; 2026-02-21T09:49:40.0453165Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T09:49:40.0453466Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0453771Z cvt.u64.u32 %rd177, %r615; 2026-02-21T09:49:40.0453932Z cvt.u64.u32 %rd178, %r616; 2026-02-21T09:49:40.0454100Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:49:40.0454270Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:49:40.0454554Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0454939Z mov.b64 {%r1263, %r1264}, %rd180; 2026-02-21T09:49:40.0455121Z 
cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T09:49:40.0455429Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0455728Z cvt.u64.u32 %rd181, %r617; 2026-02-21T09:49:40.0455897Z cvt.u64.u32 %rd182, %r618; 2026-02-21T09:49:40.0456059Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:49:40.0456264Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:49:40.0456548Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0456851Z mov.b64 {%r1266, %r1267}, %rd184; 2026-02-21T09:49:40.0457044Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T09:49:40.0457343Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0457646Z cvt.u64.u32 %rd185, %r619; 2026-02-21T09:49:40.0457810Z cvt.u64.u32 %rd186, %r620; 2026-02-21T09:49:40.0457981Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:49:40.0458153Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:49:40.0458435Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0458755Z mov.b64 {%r1269, %r1270}, %rd188; 2026-02-21T09:49:40.0458939Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T09:49:40.0459276Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0459588Z cvt.u64.u32 %rd189, %r621; 2026-02-21T09:49:40.0459756Z cvt.u64.u32 %rd190, %r622; 2026-02-21T09:49:40.0459916Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:49:40.0460086Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:49:40.0460381Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0460696Z mov.b64 {%r1272, %r1273}, %rd192; 2026-02-21T09:49:40.0460883Z cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T09:49:40.0461184Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0461490Z cvt.u64.u32 %rd193, %r624; 2026-02-21T09:49:40.0461652Z cvt.u64.u32 %rd194, %r625; 
2026-02-21T09:49:40.0461821Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:49:40.0462039Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:49:40.0462328Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0462653Z mov.b64 {%r1275, %r1276}, %rd196; 2026-02-21T09:49:40.0462844Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T09:49:40.0463153Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0463458Z cvt.u64.u32 %rd197, %r626; 2026-02-21T09:49:40.0463632Z cvt.u64.u32 %rd198, %r627; 2026-02-21T09:49:40.0463799Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:49:40.0463978Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:49:40.0464276Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0464587Z mov.b64 {%r1278, %r1279}, %rd200; 2026-02-21T09:49:40.0464819Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T09:49:40.0465133Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0465446Z cvt.u64.u32 %rd201, %r628; 2026-02-21T09:49:40.0465609Z cvt.u64.u32 %rd202, %r629; 2026-02-21T09:49:40.0465776Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:49:40.0465946Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:49:40.0466237Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0466559Z mov.b64 {%r1281, %r1282}, %rd204; 2026-02-21T09:49:40.0466744Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T09:49:40.0467061Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0467403Z cvt.u64.u32 %rd205, %r630; 2026-02-21T09:49:40.0467571Z cvt.u64.u32 %rd206, %r631; 2026-02-21T09:49:40.0467739Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:49:40.0467906Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:49:40.0468196Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0468504Z mov.b64 {%r1284, 
%r1285}, %rd208; 2026-02-21T09:49:40.0468726Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T09:49:40.0469036Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0469346Z cvt.u64.u32 %rd209, %r632; 2026-02-21T09:49:40.0469509Z cvt.u64.u32 %rd210, %r633; 2026-02-21T09:49:40.0469677Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:49:40.0469849Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:49:40.0470136Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0470453Z mov.b64 {%r1287, %r1288}, %rd212; 2026-02-21T09:49:40.0470636Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T09:49:40.0470944Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0471254Z cvt.u64.u32 %rd213, %r634; 2026-02-21T09:49:40.0471423Z cvt.u64.u32 %rd214, %r635; 2026-02-21T09:49:40.0471590Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:49:40.0471787Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:49:40.0472086Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0472392Z mov.b64 {%r1290, %r1291}, %rd216; 2026-02-21T09:49:40.0472580Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 2026-02-21T09:49:40.0472886Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0473199Z cvt.u64.u32 %rd217, %r636; 2026-02-21T09:49:40.0473358Z cvt.u64.u32 %rd218, %r637; 2026-02-21T09:49:40.0473530Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:49:40.0473701Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:49:40.0473984Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0474296Z mov.b64 {%r1293, %r1294}, %rd220; 2026-02-21T09:49:40.0474506Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T09:49:40.0474845Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0475142Z cvt.u64.u32 %rd221, %r638; 
2026-02-21T09:49:40.0475311Z cvt.u64.u32 %rd222, %r639; 2026-02-21T09:49:40.0475481Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:49:40.0475647Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:49:40.0475937Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0476243Z mov.b64 {%r1296, %r1297}, %rd224; 2026-02-21T09:49:40.0476436Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T09:49:40.0476738Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0477054Z cvt.u64.u32 %rd225, %r641; 2026-02-21T09:49:40.0477213Z cvt.u64.u32 %rd226, %r642; 2026-02-21T09:49:40.0477381Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:49:40.0477552Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:49:40.0477837Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0478153Z mov.b64 {%r1299, %r1300}, %rd228; 2026-02-21T09:49:40.0478336Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T09:49:40.0478640Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0478935Z cvt.u64.u32 %rd229, %r643; 2026-02-21T09:49:40.0479103Z cvt.u64.u32 %rd230, %r644; 2026-02-21T09:49:40.0479269Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:49:40.0479432Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:49:40.0479757Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0480058Z mov.b64 {%r1302, %r1303}, %rd232; 2026-02-21T09:49:40.0480248Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T09:49:40.0480550Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0480870Z cvt.u64.u32 %rd233, %r645; 2026-02-21T09:49:40.0481064Z cvt.u64.u32 %rd234, %r646; 2026-02-21T09:49:40.0481237Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:49:40.0481409Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:49:40.0481687Z .loc 1 58 27 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0482003Z mov.b64 {%r1305, %r1306}, %rd236; 2026-02-21T09:49:40.0482187Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T09:49:40.0482491Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0482786Z cvt.u64.u32 %rd237, %r647; 2026-02-21T09:49:40.0482952Z cvt.u64.u32 %rd238, %r648; 2026-02-21T09:49:40.0483118Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:49:40.0483282Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:49:40.0483569Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0483871Z mov.b64 {%r1308, %r1309}, %rd240; 2026-02-21T09:49:40.0484093Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T09:49:40.0484397Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0484738Z cvt.u64.u32 %rd241, %r649; 2026-02-21T09:49:40.0484902Z cvt.u64.u32 %rd242, %r650; 2026-02-21T09:49:40.0485070Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:49:40.0485240Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:49:40.0485525Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0485838Z mov.b64 {%r1311, %r1312}, %rd244; 2026-02-21T09:49:40.0486022Z cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T09:49:40.0486332Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0486632Z cvt.u64.u32 %rd245, %r651; 2026-02-21T09:49:40.0486830Z cvt.u64.u32 %rd246, %r652; 2026-02-21T09:49:40.0487001Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:49:40.0487170Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:49:40.0487461Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0487766Z mov.b64 {%r1314, %r1315}, %rd248; 2026-02-21T09:49:40.0487952Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T09:49:40.0488255Z .loc 1 56 52 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0488562Z cvt.u64.u32 %rd249, %r653; 2026-02-21T09:49:40.0488743Z cvt.u64.u32 %rd250, %r654; 2026-02-21T09:49:40.0488906Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:49:40.0489080Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:49:40.0489358Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0489672Z mov.b64 {%r1317, %r1318}, %rd252; 2026-02-21T09:49:40.0489856Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T09:49:40.0490165Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0490463Z cvt.u64.u32 %rd253, %r655; 2026-02-21T09:49:40.0490632Z cvt.u64.u32 %rd254, %r656; 2026-02-21T09:49:40.0490798Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:49:40.0490959Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:49:40.0491248Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0491557Z mov.b64 {%r1320, %r1321}, %rd256; 2026-02-21T09:49:40.0491745Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T09:49:40.0492071Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0492373Z cvt.u64.u32 %rd257, %r658; 2026-02-21T09:49:40.0492540Z cvt.u64.u32 %rd258, %r659; 2026-02-21T09:49:40.0492700Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:49:40.0492873Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:49:40.0493152Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0493499Z mov.b64 {%r1323, %r1324}, %rd260; 2026-02-21T09:49:40.0493680Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T09:49:40.0493984Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0494278Z cvt.u64.u32 %rd261, %r660; 2026-02-21T09:49:40.0494443Z cvt.u64.u32 %rd262, %r661; 2026-02-21T09:49:40.0494608Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:49:40.0494838Z or.b64 
%rd264, %rd261, %rd263; 2026-02-21T09:49:40.0495130Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0495430Z mov.b64 {%r1326, %r1327}, %rd264; 2026-02-21T09:49:40.0495616Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 2026-02-21T09:49:40.0495919Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0496222Z cvt.u64.u32 %rd265, %r662; 2026-02-21T09:49:40.0496419Z cvt.u64.u32 %rd266, %r663; 2026-02-21T09:49:40.0496582Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:49:40.0496756Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:49:40.0497038Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0497350Z mov.b64 {%r1329, %r1330}, %rd268; 2026-02-21T09:49:40.0497533Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T09:49:40.0497840Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0498137Z cvt.u64.u32 %rd269, %r664; 2026-02-21T09:49:40.0498311Z cvt.u64.u32 %rd270, %r665; 2026-02-21T09:49:40.0498479Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:49:40.0498645Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:49:40.0498934Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0499271Z mov.b64 {%r1332, %r1333}, %rd272; 2026-02-21T09:49:40.0499468Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T09:49:40.0499769Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0500078Z cvt.u64.u32 %rd273, %r666; 2026-02-21T09:49:40.0500250Z cvt.u64.u32 %rd274, %r667; 2026-02-21T09:49:40.0500414Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:49:40.0500591Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:49:40.0500874Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0501188Z mov.b64 {%r1335, %r1336}, %rd276; 2026-02-21T09:49:40.0501375Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 
2026-02-21T09:49:40.0501680Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0501984Z cvt.u64.u32 %rd277, %r668; 2026-02-21T09:49:40.0502145Z cvt.u64.u32 %rd278, %r669; 2026-02-21T09:49:40.0502318Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:49:40.0502485Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:49:40.0502776Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0503088Z mov.b64 {%r1338, %r1339}, %rd280; 2026-02-21T09:49:40.0503278Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T09:49:40.0503576Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0503885Z cvt.u64.u32 %rd281, %r670; 2026-02-21T09:49:40.0504054Z cvt.u64.u32 %rd282, %r671; 2026-02-21T09:49:40.0504217Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:49:40.0504421Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:49:40.0504729Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0505041Z mov.b64 {%r1341, %r1342}, %rd284; 2026-02-21T09:49:40.0505223Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T09:49:40.0505534Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0505882Z cvt.u64.u32 %rd285, %r672; 2026-02-21T09:49:40.0506044Z cvt.u64.u32 %rd286, %r673; 2026-02-21T09:49:40.0506212Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:49:40.0506378Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:49:40.0506667Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0506970Z mov.b64 {%r1344, %r1345}, %rd288; 2026-02-21T09:49:40.0507155Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T09:49:40.0507459Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0507764Z cvt.u64.u32 %rd289, %r675; 2026-02-21T09:49:40.0507932Z cvt.u64.u32 %rd290, %r676; 2026-02-21T09:49:40.0508091Z shl.b64 %rd291, %rd290, 
32; 2026-02-21T09:49:40.0508260Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:49:40.0508602Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0508922Z mov.b64 {%r1347, %r1348}, %rd292; 2026-02-21T09:49:40.0509103Z cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T09:49:40.0509414Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0509476Z cvt.u64.u32 %rd293, %r677; 2026-02-21T09:49:40.0509543Z cvt.u64.u32 %rd294, %r678; 2026-02-21T09:49:40.0509604Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:49:40.0509667Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:49:40.0509844Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0509918Z mov.b64 {%r1350, %r1351}, %rd296; 2026-02-21T09:49:40.0509989Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T09:49:40.0510164Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0510270Z cvt.u64.u32 %rd297, %r679; 2026-02-21T09:49:40.0510335Z cvt.u64.u32 %rd298, %r680; 2026-02-21T09:49:40.0510398Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:49:40.0510470Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:49:40.0510651Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0510719Z mov.b64 {%r1353, %r1354}, %rd300; 2026-02-21T09:49:40.0510790Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T09:49:40.0510981Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0511045Z cvt.u64.u32 %rd301, %r681; 2026-02-21T09:49:40.0511107Z cvt.u64.u32 %rd302, %r682; 2026-02-21T09:49:40.0511179Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:49:40.0511243Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:49:40.0511424Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0511496Z mov.b64 {%r1356, %r1357}, %rd304; 2026-02-21T09:49:40.0511568Z 
cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T09:49:40.0511749Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0511811Z cvt.u64.u32 %rd305, %r683; 2026-02-21T09:49:40.0511883Z cvt.u64.u32 %rd306, %r684; 2026-02-21T09:49:40.0511946Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:49:40.0512010Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:49:40.0512196Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0512260Z mov.b64 {%r1359, %r1360}, %rd308; 2026-02-21T09:49:40.0512363Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T09:49:40.0512543Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0512605Z cvt.u64.u32 %rd309, %r685; 2026-02-21T09:49:40.0512667Z cvt.u64.u32 %rd310, %r686; 2026-02-21T09:49:40.0512733Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:49:40.0512804Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:49:40.0513011Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0513073Z mov.b64 {%r1362, %r1363}, %rd312; 2026-02-21T09:49:40.0513152Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T09:49:40.0513327Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0513389Z cvt.u64.u32 %rd313, %r687; 2026-02-21T09:49:40.0513457Z cvt.u64.u32 %rd314, %r688; 2026-02-21T09:49:40.0513519Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:49:40.0513584Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:49:40.0513762Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0513834Z mov.b64 {%r1365, %r1366}, %rd316; 2026-02-21T09:49:40.0513905Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T09:49:40.0514110Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0514181Z cvt.u64.u32 %rd317, %r689; 2026-02-21T09:49:40.0514241Z cvt.u64.u32 %rd318, %r690; 
2026-02-21T09:49:40.0514303Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:49:40.0514370Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:49:40.0514550Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0514613Z mov.b64 {%r1368, %r1369}, %rd320; 2026-02-21T09:49:40.0514714Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T09:49:40.0514900Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0514965Z cvt.u64.u32 %rd321, %r692; 2026-02-21T09:49:40.0515028Z cvt.u64.u32 %rd322, %r693; 2026-02-21T09:49:40.0515097Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:49:40.0515161Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:49:40.0515372Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0515443Z mov.b64 {%r1371, %r1372}, %rd324; 2026-02-21T09:49:40.0515513Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T09:49:40.0515687Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0515747Z cvt.u64.u32 %rd325, %r694; 2026-02-21T09:49:40.0515817Z cvt.u64.u32 %rd326, %r695; 2026-02-21T09:49:40.0515880Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:49:40.0515941Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:49:40.0516122Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0516186Z mov.b64 {%r1374, %r1375}, %rd328; 2026-02-21T09:49:40.0516255Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T09:49:40.0516431Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0516493Z cvt.u64.u32 %rd329, %r696; 2026-02-21T09:49:40.0516555Z cvt.u64.u32 %rd330, %r697; 2026-02-21T09:49:40.0516618Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:49:40.0516687Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:49:40.0516857Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0516918Z mov.b64 {%r1377, 
%r1378}, %rd332; 2026-02-21T09:49:40.0516996Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T09:49:40.0517172Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0517232Z cvt.u64.u32 %rd333, %r698; 2026-02-21T09:49:40.0517330Z cvt.u64.u32 %rd334, %r699; 2026-02-21T09:49:40.0517391Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:49:40.0517453Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:49:40.0517629Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0517699Z mov.b64 {%r1380, %r1381}, %rd336; 2026-02-21T09:49:40.0517769Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T09:49:40.0517974Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0518043Z cvt.u64.u32 %rd337, %r700; 2026-02-21T09:49:40.0518105Z cvt.u64.u32 %rd338, %r701; 2026-02-21T09:49:40.0518167Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:49:40.0518237Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:49:40.0518415Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0518479Z mov.b64 {%r1383, %r1384}, %rd340; 2026-02-21T09:49:40.0518550Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T09:49:40.0518737Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0518798Z cvt.u64.u32 %rd341, %r702; 2026-02-21T09:49:40.0518857Z cvt.u64.u32 %rd342, %r703; 2026-02-21T09:49:40.0518927Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:49:40.0518991Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:49:40.0519202Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0519275Z mov.b64 {%r1386, %r1387}, %rd344; 2026-02-21T09:49:40.0519345Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T09:49:40.0519520Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0519580Z cvt.u64.u32 %rd345, %r704; 
2026-02-21T09:49:40.0519649Z cvt.u64.u32 %rd346, %r705; 2026-02-21T09:49:40.0519711Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:49:40.0519777Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:49:40.0519957Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0520020Z mov.b64 {%r1389, %r1390}, %rd348; 2026-02-21T09:49:40.0520089Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T09:49:40.0520298Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0520362Z cvt.u64.u32 %rd349, %r706; 2026-02-21T09:49:40.0520424Z cvt.u64.u32 %rd350, %r707; 2026-02-21T09:49:40.0520485Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:49:40.0520553Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:49:40.0520729Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0520791Z mov.b64 {%r1392, %r1393}, %rd352; 2026-02-21T09:49:40.0520869Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T09:49:40.0521048Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0521115Z cvt.u64.u32 %rd353, %r709; 2026-02-21T09:49:40.0521188Z cvt.u64.u32 %rd354, %r710; 2026-02-21T09:49:40.0521251Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:49:40.0521314Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:49:40.0521494Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0521573Z mov.b64 {%r1395, %r1396}, %rd356; 2026-02-21T09:49:40.0521644Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T09:49:40.0521821Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0521891Z cvt.u64.u32 %rd357, %r711; 2026-02-21T09:49:40.0521952Z cvt.u64.u32 %rd358, %r712; 2026-02-21T09:49:40.0522015Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:49:40.0522084Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T09:49:40.0522260Z .loc 1 58 27 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0522348Z mov.b64 {%r1398, %r1399}, %rd360; 2026-02-21T09:49:40.0522418Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T09:49:40.0522603Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0522666Z cvt.u64.u32 %rd361, %r713; 2026-02-21T09:49:40.0522727Z cvt.u64.u32 %rd362, %r714; 2026-02-21T09:49:40.0522852Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:49:40.0522916Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:49:40.0523094Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0523163Z mov.b64 {%r1401, %r1402}, %rd364; 2026-02-21T09:49:40.0523233Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T09:49:40.0523409Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0523470Z cvt.u64.u32 %rd365, %r715; 2026-02-21T09:49:40.0523540Z cvt.u64.u32 %rd366, %r716; 2026-02-21T09:49:40.0523601Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:49:40.0523663Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:49:40.0523846Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0523912Z mov.b64 {%r1404, %r1405}, %rd368; 2026-02-21T09:49:40.0523984Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T09:49:40.0524192Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0524254Z cvt.u64.u32 %rd369, %r717; 2026-02-21T09:49:40.0524316Z cvt.u64.u32 %rd370, %r718; 2026-02-21T09:49:40.0524377Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:49:40.0524446Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:49:40.0524627Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0524711Z mov.b64 {%r1407, %r1408}, %rd372; 2026-02-21T09:49:40.0524790Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T09:49:40.0524964Z .loc 1 56 52 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0525025Z cvt.u64.u32 %rd373, %r719; 2026-02-21T09:49:40.0525094Z cvt.u64.u32 %rd374, %r720; 2026-02-21T09:49:40.0525185Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:49:40.0525249Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:49:40.0525428Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0525499Z mov.b64 {%r1410, %r1411}, %rd376; 2026-02-21T09:49:40.0525569Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T09:49:40.0525747Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0525816Z cvt.u64.u32 %rd377, %r721; 2026-02-21T09:49:40.0525876Z cvt.u64.u32 %rd378, %r722; 2026-02-21T09:49:40.0525938Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:49:40.0526006Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:49:40.0526185Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0526247Z mov.b64 {%r1413, %r1414}, %rd380; 2026-02-21T09:49:40.0526315Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T09:49:40.0526506Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0526571Z cvt.u64.u32 %rd381, %r723; 2026-02-21T09:49:40.0526634Z cvt.u64.u32 %rd382, %r724; 2026-02-21T09:49:40.0526703Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:49:40.0526765Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:49:40.0526941Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0527011Z mov.b64 {%r1416, %r1417}, %rd384; 2026-02-21T09:49:40.0527081Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T09:49:40.0527258Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0527353Z cvt.u64.u32 %rd385, %r726; 2026-02-21T09:49:40.0527420Z cvt.u64.u32 %rd386, %r727; 2026-02-21T09:49:40.0527481Z shl.b64 %rd387, %rd386, 32; 2026-02-21T09:49:40.0527542Z or.b64 
%rd388, %rd385, %rd387; 2026-02-21T09:49:40.0527723Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0527833Z mov.b64 {%r1419, %r1420}, %rd388; 2026-02-21T09:49:40.0527902Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 2026-02-21T09:49:40.0528085Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0528146Z cvt.u64.u32 %rd389, %r728; 2026-02-21T09:49:40.0528206Z cvt.u64.u32 %rd390, %r729; 2026-02-21T09:49:40.0528266Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:49:40.0528337Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:49:40.0528513Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0528577Z mov.b64 {%r1422, %r1423}, %rd392; 2026-02-21T09:49:40.0528653Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T09:49:40.0528829Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0528891Z cvt.u64.u32 %rd393, %r730; 2026-02-21T09:49:40.0528958Z cvt.u64.u32 %rd394, %r731; 2026-02-21T09:49:40.0529049Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:49:40.0529113Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:49:40.0529291Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0529361Z mov.b64 {%r1425, %r1426}, %rd396; 2026-02-21T09:49:40.0529430Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T09:49:40.0529605Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0529673Z cvt.u64.u32 %rd397, %r732; 2026-02-21T09:49:40.0529733Z cvt.u64.u32 %rd398, %r733; 2026-02-21T09:49:40.0529797Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:49:40.0529868Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:49:40.0530045Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0530109Z mov.b64 {%r1428, %r1429}, %rd400; 2026-02-21T09:49:40.0530205Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 
2026-02-21T09:49:40.0530397Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0530459Z cvt.u64.u32 %rd401, %r734; 2026-02-21T09:49:40.0530521Z cvt.u64.u32 %rd402, %r735; 2026-02-21T09:49:40.0530592Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:49:40.0530656Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:49:40.0530836Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0530908Z mov.b64 {%r1431, %r1432}, %rd404; 2026-02-21T09:49:40.0530987Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T09:49:40.0531166Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0531229Z cvt.u64.u32 %rd405, %r736; 2026-02-21T09:49:40.0531298Z cvt.u64.u32 %rd406, %r737; 2026-02-21T09:49:40.0531360Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:49:40.0531424Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:49:40.0531608Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0531671Z mov.b64 {%r1434, %r1435}, %rd408; 2026-02-21T09:49:40.0531742Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T09:49:40.0531926Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0531987Z cvt.u64.u32 %rd409, %r738; 2026-02-21T09:49:40.0532047Z cvt.u64.u32 %rd410, %r739; 2026-02-21T09:49:40.0532109Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:49:40.0532180Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T09:49:40.0532387Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0532450Z mov.b64 {%r1437, %r1438}, %rd412; 2026-02-21T09:49:40.0532527Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T09:49:40.0532705Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0532812Z cvt.u64.u32 %rd413, %r740; 2026-02-21T09:49:40.0532881Z cvt.u64.u32 %rd414, %r741; 2026-02-21T09:49:40.0532942Z shl.b64 %rd415, %rd414, 
32; 2026-02-21T09:49:40.0533003Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:49:40.0533179Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0533249Z mov.b64 {%r1440, %r1441}, %rd416; 2026-02-21T09:49:40.0533318Z cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T09:49:40.0533495Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0533565Z cvt.u64.u32 %rd417, %r743; 2026-02-21T09:49:40.0533627Z cvt.u64.u32 %rd418, %r744; 2026-02-21T09:49:40.0533688Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:49:40.0533756Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:49:40.0533932Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0534024Z mov.b64 {%r1443, %r1444}, %rd420; 2026-02-21T09:49:40.0534097Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T09:49:40.0534283Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0534343Z cvt.u64.u32 %rd421, %r745; 2026-02-21T09:49:40.0534403Z cvt.u64.u32 %rd422, %r746; 2026-02-21T09:49:40.0534473Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:49:40.0534536Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:49:40.0534752Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0534825Z mov.b64 {%r1446, %r1447}, %rd424; 2026-02-21T09:49:40.0534895Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T09:49:40.0535072Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0535135Z cvt.u64.u32 %rd425, %r747; 2026-02-21T09:49:40.0535232Z cvt.u64.u32 %rd426, %r748; 2026-02-21T09:49:40.0535296Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:49:40.0535360Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:49:40.0535543Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0535605Z mov.b64 {%r1449, %r1450}, %rd428; 2026-02-21T09:49:40.0535672Z 
cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T09:49:40.0535852Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0535913Z cvt.u64.u32 %rd429, %r749; 2026-02-21T09:49:40.0535973Z cvt.u64.u32 %rd430, %r750; 2026-02-21T09:49:40.0536036Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:49:40.0536106Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:49:40.0536283Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0536345Z mov.b64 {%r1452, %r1453}, %rd432; 2026-02-21T09:49:40.0536420Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T09:49:40.0536600Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0536664Z cvt.u64.u32 %rd433, %r751; 2026-02-21T09:49:40.0536731Z cvt.u64.u32 %rd434, %r752; 2026-02-21T09:49:40.0536793Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:49:40.0536856Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T09:49:40.0537033Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0537102Z mov.b64 {%r1455, %r1456}, %rd436; 2026-02-21T09:49:40.0537171Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T09:49:40.0537380Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0537449Z cvt.u64.u32 %rd437, %r753; 2026-02-21T09:49:40.0537510Z cvt.u64.u32 %rd438, %r754; 2026-02-21T09:49:40.0537571Z shl.b64 %rd439, %rd438, 32; 2026-02-21T09:49:40.0537642Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:49:40.0537823Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0537918Z mov.b64 {%r1458, %r1459}, %rd440; 2026-02-21T09:49:40.0537987Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T09:49:40.0538174Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0538237Z cvt.u64.u32 %rd441, %r755; 2026-02-21T09:49:40.0538298Z cvt.u64.u32 %rd442, %r756; 
2026-02-21T09:49:40.0538366Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:49:40.0538428Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:49:40.0538608Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0538676Z mov.b64 {%r1461, %r1462}, %rd444; 2026-02-21T09:49:40.0538747Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T09:49:40.0538929Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0539019Z cvt.u64.u32 %rd445, %r757; 2026-02-21T09:49:40.0539090Z cvt.u64.u32 %rd446, %r758; 2026-02-21T09:49:40.0539152Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:49:40.0539214Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:49:40.0539400Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0539462Z mov.b64 {%r1464, %r1465}, %rd448; 2026-02-21T09:49:40.0539531Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T09:49:40.0539718Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0539782Z cvt.u64.u32 %rd449, %r760; 2026-02-21T09:49:40.0539842Z cvt.u64.u32 %rd450, %r761; 2026-02-21T09:49:40.0539903Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:49:40.0539974Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:49:40.0540179Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0540244Z mov.b64 {%r1467, %r1468}, %rd452; 2026-02-21T09:49:40.0540325Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T09:49:40.0540505Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0540570Z cvt.u64.u32 %rd453, %r762; 2026-02-21T09:49:40.0540640Z cvt.u64.u32 %rd454, %r763; 2026-02-21T09:49:40.0540701Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:49:40.0540763Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:49:40.0540942Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0541014Z mov.b64 {%r1470, 
%r1471}, %rd456; 2026-02-21T09:49:40.0541085Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T09:49:40.0541262Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0541330Z cvt.u64.u32 %rd457, %r764; 2026-02-21T09:49:40.0541393Z cvt.u64.u32 %rd458, %r765; 2026-02-21T09:49:40.0541456Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:49:40.0541528Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:49:40.0541706Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0541769Z mov.b64 {%r1473, %r1474}, %rd460; 2026-02-21T09:49:40.0541838Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T09:49:40.0542167Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0542488Z cvt.u64.u32 %rd461, %r766; 2026-02-21T09:49:40.0542662Z cvt.u64.u32 %rd462, %r767; 2026-02-21T09:49:40.0542875Z shl.b64 %rd463, %rd462, 32; 2026-02-21T09:49:40.0543049Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:49:40.0543344Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0543655Z mov.b64 {%r1476, %r1477}, %rd464; 2026-02-21T09:49:40.0543853Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T09:49:40.0544172Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0544508Z cvt.u64.u32 %rd465, %r768; 2026-02-21T09:49:40.0544758Z cvt.u64.u32 %rd466, %r769; 2026-02-21T09:49:40.0544925Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:49:40.0545101Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:49:40.0545384Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0545706Z mov.b64 {%r1479, %r1480}, %rd468; 2026-02-21T09:49:40.0545901Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T09:49:40.0546222Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0546539Z cvt.u64.u32 %rd469, %r770; 
2026-02-21T09:49:40.0546705Z cvt.u64.u32 %rd470, %r771; 2026-02-21T09:49:40.0546885Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:49:40.0547068Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:49:40.0547409Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0547805Z mov.b64 {%r1482, %r1483}, %rd472; 2026-02-21T09:49:40.0548026Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T09:49:40.0548349Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0548662Z cvt.u64.u32 %rd473, %r772; 2026-02-21T09:49:40.0548825Z cvt.u64.u32 %rd474, %r773; 2026-02-21T09:49:40.0548995Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:49:40.0549171Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:49:40.0549507Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0549856Z mov.b64 {%r1485, %r1486}, %rd476; 2026-02-21T09:49:40.0550052Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T09:49:40.0550431Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0550780Z cvt.u64.u32 %rd477, %r774; 2026-02-21T09:49:40.0550961Z cvt.u64.u32 %rd478, %r775; 2026-02-21T09:49:40.0551132Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:49:40.0551313Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:49:40.0551606Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0551940Z mov.b64 {%r1488, %r1489}, %rd480; 2026-02-21T09:49:40.0552140Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T09:49:40.0552456Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0552781Z cvt.u64.u32 %rd481, %r777; 2026-02-21T09:49:40.0552951Z cvt.u64.u32 %rd482, %r778; 2026-02-21T09:49:40.0553130Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:49:40.0553302Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T09:49:40.0553606Z .loc 1 58 27 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0553928Z mov.b64 {%r1491, %r1492}, %rd484; 2026-02-21T09:49:40.0554127Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T09:49:40.0554450Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0554803Z cvt.u64.u32 %rd485, %r779; 2026-02-21T09:49:40.0554984Z cvt.u64.u32 %rd486, %r780; 2026-02-21T09:49:40.0555158Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:49:40.0555341Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:49:40.0555647Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0556036Z mov.b64 {%r1494, %r1495}, %rd488; 2026-02-21T09:49:40.0556239Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T09:49:40.0556554Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0556878Z cvt.u64.u32 %rd489, %r781; 2026-02-21T09:49:40.0557044Z cvt.u64.u32 %rd490, %r782; 2026-02-21T09:49:40.0557215Z shl.b64 %rd491, %rd490, 32; 2026-02-21T09:49:40.0557412Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T09:49:40.0557702Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0558018Z mov.b64 {%r1497, %r1498}, %rd492; 2026-02-21T09:49:40.0558202Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T09:49:40.0558512Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0558814Z cvt.u64.u32 %rd493, %r783; 2026-02-21T09:49:40.0558985Z cvt.u64.u32 %rd494, %r784; 2026-02-21T09:49:40.0559147Z shl.b64 %rd495, %rd494, 32; 2026-02-21T09:49:40.0559320Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T09:49:40.0559608Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0559926Z mov.b64 {%r1500, %r1501}, %rd496; 2026-02-21T09:49:40.0560124Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T09:49:40.0560463Z .loc 1 56 52 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0560779Z cvt.u64.u32 %rd497, %r785; 2026-02-21T09:49:40.0560948Z cvt.u64.u32 %rd498, %r786; 2026-02-21T09:49:40.0561123Z shl.b64 %rd499, %rd498, 32; 2026-02-21T09:49:40.0561293Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T09:49:40.0561589Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0561905Z mov.b64 {%r1503, %r1504}, %rd500; 2026-02-21T09:49:40.0562094Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T09:49:40.0562408Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0562714Z cvt.u64.u32 %rd501, %r787; 2026-02-21T09:49:40.0562888Z cvt.u64.u32 %rd502, %r788; 2026-02-21T09:49:40.0563055Z shl.b64 %rd503, %rd502, 32; 2026-02-21T09:49:40.0563235Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T09:49:40.0563550Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0563867Z mov.b64 {%r1506, %r1507}, %rd504; 2026-02-21T09:49:40.0564058Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T09:49:40.0564357Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0564666Z cvt.u64.u32 %rd505, %r789; 2026-02-21T09:49:40.0564858Z cvt.u64.u32 %rd506, %r790; 2026-02-21T09:49:40.0565026Z shl.b64 %rd507, %rd506, 32; 2026-02-21T09:49:40.0565194Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T09:49:40.0565483Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0565789Z mov.b64 {%r1509, %r1510}, %rd508; 2026-02-21T09:49:40.0565972Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T09:49:40.0566277Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0566571Z cvt.u64.u32 %rd509, %r791; 2026-02-21T09:49:40.0566741Z cvt.u64.u32 %rd510, %r792; 2026-02-21T09:49:40.0566901Z shl.b64 %rd511, %rd510, 32; 2026-02-21T09:49:40.0567071Z or.b64 
%rd512, %rd509, %rd511; 2026-02-21T09:49:40.0567349Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0567659Z mov.b64 {%r1512, %r1513}, %rd512; 2026-02-21T09:49:40.0567848Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T09:49:40.0568145Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0568491Z cvt.u64.u32 %rd513, %r794; 2026-02-21T09:49:40.0568655Z cvt.u64.u32 %rd514, %r795; 2026-02-21T09:49:40.0568826Z shl.b64 %rd515, %rd514, 32; 2026-02-21T09:49:40.0568990Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T09:49:40.0569277Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0569585Z mov.b64 {%r1515, %r1516}, %rd516; 2026-02-21T09:49:40.0569812Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T09:49:40.0570114Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0570410Z cvt.u64.u32 %rd517, %r796; 2026-02-21T09:49:40.0570580Z cvt.u64.u32 %rd518, %r797; 2026-02-21T09:49:40.0570740Z shl.b64 %rd519, %rd518, 32; 2026-02-21T09:49:40.0570912Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T09:49:40.0571191Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0571846Z [196s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:49:40.0573269Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[True, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True)
2026-02-21T09:49:40.0574546Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T09:49:40.0574838Z `ptxas` stderr:
2026-02-21T09:49:40.0575299Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher.
2026-02-21T09:49:40.0575820Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T09:49:40.0575994Z
2026-02-21T09:49:40.0576427Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpbo3pk_u9.ptx -o /tmp/tmpbo3pk_u9.ptx.o
2026-02-21T09:49:40.0576914Z
2026-02-21T09:49:40.0577084Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T09:49:40.0577360Z mov.b64 {%r1518, %r1519}, %rd520; 2026-02-21T09:49:40.0577560Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T09:49:40.0577874Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0578198Z cvt.u64.u32 %rd521, %r798; 2026-02-21T09:49:40.0578366Z cvt.u64.u32 %rd522, %r799; 2026-02-21T09:49:40.0578536Z shl.b64 %rd523, %rd522, 32; 2026-02-21T09:49:40.0578702Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T09:49:40.0579000Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0579312Z mov.b64 {%r1521, %r1522}, %rd524; 2026-02-21T09:49:40.0579508Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T09:49:40.0579821Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0580127Z cvt.u64.u32 %rd525, %r800; 2026-02-21T09:49:40.0580300Z cvt.u64.u32 %rd526, %r801; 2026-02-21T09:49:40.0580463Z shl.b64 %rd527, %rd526, 32; 2026-02-21T09:49:40.0580638Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T09:49:40.0580924Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0581237Z mov.b64 {%r1524, %r1525}, %rd528; 2026-02-21T09:49:40.0581428Z cvt.rn.f16x2.f32 %r1526, %r1525, %r1524; 2026-02-21T09:49:40.0581730Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0582039Z cvt.u64.u32 %rd529, %r802; 2026-02-21T09:49:40.0582203Z cvt.u64.u32 %rd530, %r803; 2026-02-21T09:49:40.0582372Z shl.b64 %rd531, %rd530, 32; 2026-02-21T09:49:40.0582571Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T09:49:40.0582868Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0583173Z mov.b64 {%r1527, %r1528}, %rd532; 2026-02-21T09:49:40.0583368Z cvt.rn.f16x2.f32 %r1529, %r1528, %r1527; 2026-02-21T09:49:40.0583675Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0584002Z 
cvt.u64.u32 %rd533, %r804; 2026-02-21T09:49:40.0584173Z cvt.u64.u32 %rd534, %r805; 2026-02-21T09:49:40.0584334Z shl.b64 %rd535, %rd534, 32; 2026-02-21T09:49:40.0584507Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T09:49:40.0584820Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0585136Z mov.b64 {%r1530, %r1531}, %rd536; 2026-02-21T09:49:40.0585327Z cvt.rn.f16x2.f32 %r1532, %r1531, %r1530; 2026-02-21T09:49:40.0585629Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0585942Z cvt.u64.u32 %rd537, %r806; 2026-02-21T09:49:40.0586103Z cvt.u64.u32 %rd538, %r807; 2026-02-21T09:49:40.0586270Z shl.b64 %rd539, %rd538, 32; 2026-02-21T09:49:40.0586436Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T09:49:40.0586763Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0587076Z mov.b64 {%r1533, %r1534}, %rd540; 2026-02-21T09:49:40.0587267Z cvt.rn.f16x2.f32 %r1535, %r1534, %r1533; 2026-02-21T09:49:40.0587569Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0587868Z cvt.u64.u32 %rd541, %r808; 2026-02-21T09:49:40.0588039Z cvt.u64.u32 %rd542, %r809; 2026-02-21T09:49:40.0588203Z shl.b64 %rd543, %rd542, 32; 2026-02-21T09:49:40.0588376Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T09:49:40.0588660Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0588973Z mov.b64 {%r1536, %r1537}, %rd544; 2026-02-21T09:49:40.0589168Z cvt.rn.f16x2.f32 %r1538, %r1537, %r1536; 2026-02-21T09:49:40.0589504Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0589827Z cvt.u64.u32 %rd545, %r811; 2026-02-21T09:49:40.0589994Z cvt.u64.u32 %rd546, %r812; 2026-02-21T09:49:40.0590165Z shl.b64 %rd547, %rd546, 32; 2026-02-21T09:49:40.0590330Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T09:49:40.0590623Z .loc 1 58 27 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0590930Z mov.b64 {%r1539, %r1540}, %rd548; 2026-02-21T09:49:40.0591121Z cvt.rn.f16x2.f32 %r1541, %r1540, %r1539; 2026-02-21T09:49:40.0591428Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0591733Z cvt.u64.u32 %rd549, %r813; 2026-02-21T09:49:40.0591903Z cvt.u64.u32 %rd550, %r814; 2026-02-21T09:49:40.0592063Z shl.b64 %rd551, %rd550, 32; 2026-02-21T09:49:40.0592234Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T09:49:40.0592517Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0592839Z mov.b64 {%r1542, %r1543}, %rd552; 2026-02-21T09:49:40.0593036Z cvt.rn.f16x2.f32 %r1544, %r1543, %r1542; 2026-02-21T09:49:40.0593352Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0593683Z cvt.u64.u32 %rd553, %r815; 2026-02-21T09:49:40.0593849Z cvt.u64.u32 %rd554, %r816; 2026-02-21T09:49:40.0594022Z shl.b64 %rd555, %rd554, 32; 2026-02-21T09:49:40.0594189Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T09:49:40.0594491Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0594841Z mov.b64 {%r1545, %r1546}, %rd556; 2026-02-21T09:49:40.0595069Z cvt.rn.f16x2.f32 %r1547, %r1546, %r1545; 2026-02-21T09:49:40.0595390Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0595695Z cvt.u64.u32 %rd557, %r817; 2026-02-21T09:49:40.0595870Z cvt.u64.u32 %rd558, %r818; 2026-02-21T09:49:40.0596035Z shl.b64 %rd559, %rd558, 32; 2026-02-21T09:49:40.0596212Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T09:49:40.0596537Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0596853Z mov.b64 {%r1548, %r1549}, %rd560; 2026-02-21T09:49:40.0597048Z cvt.rn.f16x2.f32 %r1550, %r1549, %r1548; 2026-02-21T09:49:40.0597352Z .loc 1 56 52 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0597669Z cvt.u64.u32 %rd561, %r819; 2026-02-21T09:49:40.0597838Z cvt.u64.u32 %rd562, %r820; 2026-02-21T09:49:40.0598014Z shl.b64 %rd563, %rd562, 32; 2026-02-21T09:49:40.0598185Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T09:49:40.0598480Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0598794Z mov.b64 {%r1551, %r1552}, %rd564; 2026-02-21T09:49:40.0598983Z cvt.rn.f16x2.f32 %r1553, %r1552, %r1551; 2026-02-21T09:49:40.0599328Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0599645Z cvt.u64.u32 %rd565, %r821; 2026-02-21T09:49:40.0599820Z cvt.u64.u32 %rd566, %r822; 2026-02-21T09:49:40.0599986Z shl.b64 %rd567, %rd566, 32; 2026-02-21T09:49:40.0600165Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T09:49:40.0600459Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0600786Z mov.b64 {%r1554, %r1555}, %rd568; 2026-02-21T09:49:40.0600988Z cvt.rn.f16x2.f32 %r1556, %r1555, %r1554; 2026-02-21T09:49:40.0601306Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0601630Z cvt.u64.u32 %rd569, %r823; 2026-02-21T09:49:40.0601797Z cvt.u64.u32 %rd570, %r824; 2026-02-21T09:49:40.0601969Z shl.b64 %rd571, %rd570, 32; 2026-02-21T09:49:40.0602141Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T09:49:40.0602490Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0602816Z mov.b64 {%r1557, %r1558}, %rd572; 2026-02-21T09:49:40.0603006Z cvt.rn.f16x2.f32 %r1559, %r1558, %r1557; 2026-02-21T09:49:40.0603319Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0603638Z cvt.u64.u32 %rd573, %r825; 2026-02-21T09:49:40.0603814Z cvt.u64.u32 %rd574, %r826; 2026-02-21T09:49:40.0603980Z shl.b64 %rd575, %rd574, 32; 2026-02-21T09:49:40.0604155Z or.b64 
%rd576, %rd573, %rd575; 2026-02-21T09:49:40.0604450Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0604811Z mov.b64 {%r1560, %r1561}, %rd576; 2026-02-21T09:49:40.0605010Z cvt.rn.f16x2.f32 %r1562, %r1561, %r1560; 2026-02-21T09:49:40.0605324Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0605659Z cvt.u64.u32 %rd577, %r828; 2026-02-21T09:49:40.0605830Z cvt.u64.u32 %rd578, %r829; 2026-02-21T09:49:40.0606009Z shl.b64 %rd579, %rd578, 32; 2026-02-21T09:49:40.0606181Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T09:49:40.0606486Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0606805Z mov.b64 {%r1563, %r1564}, %rd580; 2026-02-21T09:49:40.0606995Z cvt.rn.f16x2.f32 %r1565, %r1564, %r1563; 2026-02-21T09:49:40.0607312Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0607636Z cvt.u64.u32 %rd581, %r830; 2026-02-21T09:49:40.0607844Z cvt.u64.u32 %rd582, %r831; 2026-02-21T09:49:40.0608009Z shl.b64 %rd583, %rd582, 32; 2026-02-21T09:49:40.0608185Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T09:49:40.0608474Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0608788Z mov.b64 {%r1566, %r1567}, %rd584; 2026-02-21T09:49:40.0608984Z cvt.rn.f16x2.f32 %r1568, %r1567, %r1566; 2026-02-21T09:49:40.0609326Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0609637Z cvt.u64.u32 %rd585, %r832; 2026-02-21T09:49:40.0609802Z cvt.u64.u32 %rd586, %r833; 2026-02-21T09:49:40.0609975Z shl.b64 %rd587, %rd586, 32; 2026-02-21T09:49:40.0610144Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T09:49:40.0610434Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0610745Z mov.b64 {%r1569, %r1570}, %rd588; 2026-02-21T09:49:40.0610934Z cvt.rn.f16x2.f32 %r1571, %r1570, %r1569; 
2026-02-21T09:49:40.0611246Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0611552Z cvt.u64.u32 %rd589, %r834; 2026-02-21T09:49:40.0611724Z cvt.u64.u32 %rd590, %r835; 2026-02-21T09:49:40.0611892Z shl.b64 %rd591, %rd590, 32; 2026-02-21T09:49:40.0612102Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T09:49:40.0612392Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0612711Z mov.b64 {%r1572, %r1573}, %rd592; 2026-02-21T09:49:40.0612906Z cvt.rn.f16x2.f32 %r1574, %r1573, %r1572; 2026-02-21T09:49:40.0613210Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0613524Z cvt.u64.u32 %rd593, %r836; 2026-02-21T09:49:40.0613696Z cvt.u64.u32 %rd594, %r837; 2026-02-21T09:49:40.0613875Z shl.b64 %rd595, %rd594, 32; 2026-02-21T09:49:40.0614049Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T09:49:40.0614343Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0614666Z mov.b64 {%r1575, %r1576}, %rd596; 2026-02-21T09:49:40.0614880Z cvt.rn.f16x2.f32 %r1577, %r1576, %r1575; 2026-02-21T09:49:40.0615227Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0615544Z cvt.u64.u32 %rd597, %r838; 2026-02-21T09:49:40.0615716Z cvt.u64.u32 %rd598, %r839; 2026-02-21T09:49:40.0615883Z shl.b64 %rd599, %rd598, 32; 2026-02-21T09:49:40.0616060Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T09:49:40.0616348Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0616664Z mov.b64 {%r1578, %r1579}, %rd600; 2026-02-21T09:49:40.0616857Z cvt.rn.f16x2.f32 %r1580, %r1579, %r1578; 2026-02-21T09:49:40.0617160Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0617480Z cvt.u64.u32 %rd601, %r840; 2026-02-21T09:49:40.0617647Z cvt.u64.u32 %rd602, %r841; 2026-02-21T09:49:40.0617823Z shl.b64 %rd603, %rd602, 
32; 2026-02-21T09:49:40.0617992Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T09:49:40.0618296Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0618623Z mov.b64 {%r1581, %r1582}, %rd604; 2026-02-21T09:49:40.0618820Z cvt.rn.f16x2.f32 %r1583, %r1582, %r1581; 2026-02-21T09:49:40.0619194Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0619579Z cvt.u64.u32 %rd605, %r842; 2026-02-21T09:49:40.0619776Z cvt.u64.u32 %rd606, %r843; 2026-02-21T09:49:40.0619942Z shl.b64 %rd607, %rd606, 32; 2026-02-21T09:49:40.0620119Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T09:49:40.0620406Z .loc 1 58 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:58:27 2026-02-21T09:49:40.0620756Z mov.b64 {%r1584, %r1585}, %rd608; 2026-02-21T09:49:40.0620948Z cvt.rn.f16x2.f32 %r1586, %r1585, %r1584; 2026-02-21T09:49:40.0621250Z .loc 1 59 83 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:59:83 2026-02-21T09:49:40.0621606Z st.shared.v4.b32 [%r80], {%r1205, %r1217, %r1229, %r1241}; 2026-02-21T09:49:40.0621913Z st.shared.v4.b32 [%r81], {%r1253, %r1265, %r1277, %r1289}; 2026-02-21T09:49:40.0622179Z st.shared.v4.b32 [%r82], {%r1301, %r1313, %r1325, %r1337}; 2026-02-21T09:49:40.0622433Z st.shared.v4.b32 [%r83], {%r1349, %r1361, %r1373, %r1385}; 2026-02-21T09:49:40.0622658Z bar.sync 0, 128; 2026-02-21T09:49:40.0622817Z // begin inline asm 2026-02-21T09:49:40.0623088Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1005, %r1009, %r1013, %r1017}, [%r849]; 2026-02-21T09:49:40.0623392Z // end inline asm 2026-02-21T09:49:40.0623540Z // begin inline asm 2026-02-21T09:49:40.0623805Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1021, %r1025, %r1029, %r1033}, [%r854]; 2026-02-21T09:49:40.0624091Z // end inline asm 2026-02-21T09:49:40.0624245Z // begin inline asm 2026-02-21T09:49:40.0624493Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1037, %r1041, %r1045, %r1049}, [%r859]; 2026-02-21T09:49:40.0624834Z // 
end inline asm 2026-02-21T09:49:40.0624988Z // begin inline asm 2026-02-21T09:49:40.0625274Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1053, %r1057, %r1061, %r1065}, [%r864]; 2026-02-21T09:49:40.0625580Z // end inline asm 2026-02-21T09:49:40.0625724Z bar.sync 0, 128; 2026-02-21T09:49:40.0625922Z st.shared.v4.b32 [%r80], {%r1397, %r1409, %r1421, %r1433}; 2026-02-21T09:49:40.0626180Z st.shared.v4.b32 [%r81], {%r1445, %r1457, %r1469, %r1481}; 2026-02-21T09:49:40.0626458Z st.shared.v4.b32 [%r82], {%r1493, %r1505, %r1517, %r1529}; 2026-02-21T09:49:40.0626727Z st.shared.v4.b32 [%r83], {%r1541, %r1553, %r1565, %r1577}; 2026-02-21T09:49:40.0626946Z bar.sync 0, 128; 2026-02-21T09:49:40.0627106Z // begin inline asm 2026-02-21T09:49:40.0627355Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1069, %r1073, %r1077, %r1081}, [%r849]; 2026-02-21T09:49:40.0627652Z // end inline asm 2026-02-21T09:49:40.0627796Z // begin inline asm 2026-02-21T09:49:40.0628081Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1085, %r1089, %r1093, %r1097}, [%r854]; 2026-02-21T09:49:40.0628369Z // end inline asm 2026-02-21T09:49:40.0628523Z // begin inline asm 2026-02-21T09:49:40.0628775Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1101, %r1105, %r1109, %r1113}, [%r859]; 2026-02-21T09:49:40.0629065Z // end inline asm 2026-02-21T09:49:40.0629219Z // begin inline asm 2026-02-21T09:49:40.0629467Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1117, %r1121, %r1125, %r1129}, [%r864]; 2026-02-21T09:49:40.0629757Z // end inline asm 2026-02-21T09:49:40.0629899Z bar.sync 0, 128; 2026-02-21T09:49:40.0630088Z st.shared.v4.b32 [%r80], {%r1208, %r1220, %r1232, %r1244}; 2026-02-21T09:49:40.0630342Z st.shared.v4.b32 [%r81], {%r1256, %r1268, %r1280, %r1292}; 2026-02-21T09:49:40.0630604Z st.shared.v4.b32 [%r82], {%r1304, %r1316, %r1328, %r1340}; 2026-02-21T09:49:40.0630859Z st.shared.v4.b32 [%r83], {%r1352, %r1364, %r1376, %r1388}; 2026-02-21T09:49:40.0631074Z bar.sync 0, 128; 2026-02-21T09:49:40.0631227Z // begin inline asm 
2026-02-21T09:49:40.0631478Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1006, %r1010, %r1014, %r1018}, [%r849]; 2026-02-21T09:49:40.0631772Z // end inline asm 2026-02-21T09:49:40.0631916Z // begin inline asm 2026-02-21T09:49:40.0632169Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1022, %r1026, %r1030, %r1034}, [%r854]; 2026-02-21T09:49:40.0632456Z // end inline asm 2026-02-21T09:49:40.0632603Z // begin inline asm 2026-02-21T09:49:40.0632857Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1038, %r1042, %r1046, %r1050}, [%r859]; 2026-02-21T09:49:40.0633145Z // end inline asm 2026-02-21T09:49:40.0633293Z // begin inline asm 2026-02-21T09:49:40.0633537Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1054, %r1058, %r1062, %r1066}, [%r864]; 2026-02-21T09:49:40.0633858Z // end inline asm 2026-02-21T09:49:40.0634001Z bar.sync 0, 128; 2026-02-21T09:49:40.0634193Z st.shared.v4.b32 [%r80], {%r1400, %r1412, %r1424, %r1436}; 2026-02-21T09:49:40.0634452Z st.shared.v4.b32 [%r81], {%r1448, %r1460, %r1472, %r1484}; 2026-02-21T09:49:40.0634762Z st.shared.v4.b32 [%r82], {%r1496, %r1508, %r1520, %r1532}; 2026-02-21T09:49:40.0635061Z st.shared.v4.b32 [%r83], {%r1544, %r1556, %r1568, %r1580}; 2026-02-21T09:49:40.0635277Z bar.sync 0, 128; 2026-02-21T09:49:40.0635430Z // begin inline asm 2026-02-21T09:49:40.0635685Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1070, %r1074, %r1078, %r1082}, [%r849]; 2026-02-21T09:49:40.0635990Z // end inline asm 2026-02-21T09:49:40.0636135Z // begin inline asm 2026-02-21T09:49:40.0636394Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1086, %r1090, %r1094, %r1098}, [%r854]; 2026-02-21T09:49:40.0636680Z // end inline asm 2026-02-21T09:49:40.0636835Z // begin inline asm 2026-02-21T09:49:40.0637101Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1102, %r1106, %r1110, %r1114}, [%r859]; 2026-02-21T09:49:40.0637390Z // end inline asm 2026-02-21T09:49:40.0637549Z // begin inline asm 2026-02-21T09:49:40.0637806Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1118, %r1122, 
%r1126, %r1130}, [%r864]; 2026-02-21T09:49:40.0638110Z // end inline asm 2026-02-21T09:49:40.0638254Z bar.sync 0, 128; 2026-02-21T09:49:40.0638481Z st.shared.v4.b32 [%r80], {%r1211, %r1223, %r1235, %r1247}; 2026-02-21T09:49:40.0638743Z st.shared.v4.b32 [%r81], {%r1259, %r1271, %r1283, %r1295}; 2026-02-21T09:49:40.0639007Z st.shared.v4.b32 [%r82], {%r1307, %r1319, %r1331, %r1343}; 2026-02-21T09:49:40.0639266Z st.shared.v4.b32 [%r83], {%r1355, %r1367, %r1379, %r1391}; 2026-02-21T09:49:40.0639481Z bar.sync 0, 128; 2026-02-21T09:49:40.0639636Z // begin inline asm 2026-02-21T09:49:40.0639888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1007, %r1011, %r1015, %r1019}, [%r849]; 2026-02-21T09:49:40.0640181Z // end inline asm 2026-02-21T09:49:40.0640326Z // begin inline asm 2026-02-21T09:49:40.0640582Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1023, %r1027, %r1031, %r1035}, [%r854]; 2026-02-21T09:49:40.0640870Z // end inline asm 2026-02-21T09:49:40.0641015Z // begin inline asm 2026-02-21T09:49:40.0641299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1039, %r1043, %r1047, %r1051}, [%r859]; 2026-02-21T09:49:40.0641593Z // end inline asm 2026-02-21T09:49:40.0641750Z // begin inline asm 2026-02-21T09:49:40.0641995Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1055, %r1059, %r1063, %r1067}, [%r864]; 2026-02-21T09:49:40.0642291Z // end inline asm 2026-02-21T09:49:40.0642433Z bar.sync 0, 128; 2026-02-21T09:49:40.0642626Z st.shared.v4.b32 [%r80], {%r1403, %r1415, %r1427, %r1439}; 2026-02-21T09:49:40.0642888Z st.shared.v4.b32 [%r81], {%r1451, %r1463, %r1475, %r1487}; 2026-02-21T09:49:40.0643147Z st.shared.v4.b32 [%r82], {%r1499, %r1511, %r1523, %r1535}; 2026-02-21T09:49:40.0643402Z st.shared.v4.b32 [%r83], {%r1547, %r1559, %r1571, %r1583}; 2026-02-21T09:49:40.0643617Z bar.sync 0, 128; 2026-02-21T09:49:40.0643768Z // begin inline asm 2026-02-21T09:49:40.0644015Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1071, %r1075, %r1079, %r1083}, [%r849]; 2026-02-21T09:49:40.0644307Z // end 
inline asm 2026-02-21T09:49:40.0644448Z // begin inline asm 2026-02-21T09:49:40.0644763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1087, %r1091, %r1095, %r1099}, [%r854]; 2026-02-21T09:49:40.0645058Z // end inline asm 2026-02-21T09:49:40.0645201Z // begin inline asm 2026-02-21T09:49:40.0645458Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1103, %r1107, %r1111, %r1115}, [%r859]; 2026-02-21T09:49:40.0645742Z // end inline asm 2026-02-21T09:49:40.0645892Z // begin inline asm 2026-02-21T09:49:40.0646139Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1119, %r1123, %r1127, %r1131}, [%r864]; 2026-02-21T09:49:40.0646437Z // end inline asm 2026-02-21T09:49:40.0646579Z bar.sync 0, 128; 2026-02-21T09:49:40.0646768Z st.shared.v4.b32 [%r80], {%r1214, %r1226, %r1238, %r1250}; 2026-02-21T09:49:40.0647073Z st.shared.v4.b32 [%r81], {%r1262, %r1274, %r1286, %r1298}; 2026-02-21T09:49:40.0647329Z st.shared.v4.b32 [%r82], {%r1310, %r1322, %r1334, %r1346}; 2026-02-21T09:49:40.0647586Z st.shared.v4.b32 [%r83], {%r1358, %r1370, %r1382, %r1394}; 2026-02-21T09:49:40.0647803Z bar.sync 0, 128; 2026-02-21T09:49:40.0647956Z // begin inline asm 2026-02-21T09:49:40.0648241Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1008, %r1012, %r1016, %r1020}, [%r849]; 2026-02-21T09:49:40.0648545Z // end inline asm 2026-02-21T09:49:40.0648691Z // begin inline asm 2026-02-21T09:49:40.0648952Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1028, %r1032, %r1036}, [%r854]; 2026-02-21T09:49:40.0649253Z // end inline asm 2026-02-21T09:49:40.0649400Z // begin inline asm 2026-02-21T09:49:40.0649666Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1040, %r1044, %r1048, %r1052}, [%r859]; 2026-02-21T09:49:40.0649957Z // end inline asm 2026-02-21T09:49:40.0650114Z // begin inline asm 2026-02-21T09:49:40.0650368Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1056, %r1060, %r1064, %r1068}, [%r864]; 2026-02-21T09:49:40.0650668Z // end inline asm 2026-02-21T09:49:40.0650811Z bar.sync 0, 128; 2026-02-21T09:49:40.0651006Z 
st.shared.v4.b32 [%r80], {%r1406, %r1418, %r1430, %r1442}; 2026-02-21T09:49:40.0651282Z st.shared.v4.b32 [%r81], {%r1454, %r1466, %r1478, %r1490}; 2026-02-21T09:49:40.0651584Z st.shared.v4.b32 [%r82], {%r1502, %r1514, %r1526, %r1538}; 2026-02-21T09:49:40.0651845Z st.shared.v4.b32 [%r83], {%r1550, %r1562, %r1574, %r1586}; 2026-02-21T09:49:40.0652057Z bar.sync 0, 128; 2026-02-21T09:49:40.0652209Z // begin inline asm 2026-02-21T09:49:40.0652465Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1072, %r1076, %r1080, %r1084}, [%r849]; 2026-02-21T09:49:40.0652765Z // end inline asm 2026-02-21T09:49:40.0652911Z // begin inline asm 2026-02-21T09:49:40.0653170Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1088, %r1092, %r1096, %r1100}, [%r854]; 2026-02-21T09:49:40.0653468Z // end inline asm 2026-02-21T09:49:40.0653613Z // begin inline asm 2026-02-21T09:49:40.0653871Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1104, %r1108, %r1112, %r1116}, [%r859]; 2026-02-21T09:49:40.0654161Z // end inline asm 2026-02-21T09:49:40.0654316Z // begin inline asm 2026-02-21T09:49:40.0654600Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1120, %r1124, %r1128, %r1132}, [%r864]; 2026-02-21T09:49:40.0654968Z // end inline asm 2026-02-21T09:49:40.0655114Z // begin inline asm 2026-02-21T09:49:40.0655327Z st.global.v4.b32 [ %rd65 + 0 ], { %r1005, %r1006, %r1007, %r1008 }; 2026-02-21T09:49:40.0655572Z // end inline asm 2026-02-21T09:49:40.0655716Z // begin inline asm 2026-02-21T09:49:40.0655925Z st.global.v4.b32 [ %rd66 + 0 ], { %r1009, %r1010, %r1011, %r1012 }; 2026-02-21T09:49:40.0656153Z // end inline asm 2026-02-21T09:49:40.0656303Z // begin inline asm 2026-02-21T09:49:40.0656502Z st.global.v4.b32 [ %rd67 + 0 ], { %r1013, %r1014, %r1015, %r1016 }; 2026-02-21T09:49:40.0656736Z // end inline asm 2026-02-21T09:49:40.0656880Z // begin inline asm 2026-02-21T09:49:40.0657082Z st.global.v4.b32 [ %rd68 + 0 ], { %r1017, %r1018, %r1019, %r1020 }; 2026-02-21T09:49:40.0657312Z // end inline asm 
2026-02-21T09:49:40.0657455Z // begin inline asm 2026-02-21T09:49:40.0657662Z st.global.v4.b32 [ %rd69 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T09:49:40.0657890Z // end inline asm 2026-02-21T09:49:40.0658042Z // begin inline asm 2026-02-21T09:49:40.0658235Z st.global.v4.b32 [ %rd70 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T09:49:40.0658467Z // end inline asm 2026-02-21T09:49:40.0658610Z // begin inline asm 2026-02-21T09:49:40.0658811Z st.global.v4.b32 [ %rd71 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T09:49:40.0659042Z // end inline asm 2026-02-21T09:49:40.0659186Z // begin inline asm 2026-02-21T09:49:40.0659391Z st.global.v4.b32 [ %rd72 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T09:49:40.0659617Z // end inline asm 2026-02-21T09:49:40.0659811Z // begin inline asm 2026-02-21T09:49:40.0660011Z st.global.v4.b32 [ %rd73 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T09:49:40.0660252Z // end inline asm 2026-02-21T09:49:40.0660402Z // begin inline asm 2026-02-21T09:49:40.0660603Z st.global.v4.b32 [ %rd74 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T09:49:40.0660836Z // end inline asm 2026-02-21T09:49:40.0660982Z // begin inline asm 2026-02-21T09:49:40.0661222Z st.global.v4.b32 [ %rd75 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T09:49:40.0661450Z // end inline asm 2026-02-21T09:49:40.0661600Z // begin inline asm 2026-02-21T09:49:40.0661793Z st.global.v4.b32 [ %rd76 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T09:49:40.0662023Z // end inline asm 2026-02-21T09:49:40.0662165Z // begin inline asm 2026-02-21T09:49:40.0662365Z st.global.v4.b32 [ %rd77 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T09:49:40.0662591Z // end inline asm 2026-02-21T09:49:40.0662741Z // begin inline asm 2026-02-21T09:49:40.0662944Z st.global.v4.b32 [ %rd78 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T09:49:40.0663168Z // end inline asm 2026-02-21T09:49:40.0663319Z // begin inline asm 
2026-02-21T09:49:40.0663510Z st.global.v4.b32 [ %rd79 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T09:49:40.0663742Z // end inline asm 2026-02-21T09:49:40.0663888Z // begin inline asm 2026-02-21T09:49:40.0664120Z st.global.v4.b32 [ %rd80 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T09:49:40.0664349Z // end inline asm 2026-02-21T09:49:40.0664498Z // begin inline asm 2026-02-21T09:49:40.0664741Z st.global.v4.b32 [ %rd81 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T09:49:40.0664975Z // end inline asm 2026-02-21T09:49:40.0665127Z // begin inline asm 2026-02-21T09:49:40.0665321Z st.global.v4.b32 [ %rd82 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T09:49:40.0665387Z // end inline asm 2026-02-21T09:49:40.0665448Z // begin inline asm 2026-02-21T09:49:40.0665555Z st.global.v4.b32 [ %rd83 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T09:49:40.0665615Z // end inline asm 2026-02-21T09:49:40.0665683Z // begin inline asm 2026-02-21T09:49:40.0665809Z st.global.v4.b32 [ %rd84 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T09:49:40.0665868Z // end inline asm 2026-02-21T09:49:40.0665935Z // begin inline asm 2026-02-21T09:49:40.0666071Z st.global.v4.b32 [ %rd85 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T09:49:40.0666135Z // end inline asm 2026-02-21T09:49:40.0666195Z // begin inline asm 2026-02-21T09:49:40.0666309Z st.global.v4.b32 [ %rd86 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T09:49:40.0666368Z // end inline asm 2026-02-21T09:49:40.0666429Z // begin inline asm 2026-02-21T09:49:40.0666541Z st.global.v4.b32 [ %rd87 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:49:40.0666600Z // end inline asm 2026-02-21T09:49:40.0666660Z // begin inline asm 2026-02-21T09:49:40.0666766Z st.global.v4.b32 [ %rd88 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:49:40.0666834Z // end inline asm 2026-02-21T09:49:40.0666895Z // begin inline asm 2026-02-21T09:49:40.0666999Z st.global.v4.b32 [ %rd89 + 0 ], { 
%r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:49:40.0667065Z // end inline asm 2026-02-21T09:49:40.0667124Z // begin inline asm 2026-02-21T09:49:40.0667232Z st.global.v4.b32 [ %rd90 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T09:49:40.0667300Z // end inline asm 2026-02-21T09:49:40.0667361Z // begin inline asm 2026-02-21T09:49:40.0667466Z st.global.v4.b32 [ %rd91 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T09:49:40.0667524Z // end inline asm 2026-02-21T09:49:40.0667589Z // begin inline asm 2026-02-21T09:49:40.0667696Z st.global.v4.b32 [ %rd92 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T09:49:40.0667754Z // end inline asm 2026-02-21T09:49:40.0667822Z // begin inline asm 2026-02-21T09:49:40.0667927Z st.global.v4.b32 [ %rd93 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T09:49:40.0667985Z // end inline asm 2026-02-21T09:49:40.0668076Z // begin inline asm 2026-02-21T09:49:40.0668194Z st.global.v4.b32 [ %rd94 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T09:49:40.0668253Z // end inline asm 2026-02-21T09:49:40.0668312Z // begin inline asm 2026-02-21T09:49:40.0668427Z st.global.v4.b32 [ %rd95 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T09:49:40.0668486Z // end inline asm 2026-02-21T09:49:40.0668547Z // begin inline asm 2026-02-21T09:49:40.0668688Z st.global.v4.b32 [ %rd96 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T09:49:40.0668756Z // end inline asm 2026-02-21T09:49:40.0668961Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0669038Z add.s32 %r1133, %r1136, 180368; 2026-02-21T09:49:40.0669240Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0669302Z bar.sync 0, 128; 2026-02-21T09:49:40.0669372Z // begin inline asm 2026-02-21T09:49:40.0669494Z @%p123 mbarrier.arrive.shared::cta.b64 _, [%r1133]; 2026-02-21T09:49:40.0669553Z // end inline asm 2026-02-21T09:49:40.0669620Z add.s32 %r1587, %r1653, 1; 
2026-02-21T09:49:40.0669693Z setp.eq.b32 %p121, %r1587, 2; 2026-02-21T09:49:40.0669773Z selp.b32 %r1653, 0, %r1587, %p121; 2026-02-21T09:49:40.0669843Z selp.b32 %r1652, 1, 0, %p121; 2026-02-21T09:49:40.0669962Z $L__BB0_22: // %.thread21 2026-02-21T09:49:40.0670077Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.0670268Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0670338Z xor.b32 %r1655, %r1655, %r1652; 2026-02-21T09:49:40.0670535Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0670605Z add.s32 %r1642, %r1642, -1; 2026-02-21T09:49:40.0670675Z setp.ne.b32 %p122, %r1642, 0; 2026-02-21T09:49:40.0670752Z @%p122 bra $L__BB0_18; 2026-02-21T09:49:40.0670817Z bra.uni $L__BB0_23; 2026-02-21T09:49:40.0670932Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:49:40.0670999Z add.s32 %r570, %r1648, 1; 2026-02-21T09:49:40.0671078Z setp.eq.b32 %p117, %r1648, 63; 2026-02-21T09:49:40.0671174Z selp.b32 %r1648, 0, %r570, %p117; 2026-02-21T09:49:40.0671245Z setp.eq.b32 %p118, %r1648, 63; 2026-02-21T09:49:40.0671319Z @%p118 bra $L__BB0_21; 2026-02-21T09:49:40.0671427Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.0671609Z .loc 1 0 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:107 2026-02-21T09:49:40.0671670Z mov.b32 %r1652, 0; 2026-02-21T09:49:40.0671860Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0671929Z setp.ne.b32 %p119, %r1648, 0; 2026-02-21T09:49:40.0671992Z @%p119 bra $L__BB0_22; 2026-02-21T09:49:40.0672084Z // %bb.20: // %.thread 2026-02-21T09:49:40.0672180Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.0672245Z add.s32 %r1651, %r1651, 1; 2026-02-21T09:49:40.0672432Z .loc 1 37 35 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:37:35 2026-02-21T09:49:40.0672498Z shr.s32 %r1589, %r1651, 31; 2026-02-21T09:49:40.0672564Z shr.u32 %r1590, %r1589, 
24; 2026-02-21T09:49:40.0672630Z add.s32 %r1591, %r1651, %r1590; 2026-02-21T09:49:40.0672699Z shr.s32 %r1592, %r1591, 8; 2026-02-21T09:49:40.0672876Z .loc 1 38 33 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:38:33 2026-02-21T09:49:40.0672938Z shl.b32 %r1593, %r1592, 5; 2026-02-21T09:49:40.0673121Z .loc 1 39 39 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:39:39 2026-02-21T09:49:40.0673185Z sub.s32 %r1594, 96, %r1593; 2026-02-21T09:49:40.0673420Z .loc 1 39 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:39:52 2026-02-21T09:49:40.0673490Z min.s32 %r1595, %r1594, 32; 2026-02-21T09:49:40.0673673Z .loc 1 40 45 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:45 2026-02-21T09:49:40.0673740Z and.b32 %r1596, %r1591, -256; 2026-02-21T09:49:40.0673816Z sub.s32 %r1597, %r1651, %r1596; 2026-02-21T09:49:40.0674023Z .loc 1 41 51 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:41:51 2026-02-21T09:49:40.0674090Z div.s32 %r1598, %r1597, %r1595; 2026-02-21T09:49:40.0674272Z .loc 1 40 64 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:64 2026-02-21T09:49:40.0674349Z mul.lo.s32 %r1599, %r1598, %r1595; 2026-02-21T09:49:40.0674415Z sub.s32 %r1600, %r1597, %r1599; 2026-02-21T09:49:40.0674594Z .loc 1 40 30 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:30 2026-02-21T09:49:40.0674707Z add.s32 %r1601, %r1600, %r1593; 2026-02-21T09:49:40.0674893Z .loc 1 42 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:42:27 2026-02-21T09:49:40.0674956Z shl.b32 %r1649, %r1601, 7; 2026-02-21T09:49:40.0675147Z .loc 1 44 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:44:27 2026-02-21T09:49:40.0675241Z shl.b32 %r1650, %r1598, 8; 2026-02-21T09:49:40.0675437Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0675508Z bra.uni $L__BB0_22; 2026-02-21T09:49:40.0675617Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:49:40.0675807Z .loc 1 0 107 
// cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:107 2026-02-21T09:49:40.0675872Z mov.b32 %r111, global_smem; 2026-02-21T09:49:40.0675942Z add.s32 %r112, %r111, %r3; 2026-02-21T09:49:40.0676008Z mov.u32 %r161, %ctaid.x; 2026-02-21T09:49:40.0676075Z mul.lo.s32 %r162, %r161, 3; 2026-02-21T09:49:40.0676145Z add.s32 %r163, %r162, 3; 2026-02-21T09:49:40.0676208Z min.s32 %r164, %r163, 768; 2026-02-21T09:49:40.0676274Z sub.s32 %r165, %r164, %r162; 2026-02-21T09:49:40.0676337Z shl.b32 %r5, %r165, 6; 2026-02-21T09:49:40.0676412Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:49:40.0676506Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.0676620Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0676819Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0676907Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0676970Z barrier.sync 1; 2026-02-21T09:49:40.0677039Z barrier.sync 1; 2026-02-21T09:49:40.0677125Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0677214Z $L__BB0_2: // %.preheader 2026-02-21T09:49:40.0677312Z // =>This Loop Header: Depth=1 2026-02-21T09:49:40.0677418Z // Child Loop BB0_11 Depth 2 2026-02-21T09:49:40.0677515Z // Child Loop BB0_7 Depth 2 2026-02-21T09:49:40.0677696Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19 2026-02-21T09:49:40.0677789Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:49:40.0677854Z barrier.sync 1; 2026-02-21T09:49:40.0677925Z ld.shared.b8 %r110, [%r112+180380]; 2026-02-21T09:49:40.0678000Z setp.gt.u32 %p4, %r110, 3; 2026-02-21T09:49:40.0678062Z @%p4 bra $L__BB0_4; 2026-02-21T09:49:40.0678150Z // %bb.3: // %.preheader 2026-02-21T09:49:40.0678244Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0678320Z $L_brx_0: .branchtargets 2026-02-21T09:49:40.0678381Z $L__BB0_5, 2026-02-21T09:49:40.0678438Z $L__BB0_9, 2026-02-21T09:49:40.0678503Z $L__BB0_15, 2026-02-21T09:49:40.0678592Z $L__BB0_24; 2026-02-21T09:49:40.0678660Z brx.idx 
%r110, $L_brx_0; 2026-02-21T09:49:40.0678767Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0678971Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0679059Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0679146Z ld.shared.b32 %r4, [global_smem+172032]; 2026-02-21T09:49:40.0679266Z barrier.sync 1; 2026-02-21T09:49:40.0679330Z @%p17 bra $L__BB0_8; 2026-02-21T09:49:40.0679411Z // %bb.6: // %.lr.ph9 2026-02-21T09:49:40.0679512Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0679699Z .loc 1 0 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:107 2026-02-21T09:49:40.0679762Z mov.b32 %r1630, -1; 2026-02-21T09:49:40.0679828Z mov.pred %p142, 0; 2026-02-21T09:49:40.0679895Z mov.b32 %r1626, 0; 2026-02-21T09:49:40.0679959Z mov.b32 %r1625, %r5; 2026-02-21T09:49:40.0680022Z mov.b32 %r1627, %r1626; 2026-02-21T09:49:40.0680093Z mov.b32 %r1628, %r1626; 2026-02-21T09:49:40.0680154Z mov.b32 %r1629, %r1626; 2026-02-21T09:49:40.0680260Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:49:40.0680370Z // => This Inner Loop Header: Depth=2 2026-02-21T09:49:40.0680586Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0680651Z add.s32 %r180, %r1630, 1; 2026-02-21T09:49:40.0680719Z setp.eq.b32 %p30, %r1630, 63; 2026-02-21T09:49:40.0680799Z selp.b32 %r1630, 0, %r180, %p30; 2026-02-21T09:49:40.0680862Z shl.b32 %r181, %r1629, 3; 2026-02-21T09:49:40.0680927Z add.s32 %r183, %r111, %r181; 2026-02-21T09:49:40.0681000Z add.s32 %r184, %r183, 180224; 2026-02-21T09:49:40.0681063Z add.s32 %r168, %r183, 180288; 2026-02-21T09:49:40.0681247Z .loc 1 54 31 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:54:31 2026-02-21T09:49:40.0681321Z shl.b32 %r185, %r1629, 14; 2026-02-21T09:49:40.0681387Z add.s32 %r186, %r111, %r185; 2026-02-21T09:49:40.0681570Z .loc 1 55 44 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:55:44 
2026-02-21T09:49:40.0681658Z shl.b32 %r187, %r1629, 13; 2026-02-21T09:49:40.0681732Z add.s32 %r188, %r111, %r187; 2026-02-21T09:49:40.0681797Z add.s32 %r189, %r188, 114688; 2026-02-21T09:49:40.0681978Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0682052Z bar.warp.sync -1; 2026-02-21T09:49:40.0682115Z // begin inline asm 2026-02-21T09:49:40.0682171Z 2026-02-21T09:49:40.0682225Z { 2026-02-21T09:49:40.0682301Z .reg .pred complete; 2026-02-21T09:49:40.0682362Z waitLoop: 2026-02-21T09:49:40.0682502Z mbarrier.try_wait.parity.shared.b64 complete, [%r168], %r1628; 2026-02-21T09:49:40.0682581Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.0682639Z } 2026-02-21T09:49:40.0682642Z 2026-02-21T09:49:40.0682703Z // end inline asm 2026-02-21T09:49:40.0682900Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0682964Z shl.b32 %r190, %r1627, 8; 2026-02-21T09:49:40.0683029Z add.s32 %r170, %r190, %r4; 2026-02-21T09:49:40.0683220Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0683292Z shl.b32 %r191, %r1627, 3; 2026-02-21T09:49:40.0683355Z add.s32 %r192, %r111, %r191; 2026-02-21T09:49:40.0683419Z add.s32 %r193, %r192, 180352; 2026-02-21T09:49:40.0683490Z setp.eq.b32 %p29, %r1630, 63; 2026-02-21T09:49:40.0683673Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0683743Z elect.sync %r194|%p20, -1; 2026-02-21T09:49:40.0683817Z bfe.u32 %r195, %r186, 4, 14; 2026-02-21T09:49:40.0683912Z cvt.u64.u32 %rd22, %r195; 2026-02-21T09:49:40.0683995Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:49:40.0684059Z bfe.u32 %r196, %r189, 4, 14; 2026-02-21T09:49:40.0684133Z cvt.u64.u32 %rd23, %r196; 2026-02-21T09:49:40.0684212Z or.b64 %rd13, %rd23, -9223371899382267904; 2026-02-21T09:49:40.0684276Z mov.b32 %r171, 136314896; 2026-02-21T09:49:40.0684343Z // begin inline asm 2026-02-21T09:49:40.0684509Z 
@%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd12, %rd13, %r171, %p142; 2026-02-21T09:49:40.0684595Z // end inline asm 2026-02-21T09:49:40.0684657Z add.s32 %r197, %r186, 32; 2026-02-21T09:49:40.0684762Z bfe.u32 %r198, %r197, 4, 14; 2026-02-21T09:49:40.0684826Z cvt.u64.u32 %rd24, %r198; 2026-02-21T09:49:40.0684901Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:49:40.0684970Z add.s32 %r199, %r188, 114720; 2026-02-21T09:49:40.0685033Z bfe.u32 %r200, %r199, 4, 14; 2026-02-21T09:49:40.0685095Z cvt.u64.u32 %rd25, %r200; 2026-02-21T09:49:40.0685167Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T09:49:40.0685240Z mov.pred %p21, -1; 2026-02-21T09:49:40.0685302Z // begin inline asm 2026-02-21T09:49:40.0685455Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd14, %rd15, %r171, %p21; 2026-02-21T09:49:40.0685524Z // end inline asm 2026-02-21T09:49:40.0685589Z add.s32 %r201, %r186, 8192; 2026-02-21T09:49:40.0685652Z bfe.u32 %r202, %r201, 4, 14; 2026-02-21T09:49:40.0685756Z cvt.u64.u32 %rd26, %r202; 2026-02-21T09:49:40.0685832Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:49:40.0685894Z // begin inline asm 2026-02-21T09:49:40.0686052Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd16, %rd13, %r171, %p142; 2026-02-21T09:49:40.0686120Z // end inline asm 2026-02-21T09:49:40.0686184Z add.s32 %r203, %r186, 8224; 2026-02-21T09:49:40.0686247Z bfe.u32 %r204, %r203, 4, 14; 2026-02-21T09:49:40.0686319Z cvt.u64.u32 %rd27, %r204; 2026-02-21T09:49:40.0686392Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:49:40.0686455Z // begin inline asm 2026-02-21T09:49:40.0686616Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd18, %rd15, %r171, %p21; 2026-02-21T09:49:40.0686677Z // end inline asm 2026-02-21T09:49:40.0686741Z cvt.u64.u32 %rd20, %r184; 2026-02-21T09:49:40.0686802Z // begin inline asm 2026-02-21T09:49:40.0686984Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 
2026-02-21T09:49:40.0687052Z // end inline asm 2026-02-21T09:49:40.0687124Z and.pred %p28, %p29, %p20; 2026-02-21T09:49:40.0687195Z cvt.u64.u32 %rd21, %r193; 2026-02-21T09:49:40.0687256Z // begin inline asm 2026-02-21T09:49:40.0687394Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:49:40.0687462Z // end inline asm 2026-02-21T09:49:40.0687644Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0687717Z setp.ne.b32 %p142, %r1630, 63; 2026-02-21T09:49:40.0687904Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0687981Z add.s32 %r205, %r1627, 1; 2026-02-21T09:49:40.0688049Z setp.eq.b32 %p31, %r205, 2; 2026-02-21T09:49:40.0688118Z selp.b32 %r206, 0, %r205, %p31; 2026-02-21T09:49:40.0688200Z selp.b32 %r1627, %r206, %r1627, %p29; 2026-02-21T09:49:40.0688272Z and.pred %p32, %p29, %p31; 2026-02-21T09:49:40.0688340Z selp.b32 %r207, 1, 0, %p32; 2026-02-21T09:49:40.0688406Z xor.b32 %r1626, %r1626, %r207; 2026-02-21T09:49:40.0688602Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0688664Z shl.b32 %r208, %r1627, 3; 2026-02-21T09:49:40.0688728Z add.s32 %r209, %r111, %r208; 2026-02-21T09:49:40.0688798Z add.s32 %r178, %r209, 180368; 2026-02-21T09:49:40.0688978Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0689040Z // begin inline asm 2026-02-21T09:49:40.0689103Z 2026-02-21T09:49:40.0689188Z { 2026-02-21T09:49:40.0689258Z @!%p29 bra.uni skipWait; 2026-02-21T09:49:40.0689324Z .reg .pred complete; 2026-02-21T09:49:40.0689392Z waitLoop: 2026-02-21T09:49:40.0689529Z mbarrier.try_wait.parity.shared.b64 complete, [%r178], %r1626; 2026-02-21T09:49:40.0689600Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.0689668Z skipWait: 2026-02-21T09:49:40.0689724Z } 2026-02-21T09:49:40.0689728Z 2026-02-21T09:49:40.0689823Z // end inline asm 2026-02-21T09:49:40.0690008Z 
.loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0690072Z add.s32 %r210, %r1629, 1; 2026-02-21T09:49:40.0690138Z setp.eq.b32 %p33, %r210, 7; 2026-02-21T09:49:40.0690208Z selp.b32 %r1629, 0, %r210, %p33; 2026-02-21T09:49:40.0690281Z selp.b32 %r211, 1, 0, %p33; 2026-02-21T09:49:40.0690347Z xor.b32 %r1628, %r1628, %r211; 2026-02-21T09:49:40.0690544Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0690618Z add.s32 %r1625, %r1625, -1; 2026-02-21T09:49:40.0690684Z setp.ne.b32 %p34, %r1625, 0; 2026-02-21T09:49:40.0690749Z @%p34 bra $L__BB0_7; 2026-02-21T09:49:40.0690844Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:49:40.0690951Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0691015Z barrier.sync 1; 2026-02-21T09:49:40.0691129Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0691201Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.0691307Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0691501Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0691593Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0691655Z barrier.sync 1; 2026-02-21T09:49:40.0691840Z .loc 1 21 67 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:21:67 2026-02-21T09:49:40.0691906Z mov.u32 %r114, %ctaid.y; 2026-02-21T09:49:40.0691978Z mov.u32 %r115, %ctaid.z; 2026-02-21T09:49:40.0692042Z mov.u32 %r116, %nctaid.x; 2026-02-21T09:49:40.0692105Z mov.u32 %r117, %nctaid.y; 2026-02-21T09:49:40.0692188Z mad.lo.s32 %r118, %r115, %r117, %r114; 2026-02-21T09:49:40.0692286Z mad.lo.s32 %r119, %r118, %r116, %r161; 2026-02-21T09:49:40.0692350Z shl.b32 %r120, %r119, 8; 2026-02-21T09:49:40.0692538Z .loc 1 22 68 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:22:68 2026-02-21T09:49:40.0692604Z cvt.s64.s32 %rd7, %r120; 2026-02-21T09:49:40.0692667Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:49:40.0692729Z add.s64 %rd9, %rd8, 128; 
2026-02-21T09:49:40.0692807Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:49:40.0692987Z .loc 1 21 67 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:21:67 2026-02-21T09:49:40.0693055Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:49:40.0693252Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0693318Z shl.b32 %r1631, %r165, 6; 2026-02-21T09:49:40.0693384Z setp.lt.s32 %p5, %r1631, 1; 2026-02-21T09:49:40.0693452Z @%p5 bra $L__BB0_14; 2026-02-21T09:49:40.0693533Z // %bb.10: // %.lr.ph 2026-02-21T09:49:40.0693629Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0693695Z add.s32 %r1641, %r162, -1; 2026-02-21T09:49:40.0693765Z add.s32 %r21, %r1, -128; 2026-02-21T09:49:40.0693826Z mov.b32 %r1638, -1; 2026-02-21T09:49:40.0693885Z mov.b32 %r1632, 0; 2026-02-21T09:49:40.0693953Z mov.b32 %r1633, %r1632; 2026-02-21T09:49:40.0694016Z mov.b32 %r1640, %r1632; 2026-02-21T09:49:40.0694078Z mov.b32 %r1639, %r1632; 2026-02-21T09:49:40.0694137Z mov.b32 %r1636, %r1632; 2026-02-21T09:49:40.0694206Z bra.uni $L__BB0_11; 2026-02-21T09:49:40.0694313Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:49:40.0694528Z .loc 1 0 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0:107 2026-02-21T09:49:40.0694604Z selp.b32 %r144, 0, %r1636, %p8; 2026-02-21T09:49:40.0694706Z setp.lt.u32 %p12, %r21, 32; 2026-02-21T09:49:40.0694772Z setp.eq.b32 %p9, %r21, 0; 2026-02-21T09:49:40.0694974Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0695070Z shl.b32 %r151, %r1633, 3; 2026-02-21T09:49:40.0695135Z add.s32 %r153, %r111, %r151; 2026-02-21T09:49:40.0695201Z add.s32 %r140, %r153, 180224; 2026-02-21T09:49:40.0695388Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0695448Z // begin inline asm 2026-02-21T09:49:40.0695504Z 2026-02-21T09:49:40.0695564Z { 2026-02-21T09:49:40.0695633Z .reg .pred complete; 2026-02-21T09:49:40.0695690Z 
waitLoop: 2026-02-21T09:49:40.0695824Z mbarrier.try_wait.parity.shared.b64 complete, [%r140], %r1632; 2026-02-21T09:49:40.0695904Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.0695958Z } 2026-02-21T09:49:40.0695962Z 2026-02-21T09:49:40.0696022Z // end inline asm 2026-02-21T09:49:40.0696218Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0696286Z add.s32 %r146, %r153, 180288; 2026-02-21T09:49:40.0696491Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0696565Z bar.sync 3, 64; 2026-02-21T09:49:40.0696629Z // begin inline asm 2026-02-21T09:49:40.0696755Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r146], 24576; 2026-02-21T09:49:40.0696819Z // end inline asm 2026-02-21T09:49:40.0697010Z .loc 1 54 31 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:54:31 2026-02-21T09:49:40.0697077Z shl.b32 %r154, %r1633, 14; 2026-02-21T09:49:40.0697142Z add.s32 %r143, %r111, %r154; 2026-02-21T09:49:40.0697321Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0697382Z bar.sync 3, 64; 2026-02-21T09:49:40.0697452Z elect.sync %r155|%p13, -1; 2026-02-21T09:49:40.0697526Z and.pred %p10, %p12, %p13; 2026-02-21T09:49:40.0697588Z // begin inline asm 2026-02-21T09:49:40.0697919Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r143], [%rd10, {%r144, %r1640}], [%r146]; 2026-02-21T09:49:40.0697982Z // end inline asm 2026-02-21T09:49:40.0698172Z .loc 1 55 44 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:55:44 2026-02-21T09:49:40.0698235Z shl.b32 %r156, %r1633, 13; 2026-02-21T09:49:40.0698297Z add.s32 %r157, %r111, %r156; 2026-02-21T09:49:40.0698369Z add.s32 %r147, %r157, 114688; 2026-02-21T09:49:40.0698546Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0698606Z bar.sync 3, 64; 2026-02-21T09:49:40.0698681Z elect.sync %r158|%p14, -1; 
2026-02-21T09:49:40.0698748Z and.pred %p11, %p12, %p14; 2026-02-21T09:49:40.0698809Z // begin inline asm 2026-02-21T09:49:40.0699087Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd11, {%r144, %r1639}], [%r146]; 2026-02-21T09:49:40.0699150Z // end inline asm 2026-02-21T09:49:40.0699344Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0699409Z add.s32 %r1636, %r144, 32; 2026-02-21T09:49:40.0699589Z .loc 1 0 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0699653Z add.s32 %r159, %r1633, 1; 2026-02-21T09:49:40.0699718Z setp.eq.b32 %p15, %r159, 7; 2026-02-21T09:49:40.0699794Z selp.b32 %r1633, 0, %r159, %p15; 2026-02-21T09:49:40.0699860Z selp.b32 %r160, 1, 0, %p15; 2026-02-21T09:49:40.0699926Z xor.b32 %r1632, %r1632, %r160; 2026-02-21T09:49:40.0700122Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0700235Z add.s32 %r1631, %r1631, -1; 2026-02-21T09:49:40.0700303Z setp.ne.b32 %p16, %r1631, 0; 2026-02-21T09:49:40.0700368Z @%p16 bra $L__BB0_11; 2026-02-21T09:49:40.0700438Z bra.uni $L__BB0_14; 2026-02-21T09:49:40.0700551Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:49:40.0700659Z // => This Inner Loop Header: Depth=2 2026-02-21T09:49:40.0700763Z add.s32 %r126, %r1638, 1; 2026-02-21T09:49:40.0700830Z setp.eq.b32 %p6, %r1638, 63; 2026-02-21T09:49:40.0700899Z selp.b32 %r1638, 0, %r126, %p6; 2026-02-21T09:49:40.0700966Z setp.ne.b32 %p7, %r1638, 0; 2026-02-21T09:49:40.0701039Z setp.eq.b32 %p8, %r1638, 0; 2026-02-21T09:49:40.0701102Z @%p7 bra $L__BB0_13; 2026-02-21T09:49:40.0701207Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:49:40.0701281Z add.s32 %r1641, %r1641, 1; 2026-02-21T09:49:40.0701474Z .loc 1 37 35 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:37:35 2026-02-21T09:49:40.0701537Z shr.s32 %r127, %r1641, 31; 2026-02-21T09:49:40.0701605Z shr.u32 %r128, 
%r127, 24; 2026-02-21T09:49:40.0701671Z add.s32 %r129, %r1641, %r128; 2026-02-21T09:49:40.0701735Z shr.s32 %r130, %r129, 8; 2026-02-21T09:49:40.0701955Z .loc 1 38 33 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:38:33 2026-02-21T09:49:40.0702032Z shl.b32 %r131, %r130, 5; 2026-02-21T09:49:40.0702218Z .loc 1 39 39 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:39:39 2026-02-21T09:49:40.0702281Z sub.s32 %r132, 96, %r131; 2026-02-21T09:49:40.0702472Z .loc 1 39 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:39:52 2026-02-21T09:49:40.0702535Z min.s32 %r133, %r132, 32; 2026-02-21T09:49:40.0702720Z .loc 1 40 45 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:45 2026-02-21T09:49:40.0702792Z and.b32 %r134, %r129, -256; 2026-02-21T09:49:40.0702859Z sub.s32 %r135, %r1641, %r134; 2026-02-21T09:49:40.0703039Z .loc 1 41 51 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:41:51 2026-02-21T09:49:40.0703135Z div.s32 %r136, %r135, %r133; 2026-02-21T09:49:40.0703319Z .loc 1 40 64 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:64 2026-02-21T09:49:40.0703387Z mul.lo.s32 %r137, %r136, %r133; 2026-02-21T09:49:40.0703451Z sub.s32 %r138, %r135, %r137; 2026-02-21T09:49:40.0703640Z .loc 1 40 30 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:40:30 2026-02-21T09:49:40.0703702Z add.s32 %r139, %r138, %r131; 2026-02-21T09:49:40.0703883Z .loc 1 42 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:42:27 2026-02-21T09:49:40.0703953Z shl.b32 %r1639, %r139, 7; 2026-02-21T09:49:40.0704132Z .loc 1 44 27 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:44:27 2026-02-21T09:49:40.0704198Z shl.b32 %r1640, %r136, 8; 2026-02-21T09:49:40.0704266Z bra.uni $L__BB0_13; 2026-02-21T09:49:40.0704358Z $L__BB0_14: // %._crit_edge 2026-02-21T09:49:40.0704458Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0704648Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 
2026-02-21T09:49:40.0704792Z barrier.sync 1; 2026-02-21T09:49:40.0704879Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.0704941Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.0705055Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.0705237Z .loc 1 19 0 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:19 2026-02-21T09:49:40.0705299Z barrier.sync 1; 2026-02-21T09:49:40.0705367Z barrier.sync 1; 2026-02-21T09:49:40.0705460Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.0705554Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:49:40.0705739Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0705808Z barrier.sync 1; 2026-02-21T09:49:40.0705894Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.0705961Z shl.b32 %r1623, %r1653, 3; 2026-02-21T09:49:40.0706067Z add.s32 %r1602, %r524, %r1623; 2026-02-21T09:49:40.0706249Z .loc 1 56 52 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:56:52 2026-02-21T09:49:40.0706310Z bar.sync 0, 128; 2026-02-21T09:49:40.0706381Z // begin inline asm 2026-02-21T09:49:40.0706438Z 2026-02-21T09:49:40.0706495Z { 2026-02-21T09:49:40.0706564Z .reg .pred complete; 2026-02-21T09:49:40.0706631Z waitLoop: 2026-02-21T09:49:40.0706769Z mbarrier.try_wait.parity.shared.b64 complete, [%r1602], %r1655; 2026-02-21T09:49:40.0706841Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.0706906Z } 2026-02-21T09:49:40.0706910Z 2026-02-21T09:49:40.0706972Z // end inline asm 2026-02-21T09:49:40.0707157Z .loc 1 31 107 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:107 2026-02-21T09:49:40.0707227Z bar.sync 0, 128; 2026-02-21T09:49:40.0720999Z // begin inline asm 2026-02-21T09:49:40.0721226Z @%p123 mbarrier.inval.shared::cta.b64 [%r524]; 2026-02-21T09:49:40.0721413Z // end inline asm 2026-02-21T09:49:40.0721495Z bar.sync 0, 128; 2026-02-21T09:49:40.0721565Z // begin inline asm 2026-02-21T09:49:40.0721668Z @%p123 mbarrier.inval.shared::cta.b64 [%r525]; 
2026-02-21T09:49:40.0721745Z // end inline asm 2026-02-21T09:49:40.0721808Z // begin inline asm 2026-02-21T09:49:40.0721901Z @%p123 mbarrier.inval.shared::cta.b64 [%r522]; 2026-02-21T09:49:40.0721972Z // end inline asm 2026-02-21T09:49:40.0722036Z bar.sync 0, 128; 2026-02-21T09:49:40.0722099Z // begin inline asm 2026-02-21T09:49:40.0722188Z @%p123 mbarrier.inval.shared::cta.b64 [%r523]; 2026-02-21T09:49:40.0722269Z // end inline asm 2026-02-21T09:49:40.0722334Z // begin inline asm 2026-02-21T09:49:40.0722420Z @%p123 mbarrier.inval.shared::cta.b64 [%r508]; 2026-02-21T09:49:40.0722493Z // end inline asm 2026-02-21T09:49:40.0722559Z bar.sync 0, 128; 2026-02-21T09:49:40.0722620Z // begin inline asm 2026-02-21T09:49:40.0722756Z @%p123 mbarrier.inval.shared::cta.b64 [%r509]; 2026-02-21T09:49:40.0722835Z // end inline asm 2026-02-21T09:49:40.0722898Z bar.sync 0, 128; 2026-02-21T09:49:40.0722961Z // begin inline asm 2026-02-21T09:49:40.0723058Z @%p123 mbarrier.inval.shared::cta.b64 [%r510]; 2026-02-21T09:49:40.0723120Z // end inline asm 2026-02-21T09:49:40.0723182Z bar.sync 0, 128; 2026-02-21T09:49:40.0723243Z // begin inline asm 2026-02-21T09:49:40.0723341Z @%p123 mbarrier.inval.shared::cta.b64 [%r511]; 2026-02-21T09:49:40.0723404Z // end inline asm 2026-02-21T09:49:40.0723464Z bar.sync 0, 128; 2026-02-21T09:49:40.0723536Z // begin inline asm 2026-02-21T09:49:40.0723626Z @%p123 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T09:49:40.0723686Z // end inline asm 2026-02-21T09:49:40.0723759Z bar.sync 0, 128; 2026-02-21T09:49:40.0723821Z // begin inline asm 2026-02-21T09:49:40.0723908Z @%p123 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T09:49:40.0723968Z // end inline asm 2026-02-21T09:49:40.0724042Z bar.sync 0, 128; 2026-02-21T09:49:40.0724108Z // begin inline asm 2026-02-21T09:49:40.0724196Z @%p123 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T09:49:40.0724267Z // end inline asm 2026-02-21T09:49:40.0724330Z // begin inline asm 2026-02-21T09:49:40.0724415Z 
@%p123 mbarrier.inval.shared::cta.b64 [%r501]; 2026-02-21T09:49:40.0724473Z // end inline asm 2026-02-21T09:49:40.0724541Z bar.sync 0, 128; 2026-02-21T09:49:40.0724603Z // begin inline asm 2026-02-21T09:49:40.0724767Z @%p123 mbarrier.inval.shared::cta.b64 [%r502]; 2026-02-21T09:49:40.0724838Z // end inline asm 2026-02-21T09:49:40.0724899Z bar.sync 0, 128; 2026-02-21T09:49:40.0724995Z // begin inline asm 2026-02-21T09:49:40.0725102Z @%p123 mbarrier.inval.shared::cta.b64 [%r503]; 2026-02-21T09:49:40.0725176Z // end inline asm 2026-02-21T09:49:40.0725248Z bar.sync 0, 128; 2026-02-21T09:49:40.0725323Z // begin inline asm 2026-02-21T09:49:40.0725436Z @%p123 mbarrier.inval.shared::cta.b64 [%r504]; 2026-02-21T09:49:40.0725510Z // end inline asm 2026-02-21T09:49:40.0725582Z bar.sync 0, 128; 2026-02-21T09:49:40.0725724Z // begin inline asm 2026-02-21T09:49:40.0725858Z @%p123 mbarrier.inval.shared::cta.b64 [%r505]; 2026-02-21T09:49:40.0725931Z // end inline asm 2026-02-21T09:49:40.0726003Z bar.sync 0, 128; 2026-02-21T09:49:40.0726086Z // begin inline asm 2026-02-21T09:49:40.0726191Z @%p123 mbarrier.inval.shared::cta.b64 [%r506]; 2026-02-21T09:49:40.0726260Z // end inline asm 2026-02-21T09:49:40.0726341Z bar.sync 0, 128; 2026-02-21T09:49:40.0726424Z // begin inline asm 2026-02-21T09:49:40.0726522Z @%p123 mbarrier.inval.shared::cta.b64 [%r507]; 2026-02-21T09:49:40.0726596Z // end inline asm 2026-02-21T09:49:40.0726865Z .loc 1 31 4 // cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:31:4 2026-02-21T09:49:40.0726937Z bar.sync 0, 128; 2026-02-21T09:49:40.0727012Z // begin inline asm 2026-02-21T09:49:40.0727187Z @%p35 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1622, 512; 2026-02-21T09:49:40.0727250Z // end inline asm 2026-02-21T09:49:40.0727371Z st.shared.b32 [global_smem+180384], 50529027; 2026-02-21T09:49:40.0727445Z barrier.sync 1; 2026-02-21T09:49:40.0727539Z $L__BB0_24: // %common.ret 2026-02-21T09:49:40.0727726Z .loc 1 0 0 // 
cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py:0 2026-02-21T09:49:40.0727785Z ret; 2026-02-21T09:49:40.0727859Z $L__tmp1: 2026-02-21T09:49:40.0727921Z $L__func_end0: 2026-02-21T09:49:40.0728014Z // -- End function 2026-02-21T09:49:40.0728086Z } 2026-02-21T09:49:40.0728314Z .file 1 "/tmp/torchinductor_root/qc/cqcq7sozztj32vv5m4jgu7qcuvsubktoxcn2x446yuy3yeaevb2v.py" 2026-02-21T09:49:40.0728385Z .section .debug_abbrev 2026-02-21T09:49:40.0728450Z { 2026-02-21T09:49:40.0728550Z .b8 1 // Abbreviation Code 2026-02-21T09:49:40.0728646Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:49:40.0728769Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:49:40.0728869Z .b8 37 // DW_AT_producer 2026-02-21T09:49:40.0728948Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.0729030Z .b8 19 // DW_AT_language 2026-02-21T09:49:40.0729123Z .b8 5 // DW_FORM_data2 2026-02-21T09:49:40.0729203Z .b8 3 // DW_AT_name 2026-02-21T09:49:40.0729285Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.0729378Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:49:40.0729459Z .b8 6 // DW_FORM_data4 2026-02-21T09:49:40.0729542Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:49:40.0729619Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.0729705Z .b8 0 // EOM(1) 2026-02-21T09:49:40.0729782Z .b8 0 // EOM(2) 2026-02-21T09:49:40.0729856Z .b8 0 // EOM(3) 2026-02-21T09:49:40.0729922Z } 2026-02-21T09:49:40.0729990Z .section .debug_info 2026-02-21T09:49:40.0730042Z { 2026-02-21T09:49:40.0730140Z .b32 104 // Length of Unit 2026-02-21T09:49:40.0730233Z .b8 2 // DWARF version number 2026-02-21T09:49:40.0730289Z .b8 0 2026-02-21T09:49:40.0730418Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:49:40.0730523Z .b8 8 // Address Size (in bytes) 2026-02-21T09:49:40.0730661Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:49:40.0730752Z .b8 116 // DW_AT_producer 2026-02-21T09:49:40.0730819Z .b8 114 2026-02-21T09:49:40.0730875Z .b8 105 2026-02-21T09:49:40.0730930Z .b8 116 2026-02-21T09:49:40.0730984Z .b8 111 2026-02-21T09:49:40.0731050Z .b8 110 2026-02-21T09:49:40.0731107Z .b8 0 2026-02-21T09:49:40.0731215Z .b8 2 // DW_AT_language 2026-02-21T09:49:40.0731279Z .b8 0 2026-02-21T09:49:40.0731360Z .b8 99 // DW_AT_name 2026-02-21T09:49:40.0731414Z .b8 113 2026-02-21T09:49:40.0731471Z .b8 99 2026-02-21T09:49:40.0731534Z .b8 113 2026-02-21T09:49:40.0731587Z .b8 55 2026-02-21T09:49:40.0731641Z .b8 115 2026-02-21T09:49:40.0731705Z .b8 111 2026-02-21T09:49:40.0731758Z .b8 122 2026-02-21T09:49:40.0731813Z .b8 122 2026-02-21T09:49:40.0731867Z .b8 116 2026-02-21T09:49:40.0731934Z .b8 106 2026-02-21T09:49:40.0731988Z .b8 51 2026-02-21T09:49:40.0732044Z .b8 50 2026-02-21T09:49:40.0732109Z .b8 118 2026-02-21T09:49:40.0732166Z .b8 118 2026-02-21T09:49:40.0732220Z .b8 53 2026-02-21T09:49:40.0732274Z .b8 109 2026-02-21T09:49:40.0732337Z .b8 52 2026-02-21T09:49:40.0732390Z .b8 106 2026-02-21T09:49:40.0732443Z .b8 103 2026-02-21T09:49:40.0732495Z .b8 117 2026-02-21T09:49:40.0732559Z .b8 55 2026-02-21T09:49:40.0732615Z .b8 113 2026-02-21T09:49:40.0732694Z .b8 99 2026-02-21T09:49:40.0732758Z .b8 117 2026-02-21T09:49:40.0732813Z .b8 118 2026-02-21T09:49:40.0732867Z .b8 115 2026-02-21T09:49:40.0732921Z .b8 117 2026-02-21T09:49:40.0732984Z .b8 98 2026-02-21T09:49:40.0733039Z .b8 107 2026-02-21T09:49:40.0733093Z .b8 116 2026-02-21T09:49:40.0733156Z .b8 111 2026-02-21T09:49:40.0733210Z .b8 120 2026-02-21T09:49:40.0733264Z .b8 99 2026-02-21T09:49:40.0733319Z .b8 110 2026-02-21T09:49:40.0733382Z .b8 50 2026-02-21T09:49:40.0733436Z .b8 120 2026-02-21T09:49:40.0733489Z .b8 52 2026-02-21T09:49:40.0733554Z .b8 52 2026-02-21T09:49:40.0733609Z .b8 54 
2026-02-21T09:49:40.0733664Z .b8 121 2026-02-21T09:49:40.0733719Z .b8 117 2026-02-21T09:49:40.0733782Z .b8 121 2026-02-21T09:49:40.0733836Z .b8 51 2026-02-21T09:49:40.0733891Z .b8 121 2026-02-21T09:49:40.0733946Z .b8 101 2026-02-21T09:49:40.0734010Z .b8 97 2026-02-21T09:49:40.0734064Z .b8 101 2026-02-21T09:49:40.0734118Z .b8 118 2026-02-21T09:49:40.0734208Z .b8 98 2026-02-21T09:49:40.0734265Z .b8 50 2026-02-21T09:49:40.0734324Z .b8 118 2026-02-21T09:49:40.0734379Z .b8 46 2026-02-21T09:49:40.0734443Z .b8 112 2026-02-21T09:49:40.0734499Z .b8 121 2026-02-21T09:49:40.0734555Z .b8 0 2026-02-21T09:49:40.0734665Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:49:40.0734806Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:49:40.0734862Z .b8 116 2026-02-21T09:49:40.0734918Z .b8 109 2026-02-21T09:49:40.0734987Z .b8 112 2026-02-21T09:49:40.0735044Z .b8 47 2026-02-21T09:49:40.0735100Z .b8 116 2026-02-21T09:49:40.0735165Z .b8 111 2026-02-21T09:49:40.0735219Z .b8 114 2026-02-21T09:49:40.0735276Z .b8 99 2026-02-21T09:49:40.0735328Z .b8 104 2026-02-21T09:49:40.0735393Z .b8 105 2026-02-21T09:49:40.0735447Z .b8 110 2026-02-21T09:49:40.0735501Z .b8 100 2026-02-21T09:49:40.0735554Z .b8 117 2026-02-21T09:49:40.0735616Z .b8 99 2026-02-21T09:49:40.0735671Z .b8 116 2026-02-21T09:49:40.0735724Z .b8 111 2026-02-21T09:49:40.0735788Z .b8 114 2026-02-21T09:49:40.0735843Z .b8 95 2026-02-21T09:49:40.0735897Z .b8 114 2026-02-21T09:49:40.0735954Z .b8 111 2026-02-21T09:49:40.0736024Z .b8 111 2026-02-21T09:49:40.0736084Z .b8 116 2026-02-21T09:49:40.0736139Z .b8 47 2026-02-21T09:49:40.0736202Z .b8 113 2026-02-21T09:49:40.0736255Z .b8 99 2026-02-21T09:49:40.0736312Z .b8 0 2026-02-21T09:49:40.0736368Z } 2026-02-21T09:49:40.0736450Z .section .debug_macinfo { } 2026-02-21T09:49:40.0736457Z 2026-02-21T09:49:40.0736544Z ================================================================ 2026-02-21T09:49:40.0736656Z please share the reproducer above with Triton project. 
2026-02-21T09:49:40.8750204Z 2026-02-21T09:49:40.8751860Z 2026-02-21T09:49:40.8751872Z 2026-02-21T09:49:40.8752282Z ================================================================ 2026-02-21T09:49:40.8752614Z Internal Triton PTX codegen error 2026-02-21T09:49:40.8752823Z `ptxas` stderr: 2026-02-21T09:49:40.8753358Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 207 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:49:40.8754230Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:49:40.8754413Z 2026-02-21T09:49:40.8755150Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxn4_2s8h.ptx -o /tmp/tmpxn4_2s8h.ptx.o 2026-02-21T09:49:40.8755674Z 2026-02-21T09:49:40.8755678Z 2026-02-21T09:49:40.8755755Z // 2026-02-21T09:49:40.8755920Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:49:40.8756127Z // 2026-02-21T09:49:40.8756206Z 2026-02-21T09:49:40.8756279Z .version 8.7 2026-02-21T09:49:40.8756453Z .target sm_100a 2026-02-21T09:49:40.8756611Z .address_size 64 2026-02-21T09:49:40.8756721Z 2026-02-21T09:49:40.8756864Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:49:40.8757314Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:49:40.8757568Z // @_helion_matmul 2026-02-21T09:49:40.8757905Z .visible .entry _helion_matmul( 2026-02-21T09:49:40.8758187Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:49:40.8758534Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:49:40.8758889Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:49:40.8759236Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:49:40.8759580Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:49:40.8759849Z ) 2026-02-21T09:49:40.8760021Z .reqntid 256 2026-02-21T09:49:40.8760189Z .maxnreg 32 
2026-02-21T09:49:40.8760370Z { 2026-02-21T09:49:40.8760533Z .reg .pred %p<123>; 2026-02-21T09:49:40.8760736Z .reg .b32 %r<1611>; 2026-02-21T09:49:40.8760923Z .reg .b64 %rd<609>; 2026-02-21T09:49:40.8761268Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19:0 2026-02-21T09:49:40.8761760Z $L__func_begin0: 2026-02-21T09:49:40.8762122Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19:0 2026-02-21T09:49:40.8762491Z 2026-02-21T09:49:40.8762563Z // %bb.0: 2026-02-21T09:49:40.8762736Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:49:40.8762972Z $L__tmp0: 2026-02-21T09:49:40.8763247Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19 2026-02-21T09:49:40.8763594Z mov.u32 %r1, %tid.x; 2026-02-21T09:49:40.8763766Z shr.u32 %r2, %r1, 5; 2026-02-21T09:49:40.8763965Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:49:40.8764193Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:49:40.8764376Z @%p3 bra $L__BB0_16; 2026-02-21T09:49:40.8764544Z bra.uni $L__BB0_1; 2026-02-21T09:49:40.8764750Z $L__BB0_16: 2026-02-21T09:49:40.8765025Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:0 2026-02-21T09:49:40.8765373Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:49:40.8765622Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:49:40.8765954Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19 2026-02-21T09:49:40.8766329Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.8766553Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:49:40.8766743Z mov.b32 %r190, global_smem; 2026-02-21T09:49:40.8766932Z // begin inline asm 2026-02-21T09:49:40.8767210Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r190], 256; 2026-02-21T09:49:40.8767512Z // end inline asm 2026-02-21T09:49:40.8767667Z bar.sync 0, 128; 2026-02-21T09:49:40.8767844Z ld.shared.b32 %r1582, [global_smem]; 2026-02-21T09:49:40.8768083Z bar.sync 0, 128; 2026-02-21T09:49:40.8768240Z 
// begin inline asm 2026-02-21T09:49:40.8768481Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:49:40.8768736Z // end inline asm 2026-02-21T09:49:40.8769028Z .loc 1 21 67 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:21:67 2026-02-21T09:49:40.8769446Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:49:40.8769620Z mov.u32 %r495, %ctaid.y; 2026-02-21T09:49:40.8769797Z mov.u32 %r496, %ctaid.z; 2026-02-21T09:49:40.8769966Z mov.u32 %r497, %nctaid.x; 2026-02-21T09:49:40.8770150Z mov.u32 %r498, %nctaid.y; 2026-02-21T09:49:40.8770331Z mad.lo.s32 %r499, %r496, %r498, %r495; 2026-02-21T09:49:40.8770549Z mad.lo.s32 %r500, %r499, %r497, %r41; 2026-02-21T09:49:40.8770744Z shl.b32 %r501, %r500, 8; 2026-02-21T09:49:40.8770924Z cvt.s64.s32 %rd64, %r501; 2026-02-21T09:49:40.8771108Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T09:49:40.8771286Z shl.b32 %r502, %r1, 2; 2026-02-21T09:49:40.8771465Z add.s32 %r191, %r190, %r502; 2026-02-21T09:49:40.8771639Z mov.b32 %r1610, 0; 2026-02-21T09:49:40.8771802Z // begin inline asm 2026-02-21T09:49:40.8771976Z @%p33 st.shared.b32 [ %r191 + 0 ], %r1610; 2026-02-21T09:49:40.8772183Z // end inline asm 2026-02-21T09:49:40.8772340Z bar.warp.sync -1; 2026-02-21T09:49:40.8772514Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:49:40.8772734Z cvt.u64.u32 %rd28, %r190; 2026-02-21T09:49:40.8772915Z // begin inline asm 2026-02-21T09:49:40.8773212Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T09:49:40.8773537Z // end inline asm 2026-02-21T09:49:40.8773696Z // begin inline asm 2026-02-21T09:49:40.8773950Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:49:40.8774251Z // end inline asm 2026-02-21T09:49:40.8774398Z mov.b32 %r193, 32; 2026-02-21T09:49:40.8774558Z // begin inline asm 2026-02-21T09:49:40.8774867Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r193; 2026-02-21T09:49:40.8775190Z // end inline asm 
2026-02-21T09:49:40.8775348Z mov.b32 %r194, 256; 2026-02-21T09:49:40.8775504Z // begin inline asm 2026-02-21T09:49:40.8775810Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r194; 2026-02-21T09:49:40.8776114Z // end inline asm 2026-02-21T09:49:40.8776274Z mov.b32 %r195, 2048; 2026-02-21T09:49:40.8776435Z // begin inline asm 2026-02-21T09:49:40.8776722Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r195; 2026-02-21T09:49:40.8777042Z // end inline asm 2026-02-21T09:49:40.8777192Z // begin inline asm 2026-02-21T09:49:40.8777474Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r195; 2026-02-21T09:49:40.8777787Z // end inline asm 2026-02-21T09:49:40.8777943Z mov.b64 %rd36, 4096; 2026-02-21T09:49:40.8778101Z // begin inline asm 2026-02-21T09:49:40.8778397Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:49:40.8778721Z // end inline asm 2026-02-21T09:49:40.8778878Z mov.b32 %r197, 1; 2026-02-21T09:49:40.8779034Z // begin inline asm 2026-02-21T09:49:40.8779326Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r197; 2026-02-21T09:49:40.8779663Z // end inline asm 2026-02-21T09:49:40.8779817Z // begin inline asm 2026-02-21T09:49:40.8780117Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r197; 2026-02-21T09:49:40.8780438Z // end inline asm 2026-02-21T09:49:40.8780594Z // begin inline asm 2026-02-21T09:49:40.8780863Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:49:40.8781166Z // end inline asm 2026-02-21T09:49:40.8781323Z // begin inline asm 2026-02-21T09:49:40.8781609Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.8781980Z // end inline asm 2026-02-21T09:49:40.8782128Z // begin inline asm 2026-02-21T09:49:40.8782404Z @%p111 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:49:40.8782705Z // end inline asm 2026-02-21T09:49:40.8782859Z // begin inline asm 2026-02-21T09:49:40.8783127Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.8783453Z // end inline asm 2026-02-21T09:49:40.8783612Z // begin inline asm 2026-02-21T09:49:40.8784001Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:49:40.8784446Z // end inline asm 2026-02-21T09:49:40.8784596Z // begin inline asm 2026-02-21T09:49:40.8784877Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:49:40.8785170Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:49:40.8785389Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:49:40.8785599Z // end inline asm 2026-02-21T09:49:40.8785748Z bar.sync 0, 128; 2026-02-21T09:49:40.8786040Z .loc 1 22 68 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:22:68 2026-02-21T09:49:40.8786380Z add.s64 %rd61, %rd43, 128; 2026-02-21T09:49:40.8786560Z bar.sync 0, 128; 2026-02-21T09:49:40.8786710Z // begin inline asm 2026-02-21T09:49:40.8786921Z @%p33 st.shared.b32 [ %r191 + 0 ], %r1610; 2026-02-21T09:49:40.8787138Z // end inline asm 2026-02-21T09:49:40.8787291Z bar.warp.sync -1; 2026-02-21T09:49:40.8787451Z // begin inline asm 2026-02-21T09:49:40.8787723Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T09:49:40.8788039Z // end inline asm 2026-02-21T09:49:40.8788183Z // begin inline asm 2026-02-21T09:49:40.8788438Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:49:40.8788724Z // end inline asm 2026-02-21T09:49:40.8788875Z // begin inline asm 2026-02-21T09:49:40.8789150Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r193; 2026-02-21T09:49:40.8789442Z // end 
inline asm 2026-02-21T09:49:40.8789596Z mov.b32 %r202, 128; 2026-02-21T09:49:40.8789751Z // begin inline asm 2026-02-21T09:49:40.8790043Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r202; 2026-02-21T09:49:40.8790344Z // end inline asm 2026-02-21T09:49:40.8790505Z // begin inline asm 2026-02-21T09:49:40.8790786Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r195; 2026-02-21T09:49:40.8791097Z // end inline asm 2026-02-21T09:49:40.8791258Z mov.b32 %r204, 12288; 2026-02-21T09:49:40.8791422Z // begin inline asm 2026-02-21T09:49:40.8791703Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r204; 2026-02-21T09:49:40.8792012Z // end inline asm 2026-02-21T09:49:40.8792173Z // begin inline asm 2026-02-21T09:49:40.8792463Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:49:40.8792787Z // end inline asm 2026-02-21T09:49:40.8792945Z // begin inline asm 2026-02-21T09:49:40.8793234Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r197; 2026-02-21T09:49:40.8793565Z // end inline asm 2026-02-21T09:49:40.8793714Z // begin inline asm 2026-02-21T09:49:40.8794005Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r197; 2026-02-21T09:49:40.8794330Z // end inline asm 2026-02-21T09:49:40.8794489Z // begin inline asm 2026-02-21T09:49:40.8794790Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:49:40.8795097Z // end inline asm 2026-02-21T09:49:40.8795251Z // begin inline asm 2026-02-21T09:49:40.8795528Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.8795872Z // end inline asm 2026-02-21T09:49:40.8796014Z // begin inline asm 2026-02-21T09:49:40.8796317Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 
2026-02-21T09:49:40.8796618Z // end inline asm 2026-02-21T09:49:40.8796762Z // begin inline asm 2026-02-21T09:49:40.8797023Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:49:40.8797310Z // end inline asm 2026-02-21T09:49:40.8797462Z // begin inline asm 2026-02-21T09:49:40.8797913Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:49:40.8798331Z // end inline asm 2026-02-21T09:49:40.8798483Z // begin inline asm 2026-02-21T09:49:40.8798714Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:49:40.8798993Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:49:40.8799201Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:49:40.8799399Z // end inline asm 2026-02-21T09:49:40.8799543Z bar.sync 0, 128; 2026-02-21T09:49:40.8799816Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.8800131Z sub.s32 %r505, 768, %r41; 2026-02-21T09:49:40.8800319Z mul.hi.s32 %r506, %r505, -580400985; 2026-02-21T09:49:40.8800519Z add.s32 %r507, %r506, %r505; 2026-02-21T09:49:40.8800694Z shr.u32 %r508, %r507, 31; 2026-02-21T09:49:40.8800867Z shr.s32 %r509, %r507, 10; 2026-02-21T09:49:40.8801067Z add.s32 %r510, %r509, %r508; 2026-02-21T09:49:40.8801254Z mul.lo.s32 %r511, %r510, 1184; 2026-02-21T09:49:40.8801439Z setp.ne.b32 %p102, %r505, %r511; 2026-02-21T09:49:40.8801639Z setp.lt.u32 %p103, %r41, 769; 2026-02-21T09:49:40.8801822Z and.pred %p104, %p103, %p102; 2026-02-21T09:49:40.8802012Z selp.b32 %r512, 1, 0, %p104; 2026-02-21T09:49:40.8802183Z add.s32 %r513, %r510, %r512; 2026-02-21T09:49:40.8802362Z shl.b32 %r75, %r513, 6; 2026-02-21T09:49:40.8802658Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8802998Z shfl.sync.idx.b32 %r514, %r2, 0, 31, -1; 2026-02-21T09:49:40.8803208Z shl.b32 %r515, %r514, 21; 
2026-02-21T09:49:40.8803378Z and.b32 %r516, %r515, 6291456; 2026-02-21T09:49:40.8803566Z add.s32 %r207, %r516, %r1582; 2026-02-21T09:49:40.8803741Z mov.pred %p71, -1; 2026-02-21T09:49:40.8803940Z // begin inline asm 2026-02-21T09:49:40.8804399Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 0], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8804941Z // end inline asm 2026-02-21T09:49:40.8805103Z // begin inline asm 2026-02-21T09:49:40.8805548Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 16], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8806029Z // end inline asm 2026-02-21T09:49:40.8806177Z // begin inline asm 2026-02-21T09:49:40.8806631Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 32], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8807125Z // end inline asm 2026-02-21T09:49:40.8807272Z // begin inline asm 2026-02-21T09:49:40.8807726Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 48], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8808188Z // end inline asm 2026-02-21T09:49:40.8808345Z // begin inline asm 2026-02-21T09:49:40.8808787Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 64], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8809259Z // end inline asm 2026-02-21T09:49:40.8809415Z // begin inline asm 2026-02-21T09:49:40.8809851Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 80], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, 
%r1610}; 2026-02-21T09:49:40.8810367Z // end inline asm 2026-02-21T09:49:40.8810514Z // begin inline asm 2026-02-21T09:49:40.8810942Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 96], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8811461Z // end inline asm 2026-02-21T09:49:40.8811607Z // begin inline asm 2026-02-21T09:49:40.8812040Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 112], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8812525Z // end inline asm 2026-02-21T09:49:40.8812680Z // begin inline asm 2026-02-21T09:49:40.8813117Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 128], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8813588Z // end inline asm 2026-02-21T09:49:40.8813740Z // begin inline asm 2026-02-21T09:49:40.8814166Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 144], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8814657Z // end inline asm 2026-02-21T09:49:40.8814863Z // begin inline asm 2026-02-21T09:49:40.8815313Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 160], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8815814Z // end inline asm 2026-02-21T09:49:40.8815958Z // begin inline asm 2026-02-21T09:49:40.8816472Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 176], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8816990Z // end inline asm 2026-02-21T09:49:40.8817161Z // begin inline asm 
2026-02-21T09:49:40.8817622Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 192], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8818155Z // end inline asm 2026-02-21T09:49:40.8818344Z // begin inline asm 2026-02-21T09:49:40.8818799Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 208], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8819304Z // end inline asm 2026-02-21T09:49:40.8819452Z // begin inline asm 2026-02-21T09:49:40.8819900Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 224], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8820414Z // end inline asm 2026-02-21T09:49:40.8820565Z // begin inline asm 2026-02-21T09:49:40.8821044Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 240], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:49:40.8821523Z // end inline asm 2026-02-21T09:49:40.8821681Z // begin inline asm 2026-02-21T09:49:40.8821857Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:49:40.8822059Z // end inline asm 2026-02-21T09:49:40.8822219Z bar.sync 0, 128; 2026-02-21T09:49:40.8822536Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.8822909Z add.s32 %r479, %r190, 106496; 2026-02-21T09:49:40.8823085Z // begin inline asm 2026-02-21T09:49:40.8823283Z @%p111 mbarrier.init.shared::cta.b64 [%r479], 1; 2026-02-21T09:49:40.8823500Z // end inline asm 2026-02-21T09:49:40.8823656Z bar.sync 0, 128; 2026-02-21T09:49:40.8823807Z add.s32 %r480, %r190, 106504; 2026-02-21T09:49:40.8823986Z // begin inline asm 2026-02-21T09:49:40.8824212Z @%p111 mbarrier.init.shared::cta.b64 [%r480], 1; 
2026-02-21T09:49:40.8824435Z // end inline asm 2026-02-21T09:49:40.8824588Z bar.sync 0, 128; 2026-02-21T09:49:40.8824765Z add.s32 %r481, %r190, 106512; 2026-02-21T09:49:40.8824946Z // begin inline asm 2026-02-21T09:49:40.8825129Z @%p111 mbarrier.init.shared::cta.b64 [%r481], 1; 2026-02-21T09:49:40.8825349Z // end inline asm 2026-02-21T09:49:40.8825497Z bar.sync 0, 128; 2026-02-21T09:49:40.8825686Z add.s32 %r482, %r190, 106520; 2026-02-21T09:49:40.8825858Z // begin inline asm 2026-02-21T09:49:40.8826047Z @%p111 mbarrier.init.shared::cta.b64 [%r482], 1; 2026-02-21T09:49:40.8826261Z // end inline asm 2026-02-21T09:49:40.8826409Z add.s32 %r483, %r190, 106528; 2026-02-21T09:49:40.8826585Z // begin inline asm 2026-02-21T09:49:40.8826764Z @%p111 mbarrier.init.shared::cta.b64 [%r483], 1; 2026-02-21T09:49:40.8826975Z // end inline asm 2026-02-21T09:49:40.8827121Z bar.sync 0, 128; 2026-02-21T09:49:40.8827295Z add.s32 %r484, %r190, 106536; 2026-02-21T09:49:40.8827498Z // begin inline asm 2026-02-21T09:49:40.8827718Z @%p111 mbarrier.init.shared::cta.b64 [%r484], 1; 2026-02-21T09:49:40.8827963Z // end inline asm 2026-02-21T09:49:40.8828137Z bar.sync 0, 128; 2026-02-21T09:49:40.8828318Z add.s32 %r485, %r190, 106544; 2026-02-21T09:49:40.8828514Z // begin inline asm 2026-02-21T09:49:40.8828732Z @%p111 mbarrier.init.shared::cta.b64 [%r485], 1; 2026-02-21T09:49:40.8828998Z // end inline asm 2026-02-21T09:49:40.8829153Z bar.sync 0, 128; 2026-02-21T09:49:40.8829302Z add.s32 %r486, %r190, 106552; 2026-02-21T09:49:40.8829479Z // begin inline asm 2026-02-21T09:49:40.8829659Z @%p111 mbarrier.init.shared::cta.b64 [%r486], 1; 2026-02-21T09:49:40.8829874Z // end inline asm 2026-02-21T09:49:40.8830154Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.8830465Z bar.sync 0, 128; 2026-02-21T09:49:40.8830621Z // begin inline asm 2026-02-21T09:49:40.8830817Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r479]; 2026-02-21T09:49:40.8831048Z // end inline asm 
2026-02-21T09:49:40.8831196Z bar.sync 0, 128; 2026-02-21T09:49:40.8831357Z // begin inline asm 2026-02-21T09:49:40.8831552Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r480]; 2026-02-21T09:49:40.8831784Z // end inline asm 2026-02-21T09:49:40.8831969Z bar.sync 0, 128; 2026-02-21T09:49:40.8832118Z // begin inline asm 2026-02-21T09:49:40.8832315Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r481]; 2026-02-21T09:49:40.8832529Z // end inline asm 2026-02-21T09:49:40.8832682Z bar.sync 0, 128; 2026-02-21T09:49:40.8832830Z // begin inline asm 2026-02-21T09:49:40.8833022Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r482]; 2026-02-21T09:49:40.8833232Z // end inline asm 2026-02-21T09:49:40.8833516Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.8833837Z bar.sync 0, 128; 2026-02-21T09:49:40.8833987Z add.s32 %r491, %r190, 106560; 2026-02-21T09:49:40.8834170Z // begin inline asm 2026-02-21T09:49:40.8834351Z @%p111 mbarrier.init.shared::cta.b64 [%r491], 1; 2026-02-21T09:49:40.8834565Z // end inline asm 2026-02-21T09:49:40.8834755Z add.s32 %r1570, %r190, 106576; 2026-02-21T09:49:40.8834944Z // begin inline asm 2026-02-21T09:49:40.8835127Z @%p111 mbarrier.init.shared::cta.b64 [%r1570], 1; 2026-02-21T09:49:40.8835352Z // end inline asm 2026-02-21T09:49:40.8835640Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8835960Z bar.sync 0, 128; 2026-02-21T09:49:40.8836117Z // begin inline asm 2026-02-21T09:49:40.8836307Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r1570]; 2026-02-21T09:49:40.8836537Z // end inline asm 2026-02-21T09:49:40.8836813Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.8837188Z st.shared.b32 [global_smem+106584], 33554689; 2026-02-21T09:49:40.8837430Z st.shared.b32 [global_smem+98304], %r1582; 2026-02-21T09:49:40.8837692Z st.shared.b32 [global_smem+98312], %r75; 2026-02-21T09:49:40.8837906Z barrier.sync 1; 
2026-02-21T09:49:40.8838081Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.8838290Z barrier.sync 1; 2026-02-21T09:49:40.8838445Z setp.lt.s32 %p105, %r513, 1; 2026-02-21T09:49:40.8838633Z @%p105 bra $L__BB0_23; 2026-02-21T09:49:40.8838823Z // %bb.17: // %.lr.ph10 2026-02-21T09:49:40.8839195Z .loc 1 0 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:97 2026-02-21T09:49:40.8839551Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:49:40.8839758Z shr.u32 %r503, %r1, 4; 2026-02-21T09:49:40.8839931Z bfe.u32 %r42, %r1, 4, 3; 2026-02-21T09:49:40.8840096Z or.b32 %r43, %r42, 8; 2026-02-21T09:49:40.8840262Z or.b32 %r44, %r42, 16; 2026-02-21T09:49:40.8840419Z or.b32 %r45, %r42, 24; 2026-02-21T09:49:40.8840580Z or.b32 %r46, %r42, 32; 2026-02-21T09:49:40.8840732Z or.b32 %r47, %r42, 40; 2026-02-21T09:49:40.8840896Z or.b32 %r48, %r42, 48; 2026-02-21T09:49:40.8841052Z or.b32 %r49, %r503, 56; 2026-02-21T09:49:40.8841221Z or.b32 %r50, %r42, 64; 2026-02-21T09:49:40.8841381Z or.b32 %r51, %r42, 72; 2026-02-21T09:49:40.8841534Z or.b32 %r52, %r42, 80; 2026-02-21T09:49:40.8841694Z or.b32 %r53, %r42, 88; 2026-02-21T09:49:40.8841850Z or.b32 %r54, %r42, 96; 2026-02-21T09:49:40.8842050Z or.b32 %r55, %r42, 104; 2026-02-21T09:49:40.8842220Z or.b32 %r56, %r42, 112; 2026-02-21T09:49:40.8842389Z or.b32 %r57, %r503, 120; 2026-02-21T09:49:40.8842549Z or.b32 %r58, %r42, 128; 2026-02-21T09:49:40.8842714Z or.b32 %r59, %r42, 136; 2026-02-21T09:49:40.8842869Z or.b32 %r60, %r42, 144; 2026-02-21T09:49:40.8843030Z or.b32 %r61, %r42, 152; 2026-02-21T09:49:40.8843192Z or.b32 %r62, %r42, 160; 2026-02-21T09:49:40.8843347Z or.b32 %r63, %r42, 168; 2026-02-21T09:49:40.8843506Z or.b32 %r64, %r42, 176; 2026-02-21T09:49:40.8843664Z or.b32 %r65, %r503, 184; 2026-02-21T09:49:40.8843829Z or.b32 %r66, %r42, 192; 2026-02-21T09:49:40.8843985Z or.b32 %r67, %r42, 200; 2026-02-21T09:49:40.8844143Z or.b32 %r68, %r42, 208; 2026-02-21T09:49:40.8844296Z or.b32 %r69, %r42, 216; 
2026-02-21T09:49:40.8844458Z or.b32 %r70, %r42, 224; 2026-02-21T09:49:40.8844610Z or.b32 %r71, %r42, 232; 2026-02-21T09:49:40.8844809Z or.b32 %r72, %r42, 240; 2026-02-21T09:49:40.8845011Z or.b32 %r73, %r503, 248; 2026-02-21T09:49:40.8845175Z shl.b32 %r504, %r1, 3; 2026-02-21T09:49:40.8845343Z and.b32 %r74, %r504, 120; 2026-02-21T09:49:40.8845631Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.8845966Z add.s32 %r1607, %r41, -1184; 2026-02-21T09:49:40.8846140Z shl.b32 %r519, %r1, 10; 2026-02-21T09:49:40.8846312Z and.b32 %r520, %r519, 6144; 2026-02-21T09:49:40.8846480Z shl.b32 %r521, %r1, 4; 2026-02-21T09:49:40.8846646Z and.b32 %r522, %r521, 2032; 2026-02-21T09:49:40.8846813Z or.b32 %r523, %r520, %r522; 2026-02-21T09:49:40.8846988Z add.s32 %r525, %r190, 98304; 2026-02-21T09:49:40.8847163Z add.s32 %r78, %r525, %r523; 2026-02-21T09:49:40.8847330Z xor.b32 %r526, %r523, 32; 2026-02-21T09:49:40.8847502Z add.s32 %r79, %r525, %r526; 2026-02-21T09:49:40.8847664Z xor.b32 %r527, %r523, 64; 2026-02-21T09:49:40.8847831Z add.s32 %r80, %r525, %r527; 2026-02-21T09:49:40.8847996Z xor.b32 %r528, %r523, 96; 2026-02-21T09:49:40.8848165Z add.s32 %r81, %r525, %r528; 2026-02-21T09:49:40.8848330Z and.b32 %r529, %r1, 96; 2026-02-21T09:49:40.8848499Z shl.b32 %r530, %r529, 6; 2026-02-21T09:49:40.8848660Z shl.b32 %r531, %r1, 5; 2026-02-21T09:49:40.8848825Z and.b32 %r532, %r531, 96; 2026-02-21T09:49:40.8848996Z and.b32 %r533, %r521, 384; 2026-02-21T09:49:40.8849161Z and.b32 %r535, %r502, 16; 2026-02-21T09:49:40.8849331Z or.b32 %r536, %r530, %r532; 2026-02-21T09:49:40.8849494Z or.b32 %r537, %r533, %r529; 2026-02-21T09:49:40.8849668Z xor.b32 %r538, %r536, %r537; 2026-02-21T09:49:40.8849836Z add.s32 %r539, %r525, %r535; 2026-02-21T09:49:40.8850066Z add.s32 %r821, %r539, %r538; 2026-02-21T09:49:40.8850236Z add.s32 %r826, %r821, 512; 2026-02-21T09:49:40.8850413Z add.s32 %r831, %r821, 1024; 2026-02-21T09:49:40.8850587Z add.s32 %r836, %r821, 1536; 
2026-02-21T09:49:40.8850757Z max.s32 %r1600, %r75, 1; 2026-02-21T09:49:40.8850930Z mov.b32 %r1605, -1; 2026-02-21T09:49:40.8851091Z mov.b32 %r1608, %r1610; 2026-02-21T09:49:40.8851260Z mov.b32 %r1609, %r1610; 2026-02-21T09:49:40.8851453Z bra.uni $L__BB0_18; 2026-02-21T09:49:40.8851677Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.8852033Z .loc 1 40 32 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:40:32 2026-02-21T09:49:40.8852357Z add.s32 %r1106, %r1609, %r42; 2026-02-21T09:49:40.8852538Z add.s32 %r1107, %r1609, %r43; 2026-02-21T09:49:40.8852709Z add.s32 %r1108, %r1609, %r44; 2026-02-21T09:49:40.8852884Z add.s32 %r1109, %r1609, %r45; 2026-02-21T09:49:40.8853048Z add.s32 %r1110, %r1609, %r46; 2026-02-21T09:49:40.8853222Z add.s32 %r1111, %r1609, %r47; 2026-02-21T09:49:40.8853389Z add.s32 %r1112, %r1609, %r48; 2026-02-21T09:49:40.8853560Z add.s32 %r1113, %r1609, %r49; 2026-02-21T09:49:40.8853723Z add.s32 %r1114, %r1609, %r50; 2026-02-21T09:49:40.8853895Z add.s32 %r1115, %r1609, %r51; 2026-02-21T09:49:40.8854060Z add.s32 %r1116, %r1609, %r52; 2026-02-21T09:49:40.8854264Z add.s32 %r1117, %r1609, %r53; 2026-02-21T09:49:40.8854440Z add.s32 %r1118, %r1609, %r54; 2026-02-21T09:49:40.8854611Z add.s32 %r1119, %r1609, %r55; 2026-02-21T09:49:40.8854809Z add.s32 %r1120, %r1609, %r56; 2026-02-21T09:49:40.8854977Z add.s32 %r1121, %r1609, %r57; 2026-02-21T09:49:40.8855159Z add.s32 %r1122, %r1609, %r58; 2026-02-21T09:49:40.8855331Z add.s32 %r1123, %r1609, %r59; 2026-02-21T09:49:40.8855510Z add.s32 %r1124, %r1609, %r60; 2026-02-21T09:49:40.8855683Z add.s32 %r1125, %r1609, %r61; 2026-02-21T09:49:40.8855861Z add.s32 %r1126, %r1609, %r62; 2026-02-21T09:49:40.8856040Z add.s32 %r1127, %r1609, %r63; 2026-02-21T09:49:40.8856215Z add.s32 %r1128, %r1609, %r64; 2026-02-21T09:49:40.8856392Z add.s32 %r1129, %r1609, %r65; 2026-02-21T09:49:40.8856563Z add.s32 %r1130, %r1609, %r66; 2026-02-21T09:49:40.8856743Z add.s32 %r1131, %r1609, %r67; 
2026-02-21T09:49:40.8856916Z add.s32 %r1132, %r1609, %r68; 2026-02-21T09:49:40.8857134Z add.s32 %r1133, %r1609, %r69; 2026-02-21T09:49:40.8857300Z add.s32 %r1134, %r1609, %r70; 2026-02-21T09:49:40.8857473Z add.s32 %r1135, %r1609, %r71; 2026-02-21T09:49:40.8857638Z add.s32 %r1136, %r1609, %r72; 2026-02-21T09:49:40.8857808Z add.s32 %r1137, %r1609, %r73; 2026-02-21T09:49:40.8858097Z .loc 1 42 32 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:42:32 2026-02-21T09:49:40.8858405Z or.b32 %r1138, %r1608, %r74; 2026-02-21T09:49:40.8858695Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8858999Z bar.sync 0, 128; 2026-02-21T09:49:40.8859160Z // begin inline asm 2026-02-21T09:49:40.8859308Z 2026-02-21T09:49:40.8859441Z { 2026-02-21T09:49:40.8859575Z .reg .pred complete; 2026-02-21T09:49:40.8859743Z waitLoop: 2026-02-21T09:49:40.8859962Z mbarrier.try_wait.parity.shared.b64 complete, [%r491], %r1610; 2026-02-21T09:49:40.8860228Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.8860412Z } 2026-02-21T09:49:40.8860485Z 2026-02-21T09:49:40.8860547Z // end inline asm 2026-02-21T09:49:40.8860707Z // begin inline asm 2026-02-21T09:49:40.8861102Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559}, [%r207 + 0]; 2026-02-21T09:49:40.8861537Z // end inline asm 2026-02-21T09:49:40.8861684Z // begin inline asm 2026-02-21T09:49:40.8862076Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576}, [%r207 + 16]; 2026-02-21T09:49:40.8862513Z // end inline asm 2026-02-21T09:49:40.8862696Z // begin inline asm 2026-02-21T09:49:40.8863094Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593}, [%r207 + 32]; 2026-02-21T09:49:40.8863510Z // end inline 
asm 2026-02-21T09:49:40.8863666Z // begin inline asm 2026-02-21T09:49:40.8864061Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610}, [%r207 + 48]; 2026-02-21T09:49:40.8864537Z // end inline asm 2026-02-21T09:49:40.8864723Z // begin inline asm 2026-02-21T09:49:40.8865106Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627}, [%r207 + 64]; 2026-02-21T09:49:40.8865530Z // end inline asm 2026-02-21T09:49:40.8865677Z // begin inline asm 2026-02-21T09:49:40.8866131Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644}, [%r207 + 80]; 2026-02-21T09:49:40.8866562Z // end inline asm 2026-02-21T09:49:40.8866707Z // begin inline asm 2026-02-21T09:49:40.8867160Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661}, [%r207 + 96]; 2026-02-21T09:49:40.8867673Z // end inline asm 2026-02-21T09:49:40.8867833Z // begin inline asm 2026-02-21T09:49:40.8868229Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678}, [%r207 + 112]; 2026-02-21T09:49:40.8868678Z // end inline asm 2026-02-21T09:49:40.8868833Z // begin inline asm 2026-02-21T09:49:40.8869235Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, %r695}, [%r207 + 128]; 2026-02-21T09:49:40.8869682Z // end inline asm 2026-02-21T09:49:40.8869830Z // begin inline asm 2026-02-21T09:49:40.8870240Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712}, [%r207 + 
144]; 2026-02-21T09:49:40.8870684Z // end inline asm 2026-02-21T09:49:40.8870869Z // begin inline asm 2026-02-21T09:49:40.8871279Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729}, [%r207 + 160]; 2026-02-21T09:49:40.8871721Z // end inline asm 2026-02-21T09:49:40.8871875Z // begin inline asm 2026-02-21T09:49:40.8872275Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746}, [%r207 + 176]; 2026-02-21T09:49:40.8872721Z // end inline asm 2026-02-21T09:49:40.8872876Z // begin inline asm 2026-02-21T09:49:40.8873279Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763}, [%r207 + 192]; 2026-02-21T09:49:40.8873731Z // end inline asm 2026-02-21T09:49:40.8873879Z // begin inline asm 2026-02-21T09:49:40.8874288Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780}, [%r207 + 208]; 2026-02-21T09:49:40.8874769Z // end inline asm 2026-02-21T09:49:40.8874926Z // begin inline asm 2026-02-21T09:49:40.8875245Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797}, [%r207 + 224]; 2026-02-21T09:49:40.8875313Z // end inline asm 2026-02-21T09:49:40.8875375Z // begin inline asm 2026-02-21T09:49:40.8875699Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814}, [%r207 + 240]; 2026-02-21T09:49:40.8875809Z // end inline asm 2026-02-21T09:49:40.8875873Z // begin inline asm 2026-02-21T09:49:40.8875957Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:49:40.8876018Z // end inline asm 2026-02-21T09:49:40.8876091Z 
bar.sync 0, 128; 2026-02-21T09:49:40.8876154Z // begin inline asm 2026-02-21T09:49:40.8876262Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r1570]; 2026-02-21T09:49:40.8876334Z // end inline asm 2026-02-21T09:49:40.8876433Z cvt.u64.u32 %rd97, %r544; 2026-02-21T09:49:40.8876501Z cvt.u64.u32 %rd98, %r545; 2026-02-21T09:49:40.8876567Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:49:40.8876646Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:49:40.8876842Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8876913Z mov.b64 {%r1140, %r1141}, %rd100; 2026-02-21T09:49:40.8877004Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 2026-02-21T09:49:40.8877192Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8877263Z cvt.u64.u32 %rd101, %r546; 2026-02-21T09:49:40.8877337Z cvt.u64.u32 %rd102, %r547; 2026-02-21T09:49:40.8877405Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:49:40.8877473Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:49:40.8877662Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8877772Z mov.b64 {%r1143, %r1144}, %rd104; 2026-02-21T09:49:40.8877852Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T09:49:40.8878061Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8878139Z cvt.u64.u32 %rd105, %r548; 2026-02-21T09:49:40.8878205Z cvt.u64.u32 %rd106, %r549; 2026-02-21T09:49:40.8878271Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:49:40.8878348Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:49:40.8878538Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8878607Z mov.b64 {%r1146, %r1147}, %rd108; 2026-02-21T09:49:40.8878684Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T09:49:40.8878906Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8879002Z cvt.u64.u32 %rd109, %r550; 
2026-02-21T09:49:40.8879068Z cvt.u64.u32 %rd110, %r551; 2026-02-21T09:49:40.8879146Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:49:40.8879213Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:49:40.8879411Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8879484Z mov.b64 {%r1149, %r1150}, %rd112; 2026-02-21T09:49:40.8879559Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T09:49:40.8879739Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8879802Z cvt.u64.u32 %rd113, %r552; 2026-02-21T09:49:40.8879877Z cvt.u64.u32 %rd114, %r553; 2026-02-21T09:49:40.8879941Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:49:40.8880005Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:49:40.8880197Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8880262Z mov.b64 {%r1152, %r1153}, %rd116; 2026-02-21T09:49:40.8880339Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T09:49:40.8880533Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8880597Z cvt.u64.u32 %rd117, %r554; 2026-02-21T09:49:40.8880659Z cvt.u64.u32 %rd118, %r555; 2026-02-21T09:49:40.8880722Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:49:40.8880795Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:49:40.8880977Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8881040Z mov.b64 {%r1155, %r1156}, %rd120; 2026-02-21T09:49:40.8881120Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T09:49:40.8881328Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8881392Z cvt.u64.u32 %rd121, %r556; 2026-02-21T09:49:40.8881461Z cvt.u64.u32 %rd122, %r557; 2026-02-21T09:49:40.8881525Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:49:40.8881591Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:49:40.8881775Z .loc 1 55 27 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8881870Z mov.b64 {%r1158, %r1159}, %rd124; 2026-02-21T09:49:40.8881943Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T09:49:40.8882124Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8882194Z cvt.u64.u32 %rd125, %r558; 2026-02-21T09:49:40.8882256Z cvt.u64.u32 %rd126, %r559; 2026-02-21T09:49:40.8882318Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:49:40.8882388Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:49:40.8882572Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8882635Z mov.b64 {%r1161, %r1162}, %rd128; 2026-02-21T09:49:40.8882707Z cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T09:49:40.8882896Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8883012Z cvt.u64.u32 %rd129, %r561; 2026-02-21T09:49:40.8883077Z cvt.u64.u32 %rd130, %r562; 2026-02-21T09:49:40.8883146Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:49:40.8883211Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:49:40.8883394Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8883466Z mov.b64 {%r1164, %r1165}, %rd132; 2026-02-21T09:49:40.8883540Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T09:49:40.8883720Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8883785Z cvt.u64.u32 %rd133, %r563; 2026-02-21T09:49:40.8883856Z cvt.u64.u32 %rd134, %r564; 2026-02-21T09:49:40.8883920Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:49:40.8883984Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:49:40.8884210Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8884279Z mov.b64 {%r1167, %r1168}, %rd136; 2026-02-21T09:49:40.8884355Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T09:49:40.8884548Z .loc 1 53 52 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8884613Z cvt.u64.u32 %rd137, %r565; 2026-02-21T09:49:40.8884705Z cvt.u64.u32 %rd138, %r566; 2026-02-21T09:49:40.8884772Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:49:40.8884845Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:49:40.8885036Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8885105Z mov.b64 {%r1170, %r1171}, %rd140; 2026-02-21T09:49:40.8885188Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T09:49:40.8885373Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8885438Z cvt.u64.u32 %rd141, %r567; 2026-02-21T09:49:40.8885510Z cvt.u64.u32 %rd142, %r568; 2026-02-21T09:49:40.8885577Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:49:40.8885646Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:49:40.8885834Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8885909Z mov.b64 {%r1173, %r1174}, %rd144; 2026-02-21T09:49:40.8885987Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T09:49:40.8886181Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8886259Z cvt.u64.u32 %rd145, %r569; 2026-02-21T09:49:40.8886322Z cvt.u64.u32 %rd146, %r570; 2026-02-21T09:49:40.8886422Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:49:40.8886501Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:49:40.8886686Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8886755Z mov.b64 {%r1176, %r1177}, %rd148; 2026-02-21T09:49:40.8886831Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T09:49:40.8887022Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8887117Z cvt.u64.u32 %rd149, %r571; 2026-02-21T09:49:40.8887181Z cvt.u64.u32 %rd150, %r572; 2026-02-21T09:49:40.8887251Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:49:40.8887314Z or.b64 
%rd152, %rd149, %rd151; 2026-02-21T09:49:40.8887491Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8887565Z mov.b64 {%r1179, %r1180}, %rd152; 2026-02-21T09:49:40.8887637Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T09:49:40.8887822Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8887884Z cvt.u64.u32 %rd153, %r573; 2026-02-21T09:49:40.8887954Z cvt.u64.u32 %rd154, %r574; 2026-02-21T09:49:40.8888017Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:49:40.8888082Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:49:40.8888298Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8888365Z mov.b64 {%r1182, %r1183}, %rd156; 2026-02-21T09:49:40.8888437Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T09:49:40.8888626Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8888689Z cvt.u64.u32 %rd157, %r575; 2026-02-21T09:49:40.8888751Z cvt.u64.u32 %rd158, %r576; 2026-02-21T09:49:40.8888813Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:49:40.8888884Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:49:40.8889069Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8889131Z mov.b64 {%r1185, %r1186}, %rd160; 2026-02-21T09:49:40.8889211Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T09:49:40.8889424Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8889491Z cvt.u64.u32 %rd161, %r578; 2026-02-21T09:49:40.8889562Z cvt.u64.u32 %rd162, %r579; 2026-02-21T09:49:40.8889624Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:49:40.8889687Z or.b64 %rd164, %rd161, %rd163; 2026-02-21T09:49:40.8889869Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8889943Z mov.b64 {%r1188, %r1189}, %rd164; 2026-02-21T09:49:40.8890015Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 
2026-02-21T09:49:40.8890195Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8890270Z cvt.u64.u32 %rd165, %r580; 2026-02-21T09:49:40.8890332Z cvt.u64.u32 %rd166, %r581; 2026-02-21T09:49:40.8890844Z [197s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:49:40.8892020Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:49:40.8892163Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:49:40.8892226Z `ptxas` stderr: 2026-02-21T09:49:40.8892615Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 207 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:49:40.8892743Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:49:40.8892748Z 2026-02-21T09:49:40.8893183Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpxn4_2s8h.ptx -o /tmp/tmpxn4_2s8h.ptx.o 2026-02-21T09:49:40.8893197Z 2026-02-21T09:49:40.8893364Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:49:40.8893430Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:49:40.8893495Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:49:40.8893688Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8893753Z mov.b64 {%r1191, %r1192}, %rd168; 2026-02-21T09:49:40.8893828Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T09:49:40.8894020Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8894086Z cvt.u64.u32 %rd169, %r582; 2026-02-21T09:49:40.8894149Z cvt.u64.u32 %rd170, %r583; 2026-02-21T09:49:40.8894219Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:49:40.8894283Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:49:40.8894468Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8894558Z mov.b64 {%r1194, %r1195}, %rd172; 2026-02-21T09:49:40.8894645Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T09:49:40.8894857Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8894922Z cvt.u64.u32 %rd173, %r584; 2026-02-21T09:49:40.8894992Z cvt.u64.u32 %rd174, %r585; 2026-02-21T09:49:40.8895057Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:49:40.8895122Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:49:40.8895314Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8895381Z mov.b64 {%r1197, %r1198}, %rd176; 2026-02-21T09:49:40.8895455Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 2026-02-21T09:49:40.8895636Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8895709Z cvt.u64.u32 %rd177, %r586; 2026-02-21T09:49:40.8895806Z cvt.u64.u32 %rd178, %r587; 2026-02-21T09:49:40.8895874Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:49:40.8895950Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:49:40.8896147Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8896211Z mov.b64 {%r1200, 
%r1201}, %rd180; 2026-02-21T09:49:40.8896292Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T09:49:40.8896472Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8896535Z cvt.u64.u32 %rd181, %r588; 2026-02-21T09:49:40.8896598Z cvt.u64.u32 %rd182, %r589; 2026-02-21T09:49:40.8896673Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:49:40.8896737Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:49:40.8896926Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8897000Z mov.b64 {%r1203, %r1204}, %rd184; 2026-02-21T09:49:40.8897074Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T09:49:40.8897262Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8897338Z cvt.u64.u32 %rd185, %r590; 2026-02-21T09:49:40.8897401Z cvt.u64.u32 %rd186, %r591; 2026-02-21T09:49:40.8897466Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:49:40.8897533Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:49:40.8897749Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8897817Z mov.b64 {%r1206, %r1207}, %rd188; 2026-02-21T09:49:40.8897892Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T09:49:40.8898123Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8898191Z cvt.u64.u32 %rd189, %r592; 2026-02-21T09:49:40.8898258Z cvt.u64.u32 %rd190, %r593; 2026-02-21T09:49:40.8898332Z shl.b64 %rd191, %rd190, 32; 2026-02-21T09:49:40.8898399Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:49:40.8898590Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8898686Z mov.b64 {%r1209, %r1210}, %rd192; 2026-02-21T09:49:40.8898767Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T09:49:40.8898953Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8899017Z cvt.u64.u32 %rd193, %r595; 
2026-02-21T09:49:40.8899090Z cvt.u64.u32 %rd194, %r596; 2026-02-21T09:49:40.8899155Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:49:40.8899219Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:49:40.8899417Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8899483Z mov.b64 {%r1212, %r1213}, %rd196; 2026-02-21T09:49:40.8899556Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T09:49:40.8899743Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8899847Z cvt.u64.u32 %rd197, %r597; 2026-02-21T09:49:40.8899915Z cvt.u64.u32 %rd198, %r598; 2026-02-21T09:49:40.8899980Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:49:40.8900053Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:49:40.8900249Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8900314Z mov.b64 {%r1215, %r1216}, %rd200; 2026-02-21T09:49:40.8900395Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T09:49:40.8900595Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8900664Z cvt.u64.u32 %rd201, %r599; 2026-02-21T09:49:40.8900728Z cvt.u64.u32 %rd202, %r600; 2026-02-21T09:49:40.8900800Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:49:40.8900864Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:49:40.8901085Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8901164Z mov.b64 {%r1218, %r1219}, %rd204; 2026-02-21T09:49:40.8901239Z cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T09:49:40.8901438Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8901510Z cvt.u64.u32 %rd205, %r601; 2026-02-21T09:49:40.8901575Z cvt.u64.u32 %rd206, %r602; 2026-02-21T09:49:40.8901640Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:49:40.8901704Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:49:40.8901908Z .loc 1 55 27 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8901976Z mov.b64 {%r1221, %r1222}, %rd208; 2026-02-21T09:49:40.8902051Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T09:49:40.8902256Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8902321Z cvt.u64.u32 %rd209, %r603; 2026-02-21T09:49:40.8902385Z cvt.u64.u32 %rd210, %r604; 2026-02-21T09:49:40.8902459Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:49:40.8902526Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:49:40.8902727Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8902793Z mov.b64 {%r1224, %r1225}, %rd212; 2026-02-21T09:49:40.8902872Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T09:49:40.8903066Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8903131Z cvt.u64.u32 %rd213, %r605; 2026-02-21T09:49:40.8903202Z cvt.u64.u32 %rd214, %r606; 2026-02-21T09:49:40.8903295Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:49:40.8903360Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:49:40.8903562Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8903626Z mov.b64 {%r1227, %r1228}, %rd216; 2026-02-21T09:49:40.8903701Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T09:49:40.8903882Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8903985Z cvt.u64.u32 %rd217, %r607; 2026-02-21T09:49:40.8904048Z cvt.u64.u32 %rd218, %r608; 2026-02-21T09:49:40.8904111Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:49:40.8904183Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:49:40.8904380Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8904445Z mov.b64 {%r1230, %r1231}, %rd220; 2026-02-21T09:49:40.8904526Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T09:49:40.8904734Z .loc 1 53 52 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8904800Z cvt.u64.u32 %rd221, %r609; 2026-02-21T09:49:40.8904864Z cvt.u64.u32 %rd222, %r610; 2026-02-21T09:49:40.8904936Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:49:40.8905003Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:49:40.8905243Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8905319Z mov.b64 {%r1233, %r1234}, %rd224; 2026-02-21T09:49:40.8905392Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 2026-02-21T09:49:40.8905583Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8905653Z cvt.u64.u32 %rd225, %r612; 2026-02-21T09:49:40.8905716Z cvt.u64.u32 %rd226, %r613; 2026-02-21T09:49:40.8905781Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:49:40.8905845Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:49:40.8906054Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8906120Z mov.b64 {%r1236, %r1237}, %rd228; 2026-02-21T09:49:40.8906195Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T09:49:40.8906411Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8906481Z cvt.u64.u32 %rd229, %r614; 2026-02-21T09:49:40.8906550Z cvt.u64.u32 %rd230, %r615; 2026-02-21T09:49:40.8906623Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:49:40.8906689Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:49:40.8906877Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8906946Z mov.b64 {%r1239, %r1240}, %rd232; 2026-02-21T09:49:40.8907031Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T09:49:40.8907233Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8907300Z cvt.u64.u32 %rd233, %r616; 2026-02-21T09:49:40.8907371Z cvt.u64.u32 %rd234, %r617; 2026-02-21T09:49:40.8907435Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:49:40.8907500Z or.b64 
%rd236, %rd233, %rd235; 2026-02-21T09:49:40.8907708Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8907776Z mov.b64 {%r1242, %r1243}, %rd236; 2026-02-21T09:49:40.8907855Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 2026-02-21T09:49:40.8908040Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8908114Z cvt.u64.u32 %rd237, %r618; 2026-02-21T09:49:40.8908186Z cvt.u64.u32 %rd238, %r619; 2026-02-21T09:49:40.8908250Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:49:40.8908324Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:49:40.8908520Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8908632Z mov.b64 {%r1245, %r1246}, %rd240; 2026-02-21T09:49:40.8908724Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T09:49:40.8908908Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8908972Z cvt.u64.u32 %rd241, %r620; 2026-02-21T09:49:40.8909037Z cvt.u64.u32 %rd242, %r621; 2026-02-21T09:49:40.8909107Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:49:40.8909208Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:49:40.8909392Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8909464Z mov.b64 {%r1248, %r1249}, %rd244; 2026-02-21T09:49:40.8909537Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T09:49:40.8909717Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8909786Z cvt.u64.u32 %rd245, %r622; 2026-02-21T09:49:40.8909849Z cvt.u64.u32 %rd246, %r623; 2026-02-21T09:49:40.8909913Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:49:40.8909976Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:49:40.8910165Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8910230Z mov.b64 {%r1251, %r1252}, %rd248; 2026-02-21T09:49:40.8910305Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 
2026-02-21T09:49:40.8910522Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8910589Z cvt.u64.u32 %rd249, %r624; 2026-02-21T09:49:40.8910652Z cvt.u64.u32 %rd250, %r625; 2026-02-21T09:49:40.8910720Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:49:40.8910783Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:49:40.8910958Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8911022Z mov.b64 {%r1254, %r1255}, %rd252; 2026-02-21T09:49:40.8911104Z cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T09:49:40.8911282Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8911344Z cvt.u64.u32 %rd253, %r626; 2026-02-21T09:49:40.8911413Z cvt.u64.u32 %rd254, %r627; 2026-02-21T09:49:40.8911476Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:49:40.8911567Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:49:40.8911761Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8911829Z mov.b64 {%r1257, %r1258}, %rd256; 2026-02-21T09:49:40.8911901Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T09:49:40.8912084Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8912155Z cvt.u64.u32 %rd257, %r629; 2026-02-21T09:49:40.8912217Z cvt.u64.u32 %rd258, %r630; 2026-02-21T09:49:40.8912279Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:49:40.8912351Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:49:40.8912531Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8912597Z mov.b64 {%r1260, %r1261}, %rd260; 2026-02-21T09:49:40.8912675Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T09:49:40.8912861Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8912923Z cvt.u64.u32 %rd261, %r631; 2026-02-21T09:49:40.8912988Z cvt.u64.u32 %rd262, %r632; 2026-02-21T09:49:40.8913059Z shl.b64 %rd263, %rd262, 
32; 2026-02-21T09:49:40.8913123Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:49:40.8913306Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8913376Z mov.b64 {%r1263, %r1264}, %rd264; 2026-02-21T09:49:40.8913447Z cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T09:49:40.8913624Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8913723Z cvt.u64.u32 %rd265, %r633; 2026-02-21T09:49:40.8913785Z cvt.u64.u32 %rd266, %r634; 2026-02-21T09:49:40.8913849Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:49:40.8913913Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:49:40.8914105Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8914170Z mov.b64 {%r1266, %r1267}, %rd268; 2026-02-21T09:49:40.8914269Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T09:49:40.8914456Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8914519Z cvt.u64.u32 %rd269, %r635; 2026-02-21T09:49:40.8914580Z cvt.u64.u32 %rd270, %r636; 2026-02-21T09:49:40.8914648Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:49:40.8914774Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:49:40.8914960Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8915026Z mov.b64 {%r1269, %r1270}, %rd272; 2026-02-21T09:49:40.8915108Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T09:49:40.8915292Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8915355Z cvt.u64.u32 %rd273, %r637; 2026-02-21T09:49:40.8915426Z cvt.u64.u32 %rd274, %r638; 2026-02-21T09:49:40.8915491Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:49:40.8915589Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:49:40.8915776Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8915839Z mov.b64 {%r1272, %r1273}, %rd276; 2026-02-21T09:49:40.8915911Z 
cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T09:49:40.8916091Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8916162Z cvt.u64.u32 %rd277, %r639; 2026-02-21T09:49:40.8916223Z cvt.u64.u32 %rd278, %r640; 2026-02-21T09:49:40.8916286Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:49:40.8916361Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:49:40.8916542Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8916607Z mov.b64 {%r1275, %r1276}, %rd280; 2026-02-21T09:49:40.8916713Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T09:49:40.8916896Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8916961Z cvt.u64.u32 %rd281, %r641; 2026-02-21T09:49:40.8917022Z cvt.u64.u32 %rd282, %r642; 2026-02-21T09:49:40.8917093Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:49:40.8917156Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:49:40.8917337Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8917411Z mov.b64 {%r1278, %r1279}, %rd284; 2026-02-21T09:49:40.8917484Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T09:49:40.8917662Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8917735Z cvt.u64.u32 %rd285, %r643; 2026-02-21T09:49:40.8917798Z cvt.u64.u32 %rd286, %r644; 2026-02-21T09:49:40.8917862Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:49:40.8917927Z or.b64 %rd288, %rd285, %rd287; 2026-02-21T09:49:40.8918118Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8918186Z mov.b64 {%r1281, %r1282}, %rd288; 2026-02-21T09:49:40.8918258Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T09:49:40.8918445Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8918507Z cvt.u64.u32 %rd289, %r646; 2026-02-21T09:49:40.8918569Z cvt.u64.u32 %rd290, %r647; 
2026-02-21T09:49:40.8918638Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:49:40.8918700Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:49:40.8918880Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8918979Z mov.b64 {%r1284, %r1285}, %rd292; 2026-02-21T09:49:40.8919058Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T09:49:40.8919241Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8919304Z cvt.u64.u32 %rd293, %r648; 2026-02-21T09:49:40.8919406Z cvt.u64.u32 %rd294, %r649; 2026-02-21T09:49:40.8919470Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:49:40.8919533Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:49:40.8919724Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8919789Z mov.b64 {%r1287, %r1288}, %rd296; 2026-02-21T09:49:40.8919860Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T09:49:40.8920044Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8920113Z cvt.u64.u32 %rd297, %r650; 2026-02-21T09:49:40.8920179Z cvt.u64.u32 %rd298, %r651; 2026-02-21T09:49:40.8920242Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:49:40.8920314Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:49:40.8920497Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8920563Z mov.b64 {%r1290, %r1291}, %rd300; 2026-02-21T09:49:40.8920676Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 2026-02-21T09:49:40.8920864Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8920927Z cvt.u64.u32 %rd301, %r652; 2026-02-21T09:49:40.8920989Z cvt.u64.u32 %rd302, %r653; 2026-02-21T09:49:40.8921059Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:49:40.8921123Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:49:40.8921305Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8921378Z mov.b64 {%r1293, 
%r1294}, %rd304; 2026-02-21T09:49:40.8921454Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T09:49:40.8921636Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8921708Z cvt.u64.u32 %rd305, %r654; 2026-02-21T09:49:40.8921771Z cvt.u64.u32 %rd306, %r655; 2026-02-21T09:49:40.8921860Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:49:40.8921927Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:49:40.8922118Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8922182Z mov.b64 {%r1296, %r1297}, %rd308; 2026-02-21T09:49:40.8922259Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T09:49:40.8922488Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8922560Z cvt.u64.u32 %rd309, %r656; 2026-02-21T09:49:40.8922633Z cvt.u64.u32 %rd310, %r657; 2026-02-21T09:49:40.8922779Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:49:40.8922855Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:49:40.8923076Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8923155Z mov.b64 {%r1299, %r1300}, %rd312; 2026-02-21T09:49:40.8923237Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T09:49:40.8923427Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8923494Z cvt.u64.u32 %rd313, %r658; 2026-02-21T09:49:40.8923567Z cvt.u64.u32 %rd314, %r659; 2026-02-21T09:49:40.8923630Z shl.b64 %rd315, %rd314, 32; 2026-02-21T09:49:40.8923693Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:49:40.8923881Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8923962Z mov.b64 {%r1302, %r1303}, %rd316; 2026-02-21T09:49:40.8924033Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T09:49:40.8924213Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8924310Z cvt.u64.u32 %rd317, %r660; 
2026-02-21T09:49:40.8924373Z cvt.u64.u32 %rd318, %r661; 2026-02-21T09:49:40.8924436Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:49:40.8924508Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:49:40.8924760Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8924872Z mov.b64 {%r1305, %r1306}, %rd320; 2026-02-21T09:49:40.8924950Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T09:49:40.8925127Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8925190Z cvt.u64.u32 %rd321, %r663; 2026-02-21T09:49:40.8925251Z cvt.u64.u32 %rd322, %r664; 2026-02-21T09:49:40.8925322Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:49:40.8925385Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:49:40.8925568Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8925642Z mov.b64 {%r1308, %r1309}, %rd324; 2026-02-21T09:49:40.8925714Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T09:49:40.8925891Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8925961Z cvt.u64.u32 %rd325, %r665; 2026-02-21T09:49:40.8926053Z cvt.u64.u32 %rd326, %r666; 2026-02-21T09:49:40.8926120Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:49:40.8926184Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:49:40.8926372Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8926436Z mov.b64 {%r1311, %r1312}, %rd328; 2026-02-21T09:49:40.8926509Z cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T09:49:40.8926700Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8926764Z cvt.u64.u32 %rd329, %r667; 2026-02-21T09:49:40.8926829Z cvt.u64.u32 %rd330, %r668; 2026-02-21T09:49:40.8926900Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:49:40.8926965Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:49:40.8927151Z .loc 1 55 27 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8927245Z mov.b64 {%r1314, %r1315}, %rd332; 2026-02-21T09:49:40.8927328Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T09:49:40.8927515Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8927579Z cvt.u64.u32 %rd333, %r669; 2026-02-21T09:49:40.8927652Z cvt.u64.u32 %rd334, %r670; 2026-02-21T09:49:40.8927716Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:49:40.8927781Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:49:40.8927973Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8928040Z mov.b64 {%r1317, %r1318}, %rd336; 2026-02-21T09:49:40.8928114Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T09:49:40.8928293Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8928363Z cvt.u64.u32 %rd337, %r671; 2026-02-21T09:49:40.8928425Z cvt.u64.u32 %rd338, %r672; 2026-02-21T09:49:40.8928491Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:49:40.8928564Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:49:40.8928747Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8928811Z mov.b64 {%r1320, %r1321}, %rd340; 2026-02-21T09:49:40.8928890Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T09:49:40.8929073Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8929136Z cvt.u64.u32 %rd341, %r673; 2026-02-21T09:49:40.8929198Z cvt.u64.u32 %rd342, %r674; 2026-02-21T09:49:40.8929267Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:49:40.8929359Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:49:40.8929547Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8929619Z mov.b64 {%r1323, %r1324}, %rd344; 2026-02-21T09:49:40.8929691Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T09:49:40.8929876Z .loc 1 53 52 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8929974Z cvt.u64.u32 %rd345, %r675; 2026-02-21T09:49:40.8930037Z cvt.u64.u32 %rd346, %r676; 2026-02-21T09:49:40.8930102Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:49:40.8930166Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:49:40.8930355Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8930419Z mov.b64 {%r1326, %r1327}, %rd348; 2026-02-21T09:49:40.8930491Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 2026-02-21T09:49:40.8930682Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8930747Z cvt.u64.u32 %rd349, %r677; 2026-02-21T09:49:40.8930811Z cvt.u64.u32 %rd350, %r678; 2026-02-21T09:49:40.8930881Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:49:40.8930946Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:49:40.8931154Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8931222Z mov.b64 {%r1329, %r1330}, %rd352; 2026-02-21T09:49:40.8931302Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T09:49:40.8931486Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8931548Z cvt.u64.u32 %rd353, %r680; 2026-02-21T09:49:40.8931618Z cvt.u64.u32 %rd354, %r681; 2026-02-21T09:49:40.8931683Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:49:40.8931746Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:49:40.8931943Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8932008Z mov.b64 {%r1332, %r1333}, %rd356; 2026-02-21T09:49:40.8932081Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T09:49:40.8932264Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8932359Z cvt.u64.u32 %rd357, %r682; 2026-02-21T09:49:40.8932426Z cvt.u64.u32 %rd358, %r683; 2026-02-21T09:49:40.8932490Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:49:40.8932560Z or.b64 
%rd360, %rd357, %rd359; 2026-02-21T09:49:40.8932743Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8932806Z mov.b64 {%r1335, %r1336}, %rd360; 2026-02-21T09:49:40.8932885Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 2026-02-21T09:49:40.8933069Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8933132Z cvt.u64.u32 %rd361, %r684; 2026-02-21T09:49:40.8933195Z cvt.u64.u32 %rd362, %r685; 2026-02-21T09:49:40.8933264Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:49:40.8933326Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:49:40.8933508Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8933579Z mov.b64 {%r1338, %r1339}, %rd364; 2026-02-21T09:49:40.8933653Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T09:49:40.8933842Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8933912Z cvt.u64.u32 %rd365, %r686; 2026-02-21T09:49:40.8933975Z cvt.u64.u32 %rd366, %r687; 2026-02-21T09:49:40.8934036Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:49:40.8934097Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:49:40.8934286Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8934350Z mov.b64 {%r1341, %r1342}, %rd368; 2026-02-21T09:49:40.8934453Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T09:49:40.8934644Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8934742Z cvt.u64.u32 %rd369, %r688; 2026-02-21T09:49:40.8934806Z cvt.u64.u32 %rd370, %r689; 2026-02-21T09:49:40.8934877Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:49:40.8934940Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:49:40.8935154Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8935218Z mov.b64 {%r1344, %r1345}, %rd372; 2026-02-21T09:49:40.8935298Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 
2026-02-21T09:49:40.8935482Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8935546Z cvt.u64.u32 %rd373, %r690; 2026-02-21T09:49:40.8935615Z cvt.u64.u32 %rd374, %r691; 2026-02-21T09:49:40.8935678Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:49:40.8935747Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:49:40.8935937Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8936006Z mov.b64 {%r1347, %r1348}, %rd376; 2026-02-21T09:49:40.8936077Z cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T09:49:40.8936289Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8936364Z cvt.u64.u32 %rd377, %r692; 2026-02-21T09:49:40.8936428Z cvt.u64.u32 %rd378, %r693; 2026-02-21T09:49:40.8936492Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:49:40.8936564Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:49:40.8936752Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8936818Z mov.b64 {%r1350, %r1351}, %rd380; 2026-02-21T09:49:40.8936897Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T09:49:40.8937079Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8937145Z cvt.u64.u32 %rd381, %r694; 2026-02-21T09:49:40.8937207Z cvt.u64.u32 %rd382, %r695; 2026-02-21T09:49:40.8937281Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:49:40.8937347Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:49:40.8937562Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8937642Z mov.b64 {%r1353, %r1354}, %rd384; 2026-02-21T09:49:40.8937716Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T09:49:40.8937903Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8937978Z cvt.u64.u32 %rd385, %r697; 2026-02-21T09:49:40.8938045Z cvt.u64.u32 %rd386, %r698; 2026-02-21T09:49:40.8938109Z shl.b64 %rd387, %rd386, 
32; 2026-02-21T09:49:40.8938172Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:49:40.8938363Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8938429Z mov.b64 {%r1356, %r1357}, %rd388; 2026-02-21T09:49:40.8938503Z cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T09:49:40.8938694Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8938759Z cvt.u64.u32 %rd389, %r699; 2026-02-21T09:49:40.8938822Z cvt.u64.u32 %rd390, %r700; 2026-02-21T09:49:40.8938896Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:49:40.8938960Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:49:40.8939145Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8939210Z mov.b64 {%r1359, %r1360}, %rd392; 2026-02-21T09:49:40.8939290Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T09:49:40.8939473Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8939536Z cvt.u64.u32 %rd393, %r701; 2026-02-21T09:49:40.8939649Z cvt.u64.u32 %rd394, %r702; 2026-02-21T09:49:40.8939713Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:49:40.8939777Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:49:40.8939968Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8940035Z mov.b64 {%r1362, %r1363}, %rd396; 2026-02-21T09:49:40.8940111Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T09:49:40.8940323Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8940396Z cvt.u64.u32 %rd397, %r703; 2026-02-21T09:49:40.8940460Z cvt.u64.u32 %rd398, %r704; 2026-02-21T09:49:40.8940524Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:49:40.8940596Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:49:40.8940777Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8940842Z mov.b64 {%r1365, %r1366}, %rd400; 2026-02-21T09:49:40.8940921Z 
cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T09:49:40.8941107Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8941170Z cvt.u64.u32 %rd401, %r705; 2026-02-21T09:49:40.8941234Z cvt.u64.u32 %rd402, %r706; 2026-02-21T09:49:40.8941305Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:49:40.8941371Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:49:40.8941581Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8941655Z mov.b64 {%r1368, %r1369}, %rd404; 2026-02-21T09:49:40.8941727Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T09:49:40.8941906Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8941978Z cvt.u64.u32 %rd405, %r707; 2026-02-21T09:49:40.8942042Z cvt.u64.u32 %rd406, %r708; 2026-02-21T09:49:40.8942105Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:49:40.8942170Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:49:40.8942362Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8942427Z mov.b64 {%r1371, %r1372}, %rd408; 2026-02-21T09:49:40.8942499Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T09:49:40.8942715Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8942782Z cvt.u64.u32 %rd409, %r709; 2026-02-21T09:49:40.8942844Z cvt.u64.u32 %rd410, %r710; 2026-02-21T09:49:40.8942914Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:49:40.8942977Z or.b64 %rd412, %rd409, %rd411; 2026-02-21T09:49:40.8943158Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8943223Z mov.b64 {%r1374, %r1375}, %rd412; 2026-02-21T09:49:40.8943302Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T09:49:40.8943479Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8943543Z cvt.u64.u32 %rd413, %r711; 2026-02-21T09:49:40.8943612Z cvt.u64.u32 %rd414, %r712; 
2026-02-21T09:49:40.8943673Z shl.b64 %rd415, %rd414, 32; 2026-02-21T09:49:40.8943737Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:49:40.8943925Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8943992Z mov.b64 {%r1377, %r1378}, %rd416; 2026-02-21T09:49:40.8944066Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T09:49:40.8944242Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8944311Z cvt.u64.u32 %rd417, %r714; 2026-02-21T09:49:40.8944373Z cvt.u64.u32 %rd418, %r715; 2026-02-21T09:49:40.8944436Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:49:40.8944507Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:49:40.8944722Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8944834Z mov.b64 {%r1380, %r1381}, %rd420; 2026-02-21T09:49:40.8944914Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T09:49:40.8945097Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8945161Z cvt.u64.u32 %rd421, %r716; 2026-02-21T09:49:40.8945223Z cvt.u64.u32 %rd422, %r717; 2026-02-21T09:49:40.8945295Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:49:40.8945386Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:49:40.8945566Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8945637Z mov.b64 {%r1383, %r1384}, %rd424; 2026-02-21T09:49:40.8945710Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 2026-02-21T09:49:40.8945894Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8945964Z cvt.u64.u32 %rd425, %r718; 2026-02-21T09:49:40.8946027Z cvt.u64.u32 %rd426, %r719; 2026-02-21T09:49:40.8946092Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:49:40.8946155Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:49:40.8946346Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8946413Z mov.b64 {%r1386, 
%r1387}, %rd428; 2026-02-21T09:49:40.8946490Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T09:49:40.8946720Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8946785Z cvt.u64.u32 %rd429, %r720; 2026-02-21T09:49:40.8946848Z cvt.u64.u32 %rd430, %r721; 2026-02-21T09:49:40.8946920Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:49:40.8946984Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:49:40.8947163Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8947228Z mov.b64 {%r1389, %r1390}, %rd432; 2026-02-21T09:49:40.8947310Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T09:49:40.8947494Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8947558Z cvt.u64.u32 %rd433, %r722; 2026-02-21T09:49:40.8947639Z cvt.u64.u32 %rd434, %r723; 2026-02-21T09:49:40.8947702Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:49:40.8947802Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T09:49:40.8947998Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8948066Z mov.b64 {%r1392, %r1393}, %rd436; 2026-02-21T09:49:40.8948138Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T09:49:40.8948319Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8948391Z cvt.u64.u32 %rd437, %r724; 2026-02-21T09:49:40.8948454Z cvt.u64.u32 %rd438, %r725; 2026-02-21T09:49:40.8948517Z shl.b64 %rd439, %rd438, 32; 2026-02-21T09:49:40.8948588Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:49:40.8948775Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8948837Z mov.b64 {%r1395, %r1396}, %rd440; 2026-02-21T09:49:40.8948919Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T09:49:40.8949102Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8949167Z cvt.u64.u32 %rd441, %r726; 
2026-02-21T09:49:40.8949232Z cvt.u64.u32 %rd442, %r727; 2026-02-21T09:49:40.8949302Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:49:40.8949367Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:49:40.8949549Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8949622Z mov.b64 {%r1398, %r1399}, %rd444; 2026-02-21T09:49:40.8949694Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T09:49:40.8949926Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8950027Z cvt.u64.u32 %rd445, %r728; 2026-02-21T09:49:40.8950091Z cvt.u64.u32 %rd446, %r729; 2026-02-21T09:49:40.8950156Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:49:40.8950223Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:49:40.8950420Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8950488Z mov.b64 {%r1401, %r1402}, %rd448; 2026-02-21T09:49:40.8950591Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T09:49:40.8950836Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8950923Z cvt.u64.u32 %rd449, %r731; 2026-02-21T09:49:40.8951007Z cvt.u64.u32 %rd450, %r732; 2026-02-21T09:49:40.8951083Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:49:40.8951148Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:49:40.8951331Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8951400Z mov.b64 {%r1404, %r1405}, %rd452; 2026-02-21T09:49:40.8951482Z cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T09:49:40.8951670Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8951734Z cvt.u64.u32 %rd453, %r733; 2026-02-21T09:49:40.8951808Z cvt.u64.u32 %rd454, %r734; 2026-02-21T09:49:40.8951985Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:49:40.8952058Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:49:40.8952260Z .loc 1 55 27 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8952327Z mov.b64 {%r1407, %r1408}, %rd456; 2026-02-21T09:49:40.8952400Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T09:49:40.8952595Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8952667Z cvt.u64.u32 %rd457, %r735; 2026-02-21T09:49:40.8952749Z cvt.u64.u32 %rd458, %r736; 2026-02-21T09:49:40.8952829Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:49:40.8952922Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:49:40.8953134Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8953208Z mov.b64 {%r1410, %r1411}, %rd460; 2026-02-21T09:49:40.8953353Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T09:49:40.8953749Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8953847Z cvt.u64.u32 %rd461, %r737; 2026-02-21T09:49:40.8953929Z cvt.u64.u32 %rd462, %r738; 2026-02-21T09:49:40.8954018Z shl.b64 %rd463, %rd462, 32; 2026-02-21T09:49:40.8954097Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:49:40.8954374Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8954449Z mov.b64 {%r1413, %r1414}, %rd464; 2026-02-21T09:49:40.8954523Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T09:49:40.8954744Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8954818Z cvt.u64.u32 %rd465, %r739; 2026-02-21T09:49:40.8954882Z cvt.u64.u32 %rd466, %r740; 2026-02-21T09:49:40.8954946Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:49:40.8955013Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:49:40.8955211Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8955279Z mov.b64 {%r1416, %r1417}, %rd468; 2026-02-21T09:49:40.8955353Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T09:49:40.8955544Z .loc 1 53 52 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8955609Z cvt.u64.u32 %rd469, %r741; 2026-02-21T09:49:40.8955672Z cvt.u64.u32 %rd470, %r742; 2026-02-21T09:49:40.8955744Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:49:40.8955809Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:49:40.8956057Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8956153Z mov.b64 {%r1419, %r1420}, %rd472; 2026-02-21T09:49:40.8956266Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 2026-02-21T09:49:40.8956489Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8956560Z cvt.u64.u32 %rd473, %r743; 2026-02-21T09:49:40.8956670Z cvt.u64.u32 %rd474, %r744; 2026-02-21T09:49:40.8956735Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:49:40.8956801Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:49:40.8956996Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8957063Z mov.b64 {%r1422, %r1423}, %rd476; 2026-02-21T09:49:40.8957140Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T09:49:40.8957325Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8957401Z cvt.u64.u32 %rd477, %r745; 2026-02-21T09:49:40.8957465Z cvt.u64.u32 %rd478, %r746; 2026-02-21T09:49:40.8957529Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:49:40.8957602Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:49:40.8957794Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8957891Z mov.b64 {%r1425, %r1426}, %rd480; 2026-02-21T09:49:40.8957978Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T09:49:40.8958166Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8958233Z cvt.u64.u32 %rd481, %r748; 2026-02-21T09:49:40.8958298Z cvt.u64.u32 %rd482, %r749; 2026-02-21T09:49:40.8958371Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:49:40.8958436Z or.b64 
%rd484, %rd481, %rd483; 2026-02-21T09:49:40.8958623Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8958697Z mov.b64 {%r1428, %r1429}, %rd484; 2026-02-21T09:49:40.8958770Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 2026-02-21T09:49:40.8958955Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8959027Z cvt.u64.u32 %rd485, %r750; 2026-02-21T09:49:40.8959122Z cvt.u64.u32 %rd486, %r751; 2026-02-21T09:49:40.8959189Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:49:40.8959259Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:49:40.8959451Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8959519Z mov.b64 {%r1431, %r1432}, %rd488; 2026-02-21T09:49:40.8959596Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T09:49:40.8959797Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8959861Z cvt.u64.u32 %rd489, %r752; 2026-02-21T09:49:40.8959926Z cvt.u64.u32 %rd490, %r753; 2026-02-21T09:49:40.8959998Z shl.b64 %rd491, %rd490, 32; 2026-02-21T09:49:40.8960063Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T09:49:40.8960245Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8960310Z mov.b64 {%r1434, %r1435}, %rd492; 2026-02-21T09:49:40.8960393Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T09:49:40.8960576Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8960643Z cvt.u64.u32 %rd493, %r754; 2026-02-21T09:49:40.8960715Z cvt.u64.u32 %rd494, %r755; 2026-02-21T09:49:40.8960779Z shl.b64 %rd495, %rd494, 32; 2026-02-21T09:49:40.8960845Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T09:49:40.8961053Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8961119Z mov.b64 {%r1437, %r1438}, %rd496; 2026-02-21T09:49:40.8961192Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 
2026-02-21T09:49:40.8961406Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8961480Z cvt.u64.u32 %rd497, %r756; 2026-02-21T09:49:40.8961543Z cvt.u64.u32 %rd498, %r757; 2026-02-21T09:49:40.8961608Z shl.b64 %rd499, %rd498, 32; 2026-02-21T09:49:40.8961685Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T09:49:40.8961878Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8961967Z mov.b64 {%r1440, %r1441}, %rd500; 2026-02-21T09:49:40.8962048Z cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T09:49:40.8962259Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8962323Z cvt.u64.u32 %rd501, %r758; 2026-02-21T09:49:40.8962388Z cvt.u64.u32 %rd502, %r759; 2026-02-21T09:49:40.8962461Z shl.b64 %rd503, %rd502, 32; 2026-02-21T09:49:40.8962526Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T09:49:40.8962718Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8962792Z mov.b64 {%r1443, %r1444}, %rd504; 2026-02-21T09:49:40.8962865Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T09:49:40.8963058Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8963130Z cvt.u64.u32 %rd505, %r760; 2026-02-21T09:49:40.8963226Z cvt.u64.u32 %rd506, %r761; 2026-02-21T09:49:40.8963292Z shl.b64 %rd507, %rd506, 32; 2026-02-21T09:49:40.8963356Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T09:49:40.8963559Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8963624Z mov.b64 {%r1446, %r1447}, %rd508; 2026-02-21T09:49:40.8963698Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T09:49:40.8963904Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8963969Z cvt.u64.u32 %rd509, %r762; 2026-02-21T09:49:40.8964032Z cvt.u64.u32 %rd510, %r763; 2026-02-21T09:49:40.8964103Z shl.b64 %rd511, %rd510, 
32; 2026-02-21T09:49:40.8964167Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T09:49:40.8964404Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8964471Z mov.b64 {%r1449, %r1450}, %rd512; 2026-02-21T09:49:40.8964556Z cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T09:49:40.8964825Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8964891Z cvt.u64.u32 %rd513, %r765; 2026-02-21T09:49:40.8964961Z cvt.u64.u32 %rd514, %r766; 2026-02-21T09:49:40.8965025Z shl.b64 %rd515, %rd514, 32; 2026-02-21T09:49:40.8965090Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T09:49:40.8965290Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8965355Z mov.b64 {%r1452, %r1453}, %rd516; 2026-02-21T09:49:40.8965430Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T09:49:40.8965622Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8965695Z cvt.u64.u32 %rd517, %r767; 2026-02-21T09:49:40.8965760Z cvt.u64.u32 %rd518, %r768; 2026-02-21T09:49:40.8965825Z shl.b64 %rd519, %rd518, 32; 2026-02-21T09:49:40.8965902Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T09:49:40.8966092Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8966155Z mov.b64 {%r1455, %r1456}, %rd520; 2026-02-21T09:49:40.8966235Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T09:49:40.8966412Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8966476Z cvt.u64.u32 %rd521, %r769; 2026-02-21T09:49:40.8966539Z cvt.u64.u32 %rd522, %r770; 2026-02-21T09:49:40.8966608Z shl.b64 %rd523, %rd522, 32; 2026-02-21T09:49:40.8966706Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T09:49:40.8966890Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8966960Z mov.b64 {%r1458, %r1459}, %rd524; 2026-02-21T09:49:40.8967032Z 
cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T09:49:40.8967219Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8967317Z cvt.u64.u32 %rd525, %r771; 2026-02-21T09:49:40.8967378Z cvt.u64.u32 %rd526, %r772; 2026-02-21T09:49:40.8967441Z shl.b64 %rd527, %rd526, 32; 2026-02-21T09:49:40.8967503Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T09:49:40.8967691Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8967754Z mov.b64 {%r1461, %r1462}, %rd528; 2026-02-21T09:49:40.8967826Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T09:49:40.8968020Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8968082Z cvt.u64.u32 %rd529, %r773; 2026-02-21T09:49:40.8968143Z cvt.u64.u32 %rd530, %r774; 2026-02-21T09:49:40.8968213Z shl.b64 %rd531, %rd530, 32; 2026-02-21T09:49:40.8968276Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T09:49:40.8968488Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8968558Z mov.b64 {%r1464, %r1465}, %rd532; 2026-02-21T09:49:40.8968641Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T09:49:40.8968819Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8968881Z cvt.u64.u32 %rd533, %r775; 2026-02-21T09:49:40.8968953Z cvt.u64.u32 %rd534, %r776; 2026-02-21T09:49:40.8969014Z shl.b64 %rd535, %rd534, 32; 2026-02-21T09:49:40.8969077Z or.b64 %rd536, %rd533, %rd535; 2026-02-21T09:49:40.8969264Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8969329Z mov.b64 {%r1467, %r1468}, %rd536; 2026-02-21T09:49:40.8969400Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T09:49:40.8969604Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8969676Z cvt.u64.u32 %rd537, %r777; 2026-02-21T09:49:40.8969742Z cvt.u64.u32 %rd538, %r778; 
2026-02-21T09:49:40.8969805Z shl.b64 %rd539, %rd538, 32; 2026-02-21T09:49:40.8969876Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T09:49:40.8970056Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8970119Z mov.b64 {%r1470, %r1471}, %rd540; 2026-02-21T09:49:40.8970198Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T09:49:40.8970378Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8970442Z cvt.u64.u32 %rd541, %r779; 2026-02-21T09:49:40.8970505Z cvt.u64.u32 %rd542, %r780; 2026-02-21T09:49:40.8970574Z shl.b64 %rd543, %rd542, 32; 2026-02-21T09:49:40.8970636Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T09:49:40.8970818Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8970891Z mov.b64 {%r1473, %r1474}, %rd544; 2026-02-21T09:49:40.8970964Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T09:49:40.8971148Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8971217Z cvt.u64.u32 %rd545, %r782; 2026-02-21T09:49:40.8971280Z cvt.u64.u32 %rd546, %r783; 2026-02-21T09:49:40.8971342Z shl.b64 %rd547, %rd546, 32; 2026-02-21T09:49:40.8971405Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T09:49:40.8971595Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8971656Z mov.b64 {%r1476, %r1477}, %rd548; 2026-02-21T09:49:40.8971757Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 2026-02-21T09:49:40.8971947Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8972012Z cvt.u64.u32 %rd549, %r784; 2026-02-21T09:49:40.8972072Z cvt.u64.u32 %rd550, %r785; 2026-02-21T09:49:40.8972142Z shl.b64 %rd551, %rd550, 32; 2026-02-21T09:49:40.8972207Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T09:49:40.8972410Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8972474Z mov.b64 {%r1479, 
%r1480}, %rd552; 2026-02-21T09:49:40.8972552Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T09:49:40.8972729Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8972791Z cvt.u64.u32 %rd553, %r786; 2026-02-21T09:49:40.8972860Z cvt.u64.u32 %rd554, %r787; 2026-02-21T09:49:40.8972922Z shl.b64 %rd555, %rd554, 32; 2026-02-21T09:49:40.8972987Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T09:49:40.8973171Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8973233Z mov.b64 {%r1482, %r1483}, %rd556; 2026-02-21T09:49:40.8973304Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T09:49:40.8973510Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8973582Z cvt.u64.u32 %rd557, %r788; 2026-02-21T09:49:40.8973644Z cvt.u64.u32 %rd558, %r789; 2026-02-21T09:49:40.8973705Z shl.b64 %rd559, %rd558, 32; 2026-02-21T09:49:40.8973773Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T09:49:40.8973953Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8974015Z mov.b64 {%r1485, %r1486}, %rd560; 2026-02-21T09:49:40.8974093Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T09:49:40.8974276Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8974341Z cvt.u64.u32 %rd561, %r790; 2026-02-21T09:49:40.8974403Z cvt.u64.u32 %rd562, %r791; 2026-02-21T09:49:40.8974474Z shl.b64 %rd563, %rd562, 32; 2026-02-21T09:49:40.8974537Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T09:49:40.8974799Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8974875Z mov.b64 {%r1488, %r1489}, %rd564; 2026-02-21T09:49:40.8974946Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T09:49:40.8975131Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8975200Z cvt.u64.u32 %rd565, %r792; 
2026-02-21T09:49:40.8975262Z cvt.u64.u32 %rd566, %r793; 2026-02-21T09:49:40.8975323Z shl.b64 %rd567, %rd566, 32; 2026-02-21T09:49:40.8975386Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T09:49:40.8975576Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8975641Z mov.b64 {%r1491, %r1492}, %rd568; 2026-02-21T09:49:40.8975713Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T09:49:40.8975903Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8975970Z cvt.u64.u32 %rd569, %r794; 2026-02-21T09:49:40.8976034Z cvt.u64.u32 %rd570, %r795; 2026-02-21T09:49:40.8976106Z shl.b64 %rd571, %rd570, 32; 2026-02-21T09:49:40.8976170Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T09:49:40.8976354Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8976417Z mov.b64 {%r1494, %r1495}, %rd572; 2026-02-21T09:49:40.8976498Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T09:49:40.8976683Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8976746Z cvt.u64.u32 %rd573, %r796; 2026-02-21T09:49:40.8976848Z cvt.u64.u32 %rd574, %r797; 2026-02-21T09:49:40.8976912Z shl.b64 %rd575, %rd574, 32; 2026-02-21T09:49:40.8976976Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T09:49:40.8977168Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8977233Z mov.b64 {%r1497, %r1498}, %rd576; 2026-02-21T09:49:40.8977308Z cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T09:49:40.8977521Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8977591Z cvt.u64.u32 %rd577, %r799; 2026-02-21T09:49:40.8977652Z cvt.u64.u32 %rd578, %r800; 2026-02-21T09:49:40.8977715Z shl.b64 %rd579, %rd578, 32; 2026-02-21T09:49:40.8977784Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T09:49:40.8977968Z .loc 1 55 27 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8978031Z mov.b64 {%r1500, %r1501}, %rd580; 2026-02-21T09:49:40.8978111Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T09:49:40.8978293Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8978355Z cvt.u64.u32 %rd581, %r801; 2026-02-21T09:49:40.8978419Z cvt.u64.u32 %rd582, %r802; 2026-02-21T09:49:40.8978490Z shl.b64 %rd583, %rd582, 32; 2026-02-21T09:49:40.8978582Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T09:49:40.8978768Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8978838Z mov.b64 {%r1503, %r1504}, %rd584; 2026-02-21T09:49:40.8978911Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T09:49:40.8979096Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8979165Z cvt.u64.u32 %rd585, %r803; 2026-02-21T09:49:40.8979227Z cvt.u64.u32 %rd586, %r804; 2026-02-21T09:49:40.8979291Z shl.b64 %rd587, %rd586, 32; 2026-02-21T09:49:40.8979356Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T09:49:40.8979545Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8979608Z mov.b64 {%r1506, %r1507}, %rd588; 2026-02-21T09:49:40.8979680Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T09:49:40.8979894Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8979959Z cvt.u64.u32 %rd589, %r805; 2026-02-21T09:49:40.8980021Z cvt.u64.u32 %rd590, %r806; 2026-02-21T09:49:40.8980090Z shl.b64 %rd591, %rd590, 32; 2026-02-21T09:49:40.8980152Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T09:49:40.8980336Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8980400Z mov.b64 {%r1509, %r1510}, %rd592; 2026-02-21T09:49:40.8980478Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T09:49:40.8980657Z .loc 1 53 52 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8980720Z cvt.u64.u32 %rd593, %r807; 2026-02-21T09:49:40.8980790Z cvt.u64.u32 %rd594, %r808; 2026-02-21T09:49:40.8980852Z shl.b64 %rd595, %rd594, 32; 2026-02-21T09:49:40.8980915Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T09:49:40.8981109Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8981174Z mov.b64 {%r1512, %r1513}, %rd596; 2026-02-21T09:49:40.8981246Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T09:49:40.8981427Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8981498Z cvt.u64.u32 %rd597, %r809; 2026-02-21T09:49:40.8981559Z cvt.u64.u32 %rd598, %r810; 2026-02-21T09:49:40.8981620Z shl.b64 %rd599, %rd598, 32; 2026-02-21T09:49:40.8981690Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T09:49:40.8981869Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8981957Z mov.b64 {%r1515, %r1516}, %rd600; 2026-02-21T09:49:40.8982035Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T09:49:40.8982218Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8982281Z cvt.u64.u32 %rd601, %r811; 2026-02-21T09:49:40.8982344Z cvt.u64.u32 %rd602, %r812; 2026-02-21T09:49:40.8982435Z shl.b64 %rd603, %rd602, 32; 2026-02-21T09:49:40.8982497Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T09:49:40.8982679Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8982751Z mov.b64 {%r1518, %r1519}, %rd604; 2026-02-21T09:49:40.8982823Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T09:49:40.8983002Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.8983071Z cvt.u64.u32 %rd605, %r813; 2026-02-21T09:49:40.8983133Z cvt.u64.u32 %rd606, %r814; 2026-02-21T09:49:40.8983196Z shl.b64 %rd607, %rd606, 32; 2026-02-21T09:49:40.8983262Z or.b64 
%rd608, %rd605, %rd607; 2026-02-21T09:49:40.8983452Z .loc 1 55 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:55:27 2026-02-21T09:49:40.8983516Z mov.b64 {%r1521, %r1522}, %rd608; 2026-02-21T09:49:40.8983635Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T09:49:40.8983829Z .loc 1 56 53 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:56:53 2026-02-21T09:49:40.8983905Z mad.lo.s32 %r1524, %r1106, 12288, %r1138; 2026-02-21T09:49:40.8983980Z mad.lo.s32 %r1525, %r1107, 12288, %r1138; 2026-02-21T09:49:40.8984059Z mad.lo.s32 %r1526, %r1108, 12288, %r1138; 2026-02-21T09:49:40.8984127Z mad.lo.s32 %r1527, %r1109, 12288, %r1138; 2026-02-21T09:49:40.8984194Z mad.lo.s32 %r1528, %r1110, 12288, %r1138; 2026-02-21T09:49:40.8984261Z mad.lo.s32 %r1529, %r1111, 12288, %r1138; 2026-02-21T09:49:40.8984339Z mad.lo.s32 %r1530, %r1112, 12288, %r1138; 2026-02-21T09:49:40.8984406Z mad.lo.s32 %r1531, %r1113, 12288, %r1138; 2026-02-21T09:49:40.8984474Z mad.lo.s32 %r1532, %r1114, 12288, %r1138; 2026-02-21T09:49:40.8984549Z mad.lo.s32 %r1533, %r1115, 12288, %r1138; 2026-02-21T09:49:40.8984641Z mad.lo.s32 %r1534, %r1116, 12288, %r1138; 2026-02-21T09:49:40.8984745Z mad.lo.s32 %r1535, %r1117, 12288, %r1138; 2026-02-21T09:49:40.8984824Z mad.lo.s32 %r1536, %r1118, 12288, %r1138; 2026-02-21T09:49:40.8984893Z mad.lo.s32 %r1537, %r1119, 12288, %r1138; 2026-02-21T09:49:40.8984962Z mad.lo.s32 %r1538, %r1120, 12288, %r1138; 2026-02-21T09:49:40.8985032Z mad.lo.s32 %r1539, %r1121, 12288, %r1138; 2026-02-21T09:49:40.8985109Z mad.lo.s32 %r1540, %r1122, 12288, %r1138; 2026-02-21T09:49:40.8985177Z mad.lo.s32 %r1541, %r1123, 12288, %r1138; 2026-02-21T09:49:40.8985246Z mad.lo.s32 %r1542, %r1124, 12288, %r1138; 2026-02-21T09:49:40.8985323Z mad.lo.s32 %r1543, %r1125, 12288, %r1138; 2026-02-21T09:49:40.8985393Z mad.lo.s32 %r1544, %r1126, 12288, %r1138; 2026-02-21T09:49:40.8985462Z mad.lo.s32 %r1545, %r1127, 12288, %r1138; 2026-02-21T09:49:40.8985537Z mad.lo.s32 %r1546, %r1128, 12288, 
%r1138; 2026-02-21T09:49:40.8985606Z mad.lo.s32 %r1547, %r1129, 12288, %r1138; 2026-02-21T09:49:40.8985677Z mad.lo.s32 %r1548, %r1130, 12288, %r1138; 2026-02-21T09:49:40.8985745Z mad.lo.s32 %r1549, %r1131, 12288, %r1138; 2026-02-21T09:49:40.8985826Z mad.lo.s32 %r1550, %r1132, 12288, %r1138; 2026-02-21T09:49:40.8985896Z mad.lo.s32 %r1551, %r1133, 12288, %r1138; 2026-02-21T09:49:40.8985965Z mad.lo.s32 %r1552, %r1134, 12288, %r1138; 2026-02-21T09:49:40.8986042Z mad.lo.s32 %r1553, %r1135, 12288, %r1138; 2026-02-21T09:49:40.8986110Z mad.lo.s32 %r1554, %r1136, 12288, %r1138; 2026-02-21T09:49:40.8986179Z mad.lo.s32 %r1555, %r1137, 12288, %r1138; 2026-02-21T09:49:40.8986371Z .loc 1 56 24 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:56:24 2026-02-21T09:49:40.8986479Z mad.wide.s32 %rd65, %r1524, 2, %rd5; 2026-02-21T09:49:40.8986548Z mad.wide.s32 %rd66, %r1525, 2, %rd5; 2026-02-21T09:49:40.8986613Z mad.wide.s32 %rd67, %r1526, 2, %rd5; 2026-02-21T09:49:40.8986689Z mad.wide.s32 %rd68, %r1527, 2, %rd5; 2026-02-21T09:49:40.8986754Z mad.wide.s32 %rd69, %r1528, 2, %rd5; 2026-02-21T09:49:40.8986822Z mad.wide.s32 %rd70, %r1529, 2, %rd5; 2026-02-21T09:49:40.8986894Z mad.wide.s32 %rd71, %r1530, 2, %rd5; 2026-02-21T09:49:40.8986990Z mad.wide.s32 %rd72, %r1531, 2, %rd5; 2026-02-21T09:49:40.8987056Z mad.wide.s32 %rd73, %r1532, 2, %rd5; 2026-02-21T09:49:40.8987136Z mad.wide.s32 %rd74, %r1533, 2, %rd5; 2026-02-21T09:49:40.8987210Z mad.wide.s32 %rd75, %r1534, 2, %rd5; 2026-02-21T09:49:40.8987275Z mad.wide.s32 %rd76, %r1535, 2, %rd5; 2026-02-21T09:49:40.8987344Z mad.wide.s32 %rd77, %r1536, 2, %rd5; 2026-02-21T09:49:40.8987426Z mad.wide.s32 %rd78, %r1537, 2, %rd5; 2026-02-21T09:49:40.8987493Z mad.wide.s32 %rd79, %r1538, 2, %rd5; 2026-02-21T09:49:40.8987560Z mad.wide.s32 %rd80, %r1539, 2, %rd5; 2026-02-21T09:49:40.8987637Z mad.wide.s32 %rd81, %r1540, 2, %rd5; 2026-02-21T09:49:40.8987702Z mad.wide.s32 %rd82, %r1541, 2, %rd5; 2026-02-21T09:49:40.8987767Z mad.wide.s32 %rd83, %r1542, 
2, %rd5; 2026-02-21T09:49:40.8987834Z mad.wide.s32 %rd84, %r1543, 2, %rd5; 2026-02-21T09:49:40.8987908Z mad.wide.s32 %rd85, %r1544, 2, %rd5; 2026-02-21T09:49:40.8988003Z mad.wide.s32 %rd86, %r1545, 2, %rd5; 2026-02-21T09:49:40.8988073Z mad.wide.s32 %rd87, %r1546, 2, %rd5; 2026-02-21T09:49:40.8988146Z mad.wide.s32 %rd88, %r1547, 2, %rd5; 2026-02-21T09:49:40.8988211Z mad.wide.s32 %rd89, %r1548, 2, %rd5; 2026-02-21T09:49:40.8988277Z mad.wide.s32 %rd90, %r1549, 2, %rd5; 2026-02-21T09:49:40.8988344Z mad.wide.s32 %rd91, %r1550, 2, %rd5; 2026-02-21T09:49:40.8988418Z mad.wide.s32 %rd92, %r1551, 2, %rd5; 2026-02-21T09:49:40.8988483Z mad.wide.s32 %rd93, %r1552, 2, %rd5; 2026-02-21T09:49:40.8988547Z mad.wide.s32 %rd94, %r1553, 2, %rd5; 2026-02-21T09:49:40.8988618Z mad.wide.s32 %rd95, %r1554, 2, %rd5; 2026-02-21T09:49:40.8988686Z mad.wide.s32 %rd96, %r1555, 2, %rd5; 2026-02-21T09:49:40.8988873Z .loc 1 56 83 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:56:83 2026-02-21T09:49:40.8988940Z bar.sync 0, 128; 2026-02-21T09:49:40.8989080Z st.shared.v4.b32 [%r78], {%r1142, %r1154, %r1166, %r1178}; 2026-02-21T09:49:40.8989190Z st.shared.v4.b32 [%r79], {%r1190, %r1202, %r1214, %r1226}; 2026-02-21T09:49:40.8989290Z st.shared.v4.b32 [%r80], {%r1238, %r1250, %r1262, %r1274}; 2026-02-21T09:49:40.8989396Z st.shared.v4.b32 [%r81], {%r1286, %r1298, %r1310, %r1322}; 2026-02-21T09:49:40.8989457Z bar.sync 0, 128; 2026-02-21T09:49:40.8989520Z // begin inline asm 2026-02-21T09:49:40.8989696Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r977, %r981, %r985, %r989}, [%r821]; 2026-02-21T09:49:40.8989757Z // end inline asm 2026-02-21T09:49:40.8989818Z // begin inline asm 2026-02-21T09:49:40.8989991Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r993, %r997, %r1001, %r1005}, [%r826]; 2026-02-21T09:49:40.8990052Z // end inline asm 2026-02-21T09:49:40.8990111Z // begin inline asm 2026-02-21T09:49:40.8990276Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1009, %r1013, %r1017, %r1021}, [%r831]; 
2026-02-21T09:49:40.8990343Z // end inline asm 2026-02-21T09:49:40.8990402Z // begin inline asm 2026-02-21T09:49:40.8990567Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r836]; 2026-02-21T09:49:40.8990633Z // end inline asm 2026-02-21T09:49:40.8990692Z bar.sync 0, 128; 2026-02-21T09:49:40.8990793Z st.shared.v4.b32 [%r78], {%r1334, %r1346, %r1358, %r1370}; 2026-02-21T09:49:40.8990889Z st.shared.v4.b32 [%r79], {%r1382, %r1394, %r1406, %r1418}; 2026-02-21T09:49:40.8990994Z st.shared.v4.b32 [%r80], {%r1430, %r1442, %r1454, %r1466}; 2026-02-21T09:49:40.8991089Z st.shared.v4.b32 [%r81], {%r1478, %r1490, %r1502, %r1514}; 2026-02-21T09:49:40.8991147Z bar.sync 0, 128; 2026-02-21T09:49:40.8991214Z // begin inline asm 2026-02-21T09:49:40.8991402Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r821]; 2026-02-21T09:49:40.8991462Z // end inline asm 2026-02-21T09:49:40.8991530Z // begin inline asm 2026-02-21T09:49:40.8991689Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r826]; 2026-02-21T09:49:40.8991749Z // end inline asm 2026-02-21T09:49:40.8991808Z // begin inline asm 2026-02-21T09:49:40.8992004Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r831]; 2026-02-21T09:49:40.8992063Z // end inline asm 2026-02-21T09:49:40.8992124Z // begin inline asm 2026-02-21T09:49:40.8992291Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r836]; 2026-02-21T09:49:40.8992349Z // end inline asm 2026-02-21T09:49:40.8992409Z bar.sync 0, 128; 2026-02-21T09:49:40.8992517Z st.shared.v4.b32 [%r78], {%r1145, %r1157, %r1169, %r1181}; 2026-02-21T09:49:40.8992614Z st.shared.v4.b32 [%r79], {%r1193, %r1205, %r1217, %r1229}; 2026-02-21T09:49:40.8992713Z st.shared.v4.b32 [%r80], {%r1241, %r1253, %r1265, %r1277}; 2026-02-21T09:49:40.8992809Z st.shared.v4.b32 [%r81], {%r1289, %r1301, %r1313, %r1325}; 2026-02-21T09:49:40.8992877Z bar.sync 0, 128; 
2026-02-21T09:49:40.8992938Z // begin inline asm 2026-02-21T09:49:40.8993100Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r978, %r982, %r986, %r990}, [%r821]; 2026-02-21T09:49:40.8993164Z // end inline asm 2026-02-21T09:49:40.8993253Z // begin inline asm 2026-02-21T09:49:40.8993414Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r994, %r998, %r1002, %r1006}, [%r826]; 2026-02-21T09:49:40.8993475Z // end inline asm 2026-02-21T09:49:40.8993544Z // begin inline asm 2026-02-21T09:49:40.8993705Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1010, %r1014, %r1018, %r1022}, [%r831]; 2026-02-21T09:49:40.8993764Z // end inline asm 2026-02-21T09:49:40.8993833Z // begin inline asm 2026-02-21T09:49:40.8993995Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r836]; 2026-02-21T09:49:40.8994056Z // end inline asm 2026-02-21T09:49:40.8994121Z bar.sync 0, 128; 2026-02-21T09:49:40.8994220Z st.shared.v4.b32 [%r78], {%r1337, %r1349, %r1361, %r1373}; 2026-02-21T09:49:40.8994316Z st.shared.v4.b32 [%r79], {%r1385, %r1397, %r1409, %r1421}; 2026-02-21T09:49:40.8994439Z st.shared.v4.b32 [%r80], {%r1433, %r1445, %r1457, %r1469}; 2026-02-21T09:49:40.8994546Z st.shared.v4.b32 [%r81], {%r1481, %r1493, %r1505, %r1517}; 2026-02-21T09:49:40.8994606Z bar.sync 0, 128; 2026-02-21T09:49:40.8994665Z // begin inline asm 2026-02-21T09:49:40.8994869Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r821]; 2026-02-21T09:49:40.8994928Z // end inline asm 2026-02-21T09:49:40.8994988Z // begin inline asm 2026-02-21T09:49:40.8995159Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r826]; 2026-02-21T09:49:40.8995216Z // end inline asm 2026-02-21T09:49:40.8995276Z // begin inline asm 2026-02-21T09:49:40.8995436Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r831]; 2026-02-21T09:49:40.8995505Z // end inline asm 2026-02-21T09:49:40.8995564Z // begin inline asm 2026-02-21T09:49:40.8995727Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r836]; 2026-02-21T09:49:40.8995793Z // end inline asm 2026-02-21T09:49:40.8995853Z bar.sync 0, 128; 2026-02-21T09:49:40.8995954Z st.shared.v4.b32 [%r78], {%r1148, %r1160, %r1172, %r1184}; 2026-02-21T09:49:40.8996054Z st.shared.v4.b32 [%r79], {%r1196, %r1208, %r1220, %r1232}; 2026-02-21T09:49:40.8996160Z st.shared.v4.b32 [%r80], {%r1244, %r1256, %r1268, %r1280}; 2026-02-21T09:49:40.8996256Z st.shared.v4.b32 [%r81], {%r1292, %r1304, %r1316, %r1328}; 2026-02-21T09:49:40.8996315Z bar.sync 0, 128; 2026-02-21T09:49:40.8996382Z // begin inline asm 2026-02-21T09:49:40.8996545Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r979, %r983, %r987, %r991}, [%r821]; 2026-02-21T09:49:40.8996605Z // end inline asm 2026-02-21T09:49:40.8996676Z // begin inline asm 2026-02-21T09:49:40.8996872Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r995, %r999, %r1003, %r1007}, [%r826]; 2026-02-21T09:49:40.8996931Z // end inline asm 2026-02-21T09:49:40.8996989Z // begin inline asm 2026-02-21T09:49:40.8997157Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1011, %r1015, %r1019, %r1023}, [%r831]; 2026-02-21T09:49:40.8997216Z // end inline asm 2026-02-21T09:49:40.8997277Z // begin inline asm 2026-02-21T09:49:40.8997473Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1031, %r1035, %r1039}, [%r836]; 2026-02-21T09:49:40.8997532Z // end inline asm 2026-02-21T09:49:40.8997590Z bar.sync 0, 128; 2026-02-21T09:49:40.8997695Z st.shared.v4.b32 [%r78], {%r1340, %r1352, %r1364, %r1376}; 2026-02-21T09:49:40.8997791Z st.shared.v4.b32 [%r79], {%r1388, %r1400, %r1412, %r1424}; 2026-02-21T09:49:40.8997887Z st.shared.v4.b32 [%r80], {%r1436, %r1448, %r1460, %r1472}; 2026-02-21T09:49:40.8997984Z st.shared.v4.b32 [%r81], {%r1484, %r1496, %r1508, %r1520}; 2026-02-21T09:49:40.8998054Z bar.sync 0, 128; 2026-02-21T09:49:40.8998113Z // begin inline asm 2026-02-21T09:49:40.8998271Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1043, %r1047, %r1051, 
%r1055}, [%r821]; 2026-02-21T09:49:40.8998335Z // end inline asm 2026-02-21T09:49:40.8998394Z // begin inline asm 2026-02-21T09:49:40.8998556Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r826]; 2026-02-21T09:49:40.8998644Z // end inline asm 2026-02-21T09:49:40.8998713Z // begin inline asm 2026-02-21T09:49:40.8998874Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, %r1083, %r1087}, [%r831]; 2026-02-21T09:49:40.8998933Z // end inline asm 2026-02-21T09:49:40.8999001Z // begin inline asm 2026-02-21T09:49:40.8999162Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r836]; 2026-02-21T09:49:40.8999219Z // end inline asm 2026-02-21T09:49:40.8999283Z bar.sync 0, 128; 2026-02-21T09:49:40.8999378Z st.shared.v4.b32 [%r78], {%r1151, %r1163, %r1175, %r1187}; 2026-02-21T09:49:40.8999477Z st.shared.v4.b32 [%r79], {%r1199, %r1211, %r1223, %r1235}; 2026-02-21T09:49:40.8999573Z st.shared.v4.b32 [%r80], {%r1247, %r1259, %r1271, %r1283}; 2026-02-21T09:49:40.8999675Z st.shared.v4.b32 [%r81], {%r1295, %r1307, %r1319, %r1331}; 2026-02-21T09:49:40.8999735Z bar.sync 0, 128; 2026-02-21T09:49:40.8999826Z // begin inline asm 2026-02-21T09:49:40.8999992Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r980, %r984, %r988, %r992}, [%r821]; 2026-02-21T09:49:40.9000051Z // end inline asm 2026-02-21T09:49:40.9000109Z // begin inline asm 2026-02-21T09:49:40.9000283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r996, %r1000, %r1004, %r1008}, [%r826]; 2026-02-21T09:49:40.9000341Z // end inline asm 2026-02-21T09:49:40.9000400Z // begin inline asm 2026-02-21T09:49:40.9000558Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1012, %r1016, %r1020, %r1024}, [%r831]; 2026-02-21T09:49:40.9000622Z // end inline asm 2026-02-21T09:49:40.9000681Z // begin inline asm 2026-02-21T09:49:40.9000839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1028, %r1032, %r1036, %r1040}, [%r836]; 2026-02-21T09:49:40.9000904Z // end inline asm 2026-02-21T09:49:40.9000962Z 
bar.sync 0, 128; 2026-02-21T09:49:40.9001059Z st.shared.v4.b32 [%r78], {%r1343, %r1355, %r1367, %r1379}; 2026-02-21T09:49:40.9001154Z st.shared.v4.b32 [%r79], {%r1391, %r1403, %r1415, %r1427}; 2026-02-21T09:49:40.9001261Z st.shared.v4.b32 [%r80], {%r1439, %r1451, %r1463, %r1475}; 2026-02-21T09:49:40.9001358Z st.shared.v4.b32 [%r81], {%r1487, %r1499, %r1511, %r1523}; 2026-02-21T09:49:40.9001418Z bar.sync 0, 128; 2026-02-21T09:49:40.9001486Z // begin inline asm 2026-02-21T09:49:40.9001646Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1044, %r1048, %r1052, %r1056}, [%r821]; 2026-02-21T09:49:40.9001703Z // end inline asm 2026-02-21T09:49:40.9001768Z // begin inline asm 2026-02-21T09:49:40.9001928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1060, %r1064, %r1068, %r1072}, [%r826]; 2026-02-21T09:49:40.9001986Z // end inline asm 2026-02-21T09:49:40.9002074Z // begin inline asm 2026-02-21T09:49:40.9002242Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1076, %r1080, %r1084, %r1088}, [%r831]; 2026-02-21T09:49:40.9002299Z // end inline asm 2026-02-21T09:49:40.9002360Z // begin inline asm 2026-02-21T09:49:40.9002529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1092, %r1096, %r1100, %r1104}, [%r836]; 2026-02-21T09:49:40.9002590Z // end inline asm 2026-02-21T09:49:40.9002652Z // begin inline asm 2026-02-21T09:49:40.9002806Z st.global.v4.b32 [ %rd65 + 0 ], { %r977, %r978, %r979, %r980 }; 2026-02-21T09:49:40.9002873Z // end inline asm 2026-02-21T09:49:40.9002933Z // begin inline asm 2026-02-21T09:49:40.9003045Z st.global.v4.b32 [ %rd66 + 0 ], { %r981, %r982, %r983, %r984 }; 2026-02-21T09:49:40.9003112Z // end inline asm 2026-02-21T09:49:40.9003173Z // begin inline asm 2026-02-21T09:49:40.9003277Z st.global.v4.b32 [ %rd67 + 0 ], { %r985, %r986, %r987, %r988 }; 2026-02-21T09:49:40.9003345Z // end inline asm 2026-02-21T09:49:40.9003405Z // begin inline asm 2026-02-21T09:49:40.9003511Z st.global.v4.b32 [ %rd68 + 0 ], { %r989, %r990, %r991, %r992 }; 2026-02-21T09:49:40.9003568Z // end 
inline asm 2026-02-21T09:49:40.9003638Z // begin inline asm 2026-02-21T09:49:40.9003740Z st.global.v4.b32 [ %rd69 + 0 ], { %r993, %r994, %r995, %r996 }; 2026-02-21T09:49:40.9003798Z // end inline asm 2026-02-21T09:49:40.9003867Z // begin inline asm 2026-02-21T09:49:40.9004005Z st.global.v4.b32 [ %rd70 + 0 ], { %r997, %r998, %r999, %r1000 }; 2026-02-21T09:49:40.9004067Z // end inline asm 2026-02-21T09:49:40.9004129Z // begin inline asm 2026-02-21T09:49:40.9004254Z st.global.v4.b32 [ %rd71 + 0 ], { %r1001, %r1002, %r1003, %r1004 }; 2026-02-21T09:49:40.9004317Z // end inline asm 2026-02-21T09:49:40.9004377Z // begin inline asm 2026-02-21T09:49:40.9004498Z st.global.v4.b32 [ %rd72 + 0 ], { %r1005, %r1006, %r1007, %r1008 }; 2026-02-21T09:49:40.9004556Z // end inline asm 2026-02-21T09:49:40.9004615Z // begin inline asm 2026-02-21T09:49:40.9004771Z st.global.v4.b32 [ %rd73 + 0 ], { %r1009, %r1010, %r1011, %r1012 }; 2026-02-21T09:49:40.9004831Z // end inline asm 2026-02-21T09:49:40.9004888Z // begin inline asm 2026-02-21T09:49:40.9004994Z st.global.v4.b32 [ %rd74 + 0 ], { %r1013, %r1014, %r1015, %r1016 }; 2026-02-21T09:49:40.9005063Z // end inline asm 2026-02-21T09:49:40.9005121Z // begin inline asm 2026-02-21T09:49:40.9005264Z st.global.v4.b32 [ %rd75 + 0 ], { %r1017, %r1018, %r1019, %r1020 }; 2026-02-21T09:49:40.9005331Z // end inline asm 2026-02-21T09:49:40.9005390Z // begin inline asm 2026-02-21T09:49:40.9005497Z st.global.v4.b32 [ %rd76 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T09:49:40.9005559Z // end inline asm 2026-02-21T09:49:40.9005634Z // begin inline asm 2026-02-21T09:49:40.9005739Z st.global.v4.b32 [ %rd77 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T09:49:40.9005800Z // end inline asm 2026-02-21T09:49:40.9005869Z // begin inline asm 2026-02-21T09:49:40.9005974Z st.global.v4.b32 [ %rd78 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T09:49:40.9006033Z // end inline asm 2026-02-21T09:49:40.9006093Z // begin inline asm 
2026-02-21T09:49:40.9006205Z st.global.v4.b32 [ %rd79 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T09:49:40.9006264Z // end inline asm 2026-02-21T09:49:40.9006322Z // begin inline asm 2026-02-21T09:49:40.9006436Z st.global.v4.b32 [ %rd80 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T09:49:40.9006497Z // end inline asm 2026-02-21T09:49:40.9006557Z // begin inline asm 2026-02-21T09:49:40.9006668Z st.global.v4.b32 [ %rd81 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T09:49:40.9006726Z // end inline asm 2026-02-21T09:49:40.9006785Z // begin inline asm 2026-02-21T09:49:40.9006892Z st.global.v4.b32 [ %rd82 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T09:49:40.9006958Z // end inline asm 2026-02-21T09:49:40.9007016Z // begin inline asm 2026-02-21T09:49:40.9007120Z st.global.v4.b32 [ %rd83 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T09:49:40.9007184Z // end inline asm 2026-02-21T09:49:40.9013384Z // begin inline asm 2026-02-21T09:49:40.9013554Z st.global.v4.b32 [ %rd84 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T09:49:40.9013624Z // end inline asm 2026-02-21T09:49:40.9013699Z // begin inline asm 2026-02-21T09:49:40.9013836Z st.global.v4.b32 [ %rd85 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T09:49:40.9013901Z // end inline asm 2026-02-21T09:49:40.9013976Z // begin inline asm 2026-02-21T09:49:40.9014231Z st.global.v4.b32 [ %rd86 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T09:49:40.9014293Z // end inline asm 2026-02-21T09:49:40.9014372Z // begin inline asm 2026-02-21T09:49:40.9014502Z st.global.v4.b32 [ %rd87 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T09:49:40.9014573Z // end inline asm 2026-02-21T09:49:40.9014644Z // begin inline asm 2026-02-21T09:49:40.9014942Z st.global.v4.b32 [ %rd88 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T09:49:40.9015014Z // end inline asm 2026-02-21T09:49:40.9015089Z // begin inline asm 2026-02-21T09:49:40.9015231Z st.global.v4.b32 [ %rd89 + 0 ], { 
%r1073, %r1074, %r1075, %r1076 }; 2026-02-21T09:49:40.9015310Z // end inline asm 2026-02-21T09:49:40.9015372Z // begin inline asm 2026-02-21T09:49:40.9015479Z st.global.v4.b32 [ %rd90 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T09:49:40.9015551Z // end inline asm 2026-02-21T09:49:40.9015611Z // begin inline asm 2026-02-21T09:49:40.9015765Z st.global.v4.b32 [ %rd91 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T09:49:40.9015836Z // end inline asm 2026-02-21T09:49:40.9015897Z // begin inline asm 2026-02-21T09:49:40.9016004Z st.global.v4.b32 [ %rd92 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T09:49:40.9016064Z // end inline asm 2026-02-21T09:49:40.9016134Z // begin inline asm 2026-02-21T09:49:40.9016240Z st.global.v4.b32 [ %rd93 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T09:49:40.9016300Z // end inline asm 2026-02-21T09:49:40.9016369Z // begin inline asm 2026-02-21T09:49:40.9016473Z st.global.v4.b32 [ %rd94 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:49:40.9016535Z // end inline asm 2026-02-21T09:49:40.9016606Z // begin inline asm 2026-02-21T09:49:40.9016713Z st.global.v4.b32 [ %rd95 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:49:40.9016773Z // end inline asm 2026-02-21T09:49:40.9016891Z // begin inline asm 2026-02-21T09:49:40.9017007Z st.global.v4.b32 [ %rd96 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:49:40.9017069Z // end inline asm 2026-02-21T09:49:40.9017131Z mov.b32 %r1606, 1; 2026-02-21T09:49:40.9017256Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.9017472Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.9017549Z xor.b32 %r1610, %r1606, %r1610; 2026-02-21T09:49:40.9017757Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9017831Z add.s32 %r1600, %r1600, -1; 2026-02-21T09:49:40.9017908Z setp.ne.b32 %p110, %r1600, 0; 2026-02-21T09:49:40.9017978Z @%p110 bra $L__BB0_18; 
2026-02-21T09:49:40.9018052Z bra.uni $L__BB0_23; 2026-02-21T09:49:40.9018167Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:49:40.9018235Z add.s32 %r541, %r1605, 1; 2026-02-21T09:49:40.9018314Z setp.eq.b32 %p106, %r1605, 63; 2026-02-21T09:49:40.9018391Z selp.b32 %r1605, 0, %r541, %p106; 2026-02-21T09:49:40.9018461Z setp.eq.b32 %p107, %r1605, 63; 2026-02-21T09:49:40.9018526Z @%p107 bra $L__BB0_21; 2026-02-21T09:49:40.9018638Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.9018833Z .loc 1 0 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:97 2026-02-21T09:49:40.9018893Z mov.b32 %r1606, 0; 2026-02-21T09:49:40.9019093Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9019198Z setp.ne.b32 %p108, %r1605, 0; 2026-02-21T09:49:40.9019261Z @%p108 bra $L__BB0_22; 2026-02-21T09:49:40.9019356Z // %bb.20: // %.thread 2026-02-21T09:49:40.9019456Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:49:40.9019524Z add.s32 %r1607, %r1607, 1184; 2026-02-21T09:49:40.9019713Z .loc 1 34 35 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:34:35 2026-02-21T09:49:40.9019829Z mul.hi.s32 %r1557, %r1607, 715827883; 2026-02-21T09:49:40.9019897Z shr.u32 %r1558, %r1557, 31; 2026-02-21T09:49:40.9019963Z shr.s32 %r1559, %r1557, 5; 2026-02-21T09:49:40.9020041Z add.s32 %r1560, %r1559, %r1558; 2026-02-21T09:49:40.9020231Z .loc 1 35 33 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:35:33 2026-02-21T09:49:40.9020300Z shl.b32 %r1561, %r1560, 1; 2026-02-21T09:49:40.9020502Z .loc 1 36 39 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:36:39 2026-02-21T09:49:40.9020571Z sub.s32 %r1562, 8, %r1561; 2026-02-21T09:49:40.9020759Z .loc 1 36 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:36:52 2026-02-21T09:49:40.9020831Z min.s32 %r1563, %r1562, 2; 2026-02-21T09:49:40.9021019Z .loc 1 37 45 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:45 
2026-02-21T09:49:40.9021115Z mul.lo.s32 %r1564, %r1560, 192; 2026-02-21T09:49:40.9021182Z sub.s32 %r1565, %r1607, %r1564; 2026-02-21T09:49:40.9021371Z .loc 1 38 51 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:38:51 2026-02-21T09:49:40.9021436Z div.s32 %r1566, %r1565, %r1563; 2026-02-21T09:49:40.9021615Z .loc 1 37 64 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:64 2026-02-21T09:49:40.9021692Z mul.lo.s32 %r1567, %r1566, %r1563; 2026-02-21T09:49:40.9021756Z sub.s32 %r1568, %r1565, %r1567; 2026-02-21T09:49:40.9021943Z .loc 1 37 30 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:30 2026-02-21T09:49:40.9022016Z add.s32 %r1569, %r1568, %r1561; 2026-02-21T09:49:40.9022197Z .loc 1 39 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:39:27 2026-02-21T09:49:40.9022287Z shl.b32 %r1609, %r1569, 8; 2026-02-21T09:49:40.9022485Z .loc 1 41 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:41:27 2026-02-21T09:49:40.9022549Z shl.b32 %r1608, %r1566, 7; 2026-02-21T09:49:40.9022733Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9022795Z bra.uni $L__BB0_22; 2026-02-21T09:49:40.9022914Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:49:40.9023096Z .loc 1 0 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:97 2026-02-21T09:49:40.9023163Z mov.b32 %r105, global_smem; 2026-02-21T09:49:40.9023238Z add.s32 %r106, %r105, %r3; 2026-02-21T09:49:40.9023299Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.9023410Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9023601Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9023693Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9023759Z barrier.sync 1; 2026-02-21T09:49:40.9023823Z barrier.sync 1; 2026-02-21T09:49:40.9023916Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9024008Z $L__BB0_2: // %.preheader 2026-02-21T09:49:40.9024109Z // =>This 
Loop Header: Depth=1 2026-02-21T09:49:40.9024214Z // Child Loop BB0_11 Depth 2 2026-02-21T09:49:40.9024309Z // Child Loop BB0_7 Depth 2 2026-02-21T09:49:40.9024487Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19 2026-02-21T09:49:40.9024605Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:49:40.9024706Z barrier.sync 1; 2026-02-21T09:49:40.9024780Z ld.shared.b8 %r104, [%r106+106580]; 2026-02-21T09:49:40.9024848Z setp.gt.u32 %p4, %r104, 3; 2026-02-21T09:49:40.9024917Z @%p4 bra $L__BB0_4; 2026-02-21T09:49:40.9025005Z // %bb.3: // %.preheader 2026-02-21T09:49:40.9025135Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9025211Z $L_brx_0: .branchtargets 2026-02-21T09:49:40.9025270Z $L__BB0_5, 2026-02-21T09:49:40.9025327Z $L__BB0_9, 2026-02-21T09:49:40.9025383Z $L__BB0_15, 2026-02-21T09:49:40.9025446Z $L__BB0_24; 2026-02-21T09:49:40.9025515Z brx.idx %r104, $L_brx_0; 2026-02-21T09:49:40.9025619Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9025812Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9025896Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9025983Z ld.shared.b32 %r155, [global_smem+98304]; 2026-02-21T09:49:40.9026076Z ld.shared.b32 %r1584, [global_smem+98312]; 2026-02-21T09:49:40.9026139Z barrier.sync 1; 2026-02-21T09:49:40.9026211Z setp.lt.s32 %p17, %r1584, 1; 2026-02-21T09:49:40.9026277Z @%p17 bra $L__BB0_8; 2026-02-21T09:49:40.9026397Z // %bb.6: // %.lr.ph7 2026-02-21T09:49:40.9026495Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9026678Z .loc 1 0 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:97 2026-02-21T09:49:40.9026748Z mov.b32 %r1588, -1; 2026-02-21T09:49:40.9026814Z mov.pred %p122, 0; 2026-02-21T09:49:40.9026874Z mov.b32 %r1585, 0; 2026-02-21T09:49:40.9026941Z mov.b32 %r1586, %r1585; 2026-02-21T09:49:40.9027002Z mov.b32 %r1587, %r1585; 2026-02-21T09:49:40.9027109Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 
2026-02-21T09:49:40.9027210Z // => This Inner Loop Header: Depth=2 2026-02-21T09:49:40.9027403Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9027466Z add.s32 %r165, %r1588, 1; 2026-02-21T09:49:40.9027565Z setp.eq.b32 %p30, %r1588, 63; 2026-02-21T09:49:40.9027647Z selp.b32 %r1588, 0, %r165, %p30; 2026-02-21T09:49:40.9027711Z shl.b32 %r166, %r1587, 3; 2026-02-21T09:49:40.9027776Z add.s32 %r168, %r105, %r166; 2026-02-21T09:49:40.9027845Z add.s32 %r169, %r168, 106496; 2026-02-21T09:49:40.9027906Z add.s32 %r153, %r168, 106528; 2026-02-21T09:49:40.9028092Z .loc 1 51 31 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:51:31 2026-02-21T09:49:40.9028153Z shl.b32 %r170, %r1587, 14; 2026-02-21T09:49:40.9028224Z add.s32 %r171, %r105, %r170; 2026-02-21T09:49:40.9028409Z .loc 1 52 44 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:52:44 2026-02-21T09:49:40.9028474Z shl.b32 %r172, %r1587, 13; 2026-02-21T09:49:40.9028543Z add.s32 %r173, %r105, %r172; 2026-02-21T09:49:40.9028606Z add.s32 %r174, %r173, 65536; 2026-02-21T09:49:40.9028787Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9028863Z bar.warp.sync -1; 2026-02-21T09:49:40.9028929Z // begin inline asm 2026-02-21T09:49:40.9028989Z 2026-02-21T09:49:40.9029044Z { 2026-02-21T09:49:40.9029120Z .reg .pred complete; 2026-02-21T09:49:40.9029182Z waitLoop: 2026-02-21T09:49:40.9029323Z mbarrier.try_wait.parity.shared.b64 complete, [%r153], %r1586; 2026-02-21T09:49:40.9029402Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.9029457Z } 2026-02-21T09:49:40.9029465Z 2026-02-21T09:49:40.9029526Z // end inline asm 2026-02-21T09:49:40.9029711Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9029843Z setp.eq.b32 %p29, %r1588, 63; 2026-02-21T09:49:40.9030056Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.9030137Z elect.sync 
%r175|%p20, -1; 2026-02-21T09:49:40.9030230Z bfe.u32 %r176, %r171, 4, 14; 2026-02-21T09:49:40.9030393Z cvt.u64.u32 %rd22, %r176; 2026-02-21T09:49:40.9030498Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:49:40.9030614Z bfe.u32 %r177, %r174, 4, 14; 2026-02-21T09:49:40.9030694Z cvt.u64.u32 %rd23, %r177; 2026-02-21T09:49:40.9030799Z or.b64 %rd13, %rd23, -9223371899382267904; 2026-02-21T09:49:40.9030866Z mov.b32 %r156, 136314896; 2026-02-21T09:49:40.9030943Z // begin inline asm 2026-02-21T09:49:40.9031130Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 0 ], %rd12, %rd13, %r156, %p122; 2026-02-21T09:49:40.9031194Z // end inline asm 2026-02-21T09:49:40.9031265Z add.s32 %r178, %r171, 32; 2026-02-21T09:49:40.9031328Z bfe.u32 %r179, %r178, 4, 14; 2026-02-21T09:49:40.9031392Z cvt.u64.u32 %rd24, %r179; 2026-02-21T09:49:40.9031467Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:49:40.9031536Z add.s32 %r180, %r173, 65568; 2026-02-21T09:49:40.9031598Z bfe.u32 %r181, %r180, 4, 14; 2026-02-21T09:49:40.9031660Z cvt.u64.u32 %rd25, %r181; 2026-02-21T09:49:40.9031743Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T09:49:40.9031809Z mov.pred %p21, -1; 2026-02-21T09:49:40.9031894Z // begin inline asm 2026-02-21T09:49:40.9032062Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 0 ], %rd14, %rd15, %r156, %p21; 2026-02-21T09:49:40.9032125Z // end inline asm 2026-02-21T09:49:40.9032189Z add.s32 %r182, %r171, 8192; 2026-02-21T09:49:40.9032252Z bfe.u32 %r183, %r182, 4, 14; 2026-02-21T09:49:40.9032320Z cvt.u64.u32 %rd26, %r183; 2026-02-21T09:49:40.9032390Z or.b64 %rd16, %rd26, -9223371899348713472; 2026-02-21T09:49:40.9032451Z // begin inline asm 2026-02-21T09:49:40.9032611Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 128 ], %rd16, %rd13, %r156, %p122; 2026-02-21T09:49:40.9032673Z // end inline asm 2026-02-21T09:49:40.9032734Z add.s32 %r184, %r171, 8224; 2026-02-21T09:49:40.9032796Z bfe.u32 %r185, %r184, 4, 14; 2026-02-21T09:49:40.9032865Z cvt.u64.u32 
%rd27, %r185; 2026-02-21T09:49:40.9032935Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:49:40.9033022Z // begin inline asm 2026-02-21T09:49:40.9033182Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 128 ], %rd18, %rd15, %r156, %p21; 2026-02-21T09:49:40.9033244Z // end inline asm 2026-02-21T09:49:40.9033307Z cvt.u64.u32 %rd20, %r169; 2026-02-21T09:49:40.9033376Z // begin inline asm 2026-02-21T09:49:40.9033514Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:49:40.9033575Z // end inline asm 2026-02-21T09:49:40.9033644Z and.pred %p28, %p29, %p20; 2026-02-21T09:49:40.9033717Z add.s32 %r186, %r105, 106560; 2026-02-21T09:49:40.9033781Z cvt.u64.u32 %rd21, %r186; 2026-02-21T09:49:40.9033842Z // begin inline asm 2026-02-21T09:49:40.9033985Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:49:40.9034043Z // end inline asm 2026-02-21T09:49:40.9034227Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9034303Z setp.ne.b32 %p122, %r1588, 63; 2026-02-21T09:49:40.9034496Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.9034563Z selp.b32 %r187, 1, 0, %p29; 2026-02-21T09:49:40.9034630Z xor.b32 %r1585, %r1585, %r187; 2026-02-21T09:49:40.9034738Z add.s32 %r163, %r105, 106576; 2026-02-21T09:49:40.9034801Z // begin inline asm 2026-02-21T09:49:40.9034854Z 2026-02-21T09:49:40.9034913Z { 2026-02-21T09:49:40.9034980Z @!%p29 bra.uni skipWait; 2026-02-21T09:49:40.9035046Z .reg .pred complete; 2026-02-21T09:49:40.9035106Z waitLoop: 2026-02-21T09:49:40.9035245Z mbarrier.try_wait.parity.shared.b64 complete, [%r163], %r1585; 2026-02-21T09:49:40.9035317Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.9035409Z skipWait: 2026-02-21T09:49:40.9035470Z } 2026-02-21T09:49:40.9035476Z 2026-02-21T09:49:40.9035536Z // end inline asm 2026-02-21T09:49:40.9035708Z .loc 1 0 0 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9035779Z add.s32 %r188, %r1587, 1; 2026-02-21T09:49:40.9035846Z setp.eq.b32 %p31, %r188, 4; 2026-02-21T09:49:40.9035957Z selp.b32 %r1587, 0, %r188, %p31; 2026-02-21T09:49:40.9036022Z selp.b32 %r189, 1, 0, %p31; 2026-02-21T09:49:40.9036098Z xor.b32 %r1586, %r1586, %r189; 2026-02-21T09:49:40.9036292Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9036356Z add.s32 %r1584, %r1584, -1; 2026-02-21T09:49:40.9036431Z setp.ne.b32 %p32, %r1584, 0; 2026-02-21T09:49:40.9036496Z @%p32 bra $L__BB0_7; 2026-02-21T09:49:40.9036590Z $L__BB0_8: // %._crit_edge8 2026-02-21T09:49:40.9036695Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9036760Z barrier.sync 1; 2026-02-21T09:49:40.9036842Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9036904Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.9037013Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9037236Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9037322Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9037414Z ld.shared.b32 %r1589, [global_smem+98312]; 2026-02-21T09:49:40.9037476Z barrier.sync 1; 2026-02-21T09:49:40.9037659Z .loc 1 21 67 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:21:67 2026-02-21T09:49:40.9037731Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:49:40.9037795Z mov.u32 %r107, %ctaid.y; 2026-02-21T09:49:40.9037859Z mov.u32 %r108, %ctaid.z; 2026-02-21T09:49:40.9037922Z mov.u32 %r109, %nctaid.x; 2026-02-21T09:49:40.9037993Z mov.u32 %r110, %nctaid.y; 2026-02-21T09:49:40.9038067Z mad.lo.s32 %r111, %r108, %r110, %r107; 2026-02-21T09:49:40.9038139Z mad.lo.s32 %r112, %r111, %r109, %r17; 2026-02-21T09:49:40.9038209Z shl.b32 %r113, %r112, 8; 2026-02-21T09:49:40.9038423Z .loc 1 22 68 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:22:68 2026-02-21T09:49:40.9038492Z cvt.s64.s32 %rd7, %r113; 
2026-02-21T09:49:40.9038560Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:49:40.9038636Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:49:40.9038707Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:49:40.9038887Z .loc 1 21 67 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:21:67 2026-02-21T09:49:40.9038964Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:49:40.9039144Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9039212Z setp.lt.s32 %p5, %r1589, 1; 2026-02-21T09:49:40.9039283Z @%p5 bra $L__BB0_14; 2026-02-21T09:49:40.9039368Z // %bb.10: // %.lr.ph 2026-02-21T09:49:40.9039461Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9039526Z add.s32 %r1599, %r17, -1184; 2026-02-21T09:49:40.9039598Z add.s32 %r19, %r1, -128; 2026-02-21T09:49:40.9039661Z mov.b32 %r1596, -1; 2026-02-21T09:49:40.9039721Z mov.b32 %r1590, 0; 2026-02-21T09:49:40.9039793Z mov.b32 %r1591, %r1590; 2026-02-21T09:49:40.9039855Z mov.b32 %r1598, %r1590; 2026-02-21T09:49:40.9039916Z mov.b32 %r1597, %r1590; 2026-02-21T09:49:40.9039977Z mov.b32 %r1594, %r1590; 2026-02-21T09:49:40.9040045Z bra.uni $L__BB0_11; 2026-02-21T09:49:40.9040151Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:49:40.9040327Z .loc 1 0 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0:97 2026-02-21T09:49:40.9040402Z selp.b32 %r134, 0, %r1594, %p8; 2026-02-21T09:49:40.9040469Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:49:40.9040562Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:49:40.9040753Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9040816Z shl.b32 %r141, %r1591, 3; 2026-02-21T09:49:40.9040879Z add.s32 %r143, %r105, %r141; 2026-02-21T09:49:40.9040947Z add.s32 %r130, %r143, 106496; 2026-02-21T09:49:40.9041131Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9041396Z // begin inline asm 2026-02-21T09:49:40.9041451Z 2026-02-21T09:49:40.9041515Z { 2026-02-21T09:49:40.9041582Z 
.reg .pred complete; 2026-02-21T09:49:40.9041641Z waitLoop: 2026-02-21T09:49:40.9041776Z mbarrier.try_wait.parity.shared.b64 complete, [%r130], %r1590; 2026-02-21T09:49:40.9041855Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.9041911Z } 2026-02-21T09:49:40.9041916Z 2026-02-21T09:49:40.9041976Z // end inline asm 2026-02-21T09:49:40.9042171Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9042238Z add.s32 %r136, %r143, 106528; 2026-02-21T09:49:40.9042418Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9042486Z bar.sync 3, 64; 2026-02-21T09:49:40.9042552Z // begin inline asm 2026-02-21T09:49:40.9042716Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r136], 24576; 2026-02-21T09:49:40.9042787Z // end inline asm 2026-02-21T09:49:40.9042966Z .loc 1 51 31 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:51:31 2026-02-21T09:49:40.9043031Z shl.b32 %r144, %r1591, 14; 2026-02-21T09:49:40.9043094Z add.s32 %r133, %r105, %r144; 2026-02-21T09:49:40.9043274Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9043334Z bar.sync 3, 64; 2026-02-21T09:49:40.9043404Z elect.sync %r145|%p13, -1; 2026-02-21T09:49:40.9043480Z and.pred %p10, %p12, %p13; 2026-02-21T09:49:40.9043541Z // begin inline asm 2026-02-21T09:49:40.9043830Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd10, {%r134, %r1597}], [%r136]; 2026-02-21T09:49:40.9043898Z // end inline asm 2026-02-21T09:49:40.9044101Z .loc 1 52 44 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:52:44 2026-02-21T09:49:40.9044169Z shl.b32 %r146, %r1591, 13; 2026-02-21T09:49:40.9044232Z add.s32 %r147, %r105, %r146; 2026-02-21T09:49:40.9044302Z add.s32 %r137, %r147, 65536; 2026-02-21T09:49:40.9044478Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9044539Z bar.sync 3, 64; 2026-02-21T09:49:40.9044614Z 
elect.sync %r148|%p14, -1; 2026-02-21T09:49:40.9044765Z and.pred %p11, %p12, %p14; 2026-02-21T09:49:40.9044830Z // begin inline asm 2026-02-21T09:49:40.9045110Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r137], [%rd11, {%r134, %r1598}], [%r136]; 2026-02-21T09:49:40.9045173Z // end inline asm 2026-02-21T09:49:40.9045354Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9045417Z add.s32 %r1594, %r134, 32; 2026-02-21T09:49:40.9045602Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9045667Z add.s32 %r149, %r1591, 1; 2026-02-21T09:49:40.9045733Z setp.eq.b32 %p15, %r149, 4; 2026-02-21T09:49:40.9045809Z selp.b32 %r1591, 0, %r149, %p15; 2026-02-21T09:49:40.9045874Z selp.b32 %r150, 1, 0, %p15; 2026-02-21T09:49:40.9045939Z xor.b32 %r1590, %r1590, %r150; 2026-02-21T09:49:40.9046132Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9046196Z add.s32 %r1589, %r1589, -1; 2026-02-21T09:49:40.9046261Z setp.ne.b32 %p16, %r1589, 0; 2026-02-21T09:49:40.9046324Z @%p16 bra $L__BB0_11; 2026-02-21T09:49:40.9046426Z bra.uni $L__BB0_14; 2026-02-21T09:49:40.9046536Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:49:40.9046638Z // => This Inner Loop Header: Depth=2 2026-02-21T09:49:40.9046709Z add.s32 %r116, %r1596, 1; 2026-02-21T09:49:40.9046778Z setp.eq.b32 %p6, %r1596, 63; 2026-02-21T09:49:40.9046848Z selp.b32 %r1596, 0, %r116, %p6; 2026-02-21T09:49:40.9046972Z setp.ne.b32 %p7, %r1596, 0; 2026-02-21T09:49:40.9047038Z setp.eq.b32 %p8, %r1596, 0; 2026-02-21T09:49:40.9047101Z @%p7 bra $L__BB0_13; 2026-02-21T09:49:40.9047205Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:49:40.9047279Z add.s32 %r1599, %r1599, 1184; 2026-02-21T09:49:40.9047463Z .loc 1 34 35 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:34:35 2026-02-21T09:49:40.9047535Z mul.hi.s32 %r117, %r1599, 715827883; 
2026-02-21T09:49:40.9047606Z shr.u32 %r118, %r117, 31; 2026-02-21T09:49:40.9047673Z shr.s32 %r119, %r117, 5; 2026-02-21T09:49:40.9047736Z add.s32 %r120, %r119, %r118; 2026-02-21T09:49:40.9047919Z .loc 1 35 33 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:35:33 2026-02-21T09:49:40.9047990Z shl.b32 %r121, %r120, 1; 2026-02-21T09:49:40.9048208Z .loc 1 36 39 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:36:39 2026-02-21T09:49:40.9048274Z sub.s32 %r122, 8, %r121; 2026-02-21T09:49:40.9048460Z .loc 1 36 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:36:52 2026-02-21T09:49:40.9048522Z min.s32 %r123, %r122, 2; 2026-02-21T09:49:40.9048702Z .loc 1 37 45 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:45 2026-02-21T09:49:40.9048778Z mul.lo.s32 %r124, %r120, 192; 2026-02-21T09:49:40.9048845Z sub.s32 %r125, %r1599, %r124; 2026-02-21T09:49:40.9049025Z .loc 1 38 51 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:38:51 2026-02-21T09:49:40.9049101Z div.s32 %r126, %r125, %r123; 2026-02-21T09:49:40.9049283Z .loc 1 37 64 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:64 2026-02-21T09:49:40.9049350Z mul.lo.s32 %r127, %r126, %r123; 2026-02-21T09:49:40.9049454Z sub.s32 %r128, %r125, %r127; 2026-02-21T09:49:40.9049647Z .loc 1 37 30 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:37:30 2026-02-21T09:49:40.9049711Z add.s32 %r129, %r128, %r121; 2026-02-21T09:49:40.9049895Z .loc 1 39 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:39:27 2026-02-21T09:49:40.9049966Z shl.b32 %r1597, %r129, 8; 2026-02-21T09:49:40.9050146Z .loc 1 41 27 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:41:27 2026-02-21T09:49:40.9050210Z shl.b32 %r1598, %r126, 7; 2026-02-21T09:49:40.9050277Z bra.uni $L__BB0_13; 2026-02-21T09:49:40.9050372Z $L__BB0_14: // %._crit_edge 2026-02-21T09:49:40.9050467Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9050655Z .loc 1 28 97 // 
cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9050718Z barrier.sync 1; 2026-02-21T09:49:40.9050803Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:49:40.9050867Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.9050981Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:49:40.9051161Z .loc 1 19 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:19 2026-02-21T09:49:40.9051222Z barrier.sync 1; 2026-02-21T09:49:40.9051289Z barrier.sync 1; 2026-02-21T09:49:40.9051352Z bra.uni $L__BB0_2; 2026-02-21T09:49:40.9051447Z $L__BB0_23: // %._crit_edge11 2026-02-21T09:49:40.9051634Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9051720Z barrier.sync 1; 2026-02-21T09:49:40.9051805Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:49:40.9051986Z .loc 1 53 52 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:53:52 2026-02-21T09:49:40.9052056Z bar.sync 0, 128; 2026-02-21T09:49:40.9052118Z // begin inline asm 2026-02-21T09:49:40.9052175Z 2026-02-21T09:49:40.9052265Z { 2026-02-21T09:49:40.9052330Z .reg .pred complete; 2026-02-21T09:49:40.9052388Z waitLoop: 2026-02-21T09:49:40.9052525Z mbarrier.try_wait.parity.shared.b64 complete, [%r1570], %r1610; 2026-02-21T09:49:40.9052606Z @!complete bra.uni waitLoop; 2026-02-21T09:49:40.9052661Z } 2026-02-21T09:49:40.9052664Z 2026-02-21T09:49:40.9052724Z // end inline asm 2026-02-21T09:49:40.9052909Z .loc 1 28 97 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:97 2026-02-21T09:49:40.9052969Z bar.sync 0, 128; 2026-02-21T09:49:40.9053030Z // begin inline asm 2026-02-21T09:49:40.9053138Z @%p111 mbarrier.inval.shared::cta.b64 [%r1570]; 2026-02-21T09:49:40.9053198Z // end inline asm 2026-02-21T09:49:40.9053259Z // begin inline asm 2026-02-21T09:49:40.9053356Z @%p111 mbarrier.inval.shared::cta.b64 [%r491]; 2026-02-21T09:49:40.9053423Z // end inline asm 2026-02-21T09:49:40.9053485Z // begin inline asm 2026-02-21T09:49:40.9053576Z 
@%p111 mbarrier.inval.shared::cta.b64 [%r483]; 2026-02-21T09:49:40.9053669Z // end inline asm 2026-02-21T09:49:40.9053730Z bar.sync 0, 128; 2026-02-21T09:49:40.9053789Z // begin inline asm 2026-02-21T09:49:40.9053876Z @%p111 mbarrier.inval.shared::cta.b64 [%r484]; 2026-02-21T09:49:40.9053942Z // end inline asm 2026-02-21T09:49:40.9054002Z bar.sync 0, 128; 2026-02-21T09:49:40.9054061Z // begin inline asm 2026-02-21T09:49:40.9054152Z @%p111 mbarrier.inval.shared::cta.b64 [%r485]; 2026-02-21T09:49:40.9054210Z // end inline asm 2026-02-21T09:49:40.9054269Z bar.sync 0, 128; 2026-02-21T09:49:40.9054328Z // begin inline asm 2026-02-21T09:49:40.9054421Z @%p111 mbarrier.inval.shared::cta.b64 [%r486]; 2026-02-21T09:49:40.9054481Z // end inline asm 2026-02-21T09:49:40.9054540Z // begin inline asm 2026-02-21T09:49:40.9054631Z @%p111 mbarrier.inval.shared::cta.b64 [%r479]; 2026-02-21T09:49:40.9054717Z // end inline asm 2026-02-21T09:49:40.9054777Z bar.sync 0, 128; 2026-02-21T09:49:40.9054869Z // begin inline asm 2026-02-21T09:49:40.9054955Z @%p111 mbarrier.inval.shared::cta.b64 [%r480]; 2026-02-21T09:49:40.9055014Z // end inline asm 2026-02-21T09:49:40.9055072Z bar.sync 0, 128; 2026-02-21T09:49:40.9055140Z // begin inline asm 2026-02-21T09:49:40.9055222Z @%p111 mbarrier.inval.shared::cta.b64 [%r481]; 2026-02-21T09:49:40.9055280Z // end inline asm 2026-02-21T09:49:40.9055348Z bar.sync 0, 128; 2026-02-21T09:49:40.9055407Z // begin inline asm 2026-02-21T09:49:40.9055490Z @%p111 mbarrier.inval.shared::cta.b64 [%r482]; 2026-02-21T09:49:40.9055547Z // end inline asm 2026-02-21T09:49:40.9055743Z .loc 1 28 4 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:28:4 2026-02-21T09:49:40.9055804Z bar.sync 0, 128; 2026-02-21T09:49:40.9055863Z // begin inline asm 2026-02-21T09:49:40.9056003Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1582, 256; 2026-02-21T09:49:40.9056063Z // end inline asm 2026-02-21T09:49:40.9056153Z st.shared.b32 [global_smem+106584], 50529027; 
2026-02-21T09:49:40.9056224Z barrier.sync 1; 2026-02-21T09:49:40.9056315Z $L__BB0_24: // %common.ret 2026-02-21T09:49:40.9056493Z .loc 1 0 0 // cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py:0 2026-02-21T09:49:40.9056550Z ret; 2026-02-21T09:49:40.9056617Z $L__tmp1: 2026-02-21T09:49:40.9056678Z $L__func_end0: 2026-02-21T09:49:40.9056768Z // -- End function 2026-02-21T09:49:40.9056833Z } 2026-02-21T09:49:40.9057058Z .file 1 "/tmp/torchinductor_root/gq/cgq3fpmyxqqcmqmn4dljh5sh3shhv625737jdmdmr7zzknhcgnzx.py" 2026-02-21T09:49:40.9057154Z .section .debug_abbrev 2026-02-21T09:49:40.9057216Z { 2026-02-21T09:49:40.9057314Z .b8 1 // Abbreviation Code 2026-02-21T09:49:40.9057412Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:49:40.9057499Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:49:40.9057599Z .b8 37 // DW_AT_producer 2026-02-21T09:49:40.9057713Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.9057797Z .b8 19 // DW_AT_language 2026-02-21T09:49:40.9057892Z .b8 5 // DW_FORM_data2 2026-02-21T09:49:40.9057974Z .b8 3 // DW_AT_name 2026-02-21T09:49:40.9058054Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.9058147Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:49:40.9058228Z .b8 6 // DW_FORM_data4 2026-02-21T09:49:40.9058310Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:49:40.9058390Z .b8 8 // DW_FORM_string 2026-02-21T09:49:40.9058473Z .b8 0 // EOM(1) 2026-02-21T09:49:40.9058547Z .b8 0 // EOM(2) 2026-02-21T09:49:40.9058620Z .b8 0 // EOM(3) 2026-02-21T09:49:40.9058714Z } 2026-02-21T09:49:40.9058782Z .section .debug_info 2026-02-21T09:49:40.9058836Z { 2026-02-21T09:49:40.9058926Z .b32 104 // Length of Unit 2026-02-21T09:49:40.9059028Z .b8 2 // DWARF version number 2026-02-21T09:49:40.9059084Z .b8 0 2026-02-21T09:49:40.9059211Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:49:40.9059317Z .b8 8 // Address Size (in bytes) 2026-02-21T09:49:40.9059428Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:49:40.9059518Z .b8 116 // DW_AT_producer 2026-02-21T09:49:40.9059586Z .b8 114 2026-02-21T09:49:40.9059644Z .b8 105 2026-02-21T09:49:40.9059699Z .b8 116 2026-02-21T09:49:40.9059754Z .b8 111 2026-02-21T09:49:40.9059816Z .b8 110 2026-02-21T09:49:40.9059872Z .b8 0 2026-02-21T09:49:40.9059976Z .b8 2 // DW_AT_language 2026-02-21T09:49:40.9060046Z .b8 0 2026-02-21T09:49:40.9060129Z .b8 99 // DW_AT_name 2026-02-21T09:49:40.9060185Z .b8 103 2026-02-21T09:49:40.9060241Z .b8 113 2026-02-21T09:49:40.9060305Z .b8 51 2026-02-21T09:49:40.9060360Z .b8 102 2026-02-21T09:49:40.9060415Z .b8 112 2026-02-21T09:49:40.9060477Z .b8 109 2026-02-21T09:49:40.9060533Z .b8 121 2026-02-21T09:49:40.9060590Z .b8 120 2026-02-21T09:49:40.9060645Z .b8 113 2026-02-21T09:49:40.9060708Z .b8 113 2026-02-21T09:49:40.9060764Z .b8 99 2026-02-21T09:49:40.9060818Z .b8 109 2026-02-21T09:49:40.9060879Z .b8 113 2026-02-21T09:49:40.9060935Z .b8 109 2026-02-21T09:49:40.9060991Z .b8 110 2026-02-21T09:49:40.9061045Z .b8 52 2026-02-21T09:49:40.9061108Z .b8 100 2026-02-21T09:49:40.9061162Z .b8 108 2026-02-21T09:49:40.9061218Z .b8 106 2026-02-21T09:49:40.9061272Z .b8 104 2026-02-21T09:49:40.9061336Z .b8 53 2026-02-21T09:49:40.9061392Z .b8 115 2026-02-21T09:49:40.9061449Z .b8 104 2026-02-21T09:49:40.9061508Z .b8 51 2026-02-21T09:49:40.9061566Z .b8 115 2026-02-21T09:49:40.9061623Z .b8 104 2026-02-21T09:49:40.9061676Z .b8 104 2026-02-21T09:49:40.9061738Z .b8 118 2026-02-21T09:49:40.9061792Z .b8 54 2026-02-21T09:49:40.9061846Z .b8 50 2026-02-21T09:49:40.9061908Z .b8 53 2026-02-21T09:49:40.9061962Z .b8 55 2026-02-21T09:49:40.9062018Z .b8 51 2026-02-21T09:49:40.9062073Z .b8 55 2026-02-21T09:49:40.9062134Z .b8 106 2026-02-21T09:49:40.9062189Z .b8 100 2026-02-21T09:49:40.9062244Z .b8 109 2026-02-21T09:49:40.9062298Z .b8 100 2026-02-21T09:49:40.9062360Z .b8 109 
2026-02-21T09:49:40.9062414Z .b8 114 2026-02-21T09:49:40.9062468Z .b8 55 2026-02-21T09:49:40.9062558Z .b8 122 2026-02-21T09:49:40.9062613Z .b8 122 2026-02-21T09:49:40.9062670Z .b8 107 2026-02-21T09:49:40.9062724Z .b8 110 2026-02-21T09:49:40.9062789Z .b8 104 2026-02-21T09:49:40.9062843Z .b8 99 2026-02-21T09:49:40.9062898Z .b8 103 2026-02-21T09:49:40.9062962Z .b8 110 2026-02-21T09:49:40.9063016Z .b8 122 2026-02-21T09:49:40.9063073Z .b8 120 2026-02-21T09:49:40.9063129Z .b8 46 2026-02-21T09:49:40.9063222Z .b8 112 2026-02-21T09:49:40.9063278Z .b8 121 2026-02-21T09:49:40.9063332Z .b8 0 2026-02-21T09:49:40.9063440Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:49:40.9063522Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:49:40.9063576Z .b8 116 2026-02-21T09:49:40.9063634Z .b8 109 2026-02-21T09:49:40.9063697Z .b8 112 2026-02-21T09:49:40.9063751Z .b8 47 2026-02-21T09:49:40.9063806Z .b8 116 2026-02-21T09:49:40.9063866Z .b8 111 2026-02-21T09:49:40.9063922Z .b8 114 2026-02-21T09:49:40.9063977Z .b8 99 2026-02-21T09:49:40.9064031Z .b8 104 2026-02-21T09:49:40.9064098Z .b8 105 2026-02-21T09:49:40.9064155Z .b8 110 2026-02-21T09:49:40.9064211Z .b8 100 2026-02-21T09:49:40.9064266Z .b8 117 2026-02-21T09:49:40.9064341Z .b8 99 2026-02-21T09:49:40.9064395Z .b8 116 2026-02-21T09:49:40.9064449Z .b8 111 2026-02-21T09:49:40.9064509Z .b8 114 2026-02-21T09:49:40.9064563Z .b8 95 2026-02-21T09:49:40.9064620Z .b8 114 2026-02-21T09:49:40.9064713Z .b8 111 2026-02-21T09:49:40.9064807Z .b8 111 2026-02-21T09:49:40.9064864Z .b8 116 2026-02-21T09:49:40.9064918Z .b8 47 2026-02-21T09:49:40.9064979Z .b8 103 2026-02-21T09:49:40.9065033Z .b8 113 2026-02-21T09:49:40.9065088Z .b8 0 2026-02-21T09:49:40.9065141Z } 2026-02-21T09:49:40.9065222Z .section .debug_macinfo { } 2026-02-21T09:49:40.9065226Z 2026-02-21T09:49:40.9065313Z ================================================================ 2026-02-21T09:49:40.9065427Z please share the reproducer above with Triton project. 
2026-02-21T09:49:41.3919130Z 2026-02-21T09:49:41.3924455Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━ 103/103 19.7 configs/s 2026-02-21T09:49:51.8518946Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 95.9 configs/s 2026-02-21T09:49:52.1750240Z [209s] Generation 6 complete: 2026-02-21T09:49:52.1750584Z error=23 2026-02-21T09:49:52.1755174Z ok=83 2026-02-21T09:49:52.1760302Z min=0.1179 2026-02-21T09:49:52.1764744Z mid=0.1525 2026-02-21T09:49:52.1766244Z max=12.4385 2026-02-21T09:49:52.1766483Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:49:52.1766801Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:49:52.1767065Z 'l2_groupings': [16], 2026-02-21T09:49:52.1767243Z 'load_eviction_policies': ['', ''], 2026-02-21T09:49:52.1767469Z 'loop_orders': [[1, 0]], 2026-02-21T09:49:52.1767661Z 'maxnreg': 64, 2026-02-21T09:49:52.1767822Z 'num_sm_multiplier': 8, 2026-02-21T09:49:52.1768000Z 'num_stages': 2, 2026-02-21T09:49:52.1768169Z 'num_warps': 1, 2026-02-21T09:49:52.1768355Z 'pid_type': 'persistent_blocked', 2026-02-21T09:49:52.1768585Z 'range_flattens': [False, False], 2026-02-21T09:49:52.1768811Z 'range_multi_buffers': [False, False], 2026-02-21T09:49:52.1769031Z 'range_num_stages': [0, 0], 2026-02-21T09:49:52.1769227Z 'range_unroll_factors': [0, 0], 2026-02-21T09:49:52.1769425Z 'range_warp_specializes': [None, True]} 2026-02-21T09:49:52.1781348Z [209s] Fitting surrogate: 674 points, 674 targets 2026-02-21T09:49:53.6850423Z [210s] Generation 7 starting: 91 neighbors, 5 active search path(s) 2026-02-21T09:50:03.8406169Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91/91 7.8 configs/s 2026-02-21T09:50:04.1037983Z 2026-02-21T09:50:04.1037996Z 2026-02-21T09:50:04.1038303Z ================================================================ 2026-02-21T09:50:04.1038568Z Internal Triton PTX codegen error 2026-02-21T09:50:04.1038752Z `ptxas` stderr: 2026-02-21T09:50:04.1039167Z ptxas fatal : (C7602) 
Insufficient registers (32) to compile instruction at line 208 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:50:04.1039965Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:04.1040115Z 2026-02-21T09:50:04.1040517Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpnn092uls.ptx -o /tmp/tmpnn092uls.ptx.o 2026-02-21T09:50:04.1040960Z 2026-02-21T09:50:04.1040965Z 2026-02-21T09:50:04.1041022Z // 2026-02-21T09:50:04.1041269Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:50:04.1041438Z // 2026-02-21T09:50:04.1041509Z 2026-02-21T09:50:04.1041566Z .version 8.7 2026-02-21T09:50:04.1041704Z .target sm_100a 2026-02-21T09:50:04.1041833Z .address_size 64 2026-02-21T09:50:04.1041913Z 2026-02-21T09:50:04.1042041Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:50:04.1042286Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:50:04.1042511Z // @_helion_matmul 2026-02-21T09:50:04.1042711Z .visible .entry _helion_matmul( 2026-02-21T09:50:04.1042938Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:50:04.1043206Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:50:04.1043456Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:50:04.1044147Z [221s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:50:04.1046962Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 128, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, True]), static_shapes=True) 2026-02-21T09:50:04.1048167Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:50:04.1048407Z `ptxas` stderr: 2026-02-21T09:50:04.1048840Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 208 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:50:04.1049319Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:04.1049464Z 2026-02-21T09:50:04.1049855Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpnn092uls.ptx -o /tmp/tmpnn092uls.ptx.o 2026-02-21T09:50:04.1050324Z 2026-02-21T09:50:04.1050451Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:50:04.1050739Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:50:04.1050986Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:50:04.1051217Z ) 2026-02-21T09:50:04.1051332Z .reqntid 384 2026-02-21T09:50:04.1051465Z .maxnreg 32 2026-02-21T09:50:04.1051582Z { 2026-02-21T09:50:04.1051707Z .reg .pred %p<82>; 2026-02-21T09:50:04.1051849Z .reg .b16 %rs<3>; 2026-02-21T09:50:04.1051989Z .reg .b32 %r<943>; 2026-02-21T09:50:04.1052121Z .reg .b64 %rd<391>; 2026-02-21T09:50:04.1052387Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19:0 2026-02-21T09:50:04.1052683Z $L__func_begin0: 2026-02-21T09:50:04.1052933Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19:0 2026-02-21T09:50:04.1053175Z 2026-02-21T09:50:04.1053233Z // %bb.0: 2026-02-21T09:50:04.1053379Z ld.param.b64 %rd9, [_helion_matmul_param_3]; 2026-02-21T09:50:04.1053567Z $L__tmp0: 2026-02-21T09:50:04.1053799Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19 2026-02-21T09:50:04.1054083Z mov.u32 %r1, %tid.x; 2026-02-21T09:50:04.1054228Z shr.u32 %r2, %r1, 5; 2026-02-21T09:50:04.1054378Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:50:04.1054564Z setp.lt.u32 %p2, %r3, 4; 2026-02-21T09:50:04.1054798Z @%p2 bra $L__BB0_14; 2026-02-21T09:50:04.1054945Z bra.uni $L__BB0_1; 2026-02-21T09:50:04.1055078Z $L__BB0_14: 2026-02-21T09:50:04.1055323Z .loc 1 0 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:0:0 2026-02-21T09:50:04.1055628Z ld.param.b64 %rd6, [_helion_matmul_param_0]; 2026-02-21T09:50:04.1055933Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19 2026-02-21T09:50:04.1056281Z setmaxnreg.inc.sync.aligned.u32 48; 2026-02-21T09:50:04.1056469Z setp.lt.u32 %p29, %r1, 32; 2026-02-21T09:50:04.1056637Z mov.b32 %r211, global_smem; 2026-02-21T09:50:04.1056791Z // begin inline asm 2026-02-21T09:50:04.1057040Z @%p29 
tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r211], 128; 2026-02-21T09:50:04.1057281Z // end inline asm 2026-02-21T09:50:04.1057423Z bar.sync 0, 128; 2026-02-21T09:50:04.1057567Z ld.shared.b32 %r935, [global_smem]; 2026-02-21T09:50:04.1057744Z bar.sync 0, 128; 2026-02-21T09:50:04.1057882Z // begin inline asm 2026-02-21T09:50:04.1058096Z @%p29 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:50:04.1058328Z // end inline asm 2026-02-21T09:50:04.1058573Z .loc 1 21 67 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:21:67 2026-02-21T09:50:04.1058906Z mov.u32 %r26, %ctaid.x; 2026-02-21T09:50:04.1059060Z mov.u32 %r220, %ctaid.y; 2026-02-21T09:50:04.1059246Z mov.u32 %r221, %ctaid.z; 2026-02-21T09:50:04.1059396Z mov.u32 %r222, %nctaid.x; 2026-02-21T09:50:04.1059553Z mov.u32 %r223, %nctaid.y; 2026-02-21T09:50:04.1059712Z mad.lo.s32 %r224, %r221, %r223, %r220; 2026-02-21T09:50:04.1059890Z mad.lo.s32 %r225, %r224, %r222, %r26; 2026-02-21T09:50:04.1060062Z shl.b32 %r226, %r225, 7; 2026-02-21T09:50:04.1060213Z cvt.s64.s32 %rd117, %r226; 2026-02-21T09:50:04.1060376Z add.s64 %rd114, %rd9, %rd117; 2026-02-21T09:50:04.1060532Z shl.b32 %r227, %r1, 2; 2026-02-21T09:50:04.1060688Z add.s32 %r212, %r211, %r227; 2026-02-21T09:50:04.1060838Z mov.b32 %r229, 0; 2026-02-21T09:50:04.1060976Z // begin inline asm 2026-02-21T09:50:04.1061127Z @%p29 st.shared.b32 [ %r212 + 0 ], %r229; 2026-02-21T09:50:04.1061304Z // end inline asm 2026-02-21T09:50:04.1061445Z bar.warp.sync -1; 2026-02-21T09:50:04.1061585Z setp.eq.b32 %p58, %r1, 0; 2026-02-21T09:50:04.1061743Z cvt.u64.u32 %rd99, %r211; 2026-02-21T09:50:04.1061886Z // begin inline asm 2026-02-21T09:50:04.1062138Z @%p58 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd99 + 0 ], %rd6; 2026-02-21T09:50:04.1062407Z // end inline asm 2026-02-21T09:50:04.1062542Z // begin inline asm 2026-02-21T09:50:04.1062757Z @%p58 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x1; 
2026-02-21T09:50:04.1063005Z // end inline asm 2026-02-21T09:50:04.1063136Z mov.b32 %r214, 64; 2026-02-21T09:50:04.1063267Z // begin inline asm 2026-02-21T09:50:04.1063500Z @%p58 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x0, %r214; 2026-02-21T09:50:04.1063766Z // end inline asm 2026-02-21T09:50:04.1063900Z mov.b32 %r215, 128; 2026-02-21T09:50:04.1064034Z // begin inline asm 2026-02-21T09:50:04.1064264Z @%p58 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x1, %r215; 2026-02-21T09:50:04.1064515Z // end inline asm 2026-02-21T09:50:04.1064652Z mov.b32 %r216, 2048; 2026-02-21T09:50:04.1064825Z // begin inline asm 2026-02-21T09:50:04.1065061Z @%p58 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x0, %r216; 2026-02-21T09:50:04.1065338Z // end inline asm 2026-02-21T09:50:04.1065467Z // begin inline asm 2026-02-21T09:50:04.1065705Z @%p58 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x1, %r216; 2026-02-21T09:50:04.1065967Z // end inline asm 2026-02-21T09:50:04.1066105Z mov.b64 %rd107, 4096; 2026-02-21T09:50:04.1066244Z // begin inline asm 2026-02-21T09:50:04.1066496Z @%p58 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd99 + 0 ], 0x0, %rd107; 2026-02-21T09:50:04.1066827Z // end inline asm 2026-02-21T09:50:04.1066957Z mov.b32 %r218, 1; 2026-02-21T09:50:04.1067101Z // begin inline asm 2026-02-21T09:50:04.1067379Z @%p58 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x0, %r218; 2026-02-21T09:50:04.1067656Z // end inline asm 2026-02-21T09:50:04.1067799Z // begin inline asm 2026-02-21T09:50:04.1068078Z @%p58 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x1, %r218; 2026-02-21T09:50:04.1068427Z // end inline asm 2026-02-21T09:50:04.1068567Z // begin inline asm 2026-02-21T09:50:04.1068803Z @%p58 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x6; 2026-02-21T09:50:04.1069074Z // end inline 
asm 2026-02-21T09:50:04.1069210Z // begin inline asm 2026-02-21T09:50:04.1069487Z @%p58 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x0; 2026-02-21T09:50:04.1069786Z // end inline asm 2026-02-21T09:50:04.1069927Z // begin inline asm 2026-02-21T09:50:04.1070169Z @%p58 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x3; 2026-02-21T09:50:04.1070435Z // end inline asm 2026-02-21T09:50:04.1070576Z // begin inline asm 2026-02-21T09:50:04.1070807Z @%p58 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd99 + 0 ], 0x0; 2026-02-21T09:50:04.1071103Z // end inline asm 2026-02-21T09:50:04.1071240Z // begin inline asm 2026-02-21T09:50:04.1071640Z @%p29 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd114 + 0 ], [ %rd99 + 0 ], 0x80; 2026-02-21T09:50:04.1072034Z // end inline asm 2026-02-21T09:50:04.1072168Z // begin inline asm 2026-02-21T09:50:04.1072393Z @%p29 fence.proxy.tensormap::generic.acquire.gpu [ %rd114 + 0 ], 0x80; 2026-02-21T09:50:04.1072652Z @%p29 cp.async.bulk.commit_group ; 2026-02-21T09:50:04.1072852Z @%p29 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:50:04.1073030Z // end inline asm 2026-02-21T09:50:04.1073171Z bar.sync 0, 128; 2026-02-21T09:50:04.1073427Z .loc 1 30 84 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:30:84 2026-02-21T09:50:04.1073749Z setp.gt.u32 %p49, %r26, 1535; 2026-02-21T09:50:04.1073925Z @%p49 bra $L__BB0_16; 2026-02-21T09:50:04.1074092Z // %bb.15: // %.lr.ph 2026-02-21T09:50:04.1074405Z .loc 1 0 84 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:0:84 2026-02-21T09:50:04.1074755Z ld.param.b64 %rd8, [_helion_matmul_param_2]; 2026-02-21T09:50:04.1075083Z .loc 1 42 45 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:42:45 2026-02-21T09:50:04.1075398Z shr.u32 %r668, %r1, 4; 2026-02-21T09:50:04.1075677Z .loc 1 36 35 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:36:35 
2026-02-21T09:50:04.1076008Z mul.hi.u32 %r669, %r26, 715827883; 2026-02-21T09:50:04.1076180Z shr.u32 %r670, %r669, 6; 2026-02-21T09:50:04.1076461Z .loc 1 39 45 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:39:45 2026-02-21T09:50:04.1076756Z mul.lo.s32 %r671, %r670, 384; 2026-02-21T09:50:04.1076930Z sub.s32 %r672, %r26, %r671; 2026-02-21T09:50:04.1077206Z .loc 1 41 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:41:27 2026-02-21T09:50:04.1077527Z shl.b32 %r673, %r670, 9; 2026-02-21T09:50:04.1077688Z shl.b32 %r674, %r672, 7; 2026-02-21T09:50:04.1077844Z and.b32 %r675, %r674, 384; 2026-02-21T09:50:04.1078011Z or.b32 %r676, %r675, %r673; 2026-02-21T09:50:04.1078278Z .loc 1 42 45 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:42:45 2026-02-21T09:50:04.1078584Z or.b32 %r677, %r668, %r676; 2026-02-21T09:50:04.1078738Z bfe.u32 %r678, %r1, 4, 3; 2026-02-21T09:50:04.1078896Z shl.b32 %r679, %r1, 3; 2026-02-21T09:50:04.1079042Z and.b32 %r680, %r679, 120; 2026-02-21T09:50:04.1079202Z and.b32 %r682, %r227, 16; 2026-02-21T09:50:04.1079358Z add.s32 %r684, %r211, %r682; 2026-02-21T09:50:04.1079551Z and.b32 %r685, %r1, 96; 2026-02-21T09:50:04.1079708Z shl.b32 %r686, %r685, 6; 2026-02-21T09:50:04.1079855Z shl.b32 %r687, %r1, 5; 2026-02-21T09:50:04.1080008Z and.b32 %r688, %r687, 96; 2026-02-21T09:50:04.1080155Z or.b32 %r689, %r686, %r688; 2026-02-21T09:50:04.1080311Z shl.b32 %r690, %r1, 4; 2026-02-21T09:50:04.1080457Z and.b32 %r691, %r690, 384; 2026-02-21T09:50:04.1080618Z or.b32 %r692, %r691, %r685; 2026-02-21T09:50:04.1080800Z xor.b32 %r693, %r689, %r692; 2026-02-21T09:50:04.1080958Z add.s32 %r528, %r684, %r693; 2026-02-21T09:50:04.1081118Z add.s32 %r543, %r528, 1536; 2026-02-21T09:50:04.1081264Z add.s32 %r538, %r528, 1024; 2026-02-21T09:50:04.1081417Z add.s32 %r533, %r528, 512; 2026-02-21T09:50:04.1081564Z shl.b32 %r694, %r1, 10; 2026-02-21T09:50:04.1081715Z and.b32 %r695, %r694, 6144; 2026-02-21T09:50:04.1081862Z and.b32 %r696, %r690, 2032; 
2026-02-21T09:50:04.1082013Z or.b32 %r697, %r695, %r696; 2026-02-21T09:50:04.1082158Z xor.b32 %r698, %r697, 96; 2026-02-21T09:50:04.1082314Z add.s32 %r699, %r211, %r698; 2026-02-21T09:50:04.1082468Z xor.b32 %r700, %r697, 64; 2026-02-21T09:50:04.1082612Z add.s32 %r701, %r211, %r700; 2026-02-21T09:50:04.1082764Z xor.b32 %r702, %r697, 32; 2026-02-21T09:50:04.1082907Z add.s32 %r703, %r211, %r702; 2026-02-21T09:50:04.1083114Z add.s32 %r704, %r211, %r697; 2026-02-21T09:50:04.1083404Z .loc 1 42 32 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:42:32 2026-02-21T09:50:04.1083687Z or.b32 %r705, %r676, %r678; 2026-02-21T09:50:04.1083832Z or.b32 %r706, %r705, 8; 2026-02-21T09:50:04.1083980Z or.b32 %r707, %r705, 16; 2026-02-21T09:50:04.1084119Z or.b32 %r708, %r705, 24; 2026-02-21T09:50:04.1084268Z or.b32 %r709, %r705, 32; 2026-02-21T09:50:04.1084415Z or.b32 %r710, %r705, 40; 2026-02-21T09:50:04.1084553Z or.b32 %r711, %r705, 48; 2026-02-21T09:50:04.1084723Z or.b32 %r712, %r677, 56; 2026-02-21T09:50:04.1084865Z or.b32 %r713, %r705, 64; 2026-02-21T09:50:04.1085008Z or.b32 %r714, %r705, 72; 2026-02-21T09:50:04.1085148Z or.b32 %r715, %r705, 80; 2026-02-21T09:50:04.1085294Z or.b32 %r716, %r705, 88; 2026-02-21T09:50:04.1085434Z or.b32 %r717, %r705, 96; 2026-02-21T09:50:04.1085580Z or.b32 %r718, %r705, 104; 2026-02-21T09:50:04.1085730Z or.b32 %r719, %r705, 112; 2026-02-21T09:50:04.1085876Z or.b32 %r720, %r677, 120; 2026-02-21T09:50:04.1086137Z .loc 1 43 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:43:27 2026-02-21T09:50:04.1086424Z shl.b32 %r721, %r672, 5; 2026-02-21T09:50:04.1086581Z and.b32 %r722, %r721, 16256; 2026-02-21T09:50:04.1086843Z .loc 1 44 32 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:44:32 2026-02-21T09:50:04.1087137Z or.b32 %r723, %r722, %r680; 2026-02-21T09:50:04.1087394Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1087712Z shfl.sync.idx.b32 %r724, %r2, 0, 31, -1; 
2026-02-21T09:50:04.1087902Z shl.b32 %r725, %r724, 21; 2026-02-21T09:50:04.1088053Z and.b32 %r726, %r725, 6291456; 2026-02-21T09:50:04.1088220Z add.s32 %r228, %r726, %r935; 2026-02-21T09:50:04.1088372Z mov.pred %p50, -1; 2026-02-21T09:50:04.1088522Z // begin inline asm 2026-02-21T09:50:04.1088897Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 0], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1089309Z // end inline asm 2026-02-21T09:50:04.1089447Z // begin inline asm 2026-02-21T09:50:04.1089813Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 16], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1090214Z // end inline asm 2026-02-21T09:50:04.1090344Z // begin inline asm 2026-02-21T09:50:04.1090701Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 32], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1091122Z // end inline asm 2026-02-21T09:50:04.1091258Z // begin inline asm 2026-02-21T09:50:04.1091603Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 48], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1091977Z // end inline asm 2026-02-21T09:50:04.1092116Z // begin inline asm 2026-02-21T09:50:04.1092478Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 64], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1092864Z // end inline asm 2026-02-21T09:50:04.1092993Z // begin inline asm 2026-02-21T09:50:04.1093338Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 80], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1093717Z // end 
inline asm 2026-02-21T09:50:04.1093844Z // begin inline asm 2026-02-21T09:50:04.1094192Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 96], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1094558Z // end inline asm 2026-02-21T09:50:04.1094720Z // begin inline asm 2026-02-21T09:50:04.1095135Z @%p50 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r228 + 112], {%r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229, %r229}; 2026-02-21T09:50:04.1095513Z // end inline asm 2026-02-21T09:50:04.1095648Z // begin inline asm 2026-02-21T09:50:04.1095795Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:50:04.1095957Z // end inline asm 2026-02-21T09:50:04.1096082Z bar.sync 0, 128; 2026-02-21T09:50:04.1096340Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1096627Z add.s32 %r364, %r211, 114688; 2026-02-21T09:50:04.1096782Z // begin inline asm 2026-02-21T09:50:04.1096950Z @%p58 mbarrier.init.shared::cta.b64 [%r364], 1; 2026-02-21T09:50:04.1097132Z // end inline asm 2026-02-21T09:50:04.1097266Z bar.sync 0, 128; 2026-02-21T09:50:04.1097399Z add.s32 %r365, %r211, 114696; 2026-02-21T09:50:04.1097555Z // begin inline asm 2026-02-21T09:50:04.1097712Z @%p58 mbarrier.init.shared::cta.b64 [%r365], 1; 2026-02-21T09:50:04.1097899Z // end inline asm 2026-02-21T09:50:04.1098025Z bar.sync 0, 128; 2026-02-21T09:50:04.1098162Z add.s32 %r366, %r211, 114704; 2026-02-21T09:50:04.1098307Z // begin inline asm 2026-02-21T09:50:04.1098471Z @%p58 mbarrier.init.shared::cta.b64 [%r366], 1; 2026-02-21T09:50:04.1098655Z // end inline asm 2026-02-21T09:50:04.1098783Z bar.sync 0, 128; 2026-02-21T09:50:04.1098923Z add.s32 %r367, %r211, 114712; 2026-02-21T09:50:04.1099071Z // begin inline asm 2026-02-21T09:50:04.1099239Z @%p58 mbarrier.init.shared::cta.b64 [%r367], 1; 2026-02-21T09:50:04.1099415Z // end inline asm 
2026-02-21T09:50:04.1099552Z add.s32 %r368, %r211, 114720; 2026-02-21T09:50:04.1099700Z // begin inline asm 2026-02-21T09:50:04.1099858Z @%p58 mbarrier.init.shared::cta.b64 [%r368], 1; 2026-02-21T09:50:04.1100041Z // end inline asm 2026-02-21T09:50:04.1100168Z bar.sync 0, 128; 2026-02-21T09:50:04.1100304Z add.s32 %r369, %r211, 114728; 2026-02-21T09:50:04.1100450Z // begin inline asm 2026-02-21T09:50:04.1100611Z @%p58 mbarrier.init.shared::cta.b64 [%r369], 1; 2026-02-21T09:50:04.1100788Z // end inline asm 2026-02-21T09:50:04.1100920Z bar.sync 0, 128; 2026-02-21T09:50:04.1101050Z add.s32 %r370, %r211, 114736; 2026-02-21T09:50:04.1101206Z // begin inline asm 2026-02-21T09:50:04.1101358Z @%p58 mbarrier.init.shared::cta.b64 [%r370], 1; 2026-02-21T09:50:04.1101539Z // end inline asm 2026-02-21T09:50:04.1101672Z bar.sync 0, 128; 2026-02-21T09:50:04.1101801Z add.s32 %r371, %r211, 114744; 2026-02-21T09:50:04.1101954Z // begin inline asm 2026-02-21T09:50:04.1102107Z @%p58 mbarrier.init.shared::cta.b64 [%r371], 1; 2026-02-21T09:50:04.1102289Z // end inline asm 2026-02-21T09:50:04.1102565Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1102850Z bar.sync 0, 128; 2026-02-21T09:50:04.1102977Z // begin inline asm 2026-02-21T09:50:04.1103142Z @%p58 mbarrier.arrive.shared::cta.b64 _, [%r364]; 2026-02-21T09:50:04.1103338Z // end inline asm 2026-02-21T09:50:04.1103462Z bar.sync 0, 128; 2026-02-21T09:50:04.1103635Z // begin inline asm 2026-02-21T09:50:04.1103792Z @%p58 mbarrier.arrive.shared::cta.b64 _, [%r365]; 2026-02-21T09:50:04.1103982Z // end inline asm 2026-02-21T09:50:04.1104105Z bar.sync 0, 128; 2026-02-21T09:50:04.1104236Z // begin inline asm 2026-02-21T09:50:04.1104390Z @%p58 mbarrier.arrive.shared::cta.b64 _, [%r366]; 2026-02-21T09:50:04.1104579Z // end inline asm 2026-02-21T09:50:04.1104741Z bar.sync 0, 128; 2026-02-21T09:50:04.1104872Z // begin inline asm 2026-02-21T09:50:04.1105036Z @%p58 mbarrier.arrive.shared::cta.b64 
_, [%r367]; 2026-02-21T09:50:04.1105216Z // end inline asm 2026-02-21T09:50:04.1105468Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1105752Z bar.sync 0, 128; 2026-02-21T09:50:04.1105888Z add.s32 %r376, %r211, 114752; 2026-02-21T09:50:04.1106036Z // begin inline asm 2026-02-21T09:50:04.1106226Z @%p58 mbarrier.init.shared::cta.b64 [%r376], 1; 2026-02-21T09:50:04.1106445Z // end inline asm 2026-02-21T09:50:04.1106613Z st.shared.v2.b32 [global_smem+114760], {0, 33685761}; 2026-02-21T09:50:04.1106833Z st.shared.b32 [global_smem+65536], %r935; 2026-02-21T09:50:04.1107046Z st.shared.v2.b32 [global_smem+65544], {%r676, %r722}; 2026-02-21T09:50:04.1107249Z barrier.sync 1; 2026-02-21T09:50:04.1107386Z barrier.sync 1; 2026-02-21T09:50:04.1107532Z barrier.sync 1; 2026-02-21T09:50:04.1107803Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1108110Z bar.sync 0, 128; 2026-02-21T09:50:04.1108252Z // begin inline asm 2026-02-21T09:50:04.1108380Z 2026-02-21T09:50:04.1108498Z { 2026-02-21T09:50:04.1108616Z .reg .pred complete; 2026-02-21T09:50:04.1108767Z waitLoop: 2026-02-21T09:50:04.1108952Z mbarrier.try_wait.parity.shared.b64 complete, [%r376], %r229; 2026-02-21T09:50:04.1109186Z @!complete bra.uni waitLoop; 2026-02-21T09:50:04.1109337Z } 2026-02-21T09:50:04.1109407Z 2026-02-21T09:50:04.1109462Z // end inline asm 2026-02-21T09:50:04.1109717Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1110010Z bar.sync 0, 128; 2026-02-21T09:50:04.1110150Z // begin inline asm 2026-02-21T09:50:04.1110310Z @%p58 mbarrier.inval.shared::cta.b64 [%r376]; 2026-02-21T09:50:04.1110497Z // end inline asm 2026-02-21T09:50:04.1110623Z // begin inline asm 2026-02-21T09:50:04.1110783Z @%p58 mbarrier.inval.shared::cta.b64 [%r368]; 2026-02-21T09:50:04.1110960Z // end inline asm 2026-02-21T09:50:04.1111094Z bar.sync 0, 128; 2026-02-21T09:50:04.1111220Z // 
begin inline asm 2026-02-21T09:50:04.1111383Z @%p58 mbarrier.inval.shared::cta.b64 [%r369]; 2026-02-21T09:50:04.1111564Z // end inline asm 2026-02-21T09:50:04.1111689Z bar.sync 0, 128; 2026-02-21T09:50:04.1111824Z // begin inline asm 2026-02-21T09:50:04.1111979Z @%p58 mbarrier.inval.shared::cta.b64 [%r370]; 2026-02-21T09:50:04.1112158Z // end inline asm 2026-02-21T09:50:04.1112297Z bar.sync 0, 128; 2026-02-21T09:50:04.1112471Z // begin inline asm 2026-02-21T09:50:04.1112655Z @%p58 mbarrier.inval.shared::cta.b64 [%r371]; 2026-02-21T09:50:04.1112884Z // end inline asm 2026-02-21T09:50:04.1113047Z // begin inline asm 2026-02-21T09:50:04.1113225Z @%p58 mbarrier.inval.shared::cta.b64 [%r364]; 2026-02-21T09:50:04.1113465Z // end inline asm 2026-02-21T09:50:04.1113618Z bar.sync 0, 128; 2026-02-21T09:50:04.1113800Z // begin inline asm 2026-02-21T09:50:04.1113994Z @%p58 mbarrier.inval.shared::cta.b64 [%r365]; 2026-02-21T09:50:04.1114206Z // end inline asm 2026-02-21T09:50:04.1114428Z bar.sync 0, 128; 2026-02-21T09:50:04.1114587Z // begin inline asm 2026-02-21T09:50:04.1114809Z @%p58 mbarrier.inval.shared::cta.b64 [%r366]; 2026-02-21T09:50:04.1115028Z // end inline asm 2026-02-21T09:50:04.1115199Z bar.sync 0, 128; 2026-02-21T09:50:04.1115357Z // begin inline asm 2026-02-21T09:50:04.1115574Z @%p58 mbarrier.inval.shared::cta.b64 [%r367]; 2026-02-21T09:50:04.1115802Z // end inline asm 2026-02-21T09:50:04.1116159Z .loc 1 59 53 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:59:53 2026-02-21T09:50:04.1116512Z mad.lo.s32 %r727, %r705, 12288, %r723; 2026-02-21T09:50:04.1116720Z mad.lo.s32 %r728, %r706, 12288, %r723; 2026-02-21T09:50:04.1116928Z mad.lo.s32 %r729, %r707, 12288, %r723; 2026-02-21T09:50:04.1117132Z mad.lo.s32 %r730, %r708, 12288, %r723; 2026-02-21T09:50:04.1117355Z mad.lo.s32 %r731, %r709, 12288, %r723; 2026-02-21T09:50:04.1117543Z mad.lo.s32 %r732, %r710, 12288, %r723; 2026-02-21T09:50:04.1117760Z mad.lo.s32 %r733, %r711, 12288, %r723; 
2026-02-21T09:50:04.1117960Z mad.lo.s32 %r734, %r712, 12288, %r723; 2026-02-21T09:50:04.1118154Z mad.lo.s32 %r735, %r713, 12288, %r723; 2026-02-21T09:50:04.1118346Z mad.lo.s32 %r736, %r714, 12288, %r723; 2026-02-21T09:50:04.1118529Z mad.lo.s32 %r737, %r715, 12288, %r723; 2026-02-21T09:50:04.1118745Z mad.lo.s32 %r738, %r716, 12288, %r723; 2026-02-21T09:50:04.1118971Z mad.lo.s32 %r739, %r717, 12288, %r723; 2026-02-21T09:50:04.1119159Z mad.lo.s32 %r740, %r718, 12288, %r723; 2026-02-21T09:50:04.1119347Z mad.lo.s32 %r741, %r719, 12288, %r723; 2026-02-21T09:50:04.1119542Z mad.lo.s32 %r742, %r720, 12288, %r723; 2026-02-21T09:50:04.1119863Z .loc 1 59 24 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:59:24 2026-02-21T09:50:04.1120209Z mad.wide.u32 %rd118, %r727, 2, %rd8; 2026-02-21T09:50:04.1120389Z mad.wide.u32 %rd119, %r728, 2, %rd8; 2026-02-21T09:50:04.1120572Z mad.wide.u32 %rd120, %r729, 2, %rd8; 2026-02-21T09:50:04.1120748Z mad.wide.u32 %rd121, %r730, 2, %rd8; 2026-02-21T09:50:04.1120943Z mad.wide.u32 %rd122, %r731, 2, %rd8; 2026-02-21T09:50:04.1121137Z mad.wide.u32 %rd123, %r732, 2, %rd8; 2026-02-21T09:50:04.1121323Z mad.wide.u32 %rd124, %r733, 2, %rd8; 2026-02-21T09:50:04.1121514Z mad.wide.u32 %rd125, %r734, 2, %rd8; 2026-02-21T09:50:04.1121720Z mad.wide.u32 %rd126, %r735, 2, %rd8; 2026-02-21T09:50:04.1121891Z mad.wide.u32 %rd127, %r736, 2, %rd8; 2026-02-21T09:50:04.1122056Z mad.wide.u32 %rd128, %r737, 2, %rd8; 2026-02-21T09:50:04.1122222Z mad.wide.u32 %rd129, %r738, 2, %rd8; 2026-02-21T09:50:04.1122384Z mad.wide.u32 %rd130, %r739, 2, %rd8; 2026-02-21T09:50:04.1122552Z mad.wide.u32 %rd131, %r740, 2, %rd8; 2026-02-21T09:50:04.1122719Z mad.wide.u32 %rd132, %r741, 2, %rd8; 2026-02-21T09:50:04.1122882Z mad.wide.u32 %rd133, %r742, 2, %rd8; 2026-02-21T09:50:04.1123161Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1123439Z // begin inline asm 2026-02-21T09:50:04.1123801Z tcgen05.ld.sync.aligned.32x32b.x16.b32 
{%r388, %r389, %r390, %r391, %r392, %r393, %r394, %r395, %r396, %r397, %r398, %r399, %r400, %r401, %r402, %r403}, [%r228 + 0]; 2026-02-21T09:50:04.1124173Z // end inline asm 2026-02-21T09:50:04.1124311Z // begin inline asm 2026-02-21T09:50:04.1124662Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r405, %r406, %r407, %r408, %r409, %r410, %r411, %r412, %r413, %r414, %r415, %r416, %r417, %r418, %r419, %r420}, [%r228 + 16]; 2026-02-21T09:50:04.1125068Z // end inline asm 2026-02-21T09:50:04.1125205Z // begin inline asm 2026-02-21T09:50:04.1125549Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r422, %r423, %r424, %r425, %r426, %r427, %r428, %r429, %r430, %r431, %r432, %r433, %r434, %r435, %r436, %r437}, [%r228 + 32]; 2026-02-21T09:50:04.1125932Z // end inline asm 2026-02-21T09:50:04.1126061Z // begin inline asm 2026-02-21T09:50:04.1126397Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r439, %r440, %r441, %r442, %r443, %r444, %r445, %r446, %r447, %r448, %r449, %r450, %r451, %r452, %r453, %r454}, [%r228 + 48]; 2026-02-21T09:50:04.1126810Z // end inline asm 2026-02-21T09:50:04.1126940Z // begin inline asm 2026-02-21T09:50:04.1127281Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r456, %r457, %r458, %r459, %r460, %r461, %r462, %r463, %r464, %r465, %r466, %r467, %r468, %r469, %r470, %r471}, [%r228 + 64]; 2026-02-21T09:50:04.1127641Z // end inline asm 2026-02-21T09:50:04.1127824Z // begin inline asm 2026-02-21T09:50:04.1128173Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r473, %r474, %r475, %r476, %r477, %r478, %r479, %r480, %r481, %r482, %r483, %r484, %r485, %r486, %r487, %r488}, [%r228 + 80]; 2026-02-21T09:50:04.1128556Z // end inline asm 2026-02-21T09:50:04.1128697Z // begin inline asm 2026-02-21T09:50:04.1129029Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r490, %r491, %r492, %r493, %r494, %r495, %r496, %r497, %r498, %r499, %r500, %r501, %r502, %r503, %r504, %r505}, [%r228 + 96]; 2026-02-21T09:50:04.1129411Z // end inline asm 2026-02-21T09:50:04.1129541Z // begin inline asm 
2026-02-21T09:50:04.1129881Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r507, %r508, %r509, %r510, %r511, %r512, %r513, %r514, %r515, %r516, %r517, %r518, %r519, %r520, %r521, %r522}, [%r228 + 112]; 2026-02-21T09:50:04.1130255Z // end inline asm 2026-02-21T09:50:04.1130382Z // begin inline asm 2026-02-21T09:50:04.1130567Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:50:04.1130747Z // end inline asm 2026-02-21T09:50:04.1130893Z cvt.u64.u32 %rd134, %r388; 2026-02-21T09:50:04.1131049Z cvt.u64.u32 %rd135, %r389; 2026-02-21T09:50:04.1131208Z shl.b64 %rd136, %rd135, 32; 2026-02-21T09:50:04.1131365Z or.b64 %rd137, %rd134, %rd136; 2026-02-21T09:50:04.1131648Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1131953Z mov.b64 {%r743, %r744}, %rd137; 2026-02-21T09:50:04.1132121Z cvt.rn.f16x2.f32 %r745, %r744, %r743; 2026-02-21T09:50:04.1132411Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1132694Z cvt.u64.u32 %rd138, %r390; 2026-02-21T09:50:04.1132852Z cvt.u64.u32 %rd139, %r391; 2026-02-21T09:50:04.1133001Z shl.b64 %rd140, %rd139, 32; 2026-02-21T09:50:04.1133162Z or.b64 %rd141, %rd138, %rd140; 2026-02-21T09:50:04.1133430Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1133726Z mov.b64 {%r746, %r747}, %rd141; 2026-02-21T09:50:04.1133901Z cvt.rn.f16x2.f32 %r748, %r747, %r746; 2026-02-21T09:50:04.1134180Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1134462Z cvt.u64.u32 %rd142, %r392; 2026-02-21T09:50:04.1134610Z cvt.u64.u32 %rd143, %r393; 2026-02-21T09:50:04.1134795Z shl.b64 %rd144, %rd143, 32; 2026-02-21T09:50:04.1134948Z or.b64 %rd145, %rd142, %rd144; 2026-02-21T09:50:04.1135221Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1135515Z mov.b64 {%r749, %r750}, %rd145; 2026-02-21T09:50:04.1135678Z cvt.rn.f16x2.f32 %r751, %r750, 
%r749; 2026-02-21T09:50:04.1135961Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1136244Z cvt.u64.u32 %rd146, %r394; 2026-02-21T09:50:04.1136400Z cvt.u64.u32 %rd147, %r395; 2026-02-21T09:50:04.1136550Z shl.b64 %rd148, %rd147, 32; 2026-02-21T09:50:04.1136709Z or.b64 %rd149, %rd146, %rd148; 2026-02-21T09:50:04.1136975Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1137262Z mov.b64 {%r752, %r753}, %rd149; 2026-02-21T09:50:04.1137431Z cvt.rn.f16x2.f32 %r754, %r753, %r752; 2026-02-21T09:50:04.1137708Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1137995Z cvt.u64.u32 %rd150, %r396; 2026-02-21T09:50:04.1138142Z cvt.u64.u32 %rd151, %r397; 2026-02-21T09:50:04.1138325Z shl.b64 %rd152, %rd151, 32; 2026-02-21T09:50:04.1138476Z or.b64 %rd153, %rd150, %rd152; 2026-02-21T09:50:04.1138738Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1139030Z mov.b64 {%r755, %r756}, %rd153; 2026-02-21T09:50:04.1139191Z cvt.rn.f16x2.f32 %r757, %r756, %r755; 2026-02-21T09:50:04.1139471Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1139784Z cvt.u64.u32 %rd154, %r398; 2026-02-21T09:50:04.1139945Z cvt.u64.u32 %rd155, %r399; 2026-02-21T09:50:04.1140092Z shl.b64 %rd156, %rd155, 32; 2026-02-21T09:50:04.1140250Z or.b64 %rd157, %rd154, %rd156; 2026-02-21T09:50:04.1140518Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1140804Z mov.b64 {%r758, %r759}, %rd157; 2026-02-21T09:50:04.1140969Z cvt.rn.f16x2.f32 %r760, %r759, %r758; 2026-02-21T09:50:04.1141242Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1141536Z cvt.u64.u32 %rd158, %r400; 2026-02-21T09:50:04.1141681Z cvt.u64.u32 %rd159, %r401; 2026-02-21T09:50:04.1141832Z shl.b64 %rd160, %rd159, 32; 
2026-02-21T09:50:04.1142011Z or.b64 %rd161, %rd158, %rd160; 2026-02-21T09:50:04.1142309Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1142608Z mov.b64 {%r761, %r762}, %rd161; 2026-02-21T09:50:04.1142773Z cvt.rn.f16x2.f32 %r763, %r762, %r761; 2026-02-21T09:50:04.1143057Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1143347Z cvt.u64.u32 %rd162, %r402; 2026-02-21T09:50:04.1143505Z cvt.u64.u32 %rd163, %r403; 2026-02-21T09:50:04.1143653Z shl.b64 %rd164, %rd163, 32; 2026-02-21T09:50:04.1143816Z or.b64 %rd165, %rd162, %rd164; 2026-02-21T09:50:04.1144083Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1144370Z mov.b64 {%r764, %r765}, %rd165; 2026-02-21T09:50:04.1144541Z cvt.rn.f16x2.f32 %r766, %r765, %r764; 2026-02-21T09:50:04.1144845Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1145132Z cvt.u64.u32 %rd166, %r405; 2026-02-21T09:50:04.1145277Z cvt.u64.u32 %rd167, %r406; 2026-02-21T09:50:04.1145430Z shl.b64 %rd168, %rd167, 32; 2026-02-21T09:50:04.1145578Z or.b64 %rd169, %rd166, %rd168; 2026-02-21T09:50:04.1145840Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1146127Z mov.b64 {%r767, %r768}, %rd169; 2026-02-21T09:50:04.1146284Z cvt.rn.f16x2.f32 %r769, %r768, %r767; 2026-02-21T09:50:04.1146558Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1146833Z cvt.u64.u32 %rd170, %r407; 2026-02-21T09:50:04.1146983Z cvt.u64.u32 %rd171, %r408; 2026-02-21T09:50:04.1147130Z shl.b64 %rd172, %rd171, 32; 2026-02-21T09:50:04.1147287Z or.b64 %rd173, %rd170, %rd172; 2026-02-21T09:50:04.1147547Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1147830Z mov.b64 {%r770, %r771}, %rd173; 2026-02-21T09:50:04.1147996Z cvt.rn.f16x2.f32 %r772, 
%r771, %r770; 2026-02-21T09:50:04.1148261Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1148540Z cvt.u64.u32 %rd174, %r409; 2026-02-21T09:50:04.1148685Z cvt.u64.u32 %rd175, %r410; 2026-02-21T09:50:04.1148836Z shl.b64 %rd176, %rd175, 32; 2026-02-21T09:50:04.1148985Z or.b64 %rd177, %rd174, %rd176; 2026-02-21T09:50:04.1149247Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1149577Z mov.b64 {%r773, %r774}, %rd177; 2026-02-21T09:50:04.1149737Z cvt.rn.f16x2.f32 %r775, %r774, %r773; 2026-02-21T09:50:04.1150024Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1150306Z cvt.u64.u32 %rd178, %r411; 2026-02-21T09:50:04.1150463Z cvt.u64.u32 %rd179, %r412; 2026-02-21T09:50:04.1150611Z shl.b64 %rd180, %rd179, 32; 2026-02-21T09:50:04.1150813Z or.b64 %rd181, %rd178, %rd180; 2026-02-21T09:50:04.1151077Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1151349Z mov.b64 {%r776, %r777}, %rd181; 2026-02-21T09:50:04.1151513Z cvt.rn.f16x2.f32 %r778, %r777, %r776; 2026-02-21T09:50:04.1151784Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1152065Z cvt.u64.u32 %rd182, %r413; 2026-02-21T09:50:04.1152210Z cvt.u64.u32 %rd183, %r414; 2026-02-21T09:50:04.1152362Z shl.b64 %rd184, %rd183, 32; 2026-02-21T09:50:04.1152513Z or.b64 %rd185, %rd182, %rd184; 2026-02-21T09:50:04.1152781Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1153066Z mov.b64 {%r779, %r780}, %rd185; 2026-02-21T09:50:04.1153224Z cvt.rn.f16x2.f32 %r781, %r780, %r779; 2026-02-21T09:50:04.1153570Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1153864Z cvt.u64.u32 %rd186, %r415; 2026-02-21T09:50:04.1154018Z cvt.u64.u32 %rd187, %r416; 2026-02-21T09:50:04.1154165Z shl.b64 %rd188, %rd187, 
32; 2026-02-21T09:50:04.1154325Z or.b64 %rd189, %rd186, %rd188; 2026-02-21T09:50:04.1154598Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1154908Z mov.b64 {%r782, %r783}, %rd189; 2026-02-21T09:50:04.1155075Z cvt.rn.f16x2.f32 %r784, %r783, %r782; 2026-02-21T09:50:04.1155352Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1155641Z cvt.u64.u32 %rd190, %r417; 2026-02-21T09:50:04.1155795Z cvt.u64.u32 %rd191, %r418; 2026-02-21T09:50:04.1155956Z shl.b64 %rd192, %rd191, 32; 2026-02-21T09:50:04.1156111Z or.b64 %rd193, %rd190, %rd192; 2026-02-21T09:50:04.1156417Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1156716Z mov.b64 {%r785, %r786}, %rd193; 2026-02-21T09:50:04.1156881Z cvt.rn.f16x2.f32 %r787, %r786, %r785; 2026-02-21T09:50:04.1157192Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1157501Z cvt.u64.u32 %rd194, %r419; 2026-02-21T09:50:04.1157662Z cvt.u64.u32 %rd195, %r420; 2026-02-21T09:50:04.1157815Z shl.b64 %rd196, %rd195, 32; 2026-02-21T09:50:04.1157980Z or.b64 %rd197, %rd194, %rd196; 2026-02-21T09:50:04.1158258Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1158556Z mov.b64 {%r788, %r789}, %rd197; 2026-02-21T09:50:04.1158728Z cvt.rn.f16x2.f32 %r790, %r789, %r788; 2026-02-21T09:50:04.1159014Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1159312Z cvt.u64.u32 %rd198, %r422; 2026-02-21T09:50:04.1159466Z cvt.u64.u32 %rd199, %r423; 2026-02-21T09:50:04.1159628Z shl.b64 %rd200, %rd199, 32; 2026-02-21T09:50:04.1159785Z or.b64 %rd201, %rd198, %rd200; 2026-02-21T09:50:04.1160068Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1160369Z mov.b64 {%r791, %r792}, %rd201; 2026-02-21T09:50:04.1160533Z cvt.rn.f16x2.f32 %r793, 
%r792, %r791; 2026-02-21T09:50:04.1160831Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1161136Z cvt.u64.u32 %rd202, %r424; 2026-02-21T09:50:04.1161334Z cvt.u64.u32 %rd203, %r425; 2026-02-21T09:50:04.1161495Z shl.b64 %rd204, %rd203, 32; 2026-02-21T09:50:04.1161671Z or.b64 %rd205, %rd202, %rd204; 2026-02-21T09:50:04.1161959Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1162265Z mov.b64 {%r794, %r795}, %rd205; 2026-02-21T09:50:04.1162447Z cvt.rn.f16x2.f32 %r796, %r795, %r794; 2026-02-21T09:50:04.1162755Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1163047Z cvt.u64.u32 %rd206, %r426; 2026-02-21T09:50:04.1163200Z cvt.u64.u32 %rd207, %r427; 2026-02-21T09:50:04.1163359Z shl.b64 %rd208, %rd207, 32; 2026-02-21T09:50:04.1163516Z or.b64 %rd209, %rd206, %rd208; 2026-02-21T09:50:04.1163790Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1164088Z mov.b64 {%r797, %r798}, %rd209; 2026-02-21T09:50:04.1164255Z cvt.rn.f16x2.f32 %r799, %r798, %r797; 2026-02-21T09:50:04.1164541Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1164854Z cvt.u64.u32 %rd210, %r428; 2026-02-21T09:50:04.1165016Z cvt.u64.u32 %rd211, %r429; 2026-02-21T09:50:04.1165197Z shl.b64 %rd212, %rd211, 32; 2026-02-21T09:50:04.1165365Z or.b64 %rd213, %rd210, %rd212; 2026-02-21T09:50:04.1165674Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1165977Z mov.b64 {%r800, %r801}, %rd213; 2026-02-21T09:50:04.1166151Z cvt.rn.f16x2.f32 %r802, %r801, %r800; 2026-02-21T09:50:04.1166438Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1166732Z cvt.u64.u32 %rd214, %r430; 2026-02-21T09:50:04.1166892Z cvt.u64.u32 %rd215, %r431; 2026-02-21T09:50:04.1167048Z shl.b64 %rd216, %rd215, 
32; 2026-02-21T09:50:04.1167199Z or.b64 %rd217, %rd214, %rd216; 2026-02-21T09:50:04.1167466Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1167749Z mov.b64 {%r803, %r804}, %rd217; 2026-02-21T09:50:04.1167906Z cvt.rn.f16x2.f32 %r805, %r804, %r803; 2026-02-21T09:50:04.1168189Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1168467Z cvt.u64.u32 %rd218, %r432; 2026-02-21T09:50:04.1168620Z cvt.u64.u32 %rd219, %r433; 2026-02-21T09:50:04.1168767Z shl.b64 %rd220, %rd219, 32; 2026-02-21T09:50:04.1168924Z or.b64 %rd221, %rd218, %rd220; 2026-02-21T09:50:04.1169191Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1169468Z mov.b64 {%r806, %r807}, %rd221; 2026-02-21T09:50:04.1169634Z cvt.rn.f16x2.f32 %r808, %r807, %r806; 2026-02-21T09:50:04.1169901Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1170184Z cvt.u64.u32 %rd222, %r434; 2026-02-21T09:50:04.1170329Z cvt.u64.u32 %rd223, %r435; 2026-02-21T09:50:04.1170484Z shl.b64 %rd224, %rd223, 32; 2026-02-21T09:50:04.1170633Z or.b64 %rd225, %rd222, %rd224; 2026-02-21T09:50:04.1170903Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1171188Z mov.b64 {%r809, %r810}, %rd225; 2026-02-21T09:50:04.1171349Z cvt.rn.f16x2.f32 %r811, %r810, %r809; 2026-02-21T09:50:04.1171625Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1171902Z cvt.u64.u32 %rd226, %r436; 2026-02-21T09:50:04.1172056Z cvt.u64.u32 %rd227, %r437; 2026-02-21T09:50:04.1172201Z shl.b64 %rd228, %rd227, 32; 2026-02-21T09:50:04.1172361Z or.b64 %rd229, %rd226, %rd228; 2026-02-21T09:50:04.1172631Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1172977Z mov.b64 {%r812, %r813}, %rd229; 2026-02-21T09:50:04.1173149Z cvt.rn.f16x2.f32 %r814, 
%r813, %r812; 2026-02-21T09:50:04.1173424Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1173724Z cvt.u64.u32 %rd230, %r439; 2026-02-21T09:50:04.1173882Z cvt.u64.u32 %rd231, %r440; 2026-02-21T09:50:04.1174045Z shl.b64 %rd232, %rd231, 32; 2026-02-21T09:50:04.1174232Z or.b64 %rd233, %rd230, %rd232; 2026-02-21T09:50:04.1174512Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1174842Z mov.b64 {%r815, %r816}, %rd233; 2026-02-21T09:50:04.1175001Z cvt.rn.f16x2.f32 %r817, %r816, %r815; 2026-02-21T09:50:04.1175282Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1175560Z cvt.u64.u32 %rd234, %r441; 2026-02-21T09:50:04.1175713Z cvt.u64.u32 %rd235, %r442; 2026-02-21T09:50:04.1175861Z shl.b64 %rd236, %rd235, 32; 2026-02-21T09:50:04.1176020Z or.b64 %rd237, %rd234, %rd236; 2026-02-21T09:50:04.1176290Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1176582Z mov.b64 {%r818, %r819}, %rd237; 2026-02-21T09:50:04.1176777Z cvt.rn.f16x2.f32 %r820, %r819, %r818; 2026-02-21T09:50:04.1177075Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1177366Z cvt.u64.u32 %rd238, %r443; 2026-02-21T09:50:04.1177513Z cvt.u64.u32 %rd239, %r444; 2026-02-21T09:50:04.1177667Z shl.b64 %rd240, %rd239, 32; 2026-02-21T09:50:04.1177816Z or.b64 %rd241, %rd238, %rd240; 2026-02-21T09:50:04.1178087Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1178377Z mov.b64 {%r821, %r822}, %rd241; 2026-02-21T09:50:04.1178536Z cvt.rn.f16x2.f32 %r823, %r822, %r821; 2026-02-21T09:50:04.1178815Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1179096Z cvt.u64.u32 %rd242, %r445; 2026-02-21T09:50:04.1179252Z cvt.u64.u32 %rd243, %r446; 2026-02-21T09:50:04.1179398Z shl.b64 %rd244, %rd243, 
32; 2026-02-21T09:50:04.1179556Z or.b64 %rd245, %rd242, %rd244; 2026-02-21T09:50:04.1179824Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1180107Z mov.b64 {%r824, %r825}, %rd245; 2026-02-21T09:50:04.1180271Z cvt.rn.f16x2.f32 %r826, %r825, %r824; 2026-02-21T09:50:04.1180544Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1180833Z cvt.u64.u32 %rd246, %r447; 2026-02-21T09:50:04.1180982Z cvt.u64.u32 %rd247, %r448; 2026-02-21T09:50:04.1181134Z shl.b64 %rd248, %rd247, 32; 2026-02-21T09:50:04.1181284Z or.b64 %rd249, %rd246, %rd248; 2026-02-21T09:50:04.1181552Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1181835Z mov.b64 {%r827, %r828}, %rd249; 2026-02-21T09:50:04.1181992Z cvt.rn.f16x2.f32 %r829, %r828, %r827; 2026-02-21T09:50:04.1182272Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1182550Z cvt.u64.u32 %rd250, %r449; 2026-02-21T09:50:04.1182707Z cvt.u64.u32 %rd251, %r450; 2026-02-21T09:50:04.1182854Z shl.b64 %rd252, %rd251, 32; 2026-02-21T09:50:04.1183012Z or.b64 %rd253, %rd250, %rd252; 2026-02-21T09:50:04.1183276Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1183550Z mov.b64 {%r830, %r831}, %rd253; 2026-02-21T09:50:04.1183720Z cvt.rn.f16x2.f32 %r832, %r831, %r830; 2026-02-21T09:50:04.1183993Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1184319Z cvt.u64.u32 %rd254, %r451; 2026-02-21T09:50:04.1184466Z cvt.u64.u32 %rd255, %r452; 2026-02-21T09:50:04.1184623Z shl.b64 %rd256, %rd255, 32; 2026-02-21T09:50:04.1184805Z or.b64 %rd257, %rd254, %rd256; 2026-02-21T09:50:04.1185084Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1185386Z mov.b64 {%r833, %r834}, %rd257; 2026-02-21T09:50:04.1185574Z cvt.rn.f16x2.f32 %r835, 
%r834, %r833; 2026-02-21T09:50:04.1185850Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1186125Z cvt.u64.u32 %rd258, %r453; 2026-02-21T09:50:04.1186277Z cvt.u64.u32 %rd259, %r454; 2026-02-21T09:50:04.1186422Z shl.b64 %rd260, %rd259, 32; 2026-02-21T09:50:04.1186580Z or.b64 %rd261, %rd258, %rd260; 2026-02-21T09:50:04.1186844Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1187129Z mov.b64 {%r836, %r837}, %rd261; 2026-02-21T09:50:04.1187296Z cvt.rn.f16x2.f32 %r838, %r837, %r836; 2026-02-21T09:50:04.1187568Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1187851Z cvt.u64.u32 %rd262, %r456; 2026-02-21T09:50:04.1187995Z cvt.u64.u32 %rd263, %r457; 2026-02-21T09:50:04.1188170Z shl.b64 %rd264, %rd263, 32; 2026-02-21T09:50:04.1188350Z or.b64 %rd265, %rd262, %rd264; 2026-02-21T09:50:04.1188624Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1188921Z mov.b64 {%r839, %r840}, %rd265; 2026-02-21T09:50:04.1189081Z cvt.rn.f16x2.f32 %r841, %r840, %r839; 2026-02-21T09:50:04.1189366Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1189647Z cvt.u64.u32 %rd266, %r458; 2026-02-21T09:50:04.1189816Z cvt.u64.u32 %rd267, %r459; 2026-02-21T09:50:04.1189961Z shl.b64 %rd268, %rd267, 32; 2026-02-21T09:50:04.1190119Z or.b64 %rd269, %rd266, %rd268; 2026-02-21T09:50:04.1190391Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1190677Z mov.b64 {%r842, %r843}, %rd269; 2026-02-21T09:50:04.1190842Z cvt.rn.f16x2.f32 %r844, %r843, %r842; 2026-02-21T09:50:04.1191120Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1191403Z cvt.u64.u32 %rd270, %r460; 2026-02-21T09:50:04.1191549Z cvt.u64.u32 %rd271, %r461; 2026-02-21T09:50:04.1191703Z shl.b64 %rd272, %rd271, 
32; 2026-02-21T09:50:04.1191851Z or.b64 %rd273, %rd270, %rd272; 2026-02-21T09:50:04.1192119Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1192413Z mov.b64 {%r845, %r846}, %rd273; 2026-02-21T09:50:04.1192572Z cvt.rn.f16x2.f32 %r847, %r846, %r845; 2026-02-21T09:50:04.1192855Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1193134Z cvt.u64.u32 %rd274, %r462; 2026-02-21T09:50:04.1193290Z cvt.u64.u32 %rd275, %r463; 2026-02-21T09:50:04.1193436Z shl.b64 %rd276, %rd275, 32; 2026-02-21T09:50:04.1193593Z or.b64 %rd277, %rd274, %rd276; 2026-02-21T09:50:04.1193870Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1194154Z mov.b64 {%r848, %r849}, %rd277; 2026-02-21T09:50:04.1194321Z cvt.rn.f16x2.f32 %r850, %r849, %r848; 2026-02-21T09:50:04.1194592Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1194917Z cvt.u64.u32 %rd278, %r464; 2026-02-21T09:50:04.1195068Z cvt.u64.u32 %rd279, %r465; 2026-02-21T09:50:04.1195238Z shl.b64 %rd280, %rd279, 32; 2026-02-21T09:50:04.1195388Z or.b64 %rd281, %rd278, %rd280; 2026-02-21T09:50:04.1195660Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1195978Z mov.b64 {%r851, %r852}, %rd281; 2026-02-21T09:50:04.1196137Z cvt.rn.f16x2.f32 %r853, %r852, %r851; 2026-02-21T09:50:04.1196422Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1196709Z cvt.u64.u32 %rd282, %r466; 2026-02-21T09:50:04.1196864Z cvt.u64.u32 %rd283, %r467; 2026-02-21T09:50:04.1197039Z shl.b64 %rd284, %rd283, 32; 2026-02-21T09:50:04.1197197Z or.b64 %rd285, %rd282, %rd284; 2026-02-21T09:50:04.1197465Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1197749Z mov.b64 {%r854, %r855}, %rd285; 2026-02-21T09:50:04.1197917Z cvt.rn.f16x2.f32 %r856, 
%r855, %r854; 2026-02-21T09:50:04.1198189Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1198476Z cvt.u64.u32 %rd286, %r468; 2026-02-21T09:50:04.1198624Z cvt.u64.u32 %rd287, %r469; 2026-02-21T09:50:04.1198778Z shl.b64 %rd288, %rd287, 32; 2026-02-21T09:50:04.1198928Z or.b64 %rd289, %rd286, %rd288; 2026-02-21T09:50:04.1199193Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1199517Z mov.b64 {%r857, %r858}, %rd289; 2026-02-21T09:50:04.1199686Z cvt.rn.f16x2.f32 %r859, %r858, %r857; 2026-02-21T09:50:04.1200008Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1200308Z cvt.u64.u32 %rd290, %r470; 2026-02-21T09:50:04.1200469Z cvt.u64.u32 %rd291, %r471; 2026-02-21T09:50:04.1200621Z shl.b64 %rd292, %rd291, 32; 2026-02-21T09:50:04.1200786Z or.b64 %rd293, %rd290, %rd292; 2026-02-21T09:50:04.1201062Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1201357Z mov.b64 {%r860, %r861}, %rd293; 2026-02-21T09:50:04.1201530Z cvt.rn.f16x2.f32 %r862, %r861, %r860; 2026-02-21T09:50:04.1201812Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1202107Z cvt.u64.u32 %rd294, %r473; 2026-02-21T09:50:04.1202262Z cvt.u64.u32 %rd295, %r474; 2026-02-21T09:50:04.1202420Z shl.b64 %rd296, %rd295, 32; 2026-02-21T09:50:04.1202578Z or.b64 %rd297, %rd294, %rd296; 2026-02-21T09:50:04.1202856Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1203164Z mov.b64 {%r863, %r864}, %rd297; 2026-02-21T09:50:04.1203332Z cvt.rn.f16x2.f32 %r865, %r864, %r863; 2026-02-21T09:50:04.1203623Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1203913Z cvt.u64.u32 %rd298, %r475; 2026-02-21T09:50:04.1204075Z cvt.u64.u32 %rd299, %r476; 2026-02-21T09:50:04.1204226Z shl.b64 %rd300, %rd299, 
32; 2026-02-21T09:50:04.1204391Z or.b64 %rd301, %rd298, %rd300; 2026-02-21T09:50:04.1204697Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1204998Z mov.b64 {%r866, %r867}, %rd301; 2026-02-21T09:50:04.1205173Z cvt.rn.f16x2.f32 %r868, %r867, %r866; 2026-02-21T09:50:04.1205459Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1205761Z cvt.u64.u32 %rd302, %r477; 2026-02-21T09:50:04.1205920Z cvt.u64.u32 %rd303, %r478; 2026-02-21T09:50:04.1206085Z shl.b64 %rd304, %rd303, 32; 2026-02-21T09:50:04.1206246Z or.b64 %rd305, %rd302, %rd304; 2026-02-21T09:50:04.1206529Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1206839Z mov.b64 {%r869, %r870}, %rd305; 2026-02-21T09:50:04.1207012Z cvt.rn.f16x2.f32 %r871, %r870, %r869; 2026-02-21T09:50:04.1207307Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1207637Z cvt.u64.u32 %rd306, %r479; 2026-02-21T09:50:04.1207805Z cvt.u64.u32 %rd307, %r480; 2026-02-21T09:50:04.1207950Z shl.b64 %rd308, %rd307, 32; 2026-02-21T09:50:04.1208108Z or.b64 %rd309, %rd306, %rd308; 2026-02-21T09:50:04.1208374Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1208649Z mov.b64 {%r872, %r873}, %rd309; 2026-02-21T09:50:04.1208842Z cvt.rn.f16x2.f32 %r874, %r873, %r872; 2026-02-21T09:50:04.1209111Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1209395Z cvt.u64.u32 %rd310, %r481; 2026-02-21T09:50:04.1209547Z cvt.u64.u32 %rd311, %r482; 2026-02-21T09:50:04.1209709Z shl.b64 %rd312, %rd311, 32; 2026-02-21T09:50:04.1209867Z or.b64 %rd313, %rd310, %rd312; 2026-02-21T09:50:04.1210138Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1210427Z mov.b64 {%r875, %r876}, %rd313; 2026-02-21T09:50:04.1210591Z cvt.rn.f16x2.f32 %r877, 
%r876, %r875; 2026-02-21T09:50:04.1210878Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1211168Z cvt.u64.u32 %rd314, %r483; 2026-02-21T09:50:04.1211352Z cvt.u64.u32 %rd315, %r484; 2026-02-21T09:50:04.1211500Z shl.b64 %rd316, %rd315, 32; 2026-02-21T09:50:04.1211700Z or.b64 %rd317, %rd314, %rd316; 2026-02-21T09:50:04.1211968Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1212245Z mov.b64 {%r878, %r879}, %rd317; 2026-02-21T09:50:04.1212409Z cvt.rn.f16x2.f32 %r880, %r879, %r878; 2026-02-21T09:50:04.1212679Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1212959Z cvt.u64.u32 %rd318, %r485; 2026-02-21T09:50:04.1213104Z cvt.u64.u32 %rd319, %r486; 2026-02-21T09:50:04.1213257Z shl.b64 %rd320, %rd319, 32; 2026-02-21T09:50:04.1213407Z or.b64 %rd321, %rd318, %rd320; 2026-02-21T09:50:04.1213672Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1213953Z mov.b64 {%r881, %r882}, %rd321; 2026-02-21T09:50:04.1214111Z cvt.rn.f16x2.f32 %r883, %r882, %r881; 2026-02-21T09:50:04.1214390Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1214665Z cvt.u64.u32 %rd322, %r487; 2026-02-21T09:50:04.1214874Z cvt.u64.u32 %rd323, %r488; 2026-02-21T09:50:04.1215019Z shl.b64 %rd324, %rd323, 32; 2026-02-21T09:50:04.1215178Z or.b64 %rd325, %rd322, %rd324; 2026-02-21T09:50:04.1215446Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1215734Z mov.b64 {%r884, %r885}, %rd325; 2026-02-21T09:50:04.1215902Z cvt.rn.f16x2.f32 %r886, %r885, %r884; 2026-02-21T09:50:04.1216178Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1216465Z cvt.u64.u32 %rd326, %r490; 2026-02-21T09:50:04.1216612Z cvt.u64.u32 %rd327, %r491; 2026-02-21T09:50:04.1216766Z shl.b64 %rd328, %rd327, 
32; 2026-02-21T09:50:04.1216919Z or.b64 %rd329, %rd326, %rd328; 2026-02-21T09:50:04.1217199Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1217491Z mov.b64 {%r887, %r888}, %rd329; 2026-02-21T09:50:04.1217650Z cvt.rn.f16x2.f32 %r889, %r888, %r887; 2026-02-21T09:50:04.1217931Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1218216Z cvt.u64.u32 %rd330, %r492; 2026-02-21T09:50:04.1218370Z cvt.u64.u32 %rd331, %r493; 2026-02-21T09:50:04.1218515Z shl.b64 %rd332, %rd331, 32; 2026-02-21T09:50:04.1218673Z or.b64 %rd333, %rd330, %rd332; 2026-02-21T09:50:04.1218939Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1219256Z mov.b64 {%r890, %r891}, %rd333; 2026-02-21T09:50:04.1219421Z cvt.rn.f16x2.f32 %r892, %r891, %r890; 2026-02-21T09:50:04.1219700Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1219993Z cvt.u64.u32 %rd334, %r494; 2026-02-21T09:50:04.1220172Z cvt.u64.u32 %rd335, %r495; 2026-02-21T09:50:04.1220326Z shl.b64 %rd336, %rd335, 32; 2026-02-21T09:50:04.1220478Z or.b64 %rd337, %rd334, %rd336; 2026-02-21T09:50:04.1220747Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1221041Z mov.b64 {%r893, %r894}, %rd337; 2026-02-21T09:50:04.1221199Z cvt.rn.f16x2.f32 %r895, %r894, %r893; 2026-02-21T09:50:04.1221484Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1221765Z cvt.u64.u32 %rd338, %r496; 2026-02-21T09:50:04.1221920Z cvt.u64.u32 %rd339, %r497; 2026-02-21T09:50:04.1222066Z shl.b64 %rd340, %rd339, 32; 2026-02-21T09:50:04.1222222Z or.b64 %rd341, %rd338, %rd340; 2026-02-21T09:50:04.1222491Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1222817Z mov.b64 {%r896, %r897}, %rd341; 2026-02-21T09:50:04.1223012Z cvt.rn.f16x2.f32 %r898, 
%r897, %r896; 2026-02-21T09:50:04.1223285Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1223565Z cvt.u64.u32 %rd342, %r498; 2026-02-21T09:50:04.1223713Z cvt.u64.u32 %rd343, %r499; 2026-02-21T09:50:04.1223869Z shl.b64 %rd344, %rd343, 32; 2026-02-21T09:50:04.1224018Z or.b64 %rd345, %rd342, %rd344; 2026-02-21T09:50:04.1224289Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1224578Z mov.b64 {%r899, %r900}, %rd345; 2026-02-21T09:50:04.1224767Z cvt.rn.f16x2.f32 %r901, %r900, %r899; 2026-02-21T09:50:04.1225051Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1225331Z cvt.u64.u32 %rd346, %r500; 2026-02-21T09:50:04.1225487Z cvt.u64.u32 %rd347, %r501; 2026-02-21T09:50:04.1225633Z shl.b64 %rd348, %rd347, 32; 2026-02-21T09:50:04.1225792Z or.b64 %rd349, %rd346, %rd348; 2026-02-21T09:50:04.1226063Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1226352Z mov.b64 {%r902, %r903}, %rd349; 2026-02-21T09:50:04.1226520Z cvt.rn.f16x2.f32 %r904, %r903, %r902; 2026-02-21T09:50:04.1226794Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1227091Z cvt.u64.u32 %rd350, %r502; 2026-02-21T09:50:04.1227236Z cvt.u64.u32 %rd351, %r503; 2026-02-21T09:50:04.1227392Z shl.b64 %rd352, %rd351, 32; 2026-02-21T09:50:04.1227546Z or.b64 %rd353, %rd350, %rd352; 2026-02-21T09:50:04.1227820Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1228122Z mov.b64 {%r905, %r906}, %rd353; 2026-02-21T09:50:04.1228282Z cvt.rn.f16x2.f32 %r907, %r906, %r905; 2026-02-21T09:50:04.1228570Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1228853Z cvt.u64.u32 %rd354, %r504; 2026-02-21T09:50:04.1229009Z cvt.u64.u32 %rd355, %r505; 2026-02-21T09:50:04.1229155Z shl.b64 %rd356, %rd355, 
32; 2026-02-21T09:50:04.1229312Z or.b64 %rd357, %rd354, %rd356; 2026-02-21T09:50:04.1229582Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1229861Z mov.b64 {%r908, %r909}, %rd357; 2026-02-21T09:50:04.1230025Z cvt.rn.f16x2.f32 %r910, %r909, %r908; 2026-02-21T09:50:04.1230295Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1230614Z cvt.u64.u32 %rd358, %r507; 2026-02-21T09:50:04.1230760Z cvt.u64.u32 %rd359, %r508; 2026-02-21T09:50:04.1230911Z shl.b64 %rd360, %rd359, 32; 2026-02-21T09:50:04.1231060Z or.b64 %rd361, %rd358, %rd360; 2026-02-21T09:50:04.1231323Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1231655Z mov.b64 {%r911, %r912}, %rd361; 2026-02-21T09:50:04.1231815Z cvt.rn.f16x2.f32 %r913, %r912, %r911; 2026-02-21T09:50:04.1232094Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1232375Z cvt.u64.u32 %rd362, %r509; 2026-02-21T09:50:04.1232529Z cvt.u64.u32 %rd363, %r510; 2026-02-21T09:50:04.1232674Z shl.b64 %rd364, %rd363, 32; 2026-02-21T09:50:04.1232833Z or.b64 %rd365, %rd362, %rd364; 2026-02-21T09:50:04.1233100Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1233381Z mov.b64 {%r914, %r915}, %rd365; 2026-02-21T09:50:04.1233547Z cvt.rn.f16x2.f32 %r916, %r915, %r914; 2026-02-21T09:50:04.1233823Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1234138Z cvt.u64.u32 %rd366, %r511; 2026-02-21T09:50:04.1234287Z cvt.u64.u32 %rd367, %r512; 2026-02-21T09:50:04.1234465Z shl.b64 %rd368, %rd367, 32; 2026-02-21T09:50:04.1234617Z or.b64 %rd369, %rd366, %rd368; 2026-02-21T09:50:04.1234943Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1235233Z mov.b64 {%r917, %r918}, %rd369; 2026-02-21T09:50:04.1235391Z cvt.rn.f16x2.f32 %r919, 
%r918, %r917; 2026-02-21T09:50:04.1235678Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1235961Z cvt.u64.u32 %rd370, %r513; 2026-02-21T09:50:04.1236116Z cvt.u64.u32 %rd371, %r514; 2026-02-21T09:50:04.1236265Z shl.b64 %rd372, %rd371, 32; 2026-02-21T09:50:04.1236427Z or.b64 %rd373, %rd370, %rd372; 2026-02-21T09:50:04.1236701Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1236983Z mov.b64 {%r920, %r921}, %rd373; 2026-02-21T09:50:04.1237153Z cvt.rn.f16x2.f32 %r922, %r921, %r920; 2026-02-21T09:50:04.1237434Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1237732Z cvt.u64.u32 %rd374, %r515; 2026-02-21T09:50:04.1237879Z cvt.u64.u32 %rd375, %r516; 2026-02-21T09:50:04.1238037Z shl.b64 %rd376, %rd375, 32; 2026-02-21T09:50:04.1238190Z or.b64 %rd377, %rd374, %rd376; 2026-02-21T09:50:04.1238462Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1238760Z mov.b64 {%r923, %r924}, %rd377; 2026-02-21T09:50:04.1238918Z cvt.rn.f16x2.f32 %r925, %r924, %r923; 2026-02-21T09:50:04.1239209Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1239496Z cvt.u64.u32 %rd378, %r517; 2026-02-21T09:50:04.1239665Z cvt.u64.u32 %rd379, %r518; 2026-02-21T09:50:04.1239812Z shl.b64 %rd380, %rd379, 32; 2026-02-21T09:50:04.1239974Z or.b64 %rd381, %rd378, %rd380; 2026-02-21T09:50:04.1240249Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1240531Z mov.b64 {%r926, %r927}, %rd381; 2026-02-21T09:50:04.1240698Z cvt.rn.f16x2.f32 %r928, %r927, %r926; 2026-02-21T09:50:04.1240973Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1241273Z cvt.u64.u32 %rd382, %r519; 2026-02-21T09:50:04.1241420Z cvt.u64.u32 %rd383, %r520; 2026-02-21T09:50:04.1241574Z shl.b64 %rd384, %rd383, 
32; 2026-02-21T09:50:04.1241727Z or.b64 %rd385, %rd382, %rd384; 2026-02-21T09:50:04.1242023Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1242305Z mov.b64 {%r929, %r930}, %rd385; 2026-02-21T09:50:04.1242464Z cvt.rn.f16x2.f32 %r931, %r930, %r929; 2026-02-21T09:50:04.1242740Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1243015Z cvt.u64.u32 %rd386, %r521; 2026-02-21T09:50:04.1243195Z cvt.u64.u32 %rd387, %r522; 2026-02-21T09:50:04.1243341Z shl.b64 %rd388, %rd387, 32; 2026-02-21T09:50:04.1243500Z or.b64 %rd389, %rd386, %rd388; 2026-02-21T09:50:04.1243765Z .loc 1 58 27 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:58:27 2026-02-21T09:50:04.1244055Z mov.b64 {%r932, %r933}, %rd389; 2026-02-21T09:50:04.1244227Z cvt.rn.f16x2.f32 %r934, %r933, %r932; 2026-02-21T09:50:04.1244505Z .loc 1 59 83 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:59:83 2026-02-21T09:50:04.1244876Z st.shared.v4.b32 [%r704], {%r745, %r757, %r769, %r781}; 2026-02-21T09:50:04.1245115Z st.shared.v4.b32 [%r703], {%r793, %r805, %r817, %r829}; 2026-02-21T09:50:04.1245353Z st.shared.v4.b32 [%r701], {%r841, %r853, %r865, %r877}; 2026-02-21T09:50:04.1245587Z st.shared.v4.b32 [%r699], {%r889, %r901, %r913, %r925}; 2026-02-21T09:50:04.1245811Z bar.sync 0, 128; 2026-02-21T09:50:04.1245993Z // begin inline asm 2026-02-21T09:50:04.1246236Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r604, %r608, %r612, %r616}, [%r528]; 2026-02-21T09:50:04.1246515Z // end inline asm 2026-02-21T09:50:04.1246652Z // begin inline asm 2026-02-21T09:50:04.1246888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r620, %r624, %r628, %r632}, [%r533]; 2026-02-21T09:50:04.1247149Z // end inline asm 2026-02-21T09:50:04.1247292Z // begin inline asm 2026-02-21T09:50:04.1247525Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r636, %r640, %r644, %r648}, [%r538]; 2026-02-21T09:50:04.1247783Z // end inline asm 
2026-02-21T09:50:04.1247925Z // begin inline asm 2026-02-21T09:50:04.1248150Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r652, %r656, %r660, %r664}, [%r543]; 2026-02-21T09:50:04.1248415Z // end inline asm 2026-02-21T09:50:04.1248548Z bar.sync 0, 128; 2026-02-21T09:50:04.1248724Z st.shared.v4.b32 [%r704], {%r748, %r760, %r772, %r784}; 2026-02-21T09:50:04.1248957Z st.shared.v4.b32 [%r703], {%r796, %r808, %r820, %r832}; 2026-02-21T09:50:04.1249194Z st.shared.v4.b32 [%r701], {%r844, %r856, %r868, %r880}; 2026-02-21T09:50:04.1249426Z st.shared.v4.b32 [%r699], {%r892, %r904, %r916, %r928}; 2026-02-21T09:50:04.1249623Z bar.sync 0, 128; 2026-02-21T09:50:04.1249764Z // begin inline asm 2026-02-21T09:50:04.1249994Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r605, %r609, %r613, %r617}, [%r528]; 2026-02-21T09:50:04.1250269Z // end inline asm 2026-02-21T09:50:04.1250406Z // begin inline asm 2026-02-21T09:50:04.1250641Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r621, %r625, %r629, %r633}, [%r533]; 2026-02-21T09:50:04.1250903Z // end inline asm 2026-02-21T09:50:04.1251039Z // begin inline asm 2026-02-21T09:50:04.1251257Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r641, %r645, %r649}, [%r538]; 2026-02-21T09:50:04.1251505Z // end inline asm 2026-02-21T09:50:04.1251636Z // begin inline asm 2026-02-21T09:50:04.1251846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r653, %r657, %r661, %r665}, [%r543]; 2026-02-21T09:50:04.1252102Z // end inline asm 2026-02-21T09:50:04.1252230Z bar.sync 0, 128; 2026-02-21T09:50:04.1252395Z st.shared.v4.b32 [%r704], {%r751, %r763, %r775, %r787}; 2026-02-21T09:50:04.1252611Z st.shared.v4.b32 [%r703], {%r799, %r811, %r823, %r835}; 2026-02-21T09:50:04.1252832Z st.shared.v4.b32 [%r701], {%r847, %r859, %r871, %r883}; 2026-02-21T09:50:04.1253053Z st.shared.v4.b32 [%r699], {%r895, %r907, %r919, %r931}; 2026-02-21T09:50:04.1253240Z bar.sync 0, 128; 2026-02-21T09:50:04.1253378Z // begin inline asm 2026-02-21T09:50:04.1253592Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r606, %r610, %r614, %r618}, [%r528]; 2026-02-21T09:50:04.1253875Z // end inline asm 2026-02-21T09:50:04.1254002Z // begin inline asm 2026-02-21T09:50:04.1254222Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r622, %r626, %r630, %r634}, [%r533]; 2026-02-21T09:50:04.1254465Z // end inline asm 2026-02-21T09:50:04.1254602Z // begin inline asm 2026-02-21T09:50:04.1254849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r638, %r642, %r646, %r650}, [%r538]; 2026-02-21T09:50:04.1255125Z // end inline asm 2026-02-21T09:50:04.1255260Z // begin inline asm 2026-02-21T09:50:04.1255468Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r654, %r658, %r662, %r666}, [%r543]; 2026-02-21T09:50:04.1255717Z // end inline asm 2026-02-21T09:50:04.1255843Z bar.sync 0, 128; 2026-02-21T09:50:04.1256006Z st.shared.v4.b32 [%r704], {%r754, %r766, %r778, %r790}; 2026-02-21T09:50:04.1256222Z st.shared.v4.b32 [%r703], {%r802, %r814, %r826, %r838}; 2026-02-21T09:50:04.1256442Z st.shared.v4.b32 [%r701], {%r850, %r862, %r874, %r886}; 2026-02-21T09:50:04.1256664Z st.shared.v4.b32 [%r699], {%r898, %r910, %r922, %r934}; 2026-02-21T09:50:04.1256850Z bar.sync 0, 128; 2026-02-21T09:50:04.1256985Z // begin inline asm 2026-02-21T09:50:04.1257198Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r607, %r611, %r615, %r619}, [%r528]; 2026-02-21T09:50:04.1257447Z // end inline asm 2026-02-21T09:50:04.1257613Z // begin inline asm 2026-02-21T09:50:04.1257864Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r623, %r627, %r631, %r635}, [%r533]; 2026-02-21T09:50:04.1258112Z // end inline asm 2026-02-21T09:50:04.1258247Z // begin inline asm 2026-02-21T09:50:04.1258463Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r639, %r643, %r647, %r651}, [%r538]; 2026-02-21T09:50:04.1258706Z // end inline asm 2026-02-21T09:50:04.1258840Z // begin inline asm 2026-02-21T09:50:04.1259054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r655, %r659, %r663, %r667}, [%r543]; 2026-02-21T09:50:04.1259309Z // end inline asm 
2026-02-21T09:50:04.1259436Z // begin inline asm 2026-02-21T09:50:04.1259625Z st.global.v4.b32 [ %rd118 + 0 ], { %r604, %r605, %r606, %r607 }; 2026-02-21T09:50:04.1259835Z // end inline asm 2026-02-21T09:50:04.1259976Z // begin inline asm 2026-02-21T09:50:04.1260162Z st.global.v4.b32 [ %rd119 + 0 ], { %r608, %r609, %r610, %r611 }; 2026-02-21T09:50:04.1260365Z // end inline asm 2026-02-21T09:50:04.1260505Z // begin inline asm 2026-02-21T09:50:04.1260676Z st.global.v4.b32 [ %rd120 + 0 ], { %r612, %r613, %r614, %r615 }; 2026-02-21T09:50:04.1260887Z // end inline asm 2026-02-21T09:50:04.1261013Z // begin inline asm 2026-02-21T09:50:04.1261187Z st.global.v4.b32 [ %rd121 + 0 ], { %r616, %r617, %r618, %r619 }; 2026-02-21T09:50:04.1261383Z // end inline asm 2026-02-21T09:50:04.1261517Z // begin inline asm 2026-02-21T09:50:04.1261690Z st.global.v4.b32 [ %rd122 + 0 ], { %r620, %r621, %r622, %r623 }; 2026-02-21T09:50:04.1261886Z // end inline asm 2026-02-21T09:50:04.1262022Z // begin inline asm 2026-02-21T09:50:04.1262189Z st.global.v4.b32 [ %rd123 + 0 ], { %r624, %r625, %r626, %r627 }; 2026-02-21T09:50:04.1262394Z // end inline asm 2026-02-21T09:50:04.1262520Z // begin inline asm 2026-02-21T09:50:04.1262698Z st.global.v4.b32 [ %rd124 + 0 ], { %r628, %r629, %r630, %r631 }; 2026-02-21T09:50:04.1262894Z // end inline asm 2026-02-21T09:50:04.1262956Z // begin inline asm 2026-02-21T09:50:04.1263049Z st.global.v4.b32 [ %rd125 + 0 ], { %r632, %r633, %r634, %r635 }; 2026-02-21T09:50:04.1263102Z // end inline asm 2026-02-21T09:50:04.1263163Z // begin inline asm 2026-02-21T09:50:04.1263253Z st.global.v4.b32 [ %rd126 + 0 ], { %r636, %r637, %r638, %r639 }; 2026-02-21T09:50:04.1263306Z // end inline asm 2026-02-21T09:50:04.1263360Z // begin inline asm 2026-02-21T09:50:04.1263457Z st.global.v4.b32 [ %rd127 + 0 ], { %r640, %r641, %r642, %r643 }; 2026-02-21T09:50:04.1263509Z // end inline asm 2026-02-21T09:50:04.1263562Z // begin inline asm 2026-02-21T09:50:04.1263659Z st.global.v4.b32 [ 
%rd128 + 0 ], { %r644, %r645, %r646, %r647 }; 2026-02-21T09:50:04.1263710Z // end inline asm 2026-02-21T09:50:04.1263792Z // begin inline asm 2026-02-21T09:50:04.1263883Z st.global.v4.b32 [ %rd129 + 0 ], { %r648, %r649, %r650, %r651 }; 2026-02-21T09:50:04.1263941Z // end inline asm 2026-02-21T09:50:04.1263994Z // begin inline asm 2026-02-21T09:50:04.1264083Z st.global.v4.b32 [ %rd130 + 0 ], { %r652, %r653, %r654, %r655 }; 2026-02-21T09:50:04.1264143Z // end inline asm 2026-02-21T09:50:04.1264198Z // begin inline asm 2026-02-21T09:50:04.1264313Z st.global.v4.b32 [ %rd131 + 0 ], { %r656, %r657, %r658, %r659 }; 2026-02-21T09:50:04.1264372Z // end inline asm 2026-02-21T09:50:04.1264425Z // begin inline asm 2026-02-21T09:50:04.1264513Z st.global.v4.b32 [ %rd132 + 0 ], { %r660, %r661, %r662, %r663 }; 2026-02-21T09:50:04.1264565Z // end inline asm 2026-02-21T09:50:04.1264624Z // begin inline asm 2026-02-21T09:50:04.1264743Z st.global.v4.b32 [ %rd133 + 0 ], { %r664, %r665, %r666, %r667 }; 2026-02-21T09:50:04.1264795Z // end inline asm 2026-02-21T09:50:04.1264882Z $L__BB0_16: // %._crit_edge 2026-02-21T09:50:04.1265057Z .loc 1 30 4 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:30:4 2026-02-21T09:50:04.1265111Z bar.sync 0, 128; 2026-02-21T09:50:04.1265171Z // begin inline asm 2026-02-21T09:50:04.1265322Z @%p29 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r935, 128; 2026-02-21T09:50:04.1265377Z // end inline asm 2026-02-21T09:50:04.1265501Z st.shared.v2.b32 [global_smem+114760], {50529027, 50529027}; 2026-02-21T09:50:04.1265566Z barrier.sync 1; 2026-02-21T09:50:04.1265648Z $L__BB0_17: // %common.ret 2026-02-21T09:50:04.1265816Z .loc 1 0 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:0 2026-02-21T09:50:04.1265875Z ret; 2026-02-21T09:50:04.1265969Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:50:04.1266053Z ld.param.b64 %rd7, [_helion_matmul_param_1]; 2026-02-21T09:50:04.1266221Z .loc 1 19 0 // 
cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19 2026-02-21T09:50:04.1266286Z cvt.u16.u32 %rs1, %r1; 2026-02-21T09:50:04.1266348Z and.b16 %rs2, %rs1, 7; 2026-02-21T09:50:04.1266411Z mul.wide.u16 %r4, %rs2, 8; 2026-02-21T09:50:04.1266594Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1266654Z or.b32 %r5, %r4, 192; 2026-02-21T09:50:04.1266720Z mov.b32 %r28, global_smem; 2026-02-21T09:50:04.1266788Z add.s32 %r29, %r28, %r3; 2026-02-21T09:50:04.1266846Z bra.uni $L__BB0_2; 2026-02-21T09:50:04.1266948Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1267120Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1267187Z barrier.sync 1; 2026-02-21T09:50:04.1267243Z barrier.sync 1; 2026-02-21T09:50:04.1267321Z $L__BB0_2: // %.preheader 2026-02-21T09:50:04.1267422Z // =>This Loop Header: Depth=1 2026-02-21T09:50:04.1267512Z // Child Loop BB0_11 Depth 2 2026-02-21T09:50:04.1267597Z // Child Loop BB0_6 Depth 2 2026-02-21T09:50:04.1267766Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19 2026-02-21T09:50:04.1267849Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:50:04.1267908Z barrier.sync 1; 2026-02-21T09:50:04.1267983Z ld.shared.b8 %r27, [%r29+114756]; 2026-02-21T09:50:04.1268048Z setp.gt.u32 %p3, %r27, 3; 2026-02-21T09:50:04.1268108Z @%p3 bra $L__BB0_4; 2026-02-21T09:50:04.1268188Z // %bb.3: // %.preheader 2026-02-21T09:50:04.1268290Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1268356Z $L_brx_0: .branchtargets 2026-02-21T09:50:04.1268412Z $L__BB0_5, 2026-02-21T09:50:04.1268475Z $L__BB0_10, 2026-02-21T09:50:04.1268556Z $L__BB0_13, 2026-02-21T09:50:04.1268608Z $L__BB0_17; 2026-02-21T09:50:04.1268666Z brx.idx %r27, $L_brx_0; 2026-02-21T09:50:04.1268766Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1268939Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 
2026-02-21T09:50:04.1268998Z add.s32 %r106, %r28, 65536; 2026-02-21T09:50:04.1269111Z ld.shared.b32 %r146, [global_smem+65536]; 2026-02-21T09:50:04.1269182Z ld.shared.b32 %r107, [global_smem+65548]; 2026-02-21T09:50:04.1269237Z barrier.sync 1; 2026-02-21T09:50:04.1269412Z .loc 1 42 45 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:42:45 2026-02-21T09:50:04.1269471Z add.s32 %r108, %r1, -128; 2026-02-21T09:50:04.1269527Z shr.u32 %r7, %r108, 5; 2026-02-21T09:50:04.1269597Z shr.u32 %r109, %r1, 3; 2026-02-21T09:50:04.1269663Z bfe.u32 %r110, %r1, 3, 4; 2026-02-21T09:50:04.1269716Z or.b32 %r111, %r109, 112; 2026-02-21T09:50:04.1269881Z .loc 1 50 48 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:50:48 2026-02-21T09:50:04.1269942Z shl.b32 %r112, %r1, 3; 2026-02-21T09:50:04.1269997Z and.b32 %r113, %r112, 56; 2026-02-21T09:50:04.1270185Z .loc 1 44 32 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:44:32 2026-02-21T09:50:04.1270266Z add.s32 %r114, %r107, %r110; 2026-02-21T09:50:04.1270333Z add.s32 %r115, %r107, %r111; 2026-02-21T09:50:04.1270387Z shl.b32 %r116, %r114, 11; 2026-02-21T09:50:04.1270552Z .loc 1 55 80 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:80 2026-02-21T09:50:04.1270617Z add.s32 %r117, %r116, 32768; 2026-02-21T09:50:04.1270673Z add.s32 %r118, %r116, 65536; 2026-02-21T09:50:04.1270728Z add.s32 %r119, %r116, 98304; 2026-02-21T09:50:04.1270794Z add.s32 %r120, %r116, 131072; 2026-02-21T09:50:04.1270853Z add.s32 %r121, %r116, 163840; 2026-02-21T09:50:04.1270909Z add.s32 %r122, %r116, 196608; 2026-02-21T09:50:04.1270962Z shl.b32 %r123, %r115, 11; 2026-02-21T09:50:04.1271136Z .loc 1 55 59 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:59 2026-02-21T09:50:04.1271193Z or.b32 %r124, %r116, %r113; 2026-02-21T09:50:04.1271248Z or.b32 %r125, %r117, %r113; 2026-02-21T09:50:04.1271310Z or.b32 %r126, %r118, %r113; 2026-02-21T09:50:04.1271366Z or.b32 %r127, %r119, %r113; 2026-02-21T09:50:04.1271420Z or.b32 
%r128, %r120, %r113; 2026-02-21T09:50:04.1271482Z or.b32 %r129, %r121, %r113; 2026-02-21T09:50:04.1271535Z or.b32 %r130, %r122, %r113; 2026-02-21T09:50:04.1271588Z or.b32 %r131, %r123, %r113; 2026-02-21T09:50:04.1271758Z .loc 1 55 34 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:34 2026-02-21T09:50:04.1271830Z mad.wide.s32 %rd13, %r124, 2, %rd7; 2026-02-21T09:50:04.1271893Z mad.wide.s32 %rd14, %r125, 2, %rd7; 2026-02-21T09:50:04.1271954Z mad.wide.s32 %rd15, %r126, 2, %rd7; 2026-02-21T09:50:04.1272022Z mad.wide.s32 %rd16, %r127, 2, %rd7; 2026-02-21T09:50:04.1272080Z mad.wide.s32 %rd17, %r128, 2, %rd7; 2026-02-21T09:50:04.1272138Z mad.wide.s32 %rd18, %r129, 2, %rd7; 2026-02-21T09:50:04.1272196Z mad.wide.s32 %rd19, %r130, 2, %rd7; 2026-02-21T09:50:04.1272260Z mad.wide.s32 %rd20, %r131, 2, %rd7; 2026-02-21T09:50:04.1272430Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1272487Z shl.b32 %r132, %r1, 4; 2026-02-21T09:50:04.1272550Z and.b32 %r133, %r132, 2032; 2026-02-21T09:50:04.1272606Z shl.b32 %r134, %r1, 1; 2026-02-21T09:50:04.1272662Z and.b32 %r135, %r134, 112; 2026-02-21T09:50:04.1272727Z xor.b32 %r8, %r133, %r135; 2026-02-21T09:50:04.1272783Z add.s32 %r54, %r106, %r8; 2026-02-21T09:50:04.1272835Z mov.b32 %r55, 16; 2026-02-21T09:50:04.1272890Z // begin inline asm 2026-02-21T09:50:04.1273013Z cp.async.cg.shared.global [ %r54 + 0 ], [ %rd13 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1273090Z // end inline asm 2026-02-21T09:50:04.1273147Z add.s32 %r56, %r54, 2048; 2026-02-21T09:50:04.1273206Z // begin inline asm 2026-02-21T09:50:04.1273318Z cp.async.cg.shared.global [ %r56 + 0 ], [ %rd14 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1273371Z // end inline asm 2026-02-21T09:50:04.1273425Z add.s32 %r58, %r54, 4096; 2026-02-21T09:50:04.1273487Z // begin inline asm 2026-02-21T09:50:04.1273593Z cp.async.cg.shared.global [ %r58 + 0 ], [ %rd15 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1273673Z // end inline asm 
2026-02-21T09:50:04.1273735Z add.s32 %r60, %r54, 6144; 2026-02-21T09:50:04.1273788Z // begin inline asm 2026-02-21T09:50:04.1273891Z cp.async.cg.shared.global [ %r60 + 0 ], [ %rd16 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1273943Z // end inline asm 2026-02-21T09:50:04.1274003Z add.s32 %r62, %r54, 8192; 2026-02-21T09:50:04.1274056Z // begin inline asm 2026-02-21T09:50:04.1274160Z cp.async.cg.shared.global [ %r62 + 0 ], [ %rd17 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1274220Z // end inline asm 2026-02-21T09:50:04.1274278Z add.s32 %r64, %r54, 10240; 2026-02-21T09:50:04.1274330Z // begin inline asm 2026-02-21T09:50:04.1274439Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd18 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1274492Z // end inline asm 2026-02-21T09:50:04.1274548Z add.s32 %r66, %r54, 12288; 2026-02-21T09:50:04.1274621Z // begin inline asm 2026-02-21T09:50:04.1274792Z cp.async.cg.shared.global [ %r66 + 0 ], [ %rd19 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1274847Z // end inline asm 2026-02-21T09:50:04.1274903Z add.s32 %r68, %r54, 14336; 2026-02-21T09:50:04.1274964Z // begin inline asm 2026-02-21T09:50:04.1275067Z cp.async.cg.shared.global [ %r68 + 0 ], [ %rd20 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1275119Z // end inline asm 2026-02-21T09:50:04.1275181Z cp.async.commit_group; 2026-02-21T09:50:04.1275359Z .loc 1 55 34 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:34 2026-02-21T09:50:04.1275415Z cvt.s64.s32 %rd38, %r116; 2026-02-21T09:50:04.1275474Z cvt.u64.u32 %rd39, %r113; 2026-02-21T09:50:04.1275539Z or.b64 %rd40, %rd38, %rd39; 2026-02-21T09:50:04.1275596Z shl.b64 %rd41, %rd40, 1; 2026-02-21T09:50:04.1275654Z add.s64 %rd42, %rd7, %rd41; 2026-02-21T09:50:04.1275719Z add.s64 %rd21, %rd42, 128; 2026-02-21T09:50:04.1275776Z cvt.s64.s32 %rd43, %r117; 2026-02-21T09:50:04.1275833Z or.b64 %rd44, %rd43, %rd39; 2026-02-21T09:50:04.1275894Z shl.b64 %rd45, %rd44, 1; 2026-02-21T09:50:04.1275965Z add.s64 %rd46, %rd7, %rd45; 2026-02-21T09:50:04.1276032Z add.s64 %rd22, 
%rd46, 128; 2026-02-21T09:50:04.1276090Z cvt.s64.s32 %rd47, %r118; 2026-02-21T09:50:04.1276153Z or.b64 %rd48, %rd47, %rd39; 2026-02-21T09:50:04.1276208Z shl.b64 %rd49, %rd48, 1; 2026-02-21T09:50:04.1276264Z add.s64 %rd50, %rd7, %rd49; 2026-02-21T09:50:04.1276320Z add.s64 %rd23, %rd50, 128; 2026-02-21T09:50:04.1276381Z cvt.s64.s32 %rd51, %r119; 2026-02-21T09:50:04.1276437Z or.b64 %rd52, %rd51, %rd39; 2026-02-21T09:50:04.1276492Z shl.b64 %rd53, %rd52, 1; 2026-02-21T09:50:04.1276556Z add.s64 %rd54, %rd7, %rd53; 2026-02-21T09:50:04.1276612Z add.s64 %rd24, %rd54, 128; 2026-02-21T09:50:04.1276666Z cvt.s64.s32 %rd55, %r120; 2026-02-21T09:50:04.1276721Z or.b64 %rd56, %rd55, %rd39; 2026-02-21T09:50:04.1276784Z shl.b64 %rd57, %rd56, 1; 2026-02-21T09:50:04.1276841Z add.s64 %rd58, %rd7, %rd57; 2026-02-21T09:50:04.1276896Z add.s64 %rd25, %rd58, 128; 2026-02-21T09:50:04.1276961Z cvt.s64.s32 %rd59, %r121; 2026-02-21T09:50:04.1277016Z or.b64 %rd60, %rd59, %rd39; 2026-02-21T09:50:04.1277071Z shl.b64 %rd61, %rd60, 1; 2026-02-21T09:50:04.1277125Z add.s64 %rd62, %rd7, %rd61; 2026-02-21T09:50:04.1277189Z add.s64 %rd26, %rd62, 128; 2026-02-21T09:50:04.1277244Z cvt.s64.s32 %rd63, %r122; 2026-02-21T09:50:04.1277297Z or.b64 %rd64, %rd63, %rd39; 2026-02-21T09:50:04.1277361Z shl.b64 %rd65, %rd64, 1; 2026-02-21T09:50:04.1277416Z add.s64 %rd66, %rd7, %rd65; 2026-02-21T09:50:04.1277473Z add.s64 %rd27, %rd66, 128; 2026-02-21T09:50:04.1277529Z cvt.s64.s32 %rd67, %r123; 2026-02-21T09:50:04.1277621Z or.b64 %rd68, %rd67, %rd39; 2026-02-21T09:50:04.1277677Z shl.b64 %rd69, %rd68, 1; 2026-02-21T09:50:04.1277732Z add.s64 %rd70, %rd7, %rd69; 2026-02-21T09:50:04.1277796Z add.s64 %rd28, %rd70, 128; 2026-02-21T09:50:04.1277968Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1278025Z bar.sync 2, 128; 2026-02-21T09:50:04.1278128Z add.s32 %r136, %r28, %r8; 2026-02-21T09:50:04.1278183Z add.s32 %r70, %r136, 81920; 2026-02-21T09:50:04.1278237Z // begin inline asm 
2026-02-21T09:50:04.1278341Z cp.async.cg.shared.global [ %r70 + 0 ], [ %rd21 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1278402Z // end inline asm 2026-02-21T09:50:04.1278456Z add.s32 %r72, %r136, 83968; 2026-02-21T09:50:04.1278510Z // begin inline asm 2026-02-21T09:50:04.1278623Z cp.async.cg.shared.global [ %r72 + 0 ], [ %rd22 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1278675Z // end inline asm 2026-02-21T09:50:04.1278732Z add.s32 %r74, %r136, 86016; 2026-02-21T09:50:04.1278786Z // begin inline asm 2026-02-21T09:50:04.1278896Z cp.async.cg.shared.global [ %r74 + 0 ], [ %rd23 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1278949Z // end inline asm 2026-02-21T09:50:04.1279001Z add.s32 %r76, %r136, 88064; 2026-02-21T09:50:04.1279061Z // begin inline asm 2026-02-21T09:50:04.1279211Z cp.async.cg.shared.global [ %r76 + 0 ], [ %rd24 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1279266Z // end inline asm 2026-02-21T09:50:04.1279328Z add.s32 %r78, %r136, 90112; 2026-02-21T09:50:04.1279382Z // begin inline asm 2026-02-21T09:50:04.1279483Z cp.async.cg.shared.global [ %r78 + 0 ], [ %rd25 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1279534Z // end inline asm 2026-02-21T09:50:04.1279595Z add.s32 %r80, %r136, 92160; 2026-02-21T09:50:04.1279649Z // begin inline asm 2026-02-21T09:50:04.1279750Z cp.async.cg.shared.global [ %r80 + 0 ], [ %rd26 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1279807Z // end inline asm 2026-02-21T09:50:04.1279861Z add.s32 %r82, %r136, 94208; 2026-02-21T09:50:04.1279915Z // begin inline asm 2026-02-21T09:50:04.1280017Z cp.async.cg.shared.global [ %r82 + 0 ], [ %rd27 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1280077Z // end inline asm 2026-02-21T09:50:04.1280132Z add.s32 %r84, %r136, 96256; 2026-02-21T09:50:04.1280185Z // begin inline asm 2026-02-21T09:50:04.1280296Z cp.async.cg.shared.global [ %r84 + 0 ], [ %rd28 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1280351Z // end inline asm 2026-02-21T09:50:04.1280411Z cp.async.commit_group; 2026-02-21T09:50:04.1280575Z .loc 1 55 34 // 
cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:34 2026-02-21T09:50:04.1280638Z add.s64 %rd29, %rd42, 256; 2026-02-21T09:50:04.1280694Z add.s64 %rd30, %rd46, 256; 2026-02-21T09:50:04.1280748Z add.s64 %rd31, %rd50, 256; 2026-02-21T09:50:04.1280809Z add.s64 %rd32, %rd54, 256; 2026-02-21T09:50:04.1280865Z add.s64 %rd33, %rd58, 256; 2026-02-21T09:50:04.1280920Z add.s64 %rd34, %rd62, 256; 2026-02-21T09:50:04.1280983Z add.s64 %rd35, %rd66, 256; 2026-02-21T09:50:04.1281038Z add.s64 %rd36, %rd70, 256; 2026-02-21T09:50:04.1281199Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1281253Z bar.sync 2, 128; 2026-02-21T09:50:04.1281317Z add.s32 %r86, %r136, 98304; 2026-02-21T09:50:04.1281373Z // begin inline asm 2026-02-21T09:50:04.1281478Z cp.async.cg.shared.global [ %r86 + 0 ], [ %rd29 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1281539Z // end inline asm 2026-02-21T09:50:04.1281596Z add.s32 %r88, %r136, 100352; 2026-02-21T09:50:04.1281651Z // begin inline asm 2026-02-21T09:50:04.1281754Z cp.async.cg.shared.global [ %r88 + 0 ], [ %rd30 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1281814Z // end inline asm 2026-02-21T09:50:04.1281870Z add.s32 %r90, %r136, 102400; 2026-02-21T09:50:04.1281925Z // begin inline asm 2026-02-21T09:50:04.1282034Z cp.async.cg.shared.global [ %r90 + 0 ], [ %rd31 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1282109Z // end inline asm 2026-02-21T09:50:04.1282166Z add.s32 %r92, %r136, 104448; 2026-02-21T09:50:04.1282227Z // begin inline asm 2026-02-21T09:50:04.1282329Z cp.async.cg.shared.global [ %r92 + 0 ], [ %rd32 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1282383Z // end inline asm 2026-02-21T09:50:04.1282438Z add.s32 %r94, %r136, 106496; 2026-02-21T09:50:04.1282503Z // begin inline asm 2026-02-21T09:50:04.1282606Z cp.async.cg.shared.global [ %r94 + 0 ], [ %rd33 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1282686Z // end inline asm 2026-02-21T09:50:04.1282751Z add.s32 %r96, %r136, 108544; 2026-02-21T09:50:04.1282807Z 
// begin inline asm 2026-02-21T09:50:04.1282907Z cp.async.cg.shared.global [ %r96 + 0 ], [ %rd34 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1282960Z // end inline asm 2026-02-21T09:50:04.1283022Z add.s32 %r98, %r136, 110592; 2026-02-21T09:50:04.1283076Z // begin inline asm 2026-02-21T09:50:04.1283175Z cp.async.cg.shared.global [ %r98 + 0 ], [ %rd35 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1283234Z // end inline asm 2026-02-21T09:50:04.1283293Z add.s32 %r100, %r136, 112640; 2026-02-21T09:50:04.1283346Z // begin inline asm 2026-02-21T09:50:04.1283455Z cp.async.cg.shared.global [ %r100 + 0 ], [ %rd36 + 0 ], 0x10, %r55; 2026-02-21T09:50:04.1283514Z // end inline asm 2026-02-21T09:50:04.1283573Z cp.async.commit_group; 2026-02-21T09:50:04.1283786Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1283854Z add.s32 %r137, %r5, %r123; 2026-02-21T09:50:04.1283911Z cvt.u64.u32 %rd1, %r137; 2026-02-21T09:50:04.1283968Z add.s32 %r138, %r4, %r116; 2026-02-21T09:50:04.1284030Z cvt.u64.u32 %rd2, %r138; 2026-02-21T09:50:04.1284089Z mov.pred %p81, 0; 2026-02-21T09:50:04.1284141Z mov.b32 %r938, 0; 2026-02-21T09:50:04.1284193Z mov.b32 %r937, 2; 2026-02-21T09:50:04.1284255Z mov.b32 %r936, -1; 2026-02-21T09:50:04.1284308Z mov.b64 %rd390, 0; 2026-02-21T09:50:04.1284364Z mov.b32 %r939, %r938; 2026-02-21T09:50:04.1284424Z bra.uni $L__BB0_6; 2026-02-21T09:50:04.1284520Z $L__BB0_8: // in Loop: Header=BB0_6 Depth=2 2026-02-21T09:50:04.1284726Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1284792Z setp.lt.u64 %p25, %rd390, 1856; 2026-02-21T09:50:04.1284973Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1285030Z add.s32 %r195, %r939, 1; 2026-02-21T09:50:04.1285089Z setp.eq.b32 %p26, %r195, 4; 2026-02-21T09:50:04.1285159Z selp.b32 %r939, 0, %r195, %p26; 2026-02-21T09:50:04.1285216Z selp.b32 %r196, 1, 0, %p26; 2026-02-21T09:50:04.1285273Z xor.b32 
%r938, %r938, %r196; 2026-02-21T09:50:04.1285454Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1285510Z add.s32 %r197, %r937, 1; 2026-02-21T09:50:04.1285568Z setp.gt.s32 %p27, %r197, 2; 2026-02-21T09:50:04.1285627Z selp.b32 %r937, 0, %r197, %p27; 2026-02-21T09:50:04.1285807Z .loc 1 55 59 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:59 2026-02-21T09:50:04.1285863Z add.s64 %rd97, %rd2, %rd390; 2026-02-21T09:50:04.1286037Z .loc 1 55 34 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:34 2026-02-21T09:50:04.1286101Z add.s64 %rd98, %rd1, %rd390; 2026-02-21T09:50:04.1286160Z cvt.u32.u64 %r198, %rd97; 2026-02-21T09:50:04.1286216Z add.s32 %r199, %r198, 192; 2026-02-21T09:50:04.1286288Z mad.wide.s32 %rd89, %r199, 2, %rd7; 2026-02-21T09:50:04.1286344Z add.s32 %r200, %r198, 32960; 2026-02-21T09:50:04.1286407Z mad.wide.s32 %rd90, %r200, 2, %rd7; 2026-02-21T09:50:04.1286462Z add.s32 %r201, %r198, 65728; 2026-02-21T09:50:04.1286531Z mad.wide.s32 %rd91, %r201, 2, %rd7; 2026-02-21T09:50:04.1286586Z add.s32 %r202, %r198, 98496; 2026-02-21T09:50:04.1286646Z mad.wide.s32 %rd92, %r202, 2, %rd7; 2026-02-21T09:50:04.1286710Z add.s32 %r203, %r198, 131264; 2026-02-21T09:50:04.1286798Z mad.wide.s32 %rd93, %r203, 2, %rd7; 2026-02-21T09:50:04.1286855Z add.s32 %r204, %r198, 164032; 2026-02-21T09:50:04.1286916Z mad.wide.s32 %rd94, %r204, 2, %rd7; 2026-02-21T09:50:04.1286979Z add.s32 %r205, %r198, 196800; 2026-02-21T09:50:04.1287040Z mad.wide.s32 %rd95, %r205, 2, %rd7; 2026-02-21T09:50:04.1287098Z cvt.u32.u64 %r206, %rd98; 2026-02-21T09:50:04.1287167Z mad.wide.s32 %rd96, %r206, 2, %rd7; 2026-02-21T09:50:04.1287376Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1287434Z shl.b32 %r207, %r937, 14; 2026-02-21T09:50:04.1287499Z add.s32 %r209, %r28, %r207; 2026-02-21T09:50:04.1287559Z add.s32 %r210, %r209, %r8; 2026-02-21T09:50:04.1287615Z bar.sync 2, 128; 
2026-02-21T09:50:04.1287672Z add.s32 %r179, %r210, 65536; 2026-02-21T09:50:04.1287740Z selp.b32 %r180, 16, 0, %p25; 2026-02-21T09:50:04.1287798Z // begin inline asm 2026-02-21T09:50:04.1287917Z cp.async.cg.shared.global [ %r179 + 0 ], [ %rd89 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1287984Z // end inline asm 2026-02-21T09:50:04.1288040Z add.s32 %r181, %r210, 67584; 2026-02-21T09:50:04.1288097Z // begin inline asm 2026-02-21T09:50:04.1288215Z cp.async.cg.shared.global [ %r181 + 0 ], [ %rd90 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1288303Z // end inline asm 2026-02-21T09:50:04.1288362Z add.s32 %r183, %r210, 69632; 2026-02-21T09:50:04.1288446Z // begin inline asm 2026-02-21T09:50:04.1288571Z cp.async.cg.shared.global [ %r183 + 0 ], [ %rd91 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1288626Z // end inline asm 2026-02-21T09:50:04.1288683Z add.s32 %r185, %r210, 71680; 2026-02-21T09:50:04.1288747Z // begin inline asm 2026-02-21T09:50:04.1288862Z cp.async.cg.shared.global [ %r185 + 0 ], [ %rd92 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1288916Z // end inline asm 2026-02-21T09:50:04.1288972Z add.s32 %r187, %r210, 73728; 2026-02-21T09:50:04.1289034Z // begin inline asm 2026-02-21T09:50:04.1289148Z cp.async.cg.shared.global [ %r187 + 0 ], [ %rd93 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1289204Z // end inline asm 2026-02-21T09:50:04.1289269Z add.s32 %r189, %r210, 75776; 2026-02-21T09:50:04.1289326Z // begin inline asm 2026-02-21T09:50:04.1289440Z cp.async.cg.shared.global [ %r189 + 0 ], [ %rd94 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1289495Z // end inline asm 2026-02-21T09:50:04.1289561Z add.s32 %r191, %r210, 77824; 2026-02-21T09:50:04.1289620Z // begin inline asm 2026-02-21T09:50:04.1289731Z cp.async.cg.shared.global [ %r191 + 0 ], [ %rd95 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1289793Z // end inline asm 2026-02-21T09:50:04.1289850Z add.s32 %r193, %r210, 79872; 2026-02-21T09:50:04.1289908Z // begin inline asm 2026-02-21T09:50:04.1290020Z cp.async.cg.shared.global [ %r193 + 0 ], 
[ %rd96 + 0 ], 0x10, %r180; 2026-02-21T09:50:04.1290085Z // end inline asm 2026-02-21T09:50:04.1290146Z cp.async.commit_group; 2026-02-21T09:50:04.1290332Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1290406Z add.s64 %rd4, %rd390, 64; 2026-02-21T09:50:04.1290472Z setp.lt.u64 %p28, %rd390, 1984; 2026-02-21T09:50:04.1290533Z mov.pred %p81, -1; 2026-02-21T09:50:04.1290598Z mov.b64 %rd390, %rd4; 2026-02-21T09:50:04.1290658Z @%p28 bra $L__BB0_6; 2026-02-21T09:50:04.1290716Z bra.uni $L__BB0_9; 2026-02-21T09:50:04.1290814Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:04.1290918Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:04.1290976Z add.s32 %r141, %r936, 1; 2026-02-21T09:50:04.1291036Z setp.gt.s32 %p11, %r141, 2; 2026-02-21T09:50:04.1291105Z selp.b32 %r936, 0, %r141, %p11; 2026-02-21T09:50:04.1291162Z shl.b32 %r142, %r939, 3; 2026-02-21T09:50:04.1291219Z add.s32 %r144, %r28, %r142; 2026-02-21T09:50:04.1291277Z add.s32 %r139, %r144, 114720; 2026-02-21T09:50:04.1291465Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1291555Z cp.async.wait_group 2; 2026-02-21T09:50:04.1291611Z bar.sync 2, 128; 2026-02-21T09:50:04.1291795Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1291851Z // begin inline asm 2026-02-21T09:50:04.1291904Z 2026-02-21T09:50:04.1291964Z { 2026-02-21T09:50:04.1292028Z .reg .pred complete; 2026-02-21T09:50:04.1292109Z waitLoop: 2026-02-21T09:50:04.1292233Z mbarrier.try_wait.parity.shared.b64 complete, [%r139], %r938; 2026-02-21T09:50:04.1292308Z @!complete bra.uni waitLoop; 2026-02-21T09:50:04.1292360Z } 2026-02-21T09:50:04.1292364Z 2026-02-21T09:50:04.1292419Z // end inline asm 2026-02-21T09:50:04.1292597Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1292673Z shfl.sync.idx.b32 %r145, %r7, 0, 31, -1; 2026-02-21T09:50:04.1292734Z 
setp.ne.b32 %p12, %r145, 0; 2026-02-21T09:50:04.1292800Z @%p12 bra $L__BB0_8; 2026-02-21T09:50:04.1292897Z // %bb.7: // in Loop: Header=BB0_6 Depth=2 2026-02-21T09:50:04.1292963Z setp.eq.b64 %p23, %rd390, 1984; 2026-02-21T09:50:04.1293133Z .loc 1 55 87 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:55:87 2026-02-21T09:50:04.1293222Z shl.b32 %r154, %r936, 14; 2026-02-21T09:50:04.1293302Z add.s32 %r156, %r28, %r154; 2026-02-21T09:50:04.1293361Z add.s32 %r157, %r156, 65536; 2026-02-21T09:50:04.1293538Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1293595Z shl.b32 %r158, %r939, 14; 2026-02-21T09:50:04.1293651Z add.s32 %r159, %r28, %r158; 2026-02-21T09:50:04.1293831Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1293890Z add.s32 %r162, %r144, 114688; 2026-02-21T09:50:04.1294060Z .loc 1 56 52 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:56:52 2026-02-21T09:50:04.1294127Z elect.sync %r163|%p14, -1; 2026-02-21T09:50:04.1294194Z bfe.u32 %r164, %r159, 4, 14; 2026-02-21T09:50:04.1294253Z cvt.u64.u32 %rd81, %r164; 2026-02-21T09:50:04.1294322Z or.b64 %rd71, %rd81, 4611686293372403712; 2026-02-21T09:50:04.1294390Z bfe.u32 %r165, %r157, 4, 14; 2026-02-21T09:50:04.1294459Z cvt.u64.u32 %rd82, %r165; 2026-02-21T09:50:04.1294527Z or.b64 %rd72, %rd82, 4611686293372403712; 2026-02-21T09:50:04.1294588Z mov.b32 %r147, 136314896; 2026-02-21T09:50:04.1294643Z // begin inline asm 2026-02-21T09:50:04.1294809Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r146 + 0 ], %rd71, %rd72, %r147, %p81; 2026-02-21T09:50:04.1294861Z // end inline asm 2026-02-21T09:50:04.1294924Z add.s32 %r166, %r159, 32; 2026-02-21T09:50:04.1294978Z bfe.u32 %r167, %r166, 4, 14; 2026-02-21T09:50:04.1295033Z cvt.u64.u32 %rd83, %r167; 2026-02-21T09:50:04.1295101Z or.b64 %rd73, %rd83, 4611686293372403712; 2026-02-21T09:50:04.1295158Z add.s32 %r168, %r156, 65568; 2026-02-21T09:50:04.1295213Z 
bfe.u32 %r169, %r168, 4, 14; 2026-02-21T09:50:04.1295268Z cvt.u64.u32 %rd84, %r169; 2026-02-21T09:50:04.1295335Z or.b64 %rd74, %rd84, 4611686293372403712; 2026-02-21T09:50:04.1295392Z mov.pred %p15, -1; 2026-02-21T09:50:04.1295444Z // begin inline asm 2026-02-21T09:50:04.1295584Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r146 + 0 ], %rd73, %rd74, %r147, %p15; 2026-02-21T09:50:04.1295638Z // end inline asm 2026-02-21T09:50:04.1295692Z add.s32 %r170, %r159, 64; 2026-02-21T09:50:04.1295753Z bfe.u32 %r171, %r170, 4, 14; 2026-02-21T09:50:04.1295809Z cvt.u64.u32 %rd85, %r171; 2026-02-21T09:50:04.1295871Z or.b64 %rd75, %rd85, 4611686293372403712; 2026-02-21T09:50:04.1295925Z add.s32 %r172, %r156, 65600; 2026-02-21T09:50:04.1295987Z bfe.u32 %r173, %r172, 4, 14; 2026-02-21T09:50:04.1296043Z cvt.u64.u32 %rd86, %r173; 2026-02-21T09:50:04.1296104Z or.b64 %rd76, %rd86, 4611686293372403712; 2026-02-21T09:50:04.1296165Z // begin inline asm 2026-02-21T09:50:04.1296329Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r146 + 0 ], %rd75, %rd76, %r147, %p15; 2026-02-21T09:50:04.1296383Z // end inline asm 2026-02-21T09:50:04.1296438Z add.s32 %r174, %r159, 96; 2026-02-21T09:50:04.1296501Z bfe.u32 %r175, %r174, 4, 14; 2026-02-21T09:50:04.1296556Z cvt.u64.u32 %rd87, %r175; 2026-02-21T09:50:04.1296617Z or.b64 %rd77, %rd87, 4611686293372403712; 2026-02-21T09:50:04.1296708Z add.s32 %r176, %r156, 65632; 2026-02-21T09:50:04.1296764Z bfe.u32 %r177, %r176, 4, 14; 2026-02-21T09:50:04.1296820Z cvt.u64.u32 %rd88, %r177; 2026-02-21T09:50:04.1296889Z or.b64 %rd78, %rd88, 4611686293372403712; 2026-02-21T09:50:04.1296943Z // begin inline asm 2026-02-21T09:50:04.1297066Z @%p14 tcgen05.mma.cta_group::1.kind::f16 [ %r146 + 0 ], %rd77, %rd78, %r147, %p15; 2026-02-21T09:50:04.1297119Z // end inline asm 2026-02-21T09:50:04.1297182Z cvt.u64.u32 %rd79, %r162; 2026-02-21T09:50:04.1297235Z // begin inline asm 2026-02-21T09:50:04.1297355Z @%p14 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd79]; 
2026-02-21T09:50:04.1297416Z // end inline asm 2026-02-21T09:50:04.1297479Z and.pred %p22, %p23, %p14; 2026-02-21T09:50:04.1297534Z add.s32 %r178, %r28, 114752; 2026-02-21T09:50:04.1297589Z cvt.u64.u32 %rd80, %r178; 2026-02-21T09:50:04.1297651Z // begin inline asm 2026-02-21T09:50:04.1297819Z @%p22 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd80]; 2026-02-21T09:50:04.1297878Z // end inline asm 2026-02-21T09:50:04.1297942Z bra.uni $L__BB0_8; 2026-02-21T09:50:04.1298043Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1298216Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1298295Z ld.shared.b32 %r45, [global_smem+65544]; 2026-02-21T09:50:04.1298350Z barrier.sync 1; 2026-02-21T09:50:04.1298512Z .loc 1 21 67 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:21:67 2026-02-21T09:50:04.1298578Z mov.u32 %r32, %ctaid.x; 2026-02-21T09:50:04.1298634Z mov.u32 %r33, %ctaid.y; 2026-02-21T09:50:04.1298688Z mov.u32 %r34, %ctaid.z; 2026-02-21T09:50:04.1298744Z mov.u32 %r35, %nctaid.x; 2026-02-21T09:50:04.1298807Z mov.u32 %r36, %nctaid.y; 2026-02-21T09:50:04.1298870Z mad.lo.s32 %r37, %r34, %r36, %r33; 2026-02-21T09:50:04.1298930Z mad.lo.s32 %r38, %r37, %r35, %r32; 2026-02-21T09:50:04.1298994Z shl.b32 %r39, %r38, 7; 2026-02-21T09:50:04.1299050Z cvt.s64.s32 %rd10, %r39; 2026-02-21T09:50:04.1299108Z add.s64 %rd11, %rd9, %rd10; 2026-02-21T09:50:04.1299167Z cvta.global.u64 %rd12, %rd11; 2026-02-21T09:50:04.1299231Z add.s32 %r18, %r1, -256; 2026-02-21T09:50:04.1299283Z mov.b32 %r941, 0; 2026-02-21T09:50:04.1299338Z mov.b32 %r940, -64; 2026-02-21T09:50:04.1299400Z mov.b32 %r942, %r941; 2026-02-21T09:50:04.1299496Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:04.1299584Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:04.1299750Z .loc 1 0 67 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:0:67 2026-02-21T09:50:04.1299819Z setp.lt.u32 %p6, %r18, 32; 
2026-02-21T09:50:04.1299876Z setp.eq.b32 %p4, %r18, 0; 2026-02-21T09:50:04.1300048Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1300111Z add.s32 %r940, %r940, 64; 2026-02-21T09:50:04.1300165Z shl.b32 %r47, %r942, 3; 2026-02-21T09:50:04.1300219Z add.s32 %r49, %r28, %r47; 2026-02-21T09:50:04.1300282Z add.s32 %r40, %r49, 114688; 2026-02-21T09:50:04.1300445Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1300499Z // begin inline asm 2026-02-21T09:50:04.1300547Z 2026-02-21T09:50:04.1300604Z { 2026-02-21T09:50:04.1300663Z .reg .pred complete; 2026-02-21T09:50:04.1300715Z waitLoop: 2026-02-21T09:50:04.1300835Z mbarrier.try_wait.parity.shared.b64 complete, [%r40], %r941; 2026-02-21T09:50:04.1300926Z @!complete bra.uni waitLoop; 2026-02-21T09:50:04.1300973Z } 2026-02-21T09:50:04.1300977Z 2026-02-21T09:50:04.1301031Z // end inline asm 2026-02-21T09:50:04.1301212Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1301269Z add.s32 %r46, %r49, 114720; 2026-02-21T09:50:04.1301438Z .loc 1 54 31 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:54:31 2026-02-21T09:50:04.1301522Z bar.sync 3, 64; 2026-02-21T09:50:04.1301577Z // begin inline asm 2026-02-21T09:50:04.1301682Z @%p4 mbarrier.arrive.expect_tx.shared.b64 _, [%r46], 16384; 2026-02-21T09:50:04.1301740Z // end inline asm 2026-02-21T09:50:04.1301796Z shl.b32 %r50, %r942, 14; 2026-02-21T09:50:04.1301853Z add.s32 %r43, %r28, %r50; 2026-02-21T09:50:04.1301906Z bar.sync 3, 64; 2026-02-21T09:50:04.1301973Z elect.sync %r51|%p7, -1; 2026-02-21T09:50:04.1302034Z and.pred %p5, %p6, %p7; 2026-02-21T09:50:04.1302088Z // begin inline asm 2026-02-21T09:50:04.1302334Z @%p5 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r43], [%rd12, {%r940, %r45}], [%r46]; 2026-02-21T09:50:04.1302387Z // end inline asm 2026-02-21T09:50:04.1302441Z add.s32 %r52, 
%r942, 1; 2026-02-21T09:50:04.1302532Z setp.eq.b32 %p8, %r52, 4; 2026-02-21T09:50:04.1302614Z selp.b32 %r942, 0, %r52, %p8; 2026-02-21T09:50:04.1302672Z selp.b32 %r53, 1, 0, %p8; 2026-02-21T09:50:04.1302727Z xor.b32 %r941, %r941, %r53; 2026-02-21T09:50:04.1302906Z .loc 1 49 111 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:49:111 2026-02-21T09:50:04.1302967Z setp.lt.u32 %p9, %r940, 1984; 2026-02-21T09:50:04.1303024Z @%p9 bra $L__BB0_11; 2026-02-21T09:50:04.1303123Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1303178Z barrier.sync 1; 2026-02-21T09:50:04.1303231Z bra.uni $L__BB0_2; 2026-02-21T09:50:04.1303328Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1303387Z cp.async.wait_group 0; 2026-02-21T09:50:04.1303441Z bar.sync 2, 128; 2026-02-21T09:50:04.1303494Z barrier.sync 1; 2026-02-21T09:50:04.1303554Z bra.uni $L__BB0_2; 2026-02-21T09:50:04.1303644Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:04.1303805Z .loc 1 19 0 // cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py:19 2026-02-21T09:50:04.1303867Z barrier.sync 1; 2026-02-21T09:50:04.1303921Z barrier.sync 1; 2026-02-21T09:50:04.1303973Z bra.uni $L__BB0_2; 2026-02-21T09:50:04.1304026Z $L__tmp1: 2026-02-21T09:50:04.1304087Z $L__func_end0: 2026-02-21T09:50:04.1304164Z // -- End function 2026-02-21T09:50:04.1304214Z } 2026-02-21T09:50:04.1304432Z .file 1 "/tmp/torchinductor_root/up/cup6jzszkqeabdk4htloylrskdphgeczlsfkfn4goatv65ayczji.py" 2026-02-21T09:50:04.1304492Z .section .debug_abbrev 2026-02-21T09:50:04.1304543Z { 2026-02-21T09:50:04.1304628Z .b8 1 // Abbreviation Code 2026-02-21T09:50:04.1304770Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:50:04.1304848Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:50:04.1304926Z .b8 37 // DW_AT_producer 2026-02-21T09:50:04.1305006Z .b8 8 // DW_FORM_string 2026-02-21T09:50:04.1305079Z .b8 19 // DW_AT_language 2026-02-21T09:50:04.1305154Z .b8 5 // DW_FORM_data2 2026-02-21T09:50:04.1305233Z .b8 3 // DW_AT_name 
2026-02-21T09:50:04.1305304Z .b8 8 // DW_FORM_string 2026-02-21T09:50:04.1305380Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:50:04.1305463Z .b8 6 // DW_FORM_data4 2026-02-21T09:50:04.1305573Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:50:04.1305644Z .b8 8 // DW_FORM_string 2026-02-21T09:50:04.1305714Z .b8 0 // EOM(1) 2026-02-21T09:50:04.1305792Z .b8 0 // EOM(2) 2026-02-21T09:50:04.1305870Z .b8 0 // EOM(3) 2026-02-21T09:50:04.1305922Z } 2026-02-21T09:50:04.1306021Z .section .debug_info 2026-02-21T09:50:04.1306071Z { 2026-02-21T09:50:04.1306151Z .b32 104 // Length of Unit 2026-02-21T09:50:04.1306233Z .b8 2 // DWARF version number 2026-02-21T09:50:04.1306290Z .b8 0 2026-02-21T09:50:04.1306404Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:50:04.1306490Z .b8 8 // Address Size (in bytes) 2026-02-21T09:50:04.1306595Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:50:04.1306672Z .b8 116 // DW_AT_producer 2026-02-21T09:50:04.1306723Z .b8 114 2026-02-21T09:50:04.1306780Z .b8 105 2026-02-21T09:50:04.1306829Z .b8 116 2026-02-21T09:50:04.1306877Z .b8 111 2026-02-21T09:50:04.1306926Z .b8 110 2026-02-21T09:50:04.1306981Z .b8 0 2026-02-21T09:50:04.1307080Z .b8 2 // DW_AT_language 2026-02-21T09:50:04.1307155Z .b8 0 2026-02-21T09:50:04.1307239Z .b8 99 // DW_AT_name 2026-02-21T09:50:04.1307289Z .b8 117 2026-02-21T09:50:04.1307338Z .b8 112 2026-02-21T09:50:04.1307386Z .b8 54 2026-02-21T09:50:04.1307442Z .b8 106 2026-02-21T09:50:04.1307492Z .b8 122 2026-02-21T09:50:04.1307541Z .b8 115 2026-02-21T09:50:04.1307596Z .b8 122 2026-02-21T09:50:04.1307645Z .b8 107 2026-02-21T09:50:04.1307694Z .b8 113 2026-02-21T09:50:04.1307742Z .b8 101 2026-02-21T09:50:04.1307801Z .b8 97 2026-02-21T09:50:04.1307850Z .b8 98 2026-02-21T09:50:04.1307899Z .b8 100 2026-02-21T09:50:04.1307947Z .b8 107 2026-02-21T09:50:04.1308004Z .b8 52 2026-02-21T09:50:04.1308052Z .b8 104 2026-02-21T09:50:04.1308101Z .b8 116 2026-02-21T09:50:04.1308157Z .b8 108 2026-02-21T09:50:04.1308205Z .b8 111 
2026-02-21T09:50:04.1308253Z .b8 121 2026-02-21T09:50:04.1308301Z .b8 108 2026-02-21T09:50:04.1308358Z .b8 114 2026-02-21T09:50:04.1308407Z .b8 115 2026-02-21T09:50:04.1308456Z .b8 107 2026-02-21T09:50:04.1308511Z .b8 100 2026-02-21T09:50:04.1308563Z .b8 112 2026-02-21T09:50:04.1308611Z .b8 104 2026-02-21T09:50:04.1308659Z .b8 103 2026-02-21T09:50:04.1308715Z .b8 101 2026-02-21T09:50:04.1308764Z .b8 99 2026-02-21T09:50:04.1308810Z .b8 122 2026-02-21T09:50:04.1308865Z .b8 108 2026-02-21T09:50:04.1308914Z .b8 115 2026-02-21T09:50:04.1308962Z .b8 102 2026-02-21T09:50:04.1309010Z .b8 107 2026-02-21T09:50:04.1309065Z .b8 102 2026-02-21T09:50:04.1309113Z .b8 110 2026-02-21T09:50:04.1309162Z .b8 52 2026-02-21T09:50:04.1309209Z .b8 103 2026-02-21T09:50:04.1309264Z .b8 111 2026-02-21T09:50:04.1309312Z .b8 97 2026-02-21T09:50:04.1309361Z .b8 116 2026-02-21T09:50:04.1309416Z .b8 118 2026-02-21T09:50:04.1309481Z .b8 54 2026-02-21T09:50:04.1309529Z .b8 53 2026-02-21T09:50:04.1309576Z .b8 97 2026-02-21T09:50:04.1309632Z .b8 121 2026-02-21T09:50:04.1309680Z .b8 99 2026-02-21T09:50:04.1309728Z .b8 122 2026-02-21T09:50:04.1309783Z .b8 106 2026-02-21T09:50:04.1309831Z .b8 105 2026-02-21T09:50:04.1309880Z .b8 46 2026-02-21T09:50:04.1309927Z .b8 112 2026-02-21T09:50:04.1309987Z .b8 121 2026-02-21T09:50:04.1310035Z .b8 0 2026-02-21T09:50:04.1310123Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:50:04.1310201Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:50:04.1310250Z .b8 116 2026-02-21T09:50:04.1310298Z .b8 109 2026-02-21T09:50:04.1310346Z .b8 112 2026-02-21T09:50:04.1310402Z .b8 47 2026-02-21T09:50:04.1310451Z .b8 116 2026-02-21T09:50:04.1310498Z .b8 111 2026-02-21T09:50:04.1310547Z .b8 114 2026-02-21T09:50:04.1310603Z .b8 99 2026-02-21T09:50:04.1310651Z .b8 104 2026-02-21T09:50:04.1310699Z .b8 105 2026-02-21T09:50:04.1310782Z .b8 110 2026-02-21T09:50:04.1310831Z .b8 100 2026-02-21T09:50:04.1310879Z .b8 117 2026-02-21T09:50:04.1310926Z .b8 99 2026-02-21T09:50:04.1310983Z .b8 116 
2026-02-21T09:50:04.1311032Z .b8 111 2026-02-21T09:50:04.1311080Z .b8 114 2026-02-21T09:50:04.1311137Z .b8 95 2026-02-21T09:50:04.1311186Z .b8 114 2026-02-21T09:50:04.1311239Z .b8 111 2026-02-21T09:50:04.1311289Z .b8 111 2026-02-21T09:50:04.1311376Z .b8 116 2026-02-21T09:50:04.1311428Z .b8 47 2026-02-21T09:50:04.1311479Z .b8 117 2026-02-21T09:50:04.1311535Z .b8 112 2026-02-21T09:50:04.1311584Z .b8 0 2026-02-21T09:50:04.1311632Z } 2026-02-21T09:50:04.1311696Z .section .debug_macinfo { } 2026-02-21T09:50:04.1311700Z 2026-02-21T09:50:04.1311783Z ================================================================ 2026-02-21T09:50:04.1311883Z please share the reproducer above with Triton project. 2026-02-21T09:50:08.1274068Z 2026-02-21T09:50:08.1278711Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 91/91 21.3 configs/s 2026-02-21T09:50:16.9060593Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 114.2 2026-02-21T09:50:16.9064511Z configs/s 2026-02-21T09:50:17.1853984Z [234s] Generation 7 complete: 2026-02-21T09:50:17.1858234Z error=27 2026-02-21T09:50:17.1860152Z ok=69 2026-02-21T09:50:17.1860338Z min=0.1189 2026-02-21T09:50:17.1860575Z mid=0.1443 2026-02-21T09:50:17.1860714Z max=19.4422 2026-02-21T09:50:17.1860874Z best={'block_sizes': [128, 128, 64], 2026-02-21T09:50:17.1861141Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T09:50:17.1861396Z 'l2_groupings': [16], 2026-02-21T09:50:17.1861566Z 'load_eviction_policies': ['', ''], 2026-02-21T09:50:17.1861753Z 'loop_orders': [[1, 0]], 2026-02-21T09:50:17.1861908Z 'maxnreg': 64, 2026-02-21T09:50:17.1862047Z 'num_sm_multiplier': 8, 2026-02-21T09:50:17.1862202Z 'num_stages': 2, 2026-02-21T09:50:17.1862334Z 'num_warps': 1, 2026-02-21T09:50:17.1862496Z 'pid_type': 'persistent_blocked', 2026-02-21T09:50:17.1862676Z 'range_flattens': [False, False], 2026-02-21T09:50:17.1862853Z 'range_multi_buffers': [False, False], 2026-02-21T09:50:17.1863032Z 'range_num_stages': [0, 
0], 2026-02-21T09:50:17.1863201Z 'range_unroll_factors': [0, 0], 2026-02-21T09:50:17.1863382Z 'range_warp_specializes': [None, True]} 2026-02-21T09:50:17.1881004Z [234s] Fitting surrogate: 770 points, 770 targets 2026-02-21T09:50:19.1874000Z [236s] Generation 8 starting: 85 neighbors, 5 active search path(s) 2026-02-21T09:50:27.8373636Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86/86 5.5 configs/s 2026-02-21T09:50:29.9243822Z 2026-02-21T09:50:29.9245610Z 2026-02-21T09:50:29.9246002Z ================================================================ 2026-02-21T09:50:29.9246290Z Internal Triton PTX codegen error 2026-02-21T09:50:29.9246482Z `ptxas` stderr: 2026-02-21T09:50:29.9246941Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:50:29.9247486Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:29.9247644Z 2026-02-21T09:50:29.9248096Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpa8v8c48t.ptx -o /tmp/tmpa8v8c48t.ptx.o 2026-02-21T09:50:29.9248589Z 2026-02-21T09:50:29.9248593Z 2026-02-21T09:50:29.9248651Z // 2026-02-21T09:50:29.9248803Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:50:29.9248975Z // 2026-02-21T09:50:29.9249049Z 2026-02-21T09:50:29.9249106Z .version 8.7 2026-02-21T09:50:29.9249241Z .target sm_100a 2026-02-21T09:50:29.9249385Z .address_size 64 2026-02-21T09:50:29.9249469Z 2026-02-21T09:50:29.9249602Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:50:29.9249871Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:50:29.9250108Z // @_helion_matmul 2026-02-21T09:50:29.9250616Z .visible .entry _helion_matmul( 2026-02-21T09:50:29.9250853Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:50:29.9251114Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T09:50:29.9251378Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:50:29.9251861Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:50:29.9252206Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:50:29.9252423Z ) 2026-02-21T09:50:29.9252542Z .reqntid 256 2026-02-21T09:50:29.9252685Z .maxnreg 32 2026-02-21T09:50:29.9252807Z { 2026-02-21T09:50:29.9252942Z .reg .pred %p<142>; 2026-02-21T09:50:29.9253095Z .reg .b32 %r<1699>; 2026-02-21T09:50:29.9253243Z .reg .b64 %rd<651>; 2026-02-21T09:50:29.9253521Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19:0 2026-02-21T09:50:29.9253823Z $L__func_begin0: 2026-02-21T09:50:29.9254073Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19:0 2026-02-21T09:50:29.9254297Z 2026-02-21T09:50:29.9254346Z // %bb.0: 2026-02-21T09:50:29.9254500Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:50:29.9255421Z $L__tmp0: 2026-02-21T09:50:29.9255762Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19 2026-02-21T09:50:29.9256131Z mov.u32 %r1, %tid.x; 2026-02-21T09:50:29.9256276Z shr.u32 %r2, %r1, 5; 2026-02-21T09:50:29.9256437Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:50:29.9256615Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:50:29.9256769Z @%p3 bra $L__BB0_16; 2026-02-21T09:50:29.9256904Z bra.uni $L__BB0_1; 2026-02-21T09:50:29.9257046Z $L__BB0_16: 2026-02-21T09:50:29.9257269Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:0 2026-02-21T09:50:29.9257580Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:50:29.9257785Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:50:29.9258075Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19 2026-02-21T09:50:29.9258380Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:29.9258564Z setp.lt.u32 %p59, %r1, 32; 2026-02-21T09:50:29.9258731Z mov.b32 %r280, 
global_smem; 2026-02-21T09:50:29.9258899Z // begin inline asm 2026-02-21T09:50:29.9259158Z @%p59 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r280], 512; 2026-02-21T09:50:29.9259411Z // end inline asm 2026-02-21T09:50:29.9259561Z bar.sync 0, 128; 2026-02-21T09:50:29.9259721Z ld.shared.b32 %r1665, [global_smem]; 2026-02-21T09:50:29.9259892Z bar.sync 0, 128; 2026-02-21T09:50:29.9260028Z // begin inline asm 2026-02-21T09:50:29.9260227Z @%p59 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:50:29.9260453Z // end inline asm 2026-02-21T09:50:29.9260723Z .loc 1 21 67 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:21:67 2026-02-21T09:50:29.9261015Z mov.u32 %r582, %ctaid.x; 2026-02-21T09:50:29.9261164Z mov.u32 %r583, %ctaid.y; 2026-02-21T09:50:29.9261317Z mov.u32 %r584, %ctaid.z; 2026-02-21T09:50:29.9261462Z mov.u32 %r585, %nctaid.x; 2026-02-21T09:50:29.9261619Z mov.u32 %r586, %nctaid.y; 2026-02-21T09:50:29.9261773Z mad.lo.s32 %r587, %r584, %r586, %r583; 2026-02-21T09:50:29.9261954Z mad.lo.s32 %r588, %r587, %r585, %r582; 2026-02-21T09:50:29.9262118Z shl.b32 %r589, %r588, 8; 2026-02-21T09:50:29.9262274Z cvt.s64.s32 %rd106, %r589; 2026-02-21T09:50:29.9262428Z add.s64 %rd85, %rd6, %rd106; 2026-02-21T09:50:29.9262586Z shl.b32 %r590, %r1, 2; 2026-02-21T09:50:29.9262737Z add.s32 %r281, %r280, %r590; 2026-02-21T09:50:29.9262885Z mov.b32 %r1696, 0; 2026-02-21T09:50:29.9263025Z // begin inline asm 2026-02-21T09:50:29.9263174Z @%p59 st.shared.b32 [ %r281 + 0 ], %r1696; 2026-02-21T09:50:29.9263348Z // end inline asm 2026-02-21T09:50:29.9263483Z bar.warp.sync -1; 2026-02-21T09:50:29.9263684Z setp.eq.b32 %p132, %r1, 0; 2026-02-21T09:50:29.9263838Z cvt.u64.u32 %rd70, %r280; 2026-02-21T09:50:29.9263996Z // begin inline asm 2026-02-21T09:50:29.9264251Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd70 + 0 ], %rd3; 2026-02-21T09:50:29.9264527Z // end inline asm 2026-02-21T09:50:29.9264666Z // begin inline asm 
2026-02-21T09:50:29.9264915Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1; 2026-02-21T09:50:29.9265211Z // end inline asm 2026-02-21T09:50:29.9265340Z mov.b32 %r283, 64; 2026-02-21T09:50:29.9265479Z // begin inline asm 2026-02-21T09:50:29.9265714Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r283; 2026-02-21T09:50:29.9265988Z // end inline asm 2026-02-21T09:50:29.9266125Z mov.b32 %r284, 256; 2026-02-21T09:50:29.9266261Z // begin inline asm 2026-02-21T09:50:29.9266492Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r284; 2026-02-21T09:50:29.9266748Z // end inline asm 2026-02-21T09:50:29.9266883Z mov.b32 %r285, 2048; 2026-02-21T09:50:29.9267019Z // begin inline asm 2026-02-21T09:50:29.9267263Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r285; 2026-02-21T09:50:29.9267538Z // end inline asm 2026-02-21T09:50:29.9267696Z // begin inline asm 2026-02-21T09:50:29.9267970Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r285; 2026-02-21T09:50:29.9268238Z // end inline asm 2026-02-21T09:50:29.9268374Z mov.b64 %rd78, 4096; 2026-02-21T09:50:29.9268510Z // begin inline asm 2026-02-21T09:50:29.9268761Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd70 + 0 ], 0x0, %rd78; 2026-02-21T09:50:29.9269046Z // end inline asm 2026-02-21T09:50:29.9269183Z mov.b32 %r287, 1; 2026-02-21T09:50:29.9269324Z // begin inline asm 2026-02-21T09:50:29.9269579Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r287; 2026-02-21T09:50:29.9269873Z // end inline asm 2026-02-21T09:50:29.9270004Z // begin inline asm 2026-02-21T09:50:29.9270262Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r287; 2026-02-21T09:50:29.9270539Z // end inline asm 2026-02-21T09:50:29.9270674Z // begin inline asm 2026-02-21T09:50:29.9270912Z @%p132 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x6; 2026-02-21T09:50:29.9271172Z // end inline asm 2026-02-21T09:50:29.9271306Z // begin inline asm 2026-02-21T09:50:29.9271553Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0; 2026-02-21T09:50:29.9271841Z // end inline asm 2026-02-21T09:50:29.9271970Z // begin inline asm 2026-02-21T09:50:29.9272209Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x3; 2026-02-21T09:50:29.9272469Z // end inline asm 2026-02-21T09:50:29.9272608Z // begin inline asm 2026-02-21T09:50:29.9272838Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0; 2026-02-21T09:50:29.9273090Z // end inline asm 2026-02-21T09:50:29.9273227Z // begin inline asm 2026-02-21T09:50:29.9273564Z @%p59 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd85 + 0 ], [ %rd70 + 0 ], 0x80; 2026-02-21T09:50:29.9273946Z // end inline asm 2026-02-21T09:50:29.9274077Z // begin inline asm 2026-02-21T09:50:29.9274295Z @%p59 fence.proxy.tensormap::generic.acquire.gpu [ %rd85 + 0 ], 0x80; 2026-02-21T09:50:29.9274547Z @%p59 cp.async.bulk.commit_group ; 2026-02-21T09:50:29.9274766Z @%p59 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:50:29.9274946Z // end inline asm 2026-02-21T09:50:29.9275072Z bar.sync 0, 128; 2026-02-21T09:50:29.9275314Z .loc 1 22 68 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:22:68 2026-02-21T09:50:29.9275605Z add.s64 %rd103, %rd85, 128; 2026-02-21T09:50:29.9275761Z bar.sync 0, 128; 2026-02-21T09:50:29.9275888Z // begin inline asm 2026-02-21T09:50:29.9276076Z @%p59 st.shared.b32 [ %r281 + 0 ], %r1696; 2026-02-21T09:50:29.9276254Z // end inline asm 2026-02-21T09:50:29.9276389Z bar.warp.sync -1; 2026-02-21T09:50:29.9276529Z // begin inline asm 2026-02-21T09:50:29.9276766Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd70 + 0 ], %rd4; 
2026-02-21T09:50:29.9277036Z // end inline asm 2026-02-21T09:50:29.9277165Z // begin inline asm 2026-02-21T09:50:29.9277469Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1; 2026-02-21T09:50:29.9277711Z // end inline asm 2026-02-21T09:50:29.9277846Z // begin inline asm 2026-02-21T09:50:29.9278082Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r283; 2026-02-21T09:50:29.9278346Z // end inline asm 2026-02-21T09:50:29.9278483Z mov.b32 %r292, 128; 2026-02-21T09:50:29.9278620Z // begin inline asm 2026-02-21T09:50:29.9278852Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r292; 2026-02-21T09:50:29.9279112Z // end inline asm 2026-02-21T09:50:29.9279251Z // begin inline asm 2026-02-21T09:50:29.9279494Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r285; 2026-02-21T09:50:29.9279763Z // end inline asm 2026-02-21T09:50:29.9279901Z mov.b32 %r294, 12288; 2026-02-21T09:50:29.9280072Z // begin inline asm 2026-02-21T09:50:29.9280349Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r294; 2026-02-21T09:50:29.9280620Z // end inline asm 2026-02-21T09:50:29.9280760Z // begin inline asm 2026-02-21T09:50:29.9281006Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd70 + 0 ], 0x0, %rd78; 2026-02-21T09:50:29.9281290Z // end inline asm 2026-02-21T09:50:29.9281425Z // begin inline asm 2026-02-21T09:50:29.9281671Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0, %r287; 2026-02-21T09:50:29.9281957Z // end inline asm 2026-02-21T09:50:29.9282088Z // begin inline asm 2026-02-21T09:50:29.9282346Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x1, %r287; 2026-02-21T09:50:29.9282618Z // end inline asm 2026-02-21T09:50:29.9282755Z // begin inline asm 2026-02-21T09:50:29.9282985Z @%p132 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ 
%rd70 + 0 ], 0x6; 2026-02-21T09:50:29.9283237Z // end inline asm 2026-02-21T09:50:29.9283372Z // begin inline asm 2026-02-21T09:50:29.9283616Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0; 2026-02-21T09:50:29.9283896Z // end inline asm 2026-02-21T09:50:29.9284024Z // begin inline asm 2026-02-21T09:50:29.9284259Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x3; 2026-02-21T09:50:29.9284522Z // end inline asm 2026-02-21T09:50:29.9284649Z // begin inline asm 2026-02-21T09:50:29.9284901Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd70 + 0 ], 0x0; 2026-02-21T09:50:29.9285150Z // end inline asm 2026-02-21T09:50:29.9285285Z // begin inline asm 2026-02-21T09:50:29.9285614Z @%p59 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd103 + 0 ], [ %rd70 + 0 ], 0x80; 2026-02-21T09:50:29.9285996Z // end inline asm 2026-02-21T09:50:29.9286132Z // begin inline asm 2026-02-21T09:50:29.9286337Z @%p59 fence.proxy.tensormap::generic.acquire.gpu [ %rd103 + 0 ], 0x80; 2026-02-21T09:50:29.9286591Z @%p59 cp.async.bulk.commit_group ; 2026-02-21T09:50:29.9286772Z @%p59 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:50:29.9286950Z // end inline asm 2026-02-21T09:50:29.9287080Z bar.sync 0, 128; 2026-02-21T09:50:29.9287327Z .loc 1 29 35 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:29:35 2026-02-21T09:50:29.9287637Z mul.lo.s32 %r44, %r582, 3; 2026-02-21T09:50:29.9287915Z .loc 1 30 37 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:30:37 2026-02-21T09:50:29.9288212Z add.s32 %r591, %r44, 3; 2026-02-21T09:50:29.9288507Z .loc 1 30 49 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:30:49 2026-02-21T09:50:29.9288803Z min.s32 %r592, %r591, 768; 2026-02-21T09:50:29.9289065Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9289369Z sub.s32 %r595, %r592, %r44; 
2026-02-21T09:50:29.9289533Z shl.b32 %r1685, %r595, 4; 2026-02-21T09:50:29.9289832Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9290150Z shfl.sync.idx.b32 %r596, %r2, 0, 31, -1; 2026-02-21T09:50:29.9290335Z shl.b32 %r597, %r596, 21; 2026-02-21T09:50:29.9290501Z and.b32 %r598, %r597, 6291456; 2026-02-21T09:50:29.9290667Z add.s32 %r297, %r598, %r1665; 2026-02-21T09:50:29.9290838Z mov.pred %p97, -1; 2026-02-21T09:50:29.9290984Z // begin inline asm 2026-02-21T09:50:29.9291403Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 0], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9291850Z // end inline asm 2026-02-21T09:50:29.9291990Z // begin inline asm 2026-02-21T09:50:29.9292446Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 16], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9292927Z // end inline asm 2026-02-21T09:50:29.9293071Z // begin inline asm 2026-02-21T09:50:29.9293452Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 32], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9293878Z // end inline asm 2026-02-21T09:50:29.9294019Z // begin inline asm 2026-02-21T09:50:29.9294399Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 48], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9294861Z // end inline asm 2026-02-21T09:50:29.9294995Z // begin inline asm 2026-02-21T09:50:29.9295383Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 64], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 
2026-02-21T09:50:29.9295799Z // end inline asm 2026-02-21T09:50:29.9295931Z // begin inline asm 2026-02-21T09:50:29.9296313Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 80], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9296711Z // end inline asm 2026-02-21T09:50:29.9296846Z // begin inline asm 2026-02-21T09:50:29.9297217Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 96], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9297627Z // end inline asm 2026-02-21T09:50:29.9297766Z // begin inline asm 2026-02-21T09:50:29.9298135Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 112], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9298555Z // end inline asm 2026-02-21T09:50:29.9298685Z // begin inline asm 2026-02-21T09:50:29.9299062Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 128], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9299480Z // end inline asm 2026-02-21T09:50:29.9299610Z // begin inline asm 2026-02-21T09:50:29.9299984Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 144], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9300385Z // end inline asm 2026-02-21T09:50:29.9300520Z // begin inline asm 2026-02-21T09:50:29.9300913Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 160], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9301320Z // end inline asm 2026-02-21T09:50:29.9301454Z // begin inline asm 
2026-02-21T09:50:29.9301824Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 176], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9302281Z // end inline asm 2026-02-21T09:50:29.9302409Z // begin inline asm 2026-02-21T09:50:29.9302780Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 192], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9303184Z // end inline asm 2026-02-21T09:50:29.9303320Z // begin inline asm 2026-02-21T09:50:29.9303697Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 208], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9304118Z // end inline asm 2026-02-21T09:50:29.9304253Z // begin inline asm 2026-02-21T09:50:29.9304703Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 224], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9305131Z // end inline asm 2026-02-21T09:50:29.9305267Z // begin inline asm 2026-02-21T09:50:29.9305636Z @%p97 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r297 + 240], {%r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696, %r1696}; 2026-02-21T09:50:29.9306063Z // end inline asm 2026-02-21T09:50:29.9306192Z // begin inline asm 2026-02-21T09:50:29.9306346Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:50:29.9306504Z // end inline asm 2026-02-21T09:50:29.9306643Z bar.sync 0, 128; 2026-02-21T09:50:29.9306889Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9307184Z add.s32 %r569, %r280, 204800; 2026-02-21T09:50:29.9307362Z // begin inline asm 2026-02-21T09:50:29.9307525Z @%p132 
mbarrier.init.shared::cta.b64 [%r569], 1; 2026-02-21T09:50:29.9307721Z // end inline asm 2026-02-21T09:50:29.9307851Z bar.sync 0, 128; 2026-02-21T09:50:29.9307992Z add.s32 %r570, %r280, 204808; 2026-02-21T09:50:29.9308140Z // begin inline asm 2026-02-21T09:50:29.9308308Z @%p132 mbarrier.init.shared::cta.b64 [%r570], 1; 2026-02-21T09:50:29.9308491Z // end inline asm 2026-02-21T09:50:29.9308631Z add.s32 %r571, %r280, 204816; 2026-02-21T09:50:29.9308786Z // begin inline asm 2026-02-21T09:50:29.9308944Z @%p132 mbarrier.init.shared::cta.b64 [%r571], 1; 2026-02-21T09:50:29.9309133Z // end inline asm 2026-02-21T09:50:29.9309261Z bar.sync 0, 128; 2026-02-21T09:50:29.9309400Z add.s32 %r572, %r280, 204824; 2026-02-21T09:50:29.9309548Z // begin inline asm 2026-02-21T09:50:29.9309715Z @%p132 mbarrier.init.shared::cta.b64 [%r572], 1; 2026-02-21T09:50:29.9309894Z // end inline asm 2026-02-21T09:50:29.9310136Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9310410Z bar.sync 0, 128; 2026-02-21T09:50:29.9310538Z // begin inline asm 2026-02-21T09:50:29.9310712Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r569]; 2026-02-21T09:50:29.9310904Z // end inline asm 2026-02-21T09:50:29.9311038Z bar.sync 0, 128; 2026-02-21T09:50:29.9311163Z // begin inline asm 2026-02-21T09:50:29.9311333Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r570]; 2026-02-21T09:50:29.9311518Z // end inline asm 2026-02-21T09:50:29.9311758Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9312039Z bar.sync 0, 128; 2026-02-21T09:50:29.9312170Z add.s32 %r575, %r280, 204832; 2026-02-21T09:50:29.9312326Z // begin inline asm 2026-02-21T09:50:29.9312514Z @%p132 mbarrier.init.shared::cta.b64 [%r575], 1; 2026-02-21T09:50:29.9312700Z // end inline asm 2026-02-21T09:50:29.9312824Z bar.sync 0, 128; 2026-02-21T09:50:29.9312961Z add.s32 %r576, %r280, 204840; 2026-02-21T09:50:29.9313108Z // begin inline asm 2026-02-21T09:50:29.9313275Z 
@%p132 mbarrier.init.shared::cta.b64 [%r576], 1; 2026-02-21T09:50:29.9313460Z // end inline asm 2026-02-21T09:50:29.9313620Z add.s32 %r577, %r280, 204848; 2026-02-21T09:50:29.9313774Z // begin inline asm 2026-02-21T09:50:29.9313930Z @%p132 mbarrier.init.shared::cta.b64 [%r577], 1; 2026-02-21T09:50:29.9314117Z // end inline asm 2026-02-21T09:50:29.9314242Z bar.sync 0, 128; 2026-02-21T09:50:29.9314380Z add.s32 %r578, %r280, 204856; 2026-02-21T09:50:29.9314527Z // begin inline asm 2026-02-21T09:50:29.9314710Z @%p132 mbarrier.init.shared::cta.b64 [%r578], 1; 2026-02-21T09:50:29.9314890Z // end inline asm 2026-02-21T09:50:29.9315133Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9315416Z bar.sync 0, 128; 2026-02-21T09:50:29.9315548Z // begin inline asm 2026-02-21T09:50:29.9315725Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r577]; 2026-02-21T09:50:29.9315910Z // end inline asm 2026-02-21T09:50:29.9316044Z bar.sync 0, 128; 2026-02-21T09:50:29.9316197Z // begin inline asm 2026-02-21T09:50:29.9316394Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r578]; 2026-02-21T09:50:29.9316578Z // end inline asm 2026-02-21T09:50:29.9316824Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9317131Z st.shared.b32 [global_smem+204864], 33554689; 2026-02-21T09:50:29.9317335Z st.shared.b32 [global_smem+196608], %r1665; 2026-02-21T09:50:29.9317522Z barrier.sync 1; 2026-02-21T09:50:29.9317676Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:29.9317860Z barrier.sync 1; 2026-02-21T09:50:29.9317998Z setp.lt.s32 %p125, %r1685, 1; 2026-02-21T09:50:29.9318164Z mov.b32 %r1698, %r1696; 2026-02-21T09:50:29.9318310Z @%p125 bra $L__BB0_23; 2026-02-21T09:50:29.9318482Z // %bb.17: // %.lr.ph14 2026-02-21T09:50:29.9318783Z .loc 1 0 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:74 2026-02-21T09:50:29.9319086Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:50:29.9319278Z 
shl.b32 %r593, %r1, 3; 2026-02-21T09:50:29.9319428Z and.b32 %r45, %r593, 120; 2026-02-21T09:50:29.9319584Z shr.u32 %r594, %r1, 4; 2026-02-21T09:50:29.9319734Z bfe.u32 %r46, %r1, 4, 3; 2026-02-21T09:50:29.9319887Z or.b32 %r47, %r46, 8; 2026-02-21T09:50:29.9320028Z or.b32 %r48, %r46, 16; 2026-02-21T09:50:29.9320175Z or.b32 %r49, %r46, 24; 2026-02-21T09:50:29.9320319Z or.b32 %r50, %r46, 32; 2026-02-21T09:50:29.9320456Z or.b32 %r51, %r46, 40; 2026-02-21T09:50:29.9320598Z or.b32 %r52, %r46, 48; 2026-02-21T09:50:29.9320736Z or.b32 %r53, %r594, 56; 2026-02-21T09:50:29.9320885Z or.b32 %r54, %r46, 64; 2026-02-21T09:50:29.9321020Z or.b32 %r55, %r46, 72; 2026-02-21T09:50:29.9321159Z or.b32 %r56, %r46, 80; 2026-02-21T09:50:29.9321293Z or.b32 %r57, %r46, 88; 2026-02-21T09:50:29.9321435Z or.b32 %r58, %r46, 96; 2026-02-21T09:50:29.9321572Z or.b32 %r59, %r46, 104; 2026-02-21T09:50:29.9321723Z or.b32 %r60, %r46, 112; 2026-02-21T09:50:29.9321870Z or.b32 %r61, %r594, 120; 2026-02-21T09:50:29.9322016Z or.b32 %r62, %r46, 128; 2026-02-21T09:50:29.9322161Z or.b32 %r63, %r46, 136; 2026-02-21T09:50:29.9322299Z or.b32 %r64, %r46, 144; 2026-02-21T09:50:29.9322444Z or.b32 %r65, %r46, 152; 2026-02-21T09:50:29.9322581Z or.b32 %r66, %r46, 160; 2026-02-21T09:50:29.9322723Z or.b32 %r67, %r46, 168; 2026-02-21T09:50:29.9322858Z or.b32 %r68, %r46, 176; 2026-02-21T09:50:29.9323004Z or.b32 %r69, %r594, 184; 2026-02-21T09:50:29.9323146Z or.b32 %r70, %r46, 192; 2026-02-21T09:50:29.9323289Z or.b32 %r71, %r46, 200; 2026-02-21T09:50:29.9323433Z or.b32 %r72, %r46, 208; 2026-02-21T09:50:29.9323621Z or.b32 %r73, %r46, 216; 2026-02-21T09:50:29.9323767Z or.b32 %r74, %r46, 224; 2026-02-21T09:50:29.9323905Z or.b32 %r75, %r46, 232; 2026-02-21T09:50:29.9324051Z or.b32 %r76, %r46, 240; 2026-02-21T09:50:29.9324191Z or.b32 %r77, %r594, 248; 2026-02-21T09:50:29.9324468Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9324790Z add.s32 %r1694, %r44, -1; 
2026-02-21T09:50:29.9324979Z shl.b32 %r601, %r1, 10; 2026-02-21T09:50:29.9325133Z and.b32 %r602, %r601, 6144; 2026-02-21T09:50:29.9325286Z shl.b32 %r603, %r1, 4; 2026-02-21T09:50:29.9325437Z and.b32 %r604, %r603, 2032; 2026-02-21T09:50:29.9325589Z or.b32 %r605, %r602, %r604; 2026-02-21T09:50:29.9325752Z add.s32 %r607, %r280, 196608; 2026-02-21T09:50:29.9325904Z add.s32 %r81, %r607, %r605; 2026-02-21T09:50:29.9326064Z xor.b32 %r608, %r605, 32; 2026-02-21T09:50:29.9326212Z add.s32 %r82, %r607, %r608; 2026-02-21T09:50:29.9326369Z xor.b32 %r609, %r605, 64; 2026-02-21T09:50:29.9326530Z add.s32 %r83, %r607, %r609; 2026-02-21T09:50:29.9326709Z xor.b32 %r610, %r605, 96; 2026-02-21T09:50:29.9326888Z add.s32 %r84, %r607, %r610; 2026-02-21T09:50:29.9327049Z and.b32 %r611, %r1, 96; 2026-02-21T09:50:29.9327227Z shl.b32 %r612, %r611, 6; 2026-02-21T09:50:29.9327412Z shl.b32 %r613, %r1, 5; 2026-02-21T09:50:29.9327579Z and.b32 %r614, %r613, 96; 2026-02-21T09:50:29.9327794Z and.b32 %r615, %r603, 384; 2026-02-21T09:50:29.9327979Z and.b32 %r617, %r590, 16; 2026-02-21T09:50:29.9328149Z or.b32 %r618, %r612, %r614; 2026-02-21T09:50:29.9328327Z or.b32 %r619, %r615, %r611; 2026-02-21T09:50:29.9328474Z xor.b32 %r620, %r618, %r619; 2026-02-21T09:50:29.9328632Z add.s32 %r621, %r607, %r617; 2026-02-21T09:50:29.9328788Z add.s32 %r902, %r621, %r620; 2026-02-21T09:50:29.9328937Z add.s32 %r907, %r902, 512; 2026-02-21T09:50:29.9329091Z add.s32 %r912, %r902, 1024; 2026-02-21T09:50:29.9329234Z add.s32 %r917, %r902, 1536; 2026-02-21T09:50:29.9329388Z mov.b32 %r1691, -1; 2026-02-21T09:50:29.9329524Z mov.b32 %r1698, %r1696; 2026-02-21T09:50:29.9329673Z mov.b32 %r1693, %r1696; 2026-02-21T09:50:29.9329815Z mov.b32 %r1692, %r1696; 2026-02-21T09:50:29.9329966Z bra.uni $L__BB0_18; 2026-02-21T09:50:29.9330151Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:29.9330370Z shl.b32 %r1187, %r1696, 3; 2026-02-21T09:50:29.9330533Z add.s32 %r1189, %r280, %r1187; 2026-02-21T09:50:29.9330695Z add.s32 
%r624, %r1189, 204832; 2026-02-21T09:50:29.9330971Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9331249Z shl.b32 %r1190, %r1696, 8; 2026-02-21T09:50:29.9331403Z bar.sync 0, 128; 2026-02-21T09:50:29.9331535Z // begin inline asm 2026-02-21T09:50:29.9331676Z 2026-02-21T09:50:29.9331788Z { 2026-02-21T09:50:29.9331915Z .reg .pred complete; 2026-02-21T09:50:29.9332062Z waitLoop: 2026-02-21T09:50:29.9332259Z mbarrier.try_wait.parity.shared.b64 complete, [%r624], %r1698; 2026-02-21T09:50:29.9332507Z @!complete bra.uni waitLoop; 2026-02-21T09:50:29.9332657Z } 2026-02-21T09:50:29.9332721Z 2026-02-21T09:50:29.9332784Z // end inline asm 2026-02-21T09:50:29.9333025Z .loc 1 43 32 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:43:32 2026-02-21T09:50:29.9333311Z or.b32 %r1191, %r1692, %r45; 2026-02-21T09:50:29.9333570Z .loc 1 45 32 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:45:32 2026-02-21T09:50:29.9333864Z add.s32 %r1192, %r1693, %r46; 2026-02-21T09:50:29.9334024Z add.s32 %r1193, %r1693, %r47; 2026-02-21T09:50:29.9334173Z add.s32 %r1194, %r1693, %r48; 2026-02-21T09:50:29.9334330Z add.s32 %r1195, %r1693, %r49; 2026-02-21T09:50:29.9334477Z add.s32 %r1196, %r1693, %r50; 2026-02-21T09:50:29.9334631Z add.s32 %r1197, %r1693, %r51; 2026-02-21T09:50:29.9334813Z add.s32 %r1198, %r1693, %r52; 2026-02-21T09:50:29.9334968Z add.s32 %r1199, %r1693, %r53; 2026-02-21T09:50:29.9335148Z add.s32 %r1200, %r1693, %r54; 2026-02-21T09:50:29.9335303Z add.s32 %r1201, %r1693, %r55; 2026-02-21T09:50:29.9335457Z add.s32 %r1202, %r1693, %r56; 2026-02-21T09:50:29.9335605Z add.s32 %r1203, %r1693, %r57; 2026-02-21T09:50:29.9335759Z add.s32 %r1204, %r1693, %r58; 2026-02-21T09:50:29.9335907Z add.s32 %r1205, %r1693, %r59; 2026-02-21T09:50:29.9336062Z add.s32 %r1206, %r1693, %r60; 2026-02-21T09:50:29.9336243Z add.s32 %r1207, %r1693, %r61; 2026-02-21T09:50:29.9336394Z add.s32 %r1208, %r1693, %r62; 2026-02-21T09:50:29.9336541Z add.s32 
%r1209, %r1693, %r63; 2026-02-21T09:50:29.9336694Z add.s32 %r1210, %r1693, %r64; 2026-02-21T09:50:29.9336841Z add.s32 %r1211, %r1693, %r65; 2026-02-21T09:50:29.9336996Z add.s32 %r1212, %r1693, %r66; 2026-02-21T09:50:29.9337149Z add.s32 %r1213, %r1693, %r67; 2026-02-21T09:50:29.9337294Z add.s32 %r1214, %r1693, %r68; 2026-02-21T09:50:29.9337446Z add.s32 %r1215, %r1693, %r69; 2026-02-21T09:50:29.9337590Z add.s32 %r1216, %r1693, %r70; 2026-02-21T09:50:29.9337743Z add.s32 %r1217, %r1693, %r71; 2026-02-21T09:50:29.9337888Z add.s32 %r1218, %r1693, %r72; 2026-02-21T09:50:29.9338040Z add.s32 %r1219, %r1693, %r73; 2026-02-21T09:50:29.9338188Z add.s32 %r1220, %r1693, %r74; 2026-02-21T09:50:29.9338339Z add.s32 %r1221, %r1693, %r75; 2026-02-21T09:50:29.9338523Z add.s32 %r1222, %r1693, %r76; 2026-02-21T09:50:29.9338695Z add.s32 %r1223, %r1693, %r77; 2026-02-21T09:50:29.9338955Z .loc 1 59 53 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:59:53 2026-02-21T09:50:29.9339243Z mad.lo.s32 %r1224, %r1192, 12288, %r1191; 2026-02-21T09:50:29.9339434Z mad.lo.s32 %r1225, %r1193, 12288, %r1191; 2026-02-21T09:50:29.9339610Z mad.lo.s32 %r1226, %r1194, 12288, %r1191; 2026-02-21T09:50:29.9339792Z mad.lo.s32 %r1227, %r1195, 12288, %r1191; 2026-02-21T09:50:29.9339967Z mad.lo.s32 %r1228, %r1196, 12288, %r1191; 2026-02-21T09:50:29.9340151Z mad.lo.s32 %r1229, %r1197, 12288, %r1191; 2026-02-21T09:50:29.9340331Z mad.lo.s32 %r1230, %r1198, 12288, %r1191; 2026-02-21T09:50:29.9340504Z mad.lo.s32 %r1231, %r1199, 12288, %r1191; 2026-02-21T09:50:29.9340684Z mad.lo.s32 %r1232, %r1200, 12288, %r1191; 2026-02-21T09:50:29.9340856Z mad.lo.s32 %r1233, %r1201, 12288, %r1191; 2026-02-21T09:50:29.9341037Z mad.lo.s32 %r1234, %r1202, 12288, %r1191; 2026-02-21T09:50:29.9341213Z mad.lo.s32 %r1235, %r1203, 12288, %r1191; 2026-02-21T09:50:29.9341403Z mad.lo.s32 %r1236, %r1204, 12288, %r1191; 2026-02-21T09:50:29.9341573Z mad.lo.s32 %r1237, %r1205, 12288, %r1191; 2026-02-21T09:50:29.9341752Z mad.lo.s32 %r1238, 
%r1206, 12288, %r1191; 2026-02-21T09:50:29.9341930Z mad.lo.s32 %r1239, %r1207, 12288, %r1191; 2026-02-21T09:50:29.9342099Z mad.lo.s32 %r1240, %r1208, 12288, %r1191; 2026-02-21T09:50:29.9342277Z mad.lo.s32 %r1241, %r1209, 12288, %r1191; 2026-02-21T09:50:29.9342449Z mad.lo.s32 %r1242, %r1210, 12288, %r1191; 2026-02-21T09:50:29.9342626Z mad.lo.s32 %r1243, %r1211, 12288, %r1191; 2026-02-21T09:50:29.9342794Z mad.lo.s32 %r1244, %r1212, 12288, %r1191; 2026-02-21T09:50:29.9342974Z mad.lo.s32 %r1245, %r1213, 12288, %r1191; 2026-02-21T09:50:29.9343154Z mad.lo.s32 %r1246, %r1214, 12288, %r1191; 2026-02-21T09:50:29.9343326Z mad.lo.s32 %r1247, %r1215, 12288, %r1191; 2026-02-21T09:50:29.9343504Z mad.lo.s32 %r1248, %r1216, 12288, %r1191; 2026-02-21T09:50:29.9343677Z mad.lo.s32 %r1249, %r1217, 12288, %r1191; 2026-02-21T09:50:29.9343858Z mad.lo.s32 %r1250, %r1218, 12288, %r1191; 2026-02-21T09:50:29.9344030Z mad.lo.s32 %r1251, %r1219, 12288, %r1191; 2026-02-21T09:50:29.9344208Z mad.lo.s32 %r1252, %r1220, 12288, %r1191; 2026-02-21T09:50:29.9344377Z mad.lo.s32 %r1253, %r1221, 12288, %r1191; 2026-02-21T09:50:29.9344556Z mad.lo.s32 %r1254, %r1222, 12288, %r1191; 2026-02-21T09:50:29.9344771Z mad.lo.s32 %r1255, %r1223, 12288, %r1191; 2026-02-21T09:50:29.9345053Z .loc 1 59 24 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:59:24 2026-02-21T09:50:29.9345346Z mad.wide.s32 %rd107, %r1224, 2, %rd5; 2026-02-21T09:50:29.9345555Z mad.wide.s32 %rd108, %r1225, 2, %rd5; 2026-02-21T09:50:29.9345730Z mad.wide.s32 %rd109, %r1226, 2, %rd5; 2026-02-21T09:50:29.9345897Z mad.wide.s32 %rd110, %r1227, 2, %rd5; 2026-02-21T09:50:29.9346071Z mad.wide.s32 %rd111, %r1228, 2, %rd5; 2026-02-21T09:50:29.9346237Z mad.wide.s32 %rd112, %r1229, 2, %rd5; 2026-02-21T09:50:29.9346409Z mad.wide.s32 %rd113, %r1230, 2, %rd5; 2026-02-21T09:50:29.9346608Z mad.wide.s32 %rd114, %r1231, 2, %rd5; 2026-02-21T09:50:29.9346771Z mad.wide.s32 %rd115, %r1232, 2, %rd5; 2026-02-21T09:50:29.9346941Z mad.wide.s32 %rd116, %r1233, 
2, %rd5; 2026-02-21T09:50:29.9347104Z mad.wide.s32 %rd117, %r1234, 2, %rd5; 2026-02-21T09:50:29.9347274Z mad.wide.s32 %rd118, %r1235, 2, %rd5; 2026-02-21T09:50:29.9347435Z mad.wide.s32 %rd119, %r1236, 2, %rd5; 2026-02-21T09:50:29.9347602Z mad.wide.s32 %rd120, %r1237, 2, %rd5; 2026-02-21T09:50:29.9347764Z mad.wide.s32 %rd121, %r1238, 2, %rd5; 2026-02-21T09:50:29.9347931Z mad.wide.s32 %rd122, %r1239, 2, %rd5; 2026-02-21T09:50:29.9348102Z mad.wide.s32 %rd123, %r1240, 2, %rd5; 2026-02-21T09:50:29.9348266Z mad.wide.s32 %rd124, %r1241, 2, %rd5; 2026-02-21T09:50:29.9348434Z mad.wide.s32 %rd125, %r1242, 2, %rd5; 2026-02-21T09:50:29.9348596Z mad.wide.s32 %rd126, %r1243, 2, %rd5; 2026-02-21T09:50:29.9348794Z mad.wide.s32 %rd127, %r1244, 2, %rd5; 2026-02-21T09:50:29.9348983Z mad.wide.s32 %rd128, %r1245, 2, %rd5; 2026-02-21T09:50:29.9349157Z mad.wide.s32 %rd129, %r1246, 2, %rd5; 2026-02-21T09:50:29.9349321Z mad.wide.s32 %rd130, %r1247, 2, %rd5; 2026-02-21T09:50:29.9349490Z mad.wide.s32 %rd131, %r1248, 2, %rd5; 2026-02-21T09:50:29.9349661Z mad.wide.s32 %rd132, %r1249, 2, %rd5; 2026-02-21T09:50:29.9349826Z mad.wide.s32 %rd133, %r1250, 2, %rd5; 2026-02-21T09:50:29.9349999Z mad.wide.s32 %rd134, %r1251, 2, %rd5; 2026-02-21T09:50:29.9350164Z mad.wide.s32 %rd135, %r1252, 2, %rd5; 2026-02-21T09:50:29.9350334Z mad.wide.s32 %rd136, %r1253, 2, %rd5; 2026-02-21T09:50:29.9350497Z mad.wide.s32 %rd137, %r1254, 2, %rd5; 2026-02-21T09:50:29.9350666Z mad.wide.s32 %rd138, %r1255, 2, %rd5; 2026-02-21T09:50:29.9350927Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9351211Z add.s32 %r642, %r297, %r1190; 2026-02-21T09:50:29.9351370Z // begin inline asm 2026-02-21T09:50:29.9351724Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r626, %r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641}, [%r642 + 0]; 2026-02-21T09:50:29.9352097Z // end inline asm 2026-02-21T09:50:29.9352229Z // begin inline asm 
2026-02-21T09:50:29.9352583Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r643, %r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658}, [%r642 + 16]; 2026-02-21T09:50:29.9352951Z // end inline asm 2026-02-21T09:50:29.9353081Z // begin inline asm 2026-02-21T09:50:29.9353418Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r660, %r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675}, [%r642 + 32]; 2026-02-21T09:50:29.9353782Z // end inline asm 2026-02-21T09:50:29.9353917Z // begin inline asm 2026-02-21T09:50:29.9354252Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r677, %r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692}, [%r642 + 48]; 2026-02-21T09:50:29.9354640Z // end inline asm 2026-02-21T09:50:29.9354830Z // begin inline asm 2026-02-21T09:50:29.9355162Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r694, %r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709}, [%r642 + 64]; 2026-02-21T09:50:29.9355536Z // end inline asm 2026-02-21T09:50:29.9355664Z // begin inline asm 2026-02-21T09:50:29.9356012Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r711, %r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726}, [%r642 + 80]; 2026-02-21T09:50:29.9356402Z // end inline asm 2026-02-21T09:50:29.9356538Z // begin inline asm 2026-02-21T09:50:29.9356871Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r728, %r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743}, [%r642 + 96]; 2026-02-21T09:50:29.9357230Z // end inline asm 2026-02-21T09:50:29.9357368Z // begin inline asm 2026-02-21T09:50:29.9357702Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r745, %r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760}, [%r642 + 112]; 2026-02-21T09:50:29.9358116Z // end inline asm 
2026-02-21T09:50:29.9358241Z // begin inline asm 2026-02-21T09:50:29.9358590Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r762, %r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777}, [%r642 + 128]; 2026-02-21T09:50:29.9358963Z // end inline asm 2026-02-21T09:50:29.9359090Z // begin inline asm 2026-02-21T09:50:29.9359432Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r779, %r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794}, [%r642 + 144]; 2026-02-21T09:50:29.9359797Z // end inline asm 2026-02-21T09:50:29.9359931Z // begin inline asm 2026-02-21T09:50:29.9360329Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r796, %r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, %r807, %r808, %r809, %r810, %r811}, [%r642 + 160]; 2026-02-21T09:50:29.9360714Z // end inline asm 2026-02-21T09:50:29.9360850Z // begin inline asm 2026-02-21T09:50:29.9361181Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r813, %r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828}, [%r642 + 176]; 2026-02-21T09:50:29.9361567Z // end inline asm 2026-02-21T09:50:29.9361695Z // begin inline asm 2026-02-21T09:50:29.9362042Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r830, %r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845}, [%r642 + 192]; 2026-02-21T09:50:29.9362415Z // end inline asm 2026-02-21T09:50:29.9362547Z // begin inline asm 2026-02-21T09:50:29.9362884Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r847, %r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858, %r859, %r860, %r861, %r862}, [%r642 + 208]; 2026-02-21T09:50:29.9363266Z // end inline asm 2026-02-21T09:50:29.9363405Z // begin inline asm 2026-02-21T09:50:29.9363744Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r864, %r865, %r866, %r867, %r868, %r869, %r870, %r871, %r872, %r873, %r874, %r875, %r876, %r877, %r878, %r879}, [%r642 + 
224]; 2026-02-21T09:50:29.9364123Z // end inline asm 2026-02-21T09:50:29.9364261Z // begin inline asm 2026-02-21T09:50:29.9364591Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r881, %r882, %r883, %r884, %r885, %r886, %r887, %r888, %r889, %r890, %r891, %r892, %r893, %r894, %r895, %r896}, [%r642 + 240]; 2026-02-21T09:50:29.9364983Z // end inline asm 2026-02-21T09:50:29.9365113Z // begin inline asm 2026-02-21T09:50:29.9365269Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:50:29.9365426Z // end inline asm 2026-02-21T09:50:29.9365570Z cvt.u64.u32 %rd139, %r626; 2026-02-21T09:50:29.9365725Z cvt.u64.u32 %rd140, %r627; 2026-02-21T09:50:29.9365885Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:50:29.9366052Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:50:29.9366331Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9366629Z mov.b64 {%r1256, %r1257}, %rd142; 2026-02-21T09:50:29.9366805Z cvt.rn.f16x2.f32 %r1258, %r1257, %r1256; 2026-02-21T09:50:29.9367094Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9367385Z cvt.u64.u32 %rd143, %r628; 2026-02-21T09:50:29.9367544Z cvt.u64.u32 %rd144, %r629; 2026-02-21T09:50:29.9367701Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:50:29.9367857Z or.b64 %rd146, %rd143, %rd145; 2026-02-21T09:50:29.9368134Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9368439Z mov.b64 {%r1259, %r1260}, %rd146; 2026-02-21T09:50:29.9368619Z cvt.rn.f16x2.f32 %r1261, %r1260, %r1259; 2026-02-21T09:50:29.9368890Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9369169Z cvt.u64.u32 %rd147, %r630; 2026-02-21T09:50:29.9369318Z cvt.u64.u32 %rd148, %r631; 2026-02-21T09:50:29.9369497Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:50:29.9369656Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:50:29.9369908Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 
2026-02-21T09:50:29.9370187Z mov.b64 {%r1262, %r1263}, %rd150; 2026-02-21T09:50:29.9370356Z cvt.rn.f16x2.f32 %r1264, %r1263, %r1262; 2026-02-21T09:50:29.9370630Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9370902Z cvt.u64.u32 %rd151, %r632; 2026-02-21T09:50:29.9371057Z cvt.u64.u32 %rd152, %r633; 2026-02-21T09:50:29.9371212Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:50:29.9371364Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:50:29.9371626Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9371928Z mov.b64 {%r1265, %r1266}, %rd154; 2026-02-21T09:50:29.9372126Z cvt.rn.f16x2.f32 %r1267, %r1266, %r1265; 2026-02-21T09:50:29.9372409Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9372702Z cvt.u64.u32 %rd155, %r634; 2026-02-21T09:50:29.9372857Z cvt.u64.u32 %rd156, %r635; 2026-02-21T09:50:29.9373021Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:50:29.9373188Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:50:29.9373452Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9373744Z mov.b64 {%r1268, %r1269}, %rd158; 2026-02-21T09:50:29.9373921Z cvt.rn.f16x2.f32 %r1270, %r1269, %r1268; 2026-02-21T09:50:29.9374210Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9374493Z cvt.u64.u32 %rd159, %r636; 2026-02-21T09:50:29.9374657Z cvt.u64.u32 %rd160, %r637; 2026-02-21T09:50:29.9374857Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:50:29.9375021Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:50:29.9375297Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9375587Z mov.b64 {%r1271, %r1272}, %rd162; 2026-02-21T09:50:29.9375770Z cvt.rn.f16x2.f32 %r1273, %r1272, %r1271; 2026-02-21T09:50:29.9376055Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9376347Z 
cvt.u64.u32 %rd163, %r638; 2026-02-21T09:50:29.9376501Z cvt.u64.u32 %rd164, %r639; 2026-02-21T09:50:29.9376661Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:50:29.9376828Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:50:29.9377095Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9377401Z mov.b64 {%r1274, %r1275}, %rd166; 2026-02-21T09:50:29.9377576Z cvt.rn.f16x2.f32 %r1276, %r1275, %r1274; 2026-02-21T09:50:29.9377871Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9378159Z cvt.u64.u32 %rd167, %r640; 2026-02-21T09:50:29.9378318Z cvt.u64.u32 %rd168, %r641; 2026-02-21T09:50:29.9378479Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:50:29.9378636Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:50:29.9378912Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9379198Z mov.b64 {%r1277, %r1278}, %rd170; 2026-02-21T09:50:29.9379379Z cvt.rn.f16x2.f32 %r1279, %r1278, %r1277; 2026-02-21T09:50:29.9379664Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9379987Z cvt.u64.u32 %rd171, %r643; 2026-02-21T09:50:29.9380157Z cvt.u64.u32 %rd172, %r644; 2026-02-21T09:50:29.9380304Z shl.b64 %rd173, %rd172, 32; 2026-02-21T09:50:29.9380460Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:50:29.9380724Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9381030Z mov.b64 {%r1280, %r1281}, %rd174; 2026-02-21T09:50:29.9381195Z cvt.rn.f16x2.f32 %r1282, %r1281, %r1280; 2026-02-21T09:50:29.9381472Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9381740Z cvt.u64.u32 %rd175, %r645; 2026-02-21T09:50:29.9381894Z cvt.u64.u32 %rd176, %r646; 2026-02-21T09:50:29.9382043Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:50:29.9382192Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:50:29.9382451Z .loc 1 58 27 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9382734Z mov.b64 {%r1283, %r1284}, %rd178; 2026-02-21T09:50:29.9382906Z cvt.rn.f16x2.f32 %r1285, %r1284, %r1283; 2026-02-21T09:50:29.9383177Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9383490Z cvt.u64.u32 %rd179, %r647; 2026-02-21T09:50:29.9383669Z cvt.u64.u32 %rd180, %r648; 2026-02-21T09:50:29.9383818Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:50:29.9383975Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:50:29.9384228Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9384505Z mov.b64 {%r1286, %r1287}, %rd182; 2026-02-21T09:50:29.9384697Z cvt.rn.f16x2.f32 %r1288, %r1287, %r1286; 2026-02-21T09:50:29.9384981Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9385253Z cvt.u64.u32 %rd183, %r649; 2026-02-21T09:50:29.9385411Z cvt.u64.u32 %rd184, %r650; 2026-02-21T09:50:29.9385567Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:50:29.9385718Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:50:29.9385995Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9386273Z mov.b64 {%r1289, %r1290}, %rd186; 2026-02-21T09:50:29.9386451Z cvt.rn.f16x2.f32 %r1291, %r1290, %r1289; 2026-02-21T09:50:29.9386731Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9387020Z cvt.u64.u32 %rd187, %r651; 2026-02-21T09:50:29.9387180Z cvt.u64.u32 %rd188, %r652; 2026-02-21T09:50:29.9387324Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:50:29.9387479Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:50:29.9387735Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9388015Z mov.b64 {%r1292, %r1293}, %rd190; 2026-02-21T09:50:29.9388180Z cvt.rn.f16x2.f32 %r1294, %r1293, %r1292; 2026-02-21T09:50:29.9388456Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9388724Z cvt.u64.u32 %rd191, %r653; 2026-02-21T09:50:29.9388876Z cvt.u64.u32 %rd192, %r654; 2026-02-21T09:50:29.9389029Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:50:29.9389180Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:50:29.9389447Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9389718Z mov.b64 {%r1295, %r1296}, %rd194; 2026-02-21T09:50:29.9389888Z cvt.rn.f16x2.f32 %r1297, %r1296, %r1295; 2026-02-21T09:50:29.9390162Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9390442Z cvt.u64.u32 %rd195, %r655; 2026-02-21T09:50:29.9390594Z cvt.u64.u32 %rd196, %r656; 2026-02-21T09:50:29.9390739Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:50:29.9390895Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:50:29.9391182Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9391468Z mov.b64 {%r1298, %r1299}, %rd198; 2026-02-21T09:50:29.9391632Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T09:50:29.9391908Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9392205Z cvt.u64.u32 %rd199, %r657; 2026-02-21T09:50:29.9392358Z cvt.u64.u32 %rd200, %r658; 2026-02-21T09:50:29.9392509Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:50:29.9392660Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:50:29.9392917Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9393198Z mov.b64 {%r1301, %r1302}, %rd202; 2026-02-21T09:50:29.9393371Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T09:50:29.9393637Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9393921Z cvt.u64.u32 %rd203, %r660; 2026-02-21T09:50:29.9394073Z cvt.u64.u32 %rd204, %r661; 2026-02-21T09:50:29.9394219Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:50:29.9394373Z or.b64 
%rd206, %rd203, %rd205; 2026-02-21T09:50:29.9394655Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9394991Z mov.b64 {%r1304, %r1305}, %rd206; 2026-02-21T09:50:29.9395159Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 2026-02-21T09:50:29.9395436Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9395707Z cvt.u64.u32 %rd207, %r662; 2026-02-21T09:50:29.9395862Z cvt.u64.u32 %rd208, %r663; 2026-02-21T09:50:29.9396015Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:50:29.9396166Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:50:29.9396431Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9396716Z mov.b64 {%r1307, %r1308}, %rd210; 2026-02-21T09:50:29.9396895Z cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T09:50:29.9397167Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9397457Z cvt.u64.u32 %rd211, %r664; 2026-02-21T09:50:29.9397616Z cvt.u64.u32 %rd212, %r665; 2026-02-21T09:50:29.9397764Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:50:29.9397919Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:50:29.9398170Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9398447Z mov.b64 {%r1310, %r1311}, %rd214; 2026-02-21T09:50:29.9398611Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T09:50:29.9398885Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9399172Z cvt.u64.u32 %rd215, %r666; 2026-02-21T09:50:29.9399319Z cvt.u64.u32 %rd216, %r667; 2026-02-21T09:50:29.9399471Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:50:29.9399621Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:50:29.9399881Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9400155Z mov.b64 {%r1313, %r1314}, %rd218; 2026-02-21T09:50:29.9400328Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 
2026-02-21T09:50:29.9400603Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9400883Z cvt.u64.u32 %rd219, %r668; 2026-02-21T09:50:29.9401034Z cvt.u64.u32 %rd220, %r669; 2026-02-21T09:50:29.9401181Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:50:29.9401338Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:50:29.9401591Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9401869Z mov.b64 {%r1316, %r1317}, %rd222; 2026-02-21T09:50:29.9402034Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T09:50:29.9402345Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9402625Z cvt.u64.u32 %rd223, %r670; 2026-02-21T09:50:29.9402771Z cvt.u64.u32 %rd224, %r671; 2026-02-21T09:50:29.9402921Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:50:29.9403072Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:50:29.9403335Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9403637Z mov.b64 {%r1319, %r1320}, %rd226; 2026-02-21T09:50:29.9403809Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T09:50:29.9404083Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9404366Z cvt.u64.u32 %rd227, %r672; 2026-02-21T09:50:29.9404516Z cvt.u64.u32 %rd228, %r673; 2026-02-21T09:50:29.9404659Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:50:29.9404870Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:50:29.9405131Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9405420Z mov.b64 {%r1322, %r1323}, %rd230; 2026-02-21T09:50:29.9405586Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T09:50:29.9405910Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9406222Z cvt.u64.u32 %rd231, %r674; 2026-02-21T09:50:29.9406372Z cvt.u64.u32 %rd232, %r675; 2026-02-21T09:50:29.9406527Z shl.b64 %rd233, %rd232, 
32; 2026-02-21T09:50:29.9406679Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:50:29.9406941Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9407215Z mov.b64 {%r1325, %r1326}, %rd234; 2026-02-21T09:50:29.9407393Z cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T09:50:29.9407667Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9407952Z cvt.u64.u32 %rd235, %r677; 2026-02-21T09:50:29.9408114Z cvt.u64.u32 %rd236, %r678; 2026-02-21T09:50:29.9408262Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:50:29.9408420Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:50:29.9408674Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9408952Z mov.b64 {%r1328, %r1329}, %rd238; 2026-02-21T09:50:29.9409120Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T09:50:29.9409396Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9409683Z cvt.u64.u32 %rd239, %r679; 2026-02-21T09:50:29.9409830Z cvt.u64.u32 %rd240, %r680; 2026-02-21T09:50:29.9409985Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:50:29.9410135Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:50:29.9410396Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9410668Z mov.b64 {%r1331, %r1332}, %rd242; 2026-02-21T09:50:29.9410842Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 2026-02-21T09:50:29.9411112Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9411393Z cvt.u64.u32 %rd243, %r681; 2026-02-21T09:50:29.9411548Z cvt.u64.u32 %rd244, %r682; 2026-02-21T09:50:29.9411697Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:50:29.9411858Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:50:29.9412113Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9412392Z mov.b64 {%r1334, %r1335}, %rd246; 2026-02-21T09:50:29.9412558Z 
cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T09:50:29.9412834Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9413106Z cvt.u64.u32 %rd247, %r683; 2026-02-21T09:50:29.9413250Z cvt.u64.u32 %rd248, %r684; 2026-02-21T09:50:29.9413431Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:50:29.9413580Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:50:29.9413835Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9414114Z mov.b64 {%r1337, %r1338}, %rd250; 2026-02-21T09:50:29.9414285Z cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T09:50:29.9414553Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9414884Z cvt.u64.u32 %rd251, %r685; 2026-02-21T09:50:29.9415039Z cvt.u64.u32 %rd252, %r686; 2026-02-21T09:50:29.9415184Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:50:29.9415341Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:50:29.9415596Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9415887Z mov.b64 {%r1340, %r1341}, %rd254; 2026-02-21T09:50:29.9416059Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T09:50:29.9416356Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9416650Z cvt.u64.u32 %rd255, %r687; 2026-02-21T09:50:29.9416806Z cvt.u64.u32 %rd256, %r688; 2026-02-21T09:50:29.9416969Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:50:29.9417151Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:50:29.9417456Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9417754Z mov.b64 {%r1343, %r1344}, %rd258; 2026-02-21T09:50:29.9417937Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T09:50:29.9418219Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9418526Z cvt.u64.u32 %rd259, %r689; 2026-02-21T09:50:29.9418689Z cvt.u64.u32 %rd260, %r690; 
2026-02-21T09:50:29.9418848Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:50:29.9419015Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:50:29.9419282Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9419572Z mov.b64 {%r1346, %r1347}, %rd262; 2026-02-21T09:50:29.9419743Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T09:50:29.9420036Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9420329Z cvt.u64.u32 %rd263, %r691; 2026-02-21T09:50:29.9420485Z cvt.u64.u32 %rd264, %r692; 2026-02-21T09:50:29.9420646Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:50:29.9420802Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:50:29.9421076Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9421370Z mov.b64 {%r1349, %r1350}, %rd266; 2026-02-21T09:50:29.9421550Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 2026-02-21T09:50:29.9421831Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9422127Z cvt.u64.u32 %rd267, %r694; 2026-02-21T09:50:29.9422288Z cvt.u64.u32 %rd268, %r695; 2026-02-21T09:50:29.9422441Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:50:29.9422605Z or.b64 %rd270, %rd267, %rd269; 2026-02-21T09:50:29.9422871Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9423172Z mov.b64 {%r1352, %r1353}, %rd270; 2026-02-21T09:50:29.9423346Z cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T09:50:29.9423634Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9423932Z cvt.u64.u32 %rd271, %r696; 2026-02-21T09:50:29.9424078Z cvt.u64.u32 %rd272, %r697; 2026-02-21T09:50:29.9424231Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:50:29.9424380Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:50:29.9424638Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9424972Z mov.b64 {%r1355, 
%r1356}, %rd274; 2026-02-21T09:50:29.9425143Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T09:50:29.9425410Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9425685Z cvt.u64.u32 %rd275, %r698; 2026-02-21T09:50:29.9425840Z cvt.u64.u32 %rd276, %r699; 2026-02-21T09:50:29.9425988Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:50:29.9426172Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:50:29.9426427Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9426710Z mov.b64 {%r1358, %r1359}, %rd278; 2026-02-21T09:50:29.9426877Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T09:50:29.9427153Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9427441Z cvt.u64.u32 %rd279, %r700; 2026-02-21T09:50:29.9427588Z cvt.u64.u32 %rd280, %r701; 2026-02-21T09:50:29.9427743Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:50:29.9427893Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:50:29.9428151Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9428424Z mov.b64 {%r1361, %r1362}, %rd282; 2026-02-21T09:50:29.9428626Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T09:50:29.9428927Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9429212Z cvt.u64.u32 %rd283, %r702; 2026-02-21T09:50:29.9429371Z cvt.u64.u32 %rd284, %r703; 2026-02-21T09:50:29.9429522Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:50:29.9429691Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:50:29.9441056Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9441393Z mov.b64 {%r1364, %r1365}, %rd286; 2026-02-21T09:50:29.9441595Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T09:50:29.9441900Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9442208Z cvt.u64.u32 %rd287, %r704; 
2026-02-21T09:50:29.9442373Z cvt.u64.u32 %rd288, %r705; 2026-02-21T09:50:29.9442539Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:50:29.9442710Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:50:29.9442992Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9443293Z mov.b64 {%r1367, %r1368}, %rd290; 2026-02-21T09:50:29.9443474Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T09:50:29.9443765Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9444048Z cvt.u64.u32 %rd291, %r706; 2026-02-21T09:50:29.9444211Z cvt.u64.u32 %rd292, %r707; 2026-02-21T09:50:29.9444364Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:50:29.9444530Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:50:29.9444837Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9445128Z mov.b64 {%r1370, %r1371}, %rd294; 2026-02-21T09:50:29.9445311Z cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T09:50:29.9445586Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9445871Z cvt.u64.u32 %rd295, %r708; 2026-02-21T09:50:29.9446025Z cvt.u64.u32 %rd296, %r709; 2026-02-21T09:50:29.9446190Z shl.b64 %rd297, %rd296, 32; 2026-02-21T09:50:29.9446348Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:50:29.9446619Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9446908Z mov.b64 {%r1373, %r1374}, %rd298; 2026-02-21T09:50:29.9447081Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T09:50:29.9447362Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9447742Z cvt.u64.u32 %rd299, %r711; 2026-02-21T09:50:29.9447907Z cvt.u64.u32 %rd300, %r712; 2026-02-21T09:50:29.9448059Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:50:29.9448225Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:50:29.9448493Z .loc 1 58 27 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9448794Z mov.b64 {%r1376, %r1377}, %rd302; 2026-02-21T09:50:29.9449020Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T09:50:29.9449300Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9449596Z cvt.u64.u32 %rd303, %r713; 2026-02-21T09:50:29.9449748Z cvt.u64.u32 %rd304, %r714; 2026-02-21T09:50:29.9449910Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:50:29.9450071Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:50:29.9450344Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9450649Z mov.b64 {%r1379, %r1380}, %rd306; 2026-02-21T09:50:29.9450826Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 2026-02-21T09:50:29.9451117Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9451400Z cvt.u64.u32 %rd307, %r715; 2026-02-21T09:50:29.9451606Z cvt.u64.u32 %rd308, %r716; 2026-02-21T09:50:29.9451760Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:50:29.9451963Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:50:29.9452226Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9452515Z mov.b64 {%r1382, %r1383}, %rd310; 2026-02-21T09:50:29.9452695Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T09:50:29.9452973Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9453262Z cvt.u64.u32 %rd311, %r717; 2026-02-21T09:50:29.9453411Z cvt.u64.u32 %rd312, %r718; 2026-02-21T09:50:29.9453575Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:50:29.9453730Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:50:29.9453995Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9454290Z mov.b64 {%r1385, %r1386}, %rd314; 2026-02-21T09:50:29.9454460Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T09:50:29.9454782Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9455056Z cvt.u64.u32 %rd315, %r719; 2026-02-21T09:50:29.9455219Z cvt.u64.u32 %rd316, %r720; 2026-02-21T09:50:29.9455370Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:50:29.9455533Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:50:29.9455785Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9456068Z mov.b64 {%r1388, %r1389}, %rd318; 2026-02-21T09:50:29.9456245Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T09:50:29.9456518Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9456800Z cvt.u64.u32 %rd319, %r721; 2026-02-21T09:50:29.9456948Z cvt.u64.u32 %rd320, %r722; 2026-02-21T09:50:29.9457102Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:50:29.9457257Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:50:29.9457521Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9457803Z mov.b64 {%r1391, %r1392}, %rd322; 2026-02-21T09:50:29.9457969Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T09:50:29.9458248Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9458523Z cvt.u64.u32 %rd323, %r723; 2026-02-21T09:50:29.9458679Z cvt.u64.u32 %rd324, %r724; 2026-02-21T09:50:29.9458827Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:50:29.9458989Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:50:29.9459255Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9459567Z mov.b64 {%r1394, %r1395}, %rd326; 2026-02-21T09:50:29.9459746Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T09:50:29.9460022Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9460312Z cvt.u64.u32 %rd327, %r725; 2026-02-21T09:50:29.9460500Z cvt.u64.u32 %rd328, %r726; 2026-02-21T09:50:29.9460669Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:50:29.9460833Z or.b64 
%rd330, %rd327, %rd329; 2026-02-21T09:50:29.9461116Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9461417Z mov.b64 {%r1397, %r1398}, %rd330; 2026-02-21T09:50:29.9461597Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 2026-02-21T09:50:29.9461895Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9462184Z cvt.u64.u32 %rd331, %r728; 2026-02-21T09:50:29.9462348Z cvt.u64.u32 %rd332, %r729; 2026-02-21T09:50:29.9462505Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:50:29.9462674Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:50:29.9462979Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9463269Z mov.b64 {%r1400, %r1401}, %rd334; 2026-02-21T09:50:29.9463486Z cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T09:50:29.9463775Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9464070Z cvt.u64.u32 %rd335, %r730; 2026-02-21T09:50:29.9464227Z cvt.u64.u32 %rd336, %r731; 2026-02-21T09:50:29.9464392Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:50:29.9464551Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:50:29.9464870Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9465174Z mov.b64 {%r1403, %r1404}, %rd338; 2026-02-21T09:50:29.9465356Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T09:50:29.9465662Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9465969Z cvt.u64.u32 %rd339, %r732; 2026-02-21T09:50:29.9466139Z cvt.u64.u32 %rd340, %r733; 2026-02-21T09:50:29.9466300Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:50:29.9466477Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:50:29.9466759Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9467051Z mov.b64 {%r1406, %r1407}, %rd342; 2026-02-21T09:50:29.9467241Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 
2026-02-21T09:50:29.9467526Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9467826Z cvt.u64.u32 %rd343, %r734; 2026-02-21T09:50:29.9467979Z cvt.u64.u32 %rd344, %r735; 2026-02-21T09:50:29.9468137Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:50:29.9468291Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:50:29.9468558Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9468857Z mov.b64 {%r1409, %r1410}, %rd346; 2026-02-21T09:50:29.9469031Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 2026-02-21T09:50:29.9469313Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9469589Z cvt.u64.u32 %rd347, %r736; 2026-02-21T09:50:29.9469747Z cvt.u64.u32 %rd348, %r737; 2026-02-21T09:50:29.9469895Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:50:29.9470057Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:50:29.9470322Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9470598Z mov.b64 {%r1412, %r1413}, %rd350; 2026-02-21T09:50:29.9470777Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T09:50:29.9471052Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9471371Z cvt.u64.u32 %rd351, %r738; 2026-02-21T09:50:29.9471529Z cvt.u64.u32 %rd352, %r739; 2026-02-21T09:50:29.9471694Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:50:29.9471855Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:50:29.9472129Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9472450Z mov.b64 {%r1415, %r1416}, %rd354; 2026-02-21T09:50:29.9472618Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T09:50:29.9472903Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9473189Z cvt.u64.u32 %rd355, %r740; 2026-02-21T09:50:29.9473349Z cvt.u64.u32 %rd356, %r741; 2026-02-21T09:50:29.9473500Z shl.b64 %rd357, %rd356, 
32; 2026-02-21T09:50:29.9473660Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:50:29.9473922Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9474203Z mov.b64 {%r1418, %r1419}, %rd358; 2026-02-21T09:50:29.9474379Z cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T09:50:29.9474714Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9475001Z cvt.u64.u32 %rd359, %r742; 2026-02-21T09:50:29.9475179Z cvt.u64.u32 %rd360, %r743; 2026-02-21T09:50:29.9475343Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:50:29.9475497Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:50:29.9475761Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9476049Z mov.b64 {%r1421, %r1422}, %rd362; 2026-02-21T09:50:29.9476218Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T09:50:29.9476497Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9476772Z cvt.u64.u32 %rd363, %r745; 2026-02-21T09:50:29.9476935Z cvt.u64.u32 %rd364, %r746; 2026-02-21T09:50:29.9477088Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:50:29.9477252Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:50:29.9477518Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9477800Z mov.b64 {%r1424, %r1425}, %rd366; 2026-02-21T09:50:29.9477982Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 2026-02-21T09:50:29.9478259Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9478549Z cvt.u64.u32 %rd367, %r747; 2026-02-21T09:50:29.9478704Z cvt.u64.u32 %rd368, %r748; 2026-02-21T09:50:29.9478853Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:50:29.9479011Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:50:29.9479260Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9479545Z mov.b64 {%r1427, %r1428}, %rd370; 2026-02-21T09:50:29.9479615Z 
cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T09:50:29.9479781Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9479840Z cvt.u64.u32 %rd371, %r749; 2026-02-21T09:50:29.9479898Z cvt.u64.u32 %rd372, %r750; 2026-02-21T09:50:29.9479957Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:50:29.9480024Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:50:29.9480182Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9480240Z mov.b64 {%r1430, %r1431}, %rd374; 2026-02-21T09:50:29.9480313Z cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T09:50:29.9480468Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9480524Z cvt.u64.u32 %rd375, %r751; 2026-02-21T09:50:29.9480586Z cvt.u64.u32 %rd376, %r752; 2026-02-21T09:50:29.9480642Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:50:29.9480728Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:50:29.9480891Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9480957Z mov.b64 {%r1433, %r1434}, %rd378; 2026-02-21T09:50:29.9481020Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T09:50:29.9481183Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9481271Z cvt.u64.u32 %rd379, %r753; 2026-02-21T09:50:29.9481328Z cvt.u64.u32 %rd380, %r754; 2026-02-21T09:50:29.9481384Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:50:29.9481448Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:50:29.9481607Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9481666Z mov.b64 {%r1436, %r1437}, %rd382; 2026-02-21T09:50:29.9481730Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T09:50:29.9481896Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9481953Z cvt.u64.u32 %rd383, %r755; 2026-02-21T09:50:29.9482008Z cvt.u64.u32 %rd384, %r756; 
2026-02-21T09:50:29.9482072Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:50:29.9482129Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:50:29.9482353Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9482421Z mov.b64 {%r1439, %r1440}, %rd386; 2026-02-21T09:50:29.9482484Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T09:50:29.9482645Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9482702Z cvt.u64.u32 %rd387, %r757; 2026-02-21T09:50:29.9482766Z cvt.u64.u32 %rd388, %r758; 2026-02-21T09:50:29.9482822Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:50:29.9482880Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:50:29.9483045Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9483103Z mov.b64 {%r1442, %r1443}, %rd390; 2026-02-21T09:50:29.9483167Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 2026-02-21T09:50:29.9483332Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9483391Z cvt.u64.u32 %rd391, %r759; 2026-02-21T09:50:29.9483448Z cvt.u64.u32 %rd392, %r760; 2026-02-21T09:50:29.9483506Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:50:29.9483570Z or.b64 %rd394, %rd391, %rd393; 2026-02-21T09:50:29.9483733Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9483791Z mov.b64 {%r1445, %r1446}, %rd394; 2026-02-21T09:50:29.9483863Z cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T09:50:29.9484026Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9484082Z cvt.u64.u32 %rd395, %r762; 2026-02-21T09:50:29.9484147Z cvt.u64.u32 %rd396, %r763; 2026-02-21T09:50:29.9484204Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:50:29.9484262Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:50:29.9484423Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9484488Z mov.b64 {%r1448, 
%r1449}, %rd398; 2026-02-21T09:50:29.9484562Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T09:50:29.9484777Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9484834Z cvt.u64.u32 %rd399, %r764; 2026-02-21T09:50:29.9484889Z cvt.u64.u32 %rd400, %r765; 2026-02-21T09:50:29.9484952Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:50:29.9485010Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:50:29.9485172Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9485229Z mov.b64 {%r1451, %r1452}, %rd402; 2026-02-21T09:50:29.9485330Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T09:50:29.9485494Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9485550Z cvt.u64.u32 %rd403, %r766; 2026-02-21T09:50:29.9485614Z cvt.u64.u32 %rd404, %r767; 2026-02-21T09:50:29.9485673Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:50:29.9485733Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:50:29.9485930Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9485991Z mov.b64 {%r1454, %r1455}, %rd406; 2026-02-21T09:50:29.9486055Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T09:50:29.9486214Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9486280Z cvt.u64.u32 %rd407, %r768; 2026-02-21T09:50:29.9486336Z cvt.u64.u32 %rd408, %r769; 2026-02-21T09:50:29.9486392Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:50:29.9486460Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:50:29.9486625Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9486682Z mov.b64 {%r1457, %r1458}, %rd410; 2026-02-21T09:50:29.9486753Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T09:50:29.9486994Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9487053Z cvt.u64.u32 %rd411, %r770; 
2026-02-21T09:50:29.9487109Z cvt.u64.u32 %rd412, %r771; 2026-02-21T09:50:29.9487173Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:50:29.9487231Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:50:29.9487393Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9487458Z mov.b64 {%r1460, %r1461}, %rd414; 2026-02-21T09:50:29.9487523Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T09:50:29.9487683Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9487748Z cvt.u64.u32 %rd415, %r772; 2026-02-21T09:50:29.9487803Z cvt.u64.u32 %rd416, %r773; 2026-02-21T09:50:29.9487860Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:50:29.9487915Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:50:29.9488083Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9488141Z mov.b64 {%r1463, %r1464}, %rd418; 2026-02-21T09:50:29.9488204Z cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T09:50:29.9488367Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9488423Z cvt.u64.u32 %rd419, %r774; 2026-02-21T09:50:29.9488477Z cvt.u64.u32 %rd420, %r775; 2026-02-21T09:50:29.9488542Z shl.b64 %rd421, %rd420, 32; 2026-02-21T09:50:29.9488599Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:50:29.9488754Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9488814Z mov.b64 {%r1466, %r1467}, %rd422; 2026-02-21T09:50:29.9488885Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T09:50:29.9489046Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9489104Z cvt.u64.u32 %rd423, %r776; 2026-02-21T09:50:29.9489170Z cvt.u64.u32 %rd424, %r777; 2026-02-21T09:50:29.9489228Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:50:29.9489283Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:50:29.9489443Z .loc 1 58 27 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9489499Z mov.b64 {%r1469, %r1470}, %rd426; 2026-02-21T09:50:29.9489562Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T09:50:29.9489715Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9489777Z cvt.u64.u32 %rd427, %r779; 2026-02-21T09:50:29.9489858Z cvt.u64.u32 %rd428, %r780; 2026-02-21T09:50:29.9489914Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:50:29.9489975Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:50:29.9490132Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9490188Z mov.b64 {%r1472, %r1473}, %rd430; 2026-02-21T09:50:29.9490260Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 2026-02-21T09:50:29.9490440Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9490497Z cvt.u64.u32 %rd431, %r781; 2026-02-21T09:50:29.9490553Z cvt.u64.u32 %rd432, %r782; 2026-02-21T09:50:29.9490615Z shl.b64 %rd433, %rd432, 32; 2026-02-21T09:50:29.9490673Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:50:29.9490826Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9490889Z mov.b64 {%r1475, %r1476}, %rd434; 2026-02-21T09:50:29.9490953Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T09:50:29.9491107Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9491170Z cvt.u64.u32 %rd435, %r783; 2026-02-21T09:50:29.9491225Z cvt.u64.u32 %rd436, %r784; 2026-02-21T09:50:29.9491302Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:50:29.9491360Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:50:29.9491550Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9491608Z mov.b64 {%r1478, %r1479}, %rd438; 2026-02-21T09:50:29.9491672Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T09:50:29.9491840Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9491895Z cvt.u64.u32 %rd439, %r785; 2026-02-21T09:50:29.9491950Z cvt.u64.u32 %rd440, %r786; 2026-02-21T09:50:29.9492013Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:50:29.9492071Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:50:29.9492233Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9492291Z mov.b64 {%r1481, %r1482}, %rd442; 2026-02-21T09:50:29.9492361Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T09:50:29.9492522Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9492579Z cvt.u64.u32 %rd443, %r787; 2026-02-21T09:50:29.9492642Z cvt.u64.u32 %rd444, %r788; 2026-02-21T09:50:29.9492698Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:50:29.9492756Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:50:29.9492926Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9492984Z mov.b64 {%r1484, %r1485}, %rd446; 2026-02-21T09:50:29.9493048Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T09:50:29.9493211Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9493284Z cvt.u64.u32 %rd447, %r789; 2026-02-21T09:50:29.9493340Z cvt.u64.u32 %rd448, %r790; 2026-02-21T09:50:29.9493396Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:50:29.9493459Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:50:29.9493616Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9493675Z mov.b64 {%r1487, %r1488}, %rd450; 2026-02-21T09:50:29.9493746Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T09:50:29.9493906Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9493961Z cvt.u64.u32 %rd451, %r791; 2026-02-21T09:50:29.9494016Z cvt.u64.u32 %rd452, %r792; 2026-02-21T09:50:29.9494078Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:50:29.9494133Z or.b64 
%rd454, %rd451, %rd453; 2026-02-21T09:50:29.9494294Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9494383Z mov.b64 {%r1490, %r1491}, %rd454; 2026-02-21T09:50:29.9494447Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 2026-02-21T09:50:29.9494606Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9494734Z cvt.u64.u32 %rd455, %r793; 2026-02-21T09:50:29.9494793Z cvt.u64.u32 %rd456, %r794; 2026-02-21T09:50:29.9494879Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:50:29.9494936Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:50:29.9495103Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9495160Z mov.b64 {%r1493, %r1494}, %rd458; 2026-02-21T09:50:29.9495224Z cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T09:50:29.9495389Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9495446Z cvt.u64.u32 %rd459, %r796; 2026-02-21T09:50:29.9495503Z cvt.u64.u32 %rd460, %r797; 2026-02-21T09:50:29.9495565Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:50:29.9495621Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:50:29.9495778Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9495861Z mov.b64 {%r1496, %r1497}, %rd462; 2026-02-21T09:50:29.9495937Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T09:50:29.9496128Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9496187Z cvt.u64.u32 %rd463, %r798; 2026-02-21T09:50:29.9496252Z cvt.u64.u32 %rd464, %r799; 2026-02-21T09:50:29.9496309Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:50:29.9496367Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:50:29.9496534Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9496592Z mov.b64 {%r1499, %r1500}, %rd466; 2026-02-21T09:50:29.9496655Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 
2026-02-21T09:50:29.9496815Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9496879Z cvt.u64.u32 %rd467, %r800; 2026-02-21T09:50:29.9496933Z cvt.u64.u32 %rd468, %r801; 2026-02-21T09:50:29.9496989Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:50:29.9497054Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:50:29.9497217Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9497274Z mov.b64 {%r1502, %r1503}, %rd470; 2026-02-21T09:50:29.9497344Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T09:50:29.9497501Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9497556Z cvt.u64.u32 %rd471, %r802; 2026-02-21T09:50:29.9497611Z cvt.u64.u32 %rd472, %r803; 2026-02-21T09:50:29.9497672Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:50:29.9497728Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:50:29.9497888Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9497950Z mov.b64 {%r1505, %r1506}, %rd474; 2026-02-21T09:50:29.9498012Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T09:50:29.9498171Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9498236Z cvt.u64.u32 %rd475, %r804; 2026-02-21T09:50:29.9498292Z cvt.u64.u32 %rd476, %r805; 2026-02-21T09:50:29.9498346Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:50:29.9498402Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:50:29.9498566Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9498622Z mov.b64 {%r1508, %r1509}, %rd478; 2026-02-21T09:50:29.9498684Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T09:50:29.9498851Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9498933Z cvt.u64.u32 %rd479, %r806; 2026-02-21T09:50:29.9498988Z cvt.u64.u32 %rd480, %r807; 2026-02-21T09:50:29.9499049Z shl.b64 %rd481, %rd480, 
32; 2026-02-21T09:50:29.9499107Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T09:50:29.9499268Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9499360Z mov.b64 {%r1511, %r1512}, %rd482; 2026-02-21T09:50:29.9499431Z cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T09:50:29.9499590Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9499647Z cvt.u64.u32 %rd483, %r808; 2026-02-21T09:50:29.9499709Z cvt.u64.u32 %rd484, %r809; 2026-02-21T09:50:29.9499763Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:50:29.9499819Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:50:29.9499982Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9500040Z mov.b64 {%r1514, %r1515}, %rd486; 2026-02-21T09:50:29.9500102Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T09:50:29.9500261Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9500324Z cvt.u64.u32 %rd487, %r810; 2026-02-21T09:50:29.9500402Z cvt.u64.u32 %rd488, %r811; 2026-02-21T09:50:29.9500482Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:50:29.9500548Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:50:29.9500704Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9500761Z mov.b64 {%r1517, %r1518}, %rd490; 2026-02-21T09:50:29.9500830Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 2026-02-21T09:50:29.9500989Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9501044Z cvt.u64.u32 %rd491, %r813; 2026-02-21T09:50:29.9501099Z cvt.u64.u32 %rd492, %r814; 2026-02-21T09:50:29.9501163Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:50:29.9501221Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:50:29.9501380Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9501445Z mov.b64 {%r1520, %r1521}, %rd494; 2026-02-21T09:50:29.9501511Z 
cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T09:50:29.9501677Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9501743Z cvt.u64.u32 %rd495, %r815; 2026-02-21T09:50:29.9501799Z cvt.u64.u32 %rd496, %r816; 2026-02-21T09:50:29.9501858Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:50:29.9501916Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:50:29.9502082Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9502140Z mov.b64 {%r1523, %r1524}, %rd498; 2026-02-21T09:50:29.9502204Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T09:50:29.9502369Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9502425Z cvt.u64.u32 %rd499, %r817; 2026-02-21T09:50:29.9502481Z cvt.u64.u32 %rd500, %r818; 2026-02-21T09:50:29.9502543Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:50:29.9502601Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:50:29.9502760Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9502818Z mov.b64 {%r1526, %r1527}, %rd502; 2026-02-21T09:50:29.9502888Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T09:50:29.9503047Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9503102Z cvt.u64.u32 %rd503, %r819; 2026-02-21T09:50:29.9503165Z cvt.u64.u32 %rd504, %r820; 2026-02-21T09:50:29.9503222Z shl.b64 %rd505, %rd504, 32; 2026-02-21T09:50:29.9503277Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:50:29.9503464Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9503521Z mov.b64 {%r1529, %r1530}, %rd506; 2026-02-21T09:50:29.9503583Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T09:50:29.9503743Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9503829Z cvt.u64.u32 %rd507, %r821; 2026-02-21T09:50:29.9503885Z cvt.u64.u32 %rd508, %r822; 
2026-02-21T09:50:29.9503943Z shl.b64 %rd509, %rd508, 32; 2026-02-21T09:50:29.9504009Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:50:29.9504177Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9504236Z mov.b64 {%r1532, %r1533}, %rd510; 2026-02-21T09:50:29.9504310Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T09:50:29.9504480Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9504539Z cvt.u64.u32 %rd511, %r823; 2026-02-21T09:50:29.9504597Z cvt.u64.u32 %rd512, %r824; 2026-02-21T09:50:29.9504666Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:50:29.9504759Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:50:29.9504955Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9505046Z mov.b64 {%r1535, %r1536}, %rd514; 2026-02-21T09:50:29.9505116Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 2026-02-21T09:50:29.9505283Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9505348Z cvt.u64.u32 %rd515, %r825; 2026-02-21T09:50:29.9505405Z cvt.u64.u32 %rd516, %r826; 2026-02-21T09:50:29.9505463Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:50:29.9505521Z or.b64 %rd518, %rd515, %rd517; 2026-02-21T09:50:29.9505694Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9505756Z mov.b64 {%r1538, %r1539}, %rd518; 2026-02-21T09:50:29.9505822Z cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T09:50:29.9505994Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9506054Z cvt.u64.u32 %rd519, %r827; 2026-02-21T09:50:29.9506113Z cvt.u64.u32 %rd520, %r828; 2026-02-21T09:50:29.9506180Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:50:29.9506241Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:50:29.9506410Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9506470Z mov.b64 {%r1541, 
%r1542}, %rd522; 2026-02-21T09:50:29.9506546Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T09:50:29.9506715Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9506772Z cvt.u64.u32 %rd523, %r830; 2026-02-21T09:50:29.9506836Z cvt.u64.u32 %rd524, %r831; 2026-02-21T09:50:29.9506893Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:50:29.9506951Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:50:29.9507125Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9507183Z mov.b64 {%r1544, %r1545}, %rd526; 2026-02-21T09:50:29.9507250Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T09:50:29.9507420Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9507486Z cvt.u64.u32 %rd527, %r832; 2026-02-21T09:50:29.9507543Z cvt.u64.u32 %rd528, %r833; 2026-02-21T09:50:29.9507600Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:50:29.9507664Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:50:29.9507830Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9507889Z mov.b64 {%r1547, %r1548}, %rd530; 2026-02-21T09:50:29.9507960Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T09:50:29.9508160Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9508220Z cvt.u64.u32 %rd531, %r834; 2026-02-21T09:50:29.9508279Z cvt.u64.u32 %rd532, %r835; 2026-02-21T09:50:29.9508345Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:50:29.9508407Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:50:29.9508578Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9508671Z mov.b64 {%r1550, %r1551}, %rd534; 2026-02-21T09:50:29.9508737Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T09:50:29.9508902Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9508967Z cvt.u64.u32 %rd535, %r836; 
2026-02-21T09:50:29.9509024Z cvt.u64.u32 %rd536, %r837; 2026-02-21T09:50:29.9509082Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:50:29.9509141Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:50:29.9509314Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9509374Z mov.b64 {%r1553, %r1554}, %rd538; 2026-02-21T09:50:29.9509440Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T09:50:29.9509631Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9509710Z cvt.u64.u32 %rd539, %r838; 2026-02-21T09:50:29.9509770Z cvt.u64.u32 %rd540, %r839; 2026-02-21T09:50:29.9509836Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:50:29.9509896Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:50:29.9510065Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9510125Z mov.b64 {%r1556, %r1557}, %rd542; 2026-02-21T09:50:29.9510202Z cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T09:50:29.9510376Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9510437Z cvt.u64.u32 %rd543, %r840; 2026-02-21T09:50:29.9510506Z cvt.u64.u32 %rd544, %r841; 2026-02-21T09:50:29.9510564Z shl.b64 %rd545, %rd544, 32; 2026-02-21T09:50:29.9510626Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:50:29.9510808Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9510872Z mov.b64 {%r1559, %r1560}, %rd546; 2026-02-21T09:50:29.9510941Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T09:50:29.9511112Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9511178Z cvt.u64.u32 %rd547, %r842; 2026-02-21T09:50:29.9511236Z cvt.u64.u32 %rd548, %r843; 2026-02-21T09:50:29.9511295Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:50:29.9511359Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:50:29.9511527Z .loc 1 58 27 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9511588Z mov.b64 {%r1562, %r1563}, %rd550; 2026-02-21T09:50:29.9511660Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T09:50:29.9511828Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9511885Z cvt.u64.u32 %rd551, %r844; 2026-02-21T09:50:29.9511945Z cvt.u64.u32 %rd552, %r845; 2026-02-21T09:50:29.9512014Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:50:29.9512077Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:50:29.9512250Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9512313Z mov.b64 {%r1565, %r1566}, %rd554; 2026-02-21T09:50:29.9512376Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 2026-02-21T09:50:29.9512536Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9512599Z cvt.u64.u32 %rd555, %r847; 2026-02-21T09:50:29.9512654Z cvt.u64.u32 %rd556, %r848; 2026-02-21T09:50:29.9512731Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:50:29.9512788Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T09:50:29.9512955Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9513013Z mov.b64 {%r1568, %r1569}, %rd558; 2026-02-21T09:50:29.9513079Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T09:50:29.9513249Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9513325Z cvt.u64.u32 %rd559, %r849; 2026-02-21T09:50:29.9513381Z cvt.u64.u32 %rd560, %r850; 2026-02-21T09:50:29.9513444Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:50:29.9513501Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:50:29.9513663Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9513720Z mov.b64 {%r1571, %r1572}, %rd562; 2026-02-21T09:50:29.9513792Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T09:50:29.9513954Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9514009Z cvt.u64.u32 %rd563, %r851; 2026-02-21T09:50:29.9514072Z cvt.u64.u32 %rd564, %r852; 2026-02-21T09:50:29.9514127Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:50:29.9514202Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:50:29.9514408Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9514478Z mov.b64 {%r1574, %r1575}, %rd566; 2026-02-21T09:50:29.9514553Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T09:50:29.9514776Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9514852Z cvt.u64.u32 %rd567, %r853; 2026-02-21T09:50:29.9514919Z cvt.u64.u32 %rd568, %r854; 2026-02-21T09:50:29.9514997Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:50:29.9515081Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:50:29.9515271Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9515336Z mov.b64 {%r1577, %r1578}, %rd570; 2026-02-21T09:50:29.9515430Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T09:50:29.9515617Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9515674Z cvt.u64.u32 %rd571, %r855; 2026-02-21T09:50:29.9515731Z cvt.u64.u32 %rd572, %r856; 2026-02-21T09:50:29.9515806Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:50:29.9515873Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T09:50:29.9516059Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9516135Z mov.b64 {%r1580, %r1581}, %rd574; 2026-02-21T09:50:29.9516199Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T09:50:29.9516393Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9516456Z cvt.u64.u32 %rd575, %r857; 2026-02-21T09:50:29.9516523Z cvt.u64.u32 %rd576, %r858; 2026-02-21T09:50:29.9516591Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:50:29.9516669Z or.b64 
%rd578, %rd575, %rd577; 2026-02-21T09:50:29.9516867Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9516925Z mov.b64 {%r1583, %r1584}, %rd578; 2026-02-21T09:50:29.9517016Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T09:50:29.9517207Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9517272Z cvt.u64.u32 %rd579, %r859; 2026-02-21T09:50:29.9517336Z cvt.u64.u32 %rd580, %r860; 2026-02-21T09:50:29.9517409Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:50:29.9517476Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T09:50:29.9517926Z [246s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:50:29.9519011Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=2, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:50:29.9519261Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:50:29.9519332Z `ptxas` stderr: 2026-02-21T09:50:29.9519745Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 
2026-02-21T09:50:29.9519840Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:29.9519859Z 2026-02-21T09:50:29.9520296Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpa8v8c48t.ptx -o /tmp/tmpa8v8c48t.ptx.o 2026-02-21T09:50:29.9520304Z 2026-02-21T09:50:29.9520438Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:50:29.9520655Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9520742Z mov.b64 {%r1586, %r1587}, %rd582; 2026-02-21T09:50:29.9520819Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T09:50:29.9521005Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9521064Z cvt.u64.u32 %rd583, %r861; 2026-02-21T09:50:29.9521132Z cvt.u64.u32 %rd584, %r862; 2026-02-21T09:50:29.9521190Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:50:29.9521251Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:50:29.9521434Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9521504Z mov.b64 {%r1589, %r1590}, %rd586; 2026-02-21T09:50:29.9521573Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T09:50:29.9521756Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9521824Z cvt.u64.u32 %rd587, %r864; 2026-02-21T09:50:29.9521884Z cvt.u64.u32 %rd588, %r865; 2026-02-21T09:50:29.9521946Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:50:29.9522018Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:50:29.9522200Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9522259Z mov.b64 {%r1592, %r1593}, %rd590; 2026-02-21T09:50:29.9522324Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T09:50:29.9522518Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9522575Z cvt.u64.u32 %rd591, %r866; 
2026-02-21T09:50:29.9522631Z cvt.u64.u32 %rd592, %r867; 2026-02-21T09:50:29.9522698Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:50:29.9522755Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:50:29.9522937Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9523002Z mov.b64 {%r1595, %r1596}, %rd594; 2026-02-21T09:50:29.9523068Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T09:50:29.9523254Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9523312Z cvt.u64.u32 %rd595, %r868; 2026-02-21T09:50:29.9523375Z cvt.u64.u32 %rd596, %r869; 2026-02-21T09:50:29.9523432Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:50:29.9523491Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:50:29.9523677Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9523735Z mov.b64 {%r1598, %r1599}, %rd598; 2026-02-21T09:50:29.9523799Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T09:50:29.9524010Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9524068Z cvt.u64.u32 %rd599, %r870; 2026-02-21T09:50:29.9524127Z cvt.u64.u32 %rd600, %r871; 2026-02-21T09:50:29.9524187Z shl.b64 %rd601, %rd600, 32; 2026-02-21T09:50:29.9524256Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:50:29.9524440Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9524524Z mov.b64 {%r1601, %r1602}, %rd602; 2026-02-21T09:50:29.9524596Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T09:50:29.9524808Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9524865Z cvt.u64.u32 %rd603, %r872; 2026-02-21T09:50:29.9524927Z cvt.u64.u32 %rd604, %r873; 2026-02-21T09:50:29.9524983Z shl.b64 %rd605, %rd604, 32; 2026-02-21T09:50:29.9525041Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T09:50:29.9525228Z .loc 1 58 27 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9525294Z mov.b64 {%r1604, %r1605}, %rd606; 2026-02-21T09:50:29.9525358Z cvt.rn.f16x2.f32 %r1606, %r1605, %r1604; 2026-02-21T09:50:29.9525569Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9525657Z cvt.u64.u32 %rd607, %r874; 2026-02-21T09:50:29.9525716Z cvt.u64.u32 %rd608, %r875; 2026-02-21T09:50:29.9525772Z shl.b64 %rd609, %rd608, 32; 2026-02-21T09:50:29.9525835Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T09:50:29.9526016Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9526074Z mov.b64 {%r1607, %r1608}, %rd610; 2026-02-21T09:50:29.9526138Z cvt.rn.f16x2.f32 %r1609, %r1608, %r1607; 2026-02-21T09:50:29.9526326Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9526385Z cvt.u64.u32 %rd611, %r876; 2026-02-21T09:50:29.9526440Z cvt.u64.u32 %rd612, %r877; 2026-02-21T09:50:29.9526503Z shl.b64 %rd613, %rd612, 32; 2026-02-21T09:50:29.9526562Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T09:50:29.9526744Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9526808Z mov.b64 {%r1610, %r1611}, %rd614; 2026-02-21T09:50:29.9526874Z cvt.rn.f16x2.f32 %r1612, %r1611, %r1610; 2026-02-21T09:50:29.9527055Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9527110Z cvt.u64.u32 %rd615, %r878; 2026-02-21T09:50:29.9527173Z cvt.u64.u32 %rd616, %r879; 2026-02-21T09:50:29.9527231Z shl.b64 %rd617, %rd616, 32; 2026-02-21T09:50:29.9527288Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T09:50:29.9527467Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9527526Z mov.b64 {%r1613, %r1614}, %rd618; 2026-02-21T09:50:29.9527589Z cvt.rn.f16x2.f32 %r1615, %r1614, %r1613; 2026-02-21T09:50:29.9527766Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9527823Z cvt.u64.u32 %rd619, %r881; 2026-02-21T09:50:29.9527879Z cvt.u64.u32 %rd620, %r882; 2026-02-21T09:50:29.9527935Z shl.b64 %rd621, %rd620, 32; 2026-02-21T09:50:29.9528000Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T09:50:29.9528175Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9528234Z mov.b64 {%r1616, %r1617}, %rd622; 2026-02-21T09:50:29.9528316Z cvt.rn.f16x2.f32 %r1618, %r1617, %r1616; 2026-02-21T09:50:29.9528477Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9528531Z cvt.u64.u32 %rd623, %r883; 2026-02-21T09:50:29.9528591Z cvt.u64.u32 %rd624, %r884; 2026-02-21T09:50:29.9528645Z shl.b64 %rd625, %rd624, 32; 2026-02-21T09:50:29.9528731Z or.b64 %rd626, %rd623, %rd625; 2026-02-21T09:50:29.9528893Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9528956Z mov.b64 {%r1619, %r1620}, %rd626; 2026-02-21T09:50:29.9529018Z cvt.rn.f16x2.f32 %r1621, %r1620, %r1619; 2026-02-21T09:50:29.9529180Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9529267Z cvt.u64.u32 %rd627, %r885; 2026-02-21T09:50:29.9529323Z cvt.u64.u32 %rd628, %r886; 2026-02-21T09:50:29.9529379Z shl.b64 %rd629, %rd628, 32; 2026-02-21T09:50:29.9529442Z or.b64 %rd630, %rd627, %rd629; 2026-02-21T09:50:29.9529598Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9529654Z mov.b64 {%r1622, %r1623}, %rd630; 2026-02-21T09:50:29.9529717Z cvt.rn.f16x2.f32 %r1624, %r1623, %r1622; 2026-02-21T09:50:29.9529883Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9529938Z cvt.u64.u32 %rd631, %r887; 2026-02-21T09:50:29.9529994Z cvt.u64.u32 %rd632, %r888; 2026-02-21T09:50:29.9530055Z shl.b64 %rd633, %rd632, 32; 2026-02-21T09:50:29.9530111Z or.b64 
%rd634, %rd631, %rd633; 2026-02-21T09:50:29.9530310Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9530377Z mov.b64 {%r1625, %r1626}, %rd634; 2026-02-21T09:50:29.9530439Z cvt.rn.f16x2.f32 %r1627, %r1626, %r1625; 2026-02-21T09:50:29.9530593Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9530649Z cvt.u64.u32 %rd635, %r889; 2026-02-21T09:50:29.9530714Z cvt.u64.u32 %rd636, %r890; 2026-02-21T09:50:29.9530772Z shl.b64 %rd637, %rd636, 32; 2026-02-21T09:50:29.9530831Z or.b64 %rd638, %rd635, %rd637; 2026-02-21T09:50:29.9531001Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9531061Z mov.b64 {%r1628, %r1629}, %rd638; 2026-02-21T09:50:29.9531124Z cvt.rn.f16x2.f32 %r1630, %r1629, %r1628; 2026-02-21T09:50:29.9531283Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9531338Z cvt.u64.u32 %rd639, %r891; 2026-02-21T09:50:29.9531396Z cvt.u64.u32 %rd640, %r892; 2026-02-21T09:50:29.9531452Z shl.b64 %rd641, %rd640, 32; 2026-02-21T09:50:29.9531515Z or.b64 %rd642, %rd639, %rd641; 2026-02-21T09:50:29.9531666Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9531722Z mov.b64 {%r1631, %r1632}, %rd642; 2026-02-21T09:50:29.9531791Z cvt.rn.f16x2.f32 %r1633, %r1632, %r1631; 2026-02-21T09:50:29.9531946Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9532003Z cvt.u64.u32 %rd643, %r893; 2026-02-21T09:50:29.9532062Z cvt.u64.u32 %rd644, %r894; 2026-02-21T09:50:29.9532118Z shl.b64 %rd645, %rd644, 32; 2026-02-21T09:50:29.9532173Z or.b64 %rd646, %rd643, %rd645; 2026-02-21T09:50:29.9532327Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9532392Z mov.b64 {%r1634, %r1635}, %rd646; 2026-02-21T09:50:29.9532458Z cvt.rn.f16x2.f32 %r1636, %r1635, %r1634; 
2026-02-21T09:50:29.9532611Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9532674Z cvt.u64.u32 %rd647, %r895; 2026-02-21T09:50:29.9532728Z cvt.u64.u32 %rd648, %r896; 2026-02-21T09:50:29.9532783Z shl.b64 %rd649, %rd648, 32; 2026-02-21T09:50:29.9532845Z or.b64 %rd650, %rd647, %rd649; 2026-02-21T09:50:29.9532998Z .loc 1 58 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:58:27 2026-02-21T09:50:29.9533054Z mov.b64 {%r1637, %r1638}, %rd650; 2026-02-21T09:50:29.9533141Z cvt.rn.f16x2.f32 %r1639, %r1638, %r1637; 2026-02-21T09:50:29.9533306Z .loc 1 59 83 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:59:83 2026-02-21T09:50:29.9533404Z st.shared.v4.b32 [%r81], {%r1258, %r1270, %r1282, %r1294}; 2026-02-21T09:50:29.9533494Z st.shared.v4.b32 [%r82], {%r1306, %r1318, %r1330, %r1342}; 2026-02-21T09:50:29.9533592Z st.shared.v4.b32 [%r83], {%r1354, %r1366, %r1378, %r1390}; 2026-02-21T09:50:29.9533717Z st.shared.v4.b32 [%r84], {%r1402, %r1414, %r1426, %r1438}; 2026-02-21T09:50:29.9533774Z bar.sync 0, 128; 2026-02-21T09:50:29.9533838Z // begin inline asm 2026-02-21T09:50:29.9533996Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r902]; 2026-02-21T09:50:29.9534051Z // end inline asm 2026-02-21T09:50:29.9534107Z // begin inline asm 2026-02-21T09:50:29.9534265Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r907]; 2026-02-21T09:50:29.9534319Z // end inline asm 2026-02-21T09:50:29.9534372Z // begin inline asm 2026-02-21T09:50:29.9534531Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r912]; 2026-02-21T09:50:29.9534584Z // end inline asm 2026-02-21T09:50:29.9534636Z // begin inline asm 2026-02-21T09:50:29.9534908Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r917]; 2026-02-21T09:50:29.9535002Z // end inline asm 2026-02-21T09:50:29.9535059Z bar.sync 0, 128; 2026-02-21T09:50:29.9535150Z 
st.shared.v4.b32 [%r81], {%r1450, %r1462, %r1474, %r1486}; 2026-02-21T09:50:29.9535246Z st.shared.v4.b32 [%r82], {%r1498, %r1510, %r1522, %r1534}; 2026-02-21T09:50:29.9535331Z st.shared.v4.b32 [%r83], {%r1546, %r1558, %r1570, %r1582}; 2026-02-21T09:50:29.9535417Z st.shared.v4.b32 [%r84], {%r1594, %r1606, %r1618, %r1630}; 2026-02-21T09:50:29.9535478Z bar.sync 0, 128; 2026-02-21T09:50:29.9535533Z // begin inline asm 2026-02-21T09:50:29.9535676Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1122, %r1126, %r1130, %r1134}, [%r902]; 2026-02-21T09:50:29.9535730Z // end inline asm 2026-02-21T09:50:29.9535792Z // begin inline asm 2026-02-21T09:50:29.9535931Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1138, %r1142, %r1146, %r1150}, [%r907]; 2026-02-21T09:50:29.9535982Z // end inline asm 2026-02-21T09:50:29.9536042Z // begin inline asm 2026-02-21T09:50:29.9536185Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1154, %r1158, %r1162, %r1166}, [%r912]; 2026-02-21T09:50:29.9536239Z // end inline asm 2026-02-21T09:50:29.9536299Z // begin inline asm 2026-02-21T09:50:29.9536439Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1170, %r1174, %r1178, %r1182}, [%r917]; 2026-02-21T09:50:29.9536491Z // end inline asm 2026-02-21T09:50:29.9536543Z bar.sync 0, 128; 2026-02-21T09:50:29.9536637Z st.shared.v4.b32 [%r81], {%r1261, %r1273, %r1285, %r1297}; 2026-02-21T09:50:29.9536722Z st.shared.v4.b32 [%r82], {%r1309, %r1321, %r1333, %r1345}; 2026-02-21T09:50:29.9536806Z st.shared.v4.b32 [%r83], {%r1357, %r1369, %r1381, %r1393}; 2026-02-21T09:50:29.9536900Z st.shared.v4.b32 [%r84], {%r1405, %r1417, %r1429, %r1441}; 2026-02-21T09:50:29.9536952Z bar.sync 0, 128; 2026-02-21T09:50:29.9537005Z // begin inline asm 2026-02-21T09:50:29.9537155Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r902]; 2026-02-21T09:50:29.9537208Z // end inline asm 2026-02-21T09:50:29.9537260Z // begin inline asm 2026-02-21T09:50:29.9537403Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, 
%r1083, %r1087}, [%r907]; 2026-02-21T09:50:29.9537462Z // end inline asm 2026-02-21T09:50:29.9537514Z // begin inline asm 2026-02-21T09:50:29.9537651Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r912]; 2026-02-21T09:50:29.9537709Z // end inline asm 2026-02-21T09:50:29.9537761Z // begin inline asm 2026-02-21T09:50:29.9537898Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1107, %r1111, %r1115, %r1119}, [%r917]; 2026-02-21T09:50:29.9537949Z // end inline asm 2026-02-21T09:50:29.9538034Z bar.sync 0, 128; 2026-02-21T09:50:29.9538120Z st.shared.v4.b32 [%r81], {%r1453, %r1465, %r1477, %r1489}; 2026-02-21T09:50:29.9538204Z st.shared.v4.b32 [%r82], {%r1501, %r1513, %r1525, %r1537}; 2026-02-21T09:50:29.9538297Z st.shared.v4.b32 [%r83], {%r1549, %r1561, %r1573, %r1585}; 2026-02-21T09:50:29.9538382Z st.shared.v4.b32 [%r84], {%r1597, %r1609, %r1621, %r1633}; 2026-02-21T09:50:29.9538435Z bar.sync 0, 128; 2026-02-21T09:50:29.9538519Z // begin inline asm 2026-02-21T09:50:29.9538665Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1123, %r1127, %r1131, %r1135}, [%r902]; 2026-02-21T09:50:29.9538717Z // end inline asm 2026-02-21T09:50:29.9538769Z // begin inline asm 2026-02-21T09:50:29.9538920Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1139, %r1143, %r1147, %r1151}, [%r907]; 2026-02-21T09:50:29.9538972Z // end inline asm 2026-02-21T09:50:29.9539024Z // begin inline asm 2026-02-21T09:50:29.9539177Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1155, %r1159, %r1163, %r1167}, [%r912]; 2026-02-21T09:50:29.9539232Z // end inline asm 2026-02-21T09:50:29.9539285Z // begin inline asm 2026-02-21T09:50:29.9539437Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1171, %r1175, %r1179, %r1183}, [%r917]; 2026-02-21T09:50:29.9539491Z // end inline asm 2026-02-21T09:50:29.9539545Z bar.sync 0, 128; 2026-02-21T09:50:29.9539655Z st.shared.v4.b32 [%r81], {%r1264, %r1276, %r1288, %r1300}; 2026-02-21T09:50:29.9539775Z st.shared.v4.b32 [%r82], {%r1312, %r1324, %r1336, %r1348}; 
2026-02-21T09:50:29.9539861Z st.shared.v4.b32 [%r83], {%r1360, %r1372, %r1384, %r1396}; 2026-02-21T09:50:29.9539947Z st.shared.v4.b32 [%r84], {%r1408, %r1420, %r1432, %r1444}; 2026-02-21T09:50:29.9540005Z bar.sync 0, 128; 2026-02-21T09:50:29.9540057Z // begin inline asm 2026-02-21T09:50:29.9540198Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1060, %r1064, %r1068, %r1072}, [%r902]; 2026-02-21T09:50:29.9540250Z // end inline asm 2026-02-21T09:50:29.9540309Z // begin inline asm 2026-02-21T09:50:29.9540450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1076, %r1080, %r1084, %r1088}, [%r907]; 2026-02-21T09:50:29.9540502Z // end inline asm 2026-02-21T09:50:29.9540561Z // begin inline asm 2026-02-21T09:50:29.9540701Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1092, %r1096, %r1100, %r1104}, [%r912]; 2026-02-21T09:50:29.9540751Z // end inline asm 2026-02-21T09:50:29.9540813Z // begin inline asm 2026-02-21T09:50:29.9540953Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1108, %r1112, %r1116, %r1120}, [%r917]; 2026-02-21T09:50:29.9541006Z // end inline asm 2026-02-21T09:50:29.9541057Z bar.sync 0, 128; 2026-02-21T09:50:29.9541151Z st.shared.v4.b32 [%r81], {%r1456, %r1468, %r1480, %r1492}; 2026-02-21T09:50:29.9541237Z st.shared.v4.b32 [%r82], {%r1504, %r1516, %r1528, %r1540}; 2026-02-21T09:50:29.9541321Z st.shared.v4.b32 [%r83], {%r1552, %r1564, %r1576, %r1588}; 2026-02-21T09:50:29.9541412Z st.shared.v4.b32 [%r84], {%r1600, %r1612, %r1624, %r1636}; 2026-02-21T09:50:29.9541465Z bar.sync 0, 128; 2026-02-21T09:50:29.9541519Z // begin inline asm 2026-02-21T09:50:29.9541667Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1124, %r1128, %r1132, %r1136}, [%r902]; 2026-02-21T09:50:29.9541719Z // end inline asm 2026-02-21T09:50:29.9541771Z // begin inline asm 2026-02-21T09:50:29.9541917Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1140, %r1144, %r1148, %r1152}, [%r907]; 2026-02-21T09:50:29.9541976Z // end inline asm 2026-02-21T09:50:29.9542032Z // begin inline asm 2026-02-21T09:50:29.9542173Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1156, %r1160, %r1164, %r1168}, [%r912]; 2026-02-21T09:50:29.9542231Z // end inline asm 2026-02-21T09:50:29.9542282Z // begin inline asm 2026-02-21T09:50:29.9542423Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1172, %r1176, %r1180, %r1184}, [%r917]; 2026-02-21T09:50:29.9542474Z // end inline asm 2026-02-21T09:50:29.9542532Z bar.sync 0, 128; 2026-02-21T09:50:29.9542619Z st.shared.v4.b32 [%r81], {%r1267, %r1279, %r1291, %r1303}; 2026-02-21T09:50:29.9542704Z st.shared.v4.b32 [%r82], {%r1315, %r1327, %r1339, %r1351}; 2026-02-21T09:50:29.9542821Z st.shared.v4.b32 [%r83], {%r1363, %r1375, %r1387, %r1399}; 2026-02-21T09:50:29.9542906Z st.shared.v4.b32 [%r84], {%r1411, %r1423, %r1435, %r1447}; 2026-02-21T09:50:29.9542958Z bar.sync 0, 128; 2026-02-21T09:50:29.9543018Z // begin inline asm 2026-02-21T09:50:29.9543161Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1061, %r1065, %r1069, %r1073}, [%r902]; 2026-02-21T09:50:29.9543238Z // end inline asm 2026-02-21T09:50:29.9543294Z // begin inline asm 2026-02-21T09:50:29.9543442Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1077, %r1081, %r1085, %r1089}, [%r907]; 2026-02-21T09:50:29.9543496Z // end inline asm 2026-02-21T09:50:29.9543551Z // begin inline asm 2026-02-21T09:50:29.9543697Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1093, %r1097, %r1101, %r1105}, [%r912]; 2026-02-21T09:50:29.9543750Z // end inline asm 2026-02-21T09:50:29.9543804Z // begin inline asm 2026-02-21T09:50:29.9543952Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1109, %r1113, %r1117, %r1121}, [%r917]; 2026-02-21T09:50:29.9544006Z // end inline asm 2026-02-21T09:50:29.9544059Z bar.sync 0, 128; 2026-02-21T09:50:29.9544147Z st.shared.v4.b32 [%r81], {%r1459, %r1471, %r1483, %r1495}; 2026-02-21T09:50:29.9544242Z st.shared.v4.b32 [%r82], {%r1507, %r1519, %r1531, %r1543}; 2026-02-21T09:50:29.9544348Z st.shared.v4.b32 [%r83], {%r1555, %r1567, %r1579, %r1591}; 2026-02-21T09:50:29.9544455Z st.shared.v4.b32 [%r84], {%r1603, 
%r1615, %r1627, %r1639}; 2026-02-21T09:50:29.9544513Z bar.sync 0, 128; 2026-02-21T09:50:29.9544566Z // begin inline asm 2026-02-21T09:50:29.9544733Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1125, %r1129, %r1133, %r1137}, [%r902]; 2026-02-21T09:50:29.9544787Z // end inline asm 2026-02-21T09:50:29.9544845Z // begin inline asm 2026-02-21T09:50:29.9544983Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1141, %r1145, %r1149, %r1153}, [%r907]; 2026-02-21T09:50:29.9545033Z // end inline asm 2026-02-21T09:50:29.9545091Z // begin inline asm 2026-02-21T09:50:29.9545230Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1157, %r1161, %r1165, %r1169}, [%r912]; 2026-02-21T09:50:29.9545281Z // end inline asm 2026-02-21T09:50:29.9545341Z // begin inline asm 2026-02-21T09:50:29.9545478Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1173, %r1177, %r1181, %r1185}, [%r917]; 2026-02-21T09:50:29.9545529Z // end inline asm 2026-02-21T09:50:29.9545582Z // begin inline asm 2026-02-21T09:50:29.9545697Z st.global.v4.b32 [ %rd107 + 0 ], { %r1058, %r1059, %r1060, %r1061 }; 2026-02-21T09:50:29.9545750Z // end inline asm 2026-02-21T09:50:29.9545801Z // begin inline asm 2026-02-21T09:50:29.9545909Z st.global.v4.b32 [ %rd108 + 0 ], { %r1062, %r1063, %r1064, %r1065 }; 2026-02-21T09:50:29.9545961Z // end inline asm 2026-02-21T09:50:29.9546013Z // begin inline asm 2026-02-21T09:50:29.9546111Z st.global.v4.b32 [ %rd109 + 0 ], { %r1066, %r1067, %r1068, %r1069 }; 2026-02-21T09:50:29.9546170Z // end inline asm 2026-02-21T09:50:29.9546223Z // begin inline asm 2026-02-21T09:50:29.9546321Z st.global.v4.b32 [ %rd110 + 0 ], { %r1070, %r1071, %r1072, %r1073 }; 2026-02-21T09:50:29.9546381Z // end inline asm 2026-02-21T09:50:29.9546433Z // begin inline asm 2026-02-21T09:50:29.9546527Z st.global.v4.b32 [ %rd111 + 0 ], { %r1074, %r1075, %r1076, %r1077 }; 2026-02-21T09:50:29.9546585Z // end inline asm 2026-02-21T09:50:29.9546639Z // begin inline asm 2026-02-21T09:50:29.9546733Z st.global.v4.b32 [ %rd112 + 0 ], { %r1078, 
%r1079, %r1080, %r1081 }; 2026-02-21T09:50:29.9546786Z // end inline asm 2026-02-21T09:50:29.9546846Z // begin inline asm 2026-02-21T09:50:29.9546938Z st.global.v4.b32 [ %rd113 + 0 ], { %r1082, %r1083, %r1084, %r1085 }; 2026-02-21T09:50:29.9546991Z // end inline asm 2026-02-21T09:50:29.9547050Z // begin inline asm 2026-02-21T09:50:29.9547144Z st.global.v4.b32 [ %rd114 + 0 ], { %r1086, %r1087, %r1088, %r1089 }; 2026-02-21T09:50:29.9547196Z // end inline asm 2026-02-21T09:50:29.9547250Z // begin inline asm 2026-02-21T09:50:29.9547352Z st.global.v4.b32 [ %rd115 + 0 ], { %r1090, %r1091, %r1092, %r1093 }; 2026-02-21T09:50:29.9547436Z // end inline asm 2026-02-21T09:50:29.9547492Z // begin inline asm 2026-02-21T09:50:29.9547596Z st.global.v4.b32 [ %rd116 + 0 ], { %r1094, %r1095, %r1096, %r1097 }; 2026-02-21T09:50:29.9547648Z // end inline asm 2026-02-21T09:50:29.9547700Z // begin inline asm 2026-02-21T09:50:29.9547802Z st.global.v4.b32 [ %rd117 + 0 ], { %r1098, %r1099, %r1100, %r1101 }; 2026-02-21T09:50:29.9547878Z // end inline asm 2026-02-21T09:50:29.9547930Z // begin inline asm 2026-02-21T09:50:29.9548023Z st.global.v4.b32 [ %rd118 + 0 ], { %r1102, %r1103, %r1104, %r1105 }; 2026-02-21T09:50:29.9548081Z // end inline asm 2026-02-21T09:50:29.9548132Z // begin inline asm 2026-02-21T09:50:29.9548224Z st.global.v4.b32 [ %rd119 + 0 ], { %r1106, %r1107, %r1108, %r1109 }; 2026-02-21T09:50:29.9548282Z // end inline asm 2026-02-21T09:50:29.9548335Z // begin inline asm 2026-02-21T09:50:29.9548429Z st.global.v4.b32 [ %rd120 + 0 ], { %r1110, %r1111, %r1112, %r1113 }; 2026-02-21T09:50:29.9548485Z // end inline asm 2026-02-21T09:50:29.9548546Z // begin inline asm 2026-02-21T09:50:29.9548643Z st.global.v4.b32 [ %rd121 + 0 ], { %r1114, %r1115, %r1116, %r1117 }; 2026-02-21T09:50:29.9548697Z // end inline asm 2026-02-21T09:50:29.9548759Z // begin inline asm 2026-02-21T09:50:29.9548883Z st.global.v4.b32 [ %rd122 + 0 ], { %r1118, %r1119, %r1120, %r1121 }; 2026-02-21T09:50:29.9548964Z // 
end inline asm 2026-02-21T09:50:29.9549027Z // begin inline asm 2026-02-21T09:50:29.9549129Z st.global.v4.b32 [ %rd123 + 0 ], { %r1122, %r1123, %r1124, %r1125 }; 2026-02-21T09:50:29.9549183Z // end inline asm 2026-02-21T09:50:29.9549237Z // begin inline asm 2026-02-21T09:50:29.9549341Z st.global.v4.b32 [ %rd124 + 0 ], { %r1126, %r1127, %r1128, %r1129 }; 2026-02-21T09:50:29.9549395Z // end inline asm 2026-02-21T09:50:29.9549449Z // begin inline asm 2026-02-21T09:50:29.9549553Z st.global.v4.b32 [ %rd125 + 0 ], { %r1130, %r1131, %r1132, %r1133 }; 2026-02-21T09:50:29.9549607Z // end inline asm 2026-02-21T09:50:29.9549663Z // begin inline asm 2026-02-21T09:50:29.9549761Z st.global.v4.b32 [ %rd126 + 0 ], { %r1134, %r1135, %r1136, %r1137 }; 2026-02-21T09:50:29.9549822Z // end inline asm 2026-02-21T09:50:29.9549876Z // begin inline asm 2026-02-21T09:50:29.9549977Z st.global.v4.b32 [ %rd127 + 0 ], { %r1138, %r1139, %r1140, %r1141 }; 2026-02-21T09:50:29.9550038Z // end inline asm 2026-02-21T09:50:29.9550095Z // begin inline asm 2026-02-21T09:50:29.9550192Z st.global.v4.b32 [ %rd128 + 0 ], { %r1142, %r1143, %r1144, %r1145 }; 2026-02-21T09:50:29.9550245Z // end inline asm 2026-02-21T09:50:29.9550308Z // begin inline asm 2026-02-21T09:50:29.9550404Z st.global.v4.b32 [ %rd129 + 0 ], { %r1146, %r1147, %r1148, %r1149 }; 2026-02-21T09:50:29.9550457Z // end inline asm 2026-02-21T09:50:29.9550518Z // begin inline asm 2026-02-21T09:50:29.9550614Z st.global.v4.b32 [ %rd130 + 0 ], { %r1150, %r1151, %r1152, %r1153 }; 2026-02-21T09:50:29.9550668Z // end inline asm 2026-02-21T09:50:29.9550730Z // begin inline asm 2026-02-21T09:50:29.9550827Z st.global.v4.b32 [ %rd131 + 0 ], { %r1154, %r1155, %r1156, %r1157 }; 2026-02-21T09:50:29.9550880Z // end inline asm 2026-02-21T09:50:29.9550933Z // begin inline asm 2026-02-21T09:50:29.9551038Z st.global.v4.b32 [ %rd132 + 0 ], { %r1158, %r1159, %r1160, %r1161 }; 2026-02-21T09:50:29.9551092Z // end inline asm 2026-02-21T09:50:29.9551147Z // begin inline 
asm 2026-02-21T09:50:29.9551254Z st.global.v4.b32 [ %rd133 + 0 ], { %r1162, %r1163, %r1164, %r1165 }; 2026-02-21T09:50:29.9551307Z // end inline asm 2026-02-21T09:50:29.9551360Z // begin inline asm 2026-02-21T09:50:29.9551456Z st.global.v4.b32 [ %rd134 + 0 ], { %r1166, %r1167, %r1168, %r1169 }; 2026-02-21T09:50:29.9551515Z // end inline asm 2026-02-21T09:50:29.9551570Z // begin inline asm 2026-02-21T09:50:29.9551668Z st.global.v4.b32 [ %rd135 + 0 ], { %r1170, %r1171, %r1172, %r1173 }; 2026-02-21T09:50:29.9551727Z // end inline asm 2026-02-21T09:50:29.9551781Z // begin inline asm 2026-02-21T09:50:29.9551904Z st.global.v4.b32 [ %rd136 + 0 ], { %r1174, %r1175, %r1176, %r1177 }; 2026-02-21T09:50:29.9551964Z // end inline asm 2026-02-21T09:50:29.9552017Z // begin inline asm 2026-02-21T09:50:29.9552115Z st.global.v4.b32 [ %rd137 + 0 ], { %r1178, %r1179, %r1180, %r1181 }; 2026-02-21T09:50:29.9552167Z // end inline asm 2026-02-21T09:50:29.9552227Z // begin inline asm 2026-02-21T09:50:29.9552326Z st.global.v4.b32 [ %rd138 + 0 ], { %r1182, %r1183, %r1184, %r1185 }; 2026-02-21T09:50:29.9552399Z // end inline asm 2026-02-21T09:50:29.9552580Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9552646Z add.s32 %r1186, %r1189, 204848; 2026-02-21T09:50:29.9552812Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9552871Z bar.sync 0, 128; 2026-02-21T09:50:29.9552926Z // begin inline asm 2026-02-21T09:50:29.9553032Z @%p132 mbarrier.arrive.shared::cta.b64 _, [%r1186]; 2026-02-21T09:50:29.9553087Z // end inline asm 2026-02-21T09:50:29.9553153Z add.s32 %r1640, %r1696, 1; 2026-02-21T09:50:29.9553219Z setp.eq.b32 %p130, %r1640, 2; 2026-02-21T09:50:29.9553285Z selp.b32 %r1696, 0, %r1640, %p130; 2026-02-21T09:50:29.9553353Z selp.b32 %r1695, 1, 0, %p130; 2026-02-21T09:50:29.9553470Z $L__BB0_22: // %.thread23 2026-02-21T09:50:29.9553587Z // in Loop: Header=BB0_18 Depth=1 
2026-02-21T09:50:29.9553764Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9553833Z xor.b32 %r1698, %r1698, %r1695; 2026-02-21T09:50:29.9554003Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9554065Z add.s32 %r1685, %r1685, -1; 2026-02-21T09:50:29.9554136Z setp.ne.b32 %p131, %r1685, 0; 2026-02-21T09:50:29.9554197Z @%p131 bra $L__BB0_18; 2026-02-21T09:50:29.9554255Z bra.uni $L__BB0_23; 2026-02-21T09:50:29.9554371Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:50:29.9554429Z add.s32 %r623, %r1691, 1; 2026-02-21T09:50:29.9554493Z setp.eq.b32 %p126, %r1691, 15; 2026-02-21T09:50:29.9554556Z selp.b32 %r1691, 0, %r623, %p126; 2026-02-21T09:50:29.9554626Z setp.eq.b32 %p127, %r1691, 15; 2026-02-21T09:50:29.9554718Z @%p127 bra $L__BB0_21; 2026-02-21T09:50:29.9554823Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:29.9555000Z .loc 1 0 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:74 2026-02-21T09:50:29.9555057Z mov.b32 %r1695, 0; 2026-02-21T09:50:29.9555230Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9555301Z setp.ne.b32 %p128, %r1691, 0; 2026-02-21T09:50:29.9555360Z @%p128 bra $L__BB0_22; 2026-02-21T09:50:29.9555438Z // %bb.20: // %.thread 2026-02-21T09:50:29.9555530Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:29.9555597Z add.s32 %r1694, %r1694, 1; 2026-02-21T09:50:29.9555757Z .loc 1 37 35 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:37:35 2026-02-21T09:50:29.9555818Z shr.s32 %r1642, %r1694, 31; 2026-02-21T09:50:29.9555886Z shr.u32 %r1643, %r1642, 25; 2026-02-21T09:50:29.9555950Z add.s32 %r1644, %r1694, %r1643; 2026-02-21T09:50:29.9556009Z shr.s32 %r1645, %r1644, 7; 2026-02-21T09:50:29.9556182Z .loc 1 38 33 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:38:33 2026-02-21T09:50:29.9556239Z shl.b32 %r1646, %r1645, 4; 
2026-02-21T09:50:29.9556401Z .loc 1 39 39 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:39:39 2026-02-21T09:50:29.9556468Z sub.s32 %r1647, 96, %r1646; 2026-02-21T09:50:29.9556628Z .loc 1 39 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:39:52 2026-02-21T09:50:29.9556711Z min.s32 %r1648, %r1647, 16; 2026-02-21T09:50:29.9556881Z .loc 1 40 45 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:45 2026-02-21T09:50:29.9556949Z and.b32 %r1649, %r1644, -128; 2026-02-21T09:50:29.9557009Z sub.s32 %r1650, %r1694, %r1649; 2026-02-21T09:50:29.9557173Z .loc 1 41 51 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:41:51 2026-02-21T09:50:29.9557264Z div.s32 %r1651, %r1650, %r1648; 2026-02-21T09:50:29.9557424Z .loc 1 40 64 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:64 2026-02-21T09:50:29.9557485Z mul.lo.s32 %r1652, %r1651, %r1648; 2026-02-21T09:50:29.9557548Z sub.s32 %r1653, %r1650, %r1652; 2026-02-21T09:50:29.9557711Z .loc 1 40 30 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:30 2026-02-21T09:50:29.9557768Z add.s32 %r1654, %r1653, %r1646; 2026-02-21T09:50:29.9557931Z .loc 1 42 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:42:27 2026-02-21T09:50:29.9557996Z shl.b32 %r1692, %r1654, 7; 2026-02-21T09:50:29.9558157Z .loc 1 44 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:44:27 2026-02-21T09:50:29.9558240Z shl.b32 %r1693, %r1651, 8; 2026-02-21T09:50:29.9558433Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9558492Z bra.uni $L__BB0_22; 2026-02-21T09:50:29.9558590Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:50:29.9558761Z .loc 1 0 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:74 2026-02-21T09:50:29.9558821Z mov.b32 %r112, global_smem; 2026-02-21T09:50:29.9558877Z add.s32 %r113, %r112, %r3; 2026-02-21T09:50:29.9558943Z mov.u32 %r169, %ctaid.x; 2026-02-21T09:50:29.9559001Z mul.lo.s32 %r170, %r169, 3; 
2026-02-21T09:50:29.9559061Z add.s32 %r171, %r170, 3; 2026-02-21T09:50:29.9559117Z min.s32 %r172, %r171, 768; 2026-02-21T09:50:29.9559183Z sub.s32 %r173, %r172, %r170; 2026-02-21T09:50:29.9559240Z shl.b32 %r5, %r173, 4; 2026-02-21T09:50:29.9559301Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:50:29.9559364Z bra.uni $L__BB0_2; 2026-02-21T09:50:29.9559462Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9559630Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9559709Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9559771Z barrier.sync 1; 2026-02-21T09:50:29.9559826Z barrier.sync 1; 2026-02-21T09:50:29.9559905Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9559992Z $L__BB0_2: // %.preheader 2026-02-21T09:50:29.9560080Z // =>This Loop Header: Depth=1 2026-02-21T09:50:29.9560163Z // Child Loop BB0_11 Depth 2 2026-02-21T09:50:29.9560256Z // Child Loop BB0_7 Depth 2 2026-02-21T09:50:29.9560422Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19 2026-02-21T09:50:29.9560497Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:50:29.9560552Z barrier.sync 1; 2026-02-21T09:50:29.9560622Z ld.shared.b8 %r111, [%r113+204860]; 2026-02-21T09:50:29.9560684Z setp.gt.u32 %p4, %r111, 3; 2026-02-21T09:50:29.9560741Z @%p4 bra $L__BB0_4; 2026-02-21T09:50:29.9560825Z // %bb.3: // %.preheader 2026-02-21T09:50:29.9560912Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9560973Z $L_brx_0: .branchtargets 2026-02-21T09:50:29.9561033Z $L__BB0_5, 2026-02-21T09:50:29.9561084Z $L__BB0_9, 2026-02-21T09:50:29.9561136Z $L__BB0_15, 2026-02-21T09:50:29.9561185Z $L__BB0_24; 2026-02-21T09:50:29.9561252Z brx.idx %r111, $L_brx_0; 2026-02-21T09:50:29.9561365Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9561527Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9561607Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9561684Z 
ld.shared.b32 %r4, [global_smem+196608]; 2026-02-21T09:50:29.9561740Z barrier.sync 1; 2026-02-21T09:50:29.9561827Z @%p17 bra $L__BB0_8; 2026-02-21T09:50:29.9561911Z // %bb.6: // %.lr.ph11 2026-02-21T09:50:29.9561994Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9562158Z .loc 1 0 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:74 2026-02-21T09:50:29.9562222Z mov.b32 %r1673, -1; 2026-02-21T09:50:29.9562280Z mov.pred %p141, 0; 2026-02-21T09:50:29.9562334Z mov.b32 %r1669, 0; 2026-02-21T09:50:29.9562397Z mov.b32 %r1668, %r5; 2026-02-21T09:50:29.9562456Z mov.b32 %r1670, %r1669; 2026-02-21T09:50:29.9562510Z mov.b32 %r1671, %r1669; 2026-02-21T09:50:29.9562565Z mov.b32 %r1672, %r1669; 2026-02-21T09:50:29.9562668Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:29.9562757Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:29.9562959Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9563031Z add.s32 %r212, %r1673, 1; 2026-02-21T09:50:29.9563094Z setp.eq.b32 %p54, %r1673, 15; 2026-02-21T09:50:29.9563158Z selp.b32 %r1673, 0, %r212, %p54; 2026-02-21T09:50:29.9563222Z shl.b32 %r213, %r1672, 3; 2026-02-21T09:50:29.9563282Z add.s32 %r215, %r112, %r213; 2026-02-21T09:50:29.9563343Z add.s32 %r216, %r215, 204800; 2026-02-21T09:50:29.9563402Z add.s32 %r176, %r215, 204816; 2026-02-21T09:50:29.9563572Z .loc 1 54 31 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:54:31 2026-02-21T09:50:29.9563632Z shl.b32 %r217, %r1672, 16; 2026-02-21T09:50:29.9563691Z add.s32 %r218, %r112, %r217; 2026-02-21T09:50:29.9563859Z .loc 1 55 44 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:55:44 2026-02-21T09:50:29.9563915Z shl.b32 %r219, %r1672, 15; 2026-02-21T09:50:29.9563973Z add.s32 %r220, %r112, %r219; 2026-02-21T09:50:29.9564031Z add.s32 %r221, %r220, 131072; 2026-02-21T09:50:29.9564196Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9564256Z 
bar.warp.sync -1; 2026-02-21T09:50:29.9564311Z // begin inline asm 2026-02-21T09:50:29.9564370Z 2026-02-21T09:50:29.9564422Z { 2026-02-21T09:50:29.9564482Z .reg .pred complete; 2026-02-21T09:50:29.9564542Z waitLoop: 2026-02-21T09:50:29.9564665Z mbarrier.try_wait.parity.shared.b64 complete, [%r176], %r1671; 2026-02-21T09:50:29.9564756Z @!complete bra.uni waitLoop; 2026-02-21T09:50:29.9564805Z } 2026-02-21T09:50:29.9564811Z 2026-02-21T09:50:29.9564875Z // end inline asm 2026-02-21T09:50:29.9565041Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9565097Z shl.b32 %r222, %r1670, 8; 2026-02-21T09:50:29.9565162Z add.s32 %r178, %r222, %r4; 2026-02-21T09:50:29.9565329Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9565387Z shl.b32 %r223, %r1670, 3; 2026-02-21T09:50:29.9565451Z add.s32 %r224, %r112, %r223; 2026-02-21T09:50:29.9565508Z add.s32 %r225, %r224, 204832; 2026-02-21T09:50:29.9565568Z setp.eq.b32 %p53, %r1673, 15; 2026-02-21T09:50:29.9565730Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9565798Z elect.sync %r226|%p20, -1; 2026-02-21T09:50:29.9565858Z bfe.u32 %r227, %r218, 4, 14; 2026-02-21T09:50:29.9565916Z cvt.u64.u32 %rd46, %r227; 2026-02-21T09:50:29.9565995Z or.b64 %rd12, %rd46, 4611686293439512576; 2026-02-21T09:50:29.9566083Z bfe.u32 %r228, %r221, 4, 14; 2026-02-21T09:50:29.9566141Z cvt.u64.u32 %rd47, %r228; 2026-02-21T09:50:29.9566210Z or.b64 %rd13, %rd47, 4611686293372403712; 2026-02-21T09:50:29.9566271Z mov.b32 %r179, 136314896; 2026-02-21T09:50:29.9566326Z // begin inline asm 2026-02-21T09:50:29.9566477Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd12, %rd13, %r179, %p141; 2026-02-21T09:50:29.9566565Z // end inline asm 2026-02-21T09:50:29.9566621Z add.s32 %r229, %r218, 32; 2026-02-21T09:50:29.9566678Z bfe.u32 %r230, %r229, 4, 14; 2026-02-21T09:50:29.9566739Z cvt.u64.u32 %rd48, %r230; 
2026-02-21T09:50:29.9566803Z or.b64 %rd14, %rd48, 4611686293439512576; 2026-02-21T09:50:29.9566861Z add.s32 %r231, %r220, 131104; 2026-02-21T09:50:29.9566918Z bfe.u32 %r232, %r231, 4, 14; 2026-02-21T09:50:29.9566980Z cvt.u64.u32 %rd49, %r232; 2026-02-21T09:50:29.9567042Z or.b64 %rd15, %rd49, 4611686293372403712; 2026-02-21T09:50:29.9567102Z mov.pred %p21, -1; 2026-02-21T09:50:29.9567166Z // begin inline asm 2026-02-21T09:50:29.9567304Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd14, %rd15, %r179, %p21; 2026-02-21T09:50:29.9567359Z // end inline asm 2026-02-21T09:50:29.9567414Z add.s32 %r233, %r218, 64; 2026-02-21T09:50:29.9567478Z bfe.u32 %r234, %r233, 4, 14; 2026-02-21T09:50:29.9567558Z cvt.u64.u32 %rd50, %r234; 2026-02-21T09:50:29.9567654Z or.b64 %rd16, %rd50, 4611686293439512576; 2026-02-21T09:50:29.9567722Z add.s32 %r235, %r220, 131136; 2026-02-21T09:50:29.9567776Z bfe.u32 %r236, %r235, 4, 14; 2026-02-21T09:50:29.9567832Z cvt.u64.u32 %rd51, %r236; 2026-02-21T09:50:29.9567902Z or.b64 %rd17, %rd51, 4611686293372403712; 2026-02-21T09:50:29.9567956Z // begin inline asm 2026-02-21T09:50:29.9568087Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd16, %rd17, %r179, %p21; 2026-02-21T09:50:29.9568141Z // end inline asm 2026-02-21T09:50:29.9568203Z add.s32 %r237, %r218, 96; 2026-02-21T09:50:29.9568258Z bfe.u32 %r238, %r237, 4, 14; 2026-02-21T09:50:29.9568314Z cvt.u64.u32 %rd52, %r238; 2026-02-21T09:50:29.9568382Z or.b64 %rd18, %rd52, 4611686293439512576; 2026-02-21T09:50:29.9568439Z add.s32 %r239, %r220, 131168; 2026-02-21T09:50:29.9568494Z bfe.u32 %r240, %r239, 4, 14; 2026-02-21T09:50:29.9568549Z cvt.u64.u32 %rd53, %r240; 2026-02-21T09:50:29.9568620Z or.b64 %rd19, %rd53, 4611686293372403712; 2026-02-21T09:50:29.9568678Z // begin inline asm 2026-02-21T09:50:29.9568819Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd18, %rd19, %r179, %p21; 2026-02-21T09:50:29.9568878Z // end inline asm 2026-02-21T09:50:29.9568932Z add.s32 %r241, 
%r218, 32768; 2026-02-21T09:50:29.9568985Z bfe.u32 %r242, %r241, 4, 14; 2026-02-21T09:50:29.9569046Z cvt.u64.u32 %rd54, %r242; 2026-02-21T09:50:29.9569107Z or.b64 %rd20, %rd54, 4611686293439512576; 2026-02-21T09:50:29.9569161Z add.s32 %r243, %r220, 147456; 2026-02-21T09:50:29.9569215Z bfe.u32 %r244, %r243, 4, 14; 2026-02-21T09:50:29.9569278Z cvt.u64.u32 %rd55, %r244; 2026-02-21T09:50:29.9569341Z or.b64 %rd21, %rd55, 4611686293372403712; 2026-02-21T09:50:29.9569395Z // begin inline asm 2026-02-21T09:50:29.9569524Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd20, %rd21, %r179, %p21; 2026-02-21T09:50:29.9569576Z // end inline asm 2026-02-21T09:50:29.9569633Z add.s32 %r245, %r218, 32800; 2026-02-21T09:50:29.9569694Z bfe.u32 %r246, %r245, 4, 14; 2026-02-21T09:50:29.9569752Z cvt.u64.u32 %rd56, %r246; 2026-02-21T09:50:29.9569813Z or.b64 %rd22, %rd56, 4611686293439512576; 2026-02-21T09:50:29.9569870Z add.s32 %r247, %r220, 147488; 2026-02-21T09:50:29.9569933Z bfe.u32 %r248, %r247, 4, 14; 2026-02-21T09:50:29.9569988Z cvt.u64.u32 %rd57, %r248; 2026-02-21T09:50:29.9570050Z or.b64 %rd23, %rd57, 4611686293372403712; 2026-02-21T09:50:29.9570110Z // begin inline asm 2026-02-21T09:50:29.9570233Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd22, %rd23, %r179, %p21; 2026-02-21T09:50:29.9570285Z // end inline asm 2026-02-21T09:50:29.9570365Z add.s32 %r249, %r218, 32832; 2026-02-21T09:50:29.9570428Z bfe.u32 %r250, %r249, 4, 14; 2026-02-21T09:50:29.9570484Z cvt.u64.u32 %rd58, %r250; 2026-02-21T09:50:29.9570546Z or.b64 %rd24, %rd58, 4611686293439512576; 2026-02-21T09:50:29.9570612Z add.s32 %r251, %r220, 147520; 2026-02-21T09:50:29.9570670Z bfe.u32 %r252, %r251, 4, 14; 2026-02-21T09:50:29.9570727Z cvt.u64.u32 %rd59, %r252; 2026-02-21T09:50:29.9570823Z or.b64 %rd25, %rd59, 4611686293372403712; 2026-02-21T09:50:29.9570888Z // begin inline asm 2026-02-21T09:50:29.9571011Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd24, %rd25, %r179, %p21; 
2026-02-21T09:50:29.9571063Z // end inline asm 2026-02-21T09:50:29.9571125Z add.s32 %r253, %r218, 32864; 2026-02-21T09:50:29.9571180Z bfe.u32 %r254, %r253, 4, 14; 2026-02-21T09:50:29.9571236Z cvt.u64.u32 %rd60, %r254; 2026-02-21T09:50:29.9571304Z or.b64 %rd26, %rd60, 4611686293439512576; 2026-02-21T09:50:29.9571359Z add.s32 %r255, %r220, 147552; 2026-02-21T09:50:29.9571414Z bfe.u32 %r256, %r255, 4, 14; 2026-02-21T09:50:29.9571468Z cvt.u64.u32 %rd61, %r256; 2026-02-21T09:50:29.9571535Z or.b64 %rd27, %rd61, 4611686293372403712; 2026-02-21T09:50:29.9571589Z // begin inline asm 2026-02-21T09:50:29.9571709Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 0 ], %rd26, %rd27, %r179, %p21; 2026-02-21T09:50:29.9571788Z // end inline asm 2026-02-21T09:50:29.9571869Z add.s32 %r257, %r218, 16384; 2026-02-21T09:50:29.9571926Z bfe.u32 %r258, %r257, 4, 14; 2026-02-21T09:50:29.9571989Z cvt.u64.u32 %rd62, %r258; 2026-02-21T09:50:29.9572049Z or.b64 %rd28, %rd62, 4611686293439512576; 2026-02-21T09:50:29.9572103Z // begin inline asm 2026-02-21T09:50:29.9572236Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd28, %rd13, %r179, %p141; 2026-02-21T09:50:29.9572297Z // end inline asm 2026-02-21T09:50:29.9572351Z add.s32 %r259, %r218, 16416; 2026-02-21T09:50:29.9572405Z bfe.u32 %r260, %r259, 4, 14; 2026-02-21T09:50:29.9572466Z cvt.u64.u32 %rd63, %r260; 2026-02-21T09:50:29.9572530Z or.b64 %rd30, %rd63, 4611686293439512576; 2026-02-21T09:50:29.9572583Z // begin inline asm 2026-02-21T09:50:29.9572720Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd30, %rd15, %r179, %p21; 2026-02-21T09:50:29.9572773Z // end inline asm 2026-02-21T09:50:29.9572828Z add.s32 %r261, %r218, 16448; 2026-02-21T09:50:29.9572881Z bfe.u32 %r262, %r261, 4, 14; 2026-02-21T09:50:29.9572946Z cvt.u64.u32 %rd64, %r262; 2026-02-21T09:50:29.9573006Z or.b64 %rd32, %rd64, 4611686293439512576; 2026-02-21T09:50:29.9573059Z // begin inline asm 2026-02-21T09:50:29.9573198Z @%p20 
tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd32, %rd17, %r179, %p21; 2026-02-21T09:50:29.9573250Z // end inline asm 2026-02-21T09:50:29.9573303Z add.s32 %r263, %r218, 16480; 2026-02-21T09:50:29.9573357Z bfe.u32 %r264, %r263, 4, 14; 2026-02-21T09:50:29.9573419Z cvt.u64.u32 %rd65, %r264; 2026-02-21T09:50:29.9573480Z or.b64 %rd34, %rd65, 4611686293439512576; 2026-02-21T09:50:29.9573534Z // begin inline asm 2026-02-21T09:50:29.9573671Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd34, %rd19, %r179, %p21; 2026-02-21T09:50:29.9573723Z // end inline asm 2026-02-21T09:50:29.9573776Z add.s32 %r265, %r218, 49152; 2026-02-21T09:50:29.9573837Z bfe.u32 %r266, %r265, 4, 14; 2026-02-21T09:50:29.9573891Z cvt.u64.u32 %rd66, %r266; 2026-02-21T09:50:29.9573953Z or.b64 %rd36, %rd66, 4611686293439512576; 2026-02-21T09:50:29.9574008Z // begin inline asm 2026-02-21T09:50:29.9574139Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd36, %rd21, %r179, %p21; 2026-02-21T09:50:29.9574190Z // end inline asm 2026-02-21T09:50:29.9574243Z add.s32 %r267, %r218, 49184; 2026-02-21T09:50:29.9574304Z bfe.u32 %r268, %r267, 4, 14; 2026-02-21T09:50:29.9574357Z cvt.u64.u32 %rd67, %r268; 2026-02-21T09:50:29.9574417Z or.b64 %rd38, %rd67, 4611686293439512576; 2026-02-21T09:50:29.9574471Z // begin inline asm 2026-02-21T09:50:29.9574602Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd38, %rd23, %r179, %p21; 2026-02-21T09:50:29.9574709Z // end inline asm 2026-02-21T09:50:29.9574765Z add.s32 %r269, %r218, 49216; 2026-02-21T09:50:29.9574825Z bfe.u32 %r270, %r269, 4, 14; 2026-02-21T09:50:29.9574879Z cvt.u64.u32 %rd68, %r270; 2026-02-21T09:50:29.9574940Z or.b64 %rd40, %rd68, 4611686293439512576; 2026-02-21T09:50:29.9574999Z // begin inline asm 2026-02-21T09:50:29.9575125Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd40, %rd25, %r179, %p21; 2026-02-21T09:50:29.9575205Z // end inline asm 2026-02-21T09:50:29.9575260Z add.s32 %r271, %r218, 49248; 
2026-02-21T09:50:29.9575319Z bfe.u32 %r272, %r271, 4, 14; 2026-02-21T09:50:29.9575373Z cvt.u64.u32 %rd69, %r272; 2026-02-21T09:50:29.9575435Z or.b64 %rd42, %rd69, 4611686293439512576; 2026-02-21T09:50:29.9575496Z // begin inline asm 2026-02-21T09:50:29.9575621Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r178 + 128 ], %rd42, %rd27, %r179, %p21; 2026-02-21T09:50:29.9575674Z // end inline asm 2026-02-21T09:50:29.9575737Z cvt.u64.u32 %rd44, %r216; 2026-02-21T09:50:29.9575789Z // begin inline asm 2026-02-21T09:50:29.9575911Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd44]; 2026-02-21T09:50:29.9575962Z // end inline asm 2026-02-21T09:50:29.9576033Z and.pred %p52, %p53, %p20; 2026-02-21T09:50:29.9576114Z cvt.u64.u32 %rd45, %r225; 2026-02-21T09:50:29.9576168Z // begin inline asm 2026-02-21T09:50:29.9576320Z @%p52 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd45]; 2026-02-21T09:50:29.9576374Z // end inline asm 2026-02-21T09:50:29.9576534Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9576604Z setp.ne.b32 %p141, %r1673, 15; 2026-02-21T09:50:29.9576767Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9576822Z add.s32 %r273, %r1670, 1; 2026-02-21T09:50:29.9576881Z setp.eq.b32 %p55, %r273, 2; 2026-02-21T09:50:29.9576950Z selp.b32 %r274, 0, %r273, %p55; 2026-02-21T09:50:29.9577015Z selp.b32 %r1670, %r274, %r1670, %p53; 2026-02-21T09:50:29.9577077Z and.pred %p56, %p53, %p55; 2026-02-21T09:50:29.9577142Z selp.b32 %r275, 1, 0, %p56; 2026-02-21T09:50:29.9577200Z xor.b32 %r1669, %r1669, %r275; 2026-02-21T09:50:29.9577359Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9577423Z shl.b32 %r276, %r1670, 3; 2026-02-21T09:50:29.9577480Z add.s32 %r277, %r112, %r276; 2026-02-21T09:50:29.9577536Z add.s32 %r210, %r277, 204848; 2026-02-21T09:50:29.9577693Z .loc 1 56 52 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9577757Z // begin inline asm 2026-02-21T09:50:29.9577806Z 2026-02-21T09:50:29.9577856Z { 2026-02-21T09:50:29.9577926Z @!%p53 bra.uni skipWait; 2026-02-21T09:50:29.9577985Z .reg .pred complete; 2026-02-21T09:50:29.9578039Z waitLoop: 2026-02-21T09:50:29.9578156Z mbarrier.try_wait.parity.shared.b64 complete, [%r210], %r1669; 2026-02-21T09:50:29.9578227Z @!complete bra.uni waitLoop; 2026-02-21T09:50:29.9578282Z skipWait: 2026-02-21T09:50:29.9578330Z } 2026-02-21T09:50:29.9578333Z 2026-02-21T09:50:29.9578393Z // end inline asm 2026-02-21T09:50:29.9578547Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9578603Z add.s32 %r278, %r1672, 1; 2026-02-21T09:50:29.9578669Z setp.eq.b32 %p57, %r278, 2; 2026-02-21T09:50:29.9578730Z selp.b32 %r1672, 0, %r278, %p57; 2026-02-21T09:50:29.9578785Z selp.b32 %r279, 1, 0, %p57; 2026-02-21T09:50:29.9578842Z xor.b32 %r1671, %r1671, %r279; 2026-02-21T09:50:29.9579007Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9579064Z add.s32 %r1668, %r1668, -1; 2026-02-21T09:50:29.9579122Z setp.ne.b32 %p58, %r1668, 0; 2026-02-21T09:50:29.9579183Z @%p58 bra $L__BB0_7; 2026-02-21T09:50:29.9579293Z $L__BB0_8: // %._crit_edge12 2026-02-21T09:50:29.9579381Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9579435Z barrier.sync 1; 2026-02-21T09:50:29.9579517Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9579571Z bra.uni $L__BB0_2; 2026-02-21T09:50:29.9579668Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9579857Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9579933Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9579987Z barrier.sync 1; 2026-02-21T09:50:29.9580155Z .loc 1 21 67 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:21:67 2026-02-21T09:50:29.9580213Z mov.u32 %r115, 
%ctaid.y; 2026-02-21T09:50:29.9580269Z mov.u32 %r116, %ctaid.z; 2026-02-21T09:50:29.9580325Z mov.u32 %r117, %nctaid.x; 2026-02-21T09:50:29.9580387Z mov.u32 %r118, %nctaid.y; 2026-02-21T09:50:29.9580454Z mad.lo.s32 %r119, %r116, %r118, %r115; 2026-02-21T09:50:29.9580518Z mad.lo.s32 %r120, %r119, %r117, %r169; 2026-02-21T09:50:29.9580579Z shl.b32 %r121, %r120, 8; 2026-02-21T09:50:29.9580743Z .loc 1 22 68 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:22:68 2026-02-21T09:50:29.9580823Z cvt.s64.s32 %rd7, %r121; 2026-02-21T09:50:29.9580908Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:50:29.9580966Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:50:29.9581028Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:50:29.9581187Z .loc 1 21 67 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:21:67 2026-02-21T09:50:29.9581254Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:50:29.9581414Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9581469Z shl.b32 %r1674, %r173, 4; 2026-02-21T09:50:29.9581533Z setp.lt.s32 %p5, %r1674, 1; 2026-02-21T09:50:29.9581590Z @%p5 bra $L__BB0_14; 2026-02-21T09:50:29.9581664Z // %bb.10: // %.lr.ph 2026-02-21T09:50:29.9581753Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9581811Z add.s32 %r1684, %r170, -1; 2026-02-21T09:50:29.9581866Z add.s32 %r21, %r1, -128; 2026-02-21T09:50:29.9581921Z shr.u32 %r22, %r21, 5; 2026-02-21T09:50:29.9581983Z mov.b32 %r1681, -1; 2026-02-21T09:50:29.9582036Z mov.b32 %r1675, 0; 2026-02-21T09:50:29.9582091Z mov.b32 %r1676, %r1675; 2026-02-21T09:50:29.9582150Z mov.b32 %r1683, %r1675; 2026-02-21T09:50:29.9582202Z mov.b32 %r1682, %r1675; 2026-02-21T09:50:29.9582254Z mov.b32 %r1679, %r1675; 2026-02-21T09:50:29.9582308Z bra.uni $L__BB0_11; 2026-02-21T09:50:29.9582408Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:50:29.9582569Z .loc 1 0 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0:74 2026-02-21T09:50:29.9582630Z selp.b32 %r32, 0, %r1679, %p8; 
2026-02-21T09:50:29.9582694Z setp.lt.u32 %p12, %r21, 64; 2026-02-21T09:50:29.9582751Z setp.eq.b32 %p9, %r21, 0; 2026-02-21T09:50:29.9582911Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9582972Z shl.b32 %r152, %r1676, 3; 2026-02-21T09:50:29.9583029Z add.s32 %r154, %r112, %r152; 2026-02-21T09:50:29.9583086Z add.s32 %r141, %r154, 204800; 2026-02-21T09:50:29.9583248Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9583309Z // begin inline asm 2026-02-21T09:50:29.9583356Z 2026-02-21T09:50:29.9583403Z { 2026-02-21T09:50:29.9583465Z .reg .pred complete; 2026-02-21T09:50:29.9583517Z waitLoop: 2026-02-21T09:50:29.9583633Z mbarrier.try_wait.parity.shared.b64 complete, [%r141], %r1675; 2026-02-21T09:50:29.9583696Z @!complete bra.uni waitLoop; 2026-02-21T09:50:29.9583751Z } 2026-02-21T09:50:29.9583754Z 2026-02-21T09:50:29.9583828Z // end inline asm 2026-02-21T09:50:29.9583986Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9584049Z add.s32 %r147, %r154, 204816; 2026-02-21T09:50:29.9584204Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9584257Z bar.sync 3, 64; 2026-02-21T09:50:29.9584319Z // begin inline asm 2026-02-21T09:50:29.9584451Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r147], 98304; 2026-02-21T09:50:29.9584505Z // end inline asm 2026-02-21T09:50:29.9584667Z .loc 1 54 31 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:54:31 2026-02-21T09:50:29.9584764Z shl.b32 %r155, %r1676, 16; 2026-02-21T09:50:29.9584820Z add.s32 %r156, %r112, %r155; 2026-02-21T09:50:29.9584977Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9585041Z bar.sync 3, 64; 2026-02-21T09:50:29.9585114Z shfl.sync.idx.b32 %r157, %r22, 0, 31, -1; 2026-02-21T09:50:29.9585176Z elect.sync %r158|%p13, -1; 2026-02-21T09:50:29.9585242Z and.pred %p10, %p12, %p13; 
2026-02-21T09:50:29.9585297Z and.b32 %r159, %r157, 1; 2026-02-21T09:50:29.9585350Z shl.b32 %r160, %r159, 15; 2026-02-21T09:50:29.9585432Z add.s32 %r144, %r156, %r160; 2026-02-21T09:50:29.9585498Z shl.b32 %r161, %r159, 6; 2026-02-21T09:50:29.9585582Z add.s32 %r145, %r161, %r32; 2026-02-21T09:50:29.9585640Z // begin inline asm 2026-02-21T09:50:29.9585904Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r144], [%rd10, {%r145, %r1683}], [%r147]; 2026-02-21T09:50:29.9585965Z // end inline asm 2026-02-21T09:50:29.9586131Z .loc 1 55 44 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:55:44 2026-02-21T09:50:29.9586192Z shl.b32 %r162, %r1676, 15; 2026-02-21T09:50:29.9586248Z add.s32 %r163, %r112, %r162; 2026-02-21T09:50:29.9586405Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9586460Z shl.b32 %r164, %r159, 14; 2026-02-21T09:50:29.9586634Z .loc 1 55 44 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:55:44 2026-02-21T09:50:29.9586691Z add.s32 %r165, %r163, %r164; 2026-02-21T09:50:29.9586850Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9586911Z bar.sync 3, 64; 2026-02-21T09:50:29.9586971Z elect.sync %r166|%p14, -1; 2026-02-21T09:50:29.9587029Z and.pred %p11, %p12, %p14; 2026-02-21T09:50:29.9587092Z add.s32 %r148, %r165, 131072; 2026-02-21T09:50:29.9587146Z // begin inline asm 2026-02-21T09:50:29.9587390Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r148], [%rd11, {%r145, %r1682}], [%r147]; 2026-02-21T09:50:29.9587450Z // end inline asm 2026-02-21T09:50:29.9587612Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9587668Z add.s32 %r1679, %r32, 128; 2026-02-21T09:50:29.9587825Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9587887Z add.s32 %r167, %r1676, 1; 2026-02-21T09:50:29.9587947Z setp.eq.b32 
%p15, %r167, 2; 2026-02-21T09:50:29.9588011Z selp.b32 %r1676, 0, %r167, %p15; 2026-02-21T09:50:29.9588077Z selp.b32 %r168, 1, 0, %p15; 2026-02-21T09:50:29.9588134Z xor.b32 %r1675, %r1675, %r168; 2026-02-21T09:50:29.9588297Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9588359Z add.s32 %r1674, %r1674, -1; 2026-02-21T09:50:29.9588417Z setp.ne.b32 %p16, %r1674, 0; 2026-02-21T09:50:29.9588474Z @%p16 bra $L__BB0_11; 2026-02-21T09:50:29.9588528Z bra.uni $L__BB0_14; 2026-02-21T09:50:29.9588632Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:29.9588722Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:29.9588809Z add.s32 %r127, %r1681, 1; 2026-02-21T09:50:29.9588876Z setp.eq.b32 %p6, %r1681, 15; 2026-02-21T09:50:29.9588936Z selp.b32 %r1681, 0, %r127, %p6; 2026-02-21T09:50:29.9588994Z setp.ne.b32 %p7, %r1681, 0; 2026-02-21T09:50:29.9589051Z setp.eq.b32 %p8, %r1681, 0; 2026-02-21T09:50:29.9589113Z @%p7 bra $L__BB0_13; 2026-02-21T09:50:29.9589232Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:50:29.9589289Z add.s32 %r1684, %r1684, 1; 2026-02-21T09:50:29.9589454Z .loc 1 37 35 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:37:35 2026-02-21T09:50:29.9589508Z shr.s32 %r128, %r1684, 31; 2026-02-21T09:50:29.9589563Z shr.u32 %r129, %r128, 25; 2026-02-21T09:50:29.9589626Z add.s32 %r130, %r1684, %r129; 2026-02-21T09:50:29.9589682Z shr.s32 %r131, %r130, 7; 2026-02-21T09:50:29.9589842Z .loc 1 38 33 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:38:33 2026-02-21T09:50:29.9589897Z shl.b32 %r132, %r131, 4; 2026-02-21T09:50:29.9590065Z .loc 1 39 39 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:39:39 2026-02-21T09:50:29.9590119Z sub.s32 %r133, 96, %r132; 2026-02-21T09:50:29.9590295Z .loc 1 39 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:39:52 2026-02-21T09:50:29.9590381Z min.s32 %r134, %r133, 16; 2026-02-21T09:50:29.9590538Z .loc 1 40 45 // 
cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:45 2026-02-21T09:50:29.9590594Z and.b32 %r135, %r130, -128; 2026-02-21T09:50:29.9590658Z sub.s32 %r136, %r1684, %r135; 2026-02-21T09:50:29.9590819Z .loc 1 41 51 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:41:51 2026-02-21T09:50:29.9590875Z div.s32 %r137, %r136, %r134; 2026-02-21T09:50:29.9591046Z .loc 1 40 64 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:64 2026-02-21T09:50:29.9591107Z mul.lo.s32 %r138, %r137, %r134; 2026-02-21T09:50:29.9591162Z sub.s32 %r139, %r136, %r138; 2026-02-21T09:50:29.9591322Z .loc 1 40 30 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:40:30 2026-02-21T09:50:29.9591385Z add.s32 %r140, %r139, %r132; 2026-02-21T09:50:29.9591547Z .loc 1 42 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:42:27 2026-02-21T09:50:29.9591603Z shl.b32 %r1682, %r140, 7; 2026-02-21T09:50:29.9591767Z .loc 1 44 27 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:44:27 2026-02-21T09:50:29.9591821Z shl.b32 %r1683, %r137, 8; 2026-02-21T09:50:29.9591875Z bra.uni $L__BB0_13; 2026-02-21T09:50:29.9591960Z $L__BB0_14: // %._crit_edge 2026-02-21T09:50:29.9592045Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9592215Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9592273Z barrier.sync 1; 2026-02-21T09:50:29.9592362Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:29.9592418Z bra.uni $L__BB0_2; 2026-02-21T09:50:29.9592515Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:29.9592687Z .loc 1 19 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:19 2026-02-21T09:50:29.9592746Z barrier.sync 1; 2026-02-21T09:50:29.9592801Z barrier.sync 1; 2026-02-21T09:50:29.9592864Z bra.uni $L__BB0_2; 2026-02-21T09:50:29.9592950Z $L__BB0_23: // %._crit_edge15 2026-02-21T09:50:29.9593121Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9593177Z 
barrier.sync 1; 2026-02-21T09:50:29.9593263Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:29.9593322Z shl.b32 %r1666, %r1696, 3; 2026-02-21T09:50:29.9593407Z add.s32 %r1655, %r577, %r1666; 2026-02-21T09:50:29.9593583Z .loc 1 56 52 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:56:52 2026-02-21T09:50:29.9593640Z bar.sync 0, 128; 2026-02-21T09:50:29.9593695Z // begin inline asm 2026-02-21T09:50:29.9593752Z 2026-02-21T09:50:29.9593801Z { 2026-02-21T09:50:29.9593863Z .reg .pred complete; 2026-02-21T09:50:29.9593918Z waitLoop: 2026-02-21T09:50:29.9594076Z mbarrier.try_wait.parity.shared.b64 complete, [%r1655], %r1698; 2026-02-21T09:50:29.9594142Z @!complete bra.uni waitLoop; 2026-02-21T09:50:29.9594193Z } 2026-02-21T09:50:29.9594197Z 2026-02-21T09:50:29.9594261Z // end inline asm 2026-02-21T09:50:29.9594429Z .loc 1 31 74 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:74 2026-02-21T09:50:29.9594485Z bar.sync 0, 128; 2026-02-21T09:50:29.9594542Z // begin inline asm 2026-02-21T09:50:29.9594646Z @%p132 mbarrier.inval.shared::cta.b64 [%r577]; 2026-02-21T09:50:29.9594730Z // end inline asm 2026-02-21T09:50:29.9594789Z bar.sync 0, 128; 2026-02-21T09:50:29.9594853Z // begin inline asm 2026-02-21T09:50:29.9594940Z @%p132 mbarrier.inval.shared::cta.b64 [%r578]; 2026-02-21T09:50:29.9594996Z // end inline asm 2026-02-21T09:50:29.9595052Z // begin inline asm 2026-02-21T09:50:29.9595167Z @%p132 mbarrier.inval.shared::cta.b64 [%r575]; 2026-02-21T09:50:29.9595224Z // end inline asm 2026-02-21T09:50:29.9595305Z bar.sync 0, 128; 2026-02-21T09:50:29.9595371Z // begin inline asm 2026-02-21T09:50:29.9595451Z @%p132 mbarrier.inval.shared::cta.b64 [%r576]; 2026-02-21T09:50:29.9595504Z // end inline asm 2026-02-21T09:50:29.9595565Z // begin inline asm 2026-02-21T09:50:29.9595646Z @%p132 mbarrier.inval.shared::cta.b64 [%r571]; 2026-02-21T09:50:29.9595700Z // end inline asm 2026-02-21T09:50:29.9595753Z bar.sync 0, 128; 2026-02-21T09:50:29.9595816Z // begin 
inline asm 2026-02-21T09:50:29.9595894Z @%p132 mbarrier.inval.shared::cta.b64 [%r572]; 2026-02-21T09:50:29.9595948Z // end inline asm 2026-02-21T09:50:29.9596012Z // begin inline asm 2026-02-21T09:50:29.9596090Z @%p132 mbarrier.inval.shared::cta.b64 [%r569]; 2026-02-21T09:50:29.9596143Z // end inline asm 2026-02-21T09:50:29.9596197Z bar.sync 0, 128; 2026-02-21T09:50:29.9596259Z // begin inline asm 2026-02-21T09:50:29.9596339Z @%p132 mbarrier.inval.shared::cta.b64 [%r570]; 2026-02-21T09:50:29.9596393Z // end inline asm 2026-02-21T09:50:29.9596573Z .loc 1 31 4 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:31:4 2026-02-21T09:50:29.9596627Z bar.sync 0, 128; 2026-02-21T09:50:29.9596683Z // begin inline asm 2026-02-21T09:50:29.9596813Z @%p59 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1665, 512; 2026-02-21T09:50:29.9596867Z // end inline asm 2026-02-21T09:50:29.9596948Z st.shared.b32 [global_smem+204864], 50529027; 2026-02-21T09:50:29.9597005Z barrier.sync 1; 2026-02-21T09:50:29.9597094Z $L__BB0_24: // %common.ret 2026-02-21T09:50:29.9597265Z .loc 1 0 0 // cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py:0 2026-02-21T09:50:29.9597317Z ret; 2026-02-21T09:50:29.9597379Z $L__tmp1: 2026-02-21T09:50:29.9597435Z $L__func_end0: 2026-02-21T09:50:29.9597518Z // -- End function 2026-02-21T09:50:29.9597574Z } 2026-02-21T09:50:29.9597790Z .file 1 "/tmp/torchinductor_root/pq/cpqmrauxcbt5u33sduc4dq53rw4pyjstfqxvpdcjq63kq5d3d32r.py" 2026-02-21T09:50:29.9597855Z .section .debug_abbrev 2026-02-21T09:50:29.9597906Z { 2026-02-21T09:50:29.9598003Z .b8 1 // Abbreviation Code 2026-02-21T09:50:29.9598091Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:50:29.9598171Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:50:29.9598257Z .b8 37 // DW_AT_producer 2026-02-21T09:50:29.9598332Z .b8 8 // DW_FORM_string 2026-02-21T09:50:29.9598406Z .b8 19 // DW_AT_language 2026-02-21T09:50:29.9598519Z .b8 5 // DW_FORM_data2 2026-02-21T09:50:29.9598594Z .b8 3 // DW_AT_name 
2026-02-21T09:50:29.9598667Z .b8 8 // DW_FORM_string 2026-02-21T09:50:29.9598746Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:50:29.9598829Z .b8 6 // DW_FORM_data4 2026-02-21T09:50:29.9598930Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:50:29.9599002Z .b8 8 // DW_FORM_string 2026-02-21T09:50:29.9599079Z .b8 0 // EOM(1) 2026-02-21T09:50:29.9599149Z .b8 0 // EOM(2) 2026-02-21T09:50:29.9599216Z .b8 0 // EOM(3) 2026-02-21T09:50:29.9599272Z } 2026-02-21T09:50:29.9599331Z .section .debug_info 2026-02-21T09:50:29.9599381Z { 2026-02-21T09:50:29.9599465Z .b32 104 // Length of Unit 2026-02-21T09:50:29.9599556Z .b8 2 // DWARF version number 2026-02-21T09:50:29.9599607Z .b8 0 2026-02-21T09:50:29.9599723Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:50:29.9599839Z .b8 8 // Address Size (in bytes) 2026-02-21T09:50:29.9599962Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:50:29.9600044Z .b8 116 // DW_AT_producer 2026-02-21T09:50:29.9600097Z .b8 114 2026-02-21T09:50:29.9600156Z .b8 105 2026-02-21T09:50:29.9600205Z .b8 116 2026-02-21T09:50:29.9600257Z .b8 111 2026-02-21T09:50:29.9600315Z .b8 110 2026-02-21T09:50:29.9600365Z .b8 0 2026-02-21T09:50:29.9600438Z .b8 2 // DW_AT_language 2026-02-21T09:50:29.9600488Z .b8 0 2026-02-21T09:50:29.9600570Z .b8 99 // DW_AT_name 2026-02-21T09:50:29.9600621Z .b8 112 2026-02-21T09:50:29.9600672Z .b8 113 2026-02-21T09:50:29.9600738Z .b8 109 2026-02-21T09:50:29.9600786Z .b8 114 2026-02-21T09:50:29.9600834Z .b8 97 2026-02-21T09:50:29.9600881Z .b8 117 2026-02-21T09:50:29.9600937Z .b8 120 2026-02-21T09:50:29.9600987Z .b8 99 2026-02-21T09:50:29.9601034Z .b8 98 2026-02-21T09:50:29.9601090Z .b8 116 2026-02-21T09:50:29.9601139Z .b8 53 2026-02-21T09:50:29.9601190Z .b8 117 2026-02-21T09:50:29.9601240Z .b8 51 2026-02-21T09:50:29.9601296Z .b8 51 2026-02-21T09:50:29.9601346Z .b8 115 2026-02-21T09:50:29.9601395Z .b8 100 2026-02-21T09:50:29.9601451Z .b8 117 2026-02-21T09:50:29.9601500Z .b8 99 2026-02-21T09:50:29.9601550Z .b8 52 
2026-02-21T09:50:29.9601600Z .b8 100 2026-02-21T09:50:29.9601658Z .b8 113 2026-02-21T09:50:29.9601709Z .b8 53 2026-02-21T09:50:29.9601759Z .b8 51 2026-02-21T09:50:29.9601808Z .b8 114 2026-02-21T09:50:29.9601863Z .b8 119 2026-02-21T09:50:29.9601911Z .b8 52 2026-02-21T09:50:29.9601959Z .b8 112 2026-02-21T09:50:29.9602013Z .b8 121 2026-02-21T09:50:29.9602061Z .b8 106 2026-02-21T09:50:29.9602109Z .b8 115 2026-02-21T09:50:29.9602156Z .b8 116 2026-02-21T09:50:29.9602211Z .b8 102 2026-02-21T09:50:29.9602260Z .b8 113 2026-02-21T09:50:29.9602308Z .b8 120 2026-02-21T09:50:29.9602363Z .b8 118 2026-02-21T09:50:29.9602411Z .b8 112 2026-02-21T09:50:29.9602458Z .b8 100 2026-02-21T09:50:29.9602508Z .b8 99 2026-02-21T09:50:29.9602565Z .b8 106 2026-02-21T09:50:29.9602614Z .b8 113 2026-02-21T09:50:29.9602666Z .b8 54 2026-02-21T09:50:29.9602714Z .b8 51 2026-02-21T09:50:29.9602768Z .b8 107 2026-02-21T09:50:29.9602815Z .b8 113 2026-02-21T09:50:29.9602862Z .b8 53 2026-02-21T09:50:29.9602917Z .b8 100 2026-02-21T09:50:29.9602964Z .b8 51 2026-02-21T09:50:29.9603011Z .b8 100 2026-02-21T09:50:29.9603058Z .b8 51 2026-02-21T09:50:29.9603114Z .b8 50 2026-02-21T09:50:29.9603163Z .b8 114 2026-02-21T09:50:29.9603209Z .b8 46 2026-02-21T09:50:29.9603261Z .b8 112 2026-02-21T09:50:29.9603308Z .b8 121 2026-02-21T09:50:29.9603357Z .b8 0 2026-02-21T09:50:29.9603444Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:50:29.9603572Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:50:29.9603621Z .b8 116 2026-02-21T09:50:29.9603669Z .b8 109 2026-02-21T09:50:29.9603724Z .b8 112 2026-02-21T09:50:29.9603771Z .b8 47 2026-02-21T09:50:29.9603819Z .b8 116 2026-02-21T09:50:29.9603866Z .b8 111 2026-02-21T09:50:29.9603921Z .b8 114 2026-02-21T09:50:29.9603970Z .b8 99 2026-02-21T09:50:29.9604040Z .b8 104 2026-02-21T09:50:29.9604093Z .b8 105 2026-02-21T09:50:29.9604141Z .b8 110 2026-02-21T09:50:29.9604188Z .b8 100 2026-02-21T09:50:29.9604235Z .b8 117 2026-02-21T09:50:29.9604291Z .b8 99 2026-02-21T09:50:29.9604339Z .b8 116 
2026-02-21T09:50:29.9604386Z .b8 111 2026-02-21T09:50:29.9604435Z .b8 114 2026-02-21T09:50:29.9604491Z .b8 95 2026-02-21T09:50:29.9604539Z .b8 114 2026-02-21T09:50:29.9604587Z .b8 111 2026-02-21T09:50:29.9604643Z .b8 111 2026-02-21T09:50:29.9604720Z .b8 116 2026-02-21T09:50:29.9604769Z .b8 47 2026-02-21T09:50:29.9604817Z .b8 112 2026-02-21T09:50:29.9604873Z .b8 113 2026-02-21T09:50:29.9604921Z .b8 0 2026-02-21T09:50:29.9604969Z } 2026-02-21T09:50:29.9605037Z .section .debug_macinfo { } 2026-02-21T09:50:29.9605041Z 2026-02-21T09:50:29.9605116Z ================================================================ 2026-02-21T09:50:29.9605242Z please share the reproducer above with Triton project. 2026-02-21T09:50:32.0858774Z 2026-02-21T09:50:32.0863634Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 86/86 20.5 configs/s 2026-02-21T09:50:41.1296060Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 110.8 2026-02-21T09:50:41.1297503Z configs/s 2026-02-21T09:50:41.4163236Z [258s] Generation 8 complete: 2026-02-21T09:50:41.4167774Z error=16 2026-02-21T09:50:41.4170967Z ok=74 2026-02-21T09:50:41.4174419Z min=0.1073 2026-02-21T09:50:41.4178374Z mid=0.1423 2026-02-21T09:50:41.4181587Z max=1.2000 2026-02-21T09:50:41.4185510Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:50:41.4185855Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:50:41.4191547Z 'l2_groupings': [4], 2026-02-21T09:50:41.4193685Z 'load_eviction_policies': ['last', ''], 2026-02-21T09:50:41.4193962Z 'loop_orders': [[1, 0]], 2026-02-21T09:50:41.4194133Z 'maxnreg': 64, 2026-02-21T09:50:41.4198820Z 'num_sm_multiplier': 16, 2026-02-21T09:50:41.4203239Z 'num_stages': 3, 2026-02-21T09:50:41.4207215Z 'num_warps': 1, 2026-02-21T09:50:41.4209251Z 'pid_type': 'persistent_blocked', 2026-02-21T09:50:41.4209515Z 'range_flattens': [False, None], 2026-02-21T09:50:41.4214379Z 'range_multi_buffers': [False, False], 2026-02-21T09:50:41.4218924Z 'range_num_stages': [0, 0], 
2026-02-21T09:50:41.4220789Z 'range_unroll_factors': [0, 0], 2026-02-21T09:50:41.4221021Z 'range_warp_specializes': [None, True]} 2026-02-21T09:50:41.4221311Z [258s] Fitting surrogate: 860 points, 860 targets 2026-02-21T09:50:42.3669022Z [259s] Generation 9 starting: 49 neighbors, 3 active search path(s) 2026-02-21T09:50:53.8016096Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50/50 1.1 configs/s 2026-02-21T09:50:55.8309770Z 2026-02-21T09:50:55.8315738Z 2026-02-21T09:50:55.8320783Z ================================================================ 2026-02-21T09:50:55.8322152Z Internal Triton PTX codegen error 2026-02-21T09:50:55.8322445Z `ptxas` stderr: 2026-02-21T09:50:55.8322976Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 207 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:50:55.8323535Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:55.8323734Z 2026-02-21T09:50:55.8324229Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm435kurp.ptx -o /tmp/tmpm435kurp.ptx.o 2026-02-21T09:50:55.8324825Z 2026-02-21T09:50:55.8324829Z 2026-02-21T09:50:55.8324902Z // 2026-02-21T09:50:55.8325074Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:50:55.8325590Z // 2026-02-21T09:50:55.8325674Z 2026-02-21T09:50:55.8325756Z .version 8.7 2026-02-21T09:50:55.8325931Z .target sm_100a 2026-02-21T09:50:55.8326090Z .address_size 64 2026-02-21T09:50:55.8326187Z 2026-02-21T09:50:55.8326333Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:50:55.8326628Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:50:55.8326956Z // @_helion_matmul 2026-02-21T09:50:55.8327153Z .visible .entry _helion_matmul( 2026-02-21T09:50:55.8327373Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:50:55.8327633Z .param .u64 .ptr .global .align 1 
_helion_matmul_param_1, 2026-02-21T09:50:55.8327898Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:50:55.8328162Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:50:55.8328429Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:50:55.8328634Z ) 2026-02-21T09:50:55.8328748Z .reqntid 256 2026-02-21T09:50:55.8328881Z .maxnreg 32 2026-02-21T09:50:55.8328997Z { 2026-02-21T09:50:55.8329125Z .reg .pred %p<123>; 2026-02-21T09:50:55.8329274Z .reg .b32 %r<1611>; 2026-02-21T09:50:55.8329409Z .reg .b64 %rd<609>; 2026-02-21T09:50:55.8329752Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19:0 2026-02-21T09:50:55.8330137Z $L__func_begin0: 2026-02-21T09:50:55.8330393Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19:0 2026-02-21T09:50:55.8330617Z 2026-02-21T09:50:55.8330669Z // %bb.0: 2026-02-21T09:50:55.8334769Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:50:55.8340007Z $L__tmp0: 2026-02-21T09:50:55.8342513Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19 2026-02-21T09:50:55.8342945Z mov.u32 %r1, %tid.x; 2026-02-21T09:50:55.8343095Z shr.u32 %r2, %r1, 5; 2026-02-21T09:50:55.8343274Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:50:55.8343461Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:50:55.8343622Z @%p3 bra $L__BB0_16; 2026-02-21T09:50:55.8343761Z bra.uni $L__BB0_1; 2026-02-21T09:50:55.8343903Z $L__BB0_16: 2026-02-21T09:50:55.8344145Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:0 2026-02-21T09:50:55.8344463Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:50:55.8344724Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:50:55.8345017Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19 2026-02-21T09:50:55.8345325Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:55.8345513Z setp.lt.u32 %p33, %r1, 32; 2026-02-21T09:50:55.8345681Z mov.b32 %r190, 
global_smem; 2026-02-21T09:50:55.8345836Z // begin inline asm 2026-02-21T09:50:55.8346089Z @%p33 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r190], 256; 2026-02-21T09:50:55.8346348Z // end inline asm 2026-02-21T09:50:55.8346483Z bar.sync 0, 128; 2026-02-21T09:50:55.8346635Z ld.shared.b32 %r1582, [global_smem]; 2026-02-21T09:50:55.8346800Z bar.sync 0, 128; 2026-02-21T09:50:55.8346941Z // begin inline asm 2026-02-21T09:50:55.8347142Z @%p33 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:50:55.8347369Z // end inline asm 2026-02-21T09:50:55.8347649Z .loc 1 21 67 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:21:67 2026-02-21T09:50:55.8347943Z mov.u32 %r41, %ctaid.x; 2026-02-21T09:50:55.8348090Z mov.u32 %r495, %ctaid.y; 2026-02-21T09:50:55.8348243Z mov.u32 %r496, %ctaid.z; 2026-02-21T09:50:55.8348388Z mov.u32 %r497, %nctaid.x; 2026-02-21T09:50:55.8348545Z mov.u32 %r498, %nctaid.y; 2026-02-21T09:50:55.8348698Z mad.lo.s32 %r499, %r496, %r498, %r495; 2026-02-21T09:50:55.8348878Z mad.lo.s32 %r500, %r499, %r497, %r41; 2026-02-21T09:50:55.8349044Z shl.b32 %r501, %r500, 8; 2026-02-21T09:50:55.8349392Z cvt.s64.s32 %rd64, %r501; 2026-02-21T09:50:55.8349544Z add.s64 %rd43, %rd6, %rd64; 2026-02-21T09:50:55.8349705Z shl.b32 %r502, %r1, 2; 2026-02-21T09:50:55.8349859Z add.s32 %r191, %r190, %r502; 2026-02-21T09:50:55.8350007Z mov.b32 %r1610, 0; 2026-02-21T09:50:55.8350147Z // begin inline asm 2026-02-21T09:50:55.8350297Z @%p33 st.shared.b32 [ %r191 + 0 ], %r1610; 2026-02-21T09:50:55.8350476Z // end inline asm 2026-02-21T09:50:55.8350675Z bar.warp.sync -1; 2026-02-21T09:50:55.8350831Z setp.eq.b32 %p111, %r1, 0; 2026-02-21T09:50:55.8350989Z cvt.u64.u32 %rd28, %r190; 2026-02-21T09:50:55.8351147Z // begin inline asm 2026-02-21T09:50:55.8351397Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd3; 2026-02-21T09:50:55.8351686Z // end inline asm 2026-02-21T09:50:55.8351825Z // begin inline asm 
2026-02-21T09:50:55.8352051Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:50:55.8352310Z // end inline asm 2026-02-21T09:50:55.8352442Z mov.b32 %r193, 32; 2026-02-21T09:50:55.8352581Z // begin inline asm 2026-02-21T09:50:55.8352818Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r193; 2026-02-21T09:50:55.8353093Z // end inline asm 2026-02-21T09:50:55.8353238Z mov.b32 %r194, 256; 2026-02-21T09:50:55.8353420Z // begin inline asm 2026-02-21T09:50:55.8353694Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r194; 2026-02-21T09:50:55.8353963Z // end inline asm 2026-02-21T09:50:55.8354109Z mov.b32 %r195, 2048; 2026-02-21T09:50:55.8354249Z // begin inline asm 2026-02-21T09:50:55.8354498Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r195; 2026-02-21T09:50:55.8354817Z // end inline asm 2026-02-21T09:50:55.8354945Z // begin inline asm 2026-02-21T09:50:55.8355185Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r195; 2026-02-21T09:50:55.8355448Z // end inline asm 2026-02-21T09:50:55.8355585Z mov.b64 %rd36, 4096; 2026-02-21T09:50:55.8355720Z // begin inline asm 2026-02-21T09:50:55.8355974Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:50:55.8356254Z // end inline asm 2026-02-21T09:50:55.8356388Z mov.b32 %r197, 1; 2026-02-21T09:50:55.8356528Z // begin inline asm 2026-02-21T09:50:55.8356786Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r197; 2026-02-21T09:50:55.8357091Z // end inline asm 2026-02-21T09:50:55.8357227Z // begin inline asm 2026-02-21T09:50:55.8357503Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r197; 2026-02-21T09:50:55.8357789Z // end inline asm 2026-02-21T09:50:55.8357933Z // begin inline asm 2026-02-21T09:50:55.8358171Z @%p111 
tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 2026-02-21T09:50:55.8358436Z // end inline asm 2026-02-21T09:50:55.8358576Z // begin inline asm 2026-02-21T09:50:55.8358828Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:50:55.8359124Z // end inline asm 2026-02-21T09:50:55.8359255Z // begin inline asm 2026-02-21T09:50:55.8359498Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:50:55.8359781Z // end inline asm 2026-02-21T09:50:55.8359913Z // begin inline asm 2026-02-21T09:50:55.8360154Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:50:55.8360413Z // end inline asm 2026-02-21T09:50:55.8360554Z // begin inline asm 2026-02-21T09:50:55.8360900Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:50:55.8361289Z // end inline asm 2026-02-21T09:50:55.8361428Z // begin inline asm 2026-02-21T09:50:55.8361637Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T09:50:55.8361896Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:50:55.8362132Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:50:55.8362322Z // end inline asm 2026-02-21T09:50:55.8362460Z bar.sync 0, 128; 2026-02-21T09:50:55.8362726Z .loc 1 22 68 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:22:68 2026-02-21T09:50:55.8363042Z add.s64 %rd61, %rd43, 128; 2026-02-21T09:50:55.8363217Z bar.sync 0, 128; 2026-02-21T09:50:55.8363387Z // begin inline asm 2026-02-21T09:50:55.8363540Z @%p33 st.shared.b32 [ %r191 + 0 ], %r1610; 2026-02-21T09:50:55.8363724Z // end inline asm 2026-02-21T09:50:55.8363860Z bar.warp.sync -1; 2026-02-21T09:50:55.8364004Z // begin inline asm 2026-02-21T09:50:55.8364247Z @%p111 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd4; 2026-02-21T09:50:55.8364534Z 
// end inline asm 2026-02-21T09:50:55.8364665Z // begin inline asm 2026-02-21T09:50:55.8364924Z @%p111 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T09:50:55.8365183Z // end inline asm 2026-02-21T09:50:55.8365312Z // begin inline asm 2026-02-21T09:50:55.8365553Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r193; 2026-02-21T09:50:55.8365824Z // end inline asm 2026-02-21T09:50:55.8365964Z mov.b32 %r202, 128; 2026-02-21T09:50:55.8366132Z // begin inline asm 2026-02-21T09:50:55.8366407Z @%p111 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r202; 2026-02-21T09:50:55.8366676Z // end inline asm 2026-02-21T09:50:55.8366814Z // begin inline asm 2026-02-21T09:50:55.8367074Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r195; 2026-02-21T09:50:55.8367335Z // end inline asm 2026-02-21T09:50:55.8367469Z mov.b32 %r204, 12288; 2026-02-21T09:50:55.8367606Z // begin inline asm 2026-02-21T09:50:55.8367843Z @%p111 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r204; 2026-02-21T09:50:55.8368108Z // end inline asm 2026-02-21T09:50:55.8368241Z // begin inline asm 2026-02-21T09:50:55.8368488Z @%p111 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T09:50:55.8368763Z // end inline asm 2026-02-21T09:50:55.8368895Z // begin inline asm 2026-02-21T09:50:55.8369138Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r197; 2026-02-21T09:50:55.8369422Z // end inline asm 2026-02-21T09:50:55.8369551Z // begin inline asm 2026-02-21T09:50:55.8369801Z @%p111 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r197; 2026-02-21T09:50:55.8370078Z // end inline asm 2026-02-21T09:50:55.8370204Z // begin inline asm 2026-02-21T09:50:55.8370431Z @%p111 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x6; 
2026-02-21T09:50:55.8370681Z // end inline asm 2026-02-21T09:50:55.8370814Z // begin inline asm 2026-02-21T09:50:55.8371056Z @%p111 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:50:55.8371331Z // end inline asm 2026-02-21T09:50:55.8371457Z // begin inline asm 2026-02-21T09:50:55.8371690Z @%p111 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x2; 2026-02-21T09:50:55.8371956Z // end inline asm 2026-02-21T09:50:55.8372084Z // begin inline asm 2026-02-21T09:50:55.8372309Z @%p111 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T09:50:55.8372559Z // end inline asm 2026-02-21T09:50:55.8372690Z // begin inline asm 2026-02-21T09:50:55.8373017Z @%p33 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T09:50:55.8373397Z // end inline asm 2026-02-21T09:50:55.8373529Z // begin inline asm 2026-02-21T09:50:55.8373727Z @%p33 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:50:55.8373970Z @%p33 cp.async.bulk.commit_group ; 2026-02-21T09:50:55.8374152Z @%p33 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:50:55.8374363Z // end inline asm 2026-02-21T09:50:55.8374488Z bar.sync 0, 128; 2026-02-21T09:50:55.8374783Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8375072Z sub.s32 %r505, 768, %r41; 2026-02-21T09:50:55.8375231Z mul.hi.s32 %r506, %r505, -580400985; 2026-02-21T09:50:55.8375410Z add.s32 %r507, %r506, %r505; 2026-02-21T09:50:55.8375593Z shr.u32 %r508, %r507, 31; 2026-02-21T09:50:55.8375747Z shr.s32 %r509, %r507, 9; 2026-02-21T09:50:55.8375894Z add.s32 %r510, %r509, %r508; 2026-02-21T09:50:55.8376054Z mul.lo.s32 %r511, %r510, 592; 2026-02-21T09:50:55.8376215Z setp.ne.b32 %p102, %r505, %r511; 2026-02-21T09:50:55.8376386Z setp.lt.u32 %p103, %r41, 769; 2026-02-21T09:50:55.8376545Z and.pred %p104, %p103, %p102; 
2026-02-21T09:50:55.8376708Z selp.b32 %r512, 1, 0, %p104; 2026-02-21T09:50:55.8376863Z add.s32 %r513, %r510, %r512; 2026-02-21T09:50:55.8377010Z shl.b32 %r75, %r513, 6; 2026-02-21T09:50:55.8377596Z [272s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:50:55.8378874Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:50:55.8380145Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:50:55.8380376Z `ptxas` stderr: 2026-02-21T09:50:55.8380783Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 207 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:50:55.8381238Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:50:55.8381390Z 2026-02-21T09:50:55.8381785Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmpm435kurp.ptx -o /tmp/tmpm435kurp.ptx.o 2026-02-21T09:50:55.8382212Z 2026-02-21T09:50:55.8382346Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:50:55.8382689Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8382999Z shfl.sync.idx.b32 %r514, %r2, 0, 31, -1; 2026-02-21T09:50:55.8383179Z shl.b32 %r515, %r514, 21; 2026-02-21T09:50:55.8383337Z and.b32 %r516, %r515, 6291456; 2026-02-21T09:50:55.8383496Z add.s32 %r207, %r516, %r1582; 2026-02-21T09:50:55.8383658Z mov.pred %p71, -1; 2026-02-21T09:50:55.8383803Z // begin inline asm 2026-02-21T09:50:55.8384193Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 0], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8384615Z // end inline asm 2026-02-21T09:50:55.8384780Z // begin inline asm 2026-02-21T09:50:55.8385162Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 16], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8385582Z // end inline asm 2026-02-21T09:50:55.8385722Z // begin inline asm 2026-02-21T09:50:55.8386109Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 32], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8386525Z // end inline asm 2026-02-21T09:50:55.8386659Z // begin inline asm 2026-02-21T09:50:55.8387041Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 48], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8387493Z // end inline asm 2026-02-21T09:50:55.8387626Z // begin inline asm 2026-02-21T09:50:55.8388006Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 64], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8388412Z // end inline asm 2026-02-21T09:50:55.8388539Z // 
begin inline asm 2026-02-21T09:50:55.8388915Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 80], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8389363Z // end inline asm 2026-02-21T09:50:55.8389497Z // begin inline asm 2026-02-21T09:50:55.8389873Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 96], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8390278Z // end inline asm 2026-02-21T09:50:55.8390415Z // begin inline asm 2026-02-21T09:50:55.8390797Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 112], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8391208Z // end inline asm 2026-02-21T09:50:55.8391336Z // begin inline asm 2026-02-21T09:50:55.8391787Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 128], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8392217Z // end inline asm 2026-02-21T09:50:55.8392343Z // begin inline asm 2026-02-21T09:50:55.8392721Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 144], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8393124Z // end inline asm 2026-02-21T09:50:55.8393260Z // begin inline asm 2026-02-21T09:50:55.8393651Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 160], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8394064Z // end inline asm 2026-02-21T09:50:55.8394199Z // begin inline asm 2026-02-21T09:50:55.8394571Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 176], 
{%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8395049Z // end inline asm 2026-02-21T09:50:55.8395176Z // begin inline asm 2026-02-21T09:50:55.8395574Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 192], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8395983Z // end inline asm 2026-02-21T09:50:55.8396109Z // begin inline asm 2026-02-21T09:50:55.8396503Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 208], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8396909Z // end inline asm 2026-02-21T09:50:55.8397045Z // begin inline asm 2026-02-21T09:50:55.8397443Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 224], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8397859Z // end inline asm 2026-02-21T09:50:55.8397995Z // begin inline asm 2026-02-21T09:50:55.8398368Z @%p71 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r207 + 240], {%r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610, %r1610}; 2026-02-21T09:50:55.8398783Z // end inline asm 2026-02-21T09:50:55.8398911Z // begin inline asm 2026-02-21T09:50:55.8399069Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:50:55.8399231Z // end inline asm 2026-02-21T09:50:55.8399362Z bar.sync 0, 128; 2026-02-21T09:50:55.8399617Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8399924Z add.s32 %r479, %r190, 106496; 2026-02-21T09:50:55.8400081Z // begin inline asm 2026-02-21T09:50:55.8400245Z @%p111 mbarrier.init.shared::cta.b64 [%r479], 1; 2026-02-21T09:50:55.8400440Z // end inline asm 
2026-02-21T09:50:55.8400567Z bar.sync 0, 128; 2026-02-21T09:50:55.8400706Z add.s32 %r480, %r190, 106504; 2026-02-21T09:50:55.8400884Z // begin inline asm 2026-02-21T09:50:55.8401045Z @%p111 mbarrier.init.shared::cta.b64 [%r480], 1; 2026-02-21T09:50:55.8401239Z // end inline asm 2026-02-21T09:50:55.8401366Z bar.sync 0, 128; 2026-02-21T09:50:55.8401517Z add.s32 %r481, %r190, 106512; 2026-02-21T09:50:55.8401669Z // begin inline asm 2026-02-21T09:50:55.8401837Z @%p111 mbarrier.init.shared::cta.b64 [%r481], 1; 2026-02-21T09:50:55.8402024Z // end inline asm 2026-02-21T09:50:55.8402165Z bar.sync 0, 128; 2026-02-21T09:50:55.8402300Z add.s32 %r482, %r190, 106520; 2026-02-21T09:50:55.8402460Z // begin inline asm 2026-02-21T09:50:55.8402629Z @%p111 mbarrier.init.shared::cta.b64 [%r482], 1; 2026-02-21T09:50:55.8402816Z // end inline asm 2026-02-21T09:50:55.8402957Z add.s32 %r483, %r190, 106528; 2026-02-21T09:50:55.8403108Z // begin inline asm 2026-02-21T09:50:55.8403302Z @%p111 mbarrier.init.shared::cta.b64 [%r483], 1; 2026-02-21T09:50:55.8403491Z // end inline asm 2026-02-21T09:50:55.8403654Z bar.sync 0, 128; 2026-02-21T09:50:55.8403792Z add.s32 %r484, %r190, 106536; 2026-02-21T09:50:55.8403950Z // begin inline asm 2026-02-21T09:50:55.8404118Z @%p111 mbarrier.init.shared::cta.b64 [%r484], 1; 2026-02-21T09:50:55.8404300Z // end inline asm 2026-02-21T09:50:55.8404436Z bar.sync 0, 128; 2026-02-21T09:50:55.8404570Z add.s32 %r485, %r190, 106544; 2026-02-21T09:50:55.8404776Z // begin inline asm 2026-02-21T09:50:55.8404939Z @%p111 mbarrier.init.shared::cta.b64 [%r485], 1; 2026-02-21T09:50:55.8405131Z // end inline asm 2026-02-21T09:50:55.8405261Z bar.sync 0, 128; 2026-02-21T09:50:55.8405404Z add.s32 %r486, %r190, 106552; 2026-02-21T09:50:55.8405556Z // begin inline asm 2026-02-21T09:50:55.8405729Z @%p111 mbarrier.init.shared::cta.b64 [%r486], 1; 2026-02-21T09:50:55.8405925Z // end inline asm 2026-02-21T09:50:55.8406180Z .loc 1 0 0 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8406478Z bar.sync 0, 128; 2026-02-21T09:50:55.8406616Z // begin inline asm 2026-02-21T09:50:55.8406799Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r479]; 2026-02-21T09:50:55.8406997Z // end inline asm 2026-02-21T09:50:55.8407138Z bar.sync 0, 128; 2026-02-21T09:50:55.8407272Z // begin inline asm 2026-02-21T09:50:55.8407450Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r480]; 2026-02-21T09:50:55.8407654Z // end inline asm 2026-02-21T09:50:55.8407785Z bar.sync 0, 128; 2026-02-21T09:50:55.8407924Z // begin inline asm 2026-02-21T09:50:55.8408091Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r481]; 2026-02-21T09:50:55.8408292Z // end inline asm 2026-02-21T09:50:55.8408421Z bar.sync 0, 128; 2026-02-21T09:50:55.8408563Z // begin inline asm 2026-02-21T09:50:55.8408729Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r482]; 2026-02-21T09:50:55.8408925Z // end inline asm 2026-02-21T09:50:55.8409188Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8409484Z bar.sync 0, 128; 2026-02-21T09:50:55.8409623Z add.s32 %r491, %r190, 106560; 2026-02-21T09:50:55.8409772Z // begin inline asm 2026-02-21T09:50:55.8409938Z @%p111 mbarrier.init.shared::cta.b64 [%r491], 1; 2026-02-21T09:50:55.8410119Z // end inline asm 2026-02-21T09:50:55.8410258Z add.s32 %r1570, %r190, 106576; 2026-02-21T09:50:55.8410412Z // begin inline asm 2026-02-21T09:50:55.8410579Z @%p111 mbarrier.init.shared::cta.b64 [%r1570], 1; 2026-02-21T09:50:55.8410767Z // end inline asm 2026-02-21T09:50:55.8411008Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8411326Z bar.sync 0, 128; 2026-02-21T09:50:55.8411458Z // begin inline asm 2026-02-21T09:50:55.8411631Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r1570]; 2026-02-21T09:50:55.8411824Z // end inline asm 2026-02-21T09:50:55.8412074Z .loc 1 28 97 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8412389Z st.shared.b32 [global_smem+106584], 33554689; 2026-02-21T09:50:55.8412619Z st.shared.b32 [global_smem+98304], %r1582; 2026-02-21T09:50:55.8412814Z st.shared.b32 [global_smem+98312], %r75; 2026-02-21T09:50:55.8412986Z barrier.sync 1; 2026-02-21T09:50:55.8413145Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:55.8413317Z barrier.sync 1; 2026-02-21T09:50:55.8413460Z setp.lt.s32 %p105, %r513, 1; 2026-02-21T09:50:55.8413615Z @%p105 bra $L__BB0_23; 2026-02-21T09:50:55.8413783Z // %bb.17: // %.lr.ph10 2026-02-21T09:50:55.8414092Z .loc 1 0 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:97 2026-02-21T09:50:55.8414416Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:50:55.8414627Z shl.b32 %r503, %r1, 3; 2026-02-21T09:50:55.8414805Z and.b32 %r42, %r503, 120; 2026-02-21T09:50:55.8414963Z shr.u32 %r504, %r1, 4; 2026-02-21T09:50:55.8415135Z bfe.u32 %r43, %r1, 4, 3; 2026-02-21T09:50:55.8415292Z or.b32 %r44, %r43, 8; 2026-02-21T09:50:55.8415465Z or.b32 %r45, %r43, 16; 2026-02-21T09:50:55.8415618Z or.b32 %r46, %r43, 24; 2026-02-21T09:50:55.8415758Z or.b32 %r47, %r43, 32; 2026-02-21T09:50:55.8415903Z or.b32 %r48, %r43, 40; 2026-02-21T09:50:55.8416046Z or.b32 %r49, %r43, 48; 2026-02-21T09:50:55.8416187Z or.b32 %r50, %r504, 56; 2026-02-21T09:50:55.8416341Z or.b32 %r51, %r43, 64; 2026-02-21T09:50:55.8416479Z or.b32 %r52, %r43, 72; 2026-02-21T09:50:55.8416626Z or.b32 %r53, %r43, 80; 2026-02-21T09:50:55.8416764Z or.b32 %r54, %r43, 88; 2026-02-21T09:50:55.8416910Z or.b32 %r55, %r43, 96; 2026-02-21T09:50:55.8417051Z or.b32 %r56, %r43, 104; 2026-02-21T09:50:55.8417203Z or.b32 %r57, %r43, 112; 2026-02-21T09:50:55.8417346Z or.b32 %r58, %r504, 120; 2026-02-21T09:50:55.8417500Z or.b32 %r59, %r43, 128; 2026-02-21T09:50:55.8417650Z or.b32 %r60, %r43, 136; 2026-02-21T09:50:55.8417790Z or.b32 %r61, %r43, 144; 2026-02-21T09:50:55.8417937Z or.b32 %r62, %r43, 152; 
2026-02-21T09:50:55.8418075Z or.b32 %r63, %r43, 160; 2026-02-21T09:50:55.8418224Z or.b32 %r64, %r43, 168; 2026-02-21T09:50:55.8418359Z or.b32 %r65, %r43, 176; 2026-02-21T09:50:55.8418517Z or.b32 %r66, %r504, 184; 2026-02-21T09:50:55.8418661Z or.b32 %r67, %r43, 192; 2026-02-21T09:50:55.8418806Z or.b32 %r68, %r43, 200; 2026-02-21T09:50:55.8418942Z or.b32 %r69, %r43, 208; 2026-02-21T09:50:55.8419086Z or.b32 %r70, %r43, 216; 2026-02-21T09:50:55.8419231Z or.b32 %r71, %r43, 224; 2026-02-21T09:50:55.8419367Z or.b32 %r72, %r43, 232; 2026-02-21T09:50:55.8419511Z or.b32 %r73, %r43, 240; 2026-02-21T09:50:55.8419651Z or.b32 %r74, %r504, 248; 2026-02-21T09:50:55.8419913Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8420201Z add.s32 %r1607, %r41, -592; 2026-02-21T09:50:55.8420365Z shl.b32 %r519, %r1, 10; 2026-02-21T09:50:55.8420509Z and.b32 %r520, %r519, 6144; 2026-02-21T09:50:55.8420670Z shl.b32 %r521, %r1, 4; 2026-02-21T09:50:55.8420819Z and.b32 %r522, %r521, 2032; 2026-02-21T09:50:55.8420971Z or.b32 %r523, %r520, %r522; 2026-02-21T09:50:55.8421127Z add.s32 %r525, %r190, 98304; 2026-02-21T09:50:55.8421278Z add.s32 %r78, %r525, %r523; 2026-02-21T09:50:55.8421432Z xor.b32 %r526, %r523, 32; 2026-02-21T09:50:55.8421577Z add.s32 %r79, %r525, %r526; 2026-02-21T09:50:55.8421729Z xor.b32 %r527, %r523, 64; 2026-02-21T09:50:55.8421876Z add.s32 %r80, %r525, %r527; 2026-02-21T09:50:55.8422031Z xor.b32 %r528, %r523, 96; 2026-02-21T09:50:55.8422175Z add.s32 %r81, %r525, %r528; 2026-02-21T09:50:55.8422327Z and.b32 %r529, %r1, 96; 2026-02-21T09:50:55.8422509Z shl.b32 %r530, %r529, 6; 2026-02-21T09:50:55.8422652Z shl.b32 %r531, %r1, 5; 2026-02-21T09:50:55.8422800Z and.b32 %r532, %r531, 96; 2026-02-21T09:50:55.8422947Z and.b32 %r533, %r521, 384; 2026-02-21T09:50:55.8423103Z and.b32 %r535, %r502, 16; 2026-02-21T09:50:55.8423245Z or.b32 %r536, %r530, %r532; 2026-02-21T09:50:55.8423403Z or.b32 %r537, %r533, %r529; 2026-02-21T09:50:55.8423554Z 
xor.b32 %r538, %r536, %r537; 2026-02-21T09:50:55.8423744Z add.s32 %r539, %r525, %r535; 2026-02-21T09:50:55.8423906Z add.s32 %r821, %r539, %r538; 2026-02-21T09:50:55.8424063Z add.s32 %r826, %r821, 512; 2026-02-21T09:50:55.8424215Z add.s32 %r831, %r821, 1024; 2026-02-21T09:50:55.8424361Z add.s32 %r836, %r821, 1536; 2026-02-21T09:50:55.8424514Z max.s32 %r1600, %r75, 1; 2026-02-21T09:50:55.8424657Z mov.b32 %r1605, -1; 2026-02-21T09:50:55.8424835Z mov.b32 %r1608, %r1610; 2026-02-21T09:50:55.8424976Z mov.b32 %r1609, %r1610; 2026-02-21T09:50:55.8425124Z bra.uni $L__BB0_18; 2026-02-21T09:50:55.8425307Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:55.8425631Z .loc 1 40 32 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:40:32 2026-02-21T09:50:55.8425919Z or.b32 %r1106, %r1609, %r42; 2026-02-21T09:50:55.8426207Z .loc 1 42 32 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:42:32 2026-02-21T09:50:55.8426520Z add.s32 %r1107, %r1608, %r43; 2026-02-21T09:50:55.8426675Z add.s32 %r1108, %r1608, %r44; 2026-02-21T09:50:55.8426833Z add.s32 %r1109, %r1608, %r45; 2026-02-21T09:50:55.8426980Z add.s32 %r1110, %r1608, %r46; 2026-02-21T09:50:55.8427135Z add.s32 %r1111, %r1608, %r47; 2026-02-21T09:50:55.8427280Z add.s32 %r1112, %r1608, %r48; 2026-02-21T09:50:55.8427432Z add.s32 %r1113, %r1608, %r49; 2026-02-21T09:50:55.8427584Z add.s32 %r1114, %r1608, %r50; 2026-02-21T09:50:55.8427728Z add.s32 %r1115, %r1608, %r51; 2026-02-21T09:50:55.8427880Z add.s32 %r1116, %r1608, %r52; 2026-02-21T09:50:55.8428026Z add.s32 %r1117, %r1608, %r53; 2026-02-21T09:50:55.8428176Z add.s32 %r1118, %r1608, %r54; 2026-02-21T09:50:55.8428320Z add.s32 %r1119, %r1608, %r55; 2026-02-21T09:50:55.8428470Z add.s32 %r1120, %r1608, %r56; 2026-02-21T09:50:55.8428614Z add.s32 %r1121, %r1608, %r57; 2026-02-21T09:50:55.8428766Z add.s32 %r1122, %r1608, %r58; 2026-02-21T09:50:55.8428909Z add.s32 %r1123, %r1608, %r59; 2026-02-21T09:50:55.8429061Z add.s32 %r1124, %r1608, %r60; 
2026-02-21T09:50:55.8429211Z add.s32 %r1125, %r1608, %r61; 2026-02-21T09:50:55.8429352Z add.s32 %r1126, %r1608, %r62; 2026-02-21T09:50:55.8429501Z add.s32 %r1127, %r1608, %r63; 2026-02-21T09:50:55.8429642Z add.s32 %r1128, %r1608, %r64; 2026-02-21T09:50:55.8429791Z add.s32 %r1129, %r1608, %r65; 2026-02-21T09:50:55.8429932Z add.s32 %r1130, %r1608, %r66; 2026-02-21T09:50:55.8430081Z add.s32 %r1131, %r1608, %r67; 2026-02-21T09:50:55.8430222Z add.s32 %r1132, %r1608, %r68; 2026-02-21T09:50:55.8430373Z add.s32 %r1133, %r1608, %r69; 2026-02-21T09:50:55.8430522Z add.s32 %r1134, %r1608, %r70; 2026-02-21T09:50:55.8430667Z add.s32 %r1135, %r1608, %r71; 2026-02-21T09:50:55.8430815Z add.s32 %r1136, %r1608, %r72; 2026-02-21T09:50:55.8430957Z add.s32 %r1137, %r1608, %r73; 2026-02-21T09:50:55.8431104Z add.s32 %r1138, %r1608, %r74; 2026-02-21T09:50:55.8431358Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8431635Z bar.sync 0, 128; 2026-02-21T09:50:55.8431765Z // begin inline asm 2026-02-21T09:50:55.8431906Z 2026-02-21T09:50:55.8432014Z { 2026-02-21T09:50:55.8432140Z .reg .pred complete; 2026-02-21T09:50:55.8432285Z waitLoop: 2026-02-21T09:50:55.8432467Z mbarrier.try_wait.parity.shared.b64 complete, [%r491], %r1610; 2026-02-21T09:50:55.8432707Z @!complete bra.uni waitLoop; 2026-02-21T09:50:55.8432853Z } 2026-02-21T09:50:55.8432925Z 2026-02-21T09:50:55.8432979Z // end inline asm 2026-02-21T09:50:55.8433108Z // begin inline asm 2026-02-21T09:50:55.8433488Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r544, %r545, %r546, %r547, %r548, %r549, %r550, %r551, %r552, %r553, %r554, %r555, %r556, %r557, %r558, %r559}, [%r207 + 0]; 2026-02-21T09:50:55.8433872Z // end inline asm 2026-02-21T09:50:55.8434001Z // begin inline asm 2026-02-21T09:50:55.8434341Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r561, %r562, %r563, %r564, %r565, %r566, %r567, %r568, %r569, %r570, %r571, %r572, %r573, %r574, %r575, %r576}, [%r207 + 16]; 2026-02-21T09:50:55.8434816Z 
// end inline asm 2026-02-21T09:50:55.8434953Z // begin inline asm 2026-02-21T09:50:55.8435286Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r578, %r579, %r580, %r581, %r582, %r583, %r584, %r585, %r586, %r587, %r588, %r589, %r590, %r591, %r592, %r593}, [%r207 + 32]; 2026-02-21T09:50:55.8435683Z // end inline asm 2026-02-21T09:50:55.8435824Z // begin inline asm 2026-02-21T09:50:55.8436170Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608, %r609, %r610}, [%r207 + 48]; 2026-02-21T09:50:55.8436548Z // end inline asm 2026-02-21T09:50:55.8436680Z // begin inline asm 2026-02-21T09:50:55.8437023Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625, %r626, %r627}, [%r207 + 64]; 2026-02-21T09:50:55.8437434Z // end inline asm 2026-02-21T09:50:55.8437603Z // begin inline asm 2026-02-21T09:50:55.8437946Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642, %r643, %r644}, [%r207 + 80]; 2026-02-21T09:50:55.8438308Z // end inline asm 2026-02-21T09:50:55.8438442Z // begin inline asm 2026-02-21T09:50:55.8438772Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659, %r660, %r661}, [%r207 + 96]; 2026-02-21T09:50:55.8439148Z // end inline asm 2026-02-21T09:50:55.8439275Z // begin inline asm 2026-02-21T09:50:55.8439617Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676, %r677, %r678}, [%r207 + 112]; 2026-02-21T09:50:55.8439993Z // end inline asm 2026-02-21T09:50:55.8440121Z // begin inline asm 2026-02-21T09:50:55.8440465Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693, %r694, 
%r695}, [%r207 + 128]; 2026-02-21T09:50:55.8440844Z // end inline asm 2026-02-21T09:50:55.8440977Z // begin inline asm 2026-02-21T09:50:55.8441307Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710, %r711, %r712}, [%r207 + 144]; 2026-02-21T09:50:55.8441675Z // end inline asm 2026-02-21T09:50:55.8441812Z // begin inline asm 2026-02-21T09:50:55.8442142Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727, %r728, %r729}, [%r207 + 160]; 2026-02-21T09:50:55.8442516Z // end inline asm 2026-02-21T09:50:55.8442643Z // begin inline asm 2026-02-21T09:50:55.8442985Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744, %r745, %r746}, [%r207 + 176]; 2026-02-21T09:50:55.8443353Z // end inline asm 2026-02-21T09:50:55.8443481Z // begin inline asm 2026-02-21T09:50:55.8443827Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761, %r762, %r763}, [%r207 + 192]; 2026-02-21T09:50:55.8444184Z // end inline asm 2026-02-21T09:50:55.8444318Z // begin inline asm 2026-02-21T09:50:55.8444645Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778, %r779, %r780}, [%r207 + 208]; 2026-02-21T09:50:55.8445060Z // end inline asm 2026-02-21T09:50:55.8445222Z // begin inline asm 2026-02-21T09:50:55.8445575Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795, %r796, %r797}, [%r207 + 224]; 2026-02-21T09:50:55.8445964Z // end inline asm 2026-02-21T09:50:55.8446096Z // begin inline asm 2026-02-21T09:50:55.8446450Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r799, %r800, %r801, %r802, %r803, %r804, %r805, 
%r806, %r807, %r808, %r809, %r810, %r811, %r812, %r813, %r814}, [%r207 + 240]; 2026-02-21T09:50:55.8446880Z // end inline asm 2026-02-21T09:50:55.8447029Z // begin inline asm 2026-02-21T09:50:55.8447195Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:50:55.8447368Z // end inline asm 2026-02-21T09:50:55.8447516Z bar.sync 0, 128; 2026-02-21T09:50:55.8447655Z // begin inline asm 2026-02-21T09:50:55.8447841Z @%p111 mbarrier.arrive.shared::cta.b64 _, [%r1570]; 2026-02-21T09:50:55.8448047Z // end inline asm 2026-02-21T09:50:55.8448197Z cvt.u64.u32 %rd97, %r544; 2026-02-21T09:50:55.8448360Z cvt.u64.u32 %rd98, %r545; 2026-02-21T09:50:55.8448528Z shl.b64 %rd99, %rd98, 32; 2026-02-21T09:50:55.8448689Z or.b64 %rd100, %rd97, %rd99; 2026-02-21T09:50:55.8448979Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8449320Z mov.b64 {%r1140, %r1141}, %rd100; 2026-02-21T09:50:55.8449509Z cvt.rn.f16x2.f32 %r1142, %r1141, %r1140; 2026-02-21T09:50:55.8449843Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8450141Z cvt.u64.u32 %rd101, %r546; 2026-02-21T09:50:55.8450309Z cvt.u64.u32 %rd102, %r547; 2026-02-21T09:50:55.8450465Z shl.b64 %rd103, %rd102, 32; 2026-02-21T09:50:55.8450634Z or.b64 %rd104, %rd101, %rd103; 2026-02-21T09:50:55.8450911Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8451200Z mov.b64 {%r1143, %r1144}, %rd104; 2026-02-21T09:50:55.8451387Z cvt.rn.f16x2.f32 %r1145, %r1144, %r1143; 2026-02-21T09:50:55.8451674Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8451972Z cvt.u64.u32 %rd105, %r548; 2026-02-21T09:50:55.8452127Z cvt.u64.u32 %rd106, %r549; 2026-02-21T09:50:55.8452291Z shl.b64 %rd107, %rd106, 32; 2026-02-21T09:50:55.8452456Z or.b64 %rd108, %rd105, %rd107; 2026-02-21T09:50:55.8452725Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 
2026-02-21T09:50:55.8453072Z mov.b64 {%r1146, %r1147}, %rd108; 2026-02-21T09:50:55.8453239Z cvt.rn.f16x2.f32 %r1148, %r1147, %r1146; 2026-02-21T09:50:55.8453517Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8453786Z cvt.u64.u32 %rd109, %r550; 2026-02-21T09:50:55.8453940Z cvt.u64.u32 %rd110, %r551; 2026-02-21T09:50:55.8454086Z shl.b64 %rd111, %rd110, 32; 2026-02-21T09:50:55.8454243Z or.b64 %rd112, %rd109, %rd111; 2026-02-21T09:50:55.8454506Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8454809Z mov.b64 {%r1149, %r1150}, %rd112; 2026-02-21T09:50:55.8454986Z cvt.rn.f16x2.f32 %r1151, %r1150, %r1149; 2026-02-21T09:50:55.8455264Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8455548Z cvt.u64.u32 %rd113, %r552; 2026-02-21T09:50:55.8455694Z cvt.u64.u32 %rd114, %r553; 2026-02-21T09:50:55.8455848Z shl.b64 %rd115, %rd114, 32; 2026-02-21T09:50:55.8456006Z or.b64 %rd116, %rd113, %rd115; 2026-02-21T09:50:55.8456265Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8456545Z mov.b64 {%r1152, %r1153}, %rd116; 2026-02-21T09:50:55.8456712Z cvt.rn.f16x2.f32 %r1154, %r1153, %r1152; 2026-02-21T09:50:55.8456994Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8457294Z cvt.u64.u32 %rd117, %r554; 2026-02-21T09:50:55.8457450Z cvt.u64.u32 %rd118, %r555; 2026-02-21T09:50:55.8457597Z shl.b64 %rd119, %rd118, 32; 2026-02-21T09:50:55.8457755Z or.b64 %rd120, %rd117, %rd119; 2026-02-21T09:50:55.8458017Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8458315Z mov.b64 {%r1155, %r1156}, %rd120; 2026-02-21T09:50:55.8458488Z cvt.rn.f16x2.f32 %r1157, %r1156, %r1155; 2026-02-21T09:50:55.8458756Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8459035Z 
cvt.u64.u32 %rd121, %r556; 2026-02-21T09:50:55.8459179Z cvt.u64.u32 %rd122, %r557; 2026-02-21T09:50:55.8459334Z shl.b64 %rd123, %rd122, 32; 2026-02-21T09:50:55.8459489Z or.b64 %rd124, %rd121, %rd123; 2026-02-21T09:50:55.8459743Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8460019Z mov.b64 {%r1158, %r1159}, %rd124; 2026-02-21T09:50:55.8460190Z cvt.rn.f16x2.f32 %r1160, %r1159, %r1158; 2026-02-21T09:50:55.8460463Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8460767Z cvt.u64.u32 %rd125, %r558; 2026-02-21T09:50:55.8460921Z cvt.u64.u32 %rd126, %r559; 2026-02-21T09:50:55.8461095Z shl.b64 %rd127, %rd126, 32; 2026-02-21T09:50:55.8461253Z or.b64 %rd128, %rd125, %rd127; 2026-02-21T09:50:55.8461511Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8461784Z mov.b64 {%r1161, %r1162}, %rd128; 2026-02-21T09:50:55.8461957Z cvt.rn.f16x2.f32 %r1163, %r1162, %r1161; 2026-02-21T09:50:55.8462225Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8462507Z cvt.u64.u32 %rd129, %r561; 2026-02-21T09:50:55.8462652Z cvt.u64.u32 %rd130, %r562; 2026-02-21T09:50:55.8462809Z shl.b64 %rd131, %rd130, 32; 2026-02-21T09:50:55.8462965Z or.b64 %rd132, %rd129, %rd131; 2026-02-21T09:50:55.8463216Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8463500Z mov.b64 {%r1164, %r1165}, %rd132; 2026-02-21T09:50:55.8463666Z cvt.rn.f16x2.f32 %r1166, %r1165, %r1164; 2026-02-21T09:50:55.8463943Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8464211Z cvt.u64.u32 %rd133, %r563; 2026-02-21T09:50:55.8464363Z cvt.u64.u32 %rd134, %r564; 2026-02-21T09:50:55.8464507Z shl.b64 %rd135, %rd134, 32; 2026-02-21T09:50:55.8464664Z or.b64 %rd136, %rd133, %rd135; 2026-02-21T09:50:55.8464952Z .loc 1 55 27 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8465224Z mov.b64 {%r1167, %r1168}, %rd136; 2026-02-21T09:50:55.8465396Z cvt.rn.f16x2.f32 %r1169, %r1168, %r1167; 2026-02-21T09:50:55.8465664Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8465942Z cvt.u64.u32 %rd137, %r565; 2026-02-21T09:50:55.8466089Z cvt.u64.u32 %rd138, %r566; 2026-02-21T09:50:55.8466242Z shl.b64 %rd139, %rd138, 32; 2026-02-21T09:50:55.8466400Z or.b64 %rd140, %rd137, %rd139; 2026-02-21T09:50:55.8466656Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8466936Z mov.b64 {%r1170, %r1171}, %rd140; 2026-02-21T09:50:55.8467101Z cvt.rn.f16x2.f32 %r1172, %r1171, %r1170; 2026-02-21T09:50:55.8467373Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8467649Z cvt.u64.u32 %rd141, %r567; 2026-02-21T09:50:55.8467802Z cvt.u64.u32 %rd142, %r568; 2026-02-21T09:50:55.8467953Z shl.b64 %rd143, %rd142, 32; 2026-02-21T09:50:55.8468102Z or.b64 %rd144, %rd141, %rd143; 2026-02-21T09:50:55.8468398Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8468679Z mov.b64 {%r1173, %r1174}, %rd144; 2026-02-21T09:50:55.8468853Z cvt.rn.f16x2.f32 %r1175, %r1174, %r1173; 2026-02-21T09:50:55.8469132Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8469446Z cvt.u64.u32 %rd145, %r569; 2026-02-21T09:50:55.8469594Z cvt.u64.u32 %rd146, %r570; 2026-02-21T09:50:55.8469747Z shl.b64 %rd147, %rd146, 32; 2026-02-21T09:50:55.8469902Z or.b64 %rd148, %rd145, %rd147; 2026-02-21T09:50:55.8470160Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8470451Z mov.b64 {%r1176, %r1177}, %rd148; 2026-02-21T09:50:55.8470619Z cvt.rn.f16x2.f32 %r1178, %r1177, %r1176; 2026-02-21T09:50:55.8470899Z .loc 1 53 52 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8471174Z cvt.u64.u32 %rd149, %r571; 2026-02-21T09:50:55.8471328Z cvt.u64.u32 %rd150, %r572; 2026-02-21T09:50:55.8471480Z shl.b64 %rd151, %rd150, 32; 2026-02-21T09:50:55.8471628Z or.b64 %rd152, %rd149, %rd151; 2026-02-21T09:50:55.8471913Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8472211Z mov.b64 {%r1179, %r1180}, %rd152; 2026-02-21T09:50:55.8472385Z cvt.rn.f16x2.f32 %r1181, %r1180, %r1179; 2026-02-21T09:50:55.8472652Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8472928Z cvt.u64.u32 %rd153, %r573; 2026-02-21T09:50:55.8473074Z cvt.u64.u32 %rd154, %r574; 2026-02-21T09:50:55.8473227Z shl.b64 %rd155, %rd154, 32; 2026-02-21T09:50:55.8473382Z or.b64 %rd156, %rd153, %rd155; 2026-02-21T09:50:55.8473634Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8473914Z mov.b64 {%r1182, %r1183}, %rd156; 2026-02-21T09:50:55.8474080Z cvt.rn.f16x2.f32 %r1184, %r1183, %r1182; 2026-02-21T09:50:55.8474358Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8474628Z cvt.u64.u32 %rd157, %r575; 2026-02-21T09:50:55.8474815Z cvt.u64.u32 %rd158, %r576; 2026-02-21T09:50:55.8474970Z shl.b64 %rd159, %rd158, 32; 2026-02-21T09:50:55.8475121Z or.b64 %rd160, %rd157, %rd159; 2026-02-21T09:50:55.8475384Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8475655Z mov.b64 {%r1185, %r1186}, %rd160; 2026-02-21T09:50:55.8475826Z cvt.rn.f16x2.f32 %r1187, %r1186, %r1185; 2026-02-21T09:50:55.8476104Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8476390Z cvt.u64.u32 %rd161, %r578; 2026-02-21T09:50:55.8476534Z cvt.u64.u32 %rd162, %r579; 2026-02-21T09:50:55.8476687Z shl.b64 %rd163, %rd162, 32; 2026-02-21T09:50:55.8476842Z or.b64 
%rd164, %rd161, %rd163; 2026-02-21T09:50:55.8477104Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8477389Z mov.b64 {%r1188, %r1189}, %rd164; 2026-02-21T09:50:55.8477555Z cvt.rn.f16x2.f32 %r1190, %r1189, %r1188; 2026-02-21T09:50:55.8477838Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8478116Z cvt.u64.u32 %rd165, %r580; 2026-02-21T09:50:55.8478270Z cvt.u64.u32 %rd166, %r581; 2026-02-21T09:50:55.8478421Z shl.b64 %rd167, %rd166, 32; 2026-02-21T09:50:55.8478571Z or.b64 %rd168, %rd165, %rd167; 2026-02-21T09:50:55.8478835Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8479111Z mov.b64 {%r1191, %r1192}, %rd168; 2026-02-21T09:50:55.8479281Z cvt.rn.f16x2.f32 %r1193, %r1192, %r1191; 2026-02-21T09:50:55.8479600Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8479883Z cvt.u64.u32 %rd169, %r582; 2026-02-21T09:50:55.8480035Z cvt.u64.u32 %rd170, %r583; 2026-02-21T09:50:55.8480194Z shl.b64 %rd171, %rd170, 32; 2026-02-21T09:50:55.8480354Z or.b64 %rd172, %rd169, %rd171; 2026-02-21T09:50:55.8480612Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8480927Z mov.b64 {%r1194, %r1195}, %rd172; 2026-02-21T09:50:55.8481093Z cvt.rn.f16x2.f32 %r1196, %r1195, %r1194; 2026-02-21T09:50:55.8481366Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8481639Z cvt.u64.u32 %rd173, %r584; 2026-02-21T09:50:55.8481790Z cvt.u64.u32 %rd174, %r585; 2026-02-21T09:50:55.8481940Z shl.b64 %rd175, %rd174, 32; 2026-02-21T09:50:55.8482087Z or.b64 %rd176, %rd173, %rd175; 2026-02-21T09:50:55.8482348Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8482626Z mov.b64 {%r1197, %r1198}, %rd176; 2026-02-21T09:50:55.8482802Z cvt.rn.f16x2.f32 %r1199, %r1198, %r1197; 
2026-02-21T09:50:55.8483098Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8483406Z cvt.u64.u32 %rd177, %r586; 2026-02-21T09:50:55.8483557Z cvt.u64.u32 %rd178, %r587; 2026-02-21T09:50:55.8483710Z shl.b64 %rd179, %rd178, 32; 2026-02-21T09:50:55.8483864Z or.b64 %rd180, %rd177, %rd179; 2026-02-21T09:50:55.8484116Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8484396Z mov.b64 {%r1200, %r1201}, %rd180; 2026-02-21T09:50:55.8484559Z cvt.rn.f16x2.f32 %r1202, %r1201, %r1200; 2026-02-21T09:50:55.8484870Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8485152Z cvt.u64.u32 %rd181, %r588; 2026-02-21T09:50:55.8485307Z cvt.u64.u32 %rd182, %r589; 2026-02-21T09:50:55.8485459Z shl.b64 %rd183, %rd182, 32; 2026-02-21T09:50:55.8485608Z or.b64 %rd184, %rd181, %rd183; 2026-02-21T09:50:55.8485875Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8486151Z mov.b64 {%r1203, %r1204}, %rd184; 2026-02-21T09:50:55.8486324Z cvt.rn.f16x2.f32 %r1205, %r1204, %r1203; 2026-02-21T09:50:55.8486596Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8486878Z cvt.u64.u32 %rd185, %r590; 2026-02-21T09:50:55.8487033Z cvt.u64.u32 %rd186, %r591; 2026-02-21T09:50:55.8487181Z shl.b64 %rd187, %rd186, 32; 2026-02-21T09:50:55.8487335Z or.b64 %rd188, %rd185, %rd187; 2026-02-21T09:50:55.8487593Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8487884Z mov.b64 {%r1206, %r1207}, %rd188; 2026-02-21T09:50:55.8488050Z cvt.rn.f16x2.f32 %r1208, %r1207, %r1206; 2026-02-21T09:50:55.8488333Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8488605Z cvt.u64.u32 %rd189, %r592; 2026-02-21T09:50:55.8488759Z cvt.u64.u32 %rd190, %r593; 2026-02-21T09:50:55.8488912Z shl.b64 %rd191, %rd190, 
32; 2026-02-21T09:50:55.8489065Z or.b64 %rd192, %rd189, %rd191; 2026-02-21T09:50:55.8489329Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8489614Z mov.b64 {%r1209, %r1210}, %rd192; 2026-02-21T09:50:55.8489794Z cvt.rn.f16x2.f32 %r1211, %r1210, %r1209; 2026-02-21T09:50:55.8490080Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8490382Z cvt.u64.u32 %rd193, %r595; 2026-02-21T09:50:55.8490543Z cvt.u64.u32 %rd194, %r596; 2026-02-21T09:50:55.8490727Z shl.b64 %rd195, %rd194, 32; 2026-02-21T09:50:55.8490891Z or.b64 %rd196, %rd193, %rd195; 2026-02-21T09:50:55.8491152Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8491453Z mov.b64 {%r1212, %r1213}, %rd196; 2026-02-21T09:50:55.8491624Z cvt.rn.f16x2.f32 %r1214, %r1213, %r1212; 2026-02-21T09:50:55.8491914Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8492227Z cvt.u64.u32 %rd197, %r597; 2026-02-21T09:50:55.8492386Z cvt.u64.u32 %rd198, %r598; 2026-02-21T09:50:55.8492545Z shl.b64 %rd199, %rd198, 32; 2026-02-21T09:50:55.8492699Z or.b64 %rd200, %rd197, %rd199; 2026-02-21T09:50:55.8492973Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8493259Z mov.b64 {%r1215, %r1216}, %rd200; 2026-02-21T09:50:55.8493439Z cvt.rn.f16x2.f32 %r1217, %r1216, %r1215; 2026-02-21T09:50:55.8493720Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8494017Z cvt.u64.u32 %rd201, %r599; 2026-02-21T09:50:55.8494175Z cvt.u64.u32 %rd202, %r600; 2026-02-21T09:50:55.8494327Z shl.b64 %rd203, %rd202, 32; 2026-02-21T09:50:55.8494517Z or.b64 %rd204, %rd201, %rd203; 2026-02-21T09:50:55.8494846Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8495146Z mov.b64 {%r1218, %r1219}, %rd204; 2026-02-21T09:50:55.8495317Z 
cvt.rn.f16x2.f32 %r1220, %r1219, %r1218; 2026-02-21T09:50:55.8495611Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8495899Z cvt.u64.u32 %rd205, %r601; 2026-02-21T09:50:55.8496060Z cvt.u64.u32 %rd206, %r602; 2026-02-21T09:50:55.8496221Z shl.b64 %rd207, %rd206, 32; 2026-02-21T09:50:55.8496376Z or.b64 %rd208, %rd205, %rd207; 2026-02-21T09:50:55.8496661Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8496950Z mov.b64 {%r1221, %r1222}, %rd208; 2026-02-21T09:50:55.8497125Z cvt.rn.f16x2.f32 %r1223, %r1222, %r1221; 2026-02-21T09:50:55.8497409Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8497700Z cvt.u64.u32 %rd209, %r603; 2026-02-21T09:50:55.8497858Z cvt.u64.u32 %rd210, %r604; 2026-02-21T09:50:55.8498009Z shl.b64 %rd211, %rd210, 32; 2026-02-21T09:50:55.8498170Z or.b64 %rd212, %rd209, %rd211; 2026-02-21T09:50:55.8498437Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8498742Z mov.b64 {%r1224, %r1225}, %rd212; 2026-02-21T09:50:55.8498914Z cvt.rn.f16x2.f32 %r1226, %r1225, %r1224; 2026-02-21T09:50:55.8499203Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8499486Z cvt.u64.u32 %rd213, %r605; 2026-02-21T09:50:55.8499637Z cvt.u64.u32 %rd214, %r606; 2026-02-21T09:50:55.8499788Z shl.b64 %rd215, %rd214, 32; 2026-02-21T09:50:55.8499936Z or.b64 %rd216, %rd213, %rd215; 2026-02-21T09:50:55.8500206Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8500490Z mov.b64 {%r1227, %r1228}, %rd216; 2026-02-21T09:50:55.8500664Z cvt.rn.f16x2.f32 %r1229, %r1228, %r1227; 2026-02-21T09:50:55.8500937Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8501218Z cvt.u64.u32 %rd217, %r607; 2026-02-21T09:50:55.8501370Z cvt.u64.u32 %rd218, %r608; 
2026-02-21T09:50:55.8501516Z shl.b64 %rd219, %rd218, 32; 2026-02-21T09:50:55.8501673Z or.b64 %rd220, %rd217, %rd219; 2026-02-21T09:50:55.8501930Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8502243Z mov.b64 {%r1230, %r1231}, %rd220; 2026-02-21T09:50:55.8502409Z cvt.rn.f16x2.f32 %r1232, %r1231, %r1230; 2026-02-21T09:50:55.8502693Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8502966Z cvt.u64.u32 %rd221, %r609; 2026-02-21T09:50:55.8503120Z cvt.u64.u32 %rd222, %r610; 2026-02-21T09:50:55.8503274Z shl.b64 %rd223, %rd222, 32; 2026-02-21T09:50:55.8503448Z or.b64 %rd224, %rd221, %rd223; 2026-02-21T09:50:55.8503710Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8503998Z mov.b64 {%r1233, %r1234}, %rd224; 2026-02-21T09:50:55.8504170Z cvt.rn.f16x2.f32 %r1235, %r1234, %r1233; 2026-02-21T09:50:55.8504439Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8504742Z cvt.u64.u32 %rd225, %r612; 2026-02-21T09:50:55.8504900Z cvt.u64.u32 %rd226, %r613; 2026-02-21T09:50:55.8505051Z shl.b64 %rd227, %rd226, 32; 2026-02-21T09:50:55.8505207Z or.b64 %rd228, %rd225, %rd227; 2026-02-21T09:50:55.8505466Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8505746Z mov.b64 {%r1236, %r1237}, %rd228; 2026-02-21T09:50:55.8505939Z cvt.rn.f16x2.f32 %r1238, %r1237, %r1236; 2026-02-21T09:50:55.8506253Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8506542Z cvt.u64.u32 %rd229, %r614; 2026-02-21T09:50:55.8506690Z cvt.u64.u32 %rd230, %r615; 2026-02-21T09:50:55.8506843Z shl.b64 %rd231, %rd230, 32; 2026-02-21T09:50:55.8506990Z or.b64 %rd232, %rd229, %rd231; 2026-02-21T09:50:55.8507252Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8507532Z mov.b64 {%r1239, 
%r1240}, %rd232; 2026-02-21T09:50:55.8507702Z cvt.rn.f16x2.f32 %r1241, %r1240, %r1239; 2026-02-21T09:50:55.8507973Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8508250Z cvt.u64.u32 %rd233, %r616; 2026-02-21T09:50:55.8508402Z cvt.u64.u32 %rd234, %r617; 2026-02-21T09:50:55.8508548Z shl.b64 %rd235, %rd234, 32; 2026-02-21T09:50:55.8508700Z or.b64 %rd236, %rd233, %rd235; 2026-02-21T09:50:55.8508958Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8509234Z mov.b64 {%r1242, %r1243}, %rd236; 2026-02-21T09:50:55.8509395Z cvt.rn.f16x2.f32 %r1244, %r1243, %r1242; 2026-02-21T09:50:55.8509672Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8509957Z cvt.u64.u32 %rd237, %r618; 2026-02-21T09:50:55.8510101Z cvt.u64.u32 %rd238, %r619; 2026-02-21T09:50:55.8510250Z shl.b64 %rd239, %rd238, 32; 2026-02-21T09:50:55.8510397Z or.b64 %rd240, %rd237, %rd239; 2026-02-21T09:50:55.8510653Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8510934Z mov.b64 {%r1245, %r1246}, %rd240; 2026-02-21T09:50:55.8511104Z cvt.rn.f16x2.f32 %r1247, %r1246, %r1245; 2026-02-21T09:50:55.8511372Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8511659Z cvt.u64.u32 %rd241, %r620; 2026-02-21T09:50:55.8511814Z cvt.u64.u32 %rd242, %r621; 2026-02-21T09:50:55.8511959Z shl.b64 %rd243, %rd242, 32; 2026-02-21T09:50:55.8512115Z or.b64 %rd244, %rd241, %rd243; 2026-02-21T09:50:55.8512364Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8512642Z mov.b64 {%r1248, %r1249}, %rd244; 2026-02-21T09:50:55.8512805Z cvt.rn.f16x2.f32 %r1250, %r1249, %r1248; 2026-02-21T09:50:55.8513079Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8513388Z cvt.u64.u32 %rd245, %r622; 
2026-02-21T09:50:55.8513533Z cvt.u64.u32 %rd246, %r623; 2026-02-21T09:50:55.8513684Z shl.b64 %rd247, %rd246, 32; 2026-02-21T09:50:55.8513831Z or.b64 %rd248, %rd245, %rd247; 2026-02-21T09:50:55.8514089Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8514361Z mov.b64 {%r1251, %r1252}, %rd248; 2026-02-21T09:50:55.8514560Z cvt.rn.f16x2.f32 %r1253, %r1252, %r1251; 2026-02-21T09:50:55.8514857Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8515142Z cvt.u64.u32 %rd249, %r624; 2026-02-21T09:50:55.8515293Z cvt.u64.u32 %rd250, %r625; 2026-02-21T09:50:55.8515438Z shl.b64 %rd251, %rd250, 32; 2026-02-21T09:50:55.8515594Z or.b64 %rd252, %rd249, %rd251; 2026-02-21T09:50:55.8515853Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8516135Z mov.b64 {%r1254, %r1255}, %rd252; 2026-02-21T09:50:55.8516300Z cvt.rn.f16x2.f32 %r1256, %r1255, %r1254; 2026-02-21T09:50:55.8516588Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8516874Z cvt.u64.u32 %rd253, %r626; 2026-02-21T09:50:55.8517018Z cvt.u64.u32 %rd254, %r627; 2026-02-21T09:50:55.8517200Z shl.b64 %rd255, %rd254, 32; 2026-02-21T09:50:55.8517390Z or.b64 %rd256, %rd253, %rd255; 2026-02-21T09:50:55.8517660Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8517933Z mov.b64 {%r1257, %r1258}, %rd256; 2026-02-21T09:50:55.8518106Z cvt.rn.f16x2.f32 %r1259, %r1258, %r1257; 2026-02-21T09:50:55.8518375Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8518651Z cvt.u64.u32 %rd257, %r629; 2026-02-21T09:50:55.8518803Z cvt.u64.u32 %rd258, %r630; 2026-02-21T09:50:55.8518948Z shl.b64 %rd259, %rd258, 32; 2026-02-21T09:50:55.8519105Z or.b64 %rd260, %rd257, %rd259; 2026-02-21T09:50:55.8519358Z .loc 1 55 27 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8519643Z mov.b64 {%r1260, %r1261}, %rd260; 2026-02-21T09:50:55.8519806Z cvt.rn.f16x2.f32 %r1262, %r1261, %r1260; 2026-02-21T09:50:55.8520081Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8520355Z cvt.u64.u32 %rd261, %r631; 2026-02-21T09:50:55.8520499Z cvt.u64.u32 %rd262, %r632; 2026-02-21T09:50:55.8520649Z shl.b64 %rd263, %rd262, 32; 2026-02-21T09:50:55.8520797Z or.b64 %rd264, %rd261, %rd263; 2026-02-21T09:50:55.8521059Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8521330Z mov.b64 {%r1263, %r1264}, %rd264; 2026-02-21T09:50:55.8521502Z cvt.rn.f16x2.f32 %r1265, %r1264, %r1263; 2026-02-21T09:50:55.8521769Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8522044Z cvt.u64.u32 %rd265, %r633; 2026-02-21T09:50:55.8522195Z cvt.u64.u32 %rd266, %r634; 2026-02-21T09:50:55.8522342Z shl.b64 %rd267, %rd266, 32; 2026-02-21T09:50:55.8522497Z or.b64 %rd268, %rd265, %rd267; 2026-02-21T09:50:55.8522750Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8523028Z mov.b64 {%r1266, %r1267}, %rd268; 2026-02-21T09:50:55.8523193Z cvt.rn.f16x2.f32 %r1268, %r1267, %r1266; 2026-02-21T09:50:55.8523465Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8523745Z cvt.u64.u32 %rd269, %r635; 2026-02-21T09:50:55.8523893Z cvt.u64.u32 %rd270, %r636; 2026-02-21T09:50:55.8524046Z shl.b64 %rd271, %rd270, 32; 2026-02-21T09:50:55.8524194Z or.b64 %rd272, %rd269, %rd271; 2026-02-21T09:50:55.8524459Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8524775Z mov.b64 {%r1269, %r1270}, %rd272; 2026-02-21T09:50:55.8524950Z cvt.rn.f16x2.f32 %r1271, %r1270, %r1269; 2026-02-21T09:50:55.8525226Z .loc 1 53 52 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8525508Z cvt.u64.u32 %rd273, %r637; 2026-02-21T09:50:55.8525662Z cvt.u64.u32 %rd274, %r638; 2026-02-21T09:50:55.8525835Z shl.b64 %rd275, %rd274, 32; 2026-02-21T09:50:55.8525998Z or.b64 %rd276, %rd273, %rd275; 2026-02-21T09:50:55.8526248Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8526532Z mov.b64 {%r1272, %r1273}, %rd276; 2026-02-21T09:50:55.8526695Z cvt.rn.f16x2.f32 %r1274, %r1273, %r1272; 2026-02-21T09:50:55.8526966Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8527250Z cvt.u64.u32 %rd277, %r639; 2026-02-21T09:50:55.8527396Z cvt.u64.u32 %rd278, %r640; 2026-02-21T09:50:55.8527547Z shl.b64 %rd279, %rd278, 32; 2026-02-21T09:50:55.8527694Z or.b64 %rd280, %rd277, %rd279; 2026-02-21T09:50:55.8527951Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8528257Z mov.b64 {%r1275, %r1276}, %rd280; 2026-02-21T09:50:55.8528463Z cvt.rn.f16x2.f32 %r1277, %r1276, %r1275; 2026-02-21T09:50:55.8528734Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8529019Z cvt.u64.u32 %rd281, %r641; 2026-02-21T09:50:55.8529171Z cvt.u64.u32 %rd282, %r642; 2026-02-21T09:50:55.8529317Z shl.b64 %rd283, %rd282, 32; 2026-02-21T09:50:55.8529472Z or.b64 %rd284, %rd281, %rd283; 2026-02-21T09:50:55.8529733Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8530019Z mov.b64 {%r1278, %r1279}, %rd284; 2026-02-21T09:50:55.8530183Z cvt.rn.f16x2.f32 %r1280, %r1279, %r1278; 2026-02-21T09:50:55.8530468Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8530747Z cvt.u64.u32 %rd285, %r643; 2026-02-21T09:50:55.8530894Z cvt.u64.u32 %rd286, %r644; 2026-02-21T09:50:55.8531048Z shl.b64 %rd287, %rd286, 32; 2026-02-21T09:50:55.8531198Z or.b64 
%rd288, %rd285, %rd287; 2026-02-21T09:50:55.8531461Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8531792Z mov.b64 {%r1281, %r1282}, %rd288; 2026-02-21T09:50:55.8531962Z cvt.rn.f16x2.f32 %r1283, %r1282, %r1281; 2026-02-21T09:50:55.8532233Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8532522Z cvt.u64.u32 %rd289, %r646; 2026-02-21T09:50:55.8532683Z cvt.u64.u32 %rd290, %r647; 2026-02-21T09:50:55.8532835Z shl.b64 %rd291, %rd290, 32; 2026-02-21T09:50:55.8533000Z or.b64 %rd292, %rd289, %rd291; 2026-02-21T09:50:55.8533267Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8533569Z mov.b64 {%r1284, %r1285}, %rd292; 2026-02-21T09:50:55.8533741Z cvt.rn.f16x2.f32 %r1286, %r1285, %r1284; 2026-02-21T09:50:55.8534041Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8534338Z cvt.u64.u32 %rd293, %r648; 2026-02-21T09:50:55.8534490Z cvt.u64.u32 %rd294, %r649; 2026-02-21T09:50:55.8534651Z shl.b64 %rd295, %rd294, 32; 2026-02-21T09:50:55.8534834Z or.b64 %rd296, %rd293, %rd295; 2026-02-21T09:50:55.8535108Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8535394Z mov.b64 {%r1287, %r1288}, %rd296; 2026-02-21T09:50:55.8535575Z cvt.rn.f16x2.f32 %r1289, %r1288, %r1287; 2026-02-21T09:50:55.8535858Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8536187Z cvt.u64.u32 %rd297, %r650; 2026-02-21T09:50:55.8536348Z cvt.u64.u32 %rd298, %r651; 2026-02-21T09:50:55.8536499Z shl.b64 %rd299, %rd298, 32; 2026-02-21T09:50:55.8536664Z or.b64 %rd300, %rd297, %rd299; 2026-02-21T09:50:55.8536933Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8537265Z mov.b64 {%r1290, %r1291}, %rd300; 2026-02-21T09:50:55.8537439Z cvt.rn.f16x2.f32 %r1292, %r1291, %r1290; 
2026-02-21T09:50:55.8537729Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8538030Z cvt.u64.u32 %rd301, %r652; 2026-02-21T09:50:55.8538188Z cvt.u64.u32 %rd302, %r653; 2026-02-21T09:50:55.8538351Z shl.b64 %rd303, %rd302, 32; 2026-02-21T09:50:55.8538510Z or.b64 %rd304, %rd301, %rd303; 2026-02-21T09:50:55.8538779Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8539068Z mov.b64 {%r1293, %r1294}, %rd304; 2026-02-21T09:50:55.8539248Z cvt.rn.f16x2.f32 %r1295, %r1294, %r1293; 2026-02-21T09:50:55.8539535Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8539842Z cvt.u64.u32 %rd305, %r654; 2026-02-21T09:50:55.8540038Z cvt.u64.u32 %rd306, %r655; 2026-02-21T09:50:55.8540187Z shl.b64 %rd307, %rd306, 32; 2026-02-21T09:50:55.8540344Z or.b64 %rd308, %rd305, %rd307; 2026-02-21T09:50:55.8540594Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8540879Z mov.b64 {%r1296, %r1297}, %rd308; 2026-02-21T09:50:55.8541041Z cvt.rn.f16x2.f32 %r1298, %r1297, %r1296; 2026-02-21T09:50:55.8541313Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8541596Z cvt.u64.u32 %rd309, %r656; 2026-02-21T09:50:55.8541744Z cvt.u64.u32 %rd310, %r657; 2026-02-21T09:50:55.8541895Z shl.b64 %rd311, %rd310, 32; 2026-02-21T09:50:55.8542041Z or.b64 %rd312, %rd309, %rd311; 2026-02-21T09:50:55.8542297Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8542571Z mov.b64 {%r1299, %r1300}, %rd312; 2026-02-21T09:50:55.8542743Z cvt.rn.f16x2.f32 %r1301, %r1300, %r1299; 2026-02-21T09:50:55.8543019Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8543292Z cvt.u64.u32 %rd313, %r658; 2026-02-21T09:50:55.8543444Z cvt.u64.u32 %rd314, %r659; 2026-02-21T09:50:55.8543588Z shl.b64 %rd315, %rd314, 
32; 2026-02-21T09:50:55.8543744Z or.b64 %rd316, %rd313, %rd315; 2026-02-21T09:50:55.8543995Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8544278Z mov.b64 {%r1302, %r1303}, %rd316; 2026-02-21T09:50:55.8544448Z cvt.rn.f16x2.f32 %r1304, %r1303, %r1302; 2026-02-21T09:50:55.8544761Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8545039Z cvt.u64.u32 %rd317, %r660; 2026-02-21T09:50:55.8545187Z cvt.u64.u32 %rd318, %r661; 2026-02-21T09:50:55.8545341Z shl.b64 %rd319, %rd318, 32; 2026-02-21T09:50:55.8545490Z or.b64 %rd320, %rd317, %rd319; 2026-02-21T09:50:55.8545756Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8546036Z mov.b64 {%r1305, %r1306}, %rd320; 2026-02-21T09:50:55.8546209Z cvt.rn.f16x2.f32 %r1307, %r1306, %r1305; 2026-02-21T09:50:55.8546482Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8546755Z cvt.u64.u32 %rd321, %r663; 2026-02-21T09:50:55.8546909Z cvt.u64.u32 %rd322, %r664; 2026-02-21T09:50:55.8547054Z shl.b64 %rd323, %rd322, 32; 2026-02-21T09:50:55.8547241Z or.b64 %rd324, %rd321, %rd323; 2026-02-21T09:50:55.8547506Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8547793Z mov.b64 {%r1308, %r1309}, %rd324; 2026-02-21T09:50:55.8547957Z cvt.rn.f16x2.f32 %r1310, %r1309, %r1308; 2026-02-21T09:50:55.8548239Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8548543Z cvt.u64.u32 %rd325, %r665; 2026-02-21T09:50:55.8548688Z cvt.u64.u32 %rd326, %r666; 2026-02-21T09:50:55.8548840Z shl.b64 %rd327, %rd326, 32; 2026-02-21T09:50:55.8548990Z or.b64 %rd328, %rd325, %rd327; 2026-02-21T09:50:55.8549258Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8549545Z mov.b64 {%r1311, %r1312}, %rd328; 2026-02-21T09:50:55.8549716Z 
cvt.rn.f16x2.f32 %r1313, %r1312, %r1311; 2026-02-21T09:50:55.8549997Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8550278Z cvt.u64.u32 %rd329, %r667; 2026-02-21T09:50:55.8550431Z cvt.u64.u32 %rd330, %r668; 2026-02-21T09:50:55.8550574Z shl.b64 %rd331, %rd330, 32; 2026-02-21T09:50:55.8550727Z or.b64 %rd332, %rd329, %rd331; 2026-02-21T09:50:55.8551047Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8551337Z mov.b64 {%r1314, %r1315}, %rd332; 2026-02-21T09:50:55.8551501Z cvt.rn.f16x2.f32 %r1316, %r1315, %r1314; 2026-02-21T09:50:55.8551777Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8552059Z cvt.u64.u32 %rd333, %r669; 2026-02-21T09:50:55.8552205Z cvt.u64.u32 %rd334, %r670; 2026-02-21T09:50:55.8552356Z shl.b64 %rd335, %rd334, 32; 2026-02-21T09:50:55.8552506Z or.b64 %rd336, %rd333, %rd335; 2026-02-21T09:50:55.8552766Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8553044Z mov.b64 {%r1317, %r1318}, %rd336; 2026-02-21T09:50:55.8553216Z cvt.rn.f16x2.f32 %r1319, %r1318, %r1317; 2026-02-21T09:50:55.8553492Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8553769Z cvt.u64.u32 %rd337, %r671; 2026-02-21T09:50:55.8553923Z cvt.u64.u32 %rd338, %r672; 2026-02-21T09:50:55.8554070Z shl.b64 %rd339, %rd338, 32; 2026-02-21T09:50:55.8554226Z or.b64 %rd340, %rd337, %rd339; 2026-02-21T09:50:55.8554486Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8554802Z mov.b64 {%r1320, %r1321}, %rd340; 2026-02-21T09:50:55.8554968Z cvt.rn.f16x2.f32 %r1322, %r1321, %r1320; 2026-02-21T09:50:55.8555255Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8555548Z cvt.u64.u32 %rd341, %r673; 2026-02-21T09:50:55.8555697Z cvt.u64.u32 %rd342, %r674; 
2026-02-21T09:50:55.8555852Z shl.b64 %rd343, %rd342, 32; 2026-02-21T09:50:55.8556000Z or.b64 %rd344, %rd341, %rd343; 2026-02-21T09:50:55.8556269Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8556557Z mov.b64 {%r1323, %r1324}, %rd344; 2026-02-21T09:50:55.8556736Z cvt.rn.f16x2.f32 %r1325, %r1324, %r1323; 2026-02-21T09:50:55.8557027Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8557305Z cvt.u64.u32 %rd345, %r675; 2026-02-21T09:50:55.8557459Z cvt.u64.u32 %rd346, %r676; 2026-02-21T09:50:55.8557605Z shl.b64 %rd347, %rd346, 32; 2026-02-21T09:50:55.8557763Z or.b64 %rd348, %rd345, %rd347; 2026-02-21T09:50:55.8558019Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8558306Z mov.b64 {%r1326, %r1327}, %rd348; 2026-02-21T09:50:55.8558502Z cvt.rn.f16x2.f32 %r1328, %r1327, %r1326; 2026-02-21T09:50:55.8558665Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8558722Z cvt.u64.u32 %rd349, %r677; 2026-02-21T09:50:55.8558778Z cvt.u64.u32 %rd350, %r678; 2026-02-21T09:50:55.8558843Z shl.b64 %rd351, %rd350, 32; 2026-02-21T09:50:55.8558900Z or.b64 %rd352, %rd349, %rd351; 2026-02-21T09:50:55.8559095Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8559160Z mov.b64 {%r1329, %r1330}, %rd352; 2026-02-21T09:50:55.8559225Z cvt.rn.f16x2.f32 %r1331, %r1330, %r1329; 2026-02-21T09:50:55.8559384Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8559448Z cvt.u64.u32 %rd353, %r680; 2026-02-21T09:50:55.8559503Z cvt.u64.u32 %rd354, %r681; 2026-02-21T09:50:55.8559560Z shl.b64 %rd355, %rd354, 32; 2026-02-21T09:50:55.8559616Z or.b64 %rd356, %rd353, %rd355; 2026-02-21T09:50:55.8559785Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8559844Z mov.b64 {%r1332, 
%r1333}, %rd356; 2026-02-21T09:50:55.8559908Z cvt.rn.f16x2.f32 %r1334, %r1333, %r1332; 2026-02-21T09:50:55.8560140Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8560198Z cvt.u64.u32 %rd357, %r682; 2026-02-21T09:50:55.8560253Z cvt.u64.u32 %rd358, %r683; 2026-02-21T09:50:55.8560310Z shl.b64 %rd359, %rd358, 32; 2026-02-21T09:50:55.8560373Z or.b64 %rd360, %rd357, %rd359; 2026-02-21T09:50:55.8560536Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8560591Z mov.b64 {%r1335, %r1336}, %rd360; 2026-02-21T09:50:55.8560662Z cvt.rn.f16x2.f32 %r1337, %r1336, %r1335; 2026-02-21T09:50:55.8560827Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8560885Z cvt.u64.u32 %rd361, %r684; 2026-02-21T09:50:55.8560946Z cvt.u64.u32 %rd362, %r685; 2026-02-21T09:50:55.8561000Z shl.b64 %rd363, %rd362, 32; 2026-02-21T09:50:55.8561056Z or.b64 %rd364, %rd361, %rd363; 2026-02-21T09:50:55.8561213Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8561277Z mov.b64 {%r1338, %r1339}, %rd364; 2026-02-21T09:50:55.8561339Z cvt.rn.f16x2.f32 %r1340, %r1339, %r1338; 2026-02-21T09:50:55.8561493Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8561557Z cvt.u64.u32 %rd365, %r686; 2026-02-21T09:50:55.8561612Z cvt.u64.u32 %rd366, %r687; 2026-02-21T09:50:55.8561669Z shl.b64 %rd367, %rd366, 32; 2026-02-21T09:50:55.8561731Z or.b64 %rd368, %rd365, %rd367; 2026-02-21T09:50:55.8561891Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8561949Z mov.b64 {%r1341, %r1342}, %rd368; 2026-02-21T09:50:55.8562012Z cvt.rn.f16x2.f32 %r1343, %r1342, %r1341; 2026-02-21T09:50:55.8562173Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8562229Z cvt.u64.u32 %rd369, %r688; 
2026-02-21T09:50:55.8562284Z cvt.u64.u32 %rd370, %r689; 2026-02-21T09:50:55.8562350Z shl.b64 %rd371, %rd370, 32; 2026-02-21T09:50:55.8562406Z or.b64 %rd372, %rd369, %rd371; 2026-02-21T09:50:55.8562561Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8562624Z mov.b64 {%r1344, %r1345}, %rd372; 2026-02-21T09:50:55.8562687Z cvt.rn.f16x2.f32 %r1346, %r1345, %r1344; 2026-02-21T09:50:55.8562842Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8562904Z cvt.u64.u32 %rd373, %r690; 2026-02-21T09:50:55.8562959Z cvt.u64.u32 %rd374, %r691; 2026-02-21T09:50:55.8563036Z shl.b64 %rd375, %rd374, 32; 2026-02-21T09:50:55.8563093Z or.b64 %rd376, %rd373, %rd375; 2026-02-21T09:50:55.8563254Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8563310Z mov.b64 {%r1347, %r1348}, %rd376; 2026-02-21T09:50:55.8563373Z cvt.rn.f16x2.f32 %r1349, %r1348, %r1347; 2026-02-21T09:50:55.8563561Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8563616Z cvt.u64.u32 %rd377, %r692; 2026-02-21T09:50:55.8563670Z cvt.u64.u32 %rd378, %r693; 2026-02-21T09:50:55.8563726Z shl.b64 %rd379, %rd378, 32; 2026-02-21T09:50:55.8563791Z or.b64 %rd380, %rd377, %rd379; 2026-02-21T09:50:55.8563952Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8564008Z mov.b64 {%r1350, %r1351}, %rd380; 2026-02-21T09:50:55.8564078Z cvt.rn.f16x2.f32 %r1352, %r1351, %r1350; 2026-02-21T09:50:55.8564235Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8564292Z cvt.u64.u32 %rd381, %r694; 2026-02-21T09:50:55.8564355Z cvt.u64.u32 %rd382, %r695; 2026-02-21T09:50:55.8564410Z shl.b64 %rd383, %rd382, 32; 2026-02-21T09:50:55.8564491Z or.b64 %rd384, %rd381, %rd383; 2026-02-21T09:50:55.8564703Z .loc 1 55 27 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8564771Z mov.b64 {%r1353, %r1354}, %rd384; 2026-02-21T09:50:55.8564834Z cvt.rn.f16x2.f32 %r1355, %r1354, %r1353; 2026-02-21T09:50:55.8564993Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8565058Z cvt.u64.u32 %rd385, %r697; 2026-02-21T09:50:55.8565113Z cvt.u64.u32 %rd386, %r698; 2026-02-21T09:50:55.8565171Z shl.b64 %rd387, %rd386, 32; 2026-02-21T09:50:55.8565238Z or.b64 %rd388, %rd385, %rd387; 2026-02-21T09:50:55.8565398Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8565455Z mov.b64 {%r1356, %r1357}, %rd388; 2026-02-21T09:50:55.8565519Z cvt.rn.f16x2.f32 %r1358, %r1357, %r1356; 2026-02-21T09:50:55.8565691Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8565748Z cvt.u64.u32 %rd389, %r699; 2026-02-21T09:50:55.8565804Z cvt.u64.u32 %rd390, %r700; 2026-02-21T09:50:55.8565866Z shl.b64 %rd391, %rd390, 32; 2026-02-21T09:50:55.8565923Z or.b64 %rd392, %rd389, %rd391; 2026-02-21T09:50:55.8566082Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8566147Z mov.b64 {%r1359, %r1360}, %rd392; 2026-02-21T09:50:55.8566209Z cvt.rn.f16x2.f32 %r1361, %r1360, %r1359; 2026-02-21T09:50:55.8566372Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8566439Z cvt.u64.u32 %rd393, %r701; 2026-02-21T09:50:55.8566493Z cvt.u64.u32 %rd394, %r702; 2026-02-21T09:50:55.8566549Z shl.b64 %rd395, %rd394, 32; 2026-02-21T09:50:55.8566606Z or.b64 %rd396, %rd393, %rd395; 2026-02-21T09:50:55.8566775Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8566832Z mov.b64 {%r1362, %r1363}, %rd396; 2026-02-21T09:50:55.8566896Z cvt.rn.f16x2.f32 %r1364, %r1363, %r1362; 2026-02-21T09:50:55.8567063Z .loc 1 53 52 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8567118Z cvt.u64.u32 %rd397, %r703; 2026-02-21T09:50:55.8567171Z cvt.u64.u32 %rd398, %r704; 2026-02-21T09:50:55.8567226Z shl.b64 %rd399, %rd398, 32; 2026-02-21T09:50:55.8567287Z or.b64 %rd400, %rd397, %rd399; 2026-02-21T09:50:55.8567449Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8567531Z mov.b64 {%r1365, %r1366}, %rd400; 2026-02-21T09:50:55.8567600Z cvt.rn.f16x2.f32 %r1367, %r1366, %r1365; 2026-02-21T09:50:55.8567763Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8567819Z cvt.u64.u32 %rd401, %r705; 2026-02-21T09:50:55.8567882Z cvt.u64.u32 %rd402, %r706; 2026-02-21T09:50:55.8567938Z shl.b64 %rd403, %rd402, 32; 2026-02-21T09:50:55.8568022Z or.b64 %rd404, %rd401, %rd403; 2026-02-21T09:50:55.8568181Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8568247Z mov.b64 {%r1368, %r1369}, %rd404; 2026-02-21T09:50:55.8568308Z cvt.rn.f16x2.f32 %r1370, %r1369, %r1368; 2026-02-21T09:50:55.8568472Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8568535Z cvt.u64.u32 %rd405, %r707; 2026-02-21T09:50:55.8568589Z cvt.u64.u32 %rd406, %r708; 2026-02-21T09:50:55.8568645Z shl.b64 %rd407, %rd406, 32; 2026-02-21T09:50:55.8568706Z or.b64 %rd408, %rd405, %rd407; 2026-02-21T09:50:55.8568869Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8568926Z mov.b64 {%r1371, %r1372}, %rd408; 2026-02-21T09:50:55.8569012Z cvt.rn.f16x2.f32 %r1373, %r1372, %r1371; 2026-02-21T09:50:55.8569197Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8569255Z cvt.u64.u32 %rd409, %r709; 2026-02-21T09:50:55.8569310Z cvt.u64.u32 %rd410, %r710; 2026-02-21T09:50:55.8569374Z shl.b64 %rd411, %rd410, 32; 2026-02-21T09:50:55.8569429Z or.b64 
%rd412, %rd409, %rd411; 2026-02-21T09:50:55.8569593Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8569655Z mov.b64 {%r1374, %r1375}, %rd412; 2026-02-21T09:50:55.8569717Z cvt.rn.f16x2.f32 %r1376, %r1375, %r1374; 2026-02-21T09:50:55.8569875Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8569936Z cvt.u64.u32 %rd413, %r711; 2026-02-21T09:50:55.8569989Z cvt.u64.u32 %rd414, %r712; 2026-02-21T09:50:55.8570044Z shl.b64 %rd415, %rd414, 32; 2026-02-21T09:50:55.8570102Z or.b64 %rd416, %rd413, %rd415; 2026-02-21T09:50:55.8570270Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8570326Z mov.b64 {%r1377, %r1378}, %rd416; 2026-02-21T09:50:55.8570388Z cvt.rn.f16x2.f32 %r1379, %r1378, %r1377; 2026-02-21T09:50:55.8570553Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8570608Z cvt.u64.u32 %rd417, %r714; 2026-02-21T09:50:55.8570662Z cvt.u64.u32 %rd418, %r715; 2026-02-21T09:50:55.8570717Z shl.b64 %rd419, %rd418, 32; 2026-02-21T09:50:55.8570778Z or.b64 %rd420, %rd417, %rd419; 2026-02-21T09:50:55.8570941Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8570997Z mov.b64 {%r1380, %r1381}, %rd420; 2026-02-21T09:50:55.8571066Z cvt.rn.f16x2.f32 %r1382, %r1381, %r1380; 2026-02-21T09:50:55.8571227Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8571283Z cvt.u64.u32 %rd421, %r716; 2026-02-21T09:50:55.8571345Z cvt.u64.u32 %rd422, %r717; 2026-02-21T09:50:55.8571399Z shl.b64 %rd423, %rd422, 32; 2026-02-21T09:50:55.8571457Z or.b64 %rd424, %rd421, %rd423; 2026-02-21T09:50:55.8571615Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8571676Z mov.b64 {%r1383, %r1384}, %rd424; 2026-02-21T09:50:55.8571737Z cvt.rn.f16x2.f32 %r1385, %r1384, %r1383; 
2026-02-21T09:50:55.8571894Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8571978Z cvt.u64.u32 %rd425, %r718; 2026-02-21T09:50:55.8572032Z cvt.u64.u32 %rd426, %r719; 2026-02-21T09:50:55.8572089Z shl.b64 %rd427, %rd426, 32; 2026-02-21T09:50:55.8572151Z or.b64 %rd428, %rd425, %rd427; 2026-02-21T09:50:55.8572307Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8572364Z mov.b64 {%r1386, %r1387}, %rd428; 2026-02-21T09:50:55.8572448Z cvt.rn.f16x2.f32 %r1388, %r1387, %r1386; 2026-02-21T09:50:55.8572611Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8572665Z cvt.u64.u32 %rd429, %r720; 2026-02-21T09:50:55.8572718Z cvt.u64.u32 %rd430, %r721; 2026-02-21T09:50:55.8572782Z shl.b64 %rd431, %rd430, 32; 2026-02-21T09:50:55.8572838Z or.b64 %rd432, %rd429, %rd431; 2026-02-21T09:50:55.8572993Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8573058Z mov.b64 {%r1389, %r1390}, %rd432; 2026-02-21T09:50:55.8573120Z cvt.rn.f16x2.f32 %r1391, %r1390, %r1389; 2026-02-21T09:50:55.8573281Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8573342Z cvt.u64.u32 %rd433, %r722; 2026-02-21T09:50:55.8573418Z cvt.u64.u32 %rd434, %r723; 2026-02-21T09:50:55.8573495Z shl.b64 %rd435, %rd434, 32; 2026-02-21T09:50:55.8573556Z or.b64 %rd436, %rd433, %rd435; 2026-02-21T09:50:55.8573726Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8573783Z mov.b64 {%r1392, %r1393}, %rd436; 2026-02-21T09:50:55.8573846Z cvt.rn.f16x2.f32 %r1394, %r1393, %r1392; 2026-02-21T09:50:55.8574016Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8574071Z cvt.u64.u32 %rd437, %r724; 2026-02-21T09:50:55.8574126Z cvt.u64.u32 %rd438, %r725; 2026-02-21T09:50:55.8574183Z shl.b64 %rd439, %rd438, 
32; 2026-02-21T09:50:55.8574246Z or.b64 %rd440, %rd437, %rd439; 2026-02-21T09:50:55.8574405Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8574462Z mov.b64 {%r1395, %r1396}, %rd440; 2026-02-21T09:50:55.8574534Z cvt.rn.f16x2.f32 %r1397, %r1396, %r1395; 2026-02-21T09:50:55.8574720Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8574778Z cvt.u64.u32 %rd441, %r726; 2026-02-21T09:50:55.8574840Z cvt.u64.u32 %rd442, %r727; 2026-02-21T09:50:55.8574895Z shl.b64 %rd443, %rd442, 32; 2026-02-21T09:50:55.8574949Z or.b64 %rd444, %rd441, %rd443; 2026-02-21T09:50:55.8575111Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8575174Z mov.b64 {%r1398, %r1399}, %rd444; 2026-02-21T09:50:55.8575237Z cvt.rn.f16x2.f32 %r1400, %r1399, %r1398; 2026-02-21T09:50:55.8575396Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8575459Z cvt.u64.u32 %rd445, %r728; 2026-02-21T09:50:55.8575513Z cvt.u64.u32 %rd446, %r729; 2026-02-21T09:50:55.8575570Z shl.b64 %rd447, %rd446, 32; 2026-02-21T09:50:55.8575632Z or.b64 %rd448, %rd445, %rd447; 2026-02-21T09:50:55.8575801Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8575861Z mov.b64 {%r1401, %r1402}, %rd448; 2026-02-21T09:50:55.8575926Z cvt.rn.f16x2.f32 %r1403, %r1402, %r1401; 2026-02-21T09:50:55.8576106Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8576166Z cvt.u64.u32 %rd449, %r731; 2026-02-21T09:50:55.8576224Z cvt.u64.u32 %rd450, %r732; 2026-02-21T09:50:55.8576294Z shl.b64 %rd451, %rd450, 32; 2026-02-21T09:50:55.8576356Z or.b64 %rd452, %rd449, %rd451; 2026-02-21T09:50:55.8576553Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8576621Z mov.b64 {%r1404, %r1405}, %rd452; 2026-02-21T09:50:55.8576687Z 
cvt.rn.f16x2.f32 %r1406, %r1405, %r1404; 2026-02-21T09:50:55.8576853Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8576918Z cvt.u64.u32 %rd453, %r733; 2026-02-21T09:50:55.8577013Z cvt.u64.u32 %rd454, %r734; 2026-02-21T09:50:55.8577072Z shl.b64 %rd455, %rd454, 32; 2026-02-21T09:50:55.8577131Z or.b64 %rd456, %rd453, %rd455; 2026-02-21T09:50:55.8577306Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8577365Z mov.b64 {%r1407, %r1408}, %rd456; 2026-02-21T09:50:55.8577429Z cvt.rn.f16x2.f32 %r1409, %r1408, %r1407; 2026-02-21T09:50:55.8577608Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8577666Z cvt.u64.u32 %rd457, %r735; 2026-02-21T09:50:55.8577722Z cvt.u64.u32 %rd458, %r736; 2026-02-21T09:50:55.8577780Z shl.b64 %rd459, %rd458, 32; 2026-02-21T09:50:55.8577845Z or.b64 %rd460, %rd457, %rd459; 2026-02-21T09:50:55.8578034Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8578122Z mov.b64 {%r1410, %r1411}, %rd460; 2026-02-21T09:50:55.8578199Z cvt.rn.f16x2.f32 %r1412, %r1411, %r1410; 2026-02-21T09:50:55.8578369Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8578427Z cvt.u64.u32 %rd461, %r737; 2026-02-21T09:50:55.8578489Z cvt.u64.u32 %rd462, %r738; 2026-02-21T09:50:55.8578548Z shl.b64 %rd463, %rd462, 32; 2026-02-21T09:50:55.8578606Z or.b64 %rd464, %rd461, %rd463; 2026-02-21T09:50:55.8578771Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8578838Z mov.b64 {%r1413, %r1414}, %rd464; 2026-02-21T09:50:55.8578903Z cvt.rn.f16x2.f32 %r1415, %r1414, %r1413; 2026-02-21T09:50:55.8579068Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8579131Z cvt.u64.u32 %rd465, %r739; 2026-02-21T09:50:55.8579190Z cvt.u64.u32 %rd466, %r740; 
2026-02-21T09:50:55.8579248Z shl.b64 %rd467, %rd466, 32; 2026-02-21T09:50:55.8579315Z or.b64 %rd468, %rd465, %rd467; 2026-02-21T09:50:55.8579477Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8579535Z mov.b64 {%r1416, %r1417}, %rd468; 2026-02-21T09:50:55.8579600Z cvt.rn.f16x2.f32 %r1418, %r1417, %r1416; 2026-02-21T09:50:55.8579772Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8579829Z cvt.u64.u32 %rd469, %r741; 2026-02-21T09:50:55.8579886Z cvt.u64.u32 %rd470, %r742; 2026-02-21T09:50:55.8579953Z shl.b64 %rd471, %rd470, 32; 2026-02-21T09:50:55.8580012Z or.b64 %rd472, %rd469, %rd471; 2026-02-21T09:50:55.8580177Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8580241Z mov.b64 {%r1419, %r1420}, %rd472; 2026-02-21T09:50:55.8580306Z cvt.rn.f16x2.f32 %r1421, %r1420, %r1419; 2026-02-21T09:50:55.8580471Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8580536Z cvt.u64.u32 %rd473, %r743; 2026-02-21T09:50:55.8580592Z cvt.u64.u32 %rd474, %r744; 2026-02-21T09:50:55.8580649Z shl.b64 %rd475, %rd474, 32; 2026-02-21T09:50:55.8580707Z or.b64 %rd476, %rd473, %rd475; 2026-02-21T09:50:55.8580879Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8580937Z mov.b64 {%r1422, %r1423}, %rd476; 2026-02-21T09:50:55.8581002Z cvt.rn.f16x2.f32 %r1424, %r1423, %r1422; 2026-02-21T09:50:55.8581200Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8581259Z cvt.u64.u32 %rd477, %r745; 2026-02-21T09:50:55.8581316Z cvt.u64.u32 %rd478, %r746; 2026-02-21T09:50:55.8581375Z shl.b64 %rd479, %rd478, 32; 2026-02-21T09:50:55.8581443Z or.b64 %rd480, %rd477, %rd479; 2026-02-21T09:50:55.8581610Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8581693Z mov.b64 {%r1425, 
%r1426}, %rd480; 2026-02-21T09:50:55.8581767Z cvt.rn.f16x2.f32 %r1427, %r1426, %r1425; 2026-02-21T09:50:55.8581937Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8581996Z cvt.u64.u32 %rd481, %r748; 2026-02-21T09:50:55.8582058Z cvt.u64.u32 %rd482, %r749; 2026-02-21T09:50:55.8582116Z shl.b64 %rd483, %rd482, 32; 2026-02-21T09:50:55.8582176Z or.b64 %rd484, %rd481, %rd483; 2026-02-21T09:50:55.8582344Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8582412Z mov.b64 {%r1428, %r1429}, %rd484; 2026-02-21T09:50:55.8582476Z cvt.rn.f16x2.f32 %r1430, %r1429, %r1428; 2026-02-21T09:50:55.8582684Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8582752Z cvt.u64.u32 %rd485, %r750; 2026-02-21T09:50:55.8582832Z cvt.u64.u32 %rd486, %r751; 2026-02-21T09:50:55.8582892Z shl.b64 %rd487, %rd486, 32; 2026-02-21T09:50:55.8582957Z or.b64 %rd488, %rd485, %rd487; 2026-02-21T09:50:55.8583126Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8583185Z mov.b64 {%r1431, %r1432}, %rd488; 2026-02-21T09:50:55.8583250Z cvt.rn.f16x2.f32 %r1433, %r1432, %r1431; 2026-02-21T09:50:55.8583436Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8583493Z cvt.u64.u32 %rd489, %r752; 2026-02-21T09:50:55.8583547Z cvt.u64.u32 %rd490, %r753; 2026-02-21T09:50:55.8583610Z shl.b64 %rd491, %rd490, 32; 2026-02-21T09:50:55.8583667Z or.b64 %rd492, %rd489, %rd491; 2026-02-21T09:50:55.8583830Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8583895Z mov.b64 {%r1434, %r1435}, %rd492; 2026-02-21T09:50:55.8583961Z cvt.rn.f16x2.f32 %r1436, %r1435, %r1434; 2026-02-21T09:50:55.8584121Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8584185Z cvt.u64.u32 %rd493, %r754; 
2026-02-21T09:50:55.8584241Z cvt.u64.u32 %rd494, %r755; 2026-02-21T09:50:55.8584296Z shl.b64 %rd495, %rd494, 32; 2026-02-21T09:50:55.8584360Z or.b64 %rd496, %rd493, %rd495; 2026-02-21T09:50:55.8584528Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8584584Z mov.b64 {%r1437, %r1438}, %rd496; 2026-02-21T09:50:55.8584648Z cvt.rn.f16x2.f32 %r1439, %r1438, %r1437; 2026-02-21T09:50:55.8584841Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8584897Z cvt.u64.u32 %rd497, %r756; 2026-02-21T09:50:55.8584952Z cvt.u64.u32 %rd498, %r757; 2026-02-21T09:50:55.8585007Z shl.b64 %rd499, %rd498, 32; 2026-02-21T09:50:55.8585075Z or.b64 %rd500, %rd497, %rd499; 2026-02-21T09:50:55.8585233Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8585290Z mov.b64 {%r1440, %r1441}, %rd500; 2026-02-21T09:50:55.8585361Z cvt.rn.f16x2.f32 %r1442, %r1441, %r1440; 2026-02-21T09:50:55.8585517Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8585573Z cvt.u64.u32 %rd501, %r758; 2026-02-21T09:50:55.8585635Z cvt.u64.u32 %rd502, %r759; 2026-02-21T09:50:55.8585690Z shl.b64 %rd503, %rd502, 32; 2026-02-21T09:50:55.8585773Z or.b64 %rd504, %rd501, %rd503; 2026-02-21T09:50:55.8585933Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8585997Z mov.b64 {%r1443, %r1444}, %rd504; 2026-02-21T09:50:55.8586060Z cvt.rn.f16x2.f32 %r1445, %r1444, %r1443; 2026-02-21T09:50:55.8586221Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8586312Z cvt.u64.u32 %rd505, %r760; 2026-02-21T09:50:55.8586367Z cvt.u64.u32 %rd506, %r761; 2026-02-21T09:50:55.8586423Z shl.b64 %rd507, %rd506, 32; 2026-02-21T09:50:55.8586488Z or.b64 %rd508, %rd505, %rd507; 2026-02-21T09:50:55.8586645Z .loc 1 55 27 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8586702Z mov.b64 {%r1446, %r1447}, %rd508; 2026-02-21T09:50:55.8586764Z cvt.rn.f16x2.f32 %r1448, %r1447, %r1446; 2026-02-21T09:50:55.8586934Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8586989Z cvt.u64.u32 %rd509, %r762; 2026-02-21T09:50:55.8587043Z cvt.u64.u32 %rd510, %r763; 2026-02-21T09:50:55.8587106Z shl.b64 %rd511, %rd510, 32; 2026-02-21T09:50:55.8587162Z or.b64 %rd512, %rd509, %rd511; 2026-02-21T09:50:55.8587377Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8587442Z mov.b64 {%r1449, %r1450}, %rd512; 2026-02-21T09:50:55.8587505Z cvt.rn.f16x2.f32 %r1451, %r1450, %r1449; 2026-02-21T09:50:55.8587667Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8587728Z cvt.u64.u32 %rd513, %r765; 2026-02-21T09:50:55.8587782Z cvt.u64.u32 %rd514, %r766; 2026-02-21T09:50:55.8587837Z shl.b64 %rd515, %rd514, 32; 2026-02-21T09:50:55.8587892Z or.b64 %rd516, %rd513, %rd515; 2026-02-21T09:50:55.8588062Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8588119Z mov.b64 {%r1452, %r1453}, %rd516; 2026-02-21T09:50:55.8588181Z cvt.rn.f16x2.f32 %r1454, %r1453, %r1452; 2026-02-21T09:50:55.8588347Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8588401Z cvt.u64.u32 %rd517, %r767; 2026-02-21T09:50:55.8588458Z cvt.u64.u32 %rd518, %r768; 2026-02-21T09:50:55.8588512Z shl.b64 %rd519, %rd518, 32; 2026-02-21T09:50:55.8588574Z or.b64 %rd520, %rd517, %rd519; 2026-02-21T09:50:55.8588733Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8588789Z mov.b64 {%r1455, %r1456}, %rd520; 2026-02-21T09:50:55.8588858Z cvt.rn.f16x2.f32 %r1457, %r1456, %r1455; 2026-02-21T09:50:55.8589017Z .loc 1 53 52 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8589072Z cvt.u64.u32 %rd521, %r769; 2026-02-21T09:50:55.8589134Z cvt.u64.u32 %rd522, %r770; 2026-02-21T09:50:55.8589190Z shl.b64 %rd523, %rd522, 32; 2026-02-21T09:50:55.8589245Z or.b64 %rd524, %rd521, %rd523; 2026-02-21T09:50:55.8589403Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8589468Z mov.b64 {%r1458, %r1459}, %rd524; 2026-02-21T09:50:55.8589532Z cvt.rn.f16x2.f32 %r1460, %r1459, %r1458; 2026-02-21T09:50:55.8589696Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8589757Z cvt.u64.u32 %rd525, %r771; 2026-02-21T09:50:55.8589811Z cvt.u64.u32 %rd526, %r772; 2026-02-21T09:50:55.8589866Z shl.b64 %rd527, %rd526, 32; 2026-02-21T09:50:55.8589928Z or.b64 %rd528, %rd525, %rd527; 2026-02-21T09:50:55.8590088Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8590144Z mov.b64 {%r1461, %r1462}, %rd528; 2026-02-21T09:50:55.8590229Z cvt.rn.f16x2.f32 %r1463, %r1462, %r1461; 2026-02-21T09:50:55.8590398Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8590453Z cvt.u64.u32 %rd529, %r773; 2026-02-21T09:50:55.8590508Z cvt.u64.u32 %rd530, %r774; 2026-02-21T09:50:55.8590573Z shl.b64 %rd531, %rd530, 32; 2026-02-21T09:50:55.8590630Z or.b64 %rd532, %rd529, %rd531; 2026-02-21T09:50:55.8590813Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8590875Z mov.b64 {%r1464, %r1465}, %rd532; 2026-02-21T09:50:55.8590939Z cvt.rn.f16x2.f32 %r1466, %r1465, %r1464; 2026-02-21T09:50:55.8591101Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8591163Z cvt.u64.u32 %rd533, %r775; 2026-02-21T09:50:55.8591220Z cvt.u64.u32 %rd534, %r776; 2026-02-21T09:50:55.8591277Z shl.b64 %rd535, %rd534, 32; 2026-02-21T09:50:55.8591336Z or.b64 
%rd536, %rd533, %rd535; 2026-02-21T09:50:55.8591505Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8591561Z mov.b64 {%r1467, %r1468}, %rd536; 2026-02-21T09:50:55.8591623Z cvt.rn.f16x2.f32 %r1469, %r1468, %r1467; 2026-02-21T09:50:55.8591835Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8591892Z cvt.u64.u32 %rd537, %r777; 2026-02-21T09:50:55.8591948Z cvt.u64.u32 %rd538, %r778; 2026-02-21T09:50:55.8592003Z shl.b64 %rd539, %rd538, 32; 2026-02-21T09:50:55.8592066Z or.b64 %rd540, %rd537, %rd539; 2026-02-21T09:50:55.8592221Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8592276Z mov.b64 {%r1470, %r1471}, %rd540; 2026-02-21T09:50:55.8592346Z cvt.rn.f16x2.f32 %r1472, %r1471, %r1470; 2026-02-21T09:50:55.8592502Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8592558Z cvt.u64.u32 %rd541, %r779; 2026-02-21T09:50:55.8592618Z cvt.u64.u32 %rd542, %r780; 2026-02-21T09:50:55.8592674Z shl.b64 %rd543, %rd542, 32; 2026-02-21T09:50:55.8592729Z or.b64 %rd544, %rd541, %rd543; 2026-02-21T09:50:55.8592889Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8592954Z mov.b64 {%r1473, %r1474}, %rd544; 2026-02-21T09:50:55.8593015Z cvt.rn.f16x2.f32 %r1475, %r1474, %r1473; 2026-02-21T09:50:55.8593169Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8593230Z cvt.u64.u32 %rd545, %r782; 2026-02-21T09:50:55.8593285Z cvt.u64.u32 %rd546, %r783; 2026-02-21T09:50:55.8593342Z shl.b64 %rd547, %rd546, 32; 2026-02-21T09:50:55.8593404Z or.b64 %rd548, %rd545, %rd547; 2026-02-21T09:50:55.8593563Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8593619Z mov.b64 {%r1476, %r1477}, %rd548; 2026-02-21T09:50:55.8593681Z cvt.rn.f16x2.f32 %r1478, %r1477, %r1476; 
2026-02-21T09:50:55.8593843Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8593900Z cvt.u64.u32 %rd549, %r784; 2026-02-21T09:50:55.8593956Z cvt.u64.u32 %rd550, %r785; 2026-02-21T09:50:55.8594021Z shl.b64 %rd551, %rd550, 32; 2026-02-21T09:50:55.8594077Z or.b64 %rd552, %rd549, %rd551; 2026-02-21T09:50:55.8594234Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8594297Z mov.b64 {%r1479, %r1480}, %rd552; 2026-02-21T09:50:55.8594359Z cvt.rn.f16x2.f32 %r1481, %r1480, %r1479; 2026-02-21T09:50:55.8594521Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8594581Z cvt.u64.u32 %rd553, %r786; 2026-02-21T09:50:55.8594704Z cvt.u64.u32 %rd554, %r787; 2026-02-21T09:50:55.8594762Z shl.b64 %rd555, %rd554, 32; 2026-02-21T09:50:55.8594818Z or.b64 %rd556, %rd553, %rd555; 2026-02-21T09:50:55.8594987Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8595045Z mov.b64 {%r1482, %r1483}, %rd556; 2026-02-21T09:50:55.8595108Z cvt.rn.f16x2.f32 %r1484, %r1483, %r1482; 2026-02-21T09:50:55.8595303Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8595359Z cvt.u64.u32 %rd557, %r788; 2026-02-21T09:50:55.8595414Z cvt.u64.u32 %rd558, %r789; 2026-02-21T09:50:55.8595469Z shl.b64 %rd559, %rd558, 32; 2026-02-21T09:50:55.8595532Z or.b64 %rd560, %rd557, %rd559; 2026-02-21T09:50:55.8595693Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8595750Z mov.b64 {%r1485, %r1486}, %rd560; 2026-02-21T09:50:55.8595820Z cvt.rn.f16x2.f32 %r1487, %r1486, %r1485; 2026-02-21T09:50:55.8595979Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8596033Z cvt.u64.u32 %rd561, %r790; 2026-02-21T09:50:55.8596092Z cvt.u64.u32 %rd562, %r791; 2026-02-21T09:50:55.8596172Z shl.b64 %rd563, %rd562, 
32; 2026-02-21T09:50:55.8596252Z or.b64 %rd564, %rd561, %rd563; 2026-02-21T09:50:55.8596413Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8596475Z mov.b64 {%r1488, %r1489}, %rd564; 2026-02-21T09:50:55.8596538Z cvt.rn.f16x2.f32 %r1490, %r1489, %r1488; 2026-02-21T09:50:55.8596698Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8596762Z cvt.u64.u32 %rd565, %r792; 2026-02-21T09:50:55.8596816Z cvt.u64.u32 %rd566, %r793; 2026-02-21T09:50:55.8596871Z shl.b64 %rd567, %rd566, 32; 2026-02-21T09:50:55.8596934Z or.b64 %rd568, %rd565, %rd567; 2026-02-21T09:50:55.8597097Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8597153Z mov.b64 {%r1491, %r1492}, %rd568; 2026-02-21T09:50:55.8597214Z cvt.rn.f16x2.f32 %r1493, %r1492, %r1491; 2026-02-21T09:50:55.8597384Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8597440Z cvt.u64.u32 %rd569, %r794; 2026-02-21T09:50:55.8597495Z cvt.u64.u32 %rd570, %r795; 2026-02-21T09:50:55.8597558Z shl.b64 %rd571, %rd570, 32; 2026-02-21T09:50:55.8597613Z or.b64 %rd572, %rd569, %rd571; 2026-02-21T09:50:55.8597773Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8597834Z mov.b64 {%r1494, %r1495}, %rd572; 2026-02-21T09:50:55.8597896Z cvt.rn.f16x2.f32 %r1496, %r1495, %r1494; 2026-02-21T09:50:55.8598057Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8598120Z cvt.u64.u32 %rd573, %r796; 2026-02-21T09:50:55.8598176Z cvt.u64.u32 %rd574, %r797; 2026-02-21T09:50:55.8598232Z shl.b64 %rd575, %rd574, 32; 2026-02-21T09:50:55.8598288Z or.b64 %rd576, %rd573, %rd575; 2026-02-21T09:50:55.8598458Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8598515Z mov.b64 {%r1497, %r1498}, %rd576; 2026-02-21T09:50:55.8598577Z 
cvt.rn.f16x2.f32 %r1499, %r1498, %r1497; 2026-02-21T09:50:55.8598742Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8598797Z cvt.u64.u32 %rd577, %r799; 2026-02-21T09:50:55.8598852Z cvt.u64.u32 %rd578, %r800; 2026-02-21T09:50:55.8598908Z shl.b64 %rd579, %rd578, 32; 2026-02-21T09:50:55.8598970Z or.b64 %rd580, %rd577, %rd579; 2026-02-21T09:50:55.8599128Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8599212Z mov.b64 {%r1500, %r1501}, %rd580; 2026-02-21T09:50:55.8599283Z cvt.rn.f16x2.f32 %r1502, %r1501, %r1500; 2026-02-21T09:50:55.8599445Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8599501Z cvt.u64.u32 %rd581, %r801; 2026-02-21T09:50:55.8599564Z cvt.u64.u32 %rd582, %r802; 2026-02-21T09:50:55.8599643Z shl.b64 %rd583, %rd582, 32; 2026-02-21T09:50:55.8599699Z or.b64 %rd584, %rd581, %rd583; 2026-02-21T09:50:55.8599860Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8599923Z mov.b64 {%r1503, %r1504}, %rd584; 2026-02-21T09:50:55.8599985Z cvt.rn.f16x2.f32 %r1505, %r1504, %r1503; 2026-02-21T09:50:55.8600143Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8600206Z cvt.u64.u32 %rd585, %r803; 2026-02-21T09:50:55.8600264Z cvt.u64.u32 %rd586, %r804; 2026-02-21T09:50:55.8600319Z shl.b64 %rd587, %rd586, 32; 2026-02-21T09:50:55.8600382Z or.b64 %rd588, %rd585, %rd587; 2026-02-21T09:50:55.8600543Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8600620Z mov.b64 {%r1506, %r1507}, %rd588; 2026-02-21T09:50:55.8600703Z cvt.rn.f16x2.f32 %r1508, %r1507, %r1506; 2026-02-21T09:50:55.8600870Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8600925Z cvt.u64.u32 %rd589, %r805; 2026-02-21T09:50:55.8600979Z cvt.u64.u32 %rd590, %r806; 
2026-02-21T09:50:55.8601043Z shl.b64 %rd591, %rd590, 32; 2026-02-21T09:50:55.8601100Z or.b64 %rd592, %rd589, %rd591; 2026-02-21T09:50:55.8601263Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8601325Z mov.b64 {%r1509, %r1510}, %rd592; 2026-02-21T09:50:55.8601389Z cvt.rn.f16x2.f32 %r1511, %r1510, %r1509; 2026-02-21T09:50:55.8601546Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8601607Z cvt.u64.u32 %rd593, %r807; 2026-02-21T09:50:55.8601663Z cvt.u64.u32 %rd594, %r808; 2026-02-21T09:50:55.8601720Z shl.b64 %rd595, %rd594, 32; 2026-02-21T09:50:55.8601776Z or.b64 %rd596, %rd593, %rd595; 2026-02-21T09:50:55.8601948Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8602004Z mov.b64 {%r1512, %r1513}, %rd596; 2026-02-21T09:50:55.8602066Z cvt.rn.f16x2.f32 %r1514, %r1513, %r1512; 2026-02-21T09:50:55.8602230Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8602285Z cvt.u64.u32 %rd597, %r809; 2026-02-21T09:50:55.8602339Z cvt.u64.u32 %rd598, %r810; 2026-02-21T09:50:55.8602394Z shl.b64 %rd599, %rd598, 32; 2026-02-21T09:50:55.8602459Z or.b64 %rd600, %rd597, %rd599; 2026-02-21T09:50:55.8602621Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8602676Z mov.b64 {%r1515, %r1516}, %rd600; 2026-02-21T09:50:55.8602748Z cvt.rn.f16x2.f32 %r1517, %r1516, %r1515; 2026-02-21T09:50:55.8602911Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8602970Z cvt.u64.u32 %rd601, %r811; 2026-02-21T09:50:55.8603032Z cvt.u64.u32 %rd602, %r812; 2026-02-21T09:50:55.8603088Z shl.b64 %rd603, %rd602, 32; 2026-02-21T09:50:55.8603144Z or.b64 %rd604, %rd601, %rd603; 2026-02-21T09:50:55.8603302Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8603365Z mov.b64 {%r1518, 
%r1519}, %rd604; 2026-02-21T09:50:55.8603427Z cvt.rn.f16x2.f32 %r1520, %r1519, %r1518; 2026-02-21T09:50:55.8603589Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8603674Z cvt.u64.u32 %rd605, %r813; 2026-02-21T09:50:55.8603729Z cvt.u64.u32 %rd606, %r814; 2026-02-21T09:50:55.8603785Z shl.b64 %rd607, %rd606, 32; 2026-02-21T09:50:55.8603848Z or.b64 %rd608, %rd605, %rd607; 2026-02-21T09:50:55.8604015Z .loc 1 55 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:55:27 2026-02-21T09:50:55.8604093Z mov.b64 {%r1521, %r1522}, %rd608; 2026-02-21T09:50:55.8604156Z cvt.rn.f16x2.f32 %r1523, %r1522, %r1521; 2026-02-21T09:50:55.8604321Z .loc 1 56 53 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:56:53 2026-02-21T09:50:55.8604389Z mad.lo.s32 %r1524, %r1107, 12288, %r1106; 2026-02-21T09:50:55.8604454Z mad.lo.s32 %r1525, %r1108, 12288, %r1106; 2026-02-21T09:50:55.8604523Z mad.lo.s32 %r1526, %r1109, 12288, %r1106; 2026-02-21T09:50:55.8604583Z mad.lo.s32 %r1527, %r1110, 12288, %r1106; 2026-02-21T09:50:55.8604643Z mad.lo.s32 %r1528, %r1111, 12288, %r1106; 2026-02-21T09:50:55.8604739Z mad.lo.s32 %r1529, %r1112, 12288, %r1106; 2026-02-21T09:50:55.8604801Z mad.lo.s32 %r1530, %r1113, 12288, %r1106; 2026-02-21T09:50:55.8604862Z mad.lo.s32 %r1531, %r1114, 12288, %r1106; 2026-02-21T09:50:55.8604923Z mad.lo.s32 %r1532, %r1115, 12288, %r1106; 2026-02-21T09:50:55.8605015Z mad.lo.s32 %r1533, %r1116, 12288, %r1106; 2026-02-21T09:50:55.8605101Z mad.lo.s32 %r1534, %r1117, 12288, %r1106; 2026-02-21T09:50:55.8605163Z mad.lo.s32 %r1535, %r1118, 12288, %r1106; 2026-02-21T09:50:55.8605231Z mad.lo.s32 %r1536, %r1119, 12288, %r1106; 2026-02-21T09:50:55.8605292Z mad.lo.s32 %r1537, %r1120, 12288, %r1106; 2026-02-21T09:50:55.8605352Z mad.lo.s32 %r1538, %r1121, 12288, %r1106; 2026-02-21T09:50:55.8605417Z mad.lo.s32 %r1539, %r1122, 12288, %r1106; 2026-02-21T09:50:55.8605476Z mad.lo.s32 %r1540, %r1123, 12288, %r1106; 
2026-02-21T09:50:55.8605537Z mad.lo.s32 %r1541, %r1124, 12288, %r1106; 2026-02-21T09:50:55.8605598Z mad.lo.s32 %r1542, %r1125, 12288, %r1106; 2026-02-21T09:50:55.8605668Z mad.lo.s32 %r1543, %r1126, 12288, %r1106; 2026-02-21T09:50:55.8605730Z mad.lo.s32 %r1544, %r1127, 12288, %r1106; 2026-02-21T09:50:55.8605790Z mad.lo.s32 %r1545, %r1128, 12288, %r1106; 2026-02-21T09:50:55.8605858Z mad.lo.s32 %r1546, %r1129, 12288, %r1106; 2026-02-21T09:50:55.8605918Z mad.lo.s32 %r1547, %r1130, 12288, %r1106; 2026-02-21T09:50:55.8605980Z mad.lo.s32 %r1548, %r1131, 12288, %r1106; 2026-02-21T09:50:55.8606050Z mad.lo.s32 %r1549, %r1132, 12288, %r1106; 2026-02-21T09:50:55.8606110Z mad.lo.s32 %r1550, %r1133, 12288, %r1106; 2026-02-21T09:50:55.8606170Z mad.lo.s32 %r1551, %r1134, 12288, %r1106; 2026-02-21T09:50:55.8606230Z mad.lo.s32 %r1552, %r1135, 12288, %r1106; 2026-02-21T09:50:55.8606297Z mad.lo.s32 %r1553, %r1136, 12288, %r1106; 2026-02-21T09:50:55.8606357Z mad.lo.s32 %r1554, %r1137, 12288, %r1106; 2026-02-21T09:50:55.8606416Z mad.lo.s32 %r1555, %r1138, 12288, %r1106; 2026-02-21T09:50:55.8606583Z .loc 1 56 24 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:56:24 2026-02-21T09:50:55.8606649Z mad.wide.s32 %rd65, %r1524, 2, %rd5; 2026-02-21T09:50:55.8606713Z mad.wide.s32 %rd66, %r1525, 2, %rd5; 2026-02-21T09:50:55.8606781Z mad.wide.s32 %rd67, %r1526, 2, %rd5; 2026-02-21T09:50:55.8606842Z mad.wide.s32 %rd68, %r1527, 2, %rd5; 2026-02-21T09:50:55.8606901Z mad.wide.s32 %rd69, %r1528, 2, %rd5; 2026-02-21T09:50:55.8606961Z mad.wide.s32 %rd70, %r1529, 2, %rd5; 2026-02-21T09:50:55.8607027Z mad.wide.s32 %rd71, %r1530, 2, %rd5; 2026-02-21T09:50:55.8607083Z mad.wide.s32 %rd72, %r1531, 2, %rd5; 2026-02-21T09:50:55.8607143Z mad.wide.s32 %rd73, %r1532, 2, %rd5; 2026-02-21T09:50:55.8607208Z mad.wide.s32 %rd74, %r1533, 2, %rd5; 2026-02-21T09:50:55.8607267Z mad.wide.s32 %rd75, %r1534, 2, %rd5; 2026-02-21T09:50:55.8607325Z mad.wide.s32 %rd76, %r1535, 2, %rd5; 2026-02-21T09:50:55.8607383Z 
mad.wide.s32 %rd77, %r1536, 2, %rd5; 2026-02-21T09:50:55.8607449Z mad.wide.s32 %rd78, %r1537, 2, %rd5; 2026-02-21T09:50:55.8607533Z mad.wide.s32 %rd79, %r1538, 2, %rd5; 2026-02-21T09:50:55.8607591Z mad.wide.s32 %rd80, %r1539, 2, %rd5; 2026-02-21T09:50:55.8607656Z mad.wide.s32 %rd81, %r1540, 2, %rd5; 2026-02-21T09:50:55.8607716Z mad.wide.s32 %rd82, %r1541, 2, %rd5; 2026-02-21T09:50:55.8607774Z mad.wide.s32 %rd83, %r1542, 2, %rd5; 2026-02-21T09:50:55.8607840Z mad.wide.s32 %rd84, %r1543, 2, %rd5; 2026-02-21T09:50:55.8607925Z mad.wide.s32 %rd85, %r1544, 2, %rd5; 2026-02-21T09:50:55.8607984Z mad.wide.s32 %rd86, %r1545, 2, %rd5; 2026-02-21T09:50:55.8608041Z mad.wide.s32 %rd87, %r1546, 2, %rd5; 2026-02-21T09:50:55.8608106Z mad.wide.s32 %rd88, %r1547, 2, %rd5; 2026-02-21T09:50:55.8608164Z mad.wide.s32 %rd89, %r1548, 2, %rd5; 2026-02-21T09:50:55.8608222Z mad.wide.s32 %rd90, %r1549, 2, %rd5; 2026-02-21T09:50:55.8608285Z mad.wide.s32 %rd91, %r1550, 2, %rd5; 2026-02-21T09:50:55.8608342Z mad.wide.s32 %rd92, %r1551, 2, %rd5; 2026-02-21T09:50:55.8608398Z mad.wide.s32 %rd93, %r1552, 2, %rd5; 2026-02-21T09:50:55.8608458Z mad.wide.s32 %rd94, %r1553, 2, %rd5; 2026-02-21T09:50:55.8608521Z mad.wide.s32 %rd95, %r1554, 2, %rd5; 2026-02-21T09:50:55.8608579Z mad.wide.s32 %rd96, %r1555, 2, %rd5; 2026-02-21T09:50:55.8608745Z .loc 1 56 83 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:56:83 2026-02-21T09:50:55.8608827Z bar.sync 0, 128; 2026-02-21T09:50:55.8608956Z st.shared.v4.b32 [%r78], {%r1142, %r1154, %r1166, %r1178}; 2026-02-21T09:50:55.8609052Z st.shared.v4.b32 [%r79], {%r1190, %r1202, %r1214, %r1226}; 2026-02-21T09:50:55.8609148Z st.shared.v4.b32 [%r80], {%r1238, %r1250, %r1262, %r1274}; 2026-02-21T09:50:55.8609234Z st.shared.v4.b32 [%r81], {%r1286, %r1298, %r1310, %r1322}; 2026-02-21T09:50:55.8609290Z bar.sync 0, 128; 2026-02-21T09:50:55.8609345Z // begin inline asm 2026-02-21T09:50:55.8609499Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r977, %r981, %r985, %r989}, [%r821]; 
2026-02-21T09:50:55.8609555Z // end inline asm 2026-02-21T09:50:55.8609610Z // begin inline asm 2026-02-21T09:50:55.8609763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r993, %r997, %r1001, %r1005}, [%r826]; 2026-02-21T09:50:55.8609816Z // end inline asm 2026-02-21T09:50:55.8609869Z // begin inline asm 2026-02-21T09:50:55.8610027Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1009, %r1013, %r1017, %r1021}, [%r831]; 2026-02-21T09:50:55.8610079Z // end inline asm 2026-02-21T09:50:55.8610136Z // begin inline asm 2026-02-21T09:50:55.8610284Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, %r1037}, [%r836]; 2026-02-21T09:50:55.8610344Z // end inline asm 2026-02-21T09:50:55.8610397Z bar.sync 0, 128; 2026-02-21T09:50:55.8610485Z st.shared.v4.b32 [%r78], {%r1334, %r1346, %r1358, %r1370}; 2026-02-21T09:50:55.8610579Z st.shared.v4.b32 [%r79], {%r1382, %r1394, %r1406, %r1418}; 2026-02-21T09:50:55.8610666Z st.shared.v4.b32 [%r80], {%r1430, %r1442, %r1454, %r1466}; 2026-02-21T09:50:55.8610751Z st.shared.v4.b32 [%r81], {%r1478, %r1490, %r1502, %r1514}; 2026-02-21T09:50:55.8610812Z bar.sync 0, 128; 2026-02-21T09:50:55.8610866Z // begin inline asm 2026-02-21T09:50:55.8611013Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r821]; 2026-02-21T09:50:55.8611066Z // end inline asm 2026-02-21T09:50:55.8611127Z // begin inline asm 2026-02-21T09:50:55.8611269Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r826]; 2026-02-21T09:50:55.8611323Z // end inline asm 2026-02-21T09:50:55.8611383Z // begin inline asm 2026-02-21T09:50:55.8611526Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r831]; 2026-02-21T09:50:55.8611578Z // end inline asm 2026-02-21T09:50:55.8611630Z // begin inline asm 2026-02-21T09:50:55.8611778Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r836]; 2026-02-21T09:50:55.8611829Z // end inline asm 2026-02-21T09:50:55.8611880Z bar.sync 0, 128; 
2026-02-21T09:50:55.8611974Z st.shared.v4.b32 [%r78], {%r1145, %r1157, %r1169, %r1181}; 2026-02-21T09:50:55.8612082Z st.shared.v4.b32 [%r79], {%r1193, %r1205, %r1217, %r1229}; 2026-02-21T09:50:55.8612168Z st.shared.v4.b32 [%r80], {%r1241, %r1253, %r1265, %r1277}; 2026-02-21T09:50:55.8612262Z st.shared.v4.b32 [%r81], {%r1289, %r1301, %r1313, %r1325}; 2026-02-21T09:50:55.8612314Z bar.sync 0, 128; 2026-02-21T09:50:55.8612367Z // begin inline asm 2026-02-21T09:50:55.8612510Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r978, %r982, %r986, %r990}, [%r821]; 2026-02-21T09:50:55.8612589Z // end inline asm 2026-02-21T09:50:55.8612642Z // begin inline asm 2026-02-21T09:50:55.8612783Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r994, %r998, %r1002, %r1006}, [%r826]; 2026-02-21T09:50:55.8612838Z // end inline asm 2026-02-21T09:50:55.8612891Z // begin inline asm 2026-02-21T09:50:55.8613029Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1010, %r1014, %r1018, %r1022}, [%r831]; 2026-02-21T09:50:55.8613080Z // end inline asm 2026-02-21T09:50:55.8613139Z // begin inline asm 2026-02-21T09:50:55.8613278Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r836]; 2026-02-21T09:50:55.8613329Z // end inline asm 2026-02-21T09:50:55.8613387Z bar.sync 0, 128; 2026-02-21T09:50:55.8613474Z st.shared.v4.b32 [%r78], {%r1337, %r1349, %r1361, %r1373}; 2026-02-21T09:50:55.8613579Z st.shared.v4.b32 [%r79], {%r1385, %r1397, %r1409, %r1421}; 2026-02-21T09:50:55.8613692Z st.shared.v4.b32 [%r80], {%r1433, %r1445, %r1457, %r1469}; 2026-02-21T09:50:55.8613780Z st.shared.v4.b32 [%r81], {%r1481, %r1493, %r1505, %r1517}; 2026-02-21T09:50:55.8613832Z bar.sync 0, 128; 2026-02-21T09:50:55.8613886Z // begin inline asm 2026-02-21T09:50:55.8614035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r821]; 2026-02-21T09:50:55.8614087Z // end inline asm 2026-02-21T09:50:55.8614139Z // begin inline asm 2026-02-21T09:50:55.8614290Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r1058, %r1062, %r1066, %r1070}, [%r826]; 2026-02-21T09:50:55.8614341Z // end inline asm 2026-02-21T09:50:55.8614395Z // begin inline asm 2026-02-21T09:50:55.8614543Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r831]; 2026-02-21T09:50:55.8614596Z // end inline asm 2026-02-21T09:50:55.8614648Z // begin inline asm 2026-02-21T09:50:55.8614848Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r836]; 2026-02-21T09:50:55.8614909Z // end inline asm 2026-02-21T09:50:55.8614963Z bar.sync 0, 128; 2026-02-21T09:50:55.8615049Z st.shared.v4.b32 [%r78], {%r1148, %r1160, %r1172, %r1184}; 2026-02-21T09:50:55.8615140Z st.shared.v4.b32 [%r79], {%r1196, %r1208, %r1220, %r1232}; 2026-02-21T09:50:55.8615225Z st.shared.v4.b32 [%r80], {%r1244, %r1256, %r1268, %r1280}; 2026-02-21T09:50:55.8615307Z st.shared.v4.b32 [%r81], {%r1292, %r1304, %r1316, %r1328}; 2026-02-21T09:50:55.8615360Z bar.sync 0, 128; 2026-02-21T09:50:55.8615419Z // begin inline asm 2026-02-21T09:50:55.8615555Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r979, %r983, %r987, %r991}, [%r821]; 2026-02-21T09:50:55.8615608Z // end inline asm 2026-02-21T09:50:55.8615667Z // begin inline asm 2026-02-21T09:50:55.8615804Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r995, %r999, %r1003, %r1007}, [%r826]; 2026-02-21T09:50:55.8615856Z // end inline asm 2026-02-21T09:50:55.8615914Z // begin inline asm 2026-02-21T09:50:55.8616054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1011, %r1015, %r1019, %r1023}, [%r831]; 2026-02-21T09:50:55.8616107Z // end inline asm 2026-02-21T09:50:55.8616160Z // begin inline asm 2026-02-21T09:50:55.8616303Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1031, %r1035, %r1039}, [%r836]; 2026-02-21T09:50:55.8616356Z // end inline asm 2026-02-21T09:50:55.8616409Z bar.sync 0, 128; 2026-02-21T09:50:55.8616502Z st.shared.v4.b32 [%r78], {%r1340, %r1352, %r1364, %r1376}; 2026-02-21T09:50:55.8616588Z st.shared.v4.b32 [%r79], {%r1388, %r1400, %r1412, 
%r1424}; 2026-02-21T09:50:55.8616672Z st.shared.v4.b32 [%r80], {%r1436, %r1448, %r1460, %r1472}; 2026-02-21T09:50:55.8616795Z st.shared.v4.b32 [%r81], {%r1484, %r1496, %r1508, %r1520}; 2026-02-21T09:50:55.8616848Z bar.sync 0, 128; 2026-02-21T09:50:55.8616901Z // begin inline asm 2026-02-21T09:50:55.8617044Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1043, %r1047, %r1051, %r1055}, [%r821]; 2026-02-21T09:50:55.8617110Z // end inline asm 2026-02-21T09:50:55.8617163Z // begin inline asm 2026-02-21T09:50:55.8617307Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r826]; 2026-02-21T09:50:55.8617403Z // end inline asm 2026-02-21T09:50:55.8617458Z // begin inline asm 2026-02-21T09:50:55.8617600Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, %r1083, %r1087}, [%r831]; 2026-02-21T09:50:55.8617654Z // end inline asm 2026-02-21T09:50:55.8617718Z // begin inline asm 2026-02-21T09:50:55.8617859Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r836]; 2026-02-21T09:50:55.8617909Z // end inline asm 2026-02-21T09:50:55.8617969Z bar.sync 0, 128; 2026-02-21T09:50:55.8618057Z st.shared.v4.b32 [%r78], {%r1151, %r1163, %r1175, %r1187}; 2026-02-21T09:50:55.8618143Z st.shared.v4.b32 [%r79], {%r1199, %r1211, %r1223, %r1235}; 2026-02-21T09:50:55.8618234Z st.shared.v4.b32 [%r80], {%r1247, %r1259, %r1271, %r1283}; 2026-02-21T09:50:55.8618349Z st.shared.v4.b32 [%r81], {%r1295, %r1307, %r1319, %r1331}; 2026-02-21T09:50:55.8618403Z bar.sync 0, 128; 2026-02-21T09:50:55.8618480Z // begin inline asm 2026-02-21T09:50:55.8618626Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r980, %r984, %r988, %r992}, [%r821]; 2026-02-21T09:50:55.8618678Z // end inline asm 2026-02-21T09:50:55.8618731Z // begin inline asm 2026-02-21T09:50:55.8618882Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r996, %r1000, %r1004, %r1008}, [%r826]; 2026-02-21T09:50:55.8618933Z // end inline asm 2026-02-21T09:50:55.8618984Z // begin inline asm 
2026-02-21T09:50:55.8619147Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1012, %r1016, %r1020, %r1024}, [%r831]; 2026-02-21T09:50:55.8619203Z // end inline asm 2026-02-21T09:50:55.8619258Z // begin inline asm 2026-02-21T09:50:55.8619400Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1028, %r1032, %r1036, %r1040}, [%r836]; 2026-02-21T09:50:55.8619462Z // end inline asm 2026-02-21T09:50:55.8619515Z bar.sync 0, 128; 2026-02-21T09:50:55.8619607Z st.shared.v4.b32 [%r78], {%r1343, %r1355, %r1367, %r1379}; 2026-02-21T09:50:55.8619706Z st.shared.v4.b32 [%r79], {%r1391, %r1403, %r1415, %r1427}; 2026-02-21T09:50:55.8619795Z st.shared.v4.b32 [%r80], {%r1439, %r1451, %r1463, %r1475}; 2026-02-21T09:50:55.8619885Z st.shared.v4.b32 [%r81], {%r1487, %r1499, %r1511, %r1523}; 2026-02-21T09:50:55.8619940Z bar.sync 0, 128; 2026-02-21T09:50:55.8620003Z // begin inline asm 2026-02-21T09:50:55.8620148Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1044, %r1048, %r1052, %r1056}, [%r821]; 2026-02-21T09:50:55.8620200Z // end inline asm 2026-02-21T09:50:55.8620264Z // begin inline asm 2026-02-21T09:50:55.8620407Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1060, %r1064, %r1068, %r1072}, [%r826]; 2026-02-21T09:50:55.8620461Z // end inline asm 2026-02-21T09:50:55.8620522Z // begin inline asm 2026-02-21T09:50:55.8620667Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1076, %r1080, %r1084, %r1088}, [%r831]; 2026-02-21T09:50:55.8620719Z // end inline asm 2026-02-21T09:50:55.8620775Z // begin inline asm 2026-02-21T09:50:55.8620925Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1092, %r1096, %r1100, %r1104}, [%r836]; 2026-02-21T09:50:55.8620982Z // end inline asm 2026-02-21T09:50:55.8621036Z // begin inline asm 2026-02-21T09:50:55.8621144Z st.global.v4.b32 [ %rd65 + 0 ], { %r977, %r978, %r979, %r980 }; 2026-02-21T09:50:55.8621198Z // end inline asm 2026-02-21T09:50:55.8621252Z // begin inline asm 2026-02-21T09:50:55.8621353Z st.global.v4.b32 [ %rd66 + 0 ], { %r981, %r982, %r983, %r984 }; 
2026-02-21T09:50:55.8621414Z // end inline asm 2026-02-21T09:50:55.8621467Z // begin inline asm 2026-02-21T09:50:55.8621561Z st.global.v4.b32 [ %rd67 + 0 ], { %r985, %r986, %r987, %r988 }; 2026-02-21T09:50:55.8621642Z // end inline asm 2026-02-21T09:50:55.8621697Z // begin inline asm 2026-02-21T09:50:55.8621801Z st.global.v4.b32 [ %rd68 + 0 ], { %r989, %r990, %r991, %r992 }; 2026-02-21T09:50:55.8621861Z // end inline asm 2026-02-21T09:50:55.8621916Z // begin inline asm 2026-02-21T09:50:55.8622012Z st.global.v4.b32 [ %rd69 + 0 ], { %r993, %r994, %r995, %r996 }; 2026-02-21T09:50:55.8622067Z // end inline asm 2026-02-21T09:50:55.8622151Z // begin inline asm 2026-02-21T09:50:55.8622256Z st.global.v4.b32 [ %rd70 + 0 ], { %r997, %r998, %r999, %r1000 }; 2026-02-21T09:50:55.8622310Z // end inline asm 2026-02-21T09:50:55.8622370Z // begin inline asm 2026-02-21T09:50:55.8622476Z st.global.v4.b32 [ %rd71 + 0 ], { %r1001, %r1002, %r1003, %r1004 }; 2026-02-21T09:50:55.8622530Z // end inline asm 2026-02-21T09:50:55.8622585Z // begin inline asm 2026-02-21T09:50:55.8622696Z st.global.v4.b32 [ %rd72 + 0 ], { %r1005, %r1006, %r1007, %r1008 }; 2026-02-21T09:50:55.8622750Z // end inline asm 2026-02-21T09:50:55.8622807Z // begin inline asm 2026-02-21T09:50:55.8622915Z st.global.v4.b32 [ %rd73 + 0 ], { %r1009, %r1010, %r1011, %r1012 }; 2026-02-21T09:50:55.8622970Z // end inline asm 2026-02-21T09:50:55.8623025Z // begin inline asm 2026-02-21T09:50:55.8623160Z st.global.v4.b32 [ %rd74 + 0 ], { %r1013, %r1014, %r1015, %r1016 }; 2026-02-21T09:50:55.8623217Z // end inline asm 2026-02-21T09:50:55.8623300Z // begin inline asm 2026-02-21T09:50:55.8623400Z st.global.v4.b32 [ %rd75 + 0 ], { %r1017, %r1018, %r1019, %r1020 }; 2026-02-21T09:50:55.8623463Z // end inline asm 2026-02-21T09:50:55.8623518Z // begin inline asm 2026-02-21T09:50:55.8623615Z st.global.v4.b32 [ %rd76 + 0 ], { %r1021, %r1022, %r1023, %r1024 }; 2026-02-21T09:50:55.8623677Z // end inline asm 2026-02-21T09:50:55.8623731Z // begin 
inline asm 2026-02-21T09:50:55.8623828Z st.global.v4.b32 [ %rd77 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T09:50:55.8623883Z // end inline asm 2026-02-21T09:50:55.8623947Z // begin inline asm 2026-02-21T09:50:55.8624046Z st.global.v4.b32 [ %rd78 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T09:50:55.8624099Z // end inline asm 2026-02-21T09:50:55.8624160Z // begin inline asm 2026-02-21T09:50:55.8624258Z st.global.v4.b32 [ %rd79 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T09:50:55.8624312Z // end inline asm 2026-02-21T09:50:55.8624367Z // begin inline asm 2026-02-21T09:50:55.8624473Z st.global.v4.b32 [ %rd80 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T09:50:55.8624526Z // end inline asm 2026-02-21T09:50:55.8624581Z // begin inline asm 2026-02-21T09:50:55.8624709Z st.global.v4.b32 [ %rd81 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T09:50:55.8624764Z // end inline asm 2026-02-21T09:50:55.8624818Z // begin inline asm 2026-02-21T09:50:55.8624923Z st.global.v4.b32 [ %rd82 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T09:50:55.8624976Z // end inline asm 2026-02-21T09:50:55.8625031Z // begin inline asm 2026-02-21T09:50:55.8625129Z st.global.v4.b32 [ %rd83 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T09:50:55.8625190Z // end inline asm 2026-02-21T09:50:55.8625245Z // begin inline asm 2026-02-21T09:50:55.8625342Z st.global.v4.b32 [ %rd84 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T09:50:55.8625401Z // end inline asm 2026-02-21T09:50:55.8625457Z // begin inline asm 2026-02-21T09:50:55.8625556Z st.global.v4.b32 [ %rd85 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T09:50:55.8625610Z // end inline asm 2026-02-21T09:50:55.8625672Z // begin inline asm 2026-02-21T09:50:55.8625769Z st.global.v4.b32 [ %rd86 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T09:50:55.8625823Z // end inline asm 2026-02-21T09:50:55.8625885Z // begin inline asm 2026-02-21T09:50:55.8625981Z st.global.v4.b32 [ %rd87 + 0 
], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T09:50:55.8626034Z // end inline asm 2026-02-21T09:50:55.8626097Z // begin inline asm 2026-02-21T09:50:55.8626195Z st.global.v4.b32 [ %rd88 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 2026-02-21T09:50:55.8626278Z // end inline asm 2026-02-21T09:50:55.8626332Z // begin inline asm 2026-02-21T09:50:55.8626438Z st.global.v4.b32 [ %rd89 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T09:50:55.8626493Z // end inline asm 2026-02-21T09:50:55.8626547Z // begin inline asm 2026-02-21T09:50:55.8626653Z st.global.v4.b32 [ %rd90 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T09:50:55.8626735Z // end inline asm 2026-02-21T09:50:55.8626790Z // begin inline asm 2026-02-21T09:50:55.8626886Z st.global.v4.b32 [ %rd91 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T09:50:55.8626948Z // end inline asm 2026-02-21T09:50:55.8627013Z // begin inline asm 2026-02-21T09:50:55.8627105Z st.global.v4.b32 [ %rd92 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T09:50:55.8627162Z // end inline asm 2026-02-21T09:50:55.8627214Z // begin inline asm 2026-02-21T09:50:55.8627306Z st.global.v4.b32 [ %rd93 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T09:50:55.8627358Z // end inline asm 2026-02-21T09:50:55.8627416Z // begin inline asm 2026-02-21T09:50:55.8627508Z st.global.v4.b32 [ %rd94 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:50:55.8627558Z // end inline asm 2026-02-21T09:50:55.8627617Z // begin inline asm 2026-02-21T09:50:55.8627746Z st.global.v4.b32 [ %rd95 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:50:55.8627798Z // end inline asm 2026-02-21T09:50:55.8627884Z // begin inline asm 2026-02-21T09:50:55.8627978Z st.global.v4.b32 [ %rd96 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:50:55.8628028Z // end inline asm 2026-02-21T09:50:55.8628079Z mov.b32 %r1606, 1; 2026-02-21T09:50:55.8628186Z $L__BB0_22: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:55.8628357Z .loc 1 53 52 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8628417Z xor.b32 %r1610, %r1606, %r1610; 2026-02-21T09:50:55.8628591Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8628652Z add.s32 %r1600, %r1600, -1; 2026-02-21T09:50:55.8628716Z setp.ne.b32 %p110, %r1600, 0; 2026-02-21T09:50:55.8628781Z @%p110 bra $L__BB0_18; 2026-02-21T09:50:55.8628835Z bra.uni $L__BB0_23; 2026-02-21T09:50:55.8628937Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:50:55.8628998Z add.s32 %r541, %r1605, 1; 2026-02-21T09:50:55.8629067Z setp.eq.b32 %p106, %r1605, 63; 2026-02-21T09:50:55.8629130Z selp.b32 %r1605, 0, %r541, %p106; 2026-02-21T09:50:55.8629189Z setp.eq.b32 %p107, %r1605, 63; 2026-02-21T09:50:55.8629253Z @%p107 bra $L__BB0_21; 2026-02-21T09:50:55.8629347Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:55.8629513Z .loc 1 0 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:97 2026-02-21T09:50:55.8629573Z mov.b32 %r1606, 0; 2026-02-21T09:50:55.8629735Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8629795Z setp.ne.b32 %p108, %r1605, 0; 2026-02-21T09:50:55.8629851Z @%p108 bra $L__BB0_22; 2026-02-21T09:50:55.8629931Z // %bb.20: // %.thread 2026-02-21T09:50:55.8630017Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:50:55.8630076Z add.s32 %r1607, %r1607, 592; 2026-02-21T09:50:55.8630249Z .loc 1 34 35 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:34:35 2026-02-21T09:50:55.8630308Z shr.s32 %r1557, %r1607, 31; 2026-02-21T09:50:55.8630362Z shr.u32 %r1558, %r1557, 28; 2026-02-21T09:50:55.8630427Z add.s32 %r1559, %r1607, %r1558; 2026-02-21T09:50:55.8630485Z shr.s32 %r1560, %r1559, 4; 2026-02-21T09:50:55.8630650Z .loc 1 35 33 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:35:33 2026-02-21T09:50:55.8630706Z shl.b32 %r1561, %r1560, 1; 2026-02-21T09:50:55.8630892Z .loc 1 36 39 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:36:39 2026-02-21T09:50:55.8630949Z sub.s32 %r1562, 96, %r1561; 2026-02-21T09:50:55.8631112Z .loc 1 36 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:36:52 2026-02-21T09:50:55.8631175Z min.s32 %r1563, %r1562, 2; 2026-02-21T09:50:55.8631334Z .loc 1 37 45 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:45 2026-02-21T09:50:55.8631415Z and.b32 %r1564, %r1559, -16; 2026-02-21T09:50:55.8631479Z sub.s32 %r1565, %r1607, %r1564; 2026-02-21T09:50:55.8631640Z .loc 1 38 51 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:38:51 2026-02-21T09:50:55.8631697Z div.s32 %r1566, %r1565, %r1563; 2026-02-21T09:50:55.8631857Z .loc 1 37 64 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:64 2026-02-21T09:50:55.8631924Z mul.lo.s32 %r1567, %r1566, %r1563; 2026-02-21T09:50:55.8631982Z sub.s32 %r1568, %r1565, %r1567; 2026-02-21T09:50:55.8632140Z .loc 1 37 30 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:30 2026-02-21T09:50:55.8632202Z add.s32 %r1569, %r1568, %r1561; 2026-02-21T09:50:55.8632382Z .loc 1 39 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:39:27 2026-02-21T09:50:55.8632458Z shl.b32 %r1609, %r1569, 7; 2026-02-21T09:50:55.8632623Z .loc 1 41 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:41:27 2026-02-21T09:50:55.8632678Z shl.b32 %r1608, %r1566, 8; 2026-02-21T09:50:55.8632835Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8632897Z bra.uni $L__BB0_22; 2026-02-21T09:50:55.8632987Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:50:55.8633146Z .loc 1 0 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:97 2026-02-21T09:50:55.8633204Z mov.b32 %r105, global_smem; 2026-02-21T09:50:55.8633267Z add.s32 %r106, %r105, %r3; 2026-02-21T09:50:55.8633320Z bra.uni $L__BB0_2; 2026-02-21T09:50:55.8633415Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8633582Z .loc 1 
28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8633662Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8633716Z barrier.sync 1; 2026-02-21T09:50:55.8633776Z barrier.sync 1; 2026-02-21T09:50:55.8633850Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8633929Z $L__BB0_2: // %.preheader 2026-02-21T09:50:55.8634015Z // =>This Loop Header: Depth=1 2026-02-21T09:50:55.8634104Z // Child Loop BB0_11 Depth 2 2026-02-21T09:50:55.8634186Z // Child Loop BB0_7 Depth 2 2026-02-21T09:50:55.8634341Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19 2026-02-21T09:50:55.8634422Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:50:55.8634475Z barrier.sync 1; 2026-02-21T09:50:55.8634537Z ld.shared.b8 %r104, [%r106+106580]; 2026-02-21T09:50:55.8634604Z setp.gt.u32 %p4, %r104, 3; 2026-02-21T09:50:55.8634660Z @%p4 bra $L__BB0_4; 2026-02-21T09:50:55.8634768Z // %bb.3: // %.preheader 2026-02-21T09:50:55.8634854Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8634922Z $L_brx_0: .branchtargets 2026-02-21T09:50:55.8634975Z $L__BB0_5, 2026-02-21T09:50:55.8635025Z $L__BB0_9, 2026-02-21T09:50:55.8635083Z $L__BB0_15, 2026-02-21T09:50:55.8635132Z $L__BB0_24; 2026-02-21T09:50:55.8635192Z brx.idx %r104, $L_brx_0; 2026-02-21T09:50:55.8635282Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8635480Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8635553Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8635628Z ld.shared.b32 %r155, [global_smem+98304]; 2026-02-21T09:50:55.8635710Z ld.shared.b32 %r1584, [global_smem+98312]; 2026-02-21T09:50:55.8635766Z barrier.sync 1; 2026-02-21T09:50:55.8635828Z setp.lt.s32 %p17, %r1584, 1; 2026-02-21T09:50:55.8635921Z @%p17 bra $L__BB0_8; 2026-02-21T09:50:55.8635993Z // %bb.6: // %.lr.ph7 2026-02-21T09:50:55.8636075Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8636236Z .loc 1 0 97 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:97 2026-02-21T09:50:55.8636298Z mov.b32 %r1588, -1; 2026-02-21T09:50:55.8636356Z mov.pred %p122, 0; 2026-02-21T09:50:55.8636408Z mov.b32 %r1585, 0; 2026-02-21T09:50:55.8636471Z mov.b32 %r1586, %r1585; 2026-02-21T09:50:55.8636526Z mov.b32 %r1587, %r1585; 2026-02-21T09:50:55.8636617Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:55.8636710Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:55.8636894Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8636976Z add.s32 %r165, %r1588, 1; 2026-02-21T09:50:55.8637039Z setp.eq.b32 %p30, %r1588, 63; 2026-02-21T09:50:55.8637108Z selp.b32 %r1588, 0, %r165, %p30; 2026-02-21T09:50:55.8637163Z shl.b32 %r166, %r1587, 3; 2026-02-21T09:50:55.8637220Z add.s32 %r168, %r105, %r166; 2026-02-21T09:50:55.8637282Z add.s32 %r169, %r168, 106496; 2026-02-21T09:50:55.8637337Z add.s32 %r153, %r168, 106528; 2026-02-21T09:50:55.8637498Z .loc 1 51 31 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:51:31 2026-02-21T09:50:55.8637560Z shl.b32 %r170, %r1587, 14; 2026-02-21T09:50:55.8637617Z add.s32 %r171, %r105, %r170; 2026-02-21T09:50:55.8637777Z .loc 1 52 44 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:52:44 2026-02-21T09:50:55.8637833Z shl.b32 %r172, %r1587, 13; 2026-02-21T09:50:55.8637896Z add.s32 %r173, %r105, %r172; 2026-02-21T09:50:55.8637950Z add.s32 %r174, %r173, 65536; 2026-02-21T09:50:55.8638112Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8638180Z bar.warp.sync -1; 2026-02-21T09:50:55.8638235Z // begin inline asm 2026-02-21T09:50:55.8638284Z 2026-02-21T09:50:55.8638332Z { 2026-02-21T09:50:55.8638397Z .reg .pred complete; 2026-02-21T09:50:55.8638449Z waitLoop: 2026-02-21T09:50:55.8638573Z mbarrier.try_wait.parity.shared.b64 complete, [%r153], %r1586; 2026-02-21T09:50:55.8638645Z @!complete bra.uni waitLoop; 2026-02-21T09:50:55.8638693Z } 
2026-02-21T09:50:55.8638698Z 2026-02-21T09:50:55.8638752Z // end inline asm 2026-02-21T09:50:55.8638921Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8638982Z setp.eq.b32 %p29, %r1588, 63; 2026-02-21T09:50:55.8639145Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8639205Z elect.sync %r175|%p20, -1; 2026-02-21T09:50:55.8639271Z bfe.u32 %r176, %r171, 4, 14; 2026-02-21T09:50:55.8639331Z cvt.u64.u32 %rd22, %r176; 2026-02-21T09:50:55.8639401Z or.b64 %rd12, %rd22, -9223371899348713472; 2026-02-21T09:50:55.8639468Z bfe.u32 %r177, %r174, 4, 14; 2026-02-21T09:50:55.8639525Z cvt.u64.u32 %rd23, %r177; 2026-02-21T09:50:55.8639592Z or.b64 %rd13, %rd23, -9223371899382267904; 2026-02-21T09:50:55.8639653Z mov.b32 %r156, 136314896; 2026-02-21T09:50:55.8639707Z // begin inline asm 2026-02-21T09:50:55.8639851Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 0 ], %rd12, %rd13, %r156, %p122; 2026-02-21T09:50:55.8639904Z // end inline asm 2026-02-21T09:50:55.8639991Z add.s32 %r178, %r171, 32; 2026-02-21T09:50:55.8640047Z bfe.u32 %r179, %r178, 4, 14; 2026-02-21T09:50:55.8640100Z cvt.u64.u32 %rd24, %r179; 2026-02-21T09:50:55.8640173Z or.b64 %rd14, %rd24, -9223371899348713472; 2026-02-21T09:50:55.8640228Z add.s32 %r180, %r173, 65568; 2026-02-21T09:50:55.8640283Z bfe.u32 %r181, %r180, 4, 14; 2026-02-21T09:50:55.8640340Z cvt.u64.u32 %rd25, %r181; 2026-02-21T09:50:55.8640415Z or.b64 %rd15, %rd25, -9223371899382267904; 2026-02-21T09:50:55.8640495Z mov.pred %p21, -1; 2026-02-21T09:50:55.8640549Z // begin inline asm 2026-02-21T09:50:55.8640691Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 0 ], %rd14, %rd15, %r156, %p21; 2026-02-21T09:50:55.8640743Z // end inline asm 2026-02-21T09:50:55.8640798Z add.s32 %r182, %r171, 8192; 2026-02-21T09:50:55.8640859Z bfe.u32 %r183, %r182, 4, 14; 2026-02-21T09:50:55.8640914Z cvt.u64.u32 %rd26, %r183; 2026-02-21T09:50:55.8640979Z or.b64 %rd16, %rd26, 
-9223371899348713472; 2026-02-21T09:50:55.8641034Z // begin inline asm 2026-02-21T09:50:55.8641176Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 128 ], %rd16, %rd13, %r156, %p122; 2026-02-21T09:50:55.8641228Z // end inline asm 2026-02-21T09:50:55.8641282Z add.s32 %r184, %r171, 8224; 2026-02-21T09:50:55.8641342Z bfe.u32 %r185, %r184, 4, 14; 2026-02-21T09:50:55.8641397Z cvt.u64.u32 %rd27, %r185; 2026-02-21T09:50:55.8641480Z or.b64 %rd18, %rd27, -9223371899348713472; 2026-02-21T09:50:55.8641556Z // begin inline asm 2026-02-21T09:50:55.8641695Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r155 + 128 ], %rd18, %rd15, %r156, %p21; 2026-02-21T09:50:55.8641746Z // end inline asm 2026-02-21T09:50:55.8641802Z cvt.u64.u32 %rd20, %r169; 2026-02-21T09:50:55.8641862Z // begin inline asm 2026-02-21T09:50:55.8641982Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd20]; 2026-02-21T09:50:55.8642034Z // end inline asm 2026-02-21T09:50:55.8642104Z and.pred %p28, %p29, %p20; 2026-02-21T09:50:55.8642160Z add.s32 %r186, %r105, 106560; 2026-02-21T09:50:55.8642215Z cvt.u64.u32 %rd21, %r186; 2026-02-21T09:50:55.8642268Z // begin inline asm 2026-02-21T09:50:55.8642392Z @%p28 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd21]; 2026-02-21T09:50:55.8642443Z // end inline asm 2026-02-21T09:50:55.8642608Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8642676Z setp.ne.b32 %p122, %r1588, 63; 2026-02-21T09:50:55.8642842Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8642901Z selp.b32 %r187, 1, 0, %p29; 2026-02-21T09:50:55.8642964Z xor.b32 %r1585, %r1585, %r187; 2026-02-21T09:50:55.8643020Z add.s32 %r163, %r105, 106576; 2026-02-21T09:50:55.8643072Z // begin inline asm 2026-02-21T09:50:55.8643119Z 2026-02-21T09:50:55.8643175Z { 2026-02-21T09:50:55.8643235Z @!%p29 bra.uni skipWait; 2026-02-21T09:50:55.8643292Z .reg .pred complete; 2026-02-21T09:50:55.8643349Z waitLoop: 
2026-02-21T09:50:55.8643466Z mbarrier.try_wait.parity.shared.b64 complete, [%r163], %r1585; 2026-02-21T09:50:55.8643527Z @!complete bra.uni waitLoop; 2026-02-21T09:50:55.8643578Z skipWait: 2026-02-21T09:50:55.8643634Z } 2026-02-21T09:50:55.8643638Z 2026-02-21T09:50:55.8643690Z // end inline asm 2026-02-21T09:50:55.8643850Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8643914Z add.s32 %r188, %r1587, 1; 2026-02-21T09:50:55.8643971Z setp.eq.b32 %p31, %r188, 4; 2026-02-21T09:50:55.8644037Z selp.b32 %r1587, 0, %r188, %p31; 2026-02-21T09:50:55.8644100Z selp.b32 %r189, 1, 0, %p31; 2026-02-21T09:50:55.8644156Z xor.b32 %r1586, %r1586, %r189; 2026-02-21T09:50:55.8644318Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8644373Z add.s32 %r1584, %r1584, -1; 2026-02-21T09:50:55.8644438Z setp.ne.b32 %p32, %r1584, 0; 2026-02-21T09:50:55.8644492Z @%p32 bra $L__BB0_7; 2026-02-21T09:50:55.8644597Z $L__BB0_8: // %._crit_edge8 2026-02-21T09:50:55.8644714Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8644771Z barrier.sync 1; 2026-02-21T09:50:55.8644847Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8644901Z bra.uni $L__BB0_2; 2026-02-21T09:50:55.8645004Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8645205Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8645282Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8645362Z ld.shared.b32 %r1589, [global_smem+98312]; 2026-02-21T09:50:55.8645417Z barrier.sync 1; 2026-02-21T09:50:55.8645579Z .loc 1 21 67 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:21:67 2026-02-21T09:50:55.8645643Z mov.u32 %r17, %ctaid.x; 2026-02-21T09:50:55.8645700Z mov.u32 %r107, %ctaid.y; 2026-02-21T09:50:55.8645758Z mov.u32 %r108, %ctaid.z; 2026-02-21T09:50:55.8645814Z mov.u32 %r109, %nctaid.x; 2026-02-21T09:50:55.8645876Z mov.u32 %r110, %nctaid.y; 
2026-02-21T09:50:55.8645940Z mad.lo.s32 %r111, %r108, %r110, %r107; 2026-02-21T09:50:55.8646004Z mad.lo.s32 %r112, %r111, %r109, %r17; 2026-02-21T09:50:55.8646065Z shl.b32 %r113, %r112, 8; 2026-02-21T09:50:55.8646273Z .loc 1 22 68 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:22:68 2026-02-21T09:50:55.8646334Z cvt.s64.s32 %rd7, %r113; 2026-02-21T09:50:55.8646400Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:50:55.8646459Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:50:55.8646521Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:50:55.8646683Z .loc 1 21 67 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:21:67 2026-02-21T09:50:55.8646750Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:50:55.8646913Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8646975Z setp.lt.s32 %p5, %r1589, 1; 2026-02-21T09:50:55.8647038Z @%p5 bra $L__BB0_14; 2026-02-21T09:50:55.8647112Z // %bb.10: // %.lr.ph 2026-02-21T09:50:55.8647193Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8647257Z add.s32 %r1599, %r17, -592; 2026-02-21T09:50:55.8647312Z add.s32 %r19, %r1, -128; 2026-02-21T09:50:55.8647369Z mov.b32 %r1596, -1; 2026-02-21T09:50:55.8647422Z mov.b32 %r1590, 0; 2026-02-21T09:50:55.8647484Z mov.b32 %r1591, %r1590; 2026-02-21T09:50:55.8647537Z mov.b32 %r1598, %r1590; 2026-02-21T09:50:55.8647590Z mov.b32 %r1597, %r1590; 2026-02-21T09:50:55.8647650Z mov.b32 %r1594, %r1590; 2026-02-21T09:50:55.8647703Z bra.uni $L__BB0_11; 2026-02-21T09:50:55.8647797Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:50:55.8647962Z .loc 1 0 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0:97 2026-02-21T09:50:55.8648031Z selp.b32 %r134, 0, %r1594, %p8; 2026-02-21T09:50:55.8648090Z setp.lt.u32 %p12, %r19, 32; 2026-02-21T09:50:55.8648147Z setp.eq.b32 %p9, %r19, 0; 2026-02-21T09:50:55.8648318Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8648375Z shl.b32 %r141, %r1591, 3; 
2026-02-21T09:50:55.8648431Z add.s32 %r143, %r105, %r141; 2026-02-21T09:50:55.8648495Z add.s32 %r130, %r143, 106496; 2026-02-21T09:50:55.8648652Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8648715Z // begin inline asm 2026-02-21T09:50:55.8648762Z 2026-02-21T09:50:55.8648817Z { 2026-02-21T09:50:55.8648874Z .reg .pred complete; 2026-02-21T09:50:55.8648926Z waitLoop: 2026-02-21T09:50:55.8649050Z mbarrier.try_wait.parity.shared.b64 complete, [%r130], %r1590; 2026-02-21T09:50:55.8649110Z @!complete bra.uni waitLoop; 2026-02-21T09:50:55.8649157Z } 2026-02-21T09:50:55.8649188Z 2026-02-21T09:50:55.8649248Z // end inline asm 2026-02-21T09:50:55.8649406Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8649463Z add.s32 %r136, %r143, 106528; 2026-02-21T09:50:55.8649614Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8649673Z bar.sync 3, 64; 2026-02-21T09:50:55.8649750Z // begin inline asm 2026-02-21T09:50:55.8649859Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r136], 24576; 2026-02-21T09:50:55.8649920Z // end inline asm 2026-02-21T09:50:55.8650080Z .loc 1 51 31 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:51:31 2026-02-21T09:50:55.8650137Z shl.b32 %r144, %r1591, 14; 2026-02-21T09:50:55.8650199Z add.s32 %r133, %r105, %r144; 2026-02-21T09:50:55.8650350Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8650404Z bar.sync 3, 64; 2026-02-21T09:50:55.8650466Z elect.sync %r145|%p13, -1; 2026-02-21T09:50:55.8650533Z and.pred %p10, %p12, %p13; 2026-02-21T09:50:55.8650586Z // begin inline asm 2026-02-21T09:50:55.8650840Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r133], [%rd10, {%r134, %r1598}], [%r136]; 2026-02-21T09:50:55.8650918Z // end inline asm 2026-02-21T09:50:55.8651101Z .loc 1 52 44 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:52:44 2026-02-21T09:50:55.8651159Z shl.b32 %r146, %r1591, 13; 2026-02-21T09:50:55.8651221Z add.s32 %r147, %r105, %r146; 2026-02-21T09:50:55.8651275Z add.s32 %r137, %r147, 65536; 2026-02-21T09:50:55.8651424Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8651478Z bar.sync 3, 64; 2026-02-21T09:50:55.8651544Z elect.sync %r148|%p14, -1; 2026-02-21T09:50:55.8651602Z and.pred %p11, %p12, %p14; 2026-02-21T09:50:55.8651654Z // begin inline asm 2026-02-21T09:50:55.8651901Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r137], [%rd11, {%r134, %r1597}], [%r136]; 2026-02-21T09:50:55.8651955Z // end inline asm 2026-02-21T09:50:55.8652113Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8652174Z add.s32 %r1594, %r134, 32; 2026-02-21T09:50:55.8652324Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8652378Z add.s32 %r149, %r1591, 1; 2026-02-21T09:50:55.8652437Z setp.eq.b32 %p15, %r149, 4; 2026-02-21T09:50:55.8652502Z selp.b32 %r1591, 0, %r149, %p15; 2026-02-21T09:50:55.8652556Z selp.b32 %r150, 1, 0, %p15; 2026-02-21T09:50:55.8652613Z xor.b32 %r1590, %r1590, %r150; 2026-02-21T09:50:55.8652774Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8652829Z add.s32 %r1589, %r1589, -1; 2026-02-21T09:50:55.8652889Z setp.ne.b32 %p16, %r1589, 0; 2026-02-21T09:50:55.8652951Z @%p16 bra $L__BB0_11; 2026-02-21T09:50:55.8653004Z bra.uni $L__BB0_14; 2026-02-21T09:50:55.8653099Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:50:55.8653188Z // => This Inner Loop Header: Depth=2 2026-02-21T09:50:55.8653251Z add.s32 %r116, %r1596, 1; 2026-02-21T09:50:55.8653310Z setp.eq.b32 %p6, %r1596, 63; 2026-02-21T09:50:55.8653371Z selp.b32 %r1596, 0, %r116, %p6; 2026-02-21T09:50:55.8653435Z setp.ne.b32 %p7, %r1596, 0; 
2026-02-21T09:50:55.8653491Z setp.eq.b32 %p8, %r1596, 0; 2026-02-21T09:50:55.8653544Z @%p7 bra $L__BB0_13; 2026-02-21T09:50:55.8653644Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:50:55.8653700Z add.s32 %r1599, %r1599, 592; 2026-02-21T09:50:55.8653855Z .loc 1 34 35 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:34:35 2026-02-21T09:50:55.8653932Z shr.s32 %r117, %r1599, 31; 2026-02-21T09:50:55.8653995Z shr.u32 %r118, %r117, 28; 2026-02-21T09:50:55.8654053Z add.s32 %r119, %r1599, %r118; 2026-02-21T09:50:55.8654109Z shr.s32 %r120, %r119, 4; 2026-02-21T09:50:55.8654275Z .loc 1 35 33 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:35:33 2026-02-21T09:50:55.8654331Z shl.b32 %r121, %r120, 1; 2026-02-21T09:50:55.8654515Z .loc 1 36 39 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:36:39 2026-02-21T09:50:55.8654579Z sub.s32 %r122, 96, %r121; 2026-02-21T09:50:55.8654769Z .loc 1 36 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:36:52 2026-02-21T09:50:55.8654824Z min.s32 %r123, %r122, 2; 2026-02-21T09:50:55.8654982Z .loc 1 37 45 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:45 2026-02-21T09:50:55.8655046Z and.b32 %r124, %r119, -16; 2026-02-21T09:50:55.8655104Z sub.s32 %r125, %r1599, %r124; 2026-02-21T09:50:55.8655260Z .loc 1 38 51 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:38:51 2026-02-21T09:50:55.8655326Z div.s32 %r126, %r125, %r123; 2026-02-21T09:50:55.8655479Z .loc 1 37 64 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:64 2026-02-21T09:50:55.8655567Z mul.lo.s32 %r127, %r126, %r123; 2026-02-21T09:50:55.8655661Z sub.s32 %r128, %r125, %r127; 2026-02-21T09:50:55.8655825Z .loc 1 37 30 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:37:30 2026-02-21T09:50:55.8655882Z add.s32 %r129, %r128, %r121; 2026-02-21T09:50:55.8656042Z .loc 1 39 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:39:27 2026-02-21T09:50:55.8656105Z shl.b32 %r1597, %r129, 7; 
2026-02-21T09:50:55.8656264Z .loc 1 41 27 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:41:27 2026-02-21T09:50:55.8656319Z shl.b32 %r1598, %r126, 8; 2026-02-21T09:50:55.8656382Z bra.uni $L__BB0_13; 2026-02-21T09:50:55.8656460Z $L__BB0_14: // %._crit_edge 2026-02-21T09:50:55.8656545Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8656711Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8656766Z barrier.sync 1; 2026-02-21T09:50:55.8656844Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:50:55.8656903Z bra.uni $L__BB0_2; 2026-02-21T09:50:55.8656995Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:50:55.8657148Z .loc 1 19 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:19 2026-02-21T09:50:55.8657202Z barrier.sync 1; 2026-02-21T09:50:55.8657265Z barrier.sync 1; 2026-02-21T09:50:55.8657319Z bra.uni $L__BB0_2; 2026-02-21T09:50:55.8657401Z $L__BB0_23: // %._crit_edge11 2026-02-21T09:50:55.8657569Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8657622Z barrier.sync 1; 2026-02-21T09:50:55.8657696Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:50:55.8657862Z .loc 1 53 52 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:53:52 2026-02-21T09:50:55.8657914Z bar.sync 0, 128; 2026-02-21T09:50:55.8657969Z // begin inline asm 2026-02-21T09:50:55.8658018Z 2026-02-21T09:50:55.8658073Z { 2026-02-21T09:50:55.8658129Z .reg .pred complete; 2026-02-21T09:50:55.8658181Z waitLoop: 2026-02-21T09:50:55.8658304Z mbarrier.try_wait.parity.shared.b64 complete, [%r1570], %r1610; 2026-02-21T09:50:55.8658364Z @!complete bra.uni waitLoop; 2026-02-21T09:50:55.8658413Z } 2026-02-21T09:50:55.8658416Z 2026-02-21T09:50:55.8658468Z // end inline asm 2026-02-21T09:50:55.8658636Z .loc 1 28 97 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:97 2026-02-21T09:50:55.8658719Z bar.sync 0, 128; 2026-02-21T09:50:55.8658774Z // begin inline asm 
2026-02-21T09:50:55.8658866Z @%p111 mbarrier.inval.shared::cta.b64 [%r1570]; 2026-02-21T09:50:55.8658918Z // end inline asm 2026-02-21T09:50:55.8658971Z // begin inline asm 2026-02-21T09:50:55.8659059Z @%p111 mbarrier.inval.shared::cta.b64 [%r491]; 2026-02-21T09:50:55.8659112Z // end inline asm 2026-02-21T09:50:55.8659167Z // begin inline asm 2026-02-21T09:50:55.8659274Z @%p111 mbarrier.inval.shared::cta.b64 [%r483]; 2026-02-21T09:50:55.8659333Z // end inline asm 2026-02-21T09:50:55.8659386Z bar.sync 0, 128; 2026-02-21T09:50:55.8659439Z // begin inline asm 2026-02-21T09:50:55.8659521Z @%p111 mbarrier.inval.shared::cta.b64 [%r484]; 2026-02-21T09:50:55.8659572Z // end inline asm 2026-02-21T09:50:55.8659625Z bar.sync 0, 128; 2026-02-21T09:50:55.8659678Z // begin inline asm 2026-02-21T09:50:55.8659761Z @%p111 mbarrier.inval.shared::cta.b64 [%r485]; 2026-02-21T09:50:55.8659812Z // end inline asm 2026-02-21T09:50:55.8659864Z bar.sync 0, 128; 2026-02-21T09:50:55.8659925Z // begin inline asm 2026-02-21T09:50:55.8659999Z @%p111 mbarrier.inval.shared::cta.b64 [%r486]; 2026-02-21T09:50:55.8660049Z // end inline asm 2026-02-21T09:50:55.8660102Z // begin inline asm 2026-02-21T09:50:55.8660183Z @%p111 mbarrier.inval.shared::cta.b64 [%r479]; 2026-02-21T09:50:55.8660258Z // end inline asm 2026-02-21T09:50:55.8660338Z bar.sync 0, 128; 2026-02-21T09:50:55.8660401Z // begin inline asm 2026-02-21T09:50:55.8660476Z @%p111 mbarrier.inval.shared::cta.b64 [%r480]; 2026-02-21T09:50:55.8660525Z // end inline asm 2026-02-21T09:50:55.8660581Z bar.sync 0, 128; 2026-02-21T09:50:55.8660634Z // begin inline asm 2026-02-21T09:50:55.8660709Z @%p111 mbarrier.inval.shared::cta.b64 [%r481]; 2026-02-21T09:50:55.8660759Z // end inline asm 2026-02-21T09:50:55.8660819Z bar.sync 0, 128; 2026-02-21T09:50:55.8660870Z // begin inline asm 2026-02-21T09:50:55.8660943Z @%p111 mbarrier.inval.shared::cta.b64 [%r482]; 2026-02-21T09:50:55.8661003Z // end inline asm 2026-02-21T09:50:55.8661160Z .loc 1 28 4 // 
c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:28:4 2026-02-21T09:50:55.8661212Z bar.sync 0, 128; 2026-02-21T09:50:55.8661265Z // begin inline asm 2026-02-21T09:50:55.8661386Z @%p33 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1582, 256; 2026-02-21T09:50:55.8661438Z // end inline asm 2026-02-21T09:50:55.8661515Z st.shared.b32 [global_smem+106584], 50529027; 2026-02-21T09:50:55.8661577Z barrier.sync 1; 2026-02-21T09:50:55.8661654Z $L__BB0_24: // %common.ret 2026-02-21T09:50:55.8661806Z .loc 1 0 0 // c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py:0 2026-02-21T09:50:55.8661863Z ret; 2026-02-21T09:50:55.8661917Z $L__tmp1: 2026-02-21T09:50:55.8661970Z $L__func_end0: 2026-02-21T09:50:55.8662048Z // -- End function 2026-02-21T09:50:55.8662101Z } 2026-02-21T09:50:55.8662298Z .file 1 "/tmp/torchinductor_root/3d/c3d3gh6ptaoq5loor74lhoktlifuft7py6rvlaxusy6ro2vnvl7y.py" 2026-02-21T09:50:55.8662359Z .section .debug_abbrev 2026-02-21T09:50:55.8662415Z { 2026-02-21T09:50:55.8662500Z .b8 1 // Abbreviation Code 2026-02-21T09:50:55.8662584Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:50:55.8662666Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:50:55.8662745Z .b8 37 // DW_AT_producer 2026-02-21T09:50:55.8662816Z .b8 8 // DW_FORM_string 2026-02-21T09:50:55.8662886Z .b8 19 // DW_AT_language 2026-02-21T09:50:55.8662966Z .b8 5 // DW_FORM_data2 2026-02-21T09:50:55.8663039Z .b8 3 // DW_AT_name 2026-02-21T09:50:55.8663110Z .b8 8 // DW_FORM_string 2026-02-21T09:50:55.8663192Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:50:55.8663302Z .b8 6 // DW_FORM_data4 2026-02-21T09:50:55.8663378Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:50:55.8663460Z .b8 8 // DW_FORM_string 2026-02-21T09:50:55.8663533Z .b8 0 // EOM(1) 2026-02-21T09:50:55.8663605Z .b8 0 // EOM(2) 2026-02-21T09:50:55.8663696Z .b8 0 // EOM(3) 2026-02-21T09:50:55.8663754Z } 2026-02-21T09:50:55.8663814Z .section .debug_info 2026-02-21T09:50:55.8663864Z { 2026-02-21T09:50:55.8663952Z .b32 104 // Length of Unit 
2026-02-21T09:50:55.8664037Z .b8 2 // DWARF version number 2026-02-21T09:50:55.8664089Z .b8 0 2026-02-21T09:50:55.8664205Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:50:55.8664300Z .b8 8 // Address Size (in bytes) 2026-02-21T09:50:55.8664401Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:50:55.8664480Z .b8 116 // DW_AT_producer 2026-02-21T09:50:55.8664540Z .b8 114 2026-02-21T09:50:55.8664592Z .b8 105 2026-02-21T09:50:55.8664643Z .b8 116 2026-02-21T09:50:55.8664778Z .b8 111 2026-02-21T09:50:55.8664832Z .b8 110 2026-02-21T09:50:55.8664883Z .b8 0 2026-02-21T09:50:55.8664986Z .b8 2 // DW_AT_language 2026-02-21T09:50:55.8665046Z .b8 0 2026-02-21T09:50:55.8665124Z .b8 99 // DW_AT_name 2026-02-21T09:50:55.8665178Z .b8 51 2026-02-21T09:50:55.8665240Z .b8 100 2026-02-21T09:50:55.8665292Z .b8 51 2026-02-21T09:50:55.8665344Z .b8 103 2026-02-21T09:50:55.8665397Z .b8 104 2026-02-21T09:50:55.8665461Z .b8 54 2026-02-21T09:50:55.8665512Z .b8 112 2026-02-21T09:50:55.8665562Z .b8 116 2026-02-21T09:50:55.8665620Z .b8 97 2026-02-21T09:50:55.8665673Z .b8 111 2026-02-21T09:50:55.8665726Z .b8 113 2026-02-21T09:50:55.8665777Z .b8 53 2026-02-21T09:50:55.8665835Z .b8 108 2026-02-21T09:50:55.8665887Z .b8 111 2026-02-21T09:50:55.8665938Z .b8 111 2026-02-21T09:50:55.8665988Z .b8 114 2026-02-21T09:50:55.8666047Z .b8 55 2026-02-21T09:50:55.8666098Z .b8 52 2026-02-21T09:50:55.8666148Z .b8 108 2026-02-21T09:50:55.8666204Z .b8 104 2026-02-21T09:50:55.8666256Z .b8 111 2026-02-21T09:50:55.8666305Z .b8 107 2026-02-21T09:50:55.8666358Z .b8 116 2026-02-21T09:50:55.8666417Z .b8 108 2026-02-21T09:50:55.8666466Z .b8 105 2026-02-21T09:50:55.8666516Z .b8 102 2026-02-21T09:50:55.8666574Z .b8 117 2026-02-21T09:50:55.8666625Z .b8 102 2026-02-21T09:50:55.8666676Z .b8 116 2026-02-21T09:50:55.8666726Z .b8 55 2026-02-21T09:50:55.8666784Z .b8 112 2026-02-21T09:50:55.8666834Z .b8 121 2026-02-21T09:50:55.8666882Z .b8 54 2026-02-21T09:50:55.8666932Z .b8 114 
2026-02-21T09:50:55.8666990Z .b8 118 2026-02-21T09:50:55.8667040Z .b8 108 2026-02-21T09:50:55.8667089Z .b8 97 2026-02-21T09:50:55.8667145Z .b8 120 2026-02-21T09:50:55.8667195Z .b8 117 2026-02-21T09:50:55.8667245Z .b8 115 2026-02-21T09:50:55.8667294Z .b8 121 2026-02-21T09:50:55.8667350Z .b8 54 2026-02-21T09:50:55.8667400Z .b8 114 2026-02-21T09:50:55.8667448Z .b8 111 2026-02-21T09:50:55.8667503Z .b8 50 2026-02-21T09:50:55.8667552Z .b8 118 2026-02-21T09:50:55.8667601Z .b8 110 2026-02-21T09:50:55.8667650Z .b8 118 2026-02-21T09:50:55.8667709Z .b8 108 2026-02-21T09:50:55.8667759Z .b8 55 2026-02-21T09:50:55.8667812Z .b8 121 2026-02-21T09:50:55.8667866Z .b8 46 2026-02-21T09:50:55.8667916Z .b8 112 2026-02-21T09:50:55.8667966Z .b8 121 2026-02-21T09:50:55.8668015Z .b8 0 2026-02-21T09:50:55.8668113Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:50:55.8668189Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:50:55.8668238Z .b8 116 2026-02-21T09:50:55.8668294Z .b8 109 2026-02-21T09:50:55.8668345Z .b8 112 2026-02-21T09:50:55.8668394Z .b8 47 2026-02-21T09:50:55.8668444Z .b8 116 2026-02-21T09:50:55.8668502Z .b8 111 2026-02-21T09:50:55.8668581Z .b8 114 2026-02-21T09:50:55.8668632Z .b8 99 2026-02-21T09:50:55.8668681Z .b8 104 2026-02-21T09:50:55.8668739Z .b8 105 2026-02-21T09:50:55.8668788Z .b8 110 2026-02-21T09:50:55.8668839Z .b8 100 2026-02-21T09:50:55.8668897Z .b8 117 2026-02-21T09:50:55.8668947Z .b8 99 2026-02-21T09:50:55.8668997Z .b8 116 2026-02-21T09:50:55.8669048Z .b8 111 2026-02-21T09:50:55.8669107Z .b8 114 2026-02-21T09:50:55.8669157Z .b8 95 2026-02-21T09:50:55.8669235Z .b8 114 2026-02-21T09:50:55.8669293Z .b8 111 2026-02-21T09:50:55.8669345Z .b8 111 2026-02-21T09:50:55.8669396Z .b8 116 2026-02-21T09:50:55.8669447Z .b8 47 2026-02-21T09:50:55.8669504Z .b8 51 2026-02-21T09:50:55.8669554Z .b8 100 2026-02-21T09:50:55.8669604Z .b8 0 2026-02-21T09:50:55.8669653Z } 2026-02-21T09:50:55.8669723Z .section .debug_macinfo { } 2026-02-21T09:50:55.8669727Z 2026-02-21T09:50:55.8669804Z 
================================================================ 2026-02-21T09:50:55.8669906Z please share the reproducer above with Triton project. 2026-02-21T09:50:56.5006729Z 2026-02-21T09:50:56.5011527Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 50/50 18.9 configs/s 2026-02-21T09:51:01.5174257Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 198.8 2026-02-21T09:51:01.5175449Z configs/s 2026-02-21T09:51:01.6995639Z [278s] Generation 9 complete: 2026-02-21T09:51:01.6999480Z error=12 2026-02-21T09:51:01.7001358Z ok=40 2026-02-21T09:51:01.7001515Z min=0.1074 2026-02-21T09:51:01.7001652Z mid=0.1373 2026-02-21T09:51:01.7001774Z max=34.3645 2026-02-21T09:51:01.7001923Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:51:01.7002141Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:51:01.7002344Z 'l2_groupings': [4], 2026-02-21T09:51:01.7002512Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T09:51:01.7002709Z 'loop_orders': [[1, 0]], 2026-02-21T09:51:01.7002868Z 'maxnreg': 64, 2026-02-21T09:51:01.7003006Z 'num_sm_multiplier': 16, 2026-02-21T09:51:01.7003190Z 'num_stages': 3, 2026-02-21T09:51:01.7003326Z 'num_warps': 4, 2026-02-21T09:51:01.7003486Z 'pid_type': 'persistent_blocked', 2026-02-21T09:51:01.7003671Z 'range_flattens': [False, None], 2026-02-21T09:51:01.7003853Z 'range_multi_buffers': [False, None], 2026-02-21T09:51:01.7004032Z 'range_num_stages': [0, 0], 2026-02-21T09:51:01.7004204Z 'range_unroll_factors': [0, 0], 2026-02-21T09:51:01.7004380Z 'range_warp_specializes': [None, True]} 2026-02-21T09:51:01.7028779Z [278s] Fitting surrogate: 912 points, 912 targets 2026-02-21T09:51:02.6966831Z [279s] Generation 10 starting: 55 neighbors, 3 active search path(s) 2026-02-21T09:51:08.3586537Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 8.8 configs/s 2026-02-21T09:51:10.9615054Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 56/56 22.0 configs/s 2026-02-21T09:51:16.1892677Z 
Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 191.0 2026-02-21T09:51:16.1896799Z configs/s 2026-02-21T09:51:16.3712949Z [293s] Generation 10 complete: 2026-02-21T09:51:16.3715042Z error=12 2026-02-21T09:51:16.3715210Z ok=46 2026-02-21T09:51:16.3715347Z min=0.1063 2026-02-21T09:51:16.3715477Z mid=0.1362 2026-02-21T09:51:16.3715607Z max=7.3616 2026-02-21T09:51:16.3715781Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:51:16.3716030Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:51:16.3716248Z 'l2_groupings': [4], 2026-02-21T09:51:16.3716429Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T09:51:16.3716623Z 'loop_orders': [[1, 0]], 2026-02-21T09:51:16.3716786Z 'maxnreg': 64, 2026-02-21T09:51:16.3716939Z 'num_sm_multiplier': 8, 2026-02-21T09:51:16.3717094Z 'num_stages': 3, 2026-02-21T09:51:16.3717240Z 'num_warps': 4, 2026-02-21T09:51:16.3717394Z 'pid_type': 'persistent_blocked', 2026-02-21T09:51:16.3717588Z 'range_flattens': [False, None], 2026-02-21T09:51:16.3717770Z 'range_multi_buffers': [False, None], 2026-02-21T09:51:16.3718383Z 'range_num_stages': [0, 0], 2026-02-21T09:51:16.3718555Z 'range_unroll_factors': [0, 0], 2026-02-21T09:51:16.3718748Z 'range_warp_specializes': [None, True]} 2026-02-21T09:51:16.3744152Z [293s] Fitting surrogate: 970 points, 970 targets 2026-02-21T09:51:17.3423055Z [294s] Generation 11 starting: 53 neighbors, 3 active search path(s) 2026-02-21T09:51:49.7790621Z [326s] Timeout after 30s compiling Config(block_sizes=[256, 512, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], num_stages=7, num_warps=1, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[None, None]) 2026-02-21T09:51:49.7809055Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54/54 0.2 configs/s 
2026-02-21T09:51:51.9442188Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 54/54 25.7 configs/s 2026-02-21T09:51:56.0560143Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 242.2 2026-02-21T09:51:56.0560808Z configs/s 2026-02-21T09:51:56.2210760Z [333s] Generation 11 complete: 2026-02-21T09:51:56.2211000Z error=17 2026-02-21T09:51:56.2217371Z timeout=1 2026-02-21T09:51:56.2217614Z ok=38 2026-02-21T09:51:56.2217919Z min=0.1075 2026-02-21T09:51:56.2218171Z mid=0.1341 2026-02-21T09:51:56.2218412Z max=4.9798 2026-02-21T09:51:56.2218674Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:51:56.2219086Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:51:56.2219498Z 'l2_groupings': [4], 2026-02-21T09:51:56.2219821Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T09:51:56.2220217Z 'loop_orders': [[1, 0]], 2026-02-21T09:51:56.2220516Z 'maxnreg': 64, 2026-02-21T09:51:56.2220792Z 'num_sm_multiplier': 8, 2026-02-21T09:51:56.2221090Z 'num_stages': 3, 2026-02-21T09:51:56.2221359Z 'num_warps': 2, 2026-02-21T09:51:56.2221652Z 'pid_type': 'persistent_blocked', 2026-02-21T09:51:56.2222005Z 'range_flattens': [False, None], 2026-02-21T09:51:56.2222355Z 'range_multi_buffers': [False, None], 2026-02-21T09:51:56.2222711Z 'range_num_stages': [0, 0], 2026-02-21T09:51:56.2223049Z 'range_unroll_factors': [0, 0], 2026-02-21T09:51:56.2223406Z 'range_warp_specializes': [False, True]} 2026-02-21T09:51:56.2248847Z [333s] Fitting surrogate: 1026 points, 1026 targets 2026-02-21T09:51:57.1063358Z [334s] Generation 12 starting: 55 neighbors, 3 active search path(s) 2026-02-21T09:52:05.8449557Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 11.5 configs/s 2026-02-21T09:52:09.0609544Z 2026-02-21T09:52:09.0609830Z 2026-02-21T09:52:09.0610405Z ================================================================ 2026-02-21T09:52:09.0610998Z Internal Triton PTX codegen error 2026-02-21T09:52:09.0611390Z `ptxas` stderr: 
2026-02-21T09:52:09.0612416Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:52:09.0613612Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:52:09.0613962Z 2026-02-21T09:52:09.0615118Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp8ivac3d5.ptx -o /tmp/tmp8ivac3d5.ptx.o 2026-02-21T09:52:09.0616662Z 2026-02-21T09:52:09.0616671Z 2026-02-21T09:52:09.0616779Z // 2026-02-21T09:52:09.0617071Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:52:09.0617453Z // 2026-02-21T09:52:09.0617587Z 2026-02-21T09:52:09.0617703Z .version 8.7 2026-02-21T09:52:09.0617965Z .target sm_100a 2026-02-21T09:52:09.0618251Z .address_size 64 2026-02-21T09:52:09.0618421Z 2026-02-21T09:52:09.0618684Z // .globl _helion_matmul // -- Begin function _helion_matmul 2026-02-21T09:52:09.0619273Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:52:09.0619961Z // @_helion_matmul 2026-02-21T09:52:09.0620417Z .visible .entry _helion_matmul( 2026-02-21T09:52:09.0620876Z .param .u64 .ptr .global .align 1 _helion_matmul_param_0, 2026-02-21T09:52:09.0621444Z .param .u64 .ptr .global .align 1 _helion_matmul_param_1, 2026-02-21T09:52:09.0622016Z .param .u64 .ptr .global .align 1 _helion_matmul_param_2, 2026-02-21T09:52:09.0622571Z .param .u64 .ptr .global .align 1 _helion_matmul_param_3, 2026-02-21T09:52:09.0623145Z .param .u64 .ptr .global .align 1 _helion_matmul_param_4 2026-02-21T09:52:09.0623583Z ) 2026-02-21T09:52:09.0623826Z .reqntid 256 2026-02-21T09:52:09.0624084Z .maxnreg 32 2026-02-21T09:52:09.0624338Z { 2026-02-21T09:52:09.0624591Z .reg .pred %p<151>; 2026-02-21T09:52:09.0624985Z .reg .b32 %r<1676>; 2026-02-21T09:52:09.0625293Z .reg .b64 %rd<623>; 2026-02-21T09:52:09.0625897Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19:0 
2026-02-21T09:52:09.0626535Z $L__func_begin0: 2026-02-21T09:52:09.0627072Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19:0 2026-02-21T09:52:09.0627603Z 2026-02-21T09:52:09.0627698Z // %bb.0: 2026-02-21T09:52:09.0627996Z ld.param.b64 %rd6, [_helion_matmul_param_3]; 2026-02-21T09:52:09.0628374Z $L__tmp0: 2026-02-21T09:52:09.0629120Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19 2026-02-21T09:52:09.0629757Z mov.u32 %r1, %tid.x; 2026-02-21T09:52:09.0630043Z shr.u32 %r2, %r1, 5; 2026-02-21T09:52:09.0631150Z [346s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:52:09.0633942Z Config: @helion.kernel(config=helion.Config(block_sizes=[256, 128, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[True, None]), static_shapes=True) 2026-02-21T09:52:09.0636774Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:52:09.0637303Z `ptxas` stderr: 2026-02-21T09:52:09.0638233Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 202 in function _helion_matmul. Try to compile with register target of 34 or higher. 2026-02-21T09:52:09.0639344Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:52:09.0639654Z 2026-02-21T09:52:09.0640526Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_100a /tmp/tmp8ivac3d5.ptx -o /tmp/tmp8ivac3d5.ptx.o 2026-02-21T09:52:09.0641555Z 2026-02-21T09:52:09.0641820Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:52:09.0642358Z shfl.sync.idx.b32 %r3, %r2, 0, 31, -1; 2026-02-21T09:52:09.0642731Z setp.lt.u32 %p3, %r3, 4; 2026-02-21T09:52:09.0643036Z @%p3 bra $L__BB0_16; 2026-02-21T09:52:09.0643308Z bra.uni $L__BB0_1; 2026-02-21T09:52:09.0643575Z $L__BB0_16: 2026-02-21T09:52:09.0644061Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:0 2026-02-21T09:52:09.0644824Z ld.param.b64 %rd4, [_helion_matmul_param_1]; 2026-02-21T09:52:09.0645426Z ld.param.b64 %rd3, [_helion_matmul_param_0]; 2026-02-21T09:52:09.0646062Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19 2026-02-21T09:52:09.0646724Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:52:09.0647112Z setp.lt.u32 %p43, %r1, 32; 2026-02-21T09:52:09.0647446Z mov.b32 %r232, global_smem; 2026-02-21T09:52:09.0647768Z // begin inline asm 2026-02-21T09:52:09.0648269Z @%p43 tcgen05.alloc.cta_group::1.sync.aligned.shared::cta.b32 [%r232], 512; 2026-02-21T09:52:09.0648798Z // end inline asm 2026-02-21T09:52:09.0649212Z bar.sync 0, 128; 2026-02-21T09:52:09.0649514Z ld.shared.b32 %r1642, [global_smem]; 2026-02-21T09:52:09.0649867Z bar.sync 0, 128; 2026-02-21T09:52:09.0650139Z // begin inline asm 2026-02-21T09:52:09.0650558Z @%p43 tcgen05.relinquish_alloc_permit.cta_group::1.sync.aligned; 2026-02-21T09:52:09.0651049Z // end inline asm 2026-02-21T09:52:09.0651594Z .loc 1 21 67 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:21:67 2026-02-21T09:52:09.0652248Z mov.u32 %r549, %ctaid.x; 2026-02-21T09:52:09.0652548Z mov.u32 %r550, %ctaid.y; 2026-02-21T09:52:09.0652854Z mov.u32 %r551, %ctaid.z; 2026-02-21T09:52:09.0653162Z mov.u32 %r552, %nctaid.x; 2026-02-21T09:52:09.0653468Z mov.u32 %r553, %nctaid.y; 2026-02-21T09:52:09.0653798Z mad.lo.s32 %r554, %r551, %r553, %r550; 2026-02-21T09:52:09.0654168Z mad.lo.s32 %r555, %r554, %r552, %r549; 2026-02-21T09:52:09.0654532Z shl.b32 %r556, %r555, 8; 2026-02-21T09:52:09.0654912Z cvt.s64.s32 %rd78, %r556; 2026-02-21T09:52:09.0655249Z 
add.s64 %rd57, %rd6, %rd78; 2026-02-21T09:52:09.0655561Z shl.b32 %r557, %r1, 2; 2026-02-21T09:52:09.0655872Z add.s32 %r233, %r232, %r557; 2026-02-21T09:52:09.0656181Z mov.b32 %r1673, 0; 2026-02-21T09:52:09.0656465Z // begin inline asm 2026-02-21T09:52:09.0656781Z @%p43 st.shared.b32 [ %r233 + 0 ], %r1673; 2026-02-21T09:52:09.0657299Z // end inline asm 2026-02-21T09:52:09.0657600Z bar.warp.sync -1; 2026-02-21T09:52:09.0657988Z setp.eq.b32 %p131, %r1, 0; 2026-02-21T09:52:09.0658331Z cvt.u64.u32 %rd42, %r232; 2026-02-21T09:52:09.0658626Z // begin inline asm 2026-02-21T09:52:09.0659178Z @%p131 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd42 + 0 ], %rd3; 2026-02-21T09:52:09.0659845Z // end inline asm 2026-02-21T09:52:09.0660107Z // begin inline asm 2026-02-21T09:52:09.0660597Z @%p131 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:52:09.0661160Z // end inline asm 2026-02-21T09:52:09.0661428Z mov.b32 %r235, 64; 2026-02-21T09:52:09.0661711Z // begin inline asm 2026-02-21T09:52:09.0662210Z @%p131 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r235; 2026-02-21T09:52:09.0662784Z // end inline asm 2026-02-21T09:52:09.0663046Z mov.b32 %r236, 256; 2026-02-21T09:52:09.0663323Z // begin inline asm 2026-02-21T09:52:09.0663812Z @%p131 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r236; 2026-02-21T09:52:09.0664406Z // end inline asm 2026-02-21T09:52:09.0664658Z mov.b32 %r237, 2048; 2026-02-21T09:52:09.0665032Z // begin inline asm 2026-02-21T09:52:09.0665543Z @%p131 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r237; 2026-02-21T09:52:09.0666149Z // end inline asm 2026-02-21T09:52:09.0666402Z // begin inline asm 2026-02-21T09:52:09.0666935Z @%p131 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r237; 2026-02-21T09:52:09.0667556Z // end inline asm 2026-02-21T09:52:09.0667817Z mov.b64 %rd50, 4096; 
2026-02-21T09:52:09.0668114Z // begin inline asm 2026-02-21T09:52:09.0668671Z @%p131 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd42 + 0 ], 0x0, %rd50; 2026-02-21T09:52:09.0669302Z // end inline asm 2026-02-21T09:52:09.0669559Z mov.b32 %r239, 1; 2026-02-21T09:52:09.0669832Z // begin inline asm 2026-02-21T09:52:09.0670406Z @%p131 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r239; 2026-02-21T09:52:09.0671180Z // end inline asm 2026-02-21T09:52:09.0671443Z // begin inline asm 2026-02-21T09:52:09.0671993Z @%p131 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r239; 2026-02-21T09:52:09.0672647Z // end inline asm 2026-02-21T09:52:09.0672907Z // begin inline asm 2026-02-21T09:52:09.0673410Z @%p131 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x6; 2026-02-21T09:52:09.0673988Z // end inline asm 2026-02-21T09:52:09.0674254Z // begin inline asm 2026-02-21T09:52:09.0674878Z @%p131 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:52:09.0675645Z // end inline asm 2026-02-21T09:52:09.0675921Z // begin inline asm 2026-02-21T09:52:09.0676450Z @%p131 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x3; 2026-02-21T09:52:09.0677049Z // end inline asm 2026-02-21T09:52:09.0677322Z // begin inline asm 2026-02-21T09:52:09.0677840Z @%p131 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:52:09.0678467Z // end inline asm 2026-02-21T09:52:09.0678737Z // begin inline asm 2026-02-21T09:52:09.0679537Z @%p43 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd57 + 0 ], [ %rd42 + 0 ], 0x80; 2026-02-21T09:52:09.0680426Z // end inline asm 2026-02-21T09:52:09.0680705Z // begin inline asm 2026-02-21T09:52:09.0681159Z @%p43 fence.proxy.tensormap::generic.acquire.gpu [ %rd57 + 0 ], 0x80; 2026-02-21T09:52:09.0681735Z @%p43 
cp.async.bulk.commit_group ; 2026-02-21T09:52:09.0682145Z @%p43 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:52:09.0682528Z // end inline asm 2026-02-21T09:52:09.0682799Z bar.sync 0, 128; 2026-02-21T09:52:09.0683360Z .loc 1 22 68 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:22:68 2026-02-21T09:52:09.0684046Z add.s64 %rd75, %rd57, 128; 2026-02-21T09:52:09.0684484Z bar.sync 0, 128; 2026-02-21T09:52:09.0684879Z // begin inline asm 2026-02-21T09:52:09.0685281Z @%p43 st.shared.b32 [ %r233 + 0 ], %r1673; 2026-02-21T09:52:09.0685668Z // end inline asm 2026-02-21T09:52:09.0685951Z bar.warp.sync -1; 2026-02-21T09:52:09.0686235Z // begin inline asm 2026-02-21T09:52:09.0686791Z @%p131 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd42 + 0 ], %rd4; 2026-02-21T09:52:09.0687426Z // end inline asm 2026-02-21T09:52:09.0687694Z // begin inline asm 2026-02-21T09:52:09.0688185Z @%p131 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1; 2026-02-21T09:52:09.0688756Z // end inline asm 2026-02-21T09:52:09.0689020Z // begin inline asm 2026-02-21T09:52:09.0689543Z @%p131 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r235; 2026-02-21T09:52:09.0690147Z // end inline asm 2026-02-21T09:52:09.0690407Z mov.b32 %r244, 128; 2026-02-21T09:52:09.0690695Z // begin inline asm 2026-02-21T09:52:09.0691197Z @%p131 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r244; 2026-02-21T09:52:09.0691808Z // end inline asm 2026-02-21T09:52:09.0692066Z // begin inline asm 2026-02-21T09:52:09.0692607Z @%p131 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r237; 2026-02-21T09:52:09.0693224Z // end inline asm 2026-02-21T09:52:09.0693484Z mov.b32 %r246, 12288; 2026-02-21T09:52:09.0693777Z // begin inline asm 2026-02-21T09:52:09.0694305Z @%p131 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r246; 2026-02-21T09:52:09.0695007Z // end inline asm 
2026-02-21T09:52:09.0695266Z // begin inline asm 2026-02-21T09:52:09.0695841Z @%p131 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd42 + 0 ], 0x0, %rd50; 2026-02-21T09:52:09.0696473Z // end inline asm 2026-02-21T09:52:09.0696743Z // begin inline asm 2026-02-21T09:52:09.0697304Z @%p131 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0, %r239; 2026-02-21T09:52:09.0697957Z // end inline asm 2026-02-21T09:52:09.0698228Z // begin inline asm 2026-02-21T09:52:09.0698934Z @%p131 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x1, %r239; 2026-02-21T09:52:09.0699579Z // end inline asm 2026-02-21T09:52:09.0699839Z // begin inline asm 2026-02-21T09:52:09.0700343Z @%p131 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x6; 2026-02-21T09:52:09.0700920Z // end inline asm 2026-02-21T09:52:09.0701181Z // begin inline asm 2026-02-21T09:52:09.0701724Z @%p131 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:52:09.0702353Z // end inline asm 2026-02-21T09:52:09.0702746Z // begin inline asm 2026-02-21T09:52:09.0703256Z @%p131 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x3; 2026-02-21T09:52:09.0703868Z // end inline asm 2026-02-21T09:52:09.0704137Z // begin inline asm 2026-02-21T09:52:09.0704634Z @%p131 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd42 + 0 ], 0x0; 2026-02-21T09:52:09.0705308Z // end inline asm 2026-02-21T09:52:09.0705585Z // begin inline asm 2026-02-21T09:52:09.0706358Z @%p43 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd75 + 0 ], [ %rd42 + 0 ], 0x80; 2026-02-21T09:52:09.0707219Z // end inline asm 2026-02-21T09:52:09.0707491Z // begin inline asm 2026-02-21T09:52:09.0707935Z @%p43 fence.proxy.tensormap::generic.acquire.gpu [ %rd75 + 0 ], 0x80; 2026-02-21T09:52:09.0708471Z @%p43 cp.async.bulk.commit_group ; 2026-02-21T09:52:09.0708871Z @%p43 
cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:52:09.0709230Z // end inline asm 2026-02-21T09:52:09.0709503Z bar.sync 0, 128; 2026-02-21T09:52:09.0710035Z .loc 1 29 35 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:29:35 2026-02-21T09:52:09.0710692Z mul.lo.s32 %r43, %r549, 6; 2026-02-21T09:52:09.0711398Z .loc 1 30 37 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:30:37 2026-02-21T09:52:09.0712030Z add.s32 %r558, %r43, 6; 2026-02-21T09:52:09.0712686Z .loc 1 30 49 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:30:49 2026-02-21T09:52:09.0713309Z min.s32 %r559, %r558, 768; 2026-02-21T09:52:09.0713859Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.0714463Z sub.s32 %r562, %r559, %r43; 2026-02-21T09:52:09.0714850Z shl.b32 %r1662, %r562, 5; 2026-02-21T09:52:09.0715414Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0716081Z shfl.sync.idx.b32 %r563, %r2, 0, 31, -1; 2026-02-21T09:52:09.0716471Z shl.b32 %r564, %r563, 21; 2026-02-21T09:52:09.0716773Z and.b32 %r565, %r564, 6291456; 2026-02-21T09:52:09.0717109Z add.s32 %r249, %r565, %r1642; 2026-02-21T09:52:09.0717430Z mov.pred %p81, -1; 2026-02-21T09:52:09.0717719Z // begin inline asm 2026-02-21T09:52:09.0718626Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 0], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0719605Z // end inline asm 2026-02-21T09:52:09.0719863Z // begin inline asm 2026-02-21T09:52:09.0720756Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 16], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0721727Z // end inline asm 2026-02-21T09:52:09.0721988Z // begin inline asm 2026-02-21T09:52:09.0722867Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 32], 
{%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0723828Z // end inline asm 2026-02-21T09:52:09.0724097Z // begin inline asm 2026-02-21T09:52:09.0725059Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 48], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0726130Z // end inline asm 2026-02-21T09:52:09.0726387Z // begin inline asm 2026-02-21T09:52:09.0727226Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 64], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0728147Z // end inline asm 2026-02-21T09:52:09.0728401Z // begin inline asm 2026-02-21T09:52:09.0729246Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 80], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0730306Z // end inline asm 2026-02-21T09:52:09.0730560Z // begin inline asm 2026-02-21T09:52:09.0731420Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 96], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0732358Z // end inline asm 2026-02-21T09:52:09.0732623Z // begin inline asm 2026-02-21T09:52:09.0733488Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 112], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0734424Z // end inline asm 2026-02-21T09:52:09.0734789Z // begin inline asm 2026-02-21T09:52:09.0735675Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 128], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, 
%r1673, %r1673}; 2026-02-21T09:52:09.0736662Z // end inline asm 2026-02-21T09:52:09.0736937Z // begin inline asm 2026-02-21T09:52:09.0737829Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 144], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0738918Z // end inline asm 2026-02-21T09:52:09.0739192Z // begin inline asm 2026-02-21T09:52:09.0740179Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 160], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0741158Z // end inline asm 2026-02-21T09:52:09.0741432Z // begin inline asm 2026-02-21T09:52:09.0742311Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 176], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0743273Z // end inline asm 2026-02-21T09:52:09.0743550Z // begin inline asm 2026-02-21T09:52:09.0744430Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 192], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0745502Z // end inline asm 2026-02-21T09:52:09.0745764Z // begin inline asm 2026-02-21T09:52:09.0746641Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 208], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0747577Z // end inline asm 2026-02-21T09:52:09.0747829Z // begin inline asm 2026-02-21T09:52:09.0748671Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 224], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0749591Z // end inline asm 2026-02-21T09:52:09.0749853Z // begin inline 
asm 2026-02-21T09:52:09.0750700Z @%p81 tcgen05.st.sync.aligned.32x32b.x16.b32 [%r249 + 240], {%r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673, %r1673}; 2026-02-21T09:52:09.0751619Z // end inline asm 2026-02-21T09:52:09.0751882Z // begin inline asm 2026-02-21T09:52:09.0752170Z tcgen05.wait::st.sync.aligned; 2026-02-21T09:52:09.0752499Z // end inline asm 2026-02-21T09:52:09.0752746Z bar.sync 0, 128; 2026-02-21T09:52:09.0753400Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.0754020Z add.s32 %r521, %r232, 352256; 2026-02-21T09:52:09.0754332Z // begin inline asm 2026-02-21T09:52:09.0754756Z @%p131 mbarrier.init.shared::cta.b64 [%r521], 1; 2026-02-21T09:52:09.0755166Z // end inline asm 2026-02-21T09:52:09.0755433Z bar.sync 0, 128; 2026-02-21T09:52:09.0755701Z add.s32 %r522, %r232, 352264; 2026-02-21T09:52:09.0756024Z // begin inline asm 2026-02-21T09:52:09.0756352Z @%p131 mbarrier.init.shared::cta.b64 [%r522], 1; 2026-02-21T09:52:09.0756873Z // end inline asm 2026-02-21T09:52:09.0757127Z bar.sync 0, 128; 2026-02-21T09:52:09.0757399Z add.s32 %r523, %r232, 352272; 2026-02-21T09:52:09.0757713Z // begin inline asm 2026-02-21T09:52:09.0758040Z @%p131 mbarrier.init.shared::cta.b64 [%r523], 1; 2026-02-21T09:52:09.0758441Z // end inline asm 2026-02-21T09:52:09.0758700Z bar.sync 0, 128; 2026-02-21T09:52:09.0758974Z add.s32 %r524, %r232, 352280; 2026-02-21T09:52:09.0759282Z // begin inline asm 2026-02-21T09:52:09.0759631Z @%p131 mbarrier.init.shared::cta.b64 [%r524], 1; 2026-02-21T09:52:09.0760024Z // end inline asm 2026-02-21T09:52:09.0760292Z bar.sync 0, 128; 2026-02-21T09:52:09.0760554Z add.s32 %r525, %r232, 352288; 2026-02-21T09:52:09.0760869Z // begin inline asm 2026-02-21T09:52:09.0761205Z @%p131 mbarrier.init.shared::cta.b64 [%r525], 1; 2026-02-21T09:52:09.0761597Z // end inline asm 2026-02-21T09:52:09.0761867Z bar.sync 0, 128; 
2026-02-21T09:52:09.0762136Z add.s32 %r526, %r232, 352296; 2026-02-21T09:52:09.0762469Z // begin inline asm 2026-02-21T09:52:09.0762805Z @%p131 mbarrier.init.shared::cta.b64 [%r526], 1; 2026-02-21T09:52:09.0763228Z // end inline asm 2026-02-21T09:52:09.0763480Z bar.sync 0, 128; 2026-02-21T09:52:09.0763750Z add.s32 %r527, %r232, 352304; 2026-02-21T09:52:09.0764068Z // begin inline asm 2026-02-21T09:52:09.0764512Z @%p131 mbarrier.init.shared::cta.b64 [%r527], 1; 2026-02-21T09:52:09.0765073Z // end inline asm 2026-02-21T09:52:09.0765353Z add.s32 %r528, %r232, 352320; 2026-02-21T09:52:09.0765681Z // begin inline asm 2026-02-21T09:52:09.0766015Z @%p131 mbarrier.init.shared::cta.b64 [%r528], 1; 2026-02-21T09:52:09.0766438Z // end inline asm 2026-02-21T09:52:09.0766698Z bar.sync 0, 128; 2026-02-21T09:52:09.0767026Z add.s32 %r529, %r232, 352328; 2026-02-21T09:52:09.0767459Z // begin inline asm 2026-02-21T09:52:09.0767858Z @%p131 mbarrier.init.shared::cta.b64 [%r529], 1; 2026-02-21T09:52:09.0768278Z // end inline asm 2026-02-21T09:52:09.0768542Z bar.sync 0, 128; 2026-02-21T09:52:09.0768830Z add.s32 %r530, %r232, 352336; 2026-02-21T09:52:09.0769145Z // begin inline asm 2026-02-21T09:52:09.0769491Z @%p131 mbarrier.init.shared::cta.b64 [%r530], 1; 2026-02-21T09:52:09.0769906Z // end inline asm 2026-02-21T09:52:09.0770182Z bar.sync 0, 128; 2026-02-21T09:52:09.0770458Z add.s32 %r531, %r232, 352344; 2026-02-21T09:52:09.0770782Z // begin inline asm 2026-02-21T09:52:09.0771139Z @%p131 mbarrier.init.shared::cta.b64 [%r531], 1; 2026-02-21T09:52:09.0771546Z // end inline asm 2026-02-21T09:52:09.0771821Z bar.sync 0, 128; 2026-02-21T09:52:09.0772092Z add.s32 %r532, %r232, 352352; 2026-02-21T09:52:09.0772417Z // begin inline asm 2026-02-21T09:52:09.0772752Z @%p131 mbarrier.init.shared::cta.b64 [%r532], 1; 2026-02-21T09:52:09.0773172Z // end inline asm 2026-02-21T09:52:09.0773433Z bar.sync 0, 128; 2026-02-21T09:52:09.0773718Z add.s32 %r533, %r232, 352360; 2026-02-21T09:52:09.0774031Z // 
begin inline asm 2026-02-21T09:52:09.0774377Z @%p131 mbarrier.init.shared::cta.b64 [%r533], 1; 2026-02-21T09:52:09.0774878Z // end inline asm 2026-02-21T09:52:09.0775139Z bar.sync 0, 128; 2026-02-21T09:52:09.0775414Z add.s32 %r534, %r232, 352368; 2026-02-21T09:52:09.0775720Z // begin inline asm 2026-02-21T09:52:09.0776055Z @%p131 mbarrier.init.shared::cta.b64 [%r534], 1; 2026-02-21T09:52:09.0776464Z // end inline asm 2026-02-21T09:52:09.0777036Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.0777855Z bar.sync 0, 128; 2026-02-21T09:52:09.0778122Z // begin inline asm 2026-02-21T09:52:09.0778493Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r521]; 2026-02-21T09:52:09.0778924Z // end inline asm 2026-02-21T09:52:09.0779209Z bar.sync 0, 128; 2026-02-21T09:52:09.0779487Z // begin inline asm 2026-02-21T09:52:09.0779866Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r522]; 2026-02-21T09:52:09.0780296Z // end inline asm 2026-02-21T09:52:09.0780596Z bar.sync 0, 128; 2026-02-21T09:52:09.0781000Z // begin inline asm 2026-02-21T09:52:09.0781378Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r523]; 2026-02-21T09:52:09.0781812Z // end inline asm 2026-02-21T09:52:09.0782085Z bar.sync 0, 128; 2026-02-21T09:52:09.0782383Z // begin inline asm 2026-02-21T09:52:09.0782742Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r524]; 2026-02-21T09:52:09.0783174Z // end inline asm 2026-02-21T09:52:09.0783445Z bar.sync 0, 128; 2026-02-21T09:52:09.0783727Z // begin inline asm 2026-02-21T09:52:09.0784092Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r525]; 2026-02-21T09:52:09.0784517Z // end inline asm 2026-02-21T09:52:09.0784872Z bar.sync 0, 128; 2026-02-21T09:52:09.0785159Z // begin inline asm 2026-02-21T09:52:09.0785528Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r526]; 2026-02-21T09:52:09.0785944Z // end inline asm 2026-02-21T09:52:09.0786240Z bar.sync 0, 128; 2026-02-21T09:52:09.0786506Z // begin inline asm 2026-02-21T09:52:09.0786841Z @%p131 
mbarrier.arrive.shared::cta.b64 _, [%r527]; 2026-02-21T09:52:09.0787239Z // end inline asm 2026-02-21T09:52:09.0787773Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.0788392Z bar.sync 0, 128; 2026-02-21T09:52:09.0788666Z add.s32 %r542, %r232, 352384; 2026-02-21T09:52:09.0788976Z // begin inline asm 2026-02-21T09:52:09.0789406Z @%p131 mbarrier.init.shared::cta.b64 [%r542], 1; 2026-02-21T09:52:09.0789885Z // end inline asm 2026-02-21T09:52:09.0790141Z bar.sync 0, 128; 2026-02-21T09:52:09.0790404Z add.s32 %r543, %r232, 352392; 2026-02-21T09:52:09.0790702Z // begin inline asm 2026-02-21T09:52:09.0791030Z @%p131 mbarrier.init.shared::cta.b64 [%r543], 1; 2026-02-21T09:52:09.0791405Z // end inline asm 2026-02-21T09:52:09.0791674Z add.s32 %r544, %r232, 352400; 2026-02-21T09:52:09.0791983Z // begin inline asm 2026-02-21T09:52:09.0792299Z @%p131 mbarrier.init.shared::cta.b64 [%r544], 1; 2026-02-21T09:52:09.0792708Z // end inline asm 2026-02-21T09:52:09.0792961Z bar.sync 0, 128; 2026-02-21T09:52:09.0793242Z add.s32 %r545, %r232, 352408; 2026-02-21T09:52:09.0793544Z // begin inline asm 2026-02-21T09:52:09.0793881Z @%p131 mbarrier.init.shared::cta.b64 [%r545], 1; 2026-02-21T09:52:09.0794277Z // end inline asm 2026-02-21T09:52:09.0794927Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0795593Z bar.sync 0, 128; 2026-02-21T09:52:09.0795863Z // begin inline asm 2026-02-21T09:52:09.0796215Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r544]; 2026-02-21T09:52:09.0796630Z // end inline asm 2026-02-21T09:52:09.0796894Z bar.sync 0, 128; 2026-02-21T09:52:09.0797151Z // begin inline asm 2026-02-21T09:52:09.0797495Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r545]; 2026-02-21T09:52:09.0797904Z // end inline asm 2026-02-21T09:52:09.0798440Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.0799141Z st.shared.b32 [global_smem+352416], 
33554689; 2026-02-21T09:52:09.0799581Z st.shared.b32 [global_smem+344064], %r1642; 2026-02-21T09:52:09.0799965Z barrier.sync 1; 2026-02-21T09:52:09.0800284Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:52:09.0800662Z barrier.sync 1; 2026-02-21T09:52:09.0800939Z setp.lt.s32 %p124, %r1662, 1; 2026-02-21T09:52:09.0801271Z mov.b32 %r1675, %r1673; 2026-02-21T09:52:09.0801580Z @%p124 bra $L__BB0_23; 2026-02-21T09:52:09.0802054Z // %bb.17: // %.lr.ph12 2026-02-21T09:52:09.0802708Z .loc 1 0 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:74 2026-02-21T09:52:09.0803398Z ld.param.b64 %rd5, [_helion_matmul_param_2]; 2026-02-21T09:52:09.0803799Z shl.b32 %r560, %r1, 3; 2026-02-21T09:52:09.0804092Z and.b32 %r44, %r560, 120; 2026-02-21T09:52:09.0804402Z shr.u32 %r561, %r1, 4; 2026-02-21T09:52:09.0804769Z bfe.u32 %r45, %r1, 4, 3; 2026-02-21T09:52:09.0805088Z or.b32 %r46, %r45, 8; 2026-02-21T09:52:09.0805367Z or.b32 %r47, %r45, 16; 2026-02-21T09:52:09.0805817Z or.b32 %r48, %r45, 24; 2026-02-21T09:52:09.0806091Z or.b32 %r49, %r45, 32; 2026-02-21T09:52:09.0806367Z or.b32 %r50, %r45, 40; 2026-02-21T09:52:09.0806644Z or.b32 %r51, %r45, 48; 2026-02-21T09:52:09.0806917Z or.b32 %r52, %r561, 56; 2026-02-21T09:52:09.0807207Z or.b32 %r53, %r45, 64; 2026-02-21T09:52:09.0807482Z or.b32 %r54, %r45, 72; 2026-02-21T09:52:09.0807760Z or.b32 %r55, %r45, 80; 2026-02-21T09:52:09.0808037Z or.b32 %r56, %r45, 88; 2026-02-21T09:52:09.0808314Z or.b32 %r57, %r45, 96; 2026-02-21T09:52:09.0808585Z or.b32 %r58, %r45, 104; 2026-02-21T09:52:09.0808875Z or.b32 %r59, %r45, 112; 2026-02-21T09:52:09.0809154Z or.b32 %r60, %r561, 120; 2026-02-21T09:52:09.0809449Z or.b32 %r61, %r45, 128; 2026-02-21T09:52:09.0809733Z or.b32 %r62, %r45, 136; 2026-02-21T09:52:09.0810006Z or.b32 %r63, %r45, 144; 2026-02-21T09:52:09.0810287Z or.b32 %r64, %r45, 152; 2026-02-21T09:52:09.0810559Z or.b32 %r65, %r45, 160; 2026-02-21T09:52:09.0810847Z or.b32 %r66, %r45, 168; 2026-02-21T09:52:09.0811122Z or.b32 %r67, %r45, 
176; 2026-02-21T09:52:09.0811415Z or.b32 %r68, %r561, 184; 2026-02-21T09:52:09.0811700Z or.b32 %r69, %r45, 192; 2026-02-21T09:52:09.0811989Z or.b32 %r70, %r45, 200; 2026-02-21T09:52:09.0812273Z or.b32 %r71, %r45, 208; 2026-02-21T09:52:09.0812699Z or.b32 %r72, %r45, 216; 2026-02-21T09:52:09.0813004Z or.b32 %r73, %r45, 224; 2026-02-21T09:52:09.0813412Z or.b32 %r74, %r45, 232; 2026-02-21T09:52:09.0813707Z or.b32 %r75, %r45, 240; 2026-02-21T09:52:09.0813980Z or.b32 %r76, %r561, 248; 2026-02-21T09:52:09.0814537Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.0815288Z add.s32 %r1671, %r43, -1; 2026-02-21T09:52:09.0815614Z shl.b32 %r568, %r1, 10; 2026-02-21T09:52:09.0815915Z and.b32 %r569, %r568, 6144; 2026-02-21T09:52:09.0816244Z shl.b32 %r570, %r1, 4; 2026-02-21T09:52:09.0816552Z and.b32 %r571, %r570, 2032; 2026-02-21T09:52:09.0816882Z or.b32 %r572, %r569, %r571; 2026-02-21T09:52:09.0817213Z add.s32 %r574, %r232, 344064; 2026-02-21T09:52:09.0817522Z add.s32 %r80, %r574, %r572; 2026-02-21T09:52:09.0817851Z xor.b32 %r575, %r572, 32; 2026-02-21T09:52:09.0818157Z add.s32 %r81, %r574, %r575; 2026-02-21T09:52:09.0818476Z xor.b32 %r576, %r572, 64; 2026-02-21T09:52:09.0818792Z add.s32 %r82, %r574, %r576; 2026-02-21T09:52:09.0819117Z xor.b32 %r577, %r572, 96; 2026-02-21T09:52:09.0819427Z add.s32 %r83, %r574, %r577; 2026-02-21T09:52:09.0819746Z and.b32 %r578, %r1, 96; 2026-02-21T09:52:09.0820053Z shl.b32 %r579, %r578, 6; 2026-02-21T09:52:09.0820354Z shl.b32 %r580, %r1, 5; 2026-02-21T09:52:09.0820660Z and.b32 %r581, %r580, 96; 2026-02-21T09:52:09.0820965Z and.b32 %r582, %r570, 384; 2026-02-21T09:52:09.0821292Z and.b32 %r584, %r557, 16; 2026-02-21T09:52:09.0821594Z or.b32 %r585, %r579, %r581; 2026-02-21T09:52:09.0821913Z or.b32 %r586, %r582, %r578; 2026-02-21T09:52:09.0822230Z xor.b32 %r587, %r585, %r586; 2026-02-21T09:52:09.0822567Z add.s32 %r588, %r574, %r584; 2026-02-21T09:52:09.0822884Z add.s32 %r869, %r588, %r587; 
2026-02-21T09:52:09.0823213Z add.s32 %r874, %r869, 512; 2026-02-21T09:52:09.0823532Z add.s32 %r879, %r869, 1024; 2026-02-21T09:52:09.0823844Z add.s32 %r884, %r869, 1536; 2026-02-21T09:52:09.0824161Z mov.b32 %r1668, -1; 2026-02-21T09:52:09.0824459Z mov.b32 %r1675, %r1673; 2026-02-21T09:52:09.0824889Z mov.b32 %r1670, %r1673; 2026-02-21T09:52:09.0825315Z mov.b32 %r1669, %r1673; 2026-02-21T09:52:09.0825624Z bra.uni $L__BB0_18; 2026-02-21T09:52:09.0826020Z $L__BB0_21: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:52:09.0826505Z shl.b32 %r1154, %r1673, 3; 2026-02-21T09:52:09.0826843Z add.s32 %r1156, %r232, %r1154; 2026-02-21T09:52:09.0827188Z add.s32 %r591, %r1156, 352384; 2026-02-21T09:52:09.0827813Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0828487Z shl.b32 %r1157, %r1673, 8; 2026-02-21T09:52:09.0828943Z bar.sync 0, 128; 2026-02-21T09:52:09.0829226Z // begin inline asm 2026-02-21T09:52:09.0829496Z 2026-02-21T09:52:09.0829706Z { 2026-02-21T09:52:09.0829964Z .reg .pred complete; 2026-02-21T09:52:09.0830244Z waitLoop: 2026-02-21T09:52:09.0830645Z mbarrier.try_wait.parity.shared.b64 complete, [%r591], %r1675; 2026-02-21T09:52:09.0831150Z @!complete bra.uni waitLoop; 2026-02-21T09:52:09.0831458Z } 2026-02-21T09:52:09.0831580Z 2026-02-21T09:52:09.0831690Z // end inline asm 2026-02-21T09:52:09.0832218Z .loc 1 43 32 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:43:32 2026-02-21T09:52:09.0832851Z or.b32 %r1158, %r1669, %r44; 2026-02-21T09:52:09.0833415Z .loc 1 45 32 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:45:32 2026-02-21T09:52:09.0834047Z add.s32 %r1159, %r1670, %r45; 2026-02-21T09:52:09.0834376Z add.s32 %r1160, %r1670, %r46; 2026-02-21T09:52:09.0834749Z add.s32 %r1161, %r1670, %r47; 2026-02-21T09:52:09.0835084Z add.s32 %r1162, %r1670, %r48; 2026-02-21T09:52:09.0835392Z add.s32 %r1163, %r1670, %r49; 2026-02-21T09:52:09.0835718Z add.s32 %r1164, %r1670, %r50; 2026-02-21T09:52:09.0836019Z 
add.s32 %r1165, %r1670, %r51; 2026-02-21T09:52:09.0836329Z add.s32 %r1166, %r1670, %r52; 2026-02-21T09:52:09.0836734Z add.s32 %r1167, %r1670, %r53; 2026-02-21T09:52:09.0837057Z add.s32 %r1168, %r1670, %r54; 2026-02-21T09:52:09.0837433Z add.s32 %r1169, %r1670, %r55; 2026-02-21T09:52:09.0837750Z add.s32 %r1170, %r1670, %r56; 2026-02-21T09:52:09.0838053Z add.s32 %r1171, %r1670, %r57; 2026-02-21T09:52:09.0838348Z add.s32 %r1172, %r1670, %r58; 2026-02-21T09:52:09.0838654Z add.s32 %r1173, %r1670, %r59; 2026-02-21T09:52:09.0838951Z add.s32 %r1174, %r1670, %r60; 2026-02-21T09:52:09.0839255Z add.s32 %r1175, %r1670, %r61; 2026-02-21T09:52:09.0839553Z add.s32 %r1176, %r1670, %r62; 2026-02-21T09:52:09.0839853Z add.s32 %r1177, %r1670, %r63; 2026-02-21T09:52:09.0840147Z add.s32 %r1178, %r1670, %r64; 2026-02-21T09:52:09.0840458Z add.s32 %r1179, %r1670, %r65; 2026-02-21T09:52:09.0840760Z add.s32 %r1180, %r1670, %r66; 2026-02-21T09:52:09.0841065Z add.s32 %r1181, %r1670, %r67; 2026-02-21T09:52:09.0841367Z add.s32 %r1182, %r1670, %r68; 2026-02-21T09:52:09.0841659Z add.s32 %r1183, %r1670, %r69; 2026-02-21T09:52:09.0841968Z add.s32 %r1184, %r1670, %r70; 2026-02-21T09:52:09.0842266Z add.s32 %r1185, %r1670, %r71; 2026-02-21T09:52:09.0842573Z add.s32 %r1186, %r1670, %r72; 2026-02-21T09:52:09.0842867Z add.s32 %r1187, %r1670, %r73; 2026-02-21T09:52:09.0843170Z add.s32 %r1188, %r1670, %r74; 2026-02-21T09:52:09.0843467Z add.s32 %r1189, %r1670, %r75; 2026-02-21T09:52:09.0843767Z add.s32 %r1190, %r1670, %r76; 2026-02-21T09:52:09.0844329Z .loc 1 59 53 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:59:53 2026-02-21T09:52:09.0845063Z mad.lo.s32 %r1191, %r1159, 12288, %r1158; 2026-02-21T09:52:09.0845452Z mad.lo.s32 %r1192, %r1160, 12288, %r1158; 2026-02-21T09:52:09.0845827Z mad.lo.s32 %r1193, %r1161, 12288, %r1158; 2026-02-21T09:52:09.0846199Z mad.lo.s32 %r1194, %r1162, 12288, %r1158; 2026-02-21T09:52:09.0846558Z mad.lo.s32 %r1195, %r1163, 12288, %r1158; 2026-02-21T09:52:09.0846926Z 
mad.lo.s32 %r1196, %r1164, 12288, %r1158; 2026-02-21T09:52:09.0847292Z mad.lo.s32 %r1197, %r1165, 12288, %r1158; 2026-02-21T09:52:09.0847673Z mad.lo.s32 %r1198, %r1166, 12288, %r1158; 2026-02-21T09:52:09.0848174Z mad.lo.s32 %r1199, %r1167, 12288, %r1158; 2026-02-21T09:52:09.0848529Z mad.lo.s32 %r1200, %r1168, 12288, %r1158; 2026-02-21T09:52:09.0848904Z mad.lo.s32 %r1201, %r1169, 12288, %r1158; 2026-02-21T09:52:09.0849274Z mad.lo.s32 %r1202, %r1170, 12288, %r1158; 2026-02-21T09:52:09.0849650Z mad.lo.s32 %r1203, %r1171, 12288, %r1158; 2026-02-21T09:52:09.0850012Z mad.lo.s32 %r1204, %r1172, 12288, %r1158; 2026-02-21T09:52:09.0850402Z mad.lo.s32 %r1205, %r1173, 12288, %r1158; 2026-02-21T09:52:09.0850778Z mad.lo.s32 %r1206, %r1174, 12288, %r1158; 2026-02-21T09:52:09.0851264Z mad.lo.s32 %r1207, %r1175, 12288, %r1158; 2026-02-21T09:52:09.0851643Z mad.lo.s32 %r1208, %r1176, 12288, %r1158; 2026-02-21T09:52:09.0852009Z mad.lo.s32 %r1209, %r1177, 12288, %r1158; 2026-02-21T09:52:09.0852383Z mad.lo.s32 %r1210, %r1178, 12288, %r1158; 2026-02-21T09:52:09.0852755Z mad.lo.s32 %r1211, %r1179, 12288, %r1158; 2026-02-21T09:52:09.0853136Z mad.lo.s32 %r1212, %r1180, 12288, %r1158; 2026-02-21T09:52:09.0853514Z mad.lo.s32 %r1213, %r1181, 12288, %r1158; 2026-02-21T09:52:09.0853890Z mad.lo.s32 %r1214, %r1182, 12288, %r1158; 2026-02-21T09:52:09.0854269Z mad.lo.s32 %r1215, %r1183, 12288, %r1158; 2026-02-21T09:52:09.0854636Z mad.lo.s32 %r1216, %r1184, 12288, %r1158; 2026-02-21T09:52:09.0855101Z mad.lo.s32 %r1217, %r1185, 12288, %r1158; 2026-02-21T09:52:09.0855469Z mad.lo.s32 %r1218, %r1186, 12288, %r1158; 2026-02-21T09:52:09.0855847Z mad.lo.s32 %r1219, %r1187, 12288, %r1158; 2026-02-21T09:52:09.0856217Z mad.lo.s32 %r1220, %r1188, 12288, %r1158; 2026-02-21T09:52:09.0856594Z mad.lo.s32 %r1221, %r1189, 12288, %r1158; 2026-02-21T09:52:09.0856968Z mad.lo.s32 %r1222, %r1190, 12288, %r1158; 2026-02-21T09:52:09.0857591Z .loc 1 59 24 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:59:24 
2026-02-21T09:52:09.0858249Z mad.wide.s32 %rd79, %r1191, 2, %rd5; 2026-02-21T09:52:09.0858725Z mad.wide.s32 %rd80, %r1192, 2, %rd5; 2026-02-21T09:52:09.0859179Z mad.wide.s32 %rd81, %r1193, 2, %rd5; 2026-02-21T09:52:09.0859542Z mad.wide.s32 %rd82, %r1194, 2, %rd5; 2026-02-21T09:52:09.0859901Z mad.wide.s32 %rd83, %r1195, 2, %rd5; 2026-02-21T09:52:09.0860254Z mad.wide.s32 %rd84, %r1196, 2, %rd5; 2026-02-21T09:52:09.0860613Z mad.wide.s32 %rd85, %r1197, 2, %rd5; 2026-02-21T09:52:09.0860964Z mad.wide.s32 %rd86, %r1198, 2, %rd5; 2026-02-21T09:52:09.0861322Z mad.wide.s32 %rd87, %r1199, 2, %rd5; 2026-02-21T09:52:09.0861682Z mad.wide.s32 %rd88, %r1200, 2, %rd5; 2026-02-21T09:52:09.0862033Z mad.wide.s32 %rd89, %r1201, 2, %rd5; 2026-02-21T09:52:09.0862398Z mad.wide.s32 %rd90, %r1202, 2, %rd5; 2026-02-21T09:52:09.0862749Z mad.wide.s32 %rd91, %r1203, 2, %rd5; 2026-02-21T09:52:09.0863112Z mad.wide.s32 %rd92, %r1204, 2, %rd5; 2026-02-21T09:52:09.0863472Z mad.wide.s32 %rd93, %r1205, 2, %rd5; 2026-02-21T09:52:09.0863826Z mad.wide.s32 %rd94, %r1206, 2, %rd5; 2026-02-21T09:52:09.0864174Z mad.wide.s32 %rd95, %r1207, 2, %rd5; 2026-02-21T09:52:09.0864535Z mad.wide.s32 %rd96, %r1208, 2, %rd5; 2026-02-21T09:52:09.0864984Z mad.wide.s32 %rd97, %r1209, 2, %rd5; 2026-02-21T09:52:09.0865328Z mad.wide.s32 %rd98, %r1210, 2, %rd5; 2026-02-21T09:52:09.0865688Z mad.wide.s32 %rd99, %r1211, 2, %rd5; 2026-02-21T09:52:09.0866045Z mad.wide.s32 %rd100, %r1212, 2, %rd5; 2026-02-21T09:52:09.0866429Z mad.wide.s32 %rd101, %r1213, 2, %rd5; 2026-02-21T09:52:09.0866796Z mad.wide.s32 %rd102, %r1214, 2, %rd5; 2026-02-21T09:52:09.0867205Z mad.wide.s32 %rd103, %r1215, 2, %rd5; 2026-02-21T09:52:09.0867573Z mad.wide.s32 %rd104, %r1216, 2, %rd5; 2026-02-21T09:52:09.0867981Z mad.wide.s32 %rd105, %r1217, 2, %rd5; 2026-02-21T09:52:09.0868369Z mad.wide.s32 %rd106, %r1218, 2, %rd5; 2026-02-21T09:52:09.0868743Z mad.wide.s32 %rd107, %r1219, 2, %rd5; 2026-02-21T09:52:09.0869128Z mad.wide.s32 %rd108, %r1220, 2, %rd5; 
2026-02-21T09:52:09.0869496Z mad.wide.s32 %rd109, %r1221, 2, %rd5; 2026-02-21T09:52:09.0869884Z mad.wide.s32 %rd110, %r1222, 2, %rd5; 2026-02-21T09:52:09.0870611Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0871272Z add.s32 %r609, %r249, %r1157; 2026-02-21T09:52:09.0871586Z // begin inline asm 2026-02-21T09:52:09.0872404Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r593, %r594, %r595, %r596, %r597, %r598, %r599, %r600, %r601, %r602, %r603, %r604, %r605, %r606, %r607, %r608}, [%r609 + 0]; 2026-02-21T09:52:09.0873284Z // end inline asm 2026-02-21T09:52:09.0873546Z // begin inline asm 2026-02-21T09:52:09.0874364Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r610, %r611, %r612, %r613, %r614, %r615, %r616, %r617, %r618, %r619, %r620, %r621, %r622, %r623, %r624, %r625}, [%r609 + 16]; 2026-02-21T09:52:09.0875508Z // end inline asm 2026-02-21T09:52:09.0875775Z // begin inline asm 2026-02-21T09:52:09.0876563Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r627, %r628, %r629, %r630, %r631, %r632, %r633, %r634, %r635, %r636, %r637, %r638, %r639, %r640, %r641, %r642}, [%r609 + 32]; 2026-02-21T09:52:09.0877437Z // end inline asm 2026-02-21T09:52:09.0877701Z // begin inline asm 2026-02-21T09:52:09.0878461Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r644, %r645, %r646, %r647, %r648, %r649, %r650, %r651, %r652, %r653, %r654, %r655, %r656, %r657, %r658, %r659}, [%r609 + 48]; 2026-02-21T09:52:09.0879308Z // end inline asm 2026-02-21T09:52:09.0879561Z // begin inline asm 2026-02-21T09:52:09.0880318Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r661, %r662, %r663, %r664, %r665, %r666, %r667, %r668, %r669, %r670, %r671, %r672, %r673, %r674, %r675, %r676}, [%r609 + 64]; 2026-02-21T09:52:09.0881165Z // end inline asm 2026-02-21T09:52:09.0881420Z // begin inline asm 2026-02-21T09:52:09.0882174Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r678, %r679, %r680, %r681, %r682, %r683, %r684, %r685, %r686, %r687, %r688, %r689, %r690, %r691, %r692, %r693}, [%r609 + 
80]; 2026-02-21T09:52:09.0883015Z // end inline asm 2026-02-21T09:52:09.0883382Z // begin inline asm 2026-02-21T09:52:09.0884232Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r695, %r696, %r697, %r698, %r699, %r700, %r701, %r702, %r703, %r704, %r705, %r706, %r707, %r708, %r709, %r710}, [%r609 + 96]; 2026-02-21T09:52:09.0885185Z // end inline asm 2026-02-21T09:52:09.0885458Z // begin inline asm 2026-02-21T09:52:09.0886232Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r712, %r713, %r714, %r715, %r716, %r717, %r718, %r719, %r720, %r721, %r722, %r723, %r724, %r725, %r726, %r727}, [%r609 + 112]; 2026-02-21T09:52:09.0887102Z // end inline asm 2026-02-21T09:52:09.0887363Z // begin inline asm 2026-02-21T09:52:09.0888149Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r729, %r730, %r731, %r732, %r733, %r734, %r735, %r736, %r737, %r738, %r739, %r740, %r741, %r742, %r743, %r744}, [%r609 + 128]; 2026-02-21T09:52:09.0889036Z // end inline asm 2026-02-21T09:52:09.0889305Z // begin inline asm 2026-02-21T09:52:09.0890098Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r746, %r747, %r748, %r749, %r750, %r751, %r752, %r753, %r754, %r755, %r756, %r757, %r758, %r759, %r760, %r761}, [%r609 + 144]; 2026-02-21T09:52:09.0890956Z // end inline asm 2026-02-21T09:52:09.0891227Z // begin inline asm 2026-02-21T09:52:09.0892000Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r763, %r764, %r765, %r766, %r767, %r768, %r769, %r770, %r771, %r772, %r773, %r774, %r775, %r776, %r777, %r778}, [%r609 + 160]; 2026-02-21T09:52:09.0892862Z // end inline asm 2026-02-21T09:52:09.0893137Z // begin inline asm 2026-02-21T09:52:09.0893912Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r780, %r781, %r782, %r783, %r784, %r785, %r786, %r787, %r788, %r789, %r790, %r791, %r792, %r793, %r794, %r795}, [%r609 + 176]; 2026-02-21T09:52:09.0894872Z // end inline asm 2026-02-21T09:52:09.0895136Z // begin inline asm 2026-02-21T09:52:09.0895920Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r797, %r798, %r799, %r800, %r801, %r802, %r803, %r804, %r805, %r806, 
%r807, %r808, %r809, %r810, %r811, %r812}, [%r609 + 192]; 2026-02-21T09:52:09.0896797Z // end inline asm 2026-02-21T09:52:09.0897074Z // begin inline asm 2026-02-21T09:52:09.0897865Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r814, %r815, %r816, %r817, %r818, %r819, %r820, %r821, %r822, %r823, %r824, %r825, %r826, %r827, %r828, %r829}, [%r609 + 208]; 2026-02-21T09:52:09.0898912Z // end inline asm 2026-02-21T09:52:09.0899191Z // begin inline asm 2026-02-21T09:52:09.0899972Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r831, %r832, %r833, %r834, %r835, %r836, %r837, %r838, %r839, %r840, %r841, %r842, %r843, %r844, %r845, %r846}, [%r609 + 224]; 2026-02-21T09:52:09.0900847Z // end inline asm 2026-02-21T09:52:09.0901115Z // begin inline asm 2026-02-21T09:52:09.0901905Z tcgen05.ld.sync.aligned.32x32b.x16.b32 {%r848, %r849, %r850, %r851, %r852, %r853, %r854, %r855, %r856, %r857, %r858, %r859, %r860, %r861, %r862, %r863}, [%r609 + 240]; 2026-02-21T09:52:09.0902907Z // end inline asm 2026-02-21T09:52:09.0903163Z // begin inline asm 2026-02-21T09:52:09.0903465Z tcgen05.wait::ld.sync.aligned; 2026-02-21T09:52:09.0903788Z // end inline asm 2026-02-21T09:52:09.0904066Z cvt.u64.u32 %rd111, %r593; 2026-02-21T09:52:09.0904382Z cvt.u64.u32 %rd112, %r594; 2026-02-21T09:52:09.0904768Z shl.b64 %rd113, %rd112, 32; 2026-02-21T09:52:09.0905097Z or.b64 %rd114, %rd111, %rd113; 2026-02-21T09:52:09.0905706Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0906369Z mov.b64 {%r1223, %r1224}, %rd114; 2026-02-21T09:52:09.0906726Z cvt.rn.f16x2.f32 %r1225, %r1224, %r1223; 2026-02-21T09:52:09.0907346Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0907986Z cvt.u64.u32 %rd115, %r595; 2026-02-21T09:52:09.0908300Z cvt.u64.u32 %rd116, %r596; 2026-02-21T09:52:09.0908607Z shl.b64 %rd117, %rd116, 32; 2026-02-21T09:52:09.0908920Z or.b64 %rd118, %rd115, %rd117; 2026-02-21T09:52:09.0909604Z .loc 1 58 27 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0910241Z mov.b64 {%r1226, %r1227}, %rd118; 2026-02-21T09:52:09.0910696Z cvt.rn.f16x2.f32 %r1228, %r1227, %r1226; 2026-02-21T09:52:09.0911322Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0911940Z cvt.u64.u32 %rd119, %r597; 2026-02-21T09:52:09.0912244Z cvt.u64.u32 %rd120, %r598; 2026-02-21T09:52:09.0912554Z shl.b64 %rd121, %rd120, 32; 2026-02-21T09:52:09.0912861Z or.b64 %rd122, %rd119, %rd121; 2026-02-21T09:52:09.0913446Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0914085Z mov.b64 {%r1229, %r1230}, %rd122; 2026-02-21T09:52:09.0914430Z cvt.rn.f16x2.f32 %r1231, %r1230, %r1229; 2026-02-21T09:52:09.0915164Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0915781Z cvt.u64.u32 %rd123, %r599; 2026-02-21T09:52:09.0916093Z cvt.u64.u32 %rd124, %r600; 2026-02-21T09:52:09.0916390Z shl.b64 %rd125, %rd124, 32; 2026-02-21T09:52:09.0916710Z or.b64 %rd126, %rd123, %rd125; 2026-02-21T09:52:09.0917285Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0917905Z mov.b64 {%r1232, %r1233}, %rd126; 2026-02-21T09:52:09.0918257Z cvt.rn.f16x2.f32 %r1234, %r1233, %r1232; 2026-02-21T09:52:09.0918859Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0919473Z cvt.u64.u32 %rd127, %r601; 2026-02-21T09:52:09.0919774Z cvt.u64.u32 %rd128, %r602; 2026-02-21T09:52:09.0920083Z shl.b64 %rd129, %rd128, 32; 2026-02-21T09:52:09.0920394Z or.b64 %rd130, %rd127, %rd129; 2026-02-21T09:52:09.0920970Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0921604Z mov.b64 {%r1235, %r1236}, %rd130; 2026-02-21T09:52:09.0921951Z cvt.rn.f16x2.f32 %r1237, %r1236, %r1235; 2026-02-21T09:52:09.0922586Z .loc 1 56 52 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0923332Z cvt.u64.u32 %rd131, %r603; 2026-02-21T09:52:09.0923637Z cvt.u64.u32 %rd132, %r604; 2026-02-21T09:52:09.0923936Z shl.b64 %rd133, %rd132, 32; 2026-02-21T09:52:09.0924244Z or.b64 %rd134, %rd131, %rd133; 2026-02-21T09:52:09.0924878Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0925501Z mov.b64 {%r1238, %r1239}, %rd134; 2026-02-21T09:52:09.0925862Z cvt.rn.f16x2.f32 %r1240, %r1239, %r1238; 2026-02-21T09:52:09.0926605Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0927250Z cvt.u64.u32 %rd135, %r605; 2026-02-21T09:52:09.0927559Z cvt.u64.u32 %rd136, %r606; 2026-02-21T09:52:09.0927871Z shl.b64 %rd137, %rd136, 32; 2026-02-21T09:52:09.0928191Z or.b64 %rd138, %rd135, %rd137; 2026-02-21T09:52:09.0928782Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0929426Z mov.b64 {%r1241, %r1242}, %rd138; 2026-02-21T09:52:09.0929778Z cvt.rn.f16x2.f32 %r1243, %r1242, %r1241; 2026-02-21T09:52:09.0930408Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0931032Z cvt.u64.u32 %rd139, %r607; 2026-02-21T09:52:09.0931352Z cvt.u64.u32 %rd140, %r608; 2026-02-21T09:52:09.0931660Z shl.b64 %rd141, %rd140, 32; 2026-02-21T09:52:09.0931980Z or.b64 %rd142, %rd139, %rd141; 2026-02-21T09:52:09.0932568Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0933201Z mov.b64 {%r1244, %r1245}, %rd142; 2026-02-21T09:52:09.0933565Z cvt.rn.f16x2.f32 %r1246, %r1245, %r1244; 2026-02-21T09:52:09.0934288Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0935028Z cvt.u64.u32 %rd143, %r610; 2026-02-21T09:52:09.0935419Z cvt.u64.u32 %rd144, %r611; 2026-02-21T09:52:09.0935743Z shl.b64 %rd145, %rd144, 32; 2026-02-21T09:52:09.0936063Z or.b64 
%rd146, %rd143, %rd145; 2026-02-21T09:52:09.0936649Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0937294Z mov.b64 {%r1247, %r1248}, %rd146; 2026-02-21T09:52:09.0937647Z cvt.rn.f16x2.f32 %r1249, %r1248, %r1247; 2026-02-21T09:52:09.0938449Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0939172Z cvt.u64.u32 %rd147, %r612; 2026-02-21T09:52:09.0939488Z cvt.u64.u32 %rd148, %r613; 2026-02-21T09:52:09.0939793Z shl.b64 %rd149, %rd148, 32; 2026-02-21T09:52:09.0940113Z or.b64 %rd150, %rd147, %rd149; 2026-02-21T09:52:09.0940701Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0941331Z mov.b64 {%r1250, %r1251}, %rd150; 2026-02-21T09:52:09.0941698Z cvt.rn.f16x2.f32 %r1252, %r1251, %r1250; 2026-02-21T09:52:09.0942314Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0942944Z cvt.u64.u32 %rd151, %r614; 2026-02-21T09:52:09.0943247Z cvt.u64.u32 %rd152, %r615; 2026-02-21T09:52:09.0943565Z shl.b64 %rd153, %rd152, 32; 2026-02-21T09:52:09.0943889Z or.b64 %rd154, %rd151, %rd153; 2026-02-21T09:52:09.0944467Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0945195Z mov.b64 {%r1253, %r1254}, %rd154; 2026-02-21T09:52:09.0945553Z cvt.rn.f16x2.f32 %r1255, %r1254, %r1253; 2026-02-21T09:52:09.0946190Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0946831Z cvt.u64.u32 %rd155, %r616; 2026-02-21T09:52:09.0947145Z cvt.u64.u32 %rd156, %r617; 2026-02-21T09:52:09.0947447Z shl.b64 %rd157, %rd156, 32; 2026-02-21T09:52:09.0947769Z or.b64 %rd158, %rd155, %rd157; 2026-02-21T09:52:09.0948458Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0949078Z mov.b64 {%r1256, %r1257}, %rd158; 2026-02-21T09:52:09.0949436Z cvt.rn.f16x2.f32 %r1258, %r1257, %r1256; 
2026-02-21T09:52:09.0950031Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0950653Z cvt.u64.u32 %rd159, %r618; 2026-02-21T09:52:09.0950951Z cvt.u64.u32 %rd160, %r619; 2026-02-21T09:52:09.0951261Z shl.b64 %rd161, %rd160, 32; 2026-02-21T09:52:09.0951680Z or.b64 %rd162, %rd159, %rd161; 2026-02-21T09:52:09.0952241Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0952867Z mov.b64 {%r1259, %r1260}, %rd162; 2026-02-21T09:52:09.0953206Z cvt.rn.f16x2.f32 %r1261, %r1260, %r1259; 2026-02-21T09:52:09.0953832Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0954468Z cvt.u64.u32 %rd163, %r620; 2026-02-21T09:52:09.0954849Z cvt.u64.u32 %rd164, %r621; 2026-02-21T09:52:09.0955158Z shl.b64 %rd165, %rd164, 32; 2026-02-21T09:52:09.0955487Z or.b64 %rd166, %rd163, %rd165; 2026-02-21T09:52:09.0956086Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0956728Z mov.b64 {%r1262, %r1263}, %rd166; 2026-02-21T09:52:09.0957085Z cvt.rn.f16x2.f32 %r1264, %r1263, %r1262; 2026-02-21T09:52:09.0957699Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0958326Z cvt.u64.u32 %rd167, %r622; 2026-02-21T09:52:09.0958631Z cvt.u64.u32 %rd168, %r623; 2026-02-21T09:52:09.0958942Z shl.b64 %rd169, %rd168, 32; 2026-02-21T09:52:09.0959255Z or.b64 %rd170, %rd167, %rd169; 2026-02-21T09:52:09.0960008Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0960654Z mov.b64 {%r1265, %r1266}, %rd170; 2026-02-21T09:52:09.0960994Z cvt.rn.f16x2.f32 %r1267, %r1266, %r1265; 2026-02-21T09:52:09.0961613Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0962236Z cvt.u64.u32 %rd171, %r624; 2026-02-21T09:52:09.0962543Z cvt.u64.u32 %rd172, %r625; 2026-02-21T09:52:09.0962846Z shl.b64 %rd173, %rd172, 
32; 2026-02-21T09:52:09.0963159Z or.b64 %rd174, %rd171, %rd173; 2026-02-21T09:52:09.0963733Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0964355Z mov.b64 {%r1268, %r1269}, %rd174; 2026-02-21T09:52:09.0964790Z cvt.rn.f16x2.f32 %r1270, %r1269, %r1268; 2026-02-21T09:52:09.0965418Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0966062Z cvt.u64.u32 %rd175, %r627; 2026-02-21T09:52:09.0966362Z cvt.u64.u32 %rd176, %r628; 2026-02-21T09:52:09.0966670Z shl.b64 %rd177, %rd176, 32; 2026-02-21T09:52:09.0966980Z or.b64 %rd178, %rd175, %rd177; 2026-02-21T09:52:09.0967535Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0968147Z mov.b64 {%r1271, %r1272}, %rd178; 2026-02-21T09:52:09.0968486Z cvt.rn.f16x2.f32 %r1273, %r1272, %r1271; 2026-02-21T09:52:09.0969090Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0969699Z cvt.u64.u32 %rd179, %r629; 2026-02-21T09:52:09.0970012Z cvt.u64.u32 %rd180, %r630; 2026-02-21T09:52:09.0970319Z shl.b64 %rd181, %rd180, 32; 2026-02-21T09:52:09.0970622Z or.b64 %rd182, %rd179, %rd181; 2026-02-21T09:52:09.0971192Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0971809Z mov.b64 {%r1274, %r1275}, %rd182; 2026-02-21T09:52:09.0972167Z cvt.rn.f16x2.f32 %r1276, %r1275, %r1274; 2026-02-21T09:52:09.0972913Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0973559Z cvt.u64.u32 %rd183, %r631; 2026-02-21T09:52:09.0973865Z cvt.u64.u32 %rd184, %r632; 2026-02-21T09:52:09.0974184Z shl.b64 %rd185, %rd184, 32; 2026-02-21T09:52:09.0974503Z or.b64 %rd186, %rd183, %rd185; 2026-02-21T09:52:09.0975161Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0975800Z mov.b64 {%r1277, %r1278}, %rd186; 2026-02-21T09:52:09.0976257Z 
cvt.rn.f16x2.f32 %r1279, %r1278, %r1277; 2026-02-21T09:52:09.0976891Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0977517Z cvt.u64.u32 %rd187, %r633; 2026-02-21T09:52:09.0977832Z cvt.u64.u32 %rd188, %r634; 2026-02-21T09:52:09.0978157Z shl.b64 %rd189, %rd188, 32; 2026-02-21T09:52:09.0978477Z or.b64 %rd190, %rd187, %rd189; 2026-02-21T09:52:09.0979068Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0979692Z mov.b64 {%r1280, %r1281}, %rd190; 2026-02-21T09:52:09.0980049Z cvt.rn.f16x2.f32 %r1282, %r1281, %r1280; 2026-02-21T09:52:09.0980668Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0981306Z cvt.u64.u32 %rd191, %r635; 2026-02-21T09:52:09.0981612Z cvt.u64.u32 %rd192, %r636; 2026-02-21T09:52:09.0981934Z shl.b64 %rd193, %rd192, 32; 2026-02-21T09:52:09.0982264Z or.b64 %rd194, %rd191, %rd193; 2026-02-21T09:52:09.0982843Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0983476Z mov.b64 {%r1283, %r1284}, %rd194; 2026-02-21T09:52:09.0983824Z cvt.rn.f16x2.f32 %r1285, %r1284, %r1283; 2026-02-21T09:52:09.0984732Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0985387Z cvt.u64.u32 %rd195, %r637; 2026-02-21T09:52:09.0985706Z cvt.u64.u32 %rd196, %r638; 2026-02-21T09:52:09.0986018Z shl.b64 %rd197, %rd196, 32; 2026-02-21T09:52:09.0986333Z or.b64 %rd198, %rd195, %rd197; 2026-02-21T09:52:09.0986923Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0987557Z mov.b64 {%r1286, %r1287}, %rd198; 2026-02-21T09:52:09.0987918Z cvt.rn.f16x2.f32 %r1288, %r1287, %r1286; 2026-02-21T09:52:09.0988541Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0989187Z cvt.u64.u32 %rd199, %r639; 2026-02-21T09:52:09.0989494Z cvt.u64.u32 %rd200, %r640; 
2026-02-21T09:52:09.0989809Z shl.b64 %rd201, %rd200, 32; 2026-02-21T09:52:09.0990129Z or.b64 %rd202, %rd199, %rd201; 2026-02-21T09:52:09.0990723Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0991382Z mov.b64 {%r1289, %r1290}, %rd202; 2026-02-21T09:52:09.0991732Z cvt.rn.f16x2.f32 %r1291, %r1290, %r1289; 2026-02-21T09:52:09.0992359Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0992996Z cvt.u64.u32 %rd203, %r641; 2026-02-21T09:52:09.0993326Z cvt.u64.u32 %rd204, %r642; 2026-02-21T09:52:09.0993656Z shl.b64 %rd205, %rd204, 32; 2026-02-21T09:52:09.0993967Z or.b64 %rd206, %rd203, %rd205; 2026-02-21T09:52:09.0994555Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0995307Z mov.b64 {%r1292, %r1293}, %rd206; 2026-02-21T09:52:09.0995682Z cvt.rn.f16x2.f32 %r1294, %r1293, %r1292; 2026-02-21T09:52:09.0996288Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.0996922Z cvt.u64.u32 %rd207, %r644; 2026-02-21T09:52:09.0997231Z cvt.u64.u32 %rd208, %r645; 2026-02-21T09:52:09.0997683Z shl.b64 %rd209, %rd208, 32; 2026-02-21T09:52:09.0998005Z or.b64 %rd210, %rd207, %rd209; 2026-02-21T09:52:09.0998575Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.0999199Z mov.b64 {%r1295, %r1296}, %rd210; 2026-02-21T09:52:09.0999539Z cvt.rn.f16x2.f32 %r1297, %r1296, %r1295; 2026-02-21T09:52:09.1000155Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1000766Z cvt.u64.u32 %rd211, %r646; 2026-02-21T09:52:09.1001206Z cvt.u64.u32 %rd212, %r647; 2026-02-21T09:52:09.1001512Z shl.b64 %rd213, %rd212, 32; 2026-02-21T09:52:09.1001815Z or.b64 %rd214, %rd211, %rd213; 2026-02-21T09:52:09.1002391Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1003016Z mov.b64 {%r1298, 
%r1299}, %rd214; 2026-02-21T09:52:09.1003373Z cvt.rn.f16x2.f32 %r1300, %r1299, %r1298; 2026-02-21T09:52:09.1003981Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1004604Z cvt.u64.u32 %rd215, %r648; 2026-02-21T09:52:09.1004989Z cvt.u64.u32 %rd216, %r649; 2026-02-21T09:52:09.1005307Z shl.b64 %rd217, %rd216, 32; 2026-02-21T09:52:09.1005625Z or.b64 %rd218, %rd215, %rd217; 2026-02-21T09:52:09.1006206Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1006829Z mov.b64 {%r1301, %r1302}, %rd218; 2026-02-21T09:52:09.1007177Z cvt.rn.f16x2.f32 %r1303, %r1302, %r1301; 2026-02-21T09:52:09.1007781Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1008382Z cvt.u64.u32 %rd219, %r650; 2026-02-21T09:52:09.1008687Z cvt.u64.u32 %rd220, %r651; 2026-02-21T09:52:09.1012396Z shl.b64 %rd221, %rd220, 32; 2026-02-21T09:52:09.1012883Z or.b64 %rd222, %rd219, %rd221; 2026-02-21T09:52:09.1013594Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1014220Z mov.b64 {%r1304, %r1305}, %rd222; 2026-02-21T09:52:09.1014570Z cvt.rn.f16x2.f32 %r1306, %r1305, %r1304; 2026-02-21T09:52:09.1015301Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1015937Z cvt.u64.u32 %rd223, %r652; 2026-02-21T09:52:09.1016258Z cvt.u64.u32 %rd224, %r653; 2026-02-21T09:52:09.1016565Z shl.b64 %rd225, %rd224, 32; 2026-02-21T09:52:09.1016890Z or.b64 %rd226, %rd223, %rd225; 2026-02-21T09:52:09.1017459Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1018077Z mov.b64 {%r1307, %r1308}, %rd226; 2026-02-21T09:52:09.1018420Z cvt.rn.f16x2.f32 %r1309, %r1308, %r1307; 2026-02-21T09:52:09.1019044Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1019654Z cvt.u64.u32 %rd227, %r654; 
2026-02-21T09:52:09.1019962Z cvt.u64.u32 %rd228, %r655; 2026-02-21T09:52:09.1020269Z shl.b64 %rd229, %rd228, 32; 2026-02-21T09:52:09.1020573Z or.b64 %rd230, %rd227, %rd229; 2026-02-21T09:52:09.1021143Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1021752Z mov.b64 {%r1310, %r1311}, %rd230; 2026-02-21T09:52:09.1022125Z cvt.rn.f16x2.f32 %r1312, %r1311, %r1310; 2026-02-21T09:52:09.1022722Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1023351Z cvt.u64.u32 %rd231, %r656; 2026-02-21T09:52:09.1023654Z cvt.u64.u32 %rd232, %r657; 2026-02-21T09:52:09.1023948Z shl.b64 %rd233, %rd232, 32; 2026-02-21T09:52:09.1024253Z or.b64 %rd234, %rd231, %rd233; 2026-02-21T09:52:09.1024972Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1025766Z mov.b64 {%r1313, %r1314}, %rd234; 2026-02-21T09:52:09.1026116Z cvt.rn.f16x2.f32 %r1315, %r1314, %r1313; 2026-02-21T09:52:09.1026740Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1027364Z cvt.u64.u32 %rd235, %r658; 2026-02-21T09:52:09.1027679Z cvt.u64.u32 %rd236, %r659; 2026-02-21T09:52:09.1027992Z shl.b64 %rd237, %rd236, 32; 2026-02-21T09:52:09.1028301Z or.b64 %rd238, %rd235, %rd237; 2026-02-21T09:52:09.1028888Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1029653Z mov.b64 {%r1316, %r1317}, %rd238; 2026-02-21T09:52:09.1030015Z cvt.rn.f16x2.f32 %r1318, %r1317, %r1316; 2026-02-21T09:52:09.1030641Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1031286Z cvt.u64.u32 %rd239, %r661; 2026-02-21T09:52:09.1031612Z cvt.u64.u32 %rd240, %r662; 2026-02-21T09:52:09.1031929Z shl.b64 %rd241, %rd240, 32; 2026-02-21T09:52:09.1032250Z or.b64 %rd242, %rd239, %rd241; 2026-02-21T09:52:09.1032831Z .loc 1 58 27 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1033468Z mov.b64 {%r1319, %r1320}, %rd242; 2026-02-21T09:52:09.1033824Z cvt.rn.f16x2.f32 %r1321, %r1320, %r1319; 2026-02-21T09:52:09.1034457Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1035187Z cvt.u64.u32 %rd243, %r663; 2026-02-21T09:52:09.1035510Z cvt.u64.u32 %rd244, %r664; 2026-02-21T09:52:09.1035826Z shl.b64 %rd245, %rd244, 32; 2026-02-21T09:52:09.1036139Z or.b64 %rd246, %rd243, %rd245; 2026-02-21T09:52:09.1036742Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1037485Z mov.b64 {%r1322, %r1323}, %rd246; 2026-02-21T09:52:09.1037851Z cvt.rn.f16x2.f32 %r1324, %r1323, %r1322; 2026-02-21T09:52:09.1038560Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1039204Z cvt.u64.u32 %rd247, %r665; 2026-02-21T09:52:09.1039519Z cvt.u64.u32 %rd248, %r666; 2026-02-21T09:52:09.1039827Z shl.b64 %rd249, %rd248, 32; 2026-02-21T09:52:09.1040148Z or.b64 %rd250, %rd247, %rd249; 2026-02-21T09:52:09.1040729Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1041369Z mov.b64 {%r1325, %r1326}, %rd250; 2026-02-21T09:52:09.1041725Z cvt.rn.f16x2.f32 %r1327, %r1326, %r1325; 2026-02-21T09:52:09.1042345Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1042966Z cvt.u64.u32 %rd251, %r667; 2026-02-21T09:52:09.1043288Z cvt.u64.u32 %rd252, %r668; 2026-02-21T09:52:09.1043605Z shl.b64 %rd253, %rd252, 32; 2026-02-21T09:52:09.1043913Z or.b64 %rd254, %rd251, %rd253; 2026-02-21T09:52:09.1044512Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1045236Z mov.b64 {%r1328, %r1329}, %rd254; 2026-02-21T09:52:09.1045611Z cvt.rn.f16x2.f32 %r1330, %r1329, %r1328; 2026-02-21T09:52:09.1046231Z .loc 1 56 52 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1046867Z cvt.u64.u32 %rd255, %r669; 2026-02-21T09:52:09.1047186Z cvt.u64.u32 %rd256, %r670; 2026-02-21T09:52:09.1047487Z shl.b64 %rd257, %rd256, 32; 2026-02-21T09:52:09.1047818Z or.b64 %rd258, %rd255, %rd257; 2026-02-21T09:52:09.1048408Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1049046Z mov.b64 {%r1331, %r1332}, %rd258; 2026-02-21T09:52:09.1049393Z cvt.rn.f16x2.f32 %r1333, %r1332, %r1331; 2026-02-21T09:52:09.1050026Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1050783Z cvt.u64.u32 %rd259, %r671; 2026-02-21T09:52:09.1051104Z cvt.u64.u32 %rd260, %r672; 2026-02-21T09:52:09.1051416Z shl.b64 %rd261, %rd260, 32; 2026-02-21T09:52:09.1051726Z or.b64 %rd262, %rd259, %rd261; 2026-02-21T09:52:09.1052316Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1052945Z mov.b64 {%r1334, %r1335}, %rd262; 2026-02-21T09:52:09.1053304Z cvt.rn.f16x2.f32 %r1336, %r1335, %r1334; 2026-02-21T09:52:09.1053918Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1054659Z cvt.u64.u32 %rd263, %r673; 2026-02-21T09:52:09.1055055Z cvt.u64.u32 %rd264, %r674; 2026-02-21T09:52:09.1055362Z shl.b64 %rd265, %rd264, 32; 2026-02-21T09:52:09.1055684Z or.b64 %rd266, %rd263, %rd265; 2026-02-21T09:52:09.1056282Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1056944Z mov.b64 {%r1337, %r1338}, %rd266; 2026-02-21T09:52:09.1057277Z cvt.rn.f16x2.f32 %r1339, %r1338, %r1337; 2026-02-21T09:52:09.1057899Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1058562Z cvt.u64.u32 %rd267, %r675; 2026-02-21T09:52:09.1058858Z cvt.u64.u32 %rd268, %r676; 2026-02-21T09:52:09.1059162Z shl.b64 %rd269, %rd268, 32; 2026-02-21T09:52:09.1059466Z or.b64 
%rd270, %rd267, %rd269; 2026-02-21T09:52:09.1060040Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1060664Z mov.b64 {%r1340, %r1341}, %rd270; 2026-02-21T09:52:09.1061012Z cvt.rn.f16x2.f32 %r1342, %r1341, %r1340; 2026-02-21T09:52:09.1061616Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1062342Z cvt.u64.u32 %rd271, %r678; 2026-02-21T09:52:09.1062667Z cvt.u64.u32 %rd272, %r679; 2026-02-21T09:52:09.1063046Z shl.b64 %rd273, %rd272, 32; 2026-02-21T09:52:09.1063367Z or.b64 %rd274, %rd271, %rd273; 2026-02-21T09:52:09.1063931Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1064562Z mov.b64 {%r1343, %r1344}, %rd274; 2026-02-21T09:52:09.1065004Z cvt.rn.f16x2.f32 %r1345, %r1344, %r1343; 2026-02-21T09:52:09.1065643Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1066290Z cvt.u64.u32 %rd275, %r680; 2026-02-21T09:52:09.1066595Z cvt.u64.u32 %rd276, %r681; 2026-02-21T09:52:09.1066923Z shl.b64 %rd277, %rd276, 32; 2026-02-21T09:52:09.1067235Z or.b64 %rd278, %rd275, %rd277; 2026-02-21T09:52:09.1067827Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1068458Z mov.b64 {%r1346, %r1347}, %rd278; 2026-02-21T09:52:09.1068820Z cvt.rn.f16x2.f32 %r1348, %r1347, %r1346; 2026-02-21T09:52:09.1069453Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1070102Z cvt.u64.u32 %rd279, %r682; 2026-02-21T09:52:09.1070420Z cvt.u64.u32 %rd280, %r683; 2026-02-21T09:52:09.1070726Z shl.b64 %rd281, %rd280, 32; 2026-02-21T09:52:09.1071044Z or.b64 %rd282, %rd279, %rd281; 2026-02-21T09:52:09.1071632Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1072273Z mov.b64 {%r1349, %r1350}, %rd282; 2026-02-21T09:52:09.1072627Z cvt.rn.f16x2.f32 %r1351, %r1350, %r1349; 
2026-02-21T09:52:09.1073275Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1073915Z cvt.u64.u32 %rd283, %r684; 2026-02-21T09:52:09.1074223Z cvt.u64.u32 %rd284, %r685; 2026-02-21T09:52:09.1074537Z shl.b64 %rd285, %rd284, 32; 2026-02-21T09:52:09.1074937Z or.b64 %rd286, %rd283, %rd285; 2026-02-21T09:52:09.1075537Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1076305Z mov.b64 {%r1352, %r1353}, %rd286; 2026-02-21T09:52:09.1076662Z cvt.rn.f16x2.f32 %r1354, %r1353, %r1352; 2026-02-21T09:52:09.1077281Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1077918Z cvt.u64.u32 %rd287, %r686; 2026-02-21T09:52:09.1078236Z cvt.u64.u32 %rd288, %r687; 2026-02-21T09:52:09.1078544Z shl.b64 %rd289, %rd288, 32; 2026-02-21T09:52:09.1078867Z or.b64 %rd290, %rd287, %rd289; 2026-02-21T09:52:09.1079576Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1080221Z mov.b64 {%r1355, %r1356}, %rd290; 2026-02-21T09:52:09.1080571Z cvt.rn.f16x2.f32 %r1357, %r1356, %r1355; 2026-02-21T09:52:09.1081207Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1081854Z cvt.u64.u32 %rd291, %r688; 2026-02-21T09:52:09.1082162Z cvt.u64.u32 %rd292, %r689; 2026-02-21T09:52:09.1082479Z shl.b64 %rd293, %rd292, 32; 2026-02-21T09:52:09.1082790Z or.b64 %rd294, %rd291, %rd293; 2026-02-21T09:52:09.1083388Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1083999Z mov.b64 {%r1358, %r1359}, %rd294; 2026-02-21T09:52:09.1084344Z cvt.rn.f16x2.f32 %r1360, %r1359, %r1358; 2026-02-21T09:52:09.1085037Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1085682Z cvt.u64.u32 %rd295, %r690; 2026-02-21T09:52:09.1085999Z cvt.u64.u32 %rd296, %r691; 2026-02-21T09:52:09.1086298Z shl.b64 %rd297, %rd296, 
32; 2026-02-21T09:52:09.1086610Z or.b64 %rd298, %rd295, %rd297; 2026-02-21T09:52:09.1087305Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1088011Z mov.b64 {%r1361, %r1362}, %rd298; 2026-02-21T09:52:09.1088362Z cvt.rn.f16x2.f32 %r1363, %r1362, %r1361; 2026-02-21T09:52:09.1088979Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1089604Z cvt.u64.u32 %rd299, %r692; 2026-02-21T09:52:09.1089912Z cvt.u64.u32 %rd300, %r693; 2026-02-21T09:52:09.1090228Z shl.b64 %rd301, %rd300, 32; 2026-02-21T09:52:09.1090537Z or.b64 %rd302, %rd299, %rd301; 2026-02-21T09:52:09.1091128Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1091764Z mov.b64 {%r1364, %r1365}, %rd302; 2026-02-21T09:52:09.1092128Z cvt.rn.f16x2.f32 %r1366, %r1365, %r1364; 2026-02-21T09:52:09.1092735Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1092853Z cvt.u64.u32 %rd303, %r695; 2026-02-21T09:52:09.1092963Z cvt.u64.u32 %rd304, %r696; 2026-02-21T09:52:09.1093079Z shl.b64 %rd305, %rd304, 32; 2026-02-21T09:52:09.1093202Z or.b64 %rd306, %rd303, %rd305; 2026-02-21T09:52:09.1093567Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1093676Z mov.b64 {%r1367, %r1368}, %rd306; 2026-02-21T09:52:09.1093811Z cvt.rn.f16x2.f32 %r1369, %r1368, %r1367; 2026-02-21T09:52:09.1094172Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1094279Z cvt.u64.u32 %rd307, %r697; 2026-02-21T09:52:09.1094385Z cvt.u64.u32 %rd308, %r698; 2026-02-21T09:52:09.1094508Z shl.b64 %rd309, %rd308, 32; 2026-02-21T09:52:09.1094618Z or.b64 %rd310, %rd307, %rd309; 2026-02-21T09:52:09.1095086Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1095216Z mov.b64 {%r1370, %r1371}, %rd310; 2026-02-21T09:52:09.1095346Z 
cvt.rn.f16x2.f32 %r1372, %r1371, %r1370; 2026-02-21T09:52:09.1095717Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1095949Z cvt.u64.u32 %rd311, %r699; 2026-02-21T09:52:09.1096061Z cvt.u64.u32 %rd312, %r700; 2026-02-21T09:52:09.1096181Z shl.b64 %rd313, %rd312, 32; 2026-02-21T09:52:09.1096290Z or.b64 %rd314, %rd311, %rd313; 2026-02-21T09:52:09.1096665Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1096773Z mov.b64 {%r1373, %r1374}, %rd314; 2026-02-21T09:52:09.1096896Z cvt.rn.f16x2.f32 %r1375, %r1374, %r1373; 2026-02-21T09:52:09.1097353Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1097459Z cvt.u64.u32 %rd315, %r701; 2026-02-21T09:52:09.1097564Z cvt.u64.u32 %rd316, %r702; 2026-02-21T09:52:09.1097679Z shl.b64 %rd317, %rd316, 32; 2026-02-21T09:52:09.1097792Z or.b64 %rd318, %rd315, %rd317; 2026-02-21T09:52:09.1098166Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1098283Z mov.b64 {%r1376, %r1377}, %rd318; 2026-02-21T09:52:09.1098414Z cvt.rn.f16x2.f32 %r1378, %r1377, %r1376; 2026-02-21T09:52:09.1098787Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1098897Z cvt.u64.u32 %rd319, %r703; 2026-02-21T09:52:09.1099016Z cvt.u64.u32 %rd320, %r704; 2026-02-21T09:52:09.1099126Z shl.b64 %rd321, %rd320, 32; 2026-02-21T09:52:09.1099236Z or.b64 %rd322, %rd319, %rd321; 2026-02-21T09:52:09.1099615Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1099725Z mov.b64 {%r1379, %r1380}, %rd322; 2026-02-21T09:52:09.1099850Z cvt.rn.f16x2.f32 %r1381, %r1380, %r1379; 2026-02-21T09:52:09.1100317Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1100508Z cvt.u64.u32 %rd323, %r705; 2026-02-21T09:52:09.1100622Z cvt.u64.u32 %rd324, %r706; 
2026-02-21T09:52:09.1100732Z shl.b64 %rd325, %rd324, 32; 2026-02-21T09:52:09.1100852Z or.b64 %rd326, %rd323, %rd325; 2026-02-21T09:52:09.1101227Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1101337Z mov.b64 {%r1382, %r1383}, %rd326; 2026-02-21T09:52:09.1101471Z cvt.rn.f16x2.f32 %r1384, %r1383, %r1382; 2026-02-21T09:52:09.1101841Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1101957Z cvt.u64.u32 %rd327, %r707; 2026-02-21T09:52:09.1102064Z cvt.u64.u32 %rd328, %r708; 2026-02-21T09:52:09.1102183Z shl.b64 %rd329, %rd328, 32; 2026-02-21T09:52:09.1102293Z or.b64 %rd330, %rd327, %rd329; 2026-02-21T09:52:09.1102661Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1102783Z mov.b64 {%r1385, %r1386}, %rd330; 2026-02-21T09:52:09.1102914Z cvt.rn.f16x2.f32 %r1387, %r1386, %r1385; 2026-02-21T09:52:09.1103274Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1103394Z cvt.u64.u32 %rd331, %r709; 2026-02-21T09:52:09.1103503Z cvt.u64.u32 %rd332, %r710; 2026-02-21T09:52:09.1103613Z shl.b64 %rd333, %rd332, 32; 2026-02-21T09:52:09.1103722Z or.b64 %rd334, %rd331, %rd333; 2026-02-21T09:52:09.1104092Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1104208Z mov.b64 {%r1388, %r1389}, %rd334; 2026-02-21T09:52:09.1104336Z cvt.rn.f16x2.f32 %r1390, %r1389, %r1388; 2026-02-21T09:52:09.1104778Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1104895Z cvt.u64.u32 %rd335, %r712; 2026-02-21T09:52:09.1105007Z cvt.u64.u32 %rd336, %r713; 2026-02-21T09:52:09.1105133Z shl.b64 %rd337, %rd336, 32; 2026-02-21T09:52:09.1105344Z or.b64 %rd338, %rd335, %rd337; 2026-02-21T09:52:09.1105712Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1105822Z mov.b64 {%r1391, 
%r1392}, %rd338; 2026-02-21T09:52:09.1105958Z cvt.rn.f16x2.f32 %r1393, %r1392, %r1391; 2026-02-21T09:52:09.1106324Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1106434Z cvt.u64.u32 %rd339, %r714; 2026-02-21T09:52:09.1106551Z cvt.u64.u32 %rd340, %r715; 2026-02-21T09:52:09.1106755Z shl.b64 %rd341, %rd340, 32; 2026-02-21T09:52:09.1106866Z or.b64 %rd342, %rd339, %rd341; 2026-02-21T09:52:09.1107247Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1107360Z mov.b64 {%r1394, %r1395}, %rd342; 2026-02-21T09:52:09.1107493Z cvt.rn.f16x2.f32 %r1396, %r1395, %r1394; 2026-02-21T09:52:09.1107867Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1107992Z cvt.u64.u32 %rd343, %r716; 2026-02-21T09:52:09.1108105Z cvt.u64.u32 %rd344, %r717; 2026-02-21T09:52:09.1108214Z shl.b64 %rd345, %rd344, 32; 2026-02-21T09:52:09.1108336Z or.b64 %rd346, %rd343, %rd345; 2026-02-21T09:52:09.1108703Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1108816Z mov.b64 {%r1397, %r1398}, %rd346; 2026-02-21T09:52:09.1108951Z cvt.rn.f16x2.f32 %r1399, %r1398, %r1397; 2026-02-21T09:52:09.1109321Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1109431Z cvt.u64.u32 %rd347, %r718; 2026-02-21T09:52:09.1109542Z cvt.u64.u32 %rd348, %r719; 2026-02-21T09:52:09.1109660Z shl.b64 %rd349, %rd348, 32; 2026-02-21T09:52:09.1109863Z or.b64 %rd350, %rd347, %rd349; 2026-02-21T09:52:09.1110309Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1110443Z mov.b64 {%r1400, %r1401}, %rd350; 2026-02-21T09:52:09.1110569Z cvt.rn.f16x2.f32 %r1402, %r1401, %r1400; 2026-02-21T09:52:09.1110939Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1111056Z cvt.u64.u32 %rd351, %r720; 
2026-02-21T09:52:09.1111164Z cvt.u64.u32 %rd352, %r721; 2026-02-21T09:52:09.1111273Z shl.b64 %rd353, %rd352, 32; 2026-02-21T09:52:09.1111380Z or.b64 %rd354, %rd351, %rd353; 2026-02-21T09:52:09.1111759Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1111873Z mov.b64 {%r1403, %r1404}, %rd354; 2026-02-21T09:52:09.1112004Z cvt.rn.f16x2.f32 %r1405, %r1404, %r1403; 2026-02-21T09:52:09.1112381Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1112508Z cvt.u64.u32 %rd355, %r722; 2026-02-21T09:52:09.1134487Z cvt.u64.u32 %rd356, %r723; 2026-02-21T09:52:09.1134779Z shl.b64 %rd357, %rd356, 32; 2026-02-21T09:52:09.1134905Z or.b64 %rd358, %rd355, %rd357; 2026-02-21T09:52:09.1135329Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1135468Z mov.b64 {%r1406, %r1407}, %rd358; 2026-02-21T09:52:09.1135610Z cvt.rn.f16x2.f32 %r1408, %r1407, %r1406; 2026-02-21T09:52:09.1135998Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1136144Z cvt.u64.u32 %rd359, %r724; 2026-02-21T09:52:09.1136258Z cvt.u64.u32 %rd360, %r725; 2026-02-21T09:52:09.1136381Z shl.b64 %rd361, %rd360, 32; 2026-02-21T09:52:09.1136498Z or.b64 %rd362, %rd359, %rd361; 2026-02-21T09:52:09.1136884Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1137004Z mov.b64 {%r1409, %r1410}, %rd362; 2026-02-21T09:52:09.1137319Z cvt.rn.f16x2.f32 %r1411, %r1410, %r1409; 2026-02-21T09:52:09.1137705Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1137822Z cvt.u64.u32 %rd363, %r726; 2026-02-21T09:52:09.1137932Z cvt.u64.u32 %rd364, %r727; 2026-02-21T09:52:09.1138044Z shl.b64 %rd365, %rd364, 32; 2026-02-21T09:52:09.1138169Z or.b64 %rd366, %rd363, %rd365; 2026-02-21T09:52:09.1138537Z .loc 1 58 27 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1138754Z mov.b64 {%r1412, %r1413}, %rd366; 2026-02-21T09:52:09.1138900Z cvt.rn.f16x2.f32 %r1414, %r1413, %r1412; 2026-02-21T09:52:09.1139266Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1139377Z cvt.u64.u32 %rd367, %r729; 2026-02-21T09:52:09.1139500Z cvt.u64.u32 %rd368, %r730; 2026-02-21T09:52:09.1139612Z shl.b64 %rd369, %rd368, 32; 2026-02-21T09:52:09.1139733Z or.b64 %rd370, %rd367, %rd369; 2026-02-21T09:52:09.1140099Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1140227Z mov.b64 {%r1415, %r1416}, %rd370; 2026-02-21T09:52:09.1140359Z cvt.rn.f16x2.f32 %r1417, %r1416, %r1415; 2026-02-21T09:52:09.1140719Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1140846Z cvt.u64.u32 %rd371, %r731; 2026-02-21T09:52:09.1140959Z cvt.u64.u32 %rd372, %r732; 2026-02-21T09:52:09.1141076Z shl.b64 %rd373, %rd372, 32; 2026-02-21T09:52:09.1141198Z or.b64 %rd374, %rd371, %rd373; 2026-02-21T09:52:09.1141561Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1141672Z mov.b64 {%r1418, %r1419}, %rd374; 2026-02-21T09:52:09.1141905Z cvt.rn.f16x2.f32 %r1420, %r1419, %r1418; 2026-02-21T09:52:09.1142359Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1142484Z cvt.u64.u32 %rd375, %r733; 2026-02-21T09:52:09.1142594Z cvt.u64.u32 %rd376, %r734; 2026-02-21T09:52:09.1142716Z shl.b64 %rd377, %rd376, 32; 2026-02-21T09:52:09.1142830Z or.b64 %rd378, %rd375, %rd377; 2026-02-21T09:52:09.1143200Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1143320Z mov.b64 {%r1421, %r1422}, %rd378; 2026-02-21T09:52:09.1143449Z cvt.rn.f16x2.f32 %r1423, %r1422, %r1421; 2026-02-21T09:52:09.1143815Z .loc 1 56 52 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1143944Z cvt.u64.u32 %rd379, %r735; 2026-02-21T09:52:09.1144056Z cvt.u64.u32 %rd380, %r736; 2026-02-21T09:52:09.1144167Z shl.b64 %rd381, %rd380, 32; 2026-02-21T09:52:09.1144287Z or.b64 %rd382, %rd379, %rd381; 2026-02-21T09:52:09.1144782Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1144908Z mov.b64 {%r1424, %r1425}, %rd382; 2026-02-21T09:52:09.1145040Z cvt.rn.f16x2.f32 %r1426, %r1425, %r1424; 2026-02-21T09:52:09.1145426Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1145540Z cvt.u64.u32 %rd383, %r737; 2026-02-21T09:52:09.1145652Z cvt.u64.u32 %rd384, %r738; 2026-02-21T09:52:09.1145764Z shl.b64 %rd385, %rd384, 32; 2026-02-21T09:52:09.1145888Z or.b64 %rd386, %rd383, %rd385; 2026-02-21T09:52:09.1146266Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1146385Z mov.b64 {%r1427, %r1428}, %rd386; 2026-02-21T09:52:09.1146527Z cvt.rn.f16x2.f32 %r1429, %r1428, %r1427; 2026-02-21T09:52:09.1146907Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1147018Z cvt.u64.u32 %rd387, %r739; 2026-02-21T09:52:09.1147259Z cvt.u64.u32 %rd388, %r740; 2026-02-21T09:52:09.1147372Z shl.b64 %rd389, %rd388, 32; 2026-02-21T09:52:09.1147486Z or.b64 %rd390, %rd387, %rd389; 2026-02-21T09:52:09.1147859Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1147984Z mov.b64 {%r1430, %r1431}, %rd390; 2026-02-21T09:52:09.1148115Z cvt.rn.f16x2.f32 %r1432, %r1431, %r1430; 2026-02-21T09:52:09.1148486Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1148702Z cvt.u64.u32 %rd391, %r741; 2026-02-21T09:52:09.1148818Z cvt.u64.u32 %rd392, %r742; 2026-02-21T09:52:09.1148929Z shl.b64 %rd393, %rd392, 32; 2026-02-21T09:52:09.1149053Z or.b64 
%rd394, %rd391, %rd393; 2026-02-21T09:52:09.1149427Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1149545Z mov.b64 {%r1433, %r1434}, %rd394; 2026-02-21T09:52:09.1149685Z cvt.rn.f16x2.f32 %r1435, %r1434, %r1433; 2026-02-21T09:52:09.1150065Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1150174Z cvt.u64.u32 %rd395, %r743; 2026-02-21T09:52:09.1150284Z cvt.u64.u32 %rd396, %r744; 2026-02-21T09:52:09.1150408Z shl.b64 %rd397, %rd396, 32; 2026-02-21T09:52:09.1150522Z or.b64 %rd398, %rd395, %rd397; 2026-02-21T09:52:09.1150891Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1151016Z mov.b64 {%r1436, %r1437}, %rd398; 2026-02-21T09:52:09.1151149Z cvt.rn.f16x2.f32 %r1438, %r1437, %r1436; 2026-02-21T09:52:09.1151513Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1151637Z cvt.u64.u32 %rd399, %r746; 2026-02-21T09:52:09.1151747Z cvt.u64.u32 %rd400, %r747; 2026-02-21T09:52:09.1151980Z shl.b64 %rd401, %rd400, 32; 2026-02-21T09:52:09.1152170Z or.b64 %rd402, %rd399, %rd401; 2026-02-21T09:52:09.1152560Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1152676Z mov.b64 {%r1439, %r1440}, %rd402; 2026-02-21T09:52:09.1152803Z cvt.rn.f16x2.f32 %r1441, %r1440, %r1439; 2026-02-21T09:52:09.1153182Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1153295Z cvt.u64.u32 %rd403, %r748; 2026-02-21T09:52:09.1153397Z cvt.u64.u32 %rd404, %r749; 2026-02-21T09:52:09.1153511Z shl.b64 %rd405, %rd404, 32; 2026-02-21T09:52:09.1153641Z or.b64 %rd406, %rd403, %rd405; 2026-02-21T09:52:09.1154006Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1154119Z mov.b64 {%r1442, %r1443}, %rd406; 2026-02-21T09:52:09.1154262Z cvt.rn.f16x2.f32 %r1444, %r1443, %r1442; 
2026-02-21T09:52:09.1154632Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1154829Z cvt.u64.u32 %rd407, %r750; 2026-02-21T09:52:09.1154952Z cvt.u64.u32 %rd408, %r751; 2026-02-21T09:52:09.1155061Z shl.b64 %rd409, %rd408, 32; 2026-02-21T09:52:09.1155175Z or.b64 %rd410, %rd407, %rd409; 2026-02-21T09:52:09.1155538Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1155663Z mov.b64 {%r1445, %r1446}, %rd410; 2026-02-21T09:52:09.1155792Z cvt.rn.f16x2.f32 %r1447, %r1446, %r1445; 2026-02-21T09:52:09.1156153Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1156277Z cvt.u64.u32 %rd411, %r752; 2026-02-21T09:52:09.1156390Z cvt.u64.u32 %rd412, %r753; 2026-02-21T09:52:09.1156501Z shl.b64 %rd413, %rd412, 32; 2026-02-21T09:52:09.1156621Z or.b64 %rd414, %rd411, %rd413; 2026-02-21T09:52:09.1157002Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1157226Z mov.b64 {%r1448, %r1449}, %rd414; 2026-02-21T09:52:09.1157356Z cvt.rn.f16x2.f32 %r1450, %r1449, %r1448; 2026-02-21T09:52:09.1157734Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1157840Z cvt.u64.u32 %rd415, %r754; 2026-02-21T09:52:09.1157948Z cvt.u64.u32 %rd416, %r755; 2026-02-21T09:52:09.1158066Z shl.b64 %rd417, %rd416, 32; 2026-02-21T09:52:09.1158173Z or.b64 %rd418, %rd415, %rd417; 2026-02-21T09:52:09.1158526Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1158746Z mov.b64 {%r1451, %r1452}, %rd418; 2026-02-21T09:52:09.1158873Z cvt.rn.f16x2.f32 %r1453, %r1452, %r1451; 2026-02-21T09:52:09.1159237Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1159366Z cvt.u64.u32 %rd419, %r756; 2026-02-21T09:52:09.1159486Z cvt.u64.u32 %rd420, %r757; 2026-02-21T09:52:09.1159597Z shl.b64 %rd421, %rd420, 
32; 2026-02-21T09:52:09.1159707Z or.b64 %rd422, %rd419, %rd421; 2026-02-21T09:52:09.1160086Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1160199Z mov.b64 {%r1454, %r1455}, %rd422; 2026-02-21T09:52:09.1160325Z cvt.rn.f16x2.f32 %r1456, %r1455, %r1454; 2026-02-21T09:52:09.1160703Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1160817Z cvt.u64.u32 %rd423, %r758; 2026-02-21T09:52:09.1160933Z cvt.u64.u32 %rd424, %r759; 2026-02-21T09:52:09.1161046Z shl.b64 %rd425, %rd424, 32; 2026-02-21T09:52:09.1161164Z or.b64 %rd426, %rd423, %rd425; 2026-02-21T09:52:09.1161536Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1161751Z mov.b64 {%r1457, %r1458}, %rd426; 2026-02-21T09:52:09.1161964Z cvt.rn.f16x2.f32 %r1459, %r1458, %r1457; 2026-02-21T09:52:09.1162343Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1162455Z cvt.u64.u32 %rd427, %r760; 2026-02-21T09:52:09.1162577Z cvt.u64.u32 %rd428, %r761; 2026-02-21T09:52:09.1162687Z shl.b64 %rd429, %rd428, 32; 2026-02-21T09:52:09.1162799Z or.b64 %rd430, %rd427, %rd429; 2026-02-21T09:52:09.1163162Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1163284Z mov.b64 {%r1460, %r1461}, %rd430; 2026-02-21T09:52:09.1163420Z cvt.rn.f16x2.f32 %r1462, %r1461, %r1460; 2026-02-21T09:52:09.1163786Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1163909Z cvt.u64.u32 %rd431, %r763; 2026-02-21T09:52:09.1164024Z cvt.u64.u32 %rd432, %r764; 2026-02-21T09:52:09.1164140Z shl.b64 %rd433, %rd432, 32; 2026-02-21T09:52:09.1164267Z or.b64 %rd434, %rd431, %rd433; 2026-02-21T09:52:09.1164638Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1164817Z mov.b64 {%r1463, %r1464}, %rd434; 2026-02-21T09:52:09.1164949Z 
cvt.rn.f16x2.f32 %r1465, %r1464, %r1463; 2026-02-21T09:52:09.1165330Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1165440Z cvt.u64.u32 %rd435, %r765; 2026-02-21T09:52:09.1165549Z cvt.u64.u32 %rd436, %r766; 2026-02-21T09:52:09.1165673Z shl.b64 %rd437, %rd436, 32; 2026-02-21T09:52:09.1165791Z or.b64 %rd438, %rd435, %rd437; 2026-02-21T09:52:09.1166164Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1166280Z mov.b64 {%r1466, %r1467}, %rd438; 2026-02-21T09:52:09.1166408Z cvt.rn.f16x2.f32 %r1468, %r1467, %r1466; 2026-02-21T09:52:09.1166785Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1167014Z cvt.u64.u32 %rd439, %r767; 2026-02-21T09:52:09.1167124Z cvt.u64.u32 %rd440, %r768; 2026-02-21T09:52:09.1167235Z shl.b64 %rd441, %rd440, 32; 2026-02-21T09:52:09.1167347Z or.b64 %rd442, %rd439, %rd441; 2026-02-21T09:52:09.1167728Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1167840Z mov.b64 {%r1469, %r1470}, %rd442; 2026-02-21T09:52:09.1167965Z cvt.rn.f16x2.f32 %r1471, %r1470, %r1469; 2026-02-21T09:52:09.1168344Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1168554Z cvt.u64.u32 %rd443, %r769; 2026-02-21T09:52:09.1168665Z cvt.u64.u32 %rd444, %r770; 2026-02-21T09:52:09.1168778Z shl.b64 %rd445, %rd444, 32; 2026-02-21T09:52:09.1168898Z or.b64 %rd446, %rd443, %rd445; 2026-02-21T09:52:09.1169274Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1169390Z mov.b64 {%r1472, %r1473}, %rd446; 2026-02-21T09:52:09.1169526Z cvt.rn.f16x2.f32 %r1474, %r1473, %r1472; 2026-02-21T09:52:09.1169896Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1170007Z cvt.u64.u32 %rd447, %r771; 2026-02-21T09:52:09.1170126Z cvt.u64.u32 %rd448, %r772; 
2026-02-21T09:52:09.1170236Z shl.b64 %rd449, %rd448, 32; 2026-02-21T09:52:09.1170348Z or.b64 %rd450, %rd447, %rd449; 2026-02-21T09:52:09.1170714Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1170845Z mov.b64 {%r1475, %r1476}, %rd450; 2026-02-21T09:52:09.1170973Z cvt.rn.f16x2.f32 %r1477, %r1476, %r1475; 2026-02-21T09:52:09.1171336Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1171604Z cvt.u64.u32 %rd451, %r773; 2026-02-21T09:52:09.1171791Z cvt.u64.u32 %rd452, %r774; 2026-02-21T09:52:09.1171912Z shl.b64 %rd453, %rd452, 32; 2026-02-21T09:52:09.1172034Z or.b64 %rd454, %rd451, %rd453; 2026-02-21T09:52:09.1172406Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1172516Z mov.b64 {%r1478, %r1479}, %rd454; 2026-02-21T09:52:09.1172635Z cvt.rn.f16x2.f32 %r1480, %r1479, %r1478; 2026-02-21T09:52:09.1173015Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1173124Z cvt.u64.u32 %rd455, %r775; 2026-02-21T09:52:09.1173239Z cvt.u64.u32 %rd456, %r776; 2026-02-21T09:52:09.1173359Z shl.b64 %rd457, %rd456, 32; 2026-02-21T09:52:09.1173470Z or.b64 %rd458, %rd455, %rd457; 2026-02-21T09:52:09.1173835Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1173960Z mov.b64 {%r1481, %r1482}, %rd458; 2026-02-21T09:52:09.1174092Z cvt.rn.f16x2.f32 %r1483, %r1482, %r1481; 2026-02-21T09:52:09.1174460Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1174580Z cvt.u64.u32 %rd459, %r777; 2026-02-21T09:52:09.1174780Z cvt.u64.u32 %rd460, %r778; 2026-02-21T09:52:09.1174895Z shl.b64 %rd461, %rd460, 32; 2026-02-21T09:52:09.1175007Z or.b64 %rd462, %rd459, %rd461; 2026-02-21T09:52:09.1175389Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1175503Z mov.b64 {%r1484, 
%r1485}, %rd462; 2026-02-21T09:52:09.1175637Z cvt.rn.f16x2.f32 %r1486, %r1485, %r1484; 2026-02-21T09:52:09.1176014Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1176128Z cvt.u64.u32 %rd463, %r780; 2026-02-21T09:52:09.1176237Z cvt.u64.u32 %rd464, %r781; 2026-02-21T09:52:09.1176354Z shl.b64 %rd465, %rd464, 32; 2026-02-21T09:52:09.1176485Z or.b64 %rd466, %rd463, %rd465; 2026-02-21T09:52:09.1176961Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1177073Z mov.b64 {%r1487, %r1488}, %rd466; 2026-02-21T09:52:09.1177211Z cvt.rn.f16x2.f32 %r1489, %r1488, %r1487; 2026-02-21T09:52:09.1177573Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1177685Z cvt.u64.u32 %rd467, %r782; 2026-02-21T09:52:09.1177805Z cvt.u64.u32 %rd468, %r783; 2026-02-21T09:52:09.1177914Z shl.b64 %rd469, %rd468, 32; 2026-02-21T09:52:09.1178119Z or.b64 %rd470, %rd467, %rd469; 2026-02-21T09:52:09.1178500Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1178622Z mov.b64 {%r1490, %r1491}, %rd470; 2026-02-21T09:52:09.1178746Z cvt.rn.f16x2.f32 %r1492, %r1491, %r1490; 2026-02-21T09:52:09.1179110Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1179233Z cvt.u64.u32 %rd471, %r784; 2026-02-21T09:52:09.1179338Z cvt.u64.u32 %rd472, %r785; 2026-02-21T09:52:09.1179447Z shl.b64 %rd473, %rd472, 32; 2026-02-21T09:52:09.1179566Z or.b64 %rd474, %rd471, %rd473; 2026-02-21T09:52:09.1179925Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1180033Z mov.b64 {%r1493, %r1494}, %rd474; 2026-02-21T09:52:09.1180156Z cvt.rn.f16x2.f32 %r1495, %r1494, %r1493; 2026-02-21T09:52:09.1180521Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1180631Z cvt.u64.u32 %rd475, %r786; 
2026-02-21T09:52:09.1180735Z cvt.u64.u32 %rd476, %r787; 2026-02-21T09:52:09.1180854Z shl.b64 %rd477, %rd476, 32; 2026-02-21T09:52:09.1180964Z or.b64 %rd478, %rd475, %rd477; 2026-02-21T09:52:09.1181485Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1181625Z mov.b64 {%r1496, %r1497}, %rd478; 2026-02-21T09:52:09.1181747Z cvt.rn.f16x2.f32 %r1498, %r1497, %r1496; 2026-02-21T09:52:09.1182109Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1182228Z cvt.u64.u32 %rd479, %r788; 2026-02-21T09:52:09.1182336Z cvt.u64.u32 %rd480, %r789; 2026-02-21T09:52:09.1182444Z shl.b64 %rd481, %rd480, 32; 2026-02-21T09:52:09.1182553Z or.b64 %rd482, %rd479, %rd481; 2026-02-21T09:52:09.1182928Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1183044Z mov.b64 {%r1499, %r1500}, %rd482; 2026-02-21T09:52:09.1183167Z cvt.rn.f16x2.f32 %r1501, %r1500, %r1499; 2026-02-21T09:52:09.1183532Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1183645Z cvt.u64.u32 %rd483, %r790; 2026-02-21T09:52:09.1183757Z cvt.u64.u32 %rd484, %r791; 2026-02-21T09:52:09.1183870Z shl.b64 %rd485, %rd484, 32; 2026-02-21T09:52:09.1183992Z or.b64 %rd486, %rd483, %rd485; 2026-02-21T09:52:09.1184351Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1184461Z mov.b64 {%r1502, %r1503}, %rd486; 2026-02-21T09:52:09.1184596Z cvt.rn.f16x2.f32 %r1504, %r1503, %r1502; 2026-02-21T09:52:09.1185031Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1185140Z cvt.u64.u32 %rd487, %r792; 2026-02-21T09:52:09.1185265Z cvt.u64.u32 %rd488, %r793; 2026-02-21T09:52:09.1185371Z shl.b64 %rd489, %rd488, 32; 2026-02-21T09:52:09.1185484Z or.b64 %rd490, %rd487, %rd489; 2026-02-21T09:52:09.1185853Z .loc 1 58 27 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1185982Z mov.b64 {%r1505, %r1506}, %rd490; 2026-02-21T09:52:09.1186117Z cvt.rn.f16x2.f32 %r1507, %r1506, %r1505; 2026-02-21T09:52:09.1186591Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1186716Z cvt.u64.u32 %rd491, %r794; 2026-02-21T09:52:09.1186827Z cvt.u64.u32 %rd492, %r795; 2026-02-21T09:52:09.1186938Z shl.b64 %rd493, %rd492, 32; 2026-02-21T09:52:09.1187063Z or.b64 %rd494, %rd491, %rd493; 2026-02-21T09:52:09.1187436Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1187547Z mov.b64 {%r1508, %r1509}, %rd494; 2026-02-21T09:52:09.1187767Z cvt.rn.f16x2.f32 %r1510, %r1509, %r1508; 2026-02-21T09:52:09.1188152Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1188264Z cvt.u64.u32 %rd495, %r797; 2026-02-21T09:52:09.1188374Z cvt.u64.u32 %rd496, %r798; 2026-02-21T09:52:09.1188500Z shl.b64 %rd497, %rd496, 32; 2026-02-21T09:52:09.1188608Z or.b64 %rd498, %rd495, %rd497; 2026-02-21T09:52:09.1188990Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1189114Z mov.b64 {%r1511, %r1512}, %rd498; 2026-02-21T09:52:09.1189240Z cvt.rn.f16x2.f32 %r1513, %r1512, %r1511; 2026-02-21T09:52:09.1189605Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1189727Z cvt.u64.u32 %rd499, %r799; 2026-02-21T09:52:09.1189834Z cvt.u64.u32 %rd500, %r800; 2026-02-21T09:52:09.1189946Z shl.b64 %rd501, %rd500, 32; 2026-02-21T09:52:09.1190062Z or.b64 %rd502, %rd499, %rd501; 2026-02-21T09:52:09.1190441Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1190555Z mov.b64 {%r1514, %r1515}, %rd502; 2026-02-21T09:52:09.1190687Z cvt.rn.f16x2.f32 %r1516, %r1515, %r1514; 2026-02-21T09:52:09.1191214Z .loc 1 56 52 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1191344Z cvt.u64.u32 %rd503, %r801; 2026-02-21T09:52:09.1191454Z cvt.u64.u32 %rd504, %r802; 2026-02-21T09:52:09.1191563Z shl.b64 %rd505, %rd504, 32; 2026-02-21T09:52:09.1191683Z or.b64 %rd506, %rd503, %rd505; 2026-02-21T09:52:09.1192056Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1192169Z mov.b64 {%r1517, %r1518}, %rd506; 2026-02-21T09:52:09.1192308Z cvt.rn.f16x2.f32 %r1519, %r1518, %r1517; 2026-02-21T09:52:09.1192675Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1192791Z cvt.u64.u32 %rd507, %r803; 2026-02-21T09:52:09.1192910Z cvt.u64.u32 %rd508, %r804; 2026-02-21T09:52:09.1193022Z shl.b64 %rd509, %rd508, 32; 2026-02-21T09:52:09.1193136Z or.b64 %rd510, %rd507, %rd509; 2026-02-21T09:52:09.1193508Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1193639Z mov.b64 {%r1520, %r1521}, %rd510; 2026-02-21T09:52:09.1193766Z cvt.rn.f16x2.f32 %r1522, %r1521, %r1520; 2026-02-21T09:52:09.1194132Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1194257Z cvt.u64.u32 %rd511, %r805; 2026-02-21T09:52:09.1194368Z cvt.u64.u32 %rd512, %r806; 2026-02-21T09:52:09.1194477Z shl.b64 %rd513, %rd512, 32; 2026-02-21T09:52:09.1194598Z or.b64 %rd514, %rd511, %rd513; 2026-02-21T09:52:09.1195040Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1195162Z mov.b64 {%r1523, %r1524}, %rd514; 2026-02-21T09:52:09.1195293Z cvt.rn.f16x2.f32 %r1525, %r1524, %r1523; 2026-02-21T09:52:09.1195666Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1195784Z cvt.u64.u32 %rd515, %r807; 2026-02-21T09:52:09.1195898Z cvt.u64.u32 %rd516, %r808; 2026-02-21T09:52:09.1196131Z shl.b64 %rd517, %rd516, 32; 2026-02-21T09:52:09.1196244Z or.b64 
%rd518, %rd515, %rd517; 2026-02-21T09:52:09.1196622Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1196746Z mov.b64 {%r1526, %r1527}, %rd518; 2026-02-21T09:52:09.1196875Z cvt.rn.f16x2.f32 %r1528, %r1527, %r1526; 2026-02-21T09:52:09.1197268Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1197387Z cvt.u64.u32 %rd519, %r809; 2026-02-21T09:52:09.1197593Z cvt.u64.u32 %rd520, %r810; 2026-02-21T09:52:09.1197709Z shl.b64 %rd521, %rd520, 32; 2026-02-21T09:52:09.1197836Z or.b64 %rd522, %rd519, %rd521; 2026-02-21T09:52:09.1198229Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1198345Z mov.b64 {%r1529, %r1530}, %rd522; 2026-02-21T09:52:09.1198491Z cvt.rn.f16x2.f32 %r1531, %r1530, %r1529; 2026-02-21T09:52:09.1198893Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1199008Z cvt.u64.u32 %rd523, %r811; 2026-02-21T09:52:09.1199129Z cvt.u64.u32 %rd524, %r812; 2026-02-21T09:52:09.1199242Z shl.b64 %rd525, %rd524, 32; 2026-02-21T09:52:09.1199358Z or.b64 %rd526, %rd523, %rd525; 2026-02-21T09:52:09.1199762Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1199890Z mov.b64 {%r1532, %r1533}, %rd526; 2026-02-21T09:52:09.1200020Z cvt.rn.f16x2.f32 %r1534, %r1533, %r1532; 2026-02-21T09:52:09.1200420Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1200548Z cvt.u64.u32 %rd527, %r814; 2026-02-21T09:52:09.1200661Z cvt.u64.u32 %rd528, %r815; 2026-02-21T09:52:09.1200775Z shl.b64 %rd529, %rd528, 32; 2026-02-21T09:52:09.1200992Z or.b64 %rd530, %rd527, %rd529; 2026-02-21T09:52:09.1201474Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1201596Z mov.b64 {%r1535, %r1536}, %rd530; 2026-02-21T09:52:09.1201728Z cvt.rn.f16x2.f32 %r1537, %r1536, %r1535; 
2026-02-21T09:52:09.1202139Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1202254Z cvt.u64.u32 %rd531, %r816; 2026-02-21T09:52:09.1202364Z cvt.u64.u32 %rd532, %r817; 2026-02-21T09:52:09.1202487Z shl.b64 %rd533, %rd532, 32; 2026-02-21T09:52:09.1202602Z or.b64 %rd534, %rd531, %rd533; 2026-02-21T09:52:09.1202999Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1203123Z mov.b64 {%r1538, %r1539}, %rd534; 2026-02-21T09:52:09.1203256Z cvt.rn.f16x2.f32 %r1540, %r1539, %r1538; 2026-02-21T09:52:09.1203665Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1203786Z cvt.u64.u32 %rd535, %r818; 2026-02-21T09:52:09.1203908Z cvt.u64.u32 %rd536, %r819; 2026-02-21T09:52:09.1204023Z shl.b64 %rd537, %rd536, 32; 2026-02-21T09:52:09.1204139Z or.b64 %rd538, %rd535, %rd537; 2026-02-21T09:52:09.1204570Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1204760Z mov.b64 {%r1541, %r1542}, %rd538; 2026-02-21T09:52:09.1204906Z cvt.rn.f16x2.f32 %r1543, %r1542, %r1541; 2026-02-21T09:52:09.1205295Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1205417Z cvt.u64.u32 %rd539, %r820; 2026-02-21T09:52:09.1205531Z cvt.u64.u32 %rd540, %r821; 2026-02-21T09:52:09.1205646Z shl.b64 %rd541, %rd540, 32; 2026-02-21T09:52:09.1205771Z or.b64 %rd542, %rd539, %rd541; 2026-02-21T09:52:09.1206172Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1206289Z mov.b64 {%r1544, %r1545}, %rd542; 2026-02-21T09:52:09.1206538Z cvt.rn.f16x2.f32 %r1546, %r1545, %r1544; 2026-02-21T09:52:09.1206915Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1207024Z cvt.u64.u32 %rd543, %r822; 2026-02-21T09:52:09.1207141Z cvt.u64.u32 %rd544, %r823; 2026-02-21T09:52:09.1207252Z shl.b64 %rd545, %rd544, 
32; 2026-02-21T09:52:09.1207362Z or.b64 %rd546, %rd543, %rd545; 2026-02-21T09:52:09.1207737Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1207984Z mov.b64 {%r1547, %r1548}, %rd546; 2026-02-21T09:52:09.1208111Z cvt.rn.f16x2.f32 %r1549, %r1548, %r1547; 2026-02-21T09:52:09.1208487Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1208607Z cvt.u64.u32 %rd547, %r824; 2026-02-21T09:52:09.1208721Z cvt.u64.u32 %rd548, %r825; 2026-02-21T09:52:09.1208836Z shl.b64 %rd549, %rd548, 32; 2026-02-21T09:52:09.1208951Z or.b64 %rd550, %rd547, %rd549; 2026-02-21T09:52:09.1209326Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1209437Z mov.b64 {%r1550, %r1551}, %rd550; 2026-02-21T09:52:09.1209564Z cvt.rn.f16x2.f32 %r1552, %r1551, %r1550; 2026-02-21T09:52:09.1209948Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1210073Z cvt.u64.u32 %rd551, %r826; 2026-02-21T09:52:09.1210193Z cvt.u64.u32 %rd552, %r827; 2026-02-21T09:52:09.1210308Z shl.b64 %rd553, %rd552, 32; 2026-02-21T09:52:09.1210417Z or.b64 %rd554, %rd551, %rd553; 2026-02-21T09:52:09.1210800Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1210920Z mov.b64 {%r1553, %r1554}, %rd554; 2026-02-21T09:52:09.1211145Z cvt.rn.f16x2.f32 %r1555, %r1554, %r1553; 2026-02-21T09:52:09.1211582Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1211719Z cvt.u64.u32 %rd555, %r828; 2026-02-21T09:52:09.1211828Z cvt.u64.u32 %rd556, %r829; 2026-02-21T09:52:09.1211938Z shl.b64 %rd557, %rd556, 32; 2026-02-21T09:52:09.1212055Z or.b64 %rd558, %rd555, %rd557; 2026-02-21T09:52:09.1212425Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1212535Z mov.b64 {%r1556, %r1557}, %rd558; 2026-02-21T09:52:09.1212660Z 
cvt.rn.f16x2.f32 %r1558, %r1557, %r1556; 2026-02-21T09:52:09.1213044Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1213153Z cvt.u64.u32 %rd559, %r831; 2026-02-21T09:52:09.1213261Z cvt.u64.u32 %rd560, %r832; 2026-02-21T09:52:09.1213380Z shl.b64 %rd561, %rd560, 32; 2026-02-21T09:52:09.1213494Z or.b64 %rd562, %rd559, %rd561; 2026-02-21T09:52:09.1213868Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1213993Z mov.b64 {%r1559, %r1560}, %rd562; 2026-02-21T09:52:09.1214119Z cvt.rn.f16x2.f32 %r1561, %r1560, %r1559; 2026-02-21T09:52:09.1214499Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1214610Z cvt.u64.u32 %rd563, %r833; 2026-02-21T09:52:09.1214789Z cvt.u64.u32 %rd564, %r834; 2026-02-21T09:52:09.1214903Z shl.b64 %rd565, %rd564, 32; 2026-02-21T09:52:09.1215019Z or.b64 %rd566, %rd563, %rd565; 2026-02-21T09:52:09.1215411Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1215526Z mov.b64 {%r1562, %r1563}, %rd566; 2026-02-21T09:52:09.1215657Z cvt.rn.f16x2.f32 %r1564, %r1563, %r1562; 2026-02-21T09:52:09.1216059Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1216178Z cvt.u64.u32 %rd567, %r835; 2026-02-21T09:52:09.1216405Z cvt.u64.u32 %rd568, %r836; 2026-02-21T09:52:09.1216518Z shl.b64 %rd569, %rd568, 32; 2026-02-21T09:52:09.1216637Z or.b64 %rd570, %rd567, %rd569; 2026-02-21T09:52:09.1217010Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1217122Z mov.b64 {%r1565, %r1566}, %rd570; 2026-02-21T09:52:09.1217261Z cvt.rn.f16x2.f32 %r1567, %r1566, %r1565; 2026-02-21T09:52:09.1217641Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1217864Z cvt.u64.u32 %rd571, %r837; 2026-02-21T09:52:09.1217983Z cvt.u64.u32 %rd572, %r838; 
2026-02-21T09:52:09.1218088Z shl.b64 %rd573, %rd572, 32; 2026-02-21T09:52:09.1218195Z or.b64 %rd574, %rd571, %rd573; 2026-02-21T09:52:09.1218581Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1218704Z mov.b64 {%r1568, %r1569}, %rd574; 2026-02-21T09:52:09.1218835Z cvt.rn.f16x2.f32 %r1570, %r1569, %r1568; 2026-02-21T09:52:09.1219194Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1219315Z cvt.u64.u32 %rd575, %r839; 2026-02-21T09:52:09.1219425Z cvt.u64.u32 %rd576, %r840; 2026-02-21T09:52:09.1219534Z shl.b64 %rd577, %rd576, 32; 2026-02-21T09:52:09.1219653Z or.b64 %rd578, %rd575, %rd577; 2026-02-21T09:52:09.1220015Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1220131Z mov.b64 {%r1571, %r1572}, %rd578; 2026-02-21T09:52:09.1220256Z cvt.rn.f16x2.f32 %r1573, %r1572, %r1571; 2026-02-21T09:52:09.1220630Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1220739Z cvt.u64.u32 %rd579, %r841; 2026-02-21T09:52:09.1220938Z cvt.u64.u32 %rd580, %r842; 2026-02-21T09:52:09.1221122Z shl.b64 %rd581, %rd580, 32; 2026-02-21T09:52:09.1221233Z or.b64 %rd582, %rd579, %rd581; 2026-02-21T09:52:09.1221602Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1221722Z mov.b64 {%r1574, %r1575}, %rd582; 2026-02-21T09:52:09.1221845Z cvt.rn.f16x2.f32 %r1576, %r1575, %r1574; 2026-02-21T09:52:09.1222204Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1222315Z cvt.u64.u32 %rd583, %r843; 2026-02-21T09:52:09.1222433Z cvt.u64.u32 %rd584, %r844; 2026-02-21T09:52:09.1222548Z shl.b64 %rd585, %rd584, 32; 2026-02-21T09:52:09.1222658Z or.b64 %rd586, %rd583, %rd585; 2026-02-21T09:52:09.1223031Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1223144Z mov.b64 {%r1577, 
%r1578}, %rd586; 2026-02-21T09:52:09.1223276Z cvt.rn.f16x2.f32 %r1579, %r1578, %r1577; 2026-02-21T09:52:09.1223651Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1223762Z cvt.u64.u32 %rd587, %r845; 2026-02-21T09:52:09.1223870Z cvt.u64.u32 %rd588, %r846; 2026-02-21T09:52:09.1223978Z shl.b64 %rd589, %rd588, 32; 2026-02-21T09:52:09.1224098Z or.b64 %rd590, %rd587, %rd589; 2026-02-21T09:52:09.1224460Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1224576Z mov.b64 {%r1580, %r1581}, %rd590; 2026-02-21T09:52:09.1224796Z cvt.rn.f16x2.f32 %r1582, %r1581, %r1580; 2026-02-21T09:52:09.1225185Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1225300Z cvt.u64.u32 %rd591, %r848; 2026-02-21T09:52:09.1225420Z cvt.u64.u32 %rd592, %r849; 2026-02-21T09:52:09.1225532Z shl.b64 %rd593, %rd592, 32; 2026-02-21T09:52:09.1225651Z or.b64 %rd594, %rd591, %rd593; 2026-02-21T09:52:09.1226033Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1226269Z mov.b64 {%r1583, %r1584}, %rd594; 2026-02-21T09:52:09.1226401Z cvt.rn.f16x2.f32 %r1585, %r1584, %r1583; 2026-02-21T09:52:09.1226785Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1226910Z cvt.u64.u32 %rd595, %r850; 2026-02-21T09:52:09.1227020Z cvt.u64.u32 %rd596, %r851; 2026-02-21T09:52:09.1227132Z shl.b64 %rd597, %rd596, 32; 2026-02-21T09:52:09.1227252Z or.b64 %rd598, %rd595, %rd597; 2026-02-21T09:52:09.1227724Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1227838Z mov.b64 {%r1586, %r1587}, %rd598; 2026-02-21T09:52:09.1227967Z cvt.rn.f16x2.f32 %r1588, %r1587, %r1586; 2026-02-21T09:52:09.1228351Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1228467Z cvt.u64.u32 %rd599, %r852; 
2026-02-21T09:52:09.1228583Z cvt.u64.u32 %rd600, %r853; 2026-02-21T09:52:09.1228711Z shl.b64 %rd601, %rd600, 32; 2026-02-21T09:52:09.1228824Z or.b64 %rd602, %rd599, %rd601; 2026-02-21T09:52:09.1229200Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1229322Z mov.b64 {%r1589, %r1590}, %rd602; 2026-02-21T09:52:09.1229459Z cvt.rn.f16x2.f32 %r1591, %r1590, %r1589; 2026-02-21T09:52:09.1229848Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1229966Z cvt.u64.u32 %rd603, %r854; 2026-02-21T09:52:09.1230090Z cvt.u64.u32 %rd604, %r855; 2026-02-21T09:52:09.1230203Z shl.b64 %rd605, %rd604, 32; 2026-02-21T09:52:09.1230318Z or.b64 %rd606, %rd603, %rd605; 2026-02-21T09:52:09.1230802Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1230987Z mov.b64 {%r1592, %r1593}, %rd606; 2026-02-21T09:52:09.1231125Z cvt.rn.f16x2.f32 %r1594, %r1593, %r1592; 2026-02-21T09:52:09.1231526Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1231641Z cvt.u64.u32 %rd607, %r856; 2026-02-21T09:52:09.1231753Z cvt.u64.u32 %rd608, %r857; 2026-02-21T09:52:09.1231868Z shl.b64 %rd609, %rd608, 32; 2026-02-21T09:52:09.1231990Z or.b64 %rd610, %rd607, %rd609; 2026-02-21T09:52:09.1232381Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1232502Z mov.b64 {%r1595, %r1596}, %rd610; 2026-02-21T09:52:09.1232642Z cvt.rn.f16x2.f32 %r1597, %r1596, %r1595; 2026-02-21T09:52:09.1233022Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1233136Z cvt.u64.u32 %rd611, %r858; 2026-02-21T09:52:09.1233260Z cvt.u64.u32 %rd612, %r859; 2026-02-21T09:52:09.1233376Z shl.b64 %rd613, %rd612, 32; 2026-02-21T09:52:09.1233498Z or.b64 %rd614, %rd611, %rd613; 2026-02-21T09:52:09.1233879Z .loc 1 58 27 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1234009Z mov.b64 {%r1598, %r1599}, %rd614; 2026-02-21T09:52:09.1234140Z cvt.rn.f16x2.f32 %r1600, %r1599, %r1598; 2026-02-21T09:52:09.1234518Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1234645Z cvt.u64.u32 %rd615, %r860; 2026-02-21T09:52:09.1234858Z cvt.u64.u32 %rd616, %r861; 2026-02-21T09:52:09.1234979Z shl.b64 %rd617, %rd616, 32; 2026-02-21T09:52:09.1235105Z or.b64 %rd618, %rd615, %rd617; 2026-02-21T09:52:09.1235501Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1235619Z mov.b64 {%r1601, %r1602}, %rd618; 2026-02-21T09:52:09.1235754Z cvt.rn.f16x2.f32 %r1603, %r1602, %r1601; 2026-02-21T09:52:09.1236143Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1236365Z cvt.u64.u32 %rd619, %r862; 2026-02-21T09:52:09.1236480Z cvt.u64.u32 %rd620, %r863; 2026-02-21T09:52:09.1236602Z shl.b64 %rd621, %rd620, 32; 2026-02-21T09:52:09.1236715Z or.b64 %rd622, %rd619, %rd621; 2026-02-21T09:52:09.1237108Z .loc 1 58 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:58:27 2026-02-21T09:52:09.1237238Z mov.b64 {%r1604, %r1605}, %rd622; 2026-02-21T09:52:09.1237365Z cvt.rn.f16x2.f32 %r1606, %r1605, %r1604; 2026-02-21T09:52:09.1237816Z .loc 1 59 83 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:59:83 2026-02-21T09:52:09.1238023Z st.shared.v4.b32 [%r80], {%r1225, %r1237, %r1249, %r1261}; 2026-02-21T09:52:09.1238223Z st.shared.v4.b32 [%r81], {%r1273, %r1285, %r1297, %r1309}; 2026-02-21T09:52:09.1238410Z st.shared.v4.b32 [%r82], {%r1321, %r1333, %r1345, %r1357}; 2026-02-21T09:52:09.1238595Z st.shared.v4.b32 [%r83], {%r1369, %r1381, %r1393, %r1405}; 2026-02-21T09:52:09.1238716Z bar.sync 0, 128; 2026-02-21T09:52:09.1238828Z // begin inline asm 2026-02-21T09:52:09.1239171Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1029, %r1033, 
%r1037}, [%r869]; 2026-02-21T09:52:09.1239284Z // end inline asm 2026-02-21T09:52:09.1239391Z // begin inline asm 2026-02-21T09:52:09.1239722Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1041, %r1045, %r1049, %r1053}, [%r874]; 2026-02-21T09:52:09.1239823Z // end inline asm 2026-02-21T09:52:09.1239939Z // begin inline asm 2026-02-21T09:52:09.1240270Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1057, %r1061, %r1065, %r1069}, [%r879]; 2026-02-21T09:52:09.1240371Z // end inline asm 2026-02-21T09:52:09.1240484Z // begin inline asm 2026-02-21T09:52:09.1240812Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1073, %r1077, %r1081, %r1085}, [%r884]; 2026-02-21T09:52:09.1240998Z // end inline asm 2026-02-21T09:52:09.1241118Z bar.sync 0, 128; 2026-02-21T09:52:09.1241368Z st.shared.v4.b32 [%r80], {%r1417, %r1429, %r1441, %r1453}; 2026-02-21T09:52:09.1241563Z st.shared.v4.b32 [%r81], {%r1465, %r1477, %r1489, %r1501}; 2026-02-21T09:52:09.1241744Z st.shared.v4.b32 [%r82], {%r1513, %r1525, %r1537, %r1549}; 2026-02-21T09:52:09.1241933Z st.shared.v4.b32 [%r83], {%r1561, %r1573, %r1585, %r1597}; 2026-02-21T09:52:09.1242035Z bar.sync 0, 128; 2026-02-21T09:52:09.1242140Z // begin inline asm 2026-02-21T09:52:09.1242469Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1089, %r1093, %r1097, %r1101}, [%r869]; 2026-02-21T09:52:09.1242569Z // end inline asm 2026-02-21T09:52:09.1242675Z // begin inline asm 2026-02-21T09:52:09.1243019Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1105, %r1109, %r1113, %r1117}, [%r874]; 2026-02-21T09:52:09.1243118Z // end inline asm 2026-02-21T09:52:09.1243220Z // begin inline asm 2026-02-21T09:52:09.1243548Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1121, %r1125, %r1129, %r1133}, [%r879]; 2026-02-21T09:52:09.1243657Z // end inline asm 2026-02-21T09:52:09.1243764Z // begin inline asm 2026-02-21T09:52:09.1244088Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1137, %r1141, %r1145, %r1149}, [%r884]; 2026-02-21T09:52:09.1244198Z // end inline asm 
2026-02-21T09:52:09.1244293Z bar.sync 0, 128; 2026-02-21T09:52:09.1244470Z st.shared.v4.b32 [%r80], {%r1228, %r1240, %r1252, %r1264}; 2026-02-21T09:52:09.1244644Z st.shared.v4.b32 [%r81], {%r1276, %r1288, %r1300, %r1312}; 2026-02-21T09:52:09.1244898Z st.shared.v4.b32 [%r82], {%r1324, %r1336, %r1348, %r1360}; 2026-02-21T09:52:09.1245078Z st.shared.v4.b32 [%r83], {%r1372, %r1384, %r1396, %r1408}; 2026-02-21T09:52:09.1245184Z bar.sync 0, 128; 2026-02-21T09:52:09.1245299Z // begin inline asm 2026-02-21T09:52:09.1245624Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1030, %r1034, %r1038}, [%r869]; 2026-02-21T09:52:09.1245725Z // end inline asm 2026-02-21T09:52:09.1245839Z // begin inline asm 2026-02-21T09:52:09.1246168Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1042, %r1046, %r1050, %r1054}, [%r874]; 2026-02-21T09:52:09.1246409Z // end inline asm 2026-02-21T09:52:09.1246517Z // begin inline asm 2026-02-21T09:52:09.1246845Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1058, %r1062, %r1066, %r1070}, [%r879]; 2026-02-21T09:52:09.1246948Z // end inline asm 2026-02-21T09:52:09.1247053Z // begin inline asm 2026-02-21T09:52:09.1247384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1074, %r1078, %r1082, %r1086}, [%r884]; 2026-02-21T09:52:09.1247490Z // end inline asm 2026-02-21T09:52:09.1247589Z bar.sync 0, 128; 2026-02-21T09:52:09.1247771Z st.shared.v4.b32 [%r80], {%r1420, %r1432, %r1444, %r1456}; 2026-02-21T09:52:09.1248054Z st.shared.v4.b32 [%r81], {%r1468, %r1480, %r1492, %r1504}; 2026-02-21T09:52:09.1248237Z st.shared.v4.b32 [%r82], {%r1516, %r1528, %r1540, %r1552}; 2026-02-21T09:52:09.1248419Z st.shared.v4.b32 [%r83], {%r1564, %r1576, %r1588, %r1600}; 2026-02-21T09:52:09.1248533Z bar.sync 0, 128; 2026-02-21T09:52:09.1248645Z // begin inline asm 2026-02-21T09:52:09.1248970Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1090, %r1094, %r1098, %r1102}, [%r869]; 2026-02-21T09:52:09.1249088Z // end inline asm 2026-02-21T09:52:09.1249193Z // begin inline asm 
2026-02-21T09:52:09.1249516Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1106, %r1110, %r1114, %r1118}, [%r874]; 2026-02-21T09:52:09.1249617Z // end inline asm 2026-02-21T09:52:09.1249732Z // begin inline asm 2026-02-21T09:52:09.1250055Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1122, %r1126, %r1130, %r1134}, [%r879]; 2026-02-21T09:52:09.1250155Z // end inline asm 2026-02-21T09:52:09.1250269Z // begin inline asm 2026-02-21T09:52:09.1250590Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1138, %r1142, %r1146, %r1150}, [%r884]; 2026-02-21T09:52:09.1250692Z // end inline asm 2026-02-21T09:52:09.1250801Z bar.sync 0, 128; 2026-02-21T09:52:09.1250987Z st.shared.v4.b32 [%r80], {%r1231, %r1243, %r1255, %r1267}; 2026-02-21T09:52:09.1251259Z st.shared.v4.b32 [%r81], {%r1279, %r1291, %r1303, %r1315}; 2026-02-21T09:52:09.1251511Z st.shared.v4.b32 [%r82], {%r1327, %r1339, %r1351, %r1363}; 2026-02-21T09:52:09.1251713Z st.shared.v4.b32 [%r83], {%r1375, %r1387, %r1399, %r1411}; 2026-02-21T09:52:09.1251816Z bar.sync 0, 128; 2026-02-21T09:52:09.1251920Z // begin inline asm 2026-02-21T09:52:09.1252249Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1031, %r1035, %r1039}, [%r869]; 2026-02-21T09:52:09.1252348Z // end inline asm 2026-02-21T09:52:09.1252455Z // begin inline asm 2026-02-21T09:52:09.1252785Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1043, %r1047, %r1051, %r1055}, [%r874]; 2026-02-21T09:52:09.1252892Z // end inline asm 2026-02-21T09:52:09.1252998Z // begin inline asm 2026-02-21T09:52:09.1253318Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1059, %r1063, %r1067, %r1071}, [%r879]; 2026-02-21T09:52:09.1253429Z // end inline asm 2026-02-21T09:52:09.1253535Z // begin inline asm 2026-02-21T09:52:09.1253855Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1075, %r1079, %r1083, %r1087}, [%r884]; 2026-02-21T09:52:09.1253975Z // end inline asm 2026-02-21T09:52:09.1254082Z bar.sync 0, 128; 2026-02-21T09:52:09.1254265Z st.shared.v4.b32 [%r80], {%r1423, %r1435, %r1447, %r1459}; 
2026-02-21T09:52:09.1254449Z st.shared.v4.b32 [%r81], {%r1471, %r1483, %r1495, %r1507}; 2026-02-21T09:52:09.1254640Z st.shared.v4.b32 [%r82], {%r1519, %r1531, %r1543, %r1555}; 2026-02-21T09:52:09.1254920Z st.shared.v4.b32 [%r83], {%r1567, %r1579, %r1591, %r1603}; 2026-02-21T09:52:09.1255026Z bar.sync 0, 128; 2026-02-21T09:52:09.1255140Z // begin inline asm 2026-02-21T09:52:09.1255462Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1091, %r1095, %r1099, %r1103}, [%r869]; 2026-02-21T09:52:09.1255567Z // end inline asm 2026-02-21T09:52:09.1255680Z // begin inline asm 2026-02-21T09:52:09.1256002Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1107, %r1111, %r1115, %r1119}, [%r874]; 2026-02-21T09:52:09.1256102Z // end inline asm 2026-02-21T09:52:09.1256212Z // begin inline asm 2026-02-21T09:52:09.1256547Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1123, %r1127, %r1131, %r1135}, [%r879]; 2026-02-21T09:52:09.1256763Z // end inline asm 2026-02-21T09:52:09.1256866Z // begin inline asm 2026-02-21T09:52:09.1257199Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1139, %r1143, %r1147, %r1151}, [%r884]; 2026-02-21T09:52:09.1257300Z // end inline asm 2026-02-21T09:52:09.1257399Z bar.sync 0, 128; 2026-02-21T09:52:09.1257579Z st.shared.v4.b32 [%r80], {%r1234, %r1246, %r1258, %r1270}; 2026-02-21T09:52:09.1257768Z st.shared.v4.b32 [%r81], {%r1282, %r1294, %r1306, %r1318}; 2026-02-21T09:52:09.1257946Z st.shared.v4.b32 [%r82], {%r1330, %r1342, %r1354, %r1366}; 2026-02-21T09:52:09.1258217Z st.shared.v4.b32 [%r83], {%r1378, %r1390, %r1402, %r1414}; 2026-02-21T09:52:09.1258325Z bar.sync 0, 128; 2026-02-21T09:52:09.1258427Z // begin inline asm 2026-02-21T09:52:09.1258745Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1028, %r1032, %r1036, %r1040}, [%r869]; 2026-02-21T09:52:09.1258856Z // end inline asm 2026-02-21T09:52:09.1258959Z // begin inline asm 2026-02-21T09:52:09.1259282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1044, %r1048, %r1052, %r1056}, [%r874]; 2026-02-21T09:52:09.1259380Z // end 
inline asm 2026-02-21T09:52:09.1259493Z // begin inline asm 2026-02-21T09:52:09.1259803Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1060, %r1064, %r1068, %r1072}, [%r879]; 2026-02-21T09:52:09.1259902Z // end inline asm 2026-02-21T09:52:09.1260013Z // begin inline asm 2026-02-21T09:52:09.1260322Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1076, %r1080, %r1084, %r1088}, [%r884]; 2026-02-21T09:52:09.1260421Z // end inline asm 2026-02-21T09:52:09.1260535Z bar.sync 0, 128; 2026-02-21T09:52:09.1260717Z st.shared.v4.b32 [%r80], {%r1426, %r1438, %r1450, %r1462}; 2026-02-21T09:52:09.1260895Z st.shared.v4.b32 [%r81], {%r1474, %r1486, %r1498, %r1510}; 2026-02-21T09:52:09.1261074Z st.shared.v4.b32 [%r82], {%r1522, %r1534, %r1546, %r1558}; 2026-02-21T09:52:09.1261353Z st.shared.v4.b32 [%r83], {%r1570, %r1582, %r1594, %r1606}; 2026-02-21T09:52:09.1261522Z bar.sync 0, 128; 2026-02-21T09:52:09.1261633Z // begin inline asm 2026-02-21T09:52:09.1261962Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1092, %r1096, %r1100, %r1104}, [%r869]; 2026-02-21T09:52:09.1262059Z // end inline asm 2026-02-21T09:52:09.1262161Z // begin inline asm 2026-02-21T09:52:09.1262477Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1108, %r1112, %r1116, %r1120}, [%r874]; 2026-02-21T09:52:09.1262583Z // end inline asm 2026-02-21T09:52:09.1262682Z // begin inline asm 2026-02-21T09:52:09.1262998Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1124, %r1128, %r1132, %r1136}, [%r879]; 2026-02-21T09:52:09.1263111Z // end inline asm 2026-02-21T09:52:09.1263211Z // begin inline asm 2026-02-21T09:52:09.1263524Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1140, %r1144, %r1148, %r1152}, [%r884]; 2026-02-21T09:52:09.1263634Z // end inline asm 2026-02-21T09:52:09.1263735Z // begin inline asm 2026-02-21T09:52:09.1263953Z st.global.v4.b32 [ %rd79 + 0 ], { %r1025, %r1026, %r1027, %r1028 }; 2026-02-21T09:52:09.1264057Z // end inline asm 2026-02-21T09:52:09.1264170Z // begin inline asm 2026-02-21T09:52:09.1264374Z st.global.v4.b32 [ 
%rd80 + 0 ], { %r1029, %r1030, %r1031, %r1032 }; 2026-02-21T09:52:09.1264474Z // end inline asm 2026-02-21T09:52:09.1264586Z // begin inline asm 2026-02-21T09:52:09.1264890Z st.global.v4.b32 [ %rd81 + 0 ], { %r1033, %r1034, %r1035, %r1036 }; 2026-02-21T09:52:09.1264994Z // end inline asm 2026-02-21T09:52:09.1265098Z // begin inline asm 2026-02-21T09:52:09.1265316Z st.global.v4.b32 [ %rd82 + 0 ], { %r1037, %r1038, %r1039, %r1040 }; 2026-02-21T09:52:09.1265421Z // end inline asm 2026-02-21T09:52:09.1265522Z // begin inline asm 2026-02-21T09:52:09.1265739Z st.global.v4.b32 [ %rd83 + 0 ], { %r1041, %r1042, %r1043, %r1044 }; 2026-02-21T09:52:09.1265840Z // end inline asm 2026-02-21T09:52:09.1265952Z // begin inline asm 2026-02-21T09:52:09.1266165Z st.global.v4.b32 [ %rd84 + 0 ], { %r1045, %r1046, %r1047, %r1048 }; 2026-02-21T09:52:09.1266266Z // end inline asm 2026-02-21T09:52:09.1266502Z // begin inline asm 2026-02-21T09:52:09.1266703Z st.global.v4.b32 [ %rd85 + 0 ], { %r1049, %r1050, %r1051, %r1052 }; 2026-02-21T09:52:09.1266815Z // end inline asm 2026-02-21T09:52:09.1266920Z // begin inline asm 2026-02-21T09:52:09.1267121Z st.global.v4.b32 [ %rd86 + 0 ], { %r1053, %r1054, %r1055, %r1056 }; 2026-02-21T09:52:09.1267229Z // end inline asm 2026-02-21T09:52:09.1267331Z // begin inline asm 2026-02-21T09:52:09.1267534Z st.global.v4.b32 [ %rd87 + 0 ], { %r1057, %r1058, %r1059, %r1060 }; 2026-02-21T09:52:09.1267634Z // end inline asm 2026-02-21T09:52:09.1267835Z // begin inline asm 2026-02-21T09:52:09.1268038Z st.global.v4.b32 [ %rd88 + 0 ], { %r1061, %r1062, %r1063, %r1064 }; 2026-02-21T09:52:09.1268139Z // end inline asm 2026-02-21T09:52:09.1268254Z // begin inline asm 2026-02-21T09:52:09.1268472Z st.global.v4.b32 [ %rd89 + 0 ], { %r1065, %r1066, %r1067, %r1068 }; 2026-02-21T09:52:09.1268578Z // end inline asm 2026-02-21T09:52:09.1268693Z // begin inline asm 2026-02-21T09:52:09.1268904Z st.global.v4.b32 [ %rd90 + 0 ], { %r1069, %r1070, %r1071, %r1072 }; 
2026-02-21T09:52:09.1269006Z // end inline asm 2026-02-21T09:52:09.1269109Z // begin inline asm 2026-02-21T09:52:09.1269319Z st.global.v4.b32 [ %rd91 + 0 ], { %r1073, %r1074, %r1075, %r1076 }; 2026-02-21T09:52:09.1269429Z // end inline asm 2026-02-21T09:52:09.1269529Z // begin inline asm 2026-02-21T09:52:09.1269733Z st.global.v4.b32 [ %rd92 + 0 ], { %r1077, %r1078, %r1079, %r1080 }; 2026-02-21T09:52:09.1269828Z // end inline asm 2026-02-21T09:52:09.1269931Z // begin inline asm 2026-02-21T09:52:09.1270135Z st.global.v4.b32 [ %rd93 + 0 ], { %r1081, %r1082, %r1083, %r1084 }; 2026-02-21T09:52:09.1270243Z // end inline asm 2026-02-21T09:52:09.1270342Z // begin inline asm 2026-02-21T09:52:09.1270536Z st.global.v4.b32 [ %rd94 + 0 ], { %r1085, %r1086, %r1087, %r1088 }; 2026-02-21T09:52:09.1270645Z // end inline asm 2026-02-21T09:52:09.1270841Z // begin inline asm 2026-02-21T09:52:09.1271109Z st.global.v4.b32 [ %rd95 + 0 ], { %r1089, %r1090, %r1091, %r1092 }; 2026-02-21T09:52:09.1271216Z // end inline asm 2026-02-21T09:52:09.1271330Z // begin inline asm 2026-02-21T09:52:09.1271525Z st.global.v4.b32 [ %rd96 + 0 ], { %r1093, %r1094, %r1095, %r1096 }; 2026-02-21T09:52:09.1271622Z // end inline asm 2026-02-21T09:52:09.1271736Z // begin inline asm 2026-02-21T09:52:09.1271930Z st.global.v4.b32 [ %rd97 + 0 ], { %r1097, %r1098, %r1099, %r1100 }; 2026-02-21T09:52:09.1272029Z // end inline asm 2026-02-21T09:52:09.1272138Z // begin inline asm 2026-02-21T09:52:09.1272335Z st.global.v4.b32 [ %rd98 + 0 ], { %r1101, %r1102, %r1103, %r1104 }; 2026-02-21T09:52:09.1272435Z // end inline asm 2026-02-21T09:52:09.1272535Z // begin inline asm 2026-02-21T09:52:09.1272740Z st.global.v4.b32 [ %rd99 + 0 ], { %r1105, %r1106, %r1107, %r1108 }; 2026-02-21T09:52:09.1272839Z // end inline asm 2026-02-21T09:52:09.1272940Z // begin inline asm 2026-02-21T09:52:09.1273173Z st.global.v4.b32 [ %rd100 + 0 ], { %r1109, %r1110, %r1111, %r1112 }; 2026-02-21T09:52:09.1273278Z // end inline asm 
2026-02-21T09:52:09.1273380Z // begin inline asm 2026-02-21T09:52:09.1273585Z st.global.v4.b32 [ %rd101 + 0 ], { %r1113, %r1114, %r1115, %r1116 }; 2026-02-21T09:52:09.1273695Z // end inline asm 2026-02-21T09:52:09.1273797Z // begin inline asm 2026-02-21T09:52:09.1273995Z st.global.v4.b32 [ %rd102 + 0 ], { %r1117, %r1118, %r1119, %r1120 }; 2026-02-21T09:52:09.1274103Z // end inline asm 2026-02-21T09:52:09.1274205Z // begin inline asm 2026-02-21T09:52:09.1274405Z st.global.v4.b32 [ %rd103 + 0 ], { %r1121, %r1122, %r1123, %r1124 }; 2026-02-21T09:52:09.1274519Z // end inline asm 2026-02-21T09:52:09.1274620Z // begin inline asm 2026-02-21T09:52:09.1274894Z st.global.v4.b32 [ %rd104 + 0 ], { %r1125, %r1126, %r1127, %r1128 }; 2026-02-21T09:52:09.1274995Z // end inline asm 2026-02-21T09:52:09.1275108Z // begin inline asm 2026-02-21T09:52:09.1275312Z st.global.v4.b32 [ %rd105 + 0 ], { %r1129, %r1130, %r1131, %r1132 }; 2026-02-21T09:52:09.1275415Z // end inline asm 2026-02-21T09:52:09.1275646Z // begin inline asm 2026-02-21T09:52:09.1275862Z st.global.v4.b32 [ %rd106 + 0 ], { %r1133, %r1134, %r1135, %r1136 }; 2026-02-21T09:52:09.1275966Z // end inline asm 2026-02-21T09:52:09.1276074Z // begin inline asm 2026-02-21T09:52:09.1276296Z st.global.v4.b32 [ %rd107 + 0 ], { %r1137, %r1138, %r1139, %r1140 }; 2026-02-21T09:52:09.1276398Z // end inline asm 2026-02-21T09:52:09.1276505Z // begin inline asm 2026-02-21T09:52:09.1276728Z st.global.v4.b32 [ %rd108 + 0 ], { %r1141, %r1142, %r1143, %r1144 }; 2026-02-21T09:52:09.1276833Z // end inline asm 2026-02-21T09:52:09.1277030Z // begin inline asm 2026-02-21T09:52:09.1277254Z st.global.v4.b32 [ %rd109 + 0 ], { %r1145, %r1146, %r1147, %r1148 }; 2026-02-21T09:52:09.1277357Z // end inline asm 2026-02-21T09:52:09.1277464Z // begin inline asm 2026-02-21T09:52:09.1277684Z st.global.v4.b32 [ %rd110 + 0 ], { %r1149, %r1150, %r1151, %r1152 }; 2026-02-21T09:52:09.1277797Z // end inline asm 2026-02-21T09:52:09.1278200Z .loc 1 31 74 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1278329Z add.s32 %r1153, %r1156, 352400; 2026-02-21T09:52:09.1278740Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1278846Z bar.sync 0, 128; 2026-02-21T09:52:09.1278953Z // begin inline asm 2026-02-21T09:52:09.1279198Z @%p131 mbarrier.arrive.shared::cta.b64 _, [%r1153]; 2026-02-21T09:52:09.1279305Z // end inline asm 2026-02-21T09:52:09.1279421Z add.s32 %r1607, %r1673, 1; 2026-02-21T09:52:09.1279554Z setp.eq.b32 %p129, %r1607, 2; 2026-02-21T09:52:09.1279692Z selp.b32 %r1673, 0, %r1607, %p129; 2026-02-21T09:52:09.1279821Z selp.b32 %r1672, 1, 0, %p129; 2026-02-21T09:52:09.1279985Z $L__BB0_22: // %.thread21 2026-02-21T09:52:09.1280281Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:52:09.1280790Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1280921Z xor.b32 %r1675, %r1675, %r1672; 2026-02-21T09:52:09.1281325Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1281450Z add.s32 %r1662, %r1662, -1; 2026-02-21T09:52:09.1281572Z setp.ne.b32 %p130, %r1662, 0; 2026-02-21T09:52:09.1281690Z @%p130 bra $L__BB0_18; 2026-02-21T09:52:09.1281813Z bra.uni $L__BB0_23; 2026-02-21T09:52:09.1282028Z $L__BB0_18: // =>This Inner Loop Header: Depth=1 2026-02-21T09:52:09.1282151Z add.s32 %r590, %r1668, 1; 2026-02-21T09:52:09.1282282Z setp.eq.b32 %p125, %r1668, 31; 2026-02-21T09:52:09.1282404Z selp.b32 %r1668, 0, %r590, %p125; 2026-02-21T09:52:09.1282525Z setp.eq.b32 %p126, %r1668, 31; 2026-02-21T09:52:09.1282638Z @%p126 bra $L__BB0_21; 2026-02-21T09:52:09.1282852Z // %bb.19: // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:52:09.1283268Z .loc 1 0 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:74 2026-02-21T09:52:09.1283375Z mov.b32 %r1672, 0; 2026-02-21T09:52:09.1283773Z .loc 1 31 74 // 
cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1283895Z setp.ne.b32 %p127, %r1668, 0; 2026-02-21T09:52:09.1284009Z @%p127 bra $L__BB0_22; 2026-02-21T09:52:09.1284165Z // %bb.20: // %.thread 2026-02-21T09:52:09.1284349Z // in Loop: Header=BB0_18 Depth=1 2026-02-21T09:52:09.1284470Z add.s32 %r1671, %r1671, 1; 2026-02-21T09:52:09.1284938Z .loc 1 37 35 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:37:35 2026-02-21T09:52:09.1285068Z shr.s32 %r1609, %r1671, 31; 2026-02-21T09:52:09.1285179Z shr.u32 %r1610, %r1609, 25; 2026-02-21T09:52:09.1285301Z add.s32 %r1611, %r1671, %r1610; 2026-02-21T09:52:09.1285427Z shr.s32 %r1612, %r1611, 7; 2026-02-21T09:52:09.1285927Z .loc 1 38 33 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:38:33 2026-02-21T09:52:09.1286041Z shl.b32 %r1613, %r1612, 4; 2026-02-21T09:52:09.1286437Z .loc 1 39 39 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:39:39 2026-02-21T09:52:09.1286546Z sub.s32 %r1614, 96, %r1613; 2026-02-21T09:52:09.1286928Z .loc 1 39 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:39:52 2026-02-21T09:52:09.1287046Z min.s32 %r1615, %r1614, 16; 2026-02-21T09:52:09.1287533Z .loc 1 40 45 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:45 2026-02-21T09:52:09.1287654Z and.b32 %r1616, %r1611, -128; 2026-02-21T09:52:09.1287791Z sub.s32 %r1617, %r1671, %r1616; 2026-02-21T09:52:09.1288324Z .loc 1 41 51 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:41:51 2026-02-21T09:52:09.1288463Z div.s32 %r1618, %r1617, %r1615; 2026-02-21T09:52:09.1288894Z .loc 1 40 64 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:64 2026-02-21T09:52:09.1289027Z mul.lo.s32 %r1619, %r1618, %r1615; 2026-02-21T09:52:09.1289143Z sub.s32 %r1620, %r1617, %r1619; 2026-02-21T09:52:09.1289530Z .loc 1 40 30 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:30 2026-02-21T09:52:09.1289653Z add.s32 %r1621, %r1620, %r1613; 2026-02-21T09:52:09.1290064Z 
.loc 1 42 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:42:27 2026-02-21T09:52:09.1290184Z shl.b32 %r1669, %r1621, 7; 2026-02-21T09:52:09.1290577Z .loc 1 44 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:44:27 2026-02-21T09:52:09.1290702Z shl.b32 %r1670, %r1618, 8; 2026-02-21T09:52:09.1291177Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1291300Z bra.uni $L__BB0_22; 2026-02-21T09:52:09.1291579Z $L__BB0_1: // %.preheader.preheader 2026-02-21T09:52:09.1291967Z .loc 1 0 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:74 2026-02-21T09:52:09.1292085Z mov.b32 %r111, global_smem; 2026-02-21T09:52:09.1292209Z add.s32 %r112, %r111, %r3; 2026-02-21T09:52:09.1292325Z mov.u32 %r161, %ctaid.x; 2026-02-21T09:52:09.1292440Z mul.lo.s32 %r162, %r161, 6; 2026-02-21T09:52:09.1292552Z add.s32 %r163, %r162, 6; 2026-02-21T09:52:09.1292673Z min.s32 %r164, %r163, 768; 2026-02-21T09:52:09.1292788Z sub.s32 %r165, %r164, %r162; 2026-02-21T09:52:09.1292906Z shl.b32 %r5, %r165, 5; 2026-02-21T09:52:09.1293033Z setp.lt.s32 %p17, %r5, 1; 2026-02-21T09:52:09.1293143Z bra.uni $L__BB0_2; 2026-02-21T09:52:09.1293351Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1293743Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1293911Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:52:09.1294022Z barrier.sync 1; 2026-02-21T09:52:09.1294129Z barrier.sync 1; 2026-02-21T09:52:09.1294294Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:52:09.1294454Z $L__BB0_2: // %.preheader 2026-02-21T09:52:09.1294635Z // =>This Loop Header: Depth=1 2026-02-21T09:52:09.1294937Z // Child Loop BB0_11 Depth 2 2026-02-21T09:52:09.1295126Z // Child Loop BB0_7 Depth 2 2026-02-21T09:52:09.1295490Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19 2026-02-21T09:52:09.1295648Z setmaxnreg.dec.sync.aligned.u32 24; 2026-02-21T09:52:09.1295754Z 
barrier.sync 1; 2026-02-21T09:52:09.1295879Z ld.shared.b8 %r110, [%r112+352412]; 2026-02-21T09:52:09.1296004Z setp.gt.u32 %p4, %r110, 3; 2026-02-21T09:52:09.1296129Z @%p4 bra $L__BB0_4; 2026-02-21T09:52:09.1296407Z // %bb.3: // %.preheader 2026-02-21T09:52:09.1296585Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1296715Z $L_brx_0: .branchtargets 2026-02-21T09:52:09.1296821Z $L__BB0_5, 2026-02-21T09:52:09.1296924Z $L__BB0_9, 2026-02-21T09:52:09.1297024Z $L__BB0_15, 2026-02-21T09:52:09.1297134Z $L__BB0_24; 2026-02-21T09:52:09.1297250Z brx.idx %r110, $L_brx_0; 2026-02-21T09:52:09.1297447Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1297940Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1298092Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:52:09.1298241Z ld.shared.b32 %r4, [global_smem+344064]; 2026-02-21T09:52:09.1298358Z barrier.sync 1; 2026-02-21T09:52:09.1298473Z @%p17 bra $L__BB0_8; 2026-02-21T09:52:09.1298623Z // %bb.6: // %.lr.ph9 2026-02-21T09:52:09.1298802Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1299185Z .loc 1 0 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:74 2026-02-21T09:52:09.1299300Z mov.b32 %r1650, -1; 2026-02-21T09:52:09.1299417Z mov.pred %p150, 0; 2026-02-21T09:52:09.1299533Z mov.b32 %r1646, 0; 2026-02-21T09:52:09.1299641Z mov.b32 %r1645, %r5; 2026-02-21T09:52:09.1299749Z mov.b32 %r1647, %r1646; 2026-02-21T09:52:09.1299874Z mov.b32 %r1648, %r1646; 2026-02-21T09:52:09.1299983Z mov.b32 %r1649, %r1646; 2026-02-21T09:52:09.1300186Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:52:09.1300379Z // => This Inner Loop Header: Depth=2 2026-02-21T09:52:09.1300867Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1300990Z add.s32 %r188, %r1650, 1; 2026-02-21T09:52:09.1301197Z setp.eq.b32 %p38, %r1650, 31; 2026-02-21T09:52:09.1301345Z selp.b32 %r1650, 0, %r188, %p38; 
2026-02-21T09:52:09.1301460Z shl.b32 %r189, %r1649, 3; 2026-02-21T09:52:09.1301576Z add.s32 %r191, %r111, %r189; 2026-02-21T09:52:09.1301691Z add.s32 %r192, %r191, 352256; 2026-02-21T09:52:09.1301813Z add.s32 %r168, %r191, 352320; 2026-02-21T09:52:09.1302200Z .loc 1 54 31 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:54:31 2026-02-21T09:52:09.1302313Z shl.b32 %r193, %r1649, 15; 2026-02-21T09:52:09.1302435Z add.s32 %r194, %r111, %r193; 2026-02-21T09:52:09.1302826Z .loc 1 55 44 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:55:44 2026-02-21T09:52:09.1302940Z shl.b32 %r195, %r1649, 14; 2026-02-21T09:52:09.1303056Z add.s32 %r196, %r111, %r195; 2026-02-21T09:52:09.1303169Z add.s32 %r197, %r196, 229376; 2026-02-21T09:52:09.1303542Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1303667Z bar.warp.sync -1; 2026-02-21T09:52:09.1303787Z // begin inline asm 2026-02-21T09:52:09.1303889Z 2026-02-21T09:52:09.1303986Z { 2026-02-21T09:52:09.1304114Z .reg .pred complete; 2026-02-21T09:52:09.1304219Z waitLoop: 2026-02-21T09:52:09.1304489Z mbarrier.try_wait.parity.shared.b64 complete, [%r168], %r1648; 2026-02-21T09:52:09.1304620Z @!complete bra.uni waitLoop; 2026-02-21T09:52:09.1304795Z } 2026-02-21T09:52:09.1304809Z 2026-02-21T09:52:09.1304918Z // end inline asm 2026-02-21T09:52:09.1305306Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1305433Z shl.b32 %r198, %r1647, 8; 2026-02-21T09:52:09.1305545Z add.s32 %r170, %r198, %r4; 2026-02-21T09:52:09.1305925Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1306059Z shl.b32 %r199, %r1647, 3; 2026-02-21T09:52:09.1306168Z add.s32 %r200, %r111, %r199; 2026-02-21T09:52:09.1306397Z add.s32 %r201, %r200, 352384; 2026-02-21T09:52:09.1306510Z setp.eq.b32 %p37, %r1650, 31; 2026-02-21T09:52:09.1306892Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 
2026-02-21T09:52:09.1307014Z elect.sync %r202|%p20, -1; 2026-02-21T09:52:09.1307131Z bfe.u32 %r203, %r194, 4, 14; 2026-02-21T09:52:09.1307254Z cvt.u64.u32 %rd30, %r203; 2026-02-21T09:52:09.1307387Z or.b64 %rd12, %rd30, 4611686293439512576; 2026-02-21T09:52:09.1307496Z bfe.u32 %r204, %r197, 4, 14; 2026-02-21T09:52:09.1307623Z cvt.u64.u32 %rd31, %r204; 2026-02-21T09:52:09.1307850Z or.b64 %rd13, %rd31, 4611686293372403712; 2026-02-21T09:52:09.1307955Z mov.b32 %r171, 136314896; 2026-02-21T09:52:09.1308056Z // begin inline asm 2026-02-21T09:52:09.1308376Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd12, %rd13, %r171, %p150; 2026-02-21T09:52:09.1308479Z // end inline asm 2026-02-21T09:52:09.1308581Z add.s32 %r205, %r194, 32; 2026-02-21T09:52:09.1308704Z bfe.u32 %r206, %r205, 4, 14; 2026-02-21T09:52:09.1308810Z cvt.u64.u32 %rd32, %r206; 2026-02-21T09:52:09.1308931Z or.b64 %rd14, %rd32, 4611686293439512576; 2026-02-21T09:52:09.1309045Z add.s32 %r207, %r196, 229408; 2026-02-21T09:52:09.1309150Z bfe.u32 %r208, %r207, 4, 14; 2026-02-21T09:52:09.1309257Z cvt.u64.u32 %rd33, %r208; 2026-02-21T09:52:09.1309376Z or.b64 %rd15, %rd33, 4611686293372403712; 2026-02-21T09:52:09.1309500Z mov.pred %p21, -1; 2026-02-21T09:52:09.1309600Z // begin inline asm 2026-02-21T09:52:09.1309896Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd14, %rd15, %r171, %p21; 2026-02-21T09:52:09.1310013Z // end inline asm 2026-02-21T09:52:09.1310117Z add.s32 %r209, %r194, 64; 2026-02-21T09:52:09.1310222Z bfe.u32 %r210, %r209, 4, 14; 2026-02-21T09:52:09.1310326Z cvt.u64.u32 %rd34, %r210; 2026-02-21T09:52:09.1310454Z or.b64 %rd16, %rd34, 4611686293439512576; 2026-02-21T09:52:09.1310656Z add.s32 %r211, %r196, 229440; 2026-02-21T09:52:09.1310836Z bfe.u32 %r212, %r211, 4, 14; 2026-02-21T09:52:09.1310958Z cvt.u64.u32 %rd35, %r212; 2026-02-21T09:52:09.1311079Z or.b64 %rd17, %rd35, 4611686293372403712; 2026-02-21T09:52:09.1311181Z // begin inline asm 2026-02-21T09:52:09.1311473Z @%p20 
tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd16, %rd17, %r171, %p21; 2026-02-21T09:52:09.1311572Z // end inline asm 2026-02-21T09:52:09.1311675Z add.s32 %r213, %r194, 96; 2026-02-21T09:52:09.1311778Z bfe.u32 %r214, %r213, 4, 14; 2026-02-21T09:52:09.1311893Z cvt.u64.u32 %rd36, %r214; 2026-02-21T09:52:09.1312015Z or.b64 %rd18, %rd36, 4611686293439512576; 2026-02-21T09:52:09.1312125Z add.s32 %r215, %r196, 229472; 2026-02-21T09:52:09.1312238Z bfe.u32 %r216, %r215, 4, 14; 2026-02-21T09:52:09.1312342Z cvt.u64.u32 %rd37, %r216; 2026-02-21T09:52:09.1312463Z or.b64 %rd19, %rd37, 4611686293372403712; 2026-02-21T09:52:09.1312564Z // begin inline asm 2026-02-21T09:52:09.1312856Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 0 ], %rd18, %rd19, %r171, %p21; 2026-02-21T09:52:09.1312959Z // end inline asm 2026-02-21T09:52:09.1313066Z add.s32 %r217, %r194, 16384; 2026-02-21T09:52:09.1313178Z bfe.u32 %r218, %r217, 4, 14; 2026-02-21T09:52:09.1313283Z cvt.u64.u32 %rd38, %r218; 2026-02-21T09:52:09.1313402Z or.b64 %rd20, %rd38, 4611686293439512576; 2026-02-21T09:52:09.1313512Z // begin inline asm 2026-02-21T09:52:09.1313802Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd20, %rd13, %r171, %p150; 2026-02-21T09:52:09.1313902Z // end inline asm 2026-02-21T09:52:09.1314008Z add.s32 %r219, %r194, 16416; 2026-02-21T09:52:09.1314127Z bfe.u32 %r220, %r219, 4, 14; 2026-02-21T09:52:09.1314231Z cvt.u64.u32 %rd39, %r220; 2026-02-21T09:52:09.1314354Z or.b64 %rd22, %rd39, 4611686293439512576; 2026-02-21T09:52:09.1314468Z // begin inline asm 2026-02-21T09:52:09.1314860Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd22, %rd15, %r171, %p21; 2026-02-21T09:52:09.1314979Z // end inline asm 2026-02-21T09:52:09.1315103Z add.s32 %r221, %r194, 16448; 2026-02-21T09:52:09.1315322Z bfe.u32 %r222, %r221, 4, 14; 2026-02-21T09:52:09.1315437Z cvt.u64.u32 %rd40, %r222; 2026-02-21T09:52:09.1315563Z or.b64 %rd24, %rd40, 4611686293439512576; 2026-02-21T09:52:09.1315679Z // begin 
inline asm 2026-02-21T09:52:09.1315976Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd24, %rd17, %r171, %p21; 2026-02-21T09:52:09.1316078Z // end inline asm 2026-02-21T09:52:09.1316206Z add.s32 %r223, %r194, 16480; 2026-02-21T09:52:09.1316307Z bfe.u32 %r224, %r223, 4, 14; 2026-02-21T09:52:09.1316410Z cvt.u64.u32 %rd41, %r224; 2026-02-21T09:52:09.1316619Z or.b64 %rd26, %rd41, 4611686293439512576; 2026-02-21T09:52:09.1316737Z // begin inline asm 2026-02-21T09:52:09.1317026Z @%p20 tcgen05.mma.cta_group::1.kind::f16 [ %r170 + 128 ], %rd26, %rd19, %r171, %p21; 2026-02-21T09:52:09.1317125Z // end inline asm 2026-02-21T09:52:09.1317249Z cvt.u64.u32 %rd28, %r192; 2026-02-21T09:52:09.1317354Z // begin inline asm 2026-02-21T09:52:09.1317619Z @%p20 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd28]; 2026-02-21T09:52:09.1317733Z // end inline asm 2026-02-21T09:52:09.1317855Z and.pred %p36, %p37, %p20; 2026-02-21T09:52:09.1317964Z cvt.u64.u32 %rd29, %r201; 2026-02-21T09:52:09.1318068Z // begin inline asm 2026-02-21T09:52:09.1318335Z @%p36 tcgen05.commit.cta_group::1.mbarrier::arrive::one.b64 [%rd29]; 2026-02-21T09:52:09.1318439Z // end inline asm 2026-02-21T09:52:09.1318797Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1318928Z setp.ne.b32 %p150, %r1650, 31; 2026-02-21T09:52:09.1319304Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1319417Z add.s32 %r225, %r1647, 1; 2026-02-21T09:52:09.1319545Z setp.eq.b32 %p39, %r225, 2; 2026-02-21T09:52:09.1319664Z selp.b32 %r226, 0, %r225, %p39; 2026-02-21T09:52:09.1319934Z selp.b32 %r1647, %r226, %r1647, %p37; 2026-02-21T09:52:09.1320128Z and.pred %p40, %p37, %p39; 2026-02-21T09:52:09.1320254Z selp.b32 %r227, 1, 0, %p40; 2026-02-21T09:52:09.1320368Z xor.b32 %r1646, %r1646, %r227; 2026-02-21T09:52:09.1320752Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1320869Z 
shl.b32 %r228, %r1647, 3; 2026-02-21T09:52:09.1320978Z add.s32 %r229, %r111, %r228; 2026-02-21T09:52:09.1321086Z add.s32 %r186, %r229, 352400; 2026-02-21T09:52:09.1321472Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1321585Z // begin inline asm 2026-02-21T09:52:09.1321679Z 2026-02-21T09:52:09.1321771Z { 2026-02-21T09:52:09.1321899Z @!%p37 bra.uni skipWait; 2026-02-21T09:52:09.1322010Z .reg .pred complete; 2026-02-21T09:52:09.1322110Z waitLoop: 2026-02-21T09:52:09.1322375Z mbarrier.try_wait.parity.shared.b64 complete, [%r186], %r1646; 2026-02-21T09:52:09.1322498Z @!complete bra.uni waitLoop; 2026-02-21T09:52:09.1322599Z skipWait: 2026-02-21T09:52:09.1322689Z } 2026-02-21T09:52:09.1322698Z 2026-02-21T09:52:09.1322813Z // end inline asm 2026-02-21T09:52:09.1323179Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1323286Z add.s32 %r230, %r1649, 1; 2026-02-21T09:52:09.1323411Z setp.eq.b32 %p41, %r230, 7; 2026-02-21T09:52:09.1323533Z selp.b32 %r1649, 0, %r230, %p41; 2026-02-21T09:52:09.1323643Z selp.b32 %r231, 1, 0, %p41; 2026-02-21T09:52:09.1323765Z xor.b32 %r1648, %r1648, %r231; 2026-02-21T09:52:09.1324146Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1324262Z add.s32 %r1645, %r1645, -1; 2026-02-21T09:52:09.1324379Z setp.ne.b32 %p42, %r1645, 0; 2026-02-21T09:52:09.1324499Z @%p42 bra $L__BB0_7; 2026-02-21T09:52:09.1324665Z $L__BB0_8: // %._crit_edge10 2026-02-21T09:52:09.1324926Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1325151Z barrier.sync 1; 2026-02-21T09:52:09.1325306Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:52:09.1325411Z bra.uni $L__BB0_2; 2026-02-21T09:52:09.1325602Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1325978Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1326129Z setmaxnreg.inc.sync.aligned.u32 24; 
2026-02-21T09:52:09.1326236Z barrier.sync 1; 2026-02-21T09:52:09.1326611Z .loc 1 21 67 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:21:67 2026-02-21T09:52:09.1326821Z mov.u32 %r114, %ctaid.y; 2026-02-21T09:52:09.1326933Z mov.u32 %r115, %ctaid.z; 2026-02-21T09:52:09.1327052Z mov.u32 %r116, %nctaid.x; 2026-02-21T09:52:09.1327162Z mov.u32 %r117, %nctaid.y; 2026-02-21T09:52:09.1327293Z mad.lo.s32 %r118, %r115, %r117, %r114; 2026-02-21T09:52:09.1327421Z mad.lo.s32 %r119, %r118, %r116, %r161; 2026-02-21T09:52:09.1327543Z shl.b32 %r120, %r119, 8; 2026-02-21T09:52:09.1327975Z .loc 1 22 68 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:22:68 2026-02-21T09:52:09.1328112Z cvt.s64.s32 %rd7, %r120; 2026-02-21T09:52:09.1328261Z add.s64 %rd8, %rd6, %rd7; 2026-02-21T09:52:09.1328398Z add.s64 %rd9, %rd8, 128; 2026-02-21T09:52:09.1328612Z cvta.global.u64 %rd11, %rd9; 2026-02-21T09:52:09.1329087Z .loc 1 21 67 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:21:67 2026-02-21T09:52:09.1329241Z cvta.global.u64 %rd10, %rd8; 2026-02-21T09:52:09.1329697Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1329835Z shl.b32 %r1651, %r165, 5; 2026-02-21T09:52:09.1329967Z setp.lt.s32 %p5, %r1651, 1; 2026-02-21T09:52:09.1330070Z @%p5 bra $L__BB0_14; 2026-02-21T09:52:09.1330302Z // %bb.10: // %.lr.ph 2026-02-21T09:52:09.1330553Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1330669Z add.s32 %r1661, %r162, -1; 2026-02-21T09:52:09.1330775Z add.s32 %r21, %r1, -128; 2026-02-21T09:52:09.1330888Z mov.b32 %r1658, -1; 2026-02-21T09:52:09.1330987Z mov.b32 %r1652, 0; 2026-02-21T09:52:09.1331090Z mov.b32 %r1653, %r1652; 2026-02-21T09:52:09.1331191Z mov.b32 %r1660, %r1652; 2026-02-21T09:52:09.1331304Z mov.b32 %r1659, %r1652; 2026-02-21T09:52:09.1331406Z mov.b32 %r1656, %r1652; 2026-02-21T09:52:09.1331509Z bra.uni $L__BB0_11; 2026-02-21T09:52:09.1331717Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=2 
2026-02-21T09:52:09.1332077Z .loc 1 0 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0:74 2026-02-21T09:52:09.1332193Z selp.b32 %r144, 0, %r1656, %p8; 2026-02-21T09:52:09.1332306Z setp.lt.u32 %p12, %r21, 32; 2026-02-21T09:52:09.1332428Z setp.eq.b32 %p9, %r21, 0; 2026-02-21T09:52:09.1332788Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1332899Z shl.b32 %r151, %r1653, 3; 2026-02-21T09:52:09.1333015Z add.s32 %r153, %r111, %r151; 2026-02-21T09:52:09.1333122Z add.s32 %r140, %r153, 352256; 2026-02-21T09:52:09.1333462Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1333573Z // begin inline asm 2026-02-21T09:52:09.1333662Z 2026-02-21T09:52:09.1333749Z { 2026-02-21T09:52:09.1333859Z .reg .pred complete; 2026-02-21T09:52:09.1333973Z waitLoop: 2026-02-21T09:52:09.1334214Z mbarrier.try_wait.parity.shared.b64 complete, [%r140], %r1652; 2026-02-21T09:52:09.1334333Z @!complete bra.uni waitLoop; 2026-02-21T09:52:09.1334430Z } 2026-02-21T09:52:09.1334438Z 2026-02-21T09:52:09.1334537Z // end inline asm 2026-02-21T09:52:09.1334983Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1335115Z add.s32 %r146, %r153, 352320; 2026-02-21T09:52:09.1335579Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1335684Z bar.sync 3, 64; 2026-02-21T09:52:09.1335786Z // begin inline asm 2026-02-21T09:52:09.1336025Z @%p9 mbarrier.arrive.expect_tx.shared.b64 _, [%r146], 49152; 2026-02-21T09:52:09.1336130Z // end inline asm 2026-02-21T09:52:09.1336499Z .loc 1 54 31 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:54:31 2026-02-21T09:52:09.1336621Z shl.b32 %r154, %r1653, 15; 2026-02-21T09:52:09.1336829Z add.s32 %r143, %r111, %r154; 2026-02-21T09:52:09.1337181Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1337293Z bar.sync 3, 64; 
2026-02-21T09:52:09.1337413Z elect.sync %r155|%p13, -1; 2026-02-21T09:52:09.1337531Z and.pred %p10, %p12, %p13; 2026-02-21T09:52:09.1337641Z // begin inline asm 2026-02-21T09:52:09.1338213Z @%p10 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r143], [%rd10, {%r144, %r1660}], [%r146]; 2026-02-21T09:52:09.1338320Z // end inline asm 2026-02-21T09:52:09.1338691Z .loc 1 55 44 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:55:44 2026-02-21T09:52:09.1338807Z shl.b32 %r156, %r1653, 14; 2026-02-21T09:52:09.1338916Z add.s32 %r157, %r111, %r156; 2026-02-21T09:52:09.1339021Z add.s32 %r147, %r157, 229376; 2026-02-21T09:52:09.1339383Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1339491Z bar.sync 3, 64; 2026-02-21T09:52:09.1339608Z elect.sync %r158|%p14, -1; 2026-02-21T09:52:09.1339724Z and.pred %p11, %p12, %p14; 2026-02-21T09:52:09.1339840Z // begin inline asm 2026-02-21T09:52:09.1340486Z @%p11 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r147], [%rd11, {%r144, %r1659}], [%r146]; 2026-02-21T09:52:09.1340652Z // end inline asm 2026-02-21T09:52:09.1341044Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1341152Z add.s32 %r1656, %r144, 64; 2026-02-21T09:52:09.1341499Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1341616Z add.s32 %r159, %r1653, 1; 2026-02-21T09:52:09.1341734Z setp.eq.b32 %p15, %r159, 7; 2026-02-21T09:52:09.1341853Z selp.b32 %r1653, 0, %r159, %p15; 2026-02-21T09:52:09.1341963Z selp.b32 %r160, 1, 0, %p15; 2026-02-21T09:52:09.1342085Z xor.b32 %r1652, %r1652, %r160; 2026-02-21T09:52:09.1342464Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1342578Z add.s32 %r1651, %r1651, -1; 2026-02-21T09:52:09.1342703Z setp.ne.b32 %p16, %r1651, 0; 2026-02-21T09:52:09.1342813Z @%p16 bra $L__BB0_11; 
2026-02-21T09:52:09.1342921Z bra.uni $L__BB0_14; 2026-02-21T09:52:09.1343132Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T09:52:09.1343326Z // => This Inner Loop Header: Depth=2 2026-02-21T09:52:09.1343436Z add.s32 %r126, %r1658, 1; 2026-02-21T09:52:09.1343554Z setp.eq.b32 %p6, %r1658, 31; 2026-02-21T09:52:09.1343679Z selp.b32 %r1658, 0, %r126, %p6; 2026-02-21T09:52:09.1343793Z setp.ne.b32 %p7, %r1658, 0; 2026-02-21T09:52:09.1343906Z setp.eq.b32 %p8, %r1658, 0; 2026-02-21T09:52:09.1344023Z @%p7 bra $L__BB0_13; 2026-02-21T09:52:09.1344212Z // %bb.12: // in Loop: Header=BB0_11 Depth=2 2026-02-21T09:52:09.1344323Z add.s32 %r1661, %r1661, 1; 2026-02-21T09:52:09.1344765Z .loc 1 37 35 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:37:35 2026-02-21T09:52:09.1344890Z shr.s32 %r127, %r1661, 31; 2026-02-21T09:52:09.1344996Z shr.u32 %r128, %r127, 25; 2026-02-21T09:52:09.1345108Z add.s32 %r129, %r1661, %r128; 2026-02-21T09:52:09.1345229Z shr.s32 %r130, %r129, 7; 2026-02-21T09:52:09.1345718Z .loc 1 38 33 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:38:33 2026-02-21T09:52:09.1345826Z shl.b32 %r131, %r130, 4; 2026-02-21T09:52:09.1346194Z .loc 1 39 39 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:39:39 2026-02-21T09:52:09.1346297Z sub.s32 %r132, 96, %r131; 2026-02-21T09:52:09.1346653Z .loc 1 39 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:39:52 2026-02-21T09:52:09.1346763Z min.s32 %r133, %r132, 16; 2026-02-21T09:52:09.1347212Z .loc 1 40 45 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:45 2026-02-21T09:52:09.1347321Z and.b32 %r134, %r129, -128; 2026-02-21T09:52:09.1347430Z sub.s32 %r135, %r1661, %r134; 2026-02-21T09:52:09.1347800Z .loc 1 41 51 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:41:51 2026-02-21T09:52:09.1347908Z div.s32 %r136, %r135, %r133; 2026-02-21T09:52:09.1348274Z .loc 1 40 64 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:64 2026-02-21T09:52:09.1348393Z 
mul.lo.s32 %r137, %r136, %r133; 2026-02-21T09:52:09.1348497Z sub.s32 %r138, %r135, %r137; 2026-02-21T09:52:09.1348852Z .loc 1 40 30 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:40:30 2026-02-21T09:52:09.1348969Z add.s32 %r139, %r138, %r131; 2026-02-21T09:52:09.1349322Z .loc 1 42 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:42:27 2026-02-21T09:52:09.1349431Z shl.b32 %r1659, %r139, 7; 2026-02-21T09:52:09.1349786Z .loc 1 44 27 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:44:27 2026-02-21T09:52:09.1349903Z shl.b32 %r1660, %r136, 8; 2026-02-21T09:52:09.1350016Z bra.uni $L__BB0_13; 2026-02-21T09:52:09.1350260Z $L__BB0_14: // %._crit_edge 2026-02-21T09:52:09.1350505Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1350872Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1350978Z barrier.sync 1; 2026-02-21T09:52:09.1351133Z setmaxnreg.inc.sync.aligned.u32 24; 2026-02-21T09:52:09.1351236Z bra.uni $L__BB0_2; 2026-02-21T09:52:09.1351422Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:52:09.1351774Z .loc 1 19 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:19 2026-02-21T09:52:09.1351882Z barrier.sync 1; 2026-02-21T09:52:09.1351982Z barrier.sync 1; 2026-02-21T09:52:09.1352082Z bra.uni $L__BB0_2; 2026-02-21T09:52:09.1352251Z $L__BB0_23: // %._crit_edge13 2026-02-21T09:52:09.1352612Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1352716Z barrier.sync 1; 2026-02-21T09:52:09.1352876Z setmaxnreg.inc.sync.aligned.u32 40; 2026-02-21T09:52:09.1352988Z shl.b32 %r1643, %r1673, 3; 2026-02-21T09:52:09.1353098Z add.s32 %r1622, %r544, %r1643; 2026-02-21T09:52:09.1353459Z .loc 1 56 52 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:56:52 2026-02-21T09:52:09.1353570Z bar.sync 0, 128; 2026-02-21T09:52:09.1353673Z // begin inline asm 2026-02-21T09:52:09.1353762Z 2026-02-21T09:52:09.1353859Z { 
2026-02-21T09:52:09.1353969Z .reg .pred complete; 2026-02-21T09:52:09.1354065Z waitLoop: 2026-02-21T09:52:09.1354313Z mbarrier.try_wait.parity.shared.b64 complete, [%r1622], %r1675; 2026-02-21T09:52:09.1354449Z @!complete bra.uni waitLoop; 2026-02-21T09:52:09.1354542Z } 2026-02-21T09:52:09.1354551Z 2026-02-21T09:52:09.1354656Z // end inline asm 2026-02-21T09:52:09.1355115Z .loc 1 31 74 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:74 2026-02-21T09:52:09.1355224Z bar.sync 0, 128; 2026-02-21T09:52:09.1355330Z // begin inline asm 2026-02-21T09:52:09.1355623Z @%p131 mbarrier.inval.shared::cta.b64 [%r544]; 2026-02-21T09:52:09.1355727Z // end inline asm 2026-02-21T09:52:09.1355830Z bar.sync 0, 128; 2026-02-21T09:52:09.1355945Z // begin inline asm 2026-02-21T09:52:09.1356114Z @%p131 mbarrier.inval.shared::cta.b64 [%r545]; 2026-02-21T09:52:09.1356212Z // end inline asm 2026-02-21T09:52:09.1356310Z // begin inline asm 2026-02-21T09:52:09.1356470Z @%p131 mbarrier.inval.shared::cta.b64 [%r542]; 2026-02-21T09:52:09.1356565Z // end inline asm 2026-02-21T09:52:09.1356665Z bar.sync 0, 128; 2026-02-21T09:52:09.1356776Z // begin inline asm 2026-02-21T09:52:09.1357059Z @%p131 mbarrier.inval.shared::cta.b64 [%r543]; 2026-02-21T09:52:09.1357163Z // end inline asm 2026-02-21T09:52:09.1357266Z // begin inline asm 2026-02-21T09:52:09.1357430Z @%p131 mbarrier.inval.shared::cta.b64 [%r528]; 2026-02-21T09:52:09.1357528Z // end inline asm 2026-02-21T09:52:09.1357633Z bar.sync 0, 128; 2026-02-21T09:52:09.1357745Z // begin inline asm 2026-02-21T09:52:09.1357907Z @%p131 mbarrier.inval.shared::cta.b64 [%r529]; 2026-02-21T09:52:09.1358006Z // end inline asm 2026-02-21T09:52:09.1358102Z bar.sync 0, 128; 2026-02-21T09:52:09.1358214Z // begin inline asm 2026-02-21T09:52:09.1358370Z @%p131 mbarrier.inval.shared::cta.b64 [%r530]; 2026-02-21T09:52:09.1358469Z // end inline asm 2026-02-21T09:52:09.1358579Z bar.sync 0, 128; 2026-02-21T09:52:09.1358684Z // begin inline asm 
2026-02-21T09:52:09.1358834Z @%p131 mbarrier.inval.shared::cta.b64 [%r531]; 2026-02-21T09:52:09.1358931Z // end inline asm 2026-02-21T09:52:09.1359042Z bar.sync 0, 128; 2026-02-21T09:52:09.1359149Z // begin inline asm 2026-02-21T09:52:09.1359300Z @%p131 mbarrier.inval.shared::cta.b64 [%r532]; 2026-02-21T09:52:09.1359408Z // end inline asm 2026-02-21T09:52:09.1359507Z bar.sync 0, 128; 2026-02-21T09:52:09.1359608Z // begin inline asm 2026-02-21T09:52:09.1359864Z @%p131 mbarrier.inval.shared::cta.b64 [%r533]; 2026-02-21T09:52:09.1359976Z // end inline asm 2026-02-21T09:52:09.1360141Z bar.sync 0, 128; 2026-02-21T09:52:09.1360249Z // begin inline asm 2026-02-21T09:52:09.1360415Z @%p131 mbarrier.inval.shared::cta.b64 [%r534]; 2026-02-21T09:52:09.1360514Z // end inline asm 2026-02-21T09:52:09.1360619Z // begin inline asm 2026-02-21T09:52:09.1360780Z @%p131 mbarrier.inval.shared::cta.b64 [%r521]; 2026-02-21T09:52:09.1360879Z // end inline asm 2026-02-21T09:52:09.1360979Z bar.sync 0, 128; 2026-02-21T09:52:09.1361082Z // begin inline asm 2026-02-21T09:52:09.1361244Z @%p131 mbarrier.inval.shared::cta.b64 [%r522]; 2026-02-21T09:52:09.1361344Z // end inline asm 2026-02-21T09:52:09.1361448Z bar.sync 0, 128; 2026-02-21T09:52:09.1361561Z // begin inline asm 2026-02-21T09:52:09.1361716Z @%p131 mbarrier.inval.shared::cta.b64 [%r523]; 2026-02-21T09:52:09.1361816Z // end inline asm 2026-02-21T09:52:09.1361925Z bar.sync 0, 128; 2026-02-21T09:52:09.1362030Z // begin inline asm 2026-02-21T09:52:09.1362187Z @%p131 mbarrier.inval.shared::cta.b64 [%r524]; 2026-02-21T09:52:09.1362296Z // end inline asm 2026-02-21T09:52:09.1362407Z bar.sync 0, 128; 2026-02-21T09:52:09.1362511Z // begin inline asm 2026-02-21T09:52:09.1362665Z @%p131 mbarrier.inval.shared::cta.b64 [%r525]; 2026-02-21T09:52:09.1362772Z // end inline asm 2026-02-21T09:52:09.1362874Z bar.sync 0, 128; 2026-02-21T09:52:09.1362975Z // begin inline asm 2026-02-21T09:52:09.1363127Z @%p131 mbarrier.inval.shared::cta.b64 [%r526]; 
2026-02-21T09:52:09.1363234Z // end inline asm 2026-02-21T09:52:09.1363333Z bar.sync 0, 128; 2026-02-21T09:52:09.1363435Z // begin inline asm 2026-02-21T09:52:09.1363602Z @%p131 mbarrier.inval.shared::cta.b64 [%r527]; 2026-02-21T09:52:09.1363704Z // end inline asm 2026-02-21T09:52:09.1364072Z .loc 1 31 4 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:31:4 2026-02-21T09:52:09.1364177Z bar.sync 0, 128; 2026-02-21T09:52:09.1364292Z // begin inline asm 2026-02-21T09:52:09.1364543Z @%p43 tcgen05.dealloc.cta_group::1.sync.aligned.b32 %r1642, 512; 2026-02-21T09:52:09.1364801Z // end inline asm 2026-02-21T09:52:09.1364937Z st.shared.b32 [global_smem+352416], 50529027; 2026-02-21T09:52:09.1365015Z barrier.sync 1; 2026-02-21T09:52:09.1365165Z $L__BB0_24: // %common.ret 2026-02-21T09:52:09.1365529Z .loc 1 0 0 // cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py:0 2026-02-21T09:52:09.1365628Z ret; 2026-02-21T09:52:09.1365725Z $L__tmp1: 2026-02-21T09:52:09.1365825Z $L__func_end0: 2026-02-21T09:52:09.1365991Z // -- End function 2026-02-21T09:52:09.1366176Z } 2026-02-21T09:52:09.1366633Z .file 1 "/tmp/torchinductor_root/ij/cijhmlgpbkpyvcmxokbukke4cv4dhvhxqpe4mk7lchy6gy2t6w25.py" 2026-02-21T09:52:09.1366759Z .section .debug_abbrev 2026-02-21T09:52:09.1366850Z { 2026-02-21T09:52:09.1367021Z .b8 1 // Abbreviation Code 2026-02-21T09:52:09.1367206Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:52:09.1367372Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:52:09.1367525Z .b8 37 // DW_AT_producer 2026-02-21T09:52:09.1367674Z .b8 8 // DW_FORM_string 2026-02-21T09:52:09.1367829Z .b8 19 // DW_AT_language 2026-02-21T09:52:09.1367978Z .b8 5 // DW_FORM_data2 2026-02-21T09:52:09.1368123Z .b8 3 // DW_AT_name 2026-02-21T09:52:09.1368276Z .b8 8 // DW_FORM_string 2026-02-21T09:52:09.1368434Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:52:09.1368581Z .b8 6 // DW_FORM_data4 2026-02-21T09:52:09.1368743Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:52:09.1368879Z .b8 8 // DW_FORM_string 
2026-02-21T09:52:09.1369105Z .b8 0 // EOM(1) 2026-02-21T09:52:09.1369293Z .b8 0 // EOM(2) 2026-02-21T09:52:09.1369427Z .b8 0 // EOM(3) 2026-02-21T09:52:09.1369516Z } 2026-02-21T09:52:09.1369623Z .section .debug_info 2026-02-21T09:52:09.1369717Z { 2026-02-21T09:52:09.1369873Z .b32 104 // Length of Unit 2026-02-21T09:52:09.1370034Z .b8 2 // DWARF version number 2026-02-21T09:52:09.1370131Z .b8 0 2026-02-21T09:52:09.1370361Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:52:09.1370530Z .b8 8 // Address Size (in bytes) 2026-02-21T09:52:09.1370723Z .b8 1 // Abbrev [1] 0xb:0x61 DW_TAG_compile_unit 2026-02-21T09:52:09.1370881Z .b8 116 // DW_AT_producer 2026-02-21T09:52:09.1370975Z .b8 114 2026-02-21T09:52:09.1371072Z .b8 105 2026-02-21T09:52:09.1371165Z .b8 116 2026-02-21T09:52:09.1371253Z .b8 111 2026-02-21T09:52:09.1371346Z .b8 110 2026-02-21T09:52:09.1371436Z .b8 0 2026-02-21T09:52:09.1371586Z .b8 2 // DW_AT_language 2026-02-21T09:52:09.1371675Z .b8 0 2026-02-21T09:52:09.1371816Z .b8 99 // DW_AT_name 2026-02-21T09:52:09.1371913Z .b8 105 2026-02-21T09:52:09.1372001Z .b8 106 2026-02-21T09:52:09.1372088Z .b8 104 2026-02-21T09:52:09.1372177Z .b8 109 2026-02-21T09:52:09.1372275Z .b8 108 2026-02-21T09:52:09.1372364Z .b8 103 2026-02-21T09:52:09.1372451Z .b8 112 2026-02-21T09:52:09.1372548Z .b8 98 2026-02-21T09:52:09.1372643Z .b8 107 2026-02-21T09:52:09.1372734Z .b8 112 2026-02-21T09:52:09.1372821Z .b8 121 2026-02-21T09:52:09.1372918Z .b8 118 2026-02-21T09:52:09.1373009Z .b8 99 2026-02-21T09:52:09.1373098Z .b8 109 2026-02-21T09:52:09.1373187Z .b8 120 2026-02-21T09:52:09.1373286Z .b8 111 2026-02-21T09:52:09.1373375Z .b8 107 2026-02-21T09:52:09.1373464Z .b8 98 2026-02-21T09:52:09.1373569Z .b8 117 2026-02-21T09:52:09.1373659Z .b8 107 2026-02-21T09:52:09.1373878Z .b8 107 2026-02-21T09:52:09.1373968Z .b8 101 2026-02-21T09:52:09.1374066Z .b8 52 2026-02-21T09:52:09.1374155Z .b8 99 2026-02-21T09:52:09.1374243Z .b8 118 2026-02-21T09:52:09.1374331Z .b8 52 
2026-02-21T09:52:09.1374420Z .b8 100 2026-02-21T09:52:09.1374506Z .b8 104 2026-02-21T09:52:09.1374596Z .b8 118 2026-02-21T09:52:09.1374760Z .b8 104 2026-02-21T09:52:09.1374849Z .b8 120 2026-02-21T09:52:09.1374936Z .b8 113 2026-02-21T09:52:09.1375035Z .b8 112 2026-02-21T09:52:09.1375124Z .b8 101 2026-02-21T09:52:09.1375211Z .b8 52 2026-02-21T09:52:09.1375301Z .b8 109 2026-02-21T09:52:09.1375496Z .b8 107 2026-02-21T09:52:09.1375590Z .b8 55 2026-02-21T09:52:09.1375682Z .b8 108 2026-02-21T09:52:09.1375772Z .b8 99 2026-02-21T09:52:09.1375872Z .b8 104 2026-02-21T09:52:09.1375962Z .b8 121 2026-02-21T09:52:09.1376053Z .b8 54 2026-02-21T09:52:09.1376151Z .b8 103 2026-02-21T09:52:09.1376242Z .b8 121 2026-02-21T09:52:09.1376332Z .b8 50 2026-02-21T09:52:09.1376428Z .b8 116 2026-02-21T09:52:09.1376533Z .b8 54 2026-02-21T09:52:09.1376632Z .b8 119 2026-02-21T09:52:09.1376722Z .b8 50 2026-02-21T09:52:09.1376821Z .b8 53 2026-02-21T09:52:09.1376914Z .b8 46 2026-02-21T09:52:09.1377003Z .b8 112 2026-02-21T09:52:09.1377094Z .b8 121 2026-02-21T09:52:09.1377194Z .b8 0 2026-02-21T09:52:09.1377385Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:52:09.1377529Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:52:09.1377625Z .b8 116 2026-02-21T09:52:09.1377718Z .b8 109 2026-02-21T09:52:09.1377808Z .b8 112 2026-02-21T09:52:09.1377897Z .b8 47 2026-02-21T09:52:09.1377998Z .b8 116 2026-02-21T09:52:09.1378093Z .b8 111 2026-02-21T09:52:09.1378185Z .b8 114 2026-02-21T09:52:09.1378276Z .b8 99 2026-02-21T09:52:09.1378376Z .b8 104 2026-02-21T09:52:09.1378465Z .b8 105 2026-02-21T09:52:09.1378556Z .b8 110 2026-02-21T09:52:09.1378656Z .b8 100 2026-02-21T09:52:09.1378749Z .b8 117 2026-02-21T09:52:09.1378839Z .b8 99 2026-02-21T09:52:09.1379029Z .b8 116 2026-02-21T09:52:09.1379140Z .b8 111 2026-02-21T09:52:09.1379290Z .b8 114 2026-02-21T09:52:09.1379383Z .b8 95 2026-02-21T09:52:09.1379483Z .b8 114 2026-02-21T09:52:09.1379576Z .b8 111 2026-02-21T09:52:09.1379667Z .b8 111 2026-02-21T09:52:09.1379757Z .b8 116 
2026-02-21T09:52:09.1379858Z .b8 47 2026-02-21T09:52:09.1379949Z .b8 105 2026-02-21T09:52:09.1380039Z .b8 106 2026-02-21T09:52:09.1380138Z .b8 0 2026-02-21T09:52:09.1380230Z } 2026-02-21T09:52:09.1380356Z .section .debug_macinfo { } 2026-02-21T09:52:09.1380364Z 2026-02-21T09:52:09.1380517Z ================================================================ 2026-02-21T09:52:09.1380739Z please share the reproducer above with Triton project. 2026-02-21T09:52:09.6053880Z 2026-02-21T09:52:09.6055522Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 56/56 14.8 configs/s 2026-02-21T09:52:14.0923843Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 224.9 2026-02-21T09:52:14.0924534Z configs/s 2026-02-21T09:52:14.3243239Z [351s] Generation 12 complete: 2026-02-21T09:52:14.3243691Z error=23 2026-02-21T09:52:14.3243910Z ok=35 2026-02-21T09:52:14.3244127Z min=0.1064 2026-02-21T09:52:14.3244359Z mid=0.1321 2026-02-21T09:52:14.3244591Z max=0.3646 2026-02-21T09:52:14.3245326Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:52:14.3245809Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:52:14.3246267Z 'l2_groupings': [4], 2026-02-21T09:52:14.3246603Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:52:14.3246983Z 'loop_orders': [[1, 0]], 2026-02-21T09:52:14.3247254Z 'maxnreg': 64, 2026-02-21T09:52:14.3247870Z 'num_sm_multiplier': 8, 2026-02-21T09:52:14.3248150Z 'num_stages': 3, 2026-02-21T09:52:14.3248408Z 'num_warps': 1, 2026-02-21T09:52:14.3248685Z 'pid_type': 'persistent_blocked', 2026-02-21T09:52:14.3249055Z 'range_flattens': [False, None], 2026-02-21T09:52:14.3249391Z 'range_multi_buffers': [False, True], 2026-02-21T09:52:14.3249756Z 'range_num_stages': [0, 0], 2026-02-21T09:52:14.3250192Z 'range_unroll_factors': [0, 0], 2026-02-21T09:52:14.3250553Z 'range_warp_specializes': [False, True]} 2026-02-21T09:52:14.3302891Z [351s] Fitting surrogate: 1084 points, 1084 targets 2026-02-21T09:52:15.3597235Z [352s] Generation 13 
starting: 37 neighbors, 2 active search path(s) 2026-02-21T09:52:23.4830178Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37/37 4.5 configs/s 2026-02-21T09:52:25.1884182Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 37/37 22.5 configs/s 2026-02-21T09:52:30.0907014Z Generation 13: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 205.6 2026-02-21T09:52:30.0907695Z configs/s 2026-02-21T09:52:30.2857751Z [367s] Generation 13 complete: 2026-02-21T09:52:30.2858134Z error=9 2026-02-21T09:52:30.2858341Z ok=30 2026-02-21T09:52:30.2858554Z min=0.1075 2026-02-21T09:52:30.2858802Z mid=0.1198 2026-02-21T09:52:30.2859016Z max=0.2387 2026-02-21T09:52:30.2859711Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:52:30.2860126Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:52:30.2860492Z 'l2_groupings': [4], 2026-02-21T09:52:30.2860782Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:52:30.2861130Z 'loop_orders': [[1, 0]], 2026-02-21T09:52:30.2861390Z 'maxnreg': 64, 2026-02-21T09:52:30.2861634Z 'num_sm_multiplier': 8, 2026-02-21T09:52:30.2861897Z 'num_stages': 3, 2026-02-21T09:52:30.2862155Z 'num_warps': 2, 2026-02-21T09:52:30.2862419Z 'pid_type': 'persistent_blocked', 2026-02-21T09:52:30.2862735Z 'range_flattens': [False, False], 2026-02-21T09:52:30.2863062Z 'range_multi_buffers': [None, True], 2026-02-21T09:52:30.2863392Z 'range_num_stages': [0, 0], 2026-02-21T09:52:30.2863682Z 'range_unroll_factors': [0, 0], 2026-02-21T09:52:30.2864008Z 'range_warp_specializes': [False, True]} 2026-02-21T09:52:30.2894645Z [367s] Fitting surrogate: 1123 points, 1123 targets 2026-02-21T09:52:31.1831589Z [368s] Generation 14 starting: 42 neighbors, 2 active search path(s) 2026-02-21T09:52:37.4328962Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 42/42 10.0 configs/s 2026-02-21T09:52:39.2777265Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 42/42 23.5 configs/s 2026-02-21T09:52:44.1952365Z Generation 14: 
verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 203.1 2026-02-21T09:52:44.1952931Z configs/s 2026-02-21T09:52:44.3837430Z [381s] Generation 14 complete: 2026-02-21T09:52:44.3837758Z error=11 2026-02-21T09:52:44.3838106Z ok=33 2026-02-21T09:52:44.3838366Z min=0.1075 2026-02-21T09:52:44.3838570Z mid=0.1260 2026-02-21T09:52:44.3838796Z max=1.8012 2026-02-21T09:52:44.3839036Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:52:44.3839469Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:52:44.3839868Z 'l2_groupings': [4], 2026-02-21T09:52:44.3840178Z 'load_eviction_policies': ['', 'last'], 2026-02-21T09:52:44.3840545Z 'loop_orders': [[1, 0]], 2026-02-21T09:52:44.3840851Z 'maxnreg': 64, 2026-02-21T09:52:44.3841119Z 'num_sm_multiplier': 8, 2026-02-21T09:52:44.3841410Z 'num_stages': 3, 2026-02-21T09:52:44.3841689Z 'num_warps': 2, 2026-02-21T09:52:44.3841997Z 'pid_type': 'persistent_blocked', 2026-02-21T09:52:44.3842364Z 'range_flattens': [False, False], 2026-02-21T09:52:44.3842742Z 'range_multi_buffers': [None, True], 2026-02-21T09:52:44.3843100Z 'range_num_stages': [0, 0], 2026-02-21T09:52:44.3843440Z 'range_unroll_factors': [0, 0], 2026-02-21T09:52:44.3843787Z 'range_warp_specializes': [False, True]} 2026-02-21T09:52:44.3877976Z [381s] Fitting surrogate: 1167 points, 1167 targets 2026-02-21T09:52:44.8387710Z [381s] Generation 15 starting: 14 neighbors, 1 active search path(s) 2026-02-21T09:52:47.1428326Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 14/14 13.4 configs/s 2026-02-21T09:52:47.5448225Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 14/14 36.8 configs/s 2026-02-21T09:52:48.6939389Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 845.1 2026-02-21T09:52:48.6940423Z configs/s 2026-02-21T09:52:48.7725640Z [385s] Generation 15 complete: 2026-02-21T09:52:48.7725937Z error=8 2026-02-21T09:52:48.7726076Z ok=8 2026-02-21T09:52:48.7726216Z min=0.1075 2026-02-21T09:52:48.7726350Z mid=0.1146 
2026-02-21T09:52:48.7726488Z max=0.1495 2026-02-21T09:52:48.7726642Z best={'block_sizes': [128, 256, 64], 2026-02-21T09:52:48.7726875Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:52:48.7727132Z 'l2_groupings': [4], 2026-02-21T09:52:48.7727309Z 'load_eviction_policies': ['', 'last'], 2026-02-21T09:52:48.7727515Z 'loop_orders': [[1, 0]], 2026-02-21T09:52:48.7727677Z 'maxnreg': 64, 2026-02-21T09:52:48.7727834Z 'num_sm_multiplier': 8, 2026-02-21T09:52:48.7727998Z 'num_stages': 3, 2026-02-21T09:52:48.7728587Z 'num_warps': 2, 2026-02-21T09:52:48.7728771Z 'pid_type': 'persistent_blocked', 2026-02-21T09:52:48.7729085Z 'range_flattens': [False, False], 2026-02-21T09:52:48.7729289Z 'range_multi_buffers': [None, True], 2026-02-21T09:52:48.7729485Z 'range_num_stages': [0, 0], 2026-02-21T09:52:48.7729675Z 'range_unroll_factors': [0, 0], 2026-02-21T09:52:48.7729872Z 'range_warp_specializes': [False, True]} 2026-02-21T09:52:48.7776440Z [385s] Fitting surrogate: 1183 points, 1183 targets 2026-02-21T09:52:49.0247004Z [385s] Autotuning complete in 386.0s after searching 1143 configs. 
2026-02-21T09:52:49.0247621Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:52:49.0249745Z @helion.kernel(config=helion.Config(block_sizes=[128, 256, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=8, num_stages=3, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[False, True]), static_shapes=True) 2026-02-21T09:52:49.0251636Z 2026-02-21T09:52:49.0252115Z [385s] Code of selected kernel: /tmp/torchinductor_root/xy/cxyytslo3cflgxahbxy2brus6a4h7jnt7jnvara6dxqbcf3yzz6d.py 2026-02-21T09:52:49.0485900Z from __future__ import annotations 2026-02-21T09:52:49.0486217Z 2026-02-21T09:52:49.0486448Z import torch 2026-02-21T09:52:49.0486679Z import helion 2026-02-21T09:52:49.0486962Z import triton 2026-02-21T09:52:49.0487265Z import triton.language as tl 2026-02-21T09:52:49.0487729Z from helion.runtime import default_launcher as _default_launcher 2026-02-21T09:52:49.0488146Z 2026-02-21T09:52:49.0488260Z _BLOCK_SIZE_1 = tl.constexpr(256) 2026-02-21T09:52:49.0488569Z _BLOCK_SIZE_0 = tl.constexpr(128) 2026-02-21T09:52:49.0488862Z _BLOCK_SIZE_2 = tl.constexpr(64) 2026-02-21T09:52:49.0489058Z 2026-02-21T09:52:49.0489148Z @triton.jit 2026-02-21T09:52:49.0489418Z def _helion_matmul(x, y, out, _NUM_SM: tl.constexpr): 2026-02-21T09:52:49.0489876Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:52:49.0490399Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:52:49.0490877Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:52:49.0491212Z # src[matmul.py:63-67]: ... 
2026-02-21T09:52:49.0491579Z total_pids = tl.cdiv(12288, _BLOCK_SIZE_1) * tl.cdiv(2048, _BLOCK_SIZE_0) 2026-02-21T09:52:49.0492034Z block_size = tl.cdiv(total_pids, _NUM_SM * 8) 2026-02-21T09:52:49.0492396Z start_pid = tl.program_id(0) * block_size 2026-02-21T09:52:49.0492796Z end_pid = tl.minimum(start_pid + block_size, total_pids) 2026-02-21T09:52:49.0493710Z for virtual_pid in tl.range(start_pid, end_pid, warp_specialize=False, flatten=False): 2026-02-21T09:52:49.0494296Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:52:49.0494795Z num_pid_m = tl.cdiv(12288, _BLOCK_SIZE_1) 2026-02-21T09:52:49.0495160Z num_pid_n = tl.cdiv(2048, _BLOCK_SIZE_0) 2026-02-21T09:52:49.0495617Z inner_2d_pid = virtual_pid 2026-02-21T09:52:49.0495925Z num_pid_in_group = 4 * num_pid_n 2026-02-21T09:52:49.0496286Z group_id = inner_2d_pid // num_pid_in_group 2026-02-21T09:52:49.0496620Z first_pid_m = group_id * 4 2026-02-21T09:52:49.0496999Z group_size_m = min(num_pid_m - first_pid_m, 4) 2026-02-21T09:52:49.0497509Z pid_0 = first_pid_m + inner_2d_pid % num_pid_in_group % group_size_m 2026-02-21T09:52:49.0498036Z pid_1 = inner_2d_pid % num_pid_in_group // group_size_m 2026-02-21T09:52:49.0498431Z offset_1 = pid_0 * _BLOCK_SIZE_1 2026-02-21T09:52:49.0498879Z indices_1 = (offset_1 + tl.arange(0, _BLOCK_SIZE_1)).to(tl.int32) 2026-02-21T09:52:49.0499342Z offset_0 = pid_1 * _BLOCK_SIZE_0 2026-02-21T09:52:49.0499773Z indices_0 = (offset_0 + tl.arange(0, _BLOCK_SIZE_0)).to(tl.int32) 2026-02-21T09:52:49.0500404Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:52:49.0501069Z acc = tl.full([_BLOCK_SIZE_0, _BLOCK_SIZE_1], 0.0, tl.float32) 2026-02-21T09:52:49.0501550Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:52:49.0502101Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:52:49.0502914Z for offset_2 in tl.range(0, 2048, _BLOCK_SIZE_2, warp_specialize=True, 
disallow_acc_multi_buffer=False, flatten=False): 2026-02-21T09:52:49.0503689Z indices_2 = offset_2 + tl.arange(0, _BLOCK_SIZE_2).to(tl.int32) 2026-02-21T09:52:49.0504140Z acc_copy = acc 2026-02-21T09:52:49.0504436Z acc_copy_0 = acc_copy 2026-02-21T09:52:49.0504985Z # src[matmul.py:66]: acc = torch.addmm(acc, x[tile_m, tile_k], y[tile_k, tile_n]) 2026-02-21T09:52:49.0505638Z load = tl.load(x + (indices_0[:, None] * 2048 + indices_2[None, :] * 1), None) 2026-02-21T09:52:49.0506409Z load_1 = tl.load(y + (indices_2[:, None] * 1 + indices_1[None, :] * 2048), None, eviction_policy='evict_last') 2026-02-21T09:52:49.0507395Z acc = tl.dot(tl.cast(load, tl.float16), tl.cast(load_1, tl.float16), acc=acc_copy_0, input_precision='tf32', out_dtype=tl.float32) 2026-02-21T09:52:49.0508257Z # src[matmul.py:67]: out[tile_m, tile_n] = epilogue(acc, (tile_m, tile_n)) 2026-02-21T09:52:49.0508752Z v_0 = tl.cast(acc, tl.float16) 2026-02-21T09:52:49.0509227Z tl.store(out + (indices_0[:, None] * 12288 + indices_1[None, :] * 1), v_0, None) 2026-02-21T09:52:49.0509614Z 2026-02-21T09:52:49.0510110Z def matmul(x: Tensor, y: Tensor, epilogue: Callable[[Tensor, tuple[Tensor, ...]], Tensor]=lambda acc, tile: acc, *, _launcher=_default_launcher): 2026-02-21T09:52:49.0510728Z """ 2026-02-21T09:52:49.0511126Z Performs matrix multiplication of x and y with an optional epilogue function. 2026-02-21T09:52:49.0511597Z Args: 2026-02-21T09:52:49.0511842Z x (Tensor): Left matrix of shape [m, k]. 2026-02-21T09:52:49.0512217Z y (Tensor): Right matrix of shape [k, n]. 2026-02-21T09:52:49.0512749Z epilogue (Callable, optional): Function applied to the accumulator and tile indices 2026-02-21T09:52:49.0513272Z after the matmul. Defaults to identity (no change). 2026-02-21T09:52:49.0513627Z Returns: 2026-02-21T09:52:49.0513888Z Tensor: Resulting matrix of shape [m, n]. 
2026-02-21T09:52:49.0514205Z """ 2026-02-21T09:52:49.0514425Z # src[matmul.py:57]: m, k = x.size() 2026-02-21T09:52:49.0514784Z m, k = x.size() 2026-02-21T09:52:49.0515034Z # src[matmul.py:58]: k2, n = y.size() 2026-02-21T09:52:49.0515423Z k2, n = y.size() 2026-02-21T09:52:49.0515797Z # src[matmul.py:59]: assert k == k2, f"size mismatch {k} != {k2}" 2026-02-21T09:52:49.0516292Z assert k == k2, f'size mismatch {k} != {k2}' 2026-02-21T09:52:49.0516691Z # src[matmul.py:60]: out = torch.empty( 2026-02-21T09:52:49.0517205Z # src[matmul.py:61]: [m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device 2026-02-21T09:52:49.0517784Z # src[matmul.py:62]: ) 2026-02-21T09:52:49.0518241Z out = torch.empty([m, n], dtype=torch.promote_types(x.dtype, y.dtype), device=x.device) 2026-02-21T09:52:49.0518817Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:52:49.0519233Z _NUM_SM = helion.runtime.get_num_sm(x.device) 2026-02-21T09:52:49.0519656Z # src[matmul.py:63]: for tile_m, tile_n in hl.tile([m, n]): 2026-02-21T09:52:49.0520160Z # src[matmul.py:64]: acc = hl.zeros([tile_m, tile_n], dtype=torch.float32) 2026-02-21T09:52:49.0520652Z # src[matmul.py:65]: for tile_k in hl.tile(k): 2026-02-21T09:52:49.0521019Z # src[matmul.py:63-67]: ... 
2026-02-21T09:52:49.0521504Z _launcher(_helion_matmul, (_NUM_SM * 8,), x, y, out, _NUM_SM, num_warps=4, num_stages=3, maxnreg=64) 2026-02-21T09:52:49.0522044Z # src[matmul.py:68]: return out 2026-02-21T09:52:49.0522323Z return out 2026-02-21T09:53:11.1915535Z WARNING:tritonbench.utils.triton_op:Completed input ID 11: 2026-02-21T09:53:11.1915979Z (M, N, K) 2026-02-21T09:53:11.1916167Z ------------------- 2026-02-21T09:53:11.1916377Z (2048, 12288, 2048) 2026-02-21T09:53:11.1916500Z 2026-02-21T09:53:11.1917268Z 100%|██████████| 8/8 [55:07<00:00, 451.56s/it] 2026-02-21T09:53:11.1917626Z 100%|██████████| 8/8 [55:07<00:00, 413.41s/it] 2026-02-21T09:53:11.1934544Z INFO:tritonbench.utils.run_utils:[tritonbench] Output result csv to /tmp/tmpcpoe03m1.csv 2026-02-21T09:53:12.5753762Z (M, N, K) triton_tutorial_matmul-speedup triton_tutorial_matmul-accuracy pt2_triton_matmul-speedup pt2_triton_matmul-accuracy helion_matmul_tritonbench-speedup helion_matmul_tritonbench-accuracy 2026-02-21T09:53:12.5754783Z ------------------- -------------------------------- --------------------------------- --------------------------- ---------------------------- ----------------------------------- ------------------------------------ 2026-02-21T09:53:12.5755436Z (4096, 1024, 1024) 0.770106 1 0.765488 1 0.853103 1 2026-02-21T09:53:12.5756008Z (4096, 2048, 2048) 0.745811 1 0.731751 1 0.925407 1 2026-02-21T09:53:12.5756567Z (2048, 4096, 2048) 0.752166 1 0.719532 1 0.92613 1 2026-02-21T09:53:12.5757106Z (1024, 8192, 1024) 0.680925 1 0.688646 1 0.841029 1 2026-02-21T09:53:12.5757655Z (8192, 2048, 2048) 0.749025 1 0.722132 1 0.899948 1 2026-02-21T09:53:12.5758205Z (12288, 1024, 1024) 0.68574 1 0.678403 1 0.662182 1 2026-02-21T09:53:12.5758762Z (1024, 12288, 1024) 0.670338 1 0.660525 1 0.686711 1 2026-02-21T09:53:12.5759584Z (2048, 12288, 2048) 0.674012 1 0.671558 1 0.811119 1 2026-02-21T09:53:12.5760148Z average 0.716015 1 0.704754 1 0.825704 1 2026-02-21T09:53:15.8716100Z ✅ Completed benchmark for 
kernel: gemm 2026-02-21T09:53:15.8727416Z [ 2026-02-21T09:53:15.8727699Z { 2026-02-21T09:53:15.8727874Z "benchmark": { 2026-02-21T09:53:15.8728111Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8728350Z "extra_info": { 2026-02-21T09:53:15.8728564Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8728808Z } 2026-02-21T09:53:15.8728967Z }, 2026-02-21T09:53:15.8729173Z "model": { 2026-02-21T09:53:15.8729346Z "name": "gemm" 2026-02-21T09:53:15.8729544Z }, 2026-02-21T09:53:15.8729708Z "metric": { 2026-02-21T09:53:15.8729896Z "name": "triton_speedup", 2026-02-21T09:53:15.8730071Z "benchmark_values": [ 2026-02-21T09:53:15.8730226Z 0.7701064089246069, 2026-02-21T09:53:15.8730660Z 0.7458110700079382, 2026-02-21T09:53:15.8730898Z 0.7521662752685057, 2026-02-21T09:53:15.8731056Z 0.6809254795746362, 2026-02-21T09:53:15.8731192Z 0.7490252057034579, 2026-02-21T09:53:15.8731339Z 0.6857395524866101, 2026-02-21T09:53:15.8731476Z 0.6703375245614475, 2026-02-21T09:53:15.8731624Z 0.6740121393210164 2026-02-21T09:53:15.8731775Z ] 2026-02-21T09:53:15.8731889Z }, 2026-02-21T09:53:15.8732014Z "shape": [ 2026-02-21T09:53:15.8732139Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8732285Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8732421Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8732571Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8732705Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8732850Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8732998Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8733149Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8733288Z ] 2026-02-21T09:53:15.8733401Z }, 2026-02-21T09:53:15.8733520Z { 2026-02-21T09:53:15.8733633Z "benchmark": { 2026-02-21T09:53:15.8733786Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8733947Z "extra_info": { 2026-02-21T09:53:15.8734096Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8734240Z } 2026-02-21T09:53:15.8734358Z }, 2026-02-21T09:53:15.8734473Z "model": { 2026-02-21T09:53:15.8734605Z "name": "gemm" 2026-02-21T09:53:15.8735081Z }, 
2026-02-21T09:53:15.8735207Z "metric": { 2026-02-21T09:53:15.8735356Z "name": "triton_accuracy", 2026-02-21T09:53:15.8735518Z "benchmark_values": [ 2026-02-21T09:53:15.8735670Z 1.0, 2026-02-21T09:53:15.8735841Z 1.0, 2026-02-21T09:53:15.8735958Z 1.0, 2026-02-21T09:53:15.8736080Z 1.0, 2026-02-21T09:53:15.8736195Z 1.0, 2026-02-21T09:53:15.8736319Z 1.0, 2026-02-21T09:53:15.8736436Z 1.0, 2026-02-21T09:53:15.8736565Z 1.0 2026-02-21T09:53:15.8736679Z ] 2026-02-21T09:53:15.8736804Z }, 2026-02-21T09:53:15.8736928Z "shape": [ 2026-02-21T09:53:15.8737078Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8737241Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8737384Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8737531Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8737667Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8737813Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8737958Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8738108Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8738240Z ] 2026-02-21T09:53:15.8738361Z }, 2026-02-21T09:53:15.8738469Z { 2026-02-21T09:53:15.8738590Z "benchmark": { 2026-02-21T09:53:15.8738811Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8738970Z "extra_info": { 2026-02-21T09:53:15.8739120Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8739266Z } 2026-02-21T09:53:15.8739388Z }, 2026-02-21T09:53:15.8739501Z "model": { 2026-02-21T09:53:15.8739637Z "name": "gemm" 2026-02-21T09:53:15.8739768Z }, 2026-02-21T09:53:15.8739890Z "metric": { 2026-02-21T09:53:15.8740035Z "name": "torch_compile_speedup", 2026-02-21T09:53:15.8740275Z "benchmark_values": [ 2026-02-21T09:53:15.8740416Z 0.7654880105008941, 2026-02-21T09:53:15.8740556Z 0.7317505447589749, 2026-02-21T09:53:15.8740698Z 0.7195324366276608, 2026-02-21T09:53:15.8740835Z 0.6886461355642817, 2026-02-21T09:53:15.8740979Z 0.7221315661680846, 2026-02-21T09:53:15.8741115Z 0.6784032855540102, 2026-02-21T09:53:15.8741258Z 0.6605247744736658, 2026-02-21T09:53:15.8741397Z 0.6715582457104065 2026-02-21T09:53:15.8741544Z ] 
2026-02-21T09:53:15.8741655Z }, 2026-02-21T09:53:15.8741775Z "shape": [ 2026-02-21T09:53:15.8741898Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8742040Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8742181Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8742317Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8742500Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8742641Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8742835Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8742976Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8743116Z ] 2026-02-21T09:53:15.8743226Z }, 2026-02-21T09:53:15.8743346Z { 2026-02-21T09:53:15.8743462Z "benchmark": { 2026-02-21T09:53:15.8743612Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8743788Z "extra_info": { 2026-02-21T09:53:15.8743936Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8744094Z } 2026-02-21T09:53:15.8744203Z }, 2026-02-21T09:53:15.8744319Z "model": { 2026-02-21T09:53:15.8744443Z "name": "gemm" 2026-02-21T09:53:15.8744579Z }, 2026-02-21T09:53:15.8744761Z "metric": { 2026-02-21T09:53:15.8744911Z "name": "torch_compile_accuracy", 2026-02-21T09:53:15.8745086Z "benchmark_values": [ 2026-02-21T09:53:15.8745237Z 1.0, 2026-02-21T09:53:15.8745355Z 1.0, 2026-02-21T09:53:15.8745479Z 1.0, 2026-02-21T09:53:15.8745604Z 1.0, 2026-02-21T09:53:15.8745720Z 1.0, 2026-02-21T09:53:15.8745845Z 1.0, 2026-02-21T09:53:15.8745959Z 1.0, 2026-02-21T09:53:15.8746086Z 1.0 2026-02-21T09:53:15.8746203Z ] 2026-02-21T09:53:15.8746324Z }, 2026-02-21T09:53:15.8746438Z "shape": [ 2026-02-21T09:53:15.8746568Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8746706Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8746855Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8746991Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8747136Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8747279Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8747424Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8747573Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8747709Z ] 2026-02-21T09:53:15.8747827Z }, 2026-02-21T09:53:15.8747935Z { 
2026-02-21T09:53:15.8748064Z "benchmark": { 2026-02-21T09:53:15.8748203Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8748370Z "extra_info": { 2026-02-21T09:53:15.8748518Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8748684Z } 2026-02-21T09:53:15.8748800Z }, 2026-02-21T09:53:15.8748928Z "model": { 2026-02-21T09:53:15.8749067Z "name": "gemm" 2026-02-21T09:53:15.8749203Z }, 2026-02-21T09:53:15.8749327Z "metric": { 2026-02-21T09:53:15.8749469Z "name": "helion_speedup", 2026-02-21T09:53:15.8749640Z "benchmark_values": [ 2026-02-21T09:53:15.8749790Z 0.8531031603040449, 2026-02-21T09:53:15.8749942Z 0.9254069927396793, 2026-02-21T09:53:15.8750084Z 0.9261303055412549, 2026-02-21T09:53:15.8750237Z 0.8410288906103505, 2026-02-21T09:53:15.8750425Z 0.8999479756745634, 2026-02-21T09:53:15.8750581Z 0.662182004684579, 2026-02-21T09:53:15.8750740Z 0.6867106349943948, 2026-02-21T09:53:15.8750886Z 0.8111194497421527 2026-02-21T09:53:15.8751036Z ] 2026-02-21T09:53:15.8751151Z }, 2026-02-21T09:53:15.8751276Z "shape": [ 2026-02-21T09:53:15.8751409Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8751572Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8751755Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8751904Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8752044Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8752197Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8752348Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8752503Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8752651Z ] 2026-02-21T09:53:15.8752766Z }, 2026-02-21T09:53:15.8752888Z { 2026-02-21T09:53:15.8753005Z "benchmark": { 2026-02-21T09:53:15.8753159Z "name": "Helion Benchmark", 2026-02-21T09:53:15.8753326Z "extra_info": { 2026-02-21T09:53:15.8753483Z "device": "NVIDIA B200" 2026-02-21T09:53:15.8753636Z } 2026-02-21T09:53:15.8753759Z }, 2026-02-21T09:53:15.8753876Z "model": { 2026-02-21T09:53:15.8754011Z "name": "gemm" 2026-02-21T09:53:15.8754144Z }, 2026-02-21T09:53:15.8754266Z "metric": { 2026-02-21T09:53:15.8754447Z "name": "helion_accuracy", 
2026-02-21T09:53:15.8754651Z "benchmark_values": [ 2026-02-21T09:53:15.8754864Z 1.0, 2026-02-21T09:53:15.8754986Z 1.0, 2026-02-21T09:53:15.8755112Z 1.0, 2026-02-21T09:53:15.8755231Z 1.0, 2026-02-21T09:53:15.8755358Z 1.0, 2026-02-21T09:53:15.8755477Z 1.0, 2026-02-21T09:53:15.8755604Z 1.0, 2026-02-21T09:53:15.8755724Z 1.0 2026-02-21T09:53:15.8755853Z ] 2026-02-21T09:53:15.8755968Z }, 2026-02-21T09:53:15.8756095Z "shape": [ 2026-02-21T09:53:15.8756232Z "(4096, 1024, 1024)", 2026-02-21T09:53:15.8756377Z "(4096, 2048, 2048)", 2026-02-21T09:53:15.8756531Z "(2048, 4096, 2048)", 2026-02-21T09:53:15.8756676Z "(1024, 8192, 1024)", 2026-02-21T09:53:15.8756829Z "(8192, 2048, 2048)", 2026-02-21T09:53:15.8756972Z "(12288, 1024, 1024)", 2026-02-21T09:53:15.8757130Z "(1024, 12288, 1024)", 2026-02-21T09:53:15.8757276Z "(2048, 12288, 2048)" 2026-02-21T09:53:15.8757427Z ] 2026-02-21T09:53:15.8757555Z } 2026-02-21T09:53:15.8763233Z ] 2026-02-21T09:53:15.8851306Z ##[group]Run pytorch/test-infra/.github/actions/gather-benchmark-metadata@main 2026-02-21T09:53:15.8851741Z with: 2026-02-21T09:53:15.8852408Z github-token: *** 2026-02-21T09:53:15.8852617Z venv: .venv/bin/activate 2026-02-21T09:53:15.8852846Z schema-version: v3 2026-02-21T09:53:15.8853047Z env: 2026-02-21T09:53:15.8853232Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:15.8853515Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8853864Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:15.8854219Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8854523Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8854880Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8855401Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:15.8856017Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:15.8856330Z ##[endgroup] 
2026-02-21T09:53:15.8934451Z ##[group]Run set -eux 2026-02-21T09:53:15.8934922Z set -eux 2026-02-21T09:53:15.8935118Z  2026-02-21T09:53:15.8935332Z if [[ -z "${GITHUB_TOKEN}" ]]; then 2026-02-21T09:53:15.8935643Z  echo "Missing github-token input" 2026-02-21T09:53:15.8935925Z  exit 1 2026-02-21T09:53:15.8936106Z fi 2026-02-21T09:53:15.8937332Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:15.8937636Z env: 2026-02-21T09:53:15.8937956Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:15.8938239Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8938603Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:15.8938991Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8939311Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8939661Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.8940356Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:15.8941008Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:15.8941647Z GITHUB_TOKEN: *** 2026-02-21T09:53:15.8941858Z ##[endgroup] 2026-02-21T09:53:15.9667845Z + [[ -z *** ]] 2026-02-21T09:53:15.9727966Z ##[group]Run pytorch/test-infra/.github/actions/get-workflow-job-id@main 2026-02-21T09:53:15.9728224Z with: 2026-02-21T09:53:15.9728487Z github-token: *** 2026-02-21T09:53:15.9728634Z env: 2026-02-21T09:53:15.9728766Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:15.9728973Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9729218Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:15.9729466Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9729681Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9729915Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9730277Z LD_LIBRARY_PATH: 
/__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:15.9730657Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:15.9730878Z ##[endgroup] 2026-02-21T09:53:15.9739841Z ##[group]Run set -eux 2026-02-21T09:53:15.9740009Z set -eux 2026-02-21T09:53:15.9740158Z  2026-02-21T09:53:15.9740462Z python3 "${GITHUB_ACTION_PATH}/../../scripts/get_workflow_job_id.py" "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2026-02-21T09:53:15.9740895Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:15.9741102Z env: 2026-02-21T09:53:15.9741237Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:15.9741440Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9741680Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:15.9742070Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9742298Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9742509Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:15.9742871Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:15.9743253Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:15.9743578Z GITHUB_TOKEN: *** 2026-02-21T09:53:15.9743724Z ##[endgroup] 2026-02-21T09:53:16.0434588Z + python3 /__w/_actions/pytorch/test-infra/main/.github/actions/get-workflow-job-id/../../scripts/get_workflow_job_id.py 22253280836 dgxb200-05-1002 2026-02-21T09:53:17.7240391Z setting job-id=64380329751 2026-02-21T09:53:17.7243020Z setting job-name=run-b200 (gemm) / benchmark-cu130-gemm-py3.12-b200 2026-02-21T09:53:17.7434346Z ##[group]Run set -eux 2026-02-21T09:53:17.7434752Z set -eux 2026-02-21T09:53:17.7435008Z  2026-02-21T09:53:17.7435248Z if [[ -n ".venv/bin/activate" ]]; then 2026-02-21T09:53:17.7435559Z  source ".venv/bin/activate" 2026-02-21T09:53:17.7435817Z fi 
2026-02-21T09:53:17.7435994Z  2026-02-21T09:53:17.7436343Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_metadata.py" \ 2026-02-21T09:53:17.7436828Z  --schema-version "${SCHEMA_VERSION}" \ 2026-02-21T09:53:17.7437132Z  --repo "${REPO}" \ 2026-02-21T09:53:17.7437396Z  --head-branch "${HEAD_BRANCH}" \ 2026-02-21T09:53:17.7437679Z  --head-sha "${HEAD_SHA}" \ 2026-02-21T09:53:17.7438161Z  --workflow-id "${WORKFLOW_RUN_ID}" \ 2026-02-21T09:53:17.7438492Z  --run-attempt "${RUN_ATTEMPT}" \ 2026-02-21T09:53:17.7438925Z  --job-id "${JOB_ID}" \ 2026-02-21T09:53:17.7439309Z  --job-name "${JOB_NAME}" 2026-02-21T09:53:17.7439872Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:17.7440311Z env: 2026-02-21T09:53:17.7440556Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:17.7441000Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.7441377Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:17.7441747Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.7442068Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.7442392Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.7442941Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:17.7443533Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:17.7443859Z SCHEMA_VERSION: v3 2026-02-21T09:53:17.7444129Z REPO: pytorch/helion 2026-02-21T09:53:17.7444445Z HEAD_BRANCH: refs/heads/main 2026-02-21T09:53:17.7444920Z HEAD_SHA: 874a7d0cadab18218a84ad3579d329dc95c51820 2026-02-21T09:53:17.7445252Z WORKFLOW_RUN_ID: 22253280836 2026-02-21T09:53:17.7445484Z RUN_ATTEMPT: 1 2026-02-21T09:53:17.7445694Z JOB_ID: 64380329751 2026-02-21T09:53:17.7445990Z JOB_NAME: run-b200 (gemm) / benchmark-cu130-gemm-py3.12-b200 2026-02-21T09:53:17.7446323Z ##[endgroup] 2026-02-21T09:53:17.8430358Z + [[ -n .venv/bin/activate ]] 
2026-02-21T09:53:17.8430745Z + source .venv/bin/activate 2026-02-21T09:53:17.8431062Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8431312Z ++ '[' -n x ']' 2026-02-21T09:53:17.8431593Z ++ SCRIPT_PATH=.venv/bin/activate 2026-02-21T09:53:17.8432115Z ++ '[' .venv/bin/activate = /__w/_temp/f30bbd48-d724-420e-bd63-b6dc0d0f1fbe.sh ']' 2026-02-21T09:53:17.8432692Z ++ deactivate nondestructive 2026-02-21T09:53:17.8433011Z ++ unset -f pydoc 2026-02-21T09:53:17.8433275Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8433530Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8433771Z ++ hash -r 2026-02-21T09:53:17.8434007Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8434259Z ++ unset VIRTUAL_ENV 2026-02-21T09:53:17.8434557Z ++ unset VIRTUAL_ENV_PROMPT 2026-02-21T09:53:17.8434957Z ++ '[' '!' nondestructive = nondestructive ']' 2026-02-21T09:53:17.8435687Z ++ VIRTUAL_ENV=/__w/helion/helion/.venv 2026-02-21T09:53:17.8436072Z ++ '[' linux-gnu = cygwin ']' 2026-02-21T09:53:17.8436399Z ++ '[' linux-gnu = msys ']' 2026-02-21T09:53:17.8436706Z ++ export VIRTUAL_ENV 2026-02-21T09:53:17.8436979Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8437244Z ++ unset SCRIPT_PATH 2026-02-21T09:53:17.8438646Z ++ _OLD_VIRTUAL_PATH=/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T09:53:17.8441454Z ++ PATH=/__w/helion/helion/.venv/bin:/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T09:53:17.8443103Z ++ export PATH 2026-02-21T09:53:17.8443395Z ++ '[' xhelion '!=' x ']' 2026-02-21T09:53:17.8443744Z ++ VIRTUAL_ENV_PROMPT=helion 2026-02-21T09:53:17.8444102Z ++ export VIRTUAL_ENV_PROMPT 2026-02-21T09:53:17.8444438Z ++ '[' 
-z '' ']' 2026-02-21T09:53:17.8444745Z ++ '[' -z '' ']' 2026-02-21T09:53:17.8445017Z ++ _OLD_VIRTUAL_PS1= 2026-02-21T09:53:17.8445297Z ++ PS1='(helion) ' 2026-02-21T09:53:17.8445567Z ++ export PS1 2026-02-21T09:53:17.8445829Z ++ alias pydoc 2026-02-21T09:53:17.8446075Z ++ true 2026-02-21T09:53:17.8446314Z ++ hash -r 2026-02-21T09:53:17.8448470Z + python3 /__w/_actions/pytorch/test-infra/main/.github/actions/gather-benchmark-metadata/../../scripts/benchmarks/gather_metadata.py --schema-version v3 --repo pytorch/helion --head-branch refs/heads/main --head-sha 874a7d0cadab18218a84ad3579d329dc95c51820 --workflow-id 22253280836 --run-attempt 1 --job-id 64380329751 --job-name 'run-b200 (gemm) / benchmark-cu130-gemm-py3.12-b200' 2026-02-21T09:53:17.8826631Z ##[group]Run pytorch/test-infra/.github/actions/gather-runners-info@main 2026-02-21T09:53:17.8827005Z with: 2026-02-21T09:53:17.8827143Z venv: .venv/bin/activate 2026-02-21T09:53:17.8827302Z env: 2026-02-21T09:53:17.8827442Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:17.8827651Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8827893Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:17.8828141Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8828362Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8828572Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8828946Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:17.8829336Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:17.8829588Z ##[endgroup] 2026-02-21T09:53:17.8839104Z ##[group]Run set -eux 2026-02-21T09:53:17.8839280Z set -eux 2026-02-21T09:53:17.8839422Z  2026-02-21T09:53:17.8839567Z if command -v nvidia-smi; then 2026-02-21T09:53:17.8839762Z  DEVICE_NAME=cuda 2026-02-21T09:53:17.8839924Z  nvidia-smi 2026-02-21T09:53:17.8840079Z elif command 
-v rocm-smi; then 2026-02-21T09:53:17.8840304Z  DEVICE_NAME=rocm 2026-02-21T09:53:17.8840458Z  rocm-smi 2026-02-21T09:53:17.8840611Z elif command -v hl-smi; then 2026-02-21T09:53:17.8840786Z  DEVICE_NAME=hpu 2026-02-21T09:53:17.8840944Z  hl-smi 2026-02-21T09:53:17.8841075Z else 2026-02-21T09:53:17.8841222Z  arch=$(uname -m) 2026-02-21T09:53:17.8841374Z  2026-02-21T09:53:17.8841520Z  case "$arch" in 2026-02-21T09:53:17.8841696Z  aarch64|arm64) 2026-02-21T09:53:17.8841857Z  DEVICE_NAME=arm64-cpu 2026-02-21T09:53:17.8842032Z  ;; 2026-02-21T09:53:17.8842164Z  *) 2026-02-21T09:53:17.8842307Z  DEVICE_NAME=cpu 2026-02-21T09:53:17.8842462Z  ;; 2026-02-21T09:53:17.8842596Z  esac 2026-02-21T09:53:17.8842721Z  lscpu 2026-02-21T09:53:17.8842855Z fi 2026-02-21T09:53:17.8843025Z echo "DEVICE_NAME=$DEVICE_NAME" >> $GITHUB_ENV 2026-02-21T09:53:17.8843319Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:17.8843520Z env: 2026-02-21T09:53:17.8843653Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:17.8843859Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8844165Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:17.8844416Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8844629Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8844879Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:17.8845236Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:17.8845617Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:17.8845834Z ##[endgroup] 2026-02-21T09:53:17.9768794Z + command -v nvidia-smi 2026-02-21T09:53:17.9769019Z /usr/bin/nvidia-smi 2026-02-21T09:53:17.9769237Z + DEVICE_NAME=cuda 2026-02-21T09:53:17.9769406Z + nvidia-smi 2026-02-21T09:53:17.9928461Z Sat Feb 21 09:53:17 2026 2026-02-21T09:53:17.9928791Z 
+-----------------------------------------------------------------------------------------+ 2026-02-21T09:53:17.9929199Z | NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 | 2026-02-21T09:53:17.9929847Z +-----------------------------------------+------------------------+----------------------+ 2026-02-21T09:53:17.9930247Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2026-02-21T09:53:17.9930905Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2026-02-21T09:53:17.9931276Z | | | MIG M. | 2026-02-21T09:53:17.9931657Z |=========================================+========================+======================| 2026-02-21T09:53:18.0042493Z | 0 NVIDIA B200 Off | 00000000:1B:00.0 Off | 0 | 2026-02-21T09:53:18.0042855Z | N/A 33C P0 175W / 750W | 0MiB / 183359MiB | 0% Default | 2026-02-21T09:53:18.0043154Z | | | Disabled | 2026-02-21T09:53:18.0043446Z +-----------------------------------------+------------------------+----------------------+ 2026-02-21T09:53:18.0043674Z 2026-02-21T09:53:18.0043821Z +-----------------------------------------------------------------------------------------+ 2026-02-21T09:53:18.0044137Z | Processes: | 2026-02-21T09:53:18.0044447Z | GPU GI CI PID Type Process name GPU Memory | 2026-02-21T09:53:18.0044833Z | ID ID Usage | 2026-02-21T09:53:18.0045086Z |=========================================================================================| 2026-02-21T09:53:18.0045385Z | No running processes found | 2026-02-21T09:53:18.0045750Z +-----------------------------------------------------------------------------------------+ 2026-02-21T09:53:18.0348169Z + echo DEVICE_NAME=cuda 2026-02-21T09:53:18.0383472Z ##[group]Run set -eux 2026-02-21T09:53:18.0383663Z set -eux 2026-02-21T09:53:18.0383799Z  2026-02-21T09:53:18.0383957Z if [[ "${DEVICE_NAME}" == "cuda" ]]; then 2026-02-21T09:53:18.0384184Z  # Return the same device name as PyTorch 2026-02-21T09:53:18.0384478Z  DEVICE_TYPE=$(nvidia-smi -i 0 
--query-gpu=name --format=csv,noheader) 2026-02-21T09:53:18.0384826Z elif [[ "${DEVICE_NAME}" == "rocm" ]]; then 2026-02-21T09:53:18.0385142Z  DEVICE_TYPE=$(rocminfo | grep "Marketing Name" | tail -n1 | awk -F':' '{print $2}' | xargs) 2026-02-21T09:53:18.0385466Z elif [[ "${DEVICE_NAME}" == "hpu" ]]; then 2026-02-21T09:53:18.0385826Z  DEVICE_TYPE="Intel Gaudi3 "$(hl-smi -q | grep "Product Name" | head -n 1 | awk -F ':' '{print $2}' | sed 's/^ *//') 2026-02-21T09:53:18.0386185Z elif [[ "${DEVICE_NAME}" == "cpu" ]]; then 2026-02-21T09:53:18.0386891Z  DEVICE_TYPE="$(lscpu | grep "Model name" | sed -E 's/.*Model name:[[:space:]]*//; s/Intel\(R\)//g; s/\(R\)//g; s/\(TM\)//g; s/CPU//g; s/Processor//g; s/[[:space:]]+/ /g; s/^ //; s/ $//; s/ /_/g')_$(awk -F: '/Core\(s\) per socket/ {c=$2} /Socket\(s\)/ {s=$2} END {gsub(/ /,"",c); gsub(/ /,"",s); printf "%sc", c*s}' < <(lscpu))" 2026-02-21T09:53:18.0387606Z elif [[ "${DEVICE_NAME}" == "arm64-cpu" ]]; then 2026-02-21T09:53:18.0387922Z  DEVICE_TYPE=$(lscpu | grep 'Vendor ID' | cut -f 2 -d ":" | awk '{$1=$1}1' | cut -f 2 -d " ") 2026-02-21T09:53:18.0388209Z fi 2026-02-21T09:53:18.0388373Z echo "DEVICE_TYPE=$DEVICE_TYPE" >> $GITHUB_ENV 2026-02-21T09:53:18.0388717Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:18.0388915Z env: 2026-02-21T09:53:18.0389048Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:18.0389248Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.0389487Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:18.0389729Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.0390023Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.0390241Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.0390726Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:18.0391102Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 
2026-02-21T09:53:18.0391402Z DEVICE_NAME: cuda 2026-02-21T09:53:18.0391534Z ##[endgroup] 2026-02-21T09:53:18.2409275Z + [[ cuda == \c\u\d\a ]] 2026-02-21T09:53:18.2411345Z ++ nvidia-smi -i 0 --query-gpu=name --format=csv,noheader 2026-02-21T09:53:18.2600923Z + DEVICE_TYPE='NVIDIA B200' 2026-02-21T09:53:18.2601151Z + echo 'DEVICE_TYPE=NVIDIA B200' 2026-02-21T09:53:18.2637522Z ##[group]Run set -eux 2026-02-21T09:53:18.2637713Z set -eux 2026-02-21T09:53:18.2637853Z  2026-02-21T09:53:18.2638017Z if [[ -n ".venv/bin/activate" ]]; then 2026-02-21T09:53:18.2638238Z  source ".venv/bin/activate" 2026-02-21T09:53:18.2638424Z fi 2026-02-21T09:53:18.2638550Z  2026-02-21T09:53:18.2638763Z python3 -mpip install psutil==7.0.0 nvidia-ml-py==13.580.82 2026-02-21T09:53:18.2639140Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py" 2026-02-21T09:53:18.2639516Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:18.2639725Z env: 2026-02-21T09:53:18.2639859Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:18.2640067Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.2640314Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:18.2640572Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.2640797Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.2641013Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:18.2641389Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:18.2641776Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:18.2642004Z DEVICE_NAME: cuda 2026-02-21T09:53:18.2642150Z DEVICE_TYPE: NVIDIA B200 2026-02-21T09:53:18.2642313Z ##[endgroup] 2026-02-21T09:53:18.3570139Z + [[ -n .venv/bin/activate ]] 2026-02-21T09:53:18.3570525Z + source .venv/bin/activate 2026-02-21T09:53:18.3570848Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3571103Z ++ '[' 
-n x ']' 2026-02-21T09:53:18.3571387Z ++ SCRIPT_PATH=.venv/bin/activate 2026-02-21T09:53:18.3571938Z ++ '[' .venv/bin/activate = /__w/_temp/115198c3-74a3-4ba3-9898-029b6de8b547.sh ']' 2026-02-21T09:53:18.3572533Z ++ deactivate nondestructive 2026-02-21T09:53:18.3572878Z ++ unset -f pydoc 2026-02-21T09:53:18.3573147Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3573424Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3573666Z ++ hash -r 2026-02-21T09:53:18.3573922Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3574195Z ++ unset VIRTUAL_ENV 2026-02-21T09:53:18.3574503Z ++ unset VIRTUAL_ENV_PROMPT 2026-02-21T09:53:18.3574950Z ++ '[' '!' nondestructive = nondestructive ']' 2026-02-21T09:53:18.3575385Z ++ VIRTUAL_ENV=/__w/helion/helion/.venv 2026-02-21T09:53:18.3575800Z ++ '[' linux-gnu = cygwin ']' 2026-02-21T09:53:18.3576148Z ++ '[' linux-gnu = msys ']' 2026-02-21T09:53:18.3576501Z ++ export VIRTUAL_ENV 2026-02-21T09:53:18.3576793Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3577081Z ++ unset SCRIPT_PATH 2026-02-21T09:53:18.3578532Z ++ _OLD_VIRTUAL_PATH=/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T09:53:18.3581202Z ++ PATH=/__w/helion/helion/.venv/bin:/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T09:53:18.3582889Z ++ export PATH 2026-02-21T09:53:18.3583144Z ++ '[' xhelion '!=' x ']' 2026-02-21T09:53:18.3583416Z ++ VIRTUAL_ENV_PROMPT=helion 2026-02-21T09:53:18.3583707Z ++ export VIRTUAL_ENV_PROMPT 2026-02-21T09:53:18.3583982Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3584412Z ++ '[' -z '' ']' 2026-02-21T09:53:18.3584706Z ++ _OLD_VIRTUAL_PS1= 2026-02-21T09:53:18.3585071Z ++ 
PS1='(helion) ' 2026-02-21T09:53:18.3585330Z ++ export PS1 2026-02-21T09:53:18.3585581Z ++ alias pydoc 2026-02-21T09:53:18.3585824Z ++ true 2026-02-21T09:53:18.3586053Z ++ hash -r 2026-02-21T09:53:18.3586408Z + python3 -mpip install psutil==7.0.0 nvidia-ml-py==13.580.82 2026-02-21T09:53:19.0200558Z Collecting psutil==7.0.0 2026-02-21T09:53:19.0812489Z Downloading psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB) 2026-02-21T09:53:19.1022999Z Collecting nvidia-ml-py==13.580.82 2026-02-21T09:53:19.1095127Z Downloading nvidia_ml_py-13.580.82-py3-none-any.whl.metadata (9.6 kB) 2026-02-21T09:53:19.1203205Z Downloading psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (277 kB) 2026-02-21T09:53:19.1457664Z Downloading nvidia_ml_py-13.580.82-py3-none-any.whl (49 kB) 2026-02-21T09:53:19.2319166Z Installing collected packages: nvidia-ml-py, psutil 2026-02-21T09:53:19.2327141Z Attempting uninstall: nvidia-ml-py 2026-02-21T09:53:19.2351132Z Found existing installation: nvidia-ml-py 13.590.48 2026-02-21T09:53:19.2363782Z Uninstalling nvidia-ml-py-13.590.48: 2026-02-21T09:53:19.3077438Z Successfully uninstalled nvidia-ml-py-13.590.48 2026-02-21T09:53:19.3451595Z Attempting uninstall: psutil 2026-02-21T09:53:19.3474197Z Found existing installation: psutil 7.2.2 2026-02-21T09:53:19.3486911Z Uninstalling psutil-7.2.2: 2026-02-21T09:53:19.3493778Z Successfully uninstalled psutil-7.2.2 2026-02-21T09:53:19.4609557Z 2026-02-21T09:53:19.4641135Z Successfully installed nvidia-ml-py-13.580.82 psutil-7.0.0 2026-02-21T09:53:19.8531471Z + python3 /__w/_actions/pytorch/test-infra/main/.github/actions/gather-runners-info/../../scripts/benchmarks/gather_runners_info.py 2026-02-21T09:53:21.9594787Z ##[group]Run pytorch/test-infra/.github/actions/gather-dependencies@main 2026-02-21T09:53:21.9595213Z with: 2026-02-21T09:53:21.9595418Z venv: 
.venv/bin/activate 2026-02-21T09:53:21.9595658Z env: 2026-02-21T09:53:21.9595862Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:21.9596173Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9596556Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:21.9596943Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9597273Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9597620Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9598194Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:21.9598835Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:21.9599184Z DEVICE_NAME: cuda 2026-02-21T09:53:21.9599436Z DEVICE_TYPE: NVIDIA B200 2026-02-21T09:53:21.9599674Z ##[endgroup] 2026-02-21T09:53:21.9611980Z ##[group]Run set -eux 2026-02-21T09:53:21.9612233Z set -eux 2026-02-21T09:53:21.9612444Z  2026-02-21T09:53:21.9612664Z # TODO (huydhn): Implement this part 2026-02-21T09:53:21.9613025Z echo "dependencies={}" >> "${GITHUB_OUTPUT}" 2026-02-21T09:53:21.9613481Z shell: bash --noprofile --norc -e -o pipefail {0} 2026-02-21T09:53:21.9613779Z env: 2026-02-21T09:53:21.9613987Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:21.9614283Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9614725Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:21.9615120Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9615557Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9615902Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:21.9616477Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:21.9617101Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:21.9617540Z DEVICE_NAME: cuda 
2026-02-21T09:53:21.9617764Z DEVICE_TYPE: NVIDIA B200 2026-02-21T09:53:21.9618010Z ##[endgroup] 2026-02-21T09:53:22.0460464Z + echo 'dependencies={}' 2026-02-21T09:53:22.0523756Z ##[group]Run actions/upload-artifact@v6 2026-02-21T09:53:22.0524082Z with: 2026-02-21T09:53:22.0524321Z name: benchmark-results-b200-gemm 2026-02-21T09:53:22.0524622Z path: test/test-reports 2026-02-21T09:53:22.0524988Z if-no-files-found: warn 2026-02-21T09:53:22.0525244Z compression-level: 6 2026-02-21T09:53:22.0525489Z overwrite: false 2026-02-21T09:53:22.0525705Z include-hidden-files: false 2026-02-21T09:53:22.0525964Z env: 2026-02-21T09:53:22.0526154Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T09:53:22.0526459Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:22.0526831Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T09:53:22.0527195Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:22.0527527Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:22.0527852Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T09:53:22.0528436Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64 2026-02-21T09:53:22.0529026Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T09:53:22.0529357Z DEVICE_NAME: cuda 2026-02-21T09:53:22.0529553Z DEVICE_TYPE: NVIDIA B200 2026-02-21T09:53:22.0529779Z ##[endgroup] 2026-02-21T09:53:22.0533179Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T09:53:22.3388518Z With the provided path, there will be 1 file uploaded 2026-02-21T09:53:22.3392422Z Artifact name is valid! 2026-02-21T09:53:22.3392646Z Root directory input is valid! 
2026-02-21T09:53:22.6136310Z Beginning upload of artifact content to blob storage 2026-02-21T09:53:22.9546868Z Uploaded bytes 686 2026-02-21T09:53:23.0436247Z Finished uploading artifact content to blob storage! 2026-02-21T09:53:23.0438307Z SHA256 digest of uploaded artifact zip is a2ad714b446fb56e11a86a4c59ef66dc1ae7a7f0c951a0b3f5e7659acd83a8ee 2026-02-21T09:53:23.0438748Z Finalizing artifact upload 2026-02-21T09:53:23.3460591Z Artifact benchmark-results-b200-gemm.zip successfully finalized. Artifact ID 5600737555 2026-02-21T09:53:23.3462625Z Artifact benchmark-results-b200-gemm has been successfully uploaded! Final size is 686 bytes. Artifact ID is 5600737555 2026-02-21T09:53:23.3463437Z Artifact download URL: https://github.com/pytorch/helion/actions/runs/22253280836/artifacts/5600737555 2026-02-21T09:53:23.3604951Z Post job cleanup. 2026-02-21T09:53:23.3609990Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T09:53:23.6389769Z (node:157513) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2026-02-21T09:53:23.6390878Z (Use `node --trace-deprecation ...` to show where the warning was created) 2026-02-21T09:53:23.6391706Z UV_PYTHON_INSTALL_DIR is already set to /github/home/.local/share/uv/python 2026-02-21T09:53:23.6509059Z Post job cleanup. 2026-02-21T09:53:23.6512546Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T09:53:23.9800619Z Post job cleanup. 
2026-02-21T09:53:23.9804099Z ##[command]/usr/bin/docker exec b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T09:53:24.2243869Z [command]/usr/bin/git version 2026-02-21T09:53:24.2272165Z git version 2.43.0 2026-02-21T09:53:24.2294424Z Temporarily overriding HOME='/__w/_temp/b578dd8b-c575-447d-b88c-6b1aa9f9889e' before making global git config changes 2026-02-21T09:53:24.2295025Z Adding repository directory to the temporary git global config as a safe directory 2026-02-21T09:53:24.2297457Z [command]/usr/bin/git config --global --add safe.directory /__w/helion/helion 2026-02-21T09:53:24.2322411Z Removing SSH command configuration 2026-02-21T09:53:24.2327431Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2026-02-21T09:53:24.2349598Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2026-02-21T09:53:24.2566193Z Removing HTTP extra header 2026-02-21T09:53:24.2571917Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2026-02-21T09:53:24.2593330Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2026-02-21T09:53:24.2797009Z Removing includeIf entries pointing to credentials config files 2026-02-21T09:53:24.2800621Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2026-02-21T09:53:24.2821610Z includeif.gitdir:/__w/helion/helion/.git.path 2026-02-21T09:53:24.2821887Z includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path 2026-02-21T09:53:24.2822150Z includeif.gitdir:/github/workspace/.git.path 2026-02-21T09:53:24.2822445Z includeif.gitdir:/github/workspace/.git/worktrees/*.path 
2026-02-21T09:53:24.2827795Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/__w/helion/helion/.git.path 2026-02-21T09:53:24.2846349Z /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2854944Z [command]/usr/bin/git config --local --unset includeif.gitdir:/__w/helion/helion/.git.path /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2882260Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path 2026-02-21T09:53:24.2899787Z /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2907393Z [command]/usr/bin/git config --local --unset includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path /__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2934273Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/github/workspace/.git.path 2026-02-21T09:53:24.2951369Z /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2959661Z [command]/usr/bin/git config --local --unset includeif.gitdir:/github/workspace/.git.path /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.2992361Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/github/workspace/.git/worktrees/*.path 2026-02-21T09:53:24.3006840Z /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.3016856Z [command]/usr/bin/git config --local --unset includeif.gitdir:/github/workspace/.git/worktrees/*.path /github/runner_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config 2026-02-21T09:53:24.3039019Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2026-02-21T09:53:24.3230236Z Removing credentials config 
'/__w/_temp/git-credentials-84c5b436-294d-43c3-82f1-9eee8745a9f5.config' 2026-02-21T09:53:24.3345635Z Stop and remove container: 3d779417fd284758a9cf9374529f9c31_nvidiacuda1301develubuntu2404_79158c 2026-02-21T09:53:24.3349171Z ##[command]/usr/bin/docker rm --force b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T09:53:27.6551791Z b45b0bfe043fd5522fb60338c1e2925f43414caab838bb99e9d1d4aeee3a7070 2026-02-21T09:53:27.6583819Z Remove container network: github_network_b2b3600e3d064225aed0e5f8bd238faa 2026-02-21T09:53:27.6588460Z ##[command]/usr/bin/docker network rm github_network_b2b3600e3d064225aed0e5f8bd238faa 2026-02-21T09:53:28.2649316Z github_network_b2b3600e3d064225aed0e5f8bd238faa 2026-02-21T09:53:28.2721506Z Evaluate and set job outputs 2026-02-21T09:53:28.2728594Z Set output 'benchmark-metadata' 2026-02-21T09:53:28.2730900Z Set output 'runners-info' 2026-02-21T09:53:28.2731672Z Set output 'dependencies' 2026-02-21T09:53:28.2732304Z Cleaning up orphan processes